How to Code the Model Part

This file is where you define the core model for your BojAI pipeline.
You can implement any architecture you like, from PyTorch to scikit-learn, or even CUDA-based logic.

The only current limitation is that only PyTorch models can be saved and loaded later; we aim to lift this restriction in a future release.


What You Can Use

You can use any type of model, including:

  • PyTorch (CNN, Transformer, MLP, etc.)
  • Scikit-learn models (e.g., LogisticRegression, SVC)
  • NumPy or rule-based logic
  • CUDA models compiled from .cu files
  • Custom tokenizers (define here if needed)

No base class is required. Your model only needs to be instantiable and callable by the trainer.


👇 Examples

1. A Simple Python Model

class YourModel:
    def __init__(self):
        self.model = None  # Replace with your model's initialization logic

    def predict(self, x):
        pass  # Optional: return predictions for input x

    def save(self, path):
        pass  # Optional: persist the model to disk

    def load(self, path):
        pass  # Optional: restore the model from disk
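
If your model wraps a PyTorch module, the save and load stubs can simply delegate to torch. A minimal sketch, assuming self.model holds an nn.Module:

import torch
import torch.nn as nn

class YourModel:
    def __init__(self):
        self.model = nn.Linear(10, 2)  # placeholder architecture

    def save(self, path):
        # Persist only the weights; __init__ rebuilds the architecture
        torch.save(self.model.state_dict(), path)

    def load(self, path):
        self.model.load_state_dict(torch.load(path))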

2. A Model Defined in .cu and Called via Subprocess

import subprocess
import tempfile
import os

class YourModel:
    def __init__(self):
        self.cuda_binary = "./my_model_binary"
        if not os.path.exists(self.cuda_binary):
            raise FileNotFoundError(f"CUDA binary not found: {self.cuda_binary}")

    def predict(self, x):
        with tempfile.NamedTemporaryFile(mode='w+', delete=False) as input_file, \
             tempfile.NamedTemporaryFile(mode='r', delete=False) as output_file:
            input_file.write(self._format_input(x))
            input_file.flush()
            subprocess.run([self.cuda_binary, input_file.name, output_file.name], check=True)
            output_file.seek(0)  # the binary wrote the file after we opened it
            output = output_file.read()
        os.remove(input_file.name)
        os.remove(output_file.name)
        return self._parse_output(output)

    def _format_input(self, x):
        return " ".join(map(str, x))

    def _parse_output(self, output_str):
        return list(map(float, output_str.strip().split()))
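
A quick usage sketch, assuming the binary has already been compiled (for example with nvcc my_model.cu -o my_model_binary) and sits next to your script:

model = YourModel()

# x is whatever _format_input expects; here, a flat list of floats
scores = model.predict([0.1, 0.2, 0.3])
print(scores)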

🔗 Where This Model Is Used

File                      Purpose
custom_trainer.py         Training and evaluation
custom_pipeline_user.py   Inference and predictions
global_vars.py            Model instantiation and options

Extend Freely

You can add:

  • A forward() method (for deep learning), as sketched below
  • A fit() method (for scikit-learn models)
  • Logging, dropout control, or hybrid logic

No restrictions apply—it’s your model.
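
For instance, a PyTorch variant is just a regular nn.Module with a forward() method. A minimal sketch (the layer sizes are placeholders):

import torch.nn as nn

class CNNModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Placeholder layers; swap in your real architecture
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(10),
        )

    def forward(self, x):
        return self.net(x)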


Final Step: Hook It Up in global_vars.py

Once you’ve defined your model in model.py, register it in global_vars.py by importing it and defining the following:

def getNewTokenizer():
    # Return any tokenizer you want to use
    pass

def getNewModel():
    # Return an instance of your model
    pass

Example

from model import YourModel
from transformers import AutoTokenizer

def getNewTokenizer():
    return AutoTokenizer.from_pretrained("bert-base-uncased")

def getNewModel():
    return YourModel()

Optional: init_model Hook

def init_model(data, model, hyper_params):
    # Optional post-instantiation logic
    pass

Use this only if your model depends on the dataset or hyperparameters after being created.

For example, if your model requires input_dim = len(tokenizer) or similar config-specific logic, define a .init() method in your model and call it here.

def init_model(data, model, hyper_params):
    model.init(data, hyper_params)
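
For illustration, the init method itself might look like this; the data.tokenizer attribute and the hyperparameter key are hypothetical and depend on your pipeline:

class YourModel:
    def init(self, data, hyper_params):
        # Hypothetical: size the input layer from the dataset's vocabulary
        self.input_dim = len(data.tokenizer)
        self.learning_rate = hyper_params.get("learning_rate", 1e-3)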

Optional: Provide User Options

You can give users the option to choose between models or tokenizers via the UI or CLI. Define the different models, then import them and add them to the options dictionary.

Example Setup

from model import CNNModel, TransformerModel

options = {
    "cnn": CNNModel,
    "transformer": TransformerModel
}

Make sure "options" is enabled in your browseDict:

browseDict = {
    ...
    "options": True,
    ...
}

This enables dropdowns or CLI flags for model/tokenizer selection.
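
Conceptually, the selected key resolves to a class that is then instantiated (the actual lookup is handled by BojAI; this is just the idea):

choice = "cnn"             # e.g. picked from a dropdown or CLI flag
model = options[choice]()  # instantiate the selected model class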