# How to code the model part
This file is where you define the core model for your BojAI pipeline.
You can implement any architecture you like, from PyTorch to scikit-learn, or even CUDA-based logic.
The only limitation so far is that only PyTorch models can be saved and loaded later. We aim to improve this in the future.
## What You Can Use
You can use any type of model, including:

- PyTorch (CNN, Transformer, MLP, etc.)
- Scikit-learn models (e.g., `LogisticRegression`, `SVC`)
- NumPy or rule-based logic
- CUDA models compiled from `.cu` files
- Custom tokenizers (define them here if needed)
No base class is required. Your model only needs to be instantiable and callable by the trainer.
## 👇 Examples
### 1. A Simple Python Model
```python
class YourModel:
    def __init__(self):
        self.model = None  # Replace with your logic

    def predict(self, x):
        pass  # Optional

    def save(self, path):
        pass  # Optional

    def load(self, path):
        pass  # Optional
```
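Since only PyTorch models can currently be saved and loaded by the pipeline, you may want a torch-based variant. The following is a minimal sketch; the layer sizes and architecture are placeholders to replace with your own:

```python
import torch
import torch.nn as nn

class YourModel(nn.Module):
    def __init__(self, input_dim=128, num_classes=2):  # placeholder dimensions
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.net(x)

    def save(self, path):
        # Persist only the weights, using the standard state-dict approach
        torch.save(self.state_dict(), path)

    def load(self, path):
        self.load_state_dict(torch.load(path))
```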
### 2. A Model Defined in `.cu` and Called via Subprocess
```python
import subprocess
import tempfile
import os

class YourModel:
    def __init__(self):
        self.cuda_binary = "./my_model_binary"
        if not os.path.exists(self.cuda_binary):
            raise FileNotFoundError(f"CUDA binary not found: {self.cuda_binary}")

    def predict(self, x):
        with tempfile.NamedTemporaryFile(mode="w+", delete=False) as input_file, \
             tempfile.NamedTemporaryFile(mode="r", delete=False) as output_file:
            # Write the formatted input, then pass both file paths to the binary
            input_file.write(self._format_input(x))
            input_file.flush()
            subprocess.run(
                [self.cuda_binary, input_file.name, output_file.name],
                check=True,
            )
            output = output_file.read()
        os.remove(input_file.name)
        os.remove(output_file.name)
        return self._parse_output(output)

    def _format_input(self, x):
        return " ".join(map(str, x))

    def _parse_output(self, output_str):
        return list(map(float, output_str.strip().split()))
```
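Assuming the binary was compiled beforehand (e.g., with `nvcc my_model.cu -o my_model_binary`) and honors the input/output file contract above, calling it is plain Python. The input values below are made up for illustration:

```python
model = YourModel()
predictions = model.predict([1.0, 2.5, -0.3])  # hypothetical feature vector
print(predictions)  # list of floats parsed from the binary's output file
```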
## 🔗 Where This Model Is Used
| File | Purpose |
|---|---|
| `custom_trainer.py` | Training and evaluation |
| `custom_pipeline_user.py` | Inference and predictions |
| `global_vars.py` | Model instantiation and options |
## Extend Freely
You can add:

- A `forward()` method (for deep learning)
- A `fit()` method (for scikit-learn models; see the sketch below)
- Logging, dropout control, or hybrid logic

No restrictions apply; it's your model.
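As one concrete example, here is a minimal sketch of a scikit-learn model wrapped with a `fit()` method. `LogisticRegression` and `joblib` persistence are illustrative choices, not requirements:

```python
import joblib
from sklearn.linear_model import LogisticRegression

class YourModel:
    def __init__(self):
        self.model = LogisticRegression()

    def fit(self, X, y):
        # Delegate training to scikit-learn
        self.model.fit(X, y)

    def predict(self, x):
        return self.model.predict(x)

    def save(self, path):
        joblib.dump(self.model, path)

    def load(self, path):
        self.model = joblib.load(path)
```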
## Final Step: Hook It Up in `global_vars.py`
Once you’ve defined your model in `model.py`, register it by importing it in `global_vars.py` and defining the following:
```python
def getNewTokenizer():
    # Return any tokenizer you want to use
    pass

def getNewModel():
    # Return an instance of your model
    pass
```
### Example
```python
from model import YourModel
from transformers import AutoTokenizer

def getNewTokenizer():
    return AutoTokenizer.from_pretrained("bert-base-uncased")

def getNewModel():
    return YourModel()
```
## Optional: `init_model` Hook
```python
def init_model(data, model, hyper_params):
    # Optional post-instantiation logic
    pass
```
Use this only if your model depends on the dataset or hyperparameters after being created. For example, if your model requires `input_dim = len(tokenizer)` or similar config-specific logic, define an `init()` method in your model and call it here:
```python
def init_model(data, model, hyper_params):
    model.init(data, hyper_params)
```
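On the model side, the matching `init()` might look like the sketch below. The `data.tokenizer` attribute and the `hyper_params` dictionary keys are assumptions for illustration; adapt them to whatever your dataset and hyperparameter objects actually expose:

```python
import torch.nn as nn

class YourModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = None  # real layers are built in init(), once data is known

    def init(self, data, hyper_params):
        # Hypothetical: size the input layer from the tokenizer's vocabulary
        input_dim = len(data.tokenizer)
        hidden_size = hyper_params.get("hidden_size", 64)
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 2),  # placeholder output size
        )
```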
## Optional: Provide User Options
You can give users the option to choose between models or tokenizers via the UI or CLI. Define the different models, then import them and add them to the `options` dictionary.
### Example Setup
```python
from model import CNNModel, TransformerModel

options = {
    "cnn": CNNModel,
    "transformer": TransformerModel,
}
```
Make sure "options"
is enabled in your browseDict
:
```python
browseDict = {
    ...
    "options": True,
    ...
}
```
This enables dropdowns or CLI flags for model/tokenizer selection.