How to code the training logic

This file defines your custom training logic. BojAI allows you to implement your own training loop from scratch while taking care of all the plumbing—model loading, tokenizer, device management, etc.

All you need to do is extend the base Trainer class and implement your logic.

Define Hyperparameters

Before starting your training, open the global_vars.py file, assign a name to your task, and define your hyperparameters as a dictionary.

Any hyperparameter you define here will automatically become a field of the Trainer class. This means you can access it using self.[hyperparameter_name] inside any method of the trainer.

Example

In global_vars.py:

task_type = "gegt"

hyper_params = {
    "batch_size": 32,
    "learning_rate": 1e-5,
    "num_epochs": 1,
    "num_workers": 0,
}

Anywhere inside the Trainer class:

self.num_epochs
self.learning_rate
...etc
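
For instance, a hyperparameter can feed directly into your training setup. A minimal sketch, assuming a PyTorch model (the optimizer choice here is illustrative, not part of BojAI):

import torch

# self.learning_rate comes from hyper_params; assumes self.model is a PyTorch module
optimizer = torch.optim.Adam(self.model.parameters(), lr=self.learning_rate)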

Now go back to custom_trainer.py and continue working.

Base Class: Trainer

Every custom trainer must extend Trainer, like this:

from trainer import Trainer

class ImplementYourTrainer(Trainer):
    ...

You must not rename this class or change its base class.

When BojAI runs your pipeline, it automatically injects:

  • self.model: your model instance
  • self.device: the target device (CPU or GPU)
  • self.tokenizer: if needed
  • self.hyper_params: a dictionary of your training configuration
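
For example, a minimal trainer skeleton that touches the injected fields might look like this (a sketch only; moving the model to self.device assumes a PyTorch model):

from trainer import Trainer

class ImplementYourTrainer(Trainer):
    def train(self, qthread, progress_worker, loss_worker):
        self.model.to(self.device)  # injected model and device
        ...

    def evaluate(self, eval_dataset=None):
        data = eval_dataset if eval_dataset is not None else self.eval_data
        ...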

Method 1: train()

This is where you define your training loop.

def train(self, qthread, progress_worker, loss_worker):
    ...

Features Available Inside This Method

  • Loop through self.training_data; it is a Dataset object.

  • Call self.model.train_step() or equivalent. self.model is the model you defined in step 2.

  • Report progress to the UI with:

    • progress_worker: reports the progress of the training loop, for example the completion of an epoch.
    • loss_worker: reports the loss of the training loop, for example the loss at the end of each epoch.
    • qthread: allows the UI to refresh.
  • Example:

    progress_worker.emit(percent)
    loss_worker.emit(loss_value)
    qthread.msleep(1)  # optional but recommended for UI refresh
    

Example Skeleton

total_steps = len(self.training_data)
for i, data in enumerate(self.training_data):
    loss = self.model.train_step(data)
    progress = int((i + 1) / total_steps * 100)
    progress_worker.emit(progress)
    loss_worker.emit(loss)
    qthread.msleep(1)

You can use PyTorch, scikit-learn, or even plain Python loops.
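
As one possibility, here is a fuller sketch of the loop using PyTorch with a DataLoader, an optimizer, and the hyperparameters defined earlier. It assumes self.model is a torch.nn.Module whose forward pass returns a scalar loss and that self.training_data works with DataLoader; adapt it to your own model's API:

import torch
from torch.utils.data import DataLoader

def train(self, qthread, progress_worker, loss_worker):
    loader = DataLoader(self.training_data, batch_size=self.batch_size,
                        num_workers=self.num_workers, shuffle=True)
    optimizer = torch.optim.Adam(self.model.parameters(), lr=self.learning_rate)
    self.model.to(self.device)
    self.model.train()

    total_steps = self.num_epochs * len(loader)
    step = 0
    for epoch in range(self.num_epochs):
        for batch in loader:
            # Move batch tensors to self.device here if your Dataset returns raw tensors.
            optimizer.zero_grad()
            loss = self.model(batch)  # assumes forward() returns a scalar loss
            loss.backward()
            optimizer.step()

            step += 1
            progress_worker.emit(int(step / total_steps * 100))
            loss_worker.emit(loss.item())
            qthread.msleep(1)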


Method 2: evaluate()

This is where you define how your model is evaluated during and after training.

def evaluate(self, eval_dataset=None):
    ...

  • If eval_dataset is provided, use it instead of self.eval_data.
  • Compute whatever metric fits your task: accuracy, BLEU, F1, MSE, etc.

Example Skeleton

eval_data = eval_dataset if eval_dataset is not None else self.eval_data
total = 0
correct = 0
for data in eval_data:
    prediction = self.model.predict(data)
    correct += int(prediction == data.label)
    total += 1
return correct / total

You can return any object (float, dict, etc.). BojAI will display it.
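
For instance, to surface more than one number at once, you could return a dictionary (the metric names here are purely illustrative):

return {"accuracy": correct / total, "num_examples": total}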


Logging for Visualizations

BojAI provides a built-in Visualizer that lets you track:

  • Loss over epochs
  • Evaluation performance on training data
  • Evaluation performance on validation data
  • Evaluation on new datasets after deployment

To enable this, both of your trainer’s methods can log values at each step using self.logger:

Example Logging

# Log training loss
self.logger.log(epoch=epoch_num, loss=loss)

# Log evaluation accuracy on training set
self.logger.log(epoch=epoch_num, train=acc_train)

# Log evaluation on validation set
self.logger.log(epoch=epoch_num, valid=acc_valid)

# Log evaluation on new inference dataset
self.logger.log(eval_value=acc_eval)

You must always provide the epoch, along with either loss, train, and valid OR eval_value. If you log eval_value together with the other values, the visualizer will throw an error. To learn more about why, read about Optimization bias.

If you log all these values, they will be stored automatically and can be plotted in the GUI or CLI using the built-in Visualizer.
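
Putting it together, a typical place for these calls is the end of each epoch inside train(). A sketch, reusing self.model.train_step() from the skeleton above and self.evaluate() for the accuracies:

for epoch in range(self.num_epochs):
    epoch_loss = 0.0
    for data in self.training_data:
        epoch_loss += self.model.train_step(data)  # as in the training skeleton

    acc_train = self.evaluate(self.training_data)  # evaluation on training data
    acc_valid = self.evaluate()                    # falls back to self.eval_data
    self.logger.log(epoch=epoch, loss=epoch_loss)
    self.logger.log(epoch=epoch, train=acc_train)
    self.logger.log(epoch=epoch, valid=acc_valid)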


Want to See it in Action?

Check out the dedicated guide: How to Use the Visualizer

It shows exactly how to log your training progress and compare evaluation results.


Run the Training Stage

Once this class is implemented, BojAI can run your full training phase via:

In CLI:
bojai start --pipeline name-from-build --directory where/the/editing files/are --stage train

In UI:
bojai start --pipeline name-from-build --directory where/the/editing files/are --stage train --ui

You now have full control over how your model is trained, evaluated, and visualized.