How to create your own custom pipeline
BojAI lets you define your own end-to-end machine learning pipelines tailored to your data and modeling needs. A custom pipeline gives you full control over how data is processed, models are trained, and results are generated, while still benefiting from BojAI’s modular structure and visualization features.
To define a custom pipeline, you will need to implement five Python modules, each responsible for a different aspect of the ML workflow. These files are:
File Name | Description |
---|---|
custom_data_processor.py | Defines how your raw data is preprocessed, tokenized, and transformed before training. This includes tasks like cleaning, splitting, or encoding data. |
custom_pipeline_user.py | Acts as the orchestrator of your pipeline. It loads configurations and connects your data processor, model, and trainer together in a runnable pipeline. |
custom_trainer.py | Contains the logic for training your model, tracking metrics, and saving checkpoints. You define how training epochs run and how evaluations are performed. |
global_vars.py | Stores all constants and paths (e.g., model names, dataset paths, tokenizer type, etc.) in one place. This keeps your pipeline easy to update and configure. |
model.py | Defines the architecture of the model you are training. This can be a HuggingFace model, a PyTorch nn.Module, or any other structure supported by BojAI. |
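To make the division of responsibilities concrete, here is a minimal sketch of how the five pieces might fit together. It is condensed into a single file and uses only the Python standard library. Every class name, constant, and method here (`CustomDataProcessor`, `LinearModel`, `CustomTrainer`, `CustomPipelineUser`, `DATA_SPLIT`, and so on) is a hypothetical placeholder chosen for illustration, not BojAI's actual interface. The toy model is a one-variable linear regressor rather than a HuggingFace or PyTorch model, and checkpoint saving and configuration loading are left out.

```python
# Illustrative single-file sketch. In a real pipeline each section below would
# live in its own module. All names are hypothetical, not BojAI's actual API.
import random

# --- global_vars.py: constants kept in one place ---
DATA_SPLIT = 0.8       # fraction of the data used for training
LEARNING_RATE = 0.05
NUM_EPOCHS = 200
SEED = 0


# --- custom_data_processor.py: prepare and split the raw data ---
class CustomDataProcessor:
    def __init__(self, raw_pairs, split=DATA_SPLIT, seed=SEED):
        pairs = list(raw_pairs)
        random.Random(seed).shuffle(pairs)  # deterministic shuffle
        cut = int(len(pairs) * split)
        self.train = pairs[:cut]
        self.eval = pairs[cut:]


# --- model.py: the model being trained (a toy linear model here) ---
class LinearModel:
    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def predict(self, x):
        return self.w * x + self.b


# --- custom_trainer.py: training loop and evaluation ---
class CustomTrainer:
    def __init__(self, model, processor, lr=LEARNING_RATE, epochs=NUM_EPOCHS):
        self.model = model
        self.processor = processor
        self.lr = lr
        self.epochs = epochs

    def train(self):
        # Plain stochastic gradient descent on squared error.
        for _ in range(self.epochs):
            for x, y in self.processor.train:
                err = self.model.predict(x) - y
                self.model.w -= self.lr * err * x
                self.model.b -= self.lr * err

    def evaluate(self):
        # Mean squared error on the held-out split.
        data = self.processor.eval
        return sum((self.model.predict(x) - y) ** 2 for x, y in data) / len(data)


# --- custom_pipeline_user.py: wires processor, model, and trainer together ---
class CustomPipelineUser:
    def __init__(self, raw_pairs):
        self.processor = CustomDataProcessor(raw_pairs)
        self.model = LinearModel()
        self.trainer = CustomTrainer(self.model, self.processor)

    def run(self):
        self.trainer.train()
        return self.trainer.evaluate()


# Toy data following y = 2x + 1.
raw = [(x / 10, 2 * (x / 10) + 1) for x in range(20)]
pipeline = CustomPipelineUser(raw)
mse = pipeline.run()
print(f"eval MSE: {mse:.6f}, w={pipeline.model.w:.3f}, b={pipeline.model.b:.3f}")
```

The point of the sketch is the separation of concerns: the orchestrator only constructs and connects the other components, so the model or the data processing can be swapped without touching the training loop.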
Follow the steps in this section to create your own custom pipeline and use it.