How to create your own custom pipeline

Welcome to the 2.0 release!

BojAI allows you to define your own end-to-end machine learning pipelines tailored to your unique data and modeling needs. Creating a custom pipeline gives you full control over how data is processed, models are trained, and results are generated—all while benefiting from BojAI’s modular structure and visualization features.

To define a custom pipeline, you will need to implement five Python modules, each responsible for a different aspect of the ML workflow. These files are:

| File Name | Description |
| --- | --- |
| `custom_data_processor.py` | Defines how your raw data is preprocessed, tokenized, and transformed before training. This includes tasks like cleaning, splitting, or encoding data. |
| `custom_pipeline_user.py` | Acts as the orchestrator of your pipeline. It loads configurations and connects your data processor, model, and trainer together in a runnable pipeline. |
| `custom_trainer.py` | Contains the logic for training your model, tracking metrics, and saving checkpoints. You define how training epochs run and how evaluations are performed. |
| `global_vars.py` | Stores all constants and paths (e.g., model names, dataset paths, tokenizer type) in one place. This keeps your pipeline easy to update and configure. |
| `model.py` | Defines the architecture of the model you are training. This can be a HuggingFace model, a PyTorch `nn.Module`, or any other structure supported by BojAI. |
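To make the division of responsibilities in the table concrete, here is a minimal sketch of how the five modules might fit together. It is an illustration under stated assumptions, not BojAI's API: every class, function, and constant name (`CustomDataProcessor`, `LinearModel`, `CustomTrainer`, `run_pipeline`, `LEARNING_RATE`, and so on) is hypothetical. The five files are collapsed into commented sections of one listing so it runs on its own, and a plain-Python linear model trained by gradient descent stands in for a HuggingFace or PyTorch model. The base classes and method signatures BojAI actually expects from each module may differ.

```python
import random

# ---- global_vars.py: constants and paths kept in one place (hypothetical values) ----
DATA_PATH = "data/train.csv"  # hypothetical; this sketch passes data in memory instead
LEARNING_RATE = 0.01
NUM_EPOCHS = 2000
TRAIN_SPLIT = 0.8
SEED = 0


# ---- custom_data_processor.py: clean, split, and encode raw data ----
class CustomDataProcessor:  # hypothetical name, not a BojAI base class
    def __init__(self, raw_rows, train_split=TRAIN_SPLIT, seed=SEED):
        self.raw_rows = raw_rows
        self.train_split = train_split
        self.seed = seed

    def process(self):
        # Cleaning and encoding: drop incomplete rows, convert values to floats.
        cleaned = [(float(x), float(y)) for x, y in self.raw_rows
                   if x is not None and y is not None]
        # Splitting: deterministic shuffle, then a train/eval cut.
        random.Random(self.seed).shuffle(cleaned)
        cut = int(len(cleaned) * self.train_split)
        return cleaned[:cut], cleaned[cut:]


# ---- model.py: the architecture (a one-feature linear model, y = w * x + b) ----
class LinearModel:  # stand-in for a HuggingFace model or nn.Module
    def __init__(self):
        self.w = 0.0
        self.b = 0.0

    def predict(self, x):
        return self.w * x + self.b


# ---- custom_trainer.py: training loop, metrics, and checkpoints ----
class CustomTrainer:  # hypothetical name, not a BojAI base class
    def __init__(self, model, train_data, eval_data, lr=LEARNING_RATE):
        self.model = model
        self.train_data = train_data
        self.eval_data = eval_data
        self.lr = lr
        self.history = []            # eval loss recorded after each epoch
        self.best_loss = float("inf")
        self.checkpoint = None       # in-memory stand-in for a saved checkpoint

    def evaluate(self):
        # Metric: mean squared error on the held-out split.
        errors = [(self.model.predict(x) - y) ** 2 for x, y in self.eval_data]
        return sum(errors) / len(errors)

    def train(self, epochs=NUM_EPOCHS):
        n = len(self.train_data)
        for epoch in range(epochs):
            # One full-batch gradient-descent step on the MSE loss.
            residuals = [(self.model.predict(x) - y, x) for x, y in self.train_data]
            grad_w = sum(2 * r * x for r, x in residuals) / n
            grad_b = sum(2 * r for r, _ in residuals) / n
            self.model.w -= self.lr * grad_w
            self.model.b -= self.lr * grad_b
            # Track the metric and keep the best parameters seen so far.
            loss = self.evaluate()
            self.history.append(loss)
            if loss < self.best_loss:
                self.best_loss = loss
                self.checkpoint = {"epoch": epoch, "w": self.model.w, "b": self.model.b}
        return self.history


# ---- custom_pipeline_user.py: connect processor, model, and trainer ----
def run_pipeline(raw_rows):
    train_data, eval_data = CustomDataProcessor(raw_rows).process()
    model = LinearModel()
    trainer = CustomTrainer(model, train_data, eval_data)
    trainer.train()
    return model, trainer


if __name__ == "__main__":
    # Toy data drawn from y = 2x + 1, plus one incomplete row for the cleaner to drop.
    rows = [(x, 2 * x + 1) for x in range(10)] + [(None, 5)]
    model, trainer = run_pipeline(rows)
    print(f"w={model.w:.3f} b={model.b:.3f} best eval MSE={trainer.best_loss:.2e}")
```

Because the toy data lies exactly on y = 2x + 1, the fitted `w` and `b` should end close to 2 and 1. Keeping every tunable value in the `global_vars.py` section means that a change such as a new learning rate touches one line instead of several modules, which is the benefit the table attributes to that file.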

Follow the steps in this section to create and use your own custom pipeline.

Updated on 19 Jul 2025