How to use the data processing agent
If your dataset is structured differently from what BojAI expects, you can use the Data Processing Agent to automatically adapt the pipeline based on a description you provide.
⚠️ This is a beta feature, and it currently works via Ollama, which must be installed locally.
Prerequisites
To use the agent, you must have Ollama installed on your system.
Then download the LLM locally by running:
ollama pull mistral
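Before starting the CLI flow, it can help to confirm that both prerequisites are in place. The snippet below is a minimal sketch and not part of BojAI: the helper name `ollama_ready` is hypothetical, and it only assumes that the `ollama` executable is on your PATH and that `ollama list` prints the names of locally pulled models.

```python
import shutil
import subprocess


def ollama_ready(model: str = "mistral") -> bool:
    """Return True if `ollama` is on PATH and `model` shows up in `ollama list`.

    Hypothetical helper for illustration; BojAI may perform its own checks.
    """
    # No executable on PATH means Ollama is not installed (or not visible).
    if shutil.which("ollama") is None:
        return False
    try:
        result = subprocess.run(
            ["ollama", "list"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # A plain substring check is enough for a sanity test like this one.
    return result.returncode == 0 and model in result.stdout


if __name__ == "__main__":
    print("Ollama and mistral available:", ollama_ready())
```

If this reports `False`, install Ollama and rerun `ollama pull mistral` before enabling the agent.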
When to Use the Agent
During the CLI initialization flow, after you enter the initialization data, BojAI will ask:
Would you like to use the agent? [Y/n]
If you select Y, you’ll go through a few guided steps to describe your dataset so the agent can adapt the pipeline for you.
Steps When Using the Agent
- Confirm Usage
  You’ll be asked:
  Would you like to use the agent? [Y/n]
  Enter Y to continue or N to skip.
- Confirm Image Use
  You’ll be asked:
  Does your data contain images? [Y/n]
  This helps the agent know whether to treat inputs as image paths.
- Describe Your Dataset
  You’ll be prompted to enter a clear description of how your data is organized.
  Example: My data folder contains a .txt file where each line has input and output separated by a comma.
  Or for images: Each subfolder contains one image and a label.txt file with the class.
  Entering 0 at this step will cancel agent use.
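To make the goal concrete: for a description like the first example (a .txt file where each line holds an input and an output separated by a comma), the agent is trying to produce something along the lines of the sketch below. This is an illustration only. The real method lives inside your processor class, and its exact signature and return format are defined by BojAI; the standalone function, its `data_dir` parameter, and the `(inputs, outputs)` tuple of lists are assumptions made here.

```python
from pathlib import Path


def get_inputs_outputs(data_dir):
    """Collect (inputs, outputs) from every .txt file in `data_dir`.

    Assumed layout: one example per line, written as `input,output`.
    """
    inputs, outputs = [], []
    for txt_file in sorted(Path(data_dir).glob("*.txt")):
        with open(txt_file, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                # Skip blank lines and lines without a separator.
                if not line or "," not in line:
                    continue
                # Split on the first comma only, so outputs may contain commas.
                inp, out = line.split(",", 1)
                inputs.append(inp.strip())
                outputs.append(out.strip())
    return inputs, outputs
```

A description that states the separator and the one-example-per-line layout gives the model everything it needs to write logic of this kind.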
What the Agent Does
- Creates a temporary copy of your custom_data_processor.py file
- Generates a new get_inputs_outputs method using a local LLM (via Ollama)
- Injects the function into the file and tests it
- On success: overwrites the original file with the new one
- On failure: retries up to 3 times with feedback before giving up
If successful, your YourDataProcessor class will now have a working get_inputs_outputs() method tailored to your data.
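The copy, generate, test, and overwrite cycle described above can be sketched roughly as follows. This is not BojAI's actual implementation: `adapt_processor`, `generate`, and `test` are hypothetical names, the LLM call and the test are passed in as plain callables, and the sketch makes three attempts in total (the text does not say whether the first attempt counts toward the limit).

```python
import shutil
import tempfile
from pathlib import Path

MAX_ATTEMPTS = 3  # assumption: three attempts in total


def adapt_processor(processor_path, description, generate, test):
    """Try to rewrite the processor file; return True on success.

    generate(source, description, feedback) -> new source text (hypothetical)
    test(path) -> raises an exception if the candidate file does not work
    """
    original = Path(processor_path)
    with tempfile.TemporaryDirectory() as tmp:
        candidate = Path(tmp) / original.name
        shutil.copy(original, candidate)  # work on a temporary copy
        feedback = None
        for _ in range(MAX_ATTEMPTS):
            source = candidate.read_text(encoding="utf-8")
            candidate.write_text(
                generate(source, description, feedback), encoding="utf-8"
            )
            try:
                test(candidate)
            except Exception as exc:
                # Feed the error back into the next generation attempt.
                feedback = f"{type(exc).__name__}: {exc}"
                continue
            shutil.copy(candidate, original)  # success: overwrite the original
            return True
    return False  # original file is left untouched
```

Working on a copy means a failed run never leaves a half-edited custom_data_processor.py behind.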
Guidelines for Descriptions
The better your description, the better the result. Be sure to include:
- What files are present (e.g. .csv, .txt, image folders)
- How inputs and outputs are separated or paired
- Any formats (e.g. line-based, JSON objects, folder-based)
What Happens Under the Hood
- The agent constructs a prompt with your description
- It asks Ollama to generate a valid Python method
- It uses regex to extract the function and replaces the existing one
- Then, it tests it using a call to get_item_untokenized(0)
If the function runs correctly, the changes are saved. If not, the agent tries to fix it based on the error, up to three times.
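The regex extraction and replacement step might look something like the sketch below. The pattern and the helper names (`method_pattern`, `extract_method`, `replace_method`) are assumptions for illustration, not the agent's actual code, and the pattern assumes the `def` signature fits on a single line.

```python
import re
import textwrap


def method_pattern(name):
    """Regex for `def name(...)` plus its indented body."""
    return re.compile(
        rf"^(?P<indent>[ \t]*)def {re.escape(name)}\b[^\n]*\n"  # the def line
        rf"(?:(?P=indent)[ \t]+[^\n]*(?:\n|$)|[ \t]*\n)*",  # deeper or blank lines
        re.MULTILINE,
    )


def extract_method(llm_reply, name="get_inputs_outputs"):
    """Return the dedented source of `name` from the model's reply, or None."""
    match = method_pattern(name).search(llm_reply)
    if match is None:
        return None
    return textwrap.dedent(match.group(0)).rstrip() + "\n"


def replace_method(source, new_method, name="get_inputs_outputs"):
    """Swap the existing `name` in `source` for `new_method`, keeping its indent."""
    match = method_pattern(name).search(source)
    if match is None:
        raise ValueError(f"{name} not found in source")
    old = match.group(0)
    kept_tail = old[len(old.rstrip()):]  # trailing blank lines stay in place
    indented = textwrap.indent(new_method.rstrip(), match.group("indent"))
    return source[: match.start()] + indented + kept_tail + source[match.end():]
```

Because the pattern stops at the first line that is not indented deeper than the `def`, surrounding chat text, closing code fences, and neighbouring methods are left alone. A regex approach like this is fragile with multi-line signatures or decorators, which is one reason testing the result afterwards matters.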