CLI Reference

Initialize

This is the first step you see when launching the Bojai CLI. It collects the basic information required to initialize your pipeline session, including the model, data location, and training/evaluation split.

You will be asked for:

  • A name for your pipeline
  • The path to your dataset
  • The training/evaluation split (the two values must sum to 1)
  • (If applicable) A tokenizer architecture

Once confirmed, a new pipeline session is created, and you are given the option to proceed to the preparation stage.
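
The split constraint can be checked with a small helper. This is a minimal sketch, not Bojai's own validation code: `validate_split` is a hypothetical name, the requirement that each value lies between 0 and 1 is an assumption, and the tolerance is there only to absorb floating-point rounding (for example, 0.7 + 0.3 is not exactly 1.0 in binary floating point).

```python
import math


def validate_split(train: float, evaluation: float, tol: float = 1e-9) -> bool:
    """Return True if the two split values are usable.

    Hypothetical helper for illustration only.
    """
    # Assumption: each value is a fraction between 0 and 1.
    in_range = 0.0 <= train <= 1.0 and 0.0 <= evaluation <= 1.0
    # The documented rule: the two values must sum to 1.
    return in_range and math.isclose(train + evaluation, 1.0, abs_tol=tol)
```

With this helper, a split of 0.8 and 0.2 is accepted, while 0.8 and 0.3 is rejected.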


Prepare

This stage of the interface lets you examine your data before you use it for training and evaluation. While in this stage, you can view information about your pipeline session, including:

  • Model name
  • Model type
  • Tokenizer type
  • Number of datapoints
  • Whether the data preparation was successful

Additionally, you can perform different actions by typing the letter in brackets before each option and pressing Enter.

  • [v] View tokenized data — Displays a tokenized datapoint from your dataset. You can enter a specific index, or press Enter without typing one to get a random datapoint. When you are done viewing your data, type q to return to the Prepare main menu and choose another option.

  • [r] View raw data — Shows the raw (untokenized) version of a datapoint. If your pipeline processes images, it opens the image; otherwise, it prints the raw text. You can enter a specific index, or press Enter without typing one to get a random datapoint. When you are done viewing your data, type q to return to the Prepare main menu and choose another option.

  • [u] Update dataset path — Changes the dataset directory and reloads the data.

  • [t] Continue to training — Moves to the training stage using the prepared data.

  • [q] Quit — Exits the preparation stage.
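
The "specific index or random datapoint" behavior of the [v] and [r] options can be sketched as follows. This is an illustration under stated assumptions rather than Bojai's code: `resolve_index` is a hypothetical name, indices are assumed to be zero-based, and handling of the q key is left to the caller.

```python
import random
from typing import Optional


def resolve_index(user_input: str, dataset_size: int,
                  rng: Optional[random.Random] = None) -> int:
    """Map the text typed at the prompt to a datapoint index.

    Hypothetical helper; assumes zero-based indices.
    """
    rng = rng or random.Random()
    text = user_input.strip()
    if text == "":
        # Enter pressed without a number: pick a random datapoint.
        return rng.randrange(dataset_size)
    index = int(text)  # raises ValueError for non-numeric input
    if not 0 <= index < dataset_size:
        raise IndexError(
            f"index {index} is out of range for {dataset_size} datapoints"
        )
    return index
```

Passing a seeded `random.Random` instance makes the random choice reproducible, which is convenient when testing.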


Train

This stage of the interface allows you to train and evaluate your model. It displays the current hyperparameters and device being used. You can then select one of the following options:

  • [t] Start training — Trains your model. If your model is non-trainable (like kNN), it will skip training.

  • [u] Update hyperparameters — Allows you to change any of the training hyperparameters (like learning rate or batch size).

  • [e] Evaluate model — Evaluates your model on the validation set and prints the metric (e.g., Accuracy).

  • [r] Replace model — Replaces your current model with a new instance (resets weights).

  • [d] Deploy model (if trained) — Launches the deployment stage, but only if the model has been trained.

  • [p] Go back to data preparation — Returns to the prepare stage with the same session state.

  • [q] Quit — Exits the training stage.
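
The rule that deployment is available only after training can be modeled with a single flag. The sketch below is hypothetical (`TrainSession` and `handle_train_option` are not Bojai names) and it assumes that replacing the model, which resets its weights, also clears the trained status.

```python
from dataclasses import dataclass


@dataclass
class TrainSession:
    """Hypothetical stand-in for the state shown in the Train stage."""
    trained: bool = False


def handle_train_option(session: TrainSession, key: str) -> str:
    """Apply one menu key and report what happened (illustration only)."""
    if key == "t":
        session.trained = True  # a real training loop would run here
        return "trained"
    if key == "r":
        # Assumption: a fresh model instance counts as untrained.
        session.trained = False
        return "replaced"
    if key == "d":
        # Deployment is only offered once the model has been trained.
        return "deploy" if session.trained else "blocked: train the model first"
    return "unhandled"
```

In this model, pressing d before t is refused, and pressing r after training blocks deployment again until the new model is trained.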


Deploy

This stage of the interface lets you evaluate the model on your original data, add new evaluation data and evaluate on it, save the model, and use the model for inference. The available options are listed below:

  • [a] Add new evaluation data — Allows you to provide a new dataset to evaluate the model on.

  • [e] Evaluate on original data — Evaluates your model using the default validation dataset.

  • [n] Evaluate on new data — Evaluates your model using the new dataset you added with [a].

  • [u] Use model for inference — Prompts you to input text (or other data) and returns the model’s output.

  • [s] Save model — Saves the trained model as a .bin file in a location you choose.

  • [t] Go back to training — Returns to the training CLI.

  • [p] Go back to data processing — Returns to the data preparation CLI.

  • [q] Quit — Exits the deployment stage.
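
The navigation keys described across the Prepare, Train, and Deploy stages form a small state machine, summarized below as a lookup table. This is a sketch for orientation only: `TRANSITIONS` and `next_stage` are hypothetical names, lowercasing the key is an assumption, and the table ignores the condition that [d] requires a trained model.

```python
from typing import Optional

# Keys that move between stages, as listed in the menus above.
TRANSITIONS = {
    "prepare": {"t": "train"},
    "train": {"d": "deploy", "p": "prepare"},
    "deploy": {"t": "train", "p": "prepare"},
}


def next_stage(stage: str, key: str) -> Optional[str]:
    """Return the stage reached by pressing `key`, or None if the key
    does not navigate to another stage (for example q, or an action key)."""
    return TRANSITIONS.get(stage, {}).get(key.strip().lower())
```

For example, `next_stage("train", "p")` returns `"prepare"`, while `next_stage("deploy", "q")` returns `None` because quitting does not lead to another stage.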