banhxeo.train.config module

class banhxeo.train.config.LossConfig(*, name: str = 'CrossEntropyLoss', kwargs: Dict[str, Any] = <factory>)[source]

Bases: BaseModel

Configuration for the loss function used during training.

Variables:
  • name (str) – The name of the loss function class from torch.nn.modules.loss (e.g., “CrossEntropyLoss”, “MSELoss”). Defaults to “CrossEntropyLoss”.

  • kwargs (Dict[str, Any]) – A dictionary of keyword arguments to be passed to the loss function’s constructor (e.g., {“weight”: torch.tensor([0.1, 0.9])}). Defaults to an empty dictionary.

name: str
kwargs: Dict[str, Any]
get_loss_function()[source]

Instantiates and returns the configured PyTorch loss function.

Returns:

An instance of the specified PyTorch loss function.

Raises:
  • ValueError – If the specified name is not a valid loss function in torch.nn.modules.loss.

  • AttributeError – If the specified name cannot be found as an attribute of torch.nn.modules.loss.
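
A minimal usage sketch, following the weighted-loss example from the kwargs description above (the logits and targets are illustrative):

    import torch

    from banhxeo.train.config import LossConfig

    # Class-weighted cross-entropy, as in the kwargs example above.
    loss_cfg = LossConfig(
        name="CrossEntropyLoss",
        kwargs={"weight": torch.tensor([0.1, 0.9])},
    )
    criterion = loss_cfg.get_loss_function()  # instance of torch.nn.CrossEntropyLoss

    logits = torch.randn(4, 2)            # batch of 4 samples, 2 classes
    targets = torch.tensor([0, 1, 1, 0])  # ground-truth class indices
    loss = criterion(logits, targets)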

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class banhxeo.train.config.OptimizerConfig(*, name: str = 'AdamW', scheduler_name: str | None = None, optimizer_kwargs: Dict[str, Any] = <factory>, lr_scheduler_kwargs: Dict[str, Any] = <factory>)[source]

Bases: BaseModel

Configuration for the optimizer and learning rate scheduler.

Variables:
  • name (str) – The name of the optimizer class from torch.optim (e.g., “AdamW”, “SGD”). Defaults to “AdamW”.

  • scheduler_name (str | None) – Optional name of the learning rate scheduler class from torch.optim.lr_scheduler (e.g., “LambdaLR”, “ReduceLROnPlateau”). Defaults to None (no scheduler).

  • warmup_steps – Number of initial steps during which the learning rate is linearly warmed up from 0 to its initial value, typically implemented with custom logic or a scheduler such as transformers.get_linear_schedule_with_warmup. Note: this field is declarative; the actual warmup logic must be implemented in the training loop or via a specific scheduler (see the sketch after this list). Defaults to 0.

  • optimizer_kwargs (Dict[str, Any]) – Keyword arguments for the optimizer’s constructor (e.g., {“lr”: 1e-3, “weight_decay”: 0.01}). Defaults to an empty dict.

  • lr_scheduler_kwargs (Dict[str, Any]) – Keyword arguments for the LR scheduler’s constructor. Defaults to an empty dict.
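
Because warmup_steps is declarative, one way to get linear warmup is to route it through a LambdaLR scheduler. A minimal sketch, assuming lr_scheduler_kwargs is forwarded verbatim to the scheduler constructor; the warmup length of 100 steps is illustrative:

    from banhxeo.train.config import OptimizerConfig

    warmup_steps = 100  # illustrative warmup length

    optim_cfg = OptimizerConfig(
        name="AdamW",
        scheduler_name="LambdaLR",
        optimizer_kwargs={"lr": 5e-4, "weight_decay": 0.01},
        # LambdaLR scales the base lr by this factor at every scheduler.step()
        lr_scheduler_kwargs={
            "lr_lambda": lambda step: min(1.0, (step + 1) / warmup_steps),
        },
    )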

name: str
scheduler_name: str | None
optimizer_kwargs: Dict[str, Any]
lr_scheduler_kwargs: Dict[str, Any]
class Config[source]

Bases: object

arbitrary_types_allowed = True
get_optimizer(model_parameters)[source]

Instantiates and returns the configured PyTorch optimizer.

Parameters:

model_parameters – An iterable of model parameters to optimize, typically model.parameters().

Returns:

An instance of the specified PyTorch optimizer.

Raises:

ValueError – If the specified name is not a valid optimizer in torch.optim.

get_scheduler(optimizer)[source]

Instantiates and returns the configured PyTorch learning rate scheduler.

Parameters:

optimizer – The PyTorch optimizer instance for which to create the scheduler.

Returns:

An instance of the specified LR scheduler, or None if scheduler_name is not set.

Raises:

ValueError – If scheduler_name is set but not found in torch.optim.lr_scheduler.
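
Putting the two factory methods together; a minimal sketch with a toy model (the linear layer and hyperparameters are illustrative):

    import torch

    from banhxeo.train.config import OptimizerConfig

    model = torch.nn.Linear(16, 2)  # toy model

    optim_cfg = OptimizerConfig(
        name="SGD",
        scheduler_name="StepLR",
        optimizer_kwargs={"lr": 0.1, "momentum": 0.9},
        lr_scheduler_kwargs={"step_size": 10, "gamma": 0.5},
    )

    optimizer = optim_cfg.get_optimizer(model.parameters())
    scheduler = optim_cfg.get_scheduler(optimizer)  # None if scheduler_name is unset

    # Inside a training loop, each optimizer.step() is followed by
    # scheduler.step() (per step or per epoch, as appropriate).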

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class banhxeo.train.config.TrainerConfig(*, output_dir: str = './training_output', num_train_epochs: int = 3, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, gradient_accumulation_steps: int = 1, training_shuffle: bool = True, logging_steps: int = 100, save_steps: int | None = 500, save_total_limit: int | None = None, evaluate_during_training: bool = False, evaluation_steps: int | None = None, seed: int = 42, optim: OptimizerConfig, loss: LossConfig)[source]

Bases: BaseModel

Configuration for the Trainer.

Specifies parameters for the training process, including directories, epochs, batch sizes, logging, saving, evaluation, and optimization settings.

Variables:
  • output_dir (str) – Directory to save checkpoints, logs, and other training artifacts. Defaults to “./training_output”.

  • num_train_epochs (int) – Total number of training epochs to perform. Defaults to 3.

  • per_device_train_batch_size (int) – Batch size per GPU/CPU for training. Defaults to 8.

  • per_device_eval_batch_size (int) – Batch size per GPU/CPU for evaluation. Defaults to 8.

  • gradient_accumulation_steps (int) – Number of update steps to accumulate gradients for before performing a backward/update pass. The effective batch size is per_device_train_batch_size * num_devices * gradient_accumulation_steps. Defaults to 1.

  • training_shuffle (bool) – Whether to shuffle the training data at each epoch. Defaults to True.

  • logging_steps (int) – Log training loss and metrics every N global steps. Defaults to 100.

  • save_steps (int | None) – Save a checkpoint every N global steps. If None, checkpoints are only saved at the end of epochs (if CheckpointCallback is used). Defaults to 500.

  • save_total_limit (int | None) – If set, limits the total number of saved checkpoints; older checkpoints are deleted. If None, all checkpoints are kept. Enforcement of this limit must be implemented in CheckpointCallback or the Trainer. Defaults to None.

  • evaluate_during_training (bool) – Whether to run evaluation on the eval dataset during training. Defaults to False.

  • evaluation_steps (int | None) – If evaluate_during_training is True, evaluate every N global steps. If None, evaluation is only triggered by callbacks (e.g., at the end of each epoch). Defaults to None.

  • seed (int) – Random seed for initialization and data shuffling. Defaults to 42.

  • optim (banhxeo.train.config.OptimizerConfig) – An OptimizerConfig instance defining the optimizer and LR scheduler.

  • loss (banhxeo.train.config.LossConfig) – A LossConfig instance defining the loss function.
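
A minimal sketch of a full trainer configuration; all values are illustrative, and the nested OptimizerConfig and LossConfig follow the field types listed above:

    from banhxeo.train.config import LossConfig, OptimizerConfig, TrainerConfig

    trainer_cfg = TrainerConfig(
        output_dir="./training_output",
        num_train_epochs=3,
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,  # effective batch size = 8 * num_devices * 4
        logging_steps=100,
        save_steps=500,
        evaluate_during_training=True,
        evaluation_steps=250,
        seed=42,
        optim=OptimizerConfig(name="AdamW", optimizer_kwargs={"lr": 3e-4}),
        loss=LossConfig(name="CrossEntropyLoss"),
    )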

output_dir: str
num_train_epochs: int
per_device_train_batch_size: int
per_device_eval_batch_size: int
gradient_accumulation_steps: int
training_shuffle: bool
logging_steps: int
save_steps: int | None
save_total_limit: int | None
evaluate_during_training: bool
evaluation_steps: int | None
seed: int
optim: OptimizerConfig
loss: LossConfig
classmethod check_grad_acc_steps(v: int) → int[source]

Validates gradient_accumulation_steps.
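
The validation rule itself is not documented here; assuming it rejects values below 1, an invalid setting fails at construction time:

    import pydantic

    from banhxeo.train.config import LossConfig, OptimizerConfig, TrainerConfig

    try:
        TrainerConfig(
            gradient_accumulation_steps=0,  # assumed invalid: must be >= 1
            optim=OptimizerConfig(),
            loss=LossConfig(),
        )
    except pydantic.ValidationError as err:
        print(err)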

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.