banhxeo.train.config module

class banhxeo.train.config.LossConfig(*, name: str = 'CrossEntropyLoss', kwargs: Dict[str, Any] = <factory>)[source]

Bases: BaseModel

Configuration for the loss function used during training.

Variables:
  • name (str) – The name of the loss function class from torch.nn.modules.loss (e.g., “CrossEntropyLoss”, “MSELoss”). Defaults to “CrossEntropyLoss”.

  • kwargs (Dict[str, Any]) – A dictionary of keyword arguments to be passed to the loss function’s constructor (e.g., {“weight”: torch.tensor([0.1, 0.9])}). Defaults to an empty dictionary.

name: str
kwargs: Dict[str, Any]
get_loss_function()[source]

Instantiates and returns the configured PyTorch loss function.

Returns:

An instance of the specified PyTorch loss function.

Raises:
  • ValueError – If the specified name is not a valid loss function in torch.nn.modules.loss.

  • AttributeError – If the specified name cannot be found as an attribute of torch.nn.modules.loss.
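
A minimal usage sketch, following the weighted-loss example from the kwargs description above (the logits and targets are illustrative):

    import torch

    from banhxeo.train.config import LossConfig

    # Class-weighted cross-entropy, as in the kwargs example above.
    loss_cfg = LossConfig(
        name="CrossEntropyLoss",
        kwargs={"weight": torch.tensor([0.1, 0.9])},
    )
    criterion = loss_cfg.get_loss_function()  # instance of torch.nn.CrossEntropyLoss

    logits = torch.randn(4, 2)            # batch of 4 samples, 2 classes
    targets = torch.tensor([0, 1, 1, 0])  # ground-truth class indices
    loss = criterion(logits, targets)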

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class banhxeo.train.config.OptimizerConfig(*, name: str = 'AdamW', scheduler_name: str | None = None, optimizer_kwargs: Dict[str, Any] = <factory>, lr_scheduler_kwargs: Dict[str, Any] = <factory>)[source]

Bases: BaseModel

Configuration for the optimizer and learning rate scheduler.

Variables:
  • name (str) – The name of the optimizer class from torch.optim (e.g., “AdamW”, “SGD”). Defaults to “AdamW”.

  • scheduler_name (str | None) – Optional name of the learning rate scheduler class from torch.optim.lr_scheduler (e.g., “LambdaLR”, “ReduceLROnPlateau”). Defaults to None (no scheduler).

  • warmup_steps – Number of initial steps during which the learning rate is linearly warmed up from 0 to its initial value, typically implemented with custom logic or a scheduler such as transformers.get_linear_schedule_with_warmup. Note: this field is declarative; the actual warmup logic must be implemented in the training loop or via a specific scheduler (see the sketch after this list). Defaults to 0.

  • optimizer_kwargs (Dict[str, Any]) – Keyword arguments for the optimizer’s constructor (e.g., {“lr”: 1e-3, “weight_decay”: 0.01}). Defaults to an empty dict.

  • lr_scheduler_kwargs (Dict[str, Any]) – Keyword arguments for the LR scheduler’s constructor. Defaults to an empty dict.
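
Because warmup_steps is declarative, one way to get linear warmup is to route it through a LambdaLR scheduler. A minimal sketch, assuming lr_scheduler_kwargs is forwarded verbatim to the scheduler constructor; the warmup length of 100 steps is illustrative:

    from banhxeo.train.config import OptimizerConfig

    warmup_steps = 100  # illustrative warmup length

    optim_cfg = OptimizerConfig(
        name="AdamW",
        scheduler_name="LambdaLR",
        optimizer_kwargs={"lr": 5e-4, "weight_decay": 0.01},
        # LambdaLR scales the base lr by this factor at every scheduler.step()
        lr_scheduler_kwargs={
            "lr_lambda": lambda step: min(1.0, (step + 1) / warmup_steps),
        },
    )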

name: str
scheduler_name: str | None
optimizer_kwargs: Dict[str, Any]
lr_scheduler_kwargs: Dict[str, Any]
class Config[source]

Bases: object

arbitrary_types_allowed = True
get_optimizer(model_parameters)[source]

Instantiates and returns the configured PyTorch optimizer.

Parameters:

model_parameters – An iterable of model parameters to optimize, typically model.parameters().

Returns:

An instance of the specified PyTorch optimizer.

Raises:

ValueError – If the specified name is not a valid optimizer in torch.optim.

get_scheduler(optimizer)[source]

Instantiates and returns the configured PyTorch learning rate scheduler.

Parameters:

optimizer – The PyTorch optimizer instance for which to create the scheduler.

Returns:

An instance of the specified LR scheduler, or None if scheduler_name is not set.

Raises:

ValueError – If scheduler_name is set but not found in torch.optim.lr_scheduler.
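
Putting the two factory methods together; a minimal sketch with a toy model (the linear layer and hyperparameters are illustrative):

    import torch

    from banhxeo.train.config import OptimizerConfig

    model = torch.nn.Linear(16, 2)  # toy model

    optim_cfg = OptimizerConfig(
        name="SGD",
        scheduler_name="StepLR",
        optimizer_kwargs={"lr": 0.1, "momentum": 0.9},
        lr_scheduler_kwargs={"step_size": 10, "gamma": 0.5},
    )

    optimizer = optim_cfg.get_optimizer(model.parameters())
    scheduler = optim_cfg.get_scheduler(optimizer)  # None if scheduler_name is unset

    # Inside a training loop, each optimizer.step() is followed by
    # scheduler.step() (per step or per epoch, as appropriate).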

model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.

class banhxeo.train.config.TrainerConfig(*, output_dir: str = './training_output', num_train_epochs: int = 3, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, gradient_accumulation_steps: int = 1, training_shuffle: bool = True, logging_steps: int = 100, save_steps: int | None = 500, save_total_limit: int | None = None, evaluate_during_training: bool = False, evaluation_steps: int | None = None, seed: int = 42, optim: OptimizerConfig, loss: LossConfig)[source]

Bases: BaseModel

Configuration for the Trainer.

Specifies parameters for the training process, including directories, epochs, batch sizes, logging, saving, evaluation, and optimization settings.

Variables:
  • output_dir (str) – Directory to save checkpoints, logs, and other training artifacts. Defaults to “./training_output”.

  • num_train_epochs (int) – Total number of training epochs to perform. Defaults to 3.

  • per_device_train_batch_size (int) – Batch size per GPU/CPU for training. Defaults to 8.

  • per_device_eval_batch_size (int) – Batch size per GPU/CPU for evaluation. Defaults to 8.

  • gradient_accumulation_steps (int) – Number of update steps to accumulate gradients for before performing a backward/update pass. The effective batch size is per_device_train_batch_size * num_devices * gradient_accumulation_steps. Defaults to 1.

  • training_shuffle (bool) – Whether to shuffle the training data at each epoch. Defaults to True.

  • logging_steps (int) – Log training loss and metrics every N global steps. Defaults to 100.

  • save_steps (int | None) – Save a checkpoint every N global steps. If None, checkpoints are only saved at the end of epochs (if CheckpointCallback is used). Defaults to 500.

  • save_total_limit (int | None) – If set, limits the total number of saved checkpoints; older checkpoints are deleted. If None, all checkpoints are kept. Enforcement of this limit must be implemented in CheckpointCallback or the Trainer. Defaults to None.

  • evaluate_during_training (bool) – Whether to run evaluation on the eval dataset during training. Defaults to False.

  • evaluation_steps (int | None) – If evaluate_during_training is True, evaluate every N global steps. If None, evaluation is only triggered by callbacks (e.g., at the end of each epoch). Defaults to None.

  • seed (int) – Random seed for initialization and data shuffling. Defaults to 42.

  • optim (banhxeo.train.config.OptimizerConfig) – An OptimizerConfig instance defining the optimizer and LR scheduler.

  • loss (banhxeo.train.config.LossConfig) – A LossConfig instance defining the loss function.
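
A minimal sketch of a full trainer configuration; all values are illustrative, and the nested OptimizerConfig and LossConfig follow the field types listed above:

    from banhxeo.train.config import LossConfig, OptimizerConfig, TrainerConfig

    trainer_cfg = TrainerConfig(
        output_dir="./training_output",
        num_train_epochs=3,
        per_device_train_batch_size=8,
        gradient_accumulation_steps=4,  # effective batch size = 8 * num_devices * 4
        logging_steps=100,
        save_steps=500,
        evaluate_during_training=True,
        evaluation_steps=250,
        seed=42,
        optim=OptimizerConfig(name="AdamW", optimizer_kwargs={"lr": 3e-4}),
        loss=LossConfig(name="CrossEntropyLoss"),
    )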

output_dir: str
num_train_epochs: int
per_device_train_batch_size: int
per_device_eval_batch_size: int
gradient_accumulation_steps: int
training_shuffle: bool
logging_steps: int
save_steps: int | None
save_total_limit: int | None
evaluate_during_training: bool
evaluation_steps: int | None
seed: int
optim: OptimizerConfig
loss: LossConfig
classmethod check_grad_acc_steps(v: int) → int[source]

Validates gradient_accumulation_steps.
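
The validation rule itself is not documented here; assuming it rejects values below 1, an invalid setting fails at construction time:

    import pydantic

    from banhxeo.train.config import LossConfig, OptimizerConfig, TrainerConfig

    try:
        TrainerConfig(
            gradient_accumulation_steps=0,  # assumed invalid: must be >= 1
            optim=OptimizerConfig(),
            loss=LossConfig(),
        )
    except pydantic.ValidationError as err:
        print(err)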

model_config: ClassVar[ConfigDict] = {}

Configuration for the model, should be a dictionary conforming to pydantic.config.ConfigDict.