banhxeo.train.config module
- class banhxeo.train.config.LossConfig(*, name: str = 'CrossEntropyLoss', kwargs: Dict[str, Any] = <factory>)[source]
Bases:
BaseModel
Configuration for the loss function used during training.
- Variables:
name (str) – The name of the loss function class from torch.nn.modules.loss (e.g., “CrossEntropyLoss”, “MSELoss”). Defaults to “CrossEntropyLoss”.
kwargs (Dict[str, Any]) – A dictionary of keyword arguments to be passed to the loss function’s constructor (e.g., {“weight”: torch.tensor([0.1, 0.9])}). Defaults to an empty dictionary.
- name: str
- kwargs: Dict[str, Any]
- get_loss_function()[source]
Instantiates and returns the configured PyTorch loss function.
- Returns:
An instance of the specified PyTorch loss function.
- Raises:
ValueError – If the specified name is not a valid loss function in torch.nn.modules.loss.
AttributeError – If no attribute with the specified name exists in torch.nn.modules.loss.
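Example (a minimal sketch; the class-weight values below are illustrative, not defaults):
>>> import torch
>>> from banhxeo.train.config import LossConfig
>>> loss_cfg = LossConfig(
...     name="CrossEntropyLoss",
...     kwargs={"weight": torch.tensor([0.1, 0.9])},  # illustrative class weights
... )
>>> criterion = loss_cfg.get_loss_function()  # an instance of torch.nn.CrossEntropyLoss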
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class banhxeo.train.config.OptimizerConfig(*, name: str = 'AdamW', scheduler_name: str | None = None, optimizer_kwargs: Dict[str, Any] = <factory>, lr_scheduler_kwargs: Dict[str, Any] = <factory>)[source]
Bases:
BaseModel
Configuration for the optimizer and learning rate scheduler.
- Variables:
name (str) – The name of the optimizer class from torch.optim (e.g., “AdamW”, “SGD”). Defaults to “AdamW”.
scheduler_name (str | None) – Optional name of the learning rate scheduler class from torch.optim.lr_scheduler (e.g., “LambdaLR”, “ReduceLROnPlateau”). Defaults to None (no scheduler).
warmup_steps (int) – Number of initial steps over which the learning rate is linearly warmed up from 0 to its initial value. Note: this field is declarative; the actual warmup logic must be implemented in the training loop or by a specific scheduler (e.g., transformers.get_linear_schedule_with_warmup). Defaults to 0.
optimizer_kwargs (Dict[str, Any]) – Keyword arguments for the optimizer’s constructor (e.g., {“lr”: 1e-3, “weight_decay”: 0.01}). Defaults to an empty dict.
lr_scheduler_kwargs (Dict[str, Any]) – Keyword arguments for the LR scheduler’s constructor. Defaults to an empty dict.
- name: str
- scheduler_name: str | None
- optimizer_kwargs: Dict[str, Any]
- lr_scheduler_kwargs: Dict[str, Any]
- get_optimizer(model_parameters)[source]
Instantiates and returns the configured PyTorch optimizer.
- Parameters:
model_parameters – An iterable of model parameters to optimize, typically model.parameters().
- Returns:
An instance of the specified PyTorch optimizer.
- Raises:
ValueError – If the specified name is not a valid optimizer in torch.optim.
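Example (a minimal sketch; the toy torch.nn.Linear model and hyperparameter values are illustrative):
>>> import torch
>>> from banhxeo.train.config import OptimizerConfig
>>> model = torch.nn.Linear(10, 2)  # toy model for illustration
>>> optim_cfg = OptimizerConfig(
...     name="AdamW",
...     optimizer_kwargs={"lr": 1e-3, "weight_decay": 0.01},
... )
>>> optimizer = optim_cfg.get_optimizer(model.parameters())  # an instance of torch.optim.AdamW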
- get_scheduler(optimizer)[source]
Instantiates and returns the configured PyTorch learning rate scheduler.
- Parameters:
optimizer – The PyTorch optimizer instance for which to create the scheduler.
- Returns:
An instance of the specified LR scheduler, or None if scheduler_name is not set.
- Raises:
ValueError – If scheduler_name is set but not found in torch.optim.lr_scheduler.
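Example (a minimal sketch extending the get_optimizer example; the StepLR choice and its kwargs are illustrative):
>>> import torch
>>> from banhxeo.train.config import OptimizerConfig
>>> model = torch.nn.Linear(10, 2)  # toy model for illustration
>>> optim_cfg = OptimizerConfig(
...     name="AdamW",
...     scheduler_name="StepLR",
...     optimizer_kwargs={"lr": 1e-3},
...     lr_scheduler_kwargs={"step_size": 10, "gamma": 0.1},
... )
>>> optimizer = optim_cfg.get_optimizer(model.parameters())
>>> scheduler = optim_cfg.get_scheduler(optimizer)  # torch.optim.lr_scheduler.StepLR; None if scheduler_name is unset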
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- class banhxeo.train.config.TrainerConfig(*, output_dir: str = './training_output', num_train_epochs: int = 3, per_device_train_batch_size: int = 8, per_device_eval_batch_size: int = 8, gradient_accumulation_steps: int = 1, training_shuffle: bool = True, logging_steps: int = 100, save_steps: int | None = 500, save_total_limit: int | None = None, evaluate_during_training: bool = False, evaluation_steps: int | None = None, seed: int = 42, optim: OptimizerConfig, loss: LossConfig)[source]
Bases:
BaseModel
Configuration for the Trainer.
Specifies parameters for the training process, including directories, epochs, batch sizes, logging, saving, evaluation, and optimization settings.
- Variables:
output_dir (str) – Directory to save checkpoints, logs, and other training artifacts. Defaults to “./training_output”.
num_train_epochs (int) – Total number of training epochs to perform. Defaults to 3.
per_device_train_batch_size (int) – Batch size per GPU/CPU for training. Defaults to 8.
per_device_eval_batch_size (int) – Batch size per GPU/CPU for evaluation. Defaults to 8.
gradient_accumulation_steps (int) – Number of update steps over which to accumulate gradients before performing an optimizer update. The effective batch size is per_device_train_batch_size * num_devices * gradient_accumulation_steps. Defaults to 1.
training_shuffle (bool) – Whether to shuffle the training data at each epoch. Defaults to True.
logging_steps (int) – Log training loss and metrics every N global steps. Defaults to 100.
save_steps (int | None) – Save a checkpoint every N global steps. If None, checkpoints are only saved at the end of epochs (if CheckpointCallback is used). Defaults to 500.
save_total_limit (int | None) – If set, limits the total number of saved checkpoints; older checkpoints are deleted. If None, all checkpoints are kept. Note: enforcement must be implemented in CheckpointCallback or the Trainer. Defaults to None.
evaluate_during_training (bool) – Whether to run evaluation on the eval dataset during training. Defaults to False.
evaluation_steps (int | None) – If evaluate_during_training is True, evaluate every N global steps. If None, evaluation might occur at epoch ends if controlled by callbacks. Defaults to None.
seed (int) – Random seed for initialization and data shuffling. Defaults to 42.
optim (banhxeo.train.config.OptimizerConfig) – An OptimizerConfig instance defining the optimizer and LR scheduler.
loss (banhxeo.train.config.LossConfig) – A LossConfig instance defining the loss function.
- output_dir: str
- num_train_epochs: int
- per_device_train_batch_size: int
- per_device_eval_batch_size: int
- gradient_accumulation_steps: int
- training_shuffle: bool
- logging_steps: int
- save_steps: int | None
- save_total_limit: int | None
- evaluate_during_training: bool
- evaluation_steps: int | None
- seed: int
- optim: OptimizerConfig
- loss: LossConfig
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
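Example (a minimal sketch composing the three configs; the values are illustrative, only optim and loss are required, and unspecified fields fall back to the defaults documented above):
>>> from banhxeo.train.config import TrainerConfig, OptimizerConfig, LossConfig
>>> trainer_cfg = TrainerConfig(
...     output_dir="./training_output",
...     num_train_epochs=3,
...     per_device_train_batch_size=8,
...     optim=OptimizerConfig(name="AdamW", optimizer_kwargs={"lr": 5e-4}),
...     loss=LossConfig(name="CrossEntropyLoss"),
... )
>>> trainer_cfg.seed  # unspecified fields keep their defaults
42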