banhxeo.data.dataset.imdb module

class banhxeo.data.dataset.imdb.IMDBDataset(root_dir: str | None = None, split_name: str = 'train', seed: int = 1234)[source]

Bases: BaseTextDataset

__init__(root_dir: str | None = None, split_name: str = 'train', seed: int = 1234)[source]

Initializes the dataset through the BaseTextDataset initializer.

Parameters:
  • root_dir – The root directory where datasets are stored. If None, defaults to the current working directory.

  • split_name – The name of the dataset split (e.g., “train”, “test”).

  • config – A DatasetConfig object containing metadata for the dataset. This is a BaseTextDataset parameter; it does not appear in the IMDBDataset signature above.

  • seed – A random seed for reproducibility.

  • download – If True, attempts to download and extract the dataset when it is not already present and config.url is provided. This is a BaseTextDataset parameter; it does not appear in the IMDBDataset signature above.
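
As a minimal sketch of how the constructor might be called, the helper below builds one dataset object per split using only the three keyword arguments shown in the signature (root_dir, split_name, seed). The name load_imdb_splits and its dataset_cls argument are hypothetical and are not part of banhxeo. The sketch also assumes that "train" and "test" are both valid split names, which the split_name description suggests but does not guarantee.

```python
from typing import Any, Callable, Dict, Optional


def load_imdb_splits(
    root_dir: Optional[str] = None,
    seed: int = 1234,
    dataset_cls: Optional[Callable[..., Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build one dataset object per split name."""
    if dataset_cls is None:
        # Imported lazily so the sketch can be read without banhxeo installed.
        from banhxeo.data.dataset.imdb import IMDBDataset

        dataset_cls = IMDBDataset

    # Pass only the keyword arguments listed in the documented signature.
    # "train" and "test" are assumed split names taken from the parameter text.
    return {
        split: dataset_cls(root_dir=root_dir, split_name=split, seed=seed)
        for split in ("train", "test")
    }
```

Calling load_imdb_splits(root_dir="./data") would then return a dictionary keyed by split name. Whether constructing IMDBDataset triggers a download is not stated in the signature, so check the library's behavior before running this on a machine without the data.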