banhxeo.data.dataset.amazon_review module
- class banhxeo.data.dataset.amazon_review.AmazonReviewFullDataset(root_dir: str | None = None, split_name: str = 'train', seed: int = 1234)
Bases: BaseTextDataset
- __init__(root_dir: str | None = None, split_name: str = 'train', seed: int = 1234)
Initializes the dataset. The description below appears to be inherited from BaseTextDataset.
- Parameters:
root_dir – The root directory where datasets are stored. If None, defaults to the current working directory.
split_name – The name of the dataset split (e.g., “train”, “test”).
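config – A DatasetConfig object containing metadata for the dataset. This entry does not appear in the constructor signature above and appears to come from the BaseTextDataset docstring.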
seed – A random seed for reproducibility.
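download – If True, attempts to download and extract the dataset if it’s not already present and config.url is provided. This entry does not appear in the constructor signature above and appears to come from the BaseTextDataset docstring.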
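The following is a minimal sketch of calling the constructor with only the three arguments shown in its signature (root_dir, split_name, seed). The helper names resolve_root_dir and load_amazon_review_full are hypothetical and not part of banhxeo. resolve_root_dir only mirrors the documented fallback to the current working directory; the library may resolve paths differently internally.

```python
from pathlib import Path
from typing import Optional


def resolve_root_dir(root_dir: Optional[str] = None) -> Path:
    """Hypothetical helper: mirrors the documented default for root_dir."""
    # Documented behavior: None falls back to the current working directory.
    return Path(root_dir) if root_dir is not None else Path.cwd()


def load_amazon_review_full(
    root_dir: Optional[str] = None,
    split_name: str = "train",
    seed: int = 1234,
):
    """Hypothetical wrapper: passes only the arguments in the documented signature."""
    # Imported lazily so this sketch can be imported without banhxeo installed.
    from banhxeo.data.dataset.amazon_review import AmazonReviewFullDataset

    # config and download are not in the documented signature, so they are not passed.
    return AmazonReviewFullDataset(
        root_dir=root_dir,
        split_name=split_name,
        seed=seed,
    )
```

With banhxeo installed, load_amazon_review_full(split_name="test") would request the test split using the default seed. Whether that call triggers a download depends on the library and is not stated in the signature.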