banhxeo.data.config module
- class banhxeo.data.config.DatasetSplit(train: int, test: int, val: int | None = None)
Bases: object
- train: int
- test: int
- val: int | None = None
- __init__(train: int, test: int, val: int | None = None) -> None
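The signature above suggests a plain data container with an optional validation split. The following is a minimal stand-in sketch, assuming `DatasetSplit` behaves like a standard dataclass; the real class lives in `banhxeo.data.config`. This page does not say whether the integers are example counts or percentages, so the values below are purely illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetSplit:
    """Stand-in mirroring the documented signature; not the library's own code."""

    train: int
    test: int
    val: Optional[int] = None


# Without a validation split, `val` falls back to its default of None.
two_way = DatasetSplit(train=800, test=200)

# With an explicit validation split.
three_way = DatasetSplit(train=700, test=200, val=100)

print(two_way)
print(three_way)
```

With the real library installed, the same calls should work against the imported class, provided its constructor matches the signature shown here.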
- class banhxeo.data.config.DownloadDatasetFile(name: str, ext: str, source: str | None = None)
Bases: object
- name: str
- ext: str
- source: str | None = None
- __init__(name: str, ext: str, source: str | None = None) -> None
- class banhxeo.data.config.DatasetConfig(*, name: str, url: str | None = None, file_info: DownloadDatasetFile | None = None, md5: str | None = None, hf_path: str | None = None, hf_name: str | None = None, text_column: str = 'text', label_column: str | None = 'label', label_map: Dict[str, int] = {'neg': 0, 'pos': 1}, split: DatasetSplit | None = None)
Bases: BaseModel
- name: str
- url: str | None
- file_info: DownloadDatasetFile | None
- md5: str | None
- hf_path: str | None
- hf_name: str | None
- text_column: str
- label_column: str | None
- label_map: Dict[str, int]
- split: DatasetSplit | None
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
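To show how the three classes fit together, here is a stand-in sketch built from standard-library dataclasses rather than the real Pydantic model. `DatasetConfigSketch` is a hypothetical name that carries only a subset of the documented fields, and the dataset name and file values are made up for illustration. Note that a dataclass needs a factory for the mutable `label_map` default, whereas Pydantic, as far as its documentation describes, copies such defaults for each instance.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DownloadDatasetFile:
    """Stand-in mirroring the documented signature."""

    name: str
    ext: str
    source: Optional[str] = None


@dataclass
class DatasetSplit:
    """Stand-in mirroring the documented signature."""

    train: int
    test: int
    val: Optional[int] = None


@dataclass
class DatasetConfigSketch:
    """Hypothetical stand-in carrying a subset of the documented fields."""

    name: str
    file_info: Optional[DownloadDatasetFile] = None
    text_column: str = "text"
    label_column: Optional[str] = "label"
    # Each instance gets its own copy of the default label map.
    label_map: Dict[str, int] = field(default_factory=lambda: {"neg": 0, "pos": 1})
    split: Optional[DatasetSplit] = None


config = DatasetConfigSketch(
    name="reviews",  # illustrative dataset name
    file_info=DownloadDatasetFile(name="reviews", ext="tar.gz"),
    split=DatasetSplit(train=800, test=200),
)

# Changing one instance's label map leaves the other's default untouched.
other = DatasetConfigSketch(name="other")
other.label_map["neutral"] = 2

print(config.label_map)  # {'neg': 0, 'pos': 1}
print(other.label_map)   # {'neg': 0, 'pos': 1, 'neutral': 2}
```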
- class banhxeo.data.config.TorchDatasetConfig(*, tokenizer: Tokenizer, tokenizer_config: TokenizerConfig, vocab: Vocabulary, is_classification: bool = False, transforms: List[Transforms] | ComposeTransforms = [], text_column_name: str = 'text', label_column_name: str | None = 'label', label_map: Dict[str, int] | None)
Bases: BaseModel
- tokenizer: Tokenizer
- tokenizer_config: TokenizerConfig
- vocab: Vocabulary
- is_classification: bool
- transforms: List[Transforms] | ComposeTransforms
- text_column_name: str
- label_column_name: str | None
- label_map: Dict[str, int] | None
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
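In the signature above, `label_map` has no default, so it has to be passed explicitly even when it is None. The sketch below illustrates that with a standard-library stand-in. `TorchDatasetConfigSketch` and `encode_row` are hypothetical names, the tokenizer, vocabulary, and transform fields are omitted, and the way the real dataset consumes these fields is an assumption, not something this page confirms.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class TorchDatasetConfigSketch:
    """Hypothetical stand-in; tokenizer, vocab and transforms are left out."""

    label_map: Optional[Dict[str, int]]  # no default, as in the documented signature
    is_classification: bool = False
    text_column_name: str = "text"
    label_column_name: Optional[str] = "label"


def encode_row(config: TorchDatasetConfigSketch, row: dict) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: return the text and, for classification, the label id."""
    text = row[config.text_column_name]
    if not config.is_classification:
        return text, None
    if config.label_column_name is None or config.label_map is None:
        raise ValueError("classification needs a label column and a label map")
    return text, config.label_map[row[config.label_column_name]]


row = {"text": "great film", "label": "pos"}

plain = TorchDatasetConfigSketch(label_map=None)
clf = TorchDatasetConfigSketch(label_map={"neg": 0, "pos": 1}, is_classification=True)

print(encode_row(plain, row))  # ('great film', None)
print(encode_row(clf, row))    # ('great film', 1)
```

Omitting `label_map` entirely in the stand-in raises a `TypeError`, which mirrors how a required field with no default behaves.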