src.corpora.tokenization_utils.batch_tokenize

batch_tokenize(ds: Dataset, tokenizer, batch_size: int, text_column='text') Iterator[BatchEncoding][source]

Yields batches of tokenized sentences from the given dataset.