src.corpora.tokenization_utils.batch_tokenize
batch_tokenize(ds: Dataset, tokenizer, batch_size: int, text_column='text') → Iterator[BatchEncoding]
Yields batches of tokenized sentences from the given dataset.
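For illustration, a minimal usage sketch follows. It assumes batch_tokenize is importable from src.corpora.tokenization_utils, that the dataset is a Hugging Face datasets.Dataset exposing a 'text' column (the default text_column), and that an ordinary transformers tokenizer is passed in; the specific dataset and tokenizer are example choices, not part of this reference.

from datasets import load_dataset
from transformers import AutoTokenizer

from src.corpora.tokenization_utils import batch_tokenize  # assumed import path

# Example inputs: a small public text dataset and the GPT-2 tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

# batch_tokenize yields one BatchEncoding per batch_size examples read from text_column.
for batch in batch_tokenize(ds, tokenizer, batch_size=1000, text_column="text"):
    # A BatchEncoding behaves like a dict; batch["input_ids"] is a list of token-id lists.
    print(len(batch["input_ids"]))
    break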