src.corpora.tokenization_utils
Functions
Yields batches of tokenized sentences from the given dataset.
Yields batches of the given size from the given iterable.
Groups texts in a batch together.
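
As an illustration of the "batches of the given size from the given iterable" behaviour described above, here is a minimal sketch. The name `batch_iter` and its signature are hypothetical; the module's actual function name and parameters are not shown on this page. The sketch also assumes that a final, shorter batch is yielded rather than dropped, which the summary does not specify.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_iter(iterable: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield lists of up to `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice pulls at most batch_size items without reading the
        # whole input, so this also works on lazy or very large sources.
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # source exhausted
        yield batch


batches = list(batch_iter(range(5), 2))
```

Because the input is consumed lazily, the same pattern could sit in front of a tokenizer to produce batches of tokenized sentences without loading the whole dataset into memory.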
Classes
Very similar to ShufflerIterDataPipe, but it takes a seed and ignores set_shuffle_settings.
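
To illustrate what a seeded, buffer-based shuffle could look like, here is a pure-Python sketch. It is not the class documented here and does not use the PyTorch DataPipe API. The name `SeededShuffleBuffer` and the parameters `buffer_size` and `seed` are assumptions made for illustration only.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


class SeededShuffleBuffer:
    """Hypothetical stand-in: shuffles an iterable through a fixed-size buffer."""

    def __init__(self, source: Iterable[T], buffer_size: int = 1000, seed: int = 0):
        if buffer_size < 1:
            raise ValueError("buffer_size must be at least 1")
        self.source = source
        self.buffer_size = buffer_size
        self.seed = seed

    def __iter__(self) -> Iterator[T]:
        # A fresh RNG per iteration makes every pass reproduce the same order.
        rng = random.Random(self.seed)
        buffer: List[T] = []
        for item in self.source:
            if len(buffer) < self.buffer_size:
                buffer.append(item)
                continue
            # Buffer is full: emit a random element and put the new item in its slot.
            idx = rng.randrange(self.buffer_size)
            yield buffer[idx]
            buffer[idx] = item
        # Source exhausted: emit whatever is left in random order.
        rng.shuffle(buffer)
        yield from buffer


first_pass = list(SeededShuffleBuffer(range(20), buffer_size=5, seed=1))
second_pass = list(SeededShuffleBuffer(range(20), buffer_size=5, seed=1))
```

With a fixed seed, both passes yield the same order, which is the reproducibility a seed argument is usually meant to provide. A larger `buffer_size` mixes items more thoroughly at the cost of memory.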