src.corpora.indexer.read_cache_file

read_cache_file(file, flatten: bool = False) → Iterator[BatchEncoding]

Reads a cache file produced by cache_and_group and yields tokenized sequences. If flatten is False, each document is yielded as it was presented to the caching process. If flatten is True, each yielded item is a concatenation of documents: the number of documents joined together equals the number presented as one batch to the caching process.
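The difference between the two modes can be illustrated with a minimal sketch. It does not call read_cache_file or read any real cache format: iter_cached is a hypothetical stand-in, and each cached batch is assumed to be a plain dict with an "input_ids" list holding one token list per document.

```python
from typing import Dict, Iterator, List

# Assumption: one cached batch maps a field name to a list of per-document token lists.
Batch = Dict[str, List[List[int]]]


def iter_cached(batches: List[Batch], flatten: bool = False) -> Iterator[Dict[str, List[int]]]:
    """Hypothetical stand-in that mimics the described flatten behaviour."""
    for batch in batches:
        if flatten:
            # One item per batch: all documents in the batch concatenated in order.
            yield {key: [tok for doc in docs for tok in doc] for key, docs in batch.items()}
        else:
            # One item per document, in the order the documents were cached.
            for i in range(len(batch["input_ids"])):
                yield {key: docs[i] for key, docs in batch.items()}


# Two batches: the first holds two documents, the second holds one.
batches = [
    {"input_ids": [[1, 2], [3]]},
    {"input_ids": [[4, 5, 6]]},
]

per_document = list(iter_cached(batches, flatten=False))
per_batch = list(iter_cached(batches, flatten=True))
```

Under these assumptions, flatten=False yields three items (one per document) and flatten=True yields two (one per cached batch), so the length of each flattened item depends on how many documents the batch contained.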