src.corpora.tokenization_utils.batch_tokenize
batch_tokenize(ds: Dataset, tokenizer, batch_size: int, text_column='text') → Iterator[BatchEncoding]
Yields batches of tokenized sentences from the given dataset.
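For illustration, a minimal usage sketch follows. It assumes batch_tokenize is importable from src.corpora.tokenization_utils, that the dataset is a Hugging Face datasets.Dataset exposing a 'text' column (the default text_column), and that an ordinary transformers tokenizer is passed in; the specific dataset and tokenizer are example choices, not part of this reference.

from datasets import load_dataset
from transformers import AutoTokenizer

from src.corpora.tokenization_utils import batch_tokenize  # assumed import path

# Example inputs: a small public text dataset and the GPT-2 tokenizer.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
ds = load_dataset("wikitext", "wikitext-2-raw-v1", split="train")

# batch_tokenize yields one BatchEncoding per batch_size examples read from text_column.
for batch in batch_tokenize(ds, tokenizer, batch_size=1000, text_column="text"):
    # A BatchEncoding behaves like a dict; batch["input_ids"] is a list of token-id lists.
    print(len(batch["input_ids"]))
    break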