src.corpora.detokenization.wikitext_detokenize

wikitext_detokenize(example: Dict[str, str]) → Dict[str, str]

WikiText is whitespace-tokenized; this function removes the extra whitespace that the tokenization introduced.

Taken from https://github.com/NVIDIA/Megatron-LM/blob/main/tasks/zeroshot_gpt2/detokenizer.py
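
To make the idea of removing tokenization whitespace concrete, here is a minimal sketch covering a small subset of rules. It is not the function documented here and not the full upstream rule list. The name `detokenize_sketch`, the `key` parameter, and the assumption that the text is stored under a `"text"` key are all hypothetical, chosen only for illustration.

```python
import re
from typing import Dict


def detokenize_sketch(example: Dict[str, str], key: str = "text") -> Dict[str, str]:
    """Illustrative subset of WikiText detokenization rules (hypothetical helper)."""
    # Assumption: the raw text lives under `key`; the real function may use another field.
    text = example[key]
    # WikiText writes separators inside numbers and hyphenated words
    # as " @-@ ", " @,@ " and " @.@ "; collapse them back.
    text = text.replace(" @-@ ", "-").replace(" @,@ ", ",").replace(" @.@ ", ".")
    # Re-attach punctuation that was split off with a leading space.
    text = re.sub(r" ([.,;:!?])", r"\1", text)
    # Remove the padding just inside parentheses.
    text = text.replace("( ", "(").replace(" )", ")")
    # Return a new dict so the input mapping is left unmodified.
    return {**example, key: text}
```

A possible usage, under the same assumptions:

```python
raw = {"text": "The river is 1 @,@ 200 km long ( about 745 mi ) , and well @-@ known ."}
print(detokenize_sketch(raw)["text"])
# The river is 1,200 km long (about 745 mi), and well-known.
```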