| Name | Last modified | Size |
|---|---|---|
| Parent Directory | | - |
| knowledge_bias.tex | 09-Jan-2024 17:38 | 876 |
| knowledge_toxicity.tex | 09-Jan-2024 17:38 | 884 |
| core_scenarios_bias.tex | 09-Jan-2024 17:38 | 886 |
| reasoning_apps_metrics.tex | 09-Jan-2024 17:38 | 892 |
| core_scenarios_toxicity.tex | 09-Jan-2024 17:38 | 894 |
| question_answering_bias.tex | 09-Jan-2024 17:38 | 894 |
| targeted_evaluations_bias.tex | 09-Jan-2024 17:38 | 898 |
| question_answering_toxicity.tex | 09-Jan-2024 17:38 | 902 |
| targeted_evaluations_toxicity.tex | 09-Jan-2024 17:38 | 906 |
| targeted_evaluations_bbq_metrics.tex | 09-Jan-2024 17:38 | 912 |
| targeted_evaluations_apps_metrics.tex | 09-Jan-2024 17:38 | 914 |
| core_scenarios_summarization_metrics.tex | 09-Jan-2024 17:38 | 920 |
| targeted_evaluations_copyright_metrics.tex | 09-Jan-2024 17:38 | 924 |
| targeted_evaluations_disinformation_metrics.tex | 09-Jan-2024 17:38 | 934 |
| reasoning_efficiency.tex | 09-Jan-2024 17:38 | 1.2K |
| knowledge_efficiency.tex | 09-Jan-2024 17:38 | 1.3K |
| question_answering_efficiency.tex | 09-Jan-2024 17:38 | 1.7K |
| knowledge_calibration.tex | 09-Jan-2024 17:38 | 2.0K |
| targeted_evaluations_calibration.tex | 09-Jan-2024 17:38 | 2.0K |
| calibration_accuracy.tex | 09-Jan-2024 17:38 | 2.0K |
| reasoning_accuracy.tex | 09-Jan-2024 17:38 | 2.3K |
| core_scenarios_calibration.tex | 09-Jan-2024 17:38 | 2.5K |
| question_answering_calibration.tex | 09-Jan-2024 17:38 | 2.5K |
| knowledge_accuracy.tex | 09-Jan-2024 17:38 | 3.1K |
| knowledge_fairness.tex | 09-Jan-2024 17:38 | 3.1K |
| knowledge_robustness.tex | 09-Jan-2024 17:38 | 3.1K |
| targeted_evaluations_fairness.tex | 09-Jan-2024 17:38 | 3.1K |
| targeted_evaluations_robustness.tex | 09-Jan-2024 17:38 | 3.2K |
| openbookqa_openbookqa_.tex | 14-Feb-2024 14:13 | 3.2K |
| gsm_gsm_.tex | 14-Feb-2024 14:13 | 3.3K |
| mmlu_mmlu_subject:abstract_algebra.tex | 14-Feb-2024 14:13 | 3.3K |
| mmlu_mmlu_subject:college_chemistry.tex | 14-Feb-2024 14:13 | 3.3K |
| mmlu_mmlu_subject:computer_security.tex | 14-Feb-2024 14:13 | 3.3K |
| mmlu_mmlu_subject:us_foreign_policy.tex | 14-Feb-2024 14:13 | 3.3K |
| legalbench_legalbench_subset:international_citizenship_questions.tex | 14-Feb-2024 14:13 | 3.3K |
| legalbench_legalbench_subset:proa.tex | 14-Feb-2024 14:13 | 3.8K |
| natural_qa_closedbook_natural_qa_closedbook_mode:closedbook.tex | 14-Feb-2024 14:13 | 3.8K |
| mmlu_mmlu.tex | 14-Feb-2024 14:13 | 4.0K |
| med_qa_med_qa_.tex | 14-Feb-2024 14:13 | 4.0K |
| calibration_calibration_detailed.tex | 09-Jan-2024 17:38 | 4.0K |
| mmlu_mmlu_subject:econometrics.tex | 14-Feb-2024 14:13 | 4.0K |
| targeted_evaluations_accuracy.tex | 09-Jan-2024 17:38 | 4.0K |
| legalbench_legalbench_subset:abercrombie.tex | 14-Feb-2024 14:13 | 4.1K |
| natural_qa_openbook_longans_natural_qa_openbook_longans_mode:openbook_longans.tex | 14-Feb-2024 14:13 | 4.1K |
| math_chain_of_thought_math_chain_of_thought_subject:number_theory,level:1,use_official_examples:False,use_chain_of_thought:True.tex | 14-Feb-2024 14:13 | 4.1K |
| legalbench_legalbench_subset:function_of_decision_section.tex | 14-Feb-2024 14:13 | 4.2K |
| math_chain_of_thought_math_chain_of_thought_subject:algebra,level:1,use_official_examples:False,use_chain_of_thought:True.tex | 14-Feb-2024 14:13 | 4.3K |
| wmt_14_wmt_14_source_language:fr,target_language:en.tex | 14-Feb-2024 14:13 | 4.3K |
| wmt_14_wmt_14_source_language:ru,target_language:en.tex | 14-Feb-2024 14:13 | 4.3K |
| wmt_14_wmt_14_source_language:cs,target_language:en.tex | 14-Feb-2024 14:13 | 4.3K |
| wmt_14_wmt_14_source_language:de,target_language:en.tex | 14-Feb-2024 14:13 | 4.3K |
| math_chain_of_thought_math_chain_of_thought_subject:precalculus,level:1,use_official_examples:False,use_chain_of_thought:True.tex | 14-Feb-2024 14:13 | 4.3K |
| math_chain_of_thought_math_chain_of_thought_subject:prealgebra,level:1,use_official_examples:False,use_chain_of_thought:True.tex | 14-Feb-2024 14:13 | 4.3K |
| wmt_14_wmt_14.tex | 14-Feb-2024 14:13 | 4.4K |
| question_answering_accuracy.tex | 09-Jan-2024 17:38 | 4.4K |
| math_chain_of_thought_math_chain_of_thought_subject:intermediate_algebra,level:1,use_official_examples:False,use_chain_of_thought:True.tex | 14-Feb-2024 14:13 | 4.4K |
| core_scenarios_fairness.tex | 09-Jan-2024 17:38 | 4.4K |
| question_answering_fairness.tex | 09-Jan-2024 17:38 | 4.4K |
| wmt_14_wmt_14_source_language:hi,target_language:en.tex | 14-Feb-2024 14:13 | 4.4K |
| core_scenarios_robustness.tex | 09-Jan-2024 17:38 | 4.4K |
| question_answering_robustness.tex | 09-Jan-2024 17:38 | 4.5K |
| math_chain_of_thought_math_chain_of_thought_subject:counting_and_probability,level:1,use_official_examples:False,use_chain_of_thought:True.tex | 14-Feb-2024 14:13 | 4.5K |
| math_chain_of_thought_math_chain_of_thought_subject:geometry,level:1,use_official_examples:False,use_chain_of_thought:True.tex | 14-Feb-2024 14:13 | 4.5K |
| narrative_qa_narrative_qa_.tex | 14-Feb-2024 14:13 | 4.6K |
| legalbench_legalbench_subset:corporate_lobbying.tex | 14-Feb-2024 14:13 | 4.7K |
| reasoning_general_information.tex | 09-Jan-2024 17:38 | 4.8K |
| math_chain_of_thought_math_chain_of_thought.tex | 14-Feb-2024 14:13 | 5.0K |
| legalbench_legalbench.tex | 14-Feb-2024 14:13 | 5.0K |
| knowledge_general_information.tex | 09-Jan-2024 17:38 | 5.3K |
| targeted_evaluations_efficiency_detailed.tex | 09-Jan-2024 17:38 | 5.7K |
| core_scenarios_accuracy.tex | 14-Feb-2024 14:13 | 8.3K |
| question_answering_general_information.tex | 09-Jan-2024 17:38 | 9.3K |
| targeted_evaluations_general_information.tex | 09-Jan-2024 17:38 | 9.3K |
| core_scenarios_efficiency.tex | 14-Feb-2024 14:13 | 9.4K |
| core_scenarios_general_information.tex | 14-Feb-2024 14:13 | 19K |