This page contains the contradiction datasets that we created as part of our work on detecting contradictions in text. For more information, see: Marie-Catherine de Marneffe, Anna N. Rafferty and Christopher D. Manning. 2008. Finding contradictions in text. ACL-08.

Annotation Guidelines
Annotation guidelines for marking contradictions, by Marie-Catherine de Marneffe and Christopher Manning, used for these datasets.

RTE datasets
We have annotated the PASCAL RTE datasets for contradiction. These datasets are marked for a 3-way entailment decision: "YES" (entails), "NO" (contradicts) and "UNKNOWN" (does not entail but is not a contradiction). Because contradictions are relatively rare, the datasets are not balanced: contradictions make up about 10 percent of the data. Some datasets were annotated only by Marie-Catherine de Marneffe; others were double-annotated by various students and faculty at Stanford, and disagreements between annotators were then adjudicated.
RTE1_dev1 data
RTE1_dev2 data
RTE1_test data
RTE2_dev data
RTE2_test data
RTE3_dev data
RTE3_test data

Negation datasets
We also created a corpus in which contradictions arise from negation by adding negative markers to the RTE2 test data. A small development set of 102 pairs was constructed by randomly sampling 51 entailment pairs and 51 non-entailment pairs from the RTE2 development set and adding negative markers. The pairs are marked for contradiction: contradiction="YES" or contradiction="NO". For a 3-way decision, contradiction="YES" items should be mapped to entailment="NO", and contradiction="NO" items to entailment="UNKNOWN". These datasets are balanced.
RTE2_dev Negation data
RTE2_test Negation data

"Real-life" contradiction corpus
We have also gathered a collection of contradictions appearing "in the wild". Unlike the manually constructed RTE datasets, this corpus contains 131 pairs of contradictions that occur naturally in text.
real-life contradictions
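The label mapping given for the Negation datasets (contradiction="YES" to entailment="NO", contradiction="NO" to entailment="UNKNOWN") can be applied mechanically. The following is a minimal sketch. It assumes, without confirmation from this page, that the files use RTE-style XML in which each item is a `<pair>` element carrying `id` and `contradiction` attributes. The function name `negation_to_3way` and the sample document are hypothetical and are shown for illustration only.

```python
import xml.etree.ElementTree as ET

# Mapping stated for the Negation datasets: contradiction label -> 3-way entailment label.
NEGATION_TO_3WAY = {"YES": "NO", "NO": "UNKNOWN"}


def negation_to_3way(xml_text: str) -> dict[str, str]:
    """Return {pair id: 3-way entailment label} for every <pair> element.

    ASSUMPTION: pairs are <pair> elements with `id` and `contradiction`
    attributes. An unexpected label value raises KeyError.
    """
    root = ET.fromstring(xml_text)
    labels = {}
    for pair in root.iter("pair"):
        contradiction = pair.get("contradiction", "").strip().upper()
        labels[pair.get("id")] = NEGATION_TO_3WAY[contradiction]
    return labels


# Hypothetical two-pair document in the assumed format.
sample = """<pairs>
  <pair id="1" contradiction="YES"><t>...</t><h>...</h></pair>
  <pair id="2" contradiction="NO"><t>...</t><h>...</h></pair>
</pairs>"""

print(negation_to_3way(sample))  # {'1': 'NO', '2': 'UNKNOWN'}
```

The mapping is kept in a dictionary rather than an if/else chain. Any label outside the two documented values therefore fails loudly and is never silently mislabeled.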