RTE3 alignment annotation


Alignment annotation guidelines

How to begin?

Go to the annotation tool, and select the file you want to annotate in the "Data sets" column.

Give a name for your annotation file in the "New annotation" field: your login name followed by the file name (e.g., "mcdm_RTE3_dev1.infoAnno").

Later, you will be able to choose your annotation file directly in the "Open annotation" field. To continue, click on the "Annotate >>" button at the bottom left of the page.

There is an annotation matrix for each "hypothesis-passage" pair. The hypothesis words are given in the columns. The passage words are in the rows. The direction of alignment goes from the hypothesis to the passage: you need to align each word in the columns to zero, one, or multiple words in the rows. An example of an alignment to multiple words is the alignment of the word "died" in the hypothesis to the three words "kicked the bucket" in the passage.
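As a rough illustration of the matrix described above, the sketch below stores one hypothesis-passage pair as a grid of booleans. The variable and function names are hypothetical, the words around "died" and "kicked the bucket" are made up for the example, and this is not the annotation tool's actual file format.

```python
# Hypothetical in-memory model of one annotation matrix (not the tool's format).
hypothesis = ["He", "died"]                   # columns
passage = ["He", "kicked", "the", "bucket"]   # rows

# matrix[row][col] is True when hypothesis word `col` is aligned to passage word `row`.
matrix = [[False] * len(hypothesis) for _ in passage]


def align(hyp_index, passage_indices):
    """Align one hypothesis word to zero, one, or several passage words."""
    for row in passage_indices:
        matrix[row][hyp_index] = True


def aligned_passage_words(hyp_index):
    """Return the passage words aligned to the given hypothesis word."""
    return [passage[row] for row in range(len(passage)) if matrix[row][hyp_index]]


align(0, [0])        # "He" -> "He"
align(1, [1, 2, 3])  # "died" -> "kicked the bucket"
```

Reading a column then gives every passage word aligned to that hypothesis word; a column left entirely `False` corresponds to a hypothesis word aligned to zero passage words.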

You can save a file on which you are working, either on the server ("Save to server"), or locally ("Save locally").

Annotation marks

The different possible annotation marks are:

green
simple alignment (double click)
There is a bi-directional alignment between the two aligned words. Choose this option when there is no clear direction of alignment.
orange
hyp --> passage (right click)
The hypothesis word entails the passage word. For example, drinking in the hypothesis entails consumption in the passage.
blue
passage --> hyp (left click)
Exactly the reverse of the orange annotation: the passage word entails the hypothesis word.
black
structural alignment (shift click)
The black annotation captures structural alignment. For example, aligned antonyms are marked in black.
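The four marks and their mouse actions can be summarized as a small lookup table. This is only a sketch for reference: the `Mark` enum and `MOUSE_ACTION` dictionary are hypothetical names, not part of the annotation tool.

```python
from enum import Enum


class Mark(Enum):
    """Annotation marks, with the meaning given in the guidelines."""
    GREEN = "simple alignment (bi-directional, no clear direction)"
    ORANGE = "hyp --> passage (hypothesis word entails passage word)"
    BLUE = "passage --> hyp (passage word entails hypothesis word)"
    BLACK = "structural alignment (e.g., antonyms)"


# Mouse action that produces each mark, as listed in the guidelines.
MOUSE_ACTION = {
    Mark.GREEN: "double click",
    Mark.ORANGE: "right click",
    Mark.BLUE: "left click",
    Mark.BLACK: "shift click",
}

# Example from the guidelines: "drinking" (hypothesis) entails "consumption" (passage).
example_mark = Mark.ORANGE
```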

Trade-off between lexical alignment and structural alignment

Aligning subgraphs is preferred to aligning isolated words scattered across the sentence:
  • Determiners, adjectives, and numbers preceding a noun have to be aligned with the determiner, adjective or number adjoined to the aligned noun.
  • The following example illustrates the subgraph alignment:
    • T: The galaxy, measuring just 2,000 light-years across, is a fraction of the size of our own Milky Way, which stretches 100,000 light-years in diameter.
    • H: The Milky Way measures 2,000 light-years across.
    Although "measures" is more lexically related to "measuring", we want to align it to "stretches", which is structurally related to "Milky Way". We also align "2,000 light-years" with "100,000 light-years", which is where the contradiction lies.
  • Not every word in the sentence has to be aligned!
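The Milky Way example can be written out as a hypothesis-to-passage mapping. This is a sketch only: the dictionary layout is hypothetical, the "Milky Way" entries are an assumption implied by the explanation rather than stated in it, and the remaining words are left unaligned here simply because the guidelines do not mention them.

```python
# Hypothesis words (keys) mapped to the passage words they are aligned to.
alignment = {
    "Milky": ["Milky"],              # assumption: implied by the text, not stated
    "Way": ["Way"],                  # same assumption
    "measures": ["stretches"],       # structural choice, rather than "measuring"
    "2,000": ["100,000"],            # number adjoined to the aligned noun
    "light-years": ["light-years"],  # the occurrence following "100,000"
}

hypothesis = "The Milky Way measures 2,000 light-years across".split()

# Words with no entry are left unaligned in this sketch.
unaligned = [word for word in hypothesis if word not in alignment]
```

Here `unaligned` holds the hypothesis words that this sketch leaves without a passage counterpart, in line with the rule that not every word needs an alignment.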


Annotation log

The RTE3_dev dataset has been split into 9 files to allow independent annotation. So far, each of us has been assigned only one file. Depending on our motivation, we will proceed to a double annotation later!

Data       Annotator 1 name  Annotator 1 status  Annotator 2 name  Annotator 2 status  Differences adjudicated
RTE3_dev1  Marie             done!
RTE3_dev2  Bill              done!
RTE3_dev3  Nate              done!
RTE3_dev4  Dan R             done!
RTE3_dev5  Chloé             done!
RTE3_dev6  David             done!
RTE3_dev7  Eric              done!
RTE3_dev8  Dan J             done!
RTE3_dev9  Chris             done!

Marie-Catherine de Marneffe
Last modified: Wed Feb 21 22:45:06 PST 2007