RTE3 alignment annotation
Alignment annotation guidelines
How to begin?
Go to the annotation tool and select the file you want to annotate in the "Data sets" column.
Name your annotation file in the "New annotation" field: your login name followed by the file name (e.g., "mcdm_RTE3_dev1.infoAnno").
Later, you can choose your annotation file directly in the "Open annotation" field.
To continue, click the "Annotate >>" button at the bottom left of the page.
There is an annotation matrix for each hypothesis-passage pair. The hypothesis words are given in the columns, and the passage words in the rows. Alignment goes from the hypothesis to the passage: each word in the columns must be aligned to zero, one, or several words in the rows. For example, a multi-word alignment links the word died in the hypothesis to the three words kicked the bucket in the passage.
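The matrix described above is essentially a mapping from each hypothesis word to a (possibly empty) set of passage words. The sketch below models it that way, purely to illustrate the zero/one/many structure; the class name, its methods, and the example sentence pair are hypothetical and do not reflect the tool's actual .infoAnno file format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class AlignmentMatrix:
    """Hypothetical model of one hypothesis-passage matrix (not the tool's format).

    Columns are hypothesis words; rows are passage words. Each hypothesis word
    maps to a set of passage-word indices: empty (unaligned), one word, or
    several words.
    """

    hypothesis: list[str]
    passage: list[str]
    links: dict[int, set[int]] = field(default_factory=dict)

    def align(self, hyp_index: int, passage_indices: list[int]) -> None:
        """Align one hypothesis word (a column) to zero or more passage words (rows)."""
        if not 0 <= hyp_index < len(self.hypothesis):
            raise IndexError(f"hypothesis index {hyp_index} out of range")
        for p in passage_indices:
            if not 0 <= p < len(self.passage):
                raise IndexError(f"passage index {p} out of range")
        self.links.setdefault(hyp_index, set()).update(passage_indices)

    def aligned_words(self, hyp_index: int) -> list[str]:
        """Passage words aligned to a hypothesis word, in passage order."""
        return [self.passage[p] for p in sorted(self.links.get(hyp_index, set()))]

    def as_grid(self) -> list[list[int]]:
        """0/1 grid: one row per passage word, one column per hypothesis word."""
        return [
            [1 if row in self.links.get(col, set()) else 0 for col in range(len(self.hypothesis))]
            for row in range(len(self.passage))
        ]


# Hypothetical sentence pair built around the "died" / "kicked the bucket" example.
m = AlignmentMatrix(
    hypothesis=["The", "man", "died"],
    passage=["The", "old", "man", "kicked", "the", "bucket"],
)
m.align(0, [0])          # The -> The
m.align(1, [2])          # man -> man
m.align(2, [3, 4, 5])    # died -> kicked the bucket (one-to-many)
print(m.aligned_words(2))  # ['kicked', 'the', 'bucket']
```

Note that the passage word old stays unaligned: its row in the grid is all zeros.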
You can save the file you are working on either on the server ("Save to server") or locally ("Save locally").
Annotation marks
The possible annotation marks are:
- green: simple alignment (double click). The two words are aligned bi-directionally. Choose this option when there is no clear direction of alignment.
- orange: hyp --> passage (right click). The hypothesis word entails the passage word. For example, drinking in the hypothesis entails consumption in the passage.
- blue: passage --> hyp (left click). The exact reverse of the orange annotation: the passage word entails the hypothesis word.
- black: structural alignment (shift click). Black annotation captures structural alignment. For example, aligned antonyms are marked in black.
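The four marks listed above pair a colour and a mouse gesture with a relation between the two aligned words. The sketch below encodes that list as a Python enum, mainly to make the direction of each mark explicit; the enum, the lookup table, and the describe() helper are hypothetical and are not part of the annotation tool.

```python
from enum import Enum


class Mark(Enum):
    """The four annotation marks as (colour, gesture). Hypothetical encoding."""

    GREEN = ("green", "double click")   # simple, bi-directional alignment
    ORANGE = ("orange", "right click")  # hyp --> passage: hypothesis word entails passage word
    BLUE = ("blue", "left click")       # passage --> hyp: passage word entails hypothesis word
    BLACK = ("black", "shift click")    # structural alignment, e.g. antonyms


# Reverse lookup from the mouse gesture to the mark it produces.
GESTURE_TO_MARK = {mark.value[1]: mark for mark in Mark}


def describe(mark: Mark, hyp_word: str, passage_word: str) -> str:
    """Render one marked cell of the matrix as a readable relation."""
    if mark is Mark.GREEN:
        return f"{hyp_word} <-> {passage_word}"
    if mark is Mark.ORANGE:
        return f"{hyp_word} --> {passage_word}"
    if mark is Mark.BLUE:
        return f"{passage_word} --> {hyp_word}"
    return f"{hyp_word} ~ {passage_word} (structural)"


print(describe(GESTURE_TO_MARK["right click"], "drinking", "consumption"))
# drinking --> consumption
```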
Trade-off between lexical alignment and structural alignment
Aligning subgraphs is preferred over aligning isolated words scattered across the sentence:
- Determiners, adjectives, and numbers preceding a noun must be aligned with the corresponding determiner, adjective, or number attached to the noun it is aligned with.
- The following example illustrates subgraph alignment:
- T: The galaxy, measuring just 2,000 light-years across, is a fraction of the size of our own Milky Way, which stretches 100,000 light-years in diameter.
- H: The Milky Way measures 2,000 light-years across.
Although measures is lexically closer to measuring, we align it with stretches, which is structurally related to Milky Way. We also align 2,000 light-years with 100,000 light-years, which is where the contradiction lies.
- Not every word in the sentence has to be aligned!
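The determiner/adjective/number rule in the list above can be stated procedurally: once two nouns are aligned, the modifiers directly in front of them are paired by category. The sketch below assumes each sentence is already split into (token, tag) pairs with coarse, Universal-POS-style tags; the tag set, the function names, and the hand-tagged fragments of the galaxy example are illustrative assumptions. It does not decide which nouns (or verbs, such as measures/stretches) to align in the first place.

```python
from __future__ import annotations

# Coarse part-of-speech tags treated as pre-nominal modifiers (assumed tag set).
MODIFIER_TAGS = {"DET", "ADJ", "NUM"}


def prenominal_modifiers(tagged: list[tuple[str, str]], noun_index: int) -> list[int]:
    """Indices of the contiguous DET/ADJ/NUM run directly before the noun."""
    i = noun_index - 1
    while i >= 0 and tagged[i][1] in MODIFIER_TAGS:
        i -= 1
    return list(range(i + 1, noun_index))


def align_modifiers(hyp, passage, hyp_noun: int, passage_noun: int) -> list[tuple[int, int]]:
    """Given an aligned noun pair, pair each hypothesis modifier with the first
    unused passage modifier of the same category. Returns (hyp, passage) index pairs."""
    by_tag: dict[str, list[int]] = {}
    for p in prenominal_modifiers(passage, passage_noun):
        by_tag.setdefault(passage[p][1], []).append(p)
    pairs = []
    for h in prenominal_modifiers(hyp, hyp_noun):
        candidates = by_tag.get(hyp[h][1])
        if candidates:
            pairs.append((h, candidates.pop(0)))
    return pairs


# Hand-tagged fragments of the galaxy example (tags assumed for illustration).
hyp = [("The", "DET"), ("Milky Way", "PROPN"), ("measures", "VERB"),
       ("2,000", "NUM"), ("light-years", "NOUN"), ("across", "ADV")]
passage = [("which", "PRON"), ("stretches", "VERB"), ("100,000", "NUM"),
           ("light-years", "NOUN"), ("in", "ADP"), ("diameter", "NOUN")]

# light-years (hyp index 4) is aligned with light-years (passage index 3),
# so the numbers in front of them are aligned too: 2,000 <-> 100,000.
print(align_modifiers(hyp, passage, 4, 3))  # [(3, 2)]
```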
Annotation log
The RTE3_dev dataset has been split into 9 files to allow independent annotation. So far, each of us has been assigned only one file. Depending on our motivation, we will carry out a double annotation later!
Data | Annotator 1 name | Annotator 1 status | Annotator 2 name | Annotator 2 status | Differences adjudicated
RTE3_dev1 | Marie | done! | | |
RTE3_dev2 | Bill | done! | | |
RTE3_dev3 | Nate | done! | | |
RTE3_dev4 | Dan R | done! | | |
RTE3_dev5 | Chloé | done! | | |
RTE3_dev6 | David | done! | | |
RTE3_dev7 | Eric | done! | | |
RTE3_dev8 | Dan J | done! | | |
RTE3_dev9 | Chris | done! | | |
Marie-Catherine de Marneffe
Last modified: Wed Feb 21 22:45:06 PST 2007