public class AnCoraPronounDisambiguator
extends Object
A utility for preprocessing the AnCora Spanish corpus.
Attempts to disambiguate Spanish personal pronouns which have
multiple senses:
me, te, se, nos, os
Each of these can be used as 1) an indirect object pronoun or as
2) a reflexive pronoun. (me, te, nos, and os can
also be used as direct object pronouns.)
For the purposes of corpus preprocessing, all we need is to
distinguish between the object- and reflexive-pronoun cases.
Disambiguation runs in two stages. Dictionary-powered heuristics are
tried first; cases they do not resolve are then handled by brute force,
that is, by manual tags for the verbs with clitic pronouns which appear
in the AnCora corpus.
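The two-stage approach described above can be sketched roughly as follows. This is a minimal illustration and not the class's actual implementation: the class name PronounDisambiguationSketch, the PronounType enum, the disambiguate method, and every dictionary and manual-tag entry are hypothetical and chosen only to show the control flow. It assumes Java 9 or later for Set.of and Map.of.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: none of these names come from the documented class.
final class PronounDisambiguationSketch {

  enum PronounType { OBJECT, REFLEXIVE, UNKNOWN }

  // Stage 1 dictionaries (illustrative entries only, keyed by verb lemma).
  // Verbs that only occur pronominally, so the clitic is reflexive.
  private static final Set<String> ALWAYS_REFLEXIVE =
      Set.of("arrepentir", "quejar");
  // Verbs whose clitic is assumed here to be an object pronoun.
  private static final Set<String> NEVER_REFLEXIVE =
      Set.of("gustar", "importar");

  // Stage 2: manual tags, keyed here by the exact verb-plus-clitic form.
  // Keying on the surface form alone is a simplifying assumption.
  private static final Map<String, PronounType> MANUAL_TAGS = Map.of(
      "sentarse", PronounType.REFLEXIVE,
      "dame", PronounType.OBJECT);

  // Try the dictionary heuristics first, then fall back to the manual tags.
  // Returns UNKNOWN when neither stage has an answer.
  static PronounType disambiguate(String lemma, String cliticForm) {
    if (ALWAYS_REFLEXIVE.contains(lemma)) {
      return PronounType.REFLEXIVE;
    }
    if (NEVER_REFLEXIVE.contains(lemma)) {
      return PronounType.OBJECT;
    }
    return MANUAL_TAGS.getOrDefault(cliticForm, PronounType.UNKNOWN);
  }
}
```

In this sketch the cheap dictionary lookup handles verbs whose behavior is fixed regardless of context, and the manual table is consulted only for the remainder. Returning an explicit UNKNOWN value, rather than guessing, leaves it to the caller to decide how to treat a verb that neither stage covers.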
Author:
Jon Gauthier
See Also:
SpanishTreeNormalizer