Subtypes usually can't redefine equals()

From: Christopher Manning 
Sender: owner-java-nlp-list@lists.Stanford.EDU
To: java-nlp-list@lists.Stanford.EDU
Subject: Re: TaggedWord question
Date: Sat, 5 Apr 2003 14:52:44 -0800

[I thought it might be educational to send this to everybody, so without
Roger's permission, here's the second in my very occasional series of
Java tips (they come out about once a year, I guess).]

On 4 April 2003, Roger Levy wrote:
 > Hi Chris,
 > 
 > TaggedWord's equality condition doesn't require identity of tag, just
 > of word.  This tripped me up recently and can be a bit annoying to
 > work around.  Is there a good reason why this shouldn't be changed?

YES!  There's a good reason.

The earliest versions of the trees package actually got this wrong by
doing what you suggest: it seems obvious and a good idea, but it's
actually wrong.  The problem comes about
because TaggedWord is a subtype of Word.  (The root of the problem here
is a reason why in general one should be suspicious of using subtyping:
Once in a JavaNLP meeting -- before your time -- I delivered a sermon on
"Favor composition over inheritance".)

Why it doesn't work is that it breaks the symmetry and/or transitivity
of equals.  Consider TaggedWord tw1, tw2, and Word w, all with the same
word() in them, but tw1 and tw2 have different tag() values.  For the
obvious way of doing what you suggest, w.equals(tw1) and tw1.equals(w)
would no longer give the same answer, and even if you defined things so
they did by using constraints on types, w.equals(tw1) and w.equals(tw2)
would be true, but not tw1.equals(tw2).  And there just isn't any way to
fix it.  (Well, almost no way: One could fix it under a closed world
assumption where you know all the subtypes of Word, but you can't do
that because Java has open-world semantics.  With the built-in
instanceof you can't tell whether an object is 'really' just a Word, and
not a subtype.  You could do it via the getClass() style reflection
mechanisms, but if one is going to such baroque lengths of making it so
that a TaggedWord can't be equal to a Word for purposes of equality,
then it's not exactly clear why it was made a subtype of Word in the
first place....)  See the documentation of equals() in the Object class
javadoc, which is actually quite detailed on this issue.
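
To make the two failure modes concrete, here is a minimal sketch.  The
classes are simplified stand-ins written for this illustration, not the
actual trees package code: Word compares only the word string,
NaiveTaggedWord is the "obvious" change that also compares tags, and
MixedTaggedWord is a hypothetical attempt to restore symmetry by looking
at the tag only when both sides have one.

```java
public class EqualsContractDemo {

    // Simplified stand-in for Word: equality looks only at the word string.
    static class Word {
        final String word;
        Word(String word) { this.word = word; }

        @Override public boolean equals(Object o) {
            return o instanceof Word && word.equals(((Word) o).word);
        }
        @Override public int hashCode() { return word.hashCode(); }
    }

    // Variant 1: the "obvious" change. Also compares tags; breaks symmetry.
    static class NaiveTaggedWord extends Word {
        final String tag;
        NaiveTaggedWord(String word, String tag) { super(word); this.tag = tag; }

        @Override public boolean equals(Object o) {
            return o instanceof NaiveTaggedWord
                    && super.equals(o)
                    && tag.equals(((NaiveTaggedWord) o).tag);
        }
    }

    // Variant 2: compares tags only when both sides have one.
    // Symmetric, but no longer transitive.
    static class MixedTaggedWord extends Word {
        final String tag;
        MixedTaggedWord(String word, String tag) { super(word); this.tag = tag; }

        @Override public boolean equals(Object o) {
            if (!super.equals(o)) return false;               // words must match
            if (!(o instanceof MixedTaggedWord)) return true; // plain Word: ignore the tag
            return tag.equals(((MixedTaggedWord) o).tag);
        }
    }

    public static void main(String[] args) {
        Word w = new Word("run");

        Word n1 = new NaiveTaggedWord("run", "VB");
        System.out.println(w.equals(n1));  // true
        System.out.println(n1.equals(w));  // false: symmetry is broken

        Word m1 = new MixedTaggedWord("run", "VB");
        Word m2 = new MixedTaggedWord("run", "NN");
        System.out.println(m1.equals(w) && w.equals(m2)); // true
        System.out.println(m1.equals(m2));                // false: transitivity is broken
    }
}
```

Both variants inherit hashCode() from Word, so equal objects still get
equal hash codes; the breakage is purely in equals().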

Despite these problems, in this particular case, I think it is often
useful having TaggedWord as a subtype of Word, since then it can appear
anywhere a Word can, and things just work, but you can get at the
tagging if it is present.  You just have to be aware that for all Label
types, equals() checks only the value().  [And sometimes this is
useful since it means that things like percolating heads don't make
nodes unequal from what they were before.]

However, the opposite behavior would also sometimes be useful.  If you
wanted to have in the Label interface a veryEqual() method, then that
would be fine with me, providing it had a clear and defensible semantics
with the above sort of properties.  I think one reasonable option that
would work is: require that the two labels have the same value(),
implement the same subset of the HasWord, HasTag, and HasCategory
interfaces, and that the word(), tag(), and category() match if the
respective interface is implemented.
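
A rough sketch of how that rule might look follows.  veryEqual() does not
exist anywhere; it is written here as a hypothetical static helper, and
the interface signatures (each accessor returning a String) and the two
small classes are assumptions made so the example is self-contained.

```java
import java.util.Objects;

public class VeryEqualSketch {

    // Assumed minimal shapes for the interfaces named above.
    interface Label { String value(); }
    interface HasWord { String word(); }
    interface HasTag { String tag(); }
    interface HasCategory { String category(); }

    // Hypothetical helper: same value(), same subset of the three
    // interfaces, and matching word()/tag()/category() where implemented.
    static boolean veryEqual(Label a, Label b) {
        if (a == null || b == null) return a == b;
        if (!Objects.equals(a.value(), b.value())) return false;

        // Both labels must implement exactly the same subset of interfaces.
        if ((a instanceof HasWord) != (b instanceof HasWord)) return false;
        if ((a instanceof HasTag) != (b instanceof HasTag)) return false;
        if ((a instanceof HasCategory) != (b instanceof HasCategory)) return false;

        // Compare each field only where the interface is present (on both).
        if (a instanceof HasWord
                && !Objects.equals(((HasWord) a).word(), ((HasWord) b).word())) return false;
        if (a instanceof HasTag
                && !Objects.equals(((HasTag) a).tag(), ((HasTag) b).tag())) return false;
        if (a instanceof HasCategory
                && !Objects.equals(((HasCategory) a).category(), ((HasCategory) b).category())) return false;
        return true;
    }

    // Toy label classes for the demonstration only.
    static class Word implements Label, HasWord {
        final String word;
        Word(String word) { this.word = word; }
        public String value() { return word; }
        public String word() { return word; }
    }

    static class TaggedWord extends Word implements HasTag {
        final String tag;
        TaggedWord(String word, String tag) { super(word); this.tag = tag; }
        public String tag() { return tag; }
    }

    public static void main(String[] args) {
        Label w = new Word("run");
        Label vb = new TaggedWord("run", "VB");
        Label nn = new TaggedWord("run", "NN");

        System.out.println(veryEqual(w, vb));   // false: different interface subsets
        System.out.println(veryEqual(vb, nn));  // false: tags differ
        System.out.println(veryEqual(vb, new TaggedWord("run", "VB"))); // true
    }
}
```

Because the check compares the same tuple (interface subset, value, word,
tag, category) in both directions, it is reflexive, symmetric, and
transitive, which is the property the ordinary equals() could not keep
once tags were involved.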

Chris.

Christopher Manning
Last modified: Sat Apr 5 14:57:03 PST 2003