Tuesday, 6 November 2012

Pitfalls of "consistent with equals"

The equals() method and the compareTo() method on Comparable are two of the most basic in Java. Yet their definitions have an interesting wrinkle around the concept of "consistent with equals".

The equals() method

The equals() method is both well defined and unclear in Java. It is well-defined in that it specifies exactly how the check for equal must work. They must be reflexive, symmetric, transitive, consistent and handle null.

However the equals() is also unclear. The Javadoc says that the method "Indicates whether some other object is "equal to" this one". Note the "equal to" part is in quotes. The key here is that it does not define how the equality should be determined. There are three standard choices:

  • the identity of the object (==), which is the default inherited from Object
  • the entire observable state of the object, such that if two objects are equal then one is substitutable for the other in other parts of the application
  • some subset of the information that makes logical sense to check equals on, such as an ID

The compareTo() method

The Comparable interface defines the concept of comparability. The Javadoc specifies that it "imposes a total ordering on the objects of each class that implements it".

Classes that implement Comparable have a natural ordering and can be sorted easily and used in collections like TreeSet and TreeMap without a separate Comparator.

The interface is well-defined, in that it specifies that the implementation must ensure a form of symmetry and transitive behaviour, just like equals(). It does not specify exactly how the comparison should be made, but defines the concept of consistency with equals.

Consistent/Inconsistent with equals

The Comparable interface says the following:

The natural ordering for a class C is said to be consistent with equals if and only if e1.compareTo(e2) == 0 has the same boolean value as e1.equals(e2) for every e1 and e2 of class C.

Basically, this requires that the concept of equals defined by compareTo is the same as that defined by equals() (apart from nulls). This is at first glance a simple requirement but, as I'm going to discuss, it has complications.

This kind of definition is especially useful when considering operator overloading. If we consider a Java-like language where == is not object identity, but comparison using a method, and the greater-than/less-than operators are also available the question is what method to call. Greater-than/less-than would in this Java-like language naturally be based on compareTo() and == would call equals().

  // our new Java-like language
  if (a < b) return "Less";      // translation ignoring nulls: if (a.compareTo(b) < 0)
  if (a > b) return "Greater";   // translation ignoring nulls: if (a.compareTo(b) > 0)
  if (a == b) return "Equal";    // translation ignoring nulls: if (a.equals(b))
  throw new Exception("Impossible assuming no nulls?");

But if compareTo() is "inconsistent with equals" then this code can throw the exception, because a.compareTo(b) can return zero when a.equals(b) is false.

Other problems can occur in collections like TreeMap:

  // Foo class is "inconsistent with equals"
  assert foo1.equals(foo2) == false;
  assert foo1.compareTo(foo2) == 0;
  
  TreeMap<Foo, String> map = ...
  map.put(foo1, "a");
  map.put(foo2, "b");

Here the two objects are not equal using equals() but are equal using compareTo(). In this case, the map will have size one, not size two!

Because of the problems with being "inconsistent with equals", the Javadoc says It is strongly recommended (though not required) that natural orderings be consistent with equals.

Many JDK classes implementing Comparable do so in a manner "consistent with equals". These include Byte, Short, Integer, Long, Character and String.

Some others are more interesting:

  • BigDecimal - well-known to be "inconsistent with equals", as 4.00 is not equal to 4.0, but do compare the same
  • Double/Float - provides an explicit sort order and equals check for positive and negative zero and NaN to ensure compareTo is "consistent with equals"
  • CharSet - based on the ID/name. equals is case-senstitive, while compareTo is not, although as the names are normalized it tends not to matter - dubiously "consistent"
  • *Buffer (nio) - based on remaining content of the buffer. equals and compareTo appear to be "consistent" under my testing
  • Rdn (ldap) - based on the normalized form of the state, thus is "consistent with equals"
  • ObjectStreamField (serialization) - based on the name, but with primitives sorted first. equals is not overriden, so is "inconsistent with equals"
  • ObjectName (jmx) - equals based on the canonical name, compareTo defined as "not completely specified but is intended to be such that a sorted list of ObjectNames will appear in an order that is convenient for a person to read"! "inconsistent with equals"
  • FileTime (io) - implements equals using compareTo, definitely "consistent with equals"
  • File (io) - OS-specific, implements equals using compareTo, definitely "consistent with equals"
  • URI - source code appears to be "consistent with equals"
  • UUID - source code for equals includes variant that compareTo doesn't, but it appears to be unessesary code with no effect such that it is "consistent with equals"
  • Date (java.util) - source code difficult to follow because equals changes the state of the object (normalizing the deprecated setters), whereas compareTo does not change the state. As both ultimately so the same calculation it is "consistent with equals"
  • Date/Time (sql) - subclass java.util.Date and add nothing new, so are also "consistent with equals"
  • Timestamp (sql) - horrible compareTo and equals implementations that break the contracts, but is "consistent with equals" when comparing two Timestamp objects
  • Calendar/GregorianCalendar - equals compares the state, while compareTo compares along the time-line. Definitely "inconsistent with equals"

Note: In almost all cases, I had to examine the source code or write test code to determine whether a class was or was not "consistent with equals". There is a good Adopt-a-JDK task here to clarify the Javadoc and check UUID equals.

JSR-310

Having seen too many issues around BigDecimal, the plan has been for JSR-310 classes to be "consistent wth equals". However, a recent thread shows how controversial this can get.

Basically, it seems really simple to define equals/compareTo for some classes. LocalDate is just a date in a single calendar system, so has an obvious ordering and equals. LocalTime is just a time, so has an obvious ordering and equals. Instant is just an instantaeous point on the time-line, so has an obvious ordering and equals.

However other cases are not so obvious. Consider OffsetDateTime:

  dt1 = OffsetDateTime.parse("2012-11-05T06:00+01:00");
  dt2 = OffsetDateTime.parse("2012-11-05T07:00+02:00");

These two date-time objects represent the same instantaneous point on the time-line. But they have different local times and different offsets from UTC/Greenwich.

So, one question to readers.... which of these options do you prefer...

  1. dt1 not equal to dt2, compareTo() takes into account the local time and offset, "consistent with equals" (with a separate Comparator to sort by instant)
  2. dt1 not equal to dt2, compareTo() based on the instantaneous point on the time-line, "inconsistent with equals"
  3. dt1 equal to dt2, compareTo() based on the instantaneous point on the time-line, "consistent with equals"
  4. dt1 equal to dt2 and do not implement Comparable
  5. dt1 not equal to dt2 and do not implement Comparable

Personally I struggle with the idea that dt1.equals(dt2) would be true, but I'm open to opinions.

BTW, the question could also be posed with respect to BigDecimal. If you could to make that class "consistent with equals" would you change the equals() method or the compareTo() method?

29 comments:

  1. My inclination would be to retain consistency between equals() and compareTo(), and favour the most natural definition of compareTo() when deciding on the implementations - so option 3 above. I think the definition of compareTo() which option 1 would give you is further removed from what you'd intuitively expect than the definition of equals() which option 3 would give you.

    You could of course always add a strictEquals() method to cover the more 'natural' definition of equality (but that doesn't address the problem of how best to map methods to operators).

    ReplyDelete
  2. The XML Schema spec (in particular section 3.2.7.4 in http://www.w3.org/TR/xmlschema-2/#dateTime ) wrestles with the notions of order and equality for dateTime with offsets. In particular, "dateTime is a partial order since there is no determinate relationship between certain instants".

    What if OffsetDateTime did not implement Comparable and you instead supplied one (or more) external Comparator(s)? Then you would no longer need to have consistent equals and compareTo methods. The Comparator could be configurable depending on what aspects you care about.

    ReplyDelete
  3. How would the 25th hour be handled? This week we faced external "sources of truth" who erred in their handling of DST changeover. The same wallclock time occurred twice, so in some sense are equal, but must be strictly ascending to preserve their order, which is inconsistent.

    ReplyDelete
  4. @Henry, the option 1 definition is actually "sort by instant, and if they are equal take the local date-time into account"

    @Alex, the XML guys invented an unable to determine answer to work around the problem. Bit of a cop-out really. The question is whether a sort order (option 1 in previous para) that works most of the time and is ideal for human-order sorts is better than no comparability. Its certainly clear that if you want true instant ordering you can add a comparator for that.

    @binkley, well an OffsetDateTime contains the offset, so the chances are that the external supplier sent you local times in one offset followed by local times in a different offset. As the sort would be by instant first, you'd get the right order.

    ReplyDelete
  5. The competing force is that equals() wants to include all observable state. The URL class famously treats http://localhost/ and http://127.0.0.1/ as equal because it considers their hostnames to be equivalent. That means I can't differentiate the two when used as map keys, as in this silly example:

    Map urlToHostname = new LinkedHashMap();
    for (URL url : new URL[] { new URL("http://localhost/"), new URL("http://127.0.0.1/") }) {
    urlToHostname.put(url, url.getHost());
    }
    System.out.println(urlToHostname); // prints {http://localhost/=127.0.0.1}

    Your best option is to have compareTo ignore the timezone field, and to have equals() honor it. Consistent-with-equals is not a requirement, and anyone who has used String.CASE_INSENSITIVE_ORDER should be prepared for this.

    Side note: Android 4 breaks compatibility with Java by implementing URL.equals more naturally.
    http://developer.android.com/reference/java/net/URL.html#equals(java.lang.Object)

    ReplyDelete
  6. Option 3 looks more natural to me, or option 4 with a separate static comparator.

    ReplyDelete
  7. I would go for the third option. I would argue that the whole point of timezone is for convenience in human representation, which has nothing to do with mathematical computation.

    ReplyDelete
    Replies
    1. However, timezone implies a location, since our location determines our timezone. So, I would argue for option1, dt1 and dt2 are not equal, because they represent the same instant of time in two different locations.

      Delete
  8. @Olivier, @Emmanuel, option 3 that you are arguing for means that you cannot sensibly use the object as a key in a map, as per swankjesse's example.

    @Olivier, bear in mind that the Instant class covers the use case of a simple instantaneous point on the time-line without an offset, and that has a clear equals/compareTo definition.

    @swankjesse, "inconsistent with equals" would mean that your HashMap example would work, but a TreeMap example wouldn't. Surely thats not acceptable...

    ReplyDelete
  9. @Stephen I see - if Option 1 preserves 'instant ordering' then that does shift things in favour of Option 1 for me. I'm still a little uncomfortable that in this case, compareTo() could return non-zero for comparisons between objects representing equivalent instants, however I think it's still a valid design choice for this class, which contains more than just the concept of 'instant'.

    What this discussion seems to highlight is that when considering equals(), a full state comparison is perhaps most natural, but for compareTo(), a 'value' type comparison may be more natural - and these can sometimes pull you in different directions. If the tension between the two is too great, then perhaps don't implement Comparable at all, but provide external Comparators instead. I don't think that's really necessary in this case though.

    ReplyDelete
  10. I fail to see in what case a map of OffsetDateTime would require a distinction between similar instants expressed in different timezones, "merging" identical instants looks like the main use case to me. But if this is really necessary I guess it's still possible to use the string representation as the key.

    ReplyDelete
  11. I prefer option 3, in which the two offsets are considered equal and compareTo() is consistent with equals().

    I see the friction between compareTo() and equals() being more of an inflexibility in defining equivalence. What "compareTo() is consistent with equals()" really means is that the equivalence relation defined by compareTo() produces the same partitions of instances as the equivalence relation defined by equals(). Of course, compareTo() additionally defines a total ordering of its partitions. But whereas Java provides a mechanism for defining arbitrary (totally) ordered partitions, it does not provide a mechanism for defining arbitrary unordered partitions. There is no "Equalator" interface to match the Comparator interface. In a perfect world, Comparator/Comparable would be an extension of Equalator/Equatable with some covariant return of the comparison method, making each ordered equivalence consistent with an unordered equivalence by default. That way, implementations that needed the stricter relation could simply accept only the stronger interface, and others could accept the weaker.

    ReplyDelete
  12. My view:

    The entire purpose of the OffsetDateTime class is to represent different offsets - if all someone wants to represent is an Instant then they have a class for that already. As such dt1 is not the same thing as dt2, and so it's clear to me that dt1 *must not* be equal to dt2 and it would be highly confusing if they were equal. The accessors on OffsetDateTime return different values on dt1 and dt2, and there is a toInstant method on it which makes comparing for Instant equality trivial anyway. People who want an Instant should use an instance of an Instant. This rules out options 3 & 4.

    I would also view it as extremely desirable to have compareTo consistent with equals - I would hope if they could start over on Java they would make it required to be that way. It's much the most intuitive form. Personally I think that if you want to order OffsetDateTime then it is probably desirable for the order in which dt1 and dt2 appear to be determinate. This rules out option 2.

    The only question I am unclear on is whether you need to implement Comparable at all; on balance, since any date time is a point on a line, I would be inclined to say yes for the convenience of users. Balanced against that: if in doubt, leave it out. Has anyone come up with a use case for comparing instances of OffsetDateTime where it wouldn't make more sense to use instances of Instance?

    So I would say option 1, or if not then option 5 (not listed) - dt1 not equal to dt2, Comparable not implemented.

    ReplyDelete
  13. One thing that might change my view on equals is if Instant and OffsetDateTime both implemented/extended some supertype capturing the idea of being an instantaneous point on the time-line, which could therefore meaningfully declare a contract for equals that both subtypes should adhere to - however, I cannot see such a supertype. I see DateTime does not specify an instantaneous point on the time-line, as it has Local subtypes.

    ReplyDelete
    Replies
    1. And if this were the case instant.equals(offsetDateTime) should be able to return true if they represent the same instant, just as arrayList.equals(linkedList) can return true if they contain the same elements in the same order.

      Delete
  14. I agree with Robert Elliot's comments -- the two objects are clearly not equal. If I wished to use something that just represents a point in the timeline, that's what Instant is for. The whole point of an OffsetDateTime is to be able to differentiate between different expressions of the same instant...

    ReplyDelete
  15. @swankjesse Using a URL as a map key at all is dangerous. The keys in a hash map must always hash to the same value, and a.equals(b) must return the same answer every time you call it. Since URL.equals depends on DNS, it may return different answers at different points in time. "http://www.google.com" and "http://1.2.3.4" may or may not be the same, it depends on what an external system says (and even if it is up).

    I think URI on the other hand doesn't implement equals using DNS :)



    With option 1, the obvious question is which way are they sorted - should 06:00+01:00 be before or after 07:00+02:00 ?

    The real problem is that there is not a single definition of equality or ordering that is suitable for everyone, and they're not like typeclasses where you can say which one you want (Comparator aside).


    I would go with 4, even if it doesn't solve the "are they equal?" question.

    ReplyDelete
  16. Lots of good comments. I'm actually glad to see that not everyone agrees on the answer, as it means it is essentially a judgement call, rather than a right/wrong.

    I think Robert's first comment nails the equals() side. So, its mostly down to whether to implement a convenient default ordering or not implement Comparable at all.

    ReplyDelete
    Replies
    1. Thinking more about Robert's comments, I agree that Comparable is not a good fit and shouldn't be implemented by such a class. The ordering is non-intuitive and feels arbitrary for all three options. In the rare circumstance that someone would need instances of OffsetDateTime to be comparable, they can be lossy converted to a comparable type, or a Comparator can be defined for the specific case, opting to be consistent with equals() as appropriate.

      Delete
  17. I also agree with Robert's option 5 ( dt1 *not* equal to dt2 and do *not* implement Comparable )

    Having #equals() as part of java.lang.Object was a mistake for the JDK in my opinion, so I would avoid the similar mistake of pulling the 'compareTo' method into a date-class itself.

    The API could then provide multiple Comparator implementations, that document whether they take only the instant into account, or also the timezone etc.

    ReplyDelete
  18. On the broader point of the design of Java (Equalator, should "consistent with equals" have been mandatory etc), I think my conclusion is that there are three basic types of equals - identity, state and ordered. Java confuses this by not providing separation between the latter two. Interestingly, I can't think of any evidence that other newer JVM languages have captured the distinction between the two. And of course, sometimes there is no distinction between the last two, as in Integer or Instant (LocalDate also turns out to be complex when you consider calendar systems). So, maybe we should imagine a new language where == is linked to ordered comparison, === is state based and ==== is identity!

    ReplyDelete
  19. Well, if an API does not offer the appropriate comparison method for a particular use case... then why not simply write a custom Comparator as needed?!

    Cheers.
    Joerg

    ReplyDelete
  20. There is also another option to consider, of not implementing equals at all. This might be worthwhile if it is considered confusing that two date-times are not considered equal for the same instant in different time-zones; the user would need to explicitly compare the instant and time zone. However, since date-times are real value objects, then with sufficient documentation, I agree with Robert's comments. A standard set of comparators could/should be provided, e.g. BY_INSTANT_THEN_TZ_ASC, BY_LOCALTIME, etc.

    Also, "some subset of the information that makes logical sense to check equals on, such as an ID" always seems wrong to me - that suggests the need for a "matches" method, not equals, and also suggests that the class is not a "value" class, in which case equals, and in particular hashCode, do not make sense (if the class represents an entity).

    ReplyDelete
  21. Just want to add some discussion points regarding BigDecimal
    BigDecimal - well-known to be "inconsistent with equals", as 4.00 is not equal to 4.0, but do compare the same
    Mathematically numbers are the same, but from engineering point of view 4.0 is not equals to 4.00 because of precision. When you measuring length with different tools the result will depends on the tool's resolution i.e. Vernier caliper has resolution .1-.05 mm and Micrometer caliper has resolution .001 mm. Even if you will get same results after measurements it is correct to write them 4.0 and 4.00 to show precision of measurement.

    ReplyDelete
  22. +1 to Robert Elliot with his idea #5 "strict equality and dropped Comparable."
    Alex Z's BigDecimal sample shows by analogy what some order doesn't mean it is total.
    IMO Comparable<T> is a bad design in itself with perfect example of
    `Date implements Comparable<Date>` with `boolean compareTo(Date)`
    and `Timestamp extends Date` overloading `boolean compareTo(Timestamp)`.
    Single Comparable String in CharSequence hierarchy is a good peek too.
    It is bad what java classes can not implement interfaces.
    May be than could be better for operator overloading?
    Equivalence<T> {bool eq(T)};
    PartialOrder<T&gt extends Equivalence<T> {bool le(T); default bool eq(T) {..}}
    Order<T> extends PartialOrder<T> { CMP cmp(T);
    default bool le(T) {..} default bool eq(T){..}}
    Object.class implements Equivalence<Object>;
    OffsetDateTime.class implements PartialOrder<OffsetDateTime>;
    CharSequence.class implements Order<CharSequence>;
    String.class implements Order<String>;

    ReplyDelete
  23. I see this as being analogous to BigDecimal & would thus choose option 2. The 2 OffsetDateTime objects are clearly not equal, but they can't be ordered in any sensible way either (just as you can't say 4.0 > 4.00 or 4.0 < 4.00 for BigDecimal). Plus that I can't think of any case where one would want/need to impose an order on dt1 and dt2, i.e. an implementation which is consistent with equals would always be arbitrarily.

    Also note: Collections.sort(List) is stable: "[...] equal elements will not be reordered as a result of the sort."
    So to me it would be "least surprise" if, when sorting a list of OffsetDateTime, dt1 and dt2 would still be in the same order in the list. And this is exactly what you can only guarantee with option 2. Also I don't want to have to provide a Comparator each time, so it 's definitely 2 for me :)

    Kind regards, Anthony

    ReplyDelete
  24. Option 2.
    It just seem the most sensible thing that...

    1. Two OffsetDateTime instances in different time zones cannot be mutually equal.
    2. When ordering OffsetDateTime's, they have to be normalized to a comparable state (time line).

    ReplyDelete
  25. I missed one option:

    dt1 not equal to dt2, compareTo() based on the instantaneous point on the time-line *then on offset*, "consistent with equals"

    I prefer that one.

    ReplyDelete
  26. I think, the following two lines:
    dt1 = OffsetDateTime.parse("2012-11-05T06:00+01:00");
    dt2 = OffsetDateTime.parse("2012-11-05T07:00+02:00");

    should read as:
    dt1 = OffsetDateTime.parse("2012-11-05T06:00+02:00");
    dt2 = OffsetDateTime.parse("2012-11-05T07:00+01:00");

    only then, the time instants of dt1 and dt2 will be equal.

    ReplyDelete