Friday, 5 October 2007

JSR-310 and Java 7 language changes

Part of the difficulty I'm finding with designing JSR-310 (Dates and Times) is that I constantly come across gaps in the Java language. My concern is that these gaps will shape the API to be less than it should be. Let me give some examples:

BigDecimal

JSR-310 is considering adding classes representing durations. (So is JSR-275, but that's another story.)

The aim of the duration classes is to meet the use case of representing "6 days", or "7 minutes". As a result, the first set of code uses int to represent the amount.

 public class Days {     // code is for blog purposes only!
   private int amount;
   ...
 }

However, what happens when you get down to seconds and milliseconds? Do we have one class that represents seconds, and a separate class that represents milliseconds? That seems rather naff. What would be better is to have a class to represent seconds with decimal places:

 public class Days {
   private double amount;
   ...
 }

But double is a no-no. Like float, it is unreliable: it cannot represent many decimal values exactly (0.1, for example), so arithmetic sometimes produces surprising results. Of course, the answer is BigDecimal:

 public class Days {
   private BigDecimal amount;
 }
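A quick illustration of the double problem, as a sketch (the class name DecimalDemo is just for this example): 0.1 and 0.2 can only be stored as binary approximations, so their double sum is not exactly 0.3, while BigDecimal values created from strings are exact decimals. Note that creating a BigDecimal from a double, rather than a string, would carry the binary error across.

```java
import java.math.BigDecimal;

public class DecimalDemo {
    // Binary floating point: 0.1 and 0.2 are stored as approximations
    static double doubleSum() {
        return 0.1 + 0.2;
    }

    // Decimal arithmetic: values built from strings are held exactly
    static BigDecimal decimalSum() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    public static void main(String[] args) {
        System.out.println(doubleSum());        // prints 0.30000000000000004
        System.out.println(doubleSum() == 0.3); // prints false
        System.out.println(decimalSum());       // prints 0.3
        System.out.println(decimalSum().compareTo(new BigDecimal("0.3")) == 0); // prints true
    }
}
```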

So why am I, and others, reluctant to use this 'correct' solution? It's because we don't have BigDecimal literals and operators, so every calculation has to be written out as method calls. This is a clear case where language-level choices are affecting library-level design for the worse.
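To show what the missing literals and operators cost in practice, here is a minimal sketch of a duration class wrapping BigDecimal. The class Seconds and its methods of, plus and multipliedBy are hypothetical names for illustration, not part of any JSR-310 draft. Values have to come from strings, and what would be (1.5 + 0.25) * 3 with a double becomes a chain of calls:

```java
import java.math.BigDecimal;

// Hypothetical duration class, for illustration only
public final class Seconds {
    private final BigDecimal amount;

    private Seconds(BigDecimal amount) {
        this.amount = amount;
    }

    // No BigDecimal literals, so values are parsed from strings
    public static Seconds of(String amount) {
        return new Seconds(new BigDecimal(amount));
    }

    public Seconds plus(Seconds other) {
        return new Seconds(amount.add(other.amount));
    }

    public Seconds multipliedBy(long factor) {
        return new Seconds(amount.multiply(BigDecimal.valueOf(factor)));
    }

    public BigDecimal getAmount() {
        return amount;
    }

    @Override
    public String toString() {
        return amount.toPlainString() + "s";
    }

    public static void main(String[] args) {
        // With a primitive this would read (1.5 + 0.25) * 3
        Seconds total = Seconds.of("1.5").plus(Seconds.of("0.25")).multipliedBy(3);
        System.out.println(total); // prints 5.25s
    }
}
```

The result is exact, but the arithmetic is harder to read than the primitive version, which is the trade-off described above.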

Immutables

JSR-310 is based around immutable classes, which are now a well-recognised best practice for classes like dates and times. Unfortunately, the Java language is not well set up for immutable classes.

Ideally, there should be language level support for immutable classes. This would involve a single keyword to declare a class as immutable, and could allow certain runtime optimisations (so I'm told...).

 public immutable class DateTime {
   ...
 }

Unfortunately, it is probably too late for Java to do much on this one.
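For comparison, this is roughly how an immutable value class has to be written today, as a sketch with a hypothetical TimeOfDay class. Immutability is purely a convention: a final class, private final fields, no setters, and "with" methods that return new instances. The compiler checks that each field is assigned once, but not the overall contract; for example, nothing would stop a final field holding a mutable array or java.util.Date from being handed out and changed.

```java
// Hypothetical immutable class, written by convention rather than with a keyword
public final class TimeOfDay {
    private final int hour;
    private final int minute;

    public TimeOfDay(int hour, int minute) {
        if (hour < 0 || hour > 23 || minute < 0 || minute > 59) {
            throw new IllegalArgumentException("Invalid time: " + hour + ":" + minute);
        }
        this.hour = hour;
        this.minute = minute;
    }

    public int getHour() { return hour; }
    public int getMinute() { return minute; }

    // "Modifiers" return a new object; this instance never changes
    public TimeOfDay withHour(int newHour) {
        return new TimeOfDay(newHour, minute);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof TimeOfDay)) return false;
        TimeOfDay other = (TimeOfDay) obj;
        return hour == other.hour && minute == other.minute;
    }

    @Override
    public int hashCode() {
        return hour * 60 + minute;
    }

    @Override
    public String toString() {
        return String.format("%02d:%02d", hour, minute);
    }
}
```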

Self types

Another missing Java feature affecting JSR-310 is self types. A self type lets a superclass declare a method whose return type automatically becomes the type of the subclass it is called on:

 public abstract class AbstractDateTime {
   public <this> plusYears(Duration duration) {
     return factory.create(this.amount + duration);   // pseudo-code
   }
 }
 public final class DateTime extends AbstractDateTime {
 }
 // usage
 DateTime result = dateTime.plusYears(duration);

The thing to note is the generic-style 'this' syntax. The effect is that the user of the subclass can call the method and get the correct return type: a DateTime rather than an AbstractDateTime.

This can be achieved today by manually overriding the method in each and every subclass. However, that doesn't work if you want to add a new method to the abstract superclass in the future and have it picked up by every subclass (which is what you want in a JSR, as a JSR doesn't have a perfect crystal ball for its first release).
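A minimal sketch of the manual-override approach, using hypothetical classes and a simplified millisecond-based plusMillis in place of the post's plusYears(Duration). Java 5's covariant return types let a subclass narrow the return type of an overridden method, but only by repeating that override in every subclass:

```java
// Hypothetical classes for illustration
abstract class AbstractDateTime {
    protected final long millis;

    protected AbstractDateTime(long millis) {
        this.millis = millis;
    }

    // Each subclass supplies instances of its own concrete type
    protected abstract AbstractDateTime create(long newMillis);

    public AbstractDateTime plusMillis(long amount) {
        return create(millis + amount);
    }
}

final class DateTime extends AbstractDateTime {
    DateTime(long millis) {
        super(millis);
    }

    @Override
    protected DateTime create(long newMillis) {
        return new DateTime(newMillis);
    }

    // Covariant override, needed in every subclass purely to narrow the return type
    @Override
    public DateTime plusMillis(long amount) {
        return (DateTime) super.plusMillis(amount);
    }
}
```

With this, `DateTime later = new DateTime(0L).plusMillis(1000L);` compiles without a cast. A method added to AbstractDateTime later would still return AbstractDateTime from a DateTime until every subclass gained a matching override.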

Again, a language-level missing feature severely compromises a library-level JSR.
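One partial workaround available since Java 5 is the recursive generics idiom (sometimes called F-bounded polymorphism). The sketch below uses hypothetical class names and the same simplified millisecond arithmetic. Methods written once in the base class, including ones added later, return the subclass type without any override:

```java
// Hypothetical classes showing the recursive generics idiom
abstract class BaseDateTime<T extends BaseDateTime<T>> {
    protected final long millis;

    protected BaseDateTime(long millis) {
        this.millis = millis;
    }

    // Each subclass creates instances of its own type
    protected abstract T create(long newMillis);

    // Written once; returns the subclass type T with no override or cast
    public T plusMillis(long amount) {
        return create(millis + amount);
    }

    // A method added later is picked up by every subclass automatically
    public T plusSeconds(long seconds) {
        return plusMillis(seconds * 1000L);
    }
}

final class UtcDateTime extends BaseDateTime<UtcDateTime> {
    UtcDateTime(long millis) {
        super(millis);
    }

    @Override
    protected UtcDateTime create(long newMillis) {
        return new UtcDateTime(newMillis);
    }
}
```

The drawbacks are that the type parameter leaks into the public API (users see BaseDateTime<UtcDateTime>), and nothing stops a class from declaring itself as extending BaseDateTime<UtcDateTime> with the wrong type argument, so this only approximates a true self type.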

Operator overloading

Another area where JSR-310 may choose the 'wrong' approach is int-wrapper classes. For JSR-310 this would be a class representing a duration in years.

If I were to give you a problem description that said you need to model 'the number of apples in a shop', and to be able to add, subtract and multiply that number, you'd have a couple of design options.

The first option is to hold an int and perform regular maths using +, - and *. The downside is that only javadoc tells you that the int is a number of apples, instead of a number of oranges. This is exactly the situation collections were in before generics, when only javadoc said what a List contained.

The second option is to create an Apples class wrapping the int. Now, you cannot confuse apples with oranges. Unfortunately, you also cannot use +, - and *. This is despite the fact that they are obviously valid in this scenario. Using plus/minus/multipliedBy/dividedBy methods just doesn't have the same clarity.
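Both options side by side, as a sketch with hypothetical Apples and Oranges classes and method names. The int version compiles even though it mixes apples with oranges; the wrapper version rejects that mix at compile time, but turns (stock + delivery) * 2 into a chain of method calls:

```java
// Hypothetical wrapper types illustrating the trade-off
final class Apples {
    private final int count;

    Apples(int count) { this.count = count; }

    Apples plus(Apples other) { return new Apples(count + other.count); }
    Apples minus(Apples other) { return new Apples(count - other.count); }
    Apples multipliedBy(int factor) { return new Apples(count * factor); }

    int getCount() { return count; }
}

final class Oranges {
    private final int count;

    Oranges(int count) { this.count = count; }

    int getCount() { return count; }
}

class FruitDemo {
    // Option 1: plain ints; this compiles even though it adds oranges to apples
    static int withInts() {
        int apples = 10;
        int oranges = 4;
        return (apples + oranges) * 2;
    }

    // Option 2: wrappers; stock.plus(new Oranges(4)) would not compile
    static Apples withWrappers() {
        Apples stock = new Apples(10);
        return stock.plus(new Apples(4)).multipliedBy(2);
    }
}
```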

Given this choice today, most architects/designers seem to be choosing the first option of just using an int. Language designers should really be looking at that decision and shaking their heads. The whole point of the generics change was to move away from reliance on javadoc. Yet here, in another design corner, people still prefer a lack of real safety and javadoc to 'doing the right thing'. Why? Primarily because of the lack of operator overloading.

Ironically, I think this is an area that really might change in the future. But if JSR-310 has chosen to use int rather than proper classes by that point, a great opportunity will have passed.

Summary

My assertion that language-level features (or rather missing features) affect library design isn't really news. The interesting thing with JSR-310 is just how constraining the missing features are proving to be.

The trouble is that with no clear idea on where the future of the Java language lies, a JSR like 310 cannot make good choices which will fit well with future language change. The danger is that we simply create another date and time library that doesn't fit well with the Java language of 2010 or later.

Opinions welcome on whether JSR-310 should completely ignore potential language changes, or try to make a best guess for the future.

32 comments:

  1. BigDecimal: the real problem, I think, has nothing to do with BigDecimal having no literals and operators; it's that it's slow. I could certainly understand if some would say "hey, I'd rather have some slight error in my milliseconds than bad performance" (it might very well be that BD is quite fast nowadays, but then it at least still has a bad rep).

    Immutables: could you elaborate on this? I don't see any big problems with the way we do it now?

    Self types: I can see that this would be useful, would it allow things like List<>?

    Operator overloading: well this subject has been beaten to death, but I must say your Apples and Oranges example doesn't really convince me. The same could also be accomplished by introducing type aliases or typedefs, where you could say:

    type foo = long;

    Which would have the same operations as a long without actually being the same type (I would introduce this for all types so we can also use it to get rid of those lengthy generified types like List, Value>>>)

  2. Yet another example of how the power features of Java also hinder development of APIs like JSR-310, the problem being that you don't have some of the freedoms found in other languages =(
    Even if the API is not the one you envisioned, I hope it helps set the Date/Calendar "fiasco" straight.

  3. As I understand it, IBM's BigDecimal class got into Java 5, with performance improvements and MathContext. I believe the lack of operator overloading *is* the problem; C++'s abuse of << and >> is not a good reason to leave it out of Java.

    If we could define new int types like Ada, I hope they wouldn't be assignable to and from int without casting, which would still leave JSR-310 out of date.

  4. How I wish a gang of bottom-up people like Stephen worked for Sun and had the power to reform and bring forth a Java 3. Yeah I know, I'm a dreamer...

  5. Casper, there’s a team of people at EPFL in Switzerland working on Scala http://www.scala-lang.org/, which looks a lot like Java 3 should to me. Others might recommend Groovy.

    Stephen, if you write something that works well with Scala, Groovy, Rhino, Jython, etc. and leaves Java to the enterprise developers, I’d be happy. Except when I’m doing enterprise development.

  6. Stephen Colebourne, 6 October 2007 10:33

    On BigDecimal, from reading the code I believe that the performance should now be OK (it has a fast and a slow mode internally). As a result, it's the literals and operators that are holding us back.

    My immutable point was that if the compiler and hotspot runtime had the info that an object is immutable, then certain performance optimisations become possible.

    I've not heard of List being possible using self-types, although as a return type I think it could be very useful (and actually very necessary).

    I agree that type aliasing might be a solution to some of the operator overloading questions, but I'm not yet convinced that it's the whole solution.

    On Java 3, I do believe that the best thing for Java would be a mildly incompatible new version with a supplied migrator that converts Java 2 code to Java 3. What I mean is a language that at first glance looks very close to Java (rather like Groovy).

    On Scala, I think it has great potential, however I hate where they place the type (val surname : String). As a result it's not my Java 3.

  7. I think operator overloading and self types would both be really nice additions.

    I have talked with many developers about operator overloading in Java, and most of them oppose the feature. They said it would introduce a new level of confusion, where many APIs would tend to redefine operator behaviours.

    I believe the feature itself does no harm at all; the harm comes from how it is used. Better design patterns and conventions might prevent such problems.

    I would love to use both features.

    Thanks for mentioning 310.

  8. "On Scala, I think it has great potential, however I hate where they place the type (val surname : String). As a result it's not my Java 3."

    As long as you guys get hung up on trivial issues like this, you will always be stuck with crappy mainstream languages.

  9. Slava: you prefer to be stuck with a language nobody will ever use.

  10. Torbjörn Gannholm, 7 October 2007 20:05

    You lost me, I don't understand the problem. What's wrong with representing a duration as an array of long amounts of milliseconds?

    Just because you want to have Days doesn't mean your internal representation has to be that. And who wants a decimal number of days?

  11. Maybe it's just that it's too late, but I don't understand this at all.

    1. BigDecimal. So there are no operators and you'll have to use method calls. It sounds crazy to me that this would prevent somebody from using a certain data type. For instance, I don't understand how people managed to design Vector and Matrix libraries, which of course have the same problem in a far worse form.

    2. Immutable. I don't understand what 'immutable' would do that we can't already do by putting 'final' on all fields. And what kind of optimizations would it deliver?

    3. Selftype. "This can be achieved today by manually overriding the method in each and every subclass. However that doesn't work if you want to add a new method to the abstract superclass in the future and have it picked up by every subclass (which is what you want in a JSR, as a JSR doesn't have a perfect crystal ball for its first release)."

    What's the point? If you're adding a method to the superclass, you're changing the API (I mean, there's no existing client calling that method), so where's the problem in adding the methods to the subclasses too? Does manually adding a few lines to subclasses seriously disrupt the ability to design an API? :-)

    "Again, a language-level missing feature severely compromises a library-level JSR."

    So how did people manage to design the 309 JSRs before Date and Time? :-)

  12. PS Couldn't the 'self type' be replaced by this?

    class AbstractDateTime
    {
    T plusYears() { return (T)new AbstractDateTime(); }
    }

    class DateTime1 extends AbstractDateTime
    {
    }

    class DateTime2 extends AbstractDateTime
    {
    }



    class X
    {
    public void method()
    {
    DateTime1 dt1a = new DateTime1();
    DateTime1 dt1b = dt1a.plusYears();
    DateTime2 dt2a = new DateTime2();
    DateTime2 dt2b = dt1b.plusYears();
    }
    }

  13. Of course the last line in the code above should be:

    DateTime2 dt2b = dt2a.plusYears();

  14. ... and the method signature should really be:

    T plusYears() { return (T)new AbstractDateTime(); }

  15. Torbjörn Gannholm, 8 October 2007 05:47

    Your use case for self types is also false. How can you be sure that addition is carried out the same way in different sub-classes of AbstractDateTime? I would say it is more likely that it isn't, that different sub-classes would represent different time systems with different basic units that would be converted. Or, if not very different units, at least different precisions, which amounts to basically the same thing.

    The different precisions is what makes your BigDecimal use case false, too, by the way.

    I am glad you are starting to see my point about immutable classes, though.

  16. "My immutable point was that if the compiler and hotspot runtime had the info that an object is immutable, then certain performance optimisations become possible"

    Is there any information you can give to support that statement? (Not that I don't believe you, I just want to see if it would be worth it)

  17. Operator Overloading isn't necessary if you take an alternate approach in the object model. If you allow for logical definition of numeric types as a subset of an existing numeric type, you'd be able to have separate classes for apples and oranges such that you can do basic arithmetic with either apples or oranges but you can't mix them. Sun has, so far, not been terribly receptive of this alternative.

  18. Because 1/60 doesn't have an exact BigDecimal representation, BigDecimal isn't really any better for the value of Days than double (or float). You would need to use some sort of rational fraction to avoid the same sort of unexpected results as plague naive users of binary floating point types.
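The 1/60 point can be checked with the standard java.math API, as a small sketch (OneSixtieth is just a name for this example). Exact division throws an ArithmeticException because the decimal expansion never terminates, and once the quotient is rounded to a fixed precision, multiplying it back by 60 no longer gives exactly 1:

```java
import java.math.BigDecimal;
import java.math.MathContext;

public class OneSixtieth {
    // Exact division fails: 1/60 = 0.01666... has no terminating decimal expansion
    static boolean exactDivisionThrows() {
        try {
            BigDecimal.ONE.divide(BigDecimal.valueOf(60));
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    // Round 1/60 to 16 significant digits (MathContext.DECIMAL64), then multiply back by 60
    static BigDecimal roundTrip() {
        BigDecimal oneSixtieth = BigDecimal.ONE.divide(BigDecimal.valueOf(60), MathContext.DECIMAL64);
        return oneSixtieth.multiply(BigDecimal.valueOf(60));
    }

    public static void main(String[] args) {
        System.out.println(exactDivisionThrows()); // prints true
        System.out.println(roundTrip());           // a value slightly greater than 1
    }
}
```

A rational type holding numerator and denominator would represent 1/60 exactly, but no such class exists in the standard library.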

  19. > public immutable class DateTime

    So you are asking for a "const" keyword?

  20. Stephen Colebourne, 8 October 2007 18:21

    @Mark, the same thought on 1/60 has been on my mind too. Is there any reason why a future language wouldn't offer rational fraction as a basic numeric type (instead of decimal/double/float)?

    @David, I agree that there is more than one solution to the issues/use cases I raised, and operator overloading is only part of the problem. Aliasing types (separately) could easily be part of the solution.

  21. On BigDecimal: the performance is acceptable for most apps that REALLY need this (e.g., financial stuff), but it's still a couple of orders of magnitude slower than primitive numeric types. Depending on the algorithms you plan to have there, BigDecimal is still a no-no, especially for algorithms that require iteration: a for loop that creates thousands of BigDecimal objects is still a pain, even with modern GCs. The lack of mutable BigInteger/BigDecimal types is a huge blunder. (I implemented a Mandelbrot generator with BigDecimal, and it sucks ass.)

    On 'immutable' keyword: Not necessary, this is a very good use case for annotations (JSR-305 should be a good host for an @Immutable annotation).

  22. While some languages already have rational type libraries, I think it unlikely that future general purpose languages would offer rationals instead of decimal/double/float. For all their complications, floating point types are just too useful (and have good hardware support). After all why do people spend so much time agonising over differences in the 15th decimal place of measurements that are only accurate to 6 places anyway? And of course why did they print all 15 places?

    Back to durations. What are you trying to achieve with fractional days anyway? Why not just measure durations in double seconds. A double can represent integer seconds exactly up to 2^53 (quite a while!). This means that all the 'human' scale durations are represented exactly, while still handling other values seamlessly. Formatters (and conversions to days:hours:minute ... representations) will require some care in implementation, but this need only be done once.
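The 2^53 claim can be checked directly, as a small sketch (DoubleSeconds is a hypothetical name). Below 2^53 every whole number is a distinct double, while at 2^53 the gap between adjacent doubles becomes 2, so adding one second is lost. Fractional values such as 0.001 seconds are still approximations, which is where the formatting care comes in.

```java
public class DoubleSeconds {
    static final double LIMIT = 9007199254740992.0; // 2^53

    // Just below 2^53, neighbouring whole numbers are distinct and adding 1 is exact
    static boolean exactBelowLimit() {
        return (LIMIT - 1) != (LIMIT - 2) && (LIMIT - 1) + 1 == LIMIT;
    }

    // At 2^53 the spacing between doubles is 2, so 2^53 + 1 rounds back to 2^53
    static boolean lostAtLimit() {
        return LIMIT + 1 == LIMIT;
    }

    public static void main(String[] args) {
        System.out.println(exactBelowLimit()); // prints true
        System.out.println(lostAtLimit());     // prints true
        // 2^53 seconds in years of 365.25 days: roughly 285 million years
        System.out.println(LIMIT / (365.25 * 24 * 60 * 60));
    }
}
```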

  23. On immutable, a simpler approach is to use a marker interface. This makes it possible to identify an immutable object using existing Java syntax. As for performance gains, a compiler could already recognise a class that had no mutators and was thus immutable. However, another property often suggested in conjunction with immutability is value semantics --- where == maps to equals(). This does allow efficiency gains at least for small objects. In particular arrays can just contain the values instead of references to objects. This saves a reference and an object header for every entry.

  24. Slava Pestov is right about Java folks not taking time to expand their minds a little and learn from other languages like Scala, OCaml, ML, Haskell...

    Microsoft is doing massive amounts of language R&D right now and C# 3.0 is the result. They aren't sitting still either -- they will be taking on the multi-core problem next.

    The Scala guys are on the right track.

  25. For a duration type class I have been using an immutable class that represents a starting and ending time (DateRange). My common use case is to do something for every nth interval, where the interval could be hours, days, weeks, etc. For this I have an iterator class that takes the range, the interval, and an instance of a class representing the function to perform. At the application level you can define a constant range and use it with any number of constant iterators. The iterator can use any of the basic calendar intervals or be user defined, such as Friteenth (the Friday before the 3rd Sunday).

  26. I do not have any opinion on BigDecimal, self types and operator overloading right now, but I like what you say about immutable at language level.
    It is actually a case where we as programmers face an idiom that we currently describe as a design pattern, like Singleton, Builder, Decorator etc.
    What about:
    public singleton mySingleton {
    ...
    }
    or
    public class CRCOutputStream decorate OutputStream {
    ...}

    This is basically a need for a domain specific language for ourselves, which I have occasionally used precompilers to address.
    There is some existing magic already (take transient, for instance).

    But basically, couldn't much of the trouble be avoided using @annotations?

  27. It seems to me that, sadly, there was never any real thought given to the treatment of mutability and immutability in the Java libraries. An excellent counter-example can be found in Apple's Cocoa frameworks (most notably in the Foundation framework), which establish the convention that immutable classes are base classes that have mutable descendants.

    Setter methods for immutable properties are declared in the mutable subclasses. The root class (NSObject) implements copy() and mutableCopy() methods that can be overridden as necessary by subclasses ('copy' being somewhat analogous to Java's clone() method), meaning that any mutable instance can provide an immutable copy of itself, and vice versa, through a simple and pervasive API.

    Come to think of it, it might be helpful to take a look at how time intervals are handled in the Foundation framework. I haven't worked with Cocoa in a while, but as I recall, its handling of dates and time values was fairly elegant. I know the Foundation framework represents time intervals as a double (typedefed as NSTimeInterval), where the integral portion represents seconds. I'm not sure precisely what the code does under the covers to manage rounding issues, but I can't imagine that it would be all that daunting.

  28. I think there's a lot of people who don't understand why we need a Duration object. Why not just a number of milliseconds? Well, because not all days are created equal. Where I live, in a place with daylight saving time, one day each year has 23 hours and one day has 25 hours. The other days all have 24 hours. Likewise (and more obviously) not all months have the same number of days.

    Any proposed implementation of Duration that doesn't take these basic requirements into account isn't worth implementing. So, just a number of milliseconds isn't good enough. And likewise neither float nor BigDecimal is of any use in a Days object. "6 days" makes sense, but "2.3 days" doesn't.
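The unequal-days point can be checked with the existing java.util.Calendar API, as a sketch that assumes the JDK's bundled time-zone data and uses Europe/London as an example zone (the class and method names are hypothetical). In 2007 UK clocks went forward on 25 March and back on 28 October, so the gap between consecutive local midnights on those days is 23 and 25 hours:

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

public class DstDayLength {
    // Hours between local midnight on the given date and local midnight the next day
    static long hoursInDay(String zoneId, int year, int month, int day) {
        Calendar start = new GregorianCalendar(TimeZone.getTimeZone(zoneId));
        start.clear();
        start.set(year, month, day, 0, 0, 0);
        Calendar end = (Calendar) start.clone();
        end.add(Calendar.DAY_OF_MONTH, 1); // next local midnight
        return (end.getTimeInMillis() - start.getTimeInMillis()) / (60L * 60 * 1000);
    }

    public static void main(String[] args) {
        System.out.println(hoursInDay("Europe/London", 2007, Calendar.MARCH, 25));   // 23
        System.out.println(hoursInDay("Europe/London", 2007, Calendar.OCTOBER, 28)); // 25
        System.out.println(hoursInDay("Europe/London", 2007, Calendar.JUNE, 1));     // 24
    }
}
```

A duration stored as a fixed number of milliseconds per day would get two of these three days wrong.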

    I know you weren't talking about that specifically, you were just using it as an example to beat on the Java language designers. But clearly there's a lot of responders that don't get it.

  29. Stephen Colebourne, 13 October 2007 00:18

    @Osvaldo, Thanks for the performance indication on BigDecimal. Maybe we need the bytecode extensions being discussed elsewhere (MLVM) to include decimal/rational handling.

    @Kevin, I'll blog more about my opinion on Scala at some point :-)

    @Neils, Your singleton, related to immutable, is another design pattern that would be better expressed in the language.

    @Paul, Your explanation on duration adds the clarity which I didn't quite get to. The best approach will probably end up being one class for scientific precise durations, and another for human-scale durations/periods like days & months.

    BTW, I don't entirely agree that 2.3 days is not useful. At work I can enter 2.5 hours into my time-recording program. However it is instantly converted to 2 hours 30 minutes, so perhaps the double is just used as a constructor/factory, and int used internally.

  30. Dunno if you have seen it, Stephen, but your reasoning is being challenged on this blog:
    http://cafe.elharo.com/java/operator-overloading/

    (Still trying to figure out what 'group' and 'ring' he is referring to.)

    /Casper

  31. Stephen Colebourne, 13 October 2007 14:27

    @Casper, thanks for the link - I'll have to post about operator overloading separately soon

  32. My own pet project:

    http://pec.dev.java.net/nonav/compile/index.html

    Is an extended compiler for patterns, one of the patterns is immutable. The compiler uses a marker interface and enforces immutability.
