Tuesday, 21 September 2010

Two features for #bijava

In my last post I brought up the idea of a backwards incompatible version of Java. What might two of those incompatible changes look like?

Backwards incompatible Java - #bijava

Firstly, a reminder. The key to this proposal is that these changes still result in a language that is recognisably Java. The goal is to enhance the working lives of 10 million developers, most of whom do not have the luxury of simply switching to another language. Improving their productivity by just 5% would be a huge benefit globally.

Secondly, it's obviously the case that these ideas are not new. Scala, Groovy and Fantom are all trialling different approaches to a better Java right now. What hasn't been discussed is taking some of these new approaches directly into Java, once the absolute hurdle of backwards incompatibility is removed.

Remove primitives, Add nullable types

The first example is primitives and nullable types. The approach here is interesting, in that we remove a language feature in order to add a better one.

Exposed primitives are a pain in the Java language. They were added in version 1.0 for performance and to attract developers from C/C++. Version 1.0 of Java was slow enough as it was, so performance mattered. And attracting developers is a key reason to add any new feature.

However, primitives have caused major problems as the language has evolved. Today, for example, you cannot have a generified list of a primitive type. The split between primitive and object types causes this.
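To see the split concretely in today's Java: generics only accept reference types, so a list of ints has to be a List&lt;Integer&gt;, boxing every value on the way in and unboxing it on the way out. A minimal sketch (the class and method names are illustrative):

```java
import java.util.Arrays;
import java.util.List;

public class PrimitiveSplit {

    // Generics cannot be parameterised with a primitive, so List<int> is
    // illegal; each int is silently boxed into an Integer object instead.
    static int sum(List<Integer> nums) {
        int total = 0;
        for (int n : nums) {   // each element is unboxed back to an int here
            total += n;
        }
        return total;
    }

    public static void main(String[] args) {
        // List<int> nums = ...;   // does not compile today
        List<Integer> nums = Arrays.asList(1, 2, 3);  // three ints boxed
        System.out.println(sum(nums));  // prints 6
    }
}
```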

The more appropriate solution is for every type in the source code to be an object. It is then the compiler's job to optimise, using primitives where it makes sense. Doing this effectively is where nullable types come in.

Nullable types allow you as a developer to specify whether any given variable can or cannot contain null. Clearly, an int is at a high level equivalent to an Integer that cannot contain null. Equivalent enough that a smart compiler could optimise away the difference and use the primitive under the covers when the variable is declared as non-null.

But it turns out that academic research has shown that programmers actually intend most variables to be non-null. Thus, we need to change the meaning of variable declarations.

// Java code - (ignoring closures for this example)
 public Person findPerson(List<Person> people, String surname) {
   if (surname != null) {
     for (Person person : people) {
       if (surname.equals(person.getSurname())) {
         return person;
       }
     }
   }
   return null;
 }
 // #bijava code
 public Person? findPerson(List<Person> people, String? surname) {
   if (surname != null) {
     for (Person person : people) {
       if (surname.equals(person.getSurname())) {
         return person;
       }
     }
   }
   return null;
 }

Here we have added the ? symbol to any variable that can hold null. If a variable cannot hold null, then it does not have the ? on its type. By doing this, the compiler can check the null-handling behaviour of the code. For example, the line "surname.equals(...)" would not compile without the previous check ensuring that surname is non-null.
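To make the benefit concrete, here is the run-time failure that today's Java permits and that the ? check would turn into a compile-time error. This is plain current Java with illustrative names, not #bijava:

```java
import java.util.Arrays;
import java.util.List;

public class NullGuard {

    // Today the compiler accepts this even though surname may be null;
    // only the explicit guard stands between us and a NullPointerException.
    static String find(List<String> names, String surname) {
        if (surname != null) {      // remove this guard and find(names, null) throws NPE
            for (String name : names) {
                if (surname.equals(name)) {
                    return name;
                }
            }
        }
        return null;                // in #bijava, only a String? return type could do this
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Smith", "Jones");
        System.out.println(find(names, "Jones"));  // prints Jones
        System.out.println(find(names, null));     // prints null, not an exception
    }
}
```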

In summary, this is a classic change which cannot be made today. Removing primitives would break code, as would changing the meaning of a variable declaration so that the default is non-null. Yet both are good changes.

The point here is that the resulting language is still effectively Java. We haven't scared off lots of developers. It's a challenging change for the language designer and compiler writer, but it results in much better code for 10 million developers.

Equals

The second example of an incompatible change is the equals method.

In Java today, we use the .equals() method all the time for comparing two objects, yet for primitives we have to use ==. The reasons are ones we rarely think about, yet if we take a step back it's clearly odd.

Given how frequently the .equals() method is used, it makes perfect sense to have an operator for it. Clearly, the right choice for the operator is ==. But we can't make this change to Java today, as it is backwards incompatible.

But, with #bijava, this change can be made. The existing == operator is renamed to ===, and a new operator == is added that simply compiles to .equals(). (Technically, it has to handle null, which is another reason why nullable types help.)
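Sketched in plain Java, that null handling might look like the following helper (the name eq is illustrative, not part of the proposal): identical references, including two nulls, compare equal; a single null does not; everything else delegates to .equals().

```java
public class EqualsOperator {

    // Sketch of what '==' could compile down to in #bijava:
    // identity first (today's ==, the proposed ===), then null handling,
    // then delegation to .equals().
    static boolean eq(Object a, Object b) {
        if (a == b) {                   // same instance, or both null
            return true;
        }
        if (a == null || b == null) {   // exactly one side is null
            return false;
        }
        return a.equals(b);             // value comparison
    }

    public static void main(String[] args) {
        System.out.println(eq(new String("foo"), "foo"));  // true: equal values, distinct instances
        System.out.println(eq(null, null));                // true: both null
        System.out.println(eq("foo", null));               // false: only one side is null
    }
}
```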

// #bijava code
 public Person? findPerson(List<Person> people, String? surname) {
   if (surname != null) {
     for (Person person : people) {
       if (surname == person.getSurname()) {
         return person;
       }
     }
   }
   return null;
 }

As shown above, this change, seen in many other languages, has a huge impact on the readability of code. If you are working today, try spending 5 minutes replacing .equals() by == in some of your code, and see the readability benefits.
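The confusion the remapping removes is easy to reproduce in today's Java, where == on objects compares identity, not value:

```java
public class BoxedEquality {
    public static void main(String[] args) {
        Integer a = new Integer(1000);     // two distinct Integer instances
        Integer b = new Integer(1000);     // holding the same value
        System.out.println(a == b);        // false: == compares identity today
        System.out.println(a.equals(b));   // true:  .equals() compares value
        System.out.println(1000 == 1000);  // true:  primitives already compare by value
    }
}
```

Under the proposal, the first two comparisons would give the same answer, and === would remain available for the rare cases where identity genuinely matters.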

Of course this is another example of a change where we need a backwards incompatible version to gain the benefit.

Summary

Neither of these changes is really that radical. It is entirely possible to write a tool that converts source code from the old form to the new and back again. The tool, plus the JVM bytecode, provides the backwards compatibility story necessary to reassure managers.

Some will say that these examples aren't radical enough. And they're right. (They are just two proposals of many for what would be included in #bijava.) But the key point is that #bijava must be close enough to Java (post JDK 7/8) that the huge body of 10 million developers can be brought along without excessive training or learning needs. Specifically, each change above can be taught to a developer in less than five minutes just standing at their desk.

It also means that #bijava is not a threat to Scala, Groovy, Clojure, Fantom or whatever your favourite language is. Each of these has its own target area, and Java cannot and will not morph into them. Thus, they are free to continue to argue their own case as to why developers should just move away from Java completely.

#bijava isn't a panacea. It will not result in all the changes we might want. But it changes the mindset to allow some incompatibilities between versions of Java, provided the right supporting infrastructure (modules, conversion tools) is present.

Feedback welcome as always! Please use #bijava to talk about backwards incompatible Java on twitter.
PS. I'll get comments invalidly marked as spam out as soon as I can!

30 comments:

  1. I know you've rejected checked exceptions, but would you be interested in this compromise for bijava?
    http://blog.lidalia.org.uk/2010/01/checked-and-unchecked-exceptions.html

    ReplyDelete
  2. More on the topic of this post - when it comes to equals I would favour going down the Scala path and just allowing what we call operators as method names, and allow single argument methods to be called as instance methodname argument.

    ReplyDelete
  3. It might help if you could declare types (classes) for which the === operator was not permitted. That is, these types would always be compared by value. This avoids the problem of distinguishable instances of new Integer(3).

    ReplyDelete
  4. I note that when it comes to comparing Wrapper types and primitives e.g. (Double) 0.0 == 0.0, arithmetic comparison is used in preference to object comparison.

    ReplyDelete
  5. If the compiler/JVM was smart enough to optimise List or Map correctly, this could be preferable to using the Trove4j libraries.

    ReplyDelete
  6. To me it looks like Google, not Oracle, will be in a better position to implement biJava. They have a lot of the elements ready with the Java fork for Android, and will probably need to give up Java compatibility anyway.
    Why not make the language better in the process?

    ReplyDelete
  7. Stephen Colebourne21 September 2010 at 21:29

    @Robert, I can guarantee that a language in the style of Java (as #bijava is) will not have method-name operators or methods called using spaces. It's a completely different style of language, and it encourages the worst sides of operator overloading.

    ReplyDelete
  8. Sorry if this is a bad question (I am kinda new). Why don't you prefer validators instead, where you can have a @NotNull annotation? Imagine if that gets built into the Java spec and introduced in JDK7: we could do crazy validation stuff directly, which can be more powerful than "?" and backwards compatible. I was introduced to validators at a seminar by Mike Keith, and I was impressed, but I believe it would be awesome to have them in the JDK directly.

    ReplyDelete
  9. In a Java Posse thread I mentioned that Fantom's type system is unsound, but did not go into it further because it was off topic. Apparently here's my chance to explain.

    The issue is observability of fields' values before they are initialized. There are some simple fixes and work arounds:

    - A null check on every non-null field read (given correct code it's virtually free with branch prediction). *It is unsound.*

    - All fields must be initialized before the super-constructor is called (or directly afterwards and calls to overridable methods are disallowed during construction); access to this is disallowed before then. Sound, but has a lot of slack: You can't create two objects referencing each other in final non-null fields.

    - Delayed types (MS research for Spec#)

    - Pre-objects (Dissertation of Richard Cobbe: "Much ado about nothing : putting Java's null in its place")

    Frankly I think the first option (Fantom) is best, but there should be warnings for potentially unsafe access to this during construction. Delayed types are a close second.

    With kind regards
    Ben

    ReplyDelete
  10. @Mohamed Unfortunately the annotation JSR 305 (which has @Nonnull) is dormant, and JSR 308 which would allow such annotations in more places has been delayed until JDK8. I agree that this could deliver the 'nullable/nonnullable' feature suggested by Stephen.

    ReplyDelete
  11. The idea of having to put @NotNull in front of a parameter or field when research suggests 2/3 of parameters are intended to be non-null fills me with dread. Java is verbose already.

    ReplyDelete
  12. The JSR 308 proposal would allow the specification of a default nullability for a type, so you could make @NotNull the default. I prefer the annotation to Stephen's '?'.

    ReplyDelete
  13. So 1/3 of references will then need @Nullable - 9 characters - in front of them in a language already somewhat obfuscated by its verbosity. A one character ? distinguishing between references that can and cannot be null strikes me as simple, intuitive and light years better than all that extra noise.

    ReplyDelete
  14. Stephen, I definitely like the idea of BIJava.
    But I also believe that all incompatibilities should result in a compile-time error, in order not to silently change the semantics of existing code. I would therefore prefer keeping the semantics of the == operator as is and introducing a new one for equals(). Candidates could be ~~ or ~=.

    ReplyDelete
  15. I disagree with the nullable proposal, because nulls just should not be used, ever.

    I think Option (or Maybe) should be added to the standard library, and all JDK APIs that operate on nulls should be switched to options (with the current null versions marked deprecated).

    ReplyDelete
  16. I think these would be two fantastic changes to the language. I would prefer the ? instead of @NotNull for the reasons already stated, Java is already very verbose.

    ReplyDelete
  17. I like your idea of a #bijava. But I think that your part on .equals() looks a lot like operator overloading. And I think that is something we could hope to see in a new, incompatible version of Java.

    I mean, if primitive types don't exist anymore, then operator overloading would be a huge improvement to readability, just as replacing .equals() with == would be.

    Groovy has a nice example of how to make operator overloading easy, by the way (http://groovy.codehaus.org/Operator+Overloading). Easy to read, not hard to understand.

    ReplyDelete
  18. Java used to be nice and clean, until idiots like you made it incomprehensible. I've moved on to C# already. Good luck with your rat's nest of a language.

    ReplyDelete
  19. Stephen Colebourne23 September 2010 at 20:47

    @Mark, I agree that unifying == and === for certain types like Integer makes perfect sense.

    On @NotNull vs ?, I consider the class-level @NotNull to be too remote from the definition of the actual type/variable. By being close (and not verbose) it makes devs think more closely about their definitions.

    @Ben, I'm interested that you find that aspect of Fantom to be a concern. Thanks for clarifying the unsound comment though.

    @Stepan, The Scala/Haskell Option type requires a functional mind-set, which is a long way from mainstream. It's also a lot more verbose. If you like that style, use Scala.

    ReplyDelete
  20. My list:

    * make Array a true class of Collections
    * hybrid types: I once had a function that would only work with serializable collections; I had to use generics to make it work. I want "Serializable Collection sc;"
    * unless/until: much easier to read than if(!a) or while(!a)
    * fix up the library in all its nasty places: let's make Date et al immutable, and Stack not a subclass of Vector. Maybe a compiler could just wrap/unwrap as necessary between bijava and regular Java?

    ReplyDelete
  21. The reality of Fantom is slightly different at the moment. A variable marked non-null can be assigned null because the compiler does not insert null checks on field reads; that is my main concern:

    class Fields {
      static Void main() {
        //Str x := null; // compile time error
        Str x := Fields.make().unsound; // output: null
        echo(x);
      }

      Str unsound;

      new make() {
        this.unsound = unsound0();
      }

      Str unsound0() {
        return unsound;
      }
    }

    Another concern is that the "specification" [1] states that "[a] non-nullable type is guaranteed to never store the null value." It should go into the issues of field reads and explain why a NPE might be thrown when reading a non-null field.

    With kind regards
    Ben

    [1]: http://fantom.org/doc/docLang/TypeSystem.html

    ReplyDelete
  22. Stephen Colebourne23 September 2010 at 23:48

    @Ben, if that compiles, then I would see it as a bug. Code was added to the compiler to check that all fields are non-null at the end of the constructor.

    ReplyDelete
  23. I am for sugaring equals -> ==.

    Since it's gone without saying until this point, we should also make sure the other comparative basics are covered, !=, >, <, >=, <=. Aside from !=, those would obviously check for a Comparable implementation to be valid. So much clearer than checking equality against the returned number.

    As long as we're being backwards incompatible, I'd also have the compareTo method return an enumeration, because returning integers was confusing.

    As for identity equals, do we need another operator for that? Perhaps a protected final method on Object would be sufficient for anyone longing for the old == Object behavior. Alternatively, System.identityEquals().

    Identity equals should still be the default equals implementation, naturally, but I don't otherwise see much use for it. Perhaps I am biased, simply because I've so rarely needed it.

    ReplyDelete
  24. I think you're referring to definite assignment [1] which does not cover this case.

    With kind regards
    Ben

    [1]: http://fantom.org/sidewalk/topic/595

    ReplyDelete
  25. Funny to see that most people are talking about things Scala fixed years ago...

    Example: equals vs. ==

    ReplyDelete
  26. Artur Biesiadowski27 September 2010 at 09:49

    I think that the entire problem with removing primitive types is around arrays (the real ones, which would exist somewhere behind the scenes, even if you want to hide them from the world).

    You have basically the following choices:
    1) As soon as an array is concerned, everything is wrapped in an object. Performance/memory nightmare.
    2) You have some kind of reified generics and require the JVM to create multiple versions of the same code depending on the target type (at least at the JIT level). Not sure, but I think this is what C#/.NET has chosen to do?
    3) Marker bits for small integers, as in many (if not all) implementations of Smalltalk. Some performance hit for every single operation, and it doesn't really help with the double type (which I'm probably most concerned about).

    Unless I have misunderstood you and you just want to be able to type 10.hashCode(), which is not a problem at all.

    ReplyDelete
  27. @Artur, I think that an array of Double that doesn't allow null or identity equality of elements can be translated transparently to double[].

    ReplyDelete
  28. Would nullable types also extend to generics?

    I'm not sure why they wouldn't, but there was no example in the text.

    Could prove useful:
    List

    Could also get a bit strange:
    List

    ReplyDelete
  29. To my knowledge, there isn't a "good" solution yet to the problems of initialisation with nullable types. By "good" I mean flexible, low overhead for programmers, and sound. To take a couple of examples, having to guard every dereference with a null-check is no fun for programmers who know by design that certain fields will never be null. The Delayed Types of MSR are very ingenious, but the system in the paper is far too complex for direct use at the source level. The implementation of delayed types in Spec# greatly simplifies the system, but at the expense of soundness.

    As a mild shameless plug, we are hoping to publish a paper with (we think) a "good" solution :)

    BTW, I think nullable types extend to generics ok but I'm not sure anyone has formally presented the details yet.

    ReplyDelete
  30. IMHO, it is also important to polish details: http://2ality.blogspot.com/2010/10/does-java-make-you-less-productive-than.html

    ReplyDelete