Thursday, 6 November 2014

Better nulls in Java 10?

Rethinking null when Java has value types.

Null in Java

Was null really the billion dollar mistake? Well, that's an interesting question, but what is certain is that null is widely used in Java, and is frequently a very useful concept. But just because it is useful doesn't mean that there isn't something better.

Optional everywhere!

There are those who would point you towards the following strategy: avoid using null everywhere, and use Optional if you need to express the possible absence of something. This has a certain attraction, but I firmly believe that Java is not yet ready for this (in Java SE 8), as per my previous post on Optional.

Specifically, all the additional object boxes that Optional creates will have an effect on memory and garbage collection unless you are very lucky. In addition, it's relatively verbose in syntax terms.
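
To make the trade-off concrete, here is a minimal sketch comparing a null-returning accessor with an Optional-returning one. The Person and OptionalVsNull classes and their method names are hypothetical, invented purely for illustration. Whether the Optional wrapper is really allocated on each call depends on what the JIT manages to optimise away.

```java
import java.util.Optional;

// Hypothetical bean, used only for illustration.
class Person {
    private final String name;
    private final String middleName; // may be null

    Person(String name, String middleName) {
        this.name = name;
        this.middleName = middleName;
    }

    String getName() {
        return name;
    }

    // Null-returning accessor: no wrapper object, but the caller must remember to check.
    String getMiddleNameOrNull() {
        return middleName;
    }

    // Optional-returning accessor: absence is visible in the type,
    // at the cost of a wrapper object per call unless the JIT removes it.
    Optional<String> getMiddleName() {
        return Optional.ofNullable(middleName);
    }
}

class OptionalVsNull {
    // Classic null check.
    static int middleNameLengthWithNull(Person p) {
        String m = p.getMiddleNameOrNull();
        return m != null ? m.length() : 0;
    }

    // Same logic expressed through Optional.
    static int middleNameLengthWithOptional(Person p) {
        return p.getMiddleName().map(String::length).orElse(0);
    }
}
```

Both methods return the same result; the difference is in who is forced to think about absence, and what it costs at runtime.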

Nullable annotations

A second approach to handling null is to use the @Nullable and @NonNull annotations. Except that it is only fair to point out that from an engineering perspective, annotations are a mess.

Firstly, there is no formal specification for nullable annotations. As such, there are multiple competing implementations - one from FindBugs, one from JSR-305, one from IntelliJ, one from Eclipse, one from Lombok and no doubt others. And the semantics of these can differ in subtle ways.

Secondly, they don't actually do anything:

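  import javax.annotation.Nullable;  // the JSR-305 flavour of the annotation

  @javax.annotation.ParametersAreNonnullByDefault
  public class Foo {
    // foo is declared non-null by the class-level default
    public StringBuilder format(String foo) {
      return new StringBuilder().append(foo);
    }
    // foo is explicitly allowed to be null
    public StringBuilder formatAllowNull(@Nullable String foo) {
      return new StringBuilder().append(foo);
    }
  }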

Here, I've used @javax.annotation.ParametersAreNonnullByDefault to declare that all parameters to methods in the class are non-null unless otherwise annotated, that is, they only accept non-null values. Except that there is nothing to enforce this.

If you run an additional static checker, like FindBugs or your IDE, then the annotation can warn you if you call the "format" method with a null. But there is nothing in the language or compiler that really enforces it.

What I want is something in the compiler or JVM that prevents the method from being called with null, or at least throws an exception if it is. And that cannot be switched off or ignored!
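
Today, the closest you can get to that without extra tooling is a hand-written runtime check. The following is a rough sketch, assuming a hypothetical CheckedFoo class that mirrors Foo above and uses the JDK's Objects.requireNonNull:

```java
import java.util.Objects;

// Hypothetical variant of Foo with a manual runtime check.
class CheckedFoo {
    // Throws NullPointerException immediately if foo is null.
    StringBuilder format(String foo) {
        Objects.requireNonNull(foo, "foo");
        return new StringBuilder().append(foo);
    }

    // No check: StringBuilder.append(String) appends the text "null" for a null argument.
    StringBuilder formatAllowNull(String foo) {
        return new StringBuilder().append(foo);
    }
}
```

The check works, but it has to be written by hand in every method, and nothing stops a developer from leaving it out, which is exactly the "can be switched off or ignored" problem.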

The nice part about the annotations is that they can flip the default approach to null in Java, making non-null the default and nullability the unusual special case.

Null with value types in JDK 10

The ongoing work on value types (targeting JDK 10!) may provide a new avenue to explore. That's because Optional may be changed in JDK 10 to be a value type. What this might mean is that an Optional property of a Java-Bean would no longer be an object in its own right; instead it would be a value, with its data embedded in the parent bean. This would take away the memory/GC reasons preventing its use in beans, but there is still the syntactic overhead.

Now for the hand-waving part.

What if we added a new pseudo keyword "nonnull" that could be used as a modifier on a class? It would be similar to @javax.annotation.ParametersAreNonnullByDefault but would effectively make all parameters, return types and variables in the class non-null unless modified.

Then, what if we introduced the use of "?" as a type suffix to indicate nullability? This is a common syntax, adopted by Fantom, Ceylon and Kotlin, and very easy to understand. Basically, wherever you see String you know the variable is non-null and wherever you see String? you know that it might be "null".

Then, what if we took advantage of the low overhead value type nature of Optional (or perhaps something similar but different) and used it to represent data of type String?. Given this, a "nonnull" class would never be able to see a null at all. This step is the most controversial, and perhaps not necessary - it may be possible for the compiler and JVM to track String and String? without the Optional box.

But the box might come in useful when dealing with older code not written with the "nonnull" keyword. Specifically, it might be possible to teach the JVM to be able to auto box and un-box the Optional wrapper. It might even be possible to release a standard annotation jar that projects could use on codebases that need to remain compatible with JDK 8 or 9, where the annotation could be interpreted by a JDK 10 JVM to provide similar nullability information.

  public nonnull class Foo {
    public StringBuilder format(String foo) {
      return new StringBuilder().append(foo);
    }
    public StringBuilder formatAllowNull(String? foo) {
      return new StringBuilder().append(foo);
    }
  }

The example above has been rewritten from annotations to the new style. It is clearly a shorter syntax, but it would also benefit from proper integration into the compiler and JVM resulting in it being very difficult to call "format" with a null value. Behind the scenes, the second method would be compiled in bytecode to take a parameter of Optional<String>, not String.
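
As a rough illustration of that encoding, the sketch below shows what the two methods might look like if written by hand in Java 8. EncodedFoo is a hypothetical name, and the explicit null check merely stands in for whatever enforcement the compiler or JVM would actually provide:

```java
import java.util.Optional;

// Hand-written approximation of how a "nonnull" class might be encoded.
class EncodedFoo {
    // "String foo": null is rejected (here by a manual check, as a stand-in).
    StringBuilder format(String foo) {
        if (foo == null) {
            throw new NullPointerException("foo");
        }
        return new StringBuilder().append(foo);
    }

    // "String? foo": the parameter is carried as Optional<String>.
    StringBuilder formatAllowNull(Optional<String> foo) {
        // orElse(null) unwraps back to a plain reference to keep the original behaviour.
        return new StringBuilder().append(foo.orElse(null));
    }
}
```

A caller would then write something like formatAllowNull(Optional.empty()) where it would previously have passed null; the idea floated above is that the compiler and JVM would do this wrapping automatically.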

The hard part is the question of what methods can be called on an instance of String?. The simple encoding suggests that you can only call the methods of Optional<String>, but this might be a little surprising given the syntax. The alternative is to retain the use of null, but with an enforced check before use.

The even harder part is what to do with things like Map.get(key) where null currently has two meanings - not found and found null.
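
The ambiguity is easy to demonstrate with a HashMap, which permits null values. In this small sketch (the class and method names are made up for illustration), get alone cannot distinguish the two cases, so containsKey is needed as a second lookup:

```java
import java.util.HashMap;
import java.util.Map;

class MapGetAmbiguity {
    // Classifies a lookup as a real value, a stored null, or a missing key.
    static String describe(Map<String, String> map, String key) {
        String value = map.get(key);
        if (value != null) {
            return "found value";
        }
        // get returned null: either the key is absent or it maps to null.
        return map.containsKey(key) ? "found null" : "not found";
    }

    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("a", null);
        map.put("b", "x");
        System.out.println(describe(map, "a"));
        System.out.println(describe(map, "b"));
        System.out.println(describe(map, "c"));
    }
}
```

Any scheme that separates String from String? has to decide what the return type of get should be for a map whose values are themselves nullable.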

This concept also appears to dovetail well into the value type work. This is because value types are likely to behave like primitives, and thus cannot be null (the value type has to be boxed to be able to be null). By providing proper support for null in the language and JVM, this aspect of value types, which might otherwise be a negative, becomes properly handled.
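
The primitive analogy can already be seen in today's Java with int and Integer. The sketch below is only an analogy, since value types do not exist yet, and the class name is made up:

```java
class PrimitiveNullAnalogy {
    // An int can never be null; only the boxed Integer can.
    // Auto-unboxing a null Integer throws NullPointerException.
    static int unbox(Integer boxed) {
        return boxed;
    }

    // Handling the nullable box explicitly before treating it as a primitive.
    static int unboxOrDefault(Integer boxed, int fallback) {
        return boxed != null ? boxed : fallback;
    }
}
```

With String and String? in the language, the boxed form would be visibly marked as the nullable one, rather than failing at the point of unboxing.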

It should also be pointed out that one of the difficulties that current tools have in this space is that the JDK itself is not annotated in any way to indicate whether null is or is not allowed as an input or return type. Using this approach, the JDK would be updated with nullability, solving that big hurdle.

(As a side note, I wrote up some related ideas in 2007 - null-safe types and null-safe invocation.)

Just a reminder. This was a hand-wavy thought experiment, not a detailed proposal. But since I've not seen it before, I thought it was worth writing.

Just be careful not to overuse Optional in the meantime. It's not what it was intended for, despite what the loud (and occasionally zealot-like) fans of using it everywhere would have you believe.

Summary

A quick walk through null in Java, concluding with some hand-waving about a possible approach to swapping the default for types with respect to null.

Feel free to comment on the concept or the detail!

22 comments:

  1. The ? concept you describe has been in C# for many years, but still has failed to prevent most NPEs. These core concepts cannot work as afterthoughts in my view...

    Replies
    1. They only exist on value types in C#, not class types.
      But yes, making such a big change that breaks existing code so heavily doesn't seem like a good option. Maybe make the syntax the other way around, so a String! is a non-nullable class.

    2. The proposal doesn't break existing code. Only if the class has "nonnull" in its definition can you use the String? syntax. The hand-wavy theory is that the JVM/language can be taught to allow older bytecode to interoperate successfully.

    3. Is there any bytecode change needed? To me it looks like adding a bunch of "checkNotNull" should do. All legacy code would be treated as allowing null everywhere, unless external annotations get added (the Kotlin way). The biggest problem with null is that it may come in unexpected, and the compiler forcing you to add "?" solves it nicely.

    4. So if it's only for nonnull classes, it becomes the same as the C# implementation, which in my opinion works.

  2. Hi, you might already be aware of the checker framework and its nullness checker, but if you aren't you might want to take a look at it:

    http://types.cs.washington.edu/checker-framework/current/checker-framework-manual.html#nullness-checker

    They provide annotated JDK 7 and 8 and it's possible to have compilation fail on violations.
    They also provide their own javac if you are targeting JDK 7 and want to use annotations in places that are only allowed since Java 8.

    Replies
    1. Very aware of it, but very unconvinced. Null is a language-level problem, and annotations and checker are a poor band-aid.

  3. For backward compatibility, I think you'd need to introduce a symbol for nonnull-ness (e.g. !) instead of one for nullness (?):

    String a; // nullable
    String! a; // nonnull

    Replies
    1. I don't, because by adding the "nonnull" keyword, I'm flipping the default within that class. How it interacts with code that doesn't use the new keyword is the more interesting question.

    2. The code would then become less readable, because every time you see a type somewhere in code you also must look at the class declaration to understand what it really means.

    3. Lovro gets it right. The problem with switches that modify how the compiler parses your code (and the class level nonnull annotation is just that, a switch) is that now interpreting the code becomes context dependent. Java has always stayed away from such switches for good reason.

      With my suggestion, a statement "String a" will always mean what we expect it to mean: a is a nullable reference.

    4. This blog asks readers to imagine a future world where the whole of the JDK has been changed to use "nonnull". In that world, what is abnormal today will seem normal. Just as generics and lambdas are now normal. In this case, there is plenty of academic evidence to suggest that non-null is what most developers want as the default, and in my recent coding on Java 8, I find that most uses of null can be avoided. A String! solution for non-null simply won't cut it IMO, nor will Optional. It's the default that we need to change.

    5. I fully agree with you, but I'd go further and get rid of the nonnull keyword. Just mark the source (and I mean mark the source, not add a compiler switch) as Java 10 and everything will be non-null. This mark could be something as simple as writing "package 10 com.foo.bar;". You need no keyword and no source level compatibility (a Java 10 compiler must be able to process older sources, but once you add the mark, all the syntax garbage like "int a []" and raw types can go).

  4. I guess there are many ways to implement this and the "!" vs "?" discussion shows that it is no easy decision what to pick. But in general I think that something like this is very necessary and those difficulties should not stop the introduction of such an important feature.

  5. As the java.util. value based class, wouldn't JVMs have the option of storing these similar to a primitive - on the stack or inline if being member of a class, instead of adding another level of indirection via a reference, which would put additional pressure on the heaptalking about massive collections of Optionals, which frankly sounds like a misuse of the structure anyways.

  6. Really looking forward to better nulls and value types. It would have been better if they had targeted this feature for Java 9 instead of 10, given how fed up the Java community is with null checks and NPEs.

  7. I agree with Salam Kaser, you can't "fix" poor understanding of fundamentals.
    It would not come to your mind to try to provide "better division by zero in java 10" because you expect people to know a minimum about math. IDEs don't even warn you if you type : "float x = 10 / 0;".

    NPEs are the "division by zero" of object instantiation in my opinion.
    We already have plenty of tools to prevent / account for it properly (compiler warnings, IDEs, Optional, Validators, utilities).
    The only slight improvement I like is the syntactic sugar you proposed with null safe invocation : person#getAddress#getZipCode.
    In the wrong hands, it may very well lead to pushing programming errors further in a program.

    Overly protective default behavior brings well documented troubles in JavaScript : it takes even more understanding of the language to figure out how it compensates for "mistakes".


    Bottom line, you will never prevent poor understanding and bad design, and this is not limited to programming... heck, some people microwaved pets.

    Replies
    1. > I agree with Salam Kaser, you can't "fix" poor understanding of fundamentals.

      Right, but this is not what this post is about. Whenever you need to access a member of `a` which is not proven to be non-null, you're making a decision what should happen for null. The problem is that you're not always aware of this decision and/or the nullability of `a`.

      Not allowing `a.` for nullable `a` makes this decision explicit. The compiler forces you to use either `?.` to handle null safely or `!.` (Kotlin syntax) to throw.

      > NPEs are the "division by zero" of object instanciation in my opinion.

      Right, but division by zero is so rare that nobody cares. There are quite a few big programs using no division at all (or a division by a constant only), but there is none doing no dereferencing.

      > We already have plenty of tools to prevent / account for it properly (compiler warnings, IDEs, Optional, Validators, utilities).

      We don't have anything good. `Optional` costs performance, by design should not be used for fields, leads to forking between Optional-using and null-using libraries, and can't be used in thousands of existing libraries. All the others need some annotations, which are nothing but overly verbose and weak alternatives to this proposal.

      > The only slight improvement I like is the syntactic sugar you proposed with null safe invocation : person#getAddress#getZipCode.

      The big improvement is making you aware of the possibility of `person` and `address` being null and forcing you to decide between the null-safe and throwing syntaxes.

      > In the wrong hands, it may very well lead to pushing programming errors further in a program.

      And in the good hands, it can lower the probability of NPE bugs by a huge factor.

    2. > The big improvement is making you aware of the possibility of `person` and `address` being null and forcing you to decide between the null-safe and throwing syntaxes.

      I used the word "slight" because I see it more like a really useful shortcut than a new feature.
      Don't get me wrong, I'm not shooting down the importance of dereferencing, I'm questioning the cause here.
      The really tricky NPEs I've seen so far came from bad design and/or poor management of external factors : bad use/understanding of frameworks, external configuration, lazy loading, bad input validation (file, user input..).
      These get mostly fixed by fundamentals, documentation and testing... not at compile time.

      IMHO the NPEs that this article will fix are the most obvious ones, not the ones that cost hours (and money) to debug. Therefore, does it warrant changing the default for null management in the whole language?
      What I mean is : if the problem lies between the chair and the keyboard, are you sure you want to heavily patch the language?
      Or maybe my reasoning is flawed somewhere ?
      Or maybe I'm just used to limping around a clunky language feature so much, that I don't see the point of fixing it?

    3. > I used the word "slight" because I see it more like a really useful shortcut than a new feature.

      Not exactly a shortcut. The important part (for me) is the differentiating between nullable and non-nullable references. With a nullable reference, you can't use plain `.`, so `?.` and `!.` are the only choices (unless you make the compiler recognize `a!=null ? a.b : whatever` like Kotlin does).

      > IMHO the NPEs that this article will fix are the most obvious ones

      Agreed, but the obvious ones are more abundant and can get costly too if they slip through CR and tests.

      > Therefor, does it warrant changing the default for null management in the whole language?

      IMHO yes, as the change is backward compatible (old sources get recognized as old and can be compiled) and it's a change in the right direction. Moreover, it's a small change and it makes all the @Nullable, @Nonnull, and @NotNull crutches obsolete.

      > if the problem lies between the chair and the keyboard, are you sure you want to heavily patch the language?

      The problem sort of always lies there and the change seems to me pretty trivial when compared to lambdas.

      My experience tells me that it's always good to be explicit, and this feature allows you to be explicit about nullness with a very terse and clean syntax. The fact alone that it documents what may be null is IMHO worth it.

  8. Nice post. Null is tricky, invaluable, and a necessary evil all at once.

    I kind of agree with Cedric re nonnull. Although an IDE can be made to highlight a nullable declaration differently than a nonnullable one, in general it would be confusing to deal with a nonnull class modifier for the reasons he points out. That said, I agree with your more general point that nonnull should be the default. But assuming that ship has sailed, the nonnull type indicator (!) tossed around in other discussions, both Java and C#, seems like the pragmatic choice.

    Personally, I'm not convinced nullable state is as big a problem as many claim. I despise null checks as much as the next guy, but a lot of state out there is mutable and a lot of that mutable state is at one point in its life-cycle nullable. Just sayin'. I'll exercise restraint and not rant about this.

    Another part of me simply wants Oracle to break free of the innovator's dilemma they seem to be in and finally create a Java 2.0, shed all the baggage, and join the 21st century. Java 1.x is a good thing, but it's necessarily stuck in the past and will remain there forever as long as there is an industry depending on it. But there's a growing population of new programmers building new software not in Java at least partly because Java suc... er... hasn't kept pace. If Oracle wants to recapture or at least tap into this market segment and continue to innovate, they need to consider forking Java 2.0.

    The Java 2.0 language needs to remain Java, not Groovy, not Scala, not Clojure, or any of the others. But it also needs to fix things, the big ones, that keep it firmly planted in 1999. The Java 2.0 VM needs to be redesigned from the ground up with polyglot in mind as well as many other concerns (hint: rape and pillage the CLR). And for f**k sake don't provide method overloading!

    Anyway, great post as usual.

  9. Having used Java for many years, and Scala for the last few years, I'd say that Scala's Option has been a huge win in preventing NPE's. With Scala, I've come close to eliminating NPE's entirely, whereas with Java I expect NPE's routinely. So my answers to your questions are:

    Was null a billion dollar mistake? Yes.

    Does Optional everywhere lead to bad performance? Actually, you don't really use "Option everywhere". As it turns out, most stuff isn't optional. So the boxing overhead isn't an issue most of the time. If it becomes a performance problem, then avoiding Option is always, well, an option.

    The syntactic overhead is slightly annoying but (1) one gets used to it pretty quickly, (2) there are nice ways of making it more terse (e.g. applicative style), and (3) there are benefits to the syntactic overhead such as consistency with how other container types are treated (Lists, etc.) For example, Options have the same collections methods as everything else: map, flatMap, foreach, etc.
