tag:blogger.com,1999:blog-7417506058581698352024-03-18T03:03:48.155+00:00Stephen Colebourne's blogThoughts and Musings on the world of Java and beyondStephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.comBlogger237125tag:blogger.com,1999:blog-741750605858169835.post-41539330230065821292024-02-20T09:19:00.005+00:002024-02-26T10:12:46.797+00:00Pattern match Optional in Java 21<p>
I'm going to describe a trick to get pattern matching on <code>Optional</code> in Java 21, but one you'll probably never actually use.
</p>
<h4>Using Optional</h4>
<p>
As of Java 21, pattern matching allows us to check a value against a type, like an <code>instanceof</code>, with a new variable being declared of the correct type.
Pattern matching can handle simple types and the deconstruction of records.
But pattern matching of arbitrary classes like <code>Optional</code> is not yet supported.
(Work to support pattern match methods is ongoing).
</p>
<p>
In normal code, the best way to use <code>Optional</code> is with one of the functional methods:
</p>
<pre>
var addressOpt = findAddress(personId);
var addressStr = addressOpt
    .map(address -> address.format())
    .orElse("No address available");
</pre>
<p>
This works well in most cases.
But sometimes you want to use the <code>Optional</code> with a <code>return</code> statement.
This results in code using <code>get()</code> like this:
</p>
<pre>
var addressOpt = findAddress(personId);
if (addressOpt.isPresent()) {
    // early return if address found
    return addressOpt.get().format();
}
// lots of other code to handle case when address not found
</pre>
<p>
One way to improve this is to write a simple method:
</p>
<pre>
/**
 * Converts an optional to an iterable for use in the for-each statement.
 *
 * @param <T>  the type of optional element
 * @param optional  the optional
 * @return an iterable representation of the optional
 */
public static <T> Iterable<T> inOptional(Optional<T> optional) {
    return optional.isPresent() ? List.of(optional.get()) : List.of();
}
</pre>
<p>
Which allows the following neat form:
</p>
<pre>
for (var address : inOptional(findAddress(personId))) {
    // early return if address found
    return address.format();
}
// lots of other code to handle case when address not found
</pre>
<p>
This is a great approach provided that you don't need an <code>else</code> branch.
</p>
<h4>Using Optional with Pattern matching</h4>
<p>
With Java 21 and pattern matching we have a new way to do this!
</p>
<pre>
if (findAddress(personId).orElse(null) instanceof Address address) {
    // early return if address found
    return address.format();
} else {
    // lots of other code to handle case when address not found
}
</pre>
<p>
This makes use of the fact that <code>instanceof</code> rejects <code>null</code>.
Any expression can be used that creates an <code>Optional</code> - just pop <code>.orElse(null)</code> on the end.
</p>
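<p>
To see the trick end-to-end, here is a self-contained sketch that runs on Java 21 (the <code>Address</code> record and <code>findAddress</code> method are hypothetical, invented for this illustration):
</p>

```java
import java.util.Optional;

public class OptionalPatternDemo {

    // Hypothetical address type, invented for this example
    record Address(String city) {
        String format() {
            return "Address: " + city;
        }
    }

    // Hypothetical lookup that may or may not find an address
    static Optional<Address> findAddress(int personId) {
        return personId == 1 ? Optional.of(new Address("London")) : Optional.empty();
    }

    static String describe(int personId) {
        // instanceof rejects null, so an empty Optional falls into the else branch
        if (findAddress(personId).orElse(null) instanceof Address address) {
            return address.format();
        } else {
            return "No address available";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(1));  // Address: London
        System.out.println(describe(2));  // No address available
    }
}
```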
<p>
Note that this does not work with Java 17 in most cases, because unconditional patterns were not permitted.
(In most cases the expression returns a value of the type being checked for, and the compiler rejects
the pattern as being "always true". Of course this was a simplification in earlier Java versions, as the null really needed to be checked for.)
</p>
<p>
Is this a useful trick?
</p>
<p>
In reality, this is of marginal use.
Where I think it could be of use is in a long if-else chain, for example:
</p>
<pre>
if (obj instanceof Integer value) {
    // use value
} else if (obj instanceof Long value) {
    // use value
} else if (obj instanceof Double value) {
    // use value
} else if (lookupConverter(obj).orElse(null) instanceof Converter conv) {
    // use converter
} else {
    // even more options
}
</pre>
<p>
(It is valuable here as it avoids calling <code>lookupConverter(obj)</code> until necessary.)
</p>
<p>
Anyway, it is a fun trick even if you never use it!
</p>
<p>
One day soon I imagine we will use something like this:
</p>
<pre>
if (lookupConverter(obj) instanceof Optional.of(Converter conv)) {
    // use converter
} else {
    // even more options
}
</pre>
<p>
which I think we'd all agree is the better approach.
</p>
<p>
Comments at <a href="https://www.reddit.com/r/java/comments/1avdkwz/pattern_match_optional_in_java_21/">Reddit</a>
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.comLondon, UK51.5072178 -0.127586223.196983963821154 -35.2838362 79.817451636178845 35.0286638tag:blogger.com,1999:blog-741750605858169835.post-58692508301152025792022-10-06T12:18:00.004+01:002022-10-11T11:44:19.203+01:00Java on-ramp - Fully defined Entrypoints<p>
How do you start a Java program? With a main method of course. But the ceremony around writing such a method is perhaps not the nicest for newcomers to Java.
</p>
<p>
There has been a bit of discussion recently about how the "on-ramp" for Java could be made easier.
This is the <a href="https://openjdk.org/projects/amber/design-notes/on-ramp">original proposal</a>.
Here are follow ups - <a href="https://mail.openjdk.org/pipermail/amber-spec-observers/2022-September/thread.html#3715">OpenJDK</a>, <a href="https://www.reddit.com/r/java/comments/xqkw6x/paving_the_onramp_brian_goetz/">Reddit</a>, <a href="https://news.ycombinator.com/item?id=33017098">Hacker news</a>.
</p>
<h4>Starting point</h4>
<p>
This is the classic Java Hello World:
</p>
<pre>
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World");
    }
}
</pre>
<p>
Lots of stuff going on - public, class, a class name, void, arrays, a method, method call.
And one of the weirdest things in Java - System.out - a public static field in lower case. Something that is pretty much never seen in normal Java code.
(I still remember System.out.println being the most confusing part about getting started in Java 1.0 - why are there two dots and why isn't it out()?)
</p>
<p>
The official proposal continues to discuss:
</p>
<ul>
<li>A more tolerant launch protocol</li>
<li>Unnamed classes</li>
<li>Predefined static imports for the most critical methods and fields</li>
</ul>
<p>
The ensuing discussion resulted in various suggestions.
Having taken some time to reflect on the proposal and discussion, here is my contribution, which is that what is really needed is something more comprehensive.
</p>
<h4>Entrypoints</h4>
<p>
When a Java program starts some kind of class file needs to be run.
It could be a normal <code>class</code>, but that isn't ideal as we don't really want static/instance variables, subclasses, parent interfaces, access control etc.
One suggestion was for it to be a normal <code>interface</code>, but that isn't ideal as we don't want to mark the methods as <code>default</code> or allow <code>abstract</code> methods.
</p>
<p>
I'd like to propose that what Java needs is a <b>new kind of class declaration for entrypoints</b>.
</p>
<p>
I don't think this is overly radical. We already have two alternate class declarations - <code>record</code> and <code>enum</code>. They have alternate syntax that compiles to a class file without being explicitly a <code>class</code> in source code.
What we need here is a new kind - <code>entrypoint</code> - that compiles to a class file but has different syntax rules, just like <code>record</code> and <code>enum</code> do.
</p>
<p>
I believe this is a fundamentally better approach than the minor tweaks in the official proposal, because it will be useful to developers of all skill levels, including framework authors.
ie. it has a much better "bang for buck".
</p>
<p>
The simplest entrypoint would be:
</p>
<pre>
// MyMain.java
entrypoint {
    SystemOut.println("Hello World");
}
</pre>
<p>
In the source code we have various things:
</p>
<ul>
<li>Inferred class name from file name, the class file is <code>MyMain$entrypoint</code></li>
<li>Top-level code, no need to discuss methods initially</li>
<li>No access to parameters</li>
<li>New classes <code>SystemOut</code>, <code>SystemIn</code> and <code>SystemErr</code></li>
<li>No constructor, as a new kind of class declaration it doesn't need it</li>
</ul>
<p>
The classes like <code>SystemOut</code> may seem like a small change, but it would have been much easier for the me of 25 years ago to understand.
I don't favour more static imports for them (either here or more generally), as I think <code>SystemOut.println("Hello World")</code> is simple enough.
More static imports would be too magical in my opinion.
</p>
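<p>
To be clear, <code>SystemOut</code> does not exist in the JDK - it is part of this proposal. A minimal sketch of what such a class might look like, as a plain wrapper over <code>System.out</code>:
</p>

```java
// Hypothetical sketch - SystemOut is proposed in this post and does not exist in the JDK
public final class SystemOut {

    private SystemOut() {
    }

    // Hides the confusing System.out field behind an ordinary-looking static method
    public static void println(String text) {
        System.out.println(text);
    }

    public static void println(Object obj) {
        System.out.println(obj);
    }
}
```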
<p>
The next steps when learning Java are for the instructor to expand the entrypoint.
</p>
<ul>
<li>Add a named method (always private, any name although <code>main()</code> would be common)</li>
<li>Add parameters to the method (maybe String[], maybe String...)</li>
<li>Add return type to the method (void is default return type)</li>
<li>Group code into a block</li>
<li>Add additional methods (always private)</li>
</ul>
<p>
Here are some valid examples. Note that instructors can choose the order to explain each feature:
</p>
<pre>
entrypoint {
    SystemOut.println("Hello World");
}

entrypoint main() {
    SystemOut.println("Hello World");
}

entrypoint main(String[] args) {
    SystemOut.println("Hello World");
}

entrypoint {
    main() {
        SystemOut.println("Hello World");
    }
}

entrypoint {
    void main(String[] args) {
        SystemOut.println("Hello World");
    }
}

entrypoint {
    main(String[] args) {
        output("Hello World");
    }
    output(String text) {
        SystemOut.println(text);
    }
}
</pre>
<p>
Note that there are never any static methods, static variables, instance variables or access control. If you need any of that you need a class.
Thus we have proper separation of concerns for the entrypoint of systems, which would be Best Practice even for experienced developers.
</p>
<h4>Progressing to classes</h4>
<p>
During initial learning, the entrypoint class declaration and normal class declaration would be kept in separate files:
</p>
<pre>
// MyMain.java
entrypoint {
    SystemOut.println(new Person().name());
}

// Person.java
public class Person {
    String name() {
        return "Bob";
    }
}
</pre>
<p>
However, at some point the instructor would embed an entrypoint (of <b>any</b> valid syntax) in a normal <code>class</code>.
</p>
<pre>
public class Person {
    entrypoint {
        SystemOut.println(new Person().name());
    }
    String name() {
        return "Bob";
    }
}
</pre>
<p>
We discover that an <code>entrypoint</code> is normally wrapped in a <code>class</code> which then offers the ability to add static/instance variables and access control.
</p>
<p>
Note that since all methods on the entrypoint are private and the entrypoint is anonymous, there is no way for the rest of the code to invoke it without hackery.
Note also that the entrypoint does not get any special favours like an instance of the outer class, thus there is no issue with no-arg constructors - if you want an instance you have to use <code>new</code> (the alternative is unhelpful magic that harms learnability IMO).
</p>
<p>
Finally, we see that our old-style static main method is revealed to be just a normal entrypoint:
</p>
<pre>
public class Person {
    entrypoint public static void main(String[] args) {
        SystemOut.println(new Person().name());
    }
    String name() {
        return "Bob";
    }
}
</pre>
<p>
ie. when a method is declared as <code>public static void main(String[])</code> the keyword <code>entrypoint</code> is implicitly added.
</p>
<p>
What experienced developers gain from this is a clearer way to express what the entrypoint actually is, and more power in expressing whether they want the command line arguments or not.
</p>
<h4>Full-featured entrypoints</h4>
<p>
Everything above is what most Java developers would need to know.
But an entrypoint would actually be a whole lot more powerful.
</p>
<p>
The basic entrypoint would compile to a class something like this:
</p>
<pre>
// MyMain.java
entrypoint startHere(String[] args) {
    SystemOut.println("Hello World");
}

// MyMain$entrypoint.class
public final class MyMain$entrypoint implements java.lang.Entrypoint {
    @Override
    public void main(Runtime runtime) {
        runtime.execute(() -> startHere(runtime.args()));
    }
    private void startHere(String[] args) {
        SystemOut.println("Hello World");
    }
}
</pre>
<p>
Note that it is <code>final</code> and methods are <code>private</code>.
</p>
<p>
The <code>Entrypoint</code> interface would be:
</p>
<pre>
public interface java.lang.Entrypoint {
    /**
     * Invoked by the JVM to launch the program.
     * When the method completes, the JVM terminates.
     */
    public abstract void main(Runtime runtime);
}
</pre>
<p>
The <code>Runtime.execute</code> method would be something like:
</p>
<pre>
public void execute(ThrowableRunnable runnable) {
    try {
        runnable.run();
        System.exit(0);
    } catch (Throwable ex) {
        ex.printStackTrace();
        System.exit(1);
    }
}
</pre>
<p>
The JVM would do the following:
</p>
<ul>
<li>Load the class file specified on the command line</li>
<li>If it implements <code>java.lang.Entrypoint</code> call the no-args constructor and invoke it</li>
<li>Else look for a legacy <code>public static void main(String[])</code>, and invoke that</li>
</ul>
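<p>
The proposed launch protocol can be sketched in today's Java using reflection. Everything here is hypothetical - <code>java.lang.Entrypoint</code> does not exist, so the sketch declares its own simplified stand-in (taking <code>String[]</code> rather than the proposed <code>Runtime</code>):
</p>

```java
import java.lang.reflect.Method;

public class LauncherSketch {

    // Hypothetical stand-in for the proposed java.lang.Entrypoint
    // (simplified to take String[] rather than the proposed Runtime)
    interface Entrypoint {
        void main(String[] args);
    }

    public static class MyApp implements Entrypoint {
        public void main(String[] args) {
            System.out.println("new-style launch");
        }
    }

    public static class LegacyApp {
        public static void main(String[] args) {
            System.out.println("legacy launch");
        }
    }

    // Mimics the proposed selection: prefer Entrypoint, fall back to static main
    static void launch(Class<?> cls, String[] args) throws Exception {
        if (Entrypoint.class.isAssignableFrom(cls)) {
            Entrypoint instance = (Entrypoint) cls.getDeclaredConstructor().newInstance();
            instance.main(args);
        } else {
            Method legacyMain = cls.getMethod("main", String[].class);
            legacyMain.invoke(null, (Object) args);
        }
    }

    public static void main(String[] args) throws Exception {
        launch(MyApp.class, new String[0]);      // prints "new-style launch"
        launch(LegacyApp.class, new String[0]);  // prints "legacy launch"
    }
}
```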
<p>
Note that <code>java.lang.Entrypoint</code> is a <b>normal interface that can be implemented by anyone and do anything</b>!
</p>
<p>
This last point is critical to enhancing the bang-for-buck. I was intrigued by things like <a href="https://www.azul.com/blog/superfast-application-startup-java-on-crac/">Azul CRaC</a> which wants to own the whole lifecycle of the JVM run.
Wouldn't that be more powerful if they could control the whole lifecycle through <code>Entrypoint</code>?
Another possible use is to reset the state when an application has finished, allowing the same JVM to be reused - a bit like Function-as-a-Service providers or build system daemons do.
(I suspect it may be possible to enhance the entrypoint concept to control the shutdown hooks and to catch things like <code>System.exit</code> but that is beyond the scope of this blog.)
For example, here is a theoretical application framework entrypoint:
</p>
<pre>
// FrameworkApplication.java - an Open Source library
public interface FrameworkApplication extends Entrypoint {
    public default void main(Runtime runtime) {
        // do framework things
        start();
        // do framework things
    }
    public abstract void start();
}
</pre>
<p>
Applications just implement this interface, and they can run it by specifying their own class name on the command line, yet it is a full-featured framework application!
</p>
<h4>Summary</h4>
<p>
I argue that the proposal above is more powerful and more useful to experienced developers than the official proposal, while still meeting the core goal of step-by-step on-ramp learning.
Do let me know what you think. (<a href="https://www.reddit.com/r/java/comments/xx2ra8/java_onramp_fully_defined_entrypoints_an/">Reddit</a> discussion).
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com0tag:blogger.com,1999:blog-741750605858169835.post-33027034891016165142021-09-25T01:55:00.001+01:002021-09-25T02:06:02.031+01:00Big problems at the timezone database<p>
The last time I <a href="https://blog.joda.org/2011/10/today-time-zone-database-was-closed.html" target="_blank">wrote</a> about the timezone database on this blog, the database was
under threat from a lawsuit. Fortunately that lawsuit went away relatively
quickly as the company involved got the message that their action was a big
mistake. Unfortunately this time the mess is internal.
</p>
<p>
Paul Eggert is the
project lead of the <a href="https://www.iana.org/time-zones" target="_blank">timezone database</a> hosted at IANA, a position referred to as the TZ Coordinator.
He is an expert in the field, having been involved in documenting timezone data for decades.
Unfortunately, he is currently ignoring all objections to an action only he seems intent on making to solve an invented problem that only he sees as important.
</p>
<p>
The database is the world's principal source of timezone information. The data is included in everything from operating systems to smartphones to programming language development kits such as the JDK. While you may never have heard of it, the sheer pervasiveness of the data makes the potential impact of change or damage pretty huge.
</p>
<p>
The timezone database contains information about how clocks have varied in each region around the world.
The mandate of the project is to record this information from 1970 onwards.
</p>
<p>
Of course, computers being what they are, a function that returns the timezone offset for a given date can be passed a pre-1970 date as well as a post-1970 one.
For this reason, and for completeness, the timezone database contains pre-1970 data as well as post-1970 data.
If you go to your JDK or operating system and ask for the timezone offset for 1920-01-01 for the ID "Europe/Oslo" or "Europe/Berlin" you will get an answer:
</p>
<pre>
DateTimeZone oslo = DateTimeZone.forID("Europe/Oslo");
System.out.println(oslo.getOffset(new DateTime(1948, 6, 1, 12, 0))); //prints 3600000
DateTimeZone berlin = DateTimeZone.forID("Europe/Berlin");
System.out.println(berlin.getOffset(new DateTime(1948, 6, 1, 12, 0))); //prints 7200000
</pre>
<p>
The proposed change is to downgrade "Europe/Oslo" to be merely an alias for "Europe/Berlin".
The rationale is that since the two regions have the same data post-1970 there should only be one ID.
The issue with this is that querying "Europe/Oslo" in pre-1970 (as above) will now return the data from Berlin.
ie. the <a href="https://web.archive.org/web/20070814093249/http://met.no/met/met_lex/q_u/sommertid.html" target="_blank">well researched</a> pre-1970 data for Oslo will be replaced by that of Berlin.
</p>
<p>
The situation with <a href="https://www.joda.org/joda-time/" target="_blank">Joda-Time</a> is even worse.
With Joda-Time, aliases (also known as Links) are actively resolved.
Before the proposed change this test case passes, after the proposed change the test case fails:
</p>
<pre>
assertEquals("Europe/Oslo", DateTimeZone.forID("Europe/Oslo").getID());
</pre>
<p>
In other words, it will be impossible for a Joda-Time user to hold the ID "Europe/Oslo" in memory.
This could be pretty catastrophic for systems that rely on timezone management, particularly ones that end up storing that data in a database.
To mitigate this to some degree, I've added a test case and released a version that has such a test case in it, but obviously users of Joda-Time that haven't upgraded to the latest version may still see problems if they update the tzdb version.
</p>
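<p>
For comparison, <code>java.time</code> in the JDK is not affected in the same way: <code>ZoneId.of</code> preserves the ID exactly as requested, even when the tzdb data treats it as a Link (though the pre-1970 offsets it returns still depend on the tzdb version shipped):
</p>

```java
import java.time.ZoneId;

public class ZoneIdDemo {
    public static void main(String[] args) {
        // java.time keeps the ID as requested rather than resolving Links
        ZoneId oslo = ZoneId.of("Europe/Oslo");
        System.out.println(oslo.getId());  // Europe/Oslo
    }
}
```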
<p>
But what about backwards compatibility?
Well it seems that the TZ Coordinator just doesn't see this as being important.
To him, it doesn't matter that users may be relying on this data.
</p>
<p>
(Technically, the data has moved, not been deleted. But the file containing the moved data is never normally used by downstream systems, thus to all intents and purposes it has been deleted.)
</p>
<p>
The question you might ask is why is Berlin favoured over Oslo?
Why can Berlin keep its status and full history, but Oslo gets effectively deleted?
The answer is that Berlin has the greater population - would you have guessed that if you didn't read it here?
</p>
<p>
From my perspective, I cannot see how it is not incredibly unfair that the timezone ID that represents the country of Norway, "Europe/Oslo", is treated as less important than the ID that represents the country of Germany, "Europe/Berlin". Yes, there is a technical rationale around 1970 and population, but that is really pretty arcane.
</p>
<p>
In fact it goes further than this. The TZ Coordinator does not really believe that there should be an ID for Oslo/Norway at all.
(The official project rules say that there should only be one ID for locations where the timezone data is the same post-1970. Country borders simply don't matter.)
</p>
<p>
Some of the 30 IDs proposed for downgrade are "Europe/Oslo", "Europe/Stockholm", "Europe/Copenhagen", "Europe/Amsterdam", "Europe/Luxembourg", "Europe/Monaco", "Atlantic/Reykjavik" and "Indian/Mahe".
If you regularly use any of these IDs you may be affected by this change.
Iceland is a classic case - "Atlantic/Reykjavik" is to be downgraded in favour of "Africa/Abidjan". Bonus points if you know which country Abidjan is in!
</p>
<p>
What is driving the change? Well this is where it gets really weird.
The TZ Coordinator's argument is that there is a fairness/equity problem if Oslo is allowed to keep its pre-1970 history but other locations (typically in Africa) are not. I have two problems with this. Firstly, I consider it to also be unfair and inequitable that Berlin gets to have pre-1970 history and Oslo does not. Secondly, the correct approach to solving a fairness/equity problem is to level up, not level down. ie. Most leaders would want to improve the worse performers on the team, not force the best performers to be as bad as the worst.
</p>
<p>
So, we have a change with terrible downstream effects, including a potential fork of a major global data set and broken end-user applications, to make a problem that no one was complaining about a whole lot worse.
</p>
<p>
I've spent months trying to stop this happening, but appear to have lost the battle. This is despite near unanimity on the mailing list requesting that the changes be paused. Tonight 9 of the 30 changes have been included in release 2021b. These are not the ones affecting Europe.
</p>
<p>
I still hope that a solution can be found that the TZ Coordinator is happy with that avoids an impact on countries like Norway, Sweden, Denmark or the Netherlands.
In the medium term, I hope that funding can be found for the CLDR project to take on the timezone database (as CLDR has a much better record at managing data like this).
</p>
<p>
Stay tuned as I try to work out how best to resolve this completely unnecessary drama.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com0tag:blogger.com,1999:blog-741750605858169835.post-1177350155727172652019-11-04T20:32:00.000+00:002019-11-04T20:32:48.346+00:00Java switch - 4 wrongs don't make a right<p>
The switch statement in Java is being changed. But is it an upgrade or a mess?
</p>
<h4>Classic switch</h4>
<p>
The classic switch statement in Java isn't great.
Unlike many other parts of Java, it wasn't properly rethought when pulling features across from C all those years ago.
</p>
<p>
The key flaw is "fall-through-by-default".
This means that if you forget to put a <code>break</code> clause in each <code>case</code>, processing will continue on to the next <code>case</code> clause.
</p>
<p>
Another flaw is that variables are scoped to the entire switch, thus you cannot reuse a variable name in two different <code>case</code> clauses.
In addition, the <code>default</code> clause is not required, which leaves readers of the code unclear as to whether a clause was forgotten or not.
</p>
<p>
And of course there is also the key limitation - that the type to be switched on can only be an integer, enum or string.
</p>
<pre>
String instruction;
switch (trafficLight) {
    case RED:
        instruction = "Stop";
    case YELLOW:
        instruction = "Prepare";
        break;
    case GREEN:
        instruction = "Go";
        break;
}
System.out.println(instruction);
</pre>
<p>
The code above does not compile because there is no <code>default</code> clause, leaving <code>instruction</code> undefined.
But even if it did compile, it would never print "Stop" due to the missing <code>break</code>.
In my own coding, I prefer to always put a switch at the end of a method, with each clause containing a <code>return</code> to reduce the risks of switch.
</p>
<h4>Upgraded switch</h4>
<p>
As part of Project Amber, switch is being <a href="http://openjdk.java.net/jeps/325">upgraded</a>.
But sadly, I'm unconvinced as to the merits of the new design.
To be clear, there are some good aspects, but overall I think the solution is overly complex and with some unpleasant syntax choices.
</p>
<p>
The key aim is to add an <i>expression</i> form, where you can assign the result of the switch to a variable.
This is rather like the ternary operator (eg. <code>x != null ? x : ""</code>), which is the expression equivalent of an if statement.
An expression form would reduce problems like the undefined variable above, because it makes it more obvious that each branch must result in a value.
</p>
<p>
The current plan is to add not one, but three new forms of switch. Yes, three.
</p>
<p>
Explaining this in a blog post is, unsurprisingly, going to take a while...
</p>
<ul>
<li>Type 1: Statement with classic syntax. As today. With fall-through-by-default. Not exhaustive.</li>
<li>Type 2: Expression with classic syntax. NEW! With fall-through-by-default. Must be exhaustive.</li>
<li>Type 3: Statement with new syntax. NEW! No fall-through. Not exhaustive.</li>
<li>Type 4: Expression with new syntax. NEW! No fall-through. Must be exhaustive.</li>
</ul>
<p>
The headline example (type 4) is of course quite nice:
</p>
<pre>
// type 4
var instruction = switch (trafficLight) {
    case RED -> "Stop";
    case YELLOW -> "Prepare";
    case GREEN -> "Go";
};
System.out.println(instruction);
</pre>
<p>
As can be seen, the new syntax of type 3 and 4 uses an arrow instead of a colon.
And there is no need to use <code>break</code> if the code consists of a single expression.
There is also no need for a <code>default</code> clause when using an enum, because the compiler can insert it for you provided you've included all the known enum values. So, if you missed out <code>GREEN</code>, you would get a compile error.
</p>
<p>
The devil of course is in the detail.
</p>
<p>
Firstly, a clear positive. Instead of falling through by listing multiple labels, they can be comma-separated:
</p>
<pre>
// type 4
var instruction = switch (trafficLight) {
    case RED, YELLOW -> "Stop";
    case GREEN -> "Go";
};
System.out.println(instruction);
</pre>
<p>
Straightforward and obvious. And avoids many of the simple fall-through use cases.
</p>
<p>
What if the code to execute is more complex than an expression?
</p>
<pre>
// type 4
var instruction = switch (trafficLight) {
    case RED -> "Stop";
    case YELLOW -> {
        revYourEngine();
        yield "Prepare";
    }
    case GREEN -> "Go";
};
System.out.println(instruction);
</pre>
<p>
<code>yield</code>? Shrug.
For a long time it was going to be <code>break {expression}</code>, but this clashes with labelled break (a syntax feature that is rarely used).
</p>
<p>
So what about type 2?
</p>
<pre>
// type 2
var instruction = switch (trafficLight) {
    case RED: yield "Stop";
    case YELLOW:
        System.out.println("Prepare");
    case GREEN: yield "Go";
};
System.out.println(instruction);
</pre>
<p>
Oops! I forgot the <code>yield</code>. So, an input of YELLOW will output "Prepare" and then fall-through to yield "Go".
</p>
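<p>
Since the colon-form expression switch shipped in Java 14, this pitfall can be demonstrated with real code - for YELLOW it prints "Prepare" and then falls through to yield "Go":
</p>

```java
public class SwitchFallThroughDemo {

    enum TrafficLight { RED, YELLOW, GREEN }

    static String instructionFor(TrafficLight trafficLight) {
        // type 2: colon-form expression switch, with the yield missing on YELLOW
        return switch (trafficLight) {
            case RED: yield "Stop";
            case YELLOW:
                System.out.println("Prepare");
                // falls through to the GREEN case
            case GREEN: yield "Go";
        };
    }

    public static void main(String[] args) {
        System.out.println(instructionFor(TrafficLight.YELLOW));  // prints "Prepare" then "Go"
    }
}
```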
<p>
So, why is it proposed to add a new form of switch that repeats the fall-through-by-default error from 20 years ago?
The answer is orthogonality - a 2x2 grid with expression vs statement and fall-through-by-default vs no fall-through.
</p>
<p>
A key question is whether being orthogonal justifies adding an almost totally useless form of switch (type 2) to the language.
</p>
<p>
So, type 3 is fine then?
</p>
<p>
Well, no. Because of the insistence on orthogonality, and thus on copying the historic rules of the type 1 statement switch, there is no requirement to list all the <code>case</code> clauses:
</p>
<pre>
// type 3
switch (trafficLight) {
    case RED -> doStop();
    case GREEN -> doGo();
}
</pre>
<p>
So, what happens for YELLOW? The answer is nothing, but as a reader I am left wondering if the code is correct or incomplete.
It would be much better if the above was a compile error, with developers forced to write a <code>default</code> clause:
</p>
<pre>
// type 3
switch (trafficLight) {
    case RED -> doStop();
    case GREEN -> doGo();
    default -> {}
}
</pre>
<p>
The official argument is that since type 1 statement switch (the current one) does not force exhaustiveness, neither can the new type 3 statement switch.
My view is that keeping a bad design from 20 years ago is a worse sin.
</p>
<p>
What else? Well, one thing to bear in mind is that expressions cannot complete early, thus there is no way to <code>return</code> directly from within a switch expression (type 2 or 4). Nor is there a way to continue/break a loop. Trust me when I say there is an endless supply of Java exam questions in the rules that actually apply.
</p>
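<p>
Since a switch expression cannot <code>return</code> from the enclosing method directly, the workaround (valid from Java 14) is to compute the value and return it afterwards:
</p>

```java
public class SwitchReturnDemo {

    enum TrafficLight { RED, YELLOW, GREEN }

    static String instructionFor(TrafficLight trafficLight) {
        // 'return' is not allowed inside the switch expression itself,
        // so assign the result and return it afterwards
        var instruction = switch (trafficLight) {
            case RED -> "Stop";
            case YELLOW -> "Prepare";
            case GREEN -> "Go";
        };
        return instruction;
    }

    public static void main(String[] args) {
        System.out.println(instructionFor(TrafficLight.RED));  // Stop
    }
}
```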
<h4>Summarizing the types</h4>
<p>
Type 1: Classic statement switch
</p>
<ul>
<li>Statement</li>
<li>Fall-through-by-default</li>
<li><code>return</code> allowed, also continue/break a loop</li>
<li>Single scope for variables</li>
<li>Logic for each case is a sequence of statements potentially ending with <code>break</code></li>
<li>Not exhaustive - default clause is not required</li>
<li><code>yield</code> is not allowed</li>
</ul>
<p>
Type 2: Classic syntax expression switch
</p>
<ul>
<li>Expression</li>
<li>Fall-through-by-default</li>
<li><code>return</code> not allowed, cannot continue/break a loop</li>
<li>Single scope for variables</li>
<li>Logic for each case can be a yield expression, or a sequence of statements potentially ending with <code>yield</code></li>
<li>Exhaustive - default clause is required</li>
<li>Must use <code>yield</code> to return values</li>
</ul>
<p>
Type 3: Arrow-form statement switch
</p>
<ul>
<li>Statement</li>
<li>Fall-through is not permitted</li>
<li><code>return</code> allowed, also continue/break a loop</li>
<li>No variable scope problems, logic for each case must be a statement or a block</li>
<li>Not exhaustive - default clause is not required</li>
<li><code>yield</code> is not allowed</li>
</ul>
<p>
Type 4: Arrow-form expression switch
</p>
<ul>
<li>Expression</li>
<li>Fall-through is not permitted</li>
<li><code>return</code> not allowed, cannot continue/break a loop</li>
<li>No variable scope problems, logic for each case must be an expression or a block ending with <code>yield</code></li>
<li>Exhaustive - default clause is required</li>
<li>Must use <code>yield</code> to return values, but only from blocks (it is implied when not a block)</li>
</ul>
<p>
Are you confused yet?
</p>
<p>
OK, I'm sure I didn't explain everything perfectly, and I may well have made an error somewhere along the way. But the reality is that it <b>is</b> complex, and there are lots of rules hidden in plain sight.
Yes, it is orthogonal. But I really don't think that helps in comprehending the feature.
</p>
<h4>What would I do?</h4>
<p>
Type 4 switch expressions are fine (although I have real issues with the extension of the arrow syntax from lambdas).
My problem is with type 2 and 3.
In reality, those two types of <code>switch</code> will be very rare, and thus most developers will never see them.
Given this, I believe it would be better to not include them at all.
Once this is accepted, there is no point in treating the expression form as a <code>switch</code>, because it won't actually have many connections to the old statement form.
</p>
<p>
I would drop type 2 and 3, and allow type 4 switch expressions to become what is known as <i>statement expressions</i>.
(Another example of a statement expression is a method call, which can be used as an expression or as a statement on a line of its own, ignoring any return value.)
</p>
<pre>
// Stephen's expression switch
var instruction = match (trafficLight) {
case RED: "Stop";
case YELLOW: "Prepare";
case GO: "Go";
}
// Stephen's expression switch used as a statement (technically a statement expression)
match (instruction) {
case "Stop": doStop();
case "Go": doGo();
default: ;
}
</pre>
<p>
My approach uses a new keyword <code>match</code>, as I believe extending <code>switch</code> is the wrong baseline to use.
Making it a statement expression means that there is only one set of rules - it is always an expression, it is just that you can use it as though it were a statement.
What you can't do with my approach is use <code>return</code> in the statement version, because it isn't actually a statement (you can't use <code>return</code> from any expression in Java today, so this would be no different).
</p>
<h4>Summary</h4>
<p>
If you ignore the complexity, and just use type 4 switch expressions, the new feature is quite reasonable.
</p>
<p>
However, in order to add the one form of switch Java needed, we've also got two other duds - type 2 and 3.
In my view, the feature needs to go back to the drawing board, but sadly I suspect it is now too late for that.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com9tag:blogger.com,1999:blog-741750605858169835.post-80605092577306699772019-03-22T11:39:00.000+00:002019-03-22T11:47:42.739+00:00User-defined literals in Java?<p>
Java has a number of literals for creating values, but wouldn't it be nice if we had more?
</p>
<h4>Current literals</h4>
<p>
These are some of the <a href="https://docs.oracle.com/javase/specs/jls/se7/html/jls-3.html#jls-3.10">literals</a> we can write in Java today:
</p>
<ul>
<li>integer - <code>123</code>, <code>1234L</code>, <code>0xB8E817</code>, <code>077</code>, <code>0b1011_1010</code></li>
<li>floating point - <code>45.6f</code>, <code>56.7d</code>, <code>7.656e6</code></li>
<li>string - <code>"Hello world"</code></li>
<li>char - <code>'a'</code></li>
<li>boolean - <code>true</code>, <code>false</code></li>
<li>null - <code>null</code></li>
</ul>
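<p>
For reference, a small self-contained demonstration of several of these literal forms; each value is checked and evaluated by the compiler:
</p>

```java
public class LiteralDemo {
    public static void main(String[] args) {
        int hex = 0xB8E817;        // hexadecimal: 12118039
        int octal = 077;           // octal: 63
        int binary = 0b1011_1010;  // binary with underscores: 186
        long big = 1234L;          // long literal
        double sci = 7.656e6;      // scientific notation: 7656000.0
        System.out.println(hex + " " + octal + " " + binary + " " + big + " " + sci);
    }
}
```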
<p>
Project Amber is also <a href="http://mail.openjdk.java.net/pipermail/amber-spec-experts/2019-March/001057.html">considering</a> adding multi-line and/or raw string literals.
</p>
<p>
But there are many other data types that would benefit from literals, such as dates, regex and URIs.
</p>
<h4>User-defined literals</h4>
<p>
In my ideal future, I'd like to see Java extended to support some form of user-defined literals.
This would allow the author of a class to provide a mechanism to convert a sequence of characters into an instance of that class.
It may be clearer to see some examples using one possible syntax (using backticks):
</p>
<pre>
Currency currency = `GBP`;
LocalDate date = `2019-03-29`;
Pattern pattern = `strata\.\w+`;
URI uri = `https://blog.joda.org/`;
</pre>
<p>
A number of semantic features would be required:
</p>
<ul>
<li>Type inference</li>
<li>Raw processing</li>
<li>Validated at compile-time</li>
</ul>
<h5>Type inference</h5>
<p>
Type inference is of course a key aspect of literals.
It would have to work in a similar way to the existing literals, but with a tweak to handle the new <code>var</code> keyword.
ie. these two would be equivalent:
</p>
<pre>
LocalDate date = `2019-03-29`;
var date = LocalDate`2019-03-29`;
</pre>
<p>
The type inference would also work with methods (compile error if ambiguous):
</p>
<pre>
boolean inferior = isShortMonth(`2019-04-12`);
public boolean isShortMonth(LocalDate date) { return date.lengthOfMonth() < 31; }
</pre>
<h5>Raw processing</h5>
<p>
Processing of the literal should not be limited by Java's escape mechanisms.
User-defined literals need access to the raw string.
Note that this is especially useful for regex, but would also be useful for files on Windows:
</p>
<pre>
// user-defined literals
var pattern = Pattern`strata\.\w+`;
// today
var pattern = Pattern.compile("strata\\.\\w+");
</pre>
<p>
Today, the <code>\</code> needs to be escaped, making the regex difficult to read.
</p>
<p>
Clearly, the problem with parsing raw literals is that there is no mechanism to escape.
But the use cases for user-defined literals tend to have constrained formats, eg. a date doesn't contain random characters.
So, although there might be edge cases where this would be a problem, they would very much be edge cases.
</p>
<h5>Validated at Compile-time</h5>
<p>
A key feature of literals is that they are validated at compile-time.
You can't use an integer literal to create an <code>int</code> if the value is larger than the maximum allowed integer (2^31 - 1).
</p>
<p>
User-defined literals would likewise need to be parsed and validated at compile-time.
Thus this code would not compile:
</p>
<pre>
LocalDate date = `2019-02-31`;
</pre>
<p>
Most types which would benefit from literals only accept specific input formats, so being able to check this at compile time would be beneficial.
</p>
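<p>
Today the equivalent check only happens at runtime; <code>LocalDate.parse</code> rejects the invalid date with an exception, which a literal form would instead surface as a compile error:
</p>

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class CompileTimeCheck {
    public static void main(String[] args) {
        // A valid date parses fine
        System.out.println(LocalDate.parse("2019-03-29"));
        // An invalid day-of-month only fails when this line actually runs
        try {
            LocalDate.parse("2019-02-31");
            throw new AssertionError("should not get here");
        } catch (DateTimeParseException ex) {
            System.out.println("Rejected at runtime: " + ex.getMessage());
        }
    }
}
```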
<h5>How would it be implemented?</h5>
<p>
I'm pretty confident that there are various ways it could be done.
I'm not going to pick an approach, as ultimately those that control the JVM and language are better placed to decide.
Clearly though, there is going to need to be some form of factory method on the user class that performs the parse, with that method invoked by the compiler. And ideally, the results of the parse would be stored in the constant pool rather than re-parsed at runtime.
</p>
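<p>
Whatever the mechanism, the shape would likely resemble the validating static factories we already have. <code>java.util.Currency</code> shows the runtime version of the idea; the compile-time hookup is the part that doesn't exist yet:
</p>

```java
import java.util.Currency;

public class FactoryDemo {
    public static void main(String[] args) {
        // A literal `GBP` could invoke this factory during compilation...
        Currency gbp = Currency.getInstance("GBP");
        System.out.println(gbp.getCurrencyCode());
        // ...turning this runtime failure into a compile error
        try {
            Currency.getInstance("ZZZ"); // not a supported ISO 4217 code
            throw new AssertionError("should not get here");
        } catch (IllegalArgumentException ex) {
            System.out.println("Unknown code rejected");
        }
    }
}
```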
<p>
What I would say is that user-defined literals would almost be a requirement for making value types usable, so something like this may be on the way anyway.
</p>
<h4>Summary</h4>
<p>
I like literals. And I would really like to be able to define my own!
</p>
<p>
Any thoughts?
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com8tag:blogger.com,1999:blog-741750605858169835.post-10204166677038291482019-01-09T12:23:00.000+00:002019-01-09T15:10:57.752+00:00Commercial support for Joda and ThreeTen projects<p>
The Java ecosystem is made up of many individuals, organisations and companies producing many different libraries. Some of the largest projects have long had support options where users of the project, typically corporates, can pay for an enhanced warranty, guaranteed approach to bug fixes and more.
</p>
<p>
Small projects, run by a single individual or a team, have been unable to offer this service, even if they wanted to. In addition, there is a more subtle problem. The amount a small project could charge is too low for a corporate to pay.
</p>
<p>
This sounds odd, but was brought home to me by this thread on twitter:
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://twitter.com/SwiftOnSecurity/status/1067682759592869889" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-YJfT_xfNs58/XDXKrUZFzRI/AAAAAAAAHBs/cFZ6RA3lk3kKrwQ6VDOhSSq6NfxUM-htACLcBGAs/s640/PayOSS.jpg" width="514" height="640" data-original-width="638" data-original-height="795" /></a></div>
<p>
As the <a href="https://twitter.com/SwiftOnSecurity/status/1067682759592869889">thread</a> indicates, it is basically impossible for a corporate to gift money to a small project, and it is not viable for small projects to meaningfully offer a support contract.
</p>
<p>
The problem is that not paying the maintainers has negative consequences. Take the <a href="https://github.com/dominictarr/event-stream/issues/116">recent case</a> where a developer handed his open source project on to another person, who then used it to steal bitcoins.
</p>
<h4>Pay the maintainers</h4>
<p>
I believe there is now a solution to the problem. <a href="https://tidelift.com/">Tidelift</a>.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://tidelift.com/subscription?utm_source=maven-joda-time-joda-time&utm_medium=referral&utm_campaign=blog" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" style="background: white;" src="https://1.bp.blogspot.com/-z1SQHpiL_ws/XDXUZ13ylYI/AAAAAAAAHB4/tmzkBEPx1ZYprFMmmraWarD5pGCLsRx6ACLcBGAs/s400/Tidelift_Logos_RGB_Tidelift_Mark_On-White.png" width="400" height="133" data-original-width="1600" data-original-height="533" /></a></div>
<p>
Tidelift offers companies a <a href="https://tidelift.com/subscription?utm_source=maven-joda-time-joda-time&utm_medium=referral&utm_campaign=blog">monthly subscription</a> to support their open source usage. And they pay some of that income directly to the maintainers of the projects that the company uses.
</p>
<p>
Maintainers are expected to continue maintaining the project, follow a responsible disclosure process for security issues and check their licensing. Tidelift does not get to control the project roadmap, and maintainers do not have to provide an active helpdesk or consulting. See here for <a href="https://tidelift.com/about/lifter">more details</a>.
</p>
<p>
As such, I'm now offering commercial support for
<a href="https://www.joda.org/joda-time/">Joda-Time</a>,
<a href="https://www.joda.org/joda-money/">Joda-Money</a>,
<a href="https://www.joda.org/joda-beans/">Joda-Beans</a>,
<a href="https://www.joda.org/joda-convert/">Joda-Convert</a>,
<a href="https://www.joda.org/joda-collect/">Joda-Collect</a>,
<a href="https://www.threeten.org/threeten-extra/">ThreeTen-Extra</a>,
<a href="https://www.threeten.org/threetenbp/">ThreeTen-backport</a> via the Tidelift <a href="https://tidelift.com/subscription?utm_source=maven-joda-time-joda-time&utm_medium=referral&utm_campaign=blog">subscription</a>.
</p>
<p>
This is an extra option for those that want to support the maintainers of open source but haven't been able to find a way to do so until now. The Joda and ThreeTen projects will always be free and available under a permissive licence, so there is no need to worry as a result of this.
</p>
<p>
Comments welcome.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com4tag:blogger.com,1999:blog-741750605858169835.post-59688945789150759372018-10-31T07:02:00.000+00:002018-11-14T23:30:17.803+00:00Should you adopt Java 12 or stick on Java 11?<p>
Should you adopt Java 12 or stick on Java 11 for the next 3 years? Seems like an innocuous question, but it is one of the most important decisions out there for those running on the JVM. I'll try to cover the key aspects of the decision, with the assumption that you care about running with the latest set of security patches in production.
</p>
<p>
TL;DR: it is vital to fully understand and accept the risks before adopting Java 12.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-lbuBFtTIQ1o/W9jzZ3P05DI/AAAAAAAAG5g/oa3eqCEEnJw0jU4p0q-5Y_hcH0cacbbmwCLcBGAs/s1600/DukeThinking.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-lbuBFtTIQ1o/W9jzZ3P05DI/AAAAAAAAG5g/oa3eqCEEnJw0jU4p0q-5Y_hcH0cacbbmwCLcBGAs/s200/DukeThinking.png" width="196" height="200" data-original-width="231" data-original-height="236" /></a></div>
<h4>The Java release train</h4>
<p>
There is now a new release of Java every six months, so Java 12 is less than five months away despite Java 11 having just been released. As part of the process of moving to more frequent releases, certain releases are designated to be LTS (long term support) and as such will have <a href="https://blog.joda.org/2018/08/java-is-still-available-at-zero-cost.html">security patches available</a> for four years or more. This makes them "major" releases, not because they have a bigger feature set but because they have multi year support.
</p>
<p>
It is expected that Java 11 patches (11.0.1, 11.0.2, 11.0.3, etc.) will be <a href="https://blog.joda.org/2018/09/java-release-chains-features-and-security.html">smaller and simpler</a> than Java 8 updates (8u20, 8u40, 8u60, etc.) - Java 11 updates will be more focused on security patches, without the internal enhancements of Java 8 updates.
Instead, Oracle want us to think of Java 12, 13, 14 etc. as small upgrades, similar to an imaginary Java 11u20, 11u40 etc.
To be blunt, I find this nonsensical.
</p>
<p>
Senior Oracle employees have repeatedly argued that updates such as 8u20 and 8u40 often broke code.
This was not my experience. In fact my experience was that update releases primarily contained bug fixes.
The only break I can remember was the <a href="https://github.com/gradle/gradle/issues/1393">addition</a> of <code>--allow-script-in-comments</code> to Javadoc, which isn't a core part of Java.
As a result, I have never feared picking up the latest update release - and this has been a core benefit of the Java platform.
</p>
<p>
Drilling down into why update releases tend to cause no problems, let's examine the differences between release types:
</p>
<table style="width:95%;text-align:center" class="tbl">
<tr>
<th>Model</th>
<td colspan="2">Old model</td>
<td colspan="2">New model</td>
</tr>
<tr>
<th>Upgrade</th>
<td>Java major releases</td>
<td>Java update releases</td>
<td>Java release train</td>
<td>Java patches</td>
</tr>
<tr>
<th>Frequency</th>
<td>Every 3 years or so</td>
<td>Every 6 months</td>
<td>Every 6 months</td>
<td>Every 3 months</td>
</tr>
<tr>
<th>Versions</th>
<td>6 -> 7 -> 8</td>
<td>8 -> 8u20 -> 8u40</td>
<td>11 -> 12 -> 13</td>
<td>11 -> 11.0.1 -> 11.0.2</td>
</tr>
<tr>
<th>Language changes</th>
<td>✅</td>
<td>✘</td>
<td>✅</td>
<td>✘</td>
</tr>
<tr>
<th>JVM changes</th>
<td>✅</td>
<td>✘</td>
<td>✅</td>
<td>✘</td>
</tr>
<tr>
<th>Major enhancements</th>
<td>✅</td>
<td>✘</td>
<td>✅</td>
<td>✘</td>
</tr>
<tr>
<th>Added classes/methods</th>
<td>✅</td>
<td>✘</td>
<td>✅</td>
<td>✘</td>
</tr>
<tr>
<th>Removed classes/methods</th>
<td>✘</td>
<td>✘</td>
<td>✅</td>
<td>✘</td>
</tr>
<tr>
<th>New deprecations</th>
<td>✅</td>
<td>✘</td>
<td>✅</td>
<td>✘</td>
</tr>
<tr>
<th>Internal enhancements</th>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td>✘</td>
</tr>
<tr>
<th>JDK tool changes</th>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td>✘</td>
</tr>
<tr>
<th>Bug fixes</th>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
</tr>
<tr>
<th>Security patches</th>
<td>✅</td>
<td>✅</td>
<td>✅</td>
<td>✅</td>
</tr>
</table>
<p>
Given the table above, I find it amazing that anyone would claim moving from 11 to 12 to 13 is anything like moving from 8u20 to 8u40.
But that is the official Oracle viewpoint:
</p>
<p class="quote">
Going from Java 9->10->11 is closer to going from 8->8u20->8u40 than from 7->8->9.<br/>
<a href="https://blogs.oracle.com/java-platform-group/update-and-faq-on-the-java-se-release-cadence">Oracle FAQ</a>
</p>
<p>
As the table clearly shows, each version in the Java release train can contain any change traditionally associated with a full major version.
These include language changes and JVM changes, both of which have major impacts on IDEs, bytecode libraries and frameworks.
In addition, there will not only be additional APIs, but also API removals (something that did not happen prior to Java 9).
</p>
<p>
Oracle's claim is that because each release is only 6 months after the previous one, there won't be as much "stuff" in it, thus it won't be as hard to upgrade. While true, it is also irrelevant. What matters is whether the upgrade has the potential to damage your code stack or not. And clearly, <b>going from 11 -> 12 -> 13 has much greater potential for damage than 8 -> 8u20 -> 8u40 ever did</b>.
</p>
<p>
The key differences compared to updates like 8u20 -> 8u40 are the changes to the bytecode version and the changes to the specification.
Changes to the bytecode version tend to be particularly disruptive, with most frameworks making heavy use of libraries like ASM or ByteBuddy that are intimately linked to each bytecode version. And moving from 8u20 -> 8u40 still had the same Java SE specification, with all the same classes and methods, something that cannot be relied on moving from 12 to 13.
<b>I simply do not accept Oracle's argument that the "amount of stuff" in a release is more significant than the "type of stuff".</b>
</p>
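<p>
The bytecode coupling is easy to demonstrate: the major version in every class file's header increments with each release (52 for Java 8, 55 for Java 11, 56 for Java 12), and libraries like ASM reject versions they don't recognise. A small sketch that reads the header of its own class file:
</p>

```java
import java.io.DataInputStream;
import java.io.InputStream;

public class BytecodeVersion {
    public static void main(String[] args) throws Exception {
        try (InputStream in = BytecodeVersion.class
                .getResourceAsStream("BytecodeVersion.class")) {
            if (in == null) {
                System.out.println("class file not on classpath (e.g. source launcher)");
                return;
            }
            DataInputStream data = new DataInputStream(in);
            int magic = data.readInt();            // always 0xCAFEBABE
            int minor = data.readUnsignedShort();
            int major = data.readUnsignedShort();  // 52 = Java 8, 55 = 11, 56 = 12
            System.out.printf("magic=%x major=%d%n", magic, major);
        }
    }
}
```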
<p>
Note however that another one of Oracle's claims really does matter.
They point out that if you stick with Java 11 and plan to move to the next LTS version when it is released (ie. Java 17) you might find your code doesn't compile. Remember that the Java development rules now state that an API method can be deprecated in one version and removed in the next one. Rules that do not take LTS releases into account. So, a method could be deprecated in 13 and removed in 15. Someone upgrading from 11 to 17 would simply find a deleted API, having never seen the deprecation.
Let's not panic too much about removal though - the only APIs likely to be removed are specialist ones, not those in widespread use by application code.
</p>
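<p>
The deprecate-then-remove policy is visible in the enhanced <code>@Deprecated</code> annotation added in Java 9, which is how an API signals its last call before deletion:
</p>

```java
public class DeprecationPolicy {
    // Since Java 9, @Deprecated can flag imminent removal; the compiler
    // issues a stronger "removal" warning for callers of this method
    @Deprecated(since = "13", forRemoval = true)
    static String oldApi() {
        return "still here";
    }

    @SuppressWarnings("removal")
    public static void main(String[] args) {
        System.out.println(oldApi());
    }
}
```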
<h4>Considerations before adopting the release train</h4>
<p>
In this section, I try to outline some of the considerations/risks that must be considered before adopting the release train.
</p>
<p>
<b>Locked in to the train</b><br/>
If you adopt Java 12 and use a new language feature or new API, then you are effectively locking your project in to the release train.
You have to adopt Java 13, 14, 15, 16 and 17.
And <b>you have to adopt each new release within one month of the next release coming out</b>.
</p>
<p>
Remember that with the new release train, each release has a lifetime of six months, and is obsolete just seven months after release. That is because there will be only six months of security patches for each release, the first patch 1 month after release and the second 4 months after release. After 7 months, the next set of security patches come out but the older release will not get them.
</p>
<p>
Do your processes allow for a Java upgrade, any necessary bug fixing, testing and release within that narrow 1 month time window? Or are you willing to run in production on a Java version below the security baseline?
</p>
<p>
<b>Upgrade blockers</b><br/>
There are many possible things that can block an upgrade of Java. Let's make a list of some of the common ones.
</p>
<p>
Insufficient development resources: Your team may get busy, or be downsized, or the project may go to production and the team disbanded. Can you guarantee that development time will be available to do the upgrade from Java 15 to 16 in two years time?
</p>
<p>
Build tools and IDEs: Will your IDE support each new version on the day of release? Will Maven? Gradle? Do you have a backup plan if they don't? Remember, you only have 1 month to complete the upgrade, test it and get it released to production.
Other tools to consider here include Checkstyle, JaCoCo, PMD, SpotBugs and many more.
</p>
<p>
Dependencies: Will your dependencies all be ready for each new version? Quickly enough for you to meet the 1 month deadline? Remember, it is not just your direct dependencies, but everything in your stack. Bytecode manipulation libraries are particularly affected for example, such as ByteBuddy and ASM.
</p>
<p>
Frameworks: Another kind of dependency, but a large and important one. Will Spring produce a new version every six months, within the narrow one month time window? Will Jakarta EE (formerly Java EE)? What happens if they don't?
</p>
<p>
Now the traditional approach to any of these blockers was to wait. With versions of Java up to 8, a common approach was to wait 6 to 12 months before starting the upgrade to give tools, libraries and frameworks the chance to fix any bugs. But of course the waiting approach is incompatible with the release train.
</p>
<p>
<b>Cloud / Hosting / Deployment</b><br/>
Do you have control of where and how your code runs in production? For example, if you run your code in AWS Lambda you do not have control.
AWS Lambda did not adopt Java 9 or 10, and still does not offer Java 11, even though it is over a month since release.
Unless AWS give a public guarantee to support each new Java version, then you simply can't adopt Java 12.
(My working assumption is that AWS Lambda will only support major LTS versions, supported by the <a href="https://twitter.com/jodastephen/status/1062678987720310786">Amazon Corretto JDK announcement</a>.)
</p>
<p>
What about hosting of your CI system? Will Jenkins, Travis, Circle, Shippable, GitLab be updated quickly? What do you do if they are not?
</p>
<p>
<b>Predicting the future</b><br/>
Perhaps you have read through the list above and are happy your code and processes today can cope. Great! But it is critical to understand that you are also restricting your ability to change in the future.
</p>
<p>
For example, maybe your code doesn't run on AWS Lambda today. But are you willing to say you can't do so for the next three years?
</p>
<h4>Planning for the release train</h4>
<p>
If you are considering adopting the release train, I recommend preparing a <a href="http://mail.openjdk.java.net/pipermail/jdk-dev/2017-November/000106.html">list</a> of all the things you depend on now, or might depend on in the next 3 years.
You need to be confident that everything on that list will work correctly and be upgraded along with the release train, or have a plan if that dependency is not upgraded.
The list for my day job is something like this:
</p>
<ul>
<li>Amazon AWS</li>
<li>Eclipse</li>
<li>IntelliJ</li>
<li>Travis CI</li>
<li>Shippable CI</li>
<li>Maven</li>
<li>Maven plugins (compile, jar, source, javadoc, etc)</li>
<li>Checkstyle, & associated IDE plugins and maven plugin</li>
<li>JaCoCo, & associated IDE plugins and maven plugin</li>
<li>PMD, & associated maven plugin</li>
<li>SpotBugs, & associated maven plugin</li>
<li>OSGi bundle metadata tool</li>
<li>Bytecode tools (ByteBuddy / ASM etc)</li>
<li>Over 100 jar file dependencies</li>
</ul>
<p>
And I've probably forgotten something.
</p>
<p>
Don't get me wrong. I think it is perfectly possible to make a choice to say that you are willing to take the risk.
That the benefits of new language features, and probable enhanced performance, make the effort worthwhile.
But I strongly believe it is more risky than remaining on Java 11.
</p>
<h4>A middle ground?</h4>
<p>
One possible middle ground is to develop your application for Java 11, but run it in production on Java 12, 13, 14 etc. as soon as they come out.
Sadly, this approach is less viable than it should be.
</p>
<p>
The removal of APIs and changes to the bytecode version add uncertainty to the stack.
Even if your code doesn't use one of the removed APIs, one of your libraries might.
Or a bytecode manipulation library might need upgrading, with knock-on effects.
So while the middle ground is a possible fallback if you get stuck, it is far from a no-risk solution.
</p>
<h4>Some additional links</h4>
<p>
Spring framework has expressed its policy wrt Java 12 in a <a href="https://www.youtube.com/watch?v=onZJ8beVEtI&feature=youtu.be&t=611">video</a>.
The key sections are:
</p>
<p class="quote">
Java 8 and 11 as the LTS branches officially supported from our end.
Best efforts support for the releases in between.
... if you intend to upgrade to 12 ... we are very willing to work with you ... but they are not officially production supported.
... The long term support releases are what we are primarily focussed on. Java 12 and higher will be best effort from our side.
</p>
<p>
As an example of a typical software vendor, Liferay states:
</p>
<p class="quote">
Liferay has decided we will not certify every single major release of the JDK. We will instead choose to follow Oracle's lead and certify only those marked for LTS.<br />
<a href="https://community.liferay.com/es/blogs/-/blogs/liferay-jdk-compatibility-roadmap">Liferay blog</a>
</p>
<p>
Oracle's official "misconceptions" <a href="https://twitter.com/SuperSerch/status/1054536730957754368">slide</a> about the new release model.
</p>
<h4>Summary</h4>
<p>
I'm sure some development teams will adopt the Java release train.
My hope is that they do so with their eyes wide open.
I know we won't be adopting the release train at my day job any time soon, a key blocker being our use of AWS Lambda, but I'd be concerned about all the other points too.
</p>
<p>
Feel free to leave a comment, especially if you think I've missed any points that should be on the considerations list.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com15tag:blogger.com,1999:blog-741750605858169835.post-58640636150418346962018-09-26T11:57:00.005+01:002021-11-04T09:11:25.890+00:00Oracle's Java 11 trap - Use OpenJDK instead!<p>
<b>TL;DR:</b> Java is still available at <a href="https://blog.joda.org/2018/08/java-is-still-available-at-zero-cost.html">zero-cost</a>, you just need to stop using Oracle JDK and start using an OpenJDK build, such as <a href="https://adoptopenjdk.net/releases.html?variant=openjdk11&jvmVariant=hotspot">this one</a> or <a href="https://jdk.java.net/11/">this one</a>.
</p>
<h4>The trap</h4>
<p>
<a href="https://openjdk.java.net/projects/jdk/11/">Java 11</a> has been <a href="https://jdk.java.net/11/">released</a>.
It is a <a href="https://blog.joda.org/2018/09/java-release-chains-features-and-security.html">major release</a> because it has long-term support (LTS).
But Oracle have also set it up to be a trap (either deliberately or accidentally).
</p>
<p>
For 23 years, developers have downloaded the JDK from Oracle and used it for $free.
Type "JDK" into your favourite search engine, and the top link will be to an Oracle Java SE download page (I'm deliberately not providing a link).
But that search and that link is now a trap.
</p>
<p>
<b>Oracle JDK, the one all web searches take you to, is now commercial not $free.</b>
</p>
<p>
The key part of the <a href="https://www.oracle.com/technetwork/java/javase/terms/license/javase-license.html">terms</a> is as follows:
</p>
<p class="quote">
You may not: use the Programs for any data processing or any commercial, production, or internal business purposes other than developing, testing, prototyping, and demonstrating your Application;
</p>
<p>
The trap is as follows:
</p>
<ol>
<li>Download Oracle JDK (because that is what you've always done, and it is what the web-search tells you)</li>
<li>Use it in production (because you didn't realise the license changed)</li>
<li>Get a nasty phone call from Oracle's license enforcement teams demanding lots of money</li>
</ol>
<p>
In other words, Oracle can rely on inertia from Java developers to cause them to download the wrong (commercial) release of Java.
Unless you read the text/warnings/legalese very carefully you might not even realise Oracle JDK is now commercial, and that you are therefore liable to pay Oracle for using this particular JDK in production.
</p>
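<p>
Given the trap, it is worth verifying which JDK your machines actually run. The runtime system properties distinguish the builds - the exact strings are vendor-dependent rather than guaranteed by the spec, but Oracle JDK typically reports "Java(TM) SE Runtime Environment" while OpenJDK builds report "OpenJDK Runtime Environment":
</p>

```java
public class WhichJdk {
    public static void main(String[] args) {
        // "Java(TM) SE Runtime Environment" = Oracle JDK (commercial licence)
        // "OpenJDK Runtime Environment"     = an OpenJDK build (GPL + Classpath exception)
        System.out.println(System.getProperty("java.runtime.name"));
        System.out.println(System.getProperty("java.vendor"));
        System.out.println(System.getProperty("java.version"));
    }
}
```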
<p>
(Update, 2018-10-03: Searches for Java 11 and JDK 11 now seem to be resolving to OpenJDK builds, not commercial ones!)
</p>
<p>
Is this trap malicious behaviour on the part of Oracle? Readers will have their own opinions. I do suggest bearing in mind that Oracle invests huge amounts in developing Java, so it is reasonable to have a commercial plan available for those that want it. And they do provide a $free alternative completely valid for commercial use...
</p>
<h4>The solution</h4>
<p>
The solution is simple!
</p>
<p>
<b>Use an OpenJDK build.</b>
</p>
<p>
There are <a href="https://blog.joda.org/2018/09/time-to-look-beyond-oracles-jdk.html">many different</a> $free OpenJDK builds of Java 11, so you need to choose the one that best fits your needs.
</p>
<p>
The Adoptium (formerly AdoptOpenJDK) build is $free, GPL licensed (with <a href="http://openjdk.java.net/legal/gplv2+ce.html">Classpath exception</a> so safe for commercial use), and a good choice as it is vendor-neutral and is intended to have 4+ years of security patches.
</p>
<p style="text-align:center">
<a href="https://adoptium.net/">Download $free Java from Adoptium here</a>
</p>
<p>
The OpenJDK build from Oracle is $free, GPL licensed (with Classpath exception so safe for commercial use), and provided alongside their commercial offering. It will only have 6 months of security patches, after that Oracle intends you to <a href="https://blog.joda.org/2018/09/java-release-chains-features-and-security.html">upgrade to Java 12</a>.
</p>
<p style="text-align:center">
<a href="https://jdk.java.net/">Download $free Java from Oracle here</a>
</p>
<p>
Many more OpenJDK builds are available, including ones available via your package manager.
See this post for a list covering the <a href="https://blog.joda.org/2018/09/time-to-look-beyond-oracles-jdk.html">wide variety of OpenJDK builds</a>.
And see my post on <a href="https://blog.joda.org/2018/08/java-is-still-available-at-zero-cost.html">zero-cost Java</a> for background info.
</p>
<p style="text-align:center">
<a href="https://blog.joda.org/2018/09/time-to-look-beyond-oracles-jdk.html">Download other OpenJDK builds here</a>
</p>
<p>
And for a counterpoint, see Marcus' <a href="http://hirt.se/blog/?p=1036">great summary</a> of why the underlying changes here are actually good news.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://3.bp.blogspot.com/-NOpeQQtTc7A/W4T1hwB_E2I/AAAAAAAAGs4/CWQ_4eUcwjsyUl1TcY-cS_exW9NBe1UNwCPcBGAYYCw/s1600/OpenJdkLogo.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://3.bp.blogspot.com/-NOpeQQtTc7A/W4T1hwB_E2I/AAAAAAAAGs4/CWQ_4eUcwjsyUl1TcY-cS_exW9NBe1UNwCPcBGAYYCw/s200/OpenJdkLogo.png" width="200" height="200" data-original-width="200" data-original-height="200" /></a></div>
<h4>Summary</h4>
<p>
Do NOT download or use the Oracle JDK unless you intend to pay for it.
</p>
<p>
For Java 11, download and use an OpenJDK build, from <a href="https://adoptopenjdk.net/releases.html?variant=openjdk11&jvmVariant=hotspot">AdoptOpenJDK</a>, <a href="https://jdk.java.net/11/">Oracle</a> or elsewhere.
</p>
<p>
(No comments on this post. There are plenty of <a href="https://www.reddit.com/r/programming/comments/9j25je/do_not_fall_into_oracles_java_11_trap/">other</a> <a href="https://news.ycombinator.com/item?id=18074727">places</a> to express opinions.)
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.comtag:blogger.com,1999:blog-741750605858169835.post-35203590222385647562018-09-20T11:10:00.000+01:002018-10-31T09:50:40.243+00:00Java release chains - Splitting features from security<p>
There is now a Java release <a href="https://mreinhold.org/blog/forward-faster">every 6 months</a> - March and September.
It started with Java 9 and we're about to get Java 11.
But should you jump on the release train?
To answer that, we need to look at how Java's release chains are being split.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-vkz5J9IdpfI/W6OIu3Yri9I/AAAAAAAAGxs/lvTTZvjXGsQ_cx9DNkurPdJ-WbGX9F1MACLcBGAs/s1600/Versions.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-vkz5J9IdpfI/W6OIu3Yri9I/AAAAAAAAGxs/lvTTZvjXGsQ_cx9DNkurPdJ-WbGX9F1MACLcBGAs/s400/Versions.png" width="400" height="71" data-original-width="430" data-original-height="76" /></a></div>
<h4>Looking back at Java 8</h4>
<p>
In the olden days life was simple.
There was a "major" Java release every few years and it contained lots of new features, for example Java 5, 6, 7 and 8.
Each major release included new JDK methods, new JDK classes, deprecations, new JVM features and new language features.
However, life wasn't actually as simple as it seemed.
</p>
<p>
Looking at Java 8, once it was released there was a regular frequency of "update" releases.
The most well-known of these were <a href="http://www.oracle.com/technetwork/java/javase/8u20-relnotes-2257729.html">8u20</a>,
<a href="http://www.oracle.com/technetwork/java/javase/8u40-relnotes-2389089.html">8u40</a> and
<a href="http://www.oracle.com/technetwork/java/javase/8u60-relnotes-2620227.html">8u60</a>.
But there were also many others - 8u5, 8u11, 8u25, 8u31, 8u45, 8u51, 8u65, 8u66, 8u71, 8u73, 8u74, 8u77, etc.
So, what was going on?
</p>
<p>
Well the plan was quite simple, just not that widely known.
8u20, 8u40, 8u60 and so on were "feature" releases, while all the rest were security patch releases.
See the <a href="https://en.wikipedia.org/wiki/Java_version_history#Java_8_updates">full table</a>.
</p>
<ul>
<li>8u20, 8u40, 8u60 and so on were "feature" releases - every six months</li>
<li>8u5, 8u11, 8u25, 8u31 and so on were "security" releases - every three months, plus additional emergency releases</li>
</ul>
<p>
If you look closely, you can see a pattern.
The first security release after a feature release had a number 5 greater (8u25 is 5 greater than 8u20).
The second security release after a feature release had a number 11 greater (8u31 is 11 greater than 8u20).
This left space for emergency security releases like 8u66.
</p>
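<p>
As a tiny illustration (my own sketch, not anything from the release notes), the numbering pattern above can be computed directly from the feature-release number:
</p>

```java
// Sketch of the Java 8 update-numbering pattern described above:
// a feature release every 20 (8u20, 8u40, 8u60), with its two planned
// security releases numbered +5 and +11 after it (8u25/8u31, 8u45/8u51, ...).
public class UpdatePattern {
    public static void main(String[] args) {
        for (int feature = 20; feature <= 60; feature += 20) {
            System.out.printf("feature 8u%d -> security 8u%d, 8u%d%n",
                    feature, feature + 5, feature + 11);
        }
    }
}
```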
<p>
So what was a Java 8 feature release?
</p>
<p>
Well a feature release was allowed to contain anything that didn't impact the Java SE specification.
So, JVM or tool enhancements might be allowed, particularly if covered by a flag that was disabled by default.
For example, the "endorsed-standards override mechanism and the extension mechanism" was deprecated in 8u40, 8u60 added a new IBM character set, and 8u181 removed the Derby database from the JDK bundle.
</p>
<p>
If you did notice these feature releases, it might have been because Java 8 lambdas were pretty buggy until 8u40 (in my experience).
As such, I've often commented that 8u40 should have been named 8.1 (although logically 8u20 would have been 8.1 and 8u40 would have been 8.2).
Such a version numbering scheme would have been easy to grasp for the whole Java ecosystem, but sadly didn't happen with Java 8.
</p>
<h4>Java 11 and the release train</h4>
<p>
The observant will notice that there was a feature release every six months and a security release every three months.
But this is exactly like the "new" release train post Java 9!
</p>
<p>
The key difference between then and now is that from Java 9 onwards each six-monthly "feature" release can now contain <i>any</i> change.
Thus major JVM, library and language changes can be introduced every six months, which wasn't allowed before.
It also means there is a new Java SE specification for each feature release.
</p>
<p>
To handle this high frequency and still provide stability, every three years, which is every six releases, there will be an "LTS" release - long-term support.
As I've discussed <a href="https://blog.joda.org/2018/08/java-is-still-available-at-zero-cost.html">before</a>, the LTS name is rather confusing given that Oracle will not be providing security patches beyond six months on a "long-term" LTS release for $free. (Read the <a href="https://blog.joda.org/2018/08/java-is-still-available-at-zero-cost.html">linked blog</a> to understand how the OpenJDK will be performing that role from now on.)
</p>
<p>
Obviously with this new release schedule the question of naming arose.
Firstly, according to Oracle we are not supposed to refer to any release as a "major" release.
Instead, all should be considered to be "feature" releases, and of equal importance.
This is emphasised by using a single incrementing number - 9, 10, 11, 12, 13 and so on.
</p>
<p>
But are all releases really of equal importance when some have long-term support (LTS)?
</p>
<p>
The answer is clearly no.
For many large companies, the LTS releases are the only ones that they will consider adopting, either to manage churn or because they insist on purchasing a support contract for Java.
Thus only one conclusion can be drawn:
</p>
<p>
<b>Java 11 is a major release because it has LTS, not because of its feature set.</b> Java 12, 13 and so on are less significant simply because they do not have LTS.
</p>
<p>
Now this doesn't mean that Java 12 is any less tested or is in some way unsuited for general use.
Nor does it mean that Java 11 is bigger than any of the other versions.
It is simply a statement of fact that because Java 11 has LTS (long-term support) it will be more significant, and see adoption by a different type of user.
</p>
<h4>Release chains</h4>
<p>
The support model is built on chains of releases.
In order to pick up a security patch, you may also have to pick up other changes.
The support chain for any Java release is simple enough.
The higher the number, the more fixes/features it contains.
Thus 8u20 contains everything from 8u5 and 8u11.
</p>
<p>
Using bold for major/feature releases and italic for security patches we can see the pattern:
</p>
<ul>
<li><b>8</b>, <i>8u5</i>, <i>8u11</i>, <b>8u20</b>, <i>8u25</i>, <i>8u31</i>, <b>8u40</b>, <i>8u45</i>, <i>8u51</i>, <b>8u60</b>, <i>8u65</i>, <i>8u71</i>, ... </li>
</ul>
<p>
From Java 9 onwards there are effectively two separate patterns.
</p>
<p>
Firstly, there is the "release train" pattern if you want both features and security patches:
</p>
<ul>
<li><b>9</b>, <i>9.0.1</i>, <i>9.0.4</i>, <b>10</b>, <i>10.0.1</i>, <i>10.0.2</i>, <b>11</b>, <i>11.0.1</i>, <i>11.0.2</i>, <b>12</b>, <i>12.0.1</i>, <i>12.0.2</i>, ... </li>
</ul>
<p>
Secondly, there is the "LTS" pattern if you only want security patches:
</p>
<ul>
<li><b>11</b>, <i>11.0.1</i>, <i>11.0.2</i>, <i>11.0.3</i>, <i>11.0.4</i>, <i>11.0.5</i>, <i>11.0.6</i>, ... </li>
</ul>
<p>
Oracle is focussing its $free efforts on the first "release train" pattern.
After 11.0.2, it is ultimately the job of the OpenJDK community (probably led by Red Hat) to keep the chain in the second "LTS" pattern going.
</p>
<p>
Note the key difference though.
The two chains effectively split features from security.
</p>
<p>
In Oracle's view, the old Java 8 chain and the new "release train" chain are essentially equivalent.
I disagree, but I need <a href="https://blog.joda.org/2018/10/adopt-java-12-or-stick-on-11.html">another blog article</a> to explain why (as this one is already too long!).
</p>
<h4>Summary</h4>
<p>
There are now two separate chains of releases - the "release train" with both features and security patches, and the "LTS" with just security patches.
Neither chain is equivalent to how it worked in Java 8, so you need to choose which chain to adopt carefully.
</p>
<p>
Comments welcome. For example, did an 8u feature release break your code?
</p>
Stephen Colebourne
2018-09-06 (updated 2021-11-04)
From Java 8 to Java 11<p>
Moving from Java 8 to Java 11 is trickier than most upgrades.
Here are a few of my notes on the process.
</p>
<p>
(And here are a couple of other blogs - <a href="https://winterbe.com/posts/2018/08/29/migrate-maven-projects-to-java-11-jigsaw/">Benjamin Winterberg</a> and <a href="https://medium.com/criciumadev/its-time-migrating-to-java-11-5eb3868354f9">Leonardo Zanivan</a>.)
</p>
<h4>Modules</h4>
<p>
Java 9 introduced one of the largest changes in the history of Java - modules.
Much has been said on the topic, by <a href="https://blog.joda.org/search/label/modules">me</a> and others.
A key point is sometimes forgotten however:
</p>
<p>
<b>You do not have to modularise your code to upgrade to Java 11.</b>
</p>
<p>
In most cases, code running on the classpath will continue to run on Java 9 and later, where modules are completely ignored.
This is <a href="https://blog.joda.org/2018/03/jpms-negative-benefits.html">terrible</a> for library authors, but great for application developers.
</p>
<p>
So my advice is to ignore modules as much as you can when upgrading to Java 11.
Turning your application into Java modules may be a useful thing to consider in a few years' time when open source dependencies really start to adopt modules.
Right now, attempting to modularise is just painful as few dependencies are modules.
</p>
<p>
(The main reason I've found to modularise your application is to be able to use <a href="https://www.azul.com/the-incredible-shrinking-java-platform/">jlink</a> to shrink the size of the JDK. But in my opinion, you don't need to fully modularise to do this - just create a single <a href="https://maven.apache.org/plugins/maven-assembly-plugin/descriptor-refs.html">jar-with-dependencies</a> with a simple no-requires no-exports module-info.)
</p>
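<p>
For that jar-with-dependencies approach, the module descriptor really can be this small (a sketch of my own - the module name <code>com.example.app</code> is a hypothetical placeholder, not real code from any project):
</p>

```java
// module-info.java for a shaded "jar-with-dependencies":
// no requires beyond the implicit java.base, and no exports -
// just enough of a descriptor for jlink to treat the jar as a module.
module com.example.app {
}
```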
<h4>Deleted parts of the JDK</h4>
<p>
Some parts of the JDK <a href="http://openjdk.java.net/jeps/320">have been removed</a>.
These were parts of Java EE and Corba that no longer fitted well with the JDK, or could be maintained elsewhere.
</p>
<p>
If you use Corba then there is little anyone can do to help you.
However, if you use the Java EE modules then the fix for the deleted code should be simple in most cases.
Just add the appropriate <a href="https://stackoverflow.com/questions/48204141/replacements-for-deprecated-jpms-modules-with-java-ee-apis">Maven jars</a>.
</p>
<p>
On the Java client side, things are <a href="http://www.oracle.com/technetwork/java/javase/javaclientroadmapupdate2018mar-4414431.pdf">more tricky</a> with the removal of Java WebStart. Consider using <a href="https://github.com/threerings/getdown/">Getdown</a> or <a href="https://github.com/update4j/update4j/">Update4J</a> instead.
</p>
<h4>Unsafe and friends</h4>
<p>
Sun and Oracle have been <a href="https://www.youtube.com/watch?v=4HG0YQVy8UM&feature=youtu.be">telling developers</a> for years not to use <code>sun.misc.Unsafe</code> and other sharp-edge JDK APIs.
For a long time, Java 9 was to be the release where those classes disappeared.
But this <a href="http://mail.openjdk.java.net/pipermail/jigsaw-dev/2017-June/012841.html">never actually happened</a>.
</p>
<p>
What you might get with Java 11 however is a warning.
This <a href="https://stackoverflow.com/questions/46454995/how-to-hide-warning-illegal-reflective-access-in-java-9-without-jvm-argument">warning</a> will only be printed once, on first access to the restricted API.
It is a useful reminder that your code, or a dependency, is doing something "naughty" and will need to be fixed at some point.
</p>
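<p>
If you want more than that single warning while hunting down the offending code, the JVM has switches for this (a sketch - <code>app.jar</code> stands in for your own application; <code>--illegal-access</code> exists from JDK 9 to 16):
</p>

```shell
# Report every illegal reflective access as it happens, not just the first:
java --illegal-access=warn -jar app.jar

# Or fail fast, so the offending call site is obvious:
java --illegal-access=deny -jar app.jar

# Once identified, explicitly open the internal package to classpath code:
java --add-opens java.base/jdk.internal.misc=ALL-UNNAMED -jar app.jar
```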
<p>
What you will also find is that Java 11 has a number of new APIs specifically designed to avoid the need to use <code>Unsafe</code> and friends.
Make it a priority to investigate those new APIs if you are using an "illegal" API.
For example,
<a href="https://docs.oracle.com/javase/10/docs/api/java/util/Base64.html">Base64</a>,
<a href="https://docs.oracle.com/javase/10/docs/api/java/lang/invoke/MethodHandles.html#privateLookupIn(java.lang.Class,java.lang.invoke.MethodHandles.Lookup)">MethodHandles.privateLookupIn</a>,
<a href="https://docs.oracle.com/javase/10/docs/api/java/lang/invoke/MethodHandles.Lookup.html#defineClass(byte%5B%5D)">MethodHandles.Lookup.defineClass</a>,
<a href="https://docs.oracle.com/javase/10/docs/api/java/lang/StackWalker.html">StackWalker</a>
and <a href="http://openjdk.java.net/jeps/193">Variable Handles</a>.
</p>
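<p>
To give a flavour of two of those (a sketch of my own, not code from any of the linked pages): <code>java.util.Base64</code> replaces the old <code>sun.misc.BASE64Encoder</code>/<code>BASE64Decoder</code> hacks, and <code>StackWalker</code> replaces <code>sun.reflect.Reflection.getCallerClass()</code>.
</p>

```java
import java.util.Base64;

public class SupportedApis {

    // StackWalker (Java 9+) replaces sun.reflect.Reflection.getCallerClass().
    // RETAIN_CLASS_REFERENCE is required for getCallerClass() to work.
    static Class<?> callerOf() {
        return StackWalker.getInstance(StackWalker.Option.RETAIN_CLASS_REFERENCE)
                .getCallerClass();
    }

    public static void main(String[] args) {
        // java.util.Base64 (since Java 8) replaces sun.misc.BASE64Encoder/Decoder
        String encoded = Base64.getEncoder().encodeToString("Hello".getBytes());
        System.out.println(encoded);  // SGVsbG8=
        System.out.println(new String(Base64.getDecoder().decode(encoded)));  // Hello

        // prints the class that invoked callerOf(), here SupportedApis itself
        System.out.println(callerOf().getSimpleName());
    }
}
```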
<h4>Tooling and Libraries</h4>
<p>
Modules and the new <a href="https://mreinhold.org/blog/forward-faster">six-monthly release cycle</a> have conspired to have a real impact on the tooling and libraries developers use.
Some projects have been able to keep up. Some have struggled. Some have failed.
</p>
<p>
When upgrading to Java 11, a key task is to update all your dependencies to the latest version.
If there hasn't been a release since Java 9 came out, then that dependency may need extra care or testing.
Make sure you've updated your IDE too.
</p>
<p>
But it is not just application dependencies that need updating, so does Maven (and Gradle too, but I don't know much about Gradle myself).
Most Maven plugins have changed major versions to a v3.x, and upgrading Maven itself to v3.5.4 is also beneficial.
</p>
<p>
Sadly, the core Maven team is very small, so there are still some bugs/issues to be solved.
However, if your Maven build is sensible and simple enough, you should generally be OK.
But do note that upgrading a plugin from a v2.x to a v3.x may involve changes to configuration beyond those associated with modules.
For example, the <a href="https://maven.apache.org/plugins/maven-javadoc-plugin/javadoc-mojo.html">Maven Javadoc plugin</a> has renamed the <code>argLine</code> property.
</p>
<p>
A key point to note is the way Maven operates with modules.
When the Maven compiler or surefire plugin finds a jar file that is modular (i.e. with a module-info.class) it can place that jar on the modulepath instead of the classpath. As such, even though you might intend to <i>run</i> the application fully on the classpath, Maven might compile and test the code partly on the classpath and partly on the modulepath. At <a href="https://issues.apache.org/jira/browse/SUREFIRE-1497">present</a>, there is nothing that can be done about this.
</p>
<p>
Sometimes your build will need some larger changes.
For example, you will need to change Findbugs to <a href="https://spotbugs.github.io/">SpotBugs</a>.
And change Cobertura to <a href="https://www.eclemma.org/jacoco/">JaCoCo</a>.
</p>
<p>
If you use Web Start, there is an open source alternative - <a href="https://openwebstart.com/">OpenWebStart</a>.
</p>
<p>
These build changes may take some time - they did for me.
But the information available by a simple web search is increasing all the time.
</p>
<h4>Summary</h4>
<p>
I've upgraded a number of <a href="https://www.joda.org/">Joda</a>/<a href="https://www.threeten.org/">ThreeTen</a> projects to support Java 9 or later now.
It was very painful.
But that is because as a library author I have to produce a jar file with <code>module-info</code> that builds and runs on Java 8, 9, 10 and 11.
(In fact some of my jar files run on Java 6 and 7 too!)
</p>
<p>
Having done these migrations, my conclusion is that the pain is primarily in maintaining compatibility with Java 8.
Moving an <i>application</i> to Java 11 should be simpler, because there is no need to stay tied to Java 8.
</p>
<p>
Comments welcome, but note that most "how to" questions should be on Stack Overflow, not here!
</p>
Stephen Colebourne
2018-09-03 (updated 2021-11-04)
Time to look beyond Oracle's JDK<p>
From Java 11, it's time to think beyond Oracle's JDK. Time to appreciate the depth of the ecosystem built on OpenJDK.
Here is a list of some key OpenJDK builds.
</p>
<p>
<i>This is a quick follow up to my recent <a href="https://blog.joda.org/2018/08/java-is-still-available-at-zero-cost.html">zero-cost Java</a> post</i>
</p>
<h4>OpenJDK builds</h4>
<p>
In practical terms, there is only one set of source code for the JDK.
The source code is hosted in Mercurial at <a href="http://openjdk.java.net/projects/jdk/">OpenJDK</a>.
</p>
<p>
Anyone can take that source code, produce a build and publish it on a URL.
But there is a distinct certification process that should be used to ensure the build is valid.
</p>
<p>
Certification is run by the <a href="https://www.jcp.org/en/home/index">Java Community Process</a>, which provides
a Technology Compatibility Kit (TCK, sometimes referred to as the JCK).
If an organization produces an OpenJDK build that passes the TCK then that build can be described as "Java SE compatible".
</p>
<p>
Note that the build cannot be referred to as "Java SE" without the vendor getting a commercial license from Oracle.
For example, builds from AdoptOpenJDK that <a href="https://adoptopenjdk.net/support.html">pass the TCK</a> are not "Java SE", but "Java SE compatible" or "compatible with the Java SE specification".
Note also that certification is currently on a trust-basis - the results are not submitted to the JCP/Oracle for checking and cannot be made public.
See <a href="https://blog.joda.org/2018/09/time-to-look-beyond-oracles-jdk.html?showComment=1536303755275#c5934330550793525320">Volker's excellent comment</a> for more details.
</p>
<p>
To summarise, the OpenJDK + Vendor process turns one codebase into many different builds.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-6reliLjlXxg/W4xfNstXfaI/AAAAAAAAGuI/ZlUlltoSqgwuTFT46yyH2VmJ2y4ofHQIgCLcBGAs/s1600/ManyBuilds.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-6reliLjlXxg/W4xfNstXfaI/AAAAAAAAGuI/ZlUlltoSqgwuTFT46yyH2VmJ2y4ofHQIgCLcBGAs/s320/ManyBuilds.png" width="284" height="320" data-original-width="320" data-original-height="361" /></a></div>
<p>
In the process of turning the OpenJDK sourcebase into a build, the vendor may, or may not, add some additional branding or utilities, provided these do not prevent certification. For example, a vendor cannot add a new public method to an API, or a new language feature.
</p>
<h4>Oracle JDK</h4>
<p>
<a href="http://www.oracle.com/technetwork/java/javase/downloads/">http://www.oracle.com/technetwork/java/javase/downloads/</a>
</p>
<p>
From Java 11 this is a branded commercial build with paid-for support.
It can be downloaded and used without cost only for <a href="https://www.oracle.com/technetwork/java/javase/terms/license/javase-license.html">development use</a>.
It <b>cannot be used in production without paying Oracle</b> (so it is a bit of a <a href="http://blog.joda.org/2018/09/do-not-fall-into-oracles-java-11-trap.html">trap for the unwary</a>).
Oracle intends to provide full paid support until 2026 or later (<a href="http://www.oracle.com/technetwork/java/javase/eol-135779.html">details</a>).
Note that unlike in the past, the Oracle JDK is not "better" than the OpenJDK build (provided both are at the same security patch level).
See <a href="https://blogs.oracle.com/java-platform-group/oracle-jdk-releases-for-java-11-and-later">here</a> for more details of the small differences between the Oracle JDK and the OpenJDK build by Oracle.
</p>
<h4>OpenJDK builds by Oracle</h4>
<p>
<a href="http://jdk.java.net/">http://jdk.java.net/</a>
</p>
<p>
These are $free pure unbranded builds of OpenJDK under the GPL license with <a href="https://softwareengineering.stackexchange.com/questions/119436/what-does-gpl-with-classpath-exception-mean-in-practice">Classpath Exception</a> (safe for use in companies).
These builds are only available for the first 6 months of a release.
For Java 11, the expectation is there will be Java 11.0.0, then two security patches 11.0.1 and 11.0.2.
To continue using the OpenJDK build by Oracle with security patches, you would have to move to Java 12 within one month of it being released.
(Note that the provision of security patches is not the same as support. Support involves paying someone to triage and act upon your bug reports.)
</p>
<h4>Eclipse Temurin by Adoptium (formerly AdoptOpenJDK)</h4>
<p>
<a href="https://adoptium.net/">https://adoptium.net/</a>
</p>
<p>
These are $free pure unbranded builds of OpenJDK under the GPL license with Classpath Exception.
Unlike the OpenJDK builds by Oracle, these builds will continue for a much longer period for major releases like Java 11.
The Java 11 builds will continue for 4 years, one year after the next major release (<a href="https://adoptopenjdk.net/support.html">details</a>).
Adoptium is a <a href="https://adoptium.net/about.html">project at Eclipse</a> backed by some <a href="https://adoptium.net/members.html">major industry companies</a>.
They will produce builds as long as other groups create and publish security patches in a source repository at OpenJDK.
</p>
<h4>Red Hat OpenJDK builds</h4>
<p>
Red Hat provides builds of OpenJDK via Red Hat Enterprise Linux (RHEL) which is a commercial product with paid-for support (<a href="https://access.redhat.com/articles/1299013">details</a>).
They are very good at providing security patches back to OpenJDK, and Red Hat has run the security updates project for Java 6 and 7.
The Red Hat build is <a href="https://blog.joda.org/2018/08/java-is-still-available-at-zero-cost.html?showComment=1535557540917#c3115313366548497">integrated better</a> into the operating system, so it is not a pure OpenJDK build (although you wouldn't notice the difference).
</p>
<h4>Other Linux OpenJDK builds</h4>
<p>
Different <a href="https://en.wikipedia.org/wiki/List_of_Linux_distributions">Linux distros</a> have different ways to access OpenJDK.
Here are some links for common distros:
<a href="https://wiki.debian.org/Java#Java_and_Debian">Debian</a>,
<a href="https://fedoraproject.org/wiki/Java#Java_packages_in_Fedora">Fedora</a>,
<a href="https://wiki.archlinux.org/index.php/java#Officially_supported_packages">Arch</a>,
<a href="https://help.ubuntu.com/community/Java">Ubuntu</a>.
</p>
<h4>Azul Zulu</h4>
<p>
<a href="https://www.azul.com/downloads/zulu/">https://www.azul.com/downloads/zulu/</a>
</p>
<p>
Zulu is a branded build of OpenJDK with commercial paid-for support.
In addition, Azul provides some Zulu builds for $free as <a href="https://www.azul.com/downloads/zulu/">"Zulu Community"</a>, however there are <a href="https://www.azul.com/products/azul_support_roadmap/">no specific commitments</a> as to the availability of those $free builds.
Azul has an extensive plan for supporting Zulu commercially, including plans to support Java 9, 13 and 15, unlike any other vendor (<a href="https://www.azul.com/products/azul_support_roadmap/">details</a>).
</p>
<h4>Amazon Corretto</h4>
<p>
<a href="https://aws.amazon.com/corretto/">https://aws.amazon.com/corretto/</a>
</p>
<p>
In preview from 2018-11-14, this is a zero-cost build of OpenJDK with long-term support that passes the TCK.
It is under the standard GPL+CE license of all OpenJDK builds.
Amazon will be adding their <a href="https://aws.amazon.com/corretto/faqs/">own patches</a> and running Corretto on AWS, so it will be very widely used.
Java 8 support will be until <a href="https://aws.amazon.com/corretto/faqs/#support">at least June 2023</a>.
</p>
<h4>IBM Semeru (formerly AdoptOpenJDK OpenJ9 builds)</h4>
<p>
<a href="https://developer.ibm.com/languages/java/semeru-runtimes/">https://developer.ibm.com/languages/java/semeru-runtimes/</a>
</p>
<p>
In addition to the standard OpenJDK builds, AdoptOpenJDK offered builds with OpenJ9 instead of HotSpot.
OpenJ9 was originally IBM's JVM, but OpenJ9 is now <a href="https://www.eclipse.org/openj9/">Open Source at Eclipse</a>.
These builds are now available from IBM directly.
They also provide commercial paid-for <a href="https://www.ibm.com/cloud/support-for-runtimes">support</a> for the AdoptOpenJDK builds with OpenJ9.
</p>
<h4>SAP</h4>
<p>
<a href="https://sap.github.io/SapMachine/">https://sap.github.io/SapMachine/</a>
</p>
<p>
SAP provides a JDK for Java 10 and later under the GPL+CE license.
They also have a commercial closed-source JVM.
Information on support lifetimes can be found <a href="https://github.com/SAP/SapMachine/wiki/Security-Updates,-Maintenance-and-Support">here</a>.</p>
<h4>Liberica</h4>
<p>
<a href="https://bell-sw.com/java.html">https://bell-sw.com/java.html</a>
</p>
<p>
Liberica from Bellsoft is a $free TCK verified OpenJDK distribution for x86, ARM32 and ARM64.
Information on support can be found <a href="https://bell-sw.com/support/">here</a>.</p>
<h4>ojdkbuild project</h4>
<p>
<a href="https://github.com/ojdkbuild/ojdkbuild">https://github.com/ojdkbuild/ojdkbuild</a>
</p>
<p>
A community project sponsored by Red Hat that produces OpenJDK builds.
They are the <a href="https://github.com/ojdkbuild/ojdkbuild/wiki/Motivation">basis</a> of Red Hat's Windows builds.
The project has no customer support but does intend to provide builds so long as Red Hat supports a Java version.
No information is provided about the TCK.
</p>
<h4>Others</h4>
<p>
There are undoubtedly other builds of OpenJDK, both commercial and $free.
Please contact me if you'd like me to consider adding another section.
</p>
<h4>Summary</h4>
<p>
There are many different builds of OpenJDK, the original upstream source repository.
Each build offers its own unique take - $free or commercial, branded or unbranded.
</p>
<p>
Choice is great.
But if you just want the "standard", currently my best advice is to use the OpenJDK builds by Oracle, Adoptium builds or the one in your Operating System (Linux).
</p>
Stephen Colebourne
2018-08-28 (updated 2021-11-04)
Java is still available at zero-cost<p>
The Java ecosystem has always been built on a high quality $free (zero-cost) JDK available from Oracle, and previously Sun.
This is as true today as it always has been - but the new six-monthly release cycle does mean some big changes are happening.
</p>
<h4>Six-monthly releases</h4>
<p>
Java now has a release <a href="https://blogs.oracle.com/java-platform-group/update-and-faq-on-the-java-se-release-cadence">every six months</a>, something which greatly impacts how each version is supported.
By support, I mean the provision of update releases with security patches and important bug fixes.
</p>
<p>
Up to and including Java 8, $free security updates were provided for many years. Certainly up to and beyond the launch of the next version.
With Java 9 and the six-monthly release cycle, this $free support is now much more tightly controlled.
</p>
<p>
In fact, <b><u>Oracle</u> will not be providing $free long-term support (LTS) for any single Java version</b> at all from Java 11 onwards.
</p>
<table style="width:95%" class="tbl">
<tr><th>Version</th><th>Release date</th><th><a href="http://www.oracle.com/technetwork/java/javase/eol-135779.html">End of $free updates from Oracle</a></th></tr>
<tr><th>Java 8</th><td>March 2014</td><td>January 2019 (for commercial use)</td></tr>
<tr><th>Java 9</th><td>Sept 2017</td><td>March 2018</td></tr>
<tr><th>Java 10</th><td>March 2018</td><td>Sept 2018</td></tr>
<tr><th>Java 11</th><td>Sept 2018</td><td>March 2019 <strike>(might be extended, see below)</strike></td></tr>
<tr><th>Java 12</th><td>March 2019</td><td>Sept 2019</td></tr>
</table>
<p>
The idea here is simple. Oracle wants to focus its energy on moving Java forward with the cost of long-term support directly paid for by customers (instead of giving it away for $free). To do this, they need developers to continually upgrade their version of Java, moving version every six months (and picking up the patch releases in-between). Of course, for most development shops, such rapid upgrade is not feasible.
<b>But Java is now developed as OpenJDK</b>, which means that <b>Oracle's support dates are not the only ones to consider.</b>
</p>
<p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-NOpeQQtTc7A/W4T1hwB_E2I/AAAAAAAAGs0/g6qm---ZxR4LAz6HdNHVsHtRUIysAyJ1gCPcBGAYYCw/s1600/OpenJdkLogo.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-NOpeQQtTc7A/W4T1hwB_E2I/AAAAAAAAGs0/g6qm---ZxR4LAz6HdNHVsHtRUIysAyJ1gCPcBGAYYCw/s200/OpenJdkLogo.png" width="200" height="200" data-original-width="200" data-original-height="200" /></a></div>
</p>
<h4>OpenJDK</h4>
<p>
A key point to grasp is that most JDK builds in the world are based on the open source <a href="http://openjdk.java.net/">OpenJDK project</a>. The Oracle JDK is merely one of many builds that are based on the OpenJDK codebase. While it used to be the case that Oracle had additional extras in their JDK, as of Java 11 this is no longer the case.
</p>
<p>
Many other vendors also provide builds based on the OpenJDK codebase.
These builds may have additional branding and/or additional non-core functionality.
Most of these vendors also contribute back to the upstream OpenJDK project, including the security patches.
</p>
<p>
The impact is that <b>the JDK you use should now be a choice you actively make</b>, not passively accept.
How fast can you get security patches? How long will it be supported? Do you need to be able to apply contractual pressure to a vendor to help with any issues?
</p>
<p>
In addition, there are two main ways that the JDK is obtained.
The first is an update mechanism built into the operating system (e.g. *nix).
The second is to visit a URL and download a binary (e.g. Windows).
</p>
<p>
To examine this further, let's look at Java 8 and Java 11 separately.
</p>
<h4>Staying on Java 8</h4>
<p>
If you want to stay on Java 8 after January 2019, here are the choices as I see them:
</p>
<p>
1) Don't care about security.
</p>
<p>
It is entirely possible to remain on the last $free release forever.
Just don't complain when hackers destroy your company.
</p>
<p>
2) Rely on Operating System updates.
</p>
<p>
On *nix platforms, you may well obtain your JDK via the operating system (eg.
<a href="https://access.redhat.com/articles/1299013">Red Hat</a>,
<a href="https://wiki.debian.org/Java#Java_and_Debian">Debian</a>,
<a href="https://fedoraproject.org/wiki/Java#Java_packages_in_Fedora">Fedora</a>,
<a href="https://wiki.archlinux.org/index.php/java#Officially_supported_packages">Arch</a>,
etc.).
And as such, updates to the JDK are delivered via the operating system vendor.
This is where Red Hat's participation is key - they promise Java 8 updates until <a href="https://access.redhat.com/articles/1299013">June 2023</a> in Red Hat Enterprise Linux - but they also have an <a href="https://twitter.com/neugens/status/1034839076044775426">"upstream first"</a> policy, meaning they prefer to push fixes back to the "upstream" OpenJDK project.
Whether you get security patches to the JDK or not will depend on your operating system vendor, and whether they need you to pay for those updates.
</p>
<p>
3) Pay for support.
</p>
<p>
A number of companies, including <a href="https://www.azul.com/products/zulu-and-zulu-enterprise/">Azul</a>, <a href="https://developer.ibm.com/javasdk/support/lifecycle/">IBM</a>, <a href="http://www.oracle.com/technetwork/java/javaseproducts/overview/javasesubscriptionfaq-4891443.html">Oracle</a> and <a href="https://access.redhat.com/articles/1299013">Red Hat</a>, offer ongoing support for Java.
By paying them, you get access to the stream of security patches and update releases with certain guarantees (as opposed to volunteer-led approaches).
If you have cash, maybe it is fair and reasonable to pay for Java?
</p>
<p>
4) Use the non-commercial build in a commercial setting.
</p>
<p>
Oracle will <a href="http://www.oracle.com/technetwork/java/javase/eol-135779.html">provide builds</a> of Java 8 for non-commercial use until December 2020, so you could use those.
But you don't want Oracle's software licensing teams chasing you, do you?
</p>
<p>
5) Build OpenJDK yourself.
</p>
<p>
The stream of security patches * is <a href="http://openjdk.java.net/projects/jdk8u/">published</a> to a public Mercurial repository under the GPL license.
As such, it is perfectly possible to build OpenJDK yourself by keeping track of commits to that repository.
I suspect this is not a very realistic choice for most companies.
</p>
<p>
6) Use the builds from Adoptium <s style="text-decoration: line-through">AdoptOpenJDK</s>.
</p>
<p>
<b>Update 2021-11-04: AdoptOpenJDK is now <a href="https://adoptium.net/">Adoptium</a>. The links below have not been updated as there is not a matching page in all cases.</b>
</p>
<p>
The community team at <a href="https://adoptopenjdk.net/">AdoptOpenJDK</a> has been busy over the past few years creating a build farm and testing rig.
As such, they are now able to take the stream of security patches * and turn them into releases, just like you would get from the commercial offerings.
They also intend to run the Java TCK (testing compatibility kit) to allow these builds to be fully certified as being compatible with the Java SE specification.
Their plan is to produce Java 8 builds until <a href="https://adoptopenjdk.net/support.html">September 2023</a> or later (two years after Java 17 comes out).
Obviously, you <a href="https://adoptopenjdk.net/support.html">don't get</a> a warranty or genuine support - it's a community build-farm project.
But for most users that want to use Java 8 without paying, this is likely the best choice.
</p>
<p>
Note that Azul also offers $free OpenJDK release builds under the name <a href="https://www.azul.com/downloads/zulu/">Zulu</a>.
A key advantage of Azul's offering is that you can pay for support later if you need it without changing your JDK.
</p>
<p>
* The last two options assume that a group actually will step forward and take over the "JDK 8 updates" OpenJDK project once Oracle stop.
While the exact project details are not yet confirmed, this IBM statement indicates real backing for the approach:
</p>
<p class="quote">
Recognizing the impact that the release cycle changes will have with Java developers, IBM will partner with other members of the OpenJDK community to continue to update an OpenJDK Java 8 stream with security patches and critical bug fixes. We intend to keep the current LTS version secure and high quality for 4 years. This timescale bridges the gap between LTS versions with 1 year to allow for a migration period. IBM has also invested in an open build and test project (AdoptOpenJDK.net) along with many partners and Java leaders to provide community binaries across commonly used platforms of OpenJDK with Hotspot and OpenJDK with Eclipse OpenJ9. These community binaries are TCK (Java SE specification) compliance tested and ready for developers to download and use in production.<br />
<a href="https://developer.ibm.com/javasdk/2018/04/26/java-standard-edition-ibm-support-statement/">IBM statement</a>
</p>
<p>
And here is further indication of Red Hat's support for the June 2023 date, based on their "upstream first" policy.
</p>
<p class="quote">
> Red Hat has said it may step forward to be the maintainer for JDK 11 - might it also
> step forward to be the maintainer for JDK 8?<br />
Yes.<br />
> How long will JDK 8 be maintained?<br />
June 2023 is right for JDK 8, but I wouldn't be surprised if it goes on beyond that.<br />
<a href="http://mail.openjdk.java.net/pipermail/jdk-dev/2018-September/001913.html">Andrew Haley, Red Hat</a>
</p>
<p>
And finally, here is the official Oracle view on <a href="https://blogs.oracle.com/java-platform-group/end-of-public-updates-is-a-process%2c-not-an-event">that transition</a>.
</p>
<p>
Note that the process around security issues will be managed by the newly formed <a href="http://openjdk.java.net/groups/vulnerability/">vulnerability group</a> (which formally codifies what was happening anyway).
</p>
<h4>Staying on Java 9 or Java 10</h4>
<p>
Don't.
</p>
<p>
No-one wants to provide builds or support for Java 9 or Java 10.
And anyway, there is no good reason not to upgrade to Java 11 in my opinion.
</p>
<p>
(Actually, Azul are providing <a href="https://www.azul.com/products/azul_support_roadmap/">medium-term support</a> for Java 9, if you really need it.)
</p>
<h4>Staying on Java 11</h4>
<p>
It is a brave new world and not 100% clear, but this is how things look likely to happen.
</p>
<p>
<strike>Firstly, it is not clear that there will be an Oracle JDK that is $free to download.
Despite my <a href="https://twitter.com/DonaldOJDK/status/1032637222418362368">best attempts</a>, I could not get 100% clarity on this point.
However, <a href="https://twitter.com/DonaldOJDK/status/977971082706612224">this tweet</a> and other sources indicate that it will be accessible for development purposes, just not for use in production (which is a bit of a trap for the unwary).
But in reality, it is now irrelevant as to whether Oracle JDK is $free to download or not.</strike>
</p>
<p>
Now that Java 11 is released, we can see that Oracle JDK can be downloaded for $free.
However, the license has changed, explicitly ruling out use in a production environment (without paying).
Given that Oracle JDK has been the main JDK in use for 23 years, this is a huge <a href="https://blog.joda.org/2018/09/do-not-fall-into-oracles-java-11-trap.html">trap for the unwary</a>.
</p>
<p>
But this is not actually a problem, because Oracle JDK is no longer that important.
That is because from Java 11, developers can treat Oracle JDK and OpenJDK as being equivalent.
(See <a href="https://blogs.oracle.com/java-platform-group/oracle-jdk-releases-for-java-11-and-later">here</a> for the detailed differences.)
It is no longer appropriate or correct to consider the OpenJDK build to be secondary or less important.
In fact, <b>the most important build is now the OpenJDK one</b>.
</p>
<p>
To be more specific, as of the release date, Java 11 developers should consider using <a href="https://adoptopenjdk.net/?variant=openjdk11&jvmVariant=hotspot">AdoptOpenJDK</a> or <a href="https://jdk.java.net/">jdk.java.net</a> to obtain a binary download, not any page on <a href="https://www.oracle.com/technetwork/java/javase/downloads/index.html" rel="nofollow">oracle.com</a>.
</p>
<p>
So, for how long will Oracle provide security patches to Java 11?
</p>
<p>
Again, the answer to this is not 100% clear:
</p>
<p class="quote">
> What will Java 11 get from Oracle?<br />
At least six months of free, GPL-licensed updates with binaries at https://jdk.java.net.<br />
(I say “at least” because that’s the current plan. The plan could change, to a longer time period, if the situation warrants.)<br />
<a href="http://mail.openjdk.java.net/pipermail/jdk-dev/2018-August/001824.html">Mark Reinhold, Oracle</a>
</p>
<p>
Clearly, six months of security updates is not long enough to treat Java 11 as a "long term support" (LTS) release.
The promise of the period being potentially longer doesn't really help, as companies need longer timelines and more certainty.
So the working assumption should be that Java 11 has just 6 months of releases containing security patches <u>from Oracle</u>.
</p>
<p>
At this point, things move into the realms of probabilities.
In all likelihood, when Oracle steps down from managing the "JDK 11 updates" project at OpenJDK (the one containing the security patches), someone else will take over.
Exactly as with Java 8, discussed above.
This has <a href="http://mail.openjdk.java.net/pipermail/jdk-dev/2018-August/001833.html">happened before</a> with Java 6 and 7.
And the evidence is that it will happen for Java 11 too:
</p>
<p class="quote">
OpenJDK is a community project. It's up to the community to support it.
In practice this means that a group of organizations and individuals will maintain each OpenJDK LTS release for some period
(TBA for 11, but it's sure to be a *lot* longer than six months.)
I am certain that there will be a jdk11u project, and it will be properly and professionally run.
I think it's likely that I'll be leading the project, but someone else may be chosen.<br />
<a href="http://mail.openjdk.java.net/pipermail/jdk-dev/2018-August/001826.html">Andrew Haley, Red Hat</a>
</p>
<p>
See also this <a href="https://developers.redhat.com/blog/2018/09/24/the-future-of-java-and-openjdk-updates-without-oracle-support/">Red Hat blog</a> on the topic.
</p>
<p>
That covers the repository containing the security patches.
(Red Hat have an excellent record in maintaining old releases of OpenJDK for the wider community.)
But there is still the question of producing actual releases to download that have been certified as passing the Java SE testing TCK.
</p>
<p>
This is where the <a href="https://adoptopenjdk.net/index.html">AdoptOpenJDK build farm</a> is critical:
</p>
<p class="quote">
As part of the discussions Andrew mentioned, AdoptOpenJDK offered to build, test and make available OpenJDK
LTS binaries for the major (and several minor) platforms. This isn't yet set in concrete but folks broadly thought that
was a good idea. So the challenge of having a build and test farm for this joint effort is solved.<br />
<a href="http://mail.openjdk.java.net/pipermail/jdk-dev/2018-August/001830.html">Martijn Verburg, AdoptOpenJDK</a>
</p>
<p>
And AdoptOpenJDK are currently planning to create releases until <a href="https://adoptopenjdk.net/support.html">September 2022</a>,
one year after Java 17 comes out.
</p>
<p>
If people do what they say they will, then we can therefore conclude that it <b>will be possible to use Java 11 for 4 years from release, with security patches, for $free (zero-cost)</b>. (I would imagine that if volunteers came forward, the September 2022 date could be moved even further into the future.)
</p>
<p>
Of course, only you and your company can decide if it is right and proper to use Java without giving back to the ecosystem.
It could be argued that it is more ethical to either pay for support, or assist either the AdoptOpenJDK or "JDK 11 updates" OpenJDK project.
</p>
<p>
This is therefore the updated table of $free updates:
</p>
<table style="width:95%" class="tbl">
<tr><th>Version</th><th>Release date</th><th>End of Oracle $free updates</th><th>End of OpenJDK-based $free updates</th></tr>
<tr><th>Java 8</th><td>March 2014</td><td>January 2019</td><td>June 2023 (or later)</td></tr>
<tr><th>Java 9</th><td>Sept 2017</td><td>March 2018</td><td>N/A</td></tr>
<tr><th>Java 10</th><td>March 2018</td><td>Sept 2018</td><td>N/A</td></tr>
<tr><th>Java 11</th><td>Sept 2018</td><td>March 2019 (or later)</td><td>September 2022 (or later)</td></tr>
<tr><th>Java 12</th><td>March 2019</td><td>Sept 2019</td><td>N/A</td></tr>
</table>
<p>
(June 2023 is the date Red Hat has provided for the end of JDK 8 security patches, September 2022 is the date AdoptOpenJDK have provided - one year after the expected release of the next LTS (long-term support) version, Java 17).
</p>
<h4>Platforms</h4>
<p>
The OpenJDK builds by Oracle at <a href="https://jdk.java.net/">jdk.java.net</a> only cover three platforms.
But this doesn't mean that they are the only platforms OpenJDK runs on.
For example, AdoptOpenJDK provides Java 8 builds on <a href="https://adoptopenjdk.net/releases.html?variant=openjdk8&jvmVariant=hotspot">9 platforms</a> with Hotspot (including ARM, Windows 32-bit and Solaris) and <a href="https://adoptopenjdk.net/releases.html?variant=openjdk8&jvmVariant=openj9">more platforms</a> with OpenJ9.
</p>
<h4>Summary</h4>
<p>
All the pieces are in place for Java 11 to be maintained as a long-term support release.
However, unlike Java 6, 7 and 8, <b>Oracle will not be leading the long-term support effort</b>.
In all likelihood, Red Hat will take over this task - they have said publicly that they would like to.
</p>
<p>
For the first 6 months of Java 11's life, Oracle will be providing GPL+CE licensed $free zero-cost downloads at <a href="https://jdk.java.net/">jdk.java.net</a> with security patches.
</p>
<p>
To get GPL+CE licensed $free zero-cost update releases of Java 11 after the first six months, you are likely to need to obtain them from a different URL and a different build team.
Currently, my view is that your package manager or <a href="https://adoptium.net/">Adoptium</a> is the best place to look for those builds.
</p>
<p>
Feel free to comment if I've missed something obvious.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com57tag:blogger.com,1999:blog-741750605858169835.post-38380757614129046962018-07-09T10:51:00.000+01:002018-07-09T10:51:05.173+01:00Upgrading to Eclipse Photon<p>
I use Eclipse as my Java IDE. And the new release, <a href="http://www.eclipse.org/photon/">Photon</a> is now out.
</p>
<p>
Photon is a large release, with <a href="http://www.eclipse.org/photon/noteworthy/index.php">lots of new features</a>.
The most important is the separation of the test and main classpaths, which has always been a point of pain in the IDE.
Now it just works as you would expect, and the Maven plugin <a href="https://projects.eclipse.org/projects/technology.m2e/releases/1.9">M2E</a> correctly sets it up:
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://1.bp.blogspot.com/-WkbhANO8jy8/W0MqB8w7GkI/AAAAAAAAGlU/YM62bO8briQu2rlvxr8S-gMWBt76nG2bQCLcBGAs/s1600/EclipseTestClasspath.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://1.bp.blogspot.com/-WkbhANO8jy8/W0MqB8w7GkI/AAAAAAAAGlU/YM62bO8briQu2rlvxr8S-gMWBt76nG2bQCLcBGAs/s400/EclipseTestClasspath.png" width="272" height="400" data-original-width="305" data-original-height="449" /></a></div>
<p>
Note the darker colour of the <code>src/test</code> classpath elements.
</p>
<p>
Support for <a href="http://www.eclipse.org/photon/noteworthy/index.php#java-9-support">Java 9</a> (modules) and <a href="http://www.eclipse.org/photon/noteworthy/index.php#java-10-support">Java 10</a> (local variable type inference) is also present, ready for Java 11 in September.
You can also use <a href="http://www.eclipse.org/photon/noteworthy/index.php#junit">JUnit 5</a>.
It even tries to help you reach <a href="http://www.eclipse.org/photon/noteworthy/index.php#java-code-coverage">100% code coverage</a>!
</p>
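<p>
As a quick check of the new Java 10 support, a <code>var</code> declaration should now compile and run without complaint in the IDE. A trivial sketch (the class below is my own illustration, not part of Photon):
</p>

```java
import java.util.ArrayList;

public class VarDemo {
    public static void main(String[] args) {
        // Java 10: the compiler infers ArrayList<String> for 'list'
        var list = new ArrayList<String>();
        list.add("Photon");
        // 'var' also works in the enhanced for statement
        for (var name : list) {
            System.out.println(name.toUpperCase());  // PHOTON
        }
    }
}
```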
<p>
All in all, I feel this is a release where upgrading will make a difference to everyday coding.
</p>
<p>
I've upgraded my own Eclipse installations, and it all went pretty well.
You can either start from a clean install (I prefer the <a href="http://download.eclipse.org/eclipse/downloads/">basic IDE</a> without plugins so I can choose which ones to add). Or you can add Photon as an update site, and let Eclipse <a href="https://wiki.eclipse.org/FAQ_How_do_I_upgrade_Eclipse_IDE%3F">update itself</a>.
</p>
<p>
One problem I had was the plugin that connects Maven (M2E) to Checkstyle (Eclipse-CS), known as <a href="https://github.com/m2e-code-quality/m2e-code-quality">m2e-code-quality</a>. Fortunately, the team at <a href="https://github.com/GEBIT/m2e-code-quality/tree/develop">GEBIT</a> have been maintaining a fork of the original plugin. However, they don't release it in binary form. As such, I had to build the plugin locally (no big deal - it's a simple build).
</p>
<p>
To simplify the process however, I've created a <a href="https://github.com/jodastephen/eclipse-setup">repository on GitHub</a> with my Eclipse setup files, and a <a href="https://github.com/jodastephen/eclipse-setup/blob/master/m2e-quality-site.zip">binary zip</a> of the GEBIT forked plugin.
</p>
<p>
To use just the m2e-code-quality GEBIT fork, download the zip file and add it as an update site. Here are some <a href="https://stackoverflow.com/a/31553851/38896">instructions</a>.
</p>
<p>
Thank you Eclipse team for a great release!
</p>
<p>
<br />
</p>
<p>
PS. I won't be answering "how to" questions about upgrading Eclipse or the eclipse-setup repository. There are plenty of other places to ask questions, such as <a href="https://stackoverflow.com/questions/tagged/eclipse">Stack Overflow</a> or the <a href="https://www.eclipse.org/forums/index.php/f/13/">Eclipse Forums</a>.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com2tag:blogger.com,1999:blog-741750605858169835.post-41608939765439866182018-03-22T10:52:00.000+00:002018-03-22T10:52:00.857+00:00JPMS modules for library developers - negative benefits<p>
Java 9 introduced a major new feature - JPMS, the <a href="http://openjdk.java.net/projects/jigsaw/spec/">Java Platform Module System</a>.
After six months I've come to the conclusion that JPMS <i>currently</i> offers "negative benefits" to <i>open source library developers</i>.
Read on to understand why.
</p>
<h4>Modules for library developers</h4>
<p>
Java 8 is probably the most successful Java release ever. It is widely used and widely liked.
As such, almost all open source libraries run on Java 8 (as library authors want their code to be used!).
Some libraries with a long history also still run on older versions.
Joda-Convert has a Java 6 baseline, while Joda-Time has a Java 5 baseline.
Others have a Java 8 baseline, such as ThreeTen-Extra.
</p>
<p>
Java 9 was released in September 2017, but it is not a release that will be supported for a number of years.
Instead, it had a lifetime of six months and is now obsolete because Java 10 is out.
And in six months time Java 11 will be out making Java 10 obsolete, and so on.
</p>
<p>
While most releases last six months, some are luckier.
Java 11 will be a "long term support" (LTS) release with security and bug support for a few years (Java 8 is also an LTS release).
Thus, even though Java 10 is out, Java 8 is still the sensible Java version for open source library developers to target right now
because it is the current LTS release.
</p>
<p>
But what happens when Java 11 comes out?
Since Java 8 will be unsupported relatively soon after Java 11 is released, you'd think that the sensible baseline would be 11.
Unfortunately I believe many companies will be sticking with Java 8 for a long time.
An aggressive open source project might move quickly to a Java 11 baseline, but doing so would be a risky strategy for adoption.
</p>
<h4>The module-path</h4>
<p>
Before discussing the JPMS options for open source library developers, it is important to cover the distinction
between the class-path and the module-path.
The class-path that we all know and love is still present in Java 9+, and it mostly works in the same way.
</p>
<p>
The module-path is new.
When a jar file is on the module-path any module-info is used to apply the new stricter JPMS rules.
For example, a <code>public</code> method is no longer callable unless it has been <code>exported</code> from
the module it is contained in (and <code>required</code> by the caller's module).
</p>
<p>
The basic idea is simple, you put old fashioned non-modular jar files on the class-path,
while you put modular jar files on the module-path.
Nothing enforces this however, and it turns out this is a bit of a problem.
There are thus four possibilities:
</p>
<ul>
<li>modular jar on the module-path</li>
<li>modular jar on the class-path</li>
<li>classic non-modular jar on the module-path</li>
<li>classic non-modular jar on the class-path</li>
</ul>
<p>
To be sure your library works correctly, you need to test it both on the class-path and on the module-path.
For example, service loading is very different on the module-path compared to the class-path.
And some resource lookup methods also work completely differently.
</p>
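<p>
The service lookup API itself is identical in both environments, which makes the difference easy to miss. A minimal sketch (the class name is my own; <code>ToolProvider</code> is a real JDK 9+ service interface):
</p>

```java
import java.util.ServiceLoader;
import java.util.spi.ToolProvider;

public class ServiceLookupDemo {
    public static void main(String[] args) {
        // The same call behaves differently depending on how the jars are run:
        // on the class-path, ServiceLoader scans META-INF/services files;
        // on the module-path, it only sees 'provides ... with ...' clauses
        // declared in the module-info of modules in the graph.
        ServiceLoader<ToolProvider> tools = ServiceLoader.load(ToolProvider.class);
        for (ToolProvider tool : tools) {
            System.out.println(tool.name());
        }
    }
}
```

<p>
An explicit module that only ships <code>META-INF/services</code> entries, without <code>provides</code> clauses, will appear to work on the class-path yet register nothing on the module-path.
</p>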
<p>
To complicate this further, JPMS has no explicit support for testing.
In order to test a module on the module-path (which is a tightly locked down set of packages) you have to use the <code>--patch-module</code>
command line flag. This flag effectively extends the module, adding the testing packages into the same module as the classes under test.
(If you only test the public API, you can do this without using patch-module, but in Maven you'd need a whole new project and pom.xml to achieve that approach, so it's likely to be rare.)
</p>
<p>
In the latest Maven surefire plugin (v2.21.0 and later) the patch-module flag is used, but if your module
has optional dependencies, or you have additional testing dependencies, you may have to manually add them,
see <a href="https://issues.apache.org/jira/browse/MCOMPILER-330">this issue</a> and <a href="https://issues.apache.org/jira/browse/SUREFIRE-1501">this issue</a>.
</p>
<p>
Given all this, what should an open source library developer do?
</p>
<h4>Option 1, do nothing</h4>
<p>
In most cases, but not all, code that is compiled on Java 8 or earlier will run just fine
on the class-path in Java 9+.
So, you can do nothing and ignore JPMS.
</p>
<p>
The problem is that other projects will depend on your library.
By not adopting JPMS at all, you block those projects from progressing in their modularization.
(A project can only choose to fully modularize once all of its dependencies are modularized.)
</p>
<p>
Of course if your code doesn't run on Java 9+ because you've used <code>sun.misc.Unsafe</code> or
something else you shouldn't have done, then you've got other things to fix.
</p>
<p>
And don't forget that a user could put your jar file on the class-path or the module-path. Have you tested both?
i.e. the truth is that "do nothing" is not possible - at a minimum you have extra testing to do, even if it is just to document that your project doesn't work on the module-path.
</p>
<h4>Option 2, add a module name</h4>
<p>
Java 9+ recognises a new entry in the <code>MANIFEST.MF</code> file.
The <code>Automatic-Module-Name</code> entry allows a jar file to declare what name it will use if/when it is turned into a proper modular jar file.
Here is how you can use Maven to add it:
</p>
<pre>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<manifestEntries>
<Automatic-Module-Name>org.foo.bar</Automatic-Module-Name>
</manifestEntries>
</archive>
</configuration>
</plugin>
</pre>
<p>
This is a nice simple way to move forward.
It reserves the module name and allows other projects that depend on your jar file to fully modularize if they wish.
</p>
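<p>
After adding the manifest entry, it is worth verifying what the module system actually sees. The <code>ModuleFinder</code> API (Java 9+) can report the descriptor for any jar. The sketch below inspects a platform module so that it runs without an external file, but pointing <code>ModuleFinder.of(...)</code> at your own artifact works the same way (the jar path in the comment is hypothetical):
</p>

```java
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;

public class ModuleNameCheck {
    public static void main(String[] args) {
        // ModuleFinder.of(Path.of("target/mylib.jar")) would report the name
        // your jar gets: from module-info if present, else from the
        // Automatic-Module-Name manifest entry, else derived from the file name.
        // Here we read a platform module so the snippet is self-contained.
        ModuleDescriptor desc = ModuleFinder.ofSystem()
                .find("java.base")
                .orElseThrow()
                .descriptor();
        System.out.println(desc.name());         // java.base
        System.out.println(desc.isAutomatic());  // false
    }
}
```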
<p>
But because it's so simple, it's easy to forget the testing aspect.
Again, your jar file might be placed on the class-path or on the module-path, and the two can behave quite differently.
In fact, now that it has some module information, tools may treat it differently.
</p>
<p>
When Maven sees an <code>Automatic-Module-Name</code> it will normally place the classes on the module-path instead of the class-path.
This may have no effect, or it may show up a bug where your code works on the class-path but not on the module-path.
With Maven right now, you have to use surefire plugin v2.20.1 to test on the class-path (an old version that doesn't know about the module-path).
To test on the module-path, use v2.21.0.
Swapping between these two versions is of course a manual process, see <a href="https://issues.apache.org/jira/browse/SUREFIRE-1497">this issue</a> for a request to improve this.
</p>
<p>
While upgrading some of my projects I added <code>Automatic-Module-Name</code> without testing on the module-path.
When I did eventually test on the module-path the tests failed, as the code simply didn't work on the module-path.
Unfortunately, I now have some releases on Maven-Central that have <code>Automatic-Module-Name</code> but don't work on the module-path, happy days...
</p>
<p>
To emphasise this, just adding something to the MANIFEST.MF file can have an effect on how the project is run and tested.
You need to test on both the class-path and module-path.
</p>
<h4>Option 3, add module-info.java</h4>
<p>
This is the full modularization approach described in <a href="http://blog.joda.org/2017/04/java-9-modules-jpms-basics.html">numerous web pages</a> and tutorials on JPMS.
</p>
<pre>
module org.foo.bar {
requires org.threeten.extra;
exports org.foo.bar;
exports org.foo.bar.util;
}
</pre>
<p>
So, what are the implications of doing this to the open source project?
</p>
<p>
Unlike option 2, your code now has a baseline of Java 9+. The Java 8 compiler won't understand the file.
What we really want is a jar file that contains Java 8 class files, but with just the <code>module-info.java</code> file compiled under Java 9+.
In theory, when running on Java 8 the <code>module-info.class</code> file will be ignored if it is not used.
</p>
<p>
Maven has a <a href="https://maven.apache.org/plugins/maven-compiler-plugin/examples/module-info.html">technique</a> to achieve this.
While the technique works OK, it turns out to be nowhere near sufficient to achieve the goal.
To actually get a single jar file that works on both Java 8 and 9+ you need:
</p>
<ul>
<li>use the release flag on Java 9+ to build for Java 8</li>
<li>add an OSGi require capability filter to inform it that its still Java 8 compatible</li>
<li>exclude module-info.java from maven-javadoc-plugin when building on Java 8</li>
<li>use maven-javadoc-plugin v3.0.0-M1 (not later), manually copy dependencies to a directory and refer to them using additional Javadoc command line arguments, see <a href="https://issues.apache.org/jira/browse/MJAVADOC-506">this issue</a></li>
<li>exclude module-info.java from maven-checkstyle-plugin</li>
<li>exclude module-info.java from maven-pmd-plugin</li>
<li>manually swap the version of maven-surefire-plugin to test both the module-path and the class-path</li>
</ul>
<p>
And probably some more I've forgotten about.
Here is one <code>pom.xml</code> <a href="https://github.com/ThreeTen/threeten-extra/blob/b07008fa55d413a7db97c0e36a3f2407f9b595b4/pom.xml">before integrating</a> Java 9.
Here it is <a href="https://github.com/ThreeTen/threeten-extra/blob/ba8287341f347a60ad96ad6a3f3e1055ab1a7278/pom.xml">after integrating</a> Java 9.
An increase from 650 to 862 lines, with lots of complexity, profiles and workarounds.
</p>
<p>
With a Java 11 baseline, the project would be simpler again, but that baseline isn't going to happen for a number of years.
Note that my comments should not be interpreted as anti-Maven. A small team there is working hard to do the best they can - JPMS is complex.
</p>
<p>
And just for kicks, your project can no longer be used by Android (as the team there seems to be very slow in adding a simple "ignore module-info" rule).
And many tools with older versions of bytecode libraries like ASM will fail too - I had a <a href="https://github.com/JodaOrg/joda-beans/issues/188">report</a> that a particular version of Tomcat/TomEE could not load the modular jar file.
I've ended up having to release a "classic" non-modular jar file to cope with these situations, something which is profoundly depressing.
</p>
<p>
While I've added <code>module-info.java</code> to some of my projects, I cannot recommend others to do so - it's a very painful and time-consuming process.
The time to consider it would appear to be once Java 11 or beyond is widely adopted and the baseline of your project.
</p>
<h4>Negative benefits</h4>
<p>
Now for the controversial part.
</p>
<p>
<b>It is my conclusion that JPMS, as currently designed, has "negative benefits" for open source libraries.</b>
</p>
<p>
As explained above, the cost of full modularization is high for library developers.
The need to retain Java 8 compatibility makes JPMS really hard to use (module information should have been textual, not a class file).
The tooling is still incomplete/buggy.
Many older projects can't cope with the new jar files if you do go for it.
Much of this will improve over time, but we're talking a number of years before Java 11 is widely adopted.
But don't be lulled into just believing waiting will solve the key problem.
</p>
<p>
The split (bifurcation) of the module-path from the class-path is an absolute nightmare.
At a stroke, there are now two different ways that your library can be run, and the two environments have quite different qualities.
Code that compiles and runs on the class-path will often not compile or not run on the module-path.
<i>And vice versa.</i>
As a library author, you cannot control whether the class-path or module-path is used.
You have no choice - you <i>must</i> test both, which you probably won't think to do.
(And Maven currently provides no way to test both in one <code>pom.xml</code>.)
</p>
<p>
Given all this effort and extra complexity, we should be getting some great benefits, right?
Well no.
</p>
<p>
JPMS is supposed to ensure reliable configuration (that all your dependencies are available at startup) and strong encapsulation (that other code can't see or use packages that you want to keep hidden).
But since there is no way to stop your modular jar file being used on the class-path, you get none of these benefits.
</p>
<p>
Did you put lots of effort into choosing which packages to hide? Meaningless, as the user can just put the jar file on class-path and call your internal packages.
Did you believe that the JVM will check all your dependent modules are available before starting? Afraid not, no checks performed when the user puts the jar file on class-path.
</p>
<p>
Since we get none of the claimed benefits of JPMS, but get lots of extra work in testing and complexity in the build tools, I feel "negative benefits" is a pretty accurate summary.
</p>
<h4>Summary</h4>
<p>
As of today, JPMS is a pain for library authors. The split of the module-path from the class-path is at the heart of the problem.
You really can't afford to release a library without testing on both module-path and class-path - there are subtle corner cases where the environments differ.
</p>
<p>
What is desperately needed is a small change to JPMS. There needs to be a way for a library author to insist that the modular jar file they are producing can only be run on the module-path (with any attempt to use it on the class-path preventing application startup). This would eliminate the need for testing both class-path and module-path. Together with the passage of time, JPMS might yet achieve its goals and go from negative to positive benefits.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com12tag:blogger.com,1999:blog-741750605858169835.post-65209964340748443152018-02-05T13:10:00.001+00:002018-03-20T16:09:49.482+00:00Java 9 has six weeks to live<p>
Java 9 is obsolete in just <a href="http://openjdk.java.net/projects/jdk/10/#Schedule">six weeks</a> (20th March 2018).
What? You haven't upgraded yet?
Well, Java 10 is only going to last six months before it is obsolete too.
</p>
<p>
<i>Update 2018-03-20: <a href="http://mail.openjdk.java.net/pipermail/announce/2018-March/000247.html">Java 10 is released</a>. Java 9 is obsolete.</i>
</p>
<h4>Release train impact</h4>
<p>
The new <a href="https://mreinhold.org/blog/forward-faster">Java release train</a> means that there will be a new release of Java every six months.
And when the next release comes out, the previous release is obsolete.
</p>
<p>
What do I mean by obsolete?
</p>
<p>
In practical terms it means that there are no more <a href="http://www.oracle.com/technetwork/java/eol-135779.html">security updates from Oracle</a>.
(Theoretically, the OpenJDK community could release security updates, but there is no sign of this yet).
And since you don't want to run your software without the latest security updates, you are expected to upgrade to Java 10 as soon as it is released.
</p>
<p>
As a user of Java, here are three possible ways to approach the release train:
</p>
<ol>
<li>Stay on Java 8, the current LTS (long term support) release, until the next LTS release occurs (Java 11)</li>
<li>Move from Java 9 to Java 10 to Java 11, making sure you update rapidly to get the security updates</li>
<li>Stay on Java 9 (or Java 10) and don't worry about security updates</li>
</ol>
<p>
If you have already moved to Java 9, you have effectively committed to option 2 or 3. If you care about security updates, you need to be prepared to switch to Java 10 shortly after it is released on 20th March. To do this, you probably should be testing with a Java 10 pre-release now.
If you find that to be a challenge, you have to stop caring about security, or consider going back to Java 8 LTS.
</p>
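<p>
If you do commit to the release train, one cheap safeguard is to log the running feature release at startup, so an accidentally unpatched JVM is at least visible. A sketch (the pinned version of 10 is my own assumption for illustration):
</p>

```java
public class JavaVersionGuard {
    static final int EXPECTED = 10;  // hypothetical version the team has pinned

    public static void main(String[] args) {
        // Runtime.version() was added in Java 9; feature() in Java 10
        // (on Java 9 itself, use major() instead).
        int feature = Runtime.version().feature();
        System.out.println("Running on Java " + feature);
        if (feature != EXPECTED) {
            System.err.println("Warning: expected Java " + EXPECTED
                    + " - this JVM may be missing security updates");
        }
    }
}
```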
<p>
However you look at it, being on the release train is a big commitment.
</p>
<ul>
<li>Will your dependencies work on the next version?</li>
<li>Will your IDE be ready?</li>
<li>Will your build tool (Maven, Gradle etc.) be ready?</li>
<li>Will your other tools (spotbugs, checkstyle, PMD etc.) be ready?</li>
<li>How fast are you going to be able to update when the release you are on is obsolete?</li>
</ul>
<p>
Lots to consider. And given the number of external tools and dependencies involved, I think it's fair to say that it's a bold choice to use Java 9 or 10.
</p>
<h4>Summary</h4>
<p>
With a release every six months, it is important to decide on an approach to the release train.
If you want to upgrade every six months, great! But you'll need to test pre-releases of Java with your whole toolchain in advance of the release to ensure you don't get stuck on an unpatched obsolete release of Java.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com18tag:blogger.com,1999:blog-741750605858169835.post-65719857531754804712017-05-09T08:14:00.001+01:002017-05-09T08:14:44.604+01:00Java SE 9 - JPMS automatic modules<p>
This article in my series on the Java Platform Module System (JPMS) is focussed on <i>automatic modules</i>. JPMS was previously known as Project Jigsaw and is the module system in Java SE 9. See also <a href="http://blog.joda.org/2017/04/java-9-modules-jpms-basics.html">Module basics</a>, <a href="http://blog.joda.org/2017/04/java-se-9-jpms-module-naming.html">Module naming</a> and <a href="http://blog.joda.org/2017/04/java-se-9-jpms-modules-are-not-artifacts.html">Modules & Artifacts</a>.
</p>
<h4>Automatic modules</h4>
<p>
Let's say you are in charge of Java, and after 20 years you want to add a module system to the platform.
As well as the problems of designing the module system itself, you have to consider migration of all the existing code written in Java (and to a degree, other JVM languages).
</p>
<p>
The solution to this that JPMS has chosen is <i>automatic modules</i>.
Unfortunately, my opinion is that it is the wrong solution.
</p>
<p>
To understand automatic modules, we have to start by looking at how jar files will be specified in future.
In addition to the classpath, Java SE 9 will also have a <i>modulepath</i>.
The basic idea is that modules (jar files containing module-info.class) will be placed on the modulepath, not the classpath.
In fact, placing a module on the classpath will cause the module declaration (module-info.class) to be completely ignored, which is usually not what you want.
</p>
<p>
As a basic rule, the modulepath cannot see the classpath.
If you create a module and put it on the modulepath, all its dependencies must also be on the modulepath.
Thus, in order to write a module at all, all the dependencies must also have been converted to be modules.
And many of those dependencies are likely to be open source projects, with varying release schedules.
</p>
<p>
Clearly, this is a bit of a problem.
Essentially, it would mean that an application would need to wait until every dependency had become a module before it could add module-info.java.
The "solution" to this is automatic modules.
</p>
<p>
An automatic module is a normal jar file - one without a module-info.class file - that is placed on the modulepath.
Thus the modulepath will contain two types of module - "real" and "automatic".
Since there is no module-info.class, an automatic module is missing the metadata normally associated with a module:
</p>
<ul>
<li>no module name</li>
<li>no list of exported packages</li>
<li>no list of dependencies</li>
</ul>
<p>
Unsurprisingly, the list of exported packages is simply set to be all packages in the jar file.
The list of dependencies is set to be everything on the modulepath, plus the whole classpath.
As such, automatic modules allow the modulepath to depend on the classpath, something which is normally not allowed.
While not ideal, both of these defaults for the metadata make sense.
</p>
<p>
The final missing piece of information is the module name.
This has been a big point of discussion in Project Jigsaw.
<b>As it stands, the module name will be derived from the filename of the jar file</b>.
For me, this is a critical problem with the basic design of automatic modules (but see the mitigation section below).
</p>
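<p>
To illustrate the derivation (these filenames are my own examples, not from the specification text), the trailing version is stripped from the jar filename and remaining non-alphanumeric characters become dots:
</p>
<pre>
guava-19.0.jar          --&gt;  automatic module "guava"
joda-convert-1.8.1.jar  --&gt;  automatic module "joda.convert"
</pre>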
<p>
As per my last blog, <a href="http://blog.joda.org/2017/04/java-se-9-jpms-modules-are-not-artifacts.html">modules are not artifacts</a>.
But the filename of a jar file is usually the Maven artifactId, a name disconnected from the module name (which should be the <a href="http://blog.joda.org/2017/04/java-se-9-jpms-module-naming.html">super-package reverse DNS</a>).
For example, the filename of <a href="https://github.com/google/guava">Google Guava</a> is <code>guava</code> while the correct module name would be <code>com.google.common</code>.
</p>
<p>
Taken in isolation, this all works fine. The application can now be modularised, and I can depend on a project that is not modularised:
</p>
<pre>
module com.foo.myapp {
requires guava; // guava not yet modularised, so use the filename
}
</pre>
<p>
But in order to work, <b>the <code>module-info.java</code> file must specify the dependency as being on the filename, not the module name</b>.
</p>
<p>
The astute will note that depending on the module name is generally not possible until the dependency is a module.
But equally, depending on the filename is using a name that is going to be wrong once the dependency is turned into a module:
</p>
<pre>
module com.foo.myapp {
requires com.google.common; // now guava has been modularised
}
</pre>
<p>
It is this change of name that is at the heart of the problem.
Essentially it means that the name used to define the dependencies of your library/application will change once they are modularised.
While this name change is fine in a private codebase, it will be hell in the world of Java open source.
</p>
<h4>Impact of automatic modules</h4>
<p>
To fully understand how automatic modules affect the open source Java community, it is best to look at a use case.
Let's consider what happens if an open source project is released that depends on a filename instead of a module name.
And then another open source project is released that depends on that:
</p>
<table style="width:95%" class="tbl">
<tr>
<th style="width:25%">Project</th>
<th style="width:10%">Version</th>
<th style="width:30%">Module name</th>
<th style="width:35%">Requires</th>
</tr>
<tr>
<td rowspan = "2">Strata</td>
<td rowspan = "2">v1</td>
<td rowspan = "2">com.opengamma.strata</td>
<td>org.joda.convert</td>
</tr>
<tr>
<td>guava (a filename)</td>
</tr>
<tr>
<td>Joda-Convert</td>
<td>v1</td>
<td>org.joda.convert</td>
<td>guava (a filename)</td>
</tr>
<tr>
<td>Guava</td>
<td>v1</td>
<td><b>(not a module yet)</b></td>
<td> </td>
</tr>
</table>
<p>
What we now have is a graph of three projects, where the lowest is an automatic module, and the next two are real modules that depend on an automatic module.
When Guava is finally modularised, a new release will occur.
But Strata and Joda-Convert cannot immediately use the new release, because the module name they reference is now wrong:
</p>
<table style="width:95%" class="tbl">
<tr>
<th style="width:25%">Project</th>
<th style="width:10%">Version</th>
<th style="width:30%">Module name</th>
<th style="width:35%">Requires</th>
</tr>
<tr>
<td rowspan = "2">Strata</td>
<td rowspan = "2">v1</td>
<td rowspan = "2">com.opengamma.strata</td>
<td>org.joda.convert</td>
</tr>
<tr>
<td><span style="color:red;font-weight:bold">guava</span> (a filename)</td>
</tr>
<tr>
<td>Joda-Convert</td>
<td>v1</td>
<td>org.joda.convert</td>
<td><span style="color:red;font-weight:bold">guava</span> (a filename)</td>
</tr>
<tr>
<td>Guava</td>
<td><b>v2</b></td>
<td><span style="color:red;font-weight:bold">com.google.common</span></td>
<td> </td>
</tr>
<tr>
<td colspan = "2"> </td>
<td colspan = "2"><span style="color:red;font-weight:bold">Module Hell - "guava" != "com.google.common"</span></td>
</tr>
</table>
<p>
As can be seen, this setup does not work. We have Module Hell.
The top two projects depend on "guava", not "com.google.common".
And there is no way to have the same packages loaded under two different module names.
</p>
<p>
What happens if Joda-Convert is updated to match the new Guava? (Strata is not updated)
</p>
<table style="width:95%" class="tbl">
<tr>
<th style="width:25%">Project</th>
<th style="width:10%">Version</th>
<th style="width:30%">Module name</th>
<th style="width:35%">Requires</th>
</tr>
<tr>
<td rowspan = "2">Strata</td>
<td rowspan = "2">v1</td>
<td rowspan = "2">com.opengamma.strata</td>
<td>org.joda.convert</td>
</tr>
<tr>
<td><span style="color:red;font-weight:bold">guava</span> (a filename)</td>
</tr>
<tr>
<td>Joda-Convert</td>
<td><b>v2</b></td>
<td>org.joda.convert</td>
<td><b>com.google.common</b></td>
</tr>
<tr>
<td>Guava</td>
<td><b>v2</b></td>
<td><span style="color:red;font-weight:bold">com.google.common</span></td>
<td> </td>
</tr>
<tr>
<td colspan = "2"> </td>
<td colspan = "2"><span style="color:red;font-weight:bold">Module Hell - "guava" != "com.google.common"</span></td>
</tr>
</table>
<p>
This configuration does not work.
There is no way to have the same packages loaded under two different module names.
</p>
<p>
What happens if Strata is updated to match the new Guava? (Joda-Convert is not updated)
</p>
<table style="width:95%" class="tbl">
<tr>
<th style="width:25%">Project</th>
<th style="width:10%">Version</th>
<th style="width:30%">Module name</th>
<th style="width:35%">Requires</th>
</tr>
<tr>
<td rowspan = "2">Strata</td>
<td rowspan = "2"><b>v2</b></td>
<td rowspan = "2">com.opengamma.strata</td>
<td>org.joda.convert</td>
</tr>
<tr>
<td><b>com.google.common</b></td>
</tr>
<tr>
<td>Joda-Convert</td>
<td>v1</td>
<td>org.joda.convert</td>
<td><span style="color:red;font-weight:bold">guava</span> (a filename)</td>
</tr>
<tr>
<td>Guava</td>
<td><b>v2</b></td>
<td><span style="color:red;font-weight:bold">com.google.common</span></td>
<td> </td>
</tr>
<tr>
<td colspan = "2"> </td>
<td colspan = "2"><span style="color:red;font-weight:bold">Module Hell - "guava" != "com.google.common"</span></td>
</tr>
</table>
<p>
This configuration also does not work.
There is no way to have the same packages loaded under two different module names.
</p>
<p>
The only way to get it to work is to update the whole stack together:
</p>
<table style="width:95%" class="tbl">
<tr>
<th style="width:25%">Project</th>
<th style="width:10%">Version</th>
<th style="width:30%">Module name</th>
<th style="width:35%">Requires</th>
</tr>
<tr>
<td rowspan = "2">Strata</td>
<td rowspan = "2"><b>v2</b></td>
<td rowspan = "2">com.opengamma.strata</td>
<td>org.joda.convert</td>
</tr>
<tr>
<td><b>com.google.common</b></td>
</tr>
<tr>
<td>Joda-Convert</td>
<td><b>v2</b></td>
<td>org.joda.convert</td>
<td><b>com.google.common</b></td>
</tr>
<tr>
<td>Guava</td>
<td><b>v2</b></td>
<td><b>com.google.common</b></td>
<td> </td>
</tr>
</table>
<p>
To summarise, when a module depends on an automatic module, and that module is in turn depended on by others, the whole stack becomes linked.
Everything in the stack had to go from v1 to v2 together.
</p>
<p>
Looking back at the whole example, it should be clear that the problem started right back at the beginning.
There never should have been a v1 release of Strata that depended on a filename "guava". Or a v1 release of Joda-Convert that depended on a filename.
Instead, Strata should have waited until both Guava and Joda-Convert had released a modularised v2.
And, Joda-Convert should have waited until Guava had released a modularised v2.
<b>To avoid Module Hell, migration must occur from bottom to top</b>.
</p>
<p>
Given this, it is my opinion that this means that automatic modules do not by themselves provide a viable migration path for the Java open source community.
The rule is as follows:
</p>
<p>
<b>Do not release to Maven Central a modular jar file that expresses a dependency on a filename. Instead, wait until all dependencies can be expressed as module names</b>.
</p>
<p>
Any jar file in Maven Central that expresses a dependency on a filename will be a cause of Module Hell.
</p>
<h4>Mitigation</h4>
<p>
Without any mitigation, the community would have to modularise each open source library one by one from the bottom of the stack upwards.
No open source project could do anything until all its dependencies are modularised - a bottom-up migration.
</p>
<p>
The main piece of mitigation proposed for this problem is that jar files can have a new entry in MANIFEST.MF called "Automatic-Module-Name".
When JPMS examines an automatic module, if the MANIFEST.MF entry is present then it uses the value as the module name instead of the filename.
</p>
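<p>
For example (using Joda-Convert's module name; the entry name is as proposed at the time of writing), the jar file's META-INF/MANIFEST.MF would gain a single line:
</p>
<pre>
Automatic-Module-Name: org.joda.convert
</pre>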
<p>
This can be used to break the cycle to a degree.
In the case above, the Strata team can release a version at any time with the new MANIFEST.MF entry, essentially stating what module name Strata is going to have in the future. Similarly, Joda-Convert can release at any time with the MANIFEST.MF entry stating what its module name will be.
Full modularisation will still need to proceed from the bottom-up, but anyone depending on Joda-Convert or Strata can safely release without being affected by the question over Guava's module name.
</p>
<p>
The key difference between adding the MANIFEST.MF entry and adding a module declaration is that the MANIFEST.MF entry does not need the names of the dependencies to be specified.
As such, there is no need to wait for the dependencies to be modularised before adding the MANIFEST.MF entry.
</p>
<p>
The rule outlined above can thus also be expressed as:
</p>
<p>
<b>Do not release to Maven Central a modular jar file that depends on an automatic module, unless the automatic module has an "Automatic-Module-Name" MANIFEST.MF entry</b>.
</p>
<p>
To be clear, I don't think this is ideal, but we are where we are.
The message to the open source community therefore comes in two parts.
</p>
<p>
Firstly, do not add a <code>module-info.java</code> module declaration until:
</p>
<ul>
<li>all of your runtime dependencies have been modularised (either as a full module or with a MANIFEST.MF entry)</li>
<li>all those modularised dependencies have been released to Maven Central</li>
<li>your library depends on the updated versions</li>
</ul>
<p>
Secondly, if you can't meet these criteria, but your project is well structured and otherwise suitable for modularisation, please <a href="https://maven.apache.org/shared/maven-archiver/examples/manifestEntries.html">add a MANIFEST.MF entry</a> following the <a href="http://blog.joda.org/2017/04/java-se-9-jpms-module-naming.html">agreed module naming conventions</a> (super-package reverse-DNS).
</p>
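<p>
As a sketch of how this can be done with Maven (following the linked maven-archiver documentation; substitute your own module name for <code>org.joda.convert</code>):
</p>
<pre>
&lt;plugin&gt;
  &lt;groupId&gt;org.apache.maven.plugins&lt;/groupId&gt;
  &lt;artifactId&gt;maven-jar-plugin&lt;/artifactId&gt;
  &lt;configuration&gt;
    &lt;archive&gt;
      &lt;manifestEntries&gt;
        &lt;Automatic-Module-Name&gt;org.joda.convert&lt;/Automatic-Module-Name&gt;
      &lt;/manifestEntries&gt;
    &lt;/archive&gt;
  &lt;/configuration&gt;
&lt;/plugin&gt;
</pre>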
<p>
If everyone does this, then we stand a reasonable chance of avoiding Module Hell.
</p>
<h4>Summary</h4>
<p>
Automatic modules allow modules to depend on non-modules.
But this is achieved by specifying a requires clause on the filename, not the module name.
This will cause pain later if modules are published depending on the filename.
A new MANIFEST.MF entry allows any open source project to choose a module name and publish that choice immediately.
When JPMS sees the MANIFEST.MF entry, it will use the value as the name for the automatic module.
</p>
<p>
Community members must at all costs avoid publishing modular jar files that depend on filenames.
But community members can publish jar files containing the new MANIFEST.MF entry from now on (although technically the MANIFEST.MF entry is not yet finalised, it probably will be soon).
</p>
<p>
Comments and feedback welcome.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com8tag:blogger.com,1999:blog-741750605858169835.post-59869838110575727362017-04-24T09:12:00.000+01:002017-04-24T09:12:21.510+01:00Java SE 9 - JPMS modules are not artifacts<p>
This is the next article in a series I'm writing to help make sense of the Java Platform Module System (JPMS) in Java SE 9. JPMS was developed as <a href="http://openjdk.java.net/projects/jigsaw/">Project Jigsaw</a>. Other articles in the series are <a href="http://blog.joda.org/2017/04/java-9-modules-jpms-basics.html">Module basics</a> and <a href="http://blog.joda.org/2017/04/java-se-9-jpms-module-naming.html">Module naming</a>.
</p>
<h4>Module != Artifact</h4>
<p>
If you want to grasp what JPMS modules are all about, it turns out that it is critical to understand what they are not.
In particular, they are not artifacts.
</p>
<p>
Firstly, let's define an <i>artifact</i>. An artifact is a file produced when developing software. For a project on Maven Central, this includes jar files of bytecode, jar files of sources and jar files of Javadoc. We are interested primarily in the bytecode for this discussion.
</p>
<p>
Secondly, let's assume that a project is going to have the same module name over time.
This is just like package names - projects don't change package names with every release.
</p>
<p>
Given this, what is the mapping between an artifact and a module?
</p>
<h4>Versions</h4>
<p>
Each version of a project will consist of a different artifact (jar file), perhaps released on <a href="https://search.maven.org/">Maven Central</a>.
Each version will have the same module name.
But, we also know that <b>the Java platform (JPMS) does not know about versions or version-selection</b>.
</p>
<p>
Therefore, when assembling a modulepath for Java SE 9, something else is going to have to choose the correct version of the module.
This will typically be the build tool, eg. Maven.
</p>
<p>
But while the classpath will tolerate having two versions of the artifact (typically with bad consequences at runtime), the JPMS modulepath will refuse to start if two modules contain the same package, as would happen if two versions of the same module are found.
</p>
<p>
Maven already manages versions of course, picking one version from a set of versions that all share the same groupId:artifactId.
With Java SE 9 we can say that <b>Maven is picking one artifact from a set of artifacts</b> to use in the runtime JPMS module graph.
</p>
<table style="width:95%" class="tbl">
<tr>
<th style="width:35%">Artifacts</th>
<th style="width:30%"> </th>
<th style="width:35%">JPMS runtime module</th>
</tr>
<tr>
<td>org.joda : joda-convert : 1.2</td>
<td rowspan="3">Build tool must pick one of these artifacts for the runtime JPMS module graph</td>
<td rowspan="3">org.joda.convert</td>
</tr>
<tr>
<td>org.joda : joda-convert : 1.1</td>
</tr>
<tr>
<td>org.joda : joda-convert : 1.0</td>
</tr>
</table>
<p>
</p>
<h4>Patching bugs</h4>
<p>
We've all run into the problem of finding a bug in another library, such as one on Maven Central.
Most of the time, we workaround the bug.
Sometimes, we have to fix it and have a private copy of the library (ie. a private version of the library).
In extreme cases, we have to publish the bug-fix version to Maven Central.
</p>
<p>
If you want to publish a patched version of an open source project, the standard way to do this is to use <i>your</i> groupId, not the original groupId.
</p>
<p>
The patched version still uses the same package name.
If it didn't, it would be of no use as a patched version.
</p>
<p>
Exactly the same rationale applies to the module name - the <b>module name of the patched version needs to stay the same</b>.
This makes sense, because the module name is an aspect of the bytecode, not of the deployment.
</p>
<p>
As before, the build tool needs to be setup to pick the correct jar file artifact, this time choosing between the patched one and the original.
</p>
<table style="width:95%" class="tbl">
<tr>
<th style="width:35%">Artifacts</th>
<th style="width:30%"> </th>
<th style="width:25%">JPMS runtime module</th>
</tr>
<tr>
<td>org.joda : joda-convert : 1.2</td>
<td rowspan="3">Build tool must pick either the original artifact or the patch for the runtime JPMS module graph</td>
<td rowspan="3">org.joda.convert</td>
</tr>
<tr>
<td>com.foo : patched-joda-convert : 1.2-P</td>
</tr>
</table>
<p>
</p>
<h4>License-driven jars</h4>
<p>
There can be a situation where the equivalent code is released twice for license reasons.
The most common case of this has been driven by the <a href="https://jcp.org/en/home/index">JCP</a>.
</p>
<p>
Imagine that a JSR has been produced, and because of licensing restrictions, Apache and RedHat each decide to produce their own version of the specification (API) jar, one Apache licensed and one LGPL licensed.
There will be two different jar file artifacts in Maven Central - redhat-jsr789-1.0.jar and apache-jsr789-1.0.jar.
Both of these contain the same package name, and the same interfaces, because they are the same specification.
</p>
<p>
When creating a module-info.java file for these, teams might be tempted to put "redhat" or "apache" in the module name.
But this would be wrong, as a package must be in only one module at runtime.
Thus, <b>both teams must use the same module name</b>, based on the package name, such as javax.foo.
In reality, future JSRs will need to choose their own module name.
</p>
<p>
Just as in the other cases, the build tool will need to select the correct jar file artifact to use for any given application.
</p>
<table style="width:95%" class="tbl">
<tr>
<th style="width:35%">Artifacts</th>
<th style="width:30%"> </th>
<th style="width:25%">JPMS runtime module</th>
</tr>
<tr>
<td>org.apache : apache-jsr789 : 1.0</td>
<td rowspan="3">Build tool must pick one of the artifacts for the runtime JPMS module graph</td>
<td rowspan="3">javax.foo</td>
</tr>
<tr>
<td>com.redhat : redhat-jsr789 : 1.0-GA</td>
</tr>
</table>
<p>
</p>
<h4>Two worlds</h4>
<p>
JPMS brings front and centre a distinction that we probably haven't thought too much about - that between build tools and code.
I find it to be a useful mental model to keep these two worlds separate.
</p>
<table style="width:95%" class="tbl">
<tr>
<th style="width:25%"> </th>
<th style="width:35%">Build/Deploy world</th>
<th style="width:35%">Code world</th>
</tr>
<tr>
<td>Concept</td>
<td><ul>
<li>Artifacts (jar files)</li>
<li>Versions</li>
<li>Groups</li>
<li>Organizations</li>
</ul></td>
<td><ul>
<li>Modules</li>
<li>Packages</li>
<li>Classes/Interfaces</li>
<li>Methods/Fields</li>
</ul></td>
</tr>
<tr>
<td>Identifier</td>
<td>org.joda : joda-convert : 1.2</td>
<td>org.joda.convert</td>
</tr>
<tr>
<td>Tool</td>
<td><ul>
<li>Maven</li>
<li>Gradle</li>
<li>Ant</li>
</ul></td>
<td><ul>
<li>javac</li>
<li>java</li>
<li>jar</li>
</ul></td>
</tr>
</table>
<p>
</p>
<h4>Summary</h4>
<p>
All the use cases above (and others) reduce to the fact that a build tool, such as Maven, must pick one artifact from many to satisfy the module requested in the JPMS modules graph.
The module, and the module name, is tightly linked to the bytecode and packages.
By contrast, the Maven <code>groupId:artifactId:version</code> co-ordinates are a tool for identifying each artifact in order to pick the correct one to use.
</p>
<p>
Hopefully, this helps explain the philosophical difference between artifacts and modules, and why <a href="http://blog.joda.org/2017/04/java-se-9-jpms-module-naming.html">modules should be named after their super-package</a>, not after the <code>artifactId</code>. And why you cannot derive the module name from <code>groupId:artifactId</code> - just look at the tables above to see the varying artifact names.
</p>
<p>
Questions and comments welcome.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com4tag:blogger.com,1999:blog-741750605858169835.post-61372882547027765022017-04-20T15:01:00.000+01:002018-08-30T11:26:18.818+01:00Java SE 9 - JPMS module naming<p>
The Java Platform Module System (JPMS) is soon to arrive, developed as <a href="http://openjdk.java.net/projects/jigsaw/">Project Jigsaw</a>. This article follows the <a href="http://blog.joda.org/2017/04/java-9-modules-jpms-basics.html">introduction</a> and looks at how modules should be named.
</p>
<p>
As with all "best practices", they are ultimately the opinion of the person writing them. I hope however to convince you that my opinion is right ;-). And as a community, we will certainly benefit if everyone follows the same rules, just like we benefited from everyone using reverse-DNS for package names.
</p>
<h4><a name="tldr">TL;DR</a> - My best practices</h4>
<p>
These are my recommendations for module naming:
</p>
<ul>
<li>Module names must be <b>reverse-DNS</b>, just like package names, e.g. org.joda.time.</li>
<li>A module is a group of packages. As such, the module name must be related to the package names.</li>
<li>Module names are strongly recommended to be the <b>same as the name of the super-package</b>.</li>
<li>Creating a module with a particular name <b>takes ownership</b> of that package name and everything beneath it.</li>
<li>As the owner of that namespace, any sub-packages may be grouped into sub-modules as desired so long as no package is in two modules.</li>
</ul>
<p>
Thus the following is a well-named module:
</p>
<pre>
module org.joda.time {
requires org.joda.convert;
exports org.joda.time;
exports org.joda.time.chrono;
exports org.joda.time.format;
// not exported: org.joda.time.base;
// not exported: org.joda.time.tz;
}
</pre>
<p>
As can be seen, the module contains a set of packages (exported and hidden), all under one super-package.
The module name is the same as the super-package name.
The author of the module is asserting control over all names below <code>org.joda.time</code>, and could create a module <code>org.joda.time.i18n</code> in the future if desired.
</p>
<p>
To understand why this approach makes sense, and the finer details, read on.
</p>
<style>
TABLE.sjc {
border-collapse: collapse;
border: 1px solid black;
}
TABLE.sjc TD, TABLE.sjc TH {
border: 1px solid black;
padding: 2px 8px;
}
</style>
<h4>JPMS naming</h4>
<p>
Naming anything in software is hard. Unsurprisingly then, agreeing an approach to naming modules has also turned out to be <a href="http://mail.openjdk.java.net/pipermail/jigsaw-dev/2017-February/thread.html#11420">hard</a>.
</p>
<p>
The naming rules allow dots, but prohibit dashes, thus lots of name options are closed off.
As a side note, module names in the JVM are more flexible, but we are only considering names at the Java level here.
</p>
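<p>
As a quick illustration (examples of my own, not taken from the specification), each dot-separated segment must be a legal Java identifier:
</p>
<pre>
org.joda.time    // valid: dot-separated Java identifiers
joda-time        // invalid: dashes are not permitted
org.joda.2time   // invalid: a segment cannot start with a digit
</pre>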
<p>
These are the two basic approaches which I think make sense:
</p>
<p>
1) <b>Project-style</b>. Short names, as commonly seen in the jar filename from Maven Central.
</p>
<p>
2) <b>Reverse DNS</b>. Full names, exactly as we've used for a package names since Java v1.0.
</p>
<p>
Here are some examples to make it more clear:
</p>
<table style="width:95%" class="sjc">
<tr>
<th style="width:25%"> </th>
<th style="width:30%">Project-style</th>
<th style="width:30%">Reverse-DNS</th>
</tr>
<tr>
<th>Joda-Time</th>
<td>joda.time</td>
<td>org.joda.time</td>
</tr>
<tr>
<th>Commons-IO</th>
<td>commons.io</td>
<td>org.apache.commons.io</td>
</tr>
<tr>
<th>Strata-Basics</th>
<td>strata.basics</td>
<td>com.opengamma.strata.basics</td>
</tr>
<tr>
<th>JUnit</th>
<td>junit</td>
<td>org.junit</td>
</tr>
</table>
<p>
All things being equal, we'd choose the shorter name - project-style.
It is certainly more attractive when reading a <i>module-info.java</i> file.
But there are some clear reasons why reverse-DNS must be chosen.
</p>
<p>
It is worth noting that Mark Reinhold currently indicates a <a href="http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2017-February/000582.html">preference</a> for project-style names. However, the linked mail doesn't really deal with the global uniqueness or clashing elements of the naming problem, and others in the expert group <a href="http://mail.openjdk.java.net/pipermail/jpms-spec-experts/2017-February/000583.html">disagreed</a> with project-style names.
</p>
<h4>Ownership and Uniqueness</h4>
<p>
The original designers of Java made a very shrewd choice to proposed reverse-DNS names for packages. This approach has scaled very well, through the incredible rise of open source software. It provides two key properties - Ownership and Uniqueness.
</p>
<p>
The ownership aspect of reverse-DNS delegates control of part of the global DNS namespace to an individual or company. It is a universally agreed approach with enough breadth of identifiers to make clashes rare. Within that namespace, developers are then responsible for ensuring uniqueness. Together, these two aspects result in globally unique package names. As such, it is pretty rare that code has two colliding packages, despite modern applications pulling in hundreds of dependent jar files. For example, the <a href="http://sparkjava.com/">Spark framework</a> and <a href="http://spark.apache.org/">Apache Spark</a> co-exist despite having the same simple name. But look what happens if we only use project-style names:
</p>
<table style="width:95%" class="sjc">
<tr>
<th style="width:25%"> </th>
<th style="width:30%">Project-style</th>
<th style="width:30%">Reverse-DNS</th>
</tr>
<tr>
<th>Spark framework</th>
<td>spark.core</td>
<td>com.sparkjava.core</td>
</tr>
<tr>
<th>Apache-Spark</th>
<td>spark.core</td>
<td>org.apache.spark.core</td>
</tr>
</table>
<p>
As can be seen, the project-style names clash!
JPMS will simply refuse to start a modulepath where two modules have the same name, even if they contain different packages.
(Since these projects haven't chosen module names yet, I've tweaked the example to make them clash. But this example is far from impossible, which is the point here!)
</p>
<p>
Not convinced?
Well imagine what would happen if package names were not reverse-DNS.
If your application pulls in hundreds of dependencies, do you think there would be no duplicates?
</p>
<p>
Of course we have project-style names today in Maven - the jar filename is the <code>artifactId</code> which is a project-style name.
Given this, why don't we have problems today?
Well it turns out that Maven is smart enough to rename the artifact if there is going to be a clash.
The JPMS does not offer this ability - your only choice with a clash will be to rewrite the <i>module-info.class</i> file of the problematic module and all other modules that refer to it.
</p>
<p>
As a final example of how project-style name clashes can occur, consider a startup creating a new project - "willow".
Since they are small, they choose a module name of "willow".
Over the next year, the startup becomes fantastically successful, growing at an exponential rate, meaning that there are now 100s of modules within the company depending on "willow".
But then a new Open Source project starts up, and calls itself "willow".
Now, the company can't use the open source project.
Nor can the company release "willow" as open source.
These clashes are avoided if reverse-DNS names are used.
</p>
<p>
To summarize this section, we need reverse-DNS because module names need to be globally unique, even when writing modules that are destined to remain private. The ownership aspect of reverse-DNS provides enough namespace separation for companies to get the uniqueness necessary.
After all, you wouldn't want to confuse Joda-Time with the <a href="http://jodafreight.com/" rel="nofollow">freight company</a> also called Joda would you?
</p>
<h4>Modules as package aggregates</h4>
<p>
The JPMS design is fundamentally simple - it extends JVM access control to add a new concept "modules" that groups together a set of packages. Given this, there is a very strong link between the concept of a module and the concept of a package.
</p>
<p>
The key restriction is that a <b>package must be found in one and only one module</b>.
</p>
<p>
Given that a module is formed from one or more packages, what is the conceptually simplest name that you can choose? I argue that it is one of the package names that forms the module. And thus a name you've already chosen.
Now, consider we have a project with three packages, which of these three should be the module name?
</p>
<pre>
module ??? {
exports org.joda.time;
exports org.joda.time.chrono;
exports org.joda.time.format;
}
</pre>
<p>
Again, I'd argue there isn't really a debate. There is a clear super-package, and that is what should be used as the module name - <code>org.joda.time</code> in this case.
</p>
<h4>Hidden packages</h4>
<p>
With JPMS, a module can hide packages. When hidden, the internal packages are not visible in Javadoc, nor are they visible in the <i>module-info.java</i> file. This means that consumers of the module have no immediate way of knowing what hidden packages a module has.
</p>
<p>
Now consider again the key restriction that a package must be found in one and only one module.
This <b>restriction applies to hidden packages</b> as well as exported ones.
Therefore if your application depends on two modules and both have the same hidden package, your application cannot be run as the packages clash.
And since information on hidden packages is difficult to obtain, this clash will be surprising.
(There are some advanced ways around these clashes using <i>layers</i>, but these are designed for containers, not applications.)
</p>
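<p>
A hypothetical example of such a surprise (module and package names invented for illustration): two modules that each hide a copy of the same package cannot be loaded together:
</p>
<pre>
module com.foo.liba {
    exports com.foo.liba;
    // hidden: com.thirdparty.util
}
module com.bar.libb {
    exports com.bar.libb;
    // hidden: com.thirdparty.util -- clashes, JPMS refuses to start
}
</pre>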
<p>
The best solution to this problem is exactly as described in the last section.
Consider a project with three exported packages and two hidden ones.
So long as the hidden packages are sub-packages of the module name, we should be fine:
</p>
<pre>
module org.joda.time {
exports org.joda.time;
exports org.joda.time.chrono;
exports org.joda.time.format;
// not exported: org.joda.time.base;
// not exported: org.joda.time.tz;
}
</pre>
<p>
By using the super-package name as the module name, the module developer has <b>taken ownership of that package and everything below it</b>.
So long as all the non-exported packages are conceptually sub-packages, the end-user application should not see any hidden package clashes.
</p>
<h4>Automatic modules</h4>
<p>
JPMS includes a feature whereby a regular jar file, without a <i>module-info.class</i> file, turns into a special kind of module just by placing it on the modulepath.
The automatic module feature is controversial in general, but a key part of this is that the name of the module is derived from the filename of the jar file.
In addition, it means that people writing <i>module-info.java</i> files have to guess the name that someone else will use for a module. Having to guess a name, and having the Java platform pick a name based on the filename of a jar file are both bad ideas in my opinion, and that of many others, but our efforts to stop them seem to have <a href="http://mail.openjdk.java.net/pipermail/jigsaw-dev/2017-April/011955.html">failed</a>.
</p>
<p>
The naming approach outlined in this article provides a means to mitigate the worst effects of this.
If everyone uses reverse-DNS based on the super-package, then the guesses that people make should be reasonably accurate, as the selection process of a name should be fairly straightforward.
</p>
<h4>What if there isn't a clear super-package?</h4>
<p>
There are two cases to consider.
</p>
<p>
The first case is where there really is a super-package, it's just that it has no code. In this case, the implied super-package should be used. (Note that this example is Google Guava, which doesn't have guava in the package name!):
</p>
<pre>
module com.google.common {
exports com.google.common.base;
exports com.google.common.collect;
exports com.google.common.io;
}
</pre>
<p>
The second case is where a jar file has two completely unrelated super-packages:
</p>
<pre>
foo.jar
- package com.foo.util
- package com.foo.util.money
- package com.bar.client
</pre>
<p>
The right approach here is to break the jar file into two separate modules:
</p>
<pre>
module com.foo.util {
requires com.bar.client;
exports com.foo.util;
exports com.foo.util.money;
}
module com.bar.client {
exports com.bar.client;
}
</pre>
<p>
Failure to do this is highly likely to cause clashes at some point, as there is no way that <code>com.foo.util</code> should be claiming ownership of the <code>com.bar.client</code> namespace.
</p>
<p>
If <code>com.bar.client</code> is going to be a hidden package when converted to modules, then instead of it being a separate module, it can be repackaged (i.e. shaded) under the module's super-package:
</p>
<pre>
module com.foo.util {
exports com.foo.util;
exports com.foo.util.money;
// not exported: com.foo.util.shade.com.bar.client;
}
</pre>
<h4>Can you have sub-modules?</h4>
<p>
Yes. When a module name is chosen, the developer is taking control of a namespace. That namespace consists of the module name and all sub-names below it - sub-package names and sub-module names.
</p>
<p>
Ownership of that namespace allows the developer to release one module or many. The main constraint is that there should not be two published modules containing the same package.
</p>
<p>
As a side effect of this, the practice of larger projects releasing an "all" jar will need to stop. An "all" jar is used when the project has lots of separate jar files, but also wants to allow end-users to depend on a single jar file. These "all" jar files are a pain in Maven dependency trees, but will be a disaster in JPMS ones, as there is no way to override the metadata, unlike in Maven.
</p>
<h4>What if my existing project does not meet these guidelines?</h4>
<p>
The harsh suggestion is to change the project in an incompatible manner so it does meet the guidelines.
JPMS in Java SE 9 is disruptive. It does not take the approach of providing all the tools necessary to meet all the edge cases in current deployments. As such, it is not surprising that some jar files and some projects will require some major rework.
</p>
<h4>Why ignore the Maven artifactId?</h4>
<p>
JPMS is an extension to the Java platform (language and runtime). Maven is a build system. Both are necessary, but they have different purposes, needs and conventions.
</p>
<p>
JPMS is all about packages, grouping them together to form modules and linking those.
In this way, developers are working with source code, just like any other source code.
What artifacts the source code is packed up into is a separate question.
Understanding the separation is hard, because currently there is a one-to-one mapping between the module and the jar file; however, we should not assume this will always be the case in the future.
</p>
<p>
Another example of this separation is versioning. JPMS has little to no support for versions, yet build systems like Maven do. When running the application, Maven is responsible for collecting a coherent set of artifacts (jar files) to run the application, just as before. It's just that some of those might be modules.
</p>
<p>
Finally, the Maven artifactId does not exist in isolation. Maven makes unique identifiers by combining the groupId, artifactId and classifier. Only the combination is sufficiently globally unique to be useful. Picking out just the artifactId and trying to make a unique module name from it is asking for trouble.
</p>
<p>
See also this follow up article on <a href="http://blog.joda.org/2017/04/java-se-9-jpms-modules-are-not-artifacts.html">modules vs artifacts</a>.
</p>
<h4>Summary</h4>
<p>
JPMS module names, and the <i>module-info.java</i> in general, are going to require real thought to get right. The module declaration will be as much a part of your API as your method signatures.
</p>
<p>
The importance is heightened because, unlike Maven and other module systems, JPMS has no way to fix broken metadata.
If you rely on some modular jar files, and get a clash or find some other mistake in the module declarations, your only options will be to not use JPMS or to rewrite the module declarations yourself.
Given this difficulty, it is not yet clear that JPMS will be a success, thus your best option may be to not modularize your code.
</p>
<p>
See the <a href="#tldr">TL;DR section</a> above for the summary of the module name proposal.
Feedback and questions welcome.
</p>
<p>
<i>PS. For clarity, my personal interest is ensuring Java succeeds, something that will IMO require consistent naming.</i>
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com13tag:blogger.com,1999:blog-741750605858169835.post-19719283010985224282017-04-17T06:17:00.000+01:002017-06-14T10:12:44.915+01:00Java 9 modules - JPMS basics<p>
The Java Platform Module System (JPMS) is the major new feature of Java SE 9.
In this article, I will introduce it, leaving most of my opinions to a follow up article.
This is based on <a href="https://www.slideshare.net/scolebourne/java-se-9-modules-jpms-an-introduction">these slides</a>.
</p>
<h4>Java Platform Module System (JPMS)</h4>
<p>
The new module system, developed as <a href="http://openjdk.java.net/projects/jigsaw/">Project Jigsaw</a>, is intended to raise the abstraction level of coding in Java as follows:
</p>
<p class="quote">
The primary goals of this Project are to:<br />
* Make the Java SE Platform, and the JDK, more easily scalable down to small computing devices;<br />
* Improve the security and maintainability of Java SE Platform Implementations in general, and the JDK in particular;<br />
* Enable improved application performance; and<br />
* Make it easier for developers to construct and maintain libraries and large applications, for both the Java SE and EE Platforms.<br />
To achieve these goals we propose to design and implement a standard module system for the Java SE Platform and to apply that system to the Platform itself, and to the JDK. The module system should be powerful enough to modularize the JDK and other large legacy code bases, yet still be approachable by all developers.
</p>
<p>
However, as we shall see, project goals are not always met.
</p>
<h4>What is a JPMS Module?</h4>
<p>
JPMS is a change to the Java libraries, language and runtime.
This means that it affects the whole stack that developers code with day-to-day, and as such JPMS could have a big impact.
For compatibility reasons, most existing code can ignore JPMS in Java SE 9, something that may prove to be very useful.
</p>
<p>
<b>The key conceptual point to grasp is that JPMS adds a new concept to the JVM - modules.</b>
Where previously, code was organized into fields, methods, classes, interfaces and packages, with Java SE 9 there is a new structural element - modules.
</p>
<ul>
<li>a class is a container of fields and methods</li>
<li>a package is a container of classes and interfaces</li>
<li>a module is a container of packages</li>
</ul>
<p>
Because modules are a new JVM element, the runtime can apply strong access control.
With Java 8, a developer can express that the methods of a class cannot be seen by other classes by declaring them private.
With Java 9, a developer can express that a package cannot be seen by other modules - i.e. a package can be hidden within a module.
</p>
<p>
Being able to hide packages should in theory be a great benefit for application design.
No longer should there be a need for a package to be named "impl" or "internal" with Javadoc declaring "please don't use types from this package".
Unfortunately, life won't be quite that simple.
</p>
<p>
Creating a module is, however, relatively simple. A module is typically just a jar file that has a <code>module-info.class</code> file at the root - known as a <i>modular jar file</i>.
And that file is created from a <code>module-info.java</code> file in your source base (see below for more details).
</p>
<p>
Using a modular jar file involves adding the jar file to the <i>modulepath</i> instead of the classpath.
If a modular jar file is on the classpath, it will not act as a module at all, and the <code>module-info.class</code> will be ignored.
As such, while a modular jar file on the modulepath will have hidden packages enforced by the JVM, a modular jar file on the classpath will not have hidden packages at all.
</p>
<h4>Other module systems</h4>
<p>
Java has historically had other module systems, most notably OSGi and JBoss Modules.
It is important to understand that JPMS has little resemblance to those systems.
</p>
<p>
Both OSGi and JBoss Modules have to exist without direct support from the JVM, yet still provide some additional support for modules.
This is achieved by launching each module in its own class loader, a technique that gets the job done, yet is not without its own issues.
</p>
<p>
Unsurprisingly, given these are existing module systems, experts from those groups have been included in the formal Expert Group developing JPMS.
However, this relationship has not been harmonious.
Fundamentally, the JPMS authors (Oracle) have set out to build a JVM extension that can be used for something that can be described as modules, whereas the existing module systems derive experience and value from real use cases and tricky edge cases in big applications that exist today.
</p>
<p>
When reading about modules, it is important to consider whether the authors of the article you are reading are from the OSGi/JBoss Modules design camp.
(I have never actively used OSGi or JBoss Modules, although I have used Eclipse and other tools that use OSGi internally.)
</p>
<h4>module-info.java</h4>
<p>
The <code>module-info.java</code> file contains the instructions that define a module (the most important ones are covered here, but there are more).
This is a <code>.java</code> file, however the syntax is nothing like any <code>.java</code> file you've seen before.
</p>
<p>
There are two key questions that you have to answer to create the file - what does this module depend on, and what does it export:
</p>
<pre>
module com.opengamma.util {
requires org.joda.beans; // this is a module name, not a package name
requires com.google.guava;
exports com.opengamma.util; // this is a package name, not a module name
}
</pre>
<p>
(The names to use for modules needed a <a href="http://blog.joda.org/2017/04/java-se-9-jpms-module-naming.html">whole separate article</a>; here I'll use the package-name style.)
</p>
<p>
This module declaration says that <code>com.opengamma.util</code> depends on (requires) <code>org.joda.beans</code> and <code>com.google.guava</code>.
It exports one package, <code>com.opengamma.util</code>.
All other packages are hidden when using the modulepath (enforced by the JVM).
</p>
<p>
There is an implicit dependency on <code>java.base</code>, the core module of the JDK.
Note that the JDK itself is also modularized, so if you want to depend on Swing, XML or Logging, that dependency needs to be expressed.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-r5sITT_wYPI/WPQ4Rtruz0I/AAAAAAAAEfo/0DhTsUQoaVw3pYNdeKZz6ZNDydxiPaJKwCLcB/s1600/BasicModuleGraph1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-r5sITT_wYPI/WPQ4Rtruz0I/AAAAAAAAEfo/0DhTsUQoaVw3pYNdeKZz6ZNDydxiPaJKwCLcB/s640/BasicModuleGraph1.png" width="640" height="242" /></a></div>
<pre>
module org.joda.beans {
requires transitive org.joda.convert;
exports org.joda.beans;
exports org.joda.beans.ser;
}
</pre>
<p>
This module declaration says that <code>org.joda.beans</code> depends on (requires) <code>org.joda.convert</code>.
The "requires transitive", as opposed to a simple "requires", means that any module that requires <code>org.joda.beans</code>
can also see and use the packages from <code>org.joda.convert</code>. This is used here as Joda-Beans has methods where the
return type is from Joda-Convert. This is shown by a dashed line.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://4.bp.blogspot.com/-F0FY7h-f-CU/WPQ4s-E7olI/AAAAAAAAEfs/qjZdEAp81hgPeexSzAJGMsLMv-my7I9GwCLcB/s1600/BasicModuleGraph2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://4.bp.blogspot.com/-F0FY7h-f-CU/WPQ4s-E7olI/AAAAAAAAEfs/qjZdEAp81hgPeexSzAJGMsLMv-my7I9GwCLcB/s640/BasicModuleGraph2.png" width="640" height="313" /></a></div>
<pre>
module org.joda.convert {
requires static com.google.guava;
exports org.joda.convert;
}
</pre>
<p>
This module declaration says that <code>org.joda.convert</code> depends on (requires) <code>com.google.guava</code>,
but only at compile time, "requires static", as opposed to a simple "requires". This is an optional dependency.
If Guava is on the modulepath, then Joda-Convert will be able to see and use it, and no error will occur if Guava is not present.
This is shown by a dotted line.
</p>
<div class="separator" style="clear: both; text-align: center;"><a href="https://2.bp.blogspot.com/-x4i59eH8wuQ/WPQ5ChvnaMI/AAAAAAAAEfw/t6fbYusLdqYdf435Z1fJcVnNitKzgb2mgCLcB/s1600/BasicModuleGraph3.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://2.bp.blogspot.com/-x4i59eH8wuQ/WPQ5ChvnaMI/AAAAAAAAEfw/t6fbYusLdqYdf435Z1fJcVnNitKzgb2mgCLcB/s640/BasicModuleGraph3.png" width="640" height="402" /></a></div>
<h4>Access rules</h4>
<p>
When running a modular jar on the modulepath with JVM access rules applied, code in package A can see a type in package B if:
</p>
<ul>
<li>the type is public</li>
<li>package B is exported from its module</li>
<li>there is a dependency from the module containing package A to the module containing package B</li>
</ul>
<p>
Thus, in the example above, code in module <code>com.opengamma.util</code> can see packages <code>org.joda.beans</code>, <code>org.joda.beans.ser</code>, <code>org.joda.convert</code> and any package exported by Guava. However, it cannot see package <code>org.joda.convert.internal</code> (as it is not exported).
In addition, code in module <code>com.google.guava</code> cannot see code in package <code>org.joda.beans</code> or <code>org.joda.convert</code>, as there
is no modular dependency.
</p>
<h4>What can go wrong?</h4>
<p>
The basics described above are simple enough. It is initially quite easy to imagine how you might build an application from these foundations and benefit from hiding packages.
Unfortunately, quite a few things can go wrong.
</p>
<p>
1) All use of module-info files only applies if using modular jars on the modulepath.
For compatibility, all code on the classpath is packaged up as a special <i>unnamed module</i>, with no hidden packages and full access to the whole JDK.
Thus, the security benefits of hiding packages are marginal at best.
However, the modules of the JDK itself are always run in modular mode, thus are always guaranteed the security benefits.
</p>
<p>
2) Versions of modules are not handled. A module name can only be loaded once, so you cannot have two versions of the same module loaded together. It is left entirely to you, and thus to your build tool, to create a coherent set of modules that can actually be run.
Thus, the classpath hell situation caused by clashing versions is not solved.
Note that putting the version number in the module name is a Bad Idea that does not solve this problem and creates others.
</p>
<p>
3) Two modules may not contain the same package.
This seems eminently sensible, until you consider that it also applies to hidden packages.
Since hidden packages are not listed in <code>module-info.class</code>, a tool like Maven must unpack the jar file to discover what hidden packages there are in order to warn of clashes.
As a user of the library, such a clash will be completely surprising, as you won't have any indication of the hidden packages in the Javadoc.
This is a more general indication that JPMS does not provide sufficient isolation between modules, for reasons that are far from clear at this point.
</p>
<p>
4) There must be no cycles between modules, at compile time and at runtime.
Again, this seems sensible - who wants to have module A depend on B depend on C which depends on A?
But the reality of existing projects is that this happens, and on the classpath is not a problem.
For example, consider what would happen if Guava decided to depend on Joda-Convert in the example above.
This restriction will make some existing open source projects hard to migrate.
</p>
<p>
5) Reflection is changing, such that non-public fields and methods will no longer be accessible via reflection.
Since almost every framework uses reflection in this way, there will be significant work needed to migrate existing code.
In particular, the JDK will be very locked down against reflection, which may prove painful (command line flags can escape the trap for now).
This article hasn't had a chance to explore how the module declaration can influence reflection - see "opens" in the slides for more details.
</p>
<p>
6) Are your dependencies modularized? In theory, you can only turn your code into a module once all your dependencies are also modules.
For any large application with hundreds of jar file dependencies, this will be a problem.
The "solution" is <i>automatic modules</i>, where a normal jar file placed on the modulepath is automatically turned into a module.
This process is controversial, with naming a big issue.
Library authors should not publish modules that depend on automatic modules to public repositories like Maven Central, unless those automatic modules have the Automatic-Module-Name manifest entry. Again, automatic modules deserve their own article!
</p>
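<p>
The <code>Automatic-Module-Name</code> manifest entry mentioned above lets a regular jar file declare the module name it will eventually use, so the name no longer depends on the jar's filename. A sketch of the relevant <code>MANIFEST.MF</code> line (using the Joda-Convert name from the earlier examples):
</p>

```
Automatic-Module-Name: org.joda.convert
```

<p>
With this entry present, other projects can safely write <code>requires org.joda.convert;</code> before the library is fully modularized.
</p>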
<p>
7) Module naming is not yet set in stone. I've come to believe that <a href="http://blog.joda.org/2017/04/java-se-9-jpms-module-naming.html">naming your module after the highest package it contains</a>, causing that module to "take ownership" of the subpackages, is the only sane strategy.
</p>
<p>
8) Conflicts with build systems - who is in charge? A Maven <code>pom.xml</code> also contains information about a project.
Should it be extended to allow module information to be added?
I would suggest not, because the <code>module-info.java</code> file contains a binding part of your API, and that is best expressed in .java code, not metadata like <code>pom.xml</code>.
</p>
<p>
For those wanting a book to read in much more depth, try <a href="https://www.manning.com/books/the-java-9-module-system">this one</a> from Nicolai.
</p>
<h4>Summary</h4>
<p>
Do not get too excited about JPMS - modules in Java 9.
The above is only a summary of what is possible with the <code>module-info.java</code> file and the restrictions of the JPMS.
If you are thinking of modularizing your library or application, please wait a little longer until everything becomes a little clearer.
</p>
<p>
Feedback welcome, but bear in mind that I am planning more articles.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com16tag:blogger.com,1999:blog-741750605858169835.post-20876075419892968052017-02-07T14:47:00.000+00:002017-02-08T23:38:18.443+00:00Java Time (JSR-310) enhancements in Java SE 9
<p>
The <code>java.time.*</code> API (JSR-310) was added to Java SE 8, but what has been going on since then?
</p>
<h4>Java Time in Java SE 9</h4>
<p>
There are currently <a href="https://bugs.openjdk.java.net/issues/?filter=30314">117 java time issues</a>
targeted at Java SE 9. Most of these are not especially interesting, with a lot of mistakes in the Javadoc that needed fixing. What follows are some of the more interesting ones:
</p>
<h5>Main enhancements:</h5>
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8146218">JDK-8146218</a> - Add LocalDate.datesUntil method producing Stream.<br>
Adds two new methods - <code>LocalDate.datesUntil(LocalDate)</code> and <code>LocalDate.datesUntil(LocalDate,Period)</code> - returning a stream of dates.
</p>
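<p>
A quick sketch of how the new methods behave (the dates are chosen arbitrarily); note that the end date is exclusive:
</p>

```java
import java.time.LocalDate;
import java.time.Period;

public class DatesUntilExample {
    public static void main(String[] args) {
        // End-exclusive stream of every date in the first week of 2024
        long days = LocalDate.of(2024, 1, 1)
                .datesUntil(LocalDate.of(2024, 1, 8))
                .count();
        System.out.println(days);  // 7

        // The two-arg overload steps by a Period, here one week at a time
        long weeks = LocalDate.of(2024, 1, 1)
                .datesUntil(LocalDate.of(2024, 2, 1), Period.ofWeeks(1))
                .count();
        System.out.println(weeks);  // 5
    }
}
```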
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8068730">JDK-8068730</a> - Increase precision of Clock.systemUTC().<br>
The clock in Java - <code>System.currentTimeMillis()</code> - has ticked in milliseconds since Java was first released.
With Java SE 9, users of <code>Clock</code> will see higher precision, depending on the available clock of the operating system.
</p>
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8071919">JDK-8071919</a> - Clock.tickMillis(ZoneId zone) method.<br>
With the system clock now returning higher precision, a new method was added - <code>Clock.tickMillis(ZoneId)</code> - that chops
off the extra precision to restore the millisecond ticking behaviour of Java SE 8.
</p>
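<p>
A brief sketch of how the two changes interact:
</p>

```java
import java.time.Clock;
import java.time.ZoneOffset;

public class ClockTickExample {
    public static void main(String[] args) {
        // systemUTC() may now tick in microseconds or better, OS permitting
        System.out.println(Clock.systemUTC().instant());

        // tickMillis() restores whole-millisecond instants, so the
        // nano-of-second is always a multiple of 1,000,000
        Clock millisClock = Clock.tickMillis(ZoneOffset.UTC);
        System.out.println(millisClock.instant().getNano() % 1_000_000);  // 0
    }
}
```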
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8030864">JDK-8030864</a> - Add efficient getDateTimeMillis method to java.time.<br>
This adds two methods named <code>epochSecond</code> to <code>Chronology</code> that convert date-time fields to an epoch-second without creating intermediate objects.
</p>
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8142936">JDK-8142936</a> - Duration methods for days, hours, minutes, seconds, etc.<br>
The Java SE 8 API of <code>Duration</code> turned out to be incomplete for certain use cases. This change adds a slew of new methods
that allow parts of the duration to be reliably returned.
</p>
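<p>
For example, the new part accessors decompose a duration without manual arithmetic (the values here are illustrative):
</p>

```java
import java.time.Duration;

public class DurationPartsExample {
    public static void main(String[] args) {
        Duration d = Duration.ofDays(2).plusHours(3).plusMinutes(4).plusSeconds(5);
        // Each part is the component of the duration, not the total
        System.out.println(d.toDaysPart());     // 2
        System.out.println(d.toHoursPart());    // 3
        System.out.println(d.toMinutesPart());  // 4
        System.out.println(d.toSecondsPart());  // 5
    }
}
```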
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8148849">JDK-8148849</a> - Truncating Duration.<br>
Adds a method <code>Duration.truncatedTo(TemporalUnit)</code> to allow truncation, similar to the existing method on <code>Instant</code>.
</p>
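<p>
A one-line illustration:
</p>

```java
import java.time.Duration;
import java.time.temporal.ChronoUnit;

public class DurationTruncateExample {
    public static void main(String[] args) {
        // Drop the sub-second part of the duration
        Duration d = Duration.ofSeconds(90).plusMillis(250);
        System.out.println(d.truncatedTo(ChronoUnit.SECONDS));  // PT1M30S
    }
}
```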
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8032510">JDK-8032510</a> - Add Duration.dividedBy(Duration).<br>
A new method to allow a duration to be divided by another duration.
</p>
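<p>
For example, answering "how many 30-minute slots fit in 3 hours?":
</p>

```java
import java.time.Duration;

public class DurationDivideExample {
    public static void main(String[] args) {
        // Returns the whole number of times the divisor fits
        long slots = Duration.ofHours(3).dividedBy(Duration.ofMinutes(30));
        System.out.println(slots);  // 6
    }
}
```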
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8133079">JDK-8133079</a> - LocalDate and LocalTime ofInstant() factory methods.<br>
Add new factory methods in <code>LocalDate</code> and <code>LocalTime</code> to simplify conversion from <code>Instant</code>.
</p>
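<p>
A quick sketch using the epoch instant:
</p>

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneOffset;

public class OfInstantExample {
    public static void main(String[] args) {
        // Both factories need a zone to interpret the instant
        Instant epoch = Instant.EPOCH;
        System.out.println(LocalDate.ofInstant(epoch, ZoneOffset.UTC));  // 1970-01-01
        System.out.println(LocalTime.ofInstant(epoch, ZoneOffset.UTC));  // 00:00
    }
}
```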
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8143413">JDK-8143413</a> - Add toEpochSecond methods for efficient access.<br>
Add methods to <code>LocalDate</code>, <code>LocalTime</code> and <code>OffsetTime</code> to optimize conversion to epoch-seconds.
</p>
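<p>
For instance, using the <code>LocalDate</code> variant:
</p>

```java
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneOffset;

public class ToEpochSecondExample {
    public static void main(String[] args) {
        // One day after the epoch, at midnight UTC
        long secs = LocalDate.of(1970, 1, 2).toEpochSecond(LocalTime.MIDNIGHT, ZoneOffset.UTC);
        System.out.println(secs);  // 86400
    }
}
```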
<h5>Formatting:</h5>
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8148947">JDK-8148947</a> - DateTimeFormatter pattern letter 'g'.<br>
This adds a pattern letter for modified Julian days.
</p>
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8155823">JDK-8155823</a> - Add date-time patterns 'v' and 'vvvv'.<br>
This adds support for the "generic non-location" format for time-zones as defined by CLDR, such as "Pacific Time" (the format ignores daylight saving time).
Methods were also added to the formatter builder - <code>DateTimeFormatterBuilder.appendGenericZoneText()</code>.
</p>
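<p>
A sketch of the new pattern letter (the exact text produced depends on the JDK's locale data; for US English and <code>America/Los_Angeles</code> it is typically "Pacific Time"):
</p>

```java
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

public class GenericZoneTextExample {
    public static void main(String[] args) {
        ZonedDateTime zdt = ZonedDateTime.now(ZoneId.of("America/Los_Angeles"));
        // "vvvv" formats the zone name ignoring whether DST is in effect
        System.out.println(DateTimeFormatter.ofPattern("vvvv", Locale.US).format(zdt));
    }
}
```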
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8066806">JDK-8066806</a> - DateTimeFormatter cannot parse offset with single digit hour.<br>
The formatter is extended to support 11 more time-zone offset formats, including single digit hours such as +2:00.
</p>
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8031085">JDK-8031085</a> - DateTimeFormatter won't parse format "yyyyMMddHHmmssSSS".<br>
This extends support for adjacent value parsing to fractions. Where in Java SE 8 a pattern like this looks like it should work, but doesn't,
in Java SE 9 it just works.
</p>
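<p>
To illustrate - this parse throws an exception on Java SE 8, but succeeds from Java SE 9:
</p>

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class AdjacentParsingExample {
    public static void main(String[] args) {
        // All fields are adjacent with no separators, including the fraction
        DateTimeFormatter f = DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS");
        LocalDateTime ldt = LocalDateTime.parse("20170204103045123", f);
        System.out.println(ldt);  // 2017-02-04T10:30:45.123
    }
}
```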
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8145633">JDK-8145633</a> - Adjacent value parsing for Localized Patterns .<br>
This extends support for adjacent value parsing to localized patterns, such as week-based-year and week of year.
</p>
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8148949">JDK-8148949</a> - DateTimeFormatter pattern letters 'A','n','N'.<br>
These patterns were altered to be more flexible and produce fewer errors.
</p>
<h5>Performance:</h5>
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8073394">JDK-8073394</a> - Clock.systemUTC() should return a constant.<br>
This change avoids creating unnecessary objects when using <code>Clock</code>.
</p>
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8074003">JDK-8074003</a> - ZoneRules.getOffset(Instant) can be optimized.<br>
This change reduces object churn when looking up time-zone data.
</p>
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8074002">JDK-8074002</a> - ZoneId.systemDefault() should be faster.<br>
This enhancement uses a clever approach to cache the time-zone while handling <code>TimeZone.setDefault</code>.
</p>
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8068803">JDK-8068803</a> - Performance of LocalDate.plusDays could be better.<br>
This optimizes <code>LocalDate.plusDays</code> for the common case of adding a small number of days.
</p>
<p>
<a href="https://bugs.openjdk.java.net/browse/JDK-8066291">JDK-8066291</a> - ZoneIdPrinterParser can be optimized.<br>
The method <code>ZoneRulesProvider.getAvailableZoneIds()</code> now returns an immutable set, not a mutable one.
Since most user code calls <code>ZoneId.getAvailableZoneIds()</code>, it will be unaffected.
</p>
<h4>Thanks</h4>
<p>
Thanks for all these bug fixes and enhancements go to the many contributors, both inside and outside Oracle.
(I've mostly been a reviewer, rather than an author, which has worked pretty well overall.)
</p>
<h4>Summary</h4>
<p>
The <code>java.time.*</code> APIs keep on moving forward. Any feedback or other enhancement suggestions are welcome!
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com10tag:blogger.com,1999:blog-741750605858169835.post-25598475735502590322016-09-26T02:36:00.001+01:002016-09-26T03:02:07.416+01:00Code generating beans - mutable and immutable<p>
Java has long suffered from the pain of beans.
To declare a simple data class takes far too much code.
At JavaOne 2016, I talked about code generation options - see the <a href="http://www.slideshare.net/scolebourne/code-generating-beans-in-java">slides</a>.
</p>
<h4>Code generation of mutable and immutable beans</h4>
<p>
The Java ecosystem is massive. Countless libraries are released as open source, which naturally leads to the question of how those libraries communicate. And it is the basic concept of beans that is the essential glue,
despite the <a href="http://blog.joda.org/2014/11/the-javabeans-specification.html">ancient specification</a>.
How do ORMs (Hibernate etc.), Serialization (Jackson etc.) and Bean Mappers (Dozer etc.) communicate? Via getters and setters.
</p>
<p>
The essential features of beans have moved beyond the JavaBeans spec, and are sometimes referred to as POJOs.
The features are:
</p>
<ul>
<li>Mutable</li>
<li>No-args constructor</li>
<li>Getters and Setters</li>
<li>equals() / hashCode() / toString()</li>
</ul>
<p>
But writing these manually is slow, tedious and error-prone.
Code generation should be able to help us here.
</p>
<p>
But should we be using mutable beans in 2016? No, no, no!
</p>
<p>
It is time to be writing immutable data structure (immutable beans).
But the only practical way to do so is code generation, especially if you want to have builders.
</p>
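<p>
To see why generation is attractive, here is roughly what a minimal hand-written immutable bean with a builder looks like (the class and property names are made up for illustration):
</p>

```java
import java.util.Objects;

// Roughly the boilerplate a code generator would produce for two properties.
final class Person {
    private final String name;
    private final int age;

    private Person(Builder builder) {
        this.name = Objects.requireNonNull(builder.name, "name");
        this.age = builder.age;
    }

    public static Builder builder() {
        return new Builder();
    }

    public String getName() { return name; }
    public int getAge() { return age; }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Person)) {
            return false;
        }
        Person other = (Person) obj;
        return name.equals(other.name) && age == other.age;
    }

    @Override
    public int hashCode() { return Objects.hash(name, age); }

    @Override
    public String toString() { return "Person{name=" + name + ", age=" + age + "}"; }

    public static final class Builder {
        private String name;
        private int age;

        public Builder name(String name) { this.name = name; return this; }
        public Builder age(int age) { this.age = age; return this; }
        public Person build() { return new Person(this); }
    }

    public static void main(String[] args) {
        Person p = builder().name("Ada").age(36).build();
        System.out.println(p);
    }
}
```

<p>
Two properties already need around forty lines, and every new property touches five places - which is exactly where hand-maintained equals/hashCode implementations drift out of date.
</p>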
<p>
In my talk at JavaOne 2016, I considered various code generation approaches:
</p>
<h4>IDE code generation</h4>
<p>
This is fine as far as it goes. The code is likely to be correct immediately after generation, but there is no guarantee that it will stay correct as the class is maintained over time.
</p>
<h4>AutoValue, Immutables and VALJOGen</h4>
<p>
These three projects -
<a href="https://github.com/google/auto/tree/master/value">AutoValue</a>,
<a href="http://immutables.org/">Immutables</a>,
<a href="http://valjogen.41concepts.com/">VALJOGen</a> -
use annotation processors to generate code during compilation.
The idea is simple - the developer writes an abstract class or interface, and the tool code generates
the implementation at compile time. However, these tools all focus on immutable beans, not mutable ones
(Immutables can generate a modifiable bean, but it doesn't match the JavaBeans spec, so many tools will reject it).
</p>
<p>
On the up side, there is no chance to mess up the equals / hashCode / toString.
While the tools all allow the methods to be manually written if necessary, most of the time, the default is what you want.
It is also great not to have to implement the immutable builder class manually.
</p>
<p>
On the down side, you as the developer have to write abstract methods, not fields.
A method is a few more keystrokes than a field, and the Javadoc requires an @return line too.
With AutoValue, this is particularly painful, as you have to write the outline of the builder class.
With Immutables, there is no need for this.
</p>
<p>
Of the three projects, AutoValue provides a straightforward simple tool that works, and where the implementation class is hidden (package scoped).
Immutables provides a full-featured tool, with many options and ways to generate. By default, the implementation class is publicly visible and used by callers, but there are ways to make it package-scoped (with more code written by you). VALJOGen allows full customisation of the generation template.
There is no doubt that Immutables is the most comprehensive of the three projects.
</p>
<p>
All three annotation processing tools must be setup before use. In general, adding the tool to Maven will do most of the work (Maven support is good). For Eclipse and IntelliJ, these <a href="https://immutables.github.io/apt.html">instructions</a> are comprehensive.
</p>
<h4>Lombok</h4>
<p>
The <a href="https://projectlombok.org/">Lombok</a> project also uses annotations to control code generation.
However, instead of acting as an annotation processor, it hacks into the internal APIs of Eclipse and the Java compiler.
</p>
<p>
This approach allows code to be generated within the same class, avoiding the need for developers to work with an abstract class or interface. This means that instead of writing abstract methods, the developer writes fields, which is a more natural thing to do.
</p>
<p>
The key question with Lombok is not what it generates, but the way it generates it. If you are willing to accept the hacky approach, IDE limitations, and the inability to debug into the generated code, then it is a neat enough solution.
</p>
<p>
For Eclipse, Lombok must be installed, which is fairly painless as there is a GUI. Other tools require other installation approaches, see <a href="https://projectlombok.org/download.html">this page</a>.
</p>
<h4>Joda-Beans</h4>
<p>
The <a href="http://www.joda.org/joda-beans/">Joda-Beans</a> project takes a third approach to code generation.
It is a source code regenerator, creating code within the same source file, identified by "autogenerated start/end" comments.
</p>
<p>
Developers write fields, not abstract methods, which is simpler and requires less code. The code is also generated into the same class, which can be final if desired.
</p>
<p>
One key benefit of generating all the code to the same class is that the code is entirely valid when checked out.
There is no need to install a plugin or configure your IDE in any way.
</p>
<p>
Unlike the other choices, Joda-Beans also provides functionality at runtime.
It aims to add C# style properties to Java.
What this means in practice is that you can easily treat a bean as a set of properties, loop over the properties
and create instances using a standardized builder.
These features are the ideal building block for serialization frameworks, and Joda-Beans provides XML, JSON and binary serialization that operates using the properties, generally without reflection.
The trade-off here is that Joda-Beans is a runtime dependency, so it is only the best option if you use the additional properties feature.
</p>
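<p>
To illustrate the property concept (this is <b>not</b> the real Joda-Beans API, just a minimal self-contained sketch of the "bean as a set of properties" idea):
</p>

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// a tiny property abstraction, loosely in the spirit of Joda-Beans meta-properties
public class PropertyDemo {

  // a named, readable property of a bean type B
  interface MetaProperty<B> {
    String name();
    Object get(B bean);
  }

  static final class SimpleProperty<B> implements MetaProperty<B> {
    private final String name;
    private final Function<B, Object> getter;
    SimpleProperty(String name, Function<B, Object> getter) {
      this.name = name;
      this.getter = getter;
    }
    public String name() { return name; }
    public Object get(B bean) { return getter.apply(bean); }
  }

  static final class Person {
    final String name;
    final int age;
    Person(String name, int age) { this.name = name; this.age = age; }
    // the "meta" view: a list of properties that frameworks can loop over
    static List<MetaProperty<Person>> metaProperties() {
      return List.of(
          new SimpleProperty<>("name", p -> p.name),
          new SimpleProperty<>("age", p -> p.age));
    }
  }

  // a serializer-style loop over the properties, no reflection involved
  static Map<String, Object> toMap(Person person) {
    Map<String, Object> map = new LinkedHashMap<>();
    for (MetaProperty<Person> mp : Person.metaProperties()) {
      map.put(mp.name(), mp.get(person));
    }
    return map;
  }
}
```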
<p>
The Joda-Beans regenerator can be run from Maven using a <a href="https://github.com/JodaOrg/joda-beans-maven-plugin">plugin</a>. If you use the standard Eclipse Maven support, then simply saving the file in Eclipse will regenerate it. There is also a <a href="https://github.com/andreas-schilling/joda-beans-gradle-plugin">Gradle plugin</a>.
</p>
<h4>Comparing</h4>
<p>
Most of the projects have some appeal.
</p>
<ul>
<li>AutoValue is a simple annotation processor that hides the implementation class, but requires more code to trigger it.</li>
<li>Immutables is a flexible and comprehensive annotation processor that can be used in many ways.</li>
<li>Lombok requires the least coding by the developer, but the trade-off is the implementation via internal APIs.</li>
<li>Joda-Beans is different in that it has a runtime dependency adding C# style properties to Java, allowing code to reliably loop over beans.</li>
</ul>
<p>
My preference is Joda-Beans (which I wrote), because I like the fact that the generated code is in the same source file, so callers see a normal final class, not an interface or abstract class. It also means that it compiles immediately in an IDE that has not been configured when checked out from source control. But Joda-Beans should really only be used if you understand the value of the properties support it provides.
</p>
<p>
If I were to pick another tool, I would use Immutables. It is comprehensive, and providing you invest the time to choose the best way to generate beans for your needs, it should have everything you need.
</p>
<p>
Finally, it is important that readers have a chance to look at the code written and the code generated.
To do this, I have created the <a href="https://github.com/jodastephen/compare-beangen">compare-beangen</a> GitHub project.
This project contains source code for all the tools above and more, demonstrating what you have to write.
</p>
<p>
To make best use of the project, check it out and import it into your IDE. That way, you will experience what code generation means, and how practical it is to work with it. (For example, see what happens when you rename a field/method/class. Does the code generator cope?)
</p>
<h4>Summary</h4>
<p>
It is time to start writing and using immutable beans instead of mutable ones.
The <a href="https://github.com/OpenGamma/Strata">Strata</a> open source market risk project (my day job) has no mutable beans, so it is perfectly possible to do.
But to use immutable beans, you are going to need a code generator, as they are too painful to use otherwise.
</p>
<p>
This blog has summarised five code generators, and provided a <a href="https://github.com/jodastephen/compare-beangen">nice GitHub project</a> for you to do your own comparisons.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com2tag:blogger.com,1999:blog-741750605858169835.post-28646164044750076252016-09-20T07:26:00.000+01:002016-09-20T07:26:10.611+01:00Private methods in interfaces in Java 9<p>
Java SE 9 is slowly moving towards the finishing line.
One new feature is private methods on interfaces.
</p>
<h4>Private methods on interfaces in Java 9</h4>
<p>
In Java 7 and all earlier versions, interfaces were simple.
They could only contain public abstract methods (plus constants).
</p>
<p>
Java 8 changed this.
From Java 8, you can have public static methods and public default methods.
</p>
<pre>
public interface HolidayCalendar {
  // static method, to get the calendar by identifier
  public static HolidayCalendar of(String id) {
    return Util.holidayCalendar(id);
  }
  // abstract method, to find if the date is a holiday
  public abstract boolean isHoliday(LocalDate date);
  // default method, using isHoliday()
  public default boolean isBusinessDay(LocalDate date) {
    return !isHoliday(date);
  }
}
</pre>
<p>
Note that I have chosen to use the full declaration, with "public" on all three methods even though it is not required.
I have argued that this is <a href="http://www.slideshare.net/scolebourne/java-se-8-best-practices-53975908">best practice</a> for Java SE 8, because it makes the code clearer (now there are three types of method) and prepares for a time when there will be non-public methods.
</p>
<p>
And that time is very soon, as Java 9 is adding <a href="https://bugs.openjdk.java.net/browse/JDK-8071453">private methods on interfaces</a>.
</p>
<pre>
public interface HolidayCalendar {
  // example of a private interface method
  private void validateDate(LocalDate date) {
    if (date.isBefore(LocalDate.of(1970, 1, 1))) {
      throw new IllegalArgumentException();
    }
  }
}
</pre>
<p>
Thus, methods can be public or private (with the default being public if not specified).
Private methods can be static or instance.
In both cases, the private method is not inherited by sub-interfaces or implementations.
The valid combinations of modifiers in Java 9 will be as follows:
</p>
<ul>
<li><b>public static</b> - supported</li>
<li><b>public abstract</b> - supported</li>
<li><b>public default</b> - supported</li>
<li><b>private static</b> - supported</li>
<li><b>private abstract</b> - compile error</li>
<li><b>private default</b> - compile error</li>
<li><b>private</b> - supported</li>
</ul>
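<p>
Putting this together, a fuller (hypothetical) version of the earlier <code>HolidayCalendar</code> might use both a private instance method and a private static method (requires Java 9):
</p>

```java
import java.time.DayOfWeek;
import java.time.LocalDate;

public interface HolidayCalendar {

  // abstract method, implemented by each calendar
  boolean isHoliday(LocalDate date);

  // default method, built from the private helpers below
  default boolean isBusinessDay(LocalDate date) {
    validate(date);
    return !isHoliday(date) && !isWeekend(date);
  }

  // private instance method: shared logic, not inherited by implementations
  private void validate(LocalDate date) {
    if (date.isBefore(LocalDate.of(1970, 1, 1))) {
      throw new IllegalArgumentException("Date too early: " + date);
    }
  }

  // private static method: a helper that needs no instance
  private static boolean isWeekend(LocalDate date) {
    DayOfWeek dow = date.getDayOfWeek();
    return dow == DayOfWeek.SATURDAY || dow == DayOfWeek.SUNDAY;
  }
}
```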
<p>
Private methods on interfaces will be very useful in rounding out the functionality added in Java 8.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com0tag:blogger.com,1999:blog-741750605858169835.post-57215454429473796122016-03-26T00:04:00.000+00:002016-03-26T00:05:03.678+00:00Var and val in Java?<p>
Should <a href="http://openjdk.java.net/jeps/286">local variable type inference</a> be added to Java? This is the question being pondered right now by the Java language team.
</p>
<h4>Local Variable Type Inference</h4>
<p>
JEP-286 proposes to add inference to local variables using a new pseudo-keyword (treated as a "reserved type name").
</p>
<p class="quote">
We seek to improve the developer experience by reducing the ceremony associated with writing Java code, while maintaining Java's commitment to static type safety, by allowing developers to elide the often-unnecessary manifest declaration of local variable types.
</p>
<p>
A number of possible keywords have been suggested:
</p>
<ul>
<li><b>var</b> - for mutable local variables</li>
<li><b>val</b> - for final (immutable) local variables</li>
<li><b>let</b> - for final (immutable) local variables</li>
<li><b>auto</b> - well, let's ignore that one, shall we...</li>
</ul>
<p>
Given the implementation strategy, it appears that the current <code>final</code> keyword will still be accepted in front of all of the options, and thus all of these would be final (immutable) variables:
</p>
<ul>
<li><b>final var</b> - changes the mutable local variable to be final</li>
<li><b>final val</b> - redundant additional modifier</li>
<li><b>final let</b> - redundant additional modifier</li>
</ul>
<p>
Thus, the choice appears to be to add one of these combinations to Java:
</p>
<ul>
<li><b>var</b> and <b>final var</b></li>
<li><b>var</b> and <b>val</b> - but <b>final var</b> and <b>final val</b> also valid</li>
<li><b>var</b> and <b>let</b> - but <b>final var</b> and <b>final let</b> also valid</li>
</ul>
<p>
In broad terms, I am unexcited by this feature and unconvinced it actually makes Java better. While IDEs can mitigate the loss of type information when coding, I expect some code reviews to be significantly harder (as they are done outside IDEs). It should also be noted that the <a href="https://msdn.microsoft.com/en-gb/library/ff926074.aspx">C# coding standards</a> warn against excessive use of this feature:
</p>
<p class="quote">
Do not use var when the type is not apparent from the right side of the assignment.<br />
Do not rely on the variable name to specify the type of the variable. It might not be correct.
</p>
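<p>
Using the proposed <b>var</b> syntax, the guideline distinguishes cases like these (the names here are hypothetical):
</p>

```java
import java.util.ArrayList;
import java.util.List;

public class VarDemo {

  // a hypothetical method whose return type is not obvious at the call site
  static List<String> loadNames() {
    return new ArrayList<>(List.of("Ann", "Bob"));
  }

  public static void main(String[] args) {
    // type apparent from the right-hand side: inference is reasonable here
    var names = new ArrayList<String>();

    // type NOT apparent from the right-hand side:
    // the C# guideline says prefer the explicit type...
    List<String> loaded = loadNames();
    // ...rather than: var loaded = loadNames();

    names.addAll(loaded);
    System.out.println(names.size());
  }
}
```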
<p>
Having said the above, I suspect there is very little chance of stopping this feature. The rest of this blog post focuses on choosing the right option for Java.
</p>
<h4>Best option for Java</h4>
<p>
When this feature was announced, aficionados of Scala and Kotlin naturally started arguing for <b>var</b> and <b>val</b>. However, while precedent in other languages is good to examine, it does not necessarily follow that it is the best option for <i>Java</i>.
</p>
<p>
The primary reason why the best option for Java might be different is history. Scala and Kotlin have had this feature from the start; Java has not. I'd like to show why I think <b>val</b> or <b>let</b> is wrong for Java, because of that history.
</p>
<p>
Consider the following code:
</p>
<pre>
public double parSpread(SwapLeg leg) {
  <b>Currency</b> ccyLeg = leg.getCurrency();
  <b>Money</b> convertedPv = presentValue(swap, ccyLeg);
  <b>double</b> pvbp = legPricer.pvbp(leg, provider);
  return -convertedPv.getAmount() / pvbp;
}
</pre>
<p>
Local variable type inference would apply fine to it. But let's say that the type on one line was unclear, so we choose to keep it, to add clarity (as per the C# guidelines):
</p>
<pre>
public double parSpread(SwapLeg leg) {
  <b>val</b> ccyLeg = leg.getCurrency();
  <b>Money</b> convertedPv = presentValue(swap, ccyLeg);
  <b>val</b> pvbp = legPricer.pvbp(leg, provider);
  return -convertedPv.getAmount() / pvbp;
}
</pre>
<p>
Fine, you might say. But what if the code is written in a team that insists on marking every local variable as <b>final</b>?
</p>
<pre>
public double parSpread(<b>final</b> SwapLeg leg) {
  <b>val</b> ccyLeg = leg.getCurrency();
  <b>final Money</b> convertedPv = presentValue(swap, ccyLeg);
  <b>val</b> pvbp = legPricer.pvbp(leg, provider);
  return -convertedPv.getAmount() / pvbp;
}
</pre>
<p>
Suddenly, we have a mess. Some parts of the code use <b>final</b> to indicate a "final" (immutable) local variable. Whereas other parts of the code use <b>val</b>. It is the mixture that is all wrong, and it is that mixture that you do not get in Scala or Kotlin.
</p>
<p>
(Perhaps you don't code using final on every local variable? I know I don't. But I do know that it is a reasonably common coding standard, designed to add safety to the code and reduce bugs.)
</p>
<p>
Contrast the above to the alternative:
</p>
<pre>
public double parSpread(<b>final</b> SwapLeg leg) {
  <b>final var</b> ccyLeg = leg.getCurrency();
  <b>final Money</b> convertedPv = presentValue(swap, ccyLeg);
  <b>final var</b> pvbp = legPricer.pvbp(leg, provider);
  return -convertedPv.getAmount() / pvbp;
}
</pre>
<p>
This is a lot more consistent within Java. <b>final</b> continues to be the mechanism used everywhere to get a final (immutable) variable. And if you, like me, don't worry about the <b>final</b> keyword, it reduces to this:
</p>
<pre>
public double parSpread(<b>final</b> SwapLeg leg) {
  <b>var</b> ccyLeg = leg.getCurrency();
  <b>Money</b> convertedPv = presentValue(swap, ccyLeg);
  <b>var</b> pvbp = legPricer.pvbp(leg, provider);
  return -convertedPv.getAmount() / pvbp;
}
</pre>
<p>
I understand the objections many readers will be having right now - that there should be two new "keywords", one for mutable and one for immutable local variables and that both should be of the same length/weight (or that the mutable one should be longer) to push people to use the immutable form more widely.
</p>
<p>
But in Java it really isn't that simple. We've had the <b>final</b> keyword for many years. Ignoring it results in an unpleasant and inconsistent mess.
</p>
<h4>Summary</h4>
<p>
I don't personally like local variable type inference at all. But if we are to have it, we have to make it fit well within the existing language.
</p>
<p>
I argue that <b>val</b> or <b>let</b> simply does not fit Java, because <b>final</b> already exists and has a clear meaning in that space. While not ideal, I must argue for <b>var</b> and <b>final var</b>, as the only combination on offer that meets the key criteria of fitting the existing language.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com84tag:blogger.com,1999:blog-741750605858169835.post-83863212962922426132015-12-22T11:37:00.000+00:002015-12-22T11:37:06.350+00:00Explicit receiver parameters<p>
I discovered that Java 8 has a language feature I'd never heard of before today!
</p>
<h4>Explicit receiver parameters</h4>
<p>
Consider a simple method in Java 7:
</p>
<pre>
public class Currency {
  public String getCode() { ... }
}
</pre>
<p>
In Java 8, it turns out there is a second way to write the method:
</p>
<pre>
public class Currency {
  public String getCode(Currency this) { ... }
}
</pre>
<p>
The same trick also works for methods with arguments:
</p>
<pre>
public class Currency {
  public int compareTo(Currency this, Currency other) { ... }
}
</pre>
<p>
So, what is going on here?
</p>
<p>
The relevant part of the Java Language Specification is <a href="http://docs.oracle.com/javase/specs/jls/se8/html/jls-8.html#jls-8.4.1-220">here</a>.
</p>
<p class="quote">
The receiver parameter is an optional syntactic device for an instance method or an inner class's constructor. For an instance method, the receiver parameter represents the object for which the method is invoked. For an inner class's constructor, the receiver parameter represents the immediately enclosing instance of the newly constructed object. Either way, the receiver parameter exists solely to allow the type of the represented object to be denoted in source code, so that the type may be annotated. The receiver parameter is not a formal parameter; more precisely, it is not a declaration of any kind of variable, it is never bound to any value passed as an argument in a method invocation expression or qualified class instance creation expression, and it has no effect whatsoever at run time.
</p>
<p>
The new feature is entirely optional, and exists to allow the type to be annotated at the point it is used:
</p>
<pre>
public class Currency {
  public int compareTo(@AnnotatedUsage Currency this, Currency other) { ... }
}

@Target(ElementType.TYPE_USE)
@Retention(RetentionPolicy.RUNTIME)
public @interface AnnotatedUsage {}
</pre>
<p>
The annotation is made available in reflection using <a href="http://docs.oracle.com/javase/8/docs/api/java/lang/reflect/Executable.html#getAnnotatedReceiverType--">Method::getAnnotatedReceiverType()</a>.
</p>
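<p>
A minimal self-contained sketch tying this together (note the annotation needs runtime retention for the reflective lookup to see it):
</p>

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.AnnotatedType;
import java.lang.reflect.Method;

public class ReceiverDemo {

  @Target(ElementType.TYPE_USE)
  @Retention(RetentionPolicy.RUNTIME)  // required for reflective access
  public @interface AnnotatedUsage {}

  public static class Currency {
    private final String code;
    public Currency(String code) { this.code = code; }
    // the receiver parameter exists only so the type of 'this' can be annotated
    public String getCode(@AnnotatedUsage Currency this) { return code; }
  }

  public static void main(String[] args) throws Exception {
    Method method = Currency.class.getMethod("getCode");
    AnnotatedType receiver = method.getAnnotatedReceiverType();
    System.out.println(receiver.isAnnotationPresent(AnnotatedUsage.class));  // prints true
  }
}
```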
<h4>Looking to the future</h4>
<p>
I discovered this feature in the context of <a href="http://openjdk.java.net/projects/valhalla/">Project Valhalla</a>, the effort to add value types to Java.
One possible syntax being considered to allow different methods on <code>List<int></code> from <code>List<String></code> is to use receiver types:
</p>
<pre>
public class List<any T> {
  public int sum(List<int> this) { ... }
}
</pre>
<p>
This syntax would be used to define a <code>sum()</code> method that only applies when <code>List</code> is parameterized by <code>int</code>, and not when parameterized by anything else, such as <code>String</code>.
</p>
<p>
For more information, see this <a href="http://mail.openjdk.java.net/pipermail/valhalla-spec-experts/2015-December/000028.html">email to the Valhalla experts mailing list</a>. But please note that the list is read-only, cannot be signed up to and is for pre-selected experts only. In addition, everything discussed there is very, very early in the process, and liable to change greatly before being released in a real version of Java (such as Java 10).
</p>
<h4>Summary</h4>
<p>
Java 8 has a new language feature which you've probably never heard of and will probably never use.
But at least you now have some knowledge that you can use to prove you are a Java Guru!
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com7tag:blogger.com,1999:blog-741750605858169835.post-7003203509514139862015-09-01T10:10:00.001+01:002015-10-15T11:09:48.111+01:00Naming Optional query methods<p>
In the last <a href="http://blog.joda.org/2015/08/java-se-8-optional-pragmatic-approach.html">article</a> I outlined a pragmatic approach to Java 8's <code>Optional</code> class.
In this one I'm looking at how we should name query methods that might return <code>Optional</code>.
</p>
<h4>Convenience method naming</h4>
<p>
Consider a requirement to produce a tree data structure, perhaps something like XML DOM. The basic data structure might look something like this:
</p>
<pre>
public class XmlElement {
  private final String name;
  private final String content;
  private final Map<String, String> attributes;
  private final List<XmlElement> children;
}
</pre>
<p>
(This article uses this as an example, but many other data structures or domain objects face similar issues, so don't get too caught up in the "XML"-ness of the example, and it's inability to represent every last subtlety of XML.)
</p>
<p>
The key question for this article is about the <b>convenience</b> methods this class exposes.
Using standard best practices, an element with no children should return an empty list,
and an element with no attributes should return an empty map.
Similarly, we can use the empty string, "", when there is no content.
</p>
<pre>
public class XmlElement {
  public String getName() {...}
  public String getContent() {...}
  public Map<String, String> getAttributes() {...}
  public List<XmlElement> getChildren() {...}
}
</pre>
<p>
This approach is fine, and allows us to fully query this immutable data structure.
But it would be nice to have some additional convenience methods.
The main one to consider is a method to return the value of an attribute by name.
The naive implementation would be:
</p>
<pre>
public class XmlElement {
  public String getAttribute(String name) {
    return attributes.get(name);
  }
}
</pre>
<p>
But, this method can return <code>null</code> which the last article recommended against.
The not so naive implementation would be:
</p>
<pre>
public class XmlElement {
  public Optional<String> getAttribute(String name) {
    return Optional.ofNullable(attributes.get(name));
  }
}
</pre>
<p>
This is better, as <code>null</code> is no longer returned and it is now clear that there is a failure case to consider.
However, what if the calling code knows that the data it is asking for is expected to be there, and is going to throw an exception if it isn't?
</p>
<p>
The "standard" approach in that case is to use the method <code>Optional.orElseThrow()</code> on the result of the last method.
And that is good advice.
But, equally, if the same call to <code>orElseThrow()</code> follows lots of calls to the new helper method, it may be
a sign that the convenience method isn't helping as much as it should!
As such, perhaps it is worthy of a helper method itself:
</p>
<pre>
public class XmlElement {
  public String getAttribute(String name) {
    String value = attributes.get(name);
    if (value == null) {
      throw new IllegalArgumentException(
          "Unknown attribute '" + name + "' on element '" + this.name + "'");
    }
    return value;
  }
}
</pre>
<p>
(Note that this follows the last article's recommendation to use <code>null</code>, not <code>Optional</code>, within the methods of a class.)
</p>
<p>
So, this is better for the many use cases where it is a failure case anyway if the attribute is missing.
i.e. where the caller knows that the attribute is defined to be mandatory, and it is necessarily an exception if it is missing.
</p>
<p>
In reality though, some attributes are going to be mandatory and some are optional.
Which brings us to the point of this article: a design approach for a case like this, and the ensuing naming problem.
My suggestion is to have two methods:
</p>
<pre>
public class XmlElement {
  // throw exception if no attribute
  public String <b>getAttribute(String name)</b> { ... }
  // return empty optional if no attribute
  public Optional<String> <b>findAttribute(String name)</b> { ... }
}
</pre>
<p>
In this approach, there are two methods, and the caller can choose one method if they want the exception when the name is not found, and the other method if they want to handle the missing case.
</p>
<p>
The need for this approach is relatively rare - it needs to be a method that can be used in both ways, mandatory and optional.
It also needs to be an API that will be called enough for the value of the two helper methods to outweigh the extra cost.
But when it does crop up, it needs a good naming convention as the two methods cannot be overloads.
And that is what I'm proposing here.
</p>
<ul>
<li>Name the method that throws an exception when not found <code>getXxx(String)</code></li>
<li>Name the method that returns an optional <code>findXxx(String)</code></li>
</ul>
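<p>
Filling in the bodies from the earlier snippets, a minimal self-contained version of the pair looks like this:
</p>

```java
import java.util.Map;
import java.util.Optional;

public final class XmlElement {
  private final String name;
  private final Map<String, String> attributes;

  public XmlElement(String name, Map<String, String> attributes) {
    this.name = name;
    this.attributes = Map.copyOf(attributes);
  }

  // mandatory access: throws an exception if the attribute is not found
  public String getAttribute(String name) {
    String value = attributes.get(name);
    if (value == null) {
      throw new IllegalArgumentException(
          "Unknown attribute '" + name + "' on element '" + this.name + "'");
    }
    return value;
  }

  // optional access: returns an empty optional if the attribute is not found
  public Optional<String> findAttribute(String name) {
    return Optional.ofNullable(attributes.get(name));
  }
}
```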
<p>
With this approach, both methods are available, and have reasonable names.
It seems to me that, when this kind of situation arises, the mandatory case is typically the most common, and thus it gets the "get" name.
</p>
<p>
There are other possible naming approaches to this "near overload".
But I've settled on this one as the right balance for my APIs.
</p>
<p>
Note however, that this is <b>not</b> a suggestion to name all methods returning an <code>Optional</code> something like <code>findXxx(String)</code>.
Only use it when appropriate; only when paired with an exception-throwing version do I think this makes sense.
</p>
<h4>Summary</h4>
<p>
This article outlines a naming approach for when you need overloaded convenience methods on an API, typically for accessing a collection, such as a map or list. It proposes adding one method, <code>getXxx(String)</code>, that throws an exception when the key cannot be found, and a second method <code>findXxx(String)</code>, that returns an empty optional when the key is not found.
</p>
<h4>Update 2015-10-15</h4>
<p>
The original version of this article proposed the name "getXxxOptional()". I've changed that to "findXxx()" based on the comments and experimentation with both options. It's clear to me now that "findXxx()" is a way better method name than "getXxxOptional()" for a situation like this.
</p>
Stephen Colebournehttp://www.blogger.com/profile/01454237967846880639noreply@blogger.com12