Tuesday 20 February 2024

Pattern match Optional in Java 21

I'm going to describe a trick to get pattern matching on Optional in Java 21, but one you'll probably never actually use.

Using Optional

As of Java 21, pattern matching allows us to check a value against a type, like an instanceof, with a new variable of the correct type being declared when the check succeeds. Pattern matching can handle simple types and the deconstruction of records. But pattern matching of arbitrary classes like Optional is not yet supported. (Work to support pattern match methods is ongoing.)
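
For example, records can be deconstructed directly in Java 21. Here is a minimal illustration, using a small Point record invented for the purpose:

  record Point(int x, int y) {}

  static String describe(Object obj) {
    // the record pattern both checks the type and extracts the components
    if (obj instanceof Point(int x, int y)) {
      return "Point at (" + x + ", " + y + ")";
    }
    return "not a point";
  }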

In normal code, the best way to use Optional is with one of the functional methods:

  var addressOpt = findAddress(personId);
  var addressStr = addressOpt
        .map(address -> address.format())
        .orElse("No address available");

This works well in most cases. But sometimes you want to use the Optional with a return statement. This results in code using get() like this:

  var addressOpt = findAddress(personId);
  if (addressOpt.isPresent()) {
    // early return if address found
    return addressOpt.get().format();
  }
  // lots of other code to handle case when address not found

One way to improve this is to write a simple method:

  /**
   * Converts an optional to an iterable for use in the for-each statement.
   *
   * @param <T> the type of optional element
   * @param optional the optional
   * @return an iterable representation of the optional
   */
  public static <T> Iterable<T> inOptional(Optional<T> optional) {
    return optional.isPresent() ? List.of(optional.get()) : List.of();
  }

Which allows the following neat form:

  for (var address : inOptional(findAddress(personId))) {
    // early return if address found
    return address.format();
  }
  // lots of other code to handle case when address not found

This is a great approach provided that you don't need an else branch.

Using Optional with Pattern matching

With Java 21 and pattern matching we have a new way to do this!

  if (findAddress(personId).orElse(null) instanceof Address address) {
    // early return if address found
    return address.format();
  } else {
    // lots of other code to handle case when address not found
  }

This makes use of the fact that instanceof rejects null. Any expression that creates an Optional can be used - just pop .orElse(null) on the end.

Note that this does not work with Java 17 in most cases, because unconditional patterns were not permitted. (In most cases, the expression will return a value of the type being checked for, and the compiler rejects it as being "always true". Of course this was a simplification in earlier Java versions, as the null really needed to be checked for.)

Is this a useful trick?

In reality, this is of marginal use. Where I think it could be of use is in a long if-else chain, for example:

  if (obj instanceof Integer value) {
    // use value
  } else if (obj instanceof Long value) {
    // use value
  } else if (obj instanceof Double value) {
    // use value
  } else if (lookupConverter(obj).orElse(null) instanceof Converter conv) {
    // use converter
  } else {
    // even more options
  }

(It is valuable here as it avoids calling lookupConverter(obj) until necessary.)

Anyway, it is a fun trick even if you never use it!

One day soon I imagine we will use something like this:

  if (lookupConverter(obj) instanceof Optional.of(Converter conv)) {
    // use converter
  } else {
    // even more options
  }

which I think we'd all agree is the better approach.

Comments at Reddit

Thursday 6 October 2022

Java on-ramp - Fully defined Entrypoints

How do you start a Java program? With a main method of course. But the ceremony around writing such a method is perhaps not the nicest for newcomers to Java.

There has been a bit of discussion recently about how the "on-ramp" for Java could be made easier. This is the original proposal. Here are follow-ups - OpenJDK, Reddit, Hacker News.

Starting point

This is the classic Java Hello World:

  public class HelloWorld { 
    public static void main(String[] args) { 
      System.out.println("Hello World");
    }
  }

Lots of stuff going on - public, class, a class name, void, arrays, a method, method call. And one of the weirdest things in Java - System.out - a public static field in lower case. Something that is pretty much never seen in normal Java code. (I still remember System.out.println being the most confusing part about getting started in Java 1.0 - why are there two dots and why isn't it out()?)

The official proposal continues to discuss:

  • A more tolerant launch protocol
  • Unnamed classes
  • Predefined static imports for the most critical methods and fields

The ensuing discussion resulted in various suggestions. Having taken some time to reflect on the proposal and discussion, here is my contribution: what is really needed is something more comprehensive.

Entrypoints

When a Java program starts, some kind of class file needs to be run. It could be a normal class, but that isn't ideal as we don't really want static/instance variables, subclasses, parent interfaces, access control etc. One suggestion was for it to be a normal interface, but that isn't ideal as we don't want to mark the methods as default or allow abstract methods.

I'd like to propose that what Java needs is a new kind of class declaration for entrypoints.

I don't think this is overly radical. We already have two alternate class declarations - record and enum. They have alternate syntax that compiles to a class file without being explicitly a class in source code. What we need here is a new kind - entrypoint - that compiles to a class file but has different syntax rules, just like record and enum do.

I believe this is a fundamentally better approach than the minor tweaks in the official proposal, because it will be useful to developers of all skill levels, including framework authors. ie. it has a much better "bang for buck".

The simplest entrypoint would be:

  // MyMain.java
  entrypoint {
    SystemOut.println("Hello World");
  }

In the source code we have various things:

  • Inferred class name from file name, the class file is MyMain$entrypoint
  • Top-level code, no need to discuss methods initially
  • No access to parameters
  • New classes SystemOut, SystemIn and SystemErr
  • No constructor, as a new kind of class declaration it doesn't need it

The classes like SystemOut may seem like a small change, but it would have been much simpler for the me of 25 years ago to understand. I don't favour more static imports for them (either here or more generally), as I think SystemOut.println("Hello World") is simple enough. More static imports would be too magical in my opinion.
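
To be clear, SystemOut would be little more than a holder of static methods. A hypothetical sketch of how it might look (no such class exists today):

  public final class SystemOut {
    private SystemOut() {}
    // delegates to the existing System.out, hiding the lower-case static field
    public static void println(String text) {
      System.out.println(text);
    }
  }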

The next steps when learning Java are for the instructor to expand the entrypoint.

  • Add a named method (always private, any name although main() would be common)
  • Add parameters to the method (maybe String[], maybe String...)
  • Add return type to the method (void is default return type)
  • Group code into a block
  • Add additional methods (always private)

Here are some valid examples. Note that instructors can choose the order to explain each feature:

  entrypoint {
    SystemOut.println("Hello World");
  }
  entrypoint main() {
    SystemOut.println("Hello World");
  }
  entrypoint main(String[] args) {
    SystemOut.println("Hello World");
  }
  entrypoint {
    main() {
      SystemOut.println("Hello World");
    }
  }
  entrypoint {
    void main(String[] args) {
      SystemOut.println("Hello World");
    }
  }
  entrypoint {
    main(String[] args) {
      output("Hello World");
    }
    output(String text) {
      SystemOut.println(text);
    }
  }

Note that there are never any static methods, static variables, instance variables or access control. If you need any of that you need a class. Thus we have proper separation of concerns for the entrypoint of systems, which would be Best Practice even for experienced developers.

Progressing to classes

During initial learning, the entrypoint class declaration and normal class declaration would be kept in separate files:

  // MyMain.java
  entrypoint {
    SystemOut.println(new Person().name());
  }
  // Person.java
  public class Person {
    String name() {
      return "Bob";
    }
  }

However, at some point the instructor would embed an entrypoint (of any valid syntax) in a normal class.

  public class Person {
    entrypoint {
      SystemOut.println(new Person().name());
    }
    String name() {
      return "Bob";
    }
  }

We discover that an entrypoint is normally wrapped in a class which then offers the ability to add static/instance variables and access control.

Note that since all methods on the entrypoint are private and the entrypoint is anonymous, there is no way for the rest of the code to invoke it without hackery. Note also that the entrypoint does not get any special favours like an instance of the outer class, thus there is no issue with no-arg constructors - if you want an instance you have to use new (the alternative is unhelpful magic that harms learnability IMO).

Finally, we see that our old-style static main method is revealed to be just a normal entrypoint:

  public class Person {
    entrypoint public static void main(String[] args) {
      SystemOut.println(new Person().name());
    }
    String name() {
      return "Bob";
    }
  }

ie. when a method is declared as public static void main(String[]) the keyword entrypoint is implicitly added.

What experienced developers gain from this is a clearer way to express what the entrypoint actually is, and more power in expressing whether they want the command line arguments or not.

Full-featured entrypoints

Everything above is what most Java developers would need to know. But an entrypoint would actually be a whole lot more powerful.

The basic entrypoint would compile to a class something like this:

  // MyMain.java
  entrypoint startHere(String[] args) {
    SystemOut.println("Hello World");
  }
  // MyMain$entrypoint.class
  public final class MyMain$entrypoint implements java.lang.Entrypoint {
    @Override
    public void main(Runtime runtime) {
      runtime.execute(() -> startHere(runtime.args()));
    }
    private void startHere(String[] args) {
      SystemOut.println("Hello World");
    }
  }

Note that it is final and methods are private.

The Entrypoint interface would be:

  public interface java.lang.Entrypoint {
    /**
     * Invoked by the JVM to launch the program.
     * When the method completes, the JVM terminates.
     */
    public abstract void main(Runtime runtime);
  }

The Runtime.execute method would be something like:

  public void execute(ThrowableRunnable runnable) {
    try {
      runnable.run();
      System.exit(0);
    } catch (Throwable ex) {
      ex.printStackTrace();
      System.exit(1);
    }
  }

The JVM would do the following:

  • Load the class file specified on the command line
  • If it implements java.lang.Entrypoint call the no-args constructor and invoke it
  • Else look for a legacy public static void main(String[]), and invoke that

Note that java.lang.Entrypoint is a normal interface that can be implemented by anyone and do anything!

This last point is critical to enhancing the bang-for-buck. I was intrigued by things like Azul CRaC which wants to own the whole lifecycle of the JVM run. Wouldn't that be more powerful if they could control the whole lifecycle through Entrypoint? Another possible use is to reset the state when an application has finished, allowing the same JVM to be reused - a bit like Function-as-a-Service providers or build system daemons do. (I suspect it may be possible to enhance the entrypoint concept to control the shutdown hooks and to catch things like System.exit but that is beyond the scope of this blog.) For example, here is a theoretical application framework entrypoint:

  // FrameworkApplication.java - an Open Source library
  public interface FrameworkApplication extends Entrypoint {
    public default void main(Runtime runtime) {
      // do framework things
      start();
      // do framework things
    }
    public abstract void start();
  }

Applications just implement this interface, and they can run it by specifying their own class name on the command line, yet it is a full-featured framework application!

Summary

I argue that the proposal above is more powerful and more useful to experienced developers than the official proposal, while still meeting the core goal of step-by-step on-ramp learning. Do let me know what you think. (Reddit discussion).

Saturday 25 September 2021

Big problems at the timezone database

The last time I wrote about the timezone database on this blog, the database was under threat from a lawsuit. Fortunately that lawsuit went away relatively quickly as the company involved got the message that their action was a big mistake. Unfortunately this time the mess is internal.

Paul Eggert is the project lead of the timezone database hosted at IANA, a position referred to as the TZ Coordinator. He is an expert in the field, having been involved in documenting timezone data for decades. Unfortunately, he is currently ignoring all objections to an action only he seems intent on making to solve an invented problem that only he sees as important.

The database is the world's principal source of timezone information. The data is included in everything from operating systems to smartphones to programming language development kits such as the JDK. While you may never have heard of it, the sheer pervasiveness of the data makes the potential impact of change or damage pretty huge.

The timezone database contains information about how clocks have varied in each region around the world. The mandate of the project is to record this information from 1970 onwards.

Of course, computers being what they are, a function that returns the timezone for a given date can be passed a pre-1970 date as well as a post-1970 one. For this, and reasons of completeness, the timezone database contains pre-1970 data as well as post-1970 data. If you go to your JDK or operating system and ask for the timezone offset of a pre-1970 date such as 1948-06-01 for the ID "Europe/Oslo" or "Europe/Berlin" you will get an answer, as this Joda-Time code shows:

  DateTimeZone oslo = DateTimeZone.forID("Europe/Oslo");
  System.out.println(oslo.getOffset(new DateTime(1948, 6, 1, 12, 0)));  //prints 3600000
  DateTimeZone berlin = DateTimeZone.forID("Europe/Berlin");
  System.out.println(berlin.getOffset(new DateTime(1948, 6, 1, 12, 0)));  //prints 7200000

The proposed change is to downgrade "Europe/Oslo" to be merely an alias for "Europe/Berlin". The rationale is that since the two regions have the same data post-1970 there should only be one ID. The issue with this is that querying "Europe/Oslo" for a pre-1970 date (as above) will now return the data for Berlin. ie. the well-researched pre-1970 data for Oslo will be replaced by that of Berlin.
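
Using the same Joda-Time query as above, the effect of the proposed change can be illustrated like this:

  DateTimeZone oslo = DateTimeZone.forID("Europe/Oslo");
  // before the change this prints 3600000 (Oslo's researched data)
  // after the change it would print 7200000 (Berlin's data)
  System.out.println(oslo.getOffset(new DateTime(1948, 6, 1, 12, 0)));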

The situation with Joda-Time is even worse. With Joda-Time, aliases (also known as Links) are actively resolved. Before the proposed change this test case passes, after the proposed change the test case fails:

  assertEquals("Europe/Oslo", DateTimeZone.forID("Europe/Oslo").getID());

In other words, it will be impossible for a Joda-Time user to hold the ID "Europe/Oslo" in memory. This could be pretty catastrophic for systems that rely on timezone management, particularly ones that end up storing that data in a database. To mitigate this to some degree, I've added the test case above and released a version containing it, but obviously users of Joda-Time who haven't upgraded to the latest version may still see problems if they update the tzdb version.

But what about backwards compatibility? Well it seems that the TZ Coordinator just doesn't see this as being important. To him, it doesn't matter that users may be relying on this data.

(Technically, the data has moved, not been deleted. But the file containing the moved data is never normally used by downstream systems, thus to all intents and purposes it has been deleted.)

The question you might ask is why is Berlin favoured over Oslo? Why can Berlin keep its status and full history, but Oslo gets effectively deleted? The answer is that Berlin has the greater population - would you have guessed that if you didn't read it here?

From my perspective, I cannot see how it is not incredibly unfair that the timezone ID that represents the country of Norway, "Europe/Oslo", is treated as less important than the ID that represents the country of Germany, "Europe/Berlin". Yes, there is a technical rationale around 1970 and population, but that is really pretty arcane.

In fact it goes further than this. The TZ Coordinator does not really believe that there should be an ID for Oslo/Norway at all. (The official project rules say that there should only be one ID for locations where timezone data is the same post-1970. Country borders simply don't matter.)

Some of the 30 IDs proposed for downgrade are "Europe/Oslo", "Europe/Stockholm", "Europe/Copenhagen", "Europe/Amsterdam", "Europe/Luxembourg", "Europe/Monaco", "Atlantic/Reykjavik" and "Indian/Mahe". If you regularly use any of these IDs you may be affected by this change. Iceland is a classic case - "Atlantic/Reykjavik" is to be downgraded in favour of "Africa/Abidjan". Bonus points if you know which country Abidjan is in!

What is driving the change? Well this is where it gets really weird. The TZ Coordinator's argument is that there is a fairness/equity problem if Oslo is allowed to keep its pre-1970 history but other locations (typically in Africa) are not. I have two problems with this. Firstly, I consider it to also be unfair and inequitable that Berlin gets to have pre-1970 history and Oslo does not. Secondly, the correct approach to solving a fairness/equity problem is to level up, not level down. ie. most leaders would want to improve the weaker performers on the team, not force the best performers to be as bad as the worst.

So, we have a change with terrible downstream effects, including a potential fork of a major global data set and broken end-user applications, to make a problem that no one was complaining about a whole lot worse.

I've spent months trying to stop this happening, but appear to have lost the battle. This is despite near unanimity on the mailing list requesting that the changes be paused. Tonight 9 of the 30 changes have been included in release 2021b. These are not the ones affecting Europe.

I still hope that a solution can be found that the TZ Coordinator is happy with that avoids an impact on countries like Norway, Sweden, Denmark or the Netherlands. In the medium term, I hope that funding can be found for the CLDR project to take on the timezone database (as CLDR has a much better record at managing data like this).

Stay tuned as I try and work out how best to resolve this completely unnecessary drama.

Monday 4 November 2019

Java switch - 4 wrongs don't make a right

The switch statement in Java is being changed. But is it an upgrade or a mess?

Classic switch

The classic switch statement in Java isn't great. Unlike many other parts of Java, it wasn't properly rethought when pulling features across from C all those years ago.

The key flaw is "fall-through-by-default". This means that if you forget to put a break clause in each case, processing will continue on to the next case clause.

Another flaw is that variables are scoped to the entire switch, thus you cannot reuse a variable name in two different case clauses. In addition, the default clause is not required, which leaves readers of the code unclear as to whether a clause was forgotten or not.

And of course there is also the key limitation - that the type to be switched on can only be an integer, enum or string.

 String instruction;
 switch (trafficLight) {
   case RED:
     instruction = "Stop";
   case YELLOW:
     instruction = "Prepare";
     break;
   case GREEN:
     instruction = "Go";
     break;
 }
 System.out.println(instruction);

The code above does not compile because there is no default clause, leaving instruction undefined. But even if it did compile, it would never print "Stop" due to the missing break. In my own coding, I prefer to always put a switch at the end of a method, with each clause containing a return to reduce the risks of switch.
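
For example, a sketch of that style (the enum and method name are illustrative):

 private String instructionFor(TrafficLight trafficLight) {
   switch (trafficLight) {
     case RED:
       return "Stop";
     case YELLOW:
       return "Prepare";
     case GREEN:
       return "Go";
     default:
       throw new IllegalStateException("Unknown: " + trafficLight);
   }
 }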

Upgraded switch

As part of Project Amber, switch is being upgraded. But sadly, I'm unconvinced as to the merits of the new design. To be clear, there are some good aspects, but overall I think the solution is overly complex and with some unpleasant syntax choices.

The key aim is to add an expression form, where you can assign the result of the switch to a variable. This is rather like the ternary operator (eg. x != null ? x : ""), which is the expression equivalent of an if statement. An expression form would reduce problems like the undefined variable above, because it makes it more obvious that each branch must produce a value.

The current plan is to add not one, but three new forms of switch. Yes, three.

Explaining this in a blog post is, unsurprisingly, going to take a while...

  • Type 1: Statement with classic syntax. As today. With fall-through-by-default. Not exhaustive.
  • Type 2: Expression with classic syntax. NEW! With fall-through-by-default. Must be exhaustive.
  • Type 3: Statement with new syntax. NEW! No fall-through. Not exhaustive.
  • Type 4: Expression with new syntax. NEW! No fall-through. Must be exhaustive.

The headline example (type 4) is of course quite nice:

 // type 4
 var instruction = switch (trafficLight) {
   case RED -> "Stop";
   case YELLOW -> "Prepare";
   case GREEN -> "Go";
 };
 System.out.println(instruction);

As can be seen, the new syntax of types 3 and 4 uses an arrow instead of a colon. And there is no need to use break if the code consists of a single expression. There is also no need for a default clause when using an enum, because the compiler can insert it for you provided you've included all the known enum values. So, if you missed out GREEN, you would get a compile error.

The devil of course is in the detail.

Firstly, a clear positive. Instead of using fall-through to share logic between labels, multiple labels can be comma-separated:

 // type 4
 var instruction = switch (trafficLight) {
   case RED, YELLOW -> "Stop";
   case GREEN -> "Go";
 };
 System.out.println(instruction);

Straightforward and obvious. And avoids many of the simple fall-through use cases.

What if the code to execute is more complex than an expression?

 // type 4
 var instruction = switch (trafficLight) {
   case RED -> "Stop";
   case YELLOW -> {
     revYourEngine();
     yield "Prepare";
   }
   case GREEN -> "Go";
 };
 System.out.println(instruction);

yield? Shrug. For a long time it was going to be break {expression}, but this clashes with labelled break (a syntax feature that is rarely used).

So what about type 2?

 // type 2
 var instruction = switch (trafficLight) {
   case RED: yield "Stop";
   case YELLOW:
     System.out.println("Prepare");
   case GREEN: yield "Go";
 };
 System.out.println(instruction);

Oops! I forgot the yield. So, an input of YELLOW will output "Prepare" and then fall through to yield "Go".

So, why is it proposed to add a new form of switch that repeats the fall-through-by-default error from 20 years ago? The answer is orthogonality - a 2x2 grid with expression vs statement and fall-through-by-default vs no fall-through.

A key question is whether being orthogonal justifies adding an almost totally useless form of switch (type 2) to the language.

So, type 3 is fine then?

Well, no. Because of the insistence on orthogonality, and thus an insistence on copying the historic rules relating to the type 1 statement switch, there is no requirement to list all the case clauses:

 // type 3
 switch (trafficLight) {
   case RED -> doStop();
   case GREEN -> doGo();
 }

So, what happens for YELLOW? The answer is nothing, but as a reader I am left wondering if the code is correct or incomplete. It would be much better if the above was a compile error, with developers forced to write a default clause:

 // type 3
 switch (trafficLight) {
   case RED -> doStop();
   case GREEN -> doGo();
   default -> {}
 }

The official argument is that since type 1 statement switch (the current one) does not force exhaustiveness, neither can the new type 3 statement switch. My view is that keeping a bad design from 20 years ago is a worse sin.

What else? Well, one thing to bear in mind is that expressions cannot complete early, thus there is no way to return directly from within a switch expression (type 2 or 4). Nor is there a way to continue/break a loop. Trust me when I say there is an endless supply of Java exam questions in the rules that actually apply.
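
For example, uncommenting the return in this sketch produces a compile error, as return cannot escape from within a switch expression:

 String lookup(TrafficLight trafficLight) {
   return switch (trafficLight) {
     case RED -> "Stop";
     case YELLOW -> {
       // return "Prepare";  // compile error - cannot return from within a switch expression
       yield "Prepare";
     }
     default -> "Go";
   };
 }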

Summarizing the types

Type 1: Classic statement switch

  • Statement
  • Fall-through-by-default
  • return allowed, also continue/break a loop
  • Single scope for variables
  • Logic for each case is a sequence of statements potentially ending with break
  • Not exhaustive - default clause is not required
  • yield is not allowed

Type 2: Classic syntax expression switch

  • Expression
  • Fall-through-by-default
  • return not allowed, cannot continue/break a loop
  • Single scope for variables
  • Logic for each case can be a yield expression, or a sequence of statements potentially ending with yield
  • Must be exhaustive - default clause is required (unless all enum values are covered)
  • Must use yield to return values

Type 3: Arrow-form statement switch

  • Statement
  • Fall-through is not permitted
  • return allowed, also continue/break a loop
  • No variable scope problems, logic for each case must be a statement or a block
  • Not exhaustive - default clause is not required
  • yield is not allowed

Type 4: Arrow-form expression switch

  • Expression
  • Fall-through is not permitted
  • return not allowed, cannot continue/break a loop
  • No variable scope problems, logic for each case must be an expression or a block ending with yield
  • Must be exhaustive - default clause is required (unless all enum values are covered)
  • Must use yield to return values, but only from blocks (it is implied when not a block)

Are you confused yet?

OK, I'm sure I didn't explain everything perfectly, and I may well have made an error somewhere along the way. But the reality is that it is complex, and there are lots of rules hidden in plain sight. Yes, it is orthogonal. But I really don't think that helps in comprehending the feature.

What would I do?

Type 4 switch expressions are fine (although I have real issues with the extension of the arrow syntax from lambdas). My problem is with type 2 and 3. In reality, those two types of switch will be very rare, and thus most developers will never see them. Given this, I believe it would be better to not include them at all. Once this is accepted, there is no point in treating the expression form as a switch, because it won't actually have many connections to the old statement form.

I would drop type 2 and 3, and allow type 4 switch expressions to become what is known as statement expressions. (Another example of a statement expression is a method call, which can be used as an expression or as a statement on a line of its own, ignoring any return value.)

 // Stephen's expression switch
 var instruction = match (trafficLight) {
   case RED: "Stop";
   case YELLOW: "Prepare";
   case GREEN: "Go";
 };
 // Stephen's expression switch used as a statement (technically a statement expression)
 match (instruction) {
   case "Stop": doStop();
   case "Go": doGo();
   default: ;
 }

My approach uses a new keyword match, as I believe extending switch is the wrong baseline to use. Making it a statement expression means that there is only one set of rules - it is always an expression, it is just that you can use it as though it were a statement. What you can't do with my approach is use return in the statement version, because it isn't actually a statement (you can't use return from any expression in Java today, so this would be no different).

Summary

If you ignore the complexity, and just use type 4 switch expressions, the new feature is quite reasonable.

However, in order to add the one form of switch Java needed, we've also got two other duds - type 2 and 3. In my view, the feature needs to go back to the drawing board, but sadly I suspect it is now too late for that.

Friday 22 March 2019

User-defined literals in Java?

Java has a number of literals for creating values, but wouldn't it be nice if we had more?

Current literals

These are some of the literals we can write in Java today:

  • integer - 123, 1234L, 0xB8E817, 077, 0b1011_1010
  • floating point - 45.6f, 56.7d, 7.656e6
  • string - "Hello world"
  • char - 'a'
  • boolean - true, false
  • null - null

Project Amber is also considering adding multi-line and/or raw string literals.

But there are many other data types that would benefit from literals, such as dates, regex and URIs.

User-defined literals

In my ideal future, I'd like to see Java extended to support some form of user-defined literals. This would allow the author of a class to provide a mechanism to convert a sequence of characters into an instance of that class. It may be clearer to see some examples using one possible syntax (using backticks):

 Currency currency = `GBP`;
 LocalDate date = `2019-03-29`;
 Pattern pattern = `strata\.\w+`;
 URI uri = `https://blog.joda.org/`;

A number of semantic features would be required:

  • Type inference
  • Raw processing
  • Validated at compile-time

Type inference

Type inference is of course a key aspect of literals. It would have to work in a similar way to the existing literals, but with a tweak to handle the new var keyword. ie. these two would be equivalent:

 LocalDate date = `2019-03-29`;
 var date = LocalDate`2019-03-29`;

The type inference would also work with methods (compile error if ambiguous):

 boolean inferior = isShortMonth(`2019-04-12`);

 public boolean isShortMonth(LocalDate date) { return date.lengthOfMonth() < 31; }

Raw processing

Processing of the literal should not be limited by Java's escape mechanisms. User-defined literals need access to the raw string. Note that this is especially useful for regex, but would also be useful for files on Windows:

 // user-defined literals
 var pattern = Pattern`strata\.\w+`;
 // today
 var pattern = Pattern.compile("strata\\.\\w+");

Today, the `\` needs to be escaped, making the regex difficult to read.

Clearly, the problem with parsing raw literals is that there is no mechanism to escape. But the use cases for user-defined literals tend to have constrained formats, eg. a date doesn't contain random characters. So, although there might be edge cases where this would be a problem, they would very much be edge cases.

Validated at compile-time

A key feature of literals is that they are validated at compile-time. You can't use an integer literal to create an int if the value is larger than the maximum allowed integer (2^31 - 1).
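
For example:

 int ok = 2147483647;   // Integer.MAX_VALUE, compiles
 int bad = 2147483648;  // compile error: integer number too large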

User-defined literals would need to be parsed and validated at compile-time too. Thus this code would not compile:

 LocalDate date = `2019-02-31`;

Most types which would benefit from literals only accept specific input formats, so being able to check this at compile time would be beneficial.

How would it be implemented?

I'm pretty confident that there are various ways it could be done. I'm not going to pick an approach, as ultimately those that control the JVM and language are better placed to decide. Clearly though, there is going to need to be some form of factory method on the user class that performs the parse, with that method invoked by the compiler. And ideally, the results of the parse would be stored in the constant pool rather than re-parsed at runtime.
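
As a purely hypothetical sketch (the annotation and method shape are my invention, not part of any actual proposal), it might look something like this:

 public class LocalDate {
   // hypothetical: invoked by the compiler when it encounters a LocalDate literal;
   // any parse failure would be reported as a compile error
   @LiteralFactory
   public static LocalDate parseLiteral(String rawText) {
     return LocalDate.parse(rawText);
   }
 }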

What I would say is that user-defined literals would almost be a requirement for making value types usable, so something like this may be on the way anyway.

Summary

I like literals. And I would really like to be able to define my own!

Any thoughts?

Wednesday 9 January 2019

Commercial support for Joda and ThreeTen projects

The Java ecosystem is made up of many individuals, organisations and companies producing many different libraries. Some of the largest projects have long had support options where users of the project, typically corporates, can pay for an enhanced warranty, guaranteed approach to bug fixes and more.

Small projects, run by a single individual or a team, have been unable to offer this service, even if they wanted to. In addition, there is a more subtle problem. The amount a small project could charge is too low for a corporate to pay.

This sounds odd, but was brought home to me by a thread on Twitter.

As the thread indicates, it is basically impossible for a corporate to gift money to a small project, and it is not viable for small projects to meaningfully offer a support contract.

The problem is that not paying the maintainers has negative consequences. Take the recent case where a developer handed his open source project on to another person, who then used it to steal bitcoins.

Pay the maintainers

I believe there is now a solution to the problem. Tidelift.

Tidelift offers companies a monthly subscription to support their open source usage. And they pay some of that income directly to the maintainers of the projects that the company uses.

Maintainers are expected to continue maintaining the project, follow a responsible disclosure process for security issues and check their licensing. Tidelift does not get to control the project roadmap, and maintainers do not have to provide an active helpdesk or consulting. See here for more details.

As such, I'm now offering commercial support for Joda-Time, Joda-Money, Joda-Beans, Joda-Convert, Joda-Collect, ThreeTen-Extra, ThreeTen-backport via the Tidelift subscription.

This is an extra option for those that want to support the maintainers of open source but haven't been able to find a way to do so until now. The Joda and ThreeTen projects will always be free and available under a permissive licence, so there is no need to worry as a result of this.

Comments welcome.

Wednesday 31 October 2018

Should you adopt Java 12 or stick on Java 11?

Should you adopt Java 12 or stick on Java 11 for the next 3 years? Seems like an innocuous question, but it is one of the most important decisions out there for those running on the JVM. I'll try to cover the key aspects of the decision, with the assumption that you care about running with the latest set of security patches in production.

TL;DR: it is vital to fully understand and accept the risks before adopting Java 12.

The Java release train

There is now a new release of Java every six months, so Java 12 is less than five months away despite Java 11 having just been released. As part of the process of moving to more frequent releases, certain releases are designated to be LTS (long term support) and as such will have security patches available for four years or more. This makes them "major" releases, not because they have a bigger feature set but because they have multi-year support.

It is expected that Java 11 patches (11.0.1, 11.0.2, 11.0.3, etc.) will be smaller and simpler than Java 8 updates (8u20, 8u40, 8u60, etc.) - Java 11 updates will be more focused on security patches, without the internal enhancements of Java 8 updates. Instead, Oracle want us to think of Java 12, 13, 14 etc. as small upgrades, similar to an imaginary Java 11u20, 11u40 etc. To be blunt, I find this nonsensical.

Senior Oracle employees have repeatedly argued that updates such as 8u20 and 8u40 often broke code. This was not my experience. In fact my experience was that update releases primarily contained bug fixes. The only break I can remember was the addition of --allow-script-in-comments to Javadoc, which isn't a core part of Java. As a result, I have never feared picking up the latest update release - and this has been a core benefit of the Java platform.

Drilling down into why update releases tend to cause no problems, let's examine the differences between release types:

                          Old model                                   New model
  Upgrade                 Java major releases  Java update releases   Java release train  Java patches
  Frequency               Every 3 years or so  Every 6 months         Every 6 months      Every 3 months
  Versions                6 -> 7 -> 8          8 -> 8u20 -> 8u40      11 -> 12 -> 13      11 -> 11.0.1 -> 11.0.2
  Language changes        Yes                  -                      Yes                 -
  JVM changes             Yes                  -                      Yes                 -
  Major enhancements      Yes                  -                      Yes                 -
  Added classes/methods   Yes                  -                      Yes                 -
  Removed classes/methods -                    -                      Yes                 -
  New deprecations        Yes                  -                      Yes                 -
  Internal enhancements   Yes                  Yes                    Yes                 -
  JDK tool changes        Yes                  Yes                    Yes                 -
  Bug fixes               Yes                  Yes                    Yes                 Yes
  Security patches        Yes                  Yes                    Yes                 Yes

  ("Yes" marks the kinds of change each release type can contain.)

Given the table above, I find it amazing that anyone would claim moving from 11 to 12 to 13 is anything like moving from 8u20 to 8u40. But that is the official Oracle viewpoint:

Going from Java 9->10->11 is closer to going from 8->8u20->8u40 than from 7->8->9.
Oracle FAQ

As the table clearly shows, each version in the Java release train can contain any change traditionally associated with a full major version. These include language changes and JVM changes, both of which have major impacts on IDEs, bytecode libraries and frameworks. In addition, there will not only be additional APIs, but also API removals (something that did not happen prior to 8).

Oracle's claim is that because each release is only 6 months after the previous one, there won't be as much "stuff" in it, thus it won't be as hard to upgrade. While true, it is also irrelevant. What matters is whether the upgrade has the potential to damage your code stack or not. And clearly, going from 11 -> 12 -> 13 has much greater potential for damage than 8 -> 8u20 -> 8u40 ever did.

The key differences compared to updates like 8u20 -> 8u40 are the changes to the bytecode version and the changes to the specification. Changes to the bytecode version tend to be particularly disruptive, with most frameworks making heavy use of libraries like ASM or ByteBuddy that are intimately linked to each bytecode version. And moving from 8u20 -> 8u40 still had the same Java SE specification, with all the same classes and methods, something that cannot be relied on moving from 12 to 13. I simply do not accept Oracle's argument that the "amount of stuff" in a release is more significant than the "type of stuff".

Note however that another one of Oracle's claims really does matter. They point out that if you stick with Java 11 and plan to move to the next LTS version when it is released (ie. Java 17) that you might find your code doesn't compile. Remember that the Java development rules now state that an API method can be deprecated in one version and removed in the next one - rules that do not take LTS releases into account. So, a method could be deprecated in 13 and removed in 15. Someone upgrading from 11 to 17 would simply find a deleted API, having never seen the deprecation. Let's not panic too much about removal though - the only APIs likely to be removed are specialist ones, not those in widespread use by application code.
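
This is driven by the enhanced deprecation metadata added in Java 9, for example:

  // illustrative - deprecated in 13, removable in a later release
  @Deprecated(since = "13", forRemoval = true)
  public void oldMethod() {
  }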

Considerations before adopting the release train

In this section, I try to outline some of the considerations/risks that must be considered before adopting the release train.

Locked in to the train
If you adopt Java 12 and use a new language feature or new API, then you are effectively locking your project in to the release train. You have to adopt Java 13, 14, 15, 16 and 17. And you have to adopt each new release within one month of the next release coming out.

Remember that with the new release train, each release has a lifetime of six months, and is obsolete just seven months after release. That is because there will be only six months of security patches for each release, the first patch 1 month after release and the second 4 months after release. After 7 months, the next set of security patches comes out but the older release will not get them.

Do your processes allow for a Java upgrade, any necessary bug fixing, testing and release within that narrow 1 month time window? Or are you willing to run in production on a Java version below the security baseline?

Upgrade blockers
There are many possible things that can block an upgrade of Java. Let's make a list of some of the common ones.

Insufficient development resources: Your team may get busy, or be downsized, or the project may go to production and the team disbanded. Can you guarantee that development time will be available to do the upgrade from Java 15 to 16 in two years' time?

Build tools and IDEs: Will your IDE support each new version on the day of release? Will Maven? Gradle? Do you have a backup plan if they don't? Remember, you only have 1 month to complete the upgrade, test it and get it released to production. Under this section other tools include Checkstyle, JaCoCo, PMD, SpotBugs and many more.

Dependencies: Will your dependencies all be ready for each new version? Quickly enough for you to meet the 1 month deadline? Remember, it is not just your direct dependencies, but everything in your stack. Bytecode manipulation libraries are particularly affected for example, such as ByteBuddy and ASM.

Frameworks: Another kind of dependency, but a large and important one. Will Spring produce a new version every six months, within the narrow one month time window? Will Jakarta EE (formerly Java EE)? What happens if they don't?

Now the traditional approach to any of these blockers was to wait. With versions of Java up to 8, a common approach was to wait 6 to 12 months before starting the upgrade to give tools, libraries and frameworks the chance to fix any bugs. But of course the waiting approach is incompatible with the release train.

Cloud / Hosting / Deployment
Do you have control of where and how your code runs in production? For example, if you run your code in AWS Lambda you do not have control. AWS Lambda has not adopted Java 9 or 10, and it still does not have Java 11, over a month after release. Unless AWS gives a public guarantee to support each new Java version, then you simply can't adopt Java 12. (My working assumption is that AWS Lambda will only support major LTS versions, supported by the Amazon Corretto JDK announcement.)

What about hosting of your CI system? Will Jenkins, Travis, Circle, Shippable, GitLab be updated quickly? What do you do if they are not?

Predicting the future
Perhaps you have read through the list above and are happy your code and processes today can cope. Great! But it is critical to understand that you are also restricting your ability to change in the future.

For example, maybe your code doesn't run on AWS Lambda today. But are you willing to say you can't do so for the next three years?

Planning for the release train

If you are considering adopting the release train, I recommend preparing a list of all the things you depend on now, or might depend on in the next 3 years. You need to be confident that everything on that list will work correctly and be upgraded along with the release train, or have a plan if that dependency is not upgraded. The list for my day job is something like this:

  • Amazon AWS
  • Eclipse
  • IntelliJ
  • Travis CI
  • Shippable CI
  • Maven
  • Maven plugins (compile, jar, source, javadoc, etc)
  • Checkstyle, & associated IDE plugins and maven plugin
  • JaCoCo, & associated IDE plugins and maven plugin
  • PMD, & associated maven plugin
  • SpotBugs, & associated maven plugin
  • OSGi bundle metadata tool
  • Bytecode tools (ByteBuddy / ASM etc)
  • Over 100 jar file dependencies

And I've probably forgotten something.

Don't get me wrong. I think it is perfectly possible to make a choice to say that you are willing to take the risk. That the benefits of new language features, and probable enhanced performance, make the effort worthwhile. But I strongly believe it is more risky than remaining on Java 11.

A middle ground?

One possible middle ground is to develop your application for Java 12, but run it in production on Java 13, 14, 15 etc. as soon as they come out. Sadly, this approach is less viable than it should be.

The removal of APIs and changes to the bytecode version add uncertainty to the stack. Even if your code doesn't use one of the removed APIs, one of your libraries might. Or a bytecode manipulation library might need upgrading, with knock-on effects. So while the middle ground is a possible fallback if you get stuck, it is far from a no-risk solution.

Some additional links

Spring framework has expressed its policy wrt Java 12 in a video. The key sections are:

Java 8 and 11 as the LTS branches officially supported from our end. Best efforts support for the releases in between. ... if you intend to upgrade to 12 ... we are very willing to work with you ... but they are not officially production supported. ... The long term support releases are what we are primarily focussed on. Java 12 and higher will be best effort from our side.

As an example of a typical software vendor, Liferay states:

Liferay has decided we will not certify every single major release of the JDK. We will instead choose to follow Oracle's lead and certify only those marked for LTS.
Liferay blog

Oracle's official "misconceptions" slide about the new release model.

Summary

I'm sure some development teams will adopt the Java release train. My hope is that they do so with their eyes wide open. I know we won't be adopting the release train at my day job any time soon, a key blocker being our use of AWS Lambda, but I'd be concerned about all the other points too.

Feel free to leave a comment, especially if you think I've missed any points that should be on the considerations list.