Saturday, 25 September 2021

Big problems at the timezone database

The last time I wrote about the timezone database on this blog, the database was under threat from a lawsuit. Fortunately that lawsuit went away relatively quickly as the company involved got the message that their action was a big mistake. Unfortunately this time the mess is internal.

Paul Eggert is the project lead of the timezone database hosted at IANA, a position referred to as the TZ Coordinator. He is an expert in the field, having been involved in documenting timezone data for decades. Unfortunately, he is currently ignoring all objections to an action only he seems intent on making to solve an invented problem that only he sees as important.

The database is the world's principle source of timezone information. The data is included in everything from operating systems to smartphones to programming language development kits such as the JDK. While you may never have heard of it, the sheer pervasiveness of the data makes the potential impact of change or damage pretty huge.

The timezone database contains information about how clocks have varies in each region around the world. The mandate of the project is to record this information from 1970 onwards.

Of course, computers being what they are, a function that returns the timezone for a given date can be passed in a pre-1970 date as well as a post-1970 one. For this, and reasons of completeness, the timezone database contains pre-1970 data as well as post-1970 data. If you go to your JDK or operating system and ask for the timezone offset for 1920-01-01 for the ID "Europe/Oslo" or "Europe/Berlin" you will get an answer:

  DateTimeZone oslo = DateTimeZone.forID("Europe/Oslo");
  System.out.println(oslo.getOffset(new DateTime(1948, 6, 1, 12, 0)));  //prints 3600000
  DateTimeZone berlin = DateTimeZone.forID("Europe/Berlin");
  System.out.println(berlin.getOffset(new DateTime(1948, 6, 1, 12, 0)));  //prints 7200000

The proposed change is to downgrade "Europe/Oslo" to be merely an alias for "Europe/Berlin". The rationale is that since the two regions have the same data post-1970 there should only be one ID. The issue with this is that querying "Europe/Oslo" in pre-1970 (as above) will now return the data from Berlin. ie. the well researched pre-1970 data for Oslo will be replaced by that of Berlin.

The situation with Joda-Time is even worse. With Joda-Time, aliases (also known as Links) are actively resolved. Before the proposed change this test case passes, after the proposed change the test case fails:

  assertEquals("Europe/Oslo", DateTimeZone.forID("Europe/Oslo").getID());

In other words, it will be impossible for a Joda-Time user to hold the ID "Europe/Oslo" in memory. This could be pretty catastrophic for systems that rely on timezone management, particularly ones that end up storing that data in a database. To mitigate this to some degree, I've added a test case and released a version that has such a test case in it, but obviously users of Joda-Time that haven't upgraded to the latest version may still see problems if they update the tzdb version.

But what about backwards compatibility? Well it seems that the TZ Coordinator just doesn't see this as being important. To him, it doesn't matter that users may be relying on this data.

(Technically, the data has moved, not been deleted. But the file containing the moved data is never normally used by downstream systems, thus to all intents and purposes it has been deleted.)

The question you might ask is why is Berlin favoured over Oslo? Why can Berlin keep its status and full history, but Oslo gets effectively deleted? The answer is that Berlin has the greater population - would you have guessed that if you didn't read it here?

From my perspective, I cannot see how it is not incredibly unfair that the timezone ID that represents the country of Norway, "Europe/Oslo", is treated as less important than the ID that represents the country of Germany, "Europe/Berlin". Yes, there is a technical rationale around 1970 and population, but that is really pretty arcane.

In fact it goes further than this. The TZ Coordinator does not really believe that there should be an ID for Oslo/Norway at all. (The official project rules say that there should only be one ID for locations where timezone data is the same post-1970. Country borders simply dont matter.)

Some of the 30 IDs proposed for downgrade are "Europe/Oslo", "Europe/Stockholm", "Europe/Copenhagen", "Europe/Amsterdam", "Europe/Luxembourg", "Europe/Monaco", "Atlantic/Reykjavik" and "Indian/Mahe". If you regularly use any of these IDs you may be affected by this change. Iceland is a classic case - "Atlantic/Reykjavik" is to be downgraded in favour of "Africa/Abidjan". Bonus points if you know which country Abidjan is in!

What is driving the change? Well this is where it gets really weird. The TZ Coordinator's argument is that there is a fairness/equity problem if Oslo is allowed to keep its pre-1970 history but other locations (typically in Africa) are not. I have two problems with this. Firstly, I consider it to also be unfair and inequitable that Berlin gets to have pre-1970 history and Oslo does not. Secondly, the correct approach to solving a fairness/equity problem is to level up, not level down. ie. Most leaders would want to improve the worse performers on the team, not force the best performers to be as bad as the worst.

So, we have a change with terrible downstream effects, including a potential fork of a major global data set and broken end-user applications, to make a problem that no one was complaining about a whole lot worse.

I've spent months trying to stop this happening, but appear to have lost the battle. This is despite near unaminity on the mailing list requesting the changes to be paused. Tonight 9 of the 30 changes have been included in release 2021b. These are not the ones affecting Europe.

I still hope that a solution can be found that the TZ Coordinator is happy with that avoids an impact on countries like Norway, Sweden, Denmark or the Netherlands. In the medium term, I hope that funding can be found for the CLDR project to take on the timezone database (as CLDR has a much better record at managing data like this).

Stay tuned as I try and work out how best to resolve this completely unecessary drama.

Monday, 4 November 2019

Java switch - 4 wrongs don't make a right

The switch statement in Java is being changed. But is it an upgrade or a mess?

Classic switch

The classic switch statement in Java isn't great. Unlike many other parts of Java, it wasn't properly rethought when pulling features across from C all those years ago.

The key flaw is "fall-through-by-default". This means that if you forget to put a break clause in each case, processing will continue on to the next case clause.

Another flaw is that variables are scoped to the entire switch, thus you cannot reuse a variable name in two different case clauses. In addition, default clause is not required, which leaves readers of the code unclear as to whether a clause was forgotten or not.

And of course there is also the key limitation - that the type to be switched on can only be an integer, enum or string.

 String instruction;
 switch (trafficLight) {
   case RED:
     instruction = "Stop";
   case YELLOW:
     instruction = "Prepare";
     break;
   case GREEN:
     instruction = "Go";
     break;
 }
 System.out.println(instruction);

The code above does not compile because there is no default clause, leaving instruction undefined. But even if it did compile, it would never print "Stop" due to the missing break. In my own coding, I prefer to always put a switch at the end of a method, with each clause containing a return to reduce the risks of switch.

Upgraded switch

As part of Project Amber, switch is being upgraded. But sadly, I'm unconvinced as to the merits of the new design. To be clear, there are some good aspects, but overall I think the solution is overly complex and with some unpleasant syntax choices.

The key aim is to add an expression form, where you can assign the result of the switch to a variable. This is rather like the ternary operator (eg. x != null ? x : ""), which is the expression equivalent of an if statement. An expression form would reduce problems like the undefined variable above, because it makes it more obvious that each branch must result in a variable.

The current plan is to add not one, but three new forms of switch. Yes, three.

Explaining this in a blog post is, unsurprisingly, going to take a while...

  • Type 1: Statement with classic syntax. As today. With fall-through-by-default. Not exhaustive.
  • Type 2: Expression with classic syntax. NEW! With fall-through-by-default. Must be exhaustive.
  • Type 3: Statement with new syntax. NEW! No fall-through. Not exhaustive.
  • Type 4: Expression with new syntax. NEW! No fall-through. Must be exhaustive.

The headline example (type 4) is of course quite nice:

 // type 4
 var instruction = switch (trafficLight) {
   case RED -> "Stop";
   case YELLOW -> "Prepare";
   case GREEN -> "Go";
 };
 System.out.println(instruction);

As can be seen, the new syntax of type 3 and 4 uses an arrow instead of a colon. And there is no need to use break if the code consists of a single expression. There is also no need for a default clause when using an enum, because the compiler can insert it for you provided you've included all the known enum values. So, if you missed out GREEN, you would get a compile error.

The devil of course is in the detail.

Firstly, a clear positive. Instead of falling through by listing multiple labels, they can be comma-separated:

 // type 4
 var instruction = switch (trafficLight) {
   case RED, YELLOW -> "Stop";
   case GREEN -> "Go";
 };
 System.out.println(instruction);

Straightforward and obvious. And avoids many of the simple fall-through use cases.

What if the code to execute is more complex than an expression?

 // type 4
 var instruction = switch (trafficLight) {
   case RED -> "Stop";
   case YELLOW -> {
     revYourEngine();
     yield "Prepare";
   }
   case GREEN -> "Go";
 };
 System.out.println(instruction);

yield? Shrug. For a long time it was going to be break {expression}, but this clashes with labelled break (a syntax feature that is rarely used).

So what about type 2?

 // type 2
 var instruction = switch (trafficLight) {
   case RED: yield "Stop";
   case YELLOW:
     System.out.println("Prepare");
   case GREEN: yield "Go";
 };
 System.out.println(instruction);

Oops! I forgot the yield. So, an input of YELLOW will output "Prepare" and then fall-through to yield "Go".

So, why is it proposed to add a new form of switch that repeats the fall-through-by-default error from 20 years ago? The answer is orthogonality - a 2x2 grid with expression vs statement and fall-through-by-default vs no fall-through.

A key question is whether being orthogonal justifies adding a almost totally useless form of switch (type 2) to the language.

So, type 3 is fine them?

Well, no. Because of the insistence on orthogonality, and thus an insistence of copying the historic rules relating to type 1 statement switch, there is no requirement to list all the case clauses:

 // type 3
 switch (trafficLight) {
   case RED -> doStop();
   case GO -> doGo();
 }

So, what happens for YELLOW? The answer is nothing, but as a reader I am left wondering if the code is correct or incomplete. It would be much better if the above was a compile error, with developers forced to write a default clause:

 // type 3
 switch (trafficLight) {
   case RED -> doStop();
   case GO -> doGo();
   default -> {}
 }

The official argument is that since type 1 statement switch (the current one) does not force exhaustiveness, neither can the new type 3 statement switch. My view is that keeping a bad design from 20 years ago is a worse sin.

What else? Well, one thing to bear in mind is that expressions cannot complete early, thus there is no way to return directly from within a switch expression (type 2 or 4). Nor is there a way to continue/break a loop. Trust me when I say there is an endless supply of Java exam questions in the rules that actually apply.

Summarizing the types

Type 1: Classic statement switch

  • Statement
  • Fall-through-by-default
  • return allowed, also continue/break a loop
  • Single scope for variables
  • Logic for each case is a sequence of statements potentially ending with break
  • Not exhaustive - default clause is not required
  • yield is not allowed

Type 2: Classic syntax expression switch

  • Expression
  • Fall-through-by-default
  • return not allowed, cannot continue/break a loop
  • Single scope for variables
  • Logic for each case can be a yield expression, or a sequence of statements potentially ending with yield
  • Exhaustive - default clause is required
  • Must use yield to return values

Type 3: Arrow-form statement switch

  • Statement
  • Fall-through is not permitted
  • return allowed, also continue/break a loop
  • No variable scope problems, logic for each case must be a statement or a block
  • Not exhaustive - default clause is not required
  • yield is not allowed

Type 4: Arrow-form expression switch

  • Expression
  • Fall-through is not permitted
  • return not allowed, cannot continue/break a loop
  • No variable scope problems, logic for each case must be an expression or a block ending with yield
  • Exhaustive - default clause is required
  • Must use yield to return values, but only from blocks (it is implied when not a block)

Are you confused yet?

OK, I'm sure I didn't explain everything perfectly, and I may well have made an error somewhere along the way. But the reality is that it is complex, and there are lots of rules hidden in plain sight. Yes, it is orthogonal. But I really don't think that helps in comprehending the feature.

What would I do?

Type 4 switch expressions are fine (although I have real issues with the extension of the arrow syntax from lambdas). My problem is with type 2 and 3. In reality, those two types of switch will be very rare, and thus most developers will never see them. Given this, I believe it would be better to not include them at all. Once this is accepted, there is no point in treating the expression form as a switch, because it won't actually have many connections to the old statement form.

I would drop type 2 and 3, and allow type 4 switch expressions to become what is known as statement expressions. (Another example of a statement expression is a method call, which can be used as an expression or as a statement on a line of its own, ignoring any return value.)

 // Stephen's expression switch
 var instruction = match (trafficLight) {
   case RED: "Stop";
   case YELLOW: "Prepare";
   case GO: "Go";
 }
 // Stephen's expression switch used as a statement (technically a statement expression)
 match (instruction) {
   case "Stop": doStop();
   case "Go": doGo();
   default: ;
 }

My approach uses a new keyword match, as I believe extending switch is the wrong baseline to use. Making it a statement expression means that there is only one set of rules - it is always an expression, it is just that you can use it as though it were a statement. What you can't do with my approach is use return in the statement version, because it isn't actually a statement (you can't use return from any expression in Java today, so this would be no different).

Summary

If you ignore the complexity, and just use type 4 switch expressions, the new feature is quite reasonable.

However, in order to add the one form of switch Java needed, we've also got two other duds - type 2 and 3. In my view, the feature needs to go back to the drawing board, but sadly I suspect it is now too late for that.

Friday, 22 March 2019

User-defined literals in Java?

Java has a number of literals for creating values, but wouldn't it be nice if we had more?

Current literals

These are some of the literals we can write in Java today:

  • integer - 123, 12s, 1234L, 0xB8E817, 077, 0b1011_1010
  • floating point - 45.6f, 56.7d, 7.656e6
  • string - "Hello world"
  • char - 'a'
  • boolean - true, false
  • null - null

Project Amber is also considering adding multi-line and/or raw string literals.

But there are many other data types that would benefit from literals, such as dates, regex and URIs.

User-defined literals

In my ideal future, I'd like to see Java extended to support some form of user-defined literals. This would allow the author of a class to provide a mechanism to convert a sequence of characters into an instance of that class. It may be clearer to see some examples using one possible syntax (using backticks):

 Currency currency = `GBP`;
 LocalDate date = `2019-03-29`;
 Pattern pattern = `strata\.\w+`;
 URI uri = `https://blog.joda.org/`;

A number of semantic features would be required:

  • Type inference
  • Raw processing
  • Validated at compile-time
Type inference

Type inference is of course a key aspect of literals. It would have to work in a similar way to the existing literals, but with a tweak to handle the new var keyword. ie. these two would be equivalent:

 LocalDate date = `2019-03-29`;
 var date = LocalDate`2019-03-29`;

The type inference would also work with methods (compile error if ambiguous):

 boolean inferior = isShortMonth(`2019-04-12`);

 public boolean isShortMonth(LocalDate date) { return date.lengthOfMonth() < 31; }
Raw processing

Processing of the literal should not be limited by Java's escape mechanisms. User-defined literals need access to the raw string. Note that this is especially useful for regex, but would also be useful for files on Windows:

 // user-defined literals
 var pattern = Pattern`strata\.\w+`;
 // today
 var pattern = Pattern.compile("strata\\.\\w+");

Today, the `\` needs to be escaped, making the regex difficult to read.

Clearly, the problem with parsing raw literals is that there is no mechanism to escape. But the use cases for user-defined literals tend to have constrained formats, eg. a date doesn't contain random characters. So, although there might be edge cases where this would be a problem, they would vert much be edge cases.

Validated at Compile-time

A key feature of literals is that they are validated at compile-time. You can't use an integer literal to create an int if the value is larger than the maximum allowed integer (2^31).

User-defined literals also need to be parsed and validated at compile-time too. Thus this code would not compile:

 LocalDate date = `2019-02-31`;

Most types which would benefit from literals only accept specific input formats, so being able to check this at compile time would be beneficial.

How would it be implemented?

I'm pretty confident that there are various ways it could be done. I'm not going to pick an approach, as ultimately those that control the JVM and language are better placed to decide. Clearly though, there is going to need to be some form of factory method on the user class that performs the parse, with that method invoked by the compiler. And ideally, the results of the parse would be stored in the constant pool rather than re-parsed at runtime.

What I would say is that user-defined literals would almost be a requirement for making value types usable, so something like this may be on the way anyway.

Summary

I like literals. And I would really like to be able to define my own!

Any thoughts?

Wednesday, 9 January 2019

Commercial support for Joda and ThreeTen projects

The Java ecosystem is made up of many individuals, organisations and companies producing many different libraries. Some of the largest projects have long had support options where users of the project, typically corporates, can pay for an enhanced warranty, guaranteed approach to bug fixes and more.

Small projects, run by a single individual or a team, have been unable to offer this service, even if they wanted to. In addition, there is a more subtle problem. The amount a small project could charge is too low for a corporate to pay.

This sounds odd, but was brought home to me by this thread on twitter:

As the thread indicates, it is basically impossible for a corporate to gift money to a small project, and it is not viable for small projects to meaningfully offer a support contract.

The problem is that not paying the maintainers has negative consequences. Take the recent case where a developer handed his open source project on to another person, who then used it to steal bitcoins.

Pay the maintainers

I believe there is now a solution to the problem. Tidelift.

Tidelift offers companies a monthly subscription to support their open source usage. And they pay some of that income directly to the maintainers of the projects that the company uses.

Maintainers are expected to continue maintaining the project, follow a responsible disclosure process for security issues and check their licensing. Tidelift does not get to control the project roadmap, and maintainers do not have to provide an active helpdesk or consulting. See here for more details.

As such, I'm now offering commercial support for Joda-Time, Joda-Money, Joda-Beans, Joda-Convert, Joda-Collect, ThreeTen-Extra, ThreeTen-backport via the Tidelift subscription.

This is an extra option for those that want to support the maintainers of open source but haven't been able to find a way to do so until now. The Joda and ThreeTen projects will always be free and available under a permissive licence, so there is no need to worry as a result of this.

Comments welcome.

Wednesday, 31 October 2018

Should you adopt Java 12 or stick on Java 11?

Should you adopt Java 12 or stick on Java 11 for the next 3 years? Seems like an innocuous question, but it is one of the most important decisions out there for those running on the JVM. I'll try to cover the key aspects of the decision, with the assumption that you care about running with the latest set of security patches in production.

TL;DR, It is vital to fully understand and accept the risks before adopting Java 12.

The Java release train

There is now a new release of Java every six months, so Java 12 is less than five months away despite Java 11 having just been released. As part of the process of moving to more frequent releases, certain releases are designated to be LTS (long term support) and as such will have security patches available for four years or more. This makes them "major" releases, not because they have a bigger feature set but because they have multi year support.

It is expected that Java 11 patches (11.0.1, 11.0.2, 11.0.3, etc.) will be smaller and simpler than Java 8 updates (8u20, 8u40, 8u60, etc.) - Java 11 updates will be more focused on security patches, without the internal enhancements of Java 8 updates. Instead, Oracle want us to think of Java 12, 13, 14 etc. as small upgrades, similar to an imaginary Java 11u20, 11u40 etc. To be blunt, I find this nonsensical.

Senior Oracle employees have repeatedly argued that updates such as 8u20 and 8u40 often broke code. This was not my experience. In fact my experience was that update releases primarily contained bug fixes. The only break I can remember was the addition of --allow-script-in-comments to Javadoc, which isn't a core part of Java. As a result, I have never feared picking up the latest update release - and this has been a core benefit of the Java platform.

Drilling down into why update releases tend to cause no problems, lets examine the differences between release types:

Model Old model New model
Upgrade Java major releases Java update releases Java release train Java patches
Frequency Every 3 years or so Every 6 months Every 6 months Every 3 months
Versions 6 -> 7 -> 8 8 -> 8u20 -> 8u40 11 -> 12 -> 13 11 -> 11.0.1 -> 11.0.2
Language changes
JVM changes
Major enhancements
Added classes/methods
Removed classes/methods
New deprecations
Internal enhancements
JDK tool changes
Bug fixes
Security patches

Given the table above, I find it amazing that anyone would claim moving from 11 to 12 to 13 is anything like moving from 8u20 to 8u40. But that is the official Oracle viewpoint:

Going from Java 9->10->11 is closer to going from 8->8u20->8u40 than from 7->8->9.
Oracle FAQ

As the table clearly shows, each version in the Java release train can contain any change traditionally associated with a full major version. These include language changes and JVM changes, both of which have major impacts on IDEs, bytecode libraries and frameworks. In addition, there will not only be additional APIs, but also API removals (something that did not happen prior to 8).

Oracle's claim is that because each release is only 6 months after the previous one, there won't be as much "stuff" in it, thus it won't be as hard to upgrade. While true, it is also irrelevant. What matters is whether the upgrade has the potential to damage your code stack or not. And clearly, going from 11 -> 12 -> 13 has much greater potential for damage than 8 -> 8u20 -> 8u40 ever did.

The key difference compared to updates like 8u20 -> 8u40 are the changes to the bytecode version, and the changes to the specification. Changes to the bytecode version tend to be particularly disruptive, with most frameworks making heavy use of libraries like ASM or ByteBuddy that are intimately linked to each bytecode version. And moving from 8u20 -> 8u40 still had the same Java SE specification, with all the same classes and methods, something that cannot be relied on moving from 12 to 13. I simply do not accept Oracle's argument that the "amount of stuff" in a release is more significant than the "type of stuff".

Note however that another one of Oracle's claims really does matter. They point out that if you stick with Java 11 and plan to move to the next LTS version when it is released (ie. Java 17) that you might find your code doesn't compile. Remember that the Java development rules now state that an API method can be deprecated in one version and removed in the next one. Rules that do not take LTS releases into account. So, a method could be deprecated in 13 and removed in 15. Someone upgrading from 11 to 17 would simply find a deleted API having having never seen the deprecation. Lets not panic too much about removal though - the only APIs likely to be removed are specialist ones, not those in widespread use by application code.

Considerations before adopting the release train

In this section, I try to outline some of the considerations/risks that must be considered before adopting the release train.

Locked in to the train
If you adopt Java 12 and use a new language feature or new API, then you are effectively locking your project in to the release train. You have to adopt Java 13, 14, 15, 16 and 17. And you have to adopt each new release within one month of the next release coming out.

Remember that with the new release train, each release has a lifetime of six months, and is obsolete just seven months after release. That is because there will be only six months of security patches for each release, the first patch 1 month after release and the second 4 months after release. After 7 months, the next set of security patches come out but the older release will not get them.

Do your processes allow for a Java upgrade, any necessary bug fixing, testing and release within that narrow 1 month time window? Or are you willing to run in production on a Java version below the security baseline?

Upgrade blockers
There are many possible things that can block an upgrade of Java. Lets make a list of some of the common ones.

Insufficient development resources: Your team may get busy, or be downsized, or the project may go to production and the team disbanded. Can you guarantee that development time will be available to do the upgrade from Java 15 to 16 in two years time?

Build tools and IDEs: Will your IDE support each new version on the day of release? Will Maven? Gradle? Do you have a backup plan if they don't? Remember, you only have 1 month to complete the upgrade, test it and get it released to production. Under this section other tools include Checkstyle, JaCoCo, PMD, SpotBugs and many more.

Dependencies: Will your dependencies all be ready for each new version? Quickly enough for you to meet the 1 month deadline? Remember, it is not just your direct dependencies, but everything in your stack. Bytecode manipulation libraries are particularly affected for example, such as ByteBuddy and ASM.

Frameworks: Another kind of dependency, but a large and important one. Will Spring produce a new version every six months, within the narrow one month time window? Will Jakarta EE (formerly Java EE)? What happens if they don't?

Now the traditional approach to any of these blockers was to wait. With versions of Java up to 8, a common approach was to wait 6 to 12 months before starting the upgrade to give tools, libraries and frameworks the chance to fix any bugs. But of course the waiting approach is incompatible with the release train.

Cloud / Hosting / Deployment
Do you have control of where and how your code runs in production? For example, if you run your code in AWS Lambda you do not have control. AWS Lambda has not adopted Java 9 or 10, and they don't even have Java 11 even though it is over a month after release. Unless AWS give a public guarantee to support each new Java version, then you simply can't adopt Java 12. (My working assumption is that AWS Lambda will only support major LTS versions, supported by the Amazon Corretto JDK announcement.)

What about hosting of your CI system? Will Jenkins, Travis, Circle, Shippable, GitLab be updated quickly? What do you do if they are not?

Predicting the future
Perhaps you have read through the list above and are happy your code and processes today can cope. Great! But it is critical to understand that you are also restricting your ability to change in the future.

For example, maybe your code doesn't run on AWS Lambda today. But are you willing to say you can't do so for the next three years?

Planning for the release train

If you are considering adopting the release train, I recommend preparing a list of all the things you depend on now, or might depend on in the next 3 years. You need to be confident that everything on that list will work correctly and be upgraded along with the release train, or have a plan if that dependency is not upgraded. The list for my day job is something like this:

  • Amazon AWS
  • Eclipse
  • IntelliJ
  • Travis CI
  • Shippable CI
  • Maven
  • Maven plugins (compile, jar, source, javadoc, etc)
  • Checkstyle, & associated IDE plugins and maven plugin
  • JaCoCo, & associated IDE plugins and maven plugin
  • PMD, & associated maven plugin
  • SpotBugs, & associated maven plugin
  • OSGi bundle metadata tool
  • Bytecode tools (Byte buddy / ASM etc)
  • Over 100 jar file dependencies

And I've probably forgotten something.

Don't get me wrong. I think it is perfectly possible to make a choice to say that you are willing to take the risk. That the benefits of new language features, and probable enhanced performance, make the effort worthwhile. But I strongly believe it is more risky than remaining on Java 11.

A middle ground?

One possible middle ground is to develop your application for Java 12, but run it in production on Java 13, 14, 15 etc. as soon as they come out. Sadly, this approach is less viable than it should be.

The removal of APIs and changes to the bytecode version add uncertainty to the stack. Even if your code doesn't use one of the removed APIs, one of your libraries might. Or a bytecode manipulation library might need upgrading, with knock on effects. So while the middle ground is a possible fallback if you get stuck, it is far from a no-risk solution.

Some additional links

Spring framework has expressed its policy wrt Java 12 in a video. The key sections are:

Jaba 8 and 11 as the LTS branches officially supported from our end. Best efforts support for the releases inbetween. ... if you intend to upgrade to 12 ... we are very willing to work with you ... but they are not officially production supported. ... The long term support releases are what we are primarily focussed on. Java 12 and higher will be best effort from our side.

As an example of a typical software vendor, Liferay states:

Liferay has decided we will not certify every single major release of the JDK. We will instead choose to follow Oracle's lead and certify only those marked for LTS.
Liferay blog

Oracle's official "misconceptions" slide about the new release model.

Summary

I'm sure some development teams will adopt the Java release train. My hope is that they do so with their eyes wide open. I know we won't be adopting the release train at my day job any time soon, a key blocker being our use of AWS Lambda, but I'd be concerned about all the other points too.

Feel free to leave a comment, especially if you think I've missed any points that should be on the considerations list.

Wednesday, 26 September 2018

Oracle's Java 11 trap - Use OpenJDK instead!

TL:DR; Java is still available at zero-cost, you just need to stop using Oracle JDK and start using an OpenJDK build, such as this one or this one.

The trap

Java 11 has been released. It is a major release because it has long-term support (LTS). But Oracle have also set it up to be a trap (either deliberately or accidentally).

For 23 years, developers have downloaded the JDK from Oracle and used it for $free. Type "JDK" into your favourite search engine, and the top link will be to an Oracle Java SE download page (I'm deliberately not providing a link). But that search and that link is now a trap.

Oracle JDK, the one all web searches take you to, is now commercial not $free.

The key part of the terms is as follows:

You may not: use the Programs for any data processing or any commercial, production, or internal business purposes other than developing, testing, prototyping, and demonstrating your Application;

The trap is as follows:

  1. Download Oracle JDK (because that is what you've always done, and it is what the web-search tells you)
  2. Use it in production (because you didn't realise the license changed)
  3. Get a nasty phone call from Oracle's license enforcement teams demanding lots of money

In other words, Oracle can rely on inertia from Java developers to cause them to download the wrong (commercial) release of Java. Unless you read the text/warnings/legalese very carefully you might not even realise Oracle JDK is now commercial, and that you are therefore liable to pay Oracle for using this particular JDK in production.

(Update, 2018-10-03: Searches for Java 11 and JDK 11 now seem to be resolving to OpenJDK builds, not commercial ones!)

Is this trap malicious behaviour on the part of Oracle? Readers will have their own opinions. I do suggest bearing in mind that Oracle invests huge amounts in developing Java, so it is reasonable to have a commercial plan available for those that want it. And they do provide a $free alternative completely valid for commercial use...

The solution

The solution is simple!

Use an OpenJDK build.

There are many different $free OpenJDK builds of Java 11, so you need to choose the one that best fits your needs.

The AdoptOpenJDK build is $free, GPL licensed (with Classpath exception so safe for commercial use), and a good choice as it is vendor-neutral and is intended to have 4+ years of security patches.

Download $free Java from AdoptOpenJDK here

The OpenJDK build from Oracle is $free, GPL licensed (with Classpath exception so safe for commercial use), and provided alongside their commercial offering. It will only have 6 months of security patches, after that Oracle intends you to upgrade to Java 12.

Download $free Java from Oracle here

Many more OpenJDK builds are available, including ones available via your package manager. See this post for a list covering the wide variety of OpenJDK builds. And see my post on zero-cost Java for background info.

Download other OpenJDK builds here

And for a counterpoint, see Marcus' great summary of why the underlying changes here are actually good news.

Summary

Do NOT download or use the Oracle JDK unless you intend to pay for it.

For Java 11, download and use an OpenJDK build, from AdoptOpenJDK , Oracle or elsewhere.

(No comments on this post. There are plenty of other places to express opinions.)

Thursday, 20 September 2018

Java release chains - Splitting features from security

There is now a Java release every 6 months - March and September. It started with Java 9 and we're about to get Java 11. But should you jump on the release train? To answer that, we need to look at how Java's release chains are being split.

Looking back at Java 8

In the olden days life was simple. There was a "major" Java release every few years and it contained lots of new features, for example Java 5, 6, 7 and 8. Each major release included new JDK methods, new JDK classes, deprecations, new JVM features and new language features. However, life wasn't actually as simple as it seemed.

Looking at Java 8, once it was released there was a regular frequency of "update" releases. The most well-known of these were 8u20, 8u40 and 8u60. But there were also many others - 8u5, 8u11, 8u25, 8u31, 8u45, 8u51, 8u65, 8u66, 8u71, 8u73, 8u74, 8u77, etc. So, what was going on?

Well the plan was quite simple, just not that widely known. 8u20, 8u40, 8u60 and so on were "feature" releases, while all the rest were security patch releases. See the full table.

  • 8u20, 8u40, 8u60 and so on were "feature" releases - every six months
  • 8u5, 8u11, 8u25, 8u31 and so on were "security" releases - every three months, plus additional emergency releases

If you look closely, you can see a pattern. The first security release after a feature release had a number 5 greater (8u25 is 5 greater than 8u20). The second security release after a feature release had a number 11 greater (8u31 is 11 greater than 8u20). This left space for emergency security releases like 8u66.

So what was a Java 8 feature release?

Well a feature release was allowed to contain anything that didn't impact the Java SE specification. So, JVM or tool enhancements might be allowed, particularly if covered by a flag that was disabled by default. For example, the "endorsed-standards override mechanism and the extension mechanism" was deprecated in 8u40, 8u60 added a new IBM character set, and 8u181 removed the Derby database from the JDK bundle.

If you did notice these feature releases, it might have been because Java 8 lambdas were pretty buggy until 8u40 (in my experience). As such, I've often commented that 8u40 should have been named 8.1 (although logically 8u20 would have been 8.1 and 8u40 would have been 8.2). Such a version numbering scheme would have been easy to grasp for the whole Java ecosystem, but sadly didn't happen with Java 8.

Java 11 and the release train

The observant will notice that there was a feature release every six months and a security release every three months. But this is exactly like the "new" release train post Java 9!

The key difference between then and now is that from Java 9 onwards each six-monthly "feature" release can now contain any change. Thus major JVM, library and language changes can be introduced every six months, which wasn't allowed before. It also means there is a new Java SE specification for each feature release.

To handle this high frequency and still provide stability, every three years, which is every six releases, there will be an "LTS" release - long term support. As I've discussed before, the LTS name is rather confusing given that Oracle will not be providing security patches beyond six months on an "long-term" LTS release for $free. (Read the linked blog to understand how the OpenJDK will be performing that role from now on.)

Obviously with this new release schedule the question of naming arose. Firstly, according to Oracle we are not supposed to refer to any release as a "major" release. Instead, all should be considered to be "feature" releases, and of equal importance. This is emphasised by using a single incrementing number - 9, 10, 11, 12, 13 and so on.

But are all releases really of equal importance when some have long-term support (LTS)?

The answer is clearly no. For many large companies, the LTS releases are the only ones that they will consider adopting, either to manage churn or because they insist on purchasing a support contract for Java. Thus only one conclusion can be drawn:

Java 11 is a major release because it has LTS, not because of its feature set. Java 12, 13 and so on are less significant simply because they do not have LTS.

Now this doesn't mean that Java 12 is any less tested or is in some way unsuited for general use. Nor does it mean that Java 11 is bigger than any of the other versions. It is simply a statement of fact that because Java 11 has LTS (long-term support) it will be more significant, and see adoption by a different type of user.

Release chains

The support model is built on chains of releases. In order to pick up a security patch, you may also have to pickup other changes. The support chain for any Java release is simple enough. The higher the number, the more fixes/features it contains. Thus 8u20 contains everything from 8u5 and 8u11.

Using bold for major/feature releases and italic for security patches we can see the pattern:

  • 8, 8u5, 8u11, 8u20, 8u25, 8u31, 8u40, 8u45, 8u51, 8u60, 8u65, 8u71, ...

From Java 9 onwards there are effectively two separate patterns.

Firstly, there is the "release train" pattern if you want both features and security patches:

  • 9, 9.0.1, 9.0.4, 10, 10.0.1, 10.0.2, 11, 11.0.1, 11.0.2, 12, 12.0.1, 12.0.2, ...

Secondly, there is the "LTS" pattern if you only want security patches:

  • 11, 11.0.1, 11.0.2, 11.0.3, 11.0.4, 11.0.6, 11.0.7, ...

Oracle is focussing its $free efforts on the first "release train" pattern. After 11.0.2, it is ultimately the job of the OpenJDK community (probably led by Red Hat) to keep the chain in the second "LTS" pattern going.

Note the key difference though. The two chains effectively split features from security.

In Oracle's view, the old Java 8 chain and the new "release train" chain are essentially equivalent. I disagree, but I need another blog article to explain why (as this one is already too long!).

Summary

There are now two separate chains of releases - the "release train" with both features and security patches, and the "LTS" with just security patches. Neither chain is equivalent to how it worked in Java 8, so you need to choose which chain to adopt carefully.

Comments welcome. For example, did an 8u feature release break your code?