Tuesday, 9 May 2017

Java SE 9 - JPMS automatic modules

This article in my series on the Java Platform Module System (JPMS) is focussed on automatic modules. JPMS was previously known as Project Jigsaw and is the module system in Java SE 9. See also Module basics, Module naming and Modules & Artifacts.

Automatic modules

Let's say you are in charge of Java, and after 20 years you want to add a module system to the platform. As well as the problems of designing the module system itself, you have to consider migration of all the existing code written in Java (and to a degree, other JVM languages).

The solution to this that JPMS has chosen is automatic modules. Unfortunately, my opinion is that it is the wrong solution.

To understand automatic modules, we have to start by looking at how jar files will be specified in future. In addition to the classpath, Java SE 9 will also have a modulepath. The basic idea is that modules (jar files containing module-info.class) will be placed on the modulepath, not the classpath. In fact, placing a module on the classpath will cause the module declaration (module-info.class) to be completely ignored, which is usually not what you want.

As a basic rule, the modulepath cannot see the classpath. If you create a module and put it on the modulepath, all its dependencies must also be on the modulepath. Thus, in order to write a module at all, all the dependencies must also have been converted to be modules. And many of those dependencies are likely to be open source projects, with varying release schedules.

Clearly, this is a bit of a problem. Essentially, it would mean that an application would need to wait until every dependency had become a module before it could add module-info.java. The "solution" to this is automatic modules.

An automatic module is a normal jar file - one without a module-info.class file - that is placed on the modulepath. Thus the modulepath will contain two types of module - "real" and "automatic". Since there is no module-info.class, an automatic module is missing the metadata normally associated with a module:

  • no module name
  • no list of exported packages
  • no list of dependencies

Unsurprisingly, the list of exported packages is simply set to be all packages in the jar file. The list of dependencies is set to be everything on the modulepath, plus the whole classpath. As such, automatic modules allow the modulepath to depend on the classpath, something which is normally not allowed. While not ideal, both of these defaults for the metadata make sense.

The final missing piece of information is the module name. This has been a big point of discussion in Project Jigsaw. As it stands, the module name will be derived from the filename of the jar file. For me, this is a critical problem with the basic design of automatic modules (but see the mitigation section below).
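To make "derived from the filename" concrete, here is a minimal sketch that approximates the rules as I understand them from the ModuleFinder documentation: drop the .jar suffix, cut the name at the first hyphen that is followed by a version number, turn non-alphanumeric characters into dots, and tidy up stray dots. The class and method names are my own, the filenames are just examples, and the real implementation handles more edge cases than this.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AutomaticModuleNameSketch {

    // A hyphen followed by digits and then a dot or end of string marks the start of the version.
    private static final Pattern VERSION_START = Pattern.compile("-(\\d+(\\.|$))");

    /** Approximates the name given to an automatic module for a jar filename (simplified). */
    public static String deriveName(String jarFileName) {
        String name = jarFileName.endsWith(".jar")
                ? jarFileName.substring(0, jarFileName.length() - 4)
                : jarFileName;
        Matcher matcher = VERSION_START.matcher(name);
        if (matcher.find()) {
            name = name.substring(0, matcher.start());  // drop the version part
        }
        name = name.replaceAll("[^A-Za-z0-9]", ".")      // non-alphanumerics become dots
                   .replaceAll("\\.{2,}", ".");          // collapse repeated dots
        if (name.startsWith(".")) {
            name = name.substring(1);
        }
        if (name.endsWith(".")) {
            name = name.substring(0, name.length() - 1);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(deriveName("guava-21.0.jar"));          // guava
        System.out.println(deriveName("joda-convert-1.8.1.jar"));  // joda.convert
    }
}
```

Note that neither result looks anything like a reverse-DNS name based on the super-package.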

As per my last blog, modules are not artifacts. But the filename of a jar file is usually the Maven artifactId, a name disconnected from the module name (which should be the super-package reverse DNS). For example, the filename of Google Guava is guava while the correct module name would be com.google.common.

Taken in isolation, this all works fine. The application can now be modularised, and I can depend on a project that is not modularised:

 module com.foo.myapp {
  requires guava;  // guava not yet modularised, so use the filename
 }

But in order to work, the module-info.java file must specify the dependency as being on the filename, not the module name.

The astute will note that depending on the module name is generally not possible until the dependency is a module. But equally, depending on the filename is using a name that is going to be wrong once the dependency is turned into a module:

 module com.foo.myapp {
  requires com.google.common;  // now guava has been modularised
 }

It is this change of name that is at the heart of the problem. Essentially it means that the name used to define the dependencies of your library/application will change once they are modularised. While this name change is fine in a private codebase, it will be hell in the world of Java open source.

Impact of automatic modules

To fully understand how automatic modules affect the open source Java community, it is best to look at a use case. Let's consider what happens if an open source project is released that depends on a filename instead of a module name. And then another open source project is released that depends on that:

Project       Version  Module name           Requires
Strata        v1       com.opengamma.strata  org.joda.convert
                                             guava (a filename)
Joda-Convert  v1       org.joda.convert      guava (a filename)
Guava         v1       (not a module yet)

What we now have is a graph of three projects, where the lowest is an automatic module, and the next two are real modules that depend on an automatic module. When Guava is finally modularised, a new release will occur. But Strata and Joda-Convert cannot immediately use the new release, because the module name they reference is now wrong:

Project       Version  Module name           Requires
Strata        v1       com.opengamma.strata  org.joda.convert
                                             guava (a filename)
Joda-Convert  v1       org.joda.convert      guava (a filename)
Guava         v2       com.google.common
Module Hell - "guava" != "com.google.common"

As can be seen, this setup does not work. We have Module Hell. The top two projects depend on "guava", not "com.google.common". And there is no way to have the same packages loaded under two different module names.

What happens if Joda-Convert is updated to match the new Guava? (Strata is not updated)

Project       Version  Module name           Requires
Strata        v1       com.opengamma.strata  org.joda.convert
                                             guava (a filename)
Joda-Convert  v2       org.joda.convert      com.google.common
Guava         v2       com.google.common
Module Hell - "guava" != "com.google.common"

This configuration does not work. There is no way to have the same packages loaded under two different module names.

What happens if Strata is updated to match the new Guava? (Joda-Convert is not updated)

Project       Version  Module name           Requires
Strata        v2       com.opengamma.strata  org.joda.convert
                                             com.google.common
Joda-Convert  v1       org.joda.convert      guava (a filename)
Guava         v2       com.google.common
Module Hell - "guava" != "com.google.common"

This configuration also does not work. There is no way to have the same packages loaded under two different module names.

The only way to get it to work is to update the whole stack together:

Project       Version  Module name           Requires
Strata        v2       com.opengamma.strata  org.joda.convert
                                             com.google.common
Joda-Convert  v2       org.joda.convert      com.google.common
Guava         v2       com.google.common

To summarise, when a module depends on an automatic module, and that module is then depended on by others, the whole stack is then linked. Everything in the stack had to go from v1 to v2 together.
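The tables above can be replayed with a toy model. This is only a sketch of the name-matching part of the problem, not of how JPMS actually resolves a module graph: each module is a name plus the names it requires, and a requires clause is "unresolved" if no module with that name is present. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ModuleHellSketch {

    /** Lists every requires clause that names a module not present in the graph. */
    public static List<String> unresolved(Map<String, List<String>> requiresByModule) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : requiresByModule.entrySet()) {
            for (String required : entry.getValue()) {
                if (!requiresByModule.containsKey(required)) {
                    problems.add(entry.getKey() + " requires " + required);
                }
            }
        }
        return problems;
    }

    /** Builds the three-project stack, varying only the name each project uses for Guava. */
    public static Map<String, List<String>> stack(String strataSays, String jodaConvertSays, String guavaName) {
        Map<String, List<String>> modules = new LinkedHashMap<>();
        modules.put("com.opengamma.strata", Arrays.asList("org.joda.convert", strataSays));
        modules.put("org.joda.convert", Collections.singletonList(jodaConvertSays));
        modules.put(guavaName, Collections.<String>emptyList());
        return modules;
    }

    public static void main(String[] args) {
        // Guava is still an automatic module named after its filename.
        System.out.println(unresolved(stack("guava", "guava", "guava")));
        // prints []

        // Guava and Joda-Convert move to v2, Strata stays on v1.
        System.out.println(unresolved(stack("guava", "com.google.common", "com.google.common")));
        // prints [com.opengamma.strata requires guava]

        // The whole stack moves to v2 together.
        System.out.println(unresolved(stack("com.google.common", "com.google.common", "com.google.common")));
        // prints []
    }
}
```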

Looking back at the whole example, it should be clear that the problem started right back at the beginning. There never should have been a v1 release of Strata that depended on a filename "guava". Or a v1 release of Joda-Convert that depended on a filename. Instead, Strata should have waited until both Guava and Joda-Convert had released a modularised v2. And, Joda-Convert should have waited until Guava had released a modularised v2. To avoid Module Hell, migration must occur from bottom to top.

Given this, my opinion is that automatic modules do not, by themselves, provide a viable migration path for the Java open source community. The rule is as follows:

Do not release to Maven Central a modular jar file that expresses a dependency on a filename. Instead, wait until all dependencies can be expressed as module names.

Any jar file in Maven Central that expresses a dependency on a filename will be a cause of Module Hell.

Mitigation

Without any mitigation, the community would have to modularise each open source library one by one from the bottom of the stack upwards. No open source project could do anything until all its dependencies are modularised - a bottom-up migration.

The main piece of mitigation proposed for this problem is that jar files can have a new entry in MANIFEST.MF called "Automatic-Module-Name". When JPMS examines an automatic module, if the MANIFEST.MF entry is present then it uses the value as the module name instead of the filename.
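As an illustration, the entry can be written and read back with the standard java.util.jar.Manifest API. This is a small sketch rather than a build recipe (in practice your build tool would add the entry when it creates the jar), and the class and method names are my own.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class AutomaticModuleNameManifest {

    public static final String ENTRY = "Automatic-Module-Name";

    /** Produces the bytes of a MANIFEST.MF that declares the given module name. */
    public static byte[] manifestDeclaring(String moduleName) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise nothing is written out.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue(ENTRY, moduleName);
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            manifest.write(out);
            return out.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** Reads the declared module name back, or returns null if the entry is absent. */
    public static String declaredName(byte[] manifestBytes) {
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(manifestBytes));
            return manifest.getMainAttributes().getValue(ENTRY);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = manifestDeclaring("org.joda.convert");
        System.out.println(declaredName(bytes));  // org.joda.convert
    }
}
```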

This can be used to break the cycle to a degree. In the case above, the Strata team can release a version at any time with the new MANIFEST.MF entry, essentially stating what module name Strata is going to have in the future. Similarly, Joda-Convert can release at any time with the MANIFEST.MF entry stating what its module name will be. Full modularisation will still need to proceed from the bottom-up, but anyone depending on Joda-Convert or Strata can safely release without being affected by the question over Guava's module name.

The key difference between adding the MANIFEST.MF entry and adding a module declaration is that the MANIFEST.MF entry does not need the names of the dependencies to be specified. As such, there is no need to wait for the dependencies to be modularised before adding the MANIFEST.MF entry.

The rule outlined above can thus also be expressed as:

Do not release to Maven Central a modular jar file that depends on an automatic module, unless the automatic module has an "Automatic-Module-Name" MANIFEST.MF entry.

To be clear, I don't think this is ideal, but we are where we are. The message to the open source community is in two parts therefore.

Firstly, do not add a module-info.java module declaration until:

  • all of your runtime dependencies have been modularised (either as a full module or with a MANIFEST.MF entry)
  • all those modularised dependencies have been released to Maven Central
  • your library depends on the updated versions

Secondly, if you can't meet these criteria, but your project is well structured and otherwise suitable for modularisation, please add a MANIFEST.MF entry following the agreed module naming conventions (super-package reverse-DNS).

If everyone does this, then we stand a reasonable chance of avoiding Module Hell.

Summary

Automatic modules allow modules to depend on non-modules. But this is achieved by specifying a requires clause on the filename, not the module name. This will cause pain later if modules are published depending on the filename. A new MANIFEST.MF entry allows any open source project to choose a module name and publish that choice immediately. When JPMS sees the MANIFEST.MF entry, it will use the value as the name for the automatic module.

Community members must at all costs avoid publishing modular jar files that depend on filenames. But community members can publish jar files containing the new MANIFEST.MF entry from now on (although technically the MANIFEST.MF entry is not yet finalised, it probably will be soon).

Comments and feedback welcome.

Monday, 24 April 2017

Java SE 9 - JPMS modules are not artifacts

This is the next article in a series I'm writing to help make sense of the Java Platform Module System (JPMS) in Java SE 9. JPMS was developed as Project Jigsaw. Other articles in the series are Module basics and Module naming.

Module != Artifact

If you want to grasp what JPMS modules are all about, it turns out that it is critical to understand what they are not. In particular, they are not artifacts.

Firstly, let's define an artifact. An artifact is a file produced when developing software. For a project on Maven Central, this includes jar files of bytecode, jar files of sources and jar files of Javadoc. We are interested primarily in the bytecode for this discussion.

Secondly, let's assume that a project is going to have the same module name over time. This is just like package names - projects don't change package name with every release.

Given this, what is the mapping between an artifact and a module?

Versions

Each version of a project will consist of a different artifact (jar file), perhaps released on Maven Central. Each version will have the same module name. But, we also know that the Java platform (JPMS) does not know about versions or version-selection.

Therefore, when assembling a modulepath for Java SE 9, something else is going to have to choose the correct version of the module. This will typically be the build tool, eg. Maven.

But while the classpath will tolerate having two versions of the artifact (typically with bad consequences at runtime), the JPMS modulepath will refuse to start if two modules contain the same package, as would happen if two versions of the same module are found.

Maven already manages versions of course, picking one version from a set of versions that all have the same groupId:artifactId. With Java SE 9 we can say that Maven is picking one artifact from a set of artifacts to use in the runtime JPMS module graph.

Artifacts                        JPMS runtime module
org.joda : joda-convert : 1.2    org.joda.convert
org.joda : joda-convert : 1.1
org.joda : joda-convert : 1.0
Build tool must pick one of these artifacts for the runtime JPMS module graph.

Patching bugs

We've all run into the problem of finding a bug in another library, such as one on Maven Central. Most of the time, we work around the bug. Sometimes, we have to fix it and have a private copy of the library (ie. a private version of the library). In extreme cases, we have to publish the bug-fix version to Maven Central.

If you want to publish a patched version of an open source project, the standard way to do this is to use your groupId, not the original groupId.

The patched version still uses the same package name. If it didn't, it would be no use as a patched version.

Exactly the same rationale applies to the module name - the module name of the patched version needs to stay the same. This makes sense, because the module name is an aspect of the bytecode, not of the deployment.

As before, the build tool needs to be set up to pick the correct jar file artifact, this time choosing between the patched one and the original.

Artifacts                                 JPMS runtime module
org.joda : joda-convert : 1.2             org.joda.convert
com.foo : patched-joda-convert : 1.2-P
Build tool must pick either the original artifact or the patch for the runtime JPMS module graph.

License-driven jars

There can be a situation where the equivalent code is released twice for license reasons. The most common case of this has been driven by the JCP.

Imagine that a JSR has been produced, and because of licensing restrictions, Apache and RedHat each decide to produce their own version of the specification (API) jar, one Apache licensed and one LGPL licensed. There will be two different jar file artifacts in Maven Central - redhat-jsr789-1.0.jar and apache-jsr789-1.0.jar. Both of these contain the same package name, and the same interfaces, because they are the same specification.

When creating a module-info.java file for these, teams might be tempted to put "redhat" or "apache" in the module name. But this would be wrong, as a package must be in one module at runtime. Thus, both teams must use the same module name, based on the package name, such as javax.foo. In reality, future JSRs will need to choose their own module name.

Just as in the other cases, the build tool will need to select the correct jar file artifact to use for any given application.

Artifacts                              JPMS runtime module
org.apache : apache-jsr789 : 1.0       javax.foo
com.redhat : redhat-jsr789 : 1.0-GA
Build tool must pick one of the artifacts for the runtime JPMS module graph.

Two worlds

JPMS brings front and centre a distinction that we probably haven't thought too much about - that between build tools and code. I find it to be a useful mental model to keep these two worlds separate.

            Build/Deploy world               Code world
Concept     Artifacts (jar files)            Modules
            Versions                         Packages
            Groups                           Classes/Interfaces
            Organizations                    Methods/Fields
Identifier  org.joda : joda-convert : 1.2    org.joda.convert
Tool        Maven                            javac
            Gradle                           java
            Ant                              jar

Summary

All the use cases above (and others) reduce to the fact that a build tool, such as Maven, must pick one artifact from many to satisfy the module requested in the JPMS modules graph. The module, and the module name, is tightly linked to the bytecode and packages. By contrast, the Maven groupId:artifactId:version co-ordinates are a tool for identifying each artifact in order to pick the correct one to use.
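The many-to-one relationship can be sketched as a simple grouping. This is a toy model with hypothetical class and method names, using the artifact coordinates from the tables above: each artifact is mapped to the module name found in its bytecode, and any module name with more than one candidate is a choice the build tool has to make.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ArtifactsPerModule {

    /** Groups artifact coordinates by the module name that each artifact contains. */
    public static Map<String, List<String>> candidatesByModule(Map<String, String> moduleNameByArtifact) {
        Map<String, List<String>> candidates = new TreeMap<>();
        for (Map.Entry<String, String> entry : moduleNameByArtifact.entrySet()) {
            candidates.computeIfAbsent(entry.getValue(), key -> new ArrayList<>())
                      .add(entry.getKey());
        }
        return candidates;
    }

    /** The example artifacts used in this article, keyed by Maven co-ordinates. */
    public static Map<String, String> exampleArtifacts() {
        Map<String, String> artifacts = new LinkedHashMap<>();
        artifacts.put("org.joda:joda-convert:1.2", "org.joda.convert");
        artifacts.put("com.foo:patched-joda-convert:1.2-P", "org.joda.convert");
        artifacts.put("org.apache:apache-jsr789:1.0", "javax.foo");
        artifacts.put("com.redhat:redhat-jsr789:1.0-GA", "javax.foo");
        return artifacts;
    }

    public static void main(String[] args) {
        // Every module name here has two candidate artifacts; only one may reach the modulepath.
        candidatesByModule(exampleArtifacts())
                .forEach((module, artifacts) -> System.out.println(module + " <- " + artifacts));
    }
}
```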

Hopefully, this helps explain the philosophical difference between artifacts and modules, and why modules should be named after their super-package, not after the artifactId. And why you cannot derive the module name from groupId:artifactId - just look at the tables above to see the varying artifact names.

Questions and comments welcome.

Thursday, 20 April 2017

Java SE 9 - JPMS module naming

The Java Platform Module System (JPMS) is soon to arrive, developed as Project Jigsaw. This article follows the introduction and looks at how modules should be named.

As with all "best practices", they are ultimately the opinion of the person writing them. I hope however to convince you that my opinion is right ;-). And as a community, we will certainly benefit if everyone follows the same rules, just like we benefited from everyone using reverse-DNS for package names.

TL;DR - My best practices

These are my recommendations for module naming:

  • Module names must be reverse-DNS, just like package names, e.g. org.joda.time.
  • Modules are a group of packages. As such, the module name must be related to the package names.
  • Module names are strongly recommended to be the same as the name of the super-package.
  • Creating a module with a particular name takes ownership of that package name and everything beneath it.
  • As the owner of that namespace, any sub-packages may be grouped into sub-modules as desired so long as no package is in two modules.

Thus the following is a well-named module:

  module org.joda.time {
    requires org.joda.convert;

    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
    // not exported: org.joda.time.base;
    // not exported: org.joda.time.tz;
  }

As can be seen, the module contains a set of packages (exported and hidden), all under one super-package. The module name is the same as the super-package name. The author of the module is asserting control over all names below org.joda.time, and could create a module org.joda.time.i18n in the future if desired.
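For those who prefer to poke at things in code, the same declaration can be built with the java.lang.module.ModuleDescriptor API in Java SE 9. This is just a sketch to show which packages end up exported and which stay hidden; the class and helper names are my own, and nothing here reads a real Joda-Time jar file.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class JodaTimeDescriptorSketch {

    /** Builds a descriptor equivalent to the module declaration shown above. */
    public static ModuleDescriptor describe() {
        return ModuleDescriptor.newModule("org.joda.time")
                .requires("org.joda.convert")
                .exports("org.joda.time")
                .exports("org.joda.time.chrono")
                .exports("org.joda.time.format")
                // packages that are in the module but not exported
                .packages(new HashSet<>(Arrays.asList("org.joda.time.base", "org.joda.time.tz")))
                .build();
    }

    /** Returns the packages of the module that are not exported, in sorted order. */
    public static Set<String> hiddenPackages(ModuleDescriptor descriptor) {
        Set<String> hidden = new TreeSet<>(descriptor.packages());
        for (ModuleDescriptor.Exports export : descriptor.exports()) {
            hidden.remove(export.source());
        }
        return hidden;
    }

    public static void main(String[] args) {
        ModuleDescriptor descriptor = describe();
        System.out.println(descriptor.name());            // org.joda.time
        System.out.println(hiddenPackages(descriptor));   // [org.joda.time.base, org.joda.time.tz]
    }
}
```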

To understand why this approach makes sense, and the finer details, read on.

JPMS naming

Naming anything in software is hard. Unsurprisingly then, agreeing an approach to naming modules has also turned out to be hard.

The naming rules allow dots, but prohibit dashes, thus lots of name options are closed off. As a side note, module names in the JVM are more flexible, but we are only considering names at the Java level here.

These are the two basic approaches which I think make sense:

1) Project-style. Short names, as commonly seen in the jar filename from Maven Central.

2) Reverse DNS. Full names, exactly as we've used for package names since Java v1.0.

Here are some examples to make it more clear:

               Project-style  Reverse-DNS
Joda-Time      joda.time      org.joda.time
Commons-IO     commons.io     org.apache.commons.io
Strata-Basics  strata.basics  com.opengamma.strata.basics
JUnit          junit          org.junit

All things being equal, we'd choose the shorter name - project-style. It is certainly more attractive when reading a module-info.java file. But there are some clear reasons why reverse-DNS must be chosen.

It is worth noting that Mark Reinhold currently indicates a preference for project-style names. However, the linked mail doesn't really deal with the global uniqueness or clashing elements of the naming problem, and others in the expert group disagreed with project-style names.

Ownership and Uniqueness

The original designers of Java made a very shrewd choice to propose reverse-DNS names for packages. This approach has scaled very well, through the incredible rise of open source software. It provides two key properties - Ownership and Uniqueness.

The ownership aspect of reverse-DNS delegates control of part of the global DNS namespace to an individual or company. It is a universally agreed approach with enough breadth of identifiers to make clashes rare. Within that namespace, developers are then responsible for ensuring uniqueness. Together, these two aspects result in globally unique package names. As such, it is pretty rare that code has two colliding packages, despite modern applications pulling in hundreds of dependent jar files. For example, the Spark framework and Apache Spark co-exist despite having the same simple name. But look what happens if we only use project-style names:

                 Project-style  Reverse-DNS
Spark framework  spark.core     com.sparkjava.core
Apache-Spark     spark.core     org.apache.spark.core

As can be seen, the project-style names clash! JPMS will simply refuse to start a modulepath where two modules have the same name, even if they contain different packages. (Since these projects haven't chosen module names yet, I've tweaked the example to make them clash. But this example is far from impossible, which is the point here!)

Not convinced? Well imagine what would happen if package names were not reverse-DNS. If your application pulls in hundreds of dependencies, do you think there would be no duplicates?

Of course we have project-style names today in Maven - the jar filename is the artifactId which is a project-style name. Given this, why don't we have problems today? Well it turns out that Maven is smart enough to rename the artifact if there is going to be a clash. The JPMS does not offer this ability - your only choice with a clash will be to rewrite the module-info.class file of the problematic module and all other modules that refer to it.

As a final example of how project-style name clashes can occur, consider a startup creating a new project - "willow". Since they are small, they choose a module name of "willow". Over the next year, the startup becomes fantastically successful, growing at an exponential rate, meaning that there are now 100s of modules within the company depending on "willow". But then a new Open Source project starts up, and calls itself "willow". Now, the company can't use the open source project. Nor can the company release "willow" as open source. These clashes are avoided if reverse-DNS names are used.

To summarize this section, we need reverse-DNS because module names need to be globally unique, even when writing modules that are destined to remain private. The ownership aspect of reverse-DNS provides enough namespace separation for companies to get the uniqueness necessary. After all, you wouldn't want to confuse Joda-Time with the freight company also called Joda would you?

Modules as package aggregates

The JPMS design is fundamentally simple - it extends JVM access control to add a new concept "modules" that groups together a set of packages. Given this, there is a very strong link between the concept of a module and the concept of a package.

The key restriction is that a package must be found in one and only one module.

Given that a module is formed from one or more packages, what is the conceptually simplest name that you can choose? I argue that it is one of the package names that forms the module. And thus a name you've already chosen. Now, consider we have a project with three packages, which of these three should be the module name?

  module ??? {
    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
  }

Again, I'd argue there isn't really a debate. There is a clear super-package, and that is what should be used as the module name - org.joda.time in this case.

Hidden packages

With JPMS, a module can hide packages. When hidden, the internal packages are not visible in Javadoc, nor are they visible in the module-info.java file. This means that consumers of the module have no immediate way of knowing what hidden packages a module has.

Now consider again the key restriction that a package must be found in one and only one module. This restriction applies to hidden packages as well as exported ones. Therefore if your application depends on two modules and both have the same hidden package, your application cannot be run as the packages clash. And since information on hidden packages is difficult to obtain, this clash will be surprising. (There are some advanced ways to get around these clashes using layers, but these are designed for containers, not applications.)
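The restriction itself is easy to express in code. The following is a minimal sketch, not what JPMS actually does internally: given the full package list of each module (exported and hidden alike), it reports any package claimed by more than one module. The module and package names are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PackageClashSketch {

    /** Returns each package that appears in two or more modules, with the modules that contain it. */
    public static Map<String, List<String>> clashes(Map<String, List<String>> packagesByModule) {
        Map<String, List<String>> owners = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : packagesByModule.entrySet()) {
            for (String packageName : entry.getValue()) {
                owners.computeIfAbsent(packageName, key -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Keep only the packages with more than one owner.
        owners.values().removeIf(modules -> modules.size() < 2);
        return owners;
    }

    public static void main(String[] args) {
        Map<String, List<String>> packagesByModule = new LinkedHashMap<>();
        // Both hypothetical modules bundle the same hidden helper package.
        packagesByModule.put("org.example.alpha",
                Arrays.asList("org.example.alpha", "org.example.util.internal"));
        packagesByModule.put("org.example.beta",
                Arrays.asList("org.example.beta", "org.example.util.internal"));

        System.out.println(clashes(packagesByModule));
        // prints {org.example.util.internal=[org.example.alpha, org.example.beta]}
    }
}
```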

The best solution to this problem is exactly as described in the last section. Consider a project with three exported packages and two hidden ones. So long as the hidden packages are sub-packages of the module name, we should be fine:

  module org.joda.time {
    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
    // not exported: org.joda.time.base;
    // not exported: org.joda.time.tz;
  }

By using the super-package name as the module name, the module developer has taken ownership of that package and everything below it. So long as all the non-exported packages are conceptually sub-packages, the end-user application should not see any hidden package clashes.

Automatic modules

JPMS includes a feature whereby a regular jar file, without a module-info.class file, turns into a special kind of module just by placing it on the modulepath. The automatic module feature is controversial in general, but a key part of this is that the name of the module is derived from the filename of the jar file. In addition, it means that people writing module-info.java files have to guess the name that someone else will use for a module. Having to guess a name, and having the Java platform pick a name based on the filename of a jar file are both bad ideas in my opinion, and that of many others, but our efforts to stop them seem to have failed.

The naming approach outlined in this article provides a means to mitigate the worst effects of this. If everyone uses reverse-DNS based on the super-package, then the guesses that people make should be reasonably accurate, as the selection process of a name should be fairly straightforward.

What if there isn't a clear super-package?

There are two cases to consider.

The first case is where there really is a super-package, it's just that it has no code. In this case, the implied super-package should be used. (Note that this example is Google Guava, which doesn't have guava in the package name!):

  module com.google.common {
    exports com.google.common.base;
    exports com.google.common.collect;
    exports com.google.common.io;
  }

The second case is where a jar file has two completely unrelated super-packages:

  foo.jar
  - package com.foo.util
  - package com.foo.util.money
  - package com.bar.client

The right approach here is to break the jar file into two separate modules:

  module com.foo.util {
    requires com.bar.client;
    exports com.foo.util;
    exports com.foo.util.money;
  }
  module com.bar.client {
    exports com.bar.client;
  }

Failure to do this is highly likely to cause clashes at some point, as there is no way that com.foo.util should be claiming ownership of the com.bar.client namespace.
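A quick way to spot this situation is to check every package in the jar file against the intended module name. This is a small sketch with hypothetical class and method names; it only compares names and does not inspect a real jar file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NamespaceCheck {

    /** True if the package is the module's super-package or a sub-package of it. */
    public static boolean isWithin(String moduleName, String packageName) {
        return packageName.equals(moduleName) || packageName.startsWith(moduleName + ".");
    }

    /** Returns the packages that fall outside the namespace owned by the module name. */
    public static List<String> outsideNamespace(String moduleName, List<String> packages) {
        List<String> outside = new ArrayList<>();
        for (String packageName : packages) {
            if (!isWithin(moduleName, packageName)) {
                outside.add(packageName);
            }
        }
        return outside;
    }

    public static void main(String[] args) {
        List<String> fooJar = Arrays.asList("com.foo.util", "com.foo.util.money", "com.bar.client");
        System.out.println(outsideNamespace("com.foo.util", fooJar));
        // prints [com.bar.client]
    }
}
```

The check uses the module name plus a trailing dot, so a package such as com.foo.utilities would not be mistaken for a sub-package of com.foo.util.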

If com.bar.client is going to be a hidden package when converted to modules, then instead of it being a separate module, it can be repackaged (i.e. shaded) under the module's super-package:

  module com.foo.util {
    exports com.foo.util;
    exports com.foo.util.money;
    // not exported: com.foo.util.shade.com.bar.client;
  }

Can you have sub-modules?

Yes. When a module name is chosen, the developer is taking control of a namespace. That namespace consists of the module name and all sub-names below it - sub-package names and sub-module names.

Ownership of that namespace allows the developer to release one module or many. The main constraint is that there should not be two published modules containing the same package.

As a side effect of this, the practice of larger projects releasing an "all" jar will need to stop. An "all" jar is used when the project has lots of separate jar files, but also wants to allow end-users to depend on a single jar file. These "all" jar files are a pain in Maven dependency trees, but will be a disaster in JPMS ones, as there is no way to override the metadata, unlike in Maven.

What if my existing project does not meet these guidelines?

The harsh suggestion is to change the project in an incompatible manner so it does meet the guidelines. JPMS in Java SE 9 is disruptive. It does not take the approach of providing all the tools necessary to meet all the edge cases in current deployments. As such, it is not surprising that some jar files and some projects will require some major rework.

Why ignore the Maven artifactId?

JPMS is an extension to the Java platform (language and runtime). Maven is a build system. Both are necessary, but they have different purposes, needs and conventions.

JPMS is all about packages, grouping them together to form modules and linking those. In this way, developers are working with source code, just like any other source code. What artifacts the source code is packed up into is a separate question. Understanding the separation is hard, because currently there is a one-to-one mapping between the module and the jar file. However, we should not assume this will always be the case in the future.

Another example of this separation is versioning. JPMS has little to no support for versions, yet build systems like Maven do. When running the application, Maven is responsible for collecting a coherent set of artifacts (jar files) to run the application, just as before. It's just that some of those might be modules.

Finally, the Maven artifactId does not exist in isolation. Maven makes unique identifiers by combining the groupId, artifactId and classifier. Only the combination is sufficiently globally unique to be useful. Picking out just the artifactId and trying to make a unique module name from it is asking for trouble.

Summary

JPMS module names, and the module-info.java in general, are going to require real thought to get right. The module declaration will be as much a part of your API as your method signatures.

The importance is heightened because, unlike Maven and other module systems, JPMS has no way to fix broken metadata. If you rely on some modular jar files, and get a clash or find some other mistake in the module declarations, your only options will be to not use JPMS or to rewrite the module declarations yourself. Given this difficulty, it is not yet clear that JPMS will be a success, thus your best option may be to not modularize your code.

See the TL;DR section above for the summary of the module name proposal. Feedback and questions welcome.

PS. For clarity, my personal interest is ensuring Java succeeds, something that will IMO require consistent naming.

Monday, 17 April 2017

Java 9 modules - JPMS basics

The Java Platform Module System (JPMS) is the major new feature of Java SE 9. In this article, I will introduce it, leaving most of my opinions to a follow up article. This is based on these slides.

Java Platform Module System (JPMS)

The new module system, developed as Project Jigsaw, is intended to raise the abstraction level of coding in Java as follows:

The primary goals of this Project are to:
* Make the Java SE Platform, and the JDK, more easily scalable down to small computing devices;
* Improve the security and maintainability of Java SE Platform Implementations in general, and the JDK in particular;
* Enable improved application performance; and
* Make it easier for developers to construct and maintain libraries and large applications, for both the Java SE and EE Platforms.
To achieve these goals we propose to design and implement a standard module system for the Java SE Platform and to apply that system to the Platform itself, and to the JDK. The module system should be powerful enough to modularize the JDK and other large legacy code bases, yet still be approachable by all developers.

However, as we shall see, project goals are not always met.

What is a JPMS Module?

JPMS is a change to the Java libraries, language and runtime. This means that it affects the whole stack that developers code with day-to-day, and as such JPMS could have a big impact. For compatibility reasons, most existing code can ignore JPMS in Java SE 9, something that may prove to be very useful.

The key conceptual point to grasp is that JPMS adds a new concept to the JVM - modules. Where previously code was organized into fields, methods, classes, interfaces and packages, with Java SE 9 there is a new structural element - modules.

  • a class is a container of fields and methods
  • a package is a container of classes and interfaces
  • a module is a container of packages

Because this is a new JVM element, it means the runtime can apply strong access control. With Java 8, a developer can express that the methods of a class cannot be seen by other classes by declaring them private. With Java 9, a developer can express that a package cannot be seen by other modules - ie. a package can be hidden within a module.

Being able to hide packages should in theory be a great benefit for application design. No longer should there be a need for a package to be named "impl" or "internal" with Javadoc declaring "please don't use types from this package". Unfortunately, life won't be quite that simple.

Creating a module is relatively simple however. A module is typically just a jar file that has a module-info.class file at the root - known as a modular jar file. And that file is created from a module-info.java file in your sourcebase (see below for more details).

Using a modular jar file involves adding the jar file to the modulepath instead of the classpath. If a modular jar file is on the classpath, it will not act as a module at all, and the module-info.class will be ignored. As such, while a modular jar file on the modulepath will have hidden packages enforced by the JVM, a modular jar file on the classpath will not have hidden packages at all.

Other module systems

Java has historically had other module systems, most notably OSGi and JBoss Modules. It is important to understand that JPMS has little resemblance to those systems.

Both OSGi and JBoss Modules have to exist without direct support from the JVM, yet still provide some additional support for modules. This is achieved by launching each module in its own class loader, a technique that gets the job done, yet is not without its own issues.

Unsurprisingly, given these are existing module systems, experts from those groups have been included in the formal Expert Group developing JPMS. However, this relationship has not been harmonious. Fundamentally, the JPMS authors (Oracle) have set out to build a JVM extension that can be used for something that can be described as modules, whereas the existing module systems derive experience and value from real use cases and tricky edge cases in big applications that exist today.

When reading about modules, it is important to consider whether the authors of the article you are reading are from the OSGi/JBoss Modules design camp. (I have never actively used OSGi or JBoss Modules, although I have used Eclipse and other tools that use OSGi internally.)

module-info.java

The module-info.java file contains the instructions that define a module (the most important ones are covered here, but there are more). This is a .java file; however, the syntax is nothing like any .java file you've seen before.

There are two key questions that you have to answer to create the file - what does this module depend on, and what does it export:

module com.opengamma.util {
  requires org.joda.beans;  // this is a module name, not a package name
  requires com.google.guava;

  exports com.opengamma.util;  // this is a package name, not a module name
}

(The names to use for modules needed a whole separate article; for this one I'll use package-name style.)

This module declaration says that com.opengamma.util depends on (requires) org.joda.beans and com.google.guava. It exports one package, com.opengamma.util. All other packages are hidden when using the modulepath (enforced by the JVM).

There is an implicit dependency on java.base, the core module of the JDK. Note that the JDK itself is also modularized, so if you want to depend on Swing, XML or Logging, that dependency needs to be expressed.
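As a rough illustration of the JDK itself being modularized, the following sketch asks a few JDK classes which module they belong to. The class name JdkModuleNames is hypothetical; Class.getModule() and Module.getName() are part of the Java SE 9 API.

```java
import java.util.logging.Logger;
import javax.xml.parsers.DocumentBuilder;

// Hypothetical helper class, not part of the article's own code.
final class JdkModuleNames {

    // Returns the name of the module containing the given class,
    // or null if the class is in the unnamed module (the classpath).
    static String moduleNameOf(Class<?> type) {
        return type.getModule().getName();
    }

    public static void main(String[] args) {
        System.out.println(moduleNameOf(String.class));          // java.base
        System.out.println(moduleNameOf(Logger.class));          // java.logging
        System.out.println(moduleNameOf(DocumentBuilder.class)); // java.xml
    }
}
```

A module that uses Logger or DocumentBuilder would therefore need "requires java.logging" or "requires java.xml" in its module declaration, whereas java.base never needs to be listed.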

module org.joda.beans {
  requires transitive org.joda.convert;

  exports org.joda.beans;
  exports org.joda.beans.ser;
}

This module declaration says that org.joda.beans depends on (requires) org.joda.convert. The "requires transitive", as opposed to a simple "requires", means that any module that requires org.joda.beans can also see and use the packages from org.joda.convert. This is used here as Joda-Beans has methods where the return type is from Joda-Convert. This is shown by a dashed line.

module org.joda.convert {
  requires static com.google.guava;

  exports org.joda.convert;
}

This module declaration says that org.joda.convert depends on (requires) com.google.guava, but only at compile time, "requires static", as opposed to a simple "requires". This is an optional dependency. If Guava is on the modulepath, then Joda-Convert will be able to see and use it, and no error will occur if Guava is not present. This is shown by a dotted line.

Access rules

When running a modular jar on the modulepath with JVM access rules applied, code in package A can see a type in package B if:

  • the type is public
  • package B is exported from its module
  • there is a dependency from the module containing package A to the module containing package B

Thus, in the example above, code in module com.opengamma.util can see packages org.joda.beans, org.joda.beans.ser, org.joda.convert and any package exported by Guava. However, it cannot see package org.joda.convert.internal (as it is not exported). In addition, code in module com.google.guava cannot see code in package org.joda.beans or org.joda.convert as there is no modular dependency.
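The three rules can be approximated at runtime with the java.lang.Module API. This is only a sketch: the class AccessRulesSketch and its method names are hypothetical, it ignores nested types and qualified exports added on the command line, and it assumes it is run from the classpath.

```java
import java.lang.reflect.Modifier;

// Hypothetical helper, a simplified model of the access rules above.
final class AccessRulesSketch {

    // Rules 2 and 3: the package is exported to the calling module,
    // and the calling module reads (depends on) the target module.
    static boolean packageVisible(Module from, Module target, String packageName) {
        return target.isExported(packageName, from) && from.canRead(target);
    }

    // All three rules for a top-level type: public, exported, readable.
    static boolean canSee(Module from, Class<?> type) {
        return Modifier.isPublic(type.getModifiers())
                && packageVisible(from, type.getModule(), type.getPackageName());
    }

    public static void main(String[] args) {
        Module self = AccessRulesSketch.class.getModule();
        Module javaBase = Object.class.getModule();
        // java.lang is exported by java.base, so String is visible
        System.out.println(canSee(self, String.class));
        // jdk.internal.misc is an internal package of java.base
        System.out.println(packageVisible(self, javaBase, "jdk.internal.misc"));
    }
}
```

Run from the classpath, the first check should report true and the second false, as jdk.internal.misc is not exported to application code.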

What can go wrong?

The basics described above are simple enough. It is initially quite easy to imagine how you might build an application from these foundations and benefit from hiding packages. Unfortunately, quite a few things can go wrong.

1) All use of module-info files only applies if using modular jars on the modulepath. For compatibility, all code on the classpath is packaged up as a special unnamed module, with no hidden packages and full access to the whole JDK. Thus, the security benefits of hiding packages are marginal at best. However, the modules of the JDK itself are always run in modular mode, thus are always guaranteed the security benefits.

2) Versions of modules are not handled. You cannot have the same module name loaded twice - in other words, you cannot have two versions of the same module loaded at the same time. It is left entirely to you, and thus to your build tool, to create a coherent set of modules that can actually be run. Thus, the classpath hell situation caused by clashing versions is not solved. Note that putting the version number in the module name is a Bad Idea that does not solve this problem and creates others.

3) Two modules may not contain the same package. This seems eminently sensible, until you consider that it also applies to hidden packages. Since hidden packages are not listed in module-info.class, a tool like Maven must unpack the jar file to discover what hidden packages there are in order to warn of clashes. As a user of the library, such a clash will be completely surprising, as you won't have any indication of the hidden packages in the Javadoc. This is a more general indication that JPMS does not provide sufficient isolation between modules, for reasons that are far from clear at this point.

4) There must be no cycles between modules, at compile time and at runtime. Again, this seems sensible - who wants to have module A depend on B depend on C which depends on A? But the reality of existing projects is that this happens, and on the classpath is not a problem. For example, consider what would happen if Guava decided to depend on Joda-Convert in the example above. This restriction will make some existing open source projects hard to migrate.

5) Reflection is changing, such that non-public fields and methods will no longer be accessible via reflection. Since almost every framework uses reflection in this way, there will be significant work needed to migrate existing code. In particular, the JDK will be very locked down against reflection, which may prove painful (command line flags can escape the trap for now). This article hasn't had a chance to explore how the module declaration can influence reflection - see "opens" in the slides for more details.

6) Are your dependencies modularized? In theory, you can only turn your code into a module once all your dependencies are also modules. For any large application with hundreds of jar file dependencies, this will be a problem. The "solution" is automatic modules, where a normal jar file placed on the modulepath is automatically turned into a module. This process is controversial, with naming a big issue. Library authors should not publish modules that depend on automatic modules to public repositories like Maven Central. Again, automatic modules deserve their own article!

7) Module naming is not yet set in stone. I've come to believe that naming your module after the highest package it contains, causing that module to "take ownership" of the subpackages, is the only sane strategy.

8) Conflicts with build systems - who is in charge? A Maven pom.xml also contains information about a project. Should it be extended to allow module information to be added? I would suggest not, because the module-info.java file contains a binding part of your API, and that is best expressed in .java code, not metadata like pom.xml.
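Point 1 above, the unnamed module, can be observed directly. The sketch below assumes the class is compiled and run from the classpath; the class name UnnamedModuleSketch is hypothetical.

```java
// Hypothetical demo class; assumed to be run from the classpath.
final class UnnamedModuleSketch {

    // Describes the module a class belongs to.
    static String describe(Class<?> type) {
        Module module = type.getModule();
        return module.isNamed() ? "named module " + module.getName() : "unnamed module";
    }

    public static void main(String[] args) {
        // JDK classes always live in named modules
        System.out.println(describe(String.class));
        // classpath code is placed in the unnamed module
        System.out.println(describe(UnnamedModuleSketch.class));
        // the unnamed module reads every other module
        Module self = UnnamedModuleSketch.class.getModule();
        System.out.println(self.canRead(Object.class.getModule()));
    }
}
```

The same class placed in a modular jar on the modulepath would instead report the name given in its module-info.java.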

For those wanting a book to read in much more depth, try this one from Nicolai.

Summary

Do not get too excited about JPMS - modules in Java 9. The above is only a summary of what is possible with the module-info.java file and the restrictions of the JPMS. If you are thinking of modularizing your library or application, please wait a little longer until everything becomes a little clearer.

Feedback welcome, but bear in mind that I am planning more articles.

Tuesday, 7 February 2017

Java Time (JSR-310) enhancements in Java SE 9

The java.time.* API (JSR-310) was added to Java SE 8, but what has been going on since then?

Java Time in Java SE 9

There are currently 117 java time issues targeted at Java SE 9. Most of these are not especially interesting, with a lot of mistakes in the Javadoc that needed fixing. What follows are some of the interesting ones:

Main enhancements:

JDK-8146218 - Add LocalDate.datesUntil method producing Stream.
Adds two new methods - LocalDate.datesUntil(LocalDate) and LocalDate.datesUntil(LocalDate,Period) - returning a stream of dates.

JDK-8068730 - Increase precision of Clock.systemUTC().
The clock in Java - System.currentTimeMillis() - has ticked in milliseconds since Java was first released. With Java SE 9, users of Clock will see higher precision, depending on the available clock of the operating system.

JDK-8071919 - Clock.tickMillis(ZoneId zone) method.
With the system clock now returning higher precision, a new method was added - Clock.tickMillis(ZoneId) - that chops off the extra precision to restore the millisecond ticking behaviour of Java SE 8.

JDK-8030864 - Add efficient getDateTimeMillis method to java.time.
This adds two methods named epochSecond to Chronology that have no object creation to convert date-time fields to an epoch-second.

JDK-8142936 - Duration methods for days, hours, minutes, seconds, etc.
The Java SE 8 API of Duration turned out to be incomplete for certain use cases. This change adds a slew of new methods that allow parts of the duration to be reliably returned.

JDK-8148849 - Truncating Duration.
Adds a method Duration.truncatedTo(TemporalUnit) to allow truncation, similar to the existing method on Instant.

JDK-8032510 - Add Duration.dividedBy(Duration).
A new method to allow a duration to be divided by another duration.

JDK-8133079 - LocalDate and LocalTime ofInstant() factory methods.
Add new factory methods in LocalDate and LocalTime to simplify conversion from Instant.

JDK-8143413 - Add toEpochSecond methods for efficient access.
Add methods to LocalDate, LocalTime and OffsetTime to optimize conversion to epoch-seconds.
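A short sketch of several of these additions in use follows. The class JavaTime9Sketch and its helper method names are hypothetical; the java.time methods called are the Java SE 9 additions listed above.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class exercising some of the Java SE 9 additions.
final class JavaTime9Sketch {

    // LocalDate.datesUntil: each date from start (inclusive) to end (exclusive)
    static List<LocalDate> datesBetween(LocalDate start, LocalDate endExclusive) {
        return start.datesUntil(endExclusive).collect(Collectors.toList());
    }

    // Duration.toXxxPart: split a duration into its day/hour/minute/second parts
    static String parts(Duration d) {
        return d.toDaysPart() + "d " + d.toHoursPart() + "h "
                + d.toMinutesPart() + "m " + d.toSecondsPart() + "s";
    }

    public static void main(String[] args) {
        Duration d = Duration.ofSeconds(93_784);
        System.out.println(datesBetween(LocalDate.of(2017, 5, 1), LocalDate.of(2017, 5, 4)));
        System.out.println(parts(d));                          // 1d 2h 3m 4s
        System.out.println(d.truncatedTo(ChronoUnit.HOURS));   // PT26H
        System.out.println(d.dividedBy(Duration.ofHours(1)));  // 26
        System.out.println(LocalDate.ofInstant(Instant.EPOCH, ZoneOffset.UTC)); // 1970-01-01
        System.out.println(LocalDate.of(1970, 1, 2)
                .toEpochSecond(LocalTime.MIDNIGHT, ZoneOffset.UTC));            // 86400
        // a clock that ticks in whole milliseconds, as in Java SE 8
        System.out.println(Clock.tickMillis(ZoneOffset.UTC).instant());
    }
}
```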

Formatting:

JDK-8148947 - DateTimeFormatter pattern letter 'g'.
This adds a pattern letter for modified Julian days.

JDK-8155823 - Add date-time patterns 'v' and 'vvvv'.
This adds support for the "generic non-location" format for time-zones as defined by CLDR, such as "Pacific Time" (the format ignores daylight saving time). Methods were also added to the formatter builder - DateTimeFormatterBuilder.appendGenericZoneText().

JDK-8066806 - DateTimeFormatter cannot parse offset with single digit hour.
The formatter is extended to support 11 more time-zone offset formats, including single digit hours such as +2:00.

JDK-8031085 - DateTimeFormatter won't parse format "yyyyMMddHHmmssSSS".
This extends support for adjacent value parsing to fractions. Where in Java SE 8 a pattern like this looks like it should work, but doesn't, in Java SE 9 it just works.

JDK-8145633 - Adjacent value parsing for Localized Patterns.
This extends support for adjacent value parsing to localized patterns, such as week-based-year and week of year.

JDK-8148949 - DateTimeFormatter pattern letters 'A','n','N'.
These patterns were altered to be more flexible and produce fewer errors.
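Three of the formatting changes can be sketched as follows. The class Formatting9Sketch and its method names are hypothetical, and the offset pattern "+H:mm" is assumed here to be one of the newly supported formats that accepts a single digit hour.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;

// Hypothetical demo class for some Java SE 9 formatter changes.
final class Formatting9Sketch {

    // JDK-8031085: adjacent value parsing now extends to the fraction
    static LocalDateTime parseCompact(String text) {
        return LocalDateTime.parse(text, DateTimeFormatter.ofPattern("yyyyMMddHHmmssSSS"));
    }

    // JDK-8066806: offset formats that allow a single digit hour
    static ZoneOffset parseShortOffset(String text) {
        DateTimeFormatter formatter = new DateTimeFormatterBuilder()
                .appendOffset("+H:mm", "Z")
                .toFormatter();
        return formatter.parse(text, ZoneOffset::from);
    }

    // JDK-8148947: pattern letter 'g' for the modified Julian day
    static String modifiedJulianDay(LocalDate date) {
        return DateTimeFormatter.ofPattern("g").format(date);
    }

    public static void main(String[] args) {
        System.out.println(parseCompact("20170509123456789"));
        System.out.println(parseShortOffset("+2:00"));
        // the modified Julian day count starts at 1858-11-17
        System.out.println(modifiedJulianDay(LocalDate.of(1858, 11, 17)));
    }
}
```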

Performance:

JDK-8073394 - Clock.systemUTC() should return a constant.
This change avoids creating unnecessary objects when using Clock.

JDK-8074003 - ZoneRules.getOffset(Instant) can be optimized.
This change reduces object churn when looking up time-zone data.

JDK-8074002 - ZoneId.systemDefault() should be faster.
This enhancement uses a clever approach to cache the time-zone while handling TimeZone.setDefault.

JDK-8068803 - Performance of LocalDate.plusDays could be better.
This optimizes LocalDate.plusDays for the common case of adding a small number of days.

JDK-8066291 - ZoneIdPrinterParser can be optimized.
The method ZoneRulesProvider.getAvailableZoneIds() now returns an immutable set, not a mutable one. Since most user code calls ZoneId.getAvailableZoneIds(), it will be unaffected.

Thanks

Thanks for all these bug fixes and enhancements go to the many contributors, both inside and outside Oracle. (I've mostly been a reviewer, rather than an author, which has worked pretty well overall.)

Summary

The java.time.* API keeps on moving forward. Any feedback or other enhancement suggestions are welcome!

Monday, 26 September 2016

Code generating beans - mutable and immutable

Java has long suffered from the pain of beans. To declare a simple data class takes far too much code. At JavaOne 2016, I talked about code generation options - see the slides.

Code generation of mutable and immutable beans

The Java ecosystem is massive. So many libraries have been released, as open source and beyond, which naturally leads to the question of how those libraries communicate. And it is the basic concept of beans that is the essential glue, despite the ancient specification. How do ORMs (Hibernate etc.), Serialization (Jackson etc.) and Bean Mappers (Dozer etc.) communicate? Via getters and setters.

The essential features of beans have moved beyond the JavaBeans spec, and are sometimes referred to as POJOs. The features are:

  • Mutable
  • No-args constructor
  • Getters and Setters
  • equals() / hashCode() / toString()

But writing these manually is slow, tedious and error-prone. Code generation should be able to help us here.

But should we be using mutable beans in 2016? No, no, no!

It is time to be writing immutable data structures (immutable beans). But the only practical way to do so is code generation, especially if you want to have builders.
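To give a feel for the boilerplate involved, here is a minimal hand-written immutable bean with a builder. The PersonBean type and its two fields are assumptions made up for illustration; this is not the output of any of the tools discussed below.

```java
import java.util.Objects;

// Hypothetical immutable bean with two properties, written by hand.
final class PersonBean {
    private final String name;
    private final int age;

    private PersonBean(Builder builder) {
        this.name = Objects.requireNonNull(builder.name, "name");
        this.age = builder.age;
    }

    static Builder builder() {
        return new Builder();
    }

    String getName() {
        return name;
    }

    int getAge() {
        return age;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof PersonBean)) {
            return false;
        }
        PersonBean other = (PersonBean) obj;
        return name.equals(other.name) && age == other.age;
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }

    @Override
    public String toString() {
        return "PersonBean{name=" + name + ", age=" + age + "}";
    }

    // Mutable builder, the only way to create an instance
    static final class Builder {
        private String name;
        private int age;

        Builder name(String name) {
            this.name = name;
            return this;
        }

        Builder age(int age) {
            this.age = age;
            return this;
        }

        PersonBean build() {
            return new PersonBean(this);
        }
    }
}
```

Every new property means touching the field, the constructor, a getter, equals, hashCode, toString and the builder, which is the maintenance burden a code generator is meant to remove.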

In my talk at JavaOne 2016, I considered various code generation approaches:

IDE code generation

This is fine as far as it goes, but while the code is likely to be correct immediately after generation, there is still no guarantee that the generated code will stay correct as the class is maintained over time.

AutoValue, Immutables and VALJOGen

These three projects - AutoValue, Immutables, VALJOGen - use annotation processors to generate code during compilation. The idea is simple - the developer writes an abstract class or interface, and the tool generates the implementation at compile time. However, these tools all focus on immutable beans, not mutable (Immutables can generate a modifiable bean, but it doesn't match the JavaBeans spec, so many tools will reject it).

On the up side, there is no chance to mess up the equals / hashCode / toString. While the tools all allow the methods to be manually written if necessary, most of the time, the default is what you want. It is also great not to have to implement the immutable builder class manually.

On the down side, you as the developer have to write abstract methods, not fields. A method is a few more keystrokes than a field, and the Javadoc requires an @return line too. With AutoValue, this is particularly painful, as you have to write the outline of the builder class. With Immutables, there is no need for this.

Of the three projects, AutoValue provides a straightforward simple tool that works, and where the implementation class is hidden (package scoped). Immutables provides a full-featured tool, with many options and ways to generate. By default, the implementation class is publicly visible and used by callers, but there are ways to make it package-scoped (with more code written by you). VALJOGen allows full customisation of the generation template. There is no doubt that Immutables is the most comprehensive of the three projects.

All three annotation processing tools must be setup before use. In general, adding the tool to Maven will do most of the work (Maven support is good). For Eclipse and IntelliJ, these instructions are comprehensive.

Lombok

The Lombok project also uses annotations to control code generation. However, instead of acting as an annotation processor, it hacks into the internal APIs of Eclipse and the Java compiler.

This approach allows code to be generated within the same class, avoiding the need for developers to work with an abstract class or interface. This means that instead of writing abstract methods, the developer writes fields, which is a more natural thing to do.

The key question with Lombok is not what it generates, but the way it generates it. If you are willing to accept the hacky approach, IDE limitations, and the inability to debug into the generated code, then it is a neat enough solution.

For Eclipse, Lombok must be installed, which is fairly painless as there is a GUI. Other tools require other installation approaches, see this page.

Joda-Beans

The Joda-Beans project takes a third approach to code generation. It is a source code regenerator, creating code within the same source file, identified by "autogenerated start/end" comments.

Developers write fields, not abstract methods, which is simpler and less code. The code is also generated into the same class, which can be final if desired.

One key benefit of generating all the code to the same class is that the code is entirely valid when checked out. There is no need to install a plugin or configure your IDE in any way.

Unlike the other choices, Joda-Beans also provides functionality at runtime. It aims to add C# style properties to Java. What this means in practice is that you can easily treat a bean as a set of properties, loop over the properties and create instances using a standardized builder. These features are the ideal building block for serialization frameworks, and Joda-Beans provides XML, JSON and binary serialization that operates using the properties, generally without reflection. The trade off here is that Joda-Beans is a runtime dependency, so it is only the best option if you use the additional properties feature.

The Joda-Beans regenerator can be run from Maven using a plugin. If you use the standard Eclipse Maven support, then simply saving the file in Eclipse will regenerate it. There is also a Gradle plugin.

Comparing

Most of the projects have some appeal.

  • AutoValue is a simple annotation processor that hides the implementation class, but requires more code to trigger it.
  • Immutables is a flexible and comprehensive annotation processor that can be used in many ways.
  • Lombok requires the least coding by the developer, but the trade-off is the implementation via internal APIs.
  • Joda-Beans is different in that it has a runtime dependency adding C# style properties to Java, allowing code to reliably loop over beans.

My preference is Joda-Beans (which I wrote), because I like the fact that the generated code is in the same source file, so callers see a normal final class, not an interface or abstract class. It also means that it compiles immediately in an IDE that has not been configured when checked out from source control. But Joda-Beans should really only be used if you understand the value of the properties support it provides.

If I was to pick another tool, I would use Immutables. It is comprehensive, and providing you invest the time to choose the best way to generate beans for your needs, it should have everything you need.

Finally, it is important that readers have a chance to look at the code written and the code generated. To do this, I have created the compare-beangen GitHub project. This project contains source code for all the tools above and more, demonstrating what you have to write.

To make best use of the project, check it out and import it into your IDE. That way, you will experience what code generation means, and how practical it is to work with it. (For example, see what happens when you rename a field/method/class. Does the code generator cope?)

Summary

It is time to start writing and using immutable beans instead of mutable ones. The Strata open source market risk project (my day job) has no mutable beans, so it is perfectly possible to do. But to use immutable beans, you are going to need a code generator, as they are too painful to use otherwise.

This blog has summarised five code generators, and provided a nice GitHub project for you to do your own comparisons.

Tuesday, 20 September 2016

Private methods in interfaces in Java 9

Java SE 9 is slowly moving towards the finishing line. One new feature is private methods on interfaces.

Private methods on interfaces in Java 9

In Java 7 and all earlier versions, interfaces were simple. They could only contain public abstract methods.

Java 8 changed this. From Java 8, you can have public static methods and public default methods.

public interface HolidayCalendar {
  // static method, to get the calendar by identifier
  public static HolidayCalendar of(String id) {
    return Util.holidayCalendar(id);
  }
  
  // abstract method, to find if the date is a holiday
  public abstract boolean isHoliday(LocalDate date);
  
  // default method, using isHoliday()
  public default boolean isBusinessDay(LocalDate date) {
    return !isHoliday(date);
  }
}

Note that I have chosen to use the full declaration, with "public" on all three methods even though it is not required. I have argued that this is best practice for Java SE 8, because it makes the code clearer (now there are three types of method) and prepares for a time when there will be non-public methods.

And that time is very soon, as Java 9 is adding private methods on interfaces.

public interface HolidayCalendar {
  // example of a private interface method
  private void validateDate(LocalDate date) {
    if (date.isBefore(LocalDate.of(1970, 1, 1))) {
      throw new IllegalArgumentException();
    }
  }
}

Thus, methods can be public or private (with the default being public if not specified). Private methods can be static or instance. In both cases, the private method is not inherited by sub-interfaces or implementations. The valid combinations of modifiers in Java 9 will be as follows:

  • public static - supported
  • public abstract - supported
  • public default - supported
  • private static - supported
  • private abstract - compile error
  • private default - compile error
  • private - supported
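As a sketch of how the supported combinations fit together, the interface below shares logic between two default methods using one private instance method and one private static method. The interface HolidayCalendarSketch and its weekend rule are hypothetical, made up for this illustration.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.util.Objects;

// Hypothetical interface showing private methods supporting default methods.
interface HolidayCalendarSketch {

    // public abstract - the one method implementations must provide
    public abstract boolean isHoliday(LocalDate date);

    // public default - delegates to the private instance method
    public default boolean isBusinessDay(LocalDate date) {
        return isWorkingDay(Objects.requireNonNull(date, "date"));
    }

    // public default - reuses the same private logic
    public default LocalDate nextBusinessDay(LocalDate date) {
        LocalDate next = date.plusDays(1);
        while (!isWorkingDay(next)) {
            next = next.plusDays(1);
        }
        return next;
    }

    // private - instance method, can call the abstract method
    private boolean isWorkingDay(LocalDate date) {
        return !isWeekend(date) && !isHoliday(date);
    }

    // private static - helper with no access to instance methods
    private static boolean isWeekend(LocalDate date) {
        DayOfWeek dow = date.getDayOfWeek();
        return dow == DayOfWeek.SATURDAY || dow == DayOfWeek.SUNDAY;
    }
}
```

Implementations and callers only ever see isHoliday, isBusinessDay and nextBusinessDay; the two private methods stay an implementation detail of the interface.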

Private methods on interfaces will be very useful in rounding out the functionality added in Java 8.