Tuesday, 9 May 2017

Java SE 9 - JPMS automatic modules

This article in my series on the Java Platform Module System (JPMS) is focussed on automatic modules. JPMS was previously known as Project Jigsaw and is the module system in Java SE 9. See also Module basics, Module naming and Modules & Artifacts.

Automatic modules

Let's say you are in charge of Java, and after 20 years you want to add a module system to the platform. As well as the problems of designing the module system itself, you have to consider migration of all the existing code written in Java (and to a degree, other JVM languages).

The solution to this that JPMS has chosen is automatic modules. Unfortunately, my opinion is that it is the wrong solution.

To understand automatic modules, we have to start by looking at how jar files will be specified in future. In addition to the classpath, Java SE 9 will also have a modulepath. The basic idea is that modules (jar files containing module-info.class) will be placed on the modulepath, not the classpath. In fact, placing a module on the classpath will cause the module declaration (module-info.class) to be completely ignored, which is usually not what you want.

As a basic rule, the modulepath cannot see the classpath. If you create a module and put it on the modulepath, all its dependencies must also be on the modulepath. Thus, in order to write a module at all, all the dependencies must also have been converted to be modules. And many of those dependencies are likely to be open source projects, with varying release schedules.

Clearly, this is a bit of a problem. Essentially, it would mean that an application would need to wait until every dependency had become a module before it could add module-info.java. The "solution" to this is automatic modules.

An automatic module is a normal jar file - one without a module-info.class file - that is placed on the modulepath. Thus the modulepath will contain two types of module - "real" and "automatic". Since there is no module-info.class, an automatic module is missing the metadata normally associated with a module:

  • no module name
  • no list of exported packages
  • no list of dependencies

Unsurprisingly, the list of exported packages is simply set to be all packages in the jar file. The list of dependencies is set to be everything on the modulepath, plus the whole classpath. As such, automatic modules allow the modulepath to depend on the classpath, something which is normally not allowed. While not ideal, both of these defaults for the metadata make sense.

The final missing piece of information is the module name. This has been a big point of discussion in Project Jigsaw. As it stands, the module name will be derived from the filename of the jar file. For me, this is a critical problem with the basic design of automatic modules (but see the mitigation section below).
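To make the filename rule concrete, here is a minimal sketch of the derivation, assuming the rules described in the documentation for ModuleFinder.of: drop the ".jar" suffix, cut the name at the first hyphen that starts a version number, then turn every non-alphanumeric character into a dot. The class name, method name and jar filenames are my own illustrative choices, and the real implementation may treat edge cases differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch only: approximates how an automatic module name is derived from a jar filename. */
final class AutomaticModuleNames {

    // A hyphen followed by digits and then a dot (or the end of the name) starts the version
    private static final Pattern VERSION_START = Pattern.compile("-(\\d+(\\.|$))");

    static String fromFilename(String jarFilename) {
        // 1. Drop the ".jar" suffix
        String name = jarFilename.endsWith(".jar")
                ? jarFilename.substring(0, jarFilename.length() - 4)
                : jarFilename;
        // 2. Keep only the part before the version, if a version is found
        Matcher matcher = VERSION_START.matcher(name);
        if (matcher.find()) {
            name = name.substring(0, matcher.start());
        }
        // 3. Non-alphanumerics become dots, repeated dots collapse, edge dots are removed
        return name.replaceAll("[^A-Za-z0-9]", ".")
                .replaceAll("\\.{2,}", ".")
                .replaceAll("^\\.|\\.$", "");
    }

    public static void main(String[] args) {
        System.out.println(fromFilename("guava-19.0.jar"));          // prints "guava"
        System.out.println(fromFilename("joda-convert-1.8.1.jar"));  // prints "joda.convert"
    }
}
```

Note that nothing in this derivation looks inside the jar file, which is why the resulting name has no connection to the package names it contains.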

As per my last blog, modules are not artifacts. But the filename of a jar file is usually the Maven artifactId, a name disconnected from the module name (which should be the super-package reverse DNS). For example, the filename of Google Guava is guava while the correct module name would be com.google.common.

Taken in isolation, this all works fine. The application can now be modularised, and I can depend on a project that is not modularised:

 module com.foo.myapp {
  requires guava;  // guava not yet modularised, so use the filename
 }

But in order to work, the module-info.java file must specify the dependency as being on the filename, not the module name.

The astute will note that depending on the module name is generally not possible until the dependency is a module. But equally, depending on the filename is using a name that is going to be wrong once the dependency is turned into a module:

 module com.foo.myapp {
  requires com.google.common;  // now guava has been modularised
 }

It is this change of name that is at the heart of the problem. Essentially it means that the name used to define the dependencies of your library/application will change once they are modularised. While this name change is fine in a private codebase, it will be hell in the world of Java open source.

Impact of automatic modules

To fully understand how automatic modules affect the open source Java community, it is best to look at a use case. Let's consider what happens if an open source project is released that depends on a filename instead of a module name. And then another open source project is released that depends on that:

Project       Version  Module name           Requires
Strata        v1       com.opengamma.strata  org.joda.convert
                                             guava (a filename)
Joda-Convert  v1       org.joda.convert      guava (a filename)
Guava         v1       (not a module yet)

What we now have is a graph of three projects, where the lowest is an automatic module, and the next two are real modules that depend on an automatic module. When Guava is finally modularised, a new release will occur. But Strata and Joda-Convert cannot immediately use the new release, because the module name they reference is now wrong:

Project       Version  Module name           Requires
Strata        v1       com.opengamma.strata  org.joda.convert
                                             guava (a filename)
Joda-Convert  v1       org.joda.convert      guava (a filename)
Guava         v2       com.google.common
  Module Hell - "guava" != "com.google.common"

As can be seen, this setup does not work. We have Module Hell. The top two projects depend on "guava", not "com.google.common". And there is no way to have the same packages loaded under two different module names.

What happens if Joda-Convert is updated to match the new Guava? (Strata is not updated)

Project       Version  Module name           Requires
Strata        v1       com.opengamma.strata  org.joda.convert
                                             guava (a filename)
Joda-Convert  v2       org.joda.convert      com.google.common
Guava         v2       com.google.common
  Module Hell - "guava" != "com.google.common"

This configuration does not work. There is no way to have the same packages loaded under two different module names.

What happens if Strata is updated to match the new Guava? (Joda-Convert is not updated)

Project       Version  Module name           Requires
Strata        v2       com.opengamma.strata  org.joda.convert
                                             com.google.common
Joda-Convert  v1       org.joda.convert      guava (a filename)
Guava         v2       com.google.common
  Module Hell - "guava" != "com.google.common"

This configuration also does not work. There is no way to have the same packages loaded under two different module names.

The only way to get it to work is to update the whole stack together:

Project       Version  Module name           Requires
Strata        v2       com.opengamma.strata  org.joda.convert
                                             com.google.common
Joda-Convert  v2       org.joda.convert      com.google.common
Guava         v2       com.google.common

To summarise, when a module depends on an automatic module, and that module is then depended on by others, the whole stack becomes linked. Everything in the stack has to go from v1 to v2 together.
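The same reasoning can be checked with a toy model. This is only a sketch of name matching, not the real JPMS resolver: each configuration is a map from module name to the names it requires, and a requirement is unresolved when no module in the configuration has that name. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of name-based resolution for the Strata / Joda-Convert / Guava stack. */
final class ModuleHellDemo {

    /** Returns each "module -> required name" pair that no module in the configuration satisfies. */
    static Set<String> unresolved(Map<String, List<String>> requiresByModule) {
        Set<String> missing = new LinkedHashSet<>();
        requiresByModule.forEach((module, requires) -> {
            for (String required : requires) {
                if (!requiresByModule.containsKey(required)) {
                    missing.add(module + " -> " + required);
                }
            }
        });
        return missing;
    }

    /** Builds the three-project stack, given the Guava name each project uses. */
    static Map<String, List<String>> stack(String strataRequires, String jodaRequires, String guavaName) {
        Map<String, List<String>> modules = new LinkedHashMap<>();
        modules.put("com.opengamma.strata", List.of("org.joda.convert", strataRequires));
        modules.put("org.joda.convert", List.of(jodaRequires));
        modules.put(guavaName, List.of());
        return modules;
    }

    public static void main(String[] args) {
        String file = "guava";               // name derived from the filename
        String real = "com.google.common";   // name once Guava is modularised
        System.out.println(unresolved(stack(file, file, file)));  // all v1: prints []
        System.out.println(unresolved(stack(file, file, real)));  // only Guava at v2
        System.out.println(unresolved(stack(file, real, real)));  // Strata left at v1
        System.out.println(unresolved(stack(real, file, real)));  // Joda-Convert left at v1
        System.out.println(unresolved(stack(real, real, real)));  // all v2: prints []
    }
}
```

Only the first and last configurations come back with nothing unresolved, which matches the tables above: the stack works when everything is at v1 or everything is at v2, and nowhere in between.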

Looking back at the whole example, it should be clear that the problem started right back at the beginning. There never should have been a v1 release of Strata that depended on a filename "guava". Or a v1 release of Joda-Convert that depended on a filename. Instead, Strata should have waited until both Guava and Joda-Convert had released a modularised v2. And, Joda-Convert should have waited until Guava had released a modularised v2. To avoid Module Hell, migration must occur from bottom to top.

Given this, my opinion is that automatic modules do not by themselves provide a viable migration path for the Java open source community. The rule is as follows:

Do not release to Maven Central a modular jar file that expresses a dependency on a filename. Instead, wait until all dependencies can be expressed as module names.

Any jar file in Maven Central that expresses a dependency on a filename will be a cause of Module Hell.

Mitigation

Without any mitigation, the community would have to modularise each open source library one by one from the bottom of the stack upwards. No open source project could do anything until all its dependencies are modularised - a bottom-up migration.

The main piece of mitigation proposed for this problem is that jar files can have a new entry in MANIFEST.MF called "Automatic-Module-Name". When JPMS examines an automatic module, if the MANIFEST.MF entry is present then it uses the value as the module name instead of the filename.
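As a rough illustration, the effect of the entry can be observed through the java.lang.module API on a Java 9 or later runtime. The sketch below writes a jar containing nothing but a manifest into a temporary directory and asks ModuleFinder what name it gives the resulting automatic module. The class name, method name and the "guava-19.0.jar" filename are assumptions for the example, and the temporary directory is not cleaned up.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

/** Sketch: shows which name JPMS assigns to a jar with or without Automatic-Module-Name. */
final class AutomaticModuleNameDemo {

    /** Creates a manifest-only jar and returns the module name that ModuleFinder reports for it. */
    static String observedName(String jarFilename, String automaticModuleName) {
        try {
            Path dir = Files.createTempDirectory("automatic-modules");
            Manifest manifest = new Manifest();
            Attributes attrs = manifest.getMainAttributes();
            // Manifest-Version must be set, otherwise the main attributes are not written
            attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
            if (automaticModuleName != null) {
                attrs.putValue("Automatic-Module-Name", automaticModuleName);
            }
            try (OutputStream out = Files.newOutputStream(dir.resolve(jarFilename));
                 JarOutputStream jar = new JarOutputStream(out, manifest)) {
                // no class files are needed to observe the module name
            }
            // The directory holds exactly one jar, so exactly one module is found
            ModuleReference ref = ModuleFinder.of(dir).findAll().iterator().next();
            return ref.descriptor().name();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(observedName("guava-19.0.jar", null));
        System.out.println(observedName("guava-19.0.jar", "com.google.common"));
    }
}
```

I would expect the first call to report the filename-derived name and the second to report the value from the manifest, even though the jar filename is identical in both cases.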

This can be used to break the cycle to a degree. In the case above, the Strata team can release a version at any time with the new MANIFEST.MF entry, essentially stating what module name Strata is going to have in the future. Similarly, Joda-Convert can release at any time with the MANIFEST.MF entry stating what its module name will be. Full modularisation will still need to proceed from the bottom up, but anyone depending on Joda-Convert or Strata can safely release without being affected by the question over Guava's module name.

The key difference between adding the MANIFEST.MF entry and adding a module declaration is that the MANIFEST.MF entry does not need the names of the dependencies to be specified. As such, there is no need to wait for the dependencies to be modularised before adding the MANIFEST.MF entry.

The rule outlined above can thus also be expressed as:

Do not release to Maven Central a modular jar file that depends on an automatic module, unless the automatic module has an "Automatic-Module-Name" MANIFEST.MF entry.
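One way to apply this rule before a release is to check where each dependency's module name would come from. The following is a minimal sketch under my own assumptions: the class, enum and method names are hypothetical, it only looks for module-info.class at the root of the jar (so it ignores multi-release jars), and it reads the manifest directly rather than asking JPMS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

/** Sketch: classifies a dependency jar by the source of its module name. */
final class DependencyNameCheck {

    enum NameSource { MODULE_DECLARATION, MANIFEST_ENTRY, FILENAME }

    static NameSource nameSource(Path jarPath) {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            // A real module carries its name in module-info.class
            if (jar.getEntry("module-info.class") != null) {
                return NameSource.MODULE_DECLARATION;
            }
            // An automatic module may declare its future name in the manifest
            Manifest manifest = jar.getManifest();
            if (manifest != null
                    && manifest.getMainAttributes().getValue("Automatic-Module-Name") != null) {
                return NameSource.MANIFEST_ENTRY;
            }
            // Otherwise the name can only come from the filename
            return NameSource.FILENAME;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    /** A dependency is safe to name in a published module declaration unless its name is filename-derived. */
    static boolean safeToRequire(Path jarPath) {
        return nameSource(jarPath) != NameSource.FILENAME;
    }

    public static void main(String[] args) {
        // Pass the paths of the dependency jars as arguments
        for (String arg : args) {
            Path jar = Paths.get(arg);
            System.out.println(jar.getFileName() + " -> " + nameSource(jar));
        }
    }
}
```

Any dependency reported as FILENAME is one that, by the rule above, should not be named in a module declaration published to Maven Central.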

To be clear, I don't think this is ideal, but we are where we are. The message to the open source community therefore comes in two parts.

Firstly, do not add a module-info.java module declaration until:

  • all of your runtime dependencies have been modularised (either as a full module or with a MANIFEST.MF entry)
  • all those modularised dependencies have been released to Maven Central
  • your library depends on the updated versions

Secondly, if you can't meet these criteria, but your project is well structured and otherwise suitable for modularisation, please add a MANIFEST.MF entry following the agreed module naming conventions (super-package reverse-DNS).

If everyone does this, then we stand a reasonable chance of avoiding Module Hell.

Summary

Automatic modules allow modules to depend on non-modules. But this is achieved by specifying a requires clause on the filename, not the module name. This will cause pain later if modules are published depending on the filename. A new MANIFEST.MF entry allows any open source project to choose a module name and publish that choice immediately. When JPMS sees the MANIFEST.MF entry, it will use the value as the name for the automatic module.

Community members must at all costs avoid publishing modular jar files that depend on filenames. But community members can publish jar files containing the new MANIFEST.MF entry from now on (although technically the MANIFEST.MF entry is not yet finalised, it probably will be soon).

Comments and feedback welcome.