Monday, 9 July 2018

Upgrading to Eclipse Photon

I use Eclipse as my Java IDE, and the new release, Photon, is now out.

Photon is a large release, with lots of new features. The most important is the separation of the test and main classpaths, which has always been a point of pain in the IDE. Now it just works as you would expect, and the Maven plugin M2E correctly sets it up:

Note the darker colour of the src/test classpath elements.

Support for Java 9 (modules) and Java 10 (local variable type inference) is also present, ready for Java 11 in September. You can also use JUnit 5. It even tries to help you reach 100% code coverage!

All in all, I feel this is a release where upgrading will make a difference to everyday coding.

I've upgraded my own Eclipse installations, and it all went pretty well. You can either start from a clean install (I prefer the basic IDE without plugins so I can choose which ones to add), or add Photon as an update site and let Eclipse update itself.

One problem I had was the plugin that connects Maven (M2E) to Checkstyle (Eclipse-CS), known as m2e-code-quality. Fortunately, the team at GEBIT have been maintaining a fork of the original plugin. However, they don't release it in binary form. As such, I had to build the plugin locally (no big deal - it's a simple build).

To simplify the process however, I've created a repository on GitHub with my Eclipse setup files, and a binary zip of the GEBIT forked plugin.

To use just the m2e-code-quality GEBIT fork, download the zip file and add it as an update site. Here are some instructions.

Thank you Eclipse team for a great release!


PS. I won't be answering "how to" questions about upgrading Eclipse or the eclipse-setup repository. There are plenty of other places to ask questions, such as Stack Overflow or the Eclipse Forums.

Thursday, 22 March 2018

JPMS modules for library developers - negative benefits

Java 9 introduced a major new feature - JPMS, the Java Platform Module System. After six months I've come to the conclusion that JPMS currently offers "negative benefits" to open source library developers. Read on to understand why.

Modules for library developers

Java 8 is probably the most successful Java release ever. It is widely used and widely liked. As such, almost all open source libraries run on Java 8 (as library authors want their code to be used!). Some libraries with a long history also still run on older versions. Joda-Convert has a Java 6 baseline, while Joda-Time has a Java 5 baseline. Others have a Java 8 baseline, such as ThreeTen-Extra.

Java 9 was released in September 2017, but it is not a release that will be supported for a number of years. Instead, it had a lifetime of six months and is now obsolete because Java 10 is out. And in six months' time Java 11 will be out, making Java 10 obsolete, and so on.

While most releases last six months, some are luckier. Java 11 will be a "long term support" (LTS) release with security and bug support for a few years (Java 8 is also an LTS release). Thus, even though Java 10 is out, Java 8 is still the sensible Java version for open source library developers to target right now because it is the current LTS release.

But what happens when Java 11 comes out? Since Java 8 will be unsupported relatively soon after Java 11 is released, you'd think that the sensible baseline would be 11. Unfortunately I believe many companies will be sticking with Java 8 for a long time. An aggressive open source project might move quickly to a Java 11 baseline, but doing so would be a risky strategy for adoption.

The module-path

Before discussing the JPMS options for open source library developers, it is important to cover the distinction between the class-path and the module-path. The class-path that we all know and love is still present in Java 9+, and it mostly works in the same way.

The module-path is new. When a jar file is on the module-path any module-info is used to apply the new stricter JPMS rules. For example, a public method is no longer callable unless it has been exported from the module it is contained in (and required by the caller's module).
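As a rough illustration of the export rule, the sketch below asks the JDK's own java.base module which of its packages are exported to everyone. The class name ExportCheck is made up for this example. Module.isExported(String) is the standard Java 9+ reflection call, and it only reports unconditional exports, so a package exported to a few named modules still counts as hidden here.

```java
// Sketch only: ExportCheck is a hypothetical helper, not part of any library.
final class ExportCheck {

    // True if the module that owns 'type' exports 'packageName' to all other modules.
    static boolean isExportedToAll(Class<?> type, String packageName) {
        Module module = type.getModule();
        return module.isExported(packageName);
    }

    public static void main(String[] args) {
        // java.lang is part of the public API of java.base
        System.out.println("java.lang exported: "
                + isExportedToAll(Object.class, "java.lang"));
        // jdk.internal.misc is an implementation package of java.base
        System.out.println("jdk.internal.misc exported: "
                + isExportedToAll(Object.class, "jdk.internal.misc"));
    }
}
```

A public class in the second package is still public, but code in other modules cannot call it unless the package is exported to them.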

The basic idea is simple: you put old fashioned non-modular jar files on the class-path, while you put modular jar files on the module-path. Nothing enforces this however, and it turns out this is a bit of a problem. There are thus four possibilities:

  • modular jar on the module-path
  • modular jar on the class-path
  • classic non-modular jar on the module-path
  • classic non-modular jar on the class-path

To be sure your library works correctly, you need to test it both on the class-path and on the module-path. For example, service loading is very different on the module-path compared to the class-path. And some resource lookup methods also work completely differently.
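To make the service loading difference concrete, here is a minimal sketch of a lookup through java.util.ServiceLoader. The Greeter interface and the ServiceLookup class are hypothetical names invented for this example. The lookup code is the same in both environments. What changes is where providers are declared: a META-INF/services file for a jar on the class-path, or a "provides ... with" clause in module-info.java for an explicit module (whose consumers must also declare "uses").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Sketch only: ServiceLookup and Greeter are hypothetical names.
final class ServiceLookup {

    // Hypothetical service interface, used only for this illustration.
    public interface Greeter {
        String greet(String name);
    }

    // Returns the class names of all Greeter providers that ServiceLoader can see.
    static List<String> providerNames() {
        List<String> names = new ArrayList<>();
        for (Greeter greeter : ServiceLoader.load(Greeter.class)) {
            names.add(greeter.getClass().getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // No provider is registered in this sketch, so the list is empty.
        System.out.println("Providers found: " + providerNames());
    }
}
```

A provider registered only through META-INF/services is not picked up from an explicit module on the module-path, which is exactly the kind of difference that only shows up when both environments are tested.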

To complicate this further, JPMS has no explicit support for testing. In order to test a module on the module-path (which is a tightly locked down set of packages) you have to use the --patch-module command line flag. This flag effectively extends the module, adding the testing packages into the same module as the classes under test. (If you only test the public API, you can do this without using patch-module, but in Maven you'd need a whole new project and pom.xml to achieve that approach, so it's likely to be rare.)

In the latest Maven surefire plugin (v2.21.0 and later) the patch-module flag is used, but if your module has optional dependencies, or you have additional testing dependencies, you may have to manually add them, see this issue and this issue.

Given all this, what should an open source library developer do?

Option 1, do nothing

In most cases, but not all, code that is compiled on Java 8 or earlier will run just fine on the class-path in Java 9+. So, you can do nothing and ignore JPMS.

The problem is that other projects will depend on your library. By not adopting JPMS at all, you block those projects from progressing in their modularization. (A project can only choose to fully modularize once all of its dependencies are modularized.)

Of course if your code doesn't run on Java 9+ because you've used sun.misc.Unsafe or something else you shouldn't have done then you've got other things to fix.

And don't forget that a user could put your jar file on the class-path or the module-path. Have you tested both? The truth is that "do nothing" is not possible - at a minimum you have extra testing to do, even if it is just to document that your project doesn't work on the module-path.
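One small aid when testing both environments is to have a test report which one it is actually running in. The sketch below uses Class.getModule() and Module.isNamed() from Java 9+; the PathKind class name is an assumption made for this example. Classes loaded from the class-path live in an unnamed module, while a jar on the module-path always gets a named module.

```java
// Sketch only: PathKind is a hypothetical helper for test diagnostics.
final class PathKind {

    // Describes whether 'type' was loaded into a named module or the unnamed one.
    static String describe(Class<?> type) {
        Module module = type.getModule();
        if (module.isNamed()) {
            return "named module " + module.getName();
        }
        return "unnamed module (class-path)";
    }

    public static void main(String[] args) {
        // JDK classes are always in named modules
        System.out.println(describe(String.class));
        // The result for this class depends on how the program was launched
        System.out.println(describe(PathKind.class));
    }
}
```

Logging this at the start of a test run makes it obvious when a build has quietly tested only one of the two environments.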

Option 2, add a module name

Java 9+ recognises a new entry in the MANIFEST.MF file. The Automatic-Module-Name entry allows a jar file to declare what name it will use if/when it is turned into a proper modular jar file. Here is how you can use Maven to add it:

 <plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <configuration>
   <archive>
    <manifestEntries>
     <Automatic-Module-Name>org.foo.bar</Automatic-Module-Name>
    </manifestEntries>
   </archive>
  </configuration>
 </plugin>

This is a nice simple way to move forward. It reserves the module name and allows other projects that depend on your jar file to fully modularize if they wish.
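To check that a build really wrote the entry, the manifest can be read back with the standard java.util.jar.Manifest class. This is a minimal sketch: the ManifestModuleName class is a made-up name, and the manifest text is supplied inline rather than read from a real jar file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Manifest;

// Sketch only: ManifestModuleName is a hypothetical helper.
final class ManifestModuleName {

    // Parses manifest text and returns the Automatic-Module-Name entry, or null if absent.
    static String automaticModuleName(String manifestText) {
        byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
        try {
            Manifest manifest = new Manifest(new ByteArrayInputStream(bytes));
            return manifest.getMainAttributes().getValue("Automatic-Module-Name");
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        String text = "Manifest-Version: 1.0\r\n"
                + "Automatic-Module-Name: org.foo.bar\r\n"
                + "\r\n";
        System.out.println(automaticModuleName(text));
    }
}
```

For a jar on disk, java.util.jar.JarFile.getManifest() returns the same Manifest type, so the attribute lookup is unchanged.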

But because it's so simple, it's easy to forget the testing aspect. Again, your jar file might be placed on the class-path or on the module-path, and the two can behave quite differently. In fact, now that it has some module information, tools may treat it differently.

When Maven sees an Automatic-Module-Name it will normally place the classes on the module-path instead of the class-path. This may have no effect, or it may show up a bug where your code works on the class-path but not on the module-path. With Maven right now, you have to use surefire plugin v2.20.1 to test on the class-path (an old version that doesn't know about the module-path). To test on the module-path, use v2.21.0. Swapping between these two versions is of course a manual process, see this issue for a request to improve this.

While upgrading some of my projects I added Automatic-Module-Name without testing on the module-path. When I did eventually test on the module-path the tests failed, as the code simply didn't work on the module-path. Unfortunately, I now have some releases on Maven-Central that have Automatic-Module-Name but don't work on the module-path, happy days...

To emphasise this, just adding something to the MANIFEST.MF file can have an effect on how the project is run and tested. You need to test on both the class-path and module-path.

Option 3, add module-info.java

This is the full modularization approach described in numerous web pages and tutorials on JPMS.

 module org.foo.bar {
   requires org.threeten.extra;  // a module this library depends on
   exports org.foo.bar;          // packages that other modules may use
   exports org.foo.bar.util;
 }

So, what are the implications of doing this to the open source project?

Unlike option 2, your code now has a baseline of Java 9+. The Java 8 compiler won't understand the file. What we really want is a jar file that contains Java 8 class files, but with just the module-info.java file compiled under Java 9+. In theory, when running on Java 8 the module-info.class file will be ignored if it is not used.

Maven has a technique to achieve this. While the technique works OK, it turns out to be nowhere near sufficient to achieve the goal. To actually get a single jar file that works on both Java 8 and 9+ you need:

  • use the release flag on Java 9+ to build for Java 8
  • add an OSGi require capability filter to inform it that it's still Java 8 compatible
  • exclude module-info.java from maven-javadoc-plugin when building on Java 8
  • use maven-javadoc-plugin v3.0.0-M1 (not later), manually copy dependencies to a directory and refer to them using additional Javadoc command line arguments, see this issue
  • exclude module-info.java from maven-checkstyle-plugin
  • exclude module-info.java from maven-pmd-plugin
  • manually swap the version of maven-surefire-plugin to test both the module-path and the class-path

And probably some more I've forgotten about. Here is one pom.xml before integrating Java 9. Here it is after integrating Java 9. An increase from 650 to 862 lines, with lots of complexity, profiles and workarounds.

With a Java 11 baseline, the project would be simpler again, but that baseline isn't going to happen for a number of years. Note that my comments should not be interpreted as anti-Maven. A small team there is working hard to do the best they can - JPMS is complex.

And just for kicks, your project can no longer be used by Android (as the team there seems to be very slow in adding a simple "ignore module-info" rule). And many tools with older versions of bytecode libraries like ASM will fail too - I had a report that a particular version of Tomcat/TomEE could not load the modular jar file. I've ended up having to release a "classic" non-modular jar file to cope with these situations, something which is profoundly depressing.

While I've added module-info.java to some of my projects, I cannot recommend that others do so - it's a very painful and time-consuming process. The time to consider it would appear to be once Java 11 or beyond is widely adopted and the baseline of your project.

Negative benefits

Now for the controversial part.

It is my conclusion that JPMS, as currently designed, has "negative benefits" for open source libraries.

As explained above, the cost of full modularization is high for library developers. The need to retain Java 8 compatibility makes JPMS really hard to use (module information should have been textual, not a class file). The tooling is still incomplete/buggy. Many older projects can't cope with the new jar files if you do go for it. Much of this will improve over time, but we're talking a number of years before Java 11 is widely adopted. But don't be lulled into just believing waiting will solve the key problem.

The split (bifurcation) of the module-path from the class-path is an absolute nightmare. At a stroke, there are now two different ways that your library can be run, and the two environments have quite different qualities. Code that compiles and runs on the class-path will often not compile or not run on the module-path. And vice versa. As a library author, you cannot control whether the class-path or module-path is used. You have no choice - you must test both, which you probably won't think to do. (And Maven currently provides no way to test both in one pom.xml.)

Given all this effort and extra complexity, we should be getting some great benefits, right? Well no.

JPMS is supposed to ensure reliable configuration (that all your dependencies are available at startup) and strong encapsulation (that other code can't see or use packages that you want to keep hidden). But since there is no way to stop your modular jar file being used on the class-path, you get none of these benefits.

Did you put lots of effort into choosing which packages to hide? Meaningless, as the user can just put the jar file on the class-path and call your internal packages. Did you believe that the JVM will check all your dependent modules are available before starting? Afraid not, no checks are performed when the user puts the jar file on the class-path.
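The loss of both benefits on the class-path can be seen through the reflection API. This sketch looks at the unnamed module of the system class loader, which is where jars on the class-path normally end up. The class name ClassPathEncapsulation and the package name org.foo.bar.internal are hypothetical.

```java
// Sketch only: ClassPathEncapsulation is a hypothetical helper.
final class ClassPathEncapsulation {

    // The unnamed module of the system class loader.
    static Module classPathModule() {
        return ClassLoader.getSystemClassLoader().getUnnamedModule();
    }

    // True if the module keeps 'packageName' hidden from at least some other modules.
    static boolean hidesPackage(Module module, String packageName) {
        return !module.isExported(packageName);
    }

    // True if the module has a descriptor, and so a list of required modules to check.
    static boolean declaresDependencies(Module module) {
        return module.getDescriptor() != null;
    }

    public static void main(String[] args) {
        Module module = classPathModule();
        // An unnamed module reports every package as exported
        System.out.println("hides org.foo.bar.internal: "
                + hidesPackage(module, "org.foo.bar.internal"));
        // An unnamed module has no descriptor, so there is nothing to validate at startup
        System.out.println("declares dependencies: " + declaresDependencies(module));
    }
}
```

Any exports and requires clauses in a module-info.class are simply not consulted once the jar is loaded this way.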

Since we get none of the claimed benefits of JPMS, but get lots of extra work in testing and complexity in the build tools, I feel "negative benefits" is a pretty accurate summary.

Summary

As of today, JPMS is a pain for library authors. The split of the module-path from the class-path is at the heart of the problem. You really can't afford to release a library without testing on both module-path and class-path - there are subtle corner cases where the environments differ.

What is desperately needed is a small change to JPMS. There needs to be a way for a library author to insist that the modular jar file they are producing can only be run on the module-path (with any attempt to use it on the class-path preventing application startup). This would eliminate the need for testing both class-path and module-path. Together with the passage of time, JPMS might yet achieve its goals and go from negative to positive benefits.

Monday, 5 February 2018

Java 9 has six weeks to live

Java 9 is obsolete in just six weeks (20th March 2018). What? You haven't upgraded yet? Well, Java 10 is only going to last six months before it is obsolete too.

Update 2018-03-20: Java 10 is released. Java 9 is obsolete.

Release train impact

The new Java release train means that there will be a new release of Java every six months. And when the next release comes out, the previous release is obsolete.

What do I mean by obsolete?

In practical terms it means that there are no more security updates from Oracle. (Theoretically, the OpenJDK community could release security updates, but there is no sign of this yet). And since you don't want to run your software without the latest security updates, you are expected to upgrade to Java 10 as soon as it is released.

As a user of Java, here are three possible ways to approach the release train:

  1. Stay on Java 8, the current LTS (long term support) release, until the next LTS release occurs (Java 11)
  2. Move from Java 9 to Java 10 to Java 11, making sure you update rapidly to get the security updates
  3. Stay on Java 9 (or Java 10) and don't worry about security updates

If you have already moved to Java 9, you have effectively committed to option 2 or 3. If you care about security updates, you need to be prepared to switch to Java 10 shortly after it is released on 20th March. To do this, you probably should be testing with a Java 10 pre-release now. If you find that to be a challenge, you have to stop caring about security, or consider going back to Java 8 LTS.

However you look at it, being on the release train is a big commitment.

  • Will your dependencies work on the next version?
  • Will your IDE be ready?
  • Will your build tool (Maven, Gradle etc.) be ready?
  • Will your other tools (spotbugs, checkstyle, PMD etc.) be ready?
  • How fast are you going to be able to update when the release you are on is obsolete?

Lots to consider. And given the number of external tools/dependencies to consider, I think it's fair to say that it's a bold choice to use Java 9 or 10.

Summary

With a release every six months, it is important to decide on an approach to the release train. If you want to upgrade every six months, great! But you'll need to test pre-releases of Java with your whole toolchain in advance of the release to ensure you don't get stuck on an unpatched obsolete release of Java.

Tuesday, 9 May 2017

Java SE 9 - JPMS automatic modules

This article in my series on the Java Platform Module System (JPMS) is focussed on automatic modules. JPMS was previously known as Project Jigsaw and is the module system in Java SE 9. See also Module basics, Module naming and Modules & Artifacts.

Automatic modules

Let's say you are in charge of Java, and after 20 years you want to add a module system to the platform. As well as the problems of designing the module system itself, you have to consider migration of all the existing code written in Java (and to a degree, other JVM languages).

The solution to this that JPMS has chosen is automatic modules. Unfortunately, my opinion is that it is the wrong solution.

To understand automatic modules, we have to start by looking at how jar files will be specified in future. In addition to the classpath, Java SE 9 will also have a modulepath. The basic idea is that modules (jar files containing module-info.class) will be placed on the modulepath, not the classpath. In fact, placing a module on the classpath will cause the module declaration (module-info.class) to be completely ignored, which is usually not what you want.

As a basic rule, the modulepath cannot see the classpath. If you create a module and put it on the modulepath, all its dependencies must also be on the modulepath. Thus, in order to write a module at all, all the dependencies must also have been converted to be modules. And many of those dependencies are likely to be open source projects, with varying release schedules.

Clearly, this is a bit of a problem. Essentially, it would mean that an application would need to wait until every dependency had become a module before it could add module-info.java. The "solution" to this is automatic modules.

An automatic module is a normal jar file - one without a module-info.class file - that is placed on the modulepath. Thus the modulepath will contain two types of module - "real" and "automatic". Since there is no module-info.class, an automatic module is missing the metadata normally associated with a module:

  • no module name
  • no list of exported packages
  • no list of dependencies

Unsurprisingly, the list of exported packages is simply set to be all packages in the jar file. The list of dependencies is set to be everything on the modulepath, plus the whole classpath. As such, automatic modules allow the modulepath to depend on the classpath, something which is normally not allowed. While not ideal, both of these defaults for the metadata make sense.

The final missing piece of information is the module name. This has been a big point of discussion in Project Jigsaw. As it stands, the module name will be derived from the filename of the jar file. For me, this is a critical problem with the basic design of automatic modules (but see the mitigation section below).

As per my last blog, modules are not artifacts. But the filename of a jar file is usually the Maven artifactId, a name disconnected from the module name (which should be the super-package reverse DNS). For example, the filename of Google Guava is guava while the correct module name would be com.google.common.
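The filename derivation can be observed with the java.lang.module.ModuleFinder API that ships in Java 9+. This is a sketch under stated assumptions: the DerivedModuleName class is a made-up name, and the jar it creates is an empty stand-in holding only a manifest, not the real Guava or Joda-Convert artifact.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Sketch only: DerivedModuleName is a hypothetical helper.
final class DerivedModuleName {

    // Creates an empty jar with the given file name (no module-info.class)
    // in a temporary directory, then asks JPMS how it would treat it.
    // Temporary files are not cleaned up in this sketch.
    static ModuleDescriptor describe(String jarFileName) {
        try {
            Path dir = Files.createTempDirectory("automatic-module");
            Path jar = dir.resolve(jarFileName);
            Manifest manifest = new Manifest();
            manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            try (OutputStream out = Files.newOutputStream(jar);
                    JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
                jarOut.flush();  // the manifest is the only entry
            }
            ModuleReference ref = ModuleFinder.of(jar).findAll().iterator().next();
            return ref.descriptor();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        ModuleDescriptor guava = describe("guava-19.0.jar");
        System.out.println(guava.name() + " automatic=" + guava.isAutomatic());
        System.out.println(describe("joda-convert-1.2.jar").name());
    }
}
```

The version suffix is stripped and other punctuation becomes dots, so neither derived name matches the super-package name the project would choose for itself.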

Taken in isolation, this all works fine. The application can now be modularised, and I can depend on a project that is not modularised:

 module com.foo.myapp {
  requires guava;  // guava not yet modularised, so use the filename
 }

But in order to work, the module-info.java file must specify the dependency as being on the filename, not the module name.

The astute will note that depending on the module name is generally not possible until the dependency is a module. But equally, depending on the filename is using a name that is going to be wrong once the dependency is turned into a module:

 module com.foo.myapp {
  requires com.google.common;  // now guava has been modularised
 }

It is this change of name that is at the heart of the problem. Essentially it means that the name used to define the dependencies of your library/application will change once they are modularised. While this name change is fine in a private codebase, it will be hell in the world of Java open source.

Impact of automatic modules

To fully understand how automatic modules affect the open source Java community, it is best to look at a use case. Let's consider what happens if an open source project is released that depends on a filename instead of a module name. And then another open source project is released that depends on that:

Project       Version  Module name           Requires
Strata        v1       com.opengamma.strata  org.joda.convert, guava (a filename)
Joda-Convert  v1       org.joda.convert      guava (a filename)
Guava         v1       (not a module yet)

What we now have is a graph of three projects, where the lowest is an automatic module, and the next two are real modules that depend on an automatic module. When Guava is finally modularised, a new release will occur. But Strata and Joda-Convert cannot immediately use the new release, because the module name they reference is now wrong:

Project       Version  Module name           Requires
Strata        v1       com.opengamma.strata  org.joda.convert, guava (a filename)
Joda-Convert  v1       org.joda.convert      guava (a filename)
Guava         v2       com.google.common
Module Hell - "guava" != "com.google.common"

As can be seen, this setup does not work. We have Module Hell. The top two projects depend on "guava", not "com.google.common". And there is no way to have the same packages loaded under two different module names.

What happens if Joda-Convert is updated to match the new Guava? (Strata is not updated)

Project       Version  Module name           Requires
Strata        v1       com.opengamma.strata  org.joda.convert, guava (a filename)
Joda-Convert  v2       org.joda.convert      com.google.common
Guava         v2       com.google.common
Module Hell - "guava" != "com.google.common"

This configuration does not work. There is no way to have the same packages loaded under two different module names.

What happens if Strata is updated to match the new Guava? (Joda-Convert is not updated)

Project       Version  Module name           Requires
Strata        v2       com.opengamma.strata  org.joda.convert, com.google.common
Joda-Convert  v1       org.joda.convert      guava (a filename)
Guava         v2       com.google.common
Module Hell - "guava" != "com.google.common"

This configuration also does not work. There is no way to have the same packages loaded under two different module names.

The only way to get it to work is to update the whole stack together:

Project       Version  Module name           Requires
Strata        v2       com.opengamma.strata  org.joda.convert, com.google.common
Joda-Convert  v2       org.joda.convert      com.google.common
Guava         v2       com.google.common

To summarise, when a module depends on an automatic module, and that module is then depended on by others, the whole stack is then linked. Everything in the stack has to go from v1 to v2 together.
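The scenarios above can be replayed with a toy model. This is only name matching over a map, not the real JPMS resolver, and the ModuleHellCheck class and its methods are invented for this illustration. Each module name maps to the names it requires, and a requires clause is unresolved when no module of that name is present.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: a simplified model, not the JPMS resolution algorithm.
final class ModuleHellCheck {

    // Lists "module -> required name" for every requires clause with no matching module.
    static List<String> unresolved(Map<String, List<String>> requiresByModule) {
        List<String> problems = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : requiresByModule.entrySet()) {
            for (String required : entry.getValue()) {
                if (!requiresByModule.containsKey(required)) {
                    problems.add(entry.getKey() + " -> " + required);
                }
            }
        }
        return problems;
    }

    // Builds the three-project stack. The first two arguments are the names
    // Strata and Joda-Convert use for Guava; the third is Guava's actual name.
    static Map<String, List<String>> stack(
            String strataUses, String jodaConvertUses, String guavaName) {
        Map<String, List<String>> modules = new LinkedHashMap<>();
        modules.put("com.opengamma.strata", Arrays.asList("org.joda.convert", strataUses));
        modules.put("org.joda.convert", Collections.singletonList(jodaConvertUses));
        modules.put(guavaName, Collections.<String>emptyList());
        return modules;
    }

    public static void main(String[] args) {
        String file = "guava";
        String real = "com.google.common";
        System.out.println("all v1:             " + unresolved(stack(file, file, file)));
        System.out.println("only Guava v2:      " + unresolved(stack(file, file, real)));
        System.out.println("Joda-Convert v2:    " + unresolved(stack(file, real, real)));
        System.out.println("Strata v2:          " + unresolved(stack(real, file, real)));
        System.out.println("whole stack at v2:  " + unresolved(stack(real, real, real)));
    }
}
```

Only the first and last configurations come back with an empty list, which matches the tables: the stack works at v1 and at v2, and nowhere in between.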

Looking back at the whole example, it should be clear that the problem started right back at the beginning. There never should have been a v1 release of Strata that depended on a filename "guava". Or a v1 release of Joda-Convert that depended on a filename. Instead, Strata should have waited until both Guava and Joda-Convert had released a modularised v2. And, Joda-Convert should have waited until Guava had released a modularised v2. To avoid Module Hell, migration must occur from bottom to top.

Given this, it is my opinion that automatic modules do not by themselves provide a viable migration path for the Java open source community. The rule is as follows:

Do not release to Maven Central a modular jar file that expresses a dependency on a filename. Instead, wait until all dependencies can be expressed as module names.

Any jar file in Maven Central that expresses a dependency on a filename will be a cause of Module Hell.

Mitigation

Without any mitigation, the community would have to modularise each open source library one by one from the bottom of the stack upwards. No open source project could do anything until all its dependencies are modularised - a bottom-up migration.

The main piece of mitigation proposed for this problem is that jar files can have a new entry in MANIFEST.MF called "Automatic-Module-Name". When JPMS examines an automatic module, if the MANIFEST.MF entry is present then it uses the value as the module name instead of the filename.
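The effect of the manifest entry can be checked with java.lang.module.ModuleFinder. In this sketch the AutomaticNameOverride class is a hypothetical helper, and the jar it builds is an empty stand-in that holds only a manifest, not the real Guava artifact.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Sketch only: AutomaticNameOverride is a hypothetical helper.
final class AutomaticNameOverride {

    // Builds an empty jar and returns the module name JPMS assigns to it.
    // If 'automaticModuleName' is non-null it is written to MANIFEST.MF.
    // Temporary files are not cleaned up in this sketch.
    static String moduleNameOf(String jarFileName, String automaticModuleName) {
        try {
            Path dir = Files.createTempDirectory("automatic-name");
            Path jar = dir.resolve(jarFileName);
            Manifest manifest = new Manifest();
            Attributes main = manifest.getMainAttributes();
            main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
            if (automaticModuleName != null) {
                main.putValue("Automatic-Module-Name", automaticModuleName);
            }
            try (OutputStream out = Files.newOutputStream(jar);
                    JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
                jarOut.flush();  // the manifest is the only entry
            }
            return ModuleFinder.of(jar).findAll().iterator().next().descriptor().name();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // Without the entry the name comes from the filename
        System.out.println(moduleNameOf("guava-19.0.jar", null));
        // With the entry the declared name wins
        System.out.println(moduleNameOf("guava-19.0.jar", "com.google.common"));
    }
}
```

The same filename yields two different module names, and only the second stays stable when the project later adds a real module-info.java using that name.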

This can be used to break the cycle to a degree. In the case above, the Strata team can release a version at any time with the new MANIFEST.MF entry, essentially stating what module name Strata is going to have in the future. Similarly, Joda-Convert can release at any time with the MANIFEST.MF entry stating what its module name will be. Full modularisation will still need to proceed from the bottom-up, but anyone depending on Joda-Convert or Strata can safely release without being affected by the question over Guava's module name.

The key difference between adding the MANIFEST.MF entry and adding a module declaration is that the MANIFEST.MF entry does not need the names of the dependencies to be specified. As such, there is no need to wait for the dependencies to be modularised before adding the MANIFEST.MF entry.

The rule outlined above can thus also be expressed as:

Do not release to Maven Central a modular jar file that depends on an automatic module, unless the automatic module has an "Automatic-Module-Name" MANIFEST.MF entry.

To be clear, I don't think this is ideal, but we are where we are. The message to the open source community is in two parts therefore.

Firstly, do not add a module-info.java module declaration until:

  • all of your runtime dependencies have been modularised (either as a full module or with a MANIFEST.MF entry)
  • all those modularised dependencies have been released to Maven Central
  • your library depends on the updated versions

Secondly, if you can't meet these criteria, but your project is well structured and otherwise suitable for modularisation, please add a MANIFEST.MF entry following the agreed module naming conventions (super-package reverse-DNS).

If everyone does this, then we stand a reasonable chance of avoiding Module Hell.

Summary

Automatic modules allow modules to depend on non-modules. But this is achieved by specifying a requires clause on the filename, not the module name. This will cause pain later if modules are published depending on the filename. A new MANIFEST.MF entry allows any open source project to choose a module name and publish that choice immediately. When JPMS sees the MANIFEST.MF entry, it will use the value as the name for the automatic module.

Community members must at all costs avoid publishing modular jar files that depend on filenames. But community members can publish jar files containing the new MANIFEST.MF entry from now on (although technically the MANIFEST.MF entry is not yet finalised, it probably will be soon).

Comments and feedback welcome.

Monday, 24 April 2017

Java SE 9 - JPMS modules are not artifacts

This is the next article in a series I'm writing to help make sense of the Java Platform Module System (JPMS) in Java SE 9. JPMS was developed as Project Jigsaw. Other articles in the series are Module basics and Module naming.

Module != Artifact

If you want to grasp what JPMS modules are all about, it turns out that it is critical to understand what they are not. In particular, they are not artifacts.

Firstly, let's define an artifact. An artifact is a file produced when developing software. For a project on Maven Central, this includes jar files of bytecode, jar files of sources and jar files of Javadoc. We are interested primarily in the bytecode for this discussion.

Secondly, let's assume that a project is going to have the same module name over time. This is just like package names - projects don't change package name with every release.

Given this, what is the mapping between an artifact and a module?

Versions

Each version of a project will consist of a different artifact (jar file), perhaps released on Maven Central. Each version will have the same module name. But, we also know that the Java platform (JPMS) does not know about versions or version-selection.

Therefore, when assembling a modulepath for Java SE 9, something else is going to have to choose the correct version of the module. This will typically be the build tool, eg. Maven.

But while the classpath will tolerate having two versions of the artifact (typically with bad consequences at runtime), the JPMS modulepath will refuse to start if two modules contain the same package, as would happen if two versions of the same module are found.

Maven already manages versions of course, picking one version from a set of versions that all share the same groupId:artifactId. With Java SE 9 we can say that Maven is picking one artifact from a set of artifacts to use in the runtime JPMS module graph.

Artifacts                        JPMS runtime module
org.joda : joda-convert : 1.2    org.joda.convert
org.joda : joda-convert : 1.1
org.joda : joda-convert : 1.0

Build tool must pick one of these artifacts for the runtime JPMS module graph.
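The constraint on the build tool can be sketched as a simple check over a proposed modulepath: no module name may be supplied by more than one artifact. The ModulePathCheck and Artifact classes are hypothetical, and pairing each artifact with its module name by hand is an assumption made for this example (a real tool would read the name from the jar file).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch only: ModulePathCheck is a hypothetical helper, not how Maven works.
final class ModulePathCheck {

    // Build-world coordinates plus the code-world module name inside the jar.
    static final class Artifact {
        final String coordinates;
        final String moduleName;

        Artifact(String coordinates, String moduleName) {
            this.coordinates = coordinates;
            this.moduleName = moduleName;
        }
    }

    // Returns the module names that more than one artifact would supply.
    static Set<String> duplicatedModules(List<Artifact> modulePath) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (Artifact artifact : modulePath) {
            if (!seen.add(artifact.moduleName)) {
                duplicates.add(artifact.moduleName);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        List<Artifact> twoVersions = Arrays.asList(
                new Artifact("org.joda:joda-convert:1.2", "org.joda.convert"),
                new Artifact("org.joda:joda-convert:1.1", "org.joda.convert"));
        // Both artifacts map to one module name, so this path is not usable
        System.out.println("duplicated: " + duplicatedModules(twoVersions));
    }
}
```

The artifact coordinates differ while the module name stays the same, which is why selection has to happen in the build tool before the modulepath is assembled.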

Patching bugs

We've all run into the problem of finding a bug in another library, such as one on Maven Central. Most of the time, we work around the bug. Sometimes, we have to fix it and have a private copy of the library (ie. a private version of the library). In extreme cases, we have to publish the bug-fix version to Maven Central.

If you want to publish a patched version of an open source project, the standard way to do this is to use your groupId, not the original groupId.

The patched version still uses the same package name. If it didn't, it would be no use as a patched version.

Exactly the same rationale applies to the module name - the module name of the patched version needs to stay the same. This makes sense, because the module name is an aspect of the bytecode, not of the deployment.

As before, the build tool needs to be set up to pick the correct jar file artifact, this time choosing between the patched one and the original.

Artifacts                                 JPMS runtime module
org.joda : joda-convert : 1.2             org.joda.convert
com.foo : patched-joda-convert : 1.2-P

Build tool must pick either the original artifact or the patch for the runtime JPMS module graph.

License-driven jars

There can be a situation where the equivalent code is released twice for license reasons. The most common case of this has been driven by the JCP.

Imagine that a JSR has been produced, and because of licensing restrictions, Apache and RedHat each decide to produce their own version of the specification (API) jar, one Apache licensed and one LGPL licensed. There will be two different jar file artifacts in Maven Central - redhat-jsr789-1.0.jar and apache-jsr789-1.0.jar. Both of these contain the same package name, and the same interfaces, because they are the same specification.

When creating a module-info.java file for these, teams might be tempted to put "redhat" or "apache" in the module name. But this would be wrong, as a package must be in one module at runtime. Thus, both teams must use the same module name, based on the package name, such as javax.foo. In reality, future JSRs will need to choose their own module name.

Just as in the other cases, the build tool will need to select the correct jar file artifact to use for any given application.

Artifacts | JPMS runtime module
org.apache : apache-jsr789 : 1.0 | javax.foo
com.redhat : redhat-jsr789 : 1.0-GA | javax.foo

The build tool must pick one of the artifacts for the runtime JPMS module graph.

Two worlds

JPMS brings front and centre a distinction that we probably haven't thought too much about - that between build tools and code. I find it to be a useful mental model to keep these two worlds separate.

Build/Deploy world:
  • Concepts: Artifacts (jar files), Versions, Groups, Organizations
  • Identifier: org.joda : joda-convert : 1.2
  • Tools: Maven, Gradle, Ant

Code world:
  • Concepts: Modules, Packages, Classes/Interfaces, Methods/Fields
  • Identifier: org.joda.convert
  • Tools: javac, java, jar

Summary

All the use cases above (and others) reduce to the fact that a build tool, such as Maven, must pick one artifact from many to satisfy the module requested in the JPMS modules graph. The module, and the module name, is tightly linked to the bytecode and packages. By contrast, the Maven groupId:artifactId:version co-ordinates are a tool for identifying each artifact in order to pick the correct one to use.

Hopefully, this helps explain the philosophical difference between artifacts and modules, and why modules should be named after their super-package, not after the artifactId. And why you cannot derive the module name from groupId:artifactId - just look at the tables above to see the varying artifact names.

Questions and comments welcome.

Thursday, 20 April 2017

Java SE 9 - JPMS module naming

The Java Platform Module System (JPMS) is soon to arrive, developed as Project Jigsaw. This article follows the introduction and looks at how modules should be named.

As with all "best practices", they are ultimately the opinion of the person writing them. I hope however to convince you that my opinion is right ;-). And as a community, we will certainly benefit if everyone follows the same rules, just like we benefited from everyone using reverse-DNS for package names.

TL;DR - My best practices

These are my recommendations for module naming:

  • Module names must be reverse-DNS, just like package names, e.g. org.joda.time.
  • Modules are a group of packages. As such, the module name must be related to the package names.
  • Module names are strongly recommended to be the same as the name of the super-package.
  • Creating a module with a particular name takes ownership of that package name and everything beneath it.
  • As the owner of that namespace, any sub-packages may be grouped into sub-modules as desired so long as no package is in two modules.

Thus the following is a well-named module:

  module org.joda.time {
    requires org.joda.convert;

    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
    // not exported: org.joda.time.base;
    // not exported: org.joda.time.tz;
  }

As can be seen, the module contains a set of packages (exported and hidden), all under one super-package. The module name is the same as the super-package name. The author of the module is asserting control over all names below org.joda.time, and could create a module org.joda.time.i18n in the future if desired.

To understand why this approach makes sense, and the finer details, read on.

JPMS naming

Naming anything in software is hard. Unsurprisingly then, agreeing an approach to naming modules has also turned out to be hard.

The naming rules allow dots, but prohibit dashes, thus lots of name options are closed off. As a side note, module names in the JVM are more flexible, but we are only considering names at the Java level here.
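To make the dots-versus-dashes point concrete, here is a small sketch that uses the JDK's `javax.lang.model.SourceVersion.isName` to test candidate names. That method checks for a dot-separated sequence of Java identifiers, which is used here as an approximation of the Java-level module naming rules; the class and method names around it are hypothetical.

```java
import javax.lang.model.SourceVersion;

public class ModuleNameSyntax {

    /** True if the name is a dot-separated sequence of Java identifiers (keywords excluded). */
    static boolean looksLikeValidModuleName(String name) {
        return SourceVersion.isName(name);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeValidModuleName("org.joda.time"));  // true
        System.out.println(looksLikeValidModuleName("joda.time"));      // true
        System.out.println(looksLikeValidModuleName("joda-time"));      // false, dashes are not allowed
    }
}
```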

These are the two basic approaches which I think make sense:

1) Project-style. Short names, as commonly seen in the jar filename from Maven Central.

2) Reverse DNS. Full names, exactly as we've used for package names since Java v1.0.

Here are some examples to make it more clear:

Project | Project-style | Reverse-DNS
Joda-Time | joda.time | org.joda.time
Commons-IO | commons.io | org.apache.commons.io
Strata-Basics | strata.basics | com.opengamma.strata.basics
JUnit | junit | org.junit

All things being equal, we'd choose the shorter name - project-style. It is certainly more attractive when reading a module-info.java file. But there are some clear reasons why reverse-DNS must be chosen.

It is worth noting that Mark Reinhold currently indicates a preference for project-style names. However, the linked mail doesn't really deal with the global uniqueness or clashing elements of the naming problem, and others in the expert group disagreed with project-style names.

Ownership and Uniqueness

The original designers of Java made a very shrewd choice to propose reverse-DNS names for packages. This approach has scaled very well, through the incredible rise of open source software. It provides two key properties - Ownership and Uniqueness.

The ownership aspect of reverse-DNS delegates control of part of the global DNS namespace to an individual or company. It is a universally agreed approach with enough breadth of identifiers to make clashes rare. Within that namespace, developers are then responsible for ensuring uniqueness. Together, these two aspects result in globally unique package names. As such, it is pretty rare that code has two colliding packages, despite modern applications pulling in hundreds of dependent jar files. For example, the Spark framework and Apache Spark co-exist despite having the same simple name. But look what happens if we only use project-style names:

Project | Project-style | Reverse-DNS
Spark framework | spark.core | com.sparkjava.core
Apache-Spark | spark.core | org.apache.spark.core

As can be seen, the project-style names clash! JPMS will simply refuse to start a modulepath where two modules have the same name, even if they contain different packages. (Since these projects haven't chosen module names yet, I've tweaked the example to make them clash. But this example is far from impossible, which is the point here!)

Not convinced? Well imagine what would happen if package names were not reverse-DNS. If your application pulls in hundreds of dependencies, do you think there would be no duplicates?

Of course we have project-style names today in Maven - the jar filename is the artifactId which is a project-style name. Given this, why don't we have problems today? Well it turns out that Maven is smart enough to rename the artifact if there is going to be a clash. The JPMS does not offer this ability - your only choice with a clash will be to rewrite the module-info.class file of the problematic module and all other modules that refer to it.

As a final example of how project-style name clashes can occur, consider a startup creating a new project - "willow". Since they are small, they choose a module name of "willow". Over the next year, the startup becomes fantastically successful, growing at an exponential rate, meaning that there are now 100s of modules within the company depending on "willow". But then a new Open Source project starts up, and calls itself "willow". Now, the company can't use the open source project. Nor can the company release "willow" as open source. These clashes are avoided if reverse-DNS names are used.

To summarize this section, we need reverse-DNS because module names need to be globally unique, even when writing modules that are destined to remain private. The ownership aspect of reverse-DNS provides enough namespace separation for companies to get the uniqueness necessary. After all, you wouldn't want to confuse Joda-Time with the freight company also called Joda, would you?

Modules as package aggregates

The JPMS design is fundamentally simple - it extends JVM access control to add a new concept "modules" that groups together a set of packages. Given this, there is a very strong link between the concept of a module and the concept of a package.

The key restriction is that a package must be found in one and only one module.

Given that a module is formed from one or more packages, what is the conceptually simplest name that you can choose? I argue that it is one of the package names that forms the module. And thus a name you've already chosen. Now, consider we have a project with three packages, which of these three should be the module name?

  module ??? {
    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
  }

Again, I'd argue there isn't really a debate. There is a clear super-package, and that is what should be used as the module name - org.joda.time in this case.
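The "clear super-package" can be found mechanically as the longest dotted prefix shared by all the package names. The following is a minimal sketch of that idea; the class and method names are hypothetical, and it only handles whole name segments.

```java
import java.util.Arrays;
import java.util.List;

public class SuperPackage {

    /** Returns the longest dotted prefix shared by all package names, or "" if there is none. */
    static String superPackage(List<String> packageNames) {
        if (packageNames.isEmpty()) {
            return "";
        }
        String[] common = packageNames.get(0).split("\\.");
        int length = common.length;
        for (String packageName : packageNames) {
            String[] parts = packageName.split("\\.");
            length = Math.min(length, parts.length);
            // Shrink the shared prefix at the first segment that differs
            for (int i = 0; i < length; i++) {
                if (!common[i].equals(parts[i])) {
                    length = i;
                    break;
                }
            }
        }
        return String.join(".", Arrays.copyOfRange(common, 0, length));
    }

    public static void main(String[] args) {
        System.out.println(superPackage(List.of(
            "org.joda.time", "org.joda.time.chrono", "org.joda.time.format")));  // org.joda.time
    }
}
```

If the result is empty, or is nothing more than a top-level domain such as "com", the packages do not share a meaningful super-package.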

Hidden packages

With JPMS, a module can hide packages. When hidden, the internal packages are not visible in Javadoc, nor are they visible in the module-info.java file. This means that consumers of the module have no immediate way of knowing what hidden packages a module has.

Now consider again the key restriction that a package must be found in one and only one module. This restriction applies to hidden packages as well as exported ones. Therefore if your application depends on two modules and both have the same hidden package, your application cannot be run as the packages clash. And since information on hidden packages is difficult to obtain, this clash will be surprising. (There are some advanced ways to get around these clashes using layers, but these are designed for containers, not applications.)
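To illustrate the clash, here is a sketch that builds two module descriptors in memory with the JDK's `java.lang.module.ModuleDescriptor` builder and lists the packages they have in common. Both modules and their packages are hypothetical, chosen so that each bundles the same non-exported helper package.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;
import java.util.TreeSet;

public class HiddenPackageClash {

    /** Returns every package, exported or hidden, that appears in both modules. */
    static Set<String> sharedPackages(ModuleDescriptor first, ModuleDescriptor second) {
        Set<String> shared = new TreeSet<>(first.packages());
        shared.retainAll(second.packages());
        return shared;
    }

    public static void main(String[] args) {
        // Hypothetical modules: each exports its own package and hides a copied helper package
        ModuleDescriptor first = ModuleDescriptor.newModule("com.example.first")
            .exports("com.example.first")
            .packages(Set.of("com.example.shared.internal"))  // not exported
            .build();
        ModuleDescriptor second = ModuleDescriptor.newModule("com.example.second")
            .exports("com.example.second")
            .packages(Set.of("com.example.shared.internal"))  // not exported
            .build();

        System.out.println(sharedPackages(first, second));  // [com.example.shared.internal]
    }
}
```

A non-empty result means the two modules cannot be placed on the same modulepath, even though nothing in their exported API hints at the overlap.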

The best solution to this problem is exactly as described in the last section. Consider a project with three exported packages and two hidden ones. So long as the hidden packages are sub-packages of the module name, we should be fine:

  module org.joda.time {
    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
    // not exported: org.joda.time.base;
    // not exported: org.joda.time.tz;
  }

By using the super-package name as the module name, the module developer has taken ownership of that package and everything below it. So long as all the non-exported packages are conceptually sub-packages, the end-user application should not see any hidden package clashes.
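The "everything below it" rule is easy to check in a build or a test. This sketch, with hypothetical class and method names, lists any packages that fall outside the namespace claimed by a module name.

```java
import java.util.List;
import java.util.stream.Collectors;

public class NamespaceOwnership {

    /** True if the package is the module's super-package or lies beneath it. */
    static boolean isOwnedBy(String moduleName, String packageName) {
        return packageName.equals(moduleName) || packageName.startsWith(moduleName + ".");
    }

    /** Returns the packages that fall outside the namespace claimed by the module name. */
    static List<String> strays(String moduleName, List<String> packageNames) {
        return packageNames.stream()
            .filter(packageName -> !isOwnedBy(moduleName, packageName))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> packages = List.of(
            "org.joda.time", "org.joda.time.chrono", "org.joda.time.format",
            "org.joda.time.base", "org.joda.time.tz");
        System.out.println(strays("org.joda.time", packages));  // []
    }
}
```

Note that the check compares whole name segments, so a package such as org.joda.timezone would not count as being below org.joda.time.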

Automatic modules

JPMS includes a feature whereby a regular jar file, without a module-info.class file, turns into a special kind of module just by placing it on the modulepath. The automatic module feature is controversial in general, but a key part of this is that the name of the module is derived from the filename of the jar file. In addition, it means that people writing module-info.java files have to guess the name that someone else will use for a module. Having to guess a name, and having the Java platform pick a name based on the filename of a jar file are both bad ideas in my opinion, and that of many others, but our efforts to stop them seem to have failed.

The naming approach outlined in this article provides a means to mitigate the worst effects of this. If everyone uses reverse-DNS based on the super-package, then the guesses that people make should be reasonably accurate, as the selection process of a name should be fairly straightforward.
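For those who want to see the filename-derived name in action, here is a sketch that writes a jar containing only a manifest into a temporary directory and asks the JDK's `ModuleFinder` what module name it finds. The class and method names are hypothetical. The second call sets the `Automatic-Module-Name` manifest entry, which Java 9 reads in preference to the filename.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.lang.module.ModuleFinder;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class AutomaticModuleNameDemo {

    /**
     * Creates a manifest-only jar with the given filename and returns the module name
     * that JPMS assigns to it. Pass null to omit the Automatic-Module-Name entry.
     */
    static String moduleNameFor(String jarFileName, String automaticModuleName) {
        try {
            Path dir = Files.createTempDirectory("automatic-module-demo");
            Path jar = dir.resolve(jarFileName);
            Manifest manifest = new Manifest();
            manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
            if (automaticModuleName != null) {
                manifest.getMainAttributes().putValue("Automatic-Module-Name", automaticModuleName);
            }
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
                // No classes are needed, the manifest alone is enough for this demo
            }
            return ModuleFinder.of(jar).findAll().iterator().next().descriptor().name();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // Name derived from the filename: version stripped, dash turned into a dot
        System.out.println(moduleNameFor("joda-convert-1.2.jar", null));
        // Name taken from the manifest entry instead of the filename
        System.out.println(moduleNameFor("joda-convert-1.2.jar", "org.joda.convert"));
    }
}
```

The first result depends entirely on what the jar file happens to be called, which is exactly the guessing game described above.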

What if there isn't a clear super-package?

There are two cases to consider.

The first case is where there really is a super-package, it's just that it has no code. In this case, the implied super-package should be used. (Note that this example is Google Guava, which doesn't have guava in the package name!):

  module com.google.common {
    exports com.google.common.base;
    exports com.google.common.collect;
    exports com.google.common.io;
  }

The second case is where a jar file has two completely unrelated super-packages:

  foo.jar
  - package com.foo.util
  - package com.foo.util.money
  - package com.bar.client

The right approach here is to break the jar file into two separate modules:

  module com.foo.util {
    requires com.bar.client;
    exports com.foo.util;
    exports com.foo.util.money;
  }
  module com.bar.client {
    exports com.bar.client;
  }

Failure to do this is highly likely to cause clashes at some point, as there is no way that com.foo.util should be claiming ownership of the com.bar.client namespace.

If com.bar.client is going to be a hidden package when converted to modules, then instead of it being a separate module, it can be repackaged (i.e. shaded) under the module's super-package:

  module com.foo.util {
    exports com.foo.util;
    exports com.foo.util.money;
    // not exported: com.foo.util.shade.com.bar.client;
  }

Can you have sub-modules?

Yes. When a module name is chosen, the developer is taking control of a namespace. That namespace consists of the module name and all sub-names below it - sub-package names and sub-module names.

Ownership of that namespace allows the developer to release one module or many. The main constraint is that there should not be two published modules containing the same package.

As a side effect of this, the practice of larger projects releasing an "all" jar will need to stop. An "all" jar is used when the project has lots of separate jar files, but also wants to allow end-users to depend on a single jar file. These "all" jar files are a pain in Maven dependency trees, but will be a disaster in JPMS ones, as there is no way to override the metadata, unlike in Maven.

What if my existing project does not meet these guidelines?

The harsh suggestion is to change the project in an incompatible manner so it does meet the guidelines. JPMS in Java SE 9 is disruptive. It does not take the approach of providing all the tools necessary to meet all the edge cases in current deployments. As such, it is not surprising that some jar files and some projects will require some major rework.

Why ignore the Maven artifactId?

JPMS is an extension to the Java platform (language and runtime). Maven is a build system. Both are necessary, but they have different purposes, needs and conventions.

JPMS is all about packages, grouping them together to form modules and linking those. In this way, developers are working with source code, just like any other source code. What artifacts the source code is packed up into is a separate question. Understanding the separation is hard, because currently there is a one-to-one mapping between the module and the jar file. However, we should not assume this will always be the case in the future.

Another example of this separation is versioning. JPMS has little to no support for versions, yet build systems like Maven do. When running the application, Maven is responsible for collecting a coherent set of artifacts (jar files) to run the application, just as before. It's just that some of those might be modules.

Finally, the Maven artifactId does not exist in isolation. Maven makes unique identifiers by combining the groupId, artifactId and classifier. Only the combination is sufficiently globally unique to be useful. Picking out just the artifactId and trying to make a unique module name from it is asking for trouble.

Summary

JPMS module names, and the module-info.java in general, are going to require real thought to get right. The module declaration will be as much a part of your API as your method signatures.

The importance is heightened because, unlike Maven and other module systems, JPMS has no way to fix broken metadata. If you rely on some modular jar files, and get a clash or find some other mistake in the module declarations, your only options will be to not use JPMS or to rewrite the module declarations yourself. Given this difficulty, it is not yet clear that JPMS will be a success, thus your best option may be to not modularize your code.

See the TL;DR section above for the summary of the module name proposal. Feedback and questions welcome.

PS. For clarity, my personal interest is ensuring Java succeeds, something that will IMO require consistent naming.

Monday, 17 April 2017

Java 9 modules - JPMS basics

The Java Platform Module System (JPMS) is the major new feature of Java SE 9. In this article, I will introduce it, leaving most of my opinions to a follow up article. This is based on these slides.

Java Platform Module System (JPMS)

The new module system, developed as Project Jigsaw, is intended to raise the abstraction level of coding in Java as follows:

The primary goals of this Project are to:
* Make the Java SE Platform, and the JDK, more easily scalable down to small computing devices;
* Improve the security and maintainability of Java SE Platform Implementations in general, and the JDK in particular;
* Enable improved application performance; and
* Make it easier for developers to construct and maintain libraries and large applications, for both the Java SE and EE Platforms.
To achieve these goals we propose to design and implement a standard module system for the Java SE Platform and to apply that system to the Platform itself, and to the JDK. The module system should be powerful enough to modularize the JDK and other large legacy code bases, yet still be approachable by all developers.

However as we shall see, project goals are not always met.

What is a JPMS Module?

JPMS is a change to the Java libraries, language and runtime. This means that it affects the whole stack that developers code with day-to-day, and as such JPMS could have a big impact. For compatibility reasons, most existing code can ignore JPMS in Java SE 9, something that may prove to be very useful.

The key conceptual point to grasp is that JPMS adds a new concept to the JVM - modules. Where previously, code was organized into fields, methods, classes, interfaces and packages, with Java SE 9 there is a new structural element - modules.

  • a class is a container of fields and methods
  • a package is a container of classes and interfaces
  • a module is a container of packages

Because this is a new JVM element, it means the runtime can apply strong access control. With Java 8, a developer can express that the methods of a class cannot be seen by other classes by declaring them private. With Java 9, a developer can express that a package cannot be seen by other modules - i.e. a package can be hidden within a module.

Being able to hide packages should in theory be a great benefit for application design. No longer should there be a need for a package to be named "impl" or "internal" with Javadoc declaring "please don't use types from this package". Unfortunately, life won't be quite that simple.

Creating a module is relatively simple however. A module is typically just a jar file that has a module-info.class file at the root - known as a modular jar file. And that file is created from a module-info.java file in your sourcebase (see below for more details).

Using a modular jar file involves adding the jar file to the modulepath instead of the classpath. If a modular jar file is on the classpath, it will not act as a module at all, and the module-info.class will be ignored. As such, while a modular jar file on the modulepath will have hidden packages enforced by the JVM, a modular jar file on the classpath will not have hidden packages at all.
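One way to see which mode your code is running in is to ask a class for its module using `Class.getModule()`, added in Java 9. This is a minimal sketch with a hypothetical class name; code loaded from the classpath reports an unnamed module, while JDK classes always report a named one.

```java
public class WhichModule {

    /** Describes the module that a class was loaded into. */
    static String describe(Class<?> type) {
        Module module = type.getModule();
        return module.isNamed()
            ? "named module " + module.getName()
            : "unnamed module (typically the classpath)";
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));       // named module java.base
        System.out.println(describe(WhichModule.class));  // depends on how this class is launched
    }
}
```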

Other module systems

Java has historically had other module systems, most notably OSGi and JBoss Modules. It is important to understand that JPMS has little resemblance to those systems.

Both OSGi and JBoss Modules have to exist without direct support from the JVM, yet still provide some additional support for modules. This is achieved by launching each module in its own class loader, a technique that gets the job done, yet is not without its own issues.

Unsurprisingly, given these are existing module systems, experts from those groups have been included in the formal Expert Group developing JPMS. However, this relationship has not been harmonious. Fundamentally, the JPMS authors (Oracle) have set out to build a JVM extension that can be used for something that can be described as modules, whereas the existing module systems derive experience and value from real use cases and tricky edge cases in big applications that exist today.

When reading about modules, it is important to consider whether the authors of the article you are reading are from the OSGi/JBoss Modules design camp. (I have never actively used OSGi or JBoss Modules, although I have used Eclipse and other tools that use OSGi internally.)

module-info.java

The module-info.java file contains the instructions that define a module (the most important ones are covered here, but there are more). This is a .java file, however the syntax is nothing like any .java file you've seen before.

There are two key questions that you have to answer to create the file - what does this module depend on, and what does it export:

module com.opengamma.util {
  requires org.joda.beans;  // this is a module name, not a package name
  requires com.google.guava;

  exports com.opengamma.util;  // this is a package name, not a module name
}

(The names to use for modules needed a whole separate article; for this one I'll use package-name style.)

This module declaration says that com.opengamma.util depends on (requires) org.joda.beans and com.google.guava. It exports one package, com.opengamma.util. All other packages are hidden when using the modulepath (enforced by the JVM).

There is an implicit dependency on java.base, the core module of the JDK. Note that the JDK itself is also modularized, so if you want to depend on Swing, XML or Logging, that dependency needs to be expressed.

module org.joda.beans {
  requires transitive org.joda.convert;

  exports org.joda.beans;
  exports org.joda.beans.ser;
}

This module declaration says that org.joda.beans depends on (requires) org.joda.convert. The "requires transitive", as opposed to a simple "requires", means that any module that requires org.joda.beans can also see and use the packages from org.joda.convert. This is used here as Joda-Beans has methods where the return type is from Joda-Convert. This is shown by a dashed line.

module org.joda.convert {
  requires static com.google.guava;

  exports org.joda.convert;
}

This module declaration says that org.joda.convert depends on (requires) com.google.guava, but only at compile time, "requires static", as opposed to a simple "requires". This is an optional dependency. If Guava is on the modulepath, then Joda-Convert will be able to see and use it, and no error will occur if Guava is not present. This is shown by a dotted line.
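Code that has an optional dependency usually needs to check at runtime whether the other library is actually there before touching it. Here is one common way to do that, as a sketch only: it is not taken from Joda-Convert, and the class and method names are hypothetical.

```java
public class OptionalDependency {

    /** True if the named class can be loaded, used to probe for an optional library. */
    static boolean isPresent(String className) {
        try {
            // Load without initializing, we only want to know whether the class exists
            Class.forName(className, false, OptionalDependency.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        if (isPresent("com.google.common.collect.ImmutableList")) {
            System.out.println("Guava found: optional features enabled");
        } else {
            System.out.println("Guava not found: optional features disabled");
        }
    }
}
```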

Access rules

When running a modular jar on the modulepath with JVM access rules applied, code in package A can see a type in package B if:

  • the type is public
  • package B is exported from its module
  • there is a dependency from the module containing package A to the module containing package B

Thus, in the example above, code in module com.opengamma.util can see packages org.joda.beans, org.joda.beans.ser, org.joda.convert and any package exported by Guava. However, it cannot see package org.joda.convert.internal (as it is not exported). In addition, code in module com.google.guava cannot see code in package org.joda.beans or org.joda.convert as there is no modular dependency.
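The three access rules map quite directly onto methods of `java.lang.Module`. The sketch below expresses them using `canRead` and `isExported`; the class and method names are hypothetical, and it ignores details such as nested classes. It uses java.base as the target module because that is always available.

```java
import java.lang.reflect.Modifier;

public class AccessRules {

    /** Rules two and three: the package is exported, and the caller's module reads the target module. */
    static boolean packageVisible(Module from, Module to, String packageName) {
        return from.canRead(to) && to.isExported(packageName, from);
    }

    /** All three rules: the type is public and its package is visible to the caller's module. */
    static boolean canSee(Module from, Class<?> type) {
        return Modifier.isPublic(type.getModifiers())
            && packageVisible(from, type.getModule(), type.getPackageName());
    }

    public static void main(String[] args) {
        Module here = AccessRules.class.getModule();
        Module javaBase = Object.class.getModule();

        System.out.println(canSee(here, String.class));                           // true
        System.out.println(packageVisible(here, javaBase, "java.util"));          // true
        // jdk.internal.misc is only exported to selected JDK modules,
        // so this is false when run from the classpath without extra flags
        System.out.println(packageVisible(here, javaBase, "jdk.internal.misc"));
    }
}
```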

What can go wrong?

The basics described above are simple enough. It is initially quite easy to imagine how you might build an application from these foundations and benefit from hiding packages. Unfortunately, quite a few things can go wrong.

1) All use of module-info files only applies if using modular jars on the modulepath. For compatibility, all code on the classpath is packaged up as a special unnamed module, with no hidden packages and full access to the whole JDK. Thus, the security benefits of hiding packages are marginal at best. However, the modules of the JDK itself are always run in modular mode, thus are always guaranteed the security benefits.

2) Versions of modules are not handled. You cannot have the same module name loaded twice - you cannot have two versions of the same module loaded at the same time. It is left entirely to you, and thus to your build tool, to create a coherent set of modules that can actually be run. Thus, the classpath hell situation caused by clashing versions is not solved. Note that putting the version number in the module name is a Bad Idea that does not solve this problem and creates others.

3) Two modules may not contain the same package. This seems eminently sensible, until you consider that it also applies to hidden packages. Since hidden packages are not listed in module-info.class, a tool like Maven must unpack the jar file to discover what hidden packages there are in order to warn of clashes. As a user of the library, such a clash will be completely surprising, as you won't have any indication of the hidden packages in the Javadoc. This is a more general indication that JPMS does not provide sufficient isolation between modules, for reasons that are far from clear at this point.

4) There must be no cycles between modules, at compile time and at runtime. Again, this seems sensible - who wants to have module A depend on B depend on C which depends on A? But the reality of existing projects is that this happens, and on the classpath it is not a problem. For example, consider what would happen if Guava decided to depend on Joda-Convert in the example above. This restriction will make some existing open source projects hard to migrate.

5) Reflection is changing, such that non-public fields and methods will no longer be accessible via reflection. Since almost every framework uses reflection in this way, there will be significant work needed to migrate existing code. In particular, the JDK will be very locked down against reflection, which may prove painful (command line flags can escape the trap for now). This article hasn't had a chance to explore how the module declaration can influence reflection - see "opens" in the slides for more details.

6) Are your dependencies modularized? In theory, you can only turn your code into a module once all your dependencies are also modules. For any large application with hundreds of jar file dependencies, this will be a problem. The "solution" is automatic modules, where a normal jar file placed on the modulepath is automatically turned into a module. This process is controversial, with naming a big issue. Library authors should not publish modules that depend on automatic modules to public repositories like Maven Central unless they have the Automatic-Module-Name manifest entry. Again, automatic modules deserve their own article!

7) Module naming is not yet set in stone. I've come to believe that naming your module after the highest package it contains, causing that module to "take ownership" of the subpackages, is the only sane strategy.

8) Conflicts with build systems - who is in charge? A Maven pom.xml also contains information about a project. Should it be extended to allow module information to be added? I would suggest not, because the module-info.java file contains a binding part of your API, and that is best expressed in .java code, not metadata like pom.xml.
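Point 4 in the list above, the ban on cycles, is something you can check before attempting a migration. Here is a minimal sketch that runs a depth-first search over a hand-written map of "requires" relationships. The class and method names are hypothetical, and the graph is the example from this article rather than anything read from real module-info files.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ModuleCycleCheck {

    /** True if following "requires" edges from any module leads back to that module. */
    static boolean hasCycle(Map<String, List<String>> requires) {
        Set<String> inProgress = new HashSet<>();
        Set<String> finished = new HashSet<>();
        for (String module : requires.keySet()) {
            if (visit(module, requires, inProgress, finished)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String module, Map<String, List<String>> requires,
                                 Set<String> inProgress, Set<String> finished) {
        if (finished.contains(module)) {
            return false;  // already fully explored, no cycle through here
        }
        if (!inProgress.add(module)) {
            return true;   // reached a module that is already on the current path
        }
        for (String dependency : requires.getOrDefault(module, List.of())) {
            if (visit(dependency, requires, inProgress, finished)) {
                return true;
            }
        }
        inProgress.remove(module);
        finished.add(module);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> asDeclared = Map.of(
            "com.opengamma.util", List.of("org.joda.beans", "com.google.guava"),
            "org.joda.beans", List.of("org.joda.convert"),
            "org.joda.convert", List.of("com.google.guava"));
        System.out.println(hasCycle(asDeclared));  // false

        // What if Guava decided to depend on Joda-Convert?
        Map<String, List<String>> withGuavaEdge = Map.of(
            "com.opengamma.util", List.of("org.joda.beans", "com.google.guava"),
            "org.joda.beans", List.of("org.joda.convert"),
            "org.joda.convert", List.of("com.google.guava"),
            "com.google.guava", List.of("org.joda.convert"));
        System.out.println(hasCycle(withGuavaEdge));  // true
    }
}
```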

For those wanting a book to read in much more depth, try this one from Nicolai.

Summary

Do not get too excited about JPMS - modules in Java 9. The above is only a summary of what is possible with the module-info.java file and the restrictions of the JPMS. If you are thinking of modularizing your library or application, please wait a little longer until everything becomes a little clearer.

Feedback welcome, but bear in mind that I am planning more articles.