Monday, 24 April 2017

Java SE 9 - JPMS modules are not artifacts

This is the next article in a series I'm writing to help make sense of the Java Platform Module System (JPMS) in Java SE 9. JPMS was developed as Project Jigsaw. Other articles in the series are Module basics and Module naming.

Module != Artifact

If you want to grasp what JPMS modules are all about, it turns out that it is critical to understand what they are not. In particular, they are not artifacts.

Firstly, let's define an artifact. An artifact is a file produced when developing software. For a project on Maven Central, this includes jar files of bytecode, jar files of sources and jar files of Javadoc. We are interested primarily in the bytecode for this discussion.

Secondly, let's assume that a project is going to have the same module name over time. This is just like package names - projects don't change package name with every release.

Given this, what is the mapping between an artifact and a module?

Versions

Each version of a project will consist of a different artifact (jar file), perhaps released on Maven Central. Each version will have the same module name. But, we also know that the Java platform (JPMS) does not know about versions or version-selection.

Therefore, when assembling a modulepath for Java SE 9, something else is going to have to choose the correct version of the module. This will typically be the build tool, eg. Maven.

But while the classpath will tolerate having two versions of the artifact (typically with bad consequences at runtime), the JPMS modulepath will refuse to start if two modules contain the same package, as would happen if two versions of the same module are found.

Maven already manages versions of course, picking one version from a set of versions that all share the same groupId:artifactId. With Java SE 9 we can say that Maven is picking one artifact from a set of artifacts to use in the runtime JPMS module graph.

Artifacts                       JPMS runtime module
org.joda : joda-convert : 1.2   org.joda.convert
org.joda : joda-convert : 1.1
org.joda : joda-convert : 1.0
(Build tool must pick one of these artifacts for the runtime JPMS module graph)
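
The selection step can be sketched in a few lines. This is only an illustration of the idea, not how Maven or any other build tool is implemented: the ModulePathSketch class, its Artifact type and the assemble method are all hypothetical names. It assumes each artifact carries the module name found in its bytecode, and it rejects any selection that supplies the same module twice.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ModulePathSketch {
    /** Hypothetical pairing of build-world coordinates with the code-world module name. */
    static final class Artifact {
        final String groupId;
        final String artifactId;
        final String version;
        final String moduleName;

        Artifact(String groupId, String artifactId, String version, String moduleName) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.moduleName = moduleName;
        }

        String coordinates() {
            return groupId + ":" + artifactId + ":" + version;
        }
    }

    /** Maps module name to the chosen artifact, failing if two artifacts supply one module. */
    static Map<String, String> assemble(List<Artifact> chosen) {
        Map<String, String> byModule = new LinkedHashMap<>();
        for (Artifact artifact : chosen) {
            String previous = byModule.putIfAbsent(artifact.moduleName, artifact.coordinates());
            if (previous != null) {
                throw new IllegalStateException("Module " + artifact.moduleName
                        + " supplied by both " + previous + " and " + artifact.coordinates());
            }
        }
        return byModule;
    }
}
```

Passing only joda-convert 1.2 yields a single entry for org.joda.convert, while passing both 1.2 and 1.1 fails. How the one artifact gets chosen (newest version, explicit override and so on) is deliberately left out, as that is the build tool's job.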

Patching bugs

We've all run into the problem of finding a bug in another library, such as one on Maven Central. Most of the time, we work around the bug. Sometimes, we have to fix it and have a private copy of the library (ie. a private version of the library). In extreme cases, we have to publish the bug-fix version to Maven Central.

If you want to publish a patched version of an open source project, the standard way to do this is to use your groupId, not the original groupId.

The patched version still uses the same package name. If it didn't, it would be no use as a patched version.

Exactly the same rationale applies to the module name - the module name of the patched version needs to stay the same. This makes sense, because the module name is an aspect of the bytecode, not of the deployment.

As before, the build tool needs to be set up to pick the correct jar file artifact, this time choosing between the patched one and the original.

Artifacts                                JPMS runtime module
org.joda : joda-convert : 1.2            org.joda.convert
com.foo : patched-joda-convert : 1.2-P
(Build tool must pick either the original artifact or the patch for the runtime JPMS module graph)

License-driven jars

There can be a situation where the equivalent code is released twice for license reasons. The most common case of this has been driven by the JCP.

Imagine that a JSR has been produced, and because of licensing restrictions, Apache and RedHat each decide to produce their own version of the specification (API) jar, one Apache licensed and one LGPL licensed. There will be two different jar file artifacts in Maven Central - redhat-jsr789-1.0.jar and apache-jsr789-1.0.jar. Both of these contain the same package name, and the same interfaces, because they are the same specification.

When creating a module-info.java file for these, teams might be tempted to put "redhat" or "apache" in the module name. But this would be wrong, as a package must be in one module at runtime. Thus, both teams must use the same module name, based on the package name, such as javax.foo. In reality, future JSRs will need to choose their own module name.

Just as in the other cases, the build tool will need to select the correct jar file artifact to use for any given application.

Artifacts                             JPMS runtime module
org.apache : apache-jsr789 : 1.0      javax.foo
com.redhat : redhat-jsr789 : 1.0-GA
(Build tool must pick one of the artifacts for the runtime JPMS module graph)

Two worlds

JPMS brings front and centre a distinction that we probably haven't thought too much about - that between build tools and code. I find it to be a useful mental model to keep these two worlds separate.

             Build/Deploy world              Code world
Concept      Artifacts (jar files)           Modules
             Versions                        Packages
             Groups                          Classes/Interfaces
             Organizations                   Methods/Fields
Identifier   org.joda : joda-convert : 1.2   org.joda.convert
Tool         Maven                           javac
             Gradle                          java
             Ant                             jar

Summary

All the use cases above (and others) reduce to the fact that a build tool, such as Maven, must pick one artifact from many to satisfy the module requested in the JPMS module graph. The module, and the module name, is tightly linked to the bytecode and packages. By contrast, the Maven groupId:artifactId:version co-ordinates are a tool for identifying each artifact in order to pick the correct one to use.

Hopefully, this helps explain the philosophical difference between artifacts and modules, and why modules should be named after their super-package, not after the artifactId. And why you cannot derive the module name from groupId:artifactId - just look at the tables above to see the varying artifact names.

Questions and comments welcome.

Thursday, 20 April 2017

Java SE 9 - JPMS module naming

The Java Platform Module System (JPMS) is soon to arrive, developed as Project Jigsaw. This article follows the introduction and looks at how modules should be named.

As with all "best practices", they are ultimately the opinion of the person writing them. I hope however to convince you that my opinion is right ;-). And as a community, we will certainly benefit if everyone follows the same rules, just like we benefited from everyone using reverse-DNS for package names.

TL;DR - My best practices

These are my recommendations for module naming:

  • Module names must be reverse-DNS, just like package names, e.g. org.joda.time.
  • Modules are a group of packages. As such, the module name must be related to the package names.
  • Module names are strongly recommended to be the same as the name of the super-package.
  • Creating a module with a particular name takes ownership of that package name and everything beneath it.
  • As the owner of that namespace, any sub-packages may be grouped into sub-modules as desired so long as no package is in two modules.

Thus the following is a well-named module:

  module org.joda.time {
    requires org.joda.convert;

    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
    // not exported: org.joda.time.base;
    // not exported: org.joda.time.tz;
  }

As can be seen, the module contains a set of packages (exported and hidden), all under one super-package. The module name is the same as the super-package name. The author of the module is asserting control over all names below org.joda.time, and could create a module org.joda.time.i18n in the future if desired.
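
The "all under one super-package" rule is easy to check mechanically. The sketch below is one way to do it, using the java.lang.module.ModuleDescriptor API to build an in-memory equivalent of the declaration above. NamespaceSketch and its methods are hypothetical names, and the check itself is my recommendation expressed as code, not something JPMS enforces.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;
import java.util.TreeSet;

final class NamespaceSketch {
    /** In-memory equivalent of the org.joda.time declaration, including hidden packages. */
    static ModuleDescriptor jodaTime() {
        return ModuleDescriptor.newModule("org.joda.time")
                .requires("org.joda.convert")
                .exports("org.joda.time")
                .exports("org.joda.time.chrono")
                .exports("org.joda.time.format")
                .packages(Set.of("org.joda.time.base", "org.joda.time.tz")) // not exported
                .build();
    }

    /** Returns the packages that are neither the module name nor a sub-package of it. */
    static Set<String> packagesOutsideNamespace(ModuleDescriptor descriptor) {
        String name = descriptor.name();
        Set<String> outside = new TreeSet<>();
        for (String pkg : descriptor.packages()) {
            if (!pkg.equals(name) && !pkg.startsWith(name + ".")) {
                outside.add(pkg);
            }
        }
        return outside;
    }
}
```

For the module above the result is empty. A module named com.foo.util that also contained com.bar.client would have that package reported.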

To understand why this approach makes sense, and the finer details, read on.

JPMS naming

Naming anything in software is hard. Unsurprisingly then, agreeing an approach to naming modules has also turned out to be hard.

The naming rules allow dots, but prohibit dashes, thus lots of name options are closed off. As a side note, module names in the JVM are more flexible, but we are only considering names at the Java level here.

These are the two basic approaches which I think make sense:

1) Project-style. Short names, as commonly seen in the jar filename from Maven Central.

2) Reverse DNS. Full names, exactly as we've used for package names since Java v1.0.

Here are some examples to make it more clear:

                Project-style   Reverse-DNS
Joda-Time       joda.time       org.joda.time
Commons-IO      commons.io      org.apache.commons.io
Strata-Basics   strata.basics   com.opengamma.strata.basics
JUnit           junit           org.junit

All things being equal, we'd choose the shorter name - project-style. It is certainly more attractive when reading a module-info.java file. But there are some clear reasons why reverse-DNS must be chosen.

It is worth noting that Mark Reinhold currently indicates a preference for project-style names. However, the linked mail doesn't really deal with the global uniqueness or clashing elements of the naming problem, and others in the expert group disagreed with project-style names.

Ownership and Uniqueness

The original designers of Java made a very shrewd choice to propose reverse-DNS names for packages. This approach has scaled very well, through the incredible rise of open source software. It provides two key properties - Ownership and Uniqueness.

The ownership aspect of reverse-DNS delegates control of part of the global DNS namespace to an individual or company. It is a universally agreed approach with enough breadth of identifiers to make clashes rare. Within that namespace, developers are then responsible for ensuring uniqueness. Together, these two aspects result in globally unique package names. As such, it is pretty rare that code has two colliding packages, despite modern applications pulling in hundreds of dependent jar files. For example, the Spark framework and Apache Spark co-exist despite having the same simple name. But look what happens if we only use project-style names:

                  Project-style   Reverse-DNS
Spark framework   spark.core      com.sparkjava.core
Apache-Spark      spark.core      org.apache.spark.core

As can be seen, the project-style names clash! JPMS will simply refuse to start a modulepath where two modules have the same name, even if they contain different packages. (Since these projects haven't chosen module names yet, I've tweaked the example to make them clash. But this example is far from impossible, which is the point here!)

Not convinced? Well imagine what would happen if package names were not reverse-DNS. If your application pulls in hundreds of dependencies, do you think there would be no duplicates?

Of course we have project-style names today in Maven - the jar filename is the artifactId which is a project-style name. Given this, why don't we have problems today? Well it turns out that Maven is smart enough to rename the artifact if there is going to be a clash. The JPMS does not offer this ability - your only choice with a clash will be to rewrite the module-info.class file of the problematic module and all other modules that refer to it.

As a final example of how project-style name clashes can occur, consider a startup creating a new project - "willow". Since they are small, they choose a module name of "willow". Over the next year, the startup becomes fantastically successful, growing at an exponential rate, meaning that there are now 100s of modules within the company depending on "willow". But then a new Open Source project starts up, and calls itself "willow". Now, the company can't use the open source project. Nor can the company release "willow" as open source. These clashes are avoided if reverse-DNS names are used.

To summarize this section, we need reverse-DNS because module names need to be globally unique, even when writing modules that are destined to remain private. The ownership aspect of reverse-DNS provides enough namespace separation for companies to get the uniqueness necessary. After all, you wouldn't want to confuse Joda-Time with the freight company also called Joda would you?

Modules as package aggregates

The JPMS design is fundamentally simple - it extends JVM access control to add a new concept "modules" that groups together a set of packages. Given this, there is a very strong link between the concept of a module and the concept of a package.

The key restriction is that a package must be found in one and only one module.

Given that a module is formed from one or more packages, what is the conceptually simplest name that you can choose? I argue that it is one of the package names that forms the module. And thus a name you've already chosen. Now, consider we have a project with three packages, which of these three should be the module name?

  module ??? {
    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
  }

Again, I'd argue there isn't really a debate. There is a clear super-package, and that is what should be used as the module name - org.joda.time in this case.
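
Finding the super-package amounts to taking the longest common prefix of the package names, segment by segment. A minimal sketch, where SuperPackageSketch and superPackage are hypothetical names:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class SuperPackageSketch {
    /** Returns the longest dotted prefix shared by all packages, or "" if there is none. */
    static String superPackage(Collection<String> packages) {
        List<String> common = null;
        for (String pkg : packages) {
            List<String> parts = Arrays.asList(pkg.split("\\."));
            if (common == null) {
                common = parts;   // first package: everything is shared so far
                continue;
            }
            int shared = 0;
            while (shared < common.size() && shared < parts.size()
                    && common.get(shared).equals(parts.get(shared))) {
                shared++;
            }
            common = common.subList(0, shared);
        }
        return common == null ? "" : String.join(".", common);
    }
}
```

For the three packages above this gives org.joda.time. A result that is empty, or just a top-level name such as "com", suggests the packages do not share a meaningful super-package.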

Hidden packages

With JPMS, a module can hide packages. When hidden, the internal packages are not visible in Javadoc, nor are they visible in the module-info.java file. This means that consumers of the module have no immediate way of knowing what hidden packages a module has.

Now consider again the key restriction that a package must be found in one and only one module. This restriction applies to hidden packages as well as exported ones. Therefore if your application depends on two modules and both have the same hidden package, your application cannot be run as the packages clash. And since information on hidden packages is difficult to obtain, this clash will be surprising. (There are some advanced ways around these clashes using layers, but these are designed for containers, not applications.)
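
A tool can detect such a clash ahead of time by comparing the full package sets of two modules. The sketch below uses ModuleDescriptor.packages(), which covers hidden packages as well as exported ones. PackageClashSketch is a hypothetical name, and the org.shaded.json package in the usage note is invented purely for illustration.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;
import java.util.TreeSet;

final class PackageClashSketch {
    /** Returns every package, exported or hidden, that appears in both modules. */
    static Set<String> sharedPackages(ModuleDescriptor first, ModuleDescriptor second) {
        Set<String> shared = new TreeSet<>(first.packages());
        shared.retainAll(second.packages());
        return shared;
    }
}
```

If two otherwise unrelated modules each bundle a private copy of org.shaded.json, the method reports that package, even though neither module exports it.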

The best solution to this problem is exactly as described in the last section. Consider a project with three exported packages and two hidden ones. So long as the hidden packages are sub-packages of the module name, we should be fine:

  module org.joda.time {
    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
    // not exported: org.joda.time.base;
    // not exported: org.joda.time.tz;
  }

By using the super-package name as the module name, the module developer has taken ownership of that package and everything below it. So long as all the non-exported packages are conceptually sub-packages, the end-user application should not see any hidden package clashes.

Automatic modules

JPMS includes a feature whereby a regular jar file, without a module-info.class file, turns into a special kind of module just by placing it on the modulepath. The automatic module feature is controversial in general, but a key part of this is that the name of the module is derived from the filename of the jar file. In addition, it means that people writing module-info.java files have to guess the name that someone else will use for a module. Having to guess a name, and having the Java platform pick a name based on the filename of a jar file are both bad ideas in my opinion, and that of many others, but our efforts to stop them seem to have failed.

The naming approach outlined in this article provides a means to mitigate the worst effects of this. If everyone uses reverse-DNS based on the super-package, then the guesses that people make should be reasonably accurate, as the selection process of a name should be fairly straightforward.
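
To see why the filename-derived names are project-style, here is a rough approximation of the derivation, based on the rules documented for ModuleFinder.of. It is a sketch rather than the JDK's own code: AutomaticNameSketch and deriveName are hypothetical names, the version that the real algorithm also extracts is simply discarded, and a jar with an Automatic-Module-Name manifest entry would use that entry instead.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AutomaticNameSketch {
    // A hyphen followed by digits, then a dot or the end of the name, starts the version part.
    private static final Pattern VERSION_START = Pattern.compile("-(\\d+(\\.|$))");

    /** Approximates the module name derived from a jar filename. */
    static String deriveName(String jarFileName) {
        String name = jarFileName.endsWith(".jar")
                ? jarFileName.substring(0, jarFileName.length() - 4)
                : jarFileName;
        Matcher matcher = VERSION_START.matcher(name);
        if (matcher.find()) {
            name = name.substring(0, matcher.start());   // drop the version and what follows
        }
        return name.replaceAll("[^A-Za-z0-9]", ".")      // dashes and other symbols become dots
                .replaceAll("\\.{2,}", ".")              // collapse runs of dots
                .replaceAll("^\\.|\\.$", "");            // trim a leading or trailing dot
    }
}
```

With this sketch, joda-convert-1.2.jar becomes joda.convert, not org.joda.convert, so anyone who guessed the reverse-DNS name in their module-info.java would be wrong until the library publishes a real module name.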

What if there isn't a clear super-package?

There are two cases to consider.

The first case is where there really is a super-package, it's just that it has no code. In this case, the implied super-package should be used. (Note that this example is Google Guava, which doesn't have guava in the package name!):

  module com.google.common {
    exports com.google.common.base;
    exports com.google.common.collect;
    exports com.google.common.io;
  }

The second case is where a jar file has two completely unrelated super-packages:

  foo.jar
  - package com.foo.util
  - package com.foo.util.money
  - package com.bar.client

The right approach here is to break the jar file into two separate modules:

  module com.foo.util {
    requires com.bar.client;
    exports com.foo.util;
    exports com.foo.util.money;
  }
  module com.bar.client {
    exports com.bar.client;
  }

Failure to do this is highly likely to cause clashes at some point, as there is no way that com.foo.util should be claiming ownership of the com.bar.client namespace.

If com.bar.client is going to be a hidden package when converted to modules, then instead of it being a separate module, it can be repackaged (i.e. shaded) under the module's super-package:

  module com.foo.util {
    exports com.foo.util;
    exports com.foo.util.money;
    // not exported: com.foo.util.shade.com.bar.client;
  }

Can you have sub-modules?

Yes. When a module name is chosen, the developer is taking control of a namespace. That namespace consists of the module name and all sub-names below it - sub-package names and sub-module names.

Ownership of that namespace allows the developer to release one module or many. The main constraint is that there should not be two published modules containing the same package.

As a side effect of this, the practice of larger projects releasing an "all" jar will need to stop. An "all" jar is used when the project has lots of separate jar files, but also wants to allow end-users to depend on a single jar file. These "all" jar files are a pain in Maven dependency trees, but will be a disaster in JPMS ones, as there is no way to override the metadata, unlike in Maven.

What if my existing project does not meet these guidelines?

The harsh suggestion is to change the project in an incompatible manner so it does meet the guidelines. JPMS in Java SE 9 is disruptive. It does not take the approach of providing all the tools necessary to meet all the edge cases in current deployments. As such, it is not surprising that some jar files and some projects will require some major rework.

Why ignore the Maven artifactId?

JPMS is an extension to the Java platform (language and runtime). Maven is a build system. Both are necessary, but they have different purposes, needs and conventions.

JPMS is all about packages, grouping them together to form modules and linking those. In this way, developers are working with source code, just like any other source code. What artifacts the source code is packed up into is a separate question. Understanding the separation is hard, because currently there is a one-to-one mapping between the module and the jar file. However, we should not assume this will always be the case in the future.

Another example of this separation is versioning. JPMS has little to no support for versions, yet build systems like Maven do. When running the application, Maven is responsible for collecting a coherent set of artifacts (jar files) to run the application, just as before. It's just that some of those might be modules.

Finally, the Maven artifactId does not exist in isolation. Maven makes unique identifiers by combining the groupId, artifactId and classifier. Only the combination is sufficiently globally unique to be useful. Picking out just the artifactId and trying to make a unique module name from it is asking for trouble.

See also this follow up article on modules vs artifacts.

Summary

JPMS module names, and the module-info.java in general, are going to require real thought to get right. The module declaration will be as much a part of your API as your method signatures.

The importance is heightened because, unlike Maven and other module systems, JPMS has no way to fix broken metadata. If you rely on some modular jar files, and get a clash or find some other mistake in the module declarations, your only options will be to not use JPMS or to rewrite the module declarations yourself. Given this difficulty, it is not yet clear that JPMS will be a success, thus your best option may be to not modularize your code.

See the TL;DR section above for the summary of the module name proposal. Feedback and questions welcome.

PS. For clarity, my personal interest is ensuring Java succeeds, something that will IMO require consistent naming.

Monday, 17 April 2017

Java 9 modules - JPMS basics

The Java Platform Module System (JPMS) is the major new feature of Java SE 9. In this article, I will introduce it, leaving most of my opinions to a follow up article. This is based on these slides.

Java Platform Module System (JPMS)

The new module system, developed as Project Jigsaw, is intended to raise the abstraction level of coding in Java as follows:

The primary goals of this Project are to:
* Make the Java SE Platform, and the JDK, more easily scalable down to small computing devices;
* Improve the security and maintainability of Java SE Platform Implementations in general, and the JDK in particular;
* Enable improved application performance; and
* Make it easier for developers to construct and maintain libraries and large applications, for both the Java SE and EE Platforms.
To achieve these goals we propose to design and implement a standard module system for the Java SE Platform and to apply that system to the Platform itself, and to the JDK. The module system should be powerful enough to modularize the JDK and other large legacy code bases, yet still be approachable by all developers.

However as we shall see, project goals are not always met.

What is a JPMS Module?

JPMS is a change to the Java libraries, language and runtime. This means that it affects the whole stack that developers code with day-to-day, and as such JPMS could have a big impact. For compatibility reasons, most existing code can ignore JPMS in Java SE 9, something that may prove to be very useful.

The key conceptual point to grasp is that JPMS adds a new concept to the JVM - modules. Where previously, code was organized into fields, methods, classes, interfaces and packages, with Java SE 9 there is a new structural element - modules.

  • a class is a container of fields and methods
  • a package is a container of classes and interfaces
  • a module is a container of packages

Because this is a new JVM element, it means the runtime can apply strong access control. With Java 8, a developer can express that the methods of a class cannot be seen by other classes by declaring them private. With Java 9, a developer can express that a package cannot be seen by other modules - ie. a package can be hidden within a module.

Being able to hide packages should in theory be a great benefit for application design. No longer should there be a need for a package to be named "impl" or "internal" with Javadoc declaring "please don't use types from this package". Unfortunately, life won't be quite that simple.

Creating a module is relatively simple however. A module is typically just a jar file that has a module-info.class file at the root - known as a modular jar file. And that file is created from a module-info.java file in your sourcebase (see below for more details).

Using a modular jar file involves adding the jar file to the modulepath instead of the classpath. If a modular jar file is on the classpath, it will not act as a module at all, and the module-info.class will be ignored. As such, while a modular jar file on the modulepath will have hidden packages enforced by the JVM, a modular jar file on the classpath will not have hidden packages at all.
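
One way to observe the difference at runtime is to ask a class which module it belongs to, using Class.getModule() from Java SE 9. In this small sketch, WhereAmISketch and moduleOf are hypothetical names.

```java
final class WhereAmISketch {
    /** Returns the module name for a class, or "<unnamed>" for code loaded from the classpath. */
    static String moduleOf(Class<?> type) {
        Module module = type.getModule();
        return module.isNamed() ? module.getName() : "<unnamed>";
    }

    public static void main(String[] args) {
        // A JDK class is always in a named module.
        System.out.println(moduleOf(String.class));
        // The answer for application code depends on whether its jar was on the modulepath.
        System.out.println(moduleOf(WhereAmISketch.class));
    }
}
```

The same class file reports a module name when its modular jar is on the modulepath, and the unnamed module when the jar is on the classpath.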

Other module systems

Java has historically had other module systems, most notably OSGi and JBoss Modules. It is important to understand that JPMS has little resemblance to those systems.

Both OSGi and JBoss Modules have to exist without direct support from the JVM, yet still provide some additional support for modules. This is achieved by launching each module in its own class loader, a technique that gets the job done, yet is not without its own issues.

Unsurprisingly, given these are existing module systems, experts from those groups have been included in the formal Expert Group developing JPMS. However, this relationship has not been harmonious. Fundamentally, the JPMS authors (Oracle) have set out to build a JVM extension that can be used for something that can be described as modules, whereas the existing module systems derive experience and value from real use cases and tricky edge cases in big applications that exist today.

When reading about modules, it is important to consider whether the authors of the article you are reading are from the OSGi/JBoss Modules design camp. (I have never actively used OSGi or JBoss Modules, although I have used Eclipse and other tools that use OSGi internally.)

module-info.java

The module-info.java file contains the instructions that define a module (the most important ones are covered here, but there are more). This is a .java file, however the syntax is nothing like any .java file you've seen before.

There are two key questions that you have to answer to create the file - what does this module depend on, and what does it export:

module com.opengamma.util {
  requires org.joda.beans;  // this is a module name, not a package name
  requires com.google.guava;

  exports com.opengamma.util;  // this is a package name, not a module name
}

(The names to use for modules needed a whole separate article, for this one I'll use package-name style)

This module declaration says that com.opengamma.util depends on (requires) org.joda.beans and com.google.guava. It exports one package, com.opengamma.util. All other packages are hidden when using the modulepath (enforced by the JVM).

There is an implicit dependency on java.base, the core module of the JDK. Note that the JDK itself is also modularized, so if you want to depend on Swing, XML or Logging, that dependency needs to be expressed.

module org.joda.beans {
  requires transitive org.joda.convert;

  exports org.joda.beans;
  exports org.joda.beans.ser;
}

This module declaration says that org.joda.beans depends on (requires) org.joda.convert. The "requires transitive", as opposed to a simple "requires", means that any module that requires org.joda.beans can also see and use the packages from org.joda.convert. This is used here as Joda-Beans has methods where the return type is from Joda-Convert. This is shown by a dashed line.

module org.joda.convert {
  requires static com.google.guava;

  exports org.joda.convert;
}

This module declaration says that org.joda.convert depends on (requires) com.google.guava, but only at compile time, "requires static", as opposed to a simple "requires". This is an optional dependency. If Guava is on the modulepath, then Joda-Convert will be able to see and use it, and no error will occur if Guava is not present. This is shown by a dotted line.
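
The difference between the three kinds of "requires" is also visible through the java.lang.module API. The sketch below builds in-memory equivalents of the two declarations above and lists each dependency with its modifiers. RequiresSketch and its methods are hypothetical names, and real code would normally read the descriptor from a module-info.class file rather than build it by hand.

```java
import java.lang.module.ModuleDescriptor;
import java.lang.module.ModuleDescriptor.Requires;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class RequiresSketch {
    /** In-memory equivalent of the org.joda.beans declaration. */
    static ModuleDescriptor jodaBeans() {
        return ModuleDescriptor.newModule("org.joda.beans")
                .requires(Set.of(Requires.Modifier.TRANSITIVE), "org.joda.convert")
                .exports("org.joda.beans")
                .exports("org.joda.beans.ser")
                .build();
    }

    /** In-memory equivalent of the org.joda.convert declaration. */
    static ModuleDescriptor jodaConvert() {
        return ModuleDescriptor.newModule("org.joda.convert")
                .requires(Set.of(Requires.Modifier.STATIC), "com.google.guava")
                .exports("org.joda.convert")
                .build();
    }

    /** Maps each required module name to the modifiers on that dependency. */
    static Map<String, Set<Requires.Modifier>> dependencies(ModuleDescriptor descriptor) {
        Map<String, Set<Requires.Modifier>> result = new TreeMap<>();
        for (Requires requires : descriptor.requires()) {
            result.put(requires.name(), requires.modifiers());
        }
        return result;
    }
}
```

Calling dependencies(jodaBeans()) shows org.joda.convert marked TRANSITIVE, and dependencies(jodaConvert()) shows com.google.guava marked STATIC. As far as I can tell from the builder documentation, the implicit java.base dependency is added to the result too.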

Access rules

When running a modular jar on the modulepath with JVM access rules applied, code in package A can see a type in package B if:

  • the type is public
  • package B is exported from its module
  • there is a dependency from the module containing package A to the module containing package B

Thus, in the example above, code in module com.opengamma.util can see packages org.joda.beans, org.joda.beans.ser, org.joda.convert and any package exported by Guava. However, it cannot see package org.joda.convert.internal (as it is not exported). In addition, code in module com.google.guava cannot see code in package org.joda.beans or org.joda.convert as there is no modular dependency.
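
The three rules map quite directly onto methods of java.lang.Module. The following is a simplified sketch, not the JVM's actual access check: AccessRulesSketch and canSee are hypothetical names, and it ignores nested types, same-package access and reflection.

```java
import java.lang.reflect.Modifier;

final class AccessRulesSketch {
    /** Applies the three rules to a top-level type, as seen from code in module 'from'. */
    static boolean canSee(Module from, Class<?> type) {
        return canSee(from, type.getModule(), type.getPackageName(),
                Modifier.isPublic(type.getModifiers()));
    }

    /** Same check, expressed with a package name instead of a loaded class. */
    static boolean canSee(Module from, Module target, String packageName, boolean typeIsPublic) {
        return typeIsPublic                               // the type is public
                && target.isExported(packageName, from)   // package B is exported (at least to 'from')
                && from.canRead(target);                  // module of A depends on (reads) module of B
    }
}
```

For example, code in any module can see java.lang.String, because java.lang is exported by java.base and every module reads java.base. A public type in one of java.base's internal packages fails the second rule.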

What can go wrong?

The basics described above are simple enough. It is initially quite easy to imagine how you might build an application from these foundations and benefit from hiding packages. Unfortunately, quite a few things can go wrong.

1) All use of module-info files only applies if using modular jars on the modulepath. For compatibility, all code on the classpath is packaged up as a special unnamed module, with no hidden packages and full access to the whole JDK. Thus, the security benefits of hiding packages are marginal at best. However, the modules of the JDK itself are always run in modular mode, thus are always guaranteed the security benefits.

2) Versions of modules are not handled. You cannot have the same module name loaded twice - you cannot have two versions of the same module loaded at the same time. It is left entirely to you, and thus to your build tool, to create a coherent set of modules that can actually be run. Thus, the classpath hell situation caused by clashing versions is not solved. Note that putting the version number in the module name is a Bad Idea that does not solve this problem and creates others.

3) Two modules may not contain the same package. This seems eminently sensible, until you consider that it also applies to hidden packages. Since hidden packages are not listed in module-info.class, a tool like Maven must unpack the jar file to discover what hidden packages there are in order to warn of clashes. As a user of the library, such a clash will be completely surprising, as you won't have any indication of the hidden packages in the Javadoc. This is a more general indication that JPMS does not provide sufficient isolation between modules, for reasons that are far from clear at this point.

4) There must be no cycles between modules, at compile time and at runtime. Again, this seems sensible - who wants to have module A depend on B depend on C which depends on A? But the reality of existing projects is that this happens, and on the classpath is not a problem. For example, consider what would happen if Guava decided to depend on Joda-Convert in the example above. This restriction will make some existing open source projects hard to migrate.

5) Reflection is changing, such that non-public fields and methods will no longer be accessible via reflection. Since almost every framework uses reflection in this way, there will be significant work needed to migrate existing code. In particular, the JDK will be very locked down against reflection, which may prove painful (command line flags can escape the trap for now). This article hasn't had a chance to explore how the module declaration can influence reflection - see "opens" in the slides for more details.

6) Are your dependencies modularized? In theory, you can only turn your code into a module once all your dependencies are also modules. For any large application with hundreds of jar file dependencies, this will be a problem. The "solution" is automatic modules, where a normal jar file placed on the modulepath is automatically turned into a module. This process is controversial, with naming a big issue. Library authors should not publish modules that depend on automatic modules to public repositories like Maven Central unless they have the Automatic-Module-Name manifest entry. Again, automatic modules deserve their own article!

7) Module naming is not yet set in stone. I've come to believe that naming your module after the highest package it contains, causing that module to "take ownership" of the subpackages, is the only sane strategy.

8) Conflicts with build systems - who is in charge? A Maven pom.xml also contains information about a project. Should it be extended to allow module information to be added? I would suggest not, because the module-info.java file contains a binding part of your API, and that is best expressed in .java code, not metadata like pom.xml.
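
To make point 4 concrete, the "no cycles" rule can be checked with a simple depth-first walk over the "requires" edges. This is an illustrative sketch, not how javac or the JVM performs the check. CycleSketch and hasCycle are hypothetical names, and the dependency graph is passed in as a plain map of module name to required module names.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class CycleSketch {
    /** Returns true if following 'requires' edges from some module leads back to itself. */
    static boolean hasCycle(Map<String, List<String>> requires) {
        Set<String> done = new HashSet<>();
        for (String module : requires.keySet()) {
            if (visit(module, requires, new HashSet<>(), done)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String module, Map<String, List<String>> requires,
                                 Set<String> onPath, Set<String> done) {
        if (onPath.contains(module)) {
            return true;    // we are back on the path we are currently walking
        }
        if (done.contains(module)) {
            return false;   // already fully explored without finding a cycle
        }
        onPath.add(module);
        for (String dependency : requires.getOrDefault(module, List.of())) {
            if (visit(dependency, requires, onPath, done)) {
                return true;
            }
        }
        onPath.remove(module);
        done.add(module);
        return false;
    }
}
```

With the graph from this article (com.opengamma.util requires org.joda.beans and com.google.guava, org.joda.beans requires org.joda.convert, org.joda.convert requires com.google.guava) there is no cycle. Add an edge from com.google.guava to org.joda.convert and the check fails, which is the situation described in point 4.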

For those wanting a book to read in much more depth, try this one from Nicolai.

Summary

Do not get too excited about JPMS - modules in Java 9. The above is only a summary of what is possible with the module-info.java file and the restrictions of the JPMS. If you are thinking of modularizing your library or application, please wait a little longer until everything becomes a little clearer.

Feedback welcome, but bear in mind that I am planning more articles.