Monday, 24 April 2017

Java SE 9 - JPMS modules are not artifacts

This is the next article in a series I'm writing to help make sense of the Java Platform Module System (JPMS) in Java SE 9. JPMS was developed as Project Jigsaw. Other articles in the series are Module basics and Module naming.

Module != Artifact

If you want to grasp what JPMS modules are all about, it turns out that it is critical to understand what they are not. In particular, they are not artifacts.

Firstly, let's define an artifact. An artifact is a file produced when developing software. For a project on Maven Central, this includes jar files of bytecode, jar files of sources and jar files of Javadoc. We are interested primarily in the bytecode for this discussion.

Secondly, let's assume that a project is going to have the same module name over time. This is just like package names - projects don't change package name with every release.

Given this, what is the mapping between an artifact and a module?

Versions

Each version of a project will consist of a different artifact (jar file), perhaps released on Maven Central. Each version will have the same module name. But, we also know that the Java platform (JPMS) does not know about versions or version-selection.

Therefore, when assembling a modulepath for Java SE 9, something else is going to have to choose the correct version of the module. This will typically be the build tool, eg. Maven.

But while the classpath will tolerate having two versions of the artifact (typically with bad consequences at runtime), the JPMS modulepath will refuse to start if two modules contain the same package, as would happen if two versions of the same module are found.

Maven already manages versions of course, picking one version from a set of versions that all share the same groupId:artifactId. With Java SE 9 we can say that Maven is picking one artifact from a set of artifacts to use in the runtime JPMS module graph.

Artifacts                        JPMS runtime module
org.joda : joda-convert : 1.2    org.joda.convert
org.joda : joda-convert : 1.1
org.joda : joda-convert : 1.0
(Build tool must pick one of these artifacts for the runtime JPMS module graph)
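
The selection step described above can be sketched in plain Java. This is only an illustration, not how Maven works internally: the PickArtifact class, its Artifact holder and the "highest version wins" rule are all assumptions made for the example (Maven's own mediation rule is different), and the version comparison only handles simple dotted numbers.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PickArtifact {

    /** Hypothetical holder: Maven coordinates plus the module name found in the jar's bytecode. */
    static final class Artifact {
        final String groupId;
        final String artifactId;
        final String version;
        final String moduleName;

        Artifact(String groupId, String artifactId, String version, String moduleName) {
            this.groupId = groupId;
            this.artifactId = artifactId;
            this.version = version;
            this.moduleName = moduleName;
        }

        @Override
        public String toString() {
            return groupId + " : " + artifactId + " : " + version;
        }
    }

    /** Naive comparison of dotted numeric versions such as "1.0" and "1.2" (assumes digits only). */
    static int compareVersions(String a, String b) {
        String[] pa = a.split("\\.");
        String[] pb = b.split("\\.");
        for (int i = 0; i < Math.max(pa.length, pb.length); i++) {
            int na = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int nb = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (na != nb) {
                return Integer.compare(na, nb);
            }
        }
        return 0;
    }

    /** Keeps exactly one artifact per module name; here the highest version wins (an assumption). */
    static Map<String, Artifact> pickOnePerModule(List<Artifact> candidates) {
        Map<String, Artifact> chosen = new LinkedHashMap<>();
        for (Artifact candidate : candidates) {
            chosen.merge(candidate.moduleName, candidate,
                    (current, next) -> compareVersions(current.version, next.version) >= 0 ? current : next);
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Artifact> candidates = Arrays.asList(
                new Artifact("org.joda", "joda-convert", "1.0", "org.joda.convert"),
                new Artifact("org.joda", "joda-convert", "1.2", "org.joda.convert"),
                new Artifact("org.joda", "joda-convert", "1.1", "org.joda.convert"));
        System.out.println(pickOnePerModule(candidates));
    }
}
```

The point of the sketch is the shape of the map: the key is the module name (code world), the value is the single artifact (build world) chosen to provide it. With the three joda-convert versions above, only 1.2 remains.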

Patching bugs

We've all run into the problem of finding a bug in another library, such as one on Maven Central. Most of the time, we work around the bug. Sometimes, we have to fix it and have a private copy of the library (ie. a private version of the library). In extreme cases, we have to publish the bug-fix version to Maven Central.

If you want to publish a patched version of an open source project, the standard way to do this is to use your groupId, not the original groupId.

The patched version still uses the same package name. If it didn't, it would be no use as a patched version.

Exactly the same rationale applies to the module name - the module name of the patched version needs to stay the same. This makes sense, because the module name is an aspect of the bytecode, not of the deployment.

As before, the build tool needs to be set up to pick the correct jar file artifact, this time choosing between the patched one and the original.

Artifacts                                JPMS runtime module
org.joda : joda-convert : 1.2            org.joda.convert
com.foo : patched-joda-convert : 1.2-P
(Build tool must pick either the original artifact or the patch for the runtime JPMS module graph)
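
To see that the module name travels with the bytecode rather than with the deployment, it can help to ask a class which module it belongs to at runtime. The sketch below uses the standard Class.getModule() method from Java SE 9; the ModuleOfClass class and its describe helper are names invented for this example. Notice that nothing in the answer mentions a groupId, artifactId, version or jar file name.

```java
public class ModuleOfClass {

    /** Describes which module a class belongs to, using only information held by the running code. */
    static String describe(Class<?> type) {
        Module module = type.getModule();
        // Classes loaded from the classpath live in an unnamed module
        String name = module.isNamed() ? module.getName() : "<unnamed>";
        return type.getName() + " -> module " + name;
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));
        System.out.println(describe(ModuleOfClass.class));
    }
}
```

A patched jar published under com.foo would give the same answer for its classes as the original jar, provided its module-info still declares org.joda.convert.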

License-driven jars

There can be a situation where the equivalent code is released twice for license reasons. The most common case of this has been driven by the JCP.

Imagine that a JSR has been produced, and because of licensing restrictions, Apache and RedHat each decide to produce their own version of the specification (API) jar, one Apache licensed and one LGPL licensed. There will be two different jar file artifacts in Maven Central - redhat-jsr789-1.0.jar and apache-jsr789-1.0.jar. Both of these contain the same package name, and the same interfaces, because they are the same specification.

When creating a module-info.java file for these, teams might be tempted to put "redhat" or "apache" in the module name. But this would be wrong, as a package must be in one module at runtime. Thus, both teams must use the same module name, based on the package name, such as javax.foo. In reality, future JSRs will need to choose their own module name.

Just as in the other cases, the build tool will need to select the correct jar file artifact to use for any given application.

Artifacts                              JPMS runtime module
org.apache : apache-jsr789 : 1.0       javax.foo
com.redhat : redhat-jsr789 : 1.0-GA
(Build tool must pick one of the artifacts for the runtime JPMS module graph)
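
A small sketch shows why vendor-specific module names would go wrong here. It builds two module descriptors in memory with the java.lang.module.ModuleDescriptor API from Java SE 9 and looks for packages claimed by more than one module. The module names com.redhat.foo and org.apache.foo are hypothetical (they are the "tempting but wrong" names), and the check is my own simplified stand-in for the rule that a package must be in one module, not the platform's actual resolution code.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SplitPackageCheck {

    /** Returns each package that is claimed by two or more modules, with the names of those modules. */
    static Map<String, Set<String>> findDuplicatePackages(List<ModuleDescriptor> modules) {
        Map<String, Set<String>> owners = new TreeMap<>();
        for (ModuleDescriptor module : modules) {
            for (String pkg : module.packages()) {
                owners.computeIfAbsent(pkg, key -> new TreeSet<>()).add(module.name());
            }
        }
        // Keep only the packages with more than one owner
        owners.values().removeIf(names -> names.size() < 2);
        return owners;
    }

    public static void main(String[] args) {
        // Hypothetical vendor-specific module names, both exporting the specification package
        ModuleDescriptor redhat = ModuleDescriptor.newModule("com.redhat.foo").exports("javax.foo").build();
        ModuleDescriptor apache = ModuleDescriptor.newModule("org.apache.foo").exports("javax.foo").build();
        System.out.println(findDuplicatePackages(Arrays.asList(redhat, apache)));
    }
}
```

With differently named modules, an application could end up with both on the modulepath and javax.foo claimed twice. With a shared module name such as javax.foo, the choice is pushed back to the build tool, which is where it belongs.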

Two worlds

JPMS brings front and centre a distinction that we probably haven't thought too much about - that between build tools and code. I find it to be a useful mental model to keep these two worlds separate.

            Build/Deploy world               Code world
Concept     Artifacts (jar files)            Modules
            Versions                         Packages
            Groups                           Classes/Interfaces
            Organizations                    Methods/Fields
Identifier  org.joda : joda-convert : 1.2    org.joda.convert
Tool        Maven                            javac
            Gradle                           java
            Ant                              jar

Summary

All the use cases above (and others) reduce to the fact that a build tool, such as Maven, must pick one artifact from many to satisfy the module requested in the JPMS modules graph. The module, and the module name, is tightly linked to the bytecode and packages. By contrast, the Maven groupId:artifactId:version co-ordinates are a tool for identifying each artifact in order to pick the correct one to use.

Hopefully, this helps explain the philosophical difference between artifacts and modules, and why modules should be named after their super-package, not after the artifactId. And why you cannot derive the module name from groupId:artifactId - just look at the tables above to see the varying artifact names.
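
As a quick illustration of that last point, here is what happens if you try to derive the module name mechanically. The derive method below is a deliberately naive rule invented for this example (join the groupId and artifactId, turning dashes into dots); it is not something any tool is claimed to do.

```java
public class NaiveModuleName {

    /** Deliberately naive, hypothetical rule: groupId + "." + artifactId with dashes turned into dots. */
    static String derive(String groupId, String artifactId) {
        return groupId + "." + artifactId.replace('-', '.');
    }

    public static void main(String[] args) {
        // Artifacts from the tables above, each paired with the module name it must actually use
        String[][] rows = {
                {"org.joda", "joda-convert", "org.joda.convert"},
                {"com.foo", "patched-joda-convert", "org.joda.convert"},
                {"org.apache", "apache-jsr789", "javax.foo"},
                {"com.redhat", "redhat-jsr789", "javax.foo"},
        };
        for (String[] row : rows) {
            String derived = derive(row[0], row[1]);
            System.out.println(row[0] + ":" + row[1] + " derived=" + derived
                    + " actual=" + row[2] + " match=" + derived.equals(row[2]));
        }
    }
}
```

None of the four derived names matches the module name the bytecode needs, and the two pairs that must share a module name come out with different derived names.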

Questions and comments welcome.

4 comments:

  1. Did you read JBoss' (not RedHat's) take on JigSaw? https://developer.jboss.org/blogs/scott.stark/2017/04/14/critical-deficiencies-in-jigsawjsr-376-java-platform-module-system-ec-member-concerns
    Would be interested to know what you think of it.

  2. The implication here is that module names are API just like package and type names. And must be as carefully designed and even published as part of specifications. So JSR 789 must, in addition to specifying a set of types in packages, also specify the module name to be used by all modules providing an implementation of the API. But it is more than just the module name: there is other information in module-info file which is part of the public "face" of the module. And every implementation will need to ensure all private (hidden) packages are properly shaded to ensure the module implementations can be substituted.

    Replies
    1. Yes, that's what we (Bean Validation 2.0, JSR 380) will do; we'll recommend to use a module name of "java.validation" when releasing the API as a JPMS module (in the next version we'll release after Java 9, this recommendation will become a mandatory definition).

      In terms of packages, all packages in our (most, all?) JSR are public, so they'd all have to be exported. Implementations typically provide an API JAR and another one with their implementation, so API and implementation are clearly separated. In fact we have some tooling as part of the TCK which ensures that the API packages are exactly the ones mandated by the spec (nothing less and nothing more).

  3. But JPMS *is* metadata. It's literally describing relationships between code. Just because it is "compiled" doesn't mean it isn't just a descriptor.

    If it shouldn't be thought of as a descriptor, then the syntax is deceptive and only really useful for saying "hey don't look at my internal packages", a constraint that would be better implemented with the addition of friend classes/packages to the existing privacy declarations (so that we don't end up with asinine phrases like "public isn't public").

    If it is actually a descriptor, which most reasonable people expect (thus the ongoing confusion), then the dependency relationships need to be well defined. Specifically, the current JPMS syntax does not clarify whether a dependency is on the specific classes that your code was compiled against (i.e. shared code), or whether any module that matches that declaration can be substituted at run time. When we do this in code, we use interfaces to explicitly warn that the implementation may change, and that it's none of your business anyway. But if I'm compiling against a public final class, my expectation is that that class and only that class will be available at run time.

    OSGi solves this in two ways, one conceptual and one technical. Conceptually it allows you to specify a dependency on specific code, via versioning. If you still don't feel safe importing a package, classloader isolation allows you to have that package directly in your component. This is all anyone really wanted out of modules anyway.
