The Java Platform Module System (JPMS) is soon to arrive, developed as Project Jigsaw. This article follows the introduction and looks at how modules should be named.
As with all "best practices", they are ultimately the opinion of the person writing them. I hope however to convince you that my opinion is right ;-). And as a community, we will certainly benefit if everyone follows the same rules, just like we benefited from everyone using reverse-DNS for package names.
TL;DR - My best practices
These are my recommendations for module naming:
- Module names must be reverse-DNS, just like package names, e.g. org.joda.time.
- Modules are a group of packages. As such, the module name must be related to the package names.
- Module names are strongly recommended to be the same as the name of the super-package.
- Creating a module with a particular name takes ownership of that package name and everything beneath it.
- As the owner of that namespace, any sub-packages may be grouped into sub-modules as desired so long as no package is in two modules.
Thus the following is a well-named module:
module org.joda.time {
    requires org.joda.convert;
    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
    // not exported: org.joda.time.base;
    // not exported: org.joda.time.tz;
}
As can be seen, the module contains a set of packages (exported and hidden), all under one super-package.
The module name is the same as the super-package name.
The author of the module is asserting control over all names below org.joda.time, and could create a module org.joda.time.i18n in the future if desired.
To understand why this approach makes sense, and the finer details, read on.
JPMS naming
Naming anything in software is hard. Unsurprisingly then, agreeing an approach to naming modules has also turned out to be hard.
The naming rules allow dots, but prohibit dashes, thus lots of name options are closed off.
As a side note, module names in the JVM are more flexible, but we are only considering names at the Java level here.
These are the two basic approaches which I think make sense:
1) Project-style. Short names, as commonly seen in the jar filename from Maven Central.
2) Reverse-DNS. Full names, exactly as we've used for package names since Java v1.0.
Here are some examples to make it more clear:
| Project | Project-style | Reverse-DNS |
|---|---|---|
| Joda-Time | joda.time | org.joda.time |
| Commons-IO | commons.io | org.apache.commons.io |
| Strata-Basics | strata.basics | com.opengamma.strata.basics |
| JUnit | junit | org.junit |
All things being equal, we'd choose the shorter name - project-style.
It is certainly more attractive when reading a module-info.java file.
But there are some clear reasons why reverse-DNS must be chosen.
It is worth noting that Mark Reinhold currently indicates a preference for project-style names. However, the linked mail doesn't really deal with the global uniqueness or clashing elements of the naming problem, and others in the expert group disagreed with project-style names.
Ownership and Uniqueness
The original designers of Java made a very shrewd choice to propose reverse-DNS names for packages. This approach has scaled very well, through the incredible rise of open source software. It provides two key properties - Ownership and Uniqueness.
The ownership aspect of reverse-DNS delegates control of part of the global DNS namespace to an individual or company. It is a universally agreed approach with enough breadth of identifiers to make clashes rare. Within that namespace, developers are then responsible for ensuring uniqueness. Together, these two aspects result in globally unique package names. As such, it is pretty rare that code has two colliding packages, despite modern applications pulling in hundreds of dependent jar files. For example, the Spark framework and Apache Spark co-exist despite having the same simple name. But look what happens if we only use project-style names:
| Project | Project-style | Reverse-DNS |
|---|---|---|
| Spark framework | spark.core | com.sparkjava.core |
| Apache-Spark | spark.core | org.apache.spark.core |
As can be seen, the project-style names clash!
JPMS will simply refuse to start a modulepath where two modules have the same name, even if they contain different packages.
(Since these projects haven't chosen module names yet, I've tweaked the example to make them clash. But this example is far from impossible, which is the point here!)
Not convinced?
Well imagine what would happen if package names were not reverse-DNS.
If your application pulls in hundreds of dependencies, do you think there would be no duplicates?
Of course we have project-style names today in Maven - the jar filename is the artifactId, which is a project-style name.
Given this, why don't we have problems today?
Well it turns out that Maven is smart enough to rename the artifact if there is going to be a clash.
The JPMS does not offer this ability - your only choice with a clash will be to rewrite the module-info.class file of the problematic module and all other modules that refer to it.
As a final example of how project-style name clashes can occur, consider a startup creating a new project - "willow".
Since they are small, they choose a module name of "willow".
Over the next year, the startup becomes fantastically successful, growing at an exponential rate, meaning that there are now 100s of modules within the company depending on "willow".
But then a new Open Source project starts up, and calls itself "willow".
Now, the company can't use the open source project.
Nor can the company release "willow" as open source.
These clashes are avoided if reverse-DNS names are used.
To summarize this section, we need reverse-DNS because module names need to be globally unique, even when writing modules that are destined to remain private. The ownership aspect of reverse-DNS provides enough namespace separation for companies to get the uniqueness necessary.
After all, you wouldn't want to confuse Joda-Time with the freight company also called Joda, would you?
Modules as package aggregates
The JPMS design is fundamentally simple - it extends JVM access control to add a new concept "modules" that groups together a set of packages. Given this, there is a very strong link between the concept of a module and the concept of a package.
The key restriction is that a package must be found in one and only one module.
Given that a module is formed from one or more packages, what is the conceptually simplest name that you can choose? I argue that it is one of the package names that forms the module. And thus a name you've already chosen.
Now, consider we have a project with three packages, which of these three should be the module name?
module ??? {
    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
}
Again, I'd argue there isn't really a debate. There is a clear super-package, and that is what should be used as the module name - org.joda.time in this case.
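To make "super-package" concrete, here is a minimal sketch that computes the longest dot-separated prefix shared by a set of package names. The class and method names are hypothetical and are not part of JPMS or any library; it only illustrates the naming rule proposed here.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only: not part of JPMS.
class SuperPackageSketch {

    /** Returns the longest dot-separated prefix shared by all package names, or "" if there is none. */
    static String commonSuperPackage(List<String> packageNames) {
        if (packageNames.isEmpty()) {
            return "";
        }
        String[] common = packageNames.get(0).split("\\.");
        int length = common.length;
        for (String name : packageNames) {
            String[] parts = name.split("\\.");
            length = Math.min(length, parts.length);
            // Shrink the shared prefix at the first segment that differs
            for (int i = 0; i < length; i++) {
                if (!common[i].equals(parts[i])) {
                    length = i;
                    break;
                }
            }
        }
        return String.join(".", Arrays.copyOf(common, length));
    }

    public static void main(String[] args) {
        // prints org.joda.time
        System.out.println(commonSuperPackage(List.of(
                "org.joda.time", "org.joda.time.chrono", "org.joda.time.format")));
    }
}
```

If the result is empty, or is only a top-level segment such as com, the packages probably do not share a clear super-package.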
Hidden packages
With JPMS, a module can hide packages. When hidden, the internal packages are not visible in Javadoc, nor are they visible in the module-info.java file. This means that consumers of the module have no immediate way of knowing what hidden packages a module has.
Now consider again the key restriction that a package must be found in one and only one module.
This restriction applies to hidden packages as well as exported ones.
Therefore if your application depends on two modules and both have the same hidden package, your application cannot be run as the packages clash.
And since information on hidden packages is difficult to obtain, this clash will be surprising.
(There are some advanced ways to get around these clashes using layers, but these are designed for containers, not applications.)
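The "one and only one module" restriction can be sketched as a simple check over the full package list of each module, hidden packages included. This is only an illustration of the rule, not how the JPMS resolver is implemented. The class name, method name, and the com.example module and package names are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, for illustration only: not the JPMS resolver.
class SplitPackageSketch {

    /** Maps each package found in more than one module to the names of the modules containing it. */
    static Map<String, Set<String>> findSplitPackages(Map<String, List<String>> packagesByModule) {
        Map<String, Set<String>> modulesByPackage = new TreeMap<>();
        packagesByModule.forEach((moduleName, packages) -> {
            for (String pkg : packages) {
                modulesByPackage.computeIfAbsent(pkg, k -> new TreeSet<>()).add(moduleName);
            }
        });
        // Keep only the packages that appear in two or more modules
        modulesByPackage.values().removeIf(modules -> modules.size() < 2);
        return modulesByPackage;
    }

    public static void main(String[] args) {
        // Two made-up modules that both contain the same hidden package
        Map<String, List<String>> app = Map.of(
                "com.example.alpha", List.of("com.example.alpha", "com.example.internal.util"),
                "com.example.beta", List.of("com.example.beta", "com.example.internal.util"));
        System.out.println(findSplitPackages(app));
    }
}
```

The catch is that a check like this needs the complete package list of every module, and for hidden packages that is exactly the information that is hard to obtain.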
The best solution to this problem is exactly as described in the last section.
Consider a project with three exported packages and two hidden ones.
So long as the hidden packages are sub-packages of the module name, we should be fine:
module org.joda.time {
    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
    // not exported: org.joda.time.base;
    // not exported: org.joda.time.tz;
}
By using the super-package name as the module name, the module developer has taken ownership of that package and everything below it.
So long as all the non-exported packages are conceptually sub-packages, the end-user application should not see any hidden package clashes.
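The ownership rule is easy to check mechanically. The following sketch, with hypothetical class and method names, lists the packages of a module that fall outside the namespace implied by its name. Note that the comparison has to respect the dot boundary, so org.joda.timezone is not below org.joda.time.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only: JPMS itself does not enforce this convention.
class NamespaceOwnershipSketch {

    /** True if the package is the module name itself or a sub-package of it. */
    static boolean isInNamespace(String moduleName, String packageName) {
        return packageName.equals(moduleName) || packageName.startsWith(moduleName + ".");
    }

    /** Returns the packages that are not below the module name. */
    static List<String> packagesOutsideNamespace(String moduleName, List<String> packageNames) {
        return packageNames.stream()
                .filter(pkg -> !isInNamespace(moduleName, pkg))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // prints [] because every package, exported or hidden, is below org.joda.time
        System.out.println(packagesOutsideNamespace("org.joda.time", List.of(
                "org.joda.time", "org.joda.time.chrono", "org.joda.time.format",
                "org.joda.time.base", "org.joda.time.tz")));
    }
}
```

A check along these lines could run as part of a build, but that is a suggestion on my part rather than an existing tool.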
Automatic modules
JPMS includes a feature whereby a regular jar file, without a module-info.class file, turns into a special kind of module just by placing it on the modulepath.
The automatic module feature is controversial in general, but a key part of this is that the name of the module is derived from the filename of the jar file.
In addition, it means that people writing module-info.java files have to guess the name that someone else will use for a module. Having to guess a name, and having the Java platform pick a name based on the filename of a jar file are both bad ideas in my opinion, and that of many others, but our efforts to stop them seem to have failed.
The naming approach outlined in this article provides a means to mitigate the worst effects of this.
If everyone uses reverse-DNS based on the super-package, then the guesses that people make should be reasonably accurate, as the selection process of a name should be fairly straightforward.
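To see why filename-derived names are a problem, here is a rough sketch of the derivation, based on my reading of the rules documented for java.lang.module.ModuleFinder: strip the .jar suffix, cut at the first hyphen that starts a version number, then turn every non-alphanumeric character into a dot. Treat it as an approximation under those assumptions, not as the platform's actual implementation. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Approximation for illustration only: the real logic lives inside the JDK.
class AutomaticModuleNameSketch {

    // A hyphen followed by digits and then a dot or the end of the name, e.g. "-2.9.9"
    private static final Pattern VERSION_START = Pattern.compile("-(\\d+(\\.|$))");

    static String deriveName(String jarFileName) {
        String name = jarFileName.endsWith(".jar")
                ? jarFileName.substring(0, jarFileName.length() - 4)
                : jarFileName;
        // Drop everything from the first version-like suffix onwards
        Matcher matcher = VERSION_START.matcher(name);
        if (matcher.find()) {
            name = name.substring(0, matcher.start());
        }
        return name.replaceAll("[^A-Za-z0-9]", ".")   // non-alphanumeric characters become dots
                .replaceAll("\\.{2,}", ".")            // collapse repeated dots
                .replaceAll("^\\.|\\.$", "");          // strip a leading or trailing dot
    }

    public static void main(String[] args) {
        // prints joda.time
        System.out.println(deriveName("joda-time-2.9.9.jar"));
    }
}
```

Under these rules the typical Maven jar filename produces a project-style name such as joda.time or commons.io, not the reverse-DNS name argued for in this article, so anyone guessing the name has to guess what the jar file will be called.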
What if there isn't a clear super-package?
There are two cases to consider.
The first case is where there really is a super-package, it's just that it has no code. In this case, the implied super-package should be used. (Note that this example is Google Guava, which doesn't have guava in the package name!):
module com.google.common {
    exports com.google.common.base;
    exports com.google.common.collect;
    exports com.google.common.io;
}
The second case is where a jar file has two completely unrelated super-packages:
foo.jar
- package com.foo.util
- package com.foo.util.money
- package com.bar.client
The right approach here is to break the jar file into two separate modules:
module com.foo.util {
    requires com.bar.client;
    exports com.foo.util;
    exports com.foo.util.money;
}
module com.bar.client {
    exports com.bar.client;
}
Failure to do this is highly likely to cause clashes at some point, as there is no way that com.foo.util should be claiming ownership of the com.bar.client namespace.
If com.bar.client is going to be a hidden package when converted to modules, then instead of it being a separate module, it can be repackaged (i.e. shaded) under the module's super-package:
module com.foo.util {
    exports com.foo.util;
    exports com.foo.util.money;
    // not exported: com.foo.util.shade.com.bar.client;
}
Can you have sub-modules?
Yes. When a module name is chosen, the developer is taking control of a namespace. That namespace consists of the module name and all sub-names below it - sub-package names and sub-module names.
Ownership of that namespace allows the developer to release one module or many. The main constraint is that there should not be two published modules containing the same package.
As a side effect of this, the practice of larger projects releasing an "all" jar will need to stop. An "all" jar is used when the project has lots of separate jar files, but also wants to allow end-users to depend on a single jar file. These "all" jar files are a pain in Maven dependency trees, but will be a disaster in JPMS ones, as there is no way to override the metadata, unlike in Maven.
What if my existing project does not meet these guidelines?
The harsh suggestion is to change the project in an incompatible manner so it does meet the guidelines.
JPMS in Java SE 9 is disruptive. It does not take the approach of providing all the tools necessary to meet all the edge cases in current deployments. As such, it is not surprising that some jar files and some projects will require some major rework.
Why ignore the Maven artifactId?
JPMS is an extension to the Java platform (language and runtime). Maven is a build system. Both are necessary, but they have different purposes, needs and conventions.
JPMS is all about packages, grouping them together to form modules and linking those.
In this way, developers are working with source code, just like any other source code.
What artifacts the source code is packed up into is a separate question.
Understanding the separation is hard because there is currently a one-to-one mapping between the module and the jar file. However, we should not assume this will always be the case in the future.
Another example of this separation is versioning. JPMS has little to no support for versions, yet build systems like Maven do. When running the application, Maven is responsible for collecting a coherent set of artifacts (jar files) to run the application, just as before. It's just that some of those might be modules.
Finally, the Maven artifactId does not exist in isolation. Maven makes unique identifiers by combining the groupId, artifactId and classifier. Only the combination is sufficiently globally unique to be useful. Picking out just the artifactId and trying to make a unique module name from it is asking for trouble.
See also this follow up article on modules vs artifacts.
Summary
JPMS module names, and the module-info.java in general, are going to require real thought to get right. The module declaration will be as much a part of your API as your method signatures.
The importance is heightened because, unlike Maven and other module systems, JPMS has no way to fix broken metadata.
If you rely on some modular jar files, and get a clash or find some other mistake in the module declarations, your only options will be to not use JPMS or to rewrite the module declarations yourself.
Given this difficulty, it is not yet clear that JPMS will be a success, thus your best option may be to not modularize your code.
See the TL;DR section above for the summary of the module name proposal.
Feedback and questions welcome.
PS. For clarity, my personal interest is ensuring Java succeeds, something that will IMO require consistent naming.