Thursday, 20 April 2017

Java SE 9 - JPMS module naming

The Java Platform Module System (JPMS) is soon to arrive, developed as Project Jigsaw. This article follows the introduction and looks at how modules should be named.

As with all "best practices", they are ultimately the opinion of the person writing them. I hope however to convince you that my opinion is right ;-). And as a community, we will certainly benefit if everyone follows the same rules, just like we benefited from everyone using reverse-DNS for package names.

TL;DR - My best practices

These are my recommendations for module naming:

  • Module names must be reverse-DNS, just like package names, e.g. org.joda.time.
  • Modules are a group of packages. As such, the module name must be related to the package names.
  • Module names are strongly recommended to be the same as the name of the super-package.
  • Creating a module with a particular name takes ownership of that package name and everything beneath it.
  • As the owner of that namespace, any sub-packages may be grouped into sub-modules as desired so long as no package is in two modules.

Thus the following is a well-named module:

  module org.joda.time {
    requires org.joda.convert;

    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
    // not exported: org.joda.time.base;
    // not exported: org.joda.time.tz;
  }

As can be seen, the module contains a set of packages (exported and hidden), all under one super-package. The module name is the same as the super-package name. The author of the module is asserting control over all names below org.joda.time, and could create a module org.joda.time.i18n in the future if desired.

To understand why this approach makes sense, and the finer details, read on.

JPMS naming

Naming anything in software is hard. Unsurprisingly then, agreeing an approach to naming modules has also turned out to be hard.

The naming rules allow dots, but prohibit dashes, thus lots of name options are closed off. As a side note, module names in the JVM are more flexible, but we are only considering names at the Java level here.

These are the two basic approaches which I think make sense:

1) Project-style. Short names, as commonly seen in the jar filename from Maven Central.

2) Reverse DNS. Full names, exactly as we've used for package names since Java v1.0.

Here are some examples to make it more clear:

  Project          Project-style    Reverse-DNS
  Joda-Time        joda.time        org.joda.time
  Commons-IO       commons.io       org.apache.commons.io
  Strata-Basics    strata.basics    com.opengamma.strata.basics
  JUnit            junit            org.junit

All things being equal, we'd choose the shorter name - project-style. It is certainly more attractive when reading a module-info.java file. But there are some clear reasons why reverse-DNS must be chosen.

It is worth noting that Mark Reinhold currently indicates a preference for project-style names. However, the linked mail doesn't really deal with the global uniqueness or clashing elements of the naming problem, and others in the expert group disagreed with project-style names.

Ownership and Uniqueness

The original designers of Java made a very shrewd choice to propose reverse-DNS names for packages. This approach has scaled very well, through the incredible rise of open source software. It provides two key properties - Ownership and Uniqueness.

The ownership aspect of reverse-DNS delegates control of part of the global DNS namespace to an individual or company. It is a universally agreed approach with enough breadth of identifiers to make clashes rare. Within that namespace, developers are then responsible for ensuring uniqueness. Together, these two aspects result in globally unique package names. As such, it is pretty rare that code has two colliding packages, despite modern applications pulling in hundreds of dependent jar files. For example, the Spark framework and Apache Spark co-exist despite having the same simple name. But look what happens if we only use project-style names:

  Project          Project-style    Reverse-DNS
  Spark framework  spark.core       com.sparkjava.core
  Apache-Spark     spark.core       org.apache.spark.core

As can be seen, the project-style names clash! JPMS will simply refuse to start a modulepath where two modules have the same name, even if they contain different packages. (Since these projects haven't chosen module names yet, I've tweaked the example to make them clash. But this example is far from impossible, which is the point here!)

Not convinced? Well, imagine what would happen if package names were not reverse-DNS. If your application pulls in hundreds of dependencies, do you think there would be no duplicates?

Of course we have project-style names today in Maven - the jar filename is the artifactId, which is a project-style name. Given this, why don't we have problems today? Well, it turns out that Maven is smart enough to rename the artifact if there is going to be a clash. The JPMS does not offer this ability - your only choice with a clash will be to rewrite the module-info.class file of the problematic module and all other modules that refer to it.

As a final example of how project-style name clashes can occur, consider a startup creating a new project - "willow". Since they are small, they choose a module name of "willow". Over the next year, the startup becomes fantastically successful, growing at an exponential rate, meaning that there are now 100s of modules within the company depending on "willow". But then a new Open Source project starts up, and calls itself "willow". Now, the company can't use the open source project. Nor can the company release "willow" as open source. These clashes are avoided if reverse-DNS names are used.

To summarize this section, we need reverse-DNS because module names need to be globally unique, even when writing modules that are destined to remain private. The ownership aspect of reverse-DNS provides enough namespace separation for companies to get the uniqueness necessary. After all, you wouldn't want to confuse Joda-Time with the freight company also called Joda would you?

Modules as package aggregates

The JPMS design is fundamentally simple - it extends JVM access control to add a new concept "modules" that groups together a set of packages. Given this, there is a very strong link between the concept of a module and the concept of a package.

The key restriction is that a package must be found in one and only one module.

Given that a module is formed from one or more packages, what is the conceptually simplest name that you can choose? I argue that it is one of the package names that forms the module. And thus a name you've already chosen. Now, consider we have a project with three packages, which of these three should be the module name?

  module ??? {
    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
  }

Again, I'd argue there isn't really a debate. There is a clear super-package, and that is what should be used as the module name - org.joda.time in this case.

Hidden packages

With JPMS, a module can hide packages. When hidden, the internal packages are not visible in Javadoc, nor are they visible in the module-info.java file. This means that consumers of the module have no immediate way of knowing what hidden packages a module has.

Now consider again the key restriction that a package must be found in one and only one module. This restriction applies to hidden packages as well as exported ones. Therefore if your application depends on two modules and both have the same hidden package, your application cannot be run as the packages clash. And since information on hidden packages is difficult to obtain, this clash will be surprising. (There are some advanced ways to work around these clashes using layers, but these are designed for containers, not applications.)

The best solution to this problem is exactly as described in the last section. Consider a project with three exported packages and two hidden ones. So long as the hidden packages are sub-packages of the module name, we should be fine:

  module org.joda.time {
    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
    // not exported: org.joda.time.base;
    // not exported: org.joda.time.tz;
  }

By using the super-package name as the module name, the module developer has taken ownership of that package and everything below it. So long as all the non-exported packages are conceptually sub-packages, the end-user application should not see any hidden package clashes.

Automatic modules

JPMS includes a feature whereby a regular jar file, without a module-info.class file, turns into a special kind of module just by placing it on the modulepath. The automatic module feature is controversial in general, but a key part of this is that the name of the module is derived from the filename of the jar file. In addition, it means that people writing module-info.java files have to guess the name that someone else will use for a module. Having to guess a name, and having the Java platform pick a name based on the filename of a jar file are both bad ideas in my opinion, and that of many others, but our efforts to stop them seem to have failed.

The naming approach outlined in this article provides a means to mitigate the worst effects of this. If everyone uses reverse-DNS based on the super-package, then the guesses that people make should be reasonably accurate, as the selection process of a name should be fairly straightforward.
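
To see how the derived names work, consider a module that depends on two plain jar files placed on the modulepath. An automatic module's name is taken from the jar filename, with the extension and trailing version stripped and remaining punctuation turned into dots. The sketch below assumes jar files named guava-19.0.jar and joda-convert-1.8.1.jar, and an illustrative application module:

  module com.foo.app {
    requires guava;         // automatic module, name derived from guava-19.0.jar
    requires joda.convert;  // automatic module, name derived from joda-convert-1.8.1.jar

    exports com.foo.app;
  }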

What if there isn't a clear super-package?

There are two cases to consider.

The first case is where there really is a super-package, it's just that it has no code. In this case, the implied super-package should be used. (Note that this example is Google Guava, which doesn't have guava in the package name!):

  module com.google.common {
    exports com.google.common.base;
    exports com.google.common.collect;
    exports com.google.common.io;
  }

The second case is where a jar file has two completely unrelated super-packages:

  foo.jar
  - package com.foo.util
  - package com.foo.util.money
  - package com.bar.client

The right approach here is to break the jar file into two separate modules:

  module com.foo.util {
    requires com.bar.client;
    exports com.foo.util;
    exports com.foo.util.money;
  }
  module com.bar.client {
    exports com.bar.client;
  }

Failure to do this is highly likely to cause clashes at some point, as there is no way that com.foo.util should be claiming ownership of the com.bar.client namespace.

If com.bar.client is going to be a hidden package when converted to modules, then instead of it being a separate module, it can be repackaged (i.e. shaded) under the module's super-package:

  module com.foo.util {
    exports com.foo.util;
    exports com.foo.util.money;
    // not exported: com.foo.util.shaded.com.bar.client;
  }

Can you have sub-modules?

Yes. When a module name is chosen, the developer is taking control of a namespace. That namespace consists of the module name and all sub-names below it - sub-package names and sub-module names.

Ownership of that namespace allows the developer to release one module or many. The main constraint is that there should not be two published modules containing the same package.
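
For example, the owner of the org.joda.time namespace could split internationalization support into its own sub-module (a hypothetical sketch, not the actual Joda-Time module layout):

  module org.joda.time {
    exports org.joda.time;
    exports org.joda.time.chrono;
    exports org.joda.time.format;
  }

  module org.joda.time.i18n {
    requires org.joda.time;

    exports org.joda.time.i18n;
  }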

As a side effect of this, the practice of larger projects releasing an "all" jar will need to stop. An "all" jar is used when the project has lots of separate jar files, but also wants to allow end-users to depend on a single jar file. These "all" jar files are a pain in Maven dependency trees, but will be a disaster in JPMS ones, as there is no way to override the metadata, unlike in Maven.

What if my existing project does not meet these guidelines?

The harsh suggestion is to change the project in an incompatible manner so it does meet the guidelines. JPMS in Java SE 9 is disruptive. It does not take the approach of providing all the tools necessary to meet all the edge cases in current deployments. As such, it is not surprising that some jar files and some projects will require some major rework.

Why ignore the Maven artifactId?

JPMS is an extension to the Java platform (language and runtime). Maven is a build system. Both are necessary, but they have different purposes, needs and conventions.

JPMS is all about packages, grouping them together to form modules and linking those modules. In this way, developers are working with source code, just like any other source code. What artifacts the source code is packed up into is a separate question. Understanding the separation is hard because there is currently a one-to-one mapping between the module and the jar file; however, we should not assume this will always be the case in the future.

Another example of this separation is versioning. JPMS has little to no support for versions, yet build systems like Maven do. When running the application, Maven is responsible for collecting a coherent set of artifacts (jar files) to run the application, just as before. It's just that some of those might be modules.

Finally, the Maven artifactId does not exist in isolation. Maven makes unique identifiers by combining the groupId, artifactId and classifier. Only the combination is sufficiently globally unique to be useful. Picking out just the artifactId and trying to make a unique module name from it is asking for trouble.

Summary

JPMS module names, and the module-info.java in general, are going to require real thought to get right. The module declaration will be as much a part of your API as your method signatures.

The importance is heightened because, unlike Maven and other module systems, JPMS has no way to fix broken metadata. If you rely on some modular jar files, and get a clash or find some other mistake in the module declarations, your only options will be to not use JPMS or to rewrite the module declarations yourself. Given this difficulty, it is not yet clear that JPMS will be a success, thus your best option may be to not modularize your code.

See the TL;DR section above for the summary of the module name proposal. Feedback and questions welcome.

PS. For clarity, my personal interest is ensuring Java succeeds, something that will IMO require consistent naming.

Monday, 17 April 2017

Java 9 modules - JPMS basics

The Java Platform Module System (JPMS) is the major new feature of Java SE 9. In this article, I will introduce it, leaving most of my opinions to a follow up article. This is based on these slides.

Java Platform Module System (JPMS)

The new module system, developed as Project Jigsaw, is intended to raise the abstraction level of coding in Java as follows:

The primary goals of this Project are to:
* Make the Java SE Platform, and the JDK, more easily scalable down to small computing devices;
* Improve the security and maintainability of Java SE Platform Implementations in general, and the JDK in particular;
* Enable improved application performance; and
* Make it easier for developers to construct and maintain libraries and large applications, for both the Java SE and EE Platforms.
To achieve these goals we propose to design and implement a standard module system for the Java SE Platform and to apply that system to the Platform itself, and to the JDK. The module system should be powerful enough to modularize the JDK and other large legacy code bases, yet still be approachable by all developers.

However as we shall see, project goals are not always met.

What is a JPMS Module?

JPMS is a change to the Java libraries, language and runtime. This means that it affects the whole stack that developers code with day-to-day, and as such JPMS could have a big impact. For compatibility reasons, most existing code can ignore JPMS in Java SE 9, something that may prove to be very useful.

The key conceptual point to grasp is that JPMS adds a new concept to the JVM - modules. Where previously, code was organized into fields, methods, classes, interfaces and packages, with Java SE 9 there is a new structural element - modules.

  • a class is a container of fields and methods
  • a package is a container of classes and interfaces
  • a module is a container of packages

Because this is a new JVM element, it means the runtime can apply strong access control. With Java 8, a developer can express that the methods of a class cannot be seen by other classes by declaring them private. With Java 9, a developer can express that a package cannot be seen by other modules - i.e. a package can be hidden within a module.

Being able to hide packages should in theory be a great benefit for application design. No longer should there be a need for a package to be named "impl" or "internal" with Javadoc declaring "please don't use types from this package". Unfortunately, life won't be quite that simple.

Creating a module is, however, relatively simple. A module is typically just a jar file that has a module-info.class file at the root - known as a modular jar file. And that file is created from a module-info.java file in your source base (see below for more details).

Using a modular jar file involves adding the jar file to the modulepath instead of the classpath. If a modular jar file is on the classpath, it will not act as a module at all, and the module-info.class will be ignored. As such, while a modular jar file on the modulepath will have hidden packages enforced by the JVM, a modular jar file on the classpath will not have hidden packages at all.

Other module systems

Java has historically had other module systems, most notably OSGi and JBoss Modules. It is important to understand that JPMS has little resemblance to those systems.

Both OSGi and JBoss Modules have to exist without direct support from the JVM, yet still provide some additional support for modules. This is achieved by launching each module in its own class loader, a technique that gets the job done, yet is not without its own issues.

Unsurprisingly, given these are existing module systems, experts from those groups have been included in the formal Expert Group developing JPMS. However, this relationship has not been harmonious. Fundamentally, the JPMS authors (Oracle) have set out to build a JVM extension that can be used for something that can be described as modules, whereas the existing module systems derive experience and value from real use cases and tricky edge cases in big applications that exist today.

When reading about modules, it is important to consider whether the authors of the article you are reading are from the OSGi/JBoss Modules design camp. (I have never actively used OSGi or JBoss Modules, although I have used Eclipse and other tools that use OSGi internally.)

module-info.java

The module-info.java file contains the instructions that define a module (the most important ones are covered here, but there are more). This is a .java file, however the syntax is nothing like any .java file you've seen before.

There are two key questions that you have to answer to create the file - what does this module depend on, and what does it export:

module com.opengamma.util {
  requires org.joda.beans;  // this is a module name, not a package name
  requires com.google.guava;

  exports com.opengamma.util;  // this is a package name, not a module name
}

(The names to use for modules need a whole separate article; for this one I'll use package-name style.)

This module declaration says that com.opengamma.util depends on (requires) org.joda.beans and com.google.guava. It exports one package, com.opengamma.util. All other packages are hidden when using the modulepath (enforced by the JVM).

There is an implicit dependency on java.base, the core module of the JDK. Note that the JDK itself is also modularized, so if you want to depend on Swing, XML or Logging, that dependency needs to be expressed.
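
For example, a module that uses java.util.logging and the XML APIs would declare the relevant JDK modules explicitly (the application module name is illustrative; java.logging and java.xml are the real JDK module names):

module com.foo.app {
  requires java.logging;  // java.util.logging
  requires java.xml;      // JAXP XML processing
  // java.base (java.lang, java.util, java.io, etc.) is required implicitly

  exports com.foo.app;
}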

module org.joda.beans {
  requires transitive org.joda.convert;

  exports org.joda.beans;
  exports org.joda.beans.ser;
}

This module declaration says that org.joda.beans depends on (requires) org.joda.convert. The "requires transitive", as opposed to a simple "requires", means that any module that requires org.joda.beans can also see and use the packages from org.joda.convert. This is used here as Joda-Beans has methods where the return type is from Joda-Convert. This is shown by a dashed line.

module org.joda.convert {
  requires static com.google.guava;

  exports org.joda.convert;
}

This module declaration says that org.joda.convert depends on (requires) com.google.guava, but only at compile time, "requires static", as opposed to a simple "requires". This is an optional dependency. If Guava is on the modulepath, then Joda-Convert will be able to see and use it, and no error will occur if Guava is not present. This is shown by a dotted line.

Access rules

When running a modular jar on the modulepath with JVM access rules applied, code in package A can see a type in package B if:

  • the type is public
  • package B is exported from its module
  • there is a dependency from the module containing package A to the module containing package B

Thus, in the example above, code in module com.opengamma.util can see packages org.joda.beans, org.joda.beans.ser, org.joda.convert and any package exported by Guava. However, it cannot see package org.joda.convert.internal (as it is not exported). In addition, code in module com.google.guava cannot see code in package org.joda.beans or org.joda.convert as there is no modular dependency.
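
To make the rules concrete, here is a sketch of a class inside the com.opengamma.util module (the class is illustrative, and the commented-out import shows what would be rejected):

package com.opengamma.util;

import org.joda.beans.Bean;             // OK - exported by org.joda.beans, which this module requires
import org.joda.convert.StringConvert;  // OK - visible via 'requires transitive' in org.joda.beans
// import org.joda.convert.internal.Xyz;  // would not compile - the package is not exported

public class BeanDescriber {

  // trivial usage, just to show that the exported types can be used normally
  public String describe(Bean bean) {
    return bean.metaBean().beanName();
  }
}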

What can go wrong?

The basics described above are simple enough. It is initially quite easy to imagine how you might build an application from these foundations and benefit from hiding packages. Unfortunately, quite a few things can go wrong.

1) All use of module-info files only applies if using modular jars on the modulepath. For compatibility, all code on the classpath is packaged up as a special unnamed module, with no hidden packages and full access to the whole JDK. Thus, the security benefits of hiding packages are marginal at best. However, the modules of the JDK itself are always run in modular mode, thus are always guaranteed the security benefits.

2) Versions of modules are not handled. You cannot have the same module name loaded twice, which means you cannot have two versions of the same module loaded at the same time. It is left entirely to you, and thus to your build tool, to create a coherent set of modules that can actually be run. Thus, the classpath hell situation caused by clashing versions is not solved. Note that putting the version number in the module name is a Bad Idea that does not solve this problem and creates others.

3) Two modules may not contain the same package. This seems eminently sensible, until you consider that it also applies to hidden packages. Since hidden packages are not listed in module-info.class, a tool like Maven must unpack the jar file to discover what hidden packages there are in order to warn of clashes. As a user of the library, such a clash will be completely surprising, as you won't have any indication of the hidden packages in the Javadoc. This is a more general indication that JPMS does not provide sufficient isolation between modules, for reasons that are far from clear at this point.

4) There must be no cycles between modules, at compile time and at runtime. Again, this seems sensible - who wants to have module A depend on B depend on C which depends on A? But the reality of existing projects is that this happens, and on the classpath is not a problem. For example, consider what would happen if Guava decided to depend on Joda-Convert in the example above. This restriction will make some existing open source projects hard to migrate.

5) Reflection is changing, such that non-public fields and methods will no longer be accessible via reflection. Since almost every framework uses reflection in this way, there will be significant work needed to migrate existing code. In particular, the JDK will be very locked down against reflection, which may prove painful (command line flags can escape the trap for now). This article hasn't had a chance to explore how the module declaration can influence reflection - see "opens" in the slides for more details, and the short sketch after this list.

6) Are your dependencies modularized? In theory, you can only turn your code into a module once all your dependencies are also modules. For any large application with hundreds of jar file dependencies, this will be a problem. The "solution" is automatic modules, where a normal jar file placed on the modulepath is automatically turned into a module. This process is controversial, with naming a big issue. Library authors should not publish modules that depend on automatic modules to public repositories like Maven Central. Again, automatic modules deserve their own article!

7) Module naming is not yet set in stone. I've come to believe that naming your module after the highest package it contains, causing that module to "take ownership" of the subpackages, is the only sane strategy.

8) Conflicts with build systems - who is in charge? A Maven pom.xml also contains information about a project. Should it be extended to allow module information to be added? I would suggest not, because the module-info.java file contains a binding part of your API, and that is best expressed in .java code, not metadata like pom.xml.
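
Regarding point 5, the module declaration can selectively open a package for deep reflection without exporting it. A minimal sketch, assuming a framework needs reflective access to a model package (module and package names are illustrative):

module com.foo.app {
  requires org.joda.beans;

  exports com.foo.app;
  // grants deep reflective access (including setAccessible) to this package at runtime,
  // without making it available for ordinary compile-time use
  opens com.foo.app.model;
}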

For those wanting a book to read in much more depth, try this one from Nicolai.

Summary

Do not get too excited about JPMS - modules in Java 9. The above is only a summary of what is possible with the module-info.java file and the restrictions of the JPMS. If you are thinking of modularizing your library or application, please wait a little longer until everything becomes a little clearer.

Feedback welcome, but bear in mind that I am planning more articles.

Tuesday, 7 February 2017

Java Time (JSR-310) enhancements in Java SE 9

The java.time.* API (JSR-310) was added to Java SE 8, but what has been going on since then?

Java Time in Java SE 9

There are currently 117 java.time issues targeted at Java SE 9. Most of these are not especially interesting, with a lot of mistakes in the Javadoc that needed fixing. What follows are some of the interesting ones:

Main enhancements:

JDK-8146218 - Add LocalDate.datesUntil method producing Stream.
Adds two new methods - LocalDate.datesUntil(LocalDate) and LocalDate.datesUntil(LocalDate,Period) - returning a stream of dates.
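
A quick sketch of how the new methods behave (the end date is exclusive):

  LocalDate start = LocalDate.of(2017, 1, 1);
  LocalDate end = LocalDate.of(2017, 1, 4);
  start.datesUntil(end).forEach(System.out::println);
  // 2017-01-01, 2017-01-02, 2017-01-03
  start.datesUntil(end, Period.ofMonths(1)).forEach(System.out::println);
  // 2017-01-01 only, as the next step (2017-02-01) is on or after the end date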

JDK-8068730 - Increase precision of Clock.systemUTC().
The clock in Java - System.currentTimeMillis() - has ticked in milliseconds since Java was first released. With Java SE 9, users of Clock will see higher precision, depending on the available clock of the operating system.

JDK-8071919 - Clock.tickMillis(ZoneId zone) method.
With the system clock now returning higher precision, a new method was added - Clock.tickMillis(ZoneId) - that chops off the extra precision to restore the millisecond ticking behaviour of Java SE 8.

JDK-8030864 - Add efficient getDateTimeMillis method to java.time.
This adds two epochSecond methods to Chronology that convert date-time fields to an epoch-second without creating intermediate objects.

JDK-8142936 - Duration methods for days, hours, minutes, seconds, etc.
The Java SE 8 API of Duration turned out to be incomplete for certain use cases. This change adds a slew of new methods that allow parts of the duration to be reliably returned.
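
For example, the new part-accessor methods decompose a duration without manual arithmetic (my own worked example):

  Duration d = Duration.ofDays(1).plusHours(2).plusMinutes(3).plusSeconds(4);
  d.toDaysPart();     // 1
  d.toHoursPart();    // 2
  d.toMinutesPart();  // 3
  d.toSecondsPart();  // 4
  d.toMillisPart();   // 0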

JDK-8148849 - Truncating Duration.
Adds a method Duration.truncatedTo(TemporalUnit) to allow truncation, similar to the existing method on Instant.

JDK-8032510 - Add Duration.dividedBy(Duration).
A new method to allow a duration to be divided by another duration.

JDK-8133079 - LocalDate and LocalTime ofInstant() factory methods.
Add new factory methods in LocalDate and LocalTime to simplify conversion from Instant.

JDK-8143413 - Add toEpochSecond methods for efficient access.
Add methods to LocalDate, LocalTime and OffsetTime to optimize conversion to epoch-seconds.

Formatting:

JDK-8148947 - DateTimeFormatter pattern letter 'g'.
This adds a pattern letter for modified Julian days.

JDK-8155823 - Add date-time patterns 'v' and 'vvvv'.
This adds support for the "generic non-location" format for time-zones as defined by CLDR, such as "Pacific Time" (the format ignores daylight saving time). Methods were also added to the formatter builder - DateTimeFormatterBuilder.appendGenericZoneText().

JDK-8066806 - DateTimeFormatter cannot parse offset with single digit hour.
The formatter is extended to support 11 more time-zone offset formats, including single digit hours such as +2:00.

JDK-8031085 - DateTimeFormatter won't parse format "yyyyMMddHHmmssSSS".
This extends support for adjacent value parsing to fractions. Where in Java SE 8 a pattern like this looks like it should work, but doesn't, in Java SE 9 it just works.

JDK-8145633 - Adjacent value parsing for Localized Patterns.
This extends support for adjacent value parsing to localized patterns, such as week-based-year and week of year.

JDK-8148949 - DateTimeFormatter pattern letters 'A','n','N'.
These patterns were altered to be more flexible and to produce fewer errors.

Performance:

JDK-8073394 - Clock.systemUTC() should return a constant.
This change avoids creating unnecessary objects when using Clock.

JDK-8074003 - ZoneRules.getOffset(Instant) can be optimized.
This change reduces object churn when looking up time-zone data.

JDK-8074002 - ZoneId.systemDefault() should be faster.
This enhancement uses a clever approach to cache the time-zone while handling TimeZone.setDefault.

JDK-8068803 - Performance of LocalDate.plusDays could be better.
This optimizes LocalDate.plusDays for the common case of adding a small number of days.

JDK-8066291 - ZoneIdPrinterParser can be optimized.
The method ZoneRulesProvider.getAvailableZoneIds() now returns an immutable set, not a mutable one. Since most user code calls ZoneId.getAvailableZoneIds(), it will be unaffected.

Thanks

Thanks for all these bug fixes and enhancements go to the many contributors, both inside and outside Oracle. (I've mostly been a reviewer, rather than an author, which has worked pretty well overall.)

Summary

The java.time.* API keeps on moving forward. Any feedback or other enhancement suggestions are welcome!

Monday, 26 September 2016

Code generating beans - mutable and immutable

Java has long suffered from the pain of beans. To declare a simple data class takes far too much code. At JavaOne 2016, I talked about code generation options - see the slides.

Code generation of mutable and immutable beans

The Java ecosystem is massive. So many libraries are released as open source and beyond, which naturally leads to the question of how those libraries communicate. And it is the basic concept of beans that is the essential glue, despite the ancient specification. How do ORMs (Hibernate etc.), Serialization (Jackson etc.) and Bean Mappers (Dozer etc.) communicate? Via getters and setters.

The essential features of beans have moved beyond the JavaBeans spec, and are sometimes referred to as POJOs. The features are:

  • Mutable
  • No-args constructor
  • Getters and Setters
  • equals() / hashCode() / toString()

But writing these manually is slow, tedious and error-prone. Code generation should be able to help us here.

But should we be using mutable beans in 2016? No, no, no!

It is time to be writing immutable data structures (immutable beans). But the only practical way to do so is code generation, especially if you want to have builders.

In my talk at JavaOne 2016, I considered various code generation approaches:

IDE code generation

This is fine as far as it goes, but while the code is likely to be correct immediately after generation, there is still no guarantee that the generated code will stay correct as the class is maintained over time.

AutoValue, Immutables and VALJOGen

These three projects - AutoValue, Immutables, VALJOGen - use annotation processors to generate code during compilation. The idea is simple - the developer writes an abstract class or interface, and the tool generates the implementation at compile time. However, these tools all focus on immutable beans, not mutable ones (Immutables can generate a modifiable bean, but it doesn't match the JavaBeans spec, so many tools will reject it).

On the up side, there is no chance to mess up the equals / hashCode / toString. While the tools all allow the methods to be manually written if necessary, most of the time, the default is what you want. It is also great not to have to implement the immutable builder class manually.

On the down side, you as the developer have to write abstract methods, not fields. A method is a few more keystrokes than a field, and the Javadoc requires an @return line too. With AutoValue, this is particularly painful, as you have to write the outline of the builder class. With Immutables, there is no need for this.

Of the three projects, AutoValue provides a straightforward simple tool that works, and where the implementation class is hidden (package scoped). Immutables provides a full-featured tool, with many options and ways to generate. By default, the implementation class is publicly visible and used by callers, but there are ways to make it package-scoped (with more code written by you). VALJOGen allows full customisation of the generation template. There is no doubt that Immutables is the most comprehensive of the three projects.
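
To make the comparison concrete, here is roughly what a minimal AutoValue class looks like (the class and its properties are illustrative; AutoValue_Money is the implementation that the annotation processor generates at compile time):

  import com.google.auto.value.AutoValue;
  import java.math.BigDecimal;

  @AutoValue
  public abstract class Money {
    // you write abstract methods, not fields
    public abstract String currency();
    public abstract BigDecimal amount();

    // factory method referencing the generated implementation class
    public static Money of(String currency, BigDecimal amount) {
      return new AutoValue_Money(currency, amount);
    }
    // equals(), hashCode() and toString() are generated for you
  }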

All three annotation processing tools must be set up before use. In general, adding the tool to Maven will do most of the work (Maven support is good). For Eclipse and IntelliJ, these instructions are comprehensive.

Lombok

The Lombok project also uses annotations to control code generation. However, instead of acting as an annotation processor, it hacks into the internal APIs of Eclipse and the Java compiler.

This approach allows code to be generated within the same class, avoiding the need for developers to work with an abstract class or interface. This means that instead of writing abstract methods, the developer writes fields, which is a more natural thing to do.

The key question with Lombok is not what it generates, but the way it generates it. If you are willing to accept the hacky approach, IDE limitations, and the inability to debug into the generated code, then it is a neat enough solution.

For Eclipse, Lombok must be installed, which is fairly painless as there is a GUI. Other tools require other installation approaches, see this page.

Joda-Beans

The Joda-Beans project takes a third approach to code generation. It is a source code regenerator, creating code within the same source file, identified by "autogenerated start/end" comments.

Developers write fields, not abstract methods, which is simpler and less code. The code is also generated into the same class, which can be final if desired.

One key benefit of generating all the code to the same class is that the code is entirely valid when checked out. There is no need to install a plugin or configure your IDE in any way.

Unlike the other choices, Joda-Beans also provides functionality at runtime. It aims to add C# style properties to Java. What this means in practice is that you can easily treat a bean as a set of properties, loop over the properties and create instances using a standardized builder. These features are the ideal building block for serialization frameworks, and Joda-Beans provides XML, JSON and binary serialization that operates using the properties, generally without reflection. The trade-off here is that Joda-Beans is a runtime dependency, so it is only the best option if you use the additional properties feature.

The Joda-Beans regenerator can be run from Maven using a plugin. If you use the standard Eclipse Maven support, then simply saving the file in Eclipse will regenerate it. There is also a Gradle plugin.
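
For comparison, a Joda-Beans immutable bean starts from fields, with the generated code living inside a marked region of the same source file (annotations shown as in the Joda-Beans 1.x packages; the class is illustrative):

  import java.math.BigDecimal;

  import org.joda.beans.BeanDefinition;
  import org.joda.beans.ImmutableBean;
  import org.joda.beans.PropertyDefinition;

  @BeanDefinition
  public final class Money implements ImmutableBean {

    // you write fields, not abstract methods
    @PropertyDefinition(validate = "notNull")
    private final String currency;
    @PropertyDefinition(validate = "notNull")
    private final BigDecimal amount;

    //------------------------- AUTOGENERATED START -------------------------
    // the regenerator fills this block in with getters, a builder, metaBean(),
    // equals(), hashCode() and toString() each time it runs
    //-------------------------- AUTOGENERATED END --------------------------
  }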

Comparing

Most of the projects have some appeal.

  • AutoValue is a simple annotation processor that hides the implementation class, but requires more code to trigger it.
  • Immutables is a flexible and comprehensive annotation processor that can be used in many ways.
  • Lombok requires the least coding by the developer, but the trade-off is the implementation via internal APIs.
  • Joda-Beans is different in that it has a runtime dependency adding C# style properties to Java, allowing code to reliably loop over beans.

My preference is Joda-Beans (which I wrote), because I like the fact that the generated code is in the same source file, so callers see a normal final class, not an interface or abstract class. It also means that it compiles immediately in an IDE that has not been configured when checked out from source control. But Joda-Beans should really only be used if you understand the value of the properties support it provides.

If I was to pick another tool, I would use Immutables. It is comprehensive, and providing you invest the time to choose the best way to generate beans for your needs, it should have everything you need.

Finally, it is important that readers have a chance to look at the code written and the code generated. To do this, I have created the compare-beangen GitHub project. This project contains source code for all the tools above and more, demonstrating what you have to write.

To make best use of the project, check it out and import it into your IDE. That way, you will experience what code generation means, and how practical it is to work with it. (For example, see what happens when you rename a field/method/class. Does the code generator cope?)

Summary

It is time to start writing and using immutable beans instead of mutable ones. The Strata open source market risk project (my day job) has no mutable beans, so it is perfectly possible to do. But to use immutable beans, you are going to need a code generator, as they are too painful to use otherwise.

This blog has summarised five code generators, and provided a nice GitHub project for you to do your own comparisons.

Tuesday, 20 September 2016

Private methods in interfaces in Java 9

Java SE 9 is slowly moving towards the finishing line. One new feature is private methods on interfaces.

Private methods on interfaces in Java 9

In Java 7 and all earlier versions, interfaces were simple. They could only contain public abstract methods.

Java 8 changed this. From Java 8, you can have public static methods and public default methods.

public interface HolidayCalendar {
  // static method, to get the calendar by identifier
  public static HolidayCalendar of(String id) {
    return Util.holidayCalendar(id);
  }
  
  // abstract method, to find if the date is a holiday
  public abstract boolean isHoliday(LocalDate date);
  
  // default method, using isHoliday()
  public default boolean isBusinessDay(LocalDate date) {
    return !isHoliday(date);
  }
}

Note that I have chosen to use the full declaration, with "public" on all three methods even though it is not required. I have argued that this is best practice for Java SE 8, because it makes the code clearer (now there are three types of method) and prepares for a time when there will be non-public methods.

And that time is very soon, as Java 9 is adding private methods on interfaces.

public interface HolidayCalendar {
  // example of a private interface method
  private void validateDate(LocalDate date) {
    if (date.isBefore(LocalDate.of(1970, 1, 1))) {
      throw new IllegalArgumentException();
    }
  }
}
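
A fuller sketch shows how such a private method is typically shared by default methods (isBusinessDay is carried over from the earlier example; isHolidayOrWeekend is invented for illustration):

public interface HolidayCalendar {
  // abstract method, implemented by each calendar
  public abstract boolean isHoliday(LocalDate date);

  // default methods sharing the private helper below
  public default boolean isBusinessDay(LocalDate date) {
    validateDate(date);
    return !isHoliday(date);
  }
  public default boolean isHolidayOrWeekend(LocalDate date) {
    validateDate(date);
    DayOfWeek dow = date.getDayOfWeek();
    return isHoliday(date) || dow == DayOfWeek.SATURDAY || dow == DayOfWeek.SUNDAY;
  }

  // private helper, not inherited by sub-interfaces or implementations
  private void validateDate(LocalDate date) {
    if (date.isBefore(LocalDate.of(1970, 1, 1))) {
      throw new IllegalArgumentException("Date too early: " + date);
    }
  }
}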

Thus, methods can be public or private (with the default being public if not specified). Private methods can be static or instance. In both cases, the private method is not inherited by sub-interfaces or implementations. The valid combinations of modifiers in Java 9 will be as follows:

  • public static - supported
  • public abstract - supported
  • public default - supported
  • private static - supported
  • private abstract - compile error
  • private default - compile error
  • private - supported

Private methods on interfaces will be very useful in rounding out the functionality added in Java 8.

Saturday, 26 March 2016

Var and val in Java?

Should local variable type inference be added to Java? This is the question being pondered right now by the Java language team.

Local Variable Type Inference

JEP-286 proposes to add inference to local variables using a new pseudo-keyword (treated as a "reserved type name").

We seek to improve the developer experience by reducing the ceremony associated with writing Java code, while maintaining Java's commitment to static type safety, by allowing developers to elide the often-unnecessary manifest declaration of local variable types.

A number of possible keywords have been suggested:

  • var - for mutable local variables
  • val - for final (immutable) local variables
  • let - for final (immutable) local variables
  • auto - well, let's ignore that one, shall we...

Given the implementation strategy, it appears that the current final keyword will still be accepted in front of all of the options, and thus all of these would be final (immutable) variables:

  • final var - changes the mutable local variable to be final
  • final val - redundant additional modifier
  • final let - redundant additional modifier

Thus, the choice appears to be to add one of these combinations to Java:

  • var and final var
  • var and val - but final var and final val also valid
  • var and let - but final var and final let also valid

In broad terms, I am unexcited by this feature and unconvinced it actually makes Java better. While IDEs can mitigate the loss of type information when coding, I expect some code reviews to be significantly harder (as they are done outside IDEs). It should also be noted that the C# coding standards warn against excessive use of this feature:

Do not use var when the type is not apparent from the right side of the assignment.
Do not rely on the variable name to specify the type of the variable. It might not be correct.

Having said the above, I suspect there is very little chance of stopping this feature. The rest of this blog post focuses on choosing the right option for Java.

Best option for Java

When this feature was announced, aficionados of Scala and Kotlin naturally started arguing for var and val. However, while precedent in other languages is good to examine, it does not necessarily follow that it is the best option for Java.

The primary reason why the best option for Java might be different is history. Scala and Kotlin had this feature from the start, Java has not. I'd like to show why I think val or let is wrong for Java, because of that history.

Consider the following code:

 public double parSpread(SwapLeg leg) {
   Currency ccyLeg = leg.getCurrency();
   Money convertedPv = presentValue(swap, ccyLeg);
   double pvbp = legPricer.pvbp(leg, provider);
   return -convertedPv.getAmount() / pvbp;
 }

Local variable type inference would apply fine to it. But let's say that the type on one line was unclear, so we choose to keep it to add clarity (as per the C# guidelines):

 public double parSpread(SwapLeg leg) {
   val ccyLeg = leg.getCurrency();
   Money convertedPv = presentValue(swap, ccyLeg);
   val pvbp = legPricer.pvbp(leg, provider);
   return -convertedPv.getAmount() / pvbp;
 }

Fine, you might say. But what if the code is written in a team that insists on marking every local variable as final?

 public double parSpread(final SwapLeg leg) {
   val ccyLeg = leg.getCurrency();
   final Money convertedPv = presentValue(swap, ccyLeg);
   val pvbp = legPricer.pvbp(leg, provider);
   return -convertedPv.getAmount() / pvbp;
 }

Suddenly, we have a mess. Some parts of the code use final to indicate a "final" (immutable) local variable. Whereas other parts of the code use val. It is the mixture that is all wrong, and it is that mixture that you do not get in Scala or Kotlin.

(Perhaps you don't code using final on every local variable? I know I don't. But I do know that it is a reasonably common coding standard, designed to add safety to the code and reduce bugs.)

Contrast the above to the alternative:

 public double parSpread(final SwapLeg leg) {
   final var ccyLeg = leg.getCurrency();
   final Money convertedPv = presentValue(swap, ccyLeg);
   final var pvbp = legPricer.pvbp(leg, provider);
   return -convertedPv.getAmount() / pvbp;
 }

This is a lot more consistent within Java. final continues to be the mechanism used everywhere to get a final (immutable) variable. And if you, like me, don't worry about the final keyword, it reduces to this:

 public double parSpread(final SwapLeg leg) {
   var ccyLeg = leg.getCurrency();
   Money convertedPv = presentValue(swap, ccyLeg);
   var pvbp = legPricer.pvbp(leg, provider);
   return -convertedPv.getAmount() / pvbp;
 }

I understand the objections many readers will be having right now - that there should be two new "keywords", one for mutable and one for immutable local variables and that both should be of the same length/weight (or that the mutable one should be longer) to push people to use the immutable form more widely.

But in Java it really isn't that simple. We've had the final keyword for many years. Ignoring it results in an unpleasant and inconsistent mess.

Summary

I don't personally like local variable type inference at all. But if we are to have it, we have to make it fit well within the existing language.

I argue that val or let simply does not fit Java, because final already exists and has a clear meaning in that space. While not ideal, I must argue for var and final var, as the only combination on offer that meets the key criteria of fitting the existing language.

Tuesday, 22 December 2015

Explicit receiver parameters

I discovered that Java 8 has a language feature I'd never heard of before today!

Explicit receiver parameters

Consider a simple method in Java 7:

 public class Currency {
   public String getCode() { ... }
 }

In Java 8, it turns out there is a second way to write the method:

 public class Currency {
   public String getCode(Currency this) { ... }
 }

The same trick also works for methods with arguments:

 public class Currency {
   public int compareTo(Currency this, Currency other) { ... }
 }

So, what is going on here?

The relevant part of the Java Language Specification is here.

The receiver parameter is an optional syntactic device for an instance method or an inner class's constructor. For an instance method, the receiver parameter represents the object for which the method is invoked. For an inner class's constructor, the receiver parameter represents the immediately enclosing instance of the newly constructed object. Either way, the receiver parameter exists solely to allow the type of the represented object to be denoted in source code, so that the type may be annotated. The receiver parameter is not a formal parameter; more precisely, it is not a declaration of any kind of variable, it is never bound to any value passed as an argument in a method invocation expression or qualified class instance creation expression, and it has no effect whatsoever at run time.

The new feature is entirely optional, and exists to allow the type to be annotated at the point it is used:

 public class Currency {
   public int compareTo(@AnnotatedUsage Currency this, Currency other) { ... }
 }
 @Target(ElementType.TYPE_USE)
 public @interface AnnotatedUsage {}

The annotation is made available in reflection using Method::getAnnotatedReceiverType().
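
As a small sketch, the annotation can then be read back at runtime, assuming @AnnotatedUsage is also declared with @Retention(RetentionPolicy.RUNTIME) (which the example above omits):

 // uses java.lang.reflect.Method and java.lang.reflect.AnnotatedType
 Method m = Currency.class.getMethod("compareTo", Currency.class);
 AnnotatedType receiverType = m.getAnnotatedReceiverType();
 System.out.println(receiverType.isAnnotationPresent(AnnotatedUsage.class));  // prints true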

Looking to the future

I discovered this feature in the context of Project Valhalla, the effort to add value types to Java. One possible syntax being considered to allow different methods on List<int> from List<String> is to use receiver types:

 public class List<any T> {
   public int sum(List<int> this) { ... }
 }

This syntax would be used to define a sum() method that only applies when List is parameterized by int, and not when parameterized by anything else, such as String.

For more information, see this email to the Valhalla experts mailing list. But please note that the list is read-only, cannot be signed up to and is for pre-selected experts only. In addition, everything discussed there is very, very early in the process, and liable to change greatly before being released in a real version of Java (such as Java 10).

Summary

Java 8 has a new language feature which you've probably never heard of and will probably never use. But at least you now have some knowledge that you can use to prove you are a Java Guru!