Sunday, 7 September 2014

Oracle RDBMS empty string and null

The Oracle database (RDBMS) has a "wonderful quirk" - it treats null and empty string as one and the same.

Oracle database empty string

If you have never used Oracle RDBMS you may not be aware of this particular "feature". Basically, if you insert an empty string into the database it is treated as null. Thus, a column declared as "NOT NULL" will reject the insertion of an empty string. For me, and probably most other developers, this isn't what you would expect.

The Stack Overflow explanation is "I believe the answer is that Oracle is very, very old." As well as being amusing, it is as good an explanation as any other.

Last week, I had the dubious pleasure of porting the OpenGamma database infrastructure to work on Oracle. It already worked on various other databases (including HSQLDB, Postgres and SQL Server) but Oracle is rather an outlier in the world of SQL.

Most of the conversion went fine with ElSql able to handle any deviations from standard SQL. But the problem of empty strings and null was a little more vexing.

The first question was whether we could avoid needing to distinguish between null and an empty string. I believe that if Oracle is your primary database, then you could code your application in that way. However, if you are coding the application to work on multiple different databases, then this is simply not an option.

The second question was, of course, how to handle the problem. My research threw up three realistic possibilities:

  1. Prefix all strings by the same character, such as '@'. The string 'Stephen' would be stored in the database as '@Stephen', while the empty string would be stored as '@'.
  2. Replace the empty string with a magic value, such as '!!! EMPTY !!!'.
  3. Store an additional flag column indicating the difference between null and the empty string.

Option 1 means that every string in the database is polluted, making it hard to use the data directly by SQL (bypassing the application). Option 2 relies on the magic value never occurring in real data, and direct SQL access is again compromised (though less than option 1). Option 3 is the most "pure" solution, but would have been very hard to adopt when writing one piece of code for multiple different databases.

My chosen solution was a variation on option 2 - encoding using "one extra space". I'm documenting it here in case anyone else finds it to be a useful strategy:

  • The empty string is stored in the database as a single space (ASCII 32).
  • Any string that consists only of spaces (ASCII 32) is stored with one additional space added.
  • Any other string is stored without change.

The advantage of this encoding is that the vast majority of strings are stored without being changed. This makes access by direct SQL simple and obvious. The only strings to be encoded are the empty string, and the unlikely "all space" strings. The encoding will make little difference to most algorithms or displays even if it is not decoded. The encoding is also fully reversible, provided that column length limits are not hit.
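The three rules can be sketched as a pair of static methods. This is a minimal illustration assuming a hypothetical helper class named `OracleStrings`; it is not the code from the OpenGamma commit. A null passes through unchanged in both directions.

```java
// Hypothetical helper implementing the "one extra space" encoding described above.
final class OracleStrings {

  private OracleStrings() {
  }

  /** Encodes a Java string before it is written to Oracle. */
  static String encode(String value) {
    if (value == null) {
      return null;                 // null stays null
    }
    if (isAllSpaces(value)) {
      return value + " ";          // "" becomes " ", "  " becomes "   ", and so on
    }
    return value;                  // every other string is stored unchanged
  }

  /** Decodes a string read back from Oracle. */
  static String decode(String stored) {
    if (stored == null) {
      return null;
    }
    if (!stored.isEmpty() && isAllSpaces(stored)) {
      return stored.substring(1);  // remove the one extra space
    }
    return stored;
  }

  // True if the string is empty or consists only of ASCII 32 spaces.
  private static boolean isAllSpaces(String str) {
    for (int i = 0; i < str.length(); i++) {
      if (str.charAt(i) != ' ') {
        return false;
      }
    }
    return true;
  }
}
```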

Fortunately, I was able to encode and decode the data in relatively few places. For decoding, a decorated ResultSet worked effectively. For encoding, it was possible to create a subclass of Spring's NamedParameterJdbcTemplate and JdbcTemplate to do the trick. See this commit for details.

As a final note, there are other complications with data storage in Oracle. As I understand it, NaN and null also cannot be separated. This blog only covers problems of strings, which is tricky enough!


I will continue to believe that an empty string and null are two different concepts. Oracle RDBMS disagrees. If you face the problem of creating a workaround, perhaps the option I used is worth considering.

Wednesday, 6 August 2014

StringJoiner in Java SE 8

Java SE 8 added a new class for joining strings - StringJoiner. But is it any good?


Here is what the Javadoc of the new class says:

StringJoiner is used to construct a sequence of characters separated by a delimiter and optionally starting with a supplied prefix and ending with a supplied suffix.

Sounds good! Maybe we can finally stop using the Joiner class in Guava.

Unfortunately, the JDK StringJoiner is not a general purpose string joiner in the way that the Guava Joiner is. The first clue is in the API:

 // constructors
 StringJoiner(CharSequence delimiter)
 StringJoiner(CharSequence delimiter, CharSequence prefix, CharSequence suffix)
 // methods
 StringJoiner setEmptyValue(CharSequence emptyValue)
 StringJoiner add(CharSequence newElement)
 StringJoiner merge(StringJoiner other)
 int length()
 String toString()

There are two constructors, one taking the separator (delimiter) and one allowing a prefix and suffix. There are then just 5 other methods. By contrast, the Guava version has 15 other methods on top of 2 factory methods.
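For reference, here is a short sketch of those methods in use, including a prefix and suffix, an empty value and a merge. The class and method names are hypothetical.

```java
import java.util.StringJoiner;

// Hypothetical demo class exercising the StringJoiner methods listed above.
final class StringJoinerDemo {

  private StringJoinerDemo() {
  }

  static String emptyExample() {
    StringJoiner joiner = new StringJoiner(", ", "[", "]");
    // Without setEmptyValue(), an empty joiner would return prefix + suffix, "[]".
    joiner.setEmptyValue("EMPTY");
    return joiner.toString();
  }

  static String mergeExample() {
    StringJoiner joiner = new StringJoiner(", ", "[", "]");
    joiner.add("a").add("b");              // add() returns the joiner, so calls chain

    StringJoiner other = new StringJoiner("-", "{", "}");
    other.add("c").add("d");

    // merge() adds other's content as one element, without other's prefix and suffix
    joiner.merge(other);
    return joiner.toString();
  }
}
```

Here `emptyExample()` returns "EMPTY" and `mergeExample()` returns "[a, b, c-d]".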

But the real missing thing? A method to add multiple elements at once to the joiner!

Every time I want to join, I have a list, set or other iterable. With Guava I simply say:

 String joined = Joiner.on(", ").join(list);

StringJoiner has no equivalent method. You have to add the elements one by one using add(CharSequence)!

 StringJoiner joiner = new StringJoiner(", ");
 for (String str : list) {
   joiner.add(str);
 }
 String joined = joiner.toString();

I think we'd all agree that rather defeats the purpose of having a joiner at all!

However, it turns out that it is kind of possible to add multiple elements with the JDK, but you might not spot it:

 String joined = String.join(", ", list);

So, not too bad then?

Firstly, I don't expect the method that actually performs a useful join to be on String; I expect it to be on StringJoiner. The method on String is not referenced from StringJoiner at all.

Secondly, the method on String is static, whereas the Guava method is an instance method. This means that the Guava method can pick up additional state from the builder phase of the joiner, such as the ability to handle null. The Guava joiner can in fact handle Map joins as well, thanks to its clever immutable instance-based design.

Thirdly, StringJoiner only works on CharSequence. By contrast, Guava's Joiner works on Object, which is much more useful in most circumstances.
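Guava's `skipNulls()` and `useForNull(String)` are examples of that builder state. A rough JDK-only equivalent uses a stream, which also shows the explicit map to a string that the CharSequence restriction forces. The class and method names are hypothetical, and this approximates the Guava behaviour rather than porting it.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical helpers approximating two Guava Joiner options with the JDK.
final class NullJoining {

  private NullJoining() {
  }

  // Roughly like Guava's Joiner.on(", ").skipNulls().join(items)
  static String joinSkippingNulls(List<?> items) {
    return items.stream()
        .filter(Objects::nonNull)
        .map(Object::toString)               // joining() needs CharSequence, so map first
        .collect(Collectors.joining(", "));
  }

  // Roughly like Guava's Joiner.on(", ").useForNull(nullText).join(items)
  static String joinReplacingNulls(List<?> items, String nullText) {
    return items.stream()
        .map(item -> Objects.toString(item, nullText))
        .collect(Collectors.joining(", "));
  }
}
```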


So, why was StringJoiner written this way?

Well, partly, it is just bad API design. But the reason why no-one noticed is because you are not supposed to actually use the class!

The whole StringJoiner API is designed to be a tool used as a Collector, the mutable reduction phase of the new Java SE 8 stream API. In this context StringJoiner itself is not visible:

 String joined = list.stream()
   .map(Object::toString)
   .collect(Collectors.joining(", "));

In the simple case, this is longer than Guava and less discoverable, plus I had to manually map to a string. However, in more advanced stream cases it is a great tool to have.

The other advantage of StringJoiner over Guava Joiner is that it handles prefixes and suffixes. This is actually really useful, the classic example being to output the '[' and ']' at the start and end of a list. Ideally, Guava would add prefix and suffix handling to their Joiner.
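The '[' and ']' case needs only the three-argument form of `Collectors.joining()`. A minimal sketch, with a hypothetical class name:

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical helper producing "[a, b, c]" style output.
final class ListFormatting {

  private ListFormatting() {
  }

  static String format(List<?> items) {
    return items.stream()
        .map(Objects::toString)                         // null elements become "null"
        .collect(Collectors.joining(", ", "[", "]"));   // delimiter, prefix, suffix
  }
}
```

An empty list produces "[]", because the prefix and suffix are still added.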

The good news is that some of the flaws in StringJoiner can be mitigated in a later JDK version. However, since StringJoiner is fundamentally stateful and mutable it will never be comparable to Guava's Joiner.


Amusingly, for many of the day-to-day tasks in string building, the class I developed in Commons-Lang over 12 years ago, StrBuilder, is still the best option. It takes the concept of the StringBuilder class and adds many additional methods. Relevant to this discussion is:

 return new StrBuilder()
   .append("[")
   .appendWithSeparators(list, ", ")
   .append("]")
   .toString();

Note how the joining occurs naturally within the middle of a fluent set of method calls. Neither Guava nor JDK joiners can be used in this way.


The Java SE 8 StringJoiner class is in my opinion nothing more than a behind-the-scenes tool. It should only be used indirectly from String.join() or Collectors.joining(). If you use it directly you are liable to be frustrated.

Personally, I plan to continue using the Guava joiner, unless I am performing a mutable reduction of a stream.

Tuesday, 1 July 2014

ThreeTen-Backport vs Joda-Time

So which project should you choose? ThreeTen-Backport or Joda-Time?

Date and Time library pre-Java 8

Over the years, I have produced two libraries for Java date and time.

The first is Joda-Time, a library that has become the de facto standard choice for many users.

The second is JSR-310, included as java.time in Java SE 8. Today I released v1.0 of ThreeTen-Backport, which makes the same JSR-310 API available in Java SE 6 and 7, although it is under a different package name.

So which should you choose?

If you are on Java SE 8 then you should use java.time (JSR-310). It tackles many issues with Joda-Time and is better integrated with the rest of the Java core libraries. Where necessary, consider using ThreeTen-Extra for any additional functionality that isn't in the JDK.

If you are on Java SE 6 or 7, then in general you should use Joda-Time. It is the standard option that other teams will be using, and it will be easier to interoperate if you stick to using Joda-Time.

The main use case for ThreeTen-Backport is if your team is using Java SE 6 or 7 but you are planning on moving to Java SE 8 in the near future. In this case, using the backport can smooth your future migration. However, it will still require a package rename for your API users.
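To illustrate the rename, code written against the JSR-310 API is unchanged apart from its imports. The sketch below compiles against Java SE 8 `java.time`; the ThreeTen-Backport imports (package `org.threeten.bp`) are shown as comments and assume the backport jar is on the classpath. The class name is hypothetical.

```java
// Java SE 8:
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
// ThreeTen-Backport on Java SE 6/7 would instead use:
// import org.threeten.bp.LocalDate;
// import org.threeten.bp.temporal.ChronoUnit;

// Hypothetical example; only the imports differ between the two libraries.
final class MigrationExample {

  private MigrationExample() {
  }

  static long daysBetween(LocalDate start, LocalDate end) {
    return ChronoUnit.DAYS.between(start, end);
  }
}
```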

Another use case for the backport is to assist with backporting other Java SE 8 code to Java SE 7. See the retrolambda project (which could use ThreeTen-Backport but doesn't).

I hope that helps you to decide!

Thursday, 26 June 2014

Iterable in Java SE 8

In the last few weeks I've finally had the chance to use lambdas and streams in Java SE 8 in anger. In doing so, I've found much to like, but some rough edges.

Iterable woes

The Iterable interface was added in Java SE 5 to enable the foreach loop to work. It was defined with a single method that returns an iterator:

 public interface Iterable<T> {
   Iterator<T> iterator();
 }

The simplicity of the interface was based around the primary use case, the foreach loop, which needed nothing more than a source of an iterator.

In Java SE 8, the interface changed, as it added two default methods:

 public interface Iterable<T> {
   Iterator<T> iterator();

   default void forEach(Consumer<? super T> action) {
     for (T t : this) {
       action.accept(t);
     }
   }

   default Spliterator<T> spliterator() {
     return Spliterators.spliteratorUnknownSize(iterator(), 0);
   }
 }

Isn't this wonderful? Lots of new functionality for free! Er well, no.

Despite following the lambda-dev mailing lists, the problem I now have with Iterable isn't one I foresaw. Yet it took just a few days of active use of Java SE 8 to realise that Iterable is now a lot less useful than it was.

The problem, as I see it, is that Iterable has been used for two different things in my code. The first is as the parameter of a method, where it acted as a very abstract form of collection. The second is as a "common" interface on a domain object.

Receiving an Iterable parameter

The first case is where a method has been written to receive an Iterable parameter. In my code, I have defined many methods that take an Iterable, often alongside a varargs array:

   void addNames(Name... names);
   void addNames(Iterable<Name> names);

This pairing provides great flexibility to callers. If you have a set of actual name objects, you can use the varargs form. If you have an array of name objects, you can also use the varargs form. And if you have virtually any kind of collection, you can use the iterable form. (That abstract sense of just providing an iteration really broadens the kind of inputs allowed and blurs the line between collections and non-collections).
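A minimal sketch of that pairing, with the varargs form delegating to the Iterable form so that the logic lives in one place. `NameList` is hypothetical, and `String` stands in for the `Name` type used in the text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical class showing the varargs + Iterable overload pairing.
// String stands in for the Name type used in the text.
final class NameList {

  private final List<String> names = new ArrayList<>();

  // Accepts individual names or an array.
  void addNames(String... names) {
    addNames(Arrays.asList(names));   // delegate to the Iterable form
  }

  // Accepts any collection, or anything else that can be iterated over.
  void addNames(Iterable<String> names) {
    for (String name : names) {
      this.names.add(name);
    }
  }

  List<String> getNames() {
    return names;
  }
}
```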

Code within the addNames(Iterable) method will benefit to some degree from the new methods on Iterable. That is because the method receiving the Iterable knows nothing else about the input type other than that it is an iterable.

Given this, the addition of the new forEach(Consumer) method is reasonable, as it provides additional looping options. But, in reality, there is little difference between these two:

   void addNames(Iterable<Name> names) {
     for (Name name : names) { ... }
   }

   void addNames(Iterable<Name> names) {
     names.forEach(name -> ...);
   }

Yes, it's a new method, but the actual new functionality is limited. Nevertheless, from this perspective it is a win.

By contrast, the addition of the new spliterator() method simply leads to frustration, as by itself the method has no value. What would have had value would have been a stream() method but there are good reasons why stream() was not added.

Instead, we have to write some rather ugly code to get the actually useful stream:

   void addNames(Iterable<Name> names) {
     Stream<Name> stream = StreamSupport.stream(names.spliterator(), false);
   }

As an aside, arrays manage this better:

   void addNames(Name[] names) {
     Stream<Name> stream = Stream.of(names);
   }

Now why oh why oh why is there no Stream.of(Iterable) method?

Actually, I know why, but that is a story for another day...


Perhaps you're thinking that the reason why Iterable isn't supported well is that there is a Streamable interface that you are supposed to use for this purpose instead?

Er, no. There was a Streamable interface, but it got removed during development. There is no JDK supplied abstraction for a type that provides streams.

Perhaps this is philosophical - you could just declare and call using a supplier:

   void addNames(Supplier<Stream<Name>> streamSupplier) { ... }
   Collection<Name> coll = ...
   addNames(coll::stream);
   Name[] array = ...
   addNames(() -> Stream.of(array));

Definitely a different kind of use of the type system.

After much thought and playing with the feel of the options available, I decided to keep on declaring methods to take Iterable and varargs as that is friendliest to users of the API. Internally, the mapping has to be done from iterable to stream, which necessitates a convenience method in a helper for my sanity.
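A minimal sketch of such a helper, assuming a hypothetical class `StreamUtils`. It uses `Collection.stream()` where it can, since a collection usually provides a better spliterator than the default one on Iterable, and otherwise falls back to `StreamSupport.stream(spliterator, false)`.

```java
import java.util.Collection;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class; the names are illustrative only.
final class StreamUtils {

  private StreamUtils() {
  }

  // Converts any Iterable to a sequential Stream.
  static <T> Stream<T> stream(Iterable<T> iterable) {
    if (iterable instanceof Collection) {
      // A collection usually has a better spliterator, for example one that knows its size.
      return ((Collection<T>) iterable).stream();
    }
    // General case: wrap the Iterable's spliterator; false means not parallel.
    return StreamSupport.stream(iterable.spliterator(), false);
  }
}
```

With this in place, the Iterable overload of `addNames` can begin with `Stream<Name> stream = StreamUtils.stream(names);`.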

Iterable on domain objects

The second major way I use Iterable is where I had more difficulty.

I have found over the years that some domain objects are little more than a type-safe wrapper around a collection. Consider a FooSummary object that contains a list of FooItem. In many cases, I have made FooSummary implement Iterable<FooItem>:

 public class FooSummary implements Iterable<FooItem> {
   List<FooItem> items;
   public Iterator<FooItem> iterator() {
     return items.iterator();
   }
 }

Doing this allowed the object to be iterated over directly in a foreach loop.

 FooSummary summary = ...
 for (FooItem item : summary) { ... }

This approach is very useful for those domain objects that are primarily a collection of some other object, yet are distinct enough to have a class of their own.

However, this kind of approach relied on using Iterable as a kind of "common" interface, similar to Comparable or Serializable. These small "common" interfaces work on the basis that they provide a common language across a very broad spectrum of use cases. Comparable does nothing other than define the method needed to participate in comparisons. I used Iterable in exactly that sense - to accept the common language for things that can be iterated over, without any implication that they were actual collections.

In my opinion, usage of Iterable as a "common" interface has been all but destroyed by Java SE 8.

The FooSummary class is essentially a domain object. The methods it provides must make sense as a set in their own right. Adding the iterator() method was perfectly acceptable in exactly the same way as adding compareTo() is. It is a well-known method with clearly defined properties and usage. However, in Java SE 8, the domain object has had two additional methods foisted upon it.

The issue is that from the perspective of a user of FooSummary, the two additional methods are net negative, rather than positive. While there is nothing wrong in theory with being able to call forEach(Consumer) or spliterator(), it needs to be a specific and separate decision to add them to the domain object.
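The effect is easy to see in a cut-down version of the pre-Java SE 8 style. `FooSummary` below declares only `iterator()`, yet on Java SE 8 callers can also invoke `forEach(Consumer)` and `spliterator()` on it. `String` stands in for `FooItem` to keep the sketch self-contained.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Cut-down version of the FooSummary example; String stands in for FooItem.
final class FooSummary implements Iterable<String> {

  private final List<String> items;

  FooSummary(String... items) {
    this.items = Arrays.asList(items);
  }

  // The only iteration method FooSummary itself declares.
  @Override
  public Iterator<String> iterator() {
    return items.iterator();
  }
}
```

Both `summary.forEach(System.out::println)` and `summary.spliterator()` now compile against this class. The inherited spliterator also reports an unknown size, even though the backing list knows its size.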

In these cases, I definitely do not want those extra methods. As such, I have no choice but to stop treating Iterable as a "common" interface. It can now only be used by classes that really make sense to be treated as collections.

In practical terms, this change simply makes life for callers slightly more verbose:

 FooSummary summary = ...
 for (FooItem item : summary.getItems()) { ... }

Where the line between a collection and a non-collection could previously be blurred through Iterable, that option is not really present any more. As such, I feel an API design tool has been removed from my tool-box.

(Another example I played with was a Tuple interface abstracting over Pair and Triple. Again, while adding an iterator() method to Tuple is desirable, adding forEach(Consumer) and spliterator() was not. Here, none of the available options are very pleasing.)


The Iterable interface has had two default methods added in Java SE 8. These make reasonable sense when receiving an iterable. Unfortunately, the difficulty of obtaining an actual stream is rather annoying.

The more significant problem for me is that Iterable is no longer suitable for use as a "common" interface. The extra methods, and the threat of more in future releases mean that the use cases for Iterable have been reduced. It should now only be used for classes that really are collections, something that I find to be a significant reduction in usability.

Any thoughts on iterable, "common" interfaces and default methods? Has Iterable been ruined by the addition of default methods?

Friday, 21 March 2014

VALJOs - Value Java Objects

The term "value object" gets confusing in Java circles with various different interpretations. When I think about these I have my own very specific interpretation, which I'm christening VALJOs.


The Wikipedia article on value objects gives this definition:

In computer science, a value object is a small object that represents a simple entity whose equality is not based on identity: i.e. two value objects are equal when they have the same value, not necessarily being the same object.

Classic examples of values are numbers, amounts, dates, money and currency. Martin Fowler also gives a good definition, and related wiki.

I agree with the Wikipedia definition, however there is still a lot of detail to be added. What does it take to write a value object in Java - is there a set of rules?

Recently, life has got more complicated with talk of adding value types to a future Java release:

"A future version of Java" will almost certainly have support for "value types", "user-defined primitives", "identity-less aggregates", "structs", or whatever you would like to call them.

Work on value types is focussed on extending the JVM, language and libraries to support values. Key benefits revolve around memory usage and performance, particularly in enabling the JVM to be much more efficient. Since this post isn't about value types, I'll direct curious readers to the JEP and various articles on John Rose's blog.

JVM-supported value types may be the future, but how far can we go today?

VALue Java Objects - VALJOs

What I want to achieve is a set of criteria for well-written value objects in Java today. Although written for today, the criteria have one eye on what value types of the future may look like.

As the term "value object" is overloaded, I'm giving these the name VALJO. Obviously this is based on the popular POJO naming.

  • The class must be immutable, which implies the final keyword and thread-safety.
  • The class name must be simple and direct, focussed on the value.
  • Instances must be obtained using static factory methods. All constructors must be private.
  • The state that defines the value must be clearly specified. It will consist of one or more elements.
  • The elements of the state must be other values, which includes primitive types.
  • The equals() and hashCode() methods must check all the elements of the state and nothing else.
  • If the class implements Comparable, then it must check all the elements of the state and nothing else.
  • If the class implements Comparable, the compareTo() method must be consistent with equals.
  • The toString() method must return a formally-defined string fully exposing the state and nothing else.
  • The toString() for two equal values must be the same. For two non-equal values it must be different.
  • There must be a static factory method capable of creating an instance from the formal string representation. It is strongly recommended to use the method name parse(String) or of(String).
  • The clone() method should not be public.
  • Provide methods, typically simple getters, to get the elements of the state.
  • Consider providing with() methods to obtain a copy of the original with different state.
  • Other methods should be pure functions, depending only on the arguments and the state or derived state.
  • Consider providing a link to the JDK 8 value-based classes documentation.
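To make the rules concrete, here is a minimal sketch of a VALJO for a hypothetical `Version` type holding a major and minor number. It is an illustration of the rules, not code from JSR-310 or any other library.

```java
import java.util.Objects;

// Hypothetical VALJO for a "major.minor" version number.
// Declared package-private only so the sketch compiles in any file name.
final class Version implements Comparable<Version> {

  // The state: two elements, both primitive values.
  private final int major;
  private final int minor;

  // Static factory: the only way to obtain an instance from the state.
  public static Version of(int major, int minor) {
    if (major < 0 || minor < 0) {
      throw new IllegalArgumentException("Version numbers must not be negative");
    }
    return new Version(major, minor);
  }

  // Static factory for the formal string form produced by toString().
  // The state has more than one element, so the method is called parse.
  public static Version parse(String text) {
    Objects.requireNonNull(text, "text");
    int dot = text.indexOf('.');
    if (dot < 0 || text.indexOf('.', dot + 1) >= 0) {
      throw new IllegalArgumentException("Invalid version: " + text);
    }
    // NumberFormatException is a subclass of IllegalArgumentException
    return of(Integer.parseInt(text.substring(0, dot)), Integer.parseInt(text.substring(dot + 1)));
  }

  private Version(int major, int minor) {
    this.major = major;
    this.minor = minor;
  }

  public int getMajor() {
    return major;
  }

  public int getMinor() {
    return minor;
  }

  // Returns a copy with a different minor number.
  public Version withMinor(int minor) {
    return of(major, minor);
  }

  // Uses all the state and nothing else, so it is consistent with equals().
  @Override
  public int compareTo(Version other) {
    int cmp = Integer.compare(major, other.major);
    return cmp != 0 ? cmp : Integer.compare(minor, other.minor);
  }

  @Override
  public boolean equals(Object obj) {
    if (this == obj) {
      return true;
    }
    if (!(obj instanceof Version)) {
      return false;
    }
    Version other = (Version) obj;
    return major == other.major && minor == other.minor;
  }

  @Override
  public int hashCode() {
    return 31 * major + minor;
  }

  // Formal string form: exposes all the state, and equal values give equal strings.
  @Override
  public String toString() {
    return major + "." + minor;
  }
}
```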

It is important to note that the state of a VALJO is not necessarily identical to the instance variables of the class. This can happen in two different ways:

Firstly, the state could be stored using different instance variables to those in the definition. For example, the state definition and public variables may expose an int but the instance variable might be a byte, because the validated number only needs 8 bit storage, not 32 bit. Similarly, the state definition might expose an enum, but store the instance variable as an int. In this case, the difference is merely an implementation detail - the VALJO rules apply to the logical state.

Secondly, there may be more instance variables than those in the state definition. For example, a Currency is represented in ISO 4217 with two codes, a three letter string and a three digit number. Since it is possible to derive the numeric code from the string code (or vice versa), the state should consist only of the string. However, rather than looking up the number each time it is needed, it is perfectly acceptable to store the numeric code in an instance variable that is not part of the state. What would not be acceptable for Currency is including the numeric code in the toString() (as the numeric code is not part of the state, only the string code is).

In effect, the previous paragraph simply permits caching of related data within a VALJO so long as it can be derived from the state. Extending this logic further, it is acceptable to cache the toString() or hashCode().

On parsing, I recommend using of(String) if the state of the VALJO consists entirely of a single element which is naturally a string, such as Currency. I recommend parse(String) where the state consists of more than one element, or where the single element is not a string, such as Money.

It is acceptable for the parse method to accept alternate string representations. Thus, a Currency parsing method might accept both the standard three letter code and the string version of the numeric code.
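A sketch of the Currency discussion, using a hypothetical `CurrencyCode` class and a deliberately tiny lookup table (GBP 826, USD 840, EUR 978, JPY 392 from ISO 4217). The state is only the three-letter code; the numeric code is cached alongside it and never takes part in `equals()`, `hashCode()` or `toString()`. Following the advice above, the factory is `of(String)`, and it also accepts the numeric code as an alternate representation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical VALJO: the state is the three-letter code only.
final class CurrencyCode {

  // Hypothetical, deliberately tiny lookup table; a real one would hold all of ISO 4217.
  private static final Map<String, Integer> NUMERIC_CODES = new HashMap<>();
  static {
    NUMERIC_CODES.put("GBP", 826);
    NUMERIC_CODES.put("USD", 840);
    NUMERIC_CODES.put("EUR", 978);
    NUMERIC_CODES.put("JPY", 392);
  }

  private final String code;      // the state
  private final int numericCode;  // cached, derived from the state, not part of it

  // Accepts "GBP" or the alternate numeric form "826".
  static CurrencyCode of(String text) {
    Objects.requireNonNull(text, "text");
    for (Map.Entry<String, Integer> entry : NUMERIC_CODES.entrySet()) {
      if (entry.getKey().equals(text) || entry.getValue().toString().equals(text)) {
        return new CurrencyCode(entry.getKey(), entry.getValue());
      }
    }
    throw new IllegalArgumentException("Unknown currency: " + text);
  }

  private CurrencyCode(String code, int numericCode) {
    this.code = code;
    this.numericCode = numericCode;
  }

  String getCode() {
    return code;
  }

  int getNumericCode() {
    return numericCode;
  }

  @Override
  public boolean equals(Object obj) {
    return obj instanceof CurrencyCode && code.equals(((CurrencyCode) obj).code);
  }

  @Override
  public int hashCode() {
    return code.hashCode();
  }

  @Override
  public String toString() {
    return code;  // the state only; the numeric code is never included
  }
}
```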

On immutability, it is acceptable to have an abstract VALJO class, provided that the constructor is private and all the implementations are immutable private static final nested classes. The need for this is rare, however.

Using VALJOs

It is best practice to use VALJOs in a specific way tailored for potential future conversion to JVM value types:

  • Do not use ==, only compare using equals().
  • Do not rely on System.identityHashCode().
  • Do not synchronize on instances.
  • Do not use the wait() and notify() methods.

The first of these rules is stronger than absolutely necessary given we are talking about normal Java objects in a current JVM. Basically, so long as your VALJO implementation promises to provide a singleton cached instance for each distinct state then using == is acceptable, because == and equals() will always return the same result. The real issue to be stopped here is using == to distinguish between two objects that are equal via equals().

Separately from these restrictions, the intention of VALJOs is that users should refer to the concrete type, not an interface. For example, users refer to the JSR-310 LocalDate, not the Temporal interface that it implements. The reason for this is that VALJOs are, of necessity, simple basic types, and hiding them behind an interface is not helpful.


Actually writing VALJO classes is boring and tedious. The rules and implications need to be carefully thought through on occasion. Some projects provide tools which may assist to some degree.

Joda-Convert provides annotations for @ToString and @FromString which can be used to identify the round-trippable string format.

Joda-Beans provides a source code generation system that includes immutable beans. They are not instant VALJOs, but can be customised.

Auto-Value provides an annotation processor that generates immutable value object implementations of abstract classes. They are not instant VALJOs, but can be customised.

Project Lombok provides a customised compiler and IDE plugin which includes value object generation. They are not instant VALJOs, but can be customised.

In general however, if you want to write a really good VALJO, you need to do it by hand. And if that VALJO will be used by many others, such as in JSR-310, it is worth doing correctly.


The aim of this blog was to define VALJOs, a specific set of rules for value objects in Java. The rules are intended to be on the strict side - all VALJOs are value objects, but not all value objects are VALJOs.

Did I miss anything? Get something wrong? Let me know!

Monday, 10 February 2014

New project: ThreeTen-Extra for JDK 8

JDK 8 includes JSR-310, a new date and time library. But what about functionality that didn't make it into the JDK?


The main ThreeTen project is now essentially complete. The project was developed and delivered via JSR-310 into OpenJDK and JDK 8.

However, as part of that process, certain pieces of functionality were rejected and/or excluded from the JDK. This was sometimes due to scope management and sometimes due to whether something was appropriate for the JDK.

The ThreeTen-Extra project provides a home for that functionality.

ThreeTen-Extra is a project on GitHub that provides additional functionality. It is delivered via Maven Central and is dependent on JDK 8.

The following functionality is currently provided:

  • DayOfMonth temporal value type
  • DayOfYear temporal value type
  • AmPm temporal enum
  • Quarter temporal enum
  • YearQuarter temporal value type
  • Days, Months and Years amount value types
  • Next/previous day adjusters that skip the weekend (Saturday/Sunday)
  • Coptic calendar system
  • Support for the TAI and UTC time-scales
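The weekend-skipping adjusters can be illustrated with a JDK-only sketch built on `java.time.temporal.TemporalAdjuster`. The class and method names below are hypothetical and this is not the ThreeTen-Extra implementation; it simply treats Saturday and Sunday as the weekend.

```java
import java.time.DayOfWeek;
import java.time.temporal.ChronoUnit;
import java.time.temporal.TemporalAdjuster;

// Hypothetical adjusters that skip Saturday and Sunday.
final class WeekendAdjusters {

  private WeekendAdjusters() {
  }

  // Moves forward to the next Monday-to-Friday day.
  static TemporalAdjuster nextWorkingDay() {
    return temporal -> {
      int days;
      switch (DayOfWeek.from(temporal)) {
        case FRIDAY: days = 3; break;    // Friday -> Monday
        case SATURDAY: days = 2; break;  // Saturday -> Monday
        default: days = 1; break;
      }
      return temporal.plus(days, ChronoUnit.DAYS);
    };
  }

  // Moves back to the previous Monday-to-Friday day.
  static TemporalAdjuster previousWorkingDay() {
    return temporal -> {
      int days;
      switch (DayOfWeek.from(temporal)) {
        case MONDAY: days = 3; break;    // Monday -> Friday
        case SUNDAY: days = 2; break;    // Sunday -> Friday
        default: days = 1; break;
      }
      return temporal.minus(days, ChronoUnit.DAYS);
    };
  }
}
```

Usage follows the normal adjuster pattern, for example `date.with(WeekendAdjusters.nextWorkingDay())`.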

The project has spare space to add more functionality, so long as it is generally applicable. For example, additional calendar systems would be a good fit. Feel free to raise a pull request with your ideas.


The ThreeTen-Extra project is now available, providing an additional jar file of date/time code that builds on java.time (JSR-310) in JDK 8.

Comments welcome!

Saturday, 8 February 2014

Turning off doclint in JDK 8 Javadoc

JDK 8 includes many updates, but one is I suspect going to cause quite a few complaints - doclint for Javadoc.

Javadoc doclint

Documentation is not something most developers like writing. With Java, we were fortunate to have the Javadoc toolset built in and easy to access from day one. As such, writing Javadoc is a standard part of most developers' lives.

The Javadoc comments in source code use a mixture of tags, starting with @, and HTML to allow the developer to express their comment and format it nicely.

Up to JDK 7, the Javadoc tool was pretty lenient. As a developer, you could write anything that vaguely resembled HTML and the tool would rarely complain beyond warnings. Thus you could have @link references that were inaccurate (such as due to refactoring) and the tool would simply provide a warning.

With JDK 8, a new part called doclint has been added to Javadoc, and it changes that friendly behaviour. In particular, the tool aims to enforce conforming W3C HTML 4.01 (despite the fact that humans are very bad at writing conformant HTML).

With JDK 8, you are unable to get any Javadoc output unless your Javadoc comments meet the standards of doclint. Some of its rules are:

  • no self-closed HTML tags, such as <br /> or <a id="x" />
  • no unclosed HTML tags, such as <ul> without matching </ul>
  • no invalid HTML end tags, such as </br>
  • no invalid HTML attributes, based on doclint's interpretation of W3C HTML 4.01
  • no duplicate HTML id attribute
  • no empty HTML href attribute
  • no incorrectly nested headers, such as class documentation must have <h3>, not <h4>
  • no invalid HTML tags, such as List<String> (where you forgot to escape using &lt;)
  • no broken @link references
  • no broken @param references, they must match the actual parameter name
  • no broken @throws references, the first word must be a class name

Note that these are errors, not warnings. Break the rules and you get no Javadoc output.
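As an illustration, here is a small hypothetical class whose Javadoc is written to avoid the errors listed above: generic types go inside `{@code ...}`, HTML tags are closed and not self-closed, and the `@param` and `@throws` tags refer to a real parameter and a real class. It is intended as a sketch rather than an exhaustive test of doclint.

```java
import java.util.List;

/**
 * Hypothetical class whose Javadoc follows the doclint rules listed above.
 * <p>
 * Generic types are written as {@code List<String>}, not as raw angle brackets,
 * and paragraphs use a plain {@code <p>} rather than a self-closed tag.
 */
final class DoclintFriendly {

  private DoclintFriendly() {
  }

  /**
   * Counts the non-empty strings in a list.
   * <ul>
   *   <li>null elements are ignored</li>
   *   <li>empty strings are ignored</li>
   * </ul>
   *
   * @param values  the list of values to check, not null
   * @return the number of non-empty strings
   * @throws NullPointerException if the list is null
   */
  static int countNonEmpty(List<String> values) {
    int count = 0;
    for (String value : values) {
      if (value != null && !value.isEmpty()) {
        count++;
      }
    }
    return count;
  }
}
```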

In my opinion, this is way too strict to be the default. I have no problem with such a tool existing in Javadoc, but given the history of Javadoc, errors like this should be opt-in, not opt-out. It's far better to get slightly broken Javadoc than no Javadoc.

I also haven't been able to find a list of the rules, which makes life hard. At least we can see the source code to reverse engineer them.

Turning off doclint

The magic incantation you need is -Xdoclint:none. This goes on the command line invoking Javadoc.

If you are running from Maven, you need to use the additionalparam setting, as per the manual. Either add it as a global property:

 <properties>
   <additionalparam>-Xdoclint:none</additionalparam>
 </properties>

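or add it to the maven-javadoc-plugin:

 <plugins>
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-javadoc-plugin</artifactId>
     <configuration>
       <additionalparam>-Xdoclint:none</additionalparam>
     </configuration>
   </plugin>
 </plugins>
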
Ant also uses additionalparam to pass in -Xdoclint:none, see the manual.

Gradle does not expose additionalparam but Tim Yates and Cedric Champeau advise of this solution:

  if (JavaVersion.current().isJava8Compatible()) {
    allprojects {
      tasks.withType(Javadoc) {
        options.addStringOption('Xdoclint:none', '-quiet')
      }
    }
  }

See also the Gradle manual.


I don't mind doclint existing, but there is no way that it should be turned on to error mode by default. Getting some Javadoc produced without hassle is far more important than pandering to the doclint style checks. In addition, it is very heavy-handed with what it defines to be errors, rejecting plenty of HTML that works perfectly fine in a browser.

I've asked the Maven team to disable doclint by default, and I'd suggest the same to Ant and Gradle. Unfortunately, the Oracle team seem convinced that they've made the right choice with errors by default and their use of strict HTML.

Comments welcome, but please note that non-specific "it didn't work for me" comments should be at Stack Overflow or Java Ranch, not here!