Monday 28 January 2008

Java 7 - Multi-line String literals

One of the most common features in other programming languages is the multi-line String literal. Would it be possible to add this to Java?

Update, 2011-10-31, Just wanted to note that multi-line strings are not in Java 7, nor are they likely to be in Java 8. This blog post is still useful to understand some of the difficulties that would have to be tackled if they were to be included in future.

Update, 2018-01-28: This is now being considered for addition to Java, read more here.

Multi-line String literals

In Java today there is only one form of string literal, supplied in double quotes. Within those double quotes, certain characters have to be escaped.

 String basic = "Hello";
 String three = "This string\nspans three\nlines";
 String welcome = "Hello, My name is \"Stephen\", Hi!";

The first of these two examples is not complex, and would not make use of a multi-line String literal. The other two might be more readable with such a literal.

The standard for defining a multi-line String literal in both Scala and Groovy is three double quotes. This also seems like a sensible choice for Java:

  String three = """This string
spans three
lines""";

This is potentially much more readable, especially with large blocks of text. This form of literal would also avoid the need for escaping:

 String welcome = """Hello, My name is "Stephen", Hi!""";

Note that we no longer need to escape the double quotes. This would be especially useful for regular expressions.

Bear in mind that multi-line String literals are fundamentally no different to normal String literals on the key point of the object created. Both would create java.lang.String objects.

Issues

The first issue is the multi-line arrangement. Since all text within the multi-line literal is included, all lines except the first must begin from column zero. This will look odd in a piece of well-formatted Java code:

  // what a naive multi-line literal forces us to write
  public class MyClass {
    public void doStuff() {
      String three = """This string
spans three
lines""";
      System.out.println(three);
    }
  }
  
  // what we'd like to write
  public class MyClass {
    public void doStuff() {
      String three = """This string
                        spans three
                        lines""";
      System.out.println(three);
    }
  }

One possible solution is to provide a method on String that strips all whitespace after each newline. This could be called directly after the literal. Unfortunately this approach loses some efficiency as the string must be trimmed each time:

  // option with trimNewline()
  public class MyClass {
    public void doStuff() {
      String three = """This string
                        spans three
                        lines""".trimNewLine();
      System.out.println(three);
    }
  }

Another, perhaps better, solution might be to have a syntax variation. If the opening triple quote is followed immediately by a newline, then the position of the first non-space character on the next line represents the column to begin the literal at. (The first newline would not be included in this form of the literal.) Only the space character would be permitted in earlier columns until the end of the literal. This would allow for natural formatting of this kind of string:

  // option with columns determined by first line
  public class MyClass {
    public void doStuff() {
      String three = """
              This string
              spans three
              lines""";
      System.out.println(three);
    }
  }

One final tricky issue is handling a string containing the triple double quote. The answer is probably to ignore this situation (Scala does this). It is going to be very rare, and it can be worked around using string concatenation.

Summary

Multi-line String literals should be a relatively easy addition to Java (anyone fancy adding it to Kijaro?). The main benefits would be avoiding escaping in regular expressions, and pasting in large blocks of text from other sources.

Overall, I think they would be a valuable addition to Java. But have I missed any obvious issues? Are there any other syntax options that should be considered? Opinions welcome as always :-)

Friday 25 January 2008

Closures - Comparing closure type inference

In this blog I'm going to compare the three principle 'closure' proposals based on how they infer types. This follows my previous comparison blogs.

Here are the links to the proposals if you want to read more:

  • BGGA - full closures for Java
  • CICE - simplified inner classes (with the related ARM proposal)
  • FCM - first class methods (with the related JCA proposal)

Type inference

Type inference is where the compiler works out the type of a given expression without the programmer needing to explicitly state the type. For example:

 String str = "Hello";

In the above, the type of str is known to be a String because it is stated by the developer. However, there is no actual need for the developer to manually state the type of 'String'. Instead, the compiler (not the Java 1.6 compiler) could work it out itself by infering the type from the literal "Hello" - hence 'type inference'.

Closures

Lets consider how the closures proposals allow some limited type inference in Java. Consider this single method interface:

 public interface PairMatcher<T,U> {
   boolean matches(T arg1, U arg2);
 }

The purpose of the interface is to allow developers to write a callback that takes two arguments and returns true if they 'match' some criteria that is important to the developer. Here is an example of such a framework method, that searches through two lists and returns the matched entry:

 public T find(List<T> list1, List<U> list2, PairMatcher<T,U> matcher) {
   for (T t : list1) {
     for (U u : list2) {
       if (matcher.matches(t, u)) {
         return t;
       }
     }
   }
   return null;
 }

And here is how calling this code looks today using an inner class:

 List<String> stringList = ...
 List<Integer> integerList = ...
 String str = find(stringList, integerList, new PairMatcher<String,Integer>() {
   public boolean matches(String str, Integer val) {
     return (str != null && str.length() > 0 && val != null && val > 10);
   }
 });

BGGA

With BGGA, the code can be shortened to use use a closure:

 List<String> stringList = ...
 List<Integer> integerList = ...
 String str = find(stringList, integerList, {String str, Integer val =>
   (str != null && str.length() > 0 && val != null && val > 10)
 });

Obviously the code is shorter. But it is important to understand what has occurred. The type PairMatcher, and its generic arguments, have been inferred. This happened as the BGGA compiler identified which method was being called. In order to call the method, a 'closure conversion' occurred, which changed the closure to a PairMatcher with the correct generics.

So, not only is the type inferred, but the generic arguments are as well.

FCM

FCM can be used, like BGGA, to shorten the original:

 List<String> stringList = ...
 List<Integer> integerList = ...
 String str = find(stringList, integerList, #(String str, Integer val) {
   return (str != null && str.length() > 0 && val != null && val > 10);
 });

Exactly the same conversion and type inference is occurring as in BGGA. Thus, for this scenario of type inference, FCM and BGGA have the same power, just different syntax.

CICE

CICE also provides a means to shorten the original:

 List<String> stringList = ...
 List<Integer> integerList = ...
 String str = find(stringList, integerList, PairMatcher<String,Integer>(String str, Integer val) {
   return (str != null && str.length() > 0 && val != null && val > 10);
 });

I hope that this example makes it clear that there is no type inference here as there was in FCM and BGGA. Instead, the developer still has to type the interface name and, more significantly, the generic arguments.

While just typing the interface name might be regarded as a documentation benefit by some, I would suggest that it is very hard to justify the retyping of the generic arguments. Clearly, for this scenario of type inference, CICE has less power than FCM and BGGA.

Comparison

Both BGGA and FCM support the same style of type inference for the single method interface and it's generic arguments. CICE does not, and with generics that results in quite a verbose result for the developer to type.

Summary

There are two basic approaches here - infer or be explicit. My view is that this is an area where inference is really needed. With CICE, I find it hard to see how this is really much of a gain over an inner class.

Opinions welcome as always :-)

Monday 7 January 2008

Java 7 - Checked exception library change

Personally, I don't like checked exceptions. But why? And can we do anything about it?

Checked exceptions

Its fairly well known that checked exceptions were a kind of experiment in Java. Its also fairly well known that other newer languages are not choosing to follow. The basic reason is that as a language feature they haven't met their goal.

The expectation was that checked exceptions would improve exception handling quality. The reality is that they haven't:

  public void process() {
    try {
      callSomeExceptionThrowingMethod();
    } catch (Exception ex) {
      // ignore
    }
  }

I hope that most of us recognise this as an anti-pattern (in most cases). Rather than discuss this much longer, I'll point everyone at Joshua Gertzen's blog.

His blog reminded me of a possible library change (yes library, not language!) I'd thought of that might help. At least it is something I'd like opinions on:

  public void process() {
    try {
      callSomeExceptionThrowingMethod();
    } catch (Exception ex) {
      ex.rethrowUnchecked();
    }
  }

So, what does 'rethrowUnchecked()' on Exception do? Well it would rethrow the exception, but ignoring the checked exception status of the exception.

  public void rethrowUnchecked() {
    Unsafe.getUnsafe().throwException(this);
    return null;
  }

This all works with Java today. There is no language change, just an alternative view of an existing capability.

So, why would this be useful? Well, I suggest that this could be a better way to handle annoying checked exceptions that we don't really care about. In particular, I believe that it would be easier to understand the exception trace, as it wouldn't consist of lots of nested exceptions - no more 'throw new RuntimeException(ex)'.

Summary

I'm not convinced of the benefits of this one yet. It seems neat, but does it really benefit us, as we still have to write the catch block? Perhaps it is more suited to a language change? Opinions welcome as always :-)

Tuesday 1 January 2008

Closures - Comparing control structures of BGGA, ARM and JCA

In this blog I'm going to compare the second major part of the three principle 'closure' proposals - control structures. This follows my previous blog where I compared one method callbacks.

Here are the links to the proposals if you want to read more:

  • BGGA - full closures for Java
  • CICE - simplified inner classes (with the related ARM proposal)
  • FCM - first class methods (with the related JCA proposal)

Comparing 'closure' 'control structure' proposals

Firstly, what do we mean by control structure problems? Well, there are three classic use cases:

 // resource management
 try {
   // open resource
   // use resource
 } finally {
   // close resource
 }

 // locking
 try {
   // lock
   // process within the lock
 } finally {
   // unlock
 }

 // looping around a map
 for (Map.Entry<String,Integer> entry : map.entrySet()) {
   String key = entry.getKey();
   Integer value = entry.getValue();
   // process map
 }

Each of the three principle closure proposals provide a solution to some or all of these issues.

BGGA

BGGA treats control structures as simply an alternative way to invoke a method. This allows any developer to write control structures, something referred to as library-defined control structures.

Thus, with BGGA, you can write a method to manage a lock. This can be called in two ways - either by passing the closure as a parameter, or by following the method call by the block.

 // library defined control structure
 public void withLock(Lock lock, {=>void} block) {
   lock.lock();
   try {
     block.invoke();
   } finally {
     lock.unlock();
   }
 }

 // call using closure as parameter
 withLock(lock, {=>
   // process within the lock
 });

 // call using control invocation syntax
 withLock(lock) {
   // process within the lock
 }

Whichever style is used, the meaning of return and this is the same. The return keyword will return from the lexically enclosing method, which is entirely sensible for control invocation. The this keyword also refers to the lexically enclosing class.

CICE/ARM

CICE does not support control structures. Instead, this area is covered by the ARM proposal.

The basic idea behind the ARM proposal is that control structures should only be added by language designers, and should not be able to be added to libraries.

The ARM proposal itself is merely an outline proposal and does not have detail. However, Josh Bloch has indicated that he would see ARM tackling specific issues, notably resource management and locking.

 // resource management
 try (BufferedReader r = new BufferedReader(new FileReader(file))) {
   // process resource
 }

 // locking
 protected(lock) {
   // process within the lock
 }

As this is a language change, it has to wait until a new version of Java is released. In addition, the meaning of return and this is naturally as per any other language defined control statement, such as a normal try block.

FCM/JCA

FCM does not support control structures. Instead, this area is covered by the JCA proposal.

The basic idea behind the JCA proposal is that control structures can be added by anyone, but that they should clearly be separated from the one method callback cases. The aim of the separation is to send a message to developers - adding a control structure in a library is something that should be undertaken with care.

 // library defined control structure
 public void withLock(#(void()) block : Lock lock) {
   lock.lock();
   try {
     block.invoke();
   } finally {
     lock.unlock();
   }
 }

 // call
 withLock(lock) {
   // process within the lock
 }

It should be noted that this is similar to the BGGA design. The calling code is identical to BGGA's control invocation syntax. However, you cannot pass the block of code as a parameter to the method as you can in BGGA - there is only one calling syntax.

The library defined control structure is very different however. In JCA, the library method has a special syntax with the block before the colon, and the input parameters after the colon. This matches the callers syntax, and clearly indicates that this is a 'special' kind of method.

Again, the meaning of return and this is lexically scoped, as with BGGA and ARM.

Comparison

Each of the three approaches has merits.

BGGA provides a full-featured approach, where a closure is treated as it would be in most other languages. Control structures can be written in libraries, and the control invocation syntax is just sugar for a normal method call.

ARM provides a use-case based approach. It solves specific problems with specific language changes. This doesn't allow ordinary developers to add control structures, but does provide valuable, reliable, enhancements.

JCA sits between BGGA and ARM, but because it allows library-defined control structures it is closer to BGGA. It differs from BGGA in that it treats control structures as effectively a separate language change to one method callbacks. By doing this, and with specific syntax, it aims to slightly discourage the extensive use of library-defined control structures.

Summary

There are two basic approaches to control structures - let language designers write them, eg. ARM, or let anyone write them, eg. BGGA or JCA. The BGGA and JCA proposals differ mainly on syntax, where JCA is using the syntax to warn against excessive or inappropriate use of control structures.

My own view is that everyone should have the right to write control structures, and that it will allow the Java language to keep fresh without lots of pressure on Sun to add new language features.

However, the right to add control structures comes with responsibilities. If everyone writes control structures all the time, then the resulting code might look more like a dialect of Java than Java itself. That is why I favour JCA, which aims through syntax and documentation to encourage serious consideration before writing a control structure.

Opinions welcome as always :-)