Stephen Colebourne's blog: February 2007

Tuesday 27 February 2007

First-Class Methods: Java-style closures

The debate on Closures in Java 7 has been limited to two proposals so far. Today Stefan Schulz and I would like to offer a third option - First-Class Methods: Java-style closures (FCM) - to the Java community.

Our aim when writing this proposal was to provide developers with much of the power of full-blown closures, but without the complexity. The syntax had to fit neatly within Java, but more importantly so did the semantics. The result of our work is a closure proposal that focuses on methods rather than closures, hence the name 'First-class methods'.

For those currently programming in Java, the method is the principal construct where application logic resides. Our proposal simply takes methods and allows them to be used as first-class objects within the application. But the new constructs are still recognisably methods. This makes for a gentle learning curve, since all developers are already familiar with methods.

The proposal adds four new syntax elements:

Method literals - a type-safe way to refer to a method
Method types - a way to define the parameters and result of a method
Invocable method references - a reference to a method-object combination that together can be invoked
Inner methods - a way to write a method inline within another method, similar to inner classes

Full details of the syntax and semantics can be found in the First-Class Methods proposal.

Example

To whet your appetite, here is an example of creating a comparator using an inner class in Java 5:

  List<String> list = ...
  Collections.sort(list, new Comparator<String>() {
    public int compare(String str1, String str2) {
      return str1.length() - str2.length();
    }
  });

And here is the code rewritten using an inner method from the proposal:

  List<String> list = ...
  Collections.sort(list, #(String str1, String str2) {
    return str1.length() - str2.length();
  });

The inner method version will compile to almost exactly the same bytecode as the inner class version. But the syntax is much shorter and the semantics clearer.
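
For anyone who wants to run the inner class baseline, here is a minimal, self-contained sketch of it. The class name LengthSortExample, the method sortByLength and the sample data are purely illustrative and are not part of the proposal. The inner method form cannot be compiled today, so only the Java 5 version is shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative only: the class and method names are assumptions, not from the proposal.
class LengthSortExample {

    // Sorts a copy of the input by string length using a Java 5 anonymous inner class.
    static List<String> sortByLength(List<String> input) {
        List<String> list = new ArrayList<String>(input);
        Collections.sort(list, new Comparator<String>() {
            public int compare(String str1, String str2) {
                return str1.length() - str2.length();
            }
        });
        return list;
    }

    public static void main(String[] args) {
        // prints [Abba, Corrs, Beatles]
        System.out.println(sortByLength(Arrays.asList("Beatles", "Abba", "Corrs")));
    }
}
```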

Relationship to other proposals

There are two other closure proposals being debated. The CICE proposal aims to simplify inner class creation, while the BGGA proposal aims to introduce full-blown closures and API driven control flow statements.

The First-Class Methods (FCM) proposal is located between CICE and BGGA. FCM is considerably more powerful than CICE, which is still lumbered with the dreaded OuterClass.this syntax of the inner class. Equally, FCM doesn't get caught up in the obsessive BGGA requirement of Tennent's Correspondence Principles with its myriad of nasty implications (such as last-line-no-semicolon-return, exceptions for control-flow and RestrictedFunction). Instead FCM focuses on empowering the humble, but familiar, method.

Summary

The First-Class Methods (FCM) proposal provides an alternative to the two existing closure proposals. It focusses on the familiar concept of methods, providing all the syntax and semantics necessary to use them to their full potential. The result is a proposal that gives the developer power without complexity or surprise and would make a great Java language change.

As always, Stefan and I would love to hear your feedback on this proposal.

Tuesday 20 February 2007

Java 7 - List and Map literals

One of the small pieces of syntax sugar that many languages have is literals for lists and maps. How would these appear if we added them to Java?

List literals

The simplest syntax approach would be to copy what Groovy has already done.

  List<String> list = ["Abba", "Beatles", "Corrs"];
  List<String> emptyList = [];

This syntax is pretty clear and simple, creating an ArrayList (no opportunity to create a LinkedList etc.). The problems occur when you start considering the complications of generics.

The simple case is assignment, as shown above. Here, the generic type of the RHS would be inferred from the LHS without difficulty. However, if the literal isn't assigned to a variable then it's a lot more complex.

  ["Abba", "Beatles", "Corrs"].add(new Date());  // compile error?
  [].add(new Date());   // compile error?

The first option is to infer the generic type from the content of the list literal (common type shared between all elements). The second option is to use List<Object> in this case.
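
The nearest analogue in Java as it stands is probably Arrays.asList, where the element type is inferred from the arguments. The sketch below uses it to mimic both options. It is only an approximation of how a list literal might behave, and the class and method names are my own illustrative choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: approximates the two inference options with today's library calls.
class ListInferenceExample {

    // Option one: the element type comes from the content (all Strings, so List<String>).
    static List<String> inferredFromContent() {
        return new ArrayList<String>(Arrays.asList("Abba", "Beatles", "Corrs"));
    }

    // Option two: fall back to List<Object>, here forced with an explicit type argument.
    static List<Object> fallbackToObject() {
        List<Object> list = new ArrayList<Object>(Arrays.<Object>asList("Abba", "Beatles", "Corrs"));
        list.add(Integer.valueOf(1972)); // legal, because the element type is Object
        return list;
    }

    public static void main(String[] args) {
        System.out.println(inferredFromContent());
        System.out.println(fallbackToObject());
    }
}
```

With the first option, adding a Date to the String list would be a compile error. With the second it would be accepted, which is simpler to specify but less safe.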

Verbose list literals

As an alternative syntax, we could use array literals as our syntax inspiration:

  List<String> list = new ArrayList<String>() ["Abba", "Beatles", "Corrs"];
  List<String> emptyList = new ArrayList<String>();

The advantage of this syntax is that the list implementation class and the generic type are clear. The disadvantage is the verbosity.

Map literals

Map literals share exactly the same issues as list literals. The only difference is that maps have a key and a value. Again using Groovy as a basis, a colon would separate the key from the value:

  // either with the short syntax
  Map<String, Integer> map = ["Abba" : 1972, "Beatles" : 1960, "Corrs" : 1995];
  Map<String, Integer> emptyMap = [:];

  // or using the verbose syntax
  Map<String, Integer> map = new HashMap<String, Integer>()
                                    ["Abba" : 1972, "Beatles" : 1960, "Corrs" : 1995];
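
For comparison, this is roughly what has to be written today to build the same map. It is a sketch that assumes the literal would simply create and populate a HashMap. The class and method names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: the code a map literal would save us from writing by hand.
class MapLiteralEquivalent {

    static Map<String, Integer> bands() {
        Map<String, Integer> map = new HashMap<String, Integer>();
        map.put("Abba", 1972);
        map.put("Beatles", 1960);
        map.put("Corrs", 1995);
        return map;
    }

    public static void main(String[] args) {
        // HashMap does not guarantee iteration order, so the printed order may vary.
        System.out.println(bands());
    }
}
```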

Summary

List and map literals should be a really simple addition to Java. But they're not when you examine the detail. In this case, Java's static types and generics combine to cause difficulty.

Which of the two options (short or verbose) would I pick? To be honest, I don't know. Neither is exactly what I hoped for when I started writing this blog!

So, feedback welcomed on better alternatives!

Monday 12 February 2007

Java language - anonymous methods (CICE)

In my last entry I wrote about method adaptation as an alternative to closures. What if we extended the syntax there to allow anonymous methods?

Anonymous methods

Using the example from last time, we had created a method adaptor that looked as follows:

public void init() {
  executor.execute((Runnable->run()) this->processTask());
}
private void processTask() {
  // lots of code to run in the executor
}

Here, the this->processTask() syntax is being automatically adapted to the required Runnable interface. But often, we don't want to create a separate method for this, so let's create the method anonymously:

public void init() {
  executor.execute(Runnable->run() {
    // code of anonymous method to run in the executor
  });
}

This would operate in the same way as an inner class, except with one method. In other words, the contents of the anonymous method have the same rules and restrictions as a method does when written within the normal inner class syntax. Hence, 'return' will return from the anonymous method, not the surrounding init() method. Thus, the above is merely an exact shorthand for:

public void init() {
  executor.execute(new Runnable() {
    public void run() {
      // code to run in the executor
    }
  });
}

The code can only access final variables as with inner classes. But, that should be addressed for both anonymous inner classes and anonymous methods separately.
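
To make the final variable rule concrete, here is a small runnable sketch using the existing inner class syntax. The anonymous method form would be subject to the same rule. The class name, method name and the synchronous call to run() are illustrative simplifications; a real use would hand the task to an executor.

```java
// Illustrative only: shows which local variables an anonymous inner class can see.
class FinalCaptureExample {

    static String runTask(final String name) {
        final StringBuilder log = new StringBuilder();
        Runnable task = new Runnable() {
            public void run() {
                // Both 'name' and 'log' are final locals of the enclosing method.
                log.append("ran ").append(name);
                return; // returns from run(), not from runTask()
            }
        };
        task.run(); // invoked directly here to keep the example deterministic
        return log.toString();
    }

    public static void main(String[] args) {
        // prints: ran report
        System.out.println(runTask("report"));
    }
}
```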

For the sake of completeness, here is an example with parameters:

public void init() {
  Collections.sort(list,
    Comparator<String>->compare(String a, String b) {
      return a.length() - b.length();
    }
  );
}

For those of you following the Java language debate, this is of course just an amended version of the CICE proposal.

Compared to BGGA

So why consider this route (method adaption and anonymous methods) rather than full blown BGGA closures?

Firstly, it is less of a jump for existing Java programmers. This is simply a slightly shorter syntax for creating an inner class. I suspect many Java developers will appreciate that. This change also works well together with method adaptation, which provides an alternative way to refactor the common case of the single line inner class that calls another method.

Secondly, it is fully and clearly documentable. The parameter being passed to the executor or the sort method is a real interface, which can be documented in all the detail needed. Documenting a BGGA closure method will have to be done for every definition.

Thirdly, there is no confusion over different syntaxes depending on whether it's a synchronous or asynchronous use-case. There is no need for a hacky RestrictedClosure interface.

Finally, it feels more correct to be passing an object, rather than an arbitrary block of code, to a method that processes it at some later point in time. An object is expected to have its own lifecycle - whereas a developer expects a block of code to be executed immediately.

However, there are downsides too. Firstly, anonymous methods are less powerful. There is no ability to have function types, and the functional programming styles that accompany them.

Secondly, anonymous methods are more verbose. The BGGA proposal presents examples that are extremely short and pithy, including for executors. Except of course that the rules for what code you can put in that block differ from other blocks.

Finally, the synchronous use cases are not served that well by anonymous methods. The control statement form of BGGA is very expressive of the intent of the coder in those cases (for example looping around a map, locking or working with files).

Summary

What is my opinion? Well I haven't quite made up my mind yet. I should be a natural supporter of BGGA, as I am in favour of pushing Java forward. But BGGA bothers me in lots of annoying little ways.

So, consider my blogs on anonymous methods and method adaption as my attempt to explore an alternative to BGGA. Comments welcome as always.

Tuesday 6 February 2007

Java language - method adaption

BGGA Closures has been following one course to address the verbosity of defining an inner class in Java. In this blog I want to consider an alternative that could be more useful in some circumstances.

Method adaption

Consider a long running task that will use the Java 5 executor framework (I could also have used a swing example):

public void init() {
  executor.execute(
    new Runnable() {
      public void run() {
        // lots of code to run in the executor
      }
    }
  );
}

The problem is that frequently, the code to be run gets quite long. So, what may happen is that the code is refactored. One refactoring is to use top-level classes, but let's consider the simpler alternative - method refactoring:

public void init() {
  executor.execute(
    new Runnable() {
      public void run() {
        processTask();
      }
    }
  );
}
private void processTask() {
  // lots of code to run in the executor
}

However, the developer is really having to do a lot of tedious drudgery to achieve the goal here. Could it be simplified? Perhaps we could use method literals:

public void init() {
  executor.execute(this->processTask());
}
private void processTask() {
  // lots of code to run in the executor
}

So, two things have happened here. Firstly, a method literal has been used - this->processTask(). This represents the processTask() method on this specific object.

The second thing that has happened is that an adaptor has been generated for us. This is a new class that implements Runnable, and calls processTask() from the run() method exactly as we would have if we coded it ourselves.
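
As a rough idea of what such a generated adaptor might look like, here is a hand-written equivalent. This is only a sketch: the names TaskOwner, ProcessTaskAdaptor, asRunnable and history are all hypothetical, and a real compiler could generate something quite different.

```java
// Illustrative only: a hand-coded stand-in for the adaptor the compiler might generate.
class TaskOwner {
    private final StringBuilder history = new StringBuilder();

    // Implements Runnable and forwards run() to processTask() on one specific object.
    private static final class ProcessTaskAdaptor implements Runnable {
        private final TaskOwner target;

        ProcessTaskAdaptor(TaskOwner target) {
            this.target = target;
        }

        public void run() {
            target.processTask();
        }
    }

    // Stands in for the expression this->processTask() in the proposed syntax.
    Runnable asRunnable() {
        return new ProcessTaskAdaptor(this);
    }

    private void processTask() {
        history.append("processed");
    }

    String history() {
        return history.toString();
    }

    public static void main(String[] args) {
        TaskOwner owner = new TaskOwner();
        owner.asRunnable().run();
        System.out.println(owner.history()); // prints: processed
    }
}
```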

Obviously, there is a restriction that the method signature of processTask() must match the method signature of Runnable.run(). But that seems perfectly reasonable.

My one reservation is whether too much information has been lost with this. (The choice of Runnable is being inferred in the example.) One variation is to explicitly specify the API and method that is to be adapted to:

public void init() {
  executor.execute((Runnable->run()) this->processTask());
}
private void processTask() {
  // lots of code to run in the executor
}

Here in this variation, the 'cast' is specifying the method signature to be adapted to, again using a method literal (this time not tied to a specific instance). This is a more general syntax, as it also allows assignment to a variable:

public void init() {
  Runnable r = (Runnable->run()) this->processTask();
  executor.execute(r);
}
private void processTask() {
  // lots of code to run in the executor
}

The advantage of this variation is that it handles cases like MouseListener where there are many possible methods to adapt to. The disadvantage is that there is more code which could be inferred.

But which of the two variations works best? Or perhaps the first is just a special case of the second? And is a 'cast' the appropriate syntax here?

Summary

Method adaption could be a useful language change. It certainly doesn't compete with BGGA closures for raw power, but it does provide an alternative for many async use cases that could be viewed as being a lot clearer.

As always, please view this as a concept for discussion (and it's certainly not a new idea). I've not proven it can be implemented, but there's nothing obvious hitting me yet. As such, all opinions are welcome...

Friday 2 February 2007

Java language - Enum Switch

Today I want to look at the switch statement for enums and how it could be better validated.

enum switch

Let's consider a simple enum:

public enum Colour {
  RED, BLUE, GREEN
}

And how we might use it in a switch statement:

Colour col = ...
switch (col) {
  case RED:
    // do stuff with red
    break;
  case BLUE:
    // do stuff with blue
    break;
  case GREEN:
    // do stuff with green
    break;
}

But what happens if we add another colour, BLACK, to the enum?

public enum Colour {
  RED, BLUE, GREEN, BLACK
}

At this point our nice switch statement silently continues without doing anything. Which probably isn't what we want. So, we would need to search our entire application to find switch statements referencing Colour and update them.
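
The silent behaviour is easy to reproduce. In this minimal sketch the wrapper class, the describe method and the "unhandled" marker string are illustrative additions so that the result can be observed.

```java
// Illustrative only: a switch written before BLACK existed, run against the extended enum.
class EnumSwitchExample {

    enum Colour { RED, BLUE, GREEN, BLACK }

    static String describe(Colour col) {
        String result = "unhandled";
        switch (col) {
            case RED:
                result = "red";
                break;
            case BLUE:
                result = "blue";
                break;
            case GREEN:
                result = "green";
                break;
            // no case for BLACK and no default: the switch body is simply skipped
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe(Colour.GREEN)); // prints: green
        System.out.println(describe(Colour.BLACK)); // prints: unhandled
    }
}
```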

So, could we do better? Well, what if we could mark the switch statement as 'must contain all the values of the enum':

Colour col = ...
@CompleteEnumSwitch
switch (col) {
  case RED:
    // do stuff with red
    break;
  case BLUE:
    // do stuff with blue
    break;
  case GREEN:
    // do stuff with green
    break;
}

Now, with the new annotation, this piece of code won't compile (as BLACK isn't handled). We would need to either add a case for BLACK, or a default clause.
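
Until something like this exists, the usual workaround is a default clause that fails loudly. The sketch below shows that workaround in today's Java. It is not the proposed annotation; the class name and the choice of IllegalStateException are just one possible convention.

```java
// Illustrative only: a default clause that turns a forgotten constant into a runtime error.
class CompleteSwitchExample {

    enum Colour { RED, BLUE, GREEN, BLACK }

    static String describe(Colour col) {
        switch (col) {
            case RED:
                return "red";
            case BLUE:
                return "blue";
            case GREEN:
                return "green";
            default:
                // Reached for BLACK, or for any constant added to the enum later.
                throw new IllegalStateException("Unhandled colour: " + col);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(Colour.RED)); // prints: red
        try {
            describe(Colour.BLACK);
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage()); // prints: Unhandled colour: BLACK
        }
    }
}
```

The difference is that this fails at runtime, on the first BLACK value that turns up, whereas the annotation would catch the omission at compile time.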

Using an annotation here seems appropriate. It doesn't affect the bytecode, but does validate it. In other words, it's very similar to @Override.

There is a downside to the annotation approach - it's compile-time only. If BLACK is added to the enum without recompiling the switch statement, then there will be no error either at compile time or runtime. But this seems a reasonable balance for the feature.

There's another problem though - an annotation can't be added at this point in the code at present. However, JSR 308 is working on this right now, so hopefully this restriction will disappear.

Moving beyond @CompleteEnumSwitch, I could also envisage an @NoFallThroughSwitch which would make falling through not compile in switch statements.
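
For anyone unsure what fall-through looks like in practice, here is a minimal sketch of the behaviour such an annotation would reject. The class and method names are illustrative, and the missing break is deliberate.

```java
// Illustrative only: demonstrates the fall-through that a no-fall-through check would forbid.
class FallThroughExample {

    enum Colour { RED, BLUE, GREEN }

    static String visit(Colour col) {
        StringBuilder sb = new StringBuilder();
        switch (col) {
            case RED:
                sb.append("red ");
                // no break here, so execution falls through into the BLUE case
            case BLUE:
                sb.append("blue ");
                break;
            case GREEN:
                sb.append("green ");
                break;
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(visit(Colour.RED));  // prints: red blue
        System.out.println(visit(Colour.BLUE)); // prints: blue
    }
}
```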

Any thoughts on this concept?

BTW, thanks to Ricky Clarkson for talking about complete switches which triggered me to write this.