Tuesday, 30 January 2007

Announcing JSR 310 - Date and Time API

For too long we've suffered the Date and Calendar APIs in Java. No more, I say, No more!

I am pleased to be able to announce JSR 310 - Date and Time API. Here's the summary of what the JSR is aiming for:

This JSR will provide a new and improved date and time API for Java. The main goal is to build upon the lessons learned from the first two APIs (Date and Calendar) in Java SE, providing a more advanced and comprehensive model for date and time manipulation.

The new API will be targeted at all applications needing a data model for dates and times. This model will go beyond classes to replace Date and Calendar, to include representations of date without time, time without date, durations and intervals. This will raise the quality of application code. For example, instead of using an int to store a duration, and javadoc to describe it as being a number of days, the date and time model will provide a class defining it unambiguously.

The new API will also tackle related date and time issues. These include formatting and parsing, taking into account the ISO8601 standard and its implementations, such as XML. In addition, the areas of serialization and persistence will be considered.

The final goal of the new API is to be simple to use. The API will need to contain some powerful features, but these must not be allowed to obscure the standard use cases. Part of being easy to use includes interaction with the existing Date and Calendar classes, something that will be a key focus of the Expert Group.

Obviously, this whole JSR will be influenced by Joda-Time, the open source library that I've been working on for the last few years. However, the JSR will not be a simple adoption of Joda-Time. For a start, the Java 5 language features make a difference. And there are probably a few rough edges that can be smoothed out too. But overall, it's a very solid base to begin from.

The JSR will be run by myself and Michael Nascimento Santos. I also want to thank our extensive list of supporters (19!) without whom we wouldn't have got to this point.

The aim in running the JSR is to be as open and transparent as possible. There will be a java.net project with a mailing list available for anyone to sign up to. We also aim to have a publicly readable Subversion repository. Joda-Time worked well as open-source - why shouldn't the JSR?

So, if you're interested in helping us move Java beyond Date and Calendar, then please consider joining the public mailing list or Expert Group (via JCP). Alternatively, feel free to leave comments here.

Java language - dynamic instanceof

The instanceof keyword is well known and well understood. It certainly isn't the most OO part of Java, but it is one that the vast majority of programs will use somewhere. Is it possible that even this familiar operator could be improved?

instanceof

Let's consider a method that finds all the Strings in a mixed collection which might contain any kind of object:

  public Collection<String> extract(Collection<?> coll) {
    Collection<String> result = new ArrayList<String>();
    for (Object obj : coll) {
      if (obj instanceof String) {
        result.add((String) obj);
      }
    }
    return result;
  }

Nothing strange here. But what if we want to pass in the type to check for at runtime?

  public <T> Collection<T> extract(Collection<?> coll, Class<T> type) {
    Collection<T> result = new ArrayList<T>();
    for (Object obj : coll) {
      if (type.isInstance(obj)) {
        result.add((T) obj);
      }
    }
    return result;
  }

Suddenly, we can't use the instanceof operator. Instead we have to use the isInstance() method on the type itself. What would be neat would be a more dynamic instanceof operator:

  public <T> Collection<T> extract(Collection<?> coll, Class<T> type) {
    Collection<T> result = new ArrayList<T>();
    for (Object obj : coll) {
      if (obj instanceof type) {
        result.add((T) obj);
      }
    }
    return result;
  }

This code sample is using instanceof to check an object against a Class known only at runtime, rather than a hard-coded compile-time class literal. Of course, it's really just syntactic sugar for type.isInstance(obj), possibly with null-handling.
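To make the desugaring concrete, here is a runnable sketch of the workaround as it stands today, with the hypothetical null-handling written out as a helper method (the helper name is mine, not part of any proposal):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class DynamicInstanceof {
    // What `obj instanceof type` could desugar to: a null type is an
    // error, while a null obj simply fails the test (as instanceof does today).
    static boolean isInstance(Object obj, Class<?> type) {
        if (type == null) {
            throw new NullPointerException("type");
        }
        return type.isInstance(obj);  // false when obj is null
    }

    public static void main(String[] args) {
        List<Object> coll = new ArrayList<Object>();
        coll.add("a");
        coll.add(Integer.valueOf(1));
        coll.add(null);
        Collection<String> result = new ArrayList<String>();
        for (Object obj : coll) {
            if (isInstance(obj, String.class)) {
                result.add((String) obj);
            }
        }
        System.out.println(result);  // prints [a]
    }
}
```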

This strikes me as something that probably should have been in Java from the start, and it has minimal potential for damage if added now. However, it's also a pretty specialised operation: rarely used, and with a simple enough workaround.

Thus, I fear this is one for the 'nice idea, but not worth the effort' pile. Any opinions?

Saturday, 27 January 2007

Dot equals and the source keyword

A number of comments to yesterday's blog on a dot equals operator for overloading equals() objected to the syntax. As I tried to show, using the most logical syntax of == isn't possible in Java. Or is it?

Source keyword

Howard Lovatt has suggested using a source keyword to identify the version of the Java language that the source code is written in. This would appear before the package statement:

/**
 * Copyright...
 */
source 7;
package com.foo.bar;

public class Foo {
  // normal class stuff, but can use Java 7 syntax
}

The addition of this keyword could allow the radical change of redefining == to === and .equals() to == in Java 7 code. Files with 'source 7' would use the new equals operators, files with source 6 (or no source keyword) would use the old equals operators.

  source 6;  // keyword specifying Java version
  ...
  if (person.equals(input)) { ... }  // test by equals()
  if (person == input) { ... }       // test by equality
  source 7;  // keyword specifying Java version
  ...
  if (person == input) { ... }   // test by equals()
  if (person === input) { ... }  // test by equality

I'll be honest - this concept seems like a 'technically it would work, but it's a really bad idea' thing. Redefining == in Java, so that it means two different things depending on the source version, is just bound to cause trouble. That's why yesterday's blog is seeking a new operator to represent equals().

Anyway, I include this blog just to show what is possible with syntax change - not what I actually want.

Thursday, 25 January 2007

Java 7 - Dot Equals

Comparing two objects is one of the most common tasks developers do. Yet it's one where Java feels rather, well, verbose. I just find that those .equals() calls seem to be clutter rather than clarity. So what can we do?

Groovy equals

One of the key changes that Groovy made compared to Java was the alteration of the meaning of the == operator. Here's the change that was made:

  // Java
  if (person1.equals(person2)) { ... }

  // Groovy
  if (person1 == person2) { ... }

For those that don't know Groovy, these two extracts of code are identical at the bytecode level. Groovy simply makes the == operator call the equals() method. What about the not-equals example:

  // Java
  if (!person1.equals(person2)) { ... }
  // or
  if (person1.equals(person2) == false) { ... }

  // Groovy
  if (person1 != person2) { ... }

And when you actually want to check for equality?

  // Java
  if (person1 == person2) { ... }

  // Groovy
  if (person1 === person2) { ... }

So, Groovy uses two equals symbols, ==, to mean equals() and three equals symbols, ===, to mean equality.

Personally, I find the Groovy code using == to mean equals() to be a lot clearer. But, clearly Java cannot be improved in the same way. Redefining the meaning of the == operator would break lots and lots of existing code and be very confusing.
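To see why, here is a minimal example of the existing behaviour that any redefinition would break - in Java, == on references tests identity, not equality:

```java
public class EqualsDemo {
    public static void main(String[] args) {
        String a = new String("java");
        String b = new String("java");
        System.out.println(a.equals(b));  // true  - same character content
        System.out.println(a == b);       // false - two distinct objects
    }
}
```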

Dot equals

I do believe there is a possible alternative that fits Java-Style. The concept is to add a new operator to the language that maps to the equals() method:

  if (person1 .= person2) { ... }

So, dot-equals is a new operator in the language, consisting of a dot immediately followed by an equals symbol. This compiles to the following standard Java code:

  if (person1.equals(person2)) { ... }

The advantage of the dot-equals operator is that it retains the visibility of the dot. And the dot allows existing Java developers to link the operator to the actual method call going on in the background. The dot also provides a hint that an NPE could occur at that point.

However, it is also possible to eliminate NPEs completely from the dot-equals operator. To achieve this, the code would compile to something like this instead:

  if (System.nullSafeEquals(person1, person2)) { ... }

Given how many NPEs we suffer from, this seems like a Good Thing. But it does make the meaning of the dot-equals operator more complex. Opinions welcome on whether to eliminate NPEs or compile straight to equals().
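For clarity, here is a minimal sketch of what such a null-safe helper might look like (System.nullSafeEquals does not exist today; the method body below is my assumption of the obvious semantics):

```java
public class NullSafe {
    // Hypothetical helper the .= operator could compile to: two nulls
    // compare equal, one null compares unequal, otherwise defer to equals().
    static boolean nullSafeEquals(Object a, Object b) {
        return (a == b) || (a != null && a.equals(b));
    }

    public static void main(String[] args) {
        System.out.println(nullSafeEquals(null, null));  // true
        System.out.println(nullSafeEquals("x", null));   // false
        System.out.println(nullSafeEquals(null, "x"));   // false
        System.out.println(nullSafeEquals("x", "x"));    // true
    }
}
```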

Finally, what about the not equals concept? This is where things are not so pretty:

  // option A
  if (!person1 .= person2) { ... }

  // option B
  if (person1 !.= person2) { ... }

  // option C
  if (person1 .!= person2) { ... }

I have to say that option A misses the point for me - it's too easy to lose the ! symbol. Neither option B nor C is beautiful, but they are viable. If you've any other suggestions there, I'd love to hear them.

Summary

So, this is a proposal for a .= operator:

  // existing
  if (person1.getSurname().equals(person2.getSurname())) { ... }

  // proposed
  if (person1.getSurname() .= person2.getSurname()) { ... }

So, what am I missing? Do you love it or hate it? Is it clear and readable, or really obscure? Feedback welcomed of course.

Tuesday, 23 January 2007

Java 7 - Short declarations

Peter Ahe recently blogged about shortening the code needed for declaring variables, where he proposed static factory methods. I just wanted to note down my opinion on this area and why.

Short declarations

The problem to be solved is the excessive length of defining a variable, particularly as half the information is repeated. For example:

SomeVeryLongClassname foo = new SomeVeryLongClassname();

Why do we actually need to say 'SomeVeryLongClassname' twice, for example? The current proposals for changing this are:

// now
SomeVeryLongClassname foo = new SomeVeryLongClassname();

// option A
foo := new SomeVeryLongClassname();

// option B
final foo = new SomeVeryLongClassname();

// option C
var foo = new SomeVeryLongClassname();

// option D
SomeVeryLongClassname foo = SomeVeryLongClassname.new();

// option E
SomeVeryLongClassname foo = new ();

Now, let's repeat the options using the classic map example:

// now
Map<String, List<Integer>> foo = new HashMap<String, List<Integer>>();

// option A
foo := new HashMap<String, List<Integer>>();

// option B
final foo = new HashMap<String, List<Integer>>();

// option C
var foo = new HashMap<String, List<Integer>>();

// option D
Map<String, List<Integer>> foo = HashMap.new();

// option E
Map<String, List<Integer>> foo = new HashMap<>();

// option E2
Map<String, List<Integer>> foo = new HashMap();

And finally, let's view the syntax options when applied to initialising the variable from a method:

// now
Map<String, List<Integer>> foo = loadHashMap();

// option A
foo := loadHashMap();

// option B
final foo = loadHashMap();

// option C
var foo = loadHashMap();

// option D
Map<String, List<Integer>> foo = loadHashMap();

// option E
Map<String, List<Integer>> foo = loadHashMap();

// option E3
Map<> foo = loadHashMap();

I hope that this is useful in comparing the main options being proposed.

My view is that options A, B and C are really very similar. They differ only in the exact syntax used to perform the inference. If I were forced to choose one of those three, I would probably choose C. However, I believe that all three are inappropriate for Java.

Why? One of the key tenets of Java is static typing. I believe that this manifests itself in Java-Style as variables always being fully declared at the point of creation (ie. on the LHS). This allows the maintenance developer to easily scan up the file and find the type of the variable they are working with (Clarity, Intent and Readability). Options A, B and C do not meet this LHS declaration Java-Style rule.

A side effect of this is that the type of foo should be able to be declared to be a Map, and not a HashMap. Again, options A, B and C have no ability to do this.

So, I believe that options A, B and C are not acceptable if we are to stick with Java-Style in Java. I understand how dynamic language programmers may have no problems with these options and the kind of type-inference that they represent. I just don't believe that model is Java-Style.

Having ruled out A, B and C, we have D and E. Option D is only really of value in the map/list generics case. As such, it seems too specialist to me.

Option E has value in both the long class name and map/list generics cases. Dropping the RHS type doesn't clash with my proposal of defining the LHS as a key element of Java-Style. And without the RHS, object creation becomes significantly shorter.

(Option E2 is a really technical point to do with future reification of generics - not that important for this post.)

The downside of E is that it makes no difference when a factory or method is used on the RHS. But that's the inevitable result of defining the LHS as essential in Java-Style. This is tackled by E3, but again at the cost of losing LHS information, potentially breaking the Java-Style rule we've strived to create.

Summary

I believe that Java has a certain style, and it's not the same as scripting languages. Part of that style is that the LHS of a declaration completely specifies the type. This rules out the options discussed elsewhere (A, B and C) and leads naturally to my favourite, E (although I might be persuaded to accept E3).

As always feedback is welcomed. If possible could you mention if you regularly use a dynamic/type-inferring language, as I suspect views on this topic vary by how much people use, or have used, those languages.

Sunday, 21 January 2007

Closures - Control-invocation syntax

The BGGA closures proposal includes a control-invocation syntax for calling closures. This is viewed as a convenient way to call some closure-control-methods (APIs). Unfortunately, the use cases envisaged in the proposal seem too limited to me.

Control-invocation syntax

All BGGA closures can be called like this (the standard-invocation syntax):

eachEntry(map, {String key, Integer value =>
  // do stuff with key and value
});

Some BGGA closures can be also called like this (the control-invocation syntax):

eachEntry(String key, Integer value : map) {
  // do stuff with key and value
}

The former is the more flexible, as it allows multiple closure-block parameters to be passed to a single API if needed. It is also the syntax that is closest to the dynamic languages like Ruby or Groovy.

I have concerns with the standard-invocation syntax in Java, however. In particular, it is quite reminiscent of the inner class syntax, but with closures, return/break/continue may have a different meaning entirely. I also seem to always forget the part after the closing brace - ); - bracket semicolon.

Put simply, I believe that the control-invocation syntax fits stylistically into Java much better.

The current BGGA proposal places limitations on control-invocation syntax closures. They can't pass a result back to the closure-control-method. Nor can the closure-control-method pass a result back to the caller (I'm 99% sure of this). And they can only be used where the closure-control-method receives one closure. This is explained in BGGA thus:

This is not an expression form for a very good reason: it looks like a statement, and we expect it to be used most commonly as a statement for the purpose of writing APIs that abstract patterns of control. If it were an expression form, an invocation like this would require a trailing semicolon after the close curly brace of a controlled block. Forgetting the semicolon would probably be a common source of error.

So how to address these concerns? Well, here is a simple piece of code using the standard-invocation syntax:

  List<String> matches = eachMatching(list, {String str =>
    str != null
  });
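For comparison, here is roughly what eachMatching itself might look like if written in today's Java, with the closure simulated by an anonymous inner class (the Matcher interface here is an illustrative stand-in for the BGGA function type, not part of the proposal):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EachMatchingDemo {
    // Stand-in for the function type {String => boolean}.
    interface Matcher<T> {
        boolean matches(T item);
    }

    // The closure-control-method: collects the items the block accepts.
    static <T> List<T> eachMatching(List<T> list, Matcher<T> matcher) {
        List<T> result = new ArrayList<T>();
        for (T item : list) {
            if (matcher.matches(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", null, "b");
        List<String> matches = eachMatching(list, new Matcher<String>() {
            public boolean matches(String str) {
                return str != null;
            }
        });
        System.out.println(matches);  // prints [a, b]
    }
}
```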

So, to change to the control-invocation syntax we have to change the BGGA rules. Firstly, let's allow the closure-block to pass a result back to the closure-control-method using =>, as I talked about before. Since we've only changed one rule, we still can't return the matches directly.

  List<String> matches = new ArrayList<String>();
  eachMatching(String str : list, matches) {
    => str != null;
  }

Secondly, let's allow the closure-control-method to return a result, but without being an expression, and without requiring the semicolon.

  List<String> matches = eachMatching(String str : list) {
    => str != null;
  }

The eachMatching result can only be used to assign to a variable. You cannot use the dot operator to call a method such as size() on the block. (Formally, this is now a new form in the language, neither a statement nor an expression, but a restricted type of statement where the {} block has taken the place of the semicolon.)

Now, taking this on to the final level we could add in the 'for' keyword to identify that a closure is occurring, to complete the picture.

  List<String> matches = for eachMatching(String str : list) {
    => str != null;
  }

Well, some of you probably hate this last snippet of code, but I hope some might also see beauty there. It is a mix of keywords and statements in a way that definitely isn't quite like the Java of today. And yet I think it's also clearly derived from today's syntax. What I do hope, however, is that you can understand it from a simple glance.

As usual, let me know your feelings on this syntax direction for closures.

Closures - Last-line-No-SemiColon nastiness

The BGGA closures proposal currently plans to use a 'last-line-no-semicolon' rule to return the result of the closure back to the closure-defining method. I personally find this rather nasty. What about an alternative?

Last-line-no-semicolon

Here is some code that traverses a complex object structure to find all the bookings that contain a specific purchase type. (Yes, I know it's not OO - a lot of real-world code isn't - but OO isn't the point of this blog post.)

List<Booking> matches = new ArrayList<Booking>();
for (Booking booking : bookingList) {
  for (Purchase purchase : booking.getPurchaseList()) {
    if (purchase.matchesType(typeToFind)) {
      matches.add(booking);
      break;
    }
  }
}

And here is the code refactored to use a BGGA closure, 'findAll', using the current rules:

List<Booking> matches = findAll(Booking booking : bookingList) {
  boolean matched = false;
  for (Purchase purchase : booking.getPurchaseList()) {
    if (purchase.matchesType(typeToFind)) {
      matched = true;
      break;
    }
  }
  matched        // passes result to findAll, last-line-no-semicolon
}

So, adding the closure has made things worse. We had to introduce an artificial matched variable, and the result was returned on the last line without a semicolon, which is highly error prone and very un-Java-like.

In my prior blog post on this topic I suggested one possibility was using the goto keyword to help. But the truth is that's just horrible. So, I'll propose an alternative which I believe fits BGGA much better:

List<Booking> matches = findAll(Booking booking : bookingList) {
  for (Purchase purchase : booking.getPurchaseList()) {
    if (purchase.matchesType(typeToFind)) {
      => true;        // passes a result of true to findAll
    }
  }
  => false;           // passes a result of false to findAll
}

So, this proposal is to simply use the => symbol to act as the result syntax within a closure. The true/false value is returned to the closure method which then decides whether to add the booking to the overall result or not. This fits the other use of => in the function type declaration, where it means 'returning'.

So, the overall effect is that a result can be returned from any part of the closure block, and there are no missing semicolons or weird last line effects.

Nested closures

Neal has previously raised a problem with this approach. He points out that the use of something like this means that refactoring the method further to introduce a second closure may cause problems. Here we change the inner for loop to the 'eachNotNull' closure:

List<Booking> matches = findAll(Booking booking : bookingList) {
  eachNotNull(Purchase purchase : booking.getPurchaseList()) {
    if (purchase.matchesType(typeToFind)) {
      => true;        // now within two closures! help!
    }
  }
  => false;           // passes result to findAll of false
}

As shown, the '=> true' is now within two closures, 'findAll' and 'eachNotNull'. So what does the => mean? Well, this actually isn't a new concept in Java - break and continue currently refer to the closest loop, and so are vulnerable to refactorings that introduce a new loop level. As such, I suspect that this isn't as big a deal as Neal thinks.

Option A for this situation is that the => operator returns the result to the closest surrounding closure that accepts that type of result. In this case, the 'eachNotNull' closure has a 'Void' result, so the boolean value 'true' is still passed back to the 'findAll' closure. Thus, adding the 'eachNotNull' closure had no effect on the logic.

Option B is that any result within a second closure has to be labelled to reach its destination. This is the same rule as break/continue from a loop within a loop in current Java.

outer:   // label
List<Booking> matches = findAll(Booking booking : bookingList) {
  eachNotNull(Purchase purchase : booking.getPurchaseList()) {
    if (purchase.matchesType(typeToFind)) {
      => outer: true;    // labelled result now reaches findAll
    }
  }
  => false;              // no need to label, result goes to closest closure, findAll
}

Here, the closure is labelled 'outer' in exactly the same way as in current Java. The => result then references the label to return the result to the correct closure.

So which do you prefer? Option A (auto-seek best first matching result) or option B (explicit labelling)? I suspect that option A is too vague and complex for Java, so I'm leaning towards option B.

Fully refactored

By the way, if the code was refactored a little more, this is what you might end up with:

List<Booking> matches = findAll(Booking booking : bookingList) {
  => booking.matchesType(typeToFind);
}

To my eyes, this is the most attractive form, making it clear that a result is being returned from the closure block, and retaining the standard Java semicolon.
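Under the covers this form is not magic: the => result simply becomes a return from the method the block would compile to. Here is a rough equivalent in today's Java (the Booking and Predicate types below are illustrative stand-ins, not the real domain classes):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FindAllDemo {
    // Illustrative domain class standing in for Booking.
    static class Booking {
        final String type;
        Booking(String type) { this.type = type; }
        boolean matchesType(String t) { return type.equals(t); }
    }

    // Stand-in for the closure block; => becomes a plain return.
    interface Predicate<T> {
        boolean test(T item);
    }

    // The closure-control-method: keeps the items the block accepts.
    static <T> List<T> findAll(List<T> list, Predicate<T> pred) {
        List<T> matches = new ArrayList<T>();
        for (T item : list) {
            if (pred.test(item)) {
                matches.add(item);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        final String typeToFind = "flight";
        List<Booking> bookingList = Arrays.asList(
                new Booking("flight"), new Booking("hotel"));
        List<Booking> matches = findAll(bookingList, new Predicate<Booking>() {
            public boolean test(Booking booking) {
                return booking.matchesType(typeToFind);  // the => line
            }
        });
        System.out.println(matches.size());  // prints 1
    }
}
```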

Summary

I'm determined to excise the evil 'last-line-no-semicolon' concept from closures - it's a seriously nasty hack. Let me know your thoughts on it!

Thursday, 18 January 2007

Are you scared of Java language change? Why?

Are you scared of Java language change? Why? I'm going to try and debunk some of the arguments against change, and express some frustration with Java as-is.

"It was designed simple"

There is a commonly held view that Java was designed to be this simple, perfect language for all known tasks. That's not how I read history.

If we go back and re-read about the creation of Java we see some interesting points. These articles cover how Java (Oak) was written for set-top boxes, a market that didn't work out. Thus, the plan changed to focus on Applets on the new internet. Then the internet exploded, Java caught the wave and the rest is history.

Yes folks, Java was originally designed for set-top boxes and applets. Yet today, it is probably the most widely used enterprise language, and applets are dead. Since the fundamental use-case has changed, why shouldn't the language?

My point is that those who claim Java's 'simplicity' was its reason for success are wrong. I contend Java just got lucky.

Still not convinced? Try reading this extract from Patrick Naughton's Long strange trip to Java (via Artima):

[Bill Joy] was often comparing Oak to more complicated and elegant languages like Python and Beta. He would often go on at length about how great Oak would be if he could only add closures and continuations and parameterized types. While we all agreed these were very cool language features, we were all kind of hoping to finish this language in our lifetimes and get on to creating cool applications with it. The more we argued with Bill about making those changes the more strongly he would fight us. After a while it became a choice between not having Bill involved at all or losing control of the language. James and I got into a rather epic battle with Bill in his office in Aspen one evening about this issue. He started out by insulting both of us about how poorly Oak compared to better languages and then he volunteered to resign from being the Live Oak architect if we wanted him to. James and I agreed that would be best and walked out to go across the street to watch "Speed". What a rush.

The next day, Bill was pretty much his normal old nice guy self again, a little relieved, I think, to be out of the role of being directly responsible for our destiny. Bill is annoyingly smart, and I wouldn't put it past him to have planned that whole scenario the night before to force us to defend our positions and take ownership of our project. The interesting thing is that we were right about needing to finish the language even though it had missing features. It was a timing issue, there was only about a three month window in which the whole Java phenomenon could have happened. We barely made it. It is also interesting that Bill was absolutely right about what Java needs long term. When I go look at the list of things he wanted to add back then, I want them all. He was right, he usually is.

Java succeeded because it hit that critical time-window of being in the right time at the right place. But to achieve it, compromises were made. In particular, lots of language features were dropped - assertions, closures, enums, generics (sound familiar?). By all accounts, they weren't dropped to keep the language 'simple', so much as because the timeline dictated it.

Thus Java's so-called simplicity is a fallacy. Language changes now are simply completing the job that was unfinished back then and meeting the realities of Java as an enterprise language.

"Nothing beyond Java"

Some in the Java community seem to have become zealots, extremely passionate about the language, and vehemently rejecting all change. This may stem from the battles between Sun and Microsoft, where some became religiously committed to Java. This has left them unwilling to look over the fence at other programming languages, with a belief that anything from outside the Java ecosystem must inherently be bad.

I reject that view. Other programming languages do exist. Each has its plus points, and each its negatives. But they can teach us what works and what doesn't - assuming that we are willing to look Beyond Java and learn.

"Just look at generics"

Java 5 introduced generics amongst many other items. Unfortunately, generics are probably the most troublesome change that has been made to Java. With over 400 pages in the official generics FAQ attempting to explain weird corner cases, we know something went wrong.

The negative take on this is that we shouldn't change Java ever again because 'we might get another generics'. I believe that is a very reactionary point of view. So long as any change is well specified, and avoids weird corner cases, it should be fine. And the Java community should be testing that and enforcing it.

"Code isn't important"

Another group takes the view that, in the big picture, code and syntax aren't important. Instead, the focus should be on process, teamwork, risk and testing.

The problem with this is two-fold. Firstly, these issues about process affect you whichever language you use, so they are a non-argument when discussing language changes.

Secondly, they fail to take into account that fewer lines of code actually do matter. Fewer LOCs mean fewer lines to test. Fewer if clauses mean fewer places for things to go wrong, and fewer tests. Abstracted loops mean more predictable code, and fewer tests. More errors caught at compile time mean fewer problems in production and fewer tests. And fewer LOCs apparently also mean more secure systems (the number of bugs being proportional to LOCs).

You could misread this last paragraph to suggest I believe LOCs is the only measure of language change - I certainly don't believe that (auto-boxing being the classic example of getting it wrong by obscuring intent). But it does turn out to be far more important than is often thought.

"Junior developers won't understand this"

The issue with training and language change seems overblown. Developers are not stupid - even the junior ones or so-called Java-Joes. Every day they interact with multiple frameworks consisting of hundreds of classes which are way more complicated than the language changes being talked about.

In fact, most language changes actually try to simplify that interaction. By abstracting out common patterns found in many frameworks into the language, knowledge from one framework becomes transferable to another framework and the overall interaction becomes simpler and often more compile-safe.

What if we don't change

An increasing number of developers are, over time, seeing life beyond the high walls of Java - most notably through exposure to Ruby or Groovy. When these developers come back to Java to write code, they tend to find it very frustrating. I think this quote from Ray Cromwell on Javalobby expresses the frustrations well:

I've been programming in Java for almost 12 years now, and I am pretty close to abandoning the language. If it hadn't been for IntelliJ IDEA, I think I would have dropped it in 2002. I really feel like I'm killing brain cells sometimes by generating the same code over and over. I program in a language that I would actually call "JetBrains Java" since I find I must drop into Live Templates often to refactor and reuse code that otherwise can't be abstracted due to Java's limited expressiveness.

Every Java IDE, every Java framework, every JSR and ultimately every Java developer is finding a way to work around issues in the language, many of which could be solved by relatively small, directed language changes.

Summary

Java is still the enterprise choice. But more and more architects are seeing life beyond Java. They're realising that coding something in half the code or less is actually a good thing, and that the result is code that is far clearer and more understandable than the equivalent Java code. And it often takes less time too.

Java will never be Ruby or Groovy, but it does need to learn some of their lessons. The process of language change really doesn't need to be scary - and, if done well, the upsides will far outweigh any downsides.

Tuesday, 16 January 2007

Property access - is it an expression language?

There is lots of talk about a simpler syntax for getting and setting properties. But what are the implications?

Property value access

The basic concept is simple. The proposers of this idea (not me!!!) use a simple get or set example:

  // current code
  person.setSurname( input.getName() );

  // proposed change - not from me!
  person.surname = input.name;

As a simple example, this works fine. It's effectively an operator overload for get/set methods. I don't think it would be disastrous if implemented - just a little weak and incomplete.

First, some specific issues with it. Problem one is that the syntax clashes with public fields on any existing class. The compiler will have to have specific rules to decide whether to pick a field or a property. Or alternatively, you can't declare both with the same name. Or a syntax other than dot could be used (which has other downsides).

Problem two is that the code is no longer 'transparent' with respect to what it actually does. A basic field assignment statement doesn't fail (yes, yes, I know it can, but 99.9% of the time it doesn't). With this change, what looks like an innocuous field assignment is actually a method call, which may throw an exception.

Problem three is related to two - the get and set methods could end up being remote operations performed across a network. Obviously, a remote operation may have performance and scalability issues, and should be thought about a bit more. This is very hidden from the calling code (although it could be argued that a get/set method hides it just as much).

Problem four is null handling. Consider a more complex example:

  address.postcode = input.address.postcode;

What happens if input or address is null? Do we care? Or do we just want to assign null to address.postcode? This is related to the issue I tackled in null-ignore invocation.

Problem five is scope - the proposal only covers basic get/set. But real applications use Lists and Maps. And they need indexed and mapped access too.

  List<Person> people = ...
  Person person = people[0];

  Map<String, Integer> numberToBook = ...
  Integer adultsToBook = numberToBook["Adult"];

  Requirements reqs = ...
  Integer adultsToBook = reqs.flight[0].numberToBook["Adult"];

Suddenly, the scope of the change has grown dramatically to cover List.get(int), Map.get(K) and Map.put(K, V).
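For comparison, the desugared equivalents under today's API are plain method calls (the sample values here are invented for illustration):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IndexedAccessDemo {
    public static void main(String[] args) {
        List<String> people = Arrays.asList("Alice", "Bob");
        String person = people.get(0);                      // people[0] -> List.get(int)

        Map<String, Integer> numberToBook = new HashMap<String, Integer>();
        numberToBook.put("Adult", 2);                       // numberToBook["Adult"] = 2 -> Map.put(K, V)
        Integer adultsToBook = numberToBook.get("Adult");   // numberToBook["Adult"] -> Map.get(K)

        System.out.println(person + " " + adultsToBook);    // Alice 2
    }
}
```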

A full solution

So, we need a different syntax, to meet these goals:

  • no conflict with existing Java syntax
  • is isolated enough that we can be aware that an error might occur on assignment
  • can handle nulls along the way, optionally ignoring them
  • can handle indexed and mapped access

To me, this more complete set of requirements is a long way from a simple operator overload for get and set methods. These goals remind me of JSP/JSF EL. Or OGNL. These are expression languages where there is a non-Java syntax dedicated to accessing and manipulating bean properties.

Now, non-Java languages, such as Groovy, don't have a problem with this. They can design their language around the need for property and path access.

But, making the change in Java is a big deal - it's almost like adding a new sub-language within the Java language. The question is whether Java can take that level of change.

A separate sub-language

So perhaps we should be honest and admit that Java isn't suited for defining these expressions. Instead, embed a new expression language, delimited by special tokens:

  String email = ${contact.email};

  ${contact.email = input.email};

  ${address.postcode = input#address#postcode};

  List<Person> people = ...
  Person person = ${people[0]};

  Map<String, Integer> numberToBook = ...
  Integer adultsToBook = ${numberToBook["Adult"]};

  Requirements reqs = ...
  Integer adultsToBook = ${reqs.flight[0].numberToBook["Adult"]};

This seems to meet the goals in a much cleaner fashion. The expression is clearly delineated, using ${...}, to indicate it follows different syntax rules. It can interact fully with all the enclosing variables. In fact it would be compiled to bytecode so the resulting code would be indistinguishable. Nulls can be ignored, here with a # syntax. Lists and maps are dealt with using a syntax that is familiar from other ELs.

It does rather feel like the additional ${} characters get in the way though.

Summary

Groovy has shown that you can embed a full property access syntax, GPath, within a syntax like Java. But is it viable to change Java by that degree? The expression-language concept allows a new syntax to be used in Java that doesn't clash with existing assumptions in the language.

I'm not sure I'm sold on the concept though - for a start it just seems to look a bit ugly. All I hope to do is to consider some of the issues involved, and hopefully trigger other ideas elsewhere. As usual, your thoughts are welcomed.

Monday, 15 January 2007

A 'bean' keyword?

In the talk about properties, I don't think I've seen anyone mention using a 'bean' keyword. Here's a thought experiment to see how it might work.

Bean keyword

  public bean Person {
    // properties
    property String forename;
    property String surname;

    // normal fields and methods
    private String ordinaryField;
    public boolean validate() { ... }
  }

This would be compiled to:

  public class Person implements Bean {
    // properties
    private String forename;
    private String surname;
    public String getForename() { ... }
    public void setForename(String forename) { ... }
    public String getSurname() { ... }
    public void setSurname(String surname) { ... }

    // normal fields and methods
    private String ordinaryField;
    public boolean validate() { ... }

    // from Bean interface
    public MetaBean metaBean() { ... }
    public Property property(String propertyName) { ... }
    public Map<String, Property> propertyMap() { ... }
  }

So what have we got? Firstly, neither the 'bean' nor the 'property' keyword would break existing code. Keyword 'bean' would be a contextual keyword, used in place of 'class'. Keyword 'property' is also contextual, in that it can only appear inside a bean.

The 'bean' keyword acts like the 'enum' keyword. It causes the generated class to implement a JDK interface - Bean. Code is generated to implement the interface allowing dynamic access to the property objects of the bean. The keyword also triggers the code generation of the get/set methods.

The trouble is that the 'bean' keyword hasn't really changed the nature of the problem. There are still two basic types of bean (simple semi-encapsulated classes, often server-side data transfer objects, and rich event-firing classes, usually client-side). The keyword hasn't helped us define the scope of the property, readonly/writeonly, bound/constrained, validation, etc.

Abandoning the past

What if we leave existing bean conventions in the past? Well, then we might have something that makes more sense. What if the 'bean' keyword generated this code instead:

  public class Person implements Bean {
    // public final properties
    public final Property<Person,String> forename = ...
    public final Property<Person,String> surname = ...

    // normal fields and methods
    private String ordinaryField;
    public boolean validate() { ... }

    // from Bean interface
    public MetaBean<Person> metaBean() { ... }
    public Property<Person,?> property(String propertyName) { ... }
    public Map<String, Property<Person,?>> propertyMap() { ... }
  }

Now, each property is actually no more than a public final field on the bean. But code using this generated structure is very different from at present:

  String surname = person.surname.get();
  person.surname.set( person.surname.get().toUpperCase() );
  person.surname.addListener(someListener);   // client-side only?

So, application code has to use person.surname.get() instead of person.getSurname() to access data. These are true property objects - a much better design than a get/set method naming convention.
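A minimal sketch of what such a property object might look like. The Property class here is a hypothetical stand-in for the generated implementation (listener support and metadata omitted; all names are illustrative):

```java
public class PropertyObjectDemo {
    // Hypothetical stand-in for the generated property object; the real
    // generated code would add listener firing and meta information.
    public static class Property<B, T> {
        private final B bean;
        private final String name;
        private T value;

        public Property(B bean, String name) {
            this.bean = bean;
            this.name = name;
        }

        public T get() { return value; }
        public void set(T value) { this.value = value; }  // a bound variant would fire an event here
        public String name() { return name; }
    }

    public static class Person {
        // each property is just a public final field holding a property object
        public final Property<Person, String> forename = new Property<Person, String>(this, "forename");
        public final Property<Person, String> surname = new Property<Person, String>(this, "surname");
    }

    public static void main(String[] args) {
        Person person = new Person();
        person.surname.set("smith");
        person.surname.set(person.surname.get().toUpperCase());
        System.out.println(person.surname.get());  // SMITH
    }
}
```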

Is this viable? Yes, although performance would need careful analysis. The beans Introspector could probably cope with the new structure, so most existing frameworks would still work.

But is it realistic? Well, I can hear the howls of protest now, so probably not. The change could be sweetened by generating regular get/set methods on the Person class itself that delegate to the property object. But if this is truly a new beans feature in the language then why look to the past?

Summary

On its own, a 'bean' keyword doesn't look that useful. The main problem is still in defining the properties. The main use I could think of was using it to identify a completely new type of JavaBean, following a specification completely different to JavaBean spec 1.0. And that seems unrealistic.

Friday, 12 January 2007

More detail on Property Literals / Property Objects

A little more clarity on my property literal proposal (as it seems to have become named).

The aim is to provide compiler checked access to properties, simplifying the relationship between applications and frameworks. Currently, we write (where person is the bean being worked on):

  binder.bind( person, "surname" );

Property literal syntax replaces this:

  binder.bind( person->surname );

Thus, the arrow operator on a bean returns an object which refers to a specific property on an actual bean instance.

Property literal syntax also allows you to bind to the meta-property, which is the property without a reference to any particular bean instance (where Person is the class name):

  binder.bindMeta( Person->surname );

Finally, there is the possibility of using the same syntax for references to methods:

  binder.bind( person->validate() );
  binder.bindMeta( Person->validate() );

The property can be used by the framework to access the bean in one of these ways (depending on whether you have a property or meta-property - ie. one that is bound or unbound to a specific bean):

  String value = property.get();
  String value = metaProperty.get(beanInstance);

The real question is how to connect these properties to the bean itself. There are two basic ways:

The first is the classical approach, where the PropertyImpl uses reflection to access the get/set methods. In fact, PropertyImpl doesn't even appear in the bean code at all, as it is just generated by the -> operator.

  public class Person {  // ie. this is a 'normal' bean
    private String surname;
    public String getSurname() { return surname; }
    public void setSurname(String surname) {  this.surname = surname;  }
  }

So this bean is nothing special at all, all the magic is in the reflection in PropertyImpl.
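A sketch of how the reflective PropertyImpl might work (illustrative only; a real implementation would cache the Method lookups and handle errors less crudely):

```java
import java.lang.reflect.Method;

public class ReflectiveDemo {
    // the 'normal' bean from above
    public static class Person {
        private String surname;
        public String getSurname() { return surname; }
        public void setSurname(String surname) { this.surname = surname; }
    }

    // Sketch of the reflective property object created by the -> operator.
    public static class PropertyImpl<B, T> {
        private final B bean;
        private final String name;

        public PropertyImpl(B bean, String name) {
            this.bean = bean;
            this.name = name;
        }

        private String suffix() {
            return Character.toUpperCase(name.charAt(0)) + name.substring(1);
        }

        @SuppressWarnings("unchecked")
        public T get() {
            try {
                // reflects on the standard getXxx() naming convention
                Method getter = bean.getClass().getMethod("get" + suffix());
                return (T) getter.invoke(bean);
            } catch (Exception ex) {
                throw new RuntimeException(ex);
            }
        }

        public void set(T value) {
            try {
                // finds the matching one-argument setXxx(...) method
                for (Method m : bean.getClass().getMethods()) {
                    if (m.getName().equals("set" + suffix()) && m.getParameterTypes().length == 1) {
                        m.invoke(bean, value);
                        return;
                    }
                }
                throw new NoSuchMethodException("set" + suffix());
            } catch (Exception ex) {
                throw new RuntimeException(ex);
            }
        }
    }

    public static void main(String[] args) {
        Person person = new Person();
        PropertyImpl<Person, String> surname = new PropertyImpl<Person, String>(person, "surname");
        surname.set("Smith");
        System.out.println(surname.get());  // Smith
    }
}
```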

The second approach is much more powerful and OO, but uses more memory.

  public class Person {
    private Property<Person, String> surname =
        new PropertyImpl<Person, String>(this, "surname");

    public String getSurname() {
      return surname.get();
    }
    public void setSurname(String surname) {
      this.surname.set(surname);
    }
  }

With this structure, the class PropertyImpl can be varied according to what kind of property it is. Thus, if you need a bound property then you simply use BoundPropertyImpl instead.

The classical approach uses less memory, is more proven, and is probably better for the server-side where memory and scalability matter. The second approach requires one object for each property, and probably works better for swing GUIs, which require lots of events to fly about.

For reference, the system at my day-job uses property objects. As we don't have property literals and the arrow operator, we just code-generate lots of methods instead. It has a similar effect, except that only frameworks written by us know about the property objects. As a result, we currently have proprietary frameworks. It would be great if Java could unite around property literals (and method literals?) for interacting between objects, and thus gain the benefits we see in this coding style.

Thursday, 11 January 2007

Java 7 - Null-safe types

Yesterday I published part of a proposal for enhancing null handling in Java 7. Reading the comments, this got a few people upset. Unfortunately, what I didn't make clear was that I was also documenting the other part - null-safe types for Java.

The null-safe types part of the proposal is the tougher one to write up, and probably the tougher one to implement. I was going to try and finish the details before publishing, but let's get the discussion going anyway...

Null-safe types are similar to Option Types in Nice. The difficulty is taking the idea and blending it with Java in a way that works and doesn't break the style of Java.

So, what is the null-safe types proposal? Well, let's examine this method:

  public Person lookupPerson(String personId) { ... }

This is a fairly typical method that looks up a person from a person identifier. But what happens if we pass a null identifier into the method? And can we get a null response? The current solution to this is adding explicit documentation:

  /**
   * Find person by identifier.
   * @param personId  the person id, not null
   * @return the person, never null
   * @throws IllegalArgumentException if the person is not found
   */
  public Person lookupPerson(String personId) { ... }

Now we know how the method is to be used. But this is just documentation. There is no compile-time checking. If we don't read the documentation then we may get a NPE at runtime.
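Until then, the closest available approximation is an explicit runtime check that at least fails fast at the method boundary. A minimal sketch (the notNull helper and the stub lookup body are invented for illustration):

```java
public class NullChecks {
    static class Person {}

    // hypothetical helper - turns a latent NPE into an immediate,
    // well-labelled failure at the method boundary
    static <T> T notNull(T value, String name) {
        if (value == null) {
            throw new NullPointerException(name + " must not be null");
        }
        return value;
    }

    public static Person lookupPerson(String personId) {
        notNull(personId, "personId");
        // ... real lookup elided; the contract says never return null
        return new Person();
    }

    public static void main(String[] args) {
        System.out.println(lookupPerson("1234") != null);  // true
        try {
            lookupPerson(null);
        } catch (NullPointerException ex) {
            System.out.println("caught: " + ex.getMessage());
        }
    }
}
```

This still only fails at runtime, of course - the whole point of the proposal is to move exactly this class of failure to compile time.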

The null-safe types proposal adds a new modifier to types - hash # - that prevents that variable/parameter/return from being null.

  public #Person lookupPerson(#String personId) { ... }

Now that we are using these declarations, the compiler can check that the calling code does not pass a null String to the method.

  String badId = null;
  #Person person = lookupPerson(badId);   // compile error

  #String id = "1234";
  #Person person = lookupPerson(id);      // ok

The draft of the formal null-safe types proposal is available with more details. And there are a lot more details, and probably a lot I've missed. If you've got any suggested enhancements or fixes to the proposal, or want to help out, let me know. All feedback welcomed.

And remember that this proposal is intended to complement the null-ignore part of the proposal.

Wednesday, 10 January 2007

Java 7 - Null-ignore invocation

Have you ever had an NPE in production? Or felt that handling null is a right pain? Well perhaps this Java 7 language change proposal - Null-ignore invocation - will interest you.

The concept is to provide the developer with a tool for handling null that will reduce NPEs and improve code clarity. To explain this, let's consider this method:

  public String getPostcode(Person person) {
    return person.getAddress().getPostcode();
  }

The problem is that this code may throw an NPE if person or getAddress() is null. Here is the 'correct' way to code this:

  public String getPostcode(Person person) {
    if (person != null) {
      Address address = person.getAddress();
      if (address != null) {
        return address.getPostcode();
      }
    }
    return null;
  }

The problem with the 'correct' solution is that the intent of the code has been lost. The null-checks are really getting in the way of reading and understanding the method.

This proposal aims to tackle this by adding compiler syntax sugar to generate the null pointer checks for developers. It achieves this by adding an alternative to the dot for method and field invocation: the hash - #. Using the hash invokes a method or field in the same way as using the dot, but with one change. If the expression on the left hand side of the hash is null, then the result of the expression is null. If the expression type is a primitive then zero or false is returned instead of null. If the expression is void, then no action occurs.

  public String getPostcode(Person person) {
    return person#getAddress()#getPostcode();
  }

This code is now null-safe and won't throw an NPE. (Of course the internals of getAddress() or getPostcode() might throw NPE.) In addition the code is now back to its former clarity - you can now read it and clearly see the intent without any visual clutter.
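Stripped of sugar, the hash invocation would expand to roughly the following (the stub Person and Address classes are added so the sketch is self-contained):

```java
public class NullIgnoreDemo {
    static class Address {
        private final String postcode;
        Address(String postcode) { this.postcode = postcode; }
        public String getPostcode() { return postcode; }
    }

    static class Person {
        private final Address address;
        Person(Address address) { this.address = address; }
        public Address getAddress() { return address; }
    }

    // roughly what person#getAddress()#getPostcode() would desugar to
    public static String getPostcode(Person person) {
        Address address = (person == null ? null : person.getAddress());
        return (address == null ? null : address.getPostcode());
    }

    public static void main(String[] args) {
        System.out.println(getPostcode(null));                               // null
        System.out.println(getPostcode(new Person(null)));                   // null
        System.out.println(getPostcode(new Person(new Address("AB1 2CD")))); // AB1 2CD
    }
}
```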

I know that some reading this will cry 'Null object design pattern'. Well maybe that's fine in your closed codebase, where you can enforce its use with code reviews and so on. But as soon as your code deals with an external library, or even the JDK, then your code has to handle null. Which leads you right back to this proposal.

If you want to read the full details, I've written up the formal Null-ignore invocation proposal. If you've got any suggested enhancements or fixes to the proposal, let me know. All feedback welcomed.

Monday, 8 January 2007

Closures - use case - asynchronous

This is the fifth in a series of 'use cases' for the closures proposal. This time, asynchronous callbacks.

Concurrent Executor

  Executor ex = ...
  ex.execute() {    // closure
    // invoked in another thread asynchronously
  }

This code takes a Java 5 Executor and executes the behaviour asynchronously. This is clearly shorter than the current syntax:

  Executor ex = ...
  ex.execute(new Runnable() {    // inner class
    public void run() {
      // invoked in another thread asynchronously
    }
  });

It may be shorter, but is it clearer overall? Unfortunately, problems happen when you use break, continue or return. The following is a more complex asynchronous inner class:

public void process() {
  final AddressService service = ...
  final Person person = ...
  Executor ex = ...
  ex.execute(new Runnable() {    // inner class
    public void run() {
      for (String addressId : person.getAddressIdList()) {
        Address address = service.getAddress(addressId);
        if (address == null) {
          return;
        }
        person.addAddress(addressId, address);
      }
      service.setRetrievedAddresses(true);
    }
  });
}

(OK, so this is probably stupid code - that's not the point here...). Now let's do a 'dumb' conversion to closures, following the rules we have learnt in previous blogs ("a closure works just like a synchronized block"):

public void process() {
  AddressService service = ...
  Person person = ...
  Executor ex = ...
  ex.execute() {  // closure
    for (String addressId : person.getAddressIdList()) {
      Address address = service.getAddress(addressId);
      if (address == null) {
        return;
      }
      person.addAddress(addressId, address);
    }
    service.setRetrievedAddresses(true);
  }
}

Unfortunately, one of two things will happen here. This will either fail to compile at the return statement or it will throw an UnmatchedNonLocalTransfer exception at runtime, also at the return statement. (The former happens if Runnable implements RestrictedClosure, the latter happens if it doesn't). In addition, the compiler or runtime will complain about the service and person variables being non-final.

Why is this an error? Because the 'return' keyword in a closure has been defined to mean 'return from the surrounding method' - the process() method in this case. But, this closure is being executed asynchronously, at a later point in time, on a different thread. Thus the process() method no longer exists and cannot be returned from. In other words, all the old inner class restrictions are still present. It's just that because the syntax changed, we were misled into thinking the restrictions had been removed.

How can this be solved? Well, one solution is refactor the closure method:

public void process() {
  final AddressService service = ...
  final Person person = ...
  Executor ex = ...
  ex.execute() {  // closure
    boolean success = true;
    for (String addressId : person.getAddressIdList()) {
      Address address = service.getAddress(addressId);
      if (address == null) {
        success = false;
        break;
      }
      person.addAddress(addressId, address);
    }
    service.setRetrievedAddresses(success);
  }
}

I'm sorry, but refactoring to use a success flag like that is just a bad code-smell. So I guess that we'd have to refactor to use another method. But that really defeats the purpose of inline, quick-and-simple closures.

The basic premise of closures, from the previous four use cases on this blog, was that you could just add code to the closure in the same way as you add code to a synchronized block. With asynchronous closures, that just isn't so. And I suspect that most developers would feel cheated as a result. Two different types of closure. Two different sets of syntax rules.

I could have written this blog without telling you that the execute closure was asynchronous. This would have produced real confusion, as there is no way of telling, just by reading the code, that it is restricted (asynchronous) and therefore doesn't allow return, break or continue, or access to non-final variables.

Personally, I don't see how closures are viable in Java with this issue. It's just way too confusing. One solution would be to prevent asynchronous closures by controlling the reference to the closure object closely. A second possible solution is to use a different invocation syntax that clearly identifies the different rules - in fact, something not unlike CICE...:

public void process() {
  final AddressService service = ...
  final Person person = ...
  Executor ex = ...
  ex.execute(new Runnable() {
    for (String addressId : person.getAddressIdList()) {
      Address address = service.getAddress(addressId);
      if (address == null) {
        return;  // means return from Runnable
      }
      person.addAddress(addressId, address);
    }
    service.setRetrievedAddresses(true);
  });
}

Summary

Asynchronous closures have different syntax rules to regular closures - no return, break, continue or non-final variable access. A developer with inner-class experience will be misled into believing that closures remove those restrictions, be severely disappointed when they discover the truth, and probably return to coding inner-classes for clarity. Developers without inner-class experience will likely be just plain confused as to why closures sometimes compile, sometimes don't and sometimes throw weird runtime exceptions.

Java 7 - Property objects

Property syntax for Java 7 is the current discussion du jour. But I've been pointing out that we can go beyond get/set methods to create refactorable, compile-checked property objects too.

Property syntax

The proposals being discussed revolve around syntax sugar code generation. So, the developer codes:

  public class Person {
    public property private String forename;
    public property private String surname;
  }

This would code generate the get/set methods:

  public class Person {
    private String forename;
    public String getForename() {return forename;}
    public void setForename(String forename) {this.forename = forename;}

    private String surname;
    public String getSurname() {return surname;}
    public void setSurname(String surname) {this.surname = surname;}
  }

Discussions on Javalobby and various blogs have focussed on the basic mechanics of the syntax for defining a property (keyword vs annotation vs token) and javadoc. In addition, a completely separate discussion about arrow operators has confused the debate. This is all fine, but in my view it misses the huge opportunity that is presented by this change.

Property objects

Currently, many, many tools, especially frameworks, have to access beans in a dynamic way. By this I mean where a String property name is used to identify a property and reflection is used to access it. For example, this code might be used to bind the surname property in a validation, swing or database framework.

  binder.bind(person, "surname");

But this design is really utter madness!!! A key strength of Java is static typing, allowing compile time detection of errors and refactoring. With a design like this, all these features of Java are lost. For example, if you rename the property, then the binding will fail at runtime - yuck!!! What I want to write is:

  binder.bind( person.propertySurname() );

This is fully type-safe, refactorable and compiled. And once you have it, you find an awful lot of places where it can be used - Database bindings, Swing bindings, WebApp bindings, XML access, ...

Implementation

Full details can be found in the download zip. In summary, there are four interfaces, of which the most significant is Property:

  public interface Property<B, T> {
    B bean();
    MetaProperty<B, T> metaProperty();
    T get();
    void set(T value);
  }

The bean method returns the bean that owns this property. The meta property holds details about the property such as its name and type. The get and set methods access the bean. The key point is that the get method actually calls the real getSurname() method on the bean. Likewise for set. So, the property object isn't breaking encapsulation, it just provides a higher level view of the property.

So, why the link to the current property debate? Well, the method propertySurname() needs to be generated along with the get and set methods. (In fact a metaProperty static method should be generated as well to complete the picture, see the download zip for full details). Anyway, with my proposal, the property code generation would create the following:

  public class Person {
    private String forename;
    public String getForename() {return forename;}
    public void setForename(String forename) {this.forename = forename;}
    public Property propertyForename() {return new SimpleProperty(this, "forename");}

    private String surname;
    public String getSurname() {return surname;}
    public void setSurname(String surname) {this.surname = surname;}
    public Property propertySurname() {return new SimpleProperty(this, "surname");}
  }

In other words, the proposal simply means one more syntax sugar generated method per property (see the download zip for full details).

Summary

The get/set method convention is extremely widely used in Java, and so any property solution must generate get and set methods. But if we go a little further and create property objects then we can dramatically change the interaction between our applications and the frameworks we use day-in day-out. Personally, I just love the refactorable, static-type checks and compiler safety that this brings. Do you agree?

Friday, 5 January 2007

Closures - use cases - resource access

Some more 'use cases' for the closures proposal (see previous entries). One poster child for closures is resource management, so let's see how that works out.

withLock

  Lock lock = ...;
  withLock (lock) {
    // do something which required locking
  }

This operation locks the lock, calls the closure, then unlocks the lock. This equates to the try..finally block:

  Lock lk = ...;
  lk.lock();
  try {
    // do something which required locking
  } finally {
    lk.unlock();
  }
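In current Java the same pattern can be packaged as an ordinary method taking a Runnable, which is roughly the shape the closure version desugars to (a sketch):

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class WithLockDemo {
    // sketch of withLock as a plain library method; the closure
    // version is essentially sugar for this shape
    public static void withLock(Lock lock, Runnable block) {
        lock.lock();
        try {
            block.run();
        } finally {
            lock.unlock();  // released even if the block throws
        }
    }

    static int counter = 0;

    public static void main(String[] args) {
        final Lock lock = new ReentrantLock();
        withLock(lock, new Runnable() {
            public void run() {
                counter++;  // do something which requires locking
            }
        });
        System.out.println(counter);  // 1
    }
}
```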

withFileReader

  File file = ...;
  withFileReader (FileReader in : file) {
    // read data from the file
  }

This operation opens the file, creates the reader, calls the closure, then closes the file. Again this equates to a try..finally block and eliminates problems by forgetting to close the resource.

Similar methods would exist for Writer, and streams.

eachLine

  File file = ...;
  eachLine (String line : file) {
    // called once for each line in the file
  }

This operation opens the file, creates the reader, reads the file line by line calling the closure once per line, then closes the file. This equates to quite a significant body of code that is essentially cut and pasted in many applications. And again it eliminates problems by forgetting to close the resource.
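An eachLine built from today's parts might look like this (the LineHandler interface is invented, and a Reader is used instead of a File so the sketch is self-contained):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class EachLineDemo {
    // invented callback interface standing in for the closure block
    public interface LineHandler {
        void handle(String line);
    }

    // open, read line by line, close - the boilerplate the closure hides
    public static void eachLine(Reader reader, LineHandler handler) throws IOException {
        BufferedReader in = new BufferedReader(reader);
        try {
            String line;
            while ((line = in.readLine()) != null) {
                handler.handle(line);  // called once for each line
            }
        } finally {
            in.close();  // closed even if the handler throws
        }
    }

    public static void main(String[] args) throws IOException {
        final StringBuilder buf = new StringBuilder();
        eachLine(new StringReader("one\ntwo"), new LineHandler() {
            public void handle(String line) {
                buf.append(line).append('|');
            }
        });
        System.out.println(buf);  // one|two|
    }
}
```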

Summary

Although I haven't shown it, additional examples could be written for databases, or any other connection where the resource needs to be explicitly closed after the task is complete. These tasks all seem a lot cleaner using the proposed closure syntax than at present.

Wednesday, 3 January 2007

Closures - use case - instanceof

Another 'use case' for the closures proposal following on from yesterday. This time a problematical one - instanceof.

ifInstanceOf

  Object obj = ...;
  ifInstanceOf (String str : obj) {
    // do something with the string
  }

The intention of this closure method is to take the input object 'obj', check if it matches the required type, and then call the closure itself with the typecast object 'str'. Thus, this operation is the closure equivalent of this current operation:

  Object obj = ...;
  if (obj instanceof String) {
    String str = (String) obj;
    // do something with the string
  }

So, how would the implementation of the method look? Well the implementation needs to be generic, as we can't just handle String objects. So, let's use T to represent the type that we are testing:

public static <T> void ifInstanceOf(Object obj, {T=>void} block) {
  if (obj instanceof T) {
    block.invoke((T) obj);
  }
}

Unfortunately, this won't compile, as you can't write obj instanceof T where T is a generic type. The problem is of course erasure. This method has no way of determining what the actual type of T is - String in this example. So what can be done?

One solution would be to pass the type in as an additional parameter from the calling code:

  Object obj = ...;
  ifInstanceOf (String str : obj, String.class) {
    // do something with the string
  }

But that is just rubbish, as you could define a different type as the Class object and get a ClassCastException - ifInstanceOf (String str : obj, Integer.class)!
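For comparison, when written as an ordinary generic method rather than the proposed closure syntax, a single Class token can tie the declared type and the test together, so the mismatched call fails to compile (the Block interface is invented for illustration):

```java
public class IfInstanceOfDemo {
    // invented stand-in for the closure block type
    public interface Block<T> {
        void invoke(T value);
    }

    // the one Class token types both the test and the block, so a mismatch
    // like ifInstanceOf(obj, Integer.class, stringBlock) cannot compile
    public static <T> void ifInstanceOf(Object obj, Class<T> type, Block<T> block) {
        if (type.isInstance(obj)) {
            block.invoke(type.cast(obj));  // safe: isInstance already checked
        }
    }

    public static void main(String[] args) {
        final StringBuilder out = new StringBuilder();
        Object obj = "hello";
        ifInstanceOf(obj, String.class, new Block<String>() {
            public void invoke(String str) { out.append(str.toUpperCase()); }
        });
        ifInstanceOf(Integer.valueOf(42), String.class, new Block<String>() {
            public void invoke(String str) { out.append("never called"); }
        });
        System.out.println(out);  // HELLO
    }
}
```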

A better solution would be for closure blocks to store the actual invoked generic types as instance variables. Then we could write:

public static <T> void ifInstanceOf(Object obj, {T=>void} block) {
  if (block.actualType(0).isInstance(obj)) {
    block.invoke((T) obj);
  }
}

This would be fairly simple to do, and in fact could probably be optimised by the compiler so that the actual types are only stored if the actualType method is called.

I'm not sure if this is the only use-case where the actual types of the closure would be useful, but I suspect it isn't. If you've got another use-case where the actual type would be useful, or any other feedback, please add a comment!

Tuesday, 2 January 2007

What kind of closure person are you?

I seem to be detecting a pattern in the contributions to the debate on closures. And it's based on the prior programming language experience of the contributor. Which kind of person are you?

  • Functional - Like Haskell. See closures as functions, and a way to do lots of clever FP black magic. Typically also argue for currying and tail recursion. Unwilling to accept that Java isn't and never will be a functional language.
  • Dynamic - Like Ruby, Groovy or Smalltalk. See closures as methods on objects, which are the primary means to loop, read files, etc. Initially like the idea of closures in Java but see the implementation as too verbose. Typically also argue for extra methods on the Java Collection or List interface, and type inference. Find it very difficult to move back from the dynamic-typed language to the static-typed Java where a lot more keypresses are required.
  • Java-lover - Like Java and may well not have seen much else. See closures as an alternative to inner classes with strange, different rules. Typically argue for simpler syntax for inner classes instead of closures. Unwilling to look beyond Java - if it ain't broke, don't fix it.

I hope I'm not being too cruel by categorizing this - please don't hold it against me!!! There are of course many people who don't fit this model, but it's a basis to start from. And remember, it's framed in terms of responses to the idea of closures in Java.

And me? Well I was in the last camp - the Java-lover. But I've now explored just enough of the dynamic world to see how closures work there. And I can now understand the desire of the dynamic crowd to have closures as regular methods - list.each() for example. However, I'm far from convinced that the method-style invocation of a closure fits the Java style or mentality.

And that's the point of this blog post. If closures are added to Java, then the syntax must feel right and understandable to the third group above - Java-lovers. And it seems to me that the keyword-style invocation is much more palatable to that group. Of course, as such, the proposal feels wrong to most in the functional and dynamic groups.

BTW. That's why I use the keyword style syntax for all my examples of closures. Because in the end, I believe that style is what will be most understood by Java developers. And probably most hated by the dynamic and functional groups!

Closures use cases - updating

More 'use cases' for the closures proposal (see yesterday). This time, the more thorny subject of methods that alter the collection.

The standard solution to altering a collection using a closure in other languages seems to be 'don't do it'. Instead, functional programming prefers returning a new collection with the contents altered. I would argue that while that might sometimes be what you want, it probably doesn't fit well with the Java paradigms. ie. in Java, we tend to alter the state of our existing objects. Let's see how that works:

filter(Collection)

  Collection<String> coll = ...
  filter (String str : coll) {
    boolean remove = false;
    if (str == null) {
      remove = true;
    }
    remove
  }

This operation would remove the element (calling Iterator.remove) if the result of the closure is true. But the syntax is very clumsy: the result can only be specified as the last expression in the block, and it has to be without a semicolon. This looks like a guaranteed cause of confusion.
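For comparison, here is how the same in-place filtering has to be written in today's Java, using the Iterator.remove call that the filter method would make under the covers (the class name and data are purely illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FilterToday {
    public static void main(String[] args) {
        List<String> coll = new ArrayList<>(Arrays.asList("a", null, "b"));

        // remove each element for which the 'closure' would return true
        for (Iterator<String> it = coll.iterator(); it.hasNext();) {
            String str = it.next();
            if (str == null) {
                it.remove();
            }
        }
        System.out.println(coll);  // prints [a, b]
    }
}
```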

transform(List)

  List<String> list = ...
  transform (String str : list) {
    str.toUpperCase()
  }

This operation would change each value in the list to a new value, based on the result of the closure. Note again that the result is defined by a statement with no semi-colon at the end. In this case, because there is no other statement in the block, this doesn't look too bad, but I'm sure I'd put the semi-colon in as a reflex action.
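For reference, the equivalent in-place transform in current Java needs a ListIterator and its set method - a sketch, with illustrative names and data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

public class TransformToday {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("ab", "cd"));

        // replace each element with the result of the 'closure'
        for (ListIterator<String> it = list.listIterator(); it.hasNext();) {
            String str = it.next();
            it.set(str.toUpperCase());
        }
        System.out.println(list);  // prints [AB, CD]
    }
}
```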

Alternate approach 1 - yield parameter

  List<String> list = ...
  update (String str : list : boolean removeItem) {
    if (str == null) {
      removeItem = true;
    }
  }

The idea here is that instead of yielding a result from the closure via some arbitrary last statement without a semi-colon, a variable is provided which must be updated with the result of the code-block. In other words, removeItem is the result of the closure, and you have to assign a value to it.

There may be ways to play with the syntax, but this does feel a little 'odd'. It would work as a general purpose replacement for the 'last-line-no-semi-colon' concept.

Alternate approach 2 - an updater

  List<String> list = ...
  update (String str, Updater<String> it : list) {
    if (str == null) {
      it.remove();
    } else {
      it.set(str.toUpperCase());
    }
  }

The idea is to provide the closure with an object suitable for updating the list. Now perhaps this could just be a ListIterator, or it might be a separate class with a different set of methods (i.e. 'Updater') - that's a detail.

The nice part about this solution is that it is closer to traditional Java syntax and traditional Java mentality. Of course you might then ask exactly what this loop is gaining over the current code. One downside is that the 'String' parameter has to be repeated. This could potentially be solved, but with its own issues:

  List<String> list = ...
  update (Updater<String> it : list) {
    if (it.get() == null) {
      it.remove();
    } else {
      it.set(it.get().toUpperCase());
    }
  }
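In fact, ListIterator already plays the role of the hypothetical Updater in current Java - next() gets the element, remove() drops it, set() replaces it. A sketch of the equivalent loop today (data is illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

public class UpdaterToday {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(Arrays.asList("ab", null, "cd"));

        // ListIterator as the 'Updater': remove nulls, upper-case the rest
        for (ListIterator<String> it = list.listIterator(); it.hasNext();) {
            String str = it.next();
            if (str == null) {
                it.remove();
            } else {
                it.set(str.toUpperCase());
            }
        }
        System.out.println(list);  // prints [AB, CD]
    }
}
```

This is also why you might ask what the closure form gains: the update case is one of the few where today's iterator-based code is already reasonably direct.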

Alternate approach 3 - improve the syntax of the closures proposal

  List[String] list = ...
  removeLoop:update (String str : list) {
    if (str == null) {
      goto removeLoop => true;
    }
    goto removeLoop => false;
  }

The idea here is to provide a mechanism to return the result of the closure clearly and safely. The 'removeLoop' is a label, as currently exists in Java. Here I am using it to clearly identify which closure to return the result to. (Neal Gafter has indicated that yielding a result without identifying the closure to yield to causes issues.) The goto in this proposal also doesn't break Java, as goto is a reserved keyword.

It should be noted that there are many, many possible syntax variations to achieve the effect above. I'm certainly not hung up on this one, although I think it reads pretty clearly, and is safe.

Summary

Well, that outlines where I see a hole in the closures proposal at present, and three or four possible solutions. Next time, I'll try to cover some non-collection use cases. All feedback welcomed!!

Closures use cases - looping

The closures proposal for Java 7 continues to progress. But there doesn't seem to be a large list of potential use cases lying about, just a few random examples. I'm going to try to write up some example 'use cases', hopefully to help the debate.

each(Collection)

  Collection<String> coll = ...
  each (String str : coll) {
    // do something with str
  }

This would loop around each entry in the collection exactly as per the enhanced foreach loop.

eachIndexed(Collection)

  Collection<String> coll = ...
  eachIndexed (String str, int index : coll) {
    // do something with str
  }

This would provide a loop index (0-based, incrementing by 1) as well as the entry each time around the loop.
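In current Java the index has to be maintained by hand alongside the enhanced for loop - a sketch of what eachIndexed would replace (data is illustrative):

```java
import java.util.Arrays;
import java.util.List;

public class EachIndexedToday {
    public static void main(String[] args) {
        List<String> coll = Arrays.asList("a", "b", "c");

        // 0-based index, incrementing by 1, maintained manually
        int index = 0;
        for (String str : coll) {
            System.out.println(index + ": " + str);
            index++;
        }
    }
}
```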

eachReversed(List)

  List<String> list = ...
  eachReversed (String str : list) {
    // do something with str
  }

This operation would loop around the list in reverse order.
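Today a reverse loop means positioning a ListIterator past the end of the list and walking backwards with previous() - a sketch of what eachReversed would replace:

```java
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

public class EachReversedToday {
    public static void main(String[] args) {
        List<String> list = Arrays.asList("a", "b", "c");

        // start past the last element and walk backwards
        for (ListIterator<String> it = list.listIterator(list.size()); it.hasPrevious();) {
            String str = it.previous();
            System.out.println(str);  // prints c, then b, then a
        }
    }
}
```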

each(Map)

  Map<String, Integer> map = ...
  each (String key, Integer value : map) {
    // do something with key/value
  }

This loops around each entry in the map, giving the loop the key and value each time around. An indexed version of this method would probably also be needed.
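The map version in current Java is an entrySet() loop with the key and value unpacked by hand - a sketch of what each(Map) would replace (a LinkedHashMap is used here only to make the iteration order predictable):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class EachMapToday {
    public static void main(String[] args) {
        Map<String, Integer> map = new LinkedHashMap<>();
        map.put("a", 1);
        map.put("b", 2);

        // loop over each entry, unpacking key and value manually
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
    }
}
```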