Saturday, 31 March 2007

Closures - Outside Java

Trying to agree semantics for closures in Java isn't easy. Maybe we should look at what some other Java-like languages have done - Nice, Groovy and Scala.

The first thing to notice is that they all have closures. Java really is the odd one out now. The devil is in the detail though. Specifically, what do these languages do with the interaction between continue/break/return and the closure?

Nice

  // syntax for the higher order function
  String find(String -> boolean predicate) {
    ...
    boolean result = predicate(str);
    ...
  }

  // syntax 1 to call the higher order function
  String result = strings.find(String str => str.length() == 2);

  // syntax 2 to call the higher order function
  String result = strings.find(String str => {
    // other code
    return str.length() == 2;
  });

Nice has two variants of calling the closure. Syntax 1, where there is just a single expression, has no braces and no return keyword. Syntax 2 has a block of code in braces, and uses the return keyword to return a value back to the higher order function.

There is a control abstraction syntax variant, where again the return keyword returns back to the higher order function. Similarly, the continue and break keywords may not cross the boundary between the closure and the main application.

Groovy

  // syntax for the higher order function
  def String find(Closure predicate) {
    ...
    boolean result = predicate(str)
    ...
  }

  // syntax 1 to call the higher order function
  String result = strings.find({String str -> str.length() == 2})

  // syntax 2 to call the higher order function
  String result = strings.find({String str -> 
    // other code
    return str.length() == 2;
  })

Groovy has many possible variants of calling the closure (with optional types). Semicolons at the end of lines are always optional, as is specifying the return keyword. So syntax 1 and 2 are naturally the same in Groovy anyway. However, the example does show that if you do use the return keyword, then it returns from the closure to the higher order function.

There is a control abstraction syntax variant, where again the return keyword returns back to the higher order function. Similarly, the continue and break keywords may not cross the boundary between the closure and the main application.

Scala

  // syntax for the higher order function
  def find(predicate : (String) => Boolean) : String = {
    ...
    val result : Boolean = predicate(str)
    ...
  }

  // syntax 1 to call the higher order function
  val result : String = strings.find({(str : String) => str.length() == 2});

  // syntax 2 to call the higher order function
  val result : String = strings.find({(str : String) =>
    // other code
    str.length() == 2;
  });

In Scala, like Groovy, semicolons at the end of lines are always optional. However, the key difference with Groovy is that the return keyword is always linked to the nearest enclosing method, not the closure. If that method is no longer running when the closure is invoked, then a NonLocalReturnException may be thrown.

There is a control abstraction syntax variant, where again the return keyword will return from the enclosing method. Scala does not support the continue and break keywords.

BGGA and FCM

Hopefully, the above is clear enough to show that Nice and Groovy operate in a similar manner (return always returns to the higher order function), whereas Scala is different (return will always return from the enclosing method, but you may get an exception). This is especially noticeable when Nice/Groovy is used in a control abstraction syntax manner, because then the return keyword looks odd as it returns from an 'invisible' closure (the code often just looks like a keyword).

So how do the Java proposals compare?

BGGA follows the Scala model. If you write a return within the closure block it will return from the enclosing method. And you might get a NonLocalReturnException if that enclosing method has completed processing. BGGA also handles continue and break similarly.
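
To make the exception concrete, here is a minimal sketch in plain Java (inner classes only) of one way a return from the enclosing method could be emulated. The names NonLocalReturn, Block, forEach and leak are hypothetical and exist only for this illustration; this is an assumption about how such a feature might be compiled, not the BGGA prototype's actual implementation.

```java
import java.util.Arrays;
import java.util.List;

class NonLocalReturnSketch {

    /** Hypothetical stand-in for the exception a compiler might synthesize. */
    static final class NonLocalReturn extends RuntimeException {
        final Object owner; // identifies the method activation being returned from
        final Object value; // the value that activation should return

        NonLocalReturn(Object owner, Object value) {
            super(null, null, false, false); // control flow only, so no stack trace
            this.owner = owner;
            this.value = value;
        }
    }

    /** Hypothetical closure type: a block taking one String. */
    interface Block {
        void invoke(String s);
    }

    /** The higher order function: knows nothing about non-local returns. */
    static void forEach(List<String> items, Block block) {
        for (String s : items) {
            block.invoke(s);
        }
    }

    /** The enclosing method: a "return" inside the block leaves this method. */
    static String findFirstOfLength2(List<String> strings) {
        final Object owner = new Object();
        try {
            forEach(strings, new Block() {
                public void invoke(String s) {
                    if (s.length() == 2) {
                        throw new NonLocalReturn(owner, s); // emulated "return s;"
                    }
                }
            });
            return null; // block never returned: nothing matched
        } catch (NonLocalReturn r) {
            if (r.owner != owner) {
                throw r; // aimed at some other activation, keep unwinding
            }
            return (String) r.value;
        }
    }

    /** Returns a block whose enclosing method has already completed. */
    static Block leak() {
        final Object owner = new Object();
        return new Block() {
            public void invoke(String s) {
                throw new NonLocalReturn(owner, s); // nobody is left to catch this
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(findFirstOfLength2(Arrays.asList("abc", "de", "fg")));
        try {
            leak().invoke("x");
        } catch (NonLocalReturn e) {
            System.out.println("exception escaped to the caller");
        }
    }
}
```

The leak() case is the awkward one: by the time the block runs, the try/catch that would have turned the exception back into a return no longer exists, so the exception surfaces wherever the block happens to be invoked.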

FCM follows the Nice/Groovy model. If you write a return within the closure block it will return to the higher order function method, and there are no possible exceptions. There is no way to return from the enclosing method. Similarly, FCM prevents continue and break from escaping the boundary of the closure block.
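
Today's anonymous inner classes already behave this way, so they give a rough feel for the model. The sketch below is plain Java rather than FCM syntax, and StringPredicate, find and firstOfLength2 are hypothetical names used only for this illustration.

```java
import java.util.Arrays;
import java.util.List;

class LocalReturnSketch {

    /** Hypothetical single-method interface standing in for a closure type. */
    interface StringPredicate {
        boolean test(String s);
    }

    /** The higher order function: receives whatever the predicate returns. */
    static String find(List<String> strings, StringPredicate predicate) {
        for (String str : strings) {
            boolean result = predicate.test(str);
            if (result) {
                return str;
            }
        }
        return null;
    }

    /** The enclosing method: it always runs to its own return statement. */
    static String firstOfLength2(List<String> strings) {
        String result = find(strings, new StringPredicate() {
            public boolean test(String str) {
                // This return hands a value back to find();
                // it cannot leave firstOfLength2().
                return str.length() == 2;
            }
        });
        return result == null ? "none" : result.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(firstOfLength2(Arrays.asList("abc", "de", "fg")));
        System.out.println(firstOfLength2(Arrays.asList("abc")));
    }
}
```

Because the return inside the predicate can only ever reach find(), there is no situation in which it needs an exception to unwind the stack.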

Which model is right? Well, both have points in their favour and points against.

Scala/BGGA is more powerful as it allows return from the enclosing method, but it comes with the price of a weird exception - NonLocalReturnException. FCM is thus slightly less powerful, but has no risk of undesirable exceptions.

Personally I can see why Scala's choice makes sense in Scala - because the language generally omits semicolons and return statements. But I'm not expecting us to remove semicolons from Java any time soon, or make return optional on all methods, so Scala's choices within a Java context seem dubious.

Feedback

Feedback welcomed of course, including examples from other languages (C#, Smalltalk, Ruby, ...).

And what about adding a new keyword to FCM to allow returning from the enclosing method? Such as "break return"? It would expose FCM to NonLocalReturnException though...

23 comments:

  1. Ruby does it the way BGGA and Scala do.

  2. And nicely in Ruby, everything (?) is also an expression. Makes for a nice consistent language. Java, however, lives in expressions vs. statements land. (Solving that generally could be fun, but then there's that issue of semicolons on the end of blocks, ...)

    In any case, for Java, I think that any closures that look like ordinary blocks should act like ordinary blocks (i.e., Tennent's). Things that look like nested methods should act like nested methods. Pretty simple.


    I recommend against new constructs like "break return". Let's not get too fancy.

  3. BGGA's earliest specs did have local function declarations, in which return returned from the local function. I haven't heard much call for adding those back to the spec.

  4. Isn't the point of 'return' working on the non-local scope that the inner block is then a part of the outer/enclosing block rather than the function that calls the block?

    By analogy with variables:

    public static void func({->} block) {
      int i = 5;
      block.invoke();
    }

    int i = 0;
    func() {
      i++;
    }

    System.out.println(i);


    In this case it would seem obvious that the i incremented in the inner block is declared 2 lines above rather than in func().

    So conceptually the block that is passed around is part of the containing block. That's one of the things that closures fix compared to inner classes. Returns come into the same category, and should act on the containing block rather than the block that invokes them.

  5. Howard Lovatt, 1 April 2007 10:37

    The return to an enclosing block can be very confusing if the block returns a value. Note the examples given above don't return a value and hence don't have this confusion. Similarly for an if block say, there is no confusion because an if statement doesn't have a value. For a general facility, the way inner classes work, where return is to the caller, is clearer. Consider:

    static Boolean invert( Callable< Boolean > block ) {
    return !block.call();
    }

    static Boolean someMethod() {
    ...
    return invert( { => true; } );
    }

    Does someMethod return true or false? What does the closure conversion do? Even if you know what this does, is this really clear? Is this not yet another Java puzzler?

    Also note the NonLocalReturnException will happen a long way from the cause of the error; avoidance of far-off errors is exactly what generics were introduced for. The same argument goes here: don't introduce a feature whose resulting bugs will be hard to track down.

  6. Stephen Colebourne, 1 April 2007 12:38

    @Neal, thanks for the input on Ruby - I'm not surprised, particularly if everything is an expression as Tom suggests.

    @Tom, I'm pretty much with you that closures that look like ordinary blocks should act like ordinary blocks, and those that look like nested methods should act like nested methods.

    @Neal, I guess those that like FCM are asking for return to return from the closure.

    @Joe, you wrote your example using control abstraction syntax - ie. it looks like a normal part of the language. In those cases, I would agree that return should return from the enclosing method. But when the closure is passed as an argument, the other semantics for return make more sense *in Java*.

    @Howard, I'm not sure your example demonstrates the problem you were trying to demonstrate. I agree on the NonLocalReturnException issue though.

  7. The more I read on this, the more I think it is a syntax issue, i.e., a matter of perception. If something looks like an inner function or method (i.e., some entity that can be passed on), javaists expect return to return from it. If it looks like a control statement (i.e., some block of code being part of the current control flow), return is seen as bound to the enclosing scope.

  8. howard Lovatt, 1 April 2007 14:59

    @Stephen,

    The example I gave (I think) returns true. You may not expect this since the line:

    return invert( { => true; } );

    might quite reasonably be expected to return false, since you might think that true is passed to invert, which then inverts it and therefore returns false. However the act of wrapping a BGGA closure inside a method circumvents this. The true is returned directly from someMethod, the enclosing method, and doesn't go through invert at all! Thus Tennent's principle is broken for methods that return a value in BGGA.

    Anyway; as you can see, it is very confusing.

  9. @Howard,
    your example does not seem valid to me. Firstly, the closure block you are passing is not compatible with Callable, as the block's result is void (last line having semicolon).
    Secondly, if you remove the semicolon, it would match. The final expression is true, which makes call() return true (which is what closure conversion provides), and hence invert returns false as expected.

  10. @Howard: "true;" isn't a valid statement in Java, so that's just a syntax error.

  11. Thanks to Stefan for the explanation. The significance of the final ; had passed me by. Assuming I have understood this correctly now, I think it will be a cause of many hard-to-find bugs, because in Java you require a ; before a }, whereas in BGGA if you want to return a value the ; must be absent.

    However I don't think this ; was the mistake I made. The mistake was omitting a return. That will teach me to post at midnight (Australian time).

    Anyway, back to my original point. The line now with correct syntax (I think) is:

    return invert( { => return true; } );

    I am uncertain about the syntax, you might have to write:

    return invert( { => return true; true } );

    I think the non-local return is very confusing. In C3S it was suggested that if a non-local return was wanted then you should have to name it. You could use FCM style naming, so that in FCM the example might be:

    return invert( #() { return#someMethod() true; } );

    in C3S using FCM style naming (the C3S proposal used different syntax - the FCM syntax is better)

    return invert method { return#someMethod() true };

    in C3S it is suggested that these non-local returns are handled using checked exceptions with a synthesized exception name and therefore a method that uses a non-local return could only be passed to something that threw the appropriate exception.

  12. Stephen Colebourne, 2 April 2007 01:08

    OK, let's compare return to closure vs return enclosing.

    a) BGGA which completes the closure normally - result is false:

      static Boolean someMethod() {
        return invert( { => true } );
      }

    b) BGGA which returns enclosing, so the closure isn't completed normally - result is true:

      static Boolean someMethod() {
        return invert( { => return true; } );
      }

    c) FCM which completes the closure normally - result is false:

      static Boolean someMethod() {
        return invert( #{ return true; } );
      }

    FCM provides no mechanism to return enclosing.

    BGGA control abstraction syntax does not apply here.

  13. @Stephen,

    You have given a nice summary. I think most people will find the differences between:

    return invert( { => true; } );
    return invert( { => true } );
    return invert( { => return true; } );

    subtle and confusing. I would also say that the use case for:

    return invert( { => return true; } );

    is rare and can safely be omitted (you can throw an exception if needed). It also has potentially difficult to find bugs associated with it. If you must support this use case then I suggest:

    return invert( { => return#someMethod() true; } );

    is superior since the compiler can find the bugs due to the explicit naming of the method that is to be returned from.

  14. As Neal pointed out, "true;" is not valid Java syntax and will result in a compiler error. As will the block containing "return true;" for not being compatible with the target SAM, whose sole method takes void and returns Boolean:
    { => return true; } -> {=>}
    { => true } -> {=>boolean}
    And, of course, on "return true; true" the compiler would state that true is unreachable.
    I agree that the difference between the three blocks is subtle, but the compiler will tell you which one is appropriate.

  15. Gotta correct myself: "{ => return true; }" of course is valid, as it does not complete normally. Should not post on early Monday morning, I guess ;)
    If Callable is refactored to implement RestrictedFunction, using return would cause a compiler error, though.

  16. @Stefan,

    I agree with your corrected comments. But:

    1. I think everyone will forget RestrictedFunction and refactoring old interfaces is probably not possible. This concept might work if it was the other way round, so that you had to enable non-local returns instead of disabling them.

    2. The error message from:

    return invert( { => true; } );

    will be hard to follow. It is likely to say that true isn't a statement. It is unlikely to say that you put in an extra ; :(. Therefore I think people will find this mistake difficult to debug.

    People will make this mistake quite a bit, because you normally put a ; at the end of every line (even before a }). Therefore the extra ; will be the natural way to type this. The rule that no ; means "return this value" is a special case used only in closures. Normal methods would use return true; and therefore this will be a most odd situation. People are likely to type either:

    return invert( { => return true; } );
    return invert( { => true; } );

    since both are closer in syntax to the rest of Java than:

    return invert( { => true } );

    To add to the confusion it is also an unnecessary rule (that's why I missed it). The compiler knows if something is a void and therefore doesn't need the rule that ; at the end means void method.

    PS I sympathize with mistaken posts - I made a mistake at midnight last night!

  17. Just a quick follow up. There are also other variations on the error message from:

    return invert( { => true; } );

    that are even more subtle. What error message will:

    return invert( { => invert( { => true } ); } );

    give?

    Probably something along the lines of the closure returns a void. Whereas:

    return invert( { => true; } );

    is likely to say that true isn't a statement. Thus the same mistake, an extra ;, is likely to give completely different diagnostics depending on circumstance. Neither is helpful!

  18. And don't forget:

    d) Expression methods which allow only a single _expression_ - result is false:

      static Boolean someMethod() {
        return invert( #() true );
      }

    And again, I don't advocate these as the only form of closure, but I think a single expression is likely to be more clear (in Java) than an expression at the end of a series of statements. Going for KISS-mode for LINQ here.

  19. By the way, note that for closures, the issue with synchronous vs. asynchronous isn't about threads. It's about when the closure will be executed. I think if it is executed 0 or more times but never after the statement containing the closure, it should be considered "synchronous" (no matter what thread it executes on). And if possibly executed 1 or more times after the statement containing the closure, it should be considered "asynchronous" (even if on the same thread). These terms might need more formal definition, but that might have been done before.

  20. Stephen Colebourne, 3 April 2007 17:45

    @Tom, yes, that is the definition of sync vs async I was working with. We'll use the term 'concurrent' if the closure might be executing in another thread whilst the method it was defined in is still executing.

  21. Thanks for the info. That was on my mind, so I was just making sure.

  22. I have put a comparison of different closure proposals on my blog - see URL below
