Tuesday, 27 February 2007

First-Class Methods: Java-style closures

The debate on closures in Java 7 has been limited to two proposals so far. Today Stefan Schulz and I would like to offer a third option - First-Class Methods: Java-style closures (FCM) - to the Java community.

Our aim when writing this proposal was to provide developers with much of the power of full-blown closures, but without the complexity. The syntax had to fit neatly within Java, but more importantly so did the semantics. The result of our work is a closure proposal that focuses on methods rather than closures, hence the name 'First-class methods'.

For those currently programming in Java, the method is the principal construct where application logic resides. Our proposal simply takes methods and allows them to be used as first-class objects within the application. But the new constructs are still recognisably methods. This results in a rapid learning curve - since all developers are already familiar with methods.

The proposal adds four new syntax elements:

  • Method literals - a type-safe way to refer to a method
  • Method types - a way to define the parameters and result of a method
  • Invocable method references - a reference to a method-object combination that together can be invoked
  • Inner methods - a way to write a method inline within another method, similar to inner classes

Full details of the syntax and semantics can be found in the First-Class Methods proposal.

Example

To whet your appetite, here is an example of creating a comparator using an inner class in Java 5:

  List<String> list = ...
  Collections.sort(list, new Comparator<String>() {
    public int compare(String str1, String str2) {
      return str1.length() - str2.length();
    }
  });

And here is the code rewritten using an inner method from the proposal:

  List<String> list = ...
  Collections.sort(list, #(String str1, String str2) {
    return str1.length() - str2.length();
  });

The inner method version will compile to almost exactly the same bytecode as the inner class version. But the syntax is much shorter and the semantics clearer.
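
For anyone who wants to run the comparison, here is a minimal, self-contained sketch of the inner class form. The class name LengthSortExample and the helper sortByLength are made up for illustration and are not part of the proposal. Only the inner class form is shown as runnable code, because the inner method form is proposal syntax that a standard Java compiler will not accept.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical wrapper class, used only to make the blog example runnable.
class LengthSortExample {

    // Sorts a copy of the input by string length using an anonymous inner class.
    static List<String> sortByLength(List<String> input) {
        List<String> list = new ArrayList<String>(input);
        Collections.sort(list, new Comparator<String>() {
            public int compare(String str1, String str2) {
                return str1.length() - str2.length();
            }
        });
        return list;
    }

    public static void main(String[] args) {
        // Shortest strings come first.
        System.out.println(sortByLength(Arrays.asList("ccc", "a", "bb")));
    }
}
```

Under the proposal, everything from "new Comparator" down to the closing brace of the inner class would shrink to the #(String str1, String str2) { ... } form shown above.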

Relationship to other proposals

There are two other closure proposals being debated. The CICE proposal aims to simplify inner class creation, while the BGGA proposal aims to introduce full-blown closures and API driven control flow statements.

The First-Class Methods (FCM) proposal is located between CICE and BGGA. FCM is considerably more powerful than CICE, which is still lumbered with the dreaded OuterClass.this syntax of the inner class. Equally, FCM doesn't get caught up in the obsessive BGGA requirement of Tennent's Correspondence Principles with its myriad of nasty implications (such as last-line-no-semicolon-return, exceptions for control-flow and RestrictedFunction). Instead FCM focuses on empowering the humble, but familiar, method.

Summary

The First-Class Methods (FCM) proposal provides an alternative to the two existing closure proposals. It focusses on the familiar concept of methods, providing all the syntax and semantics necessary to use them to their full potential. The result is a proposal that gives the developer power without complexity or surprise and would make a great Java language change.


As always, Stefan and I would love to hear your feedback on this proposal.

38 comments:

  1. I like the sound of this. I also seem to remember something like this in Modula2 - or is my memory showing its age.

  2. The "." would be javaish. Personally I find the "#" disturbing.

    The method type declaration using ActionEvent->void is easier to distinguish when reading code than the nested void(ActionEvent).

  3. I haven't thought much about all the implications of this proposal yet, but one thing I wondered was:

    isn't the "this" in such a method basically untyped? (Or at least not known until runtime)

    Maybe not a huge problem but a possible source of hard-to-find problems. Or am I missing something here?

  4. Finally an idea that makes me like the idea of closures in the java language. Good job.

    I found the # weird in javadoc comments, but if # now means "reference to a method" it starts making a whole lot more sense to me.

  5. This example is too simple to be convincing. Show us something that uses local variables and this (or state that that's impossible in this proposal.) That's where things get tricky.

  6. I don't see why the closures-enabled example would be better/easier/faster or more readable...

  7. It's not really a closure if it doesn't capture state, correct? I don't think we'll be able to add closures without adding keywords...

    For some reason I keep thinking that closures and properties might benefit from being solved at the same time. They require new explicit definitions in areas that were previously left semi-implicit ("field/method").
    Dunno, just musing...

  8. Stephen Colebourne, 27 February 2007 13:37

    Thanks for the comments so far,

    @Carsten, If I asked you to define a method signature in Java, which would you write - ActionEvent -> void or void(ActionEvent)? Our proposal claims the latter is more Java-like.

    @Quintesse, Within an inner method, 'this' refers to the class that surrounds the inner method in the source code (it's a closure). As such it's fully typed, and not a source of bugs.

    @Elliotte, See the description of 'this' above. With local variables, we've defined the semantics (that all local variables should be accessible) but not the syntax. I suspect that there actually doesn't need to be any syntax - after all BGGA doesn't have syntax for this concept.

    @Ivan, Inner methods DO capture their environment, it's just that the simple example in this blog doesn't need to.

  9. quintesse - the 'this' is the instance of the enclosing class, not the instance of the compile-time generated class that actually contains the inner method.

    Carsten - while '.' could possibly work for inner methods, it couldn't for method references. object#toString() is unambiguous, object.toString() already means something.

    Elliotte Rusty Harold - here's an example that accesses state (there is actually one in the proposal):

    boolean useIgnoreCase=Math.random()<0.5;

    Collections.sort(list,#(String one,String two)
    {
    return useIgnoreCase ? one.compareToIgnoreCase(two) : one.compareTo(two);
    });

    But I'm sure you could have worked that out for yourself.

    Ivan - it does capture state. Final and non-final variables from the enclosing scope are accessible.
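
    For comparison, here is a rough sketch of the same kind of capture written with an inner class, which runs on a current compiler. It is only an illustration: the class name CaptureExample and the helper sort are invented here, and the captured flag is passed in as a final parameter rather than drawn from Math.random() so that the result is predictable. Note that the inner class form requires the captured local to be final, whereas the proposal says non-final locals would be accessible too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical class, used only to show an inner class capturing a local.
class CaptureExample {

    // The final parameter useIgnoreCase is captured by the anonymous Comparator.
    static List<String> sort(List<String> input, final boolean useIgnoreCase) {
        List<String> list = new ArrayList<String>(input);
        Collections.sort(list, new Comparator<String>() {
            public int compare(String one, String two) {
                return useIgnoreCase ? one.compareToIgnoreCase(two) : one.compareTo(two);
            }
        });
        return list;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("b", "A", "a", "B");
        System.out.println(sort(words, true));  // case-insensitive ordering
        System.out.println(sort(words, false)); // plain String ordering
    }
}
```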

  10. Hello,

    I respect your attempts at keeping new proposals more Java-like, we need people like you. Yet I still do not understand why having a special delimiter symbol is necessary at all.

    The parser can well be implemented to differentiate between a closure declaration and a regular parameter declaration; in fact, having it in parentheses is enough to establish the required context.

    No need for the '#'.

  11. final class Toolkit {
    public static final List fuse(final (List) methods...) {
    final List l = new LinkedList();
    for(final (List) m : methods) {
    m(l);
    }
    return l;
    }
    }

    final class Test {
    final List l = Toolkit.fuse(
    (final List l) {
    l.add(0);
    },
    (final List l) {
    l.add(1);
    },
    Test.fuseMethodTwo
    );
    private final void fuseMethodTwo(final List l) {
    l.add(2);
    }
    }

    The compiler will perform all the normal overloading resolution to determine which 'fuseMethodTwo'
    it must reference.

    Assignment:

    To assign a method to a local reference, one will specify the type signature and use the regular
    '.' operator to signal to the compiler that a method literal is being assigned. Which can then
    be invoked using regular method call semantics.

    e.g., (consider the above class Test as an open namespace, thus continuing...)

    class Test {
    private final void test() {
    // acquire reference
    (List) fuseMethodTwoAlias = Test.fuseMethodTwo;
    // invoke
    fuseMethodTwoAlias(new LinkedList());
    // pass to utility method
    Toolkit.fuse(fuseMethodTwoAlias, Test.fuseMethodTwo);
    }
    }

    Please consider the above as a possible alternative.

  12. Test.fuseMethodTwo is ambiguous, it may be a field. It certainly looks like field access.

  13. If there is a field of type:

    Test {
    (List) fuseMethodTwo;
    }

    Then the field will be referenced; if not, then a method reference will be created on the fly. If there is a field of type int, or Object, or anything else, how can a compiler in a statically typed language confuse it for a method reference?

    The end result is the same, whether we make a reference on the fly or reference one already defined as a member of said class.

  14. I really like this proposal. The syntax is very clean and still feels like Java.

    The only thing I don't like is the automatic generation of null methods (in the MouseListener example). Something just doesn't feel right about it. To me one of the beauties of Java is that its syntax is very clear, there's not a lot of compiler magic you have to worry about. It's one of the reasons I don't like autoboxing and unboxing--yes, it's convenient, but if you get your parameters mixed up you can accidentally call the wrong method, since the primitive is automatically boxed into an object. Before autoboxing this would have been caught by the compiler.

    Rather than automatically generating the missing methods, I think it would be more natural to add a syntax for subclassing and overriding a single method. In the MouseListener example, we would subclass MouseAdapter:

    MouseListener listener = MouseAdapter#mouseClicked(MouseEvent evt) {
    ...
    };

    Note that this syntax overlaps with the syntax for referring to static methods e.g. Math#min(int, int), however the presence of the code block should be enough to clarify the programmer's intention to subclass.

    This makes what you're doing explicit, without much burden. It also allows you to program to the interface (listener is a MouseListener reference instead of MouseAdapter) but with the ability to explicitly specify the base class. I can think of dozens of cases where this usage would be useful for me.

    The DefaultListCellRenderer example could then be changed to:

    ListCellRenderer renderer = DefaultListCellRenderer#getListCellRendererComponent(JList list, Object value, int index, boolean isSelected, boolean hasFocus) {
    ...
    };

    If the superclass requires arguments in the constructor, you could specify them the same way as with anonymous inner classes:

    MyInterface object = MyAbstractImplementer(arg1, arg2)#interfaceMethod() {
    ...
    };

  15. jnice, our goal was to keep the new method elements visible, clear, and (hopefully) not introduce new sources of error-prone code. The line:
    (List) fuseMethodTwo;
    already is valid Java code by now (I changed to Integer, as you cannot have primitives as generic types) representing a cast to a List of Integers (although, it might not make sense to not assign to something). Your ideas are ok, but shift too much inference to the compiler, which makes it difficult to understand for a developer.
    Imagine you introduce the above static variable to Test and, at the same time, have the method:
    void fuseMethodTwo(List);
    Which one will the compiler choose on referencing to Test.fuseMethodTwo? Or would it cause a name clash?

    Matthew, the syntax you propose was included in some earlier draft of our proposal and referred to as "Concise Instance Construction". It does not directly fit into the proposal, though, as it is not about methods. A CIC would also add further context to the method, namely the instance you are creating (binding of "this"), and is not really far from writing an anonymous class (missing only a "new" and a set of braces).
    I'm not saying it isn't worth introducing, but it is definitely out of scope for our proposal.

    Cheers, Stefan

  16. Stefan,

    My suggestion was not intended to apply outside the case of multiple abstract methods. I think of it as a natural expansion of what you're proposing for that special case, not a replacement.

    As for being out of scope, the same argument could also be made against automatic generation of null methods by the compiler. ;)

    I do want to voice some concerns on the new "this" binding. It may shorten the code but I don't think it simplifies it. People already understand that "this" refers to the object that owns the currently executing method--including in anonymous inner classes. Having "this" bind differently, depending on whether I use the old syntax (anonymous inner class) or the new syntax (anonymous inner method) is inconsistent and bound to cause confusion.

    For example: Let's say I decide to update some existing code to use the new syntax. So I go through and convert each anonymous inner class, removing the now unnecessary boilerplate. I expect this code to work exactly the same as before. This cannot be guaranteed under the terms of your proposal.

    I appreciate the savings in typing and the improved readability but I don't think it is worth the confusion.

  17. Stefan,

    I do realize that shifting too much to the compiler can be undesirable; however, the goal lately appears to be exactly that. Examples are the new initialization syntax by Peter von der Ahe (HashMap.new()), which I like a lot, and more generally the efforts to reduce typing (new properties proposals, etc).

    Of course some people do not like cutting away Java's verbosity (which has its place to a degree).

    My main gripe - and I regret coming across with perhaps too much sarcasm or harshness - is the (subjective) uglification of a good looking language with introduction of symbols which can otherwise be expressed with some context and/or existing symbols.

    As for name clash (btw use of int was intentional, Integer is too long and what I wrote was not valid Java code anyway), it can again be resolved with some context.

    In my opinion, a class member initialized with a reference to a method should override the selection of the referenced method of same type. This may appear counterproductive and/or unintuitive in light of the recent discussion about properties (where methods shadow variables of same name, etc), but it carries a nice attribute in that actual method reference can be reassigned without changing the calling code.

    (pseudo-code)

    Test {
    (List) method = method;
    void method(List l) {}
    }

    When passing the 'method' literal, compiler will select the field if one exists in the provided lexical scope, otherwise it will attempt to create a reference to a method.

    If the field is selected, then the actual reference creation is manual, and compiler will simply assign the field to the target method parameter, which can then be invoked using regular invocation syntax.

    The field can be reassigned to another method without changing the calling site.

    I'm still pondering the implications of this approach, but it's at least something to think about (for me anyway).

    Thanks for your thoughts.

    P.S.

    The field in this discussion is not a regular member, it is essentially a pointer to function, and as such should not be seen as "giving preference to a static entity versus a dynamic entity, such as a regular method". Since it is a pointer to method, its end result upon invocation is the execution of the pointed-to method.

  18. Ok, about my "untyped this" question, don't know what got me confused but I had the idea, while reading the proposal, that the method reference could actually somehow come from a different instance so that's why I couldn't figure out what "this" meant at all times. But it was just me, thx for the correction :-)

    Is it maybe possible to give some kind of list of things that you do and do not support compared to the other two proposals? Just to have a clear understanding what we would gain and what we would lose (in your opinion) if this was ever implemented over one of the others.

  19. I really like this proposal, it feels very Java-ish. I think, loops should be supported (do you still plan on doing so?). Maybe like this: super.return (or something similar) is for non-local returns from the method and break/continue throw exceptions that can be caught by loop implementors. A non-local return is simulated (as usual) via an exception.

    I think Java could really win back a lot of people from the Python, Ruby and Smalltalk camps with first-class methods, map/list literals, multi-line strings and a few clean-ups in the standard library (negative collection indices, anyone?).

  20. Stephen Colebourne, 28 February 2007 01:57

    @Matthew, As Stefan said, we had the MouseAdapter#mouseClicked(MouseEvent ev) syntax of yours in the proposal for a while. But the problem is that it places the emphasis on the class, whereas we want the emphasis to be on the method. Put simply, if you want to create a class, with constructor arguments and allowing this in the method to reference the class, then write an inner class! (Inner classes aren't 'old syntax' with FCM). An inner method has a different meaning.

    I agree that converting an inner class to an inner method isn't necessarily free because of the different 'this' semantics. But the semantic change is essential.

    When you write an inner class, it is clear and visible in the source code that a class is being created, so it is reasonable for 'this' to bind to that inner class. When you write an inner method, there is no visible surrounding class other than that which contains the method holding the inner method. So 'this' should naturally refer to the surrounding object, not a construct that doesn't appear in the source code.
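
    To make the difference concrete, here is a small sketch of how 'this' binds in an inner class today. The names ThisBindingExample, Namer and innerClassNamer are invented for the illustration and do not come from the proposal. Inside the anonymous class, plain 'this' is the inner instance, and the surrounding object is only reachable through the qualified ThisBindingExample.this form.

```java
// Hypothetical class, used only to show how 'this' binds in an inner class.
class ThisBindingExample {

    private final String name = "outer";

    // Hypothetical single-method interface for the illustration.
    interface Namer {
        String describe();
    }

    Namer innerClassNamer() {
        return new Namer() {
            private final String name = "inner";

            public String describe() {
                // 'this' is the anonymous Namer instance;
                // the enclosing object needs the qualified form.
                return this.name + "/" + ThisBindingExample.this.name;
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(new ThisBindingExample().innerClassNamer().describe());
    }
}
```

    In an inner method, as described above, an unqualified 'this' would instead refer to the surrounding ThisBindingExample object, so the qualified form would not be needed.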

    @Quintesse, I'll try and compare the three proposals in a new blog soon.

    @Alex, We exclude the control-invocation syntax, so there is no need for special loop constructs, non-local returns, break/continue as exceptions. That's why this proposal is a lot simpler to implement and get right than BGGA.

    @jnice, A number of your points refer to a dislike of #. This character was chosen as (a) it's already used in javadoc, (b) it's not used for anything else, and (c) by adding it the syntax has a more definite (and we think clearer) style than just allowing one set of brackets to flow into another set of braces.

  21. My only real beef is where you say that nullary methods would be automatically created when there are multiple abstract methods (wide interfaces). Outside the domain of event listeners this approach is completely wrong. Case in point:

    List numberedElements = #get(int index) {
    return "Element #"+index;
    };

    As soon as we call a method other than get(int) we're going to get nonsense answers. That's not the intended use for the syntax but it's a real possibility. This is why I feel that the case for implementing multiple abstract methods with the # syntax should be disallowed. There can only be a single abstract method, period.
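
    Here is a rough sketch of the problem using today's syntax. It is an approximation only: NullMethodExample and numberedElements are invented names, AbstractList stands in for the full List interface, and the hand-written size() imitates what I assume a generated "null" method would do, namely return the default value.

```java
import java.util.AbstractList;
import java.util.List;

// Hypothetical class, used only to imitate a List with one real method.
class NullMethodExample {

    static List<String> numberedElements() {
        return new AbstractList<String>() {
            // The one method the programmer actually meant to write.
            public String get(int index) {
                return "Element #" + index;
            }

            // Stand-in for a compiler-generated "null" method (assumption):
            // it just returns the default value for int.
            public int size() {
                return 0;
            }
        };
    }

    public static void main(String[] args) {
        List<String> list = numberedElements();
        System.out.println(list.get(3));    // answers as intended
        System.out.println(list.isEmpty()); // but the list claims to be empty
        System.out.println(list);           // and prints as an empty list
    }
}
```

    The get(int) call behaves, but every method that depends on size() contradicts it, which is the nonsense I mean.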

    Therefore if we want a concise event listener on a wide interface, we have to start from a base implementation which already implements all the other methods, hence my first proposal. It's the only way (famous last words) that I can see to be sure that interfaces are being implemented correctly.

    Other than this one small item, I very much support your proposal, and hope it gets the attention and traction it deserves.

  22. Matthew, your explanation is sound and I somewhat agree. At least it should be considered in the next version of the proposal either in form of allowing assignments to SAM-providing and non-abstract classes only, or by adding a rationale about a converse decision.

  23. Stephen Colebourne, 28 February 2007 11:18

    @Matthew, Your example is a good one. One possible solution would be to only allow an override of a class/interface with multiple abstract methods if all the methods not overridden return void. That would handle the MouseListener case, and block the List example.

  24. What about using an array-like syntax for implementing multiple methods? Something like:

    MouseListener lnr = {
    #mouseClicked(MouseEvent ev) { ... },
    this#mouseReleased(MouseEvent ev),
    this#mousePressed(MouseEvent ev),
    #mouseEntered(MouseEvent ev) { ... },
    #mouseExited(MouseEvent ev) { ... }
    };

  25. Although I've already mentioned most of this in email to Stephen and Stefan, I'd just like to point out what I consider to be right and wrong with this proposal.

    Method references appear to be done absolutely right, as do inner methods, except that a reference to an instance method is not invocable. The use of # looked odd to me at first, but it's clear that '.' would be hard to read for some cases, and other syntaxes would look less like existing Java. If the proposal was about runnable blocks, like the BGGA is, then the syntax being method-like would be a disadvantage. As it is method-orientated, it is an advantage.

    In discussions with Ste{phe,fa}n, I never seemed to get a conclusive reason why Object#toString() couldn't be invoked. To me, it seems that if you pass an object to it as the first parameter, you could invoke it, that is: Object#toString() is a #(String(Object)) (a method that takes an Object and delivers a String). The proposal does, however, let you do instance#toString(), and that is invocable.

    I admire the way they both completely ignored my suggestions about introducing inference, that probably keeps the spec readable and simple. ;)

    The limitations of the generic type system mean that this proposal, and the BGGA, need to generate separate interfaces for each type of method/function. One big reason is that generics don't do varargs, so you can't write:

    interface Function
    {
      Return invoke(Params params...) throws Exceptions;
    }

    Of course, there is some ambiguity in the above, you'd need to say where params stop and exceptions start, but if the generic type system included varargs, plus some way of delimiting sets of type parameters, I think we'd be onto a winner. Neal thinks, for the case I presented to him, it would be simpler just to add a currying/partial function evaluation operator, but the fact that you can't do this as a language user rather than designer shows the limits of the type system.

  26. I suppose between all these alternatives something will emerge that will:

    a) Facilitate familiarity for newcomers from languages like C/C++. (Perl users will be right at home of course)

    b) Keep installed Java base put.

    So many people working on it, it's bound to work out.

  27. @Carlos: Why would you prefer such a difficult to read syntax over creating an inner class?

    @Ricky: Thanks, Ricky. :)
    Btw. the reason that Object#toString() is not invocable is the same as that Object.toString() is not invocable (in contrast to obj#toString()). Of course, you could automatically let the compiler wrap it into
    #(Object obj) { return obj.toString(); }
    But that's not the same. On the other hand, you can invoke it as follows:
    Object#toString().invoke(obj);
    as the first becomes an instance of Method.

  28. Object#toString().invoke(obj)

    And you'd have to catch all the associated exceptions, which is odd because this can be statically checked.
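
    As a sketch of what I mean, this is the closest equivalent with today's reflection API. MethodInvokeExample and callToString are names I made up, and the lookup by string stands in for the Object#toString() literal, which I assume would be checked at compile time under the proposal.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

// Hypothetical class, used only to show the exceptions around Method.invoke.
class MethodInvokeExample {

    static String callToString(Object obj) {
        try {
            // String-based lookup; a method literal would replace this step.
            Method toString = Object.class.getMethod("toString");
            // invoke dispatches virtually, so the runtime type's toString() runs.
            return (String) toString.invoke(obj);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        } catch (InvocationTargetException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(callToString(Integer.valueOf(42)));
    }
}
```

    Three catch blocks for a call whose target and argument types are all known up front.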

  29. Well, I lost this part on Stephen ;)

  30. My main concern with all these closure proposals is that even if I don't like them, libraries will start popping up that require I use them. Additionally, for the people who do like them, existing libraries will need to be modified to allow their use, am I right? Collections.sort doesn't currently accept arguments in the form of Method or Method Reference.

    I see that your proposal can assign your Method reference to an Interface method, and return an object implementing that interface, which would solve part of the problem. But how is that any different than an anonymous inner class definition? Just shorter?

  31. This is slightly off-topic, but as far as the ability to distinguish between returning from the enclosing method vs returning from the inner method, why not just introduce the "yield" operator to return from the inner method, and have "return" return from the enclosing method? Just a thought.

  32. Stephen Colebourne, 1 March 2007 00:08

    @Michael, Collections.sort currently takes a Comparator as its second parameter, and FCM is simply adapting to that Comparator. An inner method differs from an inner class in the semantics of 'this'. In an inner class 'this' has complex meaning, in an inner method it just refers to the class that surrounds it in source code.

    @Matthew, Using 'return' is simply more method-like, and familiar to existing developers. For the cases that we are tackling, there is no need for returning from the surrounding method.

  33. I don't see why the same argument for redefining "this" in an inner method couldn't apply to my argument for redefining "return." In a block they could *both* be in terms of the enclosing scope.

    The problem is that if you don't plan for *some* way to return from the enclosing scope before implementation, it will be too late to fix it later.

  34. I'm not familiar with the internals of the JVM, but would it be possible to overload functions on a per-reference basis? I know scripting languages inside the JVM can dynamically redefine methods and add interface implementation to an instance, so maybe it's possible.

    I was thinking something that would work like this:

    Collections.sort(list,
    new Comparator.compare(Object a, Object b) {
    //.. compare code
    }
    );

    Which would create a new anonymous class that extends Object and implements Comparator, with the compare method defined in the block. This is really just a simplified inner class, but what about something like this:

    JButton button = new JButton("Hello World");
    button.addActionListener(
    this$ActionListener.actionPerformed(ActionEvent e) {
    //.. Action code
    }
    );

    This would return a reference to 'this' that is overloaded such that calling 'actionPerformed' on that reference would execute the code block, while calling 'actionPerformed' on another reference to the same object may result in executing a different code block.

    Is something like this even feasible without major modifications to the JVM and/or bytecode spec?

  35. Stephen Colebourne, 1 March 2007 13:23

    @Matthew, What is the use case for needing yield and return? If you read http://gafter.blogspot.com/2006/08/use-cases-for-closures.html then you'll find we are tackling only the asynchronous use cases with this proposal (hopefully, we'll produce a separate document to discuss the synchronous use case). I don't believe that the async use case needs both a yield and a return.

    @Michael, Maybe I've missed something, but I think you'll find the proposal covers all your requirements.

  36. Stepan Koltsov, 5 March 2007 13:42

    Stephen, I have several comments about your proposal.

    First, I think # is unnecessary when declaring method:

    List list = ...
    Collections.sort(list, (String str1, String str2) {
      return str1.length() - str2.length();
    });

    This syntax is similar to BGGA v0.2, which is perfect in my opinion (and as in v0.2 I would like to remove function types).

    Second, I'd prefer to change hash # symbol to :: . # just points to HTML anchor.

    And third, most important: in the most cases, argument types can be omitted:

    List list = ...
    Collections.sort(list, (str1, str2) {
      return str1.length() - str2.length();
    });

    Collections.sort accepts (List, Comparator), so if the first argument is a List, then the second argument must be a Comparator, so declaring argument types is not necessary.

  37. Stephen Colebourne, 5 March 2007 17:35

    @Stepan, Point 1: You are correct that the # is unnecessary to the compiler. However, we argue that it *is* necessary for the *human* reading the code at a later date.

    Point 2: I'm sure that there are many alternatives to #. I started off with -> if you look back through my blog. Now I've used it, I quite like it, but it's an aesthetic thing and depends on what other languages you've used.

    Point 3: Again, you are correct that the compiler could infer the types. However, that is unlike any other location in Java, and greatly increases the complexity of the proposal. If such type inference is to occur it should be a separate proposal which affects all of Java, such as using a new 'var' keyword.

  38. A while back I proposed another alternative, Clear, Consistent, and Concise Syntax (C3S) for Java (http://www.artima.com/weblogs/viewpost.jsp?thread=182412). It is similar to yours in that it is based around inner classes, but has short syntax. Another difference is that it is an error if you don't supply all the methods in an interface/abstract class. Your example in my proposal is:

    List< String > list = Collections.sort list, method( String str1, String str2 ) {
      str1.length() - str2.length()
    };
