Stefan and I are pleased to announce the release of v0.4 of the First-class Methods: Java-style closures proposal.
Changes
Since v0.3, we have tried to incorporate some of the feedback received on the various forums. The main changes are as follows:
1) Constructor and Field literals. It is now possible to create type-safe, compile-time checked instances of java.lang.reflect.Constructor and Field using FCM syntax:
// method literal:
Method m = Integer#valueOf(int);
// constructor literal:
Constructor<Integer> c = Integer#(int);
// field literal:
Field f = Integer#MAX_VALUE;
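For comparison, here is a rough sketch of the string-based reflection lookups that these literals would stand in for in today's Java. The class and method names (ReflectionLookups and so on) are made up for illustration; only the java.lang.reflect calls are real API.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

// Sketch only: the reflection calls a literal such as Integer#valueOf(int)
// would replace. Names of this class and its methods are hypothetical.
class ReflectionLookups {

    static Method valueOfMethod() throws NoSuchMethodException {
        // A misspelled name here is only detected at run time.
        return Integer.class.getMethod("valueOf", int.class);
    }

    static Constructor<Integer> intConstructor() throws NoSuchMethodException {
        return Integer.class.getConstructor(int.class);
    }

    static Field maxValueField() throws NoSuchFieldException {
        return Integer.class.getField("MAX_VALUE");
    }

    public static void main(String[] args) throws Exception {
        // Static members are invoked or read with a null receiver.
        System.out.println(valueOfMethod().invoke(null, 42));
        System.out.println(maxValueField().getInt(null));
    }
}
```

Each lookup has to declare a checked exception for a member that obviously exists, which is the kind of noise the literal syntax is meant to remove.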
2) Extended method references. Invocable method references have been renamed to method references. There are now four types, static, instance, bound and constructor:
// static method reference:
#(Integer(int)) ref = Integer#valueOf(int);
// constructor reference:
#(Integer(int)) ref = Integer#(int);
// bound method reference:
Integer i = ...
#(int()) ref = i#intValue();
// instance method reference:
#(int(Integer)) ref = Integer#intValue();
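For readers who want something runnable, the four kinds of reference can be approximated with the method references that later versions of Java (8 and up) provide. This is only a sketch for comparison, not FCM itself: the java.util.function interfaces stand in for FCM method types, the class name FourReferenceKinds is made up, and the constructor case uses StringBuilder(String) instead of Integer(int) because the latter is deprecated in recent JDKs.

```java
import java.util.function.Function;
import java.util.function.IntFunction;
import java.util.function.IntSupplier;
import java.util.function.ToIntFunction;

// Sketch only: Java 8 method references standing in for the four FCM kinds.
class FourReferenceKinds {

    // static method reference, like Integer#valueOf(int)
    static final IntFunction<Integer> STATIC_REF = Integer::valueOf;

    // constructor reference (StringBuilder(String) used as a substitute)
    static final Function<String, StringBuilder> CTOR_REF = StringBuilder::new;

    // instance method reference: the receiver becomes the first argument
    static final ToIntFunction<Integer> INSTANCE_REF = Integer::intValue;

    // bound method reference: the receiver is fixed when the reference is created
    static IntSupplier bound(Integer i) {
        return i::intValue;
    }

    public static void main(String[] args) {
        System.out.println(STATIC_REF.apply(5));
        System.out.println(CTOR_REF.apply("ab"));
        System.out.println(INSTANCE_REF.applyAsInt(7));
        System.out.println(bound(9).getAsInt());
    }
}
```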
3) Defined mechanism for accessing local variables. We have chosen a simple mechanism - copy-on-construction. This mechanism copies the values of any local variables accessed by the inner method to the compiler-generated inner class at the point of instantiation (via the constructor).
The effect is that changes made to the local variable are not visible to the inner method. It also means that you do not have write access to the local variables from within the inner method. The benefits are simplicity and thread-safety:
int val = 6;
StringBuffer buf = new StringBuffer();
#(void()) im = #{
  int calc = val;     // OK, val always equals 6
  val = 7;            // compile error!
  buf.append("Hi");   // OK
  buf = null;         // compile error!
};
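A hand-written approximation of copy-on-construction in plain Java may make the rule easier to picture. The CopiedCall class below plays the role of the compiler-generated inner class; its name, its fields, and the seen field are assumptions made for this sketch and are not taken from the proposal.

```java
// Sketch only: a hand-written stand-in for the compiler-generated class.
class CopyOnConstruction {

    static final class CopiedCall implements Runnable {
        private final int val;           // copy of the local, taken in the constructor
        private final StringBuffer buf;  // copy of the reference, not of the object
        int seen;                        // records what run() observed

        CopiedCall(int val, StringBuffer buf) {
            this.val = val;
            this.buf = buf;
        }

        @Override
        public void run() {
            seen = val;        // always the value at construction time
            buf.append("Hi");  // the shared object can still be mutated
        }
    }

    static String demo() {
        int val = 6;
        StringBuffer buf = new StringBuffer();
        CopiedCall im = new CopiedCall(val, buf);
        val = 7;               // changes only the outer local, not the copy
        im.run();
        return im.seen + ":" + val + ":" + buf;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 6:7:Hi
    }
}
```

The outer assignment to val is legal in this sketch because the copy is made explicitly; the inner code still sees 6.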
4) Stub methods are only generated for methods of return type void. This addresses a concern that it was too easy to create invalid implementations using named inner methods.
5) Added method compounds. This addresses the inability to override more than one method at a time. Unlike in an inner class, the inner methods in a method compound remain independent of one another; thus, in this example, the mouseClicked method cannot call the mouseExited method.
MouseListener lnr = #[
  #mouseClicked(MouseEvent ev) {
    // handle mouse click
  }
  #mouseExited(MouseEvent ev) {
    // handle mouse exit
  }
];
6) Added more detail on implementation throughout the document, with specific examples.
We believe that we have addressed the key concerns of the community with this version of FCM. But if you've any more feedback, please let us know!
Making a copy on construction retains the 'final' problem, except that we no longer need 'final' to access the variable.
Also, it doesn't work intuitively for this trivial problem:
int x=5;
invokeLater(#(){
System.out.println(x);
});
x=6;
If the above code runs on the Event Dispatch Thread, one would expect x to be 6, as the later-invoked code would happen, well, later.
I think that if copying is used, the 'final' restriction should remain. However, I don't think that copying should be used. Closures are defined as referring to free variables from the enclosing scope. That's not the same as referring to copies of them.
I know we discussed this already, but I think it's worth mentioning here.
Compile-time type-safe method references may be great, and "#" is probably the right way to do that, but BGGA covers simple expressions (super handy) and block construction (super needed) much better. The closer you get to inner class needs, the more you create just another way of doing the same thing. Method compounds are an extreme case of this. Sure there may be some differences, but how is someone to decide when to use one or the other? TOOWTDI beats TIMTOWTDI. So my opinion is to let BGGA win for closures but to continue to pursue static-typed '#' references to avoid reflection (especially for beanish properties - which is why "bound" can't be used in Java the way it is used in other languages).
I think that this would really mess up the language itself. It would be less readable than C++ and very complicated, although the idea is quite good.
@Ricky, Yes, it's good to mention. I am unconvinced that copy-on-construction semantics will confuse. Well, more accurately, I am convinced that changing the semantics to allow a race condition is a very Bad Idea. I want something that will be reproducible every time in testing, not blow up randomly in production.
@Tom, By block construction I assume you mean control abstraction. By simple expressions I'm not sure what you mean really - the closure form perhaps?
FCM doesn't attempt to address the control abstraction use-case, it only attacks the asynchronous callback use case, which it does rather nicely.
Whether method compounds are really needed is a question I can understand you asking. The community asked for them, so we put them in.
@83, I think I'm confused by your double negative. Remember the semantics matter more than the syntax.
This revision seems like a step in the right direction.
The instance method reference syntax threw me at first:
#(int(Integer)) ref = Integer#intValue();
It took me a moment to see what was going on. I think making the implicit instance an explicit argument to the method is going to confuse a lot of people. This may make sense to those with background in other languages like Python where "this" is always explicit, but it's confusing as hell to a Java programmer.
I don't like the method compounds. By the time you use them, you're only a few keystrokes away from a regular inner class anyway, so you're just adding a layer of complexity (two syntaxes for the same thing) instead of removing it.
I'm glad to see the additional restriction on automatic method generation so only methods with void return types can be generated. However this still smells funny to me. Void methods usually mean "do something" so a nullary method body seems exactly wrong for these cases. See java.applet.AudioClip. It's a weak example but it makes the point. I still feel strongly that nullary methods should not be generated by the compiler, and that when defining a single method from a wide interface the superclass should be explicitly declared:
MouseListener listener = MouseAdapter#mouseClicked(MouseEvent evt) {
// do stuff
};
It's not a lot of extra typing but it makes what you're doing much more clear.
On a side note, it seems to me that the public keyword as in the CICE proposal would seem to solve most of the closure issues people are bringing up.
I don't quite understand why so many hashes and parentheses are needed.
Couldn't the following:
#(Integer(int)) ref = Integer#valueOf(int);
#(Integer(int)) ref = Integer#(int);
#(int(Integer)) ref = Integer#intValue();
Integer i = ...
#(int()) ref = i#intValue();
#(void()) im = #{
...
};
be written like:
Integer(int) ref = Integer#valueOf(int);
Integer(int) ref = Integer#(int);
int(Integer) ref = Integer#intValue();
Integer i = ...
int() ref = i#intValue();
void() im = #{
...
};
without ambiguity or any loss of information? That would seem to me much more elegant.
Stephen, you guys have made an excellent proposal and you have my vote. I also see this go hand in hand with properties. This is exactly as far as I'm willing to go and still call the language Java. BGGA closures is a kitchen sink language proposal IMO and will confuse 90% and be loved by the 10% (which are 90% of us discussing this..)
As @62 also suggested, isn't it possible to get rid of some of the parentheses, especially in the lvalue cases? It looks more elegant. I think we should still have the # though, even in the lvalue case, for clarity.
Cheers,
Mikael Grev
@Mikael & 62: The main reason for keeping the # is for clarity, consistency, and visibility. Of course, one could drop the parens at some places, but it gets quite confusing, if you also add throws statements:
int(File) throws IOException read = ...
@Matthew: The main difference between method compounds and inner classes is the scope of access. And surely one point is to save stating the methods you don't care about, e.g., in a MouseListener.
After reading on Guice, though, another idea might be to state default implementation classes for interfaces, that could be used as instance base for a compound assignment:
@ImplementedBy(MouseAdapter.class)
public interface MouseListener { ... }
Concerning the kitchen sink, FCM has way more features in it than BGGA. BGGA is simple and small by comparison (and note that FCM would be longer than it is if it didn't rely on BGGA to define function type issues).
So here are the use cases right now for closures (so ignoring the other features addressed by FCM):
1. Control abstraction. Great way to make reliable code much easier than in Java today. Also handy for custom looping and so on. BGGA supports this well. (FCM doesn't. Neither do inner classes. Almost no one uses them this way because it's a pain, and code quality suffers as a result.)
2. Synchronous expression callbacks. Great for queries, data manipulation, and related common use cases. BGGA supports this well. (FCM and today's inner classes also don't support this well.)
3. Asynchronous code blocks. Usually done today with inner classes, but it could be easier. FCM attempts to address this, and BGGA perhaps isn't the best way to do this since its control abstraction seems more geared around synchronous cases.
I think method compounds were proposed by people who weren't thinking about existing Java features.
I think issues 1 and 2 are much more needed in Java today than issue 3, and the spec to cover it is shorter. (That is, BGGA is shorter and simpler than FCM.) Maybe some syntax alignment and further thoughts on the subject of asynchronous code blocks could help avoid the need for competing specs.
@Tom, It's interesting that you believe BGGA to be simpler than FCM, perhaps because it's shorter. I think that's a misunderstanding of the semantics of the proposals.
All the key aspects of FCM (except method types) map cleanly onto existing Java semantic constructs, making it mostly syntax sugar. BGGA involves much more in-depth detail with the type system, exceptions, control flow etc.
BTW, I agree that FCM doesn't tackle your 1 and 2 above. What we've argued strongly for so far is that 3 must not be compromised by focussing on 1 and 2. We'll continue to argue that.
Method compounds are an addition the community requested in the spec. They could be removed again just as easily if they prove unpopular.
Finally, can you give an example of your use case 2? It would be useful for us to know what you're thinking of.
It's actually quite close to existing Java semantics. That's actually one of the points of Tennent's Correspondence Principle: don't change semantics. Or rather, the semantics are modeled around Java blocks rather than Java methods. I think I understand it pretty well.
For expressions, I mean like LINQ. As in:
persons.findAll({Person person => person.getAge() >= 18}).select({Person person => person.getName()});
Having to say "return" on all of those makes it a lot less clean looking in my opinion.
But say we say that Tennent's CP is only useful for control structures ("abbreviated syntax"). And they are always void anyway. And say that we allow mutable locals for synchronous cases for all constructs. And say that {=>}/#(){} syntax makes it so "return" is from that block rather than from the outer method, but that losing the final ";" also provides a return value, then we could say this:
persons.findAll(#(Person person) {person.getAge() >= 18}).select(#(Person person) {person.getName()});
But your async FCM cases (including return) would also work. Again, this is saying that block semantics (Tennent's) are most meaningful in block/control-structure syntax.
Oh, and be willing to challenge "the community", including me. None of us are 100% representative of what needs done.
Or maybe just make a new form called "expression methods" that more obviously take just one expression (and therefore no control-flow statements anyway):
persons.findAll(#(Person person) person.getAge() >= 18).select(#(Person person) person.getName());
This is similar to the upcoming shorthand function syntax in ECMAScript 4, and it really doesn't seem so crazy.
So that leaves us with three new syntaxes: anonymous FCM methods using return, expression methods (no return/break/continue), and control-blocks (like BGGA - modeled after new for loops - and they carry the same semantics for break/continue/return as normal control blocks). All three use "this" to mean the outer this object.
I'll give a review sometime maybe of things I think should be taken out of FCM (in addition to the compounds I've repeatedly mentioned).
Here's what else should go:
Named inner methods: Too much opportunity for ambiguity. If this is allowed, might as well allow for implementing interfaces and leaving some methods undefined. I doubt we want to start that furor. Rather "#whatever()" should refer to an instance method bound to "this". (And again, "bound" shouldn't be used in Java this way because of bound bean properties: http://java.sun.com/docs/books/tutorial/javabeans/properties/bound.html. A new word is needed.)
Local variable limitations: I lean most towards BGGA. Best is to make common cases easy as long as there's some way to deal with problems. Thread safety is super easy to cause trouble with anyway. I have yet to see what's so special about local variables compared to everything else that can cause trouble.
Type conversion: I'm still concerned that "a#b()" in one case and then another can be different objects. But maybe my concerns are unfounded (just as I feel the threading concerns are unfounded). Still, it wouldn't be too hard to say "#() a.b()" (or "{=> a.b()}" in BGGA) with simple expression closures. Almost as succinct and causes the same effect but more obviously a new object, I think.
Concerning threading, we can already set the values of fields (or array slots) in outer local variables. And that's effectively what non-final local variables would get compiled to, anyway. So in summary, there are _no_ new thread issues with modifiable outer locals, especially if the volatile modifier is allowed. Really.
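The array-slot point can be shown with a small sketch in current Java. The class name MutableSlot is made up, and everything runs on a single thread here, so the sketch says nothing about the race-condition side of the argument; it only shows that writes through a final reference are already visible in both directions.

```java
// Sketch only: a final local that refers to a mutable one-element array.
class MutableSlot {

    static int demo() {
        final int[] x = {5};          // the reference is final, the slot is not
        Runnable later = new Runnable() {
            @Override
            public void run() {
                x[0] = x[0] + 1;      // the inner class writes to the shared slot
            }
        };
        x[0] = 6;                     // this write is seen by the inner class
        later.run();
        return x[0];                  // 7: the inner class's write is seen outside
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```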
@Tom
Case 2: So with BGGA one saves some keystrokes and it looks leaner in your opinion. That's about it?
Named inner methods: You might be right, it's a concession for allowing concise construction. It may not be necessary, though. For SAMs, anonymous inner methods are good enough. And "bigger" interfaces might deserve some builder approach.
Local variable limitations: These are there for a reason: providing least surprise and protection against races. The example Ricky gave in his post makes it quite clear: if the passed method is called asynchronously, the outcome of the program will be non-deterministic. There is no guarantee that the assignment to x will be executed after the method has been passed. Thus, if you expect the closure to see the 6, you might be surprised that it does not. For primitives, it might not be that problematic. Now, think about a local variable that later will reference an instance of Frame. It might still be null or improperly initialized when the closure accesses it, even though the following code sets it up correctly.
I did not understand the type conversion concerns.
Thanks for your extensive feedback :)
For threading, the issue you described already exists in Java today (again due to local final vars with fields). That's all I'm saying. Might as well accept that and make life easier for the common case.
For expression methods, more clarity and fewer keystrokes can mean a lot. I find this much harder to write and read (extra words, inline semicolons, and so on):
persons.findAll(#(Person person) {return person.getAge() >= 18;}).select(#(Person person) {return person.getName();});
Hm, actually it's up to the developer, if his design is thread safe or not. So the restriction on copy-by-value might be too protective, especially as for common listener cases one can be quite sure that no event is fired before the frame is visible.
I don't think the "extra words" do a lot of harm here. But that may be, too, because I don't like the "put everything in one line" code style. I'd use at least final method type variables or would assign the result of findAll to a final local variable. Lines like this are an NPE hell.
Personally, I find lines like this irritating:
{Person person => person.getAge() >= 18}
At first glance, it looks like some range check gone wrong.
Somewhat agreed. That's why I experimented with syntax like this ("expression methods"):
#(Person person) person.getAge() >= 18
I think I like it better, and as mentioned above, expressions can't contain break/continue/return (can they? - I haven't doublechecked), so that becomes a non-issue for these cases.
And again, note that for anonymous block methods (like currently in FCM), I'd recommend keeping your current semantics for return. That is, these two expressions would be equivalent:
#(Person person) person.getAge() >= 18
#(Person person) {return person.getAge() >= 18;}
See here for the related ES4 discussion: "http://developer.mozilla.org/es4/proposals/expression_closures.html". I think it works pretty well with your syntax, too.
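The equivalence between the expression form and the block form can be tried out with the lambda syntax that later versions of Java adopted. This is a stand-in for comparison and not the syntax under discussion here; the Person class and the names EXPR and BLOCK are hypothetical.

```java
import java.util.function.Predicate;

// Sketch only: expression-bodied and block-bodied forms of the same test.
class ExpressionVsBlock {

    static final class Person {
        private final int age;

        Person(int age) {
            this.age = age;
        }

        int getAge() {
            return age;
        }
    }

    // Expression form: the value of the single expression is the result.
    static final Predicate<Person> EXPR = person -> person.getAge() >= 18;

    // Block form: an explicit return from the block.
    static final Predicate<Person> BLOCK = person -> {
        return person.getAge() >= 18;
    };

    public static void main(String[] args) {
        Person adult = new Person(30);
        Person child = new Person(10);
        System.out.println(EXPR.test(adult) == BLOCK.test(adult));
        System.out.println(EXPR.test(child) == BLOCK.test(child));
    }
}
```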
@Tom, Thanks for the good feedback, it's really helping me formulate what looks good and what doesn't. Your 'expression methods' look interesting, and yes, an expression can't contain a continue/break/return.
Okay, I really like this proposal and I love this syntax:
Method m = Integer#valueOf(int);
Constructor c = Integer#(int);
Field f = Integer#MAX_VALUE;
that's really nice and neat, and gives the # operator very clear semantics.
However, I'm very much confused by this one
#(Integer(int)) ref = Integer#valueOf(int);
#(Integer(int)) ref = Integer#(int);
#(int(Integer)) ref = Integer#intValue();
First of all, it looks extremely cryptic (in the Perl kind of way), and the way the third expression specifies the type of the object the method operates on is additionally confusing. And I also think that # used this way makes the semantics of the operator much less clear.
I can easily understand # as some special . to denote references to methods as opposed to method invocations, but I can't see # as used to mark the data type of the reference, much like you never see an object declared like this:
.Object foo = ...
IMO, you should try to find another syntax to denote the invocable reference type. Maybe something like
Method ref = Integer#valueOf(int);
Constructor ref = Integer#(int);
Method ref = Integer#intValue();
It looks more like Java and less like line noise.
Oh, I forgot to add my name to the previous post.
I will also use this occasion to say that I don't like method compounds. They save a minimal amount of typing, look weird, and probably make the semantics of # even more confusing.
Sorry for triple posting, but I just want to make sure I'm not being misunderstood. I understand that my proposed syntax
Method ref = Integer#valueOf(int);
Constructor ref = Integer#(int);
Method ref = Integer#intValue();
uses Method and Constructor and that it's not entirely appropriate. But maybe introducing either new classes (make up whatever you want) or native types (method, constructor) would be better.
I'm just advocating to stay away from premature keystroke number optimization ;)
Nicolas -- I agree with you that there seems to me no reason to use the hash to mark data-types. It makes the semantics seem *more* inconsistent to me; not to mention the visual clutter.
But what if we had:
Integer(int) ref = Integer#valueOf(int);
Integer(int) ref = Integer#(int);
int(Integer) ref = Integer#intValue();
and *also* the alternative syntax:
Integer ref(int) = Integer#valueOf(int);
Integer ref(int) = Integer#(int);
int ref(Integer) = Integer#intValue();
---
In that case I think this is as Java-like syntax as we're likely to get, as it becomes very close to the syntax already used to define methods....
So if a programmer needs to change functions:
public void initGUI() {
//...
}
public void displayStrings(String str, Date d) {
//...
}
to fields, he could just change them to simply:
public void initGUI() = #{
//...
}
public void displayStrings(String str, Date d)=#{
//...
}
which would be equivalent to:
public void() initGUI = #{
//...
}
public void(String str, Date d) displayStrings =#{
//...
}
Aris' syntax is great! I wish I'd thought of it ;)
I do have a concern about the alternate syntax in the case of object fields. I know that array syntax has two forms:
int[] array = new int[] { 1, 2, 3 };
int array[] = new int[] { 1, 2, 3 };
That may have been the reasoning behind the alternate method reference syntax. But I think reading through a class it may get a little confusing when it comes to the alternate syntax:
public int doStuff(int) = #(int arg){
  // do stuff
}
That is so close to a method declaration that it might be mistaken for it and cause a lot of confusion. Just remove the "(int) = #" and it becomes a method. The other problem is that you can't leave it unassigned, as the compiler would assume it was a method that should have been marked abstract:
public int doStuff(int); // invocable method reference field?
public abstract int doStuff(int arg); // abstract method
This introduces the same type of subtle behaviors we were trying to get away from in C++.
However the first syntax is clearly different from a method definition:
public int(int) doStuff = #(int arg){
  // do stuff
}
It also allows for you to leave those fields uninitialized:
public int(int) doStuff; // Still clearly an invocable method reference
I think that separating the invocable method reference syntax from the method literal and anonymous method syntaxes will make this proposal much more appealing.
These all look acceptable to me, except for the invocable instance method. In my opinion, both
int(Integer) ref = Integer#intValue();
and
int ref(Integer) = Integer#intValue();
are confusing by using the instance class as a parameter of the method. Yes, I know this is common in other languages, but we are not trying to enhance another language.
Also, if you decide to keep the int(Integer) syntax, how do you differentiate the types of these two methods?
public static int foo(Integer bar)
class Integer{
  public int denver(){}
}
I suggest to use
Integer.int() ref = Integer#intValue();
instead, because that clearly marks that the invocable method reference is only invocable on an instance of the Integer class. Static methods or constructors don't need to use the Integer. prefix, simply because they do not need an instance to be invoked on.
The drawback of that particular example, is that the type declaration looks suspiciously like a static function call. But on the other hand, the context makes it pretty clear that it is not.
Perhaps this is a silly question I'm asking, but is there an *actual* need to differentiate between the types of:
public static int foo(Integer bar)
and
class Integer{
  public int denver(){}
}
?
----
Other than that, though at first glance I really liked your syntax of
Integer.int() ref
as seemingly more clear, at second glance I saw there'll be an ambiguity with static inner classes:
class Art {
  Photo getRandomPhoto()
  static class Photo {
    static Photo getBestPhoto()
  }
}
In your example the former method is
Art.Photo() method = Art#getRandomPhoto();
and the latter method is again:
Art.Photo() method = Art.Photo#getBestPhoto();
and yet the one requires an Art object, and the other does not!
With the previous suggestion however the one is
Photo(Art) method = Art#getRandomPhoto();
and the other is
Art.Photo() method = Art.Photo#getBestPhoto();
differentiating the two.
"Integer.int() ref;" could be confusing unless you call it in a similar fashion, e.g. "age.ref()" which I don't think you'd want to do. That would be confusing. Status quo also says the instance goes in as a first arg. See Method#invoke(Object, Object...). Also, it's easier to reuse logic if you don't distinguish between statics with an arg and instance methods without. They effectively take the same number of parameters. Just because one does dynamic lookup, I don't feel a need to distinguish appearances.
Concerning Art and photos, I think the suggestion from Nicolas would have:
Photo() method1 = Art#getRandomPhoto();
Art.Photo() method2 = Art.Photo#getBestPhoto();
And with the previous suggestion, I think it would be:
Photo() method1 = Art#getRandomPhoto();
Photo(Art) method2 = Art.Photo#getBestPhoto();
So, I don't think an ambiguity exists, but I also don't really like the "InstanceType.ReturnType()" suggestion for reasons mentioned earlier.
However, I do agree that we should avoid "#(Photo(Art))" in favor of "Photo(Art)" for type names if the grammar allows for it.
Aris - Nevermind. Saw the trouble now. I was imagining Photo at a top level for some reason (which could be done by import, but ignoring that for now - but I made other mistakes, too). Still, I think for the workable case, it would look like this, right?:
Art.Photo(Art) method1 = Art#getRandomPhoto();
Art.Photo() method2 = Art.Photo#getBestPhoto();
I hope this syntax can be used. Thanks for the recommendation.
I agree with Tom that distinguishing between instance methods with the instance as first parameter from other methods with the same signature is unnecessary. They both have the same signature, why should they need to be treated different?
I also have finally conceded that the instance as first parameter option is worth endorsing as well. Tom's reference to Method.invoke(Object instance, Object...args) finally convinced me.
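The Method.invoke convention referred to here can be seen in a short reflection sketch. The class name InvokeReceiver and its helper methods are made up for illustration; the reflection calls are standard java.lang.reflect API.

```java
import java.lang.reflect.Method;

// Sketch only: how existing reflection passes the receiver to invoke.
class InvokeReceiver {

    static Object callInstance() throws Exception {
        Method intValue = Integer.class.getMethod("intValue");
        // Instance method: the receiver is the first argument to invoke.
        return intValue.invoke(Integer.valueOf(18));
    }

    static Object callStatic() throws Exception {
        Method valueOf = Integer.class.getMethod("valueOf", int.class);
        // Static method: there is no receiver, so the first argument is null.
        return valueOf.invoke(null, 18);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callInstance());
        System.out.println(callStatic());
    }
}
```

Seen this way, int(Integer) for Integer#intValue() just mirrors what reflection already does with the receiver.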
Stephen, Stefan, would you care to let us know your reactions to suggested invocable method reference syntax proposed by Aris?
Unfortunately, I'm not a fan of Aris' syntax. It is too close to method syntax for my taste, and it doesn't handle the throws clause well (if at all).
BTW, I happen to agree with requiring "ReturnType(ArgType) ref;" rather than allowing the alternative "ReturnType ref(ArgType);". It might seem convenient for some cases, but it's less clear.
Good point on the throws clause, but otherwise I think it's more readable than the '#'. Hmm.
@Stephen -- Just throwing out the first idea that came to mind: if the method has a throws clause, perhaps the whole thing could be wrapped in parentheses?
#(Integer(int) throws Throwable) ref = ...
becomes:
(Integer(int) throws Throwable) ref = ...
The rest of the time the outside parentheses could be optional:
(Integer(int)) ref = ...
is equivalent to:
Integer(int) ref = ...
Personally I see the similarity to the regular method syntax as a clarification of intent rather than diluting the syntax. Maybe I'm alone in that regard though.
I agree with Matthew that the similarity of the syntaxes makes perfect sense. You say "It is too close to method syntax", but that is because the semantics is very close too.
(example 1)
public class Foo {
  public int beef(){
    return 0xBEEF;
  }
  public final int(Foo) dead = #{
    return 0xDEAD;
  }
}
now consider these statements:
(example 2)
Foo foo = new Foo();
foo.beef();
foo.dead.invoke(foo);
The last two lines are for all intents and purposes identical. So if I wanted to push the whole thing a little too far, I could say that the traditional method syntax (the first one in example 1) is just syntactic sugar for the first-class method (the second one in example 1).
So yes the syntaxes are similar but that's because the concepts are the same in my opinion (as illustrated by the above example).
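Something close to the two examples can be written in current Java, which may help to compare the two call styles. The IntOfFoo interface is a hypothetical stand-in for the int(Foo) method type, and the anonymous inner class stands in for the #{ ... } body.

```java
// Sketch only: a method next to a field that holds an invocable object.
class Foo {

    // Hypothetical stand-in for the method type int(Foo).
    interface IntOfFoo {
        int invoke(Foo self);
    }

    public int beef() {
        return 0xBEEF;
    }

    public final IntOfFoo dead = new IntOfFoo() {
        @Override
        public int invoke(Foo self) {
            return 0xDEAD;
        }
    };

    public static void main(String[] args) {
        Foo foo = new Foo();
        System.out.println(foo.beef() == 0xBEEF);
        System.out.println(foo.dead.invoke(foo) == 0xDEAD);
    }
}
```

One difference remains in this sketch: a subclass can override beef(), while dead is a final field that is fixed per instance.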
Yes, where my tastes are concerned I'd consider the similarity in syntax to be a feature, not a bug -- indeed I meant it to be as similar as reasonably possible.
And yeah -- Matthew's suggestion concerning throws seems perfectly fine for me.
Thank you all for the inspiring ideas.
For now, Stephen and I are focusing on semantics rather than syntax. A syntax can always be changed later on, while the semantics has to cover all that is needed to provide a useful proposal. And we are not there yet.
Cheers, Stefan
@Stephen, Stefan: I don't think anybody disagrees with the semantics you're proposing. Indeed this is precisely why we're so interested. Maybe the big reason we've been chiming in so much about syntax is that a good syntax will go a long way toward clarifying the semantics.
Could you elaborate on the semantic aspects you are still trying to work out? Are you talking about the "to-do" list at the end of the current proposal?
We are still discussing the semantics of 'synchronous' closures, and control abstraction.
Could you give some examples of the synchronous cases you'd like to tackle?
You might want to have a look at the latest examples on Neal's blog (http://gafter.blogspot.com) describing a quite interesting case. I'd call it sequential rather than synchronous, though.