Saturday, 26 April 2008

Java 7 - For-each loop control access

I've gathered together a few more thoughts on improving the enhanced for-each loops. The basic idea is to take this very popular Java 5 feature and provide the missing parts.

Control access

One of the more frustrating parts of the Java 5 for-each loop is when you are 80% through writing a loop, and you discover you need to remove an item, or require the loop index. At that point, you have to go back and manually change the loop to one of the old formats (in Eclipse at least). This is a hassle.

Perhaps more important is that the older for loops simply aren't as clear in their intentions, aren't very DRY, and are definitely more error-prone. As a result, I've documented my proposal to improve the for-each loop with control access. For example, to access the loop index:

 Collection<String> coll = new ArrayList<String>();

 for (String str : coll : it) {
   System.out.println("Item: " + str + ", Index: " + it.index());
 }
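For comparison, here is roughly what has to be written today to get the same information. This is a minimal sketch in current Java; the class name `IndexedLoop` and the method `describe` are illustrative only and are not part of the proposal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class IndexedLoop {
    // Builds one line per element, tracking the index by hand because
    // the Java 5 for-each loop does not expose it.
    static List<String> describe(Collection<String> coll) {
        List<String> lines = new ArrayList<String>();
        int index = 0;  // manual counter standing in for the proposed it.index()
        for (String str : coll) {
            lines.add("Item: " + str + ", Index: " + index);
            index++;
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe(Arrays.asList("a", "b", "c"))) {
            System.out.println(line);
        }
    }
}
```

The counter lives outside the loop and has to be incremented by hand, which is exactly the kind of bookkeeping the extra 'variable' is meant to remove.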

And here is an example of removing an item:

 List<String> list = new ArrayList<String>();

 for (String str : list : it) {
   if (str == null || it.isFirst()) {
     it.remove();
   }
 }
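The equivalent in current Java means falling back to an explicit Iterator. The following is a sketch only; `RemoveInLoop`, `removeFirstAndNulls` and the `first` flag are names made up for illustration, with the flag standing in for the proposed `isFirst()`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RemoveInLoop {
    // Removes the first element and every null element from the list.
    static void removeFirstAndNulls(List<String> list) {
        boolean first = true;  // stands in for the proposed it.isFirst()
        for (Iterator<String> it = list.iterator(); it.hasNext(); ) {
            String str = it.next();
            if (str == null || first) {
                // Removal has to go through the iterator; calling
                // list.remove(...) here would typically trigger a
                // ConcurrentModificationException on the next step.
                it.remove();
            }
            first = false;
        }
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", null, "b", "c"));
        removeFirstAndNulls(list);
        System.out.println(list);  // prints [b, c]
    }
}
```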

As can be seen, the syntax simply involves adding another colon and a 'variable' name. The 'variable' can be used to access loop control and manipulation functions. Note that the additional colon and 'variable' are of course optional for full backwards compatibility.

The document discusses two strategies for implementing the syntax - either via real Java types or as a language level feature. Please read the document for more information.
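To give a feel for the 'real Java types' strategy, here is one way it might look. This is a sketch under my own assumptions: the `LoopControl` class, its constructor, and the way the loop is expanded are hypothetical and are not taken from the document; only the method names `index()`, `isFirst()` and `remove()` come from the examples above, and `isLast()` is an extra.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical controller type wrapping an ordinary Iterator.
class LoopControl<T> implements Iterator<T> {
    private final Iterator<T> iter;
    private int index = -1;  // position of the element most recently returned

    LoopControl(Iterable<T> source) { this.iter = source.iterator(); }

    public boolean hasNext() { return iter.hasNext(); }
    public T next() { T value = iter.next(); index++; return value; }
    public void remove() { iter.remove(); }

    // Counts iteration steps, so it is not adjusted after remove().
    int index() { return index; }
    boolean isFirst() { return index == 0; }
    boolean isLast() { return !iter.hasNext(); }
}

class LoopControlDemo {
    // Roughly what "for (String str : list : it) { ... }" might expand to
    // under this assumed design.
    static List<String> run(List<String> list) {
        List<String> seen = new ArrayList<String>();
        for (LoopControl<String> it = new LoopControl<String>(list); it.hasNext(); ) {
            String str = it.next();
            if (str == null || it.isFirst()) {
                it.remove();
            } else {
                seen.add(it.index() + ":" + str);
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> list = new ArrayList<String>(Arrays.asList("a", null, "b", "c"));
        System.out.println(run(list));
        System.out.println(list);
    }
}
```

The language-level alternative would avoid allocating a wrapper object like this one, at the cost of the 'variable' not being an ordinary value.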

Maps

I have updated my previous document about extending for-each to maps. The download of the javac implementation remains available from Kijaro.
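For context, this is the current Java idiom that a map-aware for-each is aimed at. It is a sketch of today's code only, with `MapLoop` and `describe` as illustrative names; the proposed map syntax itself is described in the linked document and is not reproduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapLoop {
    // Iterates a map the Java 5 way: loop over the entry set and
    // unpack the key and value by hand.
    static List<String> describe(Map<String, Integer> map) {
        List<String> lines = new ArrayList<String>();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            String key = entry.getKey();
            Integer value = entry.getValue();
            lines.add(key + " = " + value);
        }
        return lines;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is predictable.
        Map<String, Integer> map = new LinkedHashMap<String, Integer>();
        map.put("one", 1);
        map.put("two", 2);
        for (String line : describe(map)) {
            System.out.println(line);
        }
    }
}
```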

Summary

It seems increasingly unlikely that there is time for closures to make it into Java 7. There are also many developers expressing real doubts as to whether the complexity of control invocation is just too much for the venerable Java language.

The alternative is smaller improvements like these two. They provide an easy-to-grasp extension to the popular Java 5 for-each loop that might still be possible to deliver in Java 7. Opinions welcome, as always.

15 comments:

  1. What about labelling the loop and adding a remove keyword that would act like it.remove(); continue?

    loop:
    for (String str : strings) {
      if (str.equals("blah"))
        remove loop;

      // ...
    }

  2. That's a nice KISS improvement. At the same time, I would love if Iterator were equipped with an isFirst - since you often need to perform a special step in the first iteration. Currently, one has to roll his own stuff along the lines of:

    import java.util.Iterator;

    class BoundsIterator<T> implements Iterator<T>
    {
      private final Iterator<T> iter;
      private boolean first = true;

      public BoundsIterator(Iterator<T> iter){ this.iter = iter; }
      public boolean hasNext(){ return iter.hasNext(); }
      public T next(){ first = false; return iter.next(); }
      public void remove(){ iter.remove(); }

      // true only until next() has been called for the first time
      public boolean isFirst(){ return first; }
      public boolean isLast(){ return !hasNext(); }
    }

  3. One possible inconsistency when introducing an iterator into the for-each construct is that iterator semantics cannot be provided for arrays.

    There is no way to really _remove_ an element from an array, and on the other hand the Iterator interface does not let us _replace_ the value that it is pointing to.

  4. You cannot change the existing Iterator interface - remember interfaces can not evolve and are forever!

  5. A List or an array is sort of like a map that happens to be indexed by integers, so it seems like a syntax similar to the proposed for-loop for maps would make a lot of sense:

    List<Foo> items = Lists.newArrayList();
    ...

    for (int index, Foo value : items) {
      System.out.println("item[" + index + "] is " + value);
    }

    It doesn't let you remove an entry, but that seems less common than just needing to know the index.

  6. Brian's idea is interesting. Instead of adding a colon and an untyped designator/variable, why not optionally allow a second argument before the colon that receives either the typed iterator or the index, depending on the argument following the colon? This would let you pick not just some compiler-defined container, but the appropriate one.
    E.g.:
    for (Foo value, int index : fooArray) ...
    for (Foo value, MyFooIterator it : fooList) ...
    Which may not provide additional features like isFirst or isLast, but is backwards compatible and would require few changes to the for-each implementation. It still would allow for a map-for-each implementation.

  7. An anonymous comment indicates that you can't change an existing interface, such as Iterator. That's very true, but you CAN extend it, in a couple of ways:

    1) If you really wanted to eliminate some forms of backwards compatibility, you could change the signature on return calls, to make, for example, List<> return NewIterator<>, where NewIterator<> extends Iterator<>. The problem here is with all the existing List<> implementations outside of the standard JDK, which would have to be rewritten.
    2) You could have NewIterator<> extend Iterator<>, and have some implementations return NewIterator<> rather than just a plain Iterator<>. This has the disadvantage that it requires an instanceof and downcast, but it works.
    3) The next option is to follow the ListIterator<> model and have a separate method to return that (like "NewIterator<> newIterator()").

    However, the key problem here is that the interfaces involved are already so entrenched that any option other than #2 is going to be extremely difficult to get past all the various implementations of the existing interfaces that rely on Iterator<>.

    New for loop semantics, however, are NOT prone to this problem, specifically because they can use autobox in conjunction with #2. So imagine that you're in an enhanced-for loop. The code generation can actually automatically grab the Iterator<> and downcast it (or wrap it if it's not a NewIterator<>) so that it presents as a NewIterator<>. So although the interface is immutable, because enhanced-for semantics no matter what can box and wrap up existing code, this would be a relatively safe extension.

  8. Stephen Colebourne, 26 April 2008 at 11:44

    Matt, The problem with using a label is that the scope of the label is not limited to the lifetime of the loop. In addition, adding a new keyword here seems excessive.

    Dimitar, The document discusses remove on an array (disallowed) and set on an iterable (disallowed, unless it is a List).

    Brian, An interesting notion, but what happens if you implement both the maps and control proposals? Then you need a way to obtain the index of the map iteration. I take the view that this is information about the iteration, not part of the iteration itself. Meta-data if you like.

    Stefan, I have considered allowing dedicated user-provided iterator classes. However, my gut feeling is that they make the concept too complex. This is all about the right level of complexity, and I'm going for KISS here.

    Kirk, Your #2 can be made to work, but again is fairly complex. In the document, option (b) for implementing the change doesn't require NewIterator at all for example.

  9. Stephen, I don't think that compiler-generated or -bound types are simpler than using what's already there. I'd challenge your KISS claim in this case.

  10. > You cannot change the existing Iterator interface - remember interfaces can not evolve and are forever!

    Ok let me rephrase that, I would love if internal methods of the JDK (or at least of collection) returned an improved Iterator (BoundsIterator or IndexIterator). That would not break existing code the way I see it.

  11. I really don't like the idea of the "direct syntax" (option b). Programmers will have to go to the language spec to look up the set of methods on "it", instead of going to the Javadoc APIs, like they do for everything else. It'll have all sorts of unexpected restrictions, like inability to pass it to another method, cast it to Object, use reflection, etc. IDEs (and all other language parsers) would need all sorts of special support for them. It'll be entirely unlike any other Java variable. I can't imagine that the advantages sufficiently justify it.

  12. Stephen Colebourne, 27 April 2008 at 10:31

    Lawrence, The main benefit of the direct approach is performance, which does matter for looping. Bear in mind that array for-each doesn't even generate an Iterator at all ATM. The lookup problem would be solved by IDE code completion for most users.

    A downside of using a real type for the controller variable is that it would normally imply specifying the type in the declaration. I want to avoid that, as it is unnecessary duplication, especially with the generics that would be required.

    One possible solution would be to have the relevant interface(s), but to allow the compiler to generate direct syntax if it could determine that the variable isn't used for anything special (like passing to another method or reflection). Generally, this kind of optimisation has been left to hotspot, but I suspect that wouldn't be good enough here - a question for the hotspot engineers!

  13. For me, for.index and for.isLast would solve 95% of my cases and be simple to implement. Of course it would not solve outer loop access, but that is fine with me since it is easy to save that in a tmp variable before the inner loop.
    If you want to go more advanced, that is fine too, as long as it doesn't mean I will have to choose between fewer characters and performance, and as long as the simplest case stays simple and keeps a clear benefit over the original one.

  15. The index proposal is just perfect, adding remove, isFirst, isLast would really be great.

    Thanks for following up on the for-each post and the positive feedback in the comments.
