Monday, 4 November 2019

Java switch - 4 wrongs don't make a right

The switch statement in Java is being changed. But is it an upgrade or a mess?

Classic switch

The classic switch statement in Java isn't great. Unlike many other parts of Java, it wasn't properly rethought when pulling features across from C all those years ago.

The key flaw is "fall-through-by-default". This means that if you forget to put a break clause in each case, processing will continue on to the next case clause.

Another flaw is that variables are scoped to the entire switch, so you cannot reuse a variable name in two different case clauses. In addition, a default clause is not required, which leaves readers of the code unclear as to whether a clause was forgotten or not.
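To make the scoping flaw concrete, here is a minimal sketch. The class and method names are hypothetical, invented for illustration. Wrapping each case in braces is the usual workaround: without them, the second declaration of text would not compile.

```java
// Hypothetical demo class; names are illustrative only.
class SwitchScopeDemo {

    static String describe(int code) {
        String result;
        switch (code) {
            case 1: {
                // The braces give this case its own scope for 'text'.
                String text = "one";
                result = text.toUpperCase();
                break;
            }
            case 2: {
                // Without the braces in both cases, redeclaring 'text' here
                // would be a compile error, because the whole switch body
                // is a single scope.
                String text = "two";
                result = text.toUpperCase();
                break;
            }
            default:
                result = "OTHER";
                break;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(describe(1)); // prints "ONE"
        System.out.println(describe(2)); // prints "TWO"
        System.out.println(describe(3)); // prints "OTHER"
    }
}
```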

And of course there is also the key limitation - that the type to be switched on can only be an integer, enum or string.

 String instruction;
 switch (trafficLight) {
   case RED:
     instruction = "Stop";
   case YELLOW:
     instruction = "Prepare";
     break;
   case GREEN:
     instruction = "Go";
     break;
 }
 System.out.println(instruction);

The code above does not compile because there is no default clause, leaving instruction undefined. But even if it did compile, it would never print "Stop" due to the missing break. In my own coding, I prefer to always put a switch at the end of a method, with each clause containing a return to reduce the risks of switch.
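As a rough sketch of that habit, the traffic light example could be written as a method whose last statement is the switch, with every clause returning. The class name, the method name and the choice to throw from default are my assumptions for illustration.

```java
// Hypothetical demo class; names are illustrative only.
class ClassicSwitchDemo {
    enum TrafficLight { RED, YELLOW, GREEN }

    static String instructionFor(TrafficLight trafficLight) {
        // Every clause returns, so there is nothing to fall through to
        // and no variable that could be left unassigned.
        switch (trafficLight) {
            case RED:
                return "Stop";
            case YELLOW:
                return "Prepare";
            case GREEN:
                return "Go";
            default:
                // Assumption: an unknown value is treated as a bug.
                throw new IllegalArgumentException("Unknown light: " + trafficLight);
        }
    }

    public static void main(String[] args) {
        System.out.println(instructionFor(TrafficLight.RED)); // prints "Stop"
    }
}
```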

Upgraded switch

As part of Project Amber, switch is being upgraded. But sadly, I'm unconvinced as to the merits of the new design. To be clear, there are some good aspects, but overall I think the solution is overly complex, with some unpleasant syntax choices.

The key aim is to add an expression form, where you can assign the result of the switch to a variable. This is rather like the ternary operator (eg. x != null ? x : ""), which is the expression equivalent of an if statement. An expression form would reduce problems like the undefined variable above, because it makes it more obvious that each branch must result in a value.

The current plan is to add not one, but three new forms of switch. Yes, three.

Explaining this in a blog post is, unsurprisingly, going to take a while...

  • Type 1: Statement with classic syntax. As today. With fall-through-by-default. Not exhaustive.
  • Type 2: Expression with classic syntax. NEW! With fall-through-by-default. Must be exhaustive.
  • Type 3: Statement with new syntax. NEW! No fall-through. Not exhaustive.
  • Type 4: Expression with new syntax. NEW! No fall-through. Must be exhaustive.

The headline example (type 4) is of course quite nice:

 // type 4
 var instruction = switch (trafficLight) {
   case RED -> "Stop";
   case YELLOW -> "Prepare";
   case GREEN -> "Go";
 };
 System.out.println(instruction);

As can be seen, the new syntax of type 3 and 4 uses an arrow instead of a colon. And there is no need to use break if the code consists of a single expression. There is also no need for a default clause when using an enum, because the compiler can insert it for you provided you've included all the known enum values. So, if you missed out GREEN, you would get a compile error.

The devil of course is in the detail.

Firstly, a clear positive. Instead of falling through by listing multiple labels, they can be comma-separated:

 // type 4
 var instruction = switch (trafficLight) {
   case RED, YELLOW -> "Stop";
   case GREEN -> "Go";
 };
 System.out.println(instruction);

Straightforward and obvious. And avoids many of the simple fall-through use cases.

What if the code to execute is more complex than an expression?

 // type 4
 var instruction = switch (trafficLight) {
   case RED -> "Stop";
   case YELLOW -> {
     revYourEngine();
     yield "Prepare";
   }
   case GREEN -> "Go";
 };
 System.out.println(instruction);

yield? Shrug. For a long time it was going to be break {expression}, but this clashes with labelled break (a syntax feature that is rarely used).

So what about type 2?

 // type 2
 var instruction = switch (trafficLight) {
   case RED: yield "Stop";
   case YELLOW:
     System.out.println("Prepare");
   case GREEN: yield "Go";
 };
 System.out.println(instruction);

Oops! I forgot the yield. So, an input of YELLOW will output "Prepare" and then fall-through to yield "Go".
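The fall-through can be observed in a small self-contained sketch, assuming a compiler where switch expressions are a final feature (Java 14 or later). The class name and the log parameter are hypothetical. The log stands in for System.out so that the side effect can be checked.

```java
// Hypothetical demo class; names are illustrative only.
class ClassicExpressionSwitchDemo {
    enum TrafficLight { RED, YELLOW, GREEN }

    static String instructionFor(TrafficLight trafficLight, StringBuilder log) {
        return switch (trafficLight) {
            case RED: yield "Stop";
            case YELLOW:
                log.append("Prepare");
                // No yield here, so control falls through to GREEN.
            case GREEN: yield "Go";
        };
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        String instruction = instructionFor(TrafficLight.YELLOW, log);
        System.out.println(log);         // prints "Prepare"
        System.out.println(instruction); // prints "Go"
    }
}
```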

So, why is it proposed to add a new form of switch that repeats the fall-through-by-default error from 20 years ago? The answer is orthogonality - a 2x2 grid with expression vs statement and fall-through-by-default vs no fall-through.

A key question is whether being orthogonal justifies adding an almost totally useless form of switch (type 2) to the language.

So, type 3 is fine then?

Well, no. Because of the insistence on orthogonality, and thus an insistence on copying the historic rules relating to type 1 statement switch, there is no requirement to list all the case clauses:

 // type 3
 switch (trafficLight) {
   case RED -> doStop();
   case GREEN -> doGo();
 }

So, what happens for YELLOW? The answer is nothing, but as a reader I am left wondering if the code is correct or incomplete. It would be much better if the above was a compile error, with developers forced to write a default clause:

 // type 3
 switch (trafficLight) {
   case RED -> doStop();
   case GREEN -> doGo();
   default -> {}
 }

The official argument is that since type 1 statement switch (the current one) does not force exhaustiveness, neither can the new type 3 statement switch. My view is that keeping a bad design from 20 years ago is a worse sin.
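To see the silent no-op for a missing case, here is a minimal runnable sketch, assuming Java 14 or later. The class and method names are hypothetical, and appending to a StringBuilder stands in for the doStop() and doGo() calls. It uses GREEN, the constant from the enum earlier in the post.

```java
// Hypothetical demo class; names are illustrative only.
class ArrowStatementSwitchDemo {
    enum TrafficLight { RED, YELLOW, GREEN }

    static String act(TrafficLight trafficLight) {
        StringBuilder actions = new StringBuilder();
        // Type 3: arrow-form statement switch. YELLOW is not listed,
        // yet this compiles and YELLOW simply does nothing.
        switch (trafficLight) {
            case RED -> actions.append("stop");
            case GREEN -> actions.append("go");
        }
        return actions.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + act(TrafficLight.RED) + "]");    // prints "[stop]"
        System.out.println("[" + act(TrafficLight.YELLOW) + "]"); // prints "[]"
    }
}
```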

What else? Well, one thing to bear in mind is that expressions cannot complete early, thus there is no way to return directly from within a switch expression (type 2 or 4). Nor is there a way to continue/break a loop. Trust me when I say there is an endless supply of Java exam questions in the rules that actually apply.
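A small sketch of the difference, assuming Java 14 or later. The class name, the method name and the counting logic are hypothetical. The statement form may continue the enclosing loop, while the expression form may only produce a value or throw.

```java
import java.util.List;

// Hypothetical demo class; names are illustrative only.
class SwitchExpressionControlFlowDemo {
    enum TrafficLight { RED, YELLOW, GREEN }

    static int countMoves(List<TrafficLight> lights) {
        int moves = 0;
        for (TrafficLight light : lights) {
            // Statement form (type 3): 'continue' for the enclosing loop is allowed.
            switch (light) {
                case RED -> { continue; }
                default -> { }
            }
            // Expression form (type 4): writing 'continue;' or 'return moves;'
            // in a branch here would be a compile error. Throwing is permitted.
            moves += switch (light) {
                case YELLOW -> 0;
                case GREEN -> 1;
                case RED -> throw new IllegalStateException("RED is skipped above");
            };
        }
        return moves;
    }

    public static void main(String[] args) {
        List<TrafficLight> lights = List.of(
            TrafficLight.RED, TrafficLight.GREEN, TrafficLight.YELLOW, TrafficLight.GREEN);
        System.out.println(countMoves(lights)); // prints "2"
    }
}
```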

Summarizing the types

Type 1: Classic statement switch

  • Statement
  • Fall-through-by-default
  • return allowed, also continue/break a loop
  • Single scope for variables
  • Logic for each case is a sequence of statements potentially ending with break
  • Not exhaustive - default clause is not required
  • yield is not allowed

Type 2: Classic syntax expression switch

  • Expression
  • Fall-through-by-default
  • return not allowed, cannot continue/break a loop
  • Single scope for variables
  • Logic for each case can be a yield expression, or a sequence of statements potentially ending with yield
  • Exhaustive - default clause is required, unless every enum value is listed
  • Must use yield to return values

Type 3: Arrow-form statement switch

  • Statement
  • Fall-through is not permitted
  • return allowed, also continue/break a loop
  • No variable scope problems, logic for each case must be a statement or a block
  • Not exhaustive - default clause is not required
  • yield is not allowed

Type 4: Arrow-form expression switch

  • Expression
  • Fall-through is not permitted
  • return not allowed, cannot continue/break a loop
  • No variable scope problems, logic for each case must be an expression or a block ending with yield
  • Exhaustive - default clause is required, unless every enum value is listed
  • Must use yield to return values, but only from blocks (it is implied when not a block)
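For comparison, the same mapping can be written in all four forms side by side. This is a sketch assuming Java 14 or later. The class and method names are hypothetical, and throwing from default in the two statement forms is my own choice, made so that the variable is definitely assigned.

```java
// Hypothetical demo class; names are illustrative only.
class FourSwitchFormsDemo {
    enum TrafficLight { RED, YELLOW, GREEN }

    // Type 1: classic statement switch. Needs break and a default.
    static String type1(TrafficLight light) {
        String instruction;
        switch (light) {
            case RED:
                instruction = "Stop";
                break;
            case YELLOW:
                instruction = "Prepare";
                break;
            case GREEN:
                instruction = "Go";
                break;
            default:
                throw new IllegalArgumentException("Unknown light: " + light);
        }
        return instruction;
    }

    // Type 2: classic syntax expression switch. Uses yield, can fall through.
    static String type2(TrafficLight light) {
        return switch (light) {
            case RED: yield "Stop";
            case YELLOW: yield "Prepare";
            case GREEN: yield "Go";
        };
    }

    // Type 3: arrow-form statement switch. No fall-through, default still needed here.
    static String type3(TrafficLight light) {
        String instruction;
        switch (light) {
            case RED -> instruction = "Stop";
            case YELLOW -> instruction = "Prepare";
            case GREEN -> instruction = "Go";
            default -> throw new IllegalArgumentException("Unknown light: " + light);
        }
        return instruction;
    }

    // Type 4: arrow-form expression switch. Exhaustive over the enum, no default.
    static String type4(TrafficLight light) {
        return switch (light) {
            case RED -> "Stop";
            case YELLOW -> "Prepare";
            case GREEN -> "Go";
        };
    }

    public static void main(String[] args) {
        for (TrafficLight light : TrafficLight.values()) {
            System.out.println(type1(light) + " " + type2(light) + " "
                + type3(light) + " " + type4(light));
        }
    }
}
```

All four methods return the same result for each enum value. Only the rules a reader has to keep in mind differ.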

Are you confused yet?

OK, I'm sure I didn't explain everything perfectly, and I may well have made an error somewhere along the way. But the reality is that it is complex, and there are lots of rules hidden in plain sight. Yes, it is orthogonal. But I really don't think that helps in comprehending the feature.

What would I do?

Type 4 switch expressions are fine (although I have real issues with the extension of the arrow syntax from lambdas). My problem is with type 2 and 3. In reality, those two types of switch will be very rare, and thus most developers will never see them. Given this, I believe it would be better to not include them at all. Once this is accepted, there is no point in treating the expression form as a switch, because it won't actually have many connections to the old statement form.

I would drop type 2 and 3, and allow type 4 switch expressions to become what is known as statement expressions. (Another example of a statement expression is a method call, which can be used as an expression or as a statement on a line of its own, ignoring any return value.)

 // Stephen's expression switch
 var instruction = match (trafficLight) {
   case RED: "Stop";
   case YELLOW: "Prepare";
   case GREEN: "Go";
 };
 // Stephen's expression switch used as a statement (technically a statement expression)
 match (instruction) {
   case "Stop": doStop();
   case "Go": doGo();
   default: ;
 }

My approach uses a new keyword match, as I believe extending switch is the wrong baseline to use. Making it a statement expression means that there is only one set of rules - it is always an expression, it is just that you can use it as though it were a statement. What you can't do with my approach is use return in the statement version, because it isn't actually a statement (you can't use return from any expression in Java today, so this would be no different).

Summary

If you ignore the complexity, and just use type 4 switch expressions, the new feature is quite reasonable.

However, in order to add the one form of switch Java needed, we've also got two other duds - type 2 and 3. In my view, the feature needs to go back to the drawing board, but sadly I suspect it is now too late for that.

5 comments:

  1. I gave a few talks about how the switch syntax for match expressions is a mess and a failure of the process, recommending that many people (and not just known naysayers) interact with the design process early and often. To nobody's surprise, few programmers were aware of how the process worked, and I could only find one person who thought that having four forms of switch was a defensible idea. Most people stared at me with disbelief, no matter how often I said "I am not making this up."

  2. After reading this full post, Colebourne's criticisms are well thought out and valid... but I still think the JDK 13 switch expressions are fine as-is, and a nice improvement over the Java status-quo. Maybe it would be slightly less confusing to omit the type 2 scenario, given that few people will use it. I appreciate Colebourne as a devil's advocate to language changes to get the details right, but I trust Brian Goetz to make excellent decisions.

    My bigger Java gripe is that from a developer perspective, most of the big useful features discussed from ~5 years ago are still perpetually years away from shipping: like Project Valhalla and Loom (fibers) and data classes and pattern matching and reified generics are still way out in the future. The biggest plus about Java 9-13 is that the community has finally adopted Java 8 as the bare minimum and can fully utilize JDK 8 features.

  3. I like the way to use the match keyword, it will be like a new start.

  4. Good article. I think you are correct in your statement that they should have omitted type 2 and type 3. Too many different options make for a mess. Especially odd-ball variants that only a select few developers will ever use. It will end up affecting maintainability of code.

  5. Do you want to force exhaustiveness, or force the use of a default clause? The former would be fine (and is practically the current state, as AFAIK every IDE produces a warning).

    Forcing the default clause would be IMHO wrong. I deliberately leave it out whenever possible, as this gives me a warning when the enum gets a new member.

    I guess, I like your proposal, but it might be hard to introduce a new keyword. What about reusing "case" like

    case (instruction) {
    "Stop": doStop();
    "Go": doGo();
    default: ;
    }

    I guess, there's no need for keeping the "case-colon" part of the old syntax.

