Monday, 4 November 2019

Java switch - 4 wrongs don't make a right

The switch statement in Java is being changed. But is it an upgrade or a mess?

Classic switch

The classic switch statement in Java isn't great. Unlike many other parts of Java, it wasn't properly rethought when pulling features across from C all those years ago.

The key flaw is "fall-through-by-default". This means that if you forget to put a break clause in each case, processing will continue on to the next case clause.

Another flaw is that variables are scoped to the entire switch, thus you cannot reuse a variable name in two different case clauses. In addition, default clause is not required, which leaves readers of the code unclear as to whether a clause was forgotten or not.

And of course there is also the key limitation - that the type to be switched on can only be an integer, enum or string.

 String instruction;
 switch (trafficLight) {
   case RED:
     instruction = "Stop";
   case YELLOW:
     instruction = "Prepare";
     break;
   case GREEN:
     instruction = "Go";
     break;
 }
 System.out.println(instruction);

The code above does not compile because there is no default clause, leaving instruction undefined. But even if it did compile, it would never print "Stop" due to the missing break. In my own coding, I prefer to always put a switch at the end of a method, with each clause containing a return to reduce the risks of switch.

Upgraded switch

As part of Project Amber, switch is being upgraded. But sadly, I'm unconvinced as to the merits of the new design. To be clear, there are some good aspects, but overall I think the solution is overly complex and with some unpleasant syntax choices.

The key aim is to add an expression form, where you can assign the result of the switch to a variable. This is rather like the ternary operator (eg. x != null ? x : ""), which is the expression equivalent of an if statement. An expression form would reduce problems like the undefined variable above, because it makes it more obvious that each branch must result in a variable.

The current plan is to add not one, but three new forms of switch. Yes, three.

Explaining this in a blog post is, unsurprisingly, going to take a while...

  • Type 1: Statement with classic syntax. As today. With fall-through-by-default. Not exhaustive.
  • Type 2: Expression with classic syntax. NEW! With fall-through-by-default. Must be exhaustive.
  • Type 3: Statement with new syntax. NEW! No fall-through. Not exhaustive.
  • Type 4: Expression with new syntax. NEW! No fall-through. Must be exhaustive.

The headline example (type 4) is of course quite nice:

 // type 4
 var instruction = switch (trafficLight) {
   case RED -> "Stop";
   case YELLOW -> "Prepare";
   case GREEN -> "Go";
 };
 System.out.println(instruction);

As can be seen, the new syntax of type 3 and 4 uses an arrow instead of a colon. And there is no need to use break if the code consists of a single expression. There is also no need for a default clause when using an enum, because the compiler can insert it for you provided you've included all the known enum values. So, if you missed out GREEN, you would get a compile error.

The devil of course is in the detail.

Firstly, a clear positive. Instead of falling through by listing multiple labels, they can be comma-separated:

 // type 4
 var instruction = switch (trafficLight) {
   case RED, YELLOW -> "Stop";
   case GREEN -> "Go";
 };
 System.out.println(instruction);

Straightforward and obvious. And avoids many of the simple fall-through use cases.

What if the code to execute is more complex than an expression?

 // type 4
 var instruction = switch (trafficLight) {
   case RED -> "Stop";
   case YELLOW -> {
     revYourEngine();
     yield "Prepare";
   }
   case GREEN -> "Go";
 };
 System.out.println(instruction);

yield? Shrug. For a long time it was going to be break {expression}, but this clashes with labelled break (a syntax feature that is rarely used).

So what about type 2?

 // type 2
 var instruction = switch (trafficLight) {
   case RED: yield "Stop";
   case YELLOW:
     System.out.println("Prepare");
   case GREEN: yield "Go";
 };
 System.out.println(instruction);

Oops! I forgot the yield. So, an input of YELLOW will output "Prepare" and then fall-through to yield "Go".

So, why is it proposed to add a new form of switch that repeats the fall-through-by-default error from 20 years ago? The answer is orthogonality - a 2x2 grid with expression vs statement and fall-through-by-default vs no fall-through.

A key question is whether being orthogonal justifies adding a almost totally useless form of switch (type 2) to the language.

So, type 3 is fine them?

Well, no. Because of the insistence on orthogonality, and thus an insistence of copying the historic rules relating to type 1 statement switch, there is no requirement to list all the case clauses:

 // type 3
 switch (trafficLight) {
   case RED -> doStop();
   case GO -> doGo();
 }

So, what happens for YELLOW? The answer is nothing, but as a reader I am left wondering if the code is correct or incomplete. It would be much better if the above was a compile error, with developers forced to write a default clause:

 // type 3
 switch (trafficLight) {
   case RED -> doStop();
   case GO -> doGo();
   default -> {}
 }

The official argument is that since type 1 statement switch (the current one) does not force exhaustiveness, neither can the new type 3 statement switch. My view is that keeping a bad design from 20 years ago is a worse sin.

What else? Well, one thing to bear in mind is that expressions cannot complete early, thus there is no way to return directly from within a switch expression (type 2 or 4). Nor is there a way to continue/break a loop. Trust me when I say there is an endless supply of Java exam questions in the rules that actually apply.

Summarizing the types

Type 1: Classic statement switch

  • Statement
  • Fall-through-by-default
  • return allowed, also continue/break a loop
  • Single scope for variables
  • Logic for each case is a sequence of statements potentially ending with break
  • Not exhaustive - default clause is not required
  • yield is not allowed

Type 2: Classic syntax expression switch

  • Expression
  • Fall-through-by-default
  • return not allowed, cannot continue/break a loop
  • Single scope for variables
  • Logic for each case can be a yield expression, or a sequence of statements potentially ending with yield
  • Exhaustive - default clause is required
  • Must use yield to return values

Type 3: Arrow-form statement switch

  • Statement
  • Fall-through is not permitted
  • return allowed, also continue/break a loop
  • No variable scope problems, logic for each case must be a statement or a block
  • Not exhaustive - default clause is not required
  • yield is not allowed

Type 4: Arrow-form expression switch

  • Expression
  • Fall-through is not permitted
  • return not allowed, cannot continue/break a loop
  • No variable scope problems, logic for each case must be an expression or a block ending with yield
  • Exhaustive - default clause is required
  • Must use yield to return values, but only from blocks (it is implied when not a block)

Are you confused yet?

OK, I'm sure I didn't explain everything perfectly, and I may well have made an error somewhere along the way. But the reality is that it is complex, and there are lots of rules hidden in plain sight. Yes, it is orthogonal. But I really don't think that helps in comprehending the feature.

What would I do?

Type 4 switch expressions are fine (although I have real issues with the extension of the arrow syntax from lambdas). My problem is with type 2 and 3. In reality, those two types of switch will be very rare, and thus most developers will never see them. Given this, I believe it would be better to not include them at all. Once this is accepted, there is no point in treating the expression form as a switch, because it won't actually have many connections to the old statement form.

I would drop type 2 and 3, and allow type 4 switch expressions to become what is known as statement expressions. (Another example of a statement expression is a method call, which can be used as an expression or as a statement on a line of its own, ignoring any return value.)

 // Stephen's expression switch
 var instruction = match (trafficLight) {
   case RED: "Stop";
   case YELLOW: "Prepare";
   case GO: "Go";
 }
 // Stephen's expression switch used as a statement (technically a statement expression)
 match (instruction) {
   case "Stop": doStop();
   case "Go": doGo();
   default: ;
 }

My approach uses a new keyword match, as I believe extending switch is the wrong baseline to use. Making it a statement expression means that there is only one set of rules - it is always an expression, it is just that you can use it as though it were a statement. What you can't do with my approach is use return in the statement version, because it isn't actually a statement (you can't use return from any expression in Java today, so this would be no different).

Summary

If you ignore the complexity, and just use type 4 switch expressions, the new feature is quite reasonable.

However, in order to add the one form of switch Java needed, we've also got two other duds - type 2 and 3. In my view, the feature needs to go back to the drawing board, but sadly I suspect it is now too late for that.

Friday, 22 March 2019

User-defined literals in Java?

Java has a number of literals for creating values, but wouldn't it be nice if we had more?

Current literals

These are some of the literals we can write in Java today:

  • integer - 123, 12s, 1234L, 0xB8E817, 077, 0b1011_1010
  • floating point - 45.6f, 56.7d, 7.656e6
  • string - "Hello world"
  • char - 'a'
  • boolean - true, false
  • null - null

Project Amber is also considering adding multi-line and/or raw string literals.

But there are many other data types that would benefit from literals, such as dates, regex and URIs.

User-defined literals

In my ideal future, I'd like to see Java extended to support some form of user-defined literals. This would allow the author of a class to provide a mechanism to convert a sequence of characters into an instance of that class. It may be clearer to see some examples using one possible syntax (using backticks):

 Currency currency = `GBP`;
 LocalDate date = `2019-03-29`;
 Pattern pattern = `strata\.\w+`;
 URI uri = `https://blog.joda.org/`;

A number of semantic features would be required:

  • Type inference
  • Raw processing
  • Validated at compile-time
Type inference

Type inference is of course a key aspect of literals. It would have to work in a similar way to the existing literals, but with a tweak to handle the new var keyword. ie. these two would be equivalent:

 LocalDate date = `2019-03-29`;
 var date = LocalDate`2019-03-29`;

The type inference would also work with methods (compile error if ambiguous):

 boolean inferior = isShortMonth(`2019-04-12`);

 public boolean isShortMonth(LocalDate date) { return date.lengthOfMonth() < 31; }
Raw processing

Processing of the literal should not be limited by Java's escape mechanisms. User-defined literals need access to the raw string. Note that this is especially useful for regex, but would also be useful for files on Windows:

 // user-defined literals
 var pattern = Pattern`strata\.\w+`;
 // today
 var pattern = Pattern.compile("strata\\.\\w+");

Today, the `\` needs to be escaped, making the regex difficult to read.

Clearly, the problem with parsing raw literals is that there is no mechanism to escape. But the use cases for user-defined literals tend to have constrained formats, eg. a date doesn't contain random characters. So, although there might be edge cases where this would be a problem, they would vert much be edge cases.

Validated at Compile-time

A key feature of literals is that they are validated at compile-time. You can't use an integer literal to create an int if the value is larger than the maximum allowed integer (2^31).

User-defined literals also need to be parsed and validated at compile-time too. Thus this code would not compile:

 LocalDate date = `2019-02-31`;

Most types which would benefit from literals only accept specific input formats, so being able to check this at compile time would be beneficial.

How would it be implemented?

I'm pretty confident that there are various ways it could be done. I'm not going to pick an approach, as ultimately those that control the JVM and language are better placed to decide. Clearly though, there is going to need to be some form of factory method on the user class that performs the parse, with that method invoked by the compiler. And ideally, the results of the parse would be stored in the constant pool rather than re-parsed at runtime.

What I would say is that user-defined literals would almost be a requirement for making value types usable, so something like this may be on the way anyway.

Summary

I like literals. And I would really like to be able to define my own!

Any thoughts?

Wednesday, 9 January 2019

Commercial support for Joda and ThreeTen projects

The Java ecosystem is made up of many individuals, organisations and companies producing many different libraries. Some of the largest projects have long had support options where users of the project, typically corporates, can pay for an enhanced warranty, guaranteed approach to bug fixes and more.

Small projects, run by a single individual or a team, have been unable to offer this service, even if they wanted to. In addition, there is a more subtle problem. The amount a small project could charge is too low for a corporate to pay.

This sounds odd, but was brought home to me by this thread on twitter:

As the thread indicates, it is basically impossible for a corporate to gift money to a small project, and it is not viable for small projects to meaningfully offer a support contract.

The problem is that not paying the maintainers has negative consequences. Take the recent case where a developer handed his open source project on to another person, who then used it to steal bitcoins.

Pay the maintainers

I believe there is now a solution to the problem. Tidelift.

Tidelift offers companies a monthly subscription to support their open source usage. And they pay some of that income directly to the maintainers of the projects that the company uses.

Maintainers are expected to continue maintaining the project, follow a responsible disclosure process for security issues and check their licensing. Tidelift does not get to control the project roadmap, and maintainers do not have to provide an active helpdesk or consulting. See here for more details.

As such, I'm now offering commercial support for Joda-Time, Joda-Money, Joda-Beans, Joda-Convert, Joda-Collect, ThreeTen-Extra, ThreeTen-backport via the Tidelift subscription.

This is an extra option for those that want to support the maintainers of open source but haven't been able to find a way to do so until now. The Joda and ThreeTen projects will always be free and available under a permissive licence, so there is no need to worry as a result of this.

Comments welcome.