Stephen Colebourne's blog

Wednesday, 15 October 2025

Type conversion in Java - an alternative proposal for primitive type patterns

A lot of good work has been done by the core Java team on patterns, providing new ways to explore data. The latest extension, in JEP 507, is the idea that primitive type patterns should be supported. Today I'm publishing an alternative approach.

Primitive Types in Patterns, instanceof, and switch

The current proposal is as follows:

  long val = createLong();
  int i = (int) val;  // cast long to int, potentially silently losing information
  switch (val) {
    case int j -> IO.println("Long fits in an int");
    case long v -> IO.println("Long does not fit in an int");
  };

I like the idea of being able to tell if a long value fits into an int without loss. But I hate the syntax.

The key problem is that type patterns check the supertype/subtype relationship, and int is not a subtype of long. The result is code that doesn't seem to make sense.

The official explanation is based on the notion that developers use instanceof String before a cast to String all the time. Thus a parallel can be drawn to have an instanceof int before a cast to int. Effectively the aim is to extend the meaning of type patterns to cover primitive type casts, which are type conversions, not type checks.

I know I am not alone in finding this argument weak, and in finding the proposed syntax highly confusing. But it took a while, and an 8-page document, to figure out exactly why.

Type conversion in Java

In response to the JEP and subsequent discussions, I have written up a detailed proposal for type conversion casts and type conversion patterns. These allow developers to more clearly express the difference between type checks (that check the supertype/subtype relationship) and type conversions (where primitive types are changed to a different type).

The big idea is to introduce a new kind of cast, the type conversion cast that operates like a standard primitive type cast, but throws an exception when the conversion would be lossy.

  long val = createLong();
  int i = (int) val;   // cast long to int, potentially losing information by truncation
  int j = (~int) val;  // cast long to int, throwing TypeConversionException if lossy

As can be seen, the new kind of cast blends in well, but immediately offers safety benefits. Millions of primitive type casts in Java applications are intended to be fully safe, but are unchecked and could silently lose information. Simply adding one extra character ~ would upgrade them to a safer alternative.
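For comparison, here is a minimal sketch of how a checked narrowing can be approximated in Java today. It assumes that Math.toIntExact is an acceptable stand-in for the proposed (~int) cast; the helper name toIntChecked is hypothetical, and the exception type differs from the proposal (ArithmeticException rather than TypeConversionException).

```java
final class CheckedNarrowing {

    // Hypothetical helper standing in for the proposed (~int) cast.
    // Math.toIntExact throws ArithmeticException when the long does not fit in an int.
    static int toIntChecked(long value) {
        return Math.toIntExact(value);
    }

    public static void main(String[] args) {
        long small = 42L;
        long large = Integer.MAX_VALUE + 1L;

        System.out.println(toIntChecked(small));  // prints 42
        System.out.println((int) large);          // plain cast wraps around: prints -2147483648
        try {
            toIntChecked(large);
        } catch (ArithmeticException ex) {
            System.out.println("lossy conversion rejected: " + ex.getMessage());
        }
    }
}
```

The method call is noticeably heavier than a one-character change to a cast, which is the ergonomic gap the proposal is aiming at.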

The related type conversion pattern allows switch and instanceof to safely check for type conversions:

  long val = createLong();
  switch (val) {
    case ~int j -> IO.println("Long fits in an int");
    case long v -> IO.println("Long does not fit in an int");
  };

This allows pattern matching in switch for primitive types, but highlights the different things the pattern is checking.

Type patterns check the supertype/subtype relationship.
Type conversion patterns check point-to-point conversions, defined for primitive types or value types.
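The point-to-point check can be written by hand in today's Java with a round-trip comparison. The sketch below is an approximation under that assumption, not the semantics of either proposal; the class and method names are hypothetical. It behaves roughly like a switch with cases ~byte, ~short, ~int and long, in that order.

```java
final class ConversionCheck {

    // Returns the name of the smallest primitive integral type that can hold
    // the value without loss. A narrowing cast followed by widening back to
    // long gives the original value only when nothing was truncated.
    static String smallestExactType(long value) {
        if (value == (byte) value) {
            return "byte";
        }
        if (value == (short) value) {
            return "short";
        }
        if (value == (int) value) {
            return "int";
        }
        return "long";
    }

    public static void main(String[] args) {
        System.out.println(smallestExactType(127L));            // prints byte
        System.out.println(smallestExactType(128L));            // prints short
        System.out.println(smallestExactType(40_000L));         // prints int
        System.out.println(smallestExactType(3_000_000_000L));  // prints long
    }
}
```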

Please read the full proposal and provide feedback.

Tuesday, 20 February 2024

Pattern match Optional in Java 21

I'm going to describe a trick to get pattern matching on Optional in Java 21, but one you'll probably never actually use.

Using Optional

As of Java 21, pattern matching in Java allows us to check a value against a type, like an instanceof, with a new variable being declared of the correct type. Pattern matching can handle simple types and the deconstruction of records. But pattern matching of arbitrary classes like Optional is not yet supported. (Work to support pattern match methods is ongoing.)

In normal code, the best way to use Optional is with one of the functional methods:

  var addressOpt = findAddress(personId);
  var addressStr = addressOpt
        .map(address -> address.format())
        .orElse("No address available");

This works well in most cases. But sometimes you want to use the Optional with a return statement. This results in code using get() like this:

  var addressOpt = findAddress(personId);
  if (addressOpt.isPresent()) {
    // early return if address found
    return addressOpt.get().format();
  }
  // lots of other code to handle case when address not found

One way to improve this is to write a simple method:

  /**
   * Converts an optional to an iterable for use in the for-each statement.
   *
   * @param <T> the type of optional element
   * @param optional the optional
   * @return an iterable representation of the optional
   */
  public static <T> Iterable<T> inOptional(Optional<T> optional) {
    return optional.isPresent() ? List.of(optional.get()) : List.of();
  }

Which allows the following neat form:

  for (var address : inOptional(findAddress(personId))) {
    // early return if address found
    return address.format();
  }
  // lots of other code to handle case when address not found

This is a great approach providing that you don't need an else branch.

Using Optional with Pattern matching

With Java 21 and pattern matching we have a new way to do this!

  if (findAddress(personId).orElse(null) instanceof Address address) {
    // early return if address found
    return address.format();
  } else {
    // lots of other code to handle case when address not found
  }

This makes use of the fact that instanceof rejects null. Any expression can be used that creates an Optional - just pop .orElse(null) on the end.

Note that this does not work with Java 17 in most cases, because unconditional patterns were not permitted. (In most cases, the expression will return a value of the type being checked for, and the compiler rejects it as being "always true". Of course this was a simplification in earlier Java versions, as the null really needed to be checked for.)

Is this a useful trick?

In reality, this is of marginal use. Where I think it could be of use is in a long if-else chain, for example:

  if (obj instanceof Integer value) {
    // use value
  } else if (obj instanceof Long value) {
    // use value
  } else if (obj instanceof Double value) {
    // use value
  } else if (lookupConverter(obj).orElse(null) instanceof Converter conv) {
    // use converter
  } else {
    // even more options
  }

(It is valuable here as it avoids calling lookupConverter(obj) until necessary.)
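To make the laziness concrete, here is a small self-contained sketch that counts how often the lookup runs. It requires Java 21. The Converter interface, the lookupConverter implementation and the lookups counter are all hypothetical stand-ins invented for this illustration.

```java
import java.util.Optional;

final class LazyOptionalChain {

    // Hypothetical converter type, for illustration only.
    interface Converter {
        String convert(Object value);
    }

    // Counts how many times the (notionally expensive) lookup was performed.
    static int lookups = 0;

    // Hypothetical lookup: only text-like values have a converter.
    static Optional<Converter> lookupConverter(Object obj) {
        lookups++;
        if (obj instanceof CharSequence) {
            return Optional.of(value -> "text:" + value);
        }
        return Optional.empty();
    }

    static String describe(Object obj) {
        if (obj instanceof Integer value) {
            return "int:" + value;
        } else if (obj instanceof Long value) {
            return "long:" + value;
        } else if (lookupConverter(obj).orElse(null) instanceof Converter conv) {
            // Only reached when the cheaper checks above have failed.
            return conv.convert(obj);
        } else {
            return "unsupported";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(5) + " after " + lookups + " lookups");     // int:5 after 0 lookups
        System.out.println(describe("hi") + " after " + lookups + " lookups");  // text:hi after 1 lookups
    }
}
```

An Integer input never touches lookupConverter, which is the property an up-front Optional variable would lose.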

Anyway, it is a fun trick even if you never use it!

One day soon I imagine we will use something like this:

  if (lookupConverter(obj) instanceof Optional.of(Converter conv)) {
    // use converter
  } else {
    // even more options
  }

which I think we'd all agree is the better approach.

Comments at Reddit

Thursday, 6 October 2022

Java on-ramp - Fully defined Entrypoints

How do you start a Java program? With a main method of course. But the ceremony around writing such a method is perhaps not the nicest for newcomers to Java.

There has been a bit of discussion recently about how the "on-ramp" for Java could be made easier. This is the original proposal. Here are follow ups - OpenJDK, Reddit, Hacker news.

Starting point

This is the classic Java Hello World:

  public class HelloWorld { 
    public static void main(String[] args) { 
      System.out.println("Hello World");
    }
  }

Lots of stuff going on - public, class, a class name, void, arrays, a method, method call. And one of the weirdest things in Java - System.out - a public static field in lower case. Something that is pretty much never seen in normal Java code. (I still remember System.out.println being the most confusing part about getting started in Java 1.0 - why are there two dots and why isn't it out()?)

The official proposal continues to discuss:

A more tolerant launch protocol
Unnamed classes
Predefined static imports for the most critical methods and fields

The ensuing discussion resulted in various suggestions. Having taken some time to reflect on the proposal and discussion, here is my contribution, which is that what is really needed is something more comprehensive.

Entrypoints

When a Java program starts some kind of class file needs to be run. It could be a normal class, but that isn't ideal as we don't really want static/instance variables, subclasses, parent interfaces, access control etc. One suggestion was for it to be a normal interface, but that isn't ideal as we don't want to mark the methods as default or allow abstract methods.

I'd like to propose that what Java needs is a new kind of class declaration for entrypoints.

I don't think this is overly radical. We already have two alternate class declarations - record and enum. They have alternate syntax that compiles to a class file without being explicitly a class in source code. What we need here is a new kind - entrypoint - that compiles to a class file but has different syntax rules, just like record and enum do.

I believe this is a fundamentally better approach than the minor tweaks in the official proposal, because it will be useful to developers of all skill levels, including framework authors. ie. it has a much better "bang for buck".

The simplest entrypoint would be:

  // MyMain.java
  entrypoint {
    SystemOut.println("Hello World");
  }

In the source code we have various things:

Inferred class name from file name, the class file is MyMain$entrypoint
Top-level code, no need to discuss methods initially
No access to parameters
New classes SystemOut, SystemIn and SystemErr
No constructor, as a new kind of class declaration it doesn't need it

The classes like SystemOut may seem like a small change, but it would have been much simpler for me from 25 years ago to understand. I don't favour more static imports for them (either here or more generally), as I think SystemOut.println("Hello World") is simple enough. More static imports would be too magical in my opinion.

The next steps when learning Java are for the instructor to expand the entrypoint.

Add a named method (always private, any name although main() would be common)
Add parameters to the method (maybe String[], maybe String...)
Add return type to the method (void is default return type)
Group code into a block
Add additional methods (always private)

Here are some valid examples. Note that instructors can choose the order to explain each feature:

  entrypoint {
    SystemOut.println("Hello World");
  }
  entrypoint main() {
    SystemOut.println("Hello World");
  }
  entrypoint main(String[] args) {
    SystemOut.println("Hello World");
  }
  entrypoint {
    main() {
      SystemOut.println("Hello World");
    }
  }
  entrypoint {
    void main(String[] args) {
      SystemOut.println("Hello World");
    }
  }
  entrypoint {
    main(String[] args) {
      output("Hello World");
    }
    output(String text) {
      SystemOut.println(text);
    }
  }

Note that there are never any static methods, static variables, instance variables or access control. If you need any of that you need a class. Thus we have proper separation of concerns for the entrypoint of systems, which would be Best Practice even for experienced developers.

Progressing to classes

During initial learning, the entrypoint class declaration and normal class declaration would be kept in separate files:

  // MyMain.java
  entrypoint {
    SystemOut.println(new Person().name());
  }
  // Person.java
  public class Person {
    String name() {
      return "Bob";
    }
  }

However, at some point the instructor would embed an entrypoint (of any valid syntax) in a normal class.

  public class Person {
    entrypoint {
      SystemOut.println(new Person().name());
    }
    String name() {
      return "Bob";
    }
  }

We discover that an entrypoint is normally wrapped in a class which then offers the ability to add static/instance variables and access control.

Note that since all methods on the entrypoint are private and the entrypoint is anonymous, there is no way for the rest of the code to invoke it without hackery. Note also that the entrypoint does not get any special favours like an instance of the outer class, thus there is no issue with no-arg constructors - if you want an instance you have to use new (the alternative is unhelpful magic that harms learnability IMO).

Finally, we see that our old-style static main method is revealed to be just a normal entrypoint:

  public class Person {
    entrypoint public static void main(String[] args) {
      SystemOut.println(new Person().name());
    }
    String name() {
      return "Bob";
    }
  }

ie. when a method is declared as public static void main(String[]) the keyword entrypoint is implicitly added.

What experienced developers gain from this is a clearer way to express what the entrypoint actually is, and more power in expressing whether they want the command line arguments or not.

Full-featured entrypoints

Everything above is what most Java developers would need to know. But an entrypoint would actually be a whole lot more powerful.

The basic entrypoint would compile to a class something like this:

  // MyMain.java
  entrypoint startHere(String[] args) {
    SystemOut.println("Hello World");
  }
  // MyMain$entrypoint.class
  public final class MyMain$entrypoint implements java.lang.Entrypoint {
    @Override
    public void main(Runtime runtime) {
      runtime.execute(() -> startHere(runtime.args()));
    }
    private void startHere(String[] args) {
      SystemOut.println("Hello World");
    }
  }

Note that it is final and methods are private.

The Entrypoint interface would be:

  public interface java.lang.Entrypoint {
    /**
     * Invoked by the JVM to launch the program.
     * When the method completes, the JVM terminates.
     */
    public abstract void main(Runtime runtime);
  }

The Runtime.execute method would be something like:

  public void execute(ThrowableRunnable runnable) {
    try {
      runnable.run();
      System.exit(0);
    } catch (Throwable ex) {
      ex.printStackTrace();
      System.exit(1);
    }
  }

The JVM would do the following:

Load the class file specified on the command line
If it implements java.lang.Entrypoint call the no-args constructor and invoke it
Else look for a legacy public static void main(String[]), and invoke that
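The launch steps above can be imitated in today's Java with a little reflection. This is only a sketch under several assumptions: the nested Entrypoint interface is a stand-in for the proposed java.lang.Entrypoint, it receives String[] rather than the proposed Runtime parameter, and launch plays the role of the JVM launcher.

```java
import java.lang.reflect.Method;

final class LauncherSketch {

    // Stand-in for the proposed java.lang.Entrypoint (simplified signature).
    interface Entrypoint {
        void main(String[] args);
    }

    // A class using the new style: implements the interface, has a no-args constructor.
    public static final class NewStyle implements Entrypoint {
        public NewStyle() {
        }

        @Override
        public void main(String[] args) {
            System.out.println("Hello from an Entrypoint");
        }
    }

    // A class using the legacy style: public static void main(String[]).
    public static final class Legacy {
        public static void main(String[] args) {
            System.out.println("Hello from static main");
        }
    }

    // Mimics the launcher: prefer Entrypoint, otherwise fall back to the legacy main.
    static String launch(Class<?> type, String... args) {
        try {
            if (Entrypoint.class.isAssignableFrom(type)) {
                Entrypoint entrypoint = (Entrypoint) type.getDeclaredConstructor().newInstance();
                entrypoint.main(args);
                return "entrypoint";
            }
            Method legacyMain = type.getMethod("main", String[].class);
            legacyMain.invoke(null, (Object) args);
            return "legacy";
        } catch (ReflectiveOperationException ex) {
            throw new IllegalStateException("Cannot launch " + type.getName(), ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(launch(NewStyle.class));  // prints the greeting, then "entrypoint"
        System.out.println(launch(Legacy.class));    // prints the greeting, then "legacy"
    }
}
```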

Note that java.lang.Entrypoint is a normal interface that can be implemented by anyone and do anything!

This last point is critical to enhancing the bang-for-buck. I was intrigued by things like Azul CRaC which wants to own the whole lifecycle of the JVM run. Wouldn't that be more powerful if they could control the whole lifecycle through Entrypoint? Another possible use is to reset the state when an application has finished, allowing the same JVM to be reused - a bit like Function-as-a-Service providers or build system daemons do. (I suspect it may be possible to enhance the entrypoint concept to control the shutdown hooks and to catch things like System.exit but that is beyond the scope of this blog.) For example, here is a theoretical application framework entrypoint:

  // FrameworkApplication.java - an Open Source library
  public interface FrameworkApplication extends Entrypoint {
    public default void main(Runtime runtime) {
      // do framework things
      start();
      // do framework things
    }
    public abstract void start();
  }

Applications just implement this interface, and they can run it by specifying their own class name on the command line, yet it is a full-featured framework application!

Summary

I argue that the proposal above is more powerful and more useful to experienced developers than the official proposal, while still meeting the core goal of step-by-step on-ramp learning. Do let me know what you think. (Reddit discussion).

Saturday, 25 September 2021

Big problems at the timezone database

The last time I wrote about the timezone database on this blog, the database was under threat from a lawsuit. Fortunately that lawsuit went away relatively quickly as the company involved got the message that their action was a big mistake. Unfortunately this time the mess is internal.

Paul Eggert is the project lead of the timezone database hosted at IANA, a position referred to as the TZ Coordinator. He is an expert in the field, having been involved in documenting timezone data for decades. Unfortunately, he is currently ignoring all objections to an action only he seems intent on making to solve an invented problem that only he sees as important.

The database is the world's principal source of timezone information. The data is included in everything from operating systems to smartphones to programming language development kits such as the JDK. While you may never have heard of it, the sheer pervasiveness of the data makes the potential impact of change or damage pretty huge.

The timezone database contains information about how clocks have varied in each region around the world. The mandate of the project is to record this information from 1970 onwards.

Of course, computers being what they are, a function that returns the timezone for a given date can be passed in a pre-1970 date as well as a post-1970 one. For this, and reasons of completeness, the timezone database contains pre-1970 data as well as post-1970 data. If you go to your JDK or operating system and ask for the timezone offset for a pre-1970 date such as 1948-06-01 for the ID "Europe/Oslo" or "Europe/Berlin" you will get an answer:

  DateTimeZone oslo = DateTimeZone.forID("Europe/Oslo");
  System.out.println(oslo.getOffset(new DateTime(1948, 6, 1, 12, 0)));  //prints 3600000
  DateTimeZone berlin = DateTimeZone.forID("Europe/Berlin");
  System.out.println(berlin.getOffset(new DateTime(1948, 6, 1, 12, 0)));  //prints 7200000

The proposed change is to downgrade "Europe/Oslo" to be merely an alias for "Europe/Berlin". The rationale is that since the two regions have the same data post-1970 there should only be one ID. The issue with this is that querying "Europe/Oslo" in pre-1970 (as above) will now return the data from Berlin. ie. the well researched pre-1970 data for Oslo will be replaced by that of Berlin.
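The same comparison can be run with java.time rather than Joda-Time. This is a sketch only: the class and method names are hypothetical, and the pre-1970 result depends on which tzdb version your JDK bundles, so no output is claimed for 1948. What can be relied on is that the two IDs agree for post-1970 dates, which is exactly the rationale given for merging them.

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

final class ZoneHistoryCheck {

    // Offset in seconds that the zone's rules give for a local date-time.
    static int offsetSeconds(String zoneId, LocalDateTime when) {
        return ZoneId.of(zoneId).getRules().getOffset(when).getTotalSeconds();
    }

    // True when both zone IDs report the same offset at the given local date-time.
    static boolean sameOffset(String firstId, String secondId, LocalDateTime when) {
        return offsetSeconds(firstId, when) == offsetSeconds(secondId, when);
    }

    public static void main(String[] args) {
        LocalDateTime before1970 = LocalDateTime.of(1948, 6, 1, 12, 0);
        LocalDateTime after1970 = LocalDateTime.of(2020, 6, 1, 12, 0);

        // Result depends on the tzdb data bundled with the JDK in use.
        System.out.println("1948 same offset: " + sameOffset("Europe/Oslo", "Europe/Berlin", before1970));
        // Both zones are on UTC+02:00 in June 2020.
        System.out.println("2020 same offset: " + sameOffset("Europe/Oslo", "Europe/Berlin", after1970));
    }
}
```

If the first line prints true on your machine, the Oslo history has already been replaced by Berlin's in the data you are running.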

The situation with Joda-Time is even worse. With Joda-Time, aliases (also known as Links) are actively resolved. Before the proposed change this test case passes; after the proposed change it fails:

  assertEquals("Europe/Oslo", DateTimeZone.forID("Europe/Oslo").getID());

In other words, it will be impossible for a Joda-Time user to hold the ID "Europe/Oslo" in memory. This could be pretty catastrophic for systems that rely on timezone management, particularly ones that end up storing that data in a database. To mitigate this to some degree, I've added a test case and released a version that has such a test case in it, but obviously users of Joda-Time that haven't upgraded to the latest version may still see problems if they update the tzdb version.

But what about backwards compatibility? Well it seems that the TZ Coordinator just doesn't see this as being important. To him, it doesn't matter that users may be relying on this data.

(Technically, the data has moved, not been deleted. But the file containing the moved data is never normally used by downstream systems, thus to all intents and purposes it has been deleted.)

The question you might ask is why is Berlin favoured over Oslo? Why can Berlin keep its status and full history, but Oslo gets effectively deleted? The answer is that Berlin has the greater population - would you have guessed that if you didn't read it here?

From my perspective, I cannot see how it is not incredibly unfair that the timezone ID that represents the country of Norway, "Europe/Oslo", is treated as less important than the ID that represents the country of Germany, "Europe/Berlin". Yes, there is a technical rationale around 1970 and population, but that is really pretty arcane.

In fact it goes further than this. The TZ Coordinator does not really believe that there should be an ID for Oslo/Norway at all. (The official project rules say that there should only be one ID for locations where timezone data is the same post-1970. Country borders simply don't matter.)

Some of the 30 IDs proposed for downgrade are "Europe/Oslo", "Europe/Stockholm", "Europe/Copenhagen", "Europe/Amsterdam", "Europe/Luxembourg", "Europe/Monaco", "Atlantic/Reykjavik" and "Indian/Mahe". If you regularly use any of these IDs you may be affected by this change. Iceland is a classic case - "Atlantic/Reykjavik" is to be downgraded in favour of "Africa/Abidjan". Bonus points if you know which country Abidjan is in!

What is driving the change? Well this is where it gets really weird. The TZ Coordinator's argument is that there is a fairness/equity problem if Oslo is allowed to keep its pre-1970 history but other locations (typically in Africa) are not. I have two problems with this. Firstly, I consider it to also be unfair and inequitable that Berlin gets to have pre-1970 history and Oslo does not. Secondly, the correct approach to solving a fairness/equity problem is to level up, not level down. ie. Most leaders would want to improve the worse performers on the team, not force the best performers to be as bad as the worst.

So, we have a change with terrible downstream effects, including a potential fork of a major global data set and broken end-user applications, to make a problem that no one was complaining about a whole lot worse.

I've spent months trying to stop this happening, but appear to have lost the battle. This is despite near unanimity on the mailing list requesting the changes to be paused. Tonight 9 of the 30 changes have been included in release 2021b. These are not the ones affecting Europe.

I still hope that a solution can be found that the TZ Coordinator is happy with that avoids an impact on countries like Norway, Sweden, Denmark or the Netherlands. In the medium term, I hope that funding can be found for the CLDR project to take on the timezone database (as CLDR has a much better record at managing data like this).

Stay tuned as I try and work out how best to resolve this completely unnecessary drama.

Monday, 4 November 2019

Java switch - 4 wrongs don't make a right

The switch statement in Java is being changed. But is it an upgrade or a mess?

Classic switch

The classic switch statement in Java isn't great. Unlike many other parts of Java, it wasn't properly rethought when pulling features across from C all those years ago.

The key flaw is "fall-through-by-default". This means that if you forget to put a break clause in each case, processing will continue on to the next case clause.

Another flaw is that variables are scoped to the entire switch, thus you cannot reuse a variable name in two different case clauses. In addition, a default clause is not required, which leaves readers of the code unclear as to whether a clause was forgotten or not.

And of course there is also the key limitation - that the type to be switched on can only be an integer, enum or string.

 String instruction;
 switch (trafficLight) {
   case RED:
     instruction = "Stop";
   case YELLOW:
     instruction = "Prepare";
     break;
   case GREEN:
     instruction = "Go";
     break;
 }
 System.out.println(instruction);

The code above does not compile because there is no default clause, leaving instruction undefined. But even if it did compile, it would never print "Stop" due to the missing break. In my own coding, I prefer to always put a switch at the end of a method, with each clause containing a return to reduce the risks of switch.
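Here is a compilable variant of that example, as a sketch. The only changes are assumptions made for the demonstration: a TrafficLight enum is declared, the switch is wrapped in a method, and a default clause is added purely so that the compiler accepts it.

```java
final class FallThroughDemo {

    enum TrafficLight { RED, YELLOW, GREEN }

    static String instructionFor(TrafficLight trafficLight) {
        String instruction;
        switch (trafficLight) {
            case RED:
                instruction = "Stop";
                // no break here, so execution falls through into YELLOW
            case YELLOW:
                instruction = "Prepare";
                break;
            case GREEN:
                instruction = "Go";
                break;
            default:
                // added only so that instruction is definitely assigned
                instruction = "Unknown";
        }
        return instruction;
    }

    public static void main(String[] args) {
        System.out.println(instructionFor(TrafficLight.RED));    // prints Prepare, not Stop
        System.out.println(instructionFor(TrafficLight.GREEN));  // prints Go
    }
}
```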

Upgraded switch

As part of Project Amber, switch is being upgraded. But sadly, I'm unconvinced as to the merits of the new design. To be clear, there are some good aspects, but overall I think the solution is overly complex and with some unpleasant syntax choices.

The key aim is to add an expression form, where you can assign the result of the switch to a variable. This is rather like the ternary operator (eg. x != null ? x : ""), which is the expression equivalent of an if statement. An expression form would reduce problems like the undefined variable above, because it makes it more obvious that each branch must result in a variable.

The current plan is to add not one, but three new forms of switch. Yes, three.

Explaining this in a blog post is, unsurprisingly, going to take a while...

Type 1: Statement with classic syntax. As today. With fall-through-by-default. Not exhaustive.
Type 2: Expression with classic syntax. NEW! With fall-through-by-default. Must be exhaustive.
Type 3: Statement with new syntax. NEW! No fall-through. Not exhaustive.
Type 4: Expression with new syntax. NEW! No fall-through. Must be exhaustive.

The headline example (type 4) is of course quite nice:

 // type 4
 var instruction = switch (trafficLight) {
   case RED -> "Stop";
   case YELLOW -> "Prepare";
   case GREEN -> "Go";
 };
 System.out.println(instruction);

As can be seen, the new syntax of type 3 and 4 uses an arrow instead of a colon. And there is no need to use break if the code consists of a single expression. There is also no need for a default clause when using an enum, because the compiler can insert it for you provided you've included all the known enum values. So, if you missed out GREEN, you would get a compile error.

The devil of course is in the detail.

Firstly, a clear positive. Instead of falling through by listing multiple labels, they can be comma-separated:

 // type 4
 var instruction = switch (trafficLight) {
   case RED, YELLOW -> "Stop";
   case GREEN -> "Go";
 };
 System.out.println(instruction);

Straightforward and obvious. And avoids many of the simple fall-through use cases.

What if the code to execute is more complex than an expression?

 // type 4
 var instruction = switch (trafficLight) {
   case RED -> "Stop";
   case YELLOW -> {
     revYourEngine();
     yield "Prepare";
   }
   case GREEN -> "Go";
 };
 System.out.println(instruction);

yield? Shrug. For a long time it was going to be break {expression}, but this clashes with labelled break (a syntax feature that is rarely used).

So what about type 2?

 // type 2
 var instruction = switch (trafficLight) {
   case RED: yield "Stop";
   case YELLOW:
     System.out.println("Prepare");
   case GREEN: yield "Go";
 };
 System.out.println(instruction);

Oops! I forgot the yield. So, an input of YELLOW will output "Prepare" and then fall-through to yield "Go".
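The type 2 behaviour is easy to confirm with a runnable version. This sketch assumes Java 14 or later (where switch expressions are final) and a locally declared TrafficLight enum; the method wrapper is only there so the result can be inspected.

```java
final class ClassicSyntaxExpressionSwitch {

    enum TrafficLight { RED, YELLOW, GREEN }

    // Type 2: a switch expression written with the classic colon syntax.
    static String instructionFor(TrafficLight trafficLight) {
        return switch (trafficLight) {
            case RED: yield "Stop";
            case YELLOW:
                System.out.println("Prepare");
                // no yield here, so execution falls through into GREEN
            case GREEN: yield "Go";
        };
    }

    public static void main(String[] args) {
        // Prints "Prepare" as a side effect, then the yielded value "Go".
        System.out.println(instructionFor(TrafficLight.YELLOW));
    }
}
```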

So, why is it proposed to add a new form of switch that repeats the fall-through-by-default error from 20 years ago? The answer is orthogonality - a 2x2 grid with expression vs statement and fall-through-by-default vs no fall-through.

A key question is whether being orthogonal justifies adding an almost totally useless form of switch (type 2) to the language.

So, type 3 is fine then?

Well, no. Because of the insistence on orthogonality, and thus an insistence on copying the historic rules relating to type 1 statement switch, there is no requirement to list all the case clauses:

 // type 3
 switch (trafficLight) {
   case RED -> doStop();
   case GREEN -> doGo();
 }

So, what happens for YELLOW? The answer is nothing, but as a reader I am left wondering if the code is correct or incomplete. It would be much better if the above was a compile error, with developers forced to write a default clause:

 // type 3
 switch (trafficLight) {
   case RED -> doStop();
   case GREEN -> doGo();
   default -> {}
 }

The official argument is that since type 1 statement switch (the current one) does not force exhaustiveness, neither can the new type 3 statement switch. My view is that keeping a bad design from 20 years ago is a worse sin.
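A runnable sketch of the silent gap in a type 3 switch follows. It assumes Java 14 or later; the TrafficLight enum is declared locally, and the actions are recorded in a list (an assumption made for the demonstration) so that the missing case is visible.

```java
import java.util.ArrayList;
import java.util.List;

final class ArrowStatementSwitch {

    enum TrafficLight { RED, YELLOW, GREEN }

    // Type 3: arrow-form statement switch with no case for YELLOW and no default.
    static List<String> actionsFor(TrafficLight trafficLight) {
        List<String> actions = new ArrayList<>();
        switch (trafficLight) {
            case RED -> actions.add("stop");
            case GREEN -> actions.add("go");
        }
        return actions;
    }

    public static void main(String[] args) {
        System.out.println(actionsFor(TrafficLight.RED));     // prints [stop]
        System.out.println(actionsFor(TrafficLight.YELLOW));  // prints [] with no compile error
    }
}
```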

What else? Well, one thing to bear in mind is that expressions cannot complete early, thus there is no way to return directly from within a switch expression (type 2 or 4). Nor is there a way to continue/break a loop. Trust me when I say there is an endless supply of Java exam questions in the rules that actually apply.

Summarizing the types

Type 1: Classic statement switch

Statement
Fall-through-by-default
return allowed, also continue/break a loop
Single scope for variables
Logic for each case is a sequence of statements potentially ending with break
Not exhaustive - default clause is not required
yield is not allowed

Type 2: Classic syntax expression switch

Expression
Fall-through-by-default
return not allowed, cannot continue/break a loop
Single scope for variables
Logic for each case can be a yield expression, or a sequence of statements potentially ending with yield
Exhaustive - default clause is required
Must use yield to return values

Type 3: Arrow-form statement switch

Statement
Fall-through is not permitted
return allowed, also continue/break a loop
No variable scope problems, logic for each case must be a statement or a block
Not exhaustive - default clause is not required
yield is not allowed

Type 4: Arrow-form expression switch

Expression
Fall-through is not permitted
return not allowed, cannot continue/break a loop
No variable scope problems, logic for each case must be an expression or a block ending with yield
Exhaustive - default clause is required
Must use yield to return values, but only from blocks (it is implied when not a block)

Are you confused yet?

OK, I'm sure I didn't explain everything perfectly, and I may well have made an error somewhere along the way. But the reality is that it is complex, and there are lots of rules hidden in plain sight. Yes, it is orthogonal. But I really don't think that helps in comprehending the feature.

What would I do?

Type 4 switch expressions are fine (although I have real issues with the extension of the arrow syntax from lambdas). My problem is with type 2 and 3. In reality, those two types of switch will be very rare, and thus most developers will never see them. Given this, I believe it would be better to not include them at all. Once this is accepted, there is no point in treating the expression form as a switch, because it won't actually have many connections to the old statement form.

I would drop type 2 and 3, and allow type 4 switch expressions to become what is known as statement expressions. (Another example of a statement expression is a method call, which can be used as an expression or as a statement on a line of its own, ignoring any return value.)

 // Stephen's expression switch
 var instruction = match (trafficLight) {
   case RED: "Stop";
   case YELLOW: "Prepare";
   case GREEN: "Go";
 }
 // Stephen's expression switch used as a statement (technically a statement expression)
 match (instruction) {
   case "Stop": doStop();
   case "Go": doGo();
   default: ;
 }

My approach uses a new keyword match, as I believe extending switch is the wrong baseline to use. Making it a statement expression means that there is only one set of rules - it is always an expression, it is just that you can use it as though it were a statement. What you can't do with my approach is use return in the statement version, because it isn't actually a statement (you can't use return from any expression in Java today, so this would be no different).

Summary

If you ignore the complexity, and just use type 4 switch expressions, the new feature is quite reasonable.

However, in order to add the one form of switch Java needed, we've also got two other duds - type 2 and 3. In my view, the feature needs to go back to the drawing board, but sadly I suspect it is now too late for that.

Friday, 22 March 2019

User-defined literals in Java?

Java has a number of literals for creating values, but wouldn't it be nice if we had more?

Current literals

These are some of the literals we can write in Java today:

integer - 123, 1234L, 0xB8E817, 077, 0b1011_1010
floating point - 45.6f, 56.7d, 7.656e6
string - "Hello world"
char - 'a'
boolean - true, false
null - null

Project Amber is also considering adding multi-line and/or raw string literals.

But there are many other data types that would benefit from literals, such as dates, regex and URIs.

User-defined literals

In my ideal future, I'd like to see Java extended to support some form of user-defined literals. This would allow the author of a class to provide a mechanism to convert a sequence of characters into an instance of that class. It may be clearer to see some examples using one possible syntax (using backticks):

 Currency currency = `GBP`;
 LocalDate date = `2019-03-29`;
 Pattern pattern = `strata\.\w+`;
 URI uri = `https://blog.joda.org/`;

A number of semantic features would be required:

Type inference
Raw processing
Validated at compile-time

Type inference

Type inference is of course a key aspect of literals. It would have to work in a similar way to the existing literals, but with a tweak to handle the new var keyword, i.e. these two would be equivalent:

 LocalDate date = `2019-03-29`;
 var date = LocalDate`2019-03-29`;

The type inference would also work with methods (compile error if ambiguous):

 boolean inferior = isShortMonth(`2019-04-12`);

 public boolean isShortMonth(LocalDate date) { return date.lengthOfMonth() < 31; }

Raw processing

Processing of the literal should not be limited by Java's escape mechanisms. User-defined literals need access to the raw string. Note that this is especially useful for regex, but would also be useful for file paths on Windows:

 // user-defined literals
 var pattern = Pattern`strata\.\w+`;
 // today
 var pattern = Pattern.compile("strata\\.\\w+");

Today, the `\` needs to be escaped, making the regex difficult to read.
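
To make the cost of escaping concrete, here is a small sketch using today's `Pattern.compile`. It shows that the doubled backslashes exist only in the source code: the regex the engine actually sees is the shorter raw form. The class name `RegexEscapeSketch` and the sample inputs are assumptions for illustration.

```java
import java.util.regex.Pattern;

// Sketch only: class name and sample inputs are illustrative.
class RegexEscapeSketch {

    // Each backslash must be doubled in the Java string literal.
    static final Pattern PATTERN = Pattern.compile("strata\\.\\w+");

    static boolean matches(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        // The compiled regex holds single backslashes, i.e. the raw form.
        System.out.println(PATTERN.pattern());
        System.out.println(matches("strata.basics"));  // the dot is matched literally
        System.out.println(matches("strataXbasics"));  // no literal dot, so no match
    }
}
```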

Clearly, the problem with parsing raw literals is that there is no mechanism to escape. But the use cases for user-defined literals tend to have constrained formats, e.g. a date doesn't contain random characters. So, although there might be edge cases where this would be a problem, they would very much be edge cases.

Validated at Compile-time

A key feature of literals is that they are validated at compile-time. You can't use an integer literal to create an int if the value is larger than the maximum allowed int value (2^31 - 1).

User-defined literals also need to be parsed and validated at compile-time. Thus this code would not compile:

 LocalDate date = `2019-02-31`;

Most types which would benefit from literals only accept specific input formats, so being able to check this at compile time would be beneficial.
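
For contrast, this sketch shows what happens today, where the same invalid date compiles without complaint and only fails when the code runs. `LocalDate.parse` is the real JDK method; the class name `RuntimeValidationSketch` and the helper `isValidIsoDate` are assumptions for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Sketch only: class and helper names are illustrative.
class RuntimeValidationSketch {

    // Returns true if the text parses as an ISO-8601 local date.
    // The check can only happen at runtime, because the compiler
    // sees nothing more than a String.
    static boolean isValidIsoDate(String text) {
        try {
            LocalDate.parse(text);
            return true;
        } catch (DateTimeParseException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidIsoDate("2019-03-29"));
        System.out.println(isValidIsoDate("2019-02-31"));  // February has no 31st
    }
}
```

With a compile-time validated literal, the second call would never reach a running program.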

How would it be implemented?

I'm pretty confident that there are various ways it could be done. I'm not going to pick an approach, as ultimately those that control the JVM and language are better placed to decide. Clearly though, there is going to need to be some form of factory method on the user class that performs the parse, with that method invoked by the compiler. And ideally, the results of the parse would be stored in the constant pool rather than re-parsed at runtime.

What I would say is that user-defined literals would almost be a requirement for making value types usable, so something like this may be on the way anyway.

Summary

I like literals. And I would really like to be able to define my own!

Any thoughts?

Wednesday, 9 January 2019

Commercial support for Joda and ThreeTen projects

The Java ecosystem is made up of many individuals, organisations and companies producing many different libraries. Some of the largest projects have long had support options where users of the project, typically corporates, can pay for an enhanced warranty, guaranteed approach to bug fixes and more.

Small projects, run by a single individual or a team, have been unable to offer this service, even if they wanted to. In addition, there is a more subtle problem. The amount a small project could charge is too low for a corporate to pay.

This sounds odd, but was brought home to me by a thread on Twitter.

As the thread indicates, it is basically impossible for a corporate to gift money to a small project, and it is not viable for small projects to meaningfully offer a support contract.

The problem is that not paying the maintainers has negative consequences. Take the recent case where a developer handed his open source project on to another person, who then used it to steal bitcoins.

Pay the maintainers

I believe there is now a solution to the problem. Tidelift.

Tidelift offers companies a monthly subscription to support their open source usage. And they pay some of that income directly to the maintainers of the projects that the company uses.

Maintainers are expected to continue maintaining the project, follow a responsible disclosure process for security issues and check their licensing. Tidelift does not get to control the project roadmap, and maintainers do not have to provide an active helpdesk or consulting. See here for more details.

As such, I'm now offering commercial support for Joda-Time, Joda-Money, Joda-Beans, Joda-Convert, Joda-Collect, ThreeTen-Extra and ThreeTen-Backport via the Tidelift subscription.

This is an extra option for those that want to support the maintainers of open source but haven't been able to find a way to do so until now. The Joda and ThreeTen projects will always be free and available under a permissive licence, so there is no need to worry as a result of this.

Comments welcome.