Friday, 22 March 2019

User-defined literals in Java?

Java has a number of literals for creating values, but wouldn't it be nice if we had more?

Current literals

These are some of the literals we can write in Java today:

  • integer - 123, 1234L, 0xB8E817, 077, 0b1011_1010
  • floating point - 45.6f, 56.7d, 7.656e6
  • string - "Hello world"
  • char - 'a'
  • boolean - true, false
  • null - null

Project Amber is also considering adding multi-line and/or raw string literals.

But there are many other data types that would benefit from literals, such as dates, regex and URIs.

User-defined literals

In my ideal future, I'd like to see Java extended to support some form of user-defined literals. This would allow the author of a class to provide a mechanism to convert a sequence of characters into an instance of that class. It may be clearer to see some examples using one possible syntax (using backticks):

 Currency currency = `GBP`;
 LocalDate date = `2019-03-29`;
 Pattern pattern = `strata\.\w+`;
 URI uri = `https://blog.joda.org/`;
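
For comparison, this is roughly how the same four values are created in Java today, using existing factory methods from the JDK. Each call parses its string at runtime, so a malformed value only shows up as an exception when the line executes. The class name `TodayLiterals` and its `describe` method are just scaffolding for the sketch.

```java
import java.net.URI;
import java.time.LocalDate;
import java.util.Currency;
import java.util.regex.Pattern;

public class TodayLiterals {
    // Builds the four values from the examples above using today's factory methods
    public static String describe() {
        Currency currency = Currency.getInstance("GBP");     // IllegalArgumentException if not an ISO 4217 code
        LocalDate date = LocalDate.parse("2019-03-29");      // DateTimeParseException if invalid
        Pattern pattern = Pattern.compile("strata\\.\\w+");  // every regex backslash must be doubled
        URI uri = URI.create("https://blog.joda.org/");      // IllegalArgumentException if malformed
        return currency + " " + date + " " + pattern + " " + uri;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```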

A number of semantic features would be required:

  • Type inference
  • Raw processing
  • Validated at compile-time

Type inference

Type inference is of course a key aspect of literals. It would have to work in a similar way to the existing literals, but with a tweak to handle the new var keyword (added in Java 10). That is, these two would be equivalent:

 LocalDate date = `2019-03-29`;
 var date = LocalDate`2019-03-29`;
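
One way to picture this: the compiler takes the target type (the declared `LocalDate`, or the `LocalDate` prefix when using var) and hands the raw text to a factory registered for that type. The sketch below simulates that lookup at runtime. `LiteralRegistry` and its `literal` method are hypothetical names invented for illustration, not a proposed API.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for however the compiler would find a type's literal factory
public final class LiteralRegistry {
    private static final Map<Class<?>, Function<String, ?>> FACTORIES = new HashMap<>();

    static {
        FACTORIES.put(LocalDate.class, LocalDate::parse);
    }

    // Simulates `T x = `raw`;` where T is the inferred target type
    public static <T> T literal(Class<T> targetType, String raw) {
        Function<String, ?> factory = FACTORIES.get(targetType);
        if (factory == null) {
            throw new IllegalArgumentException("No literal factory for " + targetType.getName());
        }
        return targetType.cast(factory.apply(raw));
    }

    public static void main(String[] args) {
        LocalDate date = literal(LocalDate.class, "2019-03-29");  // LocalDate date = `2019-03-29`;
        var date2 = literal(LocalDate.class, "2019-03-29");       // var date = LocalDate`2019-03-29`;
        System.out.println(date.equals(date2));                   // true
    }
}
```

In the real feature the lookup would happen in the compiler, so a missing factory would be a compile error rather than an exception.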

The type inference would also work with methods (compile error if ambiguous):

 boolean shortMonth = isShortMonth(`2019-04-12`);

 public boolean isShortMonth(LocalDate date) { return date.lengthOfMonth() < 31; }
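
In today's Java the same call needs an explicit parse. With a user-defined literal, the compiler would instead infer `LocalDate` from the parameter type of `isShortMonth`. The method is made static here only so the sketch runs on its own.

```java
import java.time.LocalDate;

public class ShortMonthExample {
    // A month is "short" if it has fewer than 31 days
    public static boolean isShortMonth(LocalDate date) {
        return date.lengthOfMonth() < 31;
    }

    public static void main(String[] args) {
        // Today's equivalent of isShortMonth(`2019-04-12`)
        boolean shortMonth = isShortMonth(LocalDate.parse("2019-04-12"));
        System.out.println(shortMonth); // true: April has 30 days
    }
}
```

If an overload such as isShortMonth(YearMonth) also existed, the literal could match both parameter types, which is the ambiguous case that would need to be a compile error.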

Raw processing

Processing of the literal should not be limited by Java's escape mechanisms. User-defined literals need access to the raw string. Note that this is especially useful for regex, but would also be useful for file paths on Windows:

 // user-defined literals
 var pattern = Pattern`strata\.\w+`;
 // today
 var pattern = Pattern.compile("strata\\.\\w+");

Today, each `\` needs to be escaped as `\\`, making the regex difficult to read.
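
To make the escaping cost concrete: the string below is written with four backslashes in the source, but at runtime it holds the eleven-character regex `strata\.\w+`, which is exactly what the raw literal would contain. The class and method names are only scaffolding for the sketch.

```java
import java.util.regex.Pattern;

public class RegexEscaping {
    // As written in Java source today: every regex backslash is doubled
    static final String SOURCE = "strata\\.\\w+";
    static final Pattern PATTERN = Pattern.compile(SOURCE);

    public static boolean matches(String input) {
        return PATTERN.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(SOURCE);                    // strata\.\w+ (what the raw literal would hold)
        System.out.println(matches("strata.basics"));  // true: "\." matches only a literal dot
        System.out.println(matches("strataXbasics"));  // false
    }
}
```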

Clearly, the problem with parsing raw literals is that there is no escape mechanism, so a literal cannot contain its own closing delimiter. But the use cases for user-defined literals tend to have constrained formats, e.g. a date doesn't contain random characters. So, although there might be edge cases where this would be a problem, they would very much be edge cases.

Validated at Compile-time

A key feature of literals is that they are validated at compile-time. You can't use an integer literal to create an int if the value is larger than the maximum allowed integer (2^31 - 1, i.e. 2147483647).
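
A small illustration of that existing compile-time check. The commented-out line is rejected by the compiler, while the same digits are fine as a long literal. The one special case is 2147483648, which may appear only as the operand of unary minus.

```java
public class IntLiteralLimits {
    static final int MAX = 2147483647;        // 2^31 - 1: the largest int literal
    static final int MIN = -2147483648;       // 2147483648 is allowed only after unary minus
    // static final int TOO_BIG = 2147483648; // does not compile: too large for an int
    static final long AS_LONG = 2147483648L;  // fine with the L suffix

    public static void main(String[] args) {
        System.out.println(MAX == Integer.MAX_VALUE);  // true
        System.out.println(MIN == Integer.MIN_VALUE);  // true
        System.out.println(AS_LONG == (1L << 31));     // true
    }
}
```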

User-defined literals would also need to be parsed and validated at compile-time. Thus this code would not compile:

 LocalDate date = `2019-02-31`;

Most types which would benefit from literals only accept specific input formats, so being able to check this at compile time would be beneficial.
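
Today the invalid date is only caught when the code runs. `LocalDate.parse` uses `DateTimeFormatter.ISO_LOCAL_DATE`, which resolves strictly, so February 31st fails with a `DateTimeParseException` at runtime. A user-defined literal would move this same check to compile time. The helper `isValidDate` is just for illustration.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

public class RuntimeValidation {
    // Returns true if the text is a valid ISO-8601 local date
    public static boolean isValidDate(String text) {
        try {
            LocalDate.parse(text); // strict ISO parse, throws on invalid input
            return true;
        } catch (DateTimeParseException ex) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValidDate("2019-03-29")); // true
        System.out.println(isValidDate("2019-02-31")); // false: February has no 31st
    }
}
```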

How would it be implemented?

I'm pretty confident that there are various ways it could be done. I'm not going to pick an approach, as ultimately those that control the JVM and language are better placed to decide. Clearly though, there is going to need to be some form of factory method on the user class that performs the parse, with that method invoked by the compiler. And ideally, the results of the parse would be stored in the constant pool rather than re-parsed at runtime.
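
As a rough model of that idea, and not a claim about how javac or the JVM would actually do it, the sketch below separates a "compile" step, which calls the factory method for every literal and reports all failures before anything runs, from a "run" step that only reads pre-parsed values out of a table, much as a constant pool entry is resolved once rather than re-parsed. `CompiledLiterals`, `compile` and `constant` are hypothetical names.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model: "compile" parses every literal once, "run" only looks values up
public final class CompiledLiterals<T> {
    private final List<T> pool = new ArrayList<>();

    private CompiledLiterals() {}

    // "Compile time": invoke the factory on each raw literal, failing if any are invalid
    public static <T> CompiledLiterals<T> compile(Function<String, T> factory, List<String> rawLiterals) {
        CompiledLiterals<T> result = new CompiledLiterals<>();
        List<String> errors = new ArrayList<>();
        for (String raw : rawLiterals) {
            try {
                result.pool.add(factory.apply(raw));
            } catch (RuntimeException ex) {
                errors.add("invalid literal `" + raw + "`: " + ex.getMessage());
            }
        }
        if (!errors.isEmpty()) {
            throw new IllegalStateException(String.join("\n", errors));
        }
        return result;
    }

    // "Run time": no parsing, just an index into the pre-built pool
    public T constant(int index) {
        return pool.get(index);
    }

    public static void main(String[] args) {
        CompiledLiterals<LocalDate> dates =
                compile(LocalDate::parse, List.of("2019-03-29", "2019-04-12"));
        System.out.println(dates.constant(1)); // 2019-04-12

        try {
            compile(LocalDate::parse, List.of("2019-02-31"));
        } catch (IllegalStateException ex) {
            System.out.println("rejected before running: " + ex.getMessage());
        }
    }
}
```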

What I would say is that user-defined literals would almost be a requirement for making value types usable, so something like this may be on the way anyway.

Summary

I like literals. And I would really like to be able to define my own!

Any thoughts?

8 comments:

  1. Clojure's tagged literals provide roughly this functionality. There are a couple of built-in literals (for instants and UUIDs) and you can define your own:

    https://clojure.org/reference/reader#_built_in_tagged_literals

    1. yes, in fact you may be interested to see some literals for the jsr310 objects: https://github.com/henryw374/time-literals

  2. Improving existing constructors with built-in literals is fine but user defined? Terrible idea. Java is designed to be readable without having to dig into class internals. You probably think arbitrary operator overloading is a good idea too. When will clowns learn?

    1. No I don't think arbitrary operator overloading is a good idea. Controlled operator overloading by implementing an interface or similar is IMO a good thing, and is probably going to be a necessary part of value types, just like user-defined literals.

  3. I like the general idea. I'd go beyond literals, however, and add support for type-safe unit expressions. For instance, here are some examples involving date/time you might appreciate:


    var time = 2:35 PM;
    var date = 2018-May-22;
    var dateTime = 2018-May-22 2:35:53:909 PM PST;
    var jdate = Heisei 27-May-19;
    var duration = 1 day - 1 s;


    Other examples include all manner of dimension[ish] types. A dimension can be physical like Length, Time, Weight, etc. A dimension can also be abstract or intangible such as Money or Memory. Basically a dimension represents something that can be measured in specific units, like:


    var investment = 5000 EUR + 10000 USD;
    var work = 49 kg m/s/s * 10m;
    var work2 = 490 Joules;


    The idea involves "binding expressions", which consist of two adjacent expressions where the type of one expression supports a binding relationship with the type of the other expression. The expression type providing the binding relationship is called a binder type; it implements postfixBind() and/or prefixBind(). The return type of a bind method determines the type of the resulting binding expression, thus the result of evaluating a binding expression is the result of either method call.

    One nice outcome of the implementation is that any mix of units in a given calculation "just works." This is because a value of any given dimension is always maintained in "standard" units e.g., meters for the Length dimension, but also has a "preferred" display unit type.

    In recent months I've entertained the idea of adding this feature to Java via the Manifold framework as a javac plugin. I will at least port the foundation class library and add dimensional operator overloading via interface where you can do a more verbose Java version of 50 mph * 3 hr:


    var velocity = new Velocity(50, Mile, Hour);
    var time = new Time(3, Hour);
    var distance = velocity * time; // a Length of 150 miles
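
Without operator overloading or a compiler plugin, the "standard units" idea described in this comment can be approximated in plain Java using methods instead of operators. This is a hypothetical sketch, not Manifold's API: `Length`, `Time`, `Velocity` and their methods are invented here. Each value is stored in SI units (metres, seconds, metres per second) and only converted when displayed.

```java
import java.util.Locale;

// Hypothetical dimension types, each stored internally in SI units
public class DimensionsSketch {
    static final double METRES_PER_MILE = 1609.344;
    static final double SECONDS_PER_HOUR = 3600.0;

    static final class Time {
        final double seconds;
        Time(double seconds) { this.seconds = seconds; }
        static Time hours(double h) { return new Time(h * SECONDS_PER_HOUR); }
    }

    static final class Length {
        final double metres;
        Length(double metres) { this.metres = metres; }
        double inMiles() { return metres / METRES_PER_MILE; }
    }

    static final class Velocity {
        final double metresPerSecond;
        Velocity(double metresPerSecond) { this.metresPerSecond = metresPerSecond; }
        static Velocity milesPerHour(double mph) {
            return new Velocity(mph * METRES_PER_MILE / SECONDS_PER_HOUR);
        }
        // velocity * time = length; mixed input units work because both are held in SI
        Length times(Time time) { return new Length(metresPerSecond * time.seconds); }
    }

    public static void main(String[] args) {
        Length distance = Velocity.milesPerHour(50).times(Time.hours(3));
        System.out.printf(Locale.ROOT, "%.1f miles%n", distance.inMiles()); // 150.0 miles
    }
}
```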

  4. Scala also has this functionality using macros. Here is a quick example for uritemplates, https://github.com/Slakah/uritemplate4s/#usage .

    The library circe (https://github.com/circe/circe) also provides compile time checks for the validity of json, i.e.

    import io.circe.literal._

    json""" {"foo": "bar"} """ // compiles
    json""" {"foo": "bar} """ // doesn't compile

  5. Look at what PostgreSQL has for that: https://www.postgresql.org/docs/8.1/sql-syntax.html section 4.1.2.5. Constants of Other Types
    'string'::type

    your syntax could be similar to that:

    var date = `2018-May-22`::Date("YYYY-mm-DD");

    where Date is a resulting type of literal `2018-May-22`, "YYYY-mm-DD" is a parsing format.
    effectively this syntax is constructor invocation, though could be checked at compile time.

    more complex expressions could be written similar to that

    var foo = `1,2,3,4`::vector * 2 + `2,3,4,5`::vector * 5;

    that is an expression which could be evaluated at compile time, where vector is a user-defined type which has a constructor accepting a literal for parsing, though the parsing is done at compile time.
    all that could be possible if user defined literal is defined in the language syntax, which could be accompanied with type,
    and parsing rules are specified for that type -- like constructor for vector accepting user defined literal. one caveat though, implementation of vector accepting that literal should be available at compile time, and compiled results should be bound to that implementation at compile time. substitution of vector implementation for execution may break parsing contract.
    shoot me an email if you think it is acceptable --> dmitriy_pichugin@yahoo.com
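
For comparison, the two examples in this comment correspond roughly to a runtime parse with an explicit format and a type with a parsing factory. In java.time the pattern for `2018-May-22` would be `uuuu-MMM-dd` with an English locale, because `YYYY`, `mm` and `DD` there mean week-based year, minute-of-hour and day-of-year. `IntVector` and its `parse`, `times` and `plus` methods are hypothetical, and everything here happens at runtime rather than compile time.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Locale;

public class TypedLiteralSketch {
    // `2018-May-22`::Date("YYYY-mm-DD") expressed with java.time pattern letters
    static LocalDate parseMonthNameDate(String raw) {
        return LocalDate.parse(raw, DateTimeFormatter.ofPattern("uuuu-MMM-dd", Locale.ENGLISH));
    }

    // Hypothetical vector type whose factory parses a comma-separated literal
    static final class IntVector {
        final int[] values;
        IntVector(int[] values) { this.values = values; }

        static IntVector parse(String raw) {
            return new IntVector(Arrays.stream(raw.split(",")).mapToInt(s -> Integer.parseInt(s.trim())).toArray());
        }

        IntVector times(int k) {
            return new IntVector(Arrays.stream(values).map(v -> v * k).toArray());
        }

        IntVector plus(IntVector other) {
            if (other.values.length != values.length) throw new IllegalArgumentException("length mismatch");
            int[] sum = new int[values.length];
            for (int i = 0; i < values.length; i++) sum[i] = values[i] + other.values[i];
            return new IntVector(sum);
        }

        @Override public String toString() { return Arrays.toString(values); }
    }

    public static void main(String[] args) {
        System.out.println(parseMonthNameDate("2018-May-22")); // 2018-05-22

        // `1,2,3,4`::vector * 2 + `2,3,4,5`::vector * 5
        IntVector foo = IntVector.parse("1,2,3,4").times(2).plus(IntVector.parse("2,3,4,5").times(5));
        System.out.println(foo); // [12, 19, 26, 33]
    }
}
```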

