Thursday 6 October 2022

Java on-ramp - Fully defined Entrypoints

How do you start a Java program? With a main method of course. But the ceremony around writing such a method is perhaps not the nicest for newcomers to Java.

There has been a bit of dicussion recently about how the "on-ramp" for Java could be made easier. This is the original proposal. Here are follow ups - OpenJDK, Reddit, Hacker news.

Starting point

This is the classic Java Hello Word:

  public class HelloWorld { 
    public static void main(String[] args) { 
      System.out.println("Hello World");
    }
  }

Lots of stuff going on - public, class, a class name, void, arrays, a method, method call. And one of the weirdest things in Java - System.out - a public static field in lower case. Something that is pretty much never seen in normal Java code. (I still remember System.out.println being the most confusing part about getting started in Java 1.0 - why are there two dots and why isn't it out()?)

The official proposal continues to discuss:

  • A more tolerant launch protocol
  • Unnamed classes
  • Predefined static imports for the most critical methods and fields

The ensuing discussion resulted in various suggestions. Having taken some time to reflect on the proposal and discussion, here is my contribution, which is that what is really needed is something more comprehensive.

Entrypoints

When a Java program starts some kind of class file needs to be run. It could be a normal class, but that isn't ideal as we don't really want static/instance variables, subclasses, parent interfaces, access control etc. One suggestion was for it to be a normal interface, but that isn't ideal as we don't want to mark the methods as default or allow abstract methods.

I'd like to propose that what Java needs is a new kind of class declaration for entrypoints.

I don't think this is overly radical. We already have two alternate class declarations - record and enum. They have alternate syntax that compiles to a class file without being explictly a class in source code. What we need here is a new kind - entrypoint - that compiles to a class file but has different syntax rules, just like record and enum do.

I believe this is a fundamentally better approach than the minor tweaks in the official proposal, because it will be useful to developers of all skill levels, including framework authors. ie. it has a much better "bang for buck".

The simplest entrypoint would be:

  // MyMain.java
  entrypoint {
    SystemOut.println("Hello World");
  }

In the source code we have various things:

  • Inferred class name from file name, the class file is MyMain$entrypoint
  • Top-level code, no need to discuss methods initially
  • No access to paraneters
  • New classes SystemOut, SystemIn and SystemErr
  • No constructor, as a new kind of class declaration it doesn't need it

The classes like SystemOut may seem like a small change, but it would have been much simpler for me from 25 years ago to understand. I don't favour more static imports for them (either here or more generally), as I think SystemOut.println("Hello World") is simple enough. More static imports would be too magical in my opinion.

The next steps when learning Java are for the instructor to expand the entrypoint.

  • Add a named method (always private, any name although main() would be common)
  • Add parameters to the method (maybe String[], maybe String...)
  • Add return type to the method (void is default return type)
  • Group code into a block
  • Add additional methods (always private)

Here are some valid examples. Note that instructors can choose the order to explain each feature:

  entrypoint {
    SystemOut.println("Hello World");
  }
  entrypoint main() {
    SystemOut.println("Hello World");
  }
  entrypoint main(String[] args) {
    SystemOut.println("Hello World");
  }
  entrypoint {
    main() {
      SystemOut.println("Hello World");
    }
  }
  entrypoint {
    void main(String[] args) {
      SystemOut.println("Hello World");
    }
  }
  entrypoint {
    main(String[] args) {
      output("Hello World");
    }
    output(String text) {
      SystemOut.println(text);
    }
  }

Note that there are never any static methods, static variables, instance variables or access control. If you need any of that you need a class. Thus we have proper separation of concerns for the entrypoint of systems, which would be Best Practice even for experienced developers.

Progressing to classes

During initial learning, the entrypoint class declaration and normal class declaration would be kept in separate files:

  // MyMain.java
  entrypoint {
    SystemOut.println(new Person().name());
  }
  // Person.java
  public class Person {
    String name() {
      return "Bob";
    }
  }

However, at some point the instructor would embed an entrypoint (of any valid syntax) in a normal class.

  public class Person {
    entrypoint {
      SystemOut.println(new Person().name());
    }
    String name() {
      return "Bob";
    }
  }

We discover that an entrypoint is normally wrapped in a class which then offers the ability to add static/instance variables and access control.

Note that since all methods on the entrypoint are private and the entrypoint is anonymous, there is no way for the rest of the code to invoke it without hackery. Note also that the entrypoint does not get any special favours like an instance of the outer class, thus there is no issue with no-arg constructors - if you want an instance you have to use new (the alternative is unhelpful magic that harms learnability IMO).

Finally, we see that our old-style static main method is revealed to be just a normal entrypoint:

  public class Person {
    entrypoint public static void main(String[] args) {
      SystemOut.println(new Person().name());
    }
    String name() {
      return "Bob";
    }
  }

ie. when a method is declared as public static void main(String[]) the keyword entrypoint is implicitly added.

What experienced developers gain from this is a clearer way to express what the entrypoint actually is, and more power in expressing whether they want the command line arguments or not.

Full-featured entrypoints

Everything above is what most Java developers would need to know. But an entrypoint would actually be a whole lot more powerful.

The basic entrypoint would compile to a class something like this:

  // MyMain.java
  entrypoint startHere(String[] args) {
    SystemOut.println("Hello World");
  }
  // MyMain$entrypoint.class
  public final MyMain$entrypoint implements java.lang.Entrypoint {
    @Override
    public void main(Runtime runtime) {
      runtime.execute(() -> startHere(runtime.args()));
    }
    private void startHere(String[] args) {
      SystemOut.println("Hello World");
    }
  }

Note that it is final and methods are private.

The Entrypoint interface would be:

  public interface java.lang.Entrypoint {
    /**
     * Invoked by the JVM to launch the program.
     * When the method completes, the JDK terminates.
     */
    public abstract void main(Runtime runtime);
  }

The Runtime.execute method would be something like:

  public void execute(ThrowableRunnable runnable) {
    try {
      runnable.run();
      System.exit(0);
    } catch (Throwable ex) {
      ex.printStackTrace();
      System.exit(1);
    }
  }

The JVM would do the following:

  • Load the class file specified on the command line
  • If it implements java.lang.Entrypoint call the no-args constructor and invoke it
  • Else look for a legacy public static void main(String[]), and invoke that

Note that java.lang.Entrypoint is a normal interface that can be implemented by anyone and do anything!

This last point is critical to enhancing the bang-for-buck. I was intriguied by things like Azul CRaC which wants to own the whole lifecycle of the JVM run. Wouldn't that be more powerful if they could control the whole lifecycle through Entrypoint. Another possibile use is to reset the state when an application has finished, allowing the same JVM to be reused - a bit like Function-as-a-Service providers or build system daemons do. (I suspect it may be possible to enhance the entrypoint concept to control the shutdown hooks and to catch things like System.exit but that is beyond the scope of this blog.) For example, here is a theoretical application framework entrypoint:

  // FrameworkApplication.java - an Open Source library
  public interface FrameworkApplication extends Entrypoint {
    public default main(Runtime runtime) {
      // do framework things
      start();
      // do framework things
    }
    public abstract start();
  }

Applications just implement this interface, and they can run it by specifying their own class name on the command line, yet it is a full-featured framework application!

Summary

I argue that the proposal above is more powerful and more useful to experienced developers than the official proposal, while still meeting the core goal of step-by-step on-ramp learning. Do let me know what you think. (Reddit discussion).

No comments: