Sunday 24 July 2011

Reversed type declarations

I can't write about Kotlin without first talking about the folly of "reversed" type declarations. Sadly this disease has afflicted Scala, Gosu and now Kotlin (but perhaps it's not yet too late for Kotlin ;-).

When I see a new language, one of the first things I look at is the parameters and variable declarations. For this blog I'll refer to them as "standard" (like Java, Ceylon and Fantom) and "reversed" (like Pascal, Scala, Gosu and Kotlin).

  // standard
  Type variableName
  
  // reversed
  variableName : Type

Here I compare a Java and Kotlin method, although the principle is similar for Gosu, Scala and quite a few others.

  // Java
  public void process(String str) { ... }
  
  // Reversed lang (Kotlin or any similar language like Scala or Gosu)
  fun process(str : String) { ... }

When I see the latter, I cringe. It's usually a sign that the language isn't going to win me over.

So why the big deal?

For a start, there is one extra syntax character, the colon. Given that this is unnecessary (as shown by Java), it continues to be surprising to me that languages that aim to be less verbose than Java begin with something that is more verbose.

Ignoring the annoyance of having to type the extra character, the two examples above are still fundamentally readable. The human eye (specifically my human eye, but hey I'm being opinionated...) can cope with the extra colon and "reversed" declaration order. However, add complexity and the situation changes:

  // Java
  public void process(String str, int total, List<String> input) { ... }

  // Reversed lang
  fun process(str : String, total : Int, input : List<String>) { ... }

As more parameters are added, the eye has more difficulty in picking out where one parameter ends and the next one starts. This is simply because the colon is more visually arresting than the comma, so as the eye scans the line it breaks up the parameters using the colons, not the commas. Thus at a glance I see "str", "String, total", "Int, input", "List<String>". In fact, my eye sometimes doesn't see the commas at all, thus I get "str", "String total", "Int input", "List<String>" which is horribly broken.

In order to actually read the information, I have to slow down and take longer. But when designing a programming language, it is rapid readability that matters. Slowing me down on reading is a Bad Thing. So, beyond being unnecessary, the extra colons are actually making the code significantly harder to read (and write!).

But it gets worse:

  // Java
  public Future<Person> process(String str, List<String> input) { ... }

  // Reversed lang
  fun process(str : String, input : List<String>) : Future<Person> { ... }

Now, we have a return type, again separated by a colon. The use of the same character (yes that is another verbosity character I have to type) makes it especially difficult to visually parse. For me, the strength of the colon overrides the end bracket, thus I end up seeing "Future<Person>" as a parameter. Effectively my eye is parsing the line in a fraction of a second, but it gets to the end and has to double back to "push" the last thing it saw onto the "return type" stack. Try this one if you're struggling to see the issue. Note how the types and colons dominate and flow into one another, causing the distinctions as to their meaning (type of a parameter vs return type) to be lost:

  fun process(a : Int, b : Int, c : Int) : Int { ... }

As an aside, let's look at default parameters, which many new languages support (using an invented equivalent syntax for Java):

  // Java
  public Future<Person> process(String str, List<String> input, int total = 0) { ... }

  // Reversed lang
  fun process(str : String, input : List<String>, total : Int = 0) : Future<Person> { ... }

Now it is really broken! Now I've got "Int = 0" staring me in the eye, which really is not what the programmer was trying to express. Again, that visual barrier of the colon, together with the type, makes it very hard to connect the actual parameter name "total" with the value it has "0".
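For reference, the "total = 0" form in the Java line above is invented for the comparison; real Java has no default parameter values. A minimal sketch of the usual workaround, overloading, follows. The class name DefaultsSketch and the String return value (standing in for Future<Person>) are assumptions made purely to keep the example self-contained, and List.of assumes Java 9 or later.

```java
import java.util.List;

public class DefaultsSketch {
    // Full form: every parameter is passed explicitly.
    public static String process(String str, List<String> input, int total) {
        return str + ":" + input.size() + ":" + total;
    }

    // Overload standing in for the invented "int total = 0" default.
    public static String process(String str, List<String> input) {
        return process(str, input, 0);
    }

    public static void main(String[] args) {
        System.out.println(process("a", List.of("x", "y")));    // prints a:2:0
        System.out.println(process("a", List.of("x", "y"), 5)); // prints a:2:5
    }
}
```

Note that in both overloads the parameter name stays directly next to whatever follows it, which is the property being argued for above.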

The real test for the syntax is the more complex case of higher order functions. This varies a lot by language, so lots of examples (hopefully accurate - I don't have time for lots of testing). I'm simulating some syntax for Java and Fantom:

  // Java
  public <T, R> List<R> transform(List<T> list, Transformer<T, R> transformer) { ... }
  public <T, R> List<R> transform(List<T> list, #R(T) transformer) { ... }  // lambda strawman
  public <T, R> List<R> transform(List<T> list, {T => R} transformer) { ... }  // BGGA
  
  // Ceylon
  shared List<R> transform<T, R>(List<T> list, R transformer(T)) { ... }
  
  // Fantom
  R[] transform<T, R>(T[] list, |T -> R| transformer) { ... }  // simulated syntax

  // Gosu
  function transform<T, R>(list : List<T>, transformer(T) : R) : List<R> { ... }
  
  // Kotlin
  fun <T, R> transform(list : List<T>, transformer : fun(T) : R) : List<R> { ... }
  
  // Scala
  def transform[T, R](list : List[T], transformer: T => R) : List[R]

The Java strawman and BGGA examples and Fantom pseudo-example demonstrate that a form can be created where higher order declarations are possible using "standard" declarations. Ceylon chooses to go down the C style route, mixing the parameter name in the middle of the return type and arguments. I don't find Ceylon's choice as readable to my eye when scanning as the Strawman/BGGA/Fantom pseudo-examples because it mixes the type and the variable name.

The three "reversed" declaration approaches are very different. Gosu (if I've read the documentation correctly) makes a very weird choice as the element after the colon is the return type of the function not the type of the variable "transformer" within the method as would be expected most of the time. Kotlin's choice is also poor, as it now means there are two colons in the parameter declaration, one to separate the variable name from the type, and one to separate the function type input from its output. Scala's is the most rational of the "reversed" declaration styles. However, I find the lack of anything surrounding the "T => R" means that the eye struggles to find the start and end of the type in more complex examples, which is essential to finding the variable name.

Of these, the Strawman/BGGA/Fantom pseudo-examples are the most readable of the first group, and Scala is the most readable of the second group (ignoring the real Java example for a minute, and noting that Scala would be clearer with something around the function type). That is because when I'm performing the eye parsing/scanning I've been talking about, I am essentially trying to grasp the signature. To do that I need to know the number of parameters, their names and their types. Specifically, I want to put the name and type into different mental boxes. Mixing the name and type as Ceylon and Gosu do makes that harder, while Kotlin's additional colon simply creates another fence for my eye to have to jump.
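To make the "real Java" form of transform concrete, here is a runnable sketch. It assumes a Java 9 or later compiler and uses java.util.function.Function in place of the Transformer<T, R> interface, which is not defined in this post. The class name TransformSketch is likewise just an illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class TransformSketch {
    // "Standard" style: each parameter reads as type first, then name.
    // The function type is bracketed by Function<...>, so its start and end are visible.
    public static <T, R> List<R> transform(List<T> list, Function<T, R> transformer) {
        List<R> result = new ArrayList<>(list.size());
        for (T item : list) {
            result.add(transformer.apply(item));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> lengths = transform(List.of("a", "bb", "ccc"), String::length);
        System.out.println(lengths); // prints [1, 2, 3]
    }
}
```

The signature has exactly two parameters, and in each the name is the last token before the comma or closing bracket, so name and type fall into separate mental boxes.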

To do this full justice, I should really have some examples of function types of function types. However this blog is already very long...

Finally, I'll point out that this isn't just confined to method parameters, but also applies to variable declarations such as local variables. Again, this gets complicated between languages in the detail, so I'll compare to a typical "reversed" type language using a braindead stupid example:

  // Java
  private String process(int total) {
    String str = Integer.toString(total);
    return str;
  }

  // Reversed lang
  fun process(total : Int) : String {
    val str : String = Integer.toString(total)
    return str
  }

Again, the "reversed" example is telling me that "String = ...", not "str = ...". In logical terms, it's utterly broken.

OK, OK, I can hear you yelling "type inference":

  // Java
  private String process(int total) {
    val str = Integer.toString(total);
    return str;
  }

  // Reversed lang
  fun process(total : Int) : String {
    val str = Integer.toString(total)
    return str
  }

So, I cheated, right? By inventing a Java type inference syntax. Well, I'm making the point that type inference need not be limited to new languages; Java or any language using the "standard" type declaration style can have it too (and Fantom and Ceylon do). Thus, we should judge the variable declarations by the long form, even if it is not used for local variables all the time. And as shown above, the long form is awful in the "reversed" style. I am most emphatically not assigning the total to "String", I'm assigning it to "str", and that is what the code should say.
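As a sketch of how this can look in practice, assuming Java 10 or later, where the local variable inference keyword is "var" rather than the "val" invented above. The class and method names here are hypothetical.

```java
public class InferenceSketch {
    // Long form: the declared type comes first, then the name, then '='.
    public static String processExplicit(int total) {
        String str = Integer.toString(total);
        return str;
    }

    // Inferred form: 'var' takes the place of the type; the name still sits next to '='.
    public static String processInferred(int total) {
        var str = Integer.toString(total);
        return str;
    }

    public static void main(String[] args) {
        System.out.println(processExplicit(42)); // prints 42
        System.out.println(processInferred(42)); // prints 42
    }
}
```

In both forms the code reads "str = ...", which is the assignment the programmer actually means.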

I'm sure if you've read this far you have a number of comments. Perhaps you believe tooling solves the issue, maybe syntax colouring? Well, I'll simply say that while tooling helps, you should still be able to understand the language without it, even if just for command line diffs around a version control system. Or perhaps you're objecting to my methodology of trying to visually parse a line in a glance? It's how I work; don't you scan code too?

Let me be clear: in none of the above do I mention the task of the compiler. My sole focus is on the developer reading the code and, an order of magnitude less, on the developer writing it.

Any argument in support of "reversed" type declarations should never be based on relevance to the compiler or some other element of type theory.

My view is that the usability to the mainstream developer is what matters. And that is primarily about ease of reading what is written. I have endeavoured to show that the reverse style hampers readability, and is unnecessary to achieve the same goals of a more complex type system that are sometimes used as justification.

Summary

I'm arguing that the "reversed" type declaration style is flat out harder to visually parse, and should therefore be rejected by language authors, even if they believe they have sound compiler or type theory rationales. Programming languages exist primarily for developers, not to aid the compiler or underlying theories. "Write once, Read many" must be the first law of language design!

I am thus hugely disappointed that Kotlin, which has many fine features taken from Fantom, did not think through this choice in more detail, and I plead with the authors to change their minds before Kotlin is locked down.

Opinions welcome, and I'm sure there will be lots...

Thursday 21 July 2011

Kotlin and the search for a better Java

Another day, another new JVM language. This time Kotlin.

My first point here is that another group is willing to go public and say that Scala is too complex. It is easy to miss this, but anyone writing a new language right now (Kotlin, Ceylon, Gosu, Fantom, ...) is implicitly saying Scala isn't right. Of course I don't expect Scala supporters to like this or agree with it, but the truth is that I and many others have looked at it and run fast in the opposite direction.

As I've commented before, my dislike for Scala is not support for Java. There is a lot wrong with Java that cannot be sorted out without breaking backwards compatibility: elements like primitives, arrays, wildcard generics and basic operators like equals. That is why I have proposed a backwards incompatible Java - #bijava.

More generally, my position is that as a community there is a role for a popular, statically typed, industrial, Java-like language without Java's warts. The Java-like also means a design that manages complexity and is usable by the mass-market. Fantom, Gosu, Ceylon and Kotlin are targeting that market. Groovy, Clojure, Scala and many others are simply not targeting that specific market.

I can't comment that much on Ceylon as it is more vapour than reality and from the little public information appears to have some dubious design decisions, especially around verbose words rather than syntax. I've also not studied Gosu much for some reason, yet in a day I've looked at Kotlin a lot (not sure why Gosu doesn't excite me...).

Fantom is probably the most different of the four. It provides a new platform, which happens to run on the JVM (and in the browser via JavaScript!). The core Fantom library is new and dedicated to Fantom, with some different design principles to those of Java. It also runs its own form of bytecode, allowing deep immutability, non-null types, full modules and reified generics amongst other things. In fact, one could question whether Fantom fits into the group of four at all; however, it does fit the criteria of statically typed, industrial and Java-like.

So, what about Kotlin? Well at first glance it gets a lot right, starting with null-safety and type inference. However, I have real issues with some features, which I think a proper Kotlin blog piece should focus on.

More generally, if I could wave my magic wand, I would probably wish that Ceylon and Kotlin would merge into a single project. (Gosu and Fantom are now used in production, and Fantom has many different goals, so both are harder to change now.) Basically, we need the energy of all that input (and associated money) but with a better, single focus. Both Ceylon and Kotlin are still mostly paper languages, and both are trying to achieve the same thing (with Kotlin looking closer than Ceylon at this point). Red Hat and JetBrains, could you please have a conversation? (I'm happy to mediate if desired.)

Summary

I still like Fantom and I think most people hugely underestimate it, especially those from a Scala background. Fantom rethinks what a language should be from the platform/productivity perspective with deep immutability, deep modules, no shared state and a practical type system that aims to eliminate bugs, not be in absolute control. Scala is, in many ways, light years behind Fantom.

And this is my key point with Kotlin too. Simply focussing on syntax is worthy, but kind of misses the point. Syntax exists simply to express the programmer's intent in a way that should be readable years later. What makes a bigger difference are the productivity issues that are not typically thought of when talking about a language - versioning of code, which logging library to use, how to access configuration or injected state. Kotlin tackles the syntax parts pretty well, though not perfectly. But it's not clear yet if they can grasp just how unimportant the syntax is relative to language-related productivity gains.