Tuesday 29 November 2011

Real life Scala feedback from Yammer

Following up on my recent Scala posts, some commented that because I haven't used Scala seriously, my opinion is of little value. I responded by noting that writing the FCM closures proposal for Java, altering the javac compiler, and talking at conferences about language design might qualify me to have an opinion.

Having said all that, I do believe that the most valuable feedback comes from those that have tried Scala, and found it wanting.

Today there was a good example of this, the most important kind of feedback. Yammer, the Enterprise Social Network, provided an indication that they are moving away from Scala, and back to Java. And what's more, they provided a very detailed rationale. The original gist disappeared, so for certainty I've copied the current version inline here (my bold highlights):

Update: Twitter user coda says "@jodastephen I'm saying you're misrepresenting it. It was a personal email which did not represent the views of my employer. In the same sense your Twitter account isn't an official publication of OpenGamma, neither is my personal email.". The main point appears to be that this isn't official Yammer policy. Personally, I think the contents are more interesting than whether it's an official announcement or not, but YMMV... For the record, here is the original Hacker News thread.

Update 2011-11-30: Coda has explained how this document (originally a private email) came to be public. For the record, I got the story off Twitter, then Hacker News, and did not receive anything privately, but wanted to ensure the quality of this feedback reached a wider audience (especially given the original gist was deleted). The text of the explanation is also in a comment below.

Update 2011-12-01: A formal response has now arrived from Yammer. If you read the text below, you should also read the response linked above to get the fuller picture. Very little in tech is black and white, and, despite what some readers may think, I understand that very well.

Originally:
https://gist.github.com/7565976a89d5da1511ce

Hi Donald (and Martin),

Thanks for pinging me; it's nice to know Typesafe is keeping tabs on this, and I appreciate the tone. This is a Yegge-long response, but given that you and Martin are the two people best-situated to do anything about this, I'd rather err on the side of giving you too much to think about. I realize I'm being very critical of something in which you've invested a great deal (both financially and professionally) and I want to be explicit about my intentions: I think the world could benefit from a better Scala, and I'd like to see that work out even if it doesn't change what we're doing here.

Right now at Yammer we're moving our basic infrastructure stack over to Java, and keeping Scala support around in the form of façades and legacy libraries. It's not a hurried process and we're just starting out on it, but it's been a long time coming. The essence of it is that the friction and complexity that comes with using Scala instead of Java isn't offset by enough productivity benefit or reduction of maintenance burden for it to make sense as our default language. We'll still have Scala in production, probably in perpetuity, but going forward our main development target will be Java.

So.

Scala, as a language, has some profoundly interesting ideas in it. That's one of the things which attracted me to it in the first place. But it's also a very complex language. The number of concepts I had to explain to new members of our team for even the simplest usage of a collection was surprising: implicit parameters, builder typeclasses, "operator overloading", return type inference, etc. etc. Then the particulars: what's a Traversable vs. a TraversableOnce? GenTraversable? Iterable? IterableLike? Should they be choosing the most general type for parameters, and if so what was that? What was a =:= and where could they get one from?
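
To see what this looks like in practice, here is a minimal sketch (my example, not Yammer's) of the implicit evidence involved in even a trivial collections call:

```scala
// Even a one-liner can hinge on implicit evidence: toMap compiles only
// when an implicit <:< proves the element type is a key-value pair
val pairs = List(1 -> "one", 2 -> "two")
val byNumber: Map[Int, String] = pairs.toMap
assert(byNumber(2) == "two")

// List(1, 2, 3).toMap does not compile - the compiler reports
// "Cannot prove that Int <:< (K, V)", which is the evidence in question
```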

A lot of this has been waved away as something only library authors really need to know about, but when a library's API bubbles all of this up to the top (and since most of these features resolve specifics at the call site, they do), engineers need to have an accurate mental model of how these libraries work or they shift into cargo-culting snippets of code as magic talismans of functionality.

In addition to the concepts and specific implementations that Scala introduces, there is also a cultural layer of what it means to write idiomatic Scala. The most vocal — and thus most visible — members of the Scala community at large seem to tend either towards the comic buffoonery of attempting to compile their Haskell using scalac or towards vigorously and enthusiastically reinventing the wheel as a way of exercising concepts they'd been struggling with or curious about. As my team navigated these waters, they would occasionally ask things like: "So this one guy says the only way to do this is with a bijective map on a semi-algebra, whatever the hell that is, and this other guy says to use a library which doesn't have docs and didn't exist until last week and that he wrote. The first guy and the second guy seem to hate each other. What's the Scala way of sending an HTTP request to a server?" We had some patchwork code where idioms which had been heartily recommended and then hotly criticized on Stack Overflow threads were tried out, but at some point a best practice emerged: ignore the community entirely.

Not being able to rely on a strong community presence meant we had to fend for ourselves in figuring out what "good" Scala was. In hindsight, I definitely underestimated both the difficulty and importance of learning (and teaching) Scala. Because it's effectively impossible to hire people with prior Scala experience (of the hundreds of people we've interviewed perhaps three had Scala experience, of those three we hired one), this matters much more than it might otherwise. If we take even the strongest of JVM engineers and rush them into writing Scala, we increase our maintenance burden with their funky code; if we invest heavily in teaching new hires Scala they won't be writing production code for a while, increasing our time-to-market. Contrast this with the default for the JVM ecosystem: if new hires write Java, they're productive as soon as we can get them a keyboard.

Even once our team members got up to speed on Scala, the development story was never as easy as I'd thought it would be. Because one never writes pure Scala in an industrial setting, we found ourselves having to superimpose four different levels of mental model — the Scala we wrote, the Java we didn't write, the bytecode it all compiles into, and the actual problem we were writing code to solve. It wasn't until I wrote some pure Java that I realized how much extra burden that had been, and I've heard similar comments from other team members. Even with services that only used Scala libraries, the choice was never between Java and Scala; it was between Java and Scala-and-Java.

Adding to the unease in development were issues with the build toolchain. We started with SBT 0.7, which offered a pleasant interface to some rather dubious internals, but by the time SBT 0.10 came out, we'd had endless issues trying to debug or extend SBT. We looked at using 0.10, but we found it to have the exact same problems managing dependencies (read: Ivy), two new, different flavors of impenetrable, undocumented, symbol-heavy API, and an implementation which can only be described as an idioglossia. The fact that SBT plugin authors had to discover what "best practices" are in order to avoid making two plugins accidentally incompatible should have been a red flag for any tool which includes typesafety as a selling point. (The fact that I tried to write a plugin to replace SBT's usage of Ivy with Maven's Aether library should have been a red flag for me.)

We ended up moving to Maven, which isn't pretty but works. We jettisoned all of the SBT plugins I wrote to duplicate Maven functionality, our IDE integration worked properly, and the rest of our release toolchain (CI, deployment, etc.) no longer needed custom shims to work. But using Maven really highlighted the second-class status assigned to it in the Scala ecosystem. In addition to the "enterprisey" cat-calls and disbelief from the community, we found out that pointing out scalac's incremental compilation bugs had gotten that feature removed outright. Even the deprecation warning for -make: suggests using SBT or an IDE. This emphasis on SBT being the one true way has meant the marginalization of Maven and Ant -- the two main build tools in the Java ecosystem.

Cross-building is also crazy-making. I don't have any good solutions for backwards compatibility, but each major Scala release being incompatible with the previous one biases Scala developers towards newer libraries and promotes wheel-reinventing in the general ecosystem. Most Scala releases contain improvements in day-to-day programming (including compilation speed), but an application developer has to wait until all their dependencies are upgraded before they themselves can upgrade. If they can't wait, they have to take on the maintenance burden of that library indefinitely. In order to reduce their maintenance overhead, they naturally look for another, roughly equivalent library with a more responsive author. Even if the older library is better-tested, better-documented, and better-featured it will still lose out over time as developers jump ship for something that works with Scala 2.next sooner. (It's also worth noting that most companies using Scala at scale or in mission-critical capacities will not immediately upgrade; the library authors they employ will likely be similarly conservative, and the benefit their experience brings to their code will benefit the community less and less over time. As far as I've found, we're the only big startup in SF using 2.9.)

Once in production, Scala's runtime characteristics were the least subtle problem. At one point, half the team was working on a distributed database, and given the write fanout for our large networks some parts of the code could be called 10-20M times per write. Via profiling and examining the bytecode we managed to get a 100x improvement by adopting some simple rules:

1. Don't ever use a for-loop. Creating a new object for the loop closure, passing it to the iterable, etc., ends up being a forest of invokevirtual calls, even for the simple case of iterating over an array. Writing the same code as a while-loop or tail recursive call brings it back to simple field access and gotos. While I'm sure Scala will have better optimizations in the future, we had to mutilate a fair portion of our code in order to actually ship it. (In another service, we got away with just using the ScalaCL compiler plugin and copying things to and from arrays instead of using immutable collections.)

2. Don't ever use scala.collection.mutable. Replacing a scala.collection.mutable.HashMap with a java.util.HashMap in a wrapper produced an order-of-magnitude performance benefit for one of these loops. Again, this led to some heinous code as any of its methods which took a Builder or CanBuildFrom would immediately land us with a mutable.HashMap. (We ended up using explicit external iterators and a while-loop, too.)

3. Don't ever use scala.collection.immutable. Replacing a scala.collection.immutable.HashMap with a java.util.concurrent.ConcurrentHashMap in a wrapper also produced a large performance benefit for a strictly read-only workload. Replacing a small Set with an array for lookups was another big win, performance-wise.

4. Always use private[this]. Doing so avoids turning simple field access into an invokevirtual on generated getters and setters. Generally HotSpot would end up inlining these, but inside our inner serialization loop this made a huge difference.

5. Avoid closures. Ditching Specs2 for my little JUnit wrapper meant that the main test class for one of our projects (~600-700 lines) no longer took three minutes to compile or produced 6MB of .class files. It did this by not capturing everything as closures. At some point, we stopped seeing lambdas as free and started seeing them as syntactic sugar on top of anonymous classes and thus acquired the same distaste for them as we did anonymous classes.
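
To illustrate point 1, here is a minimal sketch (my reconstruction, not Yammer's code) of the kind of rewrite being described:

```scala
// Idiomatic Scala: the loop body becomes an anonymous function object
// passed to foreach, a chain of invokevirtual calls per element
def sumFor(xs: Array[Int]): Int = {
  var total = 0
  for (x <- xs) total += x
  total
}

// The rewrite: an indexed while-loop compiles down to plain array
// reads, field access and jumps
def sumWhile(xs: Array[Int]): Int = {
  var total = 0
  var i = 0
  while (i < xs.length) {
    total += xs(i)
    i += 1
  }
  total
}

assert(sumFor(Array(1, 2, 3, 4, 5)) == 15)
assert(sumWhile(Array(1, 2, 3, 4, 5)) == 15)
```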

Now, every language has its performance issues, and the best a standard library can hope to do is to hit 80% of use cases. But what we found were pervasive issues — we could replace all of our own usages of s.c.i.HashMap, but it's a class which is extensively used throughout the standard library. It being slower than j.u.HashMap means groupBy is slower, as is a lot of other collections functionality I like.

At some point, I wondered if the positive aspects of our development experience owed less to Scala and more to the set of libraries we use, so I spent a few days and roughly ported a medium-sized service to pure Java. I broached this issue with the team, demo'd the two codebases, and was actually surprised by the rather immediate consensus on switching. There's definitely aspects of Scala we'll miss, but it's not enough to keep us around.

Already I've moved our base web service stack to Java, with Scala support as a separate module. New services are already being written on it, and given the results from our Hack Day at the beginning of this week it hasn't slowed our ability to quickly ship complex code. I'm keeping a close eye on the effects of this change, but I'm optimistic, and the team seems excited. We'll see.

So.

I've tried hard here not to offer you advice. Some of these problems could easily be specific to our team and our workload; some of them won't make a difference in how your company does; some of them aren't even your problems to solve, really. But they're still the problems we've encountered over the past two years, and they compose the bulk of what's motivating this change.

Despite the fact that we're moving away from Scala, I still think it's one of the most interesting, innovative, and exciting languages I've used, and I hope this giant wall of opinion helps you in some way to see it succeed. If there's anything here I can clarify for you, please let me know.

It's not for me to add any more to this; it is Yammer's opinion. There is a gold mine of information about Scala in there. And in my opinion, everyone thinking of adopting Scala should read it - in detail.

Update 2011-12-01: Just a quick reminder to read the responses from Coda/Yammer: immediate and formal.

Thursday 24 November 2011

Scala EJB 2 feedback

My Scala/EJB post generated plenty of attention, as I expected. I left the comments there for others to discuss the post - this is my reply to a few of the points.

EJB 2 comparison

A number of comments arose about the comparison, especially those feeling it didn't make sense. Basically, EJB 1/2 was one of those technologies that appeared at first glance to have a lot of promise, targeting known pain points. But over time everyone figured out that the approach was needlessly complicated and created more problems than it solved. Collectively, developers who lived through the era wonder how on earth it got adopted - in hindsight it seems obviously bad.

As I indicated in the blog, I was trained in EJB, but it was obvious to me that the technology was deeply flawed. So, I argued against its adoption, and never had to use it in anger. I see Scala as equally deeply flawed, thus am arguing against its adoption, and endeavouring to avoid using it in anger.

Since I had that same reaction in my gut as I did with EJB, I used the analogy - a high level analogy, not a low level one. To me, Scala really does feel as bad an idea as EJB 2 did.

Fantom

The trouble with today's tweet based soundbites is that it is difficult to have a slightly subtle position on something, and I'd say my position on Fantom is subtle. I think Fantom is hugely interesting because it shows what happens when you challenge your preconceptions about what a static type system should do (and also because of its ability to turn shared mutable state into a compile error).

The subtlety is that I don't see any evidence that the majority of the static typing community of developers (i.e. Java) are willing to take the radical step that Fantom offers (paring down static typing to the bare minimum). For me, I think Ceylon and Kotlin are both being seduced into adding more to the type system than developers really need.

In my Devoxx talk, and in the evaluating Fantom blog post, I made the point about the type system. I also suggested that Fantom might well appeal to those from the dynamic side of the fence who have been bitten by an absence of static typing (like Ruby).

Thus, Fantom makes a good counterpoint to Scala. They are pretty much polar opposites in the static typing space. And I find it interesting and worth noting that Fantom spends its language complexity budget on things I care about, whereas Scala (over)spends its complexity budget on things I don't care about.

Thus, while it may seem like I'm saying "use Fantom, use Fantom, please use Fantom", I'm really just using it as an effective counterpoint. Pointing out something that in my opinion has better answers to the hard questions is not the same as saying go and adopt it. Linked yes, but not the same.

The other points

A number of comments from the Scala side noted that modules (of the type I was referring to) were a problem. I will also willingly acknowledge the heritage of the word module in other contexts.

On concurrency, some got the message and some didn't. My point is that you can design a programming language such that shared mutable state does not compile. Scala talks a good game in this area, but in forensic analysis it doesn't match up.

On the type system, some feel the strength of Scala's approach is valuable, while some like me see it as way too far beyond the point of sensible returns. I also maintain that if I add a string to a list of integers I should get a compile error, not a list of Any. With type inference and implicits, there is far too much potential for things that should be errors during maintenance/refactoring to not be spotted for my taste.
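
To show what I mean, a minimal sketch:

```scala
val ints = List(1, 2, 3)

// Appending a String to a List[Int] is not a compile error: the
// inferred element type silently widens to Any, the common supertype
val widened: List[Any] = ints :+ "four"

assert(widened.length == 4)
assert(widened.last == "four")
```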

On syntax I was primarily driving at the open and flexible nature of the design. With an optional dot for method invocation, optional brackets, optional semicolons, symbols for methods/operators and implicits thrown in, it will necessarily be harder than in many languages to work out what any individual piece of code does. And there are consequences. That flexibility leaves ample room for mailing list discussion about the "right way" to do something. It also makes it very difficult for IDEs and compilers to figure out what the code means - which is the reason for the slow compile speeds mentioned in a number of comments. Personally, I find the goal of the open and flexible syntax (arbitrary DSLs) to be not worth the pain. There are other neater ways to think about DSLs.
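
A small sketch of that flexibility:

```scala
val xs = List(1, 2, 3)

// The same invocation written three ways: dot notation, infix
// notation, and with the underscore placeholder
val a = xs.map(n => n * 2)
val b = xs map (n => n * 2)
val c = xs.map(_ * 2)
assert(a == b && b == c)

// Symbolic names are ordinary methods, callable either way
assert((xs ++ List(4)) == xs.++(List(4)))
```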

FUD, criticism and my goals

I was accused of spreading FUD. No big surprise there. My view is that if that was my goal I would have done a better job in showing the more complex interactions of the feature set, or just flat out lied. No, I think my goal was a bit more interesting than just FUD.

Basically, the key goal with the blog was to provide reassurance to others who feel as I do that Scala just isn't right. I opened my blog talking about the Scala community not liking dissent, and I stand by that. A number of reactions actually praised my bravery for being willing to stand up to the "Scala cult". I don't think it's my imagination to suggest that Scala's enthusiasts have managed to stifle criticism and given the impression that you'd be crazy not to use Scala. If I have inspired confidence in others to speak out, or question what they've heard, then that is a Good Thing in my book.

Beyond that, the long term theme of this blog has been that we should look again at just how much Java threw away from C, and judge new languages as much on what they threw away as what they added. For me, Scala didn't throw enough away and added too much - a lethal combination.

In the end, as Dick Wall suggested, individuals should try it for themselves. I just ask those that do to think deeply before adopting it, and as I said in a tweet - make sure you get both the positives and negatives before deciding.

My personal favourite responses

These are selected because I found them funny, or the point they made was interesting to my biased eyes. I'll let you figure out which are which!

Colebourne is a sad, old twat.
Anonymous comment

He sounds like a naughty schoolboy who misses being spanked..
AlSpeer

I developed with Scala for about a year. I learn a lot and got to work with some smart people. However, then I had to maintain code and, oh boy, it was like being forced onto a diet of live insects.
Anonymous comment

I've only evaluated Scala at the surface, but simply want to back up Stephen in agreeing with the subjective scary feel of the language. I can think of no other languages where I've scanned examples and documentation, and wanted to run away as I have with Scala.
Casper Bang comment

I'm sure the type system is genius, but I'd prefer if it was sensible genius instead of mad genius. I like my sanity.
mcv comment

Years ago, I was pretty excited about it. I saw lots of things Java couldn't do, and that would make my life easier. But then more features. And more. But I think that even before I got totally turned off by the language, I got turned off by the community first. It felt very elitist and unwelcoming to people who may be just interested in a language to 'get things done' instead of one you can put on your resume and earn an immediate 'smart like hell' badge with.
chillenious comment

What can be worse than a likeness to EJB 2? A likeness to WS-*
Paul Sandoz tweet

You know who else compared Scala with EJB? Hitler.
Runar Oli tweet (which I took in good humour)

According to @jodastephen "Scala feels like EJB 2" http://blog.joda.org/2011/11/scala-feels-like-ejb-2-and-other.html ... will the next article explain how #Scala will cause world war 3?
Mario Fusco tweet

FWIW, I've used Scala for two years, written tens of thousands of lines of code in it, and find your criticism incisive.
Coda Hale tweet

Not that I really should comment on Scala, but I feel that 50% of it would be better than 100% of it -- "too much of everything"
Tatu Saloranta tweet

i used scala for a few months. it sounded very promising, java without the verbosity. but in the end i decided to stop using it.
The biggest problem for me was readability. Scala is the first language that i've learned where at first i couldn't just read code and immediately guess what it does.
adabsurdo

My personal experience in wrapping non-trivial Java libraries in Scala and Clojure is that with Clojure it usually just works and it works quickly. In Scala I am usually reduced to an extra hour or two of adding manifests to signatures until the compiler accepts it.
I am disappointed with Scala and having lived through EJB 1, EJB 2 and then onto Spring and EJB 3, I agree with Steve it makes me feel exactly the same as I felt about EJB 1 and EJB 2 - that is I am being sold overcomplicated technology as a panacea.
Tim Clark

I have similar feelings about Scala. It's a bit like C++. The difference is: I found a subset of C++ I liked.
Glyn Normington tweet

Scala sucks, and i'm blessed to know that i'm not alone feeling that way.
Evgeny Shepelyuk tweet

And finally

I don't see myself writing a post in quite the same way about Groovy, Ceylon, Kotlin, Xtend, Clojure,... I may critique them (all do or will have flaws), but I don't see myself ripping into them in the same way. There is just something about Scala...

My final thought is that it is OK to look at Scala and decide against. You're not alone.

Tuesday 22 November 2011

Scala feels like EJB 2, and other thoughts

At Devoxx last week I used the phrase "Scala feels like EJB 2 to me". What was on my mind?

Scala

For a number of years on this blog I've been mentioning a desire to write a post about Scala. Writing such a post is not easy, because anyone who has been paying attention to anti-Scala blog posts will know that writing one is a sure-fire way of getting flamed. The Scala community is not tolerant of dissent.

But ultimately, I felt that it was important for me to speak out and express my opinions. As I said in my talk, if it was just me that had a poor opinion of Scala I would probably keep quiet (or try to figure out why I was out of step). But I perceive considerable uneasiness amongst many that have tried or looked at the language - something that reinforces my concerns.

Before I start I should mention that although I like the Fantom programming language, I also see merit in aspects of other languages - Groovy, Kotlin, Ceylon, Gosu, Xtend and many others. I also respect Clojure. By comparison, I really struggle to find positive feelings for Scala, and what positive feelings I have had have reduced over time. But why is that?

(For those not at my Devoxx talk, I tried to do two things - firstly to show Fantom off and explain how most simple comparisons to Scala rather miss the point, and secondly to point out some of the difficulties I have with Scala. To be clear, I'm not bashing Scala to promote Fantom. I'm bashing Scala because I think it's entirely the wrong direction for the future.)

For this blog post I've picked out a few key areas for discussion. I probably could have written a post three times as long as this one, and this one is very long as it is. There is lots to say about Scala, and very little is good.

Modules

Scala does not have a module system. By that I mean a deeply integrated system of modules that treats the basic program unit as something larger than a class, with versioning and dependencies. A key test is whether the new language compiles modules or classes.

One of the greatest issues with Java is the lack of a module system. This absence has over time caused the platform to gain cruft (like CORBA) and struggle to shed it. The multi-year effort of modularising the JDK is evidence of how complicated this work is to do if not done from the start. And of course the Java platform will always have to support code not written in modules. Beyond the core JDK, most experienced Java developers have encountered the "Jar hell" scenario, where different versions of Jar files are required by different libraries, and how easily it becomes impossible to assemble the whole application.

So, I have a clear sense that proper modularisation is a Good Thing, with all the versioned goodness that comes with it. (Managing change of a large application over time remains one of the largest problems faced by most large development shops, and one I don't see Scala tackling.) Over time, Java has evolved the Maven, Ivy and OSGi approaches to modules. Each has some benefits, but none are integrated into the platform itself, which is a significant disadvantage.

Yet, in a recent thread on modules, Scala aficionados claimed that Scala does have modules. In fact the opinion was clear - "see the object keyword", "Scala objects and path dependent types encode ML-style modules", "Also see http://www.mpi-sws.org/~rossberg/" (an academic paper). On further prompting, the ML view (standard source code can be used to express modules) was expanded on, before eventually the Scala approved way of using Maven/Ivy and the sbt tool was finally explained. There wasn't any real sense that this was a problem for Scala - so long as it integrated with the Java solutions that was fine.

I claim that integrating with Maven/Ivy is not fine. It misses huge opportunities to make life better on a topic where developers face real productivity issues in the field. Hence I commented that "Scala focuses on the wrong issues".

I also noted that backwards compatibility has been a constant problem of the Scala libraries. Modules are a tool for managing versioning and compatibility issues and would almost certainly have helped Scala evolve.

Finally, I noted that modules allow an application to find all the classes in the classpath/modulepath. This allows an application to find all the classes that implement an interface or annotation easily, which allows applications to be easily assembled from their parts. Java and Scala can achieve this, but only via complex and slow classpath scanning tools, like scannotation.

Concurrency

Scala makes a big deal about concurrency. About how the functional approach will aid the creation of safe multi-threaded code.

Except it's really a bit of smoke and mirrors.

The big problem in concurrency is shared mutable state. It turns out that us developers are pretty bad at reasoning about it and using the tools at our disposal (synchronized, locks, java.util.concurrent) to manage that state. You'd expect that Scala would have tackled the concurrency problem at source - the shared mutable state - but it doesn't.

Scala (the language) does not know whether a class is immutable or not, nor does it provide a way to check if an object is immutable (Scala's libraries might help, but the language doesn't). As a result, it is perfectly possible to have a "static" (shared across threads) variable, or a "static" value of a mutable object. It's also possible to pass a mutable object to an actor and share mutable state that way.

object Foo {
  var bar = "Hello"        // this is shared mutable state
  val baz = new Mutable()  // so is this
}

Tackling shared mutable state is not easy in language design. It involves designing the language to know about immutability, to track it, and to only allow immutable objects to be passed by reference to another thread/actor (mutable objects can be passed by copy). Done right, it eliminates the potential for concurrency issues from shared mutable state.

Scala relies on library design and disciplined behaviour from developers to get this right (whereas Fantom builds this into the language, such that code equivalent to the example above will not compile). This is of course part of Scala's design approach - to give developers the power and trust that they will not abuse it. For me, this is simply another case of Scala failing to tackle the root cause of a big developer productivity issue.
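
A sketch of the disciplined style Scala relies on - note that nothing in the language stops a maintainer reintroducing a var:

```scala
// Immutability by convention: a val reference to an immutable value.
// The compiler enforces neither that bar stays a val on the next edit,
// nor that Message stays free of vars - it is all developer discipline
final case class Message(text: String)

object Foo2 {
  val bar = "Hello"           // val, not var: the reference cannot change
  val baz = Message("Hello")  // immutable value, safe to share
}

assert(Foo2.bar == "Hello")
assert(Foo2.baz.text == "Hello")
```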

Community

Scala has a loud and vocal community, especially amongst those on forums and twitter. Some of these developers have gone on to create libraries based on Scala, in all manner of areas, from web frameworks to database access. This can have the effect of making Scala appear to be the "upcoming destination" where other developers think they should invest their time.

Unfortunately there are some aspects of the community that are much more negative. Scala appears to have attracted developers who are very comfortable with type theory, hard-core functional programming and the mathematical end of programming. Frequently, there is the sense of a brainiac competition, and an awfully large amount of argument about whether solution A, B, C or D is the right one when in reality they all do exactly the same thing (Scala typically offers many ways to achieve the same end result, something that Java sought to avoid, and something that tends to create more heat than light in debates).

There is also a sense that many in the Scala community struggle to understand how other developers cannot grasp Scala/Type/FP concepts which seem simple to them. This sometimes leads Scala aficionados to castigate those that don't understand as lazy or poor quality developers, as being second class citizens. This can easily lead into derogatory comments about "Java Joe" or worse.

My experience is that most developers are perfectly clever people, and perfectly capable of understanding many things if they are explained correctly. The classic example is variance in Java generics, where ? extends is needed. I find that it is perfectly possible to explain the issues to a developer, who will pretty quickly grasp why a List of Integer cannot just be assigned to a List of Number. However, what I find is that once the discussion is complete, and the developer solves their immediate problem, the explanation will tend to slip away. The problem is not that the developer isn't smart enough; it's that the complexities of the type system aren't important enough to care about. Understanding the issue at hand, management priorities, the problem domain and the architecture/design of the large system they are working on are much more significant issues.
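
The variance point has a direct Scala analogue (my illustration): mutable collections are invariant for exactly the reason a Java List of Integer cannot be a List of Number, while read-only covariant collections allow the assignment:

```scala
import scala.collection.mutable.ListBuffer

val ints = ListBuffer(1, 2, 3)
// val bad: ListBuffer[AnyVal] = ints   // does not compile: invariant,
// because if it did, bad += 3.5 would put a Double among the Ints

// Immutable List is covariant, so the read-only widening is allowed
val nums: List[AnyVal] = List(1, 2, 3)
assert(nums.length == 3)
```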

The Scala community is also infected with modern society's desire for more, more, more without considering the consequences (more gadgets, faster cars, bigger TVs, more money, more spending, yet bankrupt people and countries). Scala goes all the way with its language features - everything is about maximum power. And the community revels in that power, finding and exploiting every corner case that the power grants, without truly considering the harm it does.

Type system

Every time I look at Scala it feels rather like the type system fits the phrase "if all you have is a hammer, everything looks like a nail". Whatever the problem, the type system is bound to be part of the solution.

The trouble is that a big type system is inevitably a complex type system. The concepts added to support it have their own terminology, which is instantly inaccessible without significant learning, from higher kinds to type constructors to dependent types... It's all just a baffling mess of type theory that provides no meaningful connection to the actual work that needs doing.

The trouble is that despite the pleading of aficionados, method signatures like this abound:

 def ++ [B >: A, That] (that: TraversableOnce[B])(implicit bf: CanBuildFrom[List[A], B, That]) : That

If you don't know Scala, you wouldn't have a hope of understanding that code. In fact, this is the equivalent of Java's addAll() on a list. (It's so complex that Scala tries to hide it from you in the documentation by showing you a simpler form instead.)
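For contrast, here is what the Java side of that equivalence looks like (a minimal sketch of my own):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class AddAllDemo {
    // Java's equivalent of Scala's ++ is addAll, whose signature uses a
    // single bounded wildcard:
    //   boolean addAll(Collection<? extends E> c)
    // One wildcard, no implicit builder parameter.
    static List<Number> combine(List<Number> target, Collection<? extends Number> extra) {
        target.addAll(extra); // a List<Integer> is a Collection<? extends Number>
        return target;
    }

    public static void main(String[] args) {
        List<Number> numbers = new ArrayList<>();
        List<Integer> ints = Arrays.asList(1, 2, 3);
        System.out.println(combine(numbers, ints)); // prints [1, 2, 3]
    }
}
```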

Oh, and by the way, I do get the idea that the humongous type system is there to prevent compile errors and pre-check your code. But applying that logic in the opposite direction would imply that no one gets any real work done in languages with dynamic type systems. For me, Scala's type system is way, way beyond the point of sensible returns for a language feature.

Steve Yegge's analysis was perhaps the most fun:

The... the the the... the language spec... oh, my god. I've gotta blog about this. It's, like, ninety percent [about the type system]. It's the biggest type system you've ever seen in your life, by 5x. Not by an order of magnitude, but man! There are type types, and type type types; there's complexity...

They have this concept called complexity complexity<T> Meaning it's not just complexity; it's not just complexity-complexity: it's parameterized complexity-complexity. ... I mean, this thing has types on its types on its types. It's gnarly

Steve uses Scala to argue for dynamic type systems. I disagree, and consider a static type system to be useful for documentation, communicating intent in a team and for basic error checking. But I don't need the world's most complicated type system to do that - I just need something simple and effective.

In essence, Scala's type system is giving static typing in general a bad name.

Syntax

Scala's syntax is very wide open. Given a small piece of code (the kind that developers look at all day long), it is frequently difficult to reason about what that code does.

Scala has a big focus on flexible syntax with the aim of allowing a user to create a DSL in almost any form without having to write any parsing code. Instead, the developer just has to write a "normal" API, and use the language's syntactic flexibility to enable the ultimate end user to write in the desired style.

Take implicits, a technique that seems perfectly sensible for allowing type conversions and object enhancements in a type-safe way. Well, it may be type safe, but it's also silent and very deadly. You can look at a piece of code and have no idea what is being converted. Unless you understand every import, every active implicit, their scope, their priorities and much more, you really don't have a clue what your code is doing. But that's OK, you didn't want to be able to understand Scala code, did you?

Or take the fold operators and placeholder _, which produce delightful code like this:

 (0/:l)(_+_)

That's practically the very definition of line noise. (And I've not even shown any scalaz examples, or similar Unicode weirdness.)
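For readers who don't know Scala: that snippet folds the list into its sum, starting from zero. A plain-Java sketch of the same computation (my own illustration) is longer, but holds no surprises:

```java
import java.util.Arrays;
import java.util.List;

public class FoldDemo {
    // The Scala snippet folds a list of numbers into their sum,
    // starting from an accumulator of zero. The explicit loop form:
    static int sum(List<Integer> l) {
        int acc = 0;
        for (int x : l) {
            acc += x;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sum(Arrays.asList(1, 2, 3, 4))); // prints 10
    }
}
```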

By the way, if you're looking at Scala, you may come across conference presentations, blog posts and forums that show small snippets of code in Java and Scala and show how much less code you have to write in Scala, and how much simpler it appears to be. This is a clever trick. The real complexity occurs when you start mixing each of the seemingly simple features together in a large codebase, on a big team, over a period of time, where code is read far more than it is written. That is when you start to appreciate that some of Java's restrictions are there for a reason.

Quality

When evaluating a language, it's important to get a sense of the quality of the implementation. This is useful for determining how easy it will be to maintain the current language and extend it in the future.

For this, I turn to the analysis by the core Scala committer, Paul Phillips, in a Scala podcast in June 2011 (selected elements):

The compiler is, and the libraries and language as a whole, it's awesome but the number of places where features interact is astronomical. In order to really get the lid on that many feature intersections we need a massively comprehensive test suite, that we simply don't have.
...
[Question:] It's been suggested that you are skeptical of community involvement because there is no test suite? You're afraid that if anyone touches it but you, the whole world will break.
[Answer:] I, unfortunately, continue to be bitten by extremely subtle bugs that come out because of the inadequacy of our test suite.
...
[Question:] Where do you want the test suites?
[Answer:] Collections
[Question:] Anywhere in particular?
[Answer:] All of them. There's no reason that many many many of the bugs we've seen in the collections over the last couple of years should ever have happened because they should be exhaustively shown not to exist by virtue of the tests that we have, but don't have yet.
...
[Question:] And what about the compiler?
[Answer:] An exhaustive test suite for the pattern matcher would certainly aid me in the process of finally really fixing it. It would be very very helpful actually.

An incredibly complicated language with very few tests? Sounds like a poor foundation to build real world applications on to me.

Specifically, note this line - "the number of places where features interact is astronomical". This is a key aspect of Scala: each language feature is orthogonal and flexible. Implicits mean that code can be inserted almost anywhere (which slows the compiler - listen to the podcast). The ability to drop method invocation dots and brackets for parameters (to achieve DSLs) makes the meaning of code non-obvious, and leaves no spare syntax space for future enhancements. And these things combine to make a good IDE a very difficult challenge.

(If you're evaluating Scala for adoption by your team, I strongly recommend listening to the whole 40 minutes of the podcast. It will help you understand just what the real issues are with Scala, the quality of the implementation, and how difficult the language is to evolve.)

EJB 2

The EJB 2 spec was in many ways the nadir of Java EE, where huge amounts of boilerplate, XML and general complexity were foisted onto the Java industry. The spec was widely adopted, but adoption was followed by criticism. Developers found that while EJB 2 sought to reduce complexity in building an enterprise application through an abstracted higher level API, in reality it added more complexity without providing the expected gains. Documentation, best practices and tooling failed to solve the basic design issue. Spring was launched as a greatly simplified alternative, and eventually the much simpler EJB 3 was launched, a spec that had little to do with EJB 2.

As a data point, I attended a formal week's training course in EJB. At the end of the course I knew that this was a very bad technology and that I would recommend against its use at every opportunity. Scala has exactly that same feel to me.

So, at Devoxx I said that "Scala feels like EJB 2 to me". The language is a well-meaning attempt to create something with a higher abstraction level. But what got created is a language with huge inherent complexity, one that doesn't really address the true issues developers face today, yet is being pitched as a suitable replacement for Java - something I find bonkers.

At the moment, Scala is at the stage of thinking that better documentation, best practices and tooling will make a huge difference. None of these helped EJB 2.

In fact one might argue that Java's biggest flaw down the years has been the architectural over-engineering of solutions, when something simpler would have done the job. Again, Scala feels very much in the mold of that strand of the Java community, over-engineered rather than YAGNI or 80/20.

Of course, neither Spring nor EJB 3 is perfect, but the core concept of injection is easy to grasp and the basic mechanism of linking components together is simple. In particular, having easily cut-and-pasted sections of documentation proved very valuable. Having code that is easy to grasp, where problems can be tracked down without needing a PhD in type theory, is a Good Thing, not a bad one. Having code where you can work out what it does without needing to know every last detail of the "astronomical" number of language feature intersections is a Good Thing, not a bad one.

Of course the upside for Scala of my EJB 2 comparison is that EJB 3 is a lot better. Thus, it is conceivable that Scala could reinvent itself. But I would point out that EJB 2 and 3 are essentially utterly different approaches, happening to share a common name. I would say that Scala would need a similar reinvention from scratch to solve its problems.

Summary

I don't like Scala. And that dislike is increasing. Specifically, I do not want to spend any of my future working life having to write code in it.

Had Scala stayed a niche language for highly specialist use cases (like Haskell or Erlang) then I would have far less of an issue. But it is being sold as the solution for mainstream development, and for that it is as utterly unsuited as EJB 2 was.


Update 2011-11-24: Rather than respond to all the comments inline, I penned a response blog.

Friday 18 November 2011

Guide to evaluating Fantom

Yesterday at Devoxx I spoke about the Fantom programming language.

Evaluating Fantom

This blog post provides some links to help you evaluate Fantom, and could be useful if you attended the session yesterday, or if you're generally looking at languages.

While the "hello world" example is useful when evaluating a language, it's important to look deeper. Key areas to look at include the type system, paradigm (OO vs functional), immutability, concurrency, standard library, modularity and composability. Instead of hello world, here is a word counting program:

class WordCount {
  Void main(Str[] args) {
    if (args.size != 1) {
      echo("usage: WordCount <file>")
      Env.cur.exit(-1)
    }

    // Set up an empty map to count each word, setting default for each value to zero
    wordCounts := Str:Int[:] { def = 0 }

    // Open the file, read each line in order
    file := Uri(args[0]).toFile
    file.eachLine |line| {
      // skip empty lines
      if (line.trim.isEmpty) return

      // split and trim on whitespace into words
      words := line.split

      // count each one
      words.each |word| { wordCounts[word] += 1 }
    }

    // Show each word found, with its count, in alphabetical order
    wordCounts.keys.sort.each |key| {
      echo("$key ${wordCounts[key]}")
    }
  }
}

Even without the comments, a key Fantom feature is that the code is instantly readable. It is very much in the mainstream of syntax and there is a high chance that developers can read the code without learning the language.
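For comparison, here is a rough Java equivalent of the same program (my own sketch, using the standard java.nio file API), which needs a little more ceremony for the default-to-zero counting:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCount {
    // Count each word; a TreeMap keeps the keys in alphabetical order
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> wordCounts = new TreeMap<>();
        for (String line : lines) {
            // skip empty lines
            if (line.trim().isEmpty()) {
                continue;
            }
            // split and trim on whitespace into words, count each one
            for (String word : line.trim().split("\\s+")) {
                Integer count = wordCounts.get(word);
                wordCounts.put(word, count == null ? 1 : count + 1);
            }
        }
        return wordCounts;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.out.println("usage: WordCount <file>");
            System.exit(-1);
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        // show each word found, with its count, in alphabetical order
        for (Map.Entry<String, Integer> entry : countWords(lines).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```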

Underneath, I think Fantom is more radical in some ways than many other new JVM languages. Fantom is effectively its own platform on top of the JVM. It compiles to its own bytecode format, fcode, with JVM bytecode emitted on demand. It can also compile to JavaScript, and people have investigated .NET and LLVM compilation as well. And, although you can call Java classes, most of the time you work with Fantom's API, not Java's API.

The biggest difference to other languages is the type system. Static typing is valuable for communication, documentation and basic error checking. But most static type systems are a straitjacket of rules that effectively require you, the developer, to prove your application to the compiler, something which Java generics has shown can be very painful. Fantom's approach is radically different - it uses the type system to support you, rather than restrict you. This is what Brian Frank, the language author, had to say:

Kotlin seems to be following in the same basic philosophical path of Java, Scala, and most all other statically typed languages: the job of a static type system is to always prove that something is type safe at compile time. I believe Fantom is unique in our approach of type systems: we allow things which might work to be caught at runtime versus compile time. This is a pragmatic trade-off which years of use seems to validate: you can use a weaker but much simpler type system and still catch most errors at compile time. Fantom proves that there can be a sweet spot between static and dynamic type systems.

Whether you can accept this approach to the type system will define your evaluation of Fantom. If you can appreciate the idea that you don't need to add more generics to fix the problems in Java, then you will appreciate Fantom. If you've used and liked the freedom of dynamic languages, like Ruby, but understand that a little static typing could be useful, then you will appreciate Fantom.

Fantom also does something which every other new language should do, but doesn't. Everything is a module, which provides big knock-on benefits for reflection and dynamic behaviour, plus a good way to assemble code.

Finally, Fantom tackles concurrency properly by actively preventing shared mutable state (something that Scala fails to do). Immutability is deeply built in. Every object, list, map and function knows if it is immutable, and lists/maps can be converted to be immutable. At the static level, there are no static variables, and constants must be immutable. The only way to access another thread is via the actor framework, and that only allows you to send an immutable object or a copy of a mutable one (via Fantom's built in serialization).

Summary

When Java was created it took C and C++ and threw lots of cruft away to create something radically simpler, yet more powerful overall. Fantom has done the same to Java, throwing lots of cruft away to create something radically simpler, yet more powerful overall.

Hopefully the links and this quick feature guide will help those looking at the language.

Monday 7 November 2011

The future is in the JEPs

If you want to know the most likely contents of Java 8 and beyond, the JEP process is the best place to look.

JDK Enhancement Proposals

JDK Enhancement Proposals (JEPs) are the simple descriptions of the tasks that are being considered for Java 8. Here is Brian Goetz's take:

JEP stands for "JDK Enhancement Proposal", and is part of our process for building a technical roadmap for the Java platform. Filing a JEP (what we used to call a "one pager") is the first step towards inclusion of a proposed feature in the JDK.

The JEP process document linked below outlines the states a JEP can go through, from Draft (we're just talking) to Submitted (I think this is good enough to review) to Candidate (Group/Area leads think the idea has enough merit to not toss out) to Funded (someone has actually committed resources to making it happen.) The last transition -- to Funded -- is the one at which this goes from being an idea to being part of the plan for some JDK release.
...
What JEP is not: it is not a "suggestion box" for drive-by requests. The JEP process is open to JDK *committers*, and it is unlikely a JEP will gain funding if the author is not prepared to contribute significant effort to the project's implementation or stewardship.

And the formal process document says:

The primary goal of this process is to produce a regularly-updated list of proposals to serve as the long-term Roadmap for JDK Release Projects and related efforts. The Roadmap should extend at least three years into the future so as to allow sufficient time for the most complex proposals to be investigated, defined, and implemented.
...
This process is open to every OpenJDK Committer. Decisions about specific proposals will be made in a transparent manner but are ultimately up to the OpenJDK Lead.

This process does not in any way supplant the Java Community Process. The JCP remains the governing body for all standard Java SE APIs and related interfaces. If a proposal accepted into this process intends to revise existing standard interfaces, or to define new ones, then a parallel effort to design, review, and approve those changes must be undertaken in the JCP, either as part of a Maintenance Review of an existing JSR or in the context of a new JSR.
...
That a particular JEP appears in the Roadmap means only that it is the proposal of record from a technical perspective. There is no guarantee that anyone will work on it, much less that its end result will appear in any JDK Release Project.

At the time of writing, there are 26 JEPs, plus 2 "meta" JEPs.

JEP | Title and Link | My comments
101 | Generalized Target-Type Inference | More type inference
102 | Process API Updates | Managing OS processes
103 | Parallel Array Sorting | From the concurrency group
104 | Annotations on Java Types | Annotations everywhere (JSR-305/308)
105 | DocTree API | Compile AST access to javadoc
106 | Add Javadoc to javax.tools | Programmatic access to javadoc tool
107 | Bulk Data Operations for Collections | Add lambda-aware methods to collections
108 | Collections Enhancements from Third-Party Libraries | New collection methods and collection classes
109 | Enhance Core Libraries with Lambda | Add lambda-aware methods to everything apart from collections
110 | New HTTP Client | Replace HttpURLConnection
111 | Additional Unicode Constructs for Regular Expressions | Enhance RegEx
112 | Charset Implementation Improvements | Enhance Charset
113 | MS-SFU Kerberos 5 Extensions | Security
114 | TLS Server Name Indication (SNI) Extension | Security
115 | AEAD CipherSuites | Security
116 | Extended Validation Certificates | Security
117 | Remove the Annotation-Processing Tool (apt) | Replaced by JSR-269
118 | Access to Parameter Names at Runtime | Reflective access to parameter names
119 | javax.lang.model Implementation Backed by Core Reflection | Enhance low-level compiler model
120 | Repeating Annotations | Allow multiple annotations of the same type
121 | Stronger Algorithms for Password-Based Encryption | Security
122 | Remove the Permanent Generation | Major change to JVM memory usage
123 | Configurable Secure Random-Number Generation | Better random numbers (Security)
124 | Enhance the Certificate Revocation-Checking API | Security
125 | Network Interface Aliases, Events, and Defaults | Such as listening to wifi and mobile networks
126 | Lambda Expressions and Virtual Extension Methods | Main language change of Project Lambda

Overall a good set of items so far, with access to parameter names and collection enhancements being of particular interest to me.
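For a flavour of what JEPs 107, 109 and 126 are aiming at, here is a sketch of a lambda-aware bulk operation on a collection (my own illustration; the actual syntax and library methods are still provisional at the time of writing):

```java
import java.util.Arrays;
import java.util.List;

public class BulkOps {
    // A bulk operation in the JEP 107/126 style: iteration is driven by
    // the collection library, and behaviour is passed in as lambdas
    static int sumOfSquaresOfEvens(List<Integer> values) {
        return values.stream()
                .filter(v -> v % 2 == 0) // keep even values
                .mapToInt(v -> v * v)    // square them
                .sum();                  // reduce to a single int
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquaresOfEvens(Arrays.asList(1, 2, 3, 4, 5))); // prints 20
    }
}
```

The interesting design point is that the loop disappears into the library, which is what makes parallel versions of these bulk operations possible without changing user code.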

Since Oracle isn't accepting "drive by" enhancement requests, feel free to post your thoughts in the comments!

Tuesday 1 November 2011

My Google Reader fix

Google Reader's bland new look is a huge step backwards. So bad that I've installed GreaseMonkey for the first time.

Clearly, I'm a GreaseMonkey script newbie, but based on this script, which does most of the hard work, I've reached something that makes the product usable again. (The base script reduced the whitespace, while my extensions add back colour.)

// ==UserScript==
// @name           Fix New Google Reader Layout
// @namespace      http://joda.org
// @description    Combined fix for Google Reader
// @include        http*://www.google.*/reader/*
// @version        2.1
// ==/UserScript==
// based on http://userscripts.org/scripts/show/116850

var overrideCSS = " \
#top-bar { height:45px !important; } \
#search { padding:8px 0 !important; } \
#viewer-header { height:45px !important; } \
#lhn-add-subscription-section { height:45px !important; } \
#lhn-add-subscription, #viewer-top-controls-container \
{ margin-top:-15px !important; } \
#entries { padding:0 !important; background-color: #e2e7f0 } \
#title-and-status-holder { padding:0.3ex 0 0 0.5em !important; } \
.collapsed { line-height:2.2ex !important; padding:2px 0 !important; } \
.entry-icons { top:0 !important } \
.entry-source-title { top:2px !important } \
.entry-secondary { top:2px !important } \
.entry-main .entry-original { top:4px !important } \
.section-minimize { left: 0px !important } \
#overview-selector, #lhn-selectors .selector, .folder .name.name-d-0, \
#sub-tree-header \
{ padding-left: 15px !important; } \
.folder .folder .folder-toggle { margin-left:13px !important } \
.folder .sub-icon, .folder .folder>a>.icon { margin-left:27px !important } \
.folder .folder>ul .icon { margin-left:34px !important } \
.folder .folder .name-text { max-width:160px !important } \
#reading-list-selector .label { display:inline !important } \
#entries.list .entry { background-color: #e2e7f0 !important; } \
#entries.list .entry .collapsed { border-bottom:2px solid #e2e7f0; background-color: #c7d0e2 !important } \
#entries.list .entry.read { background-color: #e2e7f0 !important; } \
#entries.list .entry.read .collapsed { background-color: #e2e7f0 !important; } \
#entries.list .entry-container { border-left:2px solid #6a7893 !important; border-right:2px solid #6a7893 !important; border-bottom:2px solid #6a7893 !important } \
";
GM_addStyle(overrideCSS);

(Save as a file ending in ".user.js", then drag it into Firefox to load it. You may find Reader needs a few refreshes before it is picked up. BTW, you only need this script - you don't need the base script.)

I suspect others will make a better script, but this satisfies me for now.

Update 2011-11-03: On my other computer, I'm trying Stylish instead. The following is my customization of this style and this style. This style is closer to the old reader than the GreaseMonkey one above. Stylish is also safer than GreaseMonkey for general web surfers.

@namespace url(http://www.w3.org/1999/xhtml);

@-moz-document regexp("https?://www.google.com/reader/.*")   {

#top-bar {
    height: 40px !important;
}
#logo {
    height: 27px !important;
    margin: -13px 0 0 11px !important;
}
#search {
    margin-left: 157px !important;
    padding: 9px 0 !important;
}
#search-input {
    border: 1px solid #B2B2B2 !important;
    margin: 0 !important;
    padding: 3px 2px !important;
    width: 200px !important;
}
#viewer-container {
    background: none repeat scroll 0 0 #EBEFF9 !important;
}
#viewer-top-controls-container {
    margin-top: -12px !important;
}
.jfk-textinput {
    height: 17px !important;
}

#viewer-header-container {
    background: none repeat scroll 0 0 #C2CFF1 !important;
}
#viewer-header {
    background: none repeat scroll 0 0 #C2CFF1 !important;
    height: 30px !important;
}
.goog-button-base,.jfk-button,.goog-flat-menu-button {
    font-size: 1em !important;
    font-weight: normal !important;
}
.goog-menu-button .goog-button-base-content {
    padding: 7px !important;
}
.goog-button-base,.jfk-button,.goog-inline-block,.goog-flat-menu-button {
    cursor: pointer !important;
}
.goog-button-base-content {
    padding: 3px 0.461em 0 !important;
    position: relative !important;
    text-align: center !important;
}
.goog-button-tight .goog-button-base-content {
    line-height: 1em !important;
}
.goog-button-base-inner-box {
    background: none repeat scroll 0 0 #F9F9F9 !important;
    height: 20px !important;
}
.goog-button-base-outer-box {
    border-bottom: 1px solid #AAAAAA !important;
    border-top: 1px solid #BBBBBB !important;
}

.goog-button-base-inner-box {
    background: none repeat scroll 0 0 #F9F9F9 !important;
    border-left: 1px solid #BBBBBB !important;
    border-right: 1px solid #AAAAAA !important;
    margin: 0 -1px !important;
}
.goog-menu-button:active .goog-button-base-outer-box, .goog-menu-button:active .goog-button-base-inner-box, .goog-combobox-active .goog-button-base-outer-box, .goog-combobox-active .goog-button-base-inner-box, .goog-menu-button.goog-button-base-open .goog-button-base-outer-box, .goog-menu-button.goog-button-base-open .goog-button-base-inner-box {
    background: none repeat scroll 0 0 #F9F9F9 !important;
    border-color: #888888 !important;
}
.goog-button-base:hover .goog-button-base-outer-box, .goog-button-base:hover .goog-button-base-inner-box, .goog-button-base:focus .goog-button-base-outer-box, .goog-button-base:focus .goog-button-base-inner-box {
    border-color: #888888 !important;
}
.jfk-button-standard.jfk-button-hover, .jfk-button-standard.jfk-button-clear-outline.jfk-button-hover {
    border-color: #888888 !important;
}
.goog-flat-menu-button.goog-flat-menu-button-hover {
    border-color: #888888 !important;
}
.goog-menu-button:active .goog-button-base-outer-box, .goog-menu-button:active .goog-button-base-inner-box, .goog-combobox-active .goog-button-base-outer-box, .goog-combobox-active .goog-button-base-inner-box, .goog-menu-button.goog-button-base-open .goog-button-base-outer-box, .goog-menu-button.goog-button-base-open .goog-button-base-inner-box {
    background-color: #777777 !important;
    border-color: #888888 !important;
}
.jfk-button-standard
{
    border: 1px solid #AAAAAA !important;
    background: none repeat scroll 0 0 #F9F9F9 !important;
}
.jfk-button {
    height: 20px !important;
    line-height: 20px !important;
}
.goog-flat-menu-button {
    line-height: 20px !important;
    border: 1px solid #AAAAAA !important;
}
.goog-flat-menu-button-dropdown {
    top: 8px !important;
}
.goog-menu-button .goog-menu-button-dropdown {
    top: 8px !important;
}
.goog-menu-button .goog-button-base-content {
    padding: 4px !important;
}

#title-and-status-holder {
    padding: 0.1ex 0 0.1ex 0.5em !important;
}

#entries {
    padding: 0 !important;
    border-top: 1px solid #C2CFF1 !important;
}
#entries.list .entry .collapsed {
    background: none repeat scroll 0 0 #FFFFFF !important;
    border: 2px solid #FFFFFF !important;
    cursor: pointer !important;
    margin: 0 !important;
    overflow: hidden !important;
    padding: 0 !important;
    position: relative !important;
    width: auto !important;
}

#entries.list .read .collapsed {
    background: none repeat scroll 0 0 #F3F5FC !important;
    border: 2px solid #F3F5FC !important;
}
#entries.list .collapsed .entry-icons {
    top: 1px !important;
}
#entries.list .collapsed .entry-secondary,#entries.list .collapsed .entry-main .entry-source-title {
    top: 1px !important;
}
#entries.list .collapsed .entry-main .entry-original {
    top: 6px !important;
}

#current-entry .entry-container .entry-title a, #current-entry .entry-container a.entry-source-title, #current-entry .entry-container .entry-body a, #current-entry .entry-container a.entry-post-author-name {
    color: #2244BB !important;
}
#entries.list #current-entry .collapsed {
    border-color: #6688EE !important;
}
#entries.list .expanded .collapsed {
    border-bottom-width: 0 !important;
}
#entries.list #current-entry.expanded {
    border-color: #6688EE;
    border-style: solid !important;
    border-width: 0 2px !important;
}
#entries.list #current-entry.expanded .collapsed {
    border-color: #6688EE -moz-use-text-color;
    border-left: medium none !important;
    border-right: medium none !important;
}
#entries.list #current-entry.expanded .entry-actions {
    border-bottom-color: #6688EE !important;
}
#entries.list .entry .entry-actions {
    background-color: #EBEFF9 !important;
    border-top: 1px solid #C2CFF1 !important;
    color: #333333 !important;
}
.entry .entry-actions a, .entry .entry-actions .link {
    color: #2244BB !important;
}

/*This changes the folder icon. Play around with alpha settings to change opacity */

/*.folder-icon {
 background-position: 0 0 !important;
 background: url(https://ssl.gstatic.com/docs/doclist/images/collectionsprite_1.png) no-repeat !important;
}*/

.folder-icon, .tag-icon {
opacity: .1 !important;
filter: alpha(opacity=10) !important;

}

/* change colour of subscribe button */
.jfk-button-primary {
 background: #f2f2f2 url(none) !important;
 border: 1px solid #DCDCDC !important;
 color: black !important;
}

}

To use this style, install Stylish, go to AddOns manager, and click 'Write a New Style', pasting in the code above.

Monday 31 October 2011

Trick or Treat, Extortion and Patents

Tonight, the 31st of October, many children will be out "trick or treating". While some may call me a grouch, I find the whole concept rather disturbing.

Trick or treat is very simple. The child knocks on your door and offers you a choice. Either give them a "treat" (money or sweets), or they will perform a "trick" (something unspecified, but probably something very annoying and messy).

What other words and phrases do we have that describe this modus operandi:

"Money with menaces"

"Protection racket"

"Extortion"

"Blackmail"

So, around the world, thousands of parents are happily letting their child go out and blackmail/extort people in their own homes. Personally, I find the parenting aspect of this deeply disturbing - teaching children how to use extortion to get what they want.

So, what's the link to patents?

Well, patents have become simply another form of extortion.

Company X goes along to company Y and says "look, I have a great wodge of patents here, pay me lots of money or I'll cause you lots of grief". It's simply trick or treat in corporate clothes.

(Lawsuits, at least in the US and UK, tend to be won by the party with the most money, not the party who is "right". Thus, for most companies the extortion threat is generally strong enough to make a smaller company pay the protection money.)

The effects of this are causing terrible grief to this industry. Right now everyone is suing everyone. Well, apart from IBM, and to some degree Microsoft, who appear to have a big enough wodge of patents to waft around that they can scare everyone else into submission. Heck, IBM even has a patent on the "business process" of extortion by patent.

The situation is now at a point where I would suggest the following advice holds for entrepreneurs.

Make sure that your startup is either relatively small and insignificant, so that it doesn't attract attention (if you have no money it's not worth blackmailing you), or so wildly successful that you can afford to pay the extortion/protection money.

(In case you were wondering, Google is the classic case of a company that became wildly successful and wealthy but didn't submit to extortion. And one look at the mess of lawsuits that it, and its Android partners, are suffering shows how bad it can get, such as Microsoft earning more from Android through extortion than it does from its own phone OS. If you know patents and want a job, Google has lots on offer.)

Viewed in this way, it should be clear why patents are not "protecting the little guy" or "promoting innovation". They simply reward longevity and ability to be scary.

So tonight I encourage those parents amongst my readers not to allow your children to go out and learn how to extort/blackmail. Perhaps in 20 years' time corporates will be a little better as a result.

Wednesday 26 October 2011

Transparency in action

Some days I think I have infinite patience. Today was a day I was reminded that I don't.

Here is what Oracle said in the JCP submission for Project Lambda (November 2010). Interspersed is what has actually happened (non-contentious points snipped):

2.14 Please describe the anticipated working model for the Expert Group working on developing this specification.

A publicly readable mailing list for expert group communication will be the primary working model.

The expert group mailing list is not publicly readable. This has effectively created a 'them and us' environment where no rationale can be seen and no real input can be provided.

2.15 Provide detailed answers to the transparency checklist, making sure to include URLs as appropriate:
...
- The Expert Group business is regularly reported on a publicly readable alias.
We intend this to be the case.

Expert group business has occasionally been reported to the public lambda-dev list. However, feedback could in no way be described as "regular". Today (the straw that finally broke the camel's back of my patience) I found that the most likely syntax for method references was being touted in an IBM developerWorks article without any input from the main public mailing list at all - I'm sorry, but you cannot talk about transparency and then ignore the only vaguely transparent element in the system in key decisions.

- The public can read/write to a wiki for my JSR.
Rather than a wiki which the Expert Group must take time out to visit, we intend to implement a pair of mailing lists following the approach of JSR 294 and JSR 330. First, Expert Group business will be carried out on a mailing list dedicated to Expert Group members. Second, an "observer" mailing list which anyone may join will receive the traffic of the Expert Group list. The observer list allows the public to follow Expert Group business, and is writable to allow the public to make comments for other observers to see and make further comment on. Expert Group members are under no obligation to join the observer list and partake in public discussion. In addition to these lists we will maintain a private Expert-Group-only list for administrative and other non-technical traffic.

Nope. No observer list with public archives. And obviously not writeable. All meaningful discussions are in private and hidden.

- I read and respond to posts on the discussion board for my JSR on jcp.org.
In lieu of tracking the jcp.org discussion board, and in light of the considerable public attention that the observer list is likely to receive, the Specification Lead (or his designates) will read the observer list and respond directly as appropriate.

Obviously this fails too.

- There is an issue-tracker for my JSR that the public can read.
We intend this to be the case.

Anyone seen an issue tracker? Not me.

Now, I've been patient on this as I know privately some of the reasons why there is no transparency. But frankly enough is enough. If the management team cared enough about this they would have escalated the priority of this sufficiently by now to have removed the roadblocks.

After all, not even members of the JCP Executive Committee get to see what is going on, as per the comments for Java 7:

...
------------------------------------------------------------------------------
On 2011-07-18 Keil, Werner commented:
... want to emphasize again the concern about intransparent ways this umbrella JSR and some of the underlying EGs have worked.

Hoping greater transparency in the spirit of JCP.next and also more satisfactory licensing terms follow with Java 8 and beyond.
------------------------------------------------------------------------------
On 2011-07-16 London Java Community commented:
...
We note that the archives for some of the Expert Groups that comprise this JSR have not been made public. It is most regrettable that this did not happen prior to this being put to final vote.

We trust that no further platform JSRs will be submitted without full access to EG archives - we would be very unlikely to support any such JSR.

Summary

However you look at it, transparency simply isn't happening with key JCP Java SE projects. (The same issues afflict other Java SE projects)

But don't worry, it's all going to get magically better! Oracle recently signed up for more transparency in JSR-348. Is anyone out there still willing to believe it's actually going to happen?

Monday 17 October 2011

IANA running time-zone database

This is a quick note to alert readers to the fact that IANA, the Internet Assigned Numbers Authority, has taken over the running of the time-zone database. The new home is at IANA.

The formal announcement is as follows:

ICANN to Manage Time Zone Database

The Internet Corporation for Assigned Names and Numbers (ICANN) today took over operation of an Internet Time Zone Database that is used by a number of major computer systems.

ICANN agreed to manage the database after receiving a request from the Internet Engineering Task Force (IETF).

The database contains time zone code and data that computer programs and operating systems such as Unix, Linux, Java, and Oracle rely on to determine the correct time for a given location. Modifications to the database occur frequently throughout the year.

"The time zone database is used by a large number of commercial operating systems and the software applications," said Russ Housley, chairman of the IETF. "Incorrect time zone information will impact many everyday activities, including meeting and conference call coordination, airplane and train schedules, physical package delivery notices, and astronomical observatories."

For nearly three decades, the TZ Database had been maintained by a group of dedicated volunteers, in particular, Arthur David Olson at the US National Institutes of Health. Olson coordinated the group, managed the data, and created a platform for their release. Olson's announced retirement prompted the IETF to turn to ICANN to ensure continued operation of the database.

"The Time Zone Database provides an essential service on the Internet and keeping it operational falls within ICANN's mission of maintaining a stable and dependable Internet," said Akram Atallah, ICANN's Chief Operating Officer.

This good news should hopefully provide a more secure legal footing for the database.

Thursday 13 October 2011

Time-zone database - Astrolabe's opinion

I have had the following email statement from Astrolabe forwarded to me with regards to the time-zone database dispute:

Shortly after filing a copyright violation lawsuit with the U.S District Court for Massachusetts against Mr. Arthur David Olson or Mr. Paul R. Eggert, Astrolabe Inc., received dozens of communications via emails, telephone calls, Facebook postings, tweets, and inquiries. While there have been some legitimate inquiries, many of these communications have been hostile and accusatory in nature; Astrolabe, Inc. has been attacked on many levels, leading it to conclude, after careful review of these communications, that the purpose of this lawsuit has been misunderstood by the media and the public. Astrolabe, Inc., seeks to clarify this purpose in an effort to clear the air, and allay misguided concerns, by clearly setting forth the relevant facts:

1. Astrolabe’s lawsuit is in no way intended to interfere with compilation of current time-zone information maintained by Mssrs. Olson and Eggert, or any other persons. Indeed, Astrolabe applauds the efforts of Olson, Eggert and the many other volunteers who maintain the database of current time changes.

2. The aim of the suit is only to enforce copyright protection for materials regarding historical time data prior to 2000. This does not affect current time-setting on computers, and it has little or no effect on the Unix computing world, as erroneously reported by various media outlets, to the best of Astrolabe’s knowledge.

3. Late in 2010, Astrolabe was disturbed to learn of the compilation effort of Mssrs. Olson and Eggert, which had been going on for a number of years, included not only current time-zone information, but also historical information. In response to Astrolabe’s inquiries, Mssrs. Olson and Eggert provided misleading answers, indicating that their database included only incidental and limited reproduction of the copyrighted material, however, further research by Astrolabe revealed that this was not the case, but instead consisted of wholesale reproduction of the same, without lawful permission, contract or license.

4. The fact is that the historical time data compiled by ACS is protected by registered copyrights, particularly in publication in book form as the International Atlas, and later in electronic form as the ACS PC Atlas. The question of whether the material is “copyrightable” has already been decided by the U.S. Copyright Office in the affirmative.

5. These Atlases are not simply “compilations” of historical, readily available “facts.” Besides researching “official” records, the publisher and authors consulted a myriad of other records using proprietary methods and, on some occasions, hiring local investigators. Where inconsistencies existed, the publisher and authors used their best judgments and expertise as to the actual time observed in specific locations, based on this historical research. In much the same way as Zagat Survey and Michelin Guide not only set forth the names, addresses and features of particular restaurants, but also various ratings, the Atlases comprise original historical time and location research, including judgments and expertise in determining actual historical time observed in any given location, fully meeting the definition of an “original work” as required under the Copyright Laws of the United States.

Conclusion

With growing globalization, an interest in arcane historical information, such as actual time observed in specific locations -- of vital import to the astrological community -- apparently has spread to the global community. As anyone with an interest in astrology will tell you, knowing the place and actual time when a particular event occurred is important. While the dubious may cast aspersions upon astrology, they cannot discount the contributions made regarding actual historical time observations by those interested in this field, or the utility such data may have in the current global community – nor should they be allowed to profit at the expense of the hard work and efforts of others.

(Email forwarded to me by Ken Hirsch who requested "on the record" comment. It's obviously been sent to others too, as seen here, and in a slightly longer form here)

I will again avoid commenting on the legal situation and let readers make their own judgements.

However I do note that Astrolabe are effectively saying that continued development of the time-zone database with new information based on Government changes is acceptable.

Monday 10 October 2011

Time zone database rebooted

Thanks to the efforts of Robert Elz and other community members, I can now report that the timezone database is effectively continuing without significant interruption.

The timezone database mailing list had been discussing how to handle the impending retirement of Arthur David Olson for a while before last week. IANA had effectively been chosen as the new home, and one that would have slightly more structure than the original home. As a result of this preparation, it was possible for key members of the community, led by Robert Elz, to restart the mailing list in a new location. The first message from Robert Elz invited list members to continue the good work:

...the world's governments continue to adjust their view of their local timezones & summer time adjustments, so we need to continue updating the database (at least). There are several updates that will be needed soon.

For now, until someone else volunteers, I'm prepared to make the updates, in much the same way that Arthur did, and then make new releases available from munnari's ftp server. I suspect that it is unlikely that either Arthur or Paul will be able to assist much with this in the coming months, so I'm hoping that others will assist where possible.

Following this message, there has been a lot of activity. This included the setting up of some new FTP mirrors for the data, discussion of new time-zone changes (in exactly the same manner as before), some history digging to find old/missing versions of code and data, discussion of PGP signing of the data, and some temporary git repositories (not ready to be cloned yet).

I think that all this energy once again demonstrates the value of open source communities and their resilience to difficulties when the code/data really matters to a wide cross-section of people. Thanks to everyone involved in providing that energy and spreading news of the attack.

However, none of us must forget Arthur David Olson and Paul Eggert, who still face the threat of a direct and personal lawsuit having given so much to the open community for zero cost and zero reward. I look forward to hearing better news for them personally.

Thursday 6 October 2011

Time-zone database down

Today (2011-10-06), the time-zone database was closed down.

It is perhaps easy to read that line, think it doesn't affect you, and then move on. But that's just not the case.

Update 2011-10-07: A backup copy of the mailing list and subscribers has now swung into operation run by volunteer Robert Elz. Please only join this list to discuss changes to time-zones - it's not a "fighting the lawsuit" list. New tz list

Update 2011-10-07: More details on the history of ACS, Astrolabe and how this came about from a competitor. Worth reading.

Update 2011-10-10: A new version of the time-zone database 2011l has been released. This news indicates how other members of the community have stepped up and continued the work, so the database can no longer be considered to be "down". However, we all need to ensure that we continue to support Olson and Eggert in their fight with the foolish people at Astrolabe. My blog about the rebooted database.

Update 2011-10-26: The situation has now stabilised. The database is now hosted by IANA after the successful database reboot. Astrolabe also gave their opinion on the matter.

The time-zone database (sometimes referred to as the Olson database) is the computing world's principal source of time-zone data. It is embedded in every Unix and Java for starters, and will be used by many websites and probably by your iPhone. You may know it via the IDs, such as "Europe/London" or "America/New_York".
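To see how directly these IDs surface in everyday code, here is a minimal sketch using the standard JDK class java.util.TimeZone, which looks zones up by their Olson IDs (the TimeZoneDemo class name is my own, purely illustrative):

```java
import java.util.TimeZone;

// The Olson IDs are the keys used by the JDK's own time-zone lookup
public class TimeZoneDemo {
    public static void main(String[] args) {
        TimeZone london = TimeZone.getTimeZone("Europe/London");
        TimeZone newYork = TimeZone.getTimeZone("America/New_York");
        System.out.println(london.getID());   // prints "Europe/London"
        System.out.println(newYork.getID());  // prints "America/New_York"
    }
}
```

Every one of those lookups ultimately resolves against data compiled from the Olson database, which is why its disappearance matters well beyond Unix.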

But, perhaps you're thinking that time-zones don't change? Well that may be true for America and the EU right now, but certainly isn't for the rest of the world. Governments change their time-zones all the time, and the decisions are frequently very political. I'd estimate there are between 20 and 100 separate changes made around the globe each year. And these can be at very short notice, triggered by earthquakes for example.

The time-zone database tracks all this information and creates a standard format file that describes it. I would show you an example of the file, but then perhaps I'd be sued....

The database itself was run as an open source project, led by Arthur David Olson, supported by many others. The data was published as a set of files about 15 times a year, and then picked up by users everywhere.

The complaint itself comes from Astrolabe, Inc, whose website looks like a company I would avoid doing business with.

The complaint is that Astrolabe produce a work, the "ACS Atlas", which is referenced by the time-zone database (some sources suggest that Astrolabe may have recently purchased the work). Astrolabe claim copyright over their work and thus believe that the time-zone database should not have released their information to the public domain. The case is targeted at two private individuals - Arthur David Olson and Paul Eggert, who have hosted the website for many years.

The key passage in the time-zone database files is this:

# From Paul Eggert (2006-03-22):
# A good source for time zone historical data in the US is
# Thomas G. Shanks, The American Atlas (5th edition),
# San Diego: ACS Publications, Inc. (1991).
# Make sure you have the errata sheet; the book is somewhat useless without it.
# It is the source for most of the pre-1991 US entries below.

For obvious reasons, I'll refrain from commenting on the rights and wrongs of the case, although I will note that facts like the phonebook cannot be copyrighted. A detailed response from one site taken down is now available. Instead I'll focus on the impact.

The impact of this is severe for anyone that uses it - whether via Java, Unix or some other means. This really is the key tool used by everyone to tell the right time globally. We all owe a debt of gratitude to the database maintainers who have worked on this for many, many years at zero cost to the industry and for zero financial gain.

So, right now the global situation is that there is no longer a single central location for time-zone information for computing. I'm sure that each major user project (like the Unix distros) will patch their own versions as best they can, but the stricter ones might argue that the current data is tainted and want to remove even that. This could get very messy very quickly.

Both Joda-Time and ThreeTen/JSR-310 use the data to build timezone information. ThreeTen/JSR-310 in particular provides this information in huge detail to applications. The worst case scenario is that multiple groups start up to provide this data in the future, and applications are then responsible for handling multiple competing data sources.

This data is so key to the world at this point that it needs to be formalised and run by a group with more legal and financial backing. Efforts had been ongoing to achieve this, but they may now be in jeopardy - who would want to take on a project being legally attacked?

I hereby call on the industry leaders to help sort this out - IBM, Oracle, Apple, Google, RedHat I'm looking at you.
Update: I didn't include Microsoft here because Windows has its own time-zone data files.

In the meantime, could I please ask that anyone thinking of patching the data on a temporary basis, or trying to recreate it from scratch, re-uses the existing file format. There is no reason to believe that the C code or file format is tainted by the lawsuit, just the data. So, let's all please try to minimise the mess that could happen if everyone starts to go their own way.

Add a comment or a rant below!

Update 2011-10-07: Alternatively you can rant on their Facebook page, Website contact us page or Amazon book review (note that the book may be by a different company - it's a little unclear).

Thursday 8 September 2011

Factory names

Building on my recent discussion of common method names, I wanted to add a little more detail on factory methods. Of course, this relates to Java - other languages have different conventions.

Factory methods

One of the habits I have got into is using factory methods for immutable classes and constructors for mutable ones. Like all rules, this isn't applied 100%, but it is a good place to start.

One of the benefits of factories is the ability to change the implementation class (such as to an optimised subclass). This can tackle performance issues without affecting the main application. Similarly, the factory might return a cached instance if the class is immutable. The JDK Integer.valueOf() method is a good example of applying a cache (although sadly there are more negative knock-on implications in Java).
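The caching point is worth a concrete sketch. Here is a hypothetical immutable Temperature class (the class and its cache policy are invented for illustration, not from any library) whose factory can return a shared instance, something a constructor can never do:

```java
// Hypothetical immutable class showing why a factory enables caching,
// in the spirit of Integer.valueOf()
public final class Temperature {
    // Cache the most common value; a constructor must always allocate,
    // but a factory is free to hand back this shared instance
    private static final Temperature ZERO = new Temperature(0);

    private final int degrees;

    private Temperature(int degrees) {
        this.degrees = degrees;
    }

    public static Temperature of(int degrees) {
        return degrees == 0 ? ZERO : new Temperature(degrees);
    }

    public int getDegrees() {
        return degrees;
    }
}
```

Because callers only ever see Temperature.of(...), the caching (or a later switch to an optimised subclass) is invisible to application code.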

But, what name should be used to define these factories?

The JDK default choice was Foo.valueOf(). This name is perfectly usable, and has the benefit of being well-known. But it is a bit more wordy than necessary.

The JDK added to this convention with EnumSet.of(), the Foo.of() naming pattern. This is my favourite choice, as "of" is short and clear for most cases.

Where necessary, it can be suffixed by something descriptive, such as OffsetDateTime.ofMidnight() or Duration.ofSeconds() (example). These factories are normally a little more complex in how they go about manipulating the input parameters into the state of the class, and the Javadoc of the factory will tend to be a little more complex.

Thus Foo.of() itself (no descriptive suffix) should be reserved for the most common case where there is no confusion as to meaning. This will typically take parameters that relate very simply to the internal state of the class (example). As such, there should be relatively little complication in the Javadoc that describes what the factory does.
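A hypothetical Money class (invented for illustration; not a real API) shows the two flavours side by side - a plain of() whose parameter is the state, and a suffixed factory that manipulates its input into the state:

```java
// Hypothetical class sketching the of() naming convention
public final class Money {
    private final long pence;

    private Money(long pence) {
        this.pence = pence;
    }

    // Plain of(): the parameter maps directly to the internal state
    public static Money of(long pence) {
        return new Money(pence);
    }

    // Descriptive suffix: the input must be converted to the state,
    // so the factory name says how
    public static Money ofPounds(long pounds) {
        return new Money(pounds * 100);
    }

    public long getPence() {
        return pence;
    }
}
```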

Sometimes, the "of" prefix doesn't make sense. So within a given API it may make sense to deviate. In ThreeTen/JSR-310 I have a variety of other common factory names.

The DateTime.now() variants create an instance of the class with the current time. This could be DateTime.ofNow(), but the functionality of the factory feels sufficiently different to justify its own specific factory name. (example)

Similarly, I use a specific factory name for parsing - DateTime.parse(). (example). Parsing is a very specific operation that really justifies standing out in the API. Note that if the string being parsed is effectively just an identifier, and therefore the actual state of the class, then I would use "of", not "parse", as in ZoneId.of("Europe/London").
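The of-versus-parse distinction can be sketched with a hypothetical ProductCode class (the class and its "PC-" display prefix are invented purely to illustrate the rule):

```java
// Hypothetical class contrasting of() with parse()
public final class ProductCode {
    private final String code;

    private ProductCode(String code) {
        this.code = code;
    }

    // of(): the string IS the internal state, like ZoneId.of("Europe/London")
    public static ProductCode of(String code) {
        return new ProductCode(code);
    }

    // parse(): the text must be interpreted to derive the state,
    // here by stripping an assumed "PC-" display prefix
    public static ProductCode parse(String text) {
        String code = text.startsWith("PC-") ? text.substring(3) : text;
        return new ProductCode(code);
    }

    public String getCode() {
        return code;
    }
}
```

Both factories accept a String, but the names tell the caller whether any interpretation happens on the way in.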

For these more complex cases, it's about clarity. For example, Duration.between(a,b), to calculate the duration between two instants, is a lot clearer than Duration.ofBetween(a,b).

There are lots of possible alternatives: Foo.make(), Foo.from(), Foo.for() (which requires a suffix to make a valid method name!), and many more. The advantage of "from" is that it is the opposite of "to", when converting from another type. However, in most cases, I find the consistency and simplicity of using "of" as a general factory prefix more useful.

And of course a key advantage of method prefix naming strategies is that you can type Foo.of and do your IDE auto-complete to see all the available principal factories. That is a key API usability feature.

Finally, I particularly dislike factory methods that start with "get", like Foo.getFoo() or Foo.getInstance(). These are really confusing in all circumstances, but especially in an IDE where auto-complete after "get" really shouldn't show a factory.

Summary

I like to use Foo.of() for most of my factory methods, supplemented by Foo.parse() and other specialist variants. I also try to use factory methods for immutable classes, and constructors for mutable classes where possible and sensible.

Any thoughts on this? Comments welcome...