Friday 2 May 2008

Enhancing Java - Multi-lingual blocks

The reality for Java is that there are many other programming languages, and many of those have features that Java developers sometimes wish they could access. But its simply impossible to add all those features. Is there a possible alternative if we think 'outside the box'?

Multi-lingual

What I'm thinking about in this blog is the possibility of embedding Groovy, Ruby, Jython or Scala code directly within Java code.

Why might that be useful?

Well each language has their own benefits, whether Scala's functional style or Groovy's GStrings. Including a small part of another language within the main code body could be useful, although obviously this would be a technique to be used with care.

And it doesn't have to stop at known languages. What about a dedicated 'SQL language'? Or a dedicated 'XML language'? These would be more than just DSLs, but actual languages with whatever syntax rules are most applicable.

So, what might a syntax look like:

 public String fetchRow(int id) {
   :groovy: {
     println "Row id: $id!"
   }
   :sql: {
     SELECT %text% FROM my_table WHERE row_id = %id%;
   }
   return text;
 }

The idea is that a block of code, surrounded by curly brackets, can be identified as belonging to a different language. In this case I've used the syntax of the name of the language (which would have to be imported) surrounded by colons. Note that there is nothing specific about the syntax within the block. Bear in mind that the syntax isn't that important - its the concept that matters.

The Groovy example - just normal Groovy code - outputs the row id using an embedded string. The SQL example is an invented 'language' where a column is read by id, and then returned to the Java code as the variable text.

So, what about the detail? Well, the approach requires two parts.

Firstly, there needs to be a parser for each language that understands the relevant syntax. This will typically be a variation of the normal parser for a 'real' language like Scala or Ruby. For a new language like SQL or XML, it would be written from scratch. The parser also needs to be able to recognise when the block of code in that language is complete.

Secondly, the parser needs to be able to share variables with the surrounding code. As a basic principle, this can be thought of as a map, where the other language code can both read and write to the map. Of course this requires there to be a mapping between the various type systems - for Groovy this should be easy, other languages might find that more tricky.

So how hard is this to implement? Probably pretty hard. But it does open up lots of possibiilties - wheter for embedded DSLs or larger blocks of code in another language.

Summary

This is an outline of an idea to allow other languages, whether existing or new, to be easily embedded directly in existing code. Any thoughts?