Friday 2 May 2008

Enhancing Java - Multi-lingual blocks

The reality for Java is that there are many other programming languages, and many of those have features that Java developers sometimes wish they could access. But it's simply impossible to add all those features to Java. Is there a possible alternative if we think 'outside the box'?


What I'm thinking about in this blog is the possibility of embedding Groovy, Ruby, Jython or Scala code directly within Java code.

Why might that be useful?

Well, each language has its own benefits, whether Scala's functional style or Groovy's GStrings. Including a small part of another language within the main code body could be useful, although obviously this technique would need to be used with care.

And it doesn't have to stop at known languages. What about a dedicated 'SQL language'? Or a dedicated 'XML language'? These would be more than just DSLs; they would be actual languages with whatever syntax rules are most applicable.

So, what might the syntax look like? Perhaps something like this:

 public String fetchRow(int id) {
   :groovy: {
     println "Row id: $id!"
   }
   :sql: {
     SELECT %text% FROM my_table WHERE row_id = %id%;
   }
   return text;
 }

The idea is that a block of code, surrounded by curly brackets, can be identified as belonging to a different language. In this case I've used the name of the language (which would have to be imported) surrounded by colons. Note that the syntax within the block is whatever that language defines; nothing about it is Java-specific. Bear in mind that the marker syntax isn't that important - it's the concept that matters.

The Groovy example - just normal Groovy code - outputs the row id using a GString with an embedded variable. The SQL example uses an invented 'language' in which a column is read from the row matching id, and then returned to the Java code as the variable text.
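To make the invented SQL 'language' a little more concrete, here is a minimal sketch, assuming a plugin could translate the block into an ordinary JDBC-style query. The rule used here, that placeholders before FROM are outputs (columns copied back into Java variables) and placeholders after it are inputs (bound from Java variables), is an assumption for illustration only, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical translation of the invented :sql: block into a JDBC-style query. */
public class SqlBlockTranslator {

    /** Result of translating one block: the query text plus the Java variable names involved. */
    public record Translation(String sql, List<String> inputs, List<String> outputs) {}

    private static final Pattern PLACEHOLDER = Pattern.compile("%(\\w+)%");

    public static Translation translate(String block) {
        String body = block.trim();
        if (body.endsWith(";")) {
            // JDBC query strings normally omit the trailing semicolon
            body = body.substring(0, body.length() - 1);
        }
        int from = body.toUpperCase(Locale.ROOT).indexOf(" FROM ");
        List<String> inputs = new ArrayList<>();
        List<String> outputs = new ArrayList<>();
        StringBuilder sql = new StringBuilder();
        Matcher m = PLACEHOLDER.matcher(body);
        while (m.find()) {
            String name = m.group(1);
            if (from < 0 || m.start() < from) {
                // Assumed rule: before FROM, %name% is a column written back to a Java variable
                outputs.add(name);
                m.appendReplacement(sql, name);
            } else {
                // Assumed rule: after FROM, %name% is a Java variable bound as a parameter
                inputs.add(name);
                m.appendReplacement(sql, "?");
            }
        }
        m.appendTail(sql);
        return new Translation(sql.toString(), inputs, outputs);
    }

    public static void main(String[] args) {
        Translation t = translate("SELECT %text% FROM my_table WHERE row_id = %id%;");
        System.out.println(t.sql());     // SELECT text FROM my_table WHERE row_id = ?
        System.out.println(t.inputs());  // [id]
        System.out.println(t.outputs()); // [text]
    }
}
```

A real plugin would presumably then hand the query to a PreparedStatement, binding each input with setObject and reading each output column from the ResultSet.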

So, what about the detail? Well, the approach requires two parts.

Firstly, there needs to be a parser for each language that understands the relevant syntax. This will typically be a variation of the normal parser for a 'real' language like Scala or Ruby. For a new language like SQL or XML, it would be written from scratch. The parser also needs to be able to recognise when the block of code in that language is complete.
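Recognising where an embedded block ends is the tricky part when the embedded language also uses braces. The sketch below assumes the host compiler spots a :name: { marker and then uses a naive brace counter that skips double-quoted strings; a real plugin would need its own language's full lexer (comments, character literals, heredocs and so on). All names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical host-side scanner for blocks written as :language: { ... } */
public class EmbeddedBlockScanner {

    /** One embedded block: the language name, the raw text between the braces, and where it ends. */
    public record Block(String language, String body, int endIndex) {}

    private static final Pattern MARKER = Pattern.compile(":(\\w+):\\s*\\{");

    /** Finds the first embedded block at or after 'start', or returns null if there is none. */
    public static Block next(String source, int start) {
        Matcher m = MARKER.matcher(source);
        if (!m.find(start)) {
            return null;
        }
        int depth = 1;            // counts the marker's own opening brace
        boolean inString = false; // naive: only double-quoted strings are skipped
        for (int i = m.end(); i < source.length(); i++) {
            char c = source.charAt(i);
            if (inString) {
                if (c == '\\') {
                    i++;          // skip the escaped character
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{') {
                depth++;
            } else if (c == '}' && --depth == 0) {
                return new Block(m.group(1), source.substring(m.end(), i).trim(), i + 1);
            }
        }
        throw new IllegalArgumentException("Unterminated :" + m.group(1) + ": block");
    }

    public static void main(String[] args) {
        String src = ":groovy: { println \"Row id: $id!\" }\n"
                + ":sql: { SELECT %text% FROM my_table WHERE row_id = %id%; }";
        for (Block b = next(src, 0); b != null; b = next(src, b.endIndex())) {
            System.out.println(b.language() + " -> " + b.body());
        }
    }
}
```

Skipping strings matters: without it, a brace inside a Groovy string literal would end the block early, which is exactly the kind of confusing error a mismatched bracket would cause.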

Secondly, the parser needs to be able to share variables with the surrounding code. As a basic principle, this can be thought of as a map that the other language's code can both read from and write to. Of course, this requires a mapping between the various type systems - for Groovy this should be easy, but other languages might find it trickier.
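The map-based sharing can be sketched in plain Java. This assumes a hypothetical EmbeddedBlock interface that each language plugin's compiled code would implement, and shows the copy-in/copy-out code a compiler might generate around the :sql: block in fetchRow. A lambda stands in for the plugin's compiled SQL, and all names are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedVariablesDemo {

    /** Hypothetical contract that a language plugin's compiled block would implement. */
    @FunctionalInterface
    interface EmbeddedBlock {
        void execute(Map<String, Object> vars);
    }

    // Stand-in for the compiled :sql: block; a real plugin would run a query here.
    static final EmbeddedBlock SQL_BLOCK = vars -> {
        int id = (Integer) vars.get("id");      // read a variable from Java
        vars.put("text", "row text for " + id); // write a variable back to Java
    };

    /** Roughly what the compiler might generate for fetchRow(int id). */
    public static String fetchRow(int id) {
        String text = null;
        Map<String, Object> vars = new HashMap<>();
        vars.put("id", id);                     // copy in: Java locals visible to the block
        vars.put("text", text);
        SQL_BLOCK.execute(vars);
        text = (String) vars.get("text");       // copy out: cast back to the declared Java type
        return text;
    }

    public static void main(String[] args) {
        System.out.println(fetchRow(42)); // row text for 42
    }
}
```

The casts on the way out are where the type-system mapping bites: Groovy values are already Java objects, so they cast cleanly, whereas a language with its own value representation would need a conversion step at that point.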

So how hard is this to implement? Probably pretty hard. But it does open up lots of possibilities - whether for embedded DSLs or larger blocks of code in another language.


This is an outline of an idea to allow other languages, whether existing or new, to be easily embedded directly in existing code. Any thoughts?


  1. Wow, there's a refreshingly radical new idea. Much much better than today's misuse of annotation strings.

    We'd probably want rudimentary versioning, say:

    SELECT COUNT(*) AS %cnt% FROM xyz;

    In fact, having versioned pluggable language providers would allow us to escape legacy Java land for good:

    void asyncHelloWorld(Executor ex) {
      ex.execute({=> System.out.println("Hello world"); });
    }

    I guess it would also open the door to writing native JVM bytecode, so that we can finally do a swap:


    So, when can we play with a prototype? ;)

  2. Looks interesting, but I'm afraid that IDEs and static analysis tools will be dramatically less useful in such an environment.

    You can still use other languages right now, and the embedding API cleanly delineates them from your Java code. It's cool, but not very practical (in the way a monocycle is).

  3. Sounds a tiny bit like SQLJ:

  4. I think this is not a good idea! This will be a debugging nightmare...

  5. The funny thing about this is that it matches my thoughts from when you proposed multi-line strings, only my idea was the wrong way round and didn't work out.
    What you actually propose is to include some textual fragment that gets interpreted/translated by some developer-defined, external parser that produces the bytecode for you. I'd extend it so that external parsers are imported as a class and given an alias for use:
      import native my.great.domain.Java3000Parser :J3K:;
      import native java.lang.MultilineStringParser :":;
    These could then be applied as you have shown, but should support a return value, too:
      List names = :J3K: {\ getName() ) };
      String response = :": {
        Hello ${name}!
        This is a multiline text that ignores
        initial return and leading spaces.
        Cheers, ${user.getName()}
      };
    I'm not sure about the "user.getName()", but it may be interesting to have the escape delimiters not only refer to variables but also act as a "callback" into the enclosing Java environment.

  6. Nice article - there's some good thinking behind this.
    I'd really like to see this up on JavaLobby. If you're interested, send me an email and we can organise it.


  7. Hi,

    Sorry, I think it's a sloppy-thinking-coder impossible-to-parse disaster waiting to happen.

    Look at how mix-ins like POP-11 were misused by poor programmers who thought that changing the language would make the hard thinking go away.



  8. Stephen Colebourne, 2 May 2008 at 12:22

    Caspar, yes versioning would be possible, but I'm not sure whether it needs to be part of the feature, or just a naming convention. You're right about the Java v7/8 concept though.

    I'd love someone to develop a prototype...

    Dimitar, yes IDEs are a critical factor. It would be up to the writers of the plugin parsers to provide whatever IDE support they feel is necessary, from text editing to debugging. Basic IDE support would be at the level of text editing only for these blocks.

    Pete, yes the idea is not unlike SQLJ. But it's also not unlike the native XML idea that used to be talked about for Java 7. And it's not unlike the direct embedding of 'unsafe' C code in C#.

    Stefan, I agree entirely on the need for a specialised import.

    Damon, I definitely think it's parsable syntax, but I'm not sure that's what you mean. I also think it's an extra risk point in your code, allowing coders to write complicated code in another language that confuses their co-workers.

    But, the point is that it would be easy to stop developers doing this. Each plugin parser would almost certainly be a separate jar file, so if the team lead blocks the jar file dependency then developers can't access it. Bear in mind that C# already has a feature like this with 'unsafe'.

  9. I don't see how this can be useful.
    Could you provide an example explaining that?
    It seems to me a feature that could only add extra complexity to the source code.

    I think that this does not add extra functionality.
    Currently, if you want to use SQL in Java you can simply pass a SQL string to your JDBC driver.
    If you want to interact with Groovy you can create and compile a class with Groovy. In this case the interface between the Java code and the Groovy code is clearer.

  10. Don't use single brackets because it makes trying to parse foreign languages a pain in the arse, and if you mismatch brackets in a bracket-ful embedded language, the compiler will get extremely lost and you get a plateful of weird errors.

    Use something sufficiently weird that it's unlikely to legitimately show up in any language. Use multiple symbols, such as {:{ with no whitespace in between, or some such.

    I would also suggest that the keywords are turned into java classes. You could then go and do:

    with ( java.languages.python.Python ) {:{
    def whatever:

    and of course with an import statement that just becomes with ( Python ). Otherwise you need some dumb 'registration' system the way JDBC drivers do. Blech. Now there's a namespace conflict too if someone else wants to write a different plugin. Also blech. Here's a really weird hack: what if the with() marker took an annotation? The thing with annotations is, they already come out of the box with the ability and limitation you need: you can specify a class, you can specify a version, the version stuff can be optional, and whatever you do, it must be compile-time constants. That's exactly what you want.

  11. Stephen Colebourne, 2 May 2008 at 14:54

    Andrea, many developers are finally realising that Java isn't the be-all and end-all of programming languages. However, they also want to keep the bulk of their development and coding within Java, for many good reasons, such as the availability of developers and shared knowledge.

    This technique would allow developers the ability to break out into alternative languages (either full or DSL) on demand, and with a lot less effort than today.

    Today, the developer has to find the other language, setup the environment, work out how to compile code using it, and keep all the source in a separate file.

    With the technique I've outlined, the developer just adds a jar to the classpath and starts coding. A standard solution to many problems.

    BTW, the SQL example above includes reading a variable from Java, and writing a variable back to Java, despite being written in a different language. That is quite a neat trick, and much more powerful than just passing a string to JDBC.

  12. I like schemes that allow for evolution rather than revolution. With this scheme I can move to Groovy one snippet at a time instead of committing my whole project (or entire classes) to it.

    What does the parser generate? Ideally it would be able to generate any level: Java source (to be fed back into the parser), simple AST, post-attribution AST, post-desugaring AST, and bytecode. It'd be a great way to prototype new Java concepts, like your map for-each. For that you'd want to be able to hook in to the existing Java parser so you don't have to write one from scratch.

    This reminds me a bit of Meta-Lua. A friend and I have been thinking about Meta-Java for a while, and this seems like a variation of that idea.

    Other syntax ideas include PHP-style tags, or <%language %>.

  13. I agree with Remi that it is better to stick to a single language in the source code. However, that doesn't mean the IDE couldn't help us do some cool stuff on top of it. See for example this use case and the corresponding papers from the code block annotations proposal that has been researched as part of JSR-308.

  14. I have used C compilers that support the asm keyword to allow embedded assembly code in with the C. In general it was only practical for short snippets. Also people generally found it easier to encapsulate the assembly into a method or into a macro, so that people didn't need to read the assembly. I have never missed the asm construct in other C compilers and I suspect multilingual inline code will be more of a pain than a benefit, like asm is.

    Another option is to use a source statement:

    At the top of a file you would write:

    source Java7

    or

    source Python3000

    Then a multilingual compiler could compile all the languages in one hit. Also tools would be able to know if they were applicable to the file. Further, it is easy for people to tell what is happening. It also allows languages to evolve; e.g. Java7 need not be source compatible with 6, because Java7 could specify a source statement as compulsory. The source RFE referenced above also suggests an escape sequence for keywords so that APIs that use Java keywords can be called from Java.

  15. I cannot see a use in a file-wide source statement. You could easily set up your development environment to make it compile different files with specific compilers, which actually is what that meta-preprocessor approach would be.

  16. A method-scope declaration would be good.
    Please don't make fragments like this.
    I can't imagine any case where you should use multiple languages in a single method.

  17. Stephen,

    I have been thinking along these lines for a while. I commend you for your idea.

    We already embed many languages and mini-languages in our code. For instance SQL, HQL, XPath, SimpleDateFormat, HTML, etc. Many of these are embedded in Strings. If we could indicate what language we are using then the syntax could be checked by the compiler. This would help reduce errors.

  18. Hello,

    I've been thinking about this kind of "Multilingual Java" feature every now and then.

    There are generally 2 use cases where I have missed the mixed language support:
    * Various string format expressions, regular expressions and the like, where having a compile-time syntax check and validation (not to mention IDE editor support with autocompletion and syntax highlighting) would greatly enhance my productivity.
    * Narrowly specialized (mostly declarative) DSLs for laying out GUI widgets, performing SQL queries, etc.

    The first case would only require a little syntactic sugar, which could easily be provided by annotating variables or properties that accept special syntax strings. Something like this:

    public static Pattern compile(
        @Language(RegexPatternLanguage.class)
        String regex) {
      return new Pattern(regex, 0);
    }

  19. The second use case could easily be reduced to the first one without requiring too much

    It would be of course much more elegant to allow full blown mixed language support, but have you stopped to consider all the implications?

    Syntax aside, to make this feature even remotely useful, these foreign-language code blocks must essentially behave as closures, and as such cannot be implemented until Java has closure support.

    Ignoring for a moment the foreseeable horrors of debugging and maintaining such a code mixture (if used wisely, this will increase maintainability, but we all know that this blade cuts both ways), what do we do with exceptions thrown from the foreign-language block? What is the scope/visibility of the variables around the foreign-language block? I am sure these concerns are all familiar...

  20. Stephen Colebourne, 8 May 2008 at 17:23

    While I do agree that this is a form of closure, there is no need for this to be implemented as a closure. The alternate language parser should generate bytecode, which can be directly integrated into the host method.

    The key point about the whole proposal is increasing the ability of the compiler to check the code we write - obviously in cooperation with the IDE.

  21. You know you've been programming Scala too long when you see println "Row id: $id!" and straight away think "oh no! his function has side effects!!" ;)
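Stefan's :": multiline-string block from comment 5 is specific enough to sketch. A minimal version, assuming the plugin receives the raw block text plus a map of in-scope variables: it drops the initial line break, strips each line's leading whitespace, and replaces ${name} with the named variable. Method-call escapes such as ${user.getName()} are deliberately left untouched, since Stefan was unsure about them himself. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical runtime behaviour for the :": multiline-string block. */
public class MultilineStringBlock {

    // Matches simple ${name} escapes only; ${user.getName()} is not matched.
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{(\\w+)\\}");

    public static String render(String raw, Map<String, ?> vars) {
        // Ignore the line break straight after the opening brace
        String text = raw.startsWith("\n") ? raw.substring(1) : raw;

        // Ignore leading whitespace on every line
        String[] lines = text.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            lines[i] = lines[i].stripLeading();
        }
        String joined = String.join("\n", lines);

        // Substitute ${name} from the enclosing scope; unknown names are left verbatim
        Matcher m = VARIABLE.matcher(joined);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Object value = vars.get(m.group(1));
            String replacement = (value != null) ? value.toString() : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "\n    Hello ${name}!\n    This is a multiline text.\n    Cheers, ${user.getName()}";
        System.out.println(render(raw, Map.of("name", "Stephen")));
    }
}
```

Leaving unknown escapes in place, rather than failing, is just one choice; a compile-time plugin could instead report them as errors, which is closer to the compiler-checking goal Stephen describes in comment 20.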

