Monday, 28 January 2008

Java 7 - Multi-line String literals

One of the most common features in other programming languages is the multi-line String literal. Would it be possible to add this to Java?

Update, 2011-10-31: Just wanted to note that multi-line strings are not in Java 7, nor are they likely to be in Java 8. This blog post is still useful for understanding some of the difficulties that would have to be tackled if they were to be included in the future.

Multi-line String literals

In Java today there is only one form of string literal, supplied in double quotes. Within those double quotes, certain characters have to be escaped.

 String basic = "Hello";
 String three = "This string\nspans three\nlines";
 String welcome = "Hello, My name is \"Stephen\", Hi!";

The first of these three examples is simple and would not benefit from a multi-line String literal. The other two might be more readable with such a literal.

The standard for defining a multi-line String literal in both Scala and Groovy is three double quotes. This also seems like a sensible choice for Java:

  String three = """This string
spans three
lines""";

This is potentially much more readable, especially with large blocks of text. This form of literal would also avoid the need for escaping:

 String welcome = """Hello, My name is "Stephen", Hi!""";

Note that we no longer need to escape the double quotes. This would be especially useful for regular expressions.
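
To make the regular expression point concrete, here is a small sketch of the escaping that current Java literals require. The class name, method name and sample pattern are made up for illustration and are not from any proposal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: names and pattern are assumptions, not part of any proposal.
class RegexEscapingDemo {
    // The regex we actually mean is:   key\s*=\s*"(\w+)"
    // In a current Java literal every backslash is doubled and every
    // double quote is escaped, which obscures the pattern.
    static final Pattern KEY_VALUE = Pattern.compile("key\\s*=\\s*\"(\\w+)\"");

    // Returns the quoted value, or null if the input does not match.
    static String extractValue(String input) {
        Matcher m = KEY_VALUE.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractValue("key = \"Stephen\"")); // prints Stephen
    }
}
```

With a no-escape literal, the pattern text could be pasted in exactly as it appears in the comment.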

Bear in mind that multi-line String literals would be no different from normal String literals in terms of the object created: both would produce java.lang.String objects.

Issues

The first issue is the multi-line arrangement. Since all text within the multi-line literal is included, all lines except the first must begin from column zero. This will look odd in a piece of well-formatted Java code:

  // what a naive multi-line literal forces us to write
  public class MyClass {
    public void doStuff() {
      String three = """This string
spans three
lines""";
      System.out.println(three);
    }
  }
  
  // what we'd like to write
  public class MyClass {
    public void doStuff() {
      String three = """This string
                        spans three
                        lines""";
      System.out.println(three);
    }
  }

One possible solution is to provide a method on String that strips all whitespace after each newline. This could be called directly after the literal. Unfortunately this approach loses some efficiency as the string must be trimmed each time:

  // option with trimNewLine()
  public class MyClass {
    public void doStuff() {
      String three = """This string
                        spans three
                        lines""".trimNewLine();
      System.out.println(three);
    }
  }
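
As a rough sketch of what such a method might do, here is a static helper written against today's Java. String has no trimNewLine() method; the name comes from the example above, and treating only spaces and tabs as strippable whitespace is my assumption.

```java
// Sketch of the hypothetical trimNewLine() behaviour as a static helper.
class TrimNewLineSketch {
    // Removes spaces and tabs that directly follow each newline.
    // Assumption: only ' ' and '\t' count as whitespace to strip,
    // so blank lines inside the text are preserved.
    static String trimNewLine(String s) {
        return s.replaceAll("\n[ \\t]+", "\n");
    }

    public static void main(String[] args) {
        String three = trimNewLine("This string\n          spans three\n          lines");
        System.out.println(three);
    }
}
```

The efficiency concern could be reduced by storing the trimmed result in a static final field, so the trimming runs once per class load rather than on every call.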

Another, perhaps better, solution might be to have a syntax variation. If the opening triple quote is followed immediately by a newline, then the position of the first non-space character on the next line represents the column to begin the literal at. (The first newline would not be included in this form of the literal.) Only the space character would be permitted in earlier columns until the end of the literal. This would allow for natural formatting of this kind of string:

  // option with columns determined by first line
  public class MyClass {
    public void doStuff() {
      String three = """
              This string
              spans three
              lines""";
      System.out.println(three);
    }
  }
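
The rule is easier to judge once written down. Below is a minimal sketch of how a compiler might process the raw text between the triple quotes under this variation. The class and method names are hypothetical, and rejecting a non-space character in the earlier columns with an exception is just one way to model a compile error.

```java
// Sketch of the "columns determined by first line" rule; names are hypothetical.
class ColumnLiteralSketch {
    // raw is the text between the opening and closing triple quotes.
    static String stripToFirstLineColumn(String raw) {
        if (!raw.startsWith("\n")) {
            return raw; // ordinary form: everything is kept as written
        }
        // The first newline is not part of the literal.
        String[] lines = raw.substring(1).split("\n", -1);
        // Column of the first non-space character on the first content line.
        int column = 0;
        while (column < lines[0].length() && lines[0].charAt(column) == ' ') {
            column++;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int limit = Math.min(column, line.length());
            for (int c = 0; c < limit; c++) {
                if (line.charAt(c) != ' ') {
                    // Stands in for a compile error in a real implementation.
                    throw new IllegalArgumentException(
                        "Only spaces are allowed before column " + column + " on line " + (i + 1));
                }
            }
            if (i > 0) {
                out.append('\n');
            }
            out.append(line.substring(limit));
        }
        return out.toString();
    }
}
```

Note that lines indented further than the first line keep their extra spaces, while a line that starts to the left of the first line is rejected.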

One final tricky issue is handling a string containing the triple double quote. The answer is probably to ignore this situation (Scala does this). It is going to be very rare, and it can be worked around using string concatenation.

Summary

Multi-line String literals should be a relatively easy addition to Java (anyone fancy adding it to Kijaro?). The main benefits would be avoiding escaping in regular expressions, and pasting in large blocks of text from other sources.

Overall, I think they would be a valuable addition to Java. But have I missed any obvious issues? Are there any other syntax options that should be considered? Opinions welcome as always :-)

32 comments:

  1. I was just outlining some code that reads Java source and scans for the name of the package... in that case, one needs to account for the situation where somebody has an evil (but unlikely) multi-line comment that is missing the asterisk at the beginning of the line, such as:-

    /*
    * blah blah
    package my.package;
    *
    */

    It's easy enough to deal with this by essentially trimming away all comments before parsing for the package name. Anybody writing such parsing code would also have to be aware of any multi-line string notation and deal with it appropriately, but it would break backwards compatibility in some sense. That's a real corner case though... but it does highlight that any proposed solution should be easy for a regex to recognise. I believe the triple-quote notation works nicely there. In fact, even if somebody is ignoring comments with the regex "\".*?\"", it will still work because """multi line string""" will be removed by it!

  2. I don't think it's worth not doing

    String multi = "This is a multi" +
    " line sample of a test" +
    " and notice since they are all" +
    " constants, the compiler" +
    " combines them at compile time" +
    " into one string constant";

  3. er.... I meant...
    I don't think it's worth doing.
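
The compile-time claim in the concatenation example from comment 2 can be checked with a small sketch. The class and method names are made up; the reference comparison with == is used here only to show that both expressions refer to the same interned constant, and is not a recommended way to compare strings.

```java
// Sketch: constant string concatenation is folded by the compiler.
class ConstantConcatDemo {
    static final String MULTI = "This is a multi" +
            " line sample";

    // Compares references on purpose: compile-time constant strings are
    // interned, so the concatenation and the single literal are one object.
    static boolean sameObjectAsSingleLiteral() {
        return MULTI == "This is a multi line sample";
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAsSingleLiteral()); // prints true
    }
}
```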

  4. Would it be ridiculous to entertain a heredoc syntax?

    String multiLine = <<"
    This is a perhaps silly way
    to handle a multiline string.

    It can handle newlines and will
    preserve
    spacing too if you want.
    ">>;

    Of course a single line version that would ignore quotes would work too:

    String dialogue = <<" "You're right, Dr. Zaius," said the ignorant dirty ape. ">>;

    This gets you WYSIWYG strings and a unique operator that isn't as complicated to parse as successive quote marks.

  5. initially this sounded not a bad idea. but after reading the problems and their rather not convincing solutions, i would rather stay with the current solution. because,
    - current java ensures there is no white space problem
    - it is easy to automatically format the text.
    - modern ide's like IDEa are really smart about multi line string handling.
    - it is not that common to use "This string\nspans three\nlines"; type of Strings.

    however, especially when IDE's are not there, indeed writing SQL queries, or regular expressions become sort of a pain. maybe some other type of syntax could be introduced

  6. The only significant problem I see is ambiguity in newline handling. Right now, Java string literals are very nice in that any newline characters must be explicitly specified using escape sequences. With multi-line literals, the characters embedded in the string would depend upon the platform-specific carriage return/line feed sequence. This could have a drastic impact on how some code assumes strings will work. More importantly, it would technically introduce a platform-specific quirk at the compiler level. A piece of code compiled on Windows may behave totally differently from the same code compiled on Linux or Mac (Mac and Linux behavior will likely be the same).

    It's a good idea in principle, but it means spending way too much time worrying about line feeds and escape sequences.
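
One way a compiler could sidestep the platform dependence is to normalize line terminators inside the literal before storing it. The sketch below shows that idea as a plain helper; the class name and the choice of "\n" as the canonical terminator are assumptions for illustration.

```java
// Sketch: normalize Windows (\r\n) and old Mac (\r) terminators to \n.
class NewlineNormalizer {
    static String normalize(String s) {
        // Replace the two-character sequence first so it does not
        // turn into two newlines in the second step.
        return s.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String fromWindowsFile = "This string\r\nspans three\r\nlines";
        System.out.println(normalize(fromWindowsFile).equals("This string\nspans three\nlines"));
    }
}
```

With a rule like this the resulting string would be the same whichever platform the source file was edited on.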

  7. Nice proposal but the addition of variables within the strings seems necessary. Like GStrings in Groovy. Example:

    int a = 10;
    String multi = """Hello I'm ${a}
    years old"""
    Without the variables in the String one still needs to concatenate to form dynamic strings. The syntax of ${} should be comfortable since it is just like JSTL.

  8. Hm
    The idea is nice but the syntax with """ is blah.
    String a="""somethink
    x"""; :(
    the other ideas with "' somethink '" or '"axax "' maybe are better but why we dont make not only Multilines Strings Why we dont make "PRE" Strings
    somethink like
    String a="'This String is PRE somethink like \ can be single and can be multylines too
    yeah :) cool '";
    or like in C# @" string with =\ \ \ \ \ \"

  9. Is there nobody using small XML fragments in Test Classes!?
    Here multiline strings would be *really* useful!

    But if I could choose only one feature for Java7, Neal Gafter's closures proposal is still my most wanted though!

  10. Laurent:
    - write a method: Files.readAsString(fileName) and put your test xml's in files.
    - i think Stephen's FCM or Mr Bloch's CICE should be preferred to the Mr. Gafter's proposal.
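
The Files.readAsString(fileName) helper suggested in the first point could be sketched roughly as follows. The class name TestResources is hypothetical, and reading the file as UTF-8 is an assumption; the sketch relies on java.nio.file.Files.readAllBytes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch of a helper for keeping test XML in external files.
class TestResources {
    // Reads the whole file into a String, assuming UTF-8 content.
    static String readAsString(String fileName) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(fileName));
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

The trade-off is that the fragment then lives away from the test that uses it, which is the inconvenience multi-line literals aim to remove.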

  11. Concerning the "option with columns determined by first line", it seems like you would have a problem with someone who wants to indent the first line even further than the rest, like the start of a paragraph. Sure, you can say that a person should move paragraphs to an outside file, in which case why not do that for all multilines? So you would have to survey all the lines to find where to start.

    On a related indenting note, what happens with tabs versus spaces? A lot of IDEs will allow you to set how many characters a tab appears to be. Is a tab 4 spaces or 8? And how should a compiler read it if the first line is 2 tabs and 4 spaces and the second line is 12 spaces and the third line is 20 spaces?

  12. You guys really worry about this? Get a freaking life.

  13. Python (which is probably where Scala/Groovy copied the syntax from) handles the multi-line issue by accepting a '\newline' as a special escape that causes the newline to be ignored. e.g.:

    String foo = """\
    Foo
    and
    Bar""";

  14. I think this would be awesome, it is such a pain that you have to either do concatenation with + or read it from a total separate file. There doesn't need to be any special syntax. Other languages have allowed multi line strings for ages with no problem and I have always thought it a little bit silly that java wouldn't let you do it.

  15. python uses fixed indentations, so it leverages that. it is not possible in java. i think that code formatting issue (preceding white spaces) will bite in all the proposals, so i say forget it, trust your IDE in java for multi-line strings.

  16. I like the way Python and Groovy handle this. Seems we could do the same in Java.

    Why not exactly?

  17. well, a second thought, automatically trimming may not be a bad idea. i mean trim all the lines by default if multiline syntax is used.

  18. Stephen Colebourne, 28 January 2008 at 23:18

    Thanks for all the thoughts.

    My first response to those that have backed away due to the multi-line issues is don't forget the benefit of the single-line no-escapes format. This would make regex much easier for starters.

    The syntax <<" ">> is a valid alternative, but I would still prefer the Scala/Groovy commonality if possible.

    The code formatting whitespace problem may just be a red herring. One solution I didn't list is just ignoring it, and coding any multi-line strings from column zero.

    More significant is Daniel's point about which newline character to use - Windows or Unix. This is a nasty problem - but how do Groovy/Scala handle it?

  19. I don't know about multi-line Strings, but a regular expression literal (with the /xxx/ syntax) would be much more useful IMHO! (Sorry if this was discussed here before).

  20. For languages like PHP and Python, multi-line Strings are (or have been) beneficial, as they usually embed with HTML as interpreted languages for Web applications. Not introducing some object-access from within multi-line Strings quite narrows the advantages, and for Java, one does not really want to "print" out HTML this way.
    I would definitely favor some String-wide escape-escape, so one could write regular expressions more readable, but I am not sure of the real use-cases for multi-line support. To only ease XML-blobbing for test cases or to create meaningful error messages does not really sound convincing to me.

  21. What about using a 'everything after here is literal' leading character quoted layout.

    // imagine the subsequent lines are indented to match the initiating quotes
    String s = """
    """ multiline comment
    """

    // Imagine the subsequent lines are 'two space' indented.
    String t =
    """Another multiline
    """String literal

    The Good
    * Inserting the leading quotes is easy with any decent editor.
    * Supports leading whitespace as indentation AND as String content.
    * Highly visible

    The Bad
    * Unusually, for java, the statement terminating semi-colon isn't present nor is a String terminator.
    * Trailing whitespace is unprotected from editor settings unaware of the importance.

    The Unclear
    * Should the JVM use runtime platform line separators?
    * Should trailing triple quotes be supported for those times when you need to protect trailing whitespace?

    // Trailing quotes example ('two space' indented honest).
    String u =
    """This lines doesn't have trailing whitespace
    """But this one does... """
    """But that wasn't the end of the String.

  22. Why? This is really, really, REALLY trivial.

  23. My preferences have long been similar to Talden's ideas, but more like this:

    String u =
    "This line doesn't have trailing whitespace
    "But this one does... "
    "But that wasn't the end of the String.
    "";

    Note the needed ""; at the end to terminate the String and statement.

    Further, allow backtick (`) or something for raw Strings.

  24. I would like to see raw Strings without the need to unescape chars. Something like "" or """. Very seldom I use \n, \t, … Especially in regular expressions the usage of \\ is tedious so that s.replaceAll("\\\\", "/") can become s.replaceAll(""\\"", "/") or print("Hello\"Kitty\"") is simply print(""Hello "Kitty"""). The Multiline feature isn't that important for me. FreeMarker Templates in external files with editor support works for me.

  25. In addition:

    Treated-as-HTML text between """ will not reduce program performance, because tags can be preprocessed by Java compiler and put to .class file as an ordinary Java string.

    Most of all I need multi-line strings to write two things:
    - SQL Queries templates
    - Usage notes for applications launched from command line

    SQL syntax barely overlaps with HTML tags and is not sensitive to spaces and tabs. For the second use, HTML format is even preferred.

  26. In addition :)

    before-HTML-processed value and after-HTML-processed value can get both into .class files.
    String s = """
    this is a
    text""";

    We must provide one new function or property to access it - getHTMLText() or getHyperText(). So
    System.out.print(s) will print " this is a text", and
    System.out.print(s.getHTMLText()) will print the original value.

  27. One area I'd be cautious of when sorting out a syntax is the compilers now used within IDEs such as Eclipse.

    Now, if someone forgets to terminate a string, then the newline in the source is detected by the compiler, and allows the rest of the code to still be compiled, thus not generating a whole host of compile errors for one missing double quote.

    In this respect, I favour a syntax where the beginning of the line is marked on each line, as that would clearly allow the compiler to know that the string has not been terminated correctly.

    i.e. I much prefer:

    String a = """blah
    """ de
    """blah""";

    instead of:

    String a = """blah
    de
    blah""";

    However, it would also be nice to be able to just paste text in.

    Perhaps the best approach is to just copy an existing pattern that works well... hence copying Groovy/Scala

  28. Why not handle a newline like any other character? So i would like to see:

    String a = "this is simple text
    which continues in other line
    and not much more to say!";

    Too simple, right?

    Now (December 2012) even a \n\ with line break is not possible.

  29. Ridiculous that Java didn't add this in Java 7. Should definitely be in Java 8, but won't be. No wonder most people complain that Java is falling behind.

  30. Here's a use-case: Suppose you want to pass source code from Java to another compiler, e.g. SQL. Suppose I define a simple table:

    create table foo (
    bar int
    ,fum varchar(200)
    );

    Now, I can paste that into some SQL IDE and run it more or less as is. If I were passing it to the DBMS via Java, I'd like to do something like this:

    String ddl =
    create table foo (
    bar int
    ,fum varchar(200)
    );


    then I can pass the ddl in a call to the DBMS. This way, I have the same alternate-language source in Java and my favorite DB IDE. Same spacing and newlines. The only thing I'd want Java to do in addition is provide a way to strip the common leading spaces (four, in this case). That's not important for SQL of course, but could be in other, indentation-sensitive languages (Python, e.g.)
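
The "strip the common leading spaces" step mentioned here can be sketched as a plain helper. The class and method names are hypothetical, and two choices are my assumptions: only space characters count as indentation, and blank lines are ignored when working out the common indent.

```java
// Sketch: remove the indentation shared by all non-blank lines.
class IndentStripper {
    static String stripCommonIndent(String text) {
        String[] lines = text.split("\n", -1);
        int common = Integer.MAX_VALUE;
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // blank lines do not constrain the common indent
            }
            int indent = 0;
            while (indent < line.length() && line.charAt(indent) == ' ') {
                indent++;
            }
            common = Math.min(common, indent);
        }
        if (common == Integer.MAX_VALUE) {
            common = 0; // no non-blank lines at all
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) {
                out.append('\n');
            }
            String line = lines[i];
            // Blank lines shorter than the common indent become empty.
            out.append(line.length() >= common ? line.substring(common) : "");
        }
        return out.toString();
    }
}
```

Unlike a rule based on the first line only, this keeps relative indentation intact even when the first line is indented further than the rest, which matters for indentation-sensitive languages.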


  31. I'm not sure how well it would perform in the real world, but what if multiline strings were handled similar to JavaDocs are?

    /*This is
    *a javadoc
    *with multiple lines,
    *right?
    */

    /+This is
    +a string
    +with multiple lines,
    +right?
    +/

    Looks like the syntax could use a little bit of tweaking to avoid dealing with extra whitespace and newlines, but that's the first thing that came to mind after throwing out everything else that either had poor syntax or ruined code structure. I kinda like it because it'd use the "+" character just like if you were concatenating a string normally, but this is a much cleaner looking way of doing it in my opinion.

    The only downside I readily see is that it might not fit in look-and-feel wise with the rest of Java, but there's always ample time to get used to new things.

  32. The lack of heredoc in Java has made many physiotherapists rich. Two things painfully missed after tasting PHP are multiline strings and associative arrays (yea maps). Anyways Groovying in the meantime.
