Stephen Colebourne's blog: Annotating JDK default data

Tuesday, 4 December 2012

Annotating JDK default data

A common issue in Java development is use of the default Locale, default TimeZone and default CharSet.

JDK defaults

The JDK has a number of defaults that apply to a running JVM. The most well known are the default Locale, default TimeZone and default CharSet.

These defaults are very useful for getting systems up and running quickly. When a developer new to Java writes some code, they should get the localized answer they expect. However, in any larger environment, especially code intended to run on a server, the defaults cause a problem.

The classic example of this is the String.toLowerCase() method. Many developers use str1.toLowerCase().equals(str2.toLowerCase()) to check if two strings are equal ignoring case. But this code is not valid is all Locales! It turns out that in Turkey, there are two different upper case versions of the letter I and two different lower case versions. This causes lots of problems.

The problem applies more generally to server side applications. Any server-side code that relies on a JDK method using one of the defaults will alter its behaviour based on where and how the server is setup. This is clearly undesirable.

As is often the case, there are two competing forces - ease of use for newcomers, and bugs in larger applications. There is also the issue of backwards compatibility, meaning that the JDK methods depending on the defaults cannot be removed.

One approach has been taken by the Lucene project, scanning code using ASM. The link/tool also provides a useful list of JDK methods affected by this problem:

  java.lang.String#<init>(byte[])
  java.lang.String#<init>(byte[],int)
  java.lang.String#<init>(byte[],int,int)
  java.lang.String#<init>(byte[],int,int,int)
  java.lang.String#getBytes()
  java.lang.String#getBytes(int,int,byte[],int) 
  java.lang.String#toLowerCase()
  java.lang.String#toUpperCase()
  java.lang.String#format(java.lang.String,java.lang.Object[])

  java.io.FileReader
  java.io.FileWriter
  java.io.ByteArrayOutputStream#toString()
  java.io.InputStreamReader#(java.io.InputStream)
  java.io.OutputStreamWriter#(java.io.OutputStream)
  java.io.PrintStream#(java.io.File)
  java.io.PrintStream#(java.io.OutputStream)
  java.io.PrintStream#(java.io.OutputStream,boolean)
  java.io.PrintStream#(java.lang.String)
  java.io.PrintWriter#(java.io.File)
  java.io.PrintWriter#(java.io.OutputStream)
  java.io.PrintWriter#(java.io.OutputStream,boolean)
  java.io.PrintWriter#(java.lang.String)
  java.io.PrintWriter#format(java.lang.String,java.lang.Object[])
  java.io.PrintWriter#printf(java.lang.String,java.lang.Object[])

  java.nio.charset.Charset#displayName()

  java.text.BreakIterator#getCharacterInstance()
  java.text.BreakIterator#getLineInstance()
  java.text.BreakIterator#getSentenceInstance()
  java.text.BreakIterator#getWordInstance()
  java.text.Collator#getInstance()
  java.text.DateFormat#getTimeInstance()
  java.text.DateFormat#getTimeInstance(int)
  java.text.DateFormat#getDateInstance()
  java.text.DateFormat#getDateInstance(int)
  java.text.DateFormat#getDateTimeInstance()
  java.text.DateFormat#getDateTimeInstance(int,int)
  java.text.DateFormat#getInstance()
  java.text.DateFormatSymbols#()
  java.text.DateFormatSymbols#getInstance()

  java.text.DecimalFormat#()
  java.text.DecimalFormat#(java.lang.String)
  java.text.DecimalFormatSymbols#()
  java.text.DecimalFormatSymbols#getInstance()
  java.text.MessageFormat#(java.lang.String)
  java.text.NumberFormat#getInstance()
  java.text.NumberFormat#getNumberInstance()
  java.text.NumberFormat#getIntegerInstance()
  java.text.NumberFormat#getCurrencyInstance()
  java.text.NumberFormat#getPercentInstance()
  java.text.SimpleDateFormat#()
  java.text.SimpleDateFormat#(java.lang.String)

  java.util.Calendar#()
  java.util.Calendar#getInstance()
  java.util.Calendar#getInstance(java.util.Locale)
  java.util.Calendar#getInstance(java.util.TimeZone)
  java.util.Currency#getSymbol()
  java.util.GregorianCalendar#()
  java.util.GregorianCalendar#(int,int,int)
  java.util.GregorianCalendar#(int,int,int,int,int)
  java.util.GregorianCalendar#(int,int,int,int,int,int)
  java.util.GregorianCalendar#(java.util.Locale)
  java.util.GregorianCalendar#(java.util.TimeZone)

  java.util.Scanner#(java.io.InputStream)
  java.util.Scanner#(java.io.File)
  java.util.Scanner#(java.nio.channels.ReadableByteChannel)
  java.util.Formatter#()
  java.util.Formatter#(java.lang.Appendable)
  java.util.Formatter#(java.io.File)
  java.util.Formatter#(java.io.File,java.lang.String)
  java.util.Formatter#(java.io.OutputStream)
  java.util.Formatter#(java.io.OutputStream,java.lang.String)
  java.util.Formatter#(java.io.PrintStream)
  java.util.Formatter#(java.lang.String)
  java.util.Formatter#(java.lang.String,java.lang.String)

Thinking about the two competing forces, it seems to me that the best option would be a new annotation.

Imagine the JDK adds an annotation @DependsOnJdkDefaults. This would be used to annotate all methods in the JDK that directly or indirectly rely on a default, including all of those above. Developers outside the JDK could of course also use the annotation to mark their methods relying on defaults.

  @DependsOnJdkDefaults
  public String toLowerCase() {
    return toLowerCase(Locale.getDefault());
  }

Tooling like Checkstyle, IDEs and perhaps even the JDK compiler could then use the annotation to warn developers that they should not use the method. It would be possible to even envisage an IDE setting to hide the methods from auto-complete.

This would seem to balance the competing forces by providing information that could be widely used and very valuable.

Summary

I think marking methods that rely on JDK defult Locale/TimeZone/CharSet with an annotation would be a powerful tool. What do you think?

26 comments:

Uwe Schindler4 December 2012 at 12:47
Hey Stephen, very nice blog post. I like the idea! Unfortunately this needs to be communicated through JCP...
ReplyDelete
Replies
Robert Muir4 December 2012 at 13:37
A great annotation for these methods already exists: @Deprecated
ReplyDelete
Replies
Uwe Schindler4 December 2012 at 13:41
@Robert: Unfortunately @Deprecated must also be communicated through JCP...
ReplyDelete
Replies
Anonymous4 December 2012 at 14:06
FWIW, Findbugs issues a warning when using the default charset (http://findbugs.sourceforge.net/bugDescriptions.html#DM_DEFAULT_ENCODING) - I don't think it does anything for default locale / timezone.
ReplyDelete
Replies
Uwe Schindler4 December 2012 at 14:11
@Anonymous: The method list FindBugs is on is not complete (it in fact only catches the most common ones). We investigated FindBugs and PMD at Lucene; but then moved to the 100 times faster ASM based checker with a full-featured charset/locale/TZ list. We were also able to disable other horrible code patterns like printStackTrace().
ReplyDelete
Replies
Stephen Colebourne4 December 2012 at 14:23
@Uwe, the JCP isn't really relevant here. Its mostly a rubber stamping body on Java SE matters. It certainly doesn't initiate ideas. This needs to go into the OpenJDK processes if its going to happen. It needs a JEP and a plan for JDK1.9.
ReplyDelete
Replies
Nathan4 December 2012 at 18:03
Three potential issues:

(1) The annotation indicates that zero or more paths may be default dependent, not that all paths are default dependent. In this way it may be overly safe. Some checked exceptions suffer from this. (MalformedURLException comes to mind.)

(2) The information implied by the annotation may be misleading. For example, many constructors of Formatter are on the list, because they obtain the default locale. However, the default locale is not *used* by these constructors. It is the methods of such an instance of formatter that have potentially unsafe behavior.

(3) The information implied by the annotation may be lost by certain abstractions. For example, String.CASE_INSENSITIVE_ORDER.compare(String, String) depends on String.toLowerCase(). Transitively, String.CASE_INSENSITIVE_ORDER.compare(String, String) is default dependent. Transitively, anything that invokes String.CASE_INSENSITIVE_ORDER.compare(String, String) is default dependent. However, the latter dependency may not be knowable, since the type of the instance of Comparator may not be knowable.
ReplyDelete
Replies
Frisian4 December 2012 at 18:54
Given, that these methods are perfectly fine for a client-side application, wouldn't it be enough to check them with PMD or Checkstyle? The JDK's source code is available and I don't think that a lot locale-dependent methods will be added to it in the future.
I like the annotation, though. But I'd like to see a more generic variant: @DependsOn("JdkDefaults")
ReplyDelete
Replies
elizarov4 December 2012 at 20:07
Great idea, but it seems wasteful to use it for JDK defaults only. You are basically talking about a kind of method effect annotation. Frisian in the comment above had suggested @DependsOn("JdkDefault"), but more correct and fully generic form should be more like @DependsOn("Locale.default"), @DependsOn("TimeZone.default") or @DependsOn("Locale.default", "TimeZone.default", ...), etc using the appropriate "import" information from the source file to shorten the package names. One should be able to use this annotation for defaults in other frameworks and to otherwise annotate method's dependence on a mutable static variables.

However, just like with other effects annotations (and as with "throws" which is also a special kind of effects annotation) there is a scalability problem. Checked exceptions and "throws" somewhat work around this problem by providing a way of handling exception and/or wrapping them into other types of exceptions. Even though checked exceptions are a powerful and very helpful tool in skillful hands precisely because they can be used as an effect annotation tool that is being enforced by compiler, they still invoke a lot of criticism from many people.

One should figure first how the corresponding scalability problem should be addressed before implementing any kind of @DependsOn annotation.
ReplyDelete
Replies
Stephen Colebourne5 December 2012 at 14:30
I wouldn't consider this annotation to be like checked exceptions. It is additional info provided for use. It would be totally up to applications whether they apply the annotation to their own code. I don't see the JDK trying to pass the info on.

I'm not a fan of string based annotations here either. The string is a magic constant that developers have to learn and is not directly compile time checked.
ReplyDelete
Replies
hertzsprung6 December 2012 at 14:40
Good idea. In a similar vein, I'd love to see compiler/tool support for recognising when you've created a thread or thread pool with a default name. Having usefully-named threads makes diagnosis with VisualVM so much less painful.
ReplyDelete
Replies
Anthony7 December 2012 at 18:43
I don't get it:

1) all defaults can be set with system properties + for Locale & TimeZone there are also specific methods to set the default. So for server-side applications you can easily force the defaults you want.

2) I didn't check all of them, but (almost?) every method on the list has an overloaded method with Locale as a parameter. So for a developer it ought to be obvious that if they don't pass a Locale, Java will have to somehow figure out a "default" value for it.

3) what if Java gets default parameters some day? Annotate every such method with @DependsOnDefaultValuesOfParametersWhenNotGiven?

So can someone explain me what all the fuss is about?
ReplyDelete
Replies
Uwe Schindler11 February 2013 at 21:59
Hi, the Lucene forbidden API checker was forked and released as Ant, Maven or CLI task at https://code.google.com/p/forbidden-apis/, available via Maven Central.
ReplyDelete
Replies

Add comment

Please be aware that by commenting you provide consent to associate your selected profile with your comment. Long comments or those with excessive links may be deleted by Blogger (not me!). All spam will be deleted.