Wednesday, 11 July 2007

Java 7 - What to do with BigDecimal?

What happens in Java when we have to deal with large decimal numbers? Numbers that must be accurate, of unlimited size and precision? We use BigDecimal, of course. Yet, I have a sense that we tend to curse every time we do. Why is that?

BigDecimal

I think that the main reason why we dislike working with BigDecimal is that it's so much clunkier than working with a primitive type. There are no literals to construct them easily. It was only in Java 5 that constructors taking an int/long were added.

And then there are the mathematical operations. Calling methods is just a whole lot more verbose and confusing than using operators.

  // yucky BigDecimal
  BigDecimal dec = new BigDecimal("12.34");
  dec = dec.add(new BigDecimal("34.45")).multiply(new BigDecimal("1.12")).subtract(new BigDecimal("3.21"));

  // nice primitive double
  double d = 12.34d;
  d = (d + 34.45d) * 1.12d - 3.21d;

Finally, there is the performance question. Perhaps this is a Java Urban Myth, but my brain associates BigDecimal with poor performance. (Note to self... really need to benchmark this!).

So, I was wondering if anything can be done about this? One solution would be a new primitive type in Java - decimal - with all the associated new bytecodes. Somehow I doubt this will happen, although it would be nice if it did.

More realistic is operator overloading (let's just consider BigDecimal for now, and not get into the whole operator overloading debate...). Overloading for BigDecimal has definitely been talked about in Sun for Java 7, and it would make sense. However, as can be seen in my example below, there is really a need for BigDecimal literals to be added at the same time:

  // now
  BigDecimal dec = new BigDecimal("12.34");
  dec = dec.add(new BigDecimal("34.45")).multiply(new BigDecimal("1.12")).subtract(new BigDecimal("3.21"));

  // just operator overloading
  BigDecimal dec = new BigDecimal("12.34");
  dec = (dec + new BigDecimal("34.45")) * new BigDecimal("1.12") - new BigDecimal("3.21");

  // with literals
  BigDecimal dec = 12.34n;
  dec = (dec + 34.45n) * 1.12n - 3.21n;

As you can see, the literal syntax makes a big difference to readability (I've used 'n' as a suffix, meaning 'number', for now). Of course the main issue with operator overloading is precedence, and that would need quite some work to get right. I would argue that if literals are added at the same time, then precedence must work exactly as it does for primitive numbers in Java.
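
To make the precedence concern concrete, here is a minimal sketch in today's Java. It uses nothing beyond java.math.BigDecimal; the class name PrecedenceDemo and its methods are made up for illustration. A chain of method calls always evaluates left to right, whereas an operator form would follow normal arithmetic precedence unless parenthesised, so the two can give different results.

```java
import java.math.BigDecimal;

// Hypothetical demo class; the names here are not from any proposal.
class PrecedenceDemo {
    static final BigDecimal A = new BigDecimal("12.34");
    static final BigDecimal B = new BigDecimal("34.45");
    static final BigDecimal C = new BigDecimal("1.12");
    static final BigDecimal D = new BigDecimal("3.21");

    // Method chaining groups strictly left to right: ((a + b) * c) - d
    static BigDecimal chained() {
        return A.add(B).multiply(C).subtract(D);
    }

    // What an unparenthesised "a + b * c - d" means under normal
    // arithmetic precedence: (a + (b * c)) - d
    static BigDecimal withPrecedence() {
        return A.add(B.multiply(C)).subtract(D);
    }

    public static void main(String[] args) {
        System.out.println(chained());        // 49.1948
        System.out.println(withPrecedence()); // 47.7140
    }
}
```

With the values used above the two groupings differ, which is why an operator version needs the same parentheses as the double example.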

One possibility to consider is replacing BigDecimal with a new class. I don't know if there are backwards compatibility issues holding back the performance or design of BigDecimal, but it's something to consider. Obviously, it's difficult though, as BigDecimal is, like Date, a very widely used class.

A new idea?

One idea I had tonight was to write a subclass of BigDecimal that uses a long for internal storage instead of a BigInteger. The subclass would override all the methods of BigDecimal, implementing them more quickly because of the single primitive long storage. If the result of any calculation overflowed the capacity of a long, then the method would just return the standard BigDecimal class. I'd love to hear if anyone has had this idea before, or thinks that it would work well.
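
A rough sketch of how the long-backed fast path might look. To keep it short it is written as a standalone class rather than a true subclass of BigDecimal, it handles only addition of values with equal scale, and it uses Math.addExact (added in Java 8) to detect overflow. The name LongDecimal and everything in it are hypothetical.

```java
import java.math.BigDecimal;

// Hypothetical sketch: a decimal held as an unscaled long plus a scale,
// i.e. the value is unscaled * 10^(-scale).
final class LongDecimal extends Number {
    final long unscaled;
    final int scale;

    LongDecimal(long unscaled, int scale) {
        this.unscaled = unscaled;
        this.scale = scale;
    }

    BigDecimal toBigDecimal() {
        return BigDecimal.valueOf(unscaled, scale);
    }

    // Fast path: same scale and no overflow, so stay in long arithmetic.
    // Otherwise fall back to a standard BigDecimal result.
    Number add(LongDecimal other) {
        if (scale == other.scale) {
            try {
                return new LongDecimal(Math.addExact(unscaled, other.unscaled), scale);
            } catch (ArithmeticException overflow) {
                // sum does not fit in a long; use the slow path below
            }
        }
        return toBigDecimal().add(other.toBigDecimal());
    }

    @Override public int intValue() { return toBigDecimal().intValue(); }
    @Override public long longValue() { return toBigDecimal().longValue(); }
    @Override public float floatValue() { return toBigDecimal().floatValue(); }
    @Override public double doubleValue() { return toBigDecimal().doubleValue(); }

    @Override public String toString() { return toBigDecimal().toString(); }
}
```

Adding new LongDecimal(1234, 2) and new LongDecimal(3445, 2) stays on the fast path, while adding anything positive to new LongDecimal(Long.MAX_VALUE, 2) hands back a BigDecimal. Returning Number is only a stand-in for the shared BigDecimal type that a real subclass would have.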

By the way, I thought of this in relation to JSR-310. It could be a specialised implementation of BigDecimal just for milliseconds, allowing arbitrary precision datetimes if people really need it.

And finally...

And finally, could I just say that I hate the new Java 5 method on BigDecimal called plus(). Believe it or not, the method does absolutely nothing except return 'this'. Who thought that was a good idea???
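
For reference, a small sketch of what plus() does in practice, using only java.math calls; the class name PlusDemo is made up. It also shows the plus(MathContext) overload, which does round and which the javadoc describes as having the same effect as round(MathContext).

```java
import java.math.BigDecimal;
import java.math.MathContext;

// Hypothetical demo class for the two plus() variants.
class PlusDemo {
    // The no-argument form leaves value and scale unchanged.
    static BigDecimal unaryPlus(BigDecimal d) {
        return d.plus();
    }

    // The MathContext overload rounds to the given number of significant
    // digits (MathContext(int) uses HALF_UP rounding).
    static BigDecimal roundedPlus(BigDecimal d, int significantDigits) {
        return d.plus(new MathContext(significantDigits));
    }

    public static void main(String[] args) {
        BigDecimal d = new BigDecimal("12.3456");
        System.out.println(unaryPlus(d));      // 12.3456
        System.out.println(roundedPlus(d, 4)); // 12.35
    }
}
```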

Summary

Hopefully this random collection of thoughts about BigDecimal can trigger some ideas and responses. There is still time before Java 7 to get a better approach to decimal numbers, and stop the use of inaccurate doubles. Opinions welcome as always.
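
As a reminder of what "inaccurate doubles" means here, a minimal sketch comparing the same sum in double and in BigDecimal. The class name DoubleVsDecimalDemo is made up; the BigDecimal values are built from strings so that no binary rounding creeps in.

```java
import java.math.BigDecimal;

// Hypothetical demo class contrasting binary and decimal arithmetic.
class DoubleVsDecimalDemo {
    // Binary floating point cannot represent 0.1 or 0.2 exactly.
    static double sumDoubles() {
        return 0.1 + 0.2;
    }

    // Decimal arithmetic on string-constructed values is exact.
    static BigDecimal sumDecimals() {
        return new BigDecimal("0.1").add(new BigDecimal("0.2"));
    }

    public static void main(String[] args) {
        System.out.println(sumDoubles());  // 0.30000000000000004
        System.out.println(sumDecimals()); // 0.3
    }
}
```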

26 comments:

  1. Oh brother! BigDecimal is TERRIBLE!

    I have dug around in the source and it's not a pretty thing. Performance has apparently been addressed, but due to the nature of its implementation it'll never be as fast as primitive types.

    I have often thought that operator overloading would introduce a degree of expressiveness that we've been missing in the Java space for some time. But again, that's bordering on the religious arguments...

    In my mind, the pain around java.util.Date and the solution ala joda is similar to that surrounding BigDecimal.

    Please make it go away. We're dealing with NUMBERS!

  2. Stephen, tell me you have read my blog post on a similar topic. Though it was more targeting things for the retrospective.

    http://jroller.com/page/eu?entry=operation_overloading_in_java

  3. Oh yes, finally some high profile debate about this issue. BigDecimal is a super example of the verbosity of Java anno 2007.

    Add a Decimal literal; from the class file format it looks like there's plenty of room for this. I have no idea what the plus() is about. Ironically, the add() really should have been named plus(), since it does not add to the BigDecimal but rather returns the new sum.

  4. Patrick Wright, 12 July 2007 08:26

    I think it's actually a tough question, that is, whether the language should be extended to cover operator overloading for BD. The problem I have with it is that without formal support for operator overloading, the compiler has to bake-in the link to the BD class, which means we are stuck with the implementation we have; after the OO is added in, we can't safely "improve" the class without affecting existing code. We also can't choose alternate implementations based on our needs on different projects. That said, it sure would be more readable--not sure what the solution is.

    There hasn't been much discussion about this till now, glad you brought it up.

    Regards
    Patrick

  5. John Hendrikx, 12 July 2007 08:43

    What I do not understand is why there is no discussion about extending the number of primitive types for Java instead, avoiding this entire issue. One could add a "decimal" type that is suitable for the most common uses of BigDecimal (usually calculations involving money).

    I realize that adding new primitives is a much larger undertaking, but it would in my opinion be a much cleaner solution and allow for much more compact storage. I'd much prefer to see a system where primitives can be created with arbitrary precision when the defaults are not good enough, something like:

    decimal d = 1.256; // defaults to 20 places?
    decimal(100) d = 3.1415.......;

    and perhaps also:

    int(128) int_with_128_bits;
    float(128) float_with_128_bits;

  6. Søren Boisen, 12 July 2007 13:33

    I can't believe no one has mentioned my biggest annoyance with BigDecimal yet: that they include scale in equals() comparisons. How silly is that? They create a class that guarantees a given mathematical number can be represented exactly, but then make it so that two instances that represent the SAME mathematical number can be deemed unequal simply because the scale is different? On what planet was that ever useful?? I have used BDs a lot and every time I needed to compare two BDs, I didn't care about scale. Yeah yeah, I know, I can just use compareTo() == 0 instead, but that is awkward and unnatural.

    Wrt the original proposal, I think the way to go is to add a literal notation for BDs, just as we have for String, as you suggest. Then either use general operator overloading, if that is introduced, or special-case OO for the common mathematical operations (which btw should always return a new BD instead of modifying the current one, unless += syntax is added as well, which should modify the object).
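
A minimal sketch of the equals() versus compareTo() behaviour described in this comment, using only java.math.BigDecimal. The class ScaleEqualityDemo and its helper sameValue are made-up names.

```java
import java.math.BigDecimal;

// Hypothetical demo class for scale-sensitive equality.
class ScaleEqualityDemo {
    // Numeric equality that ignores scale, via compareTo().
    static boolean sameValue(BigDecimal a, BigDecimal b) {
        return a.compareTo(b) == 0;
    }

    public static void main(String[] args) {
        BigDecimal a = new BigDecimal("2.0");
        BigDecimal b = new BigDecimal("2.00");

        // equals() compares value and scale, so these differ.
        System.out.println(a.equals(b));   // false
        // compareTo() compares the numeric value only.
        System.out.println(sameValue(a, b)); // true
        // Normalising the scale first makes equals() agree.
        System.out.println(a.stripTrailingZeros().equals(b.stripTrailingZeros())); // true
    }
}
```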

  7. By the way, Groovy uses BigDecimal by default for decimal expressions.

    groovy> (1.5*2).class
    Result: class java.math.BigDecimal

    You can also force double (or float) mode if performance is an issue:

    groovy> (1.5d*2).class
    Result: class java.lang.Double

    BigDecimal seems to be fast enough for common operations, though. For more infos, see
    http://groovy.codehaus.org/Groovy+BigDecimal+Math
    http://groovy.codehaus.org/Groovy+Floating+Point+Math

  8. True BigDecimal source is a mess. Seems like as java Date and Calendar was developed in india, OMG.

  9. see:
    https://o24j.dev.java.net/
    (really) small oo API for java.

  10. Matthew Hall, 12 July 2007 16:19

    RE: BigDecimal.plus() -- from the javadoc:

    "This method, which simply returns this BigDecimal is included for symmetry with the unary minus method negate()."

    Symmetry? I believe the opposite of "negate" is "leave it alone." I agree with you, there was no need for symmetry here. Let's see

    BigDecimal dec = new BigDecimal(50);
    // just to make sure!
    dec = dec.plus();

    ...yeah.

  11. Stephen Colebourne, 12 July 2007 16:31

    Sounds like I touched a raw nerve, thanks for all the comments.

    @Eugene, I looked at your blog, and it's a very clever idea :-) However, for me it's still a hack around the real problems, which are OpOverload and literals.

    @Casper, @John, Looking at adding a primitive type is difficult. There are 35 bytecodes free at the moment. But a decimal primitive type would probably use most of those 35, leaving little for other changes, so I'm unconvinced that it's the right solution. String isn't a primitive either, so being a literal is not connected with being a primitive.

    @Patrick, I accept that a literal could bake in BigDecimal, but what else can we do (except create a better implementation and bake that in). For OpOverload, I suspect that a new interface should be added that specifies the mathematical operations as method names. But we need self-types before that works properly.

    As for alternative implementations, that's where my 'new idea' comes in. If we had a BigDecimal factory (rather than constructors), then it could return optimised implementations based on the size of the underlying number (eg. use a long rather than a BigInteger for storage).

    @Soren, I'd forgotten equals being a pain :-)

    @Daniel, I'd forgotten that Groovy uses BD throughout. That's one argument for keeping the existing class and working to improve access (as per Groovy).

    I'm definitely thinking Literals and OpOverload is the way to go. What do people think about the 'n' suffix for 'numeric' big decimal literals?

  12. Matthew Hall, 12 July 2007 16:47

    Søren, I've been bitten by the scale in equals comparisons too, and have similarly had to resort to compareTo() == 0 for equality.

    Although operator overloading is a source of contention, I must admit I would love to see it added to the language. Maybe we could find a middle ground, by only offering operator "aliasing," i.e. some descriptor would say "on this object or interface, invoking the + operator actually calls this add() method over here." Or maybe an annotation would do it:

    class BigDecimal {
      @Operator("+")
      public BigDecimal add(BigDecimal augend) {
        // ...
      }
    }

    interface List<T> {
      @Operator("+=")
      public boolean add(T e);
    }

    Of course this means that operator overloading would be interpreted contextually (depending on the type of reference you have to the object). It also means that if you change your mind on which method should be assigned to "+", then client code would have to be recompiled to start using that new method.

    Hmm. On the positive side, we could finally find a use for BigDecimal.plus()!

    @Operator("+")
    public BigDecimal plus(); // Woo hoo! Now we can support unary plus!
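
A rough sketch of how the annotation idea could be prototyped today. Everything here is hypothetical: the @Operator annotation, the Money class and the OperatorLookup helper are invented for illustration, and the runtime reflection only stands in for what a compiler or compiler extension would resolve at compile time.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical annotation that ties an operator symbol to a method.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Operator {
    String value();
}

// Hypothetical immutable value type with an aliased "+" operator.
final class Money {
    final long cents;

    Money(long cents) {
        this.cents = cents;
    }

    @Operator("+")
    public Money add(Money other) {
        return new Money(cents + other.cents);
    }
}

// Stand-in for the compiler step: find the method aliased to a binary
// operator on the left operand's class and invoke it.
class OperatorLookup {
    static Method find(Class<?> type, String symbol) {
        for (Method m : type.getMethods()) {
            Operator op = m.getAnnotation(Operator.class);
            if (op != null && op.value().equals(symbol) && m.getParameterCount() == 1) {
                return m;
            }
        }
        throw new IllegalArgumentException("No binary operator " + symbol + " on " + type);
    }

    static Object apply(String symbol, Object left, Object right) {
        try {
            return find(left.getClass(), symbol).invoke(left, right);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling OperatorLookup.apply("+", new Money(150), new Money(250)) dispatches to Money.add. Because the lookup starts from the class of the left operand, it also illustrates the contextual interpretation mentioned above.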

  13. I like the annotation idea applied to Eugene's idea, giving a declarative way to map ISUB=>subtract(...), IADD=>add(...), ...
    The stack effects of the bytecode(s) in the method annotations might let you do sanity checking of which operators are okay to map to which methods.

  14. Matthew Hall, 12 July 2007 17:48

    Now that I think of it, couldn't the annotation idea be implemented as a compiler extension, without being integrated into the language? I'm not super familiar with the annotation processor but if you can get access to the syntax tree then this should be possible.

  15. @George Bush:
    BigDecimal was written by veteran Josh Bloch. In his defense, a base 10 decimal that does not feel awkward is not an entirely simple thing to encapsulate in a language such as Java.

  16. I like the idea of introducing new literals for BigDecimal. Operator overloading for a type that has a literal representation seems quite logical to me. Primitives, on the other hand, are a real pain in OO. I'd rather like all primitives to be replaced by literals, only caring about objects in future.
    Although the suffix is of lesser importance, I'd rather choose G/g. A solution as in Groovy would be a nice possibility, though.

  17. What do you mean by "There are 35 bytecodes free at the moment."? I'm somewhat in favour of adding a decimal type in Java as syntactic sugar, with the compiler generating BigDecimal bytecode behind the scenes. That way you'd have binary compatibility and would (hopefully) not need to add new bytecode. I wasn't aware that nearly all the bytecode space has been used, though; I really can't believe that.

  18. Stephen Colebourne, 13 July 2007 09:54

    @Stefan, What would the literal G/g suffix be short for?

    @Srgjan, The 35 bytecodes refers to the number of bytecodes (out of 256) still unused (http://en.wikipedia.org/wiki/Java_bytecode). These would be used up rapidly if a primitive decimal was added (as each primitive uses many bytecodes).

    Adding a literal syntax that operates like String by constructing a class (BigDecimal in this case) does not of course require any bytecode changes.

  19. Stephen, it's hardly backwards compatible, but couldn't that be resolved by using reified generics? To my knowledge, in C# the byte code is the same for add/subtract etc. regardless of the primitive in question. I remember hearing Anders Hejlsberg talk about this very issue a while back.

  20. I think I'd prefer that an undecorated decimal value would be presumed to be BigDecimal (instead of Double). That, of course, fails the backwards compatibility test. We could use X|x as a suffix (roman numeral 10) as a big decimal indicator, at the risk of confusing anyone who uses leading X's to indicate hex constants.

    I think baking in support for big decimal literals without adding the primitive (using String and Array as precedents) is a good compromise.

  21. I've been doing some benchmarking of BigDecimal. Turns out to be difficult to keep JNI overhead out of the numbers, and also it often takes much longer just to print results than to do the calculations that you're trying to measure. See this http://forums.java.net/jive/thread.jspa?threadID=18620&tstart=60 for example. Turns out after I posted that I figured out that the slowness is due to concatenation of Strings and a bit of I/O slowness, not due to something slow in BigDecimal.

    I also would like the ability to use operator overloading for BigDecimal. When we translate COBOL code to Java, we have to translate all operations from the pretty syntax to ugly method calls. That won't go over well when you're trying to convince customers that Java is more readable than COBOL.

  22. Take a look at the Java operator compiler:

    https://jop.dev.java.net/

  23. @Stephen: "What would the literal G/g suffix be short for?"
    Nothing particular. I just liked the G ;)
    Actually, one could think of T or t as a suffix, which could be short for "ten based / base ten" a.k.a. decimal. One should actually do some brainstorming on whether it would be possible to replace the current number usage with BigDecimal or type inference in general. For example, don't override operators in BD but in Number, and try handling its different subclasses by autoconversion or similar.

  24. My preference for operator overloading would be to make interfaces that would have to be implemented for the compiler to do overloading. The semantics can then be defined on this interface (eg. binary '+' is 'add', unary '+' is 'plus'; the 'add' method must be commutative, associative, etc.).

    Don't know the maths well enough, but would a parameterised 'Ring' and 'Field' work?

    Eg. BigDecimal implements Field
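
A minimal sketch of the interface-based approach, under stated assumptions: the Arithmetic interface, the Dec wrapper and the Desugar helper are all hypothetical names, and the recursive type parameter is only a stand-in for the self-types mentioned earlier in the thread. The helper shows the kind of method calls a compiler might emit for (a + b) * c.

```java
import java.math.BigDecimal;

// Hypothetical interface a compiler could require before allowing operators.
// T is meant to be the implementing type itself.
interface Arithmetic<T extends Arithmetic<T>> {
    T add(T other);      // binary '+'
    T subtract(T other); // binary '-'
    T multiply(T other); // binary '*'
}

// Hypothetical wrapper, since java.math.BigDecimal cannot be retrofitted here.
final class Dec implements Arithmetic<Dec> {
    final BigDecimal value;

    Dec(String text) {
        this(new BigDecimal(text));
    }

    Dec(BigDecimal value) {
        this.value = value;
    }

    @Override public Dec add(Dec other) { return new Dec(value.add(other.value)); }
    @Override public Dec subtract(Dec other) { return new Dec(value.subtract(other.value)); }
    @Override public Dec multiply(Dec other) { return new Dec(value.multiply(other.value)); }
}

// What "(a + b) * c" could be translated to for any Arithmetic type.
class Desugar {
    static <T extends Arithmetic<T>> T sumTimes(T a, T b, T c) {
        return a.add(b).multiply(c);
    }
}
```

For example, Desugar.sumTimes(new Dec("12.34"), new Dec("34.45"), new Dec("1.12")) evaluates (12.34 + 34.45) * 1.12. Laws such as commutativity cannot be expressed in the interface itself and would have to live in its documentation.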

  25. Frankly, I cannot see much use in it.

    Besides the (outstanding) coercion rules, which open the door for many Java Puzzlers ("CAFEBABE") to come, my uses of BigDecimal have mostly been for the defined rounding semantics (i.e. fixed-point libraries), which can't be expressed by the literals (unless you add even more).

    Overloading math operators for BigXxx would be a nice syntactic shortcut, but literals (besides 0) are not really used that often when working with BigDecimals.

  26. Wow, this shook my world a bit. It made me look at the .Net spec which defines that "CIL opcodes are one or more bytes long". Perhaps we should have looked more into the future?
