## digitalmars.D - BigInts in Tango/Phobos

- bearophile <bearophileHUGS lycos.com> Sep 09 2009
- Don <nospam nospam.com> Sep 10 2009

I have tried to find Don's email address, but so far I have failed (and I have not seen him on IRC d.tango), so I talk here, even if this isn't the best place.

I think this is the data representation of a Tango BigInt:

```d
alias uint BigDigit;
const BigDigit[] ZERO = [0];

struct BigUint {
    BigDigit[] data = ZERO;
}

struct BigInt {
    BigUint data;    // BigInt adds signed arithmetic to BigUint.
    bool sign = false;
}
```

I think the following representation may be better:

```d
struct BigInt {
    size_t* data;
    int sizeVal;  // sign(sizeVal) is always the sign of the BigInt.
                  // If data != null then abs(sizeVal) is the actual used size
                  // of the memory block, otherwise sizeVal is the value of this
                  // BigInt (useful for small BigInts), with overflow tests too.
    int capacity; // The available size of the block.
}
```

So the operations on such BigInts are cheap if the numbers can be stored in a 32-bit int (it just has to test whether data is null and then perform a safe operation on 32-bit numbers). This first part can be done in a small struct method that I hope will always be inlined. If data isn't null, then there's a call to the bigint machinery. When a 32-bit sizeVal produces an overflow, the bigint is converted into a normal bigint. Small integers are very common, so speeding up this case is very useful. Overflow tests in C: http://www.fefe.de/intof.html

The capacity/sizeVal pair is modeled on a similar pair in the C++ vector, so the BigInt can mutate in place and grow while avoiding some reallocations (GNU GMP also allows in-place mutation). In theory there's space inside such a BigInt struct for a 64-bit value (as a union), but I think 64-bit values aren't common enough to justify the little slowdown on smaller values.

Bye,
bearophile
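The small-integer fast path bearophile proposes can be sketched as follows. This is a minimal illustration in C (the linked overflow-test page is C), not the Tango implementation; the struct layout mirrors the proposal above, and names like `bigint_try_small_add` are hypothetical:

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical layout mirroring the proposal: when data == NULL,
   sizeVal holds the value itself; otherwise it holds the signed
   used length of the allocated digit block. */
typedef struct {
    size_t *data;
    int     sizeVal;
    int     capacity;
} BigInt;

/* Portable signed-overflow test in the style of the linked page:
   check the bounds BEFORE adding, since signed overflow is
   undefined behavior in C. */
static int add_would_overflow(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

/* Fast path: if both operands are small (data == NULL) and the sum
   fits in an int, stay small.  Returns 0 when the caller must fall
   back to the multi-precision machinery (not shown here). */
static int bigint_try_small_add(BigInt *r, const BigInt *x, const BigInt *y)
{
    if (x->data == NULL && y->data == NULL &&
        !add_would_overflow(x->sizeVal, y->sizeVal)) {
        r->data    = NULL;
        r->sizeVal = x->sizeVal + y->sizeVal;
        return 1;   /* handled inline, no allocation */
    }
    return 0;       /* needs the bigint machinery */
}
```

In D this test would live in a small struct method in the hope that the compiler always inlines it, so the common small-value case never touches the heap.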

Sep 09 2009

bearophile wrote:

> I have tried to find Don's email address, but so far I have failed (and I have not seen him on IRC d.tango), so I talk here, even if this isn't the best place.

You should post this kind of thing as a ticket in Tango.

> I think this is the data representation of a Tango BigInt:
>
> ```d
> alias uint BigDigit;
> const BigDigit[] ZERO = [0];
>
> struct BigUint {
>     BigDigit[] data = ZERO;
> }
>
> struct BigInt {
>     BigUint data;    // BigInt adds signed arithmetic to BigUint.
>     bool sign = false;
> }
> ```
>
> I think the following representation may be better:
>
> ```d
> struct BigInt {
>     size_t* data;
>     int sizeVal;  // sign(sizeVal) is always the sign of the BigInt.
>                   // If data != null then abs(sizeVal) is the actual used size
>                   // of the memory block, otherwise sizeVal is the value of this
>                   // BigInt (useful for small BigInts), with overflow tests too.
>     int capacity; // The available size of the block.
> }
> ```
>
> So the operations on such BigInts are cheap if the numbers can be stored in a 32-bit int (it just has to test whether data is null and then perform a safe operation on 32-bit numbers). This first part can be done in a small struct method that I hope will always be inlined. If data isn't null, then there's a call to the bigint machinery. When a 32-bit sizeVal produces an overflow, the bigint is converted into a normal bigint.

Yes, I've considered doing something like this. But I'm actually not happy at all with copy-on-write BigInts as used in Tango; they result in a huge amount of wasted memory allocation. They are the only option in D1, but I'll do it differently in D2. I've really just concentrated on getting the low-level routines working and exposing a minimal wrapper for the functionality. My feeling was that the optimal design for BigInt couldn't be settled without reference to BigFloat. Someone had said they were working on a BigFloat, but it doesn't seem to have materialized.

> Small integers are very common, so speeding up this case is very useful. Overflow tests in C: http://www.fefe.de/intof.html
>
> The capacity/sizeVal pair is modeled on a similar pair in the C++ vector, so it can mutate in place and grow while avoiding some reallocations (GNU GMP also allows in-place mutation).

Yes. But note that mutating in place is impossible with copy-on-write, so it's D2 only. BigInts are also the perfect place for memory pools; temporary variables can in theory be dealt with extremely efficiently.

> In theory there's space inside such a BigInt struct for a 64-bit value (as a union), but I think 64-bit values aren't common enough to justify the little slowdown on smaller values.

You're probably right. But I'm concentrating on compiler bugfixes right now; BigInt is on hold for now.
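The capacity-based in-place growth discussed in this exchange can be sketched as below. This is an illustrative C sketch modeled on how C++ `vector` implementations typically grow, not Tango or GMP code; the struct layout follows bearophile's proposal and `bigint_reserve` is a hypothetical name:

```c
#include <stdlib.h>

typedef struct {
    size_t *data;      /* digit block; NULL while the value is small */
    int     sizeVal;   /* signed used length, as in the proposal     */
    int     capacity;  /* allocated length of data, in digits        */
} BigInt;

/* Grow the digit block to hold at least `needed` digits.  Doubling
   the capacity amortizes the cost of repeated growth, so a BigInt
   mutated in place rarely needs to reallocate.  Returns 0 only on
   allocation failure. */
static int bigint_reserve(BigInt *n, int needed)
{
    if (needed <= n->capacity)
        return 1;                       /* existing block suffices */
    int newcap = n->capacity ? n->capacity * 2 : 4;
    if (newcap < needed)
        newcap = needed;
    size_t *p = realloc(n->data, (size_t)newcap * sizeof *p);
    if (p == NULL)
        return 0;
    n->data     = p;
    n->capacity = newcap;
    return 1;
}
```

As Don notes, this kind of in-place growth is off the table under copy-on-write semantics, since other references may share the digit block; that is why it only becomes an option with the D2 redesign.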

Sep 10 2009