## digitalmars.D - Implement the "unum" representation in D ?

- "Nick B" <nick.barbalich gmail.com> Feb 20 2014
- "Nick B" <nick.barbalich gmail.com> Feb 20 2014
- "John Colvin" <john.loughran.colvin gmail.com> Feb 20 2014
- "Rikki Cattermole" <alphaglosined gmail.com> Feb 20 2014
- "w0rp" <devw0rp gmail.com> Feb 20 2014
- "bearophile" <bearophileHUGS lycos.com> Feb 20 2014
- "jerro" <jkrempus gmail.com> Feb 20 2014
- Iain Buclaw <ibuclaw gdcproject.org> Feb 20 2014
- Walter Bright <newshound2 digitalmars.com> Feb 20 2014
- "bearophile" <bearophileHUGS lycos.com> Feb 20 2014
- "Francesco Cattoglio" <francesco.cattoglio gmail.com> Feb 20 2014
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> Feb 20 2014
- "jerro" <jkrempus gmail.com> Feb 20 2014
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> Feb 20 2014
- "Francesco Cattoglio" <francesco.cattoglio gmail.com> Feb 20 2014
- "Chris Williams" <yoreanon-chrisw yahoo.co.jp> Feb 20 2014
- "Francesco Cattoglio" <francesco.cattoglio gmail.com> Feb 20 2014
- "Frustrated" <c1514843 drdrb.com> Feb 20 2014
- "francesco cattoglio" <francesco.cattoglio gmail.com> Feb 20 2014
- "ponce" <contact gam3sfrommars.fr> Feb 21 2014
- "Ivan Kazmenko" <gassa mail.ru> Feb 21 2014
- "Chris Williams" <yoreanon-chrisw yahoo.co.jp> Feb 21 2014
- "Frustrated" <c1514843 drdrb.com> Feb 21 2014
- "Francesco Cattoglio" <francesco.cattoglio gmail.com> Feb 21 2014
- "Frustrated" <c1514843 drdrb.com> Feb 21 2014
- "Frustrated" <c1514843 drdrb.com> Feb 21 2014
- "Ivan Kazmenko" <gassa mail.ru> Feb 22 2014
- "Ivan Kazmenko" <gassa mail.ru> Feb 22 2014
- "Nick B" <nick.barbalich gmail.com> Feb 23 2014
- Iain Buclaw <ibuclaw gdcproject.org> Feb 23 2014

Hi everyone. I'm attend the SKA conference in Auckland next week and I would like to discuss a opportunity for the D community. I am based in Wellington, New Zealand. In Auckland, NZ, from Tuesday to Friday next week there will be two seminars held. The first 2 days (Tuesday and Wednesday) are for the multicore conference. Details are http://www.multicoreworld.com/ Here is the schedule for 2 days (thursday & friday) of the SKA conference: http://openparallel.com/multicore-world-2014/computing-for-ska/schedule-computing-for-ska-2014/ John Gustafson Will be presenting a Keynote on Thursday 27th February at 11:00 am The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/ There is also a excellent background paper, (PDF - 64 pages) which can be found here: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf The math details are beyond me, but I understand his basic idea. I would like to bring your attention to Page 34 and his comments re "standard committees" and page 62 and his comments "Coded in Mathematica for now. Need a fast native version.." I am sure you can see where I am going with this .... 1. Would it be possible to implement the "unum" representation in D and therefore make it a contender for the SKA ? 2. Is there any interest in this format within the D community Destroy. Nick

Feb 20 2014

On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:Hi everyone. I'm attend the SKA conference in Auckland next week and I would like to discuss a opportunity for the D community.

Sorry if I was not clear what the SKA is. In a nutshell is a truely massive telescope project which will require massive computing resources. https://www.skatelescope.org/ Nick

Feb 20 2014

On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:Hi everyone. I'm attend the SKA conference in Auckland next week and I would like to discuss a opportunity for the D community. I am based in Wellington, New Zealand. In Auckland, NZ, from Tuesday to Friday next week there will be two seminars held. The first 2 days (Tuesday and Wednesday) are for the multicore conference. Details are http://www.multicoreworld.com/ Here is the schedule for 2 days (thursday & friday) of the SKA conference: http://openparallel.com/multicore-world-2014/computing-for-ska/schedule-computing-for-ska-2014/ John Gustafson Will be presenting a Keynote on Thursday 27th February at 11:00 am The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/ There is also a excellent background paper, (PDF - 64 pages) which can be found here: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf The math details are beyond me, but I understand his basic idea. I would like to bring your attention to Page 34 and his comments re "standard committees" and page 62 and his comments "Coded in Mathematica for now. Need a fast native version.." I am sure you can see where I am going with this .... 1. Would it be possible to implement the "unum" representation in D and therefore make it a contender for the SKA ? 2. Is there any interest in this format within the D community Destroy. Nick

Hmm. Interesting. It could be done in D, definitely. However, I'm a bit skeptical on how efficient it would ever be without specialist hardware. I'd have to read it over a few more times to get a proper grip on it.

Feb 20 2014

On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:Hi everyone. I'm attend the SKA conference in Auckland next week and I would like to discuss a opportunity for the D community. I am based in Wellington, New Zealand. In Auckland, NZ, from Tuesday to Friday next week there will be two seminars held. The first 2 days (Tuesday and Wednesday) are for the multicore conference. Details are http://www.multicoreworld.com/ Here is the schedule for 2 days (thursday & friday) of the SKA conference: http://openparallel.com/multicore-world-2014/computing-for-ska/schedule-computing-for-ska-2014/ John Gustafson Will be presenting a Keynote on Thursday 27th February at 11:00 am The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/ There is also a excellent background paper, (PDF - 64 pages) which can be found here: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf The math details are beyond me, but I understand his basic idea. I would like to bring your attention to Page 34 and his comments re "standard committees" and page 62 and his comments "Coded in Mathematica for now. Need a fast native version.." I am sure you can see where I am going with this .... 1. Would it be possible to implement the "unum" representation in D and therefore make it a contender for the SKA ? 2. Is there any interest in this format within the D community Destroy. Nick

It looks like something that could be made into a library, with the help of inline assembly. Shame I've got study next week. Definitely no chance of making it.

Feb 20 2014

This is very interesting, thank you for sharing this. My knowledge of assembly, compilers, and how machines actually work is limited, but I knew enough about the details of floating point to get the gist of it. The diagrams in the PDF also helped. I hope someone does more research on this, as it looks promising for improving the quality of calculations.

Feb 20 2014

Slide 6:Even the IEEE standard (1985) made choices that look dubious now<

On the other hand it's still working surprisingly well today.Negative zero. (ugh!)

It's not bad. Slide 15: are (Page Rank and) mp3 Players using floating point values? Slide 18: "is harder than doing THIS with them?" Hardware multiplication uses a different algorithm. Regarding the variable length of his FP numbers, their energy savings are beer-based numbers, I don't think they have any experimental basis (yet). Bye, bearophile

Feb 20 2014

Regarding the variable length of his FP numbers, their energy savings are beer-based numbers, I don't think they have any experimental basis (yet).

Also, because they are variable sized, you need to access them through pointers if you want random access. Now you are reading both the pointer and the value from memory. Since pointers and double precision floats have the same size on modern hardware, one would expect this to actually consume more energy than just reading a double precision value. An additional indirection can also have great performance cost. And there's one more step you need to do. After getting the pointer you need to first read the utag so you can decide how many bytes to read. So where you would normally just read the value, you now need to read the pointer, use that to read the utag and use the utag to read the value.

Feb 20 2014

On 20 February 2014 15:26, bearophile <bearophileHUGS lycos.com> wrote:Slide 6:Even the IEEE standard (1985) made choices that look dubious now<

On the other hand it's still working surprisingly well today.Negative zero. (ugh!)

It's not bad. Slide 15: are (Page Rank and) mp3 Players using floating point values?

Most encoding formats use a form of discrete cosine transform, which involves floating point maths.

Feb 20 2014

On 2/20/2014 2:10 AM, Nick B wrote:1. Would it be possible to implement the "unum" representation in D and therefore make it a contender for the SKA ?

Yes, as a library type. I don't think it'll save any energy, though.2. Is there any interest in this format within the D community

I think it would be a fun and useful project. Any takers? (Be aware that there's a Python 'unum' type that is something quite different.)

Feb 20 2014

Iain Buclaw:Most encoding formats use a form of discrete cosine transform, which involves floating point maths.

I don't believe this much :-( Bye, bearophile

Feb 20 2014

On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/

"The pursuit of exascale floating point is ridiculous, since we do not need to be making 10^18 sloppy rounding errors per second; we need instead to get provable, valid results for the first time, by turning the speed of parallel computers into higher quality answers instead of more junk per second" Ok, I think I know a bunch of people who could question the contents of that sentence. Or, at the very least, question this guy's way of presenting sensational news.

Feb 20 2014

Also, because they are variable sized, you need to access them through pointers if you want random access. Now you are reading both the pointer and the value from memory.

We might not need random access though. For basic linear algebra forward or bidirectional should be enough which should suit this format. Depends on the application.

Feb 20 2014

We might not need random access though.

Even if you don't need random access, you can't store them in a packed way if you want to be able to mutate them in place, since mathematical operations on them can change the number of bits they take.Depends on the application.

I suspect that in most numerical applications the number of bits needed to store the numbers will keep increasing during the computation until it reaches the maximal supported value. That can actually happen very fast - a single division with a non power of two is enough. Applications that could actually benefit from this in any way are probably extremely rare. The only one I can think of is to use it as a very simple form of compression, but I'm sure there are better options for that.

Feb 20 2014

The unum variable length encoding is very similar to how msgpack packs integers. See msgpack-d on github for a superb implementation in D.

Feb 20 2014

On Thursday, 20 February 2014 at 23:21:26 UTC, Nordlöw wrote:We might not need random access though. For basic linear algebra forward or bidirectional should be enough which should suit this format. Depends on the application.

"Basic Linear Algebra" often requires sparse matrices. Sparse Matrices = both random access and indirection.

Feb 20 2014

On Thursday, 20 February 2014 at 23:13:20 UTC, Francesco Cattoglio wrote:On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/

"The pursuit of exascale floating point is ridiculous, since we do not need to be making 10^18 sloppy rounding errors per second; we need instead to get provable, valid results for the first time, by turning the speed of parallel computers into higher quality answers instead of more junk per second" Ok, I think I know a bunch of people who could question the contents of that sentence. Or, at the very least, question this guy's way of presenting sensational news.

I don't quite understand his ubox stuff, but his unum format doesn't really solve the 0.1 problem, except maybe by allowing the size of his values to exceed 64-bits so that precision errors creap up a little bit slower. (I'm not sure how many bits his format tops out at and I don't want to re-open the PDF again to look). It also wasn't clear whether his format removed the multiple values of NaN, -0, etc. It looked like it was just the current IEEE formats bonded to a sliding bit-length, which would bring along with it all the problems of the IEEE format that he mentioned. I think the only way to solve for problems like 0.1 in decimal not mapping to any reasonable value in binary is to store numbers as integer equations (i.e. 0.1 = 1/10), which would take a hell of a complex format to represent and some pretty fancy CPUs.

Feb 20 2014

On Thursday, 20 February 2014 at 23:52:13 UTC, Chris Williams wrote:I don't quite understand his ubox stuff, but his unum format doesn't really solve the 0.1 problem, except maybe by allowing the size of his values to exceed 64-bits so that precision errors creap up a little bit slower. (I'm not sure how many bits his format tops out at and I don't want to re-open the PDF again to look).

It also wasn't clear whether his format removed the multiple values of NaN, -0, etc. It looked like it was just the current IEEE formats bonded to a sliding bit-length

multiple representations is still there. The only real advantage is that you can probably store a NaN in 8 bits. Honestly, who cares. If I get a NaN in my numerical simulation, I have bigger concerns other than saving memory space.I think the only way to solve for problems like 0.1 in decimal not mapping to any reasonable value in binary is to store numbers as integer equations (i.e. 0.1 = 1/10), which would take a hell of a complex format to represent and some pretty fancy CPUs.

rational numbers. How extraordinary! :P The whole idea has one merit: the float now stores it's own accuracy. But I think you can achieve more or less the same goal by storing a pair of something. The whole idea of saving space sounds bogus, at least for my field of application. We already have an amazing technique for saving a lot of space for numerical simulation of PDEs, it's called grid refinement.

Feb 20 2014

On Thursday, 20 February 2014 at 23:41:12 UTC, jerro wrote:We might not need random access though.

Even if you don't need random access, you can't store them in a packed way if you want to be able to mutate them in place, since mathematical operations on them can change the number of bits they take.Depends on the application.

I suspect that in most numerical applications the number of bits needed to store the numbers will keep increasing during the computation until it reaches the maximal supported value. That can actually happen very fast - a single division with a non power of two is enough. Applications that could actually benefit from this in any way are probably extremely rare. The only one I can think of is to use it as a very simple form of compression, but I'm sure there are better options for that.

Yes, but his ubox method attempts to solve this problem by providing only the most accurate answers. By using a "sliding" floating point you can control the accuracy much better and by using the ubox method you can zero in on the most accurate solutions. I think a few here are missing the point. First: The unums self scale to provide the "fewest bit" representation ala his morse code vs ascii example. Not a huge deal, after all, it's only memory. But by using such representations one can have more accurate results because one can better represent values that are not accurately representable in standard fp. I think though adding a "repeating" bit would make it even more accurate so that repeating decimals within the bounds of maximum bits used could be represented perfectly. e.g., 1/3 = 0.3333... could be represented perfectly with such a bit and sliding fp type. With proper cpu support one could have 0.3333... * 3 = 1 exactly. By having two extra bits one could represent constants to any degree of accuracy. e.g., the last bit says the value represents the ith constant in some list. This would allow very common irrational constants to be used: e, pi, sqrt(2), etc... with more accuracy than normal and handled internally by the cpu. With such constants one could have sqrt(2)*sqrt(2) = 2 exactly. (designate part of the constants as "squares" since one could have up to 2^n - 1 constants) The problem I see is that to make it all work requires a lot of work on the hardware. Without it, there is no real reason to use it. One also runs into the problem of optimization. Having variable sized fp numbers may not be very efficient in memory alignment. A list of resizedable fp types would contain 1, 2, 3, 4, 5, ... byte numbers of random size. If you do a calculation on the list and try to use the list as storage again could end up with a different sized list... which could be very inefficient. You'll probably have to allocate a list of the maximum size possible in the first place to prevent buffer overflows, which then defeats the purpose in some sense In any case, one could have a type where the last 2 bits designate the representation. 00 - Positive Integer 01 - Floating point 10 - Constant - (value represents an index, possibly virtual, into a table of constants) 11 - Negative Integer For floating points, the 3rd last bit could represent a repeating decimal or they could be used in the constants for common repeating decimals. (since chances are, repeated calculations would not produce repeating decimals)

Feb 20 2014

On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:I think though adding a "repeating" bit would make it even more accurate so that repeating decimals within the bounds of maximum bits used could be represented perfectly. e.g., 1/3 = 0.3333... could be represented perfectly with such a bit and sliding fp type. With proper cpu support one could have 0.3333... * 3 = 1 exactly. By having two extra bits one could represent constants to any degree of accuracy. e.g., the last bit says the value represents the ith constant in some list. This would allow very common irrational constants to be used: e, pi, sqrt(2), etc...

constants. More, such a "repeating bit" should become a "repeating counter", since you usually get a certain number of repeating digits, not just a single one.For floating points, the 3rd last bit could represent a repeating decimal or they could be used in the constants for common repeating decimals. (since chances are, repeated calculations would not produce repeating decimals)

thinking mostly at message passing via TCP/IP), but have no real use in scientific computation. If you want good precision, you might as well be better off with bignum numbers.

Feb 20 2014

On Thursday, 20 February 2014 at 23:43:18 UTC, Nordlöw wrote:The unum variable length encoding is very similar to how msgpack packs integers. See msgpack-d on github for a superb implementation in D.

msgpack-d is indeed a great library that makes serialization almost instant to implement. I implemented CBOR another binary encoding scheme and it was obvious CBOR brings nothing over msgpack, it even did worse with integer encoding.

Feb 21 2014

On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:I think though adding a "repeating" bit would make it even more accurate so that repeating decimals within the bounds of maximum bits used could be represented perfectly. e.g., 1/3 = 0.3333... could be represented perfectly with such a bit and sliding fp type. With proper cpu support one could have 0.3333... * 3 = 1 exactly.

I believe that the repeating decimals, or better, repeating binary fractions, will hardly be more useful than a rational representation like p/q. The reason is, if we take a reciprocal 1/q and represent it as a binary, decimal, or other fixed-base fraction, the representation is surely periodic, but the upper bound on the length of its period is as high as q-1, and it is not unlikely to be the exact bound. For example, at http://en.wikipedia.org/wiki/Binary_number#Fractions , note that the binary fraction for 1/11 has period 10, and for 1/13 the period is 12. Thus repeating decimal for a fraction p/q will take up to q-1 bits when we store it as a repeating decimal, but log(p)+log(q) bits when stored as a rational number (numerator and denominator). Ivan Kazmenko.

Feb 21 2014

On Friday, 21 February 2014 at 09:04:40 UTC, Ivan Kazmenko wrote:I believe that the repeating decimals, or better, repeating binary fractions, will hardly be more useful than a rational representation like p/q.

Yeah, in retrospect I would say that a binary layout like: numberator length | +- | numerator | divisor Might be a stronger solution, or at least one which offers different advantages over what we have today. It still wouldn't be able to represent pi accurately, since it isn't a rational number, but I wouldn't be surprised if a rational number exists that could be represented than can be represented in IEEE format.

Feb 21 2014

On Friday, 21 February 2014 at 07:42:36 UTC, francesco cattoglio wrote:On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:I think though adding a "repeating" bit would make it even more accurate so that repeating decimals within the bounds of maximum bits used could be represented perfectly. e.g., 1/3 = 0.3333... could be represented perfectly with such a bit and sliding fp type. With proper cpu support one could have 0.3333... * 3 = 1 exactly. By having two extra bits one could represent constants to any degree of accuracy. e.g., the last bit says the value represents the ith constant in some list. This would allow very common irrational constants to be used: e, pi, sqrt(2), etc...

constants. More, such a "repeating bit" should become a "repeating counter", since you usually get a certain number of repeating digits, not just a single one.

I disagree. Many computations done by computers involve constants in the calculation... pi, e, sqrt(2), sqrt(-1), many physical constants, etc. The cpu could store these constants at a higher bit resolution than standard, say 128 bits or whatever.For floating points, the 3rd last bit could represent a repeating decimal or they could be used in the constants for common repeating decimals. (since chances are, repeated calculations would not produce repeating decimals)

(I'm thinking mostly at message passing via TCP/IP), but have no real use in scientific computation. If you want good precision, you might as well be better off with bignum numbers.

Simply not true. Maple, for example, uses constants and can compute using constants. It might speed up the calculations significantly if one could compute at the cpu level using constants. e.g. pi*sqrt(2) is just another constant. 2*pi is another constant. Obviously there is a limit to the number of constants one can store but one can encode many constants easily without storage. (where the "index" itself is used as the value) Also, the cpu could reduce values to constants... so if you have a fp value that is 3.14159265..... or whatever, the cpu could convert this to the constant pi since for all purposes it is pi(even if it is just an approximation or pi - 1/10^10). Basically the benefit of this(and potential) outweigh the cost of 1 bit out of 64-bits. I doubt anyone would miss it. In fact, probably no real loss.... NaN could be a constant, in which case you would have only one rather than the 2 billion or whatever you have now).

Feb 21 2014

On Friday, 21 February 2014 at 19:12:39 UTC, Frustrated wrote:Simply not true. Maple, for example, uses constants and can compute using constants.

two are completely unrelated.Basically the benefit of this(and potential) outweigh the cost of 1 bit out of 64-bits.

basically creating a tagged union type. It's pretty much unrelated from the "unum" proposed by the PDF.

Feb 21 2014

On Friday, 21 February 2014 at 19:59:36 UTC, Francesco Cattoglio wrote:On Friday, 21 February 2014 at 19:12:39 UTC, Frustrated wrote:Simply not true. Maple, for example, uses constants and can compute using constants.

The two are completely unrelated.Basically the benefit of this(and potential) outweigh the cost of 1 bit out of 64-bits.

basically creating a tagged union type. It's pretty much unrelated from the "unum" proposed by the PDF.

Yes, I mentioned that would could extend floating points to include other representations that are more efficient and reduce errors and such.

Feb 21 2014

On Friday, 21 February 2014 at 09:04:40 UTC, Ivan Kazmenko wrote:On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:I think though adding a "repeating" bit would make it even more accurate so that repeating decimals within the bounds of maximum bits used could be represented perfectly. e.g., 1/3 = 0.3333... could be represented perfectly with such a bit and sliding fp type. With proper cpu support one could have 0.3333... * 3 = 1 exactly.

I believe that the repeating decimals, or better, repeating binary fractions, will hardly be more useful than a rational representation like p/q. The reason is, if we take a reciprocal 1/q and represent it as a binary, decimal, or other fixed-base fraction, the representation is surely periodic, but the upper bound on the length of its period is as high as q-1, and it is not unlikely to be the exact bound. For example, at http://en.wikipedia.org/wiki/Binary_number#Fractions , note that the binary fraction for 1/11 has period 10, and for 1/13 the period is 12. Thus repeating decimal for a fraction p/q will take up to q-1 bits when we store it as a repeating decimal, but log(p)+log(q) bits when stored as a rational number (numerator and denominator).

No, you are comparing apples to oranges. The q is not the same in both equations. The number of bits for p/q when p and q are stored separately is log[2](p) + log[2](q). But when p/q is stored as a repeating decimal(assuming it repeats), then it is a fixed constant dependent on the number of bits. So in some cases the rational expression is cheaper and in other cases the repeating decimal is cheaper. e.g., .1.... in theory could take 1 bit to represent wile 1/9 takes 1 + 4 = 5. (the point being is that the rational representation sometimes takes way more bits than the repeating decimal representation, other times it doesn't. Basically if we use n bits to represent the repeating decimal we can't represent repetitions that require more than n bits, in this case a rational expression may be cheaper. If we use some type of compression then rational expressions would, in general, be cheaper the larger n is. e.g., 1/9 takes 5 bits to represent as as a rational faction(4 if we require the numerator to be 1). 0.1.... takes 1 bit to represent in theory but for n bits, it takes n bits. e.g., 1/9 = 11111111/10^8 = 0.000111... in decimal. Which takes 6 bits to store(000111). If n was large and fixed then I think statistically it would be best to use rational expressions rather than repeating decimals. But rational expressions are not unique and that may cause some problems with representation... unless the cpu implemented some algorithms for reduction. 1/9 = 10/90 but 10/90 takes way more bits to represent than 1/9 even though they represent the same repeating decimal and the repeating decimals bits are fixed. In any case, if the fpu had a history of calculations it could potentially store the calculations in an expression tree and attempt to reduce them. e.g., p/q*(q/p) = 1. A history could also be useful for constants in that multiplying several constants together could produce a new constant which would not introduce any new accumulation errors.

Feb 21 2014

On Friday, 21 February 2014 at 20:27:18 UTC, Frustrated wrote:On Friday, 21 February 2014 at 09:04:40 UTC, Ivan Kazmenko wrote:Thus repeating decimal for a fraction p/q will take up to q-1 bits when we store it as a repeating decimal, but log(p)+log(q) bits when stored as a rational number (numerator and denominator).

No, you are comparing apples to oranges. The q is not the same in both equations.

Huh? I believe my q is actually the same q in both. What I am saying is, more formally: 1. Representing p/q as a rational number asymptotically takes on the order of log(p)+log(q) bits. 2. Representing p/q as a repeating binary fraction asymptotically takes on the order of log(p)+q bits in the worst case, and this worst case is common. Make that a decimal fraction instead of a binary fraction if you wish, the statement remains the same.The number of bits for p/q when p and q are stored separately is log[2](p) + log[2](q). But when p/q is stored as a repeating decimal(assuming it repeats), then it is a fixed constant dependent on the number of bits.

A rational represented as a fixed-base (binary, decimal, etc.) fraction always has a period. Sometimes the period is degenerate, that is, consists of a single zero: 1/2 = 0.100000... = 0.1 as a binary fraction. Sometimes there is a non-degenerate pre-period part before the period: 13/10 = 1.0100110011 = 1.0(1001) as a binary fraction, the "1.0" part being the pre-period and the "(1001)" part the period. The same goes for decimal fractions.So in some cases the rational expression is cheaper and in other cases the repeating decimal is cheaper.

Sure, it depends on the p and q, I was talking about the asymptotic case. Say, for p and q uniformly distributed in [100..999], I believe the rational representation is much shorter on average. We can measure that if the practical need arises... :) In practice, one may argue that large prime denominators don't occur too often; well, for such applications, fine then.If n was large and fixed then I think statistically it would be best to use rational expressions rather than repeating decimals.

Yeah, that's similar to what I was trying to state.But rational expressions are not unique and that may cause some problems with representation... unless the cpu implemented some algorithms for reduction. 1/9 = 10/90 but 10/90 takes way more bits to represent than 1/9 even though they represent the same repeating decimal and the repeating decimals bits are fixed.

Given a rational, bringing it to lowest terms is as easy (yet costly) as calculating the GCD.In any case, if the fpu had a history of calculations it could potentially store the calculations in an expression tree and attempt to reduce them. e.g., p/q*(q/p) = 1. A history could also be useful for constants in that multiplying several constants together could produce a new constant which would not introduce any new accumulation errors.

As far as I know, most compilers do that for constants known at compile time. As for the run-time usage of constants, note that the pool of constants will still have to be stored somewhere, and addressed somehow, and this may cancel out the performance gains. Ivan Kazmenko.

Feb 22 2014

On Saturday, 22 February 2014 at 14:17:23 UTC, Ivan Kazmenko wrote:Sometimes there is a non-degenerate pre-period part before the period: 13/10 = 1.0100110011 = 1.0(1001) as a binary fraction, the "1.0" part being the pre-period and the "(1001)" part the period. The same goes for decimal fractions.

Sorry, I meant 13/10 = 1.0100110011..., that is, the "1001" part repeating forever - as opposed to just stopping at 10-th bit of the fraction.

Feb 22 2014

Hi I will ask my question again. Is there any interest in this format within the D community ? Nick

Feb 23 2014

On 20 February 2014 22:46, bearophile <bearophileHUGS lycos.com> wrote:Iain Buclaw:Most encoding formats use a form of discrete cosine transform, which involves floating point maths.

I don't believe this much :-(

:-( I looked up vorbis (very, very) briefly way of example. Most structs that deal with encoding/decoding use integers, floats and doubles to represent values such as decay, attenuation, thresholds, amplification etc.

Feb 23 2014