## digitalmars.D - Implement the "unum" representation in D ?

- Nick B (27/27) Feb 20 2014 Hi everyone....
- Nick B (6/9) Feb 20 2014 Sorry if I was not clear what ...
- John Colvin (5/33) Feb 20 2014 Hmm. Interesting. It could be ...
- Rikki Cattermole (4/32) Feb 20 2014 It looks like something that c...
- w0rp (6/6) Feb 20 2014 This is very interesting, than...
- bearophile (12/14) Feb 20 2014 It's not bad....
- jerro (12/15) Feb 20 2014 Also, because they are variabl...
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (4/7) Feb 20 2014 We might not need random acces...
- jerro (15/17) Feb 20 2014 Even if you don't need random ...
- Frustrated (49/66) Feb 20 2014 Yes, but his ubox method attem...
- francesco cattoglio (9/24) Feb 20 2014 Unfortunately maths (real worl...
- Frustrated (22/50) Feb 21 2014 I disagree. Many computations ...
- Francesco Cattoglio (6/11) Feb 21 2014 You are mixing symbolic calcul...
- Frustrated (5/16) Feb 21 2014 Yes, I mentioned that would co...
- Ivan Kazmenko (16/22) Feb 21 2014 I believe that the repeating d...
- Chris Williams (8/11) Feb 21 2014 Yeah, in retrospect I would sa...
- Frustrated (39/62) Feb 21 2014 No, you are comparing apples t...
- Ivan Kazmenko (33/61) Feb 22 2014 Huh? I believe my q is actual...
- Ivan Kazmenko (5/10) Feb 22 2014 Sorry, I meant 13/10 = 1.01001...
- Nick B (4/4) Feb 23 2014 Hi...
- Francesco Cattoglio (3/7) Feb 20 2014 "Basic Linear Algebra" often r...
- "Per =?UTF-8?B?Tm9yZGzDtnci?= <per.nordlow gmail.com> (7/9) Jul 13 Unless the author was thinking...
- Nick B (2/6) Jul 16 Question for Andrei, above, if...
- Andrei Alexandrescu (3/8) Jul 17 Haven't gotten to reading abou...
- Iain Buclaw (3/9) Feb 20 2014 Most encoding formats use a fo...
- bearophile (4/6) Feb 20 2014 I don't believe this much :-(...
- Iain Buclaw (6/10) Feb 23 2014 :-(...
- Walter Bright (4/7) Feb 20 2014 I think it would be a fun and ...
- Francesco Cattoglio (9/11) Feb 20 2014 "The pursuit of exascale float...
- Chris Williams (16/28) Feb 20 2014 I don't quite understand his u...
- Francesco Cattoglio (16/30) Feb 20 2014 Exctly. If I read correctly it...
- =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= (3/3) Feb 20 2014 The unum variable length encod...
- ponce (6/9) Feb 21 2014 msgpack-d is indeed a great li...
- Nick B (90/97) Jul 10 FYI...
- Andrei Alexandrescu (2/5) Jul 11 Very interesting, I'll read it...
- Timon Gehr (2/10) Jul 11 I think Walter should read cha...
- deadalnix (2/14) Sep 16 What is this chapter about ?...
- Timon Gehr (9/25) Sep 16 Relevant quote: "Programmers a...
- Russel Winder via Digitalmars-d (15/27) Jul 11 6582...
- ponce (3/6) Jul 11 Sample chapter here: ...
- jmh530 (13/15) Jul 11 I was just thinking about over...
- Nick B (3/14) Jul 12 I glad that you guys have foun...
- Andrei Alexandrescu (14/28) Jul 22 Just read http://radiofreehpc....
- jmh530 (5/10) Jul 22 The book has quite a bit of Ma...
- Anthony Di Franco (31/43) Sep 14 Here is a Python port of the M...
- Shachar Shemesh (12/29) Jul 12 As far as I can tell, the auth...
- Iain Buclaw via Digitalmars-d (13/44) Jul 12 http://www.amazon.com/End-Erro...
- deadalnix (6/112) Sep 14 To be honest, that sound like ...
- ponce (5/7) Sep 15 The sample chapter dissipates ...
- deadalnix (3/11) Sep 15 Read it. That is suddenly much...
- ponce (7/19) Sep 15 I can see this being useful si...
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (6/9) Sep 15 I think he is looking into 32 ...
- ponce (3/4) Sep 15 That's a pretty convincing cas...
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (3/7) Sep 15 You:9...
- Don (27/33) Sep 16 I'm not convinced. I think the...
- deadalnix (34/67) Sep 16 GPU do it a lot. Especially, b...
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (15/22) Sep 16 That really depends on memory ...
- deadalnix (15/39) Sep 16 No you don't. Because the stre...
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (20/34) Sep 16 You can load continuously 64 b...
- deadalnix (44/69) Sep 16 1/ If you load the worst case ...
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (20/27) Sep 16 There is _no_ cache. The compi...
- deadalnix (30/52) Sep 16 Stop...
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (2/12) Sep 16 OK. I stop. You are beyond rea...
- deadalnix (7/21) Sep 16 True, how blind I was. It is f...
- H. S. Teoh via Digitalmars-d (44/69) Sep 16 I found this .pdf that explain...
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (25/45) Sep 16 Let's not make it so complicat...
- jmh530 (18/20) Sep 16 I don't recall how he would de...
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (16/21) Sep 16 I don't think he is downplayin...
- Wyatt (7/11) Sep 16 Oh hey, I remember these joker...
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (3/15) Sep 16 Yes, of course, most startups ...
- Timon Gehr (2/7) Sep 16 https://en.wikipedia.org/wiki/...
- Anthony Di Franco (42/54) Sep 17 I read the whole book and did ...
- Nick B (9/15) Sep 17 It would seem to depend on h...
- Ola Fosheim Grostad (4/7) Sep 17 Verified numerical computation...
- Richard Davies (25/33) Nov 08 Hi,...
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (29/36) Sep 18 I don't think you should expec...
- skoppe (4/10) Sep 18 You forgot to mention that D i...
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (7/9) Sep 18 Yes, but that does not define ...
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (8/8) Sep 18 Also keep in mind that people ...

Hi everyone. I'm attend the SKA conference in Auckland next week and I would like to discuss a opportunity for the D community. I am based in Wellington, New Zealand. In Auckland, NZ, from Tuesday to Friday next week there will be two seminars held. The first 2 days (Tuesday and Wednesday) are for the multicore conference. Details are http://www.multicoreworld.com/ Here is the schedule for 2 days (thursday & friday) of the SKA conference: http://openparallel.com/multicore-world-2014/computing-for-ska/schedule-computing-for-ska-2014/ John Gustafson Will be presenting a Keynote on Thursday 27th February at 11:00 am The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/ There is also a excellent background paper, (PDF - 64 pages) which can be found here: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf The math details are beyond me, but I understand his basic idea. I would like to bring your attention to Page 34 and his comments re "standard committees" and page 62 and his comments "Coded in Mathematica for now. Need a fast native version.." I am sure you can see where I am going with this .... 1. Would it be possible to implement the "unum" representation in D and therefore make it a contender for the SKA ? 2. Is there any interest in this format within the D community Destroy. Nick

Feb 20 2014

On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:Hi everyone. I'm attend the SKA conference in Auckland next week and I would like to discuss a opportunity for the D community.

Sorry if I was not clear what the SKA is. In a nutshell is a truely massive telescope project which will require massive computing resources. https://www.skatelescope.org/ Nick

Feb 20 2014

On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:Hi everyone. I'm attend the SKA conference in Auckland next week and I would like to discuss a opportunity for the D community. I am based in Wellington, New Zealand. In Auckland, NZ, from Tuesday to Friday next week there will be two seminars held. The first 2 days (Tuesday and Wednesday) are for the multicore conference. Details are http://www.multicoreworld.com/ Here is the schedule for 2 days (thursday & friday) of the SKA conference: http://openparallel.com/multicore-world-2014/computing-for-ska/schedule-computing-for-ska-2014/ John Gustafson Will be presenting a Keynote on Thursday 27th February at 11:00 am The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/ There is also a excellent background paper, (PDF - 64 pages) which can be found here: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf The math details are beyond me, but I understand his basic idea. I would like to bring your attention to Page 34 and his comments re "standard committees" and page 62 and his comments "Coded in Mathematica for now. Need a fast native version.." I am sure you can see where I am going with this .... 1. Would it be possible to implement the "unum" representation in D and therefore make it a contender for the SKA ? 2. Is there any interest in this format within the D community Destroy. Nick

Hmm. Interesting. It could be done in D, definitely. However, I'm a bit skeptical on how efficient it would ever be without specialist hardware. I'd have to read it over a few more times to get a proper grip on it.

Feb 20 2014

On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:Hi everyone. I'm attend the SKA conference in Auckland next week and I would like to discuss a opportunity for the D community. I am based in Wellington, New Zealand. In Auckland, NZ, from Tuesday to Friday next week there will be two seminars held. The first 2 days (Tuesday and Wednesday) are for the multicore conference. Details are http://www.multicoreworld.com/ Here is the schedule for 2 days (thursday & friday) of the SKA conference: http://openparallel.com/multicore-world-2014/computing-for-ska/schedule-computing-for-ska-2014/ John Gustafson Will be presenting a Keynote on Thursday 27th February at 11:00 am The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/ There is also a excellent background paper, (PDF - 64 pages) which can be found here: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf The math details are beyond me, but I understand his basic idea. I would like to bring your attention to Page 34 and his comments re "standard committees" and page 62 and his comments "Coded in Mathematica for now. Need a fast native version.." I am sure you can see where I am going with this .... 1. Would it be possible to implement the "unum" representation in D and therefore make it a contender for the SKA ? 2. Is there any interest in this format within the D community Destroy. Nick

It looks like something that could be made into a library, with the help of inline assembly. Shame I've got study next week. Definitely no chance of making it.

Feb 20 2014

This is very interesting, thank you for sharing this. My knowledge of assembly, compilers, and how machines actually work is limited, but I knew enough about the details of floating point to get the gist of it. The diagrams in the PDF also helped. I hope someone does more research on this, as it looks promising for improving the quality of calculations.

Feb 20 2014

Slide 6:Even the IEEE standard (1985) made choices that look dubious now<

On the other hand it's still working surprisingly well today.Negative zero. (ugh!)

It's not bad. Slide 15: are (Page Rank and) mp3 Players using floating point values? Slide 18: "is harder than doing THIS with them?" Hardware multiplication uses a different algorithm. Regarding the variable length of his FP numbers, their energy savings are beer-based numbers, I don't think they have any experimental basis (yet). Bye, bearophile

Feb 20 2014

Regarding the variable length of his FP numbers, their energy savings are beer-based numbers, I don't think they have any experimental basis (yet).

Also, because they are variable sized, you need to access them through pointers if you want random access. Now you are reading both the pointer and the value from memory. Since pointers and double precision floats have the same size on modern hardware, one would expect this to actually consume more energy than just reading a double precision value. An additional indirection can also have great performance cost. And there's one more step you need to do. After getting the pointer you need to first read the utag so you can decide how many bytes to read. So where you would normally just read the value, you now need to read the pointer, use that to read the utag and use the utag to read the value.

Feb 20 2014

Also, because they are variable sized, you need to access them through pointers if you want random access. Now you are reading both the pointer and the value from memory.

We might not need random access though. For basic linear algebra forward or bidirectional should be enough which should suit this format. Depends on the application.

Feb 20 2014

We might not need random access though.

Even if you don't need random access, you can't store them in a packed way if you want to be able to mutate them in place, since mathematical operations on them can change the number of bits they take.Depends on the application.

I suspect that in most numerical applications the number of bits needed to store the numbers will keep increasing during the computation until it reaches the maximal supported value. That can actually happen very fast - a single division with a non power of two is enough. Applications that could actually benefit from this in any way are probably extremely rare. The only one I can think of is to use it as a very simple form of compression, but I'm sure there are better options for that.

Feb 20 2014

On Thursday, 20 February 2014 at 23:41:12 UTC, jerro wrote:We might not need random access though.

Even if you don't need random access, you can't store them in a packed way if you want to be able to mutate them in place, since mathematical operations on them can change the number of bits they take.Depends on the application.

I suspect that in most numerical applications the number of bits needed to store the numbers will keep increasing during the computation until it reaches the maximal supported value. That can actually happen very fast - a single division with a non power of two is enough. Applications that could actually benefit from this in any way are probably extremely rare. The only one I can think of is to use it as a very simple form of compression, but I'm sure there are better options for that.

Yes, but his ubox method attempts to solve this problem by providing only the most accurate answers. By using a "sliding" floating point you can control the accuracy much better and by using the ubox method you can zero in on the most accurate solutions. I think a few here are missing the point. First: The unums self scale to provide the "fewest bit" representation ala his morse code vs ascii example. Not a huge deal, after all, it's only memory. But by using such representations one can have more accurate results because one can better represent values that are not accurately representable in standard fp. I think though adding a "repeating" bit would make it even more accurate so that repeating decimals within the bounds of maximum bits used could be represented perfectly. e.g., 1/3 = 0.3333... could be represented perfectly with such a bit and sliding fp type. With proper cpu support one could have 0.3333... * 3 = 1 exactly. By having two extra bits one could represent constants to any degree of accuracy. e.g., the last bit says the value represents the ith constant in some list. This would allow very common irrational constants to be used: e, pi, sqrt(2), etc... with more accuracy than normal and handled internally by the cpu. With such constants one could have sqrt(2)*sqrt(2) = 2 exactly. (designate part of the constants as "squares" since one could have up to 2^n - 1 constants) The problem I see is that to make it all work requires a lot of work on the hardware. Without it, there is no real reason to use it. One also runs into the problem of optimization. Having variable sized fp numbers may not be very efficient in memory alignment. A list of resizedable fp types would contain 1, 2, 3, 4, 5, ... byte numbers of random size. If you do a calculation on the list and try to use the list as storage again could end up with a different sized list... which could be very inefficient. You'll probably have to allocate a list of the maximum size possible in the first place to prevent buffer overflows, which then defeats the purpose in some sense In any case, one could have a type where the last 2 bits designate the representation. 00 - Positive Integer 01 - Floating point 10 - Constant - (value represents an index, possibly virtual, into a table of constants) 11 - Negative Integer For floating points, the 3rd last bit could represent a repeating decimal or they could be used in the constants for common repeating decimals. (since chances are, repeated calculations would not produce repeating decimals)

Feb 20 2014

On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:I think though adding a "repeating" bit would make it even more accurate so that repeating decimals within the bounds of maximum bits used could be represented perfectly. e.g., 1/3 = 0.3333... could be represented perfectly with such a bit and sliding fp type. With proper cpu support one could have 0.3333... * 3 = 1 exactly. By having two extra bits one could represent constants to any degree of accuracy. e.g., the last bit says the value represents the ith constant in some list. This would allow very common irrational constants to be used: e, pi, sqrt(2), etc...

Unfortunately maths (real world maths) isn't made by "common" constants. More, such a "repeating bit" should become a "repeating counter", since you usually get a certain number of repeating digits, not just a single one.For floating points, the 3rd last bit could represent a repeating decimal or they could be used in the constants for common repeating decimals. (since chances are, repeated calculations would not produce repeating decimals)

Things like those are cool and might have their application (I'm thinking mostly at message passing via TCP/IP), but have no real use in scientific computation. If you want good precision, you might as well be better off with bignum numbers.

Feb 20 2014

On Friday, 21 February 2014 at 07:42:36 UTC, francesco cattoglio wrote:On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:I think though adding a "repeating" bit would make it even more accurate so that repeating decimals within the bounds of maximum bits used could be represented perfectly. e.g., 1/3 = 0.3333... could be represented perfectly with such a bit and sliding fp type. With proper cpu support one could have 0.3333... * 3 = 1 exactly. By having two extra bits one could represent constants to any degree of accuracy. e.g., the last bit says the value represents the ith constant in some list. This would allow very common irrational constants to be used: e, pi, sqrt(2), etc...

Unfortunately maths (real world maths) isn't made by "common" constants. More, such a "repeating bit" should become a "repeating counter", since you usually get a certain number of repeating digits, not just a single one.

I disagree. Many computations done by computers involve constants in the calculation... pi, e, sqrt(2), sqrt(-1), many physical constants, etc. The cpu could store these constants at a higher bit resolution than standard, say 128 bits or whatever.For floating points, the 3rd last bit could represent a repeating decimal or they could be used in the constants for common repeating decimals. (since chances are, repeated calculations would not produce repeating decimals)

Things like those are cool and might have their application (I'm thinking mostly at message passing via TCP/IP), but have no real use in scientific computation. If you want good precision, you might as well be better off with bignum numbers.

Simply not true. Maple, for example, uses constants and can compute using constants. It might speed up the calculations significantly if one could compute at the cpu level using constants. e.g. pi*sqrt(2) is just another constant. 2*pi is another constant. Obviously there is a limit to the number of constants one can store but one can encode many constants easily without storage. (where the "index" itself is used as the value) Also, the cpu could reduce values to constants... so if you have a fp value that is 3.14159265..... or whatever, the cpu could convert this to the constant pi since for all purposes it is pi(even if it is just an approximation or pi - 1/10^10). Basically the benefit of this(and potential) outweigh the cost of 1 bit out of 64-bits. I doubt anyone would miss it. In fact, probably no real loss.... NaN could be a constant, in which case you would have only one rather than the 2 billion or whatever you have now).

Feb 21 2014

On Friday, 21 February 2014 at 19:12:39 UTC, Frustrated wrote:Simply not true. Maple, for example, uses constants and can compute using constants.

You are mixing symbolic calculus and numerical computations. The two are completely unrelated.Basically the benefit of this(and potential) outweigh the cost of 1 bit out of 64-bits.

Unless I'm completely mistaken, what you are proposing is basically creating a tagged union type. It's pretty much unrelated from the "unum" proposed by the PDF.

Feb 21 2014

On Friday, 21 February 2014 at 19:59:36 UTC, Francesco Cattoglio wrote:On Friday, 21 February 2014 at 19:12:39 UTC, Frustrated wrote:Simply not true. Maple, for example, uses constants and can compute using constants.

You are mixing symbolic calculus and numerical computations. The two are completely unrelated.Basically the benefit of this(and potential) outweigh the cost of 1 bit out of 64-bits.

Unless I'm completely mistaken, what you are proposing is basically creating a tagged union type. It's pretty much unrelated from the "unum" proposed by the PDF.

Yes, I mentioned that would could extend floating points to include other representations that are more efficient and reduce errors and such.

Feb 21 2014

On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:I think though adding a "repeating" bit would make it even more accurate so that repeating decimals within the bounds of maximum bits used could be represented perfectly. e.g., 1/3 = 0.3333... could be represented perfectly with such a bit and sliding fp type. With proper cpu support one could have 0.3333... * 3 = 1 exactly.

I believe that the repeating decimals, or better, repeating binary fractions, will hardly be more useful than a rational representation like p/q. The reason is, if we take a reciprocal 1/q and represent it as a binary, decimal, or other fixed-base fraction, the representation is surely periodic, but the upper bound on the length of its period is as high as q-1, and it is not unlikely to be the exact bound. For example, at http://en.wikipedia.org/wiki/Binary_number#Fractions , note that the binary fraction for 1/11 has period 10, and for 1/13 the period is 12. Thus repeating decimal for a fraction p/q will take up to q-1 bits when we store it as a repeating decimal, but log(p)+log(q) bits when stored as a rational number (numerator and denominator). Ivan Kazmenko.

Feb 21 2014

On Friday, 21 February 2014 at 09:04:40 UTC, Ivan Kazmenko wrote:I believe that the repeating decimals, or better, repeating binary fractions, will hardly be more useful than a rational representation like p/q.

Yeah, in retrospect I would say that a binary layout like: numberator length | +- | numerator | divisor Might be a stronger solution, or at least one which offers different advantages over what we have today. It still wouldn't be able to represent pi accurately, since it isn't a rational number, but I wouldn't be surprised if a rational number exists that could be represented than can be represented in IEEE format.

Feb 21 2014

On Friday, 21 February 2014 at 09:04:40 UTC, Ivan Kazmenko wrote:On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:I think though adding a "repeating" bit would make it even more accurate so that repeating decimals within the bounds of maximum bits used could be represented perfectly. e.g., 1/3 = 0.3333... could be represented perfectly with such a bit and sliding fp type. With proper cpu support one could have 0.3333... * 3 = 1 exactly.

I believe that the repeating decimals, or better, repeating binary fractions, will hardly be more useful than a rational representation like p/q. The reason is, if we take a reciprocal 1/q and represent it as a binary, decimal, or other fixed-base fraction, the representation is surely periodic, but the upper bound on the length of its period is as high as q-1, and it is not unlikely to be the exact bound. For example, at http://en.wikipedia.org/wiki/Binary_number#Fractions , note that the binary fraction for 1/11 has period 10, and for 1/13 the period is 12. Thus repeating decimal for a fraction p/q will take up to q-1 bits when we store it as a repeating decimal, but log(p)+log(q) bits when stored as a rational number (numerator and denominator).

No, you are comparing apples to oranges. The q is not the same in both equations. The number of bits for p/q when p and q are stored separately is log[2](p) + log[2](q). But when p/q is stored as a repeating decimal(assuming it repeats), then it is a fixed constant dependent on the number of bits. So in some cases the rational expression is cheaper and in other cases the repeating decimal is cheaper. e.g., .1.... in theory could take 1 bit to represent wile 1/9 takes 1 + 4 = 5. (the point being is that the rational representation sometimes takes way more bits than the repeating decimal representation, other times it doesn't. Basically if we use n bits to represent the repeating decimal we can't represent repetitions that require more than n bits, in this case a rational expression may be cheaper. If we use some type of compression then rational expressions would, in general, be cheaper the larger n is. e.g., 1/9 takes 5 bits to represent as as a rational faction(4 if we require the numerator to be 1). 0.1.... takes 1 bit to represent in theory but for n bits, it takes n bits. e.g., 1/9 = 11111111/10^8 = 0.000111... in decimal. Which takes 6 bits to store(000111). If n was large and fixed then I think statistically it would be best to use rational expressions rather than repeating decimals. But rational expressions are not unique and that may cause some problems with representation... unless the cpu implemented some algorithms for reduction. 1/9 = 10/90 but 10/90 takes way more bits to represent than 1/9 even though they represent the same repeating decimal and the repeating decimals bits are fixed. In any case, if the fpu had a history of calculations it could potentially store the calculations in an expression tree and attempt to reduce them. e.g., p/q*(q/p) = 1. A history could also be useful for constants in that multiplying several constants together could produce a new constant which would not introduce any new accumulation errors.

Feb 21 2014

On Friday, 21 February 2014 at 20:27:18 UTC, Frustrated wrote:On Friday, 21 February 2014 at 09:04:40 UTC, Ivan Kazmenko wrote:Thus repeating decimal for a fraction p/q will take up to q-1 bits when we store it as a repeating decimal, but log(p)+log(q) bits when stored as a rational number (numerator and denominator).

No, you are comparing apples to oranges. The q is not the same in both equations.

Huh? I believe my q is actually the same q in both. What I am saying is, more formally: 1. Representing p/q as a rational number asymptotically takes on the order of log(p)+log(q) bits. 2. Representing p/q as a repeating binary fraction asymptotically takes on the order of log(p)+q bits in the worst case, and this worst case is common. Make that a decimal fraction instead of a binary fraction if you wish, the statement remains the same.The number of bits for p/q when p and q are stored separately is log[2](p) + log[2](q). But when p/q is stored as a repeating decimal(assuming it repeats), then it is a fixed constant dependent on the number of bits.

A rational represented as a fixed-base (binary, decimal, etc.) fraction always has a period. Sometimes the period is degenerate, that is, consists of a single zero: 1/2 = 0.100000... = 0.1 as a binary fraction. Sometimes there is a non-degenerate pre-period part before the period: 13/10 = 1.0100110011 = 1.0(1001) as a binary fraction, the "1.0" part being the pre-period and the "(1001)" part the period. The same goes for decimal fractions.So in some cases the rational expression is cheaper and in other cases the repeating decimal is cheaper.

Sure, it depends on the p and q, I was talking about the asymptotic case. Say, for p and q uniformly distributed in [100..999], I believe the rational representation is much shorter on average. We can measure that if the practical need arises... :) In practice, one may argue that large prime denominators don't occur too often; well, for such applications, fine then.If n was large and fixed then I think statistically it would be best to use rational expressions rather than repeating decimals.

Yeah, that's similar to what I was trying to state.But rational expressions are not unique and that may cause some problems with representation... unless the cpu implemented some algorithms for reduction. 1/9 = 10/90 but 10/90 takes way more bits to represent than 1/9 even though they represent the same repeating decimal and the repeating decimals bits are fixed.

Given a rational, bringing it to lowest terms is as easy (yet costly) as calculating the GCD.In any case, if the fpu had a history of calculations it could potentially store the calculations in an expression tree and attempt to reduce them. e.g., p/q*(q/p) = 1. A history could also be useful for constants in that multiplying several constants together could produce a new constant which would not introduce any new accumulation errors.

As far as I know, most compilers do that for constants known at compile time. As for the run-time usage of constants, note that the pool of constants will still have to be stored somewhere, and addressed somehow, and this may cancel out the performance gains. Ivan Kazmenko.

Feb 22 2014

On Saturday, 22 February 2014 at 14:17:23 UTC, Ivan Kazmenko wrote:Sometimes there is a non-degenerate pre-period part before the period: 13/10 = 1.0100110011 = 1.0(1001) as a binary fraction, the "1.0" part being the pre-period and the "(1001)" part the period. The same goes for decimal fractions.

Sorry, I meant 13/10 = 1.0100110011..., that is, the "1001" part repeating forever - as opposed to just stopping at 10-th bit of the fraction.

Feb 22 2014

Hi I will ask my question again. Is there any interest in this format within the D community ? Nick

Feb 23 2014

On Thursday, 20 February 2014 at 23:21:26 UTC, Nordlöw wrote:We might not need random access though. For basic linear algebra forward or bidirectional should be enough which should suit this format. Depends on the application.

"Basic Linear Algebra" often requires sparse matrices. Sparse Matrices = both random access and indirection.

Feb 20 2014

On Thursday, 20 February 2014 at 21:30:10 UTC, jerro wrote:Also, because they are variable sized, you need to access them through pointers if you want random access. Now you are reading

Unless the author was thinking in terms of D Ranges, for the algorithms that does *not* require random access. Can we do anything useful with unums in numeric algorithms if only have forward or bidirectional access? Similar to algorithms such as Levenshtein that are compatible with UTF-8 and UTF-16, Andrei? :)

Jul 13

On Monday, 13 July 2015 at 21:25:12 UTC, Per Nordlöw wrote:Can we do anything useful with unums in numeric algorithms if only have forward or bidirectional access? Similar to algorithms such as Levenshtein that are compatible with UTF-8 and UTF-16, Andrei? :)

Question for Andrei, above, if he would like to reply.

Jul 16

On 7/16/15 9:49 PM, Nick B wrote:On Monday, 13 July 2015 at 21:25:12 UTC, Per Nordlöw wrote:Can we do anything useful with unums in numeric algorithms if only have forward or bidirectional access? Similar to algorithms such as Levenshtein that are compatible with UTF-8 and UTF-16, Andrei? :)

Question for Andrei, above, if he would like to reply.

Haven't gotten to reading about unum, so I'm a bit lost. Who has forward/bidirectional access? -- Andrei

Jul 17

On 20 February 2014 15:26, bearophile <bearophileHUGS lycos.com> wrote:Slide 6:Even the IEEE standard (1985) made choices that look dubious now<

On the other hand it's still working surprisingly well today.Negative zero. (ugh!)

It's not bad. Slide 15: are (Page Rank and) mp3 Players using floating point values?

Most encoding formats use a form of discrete cosine transform, which involves floating point maths.

Feb 20 2014

Iain Buclaw:Most encoding formats use a form of discrete cosine transform, which involves floating point maths.

I don't believe this much :-( Bye, bearophile

Feb 20 2014

On 20 February 2014 22:46, bearophile <bearophileHUGS lycos.com> wrote:Iain Buclaw:Most encoding formats use a form of discrete cosine transform, which involves floating point maths.

I don't believe this much :-(

:-( I looked up vorbis (very, very) briefly way of example. Most structs that deal with encoding/decoding use integers, floats and doubles to represent values such as decay, attenuation, thresholds, amplification etc.

Feb 23 2014

On 2/20/2014 2:10 AM, Nick B wrote:1. Would it be possible to implement the "unum" representation in D and therefore make it a contender for the SKA ?

Yes, as a library type. I don't think it'll save any energy, though.2. Is there any interest in this format within the D community

I think it would be a fun and useful project. Any takers? (Be aware that there's a Python 'unum' type that is something quite different.)

Feb 20 2014

On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/

"The pursuit of exascale floating point is ridiculous, since we do not need to be making 10^18 sloppy rounding errors per second; we need instead to get provable, valid results for the first time, by turning the speed of parallel computers into higher quality answers instead of more junk per second" Ok, I think I know a bunch of people who could question the contents of that sentence. Or, at the very least, question this guy's way of presenting sensational news.

Feb 20 2014

On Thursday, 20 February 2014 at 23:13:20 UTC, Francesco Cattoglio wrote:On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/

"The pursuit of exascale floating point is ridiculous, since we do not need to be making 10^18 sloppy rounding errors per second; we need instead to get provable, valid results for the first time, by turning the speed of parallel computers into higher quality answers instead of more junk per second" Ok, I think I know a bunch of people who could question the contents of that sentence. Or, at the very least, question this guy's way of presenting sensational news.

I don't quite understand his ubox stuff, but his unum format doesn't really solve the 0.1 problem, except maybe by allowing the size of his values to exceed 64-bits so that precision errors creap up a little bit slower. (I'm not sure how many bits his format tops out at and I don't want to re-open the PDF again to look). It also wasn't clear whether his format removed the multiple values of NaN, -0, etc. It looked like it was just the current IEEE formats bonded to a sliding bit-length, which would bring along with it all the problems of the IEEE format that he mentioned. I think the only way to solve for problems like 0.1 in decimal not mapping to any reasonable value in binary is to store numbers as integer equations (i.e. 0.1 = 1/10), which would take a hell of a complex format to represent and some pretty fancy CPUs.

Feb 20 2014

On Thursday, 20 February 2014 at 23:52:13 UTC, Chris Williams wrote:I don't quite understand his ubox stuff, but his unum format doesn't really solve the 0.1 problem, except maybe by allowing the size of his values to exceed 64-bits so that precision errors creap up a little bit slower. (I'm not sure how many bits his format tops out at and I don't want to re-open the PDF again to look).

Exctly. If I read correctly it should cover 128+ bits.It also wasn't clear whether his format removed the multiple values of NaN, -0, etc. It looked like it was just the current IEEE formats bonded to a sliding bit-length

Pretty sure it would not. It seems to me that space wasted by multiple representations is still there. The only real advantage is that you can probably store a NaN in 8 bits. Honestly, who cares. If I get a NaN in my numerical simulation, I have bigger concerns other than saving memory space.I think the only way to solve for problems like 0.1 in decimal not mapping to any reasonable value in binary is to store numbers as integer equations (i.e. 0.1 = 1/10), which would take a hell of a complex format to represent and some pretty fancy CPUs.

In fact, true rational numbers can only be represented by rational numbers. How extraordinary! :P The whole idea has one merit: the float now stores it's own accuracy. But I think you can achieve more or less the same goal by storing a pair of something. The whole idea of saving space sounds bogus, at least for my field of application. We already have an amazing technique for saving a lot of space for numerical simulation of PDEs, it's called grid refinement.

Feb 20 2014

The unum variable length encoding is very similar to how msgpack packs integers. See msgpack-d on github for a superb implementation in D.

Feb 20 2014

On Thursday, 20 February 2014 at 23:43:18 UTC, Nordlöw wrote:The unum variable length encoding is very similar to how msgpack packs integers. See msgpack-d on github for a superb implementation in D.

msgpack-d is indeed a great library that makes serialization almost instant to implement. I implemented CBOR another binary encoding scheme and it was obvious CBOR brings nothing over msgpack, it even did worse with integer encoding.

Feb 21 2014

On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:Hi everyone.

John Gustafson Will be presenting a Keynote on Thursday 27th February at 11:00 am The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/ There is also a excellent background paper, (PDF - 64 pages) which can be found here:

FYI John Gustafson book is now out: It can be found here: http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY Here is one of the reviewers comments: 9 of 9 people found the following review helpful This book is revolutionary By David Jefferson on April 18, 2015 This book is revolutionary. That is the only way to describe it. I have been a professional computer science researcher for almost 40 years, and only once or twice before have I seen a book that is destined to make such a profound change in the way we think about computation. It is hard to imagine that after 70 years or so of computer arithmetic that there is anything new to say about it, but this book reinvents the subject from the ground up, from the very notion of finite precision numbers to their bit-level representation, through the basic arithmetic operations, the calculation of elementary functions, all the way to the fundamental methods of numerical analysis, including completely new approaches to expression calculation, root finding, and the solution of differential equations. On every page from the beginning to the end of the book there are surprises that just astonished me, making me re-think material that I thought had been settled for decades. The methods described in this book are profoundly different from all previous treatments of numerical methods. Unum arithmetic is an extension of floating point arithmetic, but mathematically much cleaner. It never does rounding, so there is no rounding error. It handles what in floating point arithmetic is called "overflow" and "underflow" in a far more natural and correct way that makes them normal rather than exceptional. It also handles exceptional values (NaN, +infinity, -infinity) cleanly and consistently. Those contributions alone would have been a profound contribution. But the book does much more. One of the reasons I think the book is revolutionary is that unum-based numerical methods can effortlessly provide provable bounds on the error in numerical computation, something that is very rare for methods based on floating point calculations. And the bounds are generally as tight as possible (or as tight as you want them), rather than the useless or trivial bounds as often happens with floating point methods or even interval arithmetic methods. Another reason I consider the book revolutionary is that many of the unum-based methods are cleanly parallelizable, even for problems that are normally considered to be unavoidably sequential. This was completely unexpected. A third reason is that in most cases unum arithmetic uses fewer bits, and thus less power, storage, and bandwidth (the most precious resources in today’s computers) than the comparable floating point calculation. It hard to believe that we get this advantage in addition to all of the others, but it is amply demonstrated in the book. Doing efficient unum arithmetic takes more logic (e.g. transistors) than comparable floating point arithmetic does, but as the author points out, transistors are so cheap today that that hardly matters, especially when compared to the other benefits. Some of the broader themes of the book are counterintuitive to people like me advanced conventional training, so that I have to re-think everything I “knew” before. For example, the discussion of just what it means to “solve” an equation numerically is extraordinarily thought provoking. Another example is the author’s extended discussion of how calculus is not the best inspiration for computational numerical methods, even for problems that would seem to absolutely require calculus-based thinking, such as the solution of ordinary differential equations. Not only is the content of the book brilliant, but so is the presentation. The text is so well written, a mix of clarity, precision, and reader friendliness that it is a pure pleasure to read, rather then the dense struggle that mathematical textbooks usually require of the reader. But in addition, almost every page has full color graphics and diagrams that are completely compelling in their ability to clearly communicate the ideas. I cannot think of any technical book I have ever seen that is so beautifully illustrated all the way through. I should add that I read the Kindle edition on an iPad, and for once Amazon did not screw up the presentation of a technical book, at least for this platform. It is beautifully produced, in full color and detail, and with all of the fonts and graphics reproduced perfectly. Dr. Gustafson has also provided a Mathematica implementation of unums and of the many numerical methods discussed in the book. Let us hope that in the next few years there will be implementations in other languages, followed by hardware implementations. Over time there should be unum arithmetic units alongside of floating point arithmetic units on every CPU and GPU chip, and in the long run unums should replace floating point entirely. The case the author makes for this is overwhelming. If you are at all interested in computer arithmetic or numerical methods, read this book. It is destined to be a classic.

Jul 10

On 7/10/15 11:02 PM, Nick B wrote:John Gustafson book is now out: It can be found here: http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY

Very interesting, I'll read it. Thanks! -- Andrei

Jul 11

On 07/11/2015 05:07 PM, Andrei Alexandrescu wrote:On 7/10/15 11:02 PM, Nick B wrote:John Gustafson book is now out: It can be found here: http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY

Very interesting, I'll read it. Thanks! -- Andrei

I think Walter should read chapter 5.

Jul 11

On Saturday, 11 July 2015 at 18:16:22 UTC, Timon Gehr wrote:On 07/11/2015 05:07 PM, Andrei Alexandrescu wrote:On 7/10/15 11:02 PM, Nick B wrote:John Gustafson book is now out: It can be found here: http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY

Very interesting, I'll read it. Thanks! -- Andrei

I think Walter should read chapter 5.

What is this chapter about ?

Sep 16

On 09/16/2015 10:46 AM, deadalnix wrote:On Saturday, 11 July 2015 at 18:16:22 UTC, Timon Gehr wrote:On 07/11/2015 05:07 PM, Andrei Alexandrescu wrote:

Very interesting, I'll read it. Thanks! -- Andrei

I think Walter should read chapter 5.

What is this chapter about ?

Relevant quote: "Programmers and users were never given visibility or control of when a value was promoted to “double extended precision” (80-bit or higher) format, unless they wrote assembly language; it just happened automatically, opportunistically, and unpredictably. Confusion caused by different results outweighed the advantage of reduced rounding-overflow-underflow problems, and now coprocessors must dumb down their results to mimic systems that have no such extra scratchpad capability."

Sep 16

On Sat, 2015-07-11 at 11:07 -0400, Andrei Alexandrescu via Digitalmars -d wrote:On 7/10/15 11:02 PM, Nick B wrote:John Gustafson book is now out: =20 It can be found here: =20 http://www.amazon.com/End-Error-Computing-Chapman -Computational/dp/1482239868/ref=3Dsr_1_1?s=3Dbooks&ie=3DUTF8&qid=3D143=

6582956&sr=3D1 -1&keywords=3DJohn+Gustafson&pebp=3D1436583212284&perid=3D093TDC82KFP9Y=

4S5PXPY

=20 Very interesting, I'll read it. Thanks! -- Andrei

Interesting that the publication date appears to be 2015-04-01. --=20 Russel. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.n= et 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder

Jul 11

On Saturday, 11 July 2015 at 03:02:24 UTC, Nick B wrote:If you are at all interested in computer arithmetic or numerical methods, read this book. It is destined to be a classic.

Sample chapter here: http://radiofreehpc.com/tmp/TheEndofErrorSampleChapter.pdf

Jul 11

On Saturday, 11 July 2015 at 03:02:24 UTC, Nick B wrote:FYI John Gustafson book is now out:

I was just thinking about overflow (more for my own interest than a practical reason). I wouldn't have known about this way to deal with it if you hadn't bumped this thread. So thanks, it's interesting (not sure if this book will adequately address Walter's original concern that it won't really reduce power consumption). I also found the discussion of rational numbers earlier in the thread interesting. When I was looking at other ways to handle overflow, I learned about a change they made to Python circa 2.3 or 2.4 to handle overflow (PEP237). Basically, they take an approach described in the presentation in the original post that will cast an int to a long if there is an overflow.

Jul 11

On Sunday, 12 July 2015 at 03:52:32 UTC, jmh530 wrote:On Saturday, 11 July 2015 at 03:02:24 UTC, Nick B wrote:FYI John Gustafson book is now out:

I wouldn't have known about this way to deal with it if you hadn't bumped this thread. So thanks, it's interesting (not sure if this book will adequately address Walter's original concern that it won't really reduce power consumption). I also found the discussion of rational numbers earlier in the thread interesting.

I glad that you guys have found this interesting :) Nick

Jul 12

On 7/13/15 1:20 AM, Nick B wrote:On Sunday, 12 July 2015 at 03:52:32 UTC, jmh530 wrote:On Saturday, 11 July 2015 at 03:02:24 UTC, Nick B wrote:FYI John Gustafson book is now out:

I wouldn't have known about this way to deal with it if you hadn't bumped this thread. So thanks, it's interesting (not sure if this book will adequately address Walter's original concern that it won't really reduce power consumption). I also found the discussion of rational numbers earlier in the thread interesting.

I glad that you guys have found this interesting :) Nick

Just read http://radiofreehpc.com/tmp/TheEndofErrorSampleChapter.pdf, very interesting stuff. I assume the definition of various primitives for unum are as compelling as alluded. The treatment angles toward a hardware representation and primitives, yet these ideas seem reasonably actionable at language level. We'd be able to implement the primitives in software, and the result would be a competitor for infinite-precision arithmetic libraries, which do exist and are being used. Yet the software implementation would be very slow compared to IEEE floats, which limits applications quite a bit. All we can do now, with our limited resources, is to keep an eye on developments and express cautious interest. If someone able and willing comes along with a unum library for D, that would be great. Andrei

Jul 22

On Wednesday, 22 July 2015 at 19:28:41 UTC, Andrei Alexandrescu wrote:On 7/13/15 1:20 AM, Nick B wrote: All we can do now, with our limited resources, is to keep an eye on developments and express cautious interest. If someone able and willing comes along with a unum library for D, that would be great.

The book has quite a bit of Mathematica code at the end. A first pass at a unum library could probably just involve porting that to D.

Jul 22

On Wednesday, 22 July 2015 at 20:41:42 UTC, jmh530 wrote:On Wednesday, 22 July 2015 at 19:28:41 UTC, Andrei Alexandrescu wrote:On 7/13/15 1:20 AM, Nick B wrote: All we can do now, with our limited resources, is to keep an eye on developments and express cautious interest. If someone able and willing comes along with a unum library for D, that would be great.

The book has quite a bit of Mathematica code at the end. A first pass at a unum library could probably just involve porting that to D.

Here is a Python port of the Mathematica code which is that much closer to D: https://github.com/jrmuizel/pyunum Regardless of whether the representation used follows the unum bitwise format proposed in the book, having the semantics of interval arithmetic that keeps proper account of amount of precision, and exact values vs. open/closed ended intervals on the extended real line would be quite valuable for verified computation and for how it enables simple root finding / optimization / differential equation solving via search, and could build on hard float directly andor an extended precision float library such as MPFR to keep performance close to raw floats. The bitwise format is intended to fit as much data as possible through the time-energy bottleneck between main memory and the processor. This would clearly be of great value if the format were supported in hardware but of less clear value otherwise. The semantic improvements would be quite welcome regardless. They go much further than best practices with floats to put error bounds on computations, handle over/underflow and infinity correctly, and keep algorithms correct by default. We might also consider small semantic improvements if not rigidly bound by the proposed formats and the need to implement in hardware, such as having division by an interval including zero return the union of the results from the intervals on either side of the zero as well as NaN (1-D intervals are strictly represented as the convex hull of at most two points in the proposed format, so an interval such as this made of a sentinel value unified with two disjoint intervals, though it has better semantics, is not supported). Anthony

Sep 14

On 11/07/15 06:02, Nick B wrote:On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:Hi everyone.

John Gustafson Will be presenting a Keynote on Thursday 27th February at 11:00 am The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/ There is also a excellent background paper, (PDF - 64 pages) which can be found here:

FYI John Gustafson book is now out: It can be found here: http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY

As far as I can tell, the author is pushing toward a hardware based implementation (i.e. - have the CPU do the implementation). All of his arguments are around "transistors are cheap, bus lines are expensive" theme. A software based implementation will have much worse performance than an FPU based floating point implementation, making these, even if generally adopted (and I have my doubts), impractical. If the powers that be think this is showing enough of a potential, then it might be a good idea to shape D so that it CAN use unums. That would require quite a few architectural changes to the way people think (which is a large part of why I think unums will not be generally adopted). Shachar

Jul 12

On 12 Jul 2015 14:30, "Shachar Shemesh via Digitalmars-d" < digitalmars-d puremagic.com> wrote:On 11/07/15 06:02, Nick B wrote:On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:Hi everyone.

John Gustafson Will be presenting a Keynote on Thursday 27th February at 11:00 am The abstract is here: http://openparallel.com/multicore-world-2014/speakers/john-gustafson/ There is also a excellent background paper, (PDF - 64 pages) which can be found here:

FYI John Gustafson book is now out: It can be found here:

http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY

As far as I can tell, the author is pushing toward a hardware based

implementation (i.e. - have the CPU do the implementation). All of his arguments are around "transistors are cheap, bus lines are expensive" theme.A software based implementation will have much worse performance than an

FPU based floating point implementation, making these, even if generally adopted (and I have my doubts), impractical.If the powers that be think this is showing enough of a potential, then

it might be a good idea to shape D so that it CAN use unums. That would require quite a few architectural changes to the way people think (which is a large part of why I think unums will not be generally adopted).Shachar

I reckon it will be a good to implement as proof of concept. Speed would likely be equivalent to soft vs. hard floats. I'm more interested in what existing hardware capabilities we can take advantage of now.

Jul 12

On Saturday, 11 July 2015 at 03:02:24 UTC, Nick B wrote:On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:Hi everyone.

FYI John Gustafson book is now out: It can be found here: http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY Here is one of the reviewers comments: 9 of 9 people found the following review helpful This book is revolutionary By David Jefferson on April 18, 2015 This book is revolutionary. That is the only way to describe it. I have been a professional computer science researcher for almost 40 years, and only once or twice before have I seen a book that is destined to make such a profound change in the way we think about computation. It is hard to imagine that after 70 years or so of computer arithmetic that there is anything new to say about it, but this book reinvents the subject from the ground up, from the very notion of finite precision numbers to their bit-level representation, through the basic arithmetic operations, the calculation of elementary functions, all the way to the fundamental methods of numerical analysis, including completely new approaches to expression calculation, root finding, and the solution of differential equations. On every page from the beginning to the end of the book there are surprises that just astonished me, making me re-think material that I thought had been settled for decades. The methods described in this book are profoundly different from all previous treatments of numerical methods. Unum arithmetic is an extension of floating point arithmetic, but mathematically much cleaner. It never does rounding, so there is no rounding error. It handles what in floating point arithmetic is called "overflow" and "underflow" in a far more natural and correct way that makes them normal rather than exceptional. It also handles exceptional values (NaN, +infinity, -infinity) cleanly and consistently. Those contributions alone would have been a profound contribution. But the book does much more. One of the reasons I think the book is revolutionary is that unum-based numerical methods can effortlessly provide provable bounds on the error in numerical computation, something that is very rare for methods based on floating point calculations. And the bounds are generally as tight as possible (or as tight as you want them), rather than the useless or trivial bounds as often happens with floating point methods or even interval arithmetic methods. Another reason I consider the book revolutionary is that many of the unum-based methods are cleanly parallelizable, even for problems that are normally considered to be unavoidably sequential. This was completely unexpected. A third reason is that in most cases unum arithmetic uses fewer bits, and thus less power, storage, and bandwidth (the most precious resources in today’s computers) than the comparable floating point calculation. It hard to believe that we get this advantage in addition to all of the others, but it is amply demonstrated in the book. Doing efficient unum arithmetic takes more logic (e.g. transistors) than comparable floating point arithmetic does, but as the author points out, transistors are so cheap today that that hardly matters, especially when compared to the other benefits. Some of the broader themes of the book are counterintuitive to people like me advanced conventional training, so that I have to re-think everything I “knew” before. For example, the discussion of just what it means to “solve” an equation numerically is extraordinarily thought provoking. Another example is the author’s extended discussion of how calculus is not the best inspiration for computational numerical methods, even for problems that would seem to absolutely require calculus-based thinking, such as the solution of ordinary differential equations. Not only is the content of the book brilliant, but so is the presentation. The text is so well written, a mix of clarity, precision, and reader friendliness that it is a pure pleasure to read, rather then the dense struggle that mathematical textbooks usually require of the reader. But in addition, almost every page has full color graphics and diagrams that are completely compelling in their ability to clearly communicate the ideas. I cannot think of any technical book I have ever seen that is so beautifully illustrated all the way through. I should add that I read the Kindle edition on an iPad, and for once Amazon did not screw up the presentation of a technical book, at least for this platform. It is beautifully produced, in full color and detail, and with all of the fonts and graphics reproduced perfectly. Dr. Gustafson has also provided a Mathematica implementation of unums and of the many numerical methods discussed in the book. Let us hope that in the next few years there will be implementations in other languages, followed by hardware implementations. Over time there should be unum arithmetic units alongside of floating point arithmetic units on every CPU and GPU chip, and in the long run unums should replace floating point entirely. The case the author makes for this is overwhelming. If you are at all interested in computer arithmetic or numerical methods, read this book. It is destined to be a classic.

To be honest, that sound like snake oil salesman speech to me rather than science. It's all hand waving and nothing concrete is provided, the whole thing wrapped in way too much superlatives. The guy seems to have good credential. Why should I read that book ?

Sep 14

On Tuesday, 15 September 2015 at 05:16:53 UTC, deadalnix wrote:The guy seems to have good credential. Why should I read that book ?

The sample chapter dissipates a bit the marketing cloud. One of the ideas is that the imprecise bit encode an interval between 2 values, hence automatically computing the "precision" of a computation without analysis.

Sep 15

On Tuesday, 15 September 2015 at 07:07:20 UTC, ponce wrote:On Tuesday, 15 September 2015 at 05:16:53 UTC, deadalnix wrote:The guy seems to have good credential. Why should I read that book ?

The sample chapter dissipates a bit the marketing cloud. One of the ideas is that the imprecise bit encode an interval between 2 values, hence automatically computing the "precision" of a computation without analysis.

Read it. That is suddenly much less impressive, but much more sensible.

Sep 15

On Tuesday, 15 September 2015 at 07:57:01 UTC, deadalnix wrote:On Tuesday, 15 September 2015 at 07:07:20 UTC, ponce wrote:On Tuesday, 15 September 2015 at 05:16:53 UTC, deadalnix wrote:The guy seems to have good credential. Why should I read that book ?

The sample chapter dissipates a bit the marketing cloud. One of the ideas is that the imprecise bit encode an interval between 2 values, hence automatically computing the "precision" of a computation without analysis.

Read it. That is suddenly much less impressive, but much more sensible.

I can see this being useful since nowadays in the native space we often choose between single and double precision with ad-hoc oral rules and learned habits rather than measuring precision. However if unum aren't fast, they will be only for prototyping and the real algorithm would rely on IEEE floats with different precision characteristics, so yeah hardware is critical.

Sep 15

On Tuesday, 15 September 2015 at 08:24:30 UTC, ponce wrote:However if unum aren't fast, they will be only for prototyping and the real algorithm would rely on IEEE floats with different precision characteristics, so yeah hardware is critical.

I think he is looking into 32 bit floats for a simpler version of the concept combined with the "ubox" method (multidimensional interval representation that "brute force" the answer). The "uboxing" is visualized here: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf

Sep 15

On Tuesday, 15 September 2015 at 09:35:36 UTC, Ola Fosheim Grøstad wrote:http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf

That's a pretty convincing case. Who does it :)?

Sep 15

On Tuesday, 15 September 2015 at 10:38:23 UTC, ponce wrote:On Tuesday, 15 September 2015 at 09:35:36 UTC, Ola Fosheim Grøstad wrote:http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf

That's a pretty convincing case. Who does it :)?

You:9 https://github.com/jrmuizel/pyunum/blob/master/unum.py

Sep 15

On Tuesday, 15 September 2015 at 11:13:59 UTC, Ola Fosheim Grøstad wrote:On Tuesday, 15 September 2015 at 10:38:23 UTC, ponce wrote:On Tuesday, 15 September 2015 at 09:35:36 UTC, Ola Fosheim Grøstad wrote:http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf

That's a pretty convincing case. Who does it :)?

I'm not convinced. I think they are downplaying the hardware difficulties. Slide 34: Disadvantages of the Unum Format * Non-power-of-two alignment. Needs packing and unpacking, garbage collection. I think that disadvantage is so enormous that it negates most of the advantages. Note that in the x86 world, unaligned memory loads of SSE values still take longer than aligned loads. And that's a trivial case! The energy savings are achieved by using a primitive form of compression. Sure, you can reduce the memory bandwidth required by compressing the data. You could do that for *any* form of data, not just floating point. But I don't think anyone thinks that's worthwhile. The energy comparisons are plain dishonest. The power required for accessing from DRAM is the energy consumption of a *cache miss* !! What's the energy consumption of a load from cache? That would show you what the real gains are, and my guess is they are tiny. So: * I don't believe the energy savings are real. * There is no guarantee that it would be possible to implement it in hardware without a speed penalty, regardless of how many transistors you throw at it (hardware analogue of Amdahl's Law) * but the error bound stuff is cool.

Sep 16

On Wednesday, 16 September 2015 at 08:17:59 UTC, Don wrote:On Tuesday, 15 September 2015 at 11:13:59 UTC, Ola Fosheim Grøstad wrote:On Tuesday, 15 September 2015 at 10:38:23 UTC, ponce wrote:http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf

That's a pretty convincing case. Who does it :)?

I'm not convinced. I think they are downplaying the hardware difficulties. Slide 34: Disadvantages of the Unum Format * Non-power-of-two alignment. Needs packing and unpacking, garbage collection. I think that disadvantage is so enormous that it negates most of the advantages. Note that in the x86 world, unaligned memory loads of SSE values still take longer than aligned loads. And that's a trivial case! The energy savings are achieved by using a primitive form of compression. Sure, you can reduce the memory bandwidth required by compressing the data. You could do that for *any* form of data, not just floating point. But I don't think anyone thinks that's worthwhile.

GPU do it a lot. Especially, but not exclusively on mobile. Not to reduce the misses (a miss is pretty much guaranteed, you load 32 thread at once in a shader core, each of them will require at least 8 pixel for a bilinear texture with mipmap, that's the bare minimum. That means 256 memory access at once. One of these pixel WILL miss, and it is going to stall the 32 threads). It is not a latency issue, but a bandwidth and energy one. But yeah, in the general case, random access is preferable, memory alignment, and the fact you don't need to do as much bookeeping are very significants. Also, predictable size mean you can split your dataset and process it in parallel, which is impossible if sizes are random.The energy comparisons are plain dishonest. The power required for accessing from DRAM is the energy consumption of a *cache miss* !! What's the energy consumption of a load from cache? That would show you what the real gains are, and my guess is they are tiny.

The energy comparison is bullshit. As long as you haven't loaded the data, you don't know how wide they are. Meaning you need either to go pessimistic and load for the worst case scenario or do 2 round trip to memory. The author also use a lot the wire vs transistor cost, and how it evolved? He is right. Except that you won't cram more wire at runtime into the CPU. The CPU need the wiring for the worst case scenario, always. The hardware is likely to be slower as you'll need way more wiring than for regular floats, and wire is not only cost, but also time. That being said, even a hit in L1 is very energy hungry. Think about it, you need to go a 8 - way fetch (so you'll end up loading 4k of data from the cache) in parallel with address translation (usually 16 ways) in parallel with snooping into the load and the store buffer. If the load is not aligned, you pretty much have to multiply this by 2 if it cross a cache line boundary. I'm not sure what his number represent, but hitting L1 is quite power hungry. He is right on that one.So: * I don't believe the energy savings are real. * There is no guarantee that it would be possible to implement it in hardware without a speed penalty, regardless of how many transistors you throw at it (hardware analogue of Amdahl's Law) * but the error bound stuff is cool.

Yup, that's pretty much what I get out of it as well.

Sep 16

On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix wrote:The energy comparison is bullshit. As long as you haven't loaded the data, you don't know how wide they are. Meaning you need either to go pessimistic and load for the worst case scenario or do 2 round trip to memory.

That really depends on memory layout and algorithm. A likely implementation would be a co-processor that would take a unum stream and then pipe it through a network of cores (tile based co-processor). The internal busses between cores are very very fast and with 256+ cores you get tremendous throughput. But you need a good compiler/libraries and software support.The hardware is likely to be slower as you'll need way more wiring than for regular floats, and wire is not only cost, but also time.

You need more transistors per ALU, but slower does not matter if the algorithm needs bounded accuracy or if it converge more quickly with unums. The key challenge for him is to create a market, meaning getting the semantics into scientific software and getting initial workable implementations out to scientists. If there is a market demand, then there will be products. But you need to create the market first. Hence he wrote an easy to read book on the topic and support people who want to implement it.

Sep 16

On Wednesday, 16 September 2015 at 14:11:04 UTC, Ola Fosheim Grøstad wrote:On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix wrote:The energy comparison is bullshit. As long as you haven't loaded the data, you don't know how wide they are. Meaning you need either to go pessimistic and load for the worst case scenario or do 2 round trip to memory.

That really depends on memory layout and algorithm. A likely implementation would be a co-processor that would take a unum stream and then pipe it through a network of cores (tile based co-processor). The internal busses between cores are very very fast and with 256+ cores you get tremendous throughput. But you need a good compiler/libraries and software support.

No you don't. Because the streamer still need to load the unum one by one. Maybe 2 by 2 with a fair amount of hardware speculation (which means you are already trading energy for performances, so the energy argument is weak). There is no way you can feed 256+ cores that way. To gives you a similar example, x86 decoding is often the bottleneck on an x86 CPU. The number of ALUs in x86 over the past decade decreased rather than increased, because you simply can't decode fast enough to feed them. Yet, x86 CPUs have a 64 ways speculative decoding as a first stage.The hardware is likely to be slower as you'll need way more wiring than for regular floats, and wire is not only cost, but also time.

You need more transistors per ALU, but slower does not matter if the algorithm needs bounded accuracy or if it converge more quickly with unums. The key challenge for him is to create a market, meaning getting the semantics into scientific software and getting initial workable implementations out to scientists. If there is a market demand, then there will be products. But you need to create the market first. Hence he wrote an easy to read book on the topic and support people who want to implement it.

The problem is not transistor it is wire. Because the damn thing is variadic in every ways, pretty much every bit as input can end up anywhere in the functional unit. That is a LOT of wire.

Sep 16

On Wednesday, 16 September 2015 at 19:21:59 UTC, deadalnix wrote:No you don't. Because the streamer still need to load the unum one by one. Maybe 2 by 2 with a fair amount of hardware speculation (which means you are already trading energy for performances, so the energy argument is weak). There is no way you can feed 256+ cores that way.

You can load continuously 64 bytes in a stream, decode to your internal format and push them into the scratchpad of other cores. You could even do this in hardware. If you look at the ubox brute forcing method you compute many calculations over the same data, because you solve spatially, not by timesteps. So you can run many many parallell computations over the same data.To gives you a similar example, x86 decoding is often the bottleneck on an x86 CPU. The number of ALUs in x86 over the past decade decreased rather than increased, because you simply can't decode fast enough to feed them. Yet, x86 CPUs have a 64 ways speculative decoding as a first stage.

That's because we use a dumb compiler that does not prefetch intelligently. If you are writing for a tile based VLIW CPU you preload. These calculations are highly iterative so I'd rather think of it as a co-processor solving a single equation repeatedly than running the whole program. You can run the larger program on a regular CPU or a few cores.The problem is not transistor it is wire. Because the damn thing is variadic in every ways, pretty much every bit as input can end up anywhere in the functional unit. That is a LOT of wire.

I haven't seen a design, so I cannot comment. But keep in mind that the CPU does not have to work with the format, it can use a different format internally. We'll probably see FPGA implementations that can be run on FPGU cards for PCs within a few years. I read somewhere that a group in Singapore was working on it.

Sep 16

On Wednesday, 16 September 2015 at 19:40:49 UTC, Ola Fosheim Grøstad wrote:You can load continuously 64 bytes in a stream, decode to your internal format and push them into the scratchpad of other cores. You could even do this in hardware.

1/ If you load the worst case scenario, then your power advantage is gone. 2/ If you load these one by one, how do you expect to feed 256+ cores ? Obviously you can make this in hardware. And obviously this is not going to be able to feed 256+ cores. Even with a chip at low frequency, let's say 800MHz or so, you have about 80 cycles to access memory. That mean you need to have 20 000+ cycles of work to do per core per unum. That simple back of the envelope calculation. Your proposal is simply ludicrous. It's a complete non starter. You can make this in hardware. Sure you can, no problem. But you won't because it is a stupid idea.To gives you a similar example, x86 decoding is often the bottleneck on an x86 CPU. The number of ALUs in x86 over the past decade decreased rather than increased, because you simply can't decode fast enough to feed them. Yet, x86 CPUs have a 64 ways speculative decoding as a first stage.

That's because we use a dumb compiler that does not prefetch intelligently.

You know, when you have no idea what you are talking about, you can just move on to something you understand. Prefetching would not change anything here. The problem come from variable size encoding, and the challenge it causes for hardware. You can have 100% L1 hit and still have the same problem. No sufficiently smart compiler can fix that.If you are writing for a tile based VLIW CPU you preload. These calculations are highly iterative so I'd rather think of it as a co-processor solving a single equation repeatedly than running the whole program. You can run the larger program on a regular CPU or a few cores.

That's irrelevant. The problem is not the kind of CPU, it is how do you feed it at a fast enough rate.The problem is not transistor it is wire. Because the damn thing is variadic in every ways, pretty much every bit as input can end up anywhere in the functional unit. That is a LOT of wire.

I haven't seen a design, so I cannot comment. But keep in mind that the CPU does not have to work with the format, it can use a different format internally. We'll probably see FPGA implementations that can be run on FPGU cards for PCs within a few years. I read somewhere that a group in Singapore was working on it.

That's hardware 101. When you have a floating point unit, you get your 32 bits you get 23 bits that go into the mantissa FU and 8 in the exponent FU. For instance, if you multiply floats, you send the 2 exponent into a adder, you send the 2 mantissa into a 24bits multiplier (you add a leading 1), you xor the bit signs. You get the carry from the adder, and emit a multiply, or you count the leading 0 of the 48bit multiply result, shift by that amount and add the shit to the exponent. If you get a carry in the exponent adder, you saturate and emit an inifinity. Each bit goes into a given functional unit. That mean you need on wire from the input to the functional unit is goes to. Sale for these result. Now, if the format is variadic, you need to wire all bits to all functional units, because they can potentially end up there. That's a lot of wire, in fact the number of wire is growing quadratically with that joke. The author keep repeating that wire became the expensive thing and he is right. Meaning a solution with quadratic wiring is not going to cut it.

Sep 16

On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix wrote:You know, when you have no idea what you are talking about, you can just move on to something you understand.

Ah, nice move. Back to your usual habits?Prefetching would not change anything here. The problem come from variable size encoding, and the challenge it causes for hardware. You can have 100% L1 hit and still have the same problem.

There is _no_ cache. The compiler fully controls the layout of the scratchpad.That's hardware 101.

Is it? The core point is this: 1. if there is academic interest (i.e. publishing opportunities) you get research 2. if there is research you get new algorithms 3. you get funding etc You cannot predict at this point what the future will be like. Is it unlikely that anything specific will change status quo? Yes. Is it highly probable that something will change status quo? Yes. Will it happen over night. No. 50+ years has been invested in floating point design. Will this be offset over night, no. It'll probably take 10+ years before anyone has a different type of numerical ALU on their desktop than IEEE754. By that time we are in a new era.

Sep 16

On Wednesday, 16 September 2015 at 20:30:36 UTC, Ola Fosheim Grøstad wrote:On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix wrote:You know, when you have no idea what you are talking about, you can just move on to something you understand.

Ah, nice move. Back to your usual habits?

StopPrefetching would not change anything here. The problem come from variable size encoding, and the challenge it causes for hardware. You can have 100% L1 hit and still have the same problem.

There is _no_ cache. The compiler fully controls the layout of the scratchpad.

You are the king of goalspot shifting. You answer about x86 decoding you get served. You want to talk about a scraptch pad ? Good ! How do the data ends up in the scratchpad to begin with ? Using magic ? What is the scraptchpad made of if not flip flops ? If if so, how is it different from a cache as far as the hardware is concerned ? You can play with words, but the problem remain the same. When you get on chip memory, be it cache or scratchpad, and a variadic encoding, you can't even feed a handful of ALUs. How do you expect to feed 256+ VLIW cores ? There are 3 order of magintude of gap in your reasoning. You can't pull 3 orders of magnitude out of your ass and just pretend it can be done.That's hardware 101.

Is it?

Yes wire is hardware 101. I mean seriously, if one do not get how component can be wired together, one should probably abstain from making any hardware comment.You cannot predict at this point what the future will be like. Is it unlikely that anything specific will change status quo? Yes. Is it highly probable that something will change status quo? Yes. Will it happen over night. No. 50+ years has been invested in floating point design. Will this be offset over night, no. It'll probably take 10+ years before anyone has a different type of numerical ALU on their desktop than IEEE754. By that time we are in a new era.

Ok listen that is not complicated. I don't know what car will come out next year? But I know there won't be a car that can go 10000km on 10 centiliter of gazoline. This would be physic defying stuff. Same thing you won't be able to feed 256+ cores if you load data sequentially. Don't get me this stupid we don't know what's going to happen tomorow bullshit. We won't have unicorn meat in supermarkets. We won't have free energy. We won't have interstellar travel. And we won't have the capability to feed 256+ cores sequentially. I gave you numbers you gave me bullshit.

Sep 16

On Wednesday, 16 September 2015 at 20:53:37 UTC, deadalnix wrote:On Wednesday, 16 September 2015 at 20:30:36 UTC, Ola Fosheim Grøstad wrote:On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix wrote:You know, when you have no idea what you are talking about, you can just move on to something you understand.

Ah, nice move. Back to your usual habits?

Stop

OK. I stop. You are beyond reason.

Sep 16

On Wednesday, 16 September 2015 at 21:12:11 UTC, Ola Fosheim Grøstad wrote:On Wednesday, 16 September 2015 at 20:53:37 UTC, deadalnix wrote:On Wednesday, 16 September 2015 at 20:30:36 UTC, Ola Fosheim Grøstad wrote:On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix wrote:

Ah, nice move. Back to your usual habits?

Stop

OK. I stop. You are beyond reason.

True, how blind I was. It is fairly obvious now, thinking about it, that you can get 3 order of magnitude increase in sequential decoding in hardware by having a compiler with a vectorized SSA and a scratchpad ! Or maybe you have number to present us that show I'm wrong ?

Sep 16

On Wed, Sep 16, 2015 at 08:06:42PM +0000, deadalnix via Digitalmars-d wrote: [...]When you have a floating point unit, you get your 32 bits you get 23 bits that go into the mantissa FU and 8 in the exponent FU. For instance, if you multiply floats, you send the 2 exponent into a adder, you send the 2 mantissa into a 24bits multiplier (you add a leading 1), you xor the bit signs. You get the carry from the adder, and emit a multiply, or you count the leading 0 of the 48bit multiply result, shift by that amount and add the shit to the exponent. If you get a carry in the exponent adder, you saturate and emit an inifinity. Each bit goes into a given functional unit. That mean you need on wire from the input to the functional unit is goes to. Sale for these result. Now, if the format is variadic, you need to wire all bits to all functional units, because they can potentially end up there. That's a lot of wire, in fact the number of wire is growing quadratically with that joke. The author keep repeating that wire became the expensive thing and he is right. Meaning a solution with quadratic wiring is not going to cut it.

I found this .pdf that explains the unum representation a bit more: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf On p.31, you can see the binary representation of unum. The utag has 3 bits for exponent size, presumably meaning the exponent can vary in size up to 7 bits. There are 5 bits in the utag for the mantissa, so it can be anywhere from 0 to 31 bits. It's not completely variadic, but it's complex enough that you will probably need some kind of shift register to extract the exponent and mantissa so that you can pass them in the right format to the various parts of the hardware. It definitely won't be as straightforward as the current floating-point format; you can't just wire the bits directly to the adders and multipliers. This is probably what the author meant by needing "more transistors". I guess his point was that we have to do more work in the CPU, but in return we (hopefully) reduce the traffic to DRAM, thereby saving the cost of data transfer. I'm not so sure how well this will work in practice, though, unless we have a working prototype that proves the benefits. What if you have a 10*10 unum matrix, and during some operation the size of the unums in the matrix changes? Assuming the worst case, you could have started out with 10*10 unums with small exponent/mantissa, maybe fitting in 2-3 cache lines, but after the operation most of the entries expand to 7-bit exponent and 31-bit mantissa, so now your matrix doesn't fit into the allocated memory anymore. So now your hardware has to talk to druntime to have it allocate new memory for storing the resulting unum matrix? The only sensible solution seems to be to allocate the maximum size for each matrix entry, so that if the value changes you won't run out of space. But that means we have lost the benefit of having a variadic encoding to begin with -- you will have to transfer the maximum size's worth of data when you load the matrix from DRAM, even if most of that data is unused (because the unum only takes up a small percentage of the space). The author proposed GC, but I have a hard time imagining a GC implemented in *CPU*, no less, colliding with the rest of the world where it's the *software* that controls DRAM allocation. (GC too slow for your application? Too bad, gotta upgrade your CPU...) The way I see it from reading the PDF slides, is that what the author is proposing would work well as a *software* library, perhaps backed up by hardware support for some of the lower-level primitives. I'm a bit skeptical of the claims of data traffic / power savings, unless there is hard data to prove that it works. T -- "The number you have dialed is imaginary. Please rotate your phone 90 degrees and try again."

Sep 16

On Wednesday, 16 September 2015 at 23:28:23 UTC, H. S. Teoh wrote:I'm not so sure how well this will work in practice, though, unless we have a working prototype that proves the benefits. What if you have a 10*10 unum matrix, and during some operation the size of the unums in the matrix changes? Assuming the worst case, you could have started out with 10*10 unums with small exponent/mantissa, maybe fitting in 2-3 cache lines, but after the operation most of the entries expand to 7-bit exponent and 31-bit mantissa, so now your matrix doesn't fit into the allocated memory anymore. So now your hardware has to talk to druntime to have it allocate new memory for storing the resulting unum matrix?

Let's not make it so complicated. The internal CPU format could just be 32 and 64 bit. The key concept is about recording closed/open intervals and precision. If you spend 16 cores of a 256 core tiled coprocessor on I/O you still have 240 cores left. For the external format, it depends on your algorithm. If you are using map reduce you load/unload working sets, let the coprocessor do most of the work and combine the results. Like an actor based pipeline. The problem is more that average programmers will have real trouble making good use of it, since the know-how isn't there.The author proposed GC, but I have a hard time imagining a GC implemented in *CPU*, no less, colliding with the rest of the world where it's the *software* that controls DRAM allocation. (GC too slow for your application? Too bad, gotta upgrade your CPU...)

That's a bit into the future, isn't it? But local memory is probably less that 256K and designed for the core, so… who knows what extras you could build in? If you did it, the effect would be local, but it sounds too complicated to be worth it. But avoid thinking that the programmer address memory directly. CPU+Compiler is one package. Your interface is the compiler, not the CPU as such.The way I see it from reading the PDF slides, is that what the author is proposing would work well as a *software* library, perhaps backed up by hardware support for some of the lower-level primitives. I'm a bit skeptical of the claims of

First you would need to establish that there are numerical advantages that scientists require in some specific fields. Then you need to build it into scientific software and accelerate it. For desktop CPUs, nah... most people don't care about accuracy that much. Standards like IEEE1788 might also make adoption of unum less likely.

Sep 16

On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix wrote:Also, predictable size mean you can split your dataset and process it in parallel, which is impossible if sizes are random.

I don't recall how he would deal with something similar to cache misses when you have to promote or demote a unum. However, my recollection of the book is that there was quite a bit of focus on a unum representation that has the same size as a double. If you only did the computations with this format, I would expect the sizes would be more-or-less fixed. Promotion would be pretty rare, but still possible, I would think. Compared to calculations with doubles there might not be a strong case for energy efficiency (but I don't really know for sure). My understanding was that the benefit for energy efficiency is only when you use a smaller sized unum instead of a float. I don't recall how he would resolve your point about cache misses. Anyway, while I can see a benefit from using unum numbers (accuracy, avoiding overflow, etc.) rather than floating point numbers, I think that performance or energy efficiency would have to be within range of floating point numbers for it to have any meaningful adoption.

Sep 16

On Wednesday, 16 September 2015 at 08:17:59 UTC, Don wrote:I'm not convinced. I think they are downplaying the hardware difficulties. Slide 34:

I don't think he is downplaying it. He has said that it will probably take at least 10 years before it is available in hardware. There is also a company called Rex Computing that are looking at unum: http://www.theplatform.net/2015/07/22/supercomputer-chip-startup-scores-funding-darpa-contract/ He assumes that you use a scratchpad (a big register file), not caching, for intermediate calculations. His basic reasoning is that brute force ubox methods makes for highly parallel calculations. It might be possible to design ALUs that can work with various unum bit widths efficiently (many small or a few large)... who knows. You'll have to try first. Let's not forget that there is a _lot_ of legacy constraints and architectural assumptions in both x86 architecture.The energy comparisons are plain dishonest. The power required for accessing from DRAM is the energy consumption of a *cache miss* !! What's the energy consumption of a load from cache?

I think this argument is aiming at HPC where you can find funding for ASICs. They push a lot of data over the memory bus.

Sep 16

On Wednesday, 16 September 2015 at 08:53:24 UTC, Ola Fosheim Grøstad wrote:I don't think he is downplaying it. He has said that it will probably take at least 10 years before it is available in hardware. There is also a company called Rex Computing that are looking at unum:

Oh hey, I remember these jokers. They were trying to blow some smoke about moving 288 GB/s at 4W. They're looking at unum? Of course they are; care to guess who's advising them? Yep. I'll be shocked if they ever even get to tape out. -Wyatt

Sep 16

On Wednesday, 16 September 2015 at 20:35:16 UTC, Wyatt wrote:On Wednesday, 16 September 2015 at 08:53:24 UTC, Ola Fosheim Grøstad wrote:I don't think he is downplaying it. He has said that it will probably take at least 10 years before it is available in hardware. There is also a company called Rex Computing that are looking at unum:

Oh hey, I remember these jokers. They were trying to blow some smoke about moving 288 GB/s at 4W. They're looking at unum? Of course they are; care to guess who's advising them? Yep. I'll be shocked if they ever even get to tape out.

Yes, of course, most startups in hardware don't succeed. I assume they get knowhow from Adapteva.

Sep 16

On 09/16/2015 10:17 AM, Don wrote:So: ... * There is no guarantee that it would be possible to implement it in hardware without a speed penalty, regardless of how many transistors you throw at it (hardware analogue of Amdahl's Law)

https://en.wikipedia.org/wiki/Gustafson's_law :o)

Sep 16

On Tuesday, 15 September 2015 at 05:16:53 UTC, deadalnix wrote:On Saturday, 11 July 2015 at 03:02:24 UTC, Nick B wrote:On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:

...If you are at all interested in computer arithmetic or numerical methods, read this book. It is destined to be a classic.

To be honest, that sound like snake oil salesman speech to me rather than science. It's all hand waving and nothing concrete is provided, the whole thing wrapped in way too much superlatives. The guy seems to have good credential. Why should I read that book ?

I read the whole book and did not regret it at all, but I was already looking for good interval arithmetic implementations. I found that the techniques are not too different (though improved in important ways) from what is mainstream in verified computing. I found signs that techniques like this are standard in beam physics. (Caveat, I am not a beam physicist, but my friend at CERN is.) And the bibliography for MSU's COSY INFINITY, a verified computing tool from the beam physics community, provided a lot of interesting background information: http://www.bt.pa.msu.edu/pub/ What is perhaps most surprising is not that the techniques work well but that they can be implemented efficiently in a sensible amount of hardware, little more expensive than bare floating point hardware. Even maybe the most surprising results about search-based global optimization with unums have precedent, see e.g. "Rigorous Global Search using Taylor Models," M. Berz, K. Makino, Symbolic Numeric Computation 2009, (2009) 11-19, http://bt.pa.msu.edu/cgi-bin/display.pl?name=GOSNC09 (Taylor model techniques are similar to direct computation with intervals but add a bit more sophistication.) So, from my perspective, I think a unum library would at least be an interesting and useful library, and roughly in the style of the Mathematica / Python libraries could reduce unum interval computations to floating point computations with a modest amount of overhead. There is a sense in which we might expect the small overhead up front to be well worth it in the overall system: less haste, more speed. Hacks to try to compensate for incorrect behavior after the fact may end up being more costly overall, certainly to the programmer but perhaps also to the machine. For example, the common need to stick a loop over an entire vector or matrix into the inner loop of an iterative method to renormalize to prevent floating point rounding errors from accumulating. Whether this library should be part of the standard library, I don't know. It would seem to depend on how much people want the standard library to support verified numerical computing. If it is clear that verified numerical computing needs good support in the standard library, something like unums should be there, maybe even with some other techniques built on top of them (Taylor model or Levi-Civita for example). Anthony

Sep 17

On Thursday, 17 September 2015 at 23:53:30 UTC, Anthony Di Franco wrote:I read the whole book and did not regret it at all, but I was already looking for good interval arithmetic implementations. I found that the techniques are not too different (though improved in important ways) from what is mainstream in verified computing.

It would seem to depend on how much people want thestandard library to support verified numerical computing.

Anthony Good to know that you enjoyed reading the book. Can you describe what YOU mean by 'verified numerical computing', as I could not find a good description of it, and why is it important to have it. Nick

Sep 17

On Friday, 18 September 2015 at 03:19:26 UTC, Nick B wrote:Can you describe what YOU mean by 'verified numerical computing', as I could not find a good description of it, and why is it important to have it.

Verified numerical computations provide results that are guaranteed to be without roundoff errors. A bit misleading term, perhaps.

Sep 17

On Friday, 18 September 2015 at 03:19:26 UTC, Nick B wrote:On Thursday, 17 September 2015 at 23:53:30 UTC, Anthony Di Franco wrote:I read the whole book and did not regret it at all, but I was already looking for good interval arithmetic implementations. I found that the techniques are not too different (though improved in important ways) from what is mainstream in verified computing.

Hi, I haven't finished the book but have read over half of it and browsed the rest. I wanted to add that an implementation of unums would have advantages beyond verifiable computing. Some examples that spring to mind are: Using low precision (8-bit) unums to determine if an answer exists before using a higher precision representation to do the calculation (example briefly discussed in the book is ray tracing). More generally, unums can self-tune their precision which may be generally useful in getting high precision answers efficiently. It is possible for the programmer to specify the level of accuracy so that unums don't waste time calculating bits that have no meaning. Parallelisation - floating point ops are not associative but unum ops are. Tighter bounds on results than interval arithmetic or significance arithmetic. These are just a few areas where a software implementation could be useful. If you've ever had any issues with floating point, I'd recommend reading the book, not just because of the approach it proposes to solve these but also because it's very clearly written and quite entertaining (given the subject matter). Richard

Nov 08

On Thursday, 17 September 2015 at 23:53:30 UTC, Anthony Di Franco wrote:Whether this library should be part of the standard library, I don't know. It would seem to depend on how much people want the standard library to support verified numerical computing. If it is clear that verified numerical computing needs good support in the standard library, something like unums should be there, maybe even with some other techniques built on top of them (Taylor model or Levi-Civita for example).

I don't think you should expect D to support verifiable programming. The only person here that has pushed for it consistently is Bearophile, but he is not a dev (and where is he?). Andrei has previously voiced the opinion that interval arithmetics as defined is ad-hoc and that D should do it differently: http://forum.dlang.org/post/l8su3p$g4o$1 digitalmars.com Walter, Andrei and many others have previously argued that you can turn asserts into assumes (basically assuming that they hold) without wrecking total havoc to the correctness of the program after optimization. It has also been argued that signalling NaNs are useless and that reproducible floating point math (IEEE754-2008) is not going in based on some pragmatic assumptions that I never quite understood. The current definition of D floats is fundamentally incompatible with IEEE 754-2008. So I am not even sure if you can implement IEEE 1788 (interval arithmetics) as a plain D library. D also only have modular integer math so you cannot detect overflow by adding a compiler switch since libraries may depend on modular arithmetic behaviour. D is created by hackers who enjoy hacking. They don't have the focus on correctness that verifiable-anything requires. So if you enjoy hacking you'll have fun. If are into reliability, stability and correctness you'll get frustrated. I'm not even sure you can have it both ways (both have a hacker mindset and a correctness mindset in the same design process).

Sep 18

On Friday, 18 September 2015 at 09:25:00 UTC, Ola Fosheim Grøstad wrote:D is created by hackers who enjoy hacking. They don't have the focus on correctness that verifiable-anything requires. So if you enjoy hacking you'll have fun. If are into reliability, stability and correctness you'll get frustrated. I'm not even sure you can have it both ways (both have a hacker mindset and a correctness mindset in the same design process).

You forgot to mention that D is quite attractive for people who just want to complain on forums.

Sep 18

On Friday, 18 September 2015 at 13:39:24 UTC, skoppe wrote:You forgot to mention that D is quite attractive for people who just want to complain on forums.

Yes, but that does not define the language. That's just a consequence of people having expectations and caring about where it is heading. If you want to avoid that you have to be upfront about where it is at and where it is going. If people didn't care about D and where it is heading, then they would not complain.

Sep 18

Also keep in mind that people who care about the language complain only in the forums. People who no longer care about the language and are upset because they had too high expectations complain not on the forums, but on reddit, slashdot and blogs... So setting expectations where they belong pays off. D really need to improve on that aspect. I basically just involves a focus on honest and objective communication throughout.

Sep 18