www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Implement the "unum" representation in D ?

reply "Nick B" <nick.barbalich gmail.com> writes:
Hi everyone.

I'm attend the SKA conference in Auckland next week and I would 
like to discuss a opportunity for the D community.

I am based in Wellington, New Zealand.  In Auckland, NZ, from 
Tuesday to Friday next week there will be two seminars held.

The first 2 days (Tuesday and Wednesday) are for the multicore 
conference. Details are   http://www.multicoreworld.com/

Here is the schedule for 2 days (thursday & friday) of the SKA 
conference:

http://openparallel.com/multicore-world-2014/computing-for-ska/schedule-computing-for-ska-2014/


John Gustafson Will be presenting a Keynote on Thursday 27th 
February at 11:00 am

The abstract is here:  
http://openparallel.com/multicore-world-2014/speakers/john-gustafson/

There is also a excellent background paper, (PDF - 64 pages) 
which can be found here:

http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf

The math details are beyond me, but I understand his basic idea.

I would like to bring your attention to Page 34 and his comments 
re "standard committees"  and page 62 and his comments "Coded in 
Mathematica  for now. Need a fast native version.."

I am sure you can see where I am going with this ....

1.  Would it be possible to implement the "unum"  representation 
in D and therefore make it a  contender for the SKA ?

2.  Is there any interest in this format within the D community

Destroy.

Nick
Feb 20 2014
next sibling parent "Nick B" <nick.barbalich gmail.com> writes:
On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:
 Hi everyone.

 I'm attend the SKA conference in Auckland next week and I would 
 like to discuss a opportunity for the D community.
Sorry if I was not clear what the SKA is. In a nutshell is a truely massive telescope project which will require massive computing resources. https://www.skatelescope.org/ Nick
Feb 20 2014
prev sibling next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:
 Hi everyone.

 I'm attend the SKA conference in Auckland next week and I would 
 like to discuss a opportunity for the D community.

 I am based in Wellington, New Zealand.  In Auckland, NZ, from 
 Tuesday to Friday next week there will be two seminars held.

 The first 2 days (Tuesday and Wednesday) are for the multicore 
 conference. Details are   http://www.multicoreworld.com/

 Here is the schedule for 2 days (thursday & friday) of the SKA 
 conference:

 http://openparallel.com/multicore-world-2014/computing-for-ska/schedule-computing-for-ska-2014/


 John Gustafson Will be presenting a Keynote on Thursday 27th 
 February at 11:00 am

 The abstract is here:  
 http://openparallel.com/multicore-world-2014/speakers/john-gustafson/

 There is also a excellent background paper, (PDF - 64 pages) 
 which can be found here:

 http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf

 The math details are beyond me, but I understand his basic idea.

 I would like to bring your attention to Page 34 and his 
 comments re "standard committees"  and page 62 and his comments 
 "Coded in Mathematica  for now. Need a fast native version.."

 I am sure you can see where I am going with this ....

 1.  Would it be possible to implement the "unum"  
 representation in D and therefore make it a  contender for the 
 SKA ?

 2.  Is there any interest in this format within the D community

 Destroy.

 Nick
Hmm. Interesting. It could be done in D, definitely. However, I'm a bit skeptical on how efficient it would ever be without specialist hardware. I'd have to read it over a few more times to get a proper grip on it.
Feb 20 2014
prev sibling next sibling parent "Rikki Cattermole" <alphaglosined gmail.com> writes:
On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:
 Hi everyone.

 I'm attend the SKA conference in Auckland next week and I would 
 like to discuss a opportunity for the D community.

 I am based in Wellington, New Zealand.  In Auckland, NZ, from 
 Tuesday to Friday next week there will be two seminars held.

 The first 2 days (Tuesday and Wednesday) are for the multicore 
 conference. Details are   http://www.multicoreworld.com/

 Here is the schedule for 2 days (thursday & friday) of the SKA 
 conference:

 http://openparallel.com/multicore-world-2014/computing-for-ska/schedule-computing-for-ska-2014/


 John Gustafson Will be presenting a Keynote on Thursday 27th 
 February at 11:00 am

 The abstract is here:  
 http://openparallel.com/multicore-world-2014/speakers/john-gustafson/

 There is also a excellent background paper, (PDF - 64 pages) 
 which can be found here:

 http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf

 The math details are beyond me, but I understand his basic idea.

 I would like to bring your attention to Page 34 and his 
 comments re "standard committees"  and page 62 and his comments 
 "Coded in Mathematica  for now. Need a fast native version.."

 I am sure you can see where I am going with this ....

 1.  Would it be possible to implement the "unum"  
 representation in D and therefore make it a  contender for the 
 SKA ?

 2.  Is there any interest in this format within the D community

 Destroy.

 Nick
It looks like something that could be made into a library, with the help of inline assembly. Shame I've got study next week. Definitely no chance of making it.
Feb 20 2014
prev sibling next sibling parent "w0rp" <devw0rp gmail.com> writes:
This is very interesting, thank you for sharing this. My 
knowledge of assembly, compilers, and how machines actually work 
is limited, but I knew enough about the details of floating point 
to get the gist of it. The diagrams in the PDF also helped. I 
hope someone does more research on this, as it looks promising 
for improving the quality of calculations.
Feb 20 2014
prev sibling next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Slide 6:

Even the IEEE standard (1985) made choices that look dubious now<
On the other hand it's still working surprisingly well today.
 Negative zero. (ugh!)
It's not bad. Slide 15: are (Page Rank and) mp3 Players using floating point values? Slide 18: "is harder than doing THIS with them?" Hardware multiplication uses a different algorithm. Regarding the variable length of his FP numbers, their energy savings are beer-based numbers, I don't think they have any experimental basis (yet). Bye, bearophile
Feb 20 2014
next sibling parent reply "jerro" <jkrempus gmail.com> writes:
 Regarding the variable length of his FP numbers, their energy 
 savings are beer-based numbers, I don't think they have any 
 experimental basis (yet).
Also, because they are variable sized, you need to access them through pointers if you want random access. Now you are reading both the pointer and the value from memory. Since pointers and double precision floats have the same size on modern hardware, one would expect this to actually consume more energy than just reading a double precision value. An additional indirection can also have great performance cost. And there's one more step you need to do. After getting the pointer you need to first read the utag so you can decide how many bytes to read. So where you would normally just read the value, you now need to read the pointer, use that to read the utag and use the utag to read the value.
Feb 20 2014
next sibling parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
 Also, because they are variable sized, you need to access them 
 through pointers if you want random access. Now you are reading 
 both the pointer and the value from memory.
We might not need random access though. For basic linear algebra forward or bidirectional should be enough which should suit this format. Depends on the application.
Feb 20 2014
next sibling parent reply "jerro" <jkrempus gmail.com> writes:
 We might not need random access though.
Even if you don't need random access, you can't store them in a packed way if you want to be able to mutate them in place, since mathematical operations on them can change the number of bits they take.
 Depends on the application.
I suspect that in most numerical applications the number of bits needed to store the numbers will keep increasing during the computation until it reaches the maximal supported value. That can actually happen very fast - a single division with a non power of two is enough. Applications that could actually benefit from this in any way are probably extremely rare. The only one I can think of is to use it as a very simple form of compression, but I'm sure there are better options for that.
Feb 20 2014
parent reply "Frustrated" <c1514843 drdrb.com> writes:
On Thursday, 20 February 2014 at 23:41:12 UTC, jerro wrote:
 We might not need random access though.
Even if you don't need random access, you can't store them in a packed way if you want to be able to mutate them in place, since mathematical operations on them can change the number of bits they take.
 Depends on the application.
I suspect that in most numerical applications the number of bits needed to store the numbers will keep increasing during the computation until it reaches the maximal supported value. That can actually happen very fast - a single division with a non power of two is enough. Applications that could actually benefit from this in any way are probably extremely rare. The only one I can think of is to use it as a very simple form of compression, but I'm sure there are better options for that.
Yes, but his ubox method attempts to solve this problem by providing only the most accurate answers. By using a "sliding" floating point you can control the accuracy much better and by using the ubox method you can zero in on the most accurate solutions. I think a few here are missing the point. First: The unums self scale to provide the "fewest bit" representation ala his morse code vs ascii example. Not a huge deal, after all, it's only memory. But by using such representations one can have more accurate results because one can better represent values that are not accurately representable in standard fp. I think though adding a "repeating" bit would make it even more accurate so that repeating decimals within the bounds of maximum bits used could be represented perfectly. e.g., 1/3 = 0.3333... could be represented perfectly with such a bit and sliding fp type. With proper cpu support one could have 0.3333... * 3 = 1 exactly. By having two extra bits one could represent constants to any degree of accuracy. e.g., the last bit says the value represents the ith constant in some list. This would allow very common irrational constants to be used: e, pi, sqrt(2), etc... with more accuracy than normal and handled internally by the cpu. With such constants one could have sqrt(2)*sqrt(2) = 2 exactly. (designate part of the constants as "squares" since one could have up to 2^n - 1 constants) The problem I see is that to make it all work requires a lot of work on the hardware. Without it, there is no real reason to use it. One also runs into the problem of optimization. Having variable sized fp numbers may not be very efficient in memory alignment. A list of resizedable fp types would contain 1, 2, 3, 4, 5, ... byte numbers of random size. If you do a calculation on the list and try to use the list as storage again could end up with a different sized list... which could be very inefficient. You'll probably have to allocate a list of the maximum size possible in the first place to prevent buffer overflows, which then defeats the purpose in some sense In any case, one could have a type where the last 2 bits designate the representation. 00 - Positive Integer 01 - Floating point 10 - Constant - (value represents an index, possibly virtual, into a table of constants) 11 - Negative Integer For floating points, the 3rd last bit could represent a repeating decimal or they could be used in the constants for common repeating decimals. (since chances are, repeated calculations would not produce repeating decimals)
Feb 20 2014
next sibling parent reply "francesco cattoglio" <francesco.cattoglio gmail.com> writes:
On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:
 I think though adding a "repeating" bit would make it even more
 accurate so that repeating decimals within the bounds of maximum
 bits used could be represented perfectly. e.g., 1/3 = 0.3333...
 could be represented perfectly with such a bit and sliding fp
 type. With proper cpu support one could have 0.3333... * 3 = 1
 exactly.

 By having two extra bits one could represent constants to any
 degree of accuracy. e.g., the last bit says the value represents
 the ith constant in some list. This would allow very common
 irrational constants to be used: e, pi, sqrt(2), etc...
Unfortunately maths (real world maths) isn't made by "common" constants. More, such a "repeating bit" should become a "repeating counter", since you usually get a certain number of repeating digits, not just a single one.
 For floating points, the 3rd last bit could represent a 
 repeating
 decimal or they could be used in the constants for common
 repeating decimals. (since chances are, repeated calculations
 would not produce repeating decimals)
Things like those are cool and might have their application (I'm thinking mostly at message passing via TCP/IP), but have no real use in scientific computation. If you want good precision, you might as well be better off with bignum numbers.
Feb 20 2014
parent reply "Frustrated" <c1514843 drdrb.com> writes:
On Friday, 21 February 2014 at 07:42:36 UTC, francesco cattoglio
wrote:
 On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:
 I think though adding a "repeating" bit would make it even more
 accurate so that repeating decimals within the bounds of 
 maximum
 bits used could be represented perfectly. e.g., 1/3 = 0.3333...
 could be represented perfectly with such a bit and sliding fp
 type. With proper cpu support one could have 0.3333... * 3 = 1
 exactly.

 By having two extra bits one could represent constants to any
 degree of accuracy. e.g., the last bit says the value 
 represents
 the ith constant in some list. This would allow very common
 irrational constants to be used: e, pi, sqrt(2), etc...
Unfortunately maths (real world maths) isn't made by "common" constants. More, such a "repeating bit" should become a "repeating counter", since you usually get a certain number of repeating digits, not just a single one.
I disagree. Many computations done by computers involve constants in the calculation... pi, e, sqrt(2), sqrt(-1), many physical constants, etc. The cpu could store these constants at a higher bit resolution than standard, say 128 bits or whatever.
 For floating points, the 3rd last bit could represent a 
 repeating
 decimal or they could be used in the constants for common
 repeating decimals. (since chances are, repeated calculations
 would not produce repeating decimals)
Things like those are cool and might have their application (I'm thinking mostly at message passing via TCP/IP), but have no real use in scientific computation. If you want good precision, you might as well be better off with bignum numbers.
Simply not true. Maple, for example, uses constants and can compute using constants. It might speed up the calculations significantly if one could compute at the cpu level using constants. e.g. pi*sqrt(2) is just another constant. 2*pi is another constant. Obviously there is a limit to the number of constants one can store but one can encode many constants easily without storage. (where the "index" itself is used as the value) Also, the cpu could reduce values to constants... so if you have a fp value that is 3.14159265..... or whatever, the cpu could convert this to the constant pi since for all purposes it is pi(even if it is just an approximation or pi - 1/10^10). Basically the benefit of this(and potential) outweigh the cost of 1 bit out of 64-bits. I doubt anyone would miss it. In fact, probably no real loss.... NaN could be a constant, in which case you would have only one rather than the 2 billion or whatever you have now).
Feb 21 2014
parent reply "Francesco Cattoglio" <francesco.cattoglio gmail.com> writes:
On Friday, 21 February 2014 at 19:12:39 UTC, Frustrated wrote:
 Simply not true. Maple, for example, uses constants and can
 compute using constants.
You are mixing symbolic calculus and numerical computations. The two are completely unrelated.
 Basically the benefit of this(and potential) outweigh the cost 
 of
 1 bit out of 64-bits.
Unless I'm completely mistaken, what you are proposing is basically creating a tagged union type. It's pretty much unrelated from the "unum" proposed by the PDF.
Feb 21 2014
parent "Frustrated" <c1514843 drdrb.com> writes:
On Friday, 21 February 2014 at 19:59:36 UTC, Francesco Cattoglio
wrote:
 On Friday, 21 February 2014 at 19:12:39 UTC, Frustrated wrote:
 Simply not true. Maple, for example, uses constants and can
 compute using constants.
You are mixing symbolic calculus and numerical computations. The two are completely unrelated.
 Basically the benefit of this(and potential) outweigh the cost 
 of
 1 bit out of 64-bits.
Unless I'm completely mistaken, what you are proposing is basically creating a tagged union type. It's pretty much unrelated from the "unum" proposed by the PDF.
Yes, I mentioned that would could extend floating points to include other representations that are more efficient and reduce errors and such.
Feb 21 2014
prev sibling parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:
 I think though adding a "repeating" bit would make it even more
 accurate so that repeating decimals within the bounds of maximum
 bits used could be represented perfectly. e.g., 1/3 = 0.3333...
 could be represented perfectly with such a bit and sliding fp
 type. With proper cpu support one could have 0.3333... * 3 = 1
 exactly.
I believe that the repeating decimals, or better, repeating binary fractions, will hardly be more useful than a rational representation like p/q. The reason is, if we take a reciprocal 1/q and represent it as a binary, decimal, or other fixed-base fraction, the representation is surely periodic, but the upper bound on the length of its period is as high as q-1, and it is not unlikely to be the exact bound. For example, at http://en.wikipedia.org/wiki/Binary_number#Fractions , note that the binary fraction for 1/11 has period 10, and for 1/13 the period is 12. Thus repeating decimal for a fraction p/q will take up to q-1 bits when we store it as a repeating decimal, but log(p)+log(q) bits when stored as a rational number (numerator and denominator). Ivan Kazmenko.
Feb 21 2014
next sibling parent "Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:
On Friday, 21 February 2014 at 09:04:40 UTC, Ivan Kazmenko wrote:
 I believe that the repeating decimals, or better, repeating 
 binary fractions, will hardly be more useful than a rational 
 representation like p/q.
Yeah, in retrospect I would say that a binary layout like: numberator length | +- | numerator | divisor Might be a stronger solution, or at least one which offers different advantages over what we have today. It still wouldn't be able to represent pi accurately, since it isn't a rational number, but I wouldn't be surprised if a rational number exists that could be represented than can be represented in IEEE format.
Feb 21 2014
prev sibling parent reply "Frustrated" <c1514843 drdrb.com> writes:
On Friday, 21 February 2014 at 09:04:40 UTC, Ivan Kazmenko wrote:
 On Friday, 21 February 2014 at 05:21:53 UTC, Frustrated wrote:
 I think though adding a "repeating" bit would make it even more
 accurate so that repeating decimals within the bounds of 
 maximum
 bits used could be represented perfectly. e.g., 1/3 = 0.3333...
 could be represented perfectly with such a bit and sliding fp
 type. With proper cpu support one could have 0.3333... * 3 = 1
 exactly.
I believe that the repeating decimals, or better, repeating binary fractions, will hardly be more useful than a rational representation like p/q. The reason is, if we take a reciprocal 1/q and represent it as a binary, decimal, or other fixed-base fraction, the representation is surely periodic, but the upper bound on the length of its period is as high as q-1, and it is not unlikely to be the exact bound. For example, at http://en.wikipedia.org/wiki/Binary_number#Fractions , note that the binary fraction for 1/11 has period 10, and for 1/13 the period is 12. Thus repeating decimal for a fraction p/q will take up to q-1 bits when we store it as a repeating decimal, but log(p)+log(q) bits when stored as a rational number (numerator and denominator).
No, you are comparing apples to oranges. The q is not the same in both equations. The number of bits for p/q when p and q are stored separately is log[2](p) + log[2](q). But when p/q is stored as a repeating decimal(assuming it repeats), then it is a fixed constant dependent on the number of bits. So in some cases the rational expression is cheaper and in other cases the repeating decimal is cheaper. e.g., .1.... in theory could take 1 bit to represent wile 1/9 takes 1 + 4 = 5. (the point being is that the rational representation sometimes takes way more bits than the repeating decimal representation, other times it doesn't. Basically if we use n bits to represent the repeating decimal we can't represent repetitions that require more than n bits, in this case a rational expression may be cheaper. If we use some type of compression then rational expressions would, in general, be cheaper the larger n is. e.g., 1/9 takes 5 bits to represent as as a rational faction(4 if we require the numerator to be 1). 0.1.... takes 1 bit to represent in theory but for n bits, it takes n bits. e.g., 1/9 = 11111111/10^8 = 0.000111... in decimal. Which takes 6 bits to store(000111). If n was large and fixed then I think statistically it would be best to use rational expressions rather than repeating decimals. But rational expressions are not unique and that may cause some problems with representation... unless the cpu implemented some algorithms for reduction. 1/9 = 10/90 but 10/90 takes way more bits to represent than 1/9 even though they represent the same repeating decimal and the repeating decimals bits are fixed. In any case, if the fpu had a history of calculations it could potentially store the calculations in an expression tree and attempt to reduce them. e.g., p/q*(q/p) = 1. A history could also be useful for constants in that multiplying several constants together could produce a new constant which would not introduce any new accumulation errors.
Feb 21 2014
parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
On Friday, 21 February 2014 at 20:27:18 UTC, Frustrated wrote:
 On Friday, 21 February 2014 at 09:04:40 UTC, Ivan Kazmenko 
 wrote:
 Thus repeating decimal for a fraction p/q will take up to q-1 
 bits when we store it as a repeating decimal, but 
 log(p)+log(q) bits when stored as a rational number (numerator 
 and denominator).
No, you are comparing apples to oranges. The q is not the same in both equations.
Huh? I believe my q is actually the same q in both. What I am saying is, more formally: 1. Representing p/q as a rational number asymptotically takes on the order of log(p)+log(q) bits. 2. Representing p/q as a repeating binary fraction asymptotically takes on the order of log(p)+q bits in the worst case, and this worst case is common. Make that a decimal fraction instead of a binary fraction if you wish, the statement remains the same.
 The number of bits for p/q when p and q are stored separately is
 log[2](p) + log[2](q). But when p/q is stored as a repeating
 decimal(assuming it repeats), then it is a fixed constant
 dependent on the number of bits.
A rational represented as a fixed-base (binary, decimal, etc.) fraction always has a period. Sometimes the period is degenerate, that is, consists of a single zero: 1/2 = 0.100000... = 0.1 as a binary fraction. Sometimes there is a non-degenerate pre-period part before the period: 13/10 = 1.0100110011 = 1.0(1001) as a binary fraction, the "1.0" part being the pre-period and the "(1001)" part the period. The same goes for decimal fractions.
 So in some cases the rational expression is cheaper and in other
 cases the repeating decimal is cheaper.
Sure, it depends on the p and q, I was talking about the asymptotic case. Say, for p and q uniformly distributed in [100..999], I believe the rational representation is much shorter on average. We can measure that if the practical need arises... :) In practice, one may argue that large prime denominators don't occur too often; well, for such applications, fine then.
 If n was large and fixed then I think statistically it would be
 best to use rational expressions rather than repeating decimals.
Yeah, that's similar to what I was trying to state.
 But rational expressions are not unique and that may cause some
 problems with representation... unless the cpu implemented some
 algorithms for reduction. 1/9 = 10/90 but 10/90 takes way more
 bits to represent than 1/9 even though they represent the same
 repeating decimal and the repeating decimals bits are fixed.
Given a rational, bringing it to lowest terms is as easy (yet costly) as calculating the GCD.
 In any case, if the fpu had a history of calculations it could
 potentially store the calculations in an expression tree and
 attempt to reduce them. e.g., p/q*(q/p) = 1. A history could 
 also be useful for constants in that multiplying several 
 constants
 together could produce a new constant which would not introduce
 any new accumulation errors.
As far as I know, most compilers do that for constants known at compile time. As for the run-time usage of constants, note that the pool of constants will still have to be stored somewhere, and addressed somehow, and this may cancel out the performance gains. Ivan Kazmenko.
Feb 22 2014
parent reply "Ivan Kazmenko" <gassa mail.ru> writes:
On Saturday, 22 February 2014 at 14:17:23 UTC, Ivan Kazmenko 
wrote:
 Sometimes there is a non-degenerate pre-period part before the 
 period:
 13/10 = 1.0100110011 = 1.0(1001) as a binary fraction, the 
 "1.0" part being the pre-period and the "(1001)" part the 
 period.  The same goes for decimal fractions.
Sorry, I meant 13/10 = 1.0100110011..., that is, the "1001" part repeating forever - as opposed to just stopping at 10-th bit of the fraction.
Feb 22 2014
parent "Nick B" <nick.barbalich gmail.com> writes:
Hi

I will ask my question again.

Is there any interest in this format within the D community ?

Nick
Feb 23 2014
prev sibling parent "Francesco Cattoglio" <francesco.cattoglio gmail.com> writes:
On Thursday, 20 February 2014 at 23:21:26 UTC, Nordlöw wrote:
 We might not need random access though.

 For basic linear algebra forward or bidirectional should be 
 enough which should suit this format.

 Depends on the application.
"Basic Linear Algebra" often requires sparse matrices. Sparse Matrices = both random access and indirection.
Feb 20 2014
prev sibling parent reply "Per =?UTF-8?B?Tm9yZGzDtnci?= <per.nordlow gmail.com> writes:
On Thursday, 20 February 2014 at 21:30:10 UTC, jerro wrote:
 Also, because they are variable sized, you need to access them 
 through pointers if you want random access. Now you are reading
Unless the author was thinking in terms of D Ranges, for the algorithms that does *not* require random access. Can we do anything useful with unums in numeric algorithms if only have forward or bidirectional access? Similar to algorithms such as Levenshtein that are compatible with UTF-8 and UTF-16, Andrei? :)
Jul 13 2015
parent reply "Nick B" <nick.barbalich gmail.com> writes:
On Monday, 13 July 2015 at 21:25:12 UTC, Per Nordlöw wrote:

 Can we do anything useful with unums in numeric algorithms if 
 only have forward or bidirectional access? Similar to 
 algorithms such as Levenshtein that are compatible with UTF-8 
 and UTF-16, Andrei? :)
Question for Andrei, above, if he would like to reply.
Jul 16 2015
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/16/15 9:49 PM, Nick B wrote:
 On Monday, 13 July 2015 at 21:25:12 UTC, Per Nordlöw wrote:

 Can we do anything useful with unums in numeric algorithms if only
 have forward or bidirectional access? Similar to algorithms such as
 Levenshtein that are compatible with UTF-8 and UTF-16, Andrei? :)
Question for Andrei, above, if he would like to reply.
Haven't gotten to reading about unum, so I'm a bit lost. Who has forward/bidirectional access? -- Andrei
Jul 17 2015
prev sibling parent reply Iain Buclaw <ibuclaw gdcproject.org> writes:
On 20 February 2014 15:26, bearophile <bearophileHUGS lycos.com> wrote:
 Slide 6:

 Even the IEEE standard (1985) made choices that look dubious now<
On the other hand it's still working surprisingly well today.
 Negative zero. (ugh!)
It's not bad. Slide 15: are (Page Rank and) mp3 Players using floating point values?
Most encoding formats use a form of discrete cosine transform, which involves floating point maths.
Feb 20 2014
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Iain Buclaw:

 Most encoding formats use a form of discrete cosine transform, 
 which involves floating point maths.
I don't believe this much :-( Bye, bearophile
Feb 20 2014
parent Iain Buclaw <ibuclaw gdcproject.org> writes:
On 20 February 2014 22:46, bearophile <bearophileHUGS lycos.com> wrote:
 Iain Buclaw:


 Most encoding formats use a form of discrete cosine transform, which
 involves floating point maths.
I don't believe this much :-(
:-( I looked up vorbis (very, very) briefly way of example. Most structs that deal with encoding/decoding use integers, floats and doubles to represent values such as decay, attenuation, thresholds, amplification etc.
Feb 23 2014
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/20/2014 2:10 AM, Nick B wrote:
 1.  Would it be possible to implement the "unum"  representation in D and
 therefore make it a  contender for the SKA ?
Yes, as a library type. I don't think it'll save any energy, though.
 2.  Is there any interest in this format within the D community
I think it would be a fun and useful project. Any takers? (Be aware that there's a Python 'unum' type that is something quite different.)
Feb 20 2014
prev sibling next sibling parent reply "Francesco Cattoglio" <francesco.cattoglio gmail.com> writes:
On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:
 The abstract is here:  
 http://openparallel.com/multicore-world-2014/speakers/john-gustafson/
"The pursuit of exascale floating point is ridiculous, since we do not need to be making 10^18 sloppy rounding errors per second; we need instead to get provable, valid results for the first time, by turning the speed of parallel computers into higher quality answers instead of more junk per second" Ok, I think I know a bunch of people who could question the contents of that sentence. Or, at the very least, question this guy's way of presenting sensational news.
Feb 20 2014
parent reply "Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:
On Thursday, 20 February 2014 at 23:13:20 UTC, Francesco 
Cattoglio wrote:
 On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:
 The abstract is here:  
 http://openparallel.com/multicore-world-2014/speakers/john-gustafson/
"The pursuit of exascale floating point is ridiculous, since we do not need to be making 10^18 sloppy rounding errors per second; we need instead to get provable, valid results for the first time, by turning the speed of parallel computers into higher quality answers instead of more junk per second" Ok, I think I know a bunch of people who could question the contents of that sentence. Or, at the very least, question this guy's way of presenting sensational news.
I don't quite understand his ubox stuff, but his unum format doesn't really solve the 0.1 problem, except maybe by allowing the size of his values to exceed 64-bits so that precision errors creap up a little bit slower. (I'm not sure how many bits his format tops out at and I don't want to re-open the PDF again to look). It also wasn't clear whether his format removed the multiple values of NaN, -0, etc. It looked like it was just the current IEEE formats bonded to a sliding bit-length, which would bring along with it all the problems of the IEEE format that he mentioned. I think the only way to solve for problems like 0.1 in decimal not mapping to any reasonable value in binary is to store numbers as integer equations (i.e. 0.1 = 1/10), which would take a hell of a complex format to represent and some pretty fancy CPUs.
Feb 20 2014
parent "Francesco Cattoglio" <francesco.cattoglio gmail.com> writes:
On Thursday, 20 February 2014 at 23:52:13 UTC, Chris Williams 
wrote:
 I don't quite understand his ubox stuff, but his unum format 
 doesn't really solve the 0.1 problem, except maybe by allowing 
 the size of his values to exceed 64-bits so that precision 
 errors creap up a little bit slower. (I'm not sure how many 
 bits his format tops out at and I don't want to re-open the PDF 
 again to look).
Exctly. If I read correctly it should cover 128+ bits.
It also wasn't clear whether his format removed
 the multiple values of NaN, -0, etc. It looked like it was just 
 the current IEEE formats bonded to a sliding bit-length
Pretty sure it would not. It seems to me that space wasted by multiple representations is still there. The only real advantage is that you can probably store a NaN in 8 bits. Honestly, who cares. If I get a NaN in my numerical simulation, I have bigger concerns other than saving memory space.
 I think the only way to solve for problems like 0.1 in decimal 
 not mapping to any reasonable value in binary is to store 
 numbers as integer equations (i.e. 0.1 = 1/10), which would 
 take a hell of a complex format to represent and some pretty 
 fancy CPUs.
In fact, true rational numbers can only be represented by rational numbers. How extraordinary! :P The whole idea has one merit: the float now stores it's own accuracy. But I think you can achieve more or less the same goal by storing a pair of something. The whole idea of saving space sounds bogus, at least for my field of application. We already have an amazing technique for saving a lot of space for numerical simulation of PDEs, it's called grid refinement.
Feb 20 2014
prev sibling next sibling parent reply =?UTF-8?B?Ik5vcmRsw7Z3Ig==?= <per.nordlow gmail.com> writes:
The unum variable length encoding is very similar to how msgpack 
packs integers. See msgpack-d on github for a superb 
implementation in D.
Feb 20 2014
parent "ponce" <contact gam3sfrommars.fr> writes:
On Thursday, 20 February 2014 at 23:43:18 UTC, Nordlöw wrote:
 The unum variable length encoding is very similar to how 
 msgpack packs integers. See msgpack-d on github for a superb 
 implementation in D.
msgpack-d is indeed a great library that makes serialization almost instant to implement. I implemented CBOR another binary encoding scheme and it was obvious CBOR brings nothing over msgpack, it even did worse with integer encoding.
Feb 21 2014
prev sibling parent reply "Nick B" <nick.barbalich gmail.com> writes:
On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:
 Hi everyone.
 John Gustafson Will be presenting a Keynote on Thursday 27th 
 February at 11:00 am

 The abstract is here:  
 http://openparallel.com/multicore-world-2014/speakers/john-gustafson/

 There is also a excellent background paper, (PDF - 64 pages) 
 which can be found here:
FYI John Gustafson book is now out: It can be found here: http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY Here is one of the reviewers comments: 9 of 9 people found the following review helpful This book is revolutionary By David Jefferson on April 18, 2015 This book is revolutionary. That is the only way to describe it. I have been a professional computer science researcher for almost 40 years, and only once or twice before have I seen a book that is destined to make such a profound change in the way we think about computation. It is hard to imagine that after 70 years or so of computer arithmetic that there is anything new to say about it, but this book reinvents the subject from the ground up, from the very notion of finite precision numbers to their bit-level representation, through the basic arithmetic operations, the calculation of elementary functions, all the way to the fundamental methods of numerical analysis, including completely new approaches to expression calculation, root finding, and the solution of differential equations. On every page from the beginning to the end of the book there are surprises that just astonished me, making me re-think material that I thought had been settled for decades. The methods described in this book are profoundly different from all previous treatments of numerical methods. Unum arithmetic is an extension of floating point arithmetic, but mathematically much cleaner. It never does rounding, so there is no rounding error. It handles what in floating point arithmetic is called "overflow" and "underflow" in a far more natural and correct way that makes them normal rather than exceptional. It also handles exceptional values (NaN, +infinity, -infinity) cleanly and consistently. Those contributions alone would have been a profound contribution. But the book does much more. One of the reasons I think the book is revolutionary is that unum-based numerical methods can effortlessly provide provable bounds on the error in numerical computation, something that is very rare for methods based on floating point calculations. And the bounds are generally as tight as possible (or as tight as you want them), rather than the useless or trivial bounds as often happens with floating point methods or even interval arithmetic methods. Another reason I consider the book revolutionary is that many of the unum-based methods are cleanly parallelizable, even for problems that are normally considered to be unavoidably sequential. This was completely unexpected. A third reason is that in most cases unum arithmetic uses fewer bits, and thus less power, storage, and bandwidth (the most precious resources in today’s computers) than the comparable floating point calculation. It hard to believe that we get this advantage in addition to all of the others, but it is amply demonstrated in the book. Doing efficient unum arithmetic takes more logic (e.g. transistors) than comparable floating point arithmetic does, but as the author points out, transistors are so cheap today that that hardly matters, especially when compared to the other benefits. Some of the broader themes of the book are counterintuitive to people like me advanced conventional training, so that I have to re-think everything I “knew” before. For example, the discussion of just what it means to “solve” an equation numerically is extraordinarily thought provoking. Another example is the author’s extended discussion of how calculus is not the best inspiration for computational numerical methods, even for problems that would seem to absolutely require calculus-based thinking, such as the solution of ordinary differential equations. Not only is the content of the book brilliant, but so is the presentation. The text is so well written, a mix of clarity, precision, and reader friendliness that it is a pure pleasure to read, rather then the dense struggle that mathematical textbooks usually require of the reader. But in addition, almost every page has full color graphics and diagrams that are completely compelling in their ability to clearly communicate the ideas. I cannot think of any technical book I have ever seen that is so beautifully illustrated all the way through. I should add that I read the Kindle edition on an iPad, and for once Amazon did not screw up the presentation of a technical book, at least for this platform. It is beautifully produced, in full color and detail, and with all of the fonts and graphics reproduced perfectly. Dr. Gustafson has also provided a Mathematica implementation of unums and of the many numerical methods discussed in the book. Let us hope that in the next few years there will be implementations in other languages, followed by hardware implementations. Over time there should be unum arithmetic units alongside of floating point arithmetic units on every CPU and GPU chip, and in the long run unums should replace floating point entirely. The case the author makes for this is overwhelming. If you are at all interested in computer arithmetic or numerical methods, read this book. It is destined to be a classic.
Jul 10 2015
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/10/15 11:02 PM, Nick B wrote:
 John Gustafson book is now out:

 It can be found here:

 http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY
Very interesting, I'll read it. Thanks! -- Andrei
Jul 11 2015
next sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 07/11/2015 05:07 PM, Andrei Alexandrescu wrote:
 On 7/10/15 11:02 PM, Nick B wrote:
 John Gustafson book is now out:

 It can be found here:

 http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY
Very interesting, I'll read it. Thanks! -- Andrei
I think Walter should read chapter 5.
Jul 11 2015
parent reply deadalnix <deadalnix gmail.com> writes:
On Saturday, 11 July 2015 at 18:16:22 UTC, Timon Gehr wrote:
 On 07/11/2015 05:07 PM, Andrei Alexandrescu wrote:
 On 7/10/15 11:02 PM, Nick B wrote:
 John Gustafson book is now out:

 It can be found here:

 http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY
Very interesting, I'll read it. Thanks! -- Andrei
I think Walter should read chapter 5.
What is this chapter about ?
Sep 16 2015
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 09/16/2015 10:46 AM, deadalnix wrote:
 On Saturday, 11 July 2015 at 18:16:22 UTC, Timon Gehr wrote:
 On 07/11/2015 05:07 PM, Andrei Alexandrescu wrote:
 On 7/10/15 11:02 PM, Nick B wrote:
 John Gustafson book is now out:

 It can be found here:

 http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY
Very interesting, I'll read it. Thanks! -- Andrei
I think Walter should read chapter 5.
What is this chapter about ?
Relevant quote: "Programmers and users were never given visibility or control of when a value was promoted to “double extended precision” (80-bit or higher) format, unless they wrote assembly language; it just happened automatically, opportunistically, and unpredictably. Confusion caused by different results outweighed the advantage of reduced rounding-overflow-underflow problems, and now coprocessors must dumb down their results to mimic systems that have no such extra scratchpad capability."
Sep 16 2015
prev sibling parent Russel Winder via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Sat, 2015-07-11 at 11:07 -0400, Andrei Alexandrescu via Digitalmars
-d wrote:
 On 7/10/15 11:02 PM, Nick B wrote:
 John Gustafson book is now out:
=20
 It can be found here:
=20
 http://www.amazon.com/End-Error-Computing-Chapman
 -Computational/dp/1482239868/ref=3Dsr_1_1?s=3Dbooks&ie=3DUTF8&qid=3D143=
6582
 956&sr=3D1
 -1&keywords=3DJohn+Gustafson&pebp=3D1436583212284&perid=3D093TDC82KFP9Y=
4S
 5PXPY
=20 Very interesting, I'll read it. Thanks! -- Andrei
Interesting that the publication date appears to be 2015-04-01. --=20 Russel. =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D Dr Russel Winder t: +44 20 7585 2200 voip: sip:russel.winder ekiga.n= et 41 Buckmaster Road m: +44 7770 465 077 xmpp: russel winder.org.uk London SW11 1EN, UK w: www.russel.org.uk skype: russel_winder
Jul 11 2015
prev sibling next sibling parent "ponce" <contact gam3sfrommars.fr> writes:
On Saturday, 11 July 2015 at 03:02:24 UTC, Nick B wrote:
 If you are at all interested in computer arithmetic or 
 numerical methods, read this book. It is destined to be a 
 classic.
Sample chapter here: http://radiofreehpc.com/tmp/TheEndofErrorSampleChapter.pdf
Jul 11 2015
prev sibling next sibling parent reply "jmh530" <john.michael.hall gmail.com> writes:
On Saturday, 11 July 2015 at 03:02:24 UTC, Nick B wrote:
 FYI

 John Gustafson book is now out:
I was just thinking about overflow (more for my own interest than a practical reason). I wouldn't have known about this way to deal with it if you hadn't bumped this thread. So thanks, it's interesting (not sure if this book will adequately address Walter's original concern that it won't really reduce power consumption). I also found the discussion of rational numbers earlier in the thread interesting. When I was looking at other ways to handle overflow, I learned about a change they made to Python circa 2.3 or 2.4 to handle overflow (PEP237). Basically, they take an approach described in the presentation in the original post that will cast an int to a long if there is an overflow.
Jul 11 2015
parent reply "Nick B" <nick.barbalich gmail.com> writes:
On Sunday, 12 July 2015 at 03:52:32 UTC, jmh530 wrote:
 On Saturday, 11 July 2015 at 03:02:24 UTC, Nick B wrote:
 FYI

 John Gustafson book is now out:
I wouldn't have known about this way to deal with it if you hadn't bumped this thread. So thanks, it's interesting (not sure if this book will adequately address Walter's original concern that it won't really reduce power consumption). I also found the discussion of rational numbers earlier in the thread interesting.
I glad that you guys have found this interesting :) Nick
Jul 12 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/13/15 1:20 AM, Nick B wrote:
 On Sunday, 12 July 2015 at 03:52:32 UTC, jmh530 wrote:
 On Saturday, 11 July 2015 at 03:02:24 UTC, Nick B wrote:
 FYI

 John Gustafson book is now out:
I wouldn't have known about this way to deal with it if you hadn't bumped this thread. So thanks, it's interesting (not sure if this book will adequately address Walter's original concern that it won't really reduce power consumption). I also found the discussion of rational numbers earlier in the thread interesting.
I glad that you guys have found this interesting :) Nick
Just read http://radiofreehpc.com/tmp/TheEndofErrorSampleChapter.pdf, very interesting stuff. I assume the definition of various primitives for unum are as compelling as alluded. The treatment angles toward a hardware representation and primitives, yet these ideas seem reasonably actionable at language level. We'd be able to implement the primitives in software, and the result would be a competitor for infinite-precision arithmetic libraries, which do exist and are being used. Yet the software implementation would be very slow compared to IEEE floats, which limits applications quite a bit. All we can do now, with our limited resources, is to keep an eye on developments and express cautious interest. If someone able and willing comes along with a unum library for D, that would be great. Andrei
Jul 22 2015
parent reply "jmh530" <john.michael.hall gmail.com> writes:
On Wednesday, 22 July 2015 at 19:28:41 UTC, Andrei Alexandrescu 
wrote:
 On 7/13/15 1:20 AM, Nick B wrote:

 All we can do now, with our limited resources, is to keep an 
 eye on developments and express cautious interest. If someone 
 able and willing comes along with a unum library for D, that 
 would be great.
The book has quite a bit of Mathematica code at the end. A first pass at a unum library could probably just involve porting that to D.
Jul 22 2015
parent Anthony Di Franco <di.franco aya.yale.edu> writes:
On Wednesday, 22 July 2015 at 20:41:42 UTC, jmh530 wrote:
 On Wednesday, 22 July 2015 at 19:28:41 UTC, Andrei Alexandrescu 
 wrote:
 On 7/13/15 1:20 AM, Nick B wrote:

 All we can do now, with our limited resources, is to keep an 
 eye on developments and express cautious interest. If someone 
 able and willing comes along with a unum library for D, that 
 would be great.
The book has quite a bit of Mathematica code at the end. A first pass at a unum library could probably just involve porting that to D.
Here is a Python port of the Mathematica code which is that much closer to D: https://github.com/jrmuizel/pyunum Regardless of whether the representation used follows the unum bitwise format proposed in the book, having the semantics of interval arithmetic that keeps proper account of amount of precision, and exact values vs. open/closed ended intervals on the extended real line would be quite valuable for verified computation and for how it enables simple root finding / optimization / differential equation solving via search, and could build on hard float directly andor an extended precision float library such as MPFR to keep performance close to raw floats. The bitwise format is intended to fit as much data as possible through the time-energy bottleneck between main memory and the processor. This would clearly be of great value if the format were supported in hardware but of less clear value otherwise. The semantic improvements would be quite welcome regardless. They go much further than best practices with floats to put error bounds on computations, handle over/underflow and infinity correctly, and keep algorithms correct by default. We might also consider small semantic improvements if not rigidly bound by the proposed formats and the need to implement in hardware, such as having division by an interval including zero return the union of the results from the intervals on either side of the zero as well as NaN (1-D intervals are strictly represented as the convex hull of at most two points in the proposed format, so an interval such as this made of a sentinel value unified with two disjoint intervals, though it has better semantics, is not supported). Anthony
Sep 14 2015
prev sibling next sibling parent reply Shachar Shemesh <shachar weka.io> writes:
On 11/07/15 06:02, Nick B wrote:
 On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:
 Hi everyone.
 John Gustafson Will be presenting a Keynote on Thursday 27th February
 at 11:00 am

 The abstract is here:
 http://openparallel.com/multicore-world-2014/speakers/john-gustafson/

 There is also a excellent background paper, (PDF - 64 pages) which can
 be found here:
FYI John Gustafson book is now out: It can be found here: http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY
As far as I can tell, the author is pushing toward a hardware based implementation (i.e. - have the CPU do the implementation). All of his arguments are around "transistors are cheap, bus lines are expensive" theme. A software based implementation will have much worse performance than an FPU based floating point implementation, making these, even if generally adopted (and I have my doubts), impractical. If the powers that be think this is showing enough of a potential, then it might be a good idea to shape D so that it CAN use unums. That would require quite a few architectural changes to the way people think (which is a large part of why I think unums will not be generally adopted). Shachar
Jul 12 2015
parent Iain Buclaw via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 12 Jul 2015 14:30, "Shachar Shemesh via Digitalmars-d" <
digitalmars-d puremagic.com> wrote:
 On 11/07/15 06:02, Nick B wrote:
 On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:
 Hi everyone.
 John Gustafson Will be presenting a Keynote on Thursday 27th February
 at 11:00 am

 The abstract is here:
 http://openparallel.com/multicore-world-2014/speakers/john-gustafson/

 There is also a excellent background paper, (PDF - 64 pages) which can
 be found here:
FYI John Gustafson book is now out: It can be found here:
http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY

 As far as I can tell, the author is pushing toward a hardware based
implementation (i.e. - have the CPU do the implementation). All of his arguments are around "transistors are cheap, bus lines are expensive" theme.
 A software based implementation will have much worse performance than an
FPU based floating point implementation, making these, even if generally adopted (and I have my doubts), impractical.
 If the powers that be think this is showing enough of a potential, then
it might be a good idea to shape D so that it CAN use unums. That would require quite a few architectural changes to the way people think (which is a large part of why I think unums will not be generally adopted).
 Shachar
I reckon it will be a good to implement as proof of concept. Speed would likely be equivalent to soft vs. hard floats. I'm more interested in what existing hardware capabilities we can take advantage of now.
Jul 12 2015
prev sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Saturday, 11 July 2015 at 03:02:24 UTC, Nick B wrote:
 On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:
 Hi everyone.
 John Gustafson Will be presenting a Keynote on Thursday 27th 
 February at 11:00 am

 The abstract is here:  
 http://openparallel.com/multicore-world-2014/speakers/john-gustafson/

 There is also a excellent background paper, (PDF - 64 pages) 
 which can be found here:
FYI John Gustafson book is now out: It can be found here: http://www.amazon.com/End-Error-Computing-Chapman-Computational/dp/1482239868/ref=sr_1_1?s=books&ie=UTF8&qid=1436582956&sr=1-1&keywords=John+Gustafson&pebp=1436583212284&perid=093TDC82KFP9Y4S5PXPY Here is one of the reviewers comments: 9 of 9 people found the following review helpful This book is revolutionary By David Jefferson on April 18, 2015 This book is revolutionary. That is the only way to describe it. I have been a professional computer science researcher for almost 40 years, and only once or twice before have I seen a book that is destined to make such a profound change in the way we think about computation. It is hard to imagine that after 70 years or so of computer arithmetic that there is anything new to say about it, but this book reinvents the subject from the ground up, from the very notion of finite precision numbers to their bit-level representation, through the basic arithmetic operations, the calculation of elementary functions, all the way to the fundamental methods of numerical analysis, including completely new approaches to expression calculation, root finding, and the solution of differential equations. On every page from the beginning to the end of the book there are surprises that just astonished me, making me re-think material that I thought had been settled for decades. The methods described in this book are profoundly different from all previous treatments of numerical methods. Unum arithmetic is an extension of floating point arithmetic, but mathematically much cleaner. It never does rounding, so there is no rounding error. It handles what in floating point arithmetic is called "overflow" and "underflow" in a far more natural and correct way that makes them normal rather than exceptional. It also handles exceptional values (NaN, +infinity, -infinity) cleanly and consistently. Those contributions alone would have been a profound contribution. But the book does much more. One of the reasons I think the book is revolutionary is that unum-based numerical methods can effortlessly provide provable bounds on the error in numerical computation, something that is very rare for methods based on floating point calculations. And the bounds are generally as tight as possible (or as tight as you want them), rather than the useless or trivial bounds as often happens with floating point methods or even interval arithmetic methods. Another reason I consider the book revolutionary is that many of the unum-based methods are cleanly parallelizable, even for problems that are normally considered to be unavoidably sequential. This was completely unexpected. A third reason is that in most cases unum arithmetic uses fewer bits, and thus less power, storage, and bandwidth (the most precious resources in today’s computers) than the comparable floating point calculation. It hard to believe that we get this advantage in addition to all of the others, but it is amply demonstrated in the book. Doing efficient unum arithmetic takes more logic (e.g. transistors) than comparable floating point arithmetic does, but as the author points out, transistors are so cheap today that that hardly matters, especially when compared to the other benefits. Some of the broader themes of the book are counterintuitive to people like me advanced conventional training, so that I have to re-think everything I “knew” before. For example, the discussion of just what it means to “solve” an equation numerically is extraordinarily thought provoking. Another example is the author’s extended discussion of how calculus is not the best inspiration for computational numerical methods, even for problems that would seem to absolutely require calculus-based thinking, such as the solution of ordinary differential equations. Not only is the content of the book brilliant, but so is the presentation. The text is so well written, a mix of clarity, precision, and reader friendliness that it is a pure pleasure to read, rather then the dense struggle that mathematical textbooks usually require of the reader. But in addition, almost every page has full color graphics and diagrams that are completely compelling in their ability to clearly communicate the ideas. I cannot think of any technical book I have ever seen that is so beautifully illustrated all the way through. I should add that I read the Kindle edition on an iPad, and for once Amazon did not screw up the presentation of a technical book, at least for this platform. It is beautifully produced, in full color and detail, and with all of the fonts and graphics reproduced perfectly. Dr. Gustafson has also provided a Mathematica implementation of unums and of the many numerical methods discussed in the book. Let us hope that in the next few years there will be implementations in other languages, followed by hardware implementations. Over time there should be unum arithmetic units alongside of floating point arithmetic units on every CPU and GPU chip, and in the long run unums should replace floating point entirely. The case the author makes for this is overwhelming. If you are at all interested in computer arithmetic or numerical methods, read this book. It is destined to be a classic.
To be honest, that sound like snake oil salesman speech to me rather than science. It's all hand waving and nothing concrete is provided, the whole thing wrapped in way too much superlatives. The guy seems to have good credential. Why should I read that book ?
Sep 14 2015
next sibling parent reply ponce <contact gam3sfrommars.fr> writes:
On Tuesday, 15 September 2015 at 05:16:53 UTC, deadalnix wrote:
 The guy seems to have good credential. Why should I read that 
 book ?
The sample chapter dissipates a bit the marketing cloud. One of the ideas is that the imprecise bit encode an interval between 2 values, hence automatically computing the "precision" of a computation without analysis.
Sep 15 2015
parent reply deadalnix <deadalnix gmail.com> writes:
On Tuesday, 15 September 2015 at 07:07:20 UTC, ponce wrote:
 On Tuesday, 15 September 2015 at 05:16:53 UTC, deadalnix wrote:
 The guy seems to have good credential. Why should I read that 
 book ?
The sample chapter dissipates a bit the marketing cloud. One of the ideas is that the imprecise bit encode an interval between 2 values, hence automatically computing the "precision" of a computation without analysis.
Read it. That is suddenly much less impressive, but much more sensible.
Sep 15 2015
parent reply ponce <contact gam3sfrommars.fr> writes:
On Tuesday, 15 September 2015 at 07:57:01 UTC, deadalnix wrote:
 On Tuesday, 15 September 2015 at 07:07:20 UTC, ponce wrote:
 On Tuesday, 15 September 2015 at 05:16:53 UTC, deadalnix wrote:
 The guy seems to have good credential. Why should I read that 
 book ?
The sample chapter dissipates a bit the marketing cloud. One of the ideas is that the imprecise bit encode an interval between 2 values, hence automatically computing the "precision" of a computation without analysis.
Read it. That is suddenly much less impressive, but much more sensible.
I can see this being useful since nowadays in the native space we often choose between single and double precision with ad-hoc oral rules and learned habits rather than measuring precision. However if unum aren't fast, they will be only for prototyping and the real algorithm would rely on IEEE floats with different precision characteristics, so yeah hardware is critical.
Sep 15 2015
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Tuesday, 15 September 2015 at 08:24:30 UTC, ponce wrote:
 However if unum aren't fast, they will be only for prototyping 
 and the real algorithm would rely on IEEE floats with different 
 precision characteristics, so yeah hardware is critical.
I think he is looking into 32 bit floats for a simpler version of the concept combined with the "ubox" method (multidimensional interval representation that "brute force" the answer). The "uboxing" is visualized here: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf
Sep 15 2015
parent reply ponce <contact gam3sfrommars.fr> writes:
On Tuesday, 15 September 2015 at 09:35:36 UTC, Ola Fosheim 
Grøstad wrote:
 http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf
That's a pretty convincing case. Who does it :)?
Sep 15 2015
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Tuesday, 15 September 2015 at 10:38:23 UTC, ponce wrote:
 On Tuesday, 15 September 2015 at 09:35:36 UTC, Ola Fosheim 
 Grøstad wrote:
 http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf
That's a pretty convincing case. Who does it :)?
You:9 https://github.com/jrmuizel/pyunum/blob/master/unum.py
Sep 15 2015
parent reply Don <prosthetictelevisions teletubby.medical.com> writes:
On Tuesday, 15 September 2015 at 11:13:59 UTC, Ola Fosheim 
Grøstad wrote:
 On Tuesday, 15 September 2015 at 10:38:23 UTC, ponce wrote:
 On Tuesday, 15 September 2015 at 09:35:36 UTC, Ola Fosheim 
 Grøstad wrote:
 http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf
That's a pretty convincing case. Who does it :)?
I'm not convinced. I think they are downplaying the hardware difficulties. Slide 34: Disadvantages of the Unum Format * Non-power-of-two alignment. Needs packing and unpacking, garbage collection. I think that disadvantage is so enormous that it negates most of the advantages. Note that in the x86 world, unaligned memory loads of SSE values still take longer than aligned loads. And that's a trivial case! The energy savings are achieved by using a primitive form of compression. Sure, you can reduce the memory bandwidth required by compressing the data. You could do that for *any* form of data, not just floating point. But I don't think anyone thinks that's worthwhile. The energy comparisons are plain dishonest. The power required for accessing from DRAM is the energy consumption of a *cache miss* !! What's the energy consumption of a load from cache? That would show you what the real gains are, and my guess is they are tiny. So: * I don't believe the energy savings are real. * There is no guarantee that it would be possible to implement it in hardware without a speed penalty, regardless of how many transistors you throw at it (hardware analogue of Amdahl's Law) * but the error bound stuff is cool.
Sep 16 2015
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 16 September 2015 at 08:17:59 UTC, Don wrote:
 On Tuesday, 15 September 2015 at 11:13:59 UTC, Ola Fosheim 
 Grøstad wrote:
 On Tuesday, 15 September 2015 at 10:38:23 UTC, ponce wrote:
 On Tuesday, 15 September 2015 at 09:35:36 UTC, Ola Fosheim 
 Grøstad wrote:
 http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf
That's a pretty convincing case. Who does it :)?
I'm not convinced. I think they are downplaying the hardware difficulties. Slide 34: Disadvantages of the Unum Format * Non-power-of-two alignment. Needs packing and unpacking, garbage collection. I think that disadvantage is so enormous that it negates most of the advantages. Note that in the x86 world, unaligned memory loads of SSE values still take longer than aligned loads. And that's a trivial case! The energy savings are achieved by using a primitive form of compression. Sure, you can reduce the memory bandwidth required by compressing the data. You could do that for *any* form of data, not just floating point. But I don't think anyone thinks that's worthwhile.
GPU do it a lot. Especially, but not exclusively on mobile. Not to reduce the misses (a miss is pretty much guaranteed, you load 32 thread at once in a shader core, each of them will require at least 8 pixel for a bilinear texture with mipmap, that's the bare minimum. That means 256 memory access at once. One of these pixel WILL miss, and it is going to stall the 32 threads). It is not a latency issue, but a bandwidth and energy one. But yeah, in the general case, random access is preferable, memory alignment, and the fact you don't need to do as much bookeeping are very significants. Also, predictable size mean you can split your dataset and process it in parallel, which is impossible if sizes are random.
 The energy comparisons are plain dishonest. The power required 
 for accessing from DRAM is the energy consumption of a *cache 
 miss* !! What's the energy consumption of a load from cache? 
 That would show you what the real gains are, and my guess is 
 they are tiny.
The energy comparison is bullshit. As long as you haven't loaded the data, you don't know how wide they are. Meaning you need either to go pessimistic and load for the worst case scenario or do 2 round trip to memory. The author also use a lot the wire vs transistor cost, and how it evolved? He is right. Except that you won't cram more wire at runtime into the CPU. The CPU need the wiring for the worst case scenario, always. The hardware is likely to be slower as you'll need way more wiring than for regular floats, and wire is not only cost, but also time. That being said, even a hit in L1 is very energy hungry. Think about it, you need to go a 8 - way fetch (so you'll end up loading 4k of data from the cache) in parallel with address translation (usually 16 ways) in parallel with snooping into the load and the store buffer. If the load is not aligned, you pretty much have to multiply this by 2 if it cross a cache line boundary. I'm not sure what his number represent, but hitting L1 is quite power hungry. He is right on that one.
 So:
 * I don't believe the energy savings are real.
 * There is no guarantee that it would be possible to implement 
 it in hardware without a speed penalty, regardless of how many 
 transistors you throw at it (hardware analogue of Amdahl's Law)
 * but the error bound stuff is cool.
Yup, that's pretty much what I get out of it as well.
Sep 16 2015
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix wrote:
 The energy comparison is bullshit. As long as you haven't 
 loaded the data, you don't know how wide they are. Meaning you 
 need either to go pessimistic and load for the worst case 
 scenario or do 2 round trip to memory.
That really depends on memory layout and algorithm. A likely implementation would be a co-processor that would take a unum stream and then pipe it through a network of cores (tile based co-processor). The internal busses between cores are very very fast and with 256+ cores you get tremendous throughput. But you need a good compiler/libraries and software support.
 The hardware is likely to be slower as you'll need way more 
 wiring than for regular floats, and wire is not only cost, but 
 also time.
You need more transistors per ALU, but slower does not matter if the algorithm needs bounded accuracy or if it converge more quickly with unums. The key challenge for him is to create a market, meaning getting the semantics into scientific software and getting initial workable implementations out to scientists. If there is a market demand, then there will be products. But you need to create the market first. Hence he wrote an easy to read book on the topic and support people who want to implement it.
Sep 16 2015
parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 16 September 2015 at 14:11:04 UTC, Ola Fosheim 
Grøstad wrote:
 On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix 
 wrote:
 The energy comparison is bullshit. As long as you haven't 
 loaded the data, you don't know how wide they are. Meaning you 
 need either to go pessimistic and load for the worst case 
 scenario or do 2 round trip to memory.
That really depends on memory layout and algorithm. A likely implementation would be a co-processor that would take a unum stream and then pipe it through a network of cores (tile based co-processor). The internal busses between cores are very very fast and with 256+ cores you get tremendous throughput. But you need a good compiler/libraries and software support.
No you don't. Because the streamer still need to load the unum one by one. Maybe 2 by 2 with a fair amount of hardware speculation (which means you are already trading energy for performances, so the energy argument is weak). There is no way you can feed 256+ cores that way. To gives you a similar example, x86 decoding is often the bottleneck on an x86 CPU. The number of ALUs in x86 over the past decade decreased rather than increased, because you simply can't decode fast enough to feed them. Yet, x86 CPUs have a 64 ways speculative decoding as a first stage.
 The hardware is likely to be slower as you'll need way more 
 wiring than for regular floats, and wire is not only cost, but 
 also time.
You need more transistors per ALU, but slower does not matter if the algorithm needs bounded accuracy or if it converge more quickly with unums. The key challenge for him is to create a market, meaning getting the semantics into scientific software and getting initial workable implementations out to scientists. If there is a market demand, then there will be products. But you need to create the market first. Hence he wrote an easy to read book on the topic and support people who want to implement it.
The problem is not transistor it is wire. Because the damn thing is variadic in every ways, pretty much every bit as input can end up anywhere in the functional unit. That is a LOT of wire.
Sep 16 2015
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Wednesday, 16 September 2015 at 19:21:59 UTC, deadalnix wrote:
 No you don't. Because the streamer still need to load the unum 
 one by one. Maybe 2 by 2 with a fair amount of hardware 
 speculation (which means you are already trading energy for 
 performances, so the energy argument is weak). There is no way 
 you can feed 256+ cores that way.
You can load continuously 64 bytes in a stream, decode to your internal format and push them into the scratchpad of other cores. You could even do this in hardware. If you look at the ubox brute forcing method you compute many calculations over the same data, because you solve spatially, not by timesteps. So you can run many many parallell computations over the same data.
 To gives you a similar example, x86 decoding is often the 
 bottleneck on an x86 CPU. The number of ALUs in x86 over the 
 past decade decreased rather than increased, because you simply 
 can't decode fast enough to feed them. Yet, x86 CPUs have a 64 
 ways speculative decoding as a first stage.
That's because we use a dumb compiler that does not prefetch intelligently. If you are writing for a tile based VLIW CPU you preload. These calculations are highly iterative so I'd rather think of it as a co-processor solving a single equation repeatedly than running the whole program. You can run the larger program on a regular CPU or a few cores.
 The problem is not transistor it is wire. Because the damn 
 thing is variadic in every ways, pretty much every bit as input 
 can end up anywhere in the functional unit. That is a LOT of 
 wire.
I haven't seen a design, so I cannot comment. But keep in mind that the CPU does not have to work with the format, it can use a different format internally. We'll probably see FPGA implementations that can be run on FPGU cards for PCs within a few years. I read somewhere that a group in Singapore was working on it.
Sep 16 2015
parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 16 September 2015 at 19:40:49 UTC, Ola Fosheim 
Grøstad wrote:
 You can load continuously 64 bytes in a stream, decode to your 
 internal format and push them into the scratchpad of other 
 cores. You could even do this in hardware.
1/ If you load the worst case scenario, then your power advantage is gone. 2/ If you load these one by one, how do you expect to feed 256+ cores ? Obviously you can make this in hardware. And obviously this is not going to be able to feed 256+ cores. Even with a chip at low frequency, let's say 800MHz or so, you have about 80 cycles to access memory. That mean you need to have 20 000+ cycles of work to do per core per unum. That simple back of the envelope calculation. Your proposal is simply ludicrous. It's a complete non starter. You can make this in hardware. Sure you can, no problem. But you won't because it is a stupid idea.
 To gives you a similar example, x86 decoding is often the 
 bottleneck on an x86 CPU. The number of ALUs in x86 over the 
 past decade decreased rather than increased, because you 
 simply can't decode fast enough to feed them. Yet, x86 CPUs 
 have a 64 ways speculative decoding as a first stage.
That's because we use a dumb compiler that does not prefetch intelligently.
You know, when you have no idea what you are talking about, you can just move on to something you understand. Prefetching would not change anything here. The problem come from variable size encoding, and the challenge it causes for hardware. You can have 100% L1 hit and still have the same problem. No sufficiently smart compiler can fix that.
 If you are writing for a tile based VLIW CPU you preload. These 
 calculations are highly iterative so I'd rather think of it as 
 a co-processor solving a single equation repeatedly than 
 running the whole program. You can run the larger program on a 
 regular CPU or a few cores.
That's irrelevant. The problem is not the kind of CPU, it is how do you feed it at a fast enough rate.
 The problem is not transistor it is wire. Because the damn 
 thing is variadic in every ways, pretty much every bit as 
 input can end up anywhere in the functional unit. That is a 
 LOT of wire.
I haven't seen a design, so I cannot comment. But keep in mind that the CPU does not have to work with the format, it can use a different format internally. We'll probably see FPGA implementations that can be run on FPGU cards for PCs within a few years. I read somewhere that a group in Singapore was working on it.
That's hardware 101. When you have a floating point unit, you get your 32 bits you get 23 bits that go into the mantissa FU and 8 in the exponent FU. For instance, if you multiply floats, you send the 2 exponent into a adder, you send the 2 mantissa into a 24bits multiplier (you add a leading 1), you xor the bit signs. You get the carry from the adder, and emit a multiply, or you count the leading 0 of the 48bit multiply result, shift by that amount and add the shit to the exponent. If you get a carry in the exponent adder, you saturate and emit an inifinity. Each bit goes into a given functional unit. That mean you need on wire from the input to the functional unit is goes to. Sale for these result. Now, if the format is variadic, you need to wire all bits to all functional units, because they can potentially end up there. That's a lot of wire, in fact the number of wire is growing quadratically with that joke. The author keep repeating that wire became the expensive thing and he is right. Meaning a solution with quadratic wiring is not going to cut it.
Sep 16 2015
next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix wrote:
 You know, when you have no idea what you are talking about, you 
 can just move on to something you understand.
Ah, nice move. Back to your usual habits?
 Prefetching would not change anything here. The problem come 
 from variable size encoding, and the challenge it causes for 
 hardware. You can have 100% L1 hit and still have the same 
 problem.
There is _no_ cache. The compiler fully controls the layout of the scratchpad.
 That's hardware 101.
Is it? The core point is this: 1. if there is academic interest (i.e. publishing opportunities) you get research 2. if there is research you get new algorithms 3. you get funding etc You cannot predict at this point what the future will be like. Is it unlikely that anything specific will change status quo? Yes. Is it highly probable that something will change status quo? Yes. Will it happen over night. No. 50+ years has been invested in floating point design. Will this be offset over night, no. It'll probably take 10+ years before anyone has a different type of numerical ALU on their desktop than IEEE754. By that time we are in a new era.
Sep 16 2015
parent reply deadalnix <deadalnix gmail.com> writes:
On Wednesday, 16 September 2015 at 20:30:36 UTC, Ola Fosheim 
Grøstad wrote:
 On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix 
 wrote:
 You know, when you have no idea what you are talking about, 
 you can just move on to something you understand.
Ah, nice move. Back to your usual habits?
Stop
 Prefetching would not change anything here. The problem come 
 from variable size encoding, and the challenge it causes for 
 hardware. You can have 100% L1 hit and still have the same 
 problem.
There is _no_ cache. The compiler fully controls the layout of the scratchpad.
You are the king of goalspot shifting. You answer about x86 decoding you get served. You want to talk about a scraptch pad ? Good ! How do the data ends up in the scratchpad to begin with ? Using magic ? What is the scraptchpad made of if not flip flops ? If if so, how is it different from a cache as far as the hardware is concerned ? You can play with words, but the problem remain the same. When you get on chip memory, be it cache or scratchpad, and a variadic encoding, you can't even feed a handful of ALUs. How do you expect to feed 256+ VLIW cores ? There are 3 order of magintude of gap in your reasoning. You can't pull 3 orders of magnitude out of your ass and just pretend it can be done.
 That's hardware 101.
Is it?
Yes wire is hardware 101. I mean seriously, if one do not get how component can be wired together, one should probably abstain from making any hardware comment.
 You cannot predict at this point what the future will be like. 
 Is it unlikely that anything specific will change status quo? 
 Yes. Is it highly probable that something will change status 
 quo? Yes. Will it happen over night. No.

 50+ years has been invested in floating point design. Will this 
 be offset over night, no.

 It'll probably take 10+ years before anyone has a different 
 type of numerical ALU on their desktop than IEEE754. By that 
 time we are in a new era.
Ok listen that is not complicated. I don't know what car will come out next year? But I know there won't be a car that can go 10000km on 10 centiliter of gazoline. This would be physic defying stuff. Same thing you won't be able to feed 256+ cores if you load data sequentially. Don't get me this stupid we don't know what's going to happen tomorow bullshit. We won't have unicorn meat in supermarkets. We won't have free energy. We won't have interstellar travel. And we won't have the capability to feed 256+ cores sequentially. I gave you numbers you gave me bullshit.
Sep 16 2015
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Wednesday, 16 September 2015 at 20:53:37 UTC, deadalnix wrote:
 On Wednesday, 16 September 2015 at 20:30:36 UTC, Ola Fosheim 
 Grøstad wrote:
 On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix 
 wrote:
 You know, when you have no idea what you are talking about, 
 you can just move on to something you understand.
Ah, nice move. Back to your usual habits?
Stop
OK. I stop. You are beyond reason.
Sep 16 2015
parent deadalnix <deadalnix gmail.com> writes:
On Wednesday, 16 September 2015 at 21:12:11 UTC, Ola Fosheim 
Grøstad wrote:
 On Wednesday, 16 September 2015 at 20:53:37 UTC, deadalnix 
 wrote:
 On Wednesday, 16 September 2015 at 20:30:36 UTC, Ola Fosheim 
 Grøstad wrote:
 On Wednesday, 16 September 2015 at 20:06:43 UTC, deadalnix 
 wrote:
 You know, when you have no idea what you are talking about, 
 you can just move on to something you understand.
Ah, nice move. Back to your usual habits?
Stop
OK. I stop. You are beyond reason.
True, how blind I was. It is fairly obvious now, thinking about it, that you can get 3 order of magnitude increase in sequential decoding in hardware by having a compiler with a vectorized SSA and a scratchpad ! Or maybe you have number to present us that show I'm wrong ?
Sep 16 2015
prev sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Wed, Sep 16, 2015 at 08:06:42PM +0000, deadalnix via Digitalmars-d wrote:
[...]
 When you have a floating point unit, you get your 32 bits you get 23
 bits that go into the mantissa FU and 8 in the exponent FU. For
 instance, if you multiply floats, you send the 2 exponent into a
 adder, you send the 2 mantissa into a 24bits multiplier (you add a
 leading 1), you xor the bit signs.
 
 You get the carry from the adder, and emit a multiply, or you count
 the leading 0 of the 48bit multiply result, shift by that amount and
 add the shit to the exponent.
 
 If you get a carry in the exponent adder, you saturate and emit an
 inifinity.
 
 Each bit goes into a given functional unit. That mean you need on wire
 from the input to the functional unit is goes to. Sale for these
 result.
 
 Now, if the format is variadic, you need to wire all bits to all
 functional units, because they can potentially end up there. That's a
 lot of wire, in fact the number of wire is growing quadratically with
 that joke.
 
 The author keep repeating that wire became the expensive thing and he
 is right. Meaning a solution with quadratic wiring is not going to cut
 it.
I found this .pdf that explains the unum representation a bit more: http://sites.ieee.org/scv-cs/files/2013/03/Right-SizingPrecision1.pdf On p.31, you can see the binary representation of unum. The utag has 3 bits for exponent size, presumably meaning the exponent can vary in size up to 7 bits. There are 5 bits in the utag for the mantissa, so it can be anywhere from 0 to 31 bits. It's not completely variadic, but it's complex enough that you will probably need some kind of shift register to extract the exponent and mantissa so that you can pass them in the right format to the various parts of the hardware. It definitely won't be as straightforward as the current floating-point format; you can't just wire the bits directly to the adders and multipliers. This is probably what the author meant by needing "more transistors". I guess his point was that we have to do more work in the CPU, but in return we (hopefully) reduce the traffic to DRAM, thereby saving the cost of data transfer. I'm not so sure how well this will work in practice, though, unless we have a working prototype that proves the benefits. What if you have a 10*10 unum matrix, and during some operation the size of the unums in the matrix changes? Assuming the worst case, you could have started out with 10*10 unums with small exponent/mantissa, maybe fitting in 2-3 cache lines, but after the operation most of the entries expand to 7-bit exponent and 31-bit mantissa, so now your matrix doesn't fit into the allocated memory anymore. So now your hardware has to talk to druntime to have it allocate new memory for storing the resulting unum matrix? The only sensible solution seems to be to allocate the maximum size for each matrix entry, so that if the value changes you won't run out of space. But that means we have lost the benefit of having a variadic encoding to begin with -- you will have to transfer the maximum size's worth of data when you load the matrix from DRAM, even if most of that data is unused (because the unum only takes up a small percentage of the space). The author proposed GC, but I have a hard time imagining a GC implemented in *CPU*, no less, colliding with the rest of the world where it's the *software* that controls DRAM allocation. (GC too slow for your application? Too bad, gotta upgrade your CPU...) The way I see it from reading the PDF slides, is that what the author is proposing would work well as a *software* library, perhaps backed up by hardware support for some of the lower-level primitives. I'm a bit skeptical of the claims of data traffic / power savings, unless there is hard data to prove that it works. T -- "The number you have dialed is imaginary. Please rotate your phone 90 degrees and try again."
Sep 16 2015
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Wednesday, 16 September 2015 at 23:28:23 UTC, H. S. Teoh wrote:
 I'm not so sure how well this will work in practice, though, 
 unless we have a working prototype that proves the benefits.  
 What if you have a 10*10 unum matrix, and during some operation 
 the size of the unums in the matrix changes?  Assuming the 
 worst case, you could have started out with 10*10 unums with 
 small exponent/mantissa, maybe fitting in 2-3 cache lines, but 
 after the operation most of the entries expand to 7-bit 
 exponent and 31-bit mantissa, so now your matrix doesn't fit 
 into the allocated memory anymore.  So now your hardware has to 
 talk to druntime to have it allocate new memory for storing the 
 resulting unum matrix?
Let's not make it so complicated. The internal CPU format could just be 32 and 64 bit. The key concept is about recording closed/open intervals and precision. If you spend 16 cores of a 256 core tiled coprocessor on I/O you still have 240 cores left. For the external format, it depends on your algorithm. If you are using map reduce you load/unload working sets, let the coprocessor do most of the work and combine the results. Like an actor based pipeline. The problem is more that average programmers will have real trouble making good use of it, since the know-how isn't there.
 The author proposed GC, but I have a hard time imagining a GC 
 implemented in *CPU*, no less, colliding with
 the rest of the world where it's the *software* that controls 
 DRAM allocation.  (GC too slow for your application? Too bad, 
 gotta upgrade your CPU...)
That's a bit into the future, isn't it? But local memory is probably less that 256K and designed for the core, so… who knows what extras you could build in? If you did it, the effect would be local, but it sounds too complicated to be worth it. But avoid thinking that the programmer address memory directly. CPU+Compiler is one package. Your interface is the compiler, not the CPU as such.
 The way I see it from reading the PDF slides, is that what the 
 author is proposing would work well as a *software* library, 
 perhaps backed up by hardware support for some of the 
 lower-level primitives.  I'm a bit skeptical of the claims of
First you would need to establish that there are numerical advantages that scientists require in some specific fields. Then you need to build it into scientific software and accelerate it. For desktop CPUs, nah... most people don't care about accuracy that much. Standards like IEEE1788 might also make adoption of unum less likely.
Sep 16 2015
prev sibling parent jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 16 September 2015 at 08:38:25 UTC, deadalnix wrote:
 Also, predictable size mean you can split your dataset and 
 process it in parallel, which is impossible if sizes are random.
I don't recall how he would deal with something similar to cache misses when you have to promote or demote a unum. However, my recollection of the book is that there was quite a bit of focus on a unum representation that has the same size as a double. If you only did the computations with this format, I would expect the sizes would be more-or-less fixed. Promotion would be pretty rare, but still possible, I would think. Compared to calculations with doubles there might not be a strong case for energy efficiency (but I don't really know for sure). My understanding was that the benefit for energy efficiency is only when you use a smaller sized unum instead of a float. I don't recall how he would resolve your point about cache misses. Anyway, while I can see a benefit from using unum numbers (accuracy, avoiding overflow, etc.) rather than floating point numbers, I think that performance or energy efficiency would have to be within range of floating point numbers for it to have any meaningful adoption.
Sep 16 2015
prev sibling next sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Wednesday, 16 September 2015 at 08:17:59 UTC, Don wrote:
 I'm not convinced. I think they are downplaying the hardware 
 difficulties. Slide 34:
I don't think he is downplaying it. He has said that it will probably take at least 10 years before it is available in hardware. There is also a company called Rex Computing that are looking at unum: http://www.theplatform.net/2015/07/22/supercomputer-chip-startup-scores-funding-darpa-contract/ He assumes that you use a scratchpad (a big register file), not caching, for intermediate calculations. His basic reasoning is that brute force ubox methods makes for highly parallel calculations. It might be possible to design ALUs that can work with various unum bit widths efficiently (many small or a few large)... who knows. You'll have to try first. Let's not forget that there is a _lot_ of legacy constraints and architectural assumptions in both x86 architecture.
 The energy comparisons are plain dishonest. The power required 
 for accessing from DRAM is the energy consumption of a *cache 
 miss* !! What's the energy consumption of a load from cache?
I think this argument is aiming at HPC where you can find funding for ASICs. They push a lot of data over the memory bus.
Sep 16 2015
parent reply Wyatt <wyatt.epp gmail.com> writes:
On Wednesday, 16 September 2015 at 08:53:24 UTC, Ola Fosheim 
Grøstad wrote:
 I don't think he is downplaying it. He has said that it will 
 probably take at least 10 years before it is available in 
 hardware. There is also a company called Rex Computing that are 
 looking at unum:
Oh hey, I remember these jokers. They were trying to blow some smoke about moving 288 GB/s at 4W. They're looking at unum? Of course they are; care to guess who's advising them? Yep. I'll be shocked if they ever even get to tape out. -Wyatt
Sep 16 2015
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Wednesday, 16 September 2015 at 20:35:16 UTC, Wyatt wrote:
 On Wednesday, 16 September 2015 at 08:53:24 UTC, Ola Fosheim 
 Grøstad wrote:
 I don't think he is downplaying it. He has said that it will 
 probably take at least 10 years before it is available in 
 hardware. There is also a company called Rex Computing that 
 are looking at unum:
Oh hey, I remember these jokers. They were trying to blow some smoke about moving 288 GB/s at 4W. They're looking at unum? Of course they are; care to guess who's advising them? Yep. I'll be shocked if they ever even get to tape out.
Yes, of course, most startups in hardware don't succeed. I assume they get knowhow from Adapteva.
Sep 16 2015
prev sibling parent Timon Gehr <timon.gehr gmx.ch> writes:
On 09/16/2015 10:17 AM, Don wrote:
 So:
 ...
 * There is no guarantee that it would be possible to implement it in
 hardware without a speed penalty, regardless of how many transistors you
 throw at it (hardware analogue of Amdahl's Law)
https://en.wikipedia.org/wiki/Gustafson's_law :o)
Sep 16 2015
prev sibling parent reply Anthony Di Franco <di.franco aya.yale.edu> writes:
On Tuesday, 15 September 2015 at 05:16:53 UTC, deadalnix wrote:
 On Saturday, 11 July 2015 at 03:02:24 UTC, Nick B wrote:
 On Thursday, 20 February 2014 at 10:10:13 UTC, Nick B wrote:
...
 If you are at all interested in computer arithmetic or 
 numerical methods, read this book. It is destined to be a 
 classic.
To be honest, that sound like snake oil salesman speech to me rather than science. It's all hand waving and nothing concrete is provided, the whole thing wrapped in way too much superlatives. The guy seems to have good credential. Why should I read that book ?
I read the whole book and did not regret it at all, but I was already looking for good interval arithmetic implementations. I found that the techniques are not too different (though improved in important ways) from what is mainstream in verified computing. I found signs that techniques like this are standard in beam physics. (Caveat, I am not a beam physicist, but my friend at CERN is.) And the bibliography for MSU's COSY INFINITY, a verified computing tool from the beam physics community, provided a lot of interesting background information: http://www.bt.pa.msu.edu/pub/ What is perhaps most surprising is not that the techniques work well but that they can be implemented efficiently in a sensible amount of hardware, little more expensive than bare floating point hardware. Even maybe the most surprising results about search-based global optimization with unums have precedent, see e.g. "Rigorous Global Search using Taylor Models," M. Berz, K. Makino, Symbolic Numeric Computation 2009, (2009) 11-19, http://bt.pa.msu.edu/cgi-bin/display.pl?name=GOSNC09 (Taylor model techniques are similar to direct computation with intervals but add a bit more sophistication.) So, from my perspective, I think a unum library would at least be an interesting and useful library, and roughly in the style of the Mathematica / Python libraries could reduce unum interval computations to floating point computations with a modest amount of overhead. There is a sense in which we might expect the small overhead up front to be well worth it in the overall system: less haste, more speed. Hacks to try to compensate for incorrect behavior after the fact may end up being more costly overall, certainly to the programmer but perhaps also to the machine. For example, the common need to stick a loop over an entire vector or matrix into the inner loop of an iterative method to renormalize to prevent floating point rounding errors from accumulating. Whether this library should be part of the standard library, I don't know. It would seem to depend on how much people want the standard library to support verified numerical computing. If it is clear that verified numerical computing needs good support in the standard library, something like unums should be there, maybe even with some other techniques built on top of them (Taylor model or Levi-Civita for example). Anthony
Sep 17 2015
next sibling parent reply Nick B <nick.barbalich gmail.com> writes:
On Thursday, 17 September 2015 at 23:53:30 UTC, Anthony Di Franco 
wrote:

 I read the whole book and did not regret it at all, but I was 
 already looking for good interval arithmetic implementations. I 
 found that the techniques are not too different (though 
 improved in important ways) from what is mainstream in verified 
 computing.
It would seem to depend on how much people want the
 standard library to support verified numerical computing.
Anthony Good to know that you enjoyed reading the book. Can you describe what YOU mean by 'verified numerical computing', as I could not find a good description of it, and why is it important to have it. Nick
Sep 17 2015
next sibling parent Ola Fosheim Grostad <ola.fosheim.grostad+dlang gmail.com> writes:
On Friday, 18 September 2015 at 03:19:26 UTC, Nick B wrote:
 Can you describe what YOU mean by 'verified numerical 
 computing', as I could not find a good description of it, and 
 why is it important to have it.
Verified numerical computations provide results that are guaranteed to be without roundoff errors. A bit misleading term, perhaps.
Sep 17 2015
prev sibling parent reply Richard Davies <optevo gmail.com> writes:
On Friday, 18 September 2015 at 03:19:26 UTC, Nick B wrote:
 On Thursday, 17 September 2015 at 23:53:30 UTC, Anthony Di 
 Franco wrote:

 I read the whole book and did not regret it at all, but I was 
 already looking for good interval arithmetic implementations. 
 I found that the techniques are not too different (though 
 improved in important ways) from what is mainstream in 
 verified computing.
Hi, I haven't finished the book but have read over half of it and browsed the rest. I wanted to add that an implementation of unums would have advantages beyond verifiable computing. Some examples that spring to mind are: Using low precision (8-bit) unums to determine if an answer exists before using a higher precision representation to do the calculation (example briefly discussed in the book is ray tracing). More generally, unums can self-tune their precision which may be generally useful in getting high precision answers efficiently. It is possible for the programmer to specify the level of accuracy so that unums don't waste time calculating bits that have no meaning. Parallelisation - floating point ops are not associative but unum ops are. Tighter bounds on results than interval arithmetic or significance arithmetic. These are just a few areas where a software implementation could be useful. If you've ever had any issues with floating point, I'd recommend reading the book, not just because of the approach it proposes to solve these but also because it's very clearly written and quite entertaining (given the subject matter). Richard
Nov 08 2015
parent reply Lionello Lunesu <lionello lunesu.remove.com> writes:
On 09/11/15 04:38, Richard Davies wrote:
 On Friday, 18 September 2015 at 03:19:26 UTC, Nick B wrote:
 On Thursday, 17 September 2015 at 23:53:30 UTC, Anthony Di Franco wrote:

 I read the whole book and did not regret it at all, but I was already
 looking for good interval arithmetic implementations. I found that
 the techniques are not too different (though improved in important
 ways) from what is mainstream in verified computing.
Hi, I haven't finished the book but have read over half of it and browsed the rest. I wanted to add that an implementation of unums would have advantages beyond verifiable computing. Some examples that spring to mind are: Using low precision (8-bit) unums to determine if an answer exists before using a higher precision representation to do the calculation (example briefly discussed in the book is ray tracing). More generally, unums can self-tune their precision which may be generally useful in getting high precision answers efficiently. It is possible for the programmer to specify the level of accuracy so that unums don't waste time calculating bits that have no meaning. Parallelisation - floating point ops are not associative but unum ops are. Tighter bounds on results than interval arithmetic or significance arithmetic. These are just a few areas where a software implementation could be useful. If you've ever had any issues with floating point, I'd recommend reading the book, not just because of the approach it proposes to solve these but also because it's very clearly written and quite entertaining (given the subject matter). Richard
Yeah, I got curious too. I spend some time on it yesterday and had a stab at writing it in D. I was playing with the idea of using native floating point types to store Unums. This would not have the full benefit of the dynamically sized Unums, but would allow for the accuracy benefits. Still need to implement the basic arithmetic (interval) stuff: https://github.com/lionello/unumd L.
Nov 14 2015
parent reply Nick_B <nick.barbalich gmail.com> writes:
On Sunday, 15 November 2015 at 04:19:21 UTC, Lionello Lunesu 
wrote:
 On 09/11/15 04:38, Richard Davies wrote:

 Yeah, I got curious too. I spend some time on it yesterday and 
 had a stab at writing it in D.
Hi. I send a email to John Gustafson yesterday, re this thread. He replied as follows: "There are efforts underway worldwide to support unums in C, Python, Julia, Java, and now D. And the book has only been out for nine months!" Nick
Nov 16 2015
parent reply Lionello Lunesu <lionello lunesu.remove.com> writes:
On 17/11/15 07:52, Nick_B wrote:
 On Sunday, 15 November 2015 at 04:19:21 UTC, Lionello Lunesu wrote:
 On 09/11/15 04:38, Richard Davies wrote:

 Yeah, I got curious too. I spend some time on it yesterday and had a
 stab at writing it in D.
Hi. I send a email to John Gustafson yesterday, re this thread. He replied as follows: "There are efforts underway worldwide to support unums in C, Python, Julia, Java, and now D. And the book has only been out for nine months!" Nick
Ha, and I think we can do a better job than the others! Surely in terms of performance! Funny, I was indeed looking at the Julia and Python (and John's Matematica code) to learn how some of the operations on Unums work. Still a lot to do though. I've basically only declared the type, still needs all the calculations. L.
Nov 17 2015
parent reply Nick B <nick.barbalich gmail.com> writes:
Hi

John Gustafson was in town (Wellington, NZ) for the Multicore 
World Conference 2016 ( http://www.multicoreworld.com/) 
conference. I caught up with him, tonight, and spoke to him for 
about two hours. Here is a quick summary of what we discussed.  
John has just redesigned Unums, to address the design issues in 
version 1.0.  He presented his Powerpoint presentation to the 
conference, with the details of Unums 2.0 (this is a tentative 
name at present).  Its a improved design, but I will only brief 
detail it:  "It  will have  more dynamic range with 16-bit values 
than IEEE half-precision, but only by a small amount. Still 
remarkable to be uniformly better in dynamic range and precision, 
with support for inexact values and perfect reciprocation. If a 
language supports just one unum data type, John believes it 
should be the 16-bit one".  John has agreed to provide a link to 
the Powerpoint presentation, in a couple of weeks, and then 
later, a link to his new published paper on the subject, when it 
is ready.  There will likely be a new book, building on version 
1.0, and, again, tentatively titled 'Unums 2.0'. I also discussed 
with him, about integrating it with D. At the present, there is a 
'C' codebase under construction, but this could be rewritten in D 
in the future.  D may require some language changes, and a new 
phobos library, to support this advanced functionality. Of course 
Walter will have decide if he wants this advanced numbering 
system as a part of D.

As an aside, John mentioned that Rex Computing 
(http://www.rexcomputing.com/) is using Unums with the Julia 
language, for their new hyper-efficient processor architecture. 
It will be interesting to see what these whiz kids deliver in 
time.

cheers
Nick
Feb 17
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 17 February 2016 at 08:11:21 UTC, Nick B wrote:
 Hi

 John Gustafson was in town (Wellington, NZ) for the Multicore 
 World Conference 2016 ( http://www.multicoreworld.com/) 
 conference. I caught up with him, tonight, and spoke to him for 
 about two hours. Here is a quick summary of what we discussed.  
 John has just redesigned Unums, to address the design issues in 
 version 1.0.  He presented his Powerpoint presentation to the 
 conference, with the details of Unums 2.0 (this is a tentative 
 name at present).  Its a improved design, but I will only brief 
 detail it:  "It  will have  more dynamic range with 16-bit 
 values than IEEE half-precision, but only by a small amount. 
 Still remarkable to be uniformly better in dynamic range and 
 precision, with support for inexact values and perfect 
 reciprocation. If a language supports just one unum data type, 
 John believes it should be the 16-bit one".  John has agreed to 
 provide a link to the Powerpoint presentation, in a couple of 
 weeks, and then later, a link to his new published paper on the 
 subject, when it is ready.  There will likely be a new book, 
 building on version 1.0, and, again, tentatively titled 'Unums 
 2.0'. I also discussed with him, about integrating it with D. 
 At the present, there is a 'C' codebase under construction, but 
 this could be rewritten in D in the future.  D may require some 
 language changes, and a new phobos library, to support this 
 advanced functionality. Of course Walter will have decide if he 
 wants this advanced numbering system as a part of D.
I would be interested in the Powerpoint when it becomes available. As for getting it to work in D, I'm not sure how much language changes would be necessary. If they can get it to work in C, then surely it would work in D. After all, the book has an implementation in python (albeit this is not the latest and great version apparently). Wrt phobos, I would just recommend that whatever unum library gets eventually written has a companion with the equivalent of the functions from std.math.
Feb 17
parent reply Nick B <nick.barbalich gmail.com> writes:
On Wednesday, 17 February 2016 at 16:35:41 UTC, jmh530 wrote:
 On Wednesday, 17 February 2016 at 08:11:21 UTC, Nick B wrote:
 Wrt phobos, I would just recommend that whatever unum library 
 gets eventually written has a companion with the equivalent of 
 the functions from std.math.
Having just looked at the slides again, I believe this will break compatibility with std.math, (for example it throws out NaN), just as D has broken full compatibility with all of C++. I hope to have a link to the revised presentation within 7 days. Can anyone tell me who are the maths experts, and hard science users, around here ? Nick
Feb 20
parent Nick B <nick.barbalich gmail.com> writes:
On Saturday, 20 February 2016 at 23:25:40 UTC, Nick B wrote:
 On Wednesday, 17 February 2016 at 16:35:41 UTC, jmh530 wrote:
 On Wednesday, 17 February 2016 at 08:11:21 UTC, Nick B wrote:
 Having just looked at the slides again, I believe this will 
 break compatibility with std.math, (for example it throws out 
 NaN), just as D has broken full compatibility with all of C++.
UNUM II is also proposing to break completely from IEEE 754 floats and gain Computation with mathematical rigor .......
 Can anyone tell me who are the maths experts, and hard science 
 users, around here ?


 Nick
Feb 20
prev sibling parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Thursday, 17 September 2015 at 23:53:30 UTC, Anthony Di Franco 
wrote:
 Whether this library should be part of the standard library, I 
 don't know. It would seem to depend on how much people want the 
 standard library to support verified numerical computing. If it 
 is clear that verified numerical computing needs good support 
 in the standard library, something like unums should be there, 
 maybe even with some other techniques built on top of them 
 (Taylor model or Levi-Civita for example).
I don't think you should expect D to support verifiable programming. The only person here that has pushed for it consistently is Bearophile, but he is not a dev (and where is he?). Andrei has previously voiced the opinion that interval arithmetics as defined is ad-hoc and that D should do it differently: http://forum.dlang.org/post/l8su3p$g4o$1 digitalmars.com Walter, Andrei and many others have previously argued that you can turn asserts into assumes (basically assuming that they hold) without wrecking total havoc to the correctness of the program after optimization. It has also been argued that signalling NaNs are useless and that reproducible floating point math (IEEE754-2008) is not going in based on some pragmatic assumptions that I never quite understood. The current definition of D floats is fundamentally incompatible with IEEE 754-2008. So I am not even sure if you can implement IEEE 1788 (interval arithmetics) as a plain D library. D also only have modular integer math so you cannot detect overflow by adding a compiler switch since libraries may depend on modular arithmetic behaviour. D is created by hackers who enjoy hacking. They don't have the focus on correctness that verifiable-anything requires. So if you enjoy hacking you'll have fun. If are into reliability, stability and correctness you'll get frustrated. I'm not even sure you can have it both ways (both have a hacker mindset and a correctness mindset in the same design process).
Sep 18 2015
parent reply skoppe <mail skoppe.eu> writes:
On Friday, 18 September 2015 at 09:25:00 UTC, Ola Fosheim Grøstad 
wrote:
 D is created by hackers who enjoy hacking. They don't have the 
 focus on correctness that verifiable-anything requires. So if 
 you enjoy hacking you'll have fun. If are into reliability, 
 stability and correctness you'll get frustrated. I'm not even 
 sure you can have it both ways (both have a hacker mindset and 
 a correctness mindset in the same design process).
You forgot to mention that D is quite attractive for people who just want to complain on forums.
Sep 18 2015
parent reply Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
On Friday, 18 September 2015 at 13:39:24 UTC, skoppe wrote:
 You forgot to mention that D is quite attractive for people who 
 just want to complain on forums.
Yes, but that does not define the language. That's just a consequence of people having expectations and caring about where it is heading. If you want to avoid that you have to be upfront about where it is at and where it is going. If people didn't care about D and where it is heading, then they would not complain.
Sep 18 2015
parent Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= writes:
Also keep in mind that people who care about the language 
complain only in the forums.

People who no longer care about the language and are upset 
because they had too high expectations complain not on the 
forums, but on reddit, slashdot and blogs...

So setting expectations where they belong pays off. D really need 
to improve on that aspect. I basically just involves a focus on 
honest and objective communication throughout.
Sep 18 2015