
digitalmars.D - Exotic floor() function - D is different

reply "Bob W" <nospam aol.com> writes:
The floor() function in D does not produce equivalent
results compared to a bunch of other languages
tested. The other languages were:

  dmc
  djgpp
  dmdscript
  jscript
  assembler ('87 code)

The biggest surprise was that neither dmc nor
dmdscript were able to match the D results.

The sample program below gets an input
from the command line, converts it, multiplies
it with 1e6 and adds 0.5 before calling the
floor() function. The expected result, based on
an input of 0.0000195, would be 20.0, but
D thinks it should be 19.0.

Since 0.0000195 cannot be represented
accurately in any of the usual floating point
formats, the somewhat unique D result is
probably not even a bug. But it is a major
inconvenience when comparing numerical
outputs produced by different programs.

So far I was unable to reproduce the rounding
issue in D with any other language tested.
(I have even tried OpenOffice to check.)
Before someone tells me that D uses a
different floating point format, I'd like to
mention that I have used float, double and
long double in the equivalent C programs
without any changes.


//------------------------------

import std.stdio,std.string,std.math;

int main(char[][] av) {
  if (av.length!=2) {
    printf("\nEnter Val! (e.g. 0.0000195)\n");  return(0);
  }

  double x=atof(av[1]);                    // expecting 0.0000195;
  writef("          x*1e6:%12.6f\n",x*1e6);
  writef("     floor(x..):%12.6f\n",floor(1e6*x));
  writef("  floor(.5+x..):%12.6f\n",floor(.5 + 1e6*x));
  writef("  floor(.5+co.):%12.6f\n",floor(.5 + 1e6*0.0000195));

  return(0);
}
Mar 28 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Bob W" <nospam aol.com> wrote in message
news:d2aash$a4s$1 digitaldaemon.com...
 The floor() function in D does not produce equivalent
 results compared to a bunch of other languages
 tested. The other languages were:

   dmc
   djgpp
   dmdscript
   jscript
   assembler ('87 code)

 The biggest surprise was that neither dmc nor
 dmdscript were able to match the D results.

 The sample program below gets an input
 from the command line, converts it, multiplies
 it with 1e6 and adds 0.5 before calling the
 floor() function. The expected result, based on
 an input of 0.0000195, would be 20.0, but
 D thinks it should be 19.0.

 Since 0.0000195 cannot be represented
 accurately in any of the usual floating point
 formats, the somewhat unique D result is
 probably not even a bug. But it is a major
 inconvenience when comparing numerical
 outputs produced by different programs.

 So far I was unable to reproduce the rounding
 issue in D with any other language tested.
 (I have even tried OpenOffice to check.)
 Before someone tells me that D uses a
 different floating point format, I'd like to
 mention that I have used float, double and
 long double in the equivalent C programs
 without any changes.

What you're seeing is the result of using 80 bit precision, which is what D
uses in internal calculations. .0000195 is not represented exactly; to print
the number, it is rounded. So, depending on how many bits of precision there
are in the representation, there might be one bit, 63 bits to the right,
under the "5", so floor() will chop it down.

Few C compilers support 80 bit long doubles; they implement them as 64 bit
ones. Very few programs use 80 bit reals. The std.math.floor function uses
80 bit precision. If you want to use the C 64 bit one instead, add this
declaration:

    extern (C) double floor(double);

Then the results are:

            x*1e6:   19.500000
       floor(x..):   19.000000
    floor(.5+x..):   20.000000
    floor(.5+co.):   20.000000

I suggest that while it's a reasonable thing to require a minimum number of
floating point bits for a computation, it's probably not a good idea to
require a maximum.
Mar 30 2005
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 30 Mar 2005 21:43:07 -0800, Walter wrote:

 "Bob W" <nospam aol.com> wrote in message
 news:d2aash$a4s$1 digitaldaemon.com...
 The floor() function in D does not produce equivalent
 results compared to a bunch of other languages
 tested. The other languages were:

   dmc
   djgpp
   dmdscript
   jscript
   assembler ('87 code)

 The biggest surprise was that neither dmc nor
 dmdscript were able to match the D results.

 The sample program below gets an input
 from the command line, converts it, multiplies
 it with 1e6 and adds 0.5 before calling the
 floor() function. The expected result, based on
 an input of 0.0000195, would be 20.0, but
 D thinks it should be 19.0.

 Since 0.0000195 cannot be represented
 accurately in any of the usual floating point
 formats, the somewhat unique D result is
 probably not even a bug. But it is a major
 inconvenience when comparing numerical
 outputs produced by different programs.

 So far I was unable to reproduce the rounding
 issue in D with any other language tested.
 (I have even tried OpenOffice to check.)
 Before someone tells me that D uses a
 different floating point format, I'd like to
 mention that I have used float, double and
 long double in the equivalent C programs
 without any changes.

 What you're seeing is the result of using 80 bit precision, which is what D
 uses in internal calculations. .0000195 is not represented exactly; to print
 the number, it is rounded. So, depending on how many bits of precision there
 are in the representation, there might be one bit, 63 bits to the right,
 under the "5", so floor() will chop it down.

 Few C compilers support 80 bit long doubles; they implement them as 64 bit
 ones. Very few programs use 80 bit reals. The std.math.floor function uses
 80 bit precision. If you want to use the C 64 bit one instead, add this
 declaration:

     extern (C) double floor(double);

 Then the results are:

             x*1e6:   19.500000
        floor(x..):   19.000000
     floor(.5+x..):   20.000000
     floor(.5+co.):   20.000000

 I suggest that while it's a reasonable thing to require a minimum number of
 floating point bits for a computation, it's probably not a good idea to
 require a maximum.

I can follow what you say, but can you explain the output of the program
below? There appears to be a difference in the way variables and literals
are treated.

import std.stdio;
import std.math;
import std.string;

void main() {

  float  x;
  double y;
  real   z;

  x = 0.0000195;
  y = 0.0000195;
  z = 0.0000195;
  writefln("                                Raw        Floor");
  writefln("Using float  variable: %12.6f %12.6f",
                    (.5 + 1e6*x), floor(.5 + 1e6*x));
  writefln("Using double variable: %12.6f %12.6f",
                    (.5 + 1e6*y), floor(.5 + 1e6*y));
  writefln("Using real   variable: %12.6f %12.6f",
                    (.5 + 1e6*z), floor(.5 + 1e6*z));
  writefln("Using float   literal: %12.6f %12.6f",
                    (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
  writefln("Using double  literal: %12.6f %12.6f",
                    (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));
  writefln("Using real    literal: %12.6f %12.6f",
                    (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));
}

----------
I get the following output...
----------

                              Raw         Floor
Using float  variable:   19.999999    19.000000
Using double variable:   20.000000    19.000000
Using real   variable:   20.000000    19.000000
Using float   literal:   19.999999    20.000000
Using double  literal:   20.000000    20.000000
Using real    literal:   20.000000    20.000000

-- 
Derek
Melbourne, Australia
31/03/2005 6:43:48 PM
Mar 31 2005
parent reply "Bob W" <nospam aol.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:7di6xztjokyz.6vnxzcx1d7l8.dlg 40tude.net...
 On Wed, 30 Mar 2005 21:43:07 -0800, Walter wrote:

 "Bob W" <nospam aol.com> wrote in message
 news:d2aash$a4s$1 digitaldaemon.com...
 The floor() function in D does not produce equivalent
 results compared to a bunch of other languages
 tested. The other languages were:

   dmc
   djgpp
   dmdscript
   jscript
   assembler ('87 code)

 The biggest surprise was that neither dmc nor
 dmdscript were able to match the D results.

 The sample program below gets an input
 from the command line, converts it, multiplies
 it with 1e6 and adds 0.5 before calling the
 floor() function. The expected result, based on
 an input of 0.0000195, would be 20.0, but
 D thinks it should be 19.0.

 Since 0.0000195 cannot be represented
 accurately in any of the usual floating point
 formats, the somewhat unique D result is
 probably not even a bug. But it is a major
 inconvenience when comparing numerical
 outputs produced by different programs.

 So far I was unable to reproduce the rounding
 issue in D with any other language tested.
 (I have even tried OpenOffice to check.)
 Before someone tells me that D uses a
 different floating point format, I'd like to
 mention that I have used float, double and
 long double in the equivalent C programs
 without any changes.

 What you're seeing is the result of using 80 bit precision, which is what D
 uses in internal calculations. .0000195 is not represented exactly; to print
 the number, it is rounded. So, depending on how many bits of precision there
 are in the representation, there might be one bit, 63 bits to the right,
 under the "5", so floor() will chop it down.

 Few C compilers support 80 bit long doubles; they implement them as 64 bit
 ones. Very few programs use 80 bit reals. The std.math.floor function uses
 80 bit precision. If you want to use the C 64 bit one instead, add this
 declaration:

     extern (C) double floor(double);

 Then the results are:

             x*1e6:   19.500000
        floor(x..):   19.000000
     floor(.5+x..):   20.000000
     floor(.5+co.):   20.000000

 I suggest that while it's a reasonable thing to require a minimum number of
 floating point bits for a computation, it's probably not a good idea to
 require a maximum.

 I can follow what you say, but can you explain the output of the program
 below? There appears to be a difference in the way variables and literals
 are treated.

 import std.stdio;
 import std.math;
 import std.string;

 void main() {

   float  x;
   double y;
   real   z;

   x = 0.0000195;
   y = 0.0000195;
   z = 0.0000195;
   writefln("                                Raw        Floor");
   writefln("Using float  variable: %12.6f %12.6f",
                     (.5 + 1e6*x), floor(.5 + 1e6*x));
   writefln("Using double variable: %12.6f %12.6f",
                     (.5 + 1e6*y), floor(.5 + 1e6*y));
   writefln("Using real   variable: %12.6f %12.6f",
                     (.5 + 1e6*z), floor(.5 + 1e6*z));
   writefln("Using float   literal: %12.6f %12.6f",
                     (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
   writefln("Using double  literal: %12.6f %12.6f",
                     (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));
   writefln("Using real    literal: %12.6f %12.6f",
                     (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));
 }

 ----------
 I get the following output...
 ----------

                               Raw         Floor
 Using float  variable:   19.999999    19.000000
 Using double variable:   20.000000    19.000000
 Using real   variable:   20.000000    19.000000
 Using float   literal:   19.999999    20.000000
 Using double  literal:   20.000000    20.000000
 Using real    literal:   20.000000    20.000000

 -- 
 Derek
 Melbourne, Australia
 31/03/2005 6:43:48 PM

Great job! I could not believe it at first:

    writefln("Using float   literal: %12.6f %12.6f",
        (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));

producing the following output:

    Using float   literal:    19.999999    20.000000

Looks like floor() mutates to ceil() at times. To ensure that this is not
"down under" specific (Melbourne), I have repeated your test in the
northern hemisphere, and, not surprisingly, it did the same thing.

Now I am pretty curious to know why this is happening. We'll see if Walter
comes up with an answer .....
Mar 31 2005
parent "Walter" <newshound digitalmars.com> writes:
"Bob W" <nospam aol.com> wrote in message
news:d2i3et$27dg$1 digitaldaemon.com...
 Great job! I could not believe it first:

     writefln("Using float   literal: %12.6f %12.6f",
         (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));

 producing the following output:

    Using float   literal:    19.999999    20.000000

 We'll see if Walter comes up with an answer .....

In general, the way to see how these things work (floating, chopping,
rounding, precision, etc.) is to print things using the %a format (which
prints out ALL the bits in hexadecimal format).

As to the specific case above, let's break down each expression (using the
suffix 'd' to represent double):

    (.5 + 1e6*0.0000195f)
        => (.5d + 1e6d * cast(double)0.0000195f), result is double

    floor(.5 + 1e6*0.0000195f)
        => floor(cast(real)(.5d + 1e6d * cast(double)0.0000195f)), result is real

When writef prints a real, it adds ".5" to the last significant decimal
digit and chops. This will give DIFFERENT results for a double and for a
real. It's also DIFFERENT from the binary rounding that goes on in
intermediate floating point calculations, which adds "half a bit" (not .5)
and chops. Also, realize that internally to the FPU, a "guard bit" and a
"sticky bit" are maintained for a floating point value; these influence
rounding, and are discarded when a value leaves the FPU and is written to
memory.

What is happening here is that you start with a value that is not exactly
representable, put it through a series of precision changes and roundings,
and compare it with the result of a different series of precision changes
and roundings, expecting the results to match bit for bit. There's no way
to make that happen.
Apr 01 2005
prev sibling parent reply "Bob W" <nospam aol.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:d2g9jj$8om$1 digitaldaemon.com...
 "Bob W" <nospam aol.com> wrote in message
 news:d2aash$a4s$1 digitaldaemon.com...
 The floor() function in D does not produce equivalent
 results compared to a bunch of other languages
 tested. The other languages were:

   dmc
   djgpp
   dmdscript
   jscript
   assembler ('87 code)

 The biggest surprise was that neither dmc nor
 dmdscript were able to match the D results.

 The sample program below gets an input
 from the command line, converts it, multiplies
 it with 1e6 and adds 0.5 before calling the
 floor() function. The expected result, based on
 an input of 0.0000195, would be 20.0, but
 D thinks it should be 19.0.

 Since 0.0000195 cannot be represented
 accurately in any of the usual floating point
 formats, the somewhat unique D result is
 probably not even a bug. But it is a major
 inconvenience when comparing numerical
 outputs produced by different programs.

 So far I was unable to reproduce the rounding
 issue in D with any other language tested.
 (I have even tried OpenOffice to check.)
 Before someone tells me that D uses a
 different floating point format, I'd like to
 mention that I have used float, double and
 long double in the equivalent C programs
 without any changes.

 What you're seeing is the result of using 80 bit precision, which is what D
 uses in internal calculations. .0000195 is not represented exactly; to print
 the number, it is rounded. So, depending on how many bits of precision there
 are in the representation, there might be one bit, 63 bits to the right,
 under the "5", so floor() will chop it down.

 Few C compilers support 80 bit long doubles; they implement them as 64 bit
 ones. Very few programs use 80 bit reals. The std.math.floor function uses
 80 bit precision. If you want to use the C 64 bit one instead, add this
 declaration:

     extern (C) double floor(double);

 Then the results are:

             x*1e6:   19.500000
        floor(x..):   19.000000
     floor(.5+x..):   20.000000
     floor(.5+co.):   20.000000

 I suggest that while it's a reasonable thing to require a minimum number of
 floating point bits for a computation, it's probably not a good idea to
 require a maximum.

Thank you for your information, Walter. However, I am not convinced that
the culprit is the 80-bit floating point format. This is due to some tests
I have made programming the FPU directly. Based on my above stated example,
the 80 bit format is perfectly capable of generating the 'mainstream
result' of 20, as opposed to the lone 19 which D is producing.

Some more info, which might lead to the real problem:

- D is not entirely 80-bit based as claimed.

- Literals are converted to 64 bit first (and from there to 80 bits) at
  compile time if no suffix is used, even if the target is of type 'real'.

- atof() for example is returning a 'real' value which is obviously derived
  from a 'double', thus missing some essential bits at the end.

Example:

The hex value for 0.0000195 in 'real' can be expressed as
       3fef a393ee5e edcc20d5
or
       3fef a393ee5e edcc20d6
(due to the non-decimal fraction).

The same value converted from a 'double' would be
       3fef a393ee5e edcc2000
and therefore misses several trailing bits. This could cause the floor()
function to misbehave.

I hope this info was somewhat useful. Cheers.
Mar 31 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Bob W" <nospam aol.com> wrote in message
news:d2ieh5$2ksl$1 digitaldaemon.com...
 - D is not entirely 80-bit based as claimed.

Not true, it fully supports 80 bits.
 - Literals are converted to 64 bit first (and from there
   to 80 bits) at compile time if no suffix is used, even
   if the target is of type 'real'.

Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".
 - atof() for example is returning a 'real' value which is
   obviously derived from a 'double', thus missing some
   essential bits at the end.

Check out std.math2.atof(). It's fully 80 bit.
 Example:

 The hex value for 0.0000195 in 'real' can be expressed as
        3fef a393ee5e edcc20d5
 or
        3fef a393ee5e edcc20d6
 (due to the non-decimal fraction).

 The same value converted from a 'double' would be
        3fef a393ee5e edcc2000
 and therefore misses several trailing bits. This could
 cause the floor() function to misbehave.


 I hope this info was somewhat useful.

Perhaps the following program will help:

import std.stdio;

void main()
{
    writefln("float            %a", 0.0000195F);
    writefln("double           %a", 0.0000195);
    writefln("real             %a", 0.0000195L);
    writefln("cast(real)float  %a", cast(real)0.0000195F);
    writefln("cast(real)double %a", cast(real)0.0000195);
    writefln("cast(real)real   %a", cast(real)0.0000195L);
    writefln("float            %a", 0.0000195F * 7 - 195);
    writefln("double           %a", 0.0000195 * 7 - 195);
    writefln("real             %a", 0.0000195L * 7 - 195);
}

which prints:

float            0x1.4727dcp-16
double           0x1.4727dcbddb984p-16
real             0x1.4727dcbddb9841acp-16
cast(real)float  0x1.4727dcp-16
cast(real)double 0x1.4727dcbddb984p-16
cast(real)real   0x1.4727dcbddb9841acp-16
float            -0x1.85ffeep+7
double           -0x1.85ffee1bd1edap+7
real             -0x1.85ffee1bd1ed9dfep+7
Apr 01 2005
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 1 Apr 2005 15:03:02 -0800, Walter wrote:

 "Bob W" <nospam aol.com> wrote in message
 news:d2ieh5$2ksl$1 digitaldaemon.com...
 - D is not entirely 80-bit based as claimed.

Not true, it fully supports 80 bits.
 - Literals are converted to 64 bit first (and from there
   to 80 bits) at compile time if no suffix is used, even
   if the target is of type 'real'.

Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".
 - atof() for example is returning a 'real' value which is
   obviously derived from a 'double', thus missing some
   essential bits at the end.

Check out std.math2.atof(). It's fully 80 bit.
 Example:

 The hex value for 0.0000195 in 'real' can be expressed as
        3fef a393ee5e edcc20d5
 or
        3fef a393ee5e edcc20d6
 (due to the non-decimal fraction).

 The same value converted from a 'double' would be
        3fef a393ee5e edcc2000
 and therefore misses several trailing bits. This could
 cause the floor() function to misbehave.


 I hope this info was somewhat useful.

 Perhaps the following program will help:

 import std.stdio;

 void main()
 {
     writefln("float            %a", 0.0000195F);
     writefln("double           %a", 0.0000195);
     writefln("real             %a", 0.0000195L);
     writefln("cast(real)float  %a", cast(real)0.0000195F);
     writefln("cast(real)double %a", cast(real)0.0000195);
     writefln("cast(real)real   %a", cast(real)0.0000195L);
     writefln("float            %a", 0.0000195F * 7 - 195);
     writefln("double           %a", 0.0000195 * 7 - 195);
     writefln("real             %a", 0.0000195L * 7 - 195);
 }

 which prints:

 float            0x1.4727dcp-16
 double           0x1.4727dcbddb984p-16
 real             0x1.4727dcbddb9841acp-16
 cast(real)float  0x1.4727dcp-16
 cast(real)double 0x1.4727dcbddb984p-16
 cast(real)real   0x1.4727dcbddb9841acp-16
 float            -0x1.85ffeep+7
 double           -0x1.85ffee1bd1edap+7
 real             -0x1.85ffee1bd1ed9dfep+7

I repeat, (I think) I understand what you are saying, but can you explain
the output of this ...

<code>
import std.stdio;
import std.math;
import std.string;

void main() {

  float  x;
  double y;
  real   z;

  x = 0.0000195;
  y = 0.0000195;
  z = 0.0000195;
  writefln("                       %24s %24s","Raw","Floor");
  writefln("Using float  variable: %24a %24a",
                    (.5 + 1e6*x), floor(.5 + 1e6*x));
  writefln("Using double variable: %24a %24a",
                    (.5 + 1e6*y), floor(.5 + 1e6*y));
  writefln("Using real   variable: %24a %24a",
                    (.5 + 1e6*z), floor(.5 + 1e6*z));
  writefln("Using float   literal: %24a %24a",
                    (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
  writefln("Using double  literal: %24a %24a",
                    (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));
  writefln("Using real    literal: %24a %24a",
                    (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));
}
</code>
______________
Output is ...

                                            Raw        Floor
Using float  variable:         0x1.3fffff4afp+4     0x1.3p+4
Using double variable:                 0x1.4p+4     0x1.3p+4
Using real   variable:  0x1.3ffffffffffffe68p+4     0x1.3p+4
Using float   literal:         0x1.3fffff4afp+4     0x1.4p+4
Using double  literal:                 0x1.4p+4     0x1.4p+4
Using real    literal:  0x1.4000000000000002p+4     0x1.4p+4

There seems to be different treatment of literals and variables. Even apart
from that, given the values above, I can understand the floor behaviour
except for lines 2 (double variable) and 6 (real literal).

-- 
Derek Parnell
Melbourne, Australia
2/04/2005 10:19:43 AM
Apr 01 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message
news:eouhnxxkjb80$.clvse1356mlr.dlg 40tude.net...
 There seems to be different treatment of literals and variables.

No, there isn't. The difference comes from the conversion that happens when you assign the literal to z. Use the 'L' suffix for a real literal.
Apr 01 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 1 Apr 2005 18:50:40 -0800, Walter wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:eouhnxxkjb80$.clvse1356mlr.dlg 40tude.net...
 There seems to be different treatment of literals and variables.

 No, there isn't. The difference comes from the conversion that happens when you assign the literal to z. Use the 'L' suffix for a real literal.

Ok, I did that. And I still can't explain the output.

                                            Raw        Floor
Using float  variable:         0x1.3fffff4afp+4     0x1.3p+4
Using double variable:                 0x1.4p+4     0x1.3p+4
Using real   variable:  0x1.4000000000000002p+4     0x1.4p+4
Using float   literal:         0x1.3fffff4afp+4     0x1.4p+4
Using double  literal:                 0x1.4p+4     0x1.4p+4
Using real    literal:  0x1.4000000000000002p+4     0x1.4p+4

Look at the results for doubles. How does floor(0x1.4p+4) give 0x1.3p+4
when the expression is a variable, and give 0x1.4p+4 when the expression
is a literal?

-- 
Derek Parnell
Melbourne, Australia
2/04/2005 3:34:22 PM
Apr 01 2005
next sibling parent Derek Parnell <derek psych.ward> writes:
On Sat, 2 Apr 2005 15:39:01 +1000, Derek Parnell wrote:

I've reformatted the display to make it easier to spot the anomaly.

                                            Raw                    Floor
Using float  variable:         0x1.3fffff4afp+4                 0x1.3p+4
Using float   literal:         0x1.3fffff4afp+4                 0x1.4p+4

Using double variable:                 0x1.4p+4                 0x1.3p+4
Using double  literal:                 0x1.4p+4                 0x1.4p+4

Using real   variable:  0x1.4000000000000002p+4                 0x1.4p+4
Using real    literal:  0x1.4000000000000002p+4                 0x1.4p+4

And here is the program that created the above ...
<code>
import std.stdio;
import std.math;
import std.string;

void main() {

  float  x;
  double y;
  real   z;


  x = 0.0000195F;
  y = 0.0000195;
  z = 0.0000195L;
  writefln("                       %24s %24s","Raw","Floor");
  writefln("Using float  variable: %24a %24a", 
                    (.5 + 1e6*x), floor(.5 + 1e6*x));
  writefln("Using float   literal: %24a %24a", 
                    (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
                    
  writefln("");                    
  writefln("Using double variable: %24a %24a", 
                    (.5 + 1e6*y), floor(.5 + 1e6*y));
  writefln("Using double  literal: %24a %24a", 
                    (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));

  writefln("");                    
  writefln("Using real   variable: %24a %24a", 
                    (.5 + 1e6*z), floor(.5 + 1e6*z));

  writefln("Using real    literal: %24a %24a", 
                    (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));


}
</code>
-- 
Derek Parnell
Melbourne, Australia
2/04/2005 4:48:12 PM
Apr 01 2005
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message
news:124cwpdauczht$.1wqi8sqkdi4ec.dlg 40tude.net...
 Ok, I did that. And I still can't explain the output.

Recall that, at runtime, the intermediate values are allowed to be carried
out to 80 bits. So,

    floor(.5 + 1e6*y)

is evaluated as:

    floor(cast(real).5 + cast(real)(1e6) * cast(real)y);

whereas:

    floor(.5 + 1e6*0.0000195)

is evaluated as:

    floor(cast(real)(.5 + 1e6*0.0000195))

hence the difference in result.
Apr 02 2005
next sibling parent "Bob W" <nospam aol.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:d2lodh$2qhf$1 digitaldaemon.com...
 "Derek Parnell" <derek psych.ward> wrote in message
 news:124cwpdauczht$.1wqi8sqkdi4ec.dlg 40tude.net...
 Ok, I did that. And I still can't explain the output.

 Recall that, at runtime, the intermediate values are allowed to be carried
 out to 80 bits. So,

     floor(.5 + 1e6*y)

 is evaluated as:

     floor(cast(real).5 + cast(real)(1e6) * cast(real)y);

 whereas:

     floor(.5 + 1e6*0.0000195)

 is evaluated as:

     floor(cast(real)(.5 + 1e6*0.0000195))

 hence the difference in result.

It's C legacy hidden in the way the compiler parses this code. You'll be
facing this kind of question over and over again, unless you move a step
further away from C and let the compiler treat unsuffixed literals as the
"internal compiler floating point precision format".

See my thread: "80 Bit Challenge"
Apr 02 2005
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Sat, 2 Apr 2005 01:23:46 -0800, Walter wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:124cwpdauczht$.1wqi8sqkdi4ec.dlg 40tude.net...
 Ok, I did that. And I still can't explain the output.

 Recall that, at runtime, the intermediate values are allowed to be carried
 out to 80 bits. So,

     floor(.5 + 1e6*y)

 is evaluated as:

     floor(cast(real).5 + cast(real)(1e6) * cast(real)y);

 whereas:

     floor(.5 + 1e6*0.0000195)

 is evaluated as:

     floor(cast(real)(.5 + 1e6*0.0000195))

 hence the difference in result.

Got it. So to summarize: in expressions that contain at least one double
variable, each term is promoted to real before expression evaluation, but
if the expression only contains double literals, then the terms are not
promoted to real.

Why did you decide to have this anomaly?

-- 
Derek Parnell
Melbourne, Australia
2/04/2005 11:46:44 PM
Apr 02 2005
next sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message
news:lsphadeuh4s3.gjqbum65kx87$.dlg 40tude.net...
 So to summarize, in expressions that contain at least one double variable,
 each term is promoted to real before expression evaluation, but if the
 expression only contains double literals, then the terms are not promoted
 to real.

 Why did you decide to have this anomaly?

It's the way C works.
Apr 02 2005
parent Derek Parnell <derek psych.ward> writes:
On Sat, 2 Apr 2005 10:04:32 -0800, Walter wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:lsphadeuh4s3.gjqbum65kx87$.dlg 40tude.net...
 So to summarize, in expressions that contain at least one double variable,
 each term is promoted to real before expression evaluation, but if the
 expression only contains double literals, then the terms are not promoted
 to real.

 Why did you decide to have this anomaly?

It's the way C works.

I understand. And here I was thinking that D was meant to be better than C.
My bad.

-- 
Derek Parnell
Melbourne, Australia
3/04/2005 8:13:09 AM
Apr 02 2005
prev sibling parent reply "Bob W" <nospam aol.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:lsphadeuh4s3.gjqbum65kx87$.dlg 40tude.net...
 On Sat, 2 Apr 2005 01:23:46 -0800, Walter wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:124cwpdauczht$.1wqi8sqkdi4ec.dlg 40tude.net...
 Ok, I did that. And I still can't explain the output.

 Recall that, at runtime, the intermediate values are allowed to be carried
 out to 80 bits. So,

     floor(.5 + 1e6*y)

 is evaluated as:

     floor(cast(real).5 + cast(real)(1e6) * cast(real)y);

 whereas:

     floor(.5 + 1e6*0.0000195)

 is evaluated as:

     floor(cast(real)(.5 + 1e6*0.0000195))

 hence the difference in result.

 Got it. So to summarize: in expressions that contain at least one double
 variable, each term is promoted to real before expression evaluation, but
 if the expression only contains double literals, then the terms are not
 promoted to real.

 Why did you decide to have this anomaly?

 -- 
 Derek Parnell
 Melbourne, Australia
 2/04/2005 11:46:44 PM

Some further info:

Currently it seems that in the D language no literal is ever promoted to
real directly if it was not suffixed with an "L". You can cast(real) it,
and it will still be a double which is converted to a crippled real in the
FPU, because some of its mantissa bits went missing.

There are many exceptions though: all floating point integers (1.0, 2.0,
10.0, etc.) and fractions like 0.5, 0.25, 0.125, etc. are converted to
proper real values, because they are accurately represented in binary
floating point formats. But even they are initially doubles, which are
just unharmed by the conversion because most of their trailing mantissa
bits are zero.

Any other fractional number (e.g. 1.2) cannot be represented accurately in
the binary system, so its double representation is not equivalent to its
real representation (nor is it to the decimal literal). If such a double
is converted to real, it is missing several bits of precision, so it will
not correspond accurately to its properly converted counterpart (e.g.
1.2L).

As a summary: if you feel the need to use extended double (real) precision
in D, never ever forget the "L" for literals unless you want "special
effects". Examples:

    real r=1.2L;           // proper 80 bit real assigned to r
    real r=1.2;            // inaccurate truncated 80 bit real
    real r=2.4/2.0;        // inaccurate (2.4 loses precision)
    real r=2.4/2.0L;       // inaccurate for the same reason
    real r=2.4L/2.0;       // this one will work (2.0 == 2.0L)
    real r=2.4L/2.0L;      // that's the safe way to do it
    real r=cast(real)1.2;  // inaccurate, converted from 1.2 as a double

By the way, C does it the same way for historic reasons. Other languages
are more user friendly and I am still hoping that D might evolve in this
direction.
Apr 02 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Bob W" <nospam aol.com> wrote in message
news:d2nd96$1aos$1 digitaldaemon.com...
 By the way, C does it the same way for historic
 reasons. Other languages are more user friendly
 and I am still hoping that D might evolve in this
 direction.

Actually, many languages, mathematical programs, and even C compilers have
*dropped* support for 80 bit long doubles. At one point, Microsoft had
even made it impossible to execute 80 bit floating instructions on their
upcoming Win64 (I made some frantic phone calls to them and apparently was
the only one who ever made a case to them in favor of 80 bit long doubles;
they said they'd put the support back in). Intel doesn't support 80 bit
reals on any of their new vector floating point instructions. The 64 bit
chips only support it in a 'legacy' manner. Java, C#, VC, Javascript do
not support 80 bit reals.

I haven't done a comprehensive survey of computer languages, but as far as
I can tell D stands pretty much alone in its support for 80 bits, along
with a handful of C/C++ compilers (including DMC).

Because of this shaky operating system and chip support for 80 bits, it
would be a mistake to center D's floating point around 80 bits. Some
systems may force a reversion to 64 bits. On the other hand, ongoing
system support for 64 bit doubles is virtually guaranteed, and D generally
follows C's rules with these.

(BTW, this thread is a classic example of "build it, and they will come".
D is almost single handedly rescuing 80 bit floating point from oblivion,
since it makes such a big deal about it and has wound up interesting a lot
of people in it. Before D, as far as I could tell, nobody cared a whit
about it. I think it's great that this has struck such a responsive chord.)
Apr 03 2005
next sibling parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Walter wrote:

 I haven't done a comprehensive survey of computer languages, but as far as I
 can tell D stands pretty much alone in its support for 80 bits, along with a
 handful of C/C++ compilers (including DMC).

The thing is that the D "real" type does *not* guarantee 80 bits ? It doesn't even say the minimum size, so one can only assume 64...

I think it would be more clear to say "80 bits minimum", and then future CPUs/code is still free to use 128-bit extended doubles too ? (since D allows all FP calculations to be done at a higher precision)

This would be simplified by padding the 80-bit floating point to a full 16 bytes, by adding zeros (as suggested by performance anyway).

And then, with both 128-bit integers and 128-bit floating point, D would truly be equipped to face both today (64) and tomorrow... (and with a "real" alias, it's still the "largest hardware implemented")

Just my 2 öre,
--anders
Apr 03 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:d2og5l$27nh$3 digitaldaemon.com...
 Walter wrote:

 I haven't done a comprehensive survey of computer languages, but as far


 can tell D stands pretty much alone in its support for 80 bits, along


 handful of C/C++ compilers (including DMC).

It doesn't even say the minimum size, so one can only assume 64...

Yes, it's 64. Guaranteeing 80 bits would require writing an 80 bit software emulator. I've used such emulators before, and they are really, really slow. I don't think it's practical for D floating point to be 100x slower on some machines.
 I think it would be more clear to say "80 bits minimum", and then
 future CPUs/code is still free to use 128-bit extended doubles too ?
 (since D allows all FP calculations to be done at a higher precision)

What it's supposed to be is the max precision supported by the hardware the D program is running on.
 This would be simplified by padding the 80-bit floating point to
 a full 16 bytes, by adding zeros (as suggested by performance anyway)

C compilers that support 80 bit long doubles will align them on 2 byte boundaries. To conform to the C ABI, D must follow suit.
 And then, with both 128-bit integers and 128-bit floating point,
 D would truly be equipped to face both today (64) and tomorrow...

 (and with a "real" alias, it's still the "largest hardware implemented")


 Just my 2 öre,
 --anders

Apr 03 2005
next sibling parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Walter wrote:

The thing is that the D "real" type does *not* guarantee 80 bits ?
It doesn't even say the minimum size, so one can only assume 64...

Yes, it's 64. Guaranteeing 80 bits would require writing an 80 bit software emulator. I've used such emulators before, and they are really, really slow. I don't think it's practical for D floating point to be 100x slower on some machines.

Me neither. Emulating 64-bit integers with two 32-bit registers is OK, since that is a whole lot easier. (could even be done for 128-bit ints?) But emulating 80-bit floating point ? Eww. Emulating a 128-bit double is better, but the current method is cheating a lot on the IEEE 754 spec...

No, I meant that extended precision should be *unavailable* on some CPUs. But maybe it's better to have it work in D, like long double does in C ? (i.e. it falls back to using regular doubles, possibly with warnings)

If so, just tell me it's better to have a flexible width language type, than to have some types be unavailable on certain FPU computer hardware? Since that was the whole idea... (have "extended" map to the 80-bit FP type)
 What it's supposed to be is the max precision supported by the hardware the
 D program is running on.

OK, for PPC and PPC64 that is definitely 64 bits. Not sure about SPARC ? Think I saw that Cray (or so) has 128-bit FP, but haven't got one... :-)

It seems like likely real-life values would be: 64, 80, 96 and 128 bits (PPC/PPC64, X86/X86_64, 68K, and whatever super-computer it was above).

It's possible that a future 128-bit CPU would have a 128-bit FPU too... But who knows ? (I haven't even seen the slightest hint of such a beast)
This would be simplified by padding the 80-bit floating point to
a full 16 bytes, by adding zeros (as suggested by performance anyway)

C compilers that support 80 bit long doubles will align them on 2 byte boundaries. To conform to the C ABI, D must follow suit.

I thought that was an ABI option, how to align "long double" types ? It was my understanding that it was aligned to 96 bits on X86, and to 128 bits on X86_64. But I might very well be wrong there... (it's just the impression that I got from reading the GCC manual)

i.e. it still uses the regular 80 bit floating point registers, but pads the values out with zeroes when storing them in memory.

--anders
Apr 03 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:d2pdbk$30dj$1 digitaldaemon.com...
 If so, just tell me it's better to have a flexible width language type,
 than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care, but they're screwed anyway if the hardware won't support it.
 What it's supposed to be is the max precision supported by the hardware


 D program is running on.

OK, for PPC and PPC64 that is definitely 64 bits. Not sure about SPARC ? Think I saw that Cray (or so) has 128-bit FP, but haven't got one... :-) It seems like likely real-life values would be: 64, 80, 96 and 128 bits (PPC/PPC64, X86/X86_64, 68K, and whatever super-computer it was above) It's possible that a future 128-bit CPU would have a 128-bit FPU too... But who knows ? (I haven't even seen the slightest hint of such a beast)

When I first looked at the AMD64 documentation, I was thrilled to see "m128" for a floating point type. I was crushed when I found it meant "two 64 bit doubles". I'd love to see a big honker 128 bit floating point type in hardware.
This would be simplified by padding the 80-bit floating point to
a full 16 bytes, by adding zeros (as suggested by performance anyway)

C compilers that support 80 bit long doubles will align them on 2 byte boundaries. To conform to the C ABI, D must follow suit.


The only option is to align it to what the corresponding C compiler does.
 It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.
 and to 128 bits on X86_64. But I might very well be wrong there...
 (it's just the impression that I got from reading the GCC manual)

 i.e. it still uses the regular 80 bit floating point registers,
 but pads the values out with zeroes when storing them in memory.

 --anders

Apr 03 2005
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Walter wrote:

If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care, but they're screwed anyway if the hardware won't support it.

I just fail to see how real -> double/extended is any different from the int -> short/long that C has gotten so much beating for already ?

The suggestion was to have fixed precision types:
- float     => IEEE 754 Single precision (32-bit)
- double    => IEEE 754 Double precision (64-bit)
- extended  => IEEE 754 Double Extended precision (80-bit)
- quadruple => "IEEE 754" Quadruple precision (128-bit)

And then have "real" be an alias to the largest hardware-supported type. It wouldn't break code more than if it was a variadic size type format ?
 When I first looked at the AMD64 documentation, I was thrilled to see "m128"
 for a floating point type. I was crushed when I found it meant "two 64 bit
 doubles". I'd love to see a big honker 128 bit floating point type in
 hardware.

I had a similar experience, with PPC64 and GCC, a while back... (-mlong-double-128, referring to the IBM AIX style DoubledDouble) Anyway, double-double has no chance of being full IEEE 754 spec.
It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.

You lost me ? (anyway, I suggested 128 - which *is* a power of two)

But it was my understanding that on the X86/X86_64 family of processors Windows used to use 10-byte doubles (and then removed extended?), that Linux i386(-i686) uses 12-byte doubles, and that Linux X86_64 now uses 16-byte doubles (using the GCC option of -m128bit-long-double).

And that was *not* a suggestion, but how it actually worked... Now ?

--anders
Apr 04 2005
next sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Anders F Björklund wrote:
 It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.

You lost me ? (anyway, I suggested 128 - which *is* a power of two)

Size can be anything divisible by 8 bits, i.e. any number of bytes.

Alignment has to be a power of two, and is about _where_ in memory the thing can or cannot be stored. Align 4, for example, means that the variable cannot be stored in a memory address which, taken as a number, is not divisible by 4. Only something aligned 1 can be stored at any address.
Apr 04 2005
parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Georg Wrede wrote:

 It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.

You lost me ? (anyway, I suggested 128 - which *is* a power of two)

Size can be anything divisible by 8 bits, i.e. any number of bytes.

Alignment has to be a power of two, and is about _where_ in memory the thing can or cannot be stored. Align 4, for example, means that the variable cannot be stored in a memory address which, taken as a number, is not divisible by 4. Only something aligned 1 can be stored at any address.

OK, seems like my sloppy syntax is hurting me once again... :-P

I meant that the *size* of "long double" on GCC X86 is 96 bits, so that it can be *aligned* to 32 bits always (unlike 80 bits?). Anyway, aligning to 128 bits gives better Pentium performance ? (or at least, that's what I heard... Only have doubles on PPC)

Thanks for clearing it up; in my head 96 bits was "a power of two" (since anything aligned to a multiple of a power of two is fine too).

--anders
Apr 04 2005
prev sibling next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:d2qq5u$1aau$1 digitaldaemon.com...
 Walter wrote:

If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care, but they're screwed anyway if the hardware won't support it.

I just fail to see how real -> double/extended is any different from the int -> short/long that C has gotten so much beating for already ?

The suggestion was to have fixed precision types:
- float     => IEEE 754 Single precision (32-bit)
- double    => IEEE 754 Double precision (64-bit)
- extended  => IEEE 754 Double Extended precision (80-bit)
- quadruple => "IEEE 754" Quadruple precision (128-bit)

And then have "real" be an alias to the largest hardware-supported type. It wouldn't break code more than if it was a variadic size type format ?

What happens when someone declares a variable as quadruple on a platform without hardware support? Does D plug in a software quadruple implementation? That isn't the right thing to do. That's been my whole point of bringing up Java's experience. They tried to foist too much rigor on their floating point model in the name of portability and had to redo it.
Apr 04 2005
next sibling parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Ben Hinkle wrote:

 What happens when someone declares a variable as quadruple on a platform 
 without hardware support? Does D plug in a software quadruple 
 implementation? That isn't the right thing to do. That's been my whole point 
 of bringing up Java's experience. They tried to foist too much rigor on 
 their floating point model in the name of portability and had to redo it. 

Choke... Splutter... Die. Java did not re-implement extended in software. They just ignored it... --anders
Apr 04 2005
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
I wrote, in response to Ben Hinkle:

 What happens when someone declares a variable as quadruple on a 
 platform without hardware support? 

Choke... Splutter... Die.

Just to be perfectly clear: Those are the sounds the *compiler* would make, not Ben :-) Seriously, trying to use the extended or quadruple types on platforms where they are not implemented in hardware would be a compile time error. "real" would silently fall back. --anders
Apr 04 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:d2rcfd$1ueq$2 digitaldaemon.com...
I wrote, in response to Ben Hinkle:

 What happens when someone declares a variable as quadruple on a platform 
 without hardware support?

Choke... Splutter... Die.

Just to be perfectly clear: Those are the sounds the *compiler* would make, not Ben :-)

yup, I read it that way - though I did notice I spluttered a bit this morning...
 Seriously, trying to use the extended or quadruple types on
 platforms where they are not implemented in hardware would
 be a compile time error. "real" would silently fall back.

OK, needless to say I think a builtin type that is illegal on many platforms is a mistake.
Apr 04 2005
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Ben Hinkle wrote:

 OK, needless to say I think a builtin type that
 is illegal on many platforms is a mistake. 

That is actually *not* needless to say, but Walter agrees with you on the topic.

Just as we can talk about "real" as the 64/80/96/128 bit floating point type, and not somehow assume that it will be 80 bits - then I'm perfectly fine with it. "long double" in C/C++ works just the same.

But if you *do* want to talk about the "X87" 80-bit type, then please do by all means use "extended" instead. Less confusion, all around ? (let's save "quadruple" for later, with "cent")

--anders
Apr 04 2005
parent reply "Bob W" <nospam aol.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:d2rdjp$1vfl$1 digitaldaemon.com...
 Ben Hinkle wrote:

 OK, needless to say I think a builtin type that
 is illegal on many platforms is a mistake.

That is actually *not* needless to say, but Walter agrees with you on the topic. Just as we can talk about "real" as the 64/80/96/128 bit floating point type, and not somehow assume that it will be 80 bits - then I'm perfectly fine with it. "long double" in C/C++ works just the same. But if you *do* want to talk about the "X87" 80-bit type, then please do by all means use "extended" instead. Less confusion, all around ? (let's save "quadruple" for later, with "cent") --anders

The IEEE 754r draft suggests that there won't be an 80-bit or a 96-bit format in future (whenever this may be). Ref.: my post from today about IEEE 754r
Apr 04 2005
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Bob W wrote:

But if you *do* want to talk about the "X87"
80-bit type, then please do by all means use
"extended" instead. Less confusion, all around ?
(let's save "quadruple" for later, with "cent")

The IEEE 754r suggests that there won't be a 80bit nor a 96bit format in future (whenever this may be).

According to Sun, Microsoft, IBM and Apple there isn't even such an 80-bit type today... ;-)

BTW, the 96-bit floating point format was the type preferred by the 68K family's FPU.

--anders
Apr 04 2005
parent "Walter" <newshound digitalmars.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:d2rhtd$258a$1 digitaldaemon.com...
 Bob W wrote:
 The IEEE 754r suggests that there won't be
 a 80bit nor a 96bit format in future (whenever
 this may be).

there isn't such a 80-bit type today even... ;-)

I fear it will be constant struggle to keep the chipmakers from dropping it and the OS vendors from abandoning support.
Apr 04 2005
prev sibling parent Charles Hixson <charleshixsn earthlink.net> writes:
Ben Hinkle wrote:
 "Anders F Björklund" <afb algonet.se> wrote in message 
 news:d2qq5u$1aau$1 digitaldaemon.com...
 
Walter wrote:


If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care, but they're screwed anyway if the hardware won't support it.

I just fail to see how real -> double/extended is any different from the int -> short/long that C has gotten so much beating for already ?

The suggestion was to have fixed precision types:
- float     => IEEE 754 Single precision (32-bit)
- double    => IEEE 754 Double precision (64-bit)
- extended  => IEEE 754 Double Extended precision (80-bit)
- quadruple => "IEEE 754" Quadruple precision (128-bit)

And then have "real" be an alias to the largest hardware-supported type. It wouldn't break code more than if it was a variadic size type format ?

What happens when someone declares a variable as quadruple on a platform without hardware support? Does D plug in a software quadruple implementation? That isn't the right thing to do. That's been my whole point of bringing up Java's experience. They tried to foist too much rigor on their floating point model in the name of portability and had to redo it.

depends on the available hardware, but also allow the user to define what size/precision is needed in any particular case. It may slow things down a lot if you demand 17 places of accuracy, but if you really need exactly 17, you should be able to specify it. (OTOH, Ada had the govt. paying for its development, and it still ended up as a language people didn't want to use.)
Apr 05 2005
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:d2qq5u$1aau$1 digitaldaemon.com...
 Walter wrote:
If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?



 but they're screwed anyway if the hardware won't support it.

the int -> short/long that C has gotten so much beating for already ?

Philosophically, they are the same. Practically, however, they are very different. Increasing integer sizes gives more range, and integer calculations tend to be *right* or *wrong*. Increased floating point size, however, gives more precision. So an answer is *better* or *worse*, instead of right or wrong. (Increased bits also give fp more range, but if the range is not enough, it fails cleanly with an overflow indication, not just wrapping around and giving garbage.) In other words, decreasing the bits in an fp value tends to gracefully degrade the results, which is very different from the effect on integer values.
 The suggestion was to have fixed precision types:
 - float => IEEE 754 Single precision (32-bit)
 - double => IEEE 754 Double precision (64-bit)
 - extended => IEEE 754 Double Extended precision (80-bit)
 - quadruple => "IEEE 754" Quadruple precision (128-bit)

 And then have "real" be an alias to the largest hardware-supported type.
 It wouldn't break code more than if it was a variadic size type format ?

I just don't see the advantage. If you use "extended" and your hardware doesn't support it, you're out of luck. If you use "real", your program will still compile and run. If certain characteristics of the "real" type are required, one can use static asserts on the properties of real.
It was my understanding that it was aligned to 96 bits on X86,



There is nothing set up in the operating system or linker to handle alignment to 96 bits or other values not a power of 2. Note that there is a big difference between the size of an object and what its alignment is.
 But it was my understanding that on the X86/X86_64 family of processors
 that Windows used to use 10-byte doubles (and then removed extended?),
 and that Linux i386(-i686) uses 12-byte doubles and Linux X86_64 now
 uses 16-byte doubles (using the GCC option of -m128bit-long-double)

 And that was *not* a suggestion, but how it actually worked... Now ?

Windows uses 10 byte doubles aligned on 2 byte boundaries. I'm not sure if gcc on linux does it that way or not.
Apr 04 2005
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Walter wrote:

 Philosophically, they are the same. Practically, however, they are very
 different. Increasing integer sizes gives more range, and integer
 calculations tend to be *right* or *wrong*. Floating point increased size,
 however, gives more precision. So an answer is *better* or *worse*, insted
 of right or wrong. (Increased bits also gives fp more range, but if the
 range is not enough, it fails cleanly with an overflow indication, not just
 wrapping around and giving garbage.) In other words, decreasing the bits in
 an fp value tends to gracefully degrade the results, which is very different
 from the effect on integer values.

Interesting view of it, but I think that fixed-point integer math degrades gracefully in the same way. Still with wrapping, though. Not that I've used fixed-point in quite some time, and it doesn't seem like I will be either - with the current CPUs and the new APIs.
 I just don't see the advantage. If you use "extended" and your hardware
 doesn't support it, you're out of luck. If you use "real", your program will
 still compile and run. If certain characteristics of the "real" type are
 required, one can use static asserts on the properties of real.

To be honest, I was just tired of the "real is 80 bits" all over D ? And more than a little annoyed at the ireal and creal, of course ;-) I always thought that "long double" was confusing, so now I've started to use "extended" for 80-bit and "real" for the biggest-available-type. And it's working out good so far.
And that was *not* a suggestion, but how it actually worked... Now ?

Windows uses 10 byte doubles aligned on 2 byte boundaries. I'm not sure if gcc on linux does it that way or not.

Linux on X86 aligns to 12 bytes, and Linux on X86_64 aligns to 16 bytes. --anders
Apr 04 2005
parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
 Windows uses 10 byte doubles aligned on 2 byte boundaries. I'm not 
 sure if gcc on linux does it that way or not.

Linux on X86 aligns to 12 bytes, and Linux on X86_64 aligns to 16 bytes.

Make that "Linux on X86 aligns to 4 bytes, by making the size 12". You know what I mean :-) --anders
Apr 04 2005
prev sibling parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Walter wrote:
 "Anders F Björklund" <afb algonet.se> wrote in message
 news:d2og5l$27nh$3 digitaldaemon.com...
 
Walter wrote:


I haven't done a comprehensive survey of computer languages, but as far


as I
can tell D stands pretty much alone in its support for 80 bits, along


with a
handful of C/C++ compilers (including DMC).

The thing is that the D "real" type does *not* guarantee 80 bits ? It doesn't even say the minimum size, so one can only assume 64...

Yes, it's 64. Guaranteeing 80 bits would require writing an 80 bit software emulator. I've used such emulators before, and they are really, really slow. I don't think it's practical for D floating point to be 100x slower on some machines. ...

Would implementing fixed point arithmetic improve that? Even with a 128-bit integer as the underlying type, I think it would have operational limitations, but it should be a lot faster than "100 times as slow as hardware". (OTOH, there's lots of reasons why it isn't a normal feature of languages. Apple on the 68000 series is the only computer I know of using it, and then only for specialized applications.)
Apr 05 2005
parent "Walter" <newshound digitalmars.com> writes:
"Charles Hixson" <charleshixsn earthlink.net> wrote in message
news:d2unfm$2n6s$1 digitaldaemon.com...
 Would implementing fixed point arithmetic improve that?  Even
 with a 128-bit integer as the underlying type, I think it would
 have operational limitations, but it should be a lot faster then
 "100 times as slow as hardware".  (OTOH, there's lots of reasons
 why it isn't a normal feature of languages.  Apple on the 68000
 series is the only computer I know of using it, and then only for
 specialized applications.)

If using a 128 bit fixed point would work, then one can use integer arithmetic on it. But that isn't floating point, which is a fundamentally different animal.
Apr 05 2005
prev sibling parent "Bob W" <nospam aol.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:d2od1o$25vd$1 digitaldaemon.com...
 "Bob W" <nospam aol.com> wrote in message
 news:d2nd96$1aos$1 digitaldaemon.com...
 By the way, C does it the same way for historic
 reasons. Other languages are more user friendly
 and I am still hoping that D might evolve in this
 direction.

Actually, many languages, mathematical programs, and even C compilers have *dropped* support for 80 bit long doubles. At one point, Microsoft had even made it impossible to execute 80 bit floating instructions on their upcoming Win64 (I made some frantic phone calls to them and apparently was the only one who ever made a case to them in favor of 80 bit long doubles; they said they'd put the support back in). Intel doesn't support 80 bit reals in any of their new vector floating point instructions. The 64 bit chips only support them in a 'legacy' manner. Java, C#, VC, and Javascript do not support 80 bit reals.

I haven't done a comprehensive survey of computer languages, but as far as I can tell D stands pretty much alone in its support for 80 bits, along with a handful of C/C++ compilers (including DMC).

Because of this shaky operating system and chip support for 80 bits, it would be a mistake to center D's floating point around 80 bits. Some systems may force a reversion to 64 bits. On the other hand, ongoing system support for 64 bit doubles is virtually guaranteed, and D generally follows C's rules with these.

(BTW, this thread is a classic example of "build it, and they will come". D is almost single-handedly rescuing 80 bit floating point from oblivion, since it makes such a big deal about it and has wound up interesting a lot of people in it. Before D, as far as I could tell, nobody cared a whit about it. I think it's great that this has struck such a responsive chord.)

I am probably looking like an extended precision advocate, but I am actually not. The double format was good enough for me even for statistical evaluation in almost 100% of cases. There are admittedly cases which would benefit from having 80 bit precision available, however. Therefore, although it would not be devastating for me should you ever decide to drop support for the reals, I'd still like having them available just in case they are needed.

However, if you do offer 80 bit types you'll have to assign real variables with proper real values if evaluation can be completed at compile time. Otherwise I suggest that you issue a warning where accuracy might be impaired. It is hard to believe that a new-millennium programming language would actually require people to write real r=1.2L instead of real r=1.2 in order not to produce an incorrect assignment.

Yes, I know what C programmers would want to say here; I am one of them. : ) For someone not familiar with C, the number 1.2 is not a real and is not a double either, especially if he is purely mathematically oriented. It is a decimal floating point value. He takes it for granted that 1.2 is fine whether assigned to a float or to a double. But he will refuse to understand why he has to suffix the literal to get an accurate real value.

Of course you could try to explain to him that the usual +/- 1/2 LSB error for most fractional (decimal) values converted to binary would increase to about 11 LSBs if he ever forgot to use that important "L" suffix. But would he really want to know?
Apr 03 2005
prev sibling parent reply "Bob W" <nospam aol.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:d2kk71$1pnl$1 digitaldaemon.com...
 "Bob W" <nospam aol.com> wrote in message
 news:d2ieh5$2ksl$1 digitaldaemon.com...
 - D is not entirely 80-bit based as claimed.


I still don't buy that. Example: std.string.atof() as mentioned below.
 - Literals are converted to 64 bit first (and from there
   to 80 bits) at compile time if no suffix is used, even
   if the target is of type 'real'.

Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".

Maybe there is a misunderstanding: I just wanted to mention that although it is claimed that the default internal FP format is 80 bits, the default floating point format for literals is double. The lexer (at least to my understanding) seems to confirm this.

Therefore, if someone does not want to experience a loss in precision, he ALWAYS needs to use the L suffix for literals, otherwise he gets a real which was converted from a double. e.g.:

   real r1=1.2L;  // this one is ok thanks to the suffix
   real r2=1.2;   // loss in precision, double convt'd to real
 - atof() for example is returning a 'real' value which is
   obviously derived from a 'double', thus missing some
   essential bits at the end.

Check out std.math2.atof(). It's fully 80 bit.

This one yes, but not the official Phobos version std.string.atof() which I have used. The Phobos docs suggest that atof() can be found in

1) std.math (n/a)
2) std.string

Since I have not found any atof() function in std.math, and std.math2 is not even mentioned in the Phobos docs, I've got it from std.string AND THIS ONE IS 64 BIT!

--------- quote from "c.stdlib.d" ---------
double atof(char *);
--------------- unquote -------------------

--------- quote from "string.d" -----------
real atof(char[] s)
{
    // BUG: should implement atold()
    return std.c.stdlib.atof(toStringz(s));
}
--------------- unquote -------------------

Due to heavy workload this issue might have been overlooked. Luckily I do not even have to mention the word "BUG", this was apparently already done in the author's comment line. : )

After searching the archives it looks like someone was already troubled by the multiple appearance of atof() in Nov 2004:
http://www.digitalmars.com/d/archives/digitalmars/D/bugs/2196.html
 Example:

 The hex value for 0.0000195 in 'real' can be expressed as
        3fef a393ee5e edcc20d5
 or
        3fef a393ee5e edcc20d6
 (due to the non-decimal fraction).

 The same value converted from a 'double' would be
        3fef a393ee5e edcc2000
 and therefore misses several trailing bits. This could
 cause the floor() function to misbehave.


 I hope this info was somewhat useful.

Perhaps the following program will help:

import std.stdio;

void main()
{
    writefln("float            %a", 0.0000195F);
    writefln("double           %a", 0.0000195);
    writefln("real             %a", 0.0000195L);
    writefln("cast(real)float  %a", cast(real)0.0000195F);
    writefln("cast(real)double %a", cast(real)0.0000195);
    writefln("cast(real)real   %a", cast(real)0.0000195L);
    writefln("float            %a", 0.0000195F * 7 - 195);
    writefln("double           %a", 0.0000195 * 7 - 195);
    writefln("real             %a", 0.0000195L * 7 - 195);
}

which prints:

float            0x1.4727dcp-16
double           0x1.4727dcbddb984p-16
real             0x1.4727dcbddb9841acp-16
cast(real)float  0x1.4727dcp-16
cast(real)double 0x1.4727dcbddb984p-16
cast(real)real   0x1.4727dcbddb9841acp-16
float            -0x1.85ffeep+7
double           -0x1.85ffee1bd1edap+7
real             -0x1.85ffee1bd1ed9dfep+7

In accordance with what I have mentioned before, the following program demonstrates the existence of "truncated" reals:

void main()
{
    real r1=1.2L;  // converted directly to 80 bit value
    real r2=1.2;   // parsed to 64b, then convt'd to 80b
    writefln("Genuine  : %a",r1);
    writefln("Truncated: %a",r2);
}

Output (using %a):

Genuine  : 0x1.3333333333333334p+0
Truncated: 0x1.3333333333333p+0

Alternative output:

Genuine  : 1.20000000000000000 [3fff 99999999 9999999a]
Truncated: 1.19999999999999996 [3fff 99999999 99999800]
Apr 01 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Bob W" <nospam aol.com> wrote in message
news:d2kvcc$22qa$1 digitaldaemon.com...
 I just wanted to mention that although it is claimed that
 the default internal FP format is 80 bits,

Actually, what is happening is that if you write the expression:

    double a, b, c, d;
    a = b + c + d;

then the intermediate values generated by b+c+d are allowed (but not required) to be evaluated at the largest precision available. This means that it's allowed to evaluate it as:

    a = cast(double)(cast(real)b + cast(real)c + cast(real)d);

but it is not required to evaluate it that way. This produces a slightly different result than:

    double t;
    t = b + c;
    a = t + d;

The latter is the way Java is specified to work, which turns out to be both numerically inferior and *slower* on the x86 FPU. The x86 FPU *wants* to evaluate things to 80 bits.

The D compiler's internal paths fully support 80 bit arithmetic, which means there are no surprising "choke points" where it gets truncated to 64 bits.

If the type of a literal is specified to be 'double', which is the case for no suffix, then you get 64 bits of precision. I hope you'll agree that that is the least surprising thing to do.
 Check out std.math2.atof(). It's fully 80 bit.


True, that's a bug, and I'll fix it.
Apr 01 2005
parent "Bob W" <nospam aol.com> writes:
I have started a new thread: "80 Bit Challenge",
which should serve as a reply to your post ....
Apr 02 2005