
digitalmars.D - Exotic floor() function - D is different

reply "Bob W" <nospam aol.com> writes:
The floor() function in D does not produce equivalent
results compared to a bunch of other languages
tested. The other languages were:

  dmc
  djgpp
  dmdscript
  jscript
  assembler ('87 code)

The biggest surprise was that neither dmc nor
dmdscript were able to match the D results.

The sample program below gets an input
from the command line, converts it, multiplies
it with 1e6 and adds 0.5 before calling the
floor() function. The expected result, based on
an input of 0.0000195, would be 20.0, but
D thinks it should be 19.0.

Since 0.0000195 cannot be represented
accurately in any of the usual floating point
formats, the somewhat unique D result is
probably not even a bug. But it is a major
inconvenience when comparing numerical
outputs produced by different programs.

So far I was unable to reproduce the rounding
issue in D with any other language tested.
(I have even tried OpenOffice to check.)
Before someone tells me that D uses a
different floating point format, I'd like to
mention that I have used float, double and
long double in the equivalent C programs
without any changes.


//------------------------------

import std.stdio,std.string,std.math;

int main(char[][] av) {
  if (av.length!=2) {
    printf("\nEnter Val! (e.g. 0.0000195)\n");  return(0);
  }

  double x=atof(av[1]);                    // expecting 0.0000195;
  writef("          x*1e6:%12.6f\n",x*1e6);
  writef("     floor(x..):%12.6f\n",floor(1e6*x));
  writef("  floor(.5+x..):%12.6f\n",floor(.5 + 1e6*x));
  writef("  floor(.5+co.):%12.6f\n",floor(.5 + 1e6*0.0000195));

  return(0);
}
Mar 28 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Bob W" <nospam aol.com> wrote in message
news:d2aash$a4s$1 digitaldaemon.com...
 The floor() function in D does not produce equivalent
 results compared to a bunch of other languages
 tested. The other languages were:

   dmc
   djgpp
   dmdscript
   jscript
   assembler ('87 code)

 The biggest surprise was that neither dmc nor
 dmdscript were able to match the D results.

 The sample program below gets an input
 from the command line, converts it, multiplies
 it with 1e6 and adds 0.5 before calling the
 floor() function. The expected result, based on
 an input of 0.0000195, would be 20.0, but
 D thinks it should be 19.0.

 Since 0.0000195 cannot be represented
 accurately in any of the usual floating point
 formats, the somewhat unique D result is
 probably not even a bug. But it is a major
 inconvenience when comparing numerical
 outputs produced by different programs.

 So far I was unable to reproduce the rounding
 issue in D with any other language tested.
 (I have even tried OpenOffice to check.)
 Before someone tells me that D uses a
 different floating point format, I'd like to
 mention that I have used float, double and
 long double in the equivalent C programs
 without any changes.

What you're seeing is the result of using 80 bit precision, which is what D
uses in internal calculations. .0000195 is not represented exactly; to print
the number, it is rounded. So, depending on how many bits of precision there
are in the representation, there might be one bit, 63 bits to the right,
under the "5", so floor() will chop it down.

Few C compilers support 80 bit long doubles; they implement them as 64 bit
ones. Very few programs use 80 bit reals. The std.math.floor function uses
80 bit precision. If you want to use the C 64 bit one instead, add this
declaration:

    extern (C) double floor(double);

Then the results are:

            x*1e6:   19.500000
       floor(x..):   19.000000
    floor(.5+x..):   20.000000
    floor(.5+co.):   20.000000

I suggest that while it's a reasonable thing to require a minimum number of
floating point bits for a computation, it's probably not a good idea to
require a maximum.
Mar 30 2005
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 30 Mar 2005 21:43:07 -0800, Walter wrote:

 "Bob W" <nospam aol.com> wrote in message
 news:d2aash$a4s$1 digitaldaemon.com...
 The floor() function in D does not produce equivalent
 results compared to a bunch of other languages
 tested. The other languages were:

   dmc
   djgpp
   dmdscript
   jscript
   assembler ('87 code)

 The biggest surprise was that neither dmc nor
 dmdscript were able to match the D results.

 The sample program below gets an input
 from the command line, converts it, multiplies
 it with 1e6 and adds 0.5 before calling the
 floor() function. The expected result, based on
 an input of 0.0000195, would be 20.0, but
 D thinks it should be 19.0.

 Since 0.0000195 cannot be represented
 accurately in any of the usual floating point
 formats, the somewhat unique D result is
 probably not even a bug. But it is a major
 inconvenience when comparing numerical
 outputs produced by different programs.

 So far I was unable to reproduce the rounding
 issue in D with any other language tested.
 (I have even tried OpenOffice to check.)
 Before someone tells me that D uses a
 different floating point format, I'd like to
 mention that I have used float, double and
 long double in the equivalent C programs
 without any changes.

 What you're seeing is the result of using 80 bit precision, which is what D
 uses in internal calculations. .0000195 is not represented exactly; to print
 the number, it is rounded. So, depending on how many bits of precision there
 are in the representation, there might be one bit, 63 bits to the right,
 under the "5", so floor() will chop it down.

 Few C compilers support 80 bit long doubles; they implement them as 64 bit
 ones. Very few programs use 80 bit reals. The std.math.floor function uses
 80 bit precision. If you want to use the C 64 bit one instead, add this
 declaration:

     extern (C) double floor(double);

 Then the results are:

             x*1e6:   19.500000
        floor(x..):   19.000000
     floor(.5+x..):   20.000000
     floor(.5+co.):   20.000000

 I suggest that while it's a reasonable thing to require a minimum number of
 floating point bits for a computation, it's probably not a good idea to
 require a maximum.

I can follow what you say, but can you explain the output of the program
below? There appears to be a difference in the way variables and literals
are treated.

import std.stdio;
import std.math;
import std.string;

void main() {

  float  x;
  double y;
  real   z;

  x = 0.0000195;
  y = 0.0000195;
  z = 0.0000195;
  writefln("                                Raw        Floor");
  writefln("Using float  variable: %12.6f %12.6f",
                    (.5 + 1e6*x), floor(.5 + 1e6*x));
  writefln("Using double variable: %12.6f %12.6f",
                    (.5 + 1e6*y), floor(.5 + 1e6*y));
  writefln("Using real   variable: %12.6f %12.6f",
                    (.5 + 1e6*z), floor(.5 + 1e6*z));
  writefln("Using float   literal: %12.6f %12.6f",
                    (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
  writefln("Using double  literal: %12.6f %12.6f",
                    (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));
  writefln("Using real    literal: %12.6f %12.6f",
                    (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));
}

----------
I get the following output...
----------

                              Raw         Floor
Using float  variable:   19.999999    19.000000
Using double variable:   20.000000    19.000000
Using real   variable:   20.000000    19.000000
Using float   literal:   19.999999    20.000000
Using double  literal:   20.000000    20.000000
Using real    literal:   20.000000    20.000000

-- 
Derek
Melbourne, Australia
31/03/2005 6:43:48 PM
Mar 31 2005
parent reply "Bob W" <nospam aol.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:7di6xztjokyz.6vnxzcx1d7l8.dlg 40tude.net...
 On Wed, 30 Mar 2005 21:43:07 -0800, Walter wrote:

 "Bob W" <nospam aol.com> wrote in message
 news:d2aash$a4s$1 digitaldaemon.com...
 The floor() function in D does not produce equivalent
 results compared to a bunch of other languages
 tested. The other languages were:

   dmc
   djgpp
   dmdscript
   jscript
   assembler ('87 code)

 The biggest surprise was that neither dmc nor
 dmdscript were able to match the D results.

 The sample program below gets an input
 from the command line, converts it, multiplies
 it with 1e6 and adds 0.5 before calling the
 floor() function. The expected result, based on
 an input of 0.0000195, would be 20.0, but
 D thinks it should be 19.0.

 Since 0.0000195 cannot be represented
 accurately in any of the usual floating point
 formats, the somewhat unique D result is
 probably not even a bug. But it is a major
 inconvenience when comparing numerical
 outputs produced by different programs.

 So far I was unable to reproduce the rounding
 issue in D with any other language tested.
 (I have even tried OpenOffice to check.)
 Before someone tells me that D uses a
 different floating point format, I'd like to
 mention that I have used float, double and
 long double in the equivalent C programs
 without any changes.

 What you're seeing is the result of using 80 bit precision, which is what D
 uses in internal calculations. .0000195 is not represented exactly; to print
 the number, it is rounded. So, depending on how many bits of precision there
 are in the representation, there might be one bit, 63 bits to the right,
 under the "5", so floor() will chop it down.

 Few C compilers support 80 bit long doubles; they implement them as 64 bit
 ones. Very few programs use 80 bit reals. The std.math.floor function uses
 80 bit precision. If you want to use the C 64 bit one instead, add this
 declaration:

     extern (C) double floor(double);

 Then the results are:

             x*1e6:   19.500000
        floor(x..):   19.000000
     floor(.5+x..):   20.000000
     floor(.5+co.):   20.000000

 I suggest that while it's a reasonable thing to require a minimum number of
 floating point bits for a computation, it's probably not a good idea to
 require a maximum.

 I can follow what you say, but can you explain the output of the program
 below? There appears to be a difference in the way variables and literals
 are treated.

 import std.stdio;
 import std.math;
 import std.string;

 void main() {

   float  x;
   double y;
   real   z;

   x = 0.0000195;
   y = 0.0000195;
   z = 0.0000195;
   writefln("                                Raw        Floor");
   writefln("Using float  variable: %12.6f %12.6f",
                     (.5 + 1e6*x), floor(.5 + 1e6*x));
   writefln("Using double variable: %12.6f %12.6f",
                     (.5 + 1e6*y), floor(.5 + 1e6*y));
   writefln("Using real   variable: %12.6f %12.6f",
                     (.5 + 1e6*z), floor(.5 + 1e6*z));
   writefln("Using float   literal: %12.6f %12.6f",
                     (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
   writefln("Using double  literal: %12.6f %12.6f",
                     (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));
   writefln("Using real    literal: %12.6f %12.6f",
                     (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));
 }

 ----------
 I get the following output...
 ----------

                               Raw         Floor
 Using float  variable:   19.999999    19.000000
 Using double variable:   20.000000    19.000000
 Using real   variable:   20.000000    19.000000
 Using float   literal:   19.999999    20.000000
 Using double  literal:   20.000000    20.000000
 Using real    literal:   20.000000    20.000000

 -- 
 Derek
 Melbourne, Australia
 31/03/2005 6:43:48 PM

Great job! I could not believe it at first:

    writefln("Using float   literal: %12.6f %12.6f",
        (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));

producing the following output:

    Using float   literal:    19.999999    20.000000

Looks like floor() mutates to ceil() at times. To ensure that this is not
"down under" specific (Melbourne), I have repeated your test in the
northern hemisphere, and, not surprisingly, it did the same thing.

Now I am pretty curious to know why this is happening. We'll see if Walter
comes up with an answer .....
Mar 31 2005
parent "Walter" <newshound digitalmars.com> writes:
"Bob W" <nospam aol.com> wrote in message
news:d2i3et$27dg$1 digitaldaemon.com...
 Great job! I could not believe it first:

     writefln("Using float   literal: %12.6f %12.6f",
         (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));

 producing the following output:

    Using float   literal:    19.999999    20.000000

 We'll see if Walter comes up with an answer .....

In general, the way to see how these things work (floating, chopping,
rounding, precision, etc.) is to print things using the %a format (which
prints out ALL the bits in hexadecimal format).

As to the specific case above, let's break down each expression (using the
suffix 'd' to represent double):

    (.5 + 1e6*0.0000195f)
        => (.5d + 1e6d * cast(double)0.0000195f), result is double

    floor(.5 + 1e6*0.0000195f)
        => floor(cast(real)(.5d + 1e6d * cast(double)0.0000195f)), result is real

When writef prints a real, it adds ".5" to the last significant decimal
digit and chops. This will give DIFFERENT results for a double and for a
real. It's also DIFFERENT from the binary rounding that goes on in
intermediate floating point calculations, which adds "half a bit" (not .5)
and chops. Also, realize that internally to the FPU, a "guard bit" and a
"sticky bit" are maintained for a floating point value; these influence
rounding, and are discarded when a value leaves the FPU and is written to
memory.

What is happening here is that you start with a value that is not exactly
representable, put it through a series of precision changes and roundings,
and compare it with the result of a different series of precision changes
and roundings, expecting the results to match bit for bit. There's no way
to make that happen.
Apr 01 2005
prev sibling parent reply "Bob W" <nospam aol.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:d2g9jj$8om$1 digitaldaemon.com...
 "Bob W" <nospam aol.com> wrote in message
 news:d2aash$a4s$1 digitaldaemon.com...
 The floor() function in D does not produce equivalent
 results compared to a bunch of other languages
 tested. The other languages were:

   dmc
   djgpp
   dmdscript
   jscript
   assembler ('87 code)

 The biggest surprise was that neither dmc nor
 dmdscript were able to match the D results.

 The sample program below gets an input
 from the command line, converts it, multiplies
 it with 1e6 and adds 0.5 before calling the
 floor() function. The expected result, based on
 an input of 0.0000195, would be 20.0, but
 D thinks it should be 19.0.

 Since 0.0000195 cannot be represented
 accurately in any of the usual floating point
 formats, the somewhat unique D result is
 probably not even a bug. But it is a major
 inconvenience when comparing numerical
 outputs produced by different programs.

 So far I was unable to reproduce the rounding
 issue in D with any other language tested.
 (I have even tried OpenOffice to check.)
 Before someone tells me that D uses a
 different floating point format, I'd like to
 mention that I have used float, double and
 long double in the equivalent C programs
 without any changes.

 What you're seeing is the result of using 80 bit precision, which is what D
 uses in internal calculations. .0000195 is not represented exactly; to print
 the number, it is rounded. So, depending on how many bits of precision there
 are in the representation, there might be one bit, 63 bits to the right,
 under the "5", so floor() will chop it down.

 Few C compilers support 80 bit long doubles; they implement them as 64 bit
 ones. Very few programs use 80 bit reals. The std.math.floor function uses
 80 bit precision. If you want to use the C 64 bit one instead, add this
 declaration:

     extern (C) double floor(double);

 Then the results are:

             x*1e6:   19.500000
        floor(x..):   19.000000
     floor(.5+x..):   20.000000
     floor(.5+co.):   20.000000

 I suggest that while it's a reasonable thing to require a minimum number of
 floating point bits for a computation, it's probably not a good idea to
 require a maximum.

Thank you for your information, Walter. However, I am not convinced that
the culprit is the 80-bit floating point format. This is due to some tests
I have made programming the FPU directly. Based on my above stated example,
the 80 bit format is perfectly capable of generating the 'mainstream
result' of 20, as opposed to the lone 19 which D is producing.

Some more info, which might lead to the real problem:

- D is not entirely 80-bit based as claimed.

- Literals are converted to 64 bit first (and from there to 80 bits) at
  compile time if no suffix is used, even if the target is of type 'real'.

- atof() for example is returning a 'real' value which is obviously derived
  from a 'double', thus missing some essential bits at the end.

Example:

The hex value for 0.0000195 in 'real' can be expressed as
       3fef a393ee5e edcc20d5
or
       3fef a393ee5e edcc20d6
(due to the non-decimal fraction).

The same value converted from a 'double' would be
       3fef a393ee5e edcc2000
and therefore misses several trailing bits. This could cause the floor()
function to misbehave.

I hope this info was somewhat useful. Cheers.
Mar 31 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Bob W" <nospam aol.com> wrote in message
news:d2ieh5$2ksl$1 digitaldaemon.com...
 - D is not entirely 80-bit based as claimed.

Not true, it fully supports 80 bits.
 - Literals are converted to 64 bit first (and from there
   to 80 bits) at compile time if no suffix is used, even
   if the target is of type 'real'.

Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".
 - atof() for example is returning a 'real' value which is
   obviously derived from a 'double', thus missing some
   essential bits at the end.

Check out std.math2.atof(). It's fully 80 bit.
 Example:

 The hex value for 0.0000195 in 'real' can be expressed as
        3fef a393ee5e edcc20d5
 or
        3fef a393ee5e edcc20d6
 (due to the non-decimal fraction).

 The same value converted from a 'double' would be
        3fef a393ee5e edcc2000
 and therefore misses several trailing bits. This could
 cause the floor() function to misbehave.


 I hope this info was somewhat useful.

Perhaps the following program will help:

import std.stdio;

void main()
{
    writefln("float            %a", 0.0000195F);
    writefln("double           %a", 0.0000195);
    writefln("real             %a", 0.0000195L);
    writefln("cast(real)float  %a", cast(real)0.0000195F);
    writefln("cast(real)double %a", cast(real)0.0000195);
    writefln("cast(real)real   %a", cast(real)0.0000195L);
    writefln("float            %a", 0.0000195F * 7 - 195);
    writefln("double           %a", 0.0000195 * 7 - 195);
    writefln("real             %a", 0.0000195L * 7 - 195);
}

which prints:

float            0x1.4727dcp-16
double           0x1.4727dcbddb984p-16
real             0x1.4727dcbddb9841acp-16
cast(real)float  0x1.4727dcp-16
cast(real)double 0x1.4727dcbddb984p-16
cast(real)real   0x1.4727dcbddb9841acp-16
float            -0x1.85ffeep+7
double           -0x1.85ffee1bd1edap+7
real             -0x1.85ffee1bd1ed9dfep+7
Apr 01 2005
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 1 Apr 2005 15:03:02 -0800, Walter wrote:

 "Bob W" <nospam aol.com> wrote in message
 news:d2ieh5$2ksl$1 digitaldaemon.com...
 - D is not entirely 80-bit based as claimed.

Not true, it fully supports 80 bits.
 - Literals are converted to 64 bit first (and from there
   to 80 bits) at compile time if no suffix is used, even
   if the target is of type 'real'.

Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".
 - atof() for example is returning a 'real' value which is
   obviously derived from a 'double', thus missing some
   essential bits at the end.

Check out std.math2.atof(). It's fully 80 bit.
 Example:

 The hex value for 0.0000195 in 'real' can be expressed as
        3fef a393ee5e edcc20d5
 or
        3fef a393ee5e edcc20d6
 (due to the non-decimal fraction).

 The same value converted from a 'double' would be
        3fef a393ee5e edcc2000
 and therefore misses several trailing bits. This could
 cause the floor() function to misbehave.


 I hope this info was somewhat useful.

 Perhaps the following program will help:

 import std.stdio;

 void main()
 {
     writefln("float            %a", 0.0000195F);
     writefln("double           %a", 0.0000195);
     writefln("real             %a", 0.0000195L);
     writefln("cast(real)float  %a", cast(real)0.0000195F);
     writefln("cast(real)double %a", cast(real)0.0000195);
     writefln("cast(real)real   %a", cast(real)0.0000195L);
     writefln("float            %a", 0.0000195F * 7 - 195);
     writefln("double           %a", 0.0000195 * 7 - 195);
     writefln("real             %a", 0.0000195L * 7 - 195);
 }

 which prints:

 float            0x1.4727dcp-16
 double           0x1.4727dcbddb984p-16
 real             0x1.4727dcbddb9841acp-16
 cast(real)float  0x1.4727dcp-16
 cast(real)double 0x1.4727dcbddb984p-16
 cast(real)real   0x1.4727dcbddb9841acp-16
 float            -0x1.85ffeep+7
 double           -0x1.85ffee1bd1edap+7
 real             -0x1.85ffee1bd1ed9dfep+7

I repeat, (I think) I understand what you are saying, but can you explain
the output of this ...

<code>
import std.stdio;
import std.math;
import std.string;

void main() {

  float  x;
  double y;
  real   z;

  x = 0.0000195;
  y = 0.0000195;
  z = 0.0000195;
  writefln("                       %24s %24s","Raw","Floor");
  writefln("Using float  variable: %24a %24a",
                    (.5 + 1e6*x), floor(.5 + 1e6*x));
  writefln("Using double variable: %24a %24a",
                    (.5 + 1e6*y), floor(.5 + 1e6*y));
  writefln("Using real   variable: %24a %24a",
                    (.5 + 1e6*z), floor(.5 + 1e6*z));
  writefln("Using float   literal: %24a %24a",
                    (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
  writefln("Using double  literal: %24a %24a",
                    (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));
  writefln("Using real    literal: %24a %24a",
                    (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));
}
</code>
______________
Output is ...

                                            Raw        Floor
Using float  variable:         0x1.3fffff4afp+4     0x1.3p+4
Using double variable:                 0x1.4p+4     0x1.3p+4
Using real   variable:  0x1.3ffffffffffffe68p+4     0x1.3p+4
Using float   literal:         0x1.3fffff4afp+4     0x1.4p+4
Using double  literal:                 0x1.4p+4     0x1.4p+4
Using real    literal:  0x1.4000000000000002p+4     0x1.4p+4

There seems to be different treatment of literals and variables. Even apart
from that, given the values above, I can understand the floor behaviour
except for lines 2 (double variable) and 6 (real literal).

-- 
Derek Parnell
Melbourne, Australia
2/04/2005 10:19:43 AM
Apr 01 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message
news:eouhnxxkjb80$.clvse1356mlr.dlg 40tude.net...
 There seems to be different treatment of literals and variables.

No, there isn't. The difference comes from the conversion that happens when you assign the literal to z. Use the 'L' suffix for a real literal.
Apr 01 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Fri, 1 Apr 2005 18:50:40 -0800, Walter wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:eouhnxxkjb80$.clvse1356mlr.dlg 40tude.net...
 There seems to be different treatment of literals and variables.

 No, there isn't. The difference comes from the conversion that happens when you assign the literal to z. Use the 'L' suffix for a real literal.

Ok, I did that. And I still can't explain the output.

                                            Raw        Floor
Using float  variable:         0x1.3fffff4afp+4     0x1.3p+4
Using double variable:                 0x1.4p+4     0x1.3p+4
Using real   variable:  0x1.4000000000000002p+4     0x1.4p+4
Using float   literal:         0x1.3fffff4afp+4     0x1.4p+4
Using double  literal:                 0x1.4p+4     0x1.4p+4
Using real    literal:  0x1.4000000000000002p+4     0x1.4p+4

Look at the results for doubles. How does floor(0x1.4p+4) give 0x1.3p+4
when the expression is a variable, and give 0x1.4p+4 when the expression
is a literal?

-- 
Derek Parnell
Melbourne, Australia
2/04/2005 3:34:22 PM
Apr 01 2005
next sibling parent Derek Parnell <derek psych.ward> writes:
On Sat, 2 Apr 2005 15:39:01 +1000, Derek Parnell wrote:

I've reformatted the display to make it easier to spot the anomaly.

                                            Raw                    Floor
Using float  variable:         0x1.3fffff4afp+4                 0x1.3p+4
Using float   literal:         0x1.3fffff4afp+4                 0x1.4p+4

Using double variable:                 0x1.4p+4                 0x1.3p+4
Using double  literal:                 0x1.4p+4                 0x1.4p+4

Using real   variable:  0x1.4000000000000002p+4                 0x1.4p+4
Using real    literal:  0x1.4000000000000002p+4                 0x1.4p+4

And here is the program that created the above ...
<code>
import std.stdio;
import std.math;
import std.string;

void main() {

  float  x;
  double y;
  real   z;


  x = 0.0000195F;
  y = 0.0000195;
  z = 0.0000195L;
  writefln("                       %24s %24s","Raw","Floor");
  writefln("Using float  variable: %24a %24a", 
                    (.5 + 1e6*x), floor(.5 + 1e6*x));
  writefln("Using float   literal: %24a %24a", 
                    (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
                    
  writefln("");                    
  writefln("Using double variable: %24a %24a", 
                    (.5 + 1e6*y), floor(.5 + 1e6*y));
  writefln("Using double  literal: %24a %24a", 
                    (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));

  writefln("");                    
  writefln("Using real   variable: %24a %24a", 
                    (.5 + 1e6*z), floor(.5 + 1e6*z));

  writefln("Using real    literal: %24a %24a", 
                    (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));


}
</code>
-- 
Derek Parnell
Melbourne, Australia
2/04/2005 4:48:12 PM
Apr 01 2005
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message
news:124cwpdauczht$.1wqi8sqkdi4ec.dlg 40tude.net...
 Ok, I did that. And I still can't explain the output.

Recall that, at runtime, the intermediate values are allowed to be carried
out to 80 bits. So,

    floor(.5 + 1e6*y)

is evaluated as:

    floor(cast(real).5 + cast(real)(1e6) * cast(real)y);

whereas:

    floor(.5 + 1e6*0.0000195)

is evaluated as:

    floor(cast(real)(.5 + 1e6*0.0000195))

hence the difference in result.
Apr 02 2005
next sibling parent "Bob W" <nospam aol.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:d2lodh$2qhf$1 digitaldaemon.com...
 "Derek Parnell" <derek psych.ward> wrote in message
 news:124cwpdauczht$.1wqi8sqkdi4ec.dlg 40tude.net...
 Ok, I did that. And I still can't explain the output.

 Recall that, at runtime, the intermediate values are allowed to be carried
 out to 80 bits. So,

     floor(.5 + 1e6*y)

 is evaluated as:

     floor(cast(real).5 + cast(real)(1e6) * cast(real)y);

 whereas:

     floor(.5 + 1e6*0.0000195)

 is evaluated as:

     floor(cast(real)(.5 + 1e6*0.0000195))

 hence the difference in result.

It's C legacy hidden in the way the compiler parses this code. You'll be
facing this kind of question over and over again, unless you move a step
further away from C and let the compiler treat unsuffixed literals as the
"internal compiler floating point precision format".

See my thread: "80 Bit Challenge"
Apr 02 2005
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Sat, 2 Apr 2005 01:23:46 -0800, Walter wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:124cwpdauczht$.1wqi8sqkdi4ec.dlg 40tude.net...
 Ok, I did that. And I still can't explain the output.

 Recall that, at runtime, the intermediate values are allowed to be carried
 out to 80 bits. So,

     floor(.5 + 1e6*y)

 is evaluated as:

     floor(cast(real).5 + cast(real)(1e6) * cast(real)y);

 whereas:

     floor(.5 + 1e6*0.0000195)

 is evaluated as:

     floor(cast(real)(.5 + 1e6*0.0000195))

 hence the difference in result.

Got it. So to summarize: in expressions that contain at least one double
variable, each term is promoted to real before expression evaluation, but
if the expression only contains double literals, then the terms are not
promoted to real.

Why did you decide to have this anomaly?

-- 
Derek Parnell
Melbourne, Australia
2/04/2005 11:46:44 PM
Apr 02 2005
next sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message
news:lsphadeuh4s3.gjqbum65kx87$.dlg 40tude.net...
 So to summarize, in expressions that contain at least one double variable,
 each term is promoted to real before expression evaluation, but if the
 expression only contains double literals, then the terms are not promoted
 to real.

 Why did you decide to have this anomaly?

It's the way C works.
Apr 02 2005
parent Derek Parnell <derek psych.ward> writes:
On Sat, 2 Apr 2005 10:04:32 -0800, Walter wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:lsphadeuh4s3.gjqbum65kx87$.dlg 40tude.net...
 So to summarize, in expressions that contain at least one double variable,
 each term is promoted to real before expression evaluation, but if the
 expression only contains double literals, then the terms are not promoted
 to real.

 Why did you decide to have this anomaly?

It's the way C works.

I understand. And here I was thinking that D was meant to be better than C.
My bad.

-- 
Derek Parnell
Melbourne, Australia
3/04/2005 8:13:09 AM
Apr 02 2005
prev sibling parent reply "Bob W" <nospam aol.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:lsphadeuh4s3.gjqbum65kx87$.dlg 40tude.net...
 On Sat, 2 Apr 2005 01:23:46 -0800, Walter wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:124cwpdauczht$.1wqi8sqkdi4ec.dlg 40tude.net...
 Ok, I did that. And I still can't explain the output.

 Recall that, at runtime, the intermediate values are allowed to be carried
 out to 80 bits. So,

     floor(.5 + 1e6*y)

 is evaluated as:

     floor(cast(real).5 + cast(real)(1e6) * cast(real)y);

 whereas:

     floor(.5 + 1e6*0.0000195)

 is evaluated as:

     floor(cast(real)(.5 + 1e6*0.0000195))

 hence the difference in result.

 Got it. So to summarize: in expressions that contain at least one double
 variable, each term is promoted to real before expression evaluation, but
 if the expression only contains double literals, then the terms are not
 promoted to real.

 Why did you decide to have this anomaly?

 -- 
 Derek Parnell
 Melbourne, Australia
 2/04/2005 11:46:44 PM

Some further info:

Currently it seems that in the D language no literal is ever promoted to
real directly if it was not suffixed with an "L". You can cast(real) it,
and it will still be a double which is converted to a crippled real in the
FPU, because some of its mantissa bits went missing.

There are many exceptions though: all floating point integers (1.0, 2.0,
10.0, etc.) and fractions like 0.5, 0.25, 0.125, etc. are converted to
proper real values, because they are accurately represented in binary
floating point formats. But even they are initially doubles, which are
just unharmed by the conversion because most of their trailing mantissa
bits are zero.

Any other fractional number (e.g. 1.2) cannot be represented accurately in
the binary system, so its double representation is not equivalent to its
real representation (nor is it to the decimal literal). If such a double
is converted to real, it is missing several bits of precision, so it will
not correspond accurately to its properly converted counterpart (e.g.
1.2L).

As a summary: if you feel the need to use extended double (real) precision
in D, never ever forget the "L" for literals unless you want "special
effects". Examples:

    real r=1.2L;           // proper 80 bit real assigned to r
    real r=1.2;            // inaccurate truncated 80 bit real
    real r=2.4/2.0;        // inaccurate (2.4 loses precision)
    real r=2.4/2.0L;       // inaccurate for the same reason
    real r=2.4L/2.0;       // this one will work (2.0 == 2.0L)
    real r=2.4L/2.0L;      // that's the safe way to do it
    real r=cast(real)1.2;  // inaccurate, converted from 1.2 as a double

By the way, C does it the same way for historic reasons. Other languages
are more user friendly and I am still hoping that D might evolve in this
direction.
Apr 02 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Bob W" <nospam aol.com> wrote in message
news:d2nd96$1aos$1 digitaldaemon.com...
 By the way, C does it the same way for historic
 reasons. Other languages are more user friendly
 and I am still hoping that D might evolve in this
 direction.

Actually, many languages, mathematical programs, and even C compilers have
*dropped* support for 80 bit long doubles. At one point, Microsoft had
even made it impossible to execute 80 bit floating instructions on their
upcoming Win64 (I made some frantic phone calls to them and apparently was
the only one who ever made a case to them in favor of 80 bit long doubles;
they said they'd put the support back in). Intel doesn't support 80 bit
reals on any of their new vector floating point instructions. The 64 bit
chips only support it in a 'legacy' manner. Java, C#, VC, Javascript do
not support 80 bit reals.

I haven't done a comprehensive survey of computer languages, but as far as
I can tell D stands pretty much alone in its support for 80 bits, along
with a handful of C/C++ compilers (including DMC).

Because of this shaky operating system and chip support for 80 bits, it
would be a mistake to center D's floating point around 80 bits. Some
systems may force a reversion to 64 bits. On the other hand, ongoing
system support for 64 bit doubles is virtually guaranteed, and D generally
follows C's rules with these.

(BTW, this thread is a classic example of "build it, and they will come".
D is almost single handedly rescuing 80 bit floating point from oblivion,
since it makes such a big deal about it and has wound up interesting a lot
of people in it. Before D, as far as I could tell, nobody cared a whit
about it. I think it's great that this has struck such a responsive chord.)
Apr 03 2005
next sibling parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Walter wrote:

 I haven't done a comprehensive survey of computer languages, but as far as I
 can tell D stands pretty much alone in its support for 80 bits, along with a
 handful of C/C++ compilers (including DMC).

The thing is that the D "real" type does *not* guarantee 80 bits ? It doesn't even say the minimum size, so one can only assume 64...

I think it would be more clear to say "80 bits minimum", and then future CPUs/code is still free to use 128-bit extended doubles too ? (since D allows all FP calculations to be done at a higher precision)

This would be simplified by padding the 80-bit floating point to a full 16 bytes, by adding zeros (as suggested by performance anyway).

And then, with both 128-bit integers and 128-bit floating point, D would truly be equipped to face both today (64) and tomorrow... (and with a "real" alias, it's still the "largest hardware implemented")

Just my 2 öre,
--anders
Apr 03 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:d2og5l$27nh$3 digitaldaemon.com...
 Walter wrote:

 I haven't done a comprehensive survey of computer languages, but as far


 can tell D stands pretty much alone in its support for 80 bits, along


 handful of C/C++ compilers (including DMC).

It doesn't even say the minimum size, so one can only assume 64...

Yes, it's 64. Guaranteeing 80 bits would require writing an 80 bit software emulator. I've used such emulators before, and they are really, really slow. I don't think it's practical for D floating point to be 100x slower on some machines.
 I think it would be more clear to say "80 bits minimum", and then
 future CPUs/code is still free to use 128-bit extended doubles too ?
 (since D allows all FP calculations to be done at a higher precision)

What it's supposed to be is the max precision supported by the hardware the D program is running on.
 This would be simplified by padding the 80-bit floating point to
 a full 16 bytes, by adding zeros (as suggested by performance anyway)

C compilers that support 80 bit long doubles will align them on 2 byte boundaries. To conform to the C ABI, D must follow suit.
 And then, with both 128-bit integers and 128-bit floating point,
 D would truly be equipped to face both today (64) and tomorrow...

 (and with a "real" alias, it's still the "largest hardware implemented")


 Just my 2 öre,
 --anders

Apr 03 2005
next sibling parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Walter wrote:

The thing is that the D "real" type does *not* guarantee 80 bits ?
It doesn't even say the minimum size, so one can only assume 64...

Yes, it's 64. Guaranteeing 80 bits would require writing an 80 bit software emulator. I've used such emulators before, and they are really, really slow. I don't think it's practical for D floating point to be 100x slower on some machines.

Me neither. Emulating 64-bit integers with two 32-bit registers is OK, since that is a whole lot easier. (could even be done for 128-bit ints?) But emulating 80-bit floating point ? Eww. Emulating a 128-bit double is better, but the current method is cheating a lot on the IEEE 754 spec...

No, I meant that extended precision should be *unavailable* on some CPUs. But maybe it's better to have it work in D, like long double does in C ? (i.e. it falls back to using regular doubles, possibly with warnings)

If so, just tell me it's better to have a flexible width language type, than to have some types be unavailable on certain FPU computer hardware? Since that was the whole idea... (have "extended" map to the 80-bit FP type)
 What it's supposed to be is the max precision supported by the hardware the
 D program is running on.

OK, for PPC and PPC64 that is definitely 64 bits. Not sure about SPARC ? Think I saw that Cray (or so) has 128-bit FP, but haven't got one... :-)

It seems like likely real-life values would be: 64, 80, 96 and 128 bits (PPC/PPC64, X86/X86_64, 68K, and whatever super-computer it was above).

It's possible that a future 128-bit CPU would have a 128-bit FPU too... But who knows ? (I haven't even seen the slightest hint of such a beast)
This would be simplified by padding the 80-bit floating point to
a full 16 bytes, by adding zeros (as suggested by performance anyway)

C compilers that support 80 bit long doubles will align them on 2 byte boundaries. To conform to the C ABI, D must follow suit.

I thought that was an ABI option, how to align "long double" types ? It was my understanding that it was aligned to 96 bits on X86, and to 128 bits on X86_64. But I might very well be wrong there... (it's just the impression that I got from reading the GCC manual)

i.e. it still uses the regular 80 bit floating point registers, but pads the values out with zeroes when storing them in memory.

--anders
Apr 03 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:d2pdbk$30dj$1 digitaldaemon.com...
 If so, just tell me it's better to have a flexible width language type,
 than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care, but they're screwed anyway if the hardware won't support it.
 What it's supposed to be is the max precision supported by the hardware


 D program is running on.

OK, for PPC and PPC64 that is definitely 64 bits. Not sure about SPARC ? Think I saw that Cray (or so) has 128-bit FP, but haven't got one... :-) It seems like likely real-life values would be: 64, 80, 96 and 128 bits (PPC/PPC64, X86/X86_64, 68K, and whatever super-computer it was above) It's possible that a future 128-bit CPU would have a 128-bit FPU too... But who knows ? (I haven't even seen the slightest hint of such a beast)

When I first looked at the AMD64 documentation, I was thrilled to see "m128" for a floating point type. I was crushed when I found it meant "two 64 bit doubles". I'd love to see a big honker 128 bit floating point type in hardware.
This would be simplified by padding the 80-bit floating point to
a full 16 bytes, by adding zeros (as suggested by performance anyway)

C compilers that support 80 bit long doubles will align them on 2 byte boundaries. To conform to the C ABI, D must follow suit.


The only option is to align it to what the corresponding C compiler does.
 It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.
 and to 128 bits on X86_64. But I might very well be wrong there...
 (it's just the impression that I got from reading the GCC manual)

 i.e. it still uses the regular 80 bit floating point registers,
 but pads the values out with zeroes when storing them in memory.

 --anders

Apr 03 2005
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Walter wrote:

If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care, but they're screwed anyway if the hardware won't support it.

I just fail to see how real -> double/extended is any different from the int -> short/long that C has gotten so much beating for already ?

The suggestion was to have fixed precision types:
- float     => IEEE 754 Single precision (32-bit)
- double    => IEEE 754 Double precision (64-bit)
- extended  => IEEE 754 Double Extended precision (80-bit)
- quadruple => "IEEE 754" Quadruple precision (128-bit)

And then have "real" be an alias to the largest hardware-supported type. It wouldn't break code more than if it was a variadic size type format ?
 When I first looked at the AMD64 documentation, I was thrilled to see "m128"
 for a floating point type. I was crushed when I found it meant "two 64 bit
 doubles". I'd love to see a big honker 128 bit floating point type in
 hardware.

I had a similar experience, with PPC64 and GCC, a while back... (-mlong-double-128, referring to the IBM AIX style DoubledDouble) Anyway, double-double has no chance of being full IEEE 754 spec.
It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.

You lost me ? (anyway, I suggested 128 - which *is* a power of two)

But it was my understanding that on the X86/X86_64 family of processors Windows used to use 10-byte doubles (and then removed extended?), that Linux i386(-i686) uses 12-byte doubles, and that Linux X86_64 now uses 16-byte doubles (using the GCC option of -m128bit-long-double).

And that was *not* a suggestion, but how it actually worked... Now ?

--anders
Apr 04 2005
next sibling parent reply Georg Wrede <georg.wrede nospam.org> writes:
Anders F Björklund wrote:
 It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.

You lost me ? (anyway, I suggested 128 - which *is* a power of two)

Size can be anything divisible by 8 bits, i.e. any number of bytes.

Alignment has to be a power of two, and is about _where_ in memory the thing can or cannot be stored. Align 4, for example, means that the variable cannot be stored in a memory address which, taken as a number, is not divisible by 4. Only something aligned 1 can be stored at any address.
Apr 04 2005
parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Georg Wrede wrote:

 It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.

You lost me ? (anyway, I suggested 128 - which *is* a power of two)

Size can be anything divisible by 8 bits, i.e. any number of bytes.

Alignment has to be a power of two, and is about _where_ in memory the thing can or cannot be stored. Align 4, for example, means that the variable cannot be stored in a memory address which, taken as a number, is not divisible by 4. Only something aligned 1 can be stored at any address.

OK, seems like my sloppy syntax is hurting me once again... :-P

I meant that the *size* of "long double" on GCC X86 is 96 bits, so that it can be *aligned* to 32 bits always (unlike 80 bits?). Anyway, aligning to 128 bits gives better Pentium performance ? (or at least, that's what I heard... Only have doubles on PPC)

Thanks for clearing it up; in my head 96 bits was "a power of two" (since anything aligned to a multiple of a power of two is fine too).

--anders
Apr 04 2005
prev sibling next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:d2qq5u$1aau$1 digitaldaemon.com...
 Walter wrote:

If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care, but they're screwed anyway if the hardware won't support it.

I just fail to see how real -> double/extended is any different from the int -> short/long that C has gotten so much beating for already ?

The suggestion was to have fixed precision types:
- float     => IEEE 754 Single precision (32-bit)
- double    => IEEE 754 Double precision (64-bit)
- extended  => IEEE 754 Double Extended precision (80-bit)
- quadruple => "IEEE 754" Quadruple precision (128-bit)

And then have "real" be an alias to the largest hardware-supported type. It wouldn't break code more than if it was a variadic size type format ?

What happens when someone declares a variable as quadruple on a platform without hardware support? Does D plug in a software quadruple implementation? That isn't the right thing to do. That's been my whole point of bringing up Java's experience. They tried to foist too much rigor on their floating point model in the name of portability and had to redo it.
Apr 04 2005
next sibling parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Ben Hinkle wrote:

 What happens when someone declares a variable as quadruple on a platform 
 without hardware support? Does D plug in a software quadruple 
 implementation? That isn't the right thing to do. That's been my whole point 
 of bringing up Java's experience. They tried to foist too much rigor on 
 their floating point model in the name of portability and had to redo it. 

Choke... Splutter... Die. Java did not re-implement extended in software. They just ignored it... --anders
Apr 04 2005
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
I wrote, in response to Ben Hinkle:

 What happens when someone declares a variable as quadruple on a 
 platform without hardware support? 

Choke... Splutter... Die.

Just to be perfectly clear: Those are the sounds the *compiler* would make, not Ben :-) Seriously, trying to use the extended or quadruple types on platforms where they are not implemented in hardware would be a compile time error. "real" would silently fall back. --anders
Apr 04 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:d2rcfd$1ueq$2 digitaldaemon.com...
I wrote, in response to Ben Hinkle:

 What happens when someone declares a variable as quadruple on a platform 
 without hardware support?

Choke... Splutter... Die.

Just to be perfectly clear: Those are the sounds the *compiler* would make, not Ben :-)

yup, I read it that way - though I did notice I spluttered a bit this morning...
 Seriously, trying to use the extended or quadruple types on
 platforms where they are not implemented in hardware would
 be a compile time error. "real" would silently fall back.

OK, needless to say I think a builtin type that is illegal on many platforms is a mistake.
Apr 04 2005
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Ben Hinkle wrote:

 OK, needless to say I think a builtin type that
 is illegal on many platforms is a mistake. 

That is actually *not* needless to say, but Walter agrees with you on the topic.

Just as we can talk about "real" as the 64/80/96/128 bit floating point type, and not somehow assume that it will be 80 bits - then I'm perfectly fine with it. "long double" in C/C++ works just the same.

But if you *do* want to talk about the "X87" 80-bit type, then please do by all means use "extended" instead. Less confusion, all around ? (let's save "quadruple" for later, with "cent")

--anders
Apr 04 2005
parent reply "Bob W" <nospam aol.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message 
news:d2rdjp$1vfl$1 digitaldaemon.com...
 Ben Hinkle wrote:

 OK, needless to say I think a builtin type that
 is illegal on many platforms is a mistake.

That is actually *not* needless to say, but Walter agrees with you on the topic. Just as we can talk about "real" as the 64/80/96/128 bit floating point type, and not somehow assume that it will be 80 bits - then I'm perfectly fine with it. "long double" in C/C++ works just the same. But if you *do* want to talk about the "X87" 80-bit type, then please do by all means use "extended" instead. Less confusion, all around ? (let's save "quadruple" for later, with "cent") --anders

The IEEE 754r draft suggests that there won't be an 80-bit or a 96-bit format in future (whenever this may be). Ref.: my post from today about IEEE 754r
Apr 04 2005
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Bob W wrote:

But if you *do* want to talk about the "X87"
80-bit type, then please do by all means use
"extended" instead. Less confusion, all around ?
(let's save "quadruple" for later, with "cent")

The IEEE 754r suggests that there won't be a 80bit nor a 96bit format in future (whenever this may be).

According to Sun, Microsoft, IBM and Apple there isn't even such an 80-bit type today... ;-)

BTW, the 96-bit floating point format was the type preferred by the 68K family's FPU.

--anders
Apr 04 2005
parent "Walter" <newshound digitalmars.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:d2rhtd$258a$1 digitaldaemon.com...
 Bob W wrote:
 The IEEE 754r suggests that there won't be
 a 80bit nor a 96bit format in future (whenever
 this may be).

there isn't such a 80-bit type today even... ;-)

I fear it will be constant struggle to keep the chipmakers from dropping it and the OS vendors from abandoning support.
Apr 04 2005
prev sibling parent Charles Hixson <charleshixsn earthlink.net> writes:
Ben Hinkle wrote:
 "Anders F Björklund" <afb algonet.se> wrote in message 
 news:d2qq5u$1aau$1 digitaldaemon.com...
 
Walter wrote:


If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care, but they're screwed anyway if the hardware won't support it.

I just fail to see how real -> double/extended is any different from the int -> short/long that C has gotten so much beating for already ?

The suggestion was to have fixed precision types:
- float     => IEEE 754 Single precision (32-bit)
- double    => IEEE 754 Double precision (64-bit)
- extended  => IEEE 754 Double Extended precision (80-bit)
- quadruple => "IEEE 754" Quadruple precision (128-bit)

And then have "real" be an alias to the largest hardware-supported type. It wouldn't break code more than if it was a variadic size type format ?

What happens when someone declares a variable as quadruple on a platform without hardware support? Does D plug in a software quadruple implementation? That isn't the right thing to do. That's been my whole point of bringing up Java's experience. They tried to foist too much rigor on their floating point model in the name of portability and had to redo it.

depends on the available hardware, but also allow the user to define what size/precision is needed in any particular case. It may slow things down a lot if you demand 17 places of accuracy, but if you really need exactly 17, you should be able to specify it. (OTOH, Ada had the govt. paying for its development, and it still ended up as a language people didn't want to use.)
Apr 05 2005
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
"Anders F Björklund" <afb algonet.se> wrote in message
news:d2qq5u$1aau$1 digitaldaemon.com...
 Walter wrote:
If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?



 but they're screwed anyway if the hardware won't support it.

the int -> short/long that C has gotten so much beating for already ?

Philosophically, they are the same. Practically, however, they are very different. Increasing integer sizes gives more range, and integer calculations tend to be *right* or *wrong*. Increased floating point size, however, gives more precision. So an answer is *better* or *worse*, instead of right or wrong. (Increased bits also give fp more range, but if the range is not enough, it fails cleanly with an overflow indication, not just wrapping around and giving garbage.) In other words, decreasing the bits in an fp value tends to gracefully degrade the results, which is very different from the effect on integer values.
 The suggestion was to have fixed precision types:
 - float => IEEE 754 Single precision (32-bit)
 - double => IEEE 754 Double precision (64-bit)
 - extended => IEEE 754 Double Extended precision (80-bit)
 - quadruple => "IEEE 754" Quadruple precision (128-bit)

 And then have "real" be an alias to the largest hardware-supported type.
 It wouldn't break code more than if it was a variadic size type format ?

I just don't see the advantage. If you use "extended" and your hardware doesn't support it, you're out of luck. If you use "real", your program will still compile and run. If certain characteristics of the "real" type are required, one can use static asserts on the properties of real.
It was my understanding that it was aligned to 96 bits on X86,



There is nothing set up in the operating system or linker to handle alignment to 96 bits or other values not a power of 2. Note that there is a big difference between the size of an object and what its alignment is.
 But it was my understanding that on the X86/X86_64 family of processors
 that Windows used to use 10-byte doubles (and then removed extended?),
 and that Linux i386(-i686) uses 12-byte doubles and Linux X86_64 now
 uses 16-byte doubles (using the GCC option of -m128bit-long-double)

 And that was *not* a suggestion, but how it actually worked... Now ?

Windows uses 10 byte doubles aligned on 2 byte boundaries. I'm not sure if gcc on linux does it that way or not.
Apr 04 2005
parent reply =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
Walter wrote:

 Philosophically, they are the same. Practically, however, they are very
 different. Increasing integer sizes gives more range, and integer
 calculations tend to be *right* or *wrong*. Floating point increased size,
 however, gives more precision. So an answer is *better* or *worse*, insted
 of right or wrong. (Increased bits also gives fp more range, but if the
 range is not enough, it fails cleanly with an overflow indication, not just
 wrapping around and giving garbage.) In other words, decreasing the bits in
 an fp value tends to gracefully degrade the results, which is very different
 from the effect on integer values.

Interesting view of it, but I think that fixed-point integer math degrades gracefully in the same way. Still with wrapping, though. Not that I've used fixed-point in quite some time, and it doesn't seem like I will be either - with the current CPUs and the new APIs.
 I just don't see the advantage. If you use "extended" and your hardware
 doesn't support it, you're out of luck. If you use "real", your program will
 still compile and run. If certain characteristics of the "real" type are
 required, one can use static asserts on the properties of real.

To be honest, I was just tired of the "real is 80 bits" all over D ? And more than a little annoyed at the ireal and creal, of course ;-) I always thought that "long double" was confusing, so now I've started to use "extended" for 80-bit and "real" for the biggest-available-type. And it's working out good so far.
And that was *not* a suggestion, but how it actually worked... Now ?

Windows uses 10 byte doubles aligned on 2 byte boundaries. I'm not sure if gcc on linux does it that way or not.

Linux on X86 aligns to 12 bytes, and Linux on X86_64 aligns to 16 bytes. --anders
Apr 04 2005
parent =?ISO-8859-1?Q?Anders_F_Bj=F6rklund?= <afb algonet.se> writes:
 Windows uses 10 byte doubles aligned on 2 byte boundaries. I'm not 
 sure if gcc on linux does it that way or not.

Linux on X86 aligns to 12 bytes, and Linux on X86_64 aligns to 16 bytes.

Make that "Linux on X86 aligns to 4 bytes, by making the size 12". You know what I mean :-) --anders
Apr 04 2005
prev sibling parent reply Charles Hixson <charleshixsn earthlink.net> writes:
Walter wrote:
 "Anders F Björklund" <afb algonet.se> wrote in message
 news:d2og5l$27nh$3 digitaldaemon.com...
 
Walter wrote:


I haven't done a comprehensive survey of computer languages, but as far


as I
can tell D stands pretty much alone in its support for 80 bits, along


with a
handful of C/C++ compilers (including DMC).

The thing is that the D "real" type does *not* guarantee 80 bits ? It doesn't even say the minimum size, so one can only assume 64...

Yes, it's 64. Guaranteeing 80 bits would require writing an 80 bit software emulator. I've used such emulators before, and they are really, really slow. I don't think it's practical for D floating point to be 100x slower on some machines. ...

Would implementing fixed point arithmetic improve that? Even with a 128-bit integer as the underlying type, I think it would have operational limitations, but it should be a lot faster than "100 times as slow as hardware". (OTOH, there's lots of reasons why it isn't a normal feature of languages. Apple on the 68000 series is the only computer I know of using it, and then only for specialized applications.)
Apr 05 2005
parent "Walter" <newshound digitalmars.com> writes:
"Charles Hixson" <charleshixsn earthlink.net> wrote in message
news:d2unfm$2n6s$1 digitaldaemon.com...
 Would implementing fixed point arithmetic improve that?  Even
 with a 128-bit integer as the underlying type, I think it would
 have operational limitations, but it should be a lot faster then
 "100 times as slow as hardware".  (OTOH, there's lots of reasons
 why it isn't a normal feature of languages.  Apple on the 68000
 series is the only computer I know of using it, and then only for
 specialized applications.)

If using a 128 bit fixed point would work, then one can use integer arithmetic on it. But that isn't floating point, which is a fundamentally different animal.
Apr 05 2005
prev sibling parent "Bob W" <nospam aol.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:d2od1o$25vd$1 digitaldaemon.com...
 "Bob W" <nospam aol.com> wrote in message
 news:d2nd96$1aos$1 digitaldaemon.com...
 By the way, C does it the same way for historic
 reasons. Other languages are more user friendly
 and I am still hoping that D might evolve in this
 direction.

Actually, many languages, mathematical programs, and even C compilers have *dropped* support for 80 bit long doubles. At one point, Microsoft had even made it impossible to execute 80 bit floating instructions on their upcoming Win64 (I made some frantic phone calls to them and apparently was the only one who ever made a case to them in favor of 80 bit long doubles; they said they'd put the support back in). Intel doesn't support 80 bit reals in any of their new vector floating point instructions. The 64 bit chips only support them in a 'legacy' manner. Java, C#, VC, and Javascript do not support 80 bit reals.

I haven't done a comprehensive survey of computer languages, but as far as I can tell D stands pretty much alone in its support for 80 bits, along with a handful of C/C++ compilers (including DMC).

Because of this shaky operating system and chip support for 80 bits, it would be a mistake to center D's floating point around 80 bits. Some systems may force a reversion to 64 bits. On the other hand, ongoing system support for 64 bit doubles is virtually guaranteed, and D generally follows C's rules with these.

(BTW, this thread is a classic example of "build it, and they will come". D is almost single-handedly rescuing 80 bit floating point from oblivion, since it makes such a big deal about it and has wound up interesting a lot of people in it. Before D, as far as I could tell, nobody cared a whit about it. I think it's great that this has struck such a responsive chord.)

I am probably looking like an extended precision advocate, but I am actually not. The double format was good enough for me even for statistical evaluation in almost 100% of cases. There are admittedly cases which would benefit from having 80 bit precision available, however. Therefore, although it would not be devastating for me should you ever decide to drop support for the reals, I'd still like having them available just in case they are needed.

However, if you do offer 80 bit types you'll have to assign real variables with proper real values if evaluation can be completed at compile time. Otherwise I suggest that you issue a warning where accuracy might be impaired. It is hard to believe that a new-millennium programming language would actually require people to write real r=1.2L instead of real r=1.2 in order not to produce an incorrect assignment.

Yes, I know what C programmers would want to say here; I am one of them. : ) For someone not familiar with C, the number 1.2 is not a real and is not a double either, especially if he is purely mathematically oriented. It is a decimal floating point value. He takes it for granted that 1.2 is fine whether assigned to a float or to a double. But he will refuse to understand why he has to suffix the literal to get an accurate real value.

Of course you could try to explain to him that the usual +/- 1/2 LSB error for most fractional (decimal) values converted to binary would increase to about 11 LSBs if he ever forgot to use that important "L" suffix. But would he really want to know?
Apr 03 2005
prev sibling parent reply "Bob W" <nospam aol.com> writes:
"Walter" <newshound digitalmars.com> wrote in message 
news:d2kk71$1pnl$1 digitaldaemon.com...
 "Bob W" <nospam aol.com> wrote in message
 news:d2ieh5$2ksl$1 digitaldaemon.com...
 - D is not entirely 80-bit based as claimed.


I still don't buy that. Example: std.string.atof() as mentioned below.
 - Literals are converted to 64 bit first (and from there
   to 80 bits) at compile time if no suffix is used, even
   if the target is of type 'real'.

Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".

Maybe there is a misunderstanding: I just wanted to mention that although it is claimed that the default internal FP format is 80 bits, the default floating point format for literals is double. The lexer (at least to my understanding) seems to confirm this.

Therefore, if someone does not want to experience a loss in precision, he ALWAYS needs to use the L suffix for literals, otherwise he gets a real which was converted from a double. e.g.:

   real r1=1.2L;  // this one is ok thanks to the suffix
   real r2=1.2;   // loss in precision, double convt'd to real
 - atof() for example is returning a 'real' value which is
   obviously derived from a 'double', thus missing some
   essential bits at the end.

Check out std.math2.atof(). It's fully 80 bit.

This one yes, but not the official Phobos version std.string.atof() which I have used. The Phobos docs suggest that atof() can be found in

1) std.math (n/a)
2) std.string

Since I have not found any atof() function in std.math, and std.math2 is not even mentioned in the Phobos docs, I've got it from std.string AND THIS ONE IS 64 BIT!

--------- quote from "c.stdlib.d" ---------
double atof(char *);
--------------- unquote -------------------

--------- quote from "string.d" -----------
real atof(char[] s)
{
    // BUG: should implement atold()
    return std.c.stdlib.atof(toStringz(s));
}
--------------- unquote -------------------

Due to heavy workload this issue might have been overlooked. Luckily I do not even have to mention the word "BUG", this was apparently already done in the author's comment line. : )

After searching the archives it looks like someone was already troubled by the multiple appearance of atof() in Nov 2004:
http://www.digitalmars.com/d/archives/digitalmars/D/bugs/2196.html
 Example:

 The hex value for 0.0000195 in 'real' can be expressed as
        3fef a393ee5e edcc20d5
 or
        3fef a393ee5e edcc20d6
 (due to the non-decimal fraction).

 The same value converted from a 'double' would be
        3fef a393ee5e edcc2000
 and therefore misses several trailing bits. This could
 cause the floor() function to misbehave.


 I hope this info was somewhat useful.

Perhaps the following program will help:

import std.stdio;

void main()
{
    writefln("float            %a", 0.0000195F);
    writefln("double           %a", 0.0000195);
    writefln("real             %a", 0.0000195L);
    writefln("cast(real)float  %a", cast(real)0.0000195F);
    writefln("cast(real)double %a", cast(real)0.0000195);
    writefln("cast(real)real   %a", cast(real)0.0000195L);
    writefln("float            %a", 0.0000195F * 7 - 195);
    writefln("double           %a", 0.0000195 * 7 - 195);
    writefln("real             %a", 0.0000195L * 7 - 195);
}

which prints:

float            0x1.4727dcp-16
double           0x1.4727dcbddb984p-16
real             0x1.4727dcbddb9841acp-16
cast(real)float  0x1.4727dcp-16
cast(real)double 0x1.4727dcbddb984p-16
cast(real)real   0x1.4727dcbddb9841acp-16
float            -0x1.85ffeep+7
double           -0x1.85ffee1bd1edap+7
real             -0x1.85ffee1bd1ed9dfep+7

In accordance with what I have mentioned before, the following program demonstrates the existence of "truncated" reals:

void main()
{
    real r1=1.2L;  // converted directly to 80 bit value
    real r2=1.2;   // parsed to 64b, then convt'd to 80b
    writefln("Genuine  : %a",r1);
    writefln("Truncated: %a",r2);
}

Output (using %a):

Genuine  : 0x1.3333333333333334p+0
Truncated: 0x1.3333333333333p+0

Alternative output:

Genuine  : 1.20000000000000000 [3fff 99999999 9999999a]
Truncated: 1.19999999999999996 [3fff 99999999 99999800]
Apr 01 2005
parent reply "Walter" <newshound digitalmars.com> writes:
"Bob W" <nospam aol.com> wrote in message
news:d2kvcc$22qa$1 digitaldaemon.com...
 I just wanted to mention that although it is claimed that
 the default internal FP format is 80 bits,

Actually, what is happening is that if you write the expression:

    double a, b, c, d;
    a = b + c + d;

then the intermediate values generated by b+c+d are allowed (but not required) to be evaluated at the largest precision available. This means that it's allowed to evaluate it as:

    a = cast(double)(cast(real)b + cast(real)c + cast(real)d);

but it is not required to evaluate it that way. This produces a slightly different result than:

    double t;
    t = b + c;
    a = t + d;

The latter is the way Java is specified to work, which turns out to be both numerically inferior and *slower* on the x86 FPU. The x86 FPU *wants* to evaluate things to 80 bits.

The D compiler's internal paths fully support 80 bit arithmetic, which means there are no surprising "choke points" where it gets truncated to 64 bits.

If the type of a literal is specified to be 'double', which is the case for no suffix, then you get 64 bits of precision. I hope you'll agree that that is the least surprising thing to do.
 Check out std.math2.atof(). It's fully 80 bit.


True, that's a bug, and I'll fix it.
Apr 01 2005
parent "Bob W" <nospam aol.com> writes:
I have started a new thread: "80 Bit Challenge",
which should serve as a reply to your post ....
Apr 02 2005