D - D floating point maths

reply John Fletcher <J.P.Fletcher aston.ac.uk> writes:
At the moment functions like sqrt() use the underlying C functions in
double precision. Is there any way to have versions which work to
extended precision?

John
Feb 07 2002
next sibling parent "Pavel Minayev" <evilone omen.ru> writes:
"John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message
news:3C6268D1.C1E23080 aston.ac.uk...

 At the moment function like sqrt() use the underlying C functions in
 double precision. Is there any way to have versions which work to
 extended precision?
Yes - write them =) The comment there says it's just a temporary solution. I'd expect all the math functions to be rewritten by the final release.
Feb 07 2002
prev sibling parent reply "Sean L. Palmer" <spalmer iname.com> writes:
Actually can we have some functions like sin, cos, tan, and sqrt that deal
with float instead of double?  In the world of games, speed is usually more
important than accuracy and I hate having to explicitly typecast back to
float to avoid warnings.

Another nice thing to have is reciprocal square root (most processors have
this nowadays...) usually it's cheaper (and less accurate) than 1/sqrt(x)
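
To put numbers on the accuracy trade-off, here is a minimal sketch. It is not the hardware reciprocal square root instruction mentioned above; it substitutes the widely known bit-level approximation (the "magic constant" trick) followed by one Newton step, and it assumes a 32-bit IEEE 754 float. The names approx_rsqrt and exact_rsqrt are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Approximate 1/sqrt(x) for positive, finite, normal x.
// Software stand-in for a hardware rsqrt instruction (assumption of this sketch).
inline float approx_rsqrt(float x) {
    assert(x > 0.0f);
    static_assert(sizeof(float) == sizeof(std::uint32_t),
                  "sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);      // reinterpret the float's bit pattern
    bits = 0x5f3759dfu - (bits >> 1);         // crude initial guess
    float y;
    std::memcpy(&y, &bits, sizeof y);
    y = y * (1.5f - 0.5f * x * y * y);        // one Newton-Raphson refinement
    return y;
}

// Reference version: full single-precision square root plus a division.
inline float exact_rsqrt(float x) {
    assert(x > 0.0f);
    return 1.0f / std::sqrt(x);
}
```

With a single refinement step the approximation is typically good to a fraction of a percent, which is the kind of "cheaper and less accurate" result described above; exact_rsqrt(4.0f) returns 0.5 exactly.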

Sean

"John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message
news:3C6268D1.C1E23080 aston.ac.uk...
 At the moment function like sqrt() use the underlying C functions in
 double precision. Is there any way to have versions which work to
 extended precision?

 John
Feb 08 2002
next sibling parent reply John Fletcher <J.P.Fletcher aston.ac.uk> writes:
"Sean L. Palmer" wrote:

 Actually can we have some functions like sin, cos, tan, and sqrt that deal
 with float instead of double?  In the world of games, speed is usually more
 important than accuracy and I hate having to explicitly typecast back to
 float to avoid warnings.

 Another nice thing to have is reciprocal square root (most processors have
 this nowadays...) usually it's cheaper (and less accurate) than 1/sqrt(x)

 Sean
I was pondering implementing the full-precision versions I talked about. I think that modern Intel and compatible chips have coprocessor instructions which actually do the work to full precision. Is it just a question of wrapping the correct input and output around those to handle the different cases? There may be some error trapping as well.

John
Feb 08 2002
parent reply "Pavel Minayev" <evilone omen.ru> writes:
"John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message
news:3C63C2A2.CC837516 aston.ac.uk...

 I was pondering implementing the full precision versions I talked about.  I
 think that the modern Intel and compatible chips have coprocessor instructions
 which actually do the work to full precision.  Is it just a question of
 wrapping the correct input and output around that to do the different cases?
 There may be some error trapping as well.
Yes, AFAIK Intel FPUs do calculations in full precision anyhow. However, extended arguments have to be passed on the stack, and since they're 10 bytes long, you get three PUSHes (while a float would only take one).
Feb 08 2002
parent "Walter" <walter digitalmars.com> writes:
"Pavel Minayev" <evilone omen.ru> wrote in message
news:a40jcl$1oso$1 digitaldaemon.com...
 Yes, AFAIK Intel FPUs do calculations in full precision anyhow. However,
 extended arguments have to be passed on stack, and since they're 10-byte
 long, you get three PUSHes (while float would only take one).
It usually just subtracts 12 from ESP and does an FST. With scheduling and pipelining, the extra instruction frequently takes no extra time.
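
The "three PUSHes" and "subtracts 12 from ESP" figures both come from the same rounding, which can be sketched as follows. This assumes 4-byte stack slots on 32-bit x86; the byte sizes are those of the x87 memory formats rather than sizeof() of any C++ type, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Sizes in bytes of the three x87 memory formats.
const std::size_t kSingleBytes   = 4;
const std::size_t kDoubleBytes   = 8;
const std::size_t kExtendedBytes = 10;

// Number of stack slots needed to hold an argument, rounding up.
// slot_bytes defaults to 4, the slot size assumed for 32-bit x86.
inline std::size_t stack_slots(std::size_t arg_bytes, std::size_t slot_bytes = 4) {
    assert(slot_bytes > 0);
    return (arg_bytes + slot_bytes - 1) / slot_bytes;
}

// Bytes actually reserved on the stack for that argument.
inline std::size_t stack_bytes(std::size_t arg_bytes, std::size_t slot_bytes = 4) {
    return stack_slots(arg_bytes, slot_bytes) * slot_bytes;
}
```

stack_slots(kExtendedBytes) is 3 and stack_bytes(kExtendedBytes) is 12, against 1 slot for a single and 2 for a double.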
Feb 08 2002
prev sibling next sibling parent reply Russell Borogove <kaleja estarcion.com> writes:
Sean L. Palmer wrote:

 Actually can we have some functions like sin, cos, tan, and sqrt that deal
 with float instead of double?  In the world of games, speed is usually more
 important than accuracy and I hate having to explicitly typecast back to
 float to avoid warnings.
 
 Another nice thing to have is reciprocal square root (most processors have
 this nowadays...) usually it's cheaper (and less accurate) than 1/sqrt(x)

 "John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message
 news:3C6268D1.C1E23080 aston.ac.uk...
 
At the moment function like sqrt() use the underlying C functions in
double precision. Is there any way to have versions which work to
extended precision?
Indeed on all counts, both extended and single-precision versions of at least the more common math functions would be valuable. Who wants to get to work on that library? :) -Russell B
Feb 08 2002
next sibling parent "Pavel Minayev" <evilone omen.ru> writes:
"Russell Borogove" <kaleja estarcion.com> wrote in message
news:3C63FBED.1080500 estarcion.com...

 Indeed on all counts, both extended and single-precision versions
 of at least the more common math functions would be valuable.
 Who wants to get to work on that library? :)
Not till we get inline asm working =)
Feb 08 2002
prev sibling parent reply "Sean L. Palmer" <spalmer iname.com> writes:
I believe the common form of this stuff is to add "f" to the end of the name

sqrtf
fabsf
fmodf


etc

Sean

"Russell Borogove" <kaleja estarcion.com> wrote in message
news:3C63FBED.1080500 estarcion.com...
 Sean L. Palmer wrote:

 Actually can we have some functions like sin, cos, tan, and sqrt that deal
 with float instead of double?  In the world of games, speed is usually more
 important than accuracy and I hate having to explicitly typecast back to
 float to avoid warnings.

 Another nice thing to have is reciprocal square root (most processors have
 this nowadays...) usually it's cheaper (and less accurate) than 1/sqrt(x)
 "John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message
 news:3C6268D1.C1E23080 aston.ac.uk...

At the moment function like sqrt() use the underlying C functions in
double precision. Is there any way to have versions which work to
extended precision?
Indeed on all counts, both extended and single-precision versions of at least the more common math functions would be valuable. Who wants to get to work on that library? :) -Russell B
Feb 08 2002
next sibling parent reply "Pavel Minayev" <evilone omen.ru> writes:
"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a4192g$2ihc$1 digitaldaemon.com...

 I believe the common form of this stuff is to add "f" to the end of the name
 sqrtf
 fabsf
 fmodf
Why, if we have function overloading?
Feb 08 2002
next sibling parent reply "Juan Carlos Arevalo Baeza" <jcab roningames.com> writes:
"Pavel Minayev" <evilone omen.ru> wrote in message
news:a419ev$2kdg$1 digitaldaemon.com...
 "Sean L. Palmer" <spalmer iname.com> wrote in message
 news:a4192g$2ihc$1 digitaldaemon.com...

 I believe the common form of this stuff is to add "f" to the end of the name
 sqrtf
 fabsf
 fmodf
Why, if we have function overloading?
The suffix specifies the precision of the return type, which cannot be overloaded on. Or am I wrong? Salutaciones, JCAB
Feb 08 2002
parent "Pavel Minayev" <evilone omen.ru> writes:
"Juan Carlos Arevalo Baeza" <jcab roningames.com> wrote in message
news:a419vp$2kol$1 digitaldaemon.com...

    The suffix specifies the precision of the return type, which cannot be
 overloaded on. Or am I wrong?
Return type can be determined by the argument:

float sqrt(float);
double sqrt(double);
extended sqrt(extended);
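
The same idea can be sketched in C++, which also overloads on argument type. This is only an illustration, not D library code: the wrapper name sqrt_of is hypothetical, long double stands in for D's extended, and the bodies simply forward to the std::sqrt overloads from <cmath>.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// One name, three precisions; the argument type selects the overload,
// and each overload returns the same type it was given.
inline float       sqrt_of(float x)       { return std::sqrt(x); }
inline double      sqrt_of(double x)      { return std::sqrt(x); }
inline long double sqrt_of(long double x) { return std::sqrt(x); }

// Compile-time confirmation that the return type follows the argument type.
static_assert(std::is_same<decltype(sqrt_of(2.0f)), float>::value,
              "float in, float out");
static_assert(std::is_same<decltype(sqrt_of(2.0)), double>::value,
              "double in, double out");
static_assert(std::is_same<decltype(sqrt_of(2.0L)), long double>::value,
              "long double in, long double out");

// Small runtime check on perfect squares.
inline void check_sqrt_overloads() {
    assert(sqrt_of(4.0f) == 2.0f);
    assert(sqrt_of(9.0) == 3.0);
    assert(sqrt_of(16.0L) == 4.0L);
}
```

One caveat of this design in C++: a call such as sqrt_of(4) with an integer argument is ambiguous, because the int converts equally well to all three parameter types, so the caller has to write 4.0f, 4.0 or 4.0L.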
Feb 08 2002
prev sibling parent "Sean L. Palmer" <spalmer iname.com> writes:
True true.

Sean

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a419ev$2kdg$1 digitaldaemon.com...
 "Sean L. Palmer" <spalmer iname.com> wrote in message
 news:a4192g$2ihc$1 digitaldaemon.com...

 I believe the common form of this stuff is to add "f" to the end of the name
 sqrtf
 fabsf
 fmodf
Why, if we have function overloading?
Feb 08 2002
prev sibling parent "Walter" <walter digitalmars.com> writes:
"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a4192g$2ihc$1 digitaldaemon.com...
 I believe the common form of this stuff is to add "f" to the end of the name
 sqrtf
 fabsf
 fmodf
Since D supports overloading by argument type, that is not necessary.
Feb 08 2002
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
On Intel processors, the float and double math computations are not one iota
faster than the extended ones. The ONLY reasons to use float and double are:

1) compatibility with C
2) large arrays will use less space

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a40c1s$1lfi$1 digitaldaemon.com...
 Actually can we have some functions like sin, cos, tan, and sqrt that deal
 with float instead of double?  In the world of games, speed is usually more
 important than accuracy and I hate having to explicitly typecast back to
 float to avoid warnings.

 Another nice thing to have is reciprocal square root (most processors have
 this nowadays...) usually it's cheaper (and less accurate) than 1/sqrt(x)

 Sean

 "John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message
 news:3C6268D1.C1E23080 aston.ac.uk...
 At the moment function like sqrt() use the underlying C functions in
 double precision. Is there any way to have versions which work to
 extended precision?

 John
Feb 08 2002
next sibling parent reply "Sean L. Palmer" <spalmer iname.com> writes:
That's not true... but you have to set the CPU into low precision mode to
see the speed advantages.  Otherwise it internally works with double
precision by default.

In game scenarios, we can't just go around wasting 8 bytes per number when 4
bytes will do.  And it depends on the processor, as well.

Floats are still definitely faster.  For instance the P4 can handle 2
doubles per instruction, but can do 4 floats in the same amount of time.
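
The 2-versus-4 figure is just register width divided by element size. A small sketch, assuming 128-bit (16-byte) packed registers, a 4-byte float and an 8-byte double; the function names are hypothetical and no actual SIMD instructions are issued here.

```cpp
#include <cassert>
#include <cstddef>

// How many elements of type T fit in one packed register.
// register_bytes defaults to 16, i.e. an assumed 128-bit register.
template <typename T>
std::size_t lanes_per_register(std::size_t register_bytes = 16) {
    assert(register_bytes >= sizeof(T));
    return register_bytes / sizeof(T);
}

// How many packed operations are needed to cover `count` elements,
// rounding up for a partially filled final register.
template <typename T>
std::size_t packed_ops_needed(std::size_t count, std::size_t register_bytes = 16) {
    const std::size_t lanes = lanes_per_register<T>(register_bytes);
    return (count + lanes - 1) / lanes;
}
```

Under those assumptions, 1000 floats need 250 packed operations while 1000 doubles need 500.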

Sean

"Walter" <walter digitalmars.com> wrote in message
news:a41oen$2se5$4 digitaldaemon.com...
 On Intel processors, the float and double math computations are not one iota
 faster than the extended ones. The ONLY reasons to use float and double are:
 1) compatibility with C
 2) large arrays will use less space
Feb 08 2002
parent reply "Walter" <walter digitalmars.com> writes:
Hmm. I didn't know that. -Walter

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a42k1f$6m1$1 digitaldaemon.com...
 That's not true... but you have to set the CPU into low precision mode to
 see the speed advantages.  Otherwise it internally works with double
 precision by default.

 In game scenarios, we can't just go around wasting 8 bytes per number when 4
 bytes will do.  And it depends on the processor, as well.

 Floats are still definitely faster.  For instance the P4 can handle 2
 doubles per instruction, but can do 4 floats in the same amount of time.

 Sean

 "Walter" <walter digitalmars.com> wrote in message
 news:a41oen$2se5$4 digitaldaemon.com...
 On Intel processors, the float and double math computations are not one iota
 faster than the extended ones. The ONLY reasons to use float and double are:
 1) compatibility with C
 2) large arrays will use less space
Feb 09 2002
parent reply "Sean L. Palmer" <spalmer iname.com> writes:
Here is a sample from the MSDN docs for VC++ 6.0 which illustrates this.
You can do timings yourself if you wish.  It only affects the x87 FPU
coprocessor.  I haven't tried this in a few years, so newer Pentium 4
processors may not see much advantage from it.  However, using SSE2 it is
still true that with one instruction you can process either 2 doubles or 4
floats.

I believe the main advantage this provides is keeping the FPU from having to
do so much work with complex calculations like division, square root, trig,
etc.  Fewer bits of precision need be computed.  They can get away with fewer
iterations, cheaper approximations, fewer terms in the Taylor series, etc.
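
The "fewer iterations" point can be illustrated in software. The sketch below is not how any particular FPU implements square root; it runs Newton's method on a generic type and counts how many steps pass before the result stops changing at that type's precision. The function name and the deliberately crude starting guess are assumptions of the example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Newton's method for sqrt(x). Returns the number of iterations performed
// and stores the result in `root`. Stops once two successive guesses agree
// to within the machine epsilon of T (relative), or after 100 steps.
template <typename T>
int newton_sqrt_steps(T x, T& root) {
    assert(x > T(0));
    const T eps = std::numeric_limits<T>::epsilon();
    T guess = x;                 // crude starting point on purpose
    int steps = 0;
    while (steps < 100) {
        const T next = T(0.5) * (guess + x / guess);
        ++steps;
        const bool converged = std::fabs(next - guess) <= eps * next;
        guess = next;
        if (converged) break;
    }
    root = guess;
    return steps;
}
```

Because the convergence test is tied to the precision of T, a float run needs no more iterations than a double run from the same starting point, and usually fewer.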

In a lot of cases 5 or 6 digits of precision is all we need.  So don't get
rid of the float type yet.  ;)

Sean

/* CNTRL87.C: This program uses _control87 to output the control
 * word, set the precision to 24 bits, and reset the status to
 * the default.
 */

#include <stdio.h>
#include <float.h>

int main( void )
{
   double a = 0.1;

   /* Show original control word and do calculation. */
   printf( "Original: 0x%.4x\n", _control87( 0, 0 ) );
   printf( "%1.1f * %1.1f = %.15e\n", a, a, a * a );

   /* Set precision to 24 bits and recalculate. */
   printf( "24-bit:   0x%.4x\n", _control87( _PC_24, _MCW_PC ) );
   printf( "%1.1f * %1.1f = %.15e\n", a, a, a * a );

   /* Restore to default and recalculate. */
   printf( "Default:  0x%.4x\n",
          _control87( _CW_DEFAULT, 0xfffff ) );
   printf( "%1.1f * %1.1f = %.15e\n", a, a, a * a );

   return 0;
}



Output

Original: 0x9001f
0.1 * 0.1 = 1.000000000000000e-002
24-bit:   0xa001f
0.1 * 0.1 = 9.999999776482582e-003
Default:  0x001f
0.1 * 0.1 = 1.000000000000000e-002
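
For anyone without a compiler that provides _control87, the 24-bit line can be approximated portably. This is a sketch under a stated simplification: it rounds only the final product to single precision with a cast, whereas the x87 precision-control setting affects every intermediate result inside the FPU. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdio>

// Square `a` in double precision, then round the result to a 24-bit
// significand by passing it through float.
inline double square_rounded_to_single(double a) {
    assert(a == a);  // reject NaN in this sketch
    const float narrowed = static_cast<float>(a * a);
    return static_cast<double>(narrowed);
}

// Print the product at both precisions in the sample's format.
inline void print_both(double a) {
    std::printf("53-bit: %1.1f * %1.1f = %.15e\n", a, a, a * a);
    std::printf("24-bit: %1.1f * %1.1f = %.15e\n", a, a,
                square_rounded_to_single(a));
}
```

For a = 0.1 the rounded product is the float nearest to 0.01, which is the value shown on the 24-bit line of the output above.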


"Walter" <walter digitalmars.com> wrote in message
news:a42tca$hrc$2 digitaldaemon.com...
 Hmm. I didn't know that. -Walter

 "Sean L. Palmer" <spalmer iname.com> wrote in message
 news:a42k1f$6m1$1 digitaldaemon.com...
 That's not true... but you have to set the CPU into low precision mode to
 see the speed advantages.  Otherwise it internally works with double
 precision by default.

 In game scenarios, we can't just go around wasting 8 bytes per number when 4
 bytes will do.  And it depends on the processor, as well.

 Floats are still definitely faster.  For instance the P4 can handle 2
 doubles per instruction, but can do 4 floats in the same amount of time.

 Sean

 "Walter" <walter digitalmars.com> wrote in message
 news:a41oen$2se5$4 digitaldaemon.com...
 On Intel processors, the float and double math computations are not one iota
 faster than the extended ones. The ONLY reasons to use float and double are:
 1) compatibility with C
 2) large arrays will use less space
Feb 09 2002
parent reply "Walter" <walter digitalmars.com> writes:
I know that you can reset the internal calculation precision. I did not know
this affected execution time; I've not seen any hint of that in the Intel
CPU documentation, though I could have just missed it.

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a444db$14h4$1 digitaldaemon.com...
 Here is a sample from the MSDN docs for VC++ 6.0 which illustrates this:
 (you can do timings yourself if you wish...  It only affects the FPU x87
 coprocessor.  I haven't tried this in a few years so newer Pentium 4
 processors may not see much advantage from this)  However using SSE2 it is
 still true that with one instruction you can either process 2 doubles or 4
 floats.

 I believe the main advantage this provides is keeping the FPU from having to
 do so much work with complex calculations like division, square root, trig,
 etc.  Less bits of precision need be computed.  They can get away with fewer
 iterations, cheaper approximations, less terms in the Taylor series, etc.

 In a lot of cases 5 or 6 digits of precision is all we need.  So don't get
 rid of the float type yet.  ;)

 Sean

 /* CNTRL87.C: This program uses _control87 to output the control
  * word, set the precision to 24 bits, and reset the status to
  * the default.
  */

 #include <stdio.h>
 #include <float.h>

 void main( void )
 {
    double a = 0.1;

    /* Show original control word and do calculation. */
    printf( "Original: 0x%.4x\n", _control87( 0, 0 ) );
    printf( "%1.1f * %1.1f = %.15e\n", a, a, a * a );

    /* Set precision to 24 bits and recalculate. */
    printf( "24-bit:   0x%.4x\n", _control87( _PC_24, MCW_PC ) );
    printf( "%1.1f * %1.1f = %.15e\n", a, a, a * a );

    /* Restore to default and recalculate. */
    printf( "Default:  0x%.4x\n",
           _control87( _CW_DEFAULT, 0xfffff ) );
    printf( "%1.1f * %1.1f = %.15e\n", a, a, a * a );
 }



 Output

 Original: 0x9001f
 0.1 * 0.1 = 1.000000000000000e-002
 24-bit:   0xa001f
 0.1 * 0.1 = 9.999999776482582e-003
 Default:  0x001f
 0.1 * 0.1 = 1.000000000000000e-002


 "Walter" <walter digitalmars.com> wrote in message
 news:a42tca$hrc$2 digitaldaemon.com...
 Hmm. I didn't know that. -Walter

 "Sean L. Palmer" <spalmer iname.com> wrote in message
 news:a42k1f$6m1$1 digitaldaemon.com...
 That's not true... but you have to set the CPU into low precision mode to
 see the speed advantages.  Otherwise it internally works with double
 precision by default.

 In game scenarios, we can't just go around wasting 8 bytes per number when 4
 bytes will do.  And it depends on the processor, as well.

 Floats are still definitely faster.  For instance the P4 can handle 2
 doubles per instruction, but can do 4 floats in the same amount of time.

 Sean

 "Walter" <walter digitalmars.com> wrote in message
 news:a41oen$2se5$4 digitaldaemon.com...
 On Intel processors, the float and double math computations are not one iota
 faster than the extended ones. The ONLY reasons to use float and double are:
 1) compatibility with C
 2) large arrays will use less space
Feb 09 2002
parent reply "Sean L. Palmer" <spalmer iname.com> writes:
You may have read more recent docs than I have... last I thoroughly checked
this out was on Pentium 1 (in fact Intel seems to not want to disclose
instruction cycle counts anymore... hard to find this info in latest P3
specs I've read, and I haven't read up on P4 at all aside from SSE stuff)

Sean

"Walter" <walter digitalmars.com> wrote in message
news:a446n5$15hm$3 digitaldaemon.com...
 I know that you can reset the internal calculation precision. I did not know
 this affected execution time, I've not seen any hint of that in the Intel
 CPU documentation, though I could have just missed it.
Feb 09 2002
next sibling parent "Walter" <walter digitalmars.com> writes:
I suppose the definitive way is to write a benchmark.
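
A minimal sketch of what such a benchmark could look like, with several assumptions stated up front: it times a chain of dependent divisions so that latency rather than throughput dominates, it uses std::chrono::steady_clock, and the function name is hypothetical. Note that current compilers usually emit SSE rather than x87 code for float and double, so results will not necessarily reflect the x87 behaviour discussed in this thread.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Time `count` dependent divisions in type T and return elapsed seconds.
// The final accumulator is written to `sink` so the loop cannot be
// optimized away as dead code.
template <typename T>
double time_divisions(std::size_t count, T& sink) {
    assert(count > 0);
    typedef std::chrono::steady_clock clock;
    T acc = T(1);
    const clock::time_point start = clock::now();
    for (std::size_t i = 1; i <= count; ++i) {
        // Each division depends on the previous result.
        acc = acc / T(1.0000001) + T(i % 7);
    }
    const clock::time_point stop = clock::now();
    sink = acc;
    return std::chrono::duration<double>(stop - start).count();
}
```

Calling time_divisions<float>, time_divisions<double> and time_divisions<long double> with the same large count, several times each, and comparing the best times gives a rough comparison; a single run is too noisy to conclude much.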

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a44i10$19tn$1 digitaldaemon.com...
 You may have read more recent docs than I have... last I thoroughly checked
 this out was on Pentium 1 (in fact Intel seems to not want to disclose
 instruction cycle counts anymore... hard to find this info in latest P3
 specs I've read, and I haven't read up on P4 at all aside from SSE stuff)

 Sean

 "Walter" <walter digitalmars.com> wrote in message
 news:a446n5$15hm$3 digitaldaemon.com...
 I know that you can reset the internal calculation precision. I did not know
 this affected execution time, I've not seen any hint of that in the Intel
 CPU documentation, though I could have just missed it.
Feb 09 2002
prev sibling parent reply "Serge K" <skarebo programmer.net> writes:
 You may have read more recent docs than I have... last I thoroughly checked
 this out was on Pentium 1 (in fact Intel seems to not want to disclose
 instruction cycle counts anymore... hard to find this info in latest P3
 specs I've read, and I haven't read up on P4 at all aside from SSE stuff)
That's all I found in the Intel optimization manuals:

FDIV: Latency (single, double, extended) cycles:
Pentium Pro : 17,  36,  56
Pentium 2,3 : 18,  32,  38
Pentium 4   : 23,  38,  43

FSQRT: Latency (single, double, extended) cycles:
Pentium 4   : 23,  38,  43

btw,
It is highly not recommended to do any performance-sensitive calculations on
x86 in extended precision.
There are no reg-mem floating point instructions for 80-bit floats;
everything has to be compiled into ( FLD / stack operations / FST ) form.
Besides, extended precision FLD & FST are much slower than single/double
precision.
Feb 10 2002
parent "Serge K" <skarebo programmer.net> writes:
The same info with additions and corrections:

FDIV: Latency (single, double, extended) cycles:
Pentium Pro : 17,  36,  56
Pentium 2,3 : 18,  32,  38
Pentium 4   : 23,  38,  43
Athlon (K7) : 16,  20,  24

FSQRT: Latency (single, double, extended) cycles:
Pentium 4   : 23,  38,  43
Athlon (K7) : 19,  27,  35

FLD: Latency (single, double, extended) cycles:
Athlon (K7) :  2,   2,  10

FSTP: Latency (single, double, extended) cycles:
Athlon (K7) :  4,   4,   8

I have no info about FLD/FSTP latency on Pentium Pro through Pentium 4,
only the number of micro-ops for FLD/FSTP:

Number of micro-ops (single, double, extended):
FLD  : 1,  1,  4
FSTP : 2,  2,  complex instruction

 btw,
 It is highly not recommended to do any performance sensitive calculations on
 x86 in extended precision.
 There are no reg-mem floating point instructions for 80bit floats,
 everything has to be compiled into ( FLD / stack operations / FST ) form.
 Besides, extended precision FLD & FST are much slower than single/double
 precision.
It should be ( FLD / stack operations / FSTP ), since there is no FST for extended precision. That means there is no way to store a result in memory without popping it off the FPU stack.
Feb 10 2002
prev sibling parent reply Russell Borogove <kaleja estarcion.com> writes:
Walter wrote:

 On Intel processors, the float and double math computations are not one iota
 faster than the extended ones. The ONLY reasons to use float and double are:
 
 1) compatibility with C
 2) large arrays will use less space
 
As an extension of item 2, note that in the FPU, they're not one iota faster, but getting thousands of floats into and out of level-1 cache is much faster than doubles or extendeds.

That's the main reason that 3D graphics and high-end audio applications, today, use floats instead of the fatter formats.

-RB
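
The cache argument comes down to simple arithmetic, sketched below. The 64-byte cache line is an assumption chosen for the example (line sizes differ between processors), the array is assumed to start on a line boundary, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Total bytes occupied by `count` elements of type T.
template <typename T>
std::size_t array_bytes(std::size_t count) {
    return count * sizeof(T);
}

// Number of cache lines a line-aligned array of `count` elements spans.
// line_bytes defaults to 64, an assumed line size.
template <typename T>
std::size_t cache_lines_touched(std::size_t count, std::size_t line_bytes = 64) {
    assert(line_bytes > 0);
    return (array_bytes<T>(count) + line_bytes - 1) / line_bytes;
}
```

With a 4-byte float and an 8-byte double, 1000 floats span 63 such lines and 1000 doubles span 125, so streaming the doubles moves roughly twice the data through the cache.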
Feb 09 2002
parent "Walter" <walter digitalmars.com> writes:
"Russell Borogove" <kaleja estarcion.com> wrote in message
news:3C6590B5.1000602 estarcion.com...
 As an extension of item 2, note that in the FPU, they're not
 one iota faster, but getting thousands of floats into and out
 of level-1 cache is much faster than doubles or extendeds.

 That's the main reason that 3D graphics and high-end audio
 applications, today, use floats instead of the fatter
 formats.
Ok, I hadn't thought of that.
Feb 09 2002