## D - D floating point maths

• John Fletcher (4/4) Feb 07 2002 At the moment functions like sqrt() use the underlying C functions in
• Pavel Minayev (5/8) Feb 07 2002 Yes - write them =)
• Sean L. Palmer (9/13) Feb 08 2002 Actually can we have some functions like sin, cos, tan, and sqrt that de...
• John Fletcher (7/14) Feb 08 2002 I was pondering implementing the full precision versions I talked about. ...
• Pavel Minayev (7/12) Feb 08 2002 instructions
• Walter (4/7) Feb 08 2002 It usually just subtracts 12 from ESP and does an FST. With scheduling a...
• Russell Borogove (5/18) Feb 08 2002 Indeed on all counts, both extended and single-precision versions
• Walter (7/21) Feb 08 2002 On Intel processors, the float and double math computations are not one ...
• Sean L. Palmer (12/16) Feb 08 2002 That's not true... but you have to set the CPU into low precision mode t...
• Walter (4/21) Feb 09 2002 Hmm. I didn't know that. -Walter
• Sean L. Palmer (46/74) Feb 09 2002 Here is a sample from the MSDN docs for VC++ 6.0 which illustrates this:
• Walter (9/86) Feb 09 2002 I know that you can reset the internal calculation precision. I did not ...
• Sean L. Palmer (8/11) Feb 09 2002 You may have read more recent docs than I have... last I thoroughly chec...
• Walter (5/16) Feb 09 2002 I suppose the definitive way is to write a benchmark.
• Serge K (15/19) Feb 10 2002 That's all what I found in Intel optimization manuals:
• Serge K (23/30) Feb 10 2002 The same info with additions and corrections:
• Russell Borogove (8/14) Feb 09 2002 As an extension of item 2, note that in the FPU, they're not
• Walter (3/9) Feb 09 2002 Ok, I hadn't thought of that.
John Fletcher <J.P.Fletcher aston.ac.uk> writes:
```At the moment functions like sqrt() use the underlying C functions in
double precision. Is there any way to have versions which work to
extended precision?

John
```
Feb 07 2002
"Pavel Minayev" <evilone omen.ru> writes:
```"John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message
news:3C6268D1.C1E23080 aston.ac.uk...

At the moment functions like sqrt() use the underlying C functions in
double precision. Is there any way to have versions which work to
extended precision?

Yes - write them =)
The comment there says it's just a temporary solution. I'd expect
all the math functions to be rewritten before the final release.
```
Feb 07 2002
"Sean L. Palmer" <spalmer iname.com> writes:
```Actually can we have some functions like sin, cos, tan, and sqrt that deal
with float instead of double?  In the world of games, speed is usually more
important than accuracy and I hate having to explicitly typecast back to
float to avoid warnings.

Another nice thing to have is reciprocal square root (most processors have
this nowadays...) usually it's cheaper (and less accurate) than 1/sqrt(x)

Sean

"John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message
news:3C6268D1.C1E23080 aston.ac.uk...
At the moment functions like sqrt() use the underlying C functions in
double precision. Is there any way to have versions which work to
extended precision?

John

```
Feb 08 2002
John Fletcher <J.P.Fletcher aston.ac.uk> writes:
```"Sean L. Palmer" wrote:

Actually can we have some functions like sin, cos, tan, and sqrt that deal
with float instead of double?  In the world of games, speed is usually more
important than accuracy and I hate having to explicitly typecast back to
float to avoid warnings.

Another nice thing to have is reciprocal square root (most processors have
this nowadays...) usually it's cheaper (and less accurate) than 1/sqrt(x)

Sean

I was pondering implementing the full precision versions I talked about.  I
think that the modern Intel and compatible chips have coprocessor instructions
which actually do the work to full precision.  Is it just a question of
wrapping the correct input and output around that to do the different cases?
There may be some error trapping as well.

John
```
Feb 08 2002
"Pavel Minayev" <evilone omen.ru> writes:
```"John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message
news:3C63C2A2.CC837516 aston.ac.uk...

I was pondering implementing the full precision versions I talked about.  I
think that the modern Intel and compatible chips have coprocessor
instructions which actually do the work to full precision.  Is it just a
question of wrapping the correct input and output around that to do the
different cases?  There may be some error trapping as well.

Yes, AFAIK Intel FPUs do calculations in full precision anyhow. However,
extended arguments have to be passed on the stack, and since they're 10
bytes long, you get three PUSHes (while a float would only take one).
```
Feb 08 2002
"Walter" <walter digitalmars.com> writes:
```"Pavel Minayev" <evilone omen.ru> wrote in message
news:a40jcl\$1oso\$1 digitaldaemon.com...
Yes, AFAIK Intel FPUs do calculations in full precision anyhow. However,
extended arguments have to be passed on the stack, and since they're 10
bytes long, you get three PUSHes (while a float would only take one).

It usually just subtracts 12 from ESP and does an FST. With scheduling and
pipelining, the extra instruction frequently takes no extra time.
```
Feb 08 2002
Russell Borogove <kaleja estarcion.com> writes:
```Sean L. Palmer wrote:

Actually can we have some functions like sin, cos, tan, and sqrt that deal
with float instead of double?  In the world of games, speed is usually more
important than accuracy and I hate having to explicitly typecast back to
float to avoid warnings.

Another nice thing to have is reciprocal square root (most processors have
this nowadays...) usually it's cheaper (and less accurate) than 1/sqrt(x)

"John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message
news:3C6268D1.C1E23080 aston.ac.uk...

At the moment functions like sqrt() use the underlying C functions in
double precision. Is there any way to have versions which work to
extended precision?

Indeed on all counts, both extended and single-precision versions
of at least the more common math functions would be valuable.
Who wants to get to work on that library? :)

-Russell B
```
Feb 08 2002
"Pavel Minayev" <evilone omen.ru> writes:
```"Russell Borogove" <kaleja estarcion.com> wrote in message
news:3C63FBED.1080500 estarcion.com...

Indeed on all counts, both extended and single-precision versions
of at least the more common math functions would be valuable.
Who wants to get to work on that library? :)

Not till we get inline asm working =)
```
Feb 08 2002
"Sean L. Palmer" <spalmer iname.com> writes:
```I believe the common form of this stuff is to add "f" to the end of the name

sqrtf
fabsf
fmodf

etc

Sean

"Russell Borogove" <kaleja estarcion.com> wrote in message
news:3C63FBED.1080500 estarcion.com...
Sean L. Palmer wrote:

Actually can we have some functions like sin, cos, tan, and sqrt that deal
with float instead of double?  In the world of games, speed is usually more
important than accuracy and I hate having to explicitly typecast back to
float to avoid warnings.

Another nice thing to have is reciprocal square root (most processors have
this nowadays...) usually it's cheaper (and less accurate) than 1/sqrt(x)

"John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message
news:3C6268D1.C1E23080 aston.ac.uk...

At the moment functions like sqrt() use the underlying C functions in
double precision. Is there any way to have versions which work to
extended precision?

Indeed on all counts, both extended and single-precision versions
of at least the more common math functions would be valuable.
Who wants to get to work on that library? :)

-Russell B

```
Feb 08 2002
"Pavel Minayev" <evilone omen.ru> writes:
```"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a4192g\$2ihc\$1 digitaldaemon.com...

I believe the common form of this stuff is to add "f" to the end of the name:
sqrtf
fabsf
fmodf

```
Feb 08 2002
"Juan Carlos Arevalo Baeza" <jcab roningames.com> writes:
```"Pavel Minayev" <evilone omen.ru> wrote in message
news:a419ev\$2kdg\$1 digitaldaemon.com...
"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a4192g\$2ihc\$1 digitaldaemon.com...

I believe the common form of this stuff is to add "f" to the end of the name:
sqrtf
fabsf
fmodf

The suffix specifies the precision of the return type, which cannot be
overloaded on. Or am I wrong?

Salutaciones,
JCAB
```
Feb 08 2002
"Pavel Minayev" <evilone omen.ru> writes:
```"Juan Carlos Arevalo Baeza" <jcab roningames.com> wrote in message
news:a419vp\$2kol\$1 digitaldaemon.com...

The suffix specifies the precision of the return type, which cannot be
overloaded on. Or am I wrong?

Return type can be determined by the argument:

float sqrt(float);
double sqrt(double);
extended sqrt(extended);
```
Feb 08 2002
"Sean L. Palmer" <spalmer iname.com> writes:
```True true.

Sean

"Pavel Minayev" <evilone omen.ru> wrote in message
news:a419ev\$2kdg\$1 digitaldaemon.com...
"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a4192g\$2ihc\$1 digitaldaemon.com...

I believe the common form of this stuff is to add "f" to the end of the name:
sqrtf
fabsf
fmodf

```
Feb 08 2002
"Walter" <walter digitalmars.com> writes:
```"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a4192g\$2ihc\$1 digitaldaemon.com...
I believe the common form of this stuff is to add "f" to the end of the name:
sqrtf
fabsf
fmodf

```
Feb 08 2002
"Walter" <walter digitalmars.com> writes:
```On Intel processors, the float and double math computations are not one iota
faster than the extended ones. The ONLY reasons to use float and double are:

1) compatibility with C
2) large arrays will use less space

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a40c1s\$1lfi\$1 digitaldaemon.com...
Actually can we have some functions like sin, cos, tan, and sqrt that deal
with float instead of double?  In the world of games, speed is usually more
important than accuracy and I hate having to explicitly typecast back to
float to avoid warnings.

Another nice thing to have is reciprocal square root (most processors have
this nowadays...) usually it's cheaper (and less accurate) than 1/sqrt(x)

Sean

"John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message
news:3C6268D1.C1E23080 aston.ac.uk...
At the moment functions like sqrt() use the underlying C functions in
double precision. Is there any way to have versions which work to
extended precision?

John

```
Feb 08 2002
"Sean L. Palmer" <spalmer iname.com> writes:
```That's not true... but you have to set the CPU into low precision mode to
see the speed advantages.  Otherwise it internally works with double
precision by default.

In game scenarios, we can't just go around wasting 8 bytes per number when 4
bytes will do.  And it depends on the processor, as well.

Floats are still definitely faster.  For instance the P4 can handle 2
doubles per instruction, but can do 4 floats in the same amount of time.

Sean

"Walter" <walter digitalmars.com> wrote in message
news:a41oen\$2se5\$4 digitaldaemon.com...
On Intel processors, the float and double math computations are not one
iota faster than the extended ones. The ONLY reasons to use float and
double are:
1) compatibility with C
2) large arrays will use less space

```
Feb 08 2002
"Walter" <walter digitalmars.com> writes:
```Hmm. I didn't know that. -Walter

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a42k1f\$6m1\$1 digitaldaemon.com...
That's not true... but you have to set the CPU into low precision mode to
see the speed advantages.  Otherwise it internally works with double
precision by default.

In game scenarios, we can't just go around wasting 8 bytes per number when
4 bytes will do.  And it depends on the processor, as well.

Floats are still definitely faster.  For instance the P4 can handle 2
doubles per instruction, but can do 4 floats in the same amount of time.

Sean

"Walter" <walter digitalmars.com> wrote in message
news:a41oen\$2se5\$4 digitaldaemon.com...
On Intel processors, the float and double math computations are not one
iota faster than the extended ones. The ONLY reasons to use float and
double are:
1) compatibility with C
2) large arrays will use less space

```
Feb 09 2002
"Sean L. Palmer" <spalmer iname.com> writes:
```Here is a sample from the MSDN docs for VC++ 6.0 which illustrates this:
(you can do timings yourself if you wish...  It only affects the FPU x87
coprocessor.  I haven't tried this in a few years so newer Pentium 4
processors may not see much advantage from this)  However using SSE2 it is
still true that with one instruction you can either process 2 doubles or 4
floats.

I believe the main advantage this provides is keeping the FPU from having to
do so much work with complex calculations like division, square root, trig,
etc.  Fewer bits of precision need to be computed.  They can get away with
fewer iterations, cheaper approximations, fewer terms in the Taylor series,
etc.

In a lot of cases 5 or 6 digits of precision is all we need.  So don't get
rid of the float type yet.  ;)

Sean

/* CNTRL87.C: This program uses _control87 to output the control
 * word, set the precision to 24 bits, and reset the status to
 * the default.
 */

#include <stdio.h>
#include <float.h>

int main( void )
{
    double a = 0.1;

    /* Show original control word and do calculation. */
    printf( "Original: 0x%.4x\n", _control87( 0, 0 ) );
    printf( "%1.1f * %1.1f = %.15e\n", a, a, a * a );

    /* Set precision to 24 bits and recalculate. */
    printf( "24-bit:   0x%.4x\n", _control87( _PC_24, MCW_PC ) );
    printf( "%1.1f * %1.1f = %.15e\n", a, a, a * a );

    /* Restore to default and recalculate. */
    printf( "Default:  0x%.4x\n",
            _control87( _CW_DEFAULT, 0xfffff ) );
    printf( "%1.1f * %1.1f = %.15e\n", a, a, a * a );

    return 0;
}

Output

Original: 0x9001f
0.1 * 0.1 = 1.000000000000000e-002
24-bit:   0xa001f
0.1 * 0.1 = 9.999999776482582e-003
Default:  0x001f
0.1 * 0.1 = 1.000000000000000e-002

"Walter" <walter digitalmars.com> wrote in message
news:a42tca\$hrc\$2 digitaldaemon.com...
Hmm. I didn't know that. -Walter

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a42k1f\$6m1\$1 digitaldaemon.com...
That's not true... but you have to set the CPU into low precision mode to
see the speed advantages.  Otherwise it internally works with double
precision by default.

In game scenarios, we can't just go around wasting 8 bytes per number when
4 bytes will do.  And it depends on the processor, as well.

Floats are still definitely faster.  For instance the P4 can handle 2
doubles per instruction, but can do 4 floats in the same amount of time.

Sean

"Walter" <walter digitalmars.com> wrote in message
news:a41oen\$2se5\$4 digitaldaemon.com...
On Intel processors, the float and double math computations are not one
iota faster than the extended ones. The ONLY reasons to use float and
double are:
1) compatibility with C
2) large arrays will use less space
```
Feb 09 2002
"Walter" <walter digitalmars.com> writes:
```I know that you can reset the internal calculation precision. I did not
know this affected execution time; I've not seen any hint of that in the
Intel CPU documentation, though I could have just missed it.

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a444db\$14h4\$1 digitaldaemon.com...
Here is a sample from the MSDN docs for VC++ 6.0 which illustrates this:
(you can do timings yourself if you wish...  It only affects the FPU x87
coprocessor.  I haven't tried this in a few years so newer Pentium 4
processors may not see much advantage from this)  However using SSE2 it is
still true that with one instruction you can either process 2 doubles or 4
floats.

I believe the main advantage this provides is keeping the FPU from having
to do so much work with complex calculations like division, square root,
trig, etc.  Fewer bits of precision need to be computed.  They can get away
with fewer iterations, cheaper approximations, fewer terms in the Taylor
series, etc.

In a lot of cases 5 or 6 digits of precision is all we need.  So don't get
rid of the float type yet.  ;)

Sean

```
Feb 09 2002
"Sean L. Palmer" <spalmer iname.com> writes:
```You may have read more recent docs than I have... last I thoroughly checked
this out was on Pentium 1 (in fact Intel seems to not want to disclose
instruction cycle counts anymore... hard to find this info in latest P3
specs I've read, and I haven't read up on P4 at all aside from SSE stuff)

Sean

"Walter" <walter digitalmars.com> wrote in message
news:a446n5\$15hm\$3 digitaldaemon.com...
I know that you can reset the internal calculation precision. I did not
know this affected execution time; I've not seen any hint of that in the
Intel CPU documentation, though I could have just missed it.

```
Feb 09 2002
"Walter" <walter digitalmars.com> writes:
```I suppose the definitive way is to write a benchmark.

"Sean L. Palmer" <spalmer iname.com> wrote in message
news:a44i10\$19tn\$1 digitaldaemon.com...
You may have read more recent docs than I have... last I thoroughly
checked this out was on Pentium 1 (in fact Intel seems to not want to
disclose instruction cycle counts anymore... hard to find this info in
latest P3 specs I've read, and I haven't read up on P4 at all aside from
SSE stuff)

Sean

"Walter" <walter digitalmars.com> wrote in message
news:a446n5\$15hm\$3 digitaldaemon.com...
I know that you can reset the internal calculation precision. I did not
know this affected execution time; I've not seen any hint of that in the
Intel CPU documentation, though I could have just missed it.

```
Feb 09 2002
"Serge K" <skarebo programmer.net> writes:
```You may have read more recent docs than I have... last I thoroughly
checked this out was on Pentium 1 (in fact Intel seems to not want to
disclose instruction cycle counts anymore... hard to find this info in
latest P3 specs I've read, and I haven't read up on P4 at all aside from
SSE stuff)

That's all I found in the Intel optimization manuals:

FDIV: Latency (single, double, extended) cycles:
Pentium Pro : 17, 36, 56
Pentium 2,3 : 18, 32, 38
Pentium 4   : 23, 38, 43

FSQRT: Latency (single, double, extended) cycles:
Pentium 4   : 23, 38, 43

btw,
It is highly recommended not to do any performance-sensitive calculations
on x86 in extended precision.
There are no reg-mem floating point instructions for 80-bit floats;
everything has to be compiled into a ( FLD / stack operations / FST ) form.
Besides, extended precision FLD & FST are much slower than single/double
precision.
```
Feb 10 2002
"Serge K" <skarebo programmer.net> writes:
```The same info with additions and corrections:

FDIV: Latency (single, double, extended) cycles:
Pentium Pro : 17, 36, 56
Pentium 2,3 : 18, 32, 38
Pentium 4   : 23, 38, 43
Athlon (K7) : 16, 20, 24

FSQRT: Latency (single, double, extended) cycles:
Pentium 4   : 23, 38, 43
Athlon (K7) : 19, 27, 35

FLD: Latency (single, double, extended) cycles:
Athlon (K7) : 2, 2, 10
FSTP: Latency (single, double, extended) cycles:
Athlon (K7) : 4, 4, 8

I have no info about FLD/FSTP latency on Pentium Pro..4,
only the number of micro-ops for FLD/FSTP:

number of micro-ops (single, double, extended):
FLD  : 1, 1, 4
FSTP : 2, 2, complex instruction

btw,
It is highly recommended not to do any performance-sensitive calculations
on x86 in extended precision.
There are no reg-mem floating point instructions for 80-bit floats;
everything has to be compiled into a ( FLD / stack operations / FST ) form.
Besides, extended precision FLD & FST are much slower than single/double
precision.

It should be: ( FLD / stack operations / FSTP ),
since there is no FST for extended precision.
It means: there is no way to store a result in memory without popping it
off the FPU stack.
```
Feb 10 2002
Russell Borogove <kaleja estarcion.com> writes:
```Walter wrote:

On Intel processors, the float and double math computations are not one iota
faster than the extended ones. The ONLY reasons to use float and double are:

1) compatibility with C
2) large arrays will use less space

As an extension of item 2, note that in the FPU, they're not
one iota faster, but getting thousands of floats into and out
of level-1 cache is much faster than doubles or extendeds.

That's the main reason that 3D graphics and high-end audio
applications, today, use floats instead of the fatter
formats.

-RB
```
Feb 09 2002
"Walter" <walter digitalmars.com> writes:
```"Russell Borogove" <kaleja estarcion.com> wrote in message
news:3C6590B5.1000602 estarcion.com...
As an extension of item 2, note that in the FPU, they're not
one iota faster, but getting thousands of floats into and out
of level-1 cache is much faster than doubles or extendeds.

That's the main reason that 3D graphics and high-end audio
applications, today, use floats instead of the fatter
formats.

Ok, I hadn't thought of that.
```
Feb 09 2002