digitalmars.D.bugs - int opEquals(Object), and other legacy ints

reply Stewart Gordon <smjg_1998 yahoo.com> writes:
There seem to be a number of leftovers from before we had a bool type, 
and many people were using the int type to pass booleans around.

The most obvious is int opEquals(Object) defined in the Object class. 
Changing this'll break a considerable amount of existing code - but then 
again, the 0.163 change of making imports private by default has done 
this already.

But there are many functions in Phobos that can be cleaned up a bit 
without doing much harm.  Just to name a few....

std.string.iswhite
std.string.inPattern
std.ctype.isalnum (indeed, most of the functions in std.ctype)
std.file.exists
std.file.isfile
std.file.isdir
std.intrinsic.bt
std.intrinsic.btc
std.intrinsic.btr
std.intrinsic.bts
std.math.isnan (and other is* functions)
std.math.signbit

Going through the other modules will probably reveal many more, but I 
haven't checked.

Stewart.
Jul 20 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Stewart Gordon wrote:
 There seem to be a number of leftovers from before we had a bool type, 
 and many people were using the int type to pass booleans around.

They are typed as returning int for efficiency reasons. These functions often appear in performance critical loops, where an extra instruction or two makes a difference.
Jul 21 2006
parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Walter Bright wrote:
 Stewart Gordon wrote:
 There seem to be a number of leftovers from before we had a bool type, 
 and many people were using the int type to pass booleans around.

They are typed as returning int for efficiency reasons. These functions often appear in performance critical loops, where an extra instruction or two makes a difference.

But isn't bool an int internally? Why is it less efficient to use a bool?

--
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 21 2006
next sibling parent "Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Bruno Medeiros" <brunodomedeirosATgmail SPAM.com> wrote in message 
news:e9qd21$2ueu$2 digitaldaemon.com...

 But isn't bool an int internally? Why is it less efficient to use a bool?

It's a byte internally.
Jul 21 2006
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Bruno Medeiros wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 There seem to be a number of leftovers from before we had a bool 
 type, and many people were using the int type to pass booleans around.

They are typed as returning int for efficiency reasons. These functions often appear in performance critical loops, where an extra instruction or two makes a difference.

But isn't bool an int internally? Why is it less efficient to use a bool?

It's a byte internally, and is constrained to be one of the values 0 or 1.
Jul 21 2006
parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Walter Bright wrote:
 Bruno Medeiros wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 There seem to be a number of leftovers from before we had a bool 
 type, and many people were using the int type to pass booleans around.

They are typed as returning int for efficiency reasons. These functions often appear in performance critical loops, where an extra instruction or two makes a difference.

But isn't bool an int internally? Why is it less efficient to use a bool?

It's a byte internally, and is constrained to be one of the values 0 or 1.

Duh, it's a byte of course, I should have checked that. But the question remains, is it then less efficient to return a byte than an int? Why? And if so isn't there a way for the compiler to somehow optimize it?

I find it a bit hard to believe that nowadays there isn't sufficient compiler and/or CPU technology to somehow make a bool(byte) return value as efficient as an int one. :/

--
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 27 2006
next sibling parent reply xs0 <xs0 xs0.com> writes:
 It's a byte internally, and is constrained to be one of the values 0 
 or 1.

Duh, it's a byte of course, I should have checked that. But the question remains, is it then less efficient to return a byte than an int? Why? And if so isn't there a way for the compiler to somehow optimize it? I find it a bit hard to believe that nowadays there isn't sufficient compiler and/or CPU technology to somehow make a bool(byte) return value as efficient as an int one. :/

Well, I'm just guessing, but I think something like
 int opEquals(Foo foo)
 {
     return this.bar == foo.bar;
 }

is compiled to something like
 return this.bar-foo.bar; // 1 instruction

but if the return type is bool, it becomes
 return this.bar-foo.bar?1:0; // 3 instructions

It's the 1/0 constraint on bools that causes the slowness, not the size (stack is usually size_t-aligned anyway)

xs0
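
To make the 0/1 constraint concrete, here is a minimal sketch in present-day D (not the 2006 dialect). The names rawDiff and isEqual are hypothetical and exist only for this illustration. The point is that an int result may be any nonzero value, while a bool result has to be exactly 0 or 1. Note that the raw difference is zero when the operands are equal, so its truth sense is the opposite of ==.

```d
import std.stdio;

// Hypothetical helper: returns the raw difference.
// Zero means "equal"; any nonzero value means "different".
int rawDiff(int a, int b)
{
    return a - b;
}

// Hypothetical helper: returns a genuine bool, so the generated code
// has to produce exactly 0 (false) or 1 (true).
bool isEqual(int a, int b)
{
    return a == b;
}

void main()
{
    writeln(rawDiff(7, 7));            // zero: operands are equal
    writeln(rawDiff(7, 3));            // nonzero: operands differ
    writeln(isEqual(7, 7));            // a real bool
    writeln(cast(int) isEqual(7, 7));  // the same bool viewed as an int
}
```

How many instructions each form costs depends on the compiler and target, so the sketch makes no claim about that.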
Jul 28 2006
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
xs0 wrote:
<snip>
 Well, I'm just guessing, but I think something like
 
  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }
 
 is compiled to something like
 
 return this.bar-foo.bar; // 1 instruction

but if the return type is bool, it becomes
 return this.bar-foo.bar?1:0; // 3 instructions


If it does this, then there's a serious bug in the compiler. Moreover, what's your evidence that subtracting one number from another might be more efficient than comparing them for equality directly?
 It's the 1/0 constraint on bools that causes the slowness, not the size 
 (stack is usually size_t-aligned anyway)

But if the function only tries to return 0 or 1 anyway, then what difference does it make? At the moment, I can't think of an example of equality testing that can be made more efficient by being allowed to return a value other than 0 or 1.

Stewart.

--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:- C++ a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Jul 28 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Stewart Gordon wrote:
 xs0 wrote:
 <snip>
 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

but if the return type is bool, it becomes
 return this.bar-foo.bar?1:0; // 3 instructions


If it does this, then there's a serious bug in the compiler.

What instruction sequence do you expect to be generated for it?
 Moreover, what's your evidence that subtracting one number from another 
 might be more efficient than comparing them for equality directly?

The only difference between a CMP and a SUB instruction is where the result ends up. But the CMP doesn't generate 0 or 1 as a result, it puts the result in the FLAGS register. Converting the FLAGS to a 0 or 1 in a register takes more instructions.
 It's the 1/0 constraint on bools that causes the slowness, not the 
 size (stack is usually size_t-aligned anyway)

But if the function only tries to return 0 or 1 anyway, then what difference does it make? At the moment, I can't think of an example of equality testing that can be made more efficient by being allowed to return a value other than 0 or 1.

I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.
Jul 28 2006
next sibling parent reply kris <foo bar.com> writes:
Walter Bright wrote:
 Stewart Gordon wrote:
 
 xs0 wrote:
 <snip>

 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

but if the return type is bool, it becomes
 return this.bar-foo.bar?1:0; // 3 instructions


If it does this, then there's a serious bug in the compiler.

What instruction sequence do you expect to be generated for it?
 Moreover, what's your evidence that subtracting one number from 
 another might be more efficient than comparing them for equality 
 directly?

The only difference between a CMP and a SUB instruction is where the result ends up. But the CMP doesn't generate 0 or 1 as a result, it puts the result in the FLAGS register. Converting the FLAGS to a 0 or 1 in a register takes more instructions.
 It's the 1/0 constraint on bools that causes the slowness, not the 
 size (stack is usually size_t-aligned anyway)

But if the function only tries to return 0 or 1 anyway, then what difference does it make? At the moment, I can't think of an example of equality testing that can be made more efficient by being allowed to return a value other than 0 or 1.

I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.

So, why not treat false as 0, and true as not 0? That way, it works just the same as the "int" version does (and comparing/testing against zero doesn't hit the address-bus). Yes, I can see some potential for concern there; but is there anything insurmountable?
Jul 28 2006
parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
kris wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 But if the function only tries to return 0 or 1 anyway, then what 
 difference does it make?  At the moment, I can't think of an example 
 of equality testing that can be made more efficient by being allowed 
 to return a value other than 0 or 1.

I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.

So, why not treat false as 0, and true as not 0? That way, it works just the same as the "int" version does (and comparing/testing against zero doesn't hit the address-bus). Yes, I can see some potential for concern there; but is there anything insurmountable?

Then what would happen if a and b differ by, say, 256? Remember, an int is 4 bytes, a bool is only 1.
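
A small sketch of this truncation problem, in present-day D. The helper name lowByte is hypothetical. It keeps only the low 8 bits of the difference, which is what would happen if an int difference were stored directly into a one-byte bool without first being normalized.

```d
import std.stdio;

// Hypothetical helper: the low 8 bits of (a - b), as an unsigned byte.
ubyte lowByte(int a, int b)
{
    return cast(ubyte)(a - b);
}

void main()
{
    int a = 0x1100;
    int b = 0x1000;             // a and b differ by exactly 256

    writeln(a - b);             // the full-width difference is nonzero
    writeln(lowByte(a, b));     // but its low byte is zero

    // Normalizing before narrowing keeps the answer correct.
    bool differ = (a - b) != 0;
    writeln(differ);
}
```

So a byte-sized result taken straight from the subtraction would report these two values as equal, which is why the extra normalizing step is needed.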
Jul 28 2006
parent reply kris <foo bar.com> writes:
Frits van Bommel wrote:
 kris wrote:
 
 Walter Bright wrote:

 Stewart Gordon wrote:

 But if the function only tries to return 0 or 1 anyway, then what 
 difference does it make?  At the moment, I can't think of an example 
 of equality testing that can be made more efficient by being allowed 
 to return a value other than 0 or 1.

I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.

So, why not treat false as 0, and true as not 0? That way, it works just the same as the "int" version does (and comparing/testing against zero doesn't hit the address-bus). Yes, I can see some potential for concern there; but is there anything insurmountable?

Then what would happen if a and b differ by, say, 256? Remember, an int is 4 bytes, a bool is only 1.

Sure, but it's generally more efficient to do all logical and arithmetic operations in the native width of the device anyway ~ generally 32bits for current D compilers.

If you're talking about issues related to actually storing a bool result, then that's part of the "concerns" noted above. Bool values derived in certain ways may need to be folded for storage, but not for testing. The subtraction case above may be included in that group, but testing should still only require a compare against zero (for both true and false).

I'm suggesting only that zero values should *always* be used to test for 'truth' ~ never 1, or 255, or any value other than zero. Anywhere the keyword "true" is used (or implied) for comparative purposes, test against zero and invert the jmp-condition instead. If that's not done already, it would probably speed things up in many cases.
Jul 28 2006
parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
kris wrote:
 Frits van Bommel wrote:
 kris wrote:

 Walter Bright wrote:

 Stewart Gordon wrote:

 But if the function only tries to return 0 or 1 anyway, then what 
 difference does it make?  At the moment, I can't think of an 
 example of equality testing that can be made more efficient by 
 being allowed to return a value other than 0 or 1.

I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.

So, why not treat false as 0, and true as not 0? That way, it works just the same as the "int" version does (and comparing/testing against zero doesn't hit the address-bus). Yes, I can see some potential for concern there; but is there anything insurmountable?

Then what would happen if a and b differ by, say, 256? Remember, an int is 4 bytes, a bool is only 1.

Sure, but it's generally more efficient to do all logical and arithmetic operations in the native width of the device anyway ~ generally 32bits for current D compilers.

If you're talking about issues related to actually storing a bool result, then that's part of the "concerns" noted above. Bool values derived in certain ways may need to be folded for storage, but not for testing. The subtraction case above may be included in that group, but testing should still only require a compare against zero (for both true and false).

I'm suggesting only that zero values should *always* be used to test for 'truth' ~ never 1, or 255, or any value other than zero. Anywhere the keyword "true" is used (or implied) for comparative purposes, test against zero and invert the jmp-condition instead. If that's not done already, it would probably speed things up in many cases.

Actually, I'm pretty sure testing for zero is already how it's done (just with 1-byte operands instead of 4-byte ones). Something else: if there are multiple ways to represent true then equality testing just got a lot more complicated :).
Jul 29 2006
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter Bright wrote:
 Stewart Gordon wrote:
 xs0 wrote:
 <snip>
 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

but if the return type is bool, it becomes
 return this.bar-foo.bar?1:0; // 3 instructions


If it does this, then there's a serious bug in the compiler.

What instruction sequence do you expect to be generated for it?

If anything resembling the above, then

     return this.bar-foo.bar?0:1;

which cancels out the advantage you mention next:

<snip>
 The only difference between a CMP and a SUB instruction is where the 
 result ends up. But the CMP doesn't generate 0 or 1 as a result, it puts 
 the result in the FLAGS register. Converting the FLAGS to a 0 or 1 in a 
 register takes more instructions.

<snip>
 But if the function only tries to return 0 or 1 anyway, then what 
 difference does it make?  At the moment, I can't think of an example 
 of equality testing that can be made more efficient by being allowed 
 to return a value other than 0 or 1.

I can. (a == b), where a and b are ints, can be implemented as (a - b), and the result is int 0 for equality, int !=0 for inequality.

How is this (a == b) rather than (a != b)?

Stewart.
Jul 29 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Stewart Gordon wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 xs0 wrote:
 <snip>
 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

but if the return type is bool, it becomes
 return this.bar-foo.bar?1:0; // 3 instructions


If it does this, then there's a serious bug in the compiler.

What instruction sequence do you expect to be generated for it?

If anything resembling the above, then return this.bar-foo.bar?0:1;

? Let's look at an example:

 class Foo
 {
     int foo, bar;

     int Eq1(Foo foo)
     {
         return this.bar-foo.bar?0:1;
     }

     int Eq2(Foo foo)
     {
         return this.bar-foo.bar;
     }
 }

which generates:

 Eq1:
     mov     EDX,4[ESP]
     mov     ECX,0Ch[EAX]
     sub     ECX,0Ch[EDX]
     cmp     ECX,1
     sbb     EAX,EAX
     neg     EAX
     ret     4

 Eq2:
     mov     ECX,4[ESP]
     mov     EAX,0Ch[EAX]
     sub     EAX,0Ch[ECX]
     ret     4

So we have 4 instructions generated rather than 1. If there's a trick to generate only one instruction for Eq1, I'd like to know about it.
 I can. (a == b), where a and b are ints, can be implemented as (a - 
 b), and the result is int 0 for equality, int !=0 for inequality.

How is this (a == b) rather than (a != b)?

I don't understand your question.
Jul 29 2006
next sibling parent reply Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
Walter Bright wrote:
 Stewart Gordon wrote:
 Walter Bright wrote:
 I can. (a == b), where a and b are ints, can be implemented as (a -
 b), and the result is int 0 for equality, int !=0 for inequality.

How is this (a == b) rather than (a != b)?

I don't understand your question.

(a - b), if a and b are equal ints, evaluates to 0, which is generally considered to mean false. So isn't (a - b) actually a way of finding (a != b), instead of (a == b)?
Jul 30 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Deewiant wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 Walter Bright wrote:
 I can. (a == b), where a and b are ints, can be implemented as (a -
 b), and the result is int 0 for equality, int !=0 for inequality.



(a - b), if a and b are equal ints, evaluates to 0, which is generally considered to mean false. So isn't (a - b) actually a way of finding (a != b), instead of (a == b)?

Oh, I see what you mean. To invert the result would take another 2 instructions for a total of 3, still less than 4.
Jul 30 2006
parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Walter Bright wrote:
 Deewiant wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 Walter Bright wrote:
 I can. (a == b), where a and b are ints, can be implemented as (a -
 b), and the result is int 0 for equality, int !=0 for inequality.



(a - b), if a and b are equal ints, evaluates to 0, which is generally considered to mean false. So isn't (a - b) actually a way of finding (a != b), instead of (a == b)?

Oh, I see what you mean. To invert the result would take another 2 instructions for a total of 3, still less than 4.

Exactly. But because what we have is opEquals and not opNotEquals, the benefit of fewer instructions is lost (except when opEquals is simple enough that the compiler can inline and optimise away the double negation).

Indeed, on this basis, if we had opNotEquals then it would just be equivalent to opCmp for many types. So I can see people thinking that opNotEquals should just call opCmp by default. However, there's a problem with this idea - for classes that have no ordering, even the current behaviour of comparing object references would have to be explicitly programmed in.

Stewart.
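
The relationship between a "not equals" test and opCmp can be sketched in present-day D as follows. Foo and notEqualViaCmp are hypothetical names for this illustration, and a struct is used so that no object references are involved. opNotEquals is not a real D operator; the free function only stands in for the idea.

```d
import std.stdio;

// Hypothetical type with a single ordered field.
struct Foo
{
    int bar;

    // Ordering: negative, zero, or positive, as opCmp requires.
    int opCmp(const Foo rhs) const
    {
        return (bar > rhs.bar) - (bar < rhs.bar);
    }
}

// Hypothetical stand-in for "opNotEquals": any nonzero ordering result
// already means "different", so no inversion is needed.
bool notEqualViaCmp(Foo a, Foo b)
{
    return a.opCmp(b) != 0;
}

// Equality needs the opposite sense, which is the extra step discussed here.
bool equalViaCmp(Foo a, Foo b)
{
    return a.opCmp(b) == 0;
}

void main()
{
    writeln(notEqualViaCmp(Foo(1), Foo(2)));
    writeln(equalViaCmp(Foo(3), Foo(3)));
}
```

This only works for types that have an ordering in the first place, which is the limitation raised above for classes without one.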
Jul 30 2006
prev sibling parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Walter Bright wrote:
 Stewart Gordon wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 xs0 wrote:
 <snip>
 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

but if the return type is bool, it becomes
 return this.bar-foo.bar?1:0; // 3 instructions


If it does this, then there's a serious bug in the compiler.

What instruction sequence do you expect to be generated for it?

If anything resembling the above, then return this.bar-foo.bar?0:1;

? Let's look at an example:

 class Foo
 {
     int foo, bar;

     int Eq1(Foo foo)
     {
         return this.bar-foo.bar?0:1;
     }

     int Eq2(Foo foo)
     {
         return this.bar-foo.bar;
     }
 }

which generates:

 Eq1:
     mov     EDX,4[ESP]
     mov     ECX,0Ch[EAX]
     sub     ECX,0Ch[EDX]
     cmp     ECX,1
     sbb     EAX,EAX
     neg     EAX
     ret     4

 Eq2:
     mov     ECX,4[ESP]
     mov     EAX,0Ch[EAX]
     sub     EAX,0Ch[ECX]
     ret     4

So we have 4 instructions generated rather than 1. If there's a trick to generate only one instruction for Eq1, I'd like to know about it.
 I can. (a == b), where a and b are ints, can be implemented as (a - 
 b), and the result is int 0 for equality, int !=0 for inequality.

How is this (a == b) rather than (a != b)?

I don't understand your question.

As per the other posts, Eq2 actually takes 2 instructions:

 Eq2:
     ...
     sub     EAX,0Ch[ECX]
     not     EAX;

And uuuh.., I've checked gcc's generated code for a C++'s Eq1, and it was only 2 instructions too, CMP and SETE ! :

 Eq1:
     ...
     cmp     EAX,0Ch[ECX]
     sete    EAX;

(http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
It seems to me perfectly valid, is there any problem here?

What does the original Eq1 even do? :

     sub     ECX,0Ch[EDX]
     cmp     ECX,1       // Huh?
     sbb     EAX,EAX
     neg     EAX

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 30 2006
next sibling parent kris <foo bar.com> writes:
Bruno Medeiros wrote:
 Walter Bright wrote:
 
 Stewart Gordon wrote:

 Walter Bright wrote:

 Stewart Gordon wrote:

 xs0 wrote:
 <snip>

 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

but if the return type is bool, it becomes
 return this.bar-foo.bar?1:0; // 3 instructions


If it does this, then there's a serious bug in the compiler.

What instruction sequence do you expect to be generated for it?

If anything resembling the above, then return this.bar-foo.bar?0:1;

? Let's look at an example:

 class Foo
 {
     int foo, bar;

     int Eq1(Foo foo)
     {
         return this.bar-foo.bar?0:1;
     }

     int Eq2(Foo foo)
     {
         return this.bar-foo.bar;
     }
 }

which generates:

 Eq1:
     mov     EDX,4[ESP]
     mov     ECX,0Ch[EAX]
     sub     ECX,0Ch[EDX]
     cmp     ECX,1
     sbb     EAX,EAX
     neg     EAX
     ret     4

 Eq2:
     mov     ECX,4[ESP]
     mov     EAX,0Ch[EAX]
     sub     EAX,0Ch[ECX]
     ret     4

So we have 4 instructions generated rather than 1. If there's a trick to generate only one instruction for Eq1, I'd like to know about it.
 I can. (a == b), where a and b are ints, can be implemented as (a - 
 b), and the result is int 0 for equality, int !=0 for inequality.

How is this (a == b) rather than (a != b)?

I don't understand your question.

As per the other posts, Eq2 actually takes 2 instructions:

 Eq2:
     ...
     sub     EAX,0Ch[ECX]
     not     EAX;

And uuuh.., I've checked gcc's generated code for a C++'s Eq1, and it was only 2 instructions too, CMP and SETE ! :

 Eq1:
     ...
     cmp     EAX,0Ch[ECX]
     sete    EAX;

(http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
It seems to me perfectly valid, is there any problem here?

Yes indeed. Well spotted! On anything supporting the 386 instruction set (and D is targeted for 32-bit devices only), there's really no performance advantage in returning an int over returning a bool. This should be addressed, so that some of the core APIs can be cleaned up appropriately?
 
 What does the original Eq1 even do? :
 
     sub     ECX,0Ch[EDX]
     cmp     ECX,1       // Huh?
     sbb     EAX,EAX
     neg     EAX
 
 

That's old-skool, pre-386 hacking :)
Jul 30 2006
prev sibling next sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Bruno Medeiros wrote:
 And uuuh.., I've checked gcc's generated code for a C++'s Eq1, and it 
 was only 2 instructions too, CMP and SETE ! :
 
 Eq1:
     ...
     cmp     EAX,0Ch[ECX]
     sete    EAX;
 
 (http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
 It seems to me perfectly valid, is there any problem here?

Interesting instruction. Seems to have the exact semantics needed for these situations. You'd almost think CPU designers care about what people want to do with their products :P.
 What does the original Eq1 even do? :

     mov     ECX,0Ch[EAX]

     sub     ECX,0Ch[EDX]

     cmp     ECX,1       // Huh?

Sets the carry flag if ECX is 0 (that is, if foo.bar == this.bar), clears it otherwise.
     sbb     EAX,EAX

Sets EAX to -1 (if carry == 1) or 0 (if carry == 0).
     neg     EAX

A bit weird at first glance, but it works as advertised :). But indeed, a cmp/sete combo seems to do the same in less instructions.
Jul 30 2006
parent reply "Lionello Lunesu" <lio lunesu.remove.com> writes:
 But indeed, a cmp/sete combo seems to do the same in less instructions.

But is it faster? I've noticed that many of the higher-level assembly instructions are actually slower than multiple lower-level ones. "loop" is the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne is faster).

L.
Aug 07 2006
next sibling parent Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Lionello Lunesu wrote:
 But indeed, a cmp/sete combo seems to do the same in less instructions.

But is it faster? I've noticed that many of the higher-level assembly instructions are actually slower than multiple lower-level ones. "loop" is the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne is faster).

Heh... You may have noticed I didn't use any word related to speed :). The reason for that is that I don't know much about optimization for speed, especially where pipelines etc. are involved... Hardware is weird.
Aug 07 2006
prev sibling parent reply kris <foo bar.com> writes:
Lionello Lunesu wrote:
But indeed, a cmp/sete combo seems to do the same in less instructions.

But is it faster? I've noticed that many of the higher-level assembly instructions are actually slower than multiple lower-level ones. "loop" is the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne is faster). L.

If you'd looked at the setne instruction linked previously, you'd have seen that it consumes 3 cycles. And no; there are no jump, loops, or any other reason to cause pipeline bubbles. If you need a primer on what causes modern CPUs to stall (the silly P4 in particular) then you could do a lot worse than to read the articles by Jon Stokes at ArsTechnica.

Oh, and this is just daft. Why don't we all count the cycles for a call/return instead? Or, perhaps just exactly what it costs to compare the bytes of two strings until they start to look different? You'll find the cost of setne (and probably even the prior "extra" three instructions for boolean support) is relegated to background noise.

Let's face it: int is likely used instead of bool for historical reasons; probably just an artifact left over from pre-80386 days. Would be nice to get that codegen cleaned up ~ especially since it was W who claimed the reasons were performance related. Hacking the high-level code with int vs boolean, just to reflect some archaic machine instruction, is one of those things that come under the umbrella of "premature optimization".
Aug 07 2006
parent Dave <Dave_member pathlink.com> writes:
kris wrote:
 Lionello Lunesu wrote:
 But indeed, a cmp/sete combo seems to do the same in less instructions.

But is it faster? I've noticed that many of the higher-level assembly instructions are actually slower than multiple lower-level ones. "loop" is the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne is faster). L.

If you'd looked at the setne instruction linked previously, you'd have seen that it consumes 3 cycles. And no; there are no jump, loops, or any other reason to cause pipeline bubbles. If you need a primer on what causes modern CPUs to stall (the silly P4 in particular) then you could do a lot worse than to read the articles by Jon Stokes at ArsTechnica. Oh, and this is just daft. Why don't we all count the cycles for a call/return instead? Or, perhaps just exactly what it costs to compare the bytes of two strings until they start to look different? You'll find the cost of setne (and probably even the prior "extra" three instructions for boolean support) is relegated to background noise. Let's face it: int is likely used instead of bool for historical reasons; probably just an artifact left over from pre-80386 days. Would be nice to get that codegen cleaned up ~ especially since it was W who claimed the reasons were performance related. Hacking the high-level code with int vs boolean, just to reflect some archaic machine instruction, is one of those things that come under the umbrella of "premature optimization".

Yea, AFAIK setne is supported by 386 onward, plus a quick check of the GDC code that uses it seems to indicate it is faster (from the Eq1 and Eq2 samples earlier in the thread). But you're right - in many cases it will probably be background noise anyhow 'cause you only save a couple of cycles.

As an aside, I think the current DMD backend may be well suited to the new Dual Core CPU because it hasn't been chasing after optimum performance on the P4 with its 20 stage pipeline or whatever <g>
Aug 07 2006
prev sibling parent Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Bruno Medeiros wrote:
 
 What does the original Eq1 even do? :
 
     sub     ECX,0Ch[EDX]
     cmp     ECX,1       // Huh?
     sbb     EAX,EAX
     neg     EAX
 
 

[PS: I've read Frits' answer after writing this: ]

Ah I get it now... wasn't understanding what borrow (the mathematical notion) was, since I'm not a native English speaker. Nothing a wikipedia lookup didn't solve. So, correct me if I'm wrong:
(when I say EDX I mean 0Ch[EDX] or whatever)

     // sets the carry flag if zero flag is on,
     // that is, if ECX == EDX (from previous instruction)
     cmp     ECX,1
     // sets EAX as zero and also subtracts one if carry flag is set
     // that is, EAX = -1 if ECX == EDX and EAX = 0 if ECX != EDX
     sbb     EAX,EAX
     // two's complement negation of EAX, 0 becomes 0, -1 becomes 1
     neg     EAX
     // end result: EAX = 1 if ECX == EDX and EAX = 0 if ECX != EDX

So yeah, it seems these 3 instructions do the same as SETE ... ?

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
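
The walkthrough above can be checked with a small model in present-day D. This is only a sketch: eq1Model is a hypothetical name, unsigned 32-bit arithmetic stands in for the registers, and the carry flag is modelled as a plain bool.

```d
import std.stdio;

// Hypothetical model of the sub / cmp / sbb / neg sequence.
uint eq1Model(uint thisBar, uint fooBar)
{
    uint ecx = thisBar - fooBar;      // sub ECX,0Ch[EDX]
    bool carry = ecx < 1;             // cmp ECX,1: unsigned borrow only when ECX is 0
    uint eax = carry ? uint.max : 0;  // sbb EAX,EAX: EAX - EAX - carry, so -1 or 0
    eax = 0 - eax;                    // neg EAX: -1 becomes 1, 0 stays 0
    return eax;
}

void main()
{
    writeln(eq1Model(5, 5));            // equal operands
    writeln(eq1Model(5, 6));            // different operands
    writeln(eq1Model(0x1100, 0x1000));  // operands that differ by 256
}
```

Under this model the sequence yields 1 for equal operands and 0 otherwise, which is the same result a compare followed by SETE would leave in the low byte.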
Jul 30 2006
prev sibling parent reply Walter Bright <newshound digitalmars.com> writes:
Bruno Medeiros wrote:
 But the question remains, is it then less efficient to return a byte 
 than an int?

Yes. It's also less efficient to constrain the results to 0 or 1.
 Why?

Consider: a = 0x1000; b = 0x2000; Now convert (a == b) into a bool. If the result is an int, I can just do (a - b), one instruction. Converting it to a byte, or to 1 or 0, takes more.
 And if so isn't there a way for the compiler to somehow 
 optimize it?

The math is inevitable <g>.
 I find it a bit hard to believe that nowadays there isn't sufficient 
 compiler and/or CPU technology to somehow make a bool(byte) return value 
 as efficient as an int one. :/

I work with what the CPU makes available.

P.S. Inevitably, some will ask "who cares" about these small efficiencies. The trouble is, these kinds of things often appear in tight loops, where small inefficiencies get multiplied by millions.
Jul 28 2006
next sibling parent reply Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Walter Bright wrote:
 Bruno Medeiros wrote:
 But the question remains, is it then less efficient to return a byte 
 than an int?

Yes. It's also less efficient to constrain the results to 0 or 1.
 Why?

Consider: a = 0x1000; b = 0x2000; Now convert (a == b) into a bool. If the result is an int, I can just do (a - b), one instruction. Converting it to a byte, or to 1 or 0, takes more.
 And if so isn't there a way for the compiler to somehow optimize it?

The math is inevitable <g>.

Well, let's think about the other way around then. Why should bool be constrained to 0 or 1? Why not, same as kris said, 0 would be false, and non zero would be true. Then we could have an opEquals or any function returning a bool instead of int, without penalty loss.

The only shortcoming I see is that it would be slower to compare two bool /variables/:
    (b1 == b2)
that expression is currently just 1 instruction, a CMP, but without the 0,1 restriction it would be more (3, I think, have to check that). However, is that significantly worse? I think not. I think comparison between two bool _variables_ is likely very rare, and when it happens it is also probably not performance critical. (statistical references?)
Note: this would not affect at all comparisons between a bool variable and a bool literal. Like (b == true) or (b == false).

Or is there another reason for the 0,1 restriction?

--
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
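
As a rough illustration of the trade-off being weighed here, the following present-day D sketch models a "loose" boolean as a ubyte in which any nonzero value means true. looseEqual is a hypothetical helper, not anything in Phobos or the compiler.

```d
import std.stdio;

// Hypothetical: compares two "loose" booleans, where any nonzero
// value counts as true. Each side is normalized before comparing.
bool looseEqual(ubyte x, ubyte y)
{
    return (x != 0) == (y != 0);
}

void main()
{
    ubyte t1 = 1;
    ubyte t2 = 4;                   // both are "true" under the loose rule

    writeln(t1 == t2);              // a plain compare sees two different bytes
    writeln(looseEqual(t1, t2));    // the normalized compare sees two trues
    writeln(looseEqual(t1, 0));     // true versus false
}
```

The plain byte compare is a single operation, while the loose version has to normalize both operands first. That is the extra cost for comparing two variables that the 0/1 restriction avoids.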
Jul 30 2006
parent reply Walter Bright <newshound digitalmars.com> writes:
Bruno Medeiros wrote:
 Well, let's think about the other way around then. Why should bool be 
 constrained to 0 or 1? Why not, same as kris said, 0 would be false, and 
 non zero would be true. Then we could have an opEquals or any function 
 returning a bool instead of int, without penalty loss.
 
 The only shortcoming I see is that it would be slower to compare two 
 bool /variables/:
    (b1 == b2)
 that expression is currently just 1 instruction, a CMP, but without the 
 0,1 restriction it would be more (3, I think, have to check that). 
 However, is that significantly worse? I think not. I think comparison 
 between two bool _variables_ is likely very rare, and when it happens it 
 is also probably not performance critical. (statistical references?)
 Note: this would not affect at all comparisons between a bool variable 
 and a bool literal. Like (b == true) or (b == false).

I think most programmers would find this to be very surprising behavior. I know I would.
Jul 30 2006
parent Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
Walter Bright wrote:
 Bruno Medeiros wrote:
 Well, let's think about the other way around then. Why should bool be 
 constrained to 0 or 1? Why not, same as kris said, 0 would be false, 
 and non zero would be true. Then we could have an opEquals or any 
 function returning a bool instead of int, without penalty loss.

 The only shortcoming I see is that it would be slower to compare two 
 bool /variables/:
    (b1 == b2)
 that expression is currently just 1 instruction, a CMP, but without 
 the 0,1 restriction it would be more (3, I think, have to check that). 
 However, is that significantly worse? I think not. I think comparison 
 between two bool _variables_ is likely very rare, and when it happens 
 it is also probably not performance critical. (statistical references?)
 Note: this would not affect at all comparisons between a bool variable 
 and a bool literal. Like (b == true) or (b == false).

I think most programmers would find this to be very surprising behavior. I know I would.

Surprising behavior? What surprising behavior? Those are all implementation details; they have no bearing on language/program behavior.

And how about the alternative of using the SETE instruction for the bool restriction? You haven't commented on that yet...

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
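For reference, the 0-or-1 restriction being discussed looks like this at the C level. The function names are made up for this sketch, and the remark about CMP followed by SETE describes what x86 compilers commonly generate for such a comparison; it is an assumption about typical code generation, not a statement about DMD.

```c
#include <stdbool.h>
#include <stdint.h>

/* Equality constrained to 0 or 1. On x86 this is commonly lowered to
   CMP followed by SETE (assumed typical codegen, not guaranteed). */
bool equal_01(uint32_t a, uint32_t b)
{
    return a == b;
}

/* Once both operands are known to be 0 or 1, comparing two bools
   is a plain integer comparison with no normalization step. */
bool same_truth(bool b1, bool b2)
{
    return b1 == b2;
}
```

The cost of normalization is paid once, where the bool is produced, rather than at every place where two bools are compared.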
Aug 01 2006
prev sibling parent Dave <Dave_member pathlink.com> writes:
Walter Bright wrote:
 P.S. Inevitably, some will ask "who cares" about these small 
 efficiencies. The trouble is, these kinds of things often appear in 
 tight loops, where small inefficiencies get multiplied by millions.

I consider this kind of stuff the compiler's job -- so if I write or maintain code that is slow, I know there is probably something I can do about it w/o having to drop into assembly.

Personally I've spent a huge amount of time tuning code, and I can't tell you how positive an effect that has on end-users. IMHO bad performance is often the "forgotten bug" (that's not to say the budget should be busted on that "last 20%" either, though).

- Dave
Jul 31 2006