digitalmars.D.bugs - int opEquals(Object), and other legacy ints

Stewart Gordon (23/23) Jul 20 2006 There seem to be a number of leftovers from before we had a bool type,

Walter Bright (4/6) Jul 21 2006 They are typed as returning int for efficiency reasons. These functions

Bruno Medeiros (5/12) Jul 21 2006 But isn't bool an int internally? Why is it less efficient to use a bool...

Jarrett Billingsley (3/4) Jul 21 2006 It's a byte internally.
Walter Bright (2/12) Jul 21 2006 It's a byte internally, and is constrained to be one of the values 0 or ...

Bruno Medeiros (11/24) Jul 27 2006 Duh, it's a byte of course, I should have checked that.

xs0 (6/23) Jul 28 2006 but if the return type is bool, it becomes

Stewart Gordon (18/34) Jul 28 2006 If it does this, then there's a serious bug in the compiler.

Walter Bright (8/35) Jul 28 2006 The only difference between a CMP and a SUB instruction is where the

kris (5/54) Jul 28 2006 So, why not treat false as 0, and true as not 0? That way, it works

Frits van Bommel (3/19) Jul 28 2006 Then what would happen if a and b differ by, say, 256? Remember, an int

kris (14/40) Jul 28 2006 Sure, but it's generally more efficient to do all logical and arithmetic...

Frits van Bommel (5/47) Jul 29 2006 Actually, I'm pretty sure testing for zero is already how it's done

Stewart Gordon (16/48) Jul 29 2006 If anything resembling the above, then

Walter Bright (31/61) Jul 29 2006 ? Let's look at an example:

Deewiant (4/12) Jul 30 2006 (a - b), if a and b are equal ints, evaluates to 0, which is generally

Walter Bright (4/15) Jul 30 2006 Oh, I see what you mean.

Stewart Gordon (19/37) Jul 30 2006 Exactly. But because what we have is opEquals and not opNotEquals, the

Bruno Medeiros (22/92) Jul 30 2006 As per the other posts, Eq2 actually takes 2 instructions:

kris (7/119) Jul 30 2006 Yes indeed. Well spotted! On anything supporting the 386 instruction set...
Frits van Bommel (14/30) Jul 30 2006 Interesting instruction. Seems to have the exact semantics needed for

Lionello Lunesu (5/6) Aug 07 2006 But is it faster? I've noticed that many of the higher-level assembly

Frits van Bommel (5/11) Aug 07 2006 Heh... You may have noticed I didn't use any word related to speed :).
kris (18/29) Aug 07 2006 If you'd looked at the setne instruction linked previously, you'd have

Dave (7/40) Aug 07 2006 Yea, AFAIK setne is supported by 386 onward, plus a quick check of the G...

Bruno Medeiros (19/28) Jul 30 2006 [PS: I've read Frits answer after writing this: ]

Walter Bright (12/20) Jul 28 2006 Consider:

Bruno Medeiros (19/40) Jul 30 2006 Well, let's think about the other way around then. Why should bool be

Walter Bright (3/18) Jul 30 2006 I think most programmers would find this to be very surprising behavior....

Bruno Medeiros (9/28) Aug 01 2006 Surprising behavior? What surprising behavior, those are all

Dave (9/12) Jul 31 2006 I consider this kind of stuff the compilers job -- so if I write or

Stewart Gordon <smjg_1998 yahoo.com> writes:

There seem to be a number of leftovers from before we had a bool type, 
and many people were using the int type to pass booleans around.

The most obvious is int opEquals(Object) defined in the Object class. 
Changing this'll break a considerable amount of existing code - but then 
again, the 0.163 change of making imports private by default has done 
this already.

But there are many functions in Phobos that can be cleaned up a bit 
without doing much harm.  Just to name a few....

std.string.iswhite
std.string.inPattern
std.ctype.isalnum (indeed, most of the functions in std.ctype)
std.file.exists
std.file.isfile
std.file.isdir
std.intrinsic.bt
std.intrinsic.btc
std.intrinsic.btr
std.intrinsic.bts
std.math.isnan (and other is* functions)
std.math.signbit

Going through the other modules will probably reveal many more, but I 
haven't checked.

Stewart.

Jul 20 2006

Walter Bright <newshound digitalmars.com> writes:

Stewart Gordon wrote:
 There seem to be a number of leftovers from before we had a bool type, 
 and many people were using the int type to pass booleans around.

They are typed as returning int for efficiency reasons. These functions 
often appear in performance critical loops, where an extra instruction 
or two makes a difference.

Jul 21 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Walter Bright wrote:
 Stewart Gordon wrote:
 There seem to be a number of leftovers from before we had a bool type, 
 and many people were using the int type to pass booleans around.

 
 They are typed as returning int for efficiency reasons. These functions 
 often appear in performance critical loops, where an extra instruction 
 or two makes a difference.

But isn't bool an int internally? Why is it less efficient to use a bool?

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 21 2006

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Bruno Medeiros" <brunodomedeirosATgmail SPAM.com> wrote in message 
news:e9qd21$2ueu$2 digitaldaemon.com...

 But isn't bool an int internally? Why is it less efficient to use a bool?

It's a byte internally.

Jul 21 2006

Walter Bright <newshound digitalmars.com> writes:

Bruno Medeiros wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 There seem to be a number of leftovers from before we had a bool 
 type, and many people were using the int type to pass booleans around.

 They are typed as returning int for efficiency reasons. These 
 functions often appear in performance critical loops, where an extra 
 instruction or two makes a difference.

 
 But isn't bool an int internally? Why is it less efficient to use a bool?

It's a byte internally, and is constrained to be one of the values 0 or 1.

Jul 21 2006
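
A minimal sketch of the representation described above, assuming a current D compiler (the 2006 toolchain may have differed in detail). The helper name `asInt` is made up for illustration and is not part of Phobos.

```d
import std.stdio : writeln;

// Hypothetical helper: widen a bool to int through the implicit conversion.
int asInt(bool b)
{
    return b;
}

void main()
{
    // A bool occupies one byte; an int occupies four.
    static assert(bool.sizeof == 1);
    static assert(int.sizeof == 4);

    // A well-formed bool only ever converts to 0 or 1.
    assert(asInt(false) == 0);
    assert(asInt(true) == 1);

    writeln("bool: ", bool.sizeof, " byte, int: ", int.sizeof, " bytes");
}
```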

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Walter Bright wrote:
 Bruno Medeiros wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 There seem to be a number of leftovers from before we had a bool 
 type, and many people were using the int type to pass booleans around.

 They are typed as returning int for efficiency reasons. These 
 functions often appear in performance critical loops, where an extra 
 instruction or two makes a difference.

 But isn't bool an int internally? Why is it less efficient to use a bool?

 
 It's a byte internally, and is constrained to be one of the values 0 or 1.

Duh, it's a byte of course, I should have checked that.

But the question remains: is it then less efficient to return a byte 
than an int? Why? And if so, isn't there a way for the compiler to 
optimize it?
I find it a bit hard to believe that nowadays there isn't sufficient 
compiler and/or CPU technology to make a bool (byte) return value 
as efficient as an int one. :/


Jul 27 2006

xs0 <xs0 xs0.com> writes:

 It's a byte internally, and is constrained to be one of the values 0 
 or 1.

 
 Duh, it's a byte of course, I should have checked that.
 
 But the question remains, is it then less efficient to return a byte 
 than a int? Why? And if so isn't there a way for the compiler to somehow 
 optimize it?
 I find it a bit hard to believe that nowadays there isn't sufficient 
 compiler and/or CPU technology to somehow make a bool(byte) return value 
 as efficient as a int one. :/

Well, I'm just guessing, but I think something like

 int opEquals(Foo foo)
 {
     return this.bar == foo.bar;
 }

is compiled to something like

 return this.bar-foo.bar; // 1 instruction

but if the return type is bool, it becomes

 return this.bar-foo.bar?1:0; // 3 instructions

It's the 1/0 constraint on bools that causes the slowness, not the size 
(stack is usually size_t-aligned anyway)


xs0

Jul 28 2006
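
The difference xs0 is guessing at can be sketched in source form. This only illustrates the two value ranges; it makes no claim about what any particular compiler emits, and both function names are hypothetical.

```d
// Raw difference: 0 when the operands are equal, any non-zero value otherwise.
int rawDifference(int a, int b)
{
    return a - b;
}

// The same test squeezed into the 0/1 range, as the `?1:0` form does.
int normalisedDifference(int a, int b)
{
    return (a - b) ? 1 : 0;
}

void main()
{
    assert(rawDifference(10, 3) == 7);        // non-zero, but not 1
    assert(normalisedDifference(10, 3) == 1); // collapsed to 1
    assert(rawDifference(4, 4) == 0);
    assert(normalisedDifference(4, 4) == 0);
}
```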

Stewart Gordon <smjg_1998 yahoo.com> writes:

xs0 wrote:
<snip>
 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

 but if the return type is bool, it becomes

 return this.bar-foo.bar?1:0; // 3 instructions

If it does this, then there's a serious bug in the compiler.

Moreover, what's your evidence that subtracting one number from another 
might be more efficient than comparing them for equality directly?

 It's the 1/0 constraint on bools that causes the slowness, not the size 
 (stack is usually size_t-aligned anyway)

But if the function only tries to return 0 or 1 anyway, then what 
difference does it make?  At the moment, I can't think of an example of 
equality testing that can be made more efficient by being allowed to 
return a value other than 0 or 1.

Stewart.

-- 
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCS/M d- s:-  C++  a->--- UB  P+ L E  W++  N+++ o K-  w++  O? M V? PS- 
PE- Y? PGP- t- 5? X? R b DI? D G e++++ h-- r-- !y
------END GEEK CODE BLOCK------

My e-mail is valid but not my primary mailbox.  Please keep replies on 
the 'group where everyone may benefit.

Jul 28 2006

Walter Bright <newshound digitalmars.com> writes:

Stewart Gordon wrote:
 xs0 wrote:
 <snip>
 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

 but if the return type is bool, it becomes

 return this.bar-foo.bar?1:0; // 3 instructions

 If it does this, then there's a serious bug in the compiler.

What instruction sequence do you expect to be generated for it?

 Moreover, what's your evidence that subtracting one number from another 
 might be more efficient than comparing them for equality directly?

The only difference between a CMP and a SUB instruction is where the 
result ends up. But the CMP doesn't generate 0 or 1 as a result, it puts 
the result in the FLAGS register. Converting the FLAGS to a 0 or 1 in a 
register takes more instructions.

 It's the 1/0 constraint on bools that causes the slowness, not the 
 size (stack is usually size_t-aligned anyway)

 But if the function only tries to return 0 or 1 anyway, then what 
 difference does it make?  At the moment, I can't think of an example of 
 equality testing that can be made more efficient by being allowed to 
 return a value other than 0 or 1.

I can. (a == b), where a and b are ints, can be implemented as (a - b), 
and the result is int 0 for equality, int !=0 for inequality.

Jul 28 2006

kris <foo bar.com> writes:

Walter Bright wrote:
 Stewart Gordon wrote:

 xs0 wrote:
 <snip>

 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

 but if the return type is bool, it becomes

 return this.bar-foo.bar?1:0; // 3 instructions

 If it does this, then there's a serious bug in the compiler.

 What instruction sequence do expect to be generated for it?

 Moreover, what's your evidence that subtracting one number from 
 another might be more efficient than comparing them for equality 
 directly?

 The only difference between a CMP and a SUB instruction is where the 
 result ends up. But the CMP doesn't generate 0 or 1 as a result, it puts 
 the result in the FLAGS register. Converting the FLAGS to a 0 or 1 in a 
 register takes more instructions.

 It's the 1/0 constraint on bools that causes the slowness, not the 
 size (stack is usually size_t-aligned anyway)

 But if the function only tries to return 0 or 1 anyway, then what 
 difference does it make?  At the moment, I can't think of an example 
 of equality testing that can be made more efficient by being allowed 
 to return a value other than 0 or 1.

 I can. (a == b), where a and b are ints, can be implemented as (a - b), 
 and the result is int 0 for equality, int !=0 for inequality.

So, why not treat false as 0, and true as not 0?  That way, it works 
just the same as the "int" version does (and comparing/testing against 
zero doesn't hit the address-bus). Yes, I can see some potential for 
concern there; but is there anything insurmountable?

Jul 28 2006

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

kris wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 But if the function only tries to return 0 or 1 anyway, then what 
 difference does it make?  At the moment, I can't think of an example 
 of equality testing that can be made more efficient by being allowed 
 to return a value other than 0 or 1.


 I can. (a == b), where a and b are ints, can be implemented as (a - 
 b), and the result is int 0 for equality, int !=0 for inequality.

 
 
 So, why not treat false as 0, and true as not 0?  That way, it works 
 just the same as the "int" version does (and comparing/testing against 
 zero doesn't hit the address-bus). Yes, I can see some potential for 
 concern there; but is there anything insurmountable?

Then what would happen if a and b differ by, say, 256? Remember, an int 
is 4 bytes, a bool is only 1.

Jul 28 2006
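
The 256 case can be checked directly. The sketch below models "keep only the low byte of the int result" with a `ubyte` cast; it deliberately avoids `cast(bool)`, which in D tests against zero and so would not show the problem. `lowByteOfDifference` is a hypothetical name.

```d
// Hypothetical model: store the int difference in a single byte without
// normalising it first.
ubyte lowByteOfDifference(int a, int b)
{
    return cast(ubyte)(a - b);
}

void main()
{
    int a = 300, b = 44; // differ by exactly 256
    assert(a != b);
    assert(a - b == 256);

    // The low byte of 256 is 0, so the truncated result looks like "equal".
    assert(lowByteOfDifference(a, b) == 0);

    // Small differences survive the truncation.
    assert(lowByteOfDifference(5, 3) == 2);
}
```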

kris <foo bar.com> writes:

Frits van Bommel wrote:
 kris wrote:
 
 Walter Bright wrote:

 Stewart Gordon wrote:

 But if the function only tries to return 0 or 1 anyway, then what 
 difference does it make?  At the moment, I can't think of an example 
 of equality testing that can be made more efficient by being allowed 
 to return a value other than 0 or 1.



 I can. (a == b), where a and b are ints, can be implemented as (a - 
 b), and the result is int 0 for equality, int !=0 for inequality.



 So, why not treat false as 0, and true as not 0?  That way, it works 
 just the same as the "int" version does (and comparing/testing against 
 zero doesn't hit the address-bus). Yes, I can see some potential for 
 concern there; but is there anything insurmountable?

 
 
 Then what would happen if a and b differ by, say, 256? Remember, an int 
 is 4 bytes, a bool is only 1.

Sure, but it's generally more efficient to do all logical and arithmetic 
operations in the native width of the device anyway ~ generally 32bits 
for current D compilers.

If you're talking about issues related to actually storing a bool 
result, then that's part of the "concerns" noted above. Bool values 
derived in certain ways may need to be folded for storage, but not for 
testing. The subtraction case above may be included in that group, but 
testing should still only require a compare against zero (for both true 
and false). I'm suggesting only that zero values should *always* be used 
to test for 'truth' ~ never 1, or 255, or any value other than zero. 
Anywhere the keyword "true" is used (or implied) for comparative 
purposes, test against zero and invert the jmp-condition instead. If 
that's not done already, it would probably speed things up in many cases.

Jul 28 2006

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

kris wrote:
 Frits van Bommel wrote:
 kris wrote:

 Walter Bright wrote:

 Stewart Gordon wrote:

 But if the function only tries to return 0 or 1 anyway, then what 
 difference does it make?  At the moment, I can't think of an 
 example of equality testing that can be made more efficient by 
 being allowed to return a value other than 0 or 1.



 I can. (a == b), where a and b are ints, can be implemented as (a - 
 b), and the result is int 0 for equality, int !=0 for inequality.



 So, why not treat false as 0, and true as not 0?  That way, it works 
 just the same as the "int" version does (and comparing/testing 
 against zero doesn't hit the address-bus). Yes, I can see some 
 potential for concern there; but is there anything insurmountable?


 Then what would happen if a and b differ by, say, 256? Remember, an 
 int is 4 bytes, a bool is only 1.

 
 Sure, but it's generally more efficient to do all logical and arithmetic 
 operations in the native width of the device anyway ~ generally 32bits 
 for current D compilers.
 
 If you're talking about issues related to actually storing a bool 
 result, then that's part of the "concerns" noted above. Bool values 
 derived in certains ways may need to be folded for storage, but not for 
 testing. The subtraction case above may be included in that group, but 
 testing should still only require a compare against zero (for both true 
 and false). I'm suggesting only that zero values should *always* be used 
 to test for 'truth' ~ never 1, or 255, or any value other than zero. 
 Anywhere the keyword "true" is used (or implied) for comparative 
 purposes, test against zero and invert the jmp-condition instead. If 
 that's not done already, it would probably speed things up in many cases.

Actually, I'm pretty sure testing for zero is already how it's done 
(just with 1-byte operands instead of 4-byte ones).

Something else: if there are multiple ways to represent true then 
equality testing just got a lot more complicated :).

Jul 29 2006
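
A rough illustration of the last point, assuming a scheme where a truth value is a raw byte and any non-zero value counts as true. Both helpers are hypothetical; they only contrast comparing representations with comparing meanings.

```d
// Compares the stored bytes directly.
bool sameTruthRaw(ubyte x, ubyte y)
{
    return x == y;
}

// Compares what the bytes mean under "non-zero is true".
bool sameTruthNormalised(ubyte x, ubyte y)
{
    return (x != 0) == (y != 0);
}

void main()
{
    ubyte t1 = 1, t2 = 2; // both count as true under the scheme

    assert(!sameTruthRaw(t1, t2));       // byte comparison says "different"
    assert(sameTruthNormalised(t1, t2)); // the meanings agree
    assert(sameTruthRaw(0, 0) && sameTruthNormalised(0, 0));
}
```

With a single representation of true, the raw byte comparison is already sufficient, which is presumably the appeal of the 0-or-1 constraint.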

Stewart Gordon <smjg_1998 yahoo.com> writes:

Walter Bright wrote:
 Stewart Gordon wrote:
 xs0 wrote:
 <snip>
 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

 but if the return type is bool, it becomes

 return this.bar-foo.bar?1:0; // 3 instructions

 If it does this, then there's a serious bug in the compiler.

 What instruction sequence do expect to be generated for it?

If anything resembling the above, then

     return this.bar-foo.bar?0:1;

which cancels out the advantage you mention next:

<snip>
 The only difference between a CMP and a SUB instruction is where the 
 result ends up. But the CMP doesn't generate 0 or 1 as a result, it puts 
 the result in the FLAGS register. Converting the FLAGS to a 0 or 1 in a 
 register takes more instructions.

<snip>
 But if the function only tries to return 0 or 1 anyway, then what 
 difference does it make?  At the moment, I can't think of an example 
 of equality testing that can be made more efficient by being allowed 
 to return a value other than 0 or 1.

 I can. (a == b), where a and b are ints, can be implemented as (a - b), 
 and the result is int 0 for equality, int !=0 for inequality.

How is this (a == b) rather than (a != b)?

Stewart.


Jul 29 2006

Walter Bright <newshound digitalmars.com> writes:

Stewart Gordon wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 xs0 wrote:
 <snip>
 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

 but if the return type is bool, it becomes

 return this.bar-foo.bar?1:0; // 3 instructions


 If it does this, then there's a serious bug in the compiler.

 What instruction sequence do expect to be generated for it?

 
 If anything resembling the above, then
 
     return this.bar-foo.bar?0:1;

? Let's look at an example:

class Foo
{
     int foo, bar;

     int Eq1(Foo foo)
     {
         return this.bar-foo.bar?0:1;
     }

     int Eq2(Foo foo)
     {
         return this.bar-foo.bar;
     }
}

which generates:

     Eq1:
                 mov     EDX,4[ESP]
                 mov     ECX,0Ch[EAX]
                 sub     ECX,0Ch[EDX]
                 cmp     ECX,1
                 sbb     EAX,EAX
                 neg     EAX
                 ret     4
     Eq2:
                 mov     ECX,4[ESP]
                 mov     EAX,0Ch[EAX]
                 sub     EAX,0Ch[ECX]
                 ret     4

So we have 4 instructions generated rather than 1. If there's a trick to 
generate only one instruction for Eq1, I'd like to know about it.

 I can. (a == b), where a and b are ints, can be implemented as (a - 
 b), and the result is int 0 for equality, int !=0 for inequality.

 
 How is this (a == b) rather than (a != b)?

I don't understand your question.

Jul 29 2006

Deewiant <deewiant.doesnotlike.spam gmail.com> writes:

Walter Bright wrote:
 Stewart Gordon wrote:
 Walter Bright wrote:
 I can. (a == b), where a and b are ints, can be implemented as (a -
 b), and the result is int 0 for equality, int !=0 for inequality.

 How is this (a == b) rather than (a != b)?

 
 I don't understand your question.

(a - b), if a and b are equal ints, evaluates to 0, which is generally
considered to mean false. So isn't (a - b) actually a way of finding (a != b),
instead of (a == b)?

Jul 30 2006
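
Deewiant's reading can be written out as a small sketch (function names are hypothetical): the bare difference answers "are they different?", so an equality test needs the sense inverted.

```d
// Non-zero means "different": this is the (a != b) sense.
int differsInt(int a, int b)
{
    return a - b;
}

// Equality is the inverted test.
bool equalsBool(int a, int b)
{
    return (a - b) == 0;
}

void main()
{
    assert(differsInt(3, 3) == 0); // equal operands read as false
    assert(differsInt(3, 9) != 0); // unequal operands read as true
    assert(equalsBool(3, 3));
    assert(!equalsBool(3, 9));
}
```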

Walter Bright <newshound digitalmars.com> writes:

Deewiant wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 Walter Bright wrote:
 I can. (a == b), where a and b are ints, can be implemented as (a -
 b), and the result is int 0 for equality, int !=0 for inequality.

 How is this (a == b) rather than (a != b)?

 I don't understand your question.

 
 (a - b), if a and b are equal ints, evaluates to 0, which is generally
 considered to mean false. So isn't (a - b) actually a way of finding (a != b),
 instead of (a == b)?

Oh, I see what you mean.

To invert the result would take another 2 instructions for a total of 3, 
still less than 4.

Jul 30 2006

Stewart Gordon <smjg_1998 yahoo.com> writes:

Walter Bright wrote:
 Deewiant wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 Walter Bright wrote:
 I can. (a == b), where a and b are ints, can be implemented as (a -
 b), and the result is int 0 for equality, int !=0 for inequality.

 How is this (a == b) rather than (a != b)?

 I don't understand your question.

 (a - b), if a and b are equal ints, evaluates to 0, which is generally
 considered to mean false. So isn't (a - b) actually a way of finding 
 (a != b),
 instead of (a == b)?

 
 Oh, I see what you mean.
 
 To invert the result would take another 2 instructions for a total of 3, 
 still less than 4.

Exactly.  But because what we have is opEquals and not opNotEquals, the 
benefit of fewer instructions is lost (except when opEquals is simple 
enough that the compiler can inline and optimise away the double negation).

Indeed, on this basis, if we had opNotEquals then it would just be 
equivalent to opCmp for many types.  So I can see people thinking that 
opNotEquals should just call opCmp by default.  However, there's a 
problem with this idea - for classes that have no ordering, even the 
current behaviour of comparing object references would have to be 
explicitly programmed in.

Stewart.


Jul 30 2006
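
A sketch of the overlap Stewart describes, assuming a current D compiler and an ordered type. `Rank` and `notEqualsViaCmp` are invented for illustration; there is no `opNotEquals` in D.

```d
struct Rank
{
    int value;

    // Three-way comparison: negative, zero or positive.
    int opCmp(const Rank rhs) const
    {
        return value < rhs.value ? -1 : (value > rhs.value ? 1 : 0);
    }
}

// Hypothetical "not equals" with the int convention discussed in the thread:
// zero when equal, non-zero otherwise. For an ordered type this is exactly
// what opCmp already returns.
int notEqualsViaCmp(Rank a, Rank b)
{
    return a.opCmp(b);
}

void main()
{
    auto a = Rank(1), b = Rank(2);
    assert(notEqualsViaCmp(a, a) == 0);
    assert(notEqualsViaCmp(a, b) != 0);
    assert(a < b); // the ordering operators are built on opCmp
}
```

A type with no meaningful ordering has nothing sensible to return from `opCmp`, which is the problem raised in the post.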

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Walter Bright wrote:
 Stewart Gordon wrote:
 Walter Bright wrote:
 Stewart Gordon wrote:
 xs0 wrote:
 <snip>
 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

 but if the return type is bool, it becomes

 return this.bar-foo.bar?1:0; // 3 instructions

 If it does this, then there's a serious bug in the compiler.

 What instruction sequence do expect to be generated for it?

 If anything resembling the above, then

     return this.bar-foo.bar?0:1;

 ? Let's look at an example:

 class Foo
 {
     int foo, bar;

     int Eq1(Foo foo)
     {
         return this.bar-foo.bar?0:1;
     }

     int Eq2(Foo foo)
     {
         return this.bar-foo.bar;
     }
 }

 which generates:

     Eq1:
                 mov     EDX,4[ESP]
                 mov     ECX,0Ch[EAX]
                 sub     ECX,0Ch[EDX]
                 cmp     ECX,1
                 sbb     EAX,EAX
                 neg     EAX
                 ret     4
     Eq2:
                 mov     ECX,4[ESP]
                 mov     EAX,0Ch[EAX]
                 sub     EAX,0Ch[ECX]
                 ret     4

 So we have 4 instructions generated rather than 1. If there's a trick to 
 generate only one instruction for Eq1, I'd like to know about it.

 I can. (a == b), where a and b are ints, can be implemented as (a - 
 b), and the result is int 0 for equality, int !=0 for inequality.

 How is this (a == b) rather than (a != b)?

 I don't understand your question.

As per the other posts, Eq2 actually takes 2 instructions:

Eq2:
	...
	sub     EAX,0Ch[ECX]
	not	EAX;

And uuuh... I've checked gcc's generated code for a C++ version of Eq1, and it 
was only 2 instructions too, CMP and SETE:

Eq1:
	...
	cmp     EAX,0Ch[ECX]
	sete	AL	// SETcc writes an 8-bit register

(http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
It seems to me perfectly valid, is there any problem here?

What does the original Eq1 even do? :

	sub     ECX,0Ch[EDX]
	cmp     ECX,1       // Huh?
	sbb     EAX,EAX
	neg     EAX

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 30 2006
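
One caveat on the two-instruction Eq2 above, sketched under the assumption that `not EAX` is the ordinary x86 bitwise complement: complementing the difference does not turn "zero means equal" into "non-zero means equal only when equal". D's `~` operator has the same bitwise behaviour, so it can stand in for the instruction here. Function names are hypothetical.

```d
// Bitwise complement of the difference, as `not EAX` would compute.
int bitwiseNot(int diff)
{
    return ~diff;
}

// The logical inversion an equality result actually needs.
bool logicalNot(int diff)
{
    return diff == 0;
}

void main()
{
    assert(bitwiseNot(0) == -1);  // equal operands: non-zero, reads as true
    assert(bitwiseNot(5) == -6);  // unequal operands: still non-zero
    assert(bitwiseNot(-1) == 0);  // only a difference of -1 becomes zero

    assert(logicalNot(0));
    assert(!logicalNot(5));
    assert(!logicalNot(-1));
}
```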

kris <foo bar.com> writes:

Bruno Medeiros wrote:
 Walter Bright wrote:

 Stewart Gordon wrote:

 Walter Bright wrote:

 Stewart Gordon wrote:

 xs0 wrote:
 <snip>

 Well, I'm just guessing, but I think something like

  > int opEquals(Foo foo)
  > {
  >     return this.bar == foo.bar;
  > }

 is compiled to something like

 return this.bar-foo.bar; // 1 instruction

 but if the return type is bool, it becomes

 return this.bar-foo.bar?1:0; // 3 instructions

 If it does this, then there's a serious bug in the compiler.

 What instruction sequence do expect to be generated for it?

 If anything resembling the above, then

     return this.bar-foo.bar?0:1;

 ? Let's look at an example:

 class Foo
 {
     int foo, bar;

     int Eq1(Foo foo)
     {
         return this.bar-foo.bar?0:1;
     }

     int Eq2(Foo foo)
     {
         return this.bar-foo.bar;
     }
 }

 which generates:

     Eq1:
                 mov     EDX,4[ESP]
                 mov     ECX,0Ch[EAX]
                 sub     ECX,0Ch[EDX]
                 cmp     ECX,1
                 sbb     EAX,EAX
                 neg     EAX
                 ret     4
     Eq2:
                 mov     ECX,4[ESP]
                 mov     EAX,0Ch[EAX]
                 sub     EAX,0Ch[ECX]
                 ret     4

 So we have 4 instructions generated rather than 1. If there's a trick 
 to generate only one instruction for Eq1, I'd like to know about it.

 I can. (a == b), where a and b are ints, can be implemented as (a - 
 b), and the result is int 0 for equality, int !=0 for inequality.

 How is this (a == b) rather than (a != b)?

 I don't understand your question.

 As per the other posts, Eq2 actually takes 2 instructions:

 Eq2:
     ...
     sub     EAX,0Ch[ECX]
     not    EAX;

 And uuuh.., I've checked gcc's generated code for a C++'s Eq1, and it 
 was only 2 instructions too, CMP and SETE ! :

 Eq1:
     ...
     cmp     EAX,0Ch[ECX]
     sete    EAX;

 (http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
 It seems to me perfectly valid, is there any problem here?

Yes indeed. Well spotted! On anything supporting the 386 instruction set 
(and D is targeted for 32-bit devices only), there's really no 
performance advantage in returning an int over returning a bool.

This should be addressed, so that some of the core APIs can be cleaned 
up appropriately?

 What does the original Eq1 even do? :

     sub     ECX,0Ch[EDX]
     cmp     ECX,1       // Huh?
     sbb     EAX,EAX
     neg     EAX

That's old-skool, pre-386 hacking :)

Jul 30 2006

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Bruno Medeiros wrote:
 And uuuh.., I've checked gcc's generated code for a C++'s Eq1, and it 
 was only 2 instructions too, CMP and SETE ! :
 
 Eq1:
     ...
     cmp     EAX,0Ch[ECX]
     sete    EAX;
 
 (http://www.cs.tut.fi/~siponen/upros/intel/instr/sete_setz.html)
 It seems to me perfectly valid, is there any problem here?

Interesting instruction. Seems to have the exact semantics needed for 
these situations. You'd almost think CPU designers care about what 
people want to do with their products :P.


 What does the original Eq1 even do? :

Step by step:
     mov     ECX,0Ch[EAX]

(You skipped this one) Loads this.bar into ECX.
     sub     ECX,0Ch[EDX]

Subtracts foo.bar from ECX.
     cmp     ECX,1       // Huh?

Among other things, sets borrow (aka carry) flag if ECX == 0 (i.e. if 
foo.bar == this.bar), clears it otherwise.
     sbb     EAX,EAX

Subtracts (EAX + borrow) from EAX, setting it to either -1 (if carry == 
1) or 0 (if carry == 0).
     neg     EAX

Negates EAX.

A bit weird at first glance, but it works as advertised :).


But indeed, a cmp/sete combo seems to do the same in fewer instructions.

Jul 30 2006
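
The step-by-step reading above can be replayed in D with unsigned arithmetic. This is only an emulation of the flag logic as described in the post, not compiler output; `eq1Emulated` and its local names are made up to mirror the registers.

```d
// Emulates: sub ECX,foo.bar / cmp ECX,1 / sbb EAX,EAX / neg EAX
int eq1Emulated(int thisBar, int fooBar)
{
    // sub ECX, foo.bar
    uint ecx = cast(uint)(thisBar - fooBar);

    // cmp ECX,1 borrows only when ECX is 0 (unsigned 0 < 1).
    uint borrow = ecx < 1 ? 1 : 0;

    // sbb EAX,EAX computes EAX - EAX - borrow: 0 or 0xFFFFFFFF.
    uint eax = 0 - borrow;

    // neg EAX turns 0xFFFFFFFF (-1) into 1 and leaves 0 alone.
    return -(cast(int) eax);
}

void main()
{
    assert(eq1Emulated(7, 7) == 1);    // equal: result 1
    assert(eq1Emulated(7, 8) == 0);    // unequal: result 0
    assert(eq1Emulated(300, 44) == 0); // the differ-by-256 case is handled
}
```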

"Lionello Lunesu" <lio lunesu.remove.com> writes:

 But indeed, a cmp/sete combo seems to do the same in less instructions.

But is it faster? I've noticed that many of the higher-level assembly
instructions are actually slower than multiple lower-level ones. "loop" is
the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne
is faster).

L.

Aug 07 2006

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Lionello Lunesu wrote:
 But indeed, a cmp/sete combo seems to do the same in less instructions.

 
 But is it faster? I've noticed that many of the higher-level assembly
 instructions are actually slower than multiple lower-level ones. "loop" is
 the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne
 is faster).

Heh... You may have noticed I didn't use any word related to speed :). 
The reason for that is that I don't know much about optimization for 
speed, especially where pipelines etc. are involved...

Hardware is weird.

Aug 07 2006

kris <foo bar.com> writes:

Lionello Lunesu wrote:
 But indeed, a cmp/sete combo seems to do the same in less instructions.

 
 
 But is it faster? I've noticed that many of the higher-level assembly
 instructions are actually slower than multiple lower-level ones. "loop" is
 the best example of this (dec ecx/jne is faster), or "rep" (again, dec/jne
 is faster).
 
 L. 
 
 

If you'd looked at the setne instruction linked previously, you'd have 
seen that it consumes 3 cycles. And no; there are no jump, loops, or any 
other reason to cause pipeline bubbles. If you need a primer on what 
causes modern CPUs to stall (the silly P4 in particular) then you could 
do a lot worse than to read the articles by Jon Stokes at ArsTechnica.

Oh, and this is just daft. Why don't we all count the cycles for a 
call/return instead? Or, perhaps just exactly what it costs to compare 
the bytes of two strings until they start to look different? You'll find 
the cost of setne (and probably even the prior "extra" three 
instructions for boolean support) is relegated to background noise.

Let's face it: int is likely used instead of bool for historical 
reasons; probably just an artifact left over from pre-80386 days. Would 
be nice to get that codegen cleaned up ~ especially since it was W who 
claimed the reasons were performance related. Hacking the high-level 
code with int vs boolean, just to reflect some archaic machine 
instruction, is one of those things that come under the umbrella of 
"premature optimization".

Aug 07 2006

Dave <Dave_member pathlink.com> writes:

kris wrote:
 Lionello Lunesu wrote:
 But indeed, a cmp/sete combo seems to do the same in less instructions.


 But is it faster? I've noticed that many of the higher-level assembly
 instructions are actually slower than multiple lower-level ones. 
 "loop" is
 the best example of this (dec ecx/jne is faster), or "rep" (again, 
 dec/jne
 is faster).

 L.

 
 If you'd looked at the setne instruction linked previously, you'd have 
 seen that it consumes 3 cycles. And no; there are no jumps, loops, or any 
 other reason to cause pipeline bubbles. If you need a primer on what 
 causes modern CPUs to stall (the silly P4 in particular) then you could 
 do a lot worse than to read the articles by Jon Stokes at ArsTechnica.
 
 Oh, and this is just daft. Why don't we all count the cycles for a 
 call/return instead? Or, perhaps just exactly what it costs to compare 
 the bytes of two strings until they start to look different? You'll find 
 the cost of setne (and probably even the prior "extra" three 
 instructions for boolean support) is relegated to background noise.
 
 Let's face it: int is likely used instead of bool for historical 
 reasons; probably just an artifact left over from pre-80386 days. Would 
 be nice to get that codegen cleaned up ~ especially since it was W who 
 claimed the reasons were performance related. Hacking the high-level 
 code with int vs boolean, just to reflect some archaic machine 
 instruction, is one of those things that come under the umbrella of 
 "premature optimization".

Yea, AFAIK setne is supported by 386 onward, plus a quick check of the GDC
code that uses it seems to indicate it is faster (from the Eq1 and Eq2
samples earlier in the thread).

But you're right - in many cases it will probably be background noise
anyhow 'cause you only save a couple of cycles.

As an aside, I think the current DMD backend may be well suited to the new
Dual Core CPU because it hasn't been chasing after optimum performance on
the P4 with its 20-stage pipeline or whatever <g>

Aug 07 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Bruno Medeiros wrote:
 
 What does the original Eq1 even do? :
 
     sub     ECX,0Ch[EDX]
     cmp     ECX,1       // Huh?
     sbb     EAX,EAX
     neg     EAX
 
 

[PS: I've read Frits' answer after writing this: ]

Ah I get it now... I wasn't understanding what borrow (the mathematical 
notion) was, since I'm not a native English speaker. Nothing a Wikipedia 
lookup didn't solve. So, correct me if I'm wrong:

(when I say EDX I mean 0Ch[EDX] or whatever)

// sets the carry flag if ECX is 0 (ECX - 1 borrows only then),
// that is, if ECX == EDX before the previous sub
   cmp   ECX,1

// computes EAX - EAX - carry: zero, minus one if the carry flag is set
// that is, EAX = -1 if ECX == EDX and EAX = 0 if ECX != EDX
   sbb   EAX,EAX

// two's complement negation of EAX, 0 becomes 0, -1 becomes 1
   neg   EAX
// end result: EAX = 1 if ECX == EDX and EAX = 0 if ECX != EDX

So yeah, it seems these 3 instructions do the same as SETE ... ?
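
To check the reading above, here is a minimal C sketch that models each of the four Eq1 instructions on 32-bit unsigned values. The function names `eq1_emulated` and `eq_expected` are made up for illustration, and the carry handling is a simplified model of the CPU flags, not actual compiler output.

```c
#include <stdint.h>

/* Hypothetical model of the Eq1 sequence; a and b play the roles of
   ECX and 0Ch[EDX]. Unsigned arithmetic wraps, like the registers do. */
uint32_t eq1_emulated(uint32_t a, uint32_t b)
{
    uint32_t ecx = a - b;          /* sub ECX,0Ch[EDX]: 0 only if a == b     */
    uint32_t carry = (ecx < 1u);   /* cmp ECX,1: borrow (CF=1) only if ECX==0 */
    uint32_t eax = 0u - carry;     /* sbb EAX,EAX: EAX-EAX-CF = 0 or all ones */
    eax = 0u - eax;                /* neg EAX: 0 stays 0, all ones becomes 1  */
    return eax;
}

/* The value a cmp/sete pair is meant to compute: 1 if equal, else 0. */
uint32_t eq_expected(uint32_t a, uint32_t b)
{
    return a == b;
}
```

For any pair of inputs the two functions should agree, which supports the idea that the three instructions after the sub compute the same 0-or-1 result as SETE would.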



-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 30 2006

Walter Bright <newshound digitalmars.com> writes:

Bruno Medeiros wrote:
 But the question remains, is it then less efficient to return a byte 
 than an int?

Yes. It's also less efficient to constrain the results to 0 or 1.

 Why?

Consider:

	a = 0x1000;
	b = 0x2000;

Now convert (a == b) into a bool. If the result is an int, I can just do 
(a - b), one instruction. Converting it to a byte, or to 1 or 0, takes more.
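
A rough C sketch of the point being made, using the same values. The function names are hypothetical. Note that the plain difference is zero when the operands are equal, so it really answers "not equal"; the sketch only shows why an int-wide result is cheap while a byte or a strict 0/1 result needs extra work.

```c
#include <stdint.h>

/* Int-wide result: zero means equal, any nonzero value means different. */
uint32_t diff_as_int(uint32_t a, uint32_t b)
{
    return a - b;                /* one subtraction, wraps on underflow */
}

/* Keeping only the low byte of the difference loses information. */
uint8_t diff_as_byte(uint32_t a, uint32_t b)
{
    return (uint8_t)(a - b);     /* truncates to the low 8 bits */
}

/* A result constrained to 0 or 1 needs an explicit comparison. */
uint8_t equal_as_bool(uint32_t a, uint32_t b)
{
    return (uint8_t)(a == b);
}
```

With a = 0x1000 and b = 0x2000, the difference is nonzero as an int, but its low byte is zero, so a naive truncation to a byte would wrongly report the values as equal.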

 And if so isn't there a way for the compiler to somehow 
 optimize it?

The math is inevitable <g>.

 I find it a bit hard to believe that nowadays there isn't sufficient 
 compiler and/or CPU technology to somehow make a bool(byte) return value 
 as efficient as an int one. :/

I work with what the CPU makes available.

P.S. Inevitably, some will ask "who cares" about these small 
efficiencies. The trouble is, these kinds of things often appear in 
tight loops, where small inefficiencies get multiplied by millions.

Jul 28 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Walter Bright wrote:
 Bruno Medeiros wrote:
 But the question remains, is it then less efficient to return a byte 
 than an int?

 
 Yes. It's also less efficient to constrain the results to 0 or 1.
 
 Why?

 
 Consider:
 
     a = 0x1000;
     b = 0x2000;
 
 Now convert (a == b) into a bool. If the result is an int, I can just do 
 (a - b), one instruction. Converting it to a byte, or to 1 or 0, takes 
 more.
 
 And if so isn't there a way for the compiler to somehow optimize it?

 
 The math is inevitable <g>.
 

Well, let's think about the other way around then. Why should bool be 
constrained to 0 or 1? Why not, same as kris said, 0 would be false, and 
non zero would be true. Then we could have an opEquals or any function 
returning a bool instead of int, without a performance penalty.

The only shortcoming I see is that it would be slower to compare two 
bool /variables/:
    (b1 == b2)
that expression is currently just 1 instruction, a CMP, but without the 
0,1 restriction it would be more (3, I think, have to check that). 
However, is that significantly worse? I think not. I think comparison 
between two bool _variables_ is likely very rare, and when it happens it 
is also probably not performance critical. (statistical references?)
Note: this would not affect at all comparisons between a bool variable 
and a bool literal. Like (b == true) or (b == false).
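
To make the trade-off concrete, here is a small C sketch of a bool that is not constrained to 0 or 1. The `loose_bool` type and both function names are assumptions made up for this illustration; they are not part of D or of any compiler.

```c
#include <stdint.h>

/* Hypothetical "loose" bool: 0 is false, any other value is true. */
typedef uint8_t loose_bool;

/* Raw byte compare: a single comparison, but only correct when both
   values are known to be exactly 0 or 1. */
int raw_equal(loose_bool b1, loose_bool b2)
{
    return b1 == b2;
}

/* Truth-value compare: normalize each side to 0 or 1 first, which is
   where the extra instructions come from. */
int truth_equal(loose_bool b1, loose_bool b2)
{
    return (b1 != 0) == (b2 != 0);
}
```

Here `raw_equal(2, 1)` reports the two values as different even though both are "true", while `truth_equal(2, 1)` reports them as equal at the cost of the extra normalization.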

Or is there another reason for the 0,1 restriction?

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Jul 30 2006

Walter Bright <newshound digitalmars.com> writes:

Bruno Medeiros wrote:
 Well, let's think about the other way around then. Why should bool be 
 constrained to 0 or 1? Why not, same as kris said, 0 would be false, and 
 non zero would be true. Then we could have an opEquals or any function 
 returning a bool instead of int, without a performance penalty.
 
 The only shortcoming I see is that it would be slower to compare two 
 bool /variables/:
    (b1 == b2)
 that expression is currently just 1 instruction, a CMP, but without the 
 0,1 restriction it would be more (3, I think, have to check that). 
 However, is that significantly worse? I think not. I think comparison 
 between two bool _variables_ is likely very rare, and when it happens it 
 is also probably not performance critical. (statistical references?)
 Note: this would not affect at all comparisons between a bool variable 
 and a bool literal. Like (b == true) or (b == false).

I think most programmers would find this to be very surprising behavior. 
I know I would.

Jul 30 2006

Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:

Walter Bright wrote:
 Bruno Medeiros wrote:
 Well, let's think about the other way around then. Why should bool be 
 constrained to 0 or 1? Why not, same as kris said, 0 would be false, 
 and non zero would be true. Then we could have an opEquals or any 
 function returning a bool instead of int, without a performance penalty.

 The only shortcoming I see is that it would be slower to compare two 
 bool /variables/:
    (b1 == b2)
 that expression is currently just 1 instruction, a CMP, but without 
 the 0,1 restriction it would be more (3, I think, have to check that). 
 However, is that significantly worse? I think not. I think comparison 
 between two bool _variables_ is likely very rare, and when it happens 
 it is also probably not performance critical. (statistical references?)
 Note: this would not affect at all comparisons between a bool variable 
 and a bool literal. Like (b == true) or (b == false).

 
 I think most programmers would find this to be very surprising behavior. 
 I know I would.

Surprising behavior? What surprising behavior? Those are all 
implementation details; they have no bearing on language/program 
behavior.

And how about the alternative of using the SETE instruction to enforce 
the bool restriction? You haven't commented on that yet...

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D

Aug 01 2006

Dave <Dave_member pathlink.com> writes:

Walter Bright wrote:
 P.S. Inevitably, some will ask "who cares" about these small 
 efficiencies. The trouble is, these kinds of things often appear in 
 tight loops, where small inefficiencies get multiplied by millions.

I consider this kind of stuff the compiler's job -- so if I write or 
maintain code that is slow, I know there is probably something I can do 
about it w/o having to drop into assembly.

Personally I've spent a huge amount of time tuning code and I can't tell 
you the positive effect that has on end-users. IMHO bad performance is 
often the "forgotten bug" (that's not to say the budget should be busted 
on that "last 20%" either though).

- Dave

Jul 31 2006
