digitalmars.D - Simple features that I've always missed from C...

Manu <turkeyman gmail.com> writes:

Some trivial binary operations that never had an expression in C/C++, I'd
love consideration for an operator or some sort of intrinsic for these.

*Roll/Rotate:* I'm loving the '>>>' operator, but I could often really do
with a rotate operator useful in many situations... '>>|' perhaps...
something like that?
  This is ugly: a = (a << x) | ((unsigned)a >> (sizeof(a)*8 - x)); ... and
I'm yet to see a compiler that will interpret that correctly.
  Additionally, if a vector type is ever added, a rotate operator will
become even more useful.

*Count leading/trailing zeroes:* I don't know of any even slightly recent
architecture that doesn't have opcodes to count leading/trailing zeroes,
although they do exist, so perhaps this is a little dubious. I'm sure this
could be emulated for such architectures, but it might be unreasonably slow
if used...

*Min/Max operators:* GCC has the lovely <? and >? operators... a <? b ==
min(a, b) .. Why this hasn't been adopted by all C compilers is beyond me.
Surely this couldn't be much trouble to add? Again, super useful in
vector/maths heavy code too.

*Predicated selection:* Float, vector, and often enough even int math can
really benefit from using hardware select opcodes to avoid loads/stores. In
C there is no way to express this short of vendor specific intrinsics again.
'a > b ? a : b' seems like a simple enough expression for the compiler to
detect potential for a predicated select opcode (but in my experience, it
NEVER does), however, when considering vector types, the logic isn't so
clear in that format. Since hardware vectors implement component-wise
selection, the logical nature of the ?: operator doesn't really make sense.
  This could easily be considered an expansion of min/max... 'a <? b', 'a >?
b', 'a ==? b', 'a !=? b', etc. seems pretty natural if you're happy to
accept GCC's '<?' operators, and give the code generator the opportunity to
implement these things using hardware support.


C is terrible at expressing these concepts, resulting in
architecture/compiler specific intrinsics for each of them. Every time I've
ever written a maths library, or even just optimised some maths heavy
routines, these things come up, and I end up with code full of
architecture/platform/compiler ifdef's. I'd like to think they should be
standardised intrinsic features of the language (not implemented in the
standard library), so the code generator/back end has the most information
to generate proper code...

Cheers guys
- Manu

Oct 17 2011
bearophile <bearophileHUGS lycos.com> writes:
Manu:

 *Roll/Rotate:* I'm loving the '>>>' operator, but I could often really do
 with a rotate operator useful in many situations... '>>|' perhaps...
 something like that?
   This is ugly: a = (a << x) | ((unsigned)a >> (sizeof(a)*8 - x));

I have asked for a rotate intrinsic in Phobos, but Walter has added a rewrite rule instead, that turns D code to a rot.
Personal experience has shown me that it's easy to write the operation in a slightly different way (like with signed instead of unsigned values) that causes a missed optimization. So I prefer still something specific, like a Phobos intrinsic, to explicitly ask for this operation to every present and future D compiler, with no risk of mistakes.

 *Min/Max operators:* GCC has the lovely <? and >? operators... a <? b ==
 min(a, b) .. Why this hasn't been adopted by all C compilers is beyond me.
 Surely this couldn't be much trouble to add? Again, super useful in
 vector/maths heavy code too.

This is cute. Surely max/min is a common operation to do, but often I have to find a max or min of a collection, where I think this operator can't be used. I don't think this operator is necessary, and it makes D code a bit less readable for people that don't know D.

 *Predicated selection:* Float, vector, and often enough even int math can
 really benefit from using hardware select opcodes to avoid loads/stores. In
 C there is no way to express this short of vendor specific intrinsics again.

I don't understand what you are asking here. Please show an example.

There is an enhancement request that asks to support vector operations like this too (some CPUs support something like this in hardware):

    int[] a = [1,2,3,4];
    int[] b = [4,3,2,1];
    auto c = a[] > b[];
    assert(c == [false, false, true, true]);

Are operations like this what you are asking for here?

Bye,
bearophile
Oct 17 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 10/17/2011 4:45 PM, bearophile wrote:
 Manu:

 *Roll/Rotate:* I'm loving the '>>>' operator, but I could often really do
 with a rotate operator useful in many situations... '>>|' perhaps...
 something like that? This is ugly: a = (a << x) | ((unsigned)a >>
 (sizeof(a)*8 - x));

I have asked for a rotate intrinsic in Phobos, but Walter has added a rewrite rule instead, that turns D code to a rot. Personal experience has shown me that it's easy to write the operation in a slightly different way (like with signed instead of unsigned values) that causes a missed optimization. So I prefer still something specific, like a Phobos intrinsic, to explicitly ask for this operation to every present and future D compiler, with no risk of mistakes.

There's no need for a compiler intrinsic. Just write a function that does do the optimization, and call it.

The signed versions "don't work" because a signed right shift is not the same thing as an unsigned right shift.

For reference:

    void test236()
    {
        uint a;
        int shift;
        a = 7;
        shift = 1;
        int r;
        r = (a >> shift) | (a << (int.sizeof * 8 - shift));
        assert(r == 0x8000_0003);
        r = (a << shift) | (a >> (int.sizeof * 8 - shift));
        assert(a == 7);
    }
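
For readers coming from the C side of this thread, here is a minimal sketch of the same "write a function and call it" idea in C, assuming 32-bit operands. The names rotl32, rotr32 and rotate_demo are made up for illustration and are not Phobos or compiler functions. Whether a given compiler turns the pattern into a single rotate instruction is not guaranteed. The operand is unsigned so the right shift is a logical one, and the count is masked so a rotate by 0 never shifts by the full width (which is undefined in C).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit value left by n bits.
   Both shift counts are masked to 0..31, so n == 0 is well defined. */
uint32_t rotl32(uint32_t value, unsigned n)
{
    n &= 31u;
    return (value << n) | (value >> ((32u - n) & 31u));
}

/* Hypothetical helper: rotate a 32-bit value right by n bits. */
uint32_t rotr32(uint32_t value, unsigned n)
{
    n &= 31u;
    return (value >> n) | (value << ((32u - n) & 31u));
}

/* Usage example with the same values as the D test above. */
void rotate_demo(void)
{
    assert(rotr32(7u, 1) == 0x80000003u);          /* low bit wraps to bit 31 */
    assert(rotl32(7u, 1) == 14u);                  /* nothing wraps */
    assert(rotl32(0x80000003u, 0) == 0x80000003u); /* rotate by 0 is identity */
}
```

Keeping the idiom behind one function means a signed operand cannot sneak in at a call site, which is the kind of mistake mentioned earlier in the thread.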
Oct 17 2011
bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:

 There's no need for a compiler intrinsic. Just write a function that does do
 the optimization, and call it.

Right. Two functions like this are worth putting somewhere in Phobos.

 The signed versions "don't work" because a signed right shift is not the same
 thing as an unsigned right shift.

It was a mistake in my code.

Thank you, bye,
bearophile
Oct 18 2011
kennytm <kennytm gmail.com> writes:
Manu <turkeyman gmail.com> wrote:
 
 *Min/Max operators:* GCC has the lovely <? and >? operators... a <? b ==
 min(a, b) .. Why this hasn't been adopted by all C compilers is beyond me.
 Surely this couldn't be much trouble to add? Again, super useful in
 vector/maths heavy code too.

FYI, g++ has deprecated these operators a long time ago (since 4.0).

http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Deprecated-Features.html
Oct 17 2011
"Robert Jacques" <sandford jhu.edu> writes:
On Mon, 17 Oct 2011 16:53:42 -0400, Manu <turkeyman gmail.com> wrote:
[snip]
 *Count leading/trailing zeroes:* I don't know of any even slightly recent
 architecture that doesn't have opcodes to count leading/trailing zeroes,
 although they do exist, so perhaps this is a little dubious. I'm sure this
 could be emulated for such architectures, but it might be unreasonably slow
 if used...

D has this: check out std.intrinsic's bsr and bsl.
Oct 17 2011
Don <nospam nospam.com> writes:
On 18.10.2011 06:25, Robert Jacques wrote:
 On Mon, 17 Oct 2011 16:53:42 -0400, Manu <turkeyman gmail.com> wrote:
 [snip]
 *Count leading/trailing zeroes:* I don't know of any even slightly recent
 architecture that doesn't have opcodes to count leading/trailing zeroes,
 although they do exist, so perhaps this is a little dubious. I'm sure this
 could be emulated for such architectures, but it might be unreasonably slow
 if used...

D has this: check out std.intrinsic's bsr and bsl.

You mean bsr and bsf. Unfortunately, there are some big problems with them. What is bsr(0) ?
Oct 18 2011
Don <nospam nospam.com> writes:
On 18.10.2011 11:43, Manu wrote:
 On 18 October 2011 12:12, Don <nospam nospam.com
 <mailto:nospam nospam.com>> wrote:

     You mean bsr and bsf.
     Unfortunately, there are some big problems with them. What is bsr(0) ?


 True ;) .. but that's why the API needs to be defined and standardised.
 On PowerPC it returns 32 (or 64), and the x86 version returns 2 values,
 the position, and also a bool telling you if it was zero or not (useful
 for loop termination)

Even worse -- Intel says that the position value of bsr(0) is undefined. But AMD does define it, they say it's what was in the register before.
 I think all hardware that I've seen is easy to factor into the win32
 intrinsic api.

That would be nice. What do you think it should do for the zero case? Note that on x86, one possibility is to do a bsr followed by a cmov, to get the PowerPC semantics.
Oct 18 2011
Don <nospam nospam.com> writes:
On 19.10.2011 10:13, Manu wrote:
 Nicely spotted, I didn't realise the intel/amd distinction ;)

 Unless I'm mistaken, it is possible for D to return 'out' parameters by
 value right? (in additional return registers, no touching the stack?) ..
 Assuming that's the case you would surely standardise something more
 like the win32 intrinsic rather than one resembling the PPC opcode.
 If the function returns a bool that the value was zero or not, then I
 think it's fair to say the position is undefined (which supports the
 intel assertion).

 PPC's approach is more cleanly factored into the win32 model than the
 other way around I think, in terms of allowing the optimiser to trim the
 unused code. If the intrinsic generates implicit code to produce a bool
 from the value, it will surely be trimmed by the optimiser if that
 result is not used.

 While cmov might work nicely (although I really don't trust that opcode
 anyway, an intrinsic like bsr shouldn't be producing a hidden branch) on
 x86 to produce the PPC result, I'm not sure other architectures would
 have such a simple solution.

Most other architectures that I know of use lzcnt instead. On AMD64 (not Intel) and on ARM, there's an LZCNT resp. CLZ instruction, which gives:

    lzcnt(x) = x ? 63 - bsr(x) : 64;

Here's how it could be done:

    RAX lzcnt(RBX)
    {
        bsr RAX, RBX;
        cmovz RAX, 127;   // zero input: 127 ^ 63 == 64
        xor RAX, 63;      // non-zero input: 63 - bsr(x)
    }

 Again, I think the win32 approach is easier
 for all architectures to produce and for the optimiser to truncate if
 the calculated result is unused.

 bool bsf/bsr(int value, out int position); // this assumes that position
 will cleanly return in a second return register...

Seems to be equivalent to replacing the bsr with a comma expression:

    (position = native_bsr(value), value == 0)

Do we really gain much by this? The more painful signature of the function somewhat discourages users from calling it with a zero value, but the undefined position is still exposed. So the original problem of undefined behaviour remains.

There's maybe a performance improvement in the fairly rare case where there's a branch on zero value. Although theoretically, in existing code the optimizer could check for the sequence:

    bsr dest, src
    cmp src, 0

where only the Z flag is required and remove the cmp, so I don't think the performance aspect should be rated very highly.

It's a bit of a problem that AMD's bsf and bsr are so slow. They're really slow on Pentium 4 and Atom as well. Interestingly AMD's lzcnt is faster than their bsr. But since Intel doesn't support it, it's pretty useless outside of inline asm.

I think we need to do a survey of as many architectures as possible, before we can decide what to do. As far as I know, bsr/bsf is unique to x86. If this is true, then bsf/bsr should probably be wrapped in version(x86), and discouraged from general use. A portable function (perhaps leadz, trailz) would need to be provided as well, and recommended for general use.
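
To make the "portable function" idea concrete, here is a rough C sketch of what total (defined for every input) leadz/trailz functions might look like, assuming 32-bit operands and assuming the zero case should return the bit width, as lzcnt does. The names leadz32, trailz32 and leadz_demo are hypothetical, and the loops are only a fallback of the kind an architecture without a suitable opcode might use, not a claim about how any compiler would implement them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical portable leadz: number of leading zero bits.
   Assumption: returns 32 for x == 0, so the result is always defined. */
unsigned leadz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 0x80000000u) == 0) {  /* shift until the top bit is set */
        x <<= 1;
        ++n;
    }
    return n;
}

/* Hypothetical portable trailz: number of trailing zero bits.
   Assumption: returns 32 for x == 0. */
unsigned trailz32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return 32;
    while ((x & 1u) == 0) {           /* shift until the low bit is set */
        x >>= 1;
        ++n;
    }
    return n;
}

/* Usage example covering the zero case and both ends of the word. */
void leadz_demo(void)
{
    assert(leadz32(0) == 32);
    assert(leadz32(1) == 31);
    assert(leadz32(0x80000000u) == 0);
    assert(trailz32(0) == 32);
    assert(trailz32(8) == 3);
}
```

For non-zero x this gives leadz32(x) == 31 - bsr(x), the 32-bit analogue of the lzcnt formula above, so an x86 back end could still map it onto bsr plus a fix-up for zero.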
Oct 20 2011
Manu <turkeyman gmail.com> writes:

On 18 October 2011 12:12, Don <nospam nospam.com> wrote:

 You mean bsr and bsf.
 Unfortunately, there are some big problems with them. What is bsr(0) ?

True ;) .. but that's why the API needs to be defined and standardised.
On PowerPC it returns 32 (or 64), and the x86 version returns 2 values, the position, and also a bool telling you if it was zero or not (useful for loop termination)
I think all hardware that I've seen is easy to factor into the win32 intrinsic api.
Oct 18 2011
Manu <turkeyman gmail.com> writes:

On 18 October 2011 05:11, kennytm <kennytm gmail.com> wrote:

 FYI, g++ has deprecated these operators long time ago (since 4.0).

 http://gcc.gnu.org/onlinedocs/gcc-4.0.4/gcc/Deprecated-Features.html

Nooo! .. Removed in favour of the STL instead... well I for one thought they were a great idea, but apparently trumped by the standards mob.
Doesn't mean they couldn't be considered for D though :)
Oct 18 2011
Manu <turkeyman gmail.com> writes:

On 18 October 2011 02:45, bearophile <bearophileHUGS lycos.com> wrote:

 I have asked for a rotate intrinsic in Phobos, but Walter has added a
 rewrite rule instead, that turns D code to a rot.
 Personal experience has shown me that it's easy to write the operation in a
 slightly different way (like with signed instead of unsigned values) that
 causes a missed optimization. So I prefer still something specific, like a
 Phobos intrinsic, to explicitly ask for this operation to every present and
 future D compiler, with no risk of mistakes.

I agree, an intrinsic that guarantees compiler support, or even an operator... ;)

 *Predicated selection:* Float, vector, and often enough even int math can
 really benefit from using hardware select opcodes to avoid loads/stores. In
 C there is no way to express this short of vendor specific intrinsics again.

 I don't understand what you are asking here. Please show an example.

 There is an enhancement request that asks to support vector operations like
 this too (some CPUs support something like this in hardware):
   int[] a = [1,2,3,4];
   int[] b = [4,3,2,1];
   auto c = a[] > b[];
   assert(c == [false, false, true, true]);

 Are operations like this what you are asking for here?

By predicated selection, I mean code that will select from 2 values based on some predicate... code that looks like this:

    float c = (some comparison) ? x : z;

This has hardware support on many modern architectures to perform it branch free, particularly important on PowerPC and other RISC chips.

The vector equivalent depends on generating mask vectors from various comparisons (essentially the same as the scalar versions, but it would be nice to standardise that detail with a strict api).
Working something like this:

    a = {1,2,3,4}
    b = {4,3,2,1}
    m = maskLessThan(a, b);  -> m == { true, true, false, false }; (usually expressed by integer 0 or -1)
    c = select(m, a, b);     -> c == {1, 2, 2, 1}

Now this is effectively identical to: float c = a < b ? a : b; but in SIMD, and there's no nice expression in the language to do this. The details are occasionally slightly different on different architectures, hence I'd like to see a standard predicated selection API of some form, which will allow use of hardware opcodes for float/int, and also mapping to SIMD cleanly.

This might possibly branch off into another topic about SIMD support in D, which appears to be basically non-existent.
One of the real problems is lack of definition of SIMD types and behaviours. Also, this construct requires the concept of a mask vector (in essence a SIMD bool), which should be a concept factored into the SIMD design...

On a side note, I've seen murmurings of support for syntax like you illustrate a few times (interpreting D arrays as candidates for hardware SIMD usage). While that MIGHT be a nice optimisation in isolated cases, I have very serious concerns about standardising that as the language mechanic for dealing with SIMD data types.
I wrote a couple of emails about that in the past though.
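
The mask-and-select scheme described here can be emulated in plain C, as a sketch only. The function names mask_less_than and select_lanes stand in for the maskLessThan/select pseudo-calls and are invented for illustration; the 4-lane int32_t "vector" is an assumption, and real SIMD hardware would do each step in one instruction rather than a loop. The point is that the select itself is pure bitwise arithmetic with no data-dependent branch.

```c
#include <assert.h>
#include <stdint.h>

enum { LANES = 4 };  /* assumed vector width for this sketch */

/* Per-lane comparison: all ones (-1) where a < b, otherwise 0. */
void mask_less_than(const int32_t *a, const int32_t *b, int32_t *mask)
{
    for (int i = 0; i < LANES; ++i)
        mask[i] = -(int32_t)(a[i] < b[i]);
}

/* Per-lane select: take a where the mask is all ones, b where it is 0.
   No branch depends on the data; only AND, OR and NOT are used. */
void select_lanes(const int32_t *mask, const int32_t *a,
                  const int32_t *b, int32_t *out)
{
    for (int i = 0; i < LANES; ++i)
        out[i] = (a[i] & mask[i]) | (b[i] & ~mask[i]);
}

/* Usage example with the values from the post: the result is min(a, b). */
void select_demo(void)
{
    const int32_t a[LANES] = {1, 2, 3, 4};
    const int32_t b[LANES] = {4, 3, 2, 1};
    int32_t m[LANES], c[LANES];

    mask_less_than(a, b, m);
    select_lanes(m, a, b, c);

    assert(m[0] == -1 && m[1] == -1 && m[2] == 0 && m[3] == 0);
    assert(c[0] == 1 && c[1] == 2 && c[2] == 2 && c[3] == 1);
}
```

Representing the mask as 0 or -1 per lane is what lets the same two functions express min, max and any other comparison-driven choice just by swapping the comparison.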
Oct 18 2011
Manu <turkeyman gmail.com> writes:

Nicely spotted, I didn't realise the intel/amd distinction ;)

Unless I'm mistaken, it is possible for D to return 'out' parameters by
value right? (in additional return registers, no touching the stack?) ..
Assuming that's the case you would surely standardise something more like
the win32 intrinsic rather than one resembling the PPC opcode.
If the function returns a bool that the value was zero or not, then I think
it's fair to say the position is undefined (which supports the intel
assertion).
PPC's approach is more cleanly factored into the win32 model than the other
way around I think, in terms of allowing the optimiser to trim the unused
code. If the intrinsic generates implicit code to produce a bool from the
value, it will surely be trimmed by the optimiser if that result is not
used.

While cmov might work nicely (although I really don't trust that opcode
anyway, an intrinsic like bsr shouldn't be producing a hidden branch) on x86
to produce the PPC result, I'm not sure other architectures would have such
a simple solution. Again, I think the win32 approach is easier for all
architectures to produce and for the optimiser to truncate if the calculated
result is unused.

bool bsf/bsr(int value, out int position); // this assumes that position
will cleanly return in a second return register...
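
As a rough illustration of that signature, here is a portable C sketch, with the 'out' parameter modelled as a pointer. The names scan_reverse32, scan_forward32, count_set_bits and scan_demo are hypothetical. One assumption to flag: the bool here is true when a set bit was found, which is how the Win32 _BitScanReverse/_BitScanForward intrinsics report it, and the position is left untouched for a zero input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bsr wrapper: index of the highest set bit.
   Returns false for value == 0 and does not write *position. */
bool scan_reverse32(uint32_t value, unsigned *position)
{
    unsigned index = 31;
    if (value == 0)
        return false;
    while ((value & 0x80000000u) == 0) {
        value <<= 1;
        --index;
    }
    *position = index;
    return true;
}

/* Hypothetical bsf wrapper: index of the lowest set bit. */
bool scan_forward32(uint32_t value, unsigned *position)
{
    unsigned index = 0;
    if (value == 0)
        return false;
    while ((value & 1u) == 0) {
        value >>= 1;
        ++index;
    }
    *position = index;
    return true;
}

/* The bool result doubles as the loop condition when walking set bits. */
unsigned count_set_bits(uint32_t value)
{
    unsigned position, count = 0;
    while (scan_forward32(value, &position)) {
        value &= value - 1;  /* clear the lowest set bit just found */
        ++count;
    }
    return count;
}

/* Usage example. */
void scan_demo(void)
{
    unsigned position = 99;
    assert(!scan_reverse32(0, &position) && position == 99);
    assert(scan_reverse32(7u, &position) && position == 2);
    assert(scan_forward32(8u, &position) && position == 3);
    assert(count_set_bits(0xF0u) == 4);
}
```

With this shape a caller that ignores the bool pays nothing extra once the optimiser drops the unused result, while a caller that tests it never reads an undefined position.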


On 18 October 2011 22:50, Don <nospam nospam.com> wrote:

 On 18.10.2011 11:43, Manu wrote:

 On 18 October 2011 12:12, Don <nospam nospam.com
 <mailto:nospam nospam.com>> wrote:

    You mean bsr and bsf.
    Unfortunately, there are some big problems with them. What is bsr(0) ?


 True ;) .. but that's why the API needs to be defined and standardised.
 On PowerPC it returns 32 (or 64), and the x86 version returns 2 values,
 the position, and also a bool telling you if it was zero or not (useful
 for loop termination)

Even worse -- Intel says that the position value of bsr(0) is undefined. But AMD does define it, they say it's what was in the register before.

 I think all hardware that I've seen is easy to factor into the win32
 intrinsic api.

That would be nice. What do you think it should do for the zero case? Note that on x86, one possibility is to do a bsr followed by a cmov, to get the PowerPC semantics.

Oct 19 2011