digitalmars.D.learn - String compare in words?

qznc (4/4) May 29 2016 Given two string (or char[] or ubyte[]) objects, I want to

Era Scarecrow (3/8) May 29 2016 Assuming you don't have codepoints and only want to do a raw
Jonathan M Davis via Digitalmars-d-learn (10/14) May 29 2016 In what way are you trying to compare them? If all you're doing is compa...

Era Scarecrow (8/16) May 29 2016 I'm reminded that the GNU stdlib has a string compare function

qznc (6/8) May 29 2016 Not with dmd v2.071.0 or ldc 0.17.1.

qznc (19/22) May 29 2016 If I write the comparison naively, the assembly clearly shows a

qznc (21/38) May 29 2016 Ok, to answer my own question, this looks good:

Seb (4/27) May 29 2016 Isn't that something that the compiler should optimize for you

Stefan Koch (3/8) May 29 2016 It can be faster because of inlining.

chmike (32/74) May 29 2016 I don't know if this would be faster, but here is my attempt.

chmike (3/3) May 29 2016 On average there would be less than 4 bytes remaining to compare.

ag0aep6g (11/30) May 30 2016 Having "string" in the function name may be a bit misleading. This

=?UTF-8?B?TcOhcmNpbw==?= Martins (9/44) May 30 2016 I too expected it to compile to a memcmp call, but according to
Christophe Meessen via Digitalmars-d-learn (16/48) May 30 2016 I didn't check assembly for '=='. What I have seen is that struct

qznc <qznc web.de> writes:

Given two string (or char[] or ubyte[]) objects, I want to 
compare them. The naive loop accesses the arrays byte-wise. How 
could I turn this into a word-wise compare for better performance?

Is a cast into size_t[] ok? Some Phobos helper functions?

May 29 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Sunday, 29 May 2016 at 17:13:49 UTC, qznc wrote:
 Given two string (or char[] or ubyte[]) objects, I want to 
 compare them. The naive loop accesses the arrays byte-wise. How 
 could I turn this into a word-wise compare for better 
 performance?

 Is a cast into size_t[] ok? Some Phobos helper functions?

  Assuming you don't have codepoints and only want to do a raw 
compare, i believe you can as long as the sizes align.

May 29 2016

Jonathan M Davis via Digitalmars-d-learn writes:

On Sunday, May 29, 2016 17:13:49 qznc via Digitalmars-d-learn wrote:
 Given two string (or char[] or ubyte[]) objects, I want to
 compare them. The naive loop accesses the arrays byte-wise. How
 could I turn this into a word-wise compare for better performance?

 Is a cast into size_t[] ok? Some Phobos helper functions?

In what way are you trying to compare them? If all you're doing is comparing
them for equality, then just use ==. e.g.

if(str1 == str2)
{
}

And if you're not simply comparing for equality, what are you looking to
figure out? Without more information about what you're trying to do, it's
kind of hard to help you.

- Jonathan M Davis

May 29 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Sunday, 29 May 2016 at 17:38:17 UTC, Jonathan M Davis wrote:
 In what way are you trying to compare them? If all you're doing 
 is comparing them for equality, then just use ==. e.g.

 if(str1 == str2)
 {
 }

 And if you're not simply comparing for equality, what are you 
 looking to figure out? Without more information about what 
 you're trying to do, it's kind of hard to help you.

  I'm reminded that the GNU stdlib has a string compare function 
which defaults to using larger double words to get a speedup, and 
I think he wants to do that the same way. Although unless they 
are both the same size and both divisible by the size of size_t, 
then it's a multi-stage process to do correctly.

  Worse I'm not sure if the code generation already does that and 
possibly does a better job than what we could do by hand...

May 29 2016

qznc <qznc web.de> writes:

On Sunday, 29 May 2016 at 17:42:48 UTC, Era Scarecrow wrote:
  Worse I'm not sure if the code generation already does that 
 and possibly does a better job than what we could do by hand...

Not with dmd v2.071.0 or ldc 0.17.1.

At least not in all the variations I tried to trick them with, 
like copying into fixed-size array. Well, they did insert a 
memcmp and libc is probably optimized like that, but they copied 
data first, which is unnecessary.

May 29 2016

qznc <qznc web.de> writes:

On Sunday, 29 May 2016 at 17:38:17 UTC, Jonathan M Davis wrote:
 And if you're not simply comparing for equality, what are you 
 looking to figure out? Without more information about what 
 you're trying to do, it's kind of hard to help you.

If I write the comparison naively, the assembly clearly shows a 
"movzbl" [0]. It loads a single byte! The other single byte load 
is encoded in the address mode of "cmp". Implementation:

bool stringcmp(string x, string y) {
   foreach(i; 0..x.length) {
     if (x[i] != y[i]) // byte compare
       return false;
   }
   return true;
}

It makes no sense to load single bytes here. Since we only want 
to check for equality, we could load two full words and compare 
four or eight bytes in one go.

This example is simplified and far-fetched. Actually, this is 
about the find algorithm [1].

[0] http://goo.gl/ttybAB
[1] 
http://forum.dlang.org/post/vdjraubhtoqtxeshjvqt forum.dlang.org

May 29 2016

qznc <qznc web.de> writes:

On Sunday, 29 May 2016 at 18:15:16 UTC, qznc wrote:
 On Sunday, 29 May 2016 at 17:38:17 UTC, Jonathan M Davis wrote:
 And if you're not simply comparing for equality, what are you 
 looking to figure out? Without more information about what 
 you're trying to do, it's kind of hard to help you.

 If I write the comparison naively, the assembly clearly shows a 
 "movzbl" [0]. It loads a single byte! The other single byte 
 load is encoded in the address mode of "cmp". Implementation:

 bool stringcmp(string x, string y) {
   foreach(i; 0..x.length) {
     if (x[i] != y[i]) // byte compare
       return false;
   }
   return true;
 }

 It makes no sense to load single bytes here. Since we only want 
 to check for equality, we could load two full words and compare 
 four or eight bytes in one go.

Ok, to answer my own question, this looks good:

bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] y) {
     pragma(inline, false);
     if (x.length != y.length) return false;
     int i=0;
     // word-wise compare is faster than byte-wise
     if (x.length > size_t.sizeof)
         for (; i < x.length - size_t.sizeof; i+=size_t.sizeof) {
             size_t* xw = cast(size_t*) &x[i];
             size_t* yw = cast(size_t*) &x[i];
             if (*xw != *yw) return false;
         }
     // last sub-word part
     for (; i < x.length; i+=1) {
         if (x[i] != y[i]) // byte compare
             return false;
     }
     return true;
}

Any comments or recommendations?

May 29 2016

Seb <seb wilzba.ch> writes:

On Sunday, 29 May 2016 at 20:40:52 UTC, qznc wrote:
 On Sunday, 29 May 2016 at 18:15:16 UTC, qznc wrote:
 [...]

 Ok, to answer my own question, this looks good:

 bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] y) 
 {
     pragma(inline, false);
     if (x.length != y.length) return false;
     int i=0;
     // word-wise compare is faster than byte-wise
     if (x.length > size_t.sizeof)
         for (; i < x.length - size_t.sizeof; i+=size_t.sizeof) {
             size_t* xw = cast(size_t*) &x[i];
             size_t* yw = cast(size_t*) &x[i];
             if (*xw != *yw) return false;
         }
     // last sub-word part
     for (; i < x.length; i+=1) {
         if (x[i] != y[i]) // byte compare
             return false;
     }
     return true;
 }

 Any comments or recommendations?

Isn't that something that the compiler should optimize for you 
when you do an equality comparison?
Is it really faster than ldc (with all optimzations turned on)?

May 29 2016

Stefan Koch <uplink.coder googlemail.com> writes:

On Sunday, 29 May 2016 at 20:51:19 UTC, Seb wrote:
 On Sunday, 29 May 2016 at 20:40:52 UTC, qznc wrote:
 [...]

 Isn't that something that the compiler should optimize for you 
 when you do an equality comparison?
 Is it really faster than ldc (with all optimzations turned on)?

It can be faster because of inlining.
memcmp is a runtime call.

May 29 2016

chmike <christophe meessen.net> writes:

On Sunday, 29 May 2016 at 20:40:52 UTC, qznc wrote:
 On Sunday, 29 May 2016 at 18:15:16 UTC, qznc wrote:
 On Sunday, 29 May 2016 at 17:38:17 UTC, Jonathan M Davis wrote:
 And if you're not simply comparing for equality, what are you 
 looking to figure out? Without more information about what 
 you're trying to do, it's kind of hard to help you.

 If I write the comparison naively, the assembly clearly shows 
 a "movzbl" [0]. It loads a single byte! The other single byte 
 load is encoded in the address mode of "cmp". Implementation:

 bool stringcmp(string x, string y) {
   foreach(i; 0..x.length) {
     if (x[i] != y[i]) // byte compare
       return false;
   }
   return true;
 }

 It makes no sense to load single bytes here. Since we only 
 want to check for equality, we could load two full words and 
 compare four or eight bytes in one go.

 Ok, to answer my own question, this looks good:

 bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] y) 
 {
     pragma(inline, false);
     if (x.length != y.length) return false;
     int i=0;
     // word-wise compare is faster than byte-wise
     if (x.length > size_t.sizeof)
         for (; i < x.length - size_t.sizeof; i+=size_t.sizeof) {
             size_t* xw = cast(size_t*) &x[i];
             size_t* yw = cast(size_t*) &x[i];
             if (*xw != *yw) return false;
         }
     // last sub-word part
     for (; i < x.length; i+=1) {
         if (x[i] != y[i]) // byte compare
             return false;
     }
     return true;
 }

 Any comments or recommendations?

I don't know if this would be faster, but here is my attempt.

It assumes the arrays start at an address multiple of 8.

if (x is y)
     return true;
if (x.length != y.length)
     return false;
size_t l = x.length;
ubyte* a = x.ptr, b = y.ptr;
for (size_t n = l>>3; n != 0; --n, a+=8, b+=8)
     if (*cast(long*)a ^ *cast(long*)b)
         return false;
if (l & 4)
{
     if (*cast(int*)a ^ *cast(int*)b)
         return false;
     a+= 4;
     b+= 4;
}
if (l & 2)
{
     if (*cast(short*)a ^ *cast(short*)b)
         return false;
     a+=2;
     b+=2;
}
return (l & 1) && (*a ^ *b) ? false : true;

If the pointers are not on an address multiple of 8, one has to 
inverse the trailing tests to consume the bytes in front of the 
array until the address becomes a multiple of 8.

The trailing tests could eventually be replaced by a simple 
sequential byte compare. I don't know which is faster.

May 29 2016

chmike <christophe meessen.net> writes:

On average there would be less than 4 bytes remaining to compare. 
So a simple straightforward byte comparison should do the job 
efficiently.

May 29 2016

ag0aep6g <anonymous example.com> writes:

On 05/29/2016 10:40 PM, qznc wrote:
 bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] y) {

Having "string" in the function name may be a bit misleading. This 
doesn't have any special functionality for text/characters/Unicode, does it?

Should have const parameters, not immutable.

      pragma(inline, false);

I think you have to put this pragma on the function signature, not in 
the body. Also, why prevent inlining of the function?

      if (x.length != y.length) return false;
      int i=0;

int isn't large enough for array lengths.

      // word-wise compare is faster than byte-wise
      if (x.length > size_t.sizeof)
          for (; i < x.length - size_t.sizeof; i+=size_t.sizeof) {
              size_t* xw = cast(size_t*) &x[i];
              size_t* yw = cast(size_t*) &x[i];

Typo: Should be `&y[i]` here.

              if (*xw != *yw) return false;
          }
      // last sub-word part
      for (; i < x.length; i+=1) {
          if (x[i] != y[i]) // byte compare
              return false;
      }
      return true;
 }

 Any comments or recommendations?

Did you benchmark this against the built-in `==`, with ldc or gdc?

If this is correct and faster than the built-in `==`, why isn't it the 
built-in `==`?

May 30 2016

=?UTF-8?B?TcOhcmNpbw==?= Martins <marcioapm gmail.com> writes:

On Monday, 30 May 2016 at 09:28:29 UTC, ag0aep6g wrote:
 On 05/29/2016 10:40 PM, qznc wrote:
 bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] 
 y) {

 Having "string" in the function name may be a bit misleading. 
 This doesn't have any special functionality for 
 text/characters/Unicode, does it?

 Should have const parameters, not immutable.

      pragma(inline, false);

 I think you have to put this pragma on the function signature, 
 not in the body. Also, why prevent inlining of the function?

      if (x.length != y.length) return false;
      int i=0;

 int isn't large enough for array lengths.

      // word-wise compare is faster than byte-wise
      if (x.length > size_t.sizeof)
          for (; i < x.length - size_t.sizeof; 
 i+=size_t.sizeof) {
              size_t* xw = cast(size_t*) &x[i];
              size_t* yw = cast(size_t*) &x[i];

 Typo: Should be `&y[i]` here.

              if (*xw != *yw) return false;
          }
      // last sub-word part
      for (; i < x.length; i+=1) {
          if (x[i] != y[i]) // byte compare
              return false;
      }
      return true;
 }

 Any comments or recommendations?

 Did you benchmark this against the built-in `==`, with ldc or 
 gdc?

 If this is correct and faster than the built-in `==`, why isn't 
 it the built-in `==`?

I too expected it to compile to a memcmp call, but according to 
asm.dlang.org DMD with -O and -release, DMD compiles a == b to a 
byte-wise compare.

I suppose for very tiny strings this is the fastest, but for 
slightly larger strings, calling memcmp() would be faster. I 
think inlining a string comparison is also not great for code 
size. In the general case, for element types with trivial 
equality, a call to memcmp() will always be preferable, right?

May 30 2016

Christophe Meessen via Digitalmars-d-learn writes:

I didn't check assembly for '=='. What I have seen is that struct 
comparison in dmd is implemented as byte per byte compare even if the 
struct is 64bit long (e.g. Rebindable).  I suppose dmd uses this 
strategy because struct/array may not be 64bit aligned, or they could 
have different alignment.

In order to use the suggested optimization we need a good planet 
alignment.  The effort to check that, or enforce it, is not worth the 
benefit on average.

There could be a benefit in specific use cases where the alignment is 
ensured.

I would be interested in such optimization with Rebindable. Can the 'is' 
operator be overloaded ?


Le 30/05/2016 11:28, ag0aep6g via Digitalmars-d-learn a écrit :
 On 05/29/2016 10:40 PM, qznc wrote:
 bool string_cmp_opt(immutable(ubyte)[] x, immutable(ubyte)[] y) {

 Having "string" in the function name may be a bit misleading. This 
 doesn't have any special functionality for text/characters/Unicode, 
 does it?

 Should have const parameters, not immutable.

      pragma(inline, false);

 I think you have to put this pragma on the function signature, not in 
 the body. Also, why prevent inlining of the function?

      if (x.length != y.length) return false;
      int i=0;

 int isn't large enough for array lengths.

      // word-wise compare is faster than byte-wise
      if (x.length > size_t.sizeof)
          for (; i < x.length - size_t.sizeof; i+=size_t.sizeof) {
              size_t* xw = cast(size_t*) &x[i];
              size_t* yw = cast(size_t*) &x[i];

 Typo: Should be `&y[i]` here.

              if (*xw != *yw) return false;
          }
      // last sub-word part
      for (; i < x.length; i+=1) {
          if (x[i] != y[i]) // byte compare
              return false;
      }
      return true;
 }

 Any comments or recommendations?

 Did you benchmark this against the built-in `==`, with ldc or gdc?

 If this is correct and faster than the built-in `==`, why isn't it the 
 built-in `==`?

-- 
Bien cordialement,

Ch.Meessen

May 30 2016

D Programming

C/C++ Programming

Other

digitalmars.D.learn - String compare in words?