digitalmars.D - Challenge: write a really really small front() for UTF8

Andrei Alexandrescu (2/2) Mar 23 2014 Here's a baseline: http://goo.gl/91vIGc. Destroy!

Dmitry Olshansky (5/7) Mar 23 2014 Assertions to check encoding?!

Andrei Alexandrescu (5/10) Mar 23 2014 That implementation does zero effort to optimize checks themselves, and

Dmitry Olshansky (6/16) Mar 23 2014 Just how much you are willing to assert? You don't even check length.

Andrei Alexandrescu (4/19) Mar 23 2014 A replacement for front() in arrays of char and wchar.

Anonymous (13/13) Mar 23 2014 dchar front(char[] s) {

Andrei Alexandrescu (3/16) Mar 23 2014 That's smaller but doesn't seem to do the same!

Michel Fortin (38/39) Mar 23 2014 If you want the smallest assembly size with array bound checking, make
Vladimir Panteleev (5/25) Mar 23 2014 Since strings are often user input, if there is chance that bad

Mike (6/8) Mar 23 2014 This example only considers encodings of up to 4 bytes, but UTF-8

Walter Bright (2/4) Mar 23 2014 It's not anymore. The 5 and 6 byte encodings are now illegal.
=?UTF-8?B?U2ltZW4gS2rDpnLDpXM=?= (5/12) Mar 23 2014 RFC 3629 (http://tools.ietf.org/html/rfc3629) restricted UTF-8 to

Dmitry Olshansky (6/18) Mar 24 2014 More importantly Unicode standard explicitly fixed the range of code

Daniel N (12/19) Mar 24 2014 I did some hacks using C at work with _pext_u32, it's an

Daniel N (43/46) Mar 24 2014 I now managed to dig up my old C source... but I'm still blocked

Iain Buclaw (3/13) Mar 25 2014 4) It's tied to one piece of hardware.

Nick Sabalausky (12/30) Mar 25 2014 bool supportCpuFeatureX;

dennis luehring (2/30) Mar 25 2014 the extra branch could kill the performance benefit if doStuff is too sm...

Daniel N (4/14) Mar 25 2014 you'd simply have to hoist the condition outside the inner loop.
Piotr Szturmaj (9/42) Mar 25 2014 void function() doStuff;

Temtaime (2/2) Mar 26 2014 http://goo.gl/4RSWhr

Temtaime (1/1) Mar 26 2014 Oups, in last return forgot "+ s[3] "

Michel Fortin (31/32) Mar 23 2014 Optimizing for smallest assembly size:

Andrei Alexandrescu (4/32) Mar 23 2014 Nice, thanks! I'd hope for a short path for the ASCII subset, could you

Michel Fortin (30/65) Mar 23 2014 Unfortunately, there's a bug in the above. A missing "break" results in

bearophile (4/5) Mar 23 2014 Isn't your wish already granted?

Michel Fortin (7/12) Mar 23 2014 Maybe. I don't have a D compiler installed at the moment to check. I'm

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (9/17) Mar 23 2014 That compiler is very old. :)

Andrei Alexandrescu (8/69) Mar 23 2014 -w

bearophile (6/11) Mar 23 2014 Lot of people in D.learn don't use warnings.

Michel Fortin (23/24) Mar 23 2014 Oops, messed up my patterns. Here's a hopefully fixed front():

Andrei Alexandrescu (3/20) Mar 23 2014 http://goo.gl/y9EVFr

Michel Fortin (50/55) Mar 23 2014 As I said earlier, that front2 version is broken. It's missing a break.

safety0ff (5/10) Mar 23 2014 Everything seems to work in your corrected versions:

Michel Fortin (51/56) Mar 24 2014 Ok, so let's check what happens in actual usage. In other words, what

Michel Fortin (8/20) Mar 24 2014 Oops, wrong link, was from an intermediary test.
dnspies (12/70) Mar 24 2014 It seems like mine wasn't being inlined because I had carelessly

dnspies (5/91) Mar 24 2014 Whoops, wrong "original" version. That was the one before I

safety0ff (33/46) Mar 23 2014 0b1000000 is missing a zero: 0b1000_0000

Michel Fortin (8/10) Mar 23 2014 That "auto indicator = (s[0] >> 5) & 0b11;" line is wrong too. s[0]

safety0ff (6/8) Mar 23 2014 Here's my attempt: http://goo.gl/alJ9JS

Andrei Alexandrescu (2/10) Mar 23 2014 Very interesting! -- Andrei

Chris Williams (3/5) Mar 24 2014 http://goo.gl/TaZTNB

JR (3/10) Mar 24 2014 size_t, ++i! :>
Andrei Alexandrescu (3/8) Mar 24 2014 Nice! Why assert(ret != 0)?

Chris Williams (5/15) Mar 24 2014 It should be -1. I wanted to make sure that bsr() wasn't called

dnspies (23/25) Mar 24 2014 Here's mine (the loop gets unrolled):

dnspies (25/51) Mar 24 2014 Sorry, I misunderstood. We only want the x's in the output.

Andrei Alexandrescu (2/33) Mar 24 2014 That turned out quite large. -- Andrei

Andrei Alexandrescu (4/29) Mar 24 2014 http://goo.gl/HBSv5T - looks nice! For minimum size replace the

monarch_dodra (8/10) Mar 24 2014 Before we roll this out, could we discuss a strategy/guideline in

dennis luehring (6/10) Mar 24 2014 it would be great to habe a "basic" form of this range that error
Chris Williams (7/18) Mar 24 2014 As you'll note from the assembler, asserts are compiled out of
w0rp (18/29) Mar 24 2014 I would strongly advise to at least offer an option, possibly via

dennis luehring (2/33) Mar 24 2014 +1
Andrei Alexandrescu (4/18) Mar 24 2014 Options are fine for functions etc. But front would need to find an

dennis luehring (9/28) Mar 24 2014 b"\255".decode("utf-8", errors="strict") # UnicodeDecodeError

Andrei Alexandrescu (4/10) Mar 24 2014 I think std.array.front should return the invalid dchar on error, and

Vladimir Panteleev (12/24) Mar 24 2014 Ignoring UTF errors is a lossy operation and has the potential
monarch_dodra (12/25) Mar 24 2014 That sounds good to me (I think). Better than a straight up

dnspies (5/7) Mar 24 2014 Ok, I managed to make it smaller. I think this is the smallest

Andrei Alexandrescu (3/10) Mar 24 2014 This is nice because even the checks are relatively cheap. It's quite

John Colvin (6/8) Mar 24 2014 I've probably missed something, but here's 15 instructions:

John Colvin (2/9) Mar 24 2014 woops yeah, that's completely broken.

John Colvin (3/5) Mar 24 2014 http://forum.dlang.org/post/fsgdviklnbugdzjsgfsk@forum.dlang.org
John Colvin (30/32) Mar 24 2014 On a bigendian machine with loose alignment requirements (1

John Colvin (26/33) Mar 24 2014 Same again, but for little-endian, 18 instructions:

Dmitry Olshansky (6/8) Mar 24 2014 I had to join the party at some point.

Dmitry Olshansky (5/12) Mar 25 2014 Interestingly gdc-4.8 produces better results.

anonymous (14/14) Mar 25 2014 dchar front(char[] s) {

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Here's a baseline: http://goo.gl/91vIGc. Destroy!

Andrei

Mar 23 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

24-Mar-2014 01:22, Andrei Alexandrescu пишет:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

Assertions to check encoding?!
I thought it would detect broken encoding and do a substitution at least.

 Andrei


-- 
Dmitry Olshansky

Mar 23 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/23/14, 2:29 PM, Dmitry Olshansky wrote:
 24-Mar-2014 01:22, Andrei Alexandrescu пишет:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Assertions to check encoding?!
 I thought it would detect broken encoding and do a substitution at least.

That implementation does zero effort to optimize checks themselves, and 
indeed puts them in asserts. I think there's value in having such a 
primitive down below.

Andrei

Mar 23 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

24-Mar-2014 01:34, Andrei Alexandrescu пишет:
 On 3/23/14, 2:29 PM, Dmitry Olshansky wrote:
 24-Mar-2014 01:22, Andrei Alexandrescu пишет:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Assertions to check encoding?!
 I thought it would detect broken encoding and do a substitution at least.

 That implementation does zero effort to optimize checks themselves, and
 indeed puts them in asserts. I think there's value in having such a
 primitive down below.

Just how much you are willing to assert? You don't even check length.
In short - what are the specs of this primitive and where you see it 
being used.

 Andrei


-- 
Dmitry Olshansky

Mar 23 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/23/14, 3:10 PM, Dmitry Olshansky wrote:
 24-Mar-2014 01:34, Andrei Alexandrescu пишет:
 On 3/23/14, 2:29 PM, Dmitry Olshansky wrote:
 24-Mar-2014 01:22, Andrei Alexandrescu пишет:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Assertions to check encoding?!
 I thought it would detect broken encoding and do a substitution at
 least.

 That implementation does zero effort to optimize checks themselves, and
 indeed puts them in asserts. I think there's value in having such a
 primitive down below.

 Just how much you are willing to assert? You don't even check length.

Array bounds checking takes care of that.

 In short - what are the specs of this primitive and where you see it
 being used.

A replacement for front() in arrays of char and wchar.


Andrei

Mar 23 2014

"Anonymous" <anon ymous.net> writes:

dchar front(char[] s) {
     uint c = s[0];
     ubyte p = ~s[0];
     if (p>>7)
       return c;
     c = c<<8 | s[1];
     if (p>>5)
       return c;
     c = c<<8 | s[2];
     if (p>>4)
       return c;
     return c<<8 | s[3];
}

Mar 23 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/23/14, 4:28 PM, Anonymous wrote:
 dchar front(char[] s) {
      uint c = s[0];
      ubyte p = ~s[0];
      if (p>>7)
        return c;
      c = c<<8 | s[1];
      if (p>>5)
        return c;
      c = c<<8 | s[2];
      if (p>>4)
        return c;
      return c<<8 | s[3];
 }

That's smaller but doesn't seem to do the same!

Andrei

Mar 23 2014

Michel Fortin <michel.fortin michelf.ca> writes:

On 2014-03-23 22:58:32 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Array bounds checking takes care of that.

If you want the smallest assembly size with array bound checking, make 
the function  safe and see the disaster it causes to the assembly size. 
That's what you have to optimize. If you're going to optimize while 
looking at the assembly, better check for bounds manually:

dchar front(char[] s)
{
  if (s.length < 1) return dchar.init;
  size_t bytesize;
  dchar result;
  switch (s[0])
  {
    case 0b00000000: .. case 0b01111111:
    	return s[0];
    case 0b11000000: .. case 0b11011111:
    	result = s[0] & 0b00011111;
    	bytesize = 2;
    	break;
    case 0b11100000: .. case 0b11101111:
    	result = s[0] & 0b00001111;
    	bytesize = 3;
    	break;
    case 0b11110000: .. case 0b11110111:
    	result = s[0] & 0b00000111;
    	bytesize = 4;
    default:
       return dchar.init;
  }
  if (s.length < bytesize) return dchar.init;
  foreach (i; 1..bytesize)
      result = (result << 6) | (s[i] & 0b00111111);
  return result;
}



-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Mar 23 2014

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Sunday, 23 March 2014 at 22:58:53 UTC, Andrei Alexandrescu 
wrote:
 On 3/23/14, 3:10 PM, Dmitry Olshansky wrote:
 24-Mar-2014 01:34, Andrei Alexandrescu пишет:
 On 3/23/14, 2:29 PM, Dmitry Olshansky wrote:
 24-Mar-2014 01:22, Andrei Alexandrescu пишет:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Assertions to check encoding?!
 I thought it would detect broken encoding and do a 
 substitution at
 least.

 That implementation does zero effort to optimize checks 
 themselves, and
 indeed puts them in asserts. I think there's value in having 
 such a
 primitive down below.

 Just how much you are willing to assert? You don't even check 
 length.

 Array bounds checking takes care of that.

Since strings are often user input, if there is chance that bad 
input can cause undefined behavior, then the checks must be done 
unconditionally.

Mar 23 2014

"Mike" <none none.com> writes:

On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

This example only considers encodings of up to 4 bytes, but UTF-8 
can encode code points in as many as 6 bytes.  Is that not a 
concern?

Mike

Mar 23 2014

Walter Bright <newshound2 digitalmars.com> writes:

On 3/23/2014 5:32 PM, Mike wrote:
 This example only considers encodings of up to 4 bytes, but UTF-8 can encode
 code points in as many as 6 bytes.  Is that not a concern?

It's not anymore. The 5 and 6 byte encodings are now illegal.

Mar 23 2014

=?UTF-8?B?U2ltZW4gS2rDpnLDpXM=?= <simen.kjaras gmail.com> writes:

On 2014-03-24 00:32, Mike wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 This example only considers encodings of up to 4 bytes, but UTF-8 can
 encode code points in as many as 6 bytes.  Is that not a concern?

 Mike

RFC 3629 (http://tools.ietf.org/html/rfc3629) restricted UTF-8 to 
conform to constraints in UTF-16, removing all 5- and 6-byte sequences.

--
   Simen

Mar 23 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

24-Mar-2014 04:44, Simen Kjærås пишет:
 On 2014-03-24 00:32, Mike wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 This example only considers encodings of up to 4 bytes, but UTF-8 can
 encode code points in as many as 6 bytes.  Is that not a concern?

 Mike

 RFC 3629 (http://tools.ietf.org/html/rfc3629) restricted UTF-8 to
 conform to constraints in UTF-16, removing all 5- and 6-byte sequences.

More importantly Unicode standard explicitly fixed the range of code 
points to that of representable in UTF-16. Starting with the 5th version 
of the standard if memory serves me right.


-- 
Dmitry Olshansky

Mar 24 2014

"Daniel N" <ufo orbiting.us> writes:

On Monday, 24 March 2014 at 11:48:00 UTC, Dmitry Olshansky wrote:
 RFC 3629 (http://tools.ietf.org/html/rfc3629) restricted UTF-8 
 to
 conform to constraints in UTF-16, removing all 5- and 6-byte 
 sequences.

 More importantly Unicode standard explicitly fixed the range of 
 code points to that of representable in UTF-16. Starting with 
 the 5th version of the standard if memory serves me right.

I did some hacks using C at work with _pext_u32, it's an 
absolutely wonderful instruction(pext) with the corresponding 
pdep.
http://software.intel.com/sites/landingpage/IntrinsicsGuide/

And ridiculously fast according to Agner(Latency 3, Throughput 1):
http://www.agner.org/optimize/instruction_tables.pdf

I think we should add this as an intrinsic to D as well(if it 
isn't already, but I couldn't find it)... it could do wonders for 
utf decoding.

I'm currently too busy to submit a complete solution, but please 
feel free to use my idea if you think it sounds promising.

Mar 24 2014

"Daniel N" <ufo orbiting.us> writes:

On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:
 I'm currently too busy to submit a complete solution, but 
 please feel free to use my idea if you think it sounds 
 promising.

I now managed to dig up my old C source... but I'm still blocked 
by dmd not accepting the 'pext' instruction...

1) I know my solution is not directly comparable to the rest in 
this thread(for many reasons).
2) It's of course trivial to add a fast path for ascii... if 
desired.
3) It throws safety and standards out the window.

#include <x86intrin.h>

char32_t front(const char* inp)
{
   static const uint32_t mask[32] =
   {
       0b01111111'00000000'00000000'00000000,
       0b00000000'00000000'00000000'00000000,
       0b00011111'00111111'00000000'00000000,
       0b00001111'00111111'00111111'00000000,
       0b00000111'00111111'00111111'00111111,
   };
   uint32_t rev = __builtin_bswap32(reinterpret_cast<const 
uint32_t*>(inp)[0]);
   uint32_t len = __lzcnt32(~rev);

   return _pext_u32(rev, mask[len]);
}

This is what clang 3.4 generated:

         pushq   %rbp
Ltmp2:
         .cfi_def_cfa_offset 16
Ltmp3:
         .cfi_offset %rbp, -16
         movq    %rsp, %rbp
Ltmp4:
         .cfi_def_cfa_register %rbp
         movl    (%rdi), %eax
         bswapl  %eax
         movl    %eax, %ecx
         notl    %ecx
         lzcntl  %ecx, %ecx
         leaq    __ZZ5frontPKcE4mask(%rip), %rdx
         pextl   (%rdx,%rcx,4), %eax, %eax
         popq    %rbp
         ret

Mar 24 2014

Iain Buclaw <ibuclaw gdcproject.org> writes:

On 25 March 2014 00:04, Daniel N <ufo orbiting.us> wrote:
 On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:
 I'm currently too busy to submit a complete solution, but please feel free
 to use my idea if you think it sounds promising.


 I now managed to dig up my old C source... but I'm still blocked by dmd not
 accepting the 'pext' instruction...

 1) I know my solution is not directly comparable to the rest in this
 thread(for many reasons).
 2) It's of course trivial to add a fast path for ascii... if desired.
 3) It throws safety and standards out the window.


4) It's tied to one piece of hardware.

No Thankee.

Mar 25 2014

Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:

On 3/25/2014 4:00 AM, Iain Buclaw wrote:
 On 25 March 2014 00:04, Daniel N <ufo orbiting.us> wrote:
 On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:
 I'm currently too busy to submit a complete solution, but please feel free
 to use my idea if you think it sounds promising.


 I now managed to dig up my old C source... but I'm still blocked by dmd not
 accepting the 'pext' instruction...

 1) I know my solution is not directly comparable to the rest in this
 thread(for many reasons).
 2) It's of course trivial to add a fast path for ascii... if desired.
 3) It throws safety and standards out the window.


 4) It's tied to one piece of hardware.

 No Thankee.

bool supportCpuFeatureX;

void main() {
     supportCpuFeatureX = detectCpuFeatureX();

     doStuff();
}

void doStuff() {
     if(supportCpuFeatureX)
         doStuff_FeatureX();
     else
         doStuff_Fallback();
}

 dmd -inline blah.d

Mar 25 2014

dennis luehring <dl.soluz gmx.net> writes:

Am 25.03.2014 11:38, schrieb Nick Sabalausky:
 On 3/25/2014 4:00 AM, Iain Buclaw wrote:
 On 25 March 2014 00:04, Daniel N <ufo orbiting.us> wrote:
 On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:
 I'm currently too busy to submit a complete solution, but please feel free
 to use my idea if you think it sounds promising.


 I now managed to dig up my old C source... but I'm still blocked by dmd not
 accepting the 'pext' instruction...

 1) I know my solution is not directly comparable to the rest in this
 thread(for many reasons).
 2) It's of course trivial to add a fast path for ascii... if desired.
 3) It throws safety and standards out the window.


 4) It's tied to one piece of hardware.

 No Thankee.

 void doStuff() {
       if(supportCpuFeatureX)
           doStuff_FeatureX();
       else
           doStuff_Fallback();
 }

   > dmd -inline blah.d

the extra branch could kill the performance benefit if doStuff is too small

Mar 25 2014

"Daniel N" <ufo orbiting.us> writes:

On Tuesday, 25 March 2014 at 10:42:59 UTC, dennis luehring wrote:
 void doStuff() {
      if(supportCpuFeatureX)
          doStuff_FeatureX();
      else
          doStuff_Fallback();
 }

  > dmd -inline blah.d

 the extra branch could kill the performance benefit if doStuff 
 is too small

you'd simply have to hoist the condition outside the inner loop.

Furthermore the branch prediction would never fail, only 
unpredictable branches are terrible.

Mar 25 2014

Piotr Szturmaj <bncrbme jadamspam.pl> writes:

W dniu 2014-03-25 11:42, dennis luehring pisze:
 Am 25.03.2014 11:38, schrieb Nick Sabalausky:
 On 3/25/2014 4:00 AM, Iain Buclaw wrote:
 On 25 March 2014 00:04, Daniel N <ufo orbiting.us> wrote:
 On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:
 I'm currently too busy to submit a complete solution, but please
 feel free
 to use my idea if you think it sounds promising.


 I now managed to dig up my old C source... but I'm still blocked by
 dmd not
 accepting the 'pext' instruction...

 1) I know my solution is not directly comparable to the rest in this
 thread(for many reasons).
 2) It's of course trivial to add a fast path for ascii... if desired.
 3) It throws safety and standards out the window.


 4) It's tied to one piece of hardware.

 No Thankee.

 void doStuff() {
       if(supportCpuFeatureX)
           doStuff_FeatureX();
       else
           doStuff_Fallback();
 }

   > dmd -inline blah.d

 the extra branch could kill the performance benefit if doStuff is too small

void function() doStuff;

void main() {
     auto supportCpuFeatureX = detectCpuFeatureX();

     if (supportCpuFeatureX)
         doStuff = &doStuff_FeatureX;
     else
         doStuff = &doStuff_Fallback;
}

Mar 25 2014

"Temtaime" <temtaime gmail.com> writes:

http://goo.gl/4RSWhr

Only 3 ifs.

Mar 26 2014

"Temtaime" <temtaime gmail.com> writes:

Oups, in last return forgot "+ s[3] "

Mar 26 2014

Michel Fortin <michel.fortin michelf.ca> writes:

On 2014-03-23 21:22:58 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 Here's a baseline: http://goo.gl/91vIGc. Destroy!

Optimizing for smallest assembly size:

dchar front(char[] s)
{
  size_t bytesize;
  dchar result;
  switch (s[0])
  {
    case 0b00000000: .. case 0b01111111:
    	return s[0];
    case 0b11000000: .. case 0b11011111:
    	return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00011111);
    case 0b11100000: .. case 0b11101111:
    	result = s[0] & 0b00001111;
    	bytesize = 3;
    	break;
    case 0b11110000: .. case 0b11110111:
    	result = s[0] & 0b00000111;
    	bytesize = 4;
    default:
       return dchar.init;
  }
  foreach (i; 1..bytesize)
      result = (result << 6) | (s[i] & 0b00111111);
  return result;
}

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Mar 23 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/23/14, 6:53 PM, Michel Fortin wrote:
 On 2014-03-23 21:22:58 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Optimizing for smallest assembly size:

 dchar front(char[] s)
 {
   size_t bytesize;
   dchar result;
   switch (s[0])
   {
     case 0b00000000: .. case 0b01111111:
         return s[0];
     case 0b11000000: .. case 0b11011111:
         return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00011111);
     case 0b11100000: .. case 0b11101111:
         result = s[0] & 0b00001111;
         bytesize = 3;
         break;
     case 0b11110000: .. case 0b11110111:
         result = s[0] & 0b00000111;
         bytesize = 4;
     default:
        return dchar.init;
   }
   foreach (i; 1..bytesize)
       result = (result << 6) | (s[i] & 0b00111111);
   return result;
 }

Nice, thanks! I'd hope for a short path for the ASCII subset, could you 
achieve that?

Andrei

Mar 23 2014

Michel Fortin <michel.fortin michelf.ca> writes:

On 2014-03-24 02:25:17 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 3/23/14, 6:53 PM, Michel Fortin wrote:
 On 2014-03-23 21:22:58 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:
 
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 
 Optimizing for smallest assembly size:
 
 dchar front(char[] s)
 {
   size_t bytesize;
   dchar result;
   switch (s[0])
   {
     case 0b00000000: .. case 0b01111111:
         return s[0];
     case 0b11000000: .. case 0b11011111:
         return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00011111);
     case 0b11100000: .. case 0b11101111:
         result = s[0] & 0b00001111;
         bytesize = 3;
         break;
     case 0b11110000: .. case 0b11110111:
         result = s[0] & 0b00000111;
         bytesize = 4;
     default:
        return dchar.init;
   }
   foreach (i; 1..bytesize)
       result = (result << 6) | (s[i] & 0b00111111);
   return result;
 }

 
 Nice, thanks! I'd hope for a short path for the ASCII subset, could you 
 achieve that?

Unfortunately, there's a bug in the above. A missing "break" results in 
a fallthrough to default case. That's why the optimizer is so good, it 
just omits the four-byte branch entirely. I noticed something was wrong 
by looking at the assembly. I really wish D had no implicit fallthrough.

But try this instead, the result is even shorter:

dchar front(char[] s)
{
  if (s[0] < 0b1000000)
    return s[0]; // ASCII

  // pattern     indicator  tailLength
  // 0b100xxxxx  0b00 (0)   1
  // 0b101xxxxx  0b01 (1)   1 == indicator
  // 0b110xxxxx  0b10 (2)   2 == indicator
  // 0b111xxxxx  0b11 (3)   3 == indicator
  // note: undefined result for illegal 0b1111xxxx case

  auto indicator = (s[0] >> 5) & 0b11;
  auto tailLength = indicator ? indicator : 1;

  dchar result = s[0] & (0b00111111 >> tailLength);
  foreach (i; 0..tailLength)
      result = (result << 6) | (s[1+i] & 0b00111111);
  return result;
}

(Disclaimer: not tested, but I did check that all the expected code 
paths are present in the assembly this time.)

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Mar 23 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Michel Fortin:

 I really wish D had no implicit fallthrough.

Isn't your wish already granted?

Bye,
bearophile

Mar 23 2014

Michel Fortin <michel.fortin michelf.ca> writes:

On 2014-03-24 04:39:08 +0000, "bearophile" <bearophileHUGS lycos.com> said:

 Michel Fortin:
 
 I really wish D had no implicit fallthrough.

 
 Isn't your wish already granted?

Maybe. I don't have a D compiler installed at the moment to check. I'm 
just playing with d.godbolt.org and it accepts implicit fallthrough.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Mar 23 2014

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 03/23/2014 09:44 PM, Michel Fortin wrote:
 On 2014-03-24 04:39:08 +0000, "bearophile" <bearophileHUGS lycos.com> said:

 Michel Fortin:

 I really wish D had no implicit fallthrough.

 Isn't your wish already granted?

 Maybe. I don't have a D compiler installed at the moment to check. I'm
 just playing with d.godbolt.org and it accepts implicit fallthrough.

That compiler is very old. :)

import std.compiler;

pragma(msg, version_major);
pragma(msg, version_minor);

The output:

2u
55u

Ali

Mar 23 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/23/14, 9:37 PM, Michel Fortin wrote:
 On 2014-03-24 02:25:17 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 On 3/23/14, 6:53 PM, Michel Fortin wrote:
 On 2014-03-23 21:22:58 +0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> said:

 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Optimizing for smallest assembly size:

 dchar front(char[] s)
 {
   size_t bytesize;
   dchar result;
   switch (s[0])
   {
     case 0b00000000: .. case 0b01111111:
         return s[0];
     case 0b11000000: .. case 0b11011111:
         return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00011111);
     case 0b11100000: .. case 0b11101111:
         result = s[0] & 0b00001111;
         bytesize = 3;
         break;
     case 0b11110000: .. case 0b11110111:
         result = s[0] & 0b00000111;
         bytesize = 4;
     default:
        return dchar.init;
   }
   foreach (i; 1..bytesize)
       result = (result << 6) | (s[i] & 0b00111111);
   return result;
 }

 Nice, thanks! I'd hope for a short path for the ASCII subset, could
 you achieve that?

 Unfortunately, there's a bug in the above. A missing "break" results in
 a fallthrough to default case. That's why the optimizer is so good, it
 just omits the four-byte branch entirely. I noticed something was wrong
 by looking at the assembly. I really wish D had no implicit fallthrough.

-w

Probably time to move that from a warning to an error. In the short time 
we've used D at Facebook (forgot to add "-w" to the implicit flags) 
we've already have not one, but two bugs related to switch's implicit 
fallthrough.

 But try this instead, the result is even shorter:

 dchar front(char[] s)
 {
   if (s[0] < 0b1000000)
     return s[0]; // ASCII

   // pattern     indicator  tailLength
   // 0b100xxxxx  0b00 (0)   1
   // 0b101xxxxx  0b01 (1)   1 == indicator
   // 0b110xxxxx  0b10 (2)   2 == indicator
   // 0b111xxxxx  0b11 (3)   3 == indicator
   // note: undefined result for illegal 0b1111xxxx case

   auto indicator = (s[0] >> 5) & 0b11;
   auto tailLength = indicator ? indicator : 1;

   dchar result = s[0] & (0b00111111 >> tailLength);
   foreach (i; 0..tailLength)
       result = (result << 6) | (s[1+i] & 0b00111111);
   return result;
 }

 (Disclaimer: not tested, but I did check that all the expected code
 paths are present in the assembly this time.)

Nicely done. I collected what we have at http://goo.gl/W9V6E7.


Andrei

Mar 23 2014

"bearophile" <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 -w

 Probably time to move that from a warning to an error. In the 
 short time we've used D at Facebook (forgot to add "-w" to the 
 implicit flags) we've already have not one, but two bugs 
 related to switch's implicit fallthrough.

Lot of people in D.learn don't use warnings.
What I'd like is to have (informational) warning active on 
default, and to disable them on request with a compiler switch.

Bye,
bearophile

Mar 23 2014

Michel Fortin <michel.fortin michelf.ca> writes:

On 2014-03-24 04:37:22 +0000, Michel Fortin <michel.fortin michelf.ca> said:

 But try this instead, the result is even shorter:

Oops, messed up my patterns. Here's a hopefully fixed front():

dchar front(char[] s)
{
  if (s[0] < 0b1000000)
    return s[0]; // ASCII

  // pattern     indicator  tailLength
  // 0b1100xxxx  0b00 (0)   1
  // 0b1101xxxx  0b01 (1)   1 == indicator
  // 0b1110xxxx  0b10 (2)   2 == indicator
  // 0b1111xxxx  0b11 (3)   3 == indicator
  // note: undefined result for illegal 0b11111xxx case

  auto indicator = (s[0] >> 4) & 0b11;
  auto tailLength = indicator ? indicator : 1;

  dchar result = s[0] & (0b00111111 >> tailLength);
  foreach (i; 0..tailLength)
      result = (result << 6) | (s[1+i] & 0b00111111);
  return result;
}

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Mar 23 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/23/14, 9:49 PM, Michel Fortin wrote:
 dchar front(char[] s)
 {
   if (s[0] < 0b1000000)
     return s[0]; // ASCII

   // pattern     indicator  tailLength
   // 0b1100xxxx  0b00 (0)   1
   // 0b1101xxxx  0b01 (1)   1 == indicator
   // 0b1110xxxx  0b10 (2)   2 == indicator
   // 0b1111xxxx  0b11 (3)   3 == indicator
   // note: undefined result for illegal 0b11111xxx case

   auto indicator = (s[0] >> 4) & 0b11;
   auto tailLength = indicator ? indicator : 1;

   dchar result = s[0] & (0b00111111 >> tailLength);
   foreach (i; 0..tailLength)
       result = (result << 6) | (s[1+i] & 0b00111111);
   return result;
 }

http://goo.gl/y9EVFr

Andrei

Mar 23 2014

Michel Fortin <michel.fortin michelf.ca> writes:

On 2014-03-24 05:09:15 +0000, Andrei Alexandrescu 
<SeeWebsiteForEmail erdani.org> said:

 On 3/23/14, 9:49 PM, Michel Fortin wrote:
 

 http://goo.gl/y9EVFr
 
 Andrei

As I said earlier, that front2 version is broken. It's missing a break. 
Adding the break makes it non-interesting from an instruction count 
point of view.

Also, there's still an issue in my code above (found by safetyOff): the 
first "if" for ASCII should use 0b1000_0000, not 0b1000000 (missing one 
zero). Here's the corrected (again) version of front1:

dchar front1(char[] s)
{
  if (s[0] < 0b1000_0000)
    return s[0]; // ASCII

  // pattern     indicator  tailLength
  // 0b1100xxxx  0b00 (0)   1
  // 0b1101xxxx  0b01 (1)   1 == indicator
  // 0b1110xxxx  0b10 (2)   2 == indicator
  // 0b1111xxxx  0b11 (3)   3 == indicator
  // note: undefined result for illegal 0b11111xxx case

  auto indicator = (s[0] >> 4) & 0b11;
  auto tailLength = indicator ? indicator : 1;

  dchar result = s[0] & (0b0011_1111 >> tailLength);
  foreach (i; 0..tailLength)
      result = (result << 6) | (s[1+i] & 0b0011_1111);
  return result;
}

And I'm going to suggest a variant of the above with one less branch 
(but three more instructions): look at how tailLenght is computed by 
or'ing with the negative of bit 2 of indicator. I suspect it'll be 
faster with non-ASCII input, unless it gets inlined less.

dchar front2(char[] s)
{
  if (s[0] < 0b1000_0000)
    return s[0]; // ASCII

  // pattern     indicator  tailLength
  // 0b1100xxxx  0b00 (0)   1
  // 0b1101xxxx  0b01 (1)   1
  // 0b1110xxxx  0b10 (2)   2
  // 0b1111xxxx  0b11 (3)   3
  // note: undefined result for illegal 0b11111xxx case

  auto indicator = ((s[0] >> 4) & 0b11);
  auto tailLength = indicator | ((~indicator >> 1) & 0b1);

  dchar result = s[0] & (0b0011_1111 >> tailLength);
  foreach (i; 0..tailLength)
      result = (result << 6) | (s[1+i] & 0b0011_1111);
  return result;
}

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Mar 23 2014

"safety0ff" <safety0ff.dev gmail.com> writes:

On Monday, 24 March 2014 at 05:27:43 UTC, Michel Fortin wrote:
 Here's the corrected (again) version of front1:
 [Snip]

 And I'm going to suggest a variant of the above with one less 
 branch (but three more instructions):
 [Snip]

Everything seems to work in your corrected versions:
http://dpaste.dzfl.pl/3b32535559ba

Andrei's version doesn't compile on recent compiler versions due 
to goto skipping initialization.

Mar 23 2014

Michel Fortin <michel.fortin michelf.ca> writes:

On 2014-03-24 06:08:30 +0000, "safety0ff" <safety0ff.dev gmail.com> said:

 Everything seems to work in your corrected versions:
 http://dpaste.dzfl.pl/3b32535559ba
 
 Andrei's version doesn't compile on recent compiler versions due to 
 goto skipping initialization.

Ok, so let's check what happens in actual usage. In other words, what 
happens when front is inlined and used in a loop. I'm counting the 
number of assembly lines of this main function for looping through each 
possible input using the various implementations:

http://goo.gl/QjnA2k

implementation   line count of main in assembly
andralex         285 -  9 labels = 276 instructions
MFortin1         135 - 13 labels = 122 instructions
MFortin2         150 - 14 labels = 136 instructions
safety0ff         -- not inlined (too big?) --
dnspies          161 - 15 labels = 146 instructions
CWilliams        233 - 25 labels = 208 instructions

For compactness, my first implementation seems to be the winner, both 
with and without inlining.

That said, I think front should only assume a non-empty char[] and thus 
should check the length before accessing anything after the first byte 
to protect against incomplete multi-byte sequences such as 
[0x1000_0000]. So I added this line to my two implementations:

    if (1+tailLength >= s.length) return dchar.init;

Now lets see what happens with those changes:

http://goo.gl/XPCGYE

implementation   line count of main in assembly
MFortin1Check    103 - 11 labels =  92 instructions
MFortin2Check    135 - 13 labels = 122 instructions

Now I'm baffled. Adding a test makes the code shorter? It actually make 
the standalone functions longer by 8 instructions (as expected), but 
the inlined version is shorter. My guess is that the optimizer creates 
something entirely different and it turns out that this different 
version optimises better after inlining.

That said, my test "main" function isn't very representative of the 
general case because the length is statically known by the optimizer. 
Let's see what it does when the length is not statically known:

http://goo.gl/E2Q0Yu

implementation   line count of main in assembly
andralex         384 -  9 labels = 375 instructions
MFortin1         173 - 19 labels = 154 instructions
MFortin2         182 - 18 labels = 164 instructions
safety0ff         -- not inlined --
dnspies           -- not inlined --
CWilliams        229 - 23 labels = 206 instructions
MFortin1Check    211 - 24 labels = 187 instructions
MFortin2Check    218 - 21 labels = 197 instructions

So again MFortin1 is the winner for compactness. Still, I maintain that 
we ought to at least check the length of the array for multi-byte 
sequences to protect against incomplete sequences that could lay at the 
end of the string, so I'd favor MFortin1Check.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Mar 24 2014

Michel Fortin <michel.fortin michelf.ca> writes:

On 2014-03-24 13:39:45 +0000, Michel Fortin <michel.fortin michelf.ca> said:

 That said, my test "main" function isn't very representative of the 
 general case because the length is statically known by the optimizer. 
 Let's see what it does when the length is not statically known:

Oops, wrong link, was from an intermediary test.
This one will give you the results in the table:
http://goo.gl/lwU4hv

 implementation   line count of main in assembly
 andralex         384 -  9 labels = 375 instructions
 MFortin1         173 - 19 labels = 154 instructions
 MFortin2         182 - 18 labels = 164 instructions
 safety0ff         -- not inlined --
 dnspies           -- not inlined --
 CWilliams        229 - 23 labels = 206 instructions
 MFortin1Check    211 - 24 labels = 187 instructions
 MFortin2Check    218 - 21 labels = 197 instructions


-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Mar 24 2014

"dnspies" <dspies ualberta.ca> writes:

It seems like mine wasn't being inlined because I had carelessly 
replaced char[] with const(char)[] in the function signature (I 
don't know why that should make a difference, but it does)

Going back to my original version (with Andre's trick to avoid 
letting the loop unroll), I get
dnspies_orig   159 - 16 labels = 143 instructions

However, MFortin1 can be made to do better by taking successive 
slices instead of indexing (and commenting out the check, since 
it no longer helps optimize).
MFortin1_slice 137 - 13 labels = 124 instructions

http://goo.gl/JAgVK1

On Monday, 24 March 2014 at 13:39:45 UTC, Michel Fortin wrote:
 On 2014-03-24 06:08:30 +0000, "safety0ff" 
 <safety0ff.dev gmail.com> said:

 Everything seems to work in your corrected versions:
 http://dpaste.dzfl.pl/3b32535559ba
 
 Andrei's version doesn't compile on recent compiler versions 
 due to goto skipping initialization.

 Ok, so let's check what happens in actual usage. In other 
 words, what happens when front is inlined and used in a loop. 
 I'm counting the number of assembly lines of this main function 
 for looping through each possible input using the various 
 implementations:

 http://goo.gl/QjnA2k

 implementation   line count of main in assembly
 andralex         285 -  9 labels = 276 instructions
 MFortin1         135 - 13 labels = 122 instructions
 MFortin2         150 - 14 labels = 136 instructions
 safety0ff         -- not inlined (too big?) --
 dnspies          161 - 15 labels = 146 instructions
 CWilliams        233 - 25 labels = 208 instructions

 For compactness, my first implementation seems to be the 
 winner, both with and without inlining.

 That said, I think front should only assume a non-empty char[] 
 and thus should check the length before accessing anything 
 after the first byte to protect against incomplete multi-byte 
 sequences such as [0x1000_0000]. So I added this line to my two 
 implementations:

    if (1+tailLength >= s.length) return dchar.init;

 Now lets see what happens with those changes:

 http://goo.gl/XPCGYE

 implementation   line count of main in assembly
 MFortin1Check    103 - 11 labels =  92 instructions
 MFortin2Check    135 - 13 labels = 122 instructions

 Now I'm baffled. Adding a test makes the code shorter? It 
 actually make the standalone functions longer by 8 instructions 
 (as expected), but the inlined version is shorter. My guess is 
 that the optimizer creates something entirely different and it 
 turns out that this different version optimises better after 
 inlining.

 That said, my test "main" function isn't very representative of 
 the general case because the length is statically known by the 
 optimizer. Let's see what it does when the length is not 
 statically known:

 http://goo.gl/E2Q0Yu

 implementation   line count of main in assembly
 andralex         384 -  9 labels = 375 instructions
 MFortin1         173 - 19 labels = 154 instructions
 MFortin2         182 - 18 labels = 164 instructions
 safety0ff         -- not inlined --
 dnspies           -- not inlined --
 CWilliams        229 - 23 labels = 206 instructions
 MFortin1Check    211 - 24 labels = 187 instructions
 MFortin2Check    218 - 21 labels = 197 instructions

 So again MFortin1 is the winner for compactness. Still, I 
 maintain that we ought to at least check the length of the 
 array for multi-byte sequences to protect against incomplete 
 sequences that could lay at the end of the string, so I'd favor 
 MFortin1Check.

Mar 24 2014

"dnspies" <dspies ualberta.ca> writes:

Whoops, wrong "original" version.  That was the one before I 
understood the game. Here mine is fixed (but up to 180 lines - 16 
labels = 164 instructions):

http://goo.gl/wjIAVm

On Monday, 24 March 2014 at 17:14:11 UTC, dnspies wrote:
 It seems like mine wasn't being inlined because I had 
 carelessly replaced char[] with const(char)[] in the function 
 signature (I don't know why that should make a difference, but 
 it does)

 Going back to my original version (with Andre's trick to avoid 
 letting the loop unroll), I get
 dnspies_orig   159 - 16 labels = 143 instructions

 However, MFortin1 can be made to do better by taking successive 
 slices instead of indexing (and commenting out the check, since 
 it no longer helps optimize).
 MFortin1_slice 137 - 13 labels = 124 instructions

 http://goo.gl/JAgVK1

 On Monday, 24 March 2014 at 13:39:45 UTC, Michel Fortin wrote:
 On 2014-03-24 06:08:30 +0000, "safety0ff" 
 <safety0ff.dev gmail.com> said:

 Everything seems to work in your corrected versions:
 http://dpaste.dzfl.pl/3b32535559ba
 
 Andrei's version doesn't compile on recent compiler versions 
 due to goto skipping initialization.

 Ok, so let's check what happens in actual usage. In other 
 words, what happens when front is inlined and used in a loop. 
 I'm counting the number of assembly lines of this main 
 function for looping through each possible input using the 
 various implementations:

 http://goo.gl/QjnA2k

 implementation   line count of main in assembly
 andralex         285 -  9 labels = 276 instructions
 MFortin1         135 - 13 labels = 122 instructions
 MFortin2         150 - 14 labels = 136 instructions
 safety0ff         -- not inlined (too big?) --
 dnspies          161 - 15 labels = 146 instructions
 CWilliams        233 - 25 labels = 208 instructions

 For compactness, my first implementation seems to be the 
 winner, both with and without inlining.

 That said, I think front should only assume a non-empty char[] 
 and thus should check the length before accessing anything 
 after the first byte to protect against incomplete multi-byte 
 sequences such as [0x1000_0000]. So I added this line to my 
 two implementations:

   if (1+tailLength >= s.length) return dchar.init;

 Now lets see what happens with those changes:

 http://goo.gl/XPCGYE

 implementation   line count of main in assembly
 MFortin1Check    103 - 11 labels =  92 instructions
 MFortin2Check    135 - 13 labels = 122 instructions

 Now I'm baffled. Adding a test makes the code shorter? It 
 actually make the standalone functions longer by 8 
 instructions (as expected), but the inlined version is 
 shorter. My guess is that the optimizer creates something 
 entirely different and it turns out that this different 
 version optimises better after inlining.

 That said, my test "main" function isn't very representative 
 of the general case because the length is statically known by 
 the optimizer. Let's see what it does when the length is not 
 statically known:

 http://goo.gl/E2Q0Yu

 implementation   line count of main in assembly
 andralex         384 -  9 labels = 375 instructions
 MFortin1         173 - 19 labels = 154 instructions
 MFortin2         182 - 18 labels = 164 instructions
 safety0ff         -- not inlined --
 dnspies           -- not inlined --
 CWilliams        229 - 23 labels = 206 instructions
 MFortin1Check    211 - 24 labels = 187 instructions
 MFortin2Check    218 - 21 labels = 197 instructions

 So again MFortin1 is the winner for compactness. Still, I 
 maintain that we ought to at least check the length of the 
 array for multi-byte sequences to protect against incomplete 
 sequences that could lay at the end of the string, so I'd 
 favor MFortin1Check.

Mar 24 2014

"safety0ff" <safety0ff.dev gmail.com> writes:

On Monday, 24 March 2014 at 04:37:23 UTC, Michel Fortin wrote:
 dchar front(char[] s)
 {
  if (s[0] < 0b1000000)
    return s[0]; // ASCII
  auto indicator = (s[0] >> 5) & 0b11;
  auto tailLength = indicator ? indicator : 1;

  dchar result = s[0] & (0b00111111 >> tailLength);
  foreach (i; 0..tailLength)
      result = (result << 6) | (s[1+i] & 0b00111111);
  return result;
 }

 (Disclaimer: not tested, but I did check that all the expected 
 code paths are present in the assembly this time.)

0b1000000 is missing a zero: 0b1000_0000
Fixing that, I still get a range violation from "s[1+i]".

----- Test program -----
void main()
{
	foreach (ubyte b0; 0..0x80)
	{
		char[] s = [b0];
		assert(front(s)==front2(s));
	}	writeln("Single byte done");
	foreach (ubyte b0; 0..0x40)
	foreach (ubyte b1; 0..0x20)
	{
		char[] s = [0xC0|b1, 0x80|b0];
		assert(front(s)==front2(s));
	}	writeln("Double byte done");
	foreach (ubyte b0; 0..0x40)
	foreach (ubyte b1; 0..0x40)
	foreach (ubyte b2; 0..0x10)
	{
		char[] s = [0xE0|b2, 0x80|b1, 0x80|b0];
		assert(front(s)==front2(s));
	}	writeln("Triple byte done");
	foreach (ubyte b0; 0..0x40)
	foreach (ubyte b1; 0..0x40)
	foreach (ubyte b2; 0..0x40)
	foreach (ubyte b3; 0..0x08)
	{
		char[] s = [0xF0|b3, 0x80|b2, 0x80|b1, 0x80|b0];
		assert(front(s)==front2(s));
	}
}

Mar 23 2014

Michel Fortin <michel.fortin michelf.ca> writes:

On 2014-03-24 04:58:08 +0000, "safety0ff" <safety0ff.dev gmail.com> said:

 0b1000000 is missing a zero: 0b1000_0000

Indeed, thanks.

 Fixing that, I still get a range violation from "s[1+i]".

That "auto indicator = (s[0] >> 5) & 0b11;" line is wrong too. s[0] 
needs a shift by 4, not by 5. No doubt it crashes your test program.

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca

Mar 23 2014

"safety0ff" <safety0ff.dev gmail.com> writes:

On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

Here's my attempt: http://goo.gl/alJ9JS
I interpreted the challenge slightly differently, I went for 
fewer lines of code and fewer branches.

I verified the output for all valid inputs.

Mar 23 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/23/14, 8:36 PM, safety0ff wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Here's my attempt: http://goo.gl/alJ9JS
 I interpreted the challenge slightly differently, I went for fewer lines
 of code and fewer branches.

 I verified the output for all valid inputs.

Very interesting! -- Andrei

Mar 23 2014

"Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:

On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

http://goo.gl/TaZTNB

Mar 24 2014

"JR" <zorael gmail.com> writes:

On Monday, 24 March 2014 at 07:53:11 UTC, Chris Williams wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
 wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 http://goo.gl/TaZTNB

  for (int i = 1; i < len; i++) {

size_t, ++i! :>

(optimized away but still, nitnit pickpick)

Mar 24 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/24/14, 12:53 AM, Chris Williams wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 http://goo.gl/TaZTNB

Nice! Why assert(ret != 0)?

Andrei

Mar 24 2014

"Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:

On Monday, 24 March 2014 at 16:17:52 UTC, Andrei Alexandrescu 
wrote:
 On 3/24/14, 12:53 AM, Chris Williams wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
 wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 http://goo.gl/TaZTNB

 Nice! Why assert(ret != 0)?

 Andrei

It should be -1. I wanted to make sure that bsr() wasn't called 
on zero, but of course that's not the correct way to test it when 
I'm about to ~ it.

Mar 24 2014

"dnspies" <dspies ualberta.ca> writes:

On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

Here's mine (the loop gets unrolled):

dchar front(const char[] s) {
   assert(s.length > 0);
   byte c = s[0];
   dchar res = cast(ubyte)c;
   if(c >= 0) {
     return res;
   }
   c <<= 1;
   assert(c < 0);
   for(int i = 1; i < 4; i++) {
     assert(i < s.length);
     ubyte d = s[i];
     assert((d >> 6) == 0b10);
     res = (res << 8) | d;
     c <<= 1;
     if(c >= 0)
       return res;
   }
   assert(false);
}

Mar 24 2014

"dnspies" <dspies ualberta.ca> writes:

On Monday, 24 March 2014 at 08:06:53 UTC, dnspies wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
 wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Here's mine (the loop gets unrolled):

 dchar front(const char[] s) {
   assert(s.length > 0);
   byte c = s[0];
   dchar res = cast(ubyte)c;
   if(c >= 0) {
     return res;
   }
   c <<= 1;
   assert(c < 0);
   for(int i = 1; i < 4; i++) {
     assert(i < s.length);
     ubyte d = s[i];
     assert((d >> 6) == 0b10);
     res = (res << 8) | d;
     c <<= 1;
     if(c >= 0)
       return res;
   }
   assert(false);
 }

Sorry, I misunderstood.  We only want the x's in the output.  
Here's my revised solution

http://goo.gl/PL729J

dchar front(const char[] s) {
   assert(s.length > 0);
   byte c = s[0];
   dchar res = cast(ubyte)c;
   if(c >= 0)
     return res;
   dchar cover = 0b0100_0000;
   c <<= 1;
   assert(c < 0);
   for(int i = 1; i < 4; i++) {
     assert(i < s.length);
     ubyte d = s[i];
     assert((d >> 6) == 0b10);
     cover <<= 5;
     res = ((res << 6) & (cover - 1)) | (d & 0b0011_1111);
     c <<= 1;
     if(c >= 0)
       return res;
   }
   assert(false);
}

Mar 24 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/24/14, 1:56 AM, dnspies wrote:
 On Monday, 24 March 2014 at 08:06:53 UTC, dnspies wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Here's mine (the loop gets unrolled):

 dchar front(const char[] s) {
   assert(s.length > 0);
   byte c = s[0];
   dchar res = cast(ubyte)c;
   if(c >= 0) {
     return res;
   }
   c <<= 1;
   assert(c < 0);
   for(int i = 1; i < 4; i++) {
     assert(i < s.length);
     ubyte d = s[i];
     assert((d >> 6) == 0b10);
     res = (res << 8) | d;
     c <<= 1;
     if(c >= 0)
       return res;
   }
   assert(false);
 }

 Sorry, I misunderstood.  We only want the x's in the output. Here's my
 revised solution

 http://goo.gl/PL729J

That turned out quite large. -- Andrei

Mar 24 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/24/14, 1:06 AM, dnspies wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Here's mine (the loop gets unrolled):

 dchar front(const char[] s) {
    assert(s.length > 0);
    byte c = s[0];
    dchar res = cast(ubyte)c;
    if(c >= 0) {
      return res;
    }
    c <<= 1;
    assert(c < 0);
    for(int i = 1; i < 4; i++) {
      assert(i < s.length);
      ubyte d = s[i];
      assert((d >> 6) == 0b10);
      res = (res << 8) | d;
      c <<= 1;
      if(c >= 0)
        return res;
    }
    assert(false);
 }

http://goo.gl/HBSv5T - looks nice! For minimum size replace the 
assert(false) with one inside the loop: http://goo.gl/eWs0ft

Andrei

Mar 24 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

Before we roll this out, could we discuss a strategy/guideline in 
regards to detecting and handling invalid UTF sequences?

Having a fast "front" is fine and all, but if it means your 
program asserting in release (or worst, silently corrupting 
memory) just because the client was trying to read a bad text 
file, I'm unsure this is acceptable.

Mar 24 2014

dennis luehring <dl.soluz gmx.net> writes:

Am 24.03.2014 10:02, schrieb monarch_dodra:
 Having a fast "front" is fine and all, but if it means your
 program asserting in release (or worst, silently corrupting
 memory) just because the client was trying to read a bad text
 file, I'm unsure this is acceptable.

it would be great to habe a "basic" form of this range that error 
behavior could be extended policy based / templates - the phobos version 
could extend the basic range with "prefered" error behavior - but the 
basic range is still able to read without checking - for example if i 
know that my input is 100% valid and i need the speed etc.

Mar 24 2014

"Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:

On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
 wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Before we roll this out, could we discuss a strategy/guideline 
 in regards to detecting and handling invalid UTF sequences?

 Having a fast "front" is fine and all, but if it means your 
 program asserting in release (or worst, silently corrupting 
 memory) just because the client was trying to read a bad text 
 file, I'm unsure this is acceptable.

As you'll note from the assembler, asserts are compiled out of 
release builds. Though if the goal of this little project is to 
create useful code for inclusion in a library, you would want to 
switch many of the asserts in the examples into "if (x) return 
0xfffd;". (Mine needs an assert/if in the loop to check for the 
proper bit markers).

Mar 24 2014

"w0rp" <devw0rp gmail.com> writes:

On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
 wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Before we roll this out, could we discuss a strategy/guideline 
 in regards to detecting and handling invalid UTF sequences?

 Having a fast "front" is fine and all, but if it means your 
 program asserting in release (or worst, silently corrupting 
 memory) just because the client was trying to read a bad text 
 file, I'm unsure this is acceptable.

I would strongly advise to at least offer an option, possibly via 
a template parameter, for turning error handling on or off, 
similar to how Python handles decoding. Examples below in Python 
3.



used

sequence removed.

All three strategies are useful from time to time. I mainly reach 
for option three when I'm trying to get some text data out of 
some old broken databases or similar.

We may consider leaving the error checking on in -release for the 
'strict' decoding, but throwing an Error instead of an exception 
so the function can be nothrow. This would prevent memory 
corruption in release code. assert vs throw Error is up for 
debate.

Mar 24 2014

dennis luehring <dl.soluz gmx.net> writes:

Am 24.03.2014 13:51, schrieb w0rp:
 On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu
 wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Before we roll this out, could we discuss a strategy/guideline
 in regards to detecting and handling invalid UTF sequences?

 Having a fast "front" is fine and all, but if it means your
 program asserting in release (or worst, silently corrupting
 memory) just because the client was trying to read a bad text
 file, I'm unsure this is acceptable.

 I would strongly advise to at least offer an option, possibly via
 a template parameter, for turning error handling on or off,
 similar to how Python handles decoding. Examples below in Python
 3.



 used

 sequence removed.

 All three strategies are useful from time to time. I mainly reach
 for option three when I'm trying to get some text data out of
 some old broken databases or similar.

 We may consider leaving the error checking on in -release for the
 'strict' decoding, but throwing an Error instead of an exception
 so the function can be nothrow. This would prevent memory
 corruption in release code. assert vs throw Error is up for
 debate.

+1

Mar 24 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/24/14, 5:51 AM, w0rp wrote:
 On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Before we roll this out, could we discuss a strategy/guideline in
 regards to detecting and handling invalid UTF sequences?

 Having a fast "front" is fine and all, but if it means your program
 asserting in release (or worst, silently corrupting memory) just
 because the client was trying to read a bad text file, I'm unsure this
 is acceptable.

 I would strongly advise to at least offer an option

Options are fine for functions etc. But front would need to find an 
all-around good compromise between speed and correctness.

Andrei

Mar 24 2014

dennis luehring <dl.soluz gmx.net> writes:

Am 24.03.2014 17:44, schrieb Andrei Alexandrescu:
 On 3/24/14, 5:51 AM, w0rp wrote:
 On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Before we roll this out, could we discuss a strategy/guideline in
 regards to detecting and handling invalid UTF sequences?

 Having a fast "front" is fine and all, but if it means your program
 asserting in release (or worst, silently corrupting memory) just
 because the client was trying to read a bad text file, I'm unsure this
 is acceptable.

 I would strongly advise to at least offer an option

 Options are fine for functions etc. But front would need to find an
 all-around good compromise between speed and correctness.

 Andrei




sequence removed.

i think there should be a base range for UTF8 iteration - with policy 
based error extension (like in python) and some variants that defer this 
base UTF8 range with different error behavior - and one of these become 
the phobos standard = default parameter so its still switchable

Mar 24 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/24/14, 2:02 AM, monarch_dodra wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Before we roll this out, could we discuss a strategy/guideline in
 regards to detecting and handling invalid UTF sequences?

I think std.array.front should return the invalid dchar on error, and 
popFront should attempt to resync on error.

Andrei

Mar 24 2014

"Vladimir Panteleev" <vladimir thecybershadow.net> writes:

On Monday, 24 March 2014 at 16:31:42 UTC, Andrei Alexandrescu 
wrote:
 On 3/24/14, 2:02 AM, monarch_dodra wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
 wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Before we roll this out, could we discuss a strategy/guideline 
 in
 regards to detecting and handling invalid UTF sequences?

 I think std.array.front should return the invalid dchar on 
 error, and popFront should attempt to resync on error.

Ignoring UTF errors is a lossy operation and has the potential 
problem of irreversible data loss. For example, consider a 
program which needs to process Windows-1251 files: it would need 
to read the file from disk, convert to UTF-8, process it, convert 
back to Windows-1251, and save it back to disk. If a bug causes 
the UTF-8 conversion step to be accidentally skipped, then all 
Unicode data in that file is lost.

I think UTF-8 decoding operations should, by default, throw on 
UTF-8 errors. Ignoring UTF-8 errors should only be done 
explicitly, with the programmer's consent.

Mar 24 2014

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 24 March 2014 at 16:31:42 UTC, Andrei Alexandrescu 
wrote:
 On 3/24/14, 2:02 AM, monarch_dodra wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
 wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Before we roll this out, could we discuss a strategy/guideline 
 in
 regards to detecting and handling invalid UTF sequences?

 I think std.array.front should return the invalid dchar on 
 error, and popFront should attempt to resync on error.

 Andrei

That sounds good to me (I think). Better than a straight up 
assert anyways...

...which is what the code you submitted does.

Also, I think relying on bounds checking is wrong: Disabling 
bounds checking mean you are OK for memory corruption for *wrong 
code*. I don't think invalid string is wrong code. 'front' should 
do an actual "if", and "assert(0)" when needed. This should have 
no impact on performance BTW, if done with pointer extraction 
("slice.ptr[1]"), and in a trusted context.

Of course, this is all assuming we are asserting at all.

Mar 24 2014

"dnspies" <dspies ualberta.ca> writes:

On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

Ok, I managed to make it smaller.  I think this is the smallest 
so far with only 23 instructions (no loop unrolling in this one):

http://goo.gl/RKF5Vm

Mar 24 2014

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 3/24/14, 3:30 AM, dnspies wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 Ok, I managed to make it smaller.  I think this is the smallest so far
 with only 23 instructions (no loop unrolling in this one):

 http://goo.gl/RKF5Vm

This is nice because even the checks are relatively cheap. It's quite 
difficult to understand tho :o). -- Andrei

Mar 24 2014

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

I've probably missed something, but here's 15 instructions: 
http://goo.gl/d3Mgj0


Also, I contacted the owner of d.godbolt.org and hopefully we'll 
have a more up-to-date gdc on there soon. Perhaps ldc as well :)

Mar 24 2014

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Monday, 24 March 2014 at 15:32:09 UTC, John Colvin wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
 wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 I've probably missed something, but here's 15 instructions: 
 http://goo.gl/d3Mgj0

woops yeah, that's completely broken.

Mar 24 2014

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu
wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

http://forum.dlang.org/post/fsgdviklnbugdzjsgfsk forum.dlang.org

Mar 24 2014

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

On a bigendian machine with loose alignment requirements (1 
byte), you can do this, which is down to 13 instructions on x86 
(which is of course meaningless, what with it being the wrong 
endianess):

uint front(char[] s) {
   if(!(s[0] & 0b1000_0000)) return s[0]; //handle ASCII
   assert(s[0] & 0b0100_0000);
	
   if(s[0] & 0b0010_0000)
   {
     if(s[0] & 0b0001_0000)
     {
       assert(s.length >=4 && !(s[0] & 0b1000)
              && s[1] <= 0b1011_1111
              && s[2] <= 0b1011_1111
              && s[3] <= 0b1011_1111);
       return *(cast(dchar*)(s.ptr));
     }
     assert(s.length >= 3 && s[1] <= 0b1011_1111 && s[2] <= 
0b1011_1111);
     return *(cast(dchar*)(s.ptr)) >> 8;
   }
	
   assert(s.length >= 2 && s[1] <= 0b1011_1111);
   return *(cast(wchar*)(s.ptr));
}

http://goo.gl/Kf6RZJ


There may be architectures that can benefit from this.

Mar 24 2014

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Monday, 24 March 2014 at 16:41:02 UTC, John Colvin wrote:
 On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu 
 wrote:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 On a bigendian machine with loose alignment requirements (1 
 byte), you can do this

Same again, but for little-endian, 18 instructions:

http://goo.gl/jlrweQ

uint front(char[] s)
{
   if(!(s[0] & 0b1000_0000)) return s[0]; //handle ASCII
   assert(s[0] & 0b0100_0000);
	
   if(s[0] & 0b0010_0000)
   {
     if(s[0] & 0b0001_0000)
     {
       assert(s.length >=4 && !(s[0] & 0b1000)
              && s[1] <= 0b1011_1111
              && s[2] <= 0b1011_1111
              && s[3] <= 0b1011_1111);
       return swapEndian(*(cast(dchar*)(s.ptr)));
     }
     assert(s.length >= 3 && s[1] <= 0b1011_1111 && s[2] <= 
0b1011_1111);
     return swapEndian(*(cast(dchar*)(s.ptr))) >> 8;
   }
	
   assert(s.length >= 2 && s[1] <= 0b1011_1111);
   return swapEndian(*(cast(wchar*)(s.ptr)));
}

Mar 24 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

24-Mar-2014 01:22, Andrei Alexandrescu пишет:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

I had to join the party at some point.
This seems like 25 instructions:
http://goo.gl/N7sHtK

-- 
Dmitry Olshansky

Mar 24 2014

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

24-Mar-2014 23:53, Dmitry Olshansky пишет:
 24-Mar-2014 01:22, Andrei Alexandrescu пишет:
 Here's a baseline: http://goo.gl/91vIGc. Destroy!

 Andrei

 I had to join the party at some point.
 This seems like 25 instructions:
 http://goo.gl/N7sHtK

Interestingly gdc-4.8 produces better results.
http://goo.gl/1R7GMs

-- 
Dmitry Olshansky

Mar 25 2014

"anonymous" <anon ymous.net> writes:

dchar front(char[] s) {
   dchar c = s[0];
   if (!(c & 0x80))
     return c;
   byte b = (c >> 4) & 3;
   b += !b;
   c &= 63 >> b;
   char *p = s.ptr;
   do {
     p++;
     c = c << 6 | *p & 63;
   } while(--b);
   return c;
}

Mar 25 2014

D Programming

C/C++ Programming

Other

digitalmars.D - Challenge: write a really really small front() for UTF8