digitalmars.D - Challenge: write a really really small front() for UTF8
- Andrei Alexandrescu (2/2) Mar 23 2014 Here's a baseline: http://goo.gl/91vIGc. Destroy!
- Dmitry Olshansky (5/7) Mar 23 2014 Assertions to check encoding?!
- Andrei Alexandrescu (5/10) Mar 23 2014 That implementation does zero effort to optimize checks themselves, and
- Dmitry Olshansky (6/16) Mar 23 2014 Just how much you are willing to assert? You don't even check length.
- Andrei Alexandrescu (4/19) Mar 23 2014 A replacement for front() in arrays of char and wchar.
- Anonymous (13/13) Mar 23 2014 dchar front(char[] s) {
- Andrei Alexandrescu (3/16) Mar 23 2014 That's smaller but doesn't seem to do the same!
- Michel Fortin (38/39) Mar 23 2014 If you want the smallest assembly size with array bound checking, make
- Vladimir Panteleev (5/25) Mar 23 2014 Since strings are often user input, if there is chance that bad
- Mike (6/8) Mar 23 2014 This example only considers encodings of up to 4 bytes, but UTF-8
- Walter Bright (2/4) Mar 23 2014 It's not anymore. The 5 and 6 byte encodings are now illegal.
- =?UTF-8?B?U2ltZW4gS2rDpnLDpXM=?= (5/12) Mar 23 2014 RFC 3629 (http://tools.ietf.org/html/rfc3629) restricted UTF-8 to
- Dmitry Olshansky (6/18) Mar 24 2014 More importantly Unicode standard explicitly fixed the range of code
- Daniel N (12/19) Mar 24 2014 I did some hacks using C at work with _pext_u32, it's an
- Daniel N (43/46) Mar 24 2014 I now managed to dig up my old C source... but I'm still blocked
- Iain Buclaw (3/13) Mar 25 2014 4) It's tied to one piece of hardware.
- Nick Sabalausky (12/30) Mar 25 2014 bool supportCpuFeatureX;
- dennis luehring (2/30) Mar 25 2014 the extra branch could kill the performance benefit if doStuff is too sm...
- Daniel N (4/14) Mar 25 2014 you'd simply have to hoist the condition outside the inner loop.
- Piotr Szturmaj (9/42) Mar 25 2014 void function() doStuff;
- Michel Fortin (31/32) Mar 23 2014 Optimizing for smallest assembly size:
- Andrei Alexandrescu (4/32) Mar 23 2014 Nice, thanks! I'd hope for a short path for the ASCII subset, could you
- Michel Fortin (30/65) Mar 23 2014 Unfortunately, there's a bug in the above. A missing "break" results in
- bearophile (4/5) Mar 23 2014 Isn't your wish already granted?
- Michel Fortin (7/12) Mar 23 2014 Maybe. I don't have a D compiler installed at the moment to check. I'm
- =?UTF-8?B?QWxpIMOHZWhyZWxp?= (9/17) Mar 23 2014 That compiler is very old. :)
- Andrei Alexandrescu (8/69) Mar 23 2014 -w
- bearophile (6/11) Mar 23 2014 Lot of people in D.learn don't use warnings.
- Michel Fortin (23/24) Mar 23 2014 Oops, messed up my patterns. Here's a hopefully fixed front():
- Andrei Alexandrescu (3/20) Mar 23 2014 http://goo.gl/y9EVFr
- Michel Fortin (50/55) Mar 23 2014 As I said earlier, that front2 version is broken. It's missing a break.
- safety0ff (5/10) Mar 23 2014 Everything seems to work in your corrected versions:
- Michel Fortin (51/56) Mar 24 2014 Ok, so let's check what happens in actual usage. In other words, what
- Michel Fortin (8/20) Mar 24 2014 Oops, wrong link, was from an intermediary test.
- dnspies (12/70) Mar 24 2014 It seems like mine wasn't being inlined because I had carelessly
- dnspies (5/91) Mar 24 2014 Whoops, wrong "original" version. That was the one before I
- safety0ff (33/46) Mar 23 2014 0b1000000 is missing a zero: 0b1000_0000
- Michel Fortin (8/10) Mar 23 2014 That "auto indicator = (s[0] >> 5) & 0b11;" line is wrong too. s[0]
- safety0ff (6/8) Mar 23 2014 Here's my attempt: http://goo.gl/alJ9JS
- Andrei Alexandrescu (2/10) Mar 23 2014 Very interesting! -- Andrei
- Chris Williams (3/5) Mar 24 2014 http://goo.gl/TaZTNB
- JR (3/10) Mar 24 2014 size_t, ++i! :>
- Andrei Alexandrescu (3/8) Mar 24 2014 Nice! Why assert(ret != 0)?
- Chris Williams (5/15) Mar 24 2014 It should be -1. I wanted to make sure that bsr() wasn't called
- dnspies (23/25) Mar 24 2014 Here's mine (the loop gets unrolled):
- dnspies (25/51) Mar 24 2014 Sorry, I misunderstood. We only want the x's in the output.
- Andrei Alexandrescu (2/33) Mar 24 2014 That turned out quite large. -- Andrei
- Andrei Alexandrescu (4/29) Mar 24 2014 http://goo.gl/HBSv5T - looks nice! For minimum size replace the
- monarch_dodra (8/10) Mar 24 2014 Before we roll this out, could we discuss a strategy/guideline in
- dennis luehring (6/10) Mar 24 2014 it would be great to habe a "basic" form of this range that error
- Chris Williams (7/18) Mar 24 2014 As you'll note from the assembler, asserts are compiled out of
- w0rp (18/29) Mar 24 2014 I would strongly advise to at least offer an option, possibly via
- dennis luehring (2/33) Mar 24 2014 +1
- Andrei Alexandrescu (4/18) Mar 24 2014 Options are fine for functions etc. But front would need to find an
- dennis luehring (9/28) Mar 24 2014 b"\255".decode("utf-8", errors="strict") # UnicodeDecodeError
- Andrei Alexandrescu (4/10) Mar 24 2014 I think std.array.front should return the invalid dchar on error, and
- Vladimir Panteleev (12/24) Mar 24 2014 Ignoring UTF errors is a lossy operation and has the potential
- monarch_dodra (12/25) Mar 24 2014 That sounds good to me (I think). Better than a straight up
- dnspies (5/7) Mar 24 2014 Ok, I managed to make it smaller. I think this is the smallest
- Andrei Alexandrescu (3/10) Mar 24 2014 This is nice because even the checks are relatively cheap. It's quite
- John Colvin (6/8) Mar 24 2014 I've probably missed something, but here's 15 instructions:
- John Colvin (2/9) Mar 24 2014 woops yeah, that's completely broken.
- John Colvin (3/5) Mar 24 2014 http://forum.dlang.org/post/fsgdviklnbugdzjsgfsk@forum.dlang.org
- John Colvin (30/32) Mar 24 2014 On a bigendian machine with loose alignment requirements (1
- John Colvin (26/33) Mar 24 2014 Same again, but for little-endian, 18 instructions:
- Dmitry Olshansky (6/8) Mar 24 2014 I had to join the party at some point.
- Dmitry Olshansky (5/12) Mar 25 2014 Interestingly gdc-4.8 produces better results.
- anonymous (14/14) Mar 25 2014 dchar front(char[] s) {
Here's a baseline: http://goo.gl/91vIGc. Destroy! Andrei
Mar 23 2014
24-Mar-2014 01:22, Andrei Alexandrescu пишет:Here's a baseline: http://goo.gl/91vIGc. Destroy!Assertions to check encoding?! I thought it would detect broken encoding and do a substitution at least.Andrei-- Dmitry Olshansky
Mar 23 2014
On 3/23/14, 2:29 PM, Dmitry Olshansky wrote:24-Mar-2014 01:22, Andrei Alexandrescu пишет:That implementation does zero effort to optimize checks themselves, and indeed puts them in asserts. I think there's value in having such a primitive down below. AndreiHere's a baseline: http://goo.gl/91vIGc. Destroy!Assertions to check encoding?! I thought it would detect broken encoding and do a substitution at least.
Mar 23 2014
24-Mar-2014 01:34, Andrei Alexandrescu пишет:On 3/23/14, 2:29 PM, Dmitry Olshansky wrote:Just how much you are willing to assert? You don't even check length. In short - what are the specs of this primitive and where you see it being used.24-Mar-2014 01:22, Andrei Alexandrescu пишет:That implementation does zero effort to optimize checks themselves, and indeed puts them in asserts. I think there's value in having such a primitive down below.Here's a baseline: http://goo.gl/91vIGc. Destroy!Assertions to check encoding?! I thought it would detect broken encoding and do a substitution at least.Andrei-- Dmitry Olshansky
Mar 23 2014
On 3/23/14, 3:10 PM, Dmitry Olshansky wrote:24-Mar-2014 01:34, Andrei Alexandrescu пишет:Array bounds checking takes care of that.On 3/23/14, 2:29 PM, Dmitry Olshansky wrote:Just how much you are willing to assert? You don't even check length.24-Mar-2014 01:22, Andrei Alexandrescu пишет:That implementation does zero effort to optimize checks themselves, and indeed puts them in asserts. I think there's value in having such a primitive down below.Here's a baseline: http://goo.gl/91vIGc. Destroy!Assertions to check encoding?! I thought it would detect broken encoding and do a substitution at least.In short - what are the specs of this primitive and where you see it being used.A replacement for front() in arrays of char and wchar. Andrei
Mar 23 2014
dchar front(char[] s) { uint c = s[0]; ubyte p = ~s[0]; if (p>>7) return c; c = c<<8 | s[1]; if (p>>5) return c; c = c<<8 | s[2]; if (p>>4) return c; return c<<8 | s[3]; }
Mar 23 2014
On 3/23/14, 4:28 PM, Anonymous wrote:dchar front(char[] s) { uint c = s[0]; ubyte p = ~s[0]; if (p>>7) return c; c = c<<8 | s[1]; if (p>>5) return c; c = c<<8 | s[2]; if (p>>4) return c; return c<<8 | s[3]; }That's smaller but doesn't seem to do the same! Andrei
Mar 23 2014
On 2014-03-23 22:58:32 +0000, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Array bounds checking takes care of that.If you want the smallest assembly size with array bound checking, make the function safe and see the disaster it causes to the assembly size. That's what you have to optimize. If you're going to optimize while looking at the assembly, better check for bounds manually: dchar front(char[] s) { if (s.length < 1) return dchar.init; size_t bytesize; dchar result; switch (s[0]) { case 0b00000000: .. case 0b01111111: return s[0]; case 0b11000000: .. case 0b11011111: result = s[0] & 0b00011111; bytesize = 2; break; case 0b11100000: .. case 0b11101111: result = s[0] & 0b00001111; bytesize = 3; break; case 0b11110000: .. case 0b11110111: result = s[0] & 0b00000111; bytesize = 4; default: return dchar.init; } if (s.length < bytesize) return dchar.init; foreach (i; 1..bytesize) result = (result << 6) | (s[i] & 0b00111111); return result; } -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Mar 23 2014
On Sunday, 23 March 2014 at 22:58:53 UTC, Andrei Alexandrescu wrote:On 3/23/14, 3:10 PM, Dmitry Olshansky wrote:Since strings are often user input, if there is chance that bad input can cause undefined behavior, then the checks must be done unconditionally.24-Mar-2014 01:34, Andrei Alexandrescu пишет:Array bounds checking takes care of that.On 3/23/14, 2:29 PM, Dmitry Olshansky wrote:Just how much you are willing to assert? You don't even check length.24-Mar-2014 01:22, Andrei Alexandrescu пишет:That implementation does zero effort to optimize checks themselves, and indeed puts them in asserts. I think there's value in having such a primitive down below.Here's a baseline: http://goo.gl/91vIGc. Destroy!Assertions to check encoding?! I thought it would detect broken encoding and do a substitution at least.
Mar 23 2014
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiThis example only considers encodings of up to 4 bytes, but UTF-8 can encode code points in as many as 6 bytes. Is that not a concern? Mike
Mar 23 2014
On 3/23/2014 5:32 PM, Mike wrote:This example only considers encodings of up to 4 bytes, but UTF-8 can encode code points in as many as 6 bytes. Is that not a concern?It's not anymore. The 5 and 6 byte encodings are now illegal.
Mar 23 2014
On 2014-03-24 00:32, Mike wrote:On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:RFC 3629 (http://tools.ietf.org/html/rfc3629) restricted UTF-8 to conform to constraints in UTF-16, removing all 5- and 6-byte sequences. -- SimenHere's a baseline: http://goo.gl/91vIGc. Destroy! AndreiThis example only considers encodings of up to 4 bytes, but UTF-8 can encode code points in as many as 6 bytes. Is that not a concern? Mike
Mar 23 2014
24-Mar-2014 04:44, Simen Kjærås пишет:On 2014-03-24 00:32, Mike wrote:More importantly Unicode standard explicitly fixed the range of code points to that of representable in UTF-16. Starting with the 5th version of the standard if memory serves me right. -- Dmitry OlshanskyOn Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:RFC 3629 (http://tools.ietf.org/html/rfc3629) restricted UTF-8 to conform to constraints in UTF-16, removing all 5- and 6-byte sequences.Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiThis example only considers encodings of up to 4 bytes, but UTF-8 can encode code points in as many as 6 bytes. Is that not a concern? Mike
Mar 24 2014
On Monday, 24 March 2014 at 11:48:00 UTC, Dmitry Olshansky wrote:I did some hacks using C at work with _pext_u32, it's an absolutely wonderful instruction(pext) with the corresponding pdep. http://software.intel.com/sites/landingpage/IntrinsicsGuide/ And ridiculously fast according to Agner(Latency 3, Throughput 1): http://www.agner.org/optimize/instruction_tables.pdf I think we should add this as an intrinsic to D as well(if it isn't already, but I couldn't find it)... it could do wonders for utf decoding. I'm currently too busy to submit a complete solution, but please feel free to use my idea if you think it sounds promising.RFC 3629 (http://tools.ietf.org/html/rfc3629) restricted UTF-8 to conform to constraints in UTF-16, removing all 5- and 6-byte sequences.More importantly Unicode standard explicitly fixed the range of code points to that of representable in UTF-16. Starting with the 5th version of the standard if memory serves me right.
Mar 24 2014
On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:I'm currently too busy to submit a complete solution, but please feel free to use my idea if you think it sounds promising.I now managed to dig up my old C source... but I'm still blocked by dmd not accepting the 'pext' instruction... 1) I know my solution is not directly comparable to the rest in this thread(for many reasons). 2) It's of course trivial to add a fast path for ascii... if desired. 3) It throws safety and standards out the window. #include <x86intrin.h> char32_t front(const char* inp) { static const uint32_t mask[32] = { 0b01111111'00000000'00000000'00000000, 0b00000000'00000000'00000000'00000000, 0b00011111'00111111'00000000'00000000, 0b00001111'00111111'00111111'00000000, 0b00000111'00111111'00111111'00111111, }; uint32_t rev = __builtin_bswap32(reinterpret_cast<const uint32_t*>(inp)[0]); uint32_t len = __lzcnt32(~rev); return _pext_u32(rev, mask[len]); } This is what clang 3.4 generated: pushq %rbp Ltmp2: .cfi_def_cfa_offset 16 Ltmp3: .cfi_offset %rbp, -16 movq %rsp, %rbp Ltmp4: .cfi_def_cfa_register %rbp movl (%rdi), %eax bswapl %eax movl %eax, %ecx notl %ecx lzcntl %ecx, %ecx leaq __ZZ5frontPKcE4mask(%rip), %rdx pextl (%rdx,%rcx,4), %eax, %eax popq %rbp ret
Mar 24 2014
On 25 March 2014 00:04, Daniel N <ufo orbiting.us> wrote:On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:4) It's tied to one piece of hardware. No Thankee.I'm currently too busy to submit a complete solution, but please feel free to use my idea if you think it sounds promising.I now managed to dig up my old C source... but I'm still blocked by dmd not accepting the 'pext' instruction... 1) I know my solution is not directly comparable to the rest in this thread(for many reasons). 2) It's of course trivial to add a fast path for ascii... if desired. 3) It throws safety and standards out the window.
Mar 25 2014
On 3/25/2014 4:00 AM, Iain Buclaw wrote:On 25 March 2014 00:04, Daniel N <ufo orbiting.us> wrote:bool supportCpuFeatureX; void main() { supportCpuFeatureX = detectCpuFeatureX(); doStuff(); } void doStuff() { if(supportCpuFeatureX) doStuff_FeatureX(); else doStuff_Fallback(); }On Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:4) It's tied to one piece of hardware. No Thankee.I'm currently too busy to submit a complete solution, but please feel free to use my idea if you think it sounds promising.I now managed to dig up my old C source... but I'm still blocked by dmd not accepting the 'pext' instruction... 1) I know my solution is not directly comparable to the rest in this thread(for many reasons). 2) It's of course trivial to add a fast path for ascii... if desired. 3) It throws safety and standards out the window.dmd -inline blah.d
Mar 25 2014
Am 25.03.2014 11:38, schrieb Nick Sabalausky:On 3/25/2014 4:00 AM, Iain Buclaw wrote:the extra branch could kill the performance benefit if doStuff is too smallOn 25 March 2014 00:04, Daniel N <ufo orbiting.us> wrote:void doStuff() { if(supportCpuFeatureX) doStuff_FeatureX(); else doStuff_Fallback(); } > dmd -inline blah.dOn Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:4) It's tied to one piece of hardware. No Thankee.I'm currently too busy to submit a complete solution, but please feel free to use my idea if you think it sounds promising.I now managed to dig up my old C source... but I'm still blocked by dmd not accepting the 'pext' instruction... 1) I know my solution is not directly comparable to the rest in this thread(for many reasons). 2) It's of course trivial to add a fast path for ascii... if desired. 3) It throws safety and standards out the window.
Mar 25 2014
On Tuesday, 25 March 2014 at 10:42:59 UTC, dennis luehring wrote:you'd simply have to hoist the condition outside the inner loop. Furthermore the branch prediction would never fail, only unpredictable branches are terrible.void doStuff() { if(supportCpuFeatureX) doStuff_FeatureX(); else doStuff_Fallback(); } > dmd -inline blah.dthe extra branch could kill the performance benefit if doStuff is too small
Mar 25 2014
W dniu 2014-03-25 11:42, dennis luehring pisze:Am 25.03.2014 11:38, schrieb Nick Sabalausky:void function() doStuff; void main() { auto supportCpuFeatureX = detectCpuFeatureX(); if (supportCpuFeatureX) doStuff = &doStuff_FeatureX; else doStuff = &doStuff_Fallback; }On 3/25/2014 4:00 AM, Iain Buclaw wrote:the extra branch could kill the performance benefit if doStuff is too smallOn 25 March 2014 00:04, Daniel N <ufo orbiting.us> wrote:void doStuff() { if(supportCpuFeatureX) doStuff_FeatureX(); else doStuff_Fallback(); } > dmd -inline blah.dOn Monday, 24 March 2014 at 12:21:55 UTC, Daniel N wrote:4) It's tied to one piece of hardware. No Thankee.I'm currently too busy to submit a complete solution, but please feel free to use my idea if you think it sounds promising.I now managed to dig up my old C source... but I'm still blocked by dmd not accepting the 'pext' instruction... 1) I know my solution is not directly comparable to the rest in this thread(for many reasons). 2) It's of course trivial to add a fast path for ascii... if desired. 3) It throws safety and standards out the window.
Mar 25 2014
On 2014-03-23 21:22:58 +0000, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Here's a baseline: http://goo.gl/91vIGc. Destroy!Optimizing for smallest assembly size: dchar front(char[] s) { size_t bytesize; dchar result; switch (s[0]) { case 0b00000000: .. case 0b01111111: return s[0]; case 0b11000000: .. case 0b11011111: return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00011111); case 0b11100000: .. case 0b11101111: result = s[0] & 0b00001111; bytesize = 3; break; case 0b11110000: .. case 0b11110111: result = s[0] & 0b00000111; bytesize = 4; default: return dchar.init; } foreach (i; 1..bytesize) result = (result << 6) | (s[i] & 0b00111111); return result; } -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Mar 23 2014
On 3/23/14, 6:53 PM, Michel Fortin wrote:On 2014-03-23 21:22:58 +0000, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Nice, thanks! I'd hope for a short path for the ASCII subset, could you achieve that? AndreiHere's a baseline: http://goo.gl/91vIGc. Destroy!Optimizing for smallest assembly size: dchar front(char[] s) { size_t bytesize; dchar result; switch (s[0]) { case 0b00000000: .. case 0b01111111: return s[0]; case 0b11000000: .. case 0b11011111: return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00011111); case 0b11100000: .. case 0b11101111: result = s[0] & 0b00001111; bytesize = 3; break; case 0b11110000: .. case 0b11110111: result = s[0] & 0b00000111; bytesize = 4; default: return dchar.init; } foreach (i; 1..bytesize) result = (result << 6) | (s[i] & 0b00111111); return result; }
Mar 23 2014
On 2014-03-24 02:25:17 +0000, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:On 3/23/14, 6:53 PM, Michel Fortin wrote:Unfortunately, there's a bug in the above. A missing "break" results in a fallthrough to default case. That's why the optimizer is so good, it just omits the four-byte branch entirely. I noticed something was wrong by looking at the assembly. I really wish D had no implicit fallthrough. But try this instead, the result is even shorter: dchar front(char[] s) { if (s[0] < 0b1000000) return s[0]; // ASCII // pattern indicator tailLength // 0b100xxxxx 0b00 (0) 1 // 0b101xxxxx 0b01 (1) 1 == indicator // 0b110xxxxx 0b10 (2) 2 == indicator // 0b111xxxxx 0b11 (3) 3 == indicator // note: undefined result for illegal 0b1111xxxx case auto indicator = (s[0] >> 5) & 0b11; auto tailLength = indicator ? indicator : 1; dchar result = s[0] & (0b00111111 >> tailLength); foreach (i; 0..tailLength) result = (result << 6) | (s[1+i] & 0b00111111); return result; } (Disclaimer: not tested, but I did check that all the expected code paths are present in the assembly this time.) -- Michel Fortin michel.fortin michelf.ca http://michelf.caOn 2014-03-23 21:22:58 +0000, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Nice, thanks! I'd hope for a short path for the ASCII subset, could you achieve that?Here's a baseline: http://goo.gl/91vIGc. Destroy!Optimizing for smallest assembly size: dchar front(char[] s) { size_t bytesize; dchar result; switch (s[0]) { case 0b00000000: .. case 0b01111111: return s[0]; case 0b11000000: .. case 0b11011111: return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00011111); case 0b11100000: .. case 0b11101111: result = s[0] & 0b00001111; bytesize = 3; break; case 0b11110000: .. case 0b11110111: result = s[0] & 0b00000111; bytesize = 4; default: return dchar.init; } foreach (i; 1..bytesize) result = (result << 6) | (s[i] & 0b00111111); return result; }
Mar 23 2014
Michel Fortin:I really wish D had no implicit fallthrough.Isn't your wish already granted? Bye, bearophile
Mar 23 2014
On 2014-03-24 04:39:08 +0000, "bearophile" <bearophileHUGS lycos.com> said:Michel Fortin:Maybe. I don't have a D compiler installed at the moment to check. I'm just playing with d.godbolt.org and it accepts implicit fallthrough. -- Michel Fortin michel.fortin michelf.ca http://michelf.caI really wish D had no implicit fallthrough.Isn't your wish already granted?
Mar 23 2014
On 03/23/2014 09:44 PM, Michel Fortin wrote:On 2014-03-24 04:39:08 +0000, "bearophile" <bearophileHUGS lycos.com> said:That compiler is very old. :) import std.compiler; pragma(msg, version_major); pragma(msg, version_minor); The output: 2u 55u AliMichel Fortin:Maybe. I don't have a D compiler installed at the moment to check. I'm just playing with d.godbolt.org and it accepts implicit fallthrough.I really wish D had no implicit fallthrough.Isn't your wish already granted?
Mar 23 2014
On 3/23/14, 9:37 PM, Michel Fortin wrote:On 2014-03-24 02:25:17 +0000, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:-w Probably time to move that from a warning to an error. In the short time we've used D at Facebook (forgot to add "-w" to the implicit flags) we've already have not one, but two bugs related to switch's implicit fallthrough.On 3/23/14, 6:53 PM, Michel Fortin wrote:Unfortunately, there's a bug in the above. A missing "break" results in a fallthrough to default case. That's why the optimizer is so good, it just omits the four-byte branch entirely. I noticed something was wrong by looking at the assembly. I really wish D had no implicit fallthrough.On 2014-03-23 21:22:58 +0000, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Nice, thanks! I'd hope for a short path for the ASCII subset, could you achieve that?Here's a baseline: http://goo.gl/91vIGc. Destroy!Optimizing for smallest assembly size: dchar front(char[] s) { size_t bytesize; dchar result; switch (s[0]) { case 0b00000000: .. case 0b01111111: return s[0]; case 0b11000000: .. case 0b11011111: return ((s[0] & 0b00011111) << 6) | (s[1] & 0b00011111); case 0b11100000: .. case 0b11101111: result = s[0] & 0b00001111; bytesize = 3; break; case 0b11110000: .. case 0b11110111: result = s[0] & 0b00000111; bytesize = 4; default: return dchar.init; } foreach (i; 1..bytesize) result = (result << 6) | (s[i] & 0b00111111); return result; }But try this instead, the result is even shorter: dchar front(char[] s) { if (s[0] < 0b1000000) return s[0]; // ASCII // pattern indicator tailLength // 0b100xxxxx 0b00 (0) 1 // 0b101xxxxx 0b01 (1) 1 == indicator // 0b110xxxxx 0b10 (2) 2 == indicator // 0b111xxxxx 0b11 (3) 3 == indicator // note: undefined result for illegal 0b1111xxxx case auto indicator = (s[0] >> 5) & 0b11; auto tailLength = indicator ? indicator : 1; dchar result = s[0] & (0b00111111 >> tailLength); foreach (i; 0..tailLength) result = (result << 6) | (s[1+i] & 0b00111111); return result; } (Disclaimer: not tested, but I did check that all the expected code paths are present in the assembly this time.)Nicely done. I collected what we have at http://goo.gl/W9V6E7. Andrei
Mar 23 2014
Andrei Alexandrescu:-w Probably time to move that from a warning to an error. In the short time we've used D at Facebook (forgot to add "-w" to the implicit flags) we've already have not one, but two bugs related to switch's implicit fallthrough.Lot of people in D.learn don't use warnings. What I'd like is to have (informational) warning active on default, and to disable them on request with a compiler switch. Bye, bearophile
Mar 23 2014
On 2014-03-24 04:37:22 +0000, Michel Fortin <michel.fortin michelf.ca> said:But try this instead, the result is even shorter:Oops, messed up my patterns. Here's a hopefully fixed front(): dchar front(char[] s) { if (s[0] < 0b1000000) return s[0]; // ASCII // pattern indicator tailLength // 0b1100xxxx 0b00 (0) 1 // 0b1101xxxx 0b01 (1) 1 == indicator // 0b1110xxxx 0b10 (2) 2 == indicator // 0b1111xxxx 0b11 (3) 3 == indicator // note: undefined result for illegal 0b11111xxx case auto indicator = (s[0] >> 4) & 0b11; auto tailLength = indicator ? indicator : 1; dchar result = s[0] & (0b00111111 >> tailLength); foreach (i; 0..tailLength) result = (result << 6) | (s[1+i] & 0b00111111); return result; } -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Mar 23 2014
On 3/23/14, 9:49 PM, Michel Fortin wrote:dchar front(char[] s) { if (s[0] < 0b1000000) return s[0]; // ASCII // pattern indicator tailLength // 0b1100xxxx 0b00 (0) 1 // 0b1101xxxx 0b01 (1) 1 == indicator // 0b1110xxxx 0b10 (2) 2 == indicator // 0b1111xxxx 0b11 (3) 3 == indicator // note: undefined result for illegal 0b11111xxx case auto indicator = (s[0] >> 4) & 0b11; auto tailLength = indicator ? indicator : 1; dchar result = s[0] & (0b00111111 >> tailLength); foreach (i; 0..tailLength) result = (result << 6) | (s[1+i] & 0b00111111); return result; }http://goo.gl/y9EVFr Andrei
Mar 23 2014
On 2014-03-24 05:09:15 +0000, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:On 3/23/14, 9:49 PM, Michel Fortin wrote:As I said earlier, that front2 version is broken. It's missing a break. Adding the break makes it non-interesting from an instruction count point of view. Also, there's still an issue in my code above (found by safetyOff): the first "if" for ASCII should use 0b1000_0000, not 0b1000000 (missing one zero). Here's the corrected (again) version of front1: dchar front1(char[] s) { if (s[0] < 0b1000_0000) return s[0]; // ASCII // pattern indicator tailLength // 0b1100xxxx 0b00 (0) 1 // 0b1101xxxx 0b01 (1) 1 == indicator // 0b1110xxxx 0b10 (2) 2 == indicator // 0b1111xxxx 0b11 (3) 3 == indicator // note: undefined result for illegal 0b11111xxx case auto indicator = (s[0] >> 4) & 0b11; auto tailLength = indicator ? indicator : 1; dchar result = s[0] & (0b0011_1111 >> tailLength); foreach (i; 0..tailLength) result = (result << 6) | (s[1+i] & 0b0011_1111); return result; } And I'm going to suggest a variant of the above with one less branch (but three more instructions): look at how tailLenght is computed by or'ing with the negative of bit 2 of indicator. I suspect it'll be faster with non-ASCII input, unless it gets inlined less. dchar front2(char[] s) { if (s[0] < 0b1000_0000) return s[0]; // ASCII // pattern indicator tailLength // 0b1100xxxx 0b00 (0) 1 // 0b1101xxxx 0b01 (1) 1 // 0b1110xxxx 0b10 (2) 2 // 0b1111xxxx 0b11 (3) 3 // note: undefined result for illegal 0b11111xxx case auto indicator = ((s[0] >> 4) & 0b11); auto tailLength = indicator | ((~indicator >> 1) & 0b1); dchar result = s[0] & (0b0011_1111 >> tailLength); foreach (i; 0..tailLength) result = (result << 6) | (s[1+i] & 0b0011_1111); return result; } -- Michel Fortin michel.fortin michelf.ca http://michelf.cahttp://goo.gl/y9EVFr Andrei
Mar 23 2014
On Monday, 24 March 2014 at 05:27:43 UTC, Michel Fortin wrote:Here's the corrected (again) version of front1: [Snip] And I'm going to suggest a variant of the above with one less branch (but three more instructions): [Snip]Everything seems to work in your corrected versions: http://dpaste.dzfl.pl/3b32535559ba Andrei's version doesn't compile on recent compiler versions due to goto skipping initialization.
Mar 23 2014
On 2014-03-24 06:08:30 +0000, "safety0ff" <safety0ff.dev gmail.com> said:Everything seems to work in your corrected versions: http://dpaste.dzfl.pl/3b32535559ba Andrei's version doesn't compile on recent compiler versions due to goto skipping initialization.Ok, so let's check what happens in actual usage. In other words, what happens when front is inlined and used in a loop. I'm counting the number of assembly lines of this main function for looping through each possible input using the various implementations: http://goo.gl/QjnA2k implementation line count of main in assembly andralex 285 - 9 labels = 276 instructions MFortin1 135 - 13 labels = 122 instructions MFortin2 150 - 14 labels = 136 instructions safety0ff -- not inlined (too big?) -- dnspies 161 - 15 labels = 146 instructions CWilliams 233 - 25 labels = 208 instructions For compactness, my first implementation seems to be the winner, both with and without inlining. That said, I think front should only assume a non-empty char[] and thus should check the length before accessing anything after the first byte to protect against incomplete multi-byte sequences such as [0x1000_0000]. So I added this line to my two implementations: if (1+tailLength >= s.length) return dchar.init; Now lets see what happens with those changes: http://goo.gl/XPCGYE implementation line count of main in assembly MFortin1Check 103 - 11 labels = 92 instructions MFortin2Check 135 - 13 labels = 122 instructions Now I'm baffled. Adding a test makes the code shorter? It actually make the standalone functions longer by 8 instructions (as expected), but the inlined version is shorter. My guess is that the optimizer creates something entirely different and it turns out that this different version optimises better after inlining. That said, my test "main" function isn't very representative of the general case because the length is statically known by the optimizer. Let's see what it does when the length is not statically known: http://goo.gl/E2Q0Yu implementation line count of main in assembly andralex 384 - 9 labels = 375 instructions MFortin1 173 - 19 labels = 154 instructions MFortin2 182 - 18 labels = 164 instructions safety0ff -- not inlined -- dnspies -- not inlined -- CWilliams 229 - 23 labels = 206 instructions MFortin1Check 211 - 24 labels = 187 instructions MFortin2Check 218 - 21 labels = 197 instructions So again MFortin1 is the winner for compactness. Still, I maintain that we ought to at least check the length of the array for multi-byte sequences to protect against incomplete sequences that could lay at the end of the string, so I'd favor MFortin1Check. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Mar 24 2014
On 2014-03-24 13:39:45 +0000, Michel Fortin <michel.fortin michelf.ca> said:That said, my test "main" function isn't very representative of the general case because the length is statically known by the optimizer. Let's see what it does when the length is not statically known:Oops, wrong link, was from an intermediary test. This one will give you the results in the table: http://goo.gl/lwU4hvimplementation line count of main in assembly andralex 384 - 9 labels = 375 instructions MFortin1 173 - 19 labels = 154 instructions MFortin2 182 - 18 labels = 164 instructions safety0ff -- not inlined -- dnspies -- not inlined -- CWilliams 229 - 23 labels = 206 instructions MFortin1Check 211 - 24 labels = 187 instructions MFortin2Check 218 - 21 labels = 197 instructions-- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Mar 24 2014
It seems like mine wasn't being inlined because I had carelessly replaced char[] with const(char)[] in the function signature (I don't know why that should make a difference, but it does) Going back to my original version (with Andre's trick to avoid letting the loop unroll), I get dnspies_orig 159 - 16 labels = 143 instructions However, MFortin1 can be made to do better by taking successive slices instead of indexing (and commenting out the check, since it no longer helps optimize). MFortin1_slice 137 - 13 labels = 124 instructions http://goo.gl/JAgVK1 On Monday, 24 March 2014 at 13:39:45 UTC, Michel Fortin wrote:On 2014-03-24 06:08:30 +0000, "safety0ff" <safety0ff.dev gmail.com> said:Everything seems to work in your corrected versions: http://dpaste.dzfl.pl/3b32535559ba Andrei's version doesn't compile on recent compiler versions due to goto skipping initialization.Ok, so let's check what happens in actual usage. In other words, what happens when front is inlined and used in a loop. I'm counting the number of assembly lines of this main function for looping through each possible input using the various implementations: http://goo.gl/QjnA2k implementation line count of main in assembly andralex 285 - 9 labels = 276 instructions MFortin1 135 - 13 labels = 122 instructions MFortin2 150 - 14 labels = 136 instructions safety0ff -- not inlined (too big?) -- dnspies 161 - 15 labels = 146 instructions CWilliams 233 - 25 labels = 208 instructions For compactness, my first implementation seems to be the winner, both with and without inlining. That said, I think front should only assume a non-empty char[] and thus should check the length before accessing anything after the first byte to protect against incomplete multi-byte sequences such as [0x1000_0000]. So I added this line to my two implementations: if (1+tailLength >= s.length) return dchar.init; Now lets see what happens with those changes: http://goo.gl/XPCGYE implementation line count of main in assembly MFortin1Check 103 - 11 labels = 92 instructions MFortin2Check 135 - 13 labels = 122 instructions Now I'm baffled. Adding a test makes the code shorter? It actually make the standalone functions longer by 8 instructions (as expected), but the inlined version is shorter. My guess is that the optimizer creates something entirely different and it turns out that this different version optimises better after inlining. That said, my test "main" function isn't very representative of the general case because the length is statically known by the optimizer. Let's see what it does when the length is not statically known: http://goo.gl/E2Q0Yu implementation line count of main in assembly andralex 384 - 9 labels = 375 instructions MFortin1 173 - 19 labels = 154 instructions MFortin2 182 - 18 labels = 164 instructions safety0ff -- not inlined -- dnspies -- not inlined -- CWilliams 229 - 23 labels = 206 instructions MFortin1Check 211 - 24 labels = 187 instructions MFortin2Check 218 - 21 labels = 197 instructions So again MFortin1 is the winner for compactness. Still, I maintain that we ought to at least check the length of the array for multi-byte sequences to protect against incomplete sequences that could lay at the end of the string, so I'd favor MFortin1Check.
Mar 24 2014
Whoops, wrong "original" version. That was the one before I understood the game. Here mine is fixed (but up to 180 lines - 16 labels = 164 instructions): http://goo.gl/wjIAVm On Monday, 24 March 2014 at 17:14:11 UTC, dnspies wrote:It seems like mine wasn't being inlined because I had carelessly replaced char[] with const(char)[] in the function signature (I don't know why that should make a difference, but it does) Going back to my original version (with Andre's trick to avoid letting the loop unroll), I get dnspies_orig 159 - 16 labels = 143 instructions However, MFortin1 can be made to do better by taking successive slices instead of indexing (and commenting out the check, since it no longer helps optimize). MFortin1_slice 137 - 13 labels = 124 instructions http://goo.gl/JAgVK1 On Monday, 24 March 2014 at 13:39:45 UTC, Michel Fortin wrote:On 2014-03-24 06:08:30 +0000, "safety0ff" <safety0ff.dev gmail.com> said:Everything seems to work in your corrected versions: http://dpaste.dzfl.pl/3b32535559ba Andrei's version doesn't compile on recent compiler versions due to goto skipping initialization.Ok, so let's check what happens in actual usage. In other words, what happens when front is inlined and used in a loop. I'm counting the number of assembly lines of this main function for looping through each possible input using the various implementations: http://goo.gl/QjnA2k implementation line count of main in assembly andralex 285 - 9 labels = 276 instructions MFortin1 135 - 13 labels = 122 instructions MFortin2 150 - 14 labels = 136 instructions safety0ff -- not inlined (too big?) -- dnspies 161 - 15 labels = 146 instructions CWilliams 233 - 25 labels = 208 instructions For compactness, my first implementation seems to be the winner, both with and without inlining. That said, I think front should only assume a non-empty char[] and thus should check the length before accessing anything after the first byte to protect against incomplete multi-byte sequences such as [0x1000_0000]. So I added this line to my two implementations: if (1+tailLength >= s.length) return dchar.init; Now lets see what happens with those changes: http://goo.gl/XPCGYE implementation line count of main in assembly MFortin1Check 103 - 11 labels = 92 instructions MFortin2Check 135 - 13 labels = 122 instructions Now I'm baffled. Adding a test makes the code shorter? It actually make the standalone functions longer by 8 instructions (as expected), but the inlined version is shorter. My guess is that the optimizer creates something entirely different and it turns out that this different version optimises better after inlining. That said, my test "main" function isn't very representative of the general case because the length is statically known by the optimizer. Let's see what it does when the length is not statically known: http://goo.gl/E2Q0Yu implementation line count of main in assembly andralex 384 - 9 labels = 375 instructions MFortin1 173 - 19 labels = 154 instructions MFortin2 182 - 18 labels = 164 instructions safety0ff -- not inlined -- dnspies -- not inlined -- CWilliams 229 - 23 labels = 206 instructions MFortin1Check 211 - 24 labels = 187 instructions MFortin2Check 218 - 21 labels = 197 instructions So again MFortin1 is the winner for compactness. Still, I maintain that we ought to at least check the length of the array for multi-byte sequences to protect against incomplete sequences that could lay at the end of the string, so I'd favor MFortin1Check.
Mar 24 2014
On Monday, 24 March 2014 at 04:37:23 UTC, Michel Fortin wrote:dchar front(char[] s) { if (s[0] < 0b1000000) return s[0]; // ASCII auto indicator = (s[0] >> 5) & 0b11; auto tailLength = indicator ? indicator : 1; dchar result = s[0] & (0b00111111 >> tailLength); foreach (i; 0..tailLength) result = (result << 6) | (s[1+i] & 0b00111111); return result; } (Disclaimer: not tested, but I did check that all the expected code paths are present in the assembly this time.)0b1000000 is missing a zero: 0b1000_0000 Fixing that, I still get a range violation from "s[1+i]". ----- Test program ----- void main() { foreach (ubyte b0; 0..0x80) { char[] s = [b0]; assert(front(s)==front2(s)); } writeln("Single byte done"); foreach (ubyte b0; 0..0x40) foreach (ubyte b1; 0..0x20) { char[] s = [0xC0|b1, 0x80|b0]; assert(front(s)==front2(s)); } writeln("Double byte done"); foreach (ubyte b0; 0..0x40) foreach (ubyte b1; 0..0x40) foreach (ubyte b2; 0..0x10) { char[] s = [0xE0|b2, 0x80|b1, 0x80|b0]; assert(front(s)==front2(s)); } writeln("Triple byte done"); foreach (ubyte b0; 0..0x40) foreach (ubyte b1; 0..0x40) foreach (ubyte b2; 0..0x40) foreach (ubyte b3; 0..0x08) { char[] s = [0xF0|b3, 0x80|b2, 0x80|b1, 0x80|b0]; assert(front(s)==front2(s)); } }
Mar 23 2014
On 2014-03-24 04:58:08 +0000, "safety0ff" <safety0ff.dev gmail.com> said:0b1000000 is missing a zero: 0b1000_0000Indeed, thanks.Fixing that, I still get a range violation from "s[1+i]".That "auto indicator = (s[0] >> 5) & 0b11;" line is wrong too. s[0] needs a shift by 4, not by 5. No doubt it crashes your test program. -- Michel Fortin michel.fortin michelf.ca http://michelf.ca
Mar 23 2014
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiHere's my attempt: http://goo.gl/alJ9JS I interpreted the challenge slightly differently, I went for fewer lines of code and fewer branches. I verified the output for all valid inputs.
Mar 23 2014
On 3/23/14, 8:36 PM, safety0ff wrote:On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Very interesting! -- AndreiHere's a baseline: http://goo.gl/91vIGc. Destroy! AndreiHere's my attempt: http://goo.gl/alJ9JS I interpreted the challenge slightly differently, I went for fewer lines of code and fewer branches. I verified the output for all valid inputs.
Mar 23 2014
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Here's a baseline: http://goo.gl/91vIGc. Destroy! Andreihttp://goo.gl/TaZTNB
Mar 24 2014
On Monday, 24 March 2014 at 07:53:11 UTC, Chris Williams wrote:On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Here's a baseline: http://goo.gl/91vIGc. Destroy! Andreihttp://goo.gl/TaZTNBfor (int i = 1; i < len; i++) {size_t, ++i! :> (optimized away but still, nitnit pickpick)
Mar 24 2014
On 3/24/14, 12:53 AM, Chris Williams wrote:On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Nice! Why assert(ret != 0)? AndreiHere's a baseline: http://goo.gl/91vIGc. Destroy! Andreihttp://goo.gl/TaZTNB
Mar 24 2014
On Monday, 24 March 2014 at 16:17:52 UTC, Andrei Alexandrescu wrote:On 3/24/14, 12:53 AM, Chris Williams wrote:It should be -1. I wanted to make sure that bsr() wasn't called on zero, but of course that's not the correct way to test it when I'm about to ~ it.On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Nice! Why assert(ret != 0)? AndreiHere's a baseline: http://goo.gl/91vIGc. Destroy! Andreihttp://goo.gl/TaZTNB
Mar 24 2014
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiHere's mine (the loop gets unrolled): dchar front(const char[] s) { assert(s.length > 0); byte c = s[0]; dchar res = cast(ubyte)c; if(c >= 0) { return res; } c <<= 1; assert(c < 0); for(int i = 1; i < 4; i++) { assert(i < s.length); ubyte d = s[i]; assert((d >> 6) == 0b10); res = (res << 8) | d; c <<= 1; if(c >= 0) return res; } assert(false); }
Mar 24 2014
On Monday, 24 March 2014 at 08:06:53 UTC, dnspies wrote:On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Sorry, I misunderstood. We only want the x's in the output. Here's my revised solution http://goo.gl/PL729J dchar front(const char[] s) { assert(s.length > 0); byte c = s[0]; dchar res = cast(ubyte)c; if(c >= 0) return res; dchar cover = 0b0100_0000; c <<= 1; assert(c < 0); for(int i = 1; i < 4; i++) { assert(i < s.length); ubyte d = s[i]; assert((d >> 6) == 0b10); cover <<= 5; res = ((res << 6) & (cover - 1)) | (d & 0b0011_1111); c <<= 1; if(c >= 0) return res; } assert(false); }Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiHere's mine (the loop gets unrolled): dchar front(const char[] s) { assert(s.length > 0); byte c = s[0]; dchar res = cast(ubyte)c; if(c >= 0) { return res; } c <<= 1; assert(c < 0); for(int i = 1; i < 4; i++) { assert(i < s.length); ubyte d = s[i]; assert((d >> 6) == 0b10); res = (res << 8) | d; c <<= 1; if(c >= 0) return res; } assert(false); }
Mar 24 2014
On 3/24/14, 1:56 AM, dnspies wrote:On Monday, 24 March 2014 at 08:06:53 UTC, dnspies wrote:That turned out quite large. -- AndreiOn Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Sorry, I misunderstood. We only want the x's in the output. Here's my revised solution http://goo.gl/PL729JHere's a baseline: http://goo.gl/91vIGc. Destroy! AndreiHere's mine (the loop gets unrolled): dchar front(const char[] s) { assert(s.length > 0); byte c = s[0]; dchar res = cast(ubyte)c; if(c >= 0) { return res; } c <<= 1; assert(c < 0); for(int i = 1; i < 4; i++) { assert(i < s.length); ubyte d = s[i]; assert((d >> 6) == 0b10); res = (res << 8) | d; c <<= 1; if(c >= 0) return res; } assert(false); }
Mar 24 2014
On 3/24/14, 1:06 AM, dnspies wrote:On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:http://goo.gl/HBSv5T - looks nice! For minimum size replace the assert(false) with one inside the loop: http://goo.gl/eWs0ft AndreiHere's a baseline: http://goo.gl/91vIGc. Destroy! AndreiHere's mine (the loop gets unrolled): dchar front(const char[] s) { assert(s.length > 0); byte c = s[0]; dchar res = cast(ubyte)c; if(c >= 0) { return res; } c <<= 1; assert(c < 0); for(int i = 1; i < 4; i++) { assert(i < s.length); ubyte d = s[i]; assert((d >> 6) == 0b10); res = (res << 8) | d; c <<= 1; if(c >= 0) return res; } assert(false); }
Mar 24 2014
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiBefore we roll this out, could we discuss a strategy/guideline in regards to detecting and handling invalid UTF sequences? Having a fast "front" is fine and all, but if it means your program asserting in release (or worst, silently corrupting memory) just because the client was trying to read a bad text file, I'm unsure this is acceptable.
Mar 24 2014
Am 24.03.2014 10:02, schrieb monarch_dodra:Having a fast "front" is fine and all, but if it means your program asserting in release (or worst, silently corrupting memory) just because the client was trying to read a bad text file, I'm unsure this is acceptable.it would be great to habe a "basic" form of this range that error behavior could be extended policy based / templates - the phobos version could extend the basic range with "prefered" error behavior - but the basic range is still able to read without checking - for example if i know that my input is 100% valid and i need the speed etc.
Mar 24 2014
On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:As you'll note from the assembler, asserts are compiled out of release builds. Though if the goal of this little project is to create useful code for inclusion in a library, you would want to switch many of the asserts in the examples into "if (x) return 0xfffd;". (Mine needs an assert/if in the loop to check for the proper bit markers).Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiBefore we roll this out, could we discuss a strategy/guideline in regards to detecting and handling invalid UTF sequences? Having a fast "front" is fine and all, but if it means your program asserting in release (or worst, silently corrupting memory) just because the client was trying to read a bad text file, I'm unsure this is acceptable.
Mar 24 2014
On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:I would strongly advise to at least offer an option, possibly via a template parameter, for turning error handling on or off, similar to how Python handles decoding. Examples below in Python 3. used sequence removed. All three strategies are useful from time to time. I mainly reach for option three when I'm trying to get some text data out of some old broken databases or similar. We may consider leaving the error checking on in -release for the 'strict' decoding, but throwing an Error instead of an exception so the function can be nothrow. This would prevent memory corruption in release code. assert vs throw Error is up for debate.Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiBefore we roll this out, could we discuss a strategy/guideline in regards to detecting and handling invalid UTF sequences? Having a fast "front" is fine and all, but if it means your program asserting in release (or worst, silently corrupting memory) just because the client was trying to read a bad text file, I'm unsure this is acceptable.
Mar 24 2014
Am 24.03.2014 13:51, schrieb w0rp:On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:+1On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:I would strongly advise to at least offer an option, possibly via a template parameter, for turning error handling on or off, similar to how Python handles decoding. Examples below in Python 3. used sequence removed. All three strategies are useful from time to time. I mainly reach for option three when I'm trying to get some text data out of some old broken databases or similar. We may consider leaving the error checking on in -release for the 'strict' decoding, but throwing an Error instead of an exception so the function can be nothrow. This would prevent memory corruption in release code. assert vs throw Error is up for debate.Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiBefore we roll this out, could we discuss a strategy/guideline in regards to detecting and handling invalid UTF sequences? Having a fast "front" is fine and all, but if it means your program asserting in release (or worst, silently corrupting memory) just because the client was trying to read a bad text file, I'm unsure this is acceptable.
Mar 24 2014
On 3/24/14, 5:51 AM, w0rp wrote:On Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:Options are fine for functions etc. But front would need to find an all-around good compromise between speed and correctness. AndreiOn Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:I would strongly advise to at least offer an optionHere's a baseline: http://goo.gl/91vIGc. Destroy! AndreiBefore we roll this out, could we discuss a strategy/guideline in regards to detecting and handling invalid UTF sequences? Having a fast "front" is fine and all, but if it means your program asserting in release (or worst, silently corrupting memory) just because the client was trying to read a bad text file, I'm unsure this is acceptable.
Mar 24 2014
Am 24.03.2014 17:44, schrieb Andrei Alexandrescu:On 3/24/14, 5:51 AM, w0rp wrote:sequence removed. i think there should be a base range for UTF8 iteration - with policy based error extension (like in python) and some variants that defer this base UTF8 range with different error behavior - and one of these become the phobos standard = default parameter so its still switchableOn Monday, 24 March 2014 at 09:02:19 UTC, monarch_dodra wrote:Options are fine for functions etc. But front would need to find an all-around good compromise between speed and correctness. AndreiOn Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:I would strongly advise to at least offer an optionHere's a baseline: http://goo.gl/91vIGc. Destroy! AndreiBefore we roll this out, could we discuss a strategy/guideline in regards to detecting and handling invalid UTF sequences? Having a fast "front" is fine and all, but if it means your program asserting in release (or worst, silently corrupting memory) just because the client was trying to read a bad text file, I'm unsure this is acceptable.
Mar 24 2014
On 3/24/14, 2:02 AM, monarch_dodra wrote:On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:I think std.array.front should return the invalid dchar on error, and popFront should attempt to resync on error. AndreiHere's a baseline: http://goo.gl/91vIGc. Destroy! AndreiBefore we roll this out, could we discuss a strategy/guideline in regards to detecting and handling invalid UTF sequences?
Mar 24 2014
On Monday, 24 March 2014 at 16:31:42 UTC, Andrei Alexandrescu wrote:On 3/24/14, 2:02 AM, monarch_dodra wrote:Ignoring UTF errors is a lossy operation and has the potential problem of irreversible data loss. For example, consider a program which needs to process Windows-1251 files: it would need to read the file from disk, convert to UTF-8, process it, convert back to Windows-1251, and save it back to disk. If a bug causes the UTF-8 conversion step to be accidentally skipped, then all Unicode data in that file is lost. I think UTF-8 decoding operations should, by default, throw on UTF-8 errors. Ignoring UTF-8 errors should only be done explicitly, with the programmer's consent.On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:I think std.array.front should return the invalid dchar on error, and popFront should attempt to resync on error.Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiBefore we roll this out, could we discuss a strategy/guideline in regards to detecting and handling invalid UTF sequences?
Mar 24 2014
On Monday, 24 March 2014 at 16:31:42 UTC, Andrei Alexandrescu wrote:On 3/24/14, 2:02 AM, monarch_dodra wrote:That sounds good to me (I think). Better than a straight up assert anyways... ...which is what the code you submitted does. Also, I think relying on bounds checking is wrong: Disabling bounds checking mean you are OK for memory corruption for *wrong code*. I don't think invalid string is wrong code. 'front' should do an actual "if", and "assert(0)" when needed. This should have no impact on performance BTW, if done with pointer extraction ("slice.ptr[1]"), and in a trusted context. Of course, this is all assuming we are asserting at all.On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:I think std.array.front should return the invalid dchar on error, and popFront should attempt to resync on error. AndreiHere's a baseline: http://goo.gl/91vIGc. Destroy! AndreiBefore we roll this out, could we discuss a strategy/guideline in regards to detecting and handling invalid UTF sequences?
Mar 24 2014
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiOk, I managed to make it smaller. I think this is the smallest so far with only 23 instructions (no loop unrolling in this one): http://goo.gl/RKF5Vm
Mar 24 2014
On 3/24/14, 3:30 AM, dnspies wrote:On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:This is nice because even the checks are relatively cheap. It's quite difficult to understand tho :o). -- AndreiHere's a baseline: http://goo.gl/91vIGc. Destroy! AndreiOk, I managed to make it smaller. I think this is the smallest so far with only 23 instructions (no loop unrolling in this one): http://goo.gl/RKF5Vm
Mar 24 2014
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiI've probably missed something, but here's 15 instructions: http://goo.gl/d3Mgj0 Also, I contacted the owner of d.godbolt.org and hopefully we'll have a more up-to-date gdc on there soon. Perhaps ldc as well :)
Mar 24 2014
On Monday, 24 March 2014 at 15:32:09 UTC, John Colvin wrote:On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:woops yeah, that's completely broken.Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiI've probably missed something, but here's 15 instructions: http://goo.gl/d3Mgj0
Mar 24 2014
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Here's a baseline: http://goo.gl/91vIGc. Destroy! Andreihttp://forum.dlang.org/post/fsgdviklnbugdzjsgfsk forum.dlang.org
Mar 24 2014
On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiOn a bigendian machine with loose alignment requirements (1 byte), you can do this, which is down to 13 instructions on x86 (which is of course meaningless, what with it being the wrong endianess): uint front(char[] s) { if(!(s[0] & 0b1000_0000)) return s[0]; //handle ASCII assert(s[0] & 0b0100_0000); if(s[0] & 0b0010_0000) { if(s[0] & 0b0001_0000) { assert(s.length >=4 && !(s[0] & 0b1000) && s[1] <= 0b1011_1111 && s[2] <= 0b1011_1111 && s[3] <= 0b1011_1111); return *(cast(dchar*)(s.ptr)); } assert(s.length >= 3 && s[1] <= 0b1011_1111 && s[2] <= 0b1011_1111); return *(cast(dchar*)(s.ptr)) >> 8; } assert(s.length >= 2 && s[1] <= 0b1011_1111); return *(cast(wchar*)(s.ptr)); } http://goo.gl/Kf6RZJ There may be architectures that can benefit from this.
Mar 24 2014
On Monday, 24 March 2014 at 16:41:02 UTC, John Colvin wrote:On Sunday, 23 March 2014 at 21:23:18 UTC, Andrei Alexandrescu wrote:Same again, but for little-endian, 18 instructions: http://goo.gl/jlrweQ uint front(char[] s) { if(!(s[0] & 0b1000_0000)) return s[0]; //handle ASCII assert(s[0] & 0b0100_0000); if(s[0] & 0b0010_0000) { if(s[0] & 0b0001_0000) { assert(s.length >=4 && !(s[0] & 0b1000) && s[1] <= 0b1011_1111 && s[2] <= 0b1011_1111 && s[3] <= 0b1011_1111); return swapEndian(*(cast(dchar*)(s.ptr))); } assert(s.length >= 3 && s[1] <= 0b1011_1111 && s[2] <= 0b1011_1111); return swapEndian(*(cast(dchar*)(s.ptr))) >> 8; } assert(s.length >= 2 && s[1] <= 0b1011_1111); return swapEndian(*(cast(wchar*)(s.ptr))); }Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiOn a bigendian machine with loose alignment requirements (1 byte), you can do this
Mar 24 2014
24-Mar-2014 01:22, Andrei Alexandrescu пишет:Here's a baseline: http://goo.gl/91vIGc. Destroy! AndreiI had to join the party at some point. This seems like 25 instructions: http://goo.gl/N7sHtK -- Dmitry Olshansky
Mar 24 2014
24-Mar-2014 23:53, Dmitry Olshansky пишет:24-Mar-2014 01:22, Andrei Alexandrescu пишет:Interestingly gdc-4.8 produces better results. http://goo.gl/1R7GMs -- Dmitry OlshanskyHere's a baseline: http://goo.gl/91vIGc. Destroy! AndreiI had to join the party at some point. This seems like 25 instructions: http://goo.gl/N7sHtK
Mar 25 2014
dchar front(char[] s) { dchar c = s[0]; if (!(c & 0x80)) return c; byte b = (c >> 4) & 3; b += !b; c &= 63 >> b; char *p = s.ptr; do { p++; c = c << 6 | *p & 63; } while(--b); return c; }
Mar 25 2014