
digitalmars.D.learn - (SIMD) Optimized multi-byte chunk scanning

reply Nordlöw <per.nordlow gmail.com> writes:
I recall seeing some C/C++/D code that optimizes the comment- and 
whitespace-skipping parts (tokens) of lexers by operating on 2-, 4- 
or 8-byte chunks instead of single bytes. This applies when the 
token terminators are expressed as sets of alternative 
ASCII characters.

For instance, when searching for the end of a line comment, I 
would like to speed up the while-loop in

     size_t offset;
     string input = "// \n"; // a line-comment string
     import std.algorithm : among;
     // until end-of-line or file terminator
     while (!input[offset].among!('\0', '\n', '\r'))
     {
         ++offset;
     }

by taking `offset`-steps larger than one.

Note that my file-reading function, which creates the real `input`, 
appends a '\0' at the end to enable the sentinel-based search 
shown in the call to `among` above.
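[Editor's note: a hedged C sketch of the chunked technique being asked about (the thread covers C/C++/D equally). It widens the byte-at-a-time loop to 8-byte steps using the classic SWAR zero-byte test from the bithacks page; the helper names and the extra padding requirement are assumptions of this example, not something stated in the thread.]

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* SWAR test: does the 64-bit word v contain a zero byte anywhere?
   Classic trick from the Bit Twiddling Hacks page: exact, no false hits. */
static int has_zero_byte(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Does v contain the byte value b? XOR turns matching bytes into zero. */
static int has_byte(uint64_t v, unsigned char b)
{
    return has_zero_byte(v ^ (0x0101010101010101ULL * b));
}

/* Advance until '\0', '\n' or '\r', scanning 8 bytes per step.
   Assumes the '\0' sentinel from the post AND that the buffer is
   padded so loading a whole 8-byte word past the terminator is safe
   (e.g. allocate the length rounded up to a multiple of 8, zero-filled). */
static size_t skip_line_comment(const char *input, size_t offset)
{
    uint64_t chunk;
    for (;;) {
        memcpy(&chunk, input + offset, 8);  /* unaligned-safe word load */
        if (has_zero_byte(chunk) ||
            has_byte(chunk, '\n') || has_byte(chunk, '\r'))
            break;                          /* terminator in this word */
        offset += 8;
    }
    /* Pin down the exact offset within the hit word, byte by byte. */
    while (input[offset] != '\0' && input[offset] != '\n' && input[offset] != '\r')
        ++offset;
    return offset;
}
```

The sentinel makes the outer loop safe with no length check at all; only the final word needs the byte loop.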

I further recall that there are x86_64 intrinsics that can be 
used here for further speedups.

Refs, anyone?
Aug 23
parent reply Igor <stojkovic.igor gmail.com> writes:
On Wednesday, 23 August 2017 at 22:07:30 UTC, Nordlöw wrote:
 I recall seeing some C/C++/D code that optimizes the comment- 
 and whitespace-skipping parts (tokens) of lexers by operating 
 on 2-, 4- or 8-byte chunks instead of single bytes. This 
 applies when the token terminators are expressed as sets of 
 alternative ASCII characters.

 For instance, when searching for the end of a line comment, I 
 would like to speed up the while-loop in

     size_t offset;
     string input = "// \n"; // a line-comment string
     import std.algorithm : among;
     // until end-of-line or file terminator
     while (!input[offset].among!('\0', '\n', '\r'))
     {
         ++offset;
     }

 by taking `offset`-steps larger than one.

 Note that my file-reading function, which creates the real 
 `input`, appends a '\0' at the end to enable the 
 sentinel-based search shown in the call to `among` above.

 I further recall that there are x86_64 intrinsics that can be 
 used here for further speedups.

 Refs, anyone?
For line comments it doesn't sound like it will pay off, since you would have to do extra work to make sure you are working on 16-byte-aligned memory. For multi-line comments, maybe. As for a nice reference of Intel intrinsics: https://software.intel.com/sites/landingpage/IntrinsicsGuide/
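[Editor's note: a hedged SSE2 sketch in C. Unaligned 16-byte loads (`_mm_loadu_si128`) are cheap on modern x86, so the alignment fixup mentioned above is not strictly required; the function name and the 16-byte padding requirement are assumptions of this example. `__builtin_ctz` is a GCC/Clang extension.]

```c
#include <emmintrin.h>  /* SSE2 intrinsics (baseline on x86-64) */
#include <stddef.h>

/* Offset of the first '\0', '\n' or '\r' in s.
   Assumes the buffer is padded so a 16-byte load past the '\0'
   sentinel never faults (e.g. allocate length rounded up + 16). */
static size_t find_terminator(const char *s)
{
    const __m128i nl  = _mm_set1_epi8('\n');
    const __m128i cr  = _mm_set1_epi8('\r');
    const __m128i nul = _mm_setzero_si128();
    for (size_t off = 0;; off += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(s + off));
        /* Compare against each terminator; OR the per-byte match masks. */
        __m128i hit = _mm_or_si128(
            _mm_or_si128(_mm_cmpeq_epi8(chunk, nl),
                         _mm_cmpeq_epi8(chunk, cr)),
            _mm_cmpeq_epi8(chunk, nul));
        int mask = _mm_movemask_epi8(hit);  /* one bit per matching byte */
        if (mask)
            return off + (size_t)__builtin_ctz(mask);  /* lowest set bit */
    }
}
```

The '\0' sentinel guarantees termination, so the hot loop has no bounds check, just one load, three compares, and a movemask per 16 bytes.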
Aug 25
parent reply Nordlöw <per.nordlow gmail.com> writes:
On Friday, 25 August 2017 at 09:40:28 UTC, Igor wrote:
 As for a nice reference of intel intrinsics: 
 https://software.intel.com/sites/landingpage/IntrinsicsGuide/
Wow, what a fabulous UX!
Aug 25
parent Cecil Ward <d cecilward.com> writes:
On Friday, 25 August 2017 at 18:52:57 UTC, Nordlöw wrote:
 On Friday, 25 August 2017 at 09:40:28 UTC, Igor wrote:
 As for a nice reference of intel intrinsics: 
 https://software.intel.com/sites/landingpage/IntrinsicsGuide/
 Wow, what a fabulous UX!
The pcmpestri instruction is probably what you are looking for. There is a useful resource in the Intel optimisation guide, and there is also an Intel article about speeding up XML parsing with this instruction, but if memory serves it's really messy - a right palaver. A lot depends on how much control, if any, you have over the input data - typically none, I quite understand.

Based on this article, https://graphics.stanford.edu/~seander/bithacks.html , I wrote a short D routine to help me learn the language while thinking about a faster strlen using larger-sized gulps. The article has a good test for whether or not a wide word contains a particular byte value somewhere in it; I wrote a bool hasZeroByte( in uint64_t x ) function based on that method.

I'm intending to write a friendlier D convenience routine giving access to inline pcmpestri code generation in GDC when I get round to it - a single instruction, fully inlined and flexibly optimised at compile time, with no subroutine call wrapped around one instruction.

Agner Fog's libraries and articles are superb - take a look. He must have published code that handles the C standard library byte-string functions efficiently with wide aligned machine words, unless I'm getting very forgetful. A bit of googling?
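[Editor's note: for reference, pcmpestri is reachable from C as the SSE4.2 intrinsic `_mm_cmpestri`; one instruction matches a whole set of terminator bytes. A hedged sketch - the function name and padding requirement are assumptions of this example, the `target` attribute is a GCC/Clang extension (so no -msse4.2 flag is needed), and an SSE4.2-capable CPU is assumed at run time. The explicit-length form lets '\0' be a genuine set member.]

```c
#include <nmmintrin.h>  /* SSE4.2: _mm_cmpestri / pcmpestri */
#include <stddef.h>

/* Offset of the first '\0', '\n' or '\r' in s[0 .. len), or len if none.
   Assumes the buffer is padded so the 16-byte load never faults. */
static __attribute__((target("sse4.2")))
size_t find_terminator_estri(const char *s, size_t len)
{
    /* Only the first la (= 3) bytes of `set` are part of the search set. */
    const __m128i set = _mm_setr_epi8('\0', '\n', '\r',
                                      0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
    for (size_t off = 0; off < len; off += 16) {
        size_t rest = len - off;
        int lb = rest < 16 ? (int)rest : 16;  /* valid bytes in this chunk */
        __m128i chunk = _mm_loadu_si128((const __m128i *)(s + off));
        /* Index of the first chunk byte equal to ANY set byte, else 16. */
        int idx = _mm_cmpestri(set, 3, chunk, lb,
                               _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY |
                               _SIDD_LEAST_SIGNIFICANT);
        if (idx != 16)
            return off + (size_t)idx;
    }
    return len;
}
```

As the post says, whether this beats the plain SWAR word test depends heavily on the input; pcmpestri has a fairly high latency, so it tends to win only on longer runs between terminators.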
Aug 26