www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 10403] New: memchr optimization for std.algorithm.find

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10403

           Summary: memchr optimization for std.algorithm.find
           Product: D
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: tommitissari hotmail.com


--- Comment #0 from Tommi <tommitissari hotmail.com> 2013-06-18 05:52:38 PDT ---
Add an optimization for:

R find(alias pred = "a == b", R, E)(R haystack, E needle);

...to use the c function memchr when pred is the default "a == b" and...

CASE 1:
All of the following are true:

1) R is an array of elements of type char, byte, ubyte or an enum type whose
base type is one of those.
2) E is integral
3) For the element type of R:
alias Elem = ElementType!R;
...the following is true:
(Elem.min <= needle && needle <= Elem.max)
This is important because memchr only cares about bitwise equality. If the
value of needle is beyound the limits of possible values for type Elem, then we
know needle cannot be found within haystack and the function can just return
early without searching.

CASE 2:
All of the following are true:

1) R is an array of elements of type E
2) E.sizeof == 1
3) For type E equality means bitwise equality

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jun 18 2013
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10403


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |monarchdodra gmail.com
         AssignedTo|nobody puremagic.com        |monarchdodra gmail.com


--- Comment #1 from monarchdodra gmail.com 2013-06-19 12:57:30 PDT ---
(In reply to comment #0)
 Add an optimization for:
 
 R find(alias pred = "a == b", R, E)(R haystack, E needle);
 
 ...to use the c function memchr when pred is the default "a == b" and...
 
 CASE 1:
 All of the following are true:
 
 1) R is an array of elements of type char, byte, ubyte or an enum type whose
 base type is one of those.
 2) E is integral
 3) For the element type of R:
 alias Elem = ElementType!R;
 ...the following is true:
 (Elem.min <= needle && needle <= Elem.max)
 This is important because memchr only cares about bitwise equality. If the
 value of needle is beyound the limits of possible values for type Elem, then we
 know needle cannot be found within haystack and the function can just return
 early without searching.
 
 CASE 2:
 All of the following are true:
 
 1) R is an array of elements of type E
 2) E.sizeof == 1
 3) For type E equality means bitwise equality
I'm just about finished implementing the above, along with quite a few other tricks, and I'm getting some 'great' performance improvements. I implemented "CASE 2". However, I don't think it is worth implementing "CASE 1": I think the case of searching for an element that is simply outside of the possible values of the range's element type should be a very rare case. I mean, who would write: find(myRangeOfBytes, 5000); ??? I come to the conclusion that the cost of checking the condition all the time just to reduce the time of a special case is not worth it. It's kind of like of the "opAssign check for self assignment" issue. If you can write the code in such a case that the common case goes faster, but self is more costly, then that is better. So yeah, I implemented "CASE 2". -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 19 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10403



--- Comment #2 from Tommi <tommitissari hotmail.com> 2013-06-19 14:48:56 PDT ---
(In reply to comment #1)
 (In reply to comment #0)
 Add an optimization for:
 
 R find(alias pred = "a == b", R, E)(R haystack, E needle);
 
 ...to use the c function memchr when pred is the default "a == b" and...
 
 CASE 1:
 All of the following are true:
 
 1) R is an array of elements of type char, byte, ubyte or an enum type whose
 base type is one of those.
 2) E is integral
 3) For the element type of R:
 alias Elem = ElementType!R;
 ...the following is true:
 (Elem.min <= needle && needle <= Elem.max)
 This is important because memchr only cares about bitwise equality. If the
 value of needle is beyound the limits of possible values for type Elem, then we
 know needle cannot be found within haystack and the function can just return
 early without searching.
 
 CASE 2:
 All of the following are true:
 
 1) R is an array of elements of type E
 2) E.sizeof == 1
 3) For type E equality means bitwise equality
I'm just about finished implementing the above, along with quite a few other tricks, and I'm getting some 'great' performance improvements. I implemented "CASE 2". However, I don't think it is worth implementing "CASE 1": I think the case of searching for an element that is simply outside of the possible values of the range's element type should be a very rare case. I mean, who would write: find(myRangeOfBytes, 5000); ??? I come to the conclusion that the cost of checking the condition all the time just to reduce the time of a special case is not worth it. It's kind of like of the "opAssign check for self assignment" issue. If you can write the code in such a case that the common case goes faster, but self is more costly, then that is better. So yeah, I implemented "CASE 2".
Checking for (Elem.min <= needle && needle <= Elem.max) is not an optimization, it's there for code correctness sake. It's the memchr that is the optimization. And checking for (Elem.min <= needle && needle <= Elem.max) is needed only when it's possible that it might be true, i.e. only when the signedness of Elem is different from the signedness of the type of needle. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 19 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10403



--- Comment #3 from Tommi <tommitissari hotmail.com> 2013-06-19 14:53:11 PDT ---
(In reply to comment #2)
 And checking for (Elem.min <= needle && needle <= Elem.max) is needed only when
 it's possible that it might be true, [..]
I mean... checking for (Elem.min <= needle && needle <= Elem.max) is needed only when it's possible that it might be false -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 19 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10403



--- Comment #4 from monarchdodra gmail.com 2013-06-19 15:30:33 PDT ---
(In reply to comment #3)
 (In reply to comment #2)
 And checking for (Elem.min <= needle && needle <= Elem.max) is needed only when
 it's possible that it might be true, [..]
I mean... checking for (Elem.min <= needle && needle <= Elem.max) is needed only when it's possible that it might be false
Oh... OK. Makes sense, hadn't thought of that :/ Will deal with it. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 19 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10403


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|monarchdodra gmail.com      |nobody puremagic.com


--- Comment #5 from monarchdodra gmail.com 2013-06-20 00:11:35 PDT ---
Actually, I thought about it some more(In reply to comment #4)
 (In reply to comment #3)
 (In reply to comment #2)
 And checking for (Elem.min <= needle && needle <= Elem.max) is needed only when
 it's possible that it might be true, [..]
I mean... checking for (Elem.min <= needle && needle <= Elem.max) is needed only when it's possible that it might be false
Oh... OK. Makes sense, hadn't thought of that :/ Will deal with it.
Hum... dealing with this optimization is making the code much more complicated then I care for right now... So I'm just leaving it out. Sorry. I did put quite a few other in though... -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jun 20 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10403



--- Comment #6 from github-bugzilla puremagic.com 2013-10-31 13:32:19 PDT ---
Commits pushed to master at https://github.com/D-Programming-Language/phobos

https://github.com/D-Programming-Language/phobos/commit/67e3c321a880dbeb29f69fe46efffa5f19aac633
fix Issue 10403 - memchr optimization for std.algorithm.find

https://github.com/D-Programming-Language/phobos/commit/482d0e161b04fa1eade0937deecc964d23bca88f
Merge pull request #1492 from monarchdodra/findImprov

fix Issue 10403 - memchr optimization for std.algorithm.find

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 31 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=10403


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Oct 31 2013