www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Which "if" choice is faster?

reply jicman <jicman_member pathlink.com> writes:
Greetings.

I have a program which handles lots of text files on a per line basis.  I am
trying to get some info from the D gurus as to which "if" option is fastest and
best for a large amount of text handling lines.  Let us imagine these two
instances of code:

Code a:
|char[] GetTheChoice(char[] line)
|{
|  if (std.string.find(line,"A") > -1)
|    return "A";
|  if (std.string.find(line,"B") > -1)
|    return "A";
|  ...
|  ...
|  ...
|  if (std.string.find(line,"Y") > -1)
|    return "A";
|  if (std.string.find(line,"Z") > -1)
|    return "A";
|}


Code b:
|char[] GetTheChoice(char[] line)
|{
|  if (std.string.find(line,"A") > -1 ||
|      std.string.find(line,"B") > -1 ||
|  ...
|  ...
|  ...
|      std.string.find(line,"Y") > -1 ||
|      std.string.find(line,"Z") > -1)
|    return "A";
|}

Also, if there is a faster way than this.

thanks,

josé
Aug 03 2005
next sibling parent reply Vathix <chris dprogramming.com> writes:
On Wed, 03 Aug 2005 12:38:01 -0400, jicman <jicman_member pathlink.com>  
wrote:

 Greetings.

 I have a program which handles lots of text files on a per line basis.   
 I am
 trying to get some info from the D gurus as to which "if" option is  
 fastest and
 best for a large amount of text handling lines.  Let us imagine these two
 instances of code:

 Code a:
 |char[] GetTheChoice(char[] line)
 |{
 |  if (std.string.find(line,"A") > -1)
 |    return "A";
 |  if (std.string.find(line,"B") > -1)
 |    return "A";
 |  ...
 |  ...
 |  ...
 |  if (std.string.find(line,"Y") > -1)
 |    return "A";
 |  if (std.string.find(line,"Z") > -1)
 |    return "A";
 |}


 Code b:
 |char[] GetTheChoice(char[] line)
 |{
 |  if (std.string.find(line,"A") > -1 ||
 |      std.string.find(line,"B") > -1 ||
 |  ...
 |  ...
 |  ...
 |      std.string.find(line,"Y") > -1 ||
 |      std.string.find(line,"Z") > -1)
 |    return "A";
 |}

 Also, if there is a faster way than this.

 thanks,

 josé

Well, std.string.find() needs to start at the beginning of the array for each search, so not using it at all and just looping over the string once would be fastest. char[] GetTheChoice(char[] line) { foreach(char ch; line) { switch(ch) { case 'A', 'B': return "A"; case 'Y', 'Z': return "A"; default: ; } } //return "?"; }
Aug 03 2005
parent jicman <jicman_member pathlink.com> writes:
Vathix says...
On Wed, 03 Aug 2005 12:38:01 -0400, jicman <jicman_member pathlink.com>  
wrote:

 Greetings.

 I have a program which handles lots of text files on a per line basis.   
 I am
 trying to get some info from the D gurus as to which "if" option is  
 fastest and
 best for a large amount of text handling lines.  Let us imagine these two
 instances of code:

 Code a:
 |char[] GetTheChoice(char[] line)
 |{
 |  if (std.string.find(line,"A") > -1)
 |    return "A";
 |  if (std.string.find(line,"B") > -1)
 |    return "A";
 |  ...
 |  ...
 |  ...
 |  if (std.string.find(line,"Y") > -1)
 |    return "A";
 |  if (std.string.find(line,"Z") > -1)
 |    return "A";
 |}


 Code b:
 |char[] GetTheChoice(char[] line)
 |{
 |  if (std.string.find(line,"A") > -1 ||
 |      std.string.find(line,"B") > -1 ||
 |  ...
 |  ...
 |  ...
 |      std.string.find(line,"Y") > -1 ||
 |      std.string.find(line,"Z") > -1)
 |    return "A";
 |}

 Also, if there is a faster way than this.

 thanks,

 josé

Well, std.string.find() needs to start at the beginning of the array for each search, so not using it at all and just looping over the string once would be fastest. char[] GetTheChoice(char[] line) { foreach(char ch; line) { switch(ch) { case 'A', 'B': return "A"; case 'Y', 'Z': return "A"; default: ; } } //return "?"; }

Well, the problem is that I am not really searching for "one letter". I should have entered another example. ;-) My fault. There is a lot of information missing here, but I need to use std.string.find(). Thanks for the help, though. jic
Aug 03 2005
prev sibling next sibling parent reply "Walter" <newshound digitalmars.com> writes:
They're the same. In general, though, you need to profile code to see where
the real bottlenecks are, as the results can be counterintuitive and
surprising. This should get you started:

    www.digitalmars.com/techtips/timing_code.html
Aug 03 2005
parent jicman <jicman_member pathlink.com> writes:
Thanks Walter.  Yes, I figured that some if's will be encountered less than
others, and those I am placing at the bottom of the list.  This will allow for
those most used if's to return faster.  I will try both of these if options and
tell you if there is a real time difference between them.  However, don't wait
standing up.  I am working on a project which I chosed to do in D, instead of
Java or c, and I need to finish it first. ;-)  It's due Friday.  My boss does
not know, but when he sees the outcome, he'll be very glad. ;-)

thanks,

jic

Walter says...
They're the same. In general, though, you need to profile code to see where
the real bottlenecks are, as the results can be counterintuitive and
surprising. This should get you started:

    www.digitalmars.com/techtips/timing_code.html

Aug 03 2005
prev sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
jicman <jicman_member pathlink.com> wrote:

[...]
 Also, if there is a faster way than this.

Have you thaught of import std.regexP; //... if(( new RegExp( "[A-Z]", "")).test( line)) return( "A"); -manfred
Aug 03 2005
parent reply jicman <jicman_member pathlink.com> writes:
Manfred Nowak says...
jicman <jicman_member pathlink.com> wrote:

[...]
 Also, if there is a faster way than this.

Have you thaught of import std.regexP; //... if(( new RegExp( "[A-Z]", "")).test( line)) return( "A");

Will this be faster than std.string.find? I already have some RegExp for something else, but I wouldn't think that opening a RegExp object for every different if would be faster. Perhaps, I should have explained it better. I am not just searching for one letter. It was just an example. Here is the full test, and perhaps, some of you may think of a better way: |char[] GetTheChoice(char[] line) |{ | //writefln(line); | if (std.string.find(line,"JUNKDATA") > -1) | return "JUNKDATA"; | if (std.string.find(line,"Auto stopping job") > -1) | return "AutoStopJob"; | if (std.string.find(line,"Auto starting the next job") > -1) | return "AutoStartJob"; | if (std.string.find(line,"HTTP/NSA job") > -1) | return "HTTP/NSA job"; | if (std.string.find(line,"HTTP/IFax job") > -1) | return "HTTP/IFax job"; | if (std.string.find(line,"IFax processing message") > -1) | return "IFax processing message"; | if (std.string.find(line,"IFax Job received") > -1) | return "IFax job"; | if (std.string.find(line,"IFax Job POP3 job") > -1) | return "IFax POP3 job"; | if (std.string.find(line,"Login failed because password incorrect:") > -1) | return "Failed NSA FTP login"; | if (std.string.find(line,"User logged in:") > -1) | return "NSA FTP login"; | if (std.string.find(line,"NSA job from ") > -1) | return "NSA job from "; | if (std.string.find(line,"Processing image of Cover Sheet ") > -1) | return "Processing image of Cover Sheet "; | if (std.string.find(line,"e-mail confirmation of") > -1) | return "e-mail confirmation of"; | if (std.string.find(line,"Printing Confirmation Page") > -1) | return "Printing Confirmation Page"; | if (std.string.find(line,"Web Centre URL ") > -1) | return "Web Centre URL "; | if (std.string.find(line,"Restarted job: ") > -1) | return "Restarted job: "; | if (std.string.find(line,"Starting remote job") > -1) | return "Starting remote job"; | if (std.string.find(line,"Error handler reports") > -1) | return "Error handler reports"; | if (std.string.find(line," Attempting to send mail to ") > -1) | return "CreatingUser0"; | if (std.string.find(line," User Created: ") > -1) | return "CreatingUser1"; | if (std.string.find(line,"ERROR ") > -1) | return "ERROR "; | if (std.string.find(line,"DEBUG ") > -1) | return "DEBUG "; | if (std.string.find(line,"WEB ") > -1) | return "WEB "; | if (std.string.find(line,"OCR: begin=") > -1) | return "OCR: begin="; | if (std.string.find(line,"OCR: fail=") > -1) | return "OCR: fail="; | if (std.string.find(line,"OCR: end=") > -1) | return "OCR: end="; | if (std.string.find(line,">>> Rejecting login ") > -1) | return "HTTPProgrammaticAccess"; | if (std.string.find(line,"WARNING ") > -1) | return "WARNING "; | if (std.string.find(line,"SYSTEM.ERR ") > -1) | return "SYSTEM.ERR "; | if (std.string.find(line,"SYSTEM.OUT ") > -1) | return "SYSTEM.OUT "; | if (std.string.find(line,"DocuShare retry loop required ") > -1) | return "DSRetryLoop"; | if (std.string.find(line,"FTP command rejected,") > -1) | return "FTP command rejected"; | if (std.string.find(line,"Login: User ") > -1) | return "UserLogin"; | if (std.string.find(line,"Logout from web") > -1) | return "UserLogout"; | if (std.string.find(line,"Login: Username ") > -1) | return "BadPassWord"; | if (std.string.find(line," Advanced access ") > -1) // this one covers | return "AdvancedAccess"; // both the request and the access | if (std.string.find(line," The scheduled backup did not occur") > -1) | return "MissedBackup"; | if (std.string.find(line,">>> Last backup time of ") > -1) | return "MissedBackup"; | if (std.string.find(line,"There are ") > -1 && | std.string.find(line," jobs waiting to restart") > -1) | return "JobsBackedUp"; | if (std.string.find(line,"Waiting for ") > -1 && | (std.string.find(line," job to complete") > -1 || | std.string.find(line," jobs to complete") > -1 )) | return "JobsBackedUp"; | if (std.string.find(line," Persistent GC is slow") > -1) | return "GCSlow"; | if (std.string.find(line," Persistent store GC reclaimed ") > -1) | return "GarbageCollection"; | if (std.string.find(line,"Shutdown called:") > -1) | return "Shutdown called:"; | if (std.string.find(line,"About to shutdown due to timeout") > -1) | return "TimeOutShutDown"; | if (std.string.find(line,"Persistent Store has been closed!") > -1 || | std.string.find(line,"Terminating persistent store session") > -1) | return "ClosedPersistentStore"; | if ( | std.string.find(line," >>> About to do GC: ") > -1 || | std.string.find(line," >>> Did GC: ") > -1 || | std.string.find(line," Initial class load") > -1 || | std.string.find(line," Cleaning Database prior ") > -1 || | std.string.find(line," Creating unpack.ini") > -1 || | std.string.find(line," Fixing internal state ") > -1 || | std.string.find(line," Recording Unpack ") > -1 || | std.string.find(line," Deleting info inserted ") > -1 || | std.string.find(line," Packing Properties and ") > -1 || | std.string.find(line," Packing Independent Entities") > -1 || | std.string.find(line," Noting Replaceable objects") > -1 || | std.string.find(line," Starting to unpack objects") > -1 || | std.string.find(line," >>> Credentials support provided ") > -1 || | std.string.find(line," >>> Gathered ") > -1 || | std.string.find(line," >>> Finished unpacking") > -1 || | std.string.find(line," >>> Finished Unpacking") > -1 || | std.string.find(line," >>> SystemParameter ") > -1 || | std.string.find(line," Writing objects to ") > -1 || | std.string.find(line," Exporting database objects...") > -1 || | std.string.find(line," Importing database objects...") > -1 || | std.string.find(line,"Administrative Notice Logging Facility") > -1 || | std.string.find(line,"Starting NSA FTP with server") > -1 || | std.string.find(line," >>> Unable to read ") > -1 || | std.string.find(line," >>> Deleting ") > -1 || | std.string.find(line," >>> Installing Patch ") > -1 || | std.string.find(line," Installing services and ") > -1 || | std.string.find(line," >>> Finished Packing") > -1 || | std.string.find(line,"Starting NSA FTP server") > -1 || | std.string.find(line,"Using Hostname class from ") > -1 || | std.string.find(line,"Started POP3 poller for ") > -1 || | std.string.find(line," >>> POP3 polling enabled") > -1 || | std.string.find(line," >>> POP3 enabled polling") > -1 || | std.string.find(line,"Starting POP3 poller for ") > -1 || | std.string.find(line,"Loading ApplicationJarEntities") > -1 || | std.string.find(line,">>> Loading application ") > -1 || | std.string.find(line,">>> Loading service ") > -1 || | std.string.find(line," Debugging and Error Logging Faci") > -1 || | std.string.find(line,"FlowPort Version FlowPort") > -1 || | std.string.find(line,"TextBridge SDK 5.0-PP") > -1 || | std.string.find(line," Deflate compressed in pass ") > -1 || | std.string.find(line," Terminating persistent store GC ") > -1 || | std.string.find(line,"Attempting to start IFax SMTP") > -1 || | std.string.find(line,"Enabled SMTP IFax server COM.") > -1 || | std.string.find(line,"Started SMTP IFax server COM.") > -1 | ) | return "IgnoreEntry"; // ignore these STATUS entries... not used | return "NotUsableData"; |} thanks, josé
Aug 03 2005
next sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"jicman" <jicman_member pathlink.com> wrote in message 
news:dcr6dh$1kv5$1 digitaldaemon.com...
 Manfred Nowak says...
jicman <jicman_member pathlink.com> wrote:

[...]
 Also, if there is a faster way than this.

Have you thaught of import std.regexP; //... if(( new RegExp( "[A-Z]", "")).test( line)) return( "A");

Will this be faster than std.string.find? I already have some RegExp for something else, but I wouldn't think that opening a RegExp object for every different if would be faster. Perhaps, I should have explained it better. I am not just searching for one letter. It was just an example. Here is the full test, and perhaps, some of you may think of a better way:

If you could find a way to use string comparison instead of find then you might be able to use an associative array with the keys the strings like "JUNKDATA", "Auto stopping job" etc and the values strings like "JUNKDATA","AutoStopJob", etc. Without any assumptions about where in the line they keys can appear it would be tough to do any better than what you are currently doing.
Aug 03 2005
parent jicman <jicman_member pathlink.com> writes:
Ben Hinkle says...
"jicman" <jicman_member pathlink.com> wrote in message 
news:dcr6dh$1kv5$1 digitaldaemon.com...
 Manfred Nowak says...
jicman <jicman_member pathlink.com> wrote:

[...]
 Also, if there is a faster way than this.

Have you thaught of import std.regexP; //... if(( new RegExp( "[A-Z]", "")).test( line)) return( "A");

Will this be faster than std.string.find? I already have some RegExp for something else, but I wouldn't think that opening a RegExp object for every different if would be faster. Perhaps, I should have explained it better. I am not just searching for one letter. It was just an example. Here is the full test, and perhaps, some of you may think of a better way:

If you could find a way to use string comparison instead of find then you might be able to use an associative array with the keys the strings like "JUNKDATA", "Auto stopping job" etc and the values strings like "JUNKDATA","AutoStopJob", etc. Without any assumptions about where in the line they keys can appear it would be tough to do any better than what you are currently doing.

Ok, thanks. These values are returned to a case which then, depending on the return string will execute more string splits, checks, etc. I just wanted to get an idea, since the amount of data is getting bigger and bigger. thanks again folks. jic
Aug 03 2005
prev sibling next sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
jicman <jicman_member pathlink.com> wrote:

[...]
  Here is the full test, and
 perhaps, some of you may think of a better way: 

There is no predefined solution for your problem in D. If you want a fast running solution for a big data pool you have to implement a lexer. If you want a prototype to show how fast developing in D can be, you are already done. -manfred
Aug 03 2005
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Manfred Nowak <svv1999 hotmail.com> wrote:

 you have to implement a lexer.

For this task you can use re2c from r2c.org. A partially filled template for your taks would look like this: <code> alias char YYCTYPE; YYCTYPE* YYCURSOR, YYLIMIT, YYMARKER; char[] choose( char[] line){ void yyLimitAdjust(){ YYLIMIT= &line[ line.length-1]; YYLIMIT++; } void YYFILL( int n){ uint now= line.length; line.length= now + n; yyLimitAdjust(); for( uint i= now; i< line.length; i++) line[ i]= ' '; } YYCURSOR= &line[0]; yyLimitAdjust(); try{ while( 1) /*!re2c "JUNKDATA" { return "JUNKDATA".dup;} "Auto stopping job" { return "AutoStopJob".dup;} "Waiting for ".*"job""s"?" to complete" { return "JobsBackedUp".dup; } '\n' {} . {} */ } catch{ return "NotUsableData";}; } unittest{ assert( choose( "JUNKDAT") == "NotUsableData"); assert( choose( "JUNKDATA") == "JUNKDATA"); assert( choose( "Waiting for one job to complete") == "JobsBackedUp"); assert( choose( "Waiting for 20 jobs to complete") == "JobsBackedUp"); } void main(){ } </char> This run through re2c can be fed into dmd nearly as is. -manfred
Aug 03 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
 "JUNKDATA" { return "JUNKDATA".dup;}
 "Auto stopping job" { return "AutoStopJob".dup;}
 "Waiting for ".*"job""s"?" to complete" { return
 "JobsBackedUp".dup; }

any particular reason for the .dups?
Aug 03 2005
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
"Ben Hinkle" <ben.hinkle gmail.com> wrote:

[...]
 any particular reason for the .dups? 

references to local data aren't allowed, are they? -manfred
Aug 03 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:dcrovi$23ik$1 digitaldaemon.com...
 "Ben Hinkle" <ben.hinkle gmail.com> wrote:

 [...]
 any particular reason for the .dups?

references to local data aren't allowed, are they? -manfred

String literals are stored in the shared data segment so they can be safely returned from functions (like in C). The local info on the stack is the ptr to the data segment and length to the string data and that is being copied to the return value. If the function returned the address of a local variable or a string stored locally on the stack then that would indeed be illegal.
Aug 03 2005
parent reply Manfred Nowak <svv1999 hotmail.com> writes:
"Ben Hinkle" <ben.hinkle gmail.com> wrote:

 String literals are stored in the shared data segment so they
 can be safely returned from functions (like in C).

I cannot find any hint on this in the specs. So it is current but undocumented and therefore unpromised behaviour. Furthermore and as far as I understand, the 64-bit OS's have given up the idea of segments. If this cannot be reestablished, the move to 64-bit will break masses of D-code? -manfred
Aug 03 2005
parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Manfred Nowak" <svv1999 hotmail.com> wrote in message 
news:dcs6lm$2eod$1 digitaldaemon.com...
 "Ben Hinkle" <ben.hinkle gmail.com> wrote:

 String literals are stored in the shared data segment so they
 can be safely returned from functions (like in C).

I cannot find any hint on this in the specs. So it is current but undocumented and therefore unpromised behaviour.

Since C behaves that way Walter probably assumed people would assume D behaves that way, too. It would be nice to have it mentioned in the doc, though.
 Furthermore and as far as I understand, the 64-bit OS's have given up
 the idea of segments. If this cannot be reestablished, the move to
 64-bit will break masses of D-code?

That would make existing C code break, too, so I doubt it will be a problem.
 -manfred 

Aug 04 2005
parent Manfred Nowak <svv1999 hotmail.com> writes:
"Ben Hinkle" <ben.hinkle gmail.com> wrote:

[...]
 That would make existing C code break, too, so I doubt it will
 be a problem. 

I will have a closer look to C-code ported to 64-bit OS. But the fact that C-code may break is no reason for letting D-code break also. -manfred
Aug 04 2005
prev sibling parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 3 Aug 2005 19:37:21 +0000 (UTC), jicman wrote:


[snip]
 I am
 not just searching for one letter.

I have a much faster string finding routine coded up in Build. It uses the Boyer-Moore algorithm. Here it is ... <code> template BMScan(T) { int BMScan(T[] pContainer, T[] pFind, uint pStartPos=0) { int[T] lShift; // Shift table. int lCP; // Index into container array int lCE; // Current high-water mark in containter array int lFP; // Index into find array // Do length checks first. if (pFind.length == 0) return -1; if (pStartPos > pContainer.length) return -1; if (pStartPos < 0) return -1; if (pStartPos > 0) pContainer = pContainer[pStartPos .. $]; if (pContainer.length == 0) return -1; if (pFind.length > pContainer.length) return -1; // If find string is only one element long, do a brute force scan. if (pFind.length == 1) { T pElem = pFind[0]; foreach(int i, T lElem; pContainer[pStartPos .. $]) { if ( lElem == pElem) return i + pStartPos; } return -1; } // Prepare shift table. foreach(int i, T c; pFind) { lShift[c] = pFind.length - i - 1; } // Start scan at rightmost position. lCP = pFind.length - 1; while(lCP < pContainer.length) { lFP = pFind.length - 1; assert(lCP == 0 || lCP > lCE); lCE = lCP; // Remember where I'm up to so far. if (pContainer[lCP] != pFind[lFP]) { // Found a mismatch, so realign the arrays. if ((pContainer[lCP] in lShift) == null) { // This element is not anywhere in the Find array // so we can skip over it to align up just beyond it. lCP += pFind.length; } else { // This element is somewhere in the Find array // so if we haven't checked all Find elements we // can slide up to align with the rightmost // position where the element occurs in the Find // array. However, if we have checked the entire // Find array, then it isn't in the Container array. if (lFP == 0) return -1; lCP += lShift[pContainer[lCP]]; } } else { // Found a match, so scan right-to-left... // unless this we are at the first Find array element // in which case we have found it. if (lFP == 0) return lCP + pStartPos; // Keep scanning until we run out of elements in the Find array. while(lFP > 0) { // Point to next pair to check. lFP--; lCP--; if (pContainer[lCP] != pFind[lFP]) { // Found a mismatch, so realign the arrays. if ((pContainer[lCP] in lShift) == null) { // This element is not anywhere in the Find array // so we can skip over it to align up just beyond it. lCP += pFind.length; } else { // This element is somewhere in the Find array // so if we haven't checked all Find elements we // can realign the match so far beyond the current // 'block'. // However, if we have checked the entire // Find array, then it isn't in the Container array. if (lFP == 0) return -1; lCP = lCE + (pFind.length - lFP - 1); } // Stop element-by-element scanning and resume // block-end checking. break; } if (lFP == 0) // We are at the begining of the Find array so // that means we found a complete match! return lCP + pStartPos; } } } // If we get here, then there is no match. return -1; } } </code> And you use it like the phobos find() function ... <code> if ( BMScan!(char)(line,"Loading ApplicationJarEntities") > -1) ... </code> -- Derek Parnell Melbourne, Australia 4/08/2005 7:05:14 AM
Aug 03 2005
next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:9lcyd77i1y0t.uvlaprrcooxk.dlg 40tude.net...
 On Wed, 3 Aug 2005 19:37:21 +0000 (UTC), jicman wrote:


 [snip]
 I am
 not just searching for one letter.

I have a much faster string finding routine coded up in Build. It uses the Boyer-Moore algorithm. Here it is ...

Hmm. when I try it on strings pContainer of length between 50 and 150 with a pFind of "this is a test of phobos" I get std.string.find outperforms the posted code by an order of magnitude and it seems to get worse as the GC load increses (I notice the lShift is an assoc array which will allocate nodes for every unique key on every call). What sorts of problem sizes are appropriate for BMScan?
Aug 03 2005
parent reply Derek Parnell <derek psych.ward> writes:
On Wed, 3 Aug 2005 19:58:08 -0400, Ben Hinkle wrote:

 "Derek Parnell" <derek psych.ward> wrote in message 
 news:9lcyd77i1y0t.uvlaprrcooxk.dlg 40tude.net...
 On Wed, 3 Aug 2005 19:37:21 +0000 (UTC), jicman wrote:


 [snip]
 I am
 not just searching for one letter.

I have a much faster string finding routine coded up in Build. It uses the Boyer-Moore algorithm. Here it is ...

Hmm. when I try it on strings pContainer of length between 50 and 150 with a pFind of "this is a test of phobos" I get std.string.find outperforms the posted code by an order of magnitude and it seems to get worse as the GC load increses (I notice the lShift is an assoc array which will allocate nodes for every unique key on every call). What sorts of problem sizes are appropriate for BMScan?

Ouch! I've just tried some time trials myself and I have to say that phobos is faster in all cases, even when I have to do conversions of wchar[] and dchar[]. I also tried to remove the AA dependancies but the best I could do was to come in a half phobos's speed. Conclusion: Don't use the BMScan code I supplied. There is something not quite right about. On the lower-case 'L' prefix issue, I use a modified Courier font that displays this character's glyph as a lower-case "T" but without the bar. The net effect is a character that has a curved base and is easily distinguishable from both 'i' and 1. The same modified font also displays a zero with dot in the center of the oval to avoid confusion with 'O' and 'o'. Sorry about that ... ;-) -- Derek Melbourne, Australia 4/08/2005 1:21:06 PM
Aug 03 2005
parent "Walter" <newshound digitalmars.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message
news:s7w75fouxa82$.1x9tu12mubkvu.dlg 40tude.net...
 I've just tried some time trials myself and I have to say that phobos is
 faster in all cases, even when I have to do conversions of wchar[] and
 dchar[]. I also tried to remove the AA dependancies but the best I could

 was to come in a half phobos's speed.

It's just really hard to beat memchr() (which find() is based on).
Aug 05 2005
prev sibling parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
    int[T] lShift; // Shift table.
    int lCP;    // Index into container array
    int lCE;    // Current high-water mark in containter array
    int lFP;    // Index into find array

I admit this is a truly nit-picky complaint but the "l" prefix above took me a while to figure out - I thought it was an I for "index". I only really confirmed those were l's when I changed my font.
Aug 03 2005