digitalmars.D.learn - Why do 'indexOfAny' and 'indexOf' do the same work?

reply "FrankLike" <1150015857 qq.com> writes:
I want to know whether the string strs contains any of
'exe', 'dll', 'a', 'lib'. In C# I can write:

	int index = indexofany(strs, ["exe","dll","a","lib"]);

but in D I have to do it like this:

	findStr(strs, ["exe","lib","dll","a"]);

	import std.string : indexOf;

	// Returns true if strIn contains any of the strings in strFind.
	bool findStr(string strIn, string[] strFind)
	{
		foreach (str; strFind)
		{
			if (strIn.indexOf(str) != -1)
				return true;
		}
		return false;
	}

Could Phobos's std.string get a function like this, to make
indexOfAny better?

Thank you.

Frank
Jan 07 2015
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 7 January 2015 at 14:54:51 UTC, FrankLike wrote:
 I want to know whether the string strs contains any of
 'exe', 'dll', 'a', 'lib' [...]
std.algorithm.canFind will do what you want, including telling you which of ["exe","lib","dll","a"] was found. If you need to know where in strs it was found as well, you can use std.algorithm.find
Jan 07 2015
parent reply "FrankLike" <1150015857 qq.com> writes:
On Wednesday, 7 January 2015 at 15:11:57 UTC, John Colvin wrote:
 std.algorithm.canFind will do what you want, including telling
 you which of ["exe","lib","dll","a"] was found. If you need to
 know where in strs it was found as well, you can use
 std.algorithm.find
Sorry, but 'std.algorithm.find' does this: it finds an individual element in an input range. Its parameters are:

	InputRange haystack: the range searched in.
	Element needle: the element searched for.

What I want to know now is whether a string (like "hello.exe", "hello.a", "hello.dll" or "hello.lib") contains any of ["exe","dll","a","lib"]. My function 'findStr' works fine. If std.string's 'indexOfAny' did this, I would be happy (but at the moment 'indexOfAny' and 'indexOf' seem to do the same work).

Thank you.
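For reference, a minimal sketch of how the two std.string functions differ, as far as the Phobos documentation describes them: indexOf searches for one whole substring, while indexOfAny treats its second argument as a set of single characters. The sample string and needles are assumptions chosen for illustration.

```d
import std.stdio : writeln;
import std.string : indexOf, indexOfAny;

void main()
{
    string s = "hello.exe";

    // indexOf looks for the whole substring "exe".
    writeln(s.indexOf("exe"));    // position where "exe" starts

    // indexOfAny looks for the first character that is one of 'x', 'y', 'z'.
    writeln(s.indexOfAny("xyz")); // position of the 'x'

    // With a one-character needle both calls give the same answer, which
    // may be why the two functions can look identical in a quick test.
    assert(s.indexOf("x") == s.indexOfAny("x"));
}
```

So indexOfAny is character-based; it is not a substring search over a list of strings.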
Jan 07 2015
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
FrankLike:

 But now I want to know in a string (like "hello.exe" or 
 "hello.a",or "hello.dll" or "hello.lib" ) whether contains any 
 of them: ["exe","dll","a","lib"].
It seems to be this:

http://rosettacode.org/wiki/File_extension_is_in_extensions_list#D

Bye,
bearophile
Jan 07 2015
parent "Tobias Pankrath" <tobias pankrath.net> writes:
On Wednesday, 7 January 2015 at 16:02:25 UTC, bearophile wrote:
 It seems to be this:
 http://rosettacode.org/wiki/File_extension_is_in_extensions_list#D
Which uses this overload: size_t canFind(Range, Ranges...)(Range haystack, Ranges needles)
Jan 07 2015
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 7 January 2015 at 15:57:18 UTC, FrankLike wrote:
 Sorry, but 'std.algorithm.find' does this: it finds an individual
 element in an input range. Its parameters are: InputRange haystack
 (the range searched in) and Element needle (the element searched for).
std.algorithm.find has several overloads, one of which takes multiple needles. The same is true for std.algorithm.canFind.

Quoting from the relevant std.algorithm.find overload docs: "Finds two or more needles into a haystack."
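A minimal sketch of the multiple-needle overloads, assuming the needles are passed as separate arguments rather than as one array. The sample strings are illustrative only.

```d
import std.algorithm.searching : canFind, find;
import std.stdio : writeln;

void main()
{
    string strs = "hello.exe";

    // Each needle is its own argument. The result is a tuple: the haystack
    // advanced to the match, and the 1-based position of the needle that
    // matched (0 if none did).
    auto r = find(strs, "lib", "exe", "a", "dll");
    writeln(r[0]); // rest of the haystack, starting at the match
    writeln(r[1]); // which needle matched
    assert(r[0] == "exe" && r[1] == 2);

    // canFind with several needles only reports whether (and which) one matched.
    assert(canFind(strs, "lib", "exe", "a", "dll") != 0);
    assert(canFind("hello.", "lib", "exe", "a", "dll") == 0);
}
```

If the needles only exist as a string[] at run time, a predicate-based call such as needles.canFind!(n => haystack.canFind(n)) is one alternative.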
Jan 07 2015
parent reply "FrankLike" <1150015857 qq.com> writes:
 std.algorithm.find has several overloads, one of which takes 
 multiple needles. The same is true for std.algorithm.canFind

 Quoting from the relevant std.algorithm.find overload docs: 
 "Finds two or more needles into a haystack."
	string strs = "hello.exe";
	string[] s = ["lib","exe","a","dll"];
	auto a = canFind!(string,string[])(strs,s);
	writeln("a is ",a);

	string strsb = "hello.";
	auto b = canFind!(string,string[])(strsb,s);
	writeln("b is ",b);

I get this error:

	does not match template declaration canFind(alias pred = "a ==b")

You can test it.

Thank you.
Jan 07 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:
Try this:

	http://dlang.org/phobos-prerelease/std_algorithm#.findAmong


T

-- 
MACINTOSH: Most Applications Crash, If Not, The Operating System Hangs
Jan 07 2015
next sibling parent "FrankLike" <1150015857 qq.com> writes:
On Wednesday, 7 January 2015 at 17:08:55 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:
 Try this:

 	http://dlang.org/phobos-prerelease/std_algorithm#.findAmong


 T
You mean this? The result is not what I want!

---------------test.d--------------
import std.stdio, std.algorithm, std.string;

auto ext = ["exe","lib","a","dll"];
auto strs = "hello.exe";

void main()
{
	auto b = findAmong(ext,strs);
	writeln("b is ",b);
}
---------result-----
b is ["exe","lib","a","dll"]
--------------------

Note:
1. I only want to find out whether the given string 'hello.exe' contains any of the strings in ["exe","lib","a","dll"].
2. I think std.string's 'indexOfAny' does the same work as 'indexOf'. That is not as it should be.

Frank
Jan 08 2015
prev sibling parent "FrankLike" <1150015857 qq.com> writes:
On Wednesday, 7 January 2015 at 17:08:55 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:
 Try this:

 	http://dlang.org/phobos-prerelease/std_algorithm#.findAmong


 T
Thank you, it works, but it's not what I want.

---------------test.d--------------
import std.stdio, std.algorithm, std.string;

auto ext = ["exe","lib","a","dll"];
auto strs = "hello.dll";

void main()
{
	auto b = findAmong(ext,strs);
	writeln("b is ",b);
}
---------result-----
b is ["dll"]
--------------------

I think it would be fine if std.string's 'indexOfAny' did this work, for example:

	auto b = "hello.dll".indexOfAny(["exe","lib","a","dll"]);
	writeln("b is ",b);

The result should be 'true', if it worked. Can you suggest updating the 'indexOfAny' function in Phobos?

Thank you.

Frank
Jan 08 2015
prev sibling parent reply "Robert burner Schadek" <rburners gmail.com> writes:
Use canFind like this:
     bool a = canFind(strs,s) >= 1;

Let the compiler figure out what the types of the parameters are.
Jan 08 2015
parent reply "FrankLike" <1150015857 qq.com> writes:
On Thursday, 8 January 2015 at 15:15:59 UTC, Robert burner 
Schadek wrote:
 use canFind like such:
     bool a = canFind(strs,s) >= 1;

 let the compiler figger out what the types of the parameter are.
canFind works for something like:

	bool x = canFind(["exe","lib","a","dll"], "a");

but it doesn't work for:

	canFind(["exe","lib","a","dll"], "hello.lib");

So I would very much like the function 'indexOfAny' to do this work.

Thank you.

Frank
Jan 08 2015
parent reply ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Fri, 09 Jan 2015 07:10:14 +0000
FrankLike via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 canFind works for something like:
 	bool x = canFind(["exe","lib","a","dll"], "a");
 but it doesn't work for:
 	canFind(["exe","lib","a","dll"], "hello.lib");
 So I would very much like the function 'indexOfAny' to do this work.
be creative! ;-)

  import std.algorithm, std.stdio;

  void main () {
    string fname = "hello.exe";
    import std.path : extension;
    if (findAmong([fname.extension], [".exe", ".lib", ".a", ".dll"]).length) {
      writeln("got it!");
    } else {
      writeln("alas...");
    }
  }

note the dots in the extension list.

yet you can do it even easier:

  import std.algorithm, std.stdio;

  void main () {
    string fname = "hello.exe";
    import std.path : extension;
    if ([".exe", ".lib", ".a", ".dll"].canFind(fname.extension)) {
      writeln("got it!");
    } else {
      writeln("alas...");
    }
  }

as you are obviously interested in the extension here -- check only that part! ;-)
Jan 08 2015
parent reply "FrankLike" <1150015857 qq.com> writes:
On Friday, 9 January 2015 at 07:41:07 UTC, ketmar via
Digitalmars-d-learn wrote:
 as you are obviously interested in the extension here -- check only
 that part! ;-)
Sorry, it's only an example. Thank you for working hard on it, but it's not what I want. The 'indexOfAny' function should do this work: given "he is at home" and ["home","office","sea","plane"], IndexOfAny can do it in C#. What about in D? I know findAmong can do it, but that takes two functions.

Thank you.
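A rough sketch of the kind of helper being asked for here. The name indexOfAnyString is hypothetical and not part of Phobos; it simply calls std.string.indexOf once per needle and keeps the smallest position.

```d
import std.string : indexOf;

// Hypothetical helper, not a Phobos function: returns the index of the
// earliest occurrence of any needle in haystack, or -1 if none occurs.
ptrdiff_t indexOfAnyString(string haystack, string[] needles)
{
    ptrdiff_t best = -1;
    foreach (needle; needles)
    {
        immutable pos = haystack.indexOf(needle);
        if (pos != -1 && (best == -1 || pos < best))
            best = pos;
    }
    return best;
}

void main()
{
    auto places = ["home", "office", "sea", "plane"];
    assert(indexOfAnyString("he is at home", places) == 9);
    assert(indexOfAnyString("he is at work", places) == -1);
}
```

This makes one pass over the haystack per needle, which is fine for a handful of short needles but is not meant as an optimized search.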
Jan 09 2015
parent reply ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Fri, 09 Jan 2015 09:36:01 +0000
FrankLike via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 Sorry, it's only an example. Thank you for working hard on it, but
 it's not what I want. The 'indexOfAny' function should do this work:
 given "he is at home" and ["home","office","sea","plane"], IndexOfAny
 can do it in C#. What about in D?
 I know findAmong can do it, but that takes two functions.
 Thank you.
be creative! ;-)

  import std.algorithm, std.stdio;

  void main () {
    string s = "he is at plane";
    if (findAmong!((string a, string b) => b.canFind(a))([s], ["home", "office", "sea", "plane"]).length) {
      writeln("got it!");
    } else {
      writeln("alas...");
    }
  }

or:

  import std.algorithm, std.stdio;

  void main () {
    string s = "he is at home";
    if (["home", "office", "sea", "plane"].canFind!((a, string b) => b.canFind(a))(s)) {
      writeln("got it!");
    } else {
      writeln("alas...");
    }
  }
Jan 09 2015
next sibling parent reply "FrankLike" <1150015857 qq.com> writes:
This code is the best, and it's better than indexOfAny in C#:

	import std.algorithm, std.stdio;

	void main()
	{
		auto places = ["home", "office", "sea", "plane"];
		auto strWhere = "He is in the sea.";
		auto where = places.canFind!(a => strWhere.canFind(a));
		writeln("Result is ", where);
	}
Jan 09 2015
parent ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Fri, 09 Jan 2015 12:46:53 +0000
FrankLike via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 This code is the best, and it's better than indexOfAny in C#:

 import std.algorithm, std.stdio;
 void main ()
 {
      auto places = [ "home", "office", "sea","plane"];
      auto strWhere = "He is in the sea.";
      auto where = places.canFind!(a => strWhere.canFind(a));
      writeln("Result is  ",where);
 }
this does unnecessary upvalue access (`strWhere`). try to avoid such stuff whenever it is possible.
Jan 09 2015
prev sibling parent reply "FrankLike" <1150015857 qq.com> writes:
On Friday, 9 January 2015 at 10:02:53 UTC, ketmar via 
Digitalmars-d-learn wrote:

   import std.algorithm, std.stdio;

   void main () {
     string s = "he is at home";
     if (["home", "office", "sea", "plane"].canFind!((a, string 
 b) => b.canFind(a))(s)) {
       writeln("got it!");
     } else {
       writeln("alas...");
     }
   }
Thank you.

This code is the best, and it's better than indexOfAny in C#:

	/* places.canFind!(a => strWhere.canFind(a)); */

Measured with

	auto r = benchmark!(f0,f1, f2, f3,f4)(10_0000);

the result is:

	filter           is 42ms 85us
	findAmong        is 37ms 268us
	foreach indexOf  is 37ms 841us
	canFind          is 13ms
	canFind indexOf  is 39ms 455us

-----------------------5 functions--------------------------
import std.stdio, std.algorithm, std.string;

auto places = [ "home", "office", "sea","plane"];
auto strWhere = "He is in the sea.";

void main()
{
	auto where = places.filter!(a => strWhere.indexOf(a) != -1);
	writeln("0 Result is ",where);

	auto where1 = findAmong(places,strWhere);
	writeln("1 Result is ",where1);

	string where2;
	foreach(a;places)
	{
		if(strWhere.indexOf(a) !=-1)
		{
			where2 = a;
			break;
		}
	}
	writeln("2 Result is ",where2);

	auto where3 = places.canFind!(a => strWhere.canFind(a));
	writeln("3 Result is ",where3);

	auto where4 = places.canFind!(a => strWhere.indexOf(a) != -1);
	writeln("4 Result is ",where4);
}

Frank
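The functions f0 to f4 passed to benchmark are not shown in the post. A minimal sketch of how such a harness could look, assuming the current std.datetime.stopwatch module and using only two of the variants; the names viaForeach, viaCanFind and sink are made up for this example.

```d
import std.algorithm.searching : canFind;
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;
import std.string : indexOf;

string[] places = ["home", "office", "sea", "plane"];
string strWhere = "He is in the sea.";
bool sink; // every candidate stores its result here so it is actually used

// Candidate 1: plain loop over indexOf.
void viaForeach()
{
    bool found = false;
    foreach (p; places)
    {
        if (strWhere.indexOf(p) != -1)
        {
            found = true;
            break;
        }
    }
    sink = found;
}

// Candidate 2: canFind with a predicate that searches the sentence.
void viaCanFind()
{
    sink = places.canFind!(p => strWhere.canFind(p));
}

void main()
{
    // Run each candidate 100_000 times; r holds one Duration per function.
    auto r = benchmark!(viaForeach, viaCanFind)(100_000);
    writeln("foreach indexOf: ", r[0]);
    writeln("canFind:         ", r[1]);
}
```

Timings from a harness like this depend heavily on compiler, flags and input size, so any numbers it prints are only comparable within one run.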
Jan 09 2015
parent reply ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Fri, 09 Jan 2015 13:06:09 +0000
FrankLike via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 By auto r = benchmark!(f0,f1, f2, f3,f4)(10_0000);

 Result is :
 filter is 42ms 85us
 findAmong is 37ms 268us
 foreach indexOf is 37ms 841us
 canFind is 13ms
 canFind indexOf is 39ms 455us
if you are *really* concerned with speed here, you'd better consider using regular expressions, as a regular expression can be precompiled and then search for multiple words with only one pass over the source string. i believe that std.regex will use a variation of the Thompson algorithm for regular expressions when it is able to do so.
Jan 09 2015
parent reply "Robert burner Schadek" <rburners gmail.com> writes:
On Friday, 9 January 2015 at 13:25:17 UTC, ketmar via 
Digitalmars-d-learn wrote:
 if you *really* concerned with speed here, you'd better 
 consider using
 regular expressions. as regular expression can be precompiled 
 and then
 search for multiple words with only one pass over the source 
 string. i
 believe that std.regex will use variation of Thomson algorithm 
 for
 regular expressions when it is able to do so.
IMO that is not sound advice. Creating the state machine and running it will be more costly than using canFind or indexOf, which basically only compare char by char.

If speed is really needed, use strstr and check whether it uses SSE to compare multiple chars at a time. Anyway, benchmark and then benchmark some more.
Jan 09 2015
parent reply ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Fri, 09 Jan 2015 13:54:00 +0000
Robert burner Schadek via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

 IMO that is not sound advice. Creating the state machine and
 running it will be more costly than using canFind or indexOf, which
 basically only compare char by char.

 If speed is really needed, use strstr and check whether it uses SSE
 to compare multiple chars at a time. Anyway, benchmark and then
 benchmark some more.
std.regex can use CTFE to compile regular expressions (yet it is sometimes slower than the non-CTFE variant), and i mean that we compile the regexp before doing a lot of searches, not before each single search. if you have a lot of words to match or a lot of strings to check, regexp can give a huge boost.

sure, it all depends on code patterns.
Jan 09 2015
next sibling parent reply "Robert burner Schadek" <rburners gmail.com> writes:
On Friday, 9 January 2015 at 14:03:21 UTC, ketmar via 
Digitalmars-d-learn wrote:

 std.regex can use CTFE to compile regular expressions (yet it 
 sometimes
 slower than non-CTFE variant), and i mean that we compile 
 regexp before
 doing alot of searches, not before each single search. if you 
 have alot
 of words to match or alot of strings to check, regexp can give 
 a huge
 boost.

 sure, it all depends of code patterns.
Even with CTFE, regex still uses a state machine; _mm256_cmpeq_epi8 will beat that even for multiple strings. Basically all lexers are handwritten; if regexes were fast enough, nobody would do the work.
Jan 09 2015
parent reply ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Fri, 09 Jan 2015 14:11:49 +0000
Robert burner Schadek via Digitalmars-d-learn
<digitalmars-d-learn puremagic.com> wrote:

 Even with CTFE, regex still uses a state machine; _mm256_cmpeq_epi8
 will beat that even for multiple strings. Basically all lexers are
 handwritten; if regexes were fast enough, nobody would do the work.
heh. regexps *are* fast enough. it's hard to beat well-optimised generated thingy on a complex grammar. ;-)
Jan 09 2015
parent "Robert burner Schadek" <rburners gmail.com> writes:
On Friday, 9 January 2015 at 14:21:04 UTC, ketmar via 
Digitalmars-d-learn wrote:

 heh. regexps *are* fast enough. it's hard to beat well-optimised
 generated thingy on a complex grammar. ;-)
I don't see your point, anyway I think he got his help or at least some help.
Jan 09 2015
prev sibling parent reply "FrankLike" <1150015857 qq.com> writes:
On Friday, 9 January 2015 at 14:03:21 UTC, ketmar via
Digitalmars-d-learn wrote:
 if you have a lot of words to match or a lot of strings to check,
 regexp can give a huge boost.
	import std.regex;

	auto ctr = ctRegex!(`(home|office|sea|plane)`);
	auto c2 = !matchFirst("He is in the sea.", ctr).empty;

----------------------------------------------------------
Test by

	auto r = benchmark!(f0,f1, f2, f3,f4,f5)(10_0000);

Result is:

	filter           is 42ms 85us
	findAmong        is 37ms 268us
	foreach indexOf  is 37ms 841us
	canFind          is 13ms
	canFind indexOf  is 39ms 455us
	ctRegex          is 138ms
Jan 09 2015
parent reply ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
On Fri, 09 Jan 2015 15:36:21 +0000
FrankLike via Digitalmars-d-learn <digitalmars-d-learn puremagic.com>
wrote:

 auto ctr = ctRegex!(`(home|office|sea|plane)`);
 auto c2 = !matchFirst("He is in the sea.", ctr).empty;
 [...]
 canFind is 13ms
 canFind indexOf is 39ms 455us
 ctRegex is 138ms
1. stop doing captures in the regexp, this will speed up the comparison.
2. your sample is very artificial. i was talking about a lot more keywords and a lot longer strings. sorry, i didn't say that clearly enough.
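A small sketch of both points, assuming std.regex's run-time regex function and its (?:...) non-capturing group syntax. The input lines are invented for illustration, and nothing here says whether this beats canFind on any particular workload.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // Compile once. (?:...) groups the alternatives without capturing them.
    auto re = regex(`(?:home|office|sea|plane)`);

    auto lines = ["He is in the sea.", "She is at work.", "They are at home."];
    foreach (line; lines)
    {
        // The same compiled pattern is reused for every input string.
        auto m = matchFirst(line, re);
        writeln(line, " -> ", m.empty ? "no match" : m.hit);
    }
}
```

Measuring this against the other variants on realistic inputs, with many keywords and long strings, is the only way to tell which one wins.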
Jan 09 2015
parent "FrankLike" <1150015857 qq.com> writes:
On Friday, 9 January 2015 at 15:57:21 UTC, ketmar via
Digitalmars-d-learn wrote:
 1. stop doing captures in the regexp, this will speed up the comparison.
 2. your sample is very artificial. i was talking about a lot more
 keywords and a lot longer strings.
Yes. With 'a lot more keywords and a lot longer strings', regex will do better. Thank you.
Jan 09 2015