digitalmars.D.learn - Why do the same work about 'IndexOfAny' and 'indexOf' function?

"FrankLike" <1150015857 qq.com> writes:

I want to know whether the string strs contains any of several substrings.

I can do: int index = indexofany(strs,["exe","dll","a","lib"]);
but in D I have to do it like this:

findStr(strs, ["exe","lib","dll","a"]);

import std.string : indexOf;

// Returns true if strIn contains any of the strings in strFind.
bool findStr(string strIn, string[] strFind)
{
    foreach (str; strFind)
    {
        if (strIn.indexOf(str) != -1)
            return true;
    }
    return false;
}

Could a function like this be added to Phobos's std.string, or could
indexOfAny be improved to do this?

Thank you.

Frank

Jan 07 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

std.algorithm.canFind will do what you want, including telling 
you which of ["exe","lib","dll","a"] was found.

If you need to know where in strs it was found as well, you can 
use std.algorithm.find
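
A minimal sketch of the multi-needle canFind overload referred to above. The needles are passed as separate arguments, not as one array. The helper name whichExtension is made up for this illustration; per the Phobos documentation, the result is the 1-based index of the needle that matched, or 0 if none did.

```d
import std.algorithm.searching : canFind;
import std.stdio : writeln;

// Hypothetical helper: 1-based index of the needle found in name, 0 if none.
size_t whichExtension(string name)
{
    // Each needle is its own argument, so the multi-needle overload is chosen.
    return name.canFind("exe", "lib", "dll", "a");
}

void main()
{
    writeln("hello.exe -> ", whichExtension("hello.exe"));
    writeln("hello.    -> ", whichExtension("hello."));
}
```

Used directly in an if condition, a non-zero result counts as true, so the same call doubles as a yes/no test.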

Jan 07 2015

"FrankLike" <1150015857 qq.com> writes:

Sorry, but 'std.algorithm.find' does this: "Finds an individual
element in an input range". Its parameters are:
InputRange haystack: the range searched in.
Element needle: the element searched for.

What I want to know is whether a string (like "hello.exe",
"hello.a", "hello.dll" or "hello.lib") contains any of these:
["exe","dll","a","lib"].

My function 'findStr' works fine. I would be happy if std.string's
'indexOfAny' did this work (but right now 'indexOfAny' and 'indexOf'
do the same work).

Thank you.

Jan 07 2015

"bearophile" <bearophileHUGS lycos.com> writes:

FrankLike:

 But now I want to know in a string (like "hello.exe" or 
 "hello.a",or "hello.dll" or "hello.lib" ) whether contains any 
 of them: ["exe","dll","a","lib"].

Seems this:
http://rosettacode.org/wiki/File_extension_is_in_extensions_list#D

Bye,
bearophile

Jan 07 2015

"Tobias Pankrath" <tobias pankrath.net> writes:

Which uses this overload:

size_t canFind(Range, Ranges...)(Range haystack, Ranges needles)

Jan 07 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Wednesday, 7 January 2015 at 15:57:18 UTC, FrankLike wrote:
 Sorry, 'std.algorithm.find' do this work:Finds an individual 
 element in an input range,and it's Parameters: InputRange 
 haystack The range searched in.
 Element needle The element searched for.

std.algorithm.find has several overloads, one of which takes 
multiple needles. The same is true for std.algorithm.canFind

Quoting from the relevant std.algorithm.find overload docs: 
"Finds two or more needles into a haystack."

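A short sketch of the multi-needle find overload quoted above, for the case where the position of the match matters. The helper name firstOffset is hypothetical. As the Phobos documentation describes it, the overload returns a tuple of the remaining haystack (starting at the match) and the 1-based index of the needle that matched.

```d
import std.algorithm.searching : find;
import std.stdio : writeln;

// Hypothetical helper: offset in haystack of the earliest match of any
// needle, or -1 if none of them occurs.
ptrdiff_t firstOffset(string haystack)
{
    // r[0] is the rest of haystack starting at the match (empty if no match),
    // r[1] is the 1-based index of the needle that matched (0 if none).
    auto r = haystack.find("exe", "lib", "dll", "a");
    if (r[1] == 0)
        return -1;
    return cast(ptrdiff_t)(haystack.length - r[0].length);
}

void main()
{
    writeln(firstOffset("hello.exe"));
    writeln(firstOffset("hello."));
}
```

The offset is counted in code units, which matches character positions only for ASCII input such as the file names in this thread.
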
Jan 07 2015

"FrankLike" <1150015857 qq.com> writes:

 std.algorithm.find has several overloads, one of which takes 
 multiple needles. The same is true for std.algorithm.canFind

 Quoting from the relevant std.algorithm.find overload docs: 
 "Finds two or more needles into a haystack."

    string strs = "hello.exe";
    string[] s = ["lib","exe","a","dll"];
    auto a = canFind!(string,string[])(strs,s);
    writeln("a is ",a);
    string strsb = "hello.";
    auto b = canFind!(string,string[])(strsb,s);
    writeln("b is ",b);

I get this error:
does not match template declaration canFind(alias pred = "a ==b")

You can test it.
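
The error message suggests why this fails: the only explicit template parameter of canFind is the predicate (alias pred), so the type list !(string, string[]) cannot match it. A sketch of one way to test a run-time array of substrings while letting the compiler infer the types; containsAny is a hypothetical name, not a Phobos function.

```d
import std.algorithm.searching : any, canFind;
import std.stdio : writeln;

// Hypothetical helper: true if haystack contains at least one of needles.
bool containsAny(string haystack, string[] needles)
{
    // canFind(string, string) is a substring search; any stops at the
    // first needle for which it succeeds.
    return needles.any!(n => haystack.canFind(n));
}

void main()
{
    string[] s = ["lib", "exe", "a", "dll"];
    writeln("a is ", containsAny("hello.exe", s));
    writeln("b is ", containsAny("hello.", s));
}
```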

Thank you.

Jan 07 2015

"H. S. Teoh via Digitalmars-d-learn" <digitalmars-d-learn puremagic.com> writes:

Try this:




T

-- 
MACINTOSH: Most Applications Crash, If Not, The Operating System Hangs

Jan 07 2015

"FrankLike" <1150015857 qq.com> writes:

On Wednesday, 7 January 2015 at 17:08:55 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:
 Try this:




 T

Do you mean this? The result is not what I want to get!

---------------test.d--------------
import std.stdio, std.algorithm, std.string;

auto ext = ["exe","lib","a","dll"];
auto strs = "hello.exe";

void main()
{
    auto b = findAmong(ext, strs);
    writeln("b is ", b);
}
---------result-----
b is ["exe","lib","a","dll"]
--------------------
Notes:
1. I only want to find out whether the given string 'hello.exe'
contains any of the strings in ["exe","lib","a","dll"].
2. I think the 'indexOfAny' function in std.string does the same work
as 'indexOf'. This is not as it should be.

Frank

Jan 08 2015

"FrankLike" <1150015857 qq.com> writes:

On Wednesday, 7 January 2015 at 17:08:55 UTC, H. S. Teoh via 
Digitalmars-d-learn wrote:
 Try this:




 T

Thank you, it works, but it's not what I want.

---------------test.d--------------
import std.stdio, std.algorithm, std.string;

auto ext = ["exe","lib","a","dll"];
auto strs = "hello.dll";

void main()
{
    auto b = findAmong(ext, strs);
    writeln("b is ", b);
}
---------result-----
b is ["dll"]
--------------------
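
The two results above fit how findAmong is documented to behave: it returns the tail of its first argument starting at the first element that matches, so ["exe","lib","a","dll"] for "hello.exe" means "exe" was found first. A sketch that turns this into a yes/no answer; containsAnyOf is a hypothetical helper name.

```d
import std.algorithm.searching : findAmong;
import std.stdio : writeln;

// Hypothetical helper: true if haystack contains any string in needles.
bool containsAnyOf(string haystack, string[] needles)
{
    // The result is the tail of needles beginning at the first needle that
    // occurs in haystack; an empty result means no needle was found.
    return findAmong(needles, haystack).length != 0;
}

void main()
{
    auto ext = ["exe", "lib", "a", "dll"];
    writeln(containsAnyOf("hello.dll", ext));
    writeln(containsAnyOf("hello.", ext));
}
```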

I think it would be fine if the 'indexOfAny' function in std.string did
this work, such as:

    auto b = "hello.dll".indexOfAny(["exe","lib","a","dll"]);
    writeln("b is ", b);

The result should be 'true', if it worked.

Could you suggest that Phobos update the 'indexOfAny' function?
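
For comparison, a small sketch of what std.string.indexOfAny does as far as the Phobos documentation describes it: its second argument is a set of characters, not a list of substrings. The two functions only look alike when the needle is a single character.

```d
import std.stdio : writeln;
import std.string : indexOf, indexOfAny;

void main()
{
    // indexOf looks for the whole substring "dx", which does not occur.
    writeln("hello.dll".indexOf("dx"));
    // indexOfAny treats "dx" as a set of characters and finds the 'd'.
    writeln("hello.dll".indexOfAny("dx"));
}
```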

Thank you.
Frank

Jan 08 2015

"Robert burner Schadek" <rburners gmail.com> writes:

Use canFind like this:
     bool a = canFind(strs,s) >= 1;

Let the compiler figure out what the types of the parameters are.

Jan 08 2015

"FrankLike" <1150015857 qq.com> writes:

On Thursday, 8 January 2015 at 15:15:59 UTC, Robert burner 
Schadek wrote:
 use canFind like such:
     bool a = canFind(strs,s) >= 1;

 let the compiler figger out what the types of the parameter are.

canFind works for cases such as:
  bool x = canFind(["exe","lib","a","dll"], "a");
but it doesn't work for canFind(["exe","lib","a","dll"], "hello.lib");

So I would very much like the function 'indexOfAny' to do this work.

Thank you.

Frank

Jan 08 2015

ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

be creative! ;-)

  import std.algorithm, std.stdio;

  void main () {
    string fname = "hello.exe";
    import std.path : extension;
    if (findAmong([fname.extension], [".exe", ".lib", ".a", ".dll"]).length) {
      writeln("got it!");
    } else {
      writeln("alas...");
    }
  }

note the dots in the extension list.

yet you can do it even more easily:

  import std.algorithm, std.stdio;

  void main () {
    string fname = "hello.exe";
    import std.path : extension;
    if ([".exe", ".lib", ".a", ".dll"].canFind(fname.extension)) {
      writeln("got it!");
    } else {
      writeln("alas...");
    }
  }

as you are obviously interested in the extension here -- check only that
part! ;-)

Jan 08 2015

"FrankLike" <1150015857 qq.com> writes:

Sorry, it's only an example. Thank you for working hard on it, but it's
not what I want.
The 'indexOfAny' function should do this work:
"he is at home", ["home","office","sea","plane"], in ...

I know findAmong can do it, but that uses two functions.
Thank you.
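
A sketch of what such a single function could look like if written by hand. The name indexOfAnySubstring is hypothetical and not part of Phobos; it simply calls std.string.indexOf once per needle and keeps the smallest position.

```d
import std.stdio : writeln;
import std.string : indexOf;

// Hypothetical helper: smallest index at which any needle occurs in
// haystack, or -1 if none of them occurs.
ptrdiff_t indexOfAnySubstring(string haystack, string[] needles)
{
    ptrdiff_t best = -1;
    foreach (needle; needles)
    {
        immutable pos = haystack.indexOf(needle);
        if (pos != -1 && (best == -1 || pos < best))
            best = pos;
    }
    return best;
}

void main()
{
    auto places = ["home", "office", "sea", "plane"];
    writeln(indexOfAnySubstring("he is at home", places));
    writeln(indexOfAnySubstring("he is asleep", places));
}
```

This makes one pass over the haystack per needle, which is fine for a handful of short needles.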

Jan 09 2015

ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

be creative! ;-)

  import std.algorithm, std.stdio;

  void main () {
    string s = "he is at plane";
    if (findAmong!((string a, string b) => b.canFind(a))([s], ["home", "office", "sea", "plane"]).length) {
      writeln("got it!");
    } else {
      writeln("alas...");
    }
  }

or:

  import std.algorithm, std.stdio;

  void main () {
    string s = "he is at home";
    if (["home", "office", "sea", "plane"].canFind!((a, string b) => b.canFind(a))(s)) {
      writeln("got it!");
    } else {
      writeln("alas...");
    }
  }

Jan 09 2015

"FrankLike" <1150015857 qq.com> writes:

The code is the best, and it's better than indexOfAny in C#:

import std.algorithm, std.stdio;

void main()
{
    auto places = ["home", "office", "sea", "plane"];
    auto strWhere = "He is in the sea.";
    auto where = places.canFind!(a => strWhere.canFind(a));
    writeln("Result is  ", where);
}

Jan 09 2015

ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

this does an unnecessary upvalue access (`strWhere`). try to avoid such
stuff whenever possible.

Jan 09 2015

"FrankLike" <1150015857 qq.com> writes:

On Friday, 9 January 2015 at 10:02:53 UTC, ketmar via 
Digitalmars-d-learn wrote:

   import std.algorithm, std.stdio;

   void main () {
     string s = "he is at home";
     if (["home", "office", "sea", "plane"].canFind!((a, string 
 b) => b.canFind(a))(s)) {
       writeln("got it!");
     } else {
       writeln("alas...");
     }
   }

Thank you.



/*  places.canFind!(a => strWhere.canFind(a));  */

Benchmarked with: auto r = benchmark!(f0, f1, f2, f3, f4)(10_0000);

The results are:
filter is          42ms 85us
findAmong is       37ms 268us
foreach indexOf is 37ms 841us
canFind is         13ms
canFind indexOf is 39ms 455us

-----------------------5 functions--------------------------
import std.stdio, std.algorithm, std.string;

auto places = ["home", "office", "sea", "plane"];
auto strWhere = "He is in the sea.";

void main()
{
    auto where = places.filter!(a => strWhere.indexOf(a) != -1);
    writeln("0 Result is  ", where);

    auto where1 = findAmong(places, strWhere);
    writeln("1 Result is  ", where1);

    string where2;
    foreach (a; places)
    {
        if (strWhere.indexOf(a) != -1)
        {
            where2 = a;
            break;
        }
    }
    writeln("2 Result is  ", where2);

    auto where3 = places.canFind!(a => strWhere.canFind(a));
    writeln("3 Result is  ", where3);

    auto where4 = places.canFind!(a => strWhere.indexOf(a) != -1);
    writeln("4 Result is  ", where4);
}
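
The benchmarked functions f0 to f4 are not shown in the post. The following is only a sketch of how two of the variants might be wrapped and timed, assuming a current Phobos where benchmark lives in std.datetime.stopwatch (in 2015 it was in std.datetime). The names viaCanFind and viaIndexOf are made up, and no timings are implied.

```d
import std.algorithm.searching : canFind;
import std.datetime.stopwatch : benchmark;
import std.stdio : writefln;
import std.string : indexOf;

string[] places = ["home", "office", "sea", "plane"];
string strWhere = "He is in the sea.";

// Hypothetical wrappers around two of the variants from the post.
bool viaCanFind() { return places.canFind!(a => strWhere.canFind(a)); }
bool viaIndexOf() { return places.canFind!(a => strWhere.indexOf(a) != -1); }

void main()
{
    // Runs each function 100_000 times and returns one Duration per function.
    auto r = benchmark!(viaCanFind, viaIndexOf)(100_000);
    writefln("canFind:         %s", r[0]);
    writefln("canFind indexOf: %s", r[1]);
}
```

Results from such a micro-benchmark depend heavily on compiler flags and input size, so they are best read as relative hints only.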

Frank

Jan 09 2015

ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

if you are *really* concerned with speed here, you'd better consider using
regular expressions, as a regular expression can be precompiled and then
search for multiple words with only one pass over the source string. i
believe that std.regex will use a variation of the Thompson algorithm for
regular expressions when it is able to do so.

Jan 09 2015

"Robert burner Schadek" <rburners gmail.com> writes:

IMO that is not sound advice. Creating the state machine and
running it will be more costly than using canFind or indexOf, which
basically only compare char by char.

If speed is really needed, use strstr and check whether it uses SSE to
compare multiple chars at a time. Anyway, benchmark and then
benchmark some more.

Jan 09 2015

ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

std.regex can use CTFE to compile regular expressions (yet it is sometimes
slower than the non-CTFE variant), and i mean that we compile the regexp
before doing a lot of searches, not before each single search. if you have
a lot of words to match or a lot of strings to check, regexp can give a huge
boost.

sure, it all depends on code patterns.

Jan 09 2015

"Robert burner Schadek" <rburners gmail.com> writes:

Even with CTFE, regex still uses a state machine; _mm256_cmpeq_epi8
will beat that even for multiple strings. Basically all lexers are
handwritten; if regex were fast enough, nobody would do the work.

Jan 09 2015

ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

heh. regexps *are* fast enough. it's hard to beat well-optimised
generated thingy on a complex grammar. ;-)

Jan 09 2015

"Robert burner Schadek" <rburners gmail.com> writes:

On Friday, 9 January 2015 at 14:21:04 UTC, ketmar via 
Digitalmars-d-learn wrote:

 heh. regexps *are* fast enough. it's hard to beat well-optimised
 generated thingy on a complex grammar. ;-)

I don't see your point, anyway I think he got his help or at 
least some help.

Jan 09 2015

"FrankLike" <1150015857 qq.com> writes:

import std.regex;
auto ctr = ctRegex!(`(home|office|sea|plane)`);
auto c2 = !matchFirst("He is in the sea.", ctr).empty;
----------------------------------------------------------
Tested with: auto r = benchmark!(f0, f1, f2, f3, f4, f5)(10_0000);

The results are:
filter is          42ms 85us
findAmong is       37ms 268us
foreach indexOf is 37ms 841us
canFind is         13ms
canFind indexOf is 39ms 455us
ctRegex is         138ms

Jan 09 2015

ketmar via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:

1. stop doing captures in the regexp, this will speed up the comparison.
2. your sample is very artificial. i was talking about a lot more
keywords and a lot longer strings. sorry, i didn't say that clearly
enough.
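
A sketch of the first point, assuming it refers to the parenthesised group in the pattern: writing the alternation without parentheses avoids the extra capture group (the overall match is still recorded). The helper name mentionsAny is hypothetical; the regex is built once and reused for every input line. Whether this changes the timings is not claimed here.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

// Hypothetical helper: true if text matches the given compiled regex.
bool mentionsAny(R)(string text, R re)
{
    return !matchFirst(text, re).empty;
}

void main()
{
    // Plain alternation, no parenthesised capture group; compiled once.
    auto places = regex(`home|office|sea|plane`);
    foreach (line; ["He is in the sea.", "He is at work."])
        writeln(line, " -> ", mentionsAny(line, places));
}
```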

Jan 09 2015

"FrankLike" <1150015857 qq.com> writes:

Yes. With 'a lot more keywords and a lot longer strings', regex
will be better.
Thank you.

Jan 09 2015
