digitalmars.D.learn - Which option is faster...

jicman (31/31) Aug 05 2013 Greetings!

Timon Gehr (2/33) Aug 05 2013 They are both equally slow.

jicman (2/42) Aug 05 2013 How would you make it faster in D1?

bearophile (10/11) Aug 05 2013 Compute std.string.tolower(fext[0]) and put it in a temporary

jicman (2/13) Aug 05 2013 Thanks. This is great.

jicman (10/41) Aug 05 2013 Second option...
dennis luehring (6/37) Aug 05 2013 did you benchmarked your current szenario - how do you know that this is...

jicman (2/9) Aug 05 2013 Ok, how would you make it faster?

dennis luehring (8/19) Aug 05 2013 i don't see a better solution here - how to reduce ONE lowercase and

jicman (5/28) Aug 05 2013 It is a tool that was a script, but I have turned it into do,

dennis luehring (8/14) Aug 05 2013 so its totaly unclear if the presented code is your 2h monster, what was

jicman (9/27) Aug 05 2013 The files are in a network drive, so, that has some slowness

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (9/15) Aug 05 2013 And that is proof that a sizable amount of time is spent during
dennis luehring (19/39) Aug 05 2013 can you describe more detailed what are you doing - are you also reading

H. S. Teoh (15/22) Aug 05 2013 If you really want optimal performance, use std.regex:
Andrej Mitrovic (3/4) Aug 06 2013 Yes and if you also want to bloat your executable to the point that

John Colvin (25/56) Aug 05 2013 better:

jicman (2/66) Aug 05 2013 As the Hispanic speaking community say, muchas gracias.
monarch_dodra (49/74) Aug 05 2013 Arguably, you'd do even better with:

jicman (5/86) Aug 05 2013 :-) Now that's funny. He he he he...

David (7/24) Aug 05 2013 That's how I would do it.

John Colvin (5/31) Aug 05 2013 In case some of the strings are actually rather long, one could

JS (39/70) Aug 05 2013 Both are slow, those trying to optimize out tolower are missing

H. S. Teoh (13/71) Aug 05 2013 [...]

=?UTF-8?B?UmFwaGHDq2wgSmFrc2U=?= (74/105) Aug 05 2013 Like others said, writing if( ...) continue; if(...) continue; versus
Andre Artus (9/40) Aug 05 2013 What exactly are you trying to do with this? I get the impression

jicman (4/51) Aug 06 2013 The files are in a network drive and doing a list foreach *.doc,

Andre Artus (10/65) Aug 06 2013 Again, what are you trying to achieve?

jicman (8/75) Aug 06 2013 It's a long story and I will return in a few months and give you

dennis luehring (24/40) Aug 06 2013 after makeing us girls all wet to help you - your reply is

Andre Artus (11/44) Aug 06 2013 Just for a sanity check I implemented a quick client-server setup

H. S. Teoh (12/27) Aug 06 2013 In this case, the bottleneck most likely is in the network latency, so

H. S. Teoh (16/50) Aug 05 2013 [...]
Jonathan M Davis (23/56) Aug 05 2013 As others have pointed out, the two code snippets are semantically ident...

"jicman" <cabrera__SP M__wrc.xerox.com> writes:

Greetings!

I have this code,

foreach (...)
{

   if (std.string.tolower(fext[0]) == "doc" ||
     std.string.tolower(fext[0]) == "docx" ||
     std.string.tolower(fext[0]) == "xls" ||
     std.string.tolower(fext[0]) == "xlsx" ||
     std.string.tolower(fext[0]) == "ppt" ||
     std.string.tolower(fext[0]) == "pptx")
    continue;
}

foreach (...)
{
   if (std.string.tolower(fext[0]) == "doc")
     continue;
   if (std.string.tolower(fext[0]) == "docx")
     continue;
   if (std.string.tolower(fext[0]) == "xls")
     continue;
   if (std.string.tolower(fext[0]) == "xlsx")
     continue;
   if (std.string.tolower(fext[0]) == "ppt")
     continue;
   if (std.string.tolower(fext[0]) == "pptx")
    continue;
   ...
   ...
}

thanks.

josé

Aug 05 2013

Timon Gehr <timon.gehr gmx.ch> writes:

On 08/05/2013 03:59 PM, jicman wrote:
 Greetings!

 I have this code,

 foreach (...)
 {

    if (std.string.tolower(fext[0]) == "doc" ||
      std.string.tolower(fext[0]) == "docx" ||
      std.string.tolower(fext[0]) == "xls" ||
      std.string.tolower(fext[0]) == "xlsx" ||
      std.string.tolower(fext[0]) == "ppt" ||
      std.string.tolower(fext[0]) == "pptx")
     continue;
 }

 foreach (...)
 {
    if (std.string.tolower(fext[0]) == "doc")
      continue;
    if (std.string.tolower(fext[0]) == "docx")
      continue;
    if (std.string.tolower(fext[0]) == "xls")
      continue;
    if (std.string.tolower(fext[0]) == "xlsx")
      continue;
    if (std.string.tolower(fext[0]) == "ppt")
      continue;
    if (std.string.tolower(fext[0]) == "pptx")
     continue;
    ...
    ...
 }

 thanks.

 josé

They are both equally slow.

Aug 05 2013

"jicman" <cabrera_ _wrc.xerox.com> writes:

On Monday, 5 August 2013 at 14:13:33 UTC, Timon Gehr wrote:
 On 08/05/2013 03:59 PM, jicman wrote:
 Greetings!

 I have this code,

 foreach (...)
 {

   if (std.string.tolower(fext[0]) == "doc" ||
     std.string.tolower(fext[0]) == "docx" ||
     std.string.tolower(fext[0]) == "xls" ||
     std.string.tolower(fext[0]) == "xlsx" ||
     std.string.tolower(fext[0]) == "ppt" ||
     std.string.tolower(fext[0]) == "pptx")
    continue;
 }

 foreach (...)
 {
   if (std.string.tolower(fext[0]) == "doc")
     continue;
   if (std.string.tolower(fext[0]) == "docx")
     continue;
   if (std.string.tolower(fext[0]) == "xls")
     continue;
   if (std.string.tolower(fext[0]) == "xlsx")
     continue;
   if (std.string.tolower(fext[0]) == "ppt")
     continue;
   if (std.string.tolower(fext[0]) == "pptx")
    continue;
   ...
   ...
 }

 thanks.

 josé

 They are both equally slow.

How would you make it faster in D1?

Aug 05 2013

"bearophile" <bearophileHUGS lycos.com> writes:

jicman:

 How would you make it faster in D1?

Compute std.string.tolower(fext[0]) and put it in a temporary 
variable. And then compare that variable with all your string 
literals. In most cases that's fast enough. If it's not enough, 
you could create a little finite state machine that represents 
your directed acyclic word graph, and uses gotos to jump around 
states. The amount of strings is small, so perhaps there are not 
enough code cache misses to nullify this optimization.

Bye,
bearophile

Aug 05 2013

"jicman" <cabrera_ _wrc.xerox.com> writes:

On Monday, 5 August 2013 at 14:50:27 UTC, bearophile wrote:
 jicman:

 How would you make it faster in D1?

 Compute std.string.tolower(fext[0]) and put it in a temporary 
 variable. And then compare that variable with all your string 
 literals. In most cases that's fast enough. If it's not enough, 
 you could create a little finite state machine that represents 
 your directed acyclic word graph, and uses gotos to jump around 
 states. The amount of strings is small, so perhaps there are 
 not enough code cache misses to nullify this optimization.

 Bye,
 bearophile

Thanks.  This is great.

Aug 05 2013

"jicman" <cabrera_ _wrc.xerox.com> writes:

On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
 Greetings!

 I have this code,

First option...

 foreach (...)
 {

   if (std.string.tolower(fext[0]) == "doc" ||
     std.string.tolower(fext[0]) == "docx" ||
     std.string.tolower(fext[0]) == "xls" ||
     std.string.tolower(fext[0]) == "xlsx" ||
     std.string.tolower(fext[0]) == "ppt" ||
     std.string.tolower(fext[0]) == "pptx")
    continue;
 }

Second option...
 foreach (...)
 {
   if (std.string.tolower(fext[0]) == "doc")
     continue;
   if (std.string.tolower(fext[0]) == "docx")
     continue;
   if (std.string.tolower(fext[0]) == "xls")
     continue;
   if (std.string.tolower(fext[0]) == "xlsx")
     continue;
   if (std.string.tolower(fext[0]) == "ppt")
     continue;
   if (std.string.tolower(fext[0]) == "pptx")
    continue;
   ...
   ...
 }

 thanks.

 josé

So, after I saw this post I asked myself, what?  So, the question 
is: which of the two foreach loops options are faster:
1. The concatenated if ||
2. The single if

I am trying to see if it matters.  I have a project with lots of 
files and if one is faster, then, it will matter to code it the 
faster way.  Thanks.

Aug 05 2013

dennis luehring <dl.soluz gmx.net> writes:

did you benchmarked your current szenario - how do you know that this is 
the slow part - or are you working on an only-extension-compare-tool?

btw: they are both equal and slow - and full of partly code-duplication
std.string.tolower(fext[0]) multiple times, i hope your list isn't going 
much longer

Am 05.08.2013 15:59, schrieb jicman:
 Greetings!

 I have this code,

 foreach (...)
 {

     if (std.string.tolower(fext[0]) == "doc" ||
       std.string.tolower(fext[0]) == "docx" ||
       std.string.tolower(fext[0]) == "xls" ||
       std.string.tolower(fext[0]) == "xlsx" ||
       std.string.tolower(fext[0]) == "ppt" ||
       std.string.tolower(fext[0]) == "pptx")
      continue;
 }

 foreach (...)
 {
     if (std.string.tolower(fext[0]) == "doc")
       continue;
     if (std.string.tolower(fext[0]) == "docx")
       continue;
     if (std.string.tolower(fext[0]) == "xls")
       continue;
     if (std.string.tolower(fext[0]) == "xlsx")
       continue;
     if (std.string.tolower(fext[0]) == "ppt")
       continue;
     if (std.string.tolower(fext[0]) == "pptx")
      continue;
     ...
     ...
 }

 thanks.

 josé

Aug 05 2013

"jicman" <cabrera_ _wrc.xerox.com> writes:

On Monday, 5 August 2013 at 14:27:43 UTC, dennis luehring wrote:
 did you benchmarked your current szenario - how do you know 
 that this is the slow part - or are you working on an 
 only-extension-compare-tool?

 btw: they are both equal and slow - and full of partly 
 code-duplication
 std.string.tolower(fext[0]) multiple times, i hope your list 
 isn't going much longer

Ok, how would you make it faster?

Aug 05 2013

dennis luehring <dl.soluz gmx.net> writes:

 Ok, how would you make it faster?

i don't see a better solution here - how to reduce ONE lowercase and 
SOME compares in any way? (i dont think a hash or something will help) 
but i know that anything like your continue-party is worth nothing 
(feels a little bit like script-kiddies "do it with assembler that would 
it make million times faster" blabla)

question: is this the slow part in your project?
do you know it for sure or just an emotion - HOW do you benchmark?

Am 05.08.2013 16:31, schrieb jicman:
 On Monday, 5 August 2013 at 14:27:43 UTC, dennis luehring wrote:
 did you benchmarked your current szenario - how do you know
 that this is the slow part - or are you working on an
 only-extension-compare-tool?

 btw: they are both equal and slow - and full of partly
 code-duplication
 std.string.tolower(fext[0]) multiple times, i hope your list
 isn't going much longer

 Ok, how would you make it faster?

Aug 05 2013

"jicman" <cabrera_ _wrc.xerox.com> writes:

On Monday, 5 August 2013 at 14:47:37 UTC, dennis luehring wrote:
 Ok, how would you make it faster?

 i don't see a better solution here - how to reduce ONE 
 lowercase and SOME compares in any way? (i dont think a hash or 
 something will help) but i know that anything like your 
 continue-party is worth nothing (feels a little bit like 
 script-kiddies "do it with assembler that would it make million 
 times faster" blabla)

 question: is this the slow part in your project?
 do you know it for sure or just an emotion - HOW do you 
 benchmark?

 Am 05.08.2013 16:31, schrieb jicman:
 On Monday, 5 August 2013 at 14:27:43 UTC, dennis luehring 
 wrote:
 did you benchmarked your current szenario - how do you know
 that this is the slow part - or are you working on an
 only-extension-compare-tool?

 btw: they are both equal and slow - and full of partly
 code-duplication
 std.string.tolower(fext[0]) multiple times, i hope your list
 isn't going much longer

 Ok, how would you make it faster?


It is a tool that was a script, but I have turned it into do, 
which now has taken two hours from the last jscript script.  I 
have not benchmarked it, yet.  I may.  But I see that a great 
idea has been provided, which I will use.  Thanks for the help.

Aug 05 2013

dennis luehring <dl.soluz gmx.net> writes:

Am 05.08.2013 17:18, schrieb jicman:
 It is a tool that was a script, but I have turned it into do,
 which now has taken two hours from the last jscript script.  I
 have not benchmarked it, yet.  I may.  But I see that a great
 idea has been provided, which I will use.  Thanks for the help.

 have not benchmarked it, yet.  I may.

so its totaly unclear if the presented code is your 2h monster, what was
the runtime of your jscript?

 But I see that a great idea has been provided

using a local variable for not lowercasing on each if is not an great 
idea it is default programming style

and i don't think you're be able to implement the tree statemachine
when doing such simple performance killer like multiple lowercase calls, 
and try to help youselfe by introducing "continue"...

Aug 05 2013

"jicman" <cabrera_ _wrc.xerox.com> writes:

On Monday, 5 August 2013 at 15:27:09 UTC, dennis luehring wrote:
 Am 05.08.2013 17:18, schrieb jicman:
 It is a tool that was a script, but I have turned it into do,
 which now has taken two hours from the last jscript script.  I
 have not benchmarked it, yet.  I may.  But I see that a great
 idea has been provided, which I will use.  Thanks for the help.

 have not benchmarked it, yet.  I may.

 so its totaly unclear if the presented code is your 2h monster, 
 what was
 the runtime of your jscript?

The files are in a network drive, so, that has some slowness 
already involved because of that.  The jscript use to take over 8 
hours.  The new D program has dropped that to under less than 
six.  This is huge to us.  But, I know that I can probably fine 
tune the program to make it a few minutes less. :-)

 But I see that a great idea has been provided

 using a local variable for not lowercasing on each if is not an 
 great idea it is default programming style

This will help.  If there are 100K files, which I know that there 
are more than that, it will help a little bit.
 and i don't think you're be able to implement the tree 
 statemachine
 when doing such simple performance killer like multiple 
 lowercase calls, and try to help youselfe by introducing 
 "continue"...

Perhaps, but it's a good idea, nonetheless.

Aug 05 2013

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 08/05/2013 10:04 AM, jicman wrote:

 The files are in a network drive, so, that has some slowness already
 involved because of that.

Which may dominate the running time.

 The jscript use to take over 8 hours.  The new D program has
 dropped that to under less than six.

And that is proof that a sizable amount of time is spent during 
computations. :)

 But, I know that I can probably fine tune the program to make
 it a few minutes less. :-)

You really should profile where the time is spent. If file I/O is still 
a big issue, there are WAN optimization products that may solve that 
problem amazingly efficiently:

   http://en.wikipedia.org/wiki/WAN_optimization

Ali

Aug 05 2013

dennis luehring <dl.soluz gmx.net> writes:

Am 05.08.2013 19:04, schrieb jicman:
 so its totaly unclear if the presented code is your 2h monster,
 what was
 the runtime of your jscript?

 The files are in a network drive, so, that has some slowness
 already involved because of that.  The jscript use to take over 8
 hours.  The new D program has dropped that to under less than
 six.  This is huge to us.  But, I know that I can probably fine
 tune the program to make it a few minutes less. :-)

can you describe more detailed what are you doing - are you also reading
these files? why not run your tool on the server and only collect the 
results over network (wouldn't that be much faster)?

what is the impact of using the networkdrive? just copy your biggeste 
szenario on a local machine an run it against this to get a feeling how 
painfull the networkdrive communication is

 But I see that a great idea has been provided

 using a local variable for not lowercasing on each if is not an
 great idea it is default programming style

 This will help.  If there are 100K files, which I know that there
 are more than that, it will help a little bit.
 and i don't think you're be able to implement the tree
 statemachine
 when doing such simple performance killer like multiple
 lowercase calls, and try to help youselfe by introducing
 "continue"...

 Perhaps, but it's a good idea, nonetheless.

maybe your program can be better optimized globaly, but we need more 
information of what is the programing doing - can you give some
psuedo code like:

1. read all filenames recursivley -> 100k filenames
2. reduce down to known extensions -> 10k filenames
3. ...
4. ...
5. ...

i don't think that your lower-case is the major part of the 6h, there
must be other very slow parts in your project - which you don't find 
without putting in some bechmarking (and i don't understand why you 
don't start with benchmarking)

Aug 05 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Aug 05, 2013 at 04:47:36PM +0200, dennis luehring wrote:
 Ok, how would you make it faster?

 
 i don't see a better solution here - how to reduce ONE lowercase and
 SOME compares in any way? (i dont think a hash or something will
 help) but i know that anything like your continue-party is worth
 nothing (feels a little bit like script-kiddies "do it with
 assembler that would it make million times faster" blabla)

If you really want optimal performance, use std.regex:

	import std.regex;
	auto reExtMatch = ctRegex!(`doc|docx|exe|...`, "i");
	foreach (...) {
		if (fext[0].match(reExtMatch))
			continue;
	}

The regex is set up to ignore case (the "i" flag), and is compiled at
compile-time to generate optimal matching code for the given list of
extensions. To get any faster than this, you'll have to hand-optimize or
write in assembly. :)


T

-- 
Computerese Irregular Verb Conjugation: I have preferences.  You have biases. 
He/She has prejudices. -- Gene Wirchenko

Aug 05 2013

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

On 8/5/13, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
 If you really want optimal performance, use std.regex:

Yes and if you also want to bloat your executable to the point that
other parts of the system start slowing down.

Aug 06 2013

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
 Greetings!

 I have this code,

 foreach (...)
 {

   if (std.string.tolower(fext[0]) == "doc" ||
     std.string.tolower(fext[0]) == "docx" ||
     std.string.tolower(fext[0]) == "xls" ||
     std.string.tolower(fext[0]) == "xlsx" ||
     std.string.tolower(fext[0]) == "ppt" ||
     std.string.tolower(fext[0]) == "pptx")
    continue;
 }

 foreach (...)
 {
   if (std.string.tolower(fext[0]) == "doc")
     continue;
   if (std.string.tolower(fext[0]) == "docx")
     continue;
   if (std.string.tolower(fext[0]) == "xls")
     continue;
   if (std.string.tolower(fext[0]) == "xlsx")
     continue;
   if (std.string.tolower(fext[0]) == "ppt")
     continue;
   if (std.string.tolower(fext[0]) == "pptx")
    continue;
   ...
   ...
 }

 thanks.

 josé

better:

foreach (...)
{
     auto tmp = std.string.tolower(fext[0]);
     if(tmp == "doc" || tmp == "docx"
        || tmp == "xls" || tmp == "xlsx"
        || tmp == "ppt" || tmp == "pptx")
     {
         continue;
     }
}

but still not super-fast as (unless the compiler is very clever) 
it still means multiple passes over tmp. Also, it converts the 
whole string to lower case even when it's not necessary.

If you have large numbers of possible matches you will probably 
want to be clever with your data structures / algorithms. E.g.

You could create a tree-like structure to quickly eliminate 
possibilities as you read successive letters. You read one 
character, follow the appropriate branch, check if there are any 
further branches, if not then no match and break. Else, read the 
next character and follow the appropriate branch and so on.... 
Infeasible for large (or even medium-sized) character-sets 
without hashing, but might be pretty fast for a-z and a large 
number of short strings.

Aug 05 2013

"jicman" <cabrera_ _wrc.xerox.com> writes:

On Monday, 5 August 2013 at 15:18:42 UTC, John Colvin wrote:
 On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
 Greetings!

 I have this code,

 foreach (...)
 {

  if (std.string.tolower(fext[0]) == "doc" ||
    std.string.tolower(fext[0]) == "docx" ||
    std.string.tolower(fext[0]) == "xls" ||
    std.string.tolower(fext[0]) == "xlsx" ||
    std.string.tolower(fext[0]) == "ppt" ||
    std.string.tolower(fext[0]) == "pptx")
   continue;
 }

 foreach (...)
 {
  if (std.string.tolower(fext[0]) == "doc")
    continue;
  if (std.string.tolower(fext[0]) == "docx")
    continue;
  if (std.string.tolower(fext[0]) == "xls")
    continue;
  if (std.string.tolower(fext[0]) == "xlsx")
    continue;
  if (std.string.tolower(fext[0]) == "ppt")
    continue;
  if (std.string.tolower(fext[0]) == "pptx")
   continue;
  ...
  ...
 }

 thanks.

 josé

 better:

 foreach (...)
 {
     auto tmp = std.string.tolower(fext[0]);
     if(tmp == "doc" || tmp == "docx"
        || tmp == "xls" || tmp == "xlsx"
        || tmp == "ppt" || tmp == "pptx")
     {
         continue;
     }
 }

 but still not super-fast as (unless the compiler is very 
 clever) it still means multiple passes over tmp. Also, it 
 converts the whole string to lower case even when it's not 
 necessary.

 If you have large numbers of possible matches you will probably 
 want to be clever with your data structures / algorithms. E.g.

 You could create a tree-like structure to quickly eliminate 
 possibilities as you read successive letters. You read one 
 character, follow the appropriate branch, check if there are 
 any further branches, if not then no match and break. Else, 
 read the next character and follow the appropriate branch and 
 so on.... Infeasible for large (or even medium-sized) 
 character-sets without hashing, but might be pretty fast for 
 a-z and a large number of short strings.

As the Hispanic speaking community say, muchas gracias.

Aug 05 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Monday, 5 August 2013 at 15:18:42 UTC, John Colvin wrote:
 better:

 foreach (...)
 {
     auto tmp = std.string.tolower(fext[0]);
     if(tmp == "doc" || tmp == "docx"
        || tmp == "xls" || tmp == "xlsx"
        || tmp == "ppt" || tmp == "pptx")
     {
         continue;
     }
 }

 but still not super-fast as (unless the compiler is very 
 clever) it still means multiple passes over tmp. Also, it 
 converts the whole string to lower case even when it's not 
 necessary.

 If you have large numbers of possible matches you will probably 
 want to be clever with your data structures / algorithms. E.g.

 You could create a tree-like structure to quickly eliminate 
 possibilities as you read successive letters. You read one 
 character, follow the appropriate branch, check if there are 
 any further branches, if not then no match and break. Else, 
 read the next character and follow the appropriate branch and 
 so on.... Infeasible for large (or even medium-sized) 
 character-sets without hashing, but might be pretty fast for 
 a-z and a large number of short strings.

Arguably, you'd do even better with:
foreach (...)
{
     auto tmp = std.string.tolower(fext[0]);
     switch(tmp)
     {
         case "doc", "docx":
         case "xls", "xlsx":
         case "ppt", "pptx":
             continue;
         default:
     }
}

Since it gives the compiler more wiggle room to optimize things 
as it wishes. For example, it *could* (who knows :D !) implement 
the switch as a hash table, or a tree.

BTW, a very convenient "tree-like" structure that could be used 
is a "heap": it is a basic binary tree, but stored inside an 
array. You could build it during compilation, and then simply 
search it.

A possible optimization is to first switch on string length:

foreach (...)
{
     auto tmp = std.string.tolower(fext[0]);
     switch(tmp.length)
     {
         case 3:
         switch(tmp)
         {
             case "doc", "xls", "ppt":
                 continue;
             default:
         }
         break;

         case 4:
         switch(tmp)
         {
             case "docx", "xlsx", "pptx":
                 continue;
             default:
         }
         break;

         default:
     }
}

That said, I'm not even sure this would be faster, so a bench 
would be called for. Further more, I'd really be tempted to say 
that at this point, we are in the realm of premature optimization.

Aug 05 2013

"jicman" <cabrera_ _wrc.xerox.com> writes:

On Monday, 5 August 2013 at 17:04:36 UTC, monarch_dodra wrote:
 On Monday, 5 August 2013 at 15:18:42 UTC, John Colvin wrote:
 better:

 foreach (...)
 {
    auto tmp = std.string.tolower(fext[0]);
    if(tmp == "doc" || tmp == "docx"
       || tmp == "xls" || tmp == "xlsx"
       || tmp == "ppt" || tmp == "pptx")
    {
        continue;
    }
 }

 but still not super-fast as (unless the compiler is very 
 clever) it still means multiple passes over tmp. Also, it 
 converts the whole string to lower case even when it's not 
 necessary.

 If you have large numbers of possible matches you will 
 probably want to be clever with your data structures / 
 algorithms. E.g.

 You could create a tree-like structure to quickly eliminate 
 possibilities as you read successive letters. You read one 
 character, follow the appropriate branch, check if there are 
 any further branches, if not then no match and break. Else, 
 read the next character and follow the appropriate branch and 
 so on.... Infeasible for large (or even medium-sized) 
 character-sets without hashing, but might be pretty fast for 
 a-z and a large number of short strings.

 Arguably, you'd do even better with:
 foreach (...)
 {
     auto tmp = std.string.tolower(fext[0]);
     switch(tmp)
     {
         case "doc", "docx":
         case "xls", "xlsx":
         case "ppt", "pptx":
             continue;
         default:
     }
 }

 Since it gives the compiler more wiggle room to optimize things 
 as it wishes. For example, it *could* (who knows :D !) 
 implement the switch as a hash table, or a tree.

 BTW, a very convenient "tree-like" structure that could be used 
 is a "heap": it is a basic binary tree, but stored inside an 
 array. You could build it during compilation, and then simply 
 search it.

 A possible optimization is to first switch on string length:

 foreach (...)
 {
     auto tmp = std.string.tolower(fext[0]);
     switch(tmp.length)
     {
         case 3:
         switch(tmp)
         {
             case "doc", "xls", "ppt":
                 continue;
             default:
         }
         break;

         case 4:
         switch(tmp)
         {
             case "docx", "xlsx", "pptx":
                 continue;
             default:
         }
         break;

         default:
     }
 }

 That said, I'm not even sure this would be faster, so a bench 
 would be called for. Further more, I'd really be tempted to say 
 that at this point, we are in the realm of premature 
 optimization.

:-)  Now that's funny.  He he he he...
By the way, your coding makes sense.  For 100K plus files, it may 
not mean much, but we are thinking that this amount will be only 
growing.

Aug 05 2013

David <d dav1d.de> writes:

Am 05.08.2013 15:59, schrieb jicman:
 
 Greetings!
 
 I have this code,
 
 foreach (...)
 {
 
   if (std.string.tolower(fext[0]) == "doc" ||
     std.string.tolower(fext[0]) == "docx" ||
     std.string.tolower(fext[0]) == "xls" ||
     std.string.tolower(fext[0]) == "xlsx" ||
     std.string.tolower(fext[0]) == "ppt" ||
     std.string.tolower(fext[0]) == "pptx")
    continue;
 }
 

That's how I would do it.

if(["doc", "docx", "xls", "xlsx", "ppt", "pptx"].canFind(fext[0].toLower) {
    continue;
}

With the array beeing a runtime/compiletime constant with a sensible
name. WORD_EXTENSIONS or something.

Aug 05 2013

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Monday, 5 August 2013 at 15:21:25 UTC, David wrote:
 Am 05.08.2013 15:59, schrieb jicman:
 
 Greetings!
 
 I have this code,
 
 foreach (...)
 {
 
   if (std.string.tolower(fext[0]) == "doc" ||
     std.string.tolower(fext[0]) == "docx" ||
     std.string.tolower(fext[0]) == "xls" ||
     std.string.tolower(fext[0]) == "xlsx" ||
     std.string.tolower(fext[0]) == "ppt" ||
     std.string.tolower(fext[0]) == "pptx")
    continue;
 }
 

 That's how I would do it.

 if(["doc", "docx", "xls", "xlsx", "ppt", 
 "pptx"].canFind(fext[0].toLower) {
     continue;
 }

 With the array beeing a runtime/compiletime constant with a 
 sensible
 name. WORD_EXTENSIONS or something.

In case some of the strings are actually rather long, one could 
use toLower lazily (either std.ascii.toLower or std.uni.toLower, 
depending on which you need):

if(WORD_EXT.canFind(fext[0].map!toLower()))

Aug 05 2013

"JS" <js.mdnq gmail.com> writes:

On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
 Greetings!

 I have this code,

 foreach (...)
 {

   if (std.string.tolower(fext[0]) == "doc" ||
     std.string.tolower(fext[0]) == "docx" ||
     std.string.tolower(fext[0]) == "xls" ||
     std.string.tolower(fext[0]) == "xlsx" ||
     std.string.tolower(fext[0]) == "ppt" ||
     std.string.tolower(fext[0]) == "pptx")
    continue;
 }

 foreach (...)
 {
   if (std.string.tolower(fext[0]) == "doc")
     continue;
   if (std.string.tolower(fext[0]) == "docx")
     continue;
   if (std.string.tolower(fext[0]) == "xls")
     continue;
   if (std.string.tolower(fext[0]) == "xlsx")
     continue;
   if (std.string.tolower(fext[0]) == "ppt")
     continue;
   if (std.string.tolower(fext[0]) == "pptx")
    continue;
   ...
   ...
 }

 thanks.

 josé


Both are slow, those trying to optimize out tolower are missing 
the boat. A good compiler should optimize that out.

The issue is that n compare are done in those cases with each 
successive compare getting slower and slower. if the match is doc 
then it happens in one compare in both cases. If it's pptx then 
it is 6. So if there are many pptx matches then the code will 
perform slower than for doc.


This can be good or bad behavior depending on if you are 
guarantee that your order is representative of what actually 
happens(many doc's few pptx's).

To get O(1) behavior, you must use some sort of jump list. A hash 
table can provide such behavior. If there are many comparisons 
needed then at some point this will be, on average, faster.

If performance is critical and the string length is small enough 
then the fastest method is to use a direct lookup table.

For n character of char. It is 256^n. For n = 4, this is 4GB of 
memory! (This is one reason to use a hash table). Since you are 
not using the full ascii you could compact this a bit, say 36^n, 
for n = 4 it is only 1.6MB of memory. The cost is packing the 
strings for lookup and computing the index.

One way to speed up the loop is to also compare for what is not 
used. In this case you can compare just on the first char.


    if (fext[0][0]) == "d")
    {
       ...
    }
    if (fext[0][0]) == "x")
    {
       ...
    }
    if (fext[0][0]) == "p")
    {
       ...
    }

This pares down the results. In this case it is only 3 
comparisions + 2 comparisons = 5 for pptx rather than 6.

You can try and develop your own hash to get optimal amortized 
performance.

Aug 05 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Aug 05, 2013 at 07:30:36PM +0200, JS wrote:
 On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
Greetings!

I have this code,

foreach (...)
{

  if (std.string.tolower(fext[0]) == "doc" ||
    std.string.tolower(fext[0]) == "docx" ||
    std.string.tolower(fext[0]) == "xls" ||
    std.string.tolower(fext[0]) == "xlsx" ||
    std.string.tolower(fext[0]) == "ppt" ||
    std.string.tolower(fext[0]) == "pptx")
   continue;
}

foreach (...)
{
  if (std.string.tolower(fext[0]) == "doc")
    continue;
  if (std.string.tolower(fext[0]) == "docx")
    continue;
  if (std.string.tolower(fext[0]) == "xls")
    continue;
  if (std.string.tolower(fext[0]) == "xlsx")
    continue;
  if (std.string.tolower(fext[0]) == "ppt")
    continue;
  if (std.string.tolower(fext[0]) == "pptx")
   continue;
  ...
  ...
}

thanks.

jos�

 
 
 Both are slow, those trying to optimize out tolower are missing the
 boat. A good compiler should optimize that out.
 
 The issue is that n compare are done in those cases with each
 successive compare getting slower and slower. if the match is doc
 then it happens in one compare in both cases. If it's pptx then it
 is 6. So if there are many pptx matches then the code will perform
 slower than for doc.
 
 
 This can be good or bad behavior depending on if you are guarantee
 that your order is representative of what actually happens(many
 doc's few pptx's).
 
 To get O(1) behavior, you must use some sort of jump list. A hash
 table can provide such behavior. If there are many comparisons
 needed then at some point this will be, on average, faster.

[...]

There's no need to reinvent the wheel here. std.regex.ctRegex will
create the equivalent of jump tables for you.

Besides, I doubt the OP's real performance bottleneck is in this code;
sounds like the program is I/O bound. Compared to I/O, even repeatedly
calling std.string.tolower repeatedly is just roundoff error in terms of
total execution time. Using a profiler will reveal the *real*
bottleneck. Based on past experience, I'm expecting that it will reveal
that the bottleneck isn't anywhere we're predicting it is. :)


T

-- 
VI = Visual Irritation

Aug 05 2013

=?UTF-8?B?UmFwaGHDq2wgSmFrc2U=?= <raphael.jakse gmail.com> writes:

Le 05/08/2013 15:59, jicman a écrit :
 Greetings!

 I have this code,

 foreach (...)
 {

    if (std.string.tolower(fext[0]) == "doc" ||
      std.string.tolower(fext[0]) == "docx" ||
      std.string.tolower(fext[0]) == "xls" ||
      std.string.tolower(fext[0]) == "xlsx" ||
      std.string.tolower(fext[0]) == "ppt" ||
      std.string.tolower(fext[0]) == "pptx")
     continue;
 }

 foreach (...)
 {
    if (std.string.tolower(fext[0]) == "doc")
      continue;
    if (std.string.tolower(fext[0]) == "docx")
      continue;
    if (std.string.tolower(fext[0]) == "xls")
      continue;
    if (std.string.tolower(fext[0]) == "xlsx")
      continue;
    if (std.string.tolower(fext[0]) == "ppt")
      continue;
    if (std.string.tolower(fext[0]) == "pptx")
     continue;
    ...
    ...
 }

 thanks.

 josé


Like others said, writing if( ...) continue; if(...) continue; versus 
if(... ||  ...) continue; is not the problem in the code.
Computing the lower case of the string for each comparison is a greater 
problem that can be spotted at the first glance. Storing the value in a 
variable and then using this variables in comparisons would be better.

Both versions of your code should be equivalent, though this should be 
verified.

I would suggest another way of doing this:

       switch(std.string.tolower(fext[0])) {
          case "doc":
          case "docx":
          case "xls":
          case "xlsx":
          case "ppt":
          case "pptx":
             continue;
          default:
            // do something else
       }

This way of writing your code seems far more readable and elegant to me.

Even more concise:
switch(std.string.tolower(fext[0])) {
    case "doc","docx","xls","xlsx","ppt","pptx":
       continue;
    default:
       // do something else
}

Much like what morarch_dodra proposed, I guess.

But notice that tolower is deprecated in current D2 version, toLower 
should be used instead. If you still use D1, maybe a better thing to do 
before considering such optimization would be opting for D2.

I would also suggest avoiding usage of continue if you don't need it. 
Here, it is likely that you can write something more structured like:

import std.string : toLower

// ...

foreach(...) {
    switch(fext[0].toLower) {
       case "doc","docx","xls","xlsx","ppt","pptx":
          // do what is to be done here with these cases, or nothing
          break;
       default:
          // do something for other cases
    }
}

If you need to use fext[0].toLower in any case, I would suggest you 
write this kind of code instead:


import std.string : toLower

// ...

auto ext = fext[0].toLower;
switch(ext) {
    case "doc","docx","xls","xlsx","ppt","pptx":
       // do what is to be done here with these cases, or nothing
       // use ext instead of fext[0].toLower, if needed
       break;
    default:
       // do something for other cases
       // use ext instead of fext[0].toLower, if needed
}


auto ext = fext[0].toLower; should be placed before the foreach loop if 
fext[0] isn't changed in the loop (avoid computing a value more than one 
time when it is possible, though thinking before applying this rule or 
any other rule is not forbidden).

There might exist solutions that could be faster, like testing the first 
letter before the second / using the length of the string / using finite 
state machine to save tests, but beware of the maintainability of your 
code and if you go into this, triple check that these solutions are 
indeed more efficient in terms of execution.

See morarch_dodra, JS, bearophile and others posts for more precise 
information.

If this part of your code is not known to be a bottleneck, I would opt 
for readability / elegance over over-optimization and for this 
particular case, I would trust the compiler for optimizing enough with 
the switch version of the code (though it is not more than a opinion here).

Aug 05 2013

"Andre Artus" <andre.artus gmail.com> writes:

On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
 Greetings!

 I have this code,

 foreach (...)
 {

   if (std.string.tolower(fext[0]) == "doc" ||
     std.string.tolower(fext[0]) == "docx" ||
     std.string.tolower(fext[0]) == "xls" ||
     std.string.tolower(fext[0]) == "xlsx" ||
     std.string.tolower(fext[0]) == "ppt" ||
     std.string.tolower(fext[0]) == "pptx")
    continue;
 }

 foreach (...)
 {
   if (std.string.tolower(fext[0]) == "doc")
     continue;
   if (std.string.tolower(fext[0]) == "docx")
     continue;
   if (std.string.tolower(fext[0]) == "xls")
     continue;
   if (std.string.tolower(fext[0]) == "xlsx")
     continue;
   if (std.string.tolower(fext[0]) == "ppt")
     continue;
   if (std.string.tolower(fext[0]) == "pptx")
    continue;
   ...
   ...
 }

 thanks.

 josé

What exactly are you trying to do with this? I get the impression 
that there is an attempt at "local optimization" when broader 
approach could lead to better results.

For instance. Using the OS's facilities to filter (six requests, 
one each for "*.doc", "*.docx") could actually end up being a lot 
faster.

If you could give more detail about what you are trying to 
achieve then it could be possible to get better results.

Aug 05 2013

"jicman" <cabrera_ _wrc.xerox.com> writes:

On Tuesday, 6 August 2013 at 04:10:57 UTC, Andre Artus wrote:
 On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
 Greetings!

 I have this code,

 foreach (...)
 {

  if (std.string.tolower(fext[0]) == "doc" ||
    std.string.tolower(fext[0]) == "docx" ||
    std.string.tolower(fext[0]) == "xls" ||
    std.string.tolower(fext[0]) == "xlsx" ||
    std.string.tolower(fext[0]) == "ppt" ||
    std.string.tolower(fext[0]) == "pptx")
   continue;
 }

 foreach (...)
 {
  if (std.string.tolower(fext[0]) == "doc")
    continue;
  if (std.string.tolower(fext[0]) == "docx")
    continue;
  if (std.string.tolower(fext[0]) == "xls")
    continue;
  if (std.string.tolower(fext[0]) == "xlsx")
    continue;
  if (std.string.tolower(fext[0]) == "ppt")
    continue;
  if (std.string.tolower(fext[0]) == "pptx")
   continue;
  ...
  ...
 }

 thanks.

 josé

 What exactly are you trying to do with this? I get the 
 impression that there is an attempt at "local optimization" 
 when broader approach could lead to better results.

 For instance. Using the OS's facilities to filter (six 
 requests, one each for "*.doc", "*.docx") could actually end up 
 being a lot faster.

 If you could give more detail about what you are trying to 
 achieve then it could be possible to get better results.

The files are in a network drive and doing a list foreach *.doc, 
*.docx, etc. will be more expensive than getting the list of all 
the files at once and then processing them accordingly.

Aug 06 2013

"Andre Artus" <andre.artus gmail.com> writes:

On Tuesday, 6 August 2013 at 12:32:13 UTC, jicman wrote:
 On Tuesday, 6 August 2013 at 04:10:57 UTC, Andre Artus wrote:
 On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
 Greetings!

 I have this code,

 foreach (...)
 {

 if (std.string.tolower(fext[0]) == "doc" ||
   std.string.tolower(fext[0]) == "docx" ||
   std.string.tolower(fext[0]) == "xls" ||
   std.string.tolower(fext[0]) == "xlsx" ||
   std.string.tolower(fext[0]) == "ppt" ||
   std.string.tolower(fext[0]) == "pptx")
  continue;
 }

 foreach (...)
 {
 if (std.string.tolower(fext[0]) == "doc")
   continue;
 if (std.string.tolower(fext[0]) == "docx")
   continue;
 if (std.string.tolower(fext[0]) == "xls")
   continue;
 if (std.string.tolower(fext[0]) == "xlsx")
   continue;
 if (std.string.tolower(fext[0]) == "ppt")
   continue;
 if (std.string.tolower(fext[0]) == "pptx")
  continue;
 ...
 ...
 }

 thanks.

 josé

 What exactly are you trying to do with this? I get the 
 impression that there is an attempt at "local optimization" 
 when broader approach could lead to better results.

 For instance. Using the OS's facilities to filter (six 
 requests, one each for "*.doc", "*.docx") could actually end 
 up being a lot faster.

 If you could give more detail about what you are trying to 
 achieve then it could be possible to get better results.

 The files are in a network drive and doing a list foreach 
 *.doc, *.docx, etc. will be more expensive than getting the 
 list of all the files at once and then processing them 
 accordingly.

Again, what are you trying to achieve?
Your statement is not necessarily true, for a  myriad of reasons, 
but it entirely depends on what you want to do.
I would reiterate Dennis Luehring's reply, why are you not 
benching? It seems like you are guessing at what the problems 
are, that's hardly ever useful.
One of the first rules of network optimization  is to reduce the 
amount od data, that normally means filtering.at the server, the 
next thing is coarse grained is better than fine (BOCTAOE/L).

Aug 06 2013

"jicman" <cabrera wrc.xerox.com> writes:

On Tuesday, 6 August 2013 at 14:49:42 UTC, Andre Artus wrote:
 On Tuesday, 6 August 2013 at 12:32:13 UTC, jicman wrote:
 On Tuesday, 6 August 2013 at 04:10:57 UTC, Andre Artus wrote:
 On Monday, 5 August 2013 at 13:59:24 UTC, jicman wrote:
 Greetings!

 I have this code,

 foreach (...)
 {

 if (std.string.tolower(fext[0]) == "doc" ||
  std.string.tolower(fext[0]) == "docx" ||
  std.string.tolower(fext[0]) == "xls" ||
  std.string.tolower(fext[0]) == "xlsx" ||
  std.string.tolower(fext[0]) == "ppt" ||
  std.string.tolower(fext[0]) == "pptx")
 continue;
 }

 foreach (...)
 {
 if (std.string.tolower(fext[0]) == "doc")
  continue;
 if (std.string.tolower(fext[0]) == "docx")
  continue;
 if (std.string.tolower(fext[0]) == "xls")
  continue;
 if (std.string.tolower(fext[0]) == "xlsx")
  continue;
 if (std.string.tolower(fext[0]) == "ppt")
  continue;
 if (std.string.tolower(fext[0]) == "pptx")
 continue;
 ...
 ...
 }

 thanks.

 josé

 What exactly are you trying to do with this? I get the 
 impression that there is an attempt at "local optimization" 
 when broader approach could lead to better results.

 For instance. Using the OS's facilities to filter (six 
 requests, one each for "*.doc", "*.docx") could actually end 
 up being a lot faster.

 If you could give more detail about what you are trying to 
 achieve then it could be possible to get better results.

 The files are in a network drive and doing a list foreach 
 *.doc, *.docx, etc. will be more expensive than getting the 
 list of all the files at once and then processing them 
 accordingly.

 Again, what are you trying to achieve?
 Your statement is not necessarily true, for a  myriad of 
 reasons, but it entirely depends on what you want to do.
 I would reiterate Dennis Luehring's reply, why are you not 
 benching? It seems like you are guessing at what the problems 
 are, that's hardly ever useful.
 One of the first rules of network optimization  is to reduce 
 the amount od data, that normally means filtering.at the 
 server, the next thing is coarse grained is better than fine 
 (BOCTAOE/L).

It's a long story and I will return in a few months and give you 
the whole story, but right now, time is not on my side.  I have 
answers for all the questions you folks have asked, and I 
appreciate all the input.  I have the answer that I was looking 
for, so in a few months, I will come back and explain the whole 
story.  Thanks for all the response and suggestions.

jic

Aug 06 2013

dennis luehring <dl.soluz gmx.net> writes:

Am 07.08.2013 06:30, schrieb jicman:
 Again, what are you trying to achieve?
 Your statement is not necessarily true, for a  myriad of
 reasons, but it entirely depends on what you want to do.
 I would reiterate Dennis Luehring's reply, why are you not
 benching? It seems like you are guessing at what the problems
 are, that's hardly ever useful.
 One of the first rules of network optimization  is to reduce
 the amount od data, that normally means filtering.at the
 server, the next thing is coarse grained is better than fine
 (BOCTAOE/L).

 It's a long story and I will return in a few months and give you
 the whole story, but right now, time is not on my side.  I have
 answers for all the questions you folks have asked, and I
 appreciate all the input.  I have the answer that I was looking
 for, so in a few months, I will come back and explain the whole
 story.  Thanks for all the response and suggestions.

after makeing us girls all wet to help you - your reply is
"no sex on the first date, im a gentlemen... but maybe in a few months"

so:

you having a jscript doing somehting with files,fileextensions over 
networkdrive - it runs around 8h

you ported that jscript to D - now it runs for 6h

you noob-guessed the lowercase-if-party could be evil (btw: it cost more 
time to guess then to benchmark)

you get trivial answers that won't get you very much, the lowercase 
would not boost your speed that much and the networkdrive latency will 
kill all the other statemachine ideas

you don't answer trivial questions about the big picture - and now 
you're out of time

open questions:
-why not collect the data on the server itself - instead of grabbing 
tiny bits over network? - this is for understanding your environent

-how big is the speed drop with your tool on the very same drive localy 
and over a networkdrive? - this is for understanding the latency

-are you also reading this files or just doing filename search 
(recursively?) and throwing out non office-extensions?
this is for getting an idea if buildin OS(operating system) features can 
help

see you in a few months

Aug 06 2013

"Andre Artus" <andre.artus gmail.com> writes:

 It's a long story and I will return in a few months and give 
 you
 the whole story, but right now, time is not on my side.  I have
 answers for all the questions you folks have asked, and I
 appreciate all the input.  I have the answer that I was looking
 for, so in a few months, I will come back and explain the whole
 story.  Thanks for all the response and suggestions.

 after makeing us girls all wet to help you - your reply is
 "no sex on the first date, im a gentlemen... but maybe in a few 
 months"

 so:

 you having a jscript doing somehting with files,fileextensions 
 over networkdrive - it runs around 8h

 you ported that jscript to D - now it runs for 6h

 you noob-guessed the lowercase-if-party could be evil (btw: it 
 cost more time to guess then to benchmark)

 you get trivial answers that won't get you very much, the 
 lowercase would not boost your speed that much and the 
 networkdrive latency will kill all the other statemachine ideas

 you don't answer trivial questions about the big picture - and 
 now you're out of time

 open questions:
 -why not collect the data on the server itself - instead of 
 grabbing tiny bits over network? - this is for understanding 
 your environent

Just for a sanity check I implemented a quick client-server setup 
where the daemon takes filespec from the client and returns a 
line-by-line list compressed into one packet.

The total running time on 8 terabytes of files stored over a 
dozen drives searched recursively: less than 1 minute.

Same over slow WiFi, negligible difference (list compresses to a 
few Kb) with LZMA.

I did not even bother to search each physical drive separately, 
just produced the list sequentially.

 -how big is the speed drop with your tool on the very same 
 drive localy and over a networkdrive? - this is for 
 understanding the latency

 -are you also reading this files or just doing filename search 
 (recursively?) and throwing out non office-extensions?
 this is for getting an idea if buildin OS(operating system) 
 features can help

 see you in a few months

It's impossible to help people who refuse to give basic 
information.

Aug 06 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Aug 06, 2013 at 02:32:11PM +0200, jicman wrote:
 On Tuesday, 6 August 2013 at 04:10:57 UTC, Andre Artus wrote:

[...]
What exactly are you trying to do with this? I get the impression
that there is an attempt at "local optimization" when broader
approach could lead to better results.

For instance. Using the OS's facilities to filter (six requests,
one each for "*.doc", "*.docx") could actually end up being a lot
faster.

If you could give more detail about what you are trying to achieve
then it could be possible to get better results.

 
 The files are in a network drive and doing a list foreach *.doc,
 *.docx, etc. will be more expensive than getting the list of all the
 files at once and then processing them accordingly.

In this case, the bottleneck most likely is in the network latency, so
optimizing string comparisons won't get you very far. You should instead
be focusing on how to improve the performance of network accesses.

What kind of processing are you doing with the filtered list of files?
If it's a complicated operation, you might want to consider having a
separate thread for fetching the list of files while work is being done
on them in the main thread.


T

-- 
Computers shouldn't beep through the keyhole.

Aug 06 2013

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Mon, Aug 05, 2013 at 03:59:23PM +0200, jicman wrote:
 
 Greetings!
 
 I have this code,
 
 foreach (...)
 {
 
   if (std.string.tolower(fext[0]) == "doc" ||
     std.string.tolower(fext[0]) == "docx" ||
     std.string.tolower(fext[0]) == "xls" ||
     std.string.tolower(fext[0]) == "xlsx" ||
     std.string.tolower(fext[0]) == "ppt" ||
     std.string.tolower(fext[0]) == "pptx")
    continue;
 }
 
 foreach (...)
 {
   if (std.string.tolower(fext[0]) == "doc")
     continue;
   if (std.string.tolower(fext[0]) == "docx")
     continue;
   if (std.string.tolower(fext[0]) == "xls")
     continue;
   if (std.string.tolower(fext[0]) == "xlsx")
     continue;
   if (std.string.tolower(fext[0]) == "ppt")
     continue;
   if (std.string.tolower(fext[0]) == "pptx")
    continue;
   ...
   ...
 }

[...]

It would appear that your bottleneck is not in the continue statements,
but in the repeated calls to std.string.tolower. It's probably better to
write it this way:

	foreach (...)
	{
		auto ext = std.string.tolower(fext[0]);
		if (ext == "doc" || ext == "docx" || ... )
			continue;
	}

This way you save on the overhead of many identical function calls,
which probably outweights any benefit you get by optimizing continue.


T

-- 
It's amazing how careful choice of punctuation can leave you hanging:

Aug 05 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Monday, August 05, 2013 15:59:23 jicman wrote:
 Greetings!
 
 I have this code,
 
 foreach (...)
 {
 
    if (std.string.tolower(fext[0]) == "doc" ||
      std.string.tolower(fext[0]) == "docx" ||
      std.string.tolower(fext[0]) == "xls" ||
      std.string.tolower(fext[0]) == "xlsx" ||
      std.string.tolower(fext[0]) == "ppt" ||
      std.string.tolower(fext[0]) == "pptx")
     continue;
 }
 
 foreach (...)
 {
    if (std.string.tolower(fext[0]) == "doc")
      continue;
    if (std.string.tolower(fext[0]) == "docx")
      continue;
    if (std.string.tolower(fext[0]) == "xls")
      continue;
    if (std.string.tolower(fext[0]) == "xlsx")
      continue;
    if (std.string.tolower(fext[0]) == "ppt")
      continue;
    if (std.string.tolower(fext[0]) == "pptx")
     continue;
    ...
    ...
 }

As others have pointed out, the two code snippets are semantically identical, 
and there's a decent chance that the compiler will generate identical code for 
both. It wouldn't surprise me if the first snippet resulted in more concise 
assembly code, but there shouldn't be any performance difference between the 
two. But the first snippet is cleaner, so between those two, you should use 
that.

Of course, as others have pointed out, it would be far more efficient to simply 
calculate std.string.tolower(fext[0]) once and re-use the result, which should 
be a very large performance gain here (assuming that the code around it 
doesn't dwarf it in cost). However, another possibility that I don't think 
anyone has pointed out is std.string.icmp, which will do case-insensitive 
comparison, meaning that you could do

if(fext[0].icmp("doc") == 0 || fext[0].icmp("docx") == 0 || ...)

and that might be more efficient as it avoids having to allocate a new string 
when fext[0] isn't all lowercase already. However, you would have to benchmark 
the code to be sure, since depending on the typical input, the code around 
this code, and how many times this code gets called, calling toLower once 
could be more efficient or using icmp could be more efficient, since toLower 
allocates whereas icmp doesn't, but icmp does more comparisons that a simple 
==. But if you really care about efficiency, you should try both ways and see 
which is faster for your use case.

- Jonathan M Davis

Aug 05 2013

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Which option is faster...