Subject Re: Top 5
From Benji Smith <dlanguage@benjismith.net>
Date Sat, 11 Oct 2008 02:59:17 -0400
Newsgroups digitalmars.D
Attachment(s) StringTest.java, StringTest.d

dsimcha wrote:
> == Quote from Benji Smith (dlanguage@benjismith.net)'s article
>> Anyhow, I'm not going to keep chasing this point. For people new to D,
>> the subtle differences between static and dynamic arrays can be a source
>> of confusion. I still have my share of gotcha moments with them, and I
>> think D would be well served by minimizing those differences.
>> --benji
>
> I disagree, not only specifically on this issue but on a more philosophical level
> about a lot of stuff that's been mentioned here in the past few days about
> simplifying D.  The fact is that D is a performance language that retains the
> ability to program close to the metal.

Actually, when it comes to string processing, D is decidedly *not* a
"performance language".

Compared to...say...Java (which gets a bum rap around here for being
slow), D is nothing special when it comes to string processing speed.

I've attached a couple of benchmarks, implemented in both Java and D
(the "shakespeare.txt" file I'm benchmarking against is from Project
Gutenberg. It's about 5 MB, and you can grab it from here:
http://www.gutenberg.org/dirs/etext94/shaks12.txt )

In some of those benchmarks, D is slightly faster. In some of them, Java
is a lot faster. Overall, on my machine, the D code runs in about 12.5
seconds, and the Java code runs in about 2.5 seconds.

Keep in mind, all Java characters are two bytes wide. And you can't
access a character directly. You have to retrieve it from the String
object, using the charAt() method. And splitting a string creates a new
object for every fragment.
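
To make that concrete, here's a minimal sketch of what those three points look like in code. The class name CharAccessSketch and the sample sentence are made up for illustration and aren't part of the attached benchmark; Character.BYTES assumes Java 8 or later.

```java
class CharAccessSketch {

   // Counts spaces by fetching each 16-bit char through charAt();
   // a Java String offers no direct array-style indexing.
   static int countSpaces(String text) {
      int count = 0;
      for (int i = 0; i < text.length(); i++) {
         if (text.charAt(i) == ' ') count++;
      }
      return count;
   }

   public static void main(String[] args) {
      String sample = "to be or not to be";

      // Size of one char in bytes (Java 8+); prints 2
      System.out.println(Character.BYTES);

      // Prints 5
      System.out.println(countSpaces(sample));

      // split() returns a fresh String object for every fragment; prints 6
      String[] words = sample.split(" ");
      System.out.println(words.length);
   }
}
```

So the Java side of the comparison pays for wider characters, a method call per character, and an allocation per fragment.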

I admire the goal in D to be a performance language, but it drives me
crazy when people use performance as justification for an inferior
design, when other languages that use the superior design also
accomplish superior performance.

--benji



import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StringTest {

   public static void main(String[] args) {

      File file = new File("shakespeare.txt");

      String text = null;

      StopWatch overallWatch = new StopWatch();
      StopWatch loadWatch = new StopWatch();
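      try {
         byte[] fileBytes = new byte[(int) file.length()];
         InputStream stream = new FileInputStream(file);
         try {
            // read() may return fewer bytes than requested, so loop until the buffer is full
            int offset = 0;
            while (offset < fileBytes.length) {
               int n = stream.read(fileBytes, offset, fileBytes.length - offset);
               if (n < 0) break;
               offset += n;
            }
         } finally {
            stream.close();
         }
         text = new String(fileBytes);
      } catch (IOException e) {
         e.printStackTrace();
         return; // nothing to benchmark if the file could not be loaded
      }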

      System.out.format("time to load file: %6s seconds\n", loadWatch.seconds());

      runCharIterateTest(text);
      runWordFindTest(text);
      runWordReplaceTest(text);
      String[] splitWords = runWordSplitTest(text);
      runConcatenateTest(splitWords);

      System.out.format("overall test duration: %6s seconds\n", overallWatch.seconds());
   }

   private static void runCharIterateTest(String text) {
      StopWatch watch = new StopWatch();
      int spaceCount = 0;
      for (int i = 0; i < text.length(); i++) {
        if (text.charAt(i) == ' ') spaceCount++;
      }
      System.out.format("iterated through %s characters, and found %s spaces in %6s seconds\n", text.length(), spaceCount, watch.seconds());
   }

   private static void runWordFindTest(String text) {
      StopWatch watch = new StopWatch();
      int wordInstanceCount = -1;
      int position = 0;
      do {
         wordInstanceCount++;
         position = text.indexOf("the", position + 1);
      } while (position >= 0);
      System.out.format("String.indexOf(): found %s instances of 'the' in %6s seconds\n", wordInstanceCount, watch.seconds());
   }

   private static void runWordReplaceTest(String text) {
      int oldLength = text.length();
      StopWatch watch = new StopWatch();
      String replaced = text.replace("the", "XXXX");
      int newLength = replaced.length();
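      // each replacement makes the text exactly one char longer ("the" -> "XXXX"),
      // so the length difference equals the number of replacements
      int replacementCount = newLength - oldLength;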
      System.out.format("replaced %s instances of 'the' with 'XXXX' in %6s seconds\n", replacementCount, watch.seconds());
   }

   private static String[] runWordSplitTest(String text) {
      StopWatch watch = new StopWatch();
      String[] splitWords = text.split(" ");
      System.out.format("split text into %s words in %6s seconds\n", splitWords.length, watch.seconds());
      return splitWords;
   }

   private static void runConcatenateTest(String[] splitWords) {
      StopWatch watch = new StopWatch();
      StringBuilder builder = new StringBuilder();
      for (String word : splitWords) {
         builder.append(word);
         builder.append(" ");
      }
      String rebuilt = builder.toString();
      double duration = watch.seconds();
      System.out.format("concatenated %s words (with %s chars) in %6s seconds\n", splitWords.length, rebuilt.length(), duration);
   }

   private static class StopWatch {
      private long startNanos;
      public StopWatch() { this.startNanos = System.nanoTime(); }
      public double seconds() {
         return (System.nanoTime() - startNanos) / 1000000000.0;
      }
   }

}



module StringTest;

import tango.io.device.FileConduit;
import tango.io.Stdout;
import tango.time.StopWatch;
import Util = tango.text.Util;

void main() {

   StopWatch overallWatch;
   overallWatch.start();
  
   StopWatch loadWatch;
   loadWatch.start();

   auto file = new FileConduit("shakespeare.txt");
   char[] text = new char[file.length];
   file.input.read(text);

   Stdout.formatln("time to load file: {0:f6} seconds", loadWatch.stop());

   runCharIterateTest(text);
   runWordFindTest(text);
   runWordReplaceTest(text);
   char[][] splitWords = runWordSplitTest(text);
   runConcatenateTest(splitWords);

   Stdout.formatln("overall test duration: {0:f6} seconds", overallWatch.stop());
}

private static void runCharIterateTest(char[] text) {
   StopWatch watch;
   watch.start();
   int spaceCount = 0;
   // iterating a char[] as dchar decodes the UTF-8 text into code points on the fly
   foreach (dchar c; text) {
     if (c == ' ') spaceCount++;
   }
   Stdout.formatln("iterated through {0} characters, and found {1} spaces in {2:f6} seconds", text.length, spaceCount, watch.stop());
}

private static void runWordFindTest(char[] text) {
   StopWatch watch;
   watch.start();
   int wordInstanceCount = -1;
   int position = 0;
   do {
      wordInstanceCount++;
      position = Util.locatePattern(text, "the", position + 1);
   } while (position < text.length);
   Stdout.formatln("Util.locatePattern(): found {0} instances of 'the' in {1:f6} seconds", wordInstanceCount, watch.stop());
}

private static void runWordReplaceTest(char[] text) {
   int oldLength = text.length;
   StopWatch watch;
   watch.start();
   char[] replaced = Util.substitute(text, "the", "XXXX");
   int newLength = replaced.length;
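   // each replacement makes the text exactly one char longer ("the" -> "XXXX"),
   // so the length difference equals the number of replacements
   int replacementCount = newLength - oldLength;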
   Stdout.formatln("replaced {0} instances of 'the' with 'XXXX' in {1:f6} seconds", replacementCount, watch.stop());
}

private static char[][] runWordSplitTest(char[] text) {
   StopWatch watch;
   watch.start();
   char[][] splitWords = Util.split(text, " ");
   Stdout.formatln("split text into {0} words in {1:f6} seconds", splitWords.length, watch.stop());
   return splitWords;
}

private static void runConcatenateTest(char[][] splitWords) {
   StopWatch watch;
   watch.start();
   char[] buffer;
   foreach (char[] word; splitWords) {
      buffer ~= word;
      buffer ~= " ";
   }
   Stdout.formatln("concatenated {0} words (with {1} chars) in {2:f6} seconds", splitWords.length, buffer.length, watch.stop());
}

