Welcome to Web-News
A Web-based News Reader
Subject Help to improve speed
From Jacob Carlborg <doobnet@gmail.com>
Date Mon, 09 Jun 2008 18:38:46 +0200
Newsgroups digitalmars.D
Attachment(s) phonecode.rbphonecode.d

I have an application that maps numbers to words from a list of words
written in Ruby. I've ported the application to D and it's terrible slow
  and I don't know why. With a file of around 60 000 words and a file
with around 25 000 numbers the Ruby application takes around 3 seconds
and my D port takes around 209 seconds to finish. It's probably not well
coded, I just tried to port it but over 200 seconds longer doesn't feel
right.

I've attached the Ruby and D files.





class File
  
  def setMap(m)
    @map = m
  end
  
  def init(showText)
    #Initiates necessery action in order to search for words. Sets map, calulates min wordlength and saves words to hashtable
    map = {"a"=>"5","b"=>"7","c"=>"6","d"=>"3","e"=>"0","f"=>"4","g"=>"9","h"=>"9","i"=>"6","j"=>"1","k"=>"7","l"=>"8","m"=>"5","n"=>"1","o"=>"8","p"=>"8","q"=>"1","r"=>"2","s"=>"3","t"=>"4","u"=>"7","v"=>"6","w"=>"2","x"=>"2","y"=>"3","z"=>"9"}
    $stdout.puts "Sets mapping for letter to number" if showText
    self.setMap(map)
    $stdout.puts "Calculates minimal wordlength" if showText
    self.calcMinWordLength()
    $stdout.puts "Saves words in hashtable" if showText
    self.toHash()
  end
  
  def findWords(number)
    #Method that tries to find all possible word replacements for the digits in number.
    #Returns an array with all possible word replacements
    return self.findWordsRec(number, true)
  end
  
  def findWordsRec(number, replaceWithNumb)
    #Method that tries to find all possible word replacements for the digits in number.
    #replaceWithNumb is a argument used in recursive calls to prevent two digits after each other
    arrRet = []
    #Saves the current pos in file to make the method recursive safe
    
    if number.length == 0
      #Reached end of recursive calls.
      return [""]
    else
      #Gets an array with all words whos first letters match the number respetation of number[0,@minWordLength-1]
      arr = @hash[number[0,@minWordLength-1]]
      if arr != nil
        arr.each do |dicWord|
          #Search file for words whos digit respetation match the argument number. dicWord is the word to check
          dicWordNumb = self.StringToNumb(dicWord)  #Gets the number reseptation of dicWord
          if (number =~ /#{dicWordNumb}/) == 0
            #Current word matches the number
            #Do recurisve call with the rest of number that the word don't match, even if there is not any numbers left.
            tail = self.findWordsRec(number[dicWord.length..-1], true)
            tail.each do |t|
              #Found words that fit the rest. Add these to the return array.
              arrRet << dicWord + " " + t
            end
          end
        end
      end
    end
    
    if arrRet.length == 0 and replaceWithNumb == true
      #Found no words at all. Try to replace first char with a digit and do recursive call on the rest
      tail = self.findWordsRec(number[1, number.length--1], false)
      tail.each do |t|
        #Found words that fit the rest. Add these to the return array
        arrRet << number[0,1] + " " + t
      end
    end
    #Before returning make sure no words exists twice in arrRet
    return arrRet.uniq
  end

  def StringToNumb(str)
    #Converts str to numbers according to map
    ret = ""
    str.chomp.split(//).each do |char|
      #Uses the hashtable @map to find the corresponding number to character
      ret << @map[char.downcase]
    end
    return ret
  end
  
  def toHash()
    #Makes a hashtable where the values are arrays with words whos first letters translates to the key
    #How many letter that are used as key depends on how long the shortaged word in file is
    #ex. the word 'hello' is in an array where the key is 90 as h=9 and e=0 and the shortest word in file is 2 characters long
    hash = {}
    self.each do |word|
      #Loop throu all words in file
      word.chomp!
      #x contains the number respetation of word.
      x = self.StringToNumb(word)
      #cuts away the tail in order to make sure all keys in hastable is not longet then the minWordLength in file
      x= x[0,@minWordLength-1]
      if hash[x] == nil
        hash[x] = [word]
      else
        hash[x] << word
      end
    end
    @hash = hash
    self.rewind
  end
  
  def calcMinWordLength()
    #Calculates the minimal word length
    ret = 1000
    self.each do |word|
      if word.chomp.length < ret
        ret = word.chomp.length
      end
    end
    self.rewind()
    @minWordLength = ret
  end
  
end

t1 = Time.now

dicFile = File.new("dict.txt","r")
numbFile = File.new("numbers50k.txt","r")
outFile = File.new("out.txt", "w")

#Initates file
dicFile.init(true)

numbFile.each do |numb|
  #For all numbers in numbers file try to find words
  numb.chomp!
  ret = dicFile.findWords(numb)
  ret.each do |re|
    #Print result to screen and output file
    puts "#{numb}: #{re}"
    #In output file all characters are in downcase to work with validation program
    outFile.puts "#{numb}: #{re.downcase}"
  end
end

outFile.close
dicFile.close
numbFile.close

puts "Time:#{Time.now - t1}"



/** * Copyright: Copyright (c) 2008 Jacob Carlborg. All rights reserved. * Authors: Jacob Carlborg * Version: Initial created: May 27, 2008 * License: $(LINK2 http://opensource.org/licenses/bsd-license.php, BSD Style) * */ module phonecode; import tango.core.Array; import tango.io.FileConduit; import tango.io.Stdout; import Ascii = tango.text.Ascii; import tango.text.Regex; import tango.text.stream.LineIterator; import tango.time.StopWatch; import tango.time.Time; char[char] map; string[][string] hash; int minWordLength; string[] words; /** * Prints to standard output * and adds a new line * * Params: *     args = what to print */ void println (A...) (A args) {         foreach (a ; args)                 Stdout(a);         Stdout.newline; } /** * Initiates necessery action in order to search for words. * Calulates min wordlength and saves words to hashtable * * Params: *     words = the array of words */ void init (string[] words) {         map = ['a' : '5', 'b' : '7', 'c' : '6', 'd' : '3', 'e' : '0', 'f' : '4',                         'g' : '9', 'h' : '9', 'i' : '6', 'j' : '1', 'k' : '7', 'l' : '8',                         'm' : '5', 'n' : '1', 'o' : '8', 'p' : '8', 'q' : '1', 'r' : '2',                         's' : '3', 't' : '4', 'u' : '7', 'v' : '6', 'w' : '2', 'x' : '2',                         'y' : '3', 'z' : '9'];                  .words = words;         debug                 println("Calculates minimal wordlength");         calcMinWordLength;         debug                 println("Saves words in hashtable");         toHash; } /** * Converts the given char to lowercase * * Params: *     c = the char to convert *     * Returns: the converted char */ char toLower (char c) {         return Ascii.toLower([c])[0]; } /** * Method that tries to find all possible * word replacements for the digits in number. * * Params: *     number = the number to find all the words for *     * Returns: an array with all possible word replacements */ string[] findWords (string number) {         return findWordsRec(number, true); } /** * Method that tries to find all possible * word replacements for the digits in number. * * Params: *     number = the number to find all the words for *     replaceWithNumb = argument used in recursive calls to prevent two digits after each other *     * Returns: an array with all possible word replacements */ string[] findWordsRec (string number, bool replaceWithNumb) {         string[] arrRet;         if (number.length == 0)                 return [""];         else         {                 if (minWordLength - 1 < number.length)                 {                         if (number[0 .. minWordLength - 1] in hash)                         {                                 string[] arr = hash[number[0 .. minWordLength - 1]];                                          foreach (dicWord ; arr)                                 {                                         auto dicWordNumb = stringToNumb(dicWord);                                                  if (Regex(dicWordNumb).test(number))                                         {                                                 auto tail = findWordsRec(number[dicWord.length .. $], true);                                                          foreach (t ; tail)                                                         arrRet ~= dicWord ~ " " ~ t;                                         }                                 }                         }                 }                                  else                 {                         if (number[0 .. $] in hash)                         {                                 string[] arr = hash[number[0 .. minWordLength - 1]];                                          foreach (dicWord ; arr)                                 {                                         auto dicWordNumb = stringToNumb(dicWord);                                                  if (Regex(dicWordNumb).test(number))                                         {                                                 auto tail = findWordsRec(number[dicWord.length .. $], true);                                                          foreach (t ; tail)                                                         arrRet ~= dicWord ~ " " ~ t;                                         }                                 }                         }                 }                                 }         if (arrRet.length == 0 && replaceWithNumb == true)         {                 auto tail = findWordsRec(number[1 .. $], false);                 foreach (t ; tail)                         arrRet ~= number[0 .. 1] ~ " " ~ t;         }         auto i = arrRet.unique();         return arrRet[0 .. i]; } /** * Converts str to numbers according to map * * Params: *     str = the string to convert *     * Returns: the convertet string */ string stringToNumb (string str) {         string ret;         foreach (c ; str)                 ret ~= map[toLower(c)];         return ret; } /** * Makes a hashtable where the values are arrays with words whos first letters translates to the key * How many letter that are used as key depends on how long the shortaged word in file is * ex. the word 'hello' is in an array where the key is 90 as h=9 and e=0 and the shortest word in file is 2 characters long */ void toHash () {         string[][string] h;         foreach (word ; words)         {                 auto x = stringToNumb(word);                 x = x[0 .. minWordLength - 1];                 if (!(x in h))                         h[x] = [word];                 else                         h[x] ~= word;         }         hash = h; } /// Calculates the minimal word length void calcMinWordLength () {         auto ret = 1000;         foreach (word ; words)                 if (word.length < ret)                         ret = word.length;         minWordLength = ret; } void main (string[] args) {         if (args.length == 3)         {                 StopWatch elapsed;                 elapsed.start;                 string[] words;                 auto filenameNumbers = new FileConduit(args[1]);                 auto filenameWords = new FileConduit(args[2]);                                  scope (exit)                 {                         filenameNumbers.close;                         filenameWords.close;                 }                                  auto numbersLineIterator = new LineIterator!(char)(filenameNumbers);                 auto wordsLineIterator = new LineIterator!(char)(filenameWords);                 foreach (word ; wordsLineIterator)                         words ~= word.dup;                 init(words);                 foreach (number ; numbersLineIterator)                 {                         auto ret = findWords(number);                         foreach (re ; ret)                                 println(number, ": ", re);                 }                 auto time = TimeSpan.micros(elapsed.microsec);                 println("time: \n", time.hours, " hours\n", time.minutes, " minutes\n",                                 time.seconds, " seconds\n", time.millis, " milliseconds\n",                                 time.micros, " microseconds\n", time.nanos, " nanoseconds\n");         }         else                 println("usage: phonecode numbers words"); }
Recent messages in this thread
 
-# Help to improve speed (Current message) Jacob Carlborg 09-Jun-2008 12:38 pm
.-# Re: Help to improve speed bearophile 09-Jun-2008 03:27 pm
.|\# Re: Help to improve speed Jacob Carlborg 09-Jun-2008 04:53 pm
.-# Re: Help to improve speed Walter Bright 09-Jun-2008 04:07 pm
..-# Re: Help to improve speed Jacob Carlborg 09-Jun-2008 05:23 pm
...-# Re: Help to improve speed Walter Bright 09-Jun-2008 07:37 pm
...|-# Re: Help to improve speed Anders F Björklund 10-Jun-2008 03:31 am
...|.-# Re: Help to improve speed Jacob Carlborg 10-Jun-2008 06:21 am
...|..\# Re: Help to improve speed Walter Bright 10-Jun-2008 05:10 pm
...\# Re: Help to improve speed Lutger 10-Jun-2008 05:39 am