www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - A better std.string.replace?

reply Chris Sauls <ibisbasenji gmail.com> writes:
I cooked up this program originally just to test the little 'Timer' class I
made for 
myself, and wound up with something interesting.  I tried to write a "better"
version of 
std.string.replace, and compared their run times.  I made two versions of mine,
one of 
them using an inout parameter.  And I timed calling it as a "pseudo-member"
function (aka: 
arr.func()).  These were my results:

[Timer] 'replace' benchmarks: Begin
[Timer] std.string.replace: Begin
[Timer] std.string.replace: 20.66000
[Timer] myReplace: Begin
[Timer] myReplace: 17.02000
[Timer] myReplaceIO: Begin
[Timer] myReplaceIO: 13.85000
[Timer] myReplaceIO pseudo-member: Begin
[Timer] myReplaceIO pseudo-member: 13.62000
[Timer] 'replace' benchmarks: 65.15000

I note four things here.  First, that my timer seems to be working.  That's
good for me, 
but nothing exciting.  Second, that my cheap and cheesy replace function
actually seems to 
be a little faster.  Third, that using an inout parameter has made it even
faster.  And 
fourth, that for some reason the pseudo-member invocation sometimes clocks even
faster.

Draw your own conclusions.  Complete source of the test program follows.

-- Chris Sauls

#
# module test1;
#
# private import std.string;
# private import timer;
#
# const int    ITERS    = 1_000_000;
#
# const char[] ORIGINAL =
"This+is+just+your+ordinary+run+of+the+mill+sentence.";
# const char[] FROM     = "+";
# const char[] TO       = " ";
#
# int main (char[][] args) {
#   auto Timer t0 = new Timer("'replace' benchmarks");
#
#   {
#     char[] orig   = ORIGINAL.dup,
#            result               ;
#
#     auto Timer t = new Timer("std.string.replace");
#     for (int i; i < ITERS; i++) {
#       result = replace(orig, FROM, TO);
#       replace(result, TO, FROM);
#     }
#   }
#
#   {
#     char[] orig   = ORIGINAL.dup,
#            result               ;
#
#     auto Timer t = new Timer("myReplace");
#     for (int i; i < ITERS; i++) {
#       result = myReplace(orig, FROM, TO);
#       replace(result, TO, FROM);
#     }
#   }
#
#   {
#     char[] str = ORIGINAL.dup;
#
#     auto Timer t = new Timer("myReplaceIO");
#     for (int i; i < ITERS; i++) {
#       myReplaceIO(str, FROM, TO);
#       myReplaceIO(str, TO, FROM);
#     }
#   }
#
#   {
#     char[] str = ORIGINAL.dup;
#
#     auto Timer t = new Timer("myReplaceIO pseudo-member");
#     for (int i; i < ITERS; i++) {
#       str.myReplaceIO(FROM, TO);
#       str.myReplaceIO(TO, FROM);
#     }
#   }
#
#   return 0;
# }
#
# char[] myReplace (char[] str, char[] from, char[] to) {
#   return std.string.join(std.string.split(str, from), to);
# }
#
# void myReplaceIO (inout char[] str, char[] from, char[] to) {
#   str = std.string.join(std.string.split(str, from), to);
# }
#
Jul 20 2005
parent reply Burton Radons <burton-radons smocky.com> writes:
Chris Sauls wrote:
 I cooked up this program originally just to test the little 'Timer' 
 class I made for myself, and wound up with something interesting.  I 
 tried to write a "better" version of std.string.replace, and compared 
 their run times.  I made two versions of mine, one of them using an 
 inout parameter.  And I timed calling it as a "pseudo-member" function 
 (aka: arr.func()).  These were my results:

You're getting the speed benefits from using out because you made a mistake in myReplace - you call it once but use std.string.replace for the reverse call. Fixing that makes them take identical times. Yours does use less memory than std.string.replace in most cases. Funny. It can be made slightly faster by avoiding a scan later on, but that wouldn't show up on any but seriously damaged tests. I made a recursive version that makes only one allocation. My results were: [Timer] std.string.replace: 3.4850 [Timer] myReplace: 2.3590 [Timer] replace_recursive: 1.9530 [Timer] 'replace' benchmarks: 7.7970 20% isn't good enough to merit a strict but unknowable, environment-defined limit on how many matches it can take, in my opinion. I think std.string.replace should be swapped with your function.
Jul 20 2005
next sibling parent Chris Sauls <ibisbasenji gmail.com> writes:
Burton Radons wrote:
 You're getting the speed benefits from using out because you made a 
 mistake in myReplace - you call it once but use std.string.replace for 
 the reverse call.  Fixing that makes them take identical times.

these times: [Timer] std.string.replace: 20.60000 [Timer] myReplace: 13.62000 [Timer] myReplaceIO: 13.79000 [Timer] myReplaceIO pseudo-member: 13.68000 [Timer] 'replace' benchmarks: 61.69000 So yes, the regular myReplace is definitely quicker. I thought it seemed a little odd the other way around. Although I'm still curious as to why the pseudo-member syntax is clocking slightly faster than the normal invocation.
 I made a recursive version that makes only one allocation.  My results 
 were:
 
 [Timer] std.string.replace: 3.4850
 [Timer] myReplace: 2.3590
 [Timer] replace_recursive: 1.9530
 [Timer] 'replace' benchmarks: 7.7970
 

w00t. Share? -- Chris Sauls
Jul 20 2005
prev sibling parent reply "Ben Hinkle" <bhinkle mathworks.com> writes:
"Burton Radons" <burton-radons smocky.com> wrote in message 
news:dbm52s$k37$1 digitaldaemon.com...
 Chris Sauls wrote:
 I cooked up this program originally just to test the little 'Timer' class 
 I made for myself, and wound up with something interesting.  I tried to 
 write a "better" version of std.string.replace, and compared their run 
 times.  I made two versions of mine, one of them using an inout 
 parameter.  And I timed calling it as a "pseudo-member" function (aka: 
 arr.func()).  These were my results:

You're getting the speed benefits from using out because you made a mistake in myReplace - you call it once but use std.string.replace for the reverse call. Fixing that makes them take identical times. Yours does use less memory than std.string.replace in most cases. Funny. It can be made slightly faster by avoiding a scan later on, but that wouldn't show up on any but seriously damaged tests. I made a recursive version that makes only one allocation. My results were: [Timer] std.string.replace: 3.4850 [Timer] myReplace: 2.3590 [Timer] replace_recursive: 1.9530 [Timer] 'replace' benchmarks: 7.7970 20% isn't good enough to merit a strict but unknowable, environment-defined limit on how many matches it can take, in my opinion. I think std.string.replace should be swapped with your function.

Since a common case would be that from.length == to.length it might be worth putting in the optimization that would only need one allocation (ie - if there are any matches, dup and copy 'to' in place of 'from' repeatedly) and use the split/join for the general case. I have no idea what the performance would end up being.
Jul 20 2005
parent Burton Radons <burton-radons smocky.com> writes:
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit

Ben Hinkle wrote:

 "Burton Radons" <burton-radons smocky.com> wrote in message 
 news:dbm52s$k37$1 digitaldaemon.com...
 
Chris Sauls wrote:

I cooked up this program originally just to test the little 'Timer' class 
I made for myself, and wound up with something interesting.  I tried to 
write a "better" version of std.string.replace, and compared their run 
times.  I made two versions of mine, one of them using an inout 
parameter.  And I timed calling it as a "pseudo-member" function (aka: 
arr.func()).  These were my results:

You're getting the speed benefits from using out because you made a mistake in myReplace - you call it once but use std.string.replace for the reverse call. Fixing that makes them take identical times. Yours does use less memory than std.string.replace in most cases. Funny. It can be made slightly faster by avoiding a scan later on, but that wouldn't show up on any but seriously damaged tests. I made a recursive version that makes only one allocation. My results were: [Timer] std.string.replace: 3.4850 [Timer] myReplace: 2.3590 [Timer] replace_recursive: 1.9530 [Timer] 'replace' benchmarks: 7.7970 20% isn't good enough to merit a strict but unknowable, environment-defined limit on how many matches it can take, in my opinion. I think std.string.replace should be swapped with your function.

Since a common case would be that from.length == to.length it might be worth putting in the optimization that would only need one allocation (ie - if there are any matches, dup and copy 'to' in place of 'from' repeatedly) and use the split/join for the general case. I have no idea what the performance would end up being.

That's true. I get these results when I do that: [Timer] replace_new: 2.3440 (baseline) [Timer] std.string.replace: 7.1870 (0.3261 vs baseline) [Timer] myReplace: 4.7660 (0.4918 vs baseline) [Timer] replace_recursive: 3.8130 (0.6147 vs baseline) [Timer] 'replace' benchmarks: 18.1100 (0.1294 vs baseline)
Jul 20 2005