www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Tips and Tricks for D

reply mclysenk mtu.edu writes:
(Not sure if this belongs in announcements)

I've recently written up an article for intermediate and beginning D
programmers.  

In it I discuss some basic tricks, covering:
+ printf
+ import scope
+ struct initialization
+ converting static arrays
+ removing objects from dynamic arrays

You can read it at http://www.assertfalse.com/articles/tricks.shtml .

Any comments, questions, suggestions or other feedback is welcome!

-Mikola Lysenko
Jul 02 2006
next sibling parent Derek Parnell <derek nomail.afraid.org> writes:
On Sun, 2 Jul 2006 22:38:15 +0000 (UTC), mclysenk mtu.edu wrote:


 You can read it at http://www.assertfalse.com/articles/tricks.shtml .
 
 Any comments, questions, suggestions or other feedback is welcome!

An alternate or additional place for this information (which is quite good) is the Wiki for D (http://www.prowiki.org/wiki4d/wiki.cgi). -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocrity!" 3/07/2006 10:50:07 AM
Jul 02 2006
prev sibling next sibling parent reply Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
mclysenk mtu.edu wrote:
 You can read it at http://www.assertfalse.com/articles/tricks.shtml .
 
 Any comments, questions, suggestions or other feedback is welcome!
 

I've always removed items from an array whilst preserving order in the following way (using variables from your example): queue = queue[0..idx] ~ queue[idx+1..$]; I did some test runs (benchmark source can be found at bottom of message), and on my machine, at least, my version is consistently faster - it runs in about 60% of the time yours takes. BTW, you write "tmp[].dup", which seems strange at least to my eyes: why not just "tmp.dup"? It confused me for a moment, I wondered whether it was doing something more than just a normal .dup operation. All in all, yours is a fine article about D. I'm not sure I knew about slicing pointers either. <g> My benchmark code: import std.stdio, std.perf; void main() { const int REPEATS = 1000, LENGTH = 100000; const int[] removeThese = [0, 42, 999, 7, 1000, 6888, 999, 9992]; int[] array; TickCounter t = new TickCounter(); t.start; for (int i = 0; i < REPEATS; ++i) { array.length = LENGTH; foreach (n; removeThese) { if (n != array.length - 1) { int[] tmp = array[n+1..$]; array[n..$-1] = tmp[].dup; } array.length = array.length - 1; } } t.stop; uint mTime = t.milliseconds; writefln("MLysenko: %6d ms for %6d repeats on %6d-length arrays (%.2f ms per removal)", mTime, REPEATS, LENGTH, cast(real)mTime / REPEATS); t.start; for (int i = 0; i < REPEATS; ++i) { array.length = LENGTH; foreach (n; removeThese) array = array[0..n] ~ array[n+1..$]; } t.stop; uint dTime = t.milliseconds; writefln("Deewiant: %6d ms for %6d repeats on %6d-length arrays (%.2f ms per removal)", dTime, REPEATS, LENGTH, cast(real)dTime / REPEATS); writefln("D / M: %.2f", cast(real)dTime / mTime); writefln("M / D: %.2f", cast(real)mTime / dTime); }
Jul 03 2006
parent reply mclysenk mtu.edu writes:
In article <e8ajmp$o9c$1 digitaldaemon.com>, Deewiant says...
mclysenk mtu.edu wrote:
 You can read it at http://www.assertfalse.com/articles/tricks.shtml .
 
 Any comments, questions, suggestions or other feedback is welcome!
 

I've always removed items from an array whilst preserving order in the following way (using variables from your example): queue = queue[0..idx] ~ queue[idx+1..$]; I did some test runs (benchmark source can be found at bottom of message), and on my machine, at least, my version is consistently faster - it runs in about 60% of the time yours takes.

First of all, thanks for the feedback! And second, you do have a good point. However, there is a slight bias in your test. Instead of removing elements from arbitrary positions in the array, you are strictly removing elements from the first 10000 positions. In your method, you are copying the beginning of the array, whereas in my method I am copying the end of the array. If you try the following benchmark, my version runs about 25-40 times faster on my machine: import std.stdio, std.perf; void main() { const int REPEATS = 1000, LENGTH = 100000; //Changed 0 to 1 in order to prevent bounds errors. const int[] removeThese = [1, 42, 999, 7, 1000, 6888, 999, 9992]; int[] array; TickCounter t = new TickCounter(); t.start; for (int i = 0; i < REPEATS; ++i) { array.length = LENGTH; foreach (n; removeThese) { n = array.length - n; //Pull from the end of the array if (n != array.length - 1) { int[] tmp = array[n+1..$]; array[n..$-1] = tmp[].dup; } array.length = array.length - 1; } } t.stop; uint mTime = t.milliseconds; writefln("MLysenko: %6d ms for %6d repeats on %6d-length arrays (%.2f ms per removal)", mTime, REPEATS, LENGTH, cast(real)mTime / REPEATS); t.start; for (int i = 0; i < REPEATS; ++i) { array.length = LENGTH; foreach (n; removeThese) { n = array.length - n; // Remove from opposite side. array = array[0..n] ~ array[n+1..$]; } } t.stop; uint dTime = t.milliseconds; writefln("Deewiant: %6d ms for %6d repeats on %6d-length arrays (%.2f ms per removal)", dTime, REPEATS, LENGTH, cast(real)dTime / REPEATS); writefln("D / M: %.2f", cast(real)dTime / mTime); writefln("M / D: %.2f", cast(real)mTime / dTime); } In the end, there isn't a single best answer to this problem. Both approaches have their merit, and perhaps an adaptive strategy which combines the two might best. (However yours performs far more consistently than mine, which may be a big advantage in some cases.)
BTW, you write "tmp[].dup", which seems strange at least to my eyes: why not
just "tmp.dup"? It confused me for a moment, I wondered whether it was doing
something more than just a normal .dup operation.

There is no difference. Since it is an array copy, I prefer to use the brackets. Once more, thanks for the comments! - Mikola Lysenko
Jul 03 2006
next sibling parent reply Tom S <h3r3tic remove.mat.uni.torun.pl> writes:
Errm... array.dup is evil, since it causes an alloc -> GC stalls anyone ?

How about this ?

void removeNth(T)(inout T[] array, int n) {
	if (n < array.length - 1) {
		memmove(&array[n],
				&array[n+1],
				(array.length - n - 1) * T.sizeof);
	}

	array.length = array.length - 1;
}


then just:

foreach (n; removeThese)
{
	n = array.length - n;
	array.removeNth(n);
}



-- 
Tomasz Stachowiak  /+ a.k.a. h3r3tic +/
Jul 03 2006
parent mclysenk mtu.edu writes:
In article <e8b8js$1m53$1 digitaldaemon.com>, Tom S says...
Errm... array.dup is evil, since it causes an alloc -> GC stalls anyone ?

How about this ?

void removeNth(T)(inout T[] array, int n) {
	if (n < array.length - 1) {
		memmove(&array[n],
				&array[n+1],
				(array.length - n - 1) * T.sizeof);
	}

	array.length = array.length - 1;
}

I thought about using a templated solution like this, but I decided to avoid it since it is perhaps a bit too complicated for an introductory article. However, upon consideration I think that this is probably the best way to go for any sort of heavy duty application. Of course, if you have 100k+ elements and you care about order, even a naive a linked list will beat the fastest array implementation any day. -Mikola Lysenko
Jul 03 2006
prev sibling next sibling parent Deewiant <deewiant.doesnotlike.spam gmail.com> writes:
mclysenk mtu.edu wrote:
 In article <e8ajmp$o9c$1 digitaldaemon.com>, Deewiant says...
 mclysenk mtu.edu wrote:
 You can read it at http://www.assertfalse.com/articles/tricks.shtml .

 Any comments, questions, suggestions or other feedback is welcome!

way (using variables from your example): queue = queue[0..idx] ~ queue[idx+1..$]; I did some test runs (benchmark source can be found at bottom of message), and on my machine, at least, my version is consistently faster - it runs in about 60% of the time yours takes.

First of all, thanks for the feedback! And second, you do have a good point. However, there is a slight bias in your test. Instead of removing elements from arbitrary positions in the array, you are strictly removing elements from the first 10000 positions. In your method, you are copying the beginning of the array, whereas in my method I am copying the end of the array. If you try the following benchmark, my version runs about 25-40 times faster on my machine:

My bad: originally I used a LENGTH of 10000 and then I neglected to change removeThese to include more testcases.
 In the end, there isn't a single best answer to this problem.  Both approaches
 have their merit, and perhaps an adaptive strategy which combines the two might
 best. (However yours performs far more consistently than mine, which may be a
 big advantage in some cases.)
 

You're right. Your method is a lot faster for removing elements near the end of an array.
Jul 03 2006
prev sibling parent Georg Wrede <georg.wrede nospam.org> writes:
mclysenk mtu.edu wrote:
 In article <e8ajmp$o9c$1 digitaldaemon.com>, Deewiant says...
 
mclysenk mtu.edu wrote:

You can read it at http://www.assertfalse.com/articles/tricks.shtml .

Any comments, questions, suggestions or other feedback is welcome!

I've always removed items from an array whilst preserving order in the following way (using variables from your example): queue = queue[0..idx] ~ queue[idx+1..$]; I did some test runs (benchmark source can be found at bottom of message), and on my machine, at least, my version is consistently faster - it runs in about 60% of the time yours takes.

First of all, thanks for the feedback! And second, you do have a good point. However, there is a slight bias in your test. Instead of removing elements from arbitrary positions in the array, you are strictly removing elements from the first 10000 positions. In your method, you are copying the beginning of the array, whereas in my method I am copying the end of the array. If you try the following benchmark, my version runs about 25-40 times faster on my machine: import std.stdio, std.perf; void main() { const int REPEATS = 1000, LENGTH = 100000; //Changed 0 to 1 in order to prevent bounds errors. const int[] removeThese = [1, 42, 999, 7, 1000, 6888, 999, 9992]; int[] array; TickCounter t = new TickCounter(); t.start; for (int i = 0; i < REPEATS; ++i) { array.length = LENGTH; foreach (n; removeThese) { n = array.length - n; //Pull from the end of the array if (n != array.length - 1) { int[] tmp = array[n+1..$]; array[n..$-1] = tmp[].dup; } array.length = array.length - 1; } } t.stop; uint mTime = t.milliseconds; writefln("MLysenko: %6d ms for %6d repeats on %6d-length arrays (%.2f ms per removal)", mTime, REPEATS, LENGTH, cast(real)mTime / REPEATS); t.start; for (int i = 0; i < REPEATS; ++i) { array.length = LENGTH; foreach (n; removeThese) { n = array.length - n; // Remove from opposite side. array = array[0..n] ~ array[n+1..$]; } } t.stop; uint dTime = t.milliseconds; writefln("Deewiant: %6d ms for %6d repeats on %6d-length arrays (%.2f ms per removal)", dTime, REPEATS, LENGTH, cast(real)dTime / REPEATS); writefln("D / M: %.2f", cast(real)dTime / mTime); writefln("M / D: %.2f", cast(real)mTime / dTime); } In the end, there isn't a single best answer to this problem. Both approaches have their merit, and perhaps an adaptive strategy which combines the two might best. (However yours performs far more consistently than mine, which may be a big advantage in some cases.)

If I understand correctly, the last items in the array get shifted a number of times, which takes unnecessary time. Imagine a "history diagram" of the array. In it I've put an "R" for the removables, and those in between are denoted with single lower case letters: RaaaaRbbbbbRccRddddddReeeeeeeeeeRfffff (So all items between the first and second removable are called "a".) Now, the history would look like: RaaaaRbbbbbRccRddddddReeeeeeeeeeRfffff aaaaRbbbbbRccRddddddReeeeeeeeeeRfffff aaaabbbbbRccRddddddReeeeeeeeeeRfffff aaaabbbbbccRddddddReeeeeeeeeeRfffff aaaabbbbbccddddddReeeeeeeeeeRfffff aaaabbbbbccddddddeeeeeeeeeeRfffff aaaabbbbbccddddddeeeeeeeeeefffff A lot less copying would be done with the following: RaaaaRbbbbbRccRddddddReeeeeeeeeeRfffff aaaa RbbbbbRccRddddddReeeeeeeeeeRfffff aaaabbbbb RccRddddddReeeeeeeeeeRfffff aaaabbbbbcc RddddddReeeeeeeeeeRfffff aaaabbbbbccdddddd ReeeeeeeeeeRfffff aaaabbbbbccddddddeeeeeeeeee Rfffff aaaabbbbbccddddddeeeeeeeeeefffff With this strategy, no element ever gets moved more than once. Of course, this is only applicable when all the elements-to-be-removed are known before the start. If you have an array of 1M elements, and remove, say a few hundred elements from near the beginning, then these two methods have a huge performance difference. (And yes, like someone pointed out, there do exist real-life situations where one simply has to use an array for this, as opposed to a linked list.) PS: I have no objection if you include this reasoning in your excellent Tips and Tricks, should you want to. :-)
Jul 06 2006
prev sibling parent reply Henrik Eneroth <Henrik_member pathlink.com> writes:
In article <e89hsn$1rvh$1 digitaldaemon.com>, mclysenk mtu.edu says...
(Not sure if this belongs in announcements)

I've recently written up an article for intermediate and beginning D
programmers.  

In it I discuss some basic tricks, covering:
+ printf
+ import scope
+ struct initialization
+ converting static arrays
+ removing objects from dynamic arrays

You can read it at http://www.assertfalse.com/articles/tricks.shtml .

Any comments, questions, suggestions or other feedback is welcome!

-Mikola Lysenko

Useful stuff! A question about structs: I have in many cases used classes over structs because of the lack of constructors in structs. Performance-wise, is it always better to use structs over classes if you can? Or will a class where no virtual functions etc. are used be similar to a struct performance-wise? Also, if structs are copy-by-value, could this lead to unnecessary memory allocations in those cases where just passing a reference (i.e. classes) would do? That is, would classes be a better choice in this case? /// Henrik
Jul 03 2006
parent Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
Henrik Eneroth skrev:
 In article <e89hsn$1rvh$1 digitaldaemon.com>, mclysenk mtu.edu says...
 (Not sure if this belongs in announcements)

 I've recently written up an article for intermediate and beginning D
 programmers.  

 In it I discuss some basic tricks, covering:
 + printf
 + import scope
 + struct initialization
 + converting static arrays
 + removing objects from dynamic arrays

 You can read it at http://www.assertfalse.com/articles/tricks.shtml .

 Any comments, questions, suggestions or other feedback is welcome!

 -Mikola Lysenko

Useful stuff! A question about structs: I have in many cases used classes over structs because of the lack of constructors in structs. Performance-wise, is it always better to use structs over classes if you can? Or will a class where no virtual functions etc. are used be similar to a struct performance-wise?

A class will always have a bit more space overhead than a struct. Other than the implications of by-value vs by-reference there should not be any performance differences.
 Also, if structs are copy-by-value, could this lead to unnecessary memory
 allocations in those cases where just passing a reference (i.e. classes) would
 do? That is, would classes be a better choice in this case?

A pass-by-value struct will be allocated on the stack which generally costs nothing compared to a heap allocation. The cost of copying depends on the size of the struct. In general, passing large structs by value is expensive. Small structs may (or may not) be more efficiently passed by value. The actual speed difference of passing a small struct by value compared to by reference depends largely on how the struct is used and the ABI in use. On some ABIs struct XY { int x,y; } int myfunc(XY p) { return p.x+p.y; } will be identical to: int myfunc(int x, int y) { return x+y; } Returning a struct by value may be more expensive than passing it. The only way to know for certain is to run the benchmarks yourself. I would guess that on x86, with its few registers, passing by reference could often be faster even for rather small structs. Does anyone have conclusive benchmark results and a rule of thumb here? Rather than using a class, one could explicitly pass the struct by reference, by either an inout parameter or a pointer: int myfunc(inout XY p) int myfunc(XY *p) Regards, Oskar
Jul 03 2006