www.digitalmars.com         C & C++   DMDScript  

D - I just read that D uses the COW principle for strings

reply Helium <Helium_member pathlink.com> writes:
As said in the topic, D seems to use COW. Used in single-threaded applications
it can really speed things up. But in the multithreaded world that we have today
it can really slow things down.

I'm new to D, and I don't know whether it even supports multithreading. If not,
forget this post. If it does, you should really think about COW, because it's a
speedup that isn't.
Sep 10 2003
parent reply Ilya Minkov <minkov cs.tum.edu> writes:
Helium wrote:
 As said in the topic, D seems to use COW. Used in single-threaded applications
 it can really speed things up. But in the multithreaded world that we have today
 it can really slow things down.

Why would it actually slow things down? And how does that depend on threads? The only concern I'm aware of is old strings which are left to the garbage collector, which happens to lots of other stuff anyway. The GC is currently somewhat spartan with regard to threads and could use some improvement, but that will change someday.
 I'm new to D, and I don't know whether it even supports multithreading. If not,
 forget this post. If it does, you should really think about COW, because it's a
 speedup that isn't.

Sure it supports threads. See the Phobos pages in the spec. Welcome to the community, and be sure to read further. ;) -eye
Sep 10 2003
parent reply "Philippe Mori" <philippe_mori hotmail.com> writes:
"Ilya Minkov" <minkov cs.tum.edu> wrote in message
news:bjnk9u$2joq$1 digitaldaemon.com...
 Helium wrote:
 As said in the topic, D seems to use COW. Used in single-threaded
 applications it can really speed things up. But in the multithreaded
 world that we have today it can really slow things down.

 Why would it actually slow things down? And how does that depend on threads?
 The only concern I'm aware of is old strings which are left to the garbage
 collector, which happens to lots of other stuff anyway. The GC is currently
 somewhat spartan with regard to threads and could use some improvement, but
 that will change someday.

It is known in C++ that COW string implementations are either slower or only marginally faster for typical multi-threaded applications, and that they are not worth the increased complexity and bugs.

The problem is essentially that it is hard to have a string class in C++ that is both thread-safe and efficient. Even though the client needs to use critical sections (or mutexes) for safe access, the library must implement a thread-safe ref-count itself, as it is not possible for the user to do so (cleanly).

Even the latest STL used by Microsoft Visual C++ no longer uses COW for those reasons, and I'm sure they are not alone in having done that.
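
To make the ref-count point concrete, here is a minimal sketch of a shared string buffer whose count is maintained with atomic operations. It assumes C++11 `std::atomic`; the class `SharedText` and all of its members are hypothetical names for illustration and do not reflect the layout of any real standard library.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical illustration only: a shared, reference-counted text buffer.
class SharedText {
    struct Rep {
        std::string text;
        std::atomic<std::size_t> refs{1};
        explicit Rep(std::string t) : text(std::move(t)) {}
    };
    Rep* rep_;

    // Atomic decrement; the last owner frees the buffer.
    void release() {
        if (rep_->refs.fetch_sub(1, std::memory_order_acq_rel) == 1) delete rep_;
    }

public:
    explicit SharedText(std::string t) : rep_(new Rep(std::move(t))) {}

    // Copying shares the buffer, but still costs one atomic increment.
    SharedText(const SharedText& other) : rep_(other.rep_) {
        rep_->refs.fetch_add(1, std::memory_order_relaxed);
    }

    SharedText& operator=(const SharedText& other) {
        if (rep_ != other.rep_) {
            other.rep_->refs.fetch_add(1, std::memory_order_relaxed);
            release();
            rep_ = other.rep_;
        }
        return *this;
    }

    ~SharedText() { release(); }

    const std::string& str() const { return rep_->text; }
    std::size_t ref_count() const { return rep_->refs.load(std::memory_order_acquire); }
    bool shares_with(const SharedText& other) const { return rep_ == other.rep_; }
};
```

Every copy and every destruction touches the shared counter, even in code that never writes to the string, which is the overhead the library has to pay on the user's behalf. The sketch only protects the count; two threads modifying the same object would still need their own locking.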
Sep 11 2003
parent reply Helmut Leitner <leitner hls.via.at> writes:
Philippe Mori wrote:
 "Ilya Minkov" <minkov cs.tum.edu> wrote in message
 news:bjnk9u$2joq$1 digitaldaemon.com...
 Helium wrote:
 As said in the topic, D seems to use COW. Used in single-threaded
 applications it can really speed things up. But in the multithreaded
 world that we have today it can really slow things down.

 Why would it actually slow things down? And how does that depend on threads?
 The only concern I'm aware of is old strings which are left to the garbage
 collector, which happens to lots of other stuff anyway. The GC is currently
 somewhat spartan with regard to threads and could use some improvement, but
 that will change someday.

 It is known in C++ that COW string implementations are either slower
 or only marginally faster for typical multi-threaded applications, and that
 they are not worth the increased complexity and bugs.

 The problem is essentially that it is hard to have a string class in C++
 that is both thread-safe and efficient. Even though the client needs to use
 critical sections (or mutexes) for safe access, the library must implement
 a thread-safe ref-count itself, as it is not possible for the user to do so
 (cleanly).

 Even the latest STL used by Microsoft Visual C++ no longer uses COW
 for those reasons, and I'm sure they are not alone in having done that.

D uses a different garbage collection method that is not based on reference counting. While it may have other disadvantages, it should be robust in a multi-threaded system.

--
Helmut Leitner    leitner hls.via.at
Graz, Austria   www.hls-software.com
Sep 11 2003
parent reply "Philippe Mori" <philippe_mori hotmail.com> writes:
 It is known in C++ that COW string implementations are either slower
 or only marginally faster for typical multi-threaded applications, and that
 they are not worth the increased complexity and bugs.

 The problem is essentially that it is hard to have a string class in C++
 that is both thread-safe and efficient. Even though the client needs to use
 critical sections (or mutexes) for safe access, the library must implement
 a thread-safe ref-count itself, as it is not possible for the user to do so
 (cleanly).

 Even the latest STL used by Microsoft Visual C++ no longer uses COW
 for those reasons, and I'm sure they are not alone in having done that.

 D uses a different garbage collection method that is not based on reference
 counting. While it may have other disadvantages, it should be robust in
 a multi-threaded system.

But then, does copying a string make a real copy, or does it just take another reference to the same data?

What will happen in D with the following example?

    string a = "hello";
    string b;
    b = a;
    a = "goodbye";

In C++, a would be "goodbye" and b would be "hello"; if the last line is removed, we have one copy of the string data if COW is used and two otherwise.

So if you want a single copy of the actual text in D when you do something like the above, you need COW, and this has almost nothing to do with the GC.

OTOH, if copies are references to the same string, then it would be faster, but you would have to ask explicitly for a copy if you want one:

    b = a.clone();

And if we always make a copy, then it would be the same as in C++ without COW.
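
For reference, the C++ outcome described above can be checked with a small sketch using `std::string`. The function name `assign_then_rebind` is made up for this example; whether the two strings ever share storage internally is an implementation detail that the observable values do not reveal.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Returns the final values of (a, b) from the four-line example.
std::pair<std::string, std::string> assign_then_rebind() {
    std::string a = "hello";
    std::string b;
    b = a;          // b now holds "hello" (value semantics)
    a = "goodbye";  // reassigning a never changes b
    return {a, b};
}
```

With or without COW the visible result is the same; COW only changes how many buffers exist behind the scenes between the assignment and the next write.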
Sep 11 2003
parent reply Ilya Minkov <minkov cs.tum.edu> writes:
Philippe Mori wrote:

 But then, does copying a string make a real copy, or does it just take another
 reference to the same data?
 
 What will happen in D with the following example?
 
 string a = "hello";
 string b;
 b = a;
 a = "goodbye";

Assuming you use char[] instead of string (we do not have a string class):

1. a is created as a slice of the constant "hello";
2. b becomes a slice of a: they point to the same data, which is constant;
3. a becomes a slice of the constant "goodbye".

Reminder: a slice is a structure holding the start address and the length of an array. There are obviously cases in which they will point to actually allocated arrays. You should then simply make sure that you overwrite nothing, and let the GC pick up your leftovers.
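
The three steps can be modelled in C++ (used here only to stay consistent with the C++ snippets elsewhere in the thread) with a hand-written struct. `Slice`, `kHello`, `kGoodbye` and the function name are hypothetical; this is a rough model of the start-address-plus-length description above, not D's actual runtime representation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a slice: start address plus length, no ownership.
struct Slice {
    const char* ptr;
    std::size_t len;
};

static const char kHello[] = "hello";
static const char kGoodbye[] = "goodbye";

bool b_keeps_hello_after_a_is_rebound() {
    Slice a{kHello, sizeof kHello - 1};        // 1. a is a slice of the constant "hello"
    Slice b = a;                               // 2. b points at the same data; no characters are copied
    bool shared = (b.ptr == a.ptr);
    a = Slice{kGoodbye, sizeof kGoodbye - 1};  // 3. a becomes a slice of the constant "goodbye"
    return shared && b.ptr == kHello && b.len == 5 && a.ptr == kGoodbye;
}
```

Assignment copies only the pointer and the length, so rebinding a leaves b untouched without any reference count being involved.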
 OTOH, if copies are references to the same string, then it would be faster,
 but you would have to ask explicitly for a copy if you want one

So it is.
 b = a.clone();

I think it should read "b = a.dup;"
 And if we always make a copy, then it would be the same as in C++
 without COW.

No, you don't want to always make copies, just on writes. Another difference is that C++ can use its destruction rules instead of a GC. Sure, you can make C++ follow a similar behaviour by using the const qualifier. That's how we did this in Delphi.

Hey! This leads me to a new idea: a function should automatically duplicate an array at the beginning if the array is "in"-qualified and could be written to within this function. Const-ness is implicit, but it doesn't affect the interface of the function, which is determined by the "in" qualifier, so the conventions are not broken.

Sometimes it may be desirable that the array is not duplicated and that writes go back to the original array. In this case, it should be qualified "inout"! This rule might be extended to other things like Objects...

-eye.
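
A loose C++ analogy for this proposal, offered only as a sketch: passing by value gives the callee its own duplicate (roughly the proposed "in"), while passing by reference lets writes reach the caller (roughly "inout"). The function names `upper_in` and `upper_inout` are invented for illustration. Note one difference from the idea above: C++ copies a by-value argument unconditionally, whereas the proposal would duplicate only when the function could actually write.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Analogue of the proposed "in": s is a private duplicate,
// so writes inside the function never reach the caller's data.
std::string upper_in(std::string s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}

// Analogue of the proposed "inout": no duplicate, writes go back to the caller.
void upper_inout(std::string& s) {
    for (char& c : s) c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
}
```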
Sep 11 2003
parent "Philippe Mori" <philippe_mori hotmail.com> writes:
 What will happen in D with the following example?

 string a = "hello";
 string b;
 b = a;
 a = "goodbye";

 Assuming you use char[] instead of string (we do not have a string class):
 1. a is created as a slice of the constant "hello";
 2. b becomes a slice of a: they point to the same data, which is constant;
 3. a becomes a slice of the constant "goodbye".
 Reminder: a slice is a structure holding the start address and the length
 of an array.

My sample was not well chosen... I'm relatively new to D and I took one of the simplest examples I could imagine.

What would happen if

    a = "goodbye";

is replaced by a call to a function that modifies the content of the string, like:

    a ~= " world";   // append at end, if I remember well

Since a and b were shared before that, a copy must be made at that time, and to know that they are shared we need a reference count (otherwise we would have to ask the GC whether the data is used more than once, which would be very slow).

Thus COW must be used, and we face the same problem as in C++, where performance degrades a lot if we do a lot of modifications like appending one char at a time.

From what I understand, in C++ the problem comes from the fact that we must lock very often, and in some implementations slow thread synchronisation is or must be used. It is also not easy to provide an implementation that works correctly with only locked arithmetic and comparison operations.
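
A minimal sketch of that append scenario in C++, assuming a reference-counted buffer built on `std::shared_ptr`. `CowBuf`, its `copies` counter and `copies_made_by_appends` are hypothetical names, and `use_count()` is only a reliable sharing test here because the sketch is single-threaded.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Hypothetical copy-on-write buffer; the shared_ptr control block holds the count.
struct CowBuf {
    std::shared_ptr<std::string> data;
    std::size_t copies = 0;  // instrumentation: how many times we had to detach

    explicit CowBuf(const char* s) : data(std::make_shared<std::string>(s)) {}

    void append(char c) {
        if (data.use_count() > 1) {  // shared: make a private copy first
            data = std::make_shared<std::string>(*data);
            ++copies;
        }
        data->push_back(c);          // now safe to write in place
    }
};

std::size_t copies_made_by_appends() {
    CowBuf a("hello");
    CowBuf b = a;  // shares the buffer with a
    for (char c : std::string(" world")) a.append(c);
    assert(*a.data == "hello world");
    assert(*b.data == "hello");
    return a.copies;
}
```

In this sketch only the first append copies the text; the later ones write in place. What every append still pays is the check of the shared count, and in a multi-threaded build that check and the count updates are where the synchronisation cost mentioned above comes in.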
Sep 11 2003