digitalmars.D - Possible new COW/copy suggestions?

  I was reading Andrei Alexandrescu's book on D, and it suddenly occurred to me
that there could be a couple of special-case copy mechanisms for copy-on-write
(COW) that work on arrays only. (On single variables they would do nothing
special, since a change just replaces the variable's contents.) I also have a
copying suggestion for structures.

  You _can_ live without these, but they would make certain tasks and cases a
lot less repetitive and error prone.


  For arrays using COW, I'm using DMD's toupper function as a reference for how
this would work/affect code. http://www.digitalmars.com/d/2.0/memory.html

--Strings (and Array) Copy-on-Write

char[] toupper(char[] s)
{
    int i;

    for (i = 0; i < s.length; i++)
    {
        char c = s[i];
        if ('a' <= c && c <= 'z')
            s[i] = cast(char)(c - ('a' - 'A'));  // writes into the caller's array
    }
    return s;
}

  The COW version Walter uses in a later example on that page would definitely
work, but what if the compiler did most of the work for us? Say, by adding a
keyword like cowref? Then the only visible change would be in the function
signature.

char[] toupper(cowref char[] s)

  Internally the compiler would add a flag, so just before the function first
changes the array it would check the flag and, if no copy has been made yet,
make a duplicate. With this in mind, the parameter can be treated as const/in
by calling functions and thought of as inout inside the function, which allows
it to accept const/immutable data. This could be a permanent change in how
arrays work, or maybe a subtype of array for these specific calls.

char[] toupper(cowref char[] s)
{
    bool __cow_s = true;    // true until the first write makes the copy
    int i;

    for (i = 0; i < s.length; i++)
    {
        char c = s[i];
        if ('a' <= c && c <= 'z') {
            if (__cow_s) {
                /*make copy*/
                s = s.dup;
                __cow_s = false;
            }
            s[i] = cast(char)(c - ('a' - 'A'));
        }
    }
    return s;
}
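
  For comparison, here is a minimal sketch of what that expansion could look
like written by hand in today's D, without any new keyword. The name toUpperCow
is made up for illustration, and using a null slice in place of the bool flag is
my own choice, not something the proposal requires.

```d
// Hand-written copy-on-write; toUpperCow is a hypothetical name.
// Accepts mutable, const, or immutable input and never writes to it.
const(char)[] toUpperCow(const(char)[] s)
{
    char[] copy = null;             // stays null until the first write

    foreach (i, c; s)
    {
        if ('a' <= c && c <= 'z')
        {
            if (copy is null)
                copy = s.dup;       // duplicate once, on the first change
            copy[i] = cast(char)(c - ('a' - 'A'));
        }
    }

    // Nothing changed: hand back the original slice, no allocation made.
    return copy is null ? s : copy;
}
```

  If the input has no lowercase letters the caller gets the very same slice
back, otherwise it gets a fresh array and the original is left alone.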

 For optimization involving only one cowref, the compiler may end up making two
copies of the function's code with an additional label/goto: the first time the
code is about to modify the array, it would copy the array and then branch into
the second copy of the code, so the check isn't done on every pass. Ex:


char[] toupper(cowref char[] s)
{
    int i;

    for (i = 0; i < s.length; i++)
    {
        char c = s[i];
        if ('a' <= c && c <= 'z') {
            /*changes made in this scope, everything but the array copying
              is removed. */
            goto __cowref_jump;
        }
    }
    return s;

    /*only copies code it can possibly return to, in a loop or goto jumps*/
    for (; i < s.length; i++)
    {
        char c = s[i];
        if ('a' <= c && c <= 'z') {
/*continue point at start of scope*/
__cowref_jump:
            s[i] = cast(char)(c - ('a' - 'A'));
        }
    }
    return s;
}
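
  The same split can be sketched by hand without the goto: a read-only loop
that stops at the first character needing a change, then a second loop that
works on the copy with no flag check at all. toUpperSplit is a hypothetical
name, and this is only my reading of what the generated code would amount to.

```d
// Loop-splitting variant of copy-on-write; toUpperSplit is a hypothetical name.
const(char)[] toUpperSplit(const(char)[] s)
{
    size_t i = 0;

    // Phase 1: read only, stop at the first lowercase letter.
    for (; i < s.length; i++)
    {
        if ('a' <= s[i] && s[i] <= 'z')
            break;
    }
    if (i == s.length)
        return s;                   // nothing to change, no copy made

    // Phase 2: copy once, then continue from index i without any check.
    char[] copy = s.dup;
    for (; i < copy.length; i++)
    {
        char c = copy[i];
        if ('a' <= c && c <= 'z')
            copy[i] = cast(char)(c - ('a' - 'A'));
    }
    return copy;
}
```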

  The second thought is for when you want to keep referring to the original
array but copy forward only the specific elements that change (rather than the
whole array). This would be especially useful when referencing sectors of 512
bytes or larger as individual blocks. Perhaps cowarray would be the keyword.
The array would work normally, with only a couple of extra lookups. This would
also accept const/immutable data.

char[] toupper(cowarray char[] s)
{
//if known it's returning the array, it might precopy the original.
//but if it does that, the bool change array is probably unneeded unless
//you need to know if specific parts of the array were changed. Which
//means it may just become a cowref instead of a cowarray.
//bool still needed for multi-dimensional arrays.
    bool[] __cowarr_change_s = new bool[s.length];
    char[] __cowarr_arr_s = new char[s.length];  //holds changed elements only

    int i;

    for (i = 0; i < s.length; i++)
    {
//if changed, use change
//If the compiler sees it will never go over this again, it may
//skip this check and just read.
        char c = __cowarr_change_s[i] ? __cowarr_arr_s[i] : s[i];
//precopy
//      char c = __cowarr_arr_s[i];

        if ('a' <= c && c <= 'z') {
//change and ensure it's changed on the flag.
            __cowarr_arr_s[i] = cast(char)(c - ('a' - 'A'));
            __cowarr_change_s[i] = true;
        }
    }

    /*when copying out or to another array or duplicating, the current view
      is used without the cow part active.*/
    return s;
}

  If you needed to know whether a particular block changed, perhaps .changed
could be used and the compiler would return true/false.
  if(s[i].changed) { /*code*/
//becomes
  if(__cowarr_change_s[i]) { /*code*/
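
  To make the bookkeeping concrete, here is a rough library-level sketch of the
same idea as a struct. Everything in it (CowView, changed, flatten) is a name I
made up for illustration; the proposal itself is a compiler feature, so treat
this only as an approximation of the per-element lookups involved.

```d
// Per-element copy-on-write view over a read-only array.
// CowView, changed and flatten are hypothetical names.
struct CowView
{
    private const(char)[] base;   // original data, never written
    private char[] shadow;        // holds only the elements that were changed
    private bool[] dirty;         // dirty[i] is true once element i was written

    this(const(char)[] source)
    {
        base = source;
        shadow = new char[source.length];
        dirty = new bool[source.length];
    }

    // Read: use the changed value if there is one, else the original.
    char opIndex(size_t i) const
    {
        return dirty[i] ? shadow[i] : base[i];
    }

    // Write: store in the shadow array and set the flag.
    void opIndexAssign(char value, size_t i)
    {
        shadow[i] = value;
        dirty[i] = true;
    }

    // Plays the role of the proposed s[i].changed.
    bool changed(size_t i) const { return dirty[i]; }

    size_t length() const { return base.length; }

    // Copy out the current view as a plain array, without the COW part.
    char[] flatten() const
    {
        auto result = base.dup;
        foreach (i, d; dirty)
        {
            if (d)
                result[i] = shadow[i];
        }
        return result;
    }
}
```

  Usage would be along the lines of auto v = CowView(data); v[0] = 'A'; after
which v.changed(0) is true and data itself is untouched.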

  Finally, the last suggestion involves structure copying. Copying a structure
does a bitwise copy; however, when the structure holds references to
arrays/structures/classes, you may want the copy to get its own duplicate
rather than refer to the original.

//Book example, pg 246
struct Widget {
   private int[] array;
   this(uint length) {
      array = new int[length];
   }
   // Postblit constructor
   this(this) {
      array = array.dup;
   }
   /*other code*/
}

  Perhaps a keyword like oncopy (with the copy function defaulting to dup) or
onstructcopy (same idea) could be used. The compiler would gather all the
oncopy fields and build a default this(this) from them. If you need anything
more complicated or extra during the copy, your own definition of this(this)
would execute after the compiler-built one (appended to the compiler-generated
code). Ex:

struct Widget {
//   private oncopy(dup) int[] array;
//       Name of function (is/could be) optional if the function dup
//       is used to create a copy. Might be used as oncopy!(dup)
   private oncopy int[] array;

   this(this) {
      //compiler generated oncopy's
      //dup is the copy name, which could be clone or something else.
      array = array.dup;

      // User definition (if any) appended here.
   }

   /*other code*/
}
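
  For reference, this is roughly what the combined result would amount to if
written out by hand today: the dup line standing in for the generated part,
followed by some user code. The copies counter and the get/set helpers are
hypothetical additions of mine, there only so the effect can be observed.

```d
// Hand-written equivalent of the generated postblit plus appended user code.
struct Widget
{
    private int[] array;
    int copies;     // hypothetical field, only to show the user part running

    this(uint length)
    {
        array = new int[length];
    }

    this(this)
    {
        array = array.dup;  // what oncopy would generate for this field
        copies++;           // user-written code, runs after the generated part
    }

    // Hypothetical accessors so the array can be inspected from outside.
    int get(size_t i) const { return array[i]; }
    void set(size_t i, int value) { array[i] = value; }
}
```

  After auto b = a; the two Widgets own separate arrays, so writing through b
no longer affects a.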

  Naturally, immutable data doesn't need to be copied since it doesn't change;
however, if it does need to change during the copy, the user would likely end
up doing that manually, so using oncopy on immutable data would cause an error.

 Comments and suggestions? I'd like to hear Walter's feedback and opinions on
these.

 Era


      
Aug 21 2010