
digitalmars.D - constness for arrays

reply xs0 <xs0 xs0.com> writes:
As has been discussed, the lack of something like C++ const hurts most 
when using arrays, as you can't code around it like with classes, 
structs or primitives (with the latter two you can just pass by value, 
for classes you can make readonly versions). The fact that inbuilt 
strings are also arrays makes the problem occur often.

I was wondering whether the following would resolve that issue:

- the top bit of arrays' .length becomes an indicator of the 
readonlyness of the array reference

- type of .length is changed from uint to int (just to indicate the 
proper maximum value; it still can't be negative (from the user's POV))

- arrays get a .isReadonly property which tests the top bit

- arrays get a .lock() method that sets it to 1

- .dup clears it (obviously :)

- reading .length masks the bit out

- setting .length sets the bit to zero if reallocation occurs, and 
leaves it intact otherwise

- arrays get a .readonly property which returns a copy of the array 
reference with the bit set

- optionally, arrays get a .needToWrite() method which does the 
following: { if (arr.isReadonly) arr=arr.dup; }  (yes, the name sucks)
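The bit-twiddling the list above describes can be sketched outside D; the following C++ fragment (ArrayRef and RO_BIT are names I made up, and this is not the actual DMD array layout) shows the flag, the mask, and the copy-on-write check:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical sketch of the proposal: an array reference whose high bit
// of the length field marks the reference read-only.
struct ArrayRef {
    char*    ptr;
    uint32_t len_;  // top bit = read-only flag, lower 31 bits = length

    static const uint32_t RO_BIT = 0x80000000u;

    uint32_t length()     const { return len_ & ~RO_BIT; }     // mask the flag out
    bool     isReadonly() const { return (len_ & RO_BIT) != 0; }
    void     lock()             { len_ |= RO_BIT; }            // set read-only

    // .dup: copy the data; the copy starts out writable (flag cleared)
    ArrayRef dup() const {
        ArrayRef c;
        c.len_ = length();
        c.ptr  = (char*)std::malloc(c.len_);
        std::memcpy(c.ptr, ptr, c.len_);
        return c;
    }

    // .needToWrite: copy-on-write only when the reference is read-only
    void needToWrite() { if (isReadonly()) *this = dup(); }
};
```

Note that locking and duplicating touch only the reference, never the data, which is why moving references around costs the same as before.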


Now this has the following (imho) neat properties:

- initial implementation should be quite trivial, I bet Walter could do 
it in a few hours; eventually, debug builds could prevent you from 
writing to a readonly array, but even that's not all that important

- losing that one bit has no real effect on anything

- it can be tested for at runtime

- it has practically negligible impact on efficiency:
   - reading .length needs one instruction more (AND with 0x7fffffff)
   - setting .length needs about three instructions more
   - reading and writing to the array has no additional cost
   - moving the reference around also costs the same
   - new operations are quite trivial as well

- fits with COW perfectly

- there's no const-pollution and no need to write two versions of functions


A quick example of the possibilities:

char[] toUpper(char[] txt)
{
     for (int i=0; i<txt.length; i++) {
         char c = txt[i];
         if ('a'<=c && c<='z') {
             txt.needToWrite();
             txt[i] = c-(cast(char)'a'-'A');
         }
     }
     return txt;
}

char[] FOO = toUpper("foo"); // constants are readonly, so COW is made

char[] bibi = getBibi(); // who owns it? I can finally know if it's me
char[] BIBI = toUpper(bibi); // write into bibi, if owned
char[] BIBI = toUpper(bibi.readonly); // leave bibi alone, as I need it


What y'all think?


xs0

PS:
Credits: the idea is not all mine, I got it from the discussion with 
Reiner Pope on D.learn

I'm also sorry if this was already suggested, but I don't remember 
anything of the sort being discussed.
Jul 18 2006
next sibling parent Reiner Pope <reiner.pope gmail.com> writes:
I like it. Same capabilities as I was grasping at, but much more elegant.

   - reading and writing to the array has no additional cost

That's for the version with checking disabled, for max speed; for extra safety, you could set write-checking for debug builds, where the slowdown should be more acceptable.
Jul 18 2006
prev sibling next sibling parent reply David Medlock <noone nowhere.com> writes:
xs0 wrote:
 As has been discussed, the lack of something like C++ const hurts most 
 when using arrays, as you can't code around it like with classes, 
 structs or primitives (with the latter two you can just pass by value, 
 for classes you can make readonly versions). The fact that inbuilt 
 strings are also arrays makes the problem occur often.
 
 I was wondering whether the following would resolve that issue:
 
 - the top bit of arrays' .length becomes an indicator of the 
 readonlyness of the array reference
 

I like it except it drops the max array size by half, doesn't it?

Since we are talking about dynamic arrays here, why not just:

1. add a flags byte or short to the internal array structure to hold it.

2. make the pointer's lower bit hold it - this of course assumes the 
pointer is at least word-aligned. I am not sure if this would conflict 
with structs which are align(1).

-DavidM
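Alternative #2 above (the low pointer bit) can be illustrated concretely; a C++ sketch with made-up names, where the word-alignment assumption is exactly the caveat about align(1) structs:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tagged-pointer reference: the read-only flag lives in
// bit 0 of the data pointer. Valid only if the pointee is guaranteed
// to be at least 2-byte aligned.
struct TaggedRef {
    uintptr_t bits;  // pointer with the flag folded into bit 0

    static TaggedRef make(void* p, bool readonly) {
        TaggedRef r;
        r.bits = reinterpret_cast<uintptr_t>(p) | (readonly ? 1u : 0u);
        return r;
    }
    void* ptr()        const { return reinterpret_cast<void*>(bits & ~uintptr_t(1)); }
    bool  isReadonly() const { return (bits & 1) != 0; }
};
```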
Jul 18 2006
parent xs0 <xs0 xs0.com> writes:
David Medlock wrote:
 xs0 wrote:
 As has been discussed, the lack of something like C++ const hurts most 
 when using arrays, as you can't code around it like with classes, 
 structs or primitives (with the latter two you can just pass by value, 
 for classes you can make readonly versions). The fact that inbuilt 
 strings are also arrays makes the problem occur often.

 I was wondering whether the following would resolve that issue:

 - the top bit of arrays' .length becomes an indicator of the 
 readonlyness of the array reference

I like it except it drops the max array size by half, doesn't it?

Theoretically, but in practice:

- if you have a 64-bit machine, you don't care

- if you have a 32-bit machine, you can't get the full 4GB anyway (on 
Windows, a process can only allocate 2GB; I bet it's similar in other OSes)

- with anything larger than a byte you don't even theoretically need that bit
 Since we are talking about dynamic arrays here, why not just:
 
 1. add a flags byte or short to the internal array structure to hold it.

because that would increase the size of the array structure, making it consume more memory (I'd guess at least 4 bytes per reference) and making it slower (more data to copy)
 2. make the pointers lower bit hold it- this of course assumes the 
 pointer is at least word-aligned.  I am not sure if this would conflict 
 with structs which are align(1).

that wouldn't work - the pointer is unrestricted and can point anywhere.

xs0
Jul 18 2006
prev sibling next sibling parent reply Don Clugston <dac nospam.com.au> writes:
xs0 wrote:
 As has been discussed, the lack of something like C++ const hurts most 
 when using arrays, as you can't code around it like with classes, 
 structs or primitives (with the latter two you can just pass by value, 
 for classes you can make readonly versions). The fact that inbuilt 
 strings are also arrays makes the problem occur often.
 
 I was wondering whether the following would resolve that issue:
 
 - the top bit of arrays' .length becomes an indicator of the 
 readonlyness of the array reference

This is a really interesting idea. You're essentially chasing a 
performance benefit, rather than program correctness.

Some benchmarks ought to be able to tell you if the performance benefit 
is real: instead of char[], use

struct CharArray {
    char[] arr;
    bool readOnly;
}

for both the existing and proposed behaviour (for the existing one, 
readOnly is ignored, but include it to make the parameter passing fair).

For code that makes heavy use of COW, I suspect that the benefit could 
be considerable. You probably don't need to eliminate many .dups to pay 
for the slightly slower .length.

The situation where a function only occasionally returns a read-only 
string is probably quite common:

char[] func(int n) {
    if (n==0) return "annoying";
    else return toString(n);
}

.. and you have to .dup it for the rare case where it's a literal.
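The measurement setup suggested above could look roughly like this; a C++ stand-in (CharArray and toUpperFirst are assumed names, and std::string plays the role of char[]) that counts how many defensive copies each behaviour makes:

```cpp
#include <cassert>
#include <string>

// Stand-in for the suggested benchmark type: the same struct is used for
// both behaviours, so parameter passing costs the same and only the
// number of ".dup"s differs.
struct CharArray {
    std::string arr;   // plays the role of char[]
    bool readOnly;
};

static int dupCount = 0;  // how many defensive copies were made

// Proposed behaviour: copy only when the reference is marked read-only,
// and only once a write is actually needed.
CharArray toUpperFirst(CharArray s) {
    if (s.arr.empty() || !(s.arr[0] >= 'a' && s.arr[0] <= 'z'))
        return s;                    // nothing to write, no copy needed
    if (s.readOnly) {
        s.arr = std::string(s.arr);  // the ".dup"
        s.readOnly = false;
        ++dupCount;
    }
    s.arr[0] = char(s.arr[0] - ('a' - 'A'));
    return s;
}
```

The existing behaviour corresponds to always taking the copy branch; the difference in dupCount over a realistic workload is the quantity the benchmark would measure.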
Jul 18 2006
next sibling parent David Medlock <noone nowhere.com> writes:
Don Clugston wrote:
 xs0 wrote:
 
 As has been discussed, the lack of something like C++ const hurts most 
 when using arrays, as you can't code around it like with classes, 
 structs or primitives (with the latter two you can just pass by value, 
 for classes you can make readonly versions). The fact that inbuilt 
 strings are also arrays makes the problem occur often.

 I was wondering whether the following would resolve that issue:

 - the top bit of arrays' .length becomes an indicator of the 
 readonlyness of the array reference

 This is a really interesting idea. You're essentially chasing a 
 performance benefit, rather than program correctness.

 Some benchmarks ought to be able to tell you if the performance benefit 
 is real: instead of char[], use

 struct CharArray {
     char[] arr;
     bool readOnly;
 }

 for both the existing and proposed behaviour (for the existing one, 
 readOnly is ignored, but include it to make the parameter passing fair).

 For code that makes heavy use of COW, I suspect that the benefit could 
 be considerable. You probably don't need to eliminate many .dups to pay 
 for the slightly slower .length.

 The situation where a function only occasionally returns a read-only 
 string is probably quite common:

 char[] func(int n) {
     if (n==0) return "annoying";
     else return toString(n);
 }

 .. and you have to .dup it for the rare case where it's a literal.

Agreed, Don. It's important to note this is two issues:

1. A readonly property of arrays.
2. Implementation of #1.

If Walter agrees on #1, I am sure he is the best person to ask for 
advice on #2 (at least in the case of DMD).

-DavidM
Jul 18 2006
prev sibling next sibling parent xs0 <xs0 xs0.com> writes:
Don Clugston wrote:
 xs0 wrote:
 As has been discussed, the lack of something like C++ const hurts most 
 when using arrays, as you can't code around it like with classes, 
 structs or primitives (with the latter two you can just pass by value, 
 for classes you can make readonly versions). The fact that inbuilt 
 strings are also arrays makes the problem occur often.

 I was wondering whether the following would resolve that issue:

 - the top bit of arrays' .length becomes an indicator of the 
 readonlyness of the array reference

This is a really interesting idea. You're essentially chasing a performance benefit, rather than program correctness. Some benchmarks ought to be able to tell you if the performance benefit is real:

Well, actually the primary motivation is correctness*, but the whole thing does indeed seem to benefit performance as well (well, pending some actual tests, but it seems quite obvious).
 instead of char[], use
 
 struct CharArray {
  char [] arr;
  bool readOnly;
 }
 
 for both the existing and proposed behaviour (for the existing one, 
 readonly is ignored, but include it to make the parameter passing fair).

will probably do, as soon as I have some time (but if anyone else feels like it, go ahead ;)
 The situation where a function only occasionally returns a read-only 
 string is probably quite common:
 
 char [] func(int n) {
   if (n==0) return "annoying";
   else return toString(n);
 }
 .. and you have to .dup it for the rare case where it's a literal.

Yup. And it's also common to .dup just because there is no indication at 
all whether it is required or not, and it's the safe thing to do. For 
example, if you call something like ucFirst(toLower("BIBI")), there's no 
need for ucFirst to .dup (already done in toLower), but it still does, 
as it has no idea that is the case.

xs0

*) correctness in the sense that it becomes easier to write correct 
programs, and not in the sense that the compiler would force you to do 
so, at least until debug-time checking is done
Jul 18 2006
prev sibling parent xs0 <xs0 xs0.com> writes:
Don Clugston wrote:
 xs0 wrote:
 - the top bit of arrays' .length becomes an indicator of the 
 readonlyness of the array reference

 This is a really interesting idea. You're essentially chasing a 
 performance benefit, rather than program correctness.

 Some benchmarks ought to be able to tell you if the performance benefit 
 is real: instead of char[], use

 struct CharArray {
     char[] arr;
     bool readOnly;
 }

 for both the existing and proposed behaviour (for the existing one, 
 readOnly is ignored, but include it to make the parameter passing fair).

 For code that makes heavy use of COW, I suspect that the benefit could 
 be considerable. You probably don't need to eliminate many .dups to pay 
 for the slightly slower .length.

Well, I did a (admittedly biased :) test, and there do seem to be 
potentially large benefits. I wrote an app that counts different words 
in the 5.6MB ASCII text from http://www.gutenberg.org/etext/1581

The text was read and duplicated 10 times (so I could do 10 runs). Then, 
words were extracted (word := a sequence of alnum chars), lowercased and 
placed into an AA. I ran each version about 20 times, and here are the 
fastest results for each:

bib_current   : 3641ms
bib_str       : 3031ms
bib_str (old) : 3625ms
bib_str (ugly): 3109ms

Commenting out the AA stuff, the results become

bib_current   : 1281
bib_str       : 812
bib_str (old) : 1234
bib_str (ugly): 812

About 11% of the calls to toLower would currently result in .duping, and 
none do with the new system, as it's not necessary in this particular 
case. Had I used toUpper, ... :)

bib_current is exactly the same code, except it uses Phobos' tolower and 
char[] instead of string. For some reason, bib_current is (slightly) 
slower even than the string version that does exactly the same thing; my 
guess would be that some more inlining/optimization was done in my code.

For some other reason, toLowerUgly is slower than toLowerNew, even 
though it potentially does fewer checks. Probably the benefit of that 
was lost completely, as words tend to only have the first character 
uppercase, and more code just slowed the thing down.

Well, anyway, the conclusion would be that using that bit for readonly 
indication does not cause slowdowns even for code that doesn't use it. 
If used for COW-only-when-necessary, speed gains can be considerable.

xs0

The code was this (I snipped boring code in the interest of brevity, I 
can post the full code if someone wants it):

struct string {
    char* ptr;
    uint _length;

    public static string opCall(char[] bu, int readonly) { ... }
    public int length() { return _length & 0x7fffffff; }
    public void length(int newlen) { ... }
    public string dup() { ... }
    public char opIndex(int i) { return ptr[i]; }
    public char opIndexAssign(char c, int i) { return ptr[i] = c; }
    public char[] toString() { return ptr[0..length()]; }
    public void wantToWrite() { if (_length & 0x80000000) { ... } }
    public string slice(int start, int end) { ... }
}

string toLowerOld(string txt) {
    int l = txt.length;
    for (int a=0; a<l; a++) {
        char c = txt[a];
        if (c>='A' && c<='Z') {
            txt = txt.dup;
            txt[a] = c+32;
            for (int b=a+1; b<l; b++) {
                c = txt[b];
                if (c>='A' && c<='Z')
                    txt[b] = c+32;
            }
            return txt;
        }
    }
    return txt;
}

string toLowerNew(string txt) {
    int l = txt.length;
    for (int a=0; a<l; a++) {
        char c = txt[a];
        if (c>='A' && c<='Z') {
            txt.wantToWrite();
            txt[a] = c+32;
        }
    }
    return txt;
}

string toLowerUgly(string txt) {
    int l = txt.length;
    for (int a=0; a<l; a++) {
        char c = txt[a];
        if (c>='A' && c<='Z') {
            txt.wantToWrite();
            txt[a] = c+32;
            for (int b=a+1; b<l; b++) {
                c = txt[b];
                if (c>='A' && c<='Z')
                    txt[b] = c+32;
            }
            return txt;
        }
    }
    return txt;
}

void main() {
    string[] bible;
    bible.length = 10;
    for (int a=0; a<bible.length; a++) {
        if (a==0) {
            bible[a] = string(cast(char[])read("bible.txt"), 0);
        } else {
            bible[a] = bible[a-1].dup;
        }
    }

    long start = getUTCtime();
    uint result;
    for (int q=0; q<bible.length; q++) {
        string txt = bible[q];
        int[char[]] count;
        int pos = 0;
        while (pos<txt.length) {
            if (!isalnum(txt[pos])) { pos++; continue; }
            int len = 1;
            while (pos+len < txt.length && isalnum(txt[pos+len]))
                len++;

            //string word = toLowerOld(txt.slice(pos, pos+len));
            string word = toLowerNew(txt.slice(pos, pos+len));
            //string word = toLowerUgly(txt.slice(pos, pos+len));
            pos += len;

            if (auto c = word.toString() in count) {
                (*c)++;
            } else {
                count[word.toString()] = 1;
            }
        }
        result = count.length;
    }
    long end = getUTCtime();

    writefln("Different words found: ", result);
    writefln("Time taken: ", (end-start));
}
Jul 19 2006
prev sibling next sibling parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
Dynamic constness versus static (compile time) constness is not new.

For example in Ruby you can dynamicly declare object/array readonly and
its runtime will control all modifications and note - in full as Ruby's 
sandbox
(as any other VM based runtime) has all facilities to fully control
immutability of such objects.

In case of runtimes like D (natively compileable) such control is not an
option.

I beleive that proposed runtime flag a) is not a constness in any sense
b) does not solve compile verification of readonlyness and
c) can be implemented now by defining:
struct vector
{
    bool readonly;
    T*  data;
    uint length;
}

Declarative contness prevents data misuse at compile time
when runtime constness moves problem into execution time
when is a) too late to do anything and b) expensive.

I would mention old idea again - real solution would be in creating of
mechanism of disabling exiting or creating new opertaions
for intrinsic types.

For example string definition might look like as:

typedef  string char[]
{
    disable opAssign;
    ....
    char[] tolower() { ..... }
}

In any case such mechanism a) is more universal than const in C++
b) allows to create flexible type systems and finally
c) this will also legalize situation with
"external methods" D has now for array types.

The later one alone is a good enough motivation to do so
as current situation with "external methods" looks like as
just a bug of design or compiler to be honest.

I am yet silent that it will make D's type system unique
in this respect among other languages.

Andrew Fedoniouk.
http://terrainformatica.com
Jul 18 2006
next sibling parent reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
Andrew Fedoniouk wrote:
 Dynamic constness versus static (compile time) constness is not new.
 
 For example in Ruby you can dynamicly declare object/array readonly and
 its runtime will control all modifications and note - in full as Ruby's 
 sandbox
 (as any other VM based runtime) has all facilities to fully control
 immutability of such objects.
 
 In case of runtimes like D (natively compileable) such control is not an
 option.
 
 I beleive that proposed runtime flag a) is not a constness in any sense
 b) does not solve compile verification of readonlyness and
 c) can be implemented now by defining:
 struct vector
 {
     bool readonly;
     T*  data;
     uint length;
 }
 
 Declarative contness prevents data misuse at compile time
 when runtime constness moves problem into execution time
 when is a) too late to do anything and b) expensive.
 
 I would mention old idea again - real solution would be in creating of
 mechanism of disabling exiting or creating new opertaions
 for intrinsic types.
 
 For example string definition might look like as:
 
 typedef  string char[]
 {
     disable opAssign;
     ....
     char[] tolower() { ..... }
 }
 
 In any case such mechanism a) is more universal than const in C++
 b) allows to create flexible type systems and finally
 c) this will also legalize situation with
 "external methods" D has now for array types.
 
 The later one alone is a good enough motivation to do so
 as current situation with "external methods" looks like as
 just a bug of design or compiler to be honest.
 
 I am yet silent that it will make D's type system unique
 in this respect among other languages.
 
 Andrew Fedoniouk.
 http://terrainformatica.com
 

I like that typedef. Should be templatable though...

typedef(T) array T[]
{
    ...
}

Or some such. In an earlier post ("Module level operator overloading" at 
digitalmars.D/39504) I was hoping for external functions as operator 
overloads and IFTI to help with things like array operations. I just 
didn't know about external functions at the time. But if this is 
supposed to replace external functions, how would I do the array op 
overloads that external functions would help me with? Would be 
unfortunate to write something like this...

typedef(T) T[] T[] // mmm what would this do
{
    void opAdd(T[] array1, T[] array2)
    {
        etc...
    }
    ...
}
Jul 18 2006
parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
 typedef  string char[]
 {
     disable opAssign;
     ....
     char[] tolower() { ..... }
 }


 I like that typedef.  Should be templatable though...

 typedef(T) array T[]
 {
     ...
 }

I think so too. It would be nice to have this, but I think that it is 
enough to be able to define such types in each particular case.

I think that such extended typedef makes sense for other basic types:

typedef color uint
{
    uint red() { .... }
    uint blue() { .... }
    uint green() { .... }
}

Also such typedef makes sense for classes too. To avoid vtbl pollution. 
Especially relevant for templated classes.
 Or some such.  In an earlier post ("Module level operator overloading" at 
 digitalmars.D/39504) I was 
 hoping for external functions as operator overloads and IFTI to help with 
 things like array operations.  I just didn't know about external functions 
 at the time.  But if this is supposed to replace external functions, how 
 would I do the array op overloads that external functions would help me 
 with?  Would be unfortunate to write something like this...

 typedef(T) T[] T[] // mmm what would this do
 {
   void opAdd(T[] array1, T[] array2)
   {
     etc...
   }
 }

What is the problem with the following:

typedef(T) array T[]
{
    void opAdd(array a1, array a2)
    {
    }
}

?

Andrew Fedoniouk.
http://terrainformatica.com
Jul 18 2006
next sibling parent reply Chad J <gamerChad _spamIsBad_gmail.com> writes:
Andrew Fedoniouk wrote:
 
 What is the problem with the following:
 
 typedef(T) array T[]
 {
    void opAdd(array a1, array a2)
    {
 
    }
 }
 
 ?
 

In the usage. How do you use it? Something like this?

array!(short) foo = [1,2,3];
array!(short) bar = [4,5,6];
array!(short) result = foo + bar; // result is now [5,7,9]

not too bad... how about multidimensional stuff...

array!(array!(short)) foo = [[1,2],[3,4]];
array!(array!(short)) bar = [[5,6],[7,8]];
// um, maybe a matrix multiply or something.  I don't feel like it.

Well doable, but it would be better to have ordinary array syntax.

Also, what if some external library passes in an ordinary array that is 
not set up as one of these types, and you want to use the new fancy 
features on it:

void gimmeAnArray( short[] foo )
{
    array!(short) bar = array!(short).convert( foo );
    // ugh
    array!(short) bar = array.convert( foo );
    // ahh IFTI is better, but this whole line should be unnecessary IMO.
    ...
}

I suppose the problem I have with your syntax suggestion is that it's 
impossible to add properties to existing types. It forces you to define 
new types to add properties.
Jul 18 2006
parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
 void gimmeAnArray( short[] foo )
 {
   array!(short) bar = array!(short).convert( foo );
   // ugh
   array!(short) bar = array.convert( foo );
   // ahh IFTI is better, but this whole line should be unnecessary IMO.
   ...
 }

 I suppose the problem I have with your syntax suggestion is that it's 
 impossible to add properties to existing types.  It forces you to define 
 new types to add properties.

If you will define it as

alias array short[]
{
    ...
    void someNewOp(self) { }
}

then 1) you can use this someNewOp with it and 2)

void gimmeAnArray( short[] foo )
{
    array bar = foo; // ok
    ...
}

Extended alias allows you to extend base types. Extended typedef allows 
you to extend and to reduce operations of base types.

Andrew Fedoniouk.
http://terrainformatica.com
Jul 18 2006
parent Chad J <gamerChad _spamIsBad_gmail.com> writes:
Andrew Fedoniouk wrote:
 
 If you will define it as
 
 alias array short[]
 {
     ...
     void someNewOp(self) {    }
 }
 
 then 1) you can use this someNewOp with it and 2)
 
 void gimmeAnArray( short[] foo )
 {
     array bar = foo; // ok
     ...
 }
 
 Extended alias allows to extend base types.
 Extended typedef allows to extend and to reduce operations of base types.
 
 Andrew Fedoniouk.
 http://terrainformatica.com
 

I suppose that means I could do something like

alias array short[]
{
    ...
    short[] opAdd( short[] other ) { ... }
}

short[] gimmeAnArray( short[] foo )
{
    short[] newArray;
    for ( int i = 0; i < foo.length; i++ )
        newArray ~= i;

    return foo + newArray; // usage of extension on short[]
}

That would be cool. Though it would be nice if it didn't also stick an 
"array" type out there (does it?). Is there anywhere I can find a 
complete look at what you're proposing?
Jul 18 2006
prev sibling parent reply xs0 <xs0 xs0.com> writes:
Andrew Fedoniouk wrote:
 typedef  string char[]
 {
     disable opAssign;
     ....
     char[] tolower() { ..... }
 }



Is there any particular difference from

struct string {
    char[] data;
    char[] tolower() { .... }
}

?
 I think that such extended typedef makes sense for other basic types:
 
 typedef color uint
 {
     uint red() {  .... }
     uint blue() {  .... }
     uint green() {  .... }
 }

Is there any particular difference from

struct color {
    uint value;
    uint red() { ... }
    ...
}

?
 Also such typedef makes sense for classes too.

I don't get that.. Since you seem to want a new type, what's wrong with deriving?
 To avoid vtbl  pollution. Especially actual for templated classes.

Make the methods or class final, then they don't go into vtbl (or at least shouldn't). xs0
Jul 19 2006
parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
"xs0" <xs0 xs0.com> wrote in message news:e9lu7n$2jv0$1 digitaldaemon.com...
 Andrew Fedoniouk wrote:
 typedef  string char[]
 {
     disable opAssign;
     ....
     char[] tolower() { ..... }
 }



Is there any particular difference from struct string { char[] data; char[] tolower() { .... } }

The difference is in the disabled opAssign**, so if you define, let's 
say, the following:

typedef string char[]
{
    disable opSliceAssign;
    ....
    char[] tolower() { ..... }
}

then you will not be able to compile the following:

string s = "something read-only";
s[0..s.length] = '\0'; // compile time error.
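For comparison, the compile-time failure (if not the typedef syntax) can already be had in other languages; a C++ sketch (ReadonlyString is a made-up name) where only a reading subscript exists, so the equivalent of the slice-assign is rejected by the compiler:

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Rough analogue of the proposed "disable opSliceAssign" (an assumed
// design, since the typedef extension is only a proposal): the wrapper
// exposes reads but provides no mutating subscript, so writes through it
// fail at compile time rather than run time.
class ReadonlyString {
    std::string data_;
public:
    explicit ReadonlyString(std::string s) : data_(std::move(s)) {}
    char operator[](std::size_t i) const { return data_[i]; }  // reads ok
    // no non-const operator[] exists, so `s[0] = 'x'` cannot compile
    std::size_t size() const { return data_.size(); }
};
```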
 ?

 I think that such extended typedef makes sense for other basic types:

 typedef color uint
 {
     uint red() {  .... }
     uint blue() {  .... }
     uint green() {  .... }
 }

Is there any particular difference from struct color { uint value; uint red() { ... } ... } ?

The difference is that such a color is inherently uint, so you can do 
the following:

color c = 0xFF00FF;
c <<= 8;
uint r = c.red();
 Also such typedef makes sense for classes too.

I don't get that.. Since you seem to want a new type, what's wrong with deriving?

See:

alias NewClass OldClass
{
    void foo() { .... }
}

will not create a new VTBL for NewClass. It is just syntactic sugar: 
instead of defining and using

void foo_x( OldClass c ) { ..... }

you can use

NewClass nc = ....;
nc.foo();
 To avoid vtbl  pollution. Especially actual for templated classes.

Make the methods or class final, then they don't go into vtbl (or at least shouldn't).

All classes in D have a VTBL by definition, as far as I remember.

Andrew Fedoniouk.
http://terrainformatica.com
Jul 19 2006
parent reply xs0 <xs0 xs0.com> writes:
Andrew Fedoniouk wrote:
 "xs0" <xs0 xs0.com> wrote in message news:e9lu7n$2jv0$1 digitaldaemon.com...
 Andrew Fedoniouk wrote:
 typedef  string char[]
 {
     disable opAssign;
     ....
     char[] tolower() { ..... }
 }



 struct string {
     char[] data;
     char[] tolower() { .... }
 }

 The difference is in the disabled opAssign**, so if you define, let's 
 say, the following:

 typedef string char[]
 {
     disable opSliceAssign;
     ....
     char[] tolower() { ..... }
 }

 then you will not be able to compile the following:

 string s = "something read-only";
 s[0..s.length] = '\0'; // compile time error.

But you can also not define opSliceAssign in struct string, and get the compile time error?
 I think that such extended typedef makes sense for other basic types:

 typedef color uint
 {
     uint red() {  .... }
     uint blue() {  .... }
     uint green() {  .... }
 }

 struct color {
     uint value;
     uint red() { ... }
     ...
 }

 ?

 The difference is that such a color is inherently uint, so you can do 
 the following:

 color c = 0xFF00FF;
 c <<= 8;
 uint r = c.red();

Besides the first line, you can do the same with a struct. And I'd say 
it's good that color and uint are not fully interchangeable, considering 
how little they have in common: one is a 32-bit integer, the other is 
more a byte[3] or byte[4], and even then you can't really say that a 
'generic 8-bit integer' and a 'level of red intensity' have much in 
common.
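The struct alternative is easy to make concrete; a C++ sketch (the 0x00RRGGBB layout is an assumption for illustration) of color as a wrapper rather than a typedef:

```cpp
#include <cassert>
#include <cstdint>

// A struct wrapping the uint gives the channel accessors without making
// color and uint silently interchangeable. Layout assumed: 0x00RRGGBB.
struct color {
    uint32_t value;

    uint8_t red()   const { return uint8_t((value >> 16) & 0xFF); }
    uint8_t green() const { return uint8_t((value >> 8)  & 0xFF); }
    uint8_t blue()  const { return uint8_t( value        & 0xFF); }
};
```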
 Also such typedef makes sense for classes too.

deriving?

 See:

 alias NewClass OldClass
 {
     void foo() { .... }
 }

 will not create a new VTBL for NewClass. It is just syntactic sugar: 
 instead of defining and using

 void foo_x( OldClass c ) { ..... }

 you can use

 NewClass nc = ....;
 nc.foo();

Well, if you override a function, it should be virtual. If it didn't 
exist before and is final, the compiler should be able to determine it 
can call it directly. If it is not final (meaning you plan to override 
it in a further derived class), it should again be virtual. So I don't 
see the problem.

Also, why would you want a non-member function to look like it is a 
member function? It just causes confusion.

Finally, bar.foo() isn't really a shorthand for foo(bar), being one 
character longer. Seems more like syntactic saccharin :)
 To avoid vtbl  pollution. Especially actual for templated classes.

shouldn't).

All classes in D has VTBL by definition as far as I remember.

Yup. xs0
Jul 19 2006
next sibling parent reply Johan Granberg <lijat.meREM OVEgmail.com> writes:
xs0 wrote:
 But you can also not define opSliceAssign in struct string, and get the 
 compile time error?

I think you are missing the point of this proposal (which I like a lot, 
by the way). (Andrew Fedoniouk, if I have misinterpreted your proposal, 
please excuse me.)

The purpose is not to extend a type as with class inheritance, or to 
create a new type as with a new class or struct, but to alter the 
behavior of an existing type: to allow for things like read-only and 
color types that can be passed as-is to common graphics APIs that expect 
uints, etc. This would add missing power to the language that is not 
available at the moment.

An alternative, weaker (but in my opinion worse) syntax to do this would 
be this:

struct string : char[]
{
    //possibly
    override opSliceAssign(){throw new Exception("");}

    //or
    disable opSliceAssign();
}

Here we introduce struct inheritance and the use of built-in types as 
base classes, but due to the extending nature of inheritance (the child 
being a superset of the parent) the disable syntax is bad, and the use 
of inheritance forces the use of the virtual table (which structs and 
built-in types don't have).

The proposed syntax, on the other hand, allows for this:

typedef char[] string
{
    disable opSliceAssign;
    string toLower(){..}
}

If we used inheritance this would not be possible, because we remove 
opSliceAssign from the interface of the type. Here we use the new syntax 
to describe a "starting from" relationship, while inheritance creates an 
is-a relationship.

We could create an entirely new type using structs and so on, but then 
we would have to specify all the methods and fields of the new type 
rather than just our changes. This is very much against the principles 
of code reuse.

/Johan Granberg

ps. Walter, this could be a nice feature to have, as it would allow the 
creation of subsets of types (and intersecting types) as well as 
supersets.
Jul 19 2006
parent "Andrew Fedoniouk" <news terrainformatica.com> writes:
"Johan Granberg" <lijat.meREM OVEgmail.com> wrote in message 
news:e9m79k$2sf$1 digitaldaemon.com...
 xs0 wrote:
 But you can also not define opSliceAssign in struct string, and get the 
 compile time error?

I think you are missing the point of this proposal (which I like a lot, 
by the way). (Andrew Fedoniouk, if I have misinterpreted your proposal, 
please excuse me.)

Johan, you've got it right.
 The purpose is not to extend a type as with a class inheritance or to 
 create a new type as with a new class or struct, but to alter the behavior 
 of an existing type to allow for things like read only and color types 
 that can bee passed as is to common graphics api's that expects uints etc.

Exactly. The main purpose is not extending classes, but giving the 
opportunity to extend intrinsic and value types. It makes real sense for 
arrays, integers, enums, etc.

Again, external methods for arrays are here anyway - this is a good 
chance to legalize them.
 This would add missing power to the language that is not available at 
 the moment. An alternative, weaker (but in my opinion worse) syntax to do 
 this would be:

 struct string : char[]
 {
 //possibly
 override opSliceAssign(){throw new Exception("");}

 //or
 disable opSliceAssign();
 }

 Here we introduce struct inheritance and the use of built-in types as base 
 classes, but due to the extending nature of inheritance (the child being 
 a superset of the parent) the disable syntax is bad, and the use of 
 inheritance forces the use of the virtual table (which structs and built-in 
 types don't have).

 The proposed syntax, on the other hand, allows for this:

 typedef char[] string
 {
 disable opSliceAssign;
 string toLower(){..}
 }

 If we used inheritance this would not be possible, because we remove 
 opSliceAssign from the interface of the type.

 Here we use the new syntax to describe a "starting from" relationship, 
 while inheritance creates an is-a relationship.

 We could create an entirely new type using structs and so on, but then we 
 would have to specify all the methods and fields of the new type rather 
 than just our changes. This is very much against the principles of code reuse.

Exactly! Consider this:

typedef uint color
{
    ubyte red() { .... }
}

I want color to keep all the attributes and operations of uint, but to give it a couple of specific methods. The thing is that the declaration of such a type forces all its methods to be declared in one place. Intellisense engines will like such declarations....
 /Johan Granberg

 ps. Walter, this could be a nice feature to have, as it would allow the 
 creation of subsets of types (and intersecting types) as well as 
 supersets.

Yep, you've got the idea. Sub- and supersets are the right words.

Andrew Fedoniouk.
Jul 19 2006
prev sibling parent "Andrew Fedoniouk" <news terrainformatica.com> writes:
I think I need to explain the idea using different words.

In terms of C++

"char[]" and
"const char[]"
are two distinct types.

"const char[]" is a reduced version of "char[]"

Reduced means that "const char[]" as a type has no
mutating methods like length(uint newLength),
opIndexAssign, etc.

extended typedef allows you to define
explicitly such const types by reducing
set of operations (what C++ does implicitly)
and also allows you to extend such types by new
methods.

Main value of the approach is for array and
pointer types I guess.

Andrew Fedoniouk.
http://terrainformatica.com
Jul 19 2006
prev sibling next sibling parent reply Dave <Dave_member pathlink.com> writes:
Andrew Fedoniouk wrote:
 Dynamic constness versus static (compile time) constness is not new.
 
 For example, in Ruby you can dynamically declare an object/array readonly,
 and its runtime will control all modifications - and note, in full, as
 Ruby's sandbox (like any other VM-based runtime) has all the facilities
 to fully control the immutability of such objects.
 
 In case of runtimes like D (natively compileable) such control is not an
 option.
 
 I believe that the proposed runtime flag a) is not constness in any sense,
 b) does not solve compile-time verification of readonlyness, and
 c) can be implemented now by defining:
 struct vector
 {
     bool readonly;
     T*  data;
     uint length;
 }
 
 Declarative constness prevents data misuse at compile time,
 while runtime constness moves the problem into execution time,
 when it is a) too late to do anything and b) expensive.
 
 I would mention an old idea again - the real solution would be a
 mechanism for disabling existing operations or creating new operations
 for intrinsic types.
 
 For example string definition might look like as:
 
 typedef  string char[]
 {
     disable opAssign;
     ....
     char[] tolower() { ..... }
 }
 
 In any case such a mechanism a) is more universal than const in C++,
 b) allows creating flexible type systems, and finally
 c) will also legalize the situation with the
 "external methods" D has now for array types.
 
 The latter alone is a good enough motivation to do so,
 as the current situation with "external methods" looks like
 just a design or compiler bug, to be honest.
 
 I am not even mentioning that it would make D's type system unique
 in this respect among other languages.
 
 Andrew Fedoniouk.
 http://terrainformatica.com
 
 

What do you mean by external methods? This?

import std.stdio;

void main()
{
    char[] str = "abc".dup;
    writefln(str.ucase()); // "ABC"
}

char[] ucase(char[] str)
{
    foreach(inout char c; str)
        if (c >= 'a' && c <= 'z')
            c += 'A' - 'a';
    return str;
}

If so, that's not a bug, it's intentional. Line 4141 of expression.c.

- Dave
Jul 18 2006
parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
 What do you mean by external methods?

 This?

Positive.
 import std.stdio;
 void main()
 {
     char[] str = "abc".dup;
     writefln(str.ucase()); // "ABC"
 }
 char[] ucase(char[] str)
 {
     foreach(inout char c; str) if(c >= 'a' && c <= 'z') c += 'A' - 'a';
     return str;
 }

 If so, that's not a bug, it's intentional. Line 4141 of expression.c.

People look in the docs / language specification first.

Line 4141 of expression.c is the last place where someone will try to find an answer about what language features D has.

Andrew Fedoniouk.
http://terrainformatica.com
Jul 18 2006
parent Dave <Dave_member pathlink.com> writes:
Andrew Fedoniouk wrote:
 
 People are looking in the doc/ language specification first.
 
 Line 4141 of expression.c is the last place where someone will
 try to find answer on what language features D has.
 

I agree; just pointing out that it is there by design even if that design hasn't been codified in the docs. <g>
 Andrew Fedoniouk.
 http://terrainformatica.com

Jul 18 2006
prev sibling next sibling parent reply xs0 <xs0 xs0.com> writes:
Andrew Fedoniouk wrote:
 Dynamic constness versus static (compile time) constness is not new.

Never said it was.
 For example, in Ruby you can dynamically declare an object/array readonly,
 and its runtime will control all modifications - and note, in full, as
 Ruby's sandbox (like any other VM-based runtime) has all the facilities
 to fully control the immutability of such objects.

Cool! OTOH, I'm proposing making the reference readonly, not the data itself.
 In case of runtimes like D (natively compileable) such control is not an
 option.

Because?
 I believe that the proposed runtime flag a) is not constness in any sense

It's more like readonlyness.
 b) does not solve compile verification of readonlyness and

I said so myself :P

But the question is whether compile-time verification is better or not. In some cases it definitely isn't:

int[] cowFoo(int[] a) { if (whatever) { a = a.dup; a[0] = 5; } }
int[] cowBar(int[] a) { if (something) { a = a.dup; a[1] = 10; } }

int[] result = cowFoo(cowBar(whatever));

How can a compile-time check ever help you avoid the (unnecessary) second .dup when both funcs decide to modify the data?
 c) can be implemented now by defining:
 struct vector
 {
     bool readonly;
     T*  data;
     uint length;
 }

So? How does that help when using built-in arrays?
 Declarative constness prevents data misuse at compile time,
 while runtime constness moves the problem into execution time,
 when it is a) too late to do anything and b) expensive.

I disagree. A single .dup probably costs more than tens (if not hundreds) of checks of a single bit (which can even be disabled in release builds). And why would it be too late to do anything?
 I would mention old idea again - real solution would be in creating of
 mechanism of disabling exiting or creating new opertaions
 for intrinsic types.

Start your own thread :P xs0
Jul 19 2006
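For concreteness, the runtime readonly-bit scheme under discussion can be sketched in C++ (all names here - IntArray, readonlyRef, needToWrite - are hypothetical stand-ins; in D the bit would live in the array's own .length): the flag travels with the reference, reads are unaffected, and a write helper dups only when the reference is marked readonly.

```cpp
#include <cstddef>
#include <cstring>
#include <cassert>

// Hypothetical array reference carrying a readonly flag in the top bit of length.
struct IntArray {
    int*        data;
    std::size_t len_and_flag;      // top bit = readonly flag

    static constexpr std::size_t FLAG = (std::size_t)1 << (sizeof(std::size_t) * 8 - 1);

    std::size_t length() const { return len_and_flag & ~FLAG; }   // mask the bit out
    bool isReadonly() const    { return (len_and_flag & FLAG) != 0; }
    void lock()                { len_and_flag |= FLAG; }

    IntArray readonlyRef() const {             // copy of the reference, bit set
        IntArray r = *this; r.len_and_flag |= FLAG; return r;
    }
    IntArray dup() const {                     // fresh writable copy (bit cleared)
        IntArray r; r.len_and_flag = length();
        r.data = new int[length()];
        std::memcpy(r.data, data, length() * sizeof(int));
        return r;
    }
    void needToWrite() { if (isReadonly()) *this = dup(); }   // CoW on demand
};

IntArray makeArray(std::size_t n) {
    IntArray a; a.data = new int[n]; a.len_and_flag = n;
    for (std::size_t i = 0; i < n; ++i) a.data[i] = (int)i;
    return a;
}

// A CoW-style function: dups only if handed a readonly reference.
IntArray setFirst(IntArray a, int v) {
    a.needToWrite();
    a.data[0] = v;
    return a;
}
```

Passing arr.readonlyRef() to setFirst leaves the original untouched; passing the writable reference mutates in place, which is exactly the unnecessary second .dup the thread wants to avoid.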
next sibling parent xs0 <xs0 xs0.com> writes:
 int[] cowFoo(int[] a) { if (whatever) { a=a.dup; a[0] = 5; } }
 int[] cowBar(int[] a) { if (something) { a=a.dup; a[1] = 10; } }

Of course, both return a as well :) xs0
Jul 19 2006
prev sibling parent Bruno Medeiros <brunodomedeirosATgmail SPAM.com> writes:
xs0 wrote:
 Andrew Fedoniouk wrote:
 b) does not solve compile verification of readonlyness and

I said so myself :P

But the question is whether compile-time verification is better or not. In some cases it definitely isn't:

int[] cowFoo(int[] a) { if (whatever) { a = a.dup; a[0] = 5; } }
int[] cowBar(int[] a) { if (something) { a = a.dup; a[1] = 10; } }

int[] result = cowFoo(cowBar(whatever));

How can a compile-time check ever help you avoid the (unnecessary) second .dup when both funcs decide to modify the data?

For that case (how to avoid the unnecessary dups):

int[] cowFoo(int[] a) { if (whatever) { a[0] = 5; return a; } }
int[] cowBar(int[] a) { if (something) { a[1] = 10; return a; } }

int[] result = cowFoo(cowBar(someintar));

What's the a.dup for? Do you realize that if the parameter (a) is non-const then that means the function is allowed to change it? Perhaps you meant a different use case?

-- 
Bruno Medeiros - CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D
Jul 20 2006
prev sibling parent reply Reiner Pope <reiner.pope gmail.com> writes:
Andrew Fedoniouk wrote:
 Dynamic constness versus static (compile time) constness is not new.

 In case of runtimes like D (natively compileable) such control is not an
 option.

control the code, as we know, such control could be forced at compile time. As to the fact that the runtime could be subverted, well, since we have assembly in D, static const can similarly be circumvented. If speed issues are the concern, read on.
 I beleive that proposed runtime flag a) is not a constness in any sense

could be caught in debug builds? That effectively ensures that the arrays are kept *constant*, doesn't it?
 b) does not solve compile verification of readonlyness and

readonlyness: speed and certainty. For reasons outlined below, speed is actually likely to be _greater_ with runtime const than with compile-time const.

As for certainty, readonlyness is just one of many bug-catching mechanisms. Others include:

- Design by Contract (pre- and post-conditions and invariants)
- Unit testing
- Typing mechanism (partial type safety)
- Array bounds checking
- GC (catches memory and type-safety errors)

All of these checking mechanisms other than type safety are implemented at runtime, yet there is not much debate about that fact, even though they *could* be checked for at compile time using theorem proving (see http://en.wikipedia.org/wiki/SPARK_programming_language for a programming language that does this). The fact that they are checked at runtime means that, like runtime const-ness, the certainty of static checking isn't present. However, it still allows many more bugs to be caught than no const system at all, and I would even go so far as to say that it would catch *most* const violations when combined with good unit tests.

The main advantage of runtime checking is flexibility/speed, as well as no 'const-pollution', as xs0 put it. You get the speed gains from avoiding all unnecessary duplications, a feat which simple (a la C++) static const-checking can't achieve. Imagine that we had a static const-checking system in D:

const char[] tolower(const char[] input)
// The input must be const, because we agree with CoW, so we won't change it.
// Because of below, we also declare the output of the function const.
{
    // do some stuff
    if ( a write is necessary )
    {
        // copy it into another variable, since we can't change input (it's const)
    }
    return something;
    // This something could possibly be input, so it also needs to be declared
    // const. So we go back and make the return value of the function also const.
}

// Now, since the return value is const, we *must* dup it whenever we call it.

This is *very* inefficient if we own the string, because we get two unnecessary dups. This is a big price to pay just to keep static const-checking.
 c) can be implemented now by defining:
 struct vector
 {
     bool readonly;
     T*  data;
     uint length;
 }

effectively copy exactly what an array does already, but a) it takes up more memory than what xs0 proposed, and b) it isn't supported natively by the language's arrays, so it is less likely to be used.
 
 Declarative contness prevents data misuse at compile time
 when runtime constness moves problem into execution time
 when is a) too late to do anything and b) expensive.

coverage tool included in DMD, should pick up most, if not all, of the const violations in your code, while you still have a chance to do something about it. It's impossible to rely on the compiler to pick up all your bugs in any situation.

b) It's not expensive, because it avoids unnecessary duplications, and there should be a compiler switch to turn off the readonly checks in release builds once you're sure of safety. xs0 covered the costs and concluded they weren't many.
 I would mention old idea again - real solution would be in creating of
 mechanism of disabling exiting or creating new opertaions
 for intrinsic types.
 
 For example string definition might look like as:
 
 typedef  string char[]
 {
     disable opAssign;
     ....
     char[] tolower() { ..... }
 }

data-protection is just WAY TOO inflexible, and it removes the areas where D's string (and array) processing is so powerful.

Cheers,
Reiner
Jul 19 2006
parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
"Reiner Pope" <reiner.pope gmail.com> wrote in message 
news:e9kunq$qli$1 digitaldaemon.com...
 You get the speed gains from avoiding all unnecessary duplications, a feat 
 which simple (a la C++) static const-checking can't achieve. Imagine that 
 we had a static const-checking system in D:

 const char[] tolower(const char[] input)
 // the input must be const, because we agree with CoW, so we won't change 
 it
 // Because of below, we also declare the output of the function const
 {
   // do some stuff
   if ( a write is necessary )
   { // copy it into another variable, since we can't change input (it's 
 const)
   }
   return something;
 // This something could possibly be input, so it also needs to be declared 
 const. So we go back and make the return value of the function also a 
 const.
 }

 // Now, since the return value is const, we *must* dup it whenever we call 
 it. This is *very* inefficient if we own the string, because we get two 
 unnecessary dups. This is a big price to pay just to keep static 
 const-checking.


 c) can be implemented now by defining:
 struct vector
 {
     bool readonly;
     T*  data;
     uint length;
 }

copy exactly what an array does already, but a) it takes up more memory than what xs0 proposed, and b) it isn't supported natively by the language's arrays, so it is less likely to be used.

The proposed readonly flag solves one particular, pretty narrow case of COW (only for arrays, and only in functions aware of this flag).

C++ has a better and more universal mechanism for this:

inline string&
string::operator= ( const string& s )
{
    release_data();
    set_data( s.data );
    return *this;
}

inline string& string::operator+= ( const string& s )
{
    mutate(*this);
    resize( length() + s.length() );
    .....
    return *this;
}

I believe that COW arrays (strings in particular), if they are needed, cannot be made without operator= in structures in D. Reference counting cannot be done in D with the same elegance as in C++.

But in the pure GC world, COW strings are not used. Strings in Java, C#, JavaScript, etc. are immutable character ranges - string as a type simply has no such thing as str[i] = 'o'; There are strong reasons for that.

Extended typedef and alias would allow D to have strings as value types without any additional runtime cost.

Andrew Fedoniouk.
http://terrainformatica.com
Jul 19 2006
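For concreteness, the C++ mechanism described above can be shown with a compilable sketch (a toy CowString with hypothetical names, not HTMLayout's actual string class): operator= shares the buffer and bumps a reference count, and every mutating operation first calls mutate() to obtain a private copy.

```cpp
#include <cstring>
#include <cstddef>
#include <cassert>

// Minimal reference-counted copy-on-write string (illustrative names only).
class CowString {
    struct Data { int refs; std::size_t len; char* chars; };
    Data* d;

    void release() { if (d && --d->refs == 0) { delete[] d->chars; delete d; } }
    void mutate() {                        // ensure we own the only copy
        if (d->refs > 1) {
            Data* nd = new Data{1, d->len, new char[d->len + 1]};
            std::memcpy(nd->chars, d->chars, d->len + 1);
            --d->refs;
            d = nd;
        }
    }
public:
    CowString(const char* s) {
        std::size_t n = std::strlen(s);
        d = new Data{1, n, new char[n + 1]};
        std::memcpy(d->chars, s, n + 1);
    }
    CowString(const CowString& s) : d(s.d) { ++d->refs; }
    ~CowString() { release(); }

    CowString& operator=(const CowString& s) {   // share, don't copy
        ++s.d->refs; release(); d = s.d; return *this;
    }
    void setChar(std::size_t i, char c) { mutate(); d->chars[i] = c; }

    const char* c_str() const { return d->chars; }
    bool sharesBufferWith(const CowString& s) const { return d == s.d; }
};
```

Copies are O(1) until one side writes; only the writer pays for the duplication, which is the point being argued here.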
parent Reiner Pope <reiner.pope gmail.com> writes:
Andrew Fedoniouk wrote:
 "Reiner Pope" <reiner.pope gmail.com> wrote in message 
 news:e9kunq$qli$1 digitaldaemon.com...
 You get the speed gains from avoiding all unnecessary duplications, a feat 
 which simple (a la C++) static const-checking can't achieve. Imagine that 
 we had a static const-checking system in D:

 const char[] tolower(const char[] input)
 // the input must be const, because we agree with CoW, so we won't change 
 it
 // Because of below, we also declare the output of the function const
 {
   // do some stuff
   if ( a write is necessary )
   { // copy it into another variable, since we can't change input (it's 
 const)
   }
   return something;
 // This something could possibly be input, so it also needs to be declared 
 const. So we go back and make the return value of the function also a 
 const.
 }

 // Now, since the return value is const, we *must* dup it whenever we call 
 it. This is *very* inefficient if we own the string, because we get two 
 unnecessary dups. This is a big price to pay just to keep static 
 const-checking.


 c) can be implemented now by defining:
 struct vector
 {
     bool readonly;
     T*  data;
     uint length;
 }

copy exactly what an array does already, but a) it takes up more memory than what xs0 proposed, and b) it isn't supported natively by the language's arrays, so it is less likely to be used.

The proposed readonly flag solves one particular, pretty narrow case of COW (only for arrays, and only in functions aware of this flag).

saying that the opIndexAssign property of arrays is limited because it can only be used by the functions that know about it. I see this proposal as an alternative to C++-style const, and with regard to functions being aware of the feature, xs0's solution is better because it avoids const propagation throughout the code.
 
 C++ has better and more universal mechanism for this.
 
 inline string &
     string::operator= ( const string &s )
   {
     release_data();
     set_data ( s.data );
     return *this;
   }
 

is a duplication, which is the runtime cost we are trying to avoid.
 inline string & string::operator += ( const string &s )
   {
     mutate(*this);
     resize( length() + s.length() );
     .....
     return *this;
   }
 

The other point to make is that this seems not to be a C++ feature, but rather a library feature. I'm probably not understanding your examples, but can you, say, provide C++ code to match the following D code's functionality while avoiding unnecessary duplicates _and having const safety_:

char[] foo = "foo";
foo = tolower(toupper(foo));

I don't see how you can manage that with static const-checking. Please explain, and maybe then I can understand how the C++ solution is 'better'.
 I beleive that COW arrays (strings in particular) if they needed cannot
 be made without operator= in structures in D.

you haven't outlined a technical reason for it not working. If Walter integrates it into D, then that isn't going to cause any problems.
 Reference counting cannot be made in D with the same elegancy as in C++.

need ref-counting for CoW strings. Doesn't mark-and-sweep manage it better?
 But in pure GC world COW strings are not used.
 Strings in Java, C#, JavaScript, etc. are immutable character ranges -

fast string processing is largely diminished. However, D's string processing capabilities are good, and since it is possible to keep them, why shouldn't we?
 string as a type simply has no such things as str[i] = 'o';
 There are strong reasons for that.

cumbersome to process strings, with all the calls to foo.substring(0, 2); and so on. The other downside is that the processing is *slow* *as*.

Cheers,
Reiner
Jul 20 2006
prev sibling parent reply "Craig Black" <cblack ara.com> writes:
Sounds like a great idea to me.  Easy to implement, improves correctness and 
performance.  What are we waiting for?

-Craig 
Jul 19 2006
parent reply xs0 <xs0 xs0.com> writes:
Craig Black wrote:
 Sounds like a great idea to me.  Easy to implement, improves correctness and 
 performance.  What are we waiting for?

Personally, I'm waiting/hoping for Walter to see the proposal and say what he thinks :)

I'm also wondering whether the "overwhelming" response to the proposal is because:

- I didn't write "proposal" in the subject
- it's from me (I used to argue in a bad way too much, I'm sure I'm being filtered at least by some people :)
- it's so bad it's not even worth a comment
- it's so good everybody is already waiting for Walter to say yes ;)

xs0
Jul 19 2006
parent reply Don Clugston <dac nospam.com.au> writes:
xs0 wrote:
 Craig Black wrote:
 Sounds like a great idea to me.  Easy to implement, improves 
 correctness and performance.  What are we waiting for?

Personally, I'm waiting/hoping for Walter to see the proposal and say what he thinks :)

I'm also wondering whether the "overwhelming" response to the proposal is because:

- I didn't write "proposal" in the subject
- it's from me (I used to argue in a bad way too much, I'm sure I'm being filtered at least by some people :)
- it's so bad it's not even worth a comment
- it's so good everybody is already waiting for Walter to say yes ;)

Maybe you just need some better terminology. How about arr.clone, to replace arr with a writable copy of arr (instead of "needToWrite")? (You don't care if it's the original arr or a dup.) And turn it into a proposal about a more efficient dup: Modify Only One Copy On Write (MOO COW). <g>.
Jul 20 2006
parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
"Don Clugston" <dac nospam.com.au> wrote in message 
news:e9nvee$2h4m$1 digitaldaemon.com...
 xs0 wrote:
 Craig Black wrote:
 Sounds like a great idea to me.  Easy to implement, improves correctness 
 and performance.  What are we waiting for?

Personally, I'm waiting/hoping for Walter to see the proposal and say what he thinks :)

I'm also wondering whether the "overwhelming" response to the proposal is because:

- I didn't write "proposal" in the subject
- it's from me (I used to argue in a bad way too much, I'm sure I'm being filtered at least by some people :)
- it's so bad it's not even worth a comment
- it's so good everybody is already waiting for Walter to say yes ;)

Maybe you just need some better terminology. How about arr.clone, to replace arr with a writable copy of arr (instead of "needToWrite")? (You don't care if it's the original arr or a dup.) And turn it into a proposal about a more efficient dup: Modify Only One Copy On Write (MOO COW). <g>.

Don, I think that reference counting (MOO COW) has the same set of "civil rights" as GC, so it probably makes sense to look at this from a language-design perspective in a more universal fashion. Refcounting of arrays is only one particular thing; I mean the language shall support this idiom with the same quality as GC.

In fact, for a typical and effective refcounting implementation it is enough to have ctors/dtors/assignment in structs. Having them, MOO COW can be implemented easily without any need for runtime model changes.

And MOO COW is somewhat orthogonal to constness. Again, I would try to find a more universal solution here rather than solving the particular array problem.

I believe that a "smart pointer" as an entity would cover the MOO COW cases. But D has no facilities for smart pointers at all now.

Andrew Fedoniouk.
http://terrainformatica.com
Jul 20 2006
parent reply Don Clugston <dac nospam.com.au> writes:
Andrew Fedoniouk wrote:
 Don, I think that reference counting (MOO COW) has the
 same set of "civil rights" as GC so probably it makes sense
 to look on this from language design perspective in more universal
 fashion. RefCounting of arrays is only one particular thing I mean -
 language shall support this idiom with the same quality as GC.

I agree that something more universal would be better. But there's a really interesting feature: when you have GC, you don't need full reference counting, because you don't need deterministic destruction. You only need a single bit. (I think this is correct, but it needs more thought).
 In fact for typical and effective refcounting implementation it is
 enough to have ctors/dtors/assignement in structs.

I don't quite agree with this. I think that arrays in D are fundamentally different from arrays in C/C++. In C, they're little more than syntactic sugar for pointers, whereas in D, they are more like very important, built-in structs. If refcounting were more integral in the language, it would need to be available for built-in arrays.
 Having them MOO COW can be implemented easily without
 need of runtime model changes.
 
 And MOO COW is somehow orthogonal to constness.

Yes, that's the point I was trying to make. I thought it was an interesting proposal, but doesn't have much to do with compile-time constness, except insofar as it reduces the need for full const.
 Again, I would try to find here more universal solution
 rather than particular array problem.
 
 I beleive that "smart pointer" as an entity will cover
 MOO COW cases. But D does not have facilities
 now for smart pointers at all.

Walter seems to have vehement opposition to operator =. I wonder if it is really necessary. Maybe a single opXXX() function could do the job, if the compiler had some extra intelligence. (Much as opCmp does all of the >,<, >=, <=. Every op= and copy constructor I've ever seen in C++ was very tedious, I wonder if that design pattern could be factored into a single function).
Jul 21 2006
parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
"Don Clugston" <dac nospam.com.au> wrote in message 
news:e9qb6h$2r4m$1 digitaldaemon.com...
 Andrew Fedoniouk wrote:
 Don, I think that reference counting (MOO COW) has the
 same set of "civil rights" as GC so probably it makes sense
 to look on this from language design perspective in more universal
 fashion. RefCounting of arrays is only one particular thing I mean -
 language shall support this idiom with the same quality as GC.

I agree that something more universal would be better. But there's a really interesting feature: when you have GC, you don't need full reference counting, because you don't need deterministic destruction. You only need a single bit. (I think this is correct, but it needs more thought).

"But there's a really interesting feature: when you have GC, you don't need full reference counting, because you don't need deterministic destruction."

Theoretically, yes. Practically... I would change it to "when you have a perfect GC".

And about GC, here is a sample I know pretty well: there is no HTML rendering engine in the wild based on GC memory management (http://en.wikipedia.org/wiki/List_of_layout_engines). I've seen attempts to do them in Java or C# - not even close. (In Harmonia I've decided not to use the GC heap for the DOM either.)

Deterministic memory management has one big benefit - it is manageable and predictable.
 In fact for typical and effective refcounting implementation it is
 enough to have ctors/dtors/assignement in structs.

I don't quite agree with this. I think that arrays in D are fundamentally different from arrays in C/C++. In C, they're little more than syntactic sugar for pointers, whereas in D, they are more like very important, built-in structs. If refcounting were more integral in the language, it would need to be available for built-in arrays.
 Having them MOO COW can be implemented easily without
 need of runtime model changes.

 And MOO COW is somehow orthogonal to constness.

Yes, that's the point I was trying to make. I thought it was an interesting proposal, but doesn't have much to do with compile-time constness, except insofar as it reduces the need for full const.

There is a string struct in Harmonia using such a bit (string.d):

struct tstring(CHAR)
{
    CHAR[] chars;
    bit mutable = true; // by default an empty string is mutable
    ......
}

and this is exactly what was proposed (except for the placement of the bit). Without constness and the ability to define methods, it is worth nothing for arrays.
 Again, I would try to find here more universal solution
 rather than particular array problem.

 I beleive that "smart pointer" as an entity will cover
 MOO COW cases. But D does not have facilities
 now for smart pointers at all.

Walter seems to have vehement opposition to operator =. I wonder if it is really necessary. Maybe a single opXXX() function could do the job, if the compiler had some extra intelligence. (Much as opCmp does all of the >,<, >=, <=. Every op= and copy constructor I've ever seen in C++ was very tedious, I wonder if that design pattern could be factored into a single function).

operator "=" IS really necessary, as there is currently no method in D to guard assignment to a variable (memory location). Again, without it a good chunk of RAII methods and smart pointers are not implementable in D.

In the HTMLayout SDK I have a dom::element object which is a wrapper around an internal DOM element handle. It is in C++. I physically cannot write something as close and as easy to use in D.

/** DOM element. */
class element
{
protected:
    HELEMENT he;

    void use(HELEMENT h)   { he = (HTMLayout_UseElement(h) == HLDOM_OK) ? h : 0; }
    void unuse()           { if (he) HTMLayout_UnuseElement(he); he = 0; }
    void set(HELEMENT h)   { unuse(); use(h); }

public:
    element(): he(0)            { }
    element(HELEMENT h)         { use(h); }
    element(const element& e)   { use(e.he); }
    operator HELEMENT() const   { return he; }
    ~element()                  { unuse(); }

    element& operator= (HELEMENT h)       { set(h); return *this; }
    element& operator= (const element& e) { set(e.he); return *this; }
    ....
};
Jul 21 2006
parent reply Ben Phillips <Ben_member pathlink.com> writes:
operator "="  IS really necessary.  As there is no method in D
currently to guard assignment to variable (memory location).
Again without it good chunk of RAII methods and smart pointers
are not implementable in D.

It is impossible to allow operator "=" to be overloaded without totally killing the way D works, because D uses references. Example:

ClassA a = new ClassA();
ClassA b = a; // b now refers to a
b.mutate(); // both 'b' and 'a' are changed since they refer to the same object

What is possible is to define a new operator (such as ":=") that means copy assignment, but I don't see how this differs from creating a method that does the same thing.
Jul 21 2006
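The reference-semantics point above can be made concrete in C++ (ClassA here is a hypothetical stand-in): with reference-style assignment both names alias one object, while a value copy - what a ':=' operator would mean - leaves the original untouched.

```cpp
#include <cassert>

// Hypothetical stand-in for the ClassA in the example.
struct ClassA {
    int value;
    ClassA() : value(0) {}
    void mutate() { value = 42; }
};

// Reference-style copy (what D class assignment does): both names alias one object.
int mutateThroughAlias() {
    ClassA* a = new ClassA();
    ClassA* b = a;          // b now refers to a
    b->mutate();            // both 'b' and 'a' are changed
    int seen = a->value;
    delete a;
    return seen;
}

// Value copy (what a hypothetical ":=" would mean): the original is unaffected.
int mutateThroughCopy() {
    ClassA a;
    ClassA b = a;           // independent copy
    b.mutate();
    return a.value;         // still the original value
}
```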
parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
"Ben Phillips" <Ben_member pathlink.com> wrote in message 
news:e9rc1u$1g71$1 digitaldaemon.com...
operator "="  IS really necessary.  As there is no method in D
currently to guard assignment to variable (memory location).
Again without it good chunk of RAII methods and smart pointers
are not implementable in D.

It is impossible to allow operator "=" to be overloaded without totally killing the way D works, because D uses references. Example:

ClassA a = new ClassA();
ClassA b = a; // b now refers to a
b.mutate(); // both 'b' and 'a' are changed since they refer to the same object

I think that operator= should be available only for structs and probably other value types, so there will be no conflict with the current situation.
 What is possible is to define a new operator (such as ":=") that means 
 copy
 assignment, but I don't see how this differs from creating a method that 
 does
 the same thing.

A method is not an option at all. operator= is a guard of a memory location, and a method, well, is a method.

struct guard {
    int v;
    void opAssign(int nv) { alarm("value 'v' is about to change"); v = nv; }
}

guard gv;
gv = 12;

As you may see, operator= guards the memory location, allowing you to intercept all assignments into the variable. Too many things (RAII, smart pointers) were built around this in C++.

A method of the struct will not help you here in principle.

Andrew Fedoniouk.
http://terrainformatica.com
Jul 21 2006
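The guard struct above has a direct, compilable C++ analogue (alarm() here is just a counter, standing in for the hypothetical alarm call): operator= fires on every assignment to the variable, which an ordinary method cannot guarantee.

```cpp
#include <cassert>

static int alarms = 0;                 // stand-in for the hypothetical alarm()
void alarm() { ++alarms; }

struct Guard {
    int v;
    Guard() : v(0) {}
    Guard& operator=(int nv) {         // intercepts every assignment to the variable
        alarm();
        v = nv;                        // store the new value
        return *this;
    }
};
```

Every `g = x;` goes through the guard; there is no way to assign around it, which is the property RAII wrappers and smart pointers rely on.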
parent reply "Derek Parnell" <derek psych.ward> writes:
On Sat, 22 Jul 2006 06:27:24 +1000, Andrew Fedoniouk  =

<news terrainformatica.com> wrote:

 "Ben Phillips" <Ben_member pathlink.com> wrote in message
 news:e9rc1u$1g71$1 digitaldaemon.com...
 operator "=" IS really necessary. As there is no method in D
 currently to guard assignment to variable (memory location).
 Again without it good chunk of RAII methods and smart pointers
 are not implementable in D.

 It is impossible to allow operator "=" to be overloaded without totally
 killing the way D works because D uses references.
 Example:
 ClassA a = new ClassA();
 ClassA b = a; // b now refers to a
 b.mutate(); // both 'b' and 'a' are changed since they refer to the
 same object
But some of the side effects of a new operator ':=' would be that it
could be used with value types and reference types alike, and it would
mean that the information contained in the right-hand-side member is
copied to the left-hand-side member. It would remove the need for
".dup", for example:

    char[] a;
    a := toString(4);
 method is not an option at all.
 operator= is a guard of memory location and method, well, is method.

 struct guard {
    int v;
    void opAssign(int nv) { alarm("value 'v' is about to change"); v = nv; }
 }

 guard gv;
 gv = 12;

 As you may see operator= guards memory location allowing you to intercept
 all assignments into the variable. Too many things (RAII, smart pointers)
 were built around this in C++.

 Method of the struct will not help you here in principle.

struct guard {
   private int _v;
   void v(int nv) { alarm("value 'v' is about to change"); _v = nv; }
}

guard gv;
gv.v = 12;

-- 
Derek Parnell
Melbourne, Australia
Jul 21 2006
parent reply "Andrew Fedoniouk" <news terrainformatica.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:op.tc2lghnq6b8z09 ginger.vic.bigpond.net.au...
[skipped]
 method is not an option at all.
 operator= is a guard of memory location and method, well, is method.

 struct guard {
    int v;
    void opAssign(int nv) { alarm("value 'v' is about to change"); v = nv; }
 }

 guard gv;
 gv = 12;

 As you may see operator= guards memory location allowing you to intercept
 all assignments into the variable. Too many things (RAII, smart pointers)
 were built
 around this in C++.

 Method of the struct will not help you here in principle.

struct guard {
   private int _v;
   void v(int nv) { alarm("value 'v' is about to change"); _v = nv; }
}

guard gv;
gv.v = 12;

Consider this:

    guard gv, gv1;
    gv1.v = 24;
    gv.v = 12;
    gv = gv1; // oops, where is my alarm()?

Again, there is no method in "modern D" to catch assignment to the
variable itself.

In my case (a wrapper of htmlayout), in the following assignment:

    dom::element root = dom::element::get_root(hWnd);

operator= calls HTMLayout_useElement on the HELEMENT returned by
get_root (C++). And C++ will call the destructor for 'root' at the end
of the block, and in the destructor HTMLayout_unuseElement happens.
Such a use case allows holding resources for a limited (deterministic)
time. In the end the system is more responsive than any GCable one.

There is no way in D to implement this. Sorry, but this is true.

Andrew Fedoniouk.
http://terrainformatica.com
Jul 22 2006
parent "Derek Parnell" <derek psych.ward> writes:
On Sat, 22 Jul 2006 18:03:33 +1000, Andrew Fedoniouk
<news terrainformatica.com> wrote:


 There is no way in D to implement this. Sorry, but this is true.

Agreed. The ':=' operator could be one solution. I suggest that when
applied, it should only copy one level deep, and if one needs deeper
copies then the opCopy() function could be overloaded to provide that
functionality. Of course, we should also have opXXX functionality when
using basic types and arrays.

Hopefully, this concept can be seriously considered for v2.0.

-- 
Derek Parnell
Melbourne, Australia
Jul 22 2006