
digitalmars.D - empty arrays - no complaints?

reply Farmer <itsFarmer. freenet.de> writes:
Why are there (almost) no complaints about D's support for empty arrays?


Just to get ex-BASIC programmers in touch with this aspect of D arrays,
here's a (not so) small D sample that shows how to create
   a) null arrays (named: null1, null2, null3)
   b) empty arrays (named: empty1, empty2, empty3)
and also shows how they differ.

[D arrays have sooooo obvious semantics that D programmers should feel free
to skip to the end of this post and read the conclusion.]


--------------------- array sample code ---------------------


void printTraits(char[] array, char[] name)
{
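   // Note: a char[] passed to printf matches %.*s, because the array is
   // pushed as a (length, pointer) pair.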

   printf("\n%10.*s%-13.*s", name, ".length == 0");
   if (array.length == 0)
      printf("%10.*s","is true");
   else
      printf("%10.*s","is false");

   printf("%10.*s%-13.*s", name, " is null");
   if (array is null)
      printf("%10.*s","is true");
   else
      printf("%10.*s","is false");

   printf("\n%10.*s%-13.*s", name, " == null");
   if (array == null)
      printf("%10.*s","is true");
   else
      printf("%10.*s","is false");

   printf("%10.*s%-13.*s", name, " == \"\"");
   if (array == "")
      printf("%10.*s","is true");
   else
      printf("%10.*s","is false");
}


int main(char args[][])
{
   char[] empty1=(new char[1])[0..0];
   char[] empty2="1"[1..1];   // empty2="1"[2..2]  causes ArrayBoundsError
   char[] empty3="";

   char[] null1;
   char[] null2=new char[0];
   char[] null3=empty1;
   null3.length=0;

   printTraits(null1, "null1");
   printTraits(null2, "null2");
   printTraits(null3, "null3");
   printf("\n");
   printTraits(empty1, "empty1");
   printTraits(empty2, "empty2");
   printTraits(empty3, "empty3");
   printf("\n\n");
   if (null1 == null)
      printf("%20.*s","null1 == null   ");
   if (empty1 == null1)
      printf("%20.*s","empty1 == null1  ");
   if (empty1 != null)
      printf("%20.*s","but  empty1 != null");
   printf("\n");

   return 0;
}


Built with DMD 0.93 (Windows), the output is:

     null1.length == 0    is true     null1 is null        is true
     null1 == null        is true     null1 == ""          is true
     null2.length == 0    is true     null2 is null        is true
     null2 == null        is true     null2 == ""          is true
     null3.length == 0    is true     null3 is null        is true
     null3 == null        is true     null3 == ""          is true

    empty1.length == 0    is true    empty1 is null       is false
    empty1 == null       is false    empty1 == ""          is true
    empty2.length == 0    is true    empty2 is null       is false
    empty2 == null       is false    empty2 == ""          is true
    empty3.length == 0    is true    empty3 is null       is false
    empty3 == null       is false    empty3 == ""          is true

    null1 == null      empty1 == null1   but  empty1 != null


--------------------- end of array sample ---------------------



Conclusion: D does have empty arrays and null arrays, but the language tries
to blur the distinction between them.
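
Distilled from the output above, all of the following hold at once under
DMD 0.93 (an illustrative snippet; the variable names are arbitrary):

   void main()
   {
      char[] nullArr;          // declared, never initialized
      char[] emptyArr = "";    // initialized, zero length

      assert(nullArr == emptyArr);     // they compare equal...
      assert(nullArr is null);         // ...yet one is null
      assert(!(emptyArr is null));     // ...and the other is not
   }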

This is unfortunate, as

1) a clear separation of empty arrays vs. null arrays is useful for
functionally rich but simple API interfaces:

Imagine a function that returns the value of an attribute of an XML element:

char[] getAttrValue(char[] name)

The attribute value could be non-existent (the attribute doesn't exist), be
empty, or have a non-empty value.
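
With null arrays kept distinct from empty arrays, that one signature is
enough. An illustrative caller (getAttrValue itself and the attribute name
are assumptions, not shown here):

   char[] value = getAttrValue("encoding");
   if (value is null)
      printf("attribute not present\n");
   else if (value.length == 0)
      printf("attribute present, but empty\n");
   else
      printf("attribute value: %.*s\n", value);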

If empty arrays and null arrays are blurred, the interface gets more bloated:
// additional parameter
char[] getAttrValue(char[] name, out bit isNull)  
// additional function, potentially wasting a slot in the VTable
bit hasAttrValue(char[] name)
// additional indirection
Attribute getAttribute(char[] name) 


2) Initialization bugs are not detected at runtime.

D has
- null references for objects
- null for pointers
- NaNs for floating-point types
- invalid characters for Unicode character types
- guaranteed initialization of structs (constructors are coming, soon!)
- and strong typedefs that empower the programmer to define
  application-specific 'not-initialized' values for integer types

to make a ubiquitous source of bugs easy to spot and fix.
But if empty and null arrays are commonly treated as being the same thing,
uninitialized arrays will cause subtle bugs here and there.
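
For reference, a small check of those defaults (illustrative only; the
Handle typedef and its -1 initializer are merely an example of such an
application-specific value):

   typedef int Handle = -1;   // a strong typedef with an application-specific
                              // 'not-initialized' value (example only)

   int main()
   {
      float  f;    // default-initialized to NaN
      int*   p;    // default-initialized to null
      Object o;    // default-initialized to null
      char   c;    // default-initialized to an invalid UTF-8 code unit (0xFF)
      Handle h;    // default-initialized to -1

      assert(f != f);      // NaN never compares equal to itself
      assert(p == null);
      assert(o is null);
      assert(c == 0xFF);
      assert(h == -1);
      return 0;
   }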


3) This aspect of array behaviour is not obvious!

OK, what's obvious is always a moot point. (If I knew what's obvious, I
would write posts about bit vs. bool vs. strong bool types.)
But I know that this array behaviour is definitely not obvious to all D/C/C++
programmers.


So, why doesn't anyone complain?


Farmer.
Jun 27 2004
next sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <Xns9515C8A3CA1ACitsFarmer 63.105.9.61>, Farmer says...
Conclusion: D does have empty-arrays and null-arrays but the language tries  
to blur them.
Not really. I'd rather argue that D tries to make both usable and reduce odd errors resulting from uninitialized arrays.
This is unfortunate as 

1) a clear separation of empty-arrays vs. null-arrays is useful for 
functional rich but simple API interfaces:

Imagine a function that returns the value of attributes of a XML-element
char[] getAttrValue(char[] name)

The attribute value could be non-existant (the attribute doesn't exist), be 
empty, or have a non-empty value.
I'd say this is an interface or documentation problem, not a language problem.
2) Initialization bugs are not detected at runtime.
This makes sense in this case. I don't like the idea of having to distinguish between an initialized array with no elements and an uninitialized array, as both are equivalent IMO. Further, setting the length property will cause a reallocation for both types of arrays.
to make an ubiquitous source of bugs, easy to spot and fix. 
But if empty/null arrays are commonly treated as being the same thing, 
uninitialized arrays will cause subtle bugs here and there.
I believe the opposite would be true.

Sean
Jun 27 2004
parent reply Farmer <itsFarmer. freenet.de> writes:
Sean Kelly <sean f4.ca> wrote in news:cbn29h$rpo$1 digitaldaemon.com:


 Not really.  I'd rather argue that D tries to make both usable and
 reduce odd errors resulting from uninitialized arrays.
I think D tries to *hide* errors resulting from uninitialized arrays.
 
This is unfortunate as 

1) a clear separation of empty-arrays vs. null-arrays is useful for 
functional rich but simple API interfaces:

Imagine a function that returns the value of attributes of a XML-element
char[] getAttrValue(char[] name)

The attribute value could be non-existant (the attribute doesn't exist),
be empty, or have a non-empty value.
I'd say this is an interface or documentaation problem, not a language problem.
You misunderstood me; I meant that the function interface is a good one. I
could document the function like this:

/*
   Returns the value of the attribute of the given name.
   param name    name of the attribute
   return        null if the attribute doesn't exist,
                 the value of the attribute otherwise
*/
char[] getAttrValue(char[] name)

But the other functions I mentioned would be a necessary workaround if you
couldn't distinguish between null and empty arrays. And these functions are
a waste of both CPU cycles and developer brain.
2) Initialization bugs are not detected at runtime.
This makes sense in this case. I don't like the idea of having to distinguish between an initialized array with no elements and an uninitialized array, as both are equivalent IMO. Further, setting the length property will cause a reallocation for both types of arrays.
Well, it's quite easy to distinguish between an empty and a null array: an
uninitialized array (null array) is a bug in either the programmer's code or
in the code of a library. An initialized array (empty array) is a perfectly
legal thing. Why is the idea of distinguishing between a bug and correct
program behaviour such an unpleasant thing?

Reallocation occurs if the length is greater than the allocated size. I'm
fine with that; the length 'property' is such an oddity that whatever it
does, I would call it consistent. Reallocation is guaranteed not to happen
if the new length is less than or equal to the allocated size (Walter said
so). Well, except when the new length happens to be 0. Talk about
consistency.
Jun 27 2004
parent reply Derek Parnell <derek psych.ward> writes:
On Sun, 27 Jun 2004 22:55:46 +0000 (UTC), Farmer wrote:

 Sean Kelly <sean f4.ca> wrote in news:cbn29h$rpo$1 digitaldaemon.com:
 
 Not really.  I'd rather argue that D tries to make both usable and
 reduce odd errors resulting from uninitialized arrays.
I think, D tries to *hide* errors resulting from uninitialized arrays.
 
This is unfortunate as 

1) a clear separation of empty-arrays vs. null-arrays is useful for 
functional rich but simple API interfaces:

Imagine a function that returns the value of attributes of a XML-element
char[] getAttrValue(char[] name)

The attribute value could be non-existant (the attribute doesn't exist),
be empty, or have a non-empty value.
I'd say this is an interface or documentaation problem, not a language problem.
You misunderstood me, I meant that the function interface is a good one. I could document the function like this: /* Function returns the value the attribute of the given name. param name name of the attribute return returns null if the attribute doesn't exist returns value of the attribute otherwise */ char[] getAttrValue(char[] name) But the other functions, I mentioned would be a necessary workaround if you couldn't distinguish between null and empty arrays. And these functions are a waste of both cpu cycles and developer brain.
2) Initialization bugs are not detected at runtime.
This makes sense in this case. I don't like the idea of having to distinguish between an initialized array with no elements and an uninitialized array, as both are equivalent IMO. Further, setting the length property will cause a reallocation for both types of arrays.
Well, it's quite easy to do distinquish between an empty and a null array: An uninitialized array (null array) is a bug in either the programmer's code or in the code of a library. An initialized array (empty array) is a perfectly legal thing.
Well... the *use* of an uninitialized array in a place where it is assumed
to be initialized is a bug. The fact, or presence, of an uninitialized array
is in itself not really a bug. Also, the use of an empty array may well be a
bug in other circumstances, even though it is 'a legal thing'.
 Why is the idea to distinguish between a bug and correct programm behaviour 
 such an unpleasent thing?
It's not, and no one said it was. We are talking about distinguishing between
an array that has not been set to anything specific *yet*, and one that has
been set explicitly, through assignment, to contain zero elements.

There is a timing issue here. For example, it might be prudent in some
situations to only initialize an array if it's actually going to be used.
This is a run-time decision and not a compile-time decision.

-- 
Derek
Melbourne, Australia
28/Jun/04 10:44:13 AM
Jun 27 2004
parent reply "Kris" <someidiot earthlink.dot.dot.dot.net> writes:
Derek Parnell" <derek psych.ward> wrote:
 There is a timing issue here. For example, it might be prudent in some
 situations to only initialize an array if its actually going to be used.
 This is a run-time decision and not a compile time decision.
What I do to handle such issues is to check the array length only. See, even
if the array is unallocated, the length is still valid (because arrays are a
pointer/length pair). If the length is zero, you move on. If not, then the
pointer *should* be valid. That is, a length-check can perform double duty.
For example:

void foo (char[] bar)
{
   if (bar.length)
      // do something
      ;
}

void main ()
{
   foo (null);
}

- Kris
Jun 27 2004
parent Regan Heath <regan netwin.co.nz> writes:
On Sun, 27 Jun 2004 18:09:05 -0700, Kris 
<someidiot earthlink.dot.dot.dot.net> wrote:
 Derek Parnell" <derek psych.ward> wrote:
 There is a timing issue here. For example, it might be prudent in some
 situations to only initialize an array if its actually going to be used.
 This is a run-time decision and not a compile time decision.
What I do to handle such issues is to check the array length only. See, even if the array is unallocated the length is still valid (because arrays are a pointer/length pair). If the length is zero, you move on. If not, then the pointer *should* be valid. That is, a length-check can perform double duty. For example: void foo (char[] bar) { if (bar.length) // do something ; } main () { foo (null); }
I think Derek is thinking more of this other example he gave:

if (a === null)
{
   // Initialize it
}
else
{
   if (a.length == 0)
   {
      // Empty situation. I DO NOT WANT TO INITIALIZE IT HERE!
   }
   else
   {
      // Use the non-empty array
   }
}

The array above is initialized if it's null. Otherwise it is handled based
on whether it has items in it. We need to be able to tell the difference
between empty and null, and it needs to be consistent. The inconsistencies
as I see them are:

   empty array == null         // true
   empty array == null array   // true

whereas both should be false. No change needs to be made to the way the
length property works; as you say, it's useful if you do not need to handle
them differently.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 27 2004
prev sibling next sibling parent reply Andy Friesen <andy ikagames.com> writes:
Farmer wrote:
 Why are there (almost) no complaints about D's support for empty arrays?
 
 Conclusion: D does have empty-arrays and null-arrays but the language tries  
 to blur them. 
 
 This is unfortunate ...
 
 So, why doesn't anyone complain?
I think the problem is that D arrays almost always behave like reference
types, and therefore are almost always treated like reference types.

They aren't. null arrays *are* empty arrays.

Arrays are value types which consist of a length and a pointer to memory.
Copying and slicing an array creates a brand new array whose data happens to
(generally) be memory that is also pointed to by another array.

So! Rules of thumb:
   1) think of arrays as though they are value types which can be cheaply
      copied.
   2) use .dup if you need to mutate copies made in this way (the
      Copy-on-Write principle).

 -- andy
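
An illustrative sketch of the copy-on-write rule above (the variable names
are arbitrary):

   void main()
   {
      char[] a = "hello".dup;   // a mutable copy of the literal
      char[] b = a;             // a brand new array value that shares a's data
      b[0] = 'H';               // visible through a as well

      char[] c = a.dup;         // Copy-on-Write: duplicate before mutating
      c[0] = 'J';               // a is unchanged

      assert(a == "Hello");
      assert(b == "Hello");
      assert(c == "Jello");
   }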
Jun 27 2004
parent reply Farmer <itsFarmer. freenet.de> writes:
Andy Friesen <andy ikagames.com> wrote in 
news:cbn3js$tgq$1 digitaldaemon.com:


 
 I think the problem is that D arrays almost always behave like reference 
 types, and therefore are almost always treated like reference types.
Yes, this is a problem. It is a necessary evil to achieve that outstanding
performance. But it is not really related to the topic of null arrays vs.
empty arrays, since empty arrays are possible with the D array layout.
 They aren't.  null arrays *are* empty arrays.
No, null arrays are not empty arrays, as my sample proves.
 Arrays are value types which consist of a length and a pointer to 
 memory.  Copying and slicing an array creates a brand new array whose 
 data happens to (generally) be memory that is also pointed to by another 
 array.
I think there's a lapsus: slices *always* point to the same memory as the
array from which they were created.

Regards, Farmer.
Jun 27 2004
parent reply Andy Friesen <andy ikagames.com> writes:
Farmer wrote:

 Andy Friesen <andy ikagames.com> wrote in 
 news:cbn3js$tgq$1 digitaldaemon.com:
 
I think the problem is that D arrays almost always behave like reference 
types, and therefore are almost always treated like reference types.
Yes, this is a problem. It is a necessary evil to archive that outstanding performance. But it is not really related to the topic null array vs. empty array, since empty arrays are possible with the D array layout
Sure, in the same sense that D allows 'empty' integers. :)
They aren't.  null arrays *are* empty arrays.
No, null arrays are not empty arrays, as my sample proofs.
Conceptually they are. If the length is zero, then the data pointer is
meaningless. Testing the data pointer in such a case can be likened to using
the result of a division by zero. Doing things like mathematically 'proving'
that 3==5 or that empty!==null is easy when you go into the twilight zone. :)

As an example:

     import std.string;

     char[] permute(char[] c) {
         // mutate that to which the array refers
         c[0] = 'H';
         // mutate the array
         c.length = 4;
         return c;
     }

     int main() {
         char[] c = "hello world!";
         printf("%s\n", toStringz(c));

         char[] d = permute(c);

         printf("Post-permute\n");
         printf("%s\n", toStringz(c));
         printf("%s\n", toStringz(d));
         return 0;
     }

This program produces the output:

	hello world!
	Hello world!
	Hell

The array is a value type.  The data it points to is not.
Arrays are value types which consist of a length and a pointer to 
memory.  Copying and slicing an array creates a brand new array whose 
data happens to (generally) be memory that is also pointed to by another 
array.
I think there's a lapsus, slices *always* point to the same memory as the array from which they were created.
In my experience, this is true, but I don't know if it *must*, so I felt obligated to qualify my statement. -- andy
Jun 27 2004
next sibling parent Derek Parnell <derek psych.ward> writes:
On Sun, 27 Jun 2004 17:02:27 -0700, Andy Friesen wrote:

 Farmer wrote:
 
 Andy Friesen <andy ikagames.com> wrote in 
 news:cbn3js$tgq$1 digitaldaemon.com:
 
I think the problem is that D arrays almost always behave like reference 
types, and therefore are almost always treated like reference types.
Yes, this is a problem. It is a necessary evil to archive that outstanding performance. But it is not really related to the topic null array vs. empty array, since empty arrays are possible with the D array layout
Sure, in the same sense that D allows 'empty' integers. :)
They aren't.  null arrays *are* empty arrays.
No, null arrays are not empty arrays, as my sample proofs.
Conceptually they are. If the length is zero, then the data pointer is meaningless. Testing the data pointer in such a case can be likened to using the result of a division by zero. Doing things like mathematically 'proving' that 3==5 or that empty!==null is easy when you go into the twilight zone. :)
Huh? There are times when a zero-length array is valid and an uninitialized
array is not valid. They are simply not the same thing.

if (a === null)
{
   // Initialize it
}
else
{
   if (a.length == 0)
   {
      // Empty situation. I DO NOT WANT TO INITIALIZE IT HERE!
   }
   else
   {
      // Use the non-empty array
   }
}
 As an example:
 
      import std.string;
 
      char[] permute(char[] c) {
          // mutate that to which the array refers
          c[0] = 'H';
          // mutate the array
          c.length = 4;
          return c;
      }
 
      int main() {
          char[] c = "hello world!";
          printf("%s\n", toStringz(c));
 
          char[] d = permute(c);
 
          printf("Post-permute\n");
          printf("%s\n", toStringz(c));
          printf("%s\n", toStringz(d));
          return 0;
      }
 
 This program produces the output:
 
 	hello world!
 	Hello world!
 	Hell
 
 The array is a value type.  The data it points to is not.
 
Arrays are value types which consist of a length and a pointer to 
memory.  Copying and slicing an array creates a brand new array whose 
data happens to (generally) be memory that is also pointed to by another 
array.
I think there's a lapsus, slices *always* point to the same memory as the array from which they were created.
In my experience, this is true, but I don't know if it *must*, so I felt obligated to qualify my statement.
Yes, it could be an artifact of the D compiler rather than the D language.

-- 
Derek
Melbourne, Australia
28/Jun/04 10:51:51 AM
Jun 27 2004
prev sibling parent reply Regan Heath <regan netwin.co.nz> writes:
On Sun, 27 Jun 2004 17:02:27 -0700, Andy Friesen <andy ikagames.com> wrote:
 Farmer wrote:

 Andy Friesen <andy ikagames.com> wrote in 
 news:cbn3js$tgq$1 digitaldaemon.com:

 I think the problem is that D arrays almost always behave like 
 reference types, and therefore are almost always treated like 
 reference types.
Yes, this is a problem. It is a necessary evil to archive that outstanding performance. But it is not really related to the topic null array vs. empty array, since empty arrays are possible with the D array layout
 Sure, in the same sense that D allows 'empty' integers. :)
D allows both empty arrays *and* null arrays. It does *not* allow both empty *and* null integers. They are different and not comparable.
 They aren't.  null arrays *are* empty arrays.
No, null arrays are not empty arrays, as my sample proofs.
 Conceptually they are. If the length is zero, then the data pointer is 
 meaningless.
I disagree. Conceptually they aren't the same, as both my example and
Farmer's have proven for the case of a char array. Even with other array
types there is still a conceptual difference between an array that does not
exist and one containing no elements. In a large number of real-world cases
you would treat the 2 the same, but that does not make them the same, and is
no reason to preclude the ability to treat them differently.

Even in D's implementation they aren't exactly the same. Consider:

0) char[] a;
1) char[] b = "regan";
2) b = "";
3) b = null;

at 0  a's data pointer is null and length is zero
at 1  b's data pointer is non-null and length is 5
at 2  b's data pointer is non-null and length is 0

I am not 100% certain what happens at 3, either:

at 3  b's data pointer is null and length is 0
or
at 3  b's data pointer is non-null and length is 0

In either case 'a' (the null array) is not the same as 'b' when it is an
empty array, and may not be even when 'b' is a null array.
 Testing the data pointer in such a case can be likened to using the 
 result of a division by zero. Doing things like mathematically 'proving' 
 that 3==5 or that empty!==null is easy when you go into the twilight 
 zone. :)
 As an example:

      import std.string;

      char[] permute(char[] c) {
          // mutate that to which the array refers
          c[0] = 'H';
          // mutate the array
          c.length = 4;
          return c;
      }

      int main() {
          char[] c = "hello world!";
          printf("%s\n", toStringz(c));

          char[] d = permute(c);

          printf("Post-permute\n");
          printf("%s\n", toStringz(c));
          printf("%s\n", toStringz(d));
          return 0;
      }

 This program produces the output:

 	hello world!
 	Hello world!
 	Hell

 The array is a value type.  The data it points to is not.

 Arrays are value types which consist of a length and a pointer to 
 memory.  Copying and slicing an array creates a brand new array whose 
 data happens to (generally) be memory that is also pointed to by 
 another array.
I think there's a lapsus, slices *always* point to the same memory as the array from which they were created.
In my experience, this is true, but I don't know if it *must*, so I felt obligated to qualify my statement.
The simple fact remains that we require both null strings (and possibly
other arrays) and empty strings, and that conceptually they are different,
or rather they can mean different things and/or demand different behaviour.

All I'm advocating is that a test for null not compare true for an empty
array, and thus that a null array and an empty array not compare equal.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 27 2004
parent reply Andy Friesen <andy ikagames.com> writes:
Regan Heath wrote:
 Yes, this is a problem. It is a necessary evil to archive that 
 outstanding  performance. But it is not really related to the topic 
 null array vs. empty array, since empty arrays are possible with the 
 D array layout
 Sure, in the same sense that D allows 'empty' integers. :)
D allows both empty arrays *and* null arrays. It does *not* allow both empty *and* null integers. They are different and not comparable.
D arrays are implemented exactly so:

    struct Array {
        int   length;
        void* data;
    }

    Array a;   // value type
    int   i;   // value type

'i' will never be null, and 'a' never will either, because both types exist
on the stack. 'a' can be *compared* to null because an implicit pointer
conversion is performed. However, if 'a' does not contain any data, its
pointer value is meaningless, so the result of such a comparison is
undefined. Either way, 'a' itself is *not* null any more than 'i' ever could
be.

(I'm not saying that this is how it should be, I'm just saying that this is
how it is)
 They aren't.  null arrays *are* empty arrays.
No, null arrays are not empty arrays, as my sample proofs.
 Conceptually they are. If the length is zero, then the data pointer is 
 meaningless.
I disagree. Conceptually they aren't the same, as both my example and 'Farmers' have proven for the case of a char array. Even with other array types there is still a conceptual difference between an array that does not exist and one containing no elements. In a large number of real world cases you would treat the 2 the same, but that does not make them the same, and is no reason to preclude the ability to treat them differently.
This goes back to D performing implicit pointer conversion. Comparing arrays with null is not a good idea.
 The simple fact remains that we require both null strings (and possibly 
 other arrays) and empty strings and that conceptually they are 
 different, or rather they can mean different things and/or demand 
 different behaviour.
 
 All I'm advocating is that test for null to not compare true for an 
 empty array, and thus a null array and an empty array not to compare equal.
I'm still forming an opinion on whether this is the right thing to do or
not. If comparing arrays with pointers was illegal, this issue would never
arise.

As for testing existence against emptiness, I suggest you do the same thing
you would for an integer (or any other value type) for which nil and
zero/empty/T.init must be distinguishable.

 -- andy
Jun 27 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
On Sun, 27 Jun 2004 19:15:10 -0700, Andy Friesen <andy ikagames.com> wrote:
 Regan Heath wrote:
 Yes, this is a problem. It is a necessary evil to archive that 
 outstanding  performance. But it is not really related to the topic 
 null array vs. empty array, since empty arrays are possible with the 
 D array layout
 Sure, in the same sense that D allows 'empty' integers. :)
D allows both empty arrays *and* null arrays. It does *not* allow both empty *and* null integers. They are different and not comparable.
D arrays are implement exactly so: struct Array { int length; void* data; } Array a; // value type int i; // value type 'i' will never be null, and 'a' never will either, because both types exist on the stack. 'a' can be *compared* to null because an implicit pointer conversion is performed. However, if 'a' does not contain any data, its pointer value is meaningless, so the result of such a comparison is undefined. Either way, 'a' itself is *not* null any more than 'i' ever could be. (I'm not saying that this is how it should be, I'm just saying that this is how it is)
I see what you're saying... the internal data pointer for the array can be
null or non-null; this is the difference between an uninitialized (or null)
array and an empty one.

I don't care how we do it, I just know we need to be able to tell the
difference for 'strings'. Perhaps this applies to all arrays. Perhaps
strings need to be a specialized form of array...
 They aren't.  null arrays *are* empty arrays.
No, null arrays are not empty arrays, as my sample proofs.
 Conceptually they are. If the length is zero, then the data pointer is 
 meaningless.
I disagree. Conceptually they aren't the same, as both my example and 'Farmers' have proven for the case of a char array. Even with other array types there is still a conceptual difference between an array that does not exist and one containing no elements. In a large number of real world cases you would treat the 2 the same, but that does not make them the same, and is no reason to preclude the ability to treat them differently.
This goes back to D performing implicit pointer conversion. Comparing arrays with null is not a good idea.
Perhaps not, but there is currently no other way to tell the difference
between an empty string and a null string. This is very important.
 The simple fact remains that we require both null strings (and possibly 
 other arrays) and empty strings and that conceptually they are 
 different, or rather they can mean different things and/or demand 
 different behaviour.

 All I'm advocating is that test for null to not compare true for an 
 empty array, and thus a null array and an empty array not to compare 
 equal.
I'm still forming an opinion on whether this is the right thing to do or not. If comparing arrays with pointers was illegal, this issue would never arise.
True, but then you wouldn't be able to tell null strings from empty ones.
 As for testing existence against emptiness, I suggest you do the same 
 thing you would for an integer (or any other value type) for which nil 
 and zero/empty/T.init must be distinguishable.
I suspect an array's .init property *is* null, in which case

    uint[] c;
    if (c == c.init)

is equivalent to

    if (c == null)

I was just recently told by Walter not to use the init value of an array. I
was trying to re-init the array, i.e.

    uint[4] c = [0,1,2,3];
    c   = c.init;
    c[] = c.init;
    c[] = c[].init;

none of those work. Walter's soln...

    static uint[4] cinit = [0,1,2,3];
    uint[4] c;
    c[] = cinit[];

Why can't .init do this implicitly? For my original example it would create
one static array, and my array called 'c', then set c.init to the static
array, so that

    c = c.init;

would work. For an array that is not initialized, c.init can stay null, as

    c = c.init;

would then be equivalent to

    c = null;

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 27 2004
parent reply Andy Friesen <andy ikagames.com> writes:
Regan Heath wrote:

 I see what you're saying... the internal data pointer for the array can 
 be null or non-null however, this is the difference between an 
 un-initialized (or null) array and an empty one.
 
 I dont care how we do it, I just know we need to be able to tell the 
 difference for 'strings'. Perhaps this applies to all arrays. Perhaps 
 strings need to be a specialized form of array...
You say that as though it is self-evident that strings must absolutely,
unequivocally be, at all costs, reference types. Why?

C++ containers cannot represent null either. D will (and does) get along
just fine if its array type works the same way.
 This goes back to D performing implicit pointer conversion.  Comparing 
 arrays with null is not a good idea.
Perhaps not, but, there is currently no other way to tell the difference between an empty string and a null string. This is very important.
A 'null array' is a completely arbitrary concept that has been extrapolated from undefined behaviour. :) (check the documentation concerning arrays. Nowhere does the concept of a null array appear. The only place the keyword 'null' even occurs is a blip which says that arrays are initialized with their data pointer set to null)
 I'm still forming an opinion on whether this is the right thing to do 
 or not.  If comparing arrays with pointers was illegal, this issue 
 would never arise.
True, but then you wouldn't be able to tell null strings from empty ones.
Because there is no such thing. As far as D is concerned, all arrays exist. Some contain elements, others don't. Whether its data pointer is null or not does not set it apart from any other empty array. -- andy
Jun 28 2004
next sibling parent reply Regan Heath <regan netwin.co.nz> writes:
On Mon, 28 Jun 2004 12:50:08 -0700, Andy Friesen <andy ikagames.com> wrote:
 Regan Heath wrote:

 I see what you're saying... the internal data pointer for the array can 
 be null or non-null however, this is the difference between an 
 un-initialized (or null) array and an empty one.

 I dont care how we do it, I just know we need to be able to tell the 
 difference for 'strings'. Perhaps this applies to all arrays. Perhaps 
 strings need to be a specialized form of array...
You say that as though it is self-evident that strings must absolutely, unequivocably be, at all costs, reference types. Why?
If it's not a reference type, then how can you signal non-existence (null)?
 C++ containers cannot represent null either.  D will (and does) get 
 along just fine if its array type works the same way.
I have not used C++ containers. I program in C for a living, and C++ for a
hobby. Is there a C++ container for strings that cannot tell the difference
between non-existent and empty?
 This goes back to D performing implicit pointer conversion.  Comparing 
 arrays with null is not a good idea.
Perhaps not, but, there is currently no other way to tell the difference between an empty string and a null string. This is very important.
A 'null array' is a completely arbitrary concept that has been extrapolated from undefined behaviour. :)
It may be undefined, but I believe it is required.
 (check the documentation concerning arrays.  Nowhere does the concept of 
 a null array appear.  The only place the keyword 'null' even occurs is a 
 blip which says that arrays are initialized with their data pointer set 
 to null)
So it's undefined; let's define it.
 I'm still forming an opinion on whether this is the right thing to do 
 or not.  If comparing arrays with pointers was illegal, this issue 
 would never arise.
True, but then you wouldn't be able to tell null strings from empty ones.
Because there is no such thing.
Yes there is. The concept exists, in C and in our examples.
 As far as D is concerned, all arrays exist.  Some contain elements, 
 others don't.  Whether its data pointer is null or not does not set it 
 apart from any other empty array.
Yes it does. This behaviour exists, it's just currently undefined (as you say) and inconsistent (as Farmer has pointed out). The soln IMO is either to make the current behaviour official and consistent, or to change the behaviour, make that official and provide another way to tell null apart from an empty string. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 28 2004
parent reply Andy Friesen <andy ikagames.com> writes:
Regan Heath wrote:
 On Mon, 28 Jun 2004 12:50:08 -0700, Andy Friesen <andy ikagames.com> wrote:
 You say that as though it is self-evident that strings must 
 absolutely, unequivocably be, at all costs, reference types.  Why?
If it's not a reference type, then how can you signal non-existance (null)?
You don't.
 I have not used C++ containers. I program in C for a living, and C++ for 
 a hobby. Is there a C++ container for strings that cannot tell the 
 difference between non-existant and empty?
Yeah, it's called std::string, and it's more or less the default.
 A 'null array' is a completely arbitrary concept that has been 
 extrapolated from undefined behaviour. :)
It may be undefined, but I believe it is required.
Why? C++ gets along without them just fine, and every C derivative I know of
gets along fine without allowing primitive-type returns to signify
nonexistence. Functions which return structs cannot return null either.
 The soln IMO is either to make the current behaviour official and 
 consistent, or to change the behaviour, make that official and provide 
 another way to tell null apart from an empty string.
Farmer's test reports pretty consistent results if you suppose that
comparing arrays to null is ill-formed:

    empty1.length == 0    is true    empty1 == ""    is true
    empty2.length == 0    is true    empty2 == ""    is true
    empty3.length == 0    is true    empty3 == ""    is true

Don't compare arrays to null. Don't try to differentiate between empty and
nonexistent. D arrays simply do not work that way.

 -- andy
Jun 28 2004
next sibling parent Derek Parnell <derek psych.ward> writes:
On Mon, 28 Jun 2004 16:33:25 -0700, Andy Friesen wrote:

 Regan Heath wrote:
 On Mon, 28 Jun 2004 12:50:08 -0700, Andy Friesen <andy ikagames.com> wrote:
 You say that as though it is self-evident that strings must 
 absolutely, unequivocably be, at all costs, reference types.  Why?
If it's not a reference type, then how can you signal non-existance (null)?
You don't.
 I have not used C++ containers. I program in C for a living, and C++ for 
 a hobby. Is there a C++ container for strings that cannot tell the 
 difference between non-existant and empty?
Yeah, it's called std::string, and it's more or less the default.
 A 'null array' is a completely arbitrary concept that has been 
 extrapolated from undefined behaviour. :)
It may be undefined, but I believe it is required.
Why? C++ gets along without them just fine, and every C derivant I know of gets along fine without allowing primitive type returns to signify nonexistence. Functions which returns structs cannot return null either.
 The soln IMO is either to make the current behaviour official and 
 consistent, or to change the behaviour, make that official and provide 
 another way to tell null apart from an empty string.
Farmer's test reports pretty consistent results if you suppose that comparing arrays to null is ill-formed: empty1.length == 0 is true empty1 == "" is true empty2.length == 0 is true empty2 == "" is true empty3.length == 0 is true empty3 == "" is true Don't compare arrays to null. Don't try to differentiate between empty and nonexistent. D arrays simply do not work that way. -- andy
Agreed, D doesn't seem to work that way, but isn't that the issue? Some
people would like to distinguish between an uninitialized array and an
initialized but empty array.

-- 
Derek
Melbourne, Australia
29/Jun/04 10:44:05 AM
Jun 28 2004
prev sibling next sibling parent "Bent Rasmussen" <exo bent-rasmussen.info> writes:
 Don't compare arrays to null.  Don't try to differentiate between empty
 and nonexistent.  D arrays simply do not work that way.
I must say, I kind of like that. I don't have to write a read/write property where the write property has an in/out contract to guard against internal/external code setting an array member field to null -- goodbye bloat!
Jun 28 2004
prev sibling parent reply Regan Heath <regan netwin.co.nz> writes:
On Mon, 28 Jun 2004 16:33:25 -0700, Andy Friesen <andy ikagames.com> wrote:
 Regan Heath wrote:
 On Mon, 28 Jun 2004 12:50:08 -0700, Andy Friesen <andy ikagames.com> 
 wrote:
 You say that as though it is self-evident that strings must 
 absolutely, unequivocably be, at all costs, reference types.  Why?
If it's not a reference type, then how can you signal non-existance (null)?
You don't.
Thought so..
 I have not used C++ containers. I program in C for a living, and 
 C++ for a hobby. Is there a C++ container for strings that cannot tell 
 the difference between non-existant and empty?
Yeah, it's called std::string, and it's more or less the default.
And it's crap. IMNSHO.
 A 'null array' is a completely arbitrary concept that has been 
 extrapolated from undefined behaviour. :)
It may be undefined, but I believe it is required.
Why? C++ gets along without them just fine, and every C derivant I know of gets along fine without allowing primitive type returns to signify nonexistence. Functions which returns structs cannot return null either.
That's why just about no one ever does this (in C). They all return a
pointer to a struct.
 The soln IMO is either to make the current behaviour official and 
 consistent, or to change the behaviour, make that official and provide 
 another way to tell null apart from an empty string.
Farmer's test reports pretty consistent results if you suppose that comparing arrays to null is ill-formed: empty1.length == 0 is true empty1 == "" is true empty2.length == 0 is true empty2 == "" is true empty3.length == 0 is true empty3 == "" is true Don't compare arrays to null. Don't try to differentiate between empty and nonexistent.
Fine and dandy, EXCEPT we *need* to differentiate between empty and
non-existent strings.
 D arrays simply do not work that way.
In that case we need an array specialisation for strings, so I'll have to
write my own. This defeats the purpose of char[] in the first place, which
was to be a better, more consistent string-handling method than is possible
in C/C++.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
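
An illustrative sketch of such a wrapper (the names MaybeString, exists and
getSetting are hypothetical, not taken from any library):

   struct MaybeString
   {
      char[] value;
      bit    exists;   // 'bit' is D's boolean type at this point
   }

   MaybeString getSetting(char[] label)
   {
      MaybeString r;
      // look the label up; on a hit, set r.value and r.exists = true,
      // otherwise leave r.exists at its default of false
      return r;
   }

   // the caller checks r.exists to tell "missing" from "present but empty"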
Jun 28 2004
next sibling parent reply Andy Friesen <andy ikagames.com> writes:
Regan Heath wrote:
 ... we need an array specialisation for strings, so I'll have 
 to write my own. This defeats the purpose of char[] in the first place, 
 which was, to be a better more consistent  string handling method than 
 in possible in c/c++.
That would work, but it might be better to adjust your thinking to match the language instead of trying to shoehorn the way you're used to thinking onto an abstraction that clearly wasn't built for it. Don't think in Java/C++/etc. Think in D. :) -- andy
Jun 28 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
On Mon, 28 Jun 2004 22:54:23 -0700, Andy Friesen <andy ikagames.com> wrote:

 Regan Heath wrote:
 ... we need an array specialisation for strings, so I'll have to write 
 my own. This defeats the purpose of char[] in the first place, which 
 was, to be a better more consistent  string handling method than in 
 possible in c/c++.
That would work, but it might be better to adjust your thinking to match the language instead of trying to shoehorn the way you're used to thinking onto an abstraction that clearly wasn't built for it. Don't think in Java/C++/etc. Think in D. :)
You may be right, so in an effort to change my thinking, pls consider this...

    struct Item {
       char[] label;
       char[] value;
    }

    class Post {
       Item[] items;

       char[] getValue(char[] label) {
          foreach(Item item; items) {
             if (item.label == label) return item.value;
          }
          //return null; not allowed
          return "";
       }
    }

Web page...

    <form post.. >
    <input type="text" name="foo" value="">
    <input type="text" name="bar" value="">
    </form>

Code to do something with the post.

    char[] s;
    Post p;

    s = p.getValue("foo");
    if (s) ..

    s = p.getValue("bar");
    if (s) ..

Right... If I cannot return null, then (using the code above) I cannot tell
the difference between whether foo or bar was passed or had an empty value.
So I have to add a function, something like

    class Post {
       bool isPresent(char[] label) {
          foreach(Item item; items) {
             if (item.label == label) return true;
          }
          return false;
       }
    }

and in my code..

    if (p.isPresent("foo")) {
       s = p.getValue("foo");
       ..
    }

looks more complex. In addition I am searching for the label/value twice,
doing twice the work. To avoid that I can add a parameter to the getValue
function, i.e.

    class Post {
       char[] getValue(char[] label, out bool isNull) {
          foreach(Item item; items) {
             if (item.label == label) return item.value;
          }
          //return null; not allowed
          isNull = true;
          return "";
       }
    }

then my code looks like...

    char[] s;
    bool isn;

    s = p.getValue("foo",isn);
    if (!isn) {
    }

more complex code again, less obvious. A 3rd option springs to mind: instead
of returning a char[] from getValue I could return existence and fill a
passed char[], i.e.

    class Post {
       bool getValue(char[] label, out char[] value) {
          foreach(Item item; items) {
             if (item.label == label) {
                value = item.value;
                return true;
             }
          }
          return false;
       }
    }

so my code now looks like...

    char[] s;
    if (getValue("foo",s)) {
    }

this is perhaps the best soln so far. But! Let's consider if this were
extended to get 2 or more char[] values (this is perfectly
reasonable/likely; say they are loaded from a file, why process the file
twice when you can do so once and get both values).

    bool getValue(out char[] val1, out char[] val2) {
    }

what do we return if val1 exists but val2 does not? A set of flags? Yuck.

It just seems to me that all this is done to emulate a reference type.. so
why not have a reference type? We already have one; all it would take to
make it consistent is 2 minor changes.

If you have a solution to the above that is both as simple, elegant and easy
to code as being able to return null.. pls educate me.

Regan.

-- 
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 29 2004
next sibling parent reply Charlie <Charlie_member pathlink.com> writes:
---

s = p.getValue("foo");
if (s.length) 

---

What's wrong with this way?

Charlie

Jun 29 2004
parent Regan Heath <regan netwin.co.nz> writes:
On Wed, 30 Jun 2004 00:52:17 +0000 (UTC), Charlie 
<Charlie_member pathlink.com> wrote:

 ---

 s = p.getValue("foo");
 if (s.length)

 ---

 Whats wrong with this way ?
An empty char[] has a length of 0; the above would not see an empty value
passed in a form.

Regan.
-- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 29 2004
prev sibling next sibling parent reply Andy Friesen <andy ikagames.com> writes:
Regan Heath wrote:

 ... I could return existance and
 fill a passed char[]...  so my code now looks like...
 
 char[] s;
 if (getValue("foo",s))
I like this. It's simple and obvious.
 if this were extended to get 2 or more char[] values...
 bool getValue(out char[] val1, out char[] val2) {}
In this case, I would say that the best thing to do on failure is to throw an exception. Asking for a number of values all at once looks (to me, anyhow) to be implying that you expect them all to be present. If you don't, you'll have to test them all individually at some point anyway, in which case the previous form allows you to test and retrieve in one step. It may also be useful to return all the attributes as an associative array. They're easy to mutate and iterate through.
 It just seems to me, that all this is done to emulate a reference type.. 
 so why not have a reference type?
You got me there, but it seems to me that things could get very weird if you need to express a non-null array of 0 length.
 If you have a solution to the above that is both as simple, elegant and 
 easy to code as being able to return null.. pls educate me.
Exposing POST data as an associative array seems like a win to me; it's
faster and can be iterated over conveniently. Also, as a language intrinsic,
it's a bit more likely to plug into other APIs easily.

If you *really* need to, you could probably get away with doing something
like:

    const char[] nadda = "nadda";
    if (!(s is nadda)) { ... }

 -- andy
Jun 29 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
On Tue, 29 Jun 2004 19:26:22 -0700, Andy Friesen <andy ikagames.com> wrote:
 Regan Heath wrote:

 ... I could return existance and
 fill a passed char[]...  so my code now looks like...

 char[] s;
 if (getValue("foo",s))
I like this. It's simple and obvious.
I agree.
 if this were extended to get 2 or more char[] values...
 bool getValue(out char[] val1, out char[] val2) {}
In this case, I would say that the best thing to do on failure is to throw an exception. Asking for a number of values all at once looks (to me, anyhow) to be implying that you expect them all to be present.
Nope. This is taken from a real-life example: I have a config file with 10
different settings, all optional. I want 3 of them at this point in the
code, so I process the file once and load the 3 settings, which may or may
not be present, and may or may not have zero-length values.
 If you don't, you'll have to test them all individually at some point 
 anyway
Yes, at that point I need to be able to tell if the setting was present, present with zero length value, or not present at all.
 , in which case the previous form allows you to test and retrieve in one 
 step.
Which previous form? Do you mean the one that takes only one parameter? If
so, that would involve parsing the file 3 times, which is not acceptable.
 It may also be useful to return all the attributes as an associative 
 array.  They're easy to mutate and iterate through.
It's the same problem all over again. Say I have:

    char[][char[]] list;
    char[] s1,s2,s3;

    fn(list);

    s1 = list["setting1"];
    s2 = list["setting2"];
    s3 = list["setting3"];

s needs to be null for setting3, empty for setting2 and "foobar" for
setting1. I believe this is currently the case, but!, as Farmer has shown,
if I then went

    if (s2 == s3)   // this would evaluate to true

and that's a problem.
 It just seems to me, that all this is done to emulate a reference 
 type.. so why not have a reference type?
You got me there, but it seems to me that things could get very weird if you need to express a non-null array of 0 length.
char[] s = "" s is a non-null array of 0 length.
 If you have a solution to the above that is both as simple, elegant and 
 easy to code as being able to return null.. pls educate me.
Exposing POST data as an associative array seems like a win to me;
I agree, it's a more D thing to do also :) I believe the same problem still applies (see above)
 it's faster and can can be iterated over conveniently.  Also, as a 
 language intrinsic, it's a bit more likely to plug into other APIs 
 easily.

 If you *really* need to, you could probably get away with doing 
 something like:

      const char[] nadda = "nadda";
      if (s is not nadda) { ... }
True, but this is yucky and what if a setting actually had a value of "nadda"? Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 29 2004
parent reply Andy Friesen <andy ikagames.com> writes:
Regan Heath wrote:
 This is taken from a real life example, I have a config file with 
 10 different settings, all optional, I want 3 or them at this point in 
 the code, so I process the file once and load the 3 settings which may 
 or may not be present, and may or may not have a zero length values.
I guess it's just a matter of preference. I don't have a problem with
something like this:

    char[][char[]] attribs = ...;

    if ("a" in attribs && "b" in attribs && "c" in attribs) {

If nonexistence is an alias for some default, fill the array before parsing
the file. Attributes that are present will override those which are not.

Python offers a get() method which takes two arguments: a key, and a default
value which is returned should the key not exist. I use this a lot.
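
An illustrative get-with-default helper along those lines (the name and the
free-function form are arbitrary):

    char[] get(char[][char[]] aa, char[] key, char[] def)
    {
       if (key in aa)
          return aa[key];
       return def;
    }

    // e.g.  char[] host = get(attribs, "host", "localhost");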
 things could get very weird if you need to express a non-null array of 0
length.
char[] s = "" s is a non-null array of 0 length.
What about non-char types?
 If you *really* need to, you could probably get away with doing 
 something like:

      const char[] nadda = "nadda";
      if (s is not nadda) { ... }
True, but this is yucky and what if a setting actually had a value of "nadda"?
That's why you use 'is' and not ==. 'is' performs a pointer comparison. The array has to point into that exact string literal for the comparison to be true. The only catch is string pooling. It'd be okay as long as the string literal "nadda" isn't declared anywhere in the source code.

Come to think of it, this is better:

      char[] nonString = new char[1];  // don't mutate me!  Just compare with 'is'!

I'm officially out of ideas now. heh. -- andy
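A minimal sketch of that sentinel idea (present-day D; the names are invented), to make the "compare with 'is', never '=='" point concrete:

      // A unique allocation that no real value can alias.
      char[] missing;
      static this() { missing = new char[1]; }

      void main()
      {
          char[] s = missing;               // stand-in for "setting not found"
          assert(s is missing);             // identity: this really is the sentinel
          assert("nadda".dup !is missing);  // a real value, even "nadda", never matches
      }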
Jun 29 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
On Tue, 29 Jun 2004 22:35:28 -0700, Andy Friesen <andy ikagames.com> wrote:
 Regan Heath wrote:
 This is taken from a real life example, I have a config file with 10 
 different settings, all optional, I want 3 or them at this point in the 
 code, so I process the file once and load the 3 settings which may or 
 may not be present, and may or may not have a zero length values.
I guess it's just a matter of preference. I don't have a problem with something like this: char[][char[]] attribs = ...; if ("a" in attribs && "b" in attribs && "c" in attribs) {
It's more like:

      if ("a" in attribs) { }
      if ("b" in attribs) { }
      if ("c" in attribs) { }

but you seem to have completely ignored the fact that, *if* we remove the ability to return null when an array type is expected (you suggested removing the ability to assign null to an array, which is the same thing), the above will cease to work altogether, as I imagine the above is simply doing:

      if (attribs["a"] != null)

which is the same as:

      char[] s;
      s = attribs["a"];
      if (s != null)

which is impossible if you cannot use null with arrays.
 If nonexistence is an alias for some default, fill the array before 
 parsing the file.  Attributes that are present will override those which 
 are not.

 Python offers a get() method which takes two arguments: a key, and a 
 default value which is returned should the key not exist.  I use this a 
 lot.
but if there is no default, you're left doing the nadda thing below which is simply an ugly hack (explanation below)
 things could get very weird if you need to express a non-null array of 
 0 length.
char[] s = "" s is a non-null array of 0 length.
What about non-char types?
 If you *really* need to, you could probably get away with doing 
 something like:

      const char[] nadda = "nadda";
      if (s is not nadda) { ... }
True, but this is yucky and what if a setting actually had a value of "nadda"?
That's why you use 'is' and not ==. 'is' performs a pointer comparison. The array has to point into that exact string literal for the comparison to be true. The only catch is string pooling. It'd be okay as long as the string literal "nadda" isn't declared anywhere in the source code.
ahh, gotcha, so basically you're creating null with another name. Why not just have null. :)
 Come to think of it, this is better:

     char[] nonString = new char[1]; // don't mutate me!  Just compare 
 with 'is'!
Another face for the same entity, null.
 I'm officially out of ideas now.  heh.
Think of it from the other point of view: assume we make the minor adjustments to arrays that I suggested; what effect does it have on the people who cannot see themselves needing a null array? Hmm.. I think none. IMO it simply gives us more flexibility of expression at no cost. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 30 2004
parent reply Andy Friesen <andy ikagames.com> writes:
Regan Heath wrote:

 if ("a" in attribs) { ... }
 ...
 
 you seem to have completely ignored the fact that, *if* we remove 
 the ability to return null when an array type is expected (you suggested 
 removing the ability to assign null to an array, it's the same thing), 
 the above will cease to work altogether as I imagine the above is simply 
 going
 
 if (attribs["a"] != null)
I very much doubt this. Associative arrays maintain an internal list of keys and values. In all likelihood, the 'in' operator hashes the key ("a" in this case) and searches through the associative array's internal hash table for one that matches.
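Andy is right about the mechanism: 'in' does a hash lookup and never compares the stored value to null or to "". A quick sketch (present-day D, made-up keys):

      import std.stdio;

      void main()
      {
          string[string] attribs = ["setting1": "foobar", "setting2": ""];

          // 'in' yields a pointer to the value, or null if the key is absent.
          writeln(("setting2" in attribs) ? "present" : "absent");  // present (empty value)
          writeln(("setting3" in attribs) ? "present" : "absent");  // absent
      }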
 If nonexistence is an alias for some default, fill the array before 
 parsing the file.  Attributes that are present will override those 
 which are not.

 Python offers a get() method which takes two arguments: a key, and a 
 default value which is returned should the key not exist.  I use this 
 a lot.
but if there is no default, you're left doing the nadda thing below which is simply an ugly hack (explanation below)
Right. I am an idiot. (below)
 That's why you use 'is' and not ==.  'is' performs a pointer 
 comparison.    The array has to point into that exact string literal 
 for the comparison to be true.  The only catch is string pooling.  
 It'd be okay as long as the string literal "nadda" isn't declared 
 anywhere in the source code.
ahh, gotcha, so basically you're creating null with another name. Why not just have null. :)
I was thinking about this, and the conclusion that I came to is that I am a complete idiot for not noticing what looked to be a completely arbitrary distinction with respect to comparing against null and comparing against any other pointer.

After a tiny bit of testing, I came to the conclusion that I am an even bigger idiot than I could have possibly imagined. D already gets things pretty much bang on:

      T[] a, b;

      a = b;      // 'a == b' and 'a is b' will both be true. (even if b is null)

      a = b.dup;  // 'a == b' will be true. 'a is b' will be true iff b is null.
                  // (null.dup is null, evidently. funny that)

With respect to 'a == null', my mind is quite blown. Farmer's tests reliably produce situations where zero-length strings compare false against null. My own tests show that empty arrays are equivalent to null but do not share identity. Don't test x==null, I guess. :)

Explicitly testing for an empty, non-null array requires that you write 'if (x !== null && x.length == 0)', which is probably okay: I can envision hordes of new programmers going postal because of 'name != ""' and 'name.length == 0' somehow both evaluating to true at the same time. -- andy
Jun 30 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
On Wed, 30 Jun 2004 19:02:22 -0700, Andy Friesen <andy ikagames.com> wrote:

 Regan Heath wrote:

 if ("a" in attribs) { ... }
 ...

 you seem to have completely ignored the fact that, *if* we remove the 
 ability to return null when an array type is expected (you suggested 
 removing the ability to assign null to an array, it's the same thing), 
 the above will cease to work altogether as I imagine the above is 
 simply going

 if (attribs["a"] != null)
I very much doubt this. Associative arrays maintain an internal list of keys and values. In all likelihood, the 'in' operator hashes the key ("a" in this case) and searches through the associative array's internal hash table for one that matches.
I agree totally. I am not disputing how an associative array works; what I am saying is, without the ability to compare an array to null, you cannot express 'does not exist' in terms of an associative array.

What does:

      if ("a" in attribs)

actually evaluate to, if not:

      if (attribs["a"] != null)

?
 If nonexistence is an alias for some default, fill the array before 
 parsing the file.  Attributes that are present will override those 
 which are not.

 Python offers a get() method which takes two arguments: a key, and a 
 default value which is returned should the key not exist.  I use this 
 a lot.
but if there is no default, you're left doing the nadda thing below which is simply an ugly hack (explanation below)
Right. I am an idiot. (below)
 That's why you use 'is' and not ==.  'is' performs a pointer 
 comparison.    The array has to point into that exact string literal 
 for the comparison to be true.  The only catch is string pooling.  
 It'd be okay as long as the string literal "nadda" isn't declared 
 anywhere in the source code.
ahh, gotcha, so basically you're creating null with another name. Why not just have null. :)
I was thinking about this, and the conclusion that I came to is that I am a complete idiot for not noticing what looked to be a completely arbitrary distinction with respect to comparing against null and comparing against any other pointer. After a tiny bit of testing, I came to the conclusion that I am an even bigger idiot than I could have possibly imagined. D already gets things pretty much bang on: T[] a, b; a = b; // 'a == b' and 'a is b' will both be true. (even if b is // null) a = b.dup; // 'a == b' will be true. 'a is b' will be true iff b is // null. (null.dup is null, evidently. funny that) With respect to 'a == null', my mind is quite blown. Farmer's tests reliably produce situations where zero-length strings compare false against null. My own tests show that empty arrays are equivalent to null but do not share identity. Don't test x==null, I guess. :) Explicitly testing for an empty, non-null array requires that you write 'if (x !== null && x.length == 0)', which is probably okay:
My tests, given:

      char[] e = "";
      char[] n;

output:

      e is ""     (f)
      n is ""     (f)
      e is null   (f)
      n is null   (t)
      e is n      (f)

      e == ""     (t)
      n == ""     (t)   incorrect?
      e == null   (f)
      n == null   (t)
      e == n      (t)   incorrect?

      e === ""    (f)
      n === ""    (f)
      e === null  (f)
      n === null  (t)
      e === n     (f)

The != and !== tests were all the opposite of the above, so I have not included them.

== calls opEquals; perhaps it has a shortcut in it which says if the lengths are both 0, return true? This would explain the two cases above I have marked "incorrect?". I think these two cases are inconsistent.

To reliably test for nullness I can use '===' or '!==' or 'is'.
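The two "incorrect?" rows still come out that way in present-day D, where the old '===' identity operator is spelled 'is'; a minimal sketch using string rather than char[]:

      import std.stdio;

      void main()
      {
          string e = "";     // empty but non-null: e.ptr points at the "" literal
          string n = null;   // the null array: ptr is null, length is 0

          writeln(e is null);  // false - identity looks at the reference itself
          writeln(n is null);  // true
          writeln(e == "");    // true
          writeln(n == "");    // true  - '==' only compares contents (both length 0)
          writeln(e == n);     // true  - likewise
      }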
 I can envision hordes of new programmers going postal because of 'name 
 != ""' and 'name.length == 0' somehow both evaluating to true at the 
 same time.
Yeah.. to stop that, name.length would have to have a NaN (null) value, which 'int' or 'uint' does not have. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 30 2004
parent reply Andy Friesen <andy ikagames.com> writes:
Regan Heath wrote:
 I am not disputing how an associative array works, what 
 I am saying is, without the ability to compare an array to null, you 
 cannot express 'does not exist' in terms of an associative array.
 
 What does:
   if ("a" in attribs)
 
 actually evaluate to, if not:
   if (attribs["a"] != null)
This could never work anyway. Types for which null does not make sense obviously can't use null to indicate nonexistence. Types for which null does make sense can't do this either, as it makes perfect sense to store a null reference. The fundamental idea is that you're trying to represent a "nonvalue", which is storable in the result variable, but not part of the variable's range. This obviously won't work, as it requires two contradictory ideas to be simultaneously true. Adding a 'special' value like null is sometimes close enough for specific application domains, but, in the end, all you're doing is making the range of allowable values bigger.
 == calls opEquals, perhaps it has a shortcut in it which says if the 
 lengths are both 0 return true? this would explain the two cases above I 
 have marked "incorrect?". I think these two cases are inconsistent.
Looking at internal/adi.d, it looks like it compares the lengths, then compares each element in succession:

      extern (C) int _adEq(Array a1, Array a2, TypeInfo ti)
      {
          if (a1.length != a2.length)
              return 0;		// not equal
          int sz = ti.tsize();
          //printf("sz = %d\n", sz);
          void *p1 = a1.ptr;
          void *p2 = a2.ptr;
          for (int i = 0; i < a1.length; i++)
          {
              if (!ti.equals(p1 + i * sz, p2 + i * sz))
                  return 0;		// not equal
          }
          return 1;			// equal
      }

How on Earth "" != null ever comes about is beyond me. -- andy
Jun 30 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
On Wed, 30 Jun 2004 22:40:28 -0700, Andy Friesen <andy ikagames.com> wrote:

 Regan Heath wrote:
 I am not disputing how an associative array works, what I am saying is, 
 without the ability to compare an array to null, you cannot express 
 'does not exist' in terms of an associative array.

 What does:
   if ("a" in attribs)

 actually evaluate to, if not:
   if (attribs["a"] != null)
This could never work anyway. Types for which null does not make sense obviously can't use null to indicate nonexistence. Types for which null does make sense can't do this either, as it makes perfect sense to store a null reference.
Yeah... you're right.
 The fundamental idea is that you're trying to represent a "nonvalue", 
 which is storable in the result variable, but not part of the variable's 
 range.  This obviously won't work, as it requires two contradictory 
 ideas to be simultaneously true.  Adding a 'special' value like null is 
 sometimes close enough for specific application domains, but, in the 
 end, all you're doing is making the range of allowable values bigger.
I think.. I agree. :)
 == calls opEquals, perhaps it has a shortcut in it which says if the 
 lengths are both 0 return true? this would explain the two cases above 
 I have marked "incorrect?". I think these two cases are inconsistent.
Looking at internal/adi.d, it looks like it compares the lengths, then compares each element in succession:
I went looking for that (not hard enough obviously)..
      extern (C) int _adEq(Array a1, Array a2, TypeInfo ti)
      {
          if (a1.length != a2.length)
              return 0;		// not equal
          int sz = ti.tsize();
          //printf("sz = %d\n", sz);
          void *p1 = a1.ptr;
          void *p2 = a2.ptr;
          for (int i = 0; i < a1.length; i++)
          {
              if (!ti.equals(p1 + i * sz, p2 + i * sz))
                  return 0;		// not equal
          }
          return 1;			// equal
      }
 How on Earth ""!=null ever comes about is beyond me.
Below _adEq is..

      extern (C) int _adCmp(Array a1, Array a2, TypeInfo ti)
      {
          int len;

          //printf("adCmp()\n");
          len = a1.length;
          if (a2.length < len)
              len = a2.length;
          int sz = ti.tsize();
          void *p1 = a1.ptr;
          void *p2 = a2.ptr;
          for (int i = 0; i < len; i++)
          {
              int c;

              c = ti.compare(p1 + i * sz, p2 + i * sz);
              if (c)
                  return c;
          }
          return cast(int)a1.length - cast(int)a2.length;
      }

which would return 0 if both lengths were 0. "" and null both have a length of 0. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 30 2004
parent Andy Friesen <andy ikagames.com> writes:
Regan Heath wrote:
 On Wed, 30 Jun 2004 22:40:28 -0700, Andy Friesen <andy ikagames.com> wrote:
 How on Earth ""!=null ever comes about is beyond me.
_adEq is.. extern (C) int _adCmp(Array a1, Array a2, TypeInfo ti) { [....] } which would return 0 if both lengths were 0. "" and null both have a length of 0.
Right, but Cmp functions return 0 to indicate equality, which would be the right thing in this case. My money says the cause is in that inline-assembly-optimized _adCmpChar. (line 360) I freely admit that I blame it on the inline assembly because me and assembly have not been on speaking terms for some time now. (one too many hand-coded alpha-blits that lost to MSVC's optimizing compiler) -- andy
Jul 01 2004
prev sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <opsadsu8f75a2sq9 digitalmars.com>, Regan Heath says...

s = p.getValue("foo");
if (s) ..
s = p.getValue("bar");
if (s) ..

Right...

If I cannot return null, then (using the code above) I cannot tell the 
difference between whether foo or bar was passed or had an empty value.
And indeed that very situation is ALSO true with integer parameters. How can you tell the difference between an integer parameter being present and zero, and no integer parameter being present at all?

But of course, there are various solutions to this problem, many much simpler than you propose. For a start, you could return an int* instead of an int, or indeed a char[]* instead of a char[]. Then you could explicitly test for ===null in both cases.

In C++, I'd just return a std::pair<bool, T>. I'm sure that once we have a good supply of standard templates in D we'll be able to do much the same thing. (Even without templates, you could define a struct and return it.)

Anything wrong with either of these approaches?

Arcane Jill
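The "define a struct and return it" route might look like this in D (a sketch only; MaybeInt and getIntValue are invented names, not an existing API):

      // A hand-rolled pair<bool, T> for int results: 'present' says whether the
      // parameter existed at all; 'value' is meaningful only when it did.
      struct MaybeInt
      {
          bool present;
          int  value;
      }

      // Hypothetical lookup: returns MaybeInt(false, 0) when the parameter is absent.
      MaybeInt getIntValue(string name)
      {
          // ... parse the form or config file here ...
          return MaybeInt(false, 0);
      }

      void main()
      {
          auto r = getIntValue("count");
          if (r.present) { /* use r.value; zero is then a legitimate value */ }
          else           { /* the parameter was not supplied at all */ }
      }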
Jun 30 2004
parent reply Regan Heath <regan netwin.co.nz> writes:
On Wed, 30 Jun 2004 07:27:33 +0000 (UTC), Arcane Jill 
<Arcane_member pathlink.com> wrote:

 In article <opsadsu8f75a2sq9 digitalmars.com>, Regan Heath says...

 s = p.getValue("foo");
 if (s) ..
 s = p.getValue("bar");
 if (s) ..

 Right...

 If I cannot return null, then (using the code above) I cannot tell the
 difference between whether foo or bar was passed or had an empty value.
And indeed that very situation is ALSO true with integer parameters. How can tell the difference between an integer parameter being present and zero, and no integer parameter being present at all?
Yep. As another poster noted he had the same problem with integers, resulting in him using a value of -1 to represent null. Yuck.
 But of course, there are various solutions to this problem, many much 
 simpler
 than you propose. For a start, you could return an int* instead of an 
 int, or
 indeed a char[]* instead of a char[]. Then you could explicitly test for 
 ===null
 in both cases.
This is the C solution. For int I cannot think of a good D solution. For char[] (or any array) we already have one, the array emulates/acts like a reference type, it's just inconsistent.
 In C++, I'd just return a std::pair<bool, T>. I'm sure that once we have 
 a good
 supply of standard templates in D we'll be able to do much the same 
 thing. (Even
 without templates, you could define a struct and return it).
You're emulating a reference type; why not just have one? This may be the best solution for int and other strict value types.
 Anything wrong with either of these approaches?
Yep. Neither is as simple, elegant or clean as a reference type, which we already have in D arrays albeit inconsistently. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 30 2004
parent reply Kevin Bealer <Kevin_member pathlink.com> writes:
In article <opsafg63rh5a2sq9 digitalmars.com>, Regan Heath says...
On Wed, 30 Jun 2004 07:27:33 +0000 (UTC), Arcane Jill 
<Arcane_member pathlink.com> wrote:

 In article <opsadsu8f75a2sq9 digitalmars.com>, Regan Heath says...

 s = p.getValue("foo");
 if (s) ..
 s = p.getValue("bar");
 if (s) ..

 Right...

 If I cannot return null, then (using the code above) I cannot tell the
 difference between whether foo or bar was passed or had an empty value.
And indeed that very situation is ALSO true with integer parameters. How can tell the difference between an integer parameter being present and zero, and no integer parameter being present at all?
Yep. As another poster noted he had the same problem with integers, resulting in him using a value of -1 to represent null. Yuck.
 But of course, there are various solutions to this problem, many much 
 simpler
 than you propose. For a start, you could return an int* instead of an 
 int, or
 indeed a char[]* instead of a char[]. Then you could explicitly test for 
 ===null
 in both cases.
This is the C solution. For int I cannot think of a good D solution. For char[] (or any array) we already have one, the array emulates/acts like a reference type, it's just inconsistent.
The D equivalent might be to return int[] or char[][] y. Test if the length is zero. If it's not, then the data is "present". Otherwise it is missing.

For the HTML parsing example given in this thread, this may be even better because sometimes HTML has multiple values with the same tag.

Another data point: I've also used the technique listed below (pair<bool, T>), albeit wrapped in a template class. The code is very readable.

Kevin
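Kevin's pattern, sketched in present-day D (getAttrValues and the element data are invented): zero results means "missing", one or more means "present".

      // Return every value of the named attribute across the parsed elements;
      // a zero-length result means the attribute was not present at all.
      string[] getAttrValues(string[string][] elements, string name)
      {
          string[] found;
          foreach (attrs; elements)
              if (auto p = name in attrs)
                  found ~= *p;
          return found;
      }

      void main()
      {
          string[string][] elements = [["href": "/home"], ["href": "/about"]];

          auto hrefs = getAttrValues(elements, "href");  // length 2: present (twice)
          auto ids   = getAttrValues(elements, "id");    // length 0: missing
      }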
 In C++, I'd just return a std::pair<bool, T>. I'm sure that once we have 
 a good
 supply of standard templates in D we'll be able to do much the same 
 thing. (Even
 without templates, you could define a struct and return it).
You're emulating a reference type, why not just have one. This may be the best soln for int and other strict value types.
 Anything wrong with either of these approaches?
Yep. Neither is as simple, elegant or clean as a reference type, which we already have in D arrays albeit inconsistently. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jul 07 2004
parent Farmer <itsFarmer. freenet.de> writes:
Kevin Bealer <Kevin_member pathlink.com> wrote in
news:cci0tl$2dnl$1 digitaldaemon.com: 

 In article <opsafg63rh5a2sq9 digitalmars.com>, Regan Heath says...
On Wed, 30 Jun 2004 07:27:33 +0000 (UTC), Arcane Jill 
<Arcane_member pathlink.com> wrote:

 In article <opsadsu8f75a2sq9 digitalmars.com>, Regan Heath says...

 s = p.getValue("foo");
 if (s) ..
 s = p.getValue("bar");
 if (s) ..

 Right...

 If I cannot return null, then (using the code above) I cannot tell
 the difference between whether foo or bar was passed or had an empty
 value. 
And indeed that very situation is ALSO true with integer parameters. How can tell the difference between an integer parameter being present and zero, and no integer parameter being present at all?
Yep. As another poster noted he had the same problem with integers, resulting in him using a value of -1 to represent null. Yuck.
 But of course, there are various solutions to this problem, many much 
 simpler
 than you propose. For a start, you could return an int* instead of an 
 int, or
 indeed a char[]* instead of a char[]. Then you could explicitly test
 for ===null
 in both cases.
This is the C solution. For int I cannot think of a good D solution. For char[] (or any array) we already have one, the array emulates/acts like a reference type, it's just inconsistent.
The D equivalent might be to return int[] or char[][] y. Test if the length is zero. If it's not, then the data is "present". Otherwise it is missing.
Disagree. Returning an array for a single value confuses a programmer that didn't bother to fully read the function's documentation. (Don't blame the programmer, in most cases the documentation doesn't exist, anyway.)
 
 For the HTML parsing example given in this thread, this may be even
 better because sometimes HTML has multiple values with the same tag.
 
 Another data point: I've also used the technique listed below
 (pair<bool, T>), albeit wrapped in a template class.  The code is very
 readable. 
Since the code is very readable, why do you argue that the D-way would be something different, then? [No need to answer this, I already know one good answer.]

What do you mean by 'albeit wrapped in a template class'? Do you wrap 'pair<bool, T>' into your own templated class to provide an isNull() method?

I like the pair<bool, T> solution best. It expresses the meaning of the returned value precisely and can be generically applied to all types. Still, in some cases using reference types (e.g. null-arrays) is a simpler and faster solution.

Farmer.
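The wrapper Farmer asks about might look like this in D (present-day template syntax; Optional and isNull are invented names):

      // A generic pair<bool, T> with an isNull()-style query, usable with any type.
      struct Optional(T)
      {
          private bool present;
          private T    val;

          this(T v) { present = true; val = v; }

          bool isNull() { return !present; }
          T value()     { assert(!isNull()); return val; }
      }

      void main()
      {
          Optional!int missing;            // default-initialised: isNull() is true
          auto found = Optional!int(42);

          assert(missing.isNull());
          assert(!found.isNull() && found.value == 42);
      }

(Present-day Phobos ships essentially this as std.typecons.Nullable.)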
 
 Kevin
 
 In C++, I'd just return a std::pair<bool, T>. I'm sure that once we
 have a good
 supply of standard templates in D we'll be able to do much the same 
 thing. (Even
 without templates, you could define a struct and return it).
You're emulating a reference type, why not just have one. This may be the best soln for int and other strict value types.
 Anything wrong with either of these approaches?
Yep. Neither is as simple, elegant or clean as a reference type, which we already have in D arrays albeit inconsistently. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jul 09 2004
prev sibling next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <opsab6o5rl5a2sq9 digitalmars.com>, Regan Heath says...
 Yeah, it's called std::string, and it's more or less the default.
And it's crap. IMNSHO.
You'll get no arguments from me there. D got it right in not having a string class. I didn't think that at first, but I've come round to the D way of thinking. The problem with a string class is that you can't add new member functions to it. (Oh, you may be able to subclass String, if it's not final. Oh wait - it /is/ final in Java). With char[] arrays, you CAN add new functions.

Besides which, what else can a char[] array possibly represent, other than a string? (Given that a char[] array MUST contain UTF-8, I mean.) It's not the same as a byte[] array, which could mean anything.
 Don't compare arrays to null.  Don't try to differentiate between empty 
 and nonexistent.
Fine and dandy EXCEPT we *need* to differentiate between empty and non-existant strings.
Why? Do we also need a way to differentiate between empty and non-existent ints? In D, there is no such thing as a non-existent int; there is no such thing as a non-existent struct; and there is no such thing as a non-existent string. Why not just start from the assumption that we DON'T need to differentiate between empty and non-existent strings, and take it from there?

Maybe the real solution would be to make it a compile error to assign an array with null, or to compare it with null. This would then force people to say what they mean, and all such problems would go away. (Anyway, you all KNOW my opinion that should be a compile-time error anyway, because char[] is not boolean. But that's another story.)

Jill
Jun 29 2004
next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Tue, 29 Jun 2004 07:18:20 +0000 (UTC), Arcane Jill wrote:

 In article <opsab6o5rl5a2sq9 digitalmars.com>, Regan Heath says...
 Yeah, it's called std::string, and it's more or less the default.
And it's crap. IMNSHO.
You'll get no arguments from me there. D got it right in not having a string class. I didn't think that at first, but I've come round to the D way of thinking. The problem with a string class is that you can't add new member functions to it. (Oh, you may be able to subclass String, if it's not final. Oh wait - it /is/ final in Java). With char[] arrays, you CAN add new functions. Besides which, what else can a char[] array possibly repreresent, other than a string? (Given that a char[] array MUST contain UTF-8, I mean). It's not the same as a byte[] array, which could mean anything.
 Don't compare arrays to null.  Don't try to differentiate between empty 
 and nonexistent.
Fine and dandy EXCEPT we *need* to differentiate between empty and non-existant strings.
Why? Do we also need a way to differentiate between empty and non-existent ints? In D, there is no such thing as a non-existent int; there is no such thing as a non-existent struct; and there is no such thing as a non-existent string. Why not just start from the assumption that we DON'T need to differentiate between empty and non-existant strings, and take it from there?
Because that's not what is being meant. I'd like to differentiate between INITIALIZED and UNINITIALIZED vectors. This non-existent thing is a red herring: 'empty' means initialized and length of zero; 'non-existent' means not initialized yet.

It's a workaround for the current (longer) way of handling this situation. It's no big deal, but it would be 'nice to have'. Like a strict bool type would be nice to have.

-- Derek Melbourne, Australia 29/Jun/04 6:24:19 PM
Jun 29 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cbr9e5$vai$1 digitaldaemon.com>, Derek Parnell says...

Because that's not what is being meant. I'd like to differentiate between
INITIALIZED and UNINITIALIZED vectors.
Why? D's dynamic arrays are the same thing as C++ std::vectors (as I'm sure you realize). In C++, there is no such thing as an uninitialized vector. Why on Earth would you want them in D?
This non-existant thing is a
red-herring. 'empty' means initialized and length of zero. 'non-existant'
means not initialized yet.
Yeah - but nobody has yet answered WHY? Why would ANYONE want to allow uninitialized array handles (as opposed to array content) to exist in D? It makes no sense.

Please, can someone who is arguing in favor of allowing a distinction between initialized and uninitialized dynamic array handles explain exactly why you want such a distinction to exist?

Arcane Jill
Jun 29 2004
next sibling parent reply Sam McCall <tunah.d tunah.net> writes:
Arcane Jill wrote:

 In article <cbr9e5$vai$1 digitaldaemon.com>, Derek Parnell says...
 
 
Because that's not what is being meant. I'd like to differentiate between
INITIALIZED and UNINITIALIZED vectors.
Why? D's dynamic arrays are the same thing as C++ std::vectors (as I'm sure you realize).
The difference is in C++ it's common to use a pointer to a class (and I presume, a vector). In D, an array is a struct, not a class, so to get reference semantics you have to use a struct pointer. In C++ this would be no big deal, but this doesn't seem like the D way. Reference semantics allow me to change the length of an array and have it reflected in the caller, and to store nulls.
 In C++, there is no such thing as an uninitialized vector. Why on
 Earth would you want them in D?
For the same reason you use null in other situations with reference types. I want accessing an uninitialised member array to give an error. I want to be able to use a null argument to a function to trigger special or default behaviour (optional arguments in any position).

Sam

PS: AJ, I'm not sure if you read the forums at dsource, I posted a couple of deimos bugs: http://dsource.org/forums/viewtopic.php?t=224
Jun 29 2004
next sibling parent Arcane Jill <Arcane_member pathlink.com> writes:
In article <cbrgtm$19gj$1 digitaldaemon.com>, Sam McCall says...
PS: AJ, I'm not sure if you read the forums at dsource,
I do, but less frequently than this one as it's a slow turnover list. I get notified when new posts are added to existing threads, but not when new threads are added.
I posted a 
couple of deimos bugs:
http://dsource.org/forums/viewtopic.php?t=224
Okay, I'm on it. I'll let you know when they're fixed. Maybe we could start a "Bugs" thread on Deimos. That way I'll always get notified when anyone adds to it. Jill
Jun 29 2004
prev sibling parent reply Matthias Becker <Matthias_member pathlink.com> writes:
 In C++, there is no such thing as an uninitialized vector. Why on
 Earth would you want them in D?
For the same reason you use null in other situations with reference types. I want accessing an uninitialised member array to give an error. I want to be able to use a null argument to a function to trigger special or default behaviour (optional arguments in any position).
Nope, wrong. If you use reference types that are allowed to be NULL (in C++ references aren't; in Nice, e.g., there are references that aren't, too; ...) you want to show that there possibly is no object. At least in languages that allow you to use other kinds of references (e.g. C++ or Nice as mentioned above). In languages that don't have references that can't be null, you just can't express yourself in the code.

In C++ I never had the wish to pass a container/collection as a pointer. I always pass them as a C++ reference. So I'm sure there always is a collection and I don't have to check for this. If there are no values to pass in, I just pass an empty collection.

Could you please give some example where it makes sense not to pass a collection, instead of passing an empty collection?

-- Matthias Becker
Jun 29 2004
next sibling parent Sam McCall <tunah.d tunah.net> writes:
Matthias Becker wrote:
 In C++ I never had the wish to pass a container/collection as a pointer. I
 allways pass them as C++-reference. So I'm sure there allways is a collection
 and I don't have to check for this.
 If there are no values to pass in, I just pass an empty collection.
 
 
 Could you please make some example where it makes sense not to pass a
collection
 instead of passing an empty collection?
To request default behaviour a la optional arguments, without restrictions on the number or position of the arguments. Sam
Jun 29 2004
prev sibling parent Regan Heath <regan netwin.co.nz> writes:
On Tue, 29 Jun 2004 15:58:29 +0000 (UTC), Matthias Becker 
<Matthias_member pathlink.com> wrote:

 In C++, there is no such thing as an uninitialized vector. Why on
 Earth would you want them in D?
For the same reason you use null in other situations with reference types. I want accessing an uninitialised member array to give an error. I want to be able to use a null argument to a function to trigger special or default behaviour (optional arguments in any position).
Nope, wrong. If you use reference-types that are allowed to be NULL (in C++ references aren't, e.g. in nice there are references, that aren't, too, ...) you want to show that there possibly is no object. At least in languages that allow you to use other kinds of references (e.g. C++ or nixe as mentiond above). In languages that don't have references that can't be null, you just can't express yourself in the code. In C++ I never had the wish to pass a container/collection as a pointer. I allways pass them as C++-reference. So I'm sure there allways is a collection and I don't have to check for this. If there are no values to pass in, I just pass an empty collection. Could you please make some example where it makes sense not to pass a collection instead of passing an empty collection?
pls read my post (2 prior to this one - sorted flat and by date, it is a response to Andy's post) it contains an example. I would like some feedback on how to achieve what I want to do... Regan.
 -- Matthias Becker
-- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 29 2004
prev sibling next sibling parent reply Derek <derek psyc.ward> writes:
On Tue, 29 Jun 2004 09:50:35 +0000 (UTC), Arcane Jill wrote:

 In article <cbr9e5$vai$1 digitaldaemon.com>, Derek Parnell says...
 
Because that's not what is being meant. I'd like to differentiate between
INITIALIZED and UNINITIALIZED vectors.
Why? D's dynamic arrays are the same thing as C++ std::vectors (as I'm sure you realize). In C++, there is no such thing as an uninitialized vector. Why on Earth would you want them in D?
I don't use C++, so I'm not aware of what std::vector does or does not provide.

Ok, off the top of my head...

I'm writing a library that will be used by other coders. It has a function that accepts a dynamic array. A zero-length array is a valid parameter. The caller however can pass an uninitialized parameter to tell my function that the user wishes to use the default values instead of supplying a value.

In short, an uninitialized variable contains information - namely the fact that it *is* uninitialized. And that information could be utilized by a coder - if they had the chance.
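Derek's scenario, sketched in present-day D (configure and the default value are invented): a null array means "use the defaults", while an empty but non-null array is a legitimate value of its own.

      import std.stdio;

      void configure(string[] options)
      {
          if (options is null)           // parameter not supplied at all
              options = ["default-opt"];
          // an empty, non-null slice falls through: explicitly "no options"
          writefln("%s option(s)", options.length);
      }

      void main()
      {
          configure(null);                      // 1 option(s) - defaults applied
          configure((new string[1])[0 .. 0]);   // 0 option(s) - empty, but deliberately so
          configure(["verbose"]);               // 1 option(s)
      }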
 
This non-existant thing is a
red-herring. 'empty' means initialized and length of zero. 'non-existant'
means not initialized yet.
Yeah - but nobody has yet answered WHY? Why would ANYONE want to allow uninitialized array handles (as opposed to array content) to exist in D. It makes no sense.
Ok, but it does to me. Sorry, I can't seem to explain why.
 Please, can someone who is arguing in favor of allowing a distinction between
 initialized and unintialized dynamic array handles, explain exactly why you
want
 such a distinction to exist?
Apparently not; sorry. -- Derek Melbourne, Australia
Jun 29 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <12vwf4nkzjzxa.17ai9mojp3dpz$.dlg 40tude.net>, Derek says...
Ok, off the top of my head...

I'm writing a library that will be used by other coders. It has a function
that accepts a dynamic array. A zero-length array is a valid parameter. The
caller however can pass an uninitialized parameter to tell my function that
the user wishes to use the default values instead of supplying a value.
I'd use two functions for this: ..but only if an empty array was NOT the default. In many cases, I could probably get away with an empty array BEING the default, in which case, I could simply do:
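The code snippets appear to have been lost from the archive here; a plausible reconstruction of the two alternatives Jill describes (a guess, not her actual code):

      char[] defaultOptions() { return "the defaults".dup; }

      // Two functions: one overload takes the optional array, the other does not.
      void process(char[] data, char[] options) { /* use options */ }
      void process(char[] data) { process(data, defaultOptions()); }

      // Or, if an empty array can BE the default, a single function suffices:
      void process2(char[] data, char[] options)
      {
          if (options.length == 0)
              options = defaultOptions();
          // ... use options ...
      }

      void main()
      {
          process("data".dup);                // defaults used
          process2("data".dup, "opts".dup);   // explicit options
      }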
In short, an uninitialized variable contains information - namely the fact
that it *is* uninitialized.
It's a nice argument, but it could be applied equally well to ANY types. If I were supremely in favor of the notion that uninitializedness carries information (which I'm not), I might argue as follows:
I'm writing a library that will be used by other coders. It has a function
that accepts a bit. Zero is a valid parameter. The
caller however can pass an uninitialized parameter to tell my function that
the user wishes to use the default values instead of supplying a value.
If I believed that, I'd be arguing for a distinction between an uninitialized bit, and a bit containing zero. I happen not to believe that, however.
 Why would ANYONE want to allow
 uninitialized array handles (as opposed to array content) to exist in D. It
 makes no sense.
Ok, but it does to me. Sorry I can't seem to be able to explain why.
Yeah, human language is a bummer. Someone ought to invent telepathy. Jill
Jun 29 2004
parent Sam McCall <tunah.d tunah.net> writes:
Arcane Jill wrote:
 In article <12vwf4nkzjzxa.17ai9mojp3dpz$.dlg 40tude.net>, Derek says...
 
Ok, off the top of my head...

I'm writing a library that will be used by other coders. It has a function
that accepts a dynamic array. A zero-length array is a valid parameter. The
caller however can pass an uninitialized parameter to tell my function that
the user wishes to use the default values instead of supplying a value.
I'd use two functions for this: ..but only if an empty array was NOT the default. In many cases, I could probably get away with an empty array BEING the default, in which case, I could simply do:
Sure, but it sucks if there's a lot of them, and is impossible if the function is variadic. The ability to pass null to a function is very useful; I've switched from structs to classes more than once for this reason.

Sam
Jun 29 2004
prev sibling next sibling parent "Carlos Santander B." <carlos8294 msn.com> writes:
"Arcane Jill" <Arcane_member pathlink.com> escribió en el mensaje
news:cbre1b$15j0$1 digitaldaemon.com
|
| ...
|
| Yeah - but nobody has yet answered WHY? Why would ANYONE want to allow
| uninitialized array handles (as opposed to array content) to exist in D.
It
| makes no sense.
|
| Please, can someone who is arguing in favor of allowing a distinction
between
| initialized and unintialized dynamic array handles, explain exactly why
you want
| such a distinction to exist?
|
|
| Arcane Jill

Regan already said why:

"Regan Heath" <regan netwin.co.nz> escribió en el mensaje
news:opr99w0st25a2sq9 digitalmars.com
|
| ...
|
| We *need* to have *both* null and empty arrays. The reason is pretty
| simple:
|    - null means does not exist
|    - emtpy means exists, but has no value (or empty value)
|
| This is important in situations like the original poster mentioned and in
| my experience for example... When reading POST input from a web page, you
| get a string like so:
|
|    Setting1=Regan+Heath&Setting2=&&
|
| when requesting items you might have a function like:
|
|    char[] getFormValue(char[] label);
|
| the code to get the values for the above form might go:
|
|    char[] s;
|
|    s = getFormValue("Setting1"); // s is "Regan Heath"
|    s = getFormValue("Setting2"); // s is ""
|    s = getFormValue("Setting3"); // s is null
|
| It is important the above code can tell that Setting3 was not passed in
| the form, so it can decide not to overwrite whatever current value that
| setting has, whereas it can tell Setting2 was passed and will overwrite
| the current value with a new blank one.
|
| ...
|

Personally, I would use an associative array to represent such a thing
(instead of using a function), but it's an implementation difference, and
the language should let Regan do it the way he wants.

I've run into such cases before ("" !== null), I know that. I just can't
remember any of them right now :D

Two more things: I don't think this should only be for strings, but for any
array. And I'm 100% sure this has been raised before.

-----------------------
Carlos Santander Bernal
Jun 29 2004
prev sibling parent Regan Heath <regan netwin.co.nz> writes:
On Tue, 29 Jun 2004 09:50:35 +0000 (UTC), Arcane Jill 
<Arcane_member pathlink.com> wrote:

 In article <cbr9e5$vai$1 digitaldaemon.com>, Derek Parnell says...

 Because that's not what is being meant. I'd like to differentiate 
 between
 INITIALIZED and UNINITIALIZED vectors.
Why? D's dynamic arrays are the same thing as C++ std::vectors (as I'm sure you realize). In C++, there is no such thing as an uninitialized vector. Why on Earth would you want them in D?
 This non-existant thing is a
 red-herring. 'empty' means initialized and length of zero. 
 'non-existant'
 means not initialized yet.
Yeah - but nobody has yet answered WHY? Why would ANYONE want to allow uninitialized array handles (as opposed to array content) to exist in D. It makes no sense. Please, can someone who is arguing in favor of allowing a distinction between initialized and unintialized dynamic array handles, explain exactly why you want such a distinction to exist?
Pls read the reply I just made to Andy's post that started this branch in this thread i.e. just go up a little bit in a threaded reader, or look for the post I made just prior to this one if viewing flat and sorting by date. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 29 2004
prev sibling next sibling parent reply Sam McCall <tunah.d tunah.net> writes:
Arcane Jill wrote:
 In article <opsab6o5rl5a2sq9 digitalmars.com>, Regan Heath says...
 
Yeah, it's called std::string, and it's more or less the default.
And it's crap. IMNSHO.
You'll get no arguments from me there. D got it right in not having a string class. I didn't think that at first, but I've come round to the D way of thinking.
I'm still getting there... I still don't see why toUpper("hello") is better than "hello".toUpper(), under the assumption that the OO way has any merit. (If it doesn't, why do we have it?)
 The problem with a string class is that you can't add new member
 functions to it. (Oh, you may be able to subclass String, if it's not final. Oh
 wait - it /is/ final in Java). With char[] arrays, you CAN add new functions.
I'm confused: is there a way of adding functions to array types that can't be used with classes?
 Besides which, what else can a char[] array possibly repreresent, other than a
 string? (Given that a char[] array MUST contain UTF-8, I mean). It's not the
 same as a byte[] array, which could mean anything.
In theory you're right. The problem is when people assume "a char array is a list of characters", which is perfectly logical, given the name. In theory, you should only store a list of characters in a dchar[]. But it's not going to happen; see std.string.maketrans (char[] is a list) and translate (char[] is opaque).

[RANT]
IMO, D (language, not libraries) isn't _really_ trying to be fully-unicode at all. What is the purpose of a char/wchar variable? How often do you actually need to be directly manipulating UTF8/16 fragments? (Hint: in a unicode-based language with good libraries, almost never).

*IF* D is going to be fully-unicode, that does have performance impacts. A single character must _always_ go in a dchar variable. So what is the advantage in having strings being char[] arrays? ("Knowing the encoding" doesn't count; the user shouldn't have to care).

IMO, strings NEED to:
	* Have only one type, or one base type. I want to write a function that accepts a string. I don't want to write three functions, or use a template (that has to be manually instantiated).
	* Expose character data as _characters_, not fragments. This means characters accessed must be dchars, and indexing must be character-based, not fragment-based.
	* Be efficient in the common case. At the moment, this probably means using UTF-8 internally. This could be changed in the future, or there could be multiple versions with the same base type, because all character data would be exposed at the character level.
	* Be fully reference types. At the moment, if someone passes in a string, I can modify its data, which is shared, and its length, which is not. This makes sense if you understand the implementation, but why should foo~="bar" have the truly odd effects it does? Always passing strings inout is ugly and confusing in other cases.

Based on this, the solution to me looks like a String interface that exposes character data, and UTF8String as the default implementation, which stores its data in a ubyte[]; literal strings would create these. There could then be a UTF32String implementation which would be more efficient for various other languages.

The "char" type should be 32 bits wide. Anything else is confusing. (Hey, they did it with "int"...).
[/RANT]

Now flame on, I'm sure that's not going to be too popular ;-)
Don't compare arrays to null.  Don't try to differentiate between empty 
and nonexistent.
Fine and dandy EXCEPT we *need* to differentiate between empty and non-existant strings.
Why? Do we also need a way to differentiate between empty and non-existent ints?
Frankly, yes, I use -1 as a "magic value" all the time, and do all sorts of ugly things when negative numbers are perfectly valid. This is necessary for pragmatic reasons of efficiency; I'd love chips to treat 0x8000... as NaN, like the NaN we have in IEEE floating point. (This'd also balance the range of integers). I'm not saying we can/should change the behaviour of ints, just that I don't think this argument has merit.

I think arrays should become fully reference types, for the same reason as strings above. Yes, this would probably mean double indirection: arrays would be a pointer to the (length, data pointer) struct that they currently are.

Sam
Jun 29 2004
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cbrfn4$1805$1 digitaldaemon.com>, Sam McCall says...

[RANT]
IMO, D (language, not libraries) isn't _really_ trying to be 
fully-unicode at all.
What is the purpose of a char/wchar variable? How often do you actually 
need to be directly manipulating UTF8/16 fragments? (Hint: in a 
unicode-based language with good libraries, almost never).
Maybe not, but you still need something to store them in. Even if you let a library do all your UTF-8 work for you (which you should), then you still need a type designed to contain such sequences. In D, a char array is that type. In other words, the type char exists in order that the type char[] might exist. I don't have a problem with that.
*IF* D is going to be fully-unicode, that does have performance impacts. 
A single character must _always_ go in a dchar variable. So what is the 
advantage in having strings being char[] arrays?
Space.
("knowing the encoding" 
doesn't count, the user shouldn't have to care).
In a strongly typed language, that would be true, but D is not a strongly typed language. Walter is on record as stating that all char types including dchar can be freely used as integers. If that's going to be true, you MUST care about the encoding.
IMO, strings NEED to:
	* Have only one type, or one base type.
And, to take that reasoning further, it should have other interesting properties too, like it should be IMPOSSIBLE IN ANY CIRCUMSTANCE to end up with a char containing a value outside the range U+000000 to U+10FFFF inclusive. However, I don't see this happening in D. The reason being that even a dchar is not a character in the Unicode sense. It is a UTF-32 encoding of a character. (The minor technical difference being that dchar values above 0x10FFFF exist, but are invalid, whereas Unicode characters beyond U+10FFFF do not even exist).
	* Expose character data as _characters_, not fragments.
This means characters accessed must be dchars, indexing must be 
character, not fragment-based.
That depends on your point of view. Unicode may be viewed on many levels. I'm sure I could hold a reasonable argument in which I insisted that string data should be exposed as _glyphs_, not characters (characters are, after all, merely glyph fragments). Glyphs are what you see. If a string contains an e-acute glyph, should your application really /care/ which characters compose that glyph? Somewhere along the line, you have to face the bottom level. That level is the level of character encoding. Language support is given to the encoding level. For anything above that, you use libraries. If such libraries don't exist yet, we can write them.
The "char" type should be 32 bits wide. Anything else is confusing. 
21 bits wide, and limited to the range 0-0x10FFFF. Anything else is confusing. But this is D, and D is practical.
Now flame on, I'm sure that's not going to be too popular ;-)
Actually, I loved it, and I'm not flaming (and I hope nobody does). You've made some excellent observations. But it's way too late to shape D that way now. In the future, there may well be languages which handle characters as true, pure, Unicode characters, but the world isn't fully Unicode-aware yet.

To give an example of what I mean: Suppose you publish a web site containing a few musical symbols and a few exotic math symbols. (All valid Unicode). The sad fact is, such a website won't display properly on most people's browsers. To get them to display properly, it is currently the responsibility of VIEWERS (rather than publishers) of web sites, to "obtain", somehow, the relevant fonts to make it work. Usually, obtaining such fonts costs money, so who's going to bother? It'd be like buying a book and opening it to find half the characters looking like black blobs until you pay more money to a font-designer. And so, web site designers tend NOT to use such characters on their web sites, preferring gif images which everyone can view. It's a vicious circle.

In short, the world is not Unicode yet, and it's frustrating. Bits of it are still trying to catch up with other bits. Sometimes you just want to scream at the planet to get its act together right now. But we have to be realistic. And realistically, things /are/ changing - but slowly. What D is doing is moving in the right direction. The shift to full Unicode support in all things is a long way off yet, and to get there, we must move in small steps. Defining a char as a UTF-8 fragment may be a small step, but it is a very important and valuable one. At least we don't say "a char is a character in some unspecified encoding", like some other languages do.

Nice post, by the way. I enjoyed reading it.

Jill
Jun 29 2004
parent reply Sam McCall <tunah.d tunah.net> writes:
Arcane Jill wrote:

 In article <cbrfn4$1805$1 digitaldaemon.com>, Sam McCall says...
 
 
[RANT]
IMO, D (language, not libraries) isn't _really_ trying to be 
fully-unicode at all.
What is the purpose of a char/wchar variable? How often do you actually 
need to be directly manipulating UTF8/16 fragments? (Hint: in a 
unicode-based language with good libraries, almost never).
Maybe not, but you still need something to store them in. Even if you let a library do all your UTF-8 work for you (which you should), then you still a type designed to contain such sequences. In D, a char array is that type. In other words, the type char exists in order that the type char[] might exist. I don't have a problem with that.
Sure, but given that the "user" shouldn't be touching chars without realising that they're more complicated than in C, byte[] would do? Still, I'm not fussed about this.
*IF* D is going to be fully-unicode, that does have performance impacts. 
A single character must _always_ go in a dchar variable. So what is the 
advantage in having strings being char[] arrays?
Space.
Sorry, I didn't mean char[] as opposed to dchar[], I meant char[] as opposed to something more opaque. The reasoning for not having a string class, IIRC, is "strings are lists of characters". Well, chars aren't characters.
("knowing the encoding" 
doesn't count, the user shouldn't have to care).
In a strongly typed language, that would be true, but D is not a strongly typed language. Walter is on record as stating that all char types including dchar can be freely used as integers. If that's going to be true, you MUST care about the encoding.
If you don't use them as integers, then you don't have to care. I'm not saying it shouldn't be well-defined, but Java doesn't require the user to understand the intricacies of unicode encodings to manipulate strings. (Yes, java has efficiency problems with strings and presumably some problems with wide unicode characters due to a 16 bit char type, but I think that still makes sense).
IMO, strings NEED to:
	* Have only one type, or one base type.
And, to take that reasoning further, it should have other interesting properties too, like it should be IMPOSSIBLE IN ANY CIRCUMSTANCE to end up with a char containing a value outside the range U+000000 to U+10FFFF inclusive. However, I don't see this happening in D. The reason being that even a dchar is not a character in the Unicode sense. It is a UTF-32 encoding of a character. (The minor technical difference being that dchar values above 0x10FFFF exist, but are invalid, whereas Unicode characters beyond U+10FFFF do not even exist).
Okay, I didn't realise dchars were 21 bits wide... if there's a way of doing this that's efficient, that'd be cool: dchar (or "char") could be 21 bits. If it's going to be hopelessly slow, you have to trust the programmer to some extent; what about "any library operation involving an out-of-range dchar is undefined"?
 That depends on your point of view. Unicode may be viewed on many levels. I'm
 sure I could hold a reasonable argument in which I insisted that string data
 should be exposed as _glyphs_, not characters (characters are, after all,
merely
 glyph fragments). Glyphs are what you see. If a string contains an e-acute
 glyph, should your application really /care/ which characters compose that
 glyph?
Probably not, although if reading an encoded string and then writing it again doesn't produce the same byte-output, I'm sure I could find a contrived example... copy-pasting text invalidating a digital signature? Either would be much better than what we've got now, and I think character is more likely (though still spectacularly unlikely), because it has an obvious, efficient representation (32 bit unsigned number). Am I right in assuming a glyph can be fairly complicated?
 Somewhere along the line, you have to face the bottom level. That level is the
 level of character encoding. Language support is given to the encoding level.
 For anything above that, you use libraries. If such libraries don't exist yet,
 we can write them.
Yeah. It's just a bit disappointing after hearing "Strings are character arrays and everything about them makes sense" to realise that you either have to grok UTF-N or treat these "characters" as opaque... the advantages over a class are gone, and a class has reference semantics and member functions.
The "char" type should be 32 bits wide. Anything else is confusing. 
21 bits wide, and limited to the range 0-0x10FFFF. Anything else is confusing.
It clearly is, because I assumed a unicode character was 32 bits wide, on the basis that that's what D had taught me :-\
 But this is D, and D is practical.
If it's going to be horribly inefficient to make it 21 bits, have the spec say "it's at least 21 bits" and alias it to uint.
 Actually, I loved it, and I'm not flaming (and I hope nobody does). You've made
 some excellent observations. But it's way too late to shape D that way now. In
 the future, the may well be languages which handle characters as true, pure,
 Unicode characters, but the world isn't fully Unicode-aware yet.
Yeah, it's the partly-there that's frustrating... my selfish side would be happy with just ASCII ;-). It just seems sometimes that if it's not easy and consistent to make things unicode-friendly, it won't happen. Especially in places where ASCII works fine, that's certainly easy and consistent! The current way seems to suggest that officially it's all unicode and happy, but (don't tell anyone) feel free to use ascii and assume chars are characters if you want. The standard library even does this, in std.string no less.
 To give an example of what I mean: Suppose you publish a web site containing a
 few musical symbols and a few exotic math symbols. (All valid Unicode). The sad
 fact is, such a website won't display properly on most people's browsers. To
get
 them to display properly, it is currently the responsibility of VIEWERS (rather
 than publishers) of web sites, to "obtain", somehow, the relevant fonts to make
 it work. Usually, obtaining such fonts costs money, so who's going to bother?
 It'd be like buying a book and opening it to find half the characters looking
 like black blobs until you pay more money to a font-designer. And so, web site
 designers tend NOT to use such characters on their web sites, preferring gif
 images which everyone can view. It's a vicious circle.
Yeah, fonts are a problem. My ideal world would have a (huge!) complete system default font (or one each for serif, sans, and mono) supplied with the OS, that would be the fallback for nonexistent characters.
 And realistically, things /are/ changing - but slowly. What D is doing is
moving
 in the right direction. The shift to full Unicode support in all things is a
 long way off yet, and to get there, we must move in small steps.
Yes. What gets me is that in 5 years we'll (hopefully) be far enough down the unicode road that D's approach will seem backward, and I'll have to wait for someone to reinvent a similar language, with a more thorough unicode integration. Ah well, maybe we'll get a strong boolean next time <g>
 Defining a char as a UTF-8 fragment may be a small step, but it is a very
 important and valuable one. At least we don't say "a char is a character in
some
 unspecified encoding", like some other languages do.
Yeah, definitely. I just wish it was easier to use and harder to ignore. Sam
Jun 29 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cbs0bj$1vhf$1 digitaldaemon.com>, Sam McCall says...
I'm not saying it shouldn't be well-defined, but Java doesn't require 
the user to understand the intricacies of unicode encodings to 
manipulate strings.
Yes it does. Java chars operate in UTF-16. If you want to store the character U+012345 in a Java string, you need to worry about UTF-16.
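[For illustration, an editorial sketch that is not part of the original post: the same point in D terms. A codepoint above U+FFFF needs two UTF-16 code units; the surrogate values below are computed by hand, so nothing beyond basic array handling is assumed.]

   void main()
   {
       uint c = 0x12345;                  // a codepoint above U+FFFF
       // split it into a UTF-16 surrogate pair by hand
       uint v = c - 0x10000;
       wchar hi = cast(wchar)(0xD800 + (v >> 10));
       wchar lo = cast(wchar)(0xDC00 + (v & 0x3FF));

       wchar[] s = new wchar[2];
       s[0] = hi;
       s[1] = lo;

       assert(s.length == 2);             // one character, two wchars
       assert(hi == 0xD808 && lo == 0xDF45);
   }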
Probably not, although if reading an encoded string and then writing it 
again doesn't produce the same byte-output, I'm sure I could find a 
contrived example... copy-pasting text invalidating a digital signature?
That's what normalization is for. We'll have that soon in a forthcoming version of etc.unicode.
Am I right in assuming a glyph can be fairly complicated?
Very much so. Especially if you're a font designer, since Unicode allows you to munge any two glyphs together into a bigger glyph (a ligature). In practice, fonts only provide a small subset of all possible ligatures (as you can imagine!).
Yeah. It's just a bit disappointing after hearing "Strings are character 
arrays and everything about them makes sense" to realise that you either 
have to grok UTF-N or treat these "characters" as opaque... the 
advantages over a class are gone, and a class has reference semantics 
and member functions.
Not really. So long as you remember that characters <= 0x7F are OK in a char, and that characters <= 0xFFFF are fine in a wchar, you're sorted.
Yeah, it's the partly-there that's frustrating... my selfish side would 
be happy with just ASCII ;-). It just seems sometimes that if it's not 
easy and consistent to make things unicode-friendly, it won't happen. 
Right, but it's a question of where that support comes from. To demand it all of the language itself is asking /a lot/ from poor old Walter. If we can add it, piece by piece, in libraries, I'd say we're not doing too badly.
Especially in places where ASCII works fine, that's certainly easy and 
consistent! The current way seems to suggest that officially it's all 
unicode and happy, but (don't tell anyone) feel free to use ascii
It /is/ okay to use ASCII. All valid ASCII also happens to be valid UTF-8. UTF-8 was designed that way.
and 
assume chars are characters if you want. The standard library even does 
this, in std.string no less.
So long as they make no assumptions about characters > 0x7F, that's perfectly reasonable.
Yeah, fonts are a problem. My ideal world would have a (huge!) complete 
system default font (or one each for serif, sans, and mono) supplied 
with the OS, that would be the fallback for nonexistent characters.
I absolutely agree. There are free fonts which do this, but they don't display well at small point-size because of something called "hinting", which apparently you can't do without paying someone royalties because of some stupid IP nonsense.
Yes. What gets me is that in 5 years we'll (hopefully) be far enough 
down the unicode road that D's approach will seem backward, and I'll 
have to wait for someone to reinvent a similar language, with a more 
thorough unicode integration.
Yup. That's the way it goes. So what else shall we imagine for D++? Jill
Jun 29 2004
parent reply Sam McCall <tunah.d tunah.net> writes:
Arcane Jill wrote:

 In article <cbs0bj$1vhf$1 digitaldaemon.com>, Sam McCall says...
 
I'm not saying it shouldn't be well-defined, but Java doesn't require 
the user to understand the intricacies of unicode encodings to 
manipulate strings.
Yes it does. Java chars operate in UTF-16. If you want to store the character U+012345 in a Java string, you need to worry about UTF-16.
Whoops. Having never had to deal with this case (and taken a series of CS courses where we've iterated over chars countless times and they never mentioned this once :-\) I hadn't thought about this. Okay, suppose java had a 21- or 32-bit char type.
Probably not, although if reading an encoded string and then writing it 
again doesn't produce the same byte-output, I'm sure I could find a 
contrived example... copy-pasting text invalidating a digital signature?
That's what normalization is for. We'll have that soon in a forthcoming version of etc.unicode.
Of course... so no, the program shouldn't care, but...
Am I right in assuming a glyph can be fairly complicated?
Very much so. Especially if you're a font designer, since Unicode allows you to munge any two glyphs together into a bigger glyph (a ligature). In practice, fonts only provide a small subset of all possible ligatures (as you can imagine!).
Glyphs aren't really a practical option as the logical element type of strings if they can't be easily represented as a fixed-width number, I'd imagine.
Yeah. It's just a bit disappointing after hearing "Strings are character 
arrays and everything about them makes sense" to realise that you either 
have to grok UTF-N or treat these "characters" as opaque... the 
advantages over a class are gone, and a class has reference semantics 
and member functions.
Not really. So long as you remember that characters <= 0x7F are OK in a char, and that characters <= 0xFFFF are fine in a wchar, you're sorted.
But you can't do obvious "list-of-characters" things like index by character or even slice at any offset.
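[Editorial sketch, not from the original post, of what "slicing at any offset" runs into; it assumes the compiler of the day accepts the \u escape in the literal.]

   void main()
   {
       char[] s = "caf\u00E9";       // "café": 4 characters, 5 chars
       assert(s.length == 5);        // the é is stored as 0xC3 0xA9
       char[] t = s[0..4];           // slicing counts UTF-8 code units...
       assert(t[3] == 0xC3);         // ...so t ends in half a sequence
   }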
Yeah, it's the partly-there that's frustrating... my selfish side would 
be happy with just ASCII ;-). It just seems sometimes that if it's not 
easy and consistent to make things unicode-friendly, it won't happen. 
Right, but it's a question of where that support comes from. To demand it all of the language itself is asking /a lot/ from poor old Walter. If we can add it, piece by piece, in libraries, I'd say we're not doing too badly.
A decent unicode string class could be almost entirely library based, and would only require a little magic language support (for string literals). I might have a play around with one, on the assumption that if people find it useful, the horribly inefficient/incorrect bits could be fixed by people who know what they're doing ;)
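[Editorial sketch of that idea, with invented names; a real class would do the UTF conversion in its constructor and grow member functions from there.]

   class UString
   {
       private dchar[] data;          // one element per character

       this(dchar[] s) { data = s.dup; }

       uint length() { return data.length; }

       dchar charAt(uint i) { return data[i]; }

       UString slice(uint lo, uint hi)
       {
           return new UString(data[lo .. hi]);
       }
   }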
 It /is/ okay to use ASCII. All valid ASCII also happens to be valid UTF-8.
UTF-8
 was designed that way.
So this means a char[] has two purposes depending on the app? On the one hand, ASCII/Unicode being a per-app decision is fair enough. On the other hand, that's not what it looked like to me in the docs, and I still think unicode should be the "default". Also, if people are going to use char[] as ASCII, they may write libraries that assume char[] is ASCII or worse, "a character in some unknown encoding".
and 
assume chars are characters if you want. The standard library even does 
this, in std.string no less.
So long as they make no assumptions about characters > 0x7F, that's perfectly reasonable.
If it were documented as only working for ASCII, sure; otherwise you might assume it was a UTF-8 encoded character list. And I'm still not sure it'd be reasonable unless a wchar/dchar version was provided: how good is a language's unicode support if string manipulation functions only work on ASCII? Anyway:

/************************************
 * Construct translation table for translate().
 */

char[] maketrans(char[] from, char[] to)
    in
    {
	assert(from.length == to.length);
    }
    body
    {
	char[] t = new char[256];
	int i;

	for (i = 0; i < 256; i++)
	    t[i] = cast(char)i;

	for (i = 0; i < from.length; i++)
	    t[from[i]] = to[i];

	return t;
    }
Yeah, fonts are a problem. My ideal world would have a (huge!) complete 
system default font (or one each for serif, sans, and mono) supplied 
with the OS, that would be the fallback for nonexistent characters.
I absolutely agree. There are free fonts which do this, but they don't display well at small point-size because of something called "hinting", which apparently you can't do without paying someone royalties because of some stupid IP nonsense.
Ew, does that apply to creating fonts too? I thought most free fonts weren't manually hinted because it'd take forever, especially for unicode... I know freetype doesn't interpret hints by default, but there's a #define somewhere: "set this to 1 if you have permission from Apple Legal, or live somewhere sane". On my distro of choice, this was set by default :-D
Yes. What gets me is that in 5 years we'll (hopefully) be far enough 
down the unicode road that D's approach will seem backward, and I'll 
have to wait for someone to reinvent a similar language, with a more 
thorough unicode integration.
Yup. That's the way it goes. So what else shall we imagine for D++?
Fix C's broken precedence rules? Sam
Jun 29 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cbsufo$a8u$1 digitaldaemon.com>, Sam McCall says...

Okay, suppose java had a 21- or 32-bit char type.
I'm led to believe there was a lot of debate about this. Some folk said that Java's char could NOT be anything other than 16 bits wide because it was defined that way and changing it would break things. Other folk looked under the hood of the JVM and decided that actually it probably wouldn't break anything after all. I don't know the ins and outs of it, but I gather the first lot won. The way it's going to go is UTF-16 support, with functions like isLetter() taking an int rather than a char.
Glyphs aren't really a practical option as the logical element type of 
strings if they can't be easily represented as a fixed-width number, I'd 
imagine.
Well, they can, with a bit of sneaky manipulation. The trick is to map only those ones you actually USE to the unused codepoints between 0x110000 and 0xFFFFFFFF. So long as such a mapping stays within the application (like, don't try to export it), you can indeed have one dchar per glyph. But it would be a temporary one - not one you could write to a file, for example. In general, you're right.
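[Editorial sketch of that trick, with invented names: hand out application-private dchar values above U+10FFFF for each distinct glyph, stored here as the char[] holding its component characters.]

   dchar[char[]] glyphCodes;          // glyph -> private code
   uint nextCode = 0x110000;          // first value past real Unicode

   dchar glyphToCode(char[] glyph)
   {
       if (!(glyph in glyphCodes))
           glyphCodes[glyph] = cast(dchar)nextCode++;
       return glyphCodes[glyph];
   }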
But you can't do obvious "list-of-characters" things like index by 
character or even slice at any offset.
True.
 It /is/ okay to use ASCII. All valid ASCII also happens to be valid UTF-8.
UTF-8
 was designed that way.
So this means a char[] has two purposes depending on the app?
I'm not sure I follow that. If you say char[] a = "hello world"; then you will get a string containing eleven chars, and it will be both valid ASCII and valid UTF-8. It's not like you have to choose.
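[Editorial illustration of that point, and of the case where char count and character count do part company:]

   void main()
   {
       char[] a = "hello world";
       assert(a.length == 11);       // ASCII: one char per character

       char[] b = "na\u00EFve";      // "naïve": 5 characters...
       assert(b.length == 6);        // ...but 6 chars, since ï is 2 bytes
   }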
On the one hand, ASCII/Unicode being a per-app decision is fair enough.
That isn't what I said. It's possible we may be misunderstanding each other somehow.
Also, if people are going to use char[] as ASCII, they may write 
libraries that assume char[] is ASCII
Well, that would be a bug, of course. It's perfectly ok to choose only to store ASCII characters in chars, but NOT perfectly okay to assume that chars will only contain ASCII characters. Anyone writing a library containing such a bug should simply be press-ganged into fixing it.
or worse, "a character in some 
unknown encoding".
Again, that would be a bug, and at odds with D's definition of what a char is.
If it were documented as only working for ASCII, sure, otherwise you 
might assume it was a UTF-8 encoded character list. And I'm still not 
sure it'd be reasonable unless a wchar/dchar version was provided, how 
good is a language's unicode support if string manipulation functions 
only work on ascii?
I'm not completely clear what functions you're talking about, as I haven't read the source code for std.string. Am I correct in assuming that the quote below is an extract?
Anyway:
/************************************
  * Construct translation table for translate().
  */

char[] maketrans(char[] from, char[] to)
     in
     {
	assert(from.length == to.length);
     }
     body
     {
	char[] t = new char[256];
	int i;

	for (i = 0; i < 256; i++)
	    t[i] = cast(char)i;

	for (i = 0; i < from.length; i++)
	    t[from[i]] = to[i];

	return t;
     }
This is a bug. ASCII stops at 0x7F. Characters above 0x7F are not ASCII. If this function is intended as an ASCII-only function then (a) it should be documented as such, and (b) it should leave all bytes >0x7F unmodified. Char values between 0x80 and 0xFF are reserved for the role they play in UTF-8. You CANNOT mess with them (unless you're a UTF-8 engine).

You're right. I'd prefer to see a dchar version of this routine. Of course, you wouldn't want a lookup table with 0x110000 entries in it, but an associative array should do the job.

Assuming this is from std.string, I guess one of us should report this as a bug.

Arcane Jill
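[Editorial sketch of that suggestion: a dchar translation built on an associative array rather than a 256-entry table. This is not Phobos code; the names merely mirror the std.string ones for readability.]

   dchar[dchar] maketrans(dchar[] from, dchar[] to)
       in
       {
           assert(from.length == to.length);
       }
       body
       {
           dchar[dchar] t;
           for (int i = 0; i < from.length; i++)
               t[from[i]] = to[i];        // store only the mapped characters
           return t;
       }

   dchar[] translate(dchar[] s, dchar[dchar] table)
   {
       dchar[] r = s.dup;
       for (int i = 0; i < r.length; i++)
       {
           if (r[i] in table)             // unmapped characters pass through
               r[i] = table[r[i]];
       }
       return r;
   }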
Jun 30 2004
parent reply Sam McCall <tunah.d tunah.net> writes:
Arcane Jill wrote:

 In article <cbsufo$a8u$1 digitaldaemon.com>, Sam McCall says...
 
 
Okay, suppose java had a 21- or 32-bit char type.
I'm led to believe there was a lot of debate about this. Some folk said that Java's char could NOT be anything other than 16 bits wide because it was defined that way and changing it would break things. Other folk looked under the hood of the JVM and decided that actually it probably wouldn't break anything after all. I don't know the ins and outs of it, but I gather the first lot won. The way it's going to go is UTF-16 support, with functions like isLetter() taking an int rather than a char.
Sorry, I meant "if java had originally been defined to have char being 21 bits instead of 16, and storing a unicode codepoint instead of a UTF-16 fragment". All java's string manipulation stuff is char-based, and I was convinced there was a one-to-one correspondence between chars and characters (or possibly with some too-big char values). Clearly I was mistaken, but if they had made chars 21 bits and kept the rest the same, it looks to me like it'd be just about perfect. (Well, I'm sure the APIs could be improved in minor ways, etc, but relatively speaking).
Glyphs aren't really a practical option as the logical element type of 
strings if they can't be easily represented as a fixed-width number, I'd 
imagine.
Well, they can, with a bit of sneaky manipulation. The trick is to map only those ones you actually USE to the unused codepoints between 0x110000 and 0xFFFFFFFF. So long as such a mapping stays within the application (like, don't try to export it), you can indeed have one dchar per glyph. But it would be a temporary one - not one you could write to a file, for example.
Ooh, clever :) But I don't see this working in a situation where you have dynamic libraries, for example.
It /is/ okay to use ASCII. All valid ASCII also happens to be valid UTF-8. UTF-8
was designed that way.
So this means a char[] has two purposes depending on the app?
I'm not sure I follow that. If you say char[] a = "hello world"; then you will get a string containing eleven chars, and it will be both valid ASCII and valid UTF-8. It's not like you have to choose.
 On the one hand, ASCII/Unicode being a per-app decision is fair
 enough.
That isn't what I said. It's possible we may be misunderstanding each other somehow.
Sorry, what I originally meant:
 Especially in places where ASCII works fine, that's certainly easy and
 consistent! The current way seems to suggest that officially it's all
 unicode and happy, but (don't tell anyone) feel free to use ascii
Was that although unicode is the officially designated content of these types, char[] looks and feels (and the standard library uses it) like it's ASCII, and people won't bother to use unicode, because it requires calling conversion functions and so on. Especially since, if you assume the language will take care of unicode for you like java (almost) does, you'll end up with code that only works properly for ASCII data. That's probably all a lot of people will test it with. We should get unicode by default.
If it were documented as only working for ASCII, sure, otherwise you 
might assume it was a UTF-8 encoded character list. And I'm still not 
sure it'd be reasonable unless a wchar/dchar version was provided, how 
good is a language's unicode support if string manipulation functions 
only work on ascii?
I'm not completely clear what functions you're talking about, as I haven't read the source code for std.string. Am I correct in assuming that the quote below is an extract?
std.string.maketrans and std.string.translate.
Anyway:
/************************************
 * Construct translation table for translate().
 */

char[] maketrans(char[] from, char[] to)
    in
    {
	assert(from.length == to.length);
    }
    body
    {
	char[] t = new char[256];
	int i;

	for (i = 0; i < 256; i++)
	    t[i] = cast(char)i;

	for (i = 0; i < from.length; i++)
	    t[from[i]] = to[i];

	return t;
    }
This is a bug. ASCII stops at 0x7F. Characters above 0x7F are not ASCII. If this function is intended as an ASCII-only function then (a) it should be documented as such, and (b) it should leave all bytes >0x7F unmodified. Char values between 0x80 and 0xFF are reserved for the role they play in UTF-8. You CANNOT mess with them (unless you're a UTF-8 engine).
It's got a single-line explanation that doesn't mention encoding. I'll report it. Sam
Jun 30 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cbts89$1poh$1 digitaldaemon.com>, Sam McCall says...

Sorry, I meant "if java had originally been defined to have char being 
21 bits instead of 16, and storing a unicode codepoint instead of a 
UTF-16 fragment". All java's string manipulation stuff is char-based, 
and I was convinced there was a one-to-one correspondence between chars 
and characters (or possibly some too-big char values possible). Clearly 
I was mistaken,
You weren't mistaken. You were spot on. When Java was invented, Unicode stood at version 2.0. Possibly even earlier. At that time, Unicode was touted as a 16-bit standard, and its maximum codepoint was U+FFFF. At that time, there was no such thing as UTF-16. A Unicode char was 16 bits wide, and that was that. The only relevant 16-bit encodings were UCS-16LE (which meant, emit the 16-bit codepoint low order byte first), and UCS-16BE (which meant, emit the codepoint high order byte first). Java simply took that on board and went with it. But as time went by, the Unicode folk realized that sixty five thousand characters wasn't actually ENOUGH for all the world's scripts (including historical ones that nobody ever uses any more), so they managed to find a way to squeeze even more characters into that 16-bit model. They called it UTF-16, and it extends the range from U+FFFF to U+10FFFF. There has been some discussion on the Unicode public formum as to whether even THIS limit will ever be extended. The Unicode Consortium currently are stating flat out that there will never, ever, be Unicode characters with codepoints above U+10FFFF. So, you can choose to believe them, or you can regard this statement with as much credibility as the statements like "64K should be enough memory for anyone" which were touted in the ZX81 days. Java got caught out by the changing of the times. D's chars should probably be wider than 21-bits, just in case.... (Not that I'm choosing to disbelieve the Unicode Consortium of course!) 32 bits seems safe enough, for the forseeable future.
but if they had made chars 21 bits and kept the rest the 
same, it looks to me like it'd be just about perfect.
Yes. I'll bet the Java folk thought that at the time.
Was that although unicode is the officially designated content of these 
types, char[] looks and feels (and the standard library uses it) like 
it's ASCII, and people won't bother to use unicode, because it
requires calling conversion functions and so on.
Well, of course UTF-8 was /designed/ to be compatible with ASCII, to ease transition. That's not such a bad thing. Bugs will happen, of course, just as they happen with any other encoding, but they can be found and fixed (and fixing them will be easier, the more library support there is). It's just one of those things which is going to get better with time. Arcane Jill
Jun 30 2004
parent Sam McCall <tunah.d tunah.net> writes:
Arcane Jill wrote:

 In article <cbts89$1poh$1 digitaldaemon.com>, Sam McCall says...
 
 You weren't mistaken. You were spot on.
<snip> Wow, thanks for that explanation, I really appreciate it :-)
 
 
but if they had made chars 21 bits and kept the rest the 
same, it looks to me like it'd be just about perfect.
Yes. I'll bet the Java folk thought that at the time.
Okay, we'll stick with 32 bits. If they reach that in my lifetime, someone is going to die... Anyway, by the time I work out how to efficiently character-index UTF-8 in mutable strings, I'm sure I'll think unicode is thoroughly overrated :-D Sam
Jun 30 2004
prev sibling parent reply "Bent Rasmussen" <exo bent-rasmussen.info> writes:
 Frankly, yes, I use -1 as a "magic value" all the time, and do all sorts
 of ugly things when negative numbers are perfectly valid. This is
That's true. In Standard ML you could do

   val index : 'a -> int option

Then if 'a exists return SOME(x), if not, return NONE. If a function has an option type as a domain it has to deal with both cases. In D, you'd either use a magic value like -1 or encapsulate values in a class; then null is NONE and not null is SOME.
 I think arrays should become fully reference types, for the same reason
 as strings above. Yes, this would probably mean double indirection,
 arrays would be a pointer to the (length,data pointer) struct that they
 currently are.
But you can go ahead and create a class for lists, no problem at all. Neither Phobos nor DTL has fully hatched yet, so we'll see what happens.
Jun 29 2004
parent reply Sam McCall <tunah.d tunah.net> writes:
Bent Rasmussen wrote:

Frankly, yes, I use -1 as a "magic value" all the time, and do all sorts
of ugly things when negative numbers are perfectly valid. This is
That's true. In Standard ML you could do val index : 'a -> int option Then if 'a exists return SOME(x), if not, return NONE. If a function has a an option type as a domain it has to deal with both cases.
McCall's Law the First: Every feature of a "traditional" language is a special case of a feature of every functional language. McCall's Law the Second: Every feature of every functional language is a special case of the only feature of Lisp.
 In D, you'd either use a magic value like -1 or encapsulate values in a
 class; then null is NONE and not null is SOME.
But this isn't ML. I will get some weird looks, and nobody will touch my libraries ;-) Besides, that's exactly equivalent (AFAICS) to a reference type, assuming no pointer arithmetic and casting shenanigans. If this _is_ useful, is dereferencing one more pointer to access arrays really going to kill us? Or is there some case where the value-type-kinda nature of arrays is useful?
 But you can go ahead and create a class for lists, no problem at all.
 Neither Phobos nor DTL has fully hatched yet, so we'll see what happens.
I'm beginning to think this is the only answer. But lists are such a fundamental type, using a non-standard list type would be a pain. I can't see room for another list type, so I guess I'll end up using DTL's list everywhere, and hope everyone does the same. But it does seem a waste of such powerful arrays in the language. Sam
Jun 29 2004
parent Regan Heath <regan netwin.co.nz> writes:
On Wed, 30 Jun 2004 03:20:54 +1200, Sam McCall <tunah.d tunah.net> wrote:

 Bent Rasmussen wrote:

 Frankly, yes, I use -1 as a "magic value" all the time, and do all 
 sorts
 of ugly things when negative numbers are perfectly valid. This is
That's true. In Standard ML you could do val index : 'a -> int option Then if 'a exists return SOME(x), if not, return NONE. If a function has a an option type as a domain it has to deal with both cases.
McCall's Law the First: Every feature of a "traditional" language is a special case of a feature of every functional language. McCall's Law the Second: Every feature of every functional language is a special case of the only feature of Lisp.
 In D, you'd either use a magic value like -1 or encapsulate values in a
 class; then null is NONE and not null is SOME.
But this isn't ML. I will get some weird looks, and nobody will touch my libraries ;-) Besides, that's exactly equivalent (AFAICS) to a reference type, assuming no pointer arithmetic and casting shenanigans. If this _is_ useful, is dereferencing one more pointer to access arrays really going to kill us? Or is there some case where the value-type-kinda nature of arrays is useful?
I think the current value-type-kinda nature of arrays is good, it just needs the 2 tweaks I mentioned to make it consistent.
-- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 29 2004
prev sibling next sibling parent reply Matthias Becker <Matthias_member pathlink.com> writes:
 Yeah, it's called std::string, and it's more or less the default.
And it's crap. IMNSHO.
You'll get no arguments from me there. D got it right in not having a string class. I didn't think that at first, but I've come round to the D way of thinking. The problem with a string class is that you can't add new member functions to it. (Oh, you may be able to subclass String, if it's not final. Oh wait - it /is/ final in Java). With char[] arrays, you CAN add new functions.
Why do you need to add member-functions to a string class, but you don't on char-arrays? Why are global functions OK for char-arrays, but aren't for a string class? This is kind of strange. Just because of a different notation? Does that really matter? There are languages where you can write object.function() and function(object) and it means the same. I don't get your point.
 Don't compare arrays to null.  Don't try to differentiate between empty 
 and nonexistent.
Fine and dandy EXCEPT we *need* to differentiate between empty and non-existent strings.
Why? Do we also need a way to differentiate between empty and non-existent ints? In D, there is no such thing as a non-existent int; there is no such thing as a non-existent struct; and there is no such thing as a non-existent string.
Something like that would be cool, just like option in SML. I think I have to write something like this. -- Matthias Becker
Jun 29 2004
parent "Bent Rasmussen" <exo bent-rasmussen.info> writes:
In D, there is no such thing as a non-existent int; there is no such
thing as a
non-existent struct; and there is no such thing as a non-existent string.
Something like that would be cool, just like option in SML. I think I have to write something like this.
Perhaps,

   class Option(VALUE)
   {
       VALUE Item;
   }

   template SOME(VALUE)
   {
       Option!(VALUE) SOME(VALUE x)
       {
           Option!(VALUE) e = new Option!(VALUE)();
           e.Item = x;
           return e;
       }
   }

   alias Option!(uint) INDEX;

   class Array(VALUE)
   {
       ...

       INDEX Index(VALUE x)
       {
           foreach (uint i, VALUE z; Items)
           {
               if (x == z)
               {
                   return SOME!(VALUE)(i);
               }
           }
           return null;
       }
   }

Somewhat non-ideal though.
Jun 29 2004
prev sibling parent reply Farmer <itsFarmer. freenet.de> writes:
Arcane Jill <Arcane_member pathlink.com> wrote in
news:cbr53s$op8$1 digitaldaemon.com: 

[snip]
 Why? Do we also need a way to differentiate between empty and
 non-existent ints? 
Yes, we do. A slightly *naive* but definitely opinionated soul already suggested exactly this. Unfortunately, this is not implementable without unacceptable performance loss. So we cannot have this. [snip]
 Maybe the real solution would be to make it a compile error to assign an
 array with null, or to compare it with null. This would then force
 people to say what they mean, and all such problems would go away.
I agree, that would help to avoid some confusion. Unfortunately, people would be forced to either say 'I mean empty' or to shut up completely and use sth. completely different. Farmer.
Jun 29 2004
parent reply Sam McCall <tunah.d tunah.net> writes:
Farmer wrote:

 Arcane Jill <Arcane_member pathlink.com> wrote in
 news:cbr53s$op8$1 digitaldaemon.com: 
 
Maybe the real solution would be to make it a compile error to assign an
array with null, or to compare it with null. This would then force
people to say what they mean, and all such problems would go away.
I agree, that would help to avoid some confusion. Unfortunately, people would be forced to either say 'I mean empty' or to shut up completely and use sth. completely different.
We don't have array literals, so we can't do this:

   foo( [] );

At the moment we can do this:

   foo( null );

If we outlawed using nulls as arrays, we'd be left with

   foo( new int[0] )

which is maybe a bit messy? Sam
Jun 29 2004
parent Farmer <itsFarmer. freenet.de> writes:
Sam McCall <tunah.d tunah.net> wrote in
news:cbsupg$anb$1 digitaldaemon.com: 

 Farmer wrote:
 
 Arcane Jill <Arcane_member pathlink.com> wrote in
 news:cbr53s$op8$1 digitaldaemon.com: 
 
Maybe the real solution would be to make it a compile error to assign
an array with null, or to compare it with null. This would then force
people to say what they mean, and all such problems would go away.
I agree, that would help to avoid some confusion. Unfortunately, people would be forced to either say 'I mean empty' or to shut up completely and use sth. completely different.
We don't have array literals, so we can't do this: foo( [] ); At the moment we can do this: foo( null ); If we outlawed using nulls as arrays, we'd be left with foo( new int[0] ) which is maybe a bit messy? Sam
What's messy here? A bit more typing, that's it. One disadvantage of foo( null ); is that there is no type information. If you had foo(int[]) and foo(float[]), you would need a cast, because it gets ambiguous. Farmer.
Jun 30 2004
prev sibling next sibling parent reply Matthias Becker <Matthias_member pathlink.com> writes:
 A 'null array' is a completely arbitrary concept that has been 
 extrapolated from undefined behaviour. :)
It may be undefined, but I believe it is required.
Why? C++ gets along without them just fine, and every C derivant I know of gets along fine without allowing primitive type returns to signify nonexistence. Functions which returns structs cannot return null either.
Thus why just about no-one ever does this (in C). They all return a pointer to a struct.
Because copying a struct costs much more than just copying a pointer to it. In C++ you have references for things like this, which can't be NULL.
 The soln IMO is either to make the current behaviour official and 
 consistent, or to change the behaviour, make that official and provide 
 another way to tell null apart from an empty string.
Farmer's test reports pretty consistent results if you suppose that comparing arrays to null is ill-formed:

   empty1.length == 0    is true     empty1 == ""    is true
   empty2.length == 0    is true     empty2 == ""    is true
   empty3.length == 0    is true     empty3 == ""    is true

Don't compare arrays to null. Don't try to differentiate between empty and nonexistent.
Fine and dandy EXCEPT we *need* to differentiate between empty and non-existent strings.
 D arrays simply do not work that way.
In that case we need an array specialisation for strings, so I'll have to write my own. This defeats the purpose of char[] in the first place, which was to be a better, more consistent string handling method than is possible in C/C++.
Could you please give some real-world examples where you need empty strings and null-strings? -- Matthias Becker
Jun 29 2004
parent Regan Heath <regan netwin.co.nz> writes:
On Tue, 29 Jun 2004 15:39:15 +0000 (UTC), Matthias Becker 
<Matthias_member pathlink.com> wrote:
 A 'null array' is a completely arbitrary concept that has been
 extrapolated from undefined behaviour. :)
It may be undefined, but I believe it is required.
Why? C++ gets along without them just fine, and every C derivant I know of gets along fine without allowing primitive type returns to signify nonexistence. Functions which returns structs cannot return null either.
Thus why just about no-one ever does this (in C). They all return a pointer to a struct.
Because copying a struct costs much more than just copying a pointer to it. In C++ you have references for things like this, which can't be NULL.
Thus why I dont use references either when I need the ability to say it's NULL.
 The soln IMO is either to make the current behaviour official and
 consistent, or to change the behaviour, make that official and provide
 another way to tell null apart from an empty string.
Farmer's test reports pretty consistent results if you suppose that comparing arrays to null is ill-formed:

   empty1.length == 0    is true     empty1 == ""    is true
   empty2.length == 0    is true     empty2 == ""    is true
   empty3.length == 0    is true     empty3 == ""    is true

Don't compare arrays to null. Don't try to differentiate between empty and nonexistent.
Fine and dandy EXCEPT we *need* to differentiate between empty and non-existent strings.
 D arrays simply do not work that way.
In that case we need an array specialisation for strings, so I'll have to write my own. This defeats the purpose of char[] in the first place, which was to be a better, more consistent string handling method than is possible in C/C++.
Could you please give some real-world examples where you need empty strings and null-strings?
Sure thing, pls see my reply to andy's post.. there has to be an easy way to direct you to a post but I don't know how.. I posted it 3 or 4 posts ago if you sort flat and by date. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 29 2004
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <opsab6o5rl5a2sq9 digitalmars.com>, Regan Heath says...
Fine and dandy EXCEPT we *need* to differentiate between empty and 
non-existent strings.
Why? It seems to me that this behavior would also require arrays to be initialized with new rather than resizing from zero using the .length parameter. And this would result in a ton of extra coding--either in clauses that errored on null arrays or initialization code to handle both cases. No thanks. If this happened I'd stil using built-in arrays and write a class for the purpose. Sean
Jun 29 2004
next sibling parent reply Farmer <itsFarmer. freenet.de> writes:
Sean Kelly <sean f4.ca> wrote in news:cbs4ju$26aj$1 digitaldaemon.com:

 In article <opsab6o5rl5a2sq9 digitalmars.com>, Regan Heath says...
Fine and dandy EXCEPT we *need* to differentiate between empty and 
non-existent strings.
Why? It seems to me that this behavior would also require arrays to be initialized with new rather than resizing from zero using the .length parameter. And this would result in a ton of extra coding--either in clauses that errored on null arrays or initialization code to handle both cases. [...]
The .length parameter would still work with null-arrays (as it currently does). But why would you want to initialize an array to null/empty and then resize it, instead of 'newing' it with the correct size in the first place? My CPU gets hot enough, no need for extra heat-up cycles :-)

Extra coding is not required if you don't need null-arrays: if some user passes a null-array, the user gets a nice access violation/array bounds exception and will quickly learn to not pass null-arrays to such functions. A quick check in the DbC section of your function would do the job, too. (But I suppose the user might not adapt that fast that way :-) If your function should deal with both null-arrays and empty-arrays, no extra code is required, since the .length property can be accessed for both null-arrays and empty-arrays.
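[Editorial sketch of the kind of DbC check meant here; the function itself is invented.]

   int countSpaces(char[] s)
       in
       {
           assert(!(s is null));      // reject null arrays up front
       }
       body
       {
           int n = 0;
           for (int i = 0; i < s.length; i++)
           {
               if (s[i] == ' ')
                   n++;
           }
           return n;
       }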
[...] No thanks.  If this happened I'd stil using built-in arrays
 and write a class for the purpose. 
I came to the same conclusion: wrapping a built-in array in a class or struct to adapt its behaviour to the specific needs is one (if not the) way to go. Farmer.
Jun 29 2004
parent reply Sean Kelly <sean f4.ca> writes:
In article <Xns9517F3F654C29itsFarmer 63.105.9.61>, Farmer says...
The .length parameter would still work with null-arrays (as they currently 
do). 
But why would you want to initialize an array to null/empty and then resize 
it, instead of 'newing' it with the correct size in first place?
Consider the following:

   char[] str = new char[100];
   str.length = 0;        // A
   str.length = 5;        // B
   str = new char[10];    // C

In A, AFAIK it's legal for the compiler to retain the memory and merely change the length parameter for the string. B then just changes the length parameter again, and no reallocation is performed. C forces a reallocation even if the array already has the (hidden) capacity in place. Lacking allocators, this is a feature I consider rather nice in D.
Extra coding is not required if you don't need null-arrays: if some user 
passes a null-array, the user gets a nice access violation/array bounds 
exception and will quickly learn to not pass null-arrays to such functions. A 
quick check in the DbC section of your function would do the job, too. (But I 
suppose, the user might not adapt that fast that way :-)
I originally thought D worked the way you describe and added DBC clauses to all my functions to check for null array parameters. After some testing I realized I'd been mistaken and happily removed most of these clauses. The result IMO was tighter, cleaner code that was easier to understand. I suppose it's really a matter of opinion. I like that arrays work the same as the other primitive types.
If your function should deal with both null-arrays and empty-arrays, no extra 
code is required, since the .length property can be accessed for both null-
arrays and emtpy-arrays.
Could it? I suppose so, but the concept seems a tad odd. I kind of expect none of the parameters (besides sizeof, perhaps) to work for dynamic types that have not been initialized. Though perhaps that's the C way of thinking. Sean
Jun 29 2004
parent reply Farmer <itsFarmer. freenet.de> writes:
Sean Kelly <sean f4.ca> wrote in news:cbsqnf$547$1 digitaldaemon.com:

 In article <Xns9517F3F654C29itsFarmer 63.105.9.61>, Farmer says...
The .length parameter would still work with null-arrays (as they
currently do). 
But why would you want to initialize an array to null/empty and then
resize it, instead of 'newing' it with the correct size in first place?
Consider the following:

   char[] str = new char[100];
   str.length = 0;        // A
   str.length = 5;        // B
   str = new char[10];    // C

In A, AFAIK it's legal for the compiler to retain the memory and merely change the length parameter for the string. B then just changes the length parameter again, and no reallocation is performed. C forces a reallocation even if the array already has the (hidden) capacity in place. Lacking allocators, this is a feature I consider rather nice in D.
I agree with you that this feature is quite useful.

The problem with (A) is that DMD doesn't do that; the function 'arraysetlength' explicitly checks whether the new length is zero, and if so destroys the data pointer. Furthermore it seems that it is not allowed to call the .length property for null-arrays. How do I know? Well, the function in the phobos file internal\gc.d

   byte[] _d_arraysetlength(uint newlength, uint sizeelem, Array *p)

contains this assertion

   assert(!p.length || p.data);

Ironically, this assertion permits the data pointer to be null while the length is greater than 0.
 
Extra coding is not required if you don't need null-arrays: if some user
passes a null-array, the user gets a nice access violation/array bounds 
exception and will quickly learn to not pass null-arrays to such
functions. A quick check in the DbC section of your function would do
the job, too. (But I suppose, the user might not adapt that fast that
way :-) 
I originally thought D worked the way you describe and added DBC clauses to all my functions to check for null array parameters. After some testing I realized I'd been mistaken and happily removed most of these clauses. The result IMO was tighter, cleaner code that was easier to understand. I suppose it's really a matter of opinion. I like that arrays work the same as the other primitive types.
I always love it when this happens. Code that isn't written is bug-free, maintainable, and super-fast ;-)
 
If your function should deal with both null-arrays and empty-arrays, no
extra code is required, since the .length property can be accessed for
both null- arrays and emtpy-arrays.
Could it? I suppose so, but the concept seems a tad odd. I kind of expect none of the parameters (besides sizeof, perhaps) to work for dynamic types that have not been initialized. Though perhaps that's the C way of thinking.
Yes, I think it is a bit odd, too. For reading the length property it makes sense, but for resizing it is more questionable. But I am definitely thinking the C way here. Farmer.
Jun 30 2004
next sibling parent reply Regan Heath <regan netwin.co.nz> writes:
On Wed, 30 Jun 2004 22:57:02 +0000 (UTC), Farmer <itsFarmer. freenet.de> 
wrote:
 Sean Kelly <sean f4.ca> wrote in news:cbsqnf$547$1 digitaldaemon.com:

 In article <Xns9517F3F654C29itsFarmer 63.105.9.61>, Farmer says...
 The .length parameter would still work with null-arrays (as they
 currently do).
 But why would you want to initialize an array to null/empty and then
 resize it, instead of 'newing' it with the correct size in first place?
Consider the following:

   char[] str = new char[100];
   str.length = 0;        // A
   str.length = 5;        // B
   str = new char[10];    // C

In A, AFAIK it's legal for the compiler to retain the memory and merely change the length parameter for the string. B then just changes the length parameter again, and no reallocation is performed. C forces a reallocation even if the array already has the (hidden) capacity in place. Lacking allocators, this is a feature I consider rather nice in D.
I agree with you that this feature is quite useful. The problem with (A) is, that DMD doesn't do that; the function 'arraysetlength' explicitly checks whether the new length is null, and if so destroys the data pointer.
Provably correct. :)

--[test.d]--
struct array {
   int length;
   void *data;
}

void main() {
   char[] p = new char[100];
   array *s = cast(array *)&p;
   printf("%d\n", s.length);
   printf("%08x\n", s.data);
   p.length = 0;
   printf("%d\n", s.length);
   printf("%08x\n", s.data);
}

prints

100
007d2f80
0
00000000
 Furthermore it seems that it is not allowed to
 call the .length property for null-arrays.
I can go: p.length = 0; p.length = 0; p.length = 0; p.length = 0; no problem? Is that what you meant?
 How do I know? Well the function in the phobos file internal\gc.d
     	byte[] _d_arraysetlength(uint newlength, uint sizeelem, Array *p)
 contains this assertion
     	assert(!p.length || p.data);
perhaps this function is not called if (p.length == 0 && newlength == 0) one level higher?
 Ironically, this assertion permits, that the data pointer is null, but 
 the
 length is greater than 0.
which is technically impossible. Regan
-- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 30 2004
parent Farmer <itsFarmer. freenet.de> writes:
Sorry, I've posted rubbish.

Farmer.
Jul 01 2004
prev sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <Xns95199C928F73itsFarmer 63.105.9.61>, Farmer says...
How do I know? Well the function in the phobos file internal\gc.d
    	byte[] _d_arraysetlength(uint newlength, uint sizeelem, Array *p)
contains this assertion
    	assert(!p.length || p.data);

Ironically, this assertion permits, that the data pointer is null, but the 
length is greater than 0. 
I read it that the assertion requires either the length to be zero or the length to be nonzero and the data to be non-null. This seems to correspond to my assumption that D allows for zero length arrays to retain allocated memory. Sean
Jun 30 2004
next sibling parent Regan Heath <regan netwin.co.nz> writes:
On Thu, 1 Jul 2004 04:37:37 +0000 (UTC), Sean Kelly <sean f4.ca> wrote:

 In article <Xns95199C928F73itsFarmer 63.105.9.61>, Farmer says...
 How do I know? Well the function in the phobos file internal\gc.d
    	byte[] _d_arraysetlength(uint newlength, uint sizeelem, Array *p)
 contains this assertion
    	assert(!p.length || p.data);

 Ironically, this assertion permits, that the data pointer is null, but 
 the
 length is greater than 0.
I read it that the assertion requires either the length to be zero or the length to be nonzero and the data to be non-null. This seems to correspond to my assumption that D allows for zero length arrays to retain allocated memory.
It may very well allow it (in this code, at this level), but how do you do it? Regan -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 30 2004
prev sibling parent Farmer <itsFarmer. freenet.de> writes:
Sean Kelly <sean f4.ca> wrote in news:cc04eh$2l5e$1 digitaldaemon.com:

 In article <Xns95199C928F73itsFarmer 63.105.9.61>, Farmer says...
How do I know? Well the function in the phobos file internal\gc.d
         byte[] _d_arraysetlength(uint newlength, uint sizeelem, Array
         *p) 
contains this assertion
         assert(!p.length || p.data);

Ironically, this assertion permits, that the data pointer is null, but
the length is greater than 0. 
Rubbish.
 
 I read it that the assertion requires either the length to be zero or
 the length to be nonzero and the data to be non-null.
 This seems to
 correspond to my assumption that D allows for zero length arrays to
 retain allocated memory. 
 
 Sean
 
I blush for shame, this is too embarrassing. What a wimp I am; I can't do simple boolean algebra. What must years of Java(TM) programming have done to me? On the upside, it means that I was wrong. No assertion discourages null or empty-arrays. Yes, memory for zero length arrays is retained if the array is sliced.
Jul 01 2004
prev sibling parent Regan Heath <regan netwin.co.nz> writes:
On Tue, 29 Jun 2004 16:15:58 +0000 (UTC), Sean Kelly <sean f4.ca> wrote:

 In article <opsab6o5rl5a2sq9 digitalmars.com>, Regan Heath says...
 Fine and dandy EXCEPT we *need* to differentiate between empty and
 non-existent strings.
Why? It seems to me that this behavior would also require arrays to be initialized with new rather than resizing from zero using the .length parameter.
Nope. It already works, except for 2 inconsistencies (see the original post)
 And this would result in a ton of extra coding--either in clauses that 
 errored
 on null arrays or initialization code to handle both cases.  No thanks.
Not true. You can/could still simply check the length vs 0 if you want to treat null and empty the same.
 If this
 happened I'd stil using built-in arrays and write a class for the 
 purpose.
? 'stil' == 'stop' ? Regan -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 29 2004
prev sibling next sibling parent reply Farmer <itsFarmer. freenet.de> writes:
Andy Friesen <andy ikagames.com> wrote in
news:cbpsi6$1u7d$1 digitaldaemon.com: 

[snip]
 C++ containers cannot represent null either.  D will (and does) get 
 along just fine if its array type works the same way.
[snip] And probably that is one reason why programmers don't use std::vector. <rant> If I wanted to use sth. like std::vector, I'd simply use it in D. But if I want to get to the *bare metal*, I want the *bare metal*. No less. I don't want sth. that is similar to std::vector (just better tuned for performance), tightly integrated (or coupled, in book) with the language, with some odd syntax and superfluous but still incomplete properties like array.sort. Even if that means that I have to code a bubble sort all the time myself ;-) <end of rant> Farmer.
Jun 29 2004
parent reply Andy Friesen <andy ikagames.com> writes:
Farmer wrote:

 Andy Friesen <andy ikagames.com> wrote in
 news:cbpsi6$1u7d$1 digitaldaemon.com: 
 
 [snip]
 
C++ containers cannot represent null either.  D will (and does) get 
along just fine if its array type works the same way.
[snip] And probably that is one reason why programmers don't use std::vector.
They don't? Do you have a source to back that up? As far as I've ever noticed, bigwig C++ people have always made it clear that std::vector is preferable over an array and that std::string is preferable to a char*. The concern for distinguishing empty vs null has quite honestly never even occurred to me until it was mentioned here. Think about expressing the distinction a different way and move on. I do apologize if I sound naive, (I'll assume that comment was directed at me :) ) but I honestly can't comprehend a situation in which the distinction is going to have any measurable cost on clarity, let alone performance. -- andy
Jun 29 2004
next sibling parent Regan Heath <regan netwin.co.nz> writes:
On Tue, 29 Jun 2004 18:16:25 -0700, Andy Friesen <andy ikagames.com> wrote:
 Farmer wrote:

 Andy Friesen <andy ikagames.com> wrote in
 news:cbpsi6$1u7d$1 digitaldaemon.com: [snip]

 C++ containers cannot represent null either.  D will (and does) get 
 along just fine if its array type works the same way.
[snip] And probably that is one reason why programmers don't use std::vector.
They don't? Do you have a source to back that up? As far as I've ever noticed, bigwig C++ people have always made it clear that std::vector is preferable over an array and that std::string is preferable to a char*. The concern for distinguishing empty vs null has quite honestly never even occurred to me until it was mentioned here. Think about expressing the distinction a different way and move on.
Sure.. can you show me how. I am having trouble doing it, it must be my C fixated brain. Pls use the example in the post I made to you earlier today..
 I do apologize if I sound naive, (I'll assume that comment was directed 
 at me :) )
LOL.. I thought it was me..
 but I honestly can't comprehend a situation in which the distinction is 
 going to have any measurable cost on clarity, let alone performance.
I think my example in my previous post does show a cost on either or both. Basically I think a reference type allows me to *express* more than a value type does. Regan. -- Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 29 2004
prev sibling parent reply Farmer <itsFarmer. freenet.de> writes:
Andy Friesen <andy ikagames.com> wrote in
news:cbt41t$i1n$1 digitaldaemon.com: 

 Farmer wrote:
 
 Andy Friesen <andy ikagames.com> wrote in
 news:cbpsi6$1u7d$1 digitaldaemon.com: 
 
 [snip]
 
C++ containers cannot represent null either.  D will (and does) get 
along just fine if its array type works the same way.
[snip] And probably that is one reason why programmers don't use std::vector.
They don't? Do you have a source to back that up? As far as I've ever noticed, bigwig C++ people have always made it clear that std::vector is preferable over an array and that std::string is preferable to a char*.
Sorry, my statement was badly expressed. I meant it more like "And probably that is another reason why programmers often refrain from using std::vector." Of course, programmers use std::vector, otherwise I'd have said that I am not a programmer ;-)
 
 The concern for distinguishing empty vs null has quite honestly never 
 even occurred to me until it was mentioned here.  Think about expressing
 the distinction a different way and move on.
I expect that this concern will rarely come up, and that's exactly why I brought it up. I would move on, but I see no compelling reason to express it in a different way.
 
 I do apologize if I sound naive, (I'll assume that comment was directed 
 at me :) ) but I honestly can't comprehend a situation in which the 
 distinction is going to have any measurable cost on clarity, let alone 
 performance.
 
   -- andy
I was naive in believing that it was obvious what posts I referred to. I was thinking e.g. of post http://www.digitalmars.com/drn-bin/wwwnews/23126 Btw, the author of that post happens to use the term naive, so he shouldn't take offense. But in fact, that post doesn't really advocate 'NaN' for ints; rather http://www.digitalmars.com/drn-bin/wwwnews/23100 does so. Sorry, andy and sorry Regan. You didn't suggest 'NaN' for ints. So no f(l)ame(s) for you... Farmer.
Jun 30 2004
next sibling parent reply "Bent Rasmussen" <exo bent-rasmussen.info> writes:
I hope you're not referring to the quick hack I posted. It was meant to
express the *conceptual* problem of returning a null value for a value
type -- *not* a practical one. It was mentioned in the context of the ML
option type.

ps. Both links are broken.
Jun 30 2004
parent Farmer <itsFarmer. freenet.de> writes:
"Bent Rasmussen" <exo bent-rasmussen.info> wrote in 
news:cbvk1g$1r9b$1 digitaldaemon.com:

 I hope you're not referring to the quick hack I posted. It was meant to
 express the *conceptual* problem of returning a null value for a value
 type -- *not* a practical one. It was mentioned in the context of the ML
 option type.
 
 ps. Both links are broken.
 
 
You suggested none's for int's but you don't use the term naive in your posts. So no f(l)ame(s) for you, either. Try these: http://www.digitalmars.com/drn-bin/wwwnews?D/29213 http://www.digitalmars.com/drn-bin/wwwnews?D/23120 Farmer.
Jul 01 2004
prev sibling parent reply Regan Heath <regan netwin.co.nz> writes:
On Wed, 30 Jun 2004 22:57:04 +0000 (UTC), Farmer <itsFarmer. freenet.de> 
wrote:
 Andy Friesen <andy ikagames.com> wrote in
 news:cbt41t$i1n$1 digitaldaemon.com:

 Farmer wrote:

 Andy Friesen <andy ikagames.com> wrote in
 news:cbpsi6$1u7d$1 digitaldaemon.com:

 [snip]

 C++ containers cannot represent null either.  D will (and does) get
 along just fine if its array type works the same way.
[snip] And probably that is one reason why programmers don't use std::vector.
They don't? Do you have a source to back that up? As far as I've ever noticed, bigwig C++ people have always made it clear that std::vector is preferable over an array and that std::string is preferable to a char*.
Sorry, my statement was badly expressed. I meant it more like "And probably that is another reason why programmers often refrain from using std:vector." Of course, programmers use std::vector, otherwise I'd said that I am not a programmer ;-)
 The concern for distinguishing empty vs null has quite honestly never
 even occurred to me until it was mentioned here.  Think about expressing
 the distinction a different way and move on.
I expect that this concern will rarely come up, and that's exactly why I brought it up. I would move on, but I see no compelling reason to express it in a different way.
 I do apologize if I sound naive, (I'll assume that comment was directed
 at me :) ) but I honestly can't comprehend a situation in which the
 distinction is going to have any measurable cost on clarity, let alone
 performance.

   -- andy
I was naive in believing that it is obvious what posts I referred to. I was thinking e.g. of post http://www.digitalmars.com/drn-bin/wwwnews/23126 Btw, the author of this post, happens to use the term naive, so he shouldn't take offense.
Was it me.. these links don't work for me :(
 But in fact, this post doesn't really advocate 'NaN' for ints, rather
     	http://www.digitalmars.com/drn-bin/wwwnews/23100
 does so.
linky no worky :(
 Sorry, andy and sorry Regan.
 You didn't suggest 'NaN' for ints. So no f(l)ame(s) for you...
Aww.. AFAIKS we either need a NaN value for all value types, OR we use reference types instead.

Arrays in D act just like reference types (except for the inconsistencies you have shown) even though they aren't technically. What I want to know is: what effect will changes to those inconsistencies actually have on people who do not need to be able to tell a null array from an empty one?

Regan.

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 30 2004
parent Farmer <itsFarmer. freenet.de> writes:
Regan Heath <regan netwin.co.nz> wrote in 
news:opsafmvd1m5a2sq9 digitalmars.com:

[snip]

 Arrays in D act just like reference types (except for the inconsitencies 
 you have shown) even tho they aren't technically, what I want to know is, 
 what effect will changes to those inconsistencies actually have to people 
 who do not need to be able to tell a null array from an empty one?
The impact for code that doesn't need to distinguish between null arrays and empty arrays depends on

1) the semantics of null arrays regarding the .length property and the opCat operator,
2) whether null arrays are disallowed by a function's interface contract,
3) whether a function should treat null arrays and empty arrays in the same way.

Regarding item 1), I assume these semantics:
- Reading and writing of the length property is allowed. Writing the length property always yields an array of the given size, so nullarray.length=0 turns the null array 'nullarray' into an empty array.
- The opCat operator allows null arrays for both arguments, so nullarray.opCat(nullarray2) creates an empty array.

Regarding item 2):
Non-local arrays should be initialized to an empty array. Local arrays should be initialized to an empty array instead of a null array. Note that local arrays that are not explicitly initialized are not permitted anyway (see section 'Local Variables' in function.htm of the D spec). (But as DMD doesn't enforce this yet, such illegal D code might be quite common.) As with all reference types that are passed to a function, putting in an assertion to check for the disallowed null case is a good idea. If the D language permits different objects that are never physically changed to be allocated only once, then there is almost no performance penalty for using empty arrays instead of null.

Regarding item 3):
Code needs the same changes as described for item 2. Additionally, any array parameters must be checked against null and, if necessary, converted to empty arrays, e.g.

   if (array is null) array = new char[0];

Of course, a templated function could do that. For many "low-level" functions null arrays can be treated as empty arrays without any additional checks, since the length property can still be accessed. But "high-level" functions typically have to deal with null arrays explicitly, because they would depend on functions that disallow null arrays.

Farmer.
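For illustration, a minimal sketch of both patterns (the helper and function names are made up):

// Item 3): fold null arrays into empty ones before further processing.
char[] normalizeArray(char[] a)
{
   if (a is null)
      a = new char[0];   // zero length, but no longer null
   return a;
}

// Item 2): an interface that simply disallows null arrays.
void consume(char[] a)
{
   assert(!(a is null));   // contract: callers must not pass a null array
   // ... a.length, concatenation etc. can then be used without further checks
}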
Jul 03 2004
prev sibling parent Farmer <itsFarmer. freenet.de> writes:
Andy Friesen <andy ikagames.com> wrote in
news:cbpsi6$1u7d$1 digitaldaemon.com: 

 Regan Heath wrote:
 
 ... I could return existance and
 fill a passed char[]...  so my code now looks like...
 
 char[] s;
 if (getValue("foo",s))
I like this. It's simple and obvious.
An expression like if (getValue("foo",s) == true) doesn't tell the maintainer much. An enumeration is needed to fully express the intent. [snip]
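For instance, such an enumeration might look like this (just a sketch, with made-up names and a dummy body):

enum Lookup { Missing, Empty, HasValue }

Lookup getValue(char[] name, out char[] value)
{
   // dummy body, for illustration only
   value = null;
   return Lookup.Missing;
}

void main()
{
   char[] s;
   if (getValue("foo", s) == Lookup.Missing)
   {
      // keep the current setting
   }
}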
 
 Exposing POST data as an associative array seems like a win to me; it's 
 faster and can be iterated over conveniently.  Also, as a language
 intrinsic, it's a bit more likely to plug into other APIs easily.
 
 If you *really* need to, you could probably get away with doing 
 something like:
 
      const char[] nadda = "nadda";
      if (s is not nadda) { ... }
 
   -- andy
I see one issue with associative arrays here. It would break the encapsulation of the class: the internal data would be revealed. If your internal data structure is different, you must convert the internal data to an associative array. At the least, a call to .dup would be needed as a safety practice.

Farmer.
Jun 30 2004
prev sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <Xns9515C8A3CA1ACitsFarmer 63.105.9.61>, Farmer says...
Why are there (almost) no complaints about D's support for empty arrays?
Actually, I think that D has got it right here. At least mostly. I'm happy with the fact that null counts as an empty array. But I do have SOME gripes. These are:

(1) Given that a is an array of length n, the expression a[n..n] gives an array bounds exception, and I don't believe it should. I would prefer that it simply evaluated to an empty string. I've lost count of the number of times I've had to put a special test for this case in various bits of code. It's a fairly normal thing to do, to have a pointer (or index in this case) to the first element BEYOND the last one in which you're interested, and to slice against it. Currently you get the assert if n == a.length. I don't believe it should assert unless n >= a.length.

(2) I think it is wrong that the test (a == null) will return true if and only if BOTH the length AND the address are zero. I think, if we're going to have a model in which the statement a = null; will create an empty array, then (a == null) should return true if a /is/ an empty array. That is, only the length should be tested, not the address. (If you want to test both parts, well there's always a === null.)

Arcane Jill
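Spelled out as helpers, gripe (2) amounts to this (a sketch; the names are made up, and 'is' behaves as in the sample at the top of the thread):

// What (a == null) is proposed to mean: a pure length test.
bool isEmptyOrNull(char[] a)
{
   return a.length == 0;   // true for null arrays and empty arrays alike
}

// What (a === null) already means: identity, i.e. length AND address are zero.
bool isNull(char[] a)
{
   return a is null;
}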
Jun 27 2004
next sibling parent reply Regan Heath <regan netwin.co.nz> writes:
On Sun, 27 Jun 2004 18:58:50 +0000 (UTC), Arcane Jill 
<Arcane_member pathlink.com> wrote:
 In article <Xns9515C8A3CA1ACitsFarmer 63.105.9.61>, Farmer says...
 Why are there (almost) no complaints about D's support for empty arrays?
Actually, I think that D has got it right here. At least mostly. I'm happy with the fact that null counts as an empty array. But I do have SOME gripes. These are: (1) given that a is an array of length n, the expression a[n..n] gives an array bounds exception, and I don't believe it should. I would prefer that it simply evaluated to an empty string. I've lost count of the number of times I've had to put a special test for this case in various bits of code. It's a fairly normal thing to do, to have a pointer (or index in this case) to the first element BEYOND the last one in which you're interested, and to slice against it. Currently you get the assert if n == a.length. I don't believe it should assert unless n >= a.length
This (now?) works.

void main()
{
   char[] a;
   a ~= "1";
   a ~= "2";
   a ~= "3";
   printf("%.*s\n",a[3..3]);
   printf("%.*s\n",a[2..3]);
   printf("%.*s\n",a[1..3]);
   printf("%.*s\n",a[0..3]);
}
 (2) I think it is wrong that the test (a == null) will return true if 
 and only
 if BOTH the length AND the address are zero.
I think this is correct.
 I think, if we're going to have a
 model in which the statement a = null; will create an empty array,
I think this is wrong. a = null should set the data to null and length to 0. It should *not* create an empty array.
 then (a ==
 null) should return true if a /is/ an empty array. That is, only the 
 length
 should be tested, not the address. (If you want to test both parts, well 
 there's
 always a === null).
We *need* to have *both* null and empty arrays. The reason is pretty simple:
   - null means does not exist
   - empty means exists, but has no value (or empty value)

This is important in situations like the one the original poster mentioned, and in my experience too. For example, when reading POST input from a web page, you get a string like so:

   Setting1=Regan+Heath&Setting2=&&

When requesting items you might have a function like:

   char[] getFormValue(char[] label);

The code to get the values for the above form might go:

   char[] s;
   s = getFormValue("Setting1"); // s is "Regan Heath"
   s = getFormValue("Setting2"); // s is ""
   s = getFormValue("Setting3"); // s is null

It is important that the above code can tell that Setting3 was not passed in the form, so it can decide not to overwrite whatever current value that setting has, whereas it can tell Setting2 was passed and will overwrite the current value with a new blank one.

I think the problem with arrays is that a null array should not compare equal to an empty array. In other words the original post's tests

   null1 == ""
   null1 == empty1

should be false.

Regan.

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
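A minimal sketch of such a lookup, assuming the POST body has already been parsed into an associative array (the parsing itself is omitted and the names are made up):

char[][char[]] form;   // e.g. form["Setting1"] == "Regan Heath", form["Setting2"] == ""

char[] getFormValue(char[] label)
{
   if (!(label in form))
      return null;        // field was not submitted at all
   return form[label];    // may legitimately be "" (submitted, but blank)
}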
Jun 27 2004
next sibling parent Derek <derek psyc.ward> writes:
On Mon, 28 Jun 2004 10:06:18 +1200, Regan Heath wrote:

[snip]

 
 We *need* to have *both* null and empty arrays. The reason is pretty 
 simple:
    - null means does not exist
    - emtpy means exists, but has no value (or empty value)
 
Agreed. A non-existent array is not the same as an array with no elements.

--
Derek
Melbourne, Australia
Jun 27 2004
prev sibling next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <opr99w0st25a2sq9 digitalmars.com>, Regan Heath says...
 (1) given that a is an array of length n, the expression a[n..n] gives 
 an array
 bounds exception,
This (now?) works.
Indeed, I think it has always worked. It was just me misremembering the problem. I'll start again. What I MEANT was...

Given that a is an array of length n, the expression &a[n] gives an array bounds exception. And I don't believe it should. Taking the address of the first byte beyond the end of an array can be a very useful thing to do. In particular, if a is an empty array, then &a[0] asserts, which means that code intended to fill an array from a FILE*-type stream will fall over if a is empty. And there's no reason why it should - fread is quite happy to be passed a length of zero. Same goes for functions like memset() and so on.

The fact of not being able to take &a[a.length] creates an awkwardness that we have to code around. Such a call would have to be encased in an if test in order not to assert - and you might think: So what? This is no big deal. But having to make that explicit test time and time again can start to get annoying.

It should not, in my opinion, be an error to evaluate &a[a.length];

Arcane Jill
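The kind of call in question might look like the following sketch (a reconstruction, not the original snippet; it assumes std.c.stdio provides fread):

import std.c.stdio;   // FILE, fread

void fill(ubyte[] a, FILE* f)
{
   // fread itself is perfectly happy with a length of zero,
   // but &a[0] asserts first whenever a is empty.
   fread(&a[0], 1, a.length, f);
}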
Jun 28 2004
next sibling parent Sean Kelly <sean f4.ca> writes:
In article <cbpkes$1ip0$1 digitaldaemon.com>, Arcane Jill says...
Indeed, I think it has always worked. It was just me misremembering the problem.
I'll start again. What I MEANT was...

Given that a is an array of length n, the expression &a[n] gives an array bounds
exception. And I don't believe it should. Taking the address of the first byte
beyond the end of an array can be a very useful thing to do. 
Yes it is. But I think it's the syntax that's the problem in this case. IIRC using the subscript operator (i.e. [n]) dereferences the element. So what you're doing when you write &a[n] is calculating the address of the element at position n. Since no such element exists, the expression fails. In C the correct thing to do would be to use (a+n) instead.

Just to make sure I was right, I dug this quote out of the C++ standard (5.2.1): "The expression E1[E2] is identical (by definition) to *((E1)+(E2))."
The fact of not being able to take &a[a.length] creates an akwardness that we
have to code around.
A possibility would be to have the compiler treat &a[n] as a special case... since the address-of operator is present, it could treat this expression as equivalent to: "a+n" rather than "&*(a+n)" Sean
Jun 28 2004
prev sibling next sibling parent reply Andy Friesen <andy ikagames.com> writes:
Arcane Jill wrote:

 The fact of not being able to take &a[a.length] creates an akwardness that we
 have to code around. The above example would have to be encased in an if test
in
 order not to assert - and you might think: So what? This is no big deal. But
 having to make that explicit test time and time again can start to get
annoying.
 
 It should not, in my opinion, be an error to evaluate &a[a.length];
Something which just occurred to me that would resolve this issue would be to add two properties to array types: begin and end. These properties would be pointer types which point to the beginning and end of the array's contents. (exactly like C++ iterators)

T[] buffer = ...;

// buffer.length makes more sense than end-begin in this case.
// Bear with me: it's an example :)
fread(buffer.begin, T.sizeof, buffer.end - buffer.begin, fileHandle);

 -- andy
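Until such properties exist, something close can be had with free functions (a sketch for char[] only; the names just mirror the proposal):

char* begin(char[] a) { return cast(char*)a; }
char* end(char[] a)   { return cast(char*)a + a.length; }

// e.g., following the fread example above:
//    fread(begin(buffer), char.sizeof, end(buffer) - begin(buffer), fileHandle);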
Jun 28 2004
parent reply Sean Kelly <sean f4.ca> writes:
In article <cbprfd$1sq9$1 digitaldaemon.com>, Andy Friesen says...
Something which just occurred to me that would resolve this issue would 
be to add two properties to array types: begin and end.  These 
properties would be pointer types which point to the beginning and end 
of the array's contents.  (exactly like C++ iterators)
This might be very handy. If so, I wouldn't mind seeing rbegin and rend properties as well, though. Plus, it raises the question of what they return for associative arrays.

Sean
Jun 28 2004
parent reply Sam McCall <tunah.d tunah.net> writes:
Sean Kelly wrote:

 In article <cbprfd$1sq9$1 digitaldaemon.com>, Andy Friesen says...
 
Something which just occurred to me that would resolve this issue would 
be to add two properties to array types: begin and end.  These 
properties would be pointer types which point to the beginning and end 
of the array's contents.  (exactly like C++ iterators)
This might be very handy. If so, I wouldn't mind seeing rbegin and rend parameters as well though.
Huh? They're pointers... wouldn't rbegin == end and rend == begin? I think I missed the point...
 Plus, it raises the question of what they return for
 associative arrays.
The concept doesn't apply to associative arrays afaics, so they wouldn't exist. Sam
Jun 29 2004
parent reply Sean Kelly <sean f4.ca> writes:
In article <cbrhd9$1a0o$1 digitaldaemon.com>, Sam McCall says...
 This might be very handy.  If so, I wouldn't mind seeing rbegin and rend
 parameters as well though.
Huh? They're pointers... wouldn't rbegin == end and rend == begin? I think I missed the point...
Actually, rbegin == end-1 and rend == begin-1.
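Spelled out with raw pointers (purely illustrative; the arithmetic mirrors the definitions above):

void main()
{
   char[] a = "abc".dup;
   char* rbegin = cast(char*)a + a.length - 1;   // end - 1
   char* rend   = cast(char*)a - 1;              // begin - 1
   for (char* p = rbegin; p != rend; p--)
      printf("%c", *p);                          // prints "cba"
   printf("\n");
}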
 Plus, it raises the question of what they return for
 associative arrays.
The concept doesn't apply to associative arrays afaics, so they wouldn't exist.
It does apply to associative arrays IMO. I iterate through the contents of such containers quite regularly in C++. I've done something similar with an iterator wrapper for associative arrays in D, but it would be nice to have this built-in if we move towards the iterator methodology. Sean
Jun 29 2004
parent reply Sam McCall <tunah.d tunah.net> writes:
Sean Kelly wrote:
 In article <cbrhd9$1a0o$1 digitaldaemon.com>, Sam McCall says...
 
This might be very handy.  If so, I wouldn't mind seeing rbegin and rend
parameters as well though.
Huh? They're pointers... wouldn't rbegin == end and rend == begin? I think I missed the point...
Actually, rbegin == end-1 and rend == begin-1.
Oops. Yeah, this would be useful.
Plus, it raises the question of what they return for
associative arrays.
The concept doesn't apply to associative arrays afaics, so they wouldn't exist.
It does apply to associative arrays IMO. I iterate through the contents of such containers quite regularly in C++. I've done something similar with an iterator wrapper for associative arrays in D, but it would be nice to have this built-in if we move towards the iterator methodology.
We're talking about pointers for low-level iteration; this doesn't apply to associative arrays, whose data structure is opaque. I don't think we're moving towards iterators, just talking about pointers. The fact that iterators pretend to be pointers in their syntax is neither here nor there ;) If you really want "official" iterators, there's always (or will always be) the DTL...

Sam
Jun 29 2004
parent Sean Kelly <sean f4.ca> writes:
In article <cbt5vu$kdb$1 digitaldaemon.com>, Sam McCall says...
We're talking about pointers for low level iteration, this doesn't apply 
to associative arrays, who's data structure's opaque. I don't think 
we're moving towards iterators, just talking about pointers. The fact 
that iterators pretend to be pointers in their syntax is neither here 
nor threre ;)
This is easy enough to do with free functions anyway. Something like:

alias char[][char[]] StrMap;

StrMap map;
Iterator!(Pair!(char[],char[])) i = begin!(StrMap)( map );

I'm sure the syntax could be improved but you get the idea. I've already experimented with such iterators for associative arrays and they work just fine.

Sean
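For comparison, the built-in route without any iterator types looks roughly like this (a sketch using the .keys property):

void main()
{
   char[][char[]] map;
   map["one"] = "1";
   map["two"] = "2";

   foreach (char[] k; map.keys)
      printf("%.*s -> %.*s\n", k, map[k]);
}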
Jun 30 2004
prev sibling next sibling parent Farmer <itsFarmer. freenet.de> writes:
Arcane Jill <Arcane_member pathlink.com> wrote in
news:cbpkes$1ip0$1 digitaldaemon.com: 


 Given that a is an array of length n, the expression &a[n] gives an
 array bounds exception. And I don't believe it should. Taking the
 address of the first byte beyond the end of an array can be a very
 useful thing to do. 
The expression cast(elementtype*)a+n does that. E.g. to get rid of the annoying bounds-checking you could write:

   // given ubyte[] a;
   fread(cast(ubyte*)a+0, ubyte.size, a.length, fp);

Farmer.
Jun 28 2004
prev sibling next sibling parent Regan Heath <regan netwin.co.nz> writes:
On Mon, 28 Jun 2004 17:27:56 +0000 (UTC), Arcane Jill 
<Arcane_member pathlink.com> wrote:

 In article <opr99w0st25a2sq9 digitalmars.com>, Regan Heath says...
 (1) given that a is an array of length n, the expression a[n..n] gives
 an array
 bounds exception,
 This (now?) works.
Indeed, I think it has always worked. It was just me misremembering the problem. I'll start again. What I MEANT was... Given that a is an array of length n, the expression &a[n] gives an array bounds exception. And I don't believe it should. Taking the address of the first byte beyond the end of an array can be a very useful thing to do. In particular, if a is an empty array, then &a[0] asserts, which means that code like this: intended to fill an array from a FILE*-type stream, will fall over if a is empty. And there's no reason why it should - fread is quite happy to be passed a length of zero. Same goes for functions like memset() and so on. The fact of not being able to take &a[a.length] creates an akwardness that we have to code around. The above example would have to be encased in an if test in order not to assert - and you might think: So what? This is no big deal. But having to make that explicit test time and time again can start to get annoying. It should not, in my opinion, be an error to evaluate &a[a.length];
Interestingly..

void main()
{
   char[] p,s;
   s.length = 10;
   printf("%08x\n",&s[0]);
   printf("%08x\n",&p[0]);
}

D:\D\src\build\temp>dmd arr.d
d:\D\dmd\bin\..\..\dm\bin\link.exe arr,,,user32+kernel32/noi;

D:\D\src\build\temp>arr
007d0fd0
Error: ArrayBoundsError arr.d(6)

So it seems Sean is indeed correct about what p[0] is doing (de-referencing the element).

Regan

--
Using M2, Opera's revolutionary e-mail client: http://www.opera.com/m2/
Jun 28 2004
prev sibling parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Arcane Jill wrote:

 In article <opr99w0st25a2sq9 digitalmars.com>, Regan Heath says...
 (1) given that a is an array of length n, the expression a[n..n] gives
 an array
 bounds exception,
This (now?) works.
Indeed, I think it has always worked. It was just me misremembering the problem. I'll start again. What I MEANT was... Given that a is an array of length n, the expression &a[n] gives an array bounds exception. And I don't believe it should. Taking the address of the first byte beyond the end of an array can be a very useful thing to do.
No, I disagree here. In general, that address would point to nothing. Reading there is pointless, writing is dangerous. If you want to append to a string by doing a low-level write to memory, then increment the length first and write afterwards.

The way you could phrase it: in some cases it would be convenient if it were not an error to take that address, provided it is not then used afterwards. But still, I don't see that coding around that "limitation" is that much of an effort. It gives you a few if-clauses around expressions, so what?
Jun 29 2004
parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <cbr57k$p0m$1 digitaldaemon.com>, Norbert Nemec says...
 Given that a is an array of length n, the expression &a[n] gives an array
 bounds exception. And I don't believe it should. Taking the address of the
 first byte beyond the end of an array can be a very useful thing to do.
No, I disagree here. In general, that address would point to nothing. Reading there is pointless, writing is dangerous.
Such a pointer is never used for reading OR writing. It /is/, however, used in pointer comparison expressions, and in such a context it is perfectly meaningful and safe. But anyway, Farmer tells me I can write cast(elementtype*)a+n, so I'm happy.
If you want to append to
a string by doing a low-level write to memory,
I never said I wanted to do any such thing. Arcane Jill
Jun 29 2004
parent reply Norbert Nemec <Norbert.Nemec gmx.de> writes:
Arcane Jill wrote:

 Such a pointer is never used for reading OR writing. It /is/, however,
 used in pointer comparison expressions, and in such context, is perfectly
 meaningful, and safe.
True, you have a point there - I really don't know what to think about it.
 But anyway, Farmer tells me I can write cast(elementtype*)a+n, so I'm
 happy.
Well - that's a workaround but not a clean solution.
Jun 29 2004
parent Farmer <itsFarmer. freenet.de> writes:
Norbert Nemec <Norbert.Nemec gmx.de> wrote in 
news:cbrogr$1jp7$1 digitaldaemon.com:

 Arcane Jill wrote:
 
[snip]
 
 But anyway, Farmer tells me I can write cast(elementtype*)a+n, so I'm
 happy.
Well - that's a workaround but not a clean solution.
In Jill's example, a *C* function expects a pointer to anything, not a D array. So, I think, it makes perfect sense to convert the D array to the pointer type first, and then do pointer arithmetic as in C. (If you need the behaviour of a pointer, use one.)

Farmer.
Jun 29 2004
prev sibling parent Farmer <itsFarmer. freenet.de> writes:
Regan Heath <regan netwin.co.nz> wrote in 
news:opr99w0st25a2sq9 digitalmars.com:
[snip]
 
 I think the problem with arrays is that a null array should not compare 
 equal to an empty array. In other words the original post test(s)
    null1 == ""
    null1 == empty1
 
 should be false.
 
Exactly, otherwise the equals() method would not be transitive. (Of course, we could also make (empty1 == null) evaluate to true by completely banning empty-arrays from the D sphere.) Regards, Farmer.
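To spell the non-transitivity out (the results are those reported by the sample at the top of the thread, built with DMD 0.93):

void main()
{
   char[] null1;                          // null array
   char[] empty1 = (new char[1])[0..0];   // empty, but not null

   assert(empty1 == null1);    // true
   assert(null1 == null);      // true
   // assert(empty1 == null);  // would fail: == is not transitive
}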
Jun 28 2004
prev sibling parent reply Farmer <itsFarmer. freenet.de> writes:
Arcane Jill <Arcane_member pathlink.com> wrote in
news:cbn5da$vu1$1 digitaldaemon.com: 

 In article <Xns9515C8A3CA1ACitsFarmer 63.105.9.61>, Farmer says...
Why are there (almost) no complaints about D's support for empty arrays?
Actually, I think that D has got it right here. At least mostly. I'm happy with the fact that null counts as an empty array. But I do have SOME gripes. These are: (1) given that a is an array of length n, the expression a[n..n] gives an array bounds exception, and I don't believe it should. I would prefer that it simply evaluated to an empty string. I've lost count of the number of times I've had to put a special test for this case in various bits of code. It's a fairly normal thing to do, to have a pointer (or index in this case) to the first element BEYOND the last one in which you're interested, and to slice against it. Currently you get the assert if n == a.length. I don't believe it should assert unless n >= a.length
I'm a bit confused, since in my sample the array 'empty2' is created from a slice that points just past the end of the array, and it didn't cause an array bounds exception. Or did you need empty slices that point at arbitrary memory locations?
 (2) I think it is wrong that the test (a == null) will return true if
 and only if BOTH the length AND the address are zero. I think, if we're
 going to have a model in which the statement a = null; will create an
 empty array, then (a == null) should return true if a /is/ an empty
 array. That is, only the length should be tested, not the address. (If
 you want to test both parts, well there's always a === null).
I guess the rule here is simple: for value types (as the array handle is one), ==/equals() is exactly the same as ===/is. But why should we model arrays in a way that makes them less powerful and requires *additional* code to make the model work correctly?

Regards,
Farmer.
Jun 27 2004
parent Farmer <itsFarmer. freenet.de> writes:
Farmer <itsFarmer. freenet.de> wrote in
news:Xns951699362221itsFarmer 63.105.9.61: 

 Arcane Jill <Arcane_member pathlink.com> wrote in
 news:cbn5da$vu1$1 digitaldaemon.com: 
 
 In article <Xns9515C8A3CA1ACitsFarmer 63.105.9.61>, Farmer says...
 (2) I think it is wrong that the test (a == null) will return true if
 and only if BOTH the length AND the address are zero. I think, if we're
 going to have a model in which the statement a = null; will create an
 empty array, then (a == null) should return true if a /is/ an empty
 array. That is, only the length should be tested, not the address. (If
 you want to test both parts, well there's always a === null).
I guess the rule here is simple: For value types (as the array handle is one) ==/equals() is exactly the same as ===/is.
My mistake, forget about that sentence, it is utter rubbish: for primitive value types, ===/is behaves like ==/equals(), rather than the other way round. Furthermore, array handles aren't primitive types.
Jun 28 2004