
digitalmars.D - suggestion: read-only array-reference

reply Hasan Aljudy <hasan.aljudy gmail.com> writes:
I've posted a draft of this proposal on D.Learn but I think not many 
people visit there, plus this time it's a bit simplified.

This is a suggestion that aims to guarantee compile-time "read-only" 
correctness for arrays passed to and/or returned from functions.

Introduce a new type of array reference that provides a "read-only" 
interface to the contents of the array.

I'm using the term "array reference" because array types in D are 
actually references.

This proposed reference is a new type that is strictly enforced by the 
compiler.

You can make a read-only array reference refer to any array (including 
slices, of course), but this reference doesn't provide any way to 
modify the contents of the array or its length.

Any normal array can be converted/cast to a read-only reference, but 
read-only references can never be cast to anything.

Moreover, normal array references should be implicitly castable to 
read-only array references.
Normal arrays should provide a property "readonly" that returns a 
read-only reference to the array.
So, if xyz is an array, then xyz.readonly returns a read-only reference 
to xyz.

Of course, if you hold a normal reference to an array that's also 
referred to by a read-only reference, you can still change the array 
using the normal reference, but that's not the point.

The point is, you can pass the read-only reference to functions, knowing 
for sure that these functions cannot change the array. Moreover, this 
guarantee is enforced by the compiler at compile-time.

Just as the compiler will not let you call
foo.bar();
if foo doesn't expose a method called bar(),
the compiler will not let you call
abc[4] = 'h';
if abc is a read-only reference.

The proposed read-only reference allows you to:
-read elements from the array and iterate over them
-read a slice (the slice operation returns a read-only reference)
-read the length
-let it refer to another array

It doesn't allow you to:
-write to/change any element
-write to/change any property
-get a pointer to the array
-cast it to a normal array


It's basically something like the following class, but built into the 
language as another array type:
class ReadOnlyArrayReference(T)
{
	private:
	T[] array; //internal array reference
	
	public:

	this()
	{
	}
		
	this( T[] a )
	{
		array = a;
	}

	void set( T[] a )
	{
		array = a;
	}

	typeof(array.length) length()
	{
		return array.length;
	}

	typeof(array.dup) dup()
	{
		return array.dup;
	}

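	// read-only element access; note there is no opIndexAssign
	typeof(array[0]) opIndex( size_t i )
	{
		return array[i];
	}

	// slicing yields another read-only reference
	typeof(this) opSlice( size_t i, size_t j )
	{
		return new ReadOnlyArrayReference(array[i..j]);
	}

	//probably some more ...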
}

It's just a wrapper around an array, but it hides the actual array 
reference and doesn't provide any interface for changing the contents of 
the array.

Of course, if the array contains object references, then you cannot 
change the references, but you can still change the objects themselves 
using the references.
However, I believe that's not a big issue.

The only thing that I can't figure out is what syntax is appropriate.
Currently, we use [] to declare arrays:

type[] x; //x is a reference to an array.

Maybe we can use [!] to declare read only references:
type[!] x; //x is a read-only reference to an array.
or something like that.

It should still be OK to create complicated types, like:

type[!][3] x; // x is an array of 3 read-only array references

and so on ...

If this gets implemented, then Phobos could be rewritten to always take 
read-only references to char arrays (strings) and the like.
This implies that the COW (copy-on-write) protocol wouldn't be needed, 
since you can't write to these arrays anyway.

Examples of usage:
--------------
//assume abc is already declared as a read-only reference to a char array
//holding the string "hello";

char c = abc[3]; //ok, you can read elements
abc[2] = 'x'; //error, can't write to elements
auto x = abc.ptr; //error, can't get pointer and/or no such property
abc.length = 7; //error, can't write to a property and/or no set method
                //exists for property "length"
abc = "audi"; //ok, abc now refers to another array, the original
               //string "hello" remains intact.
auto r = abc.length; //ok, you can read the length property
char[] b = abc; //error, cast not allowed.
char[] d = abc.dup; //ok, duplicating produces a normal array reference
                    //to the duplicated array
foreach( a; abc ) //ok, you can iterate over the array
{
  ....
}
------------

All errors indicated above are compile-time errors.
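
The checks in the usage example can be approximated with a library type. What follows is only a rough sketch, assuming a present-day D2 compiler (string literals are immutable there, hence the .dup calls). ReadOnly is a hypothetical stand-in for the proposed type[!], not an existing Phobos type, and iteration is left out for brevity.

```d
// Hypothetical library stand-in for the proposed built-in type[!].
struct ReadOnly(T)
{
    private T[] array; // hidden; no method hands it out

    this(T[] a) { array = a; }

    // Rebinding is allowed; the previously referenced array is untouched.
    void opAssign(T[] a) { array = a; }

    size_t length() { return array.length; }

    // Returns the element by value; there is no opIndexAssign.
    T opIndex(size_t i) { return array[i]; }

    // Slicing yields another read-only reference.
    ReadOnly opSlice(size_t i, size_t j) { return ReadOnly(array[i .. j]); }

    // A duplicate is an ordinary, writable array.
    T[] dup() { return array.dup; }
}

void main()
{
    auto abc = ReadOnly!char("hello".dup);

    char c = abc[3];                 // reading an element
    assert(c == 'l');
    assert(abc.length == 5);         // reading the length
    assert(abc[1 .. 3].length == 2); // slicing stays read-only

    char[] d = abc.dup;              // the copy is writable
    d[0] = 'j';
    assert(abc[0] == 'h');           // the original is unchanged

    abc = "audi".dup;                // rebinding to another array
    assert(abc[0] == 'a');

    // Each write attempt from the usage example is rejected at compile time.
    static assert(!__traits(compiles, abc[2] = 'x'));
    static assert(!__traits(compiles, abc.length = 7));
    static assert(!__traits(compiles, abc.ptr));
    static assert(!__traits(compiles, { char[] b = abc; }));
}
```

Since private is module-wide in D, such a wrapper would have to live in its own module for the hiding to mean anything. A wrapper also cannot give the implicit conversion from a normal array at a call site, which is one thing a built-in type could offer.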
Jul 19 2006
parent reply xs0 <xs0 xs0.com> writes:
Hasan Aljudy wrote:

 [snip]
First, let me just say I'm all for compile-time verification as much as possible.

But, your proposal has a problem - it introduces an entirely new type. You still can't write one function for both readonly and mutable arrays. You still can't know if you need to COW or not. As soon as a function takes a readonly parameter, all the functions it calls in turn also need to be declared readonly.

At the cost of some verbosity, the same effect can already be achieved with a simple array wrapper...

xs0
Jul 19 2006
parent reply Hasan Aljudy <hasan.aljudy gmail.com> writes:
xs0 wrote:
 Hasan Aljudy wrote:
 
  > [snip]
 
 First, let me just say I'm all for compile-time verification as much as 
 possible.
 
 But, your proposal has a problem - it introduces an entirely new type. 
heheh, the suggestion *is* introducing a new type; that's the whole point.
 You still can't write one function for both readonly and mutable arrays.
Yes you can. Why do you say you can't?
 You still can't know if you need to COW or not.
Sure you do. If you're using a read-only reference then you don't need to worry about COW, because it's enforced by the compiler. If you're using a normal array reference, then you either don't care about COW or you made a mistake.
 As soon as a function 
 takes a readonly parameter, all the functions it calls in turn also need 
 to be declared readonly..
Yes, that's true. For this suggestion to work in practice, all Phobos functions must be re-touched to take read-only array references. (I said "re-touched" instead of "re-written" because COW is already implemented in Phobos, so changing the parameter types shouldn't break anything.) The same probably applies to other libraries.

However, on the other hand, most of the time you really only need read-only references. This is the case with Java: there are two types of strings, String (immutable/read-only interface) and StringBuffer (mutable/read-write interface), and most of the time you deal with String rather than StringBuffer.

So we're not "experimenting" with something that's totally new. It's been done before, and it seems to be doing just fine.
 
 At the cost of some verbosity, the same effect can already be achieved 
 with a simple array wrapper...
Yes, it could, and I think that's interesting, because even though there are ways to enforce compile-time immutability for arrays and even classes, people still cry for C++-style constness.

However, think about it from a library writer's perspective. Should you make such a class yourself? What if other library writers provided different classes to achieve the same effect? These different classes would be incompatible with each other, making it very hard to use such libraries together seamlessly.

Having a standardized wrapper is better. It could either be implemented in Phobos, or as a builtin type. I think having it as a builtin type would make things simpler, and it seems to fit with D's philosophy of having certain things built into the language rather than implemented as a class. After all, D has dynamic arrays built in, associative arrays built in, complex numbers built in, etc. Having a builtin read-only array reference would serve to encourage using it.

Additionally, we can use the same rationale for builtins that Walter used for other things: having this built in means that the compiler knows about the read-only array idiom, and can provide meaningful error messages and prevent any possible attempt to break the contract (using asm or whatever pointer tricks are available).
Jul 20 2006
parent reply Johan Granberg <lijat.meREM OVEgmail.com> writes:
Hasan Aljudy wrote:
 Additionally, we can use the same rationale for builtins that Walter used 
 for other things: having this built in means that the compiler knows 
 about the read-only array idiom, and can provide meaningful error 
 messages and prevent any possible attempt to break the contract (using 
 asm or whatever pointer tricks are available).
While I'm all for a built-in const, I disagree with the last paragraph. I don't want the compiler to try to prevent me from subverting the protection by using casts or pointer tricks (C++ had const_cast for a reason). I have used some C++ libraries where some values were const when not strictly needed, and I was able to achieve the desired behavior by the use of a cast. (This is of course unsafe and should never be used in library code, just in quick and dirty applications or internally in your own code base where you can use this as a shortcut.)
Jul 20 2006
parent reply Dave <Dave_member pathlink.com> writes:
Johan Granberg wrote:
 Hasan Aljudy wrote:
 Additionally, we can use the same rationale for builtins that Walter 
 used for other things: having this built in means that the compiler 
 knows about the read-only array idiom, and can provide meaningful 
 error messages and prevent any possible attempt to break the contract 
 (using asm or whatever pointer tricks are available).
Subversion by asm would probably be impossible to prevent. But that's OK. Even if you could subvert through pointers and casting tricks, the compiler could enforce the normal cases, and the rest could be covered with something in the spec like "const really means 'constant' in D. Subversion of const is disallowed and the results are undefined."
 While I'm all for a built-in const, I disagree with the last paragraph. I 
 don't want the compiler to try to prevent me from subverting the protection 
 by using casts or pointer tricks (C++ had const_cast for a reason). I 
 have used some C++ libraries where some values were const when not 
 strictly needed, and I was able to achieve the desired behavior by the 
 use of a cast. (This is of course unsafe and should never be used in 
 library code, just in quick and dirty applications or internally in your 
 own code base where you can use this as a shortcut.)
And I disagree with that <g> If const was not strictly needed (or could not easily be subverted w/o asm as you can w/ C++) then the C++ library you mention should not have used it. With some sort of "true const" D libraries would be written differently.
Jul 20 2006
parent reply Johan Granberg <lijat.meREM OVEgmail.com> writes:
Dave wrote:
 While I'm all for a built-in const, I disagree with the last paragraph. 
 I don't want the compiler to try to prevent me from subverting the 
 protection by using casts or pointer tricks (C++ had const_cast for a 
 reason). I have used some C++ libraries where some values were const 
 when not strictly needed, and I was able to achieve the desired 
 behavior by the use of a cast. (This is of course unsafe and should 
 never be used in library code, just in quick and dirty applications 
 or internally in your own code base where you can use this as a shortcut.)
And I disagree with that <g> If const was not strictly needed (or could not easily be subverted w/o asm as you can w/ C++) then the C++ library you mention should not have used it. With some sort of "true const" D libraries would be written differently.
I agree with you about the library being incorrectly written. But notice this line in my reply.
 or internally in your own code base where you can use this as a 
 shortcut)
Take the case when you have a class like this:

class Foo
{
	const char[] name;
	void setName(const char[] c){(cast(char[])name)[]=c;}
}

This could be achieved by using properties, but I think this should be allowed.
Jul 20 2006
parent reply Ben Phillips <Ben_member pathlink.com> writes:
In article <e9o0pm$2j03$1 digitaldaemon.com>, Johan Granberg says...
Dave wrote:
 While I'm all for a built-in const, I disagree with the last paragraph. 
 I don't want the compiler to try to prevent me from subverting the 
 protection by using casts or pointer tricks (C++ had const_cast for a 
 reason). I have used some C++ libraries where some values were const 
 when not strictly needed, and I was able to achieve the desired 
 behavior by the use of a cast. (This is of course unsafe and should 
 never be used in library code, just in quick and dirty applications 
 or internally in your own code base where you can use this as a shortcut.)
And I disagree with that <g> If const was not strictly needed (or could not easily be subverted w/o asm as you can w/ C++) then the C++ library you mention should not have used it. With some sort of "true const" D libraries would be written differently.
I agree with you about the library being incorrectly written. But notice this line in my reply.
 or internally in your own code base where you can use this as a 
 shortcut)
Take the case when you have a class like this:

class Foo
{
	const char[] name;
	void setName(const char[] c){(cast(char[])name)[]=c;}
}

This could be achieved by using properties, but I think this should be allowed.
Well, in your case, why declare "name" as a "const" if you intend to allow it to be modified? A const member in a class should only be allowed to be modified in a constructor of that class.

Off-topic: If D gets const then I think the following would be a nice piece of syntactic sugar:

char[] str = "hell";
str ~= 'o'; // I can modify it
someFunc(const str); // that function can't

I dunno, it looks nice to me :P
Jul 20 2006
next sibling parent Johan Granberg <lijat.meREM OVEgmail.com> writes:
Ben Phillips wrote:
 Well, in your case, why declare "name" as a "const" if you intend to allow it
 to be modified? A const member in a class should only be allowed to be modified
 in a constructor of that class.
The idea was that I can modify it but nobody else can. Actually it's not very important as you showed in your example.
Jul 20 2006
prev sibling parent Hasan Aljudy <hasan.aljudy gmail.com> writes:
Ben Phillips wrote:
 In article <e9o0pm$2j03$1 digitaldaemon.com>, Johan Granberg says...
 
Dave wrote:

While I'm all for a built-in const, I disagree with the last paragraph. 
I don't want the compiler to try to prevent me from subverting the 
protection by using casts or pointer tricks (C++ had const_cast for a 
reason). I have used some C++ libraries where some values were const 
when not strictly needed, and I was able to achieve the desired 
behavior by the use of a cast. (This is of course unsafe and should 
never be used in library code, just in quick and dirty applications 
or internally in your own code base where you can use this as a shortcut.)
And I disagree with that <g> If const was not strictly needed (or could not easily be subverted w/o asm as you can w/ C++) then the C++ library you mention should not have used it. With some sort of "true const" D libraries would be written differently.
I agree with you about the library being incorrectly written. But notice this line in my reply.
or internally in your own code base where you can use this as a 
shortcut)
Take the case when you have a class like this:

class Foo
{
	const char[] name;
	void setName(const char[] c){(cast(char[])name)[]=c;}
}

This could be achieved by using properties, but I think this should be allowed.
Well, in your case, why declare "name" as a "const" if you intend to allow it to be modified? A const member in a class should only be allowed to be modified in a constructor of that class.

Off-topic: If D gets const then I think the following would be a nice piece of syntactic sugar:

char[] str = "hell";
str ~= 'o'; // I can modify it
someFunc(const str); // that function can't

I dunno, it looks nice to me :P
That's how my suggestion sort of works: you have a normal reference to the array, BUT you can pass a read-only reference to the array to functions.

So, if someFunc is declared to take a read-only reference, then the following:

char[] str = "hell";
str ~= "o"; //ok, normal reference
someFunc( str ); //str is implicitly converted to a read-only reference for the function
//OR
someFunc( str.readonly ); //explicitly convert it to a read-only reference
str ~= " world"; //ok: for you, it's still a normal reference.
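
For comparison only: the call-site behaviour described here resembles what the later D2 language offers with const(char)[] parameters, which postdate this thread. A minimal sketch under that assumption follows. someFunc is the hypothetical function from the post, and its body and return value are invented for illustration.

```d
// someFunc is the hypothetical function named in the post.
size_t someFunc(const(char)[] s)
{
    // Writing through the parameter does not compile.
    static assert(!__traits(compiles, s[0] = 'x'));
    return s.length;
}

void main()
{
    char[] str = "hell".dup;    // D2 string literals are immutable, hence .dup
    str ~= 'o';                 // ok, normal reference
    assert(someFunc(str) == 5); // char[] converts implicitly to const(char)[]
    str ~= " world";            // ok: for the caller it is still a normal reference
    assert(str == "hello world");
}
```

This is not the same thing as the proposal: a const(char)[] still exposes .ptr (as a pointer to const) and can be cast back to char[], so it is weaker than the "never castable" reference suggested in this thread.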
Jul 20 2006