
D - Thought on Array Type Syntax

Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
What if we declare array variables and types with the indices on the 
left?  Then we can still "build up" from basic types and also have 
in-order array indices:
	[type_foo][]int var;
which would be an associative array (indexed by 'type_foo') of dynamic 
arrays of int.  var would be accessed in the same index-order as the 
declaration:
	var[foo_index][int_index] = <val>;
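
For comparison, here is a rough sketch of the same structure in D's existing postfix syntax. It assumes `string` as a stand-in for 'type_foo'; the helper name `buildTable` is made up for illustration.

```d
import std.stdio;

// Hypothetical helper: builds an associative array (string keys)
// whose values are dynamic arrays of int.
int[][string] buildTable()
{
    // Existing D syntax reads right to left: [string] is the outer
    // (associative) index, int[] is the element type.
    int[][string] var;
    var["primes"] = [2, 3, 5];
    var["primes"][1] = 7;   // key index first, then the inner array index
    return var;
}

void main()
{
    writeln(buildTable()["primes"]);   // prints [2, 7, 5]
}
```

The access order is outer index first, which is the opposite of the order in which the brackets appear in the declaration.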

Interestingly, this also makes it easy and unambiguous when you are 
mixing array and pointer modifiers:
	[type_foo]*[]*int var2;
would be an associative array of pointers to dynamic arrays of pointers 
to int.
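
A sketch of how that same shape would be written in D's existing syntax, again assuming `string` in place of 'type_foo' (the function name `readThrough` is hypothetical). The existing form has to be read right to left.

```d
// Hypothetical helper that walks through every level of the type.
int readThrough()
{
    int value = 42;
    int*[] inner = [&value];   // dynamic array of pointers to int
    int*[]*[string] var2;      // associative array of pointers to such arrays
    var2["k"] = &inner;

    // Peel the type apart one level at a time.
    static assert(is(typeof(var2["k"]) == int*[]*));
    static assert(is(typeof(*var2["k"]) == int*[]));
    static assert(is(typeof((*var2["k"])[0]) == int*));

    // key lookup -> dereference -> array index -> dereference
    return *(*var2["k"])[0];
}

void main()
{
    assert(readThrough() == 42);
}
```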

I know that this looks horribly backwards to all of us who grew up on C. 
But the reverse-index problem is also horribly backwards, and it is 
NOT immediately obvious to new programmers (or even to old programmers, 
like me).  Maybe this ugliness is better than the current one?
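
To make the reverse-index problem concrete, here is a small sketch using static arrays in the existing syntax (the helper name `lastCell` is made up):

```d
// Hypothetical helper showing that declaration order and index order differ.
int lastCell()
{
    int[3][5] grid;   // five arrays of three ints each

    // The LAST bracket in the declaration is the OUTER dimension.
    static assert(typeof(grid).length == 5);
    static assert(is(typeof(grid[0]) == int[3]));

    // In use, the outer index (0..5) comes FIRST, the inner (0..3) second.
    grid[4][2] = 9;
    return grid[4][2];
}

void main()
{
    assert(lastCell() == 9);
}
```

So the declaration says [3][5] while the largest valid access is [4][2].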

AN INTERESTING SIDE-EFFECT is that now it is (more) obvious from the 
syntax that all variables in a multiple-declaration are the same type. 
In the current syntax,
	int *var3,var4;
declares two pointers, but looks (at first glance) like it's one pointer 
and one int.  This syntax is more obvious (at least to me):
	*int var3,var4;
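
A quick way to check the current behaviour, sketched with a made-up function name `bothArePointers`. In C the second variable would be a plain int; in D both are pointers.

```d
// Hypothetical helper: verifies at compile time that both names are pointers.
bool bothArePointers()
{
    int* var3, var4;

    static assert(is(typeof(var3) == int*));
    static assert(is(typeof(var4) == int*));   // not int, unlike C

    int target = 5;
    var3 = &target;
    var4 = var3;        // legal only because var4 is also an int*
    return *var4 == 5;
}

void main()
{
    assert(bothArePointers());
}
```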

Thoughts, anyone?
Jun 30 2003
Mark Evans <Mark_member pathlink.com> writes:
The question is really, Why use square brackets for so many purposes?  Any
syntax element with multiple meanings will be painful.  In C and D alike, square
brackets both declare and index arrays.  The confusion grows in D with the
proliferation of arrays:  VLAs, associative array declarations, strings, and
slicing.  D is overloading the [] syntax beyond reasonable limits.

One can hack through the undergrowth in a simple way.  Separate type
declarations from indexing.  Reserve [] for indexing and slicing only.  There
are other ways to declare types.

A phrase like 'array(int,N,fixed)' could declare an N-dimensional array of int,
'array(double,N,variable)' a VLA of doubles with initial size N.  A mixed case
might read 'array(array(int,N,fixed),M,variable)'.  If you don't like these
notions, invent your own.  There are no limits except to keep [] out of the type
declarations.
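
As a rough illustration only, the idea of keeping [] out of declarations can be approximated with alias templates, a feature added to D well after this thread. The names `Fixed`, `Variable`, `Assoc` and `demo` are hypothetical, and unlike the 'array(double,N,variable)' form above, this sketch does not take an initial size for the variable-length case.

```d
// Hypothetical type constructors: parameters go in, a type comes out.
alias Fixed(T, size_t N) = T[N];   // fixed-length array
alias Variable(T) = T[];           // variable-length (dynamic) array
alias Assoc(K, V) = V[K];          // associative array keyed by K

static assert(is(Fixed!(int, 4) == int[4]));
static assert(is(Variable!double == double[]));
static assert(is(Assoc!(string, int) == int[string]));
static assert(is(Variable!(Fixed!(int, 4)) == int[4][]));

// Hypothetical helper: the brackets appear only where the data is indexed.
size_t demo()
{
    Variable!(Fixed!(int, 4)) rows;   // same type as int[4][]
    rows.length = 2;
    rows[1][3] = 7;
    return rows.length * rows[0].length;   // 2 rows of 4 ints
}

void main()
{
    assert(demo() == 8);
}
```

The nesting in the mixed case reads outside-in, which is the property the 'array(array(int,N,fixed),M,variable)' example is after.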

If D is finally going to break with C syntax (hooray), then go all the way and
do it right.  I'm not holding my breath, but that would be my input.

Mark
Jun 30 2003
"Sean L. Palmer" <palmer.sean verizon.net> writes:
You may be on the right track, but overloading ( ) more to avoid overloading
[ ] doesn't make much sense.

We need more brackets.

I think Unicode may have a few more brackets that may be useful.   ;)

Some mileage may be gained by using double square brackets for declarations,
thusly:

[[4]][[]]int wierdarray;

Since it's normally invalid syntax to have nested square brackets, this
should be unambiguous.

Sean
Jun 30 2003
Mark Evans <Mark_member pathlink.com> writes:
 You may be on the right track, but overloading ( ) more to avoid overloading
 [ ] doesn't make much sense.
 We need more brackets.

It matters little except to avoid [].  Maybe (), <>, {}, (()), or XMLisms could work.  The main idea is to use a self-closing, nesting syntax instead of C's flat syntax.  A pseudo-functional form stands to reason because the type declaration is a kind of compile-time function.  Parameters go in, a type comes out.  One could argue that <> makes more sense from a C++ familiarity and semantics standpoint.  I would not quibble over such details.

Here is my quibble.  Instead of making the input parameters clear, C and D use cryptic, subtle clues: [] vs. [N], embedding the symbol inside its own type signature, and using implicit rules of precedence and associativity.  Explicit parameters make more sense.

Writing very involved C and C++ type definitions teaches one to compose typedefs with each other, avoiding C syntax completely at almost every step.  This procedure is tantamount to shutting the language down and suggests that something is very wrong with it.

http://compilers.iecc.com/comparch/article/03-06-010

"This is very true.  When computer languages skirt the edge of ambiguity, people often write things they think are correct, but which are actually logical errors.  For example, most people assume left-associative exponentiation, but right-associative exponentiation is also a valid interpretation of the mathematics and concepts involved.  So if your language has an exponentiation operator, you have to make an explicit decision and specify it: is exponentiation left-associative, right-associative, or do you require parens or the equivalent to make it explicit?  And after getting bitten a few times anyway, which inevitably happens, most programmers learn to use parentheses defensively, to prevent exactly that kind of ambiguity, even when the language has a rule for resolving it.  That is, even in a language that has a rule for resolving a semantic ambiguity, people have to think about it and defend against misinterpretation - as much by themselves as by the language system.

I've been bitten this way by C's address-of and dereferencing operators not associating the way I expect them to and requiring parentheses to disambiguate, many times.  And now I just use parens as part of those operators because I don't want to sweat out some obscure bug caused by me taking one view of how something would be parsed and the compiler taking another (my LISP background shows here, I guess)."
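
Both points in the quoted article can be sketched in present-day D. The `^^` power operator shown here was added to D after this thread, and the helper name `derefPrecedence` is made up.

```d
import std.math;   // some older compilers required this import for ^^

// D's power operator is right-associative, so the two readings differ.
static assert(2 ^^ 3 ^^ 2 == 512);     // parsed as 2 ^^ (3 ^^ 2)
static assert((2 ^^ 3) ^^ 2 == 64);    // the left-associative reading

// Hypothetical helper: dereference combined with postfix increment.
int derefPrecedence()
{
    int[2] data = [10, 20];
    int* p = data.ptr;
    int first = *p++;      // parsed as *(p++): reads 10, then advances p
    int second = (*p)++;   // reads 20, then increments data[1] to 21
    return first + second + data[1];   // 10 + 20 + 21
}

void main()
{
    assert(derefPrecedence() == 51);
}
```

Writing the parentheses explicitly, as the quoted author recommends, removes any need to remember which reading the language picked.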
Jul 01 2003
"Fabian Giesen" <rygNO SPAMgmx.net> writes:
 The question is really, Why use square brackets for so many purposes?
 Any syntax element with multiple meanings will be painful.  In C and
 D alike, square brackets both declare and index arrays.  The
 confusion grows in D with the proliferation of arrays:  VLA's,
 associative array declarations, strings, and slicing.  D is
 overloading the [] syntax beyond reasonable limits.

I don't really think so. VLAs are still arrays, and so are strings (which inherit the C notion of being an "array of characters", even though the way D specifies arrays makes them *much* safer than the C variant). That associative arrays behave like their non-associative counterparts is, given the name, pretty obvious to me. That leaves the issue of slicing - but when you go the other way round and view array indexing as a special case of slicing (which it is), we're at exactly 2 uses: array declaration and array slicing. Since the whole point of C-style declarations is that a declaration looks just like the actual use (cf. pointers), I don't see much of an issue with that.
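
The "indexing is a special case of slicing" view can be sketched like this (the helper name `indexMatchesSlice` is made up):

```d
// Hypothetical helper: a[i] and the one-element slice a[i .. i + 1]
// refer to the same storage.
bool indexMatchesSlice()
{
    int[] a = [10, 20, 30, 40];

    int[] one = a[1 .. 2];   // one-element slice starting at index 1
    assert(one.length == 1);

    one[0] = 99;             // writes through the slice into a
    return a[1] == 99 && a[1 .. 2][0] == a[1];
}

void main()
{
    assert(indexMatchesSlice());
}
```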
 One can hack through the undergrowth in a simple way.  Separate type
 declarations from indexing.  Reserve [] for indexing and slicing
 only.  There are other ways to declare types.

 A phrase like 'array(int,N,fixed)' could declare an N-dimensional
 array of int, 'array(double,N,variable)' a VLA of doubles with
 initial size N.  A mixed case might read
 'array(array(int,N,fixed),M,variable)'.  If you don't like these
 notions, invent your own.  There are no limits except to keep [] out
 of the type declarations.

I really don't see much point in this. As I said, the idea behind C declaration syntax *was* for declarations to look and behave like actual code, and, the awful syntax of function pointers aside, I think this both makes sense and is intuitive. What is your problem with that notion, exactly? -fg
Jul 01 2003
"Sean L. Palmer" <palmer.sean verizon.net> writes:
This is a terse version of Pascal type specifier syntax, which reads left to
right.  array[0..3] of ^ foo, or something of that nature... my Pascal days
are rapidly becoming a faded memory.

It is also almost exactly the method I chose for my old scripting language,
now known as Scrap.  ;)

Very simple to parse.  Very easy to remember, or to read.  I recommend it
thoroughly.

Sean
Jun 30 2003
"Fabian Giesen" <rygNO SPAMgmx.net> writes:
 AN INTERESTING SIDE-EFFECT is that now it is (more) obvious from the
 syntax that all variables in a multiple-declaration are the same type.
 In the current syntax,
 int *var3,var4;
 declares two pointers, but looks (at first glance) like it's one
 pointer and one int.  But this syntax: is more obvious (at least to
 me): *int var3,var4;

I'd rather simply not use the "int *var3,var4" notation in D code anymore, but the perfectly legal and far more descriptive variant "int* var3,var4" instead :) -fg
Jul 01 2003
Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Fabian Giesen wrote:
I'd rather simply not use the "int *var3,var4" notation in D code anymore, but the perfectly legal and far more descriptive variant "int* var3,var4" instead :) -fg

I heard you, but then you've violated one of the fundamental assumptions of the C family: that whitespace is only used for delimiting tokens, not for syntax. :(
Jul 01 2003
"Fabian Giesen" <rygNO SPAMgmx.net> writes:
 I heard you, but then you've violated one of the fundamental
 assumptions of the C family: that whitespace is only used for
 delimiting tokens, not for syntax. :(

It is only being used for delimiting tokens. D *always* groups * with the type, regardless of whitespace (C and C++ always group it with the variable, again regardless of whitespace). For grouping with variables, the descriptive (and intuitive) way to write it is int *x,y; for grouping with types, int* x,y; makes far more sense. Within each language, both spellings parse identically, and I do not propose to change that - however, as I said, it's just more natural to write "int* x,y" given the way D parses declarations. -fg
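
A small sketch of the whitespace point, with a made-up function name `spacingIsIrrelevant`: the three spellings below differ only in spacing and yield the same types in D.

```d
// Hypothetical helper: spacing around * does not change what is declared.
bool spacingIsIrrelevant()
{
    int *x, y;    // C-style spacing
    int* p, q;    // spacing that matches how D groups the *
    int*r,s;      // no spacing at all

    // Every second variable is still a pointer in D.
    static assert(is(typeof(y) == int*));
    static assert(is(typeof(q) == int*));
    static assert(is(typeof(s) == int*));

    int target = 1;
    y = q = s = &target;   // compiles only because all three are int*
    return *y == 1 && *q == 1 && *s == 1;
}

void main()
{
    assert(spacingIsIrrelevant());
}
```

The choice between the spellings is therefore purely a readability convention.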
Jul 01 2003
Russ Lewis <spamhole-2001-07-16 deming-os.org> writes:
Fabian Giesen wrote:
It is only being used for delimiting tokens. D *always* groups * with the type, regardless of whitespace (C/C++ always group with the variable, again regardless of whitespace). For grouping with variables, the descriptive (and intuitive) way to write it is int *x,y; For grouping with types, int* x,y; makes far more sense. Both variants are absolutely equal in C/C++/D as far as parsing is concerned and I do not propose to change that - however, as said, it's just more natural to write "int* x,y" the way D parses declarations. -fg

Oh, so you're just talking about a coding convention? Yeah, I totally agree, and I already do that. :) Russ
Jul 01 2003