www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - D Multidimensional arrays wierdness

reply Johnson Jones <JJ Dynomite.com> writes:
I need to get this straight:

A normal single dimensional array in D is defined as

T[] arr

and is a linear sequential memory array of T's with an unbound 
length and is effectively the same as T*(although D treats them 
differently?)?

We can fix the length by adding a upper bound:

T[N] arr;

and this is equivalent to

auto arr = cast(T[])malloc(T.sizeof*N)[0..N];

possibly

auto arr = cast(T[N])malloc(T.sizeof*N);

But then arr is a fixed type and we can't use to resize the array 
later if needed.

these don't actually work, for some odd reason, so

auto arr = cast(T*)malloc(T.sizeof*N)[0..N];

seems to be the way to go.

So, what this "shows" is that in memory, we have
relative address         type
0                          T
1*sizeof(T)                T
2*sizeof(T)                T
...
N-1*sizeof(T)              T



This is pretty basic and just standard arrays, all that is fine 
and dandy!


Now, when it comes to multidimensional arrays:

T[][] arr;

There are two ways that the array can be laid out depending on 
how we interpret the order of the row/col or col/row.


The most natural way to do this is to extend single dimensional 
arrays:

T[][] is defined to be (T[])[]

or, lets used a fixed array so we can be clear;

T[N][M]

which means we have M sequential chunks of memory where each 
chunk is a T[N] array.

This is the natural way because it coincides with single arrays.

Similary, to access the element at the nth element in the mth 
chunk, we do

t[n][m] because, again, this conforms with out we think of single 
arrays.


Now, in fact, it doesn't matter too much if we call a row a 
column and a column a row(possibly performance, but as far as 
dealing with them, as long as we are consistent, everything will 
work).


BUT! D seems to do something very unique,

If one defines an array like

T[N][M]


one must access the element as

t[m][n]!

The accessors are backwards!

This is a huge problem!


int[3][5] a;

Lets access the last element:

auto x = a[4][2];
auto y = a[2][4]; <- the logical way, which is invalid in D

This method creates confusion and can be buggy. If our array is 
not fixed, and we use the *correct* way, then our bugs are at 
runtime and maybe subtle.

Why? Because the correct way only has one thing to get right, 
which is being consistent, which is easy.

In D, we not only have to be consistent, we also have to make 
sure to reverse our array accessors from how we defined it.

While it is a unique approach and may have something to do with 
quantum entanglement, I'm curious who the heck came up with the 
logic and if there is actually any valid reason?

Or are we stuck in one of those "Can't change it because it will 
break the universe" black holes?
Aug 28 2017
next sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Tuesday, 29 August 2017 at 03:16:13 UTC, Johnson Jones wrote:
 T[] arr

 and is a linear sequential memory array of T's with an unbound 
 length and is effectively the same as T*(although D treats them 
 differently?)?
It is a T* AND a size variable bundled together.
 We can fix the length by adding a upper bound:

 T[N] arr;
That's actually an entirely different type. This is now a block of static memory instead of a pointer+length combo.
 and this is equivalent to

 auto arr = cast(T[])malloc(T.sizeof*N)[0..N];
No, it isn't. The static array T[N] has no such allocation, it is all in-place.
 Now, when it comes to multidimensional arrays:

 T[][] arr;
That's an array of arrays. A pointer+length combo to a bunch of other pointer+length combos.
 or, lets used a fixed array so we can be clear;

 T[N][M]

 which means we have M sequential chunks of memory where each 
 chunk is a T[N] array.
That's correct though, with the caveat that there's no padding between the inner arrays; the T[N][M] is a single block of memory with size N * M * T.sizeof.
 Similary, to access the element at the nth element in the mth 
 chunk, we do

 t[n][m] because, again, this conforms with out we think of 
 single arrays.
I've never understood how anyone thought that made any sense. I have always thought C was backwards and illogical for that nonsense and even after many years of using C, I'd still often get it "backwards". D doing the sane thing and having consistent order is one of the things I like about the language.
 T[N][M]

 one must access the element as

 t[m][n]!

 The accessors are backwards!
What is t[m]? It gets the m'th element of t, right? Well, since t is an array of N items, it makes perfect logical sense that t[m] is is of type T[N]. Just like how T[N]* is a pointer to a T[N] (which is an array of T), or if it were a function, t(n) would be its return value. D always follows this simple recursive rule. Now, what's important about all this is the built-in things are arrays of arrays rather than multi-dimensional arrays. D also supports multi-dimensional arrays, which are indexed: T[x, y]. The actual type of them is not built into the language though - it needs to be defined as a library type. I'm not sure it ever got into phobos but it is available here: https://github.com/libmir/mir-algorithm Or you can write your own pretty quickly with a static array + the opIndex overload.
Aug 28 2017
prev sibling parent Jonathan M Davis via Digitalmars-d-learn writes:
On Tuesday, August 29, 2017 03:16:13 Johnson Jones via Digitalmars-d-learn 
wrote:
 I need to get this straight:

 A normal single dimensional array in D is defined as

 T[] arr

 and is a linear sequential memory array of T's with an unbound
 length and is effectively the same as T*(although D treats them
 differently?)?
A dynamic array is essentially a struct defined like so struct DynamicArray(T) { size_t length; T* ptr; } In actuality, for historical reasons, I think that it's unfortunately defined with void* at the moment instead of being templated, but effectively, that struct is what you're dealing with when you have a dynamic array. If you want the C/C++ equivalent, you use the ptr property to get at the underlying pointer.
 We can fix the length by adding a upper bound:

 T[N] arr;

 and this is equivalent to

 auto arr = cast(T[])malloc(T.sizeof*N)[0..N];
 possibly

 auto arr = cast(T[N])malloc(T.sizeof*N);
They are not at all equivalent. Fixed size arrays live on the stack, just like you'd get with T arr[N]; in C/C++. Static arrays have a fixed length and live on the stack (or directly inside of an object on the heap if they're a member variable of an object on the heap), and they have no indirections. With dynamic arrays, the struct itself that is the dynamic array lives directly on the stack or wherever it's put, but it refers to any slice of memory. Usually, that memory is on the GC-allocated heap, but it could be malloced memory or even a slice of a static array on the static. They're resizable simply because the GC will either reallocate memory to resize them (e.g. if the memory is not GC-allocated or because there isn't enough room to expand it in place), or if the memory is GC-allocated, and there's room beyond the end of that slice of memory, then the dynamic array will just be made to refer to a larger slice of that block of memory.
 But then arr is a fixed type and we can't use to resize the array
 later if needed.

 these don't actually work, for some odd reason, so

 auto arr = cast(T*)malloc(T.sizeof*N)[0..N];

 seems to be the way to go.

 So, what this "shows" is that in memory, we have
 relative address         type
 0                          T
 1*sizeof(T)                T
 2*sizeof(T)                T
 ...
 N-1*sizeof(T)              T



 This is pretty basic and just standard arrays, all that is fine
 and dandy!


 Now, when it comes to multidimensional arrays:

 T[][] arr;

 There are two ways that the array can be laid out depending on
 how we interpret the order of the row/col or col/row.


 The most natural way to do this is to extend single dimensional
 arrays:

 T[][] is defined to be (T[])[]

 or, lets used a fixed array so we can be clear;

 T[N][M]

 which means we have M sequential chunks of memory where each
 chunk is a T[N] array.

 This is the natural way because it coincides with single arrays.

 Similary, to access the element at the nth element in the mth
 chunk, we do

 t[n][m] because, again, this conforms with out we think of single
 arrays.


 Now, in fact, it doesn't matter too much if we call a row a
 column and a column a row(possibly performance, but as far as
 dealing with them, as long as we are consistent, everything will
 work).


 BUT! D seems to do something very unique,

 If one defines an array like

 T[N][M]


 one must access the element as

 t[m][n]!

 The accessors are backwards!

 This is a huge problem!

 int[3][5] a;

 Lets access the last element:

 auto x = a[4][2];
 auto y = a[2][4]; <- the logical way, which is invalid in D

 This method creates confusion and can be buggy. If our array is
 not fixed, and we use the *correct* way, then our bugs are at
 runtime and maybe subtle.

 Why? Because the correct way only has one thing to get right,
 which is being consistent, which is easy.

 In D, we not only have to be consistent, we also have to make
 sure to reverse our array accessors from how we defined it.

 While it is a unique approach and may have something to do with
 quantum entanglement, I'm curious who the heck came up with the
 logic and if there is actually any valid reason?

 Or are we stuck in one of those "Can't change it because it will
 break the universe" black holes?
It's doing exactly what C/C++ would do if they declared static arrays as T[N][M] arr; instead of T arr[N][M]; The issue is that types are normally read outwards from the variable name. You mostly don't notice it, but it becomes critical to understand when trying to read function pointer types in C/C++. Regardless, a side effect of that is that putting the array lengths on the right like C/C++ does puts them in the order that people expect, whereas putting them on the left like D does makes them seem backwards. It's consistent with how the compiler reads types though. I'm inclined to think that we'd have been better off to introduce an inconsistency and have the lengths be read from left-to-right rather than outward from the type and thus right-to-left, but we're stuck at this point, and it _is_ consistent with how C/C++ reads types. Ultimately though, all it means is that you need to be aware that when you have a multi-dimensional static array, the lengths go from right-to-left instead of left-to-right. Indexing them at that point is perfectly normal and functions like it would for dynamic arrays. You just have to be aware of the seemingly backwards direction of the lengths in the declaration for a static array. - Jonathan M Davis
Aug 28 2017