digitalmars.D - Array types

bearophile (68/68) Aug 26 2010 Arrays are one of the most useful and most efficient data structures for...

Norbert Nemec (25/36) Aug 26 2010 Yippie-yeah! My favorite topic is being mentioned! I have to give a

bearophile (10/20) Aug 26 2010 I think all the features you list are *additive* changes for D language....

Gareth Charnock (5/13) Aug 26 2010 Well this might be a good time to mention that I've started to get back
Nick B (9/23) Aug 28 2010 Bearophile

bearophile (5/7) Aug 28 2010 They often don't give a significant performance increase. They increase ...

bearophile <bearophileHUGS lycos.com> writes:

Arrays are one of the most useful and most efficient data structures for
nonfunctional languages. They look simple, but in a low-level language you
sometimes need various kinds of them. So they are not so simple.

In D there are two kinds of built-in arrays:
A) 1D Dynamic arrays on the heap
B) Almost-nD rectangular fixed-sized arrays allocated on the stack by functions
(or data segment), they may also go on the heap if they are part of an object
(and with placement new they may go everywhere).

So far in D code I have had need of (and I have seen some people need) some
other kinds of arrays, not (well) supported by D:
C) nD rectangular dynamic arrays;
D) Dynamic arrays allocated on the stack;
E) Fixed-sized arrays allocated on the heap;
F) Variable-sized structs allocated on the heap.

----------------------

The (C) arrays are not the same thing as dynamic arrays of dynamic arrays
because:
- Some algorithms are not designed for a triangle where rows may differ in
length. Testing that rows are all the same length at runtime wastes time, and
if you don't test it then it may cause bugs;
- Sometimes you want to reshape a 2D matrix, so it's useful to store the matrix
in contiguous memory;
- Sometimes you need complex slicing, for example removal of some columns and
rows from a 2D matrix.

Recently in D.learn we have discussed this a bit.


that are manages with a syntax similar to Pascal arrays: arr[x,y,z]. I think
there is no so need to put them into D. But I think it's important to have a
official type of them in Phobos, for example implemented as a struct (only the
item type and number of dimensions are known at compile time):

RectangularArray!(T, int ndimensions)

----------------------

The (D) arrays are sometimes useful in some high-performance routines. You
can't implement them in Phobos. C99 has added them as VLA arrays. In bug 4357 I
have proposed to use "scope", but that was not appreciated.

If you want (D) arrays you currently need to write something like (manually
inlined code):

T[] arr = (cast(T*)alloca(n * T.sizeof))[0 .. n];
arr[] = T.init; // optional, but useful if T is a pointer or struct that
contains pointers

This is not too much bad, and using a template mixin may help the situation.

D is designed to make simple the safe operations (like to create (A) arrays),
and to make more elaborate the less safe operations. But in my opinion it's
wrong to make something less safe than possible. Even if stack-allocated arrays
are unsafe (if you escape their pointer/reference/fat pointer you will have
bugs), there is no need to make _their creation too_ more unsafe/bug-prone than
necessary.

A possibility is to use the C99 syntax, that is just allowing runtime variables
when you define a "fixed-sized" array:

int n = 10;
int[n] arr;

That syntax is clean, readable, standard (it comes from C), easy to understand.
A disadvantage is that visually it's not easy to tell apart a VLA from a normal
fixed-sized array.

This is not allowed, because n is a runtime value:

int[n] foo(int n) {
  int[n] arr;
  return arr;
}

But this is allowed, and the arr needs to be copied on the heap:

int[] foo(int n) {
  int[n] arr;
  return arr;
}

I don't like this much, it adds some complexity and more cases.

----------------------

I have used (E) arrays while creating a deque, composed of fixed sized arrays
on the heap. In this case their lengths are fixed and known at compile-time. In
D I usually implement them wrapping them into a struct:

struct Foo(T, int SIZE) {
    T[SIZE] a;
}
void main() {
    auto f = new Foo!(int, 1024)();
}

Later I use the array as f.a[x]. This is not too much bad. Implementing (E)
arrays in D gives problems syntax-wise, because in C you are free to use
pointers too as arrays. So I think the current situation (using the wrapper
struct) is good enough for the few situations were (E) arrays are useful.

----------------------

D supports zero-length fixed-sized arrays, useful to implement (F) "arrays"
(but it seems LLVM does't support them, so in LDC they need 1 bit, I am not
sure where such bit actually goes).

Variable-sized structs are not commonly used in high-level code, they are
uncommon in general. So there is no need for D to support them as built-ins,
they may be implemented manually by the programmer as need arise, also because
they probably need to be quite customized for each situation.

On the other hand it's not too much bad to put into Phobos an intrusive
Variable-sized struct type, something like:

struct VariableLengthStruct(T, Cargo) {
   Cargo cargo;
   size_t length;
   T[0] data;
   // operator methods
}

Where Cargo is the type of the intrusive payload, and VariableLengthStruct
implements the opIndex method too that maps on the data attribute.

A helper function too may be added to help the allocation of a
VariableLengthStruct.

In some situations VariableLengthStruct can't be used, because you need more
custom structure, but I think that VariableLengthStruct is sometimes enough.

----------------------

The summary of this post is:
- It's positive to add efficient and _standard_ nD rectangular dynamic arrays
to Phobos, as RectangularArray.
- Maybe a standard variable-sized struct is good in Phobos, like
VariableLengthStruct.
- I'd still like syntax support for variable-sized stack-allocated arrays. But
I don't like the extra complexity it seem to need. More thinking is needed
about this.

Bye,
bearophile

Aug 26 2010

Norbert Nemec <Norbert Nemec-online.de> writes:

Yippie-yeah! My favorite topic is being mentioned! I have to give a 
comment... :-)

On 26/08/10 22:38, bearophile wrote:
 [...]
 So far in D code I have had need of (and I have seen some people need) some
other kinds of arrays, not (well) supported by D:
 C) nD rectangular dynamic arrays;
 [...]
 The (C) arrays are not the same thing as dynamic arrays of dynamic arrays
because:
 - Some algorithms are not designed for a triangle where rows may differ in
length. Testing that rows are all the same length at runtime wastes time, and
if you don't test it then it may cause bugs;
 - Sometimes you want to reshape a 2D matrix, so it's useful to store the
matrix in contiguous memory;
 - Sometimes you need complex slicing, for example removal of some columns and
rows from a 2D matrix.

 Recently in D.learn we have discussed this a bit.


that are manages with a syntax similar to Pascal arrays: arr[x,y,z]. I think
there is no so need to put them into D. But I think it's important to have a
official type of them in Phobos, for example implemented as a struct (only the
item type and number of dimensions are known at compile time):

 RectangularArray!(T, int ndimensions)

I still have my doubt that rectangular arrays can be implemented to full 
power in the library. Currently, there is a number of essential 
languages missing to allow a sane syntax:

* ranges 'A..B' as standalone expression
* following from this: replacement of opSlice by opIndex overloaded for 
builtin type of range expression
* extension of range expression by stride (e.g. 'A..B:C' )
* handling of opDollar '$' for multiple dimensions

Beyond this, there are a few issues where I don't see that a 
library-based solution will ever have the full power of a builtin 
language feature:

* array expressions with clean and easy to use syntax
* syntax of type declarations: double[A,B] for static arrays and 
something like double[[3]] or double[:,:] for dynamic arrays are just 
far more straightforward and readable than any template-based syntax I 
could imagine.

Ultimately, I think that rectangular arrays are as fundamental of 
numerical computing as hash-maps and strings for other areas. Moreover, 
there is a huge number of physicists working with Fortran95 simply 
because it is about as simple to use as Basic, without classes, let 
alone templates. The mere thought of having to touch classes or 
templates would deter many of them to consider D an option.

Aug 26 2010

bearophile <bearophileHUGS lycos.com> writes:

Norbert Nemec:

I have to give a comment... :-)

Thank you. I see you talk about only one of the several types of arrays.


 I still have my doubt that rectangular arrays can be implemented to full 
 power in the library.

I think all the features you list are *additive* changes for D language. So I
don't consider them a problem. One of them, regarding opDollar, is even already
scheduled for addition (I think). So I am not so much worried.

What worries me are the (little) breaking changes, I have listed some possible
ones in a very recent post.


 Ultimately, I think that rectangular arrays are as fundamental of 
 numerical computing as hash-maps and strings for other areas.

But is D very interested in numerical computing? It has some interest in this
field (see the care about Floating point matters. FP matters in D are not
portable on very different CPUs as in Ada, but they are deep).

You can't add all features in D2, so it needed to prioritize. And seeing how
all the changes you suggest are additive ones, I think D2 designers have
prioritized the right things :-)

For numerical computing D has to learn from Chapel language, I think. It shows
solutions that are general and look natural, they aren't little local
single-trick hacks (as the .. range syntax of foreach).


 Moreover, 
 there is a huge number of physicists working with Fortran95 simply 
 because it is about as simple to use as Basic, without classes, let 
 alone templates. The mere thought of having to touch classes or 
 templates would deter many of them to consider D an option.

If some numeric-oriented features will be added to D3, those researchers will
use already build abstractions (in the language, std lib or external libs), so
they will have only limited need to use templates.

Bye,
bearophile

Aug 26 2010

Gareth Charnock <gareth.charnock gmail.com> writes:

 ----------------------

 The (C) arrays are not the same thing as dynamic arrays of dynamic arrays
because:
 - Some algorithms are not designed for a triangle where rows may differ in
length. Testing that rows are all the same length at runtime wastes time, and
if you don't test it then it may cause bugs;
 - Sometimes you want to reshape a 2D matrix, so it's useful to store the
matrix in contiguous memory;
 - Sometimes you need complex slicing, for example removal of some columns and
rows from a 2D matrix.

 Recently in D.learn we have discussed this a bit.


that are manages with a syntax similar to Pascal arrays: arr[x,y,z]. I think
there is no so need to put them into D. But I think it's important to have a
official type of them in Phobos, for example implemented as a struct (only the
item type and number of dimensions are known at compile time):

 RectangularArray!(T, int ndimensions)

Well this might be a good time to mention that I've started to get back 
to working on that std.matrix proposal for Phobos from a few months 
back. Unfortunately, I wasn't able to work on it for a few months 
(personal reasons). However, this may only solve the problem for numeric 
types. On the other hand, with strategic use of __traits and static if....

Aug 26 2010

Nick B <"nick_NOSPAM_.barbalich" gmail.com> writes:

On 27/08/2010 9:38 a.m., bearophile wrote:
 Arrays are one of the most useful and most efficient data structures for
nonfunctional languages. They look simple, but in a low-level language you
sometimes need various kinds of them. So they are not so simple.

 In D there are two kinds of built-in arrays:
 A) 1D Dynamic arrays on the heap
 B) Almost-nD rectangular fixed-sized arrays allocated on the stack by
functions (or data segment), they may also go on the heap if they are part of
an object (and with placement new they may go everywhere).

 So far in D code I have had need of (and I have seen some people need) some
other kinds of arrays, not (well) supported by D:
 C) nD rectangular dynamic arrays;
 D) Dynamic arrays allocated on the stack;
 E) Fixed-sized arrays allocated on the heap;
 F) Variable-sized structs allocated on the heap.

 ----------------------

 The (C) arrays are not the same thing as dynamic arrays of dynamic arrays
because:
 - Some algorithms are not designed for a triangle where rows may differ in
length. Testing that rows are all the same length at runtime wastes time, and
if you don't test it then it may cause bugs;
 - Sometimes you want to reshape a 2D matrix, so it's useful to store the
matrix in contiguous memory;
 - Sometimes you need complex slicing, for example removal of some columns and
rows from a 2D matrix.

Bearophile

Have you got any numbers to back your claim for increased performance 
for nD rectangular dynamic arrays ?

I too, am interested  in them, but would like to use them with Tango, 
and therefore if there was a increase in performance, and therefore a 
real benefit, then would ask that they be added to the D language, so 
that all in the D community could benefit.

Nick B

Aug 28 2010

bearophile <bearophileHUGS lycos.com> writes:

Nick B:
 Have you got any numbers to back your claim for increased performance 
 for nD rectangular dynamic arrays ?

They often don't give a significant performance increase. They increase a bit
cache coherence if you scan them by rows, and they waste a bit less memory, but
usually this is not the purpose of their existence.

(If their row length is always a power of two, you can find the item with a
shift and sum, this sometimes is a bit faster than using an array of pointers
or a product and sum. But this is not a common case, and unfortunately cache
associativity decreases the efficiency of this case).

Bye,
bearophile

Aug 28 2010

D Programming

C/C++ Programming

Other

digitalmars.D - Array types