www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Designing a matrix library for D

reply "bearophile" <bearophileHUGS lycos.com> writes:
I'd like a matrix library for D, able to handle sparse matrices
too. But first I'd like to discuss its API and general design a
bit.

---------------------------

I think a starting point for the design is visible in this paper,
"Optimizing Array Accesses in High Productivity Languages" by
Mackale Joyner et al.:
http://www.cs.rice.edu/~vs3/PDF/hpcc07-176.pdf

The paper is about optimization, but what's interesting for me is
that it also shows the API and usage of the matrices of the X10
language, that was designed for heavy numerical computations.

Those collections of X10 are based on "points" and "regions".

A Point is a vector of N keys, that are indexes of a vector (like
the row,col for a matrix, or the key for an associative array).
If N is a compile-time constant we lose some flexibility but we
gain efficiency. In theory we want both.

A Region defines part of a collection. In the simpler case it's a
slice of a 1D array, or it can be a 2D rectangular region of a 2D
array, or more. So it's an array of M points. Usually you want M
to be a compile-time constant, but in some cases you want more
flexibility. In some cases you want to intersect regions, or you
want non-convex regions, so the situation gets a little more
complex.

The paper lists some Region operations, that I think can be
implemented with no changes to the D language:

R.rank ::= # dimensions in region;
R.size() ::= # points in region
R.contains(P) ::= predicate if region R contains point P
R.contains(S) ::= predicate if region R contains region S
R.equal(S) ::= true if region R and S contain same set of points
R.rank(i) ::= projection of region R on dimension i (a
one-dimensional region)
R.rank(i).low() ::= lower bound of i-th dimension of region R
R.rank(i).high() ::= upper bound of i-th dimension of region R
R.ordinal(P) ::= ordinal value of point P in region R
R.coord(N) ::= point in region R with ordinal value = N
R1 && R2 ::= region intersection (will be rectangular if R1 and
R2 are rectangular)
R1 || R2 ::= union of regions R1 and R2 (may or may not be
rectangular,compact)
R1 - R2 ::= region difference (may or may not be
rectangular,compact)


The paper also shows some ideas for array operations:

A.rank ::= # dimensions in array
A.region ::= index region (domain) of array
A.distribution ::= distribution of array A
A[P] ::= element at point P, where P belongs to A.region
A | R ::= restriction of array onto region R (returns copy of
subarray)
A.sum(), A.max() ::= sum/max of elements in array
A1 <op> A2 ::= returns result of applying a point-wise op on
array elements,
when A1.region = A2. region
(<op> can include +, -, *, and / )
A1 || A2 ::= disjoint union of arrays A1 and A2
(A1.region and A2.region must be disjoint)
A1.overlay(A2) ::= array with region, A1.region || A2.region,
    with element value A2[P] for all points P in A2.region
    and A1[P] otherwise.

---------------------------

I've found a second source of ideas in this page:
http://dis.um.es/~alberto/hmatrix/static.html

It's a Haskell matrix library named "hmatrix". It's meant to be
"safe" because it performs "automatic inference and compile-time
checking of dimensions in matrix and vector operations.". This
means that in many cases arrays and matrices have a size known at
compile-time (so it's part of the type) and the library verifies
statically that they match:

(matrix
   [ 2.0, 0.0, -1.0
   , 1.0, 1.0,  7.0
   , 5.0, 3.0,  1.0
   , 2.0, 8.0,  0.0 ] :: L 4 3)


But:

The size of the result of certain computations can only be known 
at run time. For instance, the dimension of the nullspace of 
matrix depends on its rank, which is a nontrivial property of 
its elements:<

The library solves this problem using a GHC compiler language extension:
By hiding the unknown dimension in an existential type we can 
still compute safely with a strongly typed nullspace.<

Docs about Haskell existential types: http://www.haskell.org/haskellwiki/Existential_type This is a normal Haskell record with specified types: data Worker b x y = Worker {buffer :: b, input :: x, output :: y} That is similar to this D code: class Worker(B, X, Y) { B buffer; X input; Y output; } Existential types can be used to "hide" a type on the left: data Worker x y = forall b. Buffer b => Worker {buffer :: b, input :: x, output :: y} This allows to "hide" the size of an array where its length is not known at compile-time. I am not an Haskell expert, but I think this is a kind of type erasure (where here the type is the size of the array). With this strategy in most cases the compiler is able to enforce the arrays are of right types at compile-time, and allows run-time sizes for the few remaining cases. In D this idea should avoid template bloat caused by all the different array dimensions, but allow optimizations based on the compile-time knowledge of sizes in the points where the programmer asks for more efficiency (like unrolling on request in certain zones if the size is just 4). Is all this possible with the type system of D? Perhaps it's possible with the enum preconditions or some of the stuff discussed in the "Value range propagation for if-else" thread. Improving the D type system could be an option, if useful. Bye, bearophile
Jun 23 2014
next sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Jun 23, 2014 at 04:45:27PM +0000, bearophile via Digitalmars-d wrote:
 I'd like a matrix library for D, able to handle sparse matrices
 too. But first I'd like to discuss its API and general design a
 bit.

When it comes to libraries that need to handle diverse concrete types (e.g. compact matrices vs. sparse matrices), my take is always to write the matrix algorithms in a way that's independent of the concrete matrix type(s), and to provide concrete types as a separate submodule (or logically distinct section of the API). [...]
 Those collections of X10 are based on "points" and "regions".
 
 A Point is a vector of N keys, that are indexes of a vector (like
 the row,col for a matrix, or the key for an associative array).
 If N is a compile-time constant we lose some flexibility but we
 gain efficiency. In theory we want both.
 
 A Region defines part of a collection. In the simpler case it's a
 slice of a 1D array, or it can be a 2D rectangular region of a 2D
 array, or more. So it's an array of M points. Usually you want M
 to be a compile-time constant, but in some cases you want more
 flexibility. In some cases you want to intersect regions, or you
 want non-convex regions, so the situation gets a little more
 complex.

I think supporting non-convex (or non-rectangular) regions is a bit too ambitious. Basic support should focus on rectangular/cuboidal/etc. regions, which have the advantage that they are closed under intersections (the intersection of two rectangular regions is always another rectangular region). Non-rectangular regions would destroy a lot of simplifying assumptions that can be made to streamline the design, and yet most applications I can think of need only rectangular regions. I don't think it's worth the price to support such unusual usages. (Besides, if we design the API correctly, they should still be supportable via wrapped matrix types -- this is why matric algorithms should not depend on concrete types, so that for special needs the user can simply define a wrapper that remaps points in such a way as to achieve the desired non-convex regions.)
 The paper lists some Region operations, that I think can be
 implemented with no changes to the D language:
 
 R.rank ::= # dimensions in region;
 R.size() ::= # points in region
 R.contains(P) ::= predicate if region R contains point P
 R.contains(S) ::= predicate if region R contains region S
 R.equal(S) ::= true if region R and S contain same set of points
 R.rank(i) ::= projection of region R on dimension i (a
 one-dimensional region)
 R.rank(i).low() ::= lower bound of i-th dimension of region R
 R.rank(i).high() ::= upper bound of i-th dimension of region R
 R.ordinal(P) ::= ordinal value of point P in region R
 R.coord(N) ::= point in region R with ordinal value = N
 R1 && R2 ::= region intersection (will be rectangular if R1 and
 R2 are rectangular)
 R1 || R2 ::= union of regions R1 and R2 (may or may not be
 rectangular,compact)
 R1 - R2 ::= region difference (may or may not be
 rectangular,compact)

My current thoughts on this, is that concrete matrix/array types should only support array element access and size measurements. Operations like intersections, subarrays, etc., should be done via wrapper types that remap element access and size measurements, but forward element accesses to the underlying concrete type. This way, we avoid excessive copying, which get very expensive with large or high-dimensional arrays.
 The paper also shows some ideas for array operations:
 
 A.rank ::= # dimensions in array
 A.region ::= index region (domain) of array
 A.distribution ::= distribution of array A

What's a "distribution"?
 A[P] ::= element at point P, where P belongs to A.region
 A | R ::= restriction of array onto region R (returns copy of
 subarray)

IMO, we should avoid all implicit copying. Let the user explicitly ask for a copy if a copy is desired. Implicit copying is the mortal enemy of high performance.
 A.sum(), A.max() ::= sum/max of elements in array
 A1 <op> A2 ::= returns result of applying a point-wise op on
 array elements,
 when A1.region = A2. region
 (<op> can include +, -, *, and / )
 A1 || A2 ::= disjoint union of arrays A1 and A2
 (A1.region and A2.region must be disjoint)
 A1.overlay(A2) ::= array with region, A1.region || A2.region,
    with element value A2[P] for all points P in A2.region
    and A1[P] otherwise.

All of these should be generic functions / wrappers, I think. [...]
 I've found a second source of ideas in this page:
 http://dis.um.es/~alberto/hmatrix/static.html

The size of the result of certain computations can only be known at
run time. For instance, the dimension of the nullspace of matrix
depends on its rank, which is a nontrivial property of its elements:<

The library solves this problem using a GHC compiler language extension:
By hiding the unknown dimension in an existential type we can still
compute safely with a strongly typed nullspace.<


 With this strategy in most cases the compiler is able to enforce
 the arrays are of right types at compile-time, and allows
 run-time sizes for the few remaining cases.
 
 In D this idea should avoid template bloat caused by all the
 different array dimensions, but allow optimizations based on the
 compile-time knowledge of sizes in the points where the
 programmer asks for more efficiency (like unrolling on request in
 certain zones if the size is just 4).
 
 Is all this possible with the type system of D? Perhaps it's
 possible with the enum preconditions or some of the stuff
 discussed in the "Value range propagation for if-else" thread.
 Improving the D type system could be an option, if useful.

IMO, if we separate matrix algorithms from concrete matrix types, we've already won half the battle. Some applications already know the exact sizes of the matrices they will use (e.g., 3D graphics apps / games will almost certainly use only 4x4 homogeneous matrices), so it doesn't make sense to force them to pay for the runtime penalty of dynamically-sized m*n matrices. So in this case, they would use matrices whose size is compile-time determined, probably as compile-time size parameters to a StaticMatrix template. But other applications, e.g., computer algebra systems, cannot know the size of matrices they will need until runtime. For these applications, compile-time matrix sizes can't be used, so they will need to work with dynamically-sized matrices. Furthermore, most applications only use matrices of a fixed rank (usually rank 2, rectangular/square matrices), so it makes sense to take rank as a compile-time parameter. But some applications (e.g., programs that perform tensor calculations) need to support runtime-determined ranks. But if we make rank a runtime parameter, the previous two categories of applications will suffer unacceptable performance degradation. But if we design a proper generic matrix concept, where *any* concrete type can be used by a given matrix algorithm provided it satisfies some required properties (analogous to input ranges satisfying the range API), then we can solve this problem by having multiple matrix modules that provide different concrete types (StaticMatrix, Matrix (of fixed rank), Tensor (dynamic rank), SparseMatrix, etc.), but we only need to implement a single set of matrix operations / algorithms that can work with any combination of concrete matrix types. (And of course, these algorithms can specialize on certain concrete types if it helps improve performance, but at least they should be able to handle anything that looks like a matrix.) The key to success, then, lies in how we design the generic matrix API. If we do it right, then it should be easy to implement features like compile-time size verifications, etc.. Otherwise it can be very painful. T -- Life is complex. It consists of real and imaginary parts. -- YHL
Jun 23 2014
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
H. S. Teoh:

 Operations like intersections, subarrays, etc., should be done
 via wrapper types

A Region is not an array of data, it's more like a [(x1,x2),(y1,y2)]. You can allocate an array given a Region, you can iterate a Region, etc.
 A.distribution ::= distribution of array A

What's a "distribution"?

It's a concept of higher level compared to Regions. It's less efficient than Regions, but it's more flexible. It allows to decouple the Point from the location of the data memory. So you can have a distribution that maps Points to various different concrete arrays. http://x10.sourceforge.net/x10doc/2.2.1/x10/array/DistArray.html http://x10.sourceforge.net/x10doc/2.2.1/x10/array/Dist.html I have not fully understood what a Distribution is. Bye, bearophile
Jun 23 2014
prev sibling next sibling parent "Wyatt" <wyatt.epp gmail.com> writes:
On Monday, 23 June 2014 at 16:45:28 UTC, bearophile wrote:
 I'd like a matrix library for D, able to handle sparse matrices
 too. But first I'd like to discuss its API and general design a
 bit.

array language features to D at some point. Things APL's monadics and dyadics, and generic lifting of functions and operators as described here: http://www.ccs.neu.edu/home/pete/pub/esop-2014.pdf It's been pretty low on my priority list, but it sounds similar to some of what you're laying out here. -Wyatt
Jun 23 2014
prev sibling next sibling parent "Mason McGill" <mmcgill caltech.edu> writes:
On Monday, 23 June 2014 at 16:45:28 UTC, bearophile wrote:
I'm glad to hear there's interest in this! I've actually been 
working on a generic multidimensional array library myself (on 
and off) for the past few months (I'm hoping to have a 
well-documented dub package by the end of the summer). It's a bit 
of a hybrid between Julia's `AbstractArray`, NumPy's `ndarray`, 
Numba, and std.range, and I think it has a very "D" feel.

It is, however, designed for processing sampled signals (rather 
than matrix algebra), which may be more or less in line with 
different people's interests.

Concepts:
   InputGrid: anything with a size (size_t[n]) and n-dimensional 
opIndex.
   OutputGrid: anything with a size (size_t[n]) and n-dimensional 
opIndexAssign.

Functionality:
   - Lazy `map` and `reduce`.
   - Lazy `zip` with built-in broadcasting.
   - Lazy n-dimensional convolution and cross-correlation.
   - Lazy sampling and interpolation.
   - `collect` to force evaluation.
   - Element-wise operators and and fancy indexing via mixins*.
   - Everyone's favorite MATLAB-style constructors
     (`zeros`, `identity`, `meshGrid`, etc.)
   - Audio/Video and JSON I/O, via GStreamer (though this
     probably ought to be a separate package).

*Because D doesn't allow UFCS for operators, which is silly, but 
that's the way it is.

Other features:
   - Compatibility with an automatic binding
     generator I've been working on.
   - Testing, because I've been using this at work :)

If people want to use this, it might make sense to add linear 
algebra to this package rather than building a separate, 
incompatible algebra library (see this thread: 
http://comments.gmane.org/gmane.comp.lang.julia.devel/14533).

I did a lot of thinking about the level of abstraction that would 
be the most useful, and I settled on something like Julia's 
`AbstractArray` because it fits into D nicely ("like ranges, but 
different!") and easily maps onto something that can be passed to 
other languages (a block of memory with size/strides metadata). 
As in Julia, I decided to make rank (`nDim!grid`) a compile-time 
entity since

1) Writing rank-branching code for NumPy is cumbersome and error 
prone; rank should be part of overload resolution.
2) It speeds up component-programming-style operations without 
requiring leaky abstractions (having to specialize on concrete 
grid types).

It's a busy week so I probably won't be able to follow this 
thread, but I wanted to let you know I was working on this to 
avoid duplicated efforts, etc.

Cheers,
Mason
Jun 23 2014
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
H. S. Teoh:

 The key to success, then, lies in how we design the generic 
 matrix API. If we do it right, then it should be easy to
 implement features like compile-time size verifications,
 etc.. Otherwise it can be very painful.

One point of my post was to see if it could be useful to extend and improve the current D type system. Apparently you are saying this is not needed for efficiency reasons (but perhaps some type system improvements could be useful to reduce template bloat with the usage of this matrix library. The idea is to perform the compile-time tests of the sizes, and then remove such types in most cases, avoiding the template bloat. There are manual ways to do this, but here it's better to receive help from the type system). Bye, bearophile
Jun 23 2014
prev sibling next sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Jun 23, 2014 at 09:08:02PM +0000, Mason McGill via Digitalmars-d wrote:
 On Monday, 23 June 2014 at 16:45:28 UTC, bearophile wrote:
 I'm glad to hear there's interest in this! I've actually been working
 on a generic multidimensional array library myself (on and off) for
 the past few months (I'm hoping to have a well-documented dub package
 by the end of the summer). It's a bit of a hybrid between Julia's
 `AbstractArray`, NumPy's `ndarray`, Numba, and std.range, and I think
 it has a very "D" feel.

You are far from the first one to write a multidimensional array library in D. Denis has already done this some time ago, and I have, too. It's clear that we need to get our heads together and knock up something Phobos-worthy.
 It is, however, designed for processing sampled signals (rather than
 matrix algebra), which may be more or less in line with different
 people's interests.

I've thought about this a lot, and my conclusion is that matrix algebra is best left to a matrix-specific library wrapper that endows matrix algebra properties on top of a generic multidimensional array type. The main reason is that matrix algebra is rather peculiar to matrices specifically, particularly the non-commutative matrix product, and does not have very many useful usages outside of matrix algebra itself; yet multidimensional arrays in general are useful in a wider scope. So in my mind, there are several components that are needed here: - A generic multidimensional array, as a data container. This can be a set of diverse concrete types, e.g. to represent sparse matrices, etc.. No domain-specific operations are specified for these containers; they are purely for storage and access only. - Specialized array types, like matrices which add matrix algebra semantics to the underlying multidimensional array storage container (e.g., matrix product), or tensor wrappers (endows tensor product semantics on the underlying container). - Algorithms: these take any multidimensional array, or specialized matrix type (depending on the purpose of each algorithm), and operate on them using a common multidimensional array API. They should be as generic as they can be, but no more, e.g., LU decomposition algorithms would take a matrix type rather than the general multidimensional array type, because it is specific to matrices, whereas per-element summation would take any general multidimensional array type, since it doesn't need matrix-specific properties. Similarly, subarray operations should be able to work with any multidimensional array type, since it simply provides a restricted view on the underlying storage container, but doesn't depend on the implementational specifics of the container itself. [...]
 I did a lot of thinking about the level of abstraction that would be
 the most useful, and I settled on something like Julia's
 `AbstractArray` because it fits into D nicely ("like ranges, but
 different!") and easily maps onto something that can be passed to
 other languages (a block of memory with size/strides metadata). As in
 Julia, I decided to make rank (`nDim!grid`) a compile-time entity
 since
 
 1) Writing rank-branching code for NumPy is cumbersome and error prone; rank
 should be part of overload resolution.
 2) It speeds up component-programming-style operations without requiring
 leaky abstractions (having to specialize on concrete grid types).
 
 It's a busy week so I probably won't be able to follow this thread, but I
 wanted to let you know I was working on this to avoid duplicated efforts,
 etc.

I think the best solution would be a generic multidimensional concept (analogous to the input range concept) that will fit all of our multidimensional array implementations. Algorithms that can work with diverse ranks shouldn't care if the rank of a particular concrete type is compile-time or runtime, for example. Let the algorithms be adaptable (up to practicality, of course -- if an algo can't work or works very poorly with dynamic rank, then just add a sig constraint that requires static rank, for example), and the user can choose which concrete type(s) best suits his needs. The concrete types can then be provided by a separate module, and if we do it right, should be interoperable with each other. T -- "How are you doing?" "Doing what?"
Jun 23 2014
prev sibling next sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Mon, Jun 23, 2014 at 09:09:03PM +0000, bearophile via Digitalmars-d wrote:
 H. S. Teoh:
 
The key to success, then, lies in how we design the generic matrix
API. If we do it right, then it should be easy to implement features
like compile-time size verifications, etc.. Otherwise it can be very
painful.

One point of my post was to see if it could be useful to extend and improve the current D type system. Apparently you are saying this is not needed for efficiency reasons (but perhaps some type system improvements could be useful to reduce template bloat with the usage of this matrix library. The idea is to perform the compile-time tests of the sizes, and then remove such types in most cases, avoiding the template bloat. There are manual ways to do this, but here it's better to receive help from the type system).

I'm not against using / extending the type system to improve performance or reduce template bloat. But it would be far more ambitious than to implement something that works with the current language. T -- Laissez-faire is a French term commonly interpreted by Conservatives to mean 'lazy fairy,' which is the belief that if governments are lazy enough, the Good Fairy will come down from heaven and do all their work for them.
Jun 23 2014
prev sibling next sibling parent "Mason McGill" <mmcgill caltech.edu> writes:
On Monday, 23 June 2014 at 21:33:40 UTC, H. S. Teoh via 
Digitalmars-d wrote:
 You are far from the first one to write a multidimensional 
 array library
 in D. Denis has already done this some time ago, and I have, 
 too.

I've seen Denis' library. It seems nice, but it's a lot more concrete that what the original post was discussing. Is yours online somewhere (for inspiration)?
 I've thought about this a lot, and my conclusion is that matrix 
 algebra
 is best left to a matrix-specific library wrapper that endows 
 matrix
 algebra properties on top of a generic multidimensional array 
 type. The
 main reason is that matrix algebra is rather peculiar to 
 matrices
 specifically, particularly the non-commutative matrix product, 
 and does
 not have very many useful usages outside of matrix algebra 
 itself; yet
 multidimensional arrays in general are useful in a wider scope.

 So in my mind, there are several components that are needed 
 here:
 - A generic multidimensional array, as a data container. This 
 can be a
   set of diverse concrete types, e.g. to represent sparse 
 matrices,
   etc.. No domain-specific operations are specified for these
   containers; they are purely for storage and access only.

 - Specialized array types, like matrices which add matrix 
 algebra
   semantics to the underlying multidimensional array storage 
 container
   (e.g., matrix product), or tensor wrappers (endows tensor 
 product
   semantics on the underlying container).

 - Algorithms: these take any multidimensional array, or 
 specialized
   matrix type (depending on the purpose of each algorithm), and 
 operate
   on them using a common multidimensional array API. They 
 should be as
   generic as they can be, but no more, e.g., LU decomposition 
 algorithms
   would take a matrix type rather than the general 
 multidimensional
   array type, because it is specific to matrices, whereas 
 per-element
   summation would take any general multidimensional array type, 
 since it
   doesn't need matrix-specific properties. Similarly, subarray
   operations should be able to work with any multidimensional 
 array
   type, since it simply provides a restricted view on the 
 underlying
   storage container, but doesn't depend on the implementational
   specifics of the container itself.

Seems reasonable.
 I think the best solution would be a generic multidimensional 
 concept
 (analogous to the input range concept) that will fit all of our
 multidimensional array implementations. Algorithms that can 
 work with
 diverse ranks shouldn't care if the rank of a particular 
 concrete type
 is compile-time or runtime, for example. Let the algorithms be
 adaptable (up to practicality, of course -- if an algo can't 
 work or
 works very poorly with dynamic rank, then just add a
 constraint that
 requires static rank, for example), and the user can choose 
 which
 concrete type(s) best suits his needs.

One possible issue with dynamically-ranked grids is concept-testing. `isInputGrid!Collection` needs to check if `Collection` has a `size` property, and can be indexed with `size.length` indices. If this is to be done at compile-time, `size.length` needs to be constant.
 The concrete types can then be provided by a separate module, 
 and if we
 do it right, should be interoperable with each other.

Definitely an important design goal. You seem quite knowledgeable on this topic and I hope we can continue this conversation when my work on this is further along.
Jun 23 2014
prev sibling next sibling parent reply "David Bregman" <drb sfu.ca> writes:
Have you looked at armadillo for inspiration? 
http://arma.sourceforge.net/

It's a C++ matrix/linear algebra library which has:
-expression templates for high level optimizations (could be done 
even better in D of course, moving more stuff to compile time)
-high performance via lapack/blas
-deliberately matlab-like syntax which is appealing to a large 
subset of people who would use such a library

I think this approach would also be good for D, at least if the 
idea is to target numeric computing people. I am skeptical about 
making the library too generic, since performance is typically a 
primary concern for matrix code. I didn't read the linked paper, 
but the performance quoted in the abstract "comparable (from 48% 
to 100%) to that of lower-level Java programs" sounds pretty 
underwhelming.
Jun 23 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/23/14, 8:46 PM, Tofu Ninja wrote:
 On Tuesday, 24 June 2014 at 03:29:52 UTC, H. S. Teoh via Digitalmars-d
 wrote:

 a compile-time DSL instead of hack tricks like expression

That sounds like a it will be very odd to use and most likely have very little benefit. I think most people would agree that matrix math is a good use case for operator overloading.

Just a thought - a nice thing would be to have a shell backed up by multiple engines (LAPACK, BLAS, SIMD-based, naive etc), similarly to what LINQ does with virtualizing data access. Then possessors of various linear algebra libraries would be able to link with them and use them. Andrei
Jun 24 2014
prev sibling next sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Jun 24, 2014 at 03:04:53AM +0000, David Bregman via Digitalmars-d wrote:
 Have you looked at armadillo for inspiration? http://arma.sourceforge.net/
 
 It's a C++ matrix/linear algebra library which has:
 -expression templates for high level optimizations (could be done even
 better in D of course, moving more stuff to compile time)
 -high performance via lapack/blas
 -deliberately matlab-like syntax which is appealing to a large subset
 of people who would use such a library
 
 I think this approach would also be good for D, at least if the idea
 is to target numeric computing people. I am skeptical about making the
 library too generic, since performance is typically a primary concern
 for matrix code. I didn't read the linked paper, but the performance
 quoted in the abstract "comparable (from 48% to 100%) to that of
 lower-level Java programs" sounds pretty underwhelming.

Using lapack/blas as the backend would guarantee good performance, though I'm not sure if the Phobos maintainers would be too thrilled at the idea, after the fiasco with libcurl. For high performance, I would implement matrix algebra completely within a compile-time DSL instead of hack tricks like expression templates and operator overloading. That way, the implementation can algebraically manipulate the expressions to maximize opportunities for optimizations. As for genericity, I think D has the ability to generate performant code *and* be generic at the same time. At least, it'd worth shooting for to see how far we can get. I believe the key is to separate the algorithms from the concrete types, and use specialization for the performance-sensitive cases. This way, the generic cases will work, and the specific cases will be high-performance. T -- Nothing in the world is more distasteful to a man than to take the path that leads to himself. -- Herman Hesse
Jun 23 2014
prev sibling next sibling parent "Tofu Ninja" <emmons0 purdue.edu> writes:
On Tuesday, 24 June 2014 at 03:29:52 UTC, H. S. Teoh via 
Digitalmars-d wrote:

 a compile-time DSL instead of hack tricks like expression

That sounds like a it will be very odd to use and most likely have very little benefit. I think most people would agree that matrix math is a good use case for operator overloading.
Jun 23 2014
prev sibling next sibling parent "ed" <growlercab gmail.com> writes:
On Monday, 23 June 2014 at 21:08:03 UTC, Mason McGill wrote:
[snip]
 Concepts:
   InputGrid: anything with a size (size_t[n]) and n-dimensional 
 opIndex.
   OutputGrid: anything with a size (size_t[n]) and 
 n-dimensional opIndexAssign.

 Cheers,
 Mason

I don't think 'Grid' is not a good name for the type. The term Grid is generally used to refer to any sort of grid; regular, semiregular, irregular, unstructured, curvilinear etc. etc. grids. They are all very different in their use and internal representations. Often a grid will have support for metadata at each point (GIS etc.) whereas a matrix will not. I think the word Matrix is a better fit, because that is what they these types are.
Jun 23 2014
prev sibling next sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Tue, Jun 24, 2014 at 03:46:09AM +0000, Tofu Ninja via Digitalmars-d wrote:
 On Tuesday, 24 June 2014 at 03:29:52 UTC, H. S. Teoh via Digitalmars-d
 wrote:
 
a compile-time DSL instead of hack tricks like expression

That sounds like a it will be very odd to use and most likely have very little benefit. I think most people would agree that matrix math is a good use case for operator overloading.

The DSL can look exactly like a regular matrix expression. The case against operator overloading, is that unless you resort to hacks like expression templates, the result will likely have poor performance, because the overloaded operator can only see one or two arguments at a time, whereas a compile-time DSL interpreter can analyze the entire expression and use algebraic rules to rewrite it in a form that can be efficiently implemented, eg., in terms of lapack/blas primitives for computing expressions like A*x + y. Doing this sort of rewriting with expression templates would be a maintenance nightmare. T -- Lawyer: (n.) An innocence-vending machine, the effectiveness of which depends on how much money is inserted.
Jun 23 2014
prev sibling next sibling parent "Mason McGill" <mmcgill caltech.edu> writes:
On Tuesday, 24 June 2014 at 04:36:04 UTC, ed wrote:
 On Monday, 23 June 2014 at 21:08:03 UTC, Mason McGill wrote:
 [snip]
 Concepts:
  InputGrid: anything with a size (size_t[n]) and n-dimensional 
 opIndex.
  OutputGrid: anything with a size (size_t[n]) and 
 n-dimensional opIndexAssign.

 Cheers,
 Mason

I don't think 'Grid' is not a good name for the type. The term Grid is generally used to refer to any sort of grid; regular, semiregular, irregular, unstructured, curvilinear etc. etc. grids.

When you're talking about sampling on a grid, warping can either be thought of as warping the grid, or warping the function to be sampled. My library happens to work in terms of the second approach, so it is (pedantically) consistent. I see your point, though.
 They are all very different in their use and internal 
 representations. Often a grid will have support for metadata at 
 each point (GIS etc.) whereas a matrix will not.

 I think the word Matrix is a better fit, because that is what 
 they these types are.

Not quite. "Matrix" connotes 2-dimensional tensors (in the linear algebra sense). The library I'm developing at work is for manipulating multidimensional arrays, which don't have the same semantics people expect from matrices (representing linear operators). However, the name "Array" already refers to at least 3 data structures in D, so it was out. I picked "Grid" because the library is for processing data sampled regularly on n-dimensional grids (mostly vision). With my library, you could represent complex data in a GIS via a grid of structures, or multiple grids ("layers"). If you're not convinced, I intend to release my work under a liberal license, so feel free to download it and find/replace "Grid" -> "Matrix" :) Also, check out this other option if you haven't already: http://denis-sh.bitbucket.org/unstandard/unstd.multidimarray.html
Jun 24 2014
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
David Bregman:

 I didn't read the linked paper, but the performance quoted in 
 the abstract "comparable (from 48% to 100%) to that of 
 lower-level Java programs" sounds pretty underwhelming.

The design I've suggested is not that virtual/indirect, so the performance is better. Bye, bearophile
Jun 24 2014
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
David Bregman:

 I think this approach would also be good for D, at least if the 
 idea is to target numeric computing people.

I am thinking about something slightly different, and with a semantics more similar to that Haskell library. I don't think D is going to compete with Julia and its high dynamism, so better to use the static compilation at its fullest, and catch more bugs at compile-time. That's why I was discussing about type system improvements, etc. Bye, bearophile
Jun 24 2014
prev sibling next sibling parent "ed" <gmail gmail.com> writes:
On Tuesday, 24 June 2014 at 07:01:14 UTC, Mason McGill wrote:
 On Tuesday, 24 June 2014 at 04:36:04 UTC, ed wrote:
 On Monday, 23 June 2014 at 21:08:03 UTC, Mason McGill wrote:
 [snip]
 Concepts:
 InputGrid: anything with a size (size_t[n]) and n-dimensional 
 opIndex.
 OutputGrid: anything with a size (size_t[n]) and 
 n-dimensional opIndexAssign.

 Cheers,
 Mason

I don't think 'Grid' is not a good name for the type. The term Grid is generally used to refer to any sort of grid; regular, semiregular, irregular, unstructured, curvilinear etc. etc. grids.

When you're talking about sampling on a grid, warping can either be thought of as warping the grid, or warping the function to be sampled. My library happens to work in terms of the second approach, so it is (pedantically) consistent. I see your point, though.
 They are all very different in their use and internal 
 representations. Often a grid will have support for metadata 
 at each point (GIS etc.) whereas a matrix will not.

 I think the word Matrix is a better fit, because that is what 
 they these types are.

Not quite. "Matrix" connotes 2-dimensional tensors (in the linear algebra sense). The library I'm developing at work is for manipulating multidimensional arrays, which don't have the same semantics people expect from matrices (representing linear operators). However, the name "Array" already refers to at least 3 data structures in D, so it was out. I picked "Grid" because the library is for processing data sampled regularly on n-dimensional grids (mostly vision). With my library, you could represent complex data in a GIS via a grid of structures, or multiple grids ("layers").

Well you could but there are well established data structures that are optimal for each grid type so it would be better to just use one of those.
 If you're not convinced, I intend to release my work under a 
 liberal license, so feel free to download it and find/replace 
 "Grid" -> "Matrix" :)

Fair enough and I am only nitpicking. Your library sounds promising so I'll definitely give it a whirl when released.
 Also, check out this other option if you haven't already:
 http://denis-sh.bitbucket.org/unstandard/unstd.multidimarray.html

Yep, this is nice and I use it already, along with Sci-D. Cheers, ed
Jun 24 2014
prev sibling next sibling parent "Jared" <jared economicmodeling.com> writes:
On Tuesday, 24 June 2014 at 03:29:52 UTC, H. S. Teoh via 
Digitalmars-d wrote:
 Using lapack/blas as the backend would guarantee good 
 performance,
 though I'm not sure if the Phobos maintainers would be too 
 thrilled at
 the idea, after the fiasco with libcurl.

It is straightforward with "version" blocks to support multiple backends at compile time, even including a pure D version (lower performance but always works). One of our internal libraries supports either ATLAS/BLAS (for dev boxes) or MKL (for production big iron) in this way. We just change a makefile variable to use "--version=ATLAS" or "--version=MKL". J
Jul 07 2014
prev sibling parent "w0rp" <devw0rp gmail.com> writes:
I have implemented a couple of matrices which may be of interest. 
One is heap allocated with garbage collection, another is based 
on 2D static arrays. I didn't implement a spare matrix.

https://github.com/w0rp/dstruct/blob/master/source/dstruct/matrix.d

There's very little which can be shared between sparse matrices 
and full matrices, as they optimise for very different scenarios. 
I wouldn't attempt write too many functions which work on both.

The one very interesting thing I got out of working on the 
matrices is that you can get a reference to the 2D static array 
matrix as a 1D static array, which means that you can do the 
following.

Matrix!(float, M, N) matrix;

fillArrayInC(matrix.array1D.ptr, matrix.array1D.length);

That's the one interesting thing I'd take away from everything I 
did. The ability to do that in D might be useful for someone else 
some time.
Jul 07 2014