www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Matrix/Linear Algebra Library?

reply Benji Smith <dlanguage benjismith.net> writes:
Does anyone have a recommendation for a linear algebra library in D?

I need to do a singular value decomposition on large sparse matrices 
(approx 100,000 x 100,000).

I've glanced at the Blade library, on DSource, but my linear algebra is 
a bit rusty, and I can't tell whether the library simply lacks the 
ability to do an SVD or whether it can be done, but only by composing 
other operations.

Thanks!

--benji
Oct 02 2008
next sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
You need something like ARPACK to do SVD on large sparse matrices.
I have implemented a wrapper for Python before.  And I could use one
for D right now too, but I haven't done it yet.
I have a bunch of other linear algebra wrappers in my MultiArray
project on dsource, including LAPACK which can do SVD on dense
matrices.

There's also GSL wrappers someone contributed in the bindings project.
 I don't know much about it, but supposedly it at least contains BLAS
wrappers.  I don't know if it has anything much beyond that though.

If I needed something like this today, I'd put an Arpack wrapper into
multiarray, following the pattern used by the Blas, Lapack, SuperLU,
Umfpack, and Taucs wrappers I've already written.  Arpack isn't
exactly easy to call directly, so then I'd follow up by looking into
what SciPy has got for Arpack wrappers these days and try porting that
to D.

--bb

On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Does anyone have a recommendation for a linear algebra library in D?

 I need to do a singular value decomposition on large sparse matrices (approx
 100,000 x 100,000).

 I've glanced at the Blade library, on DSource, but my linear algebra is a
 bit rusty, and I can't tell whether the library simply lacks the ability to
 do an SVD or whether it can be done, but only by composing other operations.

 Thanks!

 --benji

Oct 02 2008
prev sibling next sibling parent reply "Bill Baxter" <wbaxter gmail.com> writes:
Oh, and Blade is a very low-level library aimed at optimizing the very
simplest linear algebra expressions like vector+vector.  Last I
checked it didn't contain any matrix ops at all.  And it seems like
Don is frying different fish these days (like Tango BigInt).

--bb

On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Does anyone have a recommendation for a linear algebra library in D?

 I need to do a singular value decomposition on large sparse matrices (approx
 100,000 x 100,000).

 I've glanced at the Blade library, on DSource, but my linear algebra is a
 bit rusty, and I can't tell whether the library simply lacks the ability to
 do an SVD or whether it can be done, but only by composing other operations.

 Thanks!

 --benji

Oct 02 2008
parent reply Don Clugston <nospam nospam.com> writes:
Bill Baxter wrote:
 Oh, and Blade is a very low-level library aimed at optimizing the very
 simplest linear algebra expressions like vector+vector.  Last I
 checked it didn't contain any matrix ops at all.  And it seems like
 Don is frying different fish these days (like Tango BigInt).

It's pretty much obselete now that array ops are in the language. That always happens to my most interesting, sophisticated code -- Walter turns it into a one-liner. BTW BigInt will eventually get to Phobos, too.
 
 --bb
 
 On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Does anyone have a recommendation for a linear algebra library in D?

 I need to do a singular value decomposition on large sparse matrices (approx
 100,000 x 100,000).

 I've glanced at the Blade library, on DSource, but my linear algebra is a
 bit rusty, and I can't tell whether the library simply lacks the ability to
 do an SVD or whether it can be done, but only by composing other operations.

 Thanks!

 --benji


Oct 03 2008
parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-10-03 10:27:27 +0200, Don Clugston <nospam nospam.com> said:

 Bill Baxter wrote:
 Oh, and Blade is a very low-level library aimed at optimizing the very
 simplest linear algebra expressions like vector+vector.  Last I
 checked it didn't contain any matrix ops at all.  And it seems like
 Don is frying different fish these days (like Tango BigInt).

It's pretty much obselete now that array ops are in the language. That always happens to my most interesting, sophisticated code -- Walter turns it into a one-liner. BTW BigInt will eventually get to Phobos, too.
 
 --bb
 
 On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Does anyone have a recommendation for a linear algebra library in D?
 
 I need to do a singular value decomposition on large sparse matrices (approx
 100,000 x 100,000).
 
 I've glanced at the Blade library, on DSource, but my linear algebra is a
 bit rusty, and I can't tell whether the library simply lacks the ability to
 do an SVD or whether it can be done, but only by composing other operations.
 
 Thanks!
 
 --benji



I have written dense multidimensional arrays in D. They provide some wrapper to various Lapack functions (through Bill's wrappers). The library is available at http://github.com/fawzi SVD is also there, but just for *dense* matrixes. Bill's wrappers and his dflat also work with sparse matrixes. Actually I think that you should think more about your problem If you need full SVD U and V matrixes are *full* even if A is sparse. 100'000x100'000 double matrix ~74GB, floats is half of it, but you have at least two matrixes. Either you have really big resources, or you switch to a cluster and scalapack, or (better) you reformulate your problem so that you either need just a partial decomposition, or you can solve it iteratively. Fawzi
Oct 03 2008
parent reply Fawzi Mohamed <fmohamed mac.com> writes:
On 2008-10-03 11:46:11 +0200, Fawzi Mohamed <fmohamed mac.com> said:

 On 2008-10-03 10:27:27 +0200, Don Clugston <nospam nospam.com> said:
 
 Bill Baxter wrote:
 Oh, and Blade is a very low-level library aimed at optimizing the very
 simplest linear algebra expressions like vector+vector.  Last I
 checked it didn't contain any matrix ops at all.  And it seems like
 Don is frying different fish these days (like Tango BigInt).

It's pretty much obselete now that array ops are in the language. That always happens to my most interesting, sophisticated code -- Walter turns it into a one-liner. BTW BigInt will eventually get to Phobos, too.
 
 --bb
 
 On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Does anyone have a recommendation for a linear algebra library in D?
 
 I need to do a singular value decomposition on large sparse matrices (approx
 100,000 x 100,000).
 
 I've glanced at the Blade library, on DSource, but my linear algebra is a
 bit rusty, and I can't tell whether the library simply lacks the ability to
 do an SVD or whether it can be done, but only by composing other operations.
 
 Thanks!
 
 --benji



I have written dense multidimensional arrays in D. They provide some wrapper to various Lapack functions (through Bill's wrappers). The library is available at http://github.com/fawzi SVD is also there, but just for *dense* matrixes. Bill's wrappers and his dflat also work with sparse matrixes. Actually I think that you should think more about your problem If you need full SVD U and V matrixes are *full* even if A is sparse. 100'000x100'000 double matrix ~74GB, floats is half of it, but you have at least two matrixes. Either you have really big resources, or you switch to a cluster and scalapack, or (better) you reformulate your problem so that you either need just a partial decomposition, or you can solve it iteratively. Fawzi

Probably if you want least square approximation (that is what you meant with LSA?) some kind of direct minimizer is the way to go (but be careful the problem might ill conditioned). Fawzi
Oct 03 2008
next sibling parent TomD <t_demmer nospam.web.de> writes:
Fawzi Mohamed Wrote:

[...]
 Probably if you want least square approximation (that is what you meant 
 with LSA?) some kind of direct minimizer is the way to go (but be 
 careful the problem might ill conditioned).
 Fawzi
 

for a conjugate gradient method (if the matrix is symmetric), or do some sort of pre-conditioning. But then again, the last time I did CFD was some 10 years ago. Ciao Tom
Oct 03 2008
prev sibling parent Benji Smith <dlanguage benjismith.net> writes:
Fawzi Mohamed wrote:
 Probably if you want least square approximation (that is what you meant 
 with LSA?) some kind of direct minimizer is the way to go (but be 
 careful the problem might ill conditioned).
 Fawzi

Sorry. I meant "Latent Semantic Analysis". I'm building a table of synonyms from a large document repository. There are about 100,000 documents in the repository, and the total corpus has about 100,000 unique words (after stemming). The LSA algorithm uses a matrix, where each row represents a single term, and each column represents a single document. Each cell contains the TF-IDF (Term Frequency / Inverse Document Frequency). Performing an SVD on the matrix yields a collelation matrix, representing the synonymy of all terms, in all documents. --benji
Oct 03 2008
prev sibling next sibling parent reply Oliver Dathe <o.dathe gmx.de> writes:
I used LinAlg of the Shark-Project for SVD but is has no sparse matrix 
support and is C++.
http://shark-project.sourceforge.net/

GNU Scientific Library
http://www.gnu.org/software/gsl/
http://www.gnu.org/software/gsl/manual/html_node/Singular-Value-Decomposition.html
It seem to have no sparse matrix support but there are bindings in
http://www.dsource.org/projects/bindings/browser/trunk/gsl

SVDLIBC
http://tedlab.mit.edu/~dr/SVDLIBC/
It has sparse matrix support but you have to make bindings yourself or 
try bcd for that
http://www.dsource.org/projects/bcd

NumPy
http://www.scipy.org/doc/numpy_api_docs/numpy.linalg.linalg.html#svd
I've never used Python but there is Kirks well known pyd for interfacing 
between Python and D.

Sure you need SVD? Often it is used for least squares fitting or 
pseudoinverse. So depending on what you are doing there may be weaker 
alternatives like Moore-Penrose pseudoinverse.
Oct 02 2008
parent Benji Smith <dlanguage benjismith.net> writes:
Oliver Dathe wrote:
 Sure you need SVD? Often it is used for least squares fitting or 
 pseudoinverse. So depending on what you are doing there may be weaker 
 alternatives like Moore-Penrose pseudoinverse.

Yeah, actually, I definitely need SVD. I'm putting together an LSA algorithm for a machine-learning project. My first instinct was to use Colt & JAMA (both Java libs) since the rest of the project is in Java. But I thought it would be interesting to compare the Java implementation with one in D, both performance-wise and API-wise. Thanks, everybody, for your quick responses! --benji
Oct 02 2008
prev sibling parent "Bill Baxter" <wbaxter gmail.com> writes:
On Sat, Oct 4, 2008 at 1:10 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Fawzi Mohamed wrote:
 Probably if you want least square approximation (that is what you meant
 with LSA?) some kind of direct minimizer is the way to go (but be careful
 the problem might ill conditioned).
 Fawzi

Sorry. I meant "Latent Semantic Analysis". I'm building a table of synonyms from a large document repository. There are about 100,000 documents in the repository, and the total corpus has about 100,000 unique words (after stemming). The LSA algorithm uses a matrix, where each row represents a single term, and each column represents a single document. Each cell contains the TF-IDF (Term Frequency / Inverse Document Frequency). Performing an SVD on the matrix yields a collelation matrix, representing the synonymy of all terms, in all documents.

How many nonzeros? Assuming each document has maybe 10K terms that's still a billion nonzero entries, and the SVD's U and V matrices will be dense. That is a lot of data. Fawzi's right, you probably need some bigger hammers for that, like a distributed or out of core SVD routine. But given the application I'm also guessing you only care about the vectors associated with the largest few singular values, so you should be looking for a routine that can find a few singular values at a time. The Arnoldi methods in ARPACK can do that kind of thing. I can't remember if that handles sparse matrices too, but I think it does. I seem to recall that you provide your own Mat*Vector calculation (and maybe dot or vector addition ops too?) and it does the rest. --bb
Oct 03 2008