digitalmars.D - Matrix/Linear Algebra Library?

Benji Smith (9/9) Oct 02 2008 Does anyone have a recommendation for a linear algebra library in D?

Bill Baxter (17/25) Oct 02 2008 You need something like ARPACK to do SVD on large sparse matrices.
Bill Baxter (6/14) Oct 02 2008 Oh, and Blade is a very low-level library aimed at optimizing the very

Don Clugston (5/26) Oct 03 2008 It's pretty much obselete now that array ops are in the language. That

Fawzi Mohamed (16/45) Oct 03 2008 I have written dense multidimensional arrays in D.

Fawzi Mohamed (5/55) Oct 03 2008 Probably if you want least square approximation (that is what you meant

TomD (8/13) Oct 03 2008 You normally use SVD only if the system is ill conditioned. Otherwise ch...
Benji Smith (11/15) Oct 03 2008 Sorry. I meant "Latent Semantic Analysis". I'm building a table of

Bill Baxter (15/30) Oct 03 2008 How many nonzeros? Assuming each document has maybe 10K terms that's

Oliver Dathe (20/20) Oct 02 2008 I used LinAlg of the Shark-Project for SVD but is has no sparse matrix

Benji Smith (9/12) Oct 02 2008 Yeah, actually, I definitely need SVD. I'm putting together an LSA

Benji Smith <dlanguage benjismith.net> writes:

Does anyone have a recommendation for a linear algebra library in D?

I need to do a singular value decomposition on large sparse matrices 
(approx 100,000 x 100,000).

I've glanced at the Blade library, on DSource, but my linear algebra is 
a bit rusty, and I can't tell whether the library simply lacks the 
ability to do an SVD or whether it can be done, but only by composing 
other operations.

Thanks!

--benji

Oct 02 2008

"Bill Baxter" <wbaxter gmail.com> writes:

You need something like ARPACK to do SVD on large sparse matrices.
I have implemented a wrapper for Python before.  And I could use one
for D right now too, but I haven't done it yet.
I have a bunch of other linear algebra wrappers in my MultiArray
project on dsource, including LAPACK which can do SVD on dense
matrices.

There's also GSL wrappers someone contributed in the bindings project.
 I don't know much about it, but supposedly it at least contains BLAS
wrappers.  I don't know if it has anything much beyond that though.

If I needed something like this today, I'd put an Arpack wrapper into
multiarray, following the pattern used by the Blas, Lapack, SuperLU,
Umfpack, and Taucs wrappers I've already written.  Arpack isn't
exactly easy to call directly, so then I'd follow up by looking into
what SciPy has got for Arpack wrappers these days and try porting that
to D.

--bb

On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Does anyone have a recommendation for a linear algebra library in D?

 I need to do a singular value decomposition on large sparse matrices (approx
 100,000 x 100,000).

 I've glanced at the Blade library, on DSource, but my linear algebra is a
 bit rusty, and I can't tell whether the library simply lacks the ability to
 do an SVD or whether it can be done, but only by composing other operations.

 Thanks!

 --benji

Oct 02 2008

"Bill Baxter" <wbaxter gmail.com> writes:

Oh, and Blade is a very low-level library aimed at optimizing the very
simplest linear algebra expressions like vector+vector.  Last I
checked it didn't contain any matrix ops at all.  And it seems like
Don is frying different fish these days (like Tango BigInt).

--bb

On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Does anyone have a recommendation for a linear algebra library in D?

 I need to do a singular value decomposition on large sparse matrices (approx
 100,000 x 100,000).

 I've glanced at the Blade library, on DSource, but my linear algebra is a
 bit rusty, and I can't tell whether the library simply lacks the ability to
 do an SVD or whether it can be done, but only by composing other operations.

 Thanks!

 --benji

Oct 02 2008

Don Clugston <nospam nospam.com> writes:

Bill Baxter wrote:
 Oh, and Blade is a very low-level library aimed at optimizing the very
 simplest linear algebra expressions like vector+vector.  Last I
 checked it didn't contain any matrix ops at all.  And it seems like
 Don is frying different fish these days (like Tango BigInt).

It's pretty much obselete now that array ops are in the language. That 
always happens to my most interesting, sophisticated code -- Walter 
turns it into a one-liner.

BTW BigInt will eventually get to Phobos, too.


 
 --bb
 
 On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Does anyone have a recommendation for a linear algebra library in D?

 I need to do a singular value decomposition on large sparse matrices (approx
 100,000 x 100,000).

 I've glanced at the Blade library, on DSource, but my linear algebra is a
 bit rusty, and I can't tell whether the library simply lacks the ability to
 do an SVD or whether it can be done, but only by composing other operations.

 Thanks!

 --benji

Oct 03 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-10-03 10:27:27 +0200, Don Clugston <nospam nospam.com> said:

 Bill Baxter wrote:
 Oh, and Blade is a very low-level library aimed at optimizing the very
 simplest linear algebra expressions like vector+vector.  Last I
 checked it didn't contain any matrix ops at all.  And it seems like
 Don is frying different fish these days (like Tango BigInt).

 
 It's pretty much obselete now that array ops are in the language. That 
 always happens to my most interesting, sophisticated code -- Walter 
 turns it into a one-liner.
 
 BTW BigInt will eventually get to Phobos, too.
 
 
 
 --bb
 
 On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Does anyone have a recommendation for a linear algebra library in D?
 
 I need to do a singular value decomposition on large sparse matrices (approx
 100,000 x 100,000).
 
 I've glanced at the Blade library, on DSource, but my linear algebra is a
 bit rusty, and I can't tell whether the library simply lacks the ability to
 do an SVD or whether it can be done, but only by composing other operations.
 
 Thanks!
 
 --benji



I have written dense multidimensional arrays in D.
They provide some wrapper to various Lapack functions (through Bill's 
wrappers).
The library is available at
	http://github.com/fawzi
SVD is also there, but just for *dense* matrixes.
Bill's wrappers and his dflat also work with sparse matrixes.

Actually I think that you should think more about your problem
If you need full SVD U and V matrixes are *full* even if A is sparse.
100'000x100'000 double matrix ~74GB, floats is half of it, but you have 
at least two matrixes.

Either you have really big resources, or you switch to a cluster and 
scalapack, or (better) you reformulate your problem so that you either 
need just a partial decomposition, or you can solve it iteratively.

Fawzi

Oct 03 2008

Fawzi Mohamed <fmohamed mac.com> writes:

On 2008-10-03 11:46:11 +0200, Fawzi Mohamed <fmohamed mac.com> said:

 On 2008-10-03 10:27:27 +0200, Don Clugston <nospam nospam.com> said:
 
 Bill Baxter wrote:
 Oh, and Blade is a very low-level library aimed at optimizing the very
 simplest linear algebra expressions like vector+vector.  Last I
 checked it didn't contain any matrix ops at all.  And it seems like
 Don is frying different fish these days (like Tango BigInt).

 
 It's pretty much obselete now that array ops are in the language. That 
 always happens to my most interesting, sophisticated code -- Walter 
 turns it into a one-liner.
 
 BTW BigInt will eventually get to Phobos, too.
 
 
 
 --bb
 
 On Fri, Oct 3, 2008 at 6:50 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Does anyone have a recommendation for a linear algebra library in D?
 
 I need to do a singular value decomposition on large sparse matrices (approx
 100,000 x 100,000).
 
 I've glanced at the Blade library, on DSource, but my linear algebra is a
 bit rusty, and I can't tell whether the library simply lacks the ability to
 do an SVD or whether it can be done, but only by composing other operations.
 
 Thanks!
 
 --benji



 
 I have written dense multidimensional arrays in D.
 They provide some wrapper to various Lapack functions (through Bill's 
 wrappers).
 The library is available at
 	http://github.com/fawzi
 SVD is also there, but just for *dense* matrixes.
 Bill's wrappers and his dflat also work with sparse matrixes.
 
 Actually I think that you should think more about your problem
 If you need full SVD U and V matrixes are *full* even if A is sparse.
 100'000x100'000 double matrix ~74GB, floats is half of it, but you have 
 at least two matrixes.
 
 Either you have really big resources, or you switch to a cluster and 
 scalapack, or (better) you reformulate your problem so that you either 
 need just a partial decomposition, or you can solve it iteratively.
 
 Fawzi

Probably if you want least square approximation (that is what you meant 
with LSA?) some kind of direct minimizer is the way to go (but be 
careful the problem might ill conditioned).
Fawzi

Oct 03 2008

TomD <t_demmer nospam.web.de> writes:

Fawzi Mohamed Wrote:

[...]
 Probably if you want least square approximation (that is what you meant 
 with LSA?) some kind of direct minimizer is the way to go (but be 
 careful the problem might ill conditioned).
 Fawzi
 

You normally use SVD only if the system is ill conditioned. Otherwise check
for a conjugate gradient method (if the matrix is symmetric), or do some
sort of pre-conditioning. But then again, the last time I did CFD was some
10 years ago.

Ciao
Tom

Oct 03 2008

Benji Smith <dlanguage benjismith.net> writes:

Fawzi Mohamed wrote:
 Probably if you want least square approximation (that is what you meant 
 with LSA?) some kind of direct minimizer is the way to go (but be 
 careful the problem might ill conditioned).
 Fawzi

Sorry. I meant "Latent Semantic Analysis". I'm building a table of 
synonyms from a large document repository.

There are about 100,000 documents in the repository, and the total 
corpus has about 100,000 unique words (after stemming). The LSA 
algorithm uses a matrix, where each row represents a single term, and 
each column represents a single document. Each cell contains the TF-IDF 
(Term Frequency / Inverse Document Frequency).

Performing an SVD on the matrix yields a collelation matrix, 
representing the synonymy of all terms, in all documents.

--benji

Oct 03 2008

"Bill Baxter" <wbaxter gmail.com> writes:

On Sat, Oct 4, 2008 at 1:10 AM, Benji Smith <dlanguage benjismith.net> wrote:
 Fawzi Mohamed wrote:
 Probably if you want least square approximation (that is what you meant
 with LSA?) some kind of direct minimizer is the way to go (but be careful
 the problem might ill conditioned).
 Fawzi

 Sorry. I meant "Latent Semantic Analysis". I'm building a table of synonyms
 from a large document repository.

 There are about 100,000 documents in the repository, and the total corpus
 has about 100,000 unique words (after stemming). The LSA algorithm uses a
 matrix, where each row represents a single term, and each column represents
 a single document. Each cell contains the TF-IDF (Term Frequency / Inverse
 Document Frequency).

 Performing an SVD on the matrix yields a collelation matrix, representing
 the synonymy of all terms, in all documents.

How many nonzeros?  Assuming each document has maybe 10K terms that's
still a billion nonzero entries, and the SVD's U and V matrices will
be dense.
That is a lot of data.  Fawzi's right, you probably need some bigger
hammers for that, like a distributed or out of core SVD routine.

But given the application I'm also guessing you only care about the
vectors associated with the largest few singular values, so you should
be looking for a routine that can find a few singular values at a
time.  The Arnoldi methods in ARPACK can do that kind of thing.   I
can't remember if that handles sparse matrices too, but I think it
does.  I seem to recall that you provide your own Mat*Vector
calculation (and maybe dot or vector addition ops too?) and it does
the rest.

--bb

Oct 03 2008

Oliver Dathe <o.dathe gmx.de> writes:

I used LinAlg of the Shark-Project for SVD but is has no sparse matrix 
support and is C++.
http://shark-project.sourceforge.net/

GNU Scientific Library
http://www.gnu.org/software/gsl/
http://www.gnu.org/software/gsl/manual/html_node/Singular-Value-Decomposition.html
It seem to have no sparse matrix support but there are bindings in
http://www.dsource.org/projects/bindings/browser/trunk/gsl

SVDLIBC
http://tedlab.mit.edu/~dr/SVDLIBC/
It has sparse matrix support but you have to make bindings yourself or 
try bcd for that
http://www.dsource.org/projects/bcd

NumPy
http://www.scipy.org/doc/numpy_api_docs/numpy.linalg.linalg.html#svd
I've never used Python but there is Kirks well known pyd for interfacing 
between Python and D.

Sure you need SVD? Often it is used for least squares fitting or 
pseudoinverse. So depending on what you are doing there may be weaker 
alternatives like Moore-Penrose pseudoinverse.

Oct 02 2008

Benji Smith <dlanguage benjismith.net> writes:

Oliver Dathe wrote:
 Sure you need SVD? Often it is used for least squares fitting or 
 pseudoinverse. So depending on what you are doing there may be weaker 
 alternatives like Moore-Penrose pseudoinverse.


Yeah, actually, I definitely need SVD. I'm putting together an LSA 
algorithm for a machine-learning project.

My first instinct was to use Colt & JAMA (both Java libs) since the rest 
of the project is in Java. But I thought it would be interesting to 
compare the Java implementation with one in D, both performance-wise and 
API-wise.

Thanks, everybody, for your quick responses!

--benji

Oct 02 2008

D Programming

C/C++ Programming

Other

digitalmars.D - Matrix/Linear Algebra Library?