digitalmars.D - Back in the game: Numerics in D

Norbert Nemec <Norbert Nemec-online.de> writes:
Hi there,

just to say "Hi" to those that might still remember my name from the 
good ol' times. (I just checked - I actually sent 464 messages to this 
list from 2004 to 2006 and none since.)

To explain: Back in 2004, at the beginning of my PhD project in 
computational physics, I discovered D, spent lots of thought on it in my 
spare time and took part in many discussions, but D was not really 
usable for my actual PhD work, so I never had any serious application 
for it and soon had to cut down the time I spent on it. A little later, 
I met a nice girl, married her, finished my PhD, moved to the UK and worked 
two years as a postdoc coding mostly in Fortran and Python. Over these 
years, I observed D only from a distance, was amazed at the 
progress but never got into it more seriously.

Now, however -- probably inspired by the frustration over the Fortran 
code I have to work with -- I have picked up the ideas I started in 2004 
and have realized that it might actually be the time to turn these into 
a serious, full time research project:

I am currently considering applying for an academic research fellowship 
that would allow me to work full time on the development of a numerical 
library for D. The core of this work would be the implementation of 
multi-dimensional numerical arrays and array expressions similar to what 
NumPy does in Python. It would remain to be seen how much can be done as 
a library and what should be done at a language level. Ultimately, I 
believe D could achieve support for numerical programming with the 
comfort, simplicity and expressiveness of Python and the performance of 
Fortran.

My core ideas are still similar to what I described years ago:
     http://www.tcm.phy.cam.ac.uk/~nn245/documents/D-multidimarray.html
I was very honored to see that Oskar Linde has actually written a 
proof-of-concept library based on my description and have also seen the 
efforts that Bill Baxter has put into his similar library. Both will 
certainly be a valuable starting point for whatever I might come up with.

Please be aware that the time scale of the whole project is certainly 
too long to hold your breath. Applying for funding now, I could start 
working on it full time in summer 2011 at the earliest. Until then, I 
cannot say how much time I could divert to this project.

In any case, I would really love to see a strong community for numerical 
computing arise around D!

Greetings,
Norbert
Feb 20 2010
FeepingCreature <default_357-line yahoo.de> writes:
On 20.02.2010 20:13, Norbert Nemec wrote:
 Hi there,
 
 just to say "Hi" to those that might still remember my name from the
 good ol' times. (I just checked - I actually sent 464 messages to this
 list from 2004 to 2006 and none since.)
 

I don't think I remember you! But I'll say welcome back anyway.

If you're interested in high-performance numerical computing, the current gdc has some hairy bugs that only show themselves on -O3 -ffast-math -march=native. Certainly enough to keep people on their toes :) Just keep it in mind - numerical errors may not actually be your fault.

Also, here's a copy of the GDC autovectorization patch: http://pastebin.com/f1c5b28df

It basically does the following: if you have a (simple) loop with index size four or two or whatever your mmx or sse step size is, it translates it into SSE expressions. Example:

    float[4] a, b, c;
    [...]
    for (int i = 0; i < 4; ++i) a[i] = b[i] + c[i]; // addps

You can observe the process by passing -ftree-vectorizer-verbose=9 2>&1 |less to gdc (or gdc-build, which I've found handy for full custom rebuilds: http://pastebin.com/f28310a8a ).

Good luck with the numerical work and, welcome back to D. Have fun!
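
For readers who want to try the loop shape described above, here is a minimal sketch in D. The function names `addLoop` and `addArrayOp` are made up for illustration and are not part of the patch. Whether either version actually ends up as a single `addps` depends on the compiler, its version and the flags used, so treat this as something to experiment with rather than a guarantee.

```d
import std.stdio : writeln;

// Element-wise add as an explicit fixed-length loop, the shape the
// post describes as a candidate for vectorization.
void addLoop(ref float[4] a, in float[4] b, in float[4] c)
{
    for (int i = 0; i < 4; ++i)
        a[i] = b[i] + c[i];
}

// The same operation written as a built-in D array operation.
void addArrayOp(ref float[4] a, in float[4] b, in float[4] c)
{
    a[] = b[] + c[];
}

void main()
{
    float[4] b = [1.0f, 2.0f, 3.0f, 4.0f];
    float[4] c = [10.0f, 20.0f, 30.0f, 40.0f];
    float[4] x, y;

    addLoop(x, b, c);
    addArrayOp(y, b, c);

    assert(x == y); // both forms compute the same result
    writeln(x);
}
```

Comparing the generated assembly of the two functions is one way to see what a given compiler makes of each form.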
Feb 20 2010
FeepingCreature <default_357-line yahoo.de> writes:
PS here's my implementation of a generic vector for comparison: http://dsource.org/projects/scrapple/browser/trunk/tools/tools/vector.d Some nice performance tweaks in there.
Feb 20 2010
bearophile <bearophileHUGS lysos.com> writes:
Norbert Nemec:
 Applying for funding now, I could start 
 working on it full time earliest in summer 2011. Until then, I cannot 
 say how much time I could divert to this project.

Your work will probably shape some of the future culture and community of the D language.

If you aren't up to date with the D2 language, in the next few days you can take a look at D2 and play with it a bit to learn it. Then you can think about the design of your libs, and in a few days you can tell us what you think are the downsides, faults, problems and missing features in D2 regarding the creation of your future lib. The sooner such possible problems are known, the better; some of them can even be fixed before your lib is done.

Bye and welcome back,
bearophile
Feb 20 2010
dsimcha <dsimcha yahoo.com> writes:
This sounds fantastic.  I and a few others (actually more than a few) around here
are very interested in putting together a complete scientific library for D.  A
few of the pieces are already in place.  Lars Kyllingstad has started a SciD
project (http://www.dsource.org/projects/scid).  For now it's basically a nice
wrapper around BLAS and LAPACK, but in the future we'd like to make more of it
native D because using crufty FORTRAN code introduces some very arbitrary
limitations and makes getting up and running with the library relatively difficult
(as in, I haven't figured out how to build the thing on Windows yet.)

I've written a fairly comprehensive statistics library called dstats
(http://www.dsource.org/projects/dstats).  It's permissively licensed and my
eventual goal is to have it merged with other people's efforts and become part of
a full-fledged scientific library.  Right now it's focused on descriptive and
inferential statistics, but I'm considering expanding it to include data mining
and machine learning.  One big bottleneck is the lack of a mature matrix/linalg
library for D.
Feb 20 2010
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
dsimcha wrote:
 This sounds fantastic.  I and a few others (actually more than a few) around here
 are very interested in putting together a complete scientific library for D.  A
 few of the pieces are already in place.  Lars Kyllingstad has started a SciD
 project (http://www.dsource.org/projects/scid).  For now it's basically a nice
 wrapper around BLAS and LAPACK, but in the future we'd like to make more of it
 native D because using crufty FORTRAN code introduces some very arbitrary
 limitations and makes getting up and running with the library relatively difficult
 (as in, I haven't figured out how to build the thing on Windows yet.)

For the sake of correctness, I would just like to point out that those limitations (namely, only being able to use float and double, and not real or user-defined floating-point types) only apply to the parts of SciD that use LAPACK, i.e. the scid.linalg module. The rest of SciD is native, templated D code.

That said, having to use LAPACK has become a major annoyance. A native D linear algebra library is high on my wish list, but not something I have the time to write myself.

There is also, as dsimcha points out, the problem of getting BLAS and LAPACK up and running on Windows. If anyone has experience with this, please speak up. Unfortunately I don't have convenient access to a Windows computer myself, but I would very much like SciD to be as cross-platform as possible. Any help with writing a Windows build/installation guide would be much appreciated.

-Lars
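
To illustrate what "native, templated D code" buys over a float/double-only LAPACK binding, here is a small sketch of a numerical routine that works for float, double and real alike. The names `centralDiff` and `square` are hypothetical and this is not SciD's actual API; it only shows the general pattern of templating on the floating-point type.

```d
import std.math : abs;
import std.traits : isFloatingPoint;

// Central-difference estimate of f'(x) with step h.
// T is deduced from the arguments; f is passed as an alias parameter.
T centralDiff(alias f, T)(T x, T h)
    if (isFloatingPoint!T)
{
    return (f(x + h) - f(x - h)) / (2 * h);
}

// Test function, itself templated so it follows whatever T is in use.
T square(T)(T t) { return t * t; }

void main()
{
    // The same source serves all three built-in floating-point types.
    assert(abs(centralDiff!square(3.0f, 0.125f) - 6) < 1e-6);
    assert(abs(centralDiff!square(3.0,  0.125)  - 6) < 1e-12);
    assert(abs(centralDiff!square(3.0L, 0.125L) - 6) < 1e-15);
}
```

Relaxing the `isFloatingPoint` constraint to "anything that supports the arithmetic used" would be one way to admit user-defined number types as well.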
Feb 23 2010
"Nick Sabalausky" <a a.a> writes:
"Norbert Nemec" <Norbert Nemec-online.de> wrote in message 
news:hlpc8g$1rfm$1 digitalmars.com...
 Hi there,

Hi!
 I am currently considering applying for an academic research fellowship 
 that would allow me to work full time on the development of a numerical 
 library for D. The core of this work would be the implementation of 
 multi-dimensional numerical arrays and array expressions similar to what 
 NumPy does in Python. It would remain to be seen how much can be done as a 
 library and what should be done at a language level. Ultimately, I believe 
 D could achieve support for numerical programming with the comfort, 
 simplicity and expressiveness of Python and the performance of Fortran.

I wish I could get paid to work on my library (and app) pet projects! :)
Feb 20 2010
Norbert Nemec <Norbert Nemec-online.de> writes:
Nick Sabalausky wrote:
 I wish I could get paid to work on my library (and app) pet projects! :)

Well, let's see - so far it is just a vague dream on my side as well. I've had that dream for a long time, though, and now I finally have a glimpse of hope.
Feb 20 2010
Walter Bright <newshound1 digitalmars.com> writes:
Norbert Nemec wrote:
 Hi there,
 
 just to say "Hi" to those that might still remember my name from the 
 good ol' times. (I just checked - I actually sent 464 messages to this 
 list from 2004 to 2006 and none since.)

Nice to have you back!
Feb 20 2010
Steve Teale <steve.teale britseyeview.com> writes:
On Sat, 20 Feb 2010 18:56:15 -0800, Walter Bright wrote:

 Norbert Nemec wrote:
 Hi there,
 
 just to say "Hi" to those that might still remember my name from the
 good ol' times. (I just checked - I actually sent 464 messages to this
 list from 2004 to 2006 and none since.)

Nice to have you back!

Walter - echoes of the past! I still have the Zortech C++ -> Lim tee shirt ;=)
Feb 20 2010
Walter Bright <newshound1 digitalmars.com> writes:
Steve Teale wrote:
 Walter - echoes of the past! I still have the Zortech C++ -> Lim tee 
 shirt ;=)

I've still got mine, too!
Feb 21 2010
Don <nospam nospam.com> writes:
Norbert Nemec wrote:
 Hi there,
 
 just to say "Hi" to those that might still remember my name from the 
 good ol' times. (I just checked - I actually sent 464 messages to this 
 list from 2004 to 2006 and none since.)

Hi! I certainly haven't forgotten you.
 I am currently considering applying for an academic research fellowship 
 that would allow me to work full time on the development of a numerical 
 library for D. The core of this work would be the implementation of 
 multi-dimensional numerical arrays and array expressions similar to what 
 NumPy does in Python. It would remain to be seen how much can be done as 
 a library and what should be done at a language level.

That would be pretty cool.
 Ultimately, I believe D could achieve support for numerical programming 
 with the comfort, simplicity and expressiveness of Python and the 
 performance of Fortran.

I have no doubt that that's true. I also believe we are close to being able to implement it. Note that a few more pieces have been put into place since you were last here. The new operator overloading scheme will be in place in the next release, which will radically simplify the existing opIndex() mess. See also this patch: http://d.puremagic.com/issues/show_bug.cgi?id=3474 This is mentioned in Andrei's book, so it should appear a few compiler releases from now.
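
As a rough illustration of the operator overloading scheme mentioned above, here is a sketch using the templated `opBinary` form as it exists in D2. The `Vec3` type is invented for this example; the point is only that one template, parameterized on the operator string, can replace several separately named operator functions.

```d
struct Vec3
{
    double x, y, z;

    // One template covers both + and -: the operator arrives as a
    // compile-time string and is pasted into the expression via mixin.
    Vec3 opBinary(string op)(Vec3 rhs) const
        if (op == "+" || op == "-")
    {
        return Vec3(mixin("x " ~ op ~ " rhs.x"),
                    mixin("y " ~ op ~ " rhs.y"),
                    mixin("z " ~ op ~ " rhs.z"));
    }

    // Scaling by a scalar on the right-hand side: v * 2.0
    Vec3 opBinary(string op)(double s) const
        if (op == "*")
    {
        return Vec3(x * s, y * s, z * s);
    }
}

void main()
{
    auto a = Vec3(1, 2, 3);
    auto b = Vec3(4, 5, 6);

    assert(a + b == Vec3(5, 7, 9));
    assert(b - a == Vec3(3, 3, 3));
    assert(a * 2.0 == Vec3(2, 4, 6));
}
```

For a numerical array library the same pattern would presumably be applied element-wise over whole arrays rather than over three fixed fields.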
 In any case, I would really love to see a strong community for numerical 
 computing arise around D!

Me too. Don.
Feb 22 2010
Norbert Nemec <Norbert Nemec-online.de> writes:
Indeed, I have already begun digging through the pile of changes that 
have been implemented over the past years. Most importantly, IFTI. Back 
then, this was the fundamental show stopper for expression template 
programming.
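
To make the connection concrete: IFTI (implicit function template instantiation) lets the compiler deduce template arguments from the function arguments, which is what keeps expression-template code writable. The following is only a toy sketch, not the design of the planned library. The names `Add`, `add` and `assign` are invented here, and a real library would hook into operators instead of free functions.

```d
// Lazy node standing for the element-wise sum lhs[i] + rhs[i].
// Nothing is computed until an element is requested.
struct Add(L, R)
{
    L lhs;
    R rhs;

    double opIndex(size_t i) const
    {
        return lhs[i] + rhs[i];
    }
}

// IFTI at work: L and R are deduced from the arguments, so callers
// never have to spell out the (possibly deeply nested) node type.
Add!(L, R) add(L, R)(L lhs, R rhs)
{
    return Add!(L, R)(lhs, rhs);
}

// Evaluate any indexable expression into dst in a single pass,
// without allocating temporaries for intermediate sums.
void assign(E)(double[] dst, E expr)
{
    foreach (i; 0 .. dst.length)
        dst[i] = expr[i];
}

void main()
{
    double[] a = [1.0, 2.0, 3.0];
    double[] b = [10.0, 20.0, 30.0];
    double[] c = [100.0, 200.0, 300.0];
    auto r = new double[3];

    // Builds Add!(Add!(double[], double[]), double[]) behind the scenes.
    assign(r, add(add(a, b), c));

    assert(r == [111.0, 222.0, 333.0]);
}
```

Without IFTI, every call to `add` would need its template arguments written out by hand, which gets unmanageable as soon as expressions nest.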

I have not looked into it in detail, but it looks as if all the essentials 
are there now to write a complete prototype. The really challenging part 
will then be to make the library as user-friendly as possible. My aim is the 
quality of NumPy. Whether this can be done in the library, or whether in 
the long run multidimensional arrays should become part of the language 
remains to be seen.

Greetings,
Norbert
Feb 22 2010
"Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Norbert Nemec wrote:
 Hi there,

Hi!
 [...]
 
 I am currently considering applying for an academic research fellowship 
 that would allow me to work full time on the development of a numerical 
 library for D. The core of this work would be the implementation of 
 multi-dimensional numerical arrays and array expressions similar to what 
 NumPy does in Python. It would remain to be seen how much can be done as 
 a library and what should be done at a language level. Ultimately, I 
 believe D could achieve support for numerical programming with the 
 comfort, simplicity and expressiveness of Python and the performance of 
 Fortran.

I completely agree. And I really don't think there is a lot more that needs to be done with the language -- I find D code as easy to read as Python code, and more elegant. The performance issue is a matter of allowing the compilers to mature, which will happen now that D2 is frozen. Personally, I have a strong belief that D could and should replace FORTRAN and C(++) as the language of choice for numerical scientists.
 [...]
 
 In any case, I would really love to see a strong community for numerical 
 computing arise around D!

Me too. I have the privilege of being able to use D in my PhD work, and as such I have written and ported a few algorithms. Recently I made them available on dsource:

  http://www.dsource.org/projects/scid

SciD contains, among other things,
  - numerical differentiation functions
  - a complete D port of QUADPACK
  - incomplete ports of MINPACK and NAPACK
  - convenient wrappers around some LAPACK functions

Unfortunately, I don't have time to work full time on this -- mostly I just add things whenever I need them for work. But if someone, such as yourself, set out to create a comprehensive scientific library, I would definitely be interested in contributing.

There are others in the community who have published scientific code as well. dsimcha's dstats library and Bill Baxter's MultiArray have already been mentioned. Yet more are listed at:

  http://prowiki.org/wiki4d/wiki.cgi?ScientificLibraries

-Lars
Feb 23 2010
Norbert Nemec <Norbert Nemec-online.de> writes:
Lars T. Kyllingstad wrote:
 I completely agree.  And I really don't think there is a lot more that 
 needs to be done with the language -- I find D code as easy to read as 
 Python code, and more elegant.  The performance issue is a matter of 
 allowing the compilers to mature, which will happen now that D2 is frozen.

True. Since IFTI is implemented, I don't see any major show stoppers. Of course, implementing a template expression numerical library a la Boost++ will be a stress test for D and I expect to find a number of rough edges, but that's actually part of the challenge of this project...
 Personally, I have a strong belief that D could and should replace 
 FORTRAN and C(++) as the language of choice for numerical scientists.

That's exactly my dream!
 [...]
   http://www.dsource.org/projects/scid

http://prowiki.org/wiki4d/wiki.cgi?ScientificLibraries

Indeed, there is plenty of groundwork to start out from. I have not really started going through it, but I would make it part of my project to organize these community contributions.
Feb 23 2010
Norbert Nemec <Norbert Nemec-online.de> writes:
Robert Jacques wrote:
 You might want to look at bugs 3474 and 2257, as well as the 
 opIndex/opSlice problem. None of these are show-stoppers, but they do 
 limit the api design and 2257 can be very exasperating.

Thanks for the warning! I know the opIndex/opSlice issue and am already considering possible solutions. The other issue is new to me, but I would expect that this is not the only bug that will need fixing in the course of this project...
Feb 23 2010
"Robert Jacques" <sandford jhu.edu> writes:
On Tue, 23 Feb 2010 07:04:22 -0500, Norbert Nemec  
<Norbert nemec-online.de> wrote:
 Lars T. Kyllingstad wrote:
 I completely agree.  And I really don't think there is a lot more that  
 needs to be done with the language -- I find D code as easy to read as  
 Python code, and more elegant.  The performance issue is a matter of  
 allowing the compilers to mature, which will happen now that D2 is  
 frozen.

True. Since IFTI is implemented, I don't see any major show stoppers. Of course, implementing a template expression numerical library a la Boost++ will be a stress test for D and I expect to find a number of rough edges, but that's actually part of the challenge of this project...

You might want to look at bugs 3474 and 2257, as well as the opIndex/opSlice problem. None of these are show-stoppers, but they do limit the api design and 2257 can be very exasperating.
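
For context on the opIndex/opSlice issue, the sketch below shows how mixed index-and-slice access can be expressed with the multi-dimensional `opSlice!dim` / `opIndex` rewriting that later D compilers provide. This mechanism postdates the thread, so it is shown as one possible resolution and not as what was available at the time. `Matrix` and `Span` are hypothetical names made up for the example.

```d
// Hypothetical helper describing a half-open range lo .. hi.
struct Span { size_t lo, hi; }

struct Matrix
{
    double[] data;     // row-major storage
    size_t rows, cols;

    // m[i, j]: plain two-index element access.
    ref double opIndex(size_t i, size_t j)
    {
        return data[i * cols + j];
    }

    // Called by the compiler for each `lo .. hi` inside the brackets;
    // dim is the position of the slice within the index list.
    Span opSlice(size_t dim)(size_t lo, size_t hi)
    {
        return Span(lo, hi);
    }

    // m[lo .. hi, j]: a copy of rows lo .. hi of column j.
    double[] opIndex(Span r, size_t j)
    {
        auto result = new double[r.hi - r.lo];
        foreach (k, ref v; result)
            v = this[r.lo + k, j];
        return result;
    }
}

void main()
{
    // 2 x 3 matrix:  1 2 3
    //                4 5 6
    auto m = Matrix([1.0, 2, 3, 4, 5, 6], 2, 3);

    assert(m[1, 2] == 6);
    assert(m[0 .. 2, 1] == [2.0, 5.0]); // index and slice mixed
}
```

A real array library would more likely return a strided view than a copy; the copy keeps the sketch short.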
Feb 23 2010