www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.announce - pfft 0.1

reply "jerro" <a a.com> writes:
I'm announcing the release of pfft, a fast FFT written in D.

Code: https://github.com/jerro/pfft/

Downloads: https://github.com/jerro/pfft/downloads

Documentation: http://jerro.github.com/pfft/doc/pfft.pfft.html

Benchmarks: http://jerro.github.com/pfft/benchmarks/
Jul 20 2012
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
jerro:

 I'm announcing the release of pfft, a fast FFT written in D.

Everything looks nice. Are you using "in", pure/nothrow/immutable/const enough? Is it worth changing the Phobos API to something similar to this API? Bye, bearophile
Jul 20 2012
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 7/20/12 8:02 PM, jerro wrote:
 On Friday, 20 July 2012 at 23:36:37 UTC, bearophile wrote:
 Is it worth changing the Phobos API to something similar to this API?

Pfft already includes an API that is compatible with the Phobos one (pfft.stdapi). But I don't think it makes any sense to change the Phobos API to something like pfft.pfft. The main difference between the two is that std.numeric.Fft uses interleaved format for sequences of complex numbers and pfft.pfft uses a split format. Interleaved format is more convenient in most cases and I think it's more intuitive too. The FFT in Phobos also works on ranges, while pfft.pfft only works on aligned arrays. The only reason the pfft.pfft works the way it does is that it is supposed to be an interface to the underlying implementation with no loss of performance and that is how the implementation works. Even if the implementation in Phobos did use split format, I think it would still be better to use interleaved format for the API. It's just about 30% slower and I think that a nice API is more important for the standard library than a 30% difference in speed.

This all is very interesting. Even considering all potential issues and incompatibilities, it might be great to look into integrating Fft into Phobos. Do you think it would work to add it as a "high-speed alternative with different format and alignment requirements"? Offhand I'd think someone using the FFT is generally interested in speed, and a 30% margin is sensible since the algorithm might be most of some applications' run time.
 There is one more important difference in the API between
 std.numeric.Fft and pfft.pfft. With Phobos you can use one
 instance of Fft for all sizes and types. With pfft.pfft, the Fft
 class is parametrized with a floating point type and you need a
 different instance for each size. I think what Phobos does in
 this case is wrong. The fact that one class works on all
 floating point types results in very poor precision when
 the data consists of doubles or reals. I guess that precomputed
 tables are stored as floats. You could only fix that by saving
 them as reals, but that would probably hurt performance quite
 a lot. The fact that one instance can be used for multiple sizes
 would be a problem if we wanted to change an implementation since
 not all FFT implementation can use one precomputed table for
 different data sizes.

Do you think Phobos can be fixed in a backwards-compatible way (e.g. provide a template and alias it to the old name)? Andrei
Jul 24 2012
prev sibling next sibling parent "jerro" <a a.com> writes:
On Friday, 20 July 2012 at 23:36:37 UTC, bearophile wrote:
 jerro:

 I'm announcing the release of pfft, a fast FFT written in D.

Everything looks nice. Are you using "in", pure/nothrow/immutable/const enough?

Not yet, but I plan to add those.
 Is it worth changing the Phobos API to something similar to 
 this API?

Pfft already includes an API that is compatible with the Phobos one (pfft.stdapi). But I don't think it makes any sense to change the Phobos API to something like pfft.pfft. The main difference between the two is that std.numeric.Fft uses interleaved format for sequences of complex numbers and pfft.pfft uses a split format. Interleaved format is more convenient in most cases and I think it's more intuitive too. The FFT in Phobos also works on ranges, while pfft.pfft only works on aligned arrays. The only reason the pfft.pfft works the way it does is that it is supposed to be an interface to the underlying implementation with no loss of performance and that is how the implementation works. Even if the implementation in Phobos did use split format, I think it would still be better to use interleaved format for the API. It's just about 30% slower and I think that a nice API is more important for the standard library than a 30% difference in speed. There is one more important difference in the API between std.numeric.Fft and pfft.pfft. With Phobos you can use one instance of Fft for all sizes and types. With pfft.pfft, the Fft class is parametrized with a floating point type and you need a different instance for each size. I think what Phobos does in this case is wrong. The fact that one class works on all floating point types results in very poor precision when the data consists of doubles or reals. I guess that precomputed tables are stored as floats. You could only fix that by saving them as reals, but that would probably hurt performance quite a lot. The fact that one instance can be used for multiple sizes would be a problem if we wanted to change an implementation since not all FFT implementation can use one precomputed table for different data sizes.
 Bye,
 bearophile

Jul 20 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
jerro:

Are all your benchmarks done on a 64 bit system?

 I think what Phobos does in
 this case is wrong. The fact that one class works on all
 floating point types results in very poor precision when
 the data consists of doubles or reals. I guess that precomputed
 tables are stored as floats. You could only fix that by saving
 them as reals, but that would probably hurt performance quite
 a lot. The fact that one instance can be used for multiple sizes
 would be a problem if we wanted to change an implementation 
 since
 not all FFT implementation can use one precomputed table for
 different data sizes.

If you are right and this is a problem, are Andrei and others accepting to change this little part of Phobos? If the answer is positive, are you interested in creating a GIT patch that changes that? Bye, bearophile
Jul 20 2012
prev sibling next sibling parent "jerro" <a a.com> writes:
 Are all your benchmarks done on a 64 bit system?

They are. Here's one comparison with 32 bit (for complex single precision transform using sse)if you are interested in that: http://imgur.com/CTuCD . The 32 bit executable is slower, probably because there are less general purpose and SSE registers on x86 than on x86_64.
 If you are right and this is a problem, are Andrei and others 
 accepting to change this little part of Phobos? If the answer 
 is positive, are you interested in creating a GIT patch that 
 changes that?

I think that poor precision is a problem. I have checked now and std.numeric.Fft does indeed use floats for the lookup table. The precision problem could be solved by either changing that to real or by changing Fft to a template and using whatever the type parameter is. Just changing float to real doesn't require changing the API, but would probably result in worse performance. I haven't talked to Andrei or others about changing it, but I am willing to write a patch that changes the API, if it would be decided that would be the best thing to do. If it would be decided that it's best to just change the type of the lookup table that's trivial anyway, since it's just one typedef.
Jul 20 2012
prev sibling next sibling parent deadalnix <deadalnix gmail.com> writes:
On 21/07/2012 01:26, jerro wrote:
 I'm announcing the release of pfft, a fast FFT written in D.

 Code: https://github.com/jerro/pfft/

 Downloads: https://github.com/jerro/pfft/downloads

 Documentation: http://jerro.github.com/pfft/doc/pfft.pfft.html

 Benchmarks: http://jerro.github.com/pfft/benchmarks/

A signal processing module in phobos sound like a lot of fun.
Jul 21 2012
prev sibling next sibling parent "jerro" <a a.com> writes:
 I think that poor precision is a problem. I have checked now and
 std.numeric.Fft does indeed use floats for the lookup table.
 The precision problem could be solved by either changing that to
 real or by changing Fft to a template and using whatever the 
 type
 parameter is. Just changing float to real doesn't require 
 changing
 the API, but would probably result in worse performance.

I forgot to mention one more solution (the one that pfft.stdapi currently uses). The class could also use lazy initialization. When the fft is called, check if the lookup table for that type is already created, create it if it isn't, store it for later and then compute the fft. This would require casting away const in the fft() method, though.
Jul 21 2012
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
jerro:

 I haven't talked to Andrei or others about changing it, but I am
 willing to write a patch that changes the API, if it would be
 decided that would be the best thing to do.

Given your willingness to work, and the days since you wrote that, maybe we have to write this little proposal again in the main D newsgroup. Bye, bearophile
Jul 24 2012
prev sibling next sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Friday, 20 July 2012 at 23:26:46 UTC, jerro wrote:
 I'm announcing the release of pfft, a fast FFT written in D.

 Code: https://github.com/jerro/pfft/

 Downloads: https://github.com/jerro/pfft/downloads

 Documentation: http://jerro.github.com/pfft/doc/pfft.pfft.html

 Benchmarks: http://jerro.github.com/pfft/benchmarks/

Nice work. Just did the usual advertising in HN and Reddit, http://news.ycombinator.com/item?id=4286257 http://www.reddit.com/r/programming/comments/x2tt8/pfft_a_fast_fourier_transform_written_in_d/ -- Paulo
Jul 24 2012
prev sibling next sibling parent "jerro" <a a.com> writes:
 This all is very interesting. Even considering all potential 
 issues and incompatibilities, it might be great to look into 
 integrating Fft into Phobos. Do you think it would work to add 
 it as a "high-speed alternative with different format and 
 alignment requirements"? Offhand I'd think someone using the 
 FFT is generally interested in speed, and a 30% margin is 
 sensible since the algorithm might be most of some 
 applications' run time.

If we were integrating pfft into Phobos, it would be best to replace the current implementation completely. Pfft already includes an API that is Phobos compatible (it deinterleaves and copies the input to an internal buffer and then interleaves the result and copies it to the output buffer when it's done). This interface is slower than the in place, split format interface, but is still a lot faster than the current Phobos implementation if SIMD operations are available. I guess it wouldn't hurt to also add split format, in place functions which would require aligned arrays for those who need every bit of performance.
 Do you think Phobos can be fixed in a backwards-compatible way 
 (e.g. provide a template and alias it to the old name)?

After thinking about it a bit I think that it would be best to solve this using lazy initialization. Users probably don't care that much whether the internal data is computed in the or just before computing the first transform. That way the API doesn't need to be changed.
Jul 24 2012
prev sibling parent "jerro" <a a.com> writes:
 Nice work.

 Just did the usual advertising in HN and Reddit,

 http://news.ycombinator.com/item?id=4286257

 http://www.reddit.com/r/programming/comments/x2tt8/pfft_a_fast_fourier_transform_written_in_d/

 --
 Paulo

Thanks.
Jul 24 2012