
digitalmars.D.announce - Mir vs. Numpy: Reworked!

reply 9il <ilyayaroshenko gmail.com> writes:
Hi all,

Since the first announcement [0] the original benchmark [1] has 
been boosted [2] with Mir-like implementations.

D+Mir:
  1. is more abstract than NumPy
  2. requires less code for multidimensional algorithms
  3. doesn't require manual indexing
  4. uses recursion across dimensions
  5. is a few times faster than NumPy for non-trivial real-world 
applications.

Why is Mir faster than NumPy?

1. Mir allows the compiler to generate specialized kernels, 
while NumPy constrains the user to write code that accesses 
memory two or more times.

Another Mir killer feature is the ability to write generalized 
N-dimensional implementations, while NumPy code needs separate 
implementations for the 1D, 2D, and 3D cases. For example, the 
main D loop in the benchmark also compiles to optimized code for 
the 4D, 5D, and higher-dimensional cases.

2. @nogc iteration loop. @nogc helps when you need to control 
what is going on with your memory allocations in the critical 
code part.
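To make both points concrete, here is a minimal sketch (the 
function name and shapes are illustrative, not taken from the 
benchmark) of a fused, @nogc elementwise kernel:

    import mir.algorithm.iteration : each;
    import mir.ndslice.slice : Slice;

    // y[i, j] = a * x[i, j] + y[i, j] in a single fused pass.
    // The lambda is an alias template parameter, so the compiler
    // specializes and inlines it; nothing in the loop allocates.
    void axpy(double a, Slice!(const(double)*, 2) x,
            Slice!(double*, 2) y) @nogc nothrow
    {
        each!((x_, ref y_) { y_ = a * x_ + y_; })(x, y);
    }

The equivalent NumPy statement `y += a * x` first materializes 
the temporary `a * x` and then reads it back for the addition, 
so memory is touched twice.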

[0] 
https://forum.dlang.org/post/pemharpztorlqkxdooul@forum.dlang.org
[1] https://github.com/typohnebild/numpy-vs-mir
[2] https://github.com/typohnebild/numpy-vs-mir/pull/1

The benchmark [1] was created by Christoph Alt and Tobias 
Schmidt.

Kind regards,
Ilya
Dec 03 2020
next sibling parent reply Andre Pany <andre s-e-a-p.de> writes:
On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:
 Hi all,
 
 Since the first announcement [0] the original benchmark [1] has 
 been boosted [2] with Mir-like implementations.
 [...]
Hi Ilya,

Thanks a lot for sharing the update. I am currently working on 
porting a Python package called FMPy to D. This package makes 
use of NumPy, and I hope I can use Mir here.

Somehow it is hard to get started learning Mir. What might help 
Python developers is to have some articles showing NumPy code 
side by side with the equivalent Mir code.

What I miss in Mir is a function to read and write CSV files. Is 
something like numpy.genfromtxt planned?

Kind regards
Andre
Dec 03 2020
parent reply 9il <ilyayaroshenko gmail.com> writes:
On Thursday, 3 December 2020 at 16:50:39 UTC, Andre Pany wrote:
 On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:
 [...]
 Hi Ilya,

 Thanks a lot for sharing the update. I am currently working on 
 porting a Python package called FMPy to D. This package makes 
 use of NumPy, and I hope I can use Mir here.
You may want to express the FMI entities as algebraic types 
rather than classes. mir.algebraic can be really helpful here: 
http://mir-core.libmir.org/mir_algebraic.html
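A minimal sketch of the idea, with hypothetical FMI-style entity 
types (the type and field names are made up for illustration):

    import mir.algebraic : Variant, match;

    struct RealVariable    { double start; }
    struct IntegerVariable { int start; }

    // A tagged union instead of a class hierarchy.
    alias FmiVariable = Variant!(RealVariable, IntegerVariable);

    double startValue(FmiVariable v)
    {
        // match dispatches on the currently held type.
        return v.match!(
            (RealVariable r) => r.start,
            (IntegerVariable i) => cast(double) i.start);
    }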
 Somehow it is hard to get started to learn MIR. What maybe 
 could help python developers is to have some articles showing 
 numpy coding and side by side the equivalent MIR coding.
It is hard for me to write articles. I will try to write a small 
one this year, but it would be Mir only. Maybe this benchmark 
can be used as an example, and if someone wishes to write a 
side-by-side comparison with NumPy, I would be happy to comment 
and explain the D implementation and what it is doing internally.
 What I miss in MIR is a function to read and write CSV files. 
 Is s.th. like numpy.genfromtxt planned?
It is unlikely that I would add it, but I can do a code review. 
Currently, we can load/save NumPy binary data with numir: 
https://libmir.github.io/numir/io.html

Kind regards,
Ilya
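P.S. In the meantime, a genfromtxt-like helper is easy to 
improvise from Phobos plus mir.ndslice.fuse. A rough sketch, 
assuming a plain comma-separated file of doubles with no header 
(the helper name is made up):

    import std.algorithm : map, splitter;
    import std.array : array;
    import std.conv : to;
    import std.stdio : File;
    import mir.ndslice.fuse : fuse;

    // Reads "1.0,2.0\n3.0,4.0\n" into a contiguous 2D slice.
    auto readCsv(string path)
    {
        return File(path)
            .byLineCopy
            .map!(line => line.splitter(',').map!(to!double).array)
            .array  // double[][]
            .fuse;  // jagged rows -> one flat allocation, 2D view
    }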
Dec 04 2020
parent Andre Pany <andre s-e-a-p.de> writes:
On Saturday, 5 December 2020 at 07:04:59 UTC, 9il wrote:
 On Thursday, 3 December 2020 at 16:50:39 UTC, Andre Pany wrote:
 On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:
 [...]
Thanks a lot for the tips. I will work through the documentation 
of mir_algebraic. Currently I follow the strategy of keeping the 
D code as similar as possible to the Python code. FMPy is in 
active development and I want to backport changes easily.

I totally missed the fact that Mir can load/save NumPy binary 
data. This is really great.

Kind regards
Andre
Dec 05 2020
prev sibling next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:
 Hi all,
 
 Since the first announcement [0] the original benchmark [1] has 
 been boosted [2] with Mir-like implementations.
 [...]
Looks good, but a few typos:

"The big difference is especially visible in this figures."

"For bigger prolem sizes the FLOP/s slightly drop and finally 
level out."

"Propably this is mainly caused by the overhead of the Python 
interpreter and might be reduced by more optimization efforts."
Dec 03 2020
parent 9il <ilyayaroshenko gmail.com> writes:
On Thursday, 3 December 2020 at 17:08:58 UTC, jmh530 wrote:
 Looks good, but a few typos:
Thanks!
Dec 04 2020
prev sibling next sibling parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Thursday, 3 December 2020 at 16:27:59 UTC, 9il wrote:
 Hi all,

 Since the first announcement [0] the original benchmark [1] has 
 been boosted [2] with Mir-like implementations.

 ... [SNIP]

 Kind regards,
 Ilya
Very interesting work. What is the difference between Mir's 
field, slice, naive, and ndslice? Looks like your packages are 
really hotting up. I wonder how the performance compares with an 
equivalent Julia implementation.
Dec 03 2020
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 3 December 2020 at 20:25:11 UTC, data pulverizer 
wrote:
 [snip]

 Very interesting work. What is the difference between Mir's 
 field, slice, naive, and ndslice? [...]
The document says:

    Slice: Python like. Uses D Slices and Strides for grouping 
(Red-Black).
    Naive: one for-loop for each dimension. Matrix-Access via 
multi-dimensional Array.
    Field: one for-loop. Matrix is flattened. Access via 
flattened index.
    NdSlice: D like. Uses just MIR functionalities.
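A rough sketch of the difference between the Naive and Field 
access patterns (toy code, not taken from the benchmark):

    // Naive: one index per dimension; rows are separate arrays.
    void scaleNaive(double[][] m, double c)
    {
        foreach (i; 0 .. m.length)
            foreach (j; 0 .. m[i].length)
                m[i][j] *= c;
    }

    // Field: the matrix is flattened into one array; a single
    // loop over a single flat index.
    void scaleField(double[] field, double c)
    {
        foreach (k; 0 .. field.length)
            field[k] *= c;
    }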
Dec 03 2020
next sibling parent mw <mingwu gmail.com> writes:
On Thursday, 3 December 2020 at 21:28:04 UTC, jmh530 wrote:
 On Thursday, 3 December 2020 at 20:25:11 UTC, data pulverizer 
 wrote:
 [snip]

 Very interesting work. What is the difference between Mir's 
 field, slice, naive, and ndslice? [...]
 The document says: [...]
As Andre said:

"""
What maybe could help python developers is to have some articles 
showing numpy coding and side by side the equivalent MIR coding.
"""

I think such a NumPy vs. Mir side-by-side equivalence (or 
improvement) document would greatly boost the adoption of Mir.
Dec 03 2020
prev sibling next sibling parent data pulverizer <data.pulverizer gmail.com> writes:
On Thursday, 3 December 2020 at 21:28:04 UTC, jmh530 wrote:
 The document says:
     Slice: Python like. Uses D Slices and Strides for grouping 
 (Red-Black).
     Naive: one for-loop for each dimension. Matrix-Access via 
 multi-dimensional Array.
     Field: one for-loop. Matrix is flattened. Access via 
 flattened index.
     NdSlice: D like. Uses just MIR functionalities.
Thanks, evidently I should have been more thorough in reading the document. :-)
Dec 03 2020
prev sibling parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Thursday, 3 December 2020 at 21:28:04 UTC, jmh530 wrote:
 The document says: [...]
It's quite interesting because it says that it's well worth 
implementing a field index as opposed to naive access - at least 
for this algorithm. It makes sense, because in the field case at 
least you know that all the data is in the same array - and 
therefore close together in memory - whereas the individual 
arrays of a multidimensional array could be far apart in memory.

NDSlice is even faster in this case - cool. Am I correct in 
assuming that the data in the NDSlice is also a single array?
Dec 03 2020
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Friday, 4 December 2020 at 02:35:49 UTC, data pulverizer wrote:
 [snip]
 NDSlice is even faster for this case - cool. Am I correct in 
 assuming that the data in the NDSlice is also a single array?
It looks like all the `sweep_XXX` functions are only defined for 
contiguous slices, as that would be the default if you define a 
Slice!(T, N).

How the functions access the data is a big difference. If you 
compare the `sweep_field` version with the `sweep_naive` 
version, the `sweep_field` function is able to access the data 
through one index, whereas the `sweep_naive` function has to use 
two indices in the 2D version and three in the 3D version.

Also, the main difference in the NDSlice version is that it uses 
*built-in* MIR functionality, like how `sweep_ndslice` uses the 
`each` function from MIR, whereas `sweep_field` uses a for loop. 
I think this is partially to show that the built-in MIR 
functionality is as fast as if you tried to do it with a for 
loop yourself.
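For reference, the `each` style looks roughly like this (toy 
shapes and values, not the benchmark code):

    import mir.algorithm.iteration : each;
    import mir.ndslice.allocation : slice;

    void main()
    {
        auto m = slice!double([4, 4], 0.0); // contiguous 2D slice
        m.each!((ref x) { x += 1.0; });     // one flat loop
    }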
Dec 04 2020
parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Friday, 4 December 2020 at 14:48:32 UTC, jmh530 wrote:
 It looks like all the `sweep_XXX` functions are only defined 
 for contiguous slices, as that would be the default if you 
 define a Slice!(T, N).
 [...]
I see - looking at some of the code, the field case is literally 
doing the indexing calculation right there. I guess ndslice is 
doing the same thing, just with "Mir magic" in the background? 
Still, ndslice gets a consistently higher rate of FLOPs than the 
field case - interesting. One thing I discovered about these 
kinds of plots is that putting one or both axes on a log scale, 
particularly for timing comparisons, can make the differences 
between methods that look close clearer. A log plot might show 
some consistent difference between the timings of ndslice and 
the field case. Underneath they should be doing essentially the 
same thing, so teasing out what is causing the difference would 
be interesting. Is Mir doing some more efficient form of the 
indexing calculation than naked field calculations?

I'm still not sure why slice is so slow. Doesn't that completely 
rely on the opSlice implementations? The choice of indexing 
method and underlying data structure? Isn't it just a symbolic 
interface that you write however you want?
Dec 04 2020
next sibling parent jmh530 <john.michael.hall gmail.com> writes:
On Friday, 4 December 2020 at 20:26:17 UTC, data pulverizer wrote:
 [snip]

 [...]

 I'm still not sure why slice is so slow. Doesn't that 
 completely rely on the opSlice implementations? The choice of 
 indexing method and underlying data structure? Isn't it just a 
 symbolic interface that you write whatever you want?
Ilya might have a better ability to answer that than me.
Dec 04 2020
prev sibling parent 9il <ilyayaroshenko gmail.com> writes:
On Friday, 4 December 2020 at 20:26:17 UTC, data pulverizer wrote:
 On Friday, 4 December 2020 at 14:48:32 UTC, jmh530 wrote:
 [...]
 I see - looking at some of the code, the field case is 
 literally doing the indexing calculation right there. I guess 
 ndslice is doing the same thing, just with "Mir magic" in the 
 background?
sweep_ndslice uses (2*N - 1) arrays to index U, which allows LDC 
to unroll the loop. More details here: 
https://forum.dlang.org/post/qejwviqovawnuniuagtd@forum.dlang.org
 I'm still not sure why slice is so slow. Doesn't that 
 completely rely on the opSlice implementations? The choice of 
 indexing method and underlying data structure?
sweep_slice is slower because it iterates the data in a few 
loops rather than in a single one. For small matrices this makes 
the JMP/FLOP ratio higher; for large matrices that can't fit 
into the CPU cache, it is less memory efficient.
Dec 05 2020
prev sibling parent reply 9il <ilyayaroshenko gmail.com> writes:
On Friday, 4 December 2020 at 02:35:49 UTC, data pulverizer wrote:
 On Thursday, 3 December 2020 at 21:28:04 UTC, jmh530 wrote:

 Am I correct in assuming that the data in the NDSlice is also a 
 single array?
sweep_ndslice uses (2*N - 1) arrays to index U, which allows LDC 
to unroll the loop.

For example, for the 2D case, withNeighboursSum [2] will store 
the pointer to the result, and the pointers at the rows above 
and below.

matrix:
--------------
------a------- above iterator
------r------- the result
------b------- below iterator
--------------

Also, for AVX-512 targets it allows vectorizing the loop [1]. 
The benchmark has been run on an AVX2 CPU.

[1] https://github.com/typohnebild/numpy-vs-mir/issues/4
[2]
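A hypothetical sketch of what walking those three row pointers 
in lockstep looks like for one interior row (not the benchmark's 
actual kernel):

    // Average the four neighbours of each interior point of
    // `row`, reading the adjacent rows through separate pointers.
    void sweepRow(const(double)* above, double* row,
            const(double)* below, size_t cols) @nogc nothrow
    {
        foreach (j; 1 .. cols - 1)
            row[j] = 0.25 * (above[j] + below[j]
                           + row[j - 1] + row[j + 1]);
    }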
Dec 04 2020
parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Saturday, 5 December 2020 at 07:44:33 UTC, 9il wrote:
 [...]
Very interesting, thank you for the explanations. Are there 
journal/book or other implementation references for these 
approaches to implementing tensor-like multidimensional arrays? 
Tensor-like multidimensional array data structures are among the 
worst covered in research/conventional literature compared to 
almost anything else, which can be rather frustrating.
Dec 06 2020
parent reply 9il <ilyayaroshenko gmail.com> writes:
On Sunday, 6 December 2020 at 17:30:13 UTC, data pulverizer wrote:
 On Saturday, 5 December 2020 at 07:44:33 UTC, 9il wrote:
 [...]
 Very interesting, thank you for the explanations. Are there 
 journal/book or other implementation references for these 
 approaches to implementing tensor-like multidimensional arrays?
I don't know. Tensors aren't so complex. The complex part is a 
design that allows Mir to construct and iterate various kinds of 
lazy tensors of any complexity through quite a universal API, 
all boosted by the fact that the user-provided kernel (lambda) 
function is optimized by the compiler without overhead.
Dec 06 2020
next sibling parent reply Igor Shirkalin <isemsoft gmail.com> writes:
On Monday, 7 December 2020 at 02:14:41 UTC, 9il wrote:
 [...]
 I don't know. Tensors aren't so complex. The complex part is a 
 design that allows Mir to construct and iterate various kinds 
 of lazy tensors of any complexity through quite a universal 
 API, all boosted by the fact that the user-provided kernel 
 (lambda) function is optimized by the compiler without overhead.
Agreed. As a matter of fact, the simplest convolutions of 
tensors are out of date. It is like there's no need to calculate 
inverse matrix. Mir is useful work for its author, of course, 
but in practice it is almost never used. Everyone who needs 
something fast for their own tasks has to build the same things 
again in D.
Dec 07 2020
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Monday, 7 December 2020 at 11:21:16 UTC, Igor Shirkalin wrote:
 [snip]

 [...] It is like there's no need to calculate inverse matrix. 
 [...]
"no need to calculate inverse matrix" What? Since when?
Dec 07 2020
next sibling parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:
 On Monday, 7 December 2020 at 11:21:16 UTC, Igor Shirkalin 
 wrote:
 [snip]

"no need to calculate inverse matrix" What? Since when?
I don't know what he meant in this context, but a common 
technique in computer graphics is to build the inverse as you 
apply computations.
Dec 07 2020
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Monday, 7 December 2020 at 13:41:17 UTC, Ola Fosheim Grostad 
wrote:
 On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:
 [snip]

 "no need to calculate inverse matrix" What? Since when?
 I don't know what he meant in this context, but a common 
 technique in computer graphics is to build the inverse as you 
 apply computations.
Ah, well if you have a small matrix, then it's not so hard to calculate the inverse anyway.
Dec 07 2020
parent reply Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Monday, 7 December 2020 at 13:48:51 UTC, jmh530 wrote:
 On Monday, 7 December 2020 at 13:41:17 UTC, Ola Fosheim Grostad 
 wrote:
 On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:
 [snip]

 "no need to calculate inverse matrix" What? Since when?
 I don't know what he meant in this context, but a common 
 technique in computer graphics is to build the inverse as you 
 apply computations.

 Ah, well if you have a small matrix, then it's not so hard to 
 calculate the inverse anyway.
It is an optimization, maybe also for accuracy, dunno. So, instead of ending up with a transform from coordinate system A to B, you also get the transform from B to A for cheap. This may matter when the next step is to go from B to C... And so on...
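For instance, a tiny sketch of that bookkeeping for uniform 
scale-plus-translate transforms (a made-up minimal struct, just 
to show the idea):

    // Keep a transform and its inverse in sync as operations
    // are applied, instead of inverting the composed matrix
    // at the end.  Forward:  p' = s*p + (tx, ty)
    //              Inverse:  p  = si*p' + (txi, tyi)
    struct XformPair
    {
        double s = 1, tx = 0, ty = 0;    // forward
        double si = 1, txi = 0, tyi = 0; // inverse

        void scale(double k)
        {
            s *= k; tx *= k; ty *= k; // forward picks up the scale
            si /= k;                  // inverse rescales only; its
                                      // translation is unchanged
        }

        void translate(double dx, double dy)
        {
            tx += dx; ty += dy;
            txi -= dx * si;           // undo the step in inverse space
            tyi -= dy * si;
        }
    }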
Dec 07 2020
next sibling parent Igor Shirkalin <isemsoft gmail.com> writes:
On Monday, 7 December 2020 at 13:54:26 UTC, Ola Fosheim Grostad 
wrote:
 On Monday, 7 December 2020 at 13:48:51 UTC, jmh530 wrote:
 On Monday, 7 December 2020 at 13:41:17 UTC, Ola Fosheim 
 Grostad wrote:
 [...]
It is an optimization, maybe also for accuracy, dunno.
Exactly. Optimization plus accuracy.
Dec 10 2020
prev sibling parent Igor Shirkalin <isemsoft gmail.com> writes:
On Monday, 7 December 2020 at 13:54:26 UTC, Ola Fosheim Grostad 
wrote:
 On Monday, 7 December 2020 at 13:48:51 UTC, jmh530 wrote:
 On Monday, 7 December 2020 at 13:41:17 UTC, Ola Fosheim 
 Grostad wrote:
 On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:
 [snip]

 "no need to calculate inverse matrix" What? Since when?
I dont know what he meant in this context, but a common technique in computer graphics is to build the inverse as as you apply computations.
Ah, well if you have a small matrix, then it's not so hard to calculate the inverse anyway.
 It is an optimization, maybe also for accuracy, dunno. So, 
 instead of ending up with a transform from coordinate system A 
 to B, you also get the transform from B to A for cheap. This 
 may matter when the next step is to go from B to C... And so 
 on...
A good example is the Simplex method for linear programming. It 
can be implemented such that you calculate an inverse [m x m] 
matrix at every step. Better, transform one inverse matrix into 
the next; that speeds up the algorithm from O(m^3) to O(m^2) per 
step, and even more. You don't even need to calculate the first 
inverse matrix if the algorithm is built in such a way that it 
is trivial. It is just one example.
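One standard way to realize such an update is the 
Sherman-Morrison rank-one formula (the product-form simplex 
update is a specialization of it): replacing one basis column is 
a rank-one change, and with the old inverse already known the 
new inverse costs O(m^2) instead of O(m^3):

    (B + u v^{\top})^{-1}
        = B^{-1} - \frac{B^{-1} u \, v^{\top} B^{-1}}
                        {1 + v^{\top} B^{-1} u}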
Dec 10 2020
prev sibling parent Igor Shirkalin <isemsoft gmail.com> writes:
On Monday, 7 December 2020 at 13:41:17 UTC, Ola Fosheim Grostad 
wrote:
 [...]
 I don't know what he meant in this context, but a common 
 technique in computer graphics is to build the inverse as you 
 apply computations.
It makes sense for the simplest cases, if the matrices are small 
(2x2, 3x3, or even 4x4) and you have to multiply them at least 
thousands of times.
Dec 10 2020
prev sibling parent reply Igor Shirkalin <isemsoft gmail.com> writes:
On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:
"no need to calculate inverse matrix" What? Since when?
Since highly optimized algorithms became required. This does not 
mean that you should not know the algorithms for calculating the 
inverse matrix.
Dec 10 2020
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 10 December 2020 at 11:07:06 UTC, Igor Shirkalin 
wrote:
 On Monday, 7 December 2020 at 13:17:47 UTC, jmh530 wrote:
 "no need to calculate inverse matrix" What? Since when?
 Since highly optimized algorithms became required. This does 
 not mean that you should not know the algorithms for 
 calculating the inverse matrix.
I still find myself inverting large matrices from time to time. Maybe there are ways to reduce the number of times I do it, but it still needs to get done for some types of problems.
Dec 10 2020
parent Igor Shirkalin <isemsoft gmail.com> writes:
On Thursday, 10 December 2020 at 14:49:08 UTC, jmh530 wrote:
 On Thursday, 10 December 2020 at 11:07:06 UTC, Igor Shirkalin 
 wrote:
 [...]
 I still find myself inverting large matrices from time to time. 
 Maybe there are ways to reduce the number of times I do it, but 
 it still needs to get done for some types of problems.
That is what I do for science purposes too, from time to time.
Dec 10 2020
prev sibling parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Monday, 7 December 2020 at 02:14:41 UTC, 9il wrote:
 I don't know. Tensors aren't so complex. The complex part is a 
 design that allows Mir to construct and iterate various kinds 
 of lazy tensors of any complexity and have quite a universal 
 API, and all of these are boosted by the fact that the 
 user-provided kernel(lambda) function is optimized by the 
 compiler without the overhead.
I agree that a basic tensor is not hard to implement, but the 
specific design to choose is not always obvious. Your benchmarks 
show that design choices have a large impact on performance, and 
performance is certainly a very important consideration in 
tensor design. For example, I had no idea that your ndslice 
variant was using more than one array internally to achieve its 
performance - it wasn't obvious to me.

I think literature that discusses various design choices and 
approaches would be useful and informative. There is plenty of 
literature on creating tree structures, linked lists, stacks, 
queues, hash tables and so forth, but virtually nothing on 
tensor data structures. It isn't as if implementing a linked 
list is any more complex than a tensor. I just think it's a bit 
strange that there is so little on the topic - given the 
widespread use of tensors in computational science.
Dec 07 2020
parent reply 9il <ilyayaroshenko gmail.com> writes:
On Monday, 7 December 2020 at 12:28:39 UTC, data pulverizer wrote:
 [...]
 I agree that a basic tensor is not hard to implement, but the 
 specific design to choose is not always obvious. Your 
 benchmarks show that design choices have a large impact on 
 performance, and performance is certainly a very important 
 consideration in tensor design. For example, I had no idea that 
 your ndslice variant was using more than one array internally 
 to achieve its performance - it wasn't obvious to me.
The ndslice tensor type uses exactly one iterator. However, the 
iterator is generic, and lazy iterators may contain any number 
of other iterators and pointers.
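A small sketch of that composition (toy shapes and values):

    import mir.ndslice.allocation : slice;
    import mir.ndslice.topology : map, zip;

    void main()
    {
        auto a = slice!double([2, 3], 1.0);
        auto b = slice!double([2, 3], 2.0);

        // One lazy tensor, one composite iterator: zip's iterator
        // holds both underlying pointers, and map wraps it with
        // the kernel, still lazily.
        auto c = zip(a, b).map!((x, y) => x + y);

        auto materialized = c.slice; // one pass, one allocation
    }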
Dec 07 2020
parent Igor Shirkalin <isemsoft gmail.com> writes:
On Monday, 7 December 2020 at 13:07:23 UTC, 9il wrote:
 On Monday, 7 December 2020 at 12:28:39 UTC, data pulverizer 
 wrote:
 [...]
 The ndslice tensor type uses exactly one iterator. However, the 
 iterator is generic, and lazy iterators may contain any number 
 of other iterators and pointers.
How does Mir's iterator differ from the usual concept of an 
iterator in D - and from rolling your own tensor design and the 
operations performed on it - in terms of execution speed, 
assuming we know how to achieve it?
Dec 10 2020
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 12/3/2020 8:27 AM, 9il wrote:
 Since the first announcement [0] the original benchmark [1] 
 has been boosted [2] with Mir-like implementations.
This is really great! Can you write an article about it? That 
would be really helpful in letting people know about it.
Dec 03 2020
parent 9il <ilyayaroshenko gmail.com> writes:
On Friday, 4 December 2020 at 03:48:15 UTC, Walter Bright wrote:
 On 12/3/2020 8:27 AM, 9il wrote:
 Since the first announcement [0] the original benchmark [1] 
 has been boosted [2] with Mir-like implementations.
This is really great! Can you write an article about it? Such would be really helpful in letting people know about it.
Thanks! The README already works really well as a description of 
the benchmark. I will do a small article about Mir this year.
Dec 04 2020