
digitalmars.D.announce - Generalized Linear Models and Stochastic Gradient Descent in D

reply data pulverizer <data.pulverizer gmail.com> writes:
Hi all,

I have written a draft article on Generalized Linear Models and 
Stochastic Gradient Descent in D 
(https://github.com/dataPulverizer/glm-stochastic-gradient-descent-d/blob/master/article.ipynb) and would greatly appreciate your suggestions.

Apologies for the spartan nature of the article formatting. That 
will change for the final draft.

Thank you in advance

DP
Jun 10
next sibling parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Saturday, 10 June 2017 at 20:03:16 UTC, data pulverizer wrote:
 Hi all,

 I have written a draft article on Generalized Linear Models and 
 Stochastic Gradient Descent in D 
 (https://github.com/dataPulverizer/glm-stochastic-gradient-descent-d/blob/master/article.ipynb) and would greatly appreciate your suggestions.

 Apologies for the spartan nature of the article formatting. 
 That will change for the final draft.

 Thank you in advance

 DP
Maybe it's the default rendering, but the open math font is hard to read as the subscripts get vertically compressed.

My suggestions:

- Distinguish between the likelihood functions for gamma and normal rather than calling them both L(x). Maybe L subscript uppercase gamma and L subscript N?
- Link to Wikipedia for the technical terms (e.g. dispersion, chi-squared, curvature); again, the vertical compression of the math font does not help here (subscripts of fractions). It will expand your audience if they don't get lost in the introduction!
- Speaking of not losing your audience: give a link to the NRA and/or a brief explanation of how it generalises to higher dimensions (a graph or animation for the 2D case would be good; perhaps take something from Wikipedia).
- I don't think it is necessary to show the signatures of the BLAS and Lapacke functions; a short description and link should suffice. Also, any reason you don't use GLAS?
- I would just have gParamCalcs as its own function (unless you are trying to show off that particular feature of D).
- Omit the parentheses of .array() and .reduce().
- You use .array a lot: how much of that is necessary? I don't think it is in zip(k.repeat().take(n).array(), x, y, mu).
- `return(curv);` should be `return curve;`
- Any reason you don't square the tolerance rather than sqrt the parsDiff?
- for(int i = 0; i < nepochs; ++i) => foreach(i; iota(nepochs))?
- zip(pars, x).map!(a => a[0]*a[1]).reduce!((a, b) => a + b); => dot(pars, x)?
- There's a lot of code and text; some images and graphs would be nice, particularly in combination with a more real-world example use case.
- Factor out code like a[2].repeat().take(a[1].length) to a function, and perhaps use some more BLAS routines for things like

  .map!( a =>
          zip(a[0].repeat().take(a[1].length),
              a[1],
              a[2].repeat().take(a[1].length),
              a[3].repeat().take(a[1].length))
          .map!(a => -a[2]*(a[0]/a[3])*a[1])
          .array())
      .array();

  to make it more obvious what the calculation is doing.

It might not be the point of the article, but it would be good to show some performance figures; I'm sure optimisation tips will be forthcoming.
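For illustration, the zip/map/reduce-to-dot suggestion can be sketched as follows; the variable names and values here are illustrative rather than taken from the article:

```d
import std.algorithm : fold, map;
import std.numeric : dotProduct;
import std.range : zip;

void main()
{
    double[] pars = [1.0, 2.0, 3.0];
    double[] x    = [4.0, 5.0, 6.0];

    // The zip/map/reduce pattern used in the article ...
    auto manual = zip(pars, x).map!(a => a[0] * a[1]).fold!((a, b) => a + b);

    // ... collapses to a single call to Phobos' dotProduct.
    auto dotted = dotProduct(pars, x);

    assert(manual == dotted); // both equal 32.0
}
```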
Jun 10
parent reply data pulverizer <data.pulverizer gmail.com> writes:
It is obvious that you took time and care to review the article. 
Thank you very much!

On Sunday, 11 June 2017 at 00:40:23 UTC, Nicholas Wilson wrote:
 Maybe its the default rendering but the open math font is hard 
 to read as the sub scripts get vertically compressed.

 My suggestions:

 Distinguish between the likelihood functions for gamma and 
 normal rather than calling them both L(x). Maybe L subscript 
 uppercase gamma and L subscript N?
Good idea!
 Links to wikipedia for the technical terms (e.g. dispersion, 
 chi squared, curvature), again the vertical compression of the 
 math font does not help here (subscripts of fractions) . It 
 will expand your audience if they don't get lost in the 
 introduction!
Yes, I should definitely add clarifying references. I should probably also note that the curvature is also the Hessian; however, I recently developed a dislike for naming mathematical constructs after people, or with odd names. I was seriously thinking about calling Newton-Raphson something else, but that might be taking it too far. I'll write the final draft in HTML so I can add a decent HTML LaTeX package to modify the size of the equations.
 Speaking of not losing your audience: give a link to the NRA 
 and/or a brief explanation of how it generalises to higher 
 dimensions (graph or animation for the 2D case would be good, 
 perhaps take something from wikipedia)
NRA? I don't understand that acronym with reference to the article. I shall mention the generalisation of the equations over multiple observations. I agree that there is a danger of losing the audience, and perhaps some graphics would be nice.
 I dont think it is necessary to show the signature of the BLAS 
 and Lapacke function, just a short description and link should 
 suffice. also any reason you don't use GLAS?
True
 I would just have gParamCalcs as its own function (unless you 
 are trying to show off that particular feature of D).
I think it's easier to use a mixin, since the same code is required in several places: mu and k are used in gLogLik, gGradient, and gCurvature, and xB is also used in gLogLik. Showing off the use of mixins is also a plus.
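A minimal sketch of that mixin pattern, with placeholder formulas standing in for the article's actual gamma calculations, might look like:

```d
// gParamCalcs injects the shared sub-calculations (xB, mu) into each
// function that needs them. The formulas below are placeholders, not
// the article's real likelihood/gradient expressions.
enum gParamCalcs = q{
    double xB = x * b;    // linear predictor (illustrative)
    double mu = 1.0 / xB; // inverse link (illustrative)
};

double gLogLik(double x, double b)
{
    mixin(gParamCalcs); // xB and mu are now in scope
    return mu + xB;
}

double gGradient(double x, double b)
{
    mixin(gParamCalcs); // same shared code, written once
    return mu * x;
}

void main()
{
    assert(gLogLik(2.0, 0.5) == 2.0);   // xB = 1, mu = 1
    assert(gGradient(2.0, 0.5) == 2.0);
}
```

Whether a plain helper function taking xB and mu as parameters would read better is exactly the trade-off Nicholas raises.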
 omit the parentheses of .array() and .reduce()
Yes
 You use .array a lot: how much of that is necessary? I dont 
 think it is in zip(k.repeat().take(n).array(), x, y, mu)
Yes, I should remove the points where .array() is not necessary
 `return(curv);` should be `return curve;`
Thanks, that's my R bleeding into my D! So it should be: return curv;
 Any reason you don't square the tolerance rather than sqrt the 
 parsDiff?
The calculation is the L2 norm, which ends in a sqrt that is later used for the stopping criterion, as in the equation.
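The stopping criterion being described is roughly the following (a sketch with hypothetical names; the article's parsDiff may differ in detail):

```d
import std.algorithm : map, sum;
import std.math : sqrt;
import std.range : zip;

// L2 norm of the difference between successive parameter vectors;
// iteration stops once this falls below the tolerance.
double parsDiff(in double[] parsNew, in double[] parsOld)
{
    return zip(parsNew, parsOld)
        .map!(a => (a[0] - a[1]) ^^ 2)
        .sum
        .sqrt;
}

void main()
{
    enum tol = 1e-6;
    assert(parsDiff([1.0, 2.0], [1.0, 2.0]) < tol);  // converged
    assert(parsDiff([1.0, 2.0], [0.0, 2.0]) == 1.0); // not converged
}
```

Nicholas's suggestion amounts to comparing the squared sum against tol * tol and skipping the sqrt entirely; both forms give the same stopping decision.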
 for(int i = 0; i < nepochs; ++i) => foreach(i; iota(epochs))?
hmm potato
 zip(pars, x).map!(a => a[0]*a[1]).reduce!((a, b) => a + b); 
 =>dot(pars,x)?
Fair point. When I started writing the article I considered attempting to write the whole thing in D functional style, with no external libraries. In the end I didn't want to write a matrix inverse in functional style, so I rolled it back somewhat and started adding C calls, which is more sensible.
 Theres a lot of code and text, some images and graphs would be 
 nice, particularly in combination with a more real world 
 example use case.
I agree that the article needs to be less austere; however, the article is about the GLM algorithm rather than its uses. I think the analyst should know whether they need a GLM or not, and there are many sources that explain applications of GLM; I could perhaps reference some.
 Factor out code like a[2].repeat().take(a[1].length) to a 
 function, perhaps use some more BLAS routines for things like

 .map!( a =>
                         zip(a[0].repeat().take(a[1].length),
                             a[1],
                             a[2].repeat().take(a[1].length),
                             a[3].repeat().take(a[1].length))
                         .map!(a => -a[2]*(a[0]/a[3])*a[1])
                         .array())
                     .array();

 to make it more obvious what the calculation is doing.
Yes
 It might not be the point of the article but it would be good 
 to show some performance figures, I'm sure optimisation tips 
 will be forthcoming.
Since I am using whatever cblas implementation is installed, I'm not sure that benchmarks would really mean much. Ilya raised a good point about the amount of copying I am doing; as I was writing it I thought so too. I address this below. Thanks again for taking the time to review the article!

My main takeaway from writing this article is that it would be quite straightforward to write a small GLM package in D. I'd use quite a different approach, with structs/classes for GLM objects to remove the copying issues and to give a consistent interface to the user.

An additional takeaway for me was that the use of array operations like a[] = b[]*c[] or d[] -= e[] - f created odd effects in my calculations: the outputs were wrong, and for ages I didn't know why, but I later ended up removing those expressions from the code altogether, which remedied the problem.
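For reference, the slice operations in question do work on current compilers when the destination is pre-allocated and does not alias the sources; a minimal sketch:

```d
void main()
{
    double[] b = [1.0, 2.0, 3.0];
    double[] c = [4.0, 5.0, 6.0];

    // The destination of a vector operation must already have the
    // right length; a[] = b[] * c[] itself does not allocate.
    auto a = new double[b.length];
    a[] = b[] * c[];
    assert(a == [4.0, 10.0, 18.0]);

    // Element-wise: d[i] -= c[i] - f
    double[] d = [10.0, 10.0, 10.0];
    double f = 1.0;
    d[] -= c[] - f;
    assert(d == [7.0, 6.0, 5.0]);
}
```

Overlapping (aliased) source and destination slices are a classic cause of exactly the kind of wrong-output behaviour described above.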
Jun 11
parent reply Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Sunday, 11 June 2017 at 10:21:03 UTC, data pulverizer wrote:
 Speaking of not losing your audience: give a link to the NRA 
 and/or a brief explanation of how it generalises to higher 
 dimensions (graph or animation for the 2D case would be good, 
 perhaps take something from wikipedia)
NRA? I don't understand that acronym with reference to the article.
Sorry Newton-Raphson Algorithm.
 I shall mention the generalisation of the equations over 
 multiple observations.

 I agree that there is a danger of loosing the audience and 
 perhaps some graphics would be nice.
I suspect that most of the people who would know what Newton-Raphson is took it in undergrad math and probably didn't cover the multidimensional case (I didn't).
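For readers in the same position, the one-dimensional case can be sketched briefly; the generalisation replaces the derivative with the gradient and division with multiplication by the inverse Hessian (the curvature the article uses):

```d
// One-dimensional Newton-Raphson: x_{n+1} = x_n - f(x_n) / f'(x_n).
double newtonRaphson(double delegate(double) f,
                     double delegate(double) fPrime,
                     double x0, double tol = 1e-12)
{
    auto x = x0;
    foreach (_; 0 .. 100)
    {
        auto step = f(x) / fPrime(x);
        x -= step;
        if (step < tol && step > -tol) // |step| below tolerance
            break;
    }
    return x;
}

void main()
{
    // Solve x^2 - 2 = 0, i.e. find sqrt(2), starting from x = 1.
    auto root = newtonRaphson(x => x * x - 2.0, x => 2.0 * x, 1.0);
    assert(root > 1.414213 && root < 1.414214);
}
```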
 I dont think it is necessary to show the signature of the BLAS 
 and Lapacke function, just a short description and link should 
 suffice. also any reason you don't use GLAS?
True
 I would just have gParamCalcs as its own function (unless you 
 are trying to show off that particular feature of D).
I think it's easier to use a mixin, since the same code is required in several places: mu and k are used in gLogLik, gGradient, and gCurvature, and xB is also used in gLogLik. Showing off the use of mixins is also a plus.
Then I would state why and give a brief explanation of what that mixin does.
 omit the parentheses of .array() and .reduce()
Yes
 You use .array a lot: how much of that is necessary? I dont 
 think it is in zip(k.repeat().take(n).array(), x, y, mu)
Yes, I should remove the points where .array() is not necessary
 `return(curv);` should be `return curve;`
Thanks, that's my R bleeding into my D! So it should be: return curv;
 Any reason you don't square the tolerance rather than sqrt the 
 parsDiff?
The calculation is the L2 norm, which ends in a sqrt that is later used for the stopping criterion, as in the equation.
 for(int i = 0; i < nepochs; ++i) => foreach(i; iota(epochs))?
hmm potato
 zip(pars, x).map!(a => a[0]*a[1]).reduce!((a, b) => a + b); 
 =>dot(pars,x)?
Fair point. When I started writing the article I considered attempting to write the whole thing in D functional style, with no external libraries. In the end I didn't want to write a matrix inverse in functional style, so I rolled it back somewhat and started adding C calls, which is more sensible.
 Theres a lot of code and text, some images and graphs would be 
 nice, particularly in combination with a more real world 
 example use case.
I agree that the article needs to be less austere; however, the article is about the GLM algorithm rather than its uses. I think the analyst should know whether they need a GLM or not, and there are many sources that explain applications of GLM; I could perhaps reference some.
Fair enough, I'm just thinking about breadth of audience.
 Factor out code like a[2].repeat().take(a[1].length) to a 
 function, perhaps use some more BLAS routines for things like

 .map!( a =>
                         zip(a[0].repeat().take(a[1].length),
                             a[1],
                             a[2].repeat().take(a[1].length),
                             a[3].repeat().take(a[1].length))
                         .map!(a => -a[2]*(a[0]/a[3])*a[1])
                         .array())
                     .array();

 to make it more obvious what the calculation is doing.
Yes
 It might not be the point of the article but it would be good 
 to show some performance figures, I'm sure optimisation tips 
 will be forthcoming.
Since I am using whatever cblas implementation is installed, I'm not sure that benchmarks would really mean much. Ilya raised a good point about the amount of copying I am doing; as I was writing it I thought so too. I address this below. Thanks again for taking the time to review the article!
No problem, always happy to help.
 My main take way from writing this article is that it would be 
 quite straightforward to write a small GLM package in D - I'd 
 use quite a different approach with structs/classes GLM objects 
 to remove the copying issues and to give a consistent interface 
 to the user.

 An additional takeaway for me was that I also found the use of 
 array operations like

 a[] = b[]*c[]

 or

 d[] -= e[] -f

 created odd effects in my calculations the outputs were wrong 
 and for ages I didn't know why but later ended up removing 
 those expressions from the code all together - which remedied 
 the problem.
Hmm, did you report those? They _should_ just work.
Jun 11
parent data pulverizer <data.pulverizer gmail.com> writes:
On Sunday, 11 June 2017 at 11:30:03 UTC, Nicholas Wilson wrote:
 An additional takeaway for me was that I also found the use of 
 array operations like

 a[] = b[]*c[]

 or

 d[] -= e[] -f

 created odd effects in my calculations the outputs were wrong 
 and for ages I didn't know why but later ended up removing 
 those expressions from the code all together - which remedied 
 the problem.
Hmm, did you report those? They _should_ just work.
Unfortunately I overwrote the code before the first commit to GitHub and did not report it. The whole experience is too traumatic for me to repeat :-) but I think it's something to do with the way the arrays are being referenced.
Jun 11
prev sibling parent reply 9il <ilyayaroshenko gmail.com> writes:
On Saturday, 10 June 2017 at 20:03:16 UTC, data pulverizer wrote:
 Hi all,

 I have written a draft article on Generalized Linear Models and 
 Stochastic Gradient Descent in D 
 (https://github.com/dataPulverizer/glm-stochastic-gradient-descent-d/blob/master/article.ipynb) and would greatly appreciate your suggestions.

 Apologies for the spartan nature of the article formatting. 
 That will change for the final draft.

 Thank you in advance

 DP
The code has a huge number of allocations. For example, T[][] matrices are built up and then concatenated to be used in BLAS. Why not use the ndslice and Lubeck [1] libraries instead?

[1] https://github.com/kaleidicassociates/lubeck

Ilya
Jun 10
next sibling parent reply 9il <ilyayaroshenko gmail.com> writes:
 Why not to use ndslice and Lubeck [1] libraries instead?

 [1] https://github.com/kaleidicassociates/lubeck

 Ilya
Lubeck already has a high-level ndslice interface for inv (inverse) and mtimes (matmul).
Jun 10
next sibling parent data pulverizer <data.pulverizer gmail.com> writes:
On Sunday, 11 June 2017 at 01:59:37 UTC, 9il wrote:
 Why not to use ndslice and Lubeck [1] libraries instead?

 [1] https://github.com/kaleidicassociates/lubeck

 Ilya
Lubeck already has a high-level ndslice interface for inv (inverse) and mtimes (matmul).
p.s. I think the work you guys are doing on Lubeck is very important! Numerical computing in D desperately needs a high-level linear algebra package.
Jun 11
prev sibling parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Sunday, 11 June 2017 at 01:59:37 UTC, 9il wrote:
 Why not to use ndslice and Lubeck [1] libraries instead?

 [1] https://github.com/kaleidicassociates/lubeck

 Ilya
Lubeck already has a high-level ndslice interface for inv (inverse) and mtimes (matmul).
p.p.s. Okay, I can see how writing an article using D's latest numerical library will help increase knowledge of packages in the ecosystem. Fair enough. I shall look into rewriting it to include Lubeck.
Jun 11
parent 9il <ilyayaroshenko gmail.com> writes:
On Sunday, 11 June 2017 at 12:04:36 UTC, data pulverizer wrote:
 On Sunday, 11 June 2017 at 01:59:37 UTC, 9il wrote:
 Why not to use ndslice and Lubeck [1] libraries instead?

 [1] https://github.com/kaleidicassociates/lubeck

 Ilya
Lubeck already has a high-level ndslice interface for inv (inverse) and mtimes (matmul).
p.p.s. Okay, I can see how writing an article using D's latest numerical library will help increase knowledge of packages in the ecosystem. Fair enough. I shall look into rewriting it to include Lubeck.
See also mir.ndslice.algorithm and mir.ndslice.topology. They are more efficient than their Phobos analogues. For example, the functions in mir.ndslice.algorithm (each, reduce, and others) can iterate over multiple ndslices at once, while Phobos requires the less efficient std.range.zip.
Jun 11
prev sibling parent reply data pulverizer <data.pulverizer gmail.com> writes:
On Sunday, 11 June 2017 at 01:57:52 UTC, 9il wrote:
 The code has huge number of allocations. For example, T[][] 
 matrixes are and then concatenated to be used in BLAS.
You are right - I realised this as I was writing the script but I address this point later ...
 Why not to use ndslice and Lubeck [1] libraries instead?

 [1] https://github.com/kaleidicassociates/lubeck

 Ilya
Thank you for mentioning the Lubeck package; I did not know about it and it looks very interesting.

The article is exploratory, and I also assume that the person reading it is busy. I tend to gravitate towards Phobos because it's there: it's the standard library and comes with D, it's easy to write code with, and it's easy for a reader to access. If I write an article with code I want it to be likely that:

1. Anyone can download and run the code immediately and it will work.
2. If someone sees the article in 6 months or 3 years and downloads the code, it will work.
3. The reader will be able to look up all the functions I have used on the D website, which makes it very easy for learners.

At this stage the numerical computing ecosystem in D is not mature and could change drastically. I added a link to the Mir library at the top because I wanted people to be aware of the Mir project. The article is more about GLM in D than performance, but I can point to the Lubeck package in the article and mention your observation about the allocations, making it clearer upfront.

As I said in the previous reply, I did learn a lot from writing the article, and I think the performance observation is highly relevant for building a GLM package in D.
Jun 11
parent reply 9il <ilyayaroshenko gmail.com> writes:
On Sunday, 11 June 2017 at 11:10:38 UTC, data pulverizer wrote:
 On Sunday, 11 June 2017 at 01:57:52 UTC, 9il wrote:
 [...]
You are right - I realised this as I was writing the script but I address this point later ...
 [...]
Thank you for mentioning the Lubeck package; I did not know about it and it looks very interesting.

The article is exploratory, and I also assume that the person reading it is busy. I tend to gravitate towards Phobos because it's there: it's the standard library and comes with D, it's easy to write code with, and it's easy for a reader to access. If I write an article with code I want it to be likely that:

1. Anyone can download and run the code immediately and it will work.
2. If someone sees the article in 6 months or 3 years and downloads the code, it will work.
3. The reader will be able to look up all the functions I have used on the D website, which makes it very easy for learners.
We support Mir-related posts: https://github.com/libmir/blog/pull/35
 At this stage the numerical computing ecosystem in D is not 
 mature and could change drastically. I added a link to the Mir 
 library at the top because I wanted people to be aware of the 
 Mir project.
I ported a few large, complex Matlab scripts using Lubeck and Mir-Algorithm (closed source). It works perfectly, and the results are the same as the Matlab originals! All of the functions in Lubeck were used in this work. Mir-Algorithm has over 2K downloads, and its download rate is growing fast.
 The article is more about GLM in D than performance but I can 
 point to the Lubeck package in the article and mention your 
 observation on the allocations - making it clearer upfront.

 As I said in the previous reply, I did learn a lot from writing 
 the article and I think the performance observation is highly 
 relevant for building a GLM package in D.
Regards, Ilya
Jun 11
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Sunday, 11 June 2017 at 12:36:19 UTC, 9il wrote:
 I ported few large complex Matlab scripts using Lubeck and 
 Mir-Algorithm (closed source).
 It works perfectly and results are the same as Matlab original!
 All functions from Lubeck was used in this work.
 Mir Algorithm has over then 2K downloads, and its downloads 
 rates grows fast.
I wasn't familiar with Lubeck, so thanks for bringing attention to it. Can I ask why this isn't part of Mir?
Jun 11
parent reply Ilya <ilyayaroshenko gmail.com> writes:
On Sunday, 11 June 2017 at 13:36:09 UTC, jmh530 wrote:
 On Sunday, 11 June 2017 at 12:36:19 UTC, 9il wrote:
 I ported few large complex Matlab scripts using Lubeck and 
 Mir-Algorithm (closed source).
 It works perfectly and results are the same as Matlab original!
 All functions from Lubeck was used in this work.
 Mir Algorithm has over then 2K downloads, and its downloads 
 rates grows fast.
I wasn't familiar with Lubeck, so thanks for bringing attention to it. Can I ask why this isn't part of mir?
It uses the mir-blas and mir-lapack packages. They are low level and satisfy betterC requirements. Mir is a collection of libraries for building higher-level, easy-to-use libraries; it should provide basic data types, patterns, and a low-level API.

Lubeck is a high-level library for scripting-style programming. I like that it is located at the Kaleidic GitHub. It is very important for D to have successful open-source commercial projects. mir-blas, mir-lapack, and Lubeck are sponsored by Symmetry Investments and Kaleidic Associates. Another example is ASDF by Tamedia Digital. BTW, I need to write an announcement )
Jun 11
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Sunday, 11 June 2017 at 15:57:14 UTC, Ilya wrote:
 It uses mir-blas and mir-lapack packages. They are low level 
 and satisfy betterC requirements. Mir is a libraries collection 
 to write more high level easy to use libraries. Mir should 
 provide basic data types, patterns, and low level API.

 Lubeck is high level library for scripting like programming. I 
 like that it is located at Kaleidic GitHub. It is very 
 important for D to have successful open source commercial 
 projects. mir-blas, mir-lapack, and Lubeck are sponsored by 
 Symmetry Investments and Kaleidic Associates. Another example 
 is ASDF by Tamedia Digital. BTW, I need to write announce )
Thanks. This will definitely be a very useful library. Your write-up might explain the meaning of the name.

Also, as you haven't done the announcement yet, I'm still a little unsure about how high level Lubeck will get. For instance, if I write some matrix-math-heavy code, then it's going to have mtimes all throughout it. I think there was some discussion in the past about a struct with opBinary overloaded (at which point it may be useful to point out that a DIP to add some more things that opBinary could overload might be useful).
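The opBinary idea mentioned above can be sketched like this (a toy type for illustration, not Lubeck's actual API):

```d
import std.numeric : dotProduct;

// A toy wrapper showing how opBinary can hide an explicit product
// call behind the * operator; Lubeck itself exposes mtimes instead.
struct Vec
{
    double[] data;

    double opBinary(string op : "*")(Vec rhs)
    {
        return dotProduct(data, rhs.data);
    }
}

void main()
{
    auto a = Vec([1.0, 2.0]);
    auto b = Vec([3.0, 4.0]);
    assert(a * b == 11.0); // 1*3 + 2*4
}
```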
Jun 12
parent 9il <ilyayaroshenko gmail.com> writes:
On Monday, 12 June 2017 at 14:46:30 UTC, jmh530 wrote:
 On Sunday, 11 June 2017 at 15:57:14 UTC, Ilya wrote:
 [...]
Thanks. This will definitely be a very useful library. Your write-up might explain the meaning of the name.
Laeeth proposed the name https://en.wikipedia.org/wiki/L%C3%BCbeck
 Also, as you haven't done the announce yet, I'm still a little 
 unsure about how high level Lubeck will get. For instance, if I 
 write some Matrix math heavy code, then it's going to have 
 mtimes all throughout it. I think there was some discussion in 
 the past about a struct with opBinary overloaded (at which 
 point it may be useful to point out that a DIP to add some more 
 things that opBinary could overload might be useful).
Yes, the DIP for opBinary would be very useful.
Jun 12