digitalmars.D.learn - Scala Spark-like RDD for D?

data pulverizer <data.pulverizer gmail.com> writes:
Are there any plans to create a Scala Spark-like RDD class 
for D 
(https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)? 
This is a powerful model that has taken the data science world by 
storm; it would be useful to have something like this in the D 
world. Most of the algorithms in statistics/data science are 
iterative in nature, which fits well with this kind of data model.

I read through the Kind Of Container thread, which is related to 
this issue 
(https://forum.dlang.org/thread/n07rh8$dmb$1@digitalmars.com). It 
looks like immutability would be the way to go for an RDD data 
structure, but I am not wedded to any model as long as we can 
have something that provides the same functionality as the RDD.

As an alternative, are there plans for parallel/cluster computing 
frameworks for D?

Apologies if I am kicking a hornet's nest. It is not my intention.

Thanks
Feb 15 2016
data pulverizer <data.pulverizer gmail.com> writes:
On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer wrote:
> Are there any plans to create a Scala Spark-like RDD class 
> for D 
> (https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf)? 
> This is a powerful model that has taken the data science world by 
> storm; it would be useful to have something like this in the D 
> world. Most of the algorithms in statistics/data science are 
> iterative in nature, which fits well with this kind of data model.
>
> I read through the Kind Of Container thread, which is related to 
> this issue 
> (https://forum.dlang.org/thread/n07rh8$dmb$1@digitalmars.com). 
> It looks like immutability would be the way to go for an RDD 
> data structure, but I am not wedded to any model as long as we 
> can have something that provides the same functionality as the 
> RDD.
>
> As an alternative, are there plans for parallel/cluster 
> computing frameworks for D?
>
> Apologies if I am kicking a hornet's nest. It is not my 
> intention.
>
> Thanks

Perhaps the question is too prescriptive. Another way to put it: does D have a big data strategy? I tried to anchor the question to a currently functioning framework, which is why I suggested RDD.
Feb 15 2016
Jakob Jenkov <jakob jenkov.com> writes:
> Perhaps the question is too prescriptive. Another way to put it: 
> does D have a big data strategy? I tried to anchor the question 
> to a currently functioning framework, which is why I suggested RDD.

I cannot speak on behalf of the D community. In my opinion, it is not D that needs a big data strategy. It is the users of D that need that strategy.

I am originally a Java developer. Java devs create all kinds of crazy tools all the time. Lots fail, but some survive and grow big, like Spark.

D devs need to do the same. Just jump into it. Have it be your hobby project in D. Then see where it takes you.
Feb 16 2016
jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 16 February 2016 at 15:03:36 UTC, Jakob Jenkov wrote:
> I cannot speak on behalf of the D community. In my opinion, it 
> is not D that needs a big data strategy. It is the users of D 
> that need that strategy.
>
> I am originally a Java developer. Java devs create all kinds 
> of crazy tools all the time. Lots fail, but some survive and 
> grow big, like Spark.
>
> D devs need to do the same. Just jump into it. Have it be your 
> hobby project in D. Then see where it takes you.

Good attitude. Nevertheless, I think there is a much larger population of people who would want to use D for normal data analysis if packages could replicate much of what people do in R/Python.

If the OP really wants to contribute to big data projects in D, he might want to start with things that make it easier for D to interact with existing libraries. For instance, Google's MR4C allows C code to be run in a Hadoop instance. Maybe adding support for D would be doable?

http://google-opensource.blogspot.com/2015/02/mapreduce-for-c-run-native-code-in.html

There is likely value in writing bindings to machine learning libraries. I did a quick search of machine learning libraries, and most of them looked like they were written in C++. I don't have much expertise with writing bindings to C++ libraries.
Feb 16 2016
bachmeier <no spam.com> writes:
On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer wrote:
> As an alternative are there plans for parallel/cluster 
> computing frameworks for D?

You can use MPI: https://github.com/DlangScience/OpenMPI
Feb 16 2016
Jon D <jond noreply.com> writes:
On Tuesday, 16 February 2016 at 16:27:27 UTC, bachmeier wrote:
> On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer wrote:
> > As an alternative are there plans for parallel/cluster 
> > computing frameworks for D?
>
> You can use MPI: https://github.com/DlangScience/OpenMPI

FWIW, I'm also interested in the wider topic of incorporating D into data science environments. It sounds as if there are several interesting projects in the area, but so far my understanding of them is limited. Perhaps the forum isn't the best place to discuss this, but if there happen to be any blog posts or other descriptions, it'd be great to get links.

--Jon
Feb 16 2016
bachmeier <no spam.net> writes:
On Wednesday, 17 February 2016 at 02:03:40 UTC, Jon D wrote:
> On Tuesday, 16 February 2016 at 16:27:27 UTC, bachmeier wrote:
> > On Monday, 15 February 2016 at 11:09:10 UTC, data pulverizer wrote:
> > > As an alternative are there plans for parallel/cluster 
> > > computing frameworks for D?
> >
> > You can use MPI: https://github.com/DlangScience/OpenMPI
>
> FWIW, I'm also interested in the wider topic of incorporating D 
> into data science environments. It sounds as if there are several 
> interesting projects in the area, but so far my understanding of 
> them is limited. Perhaps the forum isn't the best place to discuss 
> this, but if there happen to be any blog posts or other 
> descriptions, it'd be great to get links.
>
> --Jon

You can discuss here, but there is also a Gitter room

https://gitter.im/DlangScience/public

Also, I've got a project that embeds R inside D

http://lancebachmeier.com/rdlang/

It's not quite as good a user experience as others because I have limited time for things not related to work. I've got an older project to embed D inside R, but it hasn't been updated in a while and it's Linux only.

https://bitbucket.org/bachmeil/dmdinline2
Feb 16 2016
Jon D <jond noreply.com> writes:
On Wednesday, 17 February 2016 at 02:32:01 UTC, bachmeier wrote:
> You can discuss here, but there is also a Gitter room
>
> https://gitter.im/DlangScience/public
>
> Also, I've got a project that embeds R inside D
>
> http://lancebachmeier.com/rdlang/
>
> It's not quite as good a user experience as others because I 
> have limited time for things not related to work. I've got an 
> older project to embed D inside R, but it hasn't been updated 
> in a while and it's Linux only.
>
> https://bitbucket.org/bachmeil/dmdinline2

Excellent, thanks, I'll check these out.

--Jon
Feb 16 2016