digitalmars.D.announce - DasBetterR

bachmeier <no spam.net> writes:
I've been using D and R together for a decade. I wrote [a blog 
post for the D 
Blog](https://dlang.org/blog/2020/01/27/d-for-data-science-calling-r-from-d/)
on the eve of the pandemic. I released the [embedrv2
library](https://github.com/bachmeil/embedrv2) in late 2021. It's useful for
writing D functions that are called from R, using D's metaprogramming to write
the necessary bindings for you.

My programs usually take the opposite approach, where D is the 
primary language, and I call into R to fill in missing 
functionality. I've accumulated a large collection of code 
snippets to enable all kinds of things. The problem is that they 
were scattered across many projects, there was no consistency 
across programs, documentation didn't exist, and they were more 
or less useless to anyone other than me.

[This Github repo](https://github.com/bachmeil/betterr) includes 
D modules, tests demonstrating most of the functionality, 
documentation, and some posts about how I do specific things. I'm 
sharing publicly all the things I've been doing in case it has 
value to anyone else.

Examples of functionality:

- Creating, accessing, and mutating R data structures, including 
vector, matrix, data frame, list, array, and time series types. 
Reference counting handles memory management.
- Basic statistical functionality like calculating the mean. Many 
of these functions use Mir for efficiency.
- Linear algebra
- Random number generation and sampling
- Parallel random number generation
- Numerical optimization: direct access to the C libraries used 
by R's optim function
- Quadratic programming
- Passing D functions to R without creating a shared library. For 
example, you can use a D function as the objective function you 
pass to constrOptim for constrained optimization problems.

[Project website](https://bachmeil.github.io/betterr/)

There's more detail on the website, but I used the name "Better 
R" because the entirety of R is available inside your D program 
and you can use D to improve on it as much as you'd like. Feel 
free to hate the name.

I was originally going to include all of this as part of 
embedrv2, but realized there was almost no overlap between the 
two use cases. Moreover, it would be strange to call R from D and 
call D functions from R in the same program. It simplifies things 
to keep them in different projects.

If you try it and have problems, you can [create a 
discussion](https://github.com/bachmeil/betterr/discussions). You 
can also post in this forum, but I won't guarantee I'll see it.
Jun 29 2023
zjh <fqbqrr 163.com> writes:
On Thursday, 29 June 2023 at 23:51:44 UTC, bachmeier wrote:
 I've been using D and R together for a decade. I wrote [a blog 
 post for the D 
 Blog](https://dlang.org/blog/2020/01/27/d-for-data-science-calling-r-from-d/)
on the eve of the pandemic. I released the [embedrv2
library](https://github.com/bachmeil/embedrv2) in late 2021. It's useful for
writing D functions that are called from R, using D's metaprogramming to write
the necessary bindings for you.
Nice.
Jun 29 2023
Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/29/23 7:51 PM, bachmeier wrote:
 I've been using D and R together for a decade. I wrote [a blog post for 
 the D 
 Blog](https://dlang.org/blog/2020/01/27/d-for-data-science-calling-r-from-d/)
on the eve of the pandemic. I released the [embedrv2
library](https://github.com/bachmeil/embedrv2) in late 2021. It's useful for
writing D functions that are called from R, using D's metaprogramming to write
the necessary bindings for you.
 
 My programs usually take the opposite approach, where D is the primary 
 language, and I call into R to fill in missing functionality. I've 
 accumulated a large collection of code snippets to enable all kinds of 
 things. The problem is that they were scattered across many projects, 
 there was no consistency across programs, documentation didn't exist, 
 and they were more or less useless to anyone other than me.
 
 [This Github repo](https://github.com/bachmeil/betterr) includes D 
 modules, tests demonstrating most of the functionality, documentation, 
 and some posts about how I do specific things. I'm sharing publicly all 
 the things I've been doing in case it has value to anyone else.
 
 Examples of functionality:
 
 - Creating, accessing, and mutating R data structures, including vector, 
 matrix, data frame, list, array, and time series types. Reference 
 counting handles memory management.
 - Basic statistical functionality like calculating the mean. Many of 
 these functions use Mir for efficiency.
 - Linear algebra
 - Random number generation and sampling
 - Parallel random number generation
 - Numerical optimization: direct access to the C libraries used by R's 
 optim function
 - Quadratic programming
 - Passing D functions to R without creating a shared library. For 
 example, you can use a D function as the objective function you pass to 
 constrOptim for constrained optimization problems.
 
 [Project website](https://bachmeil.github.io/betterr/)
This is very cool! I've never used R, but I have wanted to learn more about such languages.
 There's more detail on the website, but I used the name "Better R" 
 because the entirety of R is available inside your D program and you can 
 use D to improve on it as much as you'd like. Feel free to hate the name.
Awful, awful name... -Steve
Jun 29 2023
Guillaume Piolat <first.last spam.org> writes:
On Thursday, 29 June 2023 at 23:51:44 UTC, bachmeier wrote:
 If you try it and have problems, you can [create a 
 discussion](https://github.com/bachmeil/betterr/discussions). 
 You can also post in this forum, but I won't guarantee I'll see 
 it.
Super cool, congrats!
Jun 30 2023
jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 29 June 2023 at 23:51:44 UTC, bachmeier wrote:
 [snip]
Glad you're continuing to do work on this front. There's a lot of great material explaining things, which is always good. It would be cool to have another version of the link below for using a mir Slice with R. https://bachmeil.github.io/betterr/setvar.html
Jun 30 2023
bachmeier <no spam.net> writes:
On Friday, 30 June 2023 at 16:14:48 UTC, jmh530 wrote:
 On Thursday, 29 June 2023 at 23:51:44 UTC, bachmeier wrote:
 [snip]
Glad you're continuing to do work on this front. There's a lot of great material explaining things, which is always good. It would be cool to have another version of the link below for using a mir Slice with R. https://bachmeil.github.io/betterr/setvar.html
I assume you mean that you've allocated memory on the D side, 
like this:

```
auto a = new double[24];
a[] = 1.6;
Slice!(double*, 1) s = a.sliced();
```

and you want to pass s to R for further analysis. Unfortunately, 
that will not work. R functions only work with memory R has 
allocated. It has a single struct type, so there's no way to pass 
s in this example to R.

The best you can do right now is something like this:

```
auto a = Vector(24);
Slice!(double*,1) s = a.ptr[0..24].sliced();
// Manipulate s
// Send a as an argument to R functions
```

In other words, you let R allocate a, and then you work with the 
underlying data array as a slice.

A way around this limitation would be to implement the same 
struct (SEXPREC) in D, while avoiding issues with R's garbage 
collector. That's a more involved problem than I've been willing 
to take on. If someone has the interest, the SEXPREC struct is 
defined here:
https://github.com/wch/r-source/blob/060f8b64a3a8e489d8684c18b269eea63f182e73/src/include/Defn.h#L184
and the internals are documented here:
https://cran.r-project.org/doc/manuals/r-release/R-ints.html#SEXPs

As much fun as it is to figure these things out, I have never had 
sufficient time or motivation to do so.
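
The "let the other side allocate, then view the data as a slice" 
pattern can be sketched without R or Mir at all. The following is 
only an illustration under stated assumptions: `ForeignVector`, 
`foreignAlloc`, and `foreignFree` are hypothetical names, `malloc` 
stands in for R's allocator, and a plain D slice stands in for a 
Mir `Slice`. None of this is betterr's API.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical stand-in for a vector owned by a foreign runtime
// (R, in the post). D only borrows a view of the data.
struct ForeignVector
{
    double* ptr;    // start of the foreign-owned data
    size_t length;  // number of elements

    // Borrow the data as a D slice; no copy is made.
    double[] view() { return ptr[0 .. length]; }
}

// Stand-in for asking the foreign runtime to allocate n doubles.
ForeignVector foreignAlloc(size_t n)
{
    auto p = cast(double*) malloc(n * double.sizeof);
    assert(p !is null, "allocation failed");
    return ForeignVector(p, n);
}

// Stand-in for handing the buffer back to its owner.
void foreignFree(ref ForeignVector v)
{
    free(v.ptr);
    v.ptr = null;
    v.length = 0;
}

void main()
{
    auto a = foreignAlloc(24);
    scope (exit) foreignFree(a);

    double[] s = a.view();  // D-side view over foreign-owned memory
    s[] = 1.5;              // writes land directly in the foreign buffer

    assert(s.length == 24);
    assert(a.ptr[23] == 1.5);
}
```

The point of the design is that ownership never changes hands: the 
slice is valid only as long as the owner keeps the buffer alive.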
Jun 30 2023
jmh530 <john.michael.hall gmail.com> writes:
On Friday, 30 June 2023 at 18:47:06 UTC, bachmeier wrote:
 [snip]

 I assume you mean that you've allocated memory on the D side, 
 like this:

 ```
 auto a = new double[24];
 a[] = 1.6;
 Slice!(double*, 1) s = a.sliced();
 ```

 and you want to pass s to R for further analysis. 
 Unfortunately, that will not work. R functions only work with 
 memory R has allocated. It has a single struct type, so there's 
 no way to pass s in this example to R.
Unfortunate, but understood. Looking at the implementation for Vector, the constructor and opAssign look like they have to copy the data over anyway.
 [snip]

 As much fun as it is to figure these things out, I have never 
 had sufficient time or motivation to do so.
Yeah, that seems like it would be a bit hairy to figure out.
Jun 30 2023
bachmeier <no spam.net> writes:
On Friday, 30 June 2023 at 16:14:48 UTC, jmh530 wrote:
 On Thursday, 29 June 2023 at 23:51:44 UTC, bachmeier wrote:
 [snip]
Glad you're continuing to do work on this front. There's a lot of great material explaining things, which is always good. It would be cool to have another version of the link below for using a mir Slice with R. https://bachmeil.github.io/betterr/setvar.html
I was wrong. They added custom allocators a while back, but 
didn't tell anyone.

Actually, what I said before is technically correct. The SEXP 
struct itself still has to be allocated by R and managed by the R 
garbage collector. It's just that you can use a custom allocator 
to send a pointer to the data you've allocated, and once R is 
done with the data, it'll call the function you've provided to 
free the memory before destroying the SEXP struct that wraps it.

I uploaded [an example 
here](https://github.com/bachmeil/betterr/blob/main/testing/testalloc.d).

It's still a bit hackish because you need to adjust the pointer 
for a header R inserts when it allocates arrays. Adjusting by 
10*double.sizeof works in this example, but "my test didn't 
segfault" doesn't exactly inspire confidence. Once I am 
comfortable with this solution, I'll do a new release of betterr.

This'll be kind of a big deal if it works. For instance, if you 
want to use a database interface and D doesn't have one, you can 
use R's interface to that database without having R manage your 
project's memory. You could use any of the available R interfaces 
(databases, machine learning libraries, Qt, etc.)
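
One way to picture the pointer adjustment is a single block laid 
out as header followed by payload. This is a rough sketch that 
runs without R: `Block`, `allocBlock`, and `freeBlock` are 
hypothetical names, and the header size is simply the 
10*double.sizeof figure from the post, taken as an assumption 
rather than a documented constant. It is not the code in 
testalloc.d.

```d
import core.stdc.stdlib : malloc, free;

// ASSUMPTION: header size taken from the post's example. The real
// value depends on R's internals and may not be stable.
enum size_t headerBytes = 10 * double.sizeof;

// Hypothetical layout: [ header | payload ]. The consumer (R, in
// the post) keeps its bookkeeping in the header; the data follows.
struct Block
{
    void* base;    // pointer handed to the consumer
    size_t count;  // number of doubles in the payload

    // Payload pointer: base adjusted past the header.
    double* payload()
    {
        return cast(double*) (cast(ubyte*) base + headerBytes);
    }

    // D-side view of the payload; no copy is made.
    double[] view() { return payload()[0 .. count]; }
}

// Stand-in for the allocation callback: header room plus payload.
Block allocBlock(size_t count)
{
    auto p = malloc(headerBytes + count * double.sizeof);
    assert(p !is null, "allocation failed");
    return Block(p, count);
}

// Stand-in for the free callback, run once the consumer is done.
void freeBlock(ref Block b)
{
    free(b.base);
    b.base = null;
    b.count = 0;
}

void main()
{
    auto b = allocBlock(24);
    scope (exit) freeBlock(b);

    auto s = b.view();
    s[] = 1.5;

    // Pretend the consumer now fills in its header.
    (cast(ubyte*) b.base)[0 .. headerBytes] = 0xFF;

    // The payload starts exactly headerBytes past the base pointer,
    assert(cast(ubyte*) b.payload() - cast(ubyte*) b.base == headerBytes);
    // and writing the header did not disturb it.
    assert(s[0] == 1.5 && s[23] == 1.5);
}
```

If the assumed header size were too small, the header write would 
overlap the first elements of the payload, which is the kind of 
silent corruption a "didn't segfault" test would not catch.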
Jul 07 2023
jmh530 <john.michael.hall gmail.com> writes:
On Friday, 7 July 2023 at 20:33:08 UTC, bachmeier wrote:
 [snip]

 I was wrong. They added custom allocators a while back, but 
 didn't tell anyone.

 Actually, what I said before is technically correct. The SEXP 
 struct itself still has to be allocated by R and managed by the 
 R garbage collector. It's just that you can use a custom 
 allocator to send a pointer to the data you've allocated, and 
 once R is done with the data, it'll call the function you've 
 provided to free the memory before destroying the SEXP struct 
 that wraps it.

 I uploaded [an example 
 here](https://github.com/bachmeil/betterr/blob/main/testing/testalloc.d).

 It's still a bit hackish because you need to adjust the pointer 
 for a header R inserts when it allocates arrays. Adjusting by 
 10*double.sizeof works in this example, but "my test didn't 
 segfault" doesn't exactly inspire confidence. Once I am 
 comfortable with this solution, I'll do a new release of 
 betterr.

 This'll be kind of a big deal if it works. For instance, if you 
 want to use a database interface and D doesn't have one, you 
 can use R's interface to that database without having R manage 
 your project's memory. You could use any of the available R 
 interfaces (databases, machine learning libraries, Qt, etc.)
Cool. The main thing I want to try is rstan. They have an interface called cmdstan that you can call from the command line, which would be possible to use with D. The problem is that you have to write the data to a CSV file and then read it back. So it would be kind of slow, and I never got around to playing with it in D. With your tool as it is, I would just have to copy the data in memory, which I would expect to be less overhead than IO (but again, I haven't gotten around to doing anything with it).
Jul 07 2023