digitalmars.D - Best interface for memcpy() (and the string.h family of functions)
- Stefanos Baziotis (11/11) May 29 2019 I'm a GSoC student (I'll post this week an update) in
- Jonathan Marler (93/104) May 29 2019 The default memcpy signature is still pretty useful in many
- Stefanos Baziotis (31/51) May 29 2019 I'm not sure about that. Does it really make sense to have such an
- Jonathan Marler (28/81) May 29 2019 Sure. Any time you have a buffer whose type isn't known at
- Stefanos Baziotis (20/60) May 29 2019 You want, because instantiation and inlining of specific types is
- Jonathan Marler (8/37) May 29 2019 It doesn't make a difference whether the final memcpy is `void*`
- Stefanos Baziotis (6/13) May 29 2019 This is what will prevent doing anything really useful in D.
- Jonathan Marler (18/21) May 29 2019 You didn't answer the question. How would inlining the
- Stefanos Baziotis (17/34) May 29 2019 I don't know how "benchmarks" does not answer a question. For me,
- Jonathan Marler (15/53) May 29 2019 Yes that would be an answer, I guess I got confused when you
- Stefanos Baziotis (19/33) May 29 2019 Great, you can see that in the benchmarks, memcpyD is faster than
- Jonathan Marler (6/29) May 29 2019 I haven't benchmarked it yet but here's the changes I've made to
- Stefanos Baziotis (46/50) May 29 2019 Good, this week I'm also working on alignment. (more
- kinke (2/3) May 29 2019 It works fine with LDC, and I guess with GDC too.
- welkam (8/9) May 29 2019 With D you can forward to best suiting implementation. What libc
- kinke (30/33) May 29 2019 ref would only work when copying one instance at a time. Many
- Stefanos Baziotis (12/17) May 29 2019 The current state is that we think that slices should be enough
- kinke (11/17) May 29 2019 In D, there's no ugly and unsafe need to pass slices to memcpy,
- Mike Franklin (8/13) May 29 2019 This is an important observation. My vision for the GSoC project
- Stefanos Baziotis (8/20) May 30 2019 Not important. Because my thought was that a lot of users would
- Mike Franklin (13/23) May 30 2019 If users need to copy blocks of memory they should first prefer
- Mike Franklin (2/5) May 30 2019 should --> shouldn't
- Stefanos Baziotis (25/31) May 30 2019 I agree with Walter on that. I don't think though that ref,
- Kagamin (24/24) May 30 2019 IME partial copy primitives are lacking, so I use this:
I'm a GSoC student (I'll post an update this week) working on the project "Independency of D from the C Standard Library". Part of this project is a D implementation of the memcpy(), memset() etc. family of functions.

What do you think is the best interface for, say, memcpy()? My initial pick was

    void memcpyD(T)(T* dst, const T* src)

but it was proposed that `ref` instead of pointers might be better.

Thanks,
Stefanos
May 29 2019
On Wednesday, 29 May 2019 at 11:46:28 UTC, Stefanos Baziotis wrote:
> I'm a GSoC student (I'll post an update this week) in the project "Independency of D from the C Standard Library". Part of this project is a D implementation of the family of functions memcpy(), memset() etc. What do you think is the best interface for say memcpy()? My initial pick was void memcpyD(T)(T* dst, const T* src), but it was proposed that `ref` instead of pointers might be better. Thanks, Stefanos

The default memcpy signature is still pretty useful in many cases. The original signature should still be implemented and available as a non-template function:

    void memcpy(void* dst, void* src, size_t length);

For D, you should also create a template so developers don't have to cast to `void*` all the time, but it just forwards all calls to the real memcpy function, like this:

    void memcpy(T,U)(T* dst, U* src, size_t length)
    {
        pragma(inline, true);
        memcpy(cast(void*)dst, cast(void*)src, length);
    }

And there's no need to have a different name like `memcpyD`. The function behaves the same as libc's memcpy, and when you have libc available, you should use that implementation instead, so you can leverage other people's work when you can.

However, we also want to get type-safety and bounds-checking when we can. So we should also provide a set of templates that accept D arrays, verify type-safety and bounds, then forward the call to memcpy.

    /**
    acopy - Array Copy
    */
    void acopy(T,U)(T dst, U src) @trusted
    if (isArrayLike!T && isArrayLike!U && dst[0].sizeof == src[0].sizeof)
    in { assert(dst.length >= src.length, "copyFrom source length larger than destination"); } do
    {
        pragma(inline, true);
        static assert (!__traits(isStaticArray, T), "acopy does not accept static arrays since they are passed by value");
        import whereever_memcpy_is: memcpy;
        memcpy(dst.ptr, src.ptr, src.length * ElementSizeForCopy!dst);
    }
    /// ditto
    void acopy(T,U)(T dst, U src) @system
    if (isArrayLike!T && isPointerLike!U && dst[0].sizeof == src[0].sizeof)
    {
        pragma(inline, true);
        static assert (!__traits(isStaticArray, T), "acopy does not accept static arrays since they are passed by value");
        import whereever_memcpy_is: memcpy;
        memcpy(dst.ptr, src, dst.length * ElementSizeForCopy!dst);
    }
    /// ditto
    void acopy(T,U)(T dst, U src) @system
    if (isPointerLike!T && isArrayLike!U && dst[0].sizeof == src[0].sizeof)
    {
        pragma(inline, true);
        import whereever_memcpy_is: memcpy;
        memcpy(dst, src.ptr, src.length * ElementSizeForCopy!dst);
    }
    /// ditto
    void acopy(T,U)(T dst, U src, size_t size) @system
    if (isPointerLike!T && isPointerLike!U && dst[0].sizeof == src[0].sizeof)
    {
        pragma(inline, true);
        import whereever_memcpy_is: memcpy;
        memcpy(dst, src, size * ElementSizeForCopy!dst);
    }

Note that isArrayLike, isPointerLike and ElementSizeForCopy would probably look something like:

    template isArrayLike(T)
    {
        enum isArrayLike =
               is(typeof(T.init.length))
            && is(typeof(T.init.ptr))
            && is(typeof(T.init[0]));
    }
    template isPointerLike(T)
    {
        enum isPointerLike =
               T.sizeof == (void*).sizeof
            && is(typeof(T.init[0]));
    }
    // The size of each array element. If the actual size is 0, then it
    // is assumed to be 1.
    template ElementSizeForCopy(alias Array)
    {
        static if (Array[0].sizeof == 0)
            enum ElementSizeForCopy = 1;
        else
            enum ElementSizeForCopy = Array[0].sizeof;
    }

Note that everything here is an inlined template, so everything gets reduced to a single memcpy call and some bounds checks.
May 29 2019
On Wednesday, 29 May 2019 at 15:41:42 UTC, Jonathan Marler wrote:
> The default memcpy signature is still pretty useful in many cases. The original signature should still be implemented and available as a non-template function:
>
>     void memcpy(void* dst, void* src, size_t length);
>
> For D, you should also create a template so developers don't have to cast to `void*` all the time, but it just forwards all calls to the real memcpy function, like this:
>
>     void memcpy(T,U)(T* dst, U* src, size_t length)
>     {
>         pragma(inline, true);
>         memcpy(cast(void*)dst, cast(void*)src, length);
>     }
>
> And there's no need to have a different name like `memcpyD`. The function behaves the same as libc's memcpy, and when you have libc available, you should use that implementation instead, so you can leverage other people's work when you can.

I'm not sure about that. Does it really make sense to have such an interface in the case where you don't have libc memcpy available? Although, there is a discussion about such fallback functions. But I don't know, I feel like it will encourage bad practices. In the same way, I don't know whether it should accept two different types.

> However, we also want to get type-safety and bounds-checking when we can. So we should also provide a set of templates that accept D arrays, verify type-safety and bounds, then forward the call to memcpy.

Those are good ideas. But I think all this could be done explicitly with (ref T[] dst, ref T[] source). This makes an array-specific version, and again I'm unsure whether it is good to add specific cases. Generally, all those things are up for discussion; I don't pretend to have a definitive answer.

The thing with all this code depending on libc memcpy is that, to my understanding, the prospect is that libc will be removed. And this project is a step towards that by making some better D versions (meaning, leveraging D features). If the better version calls libc, then when libc is finally removed, all this code will break. And because we encouraged this bad practice, _a lot_ of code will break. That will then force people to write their own D version of

    memcpy(void *dst, const void *src, size_t len);

which of course is bad, because suddenly we have lost all the D benefits and all the work that has been put into libc.

Best regards,
Stefanos
May 29 2019
On Wednesday, 29 May 2019 at 17:35:03 UTC, Stefanos Baziotis wrote:
> On Wednesday, 29 May 2019 at 15:41:42 UTC, Jonathan Marler wrote:
> > The default memcpy signature is still pretty useful in many cases. The original signature should still be implemented and available as a non-template function:
> >
> >     void memcpy(void* dst, void* src, size_t length);
> >
> > For D, you should also create a template so developers don't have to cast to `void*` all the time, but it just forwards all calls to the real memcpy function, like this:
> >
> >     void memcpy(T,U)(T* dst, U* src, size_t length)
> >     {
> >         pragma(inline, true);
> >         memcpy(cast(void*)dst, cast(void*)src, length);
> >     }
> >
> > And there's no need to have a different name like `memcpyD`. The function behaves the same as libc's memcpy, and when you have libc available, you should use that implementation instead, so you can leverage other people's work when you can.
>
> I'm not sure about that. Does it really make sense to have such an interface in the case where you don't have libc memcpy available?

Sure. Any time you have buffers whose type isn't known at compile-time and you need to copy between them. For example, I have an audio program that copies buffers of audio, but the format of that buffer could be an array of floats or integers depending on the format that your audio hardware and OS support.

> Although, there is a discussion about such fallback functions. But I don't know, I feel like it will encourage bad practices. In the same way, I don't know about whether it should accept two different types.

Well, that's why you have memcpy (for those who know what they're doing) and you have other functions for safe behavior. But you don't want to instantiate a new version of memcpy for every type variation; that's why they all just forward the call to the real memcpy.

> > However, we also want to get type-safety and bounds-checking when we can. So we should also provide a set of templates that accept D arrays, verify type-safety and bounds, then forward the call to memcpy.
>
> Those are good ideas. But I think all this could be done explicitly with (ref T[] dst, ref T[] source). This makes a specific-to-arrays version, which again I'm unsure if it is good to make specific cases.

Yes, it could be done, but then you end up with N copies of your memcpy implementation, one for every combination of types. Your code size is going to explode. You can certainly support the signature you provided; I just wouldn't put the implementation inside that template. Instead you should cast and forward to memcpy.

> The thing with all this code depending on libc memcpy is that to my understanding, the prospect is that libc will be removed. And this project is a step towards that by making some better D versions (meaning, leveraging D features).

Right, which is why you use the libc version by default, and only use your own when libc is disabled. This is what I do in my standard library https://github.com/marler8997/mar which works with or without libc. I went through several designs for how to go about this memcpy solution, and what I've provided you is the result of that.

> If the better version calls libc, then when libc is finally removed, all this code will break. And because we encouraged this bad practice, _a lot_ of code will break.

How would it break? If you remove libc, your module should now enable your implementation of memcpy. And all the code that calls memcpy doesn't care whether it came from libc or from a D module.
May 29 2019
On Wednesday, 29 May 2019 at 17:45:59 UTC, Jonathan Marler wrote:
> > I'm not sure about that. Does it really make sense to have such an interface in the case where you don't have libc memcpy available?
>
> Sure. Any time you have buffers whose type isn't known at compile-time and you need to copy between them. For example, I have an audio program that copies buffers of audio, but the format of that buffer could be an array of floats or integers depending on the format that your audio hardware and OS support.

So, you copy ubyte*.

> Well, that's why you have memcpy (for those who know what they're doing) and you have other functions for safe behavior. But you don't want to instantiate a new version of memcpy for every type variation; that's why they all just forward the call to the real memcpy.

You do want that, because instantiation and inlining for specific types is what makes D memcpy fast. It is also what I hope will enable better error messages and instrumentation. But that's yet to be seen; the most important thing is performance.

> Yes, it could be done, but then you end up with N copies of your memcpy implementation, one for every combination of types. Your code size is going to explode. You can certainly support the signature you provided; I just wouldn't put the implementation inside that template. Instead you should cast and forward to memcpy.

Actually, code size for arrays is a very good reminder, thanks.

> > The thing with all this code depending on libc memcpy is that to my understanding, the prospect is that libc will be removed. And this project is a step towards that by making some better D versions (meaning, leveraging D features).
>
> Right, which is why you use the libc version by default, and only use your own when libc is disabled. This is what I do in my standard library https://github.com/marler8997/mar which works with or without libc. I went through several designs for how to go about this memcpy solution, and what I've provided you is the result of that.
>
> > If the better version calls libc, then when libc is finally removed, all this code will break. And because we encouraged this bad practice, _a lot_ of code will break.
>
> How would it break? If you remove libc, your module should now enable your implementation of memcpy. And all the code that calls memcpy doesn't care whether it came from libc or from a D module.

My point is that you will write code differently depending on which memcpy you have; that's why this "new memcpy" will have a different signature. To have the best of both worlds, we would have to write our own memcpy(void*, void*, size_t). And so, if you encourage the use of this interface (because hey, even if you eventually don't have libc, your code will not crash), then when libc is not present, the code will be slow.
May 29 2019
On Wednesday, 29 May 2019 at 17:55:49 UTC, Stefanos Baziotis wrote:
> On Wednesday, 29 May 2019 at 17:45:59 UTC, Jonathan Marler wrote:
> > > I'm not sure about that. Does it really make sense to have such an interface in the case where you don't have libc memcpy available?
> >
> > Sure. Any time you have a buffer whose type isn't known at compile-time and you need to copy between them. For example, I have an audio program that copies buffers of audio, but the format of that buffer could be an array of floats or integers depending on the format that your audio hardware and OS support.
>
> So, you copy ubyte*.

It doesn't make a difference whether the final memcpy is `void*` or `byte*`. The point is that it's one function, not a template, and you might as well use the same type that the real memcpy uses so you don't change the signature when you're not using libc.

> > Well that's why you have memcpy (for those who know what they're doing) and you have other functions for safe behavior. But you don't want to instantiate a new version of memcpy for every type variation, that's why they all just forward the call to the real memcpy.
>
> You want, because instantiation and inlining of specific types is what makes D memcpy fast. And also, what I hope will make better error messages and instrumentation. But that's yet to be seen, most important is the performance.

You don't want to inline the memcpy implementation. What makes you think that would be faster?
May 29 2019
On Wednesday, 29 May 2019 at 18:00:57 UTC, Jonathan Marler wrote:
> It doesn't make a difference whether the final memcpy is `void*` or `byte*`.

Yes.

> The point is that it's one function, not a template, and you might as well use the same type that the real memcpy uses so you don't change the signature when you're not using libc.

This is what will prevent doing anything really useful in D. This is what I meant: to have that, you have to implement a D version of libc memcpy.

> You don't want to inline the memcpy implementation. What makes you think that would be faster?

CTFE / introspection, I hope, and currently, benchmarks.
May 29 2019
On Wednesday, 29 May 2019 at 18:04:07 UTC, Stefanos Baziotis wrote:
> > You don't want to inline the memcpy implementation. What makes you think that would be faster?
>
> CTFE / introspection I hope and currently, benchmarks.

You didn't answer the question. How would inlining the implementation of memcpy be faster?

The implementation of memcpy doesn't need to know which types it is copying, so every call to it can have the exact same implementation. You only need one instance of the implementation. This means you can fine-tune it; many libc implementations write it in assembly because it's used so often and, again, it doesn't need to know what types it is copying. All it needs is 2 pointers and a size.

That's why in D, you should only create wrappers that ensure type-safety and bounds checking and then forward to the real implementation. Those wrappers should be inlined, but not the memcpy implementation itself. If you want to provide your own implementation of memcpy you can, but inlining your implementation into every call, when the implementation is truly type agnostic, just results in code bloat with no benefit.
May 29 2019
On Wednesday, 29 May 2019 at 18:14:11 UTC, Jonathan Marler wrote:
> You didn't answer the question.

I don't know how "benchmarks" does not answer a question. For me, it's the most important answer.

> How would inlining the implementation of memcpy be faster? The implementation of memcpy doesn't need to know which types it is copying, so every call to it can have the exact same implementation. You only need one instance of the implementation. This means you can fine-tune it; many libc implementations write it in assembly because it's used so often and, again, it doesn't need to know what types it is copying. All it needs is 2 pointers and a size. That's why in D, you should only create wrappers that ensure type-safety and bounds checking and then forward to the real implementation. Those wrappers should be inlined, but not the memcpy implementation itself. If you want to provide your own implementation of memcpy you can, but inlining your implementation into every call, when the implementation is truly type agnostic, just results in code bloat with no benefit.

It is typed currently, with benefits. It's not the same for every type, and our idea is not to just forward the size. By inlining, you can get considerably better performance, exactly because you inline instead of just forwarding the size, and because you know things about the type. Check this:

https://github.com/JinShil/memcpyD/blob/master/memcpyd.d

And preferably, run it and look at the asm generated. Also, what should be considered is that types give you information about alignment, which allows different implementations depending on that alignment.
May 29 2019
On Wednesday, 29 May 2019 at 19:06:43 UTC, Stefanos Baziotis wrote:
> On Wednesday, 29 May 2019 at 18:14:11 UTC, Jonathan Marler wrote:
> > You didn't answer the question.
>
> I don't know how "benchmarks" does not answer a question. For me, it's the most important answer.

Yes, that would be an answer. I guess I got confused when you mentioned CTFE and introspection; I wasn't sure if "benchmarks" was referring to those features or to runtime benchmarks. And it looks like Mike posted the benchmarks at that GitHub link you sent.

> > How would inlining the implementation of memcpy be faster? The implementation of memcpy doesn't need to know which types it is copying, so every call to it can have the exact same implementation. You only need one instance of the implementation. This means you can fine-tune it; many libc implementations write it in assembly because it's used so often and, again, it doesn't need to know what types it is copying. All it needs is 2 pointers and a size. That's why in D, you should only create wrappers that ensure type-safety and bounds checking and then forward to the real implementation. Those wrappers should be inlined, but not the memcpy implementation itself. If you want to provide your own implementation of memcpy you can, but inlining your implementation into every call, when the implementation is truly type agnostic, just results in code bloat with no benefit.
>
> It is typed currently, with benefits. It's not the same for every type and our idea is not to just forward the size. By inlining, you can get quite better performance exactly because you inline and you don't just forward the size and because you know info about the type. Check this: https://github.com/JinShil/memcpyD/blob/master/memcpyd.d And preferably, run it and see the asm generated. Also, what should be considered is that types give you the info about alignment and different implementations depending on this alignment.

It's true that if you can assume pointers are aligned on a particular boundary, you can be faster than a memcpy that works with any alignment. This must be what Mike is doing, though I would then create only a few instances of memcpy that assume alignment on boundaries like 4, 8, 16. And if you have a pointer or an array of a particular type, you can probably assume that the pointer/array is aligned according to that type's "alignof" property.

I think I will use this in my library.
May 29 2019
On Wednesday, 29 May 2019 at 19:35:36 UTC, Jonathan Marler wrote:
> Yes, that would be an answer. I guess I got confused when you mentioned CTFE and introspection; I wasn't sure if "benchmarks" was referring to those features or to runtime benchmarks. And it looks like Mike posted the benchmarks at that GitHub link you sent.

Great, you can see in the benchmarks that memcpyD is faster than libc memcpy except for sizes larger than 32768. We hope that we can surpass those as well: yesterday I did some simple inline SIMD things and got better performance at 32768. The previous work is of course Mike's, and those benchmark results are in part due to inlining.

> It's true that if you can assume pointers are aligned on a particular boundary, you can be faster than a memcpy that works with any alignment. This must be what Mike is doing, though I would then create only a few instances of memcpy that assume alignment on boundaries like 4, 8, 16. And if you have a pointer or an array of a particular type, you can probably assume that the pointer/array is aligned according to that type's "alignof" property.

This is, as I said, the alignment guarantee. I hope that I can get other benefits from types as well. Also, hopefully we will do LDC / GDC specific things, such as leveraging the intrinsics. I will post an update shortly, like the other students, explaining some of that, but I thought since we started it.. :p

> I think I will use this in my library.

Great! We hope that it will be useful and any feedback is appreciated!
May 29 2019
On Wednesday, 29 May 2019 at 20:28:18 UTC, Stefanos Baziotis wrote:
> On Wednesday, 29 May 2019 at 19:35:36 UTC, Jonathan Marler wrote:
> > [...]
>
> Great, you can see that in the benchmarks, memcpyD is faster than libc memcpy except for sizes larger than 32768. We hope that we can surpass those as well, as yesterday I did some simple inline SIMD things and got better performance in 32768. But previous work is of course responsibility of Mike and those benchmarks are in part because of inlining.
>
> > [...]
>
> This is, as I said, the alignment guarantee. I hope that I can get other benefits from types also. Also, hopefully we will do LDC / GDC specific things. Leverage the intrinsics for example. I will put an update shortly, as the other students, explaining some of that, but I thought since we started it.. :p
>
> > [...]
>
> Great! We hope that it will be useful and any feedback is appreciated!

I haven't benchmarked it yet, but here are the changes I've made to my standard library to also take advantage of alignment guarantees from typed pointers and arrays:

https://github.com/dragon-lang/mar/commit/bb096d2d4f489d47177f6a678b1d9bab756e3dc7
May 29 2019
On Wednesday, 29 May 2019 at 23:27:35 UTC, Jonathan Marler wrote:
> I haven't benchmarked it yet, but here are the changes I've made to my standard library to also take advantage of alignment guarantees from typed pointers and arrays:
> https://github.com/dragon-lang/mar/commit/bb096d2d4f489d47177f6a678b1d9bab756e3dc7

Good, this week I'm also working on alignment (more specifically, misalignment). Since you took the time to play with alignment anyway, you might find SIMD instructions useful. Take a look at Mike's memcpyD. My toy SIMD version from yesterday that surpassed libc memcpy was as simple as:

    static foreach(i; 0 .. T.sizeof/32)
    {
        // Assuming RDI is 'dst' and RSI 'src':
        // load 32 bytes from src, then store them to dst
        asm pure nothrow @nogc
        {
            vmovdqa YMM0, [RSI+i*32];
            vmovdqa [RDI+i*32], YMM0;
        }
    }
    /* instead of
    static foreach(i; 0 .. T.sizeof/32)
    {
        memcpyD((cast(S!32*)dst) + i, (cast(const S!32*)src) + i);
    }
    */

Again, really simple and dumb, but effective.

A couple of notes, so that you don't have the headaches I had:
1) You can use `vmovdqu` (notice the 'u' at the end) for unaligned memory and skip note 2.
2) `vmovdqa` assumes 32-byte aligned memory. Now, `align()` is kind of buggy, so if you have a normal buffer on the stack that you want to align, this:

    align(32) ubyte[32768] buf;

won't work. One solution is to allocate memory on the heap and do a little pointer arithmetic to get it aligned.

Last minute discovery: Haha, the compiler flags I used were:

    -mcpu=avx -inline

With these flags, memcpyD is faster. _Removing_ -inline resulted in faster code for libc memcpy. I'll have to take a closer look tomorrow. (Oh, and the libc memcpy, it seems from the disassembly, achieves these results with sse3, so 128-bit instructions. I mean.. impressive, at the very least.)
May 29 2019
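The heap-based workaround mentioned in note 2 above can be sketched as follows. This is only a minimal illustration, not code from memcpyD: the helper name `alignUp` is hypothetical, and it assumes the alignment is a power of two. It over-allocates by `alignment - 1` bytes and rounds the pointer up to the next 32-byte boundary.

```d
import core.stdc.stdlib : malloc, free;

/// Hypothetical helper: round `p` up to the next multiple of `alignment`.
/// Assumes `alignment` is a power of two.
ubyte* alignUp(ubyte* p, size_t alignment)
{
    auto addr = cast(size_t) p;
    return cast(ubyte*) ((addr + alignment - 1) & ~(alignment - 1));
}

void main()
{
    enum size_t size = 32768;
    enum size_t alignment = 32;

    // Over-allocate so an aligned block of `size` bytes always fits.
    auto raw = cast(ubyte*) malloc(size + alignment - 1);
    assert(raw !is null);
    scope (exit) free(raw); // free the original pointer, not the aligned one

    ubyte* buf = alignUp(raw, alignment);
    assert(cast(size_t) buf % alignment == 0);
    assert(buf >= raw && cast(size_t) (buf - raw) < alignment);

    buf[0 .. size] = 0; // the aligned region is fully usable
}
```

Note that the pointer passed to `free` has to be the one returned by `malloc`, so both pointers need to be kept around.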
On Thursday, 30 May 2019 at 00:55:54 UTC, Stefanos Baziotis wrote:
> Now, `align()` is kind of buggy

It works fine with LDC, and I guess with GDC too.
May 29 2019
On Wednesday, 29 May 2019 at 18:14:11 UTC, Jonathan Marler wrote:
> and then forward to the real implementation

With D you can forward to the best-suited implementation. What libc does is perform various runtime checks in order to figure out the best way of copying the provided input. With D it should be possible to make certain checks at compile time. Secondly, C's memcpy is a big function not because that's best for performance but because of convenience. With D we can have many smaller functions, and they would be selected by template magic.
May 29 2019
On Wednesday, 29 May 2019 at 11:46:28 UTC, Stefanos Baziotis wrote:
> My initial pick was void memcpyD(T)(T* dst, const T* src), but it was proposed that `ref` instead of pointers might be better.

ref would only work when copying one instance at a time. Many times, you'll want to copy a contiguous array of a length only known at runtime (and definitely NOT invoke memcpy in a loop, so that the implementation can e.g. use SIMD streaming when copying gazillions of 32-bit pixels).

I'd suggest a structure similar to this, minimizing bloat:

    // int a, b; memcpyD(&a, &b);
    // int[4] a, b; memcpyD(&a, &b);
    // int[16] a; int[4] b; memcpyD!4(&a[8], b.ptr);
    void memcpyD(size_t length = 1, T)(T* dst, const T* src)
    {
        pickBestImpl!(T.alignof, length * T.sizeof)(dst, src);
    }

    void memcpyD(T)(T* dst, const T* src, size_t length)
    {
        pickBestImpl!(T.alignof)(dst, src, length * T.sizeof);
    }

    private:

    /* These 2 will probably share most logic, the first one just exploiting a
     * static size. A common mixin might come in handy (e.g., switching from
     * runtime-if to static-if).
     */
    void pickBestImpl(size_t alignment, size_t size)(void* dst, const void* src);
    void pickBestImpl(size_t alignment)(void* dst, const void* src, size_t size);
May 29 2019
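To make the structure above concrete, here is a minimal runnable sketch. The two `memcpyD` wrappers follow the post; the bodies of `pickBestImpl` are purely hypothetical placeholders (a few `static if` fast paths and a fallback to libc's memcpy), standing in for whatever tuned implementations the project ends up with.

```d
import core.stdc.string : memcpy;

// Wrappers as suggested in the post: size known at compile time...
void memcpyD(size_t length = 1, T)(T* dst, const T* src)
{
    pickBestImpl!(T.alignof, length * T.sizeof)(dst, src);
}

// ...and size known only at run time.
void memcpyD(T)(T* dst, const T* src, size_t length)
{
    pickBestImpl!(T.alignof)(dst, src, length * T.sizeof);
}

// Hypothetical body: pick a single aligned load/store at compile time
// when the static size and alignment allow it, otherwise fall back.
void pickBestImpl(size_t alignment, size_t size)(void* dst, const void* src)
{
    static if (size == 4 && alignment >= 4)
        *cast(uint*) dst = *cast(const uint*) src;
    else static if (size == 8 && alignment >= 8)
        *cast(ulong*) dst = *cast(const ulong*) src;
    else
        memcpy(dst, src, size);
}

// Hypothetical body: the runtime-size variant simply forwards here.
void pickBestImpl(size_t alignment)(void* dst, const void* src, size_t size)
{
    memcpy(dst, src, size);
}

void main()
{
    int a = 0, b = 42;
    memcpyD(&a, &b);               // one instance, static size 4
    assert(a == 42);

    int[16] big;
    int[4] small = [1, 2, 3, 4];
    memcpyD!4(&big[8], small.ptr); // four ints, static size 16
    assert(big[8 .. 12] == small);

    long[] src = [10, 20, 30];
    auto dst = new long[](3);
    memcpyD(dst.ptr, src.ptr, src.length); // runtime length
    assert(dst == src);
}
```

Only `pickBestImpl` is instantiated per (alignment, size) pair, so the number of instances is bounded by the distinct sizes and alignments actually used rather than by the number of types.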
On Wednesday, 29 May 2019 at 20:50:45 UTC, kinke wrote:
> ref would only work when copying one instance at a time. Many times, you'll want to copy a contiguous array of a length only known at runtime (and definitely NOT invoke memcpy in a loop, so that the implementation can e.g. use SIMD streaming when copying gazillions of 32-bit pixels).

The current state is that we think slices should be enough for this need, meaning you don't need the third size parameter. In this case, ref is better. On the other hand, in other cases I think that pointers are more intuitive. Again, of course, the fact that _I_ think so is of little importance. This thread was primarily started so that you, the community, can give feedback on this.

Apart from that, I'm still sceptical about whether we should provide a version with size..
May 29 2019
On Thursday, 30 May 2019 at 00:18:06 UTC, Stefanos Baziotis wrote:
> The current state is that we think that slices should be enough for this need. Meaning, you don't need the third size parameter. In this case, ref is better. On the other, in other cases I think that pointers are more intuitive.

In D, there's no ugly and unsafe need to pass slices to memcpy, as a simple `dst[] = src[]` can do the job much better, boiling down to a memcpy (with 3rd param) if T is a POD (and the two slices don't overlap, have the same length etc. if bounds checks are enabled).

Taking a slice by ref, if I understand you correctly, would firstly only work with slice lvalues (i.e., no `ptr[0..$-1]` rvalues), and secondly IMO be very confusing and bad for generic code, as I would expect the slice itself to be memcopied then, not its contents.
May 29 2019
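For readers less familiar with D's array copy syntax, a small sketch of what `dst[] = src[]` does, including the rvalue-slice case mentioned above. The variable names are illustrative only.

```d
void main()
{
    int[] src = [1, 2, 3, 4];
    auto dst = new int[](4);

    // Element-wise copy of the contents; lengths must match.
    dst[] = src[];
    assert(dst == src);
    assert(dst.ptr !is src.ptr); // two distinct buffers

    // The destination can be an rvalue slice of a larger buffer...
    auto big = new int[](8);
    big[2 .. 6] = src[];
    assert(big == [0, 0, 1, 2, 3, 4, 0, 0]);

    // ...or a slice built on the spot from a raw pointer.
    int* p = big.ptr;
    p[0 .. 4] = src[];
    assert(big[0 .. 4] == src);
}
```

None of these forms would be accepted by a function taking `ref T[]` without first storing the slice in a named variable, which is the lvalue restriction described in the post.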
On Thursday, 30 May 2019 at 01:19:54 UTC, kinke wrote:
> In D, there's no ugly and unsafe need to pass slices to memcpy, as a simple `dst[] = src[]` can do the job much better, boiling down to a memcpy (with 3rd param) if T is a POD (and the two slices don't overlap, have the same length etc. if bounds checks are enabled).

This is an important observation. My vision for the GSoC project was targeted primarily at druntime. D memcpy would rarely, if ever, be invoked directly by most users. Expressions like `dst[] = src[]`, and other assignment expressions that require memcpy as part of their behavior, would be lowered by the compiler to the runtime memcpy template.

Mike
May 29 2019
On Thursday, 30 May 2019 at 01:35:05 UTC, Mike Franklin wrote:
> On Thursday, 30 May 2019 at 01:19:54 UTC, kinke wrote:
> > In D, there's no ugly and unsafe need to pass slices to memcpy, as a simple `dst[] = src[]` can do the job much better, boiling down to a memcpy (with 3rd param) if T is a POD (and the two slices don't overlap, have the same length etc. if bounds checks are enabled).
>
> This is an important observation. My vision for the GSoC project was targeted primarily at druntime. D memcpy would rarely, if ever, be invoked directly by most users.

If we don't really target users, then that makes this:

> Apart from that, I'm still sceptical about whether we should provide a version with size..

Not important. My thought was that a lot of users would have some pointers a, b and somehow want to do:

    memcpy(a, b, for_some_size);

What I'm thinking is that yes, we decouple D from libc _in the D Runtime_. But in general, users may still want that.
May 30 2019
On Thursday, 30 May 2019 at 08:28:50 UTC, Stefanos Baziotis wrote:
> If we don't really target users, then that makes this:
>
> > Apart from that, I'm still sceptical about whether we should provide a version with size..
>
> Not important. Because my thought was that a lot of users would have some pointers a, b and somehow want to do: memcpy(a, b, for_some_size); What I'm thinking is that yes, we decouple D from libc _on D Runtime_. But in general, users may will still want that.

If users need to copy blocks of memory they should first prefer those D features that were added to improve upon C, so users don't have to resort to raw pointers, pointer arithmetic, managing sizes outside of arrays, etc. See Walter's article "C's biggest mistake" for some perspective on that:

http://www.drdobbs.com/architecture-and-design/cs-biggest-mistake/228701625

It's important when designing a D replacement for a C feature to not repeat C's mistakes.

I wouldn't rule out a public interface in the future, but, at the moment, I don't see a compelling use case given that D has first-class arrays. Regardless, a public interface should be required to achieve the goals of the GSoC project and could introduce controversy and other design complications.

Mike
May 30 2019
On Thursday, 30 May 2019 at 09:10:11 UTC, Mike Franklin wrote:
> Regardless, a public interface should be required to achieve the goals of the GSoC project and could introduce controversy and other design complications.

should --> shouldn't
May 30 2019
On Thursday, 30 May 2019 at 09:10:11 UTC, Mike Franklin wrote:
> If users need to copy blocks of memory they should first prefer those D features that were added to improve upon C, so users don't have to resort to raw pointers, pointer arithmetic, managing sizes outside of arrays, etc. See Walter's article "C's biggest mistake" for some perspective on that http://www.drdobbs.com/architecture-and-design/cs-biggest-mistake/228701625 It's important when designing a D replacement for a C feature to not repeat C's mistakes.

I agree with Walter on that. I don't think, though, that ref, dynamic arrays as they are now, and the GC are the solution to that, or to low-level memory management in general.

I think that people fall into 2 categories:
1) People that use these D features will probably never want to use memcpy() (directly) anyway.
2) People that use D more as a betterC will probably want to use a memcpy() with pointers and possibly one more optional parameter in which they give the size.

But, some important notes:
a) D moves in a certain direction, away from C, pointers etc., and towards ref and dynamic arrays. Agreeing with that is not important, but helping is.
b) If memcpy() targets (possibly only) the D Runtime, then it doesn't really matter for the users in category 1) or 2), as they are on the user side.

So, I think the best option in this regard, especially given note a), is to use refs, unless there are serious implementation obstacles (which I doubt).

- Stefanos
May 30 2019
IME partial copy primitives are lacking, so I use this:

    /// Copy only as much as possible, return the copied data
    T[] CopyHead(T)(T[] dst, in T[] src) pure
    {
        if(dst.length>=src.length)return CopyAll(dst, src);
        CopyOverlap(dst, src[0..dst.length]);
        return dst;
    }

    /// Copy all input data, return the copied data
    T[] CopyAll(T)(T[] dst, in T[] src) pure
    {
        assert(dst.length>=src.length);
        dst=dst[0..src.length];
        CopyOverlap(dst, src);
        return dst;
    }

    /// Copy overlapping slices
    void CopyOverlap(T)(T[] dst, in T[] src) pure
    {
        import core.stdc.string:memmove;
        assert(dst.length==src.length,"same lengths required");
        byte[] dstBytes=cast(byte[])dst;
        memmove(dstBytes.ptr, src.ptr, dstBytes.length);
    }
May 30 2019