
digitalmars.D - memcpy() comparison: C, Rust, and D

reply Walter Bright <newshound2 digitalmars.com> writes:
The C99 Standard says:

   #include <string.h>
   void *memcpy(void * restrict s1, const void * restrict s2, size_t n);

   Description

   The memcpy function copies n characters from the object pointed to by s2
   into the object pointed to by s1. If copying takes place between objects
   that overlap, the behavior is undefined.

   Returns

   The memcpy function returns the value of s1.


Rust says https://doc.rust-lang.org/1.14.0/libc/fn.memcpy.html:

   pub unsafe extern fn memcpy(dest: *mut c_void,
                             src: *const c_void,
                             n: size_t)
                             -> *mut c_void


D says https://github.com/dlang/druntime/blob/master/src/core/stdc/string.d#L32:

   pure void* memcpy(return void* s1, scope const void* s2, size_t n);

---

Just from D's type signature, we can know a lot about memcpy():

1. There are no side effects.
2. The return value is derived from s1.
3. Nothing s2 transitively points to is altered via s2.
4. Copies of s1 or s2 are not saved.

The C declaration does not give us any of that info, although the C description
does give us 2, and the 'restrict' says that s1 and s2 do not overlap.

The Rust declaration does not give us 1, 2 or 4 (because it is marked as 
unsafe). If it was safe, the declaration does not give us 2.

By this information being knowable from the declaration, the compiler knows it 
too and can make use of it.
Jan 30
next sibling parent reply Mike <none none.com> writes:
On Tuesday, 31 January 2017 at 01:30:48 UTC, Walter Bright wrote:
 Just from D's type signature, we can know a lot about memcpy():
Yes, D has some notable advantages over other languages, but it also has some notable disadvantages. One in particular prevents me from using D, period! - https://issues.dlang.org/show_bug.cgi?id=14758 Rust has "minimal runtime" as a pillar of its language which makes it very attractive for systems programming. D will never compete with Rust in that domain if it doesn't lead an effort to do something about it. Being able to infer implementation characteristics from a function signature is insignificant in comparison. Mike
Jan 30
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/30/2017 5:53 PM, Mike wrote:
 One in particular prevents me from using D, period! -
 https://issues.dlang.org/show_bug.cgi?id=14758
The -betterC switch is the approach we intend to take to deal with that issue.
Jan 30
next sibling parent reply Mike <none none.com> writes:
On Tuesday, 31 January 2017 at 02:01:05 UTC, Walter Bright wrote:
 On 1/30/2017 5:53 PM, Mike wrote:
 One in particular prevents me from using D, period! -
 https://issues.dlang.org/show_bug.cgi?id=14758
The -betterC switch is the approach we intend to take to deal with that issue.
I recommend against that; it's too blunt of an instrument. Instead I suggest following through on things like https://issues.dlang.org/show_bug.cgi?id=12270 and considering this proposal (http://forum.dlang.org/post/psssnzurlzeqeneagora forum.dlang.org) instead. Mike
Jan 30
parent ZombineDev <petar.p.kirov gmail.com> writes:
On Tuesday, 31 January 2017 at 02:18:23 UTC, Mike wrote:
 On Tuesday, 31 January 2017 at 02:01:05 UTC, Walter Bright 
 wrote:
 On 1/30/2017 5:53 PM, Mike wrote:
 One in particular prevents me from using D, period! -
 https://issues.dlang.org/show_bug.cgi?id=14758
The -betterC switch is the approach we intend to take to deal with that issue.
I recommend against that; it's too blunt of an instrument. Instead I suggest following through on things like https://issues.dlang.org/show_bug.cgi?id=12270 and considering this proposal (http://forum.dlang.org/post/psssnzurlzeqeneagora forum.dlang.org) instead. Mike
The main problem - the opaque extern (C) interface and therefore non-lazy code generation - is already being worked on: http://forum.dlang.org/thread/o0kdnp$2i2t$1 digitalmars.com http://forum.dlang.org/thread/wbtjswiwbrtxzqiorwiw forum.dlang.org
Jan 31
prev sibling parent reply Jaded Observer <observer invalid.invalid> writes:
On Tuesday, 31 January 2017 at 02:01:05 UTC, Walter Bright wrote:
 On 1/30/2017 5:53 PM, Mike wrote:
 One in particular prevents me from using D, period! -
 https://issues.dlang.org/show_bug.cgi?id=14758
The -betterC switch is the approach we intend to take to deal with that issue.
You aren't dealing with anything. Your compiler doesn't even target the single biggest embedded platform
Jan 30
parent rikki cattermole <rikki cattermole.co.nz> writes:
On 31/01/2017 3:58 PM, Jaded Observer wrote:
 On Tuesday, 31 January 2017 at 02:01:05 UTC, Walter Bright wrote:
 On 1/30/2017 5:53 PM, Mike wrote:
 One in particular prevents me from using D, period! -
 https://issues.dlang.org/show_bug.cgi?id=14758
The -betterC switch is the approach we intend to take to deal with that issue.
You aren't dealing with anything. Your compiler doesn't even target the single biggest embedded platform
Nobody is going to waste their time adding ARM support to dmd-be, especially when ldc and gdc already have it. Now if we were to rewrite dmd-be and made sure there were no legal issues for Walter, then maybe we could consider adding more targets to the reference compiler and, more importantly, non-OS-based targets like MCUs and kernel development. Until then, however, making dmd itself work for both kernel and user space development for x86(_64) seems like a good idea, and that work can be reused for other compilers such as ldc and gdc.
Jan 30
prev sibling next sibling parent reply Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 31 January 2017 at 01:30:48 UTC, Walter Bright wrote:

 Rust:
   pub unsafe extern fn memcpy(dest: *mut c_void,
                             src: *const c_void,
                             n: size_t)
                             -> *mut c_void
 D:
   pure void* memcpy(return void* s1, scope const void* s2, 
 size_t n);
 2. The return value is derived from s1.
How can we be sure that the return value points to the same content as `s1`? If that is what you mean by "derived".
 The Rust declaration does not give us 1, 2 or 4 (because it is 
 marked as unsafe). If it was safe, the declaration does not 
 give us 2.
I don't see how Rust doesn't provide information 2 as well. Is it because of differences in the meaning of `const` in Rust compared to D?
Jan 31
next sibling parent Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 31 January 2017 at 09:31:23 UTC, Nordlöw wrote:
 How can we be sure that the return value points to the same 
 content as `s1`? If that is what you mean by "derived".
Now that we allow allocation functions to be `pure` as in https://github.com/dlang/druntime/pull/1746
Jan 31
prev sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Tuesday, 31 January 2017 at 09:31:23 UTC, Nordlöw wrote:
 How can we be sure that the return value points to the same 
 content as `s1`?
Because of the return attribute. return means I am passing this value through myself.
Jan 31
next sibling parent reply Olivier FAURE <olivier.faure epitech.eu> writes:
On Tuesday, 31 January 2017 at 10:00:03 UTC, Stefan Koch wrote:
 On Tuesday, 31 January 2017 at 09:31:23 UTC, Nordlöw wrote:
 How can we be sure that the return value points to the same 
 content as `s1`?
Because of the return attribute. return means I am passing this value through myself.
I thought it meant "the parameter can be returned" not "the parameter *will* be returned".
Jan 31
parent reply Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 31 January 2017 at 10:20:59 UTC, Olivier FAURE wrote:
 I thought it meant "the parameter can be returned" not "the 
 parameter *will* be returned".
And what about multiple `return`-qualified function parameters?
Jan 31
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/31/2017 2:44 AM, Nordlöw wrote:
 On Tuesday, 31 January 2017 at 10:20:59 UTC, Olivier FAURE wrote:
 I thought it meant "the parameter can be returned" not "the parameter *will*
 be returned".
And what about multiple `return`-qualified function parameters?
'return' means treat the return value "as if" it was derived from that parameter. If multiple parameters are marked as 'return', treat the return value as if it was derived from one of them. This "as if" thing enables the designer of a function API to set the desired relationships even if the implementation is doing some deviated preversion with the data (i.e. a ref counted object).
Jan 31
parent reply Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 31 January 2017 at 19:26:51 UTC, Walter Bright wrote:
 On 1/31/2017 2:44 AM, Nordlöw wrote:
 On Tuesday, 31 January 2017 at 10:20:59 UTC, Olivier FAURE 
 wrote:
 I thought it meant "the parameter can be returned" not "the 
 parameter *will*
 be returned".
And what about multiple `return`-qualified function parameters?
'return' means treat the return value "as if" it was derived from that parameter. If multiple parameters are marked as 'return', treat the return value as if it was derived from one of them. This "as if" thing enables the designer of a function API to set the desired relationships even if the implementation is doing some deviated preversion with the data (i.e. a ref counted object).
Why is this feature used? Optimizations? Safety?
Jan 31
next sibling parent Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 31 January 2017 at 19:32:58 UTC, Nordlöw wrote:
 Why is this feature used? Optimizations? Safety?
Just me being lazy. I'm gonna read up on https://wiki.dlang.org/DIP25
Jan 31
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/31/2017 11:32 AM, Nordlöw wrote:
 On Tuesday, 31 January 2017 at 19:26:51 UTC, Walter Bright wrote:
 This "as if" thing enables the designer of a function API to set the desired
 relationships even if the implementation is doing some deviated preversion
 with the data (i.e. a ref counted object).
Why is this feature used? Optimizations? Safety?
So ref counted containers can be built.
Jan 31
parent deadalnix <deadalnix gmail.com> writes:
On Tuesday, 31 January 2017 at 23:42:43 UTC, Walter Bright wrote:
 On 1/31/2017 11:32 AM, Nordlöw wrote:
 On Tuesday, 31 January 2017 at 19:26:51 UTC, Walter Bright 
 wrote:
 This "as if" thing enables the designer of a function API to 
 set the desired
 relationships even if the implementation is doing some 
 deviated preversion
 with the data (i.e. a ref counted object).
Why is this feature used? Optimizations? Safety?
So ref counted containers can be built.
As long as they don't use a tree or any other data structure that requires more than one level of indirection internally.
Jan 31
prev sibling parent Nordlöw <per.nordlow gmail.com> writes:
On Tuesday, 31 January 2017 at 10:00:03 UTC, Stefan Koch wrote:
 Because of the return attribute.
 return means I am passing this value through myself.
Ahh, missed that. My mistake. Thanks, Walter!
Jan 31
prev sibling next sibling parent reply Olivier FAURE <olivier.faure epitech.eu> writes:
On Tuesday, 31 January 2017 at 01:30:48 UTC, Walter Bright wrote:
 3. Nothing s2 transitively points to is altered via s2.
Wait, really? Does that mean that this code is implicitly illegal?

   import core.stdc.string;

   void main() {
       int*[10] data1;
       int*[10] data2;

       memcpy(data1.ptr, data2.ptr, 10);
   }

Since memcpy is @system, I have no way to know for sure (the compiler obviously won't warn me since I can't mark main as @safe), so I'd argue the prototype doesn't carry that information.
Jan 31
parent reply John Colvin <john.loughran.colvin gmail.com> writes:
On Tuesday, 31 January 2017 at 10:17:09 UTC, Olivier FAURE wrote:
 On Tuesday, 31 January 2017 at 01:30:48 UTC, Walter Bright 
 wrote:
 3. Nothing s2 transitively points to is altered via s2.
 Wait, really? Does that mean that this code is implicitly illegal?

    import core.stdc.string;

    void main() {
        int*[10] data1;
        int*[10] data2;

        memcpy(data1.ptr, data2.ptr, 10);
    }

 Since memcpy is @system, I have no way to know for sure (the compiler obviously won't warn me since I can't mark main as @safe), so I'd argue the prototype doesn't carry that information.
Point 3 is about `const`, which as far as I know is unaffected by application of @safe. Did you mean to quote a different point?
Jan 31
parent Olivier FAURE <olivier.faure epitech.eu> writes:
On Tuesday, 31 January 2017 at 11:32:01 UTC, John Colvin wrote:
 On Tuesday, 31 January 2017 at 10:17:09 UTC, Olivier FAURE 
 wrote:
 On Tuesday, 31 January 2017 at 01:30:48 UTC, Walter Bright 
 wrote:
 Point 3 is about `const`, which as far as I know is unaffected by application of @safe. Did you mean to quote a different point?
Oh yeah, I thought it was about scope. Makes sense then.
Jan 31
prev sibling next sibling parent reply Richard Delorme <abulmo club-internet.fr> writes:
On Tuesday, 31 January 2017 at 01:30:48 UTC, Walter Bright wrote:
 Just from D's type signature, we can know a lot about memcpy():

 1. There are no side effects.
 2. The return value is derived from s1.
 3. Nothing s2 transitively points to is altered via s2.
 4. Copies of s1 or s2 are not saved.

 The C declaration does not give us any of that info, although 
 the C description
 does give us 2, and the 'restrict' says that s1 and s2 do not 
 overlap.

 The Rust declaration does not give us 1, 2 or 4 (because it is 
 marked as unsafe). If it was safe, the declaration does not 
 give us 2.

 By this information being knowable from the declaration, the 
 compiler knows it too and can make use of it.
Well, I would not have taken memcpy as an example in favor of D. Good C compilers (like gcc) know what memcpy does and are able to optimize it according to its arguments. DMD may know more about memcpy through its declaration, but it makes no use of that knowledge. A simple example:

   // cmemcpy.c
   #include <string.h>
   #include <stdio.h>

   int main(void) {
       char a[16] = "world hello";
       char b[16] = "";

       memcpy(b, a, 12);
       memcpy(b, a + 6, 5);
       memcpy(b + 6, a, 5);
       printf("%s -> %s\n", a, b);
   }
   //------------

gcc -Ofast produces the following code:

   main:
   .LFB0:
       .cfi_startproc
       subq $40, %rsp
       .cfi_def_cfa_offset 48
       movl $.LC0, %edi
       movabsq $7307126011096887159, %rax
       movq %rax, (%rsp)
       movq %rsp, %rdx
       movq %rax, 16(%rsp)
       leaq 16(%rsp), %rsi
       movq $7302252, 24(%rsp)
       movl 22(%rsp), %eax
       movq $0, 8(%rsp)
       movl $7302252, 8(%rsp)
       movl %eax, (%rsp)
       movzbl 26(%rsp), %eax
       movb %al, 4(%rsp)
       movl 16(%rsp), %eax
       movl %eax, 6(%rsp)
       movzbl 20(%rsp), %eax
       movb %al, 10(%rsp)
       xorl %eax, %eax
       call printf
       xorl %eax, %eax
       addq $40, %rsp
       .cfi_def_cfa_offset 8
       ret

There is no call to memcpy; it has been optimized out by the compiler.
Now a D equivalent:

   // dmemcpy.d
   module dmemcpy;

   import core.stdc.string, std.stdio;

   void main() {
       char [16] a_ = "world hello", b_ = "";
       void* a = &a_[0], b = &b_[0];

       memcpy(b, a, 12);
       memcpy(b, a + 6, 5);
       memcpy(b + 6, a, 5);
       writefln("%s -> %s", a_, b_);
   }
   //--------------------

dmd -O -release -inline -boundscheck=off produces the following asm:

   _Dmain:
       push RBP
       mov RBP,RSP
       sub RSP,020h
       lea RSI,_TMP0@PC32[RIP]
       lea RDI,-020h[RBP]
       movsd
       movsd
       lea RSI,_TMP0@PC32[RIP]
       lea RDI,-010h[RBP]
       movsd
       movsd
       mov EDX,0Ch
       lea RSI,-020h[RBP]
       lea RDI,-010h[RBP]
       call memcpy@PLT32
       mov EDX,5
       lea RSI,-01Ah[RBP]
       lea RDI,-010h[RBP]
       call memcpy@PLT32
       mov EDX,5
       lea RSI,-020h[RBP]
       lea RDI,-0Ah[RBP]
       call memcpy@PLT32
       lea RDX,_TMP0@PC32[RIP]
       mov EDI,8
       mov RSI,RDX
       push dword ptr -018h[RBP]
       push dword ptr -020h[RBP]
       push dword ptr -8[RBP]
       push dword ptr -010h[RBP]
       call _D3std5stdio27__T8writeflnTAyaTG16aTG16aZ8writeflnFNfAyaG16aG16aZv@PLT32
       add RSP,020h
       xor EAX,EAX
       mov RSP,RBP
       pop RBP
       ret

So with DMD, the calls to memcpy are made verbatim, without any optimization :-( To be fair, gdc will optimize the memcpy calls out too. But my main argument here is that a good C compiler is able to do a very good job of optimizing memcpy, so the extra information brought by the D language is not so useful in practice.
Jan 31
next sibling parent Jack Stouffer <jack jackstouffer.com> writes:
On Tuesday, 31 January 2017 at 13:50:33 UTC, Richard Delorme 
wrote:
 But, my main argument here, is that a good C compiler, is able 
 to do a very good job at optimizing memcpy, so the extra 
 information brought by the D language, is not so useful in 
 practice.
The point of Walter's argument was not about performance but about safety guarantees: guarantees which cannot be given by C, or by Rust due to its marking as unsafe (as I understand it). So the extra info is very useful for the programmer. Also, I would compare apples to apples here and compile both the C program and the D code with LLVM.
Jan 31
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/31/2017 5:50 AM, Richard Delorme wrote:
 Well, I would not have taken memcpy as an example in favor of D. Good C
 compilers (like gcc) know what memcpy does and are able to optimize it
according
 to its arguments. DMD may know better about memcpy through its declaration but
 does not make any use about it.
That may be true, but is not my point. The compiler cannot have built in knowledge of every function. I just used memcpy() as an example because it is extremely well known. As for making use of the type signature information, DMD uses it to check the memory safety of arguments supplied to memcpy(), something gcc does not do.
Jan 31
parent reply Richard Delorme <abulmo club-internet.fr> writes:
On Tuesday, 31 January 2017 at 19:20:40 UTC, Walter Bright wrote:
 On 1/31/2017 5:50 AM, Richard Delorme wrote:
 Well, I would not have taken memcpy as an example in favor of 
 D. Good C
 compilers (like gcc) know what memcpy does and are able to 
 optimize it according
 to its arguments. DMD may know better about memcpy through its 
 declaration but
 does not make any use about it.
That may be true, but is not my point. The compiler cannot have built in knowledge of every function. I just used memcpy() as an example because it is extremely well known. As for making use of the type signature information, DMD uses it to check the memory safety of arguments supplied to memcpy(), something gcc does not do.
May we have an example of how the memory safety of arguments supplied to memcpy is checked in a way gcc cannot do? I was thinking of the return attribute, which prevents, for example, returning the address of a local variable through a call to memcpy:

   module dmemcpy;

   import std.stdio;

   extern (C) @system nothrow @nogc pure void* memcpy(return void* s1, const scope void *s2, size_t n);

   void *copy(const scope void *c, size_t n) {
       byte [16] d;
       return memcpy(&d[0], c, n);
   }

   void main() {
       byte [16] a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15];
       byte *b = cast (byte*) copy(&a[0], 8);

       foreach (i; 0..16) write(b[i], ", ");
       writeln();
   }
   // ------- end ----------

   $ dmd dmemcpy2.d
   dmemcpy2.d(9): Error: escaping reference to local variable d

Without the return attribute in the memcpy declaration, dmd does not issue this error message.

The equivalent program in C:

   #include <string.h>
   #include <stdio.h>

   void *copy(const void *c, size_t n) {
       char d[16];
       return memcpy(d, c, n);
   }

   int main(void) {
       char a[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
       char *b = copy(a, 8);

       for (int i = 0; i < 16; ++i) printf("%d ", b[i]);
       putchar('\n');
   }
   // --- end of program ----

Compiling with gcc:

   $ gcc memcpy2.c -O2 -W
   memcpy2.c: In function 'copy':
   memcpy2.c:6:9: warning: function returns address of local variable [-Wreturn-local-addr]
     return memcpy(d, c, n);
            ^~~~~~~~~~~~~~~
   memcpy2.c:5:7: note: declared here
     char d[16];

Or with clang:

   $ clang memcpy2.c --analyze
   memcpy2.c:6:2: warning: Address of stack memory associated with local variable 'd' returned to caller
     return memcpy(d, c, n);
     ^~~~~~~~~~~~~~~~~~~~~~

So, even without a return attribute, good C compilers like gcc or clang are able to emit a warning message.

I am not really convinced that attributes are necessary to enhance memory safety. I think the compiler should be able to check for safety without the user having to ask. Having to write attributes is a burden put on the user of the compiler. What if I forget to write an attribute? I just mistakenly make my program unsafe :-( Having a compiler check for potential errors without my asking for it is much safer, in my humble opinion.

-- Richard Delorme
Jan 31
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/31/2017 3:00 PM, Richard Delorme wrote:
 May we have an example of how the memory safety of arguments supplied to memcpy
 is checked in a way gcc cannot do?

 I was thinking of the return attribute, that prevents for example to return the
 address of a local variable through a call to memcpy:
The thing about memcpy is that compilers build in a LOT of information about it that simply is not there in the declaration. I suggest retrying your example for gcc/clang, but use your own memcpy, i.e.:

   void* mymemcpy(void * restrict s1, const void * restrict s2, size_t n);

Let us know what the results are!
Jan 31
parent reply Richard Delorme <abulmo club-internet.fr> writes:
On Tuesday, 31 January 2017 at 23:30:04 UTC, Walter Bright wrote:
 On 1/31/2017 3:00 PM, Richard Delorme wrote:
 The thing about memcpy is compilers build in a LOT of 
 information about it that simply is not there in the 
 declaration. I suggest retrying your example for gcc/clang, but 
 use your own memcpy, i.e.:

    void* mymemcpy(void * restrict s1, const void * restrict s2, 
 size_t n);

 Let us know what the results are!
   //-----8<-------------------------------------------------------
   #include <string.h>
   #include <stdio.h>

   void* mymemcpy(void* restrict dest, const void* restrict src, size_t n) {
       const char *s = src;
       char *d = dest;
       for (size_t i = 0; i < n; ++i) d[i] = s[i];
       return d;
   }

   void *copy(const void *c, size_t n) {
       char d[16];
       return mymemcpy(d, c, n);
   }

   int main(void) {
       char a[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
       char *b = copy(a, 8);

       for (int i = 0; i < 16; ++i) printf("%d ", b[i]);
       putchar('\n');
   }
   //-----8<-------------------------------------------------------

   $ gcc mymemcpy.c -O2 -W
   mymemcpy.c: In function 'copy':
   mymemcpy.c:13:9: warning: function returns address of local variable [-Wreturn-local-addr]
     return mymemcpy(d, c, n);
            ^~~~~~~~~~~~~~~~~
   memcpy4.c:12:7: note: declared here
     char d[16];

clang (version 3.8.1) failed to find an error in this code.
Feb 01
next sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Wednesday, 1 February 2017 at 10:05:49 UTC, Richard Delorme 
wrote:
 On Tuesday, 31 January 2017 at 23:30:04 UTC, Walter Bright 
 wrote:
 On 1/31/2017 3:00 PM, Richard Delorme wrote:
 The thing about memcpy is compilers build in a LOT of 
 information about it that simply is not there in the 
 declaration. I suggest retrying your example for gcc/clang, 
 but use your own memcpy, i.e.:

    void* mymemcpy(void * restrict s1, const void * restrict 
 s2, size_t n);

 Let us know what the results are!
 //-----8<-------------------------------------------------------
 #include <string.h>
 #include <stdio.h>

 void* mymemcpy(void* restrict dest, const void* restrict src, size_t n) {
     const char *s = src;
     char *d = dest;
     for (size_t i = 0; i < n; ++i) d[i] = s[i];
     return d;
 }

 void *copy(const void *c, size_t n) {
     char d[16];
     return mymemcpy(d, c, n);
 }

 int main(void) {
     char a[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
     char *b = copy(a, 8);

     for (int i = 0; i < 16; ++i) printf("%d ", b[i]);
     putchar('\n');
 }
 //-----8<-------------------------------------------------------
 $ gcc mymemcpy.c -O2 -W
 mymemcpy.c: In function 'copy':
 mymemcpy.c:13:9: warning: function returns address of local variable [-Wreturn-local-addr]
   return mymemcpy(d, c, n);
          ^~~~~~~~~~~~~~~~~
 memcpy4.c:12:7: note: declared here
   char d[16];

 clang (version 3.8.1) failed to find error in this code.
You have to define mymemcpy() in another source file and only put the prototype in this module. If the compiler sees the code, it can do the complete data flow analysis. With only the declaration it can't, and that is Walter's point. The annotations let the declaration carry the information that the compiler cannot deduce from the code itself, because the code is in another module (object file, library).
Feb 01
next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Wed, 01 Feb 2017 10:20:45 +0000, Patrick Schluter wrote:
 You have to define the mymemcpy() in another source file and only put
 the prototype in this module. If the compiler sees the code it can do
 the complete data flow analyses. With only the declaration it can't and
 that is Walter's point. The annotations allow to give to the declaration
 the information the compiler can not deduce itself from the code,
 because the code is in another module (object file, library).
OTOH I haven't seen anyone distribute a D library with .di files, and even extern(D) is pretty rare. That means whole program analysis is a lot more feasible. The exception is virtual dispatch and functional programming, which is leaky enough compiling the whole program at once and intractable with any level of incremental compilation. The most obvious form is dub compiling each package you depend on into a static library. But all whole-program analysis problems can be solved with a custom linker and object format, right?
Feb 01
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 8:12 AM, Chris Wright wrote:
 OTOH I haven't seen anyone distribute a D library with .di files
druntime uses a lot of header-only imports, including for D code. (The gc, for example, is not presented as source code to the compiler.)
Feb 01
prev sibling parent reply Richard Delorme <abulmo club-internet.fr> writes:
On Wednesday, 1 February 2017 at 10:20:45 UTC, Patrick Schluter 
wrote:
 On Wednesday, 1 February 2017 at 10:05:49 UTC, Richard Delorme 
 wrote:
 //-----8<-------------------------------------------------------
 #include <string.h>
 #include <stdio.h>

 void* mymemcpy(void* restrict dest, const void* restrict src, 
 size_t n) {
 	const char *s = src;
 	char *d = dest;
 	for (size_t i = 0; i < n; ++i) d[i] = s[i];
 	return d;
 }

 void *copy(const void *c, size_t n) {
 	char d[16];
 	return mymemcpy(d, c, n);
 }	

 int main(void) {
 	char a[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 
 14, 15};
 	char *b = copy(a, 8);

 	for (int i = 0; i < 16; ++i) printf("%d ", b[i]);
 	putchar('\n');
 }
 //-----8<-------------------------------------------------------
 $ gcc mymemcpy.c -O2 -W
 mymemcpy.c: In function 'copy':
 mymemcpy.c:13:9: warning: function returns address of local 
 variable [-Wreturn-local-addr]
   return mymemcpy(d, c, n);
          ^~~~~~~~~~~~~~~~~
 memcpy4.c:12:7: note: declared here
   char d[16];

 clang (version 3.8.1) failed to find error in this code.
You have to define the mymemcpy() in another source file and only put the prototype in this module. If the compiler sees the code it can do the complete data flow analyses. With only the declaration it can't and that is Walter's point. The annotations allow to give to the declaration the information the compiler can not deduce itself from the code, because the code is in another module (object file, library).
Right, if defined in another file, the compiler will not emit any warning. However, other tools can detect this kind of error. For instance, valgrind works great in this example, directly on the executable:

   $ valgrind --track-origins=yes mymemcpy
   [...]
   ==31041== Conditional jump or move depends on uninitialised value(s)
   ==31041==    at 0x4E843C7: vfprintf (in /usr/lib64/libc-2.23.so)
   ==31041==    by 0x4E8B9A8: printf (in /usr/lib64/libc-2.23.so)
   ==31041==    by 0x400682: main (main.c:15)
   ==31041==  Uninitialised value was created by a stack allocation
   ==31041==    at 0x400652: main (main.c:13)
   [...]

Thus, I still have mixed feelings about attributes. In my humble opinion, it is wrong to make the programmer responsible for making his program safer by stacking attributes on function declarations. I prefer to ask the compiler to detect as many defects as possible (but not more!), and to rely on external tools like valgrind, gdb, etc. to detect more subtle bugs.
Feb 01
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 12:38 PM, Richard Delorme wrote:
 Right, if defined in another file, the compiler will not emit any warning.
 However other tools can detect this kind of error. For instance, valgrind works
 great in this example, directly on the executable:

 $ valgrind --track-origins=yes mymemcpy
 [...]
 ==31041== Conditional jump or move depends on uninitialised value(s)
 ==31041==    at 0x4E843C7: vfprintf (in /usr/lib64/libc-2.23.so)
 ==31041==    by 0x4E8B9A8: printf (in /usr/lib64/libc-2.23.so)
 ==31041==    by 0x400682: main (main.c:15)
 ==31041==  Uninitialised value was created by a stack allocation
 ==31041==    at 0x400652: main (main.c:13)
 [...]

 Thus, I still have a mitigated feeling on attributes. In my humble opinion, it
 is wrong to put on the programmer the responsibility to make his program safer
 by stacking attributes on function declarations. I prefer to ask the compiler
to
 detect as much defects as possible (but not more!), and to rely on external
 tools like valgrind, gdb, etc. to detect more subtle bugs.
You're right that valgrind can detect these sorts of things, and valgrind is such an amazing tool I suspect that it has almost single handedly saved C from oblivion. That said, there are limitations:

1. Valgrind does not detect errors if it isn't run, and not many people run it regularly.
2. Valgrind does not detect errors unless the error actually happens in the running code. This means you'll need a test suite with 100% coverage for valgrind to find all the errors.
3. The prevalence of memory safety errors in shipped code shows that valgrind is not being used enough nor is effective enough.
4. Valgrind slows down the execution of code by an order of magnitude or two. This makes it impractical for many applications, and impossible to instrument code being run by the user. (It isn't run by the D autotester, for example.)
5. Bugs caught at compile time are far, far cheaper to fix than those caught by the test suite. This is well documented.
6. Valgrind isn't available on all platforms, like Windows, embedded systems, phones (?), etc.
7. There is value in having a guarantee that code doesn't suffer from certain kinds of bugs. Valgrind cannot offer such a guarantee.

It's much like the dynamic typing vs static typing debate. Do you prefer finding problems at run time or compile time?

And lastly, D does not require you to use these annotations.
Feb 01
parent reply Claude <no no.no> writes:
On Wednesday, 1 February 2017 at 21:16:30 UTC, Walter Bright 
wrote:
 6. Valgrind isn't available on all platforms, like Windows, 
 embedded systems, phones (?), etc.
You can use valgrind on embedded systems as long as they run a GNU/Linux OS. I've used valgrind successfully many times on ARM architecture. But I don't know if it works with Android, and I doubt it works on baremetal indeed.
Feb 02
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/2/2017 1:21 AM, Claude wrote:
 On Wednesday, 1 February 2017 at 21:16:30 UTC, Walter Bright wrote:
 6. Valgrind isn't available on all platforms, like Windows, embedded systems,
 phones (?), etc.
You can use valgrind on embedded systems as long as they run a GNU/Linux OS. I've used valgrind successfully many times on ARM architecture. But I don't know if it works with Android, and I doubt it works on baremetal indeed.
I seem to recall Valgrind wasn't on OSX, either, at one point. Maybe that has since been corrected.
Feb 02
next sibling parent reply Atila Neves <atila.neves gmail.com> writes:
On Thursday, 2 February 2017 at 09:28:15 UTC, Walter Bright wrote:
 On 2/2/2017 1:21 AM, Claude wrote:
 On Wednesday, 1 February 2017 at 21:16:30 UTC, Walter Bright 
 wrote:
 6. Valgrind isn't available on all platforms, like Windows, 
 embedded systems,
 phones (?), etc.
You can use valgrind on embedded systems as long as they run a GNU/Linux OS. I've used valgrind successfully many times on ARM architecture. But I don't know if it works with Android, and I doubt it works on baremetal indeed.
I seem to recall Valgrind wasn't on OSX, either, at one point. Maybe that has since been corrected.
Also, unless you're testing possible bugs in compiler backends or the C standard library, it mostly doesn't matter. Compile on regular x86/Linux and run valgrind/asan there.

Have I run into weird bugs that only occurred on one platform? Yes. Were they _really_ rare? You betcha. *

Atila

* At one point I had a bug where casting from double to uint64_t failed for values over the integer maximum for 32-bits... (turns out that platform had no FPU and it was a bug in its libc)
Feb 02
next sibling parent reply Chris Wright <dhasenan gmail.com> writes:
On Thu, 02 Feb 2017 14:19:02 +0000, Atila Neves wrote:
 Also, unless you're testing possible bugs in compiler backends or the C
 standard library, it mostly doesn't matter. Compile on regular x86/Linux
 and run valgrind/asan there.
Assuming you're writing cross-platform code. How common is that? I write Java for a living, and some of my code only works on Linux. (It does at least fail gracefully on OSX, which my coworkers use.)
Feb 02
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/2/2017 8:44 AM, Chris Wright wrote:
 Assuming you're writing cross-platform code. How common is that?
Exactly. That's why Valgrind is not a substitute for a language that offers memory safety features.
Feb 02
prev sibling parent Atila Neves <atila.neves gmail.com> writes:
On Thursday, 2 February 2017 at 16:44:26 UTC, Chris Wright wrote:
 On Thu, 02 Feb 2017 14:19:02 +0000, Atila Neves wrote:
 Also, unless you're testing possible bugs in compiler backends 
 or the C standard library, it mostly doesn't matter. Compile 
 on regular x86/Linux and run valgrind/asan there.
Assuming you're writing cross-platform code. How common is that? I write Java for a living, and some of my code only works on Linux. (It does at least fail gracefully on OSX, which my coworkers use.)
Ah, Java: write once, debug everywhere. :P

I almost always write cross-platform code. In C or C++, valgrind/asan will catch nearly all memory corruption problems on plain Linux. It's only weird corner cases that escape. Which isn't to say you won't have Windows-only bugs, say. What I'm saying is if you read past the end of an allocated buffer you don't _need_ to test on all platforms. That'll be caught. I.e. the lack of valgrind on Windows or an embedded platform isn't a big deal.

Atila
Feb 03
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/2/2017 6:19 AM, Atila Neves wrote:
 Also, unless you're testing possible bugs in compiler backends or the C
standard
 library, it mostly doesn't matter. Compile on regular x86/Linux and run
 valgrind/asan there.
I've often been able to flush out difficult bugs by compiling on another platform. Back in the bad old DOS days, I quickly learned to develop the programs on a protected mode operating system, then port to 16 bit real mode DOS as the last step. :-)
 Have I run into weird bugs that only occurred on one platform? Yes. Were they
 _really_ rare? You betcha. *
Memory corruption bugs show themselves differently on different platforms, and one of them likely will make it easier to find the bug.
Feb 02
parent reply Atila Neves <atila.neves gmail.com> writes:
On Thursday, 2 February 2017 at 20:50:58 UTC, Walter Bright wrote:
 On 2/2/2017 6:19 AM, Atila Neves wrote:
 Also, unless you're testing possible bugs in compiler backends 
 or the C standard
 library, it mostly doesn't matter. Compile on regular 
 x86/Linux and run
 valgrind/asan there.
I've often been able to flush out difficult bugs by compiling on another platform. Back in the bad old DOS days, I quickly learned to develop the programs on a protected mode operating system, then port to 16 bit real mode DOS as the last step. :-)
That I can see the value in. But (fortunately) those days are long gone.
 Have I run into weird bugs that only occurred on one platform? 
 Yes. Were they
 _really_ rare? You betcha. *
Memory corruption bugs show themselves differently on different platforms, and one of them likely will make it easier to find the bug.
Right, but we're talking about finding memory corruption bugs _before_ they manifest themselves. As I mentioned in my other reply, if you have memory corruption bugs in common cross-platform code, valgrind and asan will (nearly always) catch them. You don't need to wait for weird effects that are hard to trace back. Run on Linux with both valgrind and asan and you'll be fine 99.9%* of the time.

Atila

* stats totally made up
Feb 03
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/3/2017 4:10 AM, Atila Neves wrote:
 Right, but we're talking about finding memory corruption bugs _before_ they
 manifest themselves. As I mentioned in my other reply, if you have memory
 corruption bugs in common cross-platform code, valgrind and asan will (nearly
 always) catch them. You don't need to wait for weird effects that are hard to
 trace back. Run on Linux with both valgrind and asan and you'll be fine 99.9%*
 of the time.
You're right - if you've got a test suite that'll tickle it!
Feb 03
prev sibling parent David Nadlinger <code klickverbot.at> writes:
On Thursday, 2 February 2017 at 09:28:15 UTC, Walter Bright wrote:
 I seem to recall Valgrind wasn't on OSX, either, at one point. 
 Maybe that has since been corrected.
It nominally works on 10.10, if I recall correctly, but not to the same standard as on Linux. For C/C++, a combination of the various Clang sanitizers works faster and catches more bugs, though. — David
Feb 02
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 2:05 AM, Richard Delorme wrote:
 On Tuesday, 31 January 2017 at 23:30:04 UTC, Walter Bright wrote:
 On 1/31/2017 3:00 PM, Richard Delorme wrote:
 The thing about memcpy is compilers build in a LOT of information about it
 that simply is not there in the declaration. I suggest retrying your example
 for gcc/clang, but use your own memcpy, i.e.:

    void* mymemcpy(void * restrict s1, const void * restrict s2, size_t n);

 Let us know what the results are!
//-----8<-------------------------------------------------------
#include <string.h>
#include <stdio.h>
void* mymemcpy(void* restrict dest, const void* restrict src, size_t n) {
    const char *s = src;
    char *d = dest;
    for (size_t i = 0; i < n; ++i)
        d[i] = s[i];
    return d;
}

void *copy(const void *c, size_t n) {
    char d[16];
    return mymemcpy(d, c, n);
}

int main(void) {
    char a[16] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
    char *b = copy(a, 8);
    for (int i = 0; i < 16; ++i)
        printf("%d ", b[i]);
    putchar('\n');
}
//-----8<-------------------------------------------------------
$ gcc mymemcpy.c -O2 -W
mymemcpy.c: In function 'copy':
mymemcpy.c:13:9: warning: function returns address of local variable [-Wreturn-local-addr]
    return mymemcpy(d, c, n);
           ^~~~~~~~~~~~~~~~~
memcpy4.c:12:7: note: declared here
    char d[16];
Note that you included the source code for mymemcpy(). gcc is apparently able to examine the source code to determine that 'd' is returned. Please try it just using the declaration.
Feb 01
prev sibling next sibling parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Tuesday, 31 January 2017 at 01:30:48 UTC, Walter Bright wrote:
 2. The return value is derived from s1.
 4. Copies of s1 or s2 are not saved.
Actually I didn't know either of those things from looking at the signature because DIP25 and DIP1000 have marketing problems, in that the only way to get info on them is on the DIP pages. I'd be willing to bet money that 80% of the people who use D don't know about the -dip25 flag. Is there anywhere which gives a simple explanation of both of these DIPs' safety checks?
Jan 31
parent Nick Treleaven <nick geany.org> writes:
On Tuesday, 31 January 2017 at 18:21:02 UTC, Jack Stouffer wrote:
 On Tuesday, 31 January 2017 at 01:30:48 UTC, Walter Bright 
 wrote:
 2. The return value is derived from s1.
 4. Copies of s1 or s2 are not saved.
Actually I didn't know either of those things from looking at the signature because DIP25 and DIP1000 have marketing problems, in that the only way to get info on them is on the DIP pages. I'd be willing to bet money that 80% of the people who use D don't know about the -dip25 flag. Is there anywhere which gives a simple explanation of both of these DIPs' safety checks?
DIP1000 is not stable yet AFAICT. I documented return ref around December: https://dlang.org/spec/function.html#return-ref-parameters
Feb 01
prev sibling next sibling parent reply Mathias Lang <mathias.lang sociomantic.com> writes:
On Tuesday, 31 January 2017 at 01:30:48 UTC, Walter Bright wrote:
 By this information being knowable from the declaration, the 
 compiler knows it too and can make use of it.
*Can* make use of it... But won't. Any code calling memcpy has to be in a @trusted wrapper, in which `return scope` is not checked. So adding `return scope` annotations to non-@safe D binding is just like adding documentation. Which is on par with what C is doing, in the end.
Jan 31
parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/31/2017 12:00 PM, Mathias Lang wrote:
 *Can* make use of it... But won't.
 Any code calling memcpy has to be in a  trusted wrapper, in which `return
scope`
 is not checked.
 So adding `return scope` annotations to non-safe D binding is just like adding
 documentation. Which is on par with what C is doing, in the end.
----
import core.stdc.string;

void* foo() {
    char[10] d;
    char[10] s;
    return memcpy(&d[0], &s[0], 10); // Error: escaping reference to local variable d
}
----

There was a bit of discussion about this a while back. The result was we agreed to not break existing NOT BROKEN code with the new escape detection feature. The above code is broken, and so is diagnosed regardless of -dip1000 settings, @safe, @trusted or @system attributes.
Jan 31
prev sibling parent reply =?UTF-8?Q?Tobias=20M=C3=BCller?= <troplin bluewin.ch> writes:
Walter Bright <newshound2 digitalmars.com> wrote:
 Rust says https://doc.rust-lang.org/1.14.0/libc/fn.memcpy.html:
 
   pub unsafe extern fn memcpy(dest: *mut c_void,
                             src: *const c_void,
                             n: size_t)
                             -> *mut c_void
 
 
 [...]
 The Rust declaration does not give us 1, 2 or 4 (because it is marked as 
 unsafe). If it was safe, the declaration does not give us 2.
Using an FFI function to compare D vs Rust doesn't tell you much. Foreign functions are usually not used directly in Rust, they are used to build safe wrappers that will give you *all* possible guarantees, including type safety. As a consequence it's not necessary to augment the C declaration with additional information. Marking the function as safe would be wrong in Rust, because dereferencing raw pointers is unsafe. Raw pointers are not necessarily valid, even in safe code. You need references for that guarantee. But again, raw pointers are usually only used for FFI and to build safe abstractions. Tobi
Jan 31
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 1/31/2017 10:43 PM, Tobias Müller wrote:
 Using an FFI function to compare D vs Rust doesn't tell you much. Foreign
 functions are usually not used directly in Rust, they are used to build
 safe wrappers that will give you *all* possible guarantees, including type
 safety.
 As a consequence it's not necessary to augment the C declaration with
 additional information.
I'm not very familiar with Rust. Can you post what a Rust declaration for memcpy would look like with all the guarantees?
 Marking the function as safe would be wrong in Rust, because dereferencing
 raw pointers is unsafe. Raw pointers are not necessarily valid, even in
 safe code. You need references for that guarantee. But again, raw pointers
 are usually only used for FFI and to build safe abstractions.
memcpy() isn't marked safe in D, either.
Feb 01
next sibling parent reply Cody Laeder <codylaeder gmail.com> writes:
On Wednesday, 1 February 2017 at 08:17:45 UTC, Walter Bright 
wrote:
 I'm not very familiar with Rust. Can you post what a Rust 
 declaration for memcpy would look like with all the guarantees?
The memcpy you have linked [1] is just a wrapper around the LLVM intrinsic [2] function. This is not stabilized and therefore not part of the standard library, as Rust doesn't want to force permanent dependence on LLVM (or emulating LLVM on other future backends).

The _traditional_ C-like memcpy [3] is in the stdlib. It is unsafe, and carries no side effects for the src buffer. It enforces type safety, but it cannot enforce memory safety as you can blow past the allocation size on your dst buffer (hence why it is unsafe).

The simplest _safe_ memcpy [4] just does a range check before calling the unsafe memcpy in stdlib. This ensures type and memory safety (returning Err on non-equal length buffers). While this may seem limiting, one can still achieve non-aligned copies via the Rust sub-slice operator.

Example:

   memcpy(&src[0..4], &mut dst[20..24]);

which would copy the first 4 bytes of src into bytes 20 through 23 of dst.

[1] https://doc.rust-lang.org/1.14.0/libc/fn.memcpy.html
[2] http://llvm.org/docs/LangRef.html#llvm-memcpy-intrinsic
[3] https://doc.rust-lang.org/std/ptr/fn.copy_nonoverlapping.html
[4] https://gist.github.com/1f34331b2cae6ba9e624c5f9f4f2a458
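
A range-checked wrapper of the kind described above might look roughly like the following. This is only a sketch, not the code from the gist: the names `checked_copy` and `LengthMismatch` are made up for illustration, and the only library call assumed is `std::ptr::copy_nonoverlapping`.

```rust
use std::ptr;

/// Hypothetical error type: reported when the two slices differ in length.
#[derive(Debug, PartialEq)]
pub struct LengthMismatch {
    pub src_len: usize,
    pub dst_len: usize,
}

/// Hypothetical safe wrapper: checks the lengths first, then delegates to
/// the unsafe copy_nonoverlapping from the standard library.
pub fn checked_copy<T: Copy>(src: &[T], dst: &mut [T]) -> Result<(), LengthMismatch> {
    if src.len() != dst.len() {
        return Err(LengthMismatch {
            src_len: src.len(),
            dst_len: dst.len(),
        });
    }
    // SAFETY: both slices are valid for src.len() elements, and a shared
    // borrow and a mutable borrow cannot refer to overlapping memory.
    unsafe { ptr::copy_nonoverlapping(src.as_ptr(), dst.as_mut_ptr(), src.len()) };
    Ok(())
}
```

Called as `checked_copy(&src[0..4], &mut dst[20..24])`, the sub-slices select which bytes take part in the copy, so the wrapper itself never needs offsets or a separate length argument.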
Feb 01
next sibling parent reply Michael Howell <michaelhowell932+dlang gmail.com> writes:
On Wednesday, 1 February 2017 at 14:39:15 UTC, Cody Laeder wrote:
 [4] https://gist.github.com/1f34331b2cae6ba9e624c5f9f4f2a458
That example code won't even typecheck, and the minimal fix (make dest a slice) leaves it unsafe (T needs to be Copy). If you just want to quickly bang out code like this, you should probably use play.rust-lang.org to make sure it works.

But, anyway, let's use the version of that function that's actually in the standard library: https://github.com/rust-lang/rust/blob/master/src/libcore/slice.rs#L531

   // This trait is implemented for slices,
   // so it can be invoked like this:
   //   dest.copy_from_slice(src)
   pub trait SliceExt {
       type Item;
       // [other slice methods redacted]
       #[stable(feature = "copy_from_slice", since = "1.9.0")]
       fn copy_from_slice(&mut self, src: &[Self::Item]) where Self::Item: Copy;
   }

To list the guarantees in the OP:

1. The signature doesn't say anything about side effects. This will probably be a const function, once those exist.
2. Since this function returns nothing, there is nothing to say about the return value. Because of how &mut pointers work in Rust, returning pointers like that is not ergonomic.
3. Nothing src points to, directly or transitively, can be mutated, unless T contains a cell (the compiler can and already does determine this on an as-needed basis, and a human reader can usually ignore interior mutability because it's used for semantically meaningless things like reference counts).
4. self and src can't be saved, because they don't outlive the function invocation. The items behind it can't be saved, either, because Self::Item is a generic that might not live long enough (that feels like cheating, though, because it only works for generics or if the data type is deliberately engineered to not be 'static).

Unlike libc's memcpy (which is directly exposed, as part of the stable standard library as copy_nonoverlapping), the slice abstraction expresses that the length of the slice is within bounds of the underlying allocation. But D has slices, too, and probably has a version of this function, so that also feels like cheating.
This function signature *does* guarantee that src and self don't overlap, unlike the C and D versions. Personally, I think that's at least as important as whether the function's pure or not.

Here's a version of memcpy that's blatantly unidiomatic, but gets the same score on 1, 2, 3, and 4 as the slice version https://play.rust-lang.org/?gist=1f3a07987258500b8afd5a30e589457b:

   unsafe fn copy_nonoverlapping_ref<T>(src: &T, dest: &mut T, len: usize) {
       std::ptr::copy_nonoverlapping(src, dest, len)
   }

Again, it doesn't guarantee no side effects, it may guarantee that src isn't mutated, it does guarantee that they aren't stored away somewhere, and it guarantees that src and dest don't overlap. It's still unsafe, because it doesn't do anything about len being possibly out of bounds, and I left out the Copy bound for the sake of flexibility.
Feb 01
next sibling parent reply Michael Howell <michaelhowell932+dlang gmail.com> writes:
On Wednesday, 1 February 2017 at 17:28:28 UTC, Michael Howell 
wrote:
 This function signature *does* guarantee that src and self 
 don't overlap, unlike the C and D versions. Personally, I think 
 that's at least as important as whether the function's pure or 
 not.
Oops, forgot the "restrict" keyword. It is there in the C and D versions.
Feb 01
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 9:56 AM, Michael Howell wrote:
 Oops, forgot the "restrict" keyword. It is there in the C and D versions.
D doesn't have the 'restrict' annotation.
Feb 01
prev sibling next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 9:28 AM, Michael Howell wrote:
 This function signature *does* guarantee that src and self don't overlap,
unlike
 the C and D versions. Personally, I think that's at least as important as
 whether the function's pure or not.
The overlap is handled in C with the 'restrict' annotation. D does not have an equivalent.
Feb 01
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 9:28 AM, Michael Howell wrote:
 unsafe fn copy_nonoverlapping_ref<T>(src: &T, dest: &mut T, len: usize) {
   std::ptr::copy_nonoverlapping(src, dest, len)
 }

 Again, it doesn't guarantee no side effects, it may guarantee that src isn't
 mutated, it does guarantee that they aren't stored away somewhere, and it
 guarantees that src and dest don't overlap.
What part of the signature guarantees non-overlap?
 It's still unsafe, because it
 doesn't do anything about len being possibly out of bounds, and I left out the
 Copy bound for the sake of flexibility.
Being marked 'unsafe' also includes the ability of the function to save the pointers in global variables.
Feb 01
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Wednesday, 1 February 2017 at 21:31:33 UTC, Walter Bright 
wrote:
 What part of the signature guarantees non-overlap?
At the rate D is going, pretty soon the entire function body will be retold in the signature. What's the point when it is obvious that in practice, we can actually analyze the content and get BETTER coverage anyway?

(actually in the real world, it won't since nobody will care enough to write `pure public return const(T) hi(return scope T t) nothrow @nogc @safe @noincompetence @a_million_other_things { return t; }`)
Feb 01
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 1:47 PM, Adam D. Ruppe wrote:
 At the rate D is going, pretty soon the entire function body will be retold in
 the signature. What's the point when it is obvious that in practice, we can
 actually analyze the content and get BETTER coverage anyway?
It already does this for templates and function literals. It's not really possible for C headers :-)
Feb 01
prev sibling parent reply Mike <none none.com> writes:
On Wednesday, 1 February 2017 at 21:47:58 UTC, Adam D. Ruppe 
wrote:
 (actually in the real world, it won't since nobody will care 
 enough to write `pure public return const(T) hi(return scope T 
 t) nothrow  nogc  safe  noincompetence  a_million_other_things 
 { return t; }`)
I found all the attribution of function definitions a little unnerving when I started learning D, but at least one other believed it was idiomatic D. It makes me wonder if D has the wrong defaults (though probably a moot point if changing it would break code). But, it does place D at a disadvantage to a language like Rust, where one has to opt-out of safety, while in D it appears to be opt-in. Mike
Feb 01
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Wed, Feb 01, 2017 at 11:49:25PM +0000, Mike via Digitalmars-d wrote:
 On Wednesday, 1 February 2017 at 21:47:58 UTC, Adam D. Ruppe wrote:
 (actually in the real world, it won't since nobody will care enough
 to write `pure public return const(T) hi(return scope T t) nothrow
  nogc  safe  noincompetence  a_million_other_things { return t; }`)
I found all the attribution of function definitions a little unnerving when I started learning D, but at least one other believed it was idiomatic D.
I think the goal, if it's possible to attain to eventually, is to have the compiler infer most of the attributes, ideally all. This is already done for a good number of attributes for template functions and auto functions. I agree with you that attribute soup is a Bad Thing for a language.
 It makes me wonder if D has the wrong defaults (though probably a moot
 point if changing it would break code).  But, it does place D at a
 disadvantage to a language like Rust, where one has to opt-out of
 safety, while in D it appears to be opt-in.
[...]

We would love to change the defaults, but unfortunately that boat has already sailed a long time ago. If we could do it all over again, I'm sure a lot of defaults would be the opposite of what they are today. But we can't reasonably change that now without massive breakage of current code.

T

--
Life begins when you can spend your spare time programming instead of watching television. -- Cal Keegan
Feb 01
next sibling parent Jon Degenhardt <jond noreply.com> writes:
On Wednesday, 1 February 2017 at 23:49:29 UTC, H. S. Teoh wrote:
 On Wed, Feb 01, 2017 at 11:49:25PM +0000, Mike via 
 Digitalmars-d wrote:
 On Wednesday, 1 February 2017 at 21:47:58 UTC, Adam D. Ruppe 
 wrote:
 (actually in the real world, it won't since nobody will care 
 enough
 to write `pure public return const(T) hi(return scope T t) 
 nothrow
  nogc  safe  noincompetence  a_million_other_things { return 
 t; }`)
I found all the attribution of function definitions a little unnerving when I started learning D, but at least one other believed it was idiomatic D.
I think the goal, if it's possible to attain to eventually, is to have the compiler infer most of the attributes, ideally all. This is already done for a good number of attributes for template functions and auto functions. I agree with you that attribute soup is a Bad Thing for a language.
A different spin on the same topic - I've been wondering what the anticipated changes to both the standard library and application code are going to be as a result of dip25 and dip1000. Is the expectation that use of the new annotations will be limited to select portions of code bases, or that they will become the predominant way to write D code?

Another spin - If a company has a team of people building a code-base and decides to establish best practices, how might the facilities introduced by dip25 and dip1000 fit into those best practices?

Clearly, there will be a menu of options available to teams, applications, etc. But, I would expect there to be a small number of preferred, commonly adopted approaches. And, it would be preferable, though not necessary, that best practices and idioms used in the standard library were a good starting point for new adoptees and development efforts. Hence my question about both Phobos and application code.

Perhaps it's too early to answer these questions, but if there are expected outcomes it would be useful to understand them.

--Jon
Feb 01
prev sibling next sibling parent Paolo Invernizzi <paolo.invernizzi no.address> writes:
On Wednesday, 1 February 2017 at 23:49:29 UTC, H. S. Teoh wrote:
 On Wed, Feb 01, 2017 at 11:49:25PM +0000, Mike via

 We would love to change the defaults, but unfortunately that 
 boat has already sailed a long time ago.  If we could do it all 
 over again, I'm sure a lot of defaults would be the opposite of 
 what they are today. But we can't reasonably change that now 
 without massive breakage of current code.


 T
My opinion is that the current situation is not that bad: it's some sort of bottom-up approach. Almost always, in a company, programmers are under time pressure and rushing, so I bet that almost always they will add a "@system:" at the top of the module. Maybe not true, but it brings to mind the Java "throws Exception" parade in function declarations...

/Paolo
Feb 02
prev sibling parent reply Random D user <no email.com> writes:
On Wednesday, 1 February 2017 at 23:49:29 UTC, H. S. Teoh wrote:
 We would love to change the defaults, but unfortunately that 
 boat has already sailed a long time ago.
What if D had a -safe-defaults switch? It should be ok, since @safe is stricter than unsafe, right? This way old/existing code would compile fine by default, but if you want to use that code/lib with safe-defaults you either have to do @trusted wrappers or modify it to be @safe. All new code with safe-defaults would compile fine in safe mode and unsafe mode. To me it's a similar approach to 'warnings-all' and 'warnings-as-errors'.

---

I myself don't really care for @safe; it's complex and seems to have a big practical hole with @trusted. Kind of like 'refs can't be null in C++' (as some people claim/argue) and then someone passes nullptr into a function ref arg. Completely unreliable, even though refs usually work ok 99% of the time (by conventions and effort).

I've already thrown const, immutable, inout mostly into the trash (string literals etc. are the exception) after a few tries. They make the code more complex and harder to modify, especially when you have a bigger system. Often you realize that your system/module isn't truly 100% const in the last insignificant leaf function, and that triggers large cascading modifications and rewrites, just to get the code to work. Also I can't really remember when I accidentally modified data that I shouldn't have (i.e. violated const protection). But I often modify correct data incorrectly. I believe most programmers first figure out what they need to do before doing it instead of just writing randomly into some array/pointer that looked handy :)

I prefer flexible (fun), fast and debuggable (debugger/printing friendly) code. It seems that neither @safe nor const is part of it. (I'm not writing life and death safety critical code anyway).
Feb 02
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/2/2017 12:37 PM, Random D user wrote:
 I prefer flexible (fun), fast and debuggable (debugger/printing friendly) code.
 It seems that neither  safe or const are part of it. (I'm not writing life and
 death safety critical code anyway).
One nice feature of D is you don't have to use const, safe, etc., if you don't want to. But for high stakes software, there's no substitute.
Feb 02
prev sibling parent Nick Treleaven <nick geany.org> writes:
On Thursday, 2 February 2017 at 20:37:32 UTC, Random D user wrote:
 What if d had a -safe-defaults switch? It should be ok, since 
 safe is stricter than unsafe right?
Yes, we need this because module-level '@safe:' doesn't allow inference of @system.
 This way old/existing code would compile fine by default, but 
 if you want to use that code/lib with safe-defaults you either 
 have to do trusted wrappers or modify it to be safe.
Or just override the default with @system when you need it. System library code probably can't easily be wrapped with @trusted (correctly, see below).
 I myself don't really care for  safe, it's complex and seems to 
 have big practical hole with  trusted.
It's the same as Rust unsafe blocks. Without this 'hole' @safe would be much more limited and inefficient, so no one would use it. If @trusted is used in a way that exposes unsafe behaviour, that is a programmer bug. The good news is, projects shouldn't have much @trusted code and it can easily be grepped for.
 I've already thrown const, immutable, inout mostly in to trash 
 (string literals etc. are exception) after few tries. They make 
 the code more complex and harder to modify especially when you 
 have bigger system. Often you realize that your system/module 
 isn't truly 100% const in the last insignificant leaf function, 
 and that triggers large  cascading modifications and rewrites, 
 just to get the code work.
Transitive immutability is very different to shallow const, you can't use it so often, but it guarantees safe sharing of data across threads or even just string slices in single threaded programs without the possibility of accidental modification.
Feb 03
prev sibling parent reply =?UTF-8?Q?Tobias=20M=C3=BCller?= <troplin bluewin.ch> writes:
Walter Bright <newshound2 digitalmars.com> wrote:
 On 2/1/2017 9:28 AM, Michael Howell wrote:
 unsafe fn copy_nonoverlapping_ref<T>(src: &T, dest: &mut T, len: usize) {
 std::ptr::copy_nonoverlapping(src, dest, len)
 }
 
 Again, it doesn't guarantee no side effects, it may guarantee that src isn't
 mutated, it does guarantee that they aren't stored away somewhere, and it
 guarantees that src and dest don't overlap.
What part of the signature guarantees non-overlap?
Mutable references cannot overlap in Rust, at least in safe code. The exact guarantees in unsafe code are not yet clear and currently being fleshed out. OTOH using unsafe code in Rust is quite rare.
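
To make the non-overlap point concrete, here is a small sketch in safe Rust. The function names are invented for the example; `split_at_mut`, `copy_from_slice` and `copy_within` are standard slice methods. Copying between two parts of the same buffer only compiles once the buffer has been split into provably disjoint halves, and the overlapping (memmove-style) case goes through a different method.

```rust
/// Copies the front half of `buf` onto the start of the back half.
/// (Hypothetical helper, for illustration only.)
pub fn copy_front_to_back(buf: &mut [u8]) {
    // Rejected by the borrow checker, because `buf` would be borrowed
    // mutably and immutably at the same time:
    //     buf[4..].copy_from_slice(&buf[..4]);
    let mid = buf.len() / 2;
    // split_at_mut hands back two mutable slices that cannot overlap.
    let (front, back) = buf.split_at_mut(mid);
    let n = front.len();
    back[..n].copy_from_slice(&front[..]);
}

/// Overlapping source and destination need the memmove-style copy_within.
pub fn shift_right_by_one(buf: &mut [u8]) {
    let n = buf.len();
    if n > 1 {
        buf.copy_within(0..n - 1, 1);
    }
}
```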
Feb 01
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 10:23 PM, Tobias Müller wrote:
 Walter Bright <newshound2 digitalmars.com> wrote:
 What part of the signature guarantees non-overlap?
Mutable references cannot overlap in Rust, at least in safe code.
I didn't know that bit, thanks for the info.
Feb 01
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 6:39 AM, Cody Laeder wrote:
 The _traditional_ C-like memcpy [3] in the stdlib. It is unsafe, and carries no
 side effects for the src buffer. It enforces type safety, but it cannot enforce
 memory safety as you can blow past the allocation size on your dst buffer
(hence
 why it is unsafe).
It also does not guarantee the function does not save a copy of those pointers and dereference them later. Programmers "know" this to be true for memcpy, but the compiler cannot know this from the Rust (or C) declaration. The D version does present this guarantee by annotating it with 'pure'. This matters because such a saved pointer can become a dangling reference - a memory corruption bug waiting to happen. [Note: in Rust, functions marked 'unsafe' may store copies of their arguments in globals. 'safe' functions may not access mutable global storage.]
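
A sketch of the concern, in Rust: nothing in a memcpy-shaped raw-pointer signature stops the callee from keeping a copy of a pointer. The function `leaky_memcpy`, the static `STASH` and the helper `stashed` are invented for this illustration; the stashed pointer is only compared here, never dereferenced, since dereferencing it after the destination buffer is gone would be exactly the dangling access described above.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Global slot in which a pointer can outlive the call (illustrative only).
static STASH: AtomicPtr<u8> = AtomicPtr::new(ptr::null_mut());

/// Hypothetical function with a memcpy-shaped signature. It copies the
/// bytes, but also quietly saves `dest` in a global.
///
/// # Safety
/// `src` and `dest` must be valid for `n` bytes and must not overlap.
pub unsafe fn leaky_memcpy(dest: *mut u8, src: *const u8, n: usize) -> *mut u8 {
    unsafe { ptr::copy_nonoverlapping(src, dest, n) };
    // The signature gives the caller no hint that this happens.
    STASH.store(dest, Ordering::SeqCst);
    dest
}

/// Returns whatever pointer was saved by the last call.
pub fn stashed() -> *mut u8 {
    STASH.load(Ordering::SeqCst)
}
```

By contrast, a parameter of type `&'a mut [u8]` with a non-'static lifetime could not be stored in `STASH`, because a static requires data that lives for the whole program.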
Feb 01
prev sibling parent reply =?UTF-8?Q?Tobias=20M=C3=BCller?= <troplin bluewin.ch> writes:
Walter Bright <newshound2 digitalmars.com> wrote:
 I'm not very familiar with Rust. Can you post what a Rust declaration for
memcpy 
 would look like with all the guarantees?
You wouldn't use memcpy but just assign the slices. Assignment is always just memcpy in Rust because of move-semantics:

   a[m1..n1] = b[m2..n2];

It will panic if sizes don't match.

But if you still wanted a memcpy it would probably look like this:

   fn memcpy<'a, T>(dest: &'a mut [T], src: &[T]) -> &'a mut [T]
Feb 01
next sibling parent =?UTF-8?Q?Tobias=20M=C3=BCller?= <troplin bluewin.ch> writes:
Tobias Müller <troplin bluewin.ch> wrote:
 Walter Bright  
 But if you still wanted a memcpy it would probably look like this:
 
 fn memcpy<'a, T>(dest: &'a mut [T], src: &[T]) -> &'a mut [T]
No, sorry:

   fn memcpy<'a, T: Copy>(dest: &'a mut [T], src: &[T]) -> &'a mut [T]

And mutable references can never alias, you have the same guarantees as with _restrict, statically checked even at the call site.
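
One possible body for that signature, as a sketch: it simply delegates to the standard `copy_from_slice`, which panics when the lengths differ, and hands `dest` back. The lifetime `'a` ties the returned slice to `dest` alone, which is the Rust counterpart of "the return value is derived from s1".

```rust
/// Sketch of a safe memcpy with the signature given above.
/// Panics if `dest` and `src` have different lengths.
pub fn memcpy<'a, T: Copy>(dest: &'a mut [T], src: &[T]) -> &'a mut [T] {
    dest.copy_from_slice(src);
    dest
}
```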
Feb 01
prev sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 9:22 AM, Tobias Müller wrote:
 You wouldn't use memcpy but just assign the slices.
I clearly made a mistake in this example. I wanted to show how a compiler learns things from the declaration by using a very familiar declaration. But it keeps getting diverted into what people (and some compilers) "know" about memcpy that is not in the declaration.
Feb 01
parent reply =?UTF-8?Q?Tobias=20M=C3=BCller?= <troplin bluewin.ch> writes:
Walter Bright <newshound2 digitalmars.com> wrote:
 On 2/1/2017 9:22 AM, Tobias Müller wrote:
 You wouldn't use memcpy but just assign the slices.
I clearly made a mistake in this example. I wanted to show how a compiler learns things from the declaration by using a very familiar declaration. But it keeps getting diverted into what people (and some compilers) "know" about memcpy that is not in the declaration.
I also showed you what memcpy could look like in Rust; I think it's only fair to also point out that this would be fairly unidiomatic. Apart from that, the entire point of building a safe wrapper around an unsafe FFI function is to exploit additional knowledge that is not present in the C declaration, but only in documentation. It's not relevant if that wrapper is built into the language or a library function. After all, to write the D declaration you also had to exploit that knowledge once.
Feb 01
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/1/2017 10:11 PM, Tobias Müller wrote:
 Apart from that, the entire point of building a safe wrapper around an
 unsafe FFI function is to exploit additional knowledge that is not present
 in the C declaration, but only in documentation. It's not relevant if that
 wrapper is built into the language or a library function.
D won't make you add annotations. You can use Valgrind or gdb if you prefer.
Feb 01