digitalmars.D - A safer interface for core.stdc

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I was looking into ways to make core.stdc safer. That should be 
relatively easy to do by defining a few wrappers. For example:

int  setvbuf(FILE* stream, char* buf, int mode, size_t size);

is unsafe because there's no relationship between buf and size. But this 
is fine:

@trusted int setvbuf(T)(FILE* stream, T[] buf, int mode)
if (is(T == char) || is(T == byte) || is(T == ubyte))
{
     return setvbuf(stream, cast(char*) buf.ptr, mode, buf.length);
}

Another example is:

int stat(in char*, stat_t*);

which may start reading through random memory if the string is not 
zero-terminated. Again, the solution is here to ensure the string does 
have a terminating zero:

@trusted int stat(in char[] name, stat_t* p)
{
     if (isZeroTerminated(name)) return stat(name.ptr, p);
     auto t = cast(char*) malloc(name.length + 1);
     scope(exit) free(t);
     memcpy(t, name.ptr, name.length);
     t[name.length] = 0;
     return stat(t, p);
}

Such wrappers would allow safe code to use more C stdlib primitives. The 
question is whether these wrappers are worth adding to core.stdc.stdio.


Thanks,

Andrei
Feb 07 2015
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Andrei Alexandrescu:

 Such wrappers would allow safe code to use more C stdlib 
 primitives.
I'd also like a safer templated wrapper for calloc() and malloc() and similar. Bye, bearophile
Feb 07 2015
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Sun, Feb 08, 2015 at 12:39:39AM +0000, bearophile via Digitalmars-d wrote:
 Andrei Alexandrescu:
 
Such wrappers would allow safe code to use more C stdlib primitives.
I'd also like a safer templated wrapper for calloc() and malloc() and similar.
[...]

You mean something like this?

	T* malloc(T)() @trusted
	{
		return cast(T*)malloc(T.sizeof);
	}

	struct MyStruct {
		int x, y, z;
	}

	void main() {
		auto p = malloc!MyStruct();

		// Not sure how to make free() usable from @safe, unless
		// we wrap the pointer returned by malloc().
		free(p);
	}

T -- Leather is waterproof. Ever see a cow with an umbrella?
Feb 07 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/7/15 5:26 PM, H. S. Teoh via Digitalmars-d wrote:
 On Sun, Feb 08, 2015 at 12:39:39AM +0000, bearophile via Digitalmars-d wrote:
 Andrei Alexandrescu:

 Such wrappers would allow safe code to use more C stdlib primitives.
I'd also like a safer templated wrapper for calloc() and malloc() and similar.
 [...]
 You mean something like this?

 	T* malloc(T)() @trusted
 	{
 		return cast(T*)malloc(T.sizeof);
 	}
I think that would go as follows:

private @system T[] mallocUninitializedArrayImpl(T)(size_t n)
{
    auto p = malloc(n * T.sizeof);
    p || assert(0, "Not enough memory");
    return (cast(T*) p)[0 .. n];
}

@trusted T[] mallocUninitializedArray(T)(size_t n)
if (!hasIndirections!T)
{
    return mallocUninitializedArrayImpl!T(n);
}

@system T[] mallocUninitializedArray(T)(size_t n)
if (hasIndirections!T)
{
    return mallocUninitializedArrayImpl!T(n);
}

Similarly there'd be a mallocMinimallyInitializedArray that zeroes only pointers and is @trusted for all types.

Then we'd probably have a @trusted callocArray that blasts zeros throughout. It's trusted because we know pointers are zeroes (an assumption somewhat not robust in theory but fine in practice).

Then we'd have a mallocArray that allocates an array and initializes each element with .init.
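The last two helpers are only described in words above. A minimal sketch of what they might look like follows; the bodies are an assumption rather than code from the post, with `core.checkedint.mulu` guarding the size computation and `core.lifetime.emplace` writing `T.init` into each slot.

```d
import core.checkedint : mulu;
import core.lifetime : emplace;
import core.stdc.stdlib : calloc, free, malloc;

/// Zero-filled array; calloc receives the count and element size separately.
T[] callocArray(T)(size_t n) @trusted
{
    if (n == 0) return null;
    auto p = calloc(n, T.sizeof);
    if (p is null) assert(0, "Not enough memory");
    return (cast(T*) p)[0 .. n];
}

/// Array whose elements are all set to T.init.
T[] mallocArray(T)(size_t n) @trusted
{
    if (n == 0) return null;
    bool overflow;
    const bytes = mulu(n, T.sizeof, overflow);   // checked n * T.sizeof
    if (overflow) assert(0, "Allocation size overflow");
    auto p = cast(T*) malloc(bytes);
    if (p is null) assert(0, "Not enough memory");
    auto a = p[0 .. n];
    foreach (ref e; a) emplace(&e);              // default-initialize in place
    return a;
}
```

The difference between the two shows up for types whose `.init` is not all zero bits: `mallocArray!float` yields NaNs, while `callocArray!float` yields zeros. Both still leave the caller responsible for calling `free`.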
 	struct MyStruct {
 		int x, y, z;
 	}

 	void main() {
 		auto p = malloc!MyStruct();

 		// Not sure how to make free() usable from @safe, unless
 		// we wrap the pointer returned by malloc().
 		free(p);
 	}
Indeed we have no safe way to wrap free. Andrei
Feb 07 2015
next sibling parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Sat, Feb 07, 2015 at 06:19:19PM -0800, Andrei Alexandrescu via Digitalmars-d
wrote:
[...]
 private @system T[] mallocUninitializedArrayImpl(T)(size_t n)
 {
     auto p = malloc(n * T.sizeof);
     p || assert(0, "Not enough memory");
This is a truly strange way of writing it... why not: assert(p !is null, "Not enough memory"); ?
     return (cast(T*) p)[0 .. n];
 }
T -- Tell me and I forget. Teach me and I remember. Involve me and I understand. -- Benjamin Franklin
Feb 07 2015
next sibling parent "Vlad Levenfeld" <vlevenfeld gmail.com> writes:
On Sunday, 8 February 2015 at 04:02:52 UTC, H. S. Teoh wrote:
     p || assert(0, "Not enough memory");
This is a truly strange way of writing it... why not: assert(p !is null, "Not enough memory");
I think Andrei's version will remain in release builds, but yours will be elided.
Feb 07 2015
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/7/15 8:00 PM, H. S. Teoh via Digitalmars-d wrote:
 On Sat, Feb 07, 2015 at 06:19:19PM -0800, Andrei Alexandrescu via
Digitalmars-d wrote:
 [...]
 private @system T[] mallocUninitializedArrayImpl(T)(size_t n)
 {
      auto p = malloc(n * T.sizeof);
      p || assert(0, "Not enough memory");
This is a truly strange way of writing it... why not: assert(p !is null, "Not enough memory"); ?
assert(0) is not removed in release mode. -- Andrei
Feb 07 2015
parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Sat, Feb 07, 2015 at 08:15:05PM -0800, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 2/7/15 8:00 PM, H. S. Teoh via Digitalmars-d wrote:
On Sat, Feb 07, 2015 at 06:19:19PM -0800, Andrei Alexandrescu via Digitalmars-d
wrote:
[...]
private @system T[] mallocUninitializedArrayImpl(T)(size_t n)
{
     auto p = malloc(n * T.sizeof);
     p || assert(0, "Not enough memory");
This is a truly strange way of writing it... why not: assert(p !is null, "Not enough memory"); ?
assert(0) is not removed in release mode. -- Andrei
Ah, right. But shouldn't it be enforce instead of assert, then? :-P

T -- In theory, software is implemented according to the design that has been carefully worked out beforehand. In practice, design documents are written after the fact to describe the sorry mess that has gone on before.
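For reference, the options discussed in this sub-thread can be put side by side. This is only a sketch with hypothetical function names; `onOutOfMemoryError` is the druntime hook from `core.exception`, shown here as one alternative to `enforce` that needs no allocation when memory is already exhausted.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;

/// Halts the program on failure, in debug and release builds alike.
void* mallocOrAbort(size_t n) nothrow @nogc
{
    auto p = malloc(n);
    // assert(0) survives -release (it becomes a halt), whereas
    // assert(p !is null) would be compiled out there.
    if (p is null && n != 0) assert(0, "Not enough memory");
    return p;
}

/// Throws OutOfMemoryError on failure instead of halting.
void* mallocOrThrow(size_t n) nothrow @nogc
{
    auto p = malloc(n);
    if (p is null && n != 0) onOutOfMemoryError();
    return p;
}
```

The `n != 0` check is there because `malloc(0)` is allowed to return null without that being a failure.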
Feb 07 2015
prev sibling parent reply FG <home fgda.pl> writes:
On 2015-02-08 at 03:19, Andrei Alexandrescu wrote:
 Indeed we have no safe way to wrap free.
How about this to prevent double free: Wrapped malloc keeps a static thread-local lookup structure for successful allocations (if having to release memory from the same thread is an acceptable requirement). Wrapped free looks up the pointer in that lookup structure and, if found, frees memory, removes the lookup entry and sets the argument of the call to zero (if it was a pointer) or sets its length and ptr to zero (if it was a dynamic array). It's not completely safe, but for that GC would have to be used instead.
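A rough sketch of that scheme, assuming slice-returning wrappers; the names `trackedAlloc`, `trackedFree`, and `liveBlocks` are made up for illustration, and the lookup structure is a plain associative array for brevity.

```d
import core.stdc.stdlib : calloc, free;

// Module-level variables are thread-local by default in D.
// The associative array itself lives on the GC heap; a real
// implementation would probably want a non-GC table.
bool[void*] liveBlocks;

T[] trackedAlloc(T)(size_t n) @system
{
    auto p = calloc(n, T.sizeof);
    if (p is null) return null;
    liveBlocks[p] = true;            // remember the successful allocation
    return (cast(T*) p)[0 .. n];
}

/// Frees `a` only if this thread allocated it and has not freed it yet.
bool trackedFree(T)(ref T[] a) @system
{
    void* p = a.ptr;
    if (p is null || p !in liveBlocks) return false;
    liveBlocks.remove(p);
    free(p);
    a = null;                        // zero the caller's ptr and length
    return true;
}
```

A second `trackedFree` through another slice of the same block is rejected because the lookup fails, but that other slice can still be read or written after the free, and the check is defeated if the allocator hands the same address out again.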
Feb 08 2015
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Sunday, 8 February 2015 at 12:43:38 UTC, FG wrote:
 On 2015-02-08 at 03:19, Andrei Alexandrescu wrote:
 Indeed we have no safe way to wrap free.
How about this to prevent double free: Wrapped malloc keeps a static thread-local lookup structure for successful allocations (if having to release memory from the same thread is an acceptable requirement). Wrapped free looks up the pointer in that lookup structure and, if found, frees memory, removes the lookup entry and sets the argument of the call to zero (if it was a pointer) or sets its length and ptr to zero (if it was a dynamic array). It's not completely safe, but for that GC would have to be used instead.
I don't have any data, but I'd imagine most double-frees come from multiple references to the same data, not repeated calls to free on the same reference.
Feb 08 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/8/15 5:16 AM, John Colvin wrote:
 On Sunday, 8 February 2015 at 12:43:38 UTC, FG wrote:
 On 2015-02-08 at 03:19, Andrei Alexandrescu wrote:
 Indeed we have no safe way to wrap free.
How about this to prevent double free: Wrapped malloc keeps a static thread-local lookup structure for successful allocations (if having to release memory from the same thread is an acceptable requirement). Wrapped free looks up the pointer in that lookup structure and, if found, frees memory, removes the lookup entry and sets the argument of the call to zero (if it was a pointer) or sets its length and ptr to zero (if it was a dynamic array). It's not completely safe, but for that GC would have to be used instead.
I don't have any data, but I'd imagine most double-frees come from multiple references to the same data, not repeated calls to free on the same reference.
I think the same. In C++ circles zeroing the pointer after freeing is considered an antipattern - what with false sense of security etc. -- Andrei
Feb 08 2015
parent Walter Bright <newshound2 digitalmars.com> writes:
On 2/8/2015 8:32 AM, Andrei Alexandrescu wrote:
 I think the same. In C++ circles zeroing the pointer after freeing is
 considered an antipattern - what with false sense of security etc. -- Andrei
What worked for me was using a free() that overwrote the free'd memory with 0xFEFEFEFE or something like that. Worked great for finding bugs. These days, valgrind does an even better job.
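A poisoning free along those lines might look like the sketch below. It assumes the caller passes a slice so the block size is known; `poison`, `poisoningFree`, and `poisonByte` are hypothetical names.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : memset;

enum ubyte poisonByte = 0xFE;

/// Overwrites every byte of `block` with the poison pattern.
void poison(void[] block) @system nothrow @nogc
{
    if (block.length) memset(block.ptr, poisonByte, block.length);
}

/// Debug-style free: scribble over the block, then release it.
void poisoningFree(T)(T[] a) @system nothrow @nogc
{
    poison(cast(void[]) a);   // length is converted to bytes by the cast
    free(a.ptr);
}
```

One caveat: an optimizing compiler may treat the stores right before `free` as dead and drop them, so this is mainly useful in debug builds.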
Feb 10 2015
prev sibling parent "Paulo Pinto" <pjmlp progtools.org> writes:
On Sunday, 8 February 2015 at 12:43:38 UTC, FG wrote:
 On 2015-02-08 at 03:19, Andrei Alexandrescu wrote:
 Indeed we have no safe way to wrap free.
How about this to prevent double free: Wrapped malloc keeps a static thread-local lookup structure for successful allocations (if having to release memory from the same thread is an acceptable requirement). Wrapped free looks up the pointer in that lookup structure and, if found, frees memory, removes the lookup entry and sets the argument of the call to zero (if it was a pointer) or sets its length and ptr to zero (if it was a dynamic array). It's not completely safe, but for that GC would have to be used instead.
A typical C debug library trick for when the money for a proper tool isn't available. -- Paulo
Feb 08 2015
prev sibling next sibling parent reply "tcak" <tcak gmail.com> writes:
On Saturday, 7 February 2015 at 23:50:55 UTC, Andrei Alexandrescu 
wrote:
 I was looking into ways to make core.stdc safer. That should be 
 relatively easy to do by defining a few wrappers. For example:

 int  setvbuf(FILE* stream, char* buf, int mode, size_t size);

 is unsafe because there's no relationship between buf and size. 
 But this is fine:

 @trusted int setvbuf(T)(FILE* stream, T[] buf, int mode)
 if (is(T == char) || is(T == byte) || is(T == ubyte))
 {
     return setvbuf(stream, cast(char*) buf.ptr, mode, 
 buf.length);
 }

 Another example is:

 int stat(in char*, stat_t*);

 which may start reading through random memory if the string is 
 not zero-terminated. Again, the solution is here to ensure the 
 string does have a terminating zero:

 @trusted int stat(in char[] name, stat_t* p)
 {
     if (isZeroTerminated(name)) return stat(name.ptr, p);
     auto t = cast(char*) malloc(name.length + 1);
     scope(exit) free(t);
     memcpy(t, name.ptr, name.length);
     t[name.length] = 0;
     return stat(t, p);
 }

 Such wrappers would allow safe code to use more C stdlib 
 primitives. The question is whether these wrappers are worth 
 adding to core.stdc.stdio.


 Thanks,

 Andrei
One of the reasons I use C functions is that I expect the same behaviour from D code that I would expect from C. I don't think it is a good idea to put wrappers on top of them. Maybe you could say, "Hey, look, it just makes them safer, that's all", but, hmm, there are so many functions, and this wrapping process can go in many directions.
Feb 07 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/7/15 7:52 PM, tcak wrote:
 One of the reasons I use C functions is that I expect the same behaviour
 from D code that I would expect from C. I don't think it is a good idea
 to put wrappers on top of them. Maybe you could say, "Hey, look, it just
 makes them safer, that's all", but, hmm, there are so many functions, and
 this wrapping process can go in many directions.
Just looking at making them safe. Not all can be made safe btw. -- Andrei
Feb 07 2015
parent reply "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Sat, Feb 07, 2015 at 08:14:39PM -0800, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 2/7/15 7:52 PM, tcak wrote:
One of the reasons I use C functions is that I expect the same
behaviour from D code that I would expect from C. I don't think it is
a good idea to put wrappers on top of them. Maybe you could say,
"Hey, look, it just makes them safer, that's all", but, hmm, there are
so many functions, and this wrapping process can go in many directions.
Just looking at making them safe. Not all can be made safe btw. -- Andrei
Come to think of it, is there any point in making malloc @safe/@trusted at all? I don't think it's possible to make free() @safe, so what's the purpose of making malloc callable from @safe code? Unless you make a ref-counted wrapper of some sort around it, in which case you might as well use RefCounted instead.

I thought about making the equivalent of auto_ptr, but unless you make it non-copyable (or only destructively copyable, and no pointer extraction is permitted), there's no way it can be truly safe. The only possible advantage we could gain is *type* safety by wrapping malloc in a type-safe way (i.e., don't expose void*).

T -- Long, long ago, the ancient Chinese invented a device that lets them see through walls. It was called the "window".
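To make the non-copyable idea concrete, here is a minimal sketch of such an owner. `UniqueBuffer` is a hypothetical name, and the design (restricting to types without indirections, by-value element access only) is one assumption about how to avoid handing out pointers; it has not been vetted as airtight.

```d
import core.stdc.stdlib : calloc, free;
import std.traits : hasIndirections;

struct UniqueBuffer(T)
if (!hasIndirections!T)
{
    private T[] data;

    @disable this(this);   // non-copyable, so there is exactly one owner

    this(size_t n) @trusted
    {
        auto p = cast(T*) calloc(n, T.sizeof);
        if (p is null && n != 0) assert(0, "Not enough memory");
        data = p[0 .. n];
    }

    ~this() @trusted
    {
        free(data.ptr);    // free(null) is a no-op for a moved-from value
        data = null;
    }

    size_t length() const @safe { return data.length; }

    // Elements go in and out by value, so no pointer or reference escapes.
    T opIndex(size_t i) const @safe { return data[i]; }
    void opIndexAssign(T value, size_t i) @safe { data[i] = value; }
}
```

Because the memory is released in the destructor and the struct cannot be copied, `free` never has to be exposed to the caller at all, which sidesteps the question of making it @safe.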
Feb 07 2015
next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/7/15 8:21 PM, H. S. Teoh via Digitalmars-d wrote:
 Come to think of it, is there any point in making malloc @safe/@trusted
 at all? I don't think it's possible to make free() @safe, so what's the
 purpose of making malloc callable from @safe code? Unless you make a
 ref-counted wrapper of some sort around it, in which case you might as
 well use RefCounted instead.
Same goes about e.g. fopen vs. fclose. I'm thinking just of increasing the quantity of safe code. -- Andrei
Feb 07 2015
parent "Ola Fosheim Grøstad" writes:
On Sunday, 8 February 2015 at 04:45:55 UTC, Andrei Alexandrescu 
wrote:
 On 2/7/15 8:21 PM, H. S. Teoh via Digitalmars-d wrote:
 Come to think of it, is there any point in making malloc @safe/@trusted
 at all? I don't think it's possible to make free() @safe, so what's the
 purpose of making malloc callable from @safe code? Unless you make a
 ref-counted wrapper of some sort around it, in which case you might as
 well use RefCounted instead.
Same goes about e.g. fopen vs. fclose. I'm thinking just of increasing the quantity of safe code. -- Andrei
So are you going for this: http://forum.dlang.org/thread/ovoarcbexpvrrceysnrs forum.dlang.org ?
Feb 08 2015
prev sibling next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
H. S. Teoh:

 Come to think of it, is there any point in making malloc
 @safe/@trusted at all?
I am not asking for a @trusted function. I'd like a @system template wrapper for malloc/calloc/free that is safer than the C functions (safer because it's type-aware). Bye, bearophile
Feb 08 2015
prev sibling parent "Tobias Pankrath" <tobias pankrath.net> writes:
 Come to think of it, is there any point in making malloc @safe/@trusted
 at all? I don't think it's possible to make free() @safe, so what's the
 purpose of making malloc callable from @safe code?
At least two programs, widely used by folks here, never release their memory. Those could be made safe.
Feb 08 2015
prev sibling next sibling parent reply Johannes Pfau <nospam example.com> writes:
Am Sat, 07 Feb 2015 15:50:53 -0800
schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:

 I was looking into ways to make core.stdc safer. That should be 
 relatively easy to do by defining a few wrappers. For example:
This might be a good idea, but it might also be more difficult than you think:
 
 int  setvbuf(FILE* stream, char* buf, int mode, size_t size);
 
 is unsafe because there's no relationship between buf and size. But
 this is fine:
 
 @trusted int setvbuf(T)(FILE* stream, T[] buf, int mode)
 if (is(T == char) || is(T == byte) || is(T == ubyte))
 {
      return setvbuf(stream, cast(char*) buf.ptr, mode, buf.length);
 }
 
This can still cause memory corruption if `buf` is GC-allocated. You'd have to pin the buffer which might not be easy in such a low-level wrapper. OTOH in a higher level wrapper (std.stdio.File) you can simply keep a reference to the buffer.
 Another example is:
 
 int stat(in char*, stat_t*);
 
 which may start reading through random memory if the string is not 
 zero-terminated. Again, the solution is here to ensure the string
 does have a terminating zero:
 
 @trusted int stat(in char[] name, stat_t* p)
 {
      if (isZeroTerminated(name)) return stat(name.ptr, p);
How would you implement `isZeroTerminated` in a memory safe way? We have exactly the same problem in toStringz and nobody ever came up with a really safe solution. The best you could do is using special types for zero-terminated strings but that might be cumbersome to use.
      auto t = cast(char*) malloc(name.length + 1);
      scope(exit) free(t);
      memcpy(t, name.ptr, name.length);
      t[name.length] = 0;
      return stat(t, p);
 }
 
 Such wrappers would allow safe code to use more C stdlib primitives.
 The question is whether these wrappers are worth adding to
 core.stdc.stdio.
 
That's the main question. There's only a limited amount of stdc functions which can be wrapped in a safe way and std.stdio etc. are already kind of a safe wrapper. And it's also important to get these wrappers right and make sure they don't introduce memory safety bugs.
Feb 08 2015
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/8/15 2:54 AM, Johannes Pfau wrote:
 Am Sat, 07 Feb 2015 15:50:53 -0800
 schrieb Andrei Alexandrescu <SeeWebsiteForEmail erdani.org>:
 @trusted int setvbuf(T)(FILE* stream, T[] buf, int mode)
 if (is(T == char) || is(T == byte) || is(T == ubyte))
 {
       return setvbuf(stream, cast(char*) buf.ptr, mode, buf.length);
 }
This can still cause memory corruption if `buf` is GC-allocated. You'd have to pin the buffer which might not be easy in such a low-level wrapper. OTOH in a higher level wrapper (std.stdio.File) you can simply keep a reference to the buffer.
Good point, thanks. Moving GCs didn't occur to me.
 @trusted int stat(in char[] name, stat_t* p)
 {
       if (isZeroTerminated(name)) return stat(name.ptr, p);
How would you implement `isZeroTerminated` in a memory safe way? We have exactly the same problem in toStringz and nobody ever came up with a really safe solution. The best you could do is using special types for zero-terminated strings but that might be cumbersome to use.
I thought of a few things, nothing is 100% foolproof. But I'm not too worried - many of these functions issue system calls, and the cost of a malloc/free pulse is unlikely to be measurable. With opportunistic use of alloca it gets even better.
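One way to read this: skip the `isZeroTerminated` check entirely and always copy. The sketch below does that, using a fixed-size stack buffer as a stand-in for the `alloca` idea and `strlen` as a stand-in for a C function such as `stat`; `cStringLength` and the 256-byte threshold are assumptions for illustration.

```d
import core.stdc.stdlib : free, malloc;
import core.stdc.string : strlen;

/// Hypothetical stand-in for wrapping a C function that takes a C string.
size_t cStringLength(scope const(char)[] s) @trusted
{
    char[256] small = void;          // stack buffer for short inputs
    char* t = small.ptr;
    if (s.length >= small.length)
    {
        // Too long for the stack buffer: fall back to the heap.
        t = cast(char*) malloc(s.length + 1);
        if (t is null) assert(0, "Not enough memory");
    }
    scope (exit) if (t !is small.ptr) free(t);

    t[0 .. s.length] = s[];          // copy, then terminate
    t[s.length] = 0;
    return strlen(t);                // the C call always sees a terminator
}
```

Since the wrapper never reads past `s.length`, it does not depend on what follows the slice in memory, which is the part `isZeroTerminated` cannot check safely.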
       auto t = cast(char*) malloc(name.length + 1);
       scope(exit) free(t);
       memcpy(t, name.ptr, name.length);
       t[name.length] = 0;
       return stat(t, p);
 }

 Such wrappers would allow safe code to use more C stdlib primitives.
 The question is whether these wrappers are worth adding to
 core.stdc.stdio.
That's the main question. There's only a limited amount of stdc functions which can be wrapped in a safe way and std.stdio etc. are already kind of a safe wrapper. And it's also important to get these wrappers right and make sure they don't introduce memory safety bugs.
I see it as increased opportunity to rely on simple manually checkable low-level functions, both in Phobos and outside it. It seems there's merit in that. Andrei
Feb 08 2015
prev sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Saturday, 7 February 2015 at 23:50:55 UTC, Andrei Alexandrescu 
wrote:
 I was looking into ways to make core.stdc safer. That should be 
 relatively easy to do by defining a few wrappers. For example:

 int  setvbuf(FILE* stream, char* buf, int mode, size_t size);

 is unsafe because there's no relationship between buf and size. 
 But this is fine:

 @trusted int setvbuf(T)(FILE* stream, T[] buf, int mode)
 if (is(T == char) || is(T == byte) || is(T == ubyte))
 {
     return setvbuf(stream, cast(char*) buf.ptr, mode, 
 buf.length);
 }

 Another example is:

 int stat(in char*, stat_t*);

 which may start reading through random memory if the string is 
 not zero-terminated. Again, the solution is here to ensure the 
 string does have a terminating zero:

 @trusted int stat(in char[] name, stat_t* p)
 {
     if (isZeroTerminated(name)) return stat(name.ptr, p);
     auto t = cast(char*) malloc(name.length + 1);
     scope(exit) free(t);
     memcpy(t, name.ptr, name.length);
     t[name.length] = 0;
     return stat(t, p);
 }

 Such wrappers would allow safe code to use more C stdlib 
 primitives. The question is whether these wrappers are worth 
 adding to core.stdc.stdio.


 Thanks,

 Andrei
I think this is crucial if we want to keep actual Phobos sources easily review-able within your requirements. There is good value in having `core.stdc` map C headers 1-to-1 though.

Would you consider a separate `core.safestdc` package tree where such wrappers could be put on a per-need basis (duplicating the tree structure of the core.stdc modules internally)?
Feb 09 2015
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/9/15 1:39 AM, Dicebot wrote:
 I think this is crucial if we want to keep actual Phobos sources easily
 review-able within your requirements. There is a good value in having
 `core.stdc` to map C headers 1-to-1 though.

 Would you consider separate `core.safestdc` package tree where such
 wrappers could be put on per need basis (duplicating tree structure of
 core.stdc modules internally)
Walter opposes this on grounds of increased maintenance burdens. He points out (rightly imho) that we better focus on the higher-level primitives present in Phobos. -- Andrei
Feb 09 2015
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Monday, 9 February 2015 at 17:45:09 UTC, Andrei Alexandrescu 
wrote:
 On 2/9/15 1:39 AM, Dicebot wrote:
 I think this is crucial if we want to keep actual Phobos sources easily
 review-able within your requirements. There is a good value in having
 `core.stdc` to map C headers 1-to-1 though.

 Would you consider separate `core.safestdc` package tree where such
 wrappers could be put on per need basis (duplicating tree structure of
 core.stdc modules internally)
Walter opposes this on grounds of increased maintenance burdens. He points out (rightly imho) that we better focus on the higher-level primitives present in Phobos. -- Andrei
I foresee certain stdc-based primitives being duplicated over and over again as nested functions - in scope of complying with "minimal @trusted function that exposes @safe API" rule.
Feb 09 2015
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 2/9/15 9:47 AM, Dicebot wrote:
 I foresee certain stdc-based primitives being duplicated over and over
 again as nested functions - in scope of complying with "minimal @trusted
 function that exposes @safe API" rule.
Would be nice to encapsulate them in Phobos on a case basis. -- Andrei
Feb 09 2015
prev sibling parent "finalpatch" <fengli gmail.com> writes:
On Monday, 9 February 2015 at 17:45:09 UTC, Andrei Alexandrescu 
wrote:
 On 2/9/15 1:39 AM, Dicebot wrote:
 I think this is crucial if we want to keep actual Phobos sources easily
 review-able within your requirements. There is a good value in having
 `core.stdc` to map C headers 1-to-1 though.

 Would you consider separate `core.safestdc` package tree where such
 wrappers could be put on per need basis (duplicating tree structure of
 core.stdc modules internally)
Walter opposes this on grounds of increased maintenance burdens. He points out (rightly imho) that we better focus on the higher-level primitives present in Phobos. -- Andrei
Please don't waste your limited resources on this. When people use core.stdc they know what they are doing and take full responsibility for the consequences. If they wanted safety or nice APIs they wouldn't be using these functions in the first place.
Feb 11 2015