digitalmars.D.learn - How does D distinguish managed pointers from raw pointers?

reply IGotD- <nise nise.com> writes:
According to the GC documentation this code snippet

char* p = new char[10];
char* q = p + 6; // ok
q = p + 11;      // error: undefined behavior
q = p - 1;       // error: undefined behavior

suggests that char *p is really a "fat pointer" with size 
information.

However, if we get some memory allocated by some C library that is 
allocated with malloc we have no size information. We would get a 
char * without any size information and according to the 
documentation we can do anything including access out of bounds.

How does D internally know that a pointer was previously 
allocated by the GC or malloc?

If we replaced the GC with reference counting, how would D 
be able to distinguish a reference counted pointer from a raw 
pointer at compile time in order to insert the code associated 
with the reference counting?

This brings me back to MS managed C++ where they actually had two 
types of "pointers" a managed pointer and the normal C++ 
pointers. Like this:

MyType^ instance = gcnew MyType();

In this case it was obvious what is done with GC and what wasn't 
(past tense since managed C++ is deprecated). In this case it 
would be trivial to replace the GC algorithm with whatever you 
want since the compiler knows the type at compile time.
Oct 03 2019
next sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 3 October 2019 at 14:13:55 UTC, IGotD- wrote:
 suggests that char *p is really a "fat pointer" with size 
 information.
D pointers are plain naked pointers. What that doc segment is saying is that it works like C: in-bounds arithmetic will work, out-of-bounds arithmetic is undefined behavior. You can do it, but it might crash or whatever. There's no difference in the language between a GC pointer and any other pointer. But...
 How does D internally know that a pointer was previously 
 allocated by the GC or malloc?
But, this is a bit more nuanced. D, the language, does not know how a pointer was allocated; there's no difference in the type system. The runtime, however, can figure it out from the pointer value, by checking whether it falls inside the range of the GC's allocated area. It does NOT use that for bounds checking though! It is just an internal detail used by some of the GC's functions to help its sweeps, and by some of the interface functions.
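To illustrate the point about the runtime, here is a minimal sketch that asks the GC whether an address lies inside one of its blocks. It relies on core.memory.GC.addrOf, which returns null for memory the GC did not allocate. The helper name isGcPointer is made up for this example and is not part of druntime.

```d
import core.memory : GC;
import core.stdc.stdlib : free, malloc;

/// Hypothetical helper: true if the GC reports that p points into one of its blocks.
bool isGcPointer(const(void)* p) nothrow
{
    return GC.addrOf(p) !is null;
}

void main()
{
    char[] arr = new char[10];            // allocated by the GC
    char* gcPtr = arr.ptr;
    char* cPtr = cast(char*) malloc(10);  // allocated by the C heap
    scope (exit) free(cPtr);

    // Both pointers have exactly the same static type.
    static assert(is(typeof(gcPtr) == typeof(cPtr)));

    // Only a runtime query tells them apart.
    assert(isGcPointer(gcPtr));
    assert(isGcPointer(gcPtr + 6));       // interior pointers are recognised too
    assert(!isGcPointer(cPtr));
}
```

Note that this is a lookup in the GC's own bookkeeping, not something the compiler consults, so it gives no bounds checking on pointer arithmetic.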
 If we replaced the GC with reference counting, how would D 
 be able to distinguish a reference counted pointer from a raw 
 pointer at compile time in order to insert the code associated 
 with the reference counting?
It won't. Reference counting in D is done with distinct types, and would still have to be.
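As a rough sketch of what "different types" means in practice, the example below uses std.typecons.RefCounted from Phobos. The counting logic lives in the wrapper type's copy and destruction code, so a raw int* is never touched by it. The variable names are only illustrative.

```d
import std.typecons : RefCounted;

void main()
{
    auto a = RefCounted!int(42);               // one owner
    assert(a.refCountedStore.refCount == 1);

    {
        auto b = a;                            // copying the wrapper bumps the count
        assert(a.refCountedStore.refCount == 2);
        assert(b.refCountedPayload == 42);
    }                                          // b is destroyed, count drops again

    assert(a.refCountedStore.refCount == 1);

    // A raw pointer is a different type altogether; copying it runs no extra code.
    int* raw = new int(42);
    int* copy = raw;
    static assert(!is(typeof(a) == typeof(raw)));
    assert(copy is raw);
}
```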
Oct 03 2019
prev sibling next sibling parent reply Andrea Fontana <nospam example.com> writes:
On Thursday, 3 October 2019 at 14:13:55 UTC, IGotD- wrote:
 According to the GC documentation this code snippet

 char* p = new char[10];
 char* q = p + 6; // ok
 q = p + 11;      // error: undefined behavior
 q = p - 1;       // error: undefined behavior

 suggests that char *p is really a "fat pointer" with size 
 information.
No, it's not. char* is a plain pointer. The example is wrong, since you can't assign a new char[10] to a char*. They probably mean something like:

auto arr = new char[10];
char* p = arr.ptr;
...

This code actually compiles, but its behaviour is undefined, so it is a logical error.
In D, arrays are fat pointers instead:

int[10] my_array;

my_array is actually a pair ptr+length.
Oct 03 2019
parent reply Johan Engelen <j j.nl> writes:
On Thursday, 3 October 2019 at 14:21:37 UTC, Andrea Fontana wrote:
 In D arrays are fat pointer instead:

 int[10] my_array;

 my_array is actually a pair ptr+length.
```
int[10] my_static_array;
int[] my_dynamic_array;
```

my_static_array will not be a fat pointer. Length is known at compile time. Address is known at link/load time so it's also not a pointer but just a normal variable (& will give you a pointer to the array data).
my_dynamic_array will be a pair for ptr+length.

-Johan
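A small sketch that checks the distinction above with .sizeof: the static array occupies exactly its elements, while the dynamic array is a length plus a pointer. The asserts are an illustration and assume nothing beyond the language's built-in array properties.

```d
void main()
{
    int[10] my_static_array;
    int[] my_dynamic_array = my_static_array[];  // explicit slice of the static array

    // The static array is just its ten elements, no header.
    static assert(my_static_array.sizeof == 10 * int.sizeof);

    // The dynamic array is a (length, ptr) pair.
    static assert(my_dynamic_array.sizeof == size_t.sizeof + (int*).sizeof);

    // The slice points at the static array's storage rather than copying it.
    assert(my_dynamic_array.ptr is &my_static_array[0]);
    assert(my_dynamic_array.length == 10);
}
```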
Oct 04 2019
parent reply IGotD- <nise nise.com> writes:
On Friday, 4 October 2019 at 15:03:04 UTC, Johan Engelen wrote:
 On Thursday, 3 October 2019 at 14:21:37 UTC, Andrea Fontana 
 wrote:
 In D arrays are fat pointer instead:

 int[10] my_array;

 my_array is actually a pair ptr+length.
 ```
 int[10] my_static_array;
 int[] my_dynamic_array;
 ```

 my_static_array will not be a fat pointer. Length is known at compile time. Address is known at link/load time so it's also not a pointer but just a normal variable (& will give you a pointer to the array data).
 my_dynamic_array will be a pair for ptr+length.

 -Johan
What if you pass a static array to a function that expects a dynamic array? Will D automatically create a dynamic array from the static array?
Oct 04 2019
parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 4 October 2019 at 18:30:17 UTC, IGotD- wrote:
 What if you pass a static array to a function that expects a 
 dynamic array. Will D automatically create a dynamic array from 
 the static array?
No, you have to append [] to create a slice from the static array.
Oct 04 2019
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Oct 04, 2019 at 06:34:40PM +0000, Dennis via Digitalmars-d-learn wrote:
 On Friday, 4 October 2019 at 18:30:17 UTC, IGotD- wrote:
 What if you pass a static array to a function that expects a dynamic
 array. Will D automatically create a dynamic array from the static
 array?
No, you have to append [] to create a slice from the static array.
Actually, it *does* automatically convert the static array to a slice. Which is actually a bug, because you get problems like this:

int[] func() {
    int[5] data = [ 1, 2, 3, 4, 5 ];
    return data; // implicit conversion to int[]
}

void main() {
    auto data = func();
    // Oops: data now references out-of-scope elements on the stack.
    // Expect garbage values and stack corruption exploits.
}

See: https://issues.dlang.org/show_bug.cgi?id=15932

T

-- 
"How are you doing?" "Doing what?"
Oct 04 2019
parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 4 October 2019 at 18:43:34 UTC, H. S. Teoh wrote:
 Actually, it *does* automatically convert the static array to a 
 slice.
You're right, I'm confused. I recall there was a situation where you had to explicitly slice a static array, but I can't think of it now.
Oct 04 2019
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 4 October 2019 at 19:03:14 UTC, Dennis wrote:
 You're right, I'm confused. I recall there was a situation 
 where you had to explicitly slice a static array, but I can't 
 think of it now.
When passing to a range template it is necessary; otherwise the template will see it as non-resizable and it will fail the range constraint check. (Personally, I like to explicitly slice it all the time anyway; it is clearer and the habit is nice.)
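A minimal sketch of the range case, using isInputRange from std.range.primitives and sum from std.algorithm.iteration as stand-ins for "a range template" (the post does not name a particular one):

```d
import std.algorithm.iteration : sum;
import std.range.primitives : isInputRange;

void main()
{
    int[5] data = [1, 2, 3, 4, 5];

    // A static array cannot shrink with popFront, so it is not a range...
    static assert(!isInputRange!(int[5]));
    // ...while a slice of it is.
    static assert(isInputRange!(int[]));

    // Passing the static array directly fails the template constraint.
    static assert(!__traits(compiles, data.sum));

    // Slicing explicitly makes the call work.
    assert(data[].sum == 15);
}
```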
Oct 04 2019
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Oct 04, 2019 at 07:08:04PM +0000, Adam D. Ruppe via Digitalmars-d-learn
wrote:
 On Friday, 4 October 2019 at 19:03:14 UTC, Dennis wrote:
 You're right, I'm confused. I recall there was a situation where you
 had to explicitly slice a static array, but I can't think of it now.
 When passing to a range template it is necessary; otherwise the template will see it as non-resizable and it will fail the range constraint check. (Personally, I like to explicitly slice it all the time anyway; it is clearer and the habit is nice.)
Yeah, and it's always better to consciously slice it, and therefore be reminded to think about the implications of slicing it, so that you'll be aware not to let the slice leak past the lifetime of the underlying static array.

T

-- 
Unix was not designed to stop people from doing stupid things, because that would also stop them from doing clever things. -- Doug Gwyn
Oct 04 2019
prev sibling parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 4 October 2019 at 19:08:04 UTC, Adam D. Ruppe wrote:
 (personally though I like to explicitly slice it all the time 
 though, it is more clear and the habit is nice)
Turns out I have this habit as well. I'm looking through some of my code and see redundant slicing everywhere.
Oct 04 2019
parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, October 4, 2019 1:22:26 PM MDT Dennis via Digitalmars-d-learn 
wrote:
 On Friday, 4 October 2019 at 19:08:04 UTC, Adam D. Ruppe wrote:
 (personally though I like to explicitly slice it all the time
 though, it is more clear and the habit is nice)
Turns out I have this habit as well. I'm looking through some of my code and see redundant slicing everywhere.
Really, it should be required by the language, because it's not something that you want to be hidden. It's an easy source of bugs, especially once you start passing that dynamic array around. It's incredibly useful to be able to do it, but you need to be careful with such code. It's the array equivalent of taking the address of a local variable and passing a pointer to it around.

IIRC, -dip1000 improves the situation by making it so that the type of a slice of a static array is scope, but it's still easy to miss, since it only affects @safe code. It should certainly be possible to slice a static array in @system code without having to deal with scope, but the fact that explicit slicing isn't required in such a case makes it more error-prone than it would be if explicit slicing were required.

- Jonathan M Davis
Oct 04 2019
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Oct 04, 2019 at 11:43:34AM -0700, H. S. Teoh via Digitalmars-d-learn
wrote:
 On Fri, Oct 04, 2019 at 06:34:40PM +0000, Dennis via Digitalmars-d-learn wrote:
 On Friday, 4 October 2019 at 18:30:17 UTC, IGotD- wrote:
 What if you pass a static array to a function that expects a
 dynamic array. Will D automatically create a dynamic array from
 the static array?
No, you have to append [] to create a slice from the static array.
Actually, it *does* automatically convert the static array to a slice.
[...]
Here's an actual working example that illustrates the pitfall of this implicit conversion:

-----
struct S {
    int[] data;
    this(int[] _data) {
        data = _data;
    }
}

S makeS() {
    int[5] data = [ 1, 2, 3, 4, 5 ];
    return S(data);
}

void func(S s) {
    import std.stdio;
    writeln("s.data = ", s.data);
}

void main() {
    S s = makeS();
    func(s);
}
-----

Expected output:
s.data = [1, 2, 3, 4, 5]

Actual output:
s.data = [-2111884160, 32766, 1535478075, 22053, 5]

T

-- 
MSDOS = MicroSoft's Denial Of Service
Oct 04 2019
parent reply Dennis <dkorpel gmail.com> writes:
On Friday, 4 October 2019 at 18:53:30 UTC, H. S. Teoh wrote:
 Here's an actual working example that illustrates the pitfall 
 of this implicit conversion:
Luckily it's caught by -dip1000
Oct 04 2019
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Oct 04, 2019 at 07:21:34PM +0000, Dennis via Digitalmars-d-learn wrote:
 On Friday, 4 October 2019 at 18:53:30 UTC, H. S. Teoh wrote:
 Here's an actual working example that illustrates the pitfall of
 this implicit conversion:
Luckily it's caught by -dip1000
Nice!

T

-- 
"A man's wife has more power over him than the state has." -- Ralph Emerson
Oct 04 2019
prev sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 04/10/2019 3:13 AM, IGotD- wrote:
 According to the GC documentation this code snippet
 
 char* p = new char[10];
 char* q = p + 6; // ok
 q = p + 11;      // error: undefined behavior
 q = p - 1;       // error: undefined behavior
 
 suggests that char *p is really a "fat pointer" with size information.
The pointer is raw. There is no size information stored with it. The GC will store size information separately from it so it can know about reallocation and what its memory range is to search for.
 However, if we get some memory allocated by some C library that is 
 allocated with malloc we have no size information. We would get a char * 
 without any size information and according to the documentation we can 
 do anything including access out of bounds.
Access out of bounds is doable with a pointer allocated by the GC.

int[] array;
array.length = 5;
int* arrayPointer = array.ptr;
int value = arrayPointer[10]; // compiles!!! but is undefined behavior and may segfault at runtime

And of course that won't work in @safe code.
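The @safe remark can be checked at compile time. The sketch below is an illustration, not taken from the documentation: it uses __traits(compiles) on @safe function literals to show that indexing a raw pointer is rejected there, while indexing a slice is accepted because the slice carries its length and can be bounds-checked at run time.

```d
void main()
{
    int[] array;
    array.length = 5;                 // new elements are initialised to int.init (0)
    int* arrayPointer = array.ptr;

    // In-bounds access through the raw pointer is fine in @system code.
    assert(arrayPointer[4] == 0);

    // @safe code refuses to index a raw pointer, since it has no length to check.
    static assert(!__traits(compiles, () @safe { int* p; return p[10]; }));

    // Indexing a slice is allowed in @safe code.
    static assert(__traits(compiles, () @safe { int[] a; return a[10]; }));
}
```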
 How does D internally know that a pointer was previously allocated by 
 the GC or malloc?
Either the GC has that information or it doesn't.
 If we replaced the GC with reference counting, how would D be able 
 to distinguish a reference counted pointer from a raw pointer at compile 
 time in order to insert the code associated with the reference counting?
It can't.
 This brings me back to MS managed C++ where they actually had two types 
 of "pointers" a managed pointer and the normal C++ pointers. Like this:
 
 MyType^ instance = gcnew MyType();
 
 In this case it was obvious what is done with GC and what wasn't (past 
 tense since managed C++ is deprecated). In this case it would be trivial 
 to replace the GC algorithm with whatever you want since the compiler 
 knows the type at compile time.
There is only one type of pointer in D. The GC is a library with language hooks. Nothing more than that. It is easily swappable from within druntime. But it does need to hook into threads and control them (e.g. thread local storage and pausing them) so there are a few restrictions like it must be chosen immediately after libc initialization at the start of druntime initialization.
Oct 03 2019