digitalmars.D - Array Slices and Interior Pointers

Alex Rønne Petersen (6/6) Dec 10 2012 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-poin...

Timon Gehr (10/12) Dec 10 2012 Why does the internal representation have to be the same for a managed

Alex Rønne Petersen (12/28) Dec 10 2012 They don't have to be. Ideally it shouldn't even have to matter because

Rainer Schuetze (11/14) Dec 10 2012 I don't think there is a noticeable difference in detecting whether a

Alex Rønne Petersen (49/67) Dec 11 2012 From what I could find in e.g. the Boehm GC, there seems to be

Robert Jacques (32/56) Dec 11 2012 On Tue, 11 Dec 2012 11:25:44 -0600, Alex Rønne Petersen

Alex Rønne Petersen (29/73) Dec 11 2012 Yes, but I'm not really here to convince anyone about whether interior

Era Scarecrow (24/29) Dec 11 2012 The pointer & lengths won't work well together if you mix them.

Rainer Schuetze (23/80) Dec 11 2012 At least for the D GC, the major work is to figure out if the pointer is...

Alex Rønne Petersen (23/120) Dec 11 2012 This got me thinking a bit.

Rainer Schuetze (13/29) Dec 11 2012 Yes, it is only interested in pointers. The current implementation

Alex Rønne Petersen (10/44) Dec 12 2012 OK, makes sense, and point taken. The D GC would not benefit from

renoX (7/9) Dec 11 2012 Instead of changing slices, shouldn't all pointers be modified if

Alex Rønne Petersen (8/18) Dec 11 2012 Interior pointers are not generally as useful for other things in the

Dmitry Olshansky (10/12) Dec 11 2012 Aside from the fact that I can use slices without GC just fine? :)

Alex Rønne Petersen (7/21) Dec 11 2012 Yes, in theory. But that's not how most idiomatic D code written today

Dmitry Olshansky (21/42) Dec 11 2012 I'd mention that the most of idiomatic D code is agnostic with respect

Alex Rønne Petersen (11/54) Dec 11 2012 Then our current slice design is broken.

Dmitry Olshansky (6/24) Dec 12 2012 Nice one ;)

Alex Rønne Petersen (7/32) Dec 12 2012 Just to clarify: I'm not saying you're wrong. I think the fact that that...

Alex Rønne Petersen <alex lycus.org> writes:

http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/

Destroy.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Dec 10 2012

Timon Gehr <timon.gehr gmx.ch> writes:

On 12/11/2012 01:04 AM, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/


 Destroy.

Why does the internal representation have to be the same for a managed 
port and native D? Also, how does the second representation work 
exactly? Not all slices extend to the end of the memory block.

I don't really feel strongly about the memory requirements for slices, 
but 12 / 24 bytes is starting to feel a little bulky. I am not 
intimately familiar with druntime, but OTOH and AFAICS, the additional 
pointer should also allow faster retrieval of the slice's capacity. 
(though the compiler should IMHO implement specific optimizations for ~= 
in loops anyway.)

Dec 10 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 11-12-2012 02:49, Timon Gehr wrote:
 On 12/11/2012 01:04 AM, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/



 Destroy.

 Why does the internal representation have to be the same for a managed
 port and native D? Also, how does the second representation work
 exactly? Not all slices extend to the end of the memory block.

They don't have to be. Ideally it shouldn't even have to matter because 
D code shouldn't make assumptions about it.

And good point. That makes the second variation not useful for VMs that 
don't natively support slicing arrays, so I'll scratch that as a useful 
representation. A representation for a VM would then probably need to be 
{length, base, offset} (which could also work for a native D).
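A minimal sketch of what such a {length, base, offset} slice could look like in D. The names BaseSlice, at and sub are hypothetical and are not druntime's actual layout; the point is only that the stored pointer always refers to the start of the block.

```d
// Hypothetical {length, base, offset} slice; not druntime's actual layout.
struct BaseSlice(T)
{
    size_t length; // number of elements visible through this slice
    T* base;       // start of the underlying memory block, never interior
    size_t offset; // index of the first visible element, counted from base

    // Element access computes base + offset on the fly, so no interior
    // pointer is ever stored in the slice itself.
    ref T at(size_t i)
    {
        assert(i < length, "index out of bounds");
        return base[offset + i];
    }

    // Sub-slicing only adjusts offset and length; base stays the same.
    BaseSlice sub(size_t lo, size_t hi)
    {
        assert(lo <= hi && hi <= length, "slice out of bounds");
        return BaseSlice(hi - lo, base, offset + lo);
    }
}

// Three machine words: the 12 / 24 bytes discussed in this thread.
static assert(BaseSlice!int.sizeof == 3 * size_t.sizeof);

unittest
{
    auto data = [10, 20, 30, 40, 50];
    auto whole = BaseSlice!int(data.length, data.ptr, 0);
    auto mid = whole.sub(1, 4);
    assert(mid.length == 3);
    assert(mid.at(0) == 20);
    assert(mid.base is data.ptr); // still the base of the block
}
```

With this layout a slice that stops short of the end of the block is representable, which is what the two-field variant could not express.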

 I don't really feel strongly about the memory requirements for slices,
 but 12 / 24 bytes is starting to feel a little bulky. I am not
 intimately familiar with druntime, but OTOH and AFAICS, the additional
 pointer should also allow faster retrieval of the slice's capacity.
 (though the compiler should IMHO implement specific optimizations for ~=
 in loops anyway.)

Some optimizations can probably be done when the base pointer is known.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Dec 10 2012

Rainer Schuetze <r.sagitario gmx.de> writes:

On 11.12.2012 01:04, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/


 Destroy.

I don't think there is a noticeable difference in detecting whether a 
pointer is pointing to the beginning of a GC memory block or somewhere 
inside it.

 This is clearly a huge problem for type-precise garbage collection.

I don't see problems here. If a memory block is referenced, all of its 
contents remain in memory, so they are scanned with their full type 
info. Or do you want to chop off unreferenced parts of the memory block?

From your post, it seems these are restrictions imposed by the .NET GC, 
not by slices in general. If you take a pointer to a field inside a 
struct, you will again get an interior pointer. Do you want "fat pointers" 
for this as well?

Dec 10 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 11-12-2012 08:29, Rainer Schuetze wrote:
 On 11.12.2012 01:04, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/



 Destroy.

 I don't think there is a noticeable difference in detecting whether a
 pointer is pointing to the beginning of a GC memory block or somewhere
 inside it.

From what I could find in e.g. the Boehm GC, there seems to be 
significant work done to catch interior pointers in addition to base 
pointers (grep for GC_all_interior_pointers and related symbols).

  > This is clearly a huge problem for type-precise garbage collection.

 I don't see problems here. If a memory block is referenced, all of its
 contents remain in memory, so they are scanned with their full type
 info. Or do you want to chop off unreferenced parts of the memory block?

No, the problem I was getting at is:

Suppose we have a field int* p; somewhere in the GC heap. With the 
current state of affairs, we have to consider that this field can hold a 
value that is either:

a) null (we don't care)
b) a pointer into C memory (we don't care)
c) a base pointer into the GC heap (unlikely but possible if "new int" 
was used somewhere)
d) an interior pointer into the GC heap (much more likely; a pointer to 
a field of another object)

So we have to look at the pointer and first figure out what kind of 
memory block it is /actually/ pointing to before we have any kind of 
type info available (just the knowledge that it's of type int* is not 
particularly useful by itself other than knowing that it could be a 
pointer at all).

With my scheme, the possibilities would be:

a) null (we don't care)
b) a pointer into C memory (we don't care)
c) a base pointer into the GC heap where the memory block is of type int*

Notice how we did not have to do any significant work to figure out what 
we're dealing with; we immediately know what kind of typed memory the 
pointer is pointing to.

This becomes more of an advantage with aggregates. Suppose we have:

struct A
{
     // ... more fields ...
}

And we have a field A* p; somewhere in the GC heap. We can now look at 
it and immediately tell whether it's a case of a, b, or c above and can 
trivially continue scanning into the pointed-to memory (if needed).

So the TL;DR is: We avoid extra work to figure out the actual type of 
the memory something is pointing to by simply making such cases illegal.

Whether that is practical, I do not know, and I don't plan to push for 
it anytime soon at least. But it has to be done for D to ever run on the 
CLI.

  From your post, it seems these are restrictions imposed by the .NET GC,
 not by slices in general. If you take a pointer to a field inside a
 struct, you will again get interior pointer. Do you want "fat pointers"
 for this as well?

Sure, there's nothing wrong with slices if we assume all GCs that'll be 
running in a D implementation support interior pointers. But if we make 
this assumption, D can never run on the CLI.

Interior pointers are OK in the stack and registers, so taking pointers 
to fields inside aggregates should be fine so long as they are not 
stored in the heap.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Dec 11 2012

"Robert Jacques" <rjacque2 live.johnshopkins.edu> writes:

On Tue, 11 Dec 2012 11:25:44 -0600, Alex Rønne Petersen <alex lycus.org>
wrote:
 On 11-12-2012 08:29, Rainer Schuetze wrote:
 On 11.12.2012 01:04, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/

 Destroy.

Done.

[snip]

 From what I could find in e.g. the Boehm GC, there seems to be
 significant work done to catch interior pointers in addition to base
 pointers (grep for GC_all_interior_pointers and related symbols).

*Ahem* Arguments regarding performance require A) hard numbers and B) are
implementation specific.

[snip]

 Suppose we have a field int* p;

p _isn't_ a slice, so your 'fixes' don't apply.

[snip]

 So we have to look at the pointer and first figure out what kind of
 memory block it is /actually/ pointing to before we have any kind of
 type info available (just the knowledge that it's of type int* is not
 particularly useful by itself other than knowing that it could be a
 pointer at all).

How is p >> 12 slow or difficult? (Assuming log2(PageSize) == 12)

 So the TL;DR is: We avoid extra work to figure out the actual type of
 the memory something is pointing to by simply making such cases illegal.

At the cost of extra work and more memory everywhere arrays are used.

 Whether that is practical, I do not know, and I don't plan to push for
 it anytime soon at least. But it has to be done for D to ever run on the
 CLI.

The issue with the CLI has nothing to do with this. The problem is that D
arrays are D slices (i.e. we don't have T[new]) and D code is written to
be slice compatible. Whereas the .Net libraries are, for the most part,
slice incompatible. So slice-based code, in D or .Net, has to constantly
convert back to arrays, which is a major performance sink.

[snip]

 But if we make this assumption, D can never run on the CLI.

False, see http://dnet.codeplex.com/.

 Interior pointers are OK in the stack and registers, so taking pointers
 to fields inside aggregates should be fine so long as they are not
 stored in the heap.

So what about unions?

Dec 11 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 11-12-2012 19:11, Robert Jacques wrote:
 On Tue, 11 Dec 2012 11:25:44 -0600, Alex Rønne Petersen <alex lycus.org>
 wrote:
 On 11-12-2012 08:29, Rainer Schuetze wrote:
 On 11.12.2012 01:04, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/


 Destroy.



 Done.

 [snip]

 From what I could find in e.g. the Boehm GC, there seems to be
 significant work done to catch interior pointers in addition to base
 pointers (grep for GC_all_interior_pointers and related symbols).

 *Ahem* Arguments regarding performance require A) hard numbers and B)
 are implementation specific.

Yes, but I'm not really here to convince anyone about whether interior 
pointers are needed or not. I was looking for input on the soundness of 
my proposal in order to /avoid/ interior pointers. Just that.

That is to say: I don't care enough about arguing this particular point 
to actually construct a benchmark. If you don't think interior pointers 
are a problem, that is fine, but then your input on the proposal 
probably isn't very useful, because even if I took back my argument 
about performance, interior pointers are still a very real problem for D 
integration into the CLI.

 [snip]

 Suppose we have a field int* p;

 p _isn't_ a slice, so your 'fixes' don't apply.

I was replying to Rainer's question about interior pointers and type 
precision in general. You're taking my reply way out of context.

 [snip]

 So we have to look at the pointer and first figure out what kind of
 memory block it is /actually/ pointing to before we have any kind of
 type info available (just the knowledge that it's of type int* is not
 particularly useful by itself other than knowing that it could be a
 pointer at all).

 How is p >> 12 slow or difficult? (Assuming log2(PageSize) == 12)

Doesn't look slow and difficult to me. But it depends on the GC 
implementation, as you said. :)
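For reference, a small sketch of the shift being discussed, assuming 4 KiB pages as in the quoted question. The helper names pageIndex and pageBase are hypothetical. Note that this only locates the page; a collector would still need per-page metadata to find the block that contains the address.

```d
enum pageShift = 12;                           // assumes log2(PageSize) == 12
enum size_t pageSize = size_t(1) << pageShift; // 4096 bytes

// Number of the page that contains the address p.
size_t pageIndex(const(void)* p)
{
    return (cast(size_t) p) >> pageShift;
}

// Address of the first byte of the page that contains p.
size_t pageBase(const(void)* p)
{
    return (cast(size_t) p) & ~(pageSize - 1);
}

unittest
{
    auto p = cast(void*) 0x12345;
    assert(pageIndex(p) == 0x12);
    assert(pageBase(p) == 0x12000);
}
```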

 So the TL;DR is: We avoid extra work to figure out the actual type of
 the memory something is pointing to by simply making such cases illegal.

 At the cost of extra work and more memory everywhere arrays are used.

Yes.

 Whether that is practical, I do not know, and I don't plan to push for
 it anytime soon at least. But it has to be done for D to ever run on
 the CLI.

 The issue with the CLI has nothing to do with this. The problem is that
 D arrays are D slices (i.e. we don't have T[new]) and D code is written
 to be slice compatible. Whereas the .Net libraries are, for the most
 part, slice incompatible. So slice-based code, in D or .Net, has to
 constantly convert back to arrays, which is a major performance sink.

I'm sorry, but you are wrong.

Interior pointers are /not/ permitted in the CLI. See Ecma 335, I.8.2.1.1.

D as it exists today cannot work in the CLI if it requires interior 
pointers for such a fundamental language feature no matter how you look 
at it.

 [snip]

 But if we make this assumption, D can never run on the CLI.

 False, see http://dnet.codeplex.com/.

No, not false. This project is stalled because of slices. And 
regardless, the CLI spec clearly does not allow interior pointers.

 Interior pointers are OK in the stack and registers, so taking
 pointers to fields inside aggregates should be fine so long as they
 are not stored in the heap.

 So what about unions?

Emit a type info bit saying "scan conservatively". Unions are the 
exception rather than the rule. As far as the CLI goes, unions cannot 
work at all, obviously.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Dec 11 2012

"Era Scarecrow" <rtcvb32 yahoo.com> writes:

On Tuesday, 11 December 2012 at 18:11:32 UTC, Robert Jacques 
wrote:
 On Tue, 11 Dec 2012 11:25:44 -0600, Alex Rønne Petersen wrote:
 Interior pointers are OK in the stack and registers, so taking 
 pointers to fields inside aggregates should be fine so long as 
 they are not stored in the heap.

 So what about unions?

  The pointer & lengths won't work well together if you mix them. 
Consider.

   struct S {
     union {
       int[] i;
       byte[] b;
     }
   }

   S s;

   s.i.length = 4;
   assert(s.i.length == 4);
   // assert(s.b.length == 16); // fails: both members share one length field
   assert(s.b.length == 4);     // what the implementation reports

   s.b = cast(byte[]) s.i;
   assert(s.b.length == 16);    // true: the cast rescales the length to bytes
   // assert(s.i.length == 4);  // fails
   assert(s.i.length == 16);    // what the implementation reports (the last
                                // twelve elements probably segfault)

  The only way to properly use that is to have one of the data 
types you always convert from/to, but the GC wouldn't know and 
might try them all; Although only the base pointer might be 
considered so...

Dec 11 2012

Rainer Schuetze <r.sagitario gmx.de> writes:

On 11.12.2012 18:25, Alex Rønne Petersen wrote:
 On 11-12-2012 08:29, Rainer Schuetze wrote:
 On 11.12.2012 01:04, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/

  > This is clearly a huge problem for type-precise garbage collection.

 I don't see problems here. If a memory block is referenced, all of its
 contents remain in memory, so they are scanned with their full type
 info. Or do you want to chop off unreferenced parts of the memory block?

 No, the problem I was getting at is:

 Suppose we have a field int* p; somewhere in the GC heap. With the
 current state of affairs, we have to consider that this field can hold a
 value that is either:

 a) null (we don't care)
 b) a pointer into C memory (we don't care)
 c) a base pointer into the GC heap (unlikely but possible if "new int"
 was used somewhere)
 d) an interior pointer into the GC heap (much more likely; a pointer to
 a field of another object)

 So we have to look at the pointer and first figure out what kind of
 memory block it is /actually/ pointing to before we have any kind of
 type info available (just the knowledge that it's of type int* is not
 particularly useful by itself other than knowing that it could be a
 pointer at all).

At least for the D GC, the major work is to figure out if the pointer is 
pointing to GC memory or not. Once that is done (i.e. a pool of 
contiguous memory is found that contains the addressed memory) it's just 
a table lookup for the size and corresponding address alignment to get 
the base of the referenced GC memory block.
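The lookup described above can be sketched roughly as follows. The Pool struct, its fields and findBase are hypothetical simplifications, not the actual druntime data structures; the sketch assumes 4 KiB pages and power-of-two block sizes recorded per page.

```d
enum pageShift = 12; // assumes 4 KiB pages

// Hypothetical pool descriptor; the real druntime structures differ.
struct Pool
{
    size_t start;       // address of the first byte of the pool
    size_t end;         // one past the last byte of the pool
    size_t[] blockSize; // per-page block size in bytes, a power of two
}

// Returns the base address of the block that contains addr,
// or 0 if addr does not point into any pool (i.e. not GC memory).
size_t findBase(const(Pool)[] pools, size_t addr)
{
    foreach (ref pool; pools)
    {
        if (addr < pool.start || addr >= pool.end)
            continue; // not in this pool

        immutable off  = addr - pool.start;
        immutable size = pool.blockSize[off >> pageShift]; // table lookup
        return pool.start + (off & ~(size - 1));           // round down to block start
    }
    return 0;
}

unittest
{
    size_t[] sizes = [16, 64]; // page 0 holds 16-byte blocks, page 1 holds 64-byte blocks
    auto pools = [Pool(0x10000, 0x12000, sizes)];
    assert(findBase(pools, 0x10027) == 0x10020); // interior pointer, page 0
    assert(findBase(pools, 0x11050) == 0x11040); // interior pointer, page 1
    assert(findBase(pools, 0x500) == 0);         // outside every pool
}
```

In this sketch a base pointer and an interior pointer take the same path; the only extra step for the interior case is the final mask.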

 With my scheme, the possibilities would be:

 a) null (we don't care)
 b) a pointer into C memory (we don't care)
 c) a base pointer into the GC heap where the memory block is of type int*

 Notice how we did not have to do any significant work to figure out what
 we're dealing with; we immediately know what kind of typed memory the
 pointer is pointing to.

This stores the type info with the reference, not with the memory block, 
but it does not make a big difference. (Actually it does: if the 
reference is only a reference to a base class of the actual instance, type 
info is lost.)

 This becomes more of an advantage with aggregates. Suppose we have:

 struct A
 {
      // ... more fields ...
 }

 And we have a field A* p; somewhere in the GC heap. We can now look at
 it and immediately tell whether it's a case of a, b, or c above and can
 trivially continue scanning into the pointed-to memory (if needed).

 So the TL;DR is: We avoid extra work to figure out the actual type of
 the memory something is pointing to by simply making such cases illegal.

 Whether that is practical, I do not know, and I don't plan to push for
 it anytime soon at least. But it has to be done for D to ever run on the
 CLI.

I understand that the CLI forbids interior pointers, but that seems an 
implementation detail of its GC.

  From your post, it seems these are restrictions imposed by the .NET GC,
 not by slices in general. If you take a pointer to a field inside a
 struct, you will again get interior pointer. Do you want "fat pointers"
 for this as well?

 Sure, there's nothing wrong with slices if we assume all GCs that'll be
 running in a D implementation support interior pointers. But if we make
 this assumption, D can never run on the CLI.

 Interior pointers are OK in the stack and registers, so taking pointers
 to fields inside aggregates should be fine so long as they are not
 stored in the heap.

I don't think we should adopt pretty strange semantics that 
introduce different kinds of pointers and targets depending on whether 
they live on the heap or the stack.

The best that could be done for a .NET target build would be to let the 
compiler create fat pointers that always store the base of the memory 
block and an offset, not just for slices.

BTW I was also thinking whether "instrumented" pointers should be used 
to support a GC that works without "stopping the world". E.g. they would 
make it possible to keep track of references to each memory block 
continuously, or to remember which references were changed since the 
last scan in the hope of doing incremental/generational scans.

Dec 11 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 11-12-2012 21:24, Rainer Schuetze wrote:
 On 11.12.2012 18:25, Alex Rønne Petersen wrote:
 On 11-12-2012 08:29, Rainer Schuetze wrote:
 On 11.12.2012 01:04, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/

  > This is clearly a huge problem for type-precise garbage collection.

 I don't see problems here. If a memory block is referenced, all of its
 contents remain in memory, so they are scanned with their full type
 info. Or do you want to chop off unreferenced parts of the memory block?

 No, the problem I was getting at is:

 Suppose we have a field int* p; somewhere in the GC heap. With the
 current state of affairs, we have to consider that this field can hold a
 value that is either:

 a) null (we don't care)
 b) a pointer into C memory (we don't care)
 c) a base pointer into the GC heap (unlikely but possible if "new int"
 was used somewhere)
 d) an interior pointer into the GC heap (much more likely; a pointer to
 a field of another object)

 So we have to look at the pointer and first figure out what kind of
 memory block it is /actually/ pointing to before we have any kind of
 type info available (just the knowledge that it's of type int* is not
 particularly useful by itself other than knowing that it could be a
 pointer at all).

 At least for the D GC, the major work is to figure out if the pointer is
 pointing to GC memory or not. Once that is done (i.e. a pool of
 contiguous memory is found that contains the addressed memory) it's just
 a table lookup for the size and corresponding address alignment to get
 the base of the referenced GC memory block.

I see. That probably does make it less of a problem for D's GC.

 With my scheme, the possibilities would be:

 a) null (we don't care)
 b) a pointer into C memory (we don't care)
 c) a base pointer into the GC heap where the memory block is of type int*

 Notice how we did not have to do any significant work to figure out what
 we're dealing with; we immediately know what kind of typed memory the
 pointer is pointing to.

 This stores the type info with the reference, not with the memory block,
 but it does not make a big difference. (Actually it does: if the
 reference is only a reference to a base class of the actual instance, type
 info is lost.)

This got me thinking a bit.

In the current on-going precise GC work, what is type info actually used 
for? It seems to me that given the current GC semantics, the only thing 
it's useful for is figuring out what parts of memory do /not/ contain 
pointers, and nothing else.

 This becomes more of an advantage with aggregates. Suppose we have:

 struct A
 {
      // ... more fields ...
 }

 And we have a field A* p; somewhere in the GC heap. We can now look at
 it and immediately tell whether it's a case of a, b, or c above and can
 trivially continue scanning into the pointed-to memory (if needed).

 So the TL;DR is: We avoid extra work to figure out the actual type of
 the memory something is pointing to by simply making such cases illegal.

 Whether that is practical, I do not know, and I don't plan to push for
 it anytime soon at least. But it has to be done for D to ever run on the
 CLI.

 I understand that the CLI forbids interior pointers, but that seems an
 implementation detail of its GC.

It's standardized: Ecma 335, I.8.2.1.1.

And it's something we need to deal with if we care about D on the CLI.

  From your post, it seems these are restrictions imposed by the .NET GC,
 not by slices in general. If you take a pointer to a field inside a
 struct, you will again get interior pointer. Do you want "fat pointers"
 for this as well?

 Sure, there's nothing wrong with slices if we assume all GCs that'll be
 running in a D implementation support interior pointers. But if we make
 this assumption, D can never run on the CLI.

 Interior pointers are OK in the stack and registers, so taking pointers
 to fields inside aggregates should be fine so long as they are not
 stored in the heap.

 I don't think we should adopt pretty strange semantics that
 introduce different kinds of pointers and targets depending on whether
 they live on the heap or the stack.

I don't think those semantics are particularly strange. It's how almost 
all virtual machines work.

 The best that could be done for a .NET target build would be to let the
 compiler create fat pointers that always store the base of the memory
 block and an offset, not just for slices.

Perhaps, but realistically, we can't do this because most code assumes 
pointers are the same size as the machine's word size (i.e. 
(void*).sizeof == size_t.sizeof).
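To make the size concern concrete, here is a minimal sketch of such a fat pointer next to the assumption it would break. FatPtr, deref and Pair are hypothetical names used only for illustration.

```d
// Hypothetical fat pointer: base of the memory block plus a byte offset.
struct FatPtr(T)
{
    void* base;    // always points at the start of a block
    size_t offset; // byte distance from base to the target

    // The raw (possibly interior) address is rebuilt only on access.
    ref T deref()
    {
        return *cast(T*)(cast(ubyte*) base + offset);
    }
}

// What a lot of existing code silently relies on:
static assert((void*).sizeof == size_t.sizeof);
// A fat pointer is twice that size, so such code would no longer hold:
static assert(FatPtr!int.sizeof == 2 * size_t.sizeof);

// Sample aggregate for the usage example below.
struct Pair
{
    int a;
    int b;
}

unittest
{
    auto p = new Pair(1, 2);
    auto f = FatPtr!int(p, Pair.b.offsetof); // "pointer to field b"
    assert(f.deref() == 2);
    assert(f.base is cast(void*) p);
}
```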

There's also the problem that, strictly speaking, I.8.2.1.1 says that 
interior pointers are outright forbidden in the heap regardless of 
whether a live base pointer exists...

 BTW I was also thinking whether "instrumented" pointers should be used
 to support a GC that works without "stopping the world". E.g. they would
 make it possible to keep track of references to each memory block
 continuously, or to remember which references were changed since the
 last scan in the hope of doing incremental/generational scans.

I'm not familiar with anything like that so I can't comment on it. 
Sounds interesting, though.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Dec 11 2012

Rainer Schuetze <r.sagitario gmx.de> writes:

On 11.12.2012 22:08, Alex Rønne Petersen wrote:
 On 11-12-2012 21:24, Rainer Schuetze wrote:
 This stores the type info with the reference, not with the memory block,
 but it does not make a big difference. (Actually it does: if the
 reference is only a reference to a base class of the actual instance, type
 info is lost.)

 This got me thinking a bit.

 In the current on-going precise GC work, what is type info actually used
 for? It seems to me that given the current GC semantics, the only thing
 it's useful for is figuring out what parts of memory do /not/ contain
 pointers, and nothing else.

Yes, it is only interested in pointers. The current implementation 
creates a bitmap from type introspection at compile time, where each bit 
specifies whether the respective word of an instance is a pointer. When 
a memory block is allocated, the bitmap is copied (with some 
complications) from the TypeInfo object into a memory bitmap that is 
used for GC scanning later. This seems slightly inefficient with respect 
to memory usage, but it allows faster scanning, as the complications have 
to be dealt with only once, not every time when scanning. It also allows 
changing the scanning information of only a part later e.g. to integrate 
emplace!T with precise scanning (though this isn't implemented yet).
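A much simplified sketch of how such a per-type bitmap could be computed at compile time. This is not the actual precise GC implementation: pointerBitmap and Node are hypothetical, and only direct raw pointer fields are considered (nested structs, slices, class references and types larger than one bitmap word are ignored).

```d
import std.traits : isPointer;

// One bit per machine word of T: bit i is set if word i holds a raw pointer.
// Simplification: only direct pointer fields are taken into account.
size_t pointerBitmap(T)()
{
    size_t bits = 0;
    foreach (i, Field; typeof(T.tupleof))
    {
        static if (isPointer!Field)
            bits |= size_t(1) << (T.tupleof[i].offsetof / size_t.sizeof);
    }
    return bits;
}

// Sample type: word 0 is plain data, words 1 and 2 are pointers.
struct Node
{
    size_t value;
    Node*  next;
    int*   data;
}

// Evaluated entirely at compile time through CTFE.
enum size_t nodeBitmap = pointerBitmap!Node();
static assert(nodeBitmap == 0b110);
```

A collector with this information only has to follow the words whose bit is set, which matches the observation that the type info is used to tell pointers from non-pointers and nothing else.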

 I understand that the CLI forbids interior pointers, but that seems an
 implementation detail of its GC.

 It's standardized: Ecma 335, I.8.2.1.1.

I still read that restriction as driven by an implementation detail 
("For performance reasons..."), not by some design necessity.

 And it's something we need to deal with if we care about D on the CLI.

Dec 11 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 11-12-2012 22:38, Rainer Schuetze wrote:
 On 11.12.2012 22:08, Alex Rønne Petersen wrote:
 On 11-12-2012 21:24, Rainer Schuetze wrote:
 This stores the type info with the reference, not with the memory block,
 but it does not make a big difference. (Actually it does: if the
 reference is only a reference to a base class of the actual instance, type
 info is lost.)

 This got me thinking a bit.

 In the current on-going precise GC work, what is type info actually used
 for? It seems to me that given the current GC semantics, the only thing
 it's useful for is figuring out what parts of memory do /not/ contain
 pointers, and nothing else.

 Yes, it is only interested in pointers. The current implementation
 creates a bitmap from type introspection at compile time, where each bit
 specifies whether the respective word of an instance is a pointer. When
 a memory block is allocated, the bitmap is copied (with some
 complications) from the TypeInfo object into a memory bitmap that is
 used for GC scanning later. This seems slightly inefficient with respect
 to memory usage, but it allows faster scanning, as the complications have
 to be dealt with only once, not every time when scanning. It also allows
 changing the scanning information of only a part later e.g. to integrate
 emplace!T with precise scanning (though this isn't implemented yet).

OK, makes sense, and point taken. The D GC would not benefit from 
getting rid of interior pointers in any significant way. The only 
advantage would then be for VMs like the CLI.

 I understand that the CLI forbids interior pointers, but that seems an
 implementation detail of its GC.

 It's standardized: Ecma 335, I.8.2.1.1.

 I still read that restriction as driven by an implementation detail
 ("For performance reasons..."), not by some design necessity.

Oh, sure, all I'm saying is that since it is in the standard (and both 
MS.NET and Mono require it), we have to deal with it one way or another.

 And it's something we need to deal with if we care about D on the CLI.


-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Dec 12 2012

"renoX" <renozyx gmail.com> writes:

On Tuesday, 11 December 2012 at 00:04:57 UTC, Alex Rønne Petersen 
wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/

 Destroy.

Instead of changing slices, shouldn't all pointers be modified if 
you want to do this kind of thing (a pointer would have two 
parts: a reference to the "head" and the real reference)?

BR,
renoX

Dec 11 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 11-12-2012 11:36, renoX wrote:
 On Tuesday, 11 December 2012 at 00:04:57 UTC, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/


 Destroy.

 Instead of changing slices, shouldn't all pointers be modified if you
 want to do this kind of thing (a pointer would have two parts: a
 reference to the "head" and the real reference)?

 BR,
 renoX

Interior pointers are not generally as useful for other things in the 
language as they are for slices, so I don't think any change is 
necessarily needed there.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Dec 11 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

12/11/2012 4:04 AM, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/


 Destroy.

Aside from the fact that I can use slices without GC just fine? :)
The base pointers would then be either counted, released manually or 
implicitly as part of stack unwinding.

I personally think that managed VMs are going to have to emulate slices 
and pointers as an array object + one or a pair of offsets. In fact it 
could be implemented as an abstract object with the implementation 
depending on where you got that pointer from.

-- 
Dmitry Olshansky

Dec 11 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 11-12-2012 20:09, Dmitry Olshansky wrote:
 12/11/2012 4:04 AM, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/



 Destroy.

 Aside from the fact that I can use slices without GC just fine? :)
 The base pointers would then be either counted, released manually or
 implicitly as part of stack unwinding.

Yes, in theory. But that's not how most idiomatic D code written today 
works.

 I personally think that managed VMs are going to have to emulate slices
 and pointers as an array object + one or a pair of offsets. In fact it
 could be implemented as an abstract object with the implementation
 depending on where you got that pointer from.


-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Dec 11 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

12/11/2012 11:23 PM, Alex Rønne Petersen wrote:
 On 11-12-2012 20:09, Dmitry Olshansky wrote:
 12/11/2012 4:04 AM, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/




 Destroy.

 Aside from the fact that I can use slices without GC just fine? :)
 The base pointers would then be either counted, released manually or
 implicitly as part of stack unwinding.

 Yes, in theory. But that's not how most idiomatic D code written today
 works.

I'd mention that most idiomatic D code is agnostic with respect
to the origin of a slice. The major reason to use slices is to avoid
allocations, and thus the allocation scheme is not important up to the
point of an explicit copy.

And at that point, e.g., Phobos plays it safe and does everything that has
to copy or incrementally build via the GC. And it gets bashed for it every
once in a while. To put it simply, it's because there is no concept of
allocators in idiomatic D code _yet_.

And separating slices from the allocation mechanism behind them is the key
to the usability of slices as they stand. If we add stuff that makes them 50%
more bulky and helps only a certain scheme of GC memory allocation, we
are screwed.
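
For reference, the 50% figure falls out of the word count. The struct below is only my guess at what a slice carrying a base pointer would look like; SliceWithBase and its base field are assumptions, not a layout taken from the blog post.

```d
// Hypothetical layout of a slice that also carries its block's base address.
struct SliceWithBase(T)
{
    size_t length;
    T* ptr;
    void* base; // assumed extra field pointing at the start of the allocation
}

// Today's slice is two machine words; the extra field makes it three.
static assert((int[]).sizeof == 2 * size_t.sizeof);
static assert(SliceWithBase!int.sizeof == 3 * size_t.sizeof);
```

Both checks are done at compile time, so the module compiling at all is the result.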

Also, what would direct operations on the ptr field translate to in your
scheme, e.g.:
arr.ptr = arr.ptr+x;

where ptr+x is hidden by some function and not obvious to the compiler?

Same question with slicing a raw pointer - what will the base contain? 
(The pointer _might_ have been interior.)
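
A small sketch of the situation I mean. The helper advance is hypothetical, and the update is written as a re-slice of the returned raw pointer rather than as an assignment to ptr. GC.addrOf is used only to show that the runtime can look up the containing block, which is information the compiler does not have at the call site.

```d
import core.memory : GC;

// Opaque to the caller: nothing in the signature says whether the result
// points at the start of an allocation or somewhere inside one.
int* advance(int* p, size_t x)
{
    return p + x;
}

unittest
{
    int[] arr = [1, 2, 3, 4, 5, 6]; // array literal, allocated on the GC heap

    int* p = advance(arr.ptr, 2);   // interior pointer, but only we know that
    int[] tail = p[0 .. 3];         // slicing a raw pointer

    assert(tail == [3, 4, 5]);
    assert(tail.ptr is arr.ptr + 2);

    // At run time the GC can still map the interior pointer to its block.
    assert(GC.addrOf(p) !is null);
    assert(GC.addrOf(p) is GC.addrOf(arr.ptr));
}
```

With a base field in the slice, something would have to fill it in at the p[0 .. 3] step, and a run-time lookup like the one above is the only source I can see for it.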

 I personally think that managed VMs are going to have to emulate slices
 and pointers as an array object + one offset or a pair of offsets. In fact,
 it could be implemented as an abstract object whose implementation depends
 on where you got that pointer from.



-- 
Dmitry Olshansky

Dec 11 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 11-12-2012 21:24, Dmitry Olshansky wrote:
 12/11/2012 11:23 PM, Alex Rønne Petersen wrote:
 On 11-12-2012 20:09, Dmitry Olshansky wrote:
 12/11/2012 4:04 AM, Alex Rønne Petersen wrote:
 http://xtzgzorex.wordpress.com/2012/12/11/array-slices-and-interior-pointers/





 Destroy.

 Aside from the fact that I can use slices without GC just fine? :)
 The base pointers would then be either counted, released manually or
 implicitly as part of stack unwinding.

 Yes, in theory. But that's not how most idiomatic D code written today
 works.

 I'd mention that most idiomatic D code is agnostic with respect
 to the origin of a slice. The major reason to use slices is to avoid
 allocations, and thus the allocation scheme is not important up to the
 point of an explicit copy.

 And at that point, e.g., Phobos plays it safe and does everything that has
 to copy or incrementally build via the GC. And it gets bashed for it every
 once in a while. To put it simply, it's because there is no concept of
 allocators in idiomatic D code _yet_.

 And separating slices from the allocation mechanism behind them is the key
 to the usability of slices as they stand. If we add stuff that makes them 50%
 more bulky and helps only a certain scheme of GC memory allocation, we
 are screwed.

Then our current slice design is broken.

int[] arr;
arr.length = 1024; // guess where this memory comes from?
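
To spell the answer out, here is a quick check, assuming the stock druntime GC. The helper isGcOwned is a name I made up; GC.addrOf comes from core.memory and returns null for memory the GC did not allocate.

```d
import core.memory : GC;

// Hypothetical helper: true if p points into a block owned by the GC.
bool isGcOwned(void* p)
{
    return GC.addrOf(p) !is null;
}

unittest
{
    int[] arr;
    assert(arr.ptr is null);      // an empty slice has no memory behind it yet

    arr.length = 1024;            // the runtime allocates behind the scenes
    assert(arr.length == 1024);
    assert(isGcOwned(arr.ptr));   // and the block comes from the GC heap

    int[8] onStack;
    int[] view = onStack[];
    assert(!isGcOwned(view.ptr)); // a stack-backed slice does not
}
```

So the slice type itself may be allocator-agnostic, but growing one through .length is not.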

 Also, what would direct operations on the ptr field translate to in your
 scheme, e.g.:
 arr.ptr = arr.ptr+x;

 where ptr+x is hidden by some function and not obvious to the compiler?

Exactly what you wrote. Remember, the ptr field doesn't change meaning.

 Same question with slicing a raw pointer - what will the base contain?
 (The pointer _might_ have been interior.)

Of course you have to take care not to slice an interior pointer and let 
the base pointer go out of scope.

 I personally think that managed VMs are going to have to emulate slices
 and pointers as an array object + one offset or a pair of offsets. In fact,
 it could be implemented as an abstract object whose implementation depends
 on where you got that pointer from.




-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Dec 11 2012

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

12/12/2012 12:59 AM, Alex Rønne Petersen wrote:
[snip]
 I'd mention that most idiomatic D code is agnostic with respect
 to the origin of a slice. The major reason to use slices is to avoid
 allocations, and thus the allocation scheme is not important up to the
 point of an explicit copy.

 And at that point, e.g., Phobos plays it safe and does everything that has
 to copy or incrementally build via the GC. And it gets bashed for it every
 once in a while. To put it simply, it's because there is no concept of
 allocators in idiomatic D code _yet_.

 And separating slices from the allocation mechanism behind them is the key
 to the usability of slices as they stand. If we add stuff that makes them 50%
 more bulky and helps only a certain scheme of GC memory allocation, we
 are screwed.

 Then our current slice design is broken.

 int[] arr;
 arr.length = 1024; // guess where this memory comes from?

Nice one ;)
Guess this point was destroyed.

-- 
Dmitry Olshansky

Dec 12 2012

Alex Rønne Petersen <alex lycus.org> writes:

On 12-12-2012 09:30, Dmitry Olshansky wrote:
 12/12/2012 12:59 AM, Alex Rønne Petersen wrote:
 [snip]
 I'd mention that most idiomatic D code is agnostic with respect
 to the origin of a slice. The major reason to use slices is to avoid
 allocations, and thus the allocation scheme is not important up to the
 point of an explicit copy.

 And at that point, e.g., Phobos plays it safe and does everything that has
 to copy or incrementally build via the GC. And it gets bashed for it every
 once in a while. To put it simply, it's because there is no concept of
 allocators in idiomatic D code _yet_.

 And separating slices from the allocation mechanism behind them is the key
 to the usability of slices as they stand. If we add stuff that makes them 50%
 more bulky and helps only a certain scheme of GC memory allocation, we
 are screwed.

 Then our current slice design is broken.

 int[] arr;
 arr.length = 1024; // guess where this memory comes from?

 Nice one ;)
 Guess this point was destroyed.

Just to clarify: I'm not saying you're wrong. I think the fact that that 
particular slice feature is tied to the GC is actually a pretty bad thing.

-- 
Alex Rønne Petersen
alex lycus.org
http://lycus.org

Dec 12 2012
