digitalmars.D - Why are void[] contents marked as having pointers?

Vladimir Panteleev (23/23) May 31 2009 I just went through a ~15000-line project and replaced most occurrences ...

Walter Bright (7/12) May 31 2009 Rare or common, it still would be a nasty bug lurking to catch someone.

Vladimir Panteleev (6/19) May 31 2009 This isn't about performance, this is about having one thousand casts al...

Andrei Alexandrescu (7/31) May 31 2009 Another alternative would be to allow implicitly casting arrays of any

BCS (6/22) May 31 2009 I'm not sure he is (or at least, he is in a very well defined way; "I ne...

Andrei Alexandrescu (11/16) May 31 2009 Oh there is enough information. What's needed is:

BCS (6/24) May 31 2009 Maybe in some cases but if the primary function of the code is processin...
Vladimir Panteleev (10/14) May 31 2009 This is functionally equivalent to (forgive the D1):

Andrei Alexandrescu (3/17) May 31 2009 This is not safe because you can change the data.

Vladimir Panteleev (6/23) Jun 01 2009 Which is why I wrote "forgive the D1" :)

Vladimir Panteleev (6/8) May 31 2009 I could cut down on the number of casts if I were to replace most array ...

Andrei Alexandrescu (6/28) May 31 2009 I understand. You are sending around object representation. void[] may

Vladimir Panteleev (5/11) Jun 01 2009 I've thought about this for a bit. If we allow any *non-reference* type ...

Vladimir Panteleev (7/20) May 31 2009 I just realized that by "performance" you might have meant memory leaks....

Walter Bright (12/37) May 31 2009 No, in this context I meant improving performance by not scanning the

BCS (4/10) May 31 2009 Most (but not all) of the cases I can think of where you get false point...
Vladimir Panteleev (10/46) May 31 2009 It's just compressed data, which is evenly distributed across the 32-bit...

Andrei Alexandrescu (4/7) May 31 2009 To argue that convincingly, you'd need to disable conversions from

Vladimir Panteleev (5/7) May 31 2009 You're right. Perhaps implicit cast of reference types to void[] should ...

Daniel Keep (4/10) May 31 2009 If only there were a way to indicate that void[]s could contain

Vladimir Panteleev (5/7) May 31 2009 I wanted to add that debugging memory corruptions and other memory probl...
Vladimir Panteleev (7/8) May 31 2009 (again, something I forgot to add... shouldn't hit Send so soon)

bearophile (6/8) May 31 2009 I think a better design for that read() function is to return ubyte[].

Denis Koroskin (5/10) May 31 2009 FWIW, I also consider void[] as a storage for an arbitrary untyped binar...
Denis Koroskin (10/17) May 31 2009

Lionello Lunesu (8/20) May 31 2009 You're contradicting yourself there. void[] is arbitrary untyped data,

Christopher Wright (3/24) May 31 2009 Even in C, people often use unsigned char* for arbitrary data that does

grauzone (17/18) May 31 2009 void[] = can contain pointers

BCS (4/11) May 31 2009 Never say never. Some cases like tmp files or whatnot where the same exe...
Vladimir Panteleev (8/33) May 31 2009 std.boxer is actually a valid counter-example for my post.

Christopher Wright (2/5) May 31 2009 What do you use for "may contain unaligned pointers"?

Vladimir Panteleev (5/13) May 31 2009 Sorry, what do you mean? I don't understand why such a type is needed? I...

Christopher Wright (5/17) Jun 01 2009 Because you can have a struct with align(1) that contains pointers. Then...

Vladimir Panteleev (5/24) Jun 01 2009 The GC will not "see" unaligned pointers, regardless if they're in a str...

Christopher Wright (3/26) Jun 01 2009 Okay, so currently the GC doesn't do anything interesting with its type

Vladimir Panteleev (5/35) Jun 02 2009 I wasn't suggesting any GC modifications, I was just suggesting that voi...

Christopher Wright (10/11) Jun 02 2009 The suggestion was that void[] be used as ubyte[] currently is, and then...

Jarrett Billingsley (3/9) Jun 02 2009 How do you have a void*[] point to a block of memory that is not a

Christopher Wright (2/11) Jun 03 2009 Another good point. Or how do you index it by byte?

bearophile (4/5) Jun 03 2009 How can you read & write files of 3 bytes if voids are 4 bytes long chun...

Christopher Wright (3/10) Jun 03 2009 Vladimir was suggesting that void[] be the same as ubyte[] and that you

Daniel Keep (3/16) Jun 04 2009 How would you generically store the bits of this, then?
Vladimir Panteleev (9/20) Jun 04 2009 Actually, I think Andrei's idea is better (to allow implicit casting

Denis Koroskin (6/24) Jun 04 2009 There is a pitfall: should an "arrays of non-reference types" be

Vladimir Panteleev (8/35) Jun 05 2009 I don't see why you'd want to work with arrays of signed bytes. It doesn...

BCS (4/5) Jun 05 2009 I can think of a number of cases where I would expect numbers to be in a...

Vladimir Panteleev (6/11) Jun 05 2009 Yes, but how is this related to abstracting data types to a generic type...

BCS (5/18) Jun 05 2009 It's not and that's the point. The point is there are uses for 8-bit sig...

Vladimir Panteleev (7/25) Jun 05 2009 Oh yes; I was definitely not suggesting removing byte[] from the languag...

Derek Parnell (6/13) Jun 05 2009 Or sound wave sample points [-127, 127]

BCS (14/63) May 31 2009 I think the idea is that void[] is the most general data type; it can be...

Denis Koroskin (2/30) May 31 2009 In this case you should *explicitly* mark that void[] array as "mightHav...

MLT (16/31) Jun 03 2009 As quite a newby, I can sum up what I understood as follows:

Christopher Wright (28/65) Jun 03 2009 First, this is no problem if you are merely aliasing an existing array.

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

I just went through a ~15000-line project and replaced most occurrences of
void[]. Now the project is an ugly mess of void[], ubyte[] and casts, but at
least it doesn't leak memory like crazy any more.

I don't know why it was decided to mark the contents of void[] as "might have
pointers". It makes no sense! Consider:

1) void[] has this wonderful, magical property that any array type implicitly
casts to void[]. This makes it wonderful to use in libraries and functions that
manipulate data with no regard to what it actually contains. Network
libraries, compression libraries, etc. - just about anywhere you'd use a
void* and length in C++, a void[] is just as appropriate.
2) Although void[] is "typeless", you can still operate on it - namely,
slice and concatenate it. Pass a void[] to a network send() function - how
much did you send? Half the buffer? No problem, slice it away and store the
rest - and no casts.
3) It's very rare in practice for the only pointer to your object (which you
still plan to access later) to be stored in a void[]-allocated array! Remember,
the properties of memory regions are determined when the memory is allocated,
so casting an array of structures to a void[] will not lose you that reference.
You'd need to move your pointer to a void[]-array (which you need to allocate
explicitly or create by, for example, concatenating your reference to the
void[]), then drop the reference to your original structure, for this to happen.

Here's a simple naive implementation of a buffer:

void[] buffer;
void queue(void[] data)
{
	buffer ~= data;
}
...
queue([1,2,3][]);
queue("Hello, World!");

No casts! So simple and beautiful. However, should you use this pattern to work
with larger amounts of high-entropy data, the "minefield" effect will
cause the GC to stop collecting most data. Sure, you can call
std.gc.hasNoPointers, but you need to do it after every single concatenation...
and it makes expressions with more than one concatenation unsafe.

I heard that Tango copies over the properties of arrays when they are
reallocated, which helps but solves the problem only partially.

So, I ask you: is there actually code out there that depends on the way void[]
works right now? I brought up this argument a year or so ago on IRC, and there
were people who defended ferociously the current design using idealisms ("it
should work like what it sounds like, it should contain any type" or something
like that), but I've yet to see a practical argument.


P.S. How come the standard library doesn't have a simple function like this?

T[] toArray(T)(inout T data) { return (&data)[0..1]; }

It happens often that I need to get a slice of memory around an object's
reference (for example to pass it to a function that takes a void[] :D), and
typing (&x)[0..1] every time feels like a hack.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

May 31 2009

Walter Bright <newshound1 digitalmars.com> writes:

Vladimir Panteleev wrote:
 I don't know why it was decided to mark the contents of void[] as
 "might have pointers". It makes no sense! Consider:

[...]

 3) It's very rare in practice for the only pointer to your
 object (which you still plan to access later) to be stored in a
 void[]-allocated array!

Rare or common, it still would be a nasty bug lurking to catch someone. 
The default behavior in D should be to be correct code. Doing 
potentially unsafe things to improve performance should require extra 
effort - in this case it would be either using the gc function to mark 
the memory as not containing pointers, or storing them as ubyte[] instead.
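
For reference, a minimal D2 sketch of the first option. It assumes today's `core.memory` API rather than the D1-era `std.gc` functions discussed in this thread, and the helper name `allocNoScan` is hypothetical. The buffer is allocated with the `NO_SCAN` attribute, so the collector does not treat its contents as pointers:

```d
import core.memory : GC;

/// Hypothetical helper: allocate `n` bytes that the GC will not scan for pointers.
void[] allocNoScan(size_t n)
{
    // GC.malloc returns a void*; slicing it yields a void[] of length n.
    return GC.malloc(n, GC.BlkAttr.NO_SCAN)[0 .. n];
}

unittest
{
    void[] buf = allocNoScan(1024);
    assert(buf.length == 1024);
    // The freshly allocated block carries the NO_SCAN attribute.
    assert((GC.getAttr(buf.ptr) & GC.BlkAttr.NO_SCAN) != 0);
}
```

Appending to such a slice with `~=` may still reallocate into a new block that lacks this attribute, which is the concern raised in the original post.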

May 31 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Sun, 31 May 2009 22:41:47 +0300, Walter Bright <newshound1 digitalmars.com>
wrote:

 Vladimir Panteleev wrote:
 I don't know why it was decided to mark the contents of void[] as
 "might have pointers". It makes no sense! Consider:

 [...]

 3) It's very rare in practice for the only pointer to your
 object (which you still plan to access later) to be stored in a
 void[]-allocated array!

 Rare or common, it still would be a nasty bug lurking to catch someone.  
 The default behavior in D should be to be correct code. Doing  
 potentially unsafe things to improve performance should require extra  
 effort - in this case it would be either using the gc function to mark  
 the memory as not containing pointers, or storing them as ubyte[]  
 instead.

This isn't about performance, this is about having one thousand casts all over
my code. It becomes a burden to cast everything to ubyte[] when working with
abstract binary data. For example, when building a MIME multipart message with
binary fields, every line needs to have a cast in it - when we could have just
used the ~= operator to append to a void[].

Alternative solutions would be to have a second type (either new or one of the
existing, e.g. ubyte[]) act as void[] (any array type casts to it implicitly)
but not be scanned by the GC, but I doubt this is something you'll consider.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

May 31 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Vladimir Panteleev wrote:
 On Sun, 31 May 2009 22:41:47 +0300, Walter Bright
 <newshound1 digitalmars.com> wrote:
 
 Vladimir Panteleev wrote:
 I don't know why it was decided to mark the contents of void[] as
  "might have pointers". It makes no sense! Consider:

 [...]
 
 3) It's very rare in practice for the only pointer to your
 object (which you still plan to access later) to be stored in a 
 void[]-allocated array!

 Rare or common, it still would be a nasty bug lurking to catch
 someone. The default behavior in D should be to be correct code.
 Doing potentially unsafe things to improve performance should
 require extra effort - in this case it would be either using the gc
 function to mark the memory as not containing pointers, or storing
 them as ubyte[] instead.

 
 This isn't about performance, this is about having one thousand casts
 all over my code. It becomes a burden to cast everything to ubyte[]
 when working with abstract binary data. For example, when building a
 MIME multipart message with binary fields, every line needs to have a
 cast in it - when we could have just used the ~= operator to append
 to a void[].

Another alternative would be to allow implicitly casting arrays of any 
type to const(ubyte)[] which is always safe. But I think this is too 
much ado about nothing - you're avoiding the type system to start with, 
so use ubyte, insert a cast, and call it a day. If you have too many 
casts, the problem is most likely elsewhere so that argument I'm not buying.

Andrei

May 31 2009

BCS <none anon.com> writes:

Hello Andrei,

 Vladimir Panteleev wrote:
 
 This isn't about performance, this is about having one thousand casts
 all over my code. It becomes a burden to cast everything to ubyte[]
 when working with abstract binary data. For example, when building a
 MIME multipart message with binary fields, every line needs to have a
 cast in it - when we could have just used the ~= operator to append
 to a void[].
 

 Another alternative would be to allow implicitly casting arrays of any
 type to const(ubyte)[] which is always safe.

Sounds like something that might work.

 But I think this is too
 much ado about nothing - you're avoiding the type system to start
 with,

I'm not sure he is (or at least, he is in a very well defined way; "I need 
to look at this data as its bytes")

 so use ubyte, insert a cast, and call it a day. If you have too
 many casts, the problem is most likely elsewhere

You might be correct, but I don't think any of us have enough info right 
now to make that assertion.

May 31 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

BCS wrote:
 so use ubyte, insert a cast, and call it a day. If you have too
 many casts, the problem is most likely elsewhere

 
 You might be correct, but I don't think any of us have enough info right 
 now to make that assertion.

Oh there is enough information. What's needed is:

const(ubyte)[] getRepresentation(T)(T[] data)
{
     return cast(typeof(return)) data;
}

If you have many calls to getRepresentation(), then that 
anticlimactically shows that you need to look at arrays' representations
often. If there are too many of those, maybe some of the said arrays 
should be dealt with as ubyte[] in the first place.


Andrei

May 31 2009

BCS <none anon.com> writes:

Hello Andrei,

 BCS wrote:
 
 so use ubyte, insert a cast, and call it a day. If you have too many
 casts, the problem is most likely elsewhere
 

 You might be correct, but I don't think any of us have enough info
 right now to make that assertion.
 

 Oh there is enough information. What's needed is:
 
 const(ubyte)[] getRepresentation(T)(T[] data)
 {
 return cast(typeof(return)) data;
 }
 If you have many calls to getRepresentation(), then that
 anticlimactically shows that you need to look at arrays'
 representations often. If there are too many of those, maybe some of
 the said arrays should be dealt with as ubyte[] in the first place.

Maybe in some cases, but if the primary function of the code is processing
stuff between "raw data" and other data types, then the above is irrelevant.
The OP sort of hinted somewhere that this is the kind of thing he is working 
on. Without knowing what the OP is doing, I still don't think we can say 
if his program is well designed.

May 31 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Mon, 01 Jun 2009 00:00:45 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 const(ubyte)[] getRepresentation(T)(T[] data)
 {
      return cast(typeof(return)) data;
 }

This is functionally equivalent to (forgive the D1):
ubyte[] getRepresentation(void[] data)
{
	return cast(ubyte[]) data;
}
Since no allocation is done in this case, the use of void[] is safe, and it
doesn't instantiate a version of the function for every type you call it with.
I remarked about this in my other reply.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

May 31 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Vladimir Panteleev wrote:
 On Mon, 01 Jun 2009 00:00:45 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 
 const(ubyte)[] getRepresentation(T)(T[] data)
 {
      return cast(typeof(return)) data;
 }

 
 This is functionally equivalent to (forgive the D1):
 ubyte[] getRepresentation(void[] data)
 {
 	return cast(ubyte[]) data;
 }
 Since no allocation is done in this case, the use of void[] is safe, and it
doesn't instantiate a version of the function for every type you call it with.
I remarked about this in my other reply.
 

This is not safe because you can change the data.

Andrei

May 31 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Mon, 01 Jun 2009 02:18:46 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Vladimir Panteleev wrote:
 On Mon, 01 Jun 2009 00:00:45 +0300, Andrei Alexandrescu  
 <SeeWebsiteForEmail erdani.org> wrote:

 const(ubyte)[] getRepresentation(T)(T[] data)
 {
      return cast(typeof(return)) data;
 }

  This is functionally equivalent to (forgive the D1):
 ubyte[] getRepresentation(void[] data)
 {
 	return cast(ubyte[]) data;
 }
 Since no allocation is done in this case, the use of void[] is safe,  
 and it doesn't instantiate a version of the function for every type you  
 call it with. I remarked about this in my other reply.


Which is why I wrote "forgive the D1" :)
I've yet to switch to D2, but it's obvious that the const should be there to
ensure safety.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Jun 01 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Sun, 31 May 2009 23:24:09 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 But I think this is too much ado about nothing - you're avoiding the type
system to start with, so use ubyte, insert a cast, and call it a day. 

I don't get it - not using casts is avoiding the type system? :P Note that I am
NOT up-casting the void[] later back to some other type - it goes out to the
network, a file, etc. void[] sounds like it fits perfectly in the type
hierarchy for "just a bunch of bytes", except for the "may contain pointers"
fine print.

 If you have too many casts, the problem is most likely elsewhere so that
argument I'm not buying.

I could cut down on the number of casts if I were to replace most array
appending operations with calls to a function that takes a void[] and then
internally casts to a ubyte[] and appends that somewhere. There's a lot of
diversity of types being worked with in my case - strings, various structs,
more raw data, etc. I'm more annoyed that I'd need to do something like that to
work around a design decision that may not have been fully thought out.
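
A minimal D2 sketch of the wrapper described above, under stated assumptions: the names `buffer` and `queue` mirror the example in the original post, the `const(void)[]` parameter is a D2 idiom rather than the D1 code used in this project, and the sketch is illustrative only. Callers pass any array without a cast, and the single cast lives inside the function:

```d
/// Byte buffer; ubyte has no indirections, so the GC need not scan it.
ubyte[] buffer;

/// Hypothetical wrapper: accepts any array through the implicit conversion
/// to const(void)[] and appends its bytes to the ubyte[] buffer.
void queue(const(void)[] data)
{
    buffer ~= cast(const(ubyte)[]) data;
}

unittest
{
    int[] numbers = [1, 2, 3];
    queue(numbers);            // no cast at the call site
    queue("Hello, World!");
    assert(buffer.length == 3 * int.sizeof + 13);
}
```

This keeps the cast-free call sites of the void[] version while the stored data stays in a ubyte[] array.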

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

May 31 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Vladimir Panteleev wrote:
 On Sun, 31 May 2009 23:24:09 +0300, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 
 But I think this is too much ado about nothing - you're avoiding
 the type system to start with, so use ubyte, insert a cast, and
 call it a day.

 
 I don't get it - not using casts is avoiding the type system? :P Note
 that I am NOT up-casting the void[] later back to some other type -
 it goes out to the network, a file, etc. void[] sounds like it fits
 perfectly in the type hierarchy for "just a bunch of bytes", except
 for the "may contain pointers" fine print.

I understand. You are sending around object representation. void[] may 
contain pointers, so you're simply not looking at the right abstraction.

 If you have too many casts, the problem is most likely elsewhere so
 that argument I'm not buying.

 
 I could cut down on the number of casts if I were to replace most
 array appending operations with calls to a function that takes a void[]
 and then internally casts to a ubyte[] and appends that somewhere.
 There's a lot of diversity of types being worked with in my case -
 strings, various structs, more raw data, etc. I'm more annoyed that
 I'd need to do something like that to work around a design decision
 that may not have been fully thought out.

Walter has written a class called OutBuffer (see std.outbuffer) the 
likes of which could be used to encapsulate representation marshaling.

Andrei

May 31 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Sun, 31 May 2009 23:24:09 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 Another alternative would be to allow implicitly casting arrays of any  
 type to const(ubyte)[] which is always safe. But I think this is too  
 much ado about nothing - you're avoiding the type system to start with,  
 so use ubyte, insert a cast, and call it a day. If you have too many  
 casts, the problem is most likely elsewhere so that argument I'm not  
 buying.

I've thought about this for a bit. If we allow any *non-reference* type except
void[] to implicitly cast to ubyte[], but still allow implicitly casting
ubyte[] to void[], it will put ubyte[] in the perfect spot in the type
hierarchy - it'll allow safely (portability issues notwithstanding) getting the
representation of value-type (POD) arrays, while still allowing abstracting it
even further to the "might have pointers" type - at which point it is unsafe to
access individual bytes, which void[] disallows without casts.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Jun 01 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Sun, 31 May 2009 22:41:47 +0300, Walter Bright <newshound1 digitalmars.com>
wrote:

 Vladimir Panteleev wrote:
 I don't know why it was decided to mark the contents of void[] as
 "might have pointers". It makes no sense! Consider:

 [...]

 3) It's very rare in practice for the only pointer to your
 object (which you still plan to access later) to be stored in a
 void[]-allocated array!

 Rare or common, it still would be a nasty bug lurking to catch someone.  
 The default behavior in D should be to be correct code. Doing  
 potentially unsafe things to improve performance should require extra  
 effort - in this case it would be either using the gc function to mark  
 the memory as not containing pointers, or storing them as ubyte[]  
 instead.

I just realized that by "performance" you might have meant memory leaks. Well,
sure, if you can say that my programs crashing every few hours due to running
out of memory is a "performance" problem. I'm sorry to sound bitter, but this
was the cause of much annoyance for my software's users. I had to write a
memory debugger to understand that no matter how much you chase void[]s with
hasNoPointers, there will always be that one ~ which you overlooked.

As much as I try to look from an objective perspective, I don't see how a
memory leak (and memory leaks in D usually mean that NO memory is being freed,
except for small lucky objects not having bogus pointers to them) is a problem
less significant than an obscure case that involves allocating a void[],
storing a pointer in it and losing all other references to the object. In fact,
I just searched the D documentation and I couldn't find a statement saying
whether void[] are scanned by the GC or not. Enter mr. D-newbie, who wants to
write his own network/compression/file-copying/etc. library/program and
stumbles upon void[], the seemingly perfect abstract-binary-data-container type
for the job... (which is exactly what happened with yours truly).

P.S. Not trying to push my point of view, but just trying to offer some
perspective from someone who has been bit by this design choice...

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

May 31 2009

Walter Bright <newshound1 digitalmars.com> writes:

Vladimir Panteleev wrote:
 I just realized that by "performance" you might have meant memory
 leaks.

No, in this context I meant improving performance by not scanning the 
void[] memory for pointers.

 Well, sure, if you can say that my programs crashing every few
 hours due to running out of memory is a "performance" problem. I'm
 sorry to sound bitter, but this was the cause of much annoyance for
 my software's users. I had to write a memory debugger to
 understand that no matter how much you chase void[]s with
 hasNoPointers, there will always be that one ~ which you overlooked.

I'm curious what form of data you have that always seems to look like
valid pointers. There are a couple other options you can pursue - moving 
the gc pool to another location in the address space, or changing the 
alignment of your void[] data so it won't look like aligned pointers 
(the gc won't look for misaligned pointers).

Or just use ubyte[] instead.

 As much as I try to look from an objective perspective, I don't see
 how a memory leak (and memory leaks in D usually mean that NO memory
 is being freed, except for small lucky objects not having bogus
 pointers to them) is a problem less significant than an obscure case
 that involves allocating a void[], storing a pointer in it and losing
 all other references to the object.

Because one is an obvious failure, and the other will be memory 
corruption. Memory corruption is pernicious and awful.

 In fact, I just searched the D
 documentation and I couldn't find a statement saying whether void[]
 are scanned by the GC or not. Enter mr. D-newbie, who wants to write
 his own network/compression/file-copying/etc. library/program and
 stumbles upon void[], the seemingly perfect
 abstract-binary-data-container type for the job... (which is exactly
 what happened with yours truly).
 
 P.S. Not trying to push my point of view, but just trying to offer
 some perspective from someone who has been bit by this design
 choice...

Hmm. Wouldn't compression data be naturally a ubyte[] type?

May 31 2009

BCS <none anon.com> writes:

Hello Walter,

 I'm curious what form of data you have that always seems to look like
 valid pointers. There are a couple other options you can pursue -
 moving the gc pool to another location in the address space, or
 changing the alignment of your void[] data so it won't look like
 aligned pointers (the gc won't look for misaligned pointers).
 

Most (but not all) of the cases I can think of where you get false pointers, 
re-aligning stuff or moving the heap won't help as the false pointer source 
will hit the full address space.

May 31 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Mon, 01 Jun 2009 00:28:21 +0300, Walter Bright <newshound1 digitalmars.com>
wrote:

 Vladimir Panteleev wrote:
 I just realized that by "performance" you might have meant memory
 leaks.

 No, in this context I meant improving performance by not scanning the  
 void[] memory for pointers.

 Well, sure, if you can say that my programs crashing every few
 hours due to running out of memory is a "performance" problem. I'm
 sorry to sound bitter, but this was the cause of much annoyance for
 my software's users. I had to write a memory debugger to
 understand that no matter how much you chase void[]s with
 hasNoPointers, there will always be that one ~ which you overlooked.

 I'm curious what form of data you have that always seems to look like
 valid pointers. There are a couple other options you can pursue - moving  
 the gc pool to another location in the address space, or changing the  
 alignment of your void[] data so it won't look like aligned pointers  
 (the gc won't look for misaligned pointers).

It's just compressed data, which is evenly distributed across the 32-bit
address space. Let's do the math:

Suppose we have an application which has two blocks of memory, M and N. Block M
is a block with random data which is erroneously marked as having pointers,
while block N is a block which shouldn't have any pointers towards it.
Now, the chance that a random DWORD will point inside N is
sizeof(N)/0x100000000 - or rather, we can say that it will NOT point inside N
with the probability of 1-(sizeof(N)/0x100000000). For as many DWORDs as there
are in M, raise that to the power sizeof(M)/4. For values already as small as 1
MB for M and N, it's pretty much guaranteed that you'll have pointers inside N.
Relocating or re-aligning the data won't help - it won't affect the entropy or
the value range.
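
As a worked instance of the estimate above, under the same assumption of independent, uniformly distributed 32-bit words, and taking both blocks to be 1 MiB (2^20 bytes):

```latex
\[
P(\text{no word of } M \text{ points into } N)
  = \left(1 - \frac{|N|}{2^{32}}\right)^{|M|/4}
  = \left(1 - 2^{-12}\right)^{2^{18}}
  \approx e^{-2^{18}/2^{12}}
  = e^{-64}
  \approx 1.6 \times 10^{-28}
\]
```

So at least one false pointer into N is practically certain at these sizes.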

 Or just use ubyte[] instead.

And the casts that come with it :(

 As much as I try to look from an objective perspective, I don't see
 how a memory leak (and memory leaks in D usually mean that NO memory
 is being freed, except for small lucky objects not having bogus
 pointers to them) is a problem less significant than an obscure case
 that involves allocating a void[], storing a pointer in it and losing
 all other references to the object.

 Because one is an obvious failure, and the other will be memory  
 corruption. Memory corruption is pernicious and awful.

It is, yes. But if you add "don't put your only references inside void[]s" to
the "don'ts" on the GC page, the programmer will only have himself to blame for
not reading the language documentation. This goes right along with other
tricks IMHO.

 In fact, I just searched the D
 documentation and I couldn't find a statement saying whether void[]
 are scanned by the GC or not. Enter mr. D-newbie, who wants to write
 his own network/compression/file-copying/etc. library/program and
 stumbles upon void[], the seemingly perfect
 abstract-binary-data-container type for the job... (which is exactly
 what happened with yours truly).
  P.S. Not trying to push my point of view, but just trying to offer
 some perspective from someone who has been bit by this design
 choice...

 Hmm. Wouldn't compression data be naturally a ubyte[] type?

That's a subjective opinion :) I could just as well continue arguing that
void[] is the perfect type for any kind of "opaque" binary data due to its
properties.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

May 31 2009

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Vladimir Panteleev wrote:
 That's a subjective opinion :) I could just as well continue arguing
 that void[] is the perfect type for any kind of "opaque" binary data
 due to its properties.

To argue that convincingly, you'd need to disable conversions from 
arrays of class objects to void[].

Andrei

May 31 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Mon, 01 Jun 2009 02:21:33 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 To argue that convincingly, you'd need to disable conversions from  
 arrays of class objects to void[].

You're right. Perhaps implicit cast of reference types to void[] should result
in an error.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

May 31 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

Vladimir Panteleev wrote:
 On Mon, 01 Jun 2009 02:21:33 +0300, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:
 
 To argue that convincingly, you'd need to disable conversions from  
 arrays of class objects to void[].

 
 You're right. Perhaps implicit cast of reference types to void[] should result
in an error.

If only there were a way to indicate that void[]s could contain
pointers, then they would behave uniformly across types...

Oh wait.

May 31 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Mon, 01 Jun 2009 00:28:21 +0300, Walter Bright <newshound1 digitalmars.com>
wrote:

 Because one is an obvious failure, and the other will be memory  
 corruption. Memory corruption is pernicious and awful.

I wanted to add that debugging memory corruptions and other memory problems for
D right now is complicated due to lack of proper tools in this area. Hopefully
this will change in the near future.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

May 31 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Mon, 01 Jun 2009 00:28:21 +0300, Walter Bright <newshound1 digitalmars.com>
wrote:

 Hmm. Wouldn't compression data be naturally a ubyte[] type?

(again, something I forgot to add... shouldn't hit Send so soon)

Consider this really basic example of file concatenation:

auto data = read("file1") ~ read("file2"); // oops! void[] concatenation - minefield created

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

May 31 2009

bearophile <bearophileHUGS lycos.com> writes:

Vladimir Panteleev:
 Consider this really basic example of file concatenation:
 auto data = read("file1") ~ read("file2"); // oops! void[] concatenation - minefield created

I think a better design for that read() function is to return ubyte[].
I have never understood why it returns a void[].
To manage generic data, ubyte[] is better than void[] in your program (sometimes
uint[] is useful to increase efficiency compared to ubyte[]).
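
A minimal sketch of that suggestion, written against present-day D2 Phobos rather than the 2009 library. Casting each read() result to ubyte[] before concatenating makes the result a ubyte[], so the new block should be allocated as one the GC does not scan. The helper name readBoth and the temporary file names are made up for the example.

```d
import std.file : read, remove, write;
import std.string : representation;

// Hypothetical helper: concatenate two files as pointer-free bytes.
ubyte[] readBoth(string first, string second)
{
    // read() returns void[]; cast each operand so the concatenation is typed ubyte[].
    return cast(ubyte[]) read(first) ~ cast(ubyte[]) read(second);
}

void main()
{
    // Scratch files, created only so the example is self-contained.
    write("file1.tmp", "abc");
    write("file2.tmp", "def");
    scope (exit)
    {
        remove("file1.tmp");
        remove("file2.tmp");
    }

    auto data = readBoth("file1.tmp", "file2.tmp");
    static assert(is(typeof(data) == ubyte[]));
    assert(data == "abcdef".representation);
}
```

The casts are the price of read() returning void[]; if it returned ubyte[], the plain concatenation would already have the desired type.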

Bye,
bearophile

May 31 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Sun, 31 May 2009 22:45:23 +0400, Vladimir Panteleev
<thecybershadow gmail.com> wrote:

 I just went through a ~15000-line project and replaced most occurrences  
 of void[]. Now the project is an ugly mess of void[], ubyte[] and casts,  
 but at least it doesn't leak memory like crazy any more.

 I don't know why it was decided to mark the contents of void[] as "might  
 have pointers". It makes no sense!

FWIW, I also consider void[] to be storage for arbitrary untyped binary data,
and thus I believe the GC shouldn't scan it.

While it is possible to prevent GC from scanning an arbitrary void[] array,
there is no reasonable way to prevent it from scanning all arrays.

It is a breaking change, but it could still be changed for D2. In 99% of cases
it is correct behavior (and a bug in the rest), but it reduces application
execution speed significantly.

++vote

May 31 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Sun, 31 May 2009 22:45:23 +0400, Vladimir Panteleev
<thecybershadow gmail.com> wrote:
 
 I just went through a ~15000-line project and replaced most occurrences  
 of void[]. Now the project is an ugly mess of void[], ubyte[] and casts,  
 but at least it doesn't leak memory like crazy any more.
  
 I don't know why it was decided to mark the contents of void[] as "might  
 have pointers". It makes no sense!
  

 
FWIW, I also consider void[] to be storage for arbitrary untyped binary data,
and thus I believe the GC shouldn't scan it.
Ignoring void[] arrays is correct behavior in 99% of cases (and a bug in the
rest), but it improves application execution speed significantly.
 
While it is possible to prevent GC from scanning an arbitrary void[] array,
there is no reasonable way to prevent it from scanning all arrays (without
modifying GC code).
 
It is a breaking change, but not too late for D2.

++vote

May 31 2009

Lionello Lunesu <lio lunesu.remove.com> writes:

Denis Koroskin wrote:
 On Sun, 31 May 2009 22:45:23 +0400, Vladimir Panteleev
<thecybershadow gmail.com> wrote:
  
 I just went through a ~15000-line project and replaced most occurrences  
 of void[]. Now the project is an ugly mess of void[], ubyte[] and casts,  
 but at least it doesn't leak memory like crazy any more.
  
 I don't know why it was decided to mark the contents of void[] as "might  
 have pointers". It makes no sense!
  

  
 FWIW, I also consider void[] as a storage for an arbitrary untyped binary
 data, and thus I believe GC shouldn't scan it.

You're contradicting yourself there. void[] is arbitrary untyped data, 
so it could contain uints, floats, bytes, pointers, arrays, strings, 
etc. or structs with any of those.

I think the current behavior is correct: ubyte[] is the new void*.

I also agree that std.file.read (and similar functions) should return 
ubyte[] instead of void[], to prevent surprises after concatenation.

L.

May 31 2009

Christopher Wright <dhasenan gmail.com> writes:

Lionello Lunesu wrote:
 Denis Koroskin wrote:
 On Sun, 31 May 2009 22:45:23 +0400, Vladimir Panteleev 
 <thecybershadow gmail.com> wrote:
  
 I just went through a ~15000-line project and replaced most 
 occurrences  of void[]. Now the project is an ugly mess of void[], 
 ubyte[] and casts,  but at least it doesn't leak memory like crazy 
 any more.
  
 I don't know why it was decided to mark the contents of void[] as 
 "might  have pointers". It makes no sense!
  

  
 FWIW, I also consider void[] as a storage for an arbitrary untyped binary
 data, and thus I believe GC shouldn't scan it.
 
 You're contradicting yourself there. void[] is arbitrary untyped data, 
 so it could contain uints, floats, bytes, pointers, arrays, strings, 
 etc. or structs with any of those.
 
 I think the current behavior is correct: ubyte[] is the new void*.

Even in C, people often use unsigned char* for arbitrary data that does 
not include pointers.

May 31 2009

grauzone <none example.net> writes:

 3) It's very rare in practice that the only pointer to your object (which you
still plan to access later) to be stored in a void[]-allocated array! Remember,
the properties of memory regions are determined when the memory is allocated,
so casting an array of structures to a void[] will not lose you that reference.
You'd need to move your pointer to a void[]-array (which you need to allocate
explicitly or, for example, concatenating your reference to the void[]), then
drop the reference to your original structure, for this to happen.

void[] = can contain pointers
ubyte[] = can not contain pointers

void[] just wraps void*, which is a low level type and can contain 
anything. Because of that, the conservative GC needs to scan it for 
pointers. ubyte[], on the other hand, contains sequences of 8 bit 
integers. For untyped binary data, ubyte[] is the most correct type.

You want to send it over network or write it into a file? Use ubyte[]. 
The data will never contain any pointers. You want to play low level 
tricks, that involve copying around arbitrary memory contents (like 
boxing, see std.boxer)? Use void[].

I think that's a good way to distinguish it.

You shouldn't cast structs or any other types to ubyte[], because the 
memory representation of those types is highly platform-specific. Structs 
can contain padding, integers are endian-dependent... If you want to 
convert these to binary data, write a marshaller. You _never_ want to do 
direct casts, because they're simply unportable. If you do the cast, you 
have to know what you're doing.

May 31 2009

BCS <none anon.com> writes:

Hello grauzone,

 You shouldn't cast structs or any other types to ubyte[], because the
 memory representation of those types is highly platform-specific.
 Structs can contain padding, integers are endian-dependent... If you
 want to convert these to binary data, write a marshaller. You _never_
 want to do direct casts, because they're simply unportable. If you do
 the cast, you have to know what you're doing.
 

Never say never. In some cases, like tmp files or whatnot where the same exe 
will save and load the file, there is never* any need for portability.

*"never" used intentionally :b.

May 31 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Sun, 31 May 2009 23:11:57 +0300, grauzone <none example.net> wrote:

 3) It's very rare in practice that the only pointer to your object  
 (which you still plan to access later) to be stored in a  
 void[]-allocated array! Remember, the properties of memory regions are  
 determined when the memory is allocated, so casting an array of  
 structures to a void[] will not lose you that reference. You'd need to  
 move your pointer to a void[]-array (which you need to allocate  
 explicitly or, for example, concatenating your reference to the  
 void[]), then drop the reference to your original structure, for this  
 to happen.

 void[] = can contain pointers
 ubyte[] = can not contain pointers

 void[] just wraps void*, which is a low level type and can contain  
 anything. Because of that, the conservative GC needs to scan it for  
 pointers. ubyte[], on the other hand, contains sequences of 8 bit  
 integers. For untyped binary data, ubyte[] is the most correct type.

 You want to send it over network or write it into a file? Use ubyte[].  
 The data will never contain any pointers. You want to play low level  
 tricks, that involve copying around arbitrary memory contents (like  
 boxing, see std.boxer)? Use void[].

std.boxer is actually a valid counter-example for my post.
The specific fix is simple: replace the void[] with void*[].
The generic "fix" is just to add a line to
http://www.digitalmars.com/d/garbage.html adding that hiding your only
reference in a void[] results in undefined behavior. I don't think this should
be an inconvenience to any projects?

 You shouldn't cast structs or any other types to ubyte[], because the  
 memory representation of those types is highly platform-specific. Structs  
 can contain padding, integers are endian-dependent... If you want to  
 convert these to binary data, write a marshaller. You _never_ want to do  
 direct casts, because they're simply unportable. If you do the cast, you  
 have to know what you're doing.

Thanks for the advice, but I actually know what I'm doing. Unlike C, D's
structure alignment rules are actually part of the specification. If I wanted
my programs to be safe/cross-platform/etc. regardless of execution speed, I'd
use a scripting or VM-ed language.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

May 31 2009

Christopher Wright <dhasenan gmail.com> writes:

Vladimir Panteleev wrote:
 std.boxer is actually a valid counter-example for my post.
 The specific fix is simple: replace the void[] with void*[].
 The generic "fix" is just to add a line to
http://www.digitalmars.com/d/garbage.html adding that hiding your only
reference in a void[] results in undefined behavior. I don't think this should
be an inconvenience to any projects?

What do you use for "may contain unaligned pointers"?

May 31 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Mon, 01 Jun 2009 05:28:39 +0300, Christopher Wright <dhasenan gmail.com>
wrote:

 Vladimir Panteleev wrote:
 std.boxer is actually a valid counter-example for my post.
 The specific fix is simple: replace the void[] with void*[].
 The generic "fix" is just to add a line to  
 http://www.digitalmars.com/d/garbage.html adding that hiding your only  
 reference in a void[] results in undefined behavior. I don't think this  
 should be an inconvenience to any projects?

 What do you use for "may contain unaligned pointers"?

Sorry, what do you mean? I don't understand why such a type is needed?
Implementing support for scanning memory ranges for unaligned pointers will
slow down the GC even more.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

May 31 2009

Christopher Wright <dhasenan gmail.com> writes:

Vladimir Panteleev wrote:
 On Mon, 01 Jun 2009 05:28:39 +0300, Christopher Wright <dhasenan gmail.com>
wrote:
 
 Vladimir Panteleev wrote:
 std.boxer is actually a valid counter-example for my post.
 The specific fix is simple: replace the void[] with void*[].
 The generic "fix" is just to add a line to  
 http://www.digitalmars.com/d/garbage.html adding that hiding your only  
 reference in a void[] results in undefined behavior. I don't think this  
 should be an inconvenience to any projects?

 What do you use for "may contain unaligned pointers"?

 
 Sorry, what do you mean? I don't understand why such a type is needed?
Implementing support for scanning memory ranges for unaligned pointers will
slow down the GC even more.

Because you can have a struct with align(1) that contains pointers. Then 
these pointers can be unaligned. Then an array of those structs cast to 
a void*[] would contain pointers, but as an optimization, the GC would 
consider the pointers in this array aligned because you tell it they are.
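
A minimal sketch of the layout being described, in present-day D2. The struct name Packed and the helper isPointerAligned are made up for the example, and the assumption that a collector only inspects pointer-aligned words is the premise of this sub-thread, not something the code checks.

```d
struct Packed
{
align(1):
    ubyte tag;   // 1 byte, so the next field starts at offset 1
    void* ptr;   // does not sit on a pointer-sized boundary
}

// True when `offset` is a multiple of the pointer size, i.e. a slot that a
// word-by-word scan starting at an aligned address would look at.
bool isPointerAligned(size_t offset)
{
    return offset % (void*).sizeof == 0;
}

static assert(Packed.ptr.offsetof == 1);

void main()
{
    assert(!isPointerAligned(Packed.ptr.offsetof));
    assert(isPointerAligned(0));
}
```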

Jun 01 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Mon, 01 Jun 2009 14:10:57 +0300, Christopher Wright <dhasenan gmail.com>
wrote:

 Vladimir Panteleev wrote:
 On Mon, 01 Jun 2009 05:28:39 +0300, Christopher Wright  
 <dhasenan gmail.com> wrote:

 Vladimir Panteleev wrote:
 std.boxer is actually a valid counter-example for my post.
 The specific fix is simple: replace the void[] with void*[].
 The generic "fix" is just to add a line to   
 http://www.digitalmars.com/d/garbage.html adding that hiding your  
 only  reference in a void[] results in undefined behavior. I don't  
 think this  should be an inconvenience to any projects?

 What do you use for "may contain unaligned pointers"?

  Sorry, what do you mean? I don't understand why such a type is needed?  
 Implementing support for scanning memory ranges for unaligned pointers  
 will slow down the GC even more.

 Because you can have a struct with align(1) that contains pointers. Then  
 these pointers can be unaligned. Then an array of those structs cast to  
 a void*[] would contain pointers, but as an optimization, the GC would  
 consider the pointers in this array aligned because you tell it they are.

The GC will not "see" unaligned pointers, regardless if they're in a struct or
void[] array. The GC doesn't know the type of the data it's scanning - it just
knows if it might contain pointers or it definitely doesn't contain pointers.

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Jun 01 2009

Christopher Wright <dhasenan gmail.com> writes:

Vladimir Panteleev wrote:
 On Mon, 01 Jun 2009 14:10:57 +0300, Christopher Wright <dhasenan gmail.com>
wrote:
 
 Vladimir Panteleev wrote:
 On Mon, 01 Jun 2009 05:28:39 +0300, Christopher Wright  
 <dhasenan gmail.com> wrote:

 Vladimir Panteleev wrote:
 std.boxer is actually a valid counter-example for my post.
 The specific fix is simple: replace the void[] with void*[].
 The generic "fix" is just to add a line to   
 http://www.digitalmars.com/d/garbage.html adding that hiding your  
 only  reference in a void[] results in undefined behavior. I don't  
 think this  should be an inconvenience to any projects?

 What do you use for "may contain unaligned pointers"?

  Sorry, what do you mean? I don't understand why such a type is needed?  
 Implementing support for scanning memory ranges for unaligned pointers  
 will slow down the GC even more.

 Because you can have a struct with align(1) that contains pointers. Then  
 these pointers can be unaligned. Then an array of those structs cast to  
 a void*[] would contain pointers, but as an optimization, the GC would  
 consider the pointers in this array aligned because you tell it they are.

 
 The GC will not "see" unaligned pointers, regardless if they're in a struct or
void[] array. The GC doesn't know the type of the data it's scanning - it just
knows if it might contain pointers or it definitely doesn't contain pointers.

Okay, so currently the GC doesn't do anything interesting with its type 
information. You're suggesting that that be enforced and codified.

Jun 01 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Tue, 02 Jun 2009 01:01:00 +0300, Christopher Wright <dhasenan gmail.com>
wrote:

 Vladimir Panteleev wrote:
 On Mon, 01 Jun 2009 14:10:57 +0300, Christopher Wright  
 <dhasenan gmail.com> wrote:

 Vladimir Panteleev wrote:
 On Mon, 01 Jun 2009 05:28:39 +0300, Christopher Wright   
 <dhasenan gmail.com> wrote:

 Vladimir Panteleev wrote:
 std.boxer is actually a valid counter-example for my post.
 The specific fix is simple: replace the void[] with void*[].
 The generic "fix" is just to add a line to    
 http://www.digitalmars.com/d/garbage.html adding that hiding your   
 only  reference in a void[] results in undefined behavior. I don't   
 think this  should be an inconvenience to any projects?

 What do you use for "may contain unaligned pointers"?

  Sorry, what do you mean? I don't understand why such a type is  
 needed?  Implementing support for scanning memory ranges for  
 unaligned pointers  will slow down the GC even more.

 Because you can have a struct with align(1) that contains pointers.  
 Then  these pointers can be unaligned. Then an array of those structs  
 cast to  a void*[] would contain pointers, but as an optimization, the  
 GC would  consider the pointers in this array aligned because you tell  
 it they are.

  The GC will not "see" unaligned pointers, regardless if they're in a  
 struct or void[] array. The GC doesn't know the type of the data it's  
 scanning - it just knows if it might contain pointers or it definitely  
 doesn't contain pointers.

 Okay, so currently the GC doesn't do anything interesting with its type  
 information. You're suggesting that that be enforced and codified.

I wasn't suggesting any GC modifications, I was just suggesting that void[]'s
TypeInfo "has pointers" flag be set to false.
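
For reference, a small sketch of the flag in question as it can be inspected in present-day druntime, where bit 0 of TypeInfo.flags is set for types that may contain pointers. The helper name mayHavePointers is made up, and the results reflect the current behavior discussed in this thread, not the proposed change.

```d
// Bit 0 of TypeInfo.flags is set when a type may contain pointers the GC must scan.
bool mayHavePointers(const TypeInfo ti)
{
    return (ti.flags & 1) != 0;
}

void main()
{
    assert(mayHavePointers(typeid(void)));    // element type of void[]
    assert(!mayHavePointers(typeid(ubyte)));  // element type of ubyte[]
    assert(mayHavePointers(typeid(void*)));   // element type of void*[]
}
```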

-- 
Best regards,
 Vladimir                          mailto:thecybershadow gmail.com

Jun 02 2009

Christopher Wright <dhasenan gmail.com> writes:

Vladimir Panteleev wrote:
 I wasn't suggesting any GC modifications, I was just suggesting that void[]'s
TypeInfo "has pointers" flag be set to false.

The suggestion was that void[] be used as ubyte[] currently is, and then 
to use void*[] to indicate an array of unknown type that may have pointers.

This works when all pointers are aligned, or when the garbage collector 
does not optimize in cases where a type is known not to contain 
unaligned pointers.

Alternatively, you can change the runtime to notify the GC on array 
copies so it can keep track of type information when you're avoiding the 
type system. But it's so easy to get around this by accident, it's not a 
reasonable solution (even if it could be made fast).

Jun 02 2009

Jarrett Billingsley <jarrett.billingsley gmail.com> writes:

On Tue, Jun 2, 2009 at 7:11 PM, Christopher Wright <dhasenan gmail.com> wrote:
 Vladimir Panteleev wrote:
 I wasn't suggesting any GC modifications, I was just suggesting that
 void[]'s TypeInfo "has pointers" flag be set to false.

 The suggestion was that void[] be used as ubyte[] currently is, and then to
 use void*[] to indicate an array of unknown type that may have pointers.

How do you have a void*[] point to a block of memory that is not a
multiple of (void*).sizeof?
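
The arithmetic behind that question, as a short sketch. The helper names wholePointers and leftoverBytes are made up for the example; it only counts bytes and does not attempt the cast itself.

```d
// How many complete void* elements fit in a block of `nbytes` bytes.
size_t wholePointers(size_t nbytes)
{
    return nbytes / (void*).sizeof;
}

// Bytes at the end of the block that no void*[] slice could cover.
size_t leftoverBytes(size_t nbytes)
{
    return nbytes % (void*).sizeof;
}

void main()
{
    ubyte[3] tiny = [1, 2, 3];                 // e.g. a 3-byte file
    assert(wholePointers(tiny.length) == 0);   // not even one element
    assert(leftoverBytes(tiny.length) == 3);   // all 3 bytes are unreachable
}
```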

Jun 02 2009

Christopher Wright <dhasenan gmail.com> writes:

Jarrett Billingsley wrote:
 On Tue, Jun 2, 2009 at 7:11 PM, Christopher Wright <dhasenan gmail.com> wrote:
 Vladimir Panteleev wrote:
 I wasn't suggesting any GC modifications, I was just suggesting that
 void[]'s TypeInfo "has pointers" flag be set to false.

 The suggestion was that void[] be used as ubyte[] currently is, and then to
 use void*[] to indicate an array of unknown type that may have pointers.

 
 How do you have a void*[] point to a block of memory that is not a
 multiple of (void*).sizeof?

Another good point. Or how do you index it by byte?

Jun 03 2009

bearophile <bearophileHUGS lycos.com> writes:

Christopher Wright:
 Another good point. Or how do you index it by byte?

How can you read & write files of 3 bytes if voids are 4 bytes long chunks? :o)
I don't understand. I want to read and write files byte-by-byte.

Bye,
bearophile

Jun 03 2009

Christopher Wright <dhasenan gmail.com> writes:

bearophile wrote:
 Christopher Wright:
 Another good point. Or how do you index it by byte?

 
 How can you read & write files of 3 bytes if voids are 4 bytes long chunks?
:o) I don't understand. I want to read and write files byte-by-byte.
 
 Bye,
 bearophile

Vladimir was suggesting that void[] be the same as ubyte[] and that you 
use void*[] if you might include a pointer. So that use case would be safe.

Jun 03 2009

Daniel Keep <daniel.keep.lists gmail.com> writes:

Christopher Wright wrote:
 bearophile wrote:
 Christopher Wright:
 Another good point. Or how do you index it by byte?

 How can you read & write files of 3 bytes if voids are 4 bytes long
 chunks? :o) I don't understand. I want to read and write files
 byte-by-byte.

 Bye,
 bearophile

 
 Vladimir was suggesting that void[] be the same as ubyte[] and that you
 use void*[] if you might include a pointer. So that use case would be safe.

How would you generically store the bits of this, then?

struct Gotcha { void* ptr; ubyte boo; }

Jun 04 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Thu, 04 Jun 2009 05:10:17 +0300, Christopher Wright  
<dhasenan gmail.com> wrote:

 bearophile wrote:
 Christopher Wright:
 Another good point. Or how do you index it by byte?

  How can you read & write files of 3 bytes if voids are 4 bytes long  
 chunks? :o) I don't understand. I want to read and write files  
 byte-by-byte.
  Bye,
 bearophile

 Vladimir was suggesting that void[] be the same as ubyte[] and that you  
 use void*[] if you might include a pointer. So that use case would be  
 safe.

Actually, I think Andrei's idea is better (to allow implicit casting  
arrays of non-reference types to const(ubyte)[]). It introduces an  
abstract no-pointers type, but still allows implicit casting to "might  
have pointers".

-- 
Best regards,
  Vladimir                          mailto:thecybershadow gmail.com

Jun 04 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Thu, 04 Jun 2009 22:16:42 +0400, Vladimir Panteleev  
<thecybershadow gmail.com> wrote:

 On Thu, 04 Jun 2009 05:10:17 +0300, Christopher Wright  
 <dhasenan gmail.com> wrote:

 bearophile wrote:
 Christopher Wright:
 Another good point. Or how do you index it by byte?

  How can you read & write files of 3 bytes if voids are 4 bytes long  
 chunks? :o) I don't understand. I want to read and write files  
 byte-by-byte.
  Bye,
 bearophile

 Vladimir was suggesting that void[] be the same as ubyte[] and that you  
 use void*[] if you might include a pointer. So that use case would be  
 safe.

 Actually, I think Andrei's idea is better (to allow implicit casting  
 arrays of non-reference types to const(ubyte)[]). It introduces an  
 abstract no-pointers type, but still allows implicit casting to "might  
 have pointers".

There is a pitfall: should arrays of non-reference types be  
implicitly castable to const(byte)[] or const(ubyte)[]?

Should const(byte)[] also be implicitly castable to const(ubyte)[] (or  
vice versa)?

Jun 04 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Thu, 04 Jun 2009 21:31:07 +0300, Denis Koroskin <2korden gmail.com>  
wrote:

 On Thu, 04 Jun 2009 22:16:42 +0400, Vladimir Panteleev  
 <thecybershadow gmail.com> wrote:

 On Thu, 04 Jun 2009 05:10:17 +0300, Christopher Wright  
 <dhasenan gmail.com> wrote:

 bearophile wrote:
 Christopher Wright:
 Another good point. Or how do you index it by byte?

  How can you read & write files of 3 bytes if voids are 4 bytes long  
 chunks? :o) I don't understand. I want to read and write files  
 byte-by-byte.
  Bye,
 bearophile

 Vladimir was suggesting that void[] be the same as ubyte[] and that  
 you use void*[] if you might include a pointer. So that use case would  
 be safe.

 Actually, I think Andrei's idea is better (to allow implicit casting  
 arrays of non-reference types to const(ubyte)[]). It introduces an  
 abstract no-pointers type, but still allows implicit casting to "might  
 have pointers".

 There is a pitfall: should arrays of non-reference types be  
 implicitly castable to const(byte)[] or const(ubyte)[]?

 Should const(byte)[] also be implicitly castable to const(ubyte)[] (or  
 vice versa)?

I don't see why you'd want to work with arrays of signed bytes. It doesn't  
make sense to allow implicit casting between the two; the programmer  
should just pick one and stick with it. I think unsigned makes more sense.

-- 
Best regards,
  Vladimir                          mailto:thecybershadow gmail.com

Jun 05 2009

BCS <none anon.com> writes:

Hello Vladimir,

 I don't see why you'd want to work with arrays of signed bytes.

I can think of a number of cases where I would expect numbers to be in a 
range like [-20,+20], for instance, delta of small integral value or golf 
scores relative to par.

Jun 05 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Fri, 05 Jun 2009 10:15:11 +0300, BCS <none anon.com> wrote:

 Hello Vladimir,

 I don't see why you'd want to work with arrays of signed bytes.

 I can think of a number of cases where I would expect numbers to be in a  
 range like [-20,+20], for instance, delta of small integral value or  
 golf scores relative to par.

Yes, but how is this related to abstracting data types to a generic type  
that can be used for stuff like buffering or networking?

-- 
Best regards,
  Vladimir                          mailto:thecybershadow gmail.com

Jun 05 2009

BCS <none anon.com> writes:

Hello Vladimir,

 On Fri, 05 Jun 2009 10:15:11 +0300, BCS <none anon.com> wrote:
 
 Hello Vladimir,
 
 I don't see why you'd want to work with arrays of signed bytes.
 

 I can think of a number of cases where I would expect numbers to be
 in a  range like [-20,+20], for instance, delta of small integral
 value or  golf scores relative to par.
 

 Yes, but how is this related to abstracting data types to a generic
 type  that can be used for stuff like buffering or networking?
 

It's not and that's the point. The point is there are uses for 8-bit signed 
integer values other than as raw data. I might have read your comment out 
of context but it seemed you were saying there is no use for the signed byte 
type.

Jun 05 2009

"Vladimir Panteleev" <thecybershadow gmail.com> writes:

On Fri, 05 Jun 2009 20:16:08 +0300, BCS <none anon.com> wrote:

 Hello Vladimir,

 On Fri, 05 Jun 2009 10:15:11 +0300, BCS <none anon.com> wrote:

 Hello Vladimir,

 I don't see why you'd want to work with arrays of signed bytes.

 I can think of a number of cases where I would expect numbers to be
 in a  range like [-20,+20], for instance, delta of small integral
 value or  golf scores relative to par.

 Yes, but how is this related to abstracting data types to a generic
 type  that can be used for stuff like buffering or networking?

 It's not and that's the point. The point is there are uses for 8-bit  
 signed integer values other than as raw data. I might have read your  
 comment out of context but it seemed you were saying there is no use for  
 the signed byte type.

Oh yes; I was definitely not suggesting removing byte[] from the language.  
<insidejoke namespace="#d">I'm sure he wouldn't be pleased one bit if we  
did that! :P</insidejoke>

-- 
Best regards,
  Vladimir                          mailto:thecybershadow gmail.com

Jun 05 2009

Derek Parnell <derek psych.ward> writes:

On Fri, 5 Jun 2009 07:15:11 +0000 (UTC), BCS wrote:

 Hello Vladimir,
 
 I don't see why you'd want to work with arrays of signed bytes.

 
 I can think of a number of cases where I would expect numbers to be in a 
 range like [-20,+20], for instance, delta of small integral value or golf 
 scores relative to par.

Or sound wave sample points [-127, 127] 

-- 
Derek Parnell
Melbourne, Australia
skype: derek.j.parnell

Jun 05 2009

BCS <none anon.com> writes:

Hello Vladimir,

 I just went through a ~15000-line project and replaced most
 occurrences of void[]. Now the project is an ugly mess of void[],
 ubyte[] and casts, but at least it doesn't leak memory like crazy any
 more.
 
 I don't know why it was decided to mark the contents of void[] as
 "might have pointers". It makes no sense! Consider:
 
 2) Despite that void[] is "typeless", you can still operate on it -
 namely, slice and concatenate them. Pass a void[] to a network send()
 function - how much did you send? Half the buffer? No problem, slice
 it away and store the rest - and no casts.
 
 3) It's very rare in practice that the only pointer to your object
 (which you still plan to access later) to be stored in a
 void[]-allocated array! Remember, the properties of memory regions are
 determined when the memory is allocated, so casting an array of
 structures to a void[] will not lose you that reference. You'd need to
 move your pointer to a void[]-array (which you need to allocate
 explicitly or, for example, concatenating your reference to the
 void[]), then drop the reference to your original structure, for this
 to happen.
 

I think the idea is that void[] is the most general data type; it can be 
anything, including pointers. 

Also for a real world use case where void[]=mightHavePointers is valid, consider 
a system that reads blocks of data structures from a file and then does in-place 
substitution from file references to memory references. You can't allocate 
buffers of the correct type because you may not even know what that is until 
you have already loaded the data.


 Here's a simple naive implementation of a buffer:
 
 void[] buffer;
 void queue(void[] data)
 {
     buffer ~= data;
 }
 ...
 queue([1,2,3][]);
 queue("Hello, World!");
 No casts! So simple and beautiful. However, should you use this
 pattern to work with larger amounts of data with a high entropy, the
 "minefield" effect will cause the GC to stop collecting most data.
 Sure, you can call std.gc.hasNoPointers, but you need to do it after
 every single concatenation... and it makes expressions with more than
 one concatenation unsafe.

Yes, when data is being copied into void[] from another type[] it is reasonable 
to ignore pointers but as above, going the other way (IMHO the /common/ case) 
it's not so easy.

 
 I heard that Tango copies over the properties of arrays when they are
 reallocated, which helps but solves the problem only partially.
 
 So, I ask you: is there actually code out there that depends on the
 way void[] works right now? I brought up this argument a year or so
 ago on IRC, and there were people who defended ferociously the current
 design using idealisms ("it should work like what it sounds like, it
 should contain any type" or something like that), but I've yet to see
 a practical argument.

I think that void[] should be left as is but I'm almost ready to throw in 
with the idea that we **need** another type that has the no-cast parts of 
void[] but assume no pointers as well.
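
One way to approximate that today, as a sketch in present-day D2 and not a proposal from the thread: accept const(void)[] so callers need no casts, but store the bytes in a ubyte[] so the growing buffer is typed as pointer-free. The names buffer and queue follow the quoted example; everything else is an assumption.

```d
ubyte[] buffer;

// Callers pass any array without a cast; the single cast lives here.
void queue(const(void)[] data)
{
    buffer ~= cast(const(ubyte)[]) data;
}

void main()
{
    int[] numbers = [1, 2, 3];
    queue(numbers);
    queue("Hello, World!");

    // Three ints plus the 13 characters of the string.
    assert(buffer.length == 3 * int.sizeof + 13);
}
```

This only suits data that really has no pointers: anything queued this way is copied as raw bytes, and a pointer hidden in it would not keep its target alive.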

May 31 2009

"Denis Koroskin" <2korden gmail.com> writes:

On Mon, 01 Jun 2009 00:53:02 +0400, BCS <none anon.com> wrote:

 Hello Vladimir,

 I just went through a ~15000-line project and replaced most
 occurrences of void[]. Now the project is an ugly mess of void[],
 ubyte[] and casts, but at least it doesn't leak memory like crazy any
 more.
  I don't know why it was decided to mark the contents of void[] as
 "might have pointers". It makes no sense! Consider:
  2) Despite that void[] is "typeless", you can still operate on it -
 namely, slice and concatenate them. Pass a void[] to a network send()
 function - how much did you send? Half the buffer? No problem, slice
 it away and store the rest - and no casts.
  3) It's very rare in practice that the only pointer to your object
 (which you still plan to access later) to be stored in a
 void[]-allocated array! Remember, the properties of memory regions are
 determined when the memory is allocated, so casting an array of
 structures to a void[] will not lose you that reference. You'd need to
 move your pointer to a void[]-array (which you need to allocate
 explicitly or, for example, concatenating your reference to the
 void[]), then drop the reference to your original structure, for this
 to happen.

 I think the idea is that void[] is the most general data type; it can be  
 anything, including pointers.  
 Also for a real world use case where void[]=mightHavePointers is valid,  
 consider a system that reads blocks of data structures from a file and  
 then does in-place substitution from file references to memory references.  
 You can't allocate buffers of the correct type because you may not even  
 know what that is until you have already loaded the data.

In this case you should *explicitly* mark that void[] array as
"mightHavePointers".
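
For readers on current D2, the opt-in marking suggested here could look roughly like the sketch below. It assumes druntime's core.memory module; the helper names allocRaw and markMightHavePointers are hypothetical and are not part of any library.

```d
import core.memory : GC;

/// Hypothetical helper: a raw buffer the GC does not scan by default.
void[] allocRaw(size_t n)
{
    return GC.malloc(n, GC.BlkAttr.NO_SCAN)[0 .. n];
}

/// Hypothetical helper: opt a buffer back in to scanning, e.g. before
/// file references in it are replaced by memory references.
void markMightHavePointers(void[] buf)
{
    // clrAttr ignores interior pointers, so resolve the block base first.
    void* base = GC.addrOf(buf.ptr);
    if (base !is null)
        GC.clrAttr(base, GC.BlkAttr.NO_SCAN);
}

unittest
{
    void[] block = allocRaw(1024);
    assert(GC.query(block.ptr).attr & GC.BlkAttr.NO_SCAN);

    markMightHavePointers(block);
    assert(!(GC.query(block.ptr).attr & GC.BlkAttr.NO_SCAN));
}
```

Built with -unittest -main, this exercises both helpers. The marking has to happen before the first real pointer is written into the buffer, otherwise a collection in between could free the target.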

May 31 2009

MLT <none anon.com> writes:

Walter Bright Wrote:

 Vladimir Panteleev wrote:
 I don't know why it was decided to mark the contents of void[] as
 "might have pointers". It makes no sense! Consider:

 
 [...]
 
 3) It's very rare in practice that the only pointer to your
 object (which you still plan to access later) to be stored in a
 void[]-allocated array!

 
 Rare or common, it still would be a nasty bug lurking to catch someone. 
 The default behavior in D should be to be correct code. Doing 
 potentially unsafe things to improve performance should require extra 
 effort - in this case it would be either using the gc function to mark 
 the memory as not containing pointers, or storing them as ubyte[] instead.

As quite a newbie, I can sum up what I understood as follows:

1. The idea of void[] is that you can put anything in it without casting. 
2. Because of this, you might put pointers in a void[].
3. Since you have "legitimately" stored pointers, and we don't want to have the
GC throw away something that we still have valid pointers for, we have to have
the GC scan over void[] arrays for possible hits.

4. This pretty much means that any "big"(*) D program can not afford to put
uniformly distributed data in a void[] array, because the GC will stop working
correctly - it will not dispose of stuff that you don't need any more.
(*) where "big" means a program that creates and destroys a lot of objects.

So, currently if you want to use void[] to store non-pointers, you need to use
the gc function to mark the memory as not containing pointers.

A comment and a question. I agree that suddenly losing data because you stored
a pointer in a void[] is worse than GC not working well. However, since GC in D
is so automatic, almost any use of void[] to store non-pointer data will cause
massive memory leaks and eventual program failure. 

I can see 4 solutions...

First, to not allow non-pointers to be stored in void[]. So non-pointers are
stored in ubyte[], pointers in void[]. Kinda loses the main point of using
void[].

Second, void[] is not scanned by GC, but you can mark it to be. This can cause
bugs if you store a pointer in void[], and later retrieve it, but don't mark
correctly.

Third, void[] is scanned by GC,  but you can mark it not to be. This can cause
memory leaks if you store complex data in void[] in a big program, and don't
handle GC marking correctly.

Fourth - somewhat more complex. Since the compiler knows exactly when a pointer
is stored in a void[] and when not, it would be possible to have the compiler
handle all by itself, as long as the property of having to be scanned by GC is
dirty - once a variable has it, any other that touches that variable gets the
property.

Of these four solutions, the last 3 can still cause bugs if one stores both
pointers and data in the same void[] array, no matter how the memory is marked,
unless one does that marking on a very fine scale (is that possible?)

My conclusion from all this is either "don't use void[]", or "only use void[]
to store pointers" if you don't want bugs in a valid program.

Jun 03 2009

Christopher Wright <dhasenan gmail.com> writes:

MLT wrote:
 Walter Bright Wrote:
 
 Vladimir Panteleev wrote:
 I don't know why it was decided to mark the contents of void[] as
 "might have pointers". It makes no sense! Consider:

 [...]

 3) It's very rare in practice that the only pointer to your
 object (which you still plan to access later) to be stored in a
 void[]-allocated array!

 Rare or common, it still would be a nasty bug lurking to catch someone. 
 The default behavior in D should be to be correct code. Doing 
 potentially unsafe things to improve performance should require extra 
 effort - in this case it would be either using the gc function to mark 
 the memory as not containing pointers, or storing them as ubyte[] instead.

 
 As quite a newbie, I can sum up what I understood as follows:
 
 1. The idea of void[] is that you can put anything in it without casting. 
 2. Because of this, you might put pointers in a void[].
 3. Since you have "legitimately" stored pointers, and we don't want to have
the GC throw away something that we still have valid pointers for, we have to
have the GC scan over void[] arrays for possible hits.
 
 4. This pretty much means that any "big"(*) D program can not afford to put
uniformly distributed data in a void[] array, because the GC will stop working
correctly - it will not dispose of stuff that you don't need any more.
 (*) where "big" means a program that creates and destroys a lot of objects.
 
 So, currently if you want to use void[] to store non-pointers, you need to use
the gc function to mark the memory as not containing pointers.
 
 A comment and a question. I agree that suddenly losing data because you stored
a pointer in a void[] is worse than GC not working well. However, since GC in D
is so automatic, almost any use of void[] to store non-pointer data will cause
massive memory leaks and eventual program failure. 

First, this is no problem if you are merely aliasing an existing array. 
In order for it to be an issue, you must copy from some array to a 
void[] -- for instance, appending to an existing void[], or .dup'ing a 
void[] alias. (While a GC could work around the latter case, it would be 
unsafe -- you can append something with pointers to a void[] copy of an 
int[].)
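
A minimal sketch of the aliasing case, assuming current D2 druntime (core.memory); the helper name isNoScan is made up for illustration. Converting an int[] to void[] only creates a new view of the same block, so whatever scan attribute the block got at allocation time is what the GC sees through either slice.

```d
import core.memory : GC;

/// True if the GC block that holds `p` is flagged as pointer-free.
bool isNoScan(void* p)
{
    // GC.query reports the attributes of the block containing p.
    return (GC.query(p).attr & GC.BlkAttr.NO_SCAN) != 0;
}

unittest
{
    int[] ints = new int[](256);   // allocated as int[]
    void[] view = ints;            // implicit conversion: same memory, no copy

    assert(view.ptr is ints.ptr);
    // The alias cannot change how the block was allocated.
    assert(isNoScan(view.ptr) == isNoScan(ints.ptr));
}
```

The attribute belongs to the allocation, not to the static type of the slice, which is why only allocating through void[] (new, append, .dup) is where the scanning cost comes from.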

 I can see 4 solutions...
 
 First, to not allow non-pointers to be stored in void[]. So non-pointers are
stored in ubyte[], pointers in void[]. Kinda loses the main point of using
void[].
 
 Second, void[] is not scanned by GC, but you can mark it to be. This can cause
bugs if you store a pointer in void[], and later retrieve it, but don't mark
correctly.

This is an unsafe option.

 Third, void[] is scanned by GC,  but you can mark it not to be. This can cause
memory leaks if you store complex data in void[] in a big program, and don't
handle GC marking correctly.

This is already available. If you know your array doesn't have pointers, 
you can call GC.hasNoPointers(array.ptr).

This is a safe option.
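
As a sketch only: in current D2 druntime the closest equivalent I know of is GC.setAttr with GC.BlkAttr.NO_SCAN from core.memory. The wrapper name markNoPointers below is hypothetical.

```d
import core.memory : GC;

/// Tell the GC not to scan the block behind `buf` for pointers.
/// Only correct when the block truly holds no references.
void markNoPointers(void[] buf)
{
    // setAttr ignores interior pointers, so resolve the block base first.
    void* base = GC.addrOf(buf.ptr);
    if (base !is null)
        GC.setAttr(base, GC.BlkAttr.NO_SCAN);
}

unittest
{
    void[] packet = new void[](4096);   // e.g. compressed or network data
    markNoPointers(packet);
    assert(GC.query(packet.ptr).attr & GC.BlkAttr.NO_SCAN);
}
```

The flag applies to the whole GC block, so every slice of that allocation is affected, not just the one passed in.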

 Fourth - somewhat more complex. Since the compiler knows exactly when a pointer
is stored in a void[] and when not, it would be possible to have the compiler
handle all by itself, as long as the property of having to be scanned by GC is
dirty - once a variable has it, any other that touches that variable gets the
property.

This isn't really the case unless you get some really invasive whole 
program analysis (not available with D's compilation model, or if you 
want to interact with code written in other languages, or if you want to 
do runtime dynamic linking) or a really invasive runtime (think of 
calling a method every time you access an array).

In point of fact, that's not going to be enough. You need to call the 
runtime with every assignment, since you might be passing individual 
ubytes around when they're part of a pointer and reassembling them 
somewhere else.

 Of these four solutions, the last 3 can still cause bugs if one stores both
pointers and data in the same void[] array, no matter how the memory is marked,
unless one does that marking on a very fine scale (is that possible?)

struct S
{
	int i;
	int* j;
}

You're screwed.
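
To make that concrete, here is a small sketch assuming std.traits from Phobos and core.memory from druntime in current D2. The scan flag is per allocation, so a single pointer field forces the whole array to be scanned, integers included.

```d
import core.memory : GC;
import std.traits : hasIndirections;

struct S
{
    int i;
    int* j;
}

// One pointer field is enough for the type to count as holding references,
static assert(hasIndirections!S);
// while a plain int does not.
static assert(!hasIndirections!int);

unittest
{
    S[] items = new S[](8);
    // The block is not flagged NO_SCAN, so the int fields get scanned
    // conservatively along with the pointers.
    assert(!(GC.query(items.ptr).attr & GC.BlkAttr.NO_SCAN));
}
```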

 My conclusion from all this is either "don't use void[]", or "only use void[]
to store pointers" if you don't want bugs in a valid program.

Not bugs, but potential performance issues. And the advice should be 
"don't allocate void[]", to split hairs.

Jun 03 2009
