digitalmars.D - Setting array length without initializing/reallocating.

Jonathan Levi (8/8) Dec 11 2020 Wow, there went several hours of debugging.

H. S. Teoh (7/18) Dec 11 2020 I highly recommend reading the following article if you work with D
Kagamin (2/2) Dec 12 2020 Yes, pointers are the only unsafe way to access memory, slices
Sönke Ludwig (4/15) Dec 12 2020 One way around this is to call `array.assumeSafeAppend();` before
Bastiaan Veelo (12/20) Dec 12 2020 Hold on -- how does this not corrupt memory? As soon as the
Mike Parker (4/12) Dec 12 2020 You're setting yourself up for failure with that. What are you

Jackson22 (3/19) Dec 13 2020 How is avoiding an expensive potentially memory leaking operation

rikki cattermole (5/13) Dec 13 2020 No bounds checking. That slice can extend into memory that isn't of that...

Jackson22 (10/25) Dec 13 2020 No *automatic* bounds checking != no bounds checking.

rikki cattermole (21/46) Dec 13 2020 I have used it in the past where appropriate with 0 issues resulting
Dukc (12/22) Dec 13 2020 Yes it's possible to without automatic bounds checks. Sometimes

Dukc (2/3) Dec 13 2020 Meant: Yes it's possible to live without automatic bounds checks.

Mike Parker (19/35) Dec 13 2020 "avoiding an expensive potentially memory leaking operation" is
Mike Parker (2/6) Dec 13 2020 There's no pretending here. What the OP is doing *is* dangerous.

Jackson22 (7/15) Dec 14 2020 If someone writes a wrapper around .ptr which checks. It'd be

Max Haughton (11/29) Dec 14 2020 Good practice is good practice. If you know what you're doing you
Steven Schveighoffer (10/27) Dec 14 2020 It's possible you have misinterpreted what the OP is asking for.

Paul Backus (4/6) Dec 14 2020 Though doing it correctly may be harder than you'd think:

Mike Parker (15/29) Dec 14 2020 Of course. I'm not arguing otherwise. I don't see that anyone

Dukc (30/38) Dec 13 2020 There is a big downside in doing that: the array will not check

Dukc (4/11) Dec 13 2020 Okay, there is a bug in my code that it won't work if `arr` is

Steven Schveighoffer (36/47) Dec 13 2020 Lots of good responses to a mostly ambiguous message.
Ola Fosheim Grostad (7/15) Dec 15 2020 D and Go have both messed up the concept of a view of an array

Jonathan Levi <catanscout gmail.com> writes:

Wow, there went several hours of debugging.

Increasing the length of a slice, by setting its length, will 
initialize the new elements and reallocate if necessary.

I did not realize length was "smart", I guess I should have 
guessed.

Anyway, to work around this, and probably also be more clear, 
create a new slice from the same pointer.

`array = array.ptr[0..newLength];`
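
The behaviour described above can be observed with a minimal sketch. The helper name `growsInPlace` is hypothetical and only for illustration; whether the runtime extends in place or reallocates depends on the block the slice lives in, so the program reports it rather than assuming it.

```d
import std.stdio : writeln;

/// Grows `a` by assigning to `.length`; returns true if the data stayed in place.
/// (Hypothetical helper, for illustration only.)
bool growsInPlace(ref int[] a, size_t newLength)
{
    auto before = a.ptr;
    a.length = newLength;   // may reallocate, and always initializes the new tail
    return a.ptr is before;
}

void main()
{
    int[] a = [1, 2, 3];
    immutable inPlace = growsInPlace(a, 1000);
    assert(a[0 .. 3] == [1, 2, 3]);      // old contents are preserved
    assert(a[3] == 0 && a[999] == 0);    // new elements are set to int.init
    writeln(inPlace ? "extended in place" : "reallocated");
}
```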

Dec 11 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Sat, Dec 12, 2020 at 12:53:09AM +0000, Jonathan Levi via Digitalmars-d wrote:
 Wow, there went several hours of debugging.
 
 Increasing the length of a slice, by setting its length, will
 initialize the new elements and reallocate if necessary.
 
 I did not realize length was "smart", I guess I should have guessed.
 
 Anyway, to work around this, and probably also be more clear, create a
 new slice from the same pointer.
 
 `array = array.ptr[0..newLength];`

I highly recommend reading the following article if you work with D
arrays in any non-trivial way:

	https://dlang.org/articles/d-array-article.html


T

-- 
Fact is stranger than fiction.

Dec 11 2020

Kagamin <spam here.lot> writes:

Yes, pointers are the only unsafe way to access memory, slices 
don't allow it.

Dec 12 2020

Sönke Ludwig <sludwig outerproduct.org> writes:

On 12.12.2020 at 01:53, Jonathan Levi wrote:
 Wow, there went several hours of debugging.
 
 Increasing the length of a slice, by setting its length, will initialize 
 the new elements and reallocate if necessary.
 
 I did not realize length was "smart", I guess I should have guessed.
 
 Anyway, to work around this, and probably also be more clear, create a 
 new slice from the same pointer.
 
 `array = array.ptr[0..newLength];`

One way around this is to call `array.assumeSafeAppend();` before 
setting the new length. In this case it will reuse the already allocated 
block as long as it is large enough and only reallocate if necessary.
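
A minimal sketch of that suggestion, assuming a GC-allocated `int[]` that was shrunk earlier and is now grown again. The helper name `shrinkForReuse` is hypothetical. Note that this only avoids the reallocation: `.length` still initializes the regrown elements, and any other slice that still refers to the old tail will see it overwritten.

```d
/// Shrinks `buf` to `keep` elements and tells the runtime the tail may be reused.
/// (Hypothetical helper, for illustration only.)
int[] shrinkForReuse(int[] buf, size_t keep)
{
    buf = buf[0 .. keep];      // shrinking never reallocates
    buf.assumeSafeAppend();    // the block's used size now ends at `keep`
    return buf;
}

void main()
{
    auto buf = new int[](8);
    auto start = buf.ptr;

    buf = shrinkForReuse(buf, 2);
    assert(buf.capacity >= 8);   // the whole block is available again

    buf.length = 8;              // fits in the block, so no reallocation
    assert(buf.ptr is start);
}
```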

Dec 12 2020

Bastiaan Veelo <Bastiaan Veelo.net> writes:

On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi 
wrote:
 Wow, there went several hours of debugging.

 Increasing the length of a slice, by setting its length, will 
 initialize the new elements and reallocate if necessary.

 I did not realize length was "smart", I guess I should have 
 guessed.

 Anyway, to work around this, and probably also be more clear, 
 create a new slice from the same pointer.

 `array = array.ptr[0..newLength];`

Hold on -- how does this not corrupt memory? As soon as the 
length exceeds the allocated capacity (the point at which the 
slice would be reallocated when setting its length) you will have 
a silent out of bounds violation, identical to overflowing a C 
array.

Am I wrong??

If you do not want the expansion to be initialized, I guess you 
could allocate a new uninitialized slice and copy contents over 
explicitly.
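
A rough sketch of that idea, using `std.array.uninitializedArray` for the allocation. The function name `grownUninitialized` is hypothetical, and the sketch assumes plain data elements (no destructors or postblits), since the tail is left with garbage values.

```d
import std.array : uninitializedArray;

/// Returns a new array of `newLength` elements whose first `old.length`
/// elements are copied from `old`; the rest are left uninitialized.
/// (Hypothetical helper; intended for plain data types only.)
T[] grownUninitialized(T)(T[] old, size_t newLength)
{
    assert(newLength >= old.length);
    auto fresh = uninitializedArray!(T[])(newLength);
    fresh[0 .. old.length] = old[];   // bounds-checked copy of the old contents
    return fresh;
}

void main()
{
    int[] a = [1, 2, 3];
    a = grownUninitialized(a, 5);
    assert(a.length == 5);
    assert(a[0 .. 3] == [1, 2, 3]);   // a[3] and a[4] hold unspecified values
}
```

This still allocates; it only skips the initialization of the new elements.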

--Bastiaan.

Dec 12 2020

Mike Parker <aldacron gmail.com> writes:

On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi 
wrote:
 Wow, there went several hours of debugging.

 Increasing the length of a slice, by setting its length, will 
 initialize the new elements and reallocate if necessary.

 I did not realize length was "smart", I guess I should have 
 guessed.

 Anyway, to work around this, and probably also be more clear, 
 create a new slice from the same pointer.

 `array = array.ptr[0..newLength];`

You're setting yourself up for failure with that. What are you 
trying to "work around"? The allocation, or the initialization?

Dec 12 2020

Jackson22 <jack.sonof gmail.com> writes:

On Saturday, 12 December 2020 at 14:12:06 UTC, Mike Parker wrote:
 On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi 
 wrote:
 Wow, there went several hours of debugging.

 Increasing the length of a slice, by setting its length, will 
 initialize the new elements and reallocate if necessary.

 I did not realize length was "smart", I guess I should have 
 guessed.

 Anyway, to work around this, and probably also be more clear, 
 create a new slice from the same pointer.

 `array = array.ptr[0..newLength];`

 You're setting yourself up for failure with that. What are you 
 trying to "work around"? The allocation, or the initialization?

How is avoiding an expensive potentially memory leaking operation 
"setting yourself up for failure"?

Dec 13 2020

rikki cattermole <rikki cattermole.co.nz> writes:

On 14/12/2020 6:01 AM, Jackson22 wrote:
 `array = array.ptr[0..newLength];`

 You're setting yourself up for failure with that. What are you trying 
 to "work around"? The allocation, or the initialization?

 
 How is avoiding an expensive potentially memory leaking operation 
 "setting yourself up for failure"?

No bounds checking. That slice can extend into memory that isn't of that 
type or even allocated to the process.

By avoiding that "expensive" memory operation, you instead create a 
silent memory corruption in its place which is far worse.

Dec 13 2020

Jackson22 <jack.sonof gmail.com> writes:

On Sunday, 13 December 2020 at 17:26:45 UTC, rikki cattermole 
wrote:
 On 14/12/2020 6:01 AM, Jackson22 wrote:
 `array = array.ptr[0..newLength];`

 You're setting yourself up for failure with that. What are 
 you trying to "work around"? The allocation, or the 
 initialization?

 
 How is avoiding an expensive potentially memory leaking 
 operation "setting yourself up for failure"?

 No bounds checking. That slice can extend into memory that 
 isn't of that type or even allocated to the process.

No *automatic* bounds checking != no bounds checking.

There's a reason .ptr exist, I wish people would stop pretending 
that using it where it is appropriate is somehow going to lead to 
failure when there are more successful programming languages that 
have zero automatic bounds checking.

 By avoiding that "expensive" memory operation, you instead 
 create a silent memory corruption in its place which is far 
 worse.

Why did you quote expensive? Are you implying it isn't expensive? 
Are you saying re-allocating 4 GB of memory every 6 ms isn't 
expensive?

Dec 13 2020

rikki cattermole <rikki cattermole.co.nz> writes:

On 14/12/2020 9:03 AM, Jackson22 wrote:
 On Sunday, 13 December 2020 at 17:26:45 UTC, rikki cattermole wrote:
 On 14/12/2020 6:01 AM, Jackson22 wrote:
 `array = array.ptr[0..newLength];`

 You're setting yourself up for failure with that. What are you 
 trying to "work around"? The allocation, or the initialization?

 How is avoiding an expensive potentially memory leaking operation 
 "setting yourself up for failure"?

 No bounds checking. That slice can extend into memory that isn't of 
 that type or even allocated to the process.

 
 No *automatic* bounds checking != no bounds checking.
 
 There's a reason .ptr exist, I wish people would stop pretending that 
 using it where it is appropriate is somehow going to lead to failure 

I have used it in the past where appropriate with 0 issues resulting 
from it. I do not believe that this is the case here.

 when there are more successful programming languages that have zero 
 automatic bounds checking.

int[] a = [1, 2, 3];
assert(a.ptr is a.ptr[0 .. 4].ptr);

Out of bounds, runs successfully. Doesn't mean that the GC is aware that 
it now has a length of 4.

int[] b = a;
b.length = 4;
assert(a.ptr !is b.ptr);

This is a case where .length is clearly doing the right thing.

int[] c = a;
c.length = 1;
assert(a.ptr is c.ptr);

 By avoiding that "expensive" memory operation, you instead create a 
 silent memory corruption in its place which is far worse.

 
 Why did you quote expensive? Are you implying it isn't expensive? Are 
 you saying re-allocating 4 GB of memory every 6 ms isn't expensive?

Allocating memory is always more expensive than using a buffer where 
life times are known and predictable. You are right about that.

In this case, that isn't what is being described. If length is 
allocating, then that code was not designed to be used with a buffer.

The most expensive thing in this scenario is not allocating memory, it 
is silent memory corruption. Once corrupted not only can the process die 
at any point, but you can't trust its output any longer.

Dec 13 2020

Dukc <ajieskola gmail.com> writes:

On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
 No bounds checking. That slice can extend into memory that 
 isn't of that type or even allocated to the process.

 No *automatic* bounds checking != no bounds checking.

 There's a reason .ptr exist, I wish people would stop 
 pretending that using it where it is appropriate is somehow 
 going to lead to failure when there are more successful 
 programming languages that have zero automatic bounds checking.

Yes it's possible to without automatic bounds checks. Sometimes 
one has to, when using those older languages or doing very 
low-level system programming. And other times it may not be 
necessary, but still worth it to gain that last bit of 
performance when optimizing. These are the reasons why `.ptr` 
exists.

We really don't know whether either of those cases applies to the 
OP's case, but if the length extension with implicit duplications 
were even close to the desired performance, it seems unlikely.

 Why did you quote expensive? Are you implying it isn't 
 expensive? Are you saying re-allocating 4 GB of memory every 6 
 ms isn't expensive?

I think he was comparing to extending the array in-place, but in 
a bounds-checked way.

Dec 13 2020

Dukc <ajieskola gmail.com> writes:

On Sunday, 13 December 2020 at 21:01:18 UTC, Dukc wrote:
 Yes it's possible to without automatic bounds checks.

Meant: Yes it's possible to live without automatic bounds checks.

Dec 13 2020

Mike Parker <aldacron gmail.com> writes:

On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
 On Sunday, 13 December 2020 at 17:26:45 UTC, rikki cattermole 
 wrote:
 On 14/12/2020 6:01 AM, Jackson22 wrote:
 `array = array.ptr[0..newLength];`

 You're setting yourself up for failure with that. What are 
 you trying to "work around"? The allocation, or the 
 initialization?

 
 How is avoiding an expensive potentially memory leaking 
 operation "setting yourself up for failure"?



"avoiding an expensive potentially memory leaking operation" is 
not the issue, it's how the OP is going about it.

Based on the OP's question and the example, the impression I get 
is that it's an attempt to arbitrarily increase the length of a 
slice with no regard to the capacity of its memory store. If 
`newLength` is greater than the remaining capacity in the memory 
store, then the new length will go beyond whatever has been 
allocated. That is what I meant by "setting yourself up for 
failure", and that is why the lack of bounds checking is an issue 
here.

Steven's post lays out other potential issues with taking this 
approach in D.

 No bounds checking. That slice can extend into memory that 
 isn't of that type or even allocated to the process.

 No *automatic* bounds checking != no bounds checking.

But even with manual bounds checking, there has to be enough 
memory allocated somewhere to hold the new array elements. For a 
dynamically resizable array, there is no escaping the need to 
allocate memory. The cost can be mitigated by allocating enough 
up front, or with a tailored reallocation strategy, but it can't 
be eliminated.
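
As a minimal sketch of "allocating enough up front", assuming the final size is known in advance: `reserve` makes one allocation, after which appends stay inside the same block. The function name `collect` is hypothetical.

```d
/// Builds an array of `n` elements with a single up-front allocation.
/// (Hypothetical helper, for illustration only.)
int[] collect(size_t n)
{
    int[] a;
    a.reserve(n);                // one allocation, length stays 0
    assert(a.capacity >= n);

    auto start = a.ptr;
    foreach (i; 0 .. n)
        a ~= cast(int) i;        // appends fit in the reserved block

    assert(a.ptr is start);      // no reallocation happened along the way
    return a;
}

void main()
{
    auto a = collect(1000);
    assert(a.length == 1000);
    assert(a[999] == 999);
}
```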

Dec 13 2020

Mike Parker <aldacron gmail.com> writes:

On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
 There's a reason .ptr exist, I wish people would stop 
 pretending that using it where it is appropriate is somehow 
 going to lead to failure when there are more successful 
 programming languages that have zero automatic bounds checking.

There's no pretending here. What the OP is doing *is* dangerous.

Dec 13 2020

Jackson22 <jack.sonof gmail.com> writes:

On Monday, 14 December 2020 at 01:36:02 UTC, Mike Parker wrote:
 On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
 There's a reason .ptr exist, I wish people would stop 
 pretending that using it where it is appropriate is somehow 
 going to lead to failure when there are more successful 
 programming languages that have zero automatic bounds checking.

 There's no pretending here. What the OP is doing *is* dangerous.

If someone writes a wrapper around .ptr which checks. It'd be 
literally no different than the implementation in druntime.

Like I said, I wish people would stop pretending that using it 
where it is appropriate is somehow going to lead to failure. 
Maybe those people just aren't knowledgeable enough to 
understand, I don't know.

Dec 14 2020

Max Haughton <maxhaton gmail.com> writes:

On Monday, 14 December 2020 at 20:53:39 UTC, Jackson22 wrote:
 On Monday, 14 December 2020 at 01:36:02 UTC, Mike Parker wrote:
 On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
 There's a reason .ptr exist, I wish people would stop 
 pretending that using it where it is appropriate is somehow 
 going to lead to failure when there are more successful 
 programming languages that have zero automatic bounds 
 checking.

 There's no pretending here. What the OP is doing *is* 
 dangerous.

 If someone writes a wrapper around .ptr which checks. It'd be 
 literally no different than the implementation in druntime.

 Like I said, I wish people would stop pretending that using it 
 where it is appropriate is somehow going to lead to failure. 
 Maybe those people just aren't knowledgeable enough to 
 understand, I don't know.

Good practice is good practice. If you know what you're doing you 
probably shouldn't need to ask.

What Mike is saying is important to know, because even if you use 
exactly the same concept as what druntime does in your code, 
you're still repeating a pattern which will lead to bugs if you 
get it wrong. Good code is all about compartmentalizing bad code, 
especially with memory where (thankfully we have sanitizers now) 
things can often go badly wrong without actually exhibiting any 
side-effects (i.e. we all know why C code has so many security 
problems)

Dec 14 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 12/14/20 3:53 PM, Jackson22 wrote:
 On Monday, 14 December 2020 at 01:36:02 UTC, Mike Parker wrote:
 On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
 There's a reason .ptr exist, I wish people would stop pretending that 
 using it where it is appropriate is somehow going to lead to failure 
 when there are more successful programming languages that have zero 
 automatic bounds checking.

 There's no pretending here. What the OP is doing *is* dangerous.

 
 If someone writes a wrapper around .ptr which checks. It'd be literally 
 no different than the implementation in druntime.
 
 Like I said, I wish people would stop pretending that using it where it 
 is appropriate is somehow going to lead to failure. Maybe those people 
 just aren't knowledgeable enough to understand, I don't know.

It's possible you have misinterpreted what the OP is asking for.

Maybe the OP misstated what he is looking to do. Without a clarifying 
response from him, it's hard to tell how to respond, which means we have 
to respond with the most pessimistic interpretation of the post possible.

Yes, you can use .ptr to avoid bounds checks, and it's safe if you do it 
correctly. No you shouldn't use .ptr to create array slices that refer 
to memory outside the range that exists (and using .ptr slicing as 
posted in the original can do this). It's as basic as that.

-Steve

Dec 14 2020

Paul Backus <snarwin gmail.com> writes:

On Monday, 14 December 2020 at 23:55:09 UTC, Steven Schveighoffer 
wrote:
 Yes, you can use .ptr to avoid bounds checks, and it's safe if 
 you do it correctly.

Though doing it correctly may be harder than you'd think:

https://gist.github.com/pbackus/39b13e8a2c6aea0e090e4b1fe8046df5#example-short-string

Dec 14 2020

Mike Parker <aldacron gmail.com> writes:

On Monday, 14 December 2020 at 20:53:39 UTC, Jackson22 wrote:
 On Monday, 14 December 2020 at 01:36:02 UTC, Mike Parker wrote:
 On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
 There's a reason .ptr exist, I wish people would stop 
 pretending that using it where it is appropriate is somehow 
 going to lead to failure when there are more successful 
 programming languages that have zero automatic bounds 
 checking.

 There's no pretending here. What the OP is doing *is* 
 dangerous.

 If someone writes a wrapper around .ptr which checks. It'd be 
 literally no different than the implementation in druntime.

Of course. I'm not arguing otherwise. I don't see that anyone 
else is either. I'm talking about the specific case raised by the 
OP, where the issue isn't just a lack of automatic bounds 
checking, but the lack of any bounds checking at all.

Bounds checking before resizing has one of two possible outcomes: 
a reallocation, or no resizing occurs. The OP explicitly asked 
how to resize an array *without* reallocation, which implies that 
neither outcome of bounds checking is what he's looking for. So 
yes, arbitrarily slicing a pointer beyond its length in that 
situation is asking for trouble.

I mean, if there were more to the story, e.g., the array is 
backed by a block of malloced memory that's large enough for 
newLength, as manual bounds checking would verify, then the 
question of how to resize without reallocating is a moot one, no?

Dec 14 2020

Dukc <ajieskola gmail.com> writes:

On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi 
wrote:
 Wow, there went several hours of debugging.

 Increasing the length of a slice, by setting its length, will 
 initialize the new elements and reallocate if necessary.

 I did not realize length was "smart", I guess I should have 
 guessed.

 Anyway, to work around this, and probably also be more clear, 
 create a new slice from the same pointer.

 `array = array.ptr[0..newLength];`

There is a big downside in doing that: the array will not check 
whether it's still referring to valid memory after the resize.

Your way is efficient in machine code, but in most cases it's 
highly impractical to sacrifice memory safety to speed up code 
like this.

In the general case, this is a better way to resize arrays 
without reallocating:

```
@safe resizedWithin(T)(T[] arr, T[] within, size_t newSize)
{
    // An empty result needs no lookup in `within`.
    if (newSize == 0) return arr[0 .. 0];
    // Offset of `arr` inside `within`; `arr[0]` is bounds-checked.
    auto startIndex = &arr[0] - &within[0];
    // Slicing `within` is bounds-checked, so an oversized request aborts.
    return within[startIndex .. startIndex + newSize];
}

@safe void main()
{
    import std;
    auto containerArray = iota(1000).array;
    auto array = containerArray[50 .. 60];
    array = array.resizedWithin(containerArray, 20);
    // [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
    // 66, 67, 68, 69]
    writeln(array);
}
```

Here, if you accidentally gave too big a new size for `array`, or 
`array` wasn't within `containerArray` (except if `array.length 
== 0`), the program would immediately abort instead of making an 
invalid array.

Dec 13 2020

Dukc <ajieskola gmail.com> writes:

On Sunday, 13 December 2020 at 13:19:35 UTC, Dukc wrote:
 ```
 @safe resizedWithin(T)(T[] arr, T[] within, size_t newSize)
 {  if(newSize == 0) return arr[0..0];
    auto startIndex= &arr[0] - &within[0];
    return within[startIndex .. startIndex + newSize];
 }
 ```

Okay, there is a bug in my code that it won't work if `arr` is 
originally of length 0. May well contain other bugs, use with 
care :D. But hey, at least no memory corruption, as it's `@safe`.

Dec 13 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 12/11/20 7:53 PM, Jonathan Levi wrote:
 Wow, there went several hours of debugging.
 
 Increasing the length of a slice, by setting its length, will initialize 
 the new elements and reallocate if necessary.
 
 I did not realize length was "smart", I guess I should have guessed.
 
 Anyway, to work around this, and probably also be more clear, create a 
 new slice from the same pointer.
 
 `array = array.ptr[0..newLength];`

Lots of good responses to a mostly ambiguous message.

So let's go over some possibilities:

1. You want to *shrink* the array length. array = array[0 .. newLength] 
works just fine. No reallocation, no initialization.

2. You want to *grow* the array length. array = array.ptr[0 .. 
newLength] is incredibly wrong and dangerous. You should not do this.

3. You wish to have no allocation for growing an array beyond its 
already-allocated block. I only mention this because it could be implied 
by your message, even though I'm pretty sure you don't mean this. This 
is fantasy, and you should not do this. Memory corruption is something 
you don't want to deal with. It's the reason why your chosen solution is 
incorrect.

4. You wish to have no allocation for growing an array into its ALREADY 
allocated block. This is possible, and even possible without 
reinitializing the new elements. In this context, your code is actually 
OK, though like Sönke mentions, you should call assumeSafeAppend on the 
array:

assert(newLength <= array.capacity); // ensure I am not growing beyond the block.
array = array.ptr[0 .. newLength]; // yay, new data that is uninitialized (mostly).
array.assumeSafeAppend(); // now the runtime is aware that I have taken over that data for use.

Why is it important to call assumeSafeAppend? A few reasons:

1. The GC will run destructors on elements in an array only if they are 
known to be used (in the case that your elements have destructors).
2. If you don't call it, appending to the original slice could overwrite 
your data
3. If you try to append to the resulting array and there technically 
would be space to fill inside the current block, the runtime will 
needlessly reallocate if your array ends outside where it thinks it 
should end.

Alternative to the assert, you could check for capacity and newLength to 
be consistent, and if not, reallocate yourself.
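
That alternative could be sketched roughly as follows. The name `growNoInit` is hypothetical, the fallback via `std.array.uninitializedArray` is one possible choice rather than anything prescribed in this thread, and the sketch assumes plain data elements. `capacity` is 0 for slices that do not end at the used end of a GC block (including non-GC memory), so those always take the allocating branch.

```d
import std.array : uninitializedArray;

/// Resizes `array` to `newLength` without initializing new elements.
/// Reuses the current GC block when it is large enough, otherwise allocates.
/// (Hypothetical helper; intended for plain data types only.)
T[] growNoInit(T)(T[] array, size_t newLength)
{
    if (newLength <= array.length)
        return array[0 .. newLength];        // shrinking is always fine

    if (newLength <= array.capacity)
    {
        // Checked above: the new end is still inside the allocated block.
        array = array.ptr[0 .. newLength];
        array.assumeSafeAppend();            // tell the runtime the tail is in use
        return array;
    }

    // Not enough room: reallocate ourselves and copy the old contents.
    auto fresh = uninitializedArray!(T[])(newLength);
    fresh[0 .. array.length] = array[];
    return fresh;
}

void main()
{
    int[] a;
    a.reserve(100);
    a ~= 1;
    a ~= 2;
    auto start = a.ptr;

    a = growNoInit(a, 50);                   // fits in the reserved block
    assert(a.ptr is start && a.length == 50);
    assert(a[0] == 1 && a[1] == 2);
}
```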

-Steve

Dec 13 2020

Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:

On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi 
wrote:
 Wow, there went several hours of debugging.

 Increasing the length of a slice, by setting its length, will 
 initialize the new elements and reallocate if necessary.

 I did not realize length was "smart", I guess I should have 
 guessed.

 Anyway, to work around this, and probably also be more clear, 
 create a new slice from the same pointer.

 `array = array.ptr[0..newLength];`

D and Go have both messed up the concept of a view of an array 
and owning the backing store of an array. I suggest using slices 
like C++ spans, only make them smaller, then create your own 
dynamic array ADT wrapper for explicit array ownership of the 
full array.
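
One way to read that suggestion is sketched below: a small wrapper that owns the full backing store and only hands out slices as views, which are never grown by their users. The type `Buffer` and its members are hypothetical names, and this is only an illustration of the idea, not a complete container.

```d
/// Owns a fixed backing store; `view` exposes the used part as a plain slice.
/// (Hypothetical type, for illustration only.)
struct Buffer(T)
{
    private T[] store;     // the full allocation, owned by this struct
    private size_t used;   // logical length, always <= store.length

    this(size_t capacity)
    {
        store = new T[](capacity);
    }

    /// Changes the logical length without reallocating.
    /// Returns false if `newLength` exceeds the backing store.
    bool resize(size_t newLength)
    {
        if (newLength > store.length)
            return false;
        used = newLength;
        return true;
    }

    /// A bounds-checked view of the elements currently in use.
    T[] view()
    {
        return store[0 .. used];
    }
}

void main()
{
    auto b = Buffer!int(4);
    assert(b.resize(3));
    assert(b.view.length == 3);
    assert(!b.resize(5));          // refused instead of slicing past the block
    assert(b.view.length == 3);
}
```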

Dec 15 2020
