
digitalmars.D - Setting array length without initializing/reallocating.

Jonathan Levi <catanscout gmail.com> writes:
Wow, there went several hours of debugging.

Increasing the length of a slice, by setting its length, will 
initialize the new elements and reallocate if necessary.

I did not realize `length` was "smart"; I guess I should have guessed.

Anyway, to work around this (and probably to be clearer), create a new slice from the same pointer:

`array = array.ptr[0..newLength];`
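To make the behavior described above concrete, here is a small sketch (not from the original post; `growViaLength` is a hypothetical name). Growing a slice by assigning to `.length` fills the new elements with `T.init`, and the slice is guaranteed to keep its memory block when that block has spare capacity, which `reserve` can ensure beforehand.

```d
/// Hypothetical helper: grows `arr` by assigning to `.length` and reports
/// whether the runtime kept the same memory block.
bool growViaLength(T)(ref T[] arr, size_t newLength)
{
    auto oldPtr = arr.ptr;
    // Every new element is set to T.init (0 for int, NaN for float), and the
    // runtime moves the data to a new block if the old one has no room left.
    arr.length = newLength;
    return arr.ptr is oldPtr;
}

unittest
{
    import std.math : isNaN;

    int[] ints = [1, 2, 3];
    growViaLength(ints, 5);
    assert(ints == [1, 2, 3, 0, 0]);   // new ints are int.init, i.e. 0

    float[] floats = [1.0f, 2.0f];
    growViaLength(floats, 3);
    assert(floats[2].isNaN);           // new floats are float.init, i.e. NaN

    int[] roomy = [1];
    roomy.reserve(8);                  // make sure the block has spare room
    assert(growViaLength(roomy, 8));   // so .length can grow in place
}
```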
Dec 11 2020
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Sat, Dec 12, 2020 at 12:53:09AM +0000, Jonathan Levi via Digitalmars-d wrote:
> `array = array.ptr[0..newLength];`
I highly recommend reading the following article if you work with D arrays in any non-trivial way:

https://dlang.org/articles/d-array-article.html


T

-- 
Fact is stranger than fiction.
Dec 11 2020
Kagamin <spam here.lot> writes:
Yes, pointers are the only unsafe way to access memory; slices don't allow it.
Dec 12 2020
Sönke Ludwig <sludwig outerproduct.org> writes:
On 12.12.2020 at 01:53, Jonathan Levi wrote:
> `array = array.ptr[0..newLength];`
One way around this is to call `array.assumeSafeAppend();` before setting the new length. In that case, the runtime reuses the already-allocated block as long as it is large enough, and only reallocates if necessary.
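A sketch of this suggestion, assuming a GC-allocated array and that no other slice still refers to the dropped tail (`regrow` is a hypothetical helper). After shrinking, `assumeSafeAppend` lets a later `.length` increase reuse the same block. Note that `.length` still sets the re-exposed elements to `T.init`; only the reallocation is avoided.

```d
/// Hypothetical helper: shrink `arr` to `keep` elements, tell the runtime the
/// rest of the block may be reused, then grow back to `newLength` via `.length`.
/// Only valid if no other slice still refers to the elements after `keep`.
void regrow(T)(ref T[] arr, size_t keep, size_t newLength)
{
    arr = arr[0 .. keep];     // shrinking a slice never reallocates
    arr.assumeSafeAppend();   // the tail of the block is now considered free
    arr.length = newLength;   // reuses the block if it is big enough, but
                              // still sets the new elements to T.init
}

unittest
{
    auto a = new int[](10);
    a[] = 42;
    auto oldPtr = a.ptr;
    regrow(a, 5, 8);
    assert(a.ptr is oldPtr);                     // no reallocation
    assert(a == [42, 42, 42, 42, 42, 0, 0, 0]);  // tail re-initialized

    // Without assumeSafeAppend the runtime refuses to overwrite the old
    // tail, so growing the shrunken slice moves it to a new block.
    auto b = new int[](10);
    auto bPtr = b.ptr;
    b = b[0 .. 5];
    b.length = 8;
    assert(b.ptr !is bPtr);
}
```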
Dec 12 2020
Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi 
wrote:
> `array = array.ptr[0..newLength];`
Hold on, how does this not corrupt memory? As soon as the length exceeds the allocated capacity (the point at which the slice would be reallocated when setting its length), you will have a silent out-of-bounds violation, identical to overflowing a C array. Am I wrong?

If you do not want the expansion to be initialized, I guess you could allocate a new uninitialized slice and copy the contents over explicitly.

--Bastiaan.
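One way to do what Bastiaan describes, sketched with Phobos' `std.array.uninitializedArray`. The helper `grownUninitialized` is hypothetical, and it assumes element types without pointers: the new block is allocated without the `T.init` fill, and the old contents are copied over explicitly.

```d
import std.array : uninitializedArray;
import std.traits : hasIndirections;

/// Hypothetical helper: returns a new array of `newLength` elements whose first
/// `src.length` elements are copied from `src`; the rest are left uninitialized
/// (garbage until written). Limited to pointer-free element types so that
/// garbage values are never mistaken for references.
T[] grownUninitialized(T)(const(T)[] src, size_t newLength)
    if (!hasIndirections!T)
in (newLength >= src.length)
{
    auto result = uninitializedArray!(T[])(newLength); // no T.init fill
    result[0 .. src.length] = src[];                   // explicit copy
    return result;
}

unittest
{
    int[] a = [1, 2, 3];
    auto b = grownUninitialized(a, 6);
    assert(b.length == 6 && b[0 .. 3] == [1, 2, 3]);
    b[3 .. $] = 0; // the new tail must be written before it is read
}
```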
Dec 12 2020
Mike Parker <aldacron gmail.com> writes:
On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi 
wrote:
> `array = array.ptr[0..newLength];`
You're setting yourself up for failure with that. What are you trying to "work around"? The allocation, or the initialization?
Dec 12 2020
Jackson22 <jack.sonof gmail.com> writes:
On Saturday, 12 December 2020 at 14:12:06 UTC, Mike Parker wrote:
>> `array = array.ptr[0..newLength];`
> You're setting yourself up for failure with that. What are you trying to "work around"? The allocation, or the initialization?
How is avoiding an expensive, potentially memory-leaking operation "setting yourself up for failure"?
Dec 13 2020
rikki cattermole <rikki cattermole.co.nz> writes:
On 14/12/2020 6:01 AM, Jackson22 wrote:
> How is avoiding an expensive potentially memory leaking operation "setting yourself up for failure"?
No bounds checking. That slice can extend into memory that isn't of that type or even allocated to the process. By avoiding that "expensive" memory operation, you instead create a silent memory corruption in its place, which is far worse.
Dec 13 2020
Jackson22 <jack.sonof gmail.com> writes:
On Sunday, 13 December 2020 at 17:26:45 UTC, rikki cattermole 
wrote:
> No bounds checking. That slice can extend into memory that isn't of that type or even allocated to the process.
No *automatic* bounds checking != no bounds checking. There's a reason `.ptr` exists. I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure, when there are more successful programming languages that have zero automatic bounds checking.
> By avoiding that "expensive" memory operation, you instead create a silent memory corruption in its place which is far worse.
Why did you put "expensive" in quotes? Are you implying it isn't expensive? Are you saying reallocating 4 GB of memory every 6 ms isn't expensive?
Dec 13 2020
rikki cattermole <rikki cattermole.co.nz> writes:
On 14/12/2020 9:03 AM, Jackson22 wrote:
> No *automatic* bounds checking != no bounds checking. There's a reason .ptr exist, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure
I have used it in the past where appropriate with 0 issues resulting from it. I do not believe that this is the case here.
> when there are more successful programming languages that have zero automatic bounds checking.
```
int[] a = [1, 2, 3];
assert(a.ptr is a.ptr[0 .. 4].ptr);
```

Out of bounds, yet it runs successfully. That doesn't mean the GC is aware that the array now has a length of 4.

```
int[] b = a;
b.length = 4;
assert(a.ptr !is b.ptr);
```

This is a case where `.length` is clearly doing the right thing.

```
int[] c = a;
c.length = 1;
assert(a.ptr is c.ptr);
```
> Why did you quote expensive? Are you implying it isn't expensive? Are you saying re-allocating 4 GB of memory every 6 ms isn't expensive?
Allocating memory is always more expensive than using a buffer whose lifetime is known and predictable. You are right about that.

In this case, that isn't what is being described. If `length` is allocating, then that code was not designed to be used with a buffer.

The most expensive thing in this scenario is not the memory allocation; it is silent memory corruption. Once memory is corrupted, not only can the process die at any point, but you can't trust its output any longer.
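A sketch of the buffer pattern rikki describes, assuming a scratch buffer whose slices never outlive a single frame (`fillFrame` and its workload are hypothetical). Each call drops the contents, calls `assumeSafeAppend`, and appends again, so after the first frame the same block is reused instead of being reallocated.

```d
/// Hypothetical per-frame producer that refills `buf` in place on every call.
/// Assumes nobody else keeps slices of `buf` from the previous frame, since
/// their contents get overwritten.
void fillFrame(ref int[] buf, int frame, size_t count)
{
    buf = buf[0 .. 0];       // keep the memory block, drop the contents
    buf.assumeSafeAppend();  // let appends overwrite the previous frame's data
    foreach (i; 0 .. count)
        buf ~= frame * 1000 + cast(int) i;
}

unittest
{
    int[] buf;
    fillFrame(buf, 0, 100);            // first frame allocates the block
    auto blockStart = buf.ptr;
    foreach (frame; 1 .. 10)
    {
        fillFrame(buf, frame, 100);
        assert(buf.ptr is blockStart); // later frames reuse the same block
    }
    assert(buf.length == 100 && buf[0] == 9000);
}
```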
Dec 13 2020
Dukc <ajieskola gmail.com> writes:
On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
> No *automatic* bounds checking != no bounds checking. There's a reason .ptr exist, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure when there are more successful programming languages that have zero automatic bounds checking.
Yes it's possible to without automatic bounds checks. Sometimes one has to: when using those older languages, or when doing very low-level systems programming. Other times it may not be necessary, but still worth it to gain that last bit of performance when optimizing. These are the reasons why `.ptr` exists. We really don't know whether either of those cases applies to the OP's situation, but if extending the length with its implicit duplication was even close to the desired performance, that seems unlikely.
> Why did you quote expensive? Are you implying it isn't expensive? Are you saying re-allocating 4 GB of memory every 6 ms isn't expensive?
I think he was comparing it with extending the array in place, but in a bounds-checked way.
Dec 13 2020
Dukc <ajieskola gmail.com> writes:
On Sunday, 13 December 2020 at 21:01:18 UTC, Dukc wrote:
> Yes it's possible to without automatic bounds checks.
Meant: Yes it's possible to live without automatic bounds checks.
Dec 13 2020
Mike Parker <aldacron gmail.com> writes:
On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
> How is avoiding an expensive potentially memory leaking operation "setting yourself up for failure"?

"avoiding an expensive potentially memory leaking operation" is not the issue; it's how the OP is going about it. Based on the OP's question and the example, the impression I get is that it's an attempt to arbitrarily increase the length of a slice with no regard to the capacity of its memory store. If `newLength` is greater than the remaining capacity in the memory store, then the new length will go beyond whatever has been allocated. That is what I meant by "setting yourself up for failure", and that is why the lack of bounds checking is an issue here.

Steven's post lays out other potential issues with taking this approach in D.
> No *automatic* bounds checking != no bounds checking.
But even with manual bounds checking, there has to be enough memory allocated somewhere to hold the new array elements. For a dynamically resizable array, there is no escaping the need to allocate memory. The cost can be mitigated by allocating enough up front, or with a tailored reallocation strategy, but it can't be eliminated.
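A minimal sketch of the "allocating enough up front" option, assuming the final size is known in advance (`buildSquares` is a hypothetical example). druntime's `reserve` makes one allocation large enough for `n` elements, after which appends stay in that block.

```d
/// Hypothetical example: pay for one allocation up front, then append
/// without further reallocations.
int[] buildSquares(size_t n)
{
    int[] result;
    immutable cap = result.reserve(n); // allocates a block for >= n elements
    assert(cap >= n);
    foreach (i; 0 .. n)
        result ~= cast(int)(i * i);    // fits in the reserved block
    return result;
}

unittest
{
    assert(buildSquares(5) == [0, 1, 4, 9, 16]);

    int[] buf;
    buf.reserve(100);
    auto start = buf.ptr;
    foreach (i; 0 .. 100)
        buf ~= i;
    assert(buf.ptr is start);          // no reallocation happened
}
```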
Dec 13 2020
Mike Parker <aldacron gmail.com> writes:
On Sunday, 13 December 2020 at 20:03:46 UTC, Jackson22 wrote:
> There's a reason .ptr exist, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure when there are more successful programming languages that have zero automatic bounds checking.
There's no pretending here. What the OP is doing *is* dangerous.
Dec 13 2020
Jackson22 <jack.sonof gmail.com> writes:
On Monday, 14 December 2020 at 01:36:02 UTC, Mike Parker wrote:
> There's no pretending here. What the OP is doing *is* dangerous.
If someone writes a wrapper around `.ptr` that checks bounds, it'd be literally no different from the implementation in druntime. Like I said, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure. Maybe those people just aren't knowledgeable enough to understand, I don't know.
Dec 14 2020
Max Haughton <maxhaton gmail.com> writes:
On Monday, 14 December 2020 at 20:53:39 UTC, Jackson22 wrote:
> If someone writes a wrapper around .ptr which checks. It'd be literally no different than the implementation in druntime. Like I said, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure. Maybe those people just aren't knowledgeable enough to understand, I don't know.
Good practice is good practice. If you know what you're doing, you probably shouldn't need to ask.

What Mike is saying is important to know, because even if you use exactly the same concept in your code as druntime does, you're still repeating a pattern which will lead to bugs if you get it wrong. Good code is all about compartmentalizing bad code, especially with memory, where things can often go badly wrong without exhibiting any visible side effects (thankfully we have sanitizers now). We all know why C code has so many security problems.
Dec 14 2020
Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/14/20 3:53 PM, Jackson22 wrote:
> If someone writes a wrapper around .ptr which checks. It'd be literally no different than the implementation in druntime. Like I said, I wish people would stop pretending that using it where it is appropriate is somehow going to lead to failure. Maybe those people just aren't knowledgeable enough to understand, I don't know.
It's possible you have misinterpreted what the OP is asking for. Maybe the OP misstated what he is looking to do. Without a clarifying response from him, it's hard to tell how to respond, which means we have to respond with the most pessimistic interpretation of the post possible.

Yes, you can use .ptr to avoid bounds checks, and it's safe if you do it correctly.

No, you shouldn't use .ptr to create array slices that refer to memory outside the range that exists (and using .ptr slicing as posted in the original can do this).

It's as basic as that.

-Steve
Dec 14 2020
Paul Backus <snarwin gmail.com> writes:
On Monday, 14 December 2020 at 23:55:09 UTC, Steven Schveighoffer 
wrote:
> Yes, you can use .ptr to avoid bounds checks, and it's safe if you do it correctly.
Though doing it correctly may be harder than you'd think: https://gist.github.com/pbackus/39b13e8a2c6aea0e090e4b1fe8046df5#example-short-string
Dec 14 2020
Mike Parker <aldacron gmail.com> writes:
On Monday, 14 December 2020 at 20:53:39 UTC, Jackson22 wrote:
> If someone writes a wrapper around .ptr which checks. It'd be literally no different than the implementation in druntime.
Of course. I'm not arguing otherwise. I don't see that anyone else is either.

I'm talking about the specific case raised by the OP, where the issue isn't just a lack of automatic bounds checking, but the lack of any bounds checking at all. Bounds checking before resizing has one of two possible outcomes: a reallocation, or no resizing occurs. The OP explicitly asked how to resize an array *without* reallocation, which implies that neither outcome of bounds checking is what he's looking for. So yes, arbitrarily slicing a pointer beyond its length in that situation is asking for trouble.

I mean, if there were more to the story, e.g., the array is backed by a block of malloc'ed memory that's large enough for `newLength`, as manual bounds checking would verify, then the question of how to resize without reallocating is a moot one, no?
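Mike's two outcomes can be made explicit in code. Below is a hypothetical `@system` wrapper (`tryGrowInPlace` is a made-up name) that only performs the `.ptr` slicing from the original post after checking `newLength` against the GC block's `capacity`. When that check fails, it leaves the array alone and returns `false`, so the caller decides whether to reallocate. It assumes GC-allocated memory; for anything else, `capacity` is 0 and the check simply fails.

```d
/// Hypothetical checked version of `array = array.ptr[0 .. newLength]`.
/// Grows in place without initializing when the GC block has room; otherwise
/// leaves `arr` untouched and returns false ("no resizing occurs").
bool tryGrowInPlace(T)(ref T[] arr, size_t newLength) @system
{
    if (newLength <= arr.length)
    {
        arr = arr[0 .. newLength]; // shrinking never needs a check
        return true;
    }
    // capacity is 0 for non-GC memory, or when appending would reallocate.
    if (newLength > arr.capacity)
        return false;
    arr = arr.ptr[0 .. newLength]; // stays inside the block: checked above
    arr.assumeSafeAppend();        // record the new used length in the block
    return true;
}

unittest
{
    int[] a = [1, 2, 3];
    a.reserve(10);
    auto oldPtr = a.ptr;
    assert(tryGrowInPlace(a, 10));
    assert(a.ptr is oldPtr && a.length == 10 && a[0 .. 3] == [1, 2, 3]);
    assert(!tryGrowInPlace(a, a.capacity + 1)); // would overflow the block
    assert(a.length == 10);                      // left untouched
}
```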
Dec 14 2020
Dukc <ajieskola gmail.com> writes:
On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi 
wrote:
> `array = array.ptr[0..newLength];`
There is a big downside to doing that: the array will not check whether it's still referring to valid memory after the resize. Your way is efficient in machine code, but in most cases it's highly impractical to skip memory safety to speed up code like this. In the general case, this is a better way to resize arrays without reallocating:

```
@safe resizedWithin(T)(T[] arr, T[] within, size_t newSize)
{
    if (newSize == 0) return arr[0 .. 0];
    auto startIndex = &arr[0] - &within[0];
    return within[startIndex .. startIndex + newSize];
}

@safe void main()
{
    import std;
    auto containerArray = iota(1000).array;
    auto array = containerArray[50 .. 60];
    array = array.resizedWithin(containerArray, 20);
    // [50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65,
    // 66, 67, 68, 69]
    writeln(array);
}
```

Here, if you accidentally gave too big a new size for `array`, or if `array` wasn't within `containerArray` (except if `array.length == 0`), the program would immediately abort instead of making an invalid array.
Dec 13 2020
Dukc <ajieskola gmail.com> writes:
On Sunday, 13 December 2020 at 13:19:35 UTC, Dukc wrote:
> ```
> @safe resizedWithin(T)(T[] arr, T[] within, size_t newSize)
> {   if(newSize == 0) return arr[0..0];
>     auto startIndex= &arr[0] - &within[0];
>     return within[startIndex .. startIndex + newSize];
> }
> ```
Okay, there is a bug in my code: it won't work if `arr` is originally of length 0. It may well contain other bugs, so use it with care :D. But hey, at least there's no memory corruption, as it's `@safe`.
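A possible fix for the empty-`arr` bug, offered as an illustrative variant rather than Dukc's own code. It derives the start index from integer addresses, reading `.ptr` only inside a tiny `@trusted` helper that never dereferences anything, so the function itself stays `@safe` and the final slice is still bounds-checked.

```d
/// Variant of `resizedWithin` that also accepts an empty `arr`, as long as it
/// points into `within` (or just past its end).
T[] resizedWithin(T)(T[] arr, T[] within, size_t newSize) @safe
{
    // `.ptr` is not allowed in @safe code. Converting it to a plain integer
    // address is harmless, because nothing is ever dereferenced through it.
    static size_t addressOf(const(T)[] s) @trusted
    {
        return cast(size_t) s.ptr;
    }

    immutable base = addressOf(within);
    immutable addr = addressOf(arr);
    assert(addr >= base && addr - base <= within.length * T.sizeof,
        "arr does not lie within `within`");
    immutable startIndex = (addr - base) / T.sizeof;
    return within[startIndex .. startIndex + newSize]; // bounds-checked slice
}

unittest
{
    import std.array : array;
    import std.range : iota;

    auto container = iota(100).array;
    auto emptyView = container[50 .. 50];     // length 0, still inside
    assert(emptyView.resizedWithin(container, 3) == [50, 51, 52]);
    assert(container[10 .. 12].resizedWithin(container, 5)
        == [10, 11, 12, 13, 14]);
}
```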
Dec 13 2020
Steven Schveighoffer <schveiguy gmail.com> writes:
On 12/11/20 7:53 PM, Jonathan Levi wrote:
> `array = array.ptr[0..newLength];`
Lots of good responses to a mostly ambiguous message. So let's go over some possibilities:

1. You want to *shrink* the array length. `array = array[0 .. newLength]` works just fine. No reallocation, no initialization.

2. You want to *grow* the array length. `array = array.ptr[0 .. newLength]` is incredibly wrong and dangerous. You should not do this.

3. You wish to have no allocation for growing an array beyond its already-allocated block. I only mention this because it could be implied by your message, even though I'm pretty sure you don't mean this. This is fantasy, and you should not do this. Memory corruption is something you don't want to deal with. It's the reason why your chosen solution is incorrect.

4. You wish to have no allocation for growing an array into its ALREADY allocated block. This is possible, and even possible without reinitializing the new elements. In this context, your code is actually OK, though, like Sönke mentions, you should call `assumeSafeAppend` on the array:

```
assert(newLength <= array.capacity); // ensure I am not growing beyond the block.
array = array.ptr[0 .. newLength]; // yay, new data that is uninitialized (mostly).
array.assumeSafeAppend(); // now the runtime is aware that I have taken over that data for use.
```

Why is it important to call `assumeSafeAppend`? A few reasons:

1. The GC will run destructors on elements in an array only if they are known to be used (in the case that your elements have destructors).
2. If you don't call it, appending to the original slice could overwrite your data.
3. If you try to append to the resulting array and there technically would be space to fill inside the current block, the runtime will needlessly reallocate if your array ends outside where it thinks it should end.

As an alternative to the assert, you could check that `capacity` and `newLength` are consistent, and if not, reallocate yourself.

-Steve
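Steven's closing suggestion (check, and reallocate yourself if it doesn't fit) could look roughly like this. `growInBlock` is a hypothetical name, and it assumes a GC-allocated array whose new elements may safely start out uninitialized; `reserve` handles the reallocate-and-copy step when the block is too small.

```d
/// Hypothetical helper following the recipe above: grow into the existing GC
/// block without initializing when it fits, and reallocate ourselves otherwise.
T[] growInBlock(T)(T[] array, size_t newLength) @system
{
    if (newLength <= array.length)
        return array[0 .. newLength];    // shrinking: nothing else to do

    if (newLength > array.capacity)
        array.reserve(newLength);        // doesn't fit: move to a bigger block
                                         // (copies the existing elements)

    assert(newLength <= array.capacity); // never grow beyond the block
    array = array.ptr[0 .. newLength];   // new tail is uninitialized (mostly)
    array.assumeSafeAppend();            // runtime now knows we use that data
    return array;
}

unittest
{
    int[] small = [1, 2, 3];
    auto big = growInBlock(small, 1000);  // too big for small's block
    assert(big.length == 1000 && big[0 .. 3] == [1, 2, 3]);

    int[] roomy = [4, 5];
    roomy.reserve(50);
    auto oldPtr = roomy.ptr;
    roomy = growInBlock(roomy, 50);       // fits: same block, no copy
    assert(roomy.ptr is oldPtr && roomy[0 .. 2] == [4, 5]);
}
```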
Dec 13 2020
Ola Fosheim Grostad <ola.fosheim.grostad gmail.com> writes:
On Saturday, 12 December 2020 at 00:53:09 UTC, Jonathan Levi 
wrote:
> `array = array.ptr[0..newLength];`
D and Go have both mixed up the concept of a view of an array with the concept of owning the array's backing store. I suggest using slices like C++ spans (only ever making them smaller), and then creating your own dynamic array ADT wrapper for explicit ownership of the full array.
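A rough sketch of the kind of owning wrapper Ola suggests. `OwnedArray` and its growth policy (doubling, starting at 4) are assumptions made for illustration, not an existing library type. The struct owns the full buffer, and callers only borrow slices of the used part.

```d
/// Hypothetical owning dynamic array: the struct owns the whole buffer, and
/// callers only borrow views of the used part through `opSlice`.
struct OwnedArray(T)
{
    private T[] store;  // full backing buffer; capacity == store.length
    private size_t len; // number of elements in use

    @property size_t length() const { return len; }
    @property size_t capacity() const { return store.length; }

    /// Growing within capacity does not re-initialize anything: elements that
    /// were dropped by an earlier shrink become visible again as they were.
    @property void length(size_t newLength)
    {
        if (newLength > store.length)
        {
            // Geometric growth keeps repeated appends amortized O(1).
            size_t newCap = store.length ? store.length : 4;
            while (newCap < newLength)
                newCap *= 2;
            auto bigger = new T[](newCap);
            bigger[0 .. len] = store[0 .. len]; // copy the used part only
            store = bigger;
        }
        len = newLength;
    }

    void opOpAssign(string op : "~")(T value)
    {
        length = len + 1;
        store[len - 1] = value;
    }

    /// Borrowed view; do not keep it across a resize that reallocates.
    inout(T)[] opSlice() inout { return store[0 .. len]; }
}

unittest
{
    OwnedArray!int a;
    foreach (i; 0 .. 10)
        a ~= i;
    assert(a[] == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]);
    immutable cap = a.capacity;
    a.length = 3;
    a.length = 6;                        // fits: no reallocation
    assert(a.capacity == cap && a[] == [0, 1, 2, 3, 4, 5]);
}
```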
Dec 15 2020