digitalmars.D.learn - Dynamic Arrays Capacity

reply Salih Dincer <salihdb hotmail.com> writes:
Hi,

Am I misunderstanding something? A dynamic array seems to be 
allocated memory according to the `nextPow2()` algorithm (minus 1); 
strings, on the other hand, don't behave like this...

```d
   string str = "0123456789ABCDEF";
   char[] chr = str.dup;

   assert(str.length == 16);
   assert(str.capacity == 0);

   import std.math: thus = nextPow2; //.algebraic

   assert(chr.capacity == thus(str.length) - 1);
   assert(chr.capacity == 31);
```

Also, `.ptr` keeps the address of the most recent first element, 
right?


```d
   import std.stdio : write, writeln;

   write("str[0] ", &str[0]);
   writeln(" ==  ", str.ptr);

   write("chr[0] ", &chr[0]);
   writeln(" ==  ", chr.ptr);
```

**Print Out:** (No Errors)
 str[0] 5607593901E0 ==  5607593901E0
 chr[0] 7F9430982000 ==  7F9430982000
SDB 79
Jun 01 2022
next sibling parent reply Mike Parker <aldacron gmail.com> writes:
On Thursday, 2 June 2022 at 05:04:03 UTC, Salih Dincer wrote:
 Hi,

 Do I misunderstand? A dynamic array is allocated memory 
 according to the `nextpow2()` algorithm(-1 lapse); strings, on 
 the other hand, don't behave like this...

 ```d
   string str = "0123456789ABCDEF";
   char[] chr = str.dup;

   assert(str.length == 16);
   assert(str.capacity == 0);

   import std.math: thus = nextPow2; //.algebraic

   assert(chr.capacity == thus(str.length) - 1);
   assert(chr.capacity == 31);
 ```

You've initialized `str` with a string literal. No memory is allocated for these from the GC. They're stored in the binary, meaning they're loaded into memory from disk by the OS. So `str.ptr` points to a static memory location that's a fixed size, hence no extra capacity. `chr` is allocated from the GC using whatever algorithm is implemented in the runtime. That it happens to be any given algorithm is an implementation detail that could change in any release.
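
A minimal sketch of the distinction described in this reply, assuming a current DMD/LDC with the default GC. The helper `movedOnAppend` is a hypothetical name introduced only for illustration; the capacity value printed at the end is implementation-defined and may differ between releases.

```d
import std.stdio : writeln;

/// Hypothetical helper: appends one element and reports whether the
/// slice now points at a different block of memory.
bool movedOnAppend(ref string s)
{
    auto before = s.ptr;
    s ~= '!';
    return s.ptr != before;
}

void main()
{
    string str = "0123456789ABCDEF"; // literal: lives in the binary's static data
    char[] chr = str.dup;            // copy: allocated from the GC heap

    assert(str.capacity == 0);          // nothing can be appended in place
    assert(chr.capacity >= chr.length); // the GC block holds at least the data

    // Appending to the literal-backed slice has to allocate a new GC block.
    assert(movedOnAppend(str));
    assert(str.capacity >= str.length); // now GC-backed, so it reports a capacity

    writeln("capacity after append: ", str.capacity); // implementation-defined
}
```

After the first append the slice no longer refers to the static data, which is why its capacity stops being 0.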

 Also, `.ptr` keeps the address of the most recent first 
 element, right?
More specifically, it points to the starting address of the allocated block of memory.
Jun 02 2022
next sibling parent reply Mike Parker <aldacron gmail.com> writes:
On Thursday, 2 June 2022 at 08:14:40 UTC, Mike Parker wrote:

 More specifically, it points to the starting address of the 
 allocated block of memory.
I posted too soon. Given an instance `ts` of type `T[]`, array accesses essentially are this:

```d
ts[0] == *(ts.ptr + 0);
ts[1] == *(ts.ptr + 1);
ts[2] == *(ts.ptr + 2);
```

Since the size of `T` is known, each addition to the pointer adds `N * T.sizeof` bytes. If you converted it to a `ubyte` array, you'd need to handle that yourself.

And so, `&ts[0]` is the same as `&(*ts.ptr + 0)`, or simply `ts.ptr`.
Jun 02 2022
parent Mike Parker <aldacron gmail.com> writes:
On Thursday, 2 June 2022 at 08:24:51 UTC, Mike Parker wrote:
 And so, `&ts[0]` is the same as `&(*ts.ptr + 0)`, or simply 
 `ts.ptr`.
That should be the same as `&(*(ts.ptr + 0))`!
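
The equivalence discussed in these posts can be checked directly. This is only a sketch: `byteOffset` is a hypothetical helper added for illustration, and `int` is an arbitrary choice of element type.

```d
/// Hypothetical helper: byte distance from the start of the slice to element i.
size_t byteOffset(T)(const(T)[] ts, size_t i)
{
    return cast(size_t)(ts.ptr + i) - cast(size_t) ts.ptr;
}

void main()
{
    int[] ts = [10, 20, 30];

    foreach (i; 0 .. ts.length)
    {
        assert(ts[i] == *(ts.ptr + i));              // indexing is pointer arithmetic
        assert(&ts[i] == ts.ptr + i);                // same address either way
        assert(byteOffset(ts, i) == i * int.sizeof); // scaled by the element size
    }

    assert(&ts[0] == &(*(ts.ptr + 0))); // the corrected expression
    assert(&ts[0] == ts.ptr);           // which is simply ts.ptr
}
```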
Jun 02 2022
prev sibling parent Salih Dincer <salihdb hotmail.com> writes:
On Thursday, 2 June 2022 at 08:14:40 UTC, Mike Parker wrote:
 You've initialized `str` with a string literal. No memory is 
 allocated for these from the GC. They're stored in the binary, 
 meaning they're loaded into memory from disk by the OS. So 
 `str.ptr` points to a static memory location that's a fixed 
 size, hence no extra capacity.
I didn't know that, so maybe this example proves it. Here is the test code that Ali started and I developed further:

```d
import std.range;
import std.stdio;

/* toggle array:
alias chr = char*;
auto data = [' '];/*/
alias chr = immutable(char*);
auto data = " ";//*/

void main()
{
  chr[] ptrs;
  data.fill(3, ptrs);

  writeln;
  foreach(ref ptr; ptrs)
  {
    " 0x".write(ptr);
  }
} /* Print Out:
 0: 0 1: 15 2: 31 3: 47
 0x55B07E227020 0x7F2391F9F000 0x7F2391FA0000 0x7F2391FA1000
//*/

void fill(R)(ref R mostRecent, int limit, ref chr[] ptrs)
{
  auto ptr = mostRecent.ptr;
  size_t capacity, depth;

  while (depth <= limit)
  {
    mostRecent ~= ElementType!R.init;

    // the data moved: record the old address and the capacity it had
    if(ptr != mostRecent.ptr)
    {
      ptrs ~= ptr;
      depth.writef!"%2s: %11s"(capacity);
      depth++;
    }

    if (mostRecent.capacity != capacity)
    {
      ptr = mostRecent.ptr;
      capacity = mostRecent.capacity;
    }
  }
}
```

As for the result I got from this code: the array is copied to another memory region as soon as its capacity changes (0x5...20 >> 0x7...00). We get the same result with an array; just add the `/` character to the beginning of the 4th line to try it.

Thank you all very much for the replies; all of these open my mind.

SDB 79
Jun 02 2022
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 6/2/22 1:04 AM, Salih Dincer wrote:
 Hi,
 
 Do I misunderstand? A dynamic array is allocated memory according to the 
 `nextpow2()` algorithm(-1 lapse); strings, on the other hand, don't 
 behave like this...
 
 ```d
    string str = "0123456789ABCDEF";
    char[] chr = str.dup;
 
    assert(str.length == 16);
    assert(str.capacity == 0);
 
    import std.math: thus = nextPow2; //.algebraic
 
    assert(chr.capacity == thus(str.length) - 1);
    assert(chr.capacity == 31);
 ```
The capacity is how many elements of the array can be stored without reallocating *when appending*.

Why 0 for the string literal? Because it's not from the GC, and so has no capacity for appending (note that a capacity of 0 is returned even though the string currently has 16 characters in it).

Why 31 for the GC-allocated array? Because implementation details. But I can give you the details:

1. The GC allocates in powers of 2 (mostly). The smallest block is 16 bytes, and the next size up is 32 bytes.
2. In order to remember which parts of the block are used, it needs to allocate some space to record that value. For a 16-byte block, that requires 1 byte.

So it can't fit your 16-byte string + 1 byte for the capacity tracker into a 16 byte block, it has to go into a 32 byte block. And of course, 1 byte of that 32 byte block is for the capacity tracker, hence capacity 31.
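
A rough sketch of what "without reallocating when appending" means in practice. The helper `fillsInPlace` is a hypothetical name used only here, and the printed capacities are implementation details of the runtime in use, so no particular values are asserted.

```d
import std.stdio : writeln;

/// Hypothetical helper: appends `filler` until length reaches the capacity
/// reported up front; returns true if the data never moved while doing so.
bool fillsInPlace(ref char[] arr, char filler = 'x')
{
    auto start = arr.ptr;
    immutable cap = arr.capacity;
    while (arr.length < cap)
        arr ~= filler;
    return arr.ptr == start;
}

void main()
{
    // Capacities for lengths around the 16-byte boundary (values not guaranteed).
    foreach (n; [15, 16, 17])
    {
        auto arr = new char[](n);
        writeln(n, " chars -> capacity ", arr.capacity);
    }

    char[] chr = "0123456789ABCDEF".dup;
    assert(chr.capacity >= chr.length); // GC-allocated, so there is a capacity
    assert(fillsInPlace(chr));          // appends within capacity stay in place
    assert(chr.length == chr.capacity); // the block is now full
}
```

On the runtime described in this thread, the 16-character array would report 31; a different release could report something else.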
 Also, `.ptr` keeps the address of the most recent first element, right?
This statement suggests to me that you have an incorrect perception of a string. A string is a pointer paired with a length of how many characters after that pointer are valid. That's it. `str.ptr` is the pointer to the first element of the string.

There isn't a notion of "most recent first element".

-Steve
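
The pointer-plus-length view can be illustrated with a small sketch. `fromParts` is a hypothetical helper written for this example, not a library function.

```d
/// Hypothetical helper: rebuilds a slice from a raw pointer and a length.
string fromParts(immutable(char)* ptr, size_t length)
{
    return ptr[0 .. length];
}

void main()
{
    string s = "hello world";

    assert(s.ptr == &s[0]);                  // .ptr is the address of element 0
    assert(fromParts(s.ptr, s.length) is s); // same pointer, same length: same slice

    string tail = s[6 .. $];                 // slicing copies nothing
    assert(tail.ptr == s.ptr + 6);           // it only moves the pointer
    assert(tail.length == 5);                // and adjusts the length
    assert(tail == "world");
}
```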
Jun 02 2022
parent reply bauss <jj_1337 live.dk> writes:
On Thursday, 2 June 2022 at 20:12:30 UTC, Steven Schveighoffer 
wrote:
 This statement suggests to me that you have an incorrect 
 perception of a string. A string is a pointer paired with a 
 length of how many characters after that pointer are valid. 
 That's it. `str.ptr` is the pointer to the first element of the 
 string.

 There isn't a notion of "most recent first element".

 -Steve
This isn't correct either, at least with Unicode, since 1 byte isn't equal to 1 character and a character can be several bytes. I believe it's only true in Unicode for UTF-32, since all characters fit in the 4 bytes they have, but in UTF-8 and UTF-16 characters don't all take up the same number of bytes.
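
As a sketch of the point about encodings: in D, `.length` counts code units of the array's element type, so the same text reports different lengths as `string`, `wstring`, and `dstring`. `codeUnits` is a hypothetical helper for this example; the two code points are arbitrary picks.

```d
import std.conv : to;

/// Hypothetical helper: code units needed for `text` in UTF-8, UTF-16, UTF-32.
size_t[3] codeUnits(dstring text)
{
    return [text.to!string.length, text.to!wstring.length, text.length];
}

void main()
{
    auto e = codeUnits("\u00E9"d);     // U+00E9, inside the Basic Multilingual Plane
    assert(e[0] == 2 && e[1] == 1 && e[2] == 1);

    auto g = codeUnits("\U0001F600"d); // U+1F600, outside the BMP
    assert(g[0] == 4 && g[1] == 2 && g[2] == 1);
}
```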
Jun 03 2022
parent reply Adam D Ruppe <destructionator gmail.com> writes:
On Friday, 3 June 2022 at 12:49:07 UTC, bauss wrote:
 I believe it's only true in unicode for utf-32 since all 
 characters do fit in the 4 byte space they have
Depends how you define "character".
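
One way to read this remark, sketched with Phobos' `std.uni.byGrapheme`: a single user-perceived character can consist of several code points, so even a UTF-32 length need not match what a reader would count. `graphemeCount` is a hypothetical helper for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// Hypothetical helper: number of user-perceived characters (grapheme clusters).
size_t graphemeCount(string s)
{
    return s.byGrapheme.walkLength;
}

void main()
{
    string s = "e\u0301"; // 'e' followed by a combining acute accent

    assert(s.length == 3);           // UTF-8 code units: 1 + 2
    assert(s.walkLength == 2);       // code points (strings decode to dchar)
    assert("e\u0301"d.length == 2);  // UTF-32 code units: still 2
    assert(graphemeCount(s) == 1);   // but it displays as one character
}
```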
Jun 03 2022
parent bauss <jj_1337 live.dk> writes:
On Friday, 3 June 2022 at 12:52:30 UTC, Adam D Ruppe wrote:
 On Friday, 3 June 2022 at 12:49:07 UTC, bauss wrote:
 I believe it's only true in unicode for utf-32 since all 
 characters do fit in the 4 byte space they have
 Depends how you define "character".
I guess that's true as well. Unicode really made it impossible to just say "this string has so many characters because it has this many bytes."
Jun 03 2022