digitalmars.D.learn - Counting an initialised array, and segments

Cecil Ward <cecil cecilward.com> writes:

I recently had some problems

dchar[] arr = [ ' ', TAB, CR, LF … ];

and I got errors from the compiler which led to me having to 
count the elements in the initialiser and declare the array with 
an explicit size. I don’t want the array to be mutable so I later 
added immutable to it, but that didn’t help matters. At one 
point, because the array was quite long, I got the arr[ 
n_elements ] number wrong, it was too small and the remainder of 
the array was full of 0xffs (or something), which was good, 
helped me spot the bug.

Is there any way to get the compiler to count the number of 
elements in the initialiser and set the array to that size ? And 
it’s immutable.

The only reason that I’m giving it a name is that I want the 
object to be used in several places and I don’t want multiple 
copies of it in the code/readonly initialised data segment.

Another couple of unrelated questions: is there such a thing as a 
no-execute initialised readonly data segment? I’m seeing 
immutables going into the code segment, I think, with x86 LDC at 
least, can’t remember about GDC. Anyway on x86-64 immutables are 
addressed as [rip + displ] which is very pleasing as it’s vastly 
more efficient than accessing statics in TLS which seems to be a 
nightmare in Linux at least.

In MS Windows, isn’t TLS dealt with using FS: ( or GS: ?) 
prefixes? Shame this doesn’t seem to be exploited in Linux, or am 
I wrong?

I’d like to deal with the overhead of retrieving the static base 
address all the time in the Linux situation (if I have got the 
right end of the stick). But having an ‘application object’, a 
struct in an alloc cell or something that contains all the 
statics, and passing a pointer to it everywhere seems a nightmare 
too: it eats a register and, worse, eats one of the precious 
function argument registers, which are in short supply on e.g. 
x86-64, where there are fewer than half a dozen argument registers 
allowed. I realise that one can deal with that limit by rolling 
some arguments up into a passed struct, but that introduces a 
level of indirection and other overhead; the alternative is just 
to live with the extra args going onto the stack, which isn’t the 
worst thing in the world. I wonder what others do about statics 
in TLS?

Jun 25 2023
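
On the TLS question in the post above, here is a small sketch of how D's storage classes affect where a module-level variable lives. To my knowledge, plain module-level variables are thread-local by default, while immutable and __gshared ones are shared by all threads and so need no TLS lookup. All names below are made up for illustration, and which object-file section the data ends up in is left to the compiler and linker:

```d
import core.thread : Thread;

// Plain module-level variables are thread-local by default.
int perThreadCounter;

// immutable data is implicitly shared between threads, so it is not in TLS.
immutable dchar[3] lineBreaks = ['\n', '\r', '\u2028'];

// __gshared gives a single C-style global; access is not synchronised.
__gshared int processWideCounter;

void main()
{
    perThreadCounter = 1;
    processWideCounter = 1;

    auto t = new Thread({
        // A new thread starts with its own default-initialised TLS copy...
        assert(perThreadCounter == 0);
        // ...but sees the same __gshared and immutable objects.
        assert(processWideCounter == 1);
        assert(lineBreaks[0] == '\n');
    });
    t.start();
    t.join();
}
```

So a read-only lookup table declared immutable at module scope should avoid the TLS overhead without any pointer to an 'application object' being passed around.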

novice2 <sorryno em.ail> writes:

```
import std;
auto arr = [dchar(' '), '\t', 0x0a, 0x10];
void main()
{
     writeln("Hello D: ", typeid(arr));
}
```

Jun 25 2023

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, June 25, 2023 4:08:19 PM MDT Cecil Ward via Digitalmars-d-learn 
wrote:
 I recently had some problems

 dchar[] arr = [ ' ', TAB, CR, LF … ];

 and I got errors from the compiler which led to me having to
 count the elements in the initialiser and declare the array with
 an explicit size. I don’t want the array to be mutable so I later
 added immutable to it, but that didn’t help matters. At one
 point, because the array was quite long, I got the arr[
 n_elements ] number wrong, it was too small and the remainder of
 the array was full of 0xffs (or something), which was good,
 helped me spot the bug.

 Is there any way to get the compiler to count the number of
 elements in the initialiser and set the array to that size ? And
 it’s immutable.

Without seeing the errors, I can't really say what the problem was, but most
character literals are going to be char, not dchar, so you may have had
issues related to the type that the compiler was inferring for the array
literal. I don't recall at the moment how exactly the compiler decides the
type of an array literal when it's given values of differing types for the
elements.

Either way, if you want a static array, and you don't want to have to count
the number of elements, then
https://dlang.org/phobos/std_array.html#staticArray should take care of that
problem.

- Jonathan M Davis

Jun 26 2023
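
A minimal sketch of the staticArray suggestion above. The enum names TAB, CR and LF come from the original question, but their definitions here (typed as dchar) and the name whitespace are assumptions for illustration. The length is inferred from the literal, so nothing has to be counted by hand:

```d
import std.array : staticArray;
import std.stdio : writeln;

// Assumed definitions: typing the constants as dchar keeps the
// array literal's element type from being widened to an integer.
enum dchar TAB = '\t';
enum dchar CR  = '\r';
enum dchar LF  = '\n';

// Every element is a dchar, so the literal is a dchar[] and
// staticArray turns it into a dchar[4] with the length inferred.
immutable whitespace = [dchar(' '), TAB, CR, LF].staticArray;

static assert(is(typeof(whitespace) == immutable(dchar[4])));

void main()
{
    writeln(whitespace.length);
}
```

Because the variable is immutable and declared once at module scope, it can be referred to from several places without repeating the literal.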

Cecil Ward <cecil cecilward.com> writes:

On Monday, 26 June 2023 at 08:26:31 UTC, Jonathan M Davis wrote:
 [...]

Where I used symbolic names, such as TAB, that was defined as an 
int (or uint)
enum TAB = 9;
or
enum uint TAB = 9;
I forget which. So I had at least one item that was typed 
something wider than a char.

I tried the usual sizeof( arr )/ sizeof dchar, compiler wouldn’t 
have that for some reason, and yes I know it should be D syntax, 
god how I long for C sizeof()!

Jun 26 2023

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, June 26, 2023 5:08:06 AM MDT Cecil Ward via Digitalmars-d-learn 
wrote:
 [...]

sizeof is a property in D. So, you can do char.sizeof or varName.sizeof. But
regardless, there really is no reason to use sizeof with D arrays under
normal circumstances. And in the case of dynamic arrays, sizeof will give
you the size of the dynamic array itself, not the slice of memory that it
refers to. You're essentially using sizeof on

struct DynamicArray(T)
{
    size_t length;
    T* ptr;
}

which is not going to tell you anything about the memory it points to. The
length property of an array already tells you the length of the array (be it
static or dynamic), so using sizeof like you're talking about really does
not apply to D.

And I wouldn't advise using uint for a character in D. That's what char,
wchar, and dchar are for. Depending on the circumstances, you get implicit
conversions between character and integer types, but they are distinct
types, and mixing and matching them willy-nilly could result in compilation
errors depending on what your code is doing.

- Jonathan M Davis

Jun 26 2023
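
To illustrate the difference between .sizeof and .length described above, a short sketch (the variable names are arbitrary):

```d
import std.stdio : writeln;

void main()
{
    dchar[] dyn = [dchar('a'), 'b', 'c'];
    dchar[3] fixed = [dchar('a'), 'b', 'c'];

    // .length is the element count for both kinds of array.
    assert(dyn.length == 3);
    assert(fixed.length == 3);

    // For a dynamic array, .sizeof measures the (length, pointer) pair,
    // not the memory that the slice refers to.
    static assert(dyn.sizeof == size_t.sizeof + (void*).sizeof);

    // For a static array, .sizeof is the storage for all the elements.
    static assert(fixed.sizeof == 3 * dchar.sizeof);

    // Byte size of the data a slice refers to, should it ever be needed.
    assert(dyn.length * dchar.sizeof == 12);

    writeln("ok");
}
```

The C idiom sizeof(arr) / sizeof(arr[0]) therefore reduces to arr.length in D.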

Cecil Ward <cecil cecilward.com> writes:

On Monday, 26 June 2023 at 12:28:15 UTC, Jonathan M Davis wrote:
 [...]

No, point taken, a sloppy example. I don’t in fact do that in the 
real code. I use dchar everywhere appropriate instead of uint. In 
fact I have aliases for dstring and dchar, and I successfully did 
an alternative build with the aliases changed to use 16-bit wchar 
/ wstring instead of 32-bit, and all was well, just to test that 
the code is code-word-size-independent. I would need to do 
something different though if I ever decided to change to 16-bit 
code words in memory, because I would still want to manipulate 
32-bit values for code points when they are being handled in 
registers, for efficiency as well as correctness, as 16-bit 
‘partial words’ are bad news for performance on x86-64. I perhaps 
ought to introduce a new alias called codepoint, which is always 
32 bits, to distinguish dchar in registers from words in memory. 
It turns out that I can get away with not caring about UTF-16, as 
I’m merely _scanning_ a string. I couldn’t ever get away with 
changing the in-memory code word type to 8-bit chars and using 
UTF-8 though, as I do occasionally deal with non-ASCII 
characters, and I would have to either preconvert the UTF-8 to do 
the decoding, or parse 8-bit code words and handle the decoding 
myself on the fly, which would be madness. If I have to handle 
UTF-8 data I will just preconvert it.

Jun 26 2023

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Monday, June 26, 2023 1:09:24 PM MDT Cecil Ward via Digitalmars-d-learn 
wrote:
 [...]

Well, I can't really comment on the details of what you're doing, since I
don't know them, but I would point out that a dchar is a code point by
definition. That is its purpose. char is a UTF-8 code unit, wchar is a
UTF-16 code unit, and dchar is both a UTF-32 code unit and a code point,
since UTF-32 code units are code points by definition. It is possible for a
dchar to be an invalid code point if you give it bad data, but code points
are 32-bit, and dchar is intended to represent that. Actual characters, of
course, can be multiple code points, annoyingly enough, so all of that
Unicode stuff is of course an annoyingly complicated mess, but D and Phobos
do have a pretty good set of primitives for handling code units and code
points without programmers needing to come up with their own types for
those.

The primary mistake in what D has is that strings are all ranges of dchar
with the code units automatically being decoded to dchar by front, popFront,
etc. (at the time, Andrei thought that that would ensure correctness, since
he didn't understand that you could have characters that were multiple code
points). We'd like to get rid of that, but it's difficult to do so without
breaking code. std.utf.byCodeUnit helps work around that, and of course, you
can do so by simply operating on the strings as arrays without using the
range primitives, but the range primitives do decode to dchar,
unfortunately. However, in spite of that quirk, the tools are there to
operate on Unicode correctly in a way that doesn't exist out of the box with
many languages. So, in general, you shouldn't need to be creating new types
for Unicode primitives. The language already has that.

- Jonathan M Davis

Jun 26 2023
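
The auto-decoding behaviour described above can be seen in a few lines. This is only a sketch; the sample string is arbitrary:

```d
import std.range : ElementType, walkLength;
import std.utf : byCodeUnit;

void main()
{
    string s = "h\u00E9llo";   // 'é' takes two UTF-8 code units

    // As an array, a string is measured in code units.
    assert(s.length == 6);

    // The range primitives auto-decode, so a string iterates as dchar.
    static assert(is(ElementType!string == dchar));
    assert(s.walkLength == 5);

    // byCodeUnit opts out of auto-decoding and yields the raw code units.
    static assert(is(ElementType!(typeof(s.byCodeUnit)) == immutable(char)));
    assert(s.byCodeUnit.walkLength == 6);
}
```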

Cecil Ward <cecil cecilward.com> writes:

On Monday, 26 June 2023 at 22:19:25 UTC, Jonathan M Davis wrote:
 On Monday, June 26, 2023 1:09:24 PM MDT Cecil Ward via 
 Digitalmars-d-learn wrote:
 [...]

 [...]

I completely agree with everything you said. I merely used 
aliases to give me the freedom to switch between having text in 
either UTF-16 or UTF-32 in memory, and see how the performance 
changes. That’s the only reason for me doing that. I also want to 
keep a clear distinction between words in memory and code points 
in registers.

Jun 26 2023
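
One way the alias approach described above might look. This is a sketch under assumptions: the alias names CodeUnit and Text, the version identifier Utf16InMemory and the function countCodePoints are all hypothetical and do not come from the actual code. std.utf.decode returns a dchar whichever code unit type is used in memory, which keeps code points in registers distinct from code units in memory:

```d
import std.utf : decode;

// Hypothetical switch: compile with -version=Utf16InMemory to
// store text as UTF-16 instead of UTF-32.
version (Utf16InMemory)
    alias CodeUnit = wchar;
else
    alias CodeUnit = dchar;

alias Text = immutable(CodeUnit)[];

// Scans the text and counts code points. Each code point is
// handled as a dchar, whatever CodeUnit happens to be.
size_t countCodePoints(Text text)
{
    size_t index = 0;
    size_t count = 0;
    while (index < text.length)
    {
        dchar cp = decode(text, index); // advances index past the code point
        ++count;
    }
    return count;
}

void main()
{
    // U+1F600 is one code unit in UTF-32 and a surrogate pair in UTF-16.
    Text t = "a\U0001F600b";
    assert(countCodePoints(t) == 3);
}
```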
