digitalmars.D - D design problem on platforms with <32 bit pointer width

Dukc <ajieskola gmail.com> writes:

This is a request for comments. I've written it as a Git gist as 
opposed to forum post because it's pretty long and may warrant 
some editing at some point.

Link: 
https://gist.github.com/dukc/04ea4d4a248ff4709f89d5808f67a5fe

Discussion can remain here though.

Aug 19 2023

Dom DiSc <dominikus scherkl.de> writes:

On Saturday, 19 August 2023 at 10:09:56 UTC, Dukc wrote:
 This is a request for comments. I've written it as a Git gist 
 as opposed to forum post because it's pretty long and may 
 warrant some editing at some point.

 Link: 
 https://gist.github.com/dukc/04ea4d4a248ff4709f89d5808f67a5fe

 Discussion can remain here though.

I hate those stupid promotion rules.
Every literal should be of the smallest type that can represent it
(e.g. 0..255 should be ubyte, 256..65535 should be ushort, etc., 
-1..-127 should be byte, -128..-32767 should be short, ...) and 
can be promoted to whatever is needed implicitly.
Yes, I think -128 should NOT be byte. This would be stupid and 
error-prone because abs(-128) cannot be byte. In fact 0x80 should be 
the NaN of byte, 0x8000 the NaN of short, etc. instead of 
exceptional negative values.
Also the operations should stay within the same type (largest of 
the involved operands), e.g. short+short = short.
Internally it may be best to work with the machine word size 
(whatever that is - could be even 8 bit) but should be truncated 
to the intended result if not otherwise stated (via cast).

Aug 19 2023

Dukc <ajieskola gmail.com> writes:

On Saturday, 19 August 2023 at 10:43:59 UTC, Dom DiSc wrote:
 I hate those stupid promotion rules.
 Every literal should be of the smallest type that can represent 
 it
 (e.g. 0..255 should be ubyte, 256..65535 should be ushort, 
 etc., -1..-127 should be byte, -128..-32767 should be short, 
 ...) and can be promoted to whatever is needed implicitly.

It's not quite that simple. Consider:

```d
auto x = 40;
// 1600 now, 64 with your rules
x *= x;
```

Inference happens at initialisation of the variable. In this 
case, it's `ubyte` because nothing bigger is needed to hold the 
initialisation value. Inference can't detect what is needed 
later, that is why int literals default to 32 bits.
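
A minimal sketch of the difference, assuming the proposed "smallest type that fits" inference can be modelled by spelling out `ubyte` by hand. The function names `squareAsInt` and `squareAsUbyte` are made up for illustration:

```d
import std.stdio : writeln;

// Current D rules: the literal 40 is an int, so the square fits.
int squareAsInt()
{
    auto x = 40;
    static assert(is(typeof(x) == int));
    x *= x;
    return x;
}

// Hypothetical narrow inference, modelled by declaring ubyte explicitly.
ubyte squareAsUbyte()
{
    ubyte x = 40;
    x = cast(ubyte)(x * x); // 1600 truncated to the low 8 bits
    return x;
}

void main()
{
    writeln(squareAsInt());   // 1600
    writeln(squareAsUbyte()); // 64
}
```

The second result is 1600 modulo 256, which is where the 64 in the comment above comes from.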

Aug 19 2023

sighoya <sighoya gmail.com> writes:

On Saturday, 19 August 2023 at 10:55:45 UTC, Dukc wrote:
 Inference happens at initialisation of the variable. In this 
 case, it's `ubyte` because nothing bigger is needed to hold the 
 initialisation value. Inference can't detect what is needed 
 later, that is why int literals default to 32 bits.

It would be better if 8 bit or 16 bit systems provide some sort 
of segmented pointer such that allocating 32 bit would require 
a `struct Ptr {ubyte firstSegment, ubyte secondSegment, ubyte 
thirdSegment, ubyte fourthSegment}` on 8 bit architecture and 
`struct Ptr {ubyte firstSegment, ubyte secondSegment}` on 16 bit 
systems.

Aug 19 2023

sighoya <sighoya gmail.com> writes:

On Saturday, 19 August 2023 at 12:58:48 UTC, sighoya wrote:
 `struct Ptr {ubyte firstSegment, ubyte secondSegment}` on 
 16 bit systems.

ubyte should be ushort, sorry
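
With that correction, the 16 bit variant could be sketched roughly as below. This is only an illustration: the ordering of the two fields (low half first) and the `toLinear` helper are assumptions, not something proposed in the thread.

```d
import std.stdio : writefln;

struct Ptr
{
    ushort firstSegment;  // assumed here: low 16 bits of the address
    ushort secondSegment; // assumed here: high 16 bits of the address

    // Hypothetical helper: combine both halves into one 32 bit value.
    uint toLinear() const
    {
        return (cast(uint) secondSegment << 16) | firstSegment;
    }
}

void main()
{
    auto p = Ptr(0x5678, 0x1234);
    writefln("%08X", p.toLinear()); // 12345678
}
```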

Aug 19 2023

Dom DiSc <dominikus scherkl.de> writes:

On Saturday, 19 August 2023 at 10:55:45 UTC, Dukc wrote:
 ```d
 auto x = 40;
 // 1600 now, 64 with your rules
 x *= x;
 ```

 Inference happens at initialisation of the variable.

Sorry, but if you say nothing about which type you want, you may 
always be surprised unless you know the inference rules.
You could have written
```d
int x = 40;
```
which incidentally is even one character shorter. And then you 
have the guarantee that you have 31 bit (+sign) for your 
calculations - no matter on which hardware you're working.
In your example you expect the C rules. But if the program runs 
on an 8-bit system, what you get will be a (signed) byte! So the 
result of your square may even be negative, breaking a whole lot 
of further assumptions on the result.
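
A small sketch of the negative-result case, assuming arithmetic that stays in a signed 8-bit type can be modelled with an explicit `cast(byte)`. The helper name `squareAsByte` and the input 12 are chosen only for illustration:

```d
import std.stdio : writeln;

// Models a square whose result is stored back into a signed byte.
byte squareAsByte(byte v)
{
    return cast(byte)(v * v); // v * v is computed as int, then truncated
}

void main()
{
    writeln(squareAsByte(12)); // -112, because 144 does not fit in a byte

    int x = 12;                // explicit type: same result on any target
    writeln(x * x);            // 144
}
```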

Aug 19 2023

Johan <j j.nl> writes:

On Saturday, 19 August 2023 at 10:09:56 UTC, Dukc wrote:
 This is a request for comments. I've written it as a Git gist 
 as opposed to forum post because it's pretty long and may 
 warrant some editing at some point.

 Link: 
 https://gist.github.com/dukc/04ea4d4a248ff4709f89d5808f67a5fe

Nice, thanks.
I've added my comment (make a table of all interesting cases!) to 
the gist.

cheers,
   Johan

Aug 19 2023

Dukc <ajieskola gmail.com> writes:

On Saturday, 19 August 2023 at 11:29:58 UTC, Johan wrote:
 Nice, thanks.
 I've added my comment (make a table of all interesting cases!) 
 to the gist.

Thanks for the table. I'll see about double-checking and 
integrating it into my post later. Until then - well, it's already 
there in your comment after all.

Aug 22 2023

Dukc <ajieskola gmail.com> writes:

On Tuesday, 22 August 2023 at 08:25:19 UTC, Dukc wrote:
 On Saturday, 19 August 2023 at 11:29:58 UTC, Johan wrote:
 Nice, thanks.
 I've added my comment (make a table of all interesting cases!) 
 to the gist.

 Thanks for the table. I'll see about double-checking and 
 integrating it into my post later. Until then - well, it's 
 already there in your comment after all.

The table is now included in the gist with my checking and 
additions.

Aug 24 2023

Walter Bright <newshound2 digitalmars.com> writes:

Thanks for taking the time to sum up the issues.

I have a ton of experience with 16 bit code. Abandoning it was an explicit 
decision for D, mainly to make code portable. Writing code that was portable 
between 16 and 32 bit was always a major effort for non-trivial code. 
Fortunately, these days, porting between 32 and 64 bit is trivial.

Some things are impractical for 16 bit code:

1. exception handling
2. garbage collection
3. typeinfo
4. classes (unless using the far memory model)
5. likely the bulk of druntime

I.e. sticking with betterC is more practical.

Some things are impractical for D:

1. mixed near/far pointers

It's fine if a D targeted at 16 bit code has somewhat different semantics. I 
oppose changing the semantics of 32/64 bit D, as it would break everything.

32 bit integer arithmetic is going to be too slow and consume too much code 
space, likely unnecessarily.

You correctly identified integer promotion as the cause of most trouble.

So, I propose the following modification for 16 bit code generation:

Keep integer promotion, but have it promote to short rather than int. I haven't 
thought about it deeply, but first impression says that it will resolve most of 
the issues.

This will require changing many of the ints in the source code to shorts. How 
onerous that would be, I do not know. One could do something like:

     version (SixteenBit)
         alias xint = short;
     else
         alias xint = int;

which would make the source code more portable, but keep in mind that integer 
overflow in the 16 bit world is a major source of unintended problems. 32 bit D 
code converted to 16 bit will need a thorough review.
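
A rough sketch of the overflow hazard mentioned above. Instead of relying on the `SixteenBit` version identifier (which no current compiler defines), the element type is passed as a template parameter so both widths can be compared on any target. The helper `mul` is made up for illustration:

```d
import std.stdio : writeln;

// Multiplies two values and stores the result back into T,
// truncating the way an assignment to a narrower variable would.
T mul(T)(T a, T b)
{
    return cast(T)(a * b);
}

void main()
{
    writeln(mul!int(300, 300));   // 90000
    writeln(mul!short(300, 300)); // 24464, i.e. 90000 modulo 65536
}
```

Code that is correct with `xint = int` can silently produce the second result once `xint` becomes `short`, which is why the review is needed.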

Programs targeted at 16 bits are, naturally, going to be small programs. I doubt 
many D programs are that small. So there shouldn't be too much source code that 
needs modifying.

Anyhow, it seems like a fun project you're working on!

Aug 20 2023

Walter Bright <newshound2 digitalmars.com> writes:

To be clear, my proposal would mean that size_t would be an alias for ushort, 
and ptrdiff_t would be an alias for short.

Aug 20 2023

Johan <j j.nl> writes:

On Sunday, 20 August 2023 at 19:15:13 UTC, Walter Bright wrote:
 To be clear, my proposal would mean that size_t would be an 
 alias for ushort, and ptrdiff_t would be an alias for short.

Hi Walter,
   To have a more productive discussion, can you go through the 
table in my comment on the gist and look at the proposal column? 
I think it captures what you wrote.
Forum posts are not suited for this, we need clear definitions 
and a full overview. There are too many cases to keep track of, 
hence the table I made. (and I am sure there are a bunch of items 
still missing from the table)

Please add additional corner cases you can come up with. Just a 
piece of code with an unresolved 16-bit question is OK, proposed 
solution is optional (that can be filled in later).

thanks,
   Johan

Aug 20 2023

Patrick Schluter <Patrick.Schluter bbox.fr> writes:

On Sunday, 20 August 2023 at 19:15:13 UTC, Walter Bright wrote:
 To be clear, my proposal would mean that size_t would be an 
 alias for ushort, and ptrdiff_t would be an alias for short.

Most 16 bit machines use 32 bit sized size_t (m68k). The 16 bit 
size_t of x86 real mode is an anomaly.

Aug 25 2023

Dukc <ajieskola gmail.com> writes:

On Friday, 25 August 2023 at 17:01:38 UTC, Patrick Schluter wrote:
 On Sunday, 20 August 2023 at 19:15:13 UTC, Walter Bright wrote:
 To be clear, my proposal would mean that size_t would be an 
 alias for ushort, and ptrdiff_t would be an alias for short.

 Most 16 bit machines use 32 bit sized size_t (m68k). The 16 bit 
 size_t of x86 real mode is an anomaly.

Maybe. But in this thread we mean 16-bit as the pointer size. In 
fact the platform I'm experimenting in is actually 8-bit with 
16-bit pointers: AVR microcontroller.

Aug 25 2023

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 21/08/2023 7:13 AM, Walter Bright wrote:
 Keep integer promotion, but have it promote to short rather than int. I 
 haven't thought about it deeply, but first impression says that it will 
 resolve most of the issues.

This does beg another question. Should we make integer promotion tied to 
the largest general purpose (fast) registers for a given target? So 
AMD64 that'll be (u)long. Whereas 16bit x86 that'll be (u)short.

Aug 20 2023

Johan <j j.nl> writes:

On Sunday, 20 August 2023 at 20:06:29 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 21/08/2023 7:13 AM, Walter Bright wrote:
 Keep integer promotion, but have it promote to short rather 
 than int. I haven't thought about it deeply, but first 
 impression says that it will resolve most of the issues.

 This does beg another question. Should we make integer 
 promotion tied to the largest general purpose (fast) registers 
 for a given target? So AMD64 that'll be (u)long. Whereas 16bit 
 x86 that'll be (u)short.

Please don't make this discussion any larger than it should be.

-Johan

Aug 20 2023

Walter Bright <newshound2 digitalmars.com> writes:

I welcome discussion on Rikki's idea, but it should start a new thread.

Aug 20 2023

"Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:

On 21/08/2023 7:13 AM, Walter Bright wrote:
 Some things are impractical for D:
 
  1. mixed near/far pointers

I've been thinking about this and I don't think that this is true.

Add two new core.attributes udas.

``enum near;`` and ``enum far;``

If we need segment support ``enum uda(string segment:"FS");`` would suffice.

Require it on all variable declarations, easy! Oh and don't forget to do 
matching of argument/expression validation to declaration for @safe code.

Aug 20 2023

Walter Bright <newshound2 digitalmars.com> writes:

On 8/20/2023 8:48 PM, Richard (Rikki) Andrew Cattermole wrote:
 On 21/08/2023 7:13 AM, Walter Bright wrote:
 Some things are impractical for D:

  1. mixed near/far pointers

 
 I've been thinking about this and I don't think that this is true.

I've lived with that problem for 20 years. If there was an answer, one of the 
compiler vendors would have found it. It's ugly, and there are all kinds of 
problems with it.

Interestingly, C++ compilers designed from the ground up to support multiple 
pointer types did work, up to a point. Compilers designed for 32 bit processors 
were never ported to near/far. It would be like a cyclone let loose on its innards.


 Add two new core.attributes udas.
 
 ``enum near;`` and ``enum far;``
 
 If we need segment support ``enum uda(string segment:"FS");`` would suffice.
 
 Require it on all variable declarations, easy! Oh and don't forget to do 
 matching of argument/expression validation to declaration for @safe code.

That's only the beginning of the problems.

Aug 20 2023

Dukc <ajieskola gmail.com> writes:

On Monday, 21 August 2023 at 03:48:04 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 21/08/2023 7:13 AM, Walter Bright wrote:
 Some things are impractical for D:
 
  1. mixed near/far pointers

 I've been thinking about this and I don't think that this is 
 true.

 Add two new core.attributes udas.

 ``enum near;`` and ``enum far;``

 If we need segment support ``enum uda(string segment:"FS");`` 
 would suffice.

 Require it on all variable declarations, easy! Oh and don't 
 forget to do matching of argument/expression validation to 
 declaration for @safe code.

I don't think we need to discuss near/far pointers here. This is 
about your regular monosize pointers that just happen to be under 
32 bits. You do not have the far pointer option at all. What to do 
when the platform has two different pointer widths, be they 16/32 
or 32/64, is a different question.

Aug 22 2023

Dukc <ajieskola gmail.com> writes:

On Sunday, 20 August 2023 at 19:13:36 UTC, Walter Bright wrote:
 So, I propose the following modification for 16 bit code 
 generation:

 Keep integer promotion, but have it promote to short rather 
 than int. I haven't thought about it deeply, but first 
 impression says that it will resolve most of the issues.

"for 16 bit code generation" - meaning, the int promotion target 
won't change for 32-bit platforms?

 This will require changing many of the ints in the source code 
 to shorts. How onerous that would be, I do not know. One could 
 do something like:

I think this is otherwise actually good, but has one annoying 
trait left: integer literals. To make your code portable, you'll 
have to wrap your integers when doing pointer arithmetic or 
handling array lengths: `ptr += size_t(3)` or `new 
ubyte[oldArr.length + size_t(3)]`. But this one does not have the 
performance problems of the 32-bit size_t solution I initially 
was in favour of.

Still, I'd prefer changing the value range propagation rules 
instead as that:
  - works exactly the same way regardless of pointer width.
  - does not mandate wrapping in literals with `size_t` or 
`ptrdiff_t` constructor.
  - makes non memory address related 8 and 16 bit arithmetic more 
comfortable to do while there.

But either solution is a lot better than the current situation, 
at least when going by what the spec says.
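
As a sketch of what the literal wrapping looks like in practice, written so that it also compiles on today's 32 and 64 bit targets, where the `size_t(...)` wrappers are redundant. The function name `growByThree` is made up for illustration:

```d
import std.stdio : writeln;

// Returns a copy of oldArr with three extra zeroed elements,
// wrapping the literal in size_t as described above.
ubyte[] growByThree(const(ubyte)[] oldArr)
{
    auto newArr = new ubyte[oldArr.length + size_t(3)];
    newArr[0 .. oldArr.length] = oldArr[];
    return newArr;
}

void main()
{
    ubyte[] a = [1, 2];
    auto b = growByThree(a);
    writeln(b.length); // 5

    const(ubyte)* ptr = b.ptr;
    ptr += size_t(1);  // same wrapping for pointer arithmetic
    writeln(*ptr);     // 2
}
```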

Aug 22 2023

Walter Bright <newshound2 digitalmars.com> writes:

On 8/19/2023 3:09 AM, Dukc wrote:
 This is a request for comments. I've written it as a Git gist as opposed to 
 forum post because it's pretty long and may warrant some editing at some point.
 
 Link: https://gist.github.com/dukc/04ea4d4a248ff4709f89d5808f67a5fe
 
 Discussion can remain here though.

Ok, it looks like the same as the "16bit Arch Proposal size_t=16bit, integer 
promotion to short" column. Looks like you beat me to it!

Aug 20 2023
