digitalmars.D - D design problem on platforms with <32 bit pointer width

reply Dukc <ajieskola gmail.com> writes:
This is a request for comments. I've written it as a Git gist as 
opposed to forum post because it's pretty long and may warrant 
some editing at some point.

Link: 
https://gist.github.com/dukc/04ea4d4a248ff4709f89d5808f67a5fe

Discussion can remain here though.
Aug 19 2023
next sibling parent reply Dom DiSc <dominikus scherkl.de> writes:
On Saturday, 19 August 2023 at 10:09:56 UTC, Dukc wrote:
 This is a request for comments. I've written it as a Git gist 
 as opposed to forum post because it's pretty long and may 
 warrant some editing at some point.

 Link: 
 https://gist.github.com/dukc/04ea4d4a248ff4709f89d5808f67a5fe

 Discussion can remain here though.
I hate those stupid promotion rules.
Every literal should be of the smallest type that can represent it (e.g. 0..255 should be ubyte, 256..65535 should be ushort, etc.; -1..-127 should be byte, -128..-32767 should be short, ...) and can be promoted implicitly to whatever is needed.

Yes, I think -128 should NOT be byte. That would be stupid and error-prone, because abs(-128) cannot be represented as a byte. In fact, 0x80 should be the NaN of byte, 0x8000 the NaN of short, etc., instead of exceptional negative values.

Also, operations should stay within the same type (the largest of the involved operands), e.g. short+short = short. Internally it may be best to work with the machine word size (whatever that is - it could even be 8 bits), but the result should be truncated to the intended type unless stated otherwise (via cast).
Aug 19 2023
parent reply Dukc <ajieskola gmail.com> writes:
On Saturday, 19 August 2023 at 10:43:59 UTC, Dom DiSc wrote:
 I hate those stupid promotion rules.
 Every literal should be of the smallest type that can represent 
 it
 (e.g. 0..255 should be ubyte, 256..65535 should be ushort, 
 etc., -1..-127 should be byte, -128..-32767 should be short, 
 ...) and can be promoted to whatever is needed implicitly.
It's not quite that simple. Consider:

```d
auto x = 40;
// 1600 now, 64 with your rules
x *= x;
```

Inference happens at initialisation of the variable. In this case, under your rules, it's `ubyte` because nothing bigger is needed to hold the initialisation value. Inference can't detect what is needed later; that is why int literals default to 32 bits.
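
A minimal sketch of the difference described above, runnable with a current D compiler. The helper names `squareAsInt` and `squareAsUbyte` are made up for illustration; the `ubyte` version only imitates what "smallest fitting type" inference would presumably produce, it is not an existing language mode.

```d
import std.stdio : writeln;

// Squares a value held in an int, which is what `auto x = 40;` infers today.
int squareAsInt(int x)
{
    x *= x;
    return x;
}

// Squares a value held in a ubyte, imitating inference of the smallest type.
// The compound assignment truncates the int result back to 8 bits.
ubyte squareAsUbyte(ubyte x)
{
    x *= x;
    return x;
}

void main()
{
    auto x = 40;
    static assert(is(typeof(x) == int)); // current inference rule

    writeln(squareAsInt(x));    // 1600
    writeln(squareAsUbyte(40)); // 64, i.e. 1600 modulo 256
}
```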
Aug 19 2023
next sibling parent reply sighoya <sighoya gmail.com> writes:
On Saturday, 19 August 2023 at 10:55:45 UTC, Dukc wrote:
 Inference happens at initialisation of the variable. In this 
 case, it's `ubyte` because nothing bigger is needed to hold the 
 initialisation value. Inference can't detect what is needed 
 later, that is why int literals default to 32 bits.
It would be better if 8 bit or 16 bit systems provided some sort of segmented pointer, such that allocating 32 bits would require a `struct Ptr {ubyte firstSegment, ubyte secondSegment, ubyte thirdSegment, ubyte fourthSegment}` on 8 bit architectures and a `struct Ptr {ubyte firstSegment, ubyte secondSegment}` on 16 bit systems.
Aug 19 2023
parent sighoya <sighoya gmail.com> writes:
On Saturday, 19 August 2023 at 12:58:48 UTC, sighoya wrote:
 `struct Ptr {ubyte firstSegment, ubyte secondSegment}` on 16 bit systems.
`ubyte` should be `ushort`, sorry.
Aug 19 2023
prev sibling parent Dom DiSc <dominikus scherkl.de> writes:
On Saturday, 19 August 2023 at 10:55:45 UTC, Dukc wrote:
 ```d
 auto x = 40;
 // 1600 now, 64 with your rules
 x *= x;
 ```

 Inference happens at initialisation of the variable.
Sorry, but if you say nothing about which type you want, you may always be surprised unless you know the inference rules. You could have written

```d
int x = 40;
```

which incidentally is even one character shorter. And then you have the guarantee that you have 31 bits (plus sign) for your calculations, no matter which hardware you're working on.

In your example you expect the C rules. But if the program runs on an 8-bit system, what you get will be a (signed) byte! So the result of your square may even be negative, breaking a whole lot of further assumptions about the result.
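
To make the "may even be negative" case concrete, here is a small sketch in current D. The function names are hypothetical, and the `byte` variant only imitates what a literal inferred as a signed 8-bit value would do; 12 is used instead of 40 because 12 * 12 = 144 lands in the negative half of the `byte` range.

```d
// Squares a value that is explicitly declared as int.
int squareAsInt(int x)
{
    return x * x;
}

// Squares a value that is kept in a signed 8-bit variable.
// The compound assignment truncates the int result back to byte.
byte squareAsByte(byte x)
{
    x *= x;
    return x;
}

void main()
{
    static assert(int.sizeof == 4); // D fixes int at 32 bits on every target

    assert(squareAsInt(12) == 144);
    assert(squareAsByte(12) == -112); // 144 - 256
}
```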
Aug 19 2023
prev sibling next sibling parent reply Johan <j j.nl> writes:
On Saturday, 19 August 2023 at 10:09:56 UTC, Dukc wrote:
 This is a request for comments. I've written it as a Git gist 
 as opposed to forum post because it's pretty long and may 
 warrant some editing at some point.

 Link: 
 https://gist.github.com/dukc/04ea4d4a248ff4709f89d5808f67a5fe
Nice, thanks.
I've added my comment (make a table of all interesting cases!) to the gist.

cheers,
 Johan
Aug 19 2023
parent reply Dukc <ajieskola gmail.com> writes:
On Saturday, 19 August 2023 at 11:29:58 UTC, Johan wrote:
 Nice, thanks.
 I've added my comment (make a table of all interesting cases!) 
 to the gist.
Thanks for the table. I'll see about double-checking and integrating it into my post later. Until then - well, it's already there in your comment after all.
Aug 22 2023
parent Dukc <ajieskola gmail.com> writes:
On Tuesday, 22 August 2023 at 08:25:19 UTC, Dukc wrote:
 On Saturday, 19 August 2023 at 11:29:58 UTC, Johan wrote:
 Nice, thanks.
 I've added my comment (make a table of all interesting cases!) 
 to the gist.
 Thanks for the table. I'll see about double-checking and integrating it into my post later. Until then - well, it's already there in your comment after all.
The table is now included in the gist with my checking and additions.
Aug 24 2023
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Thanks for taking the time to sum up the issues.

I have a ton of experience with 16 bit code. Abandoning it was an explicit 
decision for D, mainly to make code portable. Writing code that was portable 
between 16 and 32 bit was always a major effort for non-trivial code. 
Fortunately, these days, porting between 32 and 64 bit is trivial.

Some things are impractical for 16 bit code:

1. exception handling
2. garbage collection
3. typeinfo
4. classes (unless using the far memory model)
5. likely the bulk of druntime

I.e. sticking with betterC is more practical.

Some things are impractical for D:

1. mixed near/far pointers

It's fine if a D targeted at 16 bit code has somewhat different semantics. I 
oppose changing the semantics of 32/64 bit D, as it would break everything.

32 bit integer arithmetic is going to be too slow and consume too much code 
space, likely unnecessarily.

You correctly identified integer promotion as the cause of most trouble.

So, I propose the following modification for 16 bit code generation:

Keep integer promotion, but have it promote to short rather than int. I haven't 
thought about it deeply, but first impression says that it will resolve most of 
the issues.

This will require changing many of the ints in the source code to shorts. How 
onerous that would be, I do not know. One could do something like:

     version (SixteenBit)
         alias xint = short;
     else
         alias xint = int;

which would make the source code more portable, but keep in mind that integer 
overflow in the 16 bit world is a major source of unintended problems. 32 bit D 
code converted to 16 bit will need a thorough review.

Programs targeted at 16 bits are, naturally, going to be small programs. I doubt 
many D programs are that small. So there shouldn't be too much source code that 
needs modifying.

Anyhow, it seems like a fun project you're working on!
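
A rough sketch of what the promotion change would mean for arithmetic, written against today's compiler. Promotion to short does not exist yet, so `mulPromotedToShort` (a made-up name) only emulates it by truncating the int result; the real semantics would be whatever a 16 bit code generator ends up implementing.

```d
// Current D: both operands are promoted to int before the multiply.
int mulPromotedToInt(short a, short b)
{
    return a * b;
}

// Emulation of the proposed rule: the result stays 16 bits wide.
short mulPromotedToShort(short a, short b)
{
    return cast(short)(a * b);
}

void main()
{
    short a = 300, b = 300;
    static assert(is(typeof(a * b) == int)); // today's promotion target

    assert(mulPromotedToInt(a, b) == 90_000);
    assert(mulPromotedToShort(a, b) == 24_464); // 90_000 - 65_536
}
```

The second result is the kind of silent overflow that a review of 32 bit code converted to 16 bit would have to look for.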
Aug 20 2023
next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
To be clear, my proposal would mean that size_t would be an alias for ushort, 
and ptrdiff_t would be an alias for short.
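
For reference, a short sketch of how these aliases relate to pointer width on the targets D supports today. The function `pointerBits` is a hypothetical helper; on a 16 bit target following the proposal it would presumably report 16.

```d
// Width of a pointer on the current target, in bits.
size_t pointerBits()
{
    return (void*).sizeof * 8;
}

void main()
{
    // size_t and ptrdiff_t track the pointer size of the target.
    static assert(size_t.sizeof == (void*).sizeof);
    static assert(ptrdiff_t.sizeof == size_t.sizeof);

    int[] arr = [1, 2, 3];
    static assert(is(typeof(arr.length) == size_t));
    static assert(is(typeof(arr.ptr - arr.ptr) == ptrdiff_t));

    assert(pointerBits() == size_t.sizeof * 8);
}
```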
Aug 20 2023
next sibling parent Johan <j j.nl> writes:
On Sunday, 20 August 2023 at 19:15:13 UTC, Walter Bright wrote:
 To be clear, my proposal would mean that size_t would be an 
 alias for ushort, and ptrdiff_t would be an alias for short.
Hi Walter,

To have a more productive discussion, can you go through the table in my comment on the gist and look at the proposal column? I think it captures what you wrote.

Forum posts are not suited for this; we need clear definitions and a full overview. There are too many cases to keep track of, hence the table I made (and I am sure there are a bunch of items still missing from the table).

Please add any additional corner cases you can come up with. Just a piece of code with an unresolved 16-bit question is OK; a proposed solution is optional (that can be filled in later).

thanks,
 Johan
Aug 20 2023
prev sibling parent reply Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Sunday, 20 August 2023 at 19:15:13 UTC, Walter Bright wrote:
 To be clear, my proposal would mean that size_t would be an 
 alias for ushort, and ptrdiff_t would be an alias for short.
Most 16 bit machines use 32 bit sized size_t (m68k). The 16 bit size_t of x86 real mode is an anomaly.
Aug 25 2023
parent Dukc <ajieskola gmail.com> writes:
On Friday, 25 August 2023 at 17:01:38 UTC, Patrick Schluter wrote:
 On Sunday, 20 August 2023 at 19:15:13 UTC, Walter Bright wrote:
 To be clear, my proposal would mean that size_t would be an 
 alias for ushort, and ptrdiff_t would be an alias for short.
Most 16 bit machines use 32 bit sized size_t (m68k). The 16 bit size_t of x86 real mode is an anomaly.
Maybe. But in this thread we mean 16-bit as the pointer size. In fact the platform I'm experimenting on is actually 8-bit with 16-bit pointers: an AVR microcontroller.
Aug 25 2023
prev sibling next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 21/08/2023 7:13 AM, Walter Bright wrote:
 Keep integer promotion, but have it promote to short rather than int. I 
 haven't thought about it deeply, but first impression says that it will 
 resolve most of the issues.
This does beg another question. Should we tie integer promotion to the largest general purpose (fast) registers of a given target? On AMD64 that'll be (u)long, whereas on 16-bit x86 that'll be (u)short.
Aug 20 2023
parent reply Johan <j j.nl> writes:
On Sunday, 20 August 2023 at 20:06:29 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 21/08/2023 7:13 AM, Walter Bright wrote:
 Keep integer promotion, but have it promote to short rather 
 than int. I haven't thought about it deeply, but first 
 impression says that it will resolve most of the issues.
 This does beg another question. Should we tie integer promotion to the largest general purpose (fast) registers of a given target? On AMD64 that'll be (u)long, whereas on 16-bit x86 that'll be (u)short.
Please don't make this discussion any larger than it should be. -Johan
Aug 20 2023
parent Walter Bright <newshound2 digitalmars.com> writes:
I welcome discussion on Rikki's idea, but it should start a new thread.
Aug 20 2023
prev sibling next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 21/08/2023 7:13 AM, Walter Bright wrote:
 Some things are impractical for D:
 
  1. mixed near/far pointers
I've been thinking about this and I don't think that this is true.

Add two new core.attribute udas.

``enum near;`` and ``enum far;``

If we need segment support ``enum uda(string segment:"FS");`` would suffice.

Require it on all variable declarations, easy! Oh and don't forget to do matching of argument/expression validation to declaration for @safe code.
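
A minimal sketch of what such declarations might look like, using only UDA machinery that exists today. The `near` and `far` attributes and the variable names are hypothetical; nothing like them is in druntime, and current compilers attach no pointer-width meaning to them.

```d
import std.traits : hasUDA;

// Hypothetical marker attributes, declared the way the post suggests.
enum near;
enum far;

@near int* localData;  // imagined as an offset into the default segment
@far  int* remoteData; // imagined as a segment:offset pair

// The attributes are visible to compile-time introspection...
static assert( hasUDA!(localData, near));
static assert(!hasUDA!(localData, far));
static assert( hasUDA!(remoteData, far));

void main()
{
    // ...but a UDA attaches to the declaration, not to the type, so the
    // type system sees two ordinary int* variables.
    static assert(is(typeof(localData) == typeof(remoteData)));
    localData = remoteData; // compiles today without any diagnostic
}
```

The last two lines show why the matching step mentioned above would need new compiler support: as written, nothing stops a far pointer from being assigned to a near one.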
Aug 20 2023
next sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/20/2023 8:48 PM, Richard (Rikki) Andrew Cattermole wrote:
 On 21/08/2023 7:13 AM, Walter Bright wrote:
 Some things are impractical for D:

  1. mixed near/far pointers
I've been thinking about this and I don't think that this is true.
I've lived with that problem for 20 years. If there was an answer, one of the compiler vendors would have found it. It's ugly, and there are all kinds of problems with it. Interestingly, C++ compilers designed from the ground up to support multiple pointer types did work, up to a point. Compilers designed for 32 bit processors were never ported to near/far. It would be like a cyclone let loose on its innards.
 Add two new core.attribute udas.
 
 ``enum near;`` and ``enum far;``
 
 If we need segment support ``enum uda(string segment:"FS");`` would suffice.
 
 Require it on all variable declarations, easy! Oh and don't forget to do 
 matching of argument/expression validation to declaration for @safe code.
That's only the beginning of the problems.
Aug 20 2023
prev sibling parent Dukc <ajieskola gmail.com> writes:
On Monday, 21 August 2023 at 03:48:04 UTC, Richard (Rikki) Andrew 
Cattermole wrote:
 On 21/08/2023 7:13 AM, Walter Bright wrote:
 Some things are impractical for D:
 
  1. mixed near/far pointers
 I've been thinking about this and I don't think that this is true.

 Add two new core.attribute udas.

 ``enum near;`` and ``enum far;``

 If we need segment support ``enum uda(string segment:"FS");`` would suffice.

 Require it on all variable declarations, easy! Oh and don't forget to do matching of argument/expression validation to declaration for @safe code.
I don't think we need to discuss near/far pointers here. This is about your regular monosize pointers that just happen to be under 32 bits. You do not have the far pointer option at all. What to do when the platform has two different pointer widths, be they 16/32 or 32/64, is a different question.
Aug 22 2023
prev sibling parent Dukc <ajieskola gmail.com> writes:
On Sunday, 20 August 2023 at 19:13:36 UTC, Walter Bright wrote:
 So, I propose the following modification for 16 bit code 
 generation:

 Keep integer promotion, but have it promote to short rather 
 than int. I haven't thought about it deeply, but first 
 impression says that it will resolve most of the issues.
"for 16 bit code generation" - meaning, the int promotion target won't change for 32-bit platforms?
 This will require changing many of the ints in the source code 
 to shorts. How onerous that would be, I do not know. One could 
 do something like:
I think this is otherwise actually good, but it has one annoying trait left: integer literals. To make your code portable, you'll have to wrap your integers when doing pointer arithmetic or handling array lengths: `ptr += size_t(3)` or `new ubyte[oldArr.length + size_t(3)]`.

But this one does not have the performance problems of the 32-bit size_t solution I initially was in favour of. Still, I'd prefer changing the value range propagation rules instead, as that:

- works exactly the same way regardless of pointer width.
- does not mandate wrapping literals in a `size_t` or `ptrdiff_t` constructor.
- makes non-memory-address-related 8 and 16 bit arithmetic more comfortable to do while we're at it.

But either solution is a lot better than the current situation, at least when going by what the spec says.
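
A small sketch of the wrapped-literal style, which already compiles with current D. The function `grow` is a made-up helper. On today's 32 and 64 bit targets the `size_t(...)` wrappers are redundant; the assumption here is that they would only start to matter on a target where `size_t` is narrower than the type of a bare literal.

```d
// Returns a copy of oldArr with three extra zero-initialised elements.
ubyte[] grow(const ubyte[] oldArr)
{
    // The literal is wrapped so the length expression stays size_t.
    auto newArr = new ubyte[oldArr.length + size_t(3)];
    newArr[0 .. oldArr.length] = oldArr[];
    return newArr;
}

void main()
{
    ubyte[] oldArr = [1, 2, 3, 4];
    static assert(is(typeof(oldArr.length + size_t(3)) == size_t));

    assert(grow(oldArr).length == 7);

    // Same style for pointer arithmetic.
    auto ptr = oldArr.ptr;
    ptr += size_t(3);
    assert(*ptr == 4);
}
```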
Aug 22 2023
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 8/19/2023 3:09 AM, Dukc wrote:
 This is a request for comments. I've written it as a Git gist as opposed to 
 forum post because it's pretty long and may warrant some editing at some point.
 
 Link: https://gist.github.com/dukc/04ea4d4a248ff4709f89d5808f67a5fe
 
 Discussion can remain here though.
Ok, it looks the same as the "16bit Arch Proposal size_t=16bit, integer promotion to short" column. Looks like you beat me to it!
Aug 20 2023