digitalmars.D.learn - Integer precision of function return types

Per Nordlöw (16/16) Sep 25 2024 Should a function like

monkyyy (4/20) Sep 26 2024 It will only matter if its stored; stack or the very probable
user1234 (7/23) Sep 26 2024 I have no conclusive answer:
Salih Dincer (61/77) Sep 26 2024 When I use standard library facilities, I try to use ubyte; for
Quirin Schroll (32/48) Sep 26 2024 ```d
thinkunix (5/23) Sep 26 2024 What about using 'auto' as the return type?

monkyyy (3/8) Sep 26 2024 You have started a style debate that will last a week, great work

thinkunix (8/17) Sep 27 2024 That was not my intent. It was an honest question. I'm here to learn
H. S. Teoh (32/38) Sep 27 2024 In idiomatic D, you'd use `auto` when either (1) you don't care what the

Salih Dincer (8/24) Sep 27 2024 Once my range didn't work because I used **auto** instead of

Jonathan M Davis (62/78) Sep 27 2024 Well, I don't think that auto is a particularly controversial topic amon...
thinkunix (2/7) Sep 27 2024 Thank you! That was a very helpful response.
thinkunix (5/7) Sep 27 2024 Thank you Jonathan for that very detailed response.

Jonathan M Davis (21/37) Sep 27 2024 I would argue that ubyte would be better, because it's guaranteed to fit

Per Nordlöw <per.nordlow gmail.com> writes:

Should a function like

```d
uint parseHex(in char ch) pure nothrow @safe @nogc {
	switch (ch) {
	case '0': .. case '9':
		return ch - '0';
	case 'a': .. case 'f':
		return 10 + ch - 'a';
	case 'A': .. case 'F':
		return 10 + ch - 'A';
	default:
		assert(0, "Non-hexadecimal character");
	}
}
```

instead return an ubyte?

Sep 25 2024

monkyyy <crazymonkyyy gmail.com> writes:

On Thursday, 26 September 2024 at 06:53:12 UTC, Per Nordlöw wrote:
 Should a function like

 ```d
 uint parseHex(in char ch) pure nothrow @safe @nogc {
 	switch (ch) {
 	case '0': .. case '9':
 		return ch - '0';
 	case 'a': .. case 'f':
 		return 10 + ch - 'a';
 	case 'A': .. case 'F':
 		return 10 + ch - 'A';
 	default:
 		assert(0, "Non-hexadecimal character");
 	}
 }
 ```

 instead return an ubyte?

It will only matter if it's stored. Whether the value stays on the 
stack or, very probably, gets inlined away, the function should just 
be as simple as possible so you don't confuse the optimizer.

Sep 26 2024

user1234 <user1234 12.de> writes:

On Thursday, 26 September 2024 at 06:53:12 UTC, Per Nordlöw wrote:
 Should a function like

 ```d
 uint parseHex(in char ch) pure nothrow @safe @nogc {
 	switch (ch) {
 	case '0': .. case '9':
 		return ch - '0';
 	case 'a': .. case 'f':
 		return 10 + ch - 'a';
 	case 'A': .. case 'F':
 		return 10 + ch - 'A';
 	default:
 		assert(0, "Non-hexadecimal character");
 	}
 }
 ```

 instead return an ubyte?

I have no conclusive answer:

- From an ABI PoV that does not matter: it's AL vs. EAX, i.e. the same 
"parent" register.
- From a self-documenting PoV I'd use ubyte. But then you hit the 
problem of integer promotion of `ch - ...`, and you have to cast each 
of them.
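A minimal sketch of the promotion problem described above, assuming a `ubyte` return type and no value-range tricks: each `ch - ...` expression has type `int`, so every `return` needs its own `cast(ubyte)`.

```d
// Sketch: returning ubyte forces a cast on every branch,
// because arithmetic on char is promoted to int.
ubyte parseHex(immutable char ch) pure nothrow @safe @nogc
{
    switch (ch)
    {
    case '0': .. case '9':
        return cast(ubyte)(ch - '0');
    case 'a': .. case 'f':
        return cast(ubyte)(10 + ch - 'a');
    case 'A': .. case 'F':
        return cast(ubyte)(10 + ch - 'A');
    default:
        assert(0, "Non-hexadecimal character");
    }
}

// Without the cast, the int result does not implicitly narrow to ubyte.
static assert(!__traits(compiles, (char ch) { ubyte r = 10 + ch - 'a'; return r; }));

void main()
{
    assert(parseHex('7') == 7);
    assert(parseHex('b') == 11);
    assert(parseHex('F') == 15);
}
```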

Sep 26 2024

Salih Dincer <salihdb hotmail.com> writes:

On Thursday, 26 September 2024 at 06:53:12 UTC, Per Nordlöw wrote:
 Should a function like

 ```d
 uint parseHex(in char ch) pure nothrow @safe @nogc {
 	switch (ch) {
 	case '0': .. case '9':
 		return ch - '0';
 	case 'a': .. case 'f':
 		return 10 + ch - 'a';
 	case 'A': .. case 'F':
 		return 10 + ch - 'A';
 	default:
 		assert(0, "Non-hexadecimal character");
 	}
 }
 ```

 instead return an ubyte?

When I use standard library facilities, I try to use ubyte; for 
example:

(See "toggle comment" in the section...)

```d
void parseFromHexString(R)(out R hex, const(char)[] str)
{
   import std.algorithm : map, copy;
   import std.conv      : to;
   import std.range     : front, chunks;

   alias T = typeof(hex.front);
   str.chunks(T.sizeof * 2)
      .map!(bin => bin
      .to!T(16))
      .copy(hex[]);
}

import std.stdio;
void main()
{
   enum hex = "48656C6C6F2044202620576F726C6421";
   enum size = hex.length / 2;

   auto sample = imported!"std.conv".hexString!hex;
   sample.writeln; // Hello D & World!

   enum hexStr = x"48656C6C6F2044202620576F726C6421";
   hexStr.writeln; // Hello D & World!
   assert(is(typeof(hexStr) == string));

   immutable int[] intStr = x"48656C6C6F2044202620576F726C6421";
   intStr.writeln; // [1214606444, 1864385568, 639653743, 1919706145]


   int[size] buf;
   buf.parseFromHexString(hex);
   buf.writeln;

   //char[size] buff; /*
   ubyte[size] buff;/* please toggle comment with above */
   buff.parseFromHexString("BADEDE");
   buff.writeln;
}
```

But when I write my own functions, I have full control and do what 
I want. Below you could also use char; ubyte is not a problem 
either:

```d
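auto toHexDigit(char value)
{
   if(value > 9) value += 7; // skip the 7 characters between '9' and 'A'
   return cast(char)('0' + value); // cast keeps the result a char, not an int
}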

auto toHexString(R)(R str)
{
   string result;

   char a, b;
   foreach(char c; str)
   {
     a = c / 16; b = c % 16;
     result ~= a.toHexDigit;
     result ~= b.toHexDigit;
   }
   return result;
}

void main()
{
   enum hex = "48656C6C6F2044202620576F726C6421";
   auto sample = imported!"std.conv".hexString!hex;
   assert(sample.toHexString == hex);
}
```

SDB 79

Sep 26 2024

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Thursday, 26 September 2024 at 06:53:12 UTC, Per Nordlöw wrote:
 Should a function like

 ```d
 uint parseHex(in char ch) pure nothrow @safe @nogc {
 	switch (ch) {
 	case '0': .. case '9':
 		return ch - '0';
 	case 'a': .. case 'f':
 		return 10 + ch - 'a';
 	case 'A': .. case 'F':
 		return 10 + ch - 'A';
 	default:
 		assert(0, "Non-hexadecimal character");
 	}
 }
 ```

 instead return an ubyte?

```d
ubyte parseHex(immutable char ch) pure nothrow @safe @nogc {
	switch (ch) {
	case '0': .. case '9':
		return (ch - '0') & 0x0F;
	case 'a': .. case 'f':
		return (10 + ch - 'a') & 0x0F;
	case 'A': .. case 'F':
		return (10 + ch - 'A') & 0x0F;
	default:
		assert(0, "Non-hexadecimal character");
	}
}
```

I’d say yes, use `ubyte`. I also did two things:
- `(…) & 0x0F` to enable value-range propagation. Essentially, 
the compiler understands that the result of `& 0x0F` always lies 
in the range 0 to 15, which fits in a `ubyte`, therefore the 
expression implicitly converts. Always use implicit conversions 
when they avoid using `cast`. With `cast`, in general, you can do 
bad things. The compiler only allows safe conversions implicitly, 
even in `@system` code. Your code is marked `@safe`, but this is 
general advice.
- I removed `in` from the parameter and used `immutable`. The 
`in` storage class means `const` as of now, but with the 
`-preview=in` and `-preview=dip1000` switches combined, it also 
means `scope`, and `scope` means something to DIP1000, which can 
become dangerous in `@system` code. Do not use `in` unless you 
know exactly why you’re using it.
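A small sketch of the value-range propagation point, using a hypothetical digits-only helper `maskedDigit`: the masked expression narrows to `ubyte` implicitly, while the unmasked one (range -48 to 207) is rejected.

```d
// Sketch of value-range propagation (VRP): the compiler tracks the
// possible range of an integer expression and allows an implicit
// narrowing conversion only when that whole range fits the target.
ubyte maskedDigit(char ch) pure nothrow @safe @nogc // hypothetical helper
{
    assert(ch >= '0' && ch <= '9', "not a decimal digit");
    // `& 0x0F` limits the result to 0 .. 15, so no cast is needed.
    return (ch - '0') & 0x0F;
}

// With the mask: accepted.
static assert(__traits(compiles, (char ch) { ubyte b = (ch - '0') & 0x0F; return b; }));
// Without the mask the range is -48 .. 207, which does not fit ubyte.
static assert(!__traits(compiles, (char ch) { ubyte b = ch - '0'; return b; }));

void main()
{
    assert(maskedDigit('7') == 7);
}
```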

Also, for what it’s worth, you could use an `in` and `out` 
contract.
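A sketch of what that could look like with D's expression-based contracts: the `in` contract states the precondition (here via `std.ascii.isHexDigit`) and the `out` contract checks that the result fits in 0 to 15. The exact conditions are my assumption, not something given in the thread.

```d
import std.ascii : isHexDigit;

// Sketch: contracts document the input domain and the result range.
ubyte parseHex(immutable char ch) pure nothrow @safe @nogc
in (isHexDigit(ch), "Non-hexadecimal character")
out (result; result < 16)
{
    switch (ch)
    {
    case '0': .. case '9':
        return (ch - '0') & 0x0F;
    case 'a': .. case 'f':
        return (10 + ch - 'a') & 0x0F;
    case 'A': .. case 'F':
        return (10 + ch - 'A') & 0x0F;
    default:
        assert(0, "Non-hexadecimal character");
    }
}

void main()
{
    assert(parseHex('0') == 0);
    assert(parseHex('c') == 12);
    assert(parseHex('E') == 14);
}
```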

Sep 26 2024

thinkunix <thinkunix zoho.com> writes:

Per Nordlöw via Digitalmars-d-learn wrote:
 Should a function like
 
 ```d
 uint parseHex(in char ch) pure nothrow @safe @nogc {
      switch (ch) {
      case '0': .. case '9':
          return ch - '0';
      case 'a': .. case 'f':
          return 10 + ch - 'a';
      case 'A': .. case 'F':
          return 10 + ch - 'A';
      default:
          assert(0, "Non-hexadecimal character");
      }
 }
 ```
 
 instead return an ubyte?

What about using 'auto' as the return type?
I tried it and it seemed to work OK.

Wondering if there are any good reasons to use auto,
or bad reasons why not to use auto here?

Sep 26 2024

monkyyy <crazymonkyyy gmail.com> writes:

On Friday, 27 September 2024 at 04:23:32 UTC, thinkunix wrote:
 
 What about using 'auto' as the return type?
 I tried it and it seemed to work OK.

 Wondering if there are any good reasons to use auto,
 or bad reasons why not to use auto here?

You have started a style debate that will last a week, great work

Auto is fantastic and everyone should use it more

Sep 26 2024

thinkunix <thinkunix zoho.com> writes:

monkyyy via Digitalmars-d-learn wrote:
 On Friday, 27 September 2024 at 04:23:32 UTC, thinkunix wrote:
 What about using 'auto' as the return type?
 I tried it and it seemed to work OK.

 Wondering if there are any good reasons to use auto,
 or bad reasons why not to use auto here?

 
 You have started a style debate that will last a week, great work

That was not my intent.  It was an honest question.  I'm here to learn
and not looking to start debates or for attitude.

I've seen a lot of "use auto everywhere" especially in C++ and was
wondering where the D community stands on its use.  Is it generally
favored or not?

Personally, I think auto makes understanding code harder for humans.
But in this example, it seemed like auto was a good fit.

Sep 27 2024

"H. S. Teoh" <hsteoh qfbox.info> writes:

On Fri, Sep 27, 2024 at 04:13:45PM -0400, thinkunix via Digitalmars-d-learn
wrote:
[...]
 I've seen a lot of "use auto everywhere" especially in C++ and was
 wondering where the D community stands on its use.  Is it generally
 favored or not?
 
 Personally, I think auto makes understanding code harder for humans.
 But in this example, it seemed like auto was a good fit.

In idiomatic D, you'd use `auto` when either (1) you don't care what the
type is, you just want whatever value you get to be shoved into a
variable, or (2) you *shouldn't* care what the type is, because your
code shouldn't be depending on it, e.g., when you're using Voldemort
types std.algorithm-style.

The reason for (2) is that in UFCS chains, the only thing you really
care about is what kind of range it is that you're dealing with,
and maybe the element type.  What exactly the container type is, is
unimportant, and in fact, stating it explicitly is detrimental to
maintenance because the library that gave you that type may change the
concrete type in the future while retaining the same range and element
type.  So by naming an explicit type for the range, you introduce a
potential needless breakage in your code when you next upgrade the
library.  Instead, use `auto` to let the compiler figure out what the
concrete type is, as long as it conforms to the expected range semantics
and has a compatible element type, your code will continue to work as
before.

This applies not only to library upgrades, but also to code maintenance,
e.g., if you decide to reshuffle the elements of a UFCS chain to fix a
bug or introduce a new feature.  If explicit types were always used,
every such change would entail finding out and updating the type of
every component in the chain -- for long chains, this quickly becomes
onerous and unmaintainable.  Instead, use `auto` to let the compiler
figure it all out for you, and make your code independent of the
concrete type so that you can simply move things around just by cutting
and pasting, and you don't have to worry about updating every single
referenced type.
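A minimal sketch of both points using standard Phobos calls (`iota`, `filter`, `map`): the concrete type of each chain is a nested template instantiation nobody wants to spell out, the code only checks the range kind and element type, and reordering the chain needs no type updates.

```d
import std.algorithm : equal, filter, map;
import std.range : iota;
import std.range.primitives : ElementType, isInputRange;

void main()
{
    // The concrete type is a nested, templated Phobos type;
    // `auto` avoids naming it.
    auto evensSquared = iota(10)
        .filter!(n => n % 2 == 0)
        .map!(n => n * n);

    // What the code actually depends on: range kind and element type.
    static assert(isInputRange!(typeof(evensSquared)));
    static assert(is(ElementType!(typeof(evensSquared)) == int));
    assert(evensSquared.equal([0, 4, 16, 36, 64]));

    // Reshuffling the chain changes the concrete type, but the
    // `auto` declaration needs no edit.
    auto squaredEvens = iota(10)
        .map!(n => n * n)
        .filter!(n => n % 2 == 0);
    assert(squaredEvens.equal([0, 4, 16, 36, 64]));
}
```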


T

-- 
How do you solve any equation?  Multiply both sides by zero.

Sep 27 2024

Salih Dincer <salihdb hotmail.com> writes:

On Friday, 27 September 2024 at 20:28:21 UTC, H. S. Teoh wrote:
 ...
 The reason for (2) is that in UFCS chains, the only thing you 
 really care about is what kind of range it is that you're 
 dealing with, and maybe the element type.  What exactly the 
 container type is, is unimportant, and in fact, stating it 
 explicitly is detrimental to maintenance because the library 
 that gave you that type may change the concrete type in the 
 future while retaining the same range and element type.  So by 
 naming an explicit type for the range, you introduce a 
 potential needless breakage in your code when you next upgrade 
 the library.  Instead, use `auto` to let the compiler figure 
 out what the concrete type is, as long as it conforms to the 
 expected range semantics and has a compatible element type, 
 your code will continue to work as before.
 ...
 

Once my range didn't work because I used **auto** instead of 
**bool** in the standard InputRange functions (I think it had 
something to do with **length()** too...). I'm not sure, though; 
it could also have been **size_t length()**. So there are subtle 
cases where we should be careful with auto. I wish I could show 
you one, but I can't think of any right now.

SDB 79

Sep 27 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Friday, September 27, 2024 2:13:45 PM MDT thinkunix via Digitalmars-d-learn 
wrote:
 monkyyy via Digitalmars-d-learn wrote:
 On Friday, 27 September 2024 at 04:23:32 UTC, thinkunix wrote:
 What about using 'auto' as the return type?
 I tried it and it seemed to work OK.

 Wondering if there are any good reasons to use auto,
 or bad reasons why not to use auto here?

 You have started a style debate that will last a week, great work

 That was not my intent.  It was an honest question.  I'm here to learn
 and not looking to start debates or for attitude.

 I've seen a lot of "use auto everywhere" especially in C++ and was
 wondering where the D community stands on its use.  Is it generally
 favored or not?

 Personally, I think auto makes understanding code harder for humans.
 But in this example, it seemed like auto was a good fit.

Well, I don't think that auto is a particularly controversial topic among D
programmers, but there's no question that in at least some cases, it becomes
a question of preference and style.

auto is used heavily in D, and for some kinds of code, it has to be. In
particular, range-based code routinely returns "Voldemort" types (types
which you can't name) from a function. You care that it's a range (so that
needs to be clear from the documentation), but you're not supposed to know
or care what the actual type is. Before we had that, the signatures for a
lot of range-based functions (e.g. most of std.algorithm) were actually
pretty terrible to read and understand, because the explicit types were so
long and messy (since it's very common for ranges to wrap one another, and
they're usually templated). In a lot of those situations, you really only
care about the API of the type and not what the exact type is. So, auto
simplifies the code considerably and makes it easier to understand.
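A minimal sketch of a "Voldemort" type, with a hypothetical `countDown` function: the range struct is declared inside the function, so callers cannot write its name and must rely on `auto` and the range API.

```d
import std.algorithm : equal;

// `Result` is declared inside the function, so callers cannot
// name it and must use `auto`.
auto countDown(int from)
{
    static struct Result
    {
        int current;
        bool empty() const { return current < 0; }
        int front() const { return current; }
        void popFront() { --current; }
    }
    return Result(from);
}

void main()
{
    auto r = countDown(3); // the struct's name cannot be written here
    assert(r.equal([3, 2, 1, 0]));
}
```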

auto is also pretty critical in generic code in general, because it allows
you to not worry about the exact type you're dealing with. You just need to
worry about which kinds of operations work with that particular type. So,
variables get declared as auto all the time in typical D code. A lot of D
programmers will only use the type explicitly with variables if they want to
force a particular type or if they think that it makes that code easier to
understand.
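As a small sketch of `auto` in generic code (the function `sumOf` is hypothetical): the local variable simply takes whatever type the arithmetic yields for each `T`, which for `ubyte` is `int` because of integer promotion.

```d
// Sketch: `auto` lets the accumulator adapt to the element type.
auto sumOf(T)(const(T)[] values)
{
    auto total = T(0) + T(0); // ubyte + ubyte yields int; double + double yields double
    foreach (v; values)
        total += v;
    return total;
}

static assert(is(typeof(sumOf!ubyte(null)) == int));
static assert(is(typeof(sumOf!double(null)) == double));

void main()
{
    ubyte[] bytes = [200, 100];
    assert(sumOf(bytes) == 300); // no ubyte overflow: the sum is an int
    assert(sumOf([1.5, 2.5]) == 4.0);
}
```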

auto can also make code more maintainable in that when refactoring code, the
type will update automatically, so if it changes, you don't have to change
it all over the place. On the flip side though, it does make it harder to
know what types you're dealing with, which can sometimes make it harder to
read and maintain code. So, when auto isn't actually needed, there's always
going to be some debate about whether it's better to use auto or not, and
that's subjective enough that you're really going to have to decide on your
own what works for you.

Also, for better or worse, because the function has to be compiled to
determine the actual return type (so it won't work with function prototypes
like in .di files), function attributes are inferred for auto return
functions just like they are for templated functions, so some D programmers
will make functions auto just to get the attribute inference.
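A minimal sketch of that attribute inference, with two hypothetical functions: the `auto` one can be called from a `@safe pure nothrow @nogc` context without any annotations, while the ordinary one defaults to `@system` and impure.

```d
// Attributes are inferred for the auto-return function only.
auto addOneAuto(int x) { return x + 1; }
int addOneExplicit(int x) { return x + 1; }

// Inferred: callable from a @safe pure nothrow @nogc context.
static assert(__traits(compiles, () @safe pure nothrow @nogc { return addOneAuto(1); }));

// Not inferred: addOneExplicit is @system, impure, may throw, may allocate.
static assert(!__traits(compiles, () @safe pure nothrow @nogc { return addOneExplicit(1); }));

void main()
{
    assert(addOneAuto(41) == 42);
}
```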

I would guess that as a general rule, most folks prefer explicit types in
function signatures where auto isn't needed simply because it's
self-documenting at that point. So, I would think that most D programmers
would use an explicit type with the function being discussed in this thread,
but auto will certainly work just fine. Without any explicit casts within
the function, because doing math on char or ubyte results in int, the result
is going to be int (as opposed to uint like in the original post). So, if
that works for what the function is intended for, and the documentation is
clear about what's being returned, then it's not a problem.

However, in this particular case, it's arguably better to return ubyte than
int or uint. That's because the result will always fit within a ubyte, and
if you don't return a ubyte, the caller is going to have to cast to ubyte to
do something like assign the value to a ubyte. So, simply returning auto
as-is arguably isn't desirable. That being said, you can still use auto if
you want to. Because the math results in int, casts to ubyte will be
required regardless of whether the return type in the signature is ubyte or
auto, but you could choose to use auto and have the result be ubyte thanks
to the casts.
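A sketch of the "auto plus casts" variant: with a `cast(ubyte)` on every return, `auto` deduces `ubyte`, and callers can assign the result to a `ubyte` without casting.

```d
// With a cast on every return, `auto` deduces ubyte.
auto parseHex(immutable char ch) pure nothrow @safe @nogc
{
    switch (ch)
    {
    case '0': .. case '9': return cast(ubyte)(ch - '0');
    case 'a': .. case 'f': return cast(ubyte)(10 + ch - 'a');
    case 'A': .. case 'F': return cast(ubyte)(10 + ch - 'A');
    default: assert(0, "Non-hexadecimal character");
    }
}

static assert(is(typeof(parseHex('a')) == ubyte));

void main()
{
    ubyte b = parseHex('F'); // no cast needed at the call site
    assert(b == 15);
}
```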

Ultimately though, I would argue that in this case, auto just makes the code
harder to understand. The documentation can easily say that the return type
is ubyte in spite of it saying auto, but it's just as easy to type ubyte,
and then the return type is immediately obvious instead of requiring
additional documentation just to say what's being returned. So, while auto
is used quite heavily in D code, I wouldn't expect many folks to choose to
use auto for this particular function.

- Jonathan M Davis

Sep 27 2024

thinkunix <thinkunix zoho.com> writes:

H. S. Teoh via Digitalmars-d-learn wrote:
 In idiomatic D, you'd use `auto` when either (1) you don't care what the
 type is, you just want whatever value you get to be shoved into a
 variable, or (2) you *shouldn't* care what the type is, because your
 code shouldn't be depending on it, e.g., when you're using Voldemort
 types std.algorithm-style.

Thank you!  That was a very helpful response.

Sep 27 2024

thinkunix <thinkunix zoho.com> writes:

Jonathan M Davis via Digitalmars-d-learn wrote:
 Well, I don't think that auto is a particularly controversial topic among D
 programmers...

Thank you Jonathan for that very detailed response.

This thread can end now unless others really feel the need to comment.
I got two outstanding responses and now have a much better understanding
of why and when to use auto.

Sep 27 2024

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, September 26, 2024 12:53:12 AM MDT Per Nordlöw via Digitalmars-d-learn wrote:
 Should a function like

 ```d
 uint parseHex(in char ch) pure nothrow @safe @nogc {
   switch (ch) {
   case '0': .. case '9':
       return ch - '0';
   case 'a': .. case 'f':
       return 10 + ch - 'a';
   case 'A': .. case 'F':
       return 10 + ch - 'A';
   default:
       assert(0, "Non-hexadecimal character");
   }
 }
 ```

 instead return an ubyte?

I would argue that ubyte would be better, because it's guaranteed to fit
into a ubyte, but if it returns uint, then anyone who wants to assign it to
a ubyte will need to cast it, whereas you can just do the casts right here
(which could mean a lot less casting if this function is used much). Not
only that, but you'd be doing the casts in the code that controls the
result, so if something ever changes that makes it so that the type needs to
change (e.g. you make it operate on dchar instead of char), you won't end up
with callers casting to ubyte when the result then doesn't actually fit into
a ubyte - whereas if parseHex's function signature changes from returning
ubyte to returning ushort or uint or whatnot, then the change would be
caught at compile time with any code that assigned the result to a ubyte.
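A sketch contrasting the two signatures from the caller's side. Both function names are hypothetical, and they handle only decimal digits to keep the example short: with the `ubyte` version the single narrowing cast lives inside the function, while the `uint` version pushes a cast onto every caller.

```d
uint parseHexUint(immutable char ch) pure nothrow @safe @nogc
{
    assert(ch >= '0' && ch <= '9'); // digits only, to keep the sketch short
    return ch - '0';                // int converts implicitly to uint
}

ubyte parseHexUbyte(immutable char ch) pure nothrow @safe @nogc
{
    assert(ch >= '0' && ch <= '9');
    return cast(ubyte)(ch - '0');   // the one narrowing cast lives here
}

void main()
{
    ubyte a = parseHexUbyte('7');            // no cast at the call site
    ubyte b = cast(ubyte) parseHexUint('7'); // every caller must cast
    assert(a == 7 && b == 7);

    // Without the cast, the uint result is rejected.
    static assert(!__traits(compiles, { ubyte c = parseHexUint('7'); }));
}
```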

Now, I'm guessing that it wouldn't ever make sense to change this particular
function in a way that the return type needed to change, and returning uint
should ultimately work just fine, but I think that restricting the surface
area where narrowing casts are likely to happen will ultimately reduce the
risk of bugs, and I think that it's pretty clear that there will be less
casting overall if the casting is done here instead of at the call site
unless the function is barely ever used.

- Jonathan M Davis

Sep 27 2024
