digitalmars.D.learn - Integer precision of function return types

Per Nordlöw <per.nordlow gmail.com> writes:
Should a function like

```d
uint parseHex(in char ch) pure nothrow @safe @nogc {
	switch (ch) {
	case '0': .. case '9':
		return ch - '0';
	case 'a': .. case 'f':
		return 10 + ch - 'a';
	case 'A': .. case 'F':
		return 10 + ch - 'A';
	default:
		assert(0, "Non-hexadecimal character");
	}
}
```

instead return an ubyte?
Sep 25
monkyyy <crazymonkyyy gmail.com> writes:
On Thursday, 26 September 2024 at 06:53:12 UTC, Per Nordlöw wrote:
 Should a function like

 ```d
 uint parseHex(in char ch) pure nothrow @safe @nogc {
 	switch (ch) {
 	case '0': .. case '9':
 		return ch - '0';
 	case 'a': .. case 'f':
 		return 10 + ch - 'a';
 	case 'A': .. case 'F':
 		return 10 + ch - 'A';
 	default:
 		assert(0, "Non-hexadecimal character");
 	}
 }
 ```

 instead return an ubyte?
It will only matter if the value is stored. For a value that lives on the stack, or with the very probable inlining, the code should just be as simple as possible so you don't confuse the optimizer.
Sep 26
user1234 <user1234 12.de> writes:
On Thursday, 26 September 2024 at 06:53:12 UTC, Per Nordlöw wrote:
 Should a function like

 ```d
 uint parseHex(in char ch) pure nothrow @safe @nogc {
 	switch (ch) {
 	case '0': .. case '9':
 		return ch - '0';
 	case 'a': .. case 'f':
 		return 10 + ch - 'a';
 	case 'A': .. case 'F':
 		return 10 + ch - 'A';
 	default:
 		assert(0, "Non-hexadecimal character");
 	}
 }
 ```

 instead return an ubyte?
I have no conclusive answer:

- From an ABI PoV it does not matter: it's AL vs. EAX, i.e. the same "parent" register.
- From a self-documenting PoV I'd use `ubyte`. But then you hit the problem of promotion of `ch - ...`, and you have to cast each of them.
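To make the promotion point concrete, here is a minimal sketch. The name `parseHexDigit` is hypothetical and not from the thread; the sketch only assumes D's usual integer promotion, under which arithmetic on `char` yields `int`, so a `ubyte` return type needs a narrowing cast on every branch.

```d
// Hypothetical ubyte-returning variant of the function from the question.
ubyte parseHexDigit(char ch) pure nothrow @safe @nogc
{
    // Operands narrower than int are promoted, so the difference is an int.
    static assert(is(typeof(ch - '0') == int));

    switch (ch)
    {
    case '0': .. case '9':
        return cast(ubyte)(ch - '0');
    case 'a': .. case 'f':
        return cast(ubyte)(10 + ch - 'a');
    case 'A': .. case 'F':
        return cast(ubyte)(10 + ch - 'A');
    default:
        assert(0, "Non-hexadecimal character");
    }
}

unittest
{
    assert(parseHexDigit('7') == 7);
    assert(parseHexDigit('c') == 12);
    assert(parseHexDigit('F') == 15);
}
```

Without the casts, each `return` would be rejected, because the `int` expression can be negative as far as the compiler can tell and therefore does not fit a `ubyte`.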
Sep 26
Salih Dincer <salihdb hotmail.com> writes:
On Thursday, 26 September 2024 at 06:53:12 UTC, Per Nordlöw wrote:
 Should a function like

 ```d
 uint parseHex(in char ch) pure nothrow @safe @nogc {
 	switch (ch) {
 	case '0': .. case '9':
 		return ch - '0';
 	case 'a': .. case 'f':
 		return 10 + ch - 'a';
 	case 'A': .. case 'F':
 		return 10 + ch - 'A';
 	default:
 		assert(0, "Non-hexadecimal character");
 	}
 }
 ```

 instead return an ubyte?
When I use standard library facilities, I try to use ubyte; for example (see "toggle comment" in the code):

```d
void parseFromHexString(R)(out R hex, const(char)[] str)
{
  import std.algorithm : map, copy;
  import std.conv : to;
  import std.range : front, chunks;

  alias T = typeof(hex.front);
  str.chunks(T.sizeof * 2)
     .map!(bin => bin.to!T(16))
     .copy(hex[]);
}

import std.stdio;
void main()
{
  enum hex = "48656C6C6F2044202620576F726C6421";
  enum size = hex.length / 2;
  auto sample = imported!"std.conv".hexString!hex;
  sample.writeln; // Hello D & World!

  enum hexStr = x"48656C6C6F2044202620576F726C6421";
  hexStr.writeln; // Hello D & World!
  assert(is(typeof(hexStr) == string));

  immutable int[] intStr = x"48656C6C6F2044202620576F726C6421";
  intStr.writeln; // [1214606444, 1864385568, 639653743, 1919706145]

  int[size] buf;
  buf.parseFromHexString(hex);
  buf.writeln;

  //char[size] buff; /*
  ubyte[size] buff;/* please toggle comment with above */
  buff.parseFromHexString("BADEDE");
  buff.writeln;
}
```

But when I write my own functions, I have control and I do what I want. You can also use char below; ubyte is not a problem either:

```d
auto toHexDigit(char value)
{
  if(value > 9)
    value += 7;

  return '0' + value;
}

auto toHexString(R)(R str)
{
  string result;
  char a, b;
  foreach(char c; str)
  {
    a = c / 16; b = c % 16;
    result ~= a.toHexDigit;
    result ~= b.toHexDigit;
  }
  return result;
}

void main()
{
  // `sample` and `hex` are the values defined in the previous example
  assert(sample.toHexString == hex);
}
```

SDB 79
Sep 26
Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Thursday, 26 September 2024 at 06:53:12 UTC, Per Nordlöw wrote:
 Should a function like

 ```d
 uint parseHex(in char ch) pure nothrow @safe @nogc {
 	switch (ch) {
 	case '0': .. case '9':
 		return ch - '0';
 	case 'a': .. case 'f':
 		return 10 + ch - 'a';
 	case 'A': .. case 'F':
 		return 10 + ch - 'A';
 	default:
 		assert(0, "Non-hexadecimal character");
 	}
 }
 ```

 instead return an ubyte?
```d
ubyte parseHex(immutable char ch) pure nothrow @safe @nogc {
	switch (ch) {
	case '0': .. case '9':
		return (ch - '0') & 0x0F;
	case 'a': .. case 'f':
		return (10 + ch - 'a') & 0x0F;
	case 'A': .. case 'F':
		return (10 + ch - 'A') & 0x0F;
	default:
		assert(0, "Non-hexadecimal character");
	}
}
```

I’d say yes, use `ubyte`. I also did two things:

- `(…) & 0x0F` to enable value-range propagation. Essentially, the compiler understands that the result of `&` can be at most the smaller of the operands, and one operand is `0x0F`, which fits in a `ubyte`; therefore the expression implicitly converts. Always use implicit conversions when they avoid using `cast`. With `cast`, in general, you can do bad things. The compiler only allows safe conversions implicitly, even in `@system` code. Your code is marked `@safe`, but this is general advice.
- I removed `in` from the parameter and used `immutable`. The `in` storage class means `const` as of now, but with the `-preview=in` and `-preview=dip1000` switches combined, it also means `scope`, and `scope` means something to DIP1000, which can become dangerous in `@system` code. Do not use `in` unless you know exactly why you’re using it.

Also, for what it’s worth, you could use an `in` and `out` contract.
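The contract suggestion could look roughly like the sketch below. The name `parseHexChecked` is hypothetical, and using `std.ascii.isHexDigit` for the precondition is an assumption of this example rather than something proposed in the thread. The `& 0x0F` masks rely on value-range propagation as described above, so no `cast` appears.

```d
import std.ascii : isHexDigit;

// Hypothetical variant with expression contracts (a sketch, not the
// poster's code). The contracts restate what the body already guarantees.
ubyte parseHexChecked(immutable char ch) pure nothrow @safe @nogc
in (isHexDigit(ch), "Non-hexadecimal character")
out (result; result < 16)
{
    switch (ch)
    {
    case '0': .. case '9':
        return (ch - '0') & 0x0F;      // range [0, 15] fits ubyte
    case 'a': .. case 'f':
        return (10 + ch - 'a') & 0x0F;
    case 'A': .. case 'F':
        return (10 + ch - 'A') & 0x0F;
    default:
        assert(0, "Non-hexadecimal character");
    }
}

unittest
{
    assert(parseHexChecked('9') == 9);
    assert(parseHexChecked('a') == 10);
    assert(parseHexChecked('F') == 15);
}
```

Contracts are checked only in builds that keep them enabled, so the `default` branch still guards release builds that strip them.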
Sep 26
thinkunix <thinkunix zoho.com> writes:
Per Nordlöw via Digitalmars-d-learn wrote:
 Should a function like
 
 ```d
 uint parseHex(in char ch) pure nothrow @safe @nogc {
      switch (ch) {
      case '0': .. case '9':
          return ch - '0';
      case 'a': .. case 'f':
          return 10 + ch - 'a';
      case 'A': .. case 'F':
          return 10 + ch - 'A';
      default:
          assert(0, "Non-hexadecimal character");
      }
 }
 ```
 
 instead return an ubyte?
What about using 'auto' as the return type? I tried it and it seemed to work OK. Wondering if there are any good reasons to use auto, or bad reasons why not to use auto here?
Sep 26
monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 27 September 2024 at 04:23:32 UTC, thinkunix wrote:
 
 What about using 'auto' as the return type?
 I tried it and it seemed to work OK.

 Wondering if there are any good reasons to use auto,
 or bad reasons why not to use auto here?
You have started a style debate that will last a week, great work.

Auto is fantastic and everyone should use it more.
Sep 26
thinkunix <thinkunix zoho.com> writes:
monkyyy via Digitalmars-d-learn wrote:
 On Friday, 27 September 2024 at 04:23:32 UTC, thinkunix wrote:
 What about using 'auto' as the return type?
 I tried it and it seemed to work OK.

 Wondering if there are any good reasons to use auto,
 or bad reasons why not to use auto here?
You have started a style debate that will last a week, great work
That was not my intent. It was an honest question. I'm here to learn and not looking to start debates or for attitude. I've seen a lot of "use auto everywhere", especially in C++, and was wondering where the D community stands on its use. Is it generally favored or not? Personally, I think auto makes understanding code harder for humans. But in this example, it seemed like auto was a good fit.
Sep 27
"H. S. Teoh" <hsteoh qfbox.info> writes:
On Fri, Sep 27, 2024 at 04:13:45PM -0400, thinkunix via Digitalmars-d-learn wrote:
[...]
 I've seen a lot of "use auto everywhere" especially in C++ and was
 wondering where the D community stands on it's use.  Is it generally
 favored or not?
 
 Personally, I think auto makes understanding code harder for humans.
 But in this example, it seemed like auto was a good fit.
In idiomatic D, you'd use `auto` when either (1) you don't care what the type is, you just want whatever value you get to be shoved into a variable, or (2) you *shouldn't* care what the type is, because your code shouldn't be depending on it, e.g., when you're using Voldemort types std.algorithm-style.

The reason for (2) is that in UFCS chains, the only thing you really care about is what kind of range it is that you're dealing with, and maybe the element type. What exactly the container type is, is unimportant, and in fact, stating it explicitly is detrimental to maintenance because the library that gave you that type may change the concrete type in the future while retaining the same range and element type. So by naming an explicit type for the range, you introduce a potential needless breakage in your code when you next upgrade the library. Instead, use `auto` to let the compiler figure out what the concrete type is; as long as it conforms to the expected range semantics and has a compatible element type, your code will continue to work as before.

This applies not only to library upgrades, but also to code maintenance, e.g., if you decide to reshuffle the elements of a UFCS chain to fix a bug or introduce a new feature. If explicit types were always used, every such change would entail finding out and updating the type of every component in the chain -- for long chains, this quickly becomes onerous and unmaintainable. Instead, use `auto` to let the compiler figure it all out for you, and make your code independent of the concrete type so that you can simply move things around just by cutting and pasting, and you don't have to worry about updating every single referenced type.

T

-- 
How do you solve any equation? Multiply both sides by zero.
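A small sketch of case (2), assuming only standard Phobos ranges; the function name `evenSquares` is hypothetical. The chain's concrete type is a nested template instantiation that callers never spell out; they depend only on it being an input range of `int`.

```d
import std.algorithm : filter, map;
import std.range : ElementType, iota, isInputRange;

// The concrete type of this chain is a Phobos implementation detail,
// so the return type is left to the compiler.
auto evenSquares(int limit)
{
    return iota(limit)
        .filter!(n => n % 2 == 0)
        .map!(n => n * n);
}

// What callers may rely on: an input range whose elements are int.
static assert(isInputRange!(typeof(evenSquares(0))));
static assert(is(ElementType!(typeof(evenSquares(0))) == int));

unittest
{
    import std.algorithm : equal;

    // Reordering or extending the chain above would not change this code.
    auto r = evenSquares(7);
    assert(r.equal([0, 4, 16, 36]));
}
```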
Sep 27
Salih Dincer <salihdb hotmail.com> writes:
On Friday, 27 September 2024 at 20:28:21 UTC, H. S. Teoh wrote:
 ...
 The reason for (2) is that in UFCS chains, the only thing you 
 really only care about is what kind of range it is that you're 
 dealing with, and maybe the element type.  What exactly the 
 container type is, is unimportant, and in fact, stating it 
 explicitly is detrimental to maintenance because the library 
 that gave you that type may change the concrete type in the 
 future while retaining the same range and element type.  So by 
 naming an explicit type for the range, you introduce a 
 potential needless breakage in your code when you next upgrade 
 the library.  Instead, use `auto` to let the compiler figure 
 out what the concrete type is, as long as it conforms to the 
 expected range semantics and has a compatible element type, 
 your code will continue to work as before.
 ...
 
Once my range didn't work because I used **auto** instead of **bool** in the standard InputRange functions (I think it had something to do with **length()** too). As I said, I'm not sure; it could also have been **size_t length()**. So there are subtle cases where we shouldn't use auto. I wish I could show you one, but I can't think of any.

SDB 79
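One way such a problem can arise is sketched below. This is an assumption about the kind of issue described, not a reconstruction of the poster's code, and both struct names are hypothetical. In current Phobos, `hasLength` is documented to require a `length` that returns `size_t`, so an `auto length()` that infers `int` is not recognized.

```d
import std.range.primitives : hasLength, isInputRange;

// Hypothetical range whose length() return type is inferred.
struct InferredLength
{
    int n;
    bool empty() const { return n == 0; }
    int front() const { return n; }
    void popFront() { --n; }
    auto length() const { return n; } // inferred as int, not size_t
}

// The same range with the return type spelled out.
struct ExplicitLength
{
    int n;
    bool empty() const { return n == 0; }
    int front() const { return n; }
    void popFront() { --n; }
    size_t length() const { return n; }
}

// Both are valid input ranges...
static assert(isInputRange!InferredLength);
static assert(isInputRange!ExplicitLength);

// ...but only the explicit size_t version counts as having a length.
static assert(is(typeof(InferredLength.init.length()) == int));
static assert(!hasLength!InferredLength);
static assert(hasLength!ExplicitLength);
```

Algorithms that specialize on `hasLength` would silently treat the first range as having no known length, which can look like the range "not working".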
Sep 27
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Friday, September 27, 2024 2:13:45 PM MDT thinkunix via Digitalmars-d-learn wrote:
 monkyyy via Digitalmars-d-learn wrote:
 On Friday, 27 September 2024 at 04:23:32 UTC, thinkunix wrote:
 What about using 'auto' as the return type?
 I tried it and it seemed to work OK.

 Wondering if there are any good reasons to use auto,
 or bad reasons why not to use auto here?
You have started a style debate that will last a week, great work
That was not my intent. It was an honest question. I'm here to learn and not looking to start debates or for attitude. I've seen a lot of "use auto everywhere", especially in C++, and was wondering where the D community stands on its use. Is it generally favored or not? Personally, I think auto makes understanding code harder for humans. But in this example, it seemed like auto was a good fit.
Well, I don't think that auto is a particularly controversial topic among D programmers, but there's no question that in at least some cases, it becomes a question of preference and style.

auto is used heavily in D, and for some kinds of code, it has to be. In particular, range-based code routinely returns "Voldemort" types (types which you can't name) from a function. You care that it's a range (so that needs to be clear from the documentation), but you're not supposed to know or care what the actual type is. Before we had that, the signatures for a lot of range-based functions (e.g. most of std.algorithm) were actually pretty terrible to read and understand, because the explicit types were so long and messy (since it's very common for ranges to wrap one another, and they're usually templated). In a lot of those situations, you really only care about the API of the type and not what the exact type is. So, auto simplifies the code considerably and makes it easier to understand.

auto is also pretty critical in generic code in general, because it allows you to not worry about the exact type you're dealing with. You just need to worry about which kinds of operations work with that particular type. So, variables get declared as auto all the time in typical D code. A lot of D programmers will only use the type explicitly with variables if they want to force a particular type or if they think that it makes that code easier to understand.

auto can also make code more maintainable in that when refactoring code, the type will update automatically, so if it changes, you don't have to change it all over the place. On the flip side though, it does make it harder to know what types you're dealing with, which can sometimes make it harder to read and maintain code. So, when auto isn't actually needed, there's always going to be some debate about whether it's better to use auto or not, and that's subjective enough that you're really going to have to decide on your own what works for you.

Also, for better or worse, because the function has to be compiled to determine the actual return type (so it won't work with function prototypes like in .di files), function attributes are inferred for auto return functions just like they are for templated functions, so some D programmers will make functions auto just to get the attribute inference.

I would guess that as a general rule, most folks prefer explicit types in function signatures where auto isn't needed simply because it's self-documenting at that point. So, I would think that most D programmers would use an explicit type with the function being discussed in this thread, but auto will certainly work just fine. Without any explicit casts within the function, because doing math on char or ubyte results in int, the result is going to be int (as opposed to uint like in the original post). So, if that works for what the function is intended for, and the documentation is clear about what's being returned, then it's not a problem.

However, in this particular case, it's arguably better to return ubyte than int or uint. That's because the result will always fit within a ubyte, and if you don't return a ubyte, the caller is going to have to cast to ubyte to do something like assign the value to a ubyte. So, simply returning auto as-is arguably isn't desirable. That being said, you can still use auto if you want to. Because the math results in int, casts to ubyte will be required regardless of whether the return type in the signature is ubyte or auto, but you could choose to use auto and have the result be ubyte thanks to the casts. Ultimately though, I would argue that in this case, auto just makes the code harder to understand. The documentation can easily say that the return type is ubyte in spite of it saying auto, but it's just as easy to type ubyte, and then the return type is immediately obvious instead of requiring additional documentation just to say what's being returned. So, while auto is used quite heavily in D code, I wouldn't expect many folks to choose to use auto for this particular function.

- Jonathan M Davis
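The two language-level points in this reply, the inferred `int` return type and attribute inference for `auto` functions, can be checked with a short sketch. The names `parseHexAuto` and `lowNibble` are hypothetical.

```d
// Hypothetical auto-returning variant: no casts, so every return is an int.
auto parseHexAuto(char ch)
{
    switch (ch)
    {
    case '0': .. case '9':
        return ch - '0';
    case 'a': .. case 'f':
        return 10 + ch - 'a';
    case 'A': .. case 'F':
        return 10 + ch - 'A';
    default:
        assert(0, "Non-hexadecimal character");
    }
}

// The inferred return type is int, not uint or ubyte.
static assert(is(typeof(parseHexAuto('a')) == int));

// Attributes are inferred for auto functions, so this explicitly
// attributed caller compiles although parseHexAuto declares none.
ubyte lowNibble(char ch) pure nothrow @safe @nogc
{
    // The caller has to narrow the int result itself.
    return cast(ubyte) parseHexAuto(ch);
}

unittest
{
    assert(parseHexAuto('b') == 11);
    assert(lowNibble('E') == 14);
}
```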
Sep 27
thinkunix <thinkunix zoho.com> writes:
H. S. Teoh via Digitalmars-d-learn wrote:
 In idiomatic D, you'd use `auto` when either (1) you don't care what the
 type is, you just want whatever value you get to be shoved into a
 variable, or (2) you *shouldn't* care what the type is, because your
 code shouldn't be depending on it, e.g., when you're using Voldemort
 types std.algorithm-style.
Thank you! That was a very helpful response.
Sep 27
thinkunix <thinkunix zoho.com> writes:
Jonathan M Davis via Digitalmars-d-learn wrote:
 Well, I don't think that auto is a particularly controversial topic among D
 programmers...
Thank you Jonathan for that very detailed response. This thread can end now unless others really feel the need to comment. I got two outstanding responses and now have a much better understanding of why and when to use auto.
Sep 27
Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, September 26, 2024 12:53:12 AM MDT Per Nordlöw via Digitalmars-d-learn wrote:
 Should a function like

 ```d
 uint parseHex(in char ch) pure nothrow @safe @nogc {
   switch (ch) {
   case '0': .. case '9':
       return ch - '0';
   case 'a': .. case 'f':
       return 10 + ch - 'a';
   case 'A': .. case 'F':
       return 10 + ch - 'A';
   default:
       assert(0, "Non-hexadecimal character");
   }
 }
 ```

 instead return an ubyte?
I would argue that ubyte would be better, because it's guaranteed to fit into a ubyte, but if it returns uint, then anyone who wants to assign it to a ubyte will need to cast it, whereas you can just do the casts right here (which could mean a lot less casting if this function is used much). Not only that, but you'd be doing the casts in the code that controls the result, so if something ever changes that makes it so that the type needs to change (e.g. you make it operate on dchar instead of char), you won't end up with callers casting to ubyte when the result then doesn't actually fit into a ubyte - whereas if parseHex's function signature changes from returning ubyte to returning ushort or uint or whatnot, then the change would be caught at compile time with any code that assigned the result to a ubyte.

Now, I'm guessing that it wouldn't ever make sense to change this particular function in a way that the return type needed to change, and returning uint should ultimately work just fine, but I think that restricting the surface area where narrowing casts are likely to happen will ultimately reduce the risk of bugs, and I think that it's pretty clear that there will be less casting overall if the casting is done here instead of at the call site unless the function is barely ever used.

- Jonathan M Davis
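The call-site difference can be sketched with two hypothetical stand-ins, `narrowDigit` and `wideDigit`, which handle decimal digits only to keep the example short. The `__traits(compiles, ...)` checks show which assignments the compiler accepts without a cast.

```d
// Hypothetical stand-ins: same logic, different declared return types.
ubyte narrowDigit(char ch) pure nothrow @safe @nogc
{
    return cast(ubyte)(ch - '0'); // the one narrowing cast lives here
}

uint wideDigit(char ch) pure nothrow @safe @nogc
{
    return ch - '0';
}

// A ubyte result can be stored in a ubyte directly...
static assert(__traits(compiles, { ubyte b = narrowDigit('7'); }));
// ...and still widens implicitly where a uint is wanted.
static assert(__traits(compiles, { uint u = narrowDigit('7'); }));
// A uint result is rejected without a cast at the call site. This is also
// the compile-time error callers would get if a ubyte-returning function
// were later changed to return a wider type.
static assert(!__traits(compiles, { ubyte b = wideDigit('7'); }));

unittest
{
    ubyte b = narrowDigit('7');
    assert(b == 7);
    assert(cast(ubyte) wideDigit('7') == 7);
}
```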
Sep 27