digitalmars.D.announce - D's Auto Decoding and You

reply Jack Stouffer <jack jackstouffer.com> writes:
http://jackstouffer.com/blog/d_auto_decoding_and_you.html

Based on the recent thread in General, I wrote this blog post 
that's designed to be part beginner tutorial, part objective 
record of the debate over it, and finally my opinions on the 
matter.

When I first learned about auto-decoding, I was kinda miffed that 
there wasn't anything in the docs or on the website that went 
into more detail. So I wrote this to introduce people 
who are getting into D to the concept, its benefits, and its 
downsides. When people in Learn are confused about why 
typeof(s.front) == dchar, this can just be linked to them.

If you think there should be any more information included in the 
article, please let me know so I can add it.
May 17 2016
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 5/17/16 10:06 AM, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 Based on the recent thread in General, I wrote this blog post that's
 designed to be part beginner tutorial, part objective record of the
 debate over it, and finally my opinions on the matter.

 When I first learned about auto-decoding, I was kinda miffed that there
 wasn't anything in the docs or on the website that went into more
 detail. So I wrote this in order to introduce people who are getting
 into D to the concept, it's benefits, and downsides. When people are
 confused in Learn why typeof(s.front) == dchar then this can just be
 linked to them.

 If you think there should be any more information included in the
 article, please let me know so I can add it.
Starting to read it, see errors in your examples:

is(s[0] == immutable char) -> is(typeof(s[0]) == immutable(char))
is(s.front == dchar) -> is(typeof(s.front()) == dchar)

I'm not sure if you need the parens after front, but if it's not marked as @property, then this returns a function.

-Steve
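A minimal sketch of the corrected checks, assuming a current Phobos where `front` for narrow strings lives in `std.range.primitives` and is declared as an @property function (so `typeof(s.front)` works there without parens). The variable name `s` follows the article; everything else is illustrative.

```d
import std.range.primitives : ElementType, front;

void main()
{
    string s = "hello";

    // Indexing yields a single UTF-8 code unit.
    static assert(is(typeof(s[0]) == immutable(char)));

    // The range primitive decodes a whole code point.
    static assert(is(typeof(s.front) == dchar));
    static assert(is(ElementType!string == dchar));

    // For pure ASCII input both views agree on the value.
    assert(s[0] == 'h');
    assert(s.front == 'h');
}
```

Note that `is(s[0] == ...)` on its own compares an expression to a type and is simply false, which is why the `typeof` wrapper is needed.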
May 17 2016
next sibling parent Jack Stouffer <jack jackstouffer.com> writes:
On Tuesday, 17 May 2016 at 14:16:48 UTC, Steven Schveighoffer 
wrote:
 Starting to read it, see errors in your examples:

 is(s[0] == immutable char) -> is(typeof(s[0]) == 
 immutable(char))
 is(s.front == dchar) -> is(typeof(s.front()) == dchar)
Thanks, fixed.
May 17 2016
prev sibling parent reply Rory McGuire via Digitalmars-d-announce writes:
On 17 May 2016 16:21, "Steven Schveighoffer via Digitalmars-d-announce" <
digitalmars-d-announce puremagic.com> wrote:
 On 5/17/16 10:06 AM, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 Based on the recent thread in General, I wrote this blog post that's
 designed to be part beginner tutorial, part objective record of the
 debate over it, and finally my opinions on the matter.

 When I first learned about auto-decoding, I was kinda miffed that there
 wasn't anything in the docs or on the website that went into more
 detail. So I wrote this in order to introduce people who are getting
 into D to the concept, it's benefits, and downsides. When people are
 confused in Learn why typeof(s.front) == dchar then this can just be
 linked to them.

 If you think there should be any more information included in the
 article, please let me know so I can add it.
 Starting to read it, see errors in your examples:

 is(s[0] == immutable char) -> is(typeof(s[0]) == immutable(char))
 is(s.front == dchar) -> is(typeof(s.front()) == dchar)

 I'm not sure if you need the parens after front, but if it's not marked
 as @property, then this returns a function.

 -Steve
If I remember correctly, adding the brackets goes against best practices, because you can't be sure the underlying implementation of a range is using a function for .front.
May 17 2016
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 5/17/16 1:49 PM, Rory McGuire via Digitalmars-d-announce wrote:
 On 17 May 2016 16:21, "Steven Schveighoffer via Digitalmars-d-announce"
 <digitalmars-d-announce puremagic.com
 <mailto:digitalmars-d-announce puremagic.com>> wrote:
  >
  > On 5/17/16 10:06 AM, Jack Stouffer wrote:
  >>
  >> http://jackstouffer.com/blog/d_auto_decoding_and_you.html
  >>
  >> Based on the recent thread in General, I wrote this blog post that's
  >> designed to be part beginner tutorial, part objective record of the
  >> debate over it, and finally my opinions on the matter.
  >>
  >> When I first learned about auto-decoding, I was kinda miffed that there
  >> wasn't anything in the docs or on the website that went into more
  >> detail. So I wrote this in order to introduce people who are getting
  >> into D to the concept, it's benefits, and downsides. When people are
  >> confused in Learn why typeof(s.front) == dchar then this can just be
  >> linked to them.
  >>
  >> If you think there should be any more information included in the
  >> article, please let me know so I can add it.
  >
  >
  > Starting to read it, see errors in your examples:
  >
  > is(s[0] == immutable char) -> is(typeof(s[0]) == immutable(char))
  > is(s.front == dchar) -> is(typeof(s.front()) == dchar)
  >
  > I'm not sure if you need the parens after front, but if it's not
  > marked as @property, then this returns a function.
  >

 If I remember correctly adding the brackets then goes against best
 practices because you can't be sure the underlying implementation of a
 range is using a function for .front.
Right, but there's this:

    import std.range.primitives : isInputRange;

    struct MyRange
    {
        private int _val;
        int front() { return _val; }          // note: not marked @property
        bool empty() { return _val >= 100; }  // exhausted once _val reaches 100
        void popFront() { ++_val; }
    }

    static assert(isInputRange!MyRange);
    // typeof sees the function type here, not int
    static assert(!is(typeof(MyRange.init.front) == int));

This is why I recommended the parentheses. In reality, you should do ElementType!MyRange, which does the correct thing. But is(typeof(...)) doesn't always work like you might expect.

-Steve
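The difference between the three spellings can be sketched as follows. `Countdown` is a hypothetical range invented for this illustration; the only library names used are `isInputRange` and `ElementType` from `std.range.primitives`.

```d
import std.range.primitives : ElementType, isInputRange;

// Hypothetical range whose front is a plain (non-@property) member function.
struct Countdown
{
    private int remaining = 3;
    int front() { return remaining; }
    bool empty() { return remaining == 0; }
    void popFront() { --remaining; }
}

static assert(isInputRange!Countdown);

// Without parens, typeof reports the function type, not the return type.
static assert(!is(typeof(Countdown.init.front) == int));

// With parens, or via ElementType, the element type comes out as expected.
static assert(is(typeof(Countdown.init.front()) == int));
static assert(is(ElementType!Countdown == int));

void main() {}
```

`ElementType` is the safer choice in generic code, since the parenthesized form stops compiling as soon as a range implements `front` as a field.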
May 17 2016
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
Grammar:

"This will tie in later because the string types front has special behavior"

...because for string types, front has...

Content:
"For C style strings, you can use ubyte[] and call std.string.assumeUTF 
where necessary"

Actually, C style strings are ASCII, no? UTF8 includes ASCII. And D 
treats C strings as char *, not ubyte[]

Content:

D will look ahead in the string and combine things like e and U+0308 into ë

Nope :) This is a grapheme, and D does not decode these into one dchar.

Grammar:

"about it's inclusion"

it's -> its

Typo:

"Pared with the inability to turn it off,"

Pared -> Paired

Typo:

"Phobos String type would be the best option and a deprecation of the 
sting front function"

sting -> string

Like the article, pretty much sums up my thoughts too. IMO, the only 
path forward is something that aliases string to something that 
auto-decodes, but that is NOT a char array. Then you have to deprecate 
implicit access to the backing array, and make it explicit. Probably 
would take 2 years or so to migrate.

-Steve
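The grapheme point above can be checked with a small sketch. It assumes `std.uni.byGrapheme` and `walkLength` from Phobos; the helper names `codePoints` and `graphemes` are made up for the example.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;

// Number of code points, as seen through auto-decoding range primitives.
size_t codePoints(string s) { return s.walkLength; }

// Number of user-perceived characters (grapheme clusters).
size_t graphemes(string s) { return s.byGrapheme.walkLength; }

void main()
{
    // 'e' followed by U+0308 COMBINING DIAERESIS, displayed as a single ë.
    string s = "e\u0308";

    assert(s.length == 3);       // UTF-8 code units: 1 for 'e', 2 for U+0308
    assert(codePoints(s) == 2);  // auto-decoding does not merge the pair
    assert(graphemes(s) == 1);   // grapheme segmentation does
}
```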
May 17 2016
parent Jack Stouffer <jack jackstouffer.com> writes:
On Tuesday, 17 May 2016 at 14:44:06 UTC, Steven Schveighoffer 
wrote:
 ...
Thanks, fixed all issues.
 Like the article, pretty much sums up my thoughts too.
Thanks!
May 17 2016
prev sibling next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 Based on the recent thread in General, I wrote this blog post 
 that's designed to be part beginner tutorial, part objective 
 record of the debate over it, and finally my opinions on the 
 matter.
I probably would have preferred this split up into two parts with one on the tutorial and one on the record of the debate. It seems to focus more on the debate. Maybe you could add a section that was like "for THIS type of string, do X for performance" with clear explanations of why that is the best way and why other ways will be slower. Then, you can have other sub-sections for "for THAT type of string, do Y for performance." You have some of this detail in there, but it's organized more with respect to the context of the debate, I think.
May 17 2016
parent Jack Stouffer <jack jackstouffer.com> writes:
On Tuesday, 17 May 2016 at 16:24:31 UTC, jmh530 wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 Based on the recent thread in General, I wrote this blog post 
 that's designed to be part beginner tutorial, part objective 
 record of the debate over it, and finally my opinions on the 
 matter.
I probably would have preferred this split up into two parts with one on the tutorial and one on the record of the debate. It seems to focus more on the debate.
That wasn't my intent. I wanted the debate to be a lens into a discussion of the technical merits and demerits of auto decoding. I have reworded some of the article in order to reflect this.
 Maybe you could add a section that was like "for THIS type 
 of string, do X for performance" with clear explanations of why 
 that is the best way and why other ways will be slower. Then, 
 you can have other sub-sections for "for THAT type of string, 
 do Y for performance." You have some of this detail in there, 
 but it's organized more with respect to the context of the 
 debate, I think.
I will add this, thanks.
May 17 2016
prev sibling next sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html
Thanks for writing this. Great article. Some remarks:
    static assert(is(typeof(s.front()) == dchar));
I believe .front is a property (so some ranges can implement it as a field, not a property function). Hence, no parens.
 So, why is typeof(s.front) == dchar.
Question mark?
 In plain English, this means when iterating over strings in D, 
 D will look ahead in the string and combine any code units that 
 make up a single code point.
Perhaps clarify that this only applies to ranges. `foreach` on a string will iterate over chars, but you can iterate over code points if you specify the dchar type explicitly. More confusing text on the same issue lower, and in the intro:
 Iterating a char array with C style for loops produces 
 different results than foreach loops due to auto decoding.
 One feature of D that is confusing to a lot of new comers is 
 the behavior of strings in relation to range based features 
 like the foreach statement and range algorithms.
---
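To make the foreach remark concrete, here is a hedged sketch. The function names are hypothetical; the string uses the precomposed U+00E9 so that composite characters do not enter the picture.

```d
import std.range.primitives : walkLength;

// foreach over a string with a char loop variable visits code units.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (char c; s) ++n;
    return n;
}

// Asking for dchar makes foreach decode code points on the fly.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s) ++n;
    return n;
}

void main()
{
    string s = "caf\u00E9"; // é as the single code point U+00E9

    assert(countCodeUnits(s) == 5);  // é takes two UTF-8 code units
    assert(countCodePoints(s) == 4);
    assert(s.walkLength == 4);       // range primitives auto-decode
}
```

So the surprise is specific to range primitives and range algorithms; plain `foreach (c; s)` keeps working on code units.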
 E.g. for ë the code units C3 AB (for UTF-8) would turn into a 
 single code point.
Perhaps choose a character that is not also expressible via composite characters, to avoid potential for confusion.
 string s = "cassé";
Ditto (unless the goal was to complement the example from my .d file below)
  These glaring inconsistencies are the cause of a lot of 
 confusion for new comers.
(Opinion) I would say that they also cause issues in generic code.
 Every time one wants a generic algorithm to work with both 
 strings and ranges, you wind up special casing via static 
 if-ing narrow strings to defeat the auto decoding, or to decode 
 the ranges. Case in point.
Link to the exact SHA to prevent the link from getting outdated. On Github, just hit 'y' on your keyboard to go to the "permalink" version.
 Auto decoding has two choices when encountering invalid code 
 units: throw, or produce an error dchar like std.utf.byUTF does.
(Aside) This was an interesting discussion on the subject: https://issues.dlang.org/show_bug.cgi?id=14519
 However, in my opinion D is too far along to to suddenly ask 
 people
"to to"

---

Some more info / links on the subject I collected a few years ago: http://wiki.dlang.org/Language_issues#Unicode_and_ranges
May 17 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 5/17/16 1:18 PM, Vladimir Panteleev wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html
Thanks for writing this. Great article. Some remarks:
    static assert(is(typeof(s.front()) == dchar));
I believe .front is a property (so some ranges can implement it as a field, not a property function). Hence, no parens.
Right, but s is a string. So front is a function.

There is an inconsistency in the compiler for this. If s.front is a function, typeof(s.front) will not be what front *returns*, but the function type itself, unless you tag it with @property. However, it's perfectly legal for a front function not to be tagged @property.

-Steve
May 17 2016
next sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer 
wrote:
 On 5/17/16 1:18 PM, Vladimir Panteleev wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html
Thanks for writing this. Great article. Some remarks:
    static assert(is(typeof(s.front()) == dchar));
I believe .front is a property (so some ranges can implement it as a field, not a property function). Hence, no parens.
Right, but s is a string. So front is a function.
Then what happened to writing generic code?
 There is an inconsistency in the compiler for this. If s.front 
 is a function is(typeof(s.front)) will not be what front 
 *returns*, but the function type itself. Unless you tag with 
 @property. However, it's perfectly legal for a front function 
 not to be tagged @property.
There is a simple answer to this, and it is to either use ElementType or do what it does (is(typeof(R.init.front.init) T)).
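A sketch of that idiom written out by hand, for illustration only. `FrontType`, `FieldRange` and `FuncRange` are hypothetical names; in real code `ElementType` from `std.range.primitives` already covers this.

```d
import std.range.primitives : ElementType, front;

// Hypothetical helper mirroring the is(typeof(R.init.front.init) T) idiom.
template FrontType(R)
{
    static if (is(typeof(R.init.front.init) T))
        alias FrontType = T;
    else
        alias FrontType = void;
}

// front as a field.
struct FieldRange { int front; enum empty = false; void popFront() {} }

// front as a function that is not marked @property.
struct FuncRange { int front() { return 0; } enum empty = false; void popFront() {} }

static assert(is(FrontType!FieldRange == int));
static assert(is(FrontType!FuncRange == int));
static assert(is(FrontType!string == dchar)); // auto-decoding shows up here too
static assert(is(FrontType!FuncRange == ElementType!FuncRange));

void main() {}
```

Going through `.init` of the result sidesteps the question of whether `front` is a field, an @property function, or an ordinary function.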
May 17 2016
parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 5/17/16 2:23 PM, Vladimir Panteleev wrote:
 On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer wrote:
 On 5/17/16 1:18 PM, Vladimir Panteleev wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html
Thanks for writing this. Great article. Some remarks:
    static assert(is(typeof(s.front()) == dchar));
I believe .front is a property (so some ranges can implement it as a field, not a property function). Hence, no parens.
Right, but s is a string. So front is a function.
Then what happened to writing generic code?
This isn't generic code, it's just demonstrating that string's front does not yield immutable(char). It's very specific to string. In my recommendation to add the parens, I wasn't sure if front is marked @property or not.

In any case, this is a lot of conversation about something that isn't that important :)

-Steve
May 17 2016
prev sibling parent reply Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer 
wrote:
 However, it's perfectly legal for a front function not to be 
 tagged @property.
BTW, where is this coming from? Is it simply an emergent property of the existing implementations of isInputRange and ElementType, or is it actually by design?
May 17 2016
next sibling parent reply "H. S. Teoh via Digitalmars-d-announce" writes:
On Tue, May 17, 2016 at 08:19:48PM +0000, Vladimir Panteleev via
Digitalmars-d-announce wrote:
 On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer wrote:
However, it's perfectly legal for a front function not to be tagged
 @property.
BTW, where is this coming from? Is it simply an emergent property of the existing implementations of isInputRange and ElementType, or is it actually by design?
This is very bad. The range API does not mandate that .front must be a function. I often write ranges where .front is an actual struct variable that gets updated by .popFront. Now you're saying that my range won't work with some code, because they call .front() (which is a compile error when .front is a variable, not a function)?

In the old days (i.e., 1-2 years ago), isForwardRange!R would return false if .save was not marked @property. I thought isInputRange!R did the same for .front, or am I imagining things? Did somebody change this recently?

T

-- 
INTEL = Only half of "intelligence".
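For readers who have not seen the pattern, a range that keeps its current element in a member variable might look like the sketch below. `Squares` is a hypothetical example, not a Phobos type.

```d
import std.algorithm.comparison : equal;
import std.range.primitives : isInputRange;

// Hypothetical range that stores its current element in a field.
struct Squares
{
    int front = 0;        // a variable, updated by popFront
    private int i = 0;
    private int limit;

    this(int limit) { this.limit = limit; }

    bool empty() const { return i >= limit; }
    void popFront() { ++i; front = i * i; }
}

static assert(isInputRange!Squares);

// Calling the field like a function does not compile.
static assert(!__traits(compiles, Squares(3).front()));

void main()
{
    assert(Squares(4).equal([0, 1, 4, 9]));
}
```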
May 17 2016
parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 5/17/16 8:36 PM, H. S. Teoh via Digitalmars-d-announce wrote:
 On Tue, May 17, 2016 at 08:19:48PM +0000, Vladimir Panteleev via
Digitalmars-d-announce wrote:
 On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer wrote:
 However, it's perfectly legal for a front function not to be tagged
 @property.
BTW, where is this coming from? Is it simply an emergent property of the existing implementations of isInputRange and ElementType, or is it actually by design?
This is very bad. The range API does not mandate that .front must be a function. I often write ranges where .front is an actual struct variable that gets updated by .popFront. Now you're saying that my range won't work with some code, because they call .front() (which is a compile error when .front is a variable, not a function)?
My goodness no! People, please, my point is simply that is(typeof(someRange.front) == ElementType!(typeof(someRange))) DOESN'T ALWAYS WORK.

Here is the (long standing) definition of isInputRange:

    template isInputRange(R)
    {
        enum bool isInputRange = is(typeof(
        (inout int = 0)
        {
            R r = R.init;     // can define a range object
            if (r.empty) {}   // can test for empty
            r.popFront();     // can invoke popFront()
            auto h = r.front; // can get the front of the range
        }));
    }

Note there is no check for is(typeof(r.front)) to be some certain thing. So this is a valid range:

    struct AllZeros
    {
        int front() { return 0; }
        enum empty = false;
        void popFront() {}
    }

Yet, is(typeof(AllZeros.init.front) == int) will be false.

This is the line of code from the article that I suggested to add the parens to. Because in that particular case, string.front is a function, not a field. The code in question is NOT GENERIC, it's just showing that string.front is not the same as string[0]. It's very specific to string.
 In the old days (i.e., 1-2 years ago), isForwardRange!R will return
 false if .save is not marked @property. I thought isInputRange!R did the
 same for .front, or am I imagining things?  Did somebody change this
 recently?
You are imagining that someInputRange.front ever required that. In fact, it would have had to go out of its way to do so (because isInputRange puts no requirements on the *type* of front, except that it returns a non-void value).

But you are right that save did require @property at one time. Not (in my opinion) because it meant to, but because it happened to check the type of r.save against a type (namely, that .save returns its own type).

At the same time, I fixed all the isXXXRange traits so @property is not required anywhere. In particular, isRandomAccessRange required r.front to be @property, even when isInputRange didn't (again, IMO unintentionally).

Here is the PR: https://github.com/dlang/phobos/pull/3276

-Steve
May 19 2016
parent "H. S. Teoh via Digitalmars-d-announce" writes:
On Thu, May 19, 2016 at 09:21:40AM -0400, Steven Schveighoffer via
Digitalmars-d-announce wrote:
 On 5/17/16 8:36 PM, H. S. Teoh via Digitalmars-d-announce wrote:
 On Tue, May 17, 2016 at 08:19:48PM +0000, Vladimir Panteleev via
Digitalmars-d-announce wrote:
 On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer wrote:
 However, it's perfectly legal for a front function not to be
 tagged @property.
BTW, where is this coming from? Is it simply an emergent property of the existing implementations of isInputRange and ElementType, or is it actually by design?
This is very bad. The range API does not mandate that .front must be a function. I often write ranges where .front is an actual struct variable that gets updated by .popFront. Now you're saying that my range won't work with some code, because they call .front() (which is a compile error when .front is a variable, not a function)?
My goodness no! People, please, my point is simply that is(typeof(someRange.front) == ElementType!(typeof(someRange))) DOESN'T ALWAYS WORK.
OK, so the point is, use ElementType!(typeof(range)) instead of typeof(range.front)? That works for me. Sorry for the noise. :-P [...]
 In the old days (i.e., 1-2 years ago), isForwardRange!R will return
 false if .save is not marked @property. I thought isInputRange!R did
 the same for .front, or am I imagining things?  Did somebody change
 this recently?
You are imagining that someInputRange.front ever required that. In fact, it would have had to go out of its way to do so (because isInputRange puts no requirements on the *type* of front, except that it returns a non-void value). But you are right that save did require @property at one time. Not (In my opinion) because it meant to, but because it happened to check the type of r.save against a type (namely, that .save returns its own type).
Ah, so that's where it came from. Now I remember that there were bugs caused by .save returning something other than the original range type, which broke certain algorithms. That's probably where the whole .save requiring @property thing came from.
 At the same time, I fixed all the isXXXRange traits so @property is
 not required anywhere. In particular, isRandomAccessRange required
 r.front to be @property, even when isInputRange didn't (again, IMO
 unintentionally). Here is the PR:
 https://github.com/dlang/phobos/pull/3276
[...]

Thanks for the info!

T

-- 
"Maybe" is a strange word. When mom or dad says it it means "yes", but when my big brothers say it it means "no"! -- PJ jr.
May 19 2016
prev sibling parent reply Jonathan M Davis via Digitalmars-d-announce writes:
On Tuesday, May 17, 2016 17:36:44 H. S. Teoh via Digitalmars-d-announce wrote:
 On Tue, May 17, 2016 at 08:19:48PM +0000, Vladimir Panteleev via 
Digitalmars-d-announce wrote:
 On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer wrote:
However, it's perfectly legal for a front function not to be tagged
 @property.
BTW, where is this coming from? Is it simply an emergent property of the existing implementations of isInputRange and ElementType, or is it actually by design?
This is very bad. The range API does not mandate that .front must be a function. I often write ranges where .front is an actual struct variable that gets updated by .popFront. Now you're saying that my range won't work with some code, because they call .front() (which is a compile error when .front is a variable, not a function)?
At this point, if anyone ever calls front with parens, they're doing it wrong. The range API checks that accessing front as if it were a variable works and _not_ whether calling it as a function works. So, front can be a variable, an enum (though that wouldn't make much sense), a function, or a property function. All of those are legal. And properly written range-based code will work with all of them, because it won't use parens.

The only reason to use parens on front would be if it returned a callable, and since @property doesn't handle that correctly right now (the first set of parens still calls the function, not what it returns), it really doesn't work correctly to have a range of callables - at least not if you want to call them without assigning them to a variable first.
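As a rough illustration of that advice, the sketch below writes one generic function that only ever uses `front` and `empty` without parens, then feeds it three kinds of range. `total`, `FieldFront` and `FuncFront` are hypothetical names.

```d
import std.range.primitives : empty, front, isInputRange, popFront;

// Hypothetical generic algorithm: front and empty are never called with parens.
int total(R)(R r)
if (isInputRange!R)
{
    int sum = 0;
    for (; !r.empty; r.popFront())
        sum += r.front;
    return sum;
}

// front is a member variable.
struct FieldFront
{
    int front = 1;
    int left = 3;
    bool empty() { return left == 0; }
    void popFront() { --left; ++front; }
}

// front is an ordinary member function without @property.
struct FuncFront
{
    int n = 3;
    int front() { return n; }
    bool empty() { return n == 0; }
    void popFront() { --n; }
}

void main()
{
    assert(total(FieldFront()) == 6); // 1 + 2 + 3
    assert(total(FuncFront()) == 6);  // 3 + 2 + 1
    assert(total([1, 2, 3]) == 6);    // arrays use the free functions in std.range.primitives
}
```

Had `total` written `r.front()`, the `FieldFront` instantiation would fail to compile, which is exactly the breakage being warned about.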
 In the old days (i.e., 1-2 years ago), isForwardRange!R will return
 false if .save is not marked @property. I thought isInputRange!R did the
 same for .front, or am I imagining things?  Did somebody change this
 recently?
IIRC save stopped checking for that around dconf of last year. Given that it's perfectly legit to call normal functions without parens, and there is no strong @property enforcement, it makes no sense to require that save be a property function. But I don't think that isInputRange ever checked that front or empty were property functions (and if it did, I don't know that it would have worked for them to ever be variables - that would depend on how the check for whether it was a property function was done).

Regardless, anyone who ever calls front or empty with parens is writing bad code that will not work generically, because that's not what the range API dictates. That's one reason why I wish that we had strict @property enforcement, but there's no way that we're getting that at this point. So, while we do have enforcement of how ranges _can_ be used, we don't have enforcement of how they _are_ used, and I don't expect that we'll ever get that.

- Jonathan M Davis
May 18 2016
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Wednesday, 18 May 2016 at 20:10:09 UTC, Jonathan M Davis wrote:
 At this point, if anyone ever calls front with parens, they're 
 doing it wrong.
Is this true of all @property functions? Should this be noted in the spec? Should it be an error? If it shouldn't be an error, is it really such a bad thing?
May 18 2016
next sibling parent Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 18 May 2016 at 22:23:45 UTC, jmh530 wrote:
 Is this true of all @property functions?
No, this is purely a range thing where it's legal to have your front be a public member variable rather than a getter function.
 Should this be noted in the spec?
While somewhat supported in the language, at the end of the day ranges are library types, so no.
 Should it be an error?
No, people's code will error if they try to call a non callable anyway.
May 18 2016
prev sibling parent reply Jonathan M Davis via Digitalmars-d-announce writes:
On Wednesday, May 18, 2016 22:23:45 jmh530 via Digitalmars-d-announce wrote:
 On Wednesday, 18 May 2016 at 20:10:09 UTC, Jonathan M Davis wrote:
 At this point, if anyone ever calls front with parens, they're
 doing it wrong.
Is this true of all @property functions? Should this be noted in the spec? Should it be an error? If it shouldn't be an error, is it really such a bad thing?
It makes _no_ sense to use parens on a typical property function. The whole point of properties is that they act like variables. If you're marking a function with @property, you're clearly indicating that it's intended to be treated as if it were a variable and not as a function. So, in principle, if you're using parens on a property function, the parens should be used on the return value and _not_ the function.

That being said, we've never ended up with @property enforcement of any kind being added to the language. So, while the compiler _should_ require that an @property function be called without parens, it doesn't. And without that requirement, @property really doesn't do much. If we want properties to work where the type is a callable like a delegate (e.g. you have a range of delegates, so front returns a delegate), then it's going to need to change so that using parens on an @property function actually uses the parens on the return value, and without that @property is nothing more than documentation.

So, right now, @property is pretty much just documentation about how the person who wrote the code expects you to use it. It doesn't really do anything. It does have some effect with regards to typeof, but overall, it does nothing.

Now, that being said, when I was talking about calling front with parens, I wasn't really talking about property functions - though front is frequently a property function. Rather, my point was that isInputRange requires that this code compile:

    R r = R.init;     // can define a range object
    if (r.empty) {}   // can test for empty
    r.popFront();     // can invoke popFront()
    auto h = r.front; // can get the front of the range

If that code compiles with a given type, then that type can be used as an input range. That is the API defined for input ranges. It does _not_ call front with parens nor does it call empty with parens. Rather, it explicitly uses them _without_ parens. So, they could be property functions, or variables, or normal functions that just don't get called with parens, or anything else that can be used without parens and compile with that code.

So, if you write a range-based algorithm that uses parens on empty or front, then you're writing an algorithm that does not follow the range API and which will not work with many ranges. The range API does _not_ guarantee that either front or empty can be used with parens. It guarantees that they can be used _without_ them. So, if your code ever uses parens on front or empty, then it's using the range API incorrectly and risks not compiling with many ranges.

- Jonathan M Davis
May 19 2016
parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 19 May 2016 at 12:10:36 UTC, Jonathan M Davis wrote:
 [snip]
Very informative, as always. I had not realized the implication of front being called without parens, such as front being something that isn't an @property function (esp. a variable). Are there any ranges in phobos (that you can think of) that do this?
May 19 2016
parent Jonathan M Davis via Digitalmars-d-announce writes:
On Thursday, May 19, 2016 13:11:47 jmh530 via Digitalmars-d-announce wrote:
 On Thursday, 19 May 2016 at 12:10:36 UTC, Jonathan M Davis wrote:
 [snip]
Very informative, as always. I had not realized the implication of front being called without parens, such as front being something that isn't an @property function (esp. a variable). Are there any ranges in phobos (that you can think of) that do this?
I'm not aware of any, but there might be some. I think that most of us pretty much always use property functions, though the usual reasons for that don't really apply to Voldemort types.

The usual reason to use an @property function instead of a public variable is that while property functions emulate variables, they really aren't the same (e.g. taking its address doesn't have the same semantics, and stuff like incrementing doesn't normally work with property functions). So, unfortunately, you can't transparently swap between property functions and variables, even though that's theoretically one of the reasons that property functions exist in a language.

But when using a range that's a Voldemort type, you don't even ever see the type's declaration, and you're only ever supposed to use the range API on it, so the semantics of variable vs function don't really matter, since you're not supposed to be doing any of the stuff where it would matter (e.g. it makes no sense to take the address of front, because what that means is not specified by the range API and will do different things with different range implementations).

So, it's arguably better to just use public variables with ranges if functions like front or empty are just going to return a value, but many of us just use property functions out of habit given that in the general case, it's problematic to use public variables instead of property functions (though it _would_ be a nice language enhancement IMHO if using @property on a variable made it illegal to do anything on it that you couldn't do on an @property function, since then you could make it a public variable until refactoring required that it become a function, and changing it wouldn't break code, whereas now, it might).

- Jonathan M Davis
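The two differences mentioned (incrementing and taking the address) can be sketched with compile-time checks. `WithField` and `WithProperty` are hypothetical types; the getter deliberately has no matching setter.

```d
struct WithField
{
    int value;
}

struct WithProperty
{
    private int _value;
    @property int value() { return _value; } // getter only, returns by value
}

// Incrementing in place works on the field but not through the getter.
static assert( __traits(compiles, { WithField f; ++f.value; }));
static assert(!__traits(compiles, { WithProperty p; ++p.value; }));

void main()
{
    WithField f;
    WithProperty p;

    // The address of a field is a pointer to the data.
    static assert(is(typeof(&f.value) == int*));

    // The address of the property function is not a pointer to an int.
    static assert(!is(typeof(&p.value) == int*));

    // Plain reads look identical at the call site.
    assert(f.value == 0 && p.value == 0);
}
```

As long as client code restricts itself to reads like the last line, which is all the range API asks for, swapping one implementation for the other stays invisible.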
May 20 2016
prev sibling next sibling parent Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 18 May 2016 at 20:10:09 UTC, Jonathan M Davis wrote:
 At this point, if anyone ever calls front with parens, they're 
 doing it wrong.
    $ cd ~/dlang/phobos && grep -r "\.front()" * | wc -l
    3

Not bad. One is commented out and the other two look intentional.
May 18 2016
prev sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 18 May 2016 at 20:10:09 UTC, Jonathan M Davis wrote:
 So, while we do have enforcement of how ranges _can_ be used, 
 we don't have enforcement of how they _are_ used, and I don't 
 expect that we'll ever get that.
It would help if there were a documented standard testing procedure (and if it were used for all algorithms).
May 19 2016
parent Jonathan M Davis via Digitalmars-d-announce writes:
On Thursday, May 19, 2016 09:05:53 Kagamin via Digitalmars-d-announce wrote:
 On Wednesday, 18 May 2016 at 20:10:09 UTC, Jonathan M Davis wrote:
 So, while we do have enforcement of how ranges _can_ be used,
 we don't have enforcement of how they _are_ used, and I don't
 expect that we'll ever get that.
It would help if there was documented standard testing procedure (and used for all algorithms).
We really need solid tools for testing an algorithm with a variety of ranges to verify that it's doing the right thing as well as tools to test that a range behaves how ranges are supposed to behave. The closest that we have to that is that std.range has some internal helpers for testing ranges, but they're not that great, and I don't think that any of it's public. And we don't have anything for testing that a range acts correctly - just that it follows the right API syntactically with the minimal semantic checking that can be done with typeof. So, there's work to be done. I'd started some of it a while back, but I never got very far, and no one else has done anything like it AFAIK. - Jonathan M Davis
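As a rough illustration of the kind of semantic check described above, here is a minimal sketch. The helper looksLikeSaneForwardRange and the BadRange type are hypothetical, written only for this example; they are not part of Phobos and cover just two properties (front is stable between calls, and save yields an independent copy).

```d
import std.range : iota;
import std.range.primitives;

// Hypothetical helper, not part of Phobos: walks a forward range and checks
// two behaviours that the syntactic isForwardRange test cannot see.
bool looksLikeSaneForwardRange(R)(R r, size_t limit = 10_000)
if (isForwardRange!R)
{
    auto checkpoint = r.save;

    size_t consumed;
    while (!r.empty && consumed < limit)
    {
        // Calling front twice must not advance or change the range.
        if (r.front != r.front)
            return false;
        r.popFront();
        ++consumed;
    }

    // Iterating the original must not have consumed the saved copy.
    size_t replayed;
    while (!checkpoint.empty && replayed < limit)
    {
        checkpoint.popFront();
        ++replayed;
    }
    return consumed == replayed;
}

// A deliberately broken range: front advances on every call.
struct BadRange
{
    int i;
    @property int front() { return i++; }
    @property bool empty() const { return i >= 5; }
    void popFront() { ++i; }
    @property BadRange save() { return this; }
}

// The syntactic check is satisfied even though the behaviour is wrong.
static assert(isForwardRange!BadRange);

void main()
{
    assert(looksLikeSaneForwardRange(iota(5)));
    assert(looksLikeSaneForwardRange("cass\u00E9"));
    assert(!looksLikeSaneForwardRange(BadRange()));
}
```

A real test suite would need many more checks (for example, that popFront on one copy never affects another), so treat this only as a starting point.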
May 19 2016
prev sibling next sibling parent "H. S. Teoh via Digitalmars-d-announce" writes:
On Tue, May 17, 2016 at 02:06:37PM +0000, Jack Stouffer via
Digitalmars-d-announce wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html
[...] Thanks for writing up this article! T -- What did the alien say to Schubert? "Take me to your lieder."
May 17 2016
prev sibling next sibling parent reply Taylor Hillegeist <taylorh140 gmail.com> writes:
On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 Based on the recent thread in General, I wrote this blog post 
 that's designed to be part beginner tutorial, part objective 
 record of the debate over it, and finally my opinions on the 
 matter.

 When I first learned about auto-decoding, I was kinda miffed 
 that there wasn't anything in the docs or on the website that 
 went into more detail. So I wrote this in order to introduce 
 people who are getting into D to the concept, it's benefits, 
 and downsides. When people are confused in Learn why 
 typeof(s.front) == dchar then this can just be linked to them.

 If you think there should be any more information included in 
 the article, please let me know so I can add it.
I ran into an auto decoding problem earlier. Honestly I was upset, and I think I was rightly upset. The programming language moved my cheese without telling me. I tend to believe in the route of least surprise. If, as a newbie, I am doing something stupid and find out I was wrong, that is one thing. But if I continue to do something wrong and find out that the programming language thinks I am stupid, that's another thing. If people want auto coding behavior shouldn't they just use or convert to dchar?
May 19 2016
parent Jack Stouffer <jack jackstouffer.com> writes:
On Thursday, 19 May 2016 at 17:16:54 UTC, Taylor Hillegeist wrote:
 If people want auto coding behavior shouldn't they just use or 
 convert to dchar?
No, they need to _decode_ to dchar. I don't know if you mistyped, but if you didn't, this is exactly what I was talking about in the article:
 "Unicode is hard. Trying to hide Unicode specifics helps no one 
 because it's going to bite you in the ass eventually."
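
The difference between converting and decoding can be seen in a short sketch. It uses std.utf's decode, byDchar and byCodeUnit; the sample string is an assumption chosen for illustration.

```d
import std.range : walkLength;
import std.utf : byCodeUnit, byDchar, decode;

void main()
{
    // é is written as an escape so the source encoding cannot interfere:
    // one code point, stored as two UTF-8 code units (0xC3 0xA9).
    string s = "caf\u00E9";

    assert(s.length == 5);                 // length counts code units
    assert(s.byCodeUnit.walkLength == 5);  // explicit opt-out of decoding
    assert(s.byDchar.walkLength == 4);     // explicit decoding to dchar

    // Converting a single code unit to dchar only widens the byte.
    assert(cast(dchar) s[3] == 0xC3);
    assert(cast(dchar) s[3] != '\u00E9');

    // Decoding consumes both code units and yields the code point.
    size_t index = 3;
    dchar c = decode(s, index);
    assert(c == '\u00E9');
    assert(index == 5);
}
```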
May 19 2016
prev sibling next sibling parent John Carter <john.carter taitradio.com> writes:
On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html
 There are lots of places where invalid Unicode is either 
 commonplace or legal, e.g. Linux file names, and therefore 
 auto decoding cannot be used. It turns out in the wild that 
 pure Unicode is not universal - there's lots of dirty Unicode 
 that should remain unmolested because it's user data, and auto 
 decoding does not play well with that mentality.
As a slightly tangential aside... https://lwn.net/Articles/686392/ There exists a proposal for a Linux kernel module to render the creation of such names impossible. I for one will install it on all my systems as soon as I can. However, until then, my day job requires me to find, scan, analyze and work with whatever crud the herd of cats I work with throws into the repo. And no, sadly I can't just rewrite everything because they (or some tool they use) don't understand UTF-8.
May 19 2016
prev sibling next sibling parent Martin Nowak <code dawg.eu> writes:
On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:

Related discussion 
https://trello.com/c/4XmFdcp6/163-rediscuss-redundant-utf-8-string-validation.
May 20 2016
prev sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 If you think there should be any more information included in 
 the article, please let me know so I can add it.
I was a little confused by something in the main autodecoding thread, so I read your article again. Unfortunately, I don't think my confusion is resolved. I was trying one of your examples (full code I used below). You claim it works, but I keep getting assertion failures. I'm just running it with rdmd on Windows 7.

import std.algorithm : canFind;

void main()
{
	string s = "cassé";

	assert(s.canFind!(x => x == 'é'));
}
Jun 02 2016
next sibling parent reply Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/2/16 5:21 PM, jmh530 wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 If you think there should be any more information included in the
 article, please let me know so I can add it.
I was a little confused by something in the main autodecoding thread, so I read your article again. Unfortunately, I don't think my confusion is resolved. I was trying one of your examples (full code I used below). You claim it works, but I keep getting assertion failures. I'm just running it with rdmd on Windows 7. import std.algorithm : canFind; void main() { string s = "cassé"; assert(s.canFind!(x => x == 'é')); }
If that é above is an e followed by a combining character, then you will get the error. This is because autodecoding does not auto normalize as well -- the code points have to match exactly. -Steve
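A minimal sketch of that failure mode, with both forms written as escapes so that no editor or browser can change them on the way. It assumes std.uni's normalize with the NFC form is an acceptable way to get back to a single code point.

```d
import std.algorithm.searching : canFind;
import std.range : walkLength;
import std.uni : NFC, normalize;

void main()
{
    string composed = "cass\u00E9";     // é as the single code point U+00E9
    string decomposed = "casse\u0301";  // e followed by U+0301 COMBINING ACUTE ACCENT

    // Autodecoding iterates code points, so the two strings differ in length.
    assert(composed.walkLength == 5);
    assert(decomposed.walkLength == 6);

    // The search compares code points exactly; nothing is normalized for you.
    assert(composed.canFind!(x => x == '\u00E9'));
    assert(!decomposed.canFind!(x => x == '\u00E9'));

    // Normalizing to NFC first makes the search succeed.
    assert(normalize!NFC(decomposed).canFind!(x => x == '\u00E9'));
    assert(normalize!NFC(decomposed) == composed);
}
```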
Jun 02 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 6/2/16 5:27 PM, Steven Schveighoffer wrote:
 On 6/2/16 5:21 PM, jmh530 wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 If you think there should be any more information included in the
 article, please let me know so I can add it.
I was a little confused by something in the main autodecoding thread, so I read your article again. Unfortunately, I don't think my confusion is resolved. I was trying one of your examples (full code I used below). You claim it works, but I keep getting assertion failures. I'm just running it with rdmd on Windows 7. import std.algorithm : canFind; void main() { string s = "cassé"; assert(s.canFind!(x => x == 'é')); }
If that é above is an e followed by a combining character, then you will get the error. This is because autodecoding does not auto normalize as well -- the code points have to match exactly. -Steve
Indeed. FWIW I just copied OP's code from Thunderbird into Chrome (on OSX) and it worked: https://dpaste.dzfl.pl/09b9188d87a5 Should I assume some normalization occurred on the way? Andrei
Jun 02 2016
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 2 June 2016 at 21:33:02 UTC, Andrei Alexandrescu 
wrote:
 Should I assume some normalization occurred on the way?
I'm just looking over std.uni's section on normalization and realizing that I had basically no idea what it is or what's going on. The Wikipedia page on Unicode equivalence is a bit clearer. I'm definitely nowhere near qualified to have an opinion on these issues.
Jun 02 2016
next sibling parent reply Rory McGuire via Digitalmars-d-announce writes:
On Fri, Jun 3, 2016 at 5:16 AM, jmh530 via Digitalmars-d-announce
<digitalmars-d-announce puremagic.com> wrote:
 On Thursday, 2 June 2016 at 21:33:02 UTC, Andrei Alexandrescu wrote:
 Should I assume some normalization occurred on the way?
I'm just looking over std.uni's section on normalization and realizing that I had basically no idea what it is or what's going on. The wikipedia page on unicode equivalence is a bit clearer. I'm definitely nowhere near qualified to have an opinion on these issues.
This dpaste shows a couple of issues with combining chars in D. https://dpaste.dzfl.pl/4b006959c5c0 The compiler actually can't handle a combining character literal either. See line 10. R
Jun 02 2016
parent reply tsbockman <thomas.bockman gmail.com> writes:
On Friday, 3 June 2016 at 06:37:59 UTC, Rory McGuire wrote:
 This dpaste shows a couple of issues with combining chars in D.

 https://dpaste.dzfl.pl/4b006959c5c0

 The compiler actually can't handle a combining character 
 literal either. see line 10.
Your paste behaves as expected: the "character" types in D are defined as single Unicode code units. By definition, the NFD form of "é" is not a single code unit. You would need to use a Grapheme or [w|d]string for that. (Of course, one might reasonably question how useful our built-in character types actually are compared to ubyte/ushort/uint.)
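To illustrate the three levels involved, here is a small sketch that counts the NFD form of "é" as code units, as code points, and as graphemes. It is not the code from the paste above, just an assumed minimal example using std.uni's byGrapheme.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // NFD form of é: 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    string nfd = "e\u0301";

    assert(nfd.length == 3);                 // UTF-8 code units: 1 + 2
    assert(nfd.walkLength == 2);             // code points, via autodecoding
    assert(nfd.byGrapheme.walkLength == 1);  // one user-perceived character

    // A Grapheme holds every code point of that one character.
    auto g = nfd.byGrapheme.front;
    assert(g.length == 2);
    assert(g[0] == 'e');
    assert(g[1] == '\u0301');
}
```

None of the built-in character types can hold both code points at once, which is why a string or a Grapheme is needed for the decomposed form.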
Jun 02 2016
parent Rory McGuire via Digitalmars-d-announce writes:
On Fri, Jun 3, 2016 at 8:58 AM, tsbockman via Digitalmars-d-announce
<digitalmars-d-announce puremagic.com> wrote:
 On Friday, 3 June 2016 at 06:37:59 UTC, Rory McGuire wrote:
 This dpaste shows a couple of issues with combining chars in D.

 https://dpaste.dzfl.pl/4b006959c5c0

 The compiler actually can't handle a combining character literal either.
 see line 10.
Your paste behaves as expected: the "character" types in D are defined as single Unicode code units. By definition, the NFD form of "é" is not a single code unit. You would need to use a Grapheme or [w|d]string for that. (Of course, one might reasonably question how useful our built-in character types actually are compared to ubyte/ushort/uint.)
Hmm, perhaps it behaves as documented; however, I'm not certain that it's expected :).
Jun 03 2016
prev sibling parent tsbockman <thomas.bockman gmail.com> writes:
On Friday, 3 June 2016 at 03:16:33 UTC, jmh530 wrote:
 I'm just looking over std.uni's section on normalization and 
 realizing that I had basically no idea what it is or what's 
 going on. The wikipedia page on unicode equivalence is a bit 
 clearer.
This might help a bit, as well: https://dpaste.dzfl.pl/2ffb22b02842
Jun 02 2016
prev sibling parent Steven Schveighoffer <schveiguy yahoo.com> writes:
On 6/2/16 5:33 PM, Andrei Alexandrescu wrote:
 On 6/2/16 5:27 PM, Steven Schveighoffer wrote:
 On 6/2/16 5:21 PM, jmh530 wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 If you think there should be any more information included in the
 article, please let me know so I can add it.
I was a little confused by something in the main autodecoding thread, so I read your article again. Unfortunately, I don't think my confusion is resolved. I was trying one of your examples (full code I used below). You claim it works, but I keep getting assertion failures. I'm just running it with rdmd on Windows 7. import std.algorithm : canFind; void main() { string s = "cassé"; assert(s.canFind!(x => x == 'é')); }
If that é above is an e followed by a combining character, then you will get the error. This is because autodecoding does not auto normalize as well -- the code points have to match exactly.
Indeed. FWIW I just copied OP's code from Thunderbird into Chrome (on OSX) and it worked: https://dpaste.dzfl.pl/09b9188d87a5 Should I assume some normalization occurred on the way?
I think it depends on what your browser presents. But it's impossible to tell without being on the OP's machine to see what it's actually stored as. Thunderbird may have normalized as well! -Steve
Jun 03 2016
prev sibling parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Thursday, 2 June 2016 at 21:21:50 UTC, jmh530 wrote:
 I was a little confused by something in the main autodecoding 
 thread, so I read your article again. Unfortunately, I don't 
 think my confusion is resolved. I was trying one of your 
 examples (full code I used below). You claim it works, but I 
 keep getting assertion failures. I'm just running it with rdmd 
 on Windows 7.


 import std.algorithm : canFind;

 void main()
 {
 	string s = "cassé";

 	assert(s.canFind!(x => x == 'é'));
 }
Your browser is turning the é in the string into two code points via normalization whereas it should be one. Try using \u00E9 instead.
Jun 02 2016
parent jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 2 June 2016 at 21:31:39 UTC, Jack Stouffer wrote:
 On Thursday, 2 June 2016 at 21:21:50 UTC, jmh530 wrote:
 I was a little confused by something in the main autodecoding 
 thread, so I read your article again. Unfortunately, I don't 
 think my confusion is resolved. I was trying one of your 
 examples (full code I used below). You claim it works, but I 
 keep getting assertion failures. I'm just running it with rdmd 
 on Windows 7.


 import std.algorithm : canFind;

 void main()
 {
 	string s = "cassé";

 	assert(s.canFind!(x => x == 'é'));
 }
Your browser is turning the é in the string into two code points via normalization whereas it should be one. Try using \u00E9 instead.
That doesn't cause an assert to fail, but when I do writeln('\u00E9') I get é. So there might still be something wonky going on. I looked up \u00E9 online and I don't think there's an error with that.
Jun 02 2016