digitalmars.D.announce - D's Auto Decoding and You

Jack Stouffer <jack jackstouffer.com> writes:

http://jackstouffer.com/blog/d_auto_decoding_and_you.html

Based on the recent thread in General, I wrote this blog post 
that's designed to be part beginner tutorial, part objective 
record of the debate over it, and finally my opinions on the 
matter.

When I first learned about auto-decoding, I was kinda miffed that 
there wasn't anything in the docs or on the website that went 
into more detail. So I wrote this in order to introduce people 
who are getting into D to the concept, its benefits, and its 
downsides. When people in Learn are confused about why 
typeof(s.front) == dchar, this can just be linked to them.

If you think there should be any more information included in the 
article, please let me know so I can add it.

May 17 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/17/16 10:06 AM, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

Starting to read it, see errors in your examples:

is(s[0] == immutable char) -> is(typeof(s[0]) == immutable(char))
is(s.front == dchar) -> is(typeof(s.front()) == dchar)

I'm not sure if you need the parens after front, but if it's not marked 
as @property, then this returns a function.
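
For reference, a minimal sketch of what the corrected checks look like when compiled. The variable name `s` and the sample string are only illustrative; `ElementType` and `front` are taken from `std.range.primitives`:

```d
import std.range.primitives : ElementType, front;

void main()
{
    string s = "hello";

    // Indexing a string yields a single UTF-8 code unit.
    static assert(is(typeof(s[0]) == immutable(char)));

    // The range primitive front decodes a whole code point instead.
    static assert(is(typeof(s.front()) == dchar));
    static assert(is(ElementType!string == dchar));

    // For pure ASCII both views agree on the value, just not on the type.
    assert(s[0] == 'h' && s.front == 'h');
}
```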

-Steve

May 17 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Tuesday, 17 May 2016 at 14:16:48 UTC, Steven Schveighoffer 
wrote:
 Starting to read it, see errors in your examples:

 is(s[0] == immutable char) -> is(typeof(s[0]) == 
 immutable(char))
 is(s.front == dchar) -> is(typeof(s.front()) == dchar)

Thanks, fixed.

May 17 2016

Rory McGuire via Digitalmars-d-announce writes:

On 17 May 2016 16:21, "Steven Schveighoffer via Digitalmars-d-announce"
<digitalmars-d-announce puremagic.com> wrote:
 On 5/17/16 10:06 AM, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 Starting to read it, see errors in your examples:

 is(s[0] == immutable char) -> is(typeof(s[0]) == immutable(char))
 is(s.front == dchar) -> is(typeof(s.front()) == dchar)

 I'm not sure if you need the parens after front, but if it's not marked
 as @property, then this returns a function.

 -Steve

If I remember correctly, adding the brackets goes against best
practices, because you can't be sure the underlying implementation of a
range is using a function for .front.

May 17 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/17/16 1:49 PM, Rory McGuire via Digitalmars-d-announce wrote:
 On 17 May 2016 16:21, "Steven Schveighoffer via Digitalmars-d-announce"
 <digitalmars-d-announce puremagic.com> wrote:
  >
  > On 5/17/16 10:06 AM, Jack Stouffer wrote:
  >>
  >> http://jackstouffer.com/blog/d_auto_decoding_and_you.html
  >
  >
  > Starting to read it, see errors in your examples:
  >
  > is(s[0] == immutable char) -> is(typeof(s[0]) == immutable(char))
  > is(s.front == dchar) -> is(typeof(s.front()) == dchar)
  >
  > I'm not sure if you need the parens after front, but if it's not
  > marked as @property, then this returns a function.
  >

 If I remember correctly adding the brackets then goes against best
 practices because you can't be sure the underlying implementation of a
 range is using a function for .front.

Right, but there's this:

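import std.range.primitives : isInputRange;

struct MyRange
{
    private int _val;
    int front() { return _val; }          // plain function, not @property
    bool empty() { return _val >= 100; }  // empty once the counter reaches 100
    void popFront() { ++_val; }
}

static assert(isInputRange!MyRange);
// typeof yields the function type here, not the return type
static assert(!is(typeof(MyRange.init.front) == int));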

This is why I recommended the parentheses. In reality, you should do 
ElementType!MyRange, which does the correct thing. But is(typeof(...)) 
doesn't always work like you might expect.

-Steve

May 17 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

Grammar:

"This will tie in later because the string types front has special behavior"

...because for string types, front has...

Content:
"For C style strings, you can use ubyte[] and call std.string.assumeUTF 
where necessary"

Actually, C style strings are ASCII, no? UTF8 includes ASCII. And D 
treats C strings as char *, not ubyte[]

Content:

D will look ahead in the string and combine things like e and U+0308 into ë

Nope :) This is a grapheme, and D does not decode these into one dchar.
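
A small sketch of the three levels involved, assuming the combining sequence is written with an escape so that no editor or browser can normalize it. `walkLength` and `std.uni.byGrapheme` are existing Phobos functions; the sample string is only an illustration:

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // 'e' followed by U+0308 COMBINING DIAERESIS (renders as one character)
    string s = "e\u0308";

    assert(s.length == 3);                // UTF-8 code units: 1 + 2
    assert(s.walkLength == 2);            // code points, via auto-decoding
    assert(s.byGrapheme.walkLength == 1); // graphemes, only when asked for
}
```

Auto-decoding stops at the code point level; grouping into graphemes has to be requested explicitly.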

Grammar:

"about it's inclusion"

it's -> its

Typo:

"Pared with the inability to turn it off,"

Pared -> Paired

Typo:

"Phobos String type would be the best option and a deprecation of the 
sting front function"

sting -> string

Like the article, pretty much sums up my thoughts too. IMO, the only 
path forward is something that aliases string to something that 
auto-decodes, but that is NOT a char array. Then you have to deprecate 
implicit access to the backing array, and make it explicit. Probably 
would take 2 years or so to migrate.

-Steve

May 17 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Tuesday, 17 May 2016 at 14:44:06 UTC, Steven Schveighoffer 
wrote:
 ...

Thanks, fixed all issues.

 Like the article, pretty much sums up my thoughts too.

Thanks!

May 17 2016

jmh530 <john.michael.hall gmail.com> writes:

On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 Based on the recent thread in General, I wrote this blog post 
 that's designed to be part beginner tutorial, part objective 
 record of the debate over it, and finally my opinions on the 
 matter.

I probably would have preferred this split up into two parts with 
one on the tutorial and one on the record of the debate. It seems 
to focus more on the debate.

Maybe you could add a section that was like "for THIS type of 
string, do X for performance" with clear explanations of why that 
is the best way and why other ways will be slower. Then, you can 
have other sub-sections for "for THAT type of string, do Y for 
performance." You have some of this detail in there, but it's 
organized more with respect to the context of the debate, I think.

May 17 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Tuesday, 17 May 2016 at 16:24:31 UTC, jmh530 wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 Based on the recent thread in General, I wrote this blog post 
 that's designed to be part beginner tutorial, part objective 
 record of the debate over it, and finally my opinions on the 
 matter.

 I probably would have preferred this split up into two parts 
 with one on the tutorial and one on the record of the debate. 
 It seems to focus more on the debate.

That wasn't my intent. I wanted the debate to be a lens into a 
discussion of the technical merits and demerits of auto decoding. 
I have reworded some of the article in order to reflect this.

 Maybe you could add a section that was like "for THIS type 
 of string, do X for performance" with clear explanations of why 
 that is the best way and why other ways will be slower. Then, 
 you can have other sub-sections for "for THAT type of string, 
 do Y for performance." You have some of this detail in there, 
 but it's organized more with respect to the context of the 
 debate, I think.

I will add this, thanks.

May 17 2016

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

Thanks for writing this. Great article.

Some remarks:

    static assert(is(typeof(s.front()) == dchar));

I believe .front is a property (so some ranges can implement it 
as a field, not a @property function). Hence, no parens.

 So, why is typeof(s.front) == dchar.

Question mark?

 In plain English, this means when iterating over strings in D, 
 D will look ahead in the string and combine any code units that 
 make up a single code point.

Perhaps clarify that this only applies to ranges. `foreach` on a 
string will iterate over chars, but you can iterate over code 
points if you specify the dchar type explicitly.
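
A minimal sketch of that difference. The helper names `countCodeUnits` and `countCodePoints` are made up for illustration, and the sample string uses escapes for three code points that each take three UTF-8 code units:

```d
import std.range : walkLength;

// Plain foreach over a string visits UTF-8 code units.
size_t countCodeUnits(string s)
{
    size_t n;
    foreach (c; s) ++n;
    return n;
}

// Asking for dchar makes foreach decode code points.
size_t countCodePoints(string s)
{
    size_t n;
    foreach (dchar c; s) ++n;
    return n;
}

void main()
{
    string s = "\u65E5\u672C\u8A9E";

    assert(countCodeUnits(s) == 9);
    assert(countCodePoints(s) == 3);
    assert(s.length == 9);      // array length counts code units
    assert(s.walkLength == 3);  // range algorithms auto-decode
}
```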

More confusing text on the same issue lower, and in the intro:

 Iterating a char array with C style for loops produces 
 different results than foreach loops due to auto decoding.

 One feature of D that is confusing to a lot of new comers is 
 the behavior of strings in relation to range based features 
 like the foreach statement and range algorithms.

---

 E.g. for ë the code units C3 AB (for UTF-8) would turn into a 
 single code point.

Perhaps choose a character that is not also expressible via 
composite characters, to avoid potential for confusion.

 string s = "cassé";

Ditto (unless the goal was to complement the example from my .d 
file below)

  These glaring inconsistencies are the cause of a lot of 
 confusion for new comers.

(Opinion) I would say that they also cause issues in generic code.

 Every time one wants a generic algorithm to work with both 
 strings and ranges, you wind up special casing via static 
 if-ing narrow strings to defeat the auto decoding, or to decode 
 the ranges. Case in point.

Link to the exact SHA to prevent the link from getting outdated. 
On Github, just hit 'y' on your keyboard to go to the "permalink" 
version.

 Auto decoding has two choices when encountering invalid code 
 units: throw, or produce an error dchar like std.utf.byUTF does.

(Aside) This was an interesting discussion on the subject: 
https://issues.dlang.org/show_bug.cgi?id=14519

 However, in my opinion D is too far along to to suddenly ask 
 people

"to to"

---

Some more info / links on the subject I collected a few years ago:

http://wiki.dlang.org/Language_issues#Unicode_and_ranges

May 17 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/17/16 1:18 PM, Vladimir Panteleev wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 Thanks for writing this. Great article.

 Some remarks:

    static assert(is(typeof(s.front()) == dchar));

 I believe .front is a property (so some ranges can implement it as a
 field, not a @property function). Hence, no parens.

Right, but s is a string. So front is a function.

There is an inconsistency in the compiler for this. If s.front is a 
function, is(typeof(s.front)) will not be what front *returns*, but the 
function type itself, unless you tag it with @property. However, it's 
perfectly legal for a front function not to be tagged @property.

-Steve

May 17 2016

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer 
wrote:
 On 5/17/16 1:18 PM, Vladimir Panteleev wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 Thanks for writing this. Great article.

 Some remarks:

    static assert(is(typeof(s.front()) == dchar));

 I believe .front is a property (so some ranges can implement 
 it as a
 field, not a @property function). Hence, no parens.

 Right, but s is a string. So front is a function.

Then what happened to writing generic code?

 There is an inconsistency in the compiler for this. If s.front 
 is a function, is(typeof(s.front)) will not be what front 
 *returns*, but the function type itself, unless you tag it with 
 @property. However, it's perfectly legal for a front function 
 not to be tagged @property.

There is a simple answer to this, and it is to either use 
ElementType or do what it does (is(typeof(R.init.front.init) T)).
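
A sketch of that idiom, under the assumption that it mirrors how Phobos defines `ElementType`. `MyElementType`, `FuncFront` and `FieldFront` are hypothetical names used only for this illustration:

```d
import std.range.primitives : ElementType;

// Hypothetical stand-in that uses the same is(typeof(...) T) trick.
template MyElementType(R)
{
    static if (is(typeof(R.init.front.init) T))
        alias MyElementType = T;
    else
        alias MyElementType = void;
}

// front as a plain (non-@property) function
struct FuncFront
{
    int front() { return 0; }
    enum empty = false;
    void popFront() {}
}

// front as a field
struct FieldFront
{
    int front;
    enum empty = false;
    void popFront() {}
}

// The .init trick sees through both spellings of front.
static assert(is(MyElementType!FuncFront == int));
static assert(is(MyElementType!FieldFront == int));
static assert(is(MyElementType!FuncFront == ElementType!FuncFront));

// The naive check fails for the plain function.
static assert(!is(typeof(FuncFront.init.front) == int));

void main() {}
```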

May 17 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/17/16 2:23 PM, Vladimir Panteleev wrote:
 On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer wrote:
 On 5/17/16 1:18 PM, Vladimir Panteleev wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 Thanks for writing this. Great article.

 Some remarks:

    static assert(is(typeof(s.front()) == dchar));

 I believe .front is a property (so some ranges can implement it as a
 field, not a @property function). Hence, no parens.

 Right, but s is a string. So front is a function.

 Then what happened to writing generic code?

This isn't generic code, it's just demonstrating that string's front 
does not yield immutable(char). It's very specific to string.

When I recommended adding the parens, I wasn't sure whether front is 
marked @property or not.

In any case, this is a lot of conversation about something that isn't 
that important :)

-Steve

May 17 2016

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer 
wrote:
 However, it's perfectly legal for a front function not to be 
 tagged @property.

BTW, where is this coming from? Is it simply an emergent property 
of the existing implementations of isInputRange and ElementType, 
or is it actually by design?

May 17 2016

"H. S. Teoh via Digitalmars-d-announce" writes:

On Tue, May 17, 2016 at 08:19:48PM +0000, Vladimir Panteleev via
Digitalmars-d-announce wrote:
 On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer wrote:
 However, it's perfectly legal for a front function not to be tagged
 @property.

 
 BTW, where is this coming from? Is it simply an emergent property of
 the existing implementations of isInputRange and ElementType, or is it
 actually by design?

This is very bad. The range API does not mandate that .front must be a
function. I often write ranges where .front is an actual struct variable
that gets updated by .popFront.  Now you're saying that my range won't
work with some code, because they call .front() (which is a compile
error when .front is a variable, not a function)?

In the old days (i.e., 1-2 years ago), isForwardRange!R would return
false if .save was not marked @property. I thought isInputRange!R did the
same for .front, or am I imagining things?  Did somebody change this
recently?


T

-- 
INTEL = Only half of "intelligence".

May 17 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 5/17/16 8:36 PM, H. S. Teoh via Digitalmars-d-announce wrote:
 On Tue, May 17, 2016 at 08:19:48PM +0000, Vladimir Panteleev via
Digitalmars-d-announce wrote:
 On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer wrote:
 However, it's perfectly legal for a front function not to be tagged
 @property.

 BTW, where is this coming from? Is it simply an emergent property of
 the existing implementations of isInputRange and ElementType, or is it
 actually by design?

 This is very bad. The range API does not mandate that .front must be a
 function. I often write ranges where .front is an actual struct variable
 that gets updated by .popFront.  Now you're saying that my range won't
 work with some code, because they call .front() (which is a compile
 error when .front is a variable, not a function)?

My goodness no!

People, please, my point is simply that is(typeof(someRange.front) == 
ElementType!(typeof(someRange))) DOESN'T ALWAYS WORK.

Here is the (long standing) definition of isInputRange:

template isInputRange(R)
{
     enum bool isInputRange = is(typeof(
     (inout int = 0)
     {
         R r = R.init;     // can define a range object
         if (r.empty) {}   // can test for empty
         r.popFront();     // can invoke popFront()
         auto h = r.front; // can get the front of the range
     }));
}

Note there is no check for typeof(r.front) to be some certain thing.

So this is a valid range:

struct AllZeros
{
     int front() { return 0; }
     enum empty = false;
     void popFront() {}
}

Yet, is(typeof(AllZeros.init.front) == int) will be false. This is the 
same situation as the line of code from the article where I suggested 
adding the parens, because in that particular case, string.front is a 
function, not a field. The code in question is NOT GENERIC, it's just 
showing that string.front is not the same as string[0]. It's very 
specific to string.

 In the old days (i.e., 1-2 years ago), isForwardRange!R will return
 false if .save is not marked @property. I thought isInputRange!R did the
 same for .front, or am I imagining things?  Did somebody change this
 recently?

You are imagining that someInputRange.front ever required that. In fact, 
it would have had to go out of its way to do so (because isInputRange 
puts no requirements on the *type* of front, except that it returns a 
non-void value).

But you are right that save did require @property at one time. Not (in 
my opinion) because it meant to, but because it happened to check the 
type of r.save against a type (namely, that .save returns its own type).

At the same time, I fixed all the isXXXRange traits so @property is not 
required anywhere. In particular, isRandomAccessRange required r.front 
to be @property, even when isInputRange didn't (again, IMO 
unintentionally). Here is the PR: https://github.com/dlang/phobos/pull/3276

-Steve

May 19 2016

"H. S. Teoh via Digitalmars-d-announce" writes:

On Thu, May 19, 2016 at 09:21:40AM -0400, Steven Schveighoffer via
Digitalmars-d-announce wrote:
 On 5/17/16 8:36 PM, H. S. Teoh via Digitalmars-d-announce wrote:
 On Tue, May 17, 2016 at 08:19:48PM +0000, Vladimir Panteleev via
Digitalmars-d-announce wrote:
 On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer wrote:
 However, it's perfectly legal for a front function not to be
 tagged @property.

 
 BTW, where is this coming from? Is it simply an emergent property
 of the existing implementations of isInputRange and ElementType,
 or is it actually by design?

 
 This is very bad. The range API does not mandate that .front must be
 a function. I often write ranges where .front is an actual struct
 variable that gets updated by .popFront.  Now you're saying that my
 range won't work with some code, because they call .front() (which
 is a compile error when .front is a variable, not a function)?

 
 My goodness no!
 
 People, please, my point is simply that is(typeof(someRange.front) ==
 ElementType!(typeof(someRange))) DOESN'T ALWAYS WORK.

OK, so the point is, use ElementType!(typeof(range)) instead of
typeof(range.front)? That works for me. Sorry for the noise. :-P


[...]
 In the old days (i.e., 1-2 years ago), isForwardRange!R will return
 false if .save is not marked @property. I thought isInputRange!R did
 the same for .front, or am I imagining things?  Did somebody change
 this recently?

 
 You are imagining that someInputRange.front ever required that. In
 fact, it would have had to go out of its way to do so (because
 isInputRange puts no requirements on the *type* of front, except that
 it returns a non-void value).
 
 But you are right that save did require @property at one time. Not (in
 my opinion) because it meant to, but because it happened to check the
 type of r.save against a type (namely, that .save returns its own
 type).

Ah, so that's where it came from. Now I remember that there were bugs
caused by .save returning something other than the original range type,
which broke certain algorithms. That's probably where the whole .save
requiring @property thing came from.


 At the same time, I fixed all the isXXXRange traits so @property is
 not required anywhere. In particular, isRandomAccessRange required
 r.front to be @property, even when isInputRange didn't (again, IMO
 unintentionally). Here is the PR:
 https://github.com/dlang/phobos/pull/3276

[...]

Thanks for the info!


T

-- 
"Maybe" is a strange word.  When mom or dad says it it means "yes", but when my
big brothers say it it means "no"! -- PJ jr.

May 19 2016

Jonathan M Davis via Digitalmars-d-announce writes:

On Tuesday, May 17, 2016 17:36:44 H. S. Teoh via Digitalmars-d-announce wrote:
 On Tue, May 17, 2016 at 08:19:48PM +0000, Vladimir Panteleev via 

Digitalmars-d-announce wrote:
 On Tuesday, 17 May 2016 at 17:26:59 UTC, Steven Schveighoffer wrote:
 However, it's perfectly legal for a front function not to be tagged
 @property.

 BTW, where is this coming from? Is it simply an emergent property of
 the existing implementations of isInputRange and ElementType, or is it
 actually by design?

 This is very bad. The range API does not mandate that .front must be a
 function. I often write ranges where .front is an actual struct variable
 that gets updated by .popFront.  Now you're saying that my range won't
 work with some code, because they call .front() (which is a compile
 error when .front is a variable, not a function)?

At this point, if anyone ever calls front with parens, they're doing it
wrong. The range API checks that accessing front as if it were a variable
works and _not_ whether calling it as a function works. So, front can be a
variable, an enum (though that wouldn't make much sense), a function, or a
property function. All of those are legal. And properly written range-based
code will work with all of them, because it won't use parens.
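
As a rough sketch of that point: three hypothetical ranges whose front is a variable, a plain function, and a property function, plus a made-up generic helper `firstOf` that only ever uses front without parens:

```d
import std.range.primitives : isInputRange;

// front as a member variable
struct FieldFront
{
    int front = 1;
    enum empty = false;
    void popFront() {}
}

// front as a plain function
struct FuncFront
{
    int front() { return 2; }
    enum empty = false;
    void popFront() {}
}

// front as a property function
struct PropFront
{
    @property int front() { return 3; }
    enum empty = false;
    void popFront() {}
}

// Generic code that follows the range API: no parens on front.
auto firstOf(R)(R r) if (isInputRange!R)
{
    return r.front;
}

static assert(isInputRange!FieldFront);
static assert(isInputRange!FuncFront);
static assert(isInputRange!PropFront);

// Calling front with parens does not compile when front is a variable.
static assert(!__traits(compiles, FieldFront.init.front()));

void main()
{
    assert(firstOf(FieldFront()) == 1);
    assert(firstOf(FuncFront()) == 2);
    assert(firstOf(PropFront()) == 3);
}
```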

The only reason to use parens on front would be if it returned a callable,
and since @property doesn't handle that correctly right now (the first set
of parens still calls the function, not what it returns), it really doesn't
work correctly to have a range of callables, at least not if you want to call
them without assigning them to a variable first.

 In the old days (i.e., 1-2 years ago), isForwardRange!R will return
 false if .save is not marked @property. I thought isInputRange!R did the
 same for .front, or am I imagining things?  Did somebody change this
 recently?

IIRC save stopped checking for that around DConf last year. Given that it's
perfectly legit to call normal functions without parens, and there is no
strong property enforcement, it makes no sense to require that save be a
property function. But I don't think that isInputRange ever checked that
front or empty were property functions (and if it did, I don't know that it
would have worked for them to ever be variables - that would depend on how
the check for whether it was a property function was done).

Regardless, anyone who ever calls front or empty with parens is writing bad
code that will not work generically, because that's not what the range API
dictates. That's one reason why I wish that we had strict property
enforcement, but there's no way that we're getting that at this point. So,
while we do have enforcement of how ranges _can_ be used, we don't have
enforcement of how they _are_ used, and I don't expect that we'll ever get
that.

- Jonathan M Davis

May 18 2016

jmh530 <john.michael.hall gmail.com> writes:

On Wednesday, 18 May 2016 at 20:10:09 UTC, Jonathan M Davis wrote:
 At this point, if anyone ever calls front with parens, they're 
 doing it wrong.

Is this true of all @property functions? Should this be noted in 
the spec? Should it be an error? If it shouldn't be an error, is 
it really such a bad thing?

May 18 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Wednesday, 18 May 2016 at 22:23:45 UTC, jmh530 wrote:
 Is this true of all @property functions?

No, this is purely a range thing where it's legal to have your 
front be a public member variable rather than a getter function.

 Should this be noted in the spec?

While somewhat supported in the language, at the end of the day 
ranges are library types, so no.

 Should it be an error?

No, people's code will fail to compile if they try to call a 
non-callable anyway.

May 18 2016

Jonathan M Davis via Digitalmars-d-announce writes:

On Wednesday, May 18, 2016 22:23:45 jmh530 via Digitalmars-d-announce wrote:
 On Wednesday, 18 May 2016 at 20:10:09 UTC, Jonathan M Davis wrote:
 At this point, if anyone ever calls front with parens, they're
 doing it wrong.

 Is this true of all @property functions? Should this be noted in
 the spec? Should it be an error? If it shouldn't be an error, is
 it really such a bad thing?

It makes _no_ sense to use parens on a typical @property function. The whole
point of properties is that they act like variables. If you're marking a
function with @property, you're clearly indicating that it's intended to be
treated as if it were a variable and not as a function. So, in principle, if
you're using parens on a property function, the parens should be used on the
return value and _not_ the function. That being said, we've never ended up
with property enforcement of any kind being added to the language.  So,
while the compiler _should_ require that an @property function be called
without parens, it doesn't. And without that requirement, @property really
doesn't do much. If we want properties to work where the type is a callable
like a delegate (e.g. you have a range of delegates, so front returns a
delegate), then it's going to need to change so that using parens on an
@property function actually applies the parens to the return value, and
without that @property is nothing more than documentation. So, right now,
@property is pretty much just documentation about how the person who wrote
the code expects you to use it. It does have some effect with regards to
typeof, but overall, it does nothing.

Now, that being said, when I was talking about calling front with parens, I
wasn't really talking about property functions - though front is frequently
a property function. Rather, my point was that isInputRange requires that
this code compile:

        R r = R.init;     // can define a range object
        if (r.empty) {}   // can test for empty
        r.popFront();     // can invoke popFront()
        auto h = r.front; // can get the front of the range

If that code compiles with a given type, then that type can be used as an
input range. That is the API defined for input ranges. It does _not_
call front with parens nor does it call empty with parens. Rather, it
explicitly uses them _without_ parens. So, they could be property functions,
or variables, or normal functions that just don't get called with parens, or
anything else that can be used without parens and compile with that code.
So, if you write a range-based algorithm that uses parens on empty or front,
then you're writing an algorithm that does not follow the range API and
which will not work with many ranges.  The range API does _not_ guarantee
that either front or empty can be used with parens. It guarantees that they
can be used _without_ them. So, if your code ever uses parens on front or
empty, then it's using the range API incorrectly and risks not compiling
with many ranges.
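
As a rough sketch of what that means in practice, here is a range whose front is a public variable. Countdown and sumAll are hypothetical names made up for illustration; only isInputRange and ElementType come from Phobos:

```d
import std.range.primitives : isInputRange, ElementType, empty, front, popFront;

// Hypothetical range: front is a public variable, while empty and popFront
// are ordinary member functions that are not marked @property.
struct Countdown
{
    int front;
    bool empty() const { return front <= 0; }
    void popFront() { --front; }
}

static assert(isInputRange!Countdown);
static assert(is(ElementType!Countdown == int));

// front is a variable here, so calling it with parens cannot compile.
static assert(!__traits(compiles, Countdown.init.front()));

// A generic algorithm that sticks to the paren-free forms.
ElementType!R sumAll(R)(R r)
if (isInputRange!R)
{
    ElementType!R total = 0;
    for (; !r.empty; r.popFront())
        total += r.front;
    return total;
}

void main()
{
    assert(sumAll(Countdown(3)) == 6);    // 3 + 2 + 1
    assert(sumAll([1, 2, 3, 4]) == 10);   // arrays use the Phobos free functions
}
```

Had sumAll written r.front() instead, it would still work for arrays but would be rejected for Countdown, even though Countdown satisfies isInputRange.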

- Jonathan M Davis

May 19 2016

jmh530 <john.michael.hall gmail.com> writes:

On Thursday, 19 May 2016 at 12:10:36 UTC, Jonathan M Davis wrote:
 [snip]

Very informative, as always.

I had not realized the implication of front being called without 
parens, such as front being something that isn't an @property 
function (esp. variable). Are there any ranges in phobos (you can 
think of) that do this?

May 19 2016

Jonathan M Davis via Digitalmars-d-announce writes:

On Thursday, May 19, 2016 13:11:47 jmh530 via Digitalmars-d-announce wrote:
 On Thursday, 19 May 2016 at 12:10:36 UTC, Jonathan M Davis wrote:
 [snip]

 Very informative, as always.

 I had not realized the implication of front being called without
 parens, such as front being something that isn't an @property
 function (esp. variable). Are there any ranges in phobos (you can
 think of) that do this?

I'm not aware of any, but there might be some. I think that most of us
pretty much always use @property functions, though the usual reasons for
that don't really apply to Voldemort types. The usual reason to use an
@property function instead of a public variable is that while property
functions emulate variables, they really aren't the same (e.g. taking its
address doesn't have the same semantics, and stuff like incrementing doesn't
normally work with property functions). So, unfortunately, you can't
transparently swap between @property functions and variables, even though
that's theoretically one of the reasons that property functions exist in a
language. But when using a range that's a Voldemort type, you don't even
ever see the type's declaration, and you're only ever supposed to use the
range API on it, so the semantics of variable vs function don't really
matter, since you're not supposed to be doing any of the stuff where it
would matter (e.g. it makes no sense to take the address of front, because
what that means is not specified by the range API and will do different
things with different range implementations).
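
A small sketch of the two differences mentioned above (taking the address and incrementing). WithField and WithProperty are hypothetical types invented for this example, and the property is deliberately read-only to keep it short:

```d
struct WithField
{
    int value;                                       // plain public variable
}

struct WithProperty
{
    private int _value;
    @property int value() const { return _value; }   // read-only property
}

void main()
{
    // Incrementing works on the variable but not through the property
    // function, because the getter returns a temporary, not an lvalue.
    static assert( __traits(compiles, { WithField f;    ++f.value; }));
    static assert(!__traits(compiles, { WithProperty p; ++p.value; }));

    WithField f;
    WithProperty p;

    int* ptr = &f.value;   // address of the variable itself
    auto dg  = &p.value;   // address of the function: a delegate

    static assert(is(typeof(ptr) == int*));
    static assert(is(typeof(dg) == delegate));

    *ptr = 5;
    assert(f.value == 5);
    assert(dg() == 0);     // calls the getter; _value is still int.init
}
```

Code that only ever reads value through the paren-free form cannot tell the two types apart, which is the situation generic range code is supposed to be in.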

So, it's arguably better to just use public variables with ranges if
functions like front or empty are just going to return a value, but many of
us just use @property functions out of habit given that in the general
case, it's problematic to use public variables instead of @property
functions (though it _would_ be a nice language enhancement IMHO if
using @property on a variable made it illegal to do anything on it that you
couldn't do on an @property function, since then you could make it a public
variable until refactoring required that it become a function, and changing
it wouldn't break code, whereas now, it might).

- Jonathan M Davis

May 20 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Wednesday, 18 May 2016 at 20:10:09 UTC, Jonathan M Davis wrote:
 At this point, if anyone ever calls front with parens, they're 
 doing it wrong.

$ cd ~/dlang/phobos && grep -r "\.front()" * | wc -l
3

Not bad. One is commented out and the other two look intentional.

May 18 2016

Kagamin <spam here.lot> writes:

On Wednesday, 18 May 2016 at 20:10:09 UTC, Jonathan M Davis wrote:
 So, while we do have enforcement of how ranges _can_ be used, 
 we don't have enforcement of how they _are_ used, and I don't 
 expect that we'll ever get that.

It would help if there were a documented standard testing procedure 
(and it were used for all algorithms).

May 19 2016

Jonathan M Davis via Digitalmars-d-announce writes:

On Thursday, May 19, 2016 09:05:53 Kagamin via Digitalmars-d-announce wrote:
 On Wednesday, 18 May 2016 at 20:10:09 UTC, Jonathan M Davis wrote:
 So, while we do have enforcement of how ranges _can_ be used,
 we don't have enforcement of how they _are_ used, and I don't
 expect that we'll ever get that.

 It would help if there were a documented standard testing procedure
 (and it were used for all algorithms).

We really need solid tools for testing an algorithm with a variety of ranges
to verify that it's doing the right thing as well as tools to test that a
range behaves how ranges are supposed to behave. The closest that we have to
that is that std.range has some internal helpers for testing ranges, but
they're not that great, and I don't think that any of it's public. And we
don't have anything for testing that a range acts correctly - just that it
follows the right API syntactically with the minimal semantic checking that
can be done with typeof. So, there's work to be done. I'd started some of it
a while back, but I never got very far, and no one else has done anything
like it AFAIK.

- Jonathan M Davis

May 19 2016

"H. S. Teoh via Digitalmars-d-announce" writes:

On Tue, May 17, 2016 at 02:06:37PM +0000, Jack Stouffer via
Digitalmars-d-announce wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

[...]

Thanks for writing up this article!


T

-- 
What did the alien say to Schubert? "Take me to your lieder."

May 17 2016

Taylor Hillegeist <taylorh140 gmail.com> writes:

On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 Based on the recent thread in General, I wrote this blog post 
 that's designed to be part beginner tutorial, part objective 
 record of the debate over it, and finally my opinions on the 
 matter.

 When I first learned about auto-decoding, I was kinda miffed 
 that there wasn't anything in the docs or on the website that 
 went into more detail. So I wrote this in order to introduce 
 people who are getting into D to the concept, its benefits, 
 and downsides. When people are confused in Learn why 
 typeof(s.front) == dchar then this can just be linked to them.

 If you think there should be any more information included in 
 the article, please let me know so I can add it.

I ran into an auto decoding problem earlier. Honestly I was 
upset, and I think I was rightly upset. The programming language 
moved my cheese without telling me. I tend to believe in the 
route of least surprise. If as a newbie I am doing something 
stupid and find out I was wrong, that is one thing. But if I 
continue to do something wrong and find out that the programming 
language thinks I am stupid, that's another thing.

If people want auto coding behavior shouldn't they just use or 
convert to dchar?

May 19 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Thursday, 19 May 2016 at 17:16:54 UTC, Taylor Hillegeist wrote:
 If people want auto coding behavior shouldn't they just use or 
 convert to dchar?

No, they need to _decode_ to dchar. I don't know if you 
mistyped, but if you didn't, this is exactly what I was talking 
about in the article:

 "Unicode is hard. Trying to hide Unicode specifics helps no one 
 because it's going to bite you in the ass eventually."

May 19 2016

John Carter <john.carter taitradio.com> writes:

On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 http://jackstouffer.com/blog/d_auto_decoding_and_you.html

 There are lots of places where invalid Unicode is either 
 commonplace or legal, e.g. Linux file names, and therefore 
 auto decoding cannot be used. It turns out in the wild that 
 pure Unicode is not universal - there's lots of dirty Unicode 
 that should remain unmolested because it's user data, and auto 
 decoding does not play well with that mentality.


As a slightly tangential aside.....

https://lwn.net/Articles/686392/

There exists a proposal for a linux kernel module to render the 
creation of such names impossible.....

I for one will install it on all my systems as soon as I can.

However, until then, my day job requires me to find, scan and 
analyze and work with whatever crud, the herd of cats I work 
with, throws into the repo.

And no, sadly I can't just rewrite everything because they (or 
some tool they use) don't understand UTF-8.

May 19 2016

Martin Nowak <code dawg.eu> writes:

On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:

Related discussion 
https://trello.com/c/4XmFdcp6/163-rediscuss-redundant-utf-8-string-validation.

May 20 2016

jmh530 <john.michael.hall gmail.com> writes:

On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 If you think there should be any more information included in 
 the article, please let me know so I can add it.

I was a little confused by something in the main autodecoding 
thread, so I read your article again. Unfortunately, I don't 
think my confusion is resolved. I was trying one of your examples 
(full code I used below). You claim it works, but I keep getting 
assertion failures. I'm just running it with rdmd on Windows 7.


import std.algorithm : canFind;

void main()
{
	string s = "cassé";

	assert(s.canFind!(x => x == 'é'));
}

Jun 02 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 6/2/16 5:21 PM, jmh530 wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 If you think there should be any more information included in the
 article, please let me know so I can add it.

 I was a little confused by something in the main autodecoding thread, so
 I read your article again. Unfortunately, I don't think my confusion is
 resolved. I was trying one of your examples (full code I used below).
 You claim it works, but I keep getting assertion failures. I'm just
 running it with rdmd on Windows 7.


 import std.algorithm : canFind;

 void main()
 {
     string s = "cassé";

     assert(s.canFind!(x => x == 'é'));
 }

If that é above is an e followed by a combining character, then you 
will get the error. This is because autodecoding does not auto normalize 
as well -- the code points have to match exactly.
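
A minimal sketch of the difference, using escapes so that no editor, mail client, or browser can change the form on the way. The variable names are made up for this example; normalize and NFC are from std.uni:

```d
import std.algorithm.searching : canFind;
import std.range.primitives : walkLength;
import std.uni : normalize, NFC;

void main()
{
    string precomposed = "cass\u00E9";    // é as the single code point U+00E9
    string decomposed  = "casse\u0301";   // e followed by U+0301 (combining acute)

    // Auto decoding iterates code points, so the two strings differ in length.
    assert(precomposed.walkLength == 5);
    assert(decomposed.walkLength == 6);

    // The predicate compares code points exactly; nothing is normalized.
    assert( precomposed.canFind!(x => x == '\u00E9'));
    assert(!decomposed.canFind!(x => x == '\u00E9'));

    // Normalizing to NFC first composes the pair back into U+00E9.
    assert(normalize!NFC(decomposed).canFind!(x => x == '\u00E9'));
}
```

Both strings render identically, so the only reliable way to know which one a source file contains is to inspect the code points or bytes.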

-Steve

Jun 02 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 6/2/16 5:27 PM, Steven Schveighoffer wrote:
 On 6/2/16 5:21 PM, jmh530 wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 If you think there should be any more information included in the
 article, please let me know so I can add it.

 I was a little confused by something in the main autodecoding thread, so
 I read your article again. Unfortunately, I don't think my confusion is
 resolved. I was trying one of your examples (full code I used below).
 You claim it works, but I keep getting assertion failures. I'm just
 running it with rdmd on Windows 7.


 import std.algorithm : canFind;

 void main()
 {
     string s = "cassé";

     assert(s.canFind!(x => x == 'é'));
 }

 If that é above is an e followed by a combining character, then you will
 get the error. This is because autodecoding does not auto normalize as
 well -- the code points have to match exactly.

 -Steve

Indeed. FWIW I just copied OP's code from Thunderbird into Chrome (on 
OSX) and it worked: https://dpaste.dzfl.pl/09b9188d87a5

Should I assume some normalization occurred on the way?


Andrei

Jun 02 2016

jmh530 <john.michael.hall gmail.com> writes:

On Thursday, 2 June 2016 at 21:33:02 UTC, Andrei Alexandrescu 
wrote:
 Should I assume some normalization occurred on the way?

I'm just looking over std.uni's section on normalization and 
realizing that I had basically no idea what it is or what's going 
on. The wikipedia page on unicode equivalence is a bit clearer.

I'm definitely nowhere near qualified to have an opinion on these 
issues.

Jun 02 2016

Rory McGuire via Digitalmars-d-announce writes:

On Fri, Jun 3, 2016 at 5:16 AM, jmh530 via Digitalmars-d-announce
<digitalmars-d-announce puremagic.com> wrote:
 On Thursday, 2 June 2016 at 21:33:02 UTC, Andrei Alexandrescu wrote:
 Should I assume some normalization occurred on the way?

 I'm just looking over std.uni's section on normalization and realizing that
 I had basically no idea what it is or what's going on. The wikipedia page on
 unicode equivalence is a bit clearer.

 I'm definitely nowhere near qualified to have an opinion on these issues.


This dpaste shows a couple of issues with combining chars in D.

https://dpaste.dzfl.pl/4b006959c5c0

The compiler actually can't handle a combining character literal
either. see line 10.

R

Jun 02 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 3 June 2016 at 06:37:59 UTC, Rory McGuire wrote:
 This dpaste shows a couple of issues with combining chars in D.

 https://dpaste.dzfl.pl/4b006959c5c0

 The compiler actually can't handle a combining character 
 literal either. see line 10.

Your paste behaves as expected: the "character" types in D are 
defined as single Unicode code units. By definition, the NFD form 
of "é" is not a single code unit. You would need to use a 
Grapheme or [w|d]string for that.

(Of course, one might reasonably question how useful our built-in 
character types actually are compared to ubyte/ushort/uint.)
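
For anyone following along, a short sketch of the three levels involved, assuming the decomposed form is written with an escape. The variable names are invented for this example; byGrapheme is from std.uni:

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;

void main()
{
    string decomposed = "e\u0301";   // NFD form of é: two code points

    assert(decomposed.length == 3);                 // UTF-8 code units (1 + 2)
    assert(decomposed.walkLength == 2);             // code points via auto decoding
    assert(decomposed.byGrapheme.walkLength == 1);  // one user-perceived character

    // A Grapheme can hold the whole cluster, which no single dchar can.
    auto g = decomposed.byGrapheme.front;
    assert(g.length == 2);
    assert(g[0] == 'e' && g[1] == '\u0301');
}
```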

Jun 02 2016

Rory McGuire via Digitalmars-d-announce writes:

On Fri, Jun 3, 2016 at 8:58 AM, tsbockman via Digitalmars-d-announce
<digitalmars-d-announce puremagic.com> wrote:
 On Friday, 3 June 2016 at 06:37:59 UTC, Rory McGuire wrote:
 This dpaste shows a couple of issues with combining chars in D.

 https://dpaste.dzfl.pl/4b006959c5c0

 The compiler actually can't handle a combining character literal either.
 see line 10.


 Your paste behaves as expected: the "character" types in D are defined as
 single Unicode code units. By definition, the NFD form of "é" is not a
 single code unit. You would need to use a Grapheme or [w|d]string for that.

 (Of course, one might reasonably question how useful our built-in character
 types actually are compared to ubyte/ushort/uint.)

Hmm, perhaps it behaves as documented; however, I'm not certain that
it's expected :).

Jun 03 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 3 June 2016 at 03:16:33 UTC, jmh530 wrote:
 I'm just looking over std.uni's section on normalization and 
 realizing that I had basically no idea what it is or what's 
 going on. The wikipedia page on unicode equivalence is a bit 
 clearer.

This might help a bit, as well:
     https://dpaste.dzfl.pl/2ffb22b02842

Jun 02 2016

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 6/2/16 5:33 PM, Andrei Alexandrescu wrote:
 On 6/2/16 5:27 PM, Steven Schveighoffer wrote:
 On 6/2/16 5:21 PM, jmh530 wrote:
 On Tuesday, 17 May 2016 at 14:06:37 UTC, Jack Stouffer wrote:
 If you think there should be any more information included in the
 article, please let me know so I can add it.

 I was a little confused by something in the main autodecoding thread, so
 I read your article again. Unfortunately, I don't think my confusion is
 resolved. I was trying one of your examples (full code I used below).
 You claim it works, but I keep getting assertion failures. I'm just
 running it with rdmd on Windows 7.


 import std.algorithm : canFind;

 void main()
 {
     string s = "cassé";

     assert(s.canFind!(x => x == 'é'));
 }

 If that é above is an e followed by a combining character, then you will
 get the error. This is because autodecoding does not auto normalize as
 well -- the code points have to match exactly.

 Indeed. FWIW I just copied OP's code from Thunderbird into Chrome (on
 OSX) and it worked: https://dpaste.dzfl.pl/09b9188d87a5

 Should I assume some normalization occurred on the way?

I think it depends on what your browser presents. But impossible to tell 
without being on the OP's machine to see what it's actually stored as. 
Thunderbird may have normalized as well!

-Steve

Jun 03 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Thursday, 2 June 2016 at 21:21:50 UTC, jmh530 wrote:
 I was a little confused by something in the main autodecoding 
 thread, so I read your article again. Unfortunately, I don't 
 think my confusion is resolved. I was trying one of your 
 examples (full code I used below). You claim it works, but I 
 keep getting assertion failures. I'm just running it with rdmd 
 on Windows 7.


 import std.algorithm : canFind;

 void main()
 {
 	string s = "cassé";

 	assert(s.canFind!(x => x == 'é'));
 }

Your browser is turning the é in the string into two code points 
via normalization whereas it should be one. Try using \u00E9 
instead.

Jun 02 2016

jmh530 <john.michael.hall gmail.com> writes:

On Thursday, 2 June 2016 at 21:31:39 UTC, Jack Stouffer wrote:
 On Thursday, 2 June 2016 at 21:21:50 UTC, jmh530 wrote:
 I was a little confused by something in the main autodecoding 
 thread, so I read your article again. Unfortunately, I don't 
 think my confusion is resolved. I was trying one of your 
 examples (full code I used below). You claim it works, but I 
 keep getting assertion failures. I'm just running it with rdmd 
 on Windows 7.


 import std.algorithm : canFind;

 void main()
 {
 	string s = "cassé";

 	assert(s.canFind!(x => x == 'é'));
 }

 Your browser is turning the é in the string into two code 
 points via normalization whereas it should be one. Try using 
 \u00E9 instead.

That doesn't cause an assert to fail, but when I do  
writeln('\u00E9') I get ├⌐. So there might still be something 
wonky going on. I looked up \u00E9 online and I don't think 
there's an error with that.
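
One possible explanation, offered as a guess since the console settings aren't stated here: writeln emits UTF-8, and a Windows console using an OEM code page such as 437 would show the two bytes of é as two separate glyphs. A small sketch that checks the bytes (the variable names are made up for this example):

```d
import std.string : representation;

void main()
{
    string s = "\u00E9";                  // é as one code point
    immutable(ubyte)[] bytes = s.representation;

    assert(bytes.length == 2);            // two UTF-8 code units
    assert(bytes[0] == 0xC3);             // rendered as ├ in code page 437
    assert(bytes[1] == 0xA9);             // rendered as ⌐ in code page 437
}
```

If that is the cause, the string itself is fine and only the display is off; switching the console to a UTF-8 code page would be the thing to try.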

Jun 02 2016
