digitalmars.D - Should this work?

reply Manu <turkeyman gmail.com> writes:
This works fine:
  string x = find("Hello", 'H');

This doesn't:
  string y = find(retro("Hello"), 'H');
  > Error: cannot implicitly convert expression (find(retro("Hello"), 'H'))
of type Result!() to string

Is that wrong? That seems to be how the docs suggest it should be used.

On a side note, am I the only one that finds std.algorithm/std.range/etc
for string processing really obtuse?
I can rarely understand the error messages, so to say it's better than STL is
optimistic.
Using std.algorithm and std.range to do string manipulation feels really
lame to me.
I hate looking through the docs of 3-4 modules to understand the complete
set of useful string operations (std.string, std.uni, std.algorithm,
std.range... at least).
I also find the names of the generic algorithms are often unrelated to the
name of the string operation.
My feeling is, everyone is always on about how cool D is at strings, but
other than 'char[]' and the builtin slice operator, I feel really
unproductive whenever I do any heavy string manipulation in D.
I also hate that I need to import at least 4-5 modules to do anything
useful with strings... I feel my program bloating and cringe with every
gigantic import that sources exactly one symbol.
Jan 09 2014
next sibling parent "Tobias Pankrath" <tobias pankrath.net> writes:
On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
 This works fine:
   string x = find("Hello", 'H');

 This doesn't:
   string y = find(retro("Hello"), 'H');
   > Error: cannot implicitly convert expression 
 (find(retro("Hello"), 'H'))
 of type Result!() to string

 Is that wrong? That seems to be how the docs suggest it should 
 be used.

 On a side note, am I the only one that finds 
 std.algorithm/std.range/etc
 for string processing really obtuse?
 I can rarely understand the error messages, so say it's better 
 than STL is
 optimistic.
 Using std.algorithm and std.range to do string manipulation 
 feels really
 lame to me.
 I hate looking through the docs of 3-4 modules to understand 
 the complete
 set of useful string operations (std.string, std.uni, 
 std.algorithm,
 std.range... at least).
 I also find the names of the generic algorithms are often 
 unrelated to the
 name of the string operation.
 My feeling is, everyone is always on about how cool D is at 
 string, but
 other than 'char[]', and the builtin slice operator, I feel 
 really
 unproductive whenever I do any heavy string manipulation in D.
 I also hate that I need to import at least 4-5 modules to do 
 anything
 useful with strings... I feel my program bloating and cringe 
 with every
 gigantic import that sources exactly one symbol.
std.algorithm.find returns the type it gets as input, so the result has retro's return type and not string. I agree that it isn't always obvious which types are expected or returned in std.algorithm, and especially std.container.
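A minimal sketch of that point, assuming a current Phobos where the range returned by retro exposes its wrapped string as `.source`. The helper name `prefixThroughLast` is hypothetical and not part of Phobos; it only shows that find hands back the same lazy range type it was given.

```d
import std.algorithm : find;
import std.range : retro;

// Hypothetical helper (not a Phobos function): returns the part of `s`
// up to and including the last occurrence of `c`, or an empty slice
// if `c` does not occur.
string prefixThroughLast(string s, dchar c)
{
    // find() returns the same lazy range type that retro() produced,
    // which is why the result cannot be assigned to a string directly.
    auto r = find(retro(s), c);
    static assert(is(typeof(r) == typeof(retro(s))));

    // .source exposes the underlying string that the reversed view wraps.
    return r.source;
}

unittest
{
    assert(prefixThroughLast("Hello", 'l') == "Hell");
    assert(prefixThroughLast("Hello", 'H') == "H");
    assert(prefixThroughLast("Hello", 'z') == "");
}
```

Compile with `-unittest -main` to run the checks. No allocation happens here: the returned string is a slice of the input.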
Jan 09 2014
prev sibling next sibling parent reply "Tobias Pankrath" <tobias pankrath.net> writes:
On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
 Is that wrong? That seems to be how the docs suggest it should 
 be used.
--
string s = find(retro("Hello"), "H").source;
--
Is that working?
Jan 09 2014
parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 00:19, Tobias Pankrath <tobias pankrath.net> wrote:

 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:

 Is that wrong? That seems to be how the docs suggest it should be used.
-- string s = find(retro("Hello"), "H").source; -- Is that working?
If I have to type that, I'm going to write my own string library... There's no argument where that can be considered superior to: strrchr("Hello", 'H');
Jan 09 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 10 Jan 2014 00:33:28 +1000
schrieb Manu <turkeyman gmail.com>:

 On 10 January 2014 00:19, Tobias Pankrath <tobias pankrath.net> wrote:
 
 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:

 Is that wrong? That seems to be how the docs suggest it should be used.
-- string s = find(retro("Hello"), "H").source; -- Is that working?
If I have to type that, I'm going to write my own string library... There's no argument where that can be considered superior to: strrchr("Hello", 'H');
If you do, let me know; we can merge the efforts. Coincidentally, what I started uses your std.simd:

http://code.dlang.org/packages/fast
https://github.com/mleise/fast

I haven't pushed the latest changes, which include updates to the latest D versions and switching between lookup tables and SSE3 for char-in-string search. The idea is to build a collection of the fastest versions of basic utility functions. No safety nets, no garbage collection.

-- 
Marco
Jan 09 2014
parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 01:04, Marco Leise <Marco.Leise gmx.de> wrote:

 Am Fri, 10 Jan 2014 00:33:28 +1000
 schrieb Manu <turkeyman gmail.com>:

 On 10 January 2014 00:19, Tobias Pankrath <tobias pankrath.net> wrote:

 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:

 Is that wrong? That seems to be how the docs suggest it should be
used.

 --
 string s = find(retro("Hello"), "H").source;
 --
 Is that working?
If I have to type that, I'm going to write my own string library... There's no argument where that can be considered superior to: strrchr("Hello", 'H');
If you do let me know, we can merge the efforts. Coincidentally what I started uses your std.simd: http://code.dlang.org/packages/fast https://github.com/mleise/fast I haven't pushed the latest changes which include updates to the latest D versions and switching between lookup tables and SSE3 for char in string search. The idea is to build a collection of the fastest versions of basic utility functions. No safety nets, no garbage collection.
Awesome! Although it looks like you still have a lot of work ahead of you :)
Jan 09 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 10 Jan 2014 01:20:26 +1000
schrieb Manu <turkeyman gmail.com>:

 Awesome! Although it looks like you still have a lot of work ahead of you :)
So... when was std.simd going to be in Phobos again? :p -- Marco
Jan 09 2014
parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 01:56, Marco Leise <Marco.Leise gmx.de> wrote:

 Am Fri, 10 Jan 2014 01:20:26 +1000
 schrieb Manu <turkeyman gmail.com>:

 Awesome! Although it looks like you still have a lot of work ahead of
you :) So... when was std.simd going to be in Phobos again? :p
When there are a zillion unit tests >_< And I kinda wanna prove it is efficient on other architectures before it is committed to the stone tablet that is phobos; that can never be changed once committed.
Jan 09 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 10 Jan 2014 02:21:35 +1000
schrieb Manu <turkeyman gmail.com>:

 On 10 January 2014 01:56, Marco Leise <Marco.Leise gmx.de> wrote:
 
 Am Fri, 10 Jan 2014 01:20:26 +1000
 schrieb Manu <turkeyman gmail.com>:

 Awesome! Although it looks like you still have a lot of work ahead of
you :) So... when was std.simd going to be in Phobos again? :p
When there are a zillion unit tests >_< And I kinda wanna prove it is efficient on other architectures before it is committed to the stone tablet that is phobos; that can never be changed once committed.
I think Phobos should follow OpenGL in this regard and use a prefix like `etc` for useful but not finalized modules, so early adopters can try out new modules, compare them with any existing API in Phobos where applicable (e.g. streams, json, ...) and report any issues. I have a feeling that right now most modules are tested by 2 people prior to the merge, because they spent a life in obscurity.

-- 
Marco
Jan 09 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 2014-01-09 17:35, Marco Leise wrote:

 I Phobos should follow OpenGL in this regard and use a
 prefix like `etc` for useful but not finalized modules, so
 early adapters can try out new modules compare them with any
 existing API in Phobos where applicable (e.g. streams,
 json, ...) and report any issues. I have a feeling that right
 now most modules are tested by 2 people prior to the merge,
 because they spent a life in obscurity.
That has been suggested before, and the counter argument is that people will start using it and complain when it's changed, even if it's in an experimental package. Someone here said that the javax packages were originally experimental packages, so they continued to live in the javax namespace to avoid breaking changes.

-- 
/Jacob Carlborg
Jan 09 2014
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jan 09, 2014 at 09:19:40PM +0100, Jacob Carlborg wrote:
 On 2014-01-09 17:35, Marco Leise wrote:
 
I Phobos should follow OpenGL in this regard and use a
prefix like `etc` for useful but not finalized modules, so
early adapters can try out new modules compare them with any
existing API in Phobos where applicable (e.g. streams,
json, ...) and report any issues. I have a feeling that right
now most modules are tested by 2 people prior to the merge,
because they spent a life in obscurity.
That has been suggested before and the counter argument is that people will start using and complain when it's changed, even if it's in an experimental. Someone here said that the javax. packages originally was experimental packages to they continued to live in the javax namespace to avoid breaking changes.
[...]

Maybe instead of calling it 'etc' we should outright call it 'experimental'. If you have code like:

	import experimental.myawesomemodule;
	...

I doubt you'd object very much when you have to rename it to:

	import std.myawesomemodule;
	...

since the word 'experimental' staring you in the face every time you open up the file will be a constant nagging reminder that you're depending on something unstable, giving you motivation to want to move it to something stable as soon as you can.

T

-- 
"I speak better English than this villain Bush" -- Mohammed Saeed al-Sahaf, Iraqi Minister of Information
Jan 09 2014
parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 9 January 2014 at 20:40:30 UTC, H. S. Teoh wrote:
 On Thu, Jan 09, 2014 at 09:19:40PM +0100, Jacob Carlborg wrote:
 On 2014-01-09 17:35, Marco Leise wrote:
 
I Phobos should follow OpenGL in this regard and use a
prefix like `etc` for useful but not finalized modules, so
early adapters can try out new modules compare them with any
existing API in Phobos where applicable (e.g. streams,
json, ...) and report any issues. I have a feeling that right
now most modules are tested by 2 people prior to the merge,
because they spent a life in obscurity.
That has been suggested before and the counter argument is that people will start using and complain when it's changed, even if it's in an experimental. Someone here said that the javax. packages originally was experimental packages to they continued to live in the javax namespace to avoid breaking changes.
[...] Maybe instead of calling it 'etc' we should outright call it 'experimental'. If you have code like: import experimental.myawesomemodule; ... I doubt you'd object very much when you have to rename it to: import std.myawesomemodule; ... since the word 'experimental' staring you in the face every time you open up the file will be a constant nagging reminder that you're depending on something unstable, giving you motivation to want to move it to something stable as soon as you can. T
I was of the opinion that phobos needed an experimental section for getting real world testing of proposed modules but these days I think we should just stick things up on dub (including modules proposed for inclusion in phobos).
Jan 09 2014
parent Marco Leise <Marco.Leise gmx.de> writes:
Am Thu, 09 Jan 2014 23:32:37 +0000
schrieb "Brad Anderson" <eco gnuk.net>:

 On Thursday, 9 January 2014 at 20:40:30 UTC, H. S. Teoh wrote:
 On Thu, Jan 09, 2014 at 09:19:40PM +0100, Jacob Carlborg wrote:
 On 2014-01-09 17:35, Marco Leise wrote:
 
I Phobos should follow OpenGL in this regard and use a
prefix like `etc` for useful but not finalized modules, so
early adapters can try out new modules compare them with any
existing API in Phobos where applicable (e.g. streams,
json, ...) and report any issues. I have a feeling that right
now most modules are tested by 2 people prior to the merge,
because they spent a life in obscurity.
That has been suggested before and the counter argument is that people will start using and complain when it's changed, even if it's in an experimental. Someone here said that the javax. packages originally was experimental packages to they continued to live in the javax namespace to avoid breaking changes.
[...] Maybe instead of calling it 'etc' we should outright call it 'experimental'. If you have code like: import experimental.myawesomemodule; ... I doubt you'd object very much when you have to rename it to: import std.myawesomemodule; ... since the word 'experimental' staring you in the face every time you open up the file will be a constant nagging reminder that you're depending on something unstable, giving you motivation to want to move it to something stable as soon as you can. T
I was of the opinion that phobos needed an experimental section for getting real world testing of proposed modules but these days I think we should just stick things up on dub (including modules proposed for inclusion in phobos).
Dub is a nice extension to D, but it falls way too short of what I expect package management to do on Linux:

o system wide installation of shared libraries
o keep one library per ABI, where ABI is a cross of: (x86,amd64) x (dmd,ldc,gdc) x (compiler version)
o therefore accept custom library installation paths
o remove packages that weren't explicitly requested and are not a dependency of something else

Actually I don't expect dub to do all that. It duplicates parts of the existing package manager, which has to be used to seamlessly integrate with the rest of the system anyway. Dub does make building foreign packages a snap and that's what it is great for, but soon people will expect complete applications with their dependencies to be in the package list. At least as soon as someone writes something _popular_ in D that uses more than just DMD. (This excludes vibe.d for example, which is - I think - the most notable D product outside of this community.)

-- 
Marco
Jan 10 2014
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 06:19, Jacob Carlborg <doob me.com> wrote:

 On 2014-01-09 17:35, Marco Leise wrote:

  I Phobos should follow OpenGL in this regard and use a
 prefix like `etc` for useful but not finalized modules, so
 early adapters can try out new modules compare them with any
 existing API in Phobos where applicable (e.g. streams,
 json, ...) and report any issues. I have a feeling that right
 now most modules are tested by 2 people prior to the merge,
 because they spent a life in obscurity.
That has been suggested before and the counter argument is that people will start using and complain when it's changed, even if it's in an experimental.
I've heard that, and I think that's a lame argument. Would people rather break the code of people *who deliberately chose to use a beta feature, and accepted the contract while doing so (that it would later be moved to 'std' proper)*, or consistently produce features that have very little proven foundation in practical application? It takes a year (or years) before enough people can have had a crack at a new API in enough scenarios to reveal where it went right, and where it went wrong. In the case of std.simd, I'm not ever going to consider presenting it for inclusion until such a time as I'm absolutely happy with it (although in this case, it's also just not finished ;), and since it's not readily available, that really just relies on my using it in enough of my own projects that I manage to satisfy myself... it makes no sense.

 Someone here said that the javax. packages originally was experimental
 packages to they continued to live in the javax namespace to avoid breaking
 changes.

 --
 /Jacob Carlborg
Jan 09 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 2014-01-10 01:57, Manu wrote:

 I've heard that, and I think that's a lame argument. Would people rather
 break peoples code *who deliberately chose to use a beta feature, and
 accept the contract while doing so (that it would later be moved to
 'std' proper)*, or consistently produce features that have very little
 proven foundation in practical application? It takes year(/s) before
 enough people can have had a crack at a new API in enough scenarios to
 reveal where it went right, and where it went wrong.
I think it's a good idea, others don't. -- /Jacob Carlborg
Jan 09 2014
parent Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 10 Jan 2014 08:42:14 +0100
schrieb Jacob Carlborg <doob me.com>:

 On 2014-01-10 01:57, Manu wrote:
 
 I've heard that, and I think that's a lame argument. Would people rather
 break peoples code *who deliberately chose to use a beta feature, and
 accept the contract while doing so (that it would later be moved to
 'std' proper)*, or consistently produce features that have very little
 proven foundation in practical application? It takes year(/s) before
 enough people can have had a crack at a new API in enough scenarios to
 reveal where it went right, and where it went wrong.
I think it's a good idea, others don't.
When do we have a meeting of the elders to decide on this matter? -- Marco
Jan 10 2014
prev sibling next sibling parent reply Benjamin Thaut <code benjamin-thaut.de> writes:
Am 09.01.2014 15:07, schrieb Manu:
 On a side note, am I the only one that finds std.algorithm/std.range/etc
 for string processing really obtuse?
 I can rarely understand the error messages, so say it's better than STL
 is optimistic.
 Using std.algorithm and std.range to do string manipulation feels really
 lame to me.
 I hate looking through the docs of 3-4 modules to understand the
 complete set of useful string operations (std.string, std.uni,
 std.algorithm, std.range... at least).
 I also find the names of the generic algorithms are often unrelated to
 the name of the string operation.
 My feeling is, everyone is always on about how cool D is at string, but
 other than 'char[]', and the builtin slice operator, I feel really
 unproductive whenever I do any heavy string manipulation in D.
 I also hate that I need to import at least 4-5 modules to do anything
 useful with strings... I feel my program bloating and cringe with every
 gigantic import that sources exactly one symbol.
named in a way that actually helps you understand what they do. The best example in D is the deprecation of indexOf. Now you have to call countUntil. But if I have to choose between the two names, indexOf actually tells me what it does, while countUntil does not. Count until what? The confusion mostly comes from the condition, which is a template argument with a default value. Not to speak of the issues with UTF-8 characters, where countUntil does not give you an index into the array, but the index of the character it found. So you can't use whatever comes out of countUntil for slicing.

-- 
Kind Regards
Benjamin Thaut
Jan 09 2014
next sibling parent "Kira Backes" <kira.backes nrwsoft.de> writes:
  On Thursday, 9 January 2014 at 14:25:20 UTC, Benjamin Thaut 
wrote:
 The best example in D is the deprection of indexOf. Now you 
 have to call countUntil. But if I have to choose between the 
 two names, indexOf actually tells me what it does, while 
 countUntil does not. count until what?
std.algorithm.indexOf was deprecated, not std.string.indexOf, so you can still use it of course and it still gives you the byte (array-access) index of the supplied parameter. And countUntil counts elements until it finds the supplied parameter. I think this is logical and useful and easy to understand.
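A small sketch of the difference, assuming a current Phobos in which strings are still auto-decoded by the range algorithms. The sample string is made up for illustration; it contains one two-byte UTF-8 character so the two results diverge.

```d
import std.algorithm : countUntil;
import std.string : indexOf;

unittest
{
    // "h\u00E9llo" is "héllo"; the é occupies two UTF-8 code units.
    string s = "h\u00E9llo";

    // std.string.indexOf reports a code-unit offset, usable for slicing.
    auto byteIndex = s.indexOf('l');
    assert(byteIndex == 3);
    assert(s[byteIndex .. $] == "llo");

    // countUntil counts decoded elements (code points) before the match,
    // so its result is not a valid slicing offset for this string.
    auto elemCount = s.countUntil('l');
    assert(elemCount == 2);
}
```

For pure ASCII input the two numbers coincide, which is probably why the distinction is easy to miss.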
Jan 09 2014
prev sibling parent reply "Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
On Thursday, 9 January 2014 at 14:25:20 UTC, Benjamin Thaut wrote:

 that are named in a way that actually helps you understand what 
 they do.
Interesting, I've had the opposite experience. I keep trying to course ever more desired. if(string.IsNullOrEmpty(str)) vs if(str.empty) keeps throwing me off.
Jan 09 2014
parent Jacob Carlborg <doob me.com> writes:
On 2014-01-10 02:04, Jesse Phillips wrote:

 Interesting, I've had the opposite experience. I keep trying to perform

 more desired.



      if(string.IsNullOrEmpty(str))

 vs

      if(str.empty)

 keeps throwing me off.
Or as in Ruby on Rails:

if str.blank?
end

"str" is considered blank if:

* it's nil (null)
* it's empty (its length is 0)
* it only contains whitespace

BTW, it works on all objects, not just strings. For arrays it will check the length as well, but for other objects it will just check for nil.

-- 
/Jacob Carlborg
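For comparison, a rough D equivalent of the string case only. `isBlank` is a hypothetical helper name, not something Phobos provides; the sketch relies on `all` returning true for an empty range and on a null slice being empty.

```d
import std.algorithm : all;
import std.uni : isWhite;

// Hypothetical helper mirroring the three string rules listed above.
bool isBlank(const(char)[] s)
{
    // A null or zero-length slice is an empty range, and all() is true
    // for an empty range, so only the whitespace test needs spelling out.
    return s.all!isWhite;
}

unittest
{
    assert(isBlank(null));
    assert(isBlank(""));
    assert(isBlank(" \t\n"));
    assert(!isBlank(" a "));
}
```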
Jan 09 2014
prev sibling next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
 This works fine:
   string x = find("Hello", 'H');

 This doesn't:
   string y = find(retro("Hello"), 'H');
   > Error: cannot implicitly convert expression 
 (find(retro("Hello"), 'H'))
 of type Result!() to string
In order to return the result as a string it would require an allocation. You have to request that allocation (and associated eager evaluation) explicitly:

    string y = "Hello".retro.find('H').to!string;

However, I think to get the expected result from unicode you need

    string y = "Hello".byGrapheme.retro.find('H').to!string;

but I might be wrong.
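One way to make the allocation step visible, as a sketch rather than the only option: copy the lazy range into an array first and then transcode. Going through `array` is my assumption here, chosen to avoid depending on how `to!string` treats a lazy range directly.

```d
import std.algorithm : find;
import std.array : array;
import std.conv : to;
import std.range : retro;

unittest
{
    // Lazy: no allocation, the result is still a reversed view.
    auto view = "Hello".retro.find('l');

    // Eager: array() copies the decoded elements into a fresh dchar[],
    // and to!string transcodes that array to UTF-8.
    string y = view.array.to!string;
    assert(y == "lleH");
}
```

Note that the string comes out reversed, since it is built from the reversed view.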
Jan 09 2014
next sibling parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 00:34, John Colvin <john.loughran.colvin gmail.com>wrote:

 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:

 This works fine:
   string x = find("Hello", 'H');

 This doesn't:
   string y = find(retro("Hello"), 'H');
   > Error: cannot implicitly convert expression (find(retro("Hello"),
 'H'))
 of type Result!() to string
In order to return the result as a string it would require an allocation. You have to request that allocation (and associated eager evaluation) explicitly string y = "Hello".retro.find('H').to!string;
Ah yes. Well I really just want the offset anyway...

 However, I think to get the expected result from unicode you need
 string y = "Hello".byGrapheme.retro.find('H').to!string;

 but I might be wrong.
Bugger that. This is not an example of "D is good at strings!".
Jan 09 2014
next sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Thursday, 9 January 2014 at 15:14:04 UTC, Manu wrote:
 However, I think to get the expected result from unicode you 
 need
 string y = "Hello".byGrapheme.retro.find('H').to!string;

 but I might be wrong.
Bugger that. This is not an example of "D is good at strings!".
Agreed. std.range and std.algorithm should be unicode correct with strings and leave the byte by byte access to ubyte arrays.
Jan 09 2014
prev sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 9 January 2014 at 15:14:04 UTC, Manu wrote:
 However, I think to get the expected result from unicode you 
 need
 string y = "Hello".byGrapheme.retro.find('H').to!string;

 but I might be wrong.
Bugger that. This is not an example of "D is good at strings!".
I have no idea how you are going to get the same functionality in C with strchr. This small line uses quite a lot of features to be reliably Unicode-correct.
Jan 09 2014
parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 02:05, Dicebot <public dicebot.lv> wrote:

 On Thursday, 9 January 2014 at 15:14:04 UTC, Manu wrote:

 However, I think to get the expected result from unicode you need

 string y = "Hello".byGrapheme.retro.find('H').to!string;

 but I might be wrong.
Bugger that. This is not an example of "D is good at strings!".
I have 0 ideas how are you going to get same functionality in C with strchr. This small line uses quite lot of features to be reliably unicode-correct.
It's nice that it's unicode correct, but it's not nice that you have to be familiar with a massive amount of the standard library and you need to search through 4-5 (huge! and often poorly documented) modules to find the functions you need to perform _basic string operations_, like finding the last instance of a character... My standing opinion is that string manipulation in D is not nice, it is possibly the most difficult and time consuming I have used in any language ever. Am I alone?
Jan 09 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 9 January 2014 at 16:22:08 UTC, Manu wrote:
 It's nice that it's unicode correct, but it's not nice that you 
 have to be
 familiar with a massive amount of the standard library and you 
 need to
 search through 4-5 (huge! and often poorly documented) modules 
 to find the
 functions you need to perform _basic string operations_, like 
 finding the
 last instance of a character...
That I do agree with. One idea is that once everything is split into smaller packages we can start providing meta-packages that do public imports of small sets of commonly used functions. Still, once the needed functions are found, I do consider the end result very robust for what it actually does and don't know any other language that does it better.
 My standing opinion is that string manipulation in D is not 
 nice, it is
 possibly the most difficult and time consuming I have used in 
 any language
 ever. Am I alone?
Unicode is the doom. If you only keep ASCII in mind, your statement is indeed true, and D stuff seems ridiculously complicated compared even to plain C. But it has also taught me that _every single_ program I have written before in other languages was broken in regards to Unicode handling. So, yes, it is quite difficult, but it is the cost of doing what no one else does - being correct out of the box. Well, at least in most scenarios :)
Jan 09 2014
parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 02:36, Dicebot <public dicebot.lv> wrote:

 On Thursday, 9 January 2014 at 16:22:08 UTC, Manu wrote:

 It's nice that it's unicode correct, but it's not nice that you have to be
 familiar with a massive amount of the standard library and you need to
 search through 4-5 (huge! and often poorly documented) modules to find the
 functions you need to perform _basic string operations_, like finding the
 last instance of a character...
That I do agree. One idea is that once everything is split into smaller packages we can start providing meta-packages that do public imports of small sets of commonly used functions. Still once needed functions are found I do consider end result very robust for what it actually does and don't know any other language that does it better. My standing opinion is that string manipulation in D is not nice, it is
 possibly the most difficult and time consuming I have used in any language
 ever. Am I alone?
Unicode is the doom. If you only keep ASCII in mind you statement is indeed true and D stuff seems ridiculously complicated compared even to plain C. But it has also teached me that _every single_ program I have written before in other languages was broken in regards to Unicode handling. So, yes, it is quite difficult but it is the cost for doing what no one else does - being correct out of the box. Well, at least in most scenarios :)
That's great and all, but it's no good if I have to pay for it (time and money!) even when that's not a requirement. I'm dealing with ASCII right now. At the very least, there needs to be massive assistance. std.string should probably offer a crap load of aliases and wrappers for common operations. And I hate how std.algorithm looks in intellisense pop-ups; you never have any idea what types you're dealing with, everything is templates, many levels deep. And then it's riddled with these little wrappers around 'Impl' types, which just adds more layers to the typing confusion. I want string functions that deal with types like 'string', not 'Unqual!(ElementEncodingType!(ElementType!Range))[]'
Jan 09 2014
next sibling parent "Dicebot" <public dicebot.lv> writes:
On Thursday, 9 January 2014 at 17:21:11 UTC, Manu wrote:
 money!) even when that's not a requirement. I'm dealing with 
 ascii right
 now.
Then the first (and mandatory) thing to do is to stop using the `string` type and switch to `ubyte[]` (or a wrapper from std.ascii). The second thing to do will then be to complain about the lack of `ubyte[]` specializations/overloads for most string processing functions ;)
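A sketch of what that can look like for the "last instance of a character" case, assuming the input really is ASCII. `lastByteIndex` is a hypothetical helper name; `std.string.representation` is used to view the string as `immutable(ubyte)[]` so that no decoding takes place.

```d
import std.algorithm : countUntil;
import std.range : retro;
import std.string : representation;

// Hypothetical helper: offset of the last byte equal to `c`, or -1.
// Only meaningful when `c` is ASCII, since it compares raw bytes.
ptrdiff_t lastByteIndex(string s, char c)
{
    // View the string as raw bytes; no copy and no UTF decoding.
    auto bytes = s.representation;

    // Count bytes from the back until the first match.
    auto fromEnd = bytes.retro.countUntil(c);
    return fromEnd < 0 ? -1 : cast(ptrdiff_t) bytes.length - 1 - fromEnd;
}

unittest
{
    assert(lastByteIndex("Hello", 'l') == 3);
    assert(lastByteIndex("Hello", 'H') == 0);
    assert(lastByteIndex("Hello", 'z') == -1);
}
```

Because the reversed byte array is still random-access, the result here is a real offset that can be used for slicing.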
Jan 09 2014
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-01-09 18:20, Manu wrote:

 That's great and all, but it's no good if I have to pay for it (time and
 money!) even when that's not a requirement. I'm dealing with ascii right
 now.
There are a couple of functions in std.ascii, but not what you needed here.

-- 
/Jacob Carlborg
Jan 09 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/9/14 8:21 AM, Manu wrote:
 My standing opinion is that string manipulation in D is not nice, it is
 possibly the most difficult and time consuming I have used in any
 language ever. Am I alone?
No, but probably in the minority.

The long and short of it is, you must get ranges in order to enjoy the power of D algorithms (as per http://goo.gl/dVprVT). std.{algorithm,range} are commonly mentioned as an attractive asset of D, and those who get that style of doing things have no trouble applying such notions to a variety of data, notably including strings. So going with the attitude "I don't use, know, or care for phobos... I just want to do this pesky string thing!" is bound to create frustration.

I personally find strings very easy to deal with in D. They might be easier in Perl or sometimes Python, but at a steep efficiency cost.

Walter has recently written a non-trivial utility that beats the pants off (3x performance) the equivalent C program that has been highly scrutinized and honed for literally decades by dozens (hundreds?) of professionals. Walter's implementation uses ranges and algorithms (a few standard, many custom) through and through. If all goes well we'll open-source it. He himself is now a range/algorithm convert, even though he'd be the first to point out the no-nonsense nature of a function like strrchr. (And btw strrchr is after all a POS because it needs to scan the string left to right... so lastIndex is faster!)

Andrei
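For reference, a sketch of the offset-returning call, on the assumption that "lastIndex" refers to `std.string.lastIndexOf` (the post does not name the module). The sample path is made up for illustration.

```d
import std.string : lastIndexOf;

unittest
{
    string path = "usr/local/bin";

    // lastIndexOf returns a code-unit offset, or -1 when nothing matches.
    auto i = path.lastIndexOf('/');
    assert(i == 9);
    assert(path[i + 1 .. $] == "bin");

    assert("Hello".lastIndexOf('z') == -1);
}
```

Unlike strrchr, the string length is known up front, so nothing forces a scan from the left.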
Jan 09 2014
next sibling parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 15:48, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org
 wrote:
 On 1/9/14 8:21 AM, Manu wrote:

 My standing opinion is that string manipulation in D is not nice, it is
 possibly the most difficult and time consuming I have used in any
 language ever. Am I alone?
No, but probably in the minority. The long and short of it is, you must get ranges in order to enjoy the power of D algorithms (as per http://goo.gl/dVprVT). std.{algorithm,range} are commonly mentioned as an attractive asset of D, and those who get that style of doing things have no trouble applying such notions to a variety of data, notably including strings. So going with the attitude "I don't use, know, or care for phobos... I just want to do this pesky string thing!" is bound to create frustration.
The thing is, that pesky string thing is usually a trivial detail in an otherwise completely unrelated task. I'm not joking when I say I've had details like formatting a useful error message take 90% of the time needed to complete some totally unrelated task. I guess I'm a little isolated from high level algorithms, because I spend most of my time at the level of twiddling bits. This is a key motivation for my kicking off this all-D game project, and getting others involved. I need an excuse to push myself to have more involvement with these types of things. Doing more high-level code than I usually do will help, and having other D users also in the project will keep me in check, and hopefully improve my D code a lot while at it ;)

 I personally find strings very easy to deal with in D. They might be easier
 in Perl or sometimes Python, but at a steep efficiency cost.

 Walter has recently written a non-trivial utility that beats the pants off
 (3x performance) the equivalent C program that has been highly scrutinized
 and honed for literally decades by dozens (hundreds?) of professionals.
 Walter's implementations uses ranges and algorithms (a few standard, many
 custom) through and through. If all goes well we'll open-source it. He
 himself is now an range/algorithm convert, even though he'd be the first to
 point the no-nonsense nature of a function like strrchr. (And btw strrchr
 is after all a POS because it needs to scan the string left to right... so
 lastIndex is faster!)
How long did it take to get him there? I suspect he made the leap only when a particular task that motivated him to do so came up. I suspect I'm likely to follow that same pattern given the context; like him, I'm a somewhat no-frills practicality-oriented programmer, and don't get too excited about futuristic shiny things unless it's readily apparent they can make my workload simpler and more efficient (although I would also require it not sacrifice computation efficiency). But my point remains, as a trivial ancillary detail - I'm not doing stuff with strings; I'm working on other stuff that just _has_ some strings - it's not presented in a way that one can just get the job done with low friction, and without at least tripling the number of imports from the std library.
Jan 09 2014
parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/9/2014 10:37 PM, Manu wrote:
 How long did it take to get him there? I suspect he made the leap only when a
 particular task that motivated him to do so came up.
Pretty much true. And it was worth it :-)
Jan 22 2014
prev sibling next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 10, 2014 at 04:37:03PM +1000, Manu wrote:
 On 10 January 2014 15:48, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org
 wrote:
 On 1/9/14 8:21 AM, Manu wrote:

 My standing opinion is that string manipulation in D is not nice,
 it is possibly the most difficult and time consuming I have used in
 any language ever. Am I alone?
No, but probably in the minority. The long and short of it is, you must get ranges in order to enjoy the power of D algorithms (as per http://goo.gl/dVprVT). std.{algorithm,range} are commonly mentioned as an attractive asset of D, and those who get that style of doing things have no trouble applying such notions to a variety of data, notably including strings. So going with the attitude "I don't use, know, or care for phobos... I just want to do this pesky string thing!" is bound to create frustration.
The thing is, that pesky string thing is usually a trivial detail in an otherwise completely unrelated task. I'm not joking when I've had details like formatting a useful error message take 90% of the time to complete some totally unrelated task.
You have to be doing something wrong... formatting error messages is as trivial as using std.string.format:

  if (argsAreBad(x,y,z))
      throw new Exception("Parameters x=%s y=%s z=%s are invalid!"
          .format(x,y,z));

I can't imagine what can be simpler than this. (Not to mention, %s in D just means "string format of X", so the above code will actually work for x, y, z of *any* type that has some kind of conversion to string. Try this with C/C++, and you'll be segfaulting all day.)
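A minimal, self-contained sketch of the %s behaviour described above. The helper name describeBadArgs and the Point struct are made up for illustration; only format itself comes from Phobos.

```d
import std.string : format;

struct Point { int x, y; } // illustrative type, not from the thread

/// Hypothetical helper: builds the error text for rejected arguments.
/// %s asks each argument for its default string form, whatever its type.
string describeBadArgs(X, Y, Z)(X x, Y y, Z z)
{
    return "Parameters x=%s y=%s z=%s are invalid!".format(x, y, z);
}

void main()
{
    import std.stdio : writeln;

    // Mixed built-in types need no per-type format specifier.
    writeln(describeBadArgs(1, 2.5, "abc"));

    // Arrays, characters and user-defined structs go through the same %s.
    writeln(describeBadArgs([1, 2], 'c', Point(3, 4)));
}
```

The first call prints "Parameters x=1 y=2.5 z=abc are invalid!". In real code the string would be passed straight to `new Exception(...)` as in the post.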
 I guess I'm a little isolated from high level algorithms, because I
 spend most of my time at the level of twiddling bits.
That would explain your difficulty with Phobos algorithms. :)
 This is a key motivation for my kicking off this all-D game project,
 and getting others involved. I need excuse to push myself to have more
 involvement with these type of things. Doing more high-level code than
 I usually do will help, and having other D users also in the project
 will keep me in check, and hopefully improve my D code a lot while at
 it ;)
Well, maybe the reward of not having to grit your teeth every time you do string manipulation in D will motivate you to learn how to use Phobos effectively? :)
 I personally find strings very easy to deal with in D. They might be
 easier in Perl or sometimes Python, but at a steep efficiency cost.

 Walter has recently written a non-trivial utility that beats the
 pants off (3x performance) the equivalent C program that has been
 highly scrutinized and honed for literally decades by dozens
 (hundreds?) of professionals.  Walter's implementations uses ranges
 and algorithms (a few standard, many custom) through and through. If
 all goes well we'll open-source it. He himself is now an
 range/algorithm convert, even though he'd be the first to point the
 no-nonsense nature of a function like strrchr. (And btw strrchr is
 after all a POS because it needs to scan the string left to right...
 so lastIndex is faster!)
How long did it take to get him there? I suspect he made the leap only when a particular task that motivated him to do so came up. I suspect I'm likely to follow that same pattern given the context; like him, I'm a somewhat no-frills practicality-oriented programmer, and don't get too excited about futuristic shiny things unless it's readily apparent they can make my workload simpler and more efficient (although I would also require it not sacrifice computation efficiency).
I'm not the kind to get excited about futuristic shiny things either... I don't even use a GUI, for example! (Well, technically I do, since I'm running on X11, but it's so bare bones to the point that my manager is baffled how I could even begin to use such an interface. I barely ever touch the mouse except when browsing, for one thing. Almost everything is completely keyboard-driven.) And I'm also skeptical of new trendy overhyped things that have people jumping on the bandwagon in droves -- and usually it turns out that it's just another ordinary idea blown out of proportion by the PR machine.

Yet I had no trouble getting up to speed with Phobos algorithms. I *will* say there's a learning curve, though -- you need to understand what ranges are and why they're the way they are, before you can fully grok Phobos algorithms. Andrei's article "On Iteration" (linked from the std.range docs) is almost a must-read.

But IMO it's more than worth the time to learn this. It will revolutionize the way you think about code. ;-)
 But my point remains, as a trivial ancillary detail - I'm not doing
 stuff with strings; I'm working on other stuff that just _has_ some
 strings - it's not presented in a way that one can just get the job
 done with low friction, and without at least tripling the number of
 imports from the std library.
But that's the thing, if you have some level of facility with ranges, you could be using exactly the same algorithms for your other stuff as you'd use for strings. That's much less mental overhead than having to remember one set of APIs for manipulating said other stuff, and a different set of APIs for manipulating strings.

The number of imports needed, though, is a different issue. That's something that Phobos needs improvement in. At least the last time I checked, the "Phobos philosophy", as stated on dlang.org, is that you shouldn't need to import half the library just to do a single simple operation like reading a file. Unfortunately, from what I can tell, that philosophy hasn't really been carried through. Lazy imports, discussed earlier this week, are a direction I'd like to see implemented some time in the near future. Some of the code bloat just from importing a single std module is a bit excessive, and bugs me quite a bit.

Nevertheless, I haven't experienced any "high friction" issues in getting stuff done with strings. Once you learn where things are and what is available, it's pretty straightforward to throw something together. It does take a bit of time to learn this, but honestly, that's not any more effort than learning C for the first time and learning what strchr or memset means, and when to use strcat and when not to. In fact, I'd argue that learning the C string functions is a lot more effort, because they have so many pitfalls and gotchas that you must memorize and constantly keep in mind, otherwise your program suddenly acquires gratuitous segfaults, pointer bugs, and buffer overruns. IME, it takes *more* effort to write string manipulation code in C, rather than less, since so many more things can go wrong.

T

--
Turning your clock 15 minutes ahead won't cure lateness---you're just making time go faster!
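One way to keep imports narrow today is a function-local, selective import. The sketch below only shows that the imported name is visible inside the function that needs it; it makes no claim about how much binary size this saves. The function name wordCount is hypothetical.

```d
/// Counts whitespace-separated words in a line.
size_t wordCount(string line)
{
    // Selective and scoped: only `split` is brought in, and only here.
    import std.array : split;

    // split() without a separator splits on runs of whitespace.
    return line.split().length;
}

void main()
{
    import std.stdio : writeln;

    // Two spaces and a tab still yield four words.
    writeln(wordCount("ranges  are just\tdata"));
}
```

Outside wordCount the name split is not in scope, so each function documents exactly which library symbols it depends on.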
Jan 09 2014
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2014-01-10 06:48, Andrei Alexandrescu wrote:
 On 1/9/14 8:21 AM, Manu wrote:
 My standing opinion is that string manipulation in D is not nice, it is
 possibly the most difficult and time consuming I have used in any
 language ever. Am I alone?
No, but probably in the minority. The long and short of it is, you must get ranges in order to enjoy the power of D algorithms (as per http://goo.gl/dVprVT). std.{algorithm,range} are commonly mentioned as an attractive asset of D, and those who get that style of doing things have no trouble applying such notions to a variety of data, notably including strings. So going with the attitude "I don't use, know, or care for phobos... I just want to do this pesky string thing!" is bound to create frustration.
Even if you do get how ranges work it can be difficult to figure out where a function is located: in std.algorithm, std.string, std.array, std.uni or std.range. Like, "is this a string operation or a general container algorithm?".

Why is there a std.string.indexOf function? Isn't that a general array operation or algorithm? Isn't std.string.(left|right)Justify a general operation as well?

--
/Jacob Carlborg
Jan 09 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/9/14 11:53 PM, Jacob Carlborg wrote:
 Even if you do get how ranges work it can be difficult to figure out
 where a function is located, in std.algorithms, std.string, std.array,
 std.uni or std.range. Like, "is this a string operation or a general
 container algorithm?". Why is there a std.string.indexOf function? Isn't
 that a general array operation or algorithm? Isn't
 std.string.(left|right)Justify a general operation as well?
That's a documentation issue. We've pursued generalization of string algorithms with good results. As such, indexOf is a candidate for generalization. However, the justification functions are unlikely to be useful for other data types because most don't have a notion of a "filler" object.

Andrei
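For comparison, a short sketch of the string-specific indexOf next to the generic countUntil from std.algorithm, which is one existing general-purpose counterpart. It is not presented as the generalization the post has in mind.

```d
import std.algorithm : countUntil;
import std.string : indexOf;

void main()
{
    // String-specific: returns an index into the string, or -1 if absent.
    assert("hello".indexOf('l') == 2);
    assert("hello".indexOf('z') == -1);

    // Generic: works on any input range and counts elements stepped over.
    assert([10, 20, 30].countUntil(20) == 1);
    assert([10, 20, 30].countUntil(99) == -1);

    // On non-ASCII text the two disagree: indexOf reports a UTF-8 code
    // unit offset, countUntil counts decoded code points.
    assert("\u00E4b".indexOf('b') == 2);
    assert("\u00E4b".countUntil('b') == 1);
}
```

The last two assertions are the reason the two functions are not interchangeable on strings: only the indexOf result is safe to use as a slice index.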
Jan 10 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 2014-01-10 09:29, Andrei Alexandrescu wrote:

 That's a documentation issue. We've pursued generalization of string
 algorithms with good result. As such indexOf is susceptible for
 generalization. However, the justification functions are unlikely to be
 useful for other data types because most don't have a notion of "filler"
 object.
They might not have a default "filler" object but you can pass the "filler" as an argument. -- /Jacob Carlborg
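A rough sketch of what "pass the filler as an argument" could look like for plain arrays. The name justifyLeftWith is hypothetical and nothing like it is claimed to exist in Phobos; whether such a function is useful for non-string types is exactly the point under debate.

```d
import std.array : array;
import std.range : repeat;

/// Hypothetical generic counterpart of std.string.leftJustify:
/// pads `items` on the right with `filler` until it has `width` elements.
T[] justifyLeftWith(T)(T[] items, size_t width, T filler)
{
    if (items.length >= width)
        return items; // already wide enough, nothing to add

    // repeat(filler, n) is a lazy range of n copies; .array allocates it.
    return items ~ repeat(filler, width - items.length).array;
}

void main()
{
    assert(justifyLeftWith([1, 2, 3], 5, 0) == [1, 2, 3, 0, 0]);
    assert(justifyLeftWith([1, 2, 3], 2, 0) == [1, 2, 3]);
}
```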
Jan 10 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/10/14 12:49 AM, Jacob Carlborg wrote:
 On 2014-01-10 09:29, Andrei Alexandrescu wrote:

 That's a documentation issue. We've pursued generalization of string
 algorithms with good result. As such indexOf is susceptible for
 generalization. However, the justification functions are unlikely to be
 useful for other data types because most don't have a notion of "filler"
 object.
They might not have a default "filler" object but you can pass the "filler" as an argument.
By that I was implying that the whole notion is not sensible for general types. Honest, I did consider generalizing everything in std.string, but the algorithms left in there made little sense for other types than strings. Andrei
Jan 10 2014
parent Jacob Carlborg <doob me.com> writes:
On 2014-01-10 17:14, Andrei Alexandrescu wrote:

 By that I was implying that the whole notion is not sensible for general
 types. Honest, I did consider generalizing everything in std.string, but
 the algorithms left in there made little sense for other types than
 strings.
Fair enough, I just thought I had found a couple more that could be generalized.

--
/Jacob Carlborg
Jan 10 2014
prev sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Thursday, 9 January 2014 at 14:34:43 UTC, John Colvin wrote:
 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
 This works fine:
  string x = find("Hello", 'H');

 This doesn't:
  string y = find(retro("Hello"), 'H');
  > Error: cannot implicitly convert expression 
 (find(retro("Hello"), 'H'))
 of type Result!() to string
In order to return the result as a string it would require an allocation. You have to request that allocation (and associated eager evaluation) explicitly:

  string y = "Hello".retro.find('H').to!string;

However, I think to get the expected result from unicode you need

  string y = "Hello".byGrapheme.retro.find('H').to!string;

but I might be wrong.
Oh. I see you actually wanted strrchr behaviour. That's different.
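To make the two readings concrete, here is a small sketch that separates the lazy retro/find result from a strrchr-style lookup. The helper fromLast is a made-up name; it leans on std.string.lastIndexOf, and the eager step goes through std.array.array as one conservative way to materialise the range.

```d
import std.algorithm : find;
import std.array : array;
import std.conv : to;
import std.range : retro;
import std.string : lastIndexOf;

/// Hypothetical strrchr-like helper: the slice from the last `c` to the
/// end of `s`, or null when `c` does not occur. No allocation is needed
/// because the result is a slice of the original string.
string fromLast(string s, char c)
{
    immutable i = s.lastIndexOf(c);
    return i < 0 ? null : s[i .. $];
}

void main()
{
    // Lazy: no new string exists yet, only a view walking "Hello" backwards.
    auto lazyResult = "Hello".retro.find('l');

    // Eager: collect the code points, then encode them as a UTF-8 string.
    string reversedTail = lazyResult.array.to!string;
    assert(reversedTail == "lleH");

    // strrchr behaviour: search from the right, keep the original order.
    assert(fromLast("Hello", 'l') == "lo");
    assert(fromLast("Hello", 'z') is null);
}
```

The retro/find result comes back reversed, which is why it is not a drop-in replacement for strrchr.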
Jan 09 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Thu, 09 Jan 2014 15:20:13 +0000
schrieb "John Colvin" <john.loughran.colvin gmail.com>:

 On Thursday, 9 January 2014 at 14:34:43 UTC, John Colvin wrote:
 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
 This works fine:
  string x = find("Hello", 'H');

 This doesn't:
  string y = find(retro("Hello"), 'H');
  > Error: cannot implicitly convert expression
 (find(retro("Hello"), 'H'))
 of type Result!() to string
In order to return the result as a string it would require an allocation. You have to request that allocation (and associated eager evaluation) explicitly:

  string y = "Hello".retro.find('H').to!string;

However, I think to get the expected result from unicode you need

  string y = "Hello".byGrapheme.retro.find('H').to!string;

but I might be wrong.
 Oh. I see you actually wanted strrchr behaviour. That's different.
The point about graphemes is good. D's functions still stop mid-way. From UTF-8 you can iterate UTF-32 code points, but grapheme clusters are the new characters. I.e. the basic need to iterate Unicode _characters_ is not supported!

I cannot even come up with use cases for working with code points and think they are a conceptual black hole. Something carried over from a time when grapheme clusters didn't exist. When you search for 'A', 'Ä' shows up when it is built from an A and the "two dots" symbol. It also has the walk length 2. This isn't an issue as long as we use strings from languages that are traditionally well supported with single code-unit characters.

Basically the element type when iterating over a string would have to be another string of arbitrary length, since you could attach any number of combining diacritical symbols to a letter. See?: e͜͟͡͞

--
Marco
Jan 09 2014
parent reply Jerry <jlquinn optonline.net> writes:
Marco Leise <Marco.Leise gmx.de> writes:

 Am Thu, 09 Jan 2014 15:20:13 +0000
 schrieb "John Colvin" <john.loughran.colvin gmail.com>:
 The point about graphemes is good. D's functions still stop
 mid-way. From UTF-8 you can iterate UTF-32 code points, but
 grapheme clusters are the new characters. I.e. the basic need
 to iterate Unicode _characters_ is not supported!
 I cannot even come up with use cases for working with code
 points and think they are a conceptual black hole. Something
 carried over from a time when grapheme clusters didn't exist.
Actually, you can do tons of NLP without grapheme clusters. If you're paranoid, you standardize on a specific Unicode normalization first.

You can probably get a bit better results by paying attention to clusters, but I suspect it will be a marginal improvement.

That said, I do agree with the OP that the string API is currently more complex to understand than I'd like. However, it's significantly easier to use than what's in standard C++ for anything beyond ASCII.

Jerry
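A small sketch of the "standardize on a normalization first" step, using std.uni.normalize with NFC. The sample strings are chosen for illustration and are not taken from the thread.

```d
import std.range : walkLength;
import std.uni : NFC, normalize;

void main()
{
    string decomposed  = "A\u0308"; // 'A' followed by COMBINING DIAERESIS
    string precomposed = "\u00C4";  // the single code point for the same letter

    // Same user-visible text, different code point sequences.
    assert(decomposed != precomposed);
    assert(decomposed.walkLength == 2);

    // After NFC normalization the two spellings compare equal.
    auto composed = normalize!NFC(decomposed);
    assert(composed == precomposed);
    assert(composed.walkLength == 1);
}
```

Normalizing both inputs up front lets plain code point comparison work for pairs like this one. It does not help for sequences that have no precomposed form.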
Jan 09 2014
parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Thu, 09 Jan 2014 15:51:36 -0500
schrieb Jerry <jlquinn optonline.net>:

 Marco Leise <Marco.Leise gmx.de> writes:
 Am Thu, 09 Jan 2014 15:20:13 +0000
 schrieb "John Colvin" <john.loughran.colvin gmail.com>:
 The point about graphemes is good. D's functions still stop
 mid-way. From UTF-8 you can iterate UTF-32 code points, but
 grapheme clusters are the new characters. I.e. the basic need
 to iterate Unicode _characters_ is not supported!
 I cannot even come up with use cases for working with code
 points and think they are a conceptual black hole. Something
 carried over from a time when grapheme clusters didn't exist.
Actually, you can do tons of NLP without grapheme clusters. If you're paranoid, you standardize on a specific Unicode normalization first.

You can probably get a bit better results by paying attention to clusters, but I suspect it will be a marginal improvement.

That said, I do agree with the OP that the string API is currently more complex to understand than I'd like. However, it's significantly easier to use than what's in standard C++ for anything beyond ascii.

Jerry
Sorry, I got confused with the Unicode definitions. I see now that a grapheme cluster is e.g. \r\n. What I really meant is that Phobos needs to support graphemes. But seeing that monsters like this exist: n͠g, I don't even know if this is one character or two, but right now Phobos sees it as three characters.

--
Marco
Jan 10 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 2014-01-10 17:01, Marco Leise wrote:

 Sorry, I got confused with the Unicode definitions. I see now
 that a grapheme cluster is e.g. \r\n. What I really meant is
 that Phobos needs to support graphemes. But seeing that
 monsters like this exist: n͠g, I don't even know if this is
 one character or two, but right now Phobos sees it as three
 characters.
Thunderbird sees that as two characters. Ruby sees it as three. -- /Jacob Carlborg
Jan 10 2014
parent Marco Leise <Marco.Leise gmx.de> writes:
Am Fri, 10 Jan 2014 18:07:54 +0100
schrieb Jacob Carlborg <doob me.com>:

 On 2014-01-10 17:01, Marco Leise wrote:
 Sorry, I got confused with the Unicode definitions. I see now
 that a grapheme cluster is e.g. \r\n. What I really meant is
 that Phobos needs to support graphemes. But seeing that
 monsters like this exist: n͠g, I don't even know if this is
 one character or two, but right now Phobos sees it as three
 characters.
 Thunderbird sees that as two characters. Ruby sees it as three.
I think this is the (or one of the) official documents about where a "user-perceived character" ends:
http://www.unicode.org/reports/tr29/#Grapheme_Cluster_Boundary_Rules

According to this, the above n͠g is indeed defined as 2 characters. Ruby is just no better than Phobos :p

»Grapheme cluster boundaries are important for collation, regular expressions, UI interactions (such as mouse selection, arrow key movement, backspacing), segmentation for vertical text, identification of boundaries for first-letter styling, and counting “character” positions within text.«

--
Marco
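The three ways of counting the example string can be checked directly. This sketch assumes a Phobos version that ships std.uni.byGrapheme; the string is spelled with an escape so the combining mark survives any mail encoding.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    string s = "n\u0360g"; // 'n', COMBINING DOUBLE TILDE, 'g'

    assert(s.length == 4);                // UTF-8 code units (bytes)
    assert(s.walkLength == 3);            // code points, the default string view
    assert(s.byGrapheme.walkLength == 2); // grapheme clusters, per UAX #29
}
```

The default iteration yields three elements, which matches the "three characters" observation, while byGrapheme gives the two user-perceived characters.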
Jan 10 2014
prev sibling next sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 09 Jan 2014 14:07:36 -0000, Manu <turkeyman gmail.com> wrote:
 This works fine:
   string x = find("Hello", 'H');

 This doesn't:
   string y = find(retro("Hello"), 'H');
   > Error: cannot implicitly convert expression (find(retro("Hello"),  
 'H'))
 of type Result!() to string

 Is that wrong? That seems to be how the docs suggest it should be used.

 On a side note, am I the only one that finds std.algorithm/std.range/etc
 for string processing really obtuse?
 I can rarely understand the error messages, so say it's better than STL  
 is  optimistic.
 Using std.algorithm and std.range to do string manipulation feels really
 lame to me.
 I hate looking through the docs of 3-4 modules to understand the complete
 set of useful string operations (std.string, std.uni, std.algorithm,
 std.range... at least).
 I also find the names of the generic algorithms are often unrelated to  
 the name of the string operation.
 My feeling is, everyone is always on about how cool D is at string, but
 other than 'char[]', and the builtin slice operator, I feel really
 unproductive whenever I do any heavy string manipulation in D.
 I also hate that I need to import at least 4-5 modules to do anything
 useful with strings... I feel my program bloating and cringe with every
 gigantic import that sources exactly one symbol.
I feel exactly the same way. I must admit I haven't done any serious D for a couple of years now, and the main reason is lack of free time, but the other is that each time I come back to try and do something I get weird arse error messages like the one you got above.

I realise that it is probably the way it is, to avoid bloating the language with several ways to do the same thing. I agree with that position, however.. I don't think it's a bad thing (TM) to have a custom/specific set of operations for a given area which re-use more generic operations behind the scenes.

In other words, why can't we alias or wrap the generic routines in std.string such that the expected operations are easy to find and do exactly what you'd expect, for strings.

If someone is dealing with generic code where the ranges involved might be strings/arrays or might be something else of course they will call std.range functions, but if they are only dealing with strings there should be string specific functions for them to call - which may/may not use std.range or std.algorithm functions etc behind the scenes.

R

--
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 09 2014
next sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 09 Jan 2014 17:15:41 -0000, Regan Heath <regan netmail.co.nz>  
wrote:

 In other words, why can't we alias or wrap the generic routines in  
 std.string
What I meant here is why can't we alias or wrap the generic routines (from std.range, std.algo.. into aliases/functions) in std.string. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
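As an illustration of the wrapping idea, string-only front ends over the generic routines might look like the sketch below. The names contains and after are invented for this example and are not Phobos functions; the generic calls behind them (canFind, indexOf) are real.

```d
import std.algorithm : canFind;
import std.string : indexOf;

/// Hypothetical wrapper: a plain, non-template signature, so a misuse is
/// reported in terms of `string` rather than template constraints.
bool contains(string haystack, string needle)
{
    return haystack.canFind(needle);
}

/// Hypothetical wrapper: everything after the first occurrence of `sep`,
/// or null when `sep` is absent. The result is a slice, not a copy.
string after(string s, string sep)
{
    immutable i = s.indexOf(sep);
    return i < 0 ? null : s[i + sep.length .. $];
}

void main()
{
    assert(contains("hello world", "lo w"));
    assert(!contains("hello world", "xyz"));
    assert(after("key=value", "=") == "value");
    assert(after("no separator", "=") is null);
}
```

The trade-off is the one raised in the thread: such wrappers are easier to find and to read, at the cost of a second name for the same operation.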
Jan 09 2014
parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 03:17, Regan Heath <regan netmail.co.nz> wrote:

 On Thu, 09 Jan 2014 17:15:41 -0000, Regan Heath <regan netmail.co.nz>
 wrote:

  In other words, why can't we alias or wrap the generic routines in
 std.string
What I meant here is why can't we alias or wrap the generic routines (from std.range, std.algo.. into aliases/functions) in std.string.
We can and should. Very liberally.

I'm still very concerned about the magnitude of bloat that gets pulled in by any of these modules though. They're all intimately connected; none of them seem to be able to exist without all of the others.

And there are some really huge template functions out there. Massive functions, which take multiple template arguments (N^2 permutations), where the template types might only affect one or two lines... they need to be broken down into very small template functions, and a non-templated inner function.
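The split described above can be sketched as a thin template front end that does only the type-dependent work and hands off to an ordinary function. Both report and joinLines are hypothetical names written for this example.

```d
import std.conv : to;

// Non-templated worker: compiled once, holds the bulk of the logic.
private string joinLines(const string[] lines, string prefix)
{
    string result;
    foreach (line; lines)
        result ~= prefix ~ line ~ "\n";
    return result;
}

// Thin template front end: only the per-type conversion below is
// instantiated again for each element type T.
string report(T)(const T[] values, string prefix = "> ")
{
    string[] lines;
    foreach (v; values)
        lines ~= v.to!string; // the only line that depends on T
    return joinLines(lines, prefix);
}

void main()
{
    assert(report([1, 2]) == "> 1\n> 2\n");
    assert(report([1.5]) == "> 1.5\n");
}
```

Each new element type then duplicates only the short conversion loop, not the formatting logic.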
Jan 09 2014
parent "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 09 Jan 2014 17:25:13 -0000, Manu <turkeyman gmail.com> wrote:

 On 10 January 2014 03:17, Regan Heath <regan netmail.co.nz> wrote:

 On Thu, 09 Jan 2014 17:15:41 -0000, Regan Heath <regan netmail.co.nz>
 wrote:

  In other words, why can't we alias or wrap the generic routines in
 std.string
What I meant here is why can't we alias or wrap the generic routines (from std.range, std.algo.. into aliases/functions) in std.string.
We can and should. Very liberally. I'm still very concerned about the magnitude of bloat that gets pulled in by any of these modules though. They're all intimately connected, none of them seem to be able to exist without all of the others. And there are some really huge template functions out there. Massive functions, which take multiple template arguments (N^2 permutations), where the template types might only affects one or 2 lines... they need to be broken down into very small template functions, and a non-templated inner function.
We need, if one does not exist already, a dependency mapper tool. One which would give some sort of graphical/hierarchical output of modules and their dependencies, ideally drilling right down to the functions, methods, variables etc being used. Sounds fun, and there is a DMD frontend to build on right? Anyone got the spare time? Regan -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 10 2014
prev sibling parent reply "Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Thursday, 9 January 2014 at 17:15:43 UTC, Regan Heath wrote:

clip

 In other words, why can't we alias or wrap the generic routines 
 in std.string such that the expected operations are easy to 
 find and do exactly what you'd expect, for strings.

 If someone is dealing with generic code where the ranges 
 involved might be strings/arrays or might be something else of 
 course they will call std.range functions, but if they are only 
 dealing with strings there should be string specific functions 
 for them to call - which may/may not use std.range or 
 std.algorithm functions etc behind the scenes.

 R
I think this would be a nice solution. I only use D for string processing rarely and as a result I always struggle a bit, because I can never remember where to go to look for things. Happily, my most recent experience with it was fairly smooth.

A while ago I was trying to do something with splitter on a string and I ended up asking a question on D.learn. I got into a very confusing debate because the person trying to help me thought I was using the splitter in std.array and I was using the one from another module (see the last few posts from here):

http://www.digitalmars.com/d/archives/digitalmars/D/learn/splitting_numbers_from_a_test_file_39448.html

It would be nice if std.string in D provided nice, easy string manipulation that swept most of the difficulties under the table, and provided links in the documentation to the functions they wrap for when people want to do more complex things.
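For readers who hit the same confusion, this sketch contrasts the eager split from std.array with the lazy splitter from std.algorithm as they exist in current Phobos. It does not try to reproduce the exact situation from the linked D.learn thread.

```d
import std.algorithm : equal, splitter;
import std.array : array, split;

void main()
{
    string line = "10,20,30";

    // std.array.split: eager, allocates and returns a string[].
    string[] fields = line.split(",");
    assert(fields == ["10", "20", "30"]);

    // std.algorithm.splitter: lazy range, pieces are produced on demand.
    auto lazyFields = line.splitter(",");
    assert(lazyFields.equal(["10", "20", "30"]));

    // .array turns the lazy range into the same eager result.
    assert(lazyFields.array == fields);
}
```

The lazy result is not a string[], so it cannot be indexed or assigned to an array variable without the .array step.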
Jan 09 2014
parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 9 January 2014 at 18:57:26 UTC, Craig Dillabaugh 
wrote:
 A while ago I was trying to do something with splitter on a 
 string and I ended up asking a question on D.learn. [...]

 It would be nice if std.string in D provided a nice, easy, 
 string manipulation that swept most of the difficulties under 
 the table
http://dlang.org/phobos/std_array.html#split

Note that std.array is publicly imported from std.string so this works:

  void main() {
      import std.string;
      auto parts = "hello".split("l");
      import std.stdio;
      writeln(parts);
  }
 provided links in the documentation to the functions they wrap 
 for when people want to do more complex things.
Actually, when writing my D book, I decided to spend more time on the unicode stuff in strings than these basic operations, since I thought these were pretty straightforward. But maybe the docs suck more than I thought. I learned most of D string stuff from Phobos1 which kept it all simple...
Jan 09 2014
parent reply "Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Thursday, 9 January 2014 at 19:05:19 UTC, Adam D. Ruppe wrote:
 On Thursday, 9 January 2014 at 18:57:26 UTC, Craig Dillabaugh 
 wrote:
 A while ago I was trying to do something with splitter on a 
 string and I ended up asking a question on D.learn. [...]

 It would be nice if std.string in D provided a nice, easy, 
 string manipulation that swept most of the difficulties under 
 the table
http://dlang.org/phobos/std_array.html#split

Note that std.array is publicly imported from std.string so this works:

  void main() {
      import std.string;
      auto parts = "hello".split("l");
      import std.stdio;
      writeln(parts);
  }
 provided links in the documentation to the functions they wrap 
 for when people want to do more complex things.
Actually, when writing my D book, I decided to spend more time on the unicode stuff in strings than these basic operations, since I thought these were pretty straightforward. But maybe the docs suck more than I thought. I learned most of D string stuff from Phobos1 which kept it all simple...
That's the thing. In most cases the correct way to do something in D does end up being rather nice. However, it's often a bit of a challenge finding that correct way!

When I had my troubles I expected to find the library solutions in std.string (remember I rarely use D's string processing utilities). It never really occurred to me that I might want to check std.array for the function I wanted. So what if std.array is imported when I import std.string; as a programmer I still had no idea 'split()' was there!

At the very least the documentation for std.string should say something along the lines of:

"The libraries std.unicode and std.array also include a number of functions that operate on strings, so if what you are looking for isn't here, try looking there."
Jan 09 2014
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jan 09, 2014 at 08:53:12PM +0000, Craig Dillabaugh wrote:
[...]
 Thats the thing.  In most cases the correct way to do something in
 D, does end up being rather nice.  However, its often a bit of a
 challenge finding the that correct way!
 
 When I had my troubles I expected to find the library solutions in
 std.string (remember I rarely use D's string processing utilities).
 It never really occurred to me that I might want to check std.array
 for the function I wanted. So what it std.array is imported when I
 import std.string, as a programmer I still had no idea 'split()' was
 there!
 
 At the very least the documentation for std.string should say
 something along the lines of:
 
 "The libraries std.unicode and std.array also include a number of
 functions that operate on strings, so if what you are looking for
 isn't here, try looking there."
Yeah, any public imports should be mentioned somewhere in the docs, otherwise it's just random invisible magic as far as the end-user is concerned ("Hmm, I imported std.string in one module, and array.front works, but in this other module, array.front doesn't work! Why? Who knows."); Please submit a pull request to add that to the docs. T -- People walk. Computers run.
Jan 09 2014
parent Jacob Carlborg <doob me.com> writes:
On 2014-01-10 00:34, H. S. Teoh wrote:

 Yeah, any public imports should be mentioned somewhere in the docs,
 otherwise it's just random invisible magic as far as the end-user is
 concerned ("Hmm, I imported std.string in one module, and array.front
 works, but in this other module, array.front doesn't work! Why? Who
 knows.");

 Please submit a pull request to add that to the docs.
I agree, and it should be automatic. -- /Jacob Carlborg
Jan 09 2014
prev sibling next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 9 January 2014 at 20:53:13 UTC, Craig Dillabaugh 
wrote:
 Thats the thing.  In most cases the correct way to do something 
 in D, does end up being rather nice.  However, its often a bit 
 of a challenge finding the that correct way!
Yeah, and indeed it is a bit weird that so many of the functions moved from std.string to std.array. (Yet are still specialized on strings... I think they have to be in the same module just to be in the same overload set though.)
 "The libraries std.unicode and std.array also include a number 
 of functions that operate on strings, so if what you are 
 looking for isn't here, try looking there."
Aye, I think the documentation could use a few higher level overviews that bring the modules together.
Jan 09 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/9/14 3:39 PM, Adam D. Ruppe wrote:
 Aye, I think the documentation could use a few higher level overviews
 that bring the modules together.
PRP Andrei
Jan 09 2014
prev sibling next sibling parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 06:53, Craig Dillabaugh <cdillaba cg.scs.carleton.ca> wrote:

 On Thursday, 9 January 2014 at 19:05:19 UTC, Adam D. Ruppe wrote:

 On Thursday, 9 January 2014 at 18:57:26 UTC, Craig Dillabaugh wrote:

 A while ago I was trying to do something with splitter on a string and I
 ended up asking a question on D.learn. [...]

 It would be nice if std.string in D provided a nice, easy, string
 manipulation that swept most of the difficulties under the table
 provided links in the documentation to the functions they wrap for when
 people want to do more complex things.
http://dlang.org/phobos/std_array.html#split

Note that std.array is publicly imported from std.string so this works:

    void main() {
        import std.string;
        auto parts = "hello".split("l");
        import std.stdio;
        writeln(parts);
    }
Actually, when writing my D book, I decided to spend more time on the unicode stuff in strings than these basic operations, since I thought these were pretty straightforward. But maybe the docs suck more than I thought. I learned most of D string stuff from Phobos1 which kept it all simple...
That's the thing. In most cases the correct way to do something in D does end up being rather nice. However, it's often a bit of a challenge finding that correct way! When I had my troubles I expected to find the library solutions in std.string (remember I rarely use D's string processing utilities). It never really occurred to me that I might want to check std.array for the function I wanted. So what if std.array is imported when I import std.string, as a programmer I still had no idea 'split()' was there! At the very least the documentation for std.string should say something along the lines of: "The libraries std.unicode and std.array also include a number of functions that operate on strings, so if what you are looking for isn't here, try looking there."
Or just alias the functions useful for string processing...
Jan 09 2014
parent reply Jacob Carlborg <doob me.com> writes:
On 2014-01-10 02:34, Manu wrote:

 Or just alias the functions useful for string processing...
I agree. It already has some aliases, converting to lower and uppercase. -- /Jacob Carlborg
Jan 09 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/9/14 11:56 PM, Jacob Carlborg wrote:
 On 2014-01-10 02:34, Manu wrote:

 Or just alias the functions useful for string processing...
I agree. It already has some aliases, converting to lower and uppercase.
I wouldn't want to get to the point where many functions have 2-3 names. Andrei
Jan 10 2014
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-01-10 09:16, Andrei Alexandrescu wrote:

 I wouldn't want to get to the point where many functions have 2-3 names.
They're aliased in from std.uni, I think that's a different thing. It's not like Ruby which has both "collect" and "map", in the same place, meaning the same thing. -- /Jacob Carlborg
Jan 10 2014
prev sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 10 Jan 2014 08:16:53 -0000, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 1/9/14 11:56 PM, Jacob Carlborg wrote:
 On 2014-01-10 02:34, Manu wrote:

 Or just alias the functions useful for string processing...
I agree. It already has some aliases, converting to lower and uppercase.
I wouldn't want to get to the point where many functions have 2-3 names.
This is only a problem if they are all in the same sphere of concern.. by that I mean if you're looking for string functions and you find 2 names for the same function this would be wrong/confusing/pointless. But, if you have one name in the string category and one in the range category and they were both the same function underneath I don't see this as the "same" problem, right? R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 10 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/10/14 6:07 AM, Regan Heath wrote:
 On Fri, 10 Jan 2014 08:16:53 -0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 1/9/14 11:56 PM, Jacob Carlborg wrote:
 On 2014-01-10 02:34, Manu wrote:

 Or just alias the functions useful for string processing...
I agree. It already has some aliases, converting to lower and uppercase.
I wouldn't want to get to the point where many functions have 2-3 names.
This is only a problem if they are all in the same sphere of concern.. by that I mean if you're looking for string functions and you find 2 names for the same function this would be wrong/confusing/pointless. But, if you have one name in the string category and one in the range category and they were both the same function underneath I don't see this as the "same" problem, right?
The way I see it one learns a name for an algorithm (low cognitive load) and then uses it everywhere. This is not Go. Andrei
Jan 10 2014
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 10 Jan 2014 16:30:12 -0000, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 On 1/10/14 6:07 AM, Regan Heath wrote:
 On Fri, 10 Jan 2014 08:16:53 -0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:

 On 1/9/14 11:56 PM, Jacob Carlborg wrote:
 On 2014-01-10 02:34, Manu wrote:

 Or just alias the functions useful for string processing...
I agree. It already has some aliases, converting to lower and uppercase.
I wouldn't want to get to the point where many functions have 2-3 names.
This is only a problem if they are all in the same sphere of concern..
 by that I mean if you're looking for string functions and you find 2
 names for the same function this would be wrong/confusing/pointless.
 But, if you have one name in the string category and one in the range
 category and they were both the same function underneath I don't see
 this as the "same" problem, right?
The way I see it one learns a name for an algorithm (low cognitive load)
 and then uses it everywhere. This is not Go.
Sure. But, lets take an example: std.algorithm.canFind is more or less what you might call std.string.contains (which does not exist - instead we'd use indexOf != -1.. I think).

What is the harm in having an alias in std.string called contains which simply calls std.algorithm.canFind?

Sure, it opens the door to someone using both canFind and contains on strings in their code. So what? Use of contains is more likely/intuitive for string related code, but both are intelligible. canFind will be more likely in generic code, where you would think of that generic algorithm name.

It seems to me that people think of algorithms by different names in different contexts. In the context of strings "contains" would make the most intuitive sense to the most people.

Side-issue.. from std.algorithm:

    bool canFind(alias pred = "a == b", R, E)(R haystack, E needle) if
    (is(typeof(find!pred(haystack, needle))));
    Returns true if and only if **value** can be found in range. Performs
    Ο(needle.length) evaluations of pred.

What is **value** shouldn't that be needle?

R

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 13 2014
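For readers following along, here is a minimal sketch of what the proposal amounts to. The `contains` alias is hypothetical: it is declared locally for illustration and is not something std.string provides. The `indexOf` and `canFind` calls are the existing spellings mentioned in the post.

```d
import std.algorithm : canFind;
import std.string : indexOf;

// Hypothetical alias, declared here only to illustrate the proposal.
// Phobos' std.string does not define `contains`.
alias contains = canFind;

void main()
{
    string s = "hello world";

    // Existing spellings of "does s contain this substring?"
    assert(s.indexOf("world") != -1);
    assert(s.canFind("world"));

    // The proposed spelling resolves to the very same function.
    assert(s.contains("world"));
    assert(!s.contains("mars"));
}
```

Because the alias names the same template, it also accepts non-string ranges, e.g. `[1, 2, 3].contains(2)`.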
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-01-13 13:53, Regan Heath wrote:

 Sure.  But, lets take an example: std.algorithm.canFind is more or less
 what you might call std.string.contains (which does not exist - instead
 we'd use indexOf != -1.. I think).
I think "contains" is a way better name. That's what most other languages use, I think. -- /Jacob Carlborg
Jan 13 2014
prev sibling next sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/13/14 4:53 AM, Regan Heath wrote:
 On Fri, 10 Jan 2014 16:30:12 -0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 The way I see it one learns a name for an algorithm (low cognitive
 load) and then uses it everywhere. This is not Go.
Sure. But, lets take an example: std.algorithm.canFind is more or less what you might call std.string.contains (which does not exist - instead we'd use indexOf != -1.. I think).
Well there's no perfection in this world :o).
 What is the harm in having an alias in std.string called contains which
 simply calls std.algorithm.canFind?
I think you can answer that for yourself. Just take the approach to its logical conclusion.
 Sure, it opens the door to someone using both canFind and contains on
 strings in their code.  So what?  Use of contains is more
 likely/intuitive for string related code, but both are intelligible.
 canFind will be more likely in generic code, where you would think of
 that generic algorithm name.

 It seems to me that people think of algorithms by different names in
 different contexts.  In the context of strings "contains" would make the
 most intuitive sense to the most people.
I agree that good names are difficult to find. I think you'd have a hard time with a "the more the merrier" stance.
 Side-issue.. from std.algorithm:

 bool canFind(alias pred = "a == b", R, E)(R haystack, E needle) if
 (is(typeof(find!pred(haystack, needle))));
 Returns true if and only if **value** can be found in range. Performs
 Ο(needle.length) evaluations of pred.

 What is **value** shouldn't that be needle?
Please file a bug or pull request. Thanks! Andrei
Jan 22 2014
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 22 Jan 2014 19:39:14 -0000, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 On 1/13/14 4:53 AM, Regan Heath wrote:
 On Fri, 10 Jan 2014 16:30:12 -0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 The way I see it one learns a name for an algorithm (low cognitive
 load) and then uses it everywhere. This is not Go.
Sure. But, lets take an example: std.algorithm.canFind is more or less
 what you might call std.string.contains (which does not exist - instead
 we'd use indexOf != -1.. I think).
Well there's no perfection in this world :o).
 What is the harm in having an alias in std.string called contains which
 simply calls std.algorithm.canFind?
I think you can answer that for yourself. Just take the approach to its
 logical conclusion.
You mean the best possible name in all contexts, yes! :p Seriously, I am not suggesting we do this for all functions all the time, but just enough so that most users find what they expect to find and get what they expect to get, where it doesn't break D's philosophy of not doing inefficient things for the sake of being generic of course.
 Sure, it opens the door to someone using both canFind and contains on
 strings in their code.  So what?  Use of contains is more
 likely/intuitive for string related code, but both are intelligible.
 canFind will be more likely in generic code, where you would think of
 that generic algorithm name.

 It seems to me that people think of algorithms by different names in
 different contexts.  In the context of strings "contains" would make the
 most intuitive sense to the most people.
I agree that good names are difficult to find. I think you'd have a hard
 time with a "the more the merrier" stance.
This. Not my position. Rather I am suggesting we identify individual omissions (like std.string.contains) and add an alias. So that people don't have to struggle quite so much when switching to D. The lower the bar and all that..
 Side-issue.. from std.algorithm:

 bool canFind(alias pred = "a == b", R, E)(R haystack, E needle) if
 (is(typeof(find!pred(haystack, needle))));
 Returns true if and only if **value** can be found in range. Performs
 Ο(needle.length) evaluations of pred.

 What is **value** shouldn't that be needle?
Please file a bug or pull request. Thanks!
Bug filed. R
Jan 23 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/23/14 8:06 AM, Regan Heath wrote:
 This.  Not my position.  Rather I am suggesting we identify individual
 omissions (like std.string.contains) and add an alias.  So that people
 don't have to struggle quite so much when switching to D.  The lower the
 bar and all that..
Ionno. Just look at the current morass with https://github.com/D-Programming-Language/phobos/pull/1875. We have two names for the same function "canFind" and "any". Then we want to deprecate one, but look at how much impact it's having on Phobos alone. Are you sure you want to add a _third_? Andrei
Jan 23 2014
next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2014-01-23 21:53, Andrei Alexandrescu wrote:

 Ionno. Just look at the current morass with
 https://github.com/D-Programming-Language/phobos/pull/1875. We have two
 names for the same function "canFind" and "any". Then we want to
 deprecate one, but look at how much impact it's having on Phobos alone.
 Are you sure you want to add a _third_?
Personally I would expect "any" to take a predicate and return "true" if it can find any matching element. If a predicate is not supplied it would behave as the opposite of "empty". I would expect "contains" to take a element and check if it exists in the range. I think "canFind" is just a weird name. -- /Jacob Carlborg
Jan 24 2014
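To make the overlap between the two names concrete, here is a small sketch using the Phobos functions as they exist today. With a unary predicate, `any` and `canFind` answer the same question. Only `canFind` also accepts an element to search for. The sample data is made up for illustration.

```d
import std.algorithm : any, canFind;

void main()
{
    auto xs = [1, 2, 5];

    // Predicate form: both names do the same job.
    assert(xs.any!(x => x > 3)());
    assert(xs.canFind!(x => x > 3)());

    // Element form: this is the case people expect "contains" to cover.
    assert(xs.canFind(2));
    assert(!xs.canFind(7));
}
```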
next sibling parent "Andrea Fontana" <nospam example.com> writes:
On Friday, 24 January 2014 at 08:21:12 UTC, Jacob Carlborg wrote:
 Personally I would expect "any" to take a predicate and return 
 "true" if it can find any matching element. If a predicate is 
 not supplied it would behave as the opposite of "empty".
+1
 I would expect "contains" to take a element and check if it 
 exists in the range.
+1
 I think "canFind" is just a weird name.
+1 I also think that contains/canFind/etc ... should take advantage of sorted ranges to speed up their searches (right now it seems they don't!)
Jan 24 2014
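On the sorted-range point, it may help to note that `SortedRange` (the type returned by `std.algorithm.sort` and `std.range.assumeSorted`) already has a `contains` member that uses binary search. A minimal sketch with made-up data:

```d
import std.algorithm : canFind, sort;
import std.range : assumeSorted;

void main()
{
    auto data = [5, 3, 1, 4, 2];

    // sort() works in place and returns a SortedRange over the same array.
    auto sorted = sort(data);

    // SortedRange.contains uses binary search.
    assert(sorted.contains(4));
    assert(!sorted.contains(7));

    // An array known to be sorted can be wrapped without re-sorting.
    assert(assumeSorted([10, 20, 30]).contains(20));

    // canFind on the plain array is a linear scan.
    assert(data.canFind(4));
}
```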
prev sibling next sibling parent reply "Stanislav Blinov" <stanislav.blinov gmail.com> writes:
On Friday, 24 January 2014 at 08:21:12 UTC, Jacob Carlborg wrote:
 On 2014-01-23 21:53, Andrei Alexandrescu wrote:
 I would expect "contains" to take a element and check if it 
 exists in the range.

 I think "canFind" is just a weird name.
I agree on the latter point. As for "contains"... Well, if we address the terminology, we should consider that ranges are not really containers, therefore "contains" would be slightly incorrect. Perhaps "encounters" or "isWithin"? :) On a serious note though, "contains" is leagues ahead of "canFind".
Jan 24 2014
parent "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 24 Jan 2014 08:36:07 -0000, Stanislav Blinov  
<stanislav.blinov gmail.com> wrote:

 On Friday, 24 January 2014 at 08:21:12 UTC, Jacob Carlborg wrote:
 On 2014-01-23 21:53, Andrei Alexandrescu wrote:
 I would expect "contains" to take a element and check if it exists in  
 the range.

 I think "canFind" is just a weird name.
I agree on the latter point. As for "contains"... Well, if we address the terminology, we should consider that ranges are not really containers, therefore "contains" would be slightly incorrect. Perhaps "encounters" or "isWithin"? :)
This is the complete opposite of the point I was trying to make :p I don't want a generic name/function, or a range specific name/function we already have the generic one, and probably a range one, I want a string specific one - in this particular example - with a name people will expect to find (pun intended) when doing string manipulation.
 On a serious note though, "contains" is leagues ahead of "canFind".
I think it depends on the context. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 24 2014
prev sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Fri, 24 Jan 2014 08:21:12 -0000, Jacob Carlborg <doob me.com> wrote:

 On 2014-01-23 21:53, Andrei Alexandrescu wrote:

 Ionno. Just look at the current morass with
 https://github.com/D-Programming-Language/phobos/pull/1875. We have two
 names for the same function "canFind" and "any". Then we want to
 deprecate one, but look at how much impact it's having on Phobos alone.
 Are you sure you want to add a _third_?
Personally I would expect "any" to take a predicate and return "true" if it can find any matching element. If a predicate is not supplied it would behave as the opposite of "empty".
 I would expect "contains" to take a element and check if it exists in  
 the range.
Except in the case of string, where we also want an overload taking more than a single element aka a substring.
 I think "canFind" is just a weird name.
Me too, but it makes sense as a "generic" name I think. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 24 2014
parent reply Manu <turkeyman gmail.com> writes:
On 24 January 2014 21:49, Regan Heath <regan netmail.co.nz> wrote:

 On Fri, 24 Jan 2014 08:21:12 -0000, Jacob Carlborg <doob me.com> wrote:

 I would expect "contains" to take a element and check if it exists in the
 range.
Except in the case of string, where we also want an overload taking more than a single element aka a substring.
A great example of when the string function should not be conflated with the general function.
Jan 25 2014
next sibling parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 25 January 2014 at 09:18:24 UTC, Manu wrote:
 On 24 January 2014 21:49, Regan Heath <regan netmail.co.nz> 
 wrote:

 On Fri, 24 Jan 2014 08:21:12 -0000, Jacob Carlborg 
 <doob me.com> wrote:

 I would expect "contains" to take a element and check if it 
 exists in the
 range.
Except in the case of string, where we also want an overload taking more than a single element aka a substring.
A great example of when the string function should not be conflated with the general function.
You always want the overload. If this works: contains("hello", "el"); then this should work: contains([1, 2, 3, 4, 5], [2, 3]); Special cases are pure evil. There's nothing special about strings in this case.
Jan 25 2014
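A short sketch of the consistency being asked for, written with the existing `canFind` and `find` rather than a hypothetical `contains`. The same call shape searches for a sub-sequence in a string and in an int array.

```d
import std.algorithm : canFind, find;

void main()
{
    // Sub-sequence search reads the same for strings and int arrays.
    assert("hello".canFind("el"));
    assert([1, 2, 3, 4, 5].canFind([2, 3]));
    assert(!([1, 2, 3, 4, 5].canFind([3, 2])));

    // find returns the remainder of the haystack starting at the match.
    assert("hello".find("el") == "ello");
    assert([1, 2, 3, 4, 5].find([2, 3]) == [2, 3, 4, 5]);
}
```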
next sibling parent "Jacob Carlborg" <doob me.com> writes:
On Saturday, 25 January 2014 at 10:15:30 UTC, Peter Alexander 
wrote:

 You always want the overload.

 If this works:

     contains("hello", "el");

 then this should work:

     contains([1, 2, 3, 4, 5], [2, 3]);

 Special cases are pure evil. There's nothing special about 
 strings in this case.
I agree. Since strings are just a kind of array, it would be stupid to not allow the above. -- /Jacob Carlborg
Jan 25 2014
prev sibling next sibling parent reply "Ola Fosheim Grøstad" writes:
On Saturday, 25 January 2014 at 10:15:30 UTC, Peter Alexander 
wrote:
 If this works:

     contains("hello", "el");

 then this should work:

     contains([1, 2, 3, 4, 5], [2, 3]);

 Special cases are pure evil. There's nothing special about 
 strings in this case.
I don't disagree, but naming and intuitive semantics should match up. In this case it does not. "contains" signifies set membership. hasSequence/findSequence would be more appropriate
Jan 25 2014
parent reply "Peter Alexander" <peter.alexander.au gmail.com> writes:
On Saturday, 25 January 2014 at 11:43:03 UTC, Ola Fosheim Grøstad 
wrote:
 On Saturday, 25 January 2014 at 10:15:30 UTC, Peter Alexander 
 wrote:
 If this works:

    contains("hello", "el");

 then this should work:

    contains([1, 2, 3, 4, 5], [2, 3]);

 Special cases are pure evil. There's nothing special about 
 strings in this case.
I don't disagree, but naming and intuitive semantics should match up. In this case it does not. "contains" signifies set membership. hasSequence/findSequence would be more appropriate
100% agree. The key thing is that it should be consistent between strings and other range types.
Jan 25 2014
parent reply "Ola Fosheim Grøstad" writes:
On Saturday, 25 January 2014 at 14:23:48 UTC, Peter Alexander 
wrote:
 100% agree. The key thing is that it should be consistent 
 between strings and other range types.
Indeed. It is better to have to look up the name in the beginning. Also, a good IDE will give you a list of alternatives and it is important to keep this list as short as possible. Ideally there should be no more than 10 functions for any type in order to maximize the benefit of using an IDE. So few functions, but with very descriptive names make me more efficient (I don't have to look it up in the documentation). Basically, it is better to have a small core that can be used with lambdas for the specifics. I notice when I do Python that I don't use all the special functions. I use the generic ones with lambdas instead.
Jan 25 2014
parent Marco Leise <Marco.Leise gmx.de> writes:
On Sat, 25 Jan 2014 14:36:52 +0000,
"Ola Fosheim Grøstad"
<ola.fosheim.grostad+dlang gmail.com> wrote:

 On Saturday, 25 January 2014 at 14:23:48 UTC, Peter Alexander
 wrote:
 100% agree. The key thing is that it should be consistent
 between strings and other range types.
 Indeed. It is better to have to look up the name in the beginning.
If the name works well for strings, I agree. But otherwise I prefer established function names from popular languages like Python, Pascal/Delphi or Java.
 Also, a good IDE will give you a list of alternatives and it is
 important to keep this list as short as possible. Ideally there
 should be no more than 10 functions for any type in order to
 maximize the benefit of using an IDE. So few functions, but with
 very descriptive names make me more efficient (I don't have to
 look it up in the documentation).
There are 360 completions for a string already in Mono-D with these imports:

    import std.algorithm;
    import std.array;
    import std.conv;
    import std.range;
    import std.stdio;
    import std.string;
    import std.traits;

Don't bother with removing two or three function names for string overloads. That's optimizing in the wrong area. :)

-- 
Marco
Jan 26 2014
prev sibling next sibling parent reply Manu <turkeyman gmail.com> writes:
On 25 January 2014 20:15, Peter Alexander <peter.alexander.au gmail.com> wrote:

 On Saturday, 25 January 2014 at 09:18:24 UTC, Manu wrote:

 On 24 January 2014 21:49, Regan Heath <regan netmail.co.nz> wrote:

  On Fri, 24 Jan 2014 08:21:12 -0000, Jacob Carlborg <doob me.com> wrote:
  I would expect "contains" to take a element and check if it exists in
 the
 range.
Except in the case of string, where we also want an overload taking more than a single element aka a substring.
A great example of when the string function should not be conflated with the general function.
You always want the overload. If this works: contains("hello", "el"); then this should work: contains([1, 2, 3, 4, 5], [2, 3]); Special cases are pure evil. There's nothing special about strings in this case.
Does that work in all cases of strings wrt utf encodings? String normalisation? What about when char and wchar strings are compared? (should that be handled transparently?) Strings are special, they're almost always special, and rarely conflate with generalisations well. Strings almost always expose special cases that aren't useful or relevant for any other context.
Jan 25 2014
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/25/14 6:07 PM, Manu wrote:
 On 25 January 2014 20:15, Peter Alexander <peter.alexander.au gmail.com
 <mailto:peter.alexander.au gmail.com>> wrote:
     If this works:

          contains("hello", "el");

     then this should work:

          contains([1, 2, 3, 4, 5], [2, 3]);

     Special cases are pure evil. There's nothing special about strings
     in this case.


 Does that work in all cases of strings wrt utf encodings?
Yes.
 String normalisation?
Not automatically, but you could do things such as find(haystack.byGrapheme, needle.byGrapheme).
 What about when char and wchar strings are compared?
Yes.
 (should that be handled transparently?)
It is. D's stdlib is one of very few languages that can do this out of the box without converting one to the other.
 Strings are special, they're almost always special, and rarely conflate
 with generalisations well.
There is considerable evidence that the above is wrong.
 Strings almost always expose special cases that aren't useful or
 relevant for any other context.
The special cases I found almost always involve whitespace (there is no obvious generalization of the notion). Other than that, UTF strings are nothing else but boring variable-length encodings. Andrei
Jan 25 2014
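The answers above can be illustrated with a small sketch. The mixed-width search relies on the generic algorithms comparing decoded code points, so no conversion is written out. For normalisation, this sketch swaps in `std.uni.normalize` instead of the `byGrapheme` approach mentioned in the post, since it keeps the example short. The sample strings are made up.

```d
import std.algorithm : canFind;
import std.uni;

void main()
{
    // Mixed encodings: a UTF-8 haystack searched with a UTF-16 needle.
    string  haystack = "hello";
    wstring needle   = "el"w;
    assert(haystack.canFind(needle));

    // Normalisation is not automatic: a precomposed character and its
    // decomposed spelling are different code point sequences.
    string precomposed = "caf\u00E9";   // "é" as a single code point
    string combining   = "e\u0301";     // "e" followed by a combining acute
    assert(!precomposed.canFind(combining));

    // Bringing both sides to the same normal form makes the search succeed.
    assert(normalize!NFC(precomposed).canFind(normalize!NFC(combining)));
}
```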
prev sibling parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Sat, 25 Jan 2014 10:15:28 -0000, Peter Alexander  
<peter.alexander.au gmail.com> wrote:
 Special cases are pure evil. There's nothing special about strings in  
 this case.
This is a tangent to my suggestion. I am arguing for domain specific language (aliases) where sensible, not domain specific functions. If canFind can already handle all the desirable string cases, perfect, but lets alias it in std.string as "contains" so that people find what they expect to find first time and don't get frustrated looking for the correct generic name for the functionality they want. There are likely other cases where we already have all the functionality in a nice generic function, but people struggle to find it because it has a suitably generic name. I just want us to lower the bar for beginners coming from other languages R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 27 2014
next sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Monday, 27 January 2014 at 14:27:42 UTC, Regan Heath wrote:
 On Sat, 25 Jan 2014 10:15:28 -0000, Peter Alexander 
 <peter.alexander.au gmail.com> wrote:
 Special cases are pure evil. There's nothing special about 
 strings in this case.
This is a tangent to my suggestion. I am arguing for domain specific language (aliases) where sensible, not domain specific functions. If canFind can already handle all the desirable string cases, perfect, but lets alias it in std.string as "contains" so that people find what they expect to find first time and don't get frustrated looking for the correct generic name for the functionality they want. There are likely other cases where we already have all the functionality in a nice generic function, but people struggle to find it because it has a suitably generic name. I just want us to lower the bar for beginners coming from other languages R
I think that is a small short-term learning advantage but huge long-term damage for code readability. Now you suddenly need to not only remember what Phobos can do but also all defined aliases for that stuff. What could have been awesome is to be able to define such aliases via DDOC so that IDE's can understand them and list in auto-completion, while still putting "real" name in source code. It would have solved discoverability issue without harming naming consistency.
Jan 27 2014
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Mon, 27 Jan 2014 14:34:30 -0000, Dicebot <public dicebot.lv> wrote:

 On Monday, 27 January 2014 at 14:27:42 UTC, Regan Heath wrote:
 On Sat, 25 Jan 2014 10:15:28 -0000, Peter Alexander  
 <peter.alexander.au gmail.com> wrote:
 Special cases are pure evil. There's nothing special about strings in  
 this case.
This is a tangent to my suggestion. I am arguing for domain specific language (aliases) where sensible, not domain specific functions. If canFind can already handle all the desirable string cases, perfect, but lets alias it in std.string as "contains" so that people find what they expect to find first time and don't get frustrated looking for the correct generic name for the functionality they want. There are likely other cases where we already have all the functionality in a nice generic function, but people struggle to find it because it has a suitably generic name. I just want us to lower the bar for beginners coming from other languages R
I think that is a small short-term learning advantage but huge long-term damage for code readability. Now you suddenly need to not only remember what Phobos can do but also all defined aliases for that stuff.
No, you really don't. If you're writing string code you will intuitively reach for "substring", "contains", etc because you already know these terms and what behaviour to expect from them. In a generic context, or a range context you will reach for different generic or range type names. Likewise when reading code you will read "contains" and immediately know what it does, you don't need to remember that it's also called canFind .. why would you care? Even *if* you decided to compare some string code with some generic code, and the two were actually doing the "same" thing with different calls, you wouldn't have any trouble at all in understanding each and then realising they do the same thing.
 What could have been awesome is to be able to define such aliases via  
 DDOC so that IDE's can understand them and list in auto-completion,  
 while still putting "real" name in source code. It would have solved  
 discoverability issue without harming naming consistency.
I think I would dislike this.. not sure. Do our docs have "synonyms" in function descriptions.. then at least google would find "contains" on the page next to canFind and you would have an answer. R -- Using Opera's revolutionary email client: http://www.opera.com/mail/
Jan 28 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Tuesday, 28 January 2014 at 11:26:39 UTC, Regan Heath wrote:
 No, you really don't.

 If you're writing string code you will intuitively reach for 
 "substring", "contains", etc because you already know these 
 terms and what behaviour to expect from them.  In a generic 
 context, or a range context you will reach for different 
 generic or range type names.
Trusting intuition is not acceptable. I will go and check in docs in most cases if I have not encountered it before. Check each time for every new alias. I'd hate to have this overhead. Right now all I need to do is to stop thinking about strings as strings - easy and fast.
 What could have been awesome is to be able to define such 
 aliases via DDOC so that IDE's can understand them and list in 
 auto-completion, while still putting "real" name in source 
 code. It would have solved discoverability issue without 
 harming naming consistency.
I think I would dislike this.. not sure. Do our docs have "synonyms" in function descriptions.. then at least google would find "contains" on the page next to canFind and you would have an answer.
They don't have it right now and I propose to introduce it for this very reason.
Jan 29 2014
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 29 Jan 2014 09:52:01 -0000, Dicebot <public dicebot.lv> wrote:

 On Tuesday, 28 January 2014 at 11:26:39 UTC, Regan Heath wrote:
 No, you really don't.

 If you're writing string code you will intuitively reach for  
 "substring", "contains", etc because you already know these terms and  
 what behaviour to expect from them.  In a generic context, or a range  
 context you will reach for different generic or range type names.
Trusting intuition is not acceptable.
Sure it is, if we're talking about making life easier for beginners and making things more "obvious" in general. Of course, not everyone has the same idea of obvious, but there is enough overlap and we would *only* define aliases for that overlap. In short, if people expect it to be there, lets make sure it's there.
 I will go and check in docs in most cases if I have not encountered it
 before. Check each time for every new alias. I'd hate to have this
 overhead.
Huh? Assuming you have a decent editor checking the docs should be as simple as pressing F1 on the unknown function. And, that's only assuming it's not immediately obvious what it's doing. Are you telling me, that you would be confused by seeing... if (str.contains("hello")) I seriously doubt that, and that's all I'm suggesting, adding aliases for things which are obvious, things which any beginner will expect to be there, and currently aren't there. I am *not* suggesting we add every obscure name for every single function, that would be complete nonsense. Lets not get confused about the scope of what I'm suggesting, I am suggesting a very limited number of new aliases, and only for cases where there is a clear obvious expected name which we currently lack.
 Right now all I need to do is to stop thinking about strings as strings  
 - easy and fast.
Sure, once you learn all the generic terms for things. I *still* have trouble finding the LINQ function I need when I want to do something in the LINQ generic style .. and I've been using LINQ for at least a year now. The issue is that the generic name just does not naturally occur to me in certain contexts, like strings.

R
Jan 29 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Wednesday, 29 January 2014 at 10:13:44 UTC, Regan Heath wrote:
 I will go and check in docs in most case if I have not 
 encountered it before. Check each time for every new aliases. 
 I'd hate to have this overhead.
Huh? Assuming you have a decent editor checking the docs should be as simple as pressing F1 on the unknown function.
It requires your mental context switching anyway.
 And, that's only assuming it's not immediately obvious what 
 it's doing.  Are you telling me, that you would be confused by 
 seeing...

 if (str.contains("hello"))
I won't be confused, but I also won't be sure. For example, it may return a boolean or an inclusion count. `str` can be a string or an array of strings. With uniform range-based algorithms I can always expect a consistent interpretation (or rant about inconsistent naming :)
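A small sketch of that distinction with the existing range algorithms, assuming a reasonably current Phobos: with a single needle `canFind` answers yes or no, counting is a separately named algorithm, and on an array of strings the needle is compared against whole elements.

```d
import std.algorithm : canFind, count;

void main()
{
    string str = "hello hello";
    string[] words = ["hello", "world"];

    // With a single needle, canFind answers yes/no.
    static assert(is(typeof(str.canFind("hello")) == bool));
    assert(str.canFind("hello"));

    // Counting (non-overlapping) occurrences has its own name.
    assert(str.count("hello") == 2);

    // On an array of strings the needle is matched against whole elements.
    assert(words.canFind("hello"));
    assert(!words.canFind("hell"));
}
```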
 I seriously doubt that, and that's all I'm suggesting, adding 
 aliases for things which are obvious, things which any beginner 
 will expect to be there, and currently aren't there.
I don't buy into appealing to an imaginary "any beginner" who has expectations identical to every other "any beginner". My observations show quite the contrary: those expectations are often different and incompatible, and the best way for a language is to force beginners to switch to the expectations of the language.
Jan 29 2014
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 29 Jan 2014 10:42:08 -0000, Dicebot <public dicebot.lv> wrote:

 On Wednesday, 29 January 2014 at 10:13:44 UTC, Regan Heath wrote:
 I will go and check in docs in most case if I have not encountered it  
 before. Check each time for every new aliases. I'd hate to have this  
 overhead.
Huh? Assuming you have a decent editor checking the docs should be as simple as pressing F1 on the unknown function.
It requires your mental context switching anyway.
 And, that's only assuming it's not immediately obvious what it's  
 doing.  Are you telling me, that you would be confused by seeing...

 if (str.contains("hello"))
I won't be confused, but I also won't be sure. For example, it may return a boolean or an inclusion count. `str` can be a string or an array of strings. With uniform range-based algorithms I can always expect a consistent interpretation (or rant about inconsistent naming :)
 I seriously doubt that, and that's all I'm suggesting, adding aliases  
 for things which are obvious, things which any beginner will expect to  
 be there, and currently aren't there.
I don't buy into appealing to an imaginary "any beginner" who has expectations identical to every other "any beginner". My observations show quite the contrary: those expectations are often different and incompatible, and the best way for a language is to force beginners to switch to the expectations of the language.
*shrug* agree to disagree on all points. R
Jan 29 2014
parent "Dicebot" <public dicebot.lv> writes:
On Wednesday, 29 January 2014 at 15:57:30 UTC, Regan Heath wrote:
 *shrug* agree to disagree on all points.

 R
Peace!
Jan 29 2014
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/27/14 6:27 AM, Regan Heath wrote:
 On Sat, 25 Jan 2014 10:15:28 -0000, Peter Alexander
 <peter.alexander.au gmail.com> wrote:
 Special cases are pure evil. There's nothing special about strings in
 this case.
This is a tangent to my suggestion. I am arguing for domain specific language (aliases) where sensible, not domain specific functions. If canFind can already handle all the desirable string cases, perfect, but lets alias it in std.string as "contains" so that people find what they expect to find first time and don't get frustrated looking for the correct generic name for the functionality they want. There are likely other cases where we already have all the functionality in a nice generic function, but people struggle to find it because it has a suitably generic name. I just want us to lower the bar for beginners coming from other
I just don't think this scales, though I understand it can sound reasonable before it being tried. Walter doesn't like writing libraries so when he first defined Phobos' string support he simply took the string functions in Python and Ruby and implemented them. That didn't work well at all, in spite of the functions having the same names and semantics. Andrei
Jan 27 2014
parent reply "Regan Heath" <regan netmail.co.nz> writes:
On Mon, 27 Jan 2014 16:19:54 -0000, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 1/27/14 6:27 AM, Regan Heath wrote:
 On Sat, 25 Jan 2014 10:15:28 -0000, Peter Alexander
 <peter.alexander.au gmail.com> wrote:
 Special cases are pure evil. There's nothing special about strings in
 this case.
This is a tangent to my suggestion. I am arguing for domain specific language (aliases) where sensible, not domain specific functions. If canFind can already handle all the desirable string cases, perfect, but lets alias it in std.string as "contains" so that people find what they expect to find first time and don't get frustrated looking for the correct generic name for the functionality they want. There are likely other cases where we already have all the functionality in a nice generic function, but people struggle to find it because it has a suitably generic name. I just want us to lower the bar for beginners coming from other
I just don't think this scales, though I understand it can sound reasonable before it being tried. Walter doesn't like writing libraries so when he first defined Phobos' string support he simply took the string functions in Python and Ruby and implemented them. That didn't work well at all, in spite of the functions having the same names and semantics.
What specifically didn't work? All I can recall are UTF and slicing issues, some of which remain with us today. R
Jan 28 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/28/14 3:28 AM, Regan Heath wrote:
 On Mon, 27 Jan 2014 16:19:54 -0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Walter doesn't like writing libraries so when he first defined Phobos'
 string support he simply took the string functions in Python and Ruby
 and implemented them. That didn't work well at all, in spite of the
 functions having the same names and semantics.
What specifically didn't work? All I can recall are UTF and slicing issues, some of which remain with us today.
Problem is what we had was a crappy strings API because it used none of D's inherent advantages. What we have now is much better. Andrei
Jan 28 2014
parent "Regan Heath" <regan netmail.co.nz> writes:
On Wed, 29 Jan 2014 06:49:30 -0000, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 1/28/14 3:28 AM, Regan Heath wrote:
 On Mon, 27 Jan 2014 16:19:54 -0000, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org> wrote:
 Walter doesn't like writing libraries so when he first defined Phobos'
 string support he simply took the string functions in Python and Ruby
 and implemented them. That didn't work well at all, in spite of the
 functions having the same names and semantics.
What specifically didn't work? All I can recall are UTF and slicing issues, some of which remain with us today.
Problem is what we had was a crappy strings API because it used none of D's inherent advantages. What we have now is much better.
Sure, but it would be better still if the commonly expected names for routines were present.. is all I'm saying. I am certainly not suggesting we go back to a bad API, I am just saying there are some functions people expect to see, and they're not there, and that is frustrating; perhaps enough to put someone off. R
Jan 29 2014
prev sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 25 January 2014 at 09:18:24 UTC, Manu wrote:
 On 24 January 2014 21:49, Regan Heath <regan netmail.co.nz> 
 wrote:

 On Fri, 24 Jan 2014 08:21:12 -0000, Jacob Carlborg 
 <doob me.com> wrote:

 I would expect "contains" to take a element and check if it 
 exists in the
 range.
Except in the case of string, where we also want an overload taking more than a single element aka a substring.
A great example of when the string function should not be conflated with the general function.
Both `find` and `canFind` support subrange search, and that works with any range, not just substrings.
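A quick sketch of that, assuming a current Phobos: the same two calls do substring search on a string and subrange search on an int array.

```d
import std.algorithm : canFind, find;

void main()
{
    // Substring search on a string...
    assert("Hello, world".canFind("world"));
    assert("Hello, world".find("wor") == "world");

    // ...and the same calls on an int array.
    int[] xs = [1, 2, 3, 4, 5];
    assert(xs.canFind([3, 4]));
    assert(xs.find([3, 4]) == [3, 4, 5]);

    // No match: find returns an empty range.
    assert(xs.find([4, 3]).length == 0);
}
```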
Jan 25 2014
prev sibling parent "Regan Heath" <regan netmail.co.nz> writes:
On Thu, 23 Jan 2014 20:53:01 -0000, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 1/23/14 8:06 AM, Regan Heath wrote:
 This.  Not my position.  Rather I am suggesting we identify individual
 omissions (like std.string.contains) and add an alias.  So that people
 don't have to struggle quite so much when switching to D.  The lower the
 bar and all that..
Ionno. Just look at the current morass with https://github.com/D-Programming-Language/phobos/pull/1875. We have two names for the same function "canFind" and "any". Then we want to deprecate one, but look at how much impact it's having on Phobos alone. Are you sure you want to add a _third_?
Not *quite* the same. "any" is/was in the same module as canFind and for use in the exact same context. A string-specific "contains" would only be used in the context of string parsing. If contains existed in std.string then it would be unusual for anyone to use canFind on a string (in a string-only context). That's what I'm suggesting: not adding more generic aliases/names for existing functions (as "any" was), but adding specific names in specific contexts for otherwise generic functions with odd generic names, like canFind.

R
Jan 24 2014
prev sibling parent "Ola Fosheim Grøstad" writes:
On Monday, 13 January 2014 at 12:53:08 UTC, Regan Heath wrote:
 or less what you might call std.string.contains (which does not 
 exist - instead we'd use indexOf != -1.. I think).
Just a side track: What I dislike about return values as error indicators is that they are arbitrary, so you have to memorize "-1", "0", null, throws…

I think it is often useful to have a user-supplied default and sensible naming, like having functions that allow testing for "0", "false", "null" as failure end with "OK" in their name. And functions that throw ought to have some kind of assertive name like "validate" or a name that explicitly hints at exceptions.

"-1" is really a horrible error value since it fails the "boolean test", and e.g. if you want the non-query part of a URL, you want the string length to be the "not found" value when searching for "?", not -1:

"http://server.com/page"
"http://server.com/page?query=xyz"
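To make the URL case concrete, a minimal sketch with today's `indexOf`; `nonQueryPart` is a hypothetical helper name, and the -1 has to be mapped to the string length by hand.

```d
import std.string : indexOf;

// Hypothetical helper: everything before the first '?', or the whole
// string when there is no query part.
string nonQueryPart(string url)
{
    auto idx = url.indexOf('?');
    // indexOf signals "not found" with -1, so translate that to url.length.
    auto end = idx < 0 ? url.length : cast(size_t) idx;
    return url[0 .. end];
}

void main()
{
    assert(nonQueryPart("http://server.com/page") == "http://server.com/page");
    assert(nonQueryPart("http://server.com/page?query=xyz") == "http://server.com/page");
}
```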
Jan 22 2014
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/9/14 12:53 PM, Craig Dillabaugh wrote:
 At the very least the documentation for std.string should say something
 along the lines of:

 "The libraries std.unicode and std.array also include a number of
 functions that operate on strings, so if what you are looking for isn't
 here, try looking there."
Pull request please. Andrei
Jan 09 2014
prev sibling next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
   string y = find(retro("Hello"), 'H');
import std.string;
auto idx = lastIndexOf("Hello", 'H');

Wow, that's unbelievable difficult. D sucks.
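A slightly fuller sketch of the two approaches side by side, assuming a current Phobos: `find` over `retro` gives back a lazy reversed range (hence the conversion error), while `lastIndexOf` gives an index that slices the original string.

```d
import std.algorithm : equal, find;
import std.range : retro;
import std.string : lastIndexOf;

void main()
{
    string s = "Hello, Hello";

    // find over retro yields a lazy reversed range, not a string.
    auto r = find(retro(s), 'H');
    static assert(!is(typeof(r) == string));
    assert(r.equal("H ,olleH"));

    // lastIndexOf returns an index (or -1) usable for slicing.
    auto idx = s.lastIndexOf('H');
    assert(idx == 7);
    assert(s[idx .. $] == "Hello");
}
```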
Jan 09 2014
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Thursday, 9 January 2014 at 17:39:00 UTC, Adam D. Ruppe wrote:
 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
  string y = find(retro("Hello"), 'H');
import std.string;
auto idx = lastIndexOf("Hello", 'H');

Wow, that's unbelievable difficult. D sucks.
How on earth did I miss that...
Jan 09 2014
parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 03:40, John Colvin <john.loughran.colvin gmail.com> wrote:

 On Thursday, 9 January 2014 at 17:39:00 UTC, Adam D. Ruppe wrote:

 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:

  string y = find(retro("Hello"), 'H');
import std.string;
auto idx = lastIndexOf("Hello", 'H');

Wow, that's unbelievable difficult. D sucks.
How on earth did I miss that...
I have to wonder the same thing. It's just not anything like anything I've ever called it before I guess. I guess I started with find, and then it refers you to retro if you want to reverse find, and of course, by this time I'm nowhere near std.string anymore. Hard to find something if you're not even looking in the same file :/
Jan 09 2014
next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/9/14 4:34 PM, Manu wrote:
 On 10 January 2014 03:40, John Colvin <john.loughran.colvin gmail.com
 <mailto:john.loughran.colvin gmail.com>> wrote:

     On Thursday, 9 January 2014 at 17:39:00 UTC, Adam D. Ruppe wrote:

         On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:

               string y = find(retro("Hello"), 'H');


         import std.string;
         auto idx = lastIndexOf("Hello", 'H');

         Wow, that's unbelievable difficult. D sucks.


     How on earth did I miss that...


 I have to wonder the same thing.
 It's just not anything like anything I've ever called it before I guess.
 I guess I started with find, and then it refers you to retro if you want
 to reverse find, and of course, by this time I'm nowhere near std.string
 anymore. Hard to find something if you're not even looking in the same
 file :/
Probably an xref of indexOf/lastIndexOf in find would be useful. PRP Andrei
Jan 09 2014
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-01-10 01:34, Manu wrote:

 I have to wonder the same thing.
 It's just not anything like anything I've ever called it before I guess.
 I guess I started with find, and then it refers you to retro if you want
 to reverse find, and of course, by this time I'm nowhere near std.string
 anymore. Hard to find something if you're not even looking in the same
 file :/
But "strchr" is a good name? If I wanted the index of a character in a string I would most likely look for something called "indexOf", or "index". In Python it's called "index" (and "find"). In PHP it's called "strrpos" and in C++ it's called "find". I think we're in pretty good shape here with D. -- /Jacob Carlborg
Jan 10 2014
prev sibling next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
09-Jan-2014 21:38, Adam D. Ruppe wrote:
 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
   string y = find(retro("Hello"), 'H');
import std.string;
auto idx = lastIndexOf("Hello", 'H');

Wow, that's unbelievable difficult. D sucks.
+1 LOL -- Dmitry Olshansky
Jan 09 2014
prev sibling parent reply "Dicebot" <public dicebot.lv> writes:
On Thursday, 9 January 2014 at 17:39:00 UTC, Adam D. Ruppe wrote:
 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
  string y = find(retro("Hello"), 'H');
import std.string;
auto idx = lastIndexOf("Hello", 'H');

Wow, that's unbelievable difficult. D sucks.
It is not the same thing as sample with byGrapheme though.
Jan 09 2014
parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Thursday, 9 January 2014 at 17:54:05 UTC, Dicebot wrote:
 It is not the same thing as sample with byGrapheme though.
Right, but it works for ascii (and others) and shows std.string isn't as weak as being said in this thread.
Jan 09 2014
next sibling parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 04:00, Adam D. Ruppe <destructionator gmail.com> wrote:

 On Thursday, 9 January 2014 at 17:54:05 UTC, Dicebot wrote:

 It is not the same thing as sample with byGrapheme though.
Right, but it works for ascii (and others) and shows std.string isn't as weak as being said in this thread.
So is it 'correct'? The docs don't really say what it does. Is 'index' in bytes, in codepoints, or in graphemes? Looks like bytes, but then it talks about std.utf.UTFException, so maybe codepoints? Being correct is constantly being thrown around as the 'value' in why everything's so fucking hard... if this function isn't 'correct', then we have a disparity.

I also don't think it excuses any of my other points. There shouldn't be 4-5+ modules where you have to look whenever you want to find string related stuff. In this case, my explicit example is just the straw that broke the camel's back. My experience still stands; every time I try to do any serious string work, I waste far more time than I care to, and I HATE doing it. Makes me feel dirty and I don't enjoy my programming time (which I usually do enjoy). In my experience, if you're not enjoying programming, something went wrong.

The D docs are pretty terrible, they don't do much to help you find what you're looking for. You have a massive block of function names at the top of the page, you have to carefully scan through one by one, hoping that it's named something obvious that will stand out to you, and in the event it doesn't have a helper function, you need to work out the proper sequence of algorithm/range/whatever operations to do what you want (and then repeat the process finding the small parts you need across a bunch of modules).

Blah! </endrant>
Jan 09 2014
next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
 So is it 'correct'?
Yes, with the caveat that it might find a surrogate pair (like H followed by an accent code point). That's what byGrapheme is about: combining those pairs. But meh, do you really care about that? indexOf does correctly handle the UTF formats and returns an index suitable for slicing (or -1).

auto idx = "cool".indexOf("o");
if(idx == -1) throw new Exception("not found");
auto before = "cool"[0 .. idx];
auto after = "cool"[idx + 1 .. $];

Code like that will always yield valid UTF strings. Again, it *might* break up a pair of code points, but it *will* correctly handle multi-byte code points... so probably good enough for 99% of use cases.
 Looks like bytes, but then it talks
It is bytes on string, and wchars on wstring; it is whatever unit is correct for slicing the type you pass it.
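A small sketch of what "the unit that is correct for slicing" means in practice, assuming a current Phobos. The test string is just an example with one non-ASCII letter.

```d
import std.string : indexOf;

void main()
{
    // U+00E9 (e with acute) takes two code units in UTF-8.
    string s = "h\u00E9llo";
    auto i = s.indexOf('l');
    assert(i == 3);              // counted in UTF-8 code units, not code points
    assert(s[i .. $] == "llo");  // so the index slices cleanly

    // The same letter is a single code unit in UTF-16.
    wstring w = "h\u00E9llo"w;
    assert(w.indexOf('l') == 2);
}
```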
 The D docs are pretty terrible, they don't do much to help you 
 find what you're looking for.
I mostly agree (and this is partially why I started writing http://dpldocs.info/ but I never finished that so it isn't much better). I don't notice it so much because I already know where to look for most things but regardless I agree it is a pain for anything new.
Jan 09 2014
next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
BTW, I'll say it again: it was a *lot* easier to get started with 
this back in the phobos1 days, where std.string WAS the one-stop 
location for string stuff.

At the least, we should get the docs to point people in the right 
place, but I think we should also do more conceptual overview 
pages that talk about cross-module things.
Jan 09 2014
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 10, 2014 at 01:18:01AM +0000, Adam D. Ruppe wrote:
 BTW, I'll say it again: it was a *lot* easier to get started with
 this back in the phobos1 days, where std.string WAS the one-stop
 location for string stuff.
I thought it still is? Except that a lot of it is now implicit via public import from std.array and std.algorithm and wherever else. (But I wouldn't know, though, I wasn't around in the D1 days.)
 At the least, we should get the docs to point people in the right
 place,
Yeah, I think all public imports should at least get a mention in the ddoc header so that people know what's *actually* getting imported, not just what the docs say are in the module.
 but I think we should also do more conceptual overview pages that talk
 about cross-module things.
+1. Currently Phobos has way too many modules under std, and unless you're already familiar with where things are, you wouldn't even know where to start looking when searching for new functionality. T -- He who is everywhere is nowhere.
Jan 09 2014
parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 10 January 2014 at 01:26:50 UTC, H. S. Teoh wrote:
 I thought it still is?
Yeah, mostly, though sometimes the disambiguation leaks the other details (for example replace() sometimes has a name conflict, so you need to explicitly import it or use a full name to disambiguate). But this is primarily a documentation problem rather than a code one.

Some code differences from the old days:

* before: converting to and from string was in std.string. Functions like toInt, toString, etc. Nowadays, this is all done with std.conv.to. The new way is way cool, but a newbie's first place to look might be for std.string.toString rather than std.conv.to!string.

* before: some char type stuff was in std.string (and the rest in std.ctype IIRC). Now, it is in std.ascii and std.uni.

* before: the signatures were char[] foo(char[]). Nowadays, it is S foo(S)(S s) if(isSomeString!S)... so much wordier! Better functionality, but omg it can be a pain to read and surely intimidating for newbs.

I think things are generally improved as for functionality and consistency, but the docs are more debatable.
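For anyone looking for the old toInt/toString, a minimal sketch of the std.conv.to equivalents, assuming a current Phobos:

```d
import std.conv : ConvException, to;
import std.exception : assertThrown;

void main()
{
    // What used to be toInt / toString is one template, to!T.
    int n = "42".to!int;
    string s = 42.to!string;
    assert(n == 42);
    assert(s == "42");
    assert("3.5".to!double == 3.5);

    // Input that does not parse throws ConvException.
    assertThrown!ConvException("abc".to!int);
}
```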
Jan 09 2014
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 10, 2014 at 01:34:46AM +0000, Adam D. Ruppe wrote:
[...]
 Some code differences from the old days:
 
 * before: converting to and from string was in std.string. Functions
 like toInt, toString, etc. Nowadays, this is all done with
 std.conv.to. The new way is way cool, but a newbie's first place to
 look might be for std.string.toString rather than std.conv.to!string.
Right, so it should be mentioned in std.string. But probably your idea of more concept-oriented overview pages is better. It doesn't seem like the right solution to just insert hyperlinks to std.conv in every other Phobos module.
 * before: some char type stuff was in std.string (and the rest in
 std.ctype IIRC). Now, it is in std.ascii and std.uni.
Yeah, this is one of the things I found annoying. Sure I understand why std.ascii needs to be different from std.uni, but then you have stuff split across std.string, std.ascii, std.uni, and std.utf -- what's the diff between std.utf and std.uni?! (Yes I know what the diff is, the point is that it looks silly to a newcomer.)
 * before: the signatures were char[] foo(char[]). Nowadays, it is S
 foo(S)(S s) if(isSomeString!S)... so much wordier! Better
 functionality, but omg it can be a pain to read and surely
 intimidating for newbs.
Sig constraints seriously need to be formatted differently from the way they are right now, which is an unreadable blob of obtuse text. Take std.algorithm.makeIndex, for example. How do you even *read* that mess??! It's 6 lines of dense, *bolded* text (on my browser anyway, YMMV), and it's not even clear that it's actually two overloads. I have trouble telling what exactly it returns, and where its parameter lists start and end. Nor what the sig constraints actually mean. Actually, this particular case seems to be a prime example of the sig constraint vs. static if idea I had in another post (i.e., sig constraints should only define the scope of the overload, and type requirements on arguments within that scope should be inside static ifs in the body of the function / template). From what I can see, makeIndex really should be in a *single* template, probably with no sig constraints (or only very simple ones), and everything else should be inside the template body as static if blocks. Whatever is unclear from the outer sig constraints should be explained in the text of the ddoc. Users shouldn't be expected to be able to parse sig constraints that are really Phobos internal implementation details.
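A toy sketch of that split, not taken from Phobos: the hypothetical `encodingName` keeps one simple constraint that defines what the template accepts, and the per-type details live in static if blocks in the body.

```d
import std.traits : isSomeString;

// Hypothetical example: the constraint only says "any string type";
// the type-specific branches are static ifs inside the body.
string encodingName(S)(S s) if (isSomeString!S)
{
    static if (is(S : const(char)[]))
        return "UTF-8";
    else static if (is(S : const(wchar)[]))
        return "UTF-16";
    else
        return "UTF-32";
}

void main()
{
    assert(encodingName("a") == "UTF-8");
    assert(encodingName("a"w) == "UTF-16");
    assert(encodingName("a"d) == "UTF-32");

    // Anything that is not a string is rejected by the one short constraint.
    static assert(!__traits(compiles, encodingName(42)));
}
```

The trade-off is that the documented signature stays short, while the body has to carry (and explain) the branches that separate overloads would otherwise express.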
 I think things are generally improved as for functionality and
 consistency, but the docs are more debatable.
I agree, functionality is more unified and consistent, but the docs are very newbie-unfriendly. T -- Why can't you just be a nonconformist like everyone else? -- YHL
Jan 09 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/9/14 6:00 PM, H. S. Teoh wrote:
 On Fri, Jan 10, 2014 at 01:34:46AM +0000, Adam D. Ruppe wrote:
 [...]
 Some code differences from the old days:

 * before: converting to and from string was in std.string. Functions
 like toInt, toString, etc. Nowadays, this is all done with
 std.conv.to. The new way is way cool, but a newbie's first place to
 look might be for std.string.toString rather than std.conv.to!string.
Right, so it should be mentioned in std.string. But probably your idea of more concept-oriented overview pages is better. It doesn't seem like the right solution to just insert hyperlinks to std.conv in every other Phobos module.
A tutorial on string manipulation in D would be awesome. Andrei
Jan 09 2014
parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 10 January 2014 at 05:28:24 UTC, Andrei Alexandrescu 
wrote:
 On 1/9/14 6:00 PM, H. S. Teoh wrote:
 On Fri, Jan 10, 2014 at 01:34:46AM +0000, Adam D. Ruppe wrote:
 [...]
 Some code differences from the old days:

 * before: converting to and from string was in std.string. 
 Functions
 like toInt, toString, etc. Nowadays, this is all done with
 std.conv.to. The new way is way cool, but a newbie's first 
 place to
 look might be for std.string.toString rather than 
 std.conv.to!string.
Right, so it should be mentioned in std.string. But probably your idea of more concept-oriented overview pages is better. It doesn't seem like the right solution to just insert hyperlinks to std.conv in every other Phobos module.
A tutorial on string manipulation in D would be awesome. Andrei
That would be a very useful asset.
Jan 10 2014
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-01-10 02:34, Adam D. Ruppe wrote:

 Some code differences from the old days:

 * before: converting to and from string was in std.string. Functions
 like toInt, toString, etc. Nowadays, this is all done with std.conv.to.
 The new way is way cool, but a newbie's first place to look might be for
 std.string.toString rather than std.conv.to!string.
I think it would be good to still have a few alias, like toString and toInt.
 * before: some char type stuff was in std.string (and the rest in
 std.ctype IIRC). Now, it is in std.ascii and std.uni.
std.uni was available in D1 as well. -- /Jacob Carlborg
Jan 10 2014
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-01-10 02:18, Adam D. Ruppe wrote:
 BTW, I'll say it again: it was a *lot* easier to get started with this
 back in the phobos1 days, where std.string WAS the one-stop location for
 string stuff.
There was std.uni back in the D1 days as well ;) -- /Jacob Carlborg
Jan 10 2014
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
10-Jan-2014 05:16, Adam D. Ruppe wrote:
 On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
 So is it 'correct'?
Yes, with the caveat that it might find a surrogate pair (like H followed by an accent code point). That's what byGrapheme is about: combining those pairs.
Not at all. Take time to read the Unicode standard. Surrogate pairs are a part of UTF-16 encoding and little else. -- Dmitry Olshansky
Jan 09 2014
parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
10-Jan-2014 11:49, Dmitry Olshansky wrote:
 10-Jan-2014 05:16, Adam D. Ruppe wrote:
 On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
 So is it 'correct'?
Yes, with the caveat that it might find a surrogate pair (like H followed by an accent code point). That's what byGrapheme is about: combining those pairs.
Not at all. Take time to read the Unicode standard. Surrogate pairs are a part of UTF-16 encoding and little else.
To clarify: a grapheme cluster is not a pair, nor is it a surrogate pair, but H with an accent is a grapheme cluster ;) -- Dmitry Olshansky
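A small sketch of the three levels for exactly that case, assuming a Phobos recent enough to have std.uni.byGrapheme. The string is 'H' followed by a combining acute accent (U+0301) and then 'i'.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    string s = "H\u0301i";   // 'H' + combining acute accent + 'i'

    assert(s.length == 4);                 // UTF-8 code units
    assert(s.walkLength == 3);             // code points
    assert(s.byGrapheme.walkLength == 2);  // grapheme clusters: accented H, then i
}
```

No surrogate pairs are involved here; those only appear when a single code point outside the BMP is encoded in UTF-16.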
Jan 09 2014
prev sibling next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
 The D docs are pretty terrible, they don't do much to help you 
 find what
 you're looking for.
 You have a massive block of function names at the top of the 
 page, you have
 to carefully scan through one by one, hoping that it's named 
 something
 obvious that will stand out to you, and in the event it doesn't 
 have a
 helper function, you need to work out the proper sequence of
 algorithm/range/whatever operations to do what you want (and 
 then repeat
 the process finding the small parts you need across a bunch of 
 modules).
DDox improves on this a bit by giving a table with brief descriptions right up top: http://vibed.org/temp/dlang.org/library/std/string.html Still plenty left to do though.
 Blah! </endrant>
Jan 09 2014
next sibling parent Manu <turkeyman gmail.com> writes:
On 10 January 2014 12:40, Brad Anderson <eco gnuk.net> wrote:

 On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:

 The D docs are pretty terrible, they don't do much to help you find what
 you're looking for.
 You have a massive block of function names at the top of the page, you
 have

 to carefully scan through one by one, hoping that it's named something
 obvious that will stand out to you, and in the event it doesn't have a
 helper function, you need to work out the proper sequence of
 algorithm/range/whatever operations to do what you want (and then repeat
 the process finding the small parts you need across a bunch of modules).
DDox improves on this a bit by giving a table with brief descriptions right up top: http://vibed.org/temp/dlang.org/library/std/string.html Still plenty left to do though.
I prefer this immeasurably.
Jan 09 2014
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2014-01-10 03:40, Brad Anderson wrote:

 DDox improves on this a bit by giving a table with brief
 descriptions right up top:
 http://vibed.org/temp/dlang.org/library/std/string.html
What's the hold up of making the official documentation use DDox? -- /Jacob Carlborg
Jan 10 2014
next sibling parent "Kira Backes" <kira.backes nrwsoft.de> writes:
On Friday, 10 January 2014 at 08:15:19 UTC, Jacob Carlborg wrote:
 What's the hold up of making the official documentation use 
 DDox?
I’m also interested in this, since the current documentation is not beginner-friendly due to missing overview and this hurts D.
Jan 10 2014
prev sibling parent reply Dmitry Olshansky <dmitry.olsh gmail.com> writes:
10-Jan-2014 12:15, Jacob Carlborg wrote:
 On 2014-01-10 03:40, Brad Anderson wrote:

 DDox improves on this a bit by giving a table with brief
 descriptions right up top:
 http://vibed.org/temp/dlang.org/library/std/string.html
What's the hold up of making the official documentation use DDox?
Seconded. -- Dmitry Olshansky
Jan 10 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 1/10/14 12:17 AM, Dmitry Olshansky wrote:
 10-Jan-2014 12:15, Jacob Carlborg wrote:
 On 2014-01-10 03:40, Brad Anderson wrote:

 DDox improves on this a bit by giving a table with brief
 descriptions right up top:
 http://vibed.org/temp/dlang.org/library/std/string.html
What's the hold up of making the official documentation use DDox?
Seconded.
Let's set to switch to ddox with 2.065. Andrei
Jan 10 2014
parent reply "Dicebot" <public dicebot.lv> writes:
On Friday, 10 January 2014 at 08:31:28 UTC, Andrei Alexandrescu 
wrote:
 On 1/10/14 12:17 AM, Dmitry Olshansky wrote:
 10-Jan-2014 12:15, Jacob Carlborg пишет:
 On 2014-01-10 03:40, Brad Anderson wrote:

 DDox improves on this a bit by giving a table with brief
 descriptions right up top:
 http://vibed.org/temp/dlang.org/library/std/string.html
What's the hold up of making the official documentation use DDox?
Seconded.
Let's set to switch to ddox with 2.065. Andrei
Let's not put too much stress on the release/deployment process (which changing the documentation engine will require) and make at least one "simple" release first, just to get everyone familiar with it.
Jan 10 2014
parent reply "Brad Anderson" <eco gnuk.net> writes:
On Friday, 10 January 2014 at 08:39:13 UTC, Dicebot wrote:
 On Friday, 10 January 2014 at 08:31:28 UTC, Andrei Alexandrescu 
 wrote:
 On 1/10/14 12:17 AM, Dmitry Olshansky wrote:
 10-Jan-2014 12:15, Jacob Carlborg пишет:
 On 2014-01-10 03:40, Brad Anderson wrote:

 DDox improves on this a bit by giving a table with brief
 descriptions right up top:
 http://vibed.org/temp/dlang.org/library/std/string.html
What's the hold up of making the official documentation use DDox?
Seconded.
Let's set to switch to ddox with 2.065. Andrei
Let's not put too much stress on the release/deployment process (which changing the documentation engine will require) and make at least one "simple" release first, just to get everyone familiar with it.
Updating the website is almost strictly Andrei's domain so he should be able to do it independently of the release process (though integrating updating the website with the release process should probably happen at some point). ddox was merged with the tools repo 6 months ago and dlang.org 3 months ago so as far as I know all that's left is for Andrei to generate the pages and upload them as 2.065 is completed.
Jan 10 2014
parent "Dicebot" <public dicebot.lv> writes:
On Friday, 10 January 2014 at 16:54:30 UTC, Brad Anderson wrote:
 Updating the website is almost strictly Andrei's domain so he 
 should be able to do it independently of the release process 
 (though integrating updating the website with the release 
 process should probably happen at some point).  ddox was merged 
 with the tools repo 6 months ago and dlang.org 3 months ago so 
 as far as I know all that's left is for Andrei to generate the 
 pages and upload them as 2.065 is completed.
Andrew should have access too, as any release-related updates are supposed to be moved into the domain of the release manager.
Jan 12 2014
prev sibling parent Jerry <jlquinn optonline.net> writes:
"Brad Anderson" <eco gnuk.net> writes:

 On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
 The D docs are pretty terrible, they don't do much to help you find what
 you're looking for.
 You have a massive block of function names at the top of the page, you have
 to carefully scan through one by one, hoping that it's named something
 obvious that will stand out to you, and in the event it doesn't have a
 helper function, you need to work out the proper sequence of
 algorithm/range/whatever operations to do what you want (and then repeat
 the process finding the small parts you need across a bunch of modules).
DDox improves on this a bit by giving a table with brief descriptions right up top: http://vibed.org/temp/dlang.org/library/std/string.html Still plenty left to do though.
This looks much nicer as a summary. I would personally prefer to have the details all on the same page below, rather than having to jump to a new page for each different function. Still, thumbs up! Jerry
Jan 17 2014
prev sibling next sibling parent reply "Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
 The D docs are pretty terrible, they don't do much to help you 
 find what
 you're looking for.
 You have a massive block of function names at the top of the 
 page, you have
 to carefully scan through one by one, hoping that it's named 
 something
 obvious that will stand out to you, and in the event it doesn't 
 have a
 helper function, you need to work out the proper sequence of
 algorithm/range/whatever operations to do what you want (and 
 then repeat
 the process finding the small parts you need across a bunch of 
 modules).
I find this to be true in other languages, except the "block of function names." [...] Google it and find some StackOverflow page with the answer. In Java, I Google it and find a Java API page (this was mostly before StackOverflow took over). In D, I have a general idea of where I need to be. Maybe there are a couple of modules to look at. Searching isn't as effective; there just aren't enough arbitrary tutorials on how to do the most basic of things to be able to find those basic things. [...] need isn't fun. But it can be a little better if you know which class you need.
Jan 09 2014
parent Jacob Carlborg <doob me.com> writes:
On 2014-01-10 03:40, Jesse Phillips wrote:


 isn't fun. But it can be a little better if you know which class you need.
It's easier in a more object oriented language. It's most likely that -- /Jacob Carlborg
Jan 10 2014
prev sibling parent "Jesse Phillips" <Jesse.K.Phillips+D gmail.com> writes:
On Friday, 10 January 2014 at 00:56:36 UTC, Manu wrote:
 On 10 January 2014 04:00, Adam D. Ruppe 
 <destructionator gmail.com> wrote:

 On Thursday, 9 January 2014 at 17:54:05 UTC, Dicebot wrote:

 It is not the same thing as sample with byGrapheme though.
Right, but it works for ascii (and others) and shows std.string isn't as weak as being said in this thread.
So is it 'correct'?
It is interesting that you ask this about the D code but not about the C function you're trying to mimic, which is not correct.
Jan 09 2014
prev sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 10, 2014 at 10:56:27AM +1000, Manu wrote:
[...]
 The D docs are pretty terrible, they don't do much to help you find
 what you're looking for.
 You have a massive block of function names at the top of the page,
Yeah, that blob of links is useless unless you already knew what you were looking for (kinda defeats the purpose). The hand-classified table of functions in std.algorithm and std.range is more useful, IMO. At least it lets you use divide-and-conquer to zoom down to your area of interest, whereas the order of links in the blob of links has no relation whatsoever to the functionality provided. The order of docs for each symbol also follows the order in the source code, which may not necessarily follow a logical order. This makes browsing the docs difficult -- one minute it's describing find() overloads, next minute it's talking about set unions, then after that it's back to findAfter(), then it jumps to remove(), etc.. Try finding what you want when the docs are 50 pages of this random jumping around. All the more this makes a hand-classified table of symbols indispensable.
 you have to carefully scan through one by one, hoping that it's named
 something obvious that will stand out to you, and in the event it
 doesn't have a helper function, you need to work out the proper
 sequence of algorithm/range/whatever operations to do what you want
 (and then repeat the process finding the small parts you need across a
 bunch of modules).
[...] I will say, though, that taking the time to learn where things are in std.algorithm, std.range, std.string, and std.array helps immensely in knowing where to look for things in the future. This doesn't excuse the poor state of the docs, of course, nor the non-intuitive placement of some of the functions in Phobos, but you're likely to feel far less frustrated if you took the time to familiarize yourself with where things are. :) I usually don't have too much trouble finding what I need when it comes to string manipulation. But then again, when I fail to find something within 15 seconds of looking at the obvious places, I just import std.regex and proceed to crush the proverbial ant with the proverbial nuclear warhead. :-P T -- It's bad luck to be superstitious. -- YHL
Jan 09 2014
parent Jacob Carlborg <doob me.com> writes:
On 2014-01-10 02:13, H. S. Teoh wrote:

 The hand-classified table of functions in std.algorithm and std.range is
 more useful, IMO. At least it lets you use divide-and-conquer to zoom
 down to your area of interest, whereas the order of links in the blob of
 links has no relation whatsoever to the functionality provided.
I'm convinced that both of the tables in the std.algorithm documentation can be generated automatically with a bit of help from the compiler. Add a macro, $(CATEGORY), that the compiler knows about. The compiler will then generate the first table by using the symbol (which it already knows about) and the $(CATEGORY) macro. The second table can be generated in a similar way: just take the summary (first paragraph) of the documentation of the symbol.
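To make the idea concrete, here is a rough sketch of how a symbol might be tagged. Note that $(CATEGORY) is the hypothetical macro proposed above, not something DDoc or the compiler treats specially today; the Macros: section only gives it a plain expansion so the comment still renders, and indexOfValue is an invented example function.

```d
/**
 * Returns the index of the first element equal to needle, or -1 if none.
 *
 * The first paragraph above is the summary a generated table could reuse.
 *
 * $(CATEGORY Searching)
 *
 * Macros:
 *     CATEGORY = $(B Category:) $0
 */
ptrdiff_t indexOfValue(const(int)[] haystack, int needle)
{
    // Linear scan; the implementation is beside the point here,
    // the doc comment is what a generator would read.
    foreach (i, value; haystack)
        if (value == needle)
            return cast(ptrdiff_t) i;
    return -1;
}
```

A generator would then only need the symbol name, the category argument and the first paragraph to build both tables.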
 The order of docs for each symbol also follows the order in the source
 code, which may not necessarily follow a logical order. This makes
 browsing the docs difficult -- one minute it's describing find()
 overloads, next minute it's talking about set unions, then after that
 it's back to findAfter(), then it jumps to remove(), etc.. Try finding
 what you want when the docs are 50 pages of this random jumping around.
 All the more this makes a hand-classified table of symbols
 indispensable.
I would say that is poorly organized code. Although, if you do have a $(CATEGORY) macro, as described above, it might be a good idea to group the rest of the documentation by category as well. -- /Jacob Carlborg
Jan 10 2014
prev sibling next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
 This works fine:
   string x = find("Hello", 'H');

 This doesn't:
   string y = find(retro("Hello"), 'H');
   > Error: cannot implicitly convert expression 
 (find(retro("Hello"), 'H'))
 of type Result!() to string

 Is that wrong? That seems to be how the docs suggest it should 
 be used.

 On a side note, am I the only one that finds 
 std.algorithm/std.range/etc
 for string processing really obtuse?
 I can rarely understand the error messages, so say it's better 
 than STL is
 optimistic.
I absolutely hate the "does not match any template declaration" error. It's extremely unhelpful for figuring out what you need to do and anytime I try to do something fun with ranges I can expect to see it a dozen times.
 Using std.algorithm and std.range to do string manipulation 
 feels really
 lame to me.
 I hate looking through the docs of 3-4 modules to understand 
 the complete
 set of useful string operations (std.string, std.uni, 
 std.algorithm,
 std.range... at least).
I've finally started to get the hang of what stuff is in what module but it's taken me a couple years. Things like File being in std.stdio instead of the more intuitive std.file are confusing enough but with strings you end up having to look in std.string, std.array, std.algorithm, std.range, std.format, and std.uni (and there are probably more than that).
 I also find the names of the generic algorithms are often 
 unrelated to the
 name of the string operation.
 My feeling is, everyone is always on about how cool D is at 
 string, but
 other than 'char[]', and the builtin slice operator, I feel 
 really
 unproductive whenever I do any heavy string manipulation in D.
I actually feel a lot more productive in D than in C++ with strings. Boost's string algorithms library helps fill the gap (and at least you only have one place to look for documentation when you are using it) but overall I prefer my experience working in D with pseudo-member chains.
Jan 09 2014
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
[...]
On a side note, am I the only one that finds
std.algorithm/std.range/etc for string processing really obtuse?  I
can rarely understand the error messages, so say it's better than STL
is optimistic.
I absolutely hate the "does not match any template declaration" error. It's extremely unhelpful for figuring out what you need to do and anytime I try to do something fun with ranges I can expect to see it a dozen times.
Yeah, that error drives me up the wall too. I often get screenfuls of errors, dumping 25 or so overloads of some obscure Phobos internal function (like toImpl) as though an end-user would understand any of it. You have to parse all the sig constraints (and boy some of them are obscure), *understand* what they mean (which requires understanding how Phobos works internally), and *then* try to figure out, by elimination, which is the one that you intended to match, and why your code failed to match it.

I'm almost tempted to say that using sig constraints to differentiate between template overloads is a bad idea. Instead, consider this alternative implementation of toImpl:

    template toImpl(S,T)
        // N.B.: no sig constraints here
    {
        static if (... /* sig constraint conditions for overload 1 */)
        {
            S toImpl(T t)
            {
                // implementation here
            }
        }
        else static if (... /* sig constraint conditions for overload 2 */)
        {
            S toImpl(T t)
            {
                // implementation here
            }
        }
        ...
        else // N.B.: user-readable error message
        {
            static assert(0, "Unable to convert " ~ T.stringof ~ " to " ~ S.stringof);
        }
    }

By putting all overloads inside a single template, we can give a useful default message when no overloads match.

Alternatively, maybe sig constraints can have an additional string parameter that specifies a message that explains why that particular overload was rejected. These messages are not displayed if at least one overload matches; only if no overload matches, they will be displayed (so that the user can at least see why each of the overloads didn't match).

[...]
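For illustration, a small compilable variant of the single-template layout described above. This is only a sketch: convertTo is a hypothetical stand-in and not Phobos's toImpl, the two branches are arbitrary choices, and the template is instantiated explicitly with both type arguments.

```d
import std.conv : to;
import std.stdio : writeln;

// Hypothetical stand-in for toImpl: one template, branches chosen by static if.
template convertTo(S, T)
{
    static if (is(T : S))
    {
        // T converts implicitly to S, so a plain return is enough.
        S convertTo(T t) { return t; }
    }
    else static if (is(S == string))
    {
        // Anything else can be turned into a string via std.conv.to.
        S convertTo(T t) { return to!string(t); }
    }
    else
    {
        // Single, readable message when no branch applies.
        static assert(0, "Unable to convert " ~ T.stringof ~ " to " ~ S.stringof);
    }
}

void main()
{
    writeln(convertTo!(long, int)(5));
    writeln(convertTo!(string, int)(42));
    // convertTo!(int, string)("7") would stop with the static assert message.
}
```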
I also find the names of the generic algorithms are often unrelated
to the name of the string operation.  My feeling is, everyone is
always on about how cool D is at string, but other than 'char[]', and
the builtin slice operator, I feel really unproductive whenever I do
any heavy string manipulation in D.
Really?? I find myself much more productive, because I only have to learn one set of generic algorithms, and I can use them not just for strings but for all sorts of other stuff that implements the range API. Whereas in languages like C, sure you get familiar with string-specific functions, but then when you need a similar-operating function for an array of ints, you have to name it something else, and then basically the same algorithm gets reimplemented for linked lists, called by yet another name, etc. Added together, it's many times more mental load than just learning a single set of generic algorithms that work on (almost) everything.

The composability of generic algorithms also allows me to think on a more abstract level: instead of thinking about manipulating individual chars, I can figure out OK, if I split the string by "," then I can filter for the strings I'm looking for, then join them back again with another delimiter. Since the same set of algorithms works with other ranges too, I can apply exactly the same thought process for working with arrays, linked lists, and other containers, without having to remember 5 different names of essentially the same algorithm but applied to 5 different types.
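The split/filter/join chain described above might look roughly like this. It is a minimal sketch: the helper names keepPrefixed and keepEven and the sample data are made up for the example, and the second helper only shows the same filter step applied to an int array.

```d
import std.algorithm : filter, splitter, startsWith;
import std.array : array, join;
import std.stdio : writeln;

// Split on ",", keep the fields that start with prefix, rejoin with ";".
string keepPrefixed(string csv, string prefix)
{
    return csv.splitter(",")
              .filter!(field => field.startsWith(prefix))
              .join(";");
}

// The same filter algorithm, applied to a range of ints instead of strings.
int[] keepEven(int[] values)
{
    return values.filter!(n => n % 2 == 0).array;
}

void main()
{
    writeln(keepPrefixed("apple,banana,avocado,cherry", "a"));
    writeln(keepEven([1, 2, 3, 4]));
}
```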
 I actually feel a lot more productive in D than in C++ with strings.
 Boost's string algorithms library helps fill the gap (and at least
 you only have one place to look for documentation when you are using
 it) but overall I prefer my experience working in D with
 pseudo-member chains.
I found that what I got out of taking the time to learn std.algorithm and std.range was worth far more than the effort invested. T -- Claiming that your operating system is the best in the world because more people use it is like saying McDonalds makes the best food in the world. -- Carl B. Constantine
Jan 09 2014
next sibling parent reply "Brad Anderson" <eco gnuk.net> writes:
On Thursday, 9 January 2014 at 20:40:33 UTC, H. S. Teoh wrote:
 On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
[...]
On a side note, am I the only one that finds
std.algorithm/std.range/etc for string processing really 
obtuse?  I
can rarely understand the error messages, so say it's better 
than STL
is optimistic.
I absolutely hate the "does not match any template declaration" error. It's extremely unhelpful for figuring out what you need to do and anytime I try to do something fun with ranges I can expect to see it a dozen times.
Yeah, that error drives me up the wall too. I often get screenfuls of errors, dumping 25 or so overloads of some obscure Phobos internal function (like toImpl) as though an end-user would understand any of it. You have to parse all the sig constraints (and boy some of them are obscure), *understand* what they mean (which requires understanding how Phobos works internally), and *then* try to figure out, by elimination, which is the one that you intended to match, and why your code failed to match it.

I'm almost tempted to say that using sig constraints to differentiate between template overloads is a bad idea. Instead, consider this alternative implementation of toImpl:

    template toImpl(S,T)
        // N.B.: no sig constraints here
    {
        static if (... /* sig constraint conditions for overload 1 */)
        {
            S toImpl(T t)
            {
                // implementation here
            }
        }
        else static if (... /* sig constraint conditions for overload 2 */)
        {
            S toImpl(T t)
            {
                // implementation here
            }
        }
        ...
        else // N.B.: user-readable error message
        {
            static assert(0, "Unable to convert " ~ T.stringof ~ " to " ~ S.stringof);
        }
    }

By putting all overloads inside a single template, we can give a useful default message when no overloads match.
Interesting and there is a lot of flexibility there. It does make the functions a lot more verbose though for something that is really the compiler's job (clearly describing errors).
 Alternatively, maybe sig constraints can have an additional 
 string
 parameter that specifies a message that explains why that 
 particular
 overload was rejected. These messages are not displayed if at 
 least one
 overload matches; only if no overload matches, they will be 
 displayed
 (so that the user can at least see why each of the overloads 
 didn't
 match).
Each constraint would have a string? I think that would help for some of the more obscure constraints that aren't wrapped up in an eponymous template helper but I don't think it'd help with the problem generally because the problem is identifying which exact constraint failed.

Example:

    void main()
    {
      import std.algorithm, std.range;
      struct A { }
      auto a = recurrence!"n"(0).take(5).find(A());
    }

This is the error message you get:

---
/d14/f101.d(5): Error: template std.algorithm.find does not match
any function template declaration. Candidates are:
/opt/compilers/dmd2/include/std/algorithm.d(3650):
std.algorithm.find(alias pred = "a == b", R, E)(R haystack, E
needle) if (isInputRange!R &&
is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
/opt/compilers/dmd2/include/std/algorithm.d(3713):
std.algorithm.find(alias pred = "a == b", R1, R2)(R1 haystack, R2
needle) if (isForwardRange!R1 && isForwardRange!R2 &&
is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool) &&
!isRandomAccessRange!R1)
/opt/compilers/dmd2/include/std/algorithm.d(3749):
std.algorithm.find(alias pred = "a == b", R1, R2)(R1 haystack, R2
needle) if (isRandomAccessRange!R1 && isBidirectionalRange!R2 &&
is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool))
/opt/compilers/dmd2/include/std/algorithm.d(3821):
std.algorithm.find(alias pred = "a == b", R1, R2)(R1 haystack, R2
needle) if (isRandomAccessRange!R1 && isForwardRange!R2 &&
!isBidirectionalRange!R2 && is(typeof(binaryFun!pred(haystack.front,
needle.front)) : bool))
/opt/compilers/dmd2/include/std/algorithm.d(4053):
std.algorithm.find(alias pred = "a == b", Range, Ranges...)(Range
haystack, Ranges needles) if (Ranges.length > 1 &&
is(typeof(startsWith!pred(haystack, needles))))
---

Where do you even begin with that flood of information? To fix it all you really want to see is which constraint you didn't satisfy.

An error message like this would help greatly:

---
/d539/f571.d(5): Error: template std.algorithm.find call fails all
constraints. Candidates are:
/opt/compilers/dmd2/include/std/algorithm.d:
  (3650) find(alias pred = "a == b", R, E)(R haystack, E needle):
              isInputRange!R
           && is(typeof(binaryFun!pred(haystack.front, needle)) : bool) <- FAILS
  (3713) find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle):
              isForwardRange!R1
           && isForwardRange!R2 <- FAILS
           && is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool)
           && !isRandomAccessRange!R1
  (3749) find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle):
              isRandomAccessRange!R1 <- FAILS
           && isBidirectionalRange!R2
           && is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool)
  (3821) find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle)
              isRandomAccessRange!R1 <- FAILS
           && isForwardRange!R2
           && !isBidirectionalRange!R2
           && is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool)
  (4053) find(alias pred = "a == b", Range, Ranges...)(Range haystack, Ranges needles)
              Ranges.length > 1 <-- FAILS
           && is(typeof(startsWith!pred(haystack, needles)))
---

The NG line limit will probably mangle that and I'm assuming constraints are short-circuited. The exact appearance isn't as important as just pointing out the failing constraints as strongly as you can.
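Until the compiler reports this itself, one way to narrow down a failing constraint by hand is to test its pieces one at a time with static assert. This is only a sketch of that workaround: it swaps the recurrence call for a plain iota(5) to keep things short, and it checks just the parts of the constraints that matter for a struct needle.

```d
import std.algorithm : find;
import std.range : iota, isForwardRange, isInputRange;

struct A { }

void main()
{
    auto haystack = iota(5);
    alias R = typeof(haystack);

    // First overload: the haystack side of the constraint is satisfied...
    static assert(isInputRange!R);
    // ...but an int element cannot be compared with an A, so this part fails.
    static assert(!is(typeof(haystack.front == A()) : bool));

    // Range-needle overloads: A is not a range at all.
    static assert(!isForwardRange!A);

    // Consequently the call as a whole is rejected.
    static assert(!__traits(compiles, find(haystack, A())));
}
```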
 [...]
I also find the names of the generic algorithms are often 
unrelated
to the name of the string operation.  My feeling is, everyone 
is
always on about how cool D is at string, but other than 
'char[]', and
the builtin slice operator, I feel really unproductive 
whenever I do
any heavy string manipulation in D.
Really?? I find myself much more productive, because I only have to learn one set of generic algorithms, and I can use them not just for strings but for all sorts of other stuff that implement the range API. Whereas in languages like C, sure you get familiar with string-specific functions, but then when you need a similar-operating function for an array of ints, you have to name it something else, and then basically the same algorithm reimplemented for linked lists, called by yet another name, etc.. Added together, it's many times more mental load than just learning a single set of generic algorithms that work on (almost) everything. The composability of generic algorithms also allow me to think on a more abstract level -- instead of thinking about manipulating individual chars, I can figure out OK, if I split the string by "," then I can filter for the strings I'm looking for, then join them back again with another delimiter. Since the same set of algorithms work with other ranges too, I can apply exactly the same thought process for working with arrays, linked lists, and other containers, without having to remember 5 different names of essentially the same algorithm but applied to 5 different types.
 I actually feel a lot more productive in D than in C++ with 
 strings.
 Boost's string algorithms library helps fill the gap (and at 
 least
 you only have one place to look for documentation when you are 
 using
 it) but overall I prefer my experience working in D with
 pseudo-member chains.
I found that what I got out of taking the time to learn std.algorithm and std.range was worth far more than the effort invested.
Agreed. Except for some hiccups and those terrible error messages I find std.algorithm and std.range to be a work of genius. I envy them every day while I'm stuck using C++ at work.
 T
Jan 09 2014
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Jan 09, 2014 at 11:28:07PM +0000, Brad Anderson wrote:
 On Thursday, 9 January 2014 at 20:40:33 UTC, H. S. Teoh wrote:
On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
[...]
On a side note, am I the only one that finds
std.algorithm/std.range/etc for string processing really obtuse?  I
can rarely understand the error messages, so say it's better than
STL is optimistic.
I absolutely hate the "does not match any template declaration" error. It's extremely unhelpful for figuring out what you need to do and anytime I try to do something fun with ranges I can expect to see it a dozen times.
Yeah, that error drives me up the wall too. I often get screenfuls of errors, dumping 25 or so overloads of some obscure Phobos internal function (like toImpl) as though an end-user would understand any of it. You have to parse all the sig constraints (and boy some of them are obscure), *understand* what they mean (which requires understanding how Phobos works internally), and *then* try to figure out, by elimination, which is the one that you intended to match, and why your code failed to match it.

I'm almost tempted to say that using sig constraints to differentiate between template overloads is a bad idea. Instead, consider this alternative implementation of toImpl:

    template toImpl(S,T)
        // N.B.: no sig constraints here
    {
        static if (... /* sig constraint conditions for overload 1 */)
        {
            S toImpl(T t)
            {
                // implementation here
            }
        }
        else static if (... /* sig constraint conditions for overload 2 */)
        {
            S toImpl(T t)
            {
                // implementation here
            }
        }
        ...
        else // N.B.: user-readable error message
        {
            static assert(0, "Unable to convert " ~ T.stringof ~ " to " ~ S.stringof);
        }
    }

By putting all overloads inside a single template, we can give a useful default message when no overloads match.
Interesting and there is a lot of flexibility there. It does make the functions a lot more verbose though for something that is really the compiler's job (clearly describing errors).
The way I see it, any sig constraints should go on the outer template, and should define the *logical* scope of all overloads encompassed therein. E.g., if you have a set of overloads for sqrt, say, then the outer template would have a sig constraint that matches any number-like type. Within the template, each individual overload would handle various concrete types, and the static assert at the end is essentially saying "in theory your arguments should match *something* in this template, but currently your particular combination of types isn't implemented by any overload".

Or, put another way, the outer template represents the "logical" meta-function that does some given task (e.g., sqrt computes the square root of *something*), whereas the inner overloads provide the actual set of available implementations that implement that meta-function (computes the square root of an int, computes the square root of a float, etc.).

My hypothesis is that you get the wall-of-template-errors problem when there's a logical meta-function (or a small number of them) that, for implementational reasons, consists of a large set of overloads. By treating the logical meta-function as an actual entity (the outer template), we can give a unified error message of failure to implement the meta-function for the requested types, rather than many error messages for each of the many overloads, most of which are irrelevant to the user.
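A small sketch of the sqrt case, to show the two layers. The name mySqrt and the particular set of implemented types are assumptions made for the example; the point is only that the constraint states the logical scope (any numeric type) while the static if chain lists what is actually implemented.

```d
import std.math : sqrt;
import std.stdio : writeln;
import std.traits : isFloatingPoint, isNumeric;

// Hypothetical wrapper; the constraint defines the logical scope.
auto mySqrt(T)(T x)
    if (isNumeric!T)
{
    static if (isFloatingPoint!T)
    {
        // float, double and real go straight to std.math.sqrt.
        return sqrt(x);
    }
    else static if (is(T == int) || is(T == long))
    {
        // Integer variant: compute in double, truncate back to T.
        return cast(T) sqrt(cast(double) x);
    }
    else
    {
        // In scope (numeric), but nobody has written this case yet.
        static assert(0, "mySqrt: no implementation yet for " ~ T.stringof);
    }
}

void main()
{
    writeln(mySqrt(2.25));
    writeln(mySqrt(16));
    // mySqrt(cast(short) 4) stops with the static assert message;
    // mySqrt("four") is rejected by the constraint itself.
}
```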
Alternatively, maybe sig constraints can have an additional string
parameter that specifies a message that explains why that particular
overload was rejected. These messages are not displayed if at least
one overload matches; only if no overload matches, they will be
displayed (so that the user can at least see why each of the
overloads didn't match).
Each constraint would have a string? I think that would help for some of the more obscure constraints that aren't wrapped up in an eponymous template helper but I don't think it'd help with the problem generally because the problem is identifying which exact constraint failed.
True.
 Example:
 
     void main()
     {
       import std.algorithm, std.range;
       struct A { }
       auto a = recurrence!"n"(0).take(5).find(A());
     }
 
 This is the error message you get:
 
 ---
 /d14/f101.d(5): Error: template std.algorithm.find does not match
 any function template declaration. Candidates are:
 /opt/compilers/dmd2/include/std/algorithm.d(3650):
 std.algorithm.find(alias pred = "a == b", R, E)(R haystack, E
 needle) if (isInputRange!R &&
 is(typeof(binaryFun!pred(haystack.front, needle)) : bool))
 /opt/compilers/dmd2/include/std/algorithm.d(3713):
 std.algorithm.find(alias pred = "a == b", R1, R2)(R1 haystack, R2
 needle) if (isForwardRange!R1 && isForwardRange!R2 &&
 is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool) &&
 !isRandomAccessRange!R1)
 /opt/compilers/dmd2/include/std/algorithm.d(3749):
 std.algorithm.find(alias pred = "a == b", R1, R2)(R1 haystack, R2
 needle) if (isRandomAccessRange!R1 && isBidirectionalRange!R2 &&
 is(typeof(binaryFun!pred(haystack.front, needle.front)) : bool))
 /opt/compilers/dmd2/include/std/algorithm.d(3821):
 std.algorithm.find(alias pred = "a == b", R1, R2)(R1 haystack, R2
 needle) if (isRandomAccessRange!R1 && isForwardRange!R2 &&
 !isBidirectionalRange!R2 && is(typeof(binaryFun!pred(haystack.front,
 needle.front)) : bool))
 /opt/compilers/dmd2/include/std/algorithm.d(4053):
 std.algorithm.find(alias pred = "a == b", Range, Ranges...)(Range
 haystack, Ranges needles) if (Ranges.length > 1 &&
 is(typeof(startsWith!pred(haystack, needles))))
 ---
 
 Where do you even begin with that flood of information? To fix it
 all you really want to see is which constraint you didn't satisfy.
 An error message like this would help greatly:
 
 ---
 /d539/f571.d(5): Error: template std.algorithm.find call fails all
 constraints. Candidates are:
 /opt/compilers/dmd2/include/std/algorithm.d:
   (3650) find(alias pred = "a == b", R, E)(R haystack, E needle):
               isInputRange!R
            && is(typeof(binaryFun!pred(haystack.front, needle)) :
 bool) <- FAILS
   (3713) find(alias pred = "a == b", R1, R2)(R1 haystack, R2
 needle):
               isForwardRange!R1
            && isForwardRange!R2 <- FAILS
            && is(typeof(binaryFun!pred(haystack.front,
 needle.front)) : bool)
            && !isRandomAccessRange!R1
   (3749) find(alias pred = "a == b", R1, R2)(R1 haystack, R2
 needle):
               isRandomAccessRange!R1 <- FAILS
            && isBidirectionalRange!R2
            && is(typeof(binaryFun!pred(haystack.front,
 needle.front)) : bool)
   (3821) find(alias pred = "a == b", R1, R2)(R1 haystack, R2 needle)
               isRandomAccessRange!R1 <- FAILS
            && isForwardRange!R2
            && !isBidirectionalRange!R2
            && is(typeof(binaryFun!pred(haystack.front,
 needle.front)) : bool)
   (4053) find(alias pred = "a == b", Range, Ranges...)(Range
 haystack, Ranges needles)
               Ranges.length > 1 <-- FAILS
            && is(typeof(startsWith!pred(haystack, needles)))
But still, this will dump out a whole bunch of overloads that aren't necessarily interesting to the user. I mean, if I want to search for an int in an int[], but accidentally passed a string instead of an int, then I'm really only interested in seeing how the overload that handles int[] searching failed to match my string argument; I don't care about why the sig constraints failed for the overload that handles linked lists, for example.

Perhaps a better solution lies in distinguishing the logical scope of the function, vs. requirements on its argument types within that scope. For example, the find() overload that searches T[] for some T, has T[] as its scope, whereas within this scope, it imposes certain requirements on the needle U (e.g., U must be comparable with an element of T). This suggests that it should be implemented like this:

    auto find(R,S)(R haystack, S needle)
        if (is(R == T[], T)) // <-- defines the scope of this function
    {
        static if (isComparable(S, ElementType!R))
            // <-- Defines type requirements within this function's scope
        {
            // implementation here
        }
        else static if (isComparable(ElementType!S, ElementType!R))
            // <-- ditto
        {
            // implementation here
        }
        else static assert(0, "Don't know how to search for " ~
                S.stringof ~ " in " ~ R.stringof);
    }

Then when you try to search for a string in an int[], for example, it will first match this overload of find(), then fail the static if conditions because the needle you passed in doesn't match the type requirements for searching an int[].

Note that I've grouped at least two of the current find() overloads under a single overload above -- because they are just two implementations for handling two cases within the same scope: searching an array. The fact that array-searching is implemented by two distinct algorithms is irrelevant to the user, and so it makes sense to "hide" them inside a single function's body.
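A compilable variant of the sketch above, for anyone who wants to try it. Several things here are assumptions: findInArray is a made-up name chosen to avoid clashing with the real find, isComparable is a hypothetical helper rather than a Phobos trait, and both branches simply forward to std.algorithm's find instead of carrying their own implementations. A struct needle is used to trigger the error.

```d
import std.algorithm : find;
import std.range : ElementType, isForwardRange;
import std.stdio : writeln;

// Hypothetical helper: true if values of A and B can be compared with ==.
enum isComparable(A, B) = is(typeof(A.init == B.init) : bool);

auto findInArray(R, S)(R haystack, S needle)
    if (is(R == T[], T))        // scope: the haystack is an array
{
    static if (isComparable!(ElementType!R, S))
    {
        // Needle is a single element.
        return find(haystack, needle);
    }
    else static if (isForwardRange!S
                    && isComparable!(ElementType!R, ElementType!S))
    {
        // Needle is itself a range of comparable elements.
        return find(haystack, needle);
    }
    else
    {
        static assert(0, "Don't know how to search for " ~ S.stringof
                         ~ " in " ~ R.stringof);
    }
}

struct Opaque { }

void main()
{
    writeln(findInArray([1, 2, 3, 4], 3));
    writeln(findInArray([1, 2, 3, 4], [2, 3]));
    // findInArray([1, 2, 3], Opaque()) stops with the static assert message.
}
```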
So to summarize: (1) use sig constraints to define the scope of an overload; and (2) use static if inside the function body (or template body) to enforce type requirements within that scope. This solves the problem of needing the compiler to somehow read your mind and figure out exactly which of the 56 overloads of find() you intended to match but failed to. T -- The only difference between male factor and malefactor is just a little emptiness inside.
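For readers who want to try the two-step layout, here is a minimal compilable sketch of it, not the Phobos implementation. The name findIn is hypothetical (chosen so it does not clash with std.algorithm.find), the scope is expressed with std.traits.isDynamicArray, and the "comparable" checks are written as plain is(typeof(...)) tests as an assumption about what isComparable was meant to do.

```d
import std.traits : isDynamicArray;

// Hypothetical sketch; findIn is not a Phobos function.
// (1) The constraint only defines the scope: any dynamic array haystack.
R findIn(R, S)(R haystack, S needle)
    if (isDynamicArray!R)
{
    // (2) Inside that scope, static if selects an implementation.
    static if (is(typeof(haystack[0] == needle) : bool))
    {
        // Needle is a single element: linear scan.
        foreach (i, e; haystack)
            if (e == needle)
                return haystack[i .. $];
        return haystack[$ .. $];
    }
    else static if (is(typeof(needle.length) : size_t) &&
                    is(typeof(haystack[0] == needle[0]) : bool))
    {
        // Needle is itself a sequence: naive sub-array search.
        if (needle.length <= haystack.length)
            foreach (i; 0 .. haystack.length - needle.length + 1)
            {
                size_t j = 0;
                while (j < needle.length && haystack[i + j] == needle[j])
                    ++j;
                if (j == needle.length)
                    return haystack[i .. $];
            }
        return haystack[$ .. $];
    }
    else
        static assert(0, "Don't know how to search for " ~
            S.stringof ~ " in " ~ R.stringof);
}

// A type that cannot be compared with int, used to trigger the message.
struct Unrelated {}

unittest
{
    assert(findIn([1, 2, 3, 4], 3) == [3, 4]);
    assert(findIn([1, 2, 3, 4], [2, 3]) == [2, 3, 4]);
    assert(findIn("Hello", 'l') == "llo");
    assert(findIn([1, 2, 3], 9).length == 0);
    // In scope (an array haystack) but unsupported needle: static assert fires.
    static assert(!__traits(compiles, findIn([1, 2, 3], Unrelated())));
    // Out of scope (haystack is not an array): the constraint rejects it.
    static assert(!__traits(compiles, findIn(42, 1)));
}
```

Compiling with dmd -unittest -main runs the checks. Note that this sketch uses Unrelated rather than a string needle to trigger the custom message, because whether a character compares equal to an int is a separate question from the layout being illustrated.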
Jan 09 2014
parent reply "Brad Anderson" <eco gnuk.net> writes:
On Friday, 10 January 2014 at 00:52:27 UTC, H. S. Teoh wrote:
 <snip>

 So to summarize:
 (1) use sig constraints to define the scope of an overload; and
 (2) use static if inside the function body (or template body) 
 to enforce
 type requirements within that scope.

 This solves the problem of needing the compiler to somehow read 
 your
 mind and figure out exactly which of the 56 overloads of find() 
 you
 intended to match but failed to.


 T
Ok, you've convinced me. I still think highlighting which constraints failed should happen but for well implemented modules like those in the standard library your approach offers even more helpful and tight error messages.
Jan 09 2014
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 01/10/2014 02:19 AM, Brad Anderson wrote:
 On Friday, 10 January 2014 at 00:52:27 UTC, H. S. Teoh wrote:
 <snip>

 So to summarize:
 (1) use sig constraints to define the scope of an overload; and
 (2) use static if inside the function body (or template body) to enforce
 type requirements within that scope.

 This solves the problem of needing the compiler to somehow read your
 mind and figure out exactly which of the 56 overloads of find() you
 intended to match but failed to.


 T
Ok, you've convinced me. I still think highlighting which constraints failed should happen but for well implemented modules like those in the standard library your approach offers even more helpful and tight error messages.
static assert is not a good way to implement custom error messages because it also changes the behaviour of the declaration.
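One possible reading of this objection, sketched with hypothetical describe and describeAll functions (neither is from Phobos): a template that rejects a type through its constraint leaves room for another overload to handle it, while an unconstrained template that fails through static assert still claims the call.

```d
// Style 1: the constraint rejects what the overload cannot handle.
string describe(T)(T value)
    if (is(T : long))
{
    return "integer";
}

// A second overload can cover strings, because the first one
// simply does not match them.
string describe(T)(T value)
    if (is(T : const(char)[]))
{
    return "text";
}

// Style 2: no constraint; unsupported types reach a static assert.
string describeAll(T)(T value)
{
    static if (is(T : long))
        return "integer";
    else
        static assert(0, "describeAll: unsupported type " ~ T.stringof);
}

// Adding a string overload next to the unconstrained template does not
// help: both templates now match a string argument.
string describeAll(T)(T value)
    if (is(T : const(char)[]))
{
    return "text";
}

unittest
{
    assert(describe(42) == "integer");
    assert(describe("hi") == "text");
    assert(describeAll(42) == "integer");
    // Ambiguous: the unconstrained template and the string overload both match.
    static assert(!__traits(compiles, describeAll("hi")));
}
```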
Jan 09 2014
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 10, 2014 at 04:03:53AM +0100, Timon Gehr wrote:
 On 01/10/2014 02:19 AM, Brad Anderson wrote:
On Friday, 10 January 2014 at 00:52:27 UTC, H. S. Teoh wrote:
<snip>

So to summarize:
(1) use sig constraints to define the scope of an overload; and
(2) use static if inside the function body (or template body) to
enforce type requirements within that scope.

This solves the problem of needing the compiler to somehow read your
mind and figure out exactly which of the 56 overloads of find() you
intended to match but failed to.


T
Ok, you've convinced me. I still think highlighting which constraints failed should happen but for well implemented modules like those in the standard library your approach offers even more helpful and tight error messages.
static assert is not a good way to implement custom error messages because it also changes the behaviour of the declaration.
It's not just about custom error messages; it's about picking up a particular template signature even when you don't have an implementation for it, because logically speaking, your set of overloads *should* pick up all such instantiations to begin with. This is why I differentiated between the scope of a template, vs. the actual available overloads. With sig constraints, you're declining instantiations that don't satisfy certain conditions; I'm arguing that sometimes you *want* to accept instantiations that you currently don't implement (yet), because it falls under the logical scope of what you intend to handle; you just haven't gotten around to actually implementing it yet. T -- Stop staring at me like that! It's offens... no, you'll hurt your eyes!
Jan 10 2014
next sibling parent "QAston" <qaston gmail.com> writes:
Your proposal is awesome, this should be in phobos style guide 
imo.
Jan 12 2014
prev sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Saturday, 11 January 2014 at 05:45:41 UTC, H. S. Teoh wrote:
 It's not just about custom error messages; it's about picking 
 up a
 particular template signature even when you don't have an 
 implementation
 for it, because logically speaking, your set of overloads 
 *should* pick
 up all such instantiations to begin with. This is why I 
 differentiated
 between the scope of a template, vs. the actual available 
 overloads.

 With sig constraints, you're declining instantiations that 
 don't satisfy
 certain conditions; I'm arguing that sometimes you *want* to 
 accept
 instantiations that you currently don't implement (yet), 
 because it
 falls under the logical scope of what you intend to handle; you 
 just
 haven't gotten around to actually implementing it yet.
There is nothing stopping you from writing constraints that accept currently unimplemented instantiations. There is no difference here with the two approaches.
Jan 13 2014
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-01-09 21:27, H. S. Teoh wrote:

 Yeah, that error drives me up the wall too. I often get screenfuls of
 errors, dumping 25 or so overloads of some obscure Phobos internal
 function (like toImpl) as though an end-user would understand any of it.
 You have to parse all the sig constraints (and boy some of them are
 obscure), *understand* what they mean (which requires understanding how
 Phobos works internally), and *then* try to figure out, by elimination,
 which is the one that you intended to match, and why your code failed to
 match it.

 I'm almost tempted to say that using sig constraints to differentiate
 between template overloads is a bad idea. Instead, consider this
 alternative implementation of toImpl:

 	template toImpl(S,T)
 		// N.B.: no sig constraints here
 	{
 		static if (... /* sig constraint conditions for overload */)
 		{
 			S toImpl(T t)
 			{
 				// implementation here
 			}
 		}
 		else static if (... /* sig constraint conditions for ... */)
 		{
 			S toImpl(T t)
 			{
 				// implementation here
 			}
 		}
 		...
 		else // N.B.: user-readable error message
 		{
 			static assert(0, "Unable to convert " ~
 				T.stringof ~ " to " ~ S.stringof);
 		}
 	}

 By putting all overloads inside a single template, we can give a useful
 default message when no overloads match.
If I recall correctly, Andrei has mentioned that something like the above doesn't work so well with __traits(compiles). -- /Jacob Carlborg
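For reference, the single-template layout quoted above can be made compilable along these lines. This is only a sketch: convertTo and its two branches are hypothetical stand-ins, not the real toImpl, and both template arguments are spelled out explicitly at the call site.

```d
// Hypothetical stand-in for the quoted toImpl layout; not Phobos code.
template convertTo(S, T)
    // N.B.: no sig constraints here
{
    static if (is(T : S))
    {
        // T converts implicitly to S.
        S convertTo(T t)
        {
            return t;
        }
    }
    else static if (is(S == string) && is(T == bool))
    {
        S convertTo(T t)
        {
            return t ? "true" : "false";
        }
    }
    else // N.B.: user-readable error message
    {
        static assert(0, "Unable to convert " ~
            T.stringof ~ " to " ~ S.stringof);
    }
}

unittest
{
    assert(convertTo!(long, int)(5) == 5L);
    assert(convertTo!(string, bool)(true) == "true");
    // No branch matches: instantiation fails with the custom message,
    // so a __traits(compiles) probe sees false.
    static assert(!__traits(compiles, convertTo!(int, string)("x")));
}
```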
Jan 10 2014
prev sibling next sibling parent Manu <turkeyman gmail.com> writes:
On 10 January 2014 06:27, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:

 On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
[...]
On a side note, am I the only one that finds
std.algorithm/std.range/etc for string processing really obtuse?  I
can rarely understand the error messages, so say it's better than STL
is optimistic.
I absolutely hate the "does not match any template declaration" error. It's extremely unhelpful for figuring out what you need to do and anytime I try to do something fun with ranges I can expect to see it a dozen times.
Yeah, that error drives me up the wall too. I often get screenfuls of errors, dumping 25 or so overloads of some obscure Phobos internal function (like toImpl) as though an end-user would understand any of it. You have to parse all the sig constraints (and boy some of them are obscure), *understand* what they mean (which requires understanding how Phobos works internally), and *then* try to figure out, by elimination, which is the one that you intended to match, and why your code failed to match it.

I'm almost tempted to say that using sig constraints to differentiate between template overloads is a bad idea. Instead, consider this alternative implementation of toImpl:

	template toImpl(S,T)
		// N.B.: no sig constraints here
	{
		static if (... /* sig constraint conditions for overload */)
		{
			S toImpl(T t)
			{
				// implementation here
			}
		}
		else static if (... /* sig constraint conditions for ... */)
		{
			S toImpl(T t)
			{
				// implementation here
			}
		}
		...
		else // N.B.: user-readable error message
		{
			static assert(0, "Unable to convert " ~
				T.stringof ~ " to " ~ S.stringof);
		}
	}

By putting all overloads inside a single template, we can give a useful default message when no overloads match.
*THIS* .. I've always thought that, and intuitively written my D code that way. Funnily, I was always concerned I was being unidiomatic doing so, since the 'std' code is rarely written like that.
 Alternatively, maybe sig constraints can have an additional string
 parameter that specifies a message that explains why that particular
 overload was rejected. These messages are not displayed if at least one
 overload matches; only if no overload matches, they will be displayed
 (so that the user can at least see why each of the overloads didn't
 match).


 [...]
I also find the names of the generic algorithms are often unrelated
to the name of the string operation.  My feeling is, everyone is
always on about how cool D is at string, but other than 'char[]', and
the builtin slice operator, I feel really unproductive whenever I do
any heavy string manipulation in D.
Really?? I find myself much more productive, because I only have to learn one set of generic algorithms, and I can use them not just for strings but for all sorts of other stuff that implement the range API.
That sounds good in theory, but if any time you try and actually use D's generic algorithms you end up with many of the kind of errors you refer to in your prior paragraph, then that basically undermines the whole experience. I don't like wasting my time, and I don't like pushing my way through learning something that I feel is obtuse to begin with, so I usually take a side path and work around it (most things can be done easily with a couple of nested foreach-es). So, perhaps embarrassingly, despite my 3+ years spent hanging around here, part of the problem is that I barely know/use phobos. Call me lazy, but I don't think it's an unrealistic experience for any end-user. If it saves me time/headache (and bloat) not using it, why would I? ** Yes, it's the 'standard' library, and I like that concept in essence, and feel like I should make use of it on principle... but it's like, you need to already know phobos intimately to think it's awesome, which creates a weird barrier to entry. And the docs don't help a lot.
 Whereas in languages like C, sure you get familiar with string-specific
 functions, but then when you need a similar-operating function for an
 array of ints, you have to name it something else, and then basically
 the same algorithm reimplemented for linked lists, called by yet another
 name, etc.. Added together, it's many times more mental load than just
 learning a single set of generic algorithms that work on (almost)
 everything.

 The composability of generic algorithms also allow me to think on a more
 abstract level -- instead of thinking about manipulating individual
 chars, I can figure out OK, if I split the string by "," then I can
 filter for the strings I'm looking for, then join them back again with
 another delimiter. Since the same set of algorithms work with other
 ranges too, I can apply exactly the same thought process for working
 with arrays, linked lists, and other containers, without having to
 remember 5 different names of essentially the same algorithm but applied
 to 5 different types.
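The split / filter / join pipeline described in the quoted paragraph can be sketched as follows. The function name keepMatching and the choice of "," and ";" as delimiters are assumptions made for illustration; splitter, filter, joiner, startsWith, array and to are the Phobos functions of those names.

```d
import std.algorithm.iteration : filter, joiner, splitter;
import std.algorithm.searching : startsWith;
import std.array : array;
import std.conv : to;

// Hypothetical helper: keep the comma-separated fields that start with
// `prefix` and join them back together with ';'.
string keepMatching(string csv, string prefix)
{
    return csv
        .splitter(",")                              // lazy: no fields copied yet
        .filter!(field => field.startsWith(prefix)) // lazy: tested on demand
        .joiner(";")                                // lazy: yields characters
        .array                                      // materialise the result
        .to!string;
}

unittest
{
    assert(keepMatching("apple,banana,avocado,cherry", "a") == "apple;avocado");
    assert(keepMatching("x,y", "a") == "");
}
```

The same chain should work on other ranges of ranges, which is the composability argument being made; only the final .array step allocates.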
See, I get that idea about composability. Maybe it's just baggage from C, but I just don't think that way. Maybe that's a large part of why I always go wrong with phobos. I would never think of doing something fundamental like string processing with a sequence of generic algorithms. I'd freak out about the relatively unknown performance characteristics. Algorithms are usually a lot simpler when performed on strings of bytes than when performed on strings of objects with any imaginable copying mechanisms and allocation patterns. Unless I wrote something myself, I can never have faith that the sort of concessions required to make it generic also make it fast in the case it happens to be performed on a byte array. There's an argument that you can specialise for string types, which is true within single functions, but if you're 'composing' a function with generic parts, then you can't specialise for strings anymore... There's no way to specialise a call to a.b.c() as a compound operation. Like I say, it's probably psychological baggage, but I tend to unconsciously dismiss/reject that sort of thing without a second thought... or maybe experience learned me my lesson (*cough* STL).
 I actually feel a lot more productive in D than in C++ with strings.
 Boost's string algorithms library helps fill the gap (and at least
 you only have one place to look for documentation when you are using
 it) but overall I prefer my experience working in D with
 pseudo-member chains.
I found that what I got out of taking the time to learn std.algorithm and std.range was worth far more than the effort invested.
Perhaps you're right. But I think there's ***HUGE*** room for improvement. The key in your sentence is, it shouldn't require 'effort'; if it's not intuitive to programmers with decades of experience, then there are probably some fundamental design (or documentation/accessibility) deficiencies that need to be prioritised. How is any junior programmer meant to take to D?
Jan 09 2014
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 10, 2014 at 11:33:35AM +1000, Manu wrote:
 On 10 January 2014 06:27, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:
 
 On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
[...]
I also find the names of the generic algorithms are often
unrelated to the name of the string operation.  My feeling is,
everyone is always on about how cool D is at string, but other
than 'char[]', and the builtin slice operator, I feel really
unproductive whenever I do any heavy string manipulation in D.
Really?? I find myself much more productive, because I only have to learn one set of generic algorithms, and I can use them not just for strings but for all sorts of other stuff that implement the range API.
That sounds good in theory, but if any time you try and actually use D's generic algorithms you end up with many of the kind of errors you refer to in your prior paragraph, then that basically undermines the whole experience.
Really? I only encounter those kinds of errors once in a while. They *are* extremely annoying when they happen, but on the whole, they're relatively rare. You must be doing something wrong if you're seeing them all the time.
 I don't like wasting my time, and I don't like pushing my way through
 learning something that I feel is obtuse to begin with, so I usually
 take a side path and work around it (most things can be done easily
 with a couple of nested foreach-es). So, perhaps embarrassingly,
 despite my 3+ years spent hanging around here, part of the problem is
 that I barely know/use phobos. Call me lazy, but I don't think it's an
 unrealistic experience for any end-user. If it saves me time/headache
 (and bloat) not using it, why would I?

 ** Yes, it's the 'standard' library, and I like that concept in
 essence, and feel like I should make use of it on principle... but
 it's like, you need to already know phobos intimately to think it's
 awesome, which creates a weird barrier to entry. And the docs don't
 help a lot.
I think you're tainted by your experience with C. :-) Using Phobos effectively requires that you take the time to understand and use ranges; or, as somebody else said, stick with std.string. But if that doesn't do what you need, then you need to ... er, understand and use ranges. :-P Expecting to use things the same way as in C is probably the root cause for your frustrations.
 Whereas in languages like C, sure you get familiar with
 string-specific functions, but then when you need a
 similar-operating function for an array of ints, you have to name it
 something else, and then basically the same algorithm reimplemented
 for linked lists, called by yet another name, etc.. Added together,
 it's many times more mental load than just learning a single set of
 generic algorithms that work on (almost) everything.

 The composability of generic algorithms also allow me to think on a
 more abstract level -- instead of thinking about manipulating
 individual chars, I can figure out OK, if I split the string by ","
 then I can filter for the strings I'm looking for, then join them
 back again with another delimiter. Since the same set of algorithms
 work with other ranges too, I can apply exactly the same thought
 process for working with arrays, linked lists, and other containers,
 without having to remember 5 different names of essentially the same
 algorithm but applied to 5 different types.
See, I get that idea about composability. Maybe it's just baggage from C, but I just don't think that way. Maybe that's a large part of why I always go wrong with phobos.
Yes, the baggage is slowing you down. Cast it overboard and lighten the boat, man. ;-)
 I would never think of doing something fundamental like string
 processing with a sequence of generic algorithm. I'd freak out about
 the relatively unknown performance characteristics.
I think your caution is misplaced. Things like std.algorithm.find are actually quite efficient -- don't be misled by the verbose layers of template abstractions surrounding the code; for the common cases, it translates to a simple loop. And recently, certain cases even translate straight to C's strchr / memchr, and so are on par with C.
 Algorithms are usually a lot simpler when performed on strings of
 bytes than they are performed on strings of objects with any
 imaginable copying mechanisms and allocations patterns.
Phobos also has lots of template specializations that take advantage of strings and arrays.
 Unless I wrote something myself, I can never have faith that the sort
 of concessions required to make it generic also make it fast in the
 case it happens to be performed in a byte array.
Well, if you're going to insist on NIH syndrome, then you might as well write your own standard library instead of fighting with Phobos. :)
 There's an argument that you can specialise for string types, which is
 true within single functions, but if you're 'composing' a function
 with generic parts, then you can't specialise for strings anymore...
 There's no way to specialise a call to a.b.c() as a compound
 operation.
And how exactly does the C compiler specialize strchr(strcat(a,b),c) as a single compound operation? If you want a single-pass compound operation on a string, you'd have to write it out manually in C... and in D, you could write it out manually too, just use a for loop over the string -- same effort, same performance. Or you could save yourself the trouble and compose two algorithms from std.algorithm, the result of which is *also* single-pass (because ranges are lazy). Sure you can object that there's overhead introduced by using ranges, but since .front translates to just *ptr and .popFront translates to just ++ptr, the only overhead is just a few function calls if the compiler doesn't inline them. Which, for functions that small, it probably does.
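To make the .front / .popFront point concrete, here is a small sketch of a hand-written loop over the range primitives that Phobos defines for arrays. The function countValue is hypothetical. It uses an int array on purpose: for arrays of plain elements the primitives are thin wrappers over the slice, whereas for string, as far as I know, front decodes a whole code point, so the "just *ptr" description fits byte or int arrays more closely than UTF-8 text.

```d
import std.range.primitives : empty, front, popFront;

// Hypothetical example: count occurrences of `value` using only the
// range primitives, the same interface generic algorithms are built on.
size_t countValue(const(int)[] r, int value)
{
    size_t n;
    for (; !r.empty; r.popFront()) // popFront shrinks the slice from the front
        if (r.front == value)      // front reads the first element
            ++n;
    return n;
}

unittest
{
    assert(countValue([1, 2, 1, 3], 1) == 2);
    assert(countValue([], 1) == 0);
}
```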
 Like I say, it's probably psychological baggage, but I tend to
 unconsciously dismiss/reject that sort of thing without a second
 though...  or maybe experience learned me my lesson (*cough* STL).
OK, let's get one thing straight here. Comparing Phobos to STL is truly unfair. I spent almost 2 decades writing C++, and wrote code both using STL and without (from when STL didn't exist yet), and IME, Phobos's range algorithms are *orders* of magnitude better than STL in terms of usability. At least. In STL, you have to always manage pointer pairs, which become a massive pain when you need to pass multiple pairs around (very error-prone, transpose one argument, and you have a nice segfault or memory corruption bug). Then you have stupid verbose syntax like:

	// You can't even write the for-loop conditions in a single
	// line!
	for (std::vector<MyType<Blah> >::iterator it =
		myContainer.begin();
		it != myContainer.end();
		it++)
	{
		// What's with this (*smartPtr)->x nonsense everywhere?
		doSomething((*(*it)->impl)->myDataField);

		// What, I can't even write a simple X != Y if-condition
		// in a single line?! Not to mention the silly
		// redundancy of having to write out the entire chain of
		// dereferences to exactly the same object twice.
		if (find((*(*it)->impl)->mySubContainer, key) ==
			(*(*it)->impl)->mySubContainer.end())
		{
			// How I long for D's .init!
			std::vector<MyType<Blah> >::iterator empty;
			return empty;
		}
	}

Whereas in D:

	foreach (item; myContainer) {
		doSomething(item.impl.myDataField);
		if (!item.mySubContainer.canFind(key))
			return ElementType!MyContainer.init;
	}

There's no comparison, I tell you. No comparison at all.
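A self-contained variant of the foreach / canFind pattern, for anyone who wants to compile it. The Impl and Item types and the namesMissing function are invented for this sketch; only canFind comes from Phobos.

```d
import std.algorithm.searching : canFind;

// Hypothetical data layout, loosely mirroring item.impl.<field> above.
struct Impl
{
    string name;
    int[] keys;
}

struct Item
{
    Impl impl;
}

// Returns the names of all items whose key list does not contain `key`.
string[] namesMissing(Item[] items, int key)
{
    string[] result;
    foreach (item; items)
        if (!item.impl.keys.canFind(key))
            result ~= item.impl.name;
    return result;
}

unittest
{
    auto items = [Item(Impl("a", [1, 2])), Item(Impl("b", [3]))];
    assert(namesMissing(items, 3) == ["a"]);
    assert(namesMissing(items, 9) == ["a", "b"]);
}
```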
 I actually feel a lot more productive in D than in C++ with
 strings.  Boost's string algorithms library helps fill the gap
 (and at least you only have one place to look for documentation
 when you are using it) but overall I prefer my experience working
 in D with pseudo-member chains.
I found that what I got out of taking the time to learn std.algorithm and std.range was worth far more than the effort invested.
Perhaps you're right. But I think there's ***HUGE*** room for improvement. The key in your sentence is, it shouldn't require 'effort'; if it's not intuitive to programmers with decades of experience, then there are probably some fundamental design (or documentation/accessibility) deficiencies that need to be prioritised. How is any junior programmer meant to take to D?
No offense, but IME, junior programmers tend to pick up these things much faster than experienced programmers with lots of baggage from other languages, precisely because they don't have all that baggage to slow them down. Old habits die hard, as they say. That's not to say that the D docs don't need improvement, of course. But given all your objections about Phobos algorithms despite having barely *used* Phobos, I think the source of your difficulty lies more in the baggage than in the documentation. :) T -- Give me some fresh salted fish, please.
Jan 09 2014
parent reply "Atila Neves" <atila.neves gmail.com> writes:
I agree that std.algorithm is better than <algorithm>, but let's 
not pretend that C++11 never happened (that happens from time to 
time on this forum). The modern C++ version isn't _that_ 
different:

     for(auto& blah: myContainer) { //for-loop condition on one line
         doSomething(blah->impl->myDataField);
         if(find(blah->impl->mySubContainer.begin(),
                 blah->impl->mySubContainer.end(), key) ==
            blah->impl->mySubContainer.end()) {
             //decltype is way shorter than std::vector<MyType<Blah>>
             //and change-resistant
             return decltype(blah)::iterator{};
         }
     }

Again, I think that std.algorithm is better and that passing a 
pair of iterators to everything when 99.9% of the time they'll be 
begin() and end() anyway is a massive PITA. I'm a D convert. 
Nobody here makes a point of posting D1 code and IMHO there's 
also no point in posting C++98 / C++2003 code.

Atila

 OK, let's get one thing straight here. Comparing Phobos to STL 
 is truly
 unfair. I spent almost 2 decades writing C++, and wrote code 
 both using
 STL and without (from when STL didn't exist yet), and IME, 
 Phobos's
 range algorithms are *orders* of magnitude better than STL in 
 terms of
 usability. At least. In STL, you have to always manage pointer 
 pairs,
 which become a massive pain when you need to pass multiple 
 pairs around
 (very error-prone, transpose one argument, and you have a nice 
 segfault
 or memory corruption bug).  Then you have stupid verbose syntax 
 like:

 	// You can't even write the for-loop conditions in a single
 	// line!
 	for (std::vector<MyType<Blah> >::iterator it =
 		myContainer.begin();
 		it != myContainer.end();
 		it++)
 	{
 		// What's with this (*smartPtr)->x nonsense everywhere?
 		doSomething((*(*it)->impl)->myDataField);

 		// What, I can't even write a simple X != Y if-condition
 		// in a single line?! Not to mention the silly
 		// redundancy of having to write out the entire chain of
 		// dereferences to exactly the same object twice.
 		if (find((*(*it)->impl)->mySubContainer, key) ==
 			(*(*it)->impl)->mySubContainer.end())
 		{
 			// How I long for D's .init!
 			std::vector<MyType<Blah> >::iterator empty;
 			return empty;
 		}
 	}

 Whereas in D:

 	foreach (item; myContainer) {
 		doSomething(item.impl.myDataField);
 		if (!item.mySubContainer.canFind(key))
 			return ElementType!MyContainer.init;
 	}

 There's no comparison, I tell you. No comparison at all.


 I actually feel a lot more productive in D than in C++ with
 strings.  Boost's string algorithms library helps fill the 
 gap
 (and at least you only have one place to look for 
 documentation
 when you are using it) but overall I prefer my experience 
 working
 in D with pseudo-member chains.
I found that what I got out of taking the time to learn std.algorithm and std.range was worth far more than the effort invested.
Perhaps you're right. But I think there's ***HUGE*** room for improvement. The key in your sentence is, it shouldn't require 'effort'; if it's not intuitive to programmers with decades of experience, then there are probably some fundamental design (or documentation/accessibility) deficiencies that need to be prioritised. How is any junior programmer meant to take to D?
No offense, but IME, junior programmers tend to pick up these things much faster than experienced programmers with lots of baggage from other languages, precisely because they don't have all that baggage to slow them down. Old habits die hard, as they say. That's not to say that the D docs don't need improvement, of course. But given all your objections about Phobos algorithms despite having barely *used* Phobos, I think the source of your difficulty lies more in the baggage than in the documentation. :) T
Jan 10 2014
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Jan 10, 2014 at 07:32:23PM +0000, Atila Neves wrote:
 I agree that std.algorithm is better than <algorithm>, but let's not
 pretend that C++11 never happened (that happens from time to time on
 this forum). The modern C++ version isn't _that_ different:
 
     for(auto& blah: myContainer) { //for-loop condition on one line
         doSomething(blah->impl->myDataField);
         if(find(blah->impl->mySubContainer.begin(),
 blah->impl->mySubContainer.end(), key) ==
 blah->impl->mySubContainer.end()) {
             //decltype is way shorter than std::vector<MyType<Blah>>
             //and change-resistant
             return decltype(blah)::iterator{};
         }
      }
 
 Again, I think that std.algorithm is better and that passing a pair
 of iterators to everything when 99.9% of the time they'll be begin()
 and end() anyway is a massive PITA. I'm a D convert. Nobody here
 makes a point of posting D1 code and IMHO there's also no point in
 posting C++98 / C++2003 code.
You're right, my C++ is outdated. I'm not exactly motivated to keep up with the latest version of C++, though, since D is far better, and my day job is primarily with C, and what C++ code we have is still in the dark ages of C++2003 (or perhaps *shudder* even C++98), and is unlikely to be upgraded to C++11 anytime in the foreseeable future. [...]
	// You can't even write the for-loop conditions in a single
	// line!
	for (std::vector<MyType<Blah> >::iterator it =
		myContainer.begin();
		it != myContainer.end();
		it++)
	{
		// What's with this (*smartPtr)->x nonsense everywhere?
		doSomething((*(*it)->impl)->myDataField);

		// What, I can't even write a simple X != Y if-condition
		// in a single line?! Not to mention the silly
		// redundancy of having to write out the entire chain of
		// dereferences to exactly the same object twice.
		if (find((*(*it)->impl)->mySubContainer, key) ==
			(*(*it)->impl)->mySubContainer.end())
		{
			// How I long for D's .init!
			std::vector<MyType<Blah> >::iterator empty;
			return empty;
		}
	}
T -- Heuristics are bug-ridden by definition. If they didn't have bugs, they'd be algorithms.
Jan 10 2014
prev sibling parent "Craig Dillabaugh" <cdillaba cg.scs.carleton.ca> writes:
On Friday, 10 January 2014 at 19:32:24 UTC, Atila Neves wrote:
 I agree that std.algorithm is better than <algorithm>, but 
 let's not pretend that C++11 never happened (that happens from 
 time to time on this forum). The modern C++ version isn't 
 _that_ different:

      for(auto& blah: myContainer) { //for-loop condition on one line
          doSomething(blah->impl->myDataField);
          if(find(blah->impl->mySubContainer.begin(),
                  blah->impl->mySubContainer.end(), key) ==
             blah->impl->mySubContainer.end()) {
              //decltype is way shorter than std::vector<MyType<Blah>>
              //and change-resistant
              return decltype(blah)::iterator{};
          }
      }

 Again, I think that std.algorithm is better and that passing a 
 pair of iterators to everything when 99.9% of the time they'll 
 be begin() and end() anyway is a massive PITA. I'm a D convert. 
 Nobody here makes a point of posting D1 code and IMHO there's 
 also no point in posting C++98 / C++2003 code.

 Atila
In our company we have people working with Visual Studio 2005, so when I am working on common code I still have to avoid any new C++ features! I am 'really' trying to get them to upgrade! Craig
Jan 10 2014
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 12:48, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:

 On Fri, Jan 10, 2014 at 11:33:35AM +1000, Manu wrote:
 On 10 January 2014 06:27, H. S. Teoh <hsteoh quickfur.ath.cx> wrote:

 On Thu, Jan 09, 2014 at 06:25:33PM +0000, Brad Anderson wrote:
 On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
[...]
I also find the names of the generic algorithms are often
unrelated to the name of the string operation.  My feeling is,
everyone is always on about how cool D is at string, but other
than 'char[]', and the builtin slice operator, I feel really
unproductive whenever I do any heavy string manipulation in D.
Really?? I find myself much more productive, because I only have to learn one set of generic algorithms, and I can use them not just for strings but for all sorts of other stuff that implement the range API.
That sounds good in theory, but if any time you try and actually use D's generic algorithms you end up with many of the kind of errors you refer to in your prior paragraph, then that basically undermines the whole experience.
Really? I only encounter those kinds of errors once in a while. They *are* extremely annoying when they happen, but on the whole, they're relatively rare. You must be doing something wrong if you're seeing them all the time.
I think not really knowing quite what you need to do in advance elevates the probability of doing something wrong ;) The quality of these range error messages needs to be improved somehow if basic string operations are supposed to use them comfortably.
 I don't like wasting my time, and I don't like pushing my way through
 learning something that I feel is obtuse to begin with, so I usually
 take a side path and work around it (most things can be done easily
 with a couple of nested foreach-es). So, perhaps embarrassingly,
 despite my 3+ years spent hanging around here, part of the problem is
 that I barely know/use phobos. Call me lazy, but I don't think it's an
 unrealistic experience for any end-user. If it saves me time/headache
 (and bloat) not using it, why would I?

 ** Yes, it's the 'standard' library, and I like that concept in
 essence, and feel like I should make use of it on principle... but
 it's like, you need to already know phobos intimately to think it's
 awesome, which creates a weird barrier to entry. And the docs don't
 help a lot.
I think you're tainted by your experience with C. :-) Using Phobos effectively requires that you take the time to understand and use ranges; or, as somebody else said, stick with std.string. But if that doesn't do what you need, then you need to ... er, understand and use ranges. :-P Expecting to use things the same way as in C is probably the root cause for your frustrations.
I don't agree that something like ranges shouldn't be more or less intuitive. C doesn't have ranges, so I don't think I'm really transposing C baggage when considering how to debug my mistakes in range-based code in this case. Like most things, once you know your way around it, it's fine, but are there opportunities (mostly in trivial things like better naming conventions/standards and improved error messages) to make it a whole lot more intuitive?
 Whereas in languages like C, sure you get familiar with
 string-specific functions, but then when you need a
 similar-operating function for an array of ints, you have to name it
 something else, and then basically the same algorithm reimplemented
 for linked lists, called by yet another name, etc.. Added together,
 it's many times more mental load than just learning a single set of
 generic algorithms that work on (almost) everything.

 The composability of generic algorithms also allow me to think on a
 more abstract level -- instead of thinking about manipulating
 individual chars, I can figure out OK, if I split the string by ","
 then I can filter for the strings I'm looking for, then join them
 back again with another delimiter. Since the same set of algorithms
 work with other ranges too, I can apply exactly the same thought
 process for working with arrays, linked lists, and other containers,
 without having to remember 5 different names of essentially the same
 algorithm but applied to 5 different types.
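The split / filter / join pipeline described above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the helper names `keepFields` and `keepLongRuns` and the sample data are assumptions, while `splitter`, `filter`, `startsWith` and `join` are the Phobos functions of those names.

```d
import std.algorithm : filter, splitter, startsWith;
import std.array : join;

// Hypothetical helper: keep the comma-separated fields that start with
// `prefix`, then rejoin them with ';' as the new delimiter.
string keepFields(string csv, string prefix)
{
    return csv.splitter(",")
              .filter!(field => field.startsWith(prefix))
              .join(";");
}

// Hypothetical helper: the same three steps applied to an int array.
int[] keepLongRuns(int[] data)
{
    return data.splitter(0)                     // split on zeros
               .filter!(run => run.length > 1)  // keep runs of 2+ items
               .join([9]);                      // rejoin with a 9 in between
}

void main()
{
    assert(keepFields("apple,kiwi,avocado,banana", "a") == "apple;avocado");
    assert(keepLongRuns([1, 2, 0, 3, 0, 4, 5, 6]) == [1, 2, 9, 4, 5, 6]);
}
```

The point of the sketch is only that the same three names carry over unchanged from a string to an array of ints.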
See, I get that idea about composability. Maybe it's just baggage from C, but I just don't think that way. Maybe that's a large part of why I always go wrong with phobos.
Yes, the baggage is slowing you down. Cast it overboard and lighten the boat, man. ;-)
 I would never think of doing something fundamental like string
 processing with a sequence of generic algorithm. I'd freak out about
 the relatively unknown performance characteristics.
I think your caution is misplaced. Things like std.algorithm.find are actually quite efficient -- don't be misled by the verbose layers of template abstractions surrounding the code; for the common cases, it translates to a simple loop. And recently, certain cases even translate straight to C's strchr / memchr, and so are on par with C.
Surely it can't do that if the operation requires any composition? How do you specialise a composed sequence of operations?
 Algorithms are usually a lot simpler when performed on strings of
 bytes than they are performed on strings of objects with any
 imaginable copying mechanisms and allocations patterns.
Phobos also has lots of template specializations that take advantage of strings and arrays.
Again, I'm talking WRT composition specifically here.
 Unless I wrote something myself, I can never have faith that the sort
 of concessions required to make it generic also make it fast in the
 case it happens to be performed in a byte array.
Well, if you're going to insist on NIH syndrome, then you might as well write your own standard library instead of fighting with Phobos. :)
 There's an argument that you can specialise for string types, which is
 true within single functions, but if you're 'composing' a function
 with generic parts, then you can't specialise for strings anymore...
 There's no way to specialise a call to a.b.c() as a compound
 operation.
And how exactly does the C compiler specialize strchr(strcat(a,b),c) as a single compound operation?
That's equally a composed statement. It's the same as the concern I raise. I was referring to cases where D requires a composed statement as opposed to cases where other languages may have some explicit function that does a single complex thing. And I'm not talking about specifics, I was illustrating the nature of my psychological baggage :) .. I have an unreasonable distrust towards requiring composed statements to do very simple things. It's not a specific criticism, it's a comment.
 If you want a single-pass compound operation on a string, you'd have to
 write it out manually in C... and in D, you could write it out manually
 too, just use a for loop over the string -- same effort, same
 performance. Or you could save yourself the trouble and compose two
 algorithms from std.algorithm, the result of which is *also* single-pass
 (because ranges are lazy). Sure you can object that there's overhead
 introduced by using ranges, but since .front translates to just *ptr and
 .popFront translates to just ++ptr, the only overhead is just a few
 function calls if the compiler doesn't inline them. Which, for functions
 that small, it probably does.
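As a rough illustration of the claim quoted above, here is a hand-written loop next to a composition of two lazy algorithms that traverses the string once. This is a sketch under assumptions: the function names `sumDigitsLoop` and `sumDigitsRanges` are made up for the example, and nothing here measures the actual inlining or performance.

```d
import std.algorithm : filter, map, sum;
import std.ascii : isDigit;

// Hypothetical helper: the manual single-pass version.
uint sumDigitsLoop(string s)
{
    uint total = 0;
    foreach (dchar c; s)
        if (isDigit(c))
            total += c - '0';
    return total;
}

// Hypothetical helper: the composed version. filter and map are lazy,
// so sum pulls one element at a time and the string is walked once,
// with no intermediate array allocated.
uint sumDigitsRanges(string s)
{
    return s.filter!isDigit
            .map!(c => c - '0')
            .sum;
}

void main()
{
    assert(sumDigitsLoop("a1b22c3") == 8);
    assert(sumDigitsRanges("a1b22c3") == 8);
}
```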
Surely it can't be *ptr and ++ptr as you say, otherwise none of it would be unicode safe...?
 Like I say, it's probably psychological baggage, but I tend to
 unconsciously dismiss/reject that sort of thing without a second
 thought...  or maybe experience learned me my lesson (*cough* STL).
OK, let's get one thing straight here. Comparing Phobos to STL is truly unfair. I spent almost 2 decades writing C++, and wrote code both using STL and without (from when STL didn't exist yet), and IME, Phobos's range algorithms are *orders* of magnitude better than STL in terms of usability. At least. In STL, you have to always manage pointer pairs, which become a massive pain when you need to pass multiple pairs around (very error-prone, transpose one argument, and you have a nice segfault or memory corruption bug). Then you have stupid verbose syntax like:

    // You can't even write the for-loop conditions in a single
    // line!
    for (std::vector<MyType<Blah> >::iterator it =
            myContainer.begin();
         it != myContainer.end();
         it++)
    {
        // What's with this (*smartPtr)->x nonsense everywhere?
        doSomething((*(*it)->impl)->myDataField);

        // What, I can't even write a simple X != Y if-condition
        // in a single line?! Not to mention the silly
        // redundancy of having to write out the entire chain of
        // dereferences to exactly the same object twice.
        if (find((*(*it)->impl)->mySubContainer, key) ==
            (*(*it)->impl)->mySubContainer.end())
        {
            // How I long for D's .init!
            std::vector<MyType<Blah> >::iterator empty;
            return empty;
        }
    }

Whereas in D:

    foreach (item; myContainer) {
        doSomething(item.impl.myDataField);
        if (!item.impl.mySubContainer.canFind(key))
            return ElementType!MyContainer.init;
    }

There's no comparison, I tell you. No comparison at all.
Yes, I'm aware that it's syntactically superior, but the quality of the error messages isn't much better than STL. I also find things easier to find and/or more logically named (probably biased from past exposure, I know) in the STL than in phobos.
 I actually feel a lot more productive in D than in C++ with
 strings.  Boost's string algorithms library helps fill the gap
 (and at least you only have one place to look for documentation
 when you are using it) but overall I prefer my experience working
 in D with pseudo-member chains.
I found that what I got out of taking the time to learn std.algorithm and std.range was worth far more than the effort invested.
Perhaps you're right. But I think there's ***HUGE*** room for improvement. The key in your sentence is, it shouldn't require 'effort'; if it's not intuitive to programmers with decades of experience, then there are probably some fundamental design (or documentation/accessibility) deficiencies that need to be prioritised. How is any junior programmer meant to take to D?
No offense, but IME, junior programmers tend to pick up these things much faster than experienced programmers with lots of baggage from other languages, precisely because they don't have all that baggage to slow them down. Old habits die hard, as they say.
Maybe you're right, but I can't imagine many juniors that would be capable of tracking down what went wrong when they inevitably made a mistake and get met with weird errors relating to ranges and template constraints and all that good stuff... Maybe they'd be doing it differently in the first place though? Who knows.
 That's not to say that the D docs don't need improvement, of course. But
 given all your objections about Phobos algorithms despite having barely
 *used* Phobos, I think the source of your difficulty lies more in the
 baggage than in the documentation. :)
I already said that myself. But I'd like to think the experience could be smoother, more helpful, and more intuitive. I don't think you can say it's perfect, or even particularly 'good'. It's acceptable, it does seem to work, but it's not an easy learning curve, and it's hard to take in small steps, or to absorb via osmosis. Every time I try and repeat something that 'I kinda remember seeing a few months ago' and 'it was kinda like this...', it takes me AGES to get right. Always finicky little details that take the most time, and I often find the phobos source code more helpful than the docs, which isn't a good sign. That's my general point. I think there's a lot of room for case study, and improvement.
Jan 09 2014
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-01-10 07:18, Manu wrote:

 Surely it can't be *ptr and ++ptr as you say, otherwise none of it would
 be unicode safe...?
For UTF-8 strings it's an extra if-statement:

    immutable c = str[0];
    if(c < 0x80)
    {
        //ptr is used to avoid unnecessary bounds checking.
        str = str.ptr[1 .. str.length];
    }

-- /Jacob Carlborg
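The effect of that branch can be observed from user code. The sketch below assumes Phobos's auto-decoding string primitives (`front` and `popFront` from std.range); the helper name `unitsPopped` is hypothetical. ASCII characters advance the string by one code unit, while anything else is decoded and advances it by the length of the whole sequence.

```d
import std.range : front, popFront;

// Hypothetical helper: number of UTF-8 code units one popFront() consumes.
size_t unitsPopped(string s)
{
    immutable before = s.length;
    s.popFront();
    return before - s.length;
}

void main()
{
    string s = "a\u00E9!";            // U+00E9 takes two UTF-8 code units
    assert(s.length == 4);            // length counts code units

    assert(s.front == 'a');
    assert(unitsPopped(s) == 1);      // ASCII: the fast path, one unit

    s.popFront();
    assert(s.front == '\u00E9');      // front yields a decoded dchar
    assert(unitsPopped(s) == 2);      // non-ASCII: the whole sequence
}
```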
Jan 10 2014
prev sibling parent Walter Bright <newshound2 digitalmars.com> writes:
On 1/9/2014 10:18 PM, Manu wrote:
 Always
 finicky little details that take the most time, and I often find the phobos
 source code more helpful than the docs, which isn't a good sign.

 That's my general point. I think there's a lot of room for case study, and
 improvement.
You're right, and I see the same thing when I use ranges. The only way to tackle it is to submit improvement proposals on a case-by-case basis as we run into things.
Jan 26 2014
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2014-01-09 15:07, Manu wrote:
 This works fine:
    string x = find("Hello", 'H');

 This doesn't:
    string y = find(retro("Hello"), 'H');
    > Error: cannot implicitly convert expression (find(retro("Hello"),
 'H')) of type Result!() to string

 Is that wrong? That seems to be how the docs suggest it should be used.
As others have said, the problem is that "find" returns a range, which is not implicitly convertible to "string". The main reason is to avoid temporary allocations when chaining algorithms. If it was the other way around you would probably be complaining it wasn't efficient enough ;)
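A minimal sketch of making that conversion explicit, assuming the goal is simply a newly allocated string holding the result. The helper name `findReversed` is hypothetical; `find`, `retro`, `array` and `to` are the Phobos functions of those names.

```d
import std.algorithm : find;
import std.array : array;
import std.conv : to;
import std.range : retro;

// Hypothetical helper: search the reversed view of `s` for `c` and
// return the remainder as a freshly allocated string.
string findReversed(string s, dchar c)
{
    // find over retro yields a lazy range of dchar, not a string.
    auto r = find(retro(s), c);

    // array collects the code points into a dchar[]; to!string then
    // re-encodes them as UTF-8. This is where the allocation happens.
    return r.array.to!string;
}

void main()
{
    // Searching a string directly returns a slice of it, no allocation.
    string x = find("Hello", 'l');
    assert(x == "llo");

    assert(findReversed("Hello", 'l') == "lleH");
}
```

If the result is only going to be passed on to another algorithm, keeping it as `auto` and skipping the conversion avoids the allocation entirely.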
 On a side note, am I the only one that finds std.algorithm/std.range/etc
 for string processing really obtuse?
 I can rarely understand the error messages, so say it's better than STL
 is optimistic.
 Using std.algorithm and std.range to do string manipulation feels really
 lame to me.
 I hate looking through the docs of 3-4 modules to understand the
 complete set of useful string operations (std.string, std.uni,
 std.algorithm, std.range... at least).
You forgot std.array ;)
 I also find the names of the generic algorithms are often unrelated to
 the name of the string operation.
 My feeling is, everyone is always on about how cool D is at string, but
 other than 'char[]', and the builtin slice operator, I feel really
 unproductive whenever I do any heavy string manipulation in D.
You have built-in appending, concatenation, using strings in switch statements and so on.
 I also hate that I need to import at least 4-5 modules to do anything
 useful with strings... I feel my program bloating and cringe with every
 gigantic import that sources exactly one symbol.
I agree with you. I have built up a small library throughout the years that basically allows me to only import a single module to do most string operations I need. You probably don't like it but you could have a look at Tango as well. It contains two useful modules (for this case). One for handling arbitrary array operators and one for string operations.

tango.core.Array
tango.text.Util

https://github.com/SiegeLord/Tango-D2
http://siegelord.github.io/Tango-D2/

-- /Jacob Carlborg
Jan 09 2014
parent reply Manu <turkeyman gmail.com> writes:
On 10 January 2014 06:34, Jacob Carlborg <doob me.com> wrote:

 On 2014-01-09 15:07, Manu wrote:

 This works fine:
    string x = find("Hello", 'H');

 This doesn't:
    string y = find(retro("Hello"), 'H');
    > Error: cannot implicitly convert expression (find(retro("Hello"),
 'H')) of type Result!() to string

 Is that wrong? That seems to be how the docs suggest it should be used.
As others have said, the problem is that "find" returns a range, which is not implicitly convertible to "string". The main reason is to avoid temporary allocations when chaining algorithms. If it was the other way around you would probably be complaining it wasn't efficient enough ;)
Then there's probably a fundamental problem somewhere, and it should be re-thought at a lower level. Perhaps even something super simple like a can't-go-wrong naming convention, that makes it REALLY plain when string-related functions are dealing with bytes, codepoints, or graphemes? It would seem to me that a lot of the confusion and complexity surrounding strings is because it tries to be 'correct' (and varying levels of correct in different circumstances), but there are no clear relationships between different functions that deal with these different versions of 'correct'-ness.
 On a side note, am I the only one that finds std.algorithm/std.range/etc
 for string processing really obtuse?
 I can rarely understand the error messages, so say it's better than STL
 is optimistic.
 Using std.algorithm and std.range to do string manipulation feels really
 lame to me.
 I hate looking through the docs of 3-4 modules to understand the
 complete set of useful string operations (std.string, std.uni,
 std.algorithm, std.range... at least).
You forgot std.array ;)
I did! And there are probably others too. You can't do anything without std.typecons either. Although not directly related, it always seems to be there alongside.
 I also find the names of the generic algorithms are often unrelated to
 the name of the string operation.
 My feeling is, everyone is always on about how cool D is at string, but
 other than 'char[]', and the builtin slice operator, I feel really
 unproductive whenever I do any heavy string manipulation in D.
You have built-in appending, concatenation, using strings in switch statements and so on.
Correct, those things are good. That is where 'D is awesome at strings' ends though, in my opinion.
 I also hate that I need to import at least 4-5 modules to do anything
 useful with strings... I feel my program bloating and cringe with every
 gigantic import that sources exactly one symbol.
I agree with you. I have built up a small library throughout the years that basically allows me to only import a single module to do most string operations I need.
I suspect your effort is not uncommon. Is this not clear evidence of a critical problem?
 You probably don't like it but you could have a look at Tango as well. It
 contains two useful modules (for this case). One for handling arbitrary
 array operators and one for string operations.

 tango.core.Array
 tango.text.Util

 https://github.com/SiegeLord/Tango-D2
 http://siegelord.github.io/Tango-D2/
Yeah... I want less libraries, not more :/
Jan 09 2014
parent Jacob Carlborg <doob me.com> writes:
On 2014-01-10 02:06, Manu wrote:

 Then there's probably a fundamental problem somewhere, and it should be
 re-thought at a lower level.
 Perhaps even something super simple like a can't-go-wrong naming
 convention, that makes it REALLY plain when string-related functions are
 dealing with bytes, codepoints, or graphemes?
Isn't it with conventions that everything _can_ go wrong?
 It would seem to me that a lot of the confusion and complexity
 surrounding strings is because it tries to be 'correct' (and varying
 levels of correct in different circumstances), but there are no clear
 relationships between different functions that deal with these different
 versions of 'correct'-ness.
I think the confusion comes from strings being just plain arrays, which are also containers. If there's a function that works on containers it will work on arrays and strings as well. Because of that it's put in a general module for containers, in this case std.algorithm. Functions that work on arrays will also work on strings and they're put in the most general location they fit, std.array. Functions that only work on strings are put in std.string. Then we of course have some other modules, like std.uni and std.utf, making it a bit more complicated.
 I suspect your effort is not uncommon. Is this not clear evidence of a
 critical problem?
Probably. I find that to be a problem in most standard libraries. They have very general functionality but very few convenience functions, so simple tasks require calling two or three functions and perhaps creating an object.

-- /Jacob Carlborg
Jan 10 2014
prev sibling parent "Jakob Ovrum" <jakobovrum gmail.com> writes:
On Thursday, 9 January 2014 at 14:08:02 UTC, Manu wrote:
 [snip]
Using std.algorithm or std.range requires learning about ranges. You shouldn't be surprised that string handling with ranges works differently from specialized string handling functions, which is the norm in most languages. For anyone with even a cursory knowledge of ranges and range algorithms, it's no surprise when the result of a range composition is not of string type even when the input is a string.

If you don't want to learn about ranges, use std.string. If std.string is not sufficient, then you should consider learning about ranges, which means accepting that yes, things will be different.

Learning about ranges and how to use them for string manipulation is not the easiest thing right now due to a dearth of learning material, but that's not a problem with ranges. Compiler error messages are indeed part of the problem, but they are a WIP. 2.065 contains an incremental improvement to error messages on failure of overload resolution (Thanks Kenji).

About Unicode, the unit that the language promotes and the standard library embraces is `dchar`, the Unicode code point. The choice of not using graphemes is a compromise between correctness and performance. That means that the onus is still on the user to cover the last mile of correctness, so the user is not exempt from having to learn at least the basics of Unicode in order to write Unicode-correct code in D.

However, this is a surprisingly reasonable compromise: as long as all inputs are normalized to the same format (which may require std.uni.normalize if the source of the input does not guarantee a particular format), then outside of contrived examples it's very hard to break grapheme clusters by using range-based code, even though they are ranges of code points. Explicit handling of graphemes is typically only needed for very specific domains, like if you're writing a text rendering library or a text input box etc. Thus typical range-based string manipulation tends to be correct even for multi-code-point graphemes, without the author having to consciously handle it.

2.065 has std.uni.byGrapheme/byCodePoint for range-based grapheme manipulation. However, there is a performance cost involved so I recommend against using it dogmatically. The result of `byGrapheme` is not bidirectional yet - someone needs to take the time to implement `decodeGraphemeBack` and/or `graphemeStrideBack` first.
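To make the three levels concrete, here is a small sketch counting the same text as code units, code points and graphemes, and then normalizing it. The helper name `counts` and the sample string are assumptions for the example; `walkLength`, `byGrapheme` and `normalize` are the Phobos functions mentioned above.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

// Hypothetical helper: sizes of `s` at the three levels discussed above.
size_t[3] counts(string s)
{
    return [s.length,                  // UTF-8 code units
            s.walkLength,              // code points (dchar)
            s.byGrapheme.walkLength];  // grapheme clusters
}

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible character.
    string decomposed = "e\u0301";
    assert(counts(decomposed) == [3, 2, 1]);

    // After NFC normalization the pair becomes the single code point
    // U+00E9, so code-point-based range code sees one element.
    string composed = normalize!NFC(decomposed);
    assert(composed == "\u00E9");
    assert(counts(composed) == [2, 1, 1]);
}
```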
Jan 09 2014