digitalmars.D - std.string will get the boot

Andrei Alexandrescu (31/31) Jan 29 2010 I plan a few improvements to Phobos that will improve string handling.

bearophile (8/16) Jan 29 2010 32 bits are not enough to represent certain "characters", they need more...

Andrei Alexandrescu (13/29) Jan 29 2010 I think it's a tad late for that.

Simen kjaeraas (5/9) Jan 30 2010 So adding aliases to object.d is not possible this late in the process?

Andrei Alexandrescu (3/13) Jan 30 2010 That would be possible.
Denis Koroskin (4/12) Jan 31 2010 Everyone can do that on their own. I see no reason to pollute the

Simen kjaeraas (5/20) Jan 31 2010 Nor do I. I was only inquiring as to its feasibility.

Lionello Lunesu (10/34) Jan 30 2010 I also doubt 32-bit is not enough. In fact, Unicode has 0x10FFFF

Michel Fortin (22/37) Jan 30 2010 32-bit is enough to cover all code points. But there are many combining
Simen kjaeraas (13/19) Jan 31 2010 struct Typedef( T ) {

Denis Koroskin (8/25) Jan 31 2010 I only know one example (in Turkish):
Lionello Lunesu (12/36) Feb 01 2010 Using alias you lose all type safety.

Jacob Carlborg (6/37) Jan 29 2010 I would keep std.string for string specific functions and perhaps

Ali Çehreli (12/15) Jan 29 2010 I've been thinking about characters lately and have realized that

Andrei Alexandrescu (6/25) Jan 29 2010 My thoughts exactly. In fact I'm thinking of generalizing toupper and
Jacob Carlborg (6/22) Jan 29 2010 I'm not sure I really understand this, probably because I don't know

Ali Çehreli (25/54) Jan 29 2010 'i' and 'i' are the same "character", because they have the same ASCII

Andrei Alexandrescu (8/17) Jan 29 2010 My idea of functions for upper/lowercase would help you solve exactly

dsimcha (5/7) Jan 29 2010 Please, no. I **HATE** fine-grained imports like Tango has. I don't wa...

Jonathan M Davis (14/24) Jan 29 2010 We need a balance. Fine-grained can be great, but if it's too fine-grain...

Lutger (12/20) Jan 29 2010 I like how naturaldocs, which is similar to ddoc helps with this: by

Lutger (3/4) Jan 29 2010 sorry, wrong anchor:
Andrei Alexandrescu (7/34) Jan 29 2010 I think the idea of tags is awesome, particularly because it doesn't

Lutger (5/39) Jan 29 2010 Cool, tags are even better (naturaldocs groups aren't tags really). How
bearophile (9/14) Jan 29 2010 I am far from expert about such hairy matters, so I can be wrong. This i...

Lutger (4/8) Jan 29 2010 This is about the documentation, which at the moment is based on the

Andrei Alexandrescu (9/26) Jan 29 2010 I don't think it would be too far-fetched to define and use tags for

bearophile (6/9) Jan 29 2010 A next step is to allow to import all names with a specified tag, even i...

Robert Jacques (6/36) Jan 29 2010 By the way, in the short term you could greatly improve the usability of ...

Andrei Alexandrescu (5/45) Jan 29 2010 That jump to index is automatically generated. I can have it sorted

Clemens (3/28) Feb 02 2010 I think you may misunderstand what the "alias this" construct does. It d...

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

I plan a few improvements to Phobos that will improve string handling.

Currently arrays of characters count as random-access ranges, which is 
not true for arrays of char and wchar. I plan to make std.range aware of 
that and only characterize char[] and wchar[] (and their qualified 
versions) as bidirectional ranges. Also, std.range will define s.front 
and s.back for strings to return the correctly decoded dchar. Naturally, 
s.popFront and s.popBack will yank an entire encoded character, which is 
what you want most of the time anyway. (You're still free to do s = s[1 
.. $] if that's what you need.)
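
A minimal sketch of the behaviour described above, assuming present-day Phobos, where these primitives live in `std.range.primitives` (that module path is an assumption relative to this post, which predates it). It shows `front` decoding a whole code point, `popFront` removing every code unit of an encoded character, and slicing still operating on raw code units.

```d
import std.range.primitives : back, front, popFront;

void main()
{
    string s = "héllo";          // 'é' occupies two UTF-8 code units
    assert(s.length == 6);       // length counts code units, not characters

    s.popFront();                // drops 'h'
    dchar c = s.front;           // front decodes a whole code point
    assert(c == 'é');

    s.popFront();                // removes both code units of 'é'
    assert(s == "llo");
    assert(s.back == 'o');

    s = s[1 .. $];               // slicing still works on raw code units
    assert(s == "lo");
}
```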

These changes will have the great effect of enabling std.algorithm to 
work with strings correctly without any further impedance adaptation. 
(At some point I'd defined byDchar to wrap a string as a bidirectional 
range; it works, but of course it's much better without an intermediary.)

Following that change, I plan to eliminate std.string entirely and roll 
all of its functionality into std.algorithm. This is because I noticed 
that I'd like many string functions to be available for other data 
types, and also because people who want to define their own non-UTF 
encodings can benefit from the support that UTF already has.

(As an example, startsWith or endsWith are very useful not only with 
strings, but general data as well.)
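
As an illustration of that point, the sketch below calls the same two functions on a string and on an array of integers. It assumes the present-day module path `std.algorithm.searching`, which is not named anywhere in this thread.

```d
import std.algorithm.searching : endsWith, startsWith;

void main()
{
    // On strings, as std.string users expect.
    assert("std.algorithm".startsWith("std."));
    assert("std.algorithm".endsWith("algorithm"));

    // On arbitrary data: the same calls work for any range.
    int[] samples = [3, 1, 4, 1, 5, 9];
    assert(samples.startsWith([3, 1, 4]));
    assert(samples.endsWith([5, 9]));
    assert(!samples.startsWith([1, 4]));
}
```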

A possible idea would be to move algorithms out of std.string and roll 
std.utf and std.encoding into std.string. That way std.string becomes 
something UTF-specific, which may be sensible.

One problem I foresee is the growth of std.algorithm. It already has 
many things in it, and I fear that some user who just wants to trim a 
string may find it intimidating to browse through all that 
documentation. I wonder how we could break std.algorithm into smaller 
units (which is an issue largely independent from generalizing the 
algorithms now found in std.string).

Any ideas are welcome.


Andrei

Jan 29 2010

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 Currently arrays of characters count as random-access ranges, which is 
 not true for arrays of char and wchar. I plan to make std.range aware of 
 that and only characterize char[] and wchar[] (and their qualified 
 versions) as bidirectional ranges.

32 bits are not enough to represent certain "characters", they need more than
one of such dchar. So dchar too may be a bidirectional range.

I can't remember the bit size of wchar and dchar. So names like char, char16
and char32 can be better...

Sometimes I have ugly 7-bit ASCII strings, I am not sure I want to be forced to
use cast(ubyte[]) every time I use an algorithm on them :-)


 One problem I foresee is the growth of std.algorithm. It already has 
 many things in it, and I fear that some user who just wants to trim a 
 string may find it intimidating to browse through all that 
 documentation.

It's not just a matter of documentation: to choose among n items a human needs
more time as n grows (people who design important GUI menus must be aware
of this). So huge APIs slow down programming.
A possible solution is to keep the std.string module, but make it just a list
of aliases and thin wrappers around functions of std.algorithm, tuned for
string processing (for example, I usually don't need tolower on generic arrays;
some operations are mostly useful for strings).
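
A rough sketch of what such a thin layer could look like. The names `hasPrefix` and `hasPrefixIgnoreCase` are hypothetical and exist in no library; only `startsWith` and `toLower` are real Phobos functions, imported here from their present-day modules.

```d
import std.algorithm.searching : startsWith;
import std.uni : toLower;

// Plain alias: the generic algorithm under a string-flavoured name (hypothetical).
alias hasPrefix = startsWith;

// Thin wrapper (hypothetical): a case-insensitive prefix test, which only
// makes sense for text and so would not belong in a generic algorithm module.
bool hasPrefixIgnoreCase(string text, string prefix)
{
    return text.toLower.startsWith(prefix.toLower);
}

void main()
{
    assert("std.string".hasPrefix("std."));
    assert("Std.String".hasPrefixIgnoreCase("std."));
    assert(!"phobos".hasPrefix("std."));
}
```

The alias costs nothing at run time; the wrapper is where string-only behaviour would live.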

Bye,
bearophile

Jan 29 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

bearophile wrote:
 Andrei Alexandrescu:
 Currently arrays of characters count as random-access ranges, which is 
 not true for arrays of char and wchar. I plan to make std.range aware of 
 that and only characterize char[] and wchar[] (and their qualified 
 versions) as bidirectional ranges.

 
 32 bits are not enough to represent certain "characters", they need more than
one of such dchar. So dchar too may be a bidirectional range.

[citation needed]

 I can't remember the bit size of wchar and dchar. So names like char, char16
and char32 can be better...

I think it's a tad late for that.

 Sometimes I have ugly 7-bit ASCII strings, I am not sure I want to be forced
to use cast(ubyte[]) every time I use an algorithm on them :-)

That's exactly one of the cases in which my change would help. char is 
UTF-8, so that's out as an option for expressing ASCII characters. 
You'll be able to define your own type:

struct AsciiChar {
    ubyte datum;
    ...
}

Then express stuff in terms of AsciiChar[] etc.
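
One way this could be fleshed out, purely as an assumption: the constructor and the `toAscii` helper below are hypothetical and do not come from the post, whose struct body is elided. The sketch shows that an `AsciiChar[]` remains a random-access range, so algorithms such as `sort` apply directly, while `char[]` is no longer treated as one in present-day Phobos.

```d
import std.algorithm.sorting : sort;
import std.range.primitives : isRandomAccessRange;

struct AsciiChar
{
    ubyte datum;

    // Hypothetical constructor: rejects anything outside 7-bit ASCII.
    this(char c)
    {
        assert(c < 0x80, "not a 7-bit ASCII code");
        datum = cast(ubyte) c;
    }
}

// Hypothetical helper: converts a string known to hold only ASCII.
AsciiChar[] toAscii(string s)
{
    auto result = new AsciiChar[s.length];
    foreach (i, char c; s)
        result[i] = AsciiChar(c);
    return result;
}

void main()
{
    static assert(isRandomAccessRange!(AsciiChar[]));
    static assert(!isRandomAccessRange!(char[]));

    auto text = toAscii("dcba");
    sort!((a, b) => a.datum < b.datum)(text);
    assert(text[0].datum == 'a');
    assert(text[3].datum == 'd');
}
```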

 One problem I foresee is the growth of std.algorithm. It already has 
 many things in it, and I fear that some user who just wants to trim a 
 string may find it intimidating to browse through all that 
 documentation.

 
 It's not just a matter of documentation: to choose among n items a human needs
more time as n grows (people who design important GUI menus must be aware
of this). So huge APIs slow down programming.
 A possible solution is to keep the std.string module, but make it just a list
of aliases and thin wrappers around functions of std.algorithm, tuned for
string processing (for example, I usually don't need tolower on generic arrays;
some operations are mostly useful for strings).

That's a good possibility.

Andrei

Jan 29 2010

"Simen kjaeraas" <simen.kjaras gmail.com> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 bearophile wrote:
 I can't remember the bit size of wchar and dchar. So names like char,  
 char16 and char32 can be better...

 I think it's a tad late for that.

So adding aliases to object.d is not possible this late in the process?
I'm not sure I want that to happen, just out of curiosity.

-- 
Simen

Jan 30 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Simen kjaeraas wrote:
 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
 
 bearophile wrote:
 I can't remember the bit size of wchar and dchar. So names like char, 
 char16 and char32 can be better...

 I think it's a tad late for that.

 
 So adding aliases to object.d is not possible this late in the process?
 I'm not sure I want that to happen, just out of curiosity.

That would be possible.

Andrei

Jan 30 2010

"Denis Koroskin" <2korden gmail.com> writes:

On Sun, 31 Jan 2010 01:30:41 +0300, Simen kjaeraas  
<simen.kjaras gmail.com> wrote:

 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 bearophile wrote:
 I can't remember the bit size of wchar and dchar. So names like char,  
 char16 and char32 can be better...

 I think it's a tad late for that.

 So adding aliases to object.d is not possible this late in the process?
 I'm not sure I want that to happen, just out of curiosity.

Everyone can do that on their own. I see no reason to pollute the  
namespace.
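
For instance, a user who prefers width-based names could write something like the following in their own code. The alias names are hypothetical; they are defined neither by the language nor by object.d.

```d
// Hypothetical user-side aliases for the three built-in character types.
alias char8  = char;
alias char16 = wchar;
alias char32 = dchar;

// The sizes the names are meant to recall.
static assert(char8.sizeof  * 8 == 8);
static assert(char16.sizeof * 8 == 16);
static assert(char32.sizeof * 8 == 32);

void main()
{
    char32 c = 'é';
    dchar same = c;      // an alias introduces no new type
    assert(same == 'é');
}
```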

Jan 31 2010

"Simen kjaeraas" <simen.kjaras gmail.com> writes:

On Sun, 31 Jan 2010 15:09:28 +0100, Denis Koroskin <2korden gmail.com>  
wrote:

 On Sun, 31 Jan 2010 01:30:41 +0300, Simen kjaeraas  
 <simen.kjaras gmail.com> wrote:

 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 bearophile wrote:
 I can't remember the bit size of wchar and dchar. So names like char,  
 char16 and char32 can be better...

 I think it's a tad late for that.

 So adding aliases to object.d is not possible this late in the process?
 I'm not sure I want that to happen, just out of curiosity.

 Everyone can do that on their own. I see no reason to pollute the  
 namespace.

Nor do I. I was only inquiring as to its feasibility.

-- 
Simen

Jan 31 2010

Lionello Lunesu <lio lunesu.remove.com> writes:

On 30-1-2010 1:59, Andrei Alexandrescu wrote:
 bearophile wrote:
 Andrei Alexandrescu:
 Currently arrays of characters count as random-access ranges, which
 is not true for arrays of char and wchar. I plan to make std.range
 aware of that and only characterize char[] and wchar[] (and their
 qualified versions) as bidirectional ranges.

 32 bits are not enough to represent certain "characters", they need
 more than one of such dchar. So dchar too may be a bidirectional range.

 
 [citation needed]

I also doubt 32-bit is not enough. In fact, Unicode has 0x10FFFF
as the highest code point.

 Sometimes I have ugly 7-bit ASCII strings, I am not sure I want to be
 forced to use cast(ubyte[]) every time I use an algorithm on them :-)

 
 That's exactly one of the cases in which my change would help. char is
 UTF-8, so that's out as an option for expressing ASCII characters.
 You'll be able to define your own type:
 
 struct AsciiChar {
    ubyte datum;
    ...
 }
 
 Then express stuff in terms of AsciiChar[] etc.

I miss typedef. I think this is exactly what typedef was intended
for. Perhaps we can reintroduce it as a 'short hand' for such a
struct?

By the way, ASCII is a subset of UTF-8 (that was the whole
point), so there's no reason why 'char[]' can't still be used for
ASCII strings, right?

L.

Jan 30 2010

Michel Fortin <michel.fortin michelf.com> writes:

On 2010-01-30 22:06:06 -0500, Lionello Lunesu <lio lunesu.remove.com> said:

 On 30-1-2010 1:59, Andrei Alexandrescu wrote:
 bearophile wrote:
 Andrei Alexandrescu:
 Currently arrays of characters count as random-access ranges, which
 is not true for arrays of char and wchar. I plan to make std.range
 aware of that and only characterize char[] and wchar[] (and their
 qualified versions) as bidirectional ranges.

 
 32 bits are not enough to represent certain "characters", they need
 more than one of such dchar. So dchar too may be a bidirectional range.

 
 [citation needed]

 
 I also doubt 32-bit is not enough. In fact, Unicode has 0x10FFFF
 as the highest code point.

32-bit is enough to cover all code points. But there are many combining 
code points in Unicode, allowing you to combine a diacritic with various 
other characters, such as an acute accent with a 'k'. Some of these 
combinations exist in precombined form and are considered equivalent. 
So if you want to count the number of characters the user actually sees 
instead of counting code points, then you need to take these combining 
code points into account.

But if you really wanted to iterate over "characters" instead of code 
points, note that it can become quite hard if you take into account 
double diacritics, combining diacritic signs placed across two letters. 
So I think it's reasonable to have dchar, a code point, as the base 
unit for iterating over a string.
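
To make the distinction concrete, here is a small sketch using facilities from present-day `std.uni` (`normalize`, `NFC`, `byGrapheme`), which did not exist in this form when the post was written. It compares the precombined and the combining spelling of the same user-visible character.

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme, NFC, normalize;

void main()
{
    string precomposed = "\u00E9";   // é as a single code point
    string combined    = "e\u0301";  // 'e' followed by U+0301 COMBINING ACUTE ACCENT

    // Iterating by code point (dchar) sees one versus two elements.
    assert(precomposed.walkLength == 1);
    assert(combined.walkLength == 2);
    assert(precomposed != combined);

    // Normalization makes the two spellings compare equal.
    assert(normalize!NFC(combined) == precomposed);

    // Iterating by grapheme counts what the user sees.
    assert(combined.byGrapheme.walkLength == 1);
}
```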

http://en.wikipedia.org/wiki/Combining_character
http://en.wikipedia.org/wiki/Unicode_normalization

Another interesting case:
http://en.wikipedia.org/wiki/Combining_grapheme_joiner

Unicode, isn't it great?


-- 
Michel Fortin
michel.fortin michelf.com
http://michelf.com/

Jan 30 2010

"Simen kjaeraas" <simen.kjaras gmail.com> writes:

Lionello Lunesu <lio lunesu.remove.com> wrote:

 I miss typedef. I think this is exactly what typedef was intended
 for. Perhaps we can reintroduce it as a 'short hand' for such a
 struct?

struct Typedef( T ) {
   T payload;
   alias payload this;
}

Usage:

alias Typedef!( int ) myInt;

Is this what you want?
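
A short sketch of how this wrapper behaves, using the newer `alias Name = Type;` spelling and the hypothetical helper functions `twice` and `unwrap`. It shows that the wrapped value converts implicitly to the underlying type, while the underlying type does not convert to the wrapper.

```d
struct Typedef(T)
{
    T payload;
    alias payload this;
}

alias MyInt = Typedef!int;

int twice(int x) { return 2 * x; }            // hypothetical: takes the underlying type
int unwrap(MyInt m) { return m.payload; }     // hypothetical: takes the wrapper

void main()
{
    auto m = MyInt(21);

    static assert(!is(MyInt == int));   // a distinct type...
    static assert(is(MyInt : int));     // ...that converts implicitly to int
    assert(twice(m) == 42);
    assert(m + 1 == 22);

    // The reverse direction is not implicit.
    static assert(!__traits(compiles, unwrap(21)));
    assert(unwrap(m) == 21);
}
```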

 By the way, ASCII is a subset of UTF-8 (that was the whole
 point), so there's no reason why 'char[]' can't still be used for
 ASCII strings, right?

As far as I have understood (I am no Unicode guru), in some locales
toUpper and toLower map ASCII chars to non-ASCII chars. So ASCII being a  
strict subset of UTF-8 is not always true.

-- 
Simen

Jan 31 2010

"Denis Koroskin" <2korden gmail.com> writes:

On Sun, 31 Jan 2010 11:34:03 +0300, Simen kjaeraas
<simen.kjaras gmail.com> wrote:

 Lionello Lunesu <lio lunesu.remove.com> wrote:

 I miss typedef. I think this is exactly what typedef was intended
 for. Perhaps we can reintroduce it as a 'short hand' for such a
 struct?

 struct Typedef( T ) {
    T payload;
    alias payload this;
 }

 Usage:

 alias Typedef!( int ) myInt;

 Is this what you want?

 By the way, ASCII is a subset of UTF-8 (that was the whole
 point), so there's no reason why 'char[]' can't still be used for
 ASCII strings, right?

 As far as I have understood (I am no Unicode guru), in some locales
 toUpper and toLower map ASCII chars to non-ASCII chars. So ASCII being a
 strict subset of UTF-8 is not always true.

I only know one example (in Turkish):

i < - > İ
ı < - > I

That's a big issue because toUpper/toLower need a locale to provide a
correct result.
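
The locale-independent behaviour can be observed with `std.uni` as it exists in present-day Phobos (an assumption relative to this thread): the functions take no locale argument, so the Turkish mapping is not reachable through them.

```d
import std.uni : toLower, toUpper;

void main()
{
    // The default mappings pair ASCII 'i' with ASCII 'I'.
    assert(toUpper('i') == 'I');
    assert(toLower('I') == 'i');

    // A Turkish reader would expect "ALİ" here.
    assert("ali".toUpper == "ALI");
}
```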

Jan 31 2010

Lionello Lunesu <lio lunesu.remove.com> writes:

On 31-1-2010 16:34, Simen kjaeraas wrote:
 Lionello Lunesu <lio lunesu.remove.com> wrote:
 
 I miss typedef. I think this is exactly what typedef was intended
 for. Perhaps we can reintroduce it as a 'short hand' for such a
 struct?

 
 struct Typedef( T ) {
   T payload;
   alias payload this;
 }
 
 Usage:
 
 alias Typedef!( int ) myInt;
 
 Is this what you want?

Using alias you lose all type safety.

I remember Andrei mentioned that he and Walter couldn't agree
whether typedef should behave as a sub or super class. I think it
should not be looked at from an inheritance perspective, but just
considered as a wrapper struct with a ctor that takes the
underlying type.

 By the way, ASCII is a subset of UTF-8 (that was the whole
 point), so there's no reason why 'char[]' can't still be used for
 ASCII strings, right?

 
 As far as I have understood (I am no Unicode guru), in some locales
 toUpper and toLower map ASCII chars to non-ASCII chars. So ASCII being a
 strict subset of UTF-8 is not always true.
 

True, but then the uppercase or lowercase result would no longer be
ASCII. As long as you stick to ASCII, char[] should work just fine.

So, toLower and toUpper can accept ASCII char[] but always output
one of those new char ranges. Problem fixed :)

L.

Feb 01 2010

Jacob Carlborg <doob me.com> writes:

On 1/29/10 18:36, Andrei Alexandrescu wrote:
 I plan a few improvements to Phobos that will improve string handling.

 Currently arrays of characters count as random-access ranges, which is
 not true for arrays of char and wchar. I plan to make std.range aware of
 that and only characterize char[] and wchar[] (and their qualified
 versions) as bidirectional ranges. Also, std.range will define s.front
 and s.back for strings to return the correctly decoded dchar. Naturally,
 s.popFront and s.popBack will yank an entire encoded character, which is
 what you want most of the time anyway. (You're still free to do s = s[1
 .. $] if that's what you need.)

 These changes will have the great effect of enabling std.algorithm to
 work with strings correctly without any further impedance adaptation.
 (At some point I'd defined byDchar to wrap a string as a bidirectional
 range; it works, but of course it's much better without an intermediary.)

 Following that change, I plan to eliminate std.string entirely and roll
 all of its functionality into std.algorithm. This is because I noticed
 that I'd like many string functions to be available for other data
 types, and also because people who want to define their own non-UTF
 encodings can benefit of the support that UTF already has.

I would keep std.string for string specific functions and perhaps 
publicly import std.algorithm. For example, functions like tolower, icmp 
and toStringz.

 (As an example, startsWith or endsWith are very useful not only with
 strings, but general data as well.)

 A possible idea would be to move algorithms out of std.string and roll
 std.utf and std.encoding into std.string. That way std.string becomes
 something UTF-specific, which may be sensible.

 One problem I foresee is the growth of std.algorithm. It already has
 many things in it, and I fear that some user who just wants to trim a
 string may find it intimidating to browse through all that
 documentation. I wonder how we could break std.algorithm into smaller
 units (which is an issue largely independent from generalizing the
 algorithms now found in std.string).

Perhaps it's time to start adding more packages than just the std. Make 
std.algorithm a package and try to split it into several modules.

 Any ideas are welcome.


 Andrei

Jan 29 2010

Ali Çehreli <acehreli yahoo.com> writes:

Jacob Carlborg wrote:

 I would keep std.string for string specific functions and perhaps
 publicly import std.algorithm. For example, functions like tolower, icmp
 and toStringz.

I've been thinking about characters lately and have realized that 
tolower, toupper, icmp, and friends should not be in a string library. 
Those functions need an "alphabet" to be useful; not language, nor locale...

In fact, the character itself must have alphabet information. Otherwise 
a string like "ali & jim" cannot be converted to upper-case correctly(*) 
as "ALİ & JIM". And the word "correctly" there depends on each 
character's alphabet.

Similarly, two characters that look the same cannot be compared for 
ordering. Comparing the 'x' of one alphabet to the 'x' of another 
alphabet is a meaningless operation.

Ali

Jan 29 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Ali Çehreli wrote:
 Jacob Carlborg wrote:

  > I would keep std.string for string specific functions and perhaps
  > publicly import std.algorithm. For example, functions like tolower, icmp
  > and toStringz.

 I've been thinking about characters lately and have realized that 
 tolower, toupper, icmp, and friends should not be in a string library. 
 Those functions need an "alphabet" to be useful; not language, nor 
 locale...

 In fact, the character itself must have alphabet information. Otherwise 
 a string like "ali & jim" cannot be converted to upper-case correctly(*) 
 as "ALİ & JIM". And the word "correctly" there depends on each 
 character's alphabet.

 Similarly, two characters that look the same cannot be compared for 
 ordering. Comparing the 'x' of one alphabet to the 'x' of another 
 alphabet is a meaningless operation.

My thoughts exactly. In fact I'm thinking of generalizing toupper and 
tolower for strings to take an optional trie mapping strings to strings. 
That way correct capitalization can be done for any string, given a good 
collection of capitalization patterns.

Andrei

Jan 29 2010

Jacob Carlborg <doob me.com> writes:

On 1/29/10 22:18, Ali Çehreli wrote:
 Jacob Carlborg wrote:

  > I would keep std.string for string specific functions and perhaps
  > publicly import std.algorithm. For example, functions like tolower, icmp
  > and toStringz.

 I've been thinking about characters lately and have realized that
 tolower, toupper, icmp, and friends should not be in a string library.
 Those functions need an "alphabet" to be useful; not language, nor
 locale...

 In fact, the character itself must have alphabet information. Otherwise
 a string like "ali & jim" cannot be converted to upper-case correctly(*)
 as "ALİ & JIM". And the word "correctly" there depends on each
 character's alphabet.

 Similarly, two characters that look the same cannot be compared for
 ordering. Comparing the 'x' of one alphabet to the 'x' of another
 alphabet is a meaningless operation.

 Ali

I'm not sure I really understand this, probably because I don't know 
much about how Unicode works. I'm thinking out loud:

If "i", as you have in "ali", has the corresponding "İ" as upper case, 
wouldn't that be a different character from the English "i"? If so, I'm not 
sure I see the problem. If not, I see the problem.

Jan 29 2010

Ali Çehreli <acehreli yahoo.com> writes:

Jacob Carlborg wrote:
 On 1/29/10 22:18, Ali Çehreli wrote:
 Jacob Carlborg wrote:

  > I would keep std.string for string specific functions and perhaps
  > publicly import std.algorithm. For example, functions like tolower, icmp
  > and toStringz.

 I've been thinking about characters lately and have realized that
 tolower, toupper, icmp, and friends should not be in a string library.
 Those functions need an "alphabet" to be useful; not language, nor
 locale...

 In fact, the character itself must have alphabet information. Otherwise
 a string like "ali & jim" cannot be converted to upper-case correctly(*)
 as "ALİ & JIM". And the word "correctly" there depends on each
 character's alphabet.

 Similarly, two characters that look the same cannot be compared for
 ordering. Comparing the 'x' of one alphabet to the 'x' of another
 alphabet is a meaningless operation.

 Ali

 I'm not sure I really understand this, probably because I don't know
 much about how Unicode works. I'm thinking out loud:

 If "i", as you have in "ali", has the corresponding "İ" as upper case,
 wouldn't that be a different character from the English "i"?

'i' and 'i' are the same "character", because they have the same ASCII 
and Unicode values in different alphabets. But it is not the same 
"letter" when they are part of different text.

The iİ (and ıI) issue is probably too special. A number of Turkic alphabets 
chose ASCII 'i' probably for historical reasons. Unicode did not define 
a separate code point for 'i' either, probably because those alphabets 
already were using the ASCII 'i'.

 If so, I'm not
 sure I see the problem. If not, I see the problem.

The letter 'i' (and I) is special but the issue is valid for any other 
letter: Is it valid to compare an 'i' in English text to an 'i' in 
German text?

I think it's only valid at the lowest data representation level. And 
ASCII never claims to be more than a code table for "information 
interchange". That part is fine.

The problem is with the use of certain ranges of the ASCII table as the 
English alphabet. It is unfortunate that it works... :)

D is great in that it supports three separate Unicode encodings in the 
language, but encodings are at a lower level of abstraction than 
"letters". I am not sure what data is used for toUniUpper and toUniLower 
in std.uni, but they can't work correctly without alphabet information. 
They favor the ASCII layout, probably for historical reasons.

I think the problems with using the ASCII table for sorting are well 
known. A more interesting example is the Azeri alphabet: it uses 
the ASCII xX characters, but sorts them after hH.
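
A small sketch of alphabet-aware ordering along these lines. The `order` string is only an illustrative fragment in which 'x' follows 'h', not a complete or authoritative Azeri alphabet, and `rank` is a hypothetical helper.

```d
import std.algorithm.searching : countUntil;
import std.algorithm.sorting : sort;

// Illustrative fragment of an alphabet in which 'x' follows 'h' (an assumption).
immutable dstring order = "abcdefghxijk"d;

// Hypothetical helper: position of a letter within that alphabet.
ptrdiff_t rank(dchar c)
{
    return order.countUntil(c);
}

void main()
{
    dchar[] letters = "xihg"d.dup;

    // Sort by alphabet position instead of by code point value.
    sort!((a, b) => rank(a) < rank(b))(letters);
    assert(letters == "ghxi"d);

    // Plain code point order would place 'x' last.
    dchar[] byCodePoint = "xihg"d.dup;
    sort(byCodePoint);
    assert(byCodePoint == "ghix"d);
}
```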

Ali

Jan 29 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Ali Çehreli wrote:
 D is great in that it supports three separate Unicode encodings in the 
 language, but encodings are at a lower level of abstraction than 
 "letters". I am not sure what data is used for toUniUpper and toUniLower 
 in std.uni, but they can't work correctly without alphabet information. 
 They favor the ASCII layout, probably for historical reasons.
 
 I think the problems with using the ASCII table for sorting are well 
 known. A more interesting example is the Azeri alphabet: it uses 
 the ASCII xX characters, but sorts them after hH.

My idea of functions for upper/lowercase would help you solve exactly 
the issue you mention. A conversion trie as an optional parameter would 
make it possible to capitalize Straße as STRASSE and ali as ALİ.

The trie will match the longest substring of the original string and 
will have translation strings in the nodes. The way capitalization is 
done will depend on the way you set up the table.
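
A rough sketch of the longest-match idea, with two loudly stated assumptions: an associative array stands in for the trie, and the function name `upperWith` is hypothetical. Substrings without a table entry fall back to `std.uni.toUpper`.

```d
import std.algorithm.comparison : min;
import std.uni : toUpper;
import std.utf : stride;

/// Hypothetical helper: upper-cases `s`, consulting `table` first.
/// An associative array replaces the trie from the post.
string upperWith(string s, string[string] table)
{
    // Longest key, in code units, bounds the substrings worth trying.
    size_t maxKey = 0;
    foreach (key; table.byKey)
        if (key.length > maxKey)
            maxKey = key.length;

    string result;
    size_t i = 0;
    while (i < s.length)
    {
        bool matched = false;
        // Try the longest candidate substring first.
        for (size_t len = min(s.length - i, maxKey); len > 0; --len)
        {
            if (auto hit = s[i .. i + len] in table)
            {
                result ~= *hit;
                i += len;
                matched = true;
                break;
            }
        }
        if (!matched)
        {
            // No table entry: default mapping for one encoded character.
            immutable n = stride(s, i);
            result ~= toUpper(s[i .. i + n]);
            i += n;
        }
    }
    return result;
}

void main()
{
    string[string] table = ["i": "İ", "ß": "SS"];
    assert(upperWith("ali", table) == "ALİ");
    assert(upperWith("Straße", table) == "STRASSE");
}
```

As the post says, the result depends entirely on how the table is set up; the table above would be wrong for English text.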


Andrei

Jan 29 2010

dsimcha <dsimcha yahoo.com> writes:

== Quote from Jacob Carlborg (doob me.com)'s article
 Perhaps it's time to start adding more packages than just the std. Make
 std.algorithm a package and try to split it into several modules.

Please, no.  I **HATE** fine-grained imports like Tango has.  I don't want to
write tons of boilerplate at the top of every file just to have access to a
bunch of closely related functionality.  If this is done, **PLEASE** at least
make a std.algorithm.all that publicly imports everything in the old
std.algorithm.
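
A sketch of the mechanism being asked for: an umbrella module is just a list of `public import` declarations. The submodule names below are those of present-day Phobos and are assumptions relative to this thread; in a real library the imports would sit in their own file (for example a hypothetical `myalgo/all.d`), and `main` is included here only so that the example stands alone.

```d
// These two lines are what an umbrella module would consist of.
public import std.algorithm.searching;
public import std.algorithm.sorting;

void main()
{
    // Both submodules are available through the declarations above.
    auto data = [3, 1, 2];
    sort(data);
    assert(data.startsWith([1, 2]));
}
```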

Jan 29 2010

Jonathan M Davis <jmdavisProg gmail.com> writes:

dsimcha wrote:

 == Quote from Jacob Carlborg (doob me.com)'s article
 Perhaps it's time to start adding more packages than just the std. Make
 std.algorithm a package and try to split it into several modules.

 
 Please, no.  I **HATE** fine-grained imports like Tango has.  I don't want
 to write tons of boilerplate at the top of every file just to have access
 to a bunch of closely related functionality.  If this is done, **PLEASE**
 at least make a std.algorithm.all that publicly imports everything in the
 old std.algorithm.

We need a balance. Fine-grained can be great, but if it's too fine-grained, 
it gets hard to find things and you have to import a ton of modules. Not 
fine-grained enough, however, and you have a hard time finding things because 
there's so much to search through in each module - though importing what you 
need is easy.

Personally, I'm fine with std.algorithm being split into sub-modules. It's 
already fairly large and splitting it up would make a lot of sense. But then 
a solution allowing you to import large portions - if not all of it - at 
once would definitely be nice. It's why being able to do something like

import std.*;

and have it recursively grab every sub-module would be nice. But 
std.algorithm.all is a good idea.

- Jonathan M Davis

Jan 29 2010

Lutger <lutger.blijdestijn gmail.com> writes:

On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote:
...
 One problem I foresee is the growth of std.algorithm. It already has
 many things in it, and I fear that some user who just wants to trim a
 string may find it intimidating to browse through all that
 documentation. I wonder how we could break std.algorithm into smaller
 units (which is an issue largely independent from generalizing the
 algorithms now found in std.string).

 Any ideas are welcome.


 Andrei

I like how naturaldocs, which is similar to ddoc, helps with this: by 
adding a group tag. See this example of a summary of a class:

http://www.naturaldocs.org/documenting/reference.html#Example_Class

Probably it is possible to come up with categories for algorithm like:
- functional tools
- searching and sorting
- string utilities
...

Arguably a more D like alternative is to make std.algorithm a package 
and each 'category' a module of that package.

Jan 29 2010

Lutger <lutger.blijdestijn gmail.com> writes:

On 01/29/2010 09:13 PM, Lutger wrote:
 http://www.naturaldocs.org/documenting/reference.html#Example_Class

sorry, wrong anchor:
http://www.naturaldocs.org/documenting/reference.html#Summaries

Jan 29 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Lutger wrote:
 On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote:
 ...
 One problem I foresee is the growth of std.algorithm. It already has
 many things in it, and I fear that some user who just wants to trim a
 string may find it intimidating to browse through all that
 documentation. I wonder how we could break std.algorithm into smaller
 units (which is an issue largely independent from generalizing the
 algorithms now found in std.string).

 Any ideas are welcome.


 Andrei

 
 I like how naturaldocs, which is similar to ddoc helps with this: by 
 adding a group tag. See this example of a summary of a class:
 
 http://www.naturaldocs.org/documenting/reference.html#Example_Class
 
 Probably it is possible to come up with categories for algorithm like:
 - functional tools
 - searching and sorting
 - string utilities
 ...
 
 Arguably a more D like alternative is to make std.algorithm a package 
 and each 'category' a module of that package.

I think the idea of tags is awesome, particularly because it doesn't 
require one to divide items in disjoint sets. I'll think some more of 
it. It might require changes in ddoc. At any rate, sounds like a D3 
thing. Until then, I think I'll add to std.algorithm in confidence that 
we can scale the documentation later.

Andrei

Jan 29 2010

Lutger <lutger.blijdestijn gmail.com> writes:

On 01/29/2010 09:18 PM, Andrei Alexandrescu wrote:
 Lutger wrote:
 On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote:
 ...
 One problem I foresee is the growth of std.algorithm. It already has
 many things in it, and I fear that some user who just wants to trim a
 string may find it intimidating to browse through all that
 documentation. I wonder how we could break std.algorithm into smaller
 units (which is an issue largely independent from generalizing the
 algorithms now found in std.string).

 Any ideas are welcome.


 Andrei

 I like how naturaldocs, which is similar to ddoc helps with this: by
 adding a group tag. See this example of a summary of a class:

 http://www.naturaldocs.org/documenting/reference.html#Example_Class

 Probably it is possible to come up with categories for algorithm like:
 - functional tools
 - searching and sorting
 - string utilities
 ...

 Arguably a more D like alternative is to make std.algorithm a package
 and each 'category' a module of that package.

 I think the idea of tags is awesome, particularly because it doesn't
 require one to divide items in disjoint sets. I'll think some more of
 it. It might require changes in ddoc. At any rate, sounds like a D3
 thing. Until then, I think I'll add to std.algorithm in confidence that
 we can scale the documentation later.

 Andrei

Cool, tags are even better (naturaldocs groups aren't really tags). How 
are you going to do it? Perhaps it is better to reserve this as a standard 
ddoc section marked as 'to be implemented'? That way everybody can 
benefit eventually.

Jan 29 2010

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:
 I think the idea of tags is awesome, particularly because it doesn't 
 require one to divide items in disjoint sets. I'll think some more of it.

A hierarchical D/Python-like module system isn't the only way to organize
blocks of code. Both the future Windows file system and Google's email use tags
to group items in a less disjoint way. But I don't know if it's possible to
design the equivalent of a module system based on tags instead of a hierarchy
of modules/packages (and superpackages). It seems a cute idea.


>>32 bits are not enough to represent certain "characters", they need more than
one of such dchar. So dchar too may be a bidirectional range.<<

>[citation needed]<

I am far from an expert on such hairy matters, so I may be wrong. This is from
Wikipedia:
http://en.wikipedia.org/wiki/UTF-32

>Though a fixed number of bytes per code point seems convenient, it is not used
as much as the other Unicode encodings. It makes truncation slightly easier but
not significantly so compared to UTF-8 and UTF-16. It does not make calculating
the displayed width of a string any easier except in very limited cases, since
even with a "fixed width" font there may be more than one code point per
character position (combining marks) or more than one character position per
code point (for example CJK ideographs). Combining marks also mean editors
cannot treat one code point as being the same as one unit for editing.<

That paragraph of text also links to:
http://en.wikipedia.org/wiki/Combining_character
http://en.wikipedia.org/wiki/CJK
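
A small sketch of the combining-mark case, as I understand it (the variable
names are only illustrative): the same visible "é" can be one dchar or two,
so counting dchars does not count characters as a user sees them.

```d
// Two spellings of the same visible character "é".
// Illustrative names, not taken from any library.
immutable dstring precomposed = "\u00E9"d;  // U+00E9, a single code point
immutable dstring decomposed  = "e\u0301"d; // 'e' followed by combining acute U+0301

void main()
{
    // One dchar versus two dchars for what is displayed as one character.
    assert(precomposed.length == 1);
    assert(decomposed.length == 2);

    // A plain comparison of code points sees them as different strings.
    assert(precomposed != decomposed);
}
```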

Bye,
bearophile

Jan 29 2010

Lutger <lutger.blijdestijn gmail.com> writes:

On 01/29/2010 09:43 PM, bearophile wrote:
 Andrei Alexandrescu:
 I think the idea of tags is awesome, particularly because it doesn't
 require one to divide items in disjoint sets. I'll think some more of it.

 A hierarchical D/Python-like module system isn't the only way to organize
blocks of code. Both future Windows file system and Google Email use tags to
create groups of items in a less disjoint way. But I don't know if it's
possible to design the equivalent of a module system based on tags instead of a
hierarchy of modules/packages (and superpackages). It seems a cute idea.

This is about the documentation, which at the moment is based on the 
module system, type system and order of declarations. Such tags allow 
for better indexes, organization and search through the docs.

Jan 29 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Lutger wrote:
 On 01/29/2010 09:43 PM, bearophile wrote:
 Andrei Alexandrescu:
 I think the idea of tags is awesome, particularly because it doesn't
 require one to divide items in disjoint sets. I'll think some more of 
 it.

 A hierarchical D/Python-like module system isn't the only way to 
 organize blocks of code. Both future Windows file system and Google 
 Email use tags to create groups of items in a less disjoint way. But I 
 don't know if it's possible to design the equivalent of a module 
 system based on tags instead of a hierarchy of modules/packages (and 
 superpackages). It seems a cute idea.

 
 This is about the documentation, which at the moment is based on the 
 module system, type system and order of declarations. Such tags allow 
 for better indexes, organization and search through the docs.

I don't think it would be too far-fetched to define and use tags for 
selective imports a la:

// inside std.algorithm
@tag(string, comparison) bool startsWith(...)(...) { ... }

// in client code
// get everything tagged with "string"
import std.algorithm : @tag(string);
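
The import-by-tag syntax above is hypothetical. The tagging half of the idea
can be sketched on its own, assuming a D compiler with user-defined attributes
and std.traits.hasUDA (an assumption about later compilers and Phobos, not
something this discussion takes as given). The tag struct and the two
functions are made-up names for illustration; selecting imports by tag is not
shown.

```d
import std.traits : hasUDA;

// Hypothetical attribute type used as a tag; not part of Phobos.
struct tag { string name; }

// Illustrative stand-in for an algorithm carrying two overlapping tags.
@tag("string") @tag("comparison")
bool beginsWith(string haystack, string needle)
{
    return haystack.length >= needle.length
        && haystack[0 .. needle.length] == needle;
}

// Illustrative stand-in for an algorithm in a different category.
@tag("sorting")
bool isSortedAscending(const int[] a)
{
    foreach (i; 1 .. a.length)
        if (a[i - 1] > a[i]) return false;
    return true;
}

// Tags need not form disjoint sets and can be queried at compile time.
static assert(hasUDA!(beginsWith, tag("string")));
static assert(hasUDA!(beginsWith, tag("comparison")));
static assert(!hasUDA!(isSortedAscending, tag("string")));

void main()
{
    assert(beginsWith("std.string", "std"));
    assert(isSortedAscending([1, 2, 3]));
}
```

A documentation generator or an import mechanism could, in principle, filter
symbols with the same kind of query.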


Andrei

Jan 29 2010

bearophile <bearophileHUGS lycos.com> writes:

Andrei Alexandrescu:

 // in client code
 // get everything tagged with "string"
 import std.algorithm : @tag(string);

A next step is to allow importing all names with a specified tag, even if those
names are spread across more than one source file (the compiler could create a
JSON file to speed up this retrieval):

import @tag(string);

To keep things tidy I think it's better to minimize the number of different
tags inside each file, so they are similar to modules anyway: perfect
hierarchies are sometimes too rigid to represent real-life complexities,
but an approximate hierarchy is tidier and simpler to understand than an
amorphous soup of tags.

Bye,
bearophile

Jan 29 2010

"Robert Jacques" <sandford jhu.edu> writes:

On Fri, 29 Jan 2010 15:18:14 -0500, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 Lutger wrote:
 On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote:
 ...
 One problem I foresee is the growth of std.algorithm. It already has
 many things in it, and I fear that some user who just wants to trim a
 string may find it intimidating to browse through all that
 documentation. I wonder how we could break std.algorithm into smaller
 units (which is an issue largely independent from generalizing the
 algorithms now found in std.string).

 Any ideas are welcome.


 Andrei

  I like how naturaldocs, which is similar to ddoc helps with this: by  
 adding a group tag. See this example of a summary of a class:
  http://www.naturaldocs.org/documenting/reference.html#Example_Class
  Probably it is possible to come up with categories for algorithm like:
 - functional tools
 - searching and sorting
 - string utilities
 ...
  Arguably a more D like alternative is to make std.algorithm a package  
 and each 'category' a module of that package.

 I think the idea of tags is awesome, particularly because it doesn't  
 require one to divide items in disjoint sets. I'll think some more of  
 it. It might require changes in ddoc. At any rate, sounds like a D3  
 thing. Until then, I think I'll add to std.algorithm in confidence that  
 we can scale the documentation later.

 Andrei

By the way, in the short term you could greatly improve the usability of  
std.algorithm by cleaning up the index ("jump to") at the top of the file.  
A simple alphabetical listing would be great, and you could easily start  
grouping links under categories (which would eventually become tags).

Jan 29 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

Robert Jacques wrote:
 On Fri, 29 Jan 2010 15:18:14 -0500, Andrei Alexandrescu 
 <SeeWebsiteForEmail erdani.org> wrote:
 
 Lutger wrote:
 On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote:
 ...
 One problem I foresee is the growth of std.algorithm. It already has
 many things in it, and I fear that some user who just wants to trim a
 string may find it intimidating to browse through all that
 documentation. I wonder how we could break std.algorithm into smaller
 units (which is an issue largely independent from generalizing the
 algorithms now found in std.string).

 Any ideas are welcome.


 Andrei

  I like how naturaldocs, which is similar to ddoc helps with this: by 
 adding a group tag. See this example of a summary of a class:
  http://www.naturaldocs.org/documenting/reference.html#Example_Class
  Probably it is possible to come up with categories for algorithm like:
 - functional tools
 - searching and sorting
 - string utilities
 ...
  Arguably a more D like alternative is to make std.algorithm a 
 package and each 'category' a module of that package.

 I think the idea of tags is awesome, particularly because it doesn't 
 require one to divide items in disjoint sets. I'll think some more of 
 it. It might require changes in ddoc. At any rate, sounds like a D3 
 thing. Until then, I think I'll add to std.algorithm in confidence 
 that we can scale the documentation later.

 Andrei

 
 By the way, in the short term you could greatly improve the usability of 
 std.algorithm by cleaning up the index ("jump to") at the top of the 
 file. A simple alphabetical listing would be great and you could easily 
 start grouping links under categories (which would eventually become tags)

That "jump to" index is automatically generated. I can have it sorted 
alphabetically, which makes sense for large lists. But then should I 
also list the components themselves in alphabetical order?

Andrei

Jan 29 2010

Clemens <eriatarka84 gmail.com> writes:

Lionello Lunesu Wrote:

 On 31-1-2010 16:34, Simen kjaeraas wrote:
 Lionello Lunesu <lio lunesu.remove.com> wrote:
 
 I miss typedef. I think this is exactly what typedef was intended
 for. Perhaps we can reintroduce it as a 'short hand' for such a
 struct?

 
 struct Typedef( T ) {
   T payload;
   alias payload this;
 }
 
 Usage:
 
 alias Typedef!( int ) myInt;
 
 Is this what you want?

 
 Using alias you lose all type safety.
 
 I remember Andrei mentioned that he and Walter couldn't agree
 whether typedef should behave as a sub or super class. I think it
 should not be looked at from an inheritance perspective, but just
 considered as a wrapper struct with a ctor that takes the
 underlying type.

I think you may misunderstand what the "alias this" construct does. It does
exactly what you ask for:

http://www.digitalmars.com/d/2.0/class.html#AliasThis
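
A rough sketch of how that plays out, using the struct from Simen's post (the
names MyInt, twice and unwrap are only illustrative, not from any library):
the wrapper stays a distinct type from int, yet converts to it implicitly.

```d
// Wrapper struct from the quoted post; `alias payload this` lets a
// Typedef!T be used wherever a T is expected.
struct Typedef(T)
{
    T payload;
    alias payload this;
}

alias Typedef!int MyInt; // illustrative name

// Hypothetical helpers, only here to show which calls are accepted.
int twice(int x) { return 2 * x; }
int unwrap(MyInt m) { return m.payload; }

// The wrapper is a distinct type, but converts implicitly to int.
static assert(!is(MyInt == int));
static assert(is(MyInt : int));
// A plain int is not accepted where the wrapper is required.
static assert(!__traits(compiles, unwrap(3)));

void main()
{
    auto a = MyInt(21);      // constructed from the underlying type
    assert(twice(a) == 42);  // implicit conversion through alias this
    assert(a + 1 == 22);     // arithmetic forwards to payload
}
```

One caveat: two aliases of Typedef!int name the same type, so keeping them
apart would need something extra, such as an additional template parameter.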

Feb 02 2010
