digitalmars.D - std.string will get the boot
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jan 29 2010
- bearophile <bearophileHUGS lycos.com> Jan 29 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jan 29 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jan 30 2010
- Lionello Lunesu <lio lunesu.remove.com> Jan 30 2010
- Michel Fortin <michel.fortin michelf.com> Jan 30 2010
- Lionello Lunesu <lio lunesu.remove.com> Feb 01 2010
- "Simen kjaeraas" <simen.kjaras gmail.com> Jan 30 2010
- "Simen kjaeraas" <simen.kjaras gmail.com> Jan 31 2010
- "Denis Koroskin" <2korden gmail.com> Jan 31 2010
- "Denis Koroskin" <2korden gmail.com> Jan 31 2010
- "Simen kjaeraas" <simen.kjaras gmail.com> Jan 31 2010
- Jacob Carlborg <doob me.com> Jan 29 2010
- Ali Çehreli <acehreli yahoo.com> Jan 29 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jan 29 2010
- Jacob Carlborg <doob me.com> Jan 29 2010
- dsimcha <dsimcha yahoo.com> Jan 29 2010
- Jonathan M Davis <jmdavisProg gmail.com> Jan 29 2010
- Lutger <lutger.blijdestijn gmail.com> Jan 29 2010
- Lutger <lutger.blijdestijn gmail.com> Jan 29 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jan 29 2010
- Lutger <lutger.blijdestijn gmail.com> Jan 29 2010
- bearophile <bearophileHUGS lycos.com> Jan 29 2010
- Lutger <lutger.blijdestijn gmail.com> Jan 29 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jan 29 2010
- bearophile <bearophileHUGS lycos.com> Jan 29 2010
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Jan 29 2010
- "Robert Jacques" <sandford jhu.edu> Jan 29 2010
I plan a few improvements to Phobos that will improve string handling. Currently arrays of characters count as random-access ranges, which is not true for arrays of char and wchar. I plan to make std.range aware of that and only characterize char[] and wchar[] (and their qualified versions) as bidirectional ranges. Also, std.range will define s.front and s.back for strings to return the correctly decoded dchar. Naturally, s.popFront and s.popBack will yank an entire encoded character, which is what you want most of the time anyway. (You're still free to do s = s[1 .. $] if that's what you need.)
These changes will have the great effect of enabling std.algorithm to work with strings correctly without any further impedance adaptation. (At some point I'd defined byDchar to wrap a string as a bidirectional range; it works, but of course it's much better without an intermediary.)
Following that change, I plan to eliminate std.string entirely and roll all of its functionality into std.algorithm. This is because I noticed that I'd like many string functions to be available for other data types, and also because people who want to define their own non-UTF encodings can benefit from the support that UTF already has. (As an example, startsWith or endsWith are very useful not only with strings, but general data as well.)
A possible idea would be to move algorithms out of std.string and roll std.utf and std.encoding into std.string. That way std.string becomes something UTF-specific, which may be sensible.
One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string).
Any ideas are welcome.
Andrei
Jan 29 2010
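The range behaviour proposed above can be sketched as follows. This is a minimal sketch, assuming a current Phobos in which string front/popFront decode whole code points; the module names std.range.primitives and std.algorithm.searching are today's layout, not the 2010 one discussed in the post.

```d
import std.algorithm.searching : startsWith;
import std.range.primitives : front, popFront;

void main()
{
    string s = "héllo";                    // 'é' takes two UTF-8 code units
    assert(s.length == 6);                 // .length counts code units, not characters
    static assert(is(typeof(s.front) == dchar));
    assert(s.front == 'h');

    s.popFront();                          // drops 'h'
    assert(s.front == 'é');                // decoded as a single code point
    s.popFront();                          // drops both code units of 'é'
    assert(s == "llo");

    // Raw slicing on code units is still available when that is what you need.
    string t = "abc";
    t = t[1 .. $];
    assert(t == "bc");

    // The same generic algorithm serves strings and general data.
    assert("hello".startsWith("he"));
    assert([1, 2, 3].startsWith([1, 2]));
}
```

Because the string is treated as a bidirectional range of dchar, the generic startsWith needs no string-specific overload.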
Andrei Alexandrescu:
> Currently arrays of characters count as random-access ranges, which is not true for arrays of char and wchar. I plan to make std.range aware of that and only characterize char[] and wchar[] (and their qualified versions) as bidirectional ranges.
32 bits are not enough to represent certain "characters"; they need more than one such dchar. So dchar too may be a bidirectional range. I can't remember the bit size of wchar and dchar. So names like char, char16 and char32 can be better... Sometimes I have ugly 7-bit ASCII strings, and I am not sure I want to be forced to use cast(ubyte[]) every time I use an algorithm on them :-)
> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation.
It's not just a matter of documentation: to choose among n items a human needs more time as n grows (people who design important menus in GUIs must be aware of this). So huge APIs slow down programming. A possible solution is to keep the std.string module, but make it just a list of aliases and thin wrappers around functions of std.algorithm, tuned for string processing (for example, I usually don't need tolower on generic arrays; some operations are mostly useful for strings).
Bye, bearophile
Jan 29 2010
bearophile wrote:
> Andrei Alexandrescu:
>> Currently arrays of characters count as random-access ranges, which is not true for arrays of char and wchar. I plan to make std.range aware of that and only characterize char[] and wchar[] (and their qualified versions) as bidirectional ranges.
> 32 bits are not enough to represent certain "characters", they need more than one of such dchar. So dchar too may be a bidirectional range.
[citation needed]
> I can't remember the bit size of wchar and dchar. So names like char, char16 and char32 can be better...
I think it's a tad late for that.
> Sometimes I have ugly 7-bit ASCII strings, I am not sure I want to be forced to use cast(ubyte[]) every time I use an algorithm on them :-)
That's exactly one of the cases in which my change would help. char is UTF-8, so that's out as an option for expressing ASCII characters. You'll be able to define your own type: struct AsciiChar { ubyte datum; ... } Then express stuff in terms of AsciiChar[] etc.
> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation.
> It's not just a matter of documentation: to choose among n items a human needs more time as n grows (people who design important menus in GUIs must be aware of this). So huge APIs slow down programming. A possible solution is to keep the std.string module, but make it just a list of aliases and thin wrappers around functions of std.algorithm, tuned for string processing (for example, I usually don't need tolower on generic arrays; some operations are mostly useful for strings).
That's a good possibility.
Andrei
Jan 29 2010
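The AsciiChar idea from the post above might look like the sketch below. The constructor, the range check and the ascii helper are hypothetical additions for illustration; the post only names the struct and its datum field. The module path std.algorithm.searching assumes a current Phobos.

```d
import std.algorithm.searching : startsWith;

/// Hypothetical wrapper type: one 7-bit ASCII character per element.
struct AsciiChar
{
    ubyte datum;

    this(char c)
    {
        assert(c < 0x80, "not an ASCII character");
        datum = cast(ubyte) c;
    }
}

/// Hypothetical helper that converts an ASCII-only string into AsciiChar[].
AsciiChar[] ascii(string s)
{
    AsciiChar[] result;
    foreach (char c; s)          // iterate code units, no decoding
        result ~= AsciiChar(c);
    return result;
}

void main()
{
    auto text = ascii("hello world");
    // AsciiChar[] is an ordinary array, so indexing and length are per character...
    assert(text.length == 11);
    assert(text[4].datum == 'o');
    // ...and generic algorithms work on it without any UTF decoding step.
    assert(text.startsWith(ascii("hello")));
}
```

Since AsciiChar[] is not a char array, it would keep its random-access status under the proposed change.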
Simen kjaeraas wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> bearophile wrote:
>>> I can't remember the bit size of wchar and dchar. So names like char, char16 and char32 can be better...
>> I think it's a tad late for that.
> So adding aliases to object.d is not possible this late in the process? I'm not sure I want that to happen, just out of curiosity.
That would be possible.
Andrei
Jan 30 2010
On 30-1-2010 1:59, Andrei Alexandrescu wrote:
> bearophile wrote:
>> Andrei Alexandrescu:
>>> Currently arrays of characters count as random-access ranges, which is not true for arrays of char and wchar. I plan to make std.range aware of that and only characterize char[] and wchar[] (and their qualified versions) as bidirectional ranges.
>> 32 bits are not enough to represent certain "characters", they need more than one of such dchar. So dchar too may be a bidirectional range.
> [citation needed]
I also doubt 32-bit is not enough. In fact, Unicode has 0x10FFFF as the highest code point.
>> Sometimes I have ugly 7-bit ASCII strings, I am not sure I want to be forced to use cast(ubyte[]) every time I use an algorithm on them :-)
> That's exactly one of the cases in which my change would help. char is UTF-8, so that's out as an option for expressing ASCII characters. You'll be able to define your own type: struct AsciiChar { ubyte datum; ... } Then express stuff in terms of AsciiChar[] etc.
I miss typedef. I think this is exactly what typedef was intended for. Perhaps we can reintroduce it as a 'short hand' for such a struct? By the way, ASCII is a subset of UTF-8 (that was the whole point), so there's no reason why 'char[]' can't still be used for ASCII strings, right?
L.
Jan 30 2010
On 2010-01-30 22:06:06 -0500, Lionello Lunesu <lio lunesu.remove.com> said:
> On 30-1-2010 1:59, Andrei Alexandrescu wrote:
>> bearophile wrote:
>>> Andrei Alexandrescu:
>>>> Currently arrays of characters count as random-access ranges, which is not true for arrays of char and wchar. I plan to make std.range aware of that and only characterize char[] and wchar[] (and their qualified versions) as bidirectional ranges.
>>> 32 bits are not enough to represent certain "characters", they need more than one of such dchar. So dchar too may be a bidirectional range.
>> [citation needed]
> I also doubt 32-bit is not enough. In fact, Unicode has 0x10FFFF as the highest code point.
32-bit is enough to cover all code points. But there are many combining code points in Unicode, allowing you to combine diacritics with various other characters, such as an acute accent with a 'k'. Some of these combinations exist in precombined form and are considered equivalent. So if you want to count the number of characters the user actually sees instead of counting code points, then you need to take these combining code points into account.
But if you really wanted to iterate over "characters" instead of code points, note that it can become quite hard if you take into account double diacritics, combining diacritic signs placed across two letters. So I think it's reasonable to have dchar, a code point, as the base unit for iterating over a string.
http://en.wikipedia.org/wiki/Combining_character
http://en.wikipedia.org/wiki/Unicode_normalization
Another interesting case: http://en.wikipedia.org/wiki/Combining_grapheme_joiner
Unicode, isn't it great?
-- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 30 2010
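The difference between code units, code points and user-perceived characters described above can be checked with a short sketch. It assumes a current Phobos, where std.uni provides byGrapheme and normalize; neither is mentioned in the thread. The letter 'e' is used here in place of the 'k' from the post because its precomposed form U+00E9 is well known.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT is displayed as one "é".
    string decomposed = "e\u0301";
    string precomposed = "\u00E9";

    assert(decomposed.length == 3);                 // UTF-8 code units
    assert(decomposed.walkLength == 2);             // code points (dchar)
    assert(decomposed.byGrapheme.walkLength == 1);  // user-perceived characters

    // The two spellings are different code point sequences...
    assert(decomposed != precomposed);
    // ...that become equal after normalization to composed form.
    assert(normalize!NFC(decomposed) == precomposed);
}
```

Iterating by dchar gives the middle count, which matches the post's suggestion of the code point as the base unit.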
On 31-1-2010 16:34, Simen kjaeraas wrote:
> Lionello Lunesu <lio lunesu.remove.com> wrote:
>> I miss typedef. I think this is exactly what typedef was intended for. Perhaps we can reintroduce it as a 'short hand' for such a struct?
> struct Typedef( T ) { T payload; alias payload this; } Usage: alias Typedef!( int ) myInt; Is this what you want?
Using alias you lose all type safety. I remember Andrei mentioned that he and Walter couldn't agree whether typedef should behave as a sub or super class. I think it should not be looked at from an inheritance perspective, but just considered as a wrapper struct with a ctor that takes the underlying type.
>> By the way, ASCII is a subset of UTF-8 (that was the whole point), so there's no reason why 'char[]' can't still be used for ASCII strings, right?
> As far as I have understood (I am no Unicode guru), in some locales toUpper and toLower map ASCII chars to non-ASCII chars. So ASCII being a strict subset of UTF-8 is not always true.
True, but then that uppercase or lowercase result would no longer be ASCII. As long as you stick to ASCII, char[] should work just fine. So, toLower and toUpper can accept ASCII char[] but always output one of those new char ranges. Problem fixed :)
L.
Feb 01 2010
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> bearophile wrote:
>> I can't remember the bit size of wchar and dchar. So names like char, char16 and char32 can be better...
> I think it's a tad late for that.
So adding aliases to object.d is not possible this late in the process? I'm not sure I want that to happen, just out of curiosity.
-- Simen
Jan 30 2010
Lionello Lunesu <lio lunesu.remove.com> wrote:
> I miss typedef. I think this is exactly what typedef was intended for. Perhaps we can reintroduce it as a 'short hand' for such a struct?
struct Typedef( T ) { T payload; alias payload this; } Usage: alias Typedef!( int ) myInt; Is this what you want?
> By the way, ASCII is a subset of UTF-8 (that was the whole point), so there's no reason why 'char[]' can't still be used for ASCII strings, right?
As far as I have understood (I am no Unicode guru), in some locales toUpper and toLower map ASCII chars to non-ASCII chars. So ASCII being a strict subset of UTF-8 is not always true.
-- Simen
Jan 31 2010
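The Typedef struct suggested above compiles as written, and a small sketch shows where the type-safety concern raised in the thread comes from. The alias names Meters and Seconds and the function twice are hypothetical, chosen only for illustration.

```d
struct Typedef(T)
{
    T payload;
    alias payload this;   // lets the wrapper convert implicitly to T
}

// Two aliases of the same instantiation name one and the same type.
alias Meters = Typedef!int;
alias Seconds = Typedef!int;

int twice(int x)
{
    return 2 * x;
}

void main()
{
    static assert(is(Meters == Seconds));   // no distinction between the two

    auto m = Meters(21);
    assert(twice(m) == 42);                 // silently accepted wherever an int is
    int raw = m;                            // implicit conversion via alias this
    assert(raw == 21);
}
```

A wrapper that should stay distinct would need a separate instantiation per name and would have to avoid the implicit conversion that alias this provides.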
On Sun, 31 Jan 2010 01:30:41 +0300, Simen kjaeraas <simen.kjaras gmail.com> wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> bearophile wrote:
>>> I can't remember the bit size of wchar and dchar. So names like char, char16 and char32 can be better...
>> I think it's a tad late for that.
> So adding aliases to object.d is not possible this late in the process? I'm not sure I want that to happen, just out of curiosity.
Everyone can do that on their own. I see no reason to pollute the namespace.
Jan 31 2010
On Sun, 31 Jan 2010 11:34:03 +0300, Simen kjaeraas <simen.kjaras gmail.com> wrote:
> Lionello Lunesu <lio lunesu.remove.com> wrote:
>> I miss typedef. I think this is exactly what typedef was intended for. Perhaps we can reintroduce it as a 'short hand' for such a struct?
> struct Typedef( T ) { T payload; alias payload this; } Usage: alias Typedef!( int ) myInt; Is this what you want?
>> By the way, ASCII is a subset of UTF-8 (that was the whole point), so there's no reason why 'char[]' can't still be used for ASCII strings, right?
> As far as I have understood (I am no Unicode guru), in some locales toUpper and toLower map ASCII chars to non-ASCII chars. So ASCII being a strict subset of UTF-8 is not always true.
I only know one example (in Turkish): i <-> İ, ı <-> I. That's a big issue because toUpper/toLower need a locale to provide a correct result.
Jan 31 2010
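The Turkish case can be made concrete with a short sketch. It assumes the current std.uni.toUpper and std.uni.toLower, which take no locale argument and apply the default Unicode mapping.

```d
import std.uni : toLower, toUpper;

void main()
{
    // Default, locale-independent mapping.
    assert("i".toUpper == "I");
    assert("I".toLower == "i");

    // Turkish text expects 'i' -> 'İ' (U+0130) and 'I' -> 'ı' (U+0131),
    // which the default mapping does not produce.
    assert("i".toUpper != "\u0130");
    assert("I".toLower != "\u0131");
}
```

Getting the Turkish result therefore needs extra information beyond the string itself, which is the point made in the post.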
On Sun, 31 Jan 2010 15:09:28 +0100, Denis Koroskin <2korden gmail.com> wrote:
> On Sun, 31 Jan 2010 01:30:41 +0300, Simen kjaeraas <simen.kjaras gmail.com> wrote:
>> Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>>> bearophile wrote:
>>>> I can't remember the bit size of wchar and dchar. So names like char, char16 and char32 can be better...
>>> I think it's a tad late for that.
>> So adding aliases to object.d is not possible this late in the process? I'm not sure I want that to happen, just out of curiosity.
> Everyone can do that on their own. I see no reason to pollute the namespace.
Nor do I. I was only inquiring as to its feasibility.
-- Simen
Jan 31 2010
On 1/29/10 18:36, Andrei Alexandrescu wrote:
> I plan a few improvements to Phobos that will improve string handling. Currently arrays of characters count as random-access ranges, which is not true for arrays of char and wchar. I plan to make std.range aware of that and only characterize char[] and wchar[] (and their qualified versions) as bidirectional ranges. Also, std.range will define s.front and s.back for strings to return the correctly decoded dchar. Naturally, s.popFront and s.popBack will yank an entire encoded character, which is what you want most of the time anyway. (You're still free to do s = s[1 .. $] if that's what you need.) These changes will have the great effect of enabling std.algorithm to work with strings correctly without any further impedance adaptation. (At some point I'd defined byDchar to wrap a string as a bidirectional range; it works, but of course it's much better without an intermediary.) Following that change, I plan to eliminate std.string entirely and roll all of its functionality into std.algorithm. This is because I noticed that I'd like many string functions to be available for other data types, and also because people who want to define their own non-UTF encodings can benefit from the support that UTF already has.
I would keep std.string for string-specific functions and perhaps publicly import std.algorithm. For example, functions like tolower, icmp and toStringz.
> (As an example, startsWith or endsWith are very useful not only with strings, but general data as well.) A possible idea would be to move algorithms out of std.string and roll std.utf and std.encoding into std.string. That way std.string becomes something UTF-specific, which may be sensible. One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string).
Perhaps it's time to start adding more packages than just std. Make std.algorithm a package and try to split it into several modules.
> Any ideas are welcome. Andrei
Jan 29 2010
Jacob Carlborg wrote:
> I would keep std.string for string-specific functions and perhaps publicly import std.algorithm. For example, functions like tolower, icmp and toStringz.
I've been thinking about characters lately and have realized that tolower, toupper, icmp, and friends should not be in a string library. Those functions need an "alphabet" to be useful; not language, nor locale... In fact, the character itself must have alphabet information. Otherwise a string like "ali & jim" cannot be converted to upper-case correctly(*) as "ALİ & JIM". And the word "correctly" there depends on each character's alphabet.
Similarly, two characters that look the same cannot be compared for ordering. Comparing the 'x' of one alphabet to the 'x' of another alphabet is a meaningless operation.
Ali
Jan 29 2010
Ali Çehreli wrote:
> Jacob Carlborg wrote:
>> I would keep std.string for string-specific functions and perhaps publicly import std.algorithm. For example, functions like tolower, icmp and toStringz.
> I've been thinking about characters lately and have realized that tolower, toupper, icmp, and friends should not be in a string library. Those functions need an "alphabet" to be useful; not language, nor locale... In fact, the character itself must have alphabet information. Otherwise a string like "ali & jim" cannot be converted to upper-case correctly(*) as "ALİ & JIM". And the word "correctly" there depends on each character's alphabet. Similarly, two characters that look the same cannot be compared for ordering. Comparing the 'x' of one alphabet to the 'x' of another alphabet is a meaningless operation.
My thoughts exactly. In fact I'm thinking of generalizing toupper and tolower for strings to take an optional trie mapping strings to strings. That way correct capitalization can be done for any string, given a good collection of capitalization patterns.
Andrei
Jan 29 2010
On 1/29/10 22:18, Ali Çehreli wrote:
> Jacob Carlborg wrote:
>> I would keep std.string for string-specific functions and perhaps publicly import std.algorithm. For example, functions like tolower, icmp and toStringz.
> I've been thinking about characters lately and have realized that tolower, toupper, icmp, and friends should not be in a string library. Those functions need an "alphabet" to be useful; not language, nor locale... In fact, the character itself must have alphabet information. Otherwise a string like "ali & jim" cannot be converted to upper-case correctly(*) as "ALİ & JIM". And the word "correctly" there depends on each character's alphabet. Similarly, two characters that look the same cannot be compared for ordering. Comparing the 'x' of one alphabet to the 'x' of another alphabet is a meaningless operation. Ali
I'm not sure I really understand this, probably because I don't know much about how Unicode works. I'm thinking out loud: if "i", as you have in "ali", has the corresponding "İ" as upper case, wouldn't that be another character than the English "i"? If so, I'm not sure I see the problem. If not, I see the problem.
Jan 29 2010
Jacob Carlborg wrote:
> On 1/29/10 22:18, Ali Çehreli wrote:
>> Jacob Carlborg wrote:
>>> I would keep std.string for string-specific functions and perhaps publicly import std.algorithm. For example, functions like tolower, icmp and toStringz.
>> I've been thinking about characters lately and have realized that tolower, toupper, icmp, and friends should not be in a string library. Those functions need an "alphabet" to be useful; not language, nor locale... In fact, the character itself must have alphabet information. Otherwise a string like "ali & jim" cannot be converted to upper-case correctly(*) as "ALİ & JIM". And the word "correctly" there depends on each character's alphabet. Similarly, two characters that look the same cannot be compared for ordering. Comparing the 'x' of one alphabet to the 'x' of another alphabet is a meaningless operation. Ali
> I'm not sure I really understand this, probably because I don't know much about how Unicode works. I'm thinking out loud: if "i", as you have in "ali", has the corresponding "İ" as upper case, wouldn't that be another character than the English "i"?
'i' and 'i' are the same "character", because they have the same ASCII and Unicode values in different alphabets. But it is not the same "letter" when they are part of different text. The iİ (and ıI) issue is probably too special. A number of Turkic alphabets chose ASCII 'i', probably for historical reasons. Unicode did not define a separate code point for 'i' either, probably because those alphabets were already using the ASCII 'i'.
> If so, I'm not sure I see the problem. If not, I see the problem.
The letter 'i' (and I) is special but the issue is valid for any other letter: Is it valid to compare an 'i' in English text to an 'i' in German text? I think it's only valid at the lowest data representation level. And ASCII never claims to be more than a code table for "information interchange". That part is fine. The problem is with the use of certain ranges of the ASCII table as the English alphabet. It is unfortunate that it works... :)
It is great that D supports three separate Unicode encodings in the language, but encodings are at a lower level of abstraction than "letters". I am not sure what data is used for toUniUpper and toUniLower in std.uni, but they can't work correctly without alphabet information. They favor the ASCII layout, probably for historical reasons. I think the problems with using the ASCII table for sorting are well known. A more interesting example is the Azeri alphabet: it uses the ASCII xX characters, but sorts them after hH.
Ali
Jan 29 2010
Ali Çehreli wrote:
> It is great that D supports three separate Unicode encodings in the language, but encodings are at a lower level of abstraction than "letters". I am not sure what data is used for toUniUpper and toUniLower in std.uni, but they can't work correctly without alphabet information. They favor the ASCII layout, probably for historical reasons. I think the problems with using the ASCII table for sorting are well known. A more interesting example is the Azeri alphabet: it uses the ASCII xX characters, but sorts them after hH.
My idea of functions for upper/lowercase would help you solve exactly the issue you mention. A conversion trie as an optional parameter would allow capitalizing Straße as STRASSE and ali as ALİ. The trie will match the longest substring of the original string and will have translation strings in the nodes. The way capitalization is done will depend on the way you set up the table.
Andrei
Jan 29 2010
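The longest-match capitalization described above can be sketched as follows. This is not the proposed trie: it uses a plain associative array and tries the longest candidate key first, which gives the same results on small tables. The function name toUpperWith and its signature are hypothetical; std.uni.toUpper and std.utf.stride are assumed from a current Phobos.

```d
import std.algorithm.comparison : max, min;
import std.uni : toUpper;
import std.utf : stride;

/// Hypothetical helper: upper-cases `s`, preferring the longest key of
/// `table` that matches at each position and falling back to the default
/// Unicode mapping for one code point when nothing matches.
string toUpperWith(string s, string[string] table)
{
    // Longest key, measured in UTF-8 code units.
    size_t maxKey = 0;
    foreach (key; table.byKey)
        maxKey = max(maxKey, key.length);

    string result;
    size_t i = 0;
    outer: while (i < s.length)
    {
        // Try candidate slices from longest to shortest.
        for (size_t len = min(maxKey, s.length - i); len > 0; --len)
        {
            if (auto hit = s[i .. i + len] in table)
            {
                result ~= *hit;
                i += len;
                continue outer;
            }
        }
        // No table entry: default mapping for the code point at index i.
        immutable n = stride(s, i);
        result ~= s[i .. i + n].toUpper;
        i += n;
    }
    return result;
}

void main()
{
    string[string] none;
    assert(toUpperWith("straße", ["ß": "SS"]) == "STRASSE");
    assert(toUpperWith("ali", ["i": "İ"]) == "ALİ");
    assert(toUpperWith("jim", none) == "JIM");
}
```

As the post says, the outcome depends entirely on how the table is set up; the same input "ali" gives "ALI" with an empty table and "ALİ" with the Turkish entry.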
== Quote from Jacob Carlborg (doob me.com)'s article
> Perhaps it's time to start adding more packages than just std. Make std.algorithm a package and try to split it into several modules.
Please, no. I **HATE** fine-grained imports like Tango has. I don't want to write tons of boilerplate at the top of every file just to have access to a bunch of closely related functionality. If this is done, **PLEASE** at least make a std.algorithm.all that publicly imports everything in the old std.algorithm.
Jan 29 2010
dsimcha wrote:
> == Quote from Jacob Carlborg (doob me.com)'s article
>> Perhaps it's time to start adding more packages than just std. Make std.algorithm a package and try to split it into several modules.
> Please, no. I **HATE** fine-grained imports like Tango has. I don't want to write tons of boilerplate at the top of every file just to have access to a bunch of closely related functionality. If this is done, **PLEASE** at least make a std.algorithm.all that publicly imports everything in the old std.algorithm.
We need a balance. Fine-grained can be great, but if it's too fine-grained, it gets hard to find things and you have to import a ton of modules. Not fine-grained enough, however, and you have a hard time finding things because there's so much to search through in each module, though importing what you need is easy.
Personally, I'm fine with std.algorithm being split into sub-modules. It's already fairly large and splitting it up would make a lot of sense. But then a solution allowing you to import large portions, if not all of it, at once would definitely be nice. It's why being able to do something like import std.*; and have it recursively grab every sub-module would be nice. But std.algorithm.all is a good idea.
- Jonathan M Davis
Jan 29 2010
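Both import styles argued over above can be tried side by side with a current Phobos, where std.algorithm is a package split into submodules and a plain import std.algorithm; still brings in everything. That layout postdates this thread, and the function names below are hypothetical.

```d
/// Fine-grained: name exactly the module and symbol that are used.
bool fineGrained()
{
    import std.algorithm.searching : startsWith;
    return "std.string".startsWith("std.");
}

/// Coarse-grained: one import for the whole package.
bool coarseGrained()
{
    import std.algorithm;
    auto data = [3, 1, 2];
    sort(data);
    return data.startsWith([1, 2]) && equal(data, [1, 2, 3]);
}

void main()
{
    assert(fineGrained());
    assert(coarseGrained());
}
```

The package-level import plays the role that std.algorithm.all would have played in the suggestion above.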
On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote: ...
> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string). Any ideas are welcome. Andrei
I like how naturaldocs, which is similar to ddoc, helps with this: by adding a group tag. See this example of a summary of a class: http://www.naturaldocs.org/documenting/reference.html#Example_Class
Probably it is possible to come up with categories for algorithm like:
- functional tools
- searching and sorting
- string utilities
...
Arguably a more D-like alternative is to make std.algorithm a package and each 'category' a module of that package.
Jan 29 2010
On 01/29/2010 09:13 PM, Lutger wrote:
> http://www.naturaldocs.org/documenting/reference.html#Example_Class
Sorry, wrong anchor: http://www.naturaldocs.org/documenting/reference.html#Summaries
Jan 29 2010
Lutger wrote:
> On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote: ...
>> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string). Any ideas are welcome. Andrei
> I like how naturaldocs, which is similar to ddoc, helps with this: by adding a group tag. See this example of a summary of a class: http://www.naturaldocs.org/documenting/reference.html#Example_Class Probably it is possible to come up with categories for algorithm like: - functional tools - searching and sorting - string utilities ... Arguably a more D-like alternative is to make std.algorithm a package and each 'category' a module of that package.
I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it. It might require changes in ddoc. At any rate, sounds like a D3 thing. Until then, I think I'll add to std.algorithm in confidence that we can scale the documentation later.
Andrei
Jan 29 2010
On 01/29/2010 09:18 PM, Andrei Alexandrescu wrote:
> Lutger wrote:
>> On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote: ...
>>> One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string). Any ideas are welcome. Andrei
>> I like how naturaldocs, which is similar to ddoc, helps with this: by adding a group tag. See this example of a summary of a class: http://www.naturaldocs.org/documenting/reference.html#Example_Class Probably it is possible to come up with categories for algorithm like: - functional tools - searching and sorting - string utilities ... Arguably a more D-like alternative is to make std.algorithm a package and each 'category' a module of that package.
> I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it. It might require changes in ddoc. At any rate, sounds like a D3 thing. Until then, I think I'll add to std.algorithm in confidence that we can scale the documentation later. Andrei
Cool, tags are even better (naturaldocs groups aren't really tags). How are you going to do so? Perhaps it is better to reserve this as a standard ddoc section, saying it is 'to be implemented'? This way everybody can benefit eventually.
Jan 29 2010
Andrei Alexandrescu:
> I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it.
A hierarchical D/Python-like module system isn't the only way to organize blocks of code. Both the future Windows file system and Google Email use tags to create groups of items in a less disjoint way. But I don't know if it's possible to design the equivalent of a module system based on tags instead of a hierarchy of modules/packages (and superpackages). It seems a cute idea.
> 32 bits are not enough to represent certain "characters", they need more than one of such dchar. So dchar too may be a bidirectional range.
I am far from expert about such hairy matters, so I can be wrong. This is from Wikipedia: http://en.wikipedia.org/wiki/UTF-32
> Though a fixed number of bytes per code point seems convenient, it is not used as much as the other Unicode encodings. It makes truncation slightly easier but not significantly so compared to UTF-8 and UTF-16. It does not make calculating the displayed width of a string any easier except in very limited cases, since even with a "fixed width" font there may be more than one code point per character position (combining marks) or more than one character position per code point (for example CJK ideographs). Combining marks also mean editors cannot treat one code point as being the same as one unit for editing.
That paragraph of text also links to:
http://en.wikipedia.org/wiki/Combining_character
http://en.wikipedia.org/wiki/CJK
Bye, bearophile
Jan 29 2010
On 01/29/2010 09:43 PM, bearophile wrote:
> Andrei Alexandrescu:
>> I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it.
> A hierarchical D/Python-like module system isn't the only way to organize blocks of code. Both the future Windows file system and Google Email use tags to create groups of items in a less disjoint way. But I don't know if it's possible to design the equivalent of a module system based on tags instead of a hierarchy of modules/packages (and superpackages). It seems a cute idea.
This is about the documentation, which at the moment is based on the module system, type system and order of declarations. Such tags allow for better indexes, organization and search through the docs.
Jan 29 2010
Lutger wrote:On 01/29/2010 09:43 PM, bearophile wrote:Andrei Alexandrescu:I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it.
A hierarchical D/Python-like module system isn't the only way to organize blocks of code. Both future Windows file system and Google Email use tags to create groups of items in a less disjoint way. But I don't know if it's possible to design the equivalent of a module system based on tags instead of a hierarchy of modules/packages (and superpackages). It seems a cute idea.
This is about the documentation, which at the moment is based on the module system, type system and order of declarations. Such tags allow for better indexes, organization and search through the docs.
I don't think it would be too far-fetched to define and use tags for selective imports a la:

    // inside std.algorithm
    tag(string, comparison) bool startsWith(...)(...) { ... }

    // in client code
    // get everything tagged with "string"
    import std.algorithm : tag(string);

Andrei
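The tag(...) syntax above is a proposal, not something the compiler accepts. As a rough approximation in present-day D, tags can be modelled with user-defined attributes, which did not exist when this thread was written. The sketch below is only that: the Tag struct, the hasTag template and startsWithSketch are hypothetical names, and it only queries tags at compile time; it does not give tag-based selective imports.

```d
import std.traits : getUDAs;

// Hypothetical tag attribute; not part of Phobos.
struct Tag
{
    string name;
}

// A function carrying two tags, so it belongs to two non-disjoint groups.
@Tag("string") @Tag("comparison")
bool startsWithSketch(const(char)[] haystack, const(char)[] needle)
{
    return haystack.length >= needle.length
        && haystack[0 .. needle.length] == needle;
}

// Compile-time check: does `sym` carry a Tag with the given name?
enum hasTag(alias sym, string name) = {
    foreach (t; getUDAs!(sym, Tag))   // iterates over the Tag values attached to sym
    {
        if (t.name == name)
            return true;
    }
    return false;
}();

void main()
{
    static assert(hasTag!(startsWithSketch, "string"));
    static assert(hasTag!(startsWithSketch, "comparison"));
    static assert(!hasTag!(startsWithSketch, "sorting"));

    assert(startsWithSketch("std.string", "std."));
    assert(!startsWithSketch("std", "std.string"));
}
```

A documentation generator could, in principle, use the same attributes to build per-tag indexes without changing how modules are laid out.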
Jan 29 2010
Andrei Alexandrescu:

    // in client code
    // get everything tagged with "string"
    import std.algorithm : tag(string);
A next step is to allow importing all names with a specified tag, even if such names are spread over more than one source file (the compiler can create a JSON text file to speed up this retrieval):

    import tag(string);

To keep things tidy I think it's better to minimize the number of different tags inside each file, so they stay similar to modules anyway: perfect hierarchies are sometimes too rigid to represent real-life complexities, but an approximate hierarchy is tidier and simpler to understand than an amorphous soup of tags.
Bye, bearophile
Jan 29 2010
Robert Jacques wrote:
On Fri, 29 Jan 2010 15:18:14 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
Lutger wrote:
On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote:
...
One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string). Any ideas are welcome. Andrei
adding a group tag. See this example of a summary of a class: http://www.naturaldocs.org/documenting/reference.html#Example_Class
Probably it is possible to come up with categories for algorithm like:
- functional tools
- searching and sorting
- string utilities
...
Arguably a more D-like alternative is to make std.algorithm a package and each 'category' a module of that package.
I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it. It might require changes in ddoc. At any rate, sounds like a D3 thing. Until then, I think I'll add to std.algorithm in confidence that we can scale the documentation later. Andrei
By the way, in the short term you could greatly improve the usability of std.algorithm by cleaning up the index ("jump to") at the top of the file. A simple alphabetical listing would be great, and you could easily start grouping links under categories (which would eventually become tags).
That jump to index is automatically generated. I can have it sorted alphabetically, which makes sense for large lists. But then should I also list components in alphabetical order? Andrei
Jan 29 2010
On Fri, 29 Jan 2010 15:18:14 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
Lutger wrote:
On 01/29/2010 06:36 PM, Andrei Alexandrescu wrote:
...
One problem I foresee is the growth of std.algorithm. It already has many things in it, and I fear that some user who just wants to trim a string may find it intimidating to browse through all that documentation. I wonder how we could break std.algorithm into smaller units (which is an issue largely independent from generalizing the algorithms now found in std.string). Any ideas are welcome. Andrei
adding a group tag. See this example of a summary of a class: http://www.naturaldocs.org/documenting/reference.html#Example_Class
Probably it is possible to come up with categories for algorithm like:
- functional tools
- searching and sorting
- string utilities
...
Arguably a more D-like alternative is to make std.algorithm a package and each 'category' a module of that package.
I think the idea of tags is awesome, particularly because it doesn't require one to divide items in disjoint sets. I'll think some more of it. It might require changes in ddoc. At any rate, sounds like a D3 thing. Until then, I think I'll add to std.algorithm in confidence that we can scale the documentation later. Andrei
By the way, in the short term you could greatly improve the usability of std.algorithm by cleaning up the index ("jump to") at the top of the file. A simple alphabetical listing would be great, and you could easily start grouping links under categories (which would eventually become tags).
Jan 29 2010