digitalmars.D - std.gregorian contribution
- negerns (48/48) May 16 2010 I filled in some of the functions from Andrei's first draft of
- Tomek Sowiński (9/40) May 17 2010 std.string.
- negerns (3/43) May 17 2010 I wish it wouldn't be too long like splitByChar :)
- Andrei Alexandrescu (24/71) May 17 2010 I have two unrelated suggestions about unjoin.
- Simen kjaeraas (6/26) May 17 2010 D could use a set type, and this is a very nice way to specify these
- Pelle (2/5) May 17 2010 I agree, and I find a set type to be generally very useful. :)
- Steven Schveighoffer (12/44) May 17 2010 Comparing splitByOneOf(str, "; ")) to splitter(str, set(';', ' ')), I se...
- Andrei Alexandrescu (15/25) May 17 2010 These are good points. They have gone through my mind as well, but
- Steven Schveighoffer (10/39) May 17 2010 These are good ideas.
- Philippe Sigaud (22/33) May 17 2010 I personally use a predicate, isOneOf(some range). It's curried, to it
I filled in some of the functions from Andrei's first draft of the std.gregorian module. I hope they are good enough. I changed some identifiers like GregYear, GregMonth, etc. to GregorianYear, GregorianMonth, etc. so as to be consistent with the 'julian' identifiers, but it's a minor change that maybe I shouldn't have touched.

Also, I have introduced an unjoin() function as a helper function. It splits a string into an array of lines using the specified array of characters as delimiters. I am not sure if there is already an existing function that does the same, but I could not find it. For lack of a better word I opted for the opposite of the join() function in std.string.

    import std.string : indexOf;

    // Splits s at every character that occurs in ch.
    string[] unjoin(string s, char[] ch) {
        size_t start = 0;
        size_t i = 0;
        string[] result;
        for (i = 0; i < s.length; i++) {
            if (indexOf(ch, s[i]) != -1) {
                result ~= s[start..i];
                start = i + 1;
            }
        }
        // Append whatever follows the last delimiter.
        if (start < i) {
            result ~= s[start..$];
        }
        return result;
    }

    unittest {
        string s = "2010-05-31";
        string[] r = unjoin(s, ['/', '-', '.', ',', '\\']);
        assert(r[0] == "2010");
        assert(r[1] == "05");
        assert(r[2] == "31");
    }

I have modified the signatures of fromString() and fromUndelimitedString() to accept string arguments instead of char[]. I am not sure if that is alright with Andrei.

Here's a list of what I have implemented so far:

- Date fromString(in string s)
- Date fromUndelimitedString(in string s)
- @property string toSimpleString()
- @property string toIsoString()
- @property string toIsoExtendedString()
- added string[] months used only by toSimpleString()
- unit tests

I have attached the .diff file gregorian.diff

Regards,
negerns
May 16 2010
negerns wrote:
> Also, I have introduced an unjoin() function as a helper function. It splits a string into an array of lines using the specified array of characters as delimiters. I am not sure if there is already an existing function that does the same but I could not find it. For lack of a better word I opted for the opposite of the join() function in std.string.

Thanks, it's useful. There's std.string.split, but it takes only one delimiter. It'd be nice to have it as an overload that takes any range of delimiters. Yet a delimiter can itself be a string (an array), so it would be unclear how to interpret split(..., "://"). So I suggest calling it splitBy to disambiguate. Like it?

Tomek
May 17 2010
On 5/18/2010 1:03 AM, Tomek Sowiński wrote:
> Thanks, it's useful. There's std.string.split but it takes only one delimiter. It'd be nice to have it as an overload that takes any range of delims. Yet, a delim can be a string (an array) and there would be problems how to understand split(..., "://"). So I suggest calling it splitBy to disambiguate. Like it?

I wish it wouldn't be too long like splitByChar :) I'm out of ideas.
May 17 2010
On 05/17/2010 12:32 PM, negerns wrote:
> I wish it wouldn't be too long like splitByChar :) I'm out of ideas.

I have two unrelated suggestions about unjoin.

First, you may want to follow the model set by splitter() instead of split() when defining unjoin(). This is because split() allocates memory, whereas splitter() splits lazily, so it doesn't need to. If you do want split(), just call array(splitter()).

Second, there is an ambiguity between splitting using a string as separator and splitting using a set of characters as separator. This could be solved by simply using different names:

    string str = ...;
    foreach (splitByOneOf(str, "; ")) { ... }
    foreach (splitter(str, "; ")) { ... }

The first loop splits by either of the two characters, whereas the second splits by the exact string "; ".

An idea I am toying with is to factor things out into the data types. After all, if I'm splitting by "one of" an element in a set of elements, that should be reflected in the set's type. For example:

    foreach (splitter(str, either(';', ' '))) { ... }
    foreach (splitter(str, "; ")) { ... }

or, using a more general notion of a set:

    foreach (splitter(str, set(';', ' '))) { ... }

One nice outcome is that we can then reuse the same pattern in other signatures.

Andrei
May 17 2010
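A minimal sketch of the two distinctions drawn in the post above (lazy versus eager splitting, and "exact string" versus "one of these characters"). It assumes a current Phobos in which splitter() accepts both a separator range and a unary predicate; splitByOneOf, either() and set() from the thread are proposals and are not used here.

```d
import std.algorithm : canFind, equal, splitter;
import std.array : array;

void main()
{
    string str = "a; b";

    // Exact separator: the two-character string "; " must match as a whole.
    auto byString = str.splitter("; ");
    assert(byString.equal(["a", "b"]));

    // "One of": any single character from "; " acts as a separator.
    // The empty piece appears because ';' is followed directly by ' '.
    auto byAnyOf = str.splitter!(c => "; ".canFind(c));
    assert(byAnyOf.equal(["a", "", "b"]));

    // Both ranges above are lazy; array() is the step that allocates
    // an eager string[] result.
    string[] eager = array(str.splitter("; "));
    assert(eager == ["a", "b"]);
}
```

The predicate form keeps the "one of" meaning at the call site rather than in a separate function name or set type, which is one of several ways to resolve the ambiguity discussed here.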
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> An idea I am toying with is to factor things out into the data types. After all, if I'm splitting by "one of" an element in a set of elements, that should be reflected in the set's type. For example:
>
>     foreach (splitter(str, either(';', ' '))) { ... }
>     foreach (splitter(str, "; ")) { ... }
>
> or, using a more general notion of a set:
>
>     foreach (splitter(str, set(';', ' '))) { ... }

D could use a set type, and this is a very nice way to specify these different parameters.

votes = -~votes;

--
Simen
May 17 2010
On 05/17/2010 08:00 PM, Simen kjaeraas wrote:
> D could use a set type, and this is a very nice way to specify these different parameters.
>
> votes = -~votes;

I agree, and I find a set type to be generally very useful. :)
May 17 2010
On Mon, 17 May 2010 14:00:41 -0400, Simen kjaeraas <simen.kjaras gmail.com> wrote:
> Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> or, using a more general notion of a set:
>>
>>     foreach (splitter(str, set(';', ' '))) { ... }
>
> D could use a set type, and this is a very nice way to specify these different parameters.

Comparing splitByOneOf(str, "; ") to splitter(str, set(';', ' ')), I see one major difference here -- "; " is a literal, set(';', ' ') is not. I would expect that 'set' as a generic set type would implement its guts as some sort of tree/hash, which means a lot of overhead for a simple argument. With the literal version, the notation is in the function, not the type. While it seems rather neat, the overhead should be considered.

A compromise:

    foreach(x; splitter(str, either("; ")))

which can be implemented without heap activity.

-Steve
May 17 2010
On 05/17/2010 03:16 PM, Steven Schveighoffer wrote:
> Comparing splitByOneOf(str, "; ") to splitter(str, set(';', ' ')), I see one major difference here -- "; " is a literal, set(';', ' ') is not. I would expect that 'set' as a generic set type would implement its guts as some sort of tree/hash, which means a lot of overhead for a simple argument. With the literal version, the notation is in the function, not the type. While it seems rather neat, the overhead should be considered.
>
> A compromise:
>
>     foreach(x; splitter(str, either("; ")))
>
> which can be implemented without heap activity.

These are good points. They have gone through my mind as well, but lately I've started to take a somewhat more liberal view of containers. For example, I'm thinking that Set!T (which would be the type returned by set()) could automatically use arrays and linear search for small sets. Other special cases come to mind, such as the small array optimization. In other words, Set!T would not have a guaranteed implementation, but would instead exploit magnitude to choose among a spectrum of implementation alternatives.

Of course, using a different name such as either() is even better for the implementation because it can make additional implementation choices dictated by the restricted use of either(). For example, either() could return a FixedSet!T that does not accept adding new members and is optimized accordingly.

Andrei
May 17 2010
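A rough sketch of the "no heap activity" direction discussed in the two posts above. Everything here is hypothetical: FixedSet is parameterized by its size (unlike the FixedSet!T named in the thread), either() is a made-up helper, and membership is a plain linear search over members stored inline in the struct. The only real library calls are splitter() with a unary predicate and equal(), assuming a current Phobos.

```d
import std.algorithm : equal, splitter;

// Hypothetical fixed-size set: members are stored inline, so building
// one needs no heap allocation and no members can be added later.
struct FixedSet(T, size_t n)
{
    T[n] members;

    // Linear search, which is reasonable for a handful of members.
    bool contains(U)(U value) const
    {
        foreach (m; members)
            if (m == value) return true;
        return false;
    }
}

// Hypothetical helper named after the either() proposed in the thread.
auto either(T, Rest...)(T first, Rest rest)
{
    FixedSet!(T, 1 + Rest.length) s;
    s.members[0] = first;
    foreach (i, r; rest)
        s.members[i + 1] = r;
    return s;
}

void main()
{
    // Built at compile time; the lambda below captures nothing.
    static immutable delims = either(';', ' ');
    assert(delims.contains(';') && !delims.contains('x'));

    auto parts = "a;b c".splitter!(c => delims.contains(c));
    assert(parts.equal(["a", "b", "c"]));
}
```

Because the member count is part of the type, the compiler knows the storage size up front; a sorted array with binary search, as suggested in the thread for larger sets, would be a different trade-off.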
On Mon, 17 May 2010 17:01:22 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> These are good points. They have gone through my mind as well, but lately I've started to take a somewhat more liberal view of containers. For example, I'm thinking that Set!T (which would be the type returned by set()) could automatically use arrays and linear search for small sets. Other special cases come to mind, such as the small array optimization. In other words Set!T would not have a guaranteed implementation, but instead exploit magnitude to choose among a spectrum of implementation alternatives.

These are good ideas.

> Of course using a different name such as either() is even better for the implementation because it can make additional implementation dictated by the restricted use of either(). For example either() could return a FixedSet!T that does not accept adding new members and is optimized accordingly.

FixedSet!T would be easily implemented as a sorted array for any number of members (unsorted below some threshold number of members). My point was simply that all input parameters besides the string literal require heap activity to maintain the set. Much less if you allocate one array, but for something like "; ", I would like to see zero heap activity :)

-Steve
May 17 2010
On Mon, May 17, 2010 at 19:44, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> Second, there is an ambiguity between splitting using a string as separator and splitting using a set of characters as separator. This could be solved by simply using different names:
>
>     string str = ...;
>     foreach (splitByOneOf(str, "; ")) { ... }
>     foreach (splitter(str, "; ")) { ... }

I personally use a predicate, isOneOf(some range). It's curried, so it produces the 'real' predicate, which returns true when its input is in the range. It's something like this:

    import std.algorithm, std.array, std.range;

    bool delegate(ElementType!R) isOneOf(R)(R range) if (isInputRange!R)
    {
        // Copy and sort the members once, up front.
        auto r = array(range);
        sort(r);
        // The returned delegate is the actual membership test.
        return (ElementType!R e) { return !find(assumeSorted(r), e).empty; };
    }

It's been a long time since I used std.algorithm.find. Is it efficient to sort what will most of the time be a very short array? Maybe !find(range, e) is enough.

Usage:

    splitBy!(isOneOf(";,/"))(rangeToBeSplitted)

> One nice outcome is that we can then reuse the same pattern in other signatures.

Indeed, and for many things: filtering, stopping iterations (takeWhile, unfoldWhile), splitting... And, of course, taking the negation: not!isOneOf => isNotIn(";,/")

Philippe
May 17 2010
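A sketch of how one predicate can be reused for splitting, filtering and stopping an iteration, as the post above suggests. It is not the curried delegate from the post: this hypothetical isOneOf takes its members as a compile-time string so that it can be passed directly as a template alias argument. splitter, filter, until and std.functional.not are assumed to be those of a current Phobos; splitBy, takeWhile and unfoldWhile from the post are not used.

```d
import std.algorithm : canFind, equal, filter, splitter, until;
import std.functional : not;

// Hypothetical compile-time variant of isOneOf: the member characters
// are a template argument, so no delegate or sorted copy is needed.
bool isOneOf(string members)(dchar c)
{
    return members.canFind(c);
}

void main()
{
    alias isDelim = isOneOf!";,/";

    // Splitting on any of the delimiter characters.
    assert("2010/05,31".splitter!isDelim.equal(["2010", "05", "31"]));

    // Filtering with the negated predicate drops the delimiters.
    assert("a;b,c".filter!(not!isDelim).equal("abc"));

    // Stopping iteration at the first delimiter.
    assert("abc;def".until!isDelim.equal("abc"));
}
```

A linear canFind over a short member string avoids the sort questioned in the post; for large member sets a sorted array with binary search may be the better choice.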