digitalmars.D - std.gregorian contribution

reply negerns <negerns gmail.com> writes:
I filled in some of the functions from Andrei's first draft of 
std.gregorian module. I hope they are good enough.

I changed some identifiers like GregYear, GregMonth, etc. to 
GregorianYear, GregorianMonth, etc. so as to be consistent with the 
'julian' identifiers, but it's a minor change that maybe I shouldn't 
have touched.

Also, I have introduced an unjoin() function as a helper function. It 
splits a string into an array of substrings using the specified array of 
characters as delimiters. I am not sure if there is already an existing 
function that does the same, but I could not find one. For lack of a 
better name I opted for the opposite of the join() function in std.string.

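import std.string : indexOf;

string[] unjoin(string s, char[] ch)
{
     size_t start = 0;   // index where the current token begins
     size_t i = 0;
     string[] result;

     for (i = 0; i < s.length; i++) {
         // s[i] is a delimiter: emit the token that ends just before it
         if (indexOf(ch, s[i]) != -1) {
             result ~= s[start..i];
             start = i + 1;
         }
     }
     // emit the last token, unless the string ended with a delimiter
     if (start < i) {
         result ~= s[start..$];
     }
     return result;
}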

unittest {
     string s = "2010-05-31";
     string[] r = unjoin(s, ['/', '-', '.', ',', '\\']);
     assert(r[0] == "2010");
     assert(r[1] == "05");
     assert(r[2] == "31");
}

I have modified the signature of fromString() and 
fromUndelimitedString() to accept string arguments instead of char[]. I 
am not sure if it is alright with Andrei.

Here's a list of what I have implemented so far:
- Date fromString(in string s)
- Date fromUndelimitedString(in string s)
- @property string toSimpleString()
- @property string toIsoString()
- @property string toIsoExtendedString()
- added string[] months used only by toSimpleString()
- unit tests

I have attached the .diff file gregorian.diff

Regards,
negerns
May 16 2010
parent reply Tomek Sowiński <just ask.me> writes:
negerns wrote:

 Also, I have introduced a unjoin() function as a helper function. It
 splits a string into an array of lines using the specified array of
 characters as delimiters. I am not sure if there is already an existing
 function that does the same but I could not find it. For lack of a
 better word I opted for the opposite of the join() function in
 std.string.
 
 string[] unjoin(string s, char[] ch)
 {
      uint start = 0;
      uint i = 0;
      string[] result;
 
      for (i = 0; i < s.length; i++) {
          if (indexOf(ch, s[i]) != -1) {
              result ~= s[start..i];
              start = i + 1;
          }
      }
      if (start < i) {
          result ~= s[start..$];
      }
      return result;
 }
 
 unittest {
      string s = "2010-05-31";
      string[] r = unjoin(s, ['/', '-', '.', ',', '\\']);
      assert(r[0] == "2010");
      assert(r[1] == "05");
      assert(r[2] == "31");
 }
Thanks, it's useful. There's std.string.split but it takes only one
delimiter. It'd be nice to have it as an overload that takes any range of
delims. Yet a delim can itself be a string (an array), so it would be
ambiguous how to interpret split(..., "://"). So I suggest calling it
splitBy to disambiguate. Like it?

Tomek
May 17 2010
parent reply negerns <negerns gmail.com> writes:
On 5/18/2010 1:03 AM, Tomek Sowiński wrote:
 negerns wrote:

 Also, I have introduced a unjoin() function as a helper function. It
 splits a string into an array of lines using the specified array of
 characters as delimiters. I am not sure if there is already an existing
 function that does the same but I could not find it. For lack of a
 better word I opted for the opposite of the join() function in
 std.string.
 string[] unjoin(string s, char[] ch)
 {
       uint start = 0;
       uint i = 0;
       string[] result;

        for (i = 0; i < s.length; i++) {
            if (indexOf(ch, s[i]) != -1) {
                result ~= s[start..i];
                start = i + 1;
            }
        }
        if (start < i) {
           result ~= s[start..$];
       }
       return result;
 }

 unittest {
       string s = "2010-05-31";
       string[] r = unjoin(s, ['/', '-', '.', ',', '\\']);
       assert(r[0] == "2010");
       assert(r[1] == "05");
       assert(r[2] == "31");
 }
 Thanks, it's useful. There's std.string.split but it takes only one
 delimiter. It'd be nice to have it as an overload that takes any range of
 delims. Yet, a delim can be a string (an array) and there would be
 problems how to understand split(..., "://"). So I suggest calling it
 splitBy to disambiguate. Like it?

 Tomek
I wish it wouldn't be too long like splitByChar :) I'm out of ideas.
May 17 2010
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/17/2010 12:32 PM, negerns wrote:
 On 5/18/2010 1:03 AM, Tomek Sowiński wrote:
 negerns wrote:

 Also, I have introduced a unjoin() function as a helper function. It
 splits a string into an array of lines using the specified array of
 characters as delimiters. I am not sure if there is already an existing
 function that does the same but I could not find it. For lack of a
 better word I opted for the opposite of the join() function in
 std.string.
 string[] unjoin(string s, char[] ch)
 {
      uint start = 0;
      uint i = 0;
      string[] result;

      for (i = 0; i < s.length; i++) {
          if (indexOf(ch, s[i]) != -1) {
              result ~= s[start..i];
              start = i + 1;
          }
      }
      if (start < i) {
          result ~= s[start..$];
      }
      return result;
 }

 unittest {
      string s = "2010-05-31";
      string[] r = unjoin(s, ['/', '-', '.', ',', '\\']);
      assert(r[0] == "2010");
      assert(r[1] == "05");
      assert(r[2] == "31");
 }
 Thanks, it's useful. There's std.string.split but it takes only one
 delimiter. It'd be nice to have it as an overload that takes any range of
 delims. Yet, a delim can be a string (an array) and there would be
 problems how to understand split(..., "://"). So I suggest calling it
 splitBy to disambiguate. Like it?

 Tomek

 I wish it wouldn't be too long like splitByChar :) I'm out of ideas.
I have two unrelated suggestions about unjoin.

First, you may want to follow the model set by splitter() instead of
split() when defining unjoin(). This is because split() allocates memory
whereas splitter() splits lazily, so it doesn't need to. If you do want
split(), just call array(splitter()).

Second, there is an ambiguity between splitting using a string as
separator and splitting using a set of characters as separator. This
could be solved by simply using different names:

string str = ...;
foreach (splitByOneOf(str, "; ")) { ... }
foreach (splitter(str, "; ")) { ... }

The first loop splits by one of the two characters, whereas the second
splits by the exact string "; ".

An idea I am toying with is to factor things out into the data types.
After all, if I'm splitting by "one of" an element in a set of elements,
that should be reflected in the set's type. For example:

foreach (splitter(str, either(';', ' '))) { ... }
foreach (splitter(str, "; ")) { ... }

or, using a more general notion of a set:

foreach (splitter(str, set(';', ' '))) { ... }

One nice outcome is that we can then reuse the same pattern in other
signatures.

Andrei
May 17 2010
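
For readers following along, the difference between the two kinds of separator can be sketched with what a current Phobos provides. This assumes a present-day std.algorithm, where splitter accepts either a separator range or a unary predicate; splitByOneOf, either() and set() from the post above are proposals and are not used here. The sample string is made up for illustration.

```d
import std.algorithm : canFind, equal, splitter;
import std.stdio : writeln;

void main()
{
    string str = "a; b;c d";

    // Exact-string separator: only the two-character sequence "; " splits.
    auto byString = str.splitter("; ");
    assert(equal(byString, ["a", "b;c d"]));

    // "One of" separator: ';' and ' ' each split on their own, so the
    // adjacent pair "; " leaves an empty token between them.
    auto byOneOf = str.splitter!(c => "; ".canFind(c))();
    assert(equal(byOneOf, ["a", "", "b", "c", "d"]));

    // Both ranges are lazy; std.array.array would collect them eagerly.
    foreach (token; byOneOf)
        writeln('[', token, ']');
}
```

The same literal "; " gives two different results depending on which meaning is intended, which is the ambiguity the naming discussion is trying to resolve.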
next sibling parent reply "Simen kjaeraas" <simen.kjaras gmail.com> writes:
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 I have two unrelated suggestions about unjoin.

 First, you may want to follow the model set by splitter() instead of  
 split() when defining unjoin(). This is because split() allocates memory  
 whereas splitter splits lazily so it doesn't need to. If you do want  
 split(), just call array(splitter()).

 Second, there is an ambiguity between splitting using a string as  
 separator and splitting using a set of characters as separator. This  
 could be solved by simply using different names:

 string str = ...;
 foreach (splitByOneOf(str, "; ")) { ... }
 foreach (splitter(str, "; ")) { ... }

 First look splits by one of the two, whereas the second splits by the  
 exact string "; ".

 An idea I am toying with is to factor things out into the data types.  
 After all, if I'm splitting by "one of" an element in a set of elements,  
 that should be reflected in the set's type. For example:

 foreach (splitter(str, either(';', ' ')) { ... }
 foreach (splitter(str, "; ")) { ... }

 or, using a more general notion of a set:

 foreach (splitter(str, set(';', ' ')) { ... }
D could use a set type, and this is a very nice way to specify these
different parameters.

votes = -~votes;

-- 
Simen
May 17 2010
next sibling parent Pelle <pelle.mansson gmail.com> writes:
On 05/17/2010 08:00 PM, Simen kjaeraas wrote:
 D could use a set type, and this is a very nice way to specify these
 different parameters.

 votes = -~votes;
I agree, and I find a set type to be generally very useful. :)
May 17 2010
prev sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 17 May 2010 14:00:41 -0400, Simen kjaeraas  
<simen.kjaras gmail.com> wrote:

 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 I have two unrelated suggestions about unjoin.

 First, you may want to follow the model set by splitter() instead of  
 split() when defining unjoin(). This is because split() allocates  
 memory whereas splitter splits lazily so it doesn't need to. If you do  
 want split(), just call array(splitter()).

 Second, there is an ambiguity between splitting using a string as  
 separator and splitting using a set of characters as separator. This  
 could be solved by simply using different names:

 string str = ...;
 foreach (splitByOneOf(str, "; ")) { ... }
 foreach (splitter(str, "; ")) { ... }

 First look splits by one of the two, whereas the second splits by the  
 exact string "; ".

 An idea I am toying with is to factor things out into the data types.  
 After all, if I'm splitting by "one of" an element in a set of  
 elements, that should be reflected in the set's type. For example:

 foreach (splitter(str, either(';', ' ')) { ... }
 foreach (splitter(str, "; ")) { ... }

 or, using a more general notion of a set:

 foreach (splitter(str, set(';', ' ')) { ... }
 D could use a set type, and this is a very nice way to specify these
 different parameters.

 votes = -~votes;

Comparing splitByOneOf(str, "; ") to splitter(str, set(';', ' ')), I
see one major difference here -- "; " is a literal, set(';', ' ') is not.

I would expect that 'set' as a generic set type would implement its
guts as some sort of tree/hash, which means a lot of overhead for a
simple argument. With the literal version, the notation is in the
function, not the type. While it seems rather neat, the overhead should
be considered.

A compromise:

foreach(x; splitter(str, either("; ")))

Which can be implemented without heap activity.

-Steve
May 17 2010
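
One way the compromise above could look is sketched below. Everything in it is hypothetical: Either, either(), EitherSplitter and splitEither are illustrative names, not Phobos APIs. The wrapper only stores a slice of the caller's string, and the functions are marked @nogc so the compiler rejects any garbage-collected allocation inside them. Delimiters are compared byte by byte, so the sketch assumes ASCII delimiters.

```d
import std.algorithm : equal;

/// Hypothetical marker type: "split at any one of these characters".
struct Either
{
    string chars;   // a slice of the caller's literal, nothing is copied
}

/// Hypothetical helper in the spirit of either("; ").
Either either(string chars) @nogc nothrow pure @safe
{
    return Either(chars);
}

/// Lazy input range over the tokens of `s` (ASCII delimiters only).
struct EitherSplitter
{
    private string s;
    private string delims;
    private size_t start;   // index where the current token begins
    private size_t end;     // index one past the current token

@nogc nothrow pure @safe:

    this(string s, Either set)
    {
        this.s = s;
        this.delims = set.chars;
        end = scan(0);
    }

    // Linear search; cheap for the handful of delimiters expected here.
    private bool isDelim(char c) const
    {
        foreach (d; delims)
            if (d == c) return true;
        return false;
    }

    // Returns the index of the first delimiter at or after `from`,
    // or s.length if there is none.
    private size_t scan(size_t from) const
    {
        while (from < s.length && !isDelim(s[from]))
            ++from;
        return from;
    }

    @property bool empty() const { return start > s.length; }
    @property string front() const { return s[start .. end]; }

    void popFront()
    {
        start = end + 1;            // step over the delimiter (or past the end)
        if (start <= s.length)
            end = scan(start);
    }
}

EitherSplitter splitEither(string s, Either set) @nogc nothrow pure @safe
{
    return EitherSplitter(s, set);
}

void main()
{
    // The array literal on the right allocates; the splitting itself does not.
    assert(equal(splitEither("2010-05/31", either("-/")), ["2010", "05", "31"]));
}
```

Unlike unjoin(), this sketch yields an empty token between adjacent delimiters and after a trailing one, which matches how splitter treats separators.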
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/17/2010 03:16 PM, Steven Schveighoffer wrote:
 Comparing splitByOneOf(str, "; ")) to splitter(str, set(';', ' ')), I
 see one major difference here -- "; " is a literal, set(';', ' ') is not.

 I would expect that 'set' as a generic set type would implement it's
 guts as some sort of tree/hash, which means a lot of overhead for a
 simple argument. With the literal version, the notation is in the
 function, not the type. While it seems rather neat, the overhead should
 be considered.

 A compromise:

 foreach(x; splitter(str, either("; ")))

 Which can be implemented without heap activity.
These are good points. They have gone through my mind as well, but
lately I've started to take a somewhat more liberal view of containers.
For example, I'm thinking that Set!T (which would be the type returned by
set()) could automatically use arrays and linear search for small sets.
Other special cases come to mind, such as the small array optimization.
In other words, Set!T would not have a guaranteed implementation, but
would instead exploit magnitude to choose among a spectrum of
implementation alternatives.

Of course, using a different name such as either() is even better for the
implementation because it can make additional implementation choices
dictated by the restricted use of either(). For example, either() could
return a FixedSet!T that does not accept adding new members and is
optimized accordingly.

Andrei
May 17 2010
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 17 May 2010 17:01:22 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 05/17/2010 03:16 PM, Steven Schveighoffer wrote:
 Comparing splitByOneOf(str, "; ")) to splitter(str, set(';', ' ')), I
 see one major difference here -- "; " is a literal, set(';', ' ') is  
 not.

 I would expect that 'set' as a generic set type would implement it's
 guts as some sort of tree/hash, which means a lot of overhead for a
 simple argument. With the literal version, the notation is in the
 function, not the type. While it seems rather neat, the overhead should
 be considered.

 A compromise:

 foreach(x; splitter(str, either("; ")))

 Which can be implemented without heap activity.
 These are good points. They have gone through my mind as well, but
 lately I've started to take a somewhat more liberal view of containers.
 For example, I'm thinking that Set!T (which would be the type returned by
 set()) could automatically use arrays and linear search for small sets.
 Other special cases come to mind, such as the small array optimization.
 In other words Set!T would not have a guaranteed implementation, but
 instead exploit magnitude to choose among a spectrum of implementation
 alternatives.
These are good ideas.
 Of course using a different name such as either() is even better for the  
 implementation because it can make additional implementation dictated by  
 the restricted use of either(). For example either() could return a  
 FixedSet!T that does not accept adding new members and is optimized  
 accordingly.
FixedSet!T would be easily implemented as a sorted array for any number
of members (unsorted below some threshold number of members).

My point was simply that all input parameters besides the string literal
require heap activity to maintain the set. Much less if you allocate one
array, but for something like "; ", I would like to see zero heap
activity :)

-Steve
May 17 2010
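
A minimal sketch of the FixedSet!T idea described above, under stated assumptions: FixedSet and its linearThreshold of 8 are hypothetical, chosen only for illustration, and the constructor makes exactly the single array allocation that the post would like to avoid for tiny literals. Small sets stay unsorted and are searched linearly; larger ones are sorted once and searched with binary search via std.range.assumeSorted.

```d
import std.algorithm : canFind, sort;
import std.range : assumeSorted;

/// Hypothetical immutable-after-construction set backed by one array.
struct FixedSet(T)
{
    enum size_t linearThreshold = 8;   // arbitrary cut-over point
    private T[] items;

    this(const(T)[] members)
    {
        items = members.dup;           // the one allocation, made once
        if (items.length > linearThreshold)
            sort(items);               // sort only when binary search pays off
    }

    bool contains(T value) const
    {
        if (items.length > linearThreshold)
            return assumeSorted(items).contains(value);   // binary search
        return items.canFind(value);                      // linear search
    }
}

void main()
{
    auto small = FixedSet!dchar("; "d);
    assert(small.contains(';'));
    assert(!small.contains('x'));

    auto large = FixedSet!int([9, 1, 8, 2, 7, 3, 6, 4, 5, 0]);
    assert(large.contains(7));
    assert(!large.contains(42));
}
```

The element type is dchar rather than char in the small example because Phobos does not treat char[] as a random-access range, so sort would not accept it.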
prev sibling parent Philippe Sigaud <philippe.sigaud gmail.com> writes:
On Mon, May 17, 2010 at 19:44, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 First, you may want to follow the model set by splitter() instead of
 split() when defining unjoin(). This is because split() allocates memory
 whereas splitter splits lazily so it doesn't need to. If you do want
 split(), just call array(splitter()).

 Second, there is an ambiguity between splitting using a string as separator
 and splitting using a set of characters as separator. This could be solved
 by simply using different names:

 string str = ...;
 foreach (splitByOneOf(str, "; ")) { ... }
 foreach (splitter(str, "; ")) { ... }
I personally use a predicate, isOneOf(some range). It's curried, so it
produces the 'real' predicate, which returns true when its input is in the
range. It's something like this:

bool delegate(ElementType!R) isOneOf(R)(R range) if (isInputRange!R)
{
    auto r = array(range);
    sort(r);
    return (ElementType!R e) { return !find(assumeSorted(r), e).empty; };
}

It's been a long time since I used std.algo.find. Is it efficient to sort
what will be most of the time a very short array? Maybe !find(range,e) is
enough.

Usage:

splitBy!(isOneOf(";,/"))(rangeToBeSplitted)

 One nice outcome is that we can then reuse the same pattern in other
 signatures.
Indeed, and for many things: filtering, stopping iterations (takeWhile,
unfoldWhile), splitting...

And, of course, taking the negation not!isOneOf => isNotIn(";,/")

Philippe
May 17 2010
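
The curried predicate above can be tried against a current Phobos roughly as follows. This is a sketch, not the poster's code: it swaps find(...).empty for SortedRange.contains, and because splitBy is not a Phobos function it feeds the predicate to std.algorithm.splitter and filter through small lambdas. The sample strings are made up.

```d
import std.algorithm : equal, filter, sort, splitter;
import std.array : array;
import std.range : ElementType, assumeSorted, isInputRange;

/// Builds a predicate that is true for any element found in `range`.
bool delegate(ElementType!R) isOneOf(R)(R range)
    if (isInputRange!R)
{
    auto r = array(range);   // for a string this is a dchar[]
    sort(r);                 // sorted once, so each lookup is a binary search
    return (ElementType!R e) => assumeSorted(r).contains(e);
}

void main()
{
    auto isDelim = isOneOf(";,/");

    // Splitting: the predicate decides where tokens end.
    assert(equal("a;b,c/d".splitter!(c => isDelim(c))(), ["a", "b", "c", "d"]));

    // Filtering with the negated predicate keeps everything else.
    assert(equal("a;b,c/d".filter!(c => !isDelim(c))(), "abcd"));
}
```

The returned delegate keeps the sorted copy alive in a closure, so building the predicate allocates once; each later lookup does not.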