digitalmars.D - std.gregorian contribution

negerns <negerns gmail.com> writes:

I filled in some of the functions from Andrei's first draft of 
std.gregorian module. I hope they are good enough.

I changed some identifiers like GregYear, GregMonth, etc. to
GregorianYear, GregorianMonth, etc. so as to be consistent with the
'julian' identifiers, but it's a minor change that maybe I shouldn't
have touched.

Also, I have introduced an unjoin() function as a helper function. It
splits a string into an array of substrings using the specified array of
characters as delimiters. I am not sure if there is already an existing
function that does the same but I could not find it. For lack of a
better word I opted for the opposite of the join() function in std.string.

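import std.string : indexOf;

// Splits s at every character that appears in ch.
string[] unjoin(string s, char[] ch)
{
     size_t start = 0;   // index where the current field begins
     size_t i = 0;
     string[] result;

     for (i = 0; i < s.length; i++) {
         // s[i] is a delimiter: emit the field that ends here
         if (indexOf(ch, s[i]) != -1) {
             result ~= s[start..i];
             start = i + 1;
         }
     }
     // emit the last field, unless the string ended with a delimiter
     if (start < i) {
         result ~= s[start..$];
     }
     return result;
}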

unittest {
     string s = "2010-05-31";
     string[] r = unjoin(s, ['/', '-', '.', ',', '\\']);
     assert(r[0] == "2010");
     assert(r[1] == "05");
     assert(r[2] == "31");
}

I have modified the signature of fromString() and 
fromUndelimitedString() to accept string arguments instead of char[]. I 
am not sure if it is alright with Andrei.

Here's a list of what I have implemented so far:
- Date fromString(in string s)
- Date fromUndelimitedString(in string s)
- @property string toSimpleString()
- @property string toIsoString()
- @property string toIsoExtendedString()
- added string[] months used only by toSimpleString()
- unit tests

I have attached the .diff file gregorian.diff

Regards,
negerns

May 16 2010

Tomek Sowiński <just ask.me> writes:

negerns wrote:

 Also, I have introduced a unjoin() function as a helper function. It
 splits a string into an array of lines using the specified array of
 characters as delimiters. I am not sure if there is already an existing
 function that does the same but I could not find it. For lack of a
 better word I opted for the opposite of the join() function in std.string.
 
 string[] unjoin(string s, char[] ch)
 {
      uint start = 0;
      uint i = 0;
      string[] result;
 
      for (i = 0; i < s.length; i++) {
          if (indexOf(ch, s[i]) != -1) {
              result ~= s[start..i];
              start = i + 1;
          }
      }
      if (start < i) {
          result ~= s[start..$];
      }
      return result;
 }
 
 unittest {
      string s = "2010-05-31";
      string[] r = unjoin(s, ['/', '-', '.', ',', '\\']);
      assert(r[0] == "2010");
      assert(r[1] == "05");
      assert(r[2] == "31");
 }

Thanks, it's useful. There's std.string.split but it takes only one
delimiter. It'd be nice to have an overload that takes any range of
delimiters. Yet a delimiter can itself be a string (an array), so it would
be ambiguous how to interpret split(..., "://"). So I suggest calling it
splitBy to disambiguate. Like it?


Tomek

May 17 2010

negerns <negerns gmail.com> writes:

On 5/18/2010 1:03 AM, Tomek Sowiński wrote:
 negerns wrote:

 Also, I have introduced a unjoin() function as a helper function. It
 splits a string into an array of lines using the specified array of
 characters as delimiters. I am not sure if there is already an existing
 function that does the same but I could not find it. For lack of a
 better word I opted for the opposite of the join() function in std.string.

 string[] unjoin(string s, char[] ch)
 {
       uint start = 0;
       uint i = 0;
       string[] result;

       for (i = 0; i < s.length; i++) {
           if (indexOf(ch, s[i]) != -1) {
               result ~= s[start..i];
               start = i + 1;
           }
       }
       if (start < i) {
           result ~= s[start..$];
       }
       return result;
 }

 unittest {
       string s = "2010-05-31";
       string[] r = unjoin(s, ['/', '-', '.', ',', '\\']);
       assert(r[0] == "2010");
       assert(r[1] == "05");
       assert(r[2] == "31");
 }

 Thanks, it's useful. There's std.string.split but it takes only one
 delimiter. It'd be nice to have it as an overload that takes any range
 of delims. Yet, a delim can be a string (an array) and there would be
 problems how to understand split(..., "://"). So I suggest calling it
 splitBy to disambiguate. Like it?


 Tomek

I wish it wouldn't be too long like splitByChar :)
I'm out of ideas.

May 17 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 05/17/2010 12:32 PM, negerns wrote:
 On 5/18/2010 1:03 AM, Tomek Sowiński wrote:
 negerns wrote:

 Also, I have introduced a unjoin() function as a helper function. It
 splits a string into an array of lines using the specified array of
 characters as delimiters. I am not sure if there is already an existing
 function that does the same but I could not find it. For lack of a
 better word I opted for the opposite of the join() function in std.string.

 string[] unjoin(string s, char[] ch)
 {
      uint start = 0;
      uint i = 0;
      string[] result;

      for (i = 0; i < s.length; i++) {
          if (indexOf(ch, s[i]) != -1) {
              result ~= s[start..i];
              start = i + 1;
          }
      }
      if (start < i) {
          result ~= s[start..$];
      }
      return result;
 }

 unittest {
      string s = "2010-05-31";
      string[] r = unjoin(s, ['/', '-', '.', ',', '\\']);
      assert(r[0] == "2010");
      assert(r[1] == "05");
      assert(r[2] == "31");
 }

 Thanks, it's useful. There's std.string.split but it takes only one
 delimiter. It'd be nice to have it as an overload that takes any range
 of delims. Yet, a delim can be a string (an array) and there would be
 problems how to understand split(..., "://"). So I suggest calling it
 splitBy to disambiguate. Like it?


 Tomek

 I wish it wouldn't be too long like splitByChar :)
 I'm out of ideas.

I have two unrelated suggestions about unjoin.

First, you may want to follow the model set by splitter() instead of 
split() when defining unjoin(). This is because split() allocates memory 
whereas splitter splits lazily so it doesn't need to. If you do want 
split(), just call array(splitter()).
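
A minimal sketch of the lazy approach, assuming a current Phobos in which
std.algorithm's splitter accepts a unary predicate. The names splitByOneOf
and unjoinEager are hypothetical helpers for illustration, not existing
library functions:

```d
import std.algorithm.iteration : splitter;
import std.algorithm.searching : canFind;
import std.array : array;

// Hypothetical helper: lazily split `s` at any character found in `delims`.
auto splitByOneOf(string s, string delims)
{
    // With a unary predicate, splitter treats every matching element as a
    // separator and yields slices of `s`, so no result array is built.
    return s.splitter!(c => delims.canFind(c));
}

// Hypothetical eager variant: materialise the lazy range into an array.
string[] unjoinEager(string s, string delims)
{
    return splitByOneOf(s, delims).array;
}
```

Callers that only iterate once can use splitByOneOf directly; the array
is built only when unjoinEager is called.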

Second, there is an ambiguity between splitting using a string as 
separator and splitting using a set of characters as separator. This 
could be solved by simply using different names:

string str = ...;
foreach (s; splitByOneOf(str, "; ")) { ... }
foreach (s; splitter(str, "; ")) { ... }

The first loop splits at either of the two characters, whereas the second
splits at the exact string "; ".

An idea I am toying with is to factor things out into the data types. 
After all, if I'm splitting by "one of" an element in a set of elements, 
that should be reflected in the set's type. For example:

foreach (s; splitter(str, either(';', ' '))) { ... }
foreach (s; splitter(str, "; ")) { ... }

or, using a more general notion of a set:

foreach (s; splitter(str, set(';', ' '))) { ... }

One nice outcome is that we can then reuse the same pattern in other 
signatures.
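
One way the either() idea could look, as a sketch under assumptions:
Either, either and splitOn are hypothetical names, not Phobos symbols.
The alternatives are kept in a static array inside the struct, so the set
itself needs no heap storage; the lambda that captures it may still cause
a closure allocation, which a hand-written range could avoid.

```d
import std.algorithm.iteration : splitter;

// Hypothetical wrapper type: a small, fixed set of alternatives stored inline.
struct Either(T, size_t n)
{
    T[n] items;   // static array, lives inside the struct

    // Templated so that a dchar taken from a string can be compared
    // against char members without a narrowing conversion.
    bool contains(U)(U value) const
    {
        foreach (item; items)
            if (item == value)
                return true;
        return false;
    }
}

// Hypothetical factory: either(';', ' ') yields Either!(char, 2).
auto either(T, Rest...)(T first, Rest rest)
{
    Either!(T, 1 + Rest.length) e;
    e.items[0] = first;
    foreach (i, r; rest)
        e.items[i + 1] = r;
    return e;
}

// Selected by the separator's type: split at any member of `seps`.
auto splitOn(T, size_t n)(string s, Either!(T, n) seps)
{
    return s.splitter!(c => seps.contains(c));
}
```

The "one of" meaning is carried by the argument's type, so the same
wrapper could be accepted by other functions as well.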


Andrei

May 17 2010

"Simen kjaeraas" <simen.kjaras gmail.com> writes:

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 I have two unrelated suggestions about unjoin.

 First, you may want to follow the model set by splitter() instead of  
 split() when defining unjoin(). This is because split() allocates memory  
 whereas splitter splits lazily so it doesn't need to. If you do want  
 split(), just call array(splitter()).

 Second, there is an ambiguity between splitting using a string as  
 separator and splitting using a set of characters as separator. This  
 could be solved by simply using different names:

 string str = ...;
 foreach (splitByOneOf(str, "; ")) { ... }
 foreach (splitter(str, "; ")) { ... }

 First look splits by one of the two, whereas the second splits by the  
 exact string "; ".

 An idea I am toying with is to factor things out into the data types.  
 After all, if I'm splitting by "one of" an element in a set of elements,  
 that should be reflected in the set's type. For example:

 foreach (splitter(str, either(';', ' ')) { ... }
 foreach (splitter(str, "; ")) { ... }

 or, using a more general notion of a set:

 foreach (splitter(str, set(';', ' ')) { ... }

D could use a set type, and this is a very nice way to specify these
different parameters.

votes = -~votes;

-- 
Simen

May 17 2010

Pelle <pelle.mansson gmail.com> writes:

On 05/17/2010 08:00 PM, Simen kjaeraas wrote:
 D could use a set type, and this is a very nice way to specify these
 different parameters.

 votes = -~votes;

I agree, and I find a set type to be generally very useful. :)

May 17 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 17 May 2010 14:00:41 -0400, Simen kjaeraas  
<simen.kjaras gmail.com> wrote:

 Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:

 I have two unrelated suggestions about unjoin.

 First, you may want to follow the model set by splitter() instead of  
 split() when defining unjoin(). This is because split() allocates  
 memory whereas splitter splits lazily so it doesn't need to. If you do  
 want split(), just call array(splitter()).

 Second, there is an ambiguity between splitting using a string as  
 separator and splitting using a set of characters as separator. This  
 could be solved by simply using different names:

 string str = ...;
 foreach (splitByOneOf(str, "; ")) { ... }
 foreach (splitter(str, "; ")) { ... }

 First look splits by one of the two, whereas the second splits by the  
 exact string "; ".

 An idea I am toying with is to factor things out into the data types.  
 After all, if I'm splitting by "one of" an element in a set of  
 elements, that should be reflected in the set's type. For example:

 foreach (splitter(str, either(';', ' ')) { ... }
 foreach (splitter(str, "; ")) { ... }

 or, using a more general notion of a set:

 foreach (splitter(str, set(';', ' ')) { ... }

 D could use a set type, and this is a very nice way to specify these
 different parameters.

 votes = -~votes;

Comparing splitByOneOf(str, "; ") to splitter(str, set(';', ' ')), I see
one major difference here -- "; " is a literal, set(';', ' ') is not.

I would expect that 'set' as a generic set type would implement its guts
as some sort of tree/hash, which means a lot of overhead for a simple
argument.  With the literal version, the notation is in the function, not
the type.  While it seems rather neat, the overhead should be considered.

A compromise:

foreach(x; splitter(str, either("; ")))

Which can be implemented without heap activity.

-Steve

May 17 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 05/17/2010 03:16 PM, Steven Schveighoffer wrote:
 Comparing splitByOneOf(str, "; ")) to splitter(str, set(';', ' ')), I
 see one major difference here -- "; " is a literal, set(';', ' ') is not.

 I would expect that 'set' as a generic set type would implement it's
 guts as some sort of tree/hash, which means a lot of overhead for a
 simple argument. With the literal version, the notation is in the
 function, not the type. While it seems rather neat, the overhead should
 be considered.

 A compromise:

 foreach(x; splitter(str, either("; ")))

 Which can be implemented without heap activity.

These are good points. They have gone through my mind as well, but 
lately I've started to take a somewhat more liberal view of containers. 
For example, I'm thinking that Set!T (which would be the type returned 
by set()) could automatically use arrays and linear search for small 
sets. Other special cases come to mind, such as the small array 
optimization. In other words Set!T would not have a guaranteed 
implementation, but instead exploit magnitude to choose among a spectrum 
of implementation alternatives.

Of course using a different name such as either() is even better for the
implementation because it can make additional implementation choices
dictated by the restricted use of either(). For example either() could
return a FixedSet!T that does not accept adding new members and is
optimized accordingly.


Andrei

May 17 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Mon, 17 May 2010 17:01:22 -0400, Andrei Alexandrescu  
<SeeWebsiteForEmail erdani.org> wrote:

 On 05/17/2010 03:16 PM, Steven Schveighoffer wrote:
 Comparing splitByOneOf(str, "; ")) to splitter(str, set(';', ' ')), I
 see one major difference here -- "; " is a literal, set(';', ' ') is  
 not.

 I would expect that 'set' as a generic set type would implement it's
 guts as some sort of tree/hash, which means a lot of overhead for a
 simple argument. With the literal version, the notation is in the
 function, not the type. While it seems rather neat, the overhead should
 be considered.

 A compromise:

 foreach(x; splitter(str, either("; ")))

 Which can be implemented without heap activity.

 These are good points. They have gone through my mind as well, but  
 lately I've started to take a somewhat more liberal view of containers.  
 For example, I'm thinking that Set!T (which would be the type returned  
 by set()) could automatically use arrays and linear search for small  
 sets. Other special cases come to mind, such as the small array  
 optimization. In other words Set!T would not have a guaranteed  
 implementation, but instead exploit magnitude to choose among a spectrum  
 of implementation alternatives.

These are good ideas.

 Of course using a different name such as either() is even better for the  
 implementation because it can make additional implementation dictated by  
 the restricted use of either(). For example either() could return a  
 FixedSet!T that does not accept adding new members and is optimized  
 accordingly.

FixedSet!T would be easily implemented as a sorted array for any number of  
members (unsorted for under some threshold number of members).
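
A rough sketch of such a FixedSet, assuming a threshold of 8 chosen
arbitrarily and an element type that can be sorted in place (for example
int or dchar, not char). FixedSet and linearThreshold are hypothetical
names:

```d
import std.algorithm.searching : canFind;
import std.algorithm.sorting : sort;
import std.range : assumeSorted;

// Hypothetical set whose members are fixed at construction.
struct FixedSet(T)
{
    private T[] items;          // sorted only when above the threshold
    private bool sorted;

    enum linearThreshold = 8;   // assumed cut-off, not measured

    this(T[] members)
    {
        items = members.dup;    // one allocation, made once at construction
        if (items.length > linearThreshold)
        {
            sort(items);
            sorted = true;
        }
    }

    bool contains(T value)
    {
        return sorted
            ? assumeSorted(items).contains(value)   // binary search
            : items.canFind(value);                 // linear scan
    }
}
```

This version still performs the single array allocation mentioned above.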

My point was simply that all input parameters besides the string literal  
require heap activity to maintain the set.  Much less if you allocate one  
array, but for something like "; ", I would like to see zero heap activity  
:)

-Steve

May 17 2010

Philippe Sigaud <philippe.sigaud gmail.com> writes:

On Mon, May 17, 2010 at 19:44, Andrei Alexandrescu <
SeeWebsiteForEmail erdani.org> wrote:

 First, you may want to follow the model set by splitter() instead of
 split() when defining unjoin(). This is because split() allocates memory
 whereas splitter splits lazily so it doesn't need to. If you do want
 split(), just call array(splitter()).

 Second, there is an ambiguity between splitting using a string as separator
 and splitting using a set of characters as separator. This could be solved
 by simply using different names:

 string str = ...;
 foreach (splitByOneOf(str, "; ")) { ... }
 foreach (splitter(str, "; ")) { ... }

I personally use a predicate, isOneOf(some range). It's curried, so it
produces the 'real' predicate which returns true when its input is in the
range.

It's something like this:

bool delegate(ElementType!R) isOneOf(R)(R range) if (isInputRange!R)
{
    auto r = array(range);
    sort(r);
    return (ElementType!R e) { return !find(assumeSorted(r), e).empty;};
}

It's been a long time since I used std.algorithm.find. Is it efficient to sort
what will be most of the time a very short array? Maybe a plain linear
!find(range, e).empty is enough.
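
For comparison, a sketch of the unsorted variant, assuming a current
Phobos. It keeps the isOneOf name from above but drops the sort and uses a
linear canFind; the fields helper is hypothetical and only shows one way
to feed the run-time predicate to splitter:

```d
import std.algorithm.iteration : splitter;
import std.algorithm.searching : canFind;
import std.array : array;
import std.range : ElementType, isInputRange;

// Linear-search variant of isOneOf: no sorting, suited to a handful of elements.
bool delegate(ElementType!R) isOneOf(R)(R range) if (isInputRange!R)
{
    auto r = array(range);                      // private copy of the candidates
    return (ElementType!R e) => r.canFind(e);   // true when e occurs in r
}

// Hypothetical usage: the predicate is a run-time value, so here it is
// called from a lambda instead of being passed as a template argument.
string[] fields(string s)
{
    auto sep = isOneOf(";,/");
    return s.splitter!(c => sep(c)).array;
}
```

Which variant is faster for very short delimiter lists would have to be
measured.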

Usage:

     splitBy!(isOneOf(";,/"))(rangeToBeSplitted)



 One nice outcome is that we can then reuse the same pattern in other
 signatures.


Indeed, and for many things: filtering, stopping iterations (takeWhile,
unfoldWhile), splitting...
And, of course, taking the negation not!isOneOf => isNotIn(";,/")


Philippe

May 17 2010
