
digitalmars.D - string splitting funcs

While we're tweaking std.string:

When writing string libs or types (like Text recently), I implement 3 
string splitting methods. This may or may not be useful for D's string 
module.

The core point is: what to do with empty parts? They may be generated when:
* the separator is present at either end of the source string
* successive separators occur in the source string
Thus,
     split("--abc----def----", "--")
basically returns
     ["","abc","","def","",""]
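
To make the empty parts concrete, here is a minimal sketch. It assumes 
std.array.split with a string separator, and an input with exactly four 
dashes between abc and def, so that the run is two whole "--" separators.

```d
import std.array : split;
import std.stdio : writeln;

void main()
{
    // Separator at both ends, and two separators in a row in the middle.
    auto parts = "--abc----def----".split("--");

    // One empty part in front, one between the doubled separators,
    // and two at the end: six parts in total.
    assert(parts == ["", "abc", "", "def", "", ""]);
    writeln(parts);
}
```

With an odd number of dashes in a run, the leftover dash ends up inside 
the next part rather than forming an empty one.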

This may or may not be what we expect. But why? I ended up concluding 
that there are 2 distinct use cases where we need to split a string:
1. it is like a record (fields)
2. it is like a list (elements)

In the first case, we want to keep empty fields so that each field has a 
constant index, and sometimes empty fields are meaningful. For instance, 
in name--phone--email, when phone is absent, we still want email as the 
third field.
In the case of a list, by contrast, empty elements are most commonly 
irrelevant; they are often just a side effect of a flexible (not always 
formal) grammar. For instance, lists of words / numbers / tokens; or more 
simply lines: we rarely keep blank ones for further processing.

This leads to 2 different string splitting funcs, eg
     string[] listElements (string sep)
     string[] recordFields (string sep)
(names open to discussion ;-)
recordFields keeps empty parts and is thus symmetric to join. 
listElements may simply filter recordFields' results, or instead drop 
empty elements on the fly.
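
A rough sketch of the two funcs, written as free functions that take the 
source string as an extra first parameter (that signature is my 
assumption, as are the sample values). listElements is done here the 
simple way, by filtering; dropping empties on the fly would avoid the 
intermediate array.

```d
import std.algorithm : filter;
import std.array : array, join, split;

/// Record-like split: keeps empty parts, so each field keeps its index.
string[] recordFields(string source, string sep)
{
    return source.split(sep);
}

/// List-like split: same split, then empty elements are filtered out.
string[] listElements(string source, string sep)
{
    return source.split(sep).filter!(part => part.length > 0).array;
}

void main()
{
    // name--phone--email with the phone missing (made-up sample values).
    auto record = "Ann----ann@example.org";
    auto fields = recordFields(record, "--");
    assert(fields == ["Ann", "", "ann@example.org"]); // email is still third
    assert(fields.join("--") == record);               // join undoes the split

    // Stray separators leave no trace in the list-like result.
    assert(listElements("--abc----def----", "--") == ["abc", "def"]);
}
```

Only the record-like version round-trips through join; once empty parts 
are dropped, the original string cannot be rebuilt.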

Finally, there is a third, different use case, which may well be the 
most common one, and requires yet another func:
     string[] split (string whitespace=" \t\n")
which splits on any whitespace. Usually, the expected behaviour is that 
any combination or repetition of ws chars is considered a single 
separator; but ws at the start or end still generates an empty part.
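
Here is a hand-rolled sketch of that behaviour. The name splitWhitespace 
and the free-function form are hypothetical; it only illustrates the 
rule as stated here (a run of ws chars counts as one separator, and a 
run at the start or end still produces an empty part).

```d
import std.algorithm : canFind;

/// Splits on runs of any char found in `whitespace`.
string[] splitWhitespace(string source, string whitespace = " \t\n")
{
    string[] parts;
    size_t start = 0; // index where the current part begins
    size_t i = 0;
    while (i < source.length)
    {
        if (whitespace.canFind(source[i]))
        {
            // Close the current part (empty if the run is at the start).
            parts ~= source[start .. i];
            // Treat the whole whitespace run as a single separator.
            while (i < source.length && whitespace.canFind(source[i]))
                ++i;
            start = i;
        }
        else
        {
            ++i;
        }
    }
    // Last part (empty if the source ended with whitespace).
    parts ~= source[start .. $];
    return parts;
}

void main()
{
    assert(splitWhitespace("a \t b\nc") == ["a", "b", "c"]);
    assert(splitWhitespace(" a \t b\n") == ["", "a", "b", ""]);
}
```

As far as I know, calling Phobos' split without a separator also merges 
whitespace runs but produces no empty words at all, so it differs from 
this sketch at the ends of the string.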

Makes sense?

Denis
_________________
vita es estrany
spir.wikidot.com
Jan 22 2011