digitalmars.D.learn - code D'ish enough? - ignore previous post with same subject

thorstein (54/54) Feb 26 2017 Hi,

ag0aep6g (62/104) Feb 26 2017 * "no-data-value"?

Thorstein (8/11) Feb 27 2017 No-data-values in data sets are common at least in geosciences:

Jordan Wilson (23/77) Feb 26 2017 I'm in a similar boat, as I continue to learn D, I find myself

Thorstein (3/22) Feb 27 2017 This looks like more friendly readable compact code, where should
crimaniak (3/4) Feb 28 2017 If lambda just calls another function you can pass it directly:

Jordan Wilson (2/7) Feb 28 2017 Learn something new everyday, thanks :-)

thorstein <thorstein novalid.de> writes:

Hi,

sorry for posting again, but I used a keyboard combination that 
accidently send my post before it was done.

Coming more or less from Python I just started with D. Not a real 
programmer, just automating things and looking for a neat 
compiled language.

Just to learn, I wrote a function to read CSV-like files (I know 
D has its own routine). Since I'm still a bit overwhelmed by the 
many complex language features, I'm curious what could I change 
to make my code as D'ish as possible?

Thank you for any suggestion,
Thorstein

// Reads CSV-like files with only numeric values in each column
// new_ndv replaces ndv, which is the original no-data-value
double[][]* readNumMatCsv(char[] filePath, int numHeaderLines, 
char[] ndv, char[] new_ndv)
{ double[][]* p_numArray;
   double[][] numArray;
   char[] line;
   string noCont = "File content not usable. Quit here.";
   string noFile = "Could not read file. Quit here.";
   string nonNum = "Found a non-numeric value in data matrix. Quit 
here.";
   Regex!char re = regex(r"(\n$)");

   if(exists(filePath))
   { File f = File(filePath, "r");
     if((line = f.readln().dup).length > 0)
     { while (!f.eof())
       // 1st replace ndv with new_ndv, 2nd remove all \n, 3rd 
check id size of read line >0
       { if((line = replaceAll(f.readln().dup.replace(ndv, 
new_ndv), re, "")).length > 0)
         // check if all elements of splitted line are numeric
         { foreach(i;split(line,","))
           { if(isNumeric(i) == false)
             // otherwise return pointer to empty array
             { writeln(nonNum);
               return p_numArray;
             }
           }
           // convert characters to double
           if(split(line,",").length > 0)
           { numArray ~= to!(double[])(split(line,","));
           }
         }
       }
       // pass reference to pointer
       p_numArray = &numArray;
       // first line empty -> return pointer to empty array
     } else { writeln(noCont); }
     // file could not be find
   } else { writeln(noFile); }
   return p_numArray;
}

Feb 26 2017

ag0aep6g <anonymous example.com> writes:

On Sunday, 26 February 2017 at 19:34:33 UTC, thorstein wrote:
 // Reads CSV-like files with only numeric values in each column
 // new_ndv replaces ndv, which is the original no-data-value
 double[][]* readNumMatCsv(char[] filePath, int numHeaderLines, 
 char[] ndv, char[] new_ndv)

* "no-data-value"?
* Returning a pointer seems dubious. I'd expect double[][] to 
work.
* It's not obvious to me what "Mat" means in "readNumMatCsv".
* The char[] arguments should be const if possible.
* You can probably add some attributes to the function: pure, 
 safe.

 { double[][]* p_numArray;

This brace style is rather uncommon.

   double[][] numArray;
   char[] line;
   string noCont = "File content not usable. Quit here.";
   string noFile = "Could not read file. Quit here.";
   string nonNum = "Found a non-numeric value in data matrix. 
 Quit here.";

Those message constants should be enums or static immutable. 
Personally, I'd just put the literals directly in the writeln 
calls, as you're not using them only once.

   Regex!char re = regex(r"(\n$)");

   if(exists(filePath))
   { File f = File(filePath, "r");
     if((line = f.readln().dup).length > 0)

As far as I see, you're never using the line you're reading here. 
Does this just skip the header line? You're also not using 
numHeaderLines. Maybe this should loop numHeaderLines times? 
There's no need to assign to `line` then. Also no need to dup.

     { while (!f.eof())
       // 1st replace ndv with new_ndv, 2nd remove all \n, 3rd 
 check id size of read line >0
       { if((line = replaceAll(f.readln().dup.replace(ndv, 
 new_ndv), re, "")).length > 0)

byLineCopy may be simpler to use here than readln. For this use 
case, byLine (no copy) could probably work, too. But byLineCopy 
is less bug prone, so better start with that one.

https://dlang.org/phobos/std_stdio.html#.File.byLineCopy

It's still not clear to me what's the deal with ndv and new_ndv.

A line should contain at most one newline, at the end. There's a 
special function in std.string to remove an optional suffic: 
chomp. And with byLineCopy there's a parameter to omit the 
newline. I'd prefer both of those over regex here.

https://dlang.org/phobos/std_string.html#.chomp

         // check if all elements of splitted line are numeric
         { foreach(i;split(line,","))

A string that's called "i" may be surprising to some.

Could probably use the range version of `split`: 
std.algorithm.iteration.splitter. That would avoid an allocation. 
But split is ok if you're more comfortable with arrays than with 
ranges.

https://dlang.org/phobos/std_algorithm_iteration.html#.splitter

           { if(isNumeric(i) == false)

if (!isNumeric(i))

             // otherwise return pointer to empty array
             { writeln(nonNum);
               return p_numArray;
             }
           }
           // convert characters to double
           if(split(line,",").length > 0)
           { numArray ~= to!(double[])(split(line,","));

You're executing `split(line, ",")` three times. Better do it 
just once and save the result in a (const) variable.

Instead of checking beforehand if all values are numeric, you can 
also just let to!(double[]) fail, and catch the exception if you 
want.

If I read it right, the `.length > 0` check means that blank 
lines are allowed at this point, right? Seems inconsistent to 
forbid a blank leading line but allow them later on.

           }
         }
       }
       // pass reference to pointer
       p_numArray = &numArray;

No, no, no. This is a pointer to a local variable and later you 
return it. That's invalid, undefined behavior. Don't do it. Just 
return numArray itself. As expected, you don't need to return a 
pointer, just double[][] is fine.

       // first line empty -> return pointer to empty array
     } else { writeln(noCont); }
     // file could not be find
   } else { writeln(noFile); }

Instead of the nested `if`s style, you could reverse the 
conditions and return early, or maybe throw an Exception:

----
if (/* ... no file ... */)
{
     writeln(noFile);
     return [];
}
if (/* ... no content ... */)
{
     writeln(noCont);
     return [];
}
/* ... rest of the code ... */
----

That saves some indentation levels, and makes the purpose of 
those checks more obvious.

   return p_numArray;
 }

Feb 26 2017

Thorstein <tlange invalid.de> writes:

I really appriciate your comments and thoughts!

On Sunday, 26 February 2017 at 21:02:52 UTC, ag0aep6g wrote:
 On Sunday, 26 February 2017 at 19:34:33 UTC, thorstein wrote:

 * "no-data-value"?

No-data-values in data sets are common at least in geosciences: 
raster images, spatial simulation outputs etc. Not all cells have 
evaluable information and need to be filtered by a specific very 
high or very low numeric value. And different software may 
require different no-data-values.

 * It's not obvious to me what "Mat" means in "readNumMatCsv".

means 'read numeric matrix from csv-like files'

Feb 27 2017

Jordan Wilson <wilsonjord gmail.com> writes:

On Sunday, 26 February 2017 at 19:34:33 UTC, thorstein wrote:
 Hi,

 sorry for posting again, but I used a keyboard combination that 
 accidently send my post before it was done.

 Coming more or less from Python I just started with D. Not a 
 real programmer, just automating things and looking for a neat 
 compiled language.

 Just to learn, I wrote a function to read CSV-like files (I 
 know D has its own routine). Since I'm still a bit overwhelmed 
 by the many complex language features, I'm curious what could I 
 change to make my code as D'ish as possible?

 Thank you for any suggestion,
 Thorstein

 // Reads CSV-like files with only numeric values in each column
 // new_ndv replaces ndv, which is the original no-data-value
 double[][]* readNumMatCsv(char[] filePath, int numHeaderLines, 
 char[] ndv, char[] new_ndv)
 { double[][]* p_numArray;
   double[][] numArray;
   char[] line;
   string noCont = "File content not usable. Quit here.";
   string noFile = "Could not read file. Quit here.";
   string nonNum = "Found a non-numeric value in data matrix. 
 Quit here.";
   Regex!char re = regex(r"(\n$)");

   if(exists(filePath))
   { File f = File(filePath, "r");
     if((line = f.readln().dup).length > 0)
     { while (!f.eof())
       // 1st replace ndv with new_ndv, 2nd remove all \n, 3rd 
 check id size of read line >0
       { if((line = replaceAll(f.readln().dup.replace(ndv, 
 new_ndv), re, "")).length > 0)
         // check if all elements of splitted line are numeric
         { foreach(i;split(line,","))
           { if(isNumeric(i) == false)
             // otherwise return pointer to empty array
             { writeln(nonNum);
               return p_numArray;
             }
           }
           // convert characters to double
           if(split(line,",").length > 0)
           { numArray ~= to!(double[])(split(line,","));
           }
         }
       }
       // pass reference to pointer
       p_numArray = &numArray;
       // first line empty -> return pointer to empty array
     } else { writeln(noCont); }
     // file could not be find
   } else { writeln(noFile); }
   return p_numArray;
 }

I'm in a similar boat, as I continue to learn D, I find myself 
using UFCS, "auto", and operating on ranges alot more; I think 
that's considered idiomatic.

For example, here is my rough attempt:
auto readNumMatCsv2 (string filePath, string ndv, string new_ndv){
     double[][] p_numArray;
     try {
         auto lines = File(filePath,"r").byLine;
         lines.popFront; // get read of header
         p_numArray = lines.map!(a => a.replace (ndv,new_ndv)
                                       .splitter (",")
                                       .map!(a => a.to!double)
                                       .array)
                           .array;
     } catch (Exception e){
         e.msg.writeln; // this replaces "Could not read file. 
Quit here."
     }
     return p_numArray;
}

It took me quite a while to get the whole usage of range stuff 
like "map" and "filter" etc., but I think it's worth the effort.

Feb 26 2017

Thorstein <tlange invalid.de> writes:

On Sunday, 26 February 2017 at 21:50:38 UTC, Jordan Wilson wrote:
 auto readNumMatCsv2 (string filePath, string ndv, string 
 new_ndv){
     double[][] p_numArray;
     try {
         auto lines = File(filePath,"r").byLine;
         lines.popFront; // get read of header
         p_numArray = lines.map!(a => a.replace (ndv,new_ndv)
                                       .splitter (",")
                                       .map!(a => a.to!double)
                                       .array)
                           .array;
     } catch (Exception e){
         e.msg.writeln; // this replaces "Could not read file. 
 Quit here."
     }
     return p_numArray;
 }

 It took me quite a while to get the whole usage of range stuff 
 like "map" and "filter" etc., but I think it's worth the effort.

This looks like more friendly readable compact code, where should 
be the goal. Thanks for your insights!

Feb 27 2017

crimaniak <crimaniak gmail.com> writes:

On Sunday, 26 February 2017 at 21:50:38 UTC, Jordan Wilson wrote:
     .map!(a => a.to!double)

  If lambda just calls another function you can pass it directly:
== .map!(to!double)

Feb 28 2017

Jordan Wilson <wilsonjord gmail.com> writes:

On Tuesday, 28 February 2017 at 20:49:39 UTC, crimaniak wrote:
 On Sunday, 26 February 2017 at 21:50:38 UTC, Jordan Wilson 
 wrote:
     .map!(a => a.to!double)

  If lambda just calls another function you can pass it directly:
 == .map!(to!double)

Learn something new everyday, thanks :-)

Feb 28 2017

D Programming

C/C++ Programming

Other

digitalmars.D.learn - code D'ish enough? - ignore previous post with same subject