digitalmars.D.learn - How come a count of a range becomes 0 before a foreach?

ikelaiah <iwan.kelaiah gmail.com> writes:
Hi,

I've written a program that converts an Rmd (R Markdown) file to a 
Markdown file.

All works well if and only if line 73 is commented out.

I marked line 72 in the code below.

If line 73 is not commented out, the `foreach` does not execute, 
because `rmdFiles.walkLength` in line 82 becomes `0`.

How does `rmdFiles.walkLength` become `0` before the `foreach`? 
I'm a bit confused.

Can someone clarify? Thank you.


```d
module rmd2md;

import std.algorithm;
import std.stdio;
import file = std.file;
import std.conv;
import std.regex;
import std.getopt;
import std.path;
import std.datetime;
import std.parallelism;
import std.range;

void main(string[] args)
{
     // Set variables for the main program
     string programName = "Rmd2md";

     // Setup Regex for capturing Rmd code snippet header
     Regex!char re = regex(r"`{3}\{r[a-zA-Z0-9= ]*\}", "g");

     // Set default values for the arguments
     string inputPath = file.getcwd();
     string fileEndsWith = ".Rmd";
     string outputPath = file.getcwd();

     // Set GetOpt variables
     auto helpInformation = getopt(
         args,
         "path|p", "Path of Rmd files. Default: current working directory.", &inputPath,
         "fext|e", "Extension of Rmd files. Default: `.Rmd`", &fileEndsWith,
         "fout|o", "Output folder to save the MD files. Default: current working directory.", &outputPath
     );

     if (helpInformation.helpWanted)
     {
         defaultGetoptPrinter("Rmd to Markdown (md) file converter.",
             helpInformation.options);
         return;
     }

     // is the path valid?
     if (!std.path.isValidPath(inputPath))
     {
         writeln(programName ~ ": invalid input path");
         return;
     }

     // is output path valid?
     if (!std.path.isValidPath(outputPath))
     {
         writeln(programName ~ ": invalid output path");
         return;
     }

     // is file extension valid?
     if (!startsWith(fileEndsWith, "."))
     {
         writeln(programName ~ ": invalid extension given");
         return;
     }

     writeln(programName ~ ": input directory is " ~ inputPath);
     writeln(programName ~ ": output directory is " ~ outputPath);
     writeln(programName ~ ": ...");

     // Get files in specified inputPath variable with a specific extension
     auto rmdFiles = file.dirEntries(inputPath, file.SpanMode.shallow)
         .filter!(f => f.isFile)
         .filter!(f => f.name.endsWith(fileEndsWith));

     // LINE 72 -- WARNING -- If we count the range here, later it will become 0 in line 82
     writeln(programName ~ ": number of files found " ~ to!string(rmdFiles.walkLength));

     // Get start time
     auto stattime = Clock.currTime();

     // Process each Rmd file
     int fileWrittenCount = 0;

     // LINE 81 -- WARNING -- if line 73 is not commented out, the walkLength returns 0
     writeln(programName ~ ": number of files found " ~ to!string(rmdFiles.walkLength));

     foreach (file.DirEntry item; parallel(rmdFiles))
     {
         writeln(programName ~ ": processing " ~ item.name);

         try
         {
             // Read content as string
             string content = file.readText(item.name);
             // Replace ```{r} or ```{r option1=value} with ```R
             string modified = replaceAll(content, re, "```R");
             // Set the Markdown output file
             string outputFile = replaceAll(baseName(item.name), regex(r"\.Rmd$"), ".md");
             // Build an output path, using output path and baseName(item.name)
             string outputFilenamePath = buildPath(outputPath, outputFile);
             // Save output Markdown file
             file.write(outputFilenamePath, modified);
             writeln(programName ~ ": written " ~ outputFilenamePath);
             // Increase counter to indicate number of files processed
             fileWrittenCount++;
         }
         catch (file.FileException e)
         {
             writeln(programName ~ ": " ~ e.msg);
         }
     }

     writeln(programName ~ ": ...");

     // Get end time
     auto endttime = Clock.currTime();
     auto duration = endttime - stattime;
     writeln("Duration: ", duration);

     // Console output a summary
     writeln(programName ~ ": written " ~ to!string(fileWrittenCount) ~ " files");
}
```


For testing, create a text file with the `.Rmd` extension in the 
same folder as the D file, then run the script:

```bash
rdmd rmd2md.d
```

It will find the `.Rmd` file in the current directory and save the 
converted `.md` file there as well.
Apr 08 2023
Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/8/23 9:38 PM, ikelaiah wrote:
>      // Get files in specified inputPath variable with a specific extension
>      auto rmdFiles = file.dirEntries(inputPath, file.SpanMode.shallow)
>          .filter!(f => f.isFile)
>          .filter!(f => f.name.endsWith(fileEndsWith));
>
>      // LINE 72 -- WARNING -- If we count the range here, later it will become 0 in line 82
>      writeln(programName ~ ": number of files found " ~ to!string(rmdFiles.walkLength));

dirEntries returns an *input range*, not a *forward range*. This means that once it's iterated, it's done.

If you want to iterate it twice, you'll have to construct it twice.

-Steve
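
A minimal sketch of "construct it twice", assuming the same filters as the original program. The helper name `findFiles` is hypothetical and not part of the thread's code; it simply rebuilds the lazy range on every call, so the count and the loop each get their own fresh range.

```d
import std.algorithm : endsWith, filter;
import std.file : dirEntries, DirEntry, SpanMode;
import std.range : walkLength;
import std.stdio : writeln;

// Hypothetical helper (not from the original program): every call
// constructs a new lazy range over the directory.
auto findFiles(string dir, string ext)
{
    return dirEntries(dir, SpanMode.shallow)
        .filter!(f => f.isFile)
        .filter!(f => f.name.endsWith(ext));
}

void main()
{
    string dir = ".";
    string ext = ".Rmd";

    // First construction: walkLength consumes this range completely.
    writeln("number of files found ", findFiles(dir, ext).walkLength);

    // Second construction: a fresh range for the actual processing.
    foreach (DirEntry item; findFiles(dir, ext))
        writeln("processing ", item.name);
}
```

The directory is read twice, so the two passes can disagree if files are added or removed in between.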
Apr 08 2023
ikelaiah <iwan.kelaiah gmail.com> writes:
On Sunday, 9 April 2023 at 03:39:52 UTC, Steven Schveighoffer wrote:
> On 4/8/23 9:38 PM, ikelaiah wrote:
>>      // Get files in specified inputPath variable with a specific extension
>>      auto rmdFiles = file.dirEntries(inputPath, file.SpanMode.shallow)
>>          .filter!(f => f.isFile)
>>          .filter!(f => f.name.endsWith(fileEndsWith));
>>
>>      // LINE 72 -- WARNING -- If we count the range here, later it will become 0 in line 82
>>      writeln(programName ~ ": number of files found " ~ to!string(rmdFiles.walkLength));
>
> dirEntries returns an *input range*, not a *forward range*. This means that once it's iterated, it's done. If you want to iterate it twice, you'll have to construct it twice.
>
> -Steve

Steve,

You're absolutely right. I did not read the manual correctly. It is clearly stated [here](https://dlang.org/library/std/file/dir_entries.html) that `dirEntries` returns an input range.

I will modify the code to construct it twice.

Many thanks!

-ikelaiah
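
The range category can also be checked at compile time instead of from the documentation. The sketch below uses the `isInputRange` and `isForwardRange` traits from `std.range.primitives`; the alias name `Entries` is only illustrative.

```d
import std.file : dirEntries, SpanMode;
import std.range.primitives : isForwardRange, isInputRange;

// The type returned by dirEntries; typeof does not run the call.
alias Entries = typeof(dirEntries(".", SpanMode.shallow));

// It can be iterated once...
static assert(isInputRange!Entries);
// ...but it has no `save`, so a position cannot be kept for a second pass.
static assert(!isForwardRange!Entries);

// For comparison, a plain array is a forward range.
static assert(isForwardRange!(int[]));

void main()
{
}
```

If these assertions compile, a single `dirEntries` result cannot be counted with `walkLength` and then iterated again.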
Apr 08 2023
Ali Çehreli <acehreli yahoo.com> writes:
On 4/8/23 21:38, ikelaiah wrote:

> I will modify the code to construct it twice.

Multiple iterations of dirEntries can produce different results, which may or may not be what your program will be happy with.

Sticking an .array at the end will iterate a single time and maintain the list forever because .array returns an array. :)

    auto entries = dirEntries(/* ... */).array;

Ali
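
Applied to the original program's filter chain, the `.array` approach might look like the sketch below. The helper name `snapshotFiles` is hypothetical, and the sketch assumes that each `DirEntry` owns its own data once stored, which this thread does not verify.

```d
import std.algorithm : endsWith, filter;
import std.array : array;
import std.file : dirEntries, DirEntry, SpanMode;
import std.stdio : writeln;

// Hypothetical helper (not from the thread): walks the directory once
// and stores the matching entries in a regular array.
DirEntry[] snapshotFiles(string dir, string ext)
{
    return dirEntries(dir, SpanMode.shallow)
        .filter!(f => f.isFile)
        .filter!(f => f.name.endsWith(ext))
        .array;
}

void main()
{
    auto entries = snapshotFiles(".", ".Rmd");

    // An array knows its length; reading it consumes nothing.
    writeln("number of files found ", entries.length);

    // The same array can be traversed as often as needed.
    foreach (item; entries)
        writeln("processing ", item.name);
    foreach (item; entries)
        writeln("seen again: ", item.name);
}
```

Because the directory is read only once, the count and the loop are guaranteed to see the same set of entries.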
Apr 09 2023
Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/9/23 9:16 AM, Ali Çehreli wrote:
> On 4/8/23 21:38, ikelaiah wrote:
>
>> I will modify the code to construct it twice.
>
> Multiple iterations of dirEntries can produce different results, which may or may not be what your program will be happy with.
>
> Sticking an .array at the end will iterate a single time and maintain the list forever because .array returns an array. :)
>
>     auto entries = dirEntries(/* ... */).array;

I'd be cautious of that. I don't know what the underlying code uses, it may reuse buffers for e.g. filenames to avoid allocation.

If you are confident the directory contents won't change in that split-second, then I think iterating twice is fine.

-Steve
Apr 09 2023
ikelaiah <iwan.kelaiah gmail.com> writes:
On Monday, 10 April 2023 at 01:01:59 UTC, Steven Schveighoffer wrote:
> On 4/9/23 9:16 AM, Ali Çehreli wrote:
>> On 4/8/23 21:38, ikelaiah wrote:
>>
>>> I will modify the code to construct it twice.
>>
>> Multiple iterations of dirEntries can produce different results, which may or may not be what your program will be happy with.
>>
>> Sticking an .array at the end will iterate a single time and maintain the list forever because .array returns an array. :)
>>
>>     auto entries = dirEntries(/* ... */).array;
>
> I'd be cautious of that. I don't know what the underlying code uses, it may reuse buffers for e.g. filenames to avoid allocation. If you are confident the directory contents won't change in that split-second, then I think iterating twice is fine.
>
> -Steve

Steve,

The Rmd files are not on a network drive but saved locally, so I'm confident the files won't change in a split-second.

-ikelaiah.
Apr 10 2023
Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/10/23 6:43 PM, ikelaiah wrote:
> On Monday, 10 April 2023 at 01:01:59 UTC, Steven Schveighoffer wrote:
>> On 4/9/23 9:16 AM, Ali Çehreli wrote:
>>>     auto entries = dirEntries(/* ... */).array;
>>
>> I'd be cautious of that. I don't know what the underlying code uses, it may reuse buffers for e.g. filenames to avoid allocation. If you are confident the directory contents won't change in that split-second, then I think iterating twice is fine.
>
> Steve, The Rmd files are not on a network drive, but saved locally. So, I'm confident, the files won't change in a split-second.

That is not what I meant. What I mean is that `array` is going to copy whatever values the range gives it, which might be later *overwritten* depending on how `dirEntries` is implemented.

e.g. the following code is broken:

```d
auto lines = File("foo.txt").byLine.array;
```

But the following is correct:

```d
auto lines = File("foo.txt").byLineCopy.array;
```

Why? Because `byLine` reuses the line buffer eventually to save on allocations. The array of lines might contain garbage in the earlier elements as they got overwritten.

I'm not saying it's wrong for `dirEntries`, I haven't looked. But you may want to be cautious about just using `array` to get you out of trouble, especially for lazy input ranges.

-Steve
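
A self-contained sketch of the `byLine` versus `byLineCopy` difference described above. The temporary file name and its contents are made up for illustration. The contents of the `byLine` array are deliberately not checked, since whether earlier elements get overwritten depends on buffer reuse.

```d
import std.algorithm : map;
import std.array : array;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File, writeln;

void main()
{
    // Hypothetical scratch file, used only for this demonstration.
    auto path = buildPath(tempDir, "byline_demo.txt");
    write(path, "alpha\nbeta\ngamma\n");
    scope (exit) remove(path);

    // byLine yields char[] slices into an internal buffer that may be
    // reused, so the stored elements are not reliable after iteration.
    char[][] unreliable = File(path).byLine.array;
    writeln("byLine.array (contents not guaranteed): ", unreliable);

    // byLineCopy allocates a fresh string for every line.
    string[] copied = File(path).byLineCopy.array;
    assert(copied == ["alpha", "beta", "gamma"]);

    // Equivalent alternative: duplicate each slice before storing it.
    string[] duped = File(path).byLine.map!(l => l.idup).array;
    assert(duped == copied);

    writeln("byLineCopy.array: ", copied);
}
```

The same pattern, copying each element before storing it, applies to any lazy input range whose `front` may point into reused storage.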
Apr 10 2023
ikelaiah <iwan.kelaiah gmail.com> writes:
On Sunday, 9 April 2023 at 13:16:51 UTC, Ali Çehreli wrote:
> On 4/8/23 21:38, ikelaiah wrote:
>
>> I will modify the code to construct it twice.
>
> Multiple iterations of dirEntries can produce different results, which may or may not be what your program will be happy with.
>
> Sticking an .array at the end will iterate a single time and maintain the list forever because .array returns an array. :)
>
>     auto entries = dirEntries(/* ... */).array;
>
> Ali

Ali,

I hadn't thought of converting the `dirEntries` result to an array. Thanks for the gems (and your online book too).

Regards,
ikelaiah
Apr 10 2023