digitalmars.D.learn - How come a count of a range becomes 0 before a foreach?

ikelaiah (134/134) Apr 08 2023 Hi,

Steven Schveighoffer (5/14) Apr 08 2023 dirEntries returns an *input range*, not a *forward range*. This means

ikelaiah (10/27) Apr 08 2023 Steve,

Ali Çehreli (7/8) Apr 09 2023 Multiple iterations of dirEntries can produce different results, which

Steven Schveighoffer (6/17) Apr 09 2023 I'd be cautious of that. I don't know what the underlying code uses, it

ikelaiah (6/25) Apr 10 2023 Steve,

Steven Schveighoffer (20/35) Apr 10 2023 That is not what I meant.

ikelaiah (6/15) Apr 10 2023 Ali,

ikelaiah <iwan.kelaiah gmail.com> writes:

Hi,

I've written a script that converts an Rmd (R Markdown) file to a 
Markdown file.

Everything works if, and only if, line 73 is commented out.

I marked line 72 in the code below.

If line 73 is not commented out, the `foreach` does not execute, 
because `rmdFiles.walkLength` on line 82 returns `0`.

How can `rmdFiles.walkLength` become `0` before the `foreach`? 
I'm a bit confused.

Can someone clarify? Thank you.


```d
module rmd2md;

import std.algorithm;
import std.stdio;
import file = std.file;
import std.conv;
import std.regex;
import std.getopt;
import std.path;
import std.datetime;
import std.parallelism;
import std.range;

void main(string[] args)
{
    // Set variables for the main program
    string programName = "Rmd2md";

    // Setup Regex for capturing Rmd code snippet header
    Regex!char re = regex(r"`{3}\{r[a-zA-Z0-9= ]*\}", "g");

    // Set default values for the arguments
    string inputPath = file.getcwd();
    string fileEndsWith = ".Rmd";
    string outputPath = file.getcwd();

    // Set GetOpt variables
    auto helpInformation = getopt(
        args,
        "path|p", "Path of Rmd files. Default: current working directory.", &inputPath,
        "fext|e", "Extension of Rmd files. Default: `.Rmd`", &fileEndsWith,
        "fout|o", "Output folder to save the MD files. Default: current working directory.", &outputPath
    );

    if (helpInformation.helpWanted)
    {
        defaultGetoptPrinter("Rmd to Markdown (md) file converter.",
            helpInformation.options);
        return;
    }

    // is the path valid?
    if (!std.path.isValidPath(inputPath))
    {
        writeln(programName ~ ": invalid input path");
        return;
    }

    // is output path valid?
    if (!std.path.isValidPath(outputPath))
    {
        writeln(programName ~ ": invalid output path");
        return;
    }

    // is file extension valid?
    if (!startsWith(fileEndsWith, "."))
    {
        writeln(programName ~ ": invalid extension given");
        return;
    }

    writeln(programName ~ ": input directory is " ~ inputPath);
    writeln(programName ~ ": output directory is " ~ outputPath);
    writeln(programName ~ ": ...");

    // Get files in specified inputPath variable with a specific extension
    auto rmdFiles = file.dirEntries(inputPath, file.SpanMode.shallow)
        .filter!(f => f.isFile)
        .filter!(f => f.name.endsWith(fileEndsWith));

    // LINE 72 -- WARNING -- If we count the range here, later it will become 0 in line 82
    writeln(programName ~ ": number of files found " ~ to!string(rmdFiles.walkLength));

    // Get start time
    auto stattime = Clock.currTime();

    // Process each Rmd file
    int fileWrittenCount = 0;

    // LINE 81 -- WARNING -- if line 73 is not commented out, the walkLength returns 0
    writeln(programName ~ ": number of files found " ~ to!string(rmdFiles.walkLength));

    foreach (file.DirEntry item; parallel(rmdFiles))
    {
        writeln(programName ~ ": processing " ~ item.name);

        try
        {
            // Read content as string
            string content = file.readText(item.name);
            // Replace ```{r} or ```{r option1=value} with ```R
            string modified = replaceAll(content, re, "```R");
            // Set the Markdown output file
            string outputFile = replaceAll(baseName(item.name), regex(r".Rmd"), ".md");
            // Build an output path, using output path and baseName(item.name)
            string outputFilenamePath = buildPath(outputPath, outputFile);
            // Save output Markdown file
            file.write(outputFilenamePath, modified);
            writeln(programName ~ ": written " ~ outputFilenamePath);
            // Increase counter to indicate number of files processed
            fileWrittenCount++;
        }
        catch (file.FileException e)
        {
            writeln(programName ~ ": " ~ e.msg);
        }
    }

    writeln(programName ~ ": ...");

    // Get end clock
    auto endttime = Clock.currTime();
    auto duration = endttime - stattime;
    writeln("Duration: ", duration);

    // Console output a summary
    writeln(programName ~ ": written " ~ to!string(fileWrittenCount) ~ " files");
}
```


For testing, create a text file with the `.Rmd` extension in the 
same folder as the D file, then run the script:

```bash
rdmd rmd2md.d
```

It finds the `.Rmd` files in the current directory and saves the 
converted `.md` files there as well.

Apr 08 2023

Steven Schveighoffer <schveiguy gmail.com> writes:

On 4/8/23 9:38 PM, ikelaiah wrote:
 // Get files in specified inputPath variable with a specific extension
      auto rmdFiles = file.dirEntries(inputPath, file.SpanMode.shallow)
          .filter!(f => f.isFile)
          .filter!(f => f.name.endsWith(fileEndsWith));
 
      // LINE 72 -- WARNING -- If we count the range here, later it will become 0 in line 82
      writeln(programName ~ ": number of files found " ~ to!string(rmdFiles.walkLength));

dirEntries returns an *input range*, not a *forward range*. This means 
that once it's iterated, it's done.

If you want to iterate it twice, you'll have to construct it twice.
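To make this concrete without touching the filesystem, here is a minimal D sketch. `Countdown` is a hypothetical class-based input range: because it is a class, every copy shares one cursor, which is roughly how the `dirEntries` result behaves (copies share the underlying iteration state rather than duplicating it). `walkLength` can only count such a range by popping every element, so a second count finds it already empty, while constructing a new range starts over.

```d
import std.range : isForwardRange, isInputRange, walkLength;
import std.stdio : writeln;

/// Hypothetical input range with reference semantics: it is a class, so
/// every copy shares the same cursor instead of getting its own.
final class Countdown
{
    private int current;
    this(int start) { current = start; }

    bool empty() const { return current <= 0; }
    int front() const { return current; }
    void popFront() { --current; }
}

static assert(isInputRange!Countdown);
static assert(!isForwardRange!Countdown); // no save(), so no independent re-iteration

void main()
{
    auto r = new Countdown(3);
    writeln(r.walkLength); // 3: counting pops every element
    writeln(r.walkLength); // 0: the shared cursor is already exhausted

    // Constructing the range a second time gives a fresh cursor.
    auto again = new Countdown(3);
    writeln(again.walkLength); // 3
}
```

This is the same pattern as the original program: the count on line 73 drains the range, so the count on line 82 and the `foreach` see nothing.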

-Steve

Apr 08 2023

ikelaiah <iwan.kelaiah gmail.com> writes:

On Sunday, 9 April 2023 at 03:39:52 UTC, Steven Schveighoffer 
wrote:
 On 4/8/23 9:38 PM, ikelaiah wrote:
 // Get files in specified inputPath variable with a specific extension
      auto rmdFiles = file.dirEntries(inputPath, file.SpanMode.shallow)
          .filter!(f => f.isFile)
          .filter!(f => f.name.endsWith(fileEndsWith));
 
      // LINE 72 -- WARNING -- If we count the range here, later it will become 0 in line 82
      writeln(programName ~ ": number of files found " ~ to!string(rmdFiles.walkLength));

 dirEntries returns an *input range*, not a *forward range*. 
 This means that once it's iterated, it's done.

 If you want to iterate it twice, you'll have to construct it 
 twice.

 -Steve

Steve,

You're absolutely right. I did not read the manual correctly.

It is clearly stated 
[here](https://dlang.org/library/std/file/dir_entries.html) that 
`dirEntries` returns an input range.

I will modify the code to construct it twice.
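A minimal sketch of the "construct it twice" approach, assuming a hypothetical helper `findFiles` that rebuilds the same `dirEntries` pipeline as the original program on each call. The counting pass and the `foreach` pass then each get their own independent iteration. The scratch directory and file names are illustrative only, so the example runs anywhere.

```d
import std.algorithm : endsWith, filter;
import std.file : dirEntries, mkdirRecurse, rmdirRecurse, SpanMode, tempDir, write;
import std.path : buildPath;
import std.range : walkLength;
import std.stdio : writeln;

/// Hypothetical helper: returns a brand-new dirEntries pipeline on every
/// call, so each caller gets its own independent iteration.
auto findFiles(string dir, string ext)
{
    return dirEntries(dir, SpanMode.shallow)
        .filter!(f => f.isFile)
        .filter!(f => f.name.endsWith(ext));
}

void main()
{
    // Illustrative scratch directory; the name is arbitrary.
    auto dir = buildPath(tempDir, "rmd2md_twice_demo");
    mkdirRecurse(dir);
    scope (exit) rmdirRecurse(dir);
    write(buildPath(dir, "a.Rmd"), "x");
    write(buildPath(dir, "b.Rmd"), "y");
    write(buildPath(dir, "notes.txt"), "z");

    // First construction: used only for counting.
    writeln("number of files found ", findFiles(dir, ".Rmd").walkLength);

    // Second construction: a fresh range for the processing loop.
    foreach (entry; findFiles(dir, ".Rmd"))
        writeln("processing ", entry.name);
}
```

Each call walks the directory again, so the two passes can disagree if files are added or removed in between.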
Many thanks!

-ikelaiah

Apr 08 2023

Ali Çehreli <acehreli yahoo.com> writes:

On 4/8/23 21:38, ikelaiah wrote:

 I will modify the code to construct it twice.

Multiple iterations of dirEntries can produce different results, which 
may or may not be what your program will be happy with.

Sticking an .array at the end will iterate a single time and maintain 
the list forever because .array returns an array. :)

   auto entries = dirEntries(/* ... */).array;
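A sketch of the `.array` approach, using a hypothetical `snapshot` helper: the directory is walked once, the matching `DirEntry` values are stored in a `DirEntry[]`, and afterwards `entries.length` gives the count without consuming anything. It assumes the stored entries remain valid after the iteration has moved on; the scratch directory exists only to make the example runnable.

```d
import std.algorithm : endsWith, filter;
import std.array : array;
import std.file : DirEntry, dirEntries, mkdirRecurse, rmdirRecurse, SpanMode, tempDir, write;
import std.path : buildPath;
import std.stdio : writeln;

/// Hypothetical helper: walks the directory once and keeps the matching
/// entries in an ordinary array, which can be counted and re-iterated freely.
DirEntry[] snapshot(string dir, string ext)
{
    return dirEntries(dir, SpanMode.shallow)
        .filter!(f => f.isFile && f.name.endsWith(ext))
        .array;
}

void main()
{
    // Illustrative scratch directory; the name is arbitrary.
    auto dir = buildPath(tempDir, "rmd2md_snapshot_demo");
    mkdirRecurse(dir);
    scope (exit) rmdirRecurse(dir);
    write(buildPath(dir, "a.Rmd"), "x");
    write(buildPath(dir, "b.Rmd"), "y");

    auto entries = snapshot(dir, ".Rmd");
    writeln("number of files found ", entries.length); // no iteration needed
    foreach (e; entries)                               // the entries are still there
        writeln(e.name);
}
```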

Ali

Apr 09 2023

Steven Schveighoffer <schveiguy gmail.com> writes:

On 4/9/23 9:16 AM, Ali Çehreli wrote:
 On 4/8/23 21:38, ikelaiah wrote:
 
  > I will modify the code to construct it twice.
 
 Multiple iterations of dirEntries can produce different results, which 
 may or may not be what your program will be happy with.
 
 Sticking an .array at the end will iterate a single time and maintain 
 the list forever because .array returns an array. :)
 
    auto entries = dirEntries(/* ... */).array;

I'd be cautious of that. I don't know what the underlying code uses; it 
may reuse buffers for e.g. filenames to avoid allocation.

If you are confident the directory contents won't change in that 
split-second, then I think iterating twice is fine.

-Steve

Apr 09 2023

ikelaiah <iwan.kelaiah gmail.com> writes:

On Monday, 10 April 2023 at 01:01:59 UTC, Steven Schveighoffer 
wrote:
 On 4/9/23 9:16 AM, Ali Çehreli wrote:
 On 4/8/23 21:38, ikelaiah wrote:
 
  > I will modify the code to construct it twice.
 
 Multiple iterations of dirEntries can produce different 
 results, which may or may not be what your program will be 
 happy with.
 
 Sticking an .array at the end will iterate a single time and 
 maintain the list forever because .array returns an array. :)
 
    auto entries = dirEntries(/* ... */).array;

 I'd be cautious of that. I don't know what the underlying code 
 uses, it may reuse buffers for e.g. filenames to avoid 
 allocation.

 If you are confident the directory contents won't change in 
 that split-second, then I think iterating twice is fine.

 -Steve

Steve,

The Rmd files are stored locally, not on a network drive, so I'm 
confident they won't change in a split second.

-ikelaiah.

Apr 10 2023

Steven Schveighoffer <schveiguy gmail.com> writes:

On 4/10/23 6:43 PM, ikelaiah wrote:
 On Monday, 10 April 2023 at 01:01:59 UTC, Steven Schveighoffer wrote:
 On 4/9/23 9:16 AM, Ali Çehreli wrote:


    auto entries = dirEntries(/* ... */).array;

 I'd be cautious of that. I don't know what the underlying code uses, 
 it may reuse buffers for e.g. filenames to avoid allocation.

 If you are confident the directory contents won't change in that 
 split-second, then I think iterating twice is fine.

 
 Steve,
 
 The Rmd files are not on a network drive, but saved locally.
 So, I'm confident, the files won't change in a split-second.

That is not what I meant.

What I mean is that `array` is going to copy whatever values the range 
gives it, and those values might later be *overwritten*, depending on how 
`dirEntries` is implemented.

e.g. the following code is broken:

```d
auto lines = File("foo.txt").byLine.array;
```

But the following is correct:

```d
auto lines = File("foo.txt").byLineCopy.array;
```

Why? Because `byLine` reuses its line buffer to save on allocations, so 
the earlier elements of the array may end up containing garbage once that 
buffer is overwritten.

I'm not saying it's wrong for `dirEntries`; I haven't looked. But you 
may want to be cautious about just using `array` to get you out of 
trouble, especially with lazy input ranges.
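The hazard can be reproduced without any file I/O. The sketch below uses a hypothetical `ReusingRange` that, like `File.byLine`, returns slices of a single internal buffer; calling `.array` on it stores three views of the same memory, while duplicating each element first (which is what `byLineCopy` does for lines) keeps them distinct. It illustrates buffer reuse in general, not how `dirEntries` is implemented.

```d
import std.algorithm : map;
import std.array : array;
import std.stdio : writeln;

/// Hypothetical range that mimics File.byLine: every `front` is a slice of
/// ONE internal buffer, and popFront overwrites that buffer in place.
struct ReusingRange
{
    private string[] source;
    private char[] buffer;
    private size_t index;

    this(string[] source)
    {
        this.source = source;
        buffer = new char[](64); // assumes every item fits in 64 chars
        load();
    }

    // Copy the current item into the shared buffer.
    private void load()
    {
        if (index < source.length)
            buffer[0 .. source[index].length] = source[index][];
    }

    bool empty() const { return index >= source.length; }
    char[] front() { return buffer[0 .. source[index].length]; }
    void popFront() { ++index; load(); }
}

void main()
{
    // Like byLine.array: all three slices alias the same buffer.
    auto aliased = ReusingRange(["one", "two", "six"]).array;
    writeln(aliased); // ["six", "six", "six"]

    // Like byLineCopy.array: each element is duplicated before the buffer is reused.
    auto copied = ReusingRange(["one", "two", "six"]).map!(s => s.idup).array;
    writeln(copied); // ["one", "two", "six"]
}
```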

-Steve

Apr 10 2023

ikelaiah <iwan.kelaiah gmail.com> writes:

On Sunday, 9 April 2023 at 13:16:51 UTC, Ali Çehreli wrote:
 On 4/8/23 21:38, ikelaiah wrote:

 I will modify the code to construct it twice.

 Multiple iterations of dirEntries can produce different 
 results, which may or may not be what your program will be 
 happy with.

 Sticking an .array at the end will iterate a single time and 
 maintain the list forever because .array returns an array. :)

   auto entries = dirEntries(/* ... */).array;

 Ali


Ali,

I didn't think about converting the `dirEntries` result to an `array`.
Thanks for the gems (and for your online book too).

Regards,
ikelaiah

Apr 10 2023
