digitalmars.D.learn - How come a count of a range becomes 0 before a foreach?
- ikelaiah (134/134) Apr 08 2023 Hi,
- Steven Schveighoffer (5/14) Apr 08 2023 dirEntries returns an *input range*, not a *forward range*. This means
- ikelaiah (10/27) Apr 08 2023 Steve,
- Ali Çehreli (7/8) Apr 09 2023 Multiple iterations of dirEntries can produce different results, which
- Steven Schveighoffer (6/17) Apr 09 2023 I'd be cautious of that. I don't know what the underlying code uses, it
- ikelaiah (6/25) Apr 10 2023 Steve,
- Steven Schveighoffer (20/35) Apr 10 2023 That is not what I meant.
- ikelaiah (6/15) Apr 10 2023 Ali,
Hi,

I've written a program that converts an Rmd (R Markdown) file to a Markdown file.

All works well if and only if line 73 (the `writeln` directly below the `LINE 72` comment) is commented out. I marked line 72 in the code below.

If line 73 is not commented out, the `foreach` does not execute, because `rmdFiles.walkLength` in line 82 (below the `LINE 81` comment) becomes `0`.

How does `rmdFiles.walkLength` become `0` before the `foreach`? I'm a bit confused. Can someone clarify?

Thank you.

```d
module rmd2md;

import std.algorithm;
import std.stdio;
import file = std.file;
import std.conv;
import std.regex;
import std.getopt;
import std.path;
import std.datetime;
import std.parallelism;
import std.range;

void main(string[] args)
{
    // Set variables for the main program
    string programName = "Rmd2md";

    // Setup Regex for capturing Rmd code snippet header
    Regex!char re = regex(r"`{3}\{r[a-zA-Z0-9= ]*\}", "g");

    // Set default values for the arguments
    string inputPath = file.getcwd();
    string fileEndsWith = ".Rmd";
    string outputPath = file.getcwd();

    // Set GetOpt variables
    auto helpInformation = getopt(
        args,
        "path|p", "Path of Rmd files. Default: current working directory.", &inputPath,
        "fext|e", "Extension of Rmd files. Default: `.Rmd`", &fileEndsWith,
        "fout|o", "Output folder to save the MD files. Default: current working directory.", &outputPath
    );

    if (helpInformation.helpWanted)
    {
        defaultGetoptPrinter("Rmd to Markdown (md) file converter.", helpInformation.options);
        return;
    }

    // is the path valid?
    if (!std.path.isValidPath(inputPath))
    {
        writeln(programName ~ ": invalid input path");
        return;
    }

    // is output path valid?
    if (!std.path.isValidPath(outputPath))
    {
        writeln(programName ~ ": invalid output path");
        return;
    }

    // is file extension valid?
    if (!startsWith(fileEndsWith, "."))
    {
        writeln(programName ~ ": invalid extension given");
        return;
    }

    writeln(programName ~ ": input directory is " ~ inputPath);
    writeln(programName ~ ": output directory is " ~ outputPath);
    writeln(programName ~ ": ...");

    // Get files in specified inputPath variable with a specific extension
    auto rmdFiles = file.dirEntries(inputPath, file.SpanMode.shallow)
        .filter!(f => f.isFile)
        .filter!(f => f.name.endsWith(fileEndsWith));

    // LINE 72 -- WARNING -- If we count the range here, later it will become 0 in line 82
    writeln(programName ~ ": number of files found " ~ to!string(rmdFiles.walkLength));

    // Get start time
    auto stattime = Clock.currTime();

    // Process each Rmd file
    int fileWrittenCount = 0;

    // LINE 81 -- WARNING -- if line 73 is not commented out, the walkLength returns 0
    writeln(programName ~ ": number of files found " ~ to!string(rmdFiles.walkLength));

    foreach (file.DirEntry item; parallel(rmdFiles))
    {
        writeln(programName ~ ": processing " ~ item.name);
        try
        {
            // Read content as string
            string content = file.readText(item.name);
            // Replace ```{r} or ```{r option1=value} with ```R
            string modified = replaceAll(content, re, "```R");
            // Set the Markdown output file
            string outputFile = replaceAll(baseName(item.name), regex(r".Rmd"), ".md");
            // Build an output path, using output path and baseName(item.name)
            string outputFilenamePath = buildPath(outputPath, outputFile);
            // Save output Markdown file
            file.write(outputFilenamePath, modified);
            writeln(programName ~ ": written " ~ outputFilenamePath);
            // Increase counter to indicate number of files processed
            fileWrittenCount++;
        }
        catch (file.FileException e)
        {
            writeln(programName ~ ": " ~ e.msg);
        }
    }

    writeln(programName ~ ": ...");

    // Get end clock
    auto endttime = Clock.currTime();
    auto duration = endttime - stattime;
    writeln("Duration: ", duration);

    // Console output a summary
    writeln(programName ~ ": written " ~ to!string(fileWrittenCount) ~ " files");
}
```

For testing, you can create a text file and save it as `.Rmd` in the same folder as the D file. Run the script as:

```bash
rdmd rmd2md.d
```

It will find the `.Rmd` file in the current path and save the result in the current path.
Apr 08 2023
On 4/8/23 9:38 PM, ikelaiah wrote:
> // Get files in specified inputPath variable with a specific extension
> auto rmdFiles = file.dirEntries(inputPath, file.SpanMode.shallow)
>     .filter!(f => f.isFile)
>     .filter!(f => f.name.endsWith(fileEndsWith));
>
> // LINE 72 -- WARNING -- If we count the range here, later it will become 0 in line 82
> writeln(programName ~ ": number of files found " ~ to!string(rmdFiles.walkLength));

dirEntries returns an *input range*, not a *forward range*. This means that once it's iterated, it's done. If you want to iterate it twice, you'll have to construct it twice.

-Steve
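To make the input-range point concrete, here is a minimal sketch. `Countdown` is a hypothetical one-pass range invented for this illustration (it is not how `dirEntries` is implemented); it has reference semantics and no `save`, so it is an input range but not a forward range. Counting it with `walkLength` uses it up, which is the same effect seen in the original program.

```d
import std.range : isForwardRange, isInputRange, walkLength;

// Hypothetical one-pass range, used only for illustration.
// It is a class, so every copy refers to the same underlying state.
final class Countdown
{
    private int n;

    this(int n) { this.n = n; }

    @property bool empty() const { return n == 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
    // No `save` member: this is an input range, not a forward range.
}

static assert(isInputRange!Countdown);
static assert(!isForwardRange!Countdown);

void main()
{
    auto r = new Countdown(3);

    // The first walk iterates the range to the end in order to count it...
    assert(r.walkLength == 3);

    // ...so nothing is left for a second walk or a later foreach.
    assert(r.walkLength == 0);
}
```

A forward range would let you call `.save` before counting and iterate the saved copy afterwards; an input range offers no such checkpoint.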
Apr 08 2023
On Sunday, 9 April 2023 at 03:39:52 UTC, Steven Schveighoffer wrote:
> On 4/8/23 9:38 PM, ikelaiah wrote:
> > // Get files in specified inputPath variable with a specific extension
> > auto rmdFiles = file.dirEntries(inputPath, file.SpanMode.shallow)
> >     .filter!(f => f.isFile)
> >     .filter!(f => f.name.endsWith(fileEndsWith));
> >
> > // LINE 72 -- WARNING -- If we count the range here, later it will become 0 in line 82
> > writeln(programName ~ ": number of files found " ~ to!string(rmdFiles.walkLength));
>
> dirEntries returns an *input range*, not a *forward range*. This means that once it's iterated, it's done. If you want to iterate it twice, you'll have to construct it twice.
>
> -Steve

Steve,

You're absolutely right. I did not read the manual correctly. It is clearly written [here](https://dlang.org/library/std/file/dir_entries.html) that `dirEntries` returns an `input range`. I will modify the code to construct it twice.

Many thanks!

-ikelaiah
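One way to "construct it twice" is to wrap the pipeline in a function and call it once per pass. This is only a sketch: `findFiles` and the scratch directory name `rmd2md_demo` are hypothetical names made up for this example, and the files are created in a temporary folder so the snippet does not depend on what is in the current directory.

```d
import std.algorithm : endsWith, filter;
import std.file : dirEntries, mkdirRecurse, rmdirRecurse, SpanMode, tempDir, write;
import std.path : buildPath;
import std.range : walkLength;

// Hypothetical helper: every call builds a brand-new one-pass range.
auto findFiles(string dir, string ext)
{
    return dirEntries(dir, SpanMode.shallow)
        .filter!(f => f.isFile)
        .filter!(f => f.name.endsWith(ext));
}

void main()
{
    // Scratch directory with two matching files and one non-matching file.
    auto dir = buildPath(tempDir, "rmd2md_demo");
    mkdirRecurse(dir);
    scope (exit) rmdirRecurse(dir);
    write(buildPath(dir, "a.Rmd"), "x");
    write(buildPath(dir, "b.Rmd"), "y");
    write(buildPath(dir, "c.txt"), "z");

    // First construction: used only for counting.
    assert(findFiles(dir, ".Rmd").walkLength == 2);

    // Second construction: used for the actual processing loop.
    size_t seen;
    foreach (entry; findFiles(dir, ".Rmd"))
        ++seen;
    assert(seen == 2);
}
```

Each call reads the directory again, so the two passes agree only if the directory does not change in between.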
Apr 08 2023
On 4/8/23 21:38, ikelaiah wrote:
> I will modify the code to construct it twice.

Multiple iterations of dirEntries can produce different results, which may or may not be what your program will be happy with.

Sticking an .array at the end will iterate a single time and maintain the list forever because .array returns an array. :)

    auto entries = dirEntries(/* ... */).array;

Ali
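A small sketch of the `.array` approach follows. `snapshotFiles` is a hypothetical helper name, and the current working directory is used only so the example runs anywhere. Once the entries are in an array, counting them no longer consumes anything, because `walkLength` only advances its own copy of the slice.

```d
import std.algorithm : filter;
import std.array : array;
import std.file : DirEntry, dirEntries, getcwd, SpanMode;
import std.range : walkLength;

// Hypothetical helper: walks the directory once and keeps the entries.
DirEntry[] snapshotFiles(string dir)
{
    return dirEntries(dir, SpanMode.shallow)
        .filter!(f => f.isFile)
        .array;
}

void main()
{
    auto entries = snapshotFiles(getcwd());

    // Counting twice gives the same answer: the array is not used up.
    immutable first = entries.walkLength;
    immutable second = entries.walkLength;
    assert(first == second);
    assert(first == entries.length);

    // The same array can still be iterated afterwards.
    size_t seen;
    foreach (entry; entries)
        ++seen;
    assert(seen == entries.length);
}
```

For an array, `entries.length` gives the count directly, so `walkLength` appears here only to mirror the original code.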
Apr 09 2023
On 4/9/23 9:16 AM, Ali Çehreli wrote:
> On 4/8/23 21:38, ikelaiah wrote:
> > I will modify the code to construct it twice.
>
> Multiple iterations of dirEntries can produce different results, which may or may not be what your program will be happy with.
>
> Sticking an .array at the end will iterate a single time and maintain the list forever because .array returns an array. :)
>
>     auto entries = dirEntries(/* ... */).array;

I'd be cautious of that. I don't know what the underlying code uses, it may reuse buffers for e.g. filenames to avoid allocation.

If you are confident the directory contents won't change in that split-second, then I think iterating twice is fine.

-Steve
Apr 09 2023
On Monday, 10 April 2023 at 01:01:59 UTC, Steven Schveighoffer wrote:
> On 4/9/23 9:16 AM, Ali Çehreli wrote:
> > On 4/8/23 21:38, ikelaiah wrote:
> > > I will modify the code to construct it twice.
> >
> > Multiple iterations of dirEntries can produce different results, which may or may not be what your program will be happy with.
> >
> > Sticking an .array at the end will iterate a single time and maintain the list forever because .array returns an array. :)
> >
> >     auto entries = dirEntries(/* ... */).array;
>
> I'd be cautious of that. I don't know what the underlying code uses, it may reuse buffers for e.g. filenames to avoid allocation.
>
> If you are confident the directory contents won't change in that split-second, then I think iterating twice is fine.
>
> -Steve

Steve,

The Rmd files are not on a network drive, but saved locally. So, I'm confident, the files won't change in a split-second.

-ikelaiah.
Apr 10 2023
On 4/10/23 6:43 PM, ikelaiah wrote:
> On Monday, 10 April 2023 at 01:01:59 UTC, Steven Schveighoffer wrote:
> > On 4/9/23 9:16 AM, Ali Çehreli wrote:
> > >     auto entries = dirEntries(/* ... */).array;
> >
> > I'd be cautious of that. I don't know what the underlying code uses, it may reuse buffers for e.g. filenames to avoid allocation.
> >
> > If you are confident the directory contents won't change in that split-second, then I think iterating twice is fine.
>
> Steve,
>
> The Rmd files are not on a network drive, but saved locally. So, I'm confident, the files won't change in a split-second.

That is not what I meant.

What I mean is that `array` is going to copy whatever values the range gives it, which might be later *overwritten* depending on how `dirEntries` is implemented.

e.g. the following code is broken:

```d
auto lines = File("foo.txt").byLine.array;
```

But the following is correct:

```d
auto lines = File("foo.txt").byLineCopy.array;
```

Why? Because `byLine` reuses the line buffer eventually to save on allocations. The array of lines might contain garbage in the earlier elements as they got overwritten.

I'm not saying it's wrong for `dirEntries`, I haven't looked. But you may want to be cautious about just using `array` to get you out of trouble, especially for lazy input ranges.

-Steve
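As a runnable version of the `byLine` / `byLineCopy` point, the sketch below shows two ways to collect lines that remain valid after iteration. `readLines`, `readLinesDup`, and the temporary file name are hypothetical names chosen for this example. The second helper assumes that copying each buffer with `idup` before the range advances is equivalent to what `byLineCopy` does for you.

```d
import std.algorithm : map;
import std.array : array;
import std.file : remove, tempDir, write;
import std.path : buildPath;
import std.stdio : File;

// Hypothetical helper: byLineCopy yields a fresh string for every line,
// so the elements stay valid after the range has moved on.
string[] readLines(string path)
{
    return File(path).byLineCopy.array;
}

// Hypothetical alternative: copy each reused byLine buffer explicitly.
string[] readLinesDup(string path)
{
    return File(path).byLine.map!(line => line.idup).array;
}

void main()
{
    auto path = buildPath(tempDir, "byline_demo.txt");
    write(path, "alpha\nbeta\ngamma\n");
    scope (exit) remove(path);

    assert(readLines(path) == ["alpha", "beta", "gamma"]);
    assert(readLinesDup(path) == ["alpha", "beta", "gamma"]);
}
```

The same caution applies to any lazy range whose `front` hands out a view into a buffer it later reuses: copy the element before storing it.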
Apr 10 2023
On Sunday, 9 April 2023 at 13:16:51 UTC, Ali Çehreli wrote:
> On 4/8/23 21:38, ikelaiah wrote:
> > I will modify the code to construct it twice.
>
> Multiple iterations of dirEntries can produce different results, which may or may not be what your program will be happy with.
>
> Sticking an .array at the end will iterate a single time and maintain the list forever because .array returns an array. :)
>
>     auto entries = dirEntries(/* ... */).array;
>
> Ali

Ali,

I didn't think about returning `dirEntries` as `array`. Thanks for the Gems (and your online book too).

Regards,
ikelaiah
Apr 10 2023