
digitalmars.D.learn - How do I iteratively replace lines in a file?

Andrej Mitrovic <none none.none> writes:
I'm trying to do something like the following:

File inputfile;
foreach (string name; dirEntries(r".\subdir\", SpanMode.shallow))
{
    if (!(isFile(name) && getExt(name) == "d"))
    {
        continue;
    }
    
    inputfile = File(name, "a+");
    
    foreach (line; inputfile.byLine)
    {
        if (line == "import foo.d")
        {
            inputfile.write("import bar.d");  // or ideally `line = "import bar.d"`
        }
    }
}

That obviously won't work. I think I might need to use the `fseek` function to
keep track of where I am in the file, or something like that. File I/O in D is
no fun.
Mar 19 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:
On Saturday 19 March 2011 16:51:19 Andrej Mitrovic wrote:
 That obviously won't work. I think I might need to use the `fseek` function
 to keep track of where I am in the file, or something like that. File I/O
 in D is no fun..

I think that most of the D file I/O facilities are built around the idea of reading in a whole file and writing out a whole file rather than editing a file in place; certainly the range-based stuff works that way at the moment. You can probably use std.stdio.File.seek to seek to the appropriate position and then write there, but I believe that all of the range-based stuff is currently only for reading a file.

Personally, I only ever read in whole files and write out whole files without any kind of interleaving. That generally works great, but it doesn't scale once you start dealing with large files. It's likely an area where D's file I/O could use some improvement, though that may or may not need to be part of the stream stuff.

- Jonathan M Davis
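A minimal sketch of the whole-file approach described above, written against present-day Phobos. The helper name `replaceLinesWholeFile` is hypothetical. It assumes the file fits comfortably in memory and that writing `\n` line endings back out is acceptable.

```d
import std.array : join;
import std.file : readText, write;
import std.string : splitLines;

/// Hypothetical helper: replaces every line equal to `from` with `to`
/// by reading the whole file into memory and writing the whole file back.
void replaceLinesWholeFile(string path, string from, string to)
{
    // splitLines drops the line terminators.
    string[] lines = readText(path).splitLines();
    foreach (ref line; lines)
    {
        if (line == from)
            line = to;
    }
    // Rejoins with "\n" and always ends the file with a newline.
    write(path, lines.join("\n") ~ "\n");
}
```

Because nothing is edited in place, the replacement line may be longer or shorter than the original; the cost is holding the entire file in memory, which is the scaling limit mentioned above.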
Mar 19 2011
Ali Çehreli <acehreli yahoo.com> writes:
On 03/19/2011 04:51 PM, Andrej Mitrovic wrote:
 That obviously won't work. I think I might need to use the `fseek`
 function to keep track of where I am in the file, or something like that.

That's not a good idea with text files. Even for binary files, it only works if the file has a well-defined format.

It is not possible to insert bytes into, or remove bytes from, the middle of a file, for low-level reasons: none of the file systems that I am aware of provide such an interface. And writing after fseek would overwrite the existing data.

Like Jonathan M Davis said, the best approach is to read from the source and write to a separate destination.

Ali
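To make the overwrite point concrete, here is a small sketch; the helper `overwriteAt` is hypothetical. It opens the file in `"r+"` mode. With `"a+"`, as in the original question, C's append mode forces every write to the current end of the file regardless of any seek, which is one reason that attempt could not work.

```d
import std.stdio : File;

/// Hypothetical helper: writes `bytes` at `offset`, replacing what is there.
/// Nothing is inserted; the file only grows if the write runs past its end.
void overwriteAt(string path, long offset, string bytes)
{
    auto f = File(path, "r+");  // read/write, no truncation, no forced append
    f.seek(offset);             // origin defaults to SEEK_SET
    f.rawWrite(bytes);
    f.close();
}
```

Writing `"XY"` at offset 2 of a file containing `"abcdef"` leaves `"abXYef"`: the bytes are replaced, not inserted, so this only helps when the new text has exactly the same length as the text it replaces.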
Mar 20 2011
Kai Meyer <kai unixlords.com> writes:
On 03/19/2011 05:51 PM, Andrej Mitrovic wrote:
 That obviously won't work. I think I might need to use the `fseek` function to
keep track of where I am in the file, or something like that. File I/O in D is
no fun..

The only problem with your approach is that a "line" is an abstract concept. In a filesystem, there are only blocks of bytes. When you write (flush) even a single byte to a file, the write actually happens an entire block at a time (ext3 defaults to a 4k block, for example). A line is just an array of bytes. When dealing with (relatively) fast memory, modifying a line is pretty transparent, but if you open a 1GB file and add bytes at the very beginning, the entire file quite likely has to be written out again.

I would suggest you write out to a temporary file, and then move that file on top of the original:

foreach (name ...)
{
    inputfile = File(name, "r");
    outputfile = File("/tmp/" ~ name, "w");
    foreach (line ...)
    {
        // do something to line
        outputfile.writeln(line);
    }
    inputfile.close();
    outputfile.close();
    rename("/tmp/" ~ name, name);
}

This will allow you to manipulate the file line by line, but it won't be in-place. This is the approach a lot of text editors take, and a very common workaround. If you were to encounter a language that lets you read and write lines iteratively and in-place like this, I'll bet it is writing your changes to a temp file and moving that file over the top of the original at the end (perhaps when you call close()).
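A runnable version of the temp-file-and-rename suggestion might look like the following sketch. The helper name `rewriteLines` and the `.tmp` suffix are assumptions. The temporary file is placed next to the original rather than in `/tmp` because `rename` cannot move a file across filesystems on POSIX systems, and `/tmp` may be a separate filesystem (for example a tmpfs).

```d
import std.file : rename;
import std.stdio : File;

/// Hypothetical helper: streams `path` through `transform` one line at a
/// time into a temporary file, then renames that file over the original.
/// `transform` receives each line without its terminator.
void rewriteLines(alias transform)(string path)
{
    // Assumption: the temp file sits beside the original so that the final
    // rename() stays on one filesystem.
    string tmpPath = path ~ ".tmp";
    {
        auto input = File(path, "r");
        auto output = File(tmpPath, "w");  // "w" truncates any stale temp file
        foreach (line; input.byLine)       // byLine strips the line terminator
            output.writeln(transform(line));
    }   // both handles are closed here, before the rename
    rename(tmpPath, path);
}
```

Called as `rewriteLines!(l => l == "import foo.d" ? "import bar.d" : l)(name)`, it performs the replacement the original question asked for, with only one line held in memory at a time.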
Mar 20 2011
Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Yeah, I've already done exactly as you guys proposed. Note, however,
that `inputfile` and `outputfile` should be declared inside the
foreach loop. Either that, or you have to call `close()` explicitly. If
you don't do that, file handles don't get released, and you'll
eventually get back a stdio error such as "too many file handles
opened". You could lose files this way. I know this because it just
happened yesterday while testing. :p
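A minimal sketch of the first option, declaring the `File` inside the loop body; the helper `countLines` is hypothetical. `std.stdio.File` is reference-counted and closes its handle when the last copy is destroyed, so each handle is released at the end of its iteration without an explicit `close()`.

```d
import std.stdio : File;

/// Hypothetical helper: returns the number of lines in each file.
size_t[] countLines(string[] paths)
{
    size_t[] counts;
    foreach (path; paths)
    {
        auto f = File(path, "r");  // lives only for this iteration
        size_t n = 0;
        foreach (line; f.byLine)
            ++n;
        counts ~= n;
    }   // f is destroyed here, which closes the handle
    return counts;
}
```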

Anywho, I needed a quick script to append a semicolon to import lines
because I managed to screw up some files when using sed to replace
some lines. It's a quick hack but worked for me:

import std.algorithm : endsWith, startsWith;
import std.file : dirEntries, exists, isFile, remove, rename, SpanMode;
import std.path : extension;
import std.stdio : File;

void main()
{
    File inputfile;
    File outputfile;
    string newname;

    foreach (string name; dirEntries(r".", SpanMode.breadth))
    {
        // extension() includes the leading dot: "foo.d" -> ".d"
        if (!(isFile(name) && extension(name) == ".d"))
        {
            continue;
        }

        // Move the original aside, then rewrite it under its own name.
        newname = name ~ "backup";
        if (exists(newname))
        {
            remove(newname);
        }

        rename(name, newname);

        inputfile = File(newname, "r");
        outputfile = File(name, "w");

        foreach (line; inputfile.byLine)
        {
            // Append ";" to import lines that have no terminator yet.
            if ((line.startsWith("private import") || line.startsWith("import")) &&
                !line.endsWith(",") &&
                !line.endsWith(";"))
            {
                outputfile.writeln(line ~ ";");
            }
            else
            {
                outputfile.writeln(line);
            }
        }

        inputfile.close();
        outputfile.close();
    }

    // Delete the backups once every file has been rewritten.
    foreach (string name; dirEntries(r".", SpanMode.breadth))
    {
        if (extension(name) == ".dbackup")
        {
            remove(name);
        }
    }
}
Mar 20 2011
Kai Meyer <kai unixlords.com> writes:
On 03/20/2011 09:46 AM, Andrej Mitrovic wrote:
 Anywho, I needed a quick script to append a semicolon to import lines
 because I managed to screw up some files when using sed to replace
 some lines. It's a quick hack but worked for me:

Funny, I would have just fixed it with sed:

sed -r -i 's/^(import.*)/\1;/' *.d

In fact, I think sed is a great example of an application that applies a search and replace on a per-line basis. I'd be curious whether somebody knows how its '-i' (in-place) flag works. Based on the man page, I'll bet it opens the source read-only and opens the destination write-only, like Andrej's example:

-i[SUFFIX], --in-place[=SUFFIX]
    edit files in place (makes backup if extension supplied)

The SUFFIX option just renames the original instead of deleting it at the end.
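For comparison with Andrej's script, the substitution in that sed expression can be applied to a single line with `std.regex`; `sedStyleFix` is a hypothetical name. Unlike Andrej's script, the pattern also matches import lines that already end in `;`, and it does not match `private import` lines.

```d
import std.regex : regex, replaceFirst;

/// Hypothetical helper mirroring s/^(import.*)/\1;/ on one line:
/// group 1 is written back with ";" appended.
string sedStyleFix(string line)
{
    auto re = regex(`^(import.*)`);  // compiled on every call; fine for a sketch
    return replaceFirst(line, re, "$1;");
}
```

So `sedStyleFix("import std.stdio")` yields `"import std.stdio;"`, a non-import line comes back unchanged, and `"import foo;"` becomes `"import foo;;"`, which is why Andrej's `endsWith(";")` check matters.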
Mar 20 2011