digitalmars.D.learn - Using regular expressions when reading a file

reply Alexander Zhirov <azhirov1991 gmail.com> writes:
I want to use a configuration file with external settings. I'm 
trying to use regular expressions to read the `Property = Value` 
settings. I would like to do it all more beautifully. Is there 
any way to get rid of the line break character? How much does 
everything look "right"?

**settings.conf:**

```sh
host = 127.0.0.1
port = 5432
dbname = database
user = postgres
```

**code:**

```d
auto file = File("settings.conf", "r");
string[string] properties;
auto p_property = regex(r"^\w+ *= *.+", "s");
while (!file.eof())
{
   string line = file.readln();
   auto m = matchAll(line, p_property);
   if (!m.empty())
   {
     string property = matchAll(line, regex(r"^\w+", "m")).hit;
     string value = replaceAll(line, regex(r"^\w+ *= *", "m"), "");
     properties[property] = value;
   }
}
file.close();
writeln(properties);
```

**output:**

```sh
["host":"127.0.0.1\n", "dbname":"mydb\n", "user":"postgres", 
"port":"5432\n"]
```
May 05 2022
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, May 05, 2022 at 05:53:57PM +0000, Alexander Zhirov via
Digitalmars-d-learn wrote:
 I want to use a configuration file with external settings. I'm trying
 to use regular expressions to read the `Property = Value` settings. I
 would like to do it all more beautifully. Is there any way to get rid
 of the line break character? How much does everything look "right"?
[...]
 ```d
 auto file = File("settings.conf", "r");
 string[string] properties;
 auto p_property = regex(r"^\w+ *= *.+", "s");
 while (!file.eof())
 {
   string line = file.readln();
   auto m = matchAll(line, p_property);
   if (!m.empty())
   {
     string property = matchAll(line, regex(r"^\w+", "m")).hit;
     string value = replaceAll(line, regex(r"^\w+ *= *", "m"), "");
     properties[property] = value;
   }
 }
 ```
Your regex already matches the `Property = Value` pattern; why not just
use captures to extract the relevant parts of the match, instead of
doing it all over again inside the if-statement?

    // I added captures (parentheses) to extract the property name
    // and value directly from the pattern.
    auto p_property = regex(r"^(\w+) *= *(.+)", "s");

    // I assume you only want one `Property = Value` pair per input
    // line, so you really don't need matchAll; matchFirst will do
    // the job.
    auto m = matchFirst(line, p_property);
    if (m)
    {
        // No need to run a match again, just extract the
        // captures
        string property = m[1];
        string value = m[2];
        properties[property] = value;
    }


T

-- 
"You are a very disagreeable person." "NO."
May 05 2022
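
For readers unsure what `matchFirst` hands back, the following is a minimal sketch of the `Captures` object it returns. The helper name `keyValue` is hypothetical and not from the thread; the pattern is the one suggested above, used here without any flags.

```d
import std.regex : matchFirst, regex;

/// Hypothetical helper (not from the thread): returns [key, value],
/// or null when the line does not look like `key = value`.
string[] keyValue(string line)
{
    // Building the regex on every call is wasteful; it is done here
    // only to keep the sketch self-contained.
    auto m = matchFirst(line, regex(r"^(\w+) *= *(.+)"));
    if (m.empty)          // no match: the Captures object is empty
        return null;
    // m[0] (also m.hit) is the whole match; m[1] and m[2] are the
    // parenthesized groups, in order.
    return [m[1], m[2]];
}

void main()
{
    assert(keyValue("port = 5432") == ["port", "5432"]);
    assert(keyValue("# just a comment") is null);
}
```

In other words, `matchFirst` stops at the first match and exposes its groups by index, which is why a second `matchAll` or `replaceAll` pass is not needed.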
parent reply Alexander Zhirov <azhirov1991 gmail.com> writes:
On Thursday, 5 May 2022 at 18:15:28 UTC, H. S. Teoh wrote:
 	auto m = matchFirst(line, p_property);
Yes, it looks more attractive. Thanks! I just don't quite 
understand how `matchFirst` works. I seem to have read the 
[description](https://dlang.org/phobos/std_regex.html#Captures), 
but I can't understand something.

And yet I have to manually remove the line break:

```sh
["host":"192.168.100.236\n", "dbname":"belpig\n", 
"user":"postgres", "port":"5432\n"]
```
May 05 2022
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, May 05, 2022 at 06:50:17PM +0000, Alexander Zhirov via
Digitalmars-d-learn wrote:
 On Thursday, 5 May 2022 at 18:15:28 UTC, H. S. Teoh wrote:
 	auto m = matchFirst(line, p_property);
 Yes, it looks more attractive. Thanks! I just don't quite
 understand how `matchFirst` works. I seem to have read the
 [description](https://dlang.org/phobos/std_regex.html#Captures),
 but I can't understand something.
 And yet I have to manually remove the line break:
 ```sh
 ["host":"192.168.100.236\n", "dbname":"belpig\n",
 "user":"postgres", "port":"5432\n"]
 ```
You don't have to. Just add a `$` to the end of your regex, and it
should match the newline. If you put it outside the capture parentheses,
it will not be included in the value.


T

-- 
In a world without fences, who needs Windows and Gates? -- Christian Surchi
May 05 2022
parent reply Alexander Zhirov <azhirov1991 gmail.com> writes:
On Thursday, 5 May 2022 at 18:58:41 UTC, H. S. Teoh wrote:
 You don't have to. Just add a `$` to the end of your regex, and 
 it should match the newline. If you put it outside the capture 
 parentheses, it will not be included in the value.
In fact, it turned out to be much easier. It was just necessary 
to use the `m` flag instead of the `s` flag:

```d
auto p_property = regex(r"^(\w+) *= *(.+)", "m");
```
May 05 2022
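
A short sketch of why dropping the `s` flag helps: in `std.regex`, `s` makes `.` match `'\n'` and `'\r'` as well, so `(.+)` swallows the line terminator that `readln` leaves on the line. The helper name `captureValue` is hypothetical and exists only for this illustration.

```d
import std.regex : matchFirst, regex;

/// Hypothetical helper (not from the thread): returns the captured value
/// of a `key = value` line when the pattern is compiled with `flags`.
string captureValue(string line, string flags)
{
    auto m = matchFirst(line, regex(r"^(\w+) *= *(.+)", flags));
    return m.empty ? null : m[2];
}

void main()
{
    // With "s", `.` also matches '\n', so the newline lands in the capture.
    assert(captureValue("port = 5432\n", "s") == "5432\n");
    // Without "s" (here "m"), `.` stops before the newline.
    assert(captureValue("port = 5432\n", "m") == "5432");
}
```

So the decisive change is the absence of `s` rather than the presence of `m`; reading with `byLine` or `byLineCopy`, which drop the terminator by default, sidesteps the question entirely.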
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 5/5/22 12:05, Alexander Zhirov wrote:
 On Thursday, 5 May 2022 at 18:58:41 UTC, H. S. Teoh wrote:
 You don't have to. Just add a `$` to the end of your regex, and it 
 should match the newline. If you put it outside the capture 
 parentheses, it will not be included in the value.
 In fact, it turned out to be much easier. It was just necessary
 to use the `m` flag instead of the `s` flag:
 ```d
 auto p_property = regex(r"^(\w+) *= *(.+)", "m");
 ```
Couldn't help myself from improving. :) The following regex works in my
Linux console. No issues with '\n'. (?) It also allows for leading and
trailing spaces:

    import std.regex;
    import std.stdio;
    import std.algorithm;
    import std.array;
    import std.typecons;
    import std.functional;

    void main() {
      auto p_property = regex(r"^ *(\w+) *= *(\w+) *$");
      const properties = File("settings.conf")
                         .byLineCopy
                         .map!(line => matchFirst(line, p_property))
                         .filter!(not!empty) // OR: .filter!(m => !m.empty)
                         .map!(m => tuple(m[1], m[2]))
                         .assocArray;

      writeln(properties);
    }

Ali
May 05 2022
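
One caveat that may be worth checking against your own data: `\w` covers letters, digits and `_`, but not `.`, so a value such as `127.0.0.1` would not satisfy `(\w+) *$` and that line would be filtered out. The sketch below demonstrates this; the helper name `matches` and the `\S+` relaxation are assumptions for illustration, not something proposed in the thread.

```d
import std.regex : matchFirst, regex;

/// Hypothetical helper (not from the thread): true if `line` matches `pattern`.
bool matches(string line, string pattern)
{
    return !matchFirst(line, regex(pattern)).empty;
}

void main()
{
    enum strict  = r"^ *(\w+) *= *(\w+) *$"; // pattern from the post above
    enum relaxed = r"^ *(\w+) *= *(\S+) *$"; // assumed variant: any non-space run

    assert( matches("port = 5432",      strict));
    assert(!matches("host = 127.0.0.1", strict));  // '.' is not a word character
    assert( matches("host = 127.0.0.1", relaxed));
}
```

Whether `\S+` is the right relaxation depends on which values the settings file is allowed to contain; values with embedded spaces would need a different pattern.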
parent Alexander Zhirov <azhirov1991 gmail.com> writes:
On Thursday, 5 May 2022 at 19:19:26 UTC, Ali Çehreli wrote:
 Couldn't help myself from improving. :) The following regex 
 works in my Linux console. No issues with '\n'. (?) It also 
 allows for leading and trailing spaces:

 import std.regex;
 import std.stdio;
 import std.algorithm;
 import std.array;
 import std.typecons;
 import std.functional;

 void main() {
   auto p_property = regex(r"^ *(\w+) *= *(\w+) *$");
   const properties = File("settings.conf")
                      .byLineCopy
                      .map!(line => matchFirst(line, p_property))
                      .filter!(not!empty) // OR: .filter!(m => !m.empty)
                      .map!(m => tuple(m[1], m[2]))
                      .assocArray;

   writeln(properties);
 }
It will need to be sorted out with a fresh head. 😀 Thanks!
May 05 2022
prev sibling parent reply forkit <forkit gmail.com> writes:
On Thursday, 5 May 2022 at 17:53:57 UTC, Alexander Zhirov wrote:
 I want to use a configuration file with external settings. I'm 
 trying to use regular expressions to read the `Property = 
 Value` settings. I would like to do it all more beautifully. Is 
 there any way to get rid of the line break character? How much 
 does everything look "right"?
regex never looks right ;-)

try something else perhaps??

    // ------------

    module test;

    import std;

    void main()
    {
        auto file = File("d:\\settings.conf", "r");
        string[string] aa;

        // create an associative array of settings -> [key:value]
        foreach (line; file.byLine().filter!(a => !a.empty))
        {
            auto myTuple = line.split(" = ");
            aa[myTuple[0].to!string] = myTuple[1].to!string;
        }

        // write out all the settings.
        foreach (key, value; aa.byPair)
            writefln("%s:%s", key, value);

        writeln;

        // write just the host value
        writeln(aa["host"]);
    }

    // ------------
May 05 2022
parent reply Alexander Zhirov <azhirov1991 gmail.com> writes:
On Friday, 6 May 2022 at 05:40:52 UTC, forkit wrote:
 auto myTuple = line.split(" = ");
Well, only if the format is strict :)
May 06 2022
next sibling parent forkit <forkit gmail.com> writes:
On Friday, 6 May 2022 at 07:51:01 UTC, Alexander Zhirov wrote:
 On Friday, 6 May 2022 at 05:40:52 UTC, forkit wrote:
 auto myTuple = line.split(" = ");
 Well, only if the format is strict :)
well.. a settings file should be following a strict format.

..otherwise...anything goes... and good luck with that...

regex won't help you either in that case...

e.g:

user =som=eu=ser

(how you going to deal with this ?)
May 06 2022
prev sibling parent novice2 <sorry noem.ail> writes:
imho, regexp is overkill here.
as for me, i usually just split the line at the first '=', then 
trim spaces from the left and right parts.
May 06 2022
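
The split-and-trim approach described above could be sketched roughly as follows. The helper name `parseSetting` is hypothetical; `findSplit` and `strip` are the Phobos functions assumed here to do the splitting and trimming.

```d
import std.algorithm.searching : findSplit;
import std.string : strip;

/// Hypothetical helper (not from the thread): splits `line` at the first '='
/// and trims whitespace from both parts. Returns false if there is no '='.
bool parseSetting(string line, out string key, out string value)
{
    auto parts = line.findSplit("=");
    if (parts[1].length == 0)   // separator not found
        return false;
    key = parts[0].strip;       // text before the first '='
    value = parts[2].strip;     // everything after it, later '=' included
    return true;
}

void main()
{
    string k, v;
    assert(parseSetting("user =som=eu=ser\n", k, v));
    assert(k == "user" && v == "som=eu=ser");
    assert(!parseSetting("no separator here", k, v));
}
```

Because only the first `=` acts as the separator, a line like `user =som=eu=ser` yields the key `user` and the value `som=eu=ser`; whether that is the desired reading is a decision for the settings format.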