
digitalmars.D.learn - Help with Regular Expressions (std.regex)

Samir <samir aol.com> writes:
I am belatedly working my way through the 2018 edition of the 
Advent of Code[1] programming challenges using D and am stumped 
on Problem 3[2].  The challenge requires you to parse a set of 
lines in the format:
#99   652,39: 24x23
#100   61,13: 15x24
#101   31,646: 16x28

I would like to store each number (match) as an element in an 
array so that I can refer to them by index.  For example, for the 
first line:

m = [99, 652, 39, 24, 23]
assert(m[0] == 99);
assert(m[1] == 652);
// ...
assert(m[4] == 23);

What is the best way to do this?  (I will worry about converting 
characters to integers later.)

I have the following solution so far based on reading Dmitry 
Olshansky's article on std.regex[3] and the std.regex 
documentation[4]:

import std.stdio;
import std.regex;

void main() {
     auto line    = "#99   652,39: 24x23";
     auto pattern = regex(r"\d+");
     auto m       = matchAll(line, pattern);
     writeln(m);
}

which results in:
[["99"], ["652"], ["39"], ["24"], ["23"]]

But this doesn't seem to be an iterable array as changing 
writeln(m) to writeln(m[0]) yields
Error: no [] operator overload for type RegexMatch!string

Changing the line to writeln(m.front[0]) yields
99

but m.front doesn't allow me to access other elements (i.e. 
m.front[1]):
requested submatch number 1 is out of range
----------------
??:? _d_assert_msg [0x4dc27a]
??:? inout pure nothrow @trusted inout(immutable(char)[]) std.regex.Captures!(immutable(char)[]).Captures.opIndex!().opIndex(ulong) [0x4d8d57]
??:? _Dmain [0x49ffc8]

I've tried something like
foreach (m; matchAll(line, pattern))
         writeln(m.hit);

which is close but doesn't result in an array.  Do I need to use 
matchFirst?

Thanks in advance.
Samir

[1] https://adventofcode.com/2018
[2] https://adventofcode.com/2018/day/3
[3] https://dlang.org/articles/regular-expression.html
[4] https://dlang.org/phobos/std_regex.html
Mar 03
user1234 <user1234 12.de> writes:
On Sunday, 3 March 2019 at 18:07:57 UTC, Samir wrote:
 I would like to store each number (match) as an element in an 
 array so that I can refer to them by index. [...]

Hello,

Something like this should work:

   import std.array : array;
   auto allMatches = matchAll(line, pattern).array;

or  // sorry i don't have the regex API in mind

   import std.array : array;
   import std.algorithm.iteration : map;
   auto allMatches = matchAll(line, pattern).map(a => a.hit).array;

What happened with `writeln` is that it iterates over the result of 
`matchAll`, which is a lazy input range rather than an array.  
`.array` evaluates the range and stores the results in an array.
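
For reference, here is a minimal, self-contained sketch of the approach described above, using the corrected `map!` template syntax. It also illustrates one likely reason `m.front[1]` is out of range: in a `Captures` object index 0 is the whole match, and higher indices exist only if the pattern contains capture groups. The second pattern and the variable names `hits` and `caps` are assumptions made for illustration and do not come from the thread.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.regex : matchAll, matchFirst, regex;
import std.stdio : writeln;

void main()
{
    auto line = "#99   652,39: 24x23";

    // matchAll returns a lazy range with one Captures element per match.
    // map!(m => m.hit) keeps only the matched text and .array stores the
    // results in an array that can be indexed.
    string[] hits = matchAll(line, regex(r"\d+")).map!(m => m.hit).array;
    assert(hits == ["99", "652", "39", "24", "23"]);
    assert(hits[1] == "652");

    // Alternative (assumed pattern): with explicit capture groups, a single
    // match exposes the numbers as submatches 1 to 5; index 0 is the whole match.
    auto caps = matchFirst(line, regex(r"#(\d+)\s+(\d+),(\d+):\s+(\d+)x(\d+)"));
    assert(!caps.empty);
    assert(caps[1] == "99" && caps[5] == "23");

    writeln(hits);
}
```

Either form gives indexed access; `matchAll` with `\d+` does not depend on the exact line layout, while the capture-group pattern also checks that the line has the expected shape.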
Mar 03
user1234 <user1234 12.de> writes:
On Sunday, 3 March 2019 at 18:32:14 UTC, user1234 wrote:
 On Sunday, 3 March 2019 at 18:07:57 UTC, Samir wrote:
 or  // sorry i don't have the regex API in mind

   import std.array: array;
   import std.algorithm.iteration : map;
   auto allMatches = matchAll(line, pattern).map(a => a.hit).array;

oops forgot the bang

   auto allMatches = matchAll(line, pattern).map!(a => a.hit).array;
Mar 03
Samir <samir aol.com> writes:
On Sunday, 3 March 2019 at 19:27:17 UTC, user1234 wrote:
 oops forgot the bang

   auto allMatches = matchAll(line, pattern).map!(a => a.hit).array;

Thanks, user1234!  Looks like `map` is another topic I need to read 
up on.  I slightly modified your suggestion and went with:

   auto allMatches = matchAll(line, pattern).map!(a => to!int(a.hit)).array;

which also takes care of converting the string to int.

Samir
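
A complete version of this variant might look like the sketch below. Note that `to!int` needs `std.conv` to be imported, which the snippet in the post does not show. The helper name `parseClaim` is hypothetical and only wraps the one-liner so that it can be checked against the asserts from the original question.

```d
import std.algorithm.iteration : map;
import std.array : array;
import std.conv : to;
import std.regex : matchAll, regex;

// parseClaim is a hypothetical helper name, not part of the thread.
// It extracts every run of digits from the line and converts each to int.
int[] parseClaim(string line)
{
    return matchAll(line, regex(r"\d+")).map!(a => a.hit.to!int).array;
}

void main()
{
    auto m = parseClaim("#99   652,39: 24x23");
    assert(m == [99, 652, 39, 24, 23]);
    assert(m[0] == 99);
    assert(m[1] == 652);
    assert(m[4] == 23);
}
```

When many lines are parsed, building the regex once outside the helper and reusing it would avoid recompiling the pattern on every call.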
Mar 04
dwdv <dwdv posteo.de> writes:
On 3/3/19 7:07 PM, Samir via Digitalmars-d-learn wrote:
 I am belatedly working my way through the 2018 edition of the Advent of 
 Code[1] programming challenges using D and am stumped on Problem 3[2].  
 The challenge requires you to parse a set of lines in the format:
 #99   652,39: 24x23
 #100   61,13: 15x24
 #101   31,646: 16x28
 
 I would like to store each number (match) as an element in an array so 
 that I can refer to them by index.
There is also std.file.slurp which makes this quite easy:

   slurp!(int, int, int, int, int)("03.input", "#%d   %d,%d: %dx%d");

You can then later expand the matches in a loop and process the claims:

   foreach(id, offX, offY, width, height; ...
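
As a rough illustration of the `slurp` approach, the sketch below writes two of the sample lines to a scratch file and reads them back as tuples. The file name `claims.tmp` is an assumption, and the format string mirrors the lines as they appear in this thread; if the real puzzle input contains other separator characters, the format string would need to include them. Tuple fields are accessed by index here rather than guessing at the elided loop above.

```d
import std.file : remove, slurp, write;
import std.stdio : writeln;

void main()
{
    // "claims.tmp" is a hypothetical scratch file standing in for the
    // puzzle input ("03.input" in the post above).
    enum path = "claims.tmp";
    write(path, "#99   652,39: 24x23\n#100   61,13: 15x24\n");
    scope (exit) remove(path);

    // slurp reads the file line by line and returns an array of tuples,
    // one tuple per line, with one field per %d in the format string.
    auto claims = slurp!(int, int, int, int, int)(path, "#%d   %d,%d: %dx%d");
    assert(claims.length == 2);
    assert(claims[0][0] == 99 && claims[0][1] == 652);
    assert(claims[1][3] == 15 && claims[1][4] == 24);

    foreach (claim; claims)
        writeln("claim ", claim[0], " is ", claim[3], "x", claim[4]);
}
```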
Mar 04
Samir <samir aol.com> writes:
On Monday, 4 March 2019 at 18:57:34 UTC, dwdv wrote:
 There is also std.file.slurp which makes this quite easy:
 slurp!(int, int, int, int, int)("03.input", "#%d   %d,%d: 
 %dx%d");
That's brilliant! This language just keeps putting a smile on my face every time I learn something new like this!
Mar 05