
digitalmars.D.learn - regex with literal (ie automatically replace '(' with '\(', etc) )

reply Timothee Cour <thelastmammoth gmail.com> writes:
See below:

import std.stdio;
import std.regex;

void main(){
"h(i".replace!(a=>a.hit~a.hit)(regex(`h\(`,"g")).writeln; // this works, but I need to specify the escape manually
// "h(i".replace!(a=>a.hit~a.hit)(regex(`h(`,"gl")).writeln; // I'd like this to work with a flag, say 'l' (lowercase L) as in 'literal'.
}

Note: std.array.replace doesn't work here because I want to be able to use
std.regex's replace with the delegate functionality shown above.
This is especially useful when the regex's first argument is given as an
input argument (i.e. is unknown) and we want to escape it properly.

Alternatively (and perhaps more generally), could we have a function:
string toRegexLiteral(string){
//replace all regex special characters (like '(' ) with their escaped equivalent
}
May 29 2013
parent reply "timotheecour" <timothee.cour2 gmail.com> writes:
something like this, which we should have in std.regex:

string escapeRegex(string a){
	import std.string;
	enum transTable = ['[' : `\[`, '|' : `\|`, '*': `\*`, '+': `\+`, 
'?': `\?`, '(': `\(`, ')': `\)`];
	return translate(a, transTable);
}
string escapeRegexReplace(string a){
	import std.string;
//	enum transTable = ['$' : `$$`, '\\' : `\\`];
	enum transTable = ['$' : `$$`];
	return translate(a, transTable);
}

unittest{
	string a=`asdf(def[ghi]+*|)`;
	assert(match(a,regex(escapeRegex(a))).hit==a);
	string b=`$aa\/$ $$# $\0$1#$ %# %=+_`;
	auto s=replace(a,regex(escapeRegex(a)),escapeRegexReplace(b));
	assert(s==b);
}
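A minimal sketch of the same idea with a wider character set, for illustration only. `escapeRegexLiteral` and its metacharacter list are hypothetical (neither comes from the thread nor from Phobos), and the sketch assumes a Phobos release that provides `replaceAll` and `matchFirst`:

```d
import std.algorithm.searching : canFind;
import std.array : appender;
import std.regex : matchFirst, regex, replaceAll;

/// Hypothetical helper (not a Phobos function): put a backslash in front of
/// every character that this sketch treats as a regex metacharacter.
string escapeRegexLiteral(string src)
{
    // Assumed metacharacter set; adjust it to the regex dialect in use.
    enum specials = `\.^$|?*+()[]{}`;
    auto buf = appender!string();
    foreach (dchar c; src)
    {
        if (specials.canFind(c))
            buf.put('\\');
        buf.put(c);
    }
    return buf.data;
}

unittest
{
    // The pattern arrives as plain text and is escaped before compilation.
    auto re = regex(escapeRegexLiteral("h("));
    // Delegate-based replacement as in the first post: double each hit.
    assert("h(i".replaceAll!(m => m.hit ~ m.hit)(re) == "h(h(i");

    // An escaped string should match itself literally.
    auto a = `asdf(def[ghi]+*|)`;
    assert(matchFirst(a, regex(escapeRegexLiteral(a))).hit == a);
}
```

Escaping character by character keeps the helper independent of any translation table, at the cost of one `canFind` scan per input character.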



May 29 2013
parent reply "Diggory" <diggsey googlemail.com> writes:
That would be good (although you missed a few :P).

Technically, any working "escapeRegex" would also function as a valid "escapeRegexReplace", although it might be slightly faster to have a specialised one.
May 29 2013
parent reply Timothee Cour <thelastmammoth gmail.com> writes:
ok, here it is:

https://github.com/timotheecour/dtools/blob/master/dtools/util/util.d#L78
I simplified the implementation and added the missing escape symbols. Is any
symbol still missing?
I was basing this on http://dlang.org/phobos/std_regex.html, table
entry '\c where c is one of', but that was incomplete. I'm also noting that
the table entry 'any character except' is incomplete as well.

> Technically any working "escapeRegex" would also function as a valid
> "escapeRegexReplace", although it might be slightly faster to have a specialised one.

Not sure, because they escape differently (\$ vs $$).

Shall I do a pull request for std.regex?
May 29 2013
next sibling parent reply "Diggory" <diggsey googlemail.com> writes:
On Thursday, 30 May 2013 at 06:50:06 UTC, Timothee Cour wrote:
> > Technically any working "escapeRegex" would also function as a valid
> > "escapeRegexReplace", although it might be slightly faster to have a specialised one.
>
> not sure, because they escape differently (\$ vs $$).
According to this: http://dlang.org/phobos/std_regex.html#.replace you can use the same escape sequences for both (\c -> c in the replacement string).
May 30 2013
parent reply Timothee Cour <thelastmammoth gmail.com> writes:
> According to this: http://dlang.org/phobos/std_regex.html#.replace you
> can use the same escape sequences for both (\c -> c in the replacement string).

Your suggestion does not work; try it yourself by replacing the $$ with \$ in my code. Is that a bug in std.regex's docs? E.g.:

replace("",regex(``),`\$`);
=> invalid format string in regex replace

However, everything works fine with $$; see my code above.
May 30 2013
next sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
30-May-2013 14:24, Timothee Cour writes:
> > According to this: http://dlang.org/phobos/std_regex.html#.replace you can use the same
> > escape sequences for both (\c -> c in the replacement string).
>
> Your suggestion does not work; try for yourself by replacing the $$ by
> \$ in my code. Is that a bug in std.regex' doc?
> eg:
> replace("",regex(``),`\$`);
> => invalid format string in regex replace
Indeed, the replace format string is a different beast. I can't recall whether I took it from the original std.regex or devised this $$ myself. At any rate, replace(fmt, `\$`, "$$") would work, or the same with replace from std.string. So I feel it's a bit of a stretch to include a function for such a narrow case.
-- Dmitry Olshansky
May 30 2013
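A small sketch of the replacement-side escaping discussed above, assuming (as reported in this thread) that `$$` in a std.regex format string yields a literal `$` and that only `$` needs doubling. `escapeReplacement` is a hypothetical name, and the sketch assumes a Phobos release that provides `replaceFirst`:

```d
import std.array : replace;             // plain substring replacement
import std.regex : regex, replaceFirst;

/// Hypothetical helper: double every '$' so that std.regex inserts the text
/// verbatim instead of reading "$1", "$&" and so on as references.
string escapeReplacement(string s)
{
    return s.replace("$", "$$");
}

unittest
{
    auto re = regex(`PRICE`);
    // After escaping, the format string is "$$5 & up", i.e. a literal "$5 & up".
    auto result = replaceFirst("total: PRICE", re, escapeReplacement("$5 & up"));
    assert(result == "total: $5 & up");
}
```

The selective imports keep `std.array.replace` and the regex functions from clashing, which is the ambiguity the first post ran into when mixing both modules.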
prev sibling parent "Diggory" <diggsey googlemail.com> writes:
> Your suggestion does not work; try for yourself by replacing the $$ by \$
> in my code. Is that a bug in std.regex' doc?
> eg:
> replace("",regex(``),`\$`);
> => invalid format string in regex replace
>
> However everything works fine with $$, see my code above.
Either the doc or the code should probably be changed then so they are consistent.
May 30 2013
prev sibling parent Dmitry Olshansky <dmitry.olsh gmail.com> writes:
30-May-2013 10:49, Timothee Cour writes:
> ok, here it is:
>
> https://github.com/timotheecour/dtools/blob/master/dtools/util/util.d#L78
> simplified implementation and added missing escape symbols. Any symbol
> missing?
> I was basing myself based on http://dlang.org/phobos/std_regex.html,
> table entry '\c where c is one of', but that was incomplete. I'm also
> noting that table entry 'any character except' is also incomplete.

One thing missing is '.', which should become '\.'.

> > Technically any working "escapeRegex" would also function as a valid
> > "escapeRegexReplace", although it might be slightly faster to have a
> > specialised one.
>
> not sure, because they escape differently (\$ vs $$).
>
> shall i do a pull request for std.regex?

Yes, please. It was a blind spot for a long time. Strictly speaking, I think that a generic escaping routine would work:

auto escape(S1, S2, C)(S1 src, S2 escapables, C escape='\\')
     if(isSomeString!S1 && isSomeString!S2 && isSomeChar!C)
{
     ....
}

Do we have something like this in std.string? Then all we need is a convenience wrapper in std.regex.

BTW, unescape is as important.
-- Dmitry Olshansky
May 30 2013
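One possible shape for the generic routine suggested in the last post, together with the matching `unescape`. This is only a sketch under assumptions: the bodies are not from the thread, the parameter is renamed to `esc`, and the template parameter `C` is given a default of `char` so that calls omitting the escape character have a type for it.

```d
import std.algorithm.searching : canFind;
import std.array : appender;
import std.traits : isSomeChar, isSomeString;

/// Hypothetical generic escaper: prefixes `esc` to every character found in
/// `escapables`, and to `esc` itself so the result can be unescaped again.
auto escape(S1, S2, C = char)(S1 src, S2 escapables, C esc = '\\')
if (isSomeString!S1 && isSomeString!S2 && isSomeChar!C)
{
    auto buf = appender!S1();
    foreach (dchar c; src)
    {
        if (c == esc || escapables.canFind(c))
            buf.put(esc);
        buf.put(c);
    }
    return buf.data;
}

/// Hypothetical inverse: drops each escape character and keeps the character
/// that follows it. A lone trailing escape character is discarded.
auto unescape(S, C = char)(S src, C esc = '\\')
if (isSomeString!S && isSomeChar!C)
{
    auto buf = appender!S();
    bool pending = false;   // true right after an unconsumed escape character
    foreach (dchar c; src)
    {
        if (!pending && c == esc)
        {
            pending = true;
            continue;
        }
        buf.put(c);
        pending = false;
    }
    return buf.data;
}

unittest
{
    assert(escape(`a.b(c)`, `.()`) == `a\.b\(c\)`);
    auto s = `1+1=2 \ ok`;
    assert(unescape(escape(s, `+=`)) == s);   // round trip
}
```

A regex-specific wrapper would then only need to supply the list of metacharacters; the replacement format string would still need its own `$` doubling, since it does not use a prefix escape character.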