digitalmars.D - std.string.inPattern()

"Janice Caron" <caron800 googlemail.com> writes:
I noticed in the docs that the pattern parameter to inPattern is
specified as an array of chars, not an array of dchars.

I realise that one can easily be converted to the other, but it leaves
me wondering ... does inPattern() work with non-ASCII characters? Can
I, for example, specify "\u0100-\u0200" as a range and expect it to
work?

Also, it's not clear how to match a minus sign.
Oct 22 2007
Lutger <lutger.blijdestijn gmail.com> writes:
Janice Caron wrote:
 I noticed in the docs that the pattern parameter to inPattern is
 specified as an array of chars, not an array of dchars.
 
 I realise that one can easily be converted to the other, but it leaves
 me wondering ... does inPattern() work with non-ASCII characters? Can
 I, for example, specify "\u0100-\u0200" as a range and expect it to
 work?

Should work: inPattern converts the pattern's chars to dchars internally.

 Also, it's not clear how to match a minus sign.

Put it as the first or last character of the pattern argument and it'll work.
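
To make the two answers above concrete, here is a minimal sketch of that behaviour. It is not the Phobos source: `matchesPattern` is a hypothetical helper written for illustration, it assumes current D2 syntax, and it leaves out anything the thread does not describe (such as negated patterns). It only shows a range of non-ASCII characters and a '-' placed first or last.

```d
// Hypothetical helper, not the Phobos implementation: a simplified
// re-creation of the behaviour described in this thread.
bool matchesPattern(dchar c, const(char)[] pattern)
{
    bool inRange = false; // true right after a range-forming '-'
    dchar last;           // most recent pattern character, the start of a possible range

    // A dchar loop variable makes foreach decode the UTF-8 code units
    // into code points; i is the index of the first code unit of p.
    foreach (size_t i, dchar p; pattern)
    {
        if (inRange)
        {
            // p closes the range last-p (inclusive on both ends).
            inRange = false;
            if (last <= c && c <= p)
                return true;
        }
        else if (p == '-' && i > 0 && i + 1 < pattern.length)
        {
            // A '-' that is neither first nor last forms a range.
            inRange = true;
            continue;
        }
        else if (c == p)
        {
            return true;
        }
        last = p;
    }
    return false;
}

unittest
{
    assert(matchesPattern('x', "a-z"));
    assert(!matchesPattern('-', "a-z"));  // '-' in the middle is a range marker
    assert(matchesPattern('-', "-a-z"));  // first character: literal minus
    assert(matchesPattern('-', "a-z-"));  // last character: literal minus
    assert(matchesPattern('\u0150', "\u0100-\u0200")); // non-ASCII range
}
```

The unittest block runs only when the module is compiled with unit tests enabled (for example with dmd's -unittest switch).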
Oct 22 2007
Lutger <lutger.blijdestijn gmail.com> writes:
Lutger wrote:
 Janice Caron wrote:

 Put it as the first or last character of the pattern argument and it'll 
 work.

By the way, this is an example of why including unittests as well as contracts in the ddoc system would be useful, IMO: this behavior of inPattern is 'documented' in its unittests.
Oct 22 2007
Alexander Panek <alexander.panek brainsware.org> writes:
On Mon, 22 Oct 2007 10:22:48 +0100
"Janice Caron" <caron800 googlemail.com> wrote:

 I noticed in the docs that the pattern parameter to inPattern is
 specified as an array of chars, not an array of dchars.
 
 I realise that one can easily be converted to the other, but it leaves
 me wondering ... does inPattern() work with non-ASCII characters?

char[] is not ASCII, but UTF-8, so any UTF-8 sequence (multibyte or not) is valid, I suppose. (I don't know how inPattern works, though, so I can't answer the original question; I just wanted to clarify this.)

-- Alexander Panek <alexander.panek brainsware.org>
Oct 23 2007
davidl <davidl 126.com> writes:
On Tue, 23 Oct 2007 20:07:53 +0800, Alexander Panek
<alexander.panek brainsware.org> wrote:

 On Mon, 22 Oct 2007 10:22:48 +0100
 "Janice Caron" <caron800 googlemail.com> wrote:

 I noticed in the docs that the pattern parameter to inPattern is
 specified as an array of chars, not an array of dchars.

 I realise that one can easily be converted to the other, but it leaves
 me wondering ... does inPattern() work with non-ASCII characters?

 char[] is not ASCII, but UTF-8, so any UTF-8 sequence (multibyte or not) is valid, I suppose. (I don't know how inPattern works, though, so I can't answer the original question; I just wanted to clarify this.)

I think char[] is just an array of char. It's just that some stdlib APIs treat it as if it's UTF-8 encoded.
Oct 23 2007
Lutger <lutger.blijdestijn gmail.com> writes:
davidl wrote:
 On Tue, 23 Oct 2007 20:07:53 +0800, Alexander Panek
 <alexander.panek brainsware.org> wrote:
 
 On Mon, 22 Oct 2007 10:22:48 +0100
 "Janice Caron" <caron800 googlemail.com> wrote:

 I noticed in the docs that the pattern parameter to inPattern is
 specified as an array of chars, not an array of dchars.

 I realise that one can easily be converted to the other, but it leaves
 me wondering ... does inPattern() work with non-ASCII characters?

 char[] is not ASCII, but UTF-8, so any UTF-8 sequence (multibyte or not) is valid, I suppose. (I don't know how inPattern works, though, so I can't answer the original question; I just wanted to clarify this.)

 I think char[] is just an array of char. It's just that some stdlib APIs treat it as if it's UTF-8 encoded.

It's more than that. You can do this with char[] and it works with multibyte characters in UTF-8:

    foreach (dchar ch; pattern) /* stuff */

This is what inPattern does.
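
As a small illustration of that foreach decoding, here is a sketch that counts code points in a UTF-8 string. The function name `countCodePoints` and the sample string are made up for this example, and current D2 syntax is assumed.

```d
// Hypothetical helper for illustration: counts code points, not bytes.
size_t countCodePoints(const(char)[] s)
{
    size_t n = 0;
    // Asking for dchar makes foreach decode the UTF-8 sequence,
    // so each iteration sees one whole code point.
    foreach (dchar ch; s)
        ++n;
    return n;
}

unittest
{
    // 'a' is 1 code unit, U+00E9 is 2, U+20AC is 3.
    string s = "a\u00E9\u20AC";
    assert(s.length == 6);            // length counts UTF-8 code units
    assert(countCodePoints(s) == 3);  // foreach (dchar ...) counts code points
}
```

Iterating the same string with a char loop variable would instead visit each of the six code units individually.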
Oct 23 2007
Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
davidl wrote:
 On Tue, 23 Oct 2007 20:07:53 +0800, Alexander Panek
 <alexander.panek brainsware.org> wrote:
 
 On Mon, 22 Oct 2007 10:22:48 +0100
 "Janice Caron" <caron800 googlemail.com> wrote:

 I noticed in the docs that the pattern parameter to inPattern is
 specified as an array of chars, not an array of dchars.

 I realise that one can easily be converted to the other, but it leaves
 me wondering ... does inPattern() work with non-ASCII characters?

 char[] is not ASCII, but UTF-8, so any UTF-8 sequence (multibyte or not) is valid, I suppose. (I don't know how inPattern works, though, so I can't answer the original question; I just wanted to clarify this.)

 I think char[] is just an array of char. It's just that some stdlib APIs treat it as if it's UTF-8 encoded.

char[] is UTF-8 by specification, see http://www.digitalmars.com/d/type.html. String constants are also UTF-8/16/32, and putting non-UTF data into them will not compile. So it is a bit more than convention.

To answer the original question: Yes, inPattern works with non-ASCII characters.

-- Oskar
Oct 23 2007