digitalmars.D.learn - Delimited strings

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
I've finally got my lexer to the point where it can successfully
tokenize /usr/include/d2/4.3.2/std/*.d. Yay! :-) Now I'm going back to
fill in the gaps that still haven't been implemented yet. Among which
are delimited strings.

According to the online specs, delimited strings start with q" followed
by the delimiter, whether a character or an identifier. There are some
ambiguities here:

1) Does this:

	q"abca"

represent the string "bc"? Is it the same as:

	q"a
	bc
	a"

?

What do delimiter characters refer to? Are they restricted to
non-identifier symbols, like this:

	q"%abc%"

representing "abc"?


2) Among the possible delimiters are "nesting delimiters", so you can
write stuff like:

	q"(abc)"

which is the same as "abc". Now it's a bit confusing that the specs use
this as an example:

	q"(foo(xxx))"

as though the '(' and ')' inside the string matter. So does this mean
that you can write:

	q"(foo(q"(xxx)"))"

and have it represent the string

	foo(q"(xxx)")

? If not, then why are these called "nested delimiters", since any ')'
not immediately followed by " is obviously not the end of the literal?
For example, this:

	q"(a)b)"

obviously is equal to "a)b" since the first ) can't possibly terminate
the literal.


T

-- 
"I speak better English than this villain Bush" -- Mohammed Saeed
al-Sahaf, Iraqi Minister of Information
Feb 14 2012
Timon Gehr <timon.gehr gmx.ch> writes:
On 02/15/2012 01:38 AM, H. S. Teoh wrote:
 I've finally got my lexer to the point where it can successfully
 tokenize /usr/include/d2/4.3.2/std/*.d. Yay! :-) Now I'm going back to
 fill in the gaps that still haven't been implemented yet. Among which
 are delimited strings.

 According to the online specs, delimited strings start with q" followed
 by the delimiter, whether a character or an identifier. There are some
 ambiguities here:

 1) Does this:

 	q"abca"

 represent the string "bc"? Is it the same as:

 	q"a
 	bc
 	a"

 ?

No, q"abca" is illegal. The pattern is

q"identifier
string
identifier"

(The terminating newline is kept, so the string in this case is "string\n".)

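For a lexer, the identifier-delimited rule can be sketched roughly as follows. This is only an illustration in Python, not code from any D compiler: the name scan_identifier_delimited is made up, and the identifier pattern is simplified to ASCII letters, digits and underscores, which is an assumption.

```python
import re

# Opening part of the literal: q" + identifier + newline (ASCII-only, simplified).
_OPEN = re.compile(r'q"([A-Za-z_][A-Za-z0-9_]*)\n')


def scan_identifier_delimited(src: str) -> str:
    """Return the value of a literal shaped like q"IDENT\\n...\\nIDENT"."""
    m = _OPEN.match(src)
    if m is None:
        # Covers q"abca": the identifier is not followed by a newline.
        raise ValueError("identifier delimiter must be followed by a newline")
    closing = m.group(1) + '"'
    body_start = m.end()
    pos = body_start
    while pos < len(src):
        # The closing identifier is only looked for at the start of a line.
        if src.startswith(closing, pos):
            return src[body_start:pos]  # keeps the final newline
        newline = src.find("\n", pos)
        if newline == -1:
            break
        pos = newline + 1
    raise ValueError("unterminated delimited string")
```

With this sketch, the three-line literal from the question (q"a, bc, a" on separate lines) scans as "bc\n", while the one-line q"abca" is rejected.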
 What do delimiter characters refer to? Are they restricted to
 non-identifier symbols, like this:

 	q"%abc%"

 representing "abc"?

Yes. Any symbol that cannot start an identifier works as a delimiter character; digits are OK too.

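A lexer can decide which kind of delimited string it is looking at from the first character after q". The Python sketch below is a hypothetical helper (delimiter_kind is not a name from any real lexer). It assumes the nesting pairs are the four bracket pairs listed in the D spec and approximates "identifier start" as a letter or underscore.

```python
# Bracket pairs that nest, as listed in the D spec for delimited strings.
NESTING_PAIRS = {"(": ")", "[": "]", "<": ">", "{": "}"}


def delimiter_kind(ch: str) -> str:
    """Classify the first character that follows q" in a delimited string."""
    if ch.isalpha() or ch == "_":
        # Can start an identifier, so the whole identifier is the delimiter.
        return "identifier"
    if ch in NESTING_PAIRS:
        # Opening bracket: the literal ends at the matching closing bracket.
        return "nesting"
    # Everything else, digits included, is a plain one-character delimiter.
    return "character"
```

So '%' and '1' both come out as "character", 'a' as "identifier", and '(' as "nesting".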
 2) Among the possible delimiters are "nesting delimiters", so you can
 write stuff like:

 	q"(abc)"

 which is the same as "abc". Now it's a bit confusing that the specs use
 this as an example:

 	q"(foo(xxx))"

 as though the '(' and ')' inside the string matter.

They do matter.
 So does this mean
 that you can write:

 	q"(foo(q"(xxx)"))"

 and have it represent the string

 	foo(q"(xxx)")

 ?

Yes.
 If not, then why are these called "nested delimiters", since any ')'
 not immediately followed by " is obviously not the end of the literal?
 For example, this:

 	q"(a)b)"

 obviously is equal to "a)b" since the first ) can't possibly terminate
 the literal.


 T

It is illegal because the parens do not match.
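
The matching rule amounts to keeping a depth counter while scanning. Here is a rough Python sketch of that idea, under the assumption that the literal ends at the closing delimiter that brings the depth back to zero and that this delimiter must be followed directly by a double quote. The function name scan_nested is invented for the example.

```python
def scan_nested(src: str, open_ch: str = "(", close_ch: str = ")") -> str:
    """Return the value of a nesting-delimited literal such as q"(foo(xxx))"."""
    prefix = 'q"' + open_ch
    if not src.startswith(prefix):
        raise ValueError("not a nesting-delimited string")
    start = len(prefix)
    depth = 1  # the opening delimiter itself counts as one level
    for i in range(start, len(src)):
        ch = src[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                # The matching close must be immediately followed by a quote.
                if src[i + 1:i + 2] != '"':
                    raise ValueError("delimiters do not match")
                return src[start:i]
    raise ValueError("unterminated delimited string")
```

Under these assumptions q"(foo(q"(xxx)"))" yields foo(q"(xxx)"), and q"(a)b)" is rejected because the first ')' already closes the literal but is followed by 'b' instead of a quote.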
Feb 14 2012
Timon Gehr <timon.gehr gmx.ch> writes:
On 02/15/2012 02:03 AM, H. S. Teoh wrote:
 On Wed, Feb 15, 2012 at 01:46:51AM +0100, Timon Gehr wrote:
 [...]
 No, q"abca" is illegal. The pattern is

 q"identifier
 string
 identifier"

 (The terminating new line is kept, so the string in this case is
 "string\n")

I see. The online specs need to be clarified, then. [...]
 So does this mean that you can write:

 	q"(foo(q"(xxx)"))"

 and have it represent the string

 	foo(q"(xxx)")

 ?

Yes.

I see. [...]
 	q"(a)b)"


 It is illegal because the parens do not match.

OK, I see. Thanks for the clarification.

Makes me wonder, though: what's the purpose of this convoluted construction? I mean, I can understand why being able to write q"(z=q"(y)";)" would be useful, but why should it matter that the parentheses in q"(a(b))" match? What's the purpose of this restriction?

T

q"(()")"
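
One way to read this example: if the lexer simply stopped at the first )" it saw, the literal above would end too early. The Python sketch below compares the two strategies. It is an illustration only; split_literal and its nesting flag are hypothetical names, and it handles round parentheses only.

```python
def split_literal(src: str, nesting: bool) -> tuple[str, str]:
    """Split a q"(...)" literal into (value, leftover source text)."""
    if not src.startswith('q"('):
        raise ValueError('expected q"(')
    start = 3
    if not nesting:
        # Naive rule: stop at the first occurrence of )"
        end = src.index(')"', start)
        return src[start:end], src[end + 2:]
    depth = 1
    for i in range(start, len(src)):
        if src[i] == "(":
            depth += 1
        elif src[i] == ")":
            depth -= 1
            if depth == 0:
                if src[i + 1:i + 2] != '"':
                    raise ValueError("parens do not match")
                return src[start:i], src[i + 2:]
    raise ValueError("unterminated literal")


naive = split_literal('q"(()")"', nesting=False)   # ('(', ')"')
nested = split_literal('q"(()")"', nesting=True)   # ('()"', '')
```

With the naive rule the value is a lone '(' and the characters )" are left dangling in the source. With depth counting the whole literal is consumed and the value is ()" including the embedded quote.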
Feb 14 2012