c++ - ##: "concatenation vs. juxtaposition" full dissertation...

reply dan <dan_member pathlink.com> writes:
(straight from boost email forum; just pasting it below...)


...............................
 Even if 'concatenation' per se is not called for, and is against
 the Standard, could it be that the "." (dot) relieves the
 preprocessor from responsibility for adding a space at the end
 of the preceding string (since the dot already acts as a kind
 of 'separator')?

No. The preprocessor does not "insert spaces" *ever*. At this point in translation, the preprocessor is operating on preprocessing tokens, not characters. There is a big difference between a lack of whitespace and concatenation. The first simply has adjacent preprocessing tokens, while the second forms a new preprocessing token. E.g.

    #define ID(x) x
    #define MACRO(a, b) ID(a)b

    MACRO(+,+)

results in two immediately adjacent '+' preprocessing tokens. There is no intervening whitespace. Whether or not whitespace exists is irrelevant for all purposes *except* stringizing and the creation of an <h-char-sequence>.

A preprocessor that does text stream -> text stream must insert whitespace in order to avoid the errant retokenization that would occur when the result gets reprocessed by some other tool (such as a C or C++ compiler). However, that is just a hack to make it work similarly in the presence of retokenization, which does not exist in the phases of translation.
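A minimal C sketch of the ID/MACRO example, assuming a conforming C99 (or later) compiler; the function names are hypothetical. Because `MACRO(+,+)` yields two separate `+` tokens, `1 MACRO(+,+) 2` parses as `1 + +2`. Had the preprocessor formed a single `++` token instead, the same line would not compile.

```c
/* The two macros from the post. */
#define ID(x) x
#define MACRO(a, b) ID(a)b

/* Hypothetical helper: the body expands to  1 + + 2
   (binary plus, then unary plus): two adjacent '+' tokens,
   not the single token '++'. */
int adjacent_plus(void)
{
    return 1 MACRO(+,+) 2;
}

/* Hypothetical helper: the body expands to  5 - - 2,  i.e. 5 - (-2).
   A single '--' token here would not compile. */
int adjacent_minus(void)
{
    return 5 MACRO(-,-) 2;
}
```

A preprocessor that writes its result out as text without separating the two tokens would turn the first body into `1 ++ 2`, which is the retokenization problem described above.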
 I just find it hilarious how the Boost libraries work with so
 many compilers, but need dozens of ## in many files just to
 work with DM. I wouldn't be surprised at all if they were
 all wrong (it wouldn't be the first time that everybody is wrong),
 but this bug may be just about ready for acceptance by
 ANSI/ISO/whatever...  ;-)

I wish that arbitrary token-pasting was well-defined. However, the example given doesn't even make sense (per se). The reason is that token-pasting occurs prior to rescanning, so in a construction like this:

    #define A(x) B(x) ## .h
    #define B(x) x

the period (.) gets concatenated to the right parenthesis before the expansion of B(x). Even if arbitrary token-pasting was well-defined, the argument 'x' could contain any amount of whitespace, and cause the construction to not work properly:

    #define EMPTY()

    A(file EMPTY()) // file .h

In other words, there are only certain points at which whitespace is removed or condensed to a single whitespace. This is not one of them. As I said before, however, this kind of problem only occurs during stringizing and during the creation of a header-name preprocessing token of the form <h-char-sequence>.

Further, there is only one sure-fire way to guarantee that no whitespace exists, and that is to concatenate to a placemarker preprocessing token à la C99:

    #define NO_LEADING(x) NO_LEADING_I(, x)
    #define NO_LEADING_I(p, x) p ## x

    #define NO_TRAILING(x) NO_TRAILING_I(, x)
    #define NO_TRAILING_I(p, x) x ## p

    #define NO_LEADING_AND_TRAILING(x) \
        NO_LEADING(NO_TRAILING(x)) \
        /**/

...but that is not currently well-defined in C++ as it is in C99.
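Both rules can be checked with a short C99 sketch. `CAT`, `XCAT`, `N` and the function names are hypothetical; the `NO_*` macros are the ones quoted in the post. It shows that the operands of `##` are pasted as written, before any rescanning or macro expansion (so an extra level of indirection is needed to paste an expanded value), and that pasting to an empty argument, a C99 placemarker, leaves the other operand unchanged.

```c
/* Hypothetical helpers: CAT pastes its operands exactly as written;
   XCAT lets its arguments be macro-expanded first, then pastes. */
#define CAT(a, b) a ## b
#define XCAT(a, b) CAT(a, b)
#define N 1

static int xN = 10;   /* CAT(x, N) names this: N is not expanded before ## */
static int x1 = 20;   /* XCAT(x, N) names this: N becomes 1, then x ## 1 */

int paste_as_written(void) { return CAT(x, N); }
int paste_expanded(void)   { return XCAT(x, N); }

/* The placemarker macros from the post; they rely on C99 empty arguments. */
#define NO_LEADING(x) NO_LEADING_I(, x)
#define NO_LEADING_I(p, x) p ## x

#define NO_TRAILING(x) NO_TRAILING_I(, x)
#define NO_TRAILING_I(p, x) x ## p

#define NO_LEADING_AND_TRAILING(x) \
    NO_LEADING(NO_TRAILING(x)) \
    /**/

/* Pasting 7 to an empty argument (a placemarker) on each side yields just 7. */
int placemarker_passthrough(void) { return NO_LEADING_AND_TRAILING(7); }
```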
 ----------------------------------------------------------------------
 > The separator inserted by dmc is to make the preprocessor work
 > right; it isn't easily removed. I don't really understand why
 > Boost seems to want to rely on the
 > 'juxtaposition-equals-concatenation' kludge; the ## operator was
 > added to Standard C specifically to move away from that practice.

Juxtaposition is not concatenation, and a preprocessor that operates at the character level rather than the preprocessing-token level at this point in translation has to jump through hoops to mimic the behavior of the actual phases of translation. This is not a kludge on Boost's side; it is a preprocessor implementation kludge revolving around a textual representation at a phase of translation where it doesn't exist.
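To make the "hoops" concrete, here is a minimal C sketch of the check a text-to-text preprocessor might run before writing two adjacent tokens next to each other. The function `needs_space` and its rules are hypothetical and deliberately incomplete: they ignore digraphs, exponent signs inside pp-numbers (such as `1e` followed by `+`) and three-character punctuators. The point is only that the space is an artifact of emitting text, not something the phases of translation call for.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: true for characters that continue an identifier
   or a number. */
static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Hypothetical helper: returns 1 if writing `left` immediately followed by
   `right` might be re-lexed differently, so a text-emitting preprocessor
   should separate them with a space. Conservative and incomplete: it only
   looks at the two boundary characters. */
int needs_space(const char *left, const char *right)
{
    size_t n = strlen(left);
    if (n == 0 || right[0] == '\0')
        return 0;
    char a = left[n - 1], b = right[0];

    /* Identifiers and numbers would run together: "something" "else". */
    if (is_ident_char(a) && is_ident_char(b))
        return 1;
    /* A trailing digit then '.', or '.' then a digit, may extend a pp-number. */
    if ((isdigit((unsigned char)a) && b == '.') ||
        (a == '.' && isdigit((unsigned char)b)))
        return 1;

    /* Boundary pairs that start a longer punctuator or a comment. */
    static const char *const pairs[] = {
        "++", "--", "+=", "-=", "->", "<<", ">>", "<=", ">=", "==", "!=",
        "&&", "||", "&=", "|=", "^=", "*=", "/=", "%=", "##", "::",
        "//", "/*", ".."
    };
    char pair[3] = { a, b, '\0' };
    for (size_t i = 0; i < sizeof pairs / sizeof pairs[0]; i++)
        if (strcmp(pair, pairs[i]) == 0)
            return 1;
    return 0;
}
```

With these rules the tokens `something`, `.` and `else` can be written back to back, while two `+` tokens or two identifiers cannot.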
 ----------------------------------------------------------------------
 Maybe if someone could paste the section of the Standard
 dealing with this, I'd much appreciate it.
 Yours.
 dan

There is no section of the standard that *ever* says whitespace should be inserted. There are only places where it says whitespace should be removed or adjacent whitespace should be condensed.

Regards,
Paul Mensonides
Dec 02 2003
parent reply "Walter" <walter digitalmars.com> writes:
I appreciate your doing this. I still think, however, that tokens are
concatenated with the ## operator, and not otherwise.
Dec 02 2003
parent reply dan <dan_member pathlink.com> writes:
> I appreciate your doing this. I still think, however, that tokens are
> concatenated with the ## operator, and not otherwise.

I'm having a hard time understanding his explanation. I think what he means is that concatenation is not what is intended, though that does not mean that an extra space is.

I always thought that the preprocessor did pure text substitution, but he seems to be saying that tokenization comes first, and that ## was invented to be able to violate the initial tokenization. But with tokens that have a dot in between, as in 'something.else', the tokens are already well separated; adding a whitespace does nothing of value. Whereas with 'something else' it needs to preserve the whitespace, of course. So in the case where you need to violate the initial tokenization to concatenate strings, you have to use ##. But in the case of

    #define a(x) x
    a(something).else

turning that into something.else is not concatenation, nor juxtaposition for that matter, because no tokens are in fact merging. So at the text level you might call it concatenation, but at the token level it isn't.

But then I'm not sure what happens if the preprocessor encounters

    a(something)else

Then we're in real trouble... ;-)

Dunno what the answer is, Walter. I posted the whole thing in comp.lang.c++ but no replies yet...

Cheers!
dan
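dan's two cases can be tried with a minimal C sketch, assuming a conforming preprocessor. `else` is a keyword in C, so the hypothetical struct member below is called `second` and the juxtaposed case uses `int` and `value`; all names other than the macro `a` are hypothetical. In both cases the macro's result and the following tokens stay separate preprocessing tokens: nothing is merged, whether or not whitespace separates them.

```c
/* dan's macro: expands to its argument unchanged. */
#define a(x) x

struct pair { int first; int second; };
static const struct pair something = { 1, 2 };

/* a(something).second  ->  tokens: something . second
   Ordinary member access; no token is pasted. */
int dotted_case(void)
{
    return a(something).second;
}

/* a(int)value  ->  tokens: int value
   The two identifiers stay separate, so this declares `value`;
   it does not create an identifier spelled "intvalue". */
int juxtaposed_case(void)
{
    a(int)value = 3;
    return value;
}

#undef a   /* keep the one-letter macro name from leaking further */
```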
Dec 02 2003
parent dan <dan_member pathlink.com> writes:
To my question:

.............................
But then I'm not sure what happens if the preprocessor encounters,

a(something)else
.............................

AG replied:

-----------------------------------------------------
16.3.3 [cpp.concat] para 3 (my emphasis):

"For both object-like and function-like macro invocations, before the
replacement list is reexamined for more macro names to replace, each
instance of a ## preprocessing token in the replacement list (not from an
argument) is deleted and the preceding preprocessing token is concatenated
with the following preprocessing token. *If the result is not a valid
preprocessing token, the behavior is undefined*. [...]"

In the case in question, ")." is definitely not a valid preprocessing token
(it's two).

Which I take to mean that ## is not needed, even if no space ends up in there, since no invalid token would be created. And if an invalid token were being created, the result would be undefined according to the standard anyway. Just my take.

Cheers!
dan
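For contrast with the juxtaposed cases, here is a minimal C sketch of what `##` is for: it turns two tokens into one, and per the quoted paragraph the result must itself be a single valid preprocessing token. `PASTE` and the function name are hypothetical. The invalid case from the thread is left in a comment because it cannot be compiled portably.

```c
/* Hypothetical helper: pastes its two operands into one token. */
#define PASTE(a, b) a ## b

int valid_pastes(void)
{
    int x1 = 40;
    int v = 2;

    v PASTE(+, =) 3;          /* "+" ## "=" -> "+=", a single punctuator: v becomes 5 */
    return PASTE(x, 1) + v;   /* "x" ## "1" -> "x1", a single identifier */
}

/* Invalid: in  #define A(x) B(x) ## .h  the ## pastes ")" and ".",
   and ")." is not a valid preprocessing token, so the behavior is
   undefined (GCC, for example, reports an error). */
```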
Dec 03 2003