c++ - ##: "concatenation vs. juxtaposition" full dissertation...
- dan <dan_member pathlink.com> Dec 02 2003
- "Walter" <walter digitalmars.com> Dec 02 2003
- dan <dan_member pathlink.com> Dec 02 2003
- dan <dan_member pathlink.com> Dec 03 2003
(Straight from the Boost email forum; just pasting it below...)

...............................

Even if 'concatenation' per se is not called for, and against the Standard, could it be that the "." (dot) relieves the preprocessor from responsibility for adding a space at the end of the preceding string (since the dot already acts as a kind of 'separator'..)?
No. The preprocessor does not "insert spaces" *ever*. At this point in translation, the preprocessor is operating on preprocessing tokens, not characters. There is a big difference between a lack of whitespace and concatenation. The first simply has adjacent preprocessing tokens, while the second forms a new preprocessing token. E.g.

    #define ID(x) x
    #define MACRO(a, b) ID(a)b

    MACRO(+,+)

results in two immediately adjacent '+' preprocessing tokens. There is no intervening whitespace. Whether or not whitespace exists is irrelevant for all purposes *except* stringizing and the creation of an <h-char-sequence>. A preprocessor that does text stream -> text stream must insert whitespace in order to avoid the errant retokenization that would occur when the result gets reprocessed by some other tool (such as a C or C++ compiler). However, that is just a hack to make it work similarly in the presence of retokenization which does not exist in the phases of translation.

I just find it hilarious how the boost libraries work with so many compilers, but only need dozens of ## in many files to work with DM. I wouldn't be surprised at all that they'd be all wrong; --won't be the first time that everybody is wrong, but this bug may be just about ready for acceptance by ANSI/ISO/whatever... ;-)
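The MACRO(+,+) point above can be checked without looking at preprocessor text output at all. The following is only a sketch: ID and MACRO are taken from the post, while add_with_adjacent_plus and check_adjacent_plus are hypothetical names introduced for illustration. It relies on the compiler consuming preprocessing tokens directly, as the phases of translation describe.

```c
#include <assert.h>

/* ID and MACRO are the macros from the post. */
#define ID(x) x
#define MACRO(a, b) ID(a)b

/* Hypothetical helper: MACRO(+,+) expands to two separate '+' tokens,
   so the expression parses as  lhs + (+rhs).  Had the two tokens been
   concatenated into one '++' token, "lhs ++ rhs" would not compile. */
int add_with_adjacent_plus(int lhs, int rhs)
{
    return lhs MACRO(+,+) rhs;
}

/* Hypothetical self-check for the helper above. */
void check_adjacent_plus(void)
{
    assert(add_with_adjacent_plus(2, 3) == 5);
    assert(add_with_adjacent_plus(2, -3) == -1);
}
```

A preprocessor that writes a text file for a separate compiler has to emit something like "+ +" here, which is the inserted separator being discussed; it is an artifact of the text round trip rather than of the token sequence.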
I wish that arbitrary token-pasting was well-defined. However, the example given doesn't even make sense (per se). The reason is that token-pasting occurs prior to rescanning, so a construction like this:

    #define A(x) B(x) ## .h
    #define B(x) x

The period (.) gets concatenated to the right parenthesis before the expansion of B(x). Even if arbitrary token-pasting was well-defined, the argument 'x' could contain any amount of whitespace, and cause the construction to not work properly:

    #define EMPTY()

    A(file EMPTY()) // file .h

In other words, there are only certain points in which whitespace is removed or when whitespace is condensed to only a single whitespace. This is not one of them. As I said before, however, this kind of problem only occurs during stringizing and during the creation of a header-name preprocessing token of the form <h-char-sequence>.

Further, there is only one sure-fire way to guarantee that no whitespace exists and that is to concatenate to a placemarker preprocessing token ala C99:

    #define NO_LEADING(x) NO_LEADING_I(, x)
    #define NO_LEADING_I(p, x) p ## x

    #define NO_TRAILING(x) NO_TRAILING_I(, x)
    #define NO_TRAILING_I(p, x) x ## p

    #define NO_LEADING_AND_TRAILING(x) \
        NO_LEADING(NO_TRAILING(x)) \
        /**/

..but that is not currently well-defined in C++ as it is in C99.

--------------------------------------------------------------------------------------

>The separator inserted by dmc is to make the preprocessor work right, it
>isn't easily removed. I don't really understand why boost seems to want to
>rely on the 'juxtaposition-equals-concatenation' kludge, the ## operator was
>added to Standard C specifically to move away from that practice.
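Two of the points above, that ## is applied before the replacement list is rescanned and that pasting onto an empty (placemarker) operand simply yields the other operand, can be sketched in C99 as follows. NO_LEADING and NO_LEADING_I are from the post; PASTE, XPASTE, PREFIX, counter and check_pasting are hypothetical names, and the sketch assumes a C99-or-later compiler that accepts empty macro arguments.

```c
#include <assert.h>

#define PASTE(a, b) a ## b         /* operands of ## are not macro-expanded first */
#define XPASTE(a, b) PASTE(a, b)   /* extra level: arguments are expanded, then pasted */
#define PREFIX item

int PASTE(PREFIX, 1) = 10;   /* declares PREFIX1: the paste happened before PREFIX could expand */
int XPASTE(PREFIX, 2) = 20;  /* declares item2: PREFIX was expanded on the way in */

/* From the post: pasting onto an empty first argument (a placemarker in C99). */
#define NO_LEADING(x) NO_LEADING_I(, x)
#define NO_LEADING_I(p, x) p ## x

int NO_LEADING(counter) = 5; /* declares plain 'counter' */

/* Hypothetical self-check: each name below exists only if the
   expansions behaved as the comments above describe. */
void check_pasting(void)
{
    assert(PREFIX1 == 10);
    assert(item2 == 20);
    assert(counter == 5);
}
```

The A(x) B(x) ## .h construction from the post is deliberately not compiled here, since it asks for ")" to be pasted to "." and that does not form a single preprocessing token.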
Juxtaposition is not concatenation, and a preprocessor that is operating at the character level rather than the preprocessing token level at this point in translation has to jump through hoops to mimic the behavior of the actual phases of translation. This is not a kludge on Boost's side, this is a preprocessor implementation kludge revolving around textual representation at a phase of translation where it doesn't exist.

--------------------------------------------------------------------------------------

Maybe if someone could paste the section of the Standard dealing with this, I'd much appreciate it.

Yours.
dan
There is no section of the standard that *ever* says whitespace should be inserted. There are only places where it says whitespace should be removed or adjacent whitespace should be condensed. Regards, Paul Mensonides
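Stringizing is the one place named in the thread where whitespace between tokens becomes visible, so it makes a convenient probe for the "removed or condensed, never inserted" rule. This is a minimal sketch; STR, stringized_tight, stringized_spaced and check_stringize are hypothetical names, and the argument is stringized directly (no macro expansion inside it) to stay on the well-specified path.

```c
#include <assert.h>
#include <string.h>

#define STR(x) #x   /* stringize the argument exactly as spelled */

/* No whitespace between the tokens: none appears in the result. */
const char *stringized_tight(void)
{
    return STR(a.b);
}

/* Leading and trailing whitespace is deleted; each interior run of
   whitespace is condensed to a single space. */
const char *stringized_spaced(void)
{
    return STR(   a   .   b   );
}

/* Hypothetical self-check of the two spellings. */
void check_stringize(void)
{
    assert(strcmp(stringized_tight(), "a.b") == 0);
    assert(strcmp(stringized_spaced(), "a . b") == 0);
}
```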
Dec 02 2003
I appreciate your doing this. I still think, however, that tokens are concatenated with the ## operator, and not otherwise.
Dec 02 2003
>I appreciate your doing this. I still think, however, that tokens are
>concatenated with the ## operator, and not otherwise.
I'm having a hard time understanding his explanation. I think that what he means is that concatenation is not what is intended; --though that is not to mean that an extra space is.

I always thought that the preprocessor did pure text substitution; but he seems to be saying that tokenization comes first, and that ## was invented to be able to violate the initial tokenization. But with tokens that have a dot in between, like in 'something.else', the tokens are well separated already; adding a white space does nothing of value to it. Whereas with 'something else' it needs to preserve the white space, of course. And so in the case where you need to violate the initial tokenization to concatenate strings, you have to use ##. But in the case of

    #define a(x) x
    a(something).else

turning that into something.else is not concatenation, nor juxtaposition, for that matter, because no tokens are in fact merging. So, at the text level you might call it concatenation, but at the token level it isn't. But then I'm not sure what happens if the preprocessor encounters,

    a(something)else

Then we're in real trouble... ;-)

Dunno what the answer is, Walter; I posted the whole thing in comp.lang.c++ but no replies yet...

Cheers!
dan
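Both cases raised above can be tried with a compiler that works on tokens internally. The sketch below keeps the macro a from the post but swaps in compilable operands, since "else" is a keyword; struct pair, read_second and declare_by_juxtaposition are hypothetical names chosen for illustration.

```c
#include <assert.h>

#define a(x) x   /* the macro from the post */

struct pair { int first; int second; };   /* hypothetical type */

/* a(value).second is three tokens: value . second
   Nothing is merged, so no ## is involved. */
int read_second(void)
{
    struct pair value = { 3, 4 };
    return a(value).second;
}

/* a(unsigned)int is two adjacent tokens: unsigned int
   If they were glued into "unsignedint" this would not compile. */
int declare_by_juxtaposition(void)
{
    a(unsigned)int count = 7;
    return (int)count;
}

#undef a   /* keep the one-letter macro from leaking into later code */
```

So the a(something)else case yields two tokens side by side. A text-emitting preprocessor has to write them with a separator so that a later tool does not retokenize them as one identifier, which appears to be the separator the thread is arguing about.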
Dec 02 2003
To my question:

.............................
But then I'm not sure what happens if the preprocessor encounters, a(something)else
.............................

AG replied:

-----------------------------------------------------
16.3.3 [cpp.concat] para 3 (my emphasis):

"For both object-like and function-like macro invocations, before the replacement list is reexamined for more macro names to replace, each instance of a ## preprocessing token in the replacement list (not from an argument) is deleted and the preceding preprocessing token is concatenated with the following preprocessing token. *If the result is not a valid preprocessing token, the behavior is undefined*. [...]"

In the case in question, ")." is definitely not a valid preprocessing token (it's two).
Which I take to mean that ## is not needed, even if no space ends up being in there, since it would not result in an invalid token being created. And that if an invalid token were being created, the result is undefined, according to the standard, anyways. Just my take. Cheers! dan
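For contrast with the undefined ")." case, here is a small sketch of a paste whose result is a valid preprocessing token. PASTE, shift_left_by_paste and check_valid_paste are hypothetical names used only for this illustration.

```c
#include <assert.h>

#define PASTE(a, b) a ## b

/* '<' ## '<' forms the single valid token '<<', so this is a left shift.
   Pasting ')' and '.' would not form one valid preprocessing token, which
   is the case the quoted paragraph leaves undefined; it is deliberately
   not attempted here. */
int shift_left_by_paste(int value, int bits)
{
    return value PASTE(<, <) bits;
}

/* Hypothetical self-check: 1 << 3 and 5 << 1. */
void check_valid_paste(void)
{
    assert(shift_left_by_paste(1, 3) == 8);
    assert(shift_left_by_paste(5, 1) == 10);
}
```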
Dec 03 2003