c++ - ##: "concatenation vs. juxtaposition" full dissertation...


dan <dan_member pathlink.com> writes:

(straight from boost email forum; just pasting it below...)


...............................
 Even if 'concatenation' per-se is not called for, and against the
 Standard, could it be that the "." (dot) relieves the preprocessor from
 responsibility for adding a space at the end of the preceding string
 (since the dot already acts as a kind of 'separator'..)?

No.  The preprocessor does not "insert spaces" *ever*.  At this point in
translation, the preprocessor is operating on preprocessing tokens, not
characters.  There is a big difference between a lack of whitespace and
concatenation.  The first simply has adjacent preprocessing tokens, while the
second forms a new preprocessing token.  E.g.

#define ID(x) x

#define MACRO(a, b) ID(a)b

MACRO(+,+)

results in two immediately adjacent '+' preprocessing tokens.  There is no
intervening whitespace.  Whether or not whitespace exists is irrelevant for all
purposes *except* stringizing and the creation of an <h-char-sequence>.
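
A minimal sketch of that distinction, assuming a C++11 compiler whose preprocessor works on tokens; the variable name `juxtaposed` is made up for illustration:

```cpp
// Macros from the example above.
#define ID(x) x
#define MACRO(a, b) ID(a)b

// MACRO(+,+) yields two separate '+' tokens, so this parses as 1 + (+2).
// Had the two tokens been concatenated into '++', the line would not compile.
constexpr int juxtaposed = 1 MACRO(+,+) 2;

static_assert(juxtaposed == 3, "two adjacent '+' tokens, not one '++'");
```

The check only observes how the compiler parses the result; it says nothing about how a standalone text-emitting preprocessor would print it.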

A preprocessor that does text stream -> text stream must insert whitespace in
order to avoid the errant retokenization that would occur when the result gets
reprocessed by some other tool (such as a C or C++ compiler).  However, that is
just a hack to make it behave similarly in the presence of retokenization, a
step that does not exist in the phases of translation.
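
The kind of decision such a text-emitting preprocessor has to make can be sketched roughly as below. `needs_space` and `check_needs_space` are hypothetical helpers written for this illustration, not taken from any real preprocessor, and the heuristic is deliberately conservative and incomplete:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: should a space be printed between two adjacent
// preprocessing tokens so that a later tokenizer does not merge them?
// Incomplete on purpose (for example, it ignores pp-number spellings such
// as "1" followed by ".").
inline bool needs_space(const std::string& left, const std::string& right) {
    if (left.empty() || right.empty()) return false;
    const unsigned char l = static_cast<unsigned char>(left.back());
    const unsigned char r = static_cast<unsigned char>(right.front());
    const bool l_word = std::isalnum(l) != 0 || l == '_';
    const bool r_word = std::isalnum(r) != 0 || r == '_';
    if (l_word && r_word) return true;  // "a" "b" would be reread as "ab"
    // Characters that can be part of a longer punctuator.
    const std::string sticky = "+-<>=&|:#*/!%^.";
    return sticky.find(static_cast<char>(l)) != std::string::npos &&
           sticky.find(static_cast<char>(r)) != std::string::npos;
}

// A few spot checks of the heuristic.
inline void check_needs_space() {
    assert(needs_space("+", "+"));      // "++" would become a single token
    assert(needs_space("a", "b"));
    assert(!needs_space(")", "."));     // ")." can only be read as two tokens
    assert(!needs_space("file", "."));  // an identifier followed by '.'
}
```

Under this sketch no separator is needed between `file` and `.`, because retokenizing `file.` cannot merge the two tokens.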

 I just find it hilarious how the boost libraries work with so 

 work with DM.  I wouldn't be surprised at all that they'd be 
 all wrong; --won't be the first time that everybody is wrong, 
 but this bug may be just about ready for acceptance by 
 ANSI/ISO/whatever...  ;-)

I wish that arbitrary token-pasting was well-defined.  However, the example
given doesn't even make sense (per se).  The reason is that token-pasting occurs
prior to rescanning, so a construction like this:

#define
#define B(x) x

The period (.) gets concatenated to the right parenthesis before the expansion
of B(x).  Even if arbitrary token-pasting were well-defined, the argument 'x'
could contain any amount of whitespace and cause the construction not to work
properly:

#define EMPTY()

A(file EMPTY()) // file .h

In other words, whitespace is removed, or condensed to a single space, only at
certain points.  This is not one of them.  As I said before, however, this kind
of problem only occurs during stringizing and during the creation of a
header-name preprocessing token of the form <h-char-sequence>.
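
Stringizing is the easiest place to observe this. A small sketch, assuming a C++11 compiler; `STR` is a name chosen here for illustration, and `sizeof` counts the terminating null character:

```cpp
#define STR(x) #x  // stringize the argument exactly as written

// Whitespace between the argument's tokens collapses to a single space;
// leading and trailing whitespace is deleted.
static_assert(sizeof(STR(file.h))      == 7, "file.h plus terminator");
static_assert(sizeof(STR(file   .h))   == 8, "file .h plus terminator");
static_assert(sizeof(STR(   file.h  )) == 7, "file.h plus terminator");
```

The macro stringizes its argument directly, without expanding it first, so the checks depend only on the rules for the `#` operator.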

Further, there is only one sure-fire way to guarantee that no whitespace exists,
and that is to concatenate to a placemarker preprocessing token à la C99:

#define NO_LEADING(x) NO_LEADING_I(, x)
#define

#define NO_TRAILING(x) NO_TRAILING_I(, x)
#define

#define NO_LEADING_AND_TRAILING(x) \
NO_LEADING(NO_TRAILING(x)) \
/**/

..but that is not currently well-defined in C++ as it is in C99.
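
The general idea of pasting against an empty argument can be sketched as follows. The macro names are hypothetical and are not meant as the definitions used in the post; the sketch assumes C99 or C++11 rules, where an empty macro argument becomes a placemarker:

```cpp
// Hypothetical names; an empty argument acts as a placemarker token.
#define PASTE_I(a, b) a ## b
#define PASTE_LEFT_EMPTY(x)  PASTE_I(, x)  // placemarker ## x
#define PASTE_RIGHT_EMPTY(x) PASTE_I(x, )  // x ## placemarker

// Pasting a token with a placemarker yields the token itself.
static_assert(PASTE_LEFT_EMPTY(42) == 42, "placemarker on the left");
static_assert(PASTE_RIGHT_EMPTY(42) == 42, "placemarker on the right");
```

This only shows that the paste is well-defined and leaves the token unchanged; it does not by itself demonstrate the whitespace behavior described above.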

 --------------------------------------------------------------
 >The separator inserted by dmc is to make the preprocessor work right, it
 >isn't easily removed. I don't really understand why boost seems to want to
 >rely on the

 >was added to Standard C specifically to move away from that practice.

Juxtaposition is not concatenation, and a preprocessor that is operating at the
character level rather than the preprocessing token level at this point in
translation has to jump through hoops to mimic the behavior of the actual phases
of translation.  This is not a kludge on Boost's side; it is a preprocessor
implementation kludge revolving around textual representation at a phase of
translation where it doesn't exist.

 --------------------------------------------------------------
 Maybe if someone could paste the section of the Standard dealing with this,
 I'd much appreciate it.
 Yours.
 dan

There is no section of the standard that *ever* says whitespace should be
inserted.  There are only places where it says whitespace should be removed or
adjacent whitespace should be condensed.

Regards,
Paul Mensonides

Dec 02 2003

"Walter" <walter digitalmars.com> writes:

I appreciate your doing this. I still think, however, that tokens are


"dan" <dan_member pathlink.com> wrote in message
news:bqjemg$m3b$1 digitaldaemon.com...

Dec 02 2003

dan <dan_member pathlink.com> writes:

I appreciate your doing this. I still think, however, that tokens are


I'm having a hard time understanding his explanation. I think what he means is
that concatenation is not what is intended, though that does not mean an extra
space is intended either.

I always thought that the preprocessor did pure text substitution; but he seems

to violate the initial tokenization. But when tokens have a dot in between, as
in 'something.else', they are already well separated; adding white space adds
nothing of value.

Whereas with 'something else' it needs to preserve the white space, of course.
And so in the case you need to violate initial tokenization to concatenate

But in the case of

#define a(x) x

a(something).else

turning that into

something.else

is not concatenation, nor juxtaposition, for that matter, because no tokens are
in fact merging. So, at the text level you might call it concatenation, but at
the token level it isn't.
But then I'm not sure what happens if the preprocessor encounters,

a(something)else

Then we're in real trouble...  ;-)
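
Both cases can be tried out with a small sketch, assuming a C++11 compiler with a token-based preprocessor. Since `else` is a keyword, the member is called `other` here; `Pair`, `something` and `counter` are made-up names:

```cpp
#include <type_traits>

#define a(x) x  // macro from the example above

struct Pair { int first; int other; };
constexpr Pair something{1, 2};

// a(something).other expands to the three tokens: something . other
static_assert(a(something).other == 2, "member access through the macro");

// a(unsigned)int expands to the two tokens 'unsigned' and 'int', not to the
// single identifier 'unsignedint', so this declares an unsigned int.
a(unsigned)int counter = 0;
static_assert(std::is_same<decltype(counter), unsigned int>::value,
              "adjacent tokens stay separate");

#undef a  // keep the one-letter macro from leaking into later code
```

So for two identifiers the tokens simply sit next to each other; whether the result is meaningful depends on what those tokens are.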

Donno what the answer is Walter, I posted the whole thing in comp.lang.c++ but
no replies yet...

Cheers!
dan

Dec 02 2003

dan <dan_member pathlink.com> writes:

To my question:

.............................
But then I'm not sure what happens if the preprocessor encounters,

a(something)else
.............................

AG replied:

-----------------------------------------------------
16.3.3 [cpp.concat] para 3 (my emphasis):

"For both object-like and function-like macro invocations, before the
replacement list is reexamined for more macro names to replace, each
instance of a ## preprocessing token in the replacement list (not from an
argument) is deleted and the preceding preprocessing token is concatenated
with the following preprocessing token. *If the result is not a valid
preprocessing token, the behavior is undefined*. [...]"

In the case in question, ")." is definitely not a valid preprocessing token
(it's two).

-----------------------------------------------------
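
A short sketch of pastes that do form a single valid preprocessing token, for contrast with the undefined case. `CAT`, `foobar` and `BAD` are names chosen here for illustration, and a C++11 compiler is assumed:

```cpp
#define CAT(x, y) x ## y  // hypothetical helper name

constexpr int foobar = 7;

// foo ## bar forms the single identifier 'foobar'.
static_assert(CAT(foo, bar) == 7, "identifier formed by pasting");

// < ## < forms the single punctuator '<<'.
static_assert((1 CAT(<, <) 3) == 8, "punctuator formed by pasting");

// A hypothetical macro such as
//   #define BAD(x) (x) ## .h
// would ask for ')' ## '.', which cannot form one preprocessing token, so its
// behavior is undefined; it is deliberately left out of the compiled code.
```

Some compilers diagnose the invalid paste and others silently leave the two tokens adjacent, which is why code relying on it is not portable.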


there, since it would not result in an invalid token being created. And that if
an invalid token were being created, the result is undefined, according to the
standard, anyways. Just my take.

Cheers!
dan

Dec 03 2003
