digitalmars.D.learn - == comparison of string literals, and their usage

diniz <diniz posteo.net> writes:
Hello,

Since literal strings are interned (and immutable), can I count on the fact
that they are compared (==) by pointer?

Context: The use case is a custom lexer for a custom language. I initially
wanted to represent lexeme classes by a big enum 'LexClass'. However, this
makes me write all constant lexemes (keywords and keysigns) three times:
1- in the enum of lexeme classes
2- in an array of constants (for the constant-scanning func)
3- in an associative array mapping constants to their classes
However, if literal strings are compared by pointer, then they are a kind of
Scheme or Ruby symbol: in effect, enum values representing *cases*, which is
exactly what I need. I would thus use the constants' strings themselves as
lexeme classes... and the parser would not be slowed down.

What do you think?
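
For comparison, here is a minimal sketch of one way to avoid the triple listing while keeping an enum, assuming the enum is given `string` as its base type so that each member carries its own spelling. The member names (`kwIf`, `kwElse`, `plus`) and the helpers `constants` and `classMap` are made up for illustration. This is a different technique from using the bare strings as classes.

```d
import std.traits : EnumMembers;

// Hypothetical lexeme classes; each member's value is the lexeme's spelling.
enum LexClass : string
{
    kwIf   = "if",
    kwElse = "else",
    plus   = "+",
}

// The array of constants, derived from the enum instead of written by hand.
string[] constants()
{
    string[] result;
    foreach (c; EnumMembers!LexClass)
    {
        string spelling = c; // an enum member converts to its base type
        result ~= spelling;
    }
    return result;
}

// The constant-to-class map, also derived from the enum.
LexClass[string] classMap()
{
    LexClass[string] map;
    foreach (c; EnumMembers!LexClass)
    {
        string spelling = c;
        map[spelling] = c;
    }
    return map;
}

unittest
{
    assert(constants() == ["if", "else", "+"]);
    assert(classMap()["else"] == LexClass.kwElse);
}
```

Note that comparing two `LexClass` values in this sketch is still a string comparison, because the base type is `string`; it only removes the repetition, not the comparison cost.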
-- 
diniz {la vita e estranj}
Apr 05
AltFunction1 <af1af1af1 af1.af1> writes:
On Friday, 5 April 2019 at 14:49:50 UTC, diniz wrote:
> Hello,
>
> Since literal strings are interned (and immutable), can I count
> on the fact that they are compared (==) by pointer?
No. "==" performs a full array comparison, and "is" is apparently simplified at compile time. In the compiler there is no notion of a string literal as a special expression; it is always a StringExp. See https://d.godbolt.org/z/K5R6u6. However, you are right to say that literals are not duplicated.
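
The difference between the two operators can be seen in a small sketch. The helper names `sameContents` and `sameSlice` are hypothetical, and `.idup` is used only to obtain a string with the same characters at a different address.

```d
// `==` on strings compares contents.
bool sameContents(string a, string b)
{
    return a == b;
}

// `is` on strings compares the slices themselves: same pointer and same length.
bool sameSlice(string a, string b)
{
    return a is b;
}

unittest
{
    string original = "one";
    string copy = original.idup; // same characters, freshly allocated

    assert(sameContents(original, copy)); // equal contents
    assert(!sameSlice(original, copy));   // but different memory
    assert(sameSlice(original, original));
}
```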
Apr 06
diniz <diniz posteo.net> writes:
Thank you very much. So I could still store, use, and compare string pointers
myself [1] and get valid results, meaning: pointer equality implies (literal)
string equality. Or am I wrong? The point is, the parser, operating on an array
of prescanned lexemes, will constantly check whether a valid lexeme is present
simply by checking the lexeme "class". I don't want that to be a real string
comparison: too expensive, and for no gain.

[1] As in the second comparison of your example:
void main()
{
    auto c2 = "one" == "two";
    auto c1 = "one".ptr is "two".ptr;
}
-- 
diniz {la vita e estranj}
Apr 06
lithium iodate <whatdoiknow doesntexist.net> writes:
Not quite. D strings strictly consist of a pointer *and* a length, so you need
to compare the .length properties as well to correctly conclude that the strings
are equal. You can do that concisely in one go by simply `is`-comparing the
array references, as in

    string a = "hello";
    string b = a;
    assert(a is b);
    assert(a[] is b[]);

Of course, if the strings are never sliced, you can just compare the pointers
and be done; just make sure to document how it operates. Depending on the
circumstances, I'd throw in some asserts that do actual string comparisons to
verify the program logic.
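
One way this advice could look in a lexer is sketched below, under the assumption that every class string is taken from a single table so that identity comparison is meaningful. The names `constantTable`, `classOf` and `hasClass` are hypothetical, not from any library.

```d
// Hypothetical table of constant lexemes; each entry doubles as its class.
immutable string[] constantTable = ["if", "else", "+"];

// Called once per lexeme by the lexer: a real string comparison that returns
// the table's own slice, or null if the text is not a constant.
string classOf(const(char)[] text)
{
    foreach (entry; constantTable)
        if (entry == text)
            return entry;
    return null;
}

// Called by the parser: identity check (pointer and length), no character scan.
bool hasClass(string lexClass, string expected)
{
    immutable identical = lexClass is expected;
    // Verifies the program logic: identity and content equality must agree.
    assert(identical == (lexClass == expected),
           "class string was not taken from constantTable");
    return identical;
}

unittest
{
    // Text built at run time still maps to the table's own slice.
    string cls = classOf("else".idup);
    assert(cls is constantTable[1]);
    assert(hasClass(cls, constantTable[1]));
    assert(!hasClass(cls, constantTable[0]));
}
```

The assert inside `hasClass` performs the real comparison only in non-release builds, so it documents and checks the assumption without costing anything in a release build.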
Apr 06
diniz <diniz posteo.net> writes:
Thank you very much! And yes, properly documenting is also important to me.
-- 
diniz {la vita e estranj}
Apr 06
bauss <jj_1337 live.dk> writes:
To add onto this, here is an example of why it's important to compare the length as well:

    string a = "hello";
    string b = a[0 .. 3];
    assert(a.ptr == b.ptr);
    assert(a.length != b.length);
Apr 07
diniz <diniz posteo.net> writes:
Thank you! Very clear :-).
-- 
diniz {la vita e estranj}
Apr 07