digitalmars.D.learn - Strings and Slices

Mike Brown <mikey.be gmail.com> writes:
Hi All,

I'm rebuilding a C++ project, and the first section is a lexer that 
uses strings and string_view.

Are slices comparable to a string_view?

The architecture of the lexer is a single layer (Non-)FSM Lexer. 
Basically a main loop, checking the first letter of the current 
input position, which then calls a function/lambda to continue 
from there.

string lex_identifier(ref string input) {
   ...
}

while(!input.empty()) {
   if (isAlpha(input.front)) {
     auto tmp = lex_identifier(input);
   }
}

I'm passing in a ref because I ideally want to iterate over a 
string, and to produce a slice to the lexeme. This needs a way to 
create a mark at a given point, and have an iteration point. 
These marks seem logically to be the slice start and end.

Is there a better way to make these slices?
The ref doesn't work well with unittests that pass in a literal. 
Is there an easier way than creating a temp var for the input?

In the body of lex_identifier, I am using drop(). This 
doesn't seem to do what I thought it did. I want to create a 
slice from the beginning of a ref slice up to a given mark, and 
also move the beginning of that ref slice to that mark.

What is the best way to achieve this?

Kind regards,
Mikey
Feb 18
Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 18 February 2021 at 20:47:33 UTC, Mike Brown wrote:
 Are slices comparable to a string_view?
My C++ is rusty af, but yes, I think so. A D slice is essentially `struct slice { size_t length; T* ptr; }`, so when in doubt, just think back to what that does.
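
To make the ptr+length picture concrete, here is a minimal sketch (the variable names are only illustrative). It assumes nothing beyond the built-in `.ptr` and `.length` properties of D arrays, and shows that slicing produces a new view of the same memory rather than a copy:

```d
import std.stdio : writeln;

void main()
{
    string text = "hello world";
    string word = text[0 .. 5];      // no copy: same memory, shorter length

    assert(word == "hello");
    assert(word.length == 5);
    assert(word.ptr == text.ptr);    // both views start at the same address

    string rest = text[6 .. $];      // `$` means text.length inside the brackets
    assert(rest == "world");
    assert(rest.ptr == text.ptr + 6);

    writeln(word, " | ", rest);
}
```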
 string lex_identifier(ref string input) {
And that makes this ref iffy. Since it is already passed as a ptr+length pair, you rarely need ref on it. Only when you'd use a char** in C; that is, when you can reassign the value of the pointer and have the caller see that change (e.g. you are appending to it). If you're just looking, no need for ref.
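
A small sketch of the difference, using two hypothetical helpers (`advanceCopy` and `advanceRef` are not from the original post). Without ref the function only changes its own ptr+length pair; with ref the caller's slice is reassigned, which is the `char**` case and is what a lexer needs if it wants the caller's input advanced:

```d
// Hypothetical helpers for illustration only.
void advanceCopy(string s, size_t n)
{
    s = s[n .. $];   // only the local ptr+length pair changes
}

void advanceRef(ref string s, size_t n)
{
    s = s[n .. $];   // the caller's slice is reassigned
}

void main()
{
    string input = "abc def";

    advanceCopy(input, 4);
    assert(input == "abc def");   // unchanged in the caller

    advanceRef(input, 4);
    assert(input == "def");       // caller sees the new start
}
```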
 In the body of lex_identifier, I am using drop(). This 
 doesn't seem to do what I thought it did. I want to create a 
 slice from the beginning of a ref slice up to a given mark, and 
 also move the beginning of that ref slice to that mark.
I do it in two steps:

piece = input[0 .. mark]; // get piece out
input = input[mark .. $]; // advance the original slice

Note that such operations are just `ptr += mark; length -= mark;` so they are very cheap.
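
Put together, the two steps could look like the sketch below. The body of `lex_identifier` is an assumption (the original post elides it), as is the identifier rule used here: letters, digits and underscores.

```d
import std.ascii : isAlphaNum;

// Hypothetical body for lex_identifier; the original post elides it.
string lex_identifier(ref string input)
{
    // Scan forward to find the mark: the first character that is
    // not part of the identifier (assumed rule: alphanumeric or '_').
    size_t mark = 0;
    while (mark < input.length
           && (isAlphaNum(input[mark]) || input[mark] == '_'))
        ++mark;

    string piece = input[0 .. mark];  // get piece out
    input = input[mark .. $];         // advance the original slice
    return piece;
}

void main()
{
    // A named variable is used because a ref parameter needs an lvalue;
    // a string literal cannot be passed directly.
    string input = "foo_1 = bar";
    auto tok = lex_identifier(input);
    assert(tok == "foo_1");
    assert(input == " = bar");
}
```

If the temporary variable in tests is a nuisance, one alternative design is to take the slice by value and return both the lexeme and the remainder, at the cost of a slightly clumsier call site.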
Feb 18
Mike Brown <mikey.be gmail.com> writes:
On Thursday, 18 February 2021 at 21:08:45 UTC, Adam D. Ruppe 
wrote:
 I do it in two steps:

 piece = input[0 .. mark]; // get piece out
 input = input[mark .. $]; // advance the original slice

 Note that such operations are just `ptr += mark; length -= mark;` so they are very cheap.

Thank you. Is there a standard type to use for "mark"? Should it be size_t, or is a normal integer suitable?
Feb 20
Mike Brown <mikey.be gmail.com> writes:
On Saturday, 20 February 2021 at 19:28:00 UTC, Mike Brown wrote:
 Thank you. Is there a standard type to use for "mark"? Should it be size_t, or is a normal integer suitable?

Ah, and what's the recommended way to iterate over a slice using a mark? Can I get the current iteration point from a foreach loop?
Feb 20
Steven Schveighoffer <schveiguy gmail.com> writes:
On 2/20/21 2:31 PM, Mike Brown wrote:
 Thank you. Is there a standard type to use for "mark"? Should it be size_t, or is a normal integer suitable?
 Ah, and what's the recommended way to iterate over a slice using a mark? Can I get the current iteration point from a foreach loop?

Ints work as slice endpoints just fine. They are implicitly converted to size_t when used for slicing.

If you are going to keep the mark valid, you shouldn't slice away the input, because index 0 of the shortened slice then sits where the mark was.

Typically with slices you don't store a position (though sometimes you do); you just divvy up the slice into pieces you care about. It all depends on what information is important.

-Steve
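
As a rough illustration of the mark-keeping style, the sketch below leaves the input untouched and stores positions as size_t indices. The `words` helper is hypothetical; it relies on the two-variable form of foreach, which yields the index alongside each element of an array:

```d
import std.ascii : isAlpha;

// Hypothetical helper: collects runs of letters using marks into `input`.
string[] words(string input)
{
    string[] result;
    size_t start = 0;       // mark: index where the current run began
    bool inWord = false;

    // foreach over an array can also yield each element's index
    foreach (i, char c; input)
    {
        if (isAlpha(c))
        {
            if (!inWord) { start = i; inWord = true; }
        }
        else if (inWord)
        {
            // marks stay valid because input is never re-sliced
            result ~= input[start .. i];
            inWord = false;
        }
    }
    if (inWord)
        result ~= input[start .. $];   // flush a run that reaches the end
    return result;
}

void main()
{
    assert(words("ab cd") == ["ab", "cd"]);

    int n = 2;                          // a plain int also works as an endpoint
    assert("ab cd"[0 .. n] == "ab");
}
```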
Feb 20