
digitalmars.D - A case for opImplicitCast: making string search work better

reply downs <default_357-line yahoo.de> writes:
Consider this type:

struct StringPosition {
  size_t pos;
  void opImplicitCast(out size_t sz) {
    sz = pos;
  }
  void opImplicitCast(out bool b) {
    b = pos != -1;
  }
}

Wouldn't that effectively sidestep most problems people have with find
returning -1?

Or am I missing something?

Of course, this would require a way to resolve ambiguities, i.e.
functions/statements with preferences - for instance, if() would "prefer" bool
over int. I don't know if this is possible.
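
For comparison, here is a rough sketch of how a similar effect can be approximated with features that exist in current D2, without opImplicitCast. It assumes that `alias this` supplies the implicit conversion to size_t and that `opCast!bool` is what if() and `!` consult first, so the "prefer bool" question is settled by the language rather than by the caller. The wrapper name `findPos` is hypothetical; `indexOf` is the Phobos function from std.string.

```d
import std.string : indexOf;

struct StringPosition
{
    size_t pos;        // size_t.max plays the role of "-1"
    alias pos this;    // implicit conversion to size_t

    // Consulted by if(), while() and the ! operator.
    bool opCast(T : bool)() const { return pos != size_t.max; }
}

// Hypothetical wrapper around std.string.indexOf.
StringPosition findPos(string haystack, string needle)
{
    // indexOf returns ptrdiff_t, -1 on failure; the cast maps -1 to size_t.max.
    return StringPosition(cast(size_t) indexOf(haystack, needle));
}

void main()
{
    auto hit  = findPos("hello world", "wor");
    auto miss = findPos("hello world", "xyz");

    if (hit)
    {
        size_t i = hit;                            // uses alias this
        assert("hello world"[i .. $] == "world");
    }
    else
        assert(false);

    assert(!miss);                                 // uses opCast!bool
}
```

Note that a failed search still converts to size_t.max when used as an index, so slicing with an unchecked result remains a range error.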
May 15 2009
next sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 15 May 2009 06:07:10 -0400, downs <default_357-line yahoo.de>  
wrote:

 Consider this type:

 struct StringPosition {
   size_t pos;
   void opImplicitCast(out size_t sz) {
     sz = pos;
   }
   void opImplicitCast(out bool b) {
     b = pos != -1;
   }
 }

 Wouldn't that effectively sidestep most problems people have with find  
 returning -1?

 Or am I missing something?

 Of course, this would require a way to resolve ambiguities, i.e.  
 functions/statements with preferences - for instance, if() would  
 "prefer" bool over int. I don't know if this is possible.

No, I want the length of the string if it is not found, not -1. It's not a question of -1 vs. false, it's a question of usability. -1 can be tested as well as string.length, but -1 cannot be seamlessly forwarded to slicing operations. Most of the time, you want to USE the index returned, not just check if it is valid.

-Steve
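
A minimal sketch of the behaviour described above, assuming a hypothetical helper `findOrLength` built on std.string.indexOf: on a miss it returns the haystack's length, so the result can be passed straight into a slice without a check.

```d
import std.string : indexOf;

// Hypothetical helper: index of the first match, or haystack.length if none.
size_t findOrLength(string haystack, string needle)
{
    auto i = indexOf(haystack, needle);   // ptrdiff_t, -1 on failure
    return i < 0 ? haystack.length : cast(size_t) i;
}

void main()
{
    string s = "name=value";

    // Found: both slices are usable directly.
    assert(s[0 .. findOrLength(s, "=")] == "name");
    assert(s[findOrLength(s, "=") .. $] == "=value");

    // Not found: the slices degrade gracefully instead of throwing.
    assert(s[0 .. findOrLength(s, "#")] == s);
    assert(s[findOrLength(s, "#") .. $] == "");

    // The success test is a comparison against the length.
    assert(findOrLength(s, "#") == s.length);
}
```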
May 15 2009
prev sibling next sibling parent reply grauzone <none example.net> writes:
downs wrote:
 Consider this type:
 
 struct StringPosition {
   size_t pos;
   void opImplicitCast(out size_t sz) {
     sz = pos;
   }
   void opImplicitCast(out bool b) {
     b = pos != -1;
   }
 }
 
 Wouldn't that effectively sidestep most problems people have with find
returning -1?
 
 Or am I missing something?

Could work, but it looks overcomplicated. It could be intuitive, but even then someone new would not be able to figure out what is actually going on, without digging deep into the internals of the library (or the D language). I like my way better (returning two slices for search). Also, it wouldn't require this:
 Of course, this would require a way to resolve ambiguities, i.e.
functions/statements with preferences - for instance, if() would "prefer" bool
over int. I don't know if this is possible.

...and with my way, it's very simple to check if the search was successful, e.g.

  void myfind(char[] text, char[] search_for, out char[] before, out char[] after);

  char[] before, after;
  myfind(text, something, before, after);
  //was it found?
  bool was_found = !!after.length;
  //where was it found?
  int at = before.length;

Both operations are frequently needed and don't require you to reference text or something again, which means they can be returned by other functions, and you don't need to break the "flow" by putting them into temporary variables.

With multiple return values, the signature of myfind() could become nicer, too:

  auto before, after = myfind(text, something);

(Or at least allow static arrays as return values for functions.)

Am _I_ missing something?
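
As a point of reference, current Phobos has a function with nearly this shape: `findSplitBefore` in std.algorithm.searching returns the part before the match and the remainder starting at the match. The sketch below assumes a non-empty needle, so that a non-empty second slice means "found", as in the myfind() example above.

```d
import std.algorithm.searching : findSplitBefore;

void main()
{
    string text = "key=value";

    auto r = findSplitBefore(text, "=");
    assert(r[0] == "key");        // everything before the match
    assert(r[1] == "=value");     // the match and everything after it

    bool wasFound = r[1].length != 0;   // was it found?
    size_t at = r[0].length;            // where was it found?
    assert(wasFound && at == 3);

    // On a miss, the first slice is the whole input and the second is empty.
    auto m = findSplitBefore(text, "#");
    assert(m[0] == text);
    assert(m[1].length == 0);
}
```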
May 15 2009
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 15 May 2009 09:36:51 -0400, grauzone <none example.net> wrote:

 downs wrote:
 Consider this type:
  struct StringPosition {
   size_t pos;
   void opImplicitCast(out size_t sz) {
     sz = pos;
   }
   void opImplicitCast(out bool b) {
     b = pos != -1;
   }
 }
  Wouldn't that effectively sidestep most problems people have with find  
 returning -1?
  Or am I missing something?

Could work, but it looks overcomplicated. It could be intuitive, but even then someone new would not be able to figure out what is actually going on, without digging deep into the internals of the library (or the D language). I like my way better (returning two slices for search). Also, it wouldn't require this:
 Of course, this would require a way to resolve ambiguities, i.e.  
 functions/statements with preferences - for instance, if() would  
 "prefer" bool over int. I don't know if this is possible.

 ...and with my way, it's very simple to check if the search was successful, e.g.

   void myfind(char[] text, char[] search_for, out char[] before, out char[] after);

   char[] before, after;
   myfind(text, something, before, after);
   //was it found?
   bool was_found = !!after.length;
   //where was it found?
   int at = before.length;

 Both operations are frequently needed and don't require you to reference text or something again, which means they can be returned by other functions, and you don't need to break the "flow" by putting them into temporary variables.

 With multiple return values, the signature of myfind() could become nicer, too:

   auto before, after = myfind(text, something);

 (Or at least allow static arrays as return values for functions.)

 Am _I_ missing something?

Your solution actually goes in the opposite direction from what I'd like. That is, it looks more complicated than simply returning an index or a slice. I don't want to have to declare return values ahead of time, and I'm not holding my breath for multiple return values. You may be able to return a pair struct, but still, what could be simpler than returning an index? It's easy to construct the value you want (before or after), and if you want multiple values, that is also possible (and probably results in simpler code).

-Steve
May 15 2009
parent reply grauzone <none example.net> writes:
 to return a pair struct, but still, what could be simpler than returning 
 an index?  It's easy to construct the value you want (before or after), 
 and if you both multiple values, that is also possible (and probably 
 results in simpler code).

All you can do with the index is:
1. compare it against the length of the searched string to test if the search was successful
2. slice the searched string
3. do something rather special

What else would you do? You'd just have to store the searched string as a temporary, and then you'd slice the searched string (for 2.), or compare the index against the length of the searched string (for 1.). You always have to keep the searched string in a temporary. That's rather impractical.

Oh sure, if you _really_ need the index (for 3.), then directly returning an index is of course the best way.

With my approach, you don't need to grab the passed searched string again. All of these can be done in a single, trivial expression (for 3. getting the index only). Actually, compared to your approach, this would just eliminate the trivial but annoying slicing code after the search call that you'd type in... what, 90% of all cases?

The thing about multiple return values is true (sadly), but in this case, you could simply return a static array (char[][2]). At least that should be possible in D2 at some point.

Maybe a struct would work fine too. But I don't like it, because the programmer would have to look up the struct members first. He would have to memorize the struct members, and couldn't tell what the function returns just by looking at the function signature. (Yay bikeshed issues.)
May 15 2009
parent grauzone <none example.net> writes:
 a good point.  The only drawback in this case is you are constructing 
 information you sometimes do not need or care about.  If all you want is 
 whether it succeeded or not, then you don't need two ranges constructed 
 and returned.  But therein lies a fundamental tradeoff that cannot be 
 avoided.  The very basic information you get is the index, and with 
 that, you can construct any larger pieces from the pieces you have, but 
 not always easily, and not without repeating identifiers.

The whole point of the search function is to make programming easier, isn't it? Its implementation is rather trivial. You call it because it makes your life easier. I don't see why constructing this "additional information" is a problem. Anyway, you always could move this to a second function. I just think that returning a tuple of slices is the most useful way.
 I like your approach, but with the single return type, not out 
 parameters.  Having out parameters would be a deal breaker.

I just wanted to show something that works on D1 without memory allocation, and without returning a struct.
 If this were implemented, the return type would be very common.  At some 
 point you have to look up everything (what's a "range"?).

I think multiple return values are simpler, and more versatile, elegant and intuitive. In contrast, having to define structs for return values of (almost) trivial functions is not a good sign. You could just as well pass all in-parameters of a function as a struct, claiming this is more practical, because then you can have named arguments and arbitrary default arguments. Huh.
May 15 2009
prev sibling parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Fri, 15 May 2009 10:30:17 -0400, grauzone <none example.net> wrote:

 to return a pair struct, but still, what could be simpler than  
 returning an index?  It's easy to construct the value you want (before  
 or after), and if you both multiple values, that is also possible (and  
 probably results in simpler code).

 All you can do with the index is:
 1. compare it against the length of the searched string to test if the search was successful
 2. slice the searched string
 3. do something rather special

 What else would you do? You'd just have to store the searched string as a temporary, and then you'd slice the searched string (for 2.), or compare the index against the length of the searched string (for 1.). You always have to keep the searched string in a temporary. That's rather impractical.

 Oh sure, if you _really_ need the index (for 3.), then directly returning an index is of course the best way.

 With my approach, you don't need to grab the passed searched string again. All of these can be done in a single, trivial expression (for 3. getting the index only). Actually, compared to your approach, this would just eliminate the trivial but annoying slicing code after the search call that you'd type in... what, 90% of all cases?

I hadn't thought of the case where you are calling *on* a temporary; I always had in mind that the source string was already declared. This is a good point. The only drawback in this case is you are constructing information you sometimes do not need or care about. If all you want is whether it succeeded or not, then you don't need two ranges constructed and returned. But therein lies a fundamental tradeoff that cannot be avoided. The very basic information you get is the index, and with that, you can construct any larger pieces from the pieces you have, but not always easily, and not without repeating identifiers.

I like your approach, but with the single return type, not out parameters. Having out parameters would be a deal breaker. I'd prefer not to have two strings but a string that has an identified pivot point. You could generate the desired left and right hand sides dynamically, and it would work without any changes to the current syntax.

For example:

  struct partition(R)
  {
    R range;
    uint pivot;
    R lhs() {return range[0..pivot];}
    R rhs() {return range[pivot..$];}
    bool found() {return pivot < range.length;}
  }

  partition!string indexOf(string haystack, dchar needle);

usage:

  string s = str.find("hi").rhs; // or .lhs or .found or .pivot

 Maybe a struct would work fine too. But I don't like it, because the  
 programmer had to look up the struct members first. He had to memorize  
 the struct members, and couldn't tell what the function returns just by  
 looking at the function signature.

If this were implemented, the return type would be very common. At some point you have to look up everything (what's a "range"?).

-Steve
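
The pivot idea from this post can be made runnable in D2 roughly as follows. This is only a sketch: the names `Partition` and `findPivot` are hypothetical, the pivot is stored as size_t rather than uint, and a miss is represented by a pivot equal to the range's length.

```d
import std.string : indexOf;

struct Partition(R)
{
    R range;
    size_t pivot;   // index of the match, or range.length if not found

    R lhs() { return range[0 .. pivot]; }
    R rhs() { return range[pivot .. $]; }
    bool found() { return pivot < range.length; }
}

// Hypothetical search function returning the haystack with its pivot.
Partition!string findPivot(string haystack, string needle)
{
    auto i = indexOf(haystack, needle);   // ptrdiff_t, -1 on failure
    return Partition!string(haystack,
                            i < 0 ? haystack.length : cast(size_t) i);
}

void main()
{
    // Works on a temporary: no need to name the haystack twice.
    assert("say hi there".findPivot("hi").rhs == "hi there");
    assert("say hi there".findPivot("hi").lhs == "say ");
    assert("say hi there".findPivot("hi").pivot == 4);

    auto miss = "say hi there".findPivot("bye");
    assert(!miss.found);
    assert(miss.lhs == "say hi there");
    assert(miss.rhs == "");
}
```

Only the index is computed up front; the slices are built on demand, which addresses the concern about constructing information the caller may not need.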
May 15 2009
prev sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
downs wrote:
 Consider this type:
 
 struct StringPosition {
   size_t pos;
   void opImplicitCast(out size_t sz) {
     sz = pos;
   }
   void opImplicitCast(out bool b) {
     b = pos != -1;
   }
 }
 
 Wouldn't that effectively sidestep most problems people have with find
returning -1?
 
 Or am I missing something?
 
 Of course, this would require a way to resolve ambiguities, i.e.
functions/statements with preferences - for instance, if() would "prefer" bool
over int. I don't know if this is possible.

Just use two functions: find and contains.
May 15 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Christopher Wright:
 Just use two functions: find and contains.

Or better, define a built-in operator, you may call it "in" :-)

  'e' in "hello" => true

(The compiler may even cache the resulting position somewhere, so a successive find can be very fast).

Bye,
bearophile
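
Both suggestions can be sketched in current D2. Phobos already separates the two questions: `canFind` (std.algorithm.searching) answers "is it there?", while `indexOf` (std.string) answers "where?". The `in` operator is not defined for strings, so the example below uses a hypothetical wrapper struct `Text` with `opBinaryRight` to illustrate what such an operator could look like in user code.

```d
import std.algorithm.searching : canFind;
import std.string : indexOf;

// Hypothetical wrapper that gives a string an "in" operator.
struct Text
{
    string s;

    // Invoked for: c in Text(...)
    bool opBinaryRight(string op)(dchar c) if (op == "in")
    {
        return s.canFind(c);
    }
}

void main()
{
    // Two functions: one for membership, one for position.
    assert("hello".canFind('e'));
    assert("hello".indexOf('e') == 1);
    assert(!"hello".canFind('z'));

    // The operator form.
    assert('e' in Text("hello"));
    assert(!('z' in Text("hello")));
}
```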
May 15 2009