digitalmars.D - One way to deal with transient ranges & char[] buffers

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
Recently, I discovered an interesting idiom for dealing with transient
ranges, esp. w.r.t. strings / char[]. Many places in D require string,
but sometimes what you have is char[] which can't be converted to string
except by .idup. But you don't want to .idup in generic code, because if
the input's already a string, then it's wasteful duplication. The
solution is to do this:

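	import std.conv : to;
	import std.traits : isSomeString;

	void func(S)(S input) if (isSomeString!S) {
		// No-op for string; copies (.idup) for char[] / const(char)[]
		string x = to!string(input);
		... // use at will
	}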

The nice thing about this is that if input is already a string,
to!string does nothing, and if it's char[] or const(char)[], to!()
handles calling .idup for you so you don't have to pollute generic code
with it.
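
A minimal sketch of the behaviour described above, assuming a current Phobos. The helper name `asString` is hypothetical and only there for illustration; the pointer comparisons check that an existing string is passed through without copying, while a char[] yields an independent immutable copy.

```d
import std.conv : to;
import std.traits : isSomeString;

// Hypothetical helper (not from the original post): wraps the idiom.
string asString(S)(S input) if (isSomeString!S)
{
    return to!string(input);
}

void main()
{
    string s = "hello";
    char[] m = "world".dup;

    string a = asString(s);
    string b = asString(m);

    // Already a string: the result aliases the same memory, no copy made.
    assert(a.ptr is s.ptr);

    // char[]: the result is a fresh immutable copy.
    assert(b.ptr !is m.ptr);

    // Mutating the original buffer does not affect the copy.
    m[0] = 'W';
    assert(b == "world");
}
```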

When writing a generic function that takes a range of some string, and
you're not sure if the range is transient or not, the same trick helps
ensure that you don't run into transience-related problems:

	auto func(R)(R range) if (isSomeString!(ElementType!R)) {
		struct WrapperRange {
			...
			auto front() {
				// This ensures we don't run into
				// transience related problems, and that
				// the range we return will *not* be
				// transient.
				return range.front.to!string();
			}
		}
		return WrapperRange(...);
	}
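
One way the wrapper above could be filled in, as a sketch only. `ReusedBufferWords` and `nonTransient` are hypothetical names invented for this example; the first is a deliberately transient range that overwrites one shared buffer on every popFront (assuming words of at most 16 characters), and the second applies the to!string trick in front.

```d
import std.conv : to;
import std.range.primitives : ElementType, isInputRange, empty, front, popFront;
import std.traits : isSomeString;

// Hypothetical transient range: every front is a slice of the same buffer.
struct ReusedBufferWords
{
    private string[] words;
    private char[] buf;
    private size_t len;

    this(string[] words)
    {
        this.words = words;
        buf = new char[16]; // assumption: no word is longer than 16 chars
        refill();
    }

    @property bool empty() const { return words.length == 0; }
    @property char[] front() { return buf[0 .. len]; }
    void popFront() { words = words[1 .. $]; refill(); }

    private void refill()
    {
        if (words.length == 0) return;
        assert(words[0].length <= buf.length);
        len = words[0].length;
        buf[0 .. len] = words[0][]; // overwrite the shared buffer
    }
}

// Hypothetical wrapper: each front is converted to string, so the
// elements handed out no longer depend on the source's buffer.
auto nonTransient(R)(R range)
    if (isInputRange!R && isSomeString!(ElementType!R))
{
    static struct WrapperRange
    {
        R source;
        @property bool empty() { return source.empty; }
        @property string front() { return source.front.to!string; }
        void popFront() { source.popFront(); }
    }
    return WrapperRange(range);
}

void main()
{
    auto words = ["alpha", "beta", "gamma"];

    // Storing fronts of the transient range keeps slices of one buffer,
    // so earlier elements are overwritten by later ones.
    char[][] raw;
    foreach (w; ReusedBufferWords(words)) raw ~= w;
    assert(raw[0] == "gamma");

    // Through the wrapper, every stored element is an independent string.
    string[] safe;
    foreach (w; nonTransient(ReusedBufferWords(words))) safe ~= w;
    assert(safe == ["alpha", "beta", "gamma"]);
}
```

Note that front allocates on every call when the source yields char[], so calling it repeatedly for the same element costs a copy each time.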


T

-- 
Freedom: (n.) Man's self-given right to be enslaved by his own depravity.
Aug 01 2013
"Jakob Ovrum" <jakobovrum gmail.com> writes:
On Friday, 2 August 2013 at 05:35:28 UTC, H. S. Teoh wrote:
 Recently, I discovered an interesting idiom for dealing with 
 transient
 ranges, esp. w.r.t. strings / char[]. Many places in D require 
 string,
 but sometimes what you have is char[] which can't be converted 
 to string
 except by .idup. But you don't want to .idup in generic code, 
 because if
 the input's already a string, then it's wasteful duplication.

Places in D that require `string` either do so because they need the immutable guarantee or they do so out of error (e.g. should have used a string of const characters instead). The latter can of course be worked around, but the only *solution* involves fixing the upstream code, so I'll assume we're discussing the former case.

We don't have any generic mechanism for deep copying ranges. The `save` primitive is often implemented by means of copying, but conceptually it is doing something very different, so it cannot be applied here. So, I don't see how your idea translates to ranges in general (not completely sure if it was intended to). Thus, let's tackle the case of arrays/slices in particular, of which strings are the most common example.

There is a precedent in D to push the decision to copy an array upwards in the code. When the operations at hand require the immutable guarantee, state it in the interface of the code, such as by asking for `string` on a function's parameter. That's why so many functions take `string` when they need the immutable guarantee, as opposed to taking `const(char)[]` or a template parameter and then making a GC copy. This way, copies are not only minimized, but centralized more in user code where they are more visible, and the method of making the copy - remember, not all client code is fine with rampant GC use - is also pushed up.

Also, copies are one thing, but what if the caller had a string but in a different encoding? Not only does an allocation have to be made, but decoding and encoding is also necessary; the details of how to handle this are also pushed up, with the same benefits. It's a pretty mainstream idiom and is often reiterated by members of the community, such as in Ali's talk at DConf.

Your proposed solution only shares one benefit with the solution described above - that if the direct caller had a `string` already (or a range of `string`s), nothing has to be done. It forfeits all the other benefits for convenience.

It also has problems with template bloat, which can be fixed but at a syntactical cost. Overall I think it reduces the genericity of algorithms by trying to handle input types it doesn't actually support, which can be a big problem for performance-critical code.
Aug 02 2013
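
To make the alternative from the post above concrete, here is a small sketch of an interface that asks for `string` where it needs immutability and leaves the copy to the caller. `NameRegistry` and its members are hypothetical, invented for this example.

```d
// Hypothetical type: it stores names long-term, so it asks for string.
struct NameRegistry
{
    private string[] names;

    // Needs the immutable guarantee: state it in the signature, copy nothing.
    void add(string name) { names ~= name; }

    // Only reads its argument: const(char)[] accepts string and char[] alike.
    bool contains(const(char)[] name) const
    {
        foreach (n; names)
            if (n == name) return true;
        return false;
    }
}

void main()
{
    NameRegistry reg;

    string lit = "alice";
    reg.add(lit); // caller already has a string: no allocation anywhere

    char[] buf = "bob".dup;
    reg.add(buf.idup); // the copy is made, and visible, in the caller

    // Passing the mutable buffer directly is rejected by the compiler.
    static assert(!__traits(compiles, reg.add(buf)));

    buf[0] = 'r'; // later mutation cannot affect the stored name
    assert(reg.contains("bob"));
    assert(reg.contains(lit));
    assert(!reg.contains(buf)); // buf now holds "rob"
}
```

A caller that avoids the GC can produce the `string` by other means; the registry does not need to know how.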
"monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 2 August 2013 at 05:35:28 UTC, H. S. Teoh wrote:
 	void func(S)(S input) if (isSomeString!S) {
 		string x = to!string(input);
 		... // use at will
 	}

+1. I saw this used recently, and I find it very clever.
Aug 02 2013
"monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 2 August 2013 at 07:50:28 UTC, Jakob Ovrum wrote:
 Places in D that require `string` either do so because they 
 need the immutable guarantee or they do so out of error (e.g. 
 should have used a string of const characters instead). The 
 latter can of course be worked around, but the only *solution* 
 involves fixing the upstream code, so I'll assume we're 
 discussing the former case.

One of the problems is often the return type. For example, readln will return a string, because it is simply more convenient in end user code. The "standard" string format in D is "string", so that's what it returns. However, if you call readln!(char[]), then you'll get the *exact* same string, but of a non-immutable type. This means that you *can* get the best of both worlds, at no extra run-time cost. I think more functions should do this.

Without doing this, you face the eternal problem: Should I return a "string", to give my end user more guarantees, when in fact my char array is perfectly mutable, or should I return a char[], forcing my end user to make an idup (or an unsafe cast) if he actually needed a string? It's a tough problem to tackle.
Aug 02 2013
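
A sketch of the return-type pattern described in the post above, modeled loosely on how readln lets the caller pick the string type. `repeatChar` is a hypothetical function, and the cast is only justified by the assumption that the freshly allocated buffer has no other references.

```d
// Hypothetical function: the caller chooses the result type, string by default.
S repeatChar(S = string)(char c, size_t n)
    if (is(S == string) || is(S == char[]))
{
    auto buf = new char[n];
    buf[] = c;
    // buf is the only reference to this memory, so handing it out as
    // immutable is safe here; for S == char[] the cast is a no-op.
    return cast(S) buf;
}

void main()
{
    // Default: a string, with no extra copy.
    string s = repeatChar('a', 3);
    assert(s == "aaa");
    static assert(is(typeof(repeatChar('a', 1)) == string));

    // On request: the same data as a mutable array, again with no copy.
    char[] m = repeatChar!(char[])('b', 3);
    m[0] = 'c';
    assert(m == "cbb");
}
```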
"Tobias Pankrath" <tobias pankrath.net> writes:
On Friday, 2 August 2013 at 08:46:18 UTC, monarch_dodra wrote:
 Without doing this, you face the eternal problem: Should I 
 return a "string", to give my end user more guarantees, when in 
 fact my char array is perfectly mutable, or should I return a 
 char[], forcing my end user to make an idup(or an unsafe cast) 
 if he actually needed a string?

 It's a tough problem to tackle.

The solution has been posted to this newsgroup already in the form of unique/unaliased types. That's THE selling point for them. If you have an array of char and it could be char[] as well as string, then let the type system infer the right one for you.
Aug 02 2013
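
The unique-types proposal itself is not sketched here. As a related illustration only: D already treats the result of a pure function as unique when its parameters carry no mutable references, so the same call can initialize either a char[] or a string without a cast or .idup. `filled` is a hypothetical name.

```d
// Hypothetical pure factory: its parameters have no mutable indirections,
// so the compiler can treat the returned array as unique.
char[] filled(char c, size_t n) pure
{
    auto buf = new char[n];
    buf[] = c;
    return buf;
}

void main()
{
    // The declared type of the variable decides mutability.
    char[] m = filled('x', 3);
    string s = filled('y', 3); // implicit conversion to immutable

    m[0] = 'z';
    assert(m == "zxx");
    assert(s == "yyy");
}
```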
"monarch_dodra" <monarchdodra gmail.com> writes:
On Friday, 2 August 2013 at 11:19:05 UTC, Tobias Pankrath wrote:
 On Friday, 2 August 2013 at 08:46:18 UTC, monarch_dodra wrote:
 Without doing this, you face the eternal problem: Should I 
 return a "string", to give my end user more guarantees, when 
 in fact my char array is perfectly mutable, or should I return 
 a char[], forcing my end user to make an idup(or an unsafe 
 cast) if he actually needed a string?

 It's a tough problem to tackle.

 The solution has been posted to this newsgroup already in the form of unique/unaliased types. That's THE selling point for them. If you have an array of char and it could be char[] as well as string, then let the type system infer the right one for you.

Interesting. Thanks for the tip. I'll look these up.
Aug 02 2013
"Tobias Pankrath" <tobias pankrath.net> writes:
On Friday, 2 August 2013 at 11:27:48 UTC, monarch_dodra wrote:
 On Friday, 2 August 2013 at 11:19:05 UTC, Tobias Pankrath wrote:
 On Friday, 2 August 2013 at 08:46:18 UTC, monarch_dodra wrote:
 Without doing this, you face the eternal problem: Should I 
 return a "string", to give my end user more guarantees, when 
 in fact my char array is perfectly mutable, or should I 
 return a char[], forcing my end user to make an idup(or an 
 unsafe cast) if he actually needed a string?

 It's a tough problem to tackle.

 The solution has been posted to this newsgroup already in the form of unique/unaliased types. That's THE selling point for them. If you have an array of char and it could be char[] as well as string, then let the type system infer the right one for you.

 Interesting. Thanks for the tip. I'll look these up.

http://www.digitalmars.com/d/archives/digitalmars/D/Immutable_and_unique_in_C_180572.html
Aug 02 2013
"Jakob Ovrum" <jakobovrum gmail.com> writes:
On Friday, 2 August 2013 at 08:37:50 UTC, monarch_dodra wrote:
 On Friday, 2 August 2013 at 05:35:28 UTC, H. S. Teoh wrote:
 	void func(S)(S input) if (isSomeString!S) {
 		string x = to!string(input);
 		... // use at will
 	}

 +1. I saw this used recently, and I find it very clever.

It's really not. It takes the decision to make a GC-allocated copy away from the client code in return for some minor convenience.
Aug 02 2013