digitalmars.D - *Should* char[] be an output range?

"monarch_dodra" <monarchdodra gmail.com> writes:

I'm finalizing work on improvements to "put". The main
improvement is that put will now be able to transcode on the fly,
allowing it to do things such as putting a dchar into char sink.

What this means is that "(const(char)[]){}" is now considered an
output range for dchar. This wasn't the case before, which was a
source of bugs in functions like formattedWrite. Those bugs didn't
get much visibility, since Appender (the sink of choice for tests)
can do it natively, but passing a delegate char sink to
formattedWrite often ended in an error.
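
To make the delegate case concrete, here is a minimal sketch of what transcoding "put" allows. It assumes a Phobos version in which std.range.primitives.put encodes a dchar for a char sink, as described above; "encodeViaSink" is a hypothetical helper written only for this illustration.

```d
import std.range.primitives : put, isOutputRange;

// Hypothetical helper (not part of Phobos): pushes one code point through a
// delegate sink that only accepts UTF-8 slices, and returns what it collected.
string encodeViaSink(dchar c)
{
    char[] collected;
    auto sink = (const(char)[] chunk) { collected ~= chunk; };

    // Assumption: with transcoding put, the delegate counts as an output range of dchar.
    static assert(isOutputRange!(typeof(sink), dchar));

    put(sink, c); // put encodes c to UTF-8, then hands the code units to the sink
    return collected.idup;
}

unittest
{
    assert(encodeViaSink('a') == "a");
    assert(encodeViaSink('\u00E9') == "\u00E9");
    assert(encodeViaSink('\u00E9').length == 2); // U+00E9 takes two UTF-8 code units
}
```

The block can be checked with "dmd -unittest -main".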

In any case, my main question is:

*currently*, "char[]" isn't considered an output range. Shouldn't
it be, though? The rationale is that it contains dchar elements, and
we don't know how to "put" a dchar in a char[]'s front. But we
do now.

Was this just an implementation restriction? Or is there a real
good reason to not allow it?

In my code, it's a one-line tweak to "unlock" char[] as a fully
fledged output range for char/wchar/dchar/string/wstring/dstring.
Should I do it?

Sep 02 2013

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Monday, September 02, 2013 10:04:20 monarch_dodra wrote:
 I'm finalizing work on improvements to "put". The main
 improvement is that put will now be able to transcode on the fly,
 allowing it to do things such as putting a dchar into char sink.
 
 What this means is that "(const(char)[]){}" is now considered an
 output range for dchar. This wasn't the case before, and a source
 of bugs in functions like formattedWrite: They didn't get much
 visibility, since Appender (the sink of choice for tests) can do
 it natively. But passing a delegate char sink to formattedWrite
 often ended in error.
 
 In any case, my main question is:
 
 *currently*, "char[]" isn't considered an output range. Shouldn't
 it though? The rationale is that it contains dchar elements, and
 we don't know how to "put" a dchar in a char[]'s front. But we
 do now.
 
 Was this just an implementation restriction? Or is there a real
 good reason to not allow it?
 
 In my code, it's a one line tweak to "unlock" char[] as a full
 fledged output range for char/wchar/dchar/string/wstring/dstring.
 Should I do it?

The main problem has to do with what you do when there's not enough room to 
write to it. Even checking the length isn't enough, because it's unknown ahead 
of time whether a particular code point (or string of characters if calling 
put with multiple characters) will fit - at least not without converting the 
character to UTF-8 first, which would incur additional cost, since presumably 
that would result in it being converted twice (once to check and once when 
actually putting it).

Of course, it's a problem in general with output ranges that we haven't defined 
how to check whether put will succeed or what the normal procedure is when 
it's going to fail. My first suggestion for that would be to make it so that 
put returned whether it was successful or not, but it's something that 
probably needs to be discussed. However, with that problem solved, it may be 
reasonable to make char[] an output range.
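
As an illustration of the two points above (a non-empty char[] may still be too small, and put could report success), here is a minimal sketch. "tryPutUtf8" is a hypothetical function, not part of Phobos; it encodes once into a temporary buffer with std.utf.encode and copies only if the result fits, which is one possible way to avoid converting twice.

```d
import std.utf : encode;

// Hypothetical: a put-like function for char[] that reports success.
// It encodes once into a stack buffer, then copies only if there is room.
bool tryPutUtf8(ref char[] sink, dchar c)
{
    char[4] buf;
    immutable len = encode(buf, c);  // number of UTF-8 code units written to buf
    if (len > sink.length)
        return false;                // not enough room: leave the sink untouched
    sink[0 .. len] = buf[0 .. len];
    sink = sink[len .. $];           // advance past what was written
    return true;
}

unittest
{
    char[2] storage;
    char[] sink = storage[];

    assert(tryPutUtf8(sink, 'a'));        // one code unit, fits
    assert(sink.length == 1);             // the sink is not empty yet...
    assert(!tryPutUtf8(sink, '\u00E9'));  // ...but a two-unit code point does not fit
    assert(sink.length == 1);             // a failed put changes nothing
    assert(tryPutUtf8(sink, 'b'));
    assert(storage[] == "ab");
}
```

The second-to-last put is the awkward case: checking that the slice is non-empty says nothing about whether the next code point fits.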

But until we sort out some of the remaining details of output ranges (most 
particularly how to deal with put failing, but there are probably other issues 
that I'm not thinking of at the moment), I don't think that it's a good idea 
to change how char[] is treated, since how all of that is sorted out could 
have an effect on what we do with char[].

- Jonathan M Davis

Sep 03 2013

"monarch_dodra" <monarchdodra gmail.com> writes:

On Wednesday, 4 September 2013 at 05:00:58 UTC, Jonathan M Davis wrote:
 The main problem has to do with what you do when there's not enough room to
 write to it. Even checking the length isn't enough, because it's unknown ahead
 of time whether a particular code point (or string of characters if calling
 put with multiple characters) will fit - at least not without converting the
 character to UTF-8 first, which would incur additional cost, since presumably
 that would result in it being converted twice (once to check and once when
 actually putting it).

 Of course, it's a problem in general with output ranges that we haven't defined
 how to check whether put will succeed or what the normal procedure is when
 it's going to fail. My first suggestion for that would be to make it so that
 put returned whether it was successful or not, but it's something that
 probably needs to be discussed. However, with that problem solved, it may be
 reasonable to make char[] an output range.

 But until we sort out some of the remaining details of output ranges (most
 particularly how to deal with put failing, but there are probably other issues
 that I'm not thinking of at the moment), I don't think that it's a good idea
 to change how char[] is treated, since how all of that is sorted out could
 have an effect on what we do with char[].

 - Jonathan M Davis

Thanks for your reply. Yeah, the debate always boils down to the
same questions: are arrays/input ranges output ranges, what do we
do when they are "full" (e.g. empty), and can we even detect it?

I think that, given your answer, I will simply *not* make them
output ranges, but I *will* make sure that turning them into
output ranges later is easy.

FYI (but it might need a little bit more work), I think I may be 
on to something. In my implementation, I defined a private 
function called "doPut". "doPut" is basically the last "atomic" 
operation in the "put" functionality.

Given a sink "s" and an element "e", it makes exactly one of the
following calls, whichever is valid for the sink:
s.put(e);
s.front = e;
s(e);

It does *not* iterate over "e" if it is a range. It does *not*
attempt to wrap "e" as [e], and it does *not* transcode "e". What
this means is that if you write "doPut(s, e)", then you are
putting *exactly* the single element "e" into "s". It means that
"s" is a "native" output range for "e": it doesn't need any help
from "put".
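
A minimal sketch of the dispatch described above, for illustration only: "doPutSketch" and "Collector" are hypothetical names, and this is not the actual private implementation. The popFront after assigning front is an assumption, mirroring how std.range's put advances an input-range sink.

```d
import std.range.primitives : front, popFront;

// Hypothetical stand-in for the private "doPut": picks exactly one direct call,
// never iterates over e, never wraps it as [e], never transcodes it.
void doPutSketch(S, E)(ref S s, E e)
{
    static if (is(typeof(s.put(e))))
        s.put(e);                 // the sink has its own put member
    else static if (is(typeof(s.front = e)))
    {
        s.front = e;              // the sink is a range with an assignable front
        s.popFront();             // assumption: step past the slot just written
    }
    else static if (is(typeof(s(e))))
        s(e);                     // the sink is callable (delegate, function, opCall)
    else
        static assert(false, "Cannot put a " ~ E.stringof
                             ~ " directly into a " ~ S.stringof);
}

// Hypothetical sink with a put member, used only by the test below.
struct Collector
{
    int[] items;
    void put(int x) { items ~= x; }
}

unittest
{
    Collector c;
    doPutSketch(c, 1);
    assert(c.items == [1]);

    int[2] storage;
    int[] slice = storage[];
    doPutSketch(slice, 7);
    assert(storage[0] == 7 && slice.length == 1);

    int total;
    auto dg = (int x) { total += x; };
    doPutSketch(dg, 5);
    assert(total == 5);

    // A dchar is not a "native" element of char[]: none of the three calls compiles.
    char[] narrow;
    static assert(!__traits(compiles, doPutSketch(narrow, dchar('a'))));
}
```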

From there, I defined the package trait "isNativeOutputRange(S,
E)". If a pair S/E satisfies this "Native" variant of isOutputRange,
then the user has the *guarantee* that "put(s, e)" will place
*exactly* "e" into "s". This is particularly interesting for:
1. InputRanges: if the range is not empty, the *put* is guaranteed
to not overflow.
2. Certain format functions, such as "std.encoding.encode(C,
S)(dchar c, S sink)", which will transcode c into elements of type
C and place them in the output range S. Here, it is *vital* that no
transcoding happen. My "doPut"/Native trait helps guarantee this.
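
For illustration, such a trait could be sketched as below. "isNativeOutputRangeSketch" and "CharSink" are hypothetical names, and the definition is an assumption about how a "native" check might look, not the package trait itself. The last assertion assumes a Phobos version where put transcodes for delegate sinks.

```d
import std.range.primitives : front, popFront, isOutputRange;

// Hypothetical trait: true only if one of the three direct calls compiles
// for the pair S/E, with no iteration, wrapping, or transcoding involved.
enum isNativeOutputRangeSketch(S, E) =
    is(typeof((S s, E e) { s.put(e); })) ||
    is(typeof((S s, E e) { s.front = e; s.popFront(); })) ||
    is(typeof((S s, E e) { s(e); }));

alias CharSink = void delegate(const(char)[]);

// Plain arrays are native sinks for their own element type.
static assert( isNativeOutputRangeSketch!(int[], int));
static assert( isNativeOutputRangeSketch!(dchar[], dchar));

// char[] has no assignable front, so a dchar cannot be placed directly.
static assert(!isNativeOutputRangeSketch!(char[], dchar));

// A char delegate takes strings natively, but a dchar needs transcoding:
// an output range in the general sense, yet not a "native" one.
static assert( isNativeOutputRangeSketch!(CharSink, string));
static assert(!isNativeOutputRangeSketch!(CharSink, dchar));
static assert( isOutputRange!(CharSink, dchar));
```

The CharSink/dchar pair shows the gap between the two traits: "put" accepts it by transcoding, while the native check rejects it.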

--------

Well, right now, it is only used as a private implementation
detail, but it works pretty well. It might be worth investigating
in more detail if we want this public?

Sep 04 2013
