
digitalmars.D - Our Sister

reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
I've been working on RCStr (endearingly pronounced "Our Sister"), D's 
up-and-coming reference counted string type. The goals are:

* Reference counted, shouldn't leak if all instances destroyed; even if 
not, use the GC as a last-resort reclamation mechanism.

* Entirely @safe.

* Support UTF 100% by means of RCStr!char, RCStr!wchar etc. but also raw 
manipulation and custom encodings via RCStr!ubyte, RCStr!ushort etc.

* Support several views of the same string, e.g. given s of type 
RCStr!char, it can be iterated byte-wise, code point-wise, code 
unit-wise etc. by using s.by!ubyte, s.by!char, s.by!dchar etc.

* Support const and immutable qualifiers for the character type.

* Work well with const and immutable when they qualify the entire RCStr 
type.

* Fast: use the small string optimization and various other layout and 
algorithms to make it a good choice for high performance strings

RFC: what primitives should RCStr have?


Thanks,

Andrei
May 26 2016
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
wrote:
 I've been working on RCStr (endearingly pronounced "Our Sister")
You really should actually mention RCStr in the subject line so people overwhelmed with the staggering amount of off topic chatter on this forum don't disregard this thread too.
May 26 2016
next sibling parent ixid <adamsibson hotmail.com> writes:
On Thursday, 26 May 2016 at 16:20:37 UTC, Adam D. Ruppe wrote:
 On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
 wrote:
 I've been working on RCStr (endearingly pronounced "Our 
 Sister")
You really should actually mention RCStr in the subject line so people overwhelmed with the staggering amount of off topic chatter on this forum don't disregard this thread too.
To be fair using a forum called 'General' for technical discussion is asking for trouble. We will be able to tell when D actually starts to become popular because this part of the forum will cease to function as it's inundated with newbies who expect it to mean general questions or something similar.
May 26 2016
prev sibling next sibling parent Joakim <dlang joakim.fea.st> writes:
On Thursday, 26 May 2016 at 16:20:37 UTC, Adam D. Ruppe wrote:
 On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
 wrote:
 I've been working on RCStr (endearingly pronounced "Our 
 Sister")
You really should actually mention RCStr in the subject line so people overwhelmed with the staggering amount of off topic chatter on this forum don't disregard this thread too.
Where do you see all this "chatter?" Looking at the topics for the last 10 days, I only see one not about D generally, and it's labeled OT.
May 26 2016
prev sibling next sibling parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thursday, May 26, 2016 16:20:37 Adam D. Ruppe via Digitalmars-d wrote:
 On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu

 wrote:
 I've been working on RCStr (endearingly pronounced "Our Sister")
You really should actually mention RCStr in the subject line so people overwhelmed with the staggering amount of off topic chatter on this forum don't disregard this thread too.
Yeah. I was about to ignore this thread as being clearly OT until I saw that it was started by Andrei. - Jonathan M Davis
May 26 2016
prev sibling parent reply Pete <shootme nospam.us> writes:
I post this only as a warning to others.

Imagine being the kind of person who isn't certain he could 
actually get Hello World past the D compiler  -but (and?) sees 
the subject "Our Sister" and immediately thinks:
"oh, Alexandrescu must be referring to his sister who is a doctor 
and did the art on the book cover".

    ---Welcome to the world of the PL trainspotter.---

Shoot-me shoot-me shoot-me.

It gets worse: I'm at the supermarket the other day, and the guy 
at the checkout has a strong Afrikaans accent.
I find myself saying to him; "umm if you right now, like 
hypothetically, heard the sound of hooves  -would you think of 
horses or zebras"
No lie. Try working *that* into a brief conversation about 
whether you have a store loyalty card.

Forget  the Star Wars allusion  -think Aliens ...when the Ripley 
character mercifully torches the wretched mutant clones of 
herself.

I was an actual programmer once.

please, somebody ..kill .. me
May 27 2016
parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Friday, 27 May 2016 at 17:08:33 UTC, Pete wrote:
 ...
Please don't derail this conversation. If you have a complaint please make it in a separate thread and tag it OT.
May 27 2016
parent reply Pete <shootme nospam.com> writes:
and tag it OT.<<
read the subject line slowly Jack ..but I appreciate your witty use of the word derail. If anyone calls, Jack and I will be over at stack overflow gleefully closing down the derailers there.
On Friday, 27 May 2016 at 17:37:20 UTC, Jack Stouffer wrote:
 On Friday, 27 May 2016 at 17:08:33 UTC, Pete wrote:
 ...
Please don't derail this conversation. If you have a complaint please make it in a separate thread and tag it OT.
May 27 2016
next sibling parent Jack Stouffer <jack jackstouffer.com> writes:
On Friday, 27 May 2016 at 19:35:58 UTC, Pete wrote:
 read the subject line slowly Jack
Sorry about that. I use the web interface and everything is grouped together even if it doesn't have the same subject line, so I didn't see that you changed it.
May 27 2016
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/27/2016 03:35 PM, Pete wrote:
 If anyone calls, Jack and I will be over at stack overflow gleefully
 closing down the derailers there.
Thanks for that. Not sure what your moniker is there, but I noticed a good number of solid answers to D questions on SO. Regarding the title, it was actually making a subtle point: if it's not marked as [OT] it's on topic! :o) Andrei
May 27 2016
prev sibling next sibling parent reply Gary Willoughby <dev nomad.so> writes:
On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
wrote:
 * Support several views of the same string, e.g. given s of 
 type RCStr!char, it can be iterated byte-wise, code point-wise, 
 code unit-wise etc. by using s.by!ubyte, s.by!char, s.by!dchar 
 etc.
Will s.by!Grapheme be supported too?
May 26 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/26/2016 12:58 PM, Gary Willoughby wrote:
 On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu wrote:
 * Support several views of the same string, e.g. given s of type
 RCStr!char, it can be iterated byte-wise, code point-wise, code
 unit-wise etc. by using s.by!ubyte, s.by!char, s.by!dchar etc.
Will s.by!Grapheme be supported too?
Yes. -- Andrei
May 26 2016
prev sibling next sibling parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
wrote:
 * Support const and immutable qualifiers for the character type.
How is that going, BTW? Last I heard you were having problems with inout/const.
 * Support several views of the same string, e.g. given s of 
 type RCStr!char, it can be iterated byte-wise, code point-wise, 
 code unit-wise etc. by using s.by!ubyte, s.by!char, s.by!dchar 
 etc.

 [snip]

 RFC: what primitives should RCStr have?
Well, because we already have the standard library functions representation, byUTF, byCodePoint, byCodeUnit, and byGrapheme, I think RCStr should provide these names as methods which all return ranges. If possible, these would all work regardless of character or integer type of the data. So in effect, RCStr would have completely encapsulated data. Let's not make the same mistake that we made with string et al. by providing a default. If at all possible, it would be great if it was also an output range.
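A minimal sketch of the views named above, using only functions that already exist in Phobos on a built-in string. How RCStr would expose them (methods versus free functions) is still open in this thread, so nothing below is RCStr API; the sample text is made up for illustration.

```d
import std.range : walkLength;
import std.string : representation;
import std.uni : byGrapheme;
import std.utf : byCodeUnit, byDchar;

void main()
{
    // 'e' followed by the combining diaeresis U+0308 (two UTF-8 bytes)
    string s = "noe\u0308l";

    assert(s.representation.length == 6); // raw immutable(ubyte)[] view
    assert(s.byCodeUnit.walkLength == 6);  // UTF-8 code units
    assert(s.byDchar.walkLength == 5);     // code points
    assert(s.byGrapheme.walkLength == 4);  // user-perceived characters
}
```

The same text has three different lengths depending on the view, which is the argument for making the caller pick one explicitly instead of providing a default.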
 RCStr
*bikeshedding*: How about RCString, because the convention for D names is to be explicit most of the time.
May 26 2016
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 26 May 2016 at 17:32:33 UTC, Jack Stouffer wrote:
 Well, because we already have the standard library functions 
 representation, byUTF
That would be templated so like byUTF!char and byUTF!wchar right?

Then byCodePoint can just be another name for byUTF!dchar. I kinda like that.

Ideally, the string type would also use lazy imports for any conversion table. So if you never call byGrapheme, it never imports the std.uni tables. (Heck, std.uni could be the one to provide that type, of course.)

Would an RCStr pass isSomeString? I kinda think it shouldn't. Actually, isSomeString probably shouldn't often be used - instead checking for string-like range capabilities is likely better for algorithms. Then doing some_algorithm(my_rcstr) fails - you must do some_algorithm(my_rcstr.some_range).
May 26 2016
next sibling parent Jack Stouffer <jack jackstouffer.com> writes:
On Thursday, 26 May 2016 at 17:50:36 UTC, Adam D. Ruppe wrote:
 That would be templated so like byUTF!char and byUTF!wchar 
 right?

 Then byCodePoint can just be another name for byUTF!dchar. I 
 kinda like that.

 Ideally, the string type would also use lazy imports for any 
 conversion table. So if you never call byGrapheme, it never 
 imports the std.uni tables. (Heck, std.uni could be the one to 
 provide that type, of course.)
This has the added benefit that it would automatically work with a lot of generic code that uses those functions.
 Would an RCStr pass isSomeString? I kinda think it shouldn't.
I agree, it shouldn't. isSomeString should only test for one of the language provided string types.
May 26 2016
prev sibling parent Jonathan M Davis via Digitalmars-d <digitalmars-d puremagic.com> writes:
On Thursday, May 26, 2016 17:50:36 Adam D. Ruppe via Digitalmars-d wrote:
 Would an RCStr pass isSomeString? I kinda think it shouldn't.
 Actually, isSomeString probably shouldn't often be used - instead
 checking for string-like range capabilities is likely better for
 algorithms. Then doing some_algorithm(my_rcstr) fails - you must
 do some_algorithm(my_rcstr.some_range)
RCStr definitely should _not_ pass isSomeString. Those traits specifically work only for the built-in types and not for stuff that acts like them. It's a disaster waiting to happen otherwise. We need to distinguish between testing for something that is a string and something that acts like one. - Jonathan M Davis
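A rough sketch of that distinction, assuming a hypothetical stand-in type `FakeRCStr` with a hypothetical `units` view (neither is real RCStr API): the struct fails isSomeString, and an algorithm constrained on range capabilities accepts an explicit view of it but not the bare object.

```d
import std.range.primitives : ElementType, isInputRange;
import std.traits : isSomeChar, isSomeString;
import std.utf : byCodeUnit;

// Hypothetical stand-in for RCStr; the real type is only a proposal here.
struct FakeRCStr
{
    private string payload;

    // Explicit view: the caller has to say which range it wants.
    auto units() { return payload.byCodeUnit; }
}

// Constrained on "range of characters", not on isSomeString.
size_t countSpaces(R)(R r)
if (isInputRange!R && isSomeChar!(ElementType!R))
{
    size_t n;
    foreach (c; r)
        if (c == ' ') ++n;
    return n;
}

static assert(isSomeString!string);
static assert(!isSomeString!FakeRCStr); // a struct is not a built-in string
static assert(!__traits(compiles, countSpaces(FakeRCStr("a b"))));

void main()
{
    auto s = FakeRCStr("a b c");
    assert(countSpaces(s.units) == 2);  // explicit view is accepted
    assert(countSpaces("a b c") == 2);  // built-in strings still work
}
```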
May 26 2016
prev sibling parent Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 26 May 2016 at 17:32:33 UTC, Jack Stouffer wrote:
 *bikeshedding*: How about RCString, because the convention for 
 D names is to be explicit most of the time.
+1
May 27 2016
prev sibling next sibling parent reply Xinok <xinok live.com> writes:
On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
wrote:
 I've been working on RCStr (endearingly pronounced "Our 
 Sister"), D's up-and-coming reference counted string type. The 
 goals are:
 ...
I don't know how practical this would be, but if at all feasible, I think one of the goals should be to have a common interface/primitives with regular strings so we can write generic functions which accept both native strings and RCStr. Otherwise, I second Jack's points.
 * Reference counted, shouldn't leak if all instances destroyed; 
 even if not, use the GC as a last-resort reclamation mechanism.
Could you (or somebody) elaborate a little on how this could work from a technical standpoint? The only way I see this working is if the GC always scans for RCStr-allocated memory, in which case, why even bother with RC?
May 26 2016
parent reply Seb <seb wilzba.ch> writes:
On Thursday, 26 May 2016 at 17:45:15 UTC, Xinok wrote:
 On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
 wrote:
 I've been working on RCStr (endearingly pronounced "Our 
 Sister"), D's up-and-coming reference counted string type. The 
 goals are:
 ...
I don't know how practical this would be, but if at all feasible, I think one of the goals should be to have a common interface/primitives with regular strings so we can write generic functions which accept both native strings and RCStr.
Great news! I think one can't stress this enough: If you want RCStr to be adapted it has to be a drop-in replacement for string. Maybe we can bundle the transition from auto-decoding with the adaption to a RCString. There was the proposal of having String without auto-decoding for this migration.
May 26 2016
next sibling parent jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 26 May 2016 at 18:44:42 UTC, Seb wrote:
 Great news!
 I think one can't stress this enough: If you want RCStr to be 
 adapted it has to be a drop-in replacement for string.

 Maybe we can bundle the transition from auto-decoding with the 
 adaption to a RCString. There was the proposal of having String 
 without auto-decoding for this migration.
I like these ideas (and RCString over RCStr).
May 26 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/26/2016 02:44 PM, Seb wrote:
 If you want RCStr to be adapted it has to be a drop-in replacement for
 string.
With all the criticism leveled against string, I thought more of the opposite. This is an opportunity to get it right. -- Andrei
May 26 2016
next sibling parent reply jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 26 May 2016 at 20:24:10 UTC, Andrei Alexandrescu 
wrote:
 On 05/26/2016 02:44 PM, Seb wrote:
 If you want RCStr to be adapted it has to be a drop-in 
 replacement for
 string.
With all the criticism leveled against string, I thought more of the opposite. This is an opportunity to get it right. -- Andrei
Hmm, I think it would be better to be right than necessarily a drop-in. I think the idea is so that you could change alias string = immutable(char)[]; to something using RCString and there would be minimal breakages.
May 26 2016
parent reply Seb <seb wilzba.ch> writes:
On Thursday, 26 May 2016 at 21:42:31 UTC, jmh530 wrote:
 On Thursday, 26 May 2016 at 20:24:10 UTC, Andrei Alexandrescu 
 wrote:
 On 05/26/2016 02:44 PM, Seb wrote:
 If you want RCStr to be adapted it has to be a drop-in 
 replacement for
 string.
With all the criticism leveled against string, I thought more of the opposite. This is an opportunity to get it right. -- Andrei
Hmm, I think it would be better to be right than necessarily a drop-in. I think the idea is so that you could change alias string = immutable(char)[]; to something using RCString and there would be minimal breakages.
Oh yes that's what I meant. Sorry for being so confusing. __Right__ is way more important than breakages. For that we have `dfix`.
May 26 2016
parent Dicebot <public dicebot.lv> writes:
On 05/27/2016 01:17 AM, Seb wrote:
 Oh yes that's what I meant. Sorry for being so confusing.
 __Right__ is way more important than breakages. For that we have `dfix`.
Don't get overly excited. dfix will never be capable of automatic fixup with such deep levels of semantic analysis required, this can only be done by compiler itself (which is currently not designed for fixup kind of tasks).
May 29 2016
prev sibling parent "H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:
On Thu, May 26, 2016 at 04:24:10PM -0400, Andrei Alexandrescu via Digitalmars-d
wrote:
 On 05/26/2016 02:44 PM, Seb wrote:
 If you want RCStr to be adapted it has to be a drop-in replacement
 for string.
With all the criticism leveled against string, I thought more of the opposite. This is an opportunity to get it right. -- Andrei
I'm not sure what criticism you're referring to. The only one I can think of is autodecoding, which isn't really an inherent part of string being immutable(char)[], which I think is a fine idea.

T

-- 
The most powerful one-line C program: #include "/dev/tty" -- IOCCC
May 26 2016
prev sibling next sibling parent Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
wrote:
 RFC: what primitives should RCStr have?
Having a "null" state which is distinguishable from an empty string.
May 26 2016
prev sibling next sibling parent reply Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
wrote:
 I've been working on RCStr (endearingly pronounced "Our 
 Sister"), D's up-and-coming reference counted string type. The 
 goals are:

 * Reference counted, shouldn't leak if all instances destroyed; 
 even if not, use the GC as a last-resort reclamation mechanism.

 * Entirely @safe.

 * Support UTF 100% by means of RCStr!char, RCStr!wchar etc. but 
 also raw manipulation and custom encodings via RCStr!ubyte, 
 RCStr!ushort etc.

 * Support several views of the same string, e.g. given s of 
 type RCStr!char, it can be iterated byte-wise, code point-wise, 
 code unit-wise etc. by using s.by!ubyte, s.by!char, s.by!dchar 
 etc.

 * Support const and immutable qualifiers for the character type.

 * Work well with const and immutable when they qualify the 
 entire RCStr type.

 * Fast: use the small string optimization and various other 
 layout and algorithms to make it a good choice for high 
 performance strings
Interesting! A few noob questions first:
* Would it support implicit sharing (copy-on-write)? What about sub-strings?
* Will concatenations be fast?
* Would this have value for compile time string operations, mixins, etc.?
 RFC: what primitives should RCStr have?
String may have a few that are worth supporting: http://doc.qt.io/qt-5/qstring.html

Bastiaan.
May 26 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/26/2016 04:32 PM, Bastiaan Veelo wrote:
 * Would it support implicit sharing (copy-on-write)? What about
 sub-strings?
Yes, COW. Substrings will be managed COW-ish as well (no copy upon substring extraction).
 * Will concatenations be fast?
No, it will copy (i.e. no multiple segments management). It will be of course optimized as much as we can.
 * Would this have value for compile time string operations, mixin's, etc.?
Not planned.
 RFC: what primitives should RCStr have?
String may have a few that are worth supporting: http://doc.qt.io/qt-5/qstring.html
Good list. Thanks! Andrei
May 26 2016
prev sibling next sibling parent Nordlöw <per.nordlow gmail.com> writes:
On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
wrote:
 * Fast: use the small string optimization and various other 
 layout and algorithms to make it a good choice for high 
 performance strings
For inspiration see:
- Vladimir recommends `tempCString`
- Nikolay has https://bitbucket.org/sibnick/inplacearray.git

Original thread: https://forum.dlang.org/post/msrlumbobhpuljvhwrlh@forum.dlang.org
May 27 2016
prev sibling next sibling parent reply Marc Schütz <schuetzm gmx.net> writes:
On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
wrote:
 RFC: what primitives should RCStr have?
It should _safely_ convert to `const(char)[]`.
May 27 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/27/16 7:07 AM, Marc Schütz wrote:
 On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu wrote:
 RFC: what primitives should RCStr have?
It should _safely_ convert to `const(char)[]`.
That is not possible, sorry. -- Andrei
May 27 2016
next sibling parent reply Era Scarecrow <rtcvb32 yahoo.com> writes:
On Friday, 27 May 2016 at 13:32:30 UTC, Andrei Alexandrescu wrote:
 On 5/27/16 7:07 AM, Marc Schütz wrote:
 On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
 wrote:
 RFC: what primitives should RCStr have?
It should _safely_ convert to `const(char)[]`.
That is not possible, sorry. -- Andrei
I wonder if it could... For a while now I've wondered why there isn't an option to include flags to every type (for debugging)? The flags could relay a lot of information, like if a variable was originally immutable, const, shared, other? If it was originally allocated using the GC, malloc, C/C++/Other or stack. If it used a constructor, init, or not at all (= void)? Along with control options like where/when an assignment tries to happen, copies its state (or its variables with indirection), or printing an output each time it changes, etc.

With the current state of things, I'll just take your word on it.
May 27 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/27/2016 05:02 PM, Era Scarecrow wrote:
   With the current state of things, I'll just take your word on it.
Reasoning is simple - yes we could safely convert to const(char)[] but that means effectively all refcounting is lost for that string. So we can convert but in an explicit manner, e.g. str.toGCThisWillCompletelySuckMan. -- Andrei
May 27 2016
next sibling parent reply Seb <seb wilzba.ch> writes:
On Friday, 27 May 2016 at 21:25:50 UTC, Andrei Alexandrescu wrote:
 On 05/27/2016 05:02 PM, Era Scarecrow wrote:
   With the current state of things, I'll just take your word 
 on it.
Reasoning is simple - yes we could safely convert to const(char)[] but that means effectively all refcounting is lost for that string. So we can convert but in an explicit manner, e.g. str.toGCThisWillCompletelySuckMan. -- Andrei
not if [] would be ref-counted too ;-)
May 27 2016
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 27 May 2016 at 21:51:59 UTC, Seb wrote:
 not if [] would be ref-counted too ;-)
That would be kinda horrible. Right now, slicing is virtually free and compatible with all kinds of backing schemes. If it became refcounted, it'd:

1) have to keep a pointer to the refcount structure with the slice, adding memory cost

2) make assignments and slicing work through that refcount pointer, adding cpu cost

3) somehow need to know the appropriate freeing strategy, adding some kind of indirect call when refcount = 0, and would make creating a slice more tedious as you'd need to know this (meaning you also probably need to allocate this structure! no more free ptr[0 .. length] operation on malloc'd blocks.)

So I'd be pretty strongly against that.
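To put a rough number on the memory cost in point 1, here is a size comparison against one possible layout. `RCSlice` and its fields are hypothetical, chosen only to illustrate the argument; they are not a proposed design, and the sizes assume a platform where a function pointer is one machine word.

```d
// Hypothetical layout of a reference-counted slice, for size comparison only.
struct RCSlice(T)
{
    T[] data;                             // pointer + length, as today
    size_t* refCount;                     // extra word: shared counter
    void function(void*) nothrow release; // extra word: freeing strategy
}

// A built-in slice is two machine words.
static assert((char[]).sizeof == 2 * size_t.sizeof);

// The sketched refcounted slice doubles that.
static assert(RCSlice!char.sizeof == 4 * size_t.sizeof);

void main() {}
```

Every copy of such a slice would also have to touch `refCount`, which is the cpu cost in point 2.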
May 27 2016
parent reply Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 28 May 2016 at 10:16, Adam D. Ruppe via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On Friday, 27 May 2016 at 21:51:59 UTC, Seb wrote:
 not if [] would be ref-counted too ;-)
That would be kinda horrible. Right now, slicing is virtually free and compatible with all kinds of backing schemes. If it became refcounted, it'd: 1) have to keep a pointer to the refcount structure with the slice, adding memory cost
This is only true for the owner. If we had 'scope', or something like it (ie, borrowing in rust lingo), then the fat slice wouldn't need to be passed around, it's only a burden on the top-level owner. 'scope' is consistently rejected, but it solves so many long-standing problems we have, and this reduction of 'fat'(/rc)-slices to normal slices is a particularly important one.
May 27 2016
next sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Saturday, 28 May 2016 at 04:15:45 UTC, Manu wrote:
 This is only true for the owner. If we had 'scope', or 
 something like
 it (ie, borrowing in rust lingo), then the fat slice wouldn't 
 need to
 be passed around
Right, I agree - if we keep the slice just the way it is now, it all still works if you borrow correctly! (BTW, I don't think we even need this to be strictly @safe, though it would be nice if it was tested, we could say @system getSlice and potentially change it to @safe later.)
May 28 2016
prev sibling parent reply Marco Leise <Marco.Leise gmx.de> writes:
Am Sat, 28 May 2016 14:15:45 +1000
schrieb Manu via Digitalmars-d <digitalmars-d puremagic.com>:

 On 28 May 2016 at 10:16, Adam D. Ruppe via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On Friday, 27 May 2016 at 21:51:59 UTC, Seb wrote:  
 not if [] would be ref-counted too ;-)  
That would be kinda horrible. Right now, slicing is virtually free and compatible with all kinds of backing schemes. If it became refcounted, it'd: 1) have to keep a pointer to the refcount structure with the slice, adding memory cost
This is only true for the owner. If we had 'scope', or something like it (ie, borrowing in rust lingo), then the fat slice wouldn't need to be passed around, it's only a burden on the top-level owner. 'scope' is consistently rejected, but it solves so many long-standing problems we have, and this reduction of 'fat'(/rc)-slices to normal slices is a particularly important one.
I second that thought. But I'd be ok with an unsafe slice and making sure myself, that I don't keep a reference around. A lot of functions only borrow data and can work on a naked pointer/ref/slice, while the owner(s) have the smart pointer. These can of course be converted to templates taking either char[] or RCStr, but I think borrowing is cleaner when the function in question doesn't care a bag of beans if the chars it works on were allocated on the GC heap or reference counted. -- Marco
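A small sketch of such a borrowing function, assuming nothing about RCStr itself. `countVowels` is a made-up example: it is not a template, it only reads the characters, and it neither knows nor cares where they were allocated. The `scope` annotation states the intent not to keep a reference; how strictly the compiler enforces that depends on the compiler version and flags.

```d
// Non-template borrower: reads the characters and keeps no reference to them.
size_t countVowels(scope const(char)[] s) @safe pure nothrow @nogc
{
    size_t n;
    foreach (c; s)
    {
        switch (c)
        {
            case 'a', 'e', 'i', 'o', 'u':
                ++n;
                break;
            default:
                break;
        }
    }
    return n;
}

void main()
{
    string literal = "reference counted"; // immutable data owned elsewhere
    char[5] onStack = "hello";            // stack-allocated buffer

    assert(countVowels(literal) == 7);
    assert(countVowels(onStack[]) == 2);
}
```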
May 30 2016
parent reply Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 31 May 2016 at 01:00, Marco Leise via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 Am Sat, 28 May 2016 14:15:45 +1000
 schrieb Manu via Digitalmars-d <digitalmars-d puremagic.com>:

 On 28 May 2016 at 10:16, Adam D. Ruppe via Digitalmars-d
 <digitalmars-d puremagic.com> wrote:
 On Friday, 27 May 2016 at 21:51:59 UTC, Seb wrote:
 not if [] would be ref-counted too ;-)
That would be kinda horrible. Right now, slicing is virtually free and compatible with all kinds of backing schemes. If it became refcounted, it'd: 1) have to keep a pointer to the refcount structure with the slice, adding memory cost
This is only true for the owner. If we had 'scope', or something like it (ie, borrowing in rust lingo), then the fat slice wouldn't need to be passed around, it's only a burden on the top-level owner. 'scope' is consistently rejected, but it solves so many long-standing problems we have, and this reduction of 'fat'(/rc)-slices to normal slices is a particularly important one.
I second that thought. But I'd be ok with an unsafe slice and making sure myself, that I don't keep a reference around. A lot of functions only borrow data and can work on a naked pointer/ref/slice, while the owner(s) have the smart pointer. These can of course be converted to templates taking either char[] or RCStr, but I think borrowing is cleaner when the function in question doesn't care a bag of beans if the chars it works on were allocated on the GC heap or reference counted.
D loves templates, but templates aren't a given. Closed-source projects often can't have templates in the public API (ie, source should not be available), and this is my world.
May 31 2016
parent Marco Leise <Marco.Leise gmx.de> writes:
Am Wed, 1 Jun 2016 01:06:36 +1000
schrieb Manu via Digitalmars-d <digitalmars-d puremagic.com>:

 D loves templates, but templates aren't a given. Closed-source
 projects often can't have templates in the public API (ie, source
 should not be available), and this is my world.
Same effect for GPL code. Funny. (Template instantiations are like statically linking in the open source code.) -- Marco
May 31 2016
prev sibling next sibling parent reply tsbockman <thomas.bockman gmail.com> writes:
On Friday, 27 May 2016 at 21:25:50 UTC, Andrei Alexandrescu wrote:
 On 05/27/2016 05:02 PM, Era Scarecrow wrote:
   With the current state of things, I'll just take your word 
 on it.
Reasoning is simple - yes we could safely convert to const(char)[] but that means effectively all refcounting is lost for that string. So we can convert but in an explicit manner, e.g. str.toGCThisWillCompletelySuckMan. -- Andrei
But conversions to scope const(char)[] could be made safe, right? (If scope were ever fully implemented, that is.)
May 27 2016
next sibling parent Adam D. Ruppe <destructionator gmail.com> writes:
On Friday, 27 May 2016 at 22:09:48 UTC, tsbockman wrote:
 But conversions to scope const(char)[] could be made safe, 
 right? (If scope were ever fully implemented, that is.)
Indeed, and I really think we should spend more effort on making this work. Not as much as Rust spends on it, but a lil more than our current return ref dip.
May 27 2016
prev sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 05/27/2016 06:09 PM, tsbockman wrote:
 On Friday, 27 May 2016 at 21:25:50 UTC, Andrei Alexandrescu wrote:
 On 05/27/2016 05:02 PM, Era Scarecrow wrote:
   With the current state of things, I'll just take your word on it.
Reasoning is simple - yes we could safely convert to const(char)[] but that means effectively all refcounting is lost for that string. So we can convert but in an explicit manner, e.g. str.toGCThisWillCompletelySuckMan. -- Andrei
But conversions to scope const(char)[] could be made safe, right? (If scope were ever fully implemented, that is.)
Yah, in principle. -- Andrei
May 27 2016
prev sibling parent Nick Treleaven <ntrel-pub mybtinternet.com> writes:
On Friday, 27 May 2016 at 21:25:50 UTC, Andrei Alexandrescu wrote:
 On 05/27/2016 05:02 PM, Era Scarecrow wrote:
   With the current state of things, I'll just take your word 
 on it.
Reasoning is simple - yes we could safely convert to const(char)[] but that means effectively all refcounting is lost for that string. So we can convert but in an explicit manner, e.g. str.toGCThisWillCompletelySuckMan. -- Andrei
We could have:

    const(char)[] s = rcstr.stealSlice;

Which is null* if the refcount is > 1. rcstr would then be empty on success. In fact if with the RC DIP we guarantee the memory doesn't escape, stealSlice could return string.

*Or better, return an Option.
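A toy sketch of that idea, assuming a made-up `ToyRCStr` that keeps its count in a separately allocated word. To stay short it uses GC memory for both the payload and the counter, so it shows only the ownership hand-off, not real deterministic freeing.

```d
// Toy stand-in for RCStr; names and layout are hypothetical.
struct ToyRCStr
{
    private char[] payload;
    private size_t* count;

    this(string s)
    {
        payload = s.dup;
        count = new size_t;
        *count = 1;
    }

    this(this) { if (count) ++*count; } // a copy shares the payload

    ~this()
    {
        if (count && --*count == 0)
        {
            // A real RCStr would free the payload here.
        }
    }

    // Hands the buffer to the caller only if this is the sole owner.
    const(char)[] stealSlice()
    {
        if (count is null || *count > 1)
            return null;
        auto result = payload;
        payload = null;
        count = null;
        return result;
    }

    bool empty() const { return payload.length == 0; }
}

void main()
{
    auto a = ToyRCStr("hello");
    {
        auto b = a;                   // refcount is now 2
        assert(a.stealSlice is null); // shared, so stealing fails
    }                                 // b destroyed, refcount back to 1
    assert(a.stealSlice == "hello");
    assert(a.empty);                  // a gave up its buffer
}
```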
May 31 2016
prev sibling next sibling parent reply Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 27 May 2016 at 23:32, Andrei Alexandrescu via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 On 5/27/16 7:07 AM, Marc Schütz wrote:
 On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu wrote:
 RFC: what primitives should RCStr have?
It should _safely_ convert to `const(char)[]`.
That is not possible, sorry. -- Andrei
It should safely convert to 'scope const(char)[]', then we only need a fat-slice or like at the very top of the callstack...
May 27 2016
parent Marc Schütz <schuetzm gmx.net> writes:
On Saturday, 28 May 2016 at 04:28:16 UTC, Manu wrote:
 On 27 May 2016 at 23:32, Andrei Alexandrescu via Digitalmars-d 
 <digitalmars-d puremagic.com> wrote:
 On 5/27/16 7:07 AM, Marc Schütz wrote:
 It should _safely_ convert to `const(char)[]`.
That is not possible, sorry. -- Andrei
It should safely convert to 'scope const(char)[]', then we only need a fat-slice or like at the very top of the callstack...
I didn't want to mention the s-word ;-)
May 28 2016
prev sibling parent Marc Schütz <schuetzm gmx.net> writes:
On Friday, 27 May 2016 at 13:32:30 UTC, Andrei Alexandrescu wrote:
 On 5/27/16 7:07 AM, Marc Schütz wrote:
 On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
 wrote:
 RFC: what primitives should RCStr have?
It should _safely_ convert to `const(char)[]`.
That is not possible, sorry. -- Andrei
It is when DIP25 [1] is finally fully implemented (by that I mean including for slices and pointers etc., Walter told me at DConf that this is going to happen), and the problem with aliasing references is solved (which needs to happen anyway for any reference counting to be safe).

[1] https://wiki.dlang.org/DIP25
May 28 2016
prev sibling next sibling parent Manu via Digitalmars-d <digitalmars-d puremagic.com> writes:
On 27 May 2016 at 02:11, Andrei Alexandrescu via Digitalmars-d
<digitalmars-d puremagic.com> wrote:
 I've been working on RCStr (endearingly pronounced "Our Sister"),
Ah, I totally skipped over this thread... Wow... this really doesn't work in any accent I'm close to, but I can hear it if I imagine you saying it ;) If I said RCStr, it sounds like 'are'-'see'-strrr, but 'our sister' would be 'hour'-sistə... isn't it strange that word recognition seems to work pretty much reliably down a sliding scale until an arbitrary point where it just drops off. There's not a lot of fuzzy area in the middle.
May 27 2016
prev sibling parent reply ZombineDev <petar.p.kirov gmail.com> writes:
On Thursday, 26 May 2016 at 16:11:22 UTC, Andrei Alexandrescu 
wrote:
 I've been working on RCStr (endearingly pronounced "Our 
 Sister"), D's up-and-coming reference counted string type. The 
 goals are:
<Slightly off-topic>
RCStr may be an easier first step, but I think generic dynamic arrays are more interesting, because they are more generally applicable, and user types like move-only resources make them a more challenging problem to solve.

BTW, what happened to scope? Generally speaking, I'm not a fan of Rust, and I know that you think that D needs to differentiate, but I like their borrowing model for several reasons:
a) while not 100% safe and quite verbose, it offers enough improvements over @safe D to make it a worthwhile upgrade, if you don't care about any other language features;
b) it's not that hard to grasp / almost natural for people familiar with C++11's copy (shared_ptr) and move (unique_ptr) semantics;
c) it's general enough that it can be applied to areas like iterator invalidation, thread synchronization and other logic bugs, as some third-party Rust packages demonstrate.

I think that improving escape analysis with the scope attribute can go a long way to shortening the gap between Rust and D in that area.

The other elephant(s) in the room are nested contexts like delegates, nested structs and some alias template parameter arguments. These are especially bad because the user has zero control over those GC allocations, which makes some of D's key features unusable in @nogc contexts.
<End off-topic>
 * Reference counted, shouldn't leak if all instances destroyed; 
 even if not, use the GC as a last-resort reclamation mechanism.

 * Entirely @safe.

 * Support UTF 100% by means of RCStr!char, RCStr!wchar etc. but 
 also raw manipulation and custom encodings via RCStr!ubyte, 
 RCStr!ushort etc.

 * Support several views of the same string, e.g. given s of 
 type RCStr!char, it can be iterated byte-wise, code point-wise, 
 code unit-wise etc. by using s.by!ubyte, s.by!char, s.by!dchar 
 etc.

 * Support const and immutable qualifiers for the character type.

 * Work well with const and immutable when they qualify the 
 entire RCStr type.

 * Fast: use the small string optimization and various other 
 layout and algorithms to make it a good choice for high 
 performance strings

 RFC: what primitives should RCStr have?


 Thanks,

 Andrei
0) (Prerequisite) Composition/interaction with language features/user types - RCStr in nested contexts (alias template parameters, delegates, nested structs/classes), array of RCStr-s, RCStr as a struct/class member, RCStr passed as (const) ref parameter, etc. should correctly increase/decrease ref count. This is also a prerequisite for safe RefCounted!T. Action item: related compiler bugs should be prioritized. E.g. the RAII bug from Shachar Shemesh's lightning talk - http://forum.dlang.org/post/n8algm$qra$1 digitalmars.com. See also:
https://issues.dlang.org/buglist.cgi?quicksearch=raii&list_id=208631
https://issues.dlang.org/buglist.cgi?quicksearch=destructor&list_id=208632
(not everything in those lists is related but there are some nasty ones, like bad RVO codegen).

1) Safe slicing

2) shared overloads of member functions (e.g. for stuff like atomic incRef/decRef)

3) Concatenation (RCStr ~= RCStr ~ RCStr ~ char)

4) (Optional) Reserving (pre-allocating capacity) / shrinking. I labeled this feature request as optional, as it's not clear if RCStr is more like a container, or more like a slice/range.

5) Some sort of optimization for zero-terminated strings. Quite often one needs to interact with C APIs, which requires calling toStringz / toUTFz, which causes unnecessary allocations. It would be great if RCStr could efficiently handle this scenario.

6) !!! Not really a primitive, but we need to make sure that applying a chain of range transformations won't break ownership (e.g. leak or free prematurely).

7) Should be able to replace GC usage in transient ranges like e.g. File.byLine

8) Cheap initialization/assignment from string literals - should be roughly the same as either initializing a static character array (if the small string optimization is used) or just making it point to read-only memory in the data segment of the executable. It shouldn't try to write or free such memory.
When initialized from a string literal, RCStr should also offer a null-terminating byte, provided that it points to the whole literal. If one wants to assign a string literal by overwriting parts of the already allocated storage, std.algorithm.mutation.copy should be used instead.

There may be other important primitives which I haven't thought of, but generally we should try to leverage std.algorithm, std.range, std.string and std.uni for them, via UFCS.

----------

On a related note, I know that you want to use AffixAllocator for reference counting, and I think it's a great idea. I have one question, which wasn't answered during that discussion:

    // Use a nightly build to compile
    import core.thread : Thread, thread_joinAll;
    import std.range : iota;
    import std.experimental.allocator : makeArray;
    import std.experimental.allocator.building_blocks.region : InSituRegion;
    import std.experimental.allocator.building_blocks.affix_allocator : AffixAllocator;

    AffixAllocator!(InSituRegion!(4096) , uint) tlsAllocator;
    static assert (tlsAllocator.sizeof >= 4096);

    import std.stdio;

    void main()
    {
        shared(int)[] myArray;

        foreach (i; 0 .. 100)
        {
            new Thread(
            {
                if (i != 0) return;
                myArray = tlsAllocator.makeArray!(shared int)(100.iota);
                static assert(is(typeof(&tlsAllocator.prefix(myArray)) == shared(uint)*));
                writefln("At %x: %s", myArray.ptr, myArray);
            }).start();

            thread_joinAll();
        }

        writeln(myArray); // prints garbage!!!
    }

So my question is: should it be possible to share thread-local data like this? IMO, the current allocator design opens a serious hole in the type system, because it allows using data allocated from another thread's thread-local storage. After the other thread exits, accessing memory allocated from its TLS should not be possible, but https://github.com/dlang/phobos/pull/3991 clearly allows that. One should be able to allocate shared memory only from shared allocators. And shared allocators must be backed by shared parent allocators or shared underlying storage.
In this case the Region allocator should be shared, and must be backed by shared memory, Mallocator, or something in that vein.
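The underlying language rule can be shown without any allocator. The sketch below uses made-up names (`tlsValue`, `sharedValue`, `writeFromOtherThread`) and only illustrates that a module-level variable such as tlsAllocator gets one instance per thread, while a `shared` one has a single instance for the whole process:

```d
import core.atomic : atomicLoad, atomicStore;
import core.thread : Thread;

int tlsValue;            // module-level variables are thread-local by default
shared int sharedValue;  // a single instance visible to every thread

// Writes to both variables from a freshly started thread, then waits for it.
void writeFromOtherThread()
{
    auto t = new Thread({
        tlsValue = 42;                 // touches the new thread's own copy
        atomicStore(sharedValue, 42);  // touches the one shared instance
    });
    t.start();
    t.join();
}

void main()
{
    writeFromOtherThread();
    assert(tlsValue == 0);                  // the main thread's copy is untouched
    assert(atomicLoad(sharedValue) == 42);  // the shared write is visible here
}
```

Since an InSituRegion keeps its buffer inside the allocator object itself, a thread-local allocator hands out memory that lives in that thread's TLS block, which is why the slice in the example above dangles once the thread has exited.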
May 28 2016
parent ZombineDev <petar.p.kirov gmail.com> writes:
On Saturday, 28 May 2016 at 09:43:41 UTC, ZombineDev wrote:
In this case the Region allocator should be shared, and must be backed by shared memory, Mallocator, or something in that vein.
Here's another case where the last change to AffixAllocator is really dangerous:

    void main()
    {
        immutable(int)[] myArray;

        foreach (i; 0 .. 100)
        {
            new Thread(
            {
                if (i != 0) return;
                myArray = tlsAllocator.makeArray!(immutable int)(100.iota);
                writeln(myArray); // prints [0, ..., 99]
            }).start();

            thread_joinAll();
        }

        writeln(myArray); // prints garbage
    }

In this case it severely violates the promise of immutable.
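For contrast, here is a small sketch of the same hand-off with the data allocated on the GC heap, which is not tied to any single thread. The helper name `makeInWorker` is hypothetical, and this assumes GC allocation is acceptable, which is of course not the case RCStr is aiming for:

```d
import core.thread : Thread;
import std.array : array;
import std.range : iota;

// Builds an immutable array inside a worker thread and hands it back.
immutable(int)[] makeInWorker()
{
    immutable(int)[] result;
    auto t = new Thread({
        // GC memory is not owned by the worker's TLS,
        // so the data stays valid after the thread exits.
        result = 100.iota.array.idup;
    });
    t.start();
    t.join();
    return result;
}

void main()
{
    auto data = makeInWorker();
    assert(data.length == 100);
    assert(data[0] == 0 && data[99] == 99);
}
```

Here the contents stay intact after the worker has finished, which is the behaviour the immutable qualifier promises.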
May 28 2016