digitalmars.D.learn - shuffle a character array

reply celavek <cetatzeanum yahoo.com> writes:
Hi

I'm trying to shuffle a character array but I get some 
compilation errors.

*
char[] upper = std.ascii.uppercase.dup;
randomShuffle!(typeof(upper))(upper);

randomShuffle(upper);

example.d(34): Error: template std.random.randomShuffle cannot 
deduce function from argument types !(char[])(char[]), candidates 
are:
/usr/include/dmd/phobos/std/random.d(1822):        
std.random.randomShuffle(Range, RandomGen)(Range r, ref RandomGen 
gen) if (isRandomAccessRange!Range && isUniformRNG!RandomGen)
/usr/include/dmd/phobos/std/random.d(1829):        
std.random.randomShuffle(Range)(Range r) if 
(isRandomAccessRange!Range)

example.d(34): Error: template std.random.randomShuffle cannot 
deduce function from argument types !()(char[]), candidates are:
/usr/include/dmd/phobos/std/random.d(1822):        
std.random.randomShuffle(Range, RandomGen)(Range r, ref RandomGen 
gen) if (isRandomAccessRange!Range && isUniformRNG!RandomGen)
/usr/include/dmd/phobos/std/random.d(1829):        
std.random.randomShuffle(Range)(Range r) if 
(isRandomAccessRange!Range)
*

I thought that I could use a dynamic array as a range ...
Jul 20 2016
parent reply Mike Parker <aldacron gmail.com> writes:
On Wednesday, 20 July 2016 at 07:49:38 UTC, celavek wrote:

 I thought that I could use a dynamic array as a range ...
You can. However, if you take a look at the documentation for std.random.randomShuffle [1], you'll find the following constraint:

if (isRandomAccessRange!Range);

You can then go to the documentation for std.range.primitives.isRandomAccessRange [2], where you'll find the following:

"Although char[] and wchar[] (as well as their qualified versions including string and wstring) are arrays, isRandomAccessRange yields false for them because they use variable-length encodings (UTF-8 and UTF-16 respectively). These types are bidirectional ranges only."

If you are absolutely, 100% certain that you are dealing with ASCII, you can do this:

```
import std.string : representation;
randomShuffle(charArray.representation);
```

That will give you a ubyte[] for char[] and a ushort[] for wchar[].

[1] https://dlang.org/phobos/std_random.html#.randomShuffle
[2] https://dlang.org/phobos/std_range_primitives.html#isRandomAccessRange
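
For reference, here is a minimal, complete sketch of the ASCII-only approach applied to the array from the original question. The helper name shuffleAscii is made up for illustration and is not part of Phobos; the ASCII check is an extra assumption added as a safety net.

```d
import std.algorithm.searching : all;
import std.ascii : uppercase;
import std.random : randomShuffle;
import std.stdio : writeln;
import std.string : representation;

// Hypothetical helper (not part of Phobos): shuffles an ASCII-only
// char[] in place through its ubyte[] view.
void shuffleAscii(char[] s)
{
    // Only safe when every code unit is a complete character.
    assert(s.representation.all!(b => b < 0x80), "input must be ASCII");
    randomShuffle(s.representation);
}

void main()
{
    char[] upper = uppercase.dup; // mutable copy of "ABC...Z"
    shuffleAscii(upper);
    writeln(upper); // the same 26 letters in a random order
}
```

If the array could ever hold non-ASCII text, shuffling code units this way can split multi-byte sequences and produce invalid UTF-8.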
Jul 20 2016
next sibling parent reply Mike Parker <aldacron gmail.com> writes:
On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
 On Wednesday, 20 July 2016 at 07:49:38 UTC, celavek wrote:
 If you are absolutely, 100% certain that you are dealing with 
 ASCII, you can do this:
And I forgot to add: Otherwise, you'll want to convert to dchar[] (probably via std.utf.toUTF32) and pass that along instead.
Jul 20 2016
parent Mike Parker <aldacron gmail.com> writes:
On Wednesday, 20 July 2016 at 08:05:20 UTC, Mike Parker wrote:
 On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
 On Wednesday, 20 July 2016 at 07:49:38 UTC, celavek wrote:
 If you are absolutely, 100% certain that you are dealing with 
 ASCII, you can do this:
 And I forgot to add: Otherwise, you'll want to convert to dchar[] (probably via std.utf.toUTF32) and pass that along instead.
Actually, std.conv.to might be better, since toUTF32 returns dstring:

    auto dcharArray = to!(dchar[])(charArray);
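
A rough sketch of that non-ASCII route, assuming you want a shuffled copy rather than an in-place shuffle (the conversion to dchar[] allocates a new array). The helper name shuffledCodePoints is invented for this example.

```d
import std.conv : to;
import std.random : randomShuffle;
import std.stdio : writeln;

// Hypothetical helper: returns a shuffled copy of text, permuting whole
// code points instead of UTF-8 code units.
string shuffledCodePoints(string text)
{
    auto points = text.to!(dchar[]); // decodes UTF-8 into a new dchar[]
    randomShuffle(points);           // dchar[] is a random access range
    return points.to!string;         // re-encodes the result as UTF-8
}

void main()
{
    writeln(shuffledCodePoints("çay, süt, şeker"));
}
```

Note that this permutes code points, so a combining mark can still end up separated from its base character.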
Jul 20 2016
prev sibling next sibling parent reply celavek <cetatzeanum yahoo.com> writes:
On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
 If you are absolutely, 100% certain that you are dealing with 
 ASCII, you can do this:

 ```
 import std.string : representation;
 randomShuffle(charArray.representation);
 ```

 That will give you a ubyte[] for char[] and a ushort[] for 
 wchar[].

 [1] https://dlang.org/phobos/std_random.html#.randomShuffle
 [2] 
 https://dlang.org/phobos/std_range_primitives.html#isRandomAccessRange
Ahhh! That again. I was thinking about using the representation. I should take a deeper look at the documentation.

As far as my current understanding goes, the shuffle will be done in place. If I use the "representation", would that still hold? That is, will I be able to use the same char[], but in shuffled form? (Of course, I will test that.)

Thank you
Jul 20 2016
parent reply Mike Parker <aldacron gmail.com> writes:
On Wednesday, 20 July 2016 at 08:18:55 UTC, celavek wrote:

 As far as my current understanding goes the shuffle will be 
 done in place.
 If I use the "representation" would that still hold, that is 
 will I be able
 to use the same char[] but in the shuffled form? (of course I 
 will test that)
representation does not allocate any new memory. It points to the same memory, same data. If we think of D arrays as something like this:

    struct Array(T) {
        size_t len;
        T* ptr;
    }

Then representation is doing this:

    Array original;
    Array representation(original.len, original.ptr);

So, yes, the char data will still be shuffled in place. All you're doing is getting a ubyte view onto it so that it can be treated as a range.
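
A small check along these lines should confirm the aliasing. It is only a sketch, and the sample text is arbitrary.

```d
import std.string : representation;

void main()
{
    char[] text = "HELLO".dup;
    ubyte[] view = text.representation;

    // Same pointer and length: no copy was made.
    assert(cast(void*) view.ptr is cast(void*) text.ptr);
    assert(view.length == text.length);

    // Writing through the view changes the original array.
    view[0] = cast(ubyte) 'J';
    assert(text == "JELLO");
}
```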
Jul 20 2016
parent celavek <cetatzeanum yahoo.com> writes:
On Wednesday, 20 July 2016 at 08:30:37 UTC, Mike Parker wrote:

 representation does not allocate any new memory. It points to 
 the same memory, same data. If we think of D arrays as 
 something like this:

 struct Array(T) {
     size_t len;
     T* ptr;
 }

 Then representation is doing this:

 Array original;
 Array representation(original.len, original.ptr);

 So, yes, the char data will still be shuffled in place. All 
 you're doing is getting a ubyte view onto it so that it can be 
 treated as a range.
Thank you for the very useful information. I really appreciate you taking the time to explain these, maybe trivial, things to me. I confirmed the behavior with a test. It works as expected.
Jul 20 2016
prev sibling parent reply pineapple <meapineapple gmail.com> writes:
On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
 You can then go to the documentation for 
 std.range.primitives.isRandomAccessRange [2], where you'll find 
 the following:

 "Although char[] and wchar[] (as well as their qualified 
 versions including string and wstring) are arrays, 
 isRandomAccessRange yields false for them because they use 
 variable-length encodings (UTF-8 and UTF-16 respectively). 
 These types are bidirectional ranges only."
There's also the shuffle module in mach.range which doesn't do any auto-decoding: https://github.com/pineapplemachine/mach.d/blob/master/mach/range/random/shuffle.d
Jul 20 2016
next sibling parent reply Mike Parker <aldacron gmail.com> writes:
On Wednesday, 20 July 2016 at 10:40:04 UTC, pineapple wrote:
 On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
 You can then go to the documentation for 
 std.range.primitives.isRandomAccessRange [2], where you'll 
 find the following:

 "Although char[] and wchar[] (as well as their qualified 
 versions including string and wstring) are arrays, 
 isRandomAccessRange yields false for them because they use 
 variable-length encodings (UTF-8 and UTF-16 respectively). 
 These types are bidirectional ranges only."
 There's also the shuffle module in mach.range which doesn't do any auto-decoding: https://github.com/pineapplemachine/mach.d/blob/master/mach/range/random/shuffle.d
There is no auto-decoding going on here, as char[] and wchar[] are rejected outright since they are not considered random access ranges.
Jul 20 2016
next sibling parent reply pineapple <meapineapple gmail.com> writes:
On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here, as char[] and wchar[] 
 are rejected outright since they are not considered random 
 access ranges.
They are considered random access ranges by my ranges library, because they are treated as arrays of characters and not as Unicode strings.
Jul 20 2016
next sibling parent reply pineapple <meapineapple gmail.com> writes:
On Wednesday, 20 July 2016 at 16:03:27 UTC, pineapple wrote:
 On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here, as char[] and wchar[] 
 are rejected outright since they are not considered random 
 access ranges.
 They are considered random access ranges by my ranges library, because they are treated as arrays of characters and not as unicode strings.
On second thought that's not even relevant - the linked-to module performs an out-of-place shuffle and so does not even require the input range to have random access.
Jul 20 2016
parent reply pineapple <meapineapple gmail.com> writes:
On Wednesday, 20 July 2016 at 16:04:50 UTC, pineapple wrote:
 On Wednesday, 20 July 2016 at 16:03:27 UTC, pineapple wrote:
 On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here, as char[] and 
 wchar[] are rejected outright since they are not considered 
 random access ranges.
 They are considered random access ranges by my ranges library, because they are treated as arrays of characters and not as unicode strings.
 On second thought that's not even relevant - the linked-to module performs an out-of-place shuffle and so does not even require the input range to have random access.
Pardon my being scatterbrained (and there not being an "edit post" function) - you're referring to Phobos not considering char[] and wchar[] to have random access? The reason they are not considered to have random access is that they are auto-decoded by other functions that handle them, and the auto-decoding makes random access inefficient. Not because randomShuffle itself auto-decodes them.
Jul 20 2016
parent reply Mike Parker <aldacron gmail.com> writes:
On Wednesday, 20 July 2016 at 16:08:26 UTC, pineapple wrote:

 Pardon my being scatterbrained (and there not being an "edit 
 post" function) - you're referring to phobos not considering 
 char[] and wchar[] to have random access? The reason they are 
 not considered to have random access is because they are 
 auto-decoded by other functions that handle them, and the 
 auto-decoding makes random access inefficient. Not because 
 randomShuffle itself auto-decodes them.
The relevant lines I quoted from the docs above explain quite clearly that it's because they are multi-byte formats. Indexing them is not inefficient, it simply makes no sense. What does it mean to take the value at index i when it is part of a multi-byte sequence that continues at index i+1? Auto-decoding has nothing to do with it.
Jul 20 2016
parent ag0aep6g <anonymous example.com> writes:
On 07/20/2016 06:18 PM, Mike Parker wrote:
 The relevant lines I quoted from the docs above explain quite clearly
 that it's because they are multi-byte formats. Indexing them is not
 inefficient, it simply makes no sense. What does it mean to take the
 value at index i when it is part of a multi-byte sequence that continues
 at index i+1? Auto-decoding has nothing to do with it.
Without auto decoding, char[] would (most probably) be a random access range of code units. Taking the value at index i would return the code unit at index i, like it does for the array. It's not that way, because narrow strings are decoded by the range primitives (auto decoding).
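
To make the distinction concrete, here is a sketch of what the array and the range primitives each report for a narrow string. The sample word is arbitrary. std.utf.byCodeUnit is used only to show the "random access range of code units" view; it is an addition for illustration and was not mentioned above.

```d
import std.range.primitives : ElementType, front, isBidirectionalRange,
    isRandomAccessRange;
import std.utf : byCodeUnit;

void main()
{
    // Phobos range traits: narrow strings are bidirectional only.
    static assert(!isRandomAccessRange!(char[]));
    static assert(isBidirectionalRange!(char[]));
    static assert(isRandomAccessRange!(ubyte[]));
    static assert(isRandomAccessRange!(dchar[]));

    char[] s = "Çay".dup; // 'Ç' takes two UTF-8 code units

    // Array indexing works on code units.
    assert(s.length == 4);
    assert(s[0] == 0xC3);

    // The range primitive front decodes a whole code point.
    static assert(is(ElementType!(char[]) == dchar));
    assert(s.front == 'Ç');

    // byCodeUnit opts out of decoding and is random access again.
    static assert(isRandomAccessRange!(typeof(s.byCodeUnit)));
    assert(s.byCodeUnit[0] == 0xC3);
}
```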
Jul 20 2016
prev sibling parent reply Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:
On Wednesday, 20 July 2016 at 16:03:27 UTC, pineapple wrote:
 On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here, as char[] and wchar[] 
 are rejected outright since they are not considered random 
 access ranges.
 They are considered random access ranges by my ranges library, because they are treated as arrays of characters and not as unicode strings.
I think you mean that your range library treats them as arrays of code units, meaning your library will break (some) Unicode strings.

Note that auto decoding and random access range are different. The isRandomAccessRange check must make a special condition that the string is not "narrow", else they would be considered random access even though front automatically decodes.

922: static assert(!isNarrowString!R);
Jul 20 2016
parent pineapple <meapineapple gmail.com> writes:
On Wednesday, 20 July 2016 at 18:32:15 UTC, Jesse Phillips wrote:
 I think you mean that your range library treats them as arrays 
 of code units, meaning your library will break (some) unicode 
 strings.
Right - I disagree with the assessment that all (or even most) char[] types are intended to represent Unicode strings, rather than arrays containing chars. If you want your array to be interpreted as a Unicode string, then you should use std.uni's byGrapheme or similar functions.
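
As a rough illustration of the three levels being discussed (code units, code points, graphemes), here is a sketch that counts each for one short string. The sample string is an assumption chosen to make the counts differ.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    string s = "e\u0301"; // 'e' followed by a combining acute accent

    assert(s.length == 3);                // UTF-8 code units
    assert(s.walkLength == 2);            // code points (auto-decoded)
    assert(s.byGrapheme.walkLength == 1); // user-perceived characters
}
```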
Jul 21 2016
prev sibling parent reply ketmar <ketmar ketmar.no-ip.org> writes:
On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here,
...
 as char[] and wchar[] are rejected outright since they are not 
 considered random access ranges.
...due to autodecoding.
Jul 20 2016
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 07/20/2016 09:44 AM, ketmar wrote:
 On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here,
 ...
 as char[] and wchar[] are rejected outright since they are not
 considered random access ranges.
 ...due to autodecoding.
I think both the fact that they are not random access ranges and the fact that Phobos auto-decodes are design decisions that follow from char[] being a multi-byte encoding.

Phobos could choose not to auto-decode, but char[] would still be multi-byte, making it impossible to access randomly.

Ali
Jul 20 2016
next sibling parent ketmar <ketmar ketmar.no-ip.org> writes:
On Wednesday, 20 July 2016 at 17:31:18 UTC, Ali Çehreli wrote:
 I think both not being random access ranges and there is 
 auto-decoding in Phobos are design decisions due to the fact 
 that char[] is a multi-byte encoding.

 Phobos could choose not to auto-decode but char[] would still 
 be multi-byte, making it impossible to access randomly.
but it does happen that we have autodecoding and non-random-access char ranges, and they are clearly tied. so, leaving aside "what if..." things, we can say that it is an autodecoding issue. ;-)
Jul 20 2016
prev sibling parent reply Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 20 July 2016 at 17:31:18 UTC, Ali Çehreli wrote:
 making it impossible to access randomly
making it impossible to access randomly __correctly__, unless you're safely assuming there's only ASCII in your string.
Jul 20 2016
parent Ali Çehreli <acehreli yahoo.com> writes:
On 07/20/2016 10:40 AM, Jack Stouffer wrote:
 On Wednesday, 20 July 2016 at 17:31:18 UTC, Ali Çehreli wrote:
 making it impossible to access randomly
 making it impossible to access randomly __correctly__, unless you're safely assuming there's only ASCII in your string.
Yes, perhaps I should have said "making it not meaningful to access randomly" (in general, as you note). Ali
Jul 20 2016
prev sibling parent Mike Parker <aldacron gmail.com> writes:
On Wednesday, 20 July 2016 at 16:44:11 UTC, ketmar wrote:
 On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here,
 ...
 as char[] and wchar[] are rejected outright since they are not 
 considered random access ranges.
 ...due to autodecoding.
No, due to them being multi-byte formats. I don't see what auto decoding has to do with it. That's a separate concept. We could take auto decoding out of Phobos and still disqualify them as random access ranges.
Jul 20 2016
prev sibling parent celavek <cetatzeanum yahoo.com> writes:
On Wednesday, 20 July 2016 at 10:40:04 UTC, pineapple wrote:

 There's also the shuffle module in mach.range which doesn't do 
 any auto-decoding: 
 https://github.com/pineapplemachine/mach.d/blob/master/mach/range/random/shuffle.d
Interesting project. Thanks for the link.
Jul 20 2016