digitalmars.D.learn - shuffle a character array

celavek <cetatzeanum yahoo.com> writes:

Hi

I'm trying to shuffle a character array but I get some 
compilation errors.

*
char[] upper = std.ascii.uppercase.dup;
randomShuffle!(typeof(upper))(upper);

randomShuffle(upper);

example.d(34): Error: template std.random.randomShuffle cannot 
deduce function from argument types !(char[])(char[]), candidates 
are:
/usr/include/dmd/phobos/std/random.d(1822):        
std.random.randomShuffle(Range, RandomGen)(Range r, ref RandomGen 
gen) if (isRandomAccessRange!Range && isUniformRNG!RandomGen)
/usr/include/dmd/phobos/std/random.d(1829):        
std.random.randomShuffle(Range)(Range r) if 
(isRandomAccessRange!Range)

example.d(34): Error: template std.random.randomShuffle cannot 
deduce function from argument types !()(char[]), candidates are:
/usr/include/dmd/phobos/std/random.d(1822):        
std.random.randomShuffle(Range, RandomGen)(Range r, ref RandomGen 
gen) if (isRandomAccessRange!Range && isUniformRNG!RandomGen)
/usr/include/dmd/phobos/std/random.d(1829):        
std.random.randomShuffle(Range)(Range r) if 
(isRandomAccessRange!Range)
*

I thought that I could use a dynamic array as a range ...

Jul 20 2016

Mike Parker <aldacron gmail.com> writes:

On Wednesday, 20 July 2016 at 07:49:38 UTC, celavek wrote:

 I thought that I could use a dynamic array as a range ...

You can. However, if you take a look at the documentation for 
std.random.randomShuffle [1], you'll find the following 
constraint:

if (isRandomAccessRange!Range);

You can then go to the documentation for 
std.range.primitives.isRandomAccessRange [2], where you'll find 
the following:

"Although char[] and wchar[] (as well as their qualified versions 
including string and wstring) are arrays, isRandomAccessRange 
yields false for them because they use variable-length encodings 
(UTF-8 and UTF-16 respectively). These types are bidirectional 
ranges only."
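
The quoted rule can be checked at compile time. The following is only a minimal sketch, assuming a reasonably recent Phobos; nothing in it comes from the original posts beyond the trait names:

```d
import std.range.primitives : isBidirectionalRange, isRandomAccessRange;

// Narrow strings are bidirectional ranges, but not random access ranges.
static assert( isBidirectionalRange!(char[]));
static assert(!isRandomAccessRange!(char[]));
static assert(!isRandomAccessRange!(wchar[]));
static assert(!isRandomAccessRange!string);

// UTF-32 arrays and plain integer arrays do qualify.
static assert( isRandomAccessRange!(dchar[]));
static assert( isRandomAccessRange!(ubyte[]));

void main() {}
```

If the file compiles, every assertion holds, which is why randomShuffle rejects char[] but would accept dchar[] or ubyte[].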

If you are absolutely, 100% certain that you are dealing with 
ASCII, you can do this:

```
import std.random : randomShuffle;
import std.string : representation;
randomShuffle(charArray.representation);
```

That will give you a ubyte[] for char[] and a ushort[] for 
wchar[].

[1] https://dlang.org/phobos/std_random.html#.randomShuffle
[2] 
https://dlang.org/phobos/std_range_primitives.html#isRandomAccessRange
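
Put together as a complete program, the ASCII-only approach might look like the sketch below. The helper name shuffleAscii is made up for illustration and is not part of Phobos; the code assumes the buffer really contains nothing but ASCII.

```d
import std.ascii : uppercase;
import std.random : randomShuffle;
import std.stdio : writeln;
import std.string : representation;

// Shuffles an ASCII-only buffer in place by viewing it as ubyte[].
// shuffleAscii is a hypothetical helper name, not a Phobos function.
char[] shuffleAscii(char[] text)
{
    randomShuffle(text.representation);
    return text;
}

void main()
{
    char[] upper = uppercase.dup;
    shuffleAscii(upper);
    writeln(upper); // prints some permutation of the letters A to Z
}
```

The output differs from run to run, since randomShuffle uses the default random number generator when none is passed.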

Jul 20 2016

Mike Parker <aldacron gmail.com> writes:

On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
 On Wednesday, 20 July 2016 at 07:49:38 UTC, celavek wrote:

 If you are absolutely, 100% certain that you are dealing with 
 ASCII, you can do this:

And I forgot to add:

Otherwise, you'll want to convert to dchar[] (probably via 
std.utf.toUTF32) and pass that along instead.

Jul 20 2016

Mike Parker <aldacron gmail.com> writes:

On Wednesday, 20 July 2016 at 08:05:20 UTC, Mike Parker wrote:
 On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
 On Wednesday, 20 July 2016 at 07:49:38 UTC, celavek wrote:

 If you are absolutely, 100% certain that you are dealing with 
 ASCII, you can do this:

 And I forgot to add:

 Otherwise, you'll want to convert to dchar[] (probably via 
 std.utf.toUTF32) and pass that along instead.

Actually, std.conv.to might be better, since toUTF32 returns 
dstring:

import std.conv : to;
auto dcharArray = to!(dchar[])(charArray);
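
For text that may contain non-ASCII characters, the conversion route could be sketched as follows. The helper name shuffledCodePoints is hypothetical, and the function returns a new array rather than shuffling in place, because decoding to dchar[] has to allocate.

```d
import std.conv : to;
import std.random : randomShuffle;
import std.stdio : writeln;

// Returns a shuffled copy of the text, shuffling whole code points.
// shuffledCodePoints is a hypothetical helper name, not a Phobos function.
char[] shuffledCodePoints(const(char)[] text)
{
    auto codePoints = to!(dchar[])(text); // decodes UTF-8 into a new UTF-32 array
    randomShuffle(codePoints);            // dchar[] is a random access range
    return to!(char[])(codePoints);       // re-encodes the result as UTF-8
}

void main()
{
    writeln(shuffledCodePoints("größe"));
}
```

Note that this shuffles code points, not user-perceived characters, so a combining mark can end up separated from its base letter.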

Jul 20 2016

celavek <cetatzeanum yahoo.com> writes:

On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
 If you are absolutely, 100% certain that you are dealing with 
 ASCII, you can do this:

 ```
 import std.string : representation;
 randomShuffle(charArray.representation);
 ```

 That will give you a ubyte[] for char[] and a ushort[] for 
 wchar[].

 [1] https://dlang.org/phobos/std_random.html#.randomShuffle
 [2] 
 https://dlang.org/phobos/std_range_primitives.html#isRandomAccessRange

Ahhh! That again. I was thinking about using the representation. 
I should take a deeper look at the documentation.

As far as my current understanding goes, the shuffle will be done 
in place. If I use the "representation", would that still hold? 
That is, will I be able to use the same char[], but in shuffled 
form? (Of course I will test that.)

Thank you

Jul 20 2016

Mike Parker <aldacron gmail.com> writes:

On Wednesday, 20 July 2016 at 08:18:55 UTC, celavek wrote:

 As far as my current understanding goes the shuffle will be 
 done in place.
 If I use the "representation" would that still hold, that is 
 will I be able
 to use the same char[] but in the shuffled form? (of course I 
 will test that)

representation does not allocate any new memory. It points to the 
same memory, same data. If we think of D arrays as something like 
this:

struct Array(T) {
     size_t len;
     T* ptr;
}

Then representation is doing this:

Array!char original;
auto representation = Array!ubyte(original.len, cast(ubyte*) original.ptr);

So, yes, the char data will still be shuffled in place. All 
you're doing is getting a ubyte view onto it so that it can be 
treated as a range.
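
The aliasing described above can be verified with a short program. This is only a sketch; sharesMemory is a hypothetical helper written for the illustration.

```d
import std.string : representation;

// Returns true when the ubyte[] view aliases the original buffer.
// sharesMemory is a hypothetical helper used only for this illustration.
bool sharesMemory(char[] text)
{
    ubyte[] view = text.representation;
    return cast(void*) view.ptr is cast(void*) text.ptr
        && view.length == text.length;
}

void main()
{
    char[] text = "abc".dup;
    assert(sharesMemory(text));

    // A write through the view is visible through the original array.
    text.representation[0] = cast(ubyte) 'x';
    assert(text == "xbc");
}
```

Because the view and the original share one buffer, anything randomShuffle does to the ubyte[] shows up in the char[] as well.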

Jul 20 2016

celavek <cetatzeanum yahoo.com> writes:

On Wednesday, 20 July 2016 at 08:30:37 UTC, Mike Parker wrote:

 representation does not allocate any new memory. It points to 
 the same memory, same data. If we think of D arrays as 
 something like this:

 struct Array(T) {
     size_t len;
     T* ptr;
 }

 Then representation is doing this:

 Array original;
 Array representation(original.len, original.ptr);

 So, yes, the char data will still be shuffled in place. All 
 you're doing is getting a ubyte view onto it so that it can be 
 treated as a range.

Thank you for the very useful information. I really appreciate 
you taking the time to explain these, maybe trivial, things to 
me.

I confirmed the behavior with a test. It works as expected.

Jul 20 2016

pineapple <meapineapple gmail.com> writes:

On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
 You can then go to the documentation for 
 std.range.primitives.isRandomAccessRange [2], where you'll find 
 the following:

 "Although char[] and wchar[] (as well as their qualified 
 versions including string and wstring) are arrays, 
 isRandomAccessRange yields false for them because they use 
 variable-length encodings (UTF-8 and UTF-16 respectively). 
 These types are bidirectional ranges only."

There's also the shuffle module in mach.range which doesn't do 
any auto-decoding: 
https://github.com/pineapplemachine/mach.d/blob/master/mach/range/random/shuffle.d

Jul 20 2016

Mike Parker <aldacron gmail.com> writes:

On Wednesday, 20 July 2016 at 10:40:04 UTC, pineapple wrote:
 On Wednesday, 20 July 2016 at 08:02:07 UTC, Mike Parker wrote:
 You can then go to the documentation for 
 std.range.primitives.isRandomAccessRange [2], where you'll 
 find the following:

 "Although char[] and wchar[] (as well as their qualified 
 versions including string and wstring) are arrays, 
 isRandomAccessRange yields false for them because they use 
 variable-length encodings (UTF-8 and UTF-16 respectively). 
 These types are bidirectional ranges only."

 There's also the shuffle module in mach.range which doesn't do 
 any auto-decoding: 
 https://github.com/pineapplemachine/mach.d/blob/master/mach/range/random/shuffle.d

There is no auto-decoding going on here, as char[] and wchar[] 
are rejected outright since they are not considered random access 
ranges.

Jul 20 2016

pineapple <meapineapple gmail.com> writes:

On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here, as char[] and wchar[] 
 are rejected outright since they are not considered random 
 access ranges.

They are considered random access ranges by my ranges library, 
because they are treated as arrays of characters and not as 
unicode strings.

Jul 20 2016

pineapple <meapineapple gmail.com> writes:

On Wednesday, 20 July 2016 at 16:03:27 UTC, pineapple wrote:
 On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here, as char[] and wchar[] 
 are rejected outright since they are not considered random 
 access ranges.

 They are considered random access ranges by my ranges library, 
 because they are treated as arrays of characters and not as 
 unicode strings.

On second thought that's not even relevant - the linked-to module 
performs an out-of-place shuffle and so does not even require the 
input range to have random access.

Jul 20 2016

pineapple <meapineapple gmail.com> writes:

On Wednesday, 20 July 2016 at 16:04:50 UTC, pineapple wrote:
 On Wednesday, 20 July 2016 at 16:03:27 UTC, pineapple wrote:
 On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here, as char[] and 
 wchar[] are rejected outright since they are not considered 
 random access ranges.

 They are considered random access ranges by my ranges library, 
 because they are treated as arrays of characters and not as 
 unicode strings.

 On second thought that's not even relevant - the linked-to 
 module performs an out-of-place shuffle and so does not even 
 require the input range to have random access.

Pardon my being scatterbrained (and there not being an "edit 
post" function) - you're referring to phobos not considering 
char[] and wchar[] to have random access? The reason they are not 
considered to have random access is because they are auto-decoded 
by other functions that handle them, and the auto-decoding makes 
random access inefficient. Not because randomShuffle itself 
auto-decodes them.

Jul 20 2016

Mike Parker <aldacron gmail.com> writes:

On Wednesday, 20 July 2016 at 16:08:26 UTC, pineapple wrote:

 Pardon my being scatterbrained (and there not being an "edit 
 post" function) - you're referring to phobos not considering 
 char[] and wchar[] to have random access? The reason they are 
 not considered to have random access is because they are 
 auto-decoded by other functions that handle them, and the 
 auto-decoding makes random access inefficient. Not because 
 randomShuffle itself auto-decodes them.

The relevant lines I quoted from the docs above explain quite 
clearly that it's because they are multi-byte formats. Indexing 
them is not inefficient, it simply makes no sense. What does it 
mean to take the value at index i when it is part of a multi-byte 
sequence that continues at index i+1? Auto-decoding has nothing 
to do with it.
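
A small sketch of the multi-byte problem follows. It uses a deterministic reversal in place of a random shuffle so that the outcome is reproducible; isWellFormed is a hypothetical helper, not a Phobos function.

```d
import std.algorithm.mutation : reverse;
import std.string : representation;
import std.utf : UTFException, validate;

// Reports whether the text is well-formed UTF-8.
// isWellFormed is a hypothetical helper wrapping std.utf.validate.
bool isWellFormed(const(char)[] text)
{
    try
    {
        validate(text);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    char[] text = "\u00E9".dup;   // U+00E9, encoded in UTF-8 as 0xC3 0xA9
    assert(text.length == 2);     // two code units, one code point
    assert(text[0] == 0xC3);      // lead byte
    assert(text[1] == 0xA9);      // continuation byte, not a character by itself
    assert(isWellFormed(text));

    // Reordering code units (a reversal standing in for a shuffle) puts the
    // continuation byte first, which is malformed UTF-8.
    reverse(text.representation);
    assert(!isWellFormed(text));
}
```

This is why the representation trick is only safe when every character occupies exactly one code unit, as in pure ASCII.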

Jul 20 2016

ag0aep6g <anonymous example.com> writes:

On 07/20/2016 06:18 PM, Mike Parker wrote:
 The relevant lines I quoted from the docs above explain quite clearly
 that it's because they are multi-byte formats. Indexing them is not
 inefficient, it simply makes no sense. What does it mean to take the
 value at index i when it is part of a multi-byte sequence that continues
 at index i+1? Auto-decoding has nothing to do with it.

Without auto decoding, char[] would (most probably) be a random access 
range of code units. Taking the value at index i would return the code 
unit at index i, like it does for the array.

It's not that way, because narrow strings are decoded by the range 
primitives (auto decoding).
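
The difference between array indexing and the range primitives can be seen directly. A minimal sketch, assuming a Phobos version in which narrow strings are still auto-decoded:

```d
import std.range.primitives : ElementType, front, walkLength;

// Indexing the array yields a code unit ...
static assert(is(typeof((char[]).init[0]) == char));
// ... while the range primitive front yields a decoded code point.
static assert(is(typeof((char[]).init.front) == dchar));
static assert(is(ElementType!(char[]) == dchar));

void main()
{
    string s = "\u00E9t\u00E9";  // the three code points of "été"
    assert(s.length == 5);       // array length counts code units
    assert(s[0] == 0xC3);        // array indexing: first code unit
    assert(s.front == 0xE9);     // range primitive: decoded first code point
    assert(s.walkLength == 3);   // iterating as a range visits code points
}
```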

Jul 20 2016

Jesse Phillips <Jesse.K.Phillips+D gmail.com> writes:

On Wednesday, 20 July 2016 at 16:03:27 UTC, pineapple wrote:
 On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here, as char[] and wchar[] 
 are rejected outright since they are not considered random 
 access ranges.

 They are considered random access ranges by my ranges library, 
 because they are treated as arrays of characters and not as 
 unicode strings.

I think you mean that your range library treats them as arrays of 
code units, meaning your library will break (some) unicode 
strings.

Note that auto-decoding and being a random access range are 
different things. The isRandomAccessRange check has to make a 
special case for "narrow" strings; otherwise they would be 
considered random access even though front automatically decodes.

922: static assert(!isNarrowString!R);

Jul 20 2016

pineapple <meapineapple gmail.com> writes:

On Wednesday, 20 July 2016 at 18:32:15 UTC, Jesse Phillips wrote:
 I think you mean that your range library treats them as arrays 
 of code units, meaning your library will break (some) unicode 
 strings.

Right - I disagree with the assessment that all (or even most) 
char[] types are intended to represent unicode strings, rather 
than arrays containing chars.

If you want your array to be interpreted as a unicode string, 
then you should use std.uni's byGrapheme or similar functions.
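
As a rough illustration of the three levels involved (code units, code points, graphemes), here is a minimal sketch using std.uni.byGrapheme:

```d
import std.range.primitives : walkLength;
import std.uni : byGrapheme;

void main()
{
    string text = "e\u0301"; // 'e' followed by U+0301 COMBINING ACUTE ACCENT
    assert(text.length == 3);                // UTF-8 code units
    assert(text.walkLength == 2);            // code points (auto-decoded)
    assert(text.byGrapheme.walkLength == 1); // one user-perceived character
}
```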

Jul 21 2016

ketmar <ketmar ketmar.no-ip.org> writes:

On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here,

...

 as char[] and wchar[] are rejected outright since they are not 
 considered random access ranges.

...due to autodecoding.

Jul 20 2016

Ali Çehreli <acehreli yahoo.com> writes:

On 07/20/2016 09:44 AM, ketmar wrote:
 On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here,

 ...

 as char[] and wchar[] are rejected outright since they are not
 considered random access ranges.

 ...due to autodecoding.

I think both their not being random access ranges and the 
auto-decoding in Phobos are design decisions that follow from the 
fact that char[] holds a multi-byte encoding.

Phobos could choose not to auto-decode but char[] would still be 
multi-byte, making it impossible to access randomly.

Ali

Jul 20 2016

ketmar <ketmar ketmar.no-ip.org> writes:

On Wednesday, 20 July 2016 at 17:31:18 UTC, Ali Çehreli wrote:
 I think both not being random access ranges and there is 
 auto-decoding in Phobos are design decisions due to the fact 
 that char[] is a multi-byte encoding.

 Phobos could choose not to auto-decode but char[] would still 
 be multi-byte, making it impossible to access randomly.

but it does happen that we have autodecoding, and 
non-random-access char ranges, and it is clearly tied. so, 
leaving aside "what if..." things, we can say that it is 
autodecoding issue. ;-)

Jul 20 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Wednesday, 20 July 2016 at 17:31:18 UTC, Ali Çehreli wrote:
 making it impossible to access randomly

making it impossible to access randomly __correctly__, unless 
you're safely assuming there's only ASCII in your string.

Jul 20 2016

Ali Çehreli <acehreli yahoo.com> writes:

On 07/20/2016 10:40 AM, Jack Stouffer wrote:
 On Wednesday, 20 July 2016 at 17:31:18 UTC, Ali Çehreli wrote:
 making it impossible to access randomly

 making it impossible to access randomly __correctly__, unless you're
 safely assuming there's only ASCII in your string.

Yes, perhaps I should have said "making it not meaningful to access 
randomly" (in general, as you note).

Ali

Jul 20 2016

Mike Parker <aldacron gmail.com> writes:

On Wednesday, 20 July 2016 at 16:44:11 UTC, ketmar wrote:
 On Wednesday, 20 July 2016 at 13:33:34 UTC, Mike Parker wrote:
 There is no auto-decoding going on here,

 ...

 as char[] and wchar[] are rejected outright since they are not 
 considered random access ranges.

 ...due to autodecoding.

No, due to them being multi-byte formats. I don't see what auto 
decoding has to do with it. That's a separate concept. We could 
take auto decoding out of Phobos and still disqualify them as 
random access ranges.

Jul 20 2016

celavek <cetatzeanum yahoo.com> writes:

On Wednesday, 20 July 2016 at 10:40:04 UTC, pineapple wrote:

 There's also the shuffle module in mach.range which doesn't do 
 any auto-decoding: 
 https://github.com/pineapplemachine/mach.d/blob/master/mach/range/random/shuffle.d

Interesting project. Thanks for the link.

Jul 20 2016
