digitalmars.D.bugs - Moving structs containing strings ...

Bob W (55/55) Apr 23 2006 After moving structs containing char[n] arrays

Derek Parnell (12/12) Apr 23 2006 On Mon, 24 Apr 2006 08:44:57 +1000, Bob W wrote:

Bob W (10/22) Apr 23 2006 Thanks for your suggestion. But I am more keen to know

kris (14/49) Apr 23 2006 eeeewe :)

Bob W (5/17) Apr 23 2006 Absolutely right! Now I am asking myself the usual question:

Derek Parnell (12/15) Apr 23 2006 Another solution is to recalculate the addresses of the id strings after

Bob W (9/14) Apr 24 2006 I'll gladly accept this as a solution, because it will

Derek Parnell (44/61) Apr 24 2006 However, there's a typo in it. Here is a fuller test of it ...

Bob W (50/53) Apr 24 2006 While I like your source in principal, I am

Frits van Bommel (10/27) Apr 24 2006 How about defining the struct like this:

Bob W (18/26) Apr 24 2006 No idea what slightly slower means to you, but I can

Bob W (14/14) Apr 24 2006 Thanks guys for your replies!

"Bob W" <nospam aol.com> writes:

After moving structs containing char[n] arrays
and char[] strings, the latter can point to the
wrong string.

This is a sample program producing the
'unexpected':

------

///////////////////////////////
// test - array struct quirks
///////////////////////////////

// This program demonstrates an unexpected
// D 'feature':
//
// When members of an array of structs are copied
// from higher to lower array indices, some char[]
// vars of the resulting structs are pointing to
// wrong string locations (dmd 0.154).

import std.stdio, std.string;

alias writefln wrl;  alias writef wr;  alias toString tS;

struct Sc { uint val;  char[3] idz;  char[] id; }

Sc[] arr;


int main() {

  // allocate array space and init array

  arr.length=3;
  with (arr[0]) { idz[]="ab\0";  id=idz[0..2];  val=101; }
  with (arr[1]) { idz[]="cd\0";  id=idz[0..2];  val=102; }
  with (arr[2]) { idz[]="ef\0";  id=idz[0..2];  val=103; }

  // display original

  wrl();  wrl("The original array:");
  foreach (uint x, Sc v; arr)
    wrl("arr[%d] - val:%d, id:'%s'", x,v.val,v.id);


  // delete arr[0] by moving arr[1] and arr[2]

  arr[0]=arr[1];  arr[1]=arr[2];
  arr=arr[0..2];   // adjust length

  // diplay 'expected' and 'actual'

  wrl();  wrl("Reduced array should look like this:");
  foreach (uint x, Sc v; arr)
    wrl("arr[%d] - val:%d, id:'%s'", x,v.val,tS(v.idz));

  wrl();  wrl("But it actually looks like that:");
  foreach (uint x, Sc v; arr)
    wrl("arr[%d] - val:%d, id:'%s'", x,v.val,v.id);

  return 0;
}

--------------

Output of the above example:

    The original array:
    arr[0] - val:101, id:'ab'
    arr[1] - val:102, id:'cd'
    arr[2] - val:103, id:'ef'

    Reduced array should look like this:
    arr[0] - val:102, id:'cd'
    arr[1] - val:103, id:'ef'

    But it actually looks like that:
    arr[0] - val:102, id:'ef'      <<<<
    arr[1] - val:103, id:'ef'

Apr 23 2006

"Derek Parnell" <derek psych.ward> writes:

On Mon, 24 Apr 2006 08:44:57 +1000, Bob W <nospam aol.com> wrote:

For a workaround,

    with (arr[0]) { idz[]="ab\0";  id=idz[0..2];  val=101; }
    with (arr[1]) { idz[]="cd\0";  id=idz[0..2];  val=102; }
    with (arr[2]) { idz[]="ef\0";  id=idz[0..2];  val=103; }

replace those lines with

   with (arr[0]) { idz[]="ab\0";  id=idz[0..2].dup;  val=101; }
   with (arr[1]) { idz[]="cd\0";  id=idz[0..2].dup;  val=102; }
   with (arr[2]) { idz[]="ef\0";  id=idz[0..2].dup;  val=103; }


-- 
Derek Parnell
Melbourne, Australia

Apr 23 2006

"Bob W" <nospam aol.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message 
news:op.s8hhktza6b8z09 ginger.vic.bigpond.net.au...
 On Mon, 24 Apr 2006 08:44:57 +1000, Bob W <nospam aol.com> wrote:

 For a workaround,

    with (arr[0]) { idz[]="ab\0";  id=idz[0..2];  val=101; }
    with (arr[1]) { idz[]="cd\0";  id=idz[0..2];  val=102; }
    with (arr[2]) { idz[]="ef\0";  id=idz[0..2];  val=103; }

 replace those lines with

   with (arr[0]) { idz[]="ab\0";  id=idz[0..2].dup;  val=101; }
   with (arr[1]) { idz[]="cd\0";  id=idz[0..2].dup;  val=102; }
   with (arr[2]) { idz[]="ef\0";  id=idz[0..2].dup;  val=103; }


 -- 
 Derek Parnell
 Melbourne, Australia

Thanks for your suggestion. But I am more keen to know
why the example mentioned in my post is not producing
the desired results than in applying a workaround.

If this question remains unanswered I'll have to revert
to using pointers for the application which initially has
produced these erroneus results. Using '.dup' several
million times would probably bring my system to a
standstill. I might be wrong with this guess, however ...

Apr 23 2006

kris <foo bar.com> writes:

Bob W wrote:
 "Derek Parnell" <derek psych.ward> wrote in message 
 news:op.s8hhktza6b8z09 ginger.vic.bigpond.net.au...
 
On Mon, 24 Apr 2006 08:44:57 +1000, Bob W <nospam aol.com> wrote:

For a workaround,

   with (arr[0]) { idz[]="ab\0";  id=idz[0..2];  val=101; }
   with (arr[1]) { idz[]="cd\0";  id=idz[0..2];  val=102; }
   with (arr[2]) { idz[]="ef\0";  id=idz[0..2];  val=103; }

replace those lines with

  with (arr[0]) { idz[]="ab\0";  id=idz[0..2].dup;  val=101; }
  with (arr[1]) { idz[]="cd\0";  id=idz[0..2].dup;  val=102; }
  with (arr[2]) { idz[]="ef\0";  id=idz[0..2].dup;  val=103; }


-- 
Derek Parnell
Melbourne, Australia

 
 
 Thanks for your suggestion. But I am more keen to know
 why the example mentioned in my post is not producing
 the desired results than in applying a workaround.
 
 If this question remains unanswered I'll have to revert
 to using pointers for the application which initially has
 produced these erroneus results. Using '.dup' several
 million times would probably bring my system to a
 standstill. I might be wrong with this guess, however ...
 
 
 


eeeewe :)

The 'id' slices are mapped to absolute addresses, and thus are mapped to 
specific addresses within the array.

When you remove array element 0 and compress the array, those addresses 
are not updated, and you end up with the result shown. After 
compression, the 'id' slice for val:102 is still pointing at where it 
used to live, although the original "cd" text has now been overwritten 
with the val:103 "ef" text instead :)

The 'id' slice for val:103 is actually still pointing to its original 
location, which is now /past/ the end of the array (hehe).

One solution is to adjust each 'id' slice when you compress or otherwise 
adjust the array content. Wouldn't you have to do this with pointers also?

- Kris

Apr 23 2006

"Bob W" <nospam aol.com> writes:

"kris" <foo bar.com> wrote in message news:e2hbp1$pnk$1 digitaldaemon.com...
 eeeewe :)

 The 'id' slices are mapped to absolute addresses, and thus are mapped to 
 specific addresses within the array.

Absolutely right! Now I am asking myself the usual question:
Why didn't I take a break over the weekend and recognize this
first thing on Monday morning?


 When you remove array element 0 and compress the array, those addresses 
 are not updated, and you end up with the result shown. After compression, 
 the 'id' slice for val:102 is still pointing at where it used to live, 
 although the original "cd" text has now been overwritten with the val:103 
 "ef" text instead :)

 The 'id' slice for val:103 is actually still pointing to its original 
 location, which is now /past/ the end of the array (hehe).

 One solution is to adjust each 'id' slice when you compress or otherwise 
 adjust the array content. Wouldn't you have to do this with pointers also?

I'll adjust. Thank you very much!

Apr 23 2006

Derek Parnell <derek psych.ward> writes:

On Mon, 24 Apr 2006 00:44:57 +0200, Bob W wrote:

 After moving structs containing char[n] arrays
 and char[] strings, the latter can point to the
 wrong string.

Another solution is to recalculate the addresses of the id strings after
changing the length ...

  arr=arr[0..2];   // adjust length
  foreach( inout Sc s; arr)
      s.id = s.idz[0..s.id.length];

-- 
Derek
(skype: derek.j.parnell)
Melbourne, Australia
"Down with mediocracy!"
24/04/2006 12:19:51 PM

Apr 23 2006

"Bob W" <nospam aol.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message 
news:c7u4nwvom7ca$.o0vru71wqxwe.dlg 40tude.net...
 Another solution is to recalculate the addresses of the id strings after
 changing the length ...

  arr=arr[0..2];   // adjust length
  foreach( inout Sc s; arr)
      s.id = s.idz[0..s.id.length];

I'll gladly accept this as a solution, because it will
utilize existing strings within the structs-array.

The workaround you have posted earlier allocates
memory and duplicates strings one by one. That is
something I would not want to try in real-world
applications due to obvious reasons.

Thanks for your input.

Apr 24 2006

"Derek Parnell" <derek psych.ward> writes:

On Mon, 24 Apr 2006 20:19:43 +1000, Bob W <nospam aol.com> wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:c7u4nwvom7ca$.o0vru71wqxwe.dlg 40tude.net...
 Another solution is to recalculate the addresses of the id strings after
 changing the length ...

  arr=arr[0..2];   // adjust length
  foreach( inout Sc s; arr)
      s.id = s.idz[0..s.id.length];

 I'll gladly accept this as a solution, because it will
 utilize existing strings within the structs-array.

 The workaround you have posted earlier allocates
 memory and duplicates strings one by one. That is
 something I would not want to try in real-world
 applications due to obvious reasons.

 Thanks for your input.

However, there's a typo in it. Here is a fuller test of it ...

//------------------------
import std.stdio;

alias std.stdio.writefln wrl;

void AssignIdAddr(Sc[] pArr)
{
   foreach( inout Sc s; pArr)
       s.id = s.idz[0 .. $];
}

void DisplayArray(Sc[] pArr, char[] pMsg)
{
   wrl("\n%s...", pMsg);
   foreach (uint i, Sc s; pArr)
     wrl("arr[%d] - val:%d, id:'%s'", i, s.val, s.id);
}

struct Sc { uint val;  char[2] idz;  char[] id;
             void opCall(char[] z, uint v)
              {
                  idz[]=z;
                  val = v;
             }
         }

Sc[] arr;

void main()
{

   // init array
   arr.length = arr.length + 1; arr[$-1]("ab", 101);
   arr.length = arr.length + 1; arr[$-1]("cd", 102);
   arr.length = arr.length + 1; arr[$-1]("ef", 103);
   arr.length = arr.length + 1; arr[$-1]("gh", 104);
   AssignIdAddr(arr);

   DisplayArray(arr, "Original");

   // delete first item
   for( uint i = 0; (i+1) < arr.length; i++)
     arr[i]=arr[i+1];
   arr.length = arr.length - 1;   // adjust length
   AssignIdAddr(arr);

   DisplayArray(arr, "Updated");
}
//------------------------

-- 
Derek Parnell
Melbourne, Australia

Apr 24 2006

"Bob W" <nospam aol.com> writes:

"Derek Parnell" <derek psych.ward> wrote in message 
news:op.s8if7ipr6b8z09 ginger.vic.bigpond.net.au...
 .........

 However, there's a typo in it. Here is a fuller test of it ...

 .........

While I like your source in principal, I am
somewhat allergic to sequential memory
allocations. Therefore I have made some
modifications (see sample below).

std.file.read() is probably still not working
properly with large files due to a similar
reason.

Please note that this is my last post about
the subject and I definitely do not intend to
start anything like a programming competition
(I wish I had the time for it though).

My own program, which looks completely
different than the simplified examples, was
fixed in a couple of minutes after reading
kris' post. So I won't concentrate on this
issue any further.

------------------


import std.stdio;

alias std.stdio.writefln wrl;

struct Sc {
  uint val;  char[2] idz;  char[] id;
  void opCall() { id=idz; }
  void opCall(char[] z, uint v) { idz[]=z;  val=v;  id=idz; }
}

Sc[] arr;


void ShowMe(char[] msg) {
  wrl("\n%s ...", msg);
  foreach (uint i, Sc s; arr)
    wrl("arr[%d] - val:%d, id:'%s'", i,s.val,s.id);
}

void main() {
  // DmD documentation - quote: " ... Resizing a dynamic
  //   array is a relatively expensive operation ..."

  arr.length=99;  // leave enough space for ev. additions

  uint n=0;
  arr[n++]("ab",101);
  arr[n++]("cd",102);
  arr[n++]("ef",103);
  arr[n++]("gh",104);
  arr.length=n;   // now cut it's size back

  ShowMe("Original");

  // delete 1st item

  for (uint i=1; i<arr.length; i++) {
    arr[i-1]=arr[i];  arr[i-1]();
  }
  arr.length=arr.length-1;   // adjust length

  ShowMe("Updated");
}

Apr 24 2006

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Bob W wrote:
 "Derek Parnell" <derek psych.ward> wrote in message 
 news:c7u4nwvom7ca$.o0vru71wqxwe.dlg 40tude.net...
 Another solution is to recalculate the addresses of the id strings after
 changing the length ...

  arr=arr[0..2];   // adjust length
  foreach( inout Sc s; arr)
      s.id = s.idz[0..s.id.length];

 
 I'll gladly accept this as a solution, because it will
 utilize existing strings within the structs-array.
 
 The workaround you have posted earlier allocates
 memory and duplicates strings one by one. That is
 something I would not want to try in real-world
 applications due to obvious reasons.

How about defining the struct like this:

struct Sc { uint val;  char[3] idz;  char[] id() { return idz[0..3]; } }

This turns 'id' into a property instead of a field. This might be 
slightly slower*, but has several advantages:

- It will always return the current string without needing manual updates.
- It doesn't allocate.
- It removes the need to initialize 'id' separately.
- Added bonus: Sc.sizeof is halved. (goes from 16 bytes to 8 bytes)


*) Not necessarily though, it may well be optimized out.

Apr 24 2006

"Bob W" <nospam aol.com> writes:

"Frits van Bommel" <fvbommel REMwOVExCAPSs.nl> wrote in message 
news:e2iqkp$2fvl$1 digitaldaemon.com...
 How about defining the struct like this:

 struct Sc { uint val;  char[3] idz;  char[] id() { return idz[0..3]; } }

 This turns 'id' into a property instead of a field. This might be slightly 
 slower*, but has several advantages:

No idea what slightly slower means to you, but I can
guarantee you that with indirect string access you
won't even get to 50% of the original speed.


 - It will always return the current string without needing manual updates.

I need direct access to the strings, due to performance
reasons. Therefore I do not mind the updates whenerver
they are required.


 - It doesn't allocate.

This would be unacceptable anyway.


 - It removes the need to initialize 'id' separately.

You are referring to the sample which has those
convenient constant length strings. My application
does not work this way, however. I'd have to
have some basic initialisation first in order to
get quicker access later.


 - Added bonus: Sc.sizeof is halved. (goes from 16 bytes to 8 bytes)

I guess most of us have 1 or 2GB installed these days, true?


Don't get me wrong. Your idea would be definitely
worthwhile considering for other applications. But it
just does not fit my current requirements.

Apr 24 2006

"Bob W" <nospam aol.com> writes:

Thanks guys for your replies!

For anyone who had no time to check upon
my suspicions that D was producing something
'unexpected', here is a short explanation:

My sample program can be imagined as the
attempt to mark a freeway exit with an arrow
inside a car in order to find it later. Unfortunately
I was moving the car afterwards so the arrow
was pointing to ... (you guess).

As a result I have to admit that D did not
deserve any of my secret suspicions about
its stability in this respect.

At least I was careful enough not to use
the word 'bug' in any of my posts.   ;-)

Apr 24 2006

D Programming

C/C++ Programming

Other

digitalmars.D.bugs - Moving structs containing strings ...