www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - Moving structs containing strings ...

reply "Bob W" <nospam aol.com> writes:
After moving structs containing char[n] arrays
and char[] strings, the latter can point to the
wrong string.

This is a sample program producing the
'unexpected':

------

///////////////////////////////
// test - array struct quirks
///////////////////////////////

// This program demonstrates an unexpected
// D 'feature':
//
// When members of an array of structs are copied
// from higher to lower array indices, some char[]
// vars of the resulting structs are pointing to
// wrong string locations (dmd 0.154).

import std.stdio, std.string;

alias writefln wrl;  alias writef wr;  alias toString tS;

struct Sc { uint val;  char[3] idz;  char[] id; }

Sc[] arr;


int main() {

  // allocate array space and init array

  arr.length=3;
  with (arr[0]) { idz[]="ab\0";  id=idz[0..2];  val=101; }
  with (arr[1]) { idz[]="cd\0";  id=idz[0..2];  val=102; }
  with (arr[2]) { idz[]="ef\0";  id=idz[0..2];  val=103; }

  // display original

  wrl();  wrl("The original array:");
  foreach (uint x, Sc v; arr)
    wrl("arr[%d] - val:%d, id:'%s'", x,v.val,v.id);

  // ####### this is the trouble spot: #######
  // delete arr[0] by moving arr[1] and arr[2]

  arr[0]=arr[1];  arr[1]=arr[2];
  arr=arr[0..2];   // adjust length

  // diplay 'expected' and 'actual'

  wrl();  wrl("Reduced array should look like this:");
  foreach (uint x, Sc v; arr)
    wrl("arr[%d] - val:%d, id:'%s'", x,v.val,tS(v.idz));

  wrl();  wrl("But it actually looks like that:");
  foreach (uint x, Sc v; arr)
    wrl("arr[%d] - val:%d, id:'%s'", x,v.val,v.id);

  return 0;
}

--------------

Output of the above example:

    The original array:
    arr[0] - val:101, id:'ab'
    arr[1] - val:102, id:'cd'
    arr[2] - val:103, id:'ef'

    Reduced array should look like this:
    arr[0] - val:102, id:'cd'
    arr[1] - val:103, id:'ef'

    But it actually looks like that:
    arr[0] - val:102, id:'ef'      <<<<
    arr[1] - val:103, id:'ef'
Apr 23 2006
next sibling parent reply "Derek Parnell" <derek psych.ward> writes:
On Mon, 24 Apr 2006 08:44:57 +1000, Bob W <nospam aol.com> wrote:

For a workaround,

    with (arr[0]) { idz[]="ab\0";  id=idz[0..2];  val=101; }
    with (arr[1]) { idz[]="cd\0";  id=idz[0..2];  val=102; }
    with (arr[2]) { idz[]="ef\0";  id=idz[0..2];  val=103; }

replace those lines with

   with (arr[0]) { idz[]="ab\0";  id=idz[0..2].dup;  val=101; }
   with (arr[1]) { idz[]="cd\0";  id=idz[0..2].dup;  val=102; }
   with (arr[2]) { idz[]="ef\0";  id=idz[0..2].dup;  val=103; }


-- 
Derek Parnell
Melbourne, Australia
Apr 23 2006
parent reply "Bob W" <nospam aol.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:op.s8hhktza6b8z09 ginger.vic.bigpond.net.au...
 On Mon, 24 Apr 2006 08:44:57 +1000, Bob W <nospam aol.com> wrote:

 For a workaround,

    with (arr[0]) { idz[]="ab\0";  id=idz[0..2];  val=101; }
    with (arr[1]) { idz[]="cd\0";  id=idz[0..2];  val=102; }
    with (arr[2]) { idz[]="ef\0";  id=idz[0..2];  val=103; }

 replace those lines with

   with (arr[0]) { idz[]="ab\0";  id=idz[0..2].dup;  val=101; }
   with (arr[1]) { idz[]="cd\0";  id=idz[0..2].dup;  val=102; }
   with (arr[2]) { idz[]="ef\0";  id=idz[0..2].dup;  val=103; }


 -- 
 Derek Parnell
 Melbourne, Australia

Thanks for your suggestion. But I am more keen to know why the example mentioned in my post is not producing the desired results than in applying a workaround. If this question remains unanswered I'll have to revert to using pointers for the application which initially has produced these erroneus results. Using '.dup' several million times would probably bring my system to a standstill. I might be wrong with this guess, however ...
Apr 23 2006
parent reply kris <foo bar.com> writes:
Bob W wrote:
 "Derek Parnell" <derek psych.ward> wrote in message 
 news:op.s8hhktza6b8z09 ginger.vic.bigpond.net.au...
 
On Mon, 24 Apr 2006 08:44:57 +1000, Bob W <nospam aol.com> wrote:

For a workaround,

   with (arr[0]) { idz[]="ab\0";  id=idz[0..2];  val=101; }
   with (arr[1]) { idz[]="cd\0";  id=idz[0..2];  val=102; }
   with (arr[2]) { idz[]="ef\0";  id=idz[0..2];  val=103; }

replace those lines with

  with (arr[0]) { idz[]="ab\0";  id=idz[0..2].dup;  val=101; }
  with (arr[1]) { idz[]="cd\0";  id=idz[0..2].dup;  val=102; }
  with (arr[2]) { idz[]="ef\0";  id=idz[0..2].dup;  val=103; }


-- 
Derek Parnell
Melbourne, Australia

Thanks for your suggestion. But I am more keen to know why the example mentioned in my post is not producing the desired results than in applying a workaround. If this question remains unanswered I'll have to revert to using pointers for the application which initially has produced these erroneus results. Using '.dup' several million times would probably bring my system to a standstill. I might be wrong with this guess, however ...

eeeewe :) The 'id' slices are mapped to absolute addresses, and thus are mapped to specific addresses within the array. When you remove array element 0 and compress the array, those addresses are not updated, and you end up with the result shown. After compression, the 'id' slice for val:102 is still pointing at where it used to live, although the original "cd" text has now been overwritten with the val:103 "ef" text instead :) The 'id' slice for val:103 is actually still pointing to its original location, which is now /past/ the end of the array (hehe). One solution is to adjust each 'id' slice when you compress or otherwise adjust the array content. Wouldn't you have to do this with pointers also? - Kris
Apr 23 2006
parent "Bob W" <nospam aol.com> writes:
"kris" <foo bar.com> wrote in message news:e2hbp1$pnk$1 digitaldaemon.com...
 eeeewe :)

 The 'id' slices are mapped to absolute addresses, and thus are mapped to 
 specific addresses within the array.

Absolutely right! Now I am asking myself the usual question: Why didn't I take a break over the weekend and recognize this first thing on Monday morning?
 When you remove array element 0 and compress the array, those addresses 
 are not updated, and you end up with the result shown. After compression, 
 the 'id' slice for val:102 is still pointing at where it used to live, 
 although the original "cd" text has now been overwritten with the val:103 
 "ef" text instead :)

 The 'id' slice for val:103 is actually still pointing to its original 
 location, which is now /past/ the end of the array (hehe).

 One solution is to adjust each 'id' slice when you compress or otherwise 
 adjust the array content. Wouldn't you have to do this with pointers also?

I'll adjust. Thank you very much!
Apr 23 2006
prev sibling next sibling parent reply Derek Parnell <derek psych.ward> writes:
On Mon, 24 Apr 2006 00:44:57 +0200, Bob W wrote:

 After moving structs containing char[n] arrays
 and char[] strings, the latter can point to the
 wrong string.

Another solution is to recalculate the addresses of the id strings after changing the length ... arr=arr[0..2]; // adjust length foreach( inout Sc s; arr) s.id = s.idz[0..s.id.length]; -- Derek (skype: derek.j.parnell) Melbourne, Australia "Down with mediocracy!" 24/04/2006 12:19:51 PM
Apr 23 2006
parent reply "Bob W" <nospam aol.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:c7u4nwvom7ca$.o0vru71wqxwe.dlg 40tude.net...
 Another solution is to recalculate the addresses of the id strings after
 changing the length ...

  arr=arr[0..2];   // adjust length
  foreach( inout Sc s; arr)
      s.id = s.idz[0..s.id.length];

I'll gladly accept this as a solution, because it will utilize existing strings within the structs-array. The workaround you have posted earlier allocates memory and duplicates strings one by one. That is something I would not want to try in real-world applications due to obvious reasons. Thanks for your input.
Apr 24 2006
next sibling parent reply "Derek Parnell" <derek psych.ward> writes:
On Mon, 24 Apr 2006 20:19:43 +1000, Bob W <nospam aol.com> wrote:

 "Derek Parnell" <derek psych.ward> wrote in message
 news:c7u4nwvom7ca$.o0vru71wqxwe.dlg 40tude.net...
 Another solution is to recalculate the addresses of the id strings after
 changing the length ...

  arr=arr[0..2];   // adjust length
  foreach( inout Sc s; arr)
      s.id = s.idz[0..s.id.length];

I'll gladly accept this as a solution, because it will utilize existing strings within the structs-array. The workaround you have posted earlier allocates memory and duplicates strings one by one. That is something I would not want to try in real-world applications due to obvious reasons. Thanks for your input.

However, there's a typo in it. Here is a fuller test of it ... //------------------------ import std.stdio; alias std.stdio.writefln wrl; void AssignIdAddr(Sc[] pArr) { foreach( inout Sc s; pArr) s.id = s.idz[0 .. $]; } void DisplayArray(Sc[] pArr, char[] pMsg) { wrl("\n%s...", pMsg); foreach (uint i, Sc s; pArr) wrl("arr[%d] - val:%d, id:'%s'", i, s.val, s.id); } struct Sc { uint val; char[2] idz; char[] id; void opCall(char[] z, uint v) { idz[]=z; val = v; } } Sc[] arr; void main() { // init array arr.length = arr.length + 1; arr[$-1]("ab", 101); arr.length = arr.length + 1; arr[$-1]("cd", 102); arr.length = arr.length + 1; arr[$-1]("ef", 103); arr.length = arr.length + 1; arr[$-1]("gh", 104); AssignIdAddr(arr); DisplayArray(arr, "Original"); // delete first item for( uint i = 0; (i+1) < arr.length; i++) arr[i]=arr[i+1]; arr.length = arr.length - 1; // adjust length AssignIdAddr(arr); DisplayArray(arr, "Updated"); } //------------------------ -- Derek Parnell Melbourne, Australia
Apr 24 2006
parent "Bob W" <nospam aol.com> writes:
"Derek Parnell" <derek psych.ward> wrote in message 
news:op.s8if7ipr6b8z09 ginger.vic.bigpond.net.au...
 .........

 However, there's a typo in it. Here is a fuller test of it ...

 .........

While I like your source in principal, I am somewhat allergic to sequential memory allocations. Therefore I have made some modifications (see sample below). std.file.read() is probably still not working properly with large files due to a similar reason. Please note that this is my last post about the subject and I definitely do not intend to start anything like a programming competition (I wish I had the time for it though). My own program, which looks completely different than the simplified examples, was fixed in a couple of minutes after reading kris' post. So I won't concentrate on this issue any further. ------------------ import std.stdio; alias std.stdio.writefln wrl; struct Sc { uint val; char[2] idz; char[] id; void opCall() { id=idz; } void opCall(char[] z, uint v) { idz[]=z; val=v; id=idz; } } Sc[] arr; void ShowMe(char[] msg) { wrl("\n%s ...", msg); foreach (uint i, Sc s; arr) wrl("arr[%d] - val:%d, id:'%s'", i,s.val,s.id); } void main() { // DmD documentation - quote: " ... Resizing a dynamic // array is a relatively expensive operation ..." arr.length=99; // leave enough space for ev. additions uint n=0; arr[n++]("ab",101); arr[n++]("cd",102); arr[n++]("ef",103); arr[n++]("gh",104); arr.length=n; // now cut it's size back ShowMe("Original"); // delete 1st item for (uint i=1; i<arr.length; i++) { arr[i-1]=arr[i]; arr[i-1](); } arr.length=arr.length-1; // adjust length ShowMe("Updated"); }
Apr 24 2006
prev sibling parent reply Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:
Bob W wrote:
 "Derek Parnell" <derek psych.ward> wrote in message 
 news:c7u4nwvom7ca$.o0vru71wqxwe.dlg 40tude.net...
 Another solution is to recalculate the addresses of the id strings after
 changing the length ...

  arr=arr[0..2];   // adjust length
  foreach( inout Sc s; arr)
      s.id = s.idz[0..s.id.length];

I'll gladly accept this as a solution, because it will utilize existing strings within the structs-array. The workaround you have posted earlier allocates memory and duplicates strings one by one. That is something I would not want to try in real-world applications due to obvious reasons.

How about defining the struct like this: struct Sc { uint val; char[3] idz; char[] id() { return idz[0..3]; } } This turns 'id' into a property instead of a field. This might be slightly slower*, but has several advantages: - It will always return the current string without needing manual updates. - It doesn't allocate. - It removes the need to initialize 'id' separately. - Added bonus: Sc.sizeof is halved. (goes from 16 bytes to 8 bytes) *) Not necessarily though, it may well be optimized out.
Apr 24 2006
parent "Bob W" <nospam aol.com> writes:
"Frits van Bommel" <fvbommel REMwOVExCAPSs.nl> wrote in message 
news:e2iqkp$2fvl$1 digitaldaemon.com...
 How about defining the struct like this:

 struct Sc { uint val;  char[3] idz;  char[] id() { return idz[0..3]; } }

 This turns 'id' into a property instead of a field. This might be slightly 
 slower*, but has several advantages:

No idea what slightly slower means to you, but I can guarantee you that with indirect string access you won't even get to 50% of the original speed.
 - It will always return the current string without needing manual updates.

I need direct access to the strings, due to performance reasons. Therefore I do not mind the updates whenerver they are required.
 - It doesn't allocate.

This would be unacceptable anyway.
 - It removes the need to initialize 'id' separately.

You are referring to the sample which has those convenient constant length strings. My application does not work this way, however. I'd have to have some basic initialisation first in order to get quicker access later.
 - Added bonus: Sc.sizeof is halved. (goes from 16 bytes to 8 bytes)

I guess most of us have 1 or 2GB installed these days, true? Don't get me wrong. Your idea would be definitely worthwhile considering for other applications. But it just does not fit my current requirements.
Apr 24 2006
prev sibling parent "Bob W" <nospam aol.com> writes:
Thanks guys for your replies!

For anyone who had no time to check upon
my suspicions that D was producing something
'unexpected', here is a short explanation:

My sample program can be imagined as the
attempt to mark a freeway exit with an arrow
inside a car in order to find it later. Unfortunately
I was moving the car afterwards so the arrow
was pointing to ... (you guess).

As a result I have to admit that D did not
deserve any of my secret suspicions about
its stability in this respect.

At least I was careful enough not to use
the word 'bug' in any of my posts.   ;-)
Apr 24 2006