
digitalmars.D.learn - How is this code invalid?

reply thebluepandabear <therealbluepandabear protonmail.com> writes:
I am reading the fantastic book about D by Ali Çehreli, and he 
gives the following example when he talks about variadic 
functions:

```D
int[] numbersForLaterUse;

void foo(int[] numbers...) {
    numbersForLaterUse = numbers;
}

struct S {
   string[] namesForLaterUse;

   void foo(string[] names...) {
      namesForLaterUse = names;
   }
}
```

He says that the code above is a bug because:

"Both the free-standing function foo() and the member function 
S.foo() are in
error because they store slices to automatically-generated 
temporary arrays that
live on the program stack. Those arrays are valid only during the 
execution of the
variadic functions."

The thing is, when I run the code I get absolutely no error, so 
how is this exactly a 'bug' if the code runs properly? That's 
what I am confused about. What is the D compiler doing behind the 
scenes?
Dec 16 2022
next sibling parent ag0aep6g <anonymous example.com> writes:
On Saturday, 17 December 2022 at 00:23:32 UTC, thebluepandabear 
wrote:
 ```D
 int[] numbersForLaterUse;

 void foo(int[] numbers...) {
    numbersForLaterUse = numbers;
 }

 struct S {
   string[] namesForLaterUse;

   void foo(string[] names...) {
      namesForLaterUse = names;
   }
 }
 ```
[...]
 The thing is, when I run the code I get absolutely no error, so 
 how is this exactly a 'bug' if the code runs properly? That's 
 what I am confused about. What is the D compiler doing behind 
 the scenes?
You're witnessing the wonders of undefined behavior. Invalid code can still produce the results you're hoping for, or it can produce garbage results, or it can crash, or it can do something else entirely. And just because running it once does one thing does not mean that the next run will do the same.

For your particular code, here is an example where `numbersForLaterUse` ends up not being what we pass in:

```d
int[] numbersForLaterUse;

void foo(int[] numbers...) {
    numbersForLaterUse = numbers; /* No! Don't! Bad programmer! Bad! */
}

void bar() {
    int[3] n = [1, 2, 3];
    foo(n);
}

void main() {
    bar();
    import std.stdio;
    writeln(numbersForLaterUse); /* may print garbage */
}
```

But again, nothing at all is actually guaranteed about what that program does. It exhibits undefined behavior. So it could just as well print "[1, 2, 3]", making you think that everything is fine.
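For comparison, one common way to make `foo` well-defined is to copy the elements onto the GC heap with `.dup` before storing them. This is only a minimal sketch, assuming the extra allocation is acceptable; `callWithTemporaries` is a hypothetical helper name that does not appear in the thread.

```d
import std.stdio;

int[] numbersForLaterUse;

void foo(int[] numbers...) {
    // .dup allocates a fresh GC-owned array, so the stored slice no longer
    // refers to the temporary array that was built for this call.
    numbersForLaterUse = numbers.dup;
}

// Hypothetical helper: the argument array exists only during this call.
void callWithTemporaries() {
    foo(1, 2, 3);
}

void main() {
    callWithTemporaries();
    writeln(numbersForLaterUse); // the copy is still valid here: [1, 2, 3]
}
```

The trade-off is one allocation per call, which buys a slice whose lifetime is managed by the GC instead of by the caller's stack frame.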
Dec 16 2022
prev sibling next sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Sat, Dec 17, 2022 at 12:23:32AM +0000, thebluepandabear via
Digitalmars-d-learn wrote:
[...]
 ```D
 int[] numbersForLaterUse;
 
 void foo(int[] numbers...) {
    numbersForLaterUse = numbers;
 }
 
 struct S {
   string[] namesForLaterUse;
 
   void foo(string[] names...) {
      namesForLaterUse = names;
   }
 }
 ```
[...]
 The thing is, when I run the code I get absolutely no error, so how is
 this exactly a 'bug' if the code runs properly? That's what I am
 confused about.  What is the D compiler doing behind the scenes?
Try labelling the above functions with @safe and see what the compiler says.

If you really want to see what could possibly have gone wrong, try this version of the code:

------------------------------snip-----------------------------------
int[] numbersForLaterUse;

void foo(int[] numbers...) {
    numbersForLaterUse = numbers;
}

struct S {
    string[] namesForLaterUse;

    void foo(string[] names...) {
        namesForLaterUse = names;
    }
}

void whatwentwrong() {
    import std.stdio;
    writeln(numbersForLaterUse);
}

void whatelsewentwrong(S s) {
    import std.stdio;
    writeln(s.namesForLaterUse);
}

void badCodeBad() {
    foo(1, 2, 3, 4, 5);
}

S alsoReallyBad() {
    S s;
    s.foo("hello", "world!");
    return s;
}

void main() {
    badCodeBad();
    whatwentwrong();

    auto s = alsoReallyBad();
    whatelsewentwrong(s);
}
------------------------------snip-----------------------------------

The results will likely differ depending on your OS and specific environment; but on my Linux machine, it outputs a bunch of garbage (instead of the expected numbers and "hello" "world!" strings) and crashes.

T

-- 
If you want to solve a problem, you need to address its root cause, not just its symptoms. Otherwise it's like treating cancer with Tylenol...
Dec 16 2022
prev sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Fri, Dec 16, 2022 at 05:39:08PM -0800, H. S. Teoh via Digitalmars-d-learn
wrote:
[...]
 If you really want to see what could possibly have gone wrong, try
 this version of the code:
[...]
 The results will likely differ depending on your OS and specific
 environment; but on my Linux machine, it outputs a bunch of garbage
 (instead of the expected numbers and "hello" "world!" strings) and
 crashes.
[...]

In case you're wondering, here's a brief explanation of why the above code triggers a problem:

When your program is running, the CPU has a LIFO (last-in, first-out) structure that it uses as scratch space for computations, called the runtime stack. Function arguments are typically passed by having the calling function push the values on the stack, and having the called function retrieve these values from the stack. In addition to function arguments, the CPU also stores various other information on the stack, such as the return address to jump to once the called function returns, and potentially other stuff, depending on the specific OS and CPU. Furthermore, the called function itself also reserves some space on the stack for storing local variables. Together, this information is called a "stack frame".

When you call badCodeBad(), the arguments [ 1, 2, 3, 4, 5 ] are allocated on the stack and passed to foo(). foo() then stores a slice to these arguments, i.e., a slice of the stack locations that currently contain [ 1, 2, 3, 4, 5 ]. Then foo() returns to badCodeBad(), and badCodeBad() returns to main(). The stack frame that contains the [ 1, 2, 3, 4, 5 ] is now no longer in scope. However, it may not necessarily have been overwritten with new data yet.

Then main() calls whatwentwrong(). This involves creating a new stack frame for whatwentwrong(), pushing the return address on the stack, and so on. At this point, whatwentwrong()'s stack frame overwrites the original stack frame where badCodeBad() stored the [ 1, 2, 3, 4, 5 ]. The array elements are now overwritten with other data that isn't supposed to be interpreted as integers. That's why when whatwentwrong() tries to print the contents of numbersForLaterUse, which now points to an area on the stack that has just been overwritten by whatwentwrong()'s stack frame, you get garbage output.

A similar thing happens when you call alsoReallyBad(). It allocates the string array [ "hello", "world!" ] on the stack, and S.foo() wrongly stores a slice to that location on the stack. When alsoReallyBad() returns, the stack frame that contains this array goes out of scope (though it is not necessarily overwritten just yet). When main() then calls whatelsewentwrong(), that involves passing the instance of S as an argument, and also creating a new stack frame for whatelsewentwrong(). All of this new data overwrites the original stack frame, stomping all over the [ "hello", "world!" ] array and overwriting it with stuff that isn't supposed to be interpreted as a string array.

When whatelsewentwrong() then tries to print the contents of s.namesForLaterUse, the slice points to a location on the stack that no longer contains the string array; writeln tries to interpret this as a string array, which results in garbage being printed. Since a string is also an array, consisting of a pointer and a length, interpreting random data as a string causes writeln to read a random amount of data from a random location in memory. On my system, it just so happens that part of that range of memory locations lies outside the range mapped by the OS to the program; this causes an invalid memory access, which made the OS forcefully terminate the program.

//

The underlying cause of these problems is exactly what Ali said in his book: foo() and S.foo() tried to store a slice to a stack location past its lifetime. Once the stack frame went out of scope, all bets are off as to what the slice now points to. It could have been overwritten by other data that can no longer be interpreted as an int[] or string[]. In this case, it caused the program to print random garbage and crash.

In more complicated scenarios, such a bug in the code can become a hole for a hacker to exploit. Consider, for example, if the code tried to do some arithmetic on the int[] that it saved as numbersForLaterUse. Since the location that used to contain the int[] now contains a function stack frame, part of it could potentially contain a return address to main(). The hacker could exploit this by manipulating the program's input such that the arithmetic on the int[] overwrites this return address to point to something else, such as an OS call to format your hard drive. Then when the function finishes what it's doing and tries to return, instead of returning to main() it jumps to the function that formats your hard drive.

The takeaway from all this is:

(1) It's Bad(tm) to store a slice to a stack location past its lifetime.

(2) Use @safe when possible so that the compiler will tell you when you're doing something wrong and potentially dangerous.

T

-- 
A computer doesn't mind if its programs are put to purposes that don't match their names. -- D. Knuth
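To make the "pointer and a length" point concrete, here is a small sketch that stays within defined behavior: it only inspects a slice while the array it refers to is still alive. The helper name `sharesStorage` is hypothetical and is introduced purely for illustration.

```d
import std.stdio;

// Hypothetical helper: true if both slices start at the same memory address.
bool sharesStorage(const int[] a, const int[] b) {
    return a.ptr == b.ptr;
}

void main() {
    int[3] fixed = [1, 2, 3]; // storage lives in main's stack frame
    int[] view = fixed[];     // a slice is just pointer + length; no copy

    assert(sharesStorage(view, fixed[]));
    assert(view.length == 3);

    view[0] = 42;             // writes through to the stack array
    assert(fixed[0] == 42);

    int[] copy = view.dup;    // independent GC-heap copy
    assert(!sharesStorage(copy, view));
    copy[1] = 99;             // does not affect the original
    assert(fixed[1] == 2);

    writeln(fixed, " ", copy);
}
```

Because `view` owns nothing, it is only meaningful while `fixed` exists; storing `view` somewhere that outlives the frame is the bug described above, whereas storing `copy` would be fine.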
Dec 16 2022
next sibling parent reply thebluepandabear <therealbluepandabear protonmail.com> writes:
 T
Thanks, I've tried to mark it with `@safe` and it did give me a warning.

I was also wondering, why is this code valid?

```D
int[] numbersForLaterUse;

@safe void foo(int[] numbers) {
    numbersForLaterUse = numbers;
}
```
Dec 16 2022
parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Sat, Dec 17, 2022 at 02:36:10AM +0000, thebluepandabear via
Digitalmars-d-learn wrote:
[...]
 Thanks, I've tried to mark it with `@safe` and it did give me a
 warning.
 
 I was also wondering, why is this code valid?
 
 ```D
 int[] numbersForLaterUse;
 
 @safe void foo(int[] numbers) {
 	numbersForLaterUse = numbers;
 }
 ```
This code is safe provided the arguments are not allocated on the stack, which is usually the case because you can no longer call it with:

    foo(1, 2, 3, 4);

but you have to write:

    foo([ 1, 2, 3, 4 ]);

The [] here will allocate a new array on the heap, so the array elements will not go out of scope when the caller returns. (They will be collected by the GC after all references to them have gone out of scope. This is one of the advantages of using a GC: it saves you from having to worry about complicated lifetimes in such cases.)

You may still run into trouble, though, if you do this:

    int[3] data = [ 1, 2, 3 ]; // N.B.: stack-allocated
    foo(data[]); // uh oh

To guard against this, use @safe and -dip1000, which will cause the compiler to detect this dangerous usage and generate an error.

T

-- 
Answer: Because it breaks the logical sequence of discussion. / Question: Why is top posting bad?
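As a rough sketch of the two call styles described above, the following keeps the non-variadic `foo` and has the caller copy stack data to the heap before handing it over. `callerWithStackData` is a hypothetical name, and copying at the call site is just one possible way to handle this case, not the only one.

```d
import std.stdio;

int[] numbersForLaterUse;

@safe void foo(int[] numbers) {
    numbersForLaterUse = numbers;
}

// Hypothetical caller that owns a stack-allocated array.
void callerWithStackData() {
    int[3] data = [1, 2, 3]; // lives in this function's stack frame
    foo(data[].dup);         // hand over a GC-heap copy, not the stack slice
}

void main() {
    foo([4, 5, 6]);          // array literal: allocated on the GC heap
    writeln(numbersForLaterUse);

    callerWithStackData();   // data is gone after this returns, the copy is not
    writeln(numbersForLaterUse);
}
```

Here the responsibility for lifetimes sits with the caller, which knows where its data lives; the callee can then store the slice without copying again.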
Dec 17 2022
prev sibling parent Ali Çehreli <acehreli yahoo.com> writes:
On 12/16/22 18:20, H. S. Teoh wrote:

 scratch space for computations, called the runtime
 stack.
I called it "function call stack" where I gave a very simplistic view of it here: https://www.youtube.com/watch?v=NWIU5wn1F1I&t=236s
 (2) Use @safe when possible so that the compiler will tell you when
 you're doing something wrong and potentially dangerous.
Unfortunately, @safe is not as prominent in the book as it should be. Part of the reason, I think, is that its implementation is not complete, especially in how it changes with -dip1000.

Ali
Dec 16 2022