digitalmars.D - D array expansion and non-deterministic re-allocation
- Bartosz Milewski <bartosz-nospam relisoft.com> Nov 15 2009
- Tim Matthews <tim.matthews7 gmail.com> Nov 15 2009
- Walter Bright <newshound1 digitalmars.com> Nov 15 2009
- "Nick Sabalausky" <a a.a> Nov 15 2009
- Walter Bright <newshound1 digitalmars.com> Nov 15 2009
- Walter Bright <newshound1 digitalmars.com> Nov 15 2009
- Denis Koroskin <2korden gmail.com> Nov 17 2009
- Sean Kelly <sean invisibleduck.org> Nov 17 2009
- Walter Bright <newshound1 digitalmars.com> Nov 17 2009
- grauzone <none example.net> Nov 15 2009
- Walter Bright <newshound1 digitalmars.com> Nov 16 2009
- Rainer Deyke <rainerd eldwood.com> Nov 16 2009
- Walter Bright <newshound1 digitalmars.com> Nov 16 2009
- "Nick Sabalausky" <a a.a> Nov 16 2009
- Walter Bright <newshound1 digitalmars.com> Nov 16 2009
- BCS <none anon.com> Nov 16 2009
- Walter Bright <newshound1 digitalmars.com> Nov 16 2009
- Bartosz Milewski <bartosz-nospam relisoft.com> Nov 17 2009
- Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> Nov 17 2009
- Bartosz Milewski <bartosz-nospam relisoft.com> Nov 18 2009
- Bill Baxter <wbaxter gmail.com> Nov 17 2009
- "Denis Koroskin" <2korden gmail.com> Nov 15 2009
- Leandro Lucarella <llucax gmail.com> Nov 15 2009
- Ali Cehreli <acehreli yahoo.com> Nov 16 2009
- dsimcha <dsimcha yahoo.com> Nov 16 2009
I read Andrei's chapter on arrays and there's one thing that concerns me. When
a slice is extended, the decision to re-allocate, and therefore to cut its
connection to other slices, is non-deterministic. How does that influence
program testing and can it be exploited to attack a buggy system?
Here's a piece of code I came up with--it's not realistic but it describes a
certain class of scenarios:
void f(byte[] code)
{
doSomeChecks(code);
execute(code);
}
void doSomeChecks(byte[] a)
{
…
// to avoid special-casing empty arrays
a.length += 1; // extension
…
// get byte b (or a whole bunch) from somewhere
a[0] = b; // write
…
// check what happens when a[0] = b
}
Of course this code is incorrect because doSomeChecks assumes that it holds a
unique array, and the caller failed to make a copy before sending it to
doSomeChecks. Bad bad programmer!
However imagine this scenario: Under testing conditions, the array extension
just happens to always cause reallocation, so the write is NOT visible in the
calling routine. All tests pass with flying colors. But since the re-allocation
is, by language definition, non-deterministic, it might happen that, after the
code has been deployed, the re-allocation stopped happening, say, on Sundays.
If a hacker learns about that, he may be able to inject malicious code or data
into the program.
In this case the non-determinism built into the language helped camouflage a
serious bug. Should we, or shouldn't we, worry about it in practice?
Nov 15 2009
Bartosz Milewski wrote:I read Andrei's chapter on arrays and there's one thing that concerns me. When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic. How does that influence program testing and can it be exploited to attack a buggy system? Here's a piece of code I came up with--it's not realistic but it describes a certain class of scenarios: void f(byte[] code) { doSomeChecks(code); execute(code); } void doSomeChecks(byte[] a) { … // to avoid special-casing empty arrays a.length += 1; // extension … // get byte b (or a whole bunch) from somewhere a[0] = b; // write … // check what happens when a[0] = b } Of course this code is incorrect because doSomeChecks assumes that it holds a unique array, and the caller failed to make a copy before sending it to doSomeChecks. Bad bad programmer! However imagine this scenario: Under testing conditions, the array extension just happens to always cause reallocation, so the write is NOT visible in the calling routine. All tests pass with flying colors. But since the re-allocation is, by language definition, non-deterministic, it might happen that, after the code has been deployed, the re-allocation stopped happening, say, on Sundays. If a hacker learns about that, he may be able to inject malicious code or data into the program. In this case the non-determinism built into the language helped camouflage a serious bug. Should we, or shouldn't we, worry about it in practice?
Isn't the new kind of arrays created to fix this already? See T[new]. PS: Are you still working on D's concurrency?
Nov 15 2009
Bartosz Milewski wrote:I read Andrei's chapter on arrays and there's one thing that concerns me. When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic.
It is not non-deterministic. Try it - you'll get the same results every time. It is implementation-defined behavior.How does that influence program testing and can it be exploited to attack a buggy system?
Exploitations rely on undefined-behavior, such as buffer overflows, not implementation-defined behavior. This issue is no more conducive to an "exploit" than any other garden variety programming issue. It's entirely different than a buffer overflow attack.
Nov 15 2009
"Walter Bright" <newshound1 digitalmars.com> wrote in message news:hdqea8$2mte$1 digitalmars.com...Bartosz Milewski wrote:I read Andrei's chapter on arrays and there's one thing that concerns me. When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic.
It is not non-deterministic. Try it - you'll get the same results every time. It is implementation-defined behavior.How does that influence program testing and can it be exploited to attack a buggy system?
Exploitations rely on undefined-behavior, such as buffer overflows, not implementation-defined behavior. This issue is no more conducive to an "exploit" than any other garden variety programming issue. It's entirely different than a buffer overflow attack.
Definition used by Implementation "Foo": "Resize if there's room, otherwise reallocate" (sounds fairly realistic to me) Code in program using Implementation "Foo": myArrayOnHeap ~= getUserInput(); Resizes or Reallocates?: Deterministically dependant on "Is there room?" which is deterministically dependant on two things: 1. The size of the user input (non-deterministic). 2. The amount of freespace after the current end of myArrayOnHeap (dependent on the specific actions of the GC, which in turn, is dependent on other behaviors of the program, which, for almost any useful program, are dependent on non-deterministic runtime conditions, such as other user input). If 'A' ("resize or reallocate?") is deterministically determined by 'B', and 'B' ("is there room?") is non-deterministic ("dependent on things like user input"), then 'A' is, in effect, non-deterministic. Simplified example of the same basic principle in action: void main(char[][] args) { if(args.length < 10) showPrettyPicture(); else blowUpEarth(); } Deterministic? Only in the same sense that "resize or realloc upon appending" is deterministic. Safe? Fuck no.
Nov 15 2009
Nick Sabalausky wrote:Deterministic? Only in the same sense that "resize or realloc upon appending" is deterministic.
It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging. A non-deterministic problem will give you a different result every time you run it. Threading problems are an example of this, as are any dependencies on uninitialized data. This particular issue is implementation-defined.Safe? Fuck no.
It's safe as in memory safe. This is as opposed to undefined-behavior, which is not memory safe. A buffer overflow is an example of undefined-behavior.
Nov 15 2009
Denis Koroskin wrote:It is *non*-deterministic. The decision to reallocate depends (or will depend) on LRU and it may be cleared by another thread (e.g. another thread may reset it manually or via a GC cycle run).
The LRU is thread local.
Nov 15 2009
Walter Bright Wrote:Denis Koroskin wrote:It is *non*-deterministic. The decision to reallocate depends (or will depend) on LRU and it may be cleared by another thread (e.g. another thread may reset it manually or via a GC cycle run).
The LRU is thread local.
It will then prevent a moving GC from being possible in D. Otherwise, moving a block of data will invalidate LRU which will result in non-deterministic reallocation.
Nov 17 2009
Denis Koroskin Wrote:Walter Bright Wrote:Denis Koroskin wrote:It is *non*-deterministic. The decision to reallocate depends (or will depend) on LRU and it may be cleared by another thread (e.g. another thread may reset it manually or via a GC cycle run).
The LRU is thread local.
It will then prevent a moving GC from being possible in D. Otherwise, moving a block of data will invalidate LRU which will result in non-deterministic reallocation.
It won't work with the existing GC either. If a page is completely emptied then it can be re-used for different sized allocations. This is why the current cache is wiped during a collection.
Nov 17 2009
Sean Kelly wrote:Denis Koroskin Wrote:Walter Bright Wrote:Denis Koroskin wrote:It is *non*-deterministic. The decision to reallocate depends (or will depend) on LRU and it may be cleared by another thread (e.g. another thread may reset it manually or via a GC cycle run).
Otherwise, moving a block of data will invalidate LRU which will result in non-deterministic reallocation.
It won't work with the existing GC either. If a page is completely emptied then it can be re-used for different sized allocations. This is why the current cache is wiped during a collection.
In both cases, the LRU can be adjusted rather than wiped to account for it.
Nov 17 2009
Walter Bright wrote:Safe? Fuck no.
It's safe as in memory safe. This is as opposed to undefined-behavior, which is not memory safe. A buffer overflow is an example of undefined-behavior.
Even if it's memory safe, it could cause a bug. Look at Bartosz example again; if the memory block gets copied really depends from the input data length.
Nov 15 2009
grauzone wrote:Even if it's memory safe, it could cause a bug. Look at Bartosz example again; if the memory block gets copied really depends from the input data length.
Yes, it could cause a bug. But it isn't undefined-behavior, it isn't memory corruption, and it isn't non-deterministic. i = ++i; can also cause a bug.
Nov 16 2009
Walter Bright wrote:It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging.
On the same platform, with the same compiler, compiler settings, and standard library implementation. That makes it harder to test, not easier. You now have to test with multiple compilers.It's safe as in memory safe. This is as opposed to undefined-behavior, which is not memory safe. A buffer overflow is an example of undefined-behavior.
The current behavior is unsafe in that you can accidentally have two variables pointing at the same buffer. Let's say one buffer holds network input and the other holds some bytecode to execute. Boom - a bug that can be exploited to execute malicious (byte-)code. -- Rainer Deyke - rainerd eldwood.com
Nov 16 2009
Rainer Deyke wrote:Walter Bright wrote:It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging.
standard library implementation. That makes it harder to test, not easier. You now have to test with multiple compilers.
That is still determinate. Indeterminate means you get different results if your run it again on the same machine.It's safe as in memory safe. This is as opposed to undefined-behavior, which is not memory safe. A buffer overflow is an example of undefined-behavior.
The current behavior is unsafe in that you can accidentally have two variables pointing at the same buffer. Let's say one buffer holds network input and the other holds some bytecode to execute. Boom - a bug that can be exploited to execute malicious (byte-)code.
It is not two random arrays randomly sharing data.
Nov 16 2009
"Walter Bright" <newshound1 digitalmars.com> wrote in message news:hdr6mm$1bf0$1 digitalmars.com...Rainer Deyke wrote:Walter Bright wrote:It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging.
standard library implementation. That makes it harder to test, not easier. You now have to test with multiple compilers.
That is still determinate. Indeterminate means you get different results if your run it again on the same machine.
Even if it is technically determinate if you run it on the same machine with the same inputs, that still does nothing to address Bartosz's claim that it's a potential security hole - Apps don't always get run on the same machine with the same inputs.
Nov 16 2009
Nick Sabalausky wrote:Even if it is technically determinate if you run it on the same machine with the same inputs, that still does nothing to address Bartosz's claim that it's a potential security hole - Apps don't always get run on the same machine with the same inputs.
It's not a security hole in any more serious manner than any other routine programming bug would be. Very few ordinary programming bugs are exploitable. A buffer overflow, however, is much more of a security hole because they are nearly always exploitable, because it allows arbitrary user data to be executed. This is not the case with the array resizing issue. That's why I drew a distinction between undefined-behavior and implementation-defined behavior. The former is a couple more orders of magnitude more serious.
Nov 16 2009
Hello Walter,Nick Sabalausky wrote:Deterministic? Only in the same sense that "resize or realloc upon appending" is deterministic.
the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging. A non-deterministic problem will give you a different result every time you run it. Threading problems are an example of this, as are any dependencies on uninitialized data. This particular issue is implementation-defined.
would you agree that it is not something the programer can predict in advance?
Nov 16 2009
BCS wrote:would you agree that it is not something the programer can predict in advance?
He can, but it is not reasonable to expect him to. But it's still deterministic.
Nov 16 2009
Walter Bright Wrote:BCS wrote:would you agree that it is not something the programer can predict in advance?
He can, but it is not reasonable to expect him to. But it's still deterministic.
I've been following this discussion about determinism and I think it addresses the problem from the wrong point of view, that of a specific implementation. My concern is the semantics of the language. As it is defined right now, a conforming implementation is free to use a quantum random number generator to decide whether to re-allocate or not. Is it likely? I don't think so; but the non-determinism is part of the semantics of D arrays.
Nov 17 2009
Bartosz Milewski wrote:Walter Bright Wrote:BCS wrote:would you agree that it is not something the programer can predict in advance?
deterministic.
I've been following this discussion about determinism and I think it addresses the problem from the wrong point of view, that of a specific implementation. My concern is the semantics of the language. As it is defined right now, a conforming implementation is free to use a quantum random number generator to decide whether to re-allocate or not. Is it likely? I don't think so; but the non-determinism is part of the semantics of D arrays.
It would not be difficult to specify in the language definition (and TDPL) that behavior is deterministic for a given platform. I think this has some impact on the freedom of the memory allocator, but probably not major. Andrei
Nov 17 2009
Andrei Alexandrescu Wrote:My concern is the semantics of the language. As it is defined right now, a conforming implementation is free to use a quantum random number generator to decide whether to re-allocate or not. Is it likely? I don't think so; but the non-determinism is part of the semantics of D arrays. It would not be difficult to specify in the language definition (and TDPL) that behavior is deterministic for a given platform. I think this has some impact on the freedom of the memory allocator, but probably not major.
Actually this wouldn't fix the problem. Although this would make the program deterministic, it would still exhibit chaotic behavior (and chaos is a pretty good simulator of non-determinism--see random number generators). An input string that is one character longer than in the previous run in one part of the program could cause change in allocation in a completely different part of the program (arbitrary long distance coupling). Theoretically, the heap is deterministic, but in practice no program should depend on all pointers having exactly the same values from run to run. For all intents and purposes the heap should be treated as non-deterministic. This is why no language bothers to impose determinism on the heap. Neither should D.
Nov 18 2009
Bartosz Milewski wrote:Andrei Alexandrescu Wrote:My concern is the semantics of the language. As it is defined right now, a conforming implementation is free to use a quantum random number generator to decide whether to re-allocate or not. Is it likely? I don't think so; but the non-determinism is part of the semantics of D arrays. It would not be difficult to specify in the language definition (and TDPL) that behavior is deterministic for a given platform. I think this has some impact on the freedom of the memory allocator, but probably not major.
Actually this wouldn't fix the problem. Although this would make the program deterministic, it would still exhibit chaotic behavior (and chaos is a pretty good simulator of non-determinism--see random number generators).
I am glad you have also reached that conclusion.An input string that is one character longer than in the previous run in one part of the program could cause change in allocation in a completely different part of the program (arbitrary long distance coupling).
Then you must restart your argument which was centered around non-determinism. It starts like that: "I read Andrei's chapter on arrays and there's one thing that concerns me. When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic." Andrei
Nov 18 2009
On Tue, Nov 17, 2009 at 4:38 PM, Bartosz Milewski <bartosz-nospam relisoft.com> wrote:Walter Bright Wrote:BCS wrote:would you agree that it is not something the programer can predict in advance?
He can, but it is not reasonable to expect him to. But it's still deterministic.
I've been following this discussion about determinism and I think it addr=
tation.My concern is the semantics of the language. As it is defined right now, =
r to decide whether to re-allocate or not. Is it likely? I don't think so; = but the non-determinism is part of the semantics of D arrays. Then why not just fix the spec to say the behavior has to be deterministic? --bb
Nov 17 2009
On Mon, 16 Nov 2009 09:24:13 +0300, Walter Bright <newshound1 digitalmars.com> wrote:Nick Sabalausky wrote:Deterministic? Only in the same sense that "resize or realloc upon appending" is deterministic.
It's deterministic in the sense that if you run the program again with the same inputs, you will get the same result. This is a highly useful attribute for testing and debugging.
It is *non*-deterministic. The decision to reallocate depends (or will depend) on LRU and it may be cleared by another thread (e.g. another thread may reset it manually or via a GC cycle run).
Nov 15 2009
Tim Matthews, el 16 de noviembre a las 15:05 me escribiste:Isn't the new kind of arrays created to fix this already? See T[new].
T[new] is gone. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Parece McGuevara's o CheDonald's
Nov 15 2009
Bartosz Milewski Wrote:I read Andrei's chapter on arrays and there's one thing that concerns me.
I tried to voice the same concern earlier, but my voice is not loud yet. ;)When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic.
I don't see it being any different than C++'s std::vector invalidating iterators when the vector is expanded. That has the same "non-determinism", but we've been living with it. The reason we've been living happily with it is, because it's been spelled out: "may invalidate all iterators." In D2's slices, what we're missing is the correct language to describe the semantics. As long as we set the semantics right, there will be no problem. (Earlier, I proposed "discretionary sharing semantics" for D2's slice semantics.) The first code examples that are used to describe slices better say "slice" on the left hand side: int[] slice = new int[2]; // GOOD for the semantics int[] array = new int[2]; // BAD for the semantics Ali
Nov 16 2009
Ali Cehreli wrote:Bartosz Milewski Wrote:I read Andrei's chapter on arrays and there's one thing that concerns me.
I tried to voice the same concern earlier, but my voice is not loud yet. ;)When a slice is extended, the decision to re-allocate, and therefore to cut its connection to other slices, is non-deterministic.
I don't see it being any different than C++'s std::vector invalidating iterators when the vector is expanded. That has the same "non-determinism", but we've been living with it.
It is different because invalid STL iterators could do anything. Andrei
Nov 16 2009
== Quote from Bartosz Milewski (bartosz-nospam relisoft.com)'s articleI read Andrei's chapter on arrays and there's one thing that concerns me. When a
connection to other slices, is non-deterministic. How does that influence program testing and can it be exploited to attack a buggy system?Here's a piece of code I came up with--it's not realistic but it describes a
void f(byte[] code) { doSomeChecks(code); execute(code); } void doSomeChecks(byte[] a) { … // to avoid special-casing empty arrays a.length += 1; // extension … // get byte b (or a whole bunch) from somewhere a[0] = b; // write … // check what happens when a[0] = b } Of course this code is incorrect because doSomeChecks assumes that it holds a
doSomeChecks. Bad bad programmer!However imagine this scenario: Under testing conditions, the array extension
calling routine. All tests pass with flying colors. But since the re-allocation is, by language definition, non-deterministic, it might happen that, after the code has been deployed, the re-allocation stopped happening, say, on Sundays. If a hacker learns about that, he may be able to inject malicious code or data into the program.In this case the non-determinism built into the language helped camouflage a
The one thing that I think has been missing from this discussion is, what would be the alternative if we didn't have this "non-deterministic" reallocation? How else could you **efficiently** implement dynamic arrays?
Nov 16 2009









Tim Matthews <tim.matthews7 gmail.com> 