digitalmars.D.learn - Address of data that is static, be it shared or tls or __gshared or


Cecil Ward <d cecilward.com> writes:

If someone has some static data somewhere, be it in TLS or marked 
shared, __gshared, immutable or some combination of these, and 
someone takes the address of it and passes that address to some 
other routine of mine that does not have access to the source 
code of the original definition of the object in question, is it 
then possible to just use 'the address' passed without knowing 
anything about that data? I'm assuming that the answer might also 
depend on compilers, machine architectures and operating systems?

If this kind of assumption is very ill-advised, is there anything 
written up about implementation details in different operating 
systems / compilers ?

Sep 06 2017

Ali Çehreli <acehreli yahoo.com> writes:

On 09/06/2017 08:27 AM, Cecil Ward wrote:
 If someone has some static data somewhere, be it in tls or marked shared
 __gshared or immutable or combinations (whatever), and someone takes the
 address of it and pass that address to some other routine of mine that
 does not have access to the source code of the original definition of
 the object in question, then is it possible to just use 'the address'
 passed without knowing anything about that data? I'm assuming that the
 answer might also depend on compilers, machine architectures and
 operating systems?

 If this kind of assumption is very ill-advised, is there anything
 written up about implementation details in different operating systems /
 compilers ?

Yes, they are all valid operations. Further, the object need not be a 
static one; you can do the same with any object, even if it's on the 
stack. However,

- The object must remain alive whenever the other routine uses it. This 
precludes the case of the object being on the stack and the other 
routine saving it for later use. When that later use happens, there is 
no object any more. (An exception: The object may be kept alive by a 
closure; so even that case is valid.)

- Remember that in D data is thread-local by default; e.g. a module 
variable will appear to be on the same address to all threads but each 
thread will have its own copy. So, if the data is going to be used in 
another thread, it must be defined as 'shared'. Otherwise, although the 
code will look like it's working, different threads will be accessing 
different data. (Sometimes this is exactly what is desired but not what 
you're looking for.) (Fortunately, many high-level thread operations 
like the ones in std.concurrency will not let you share data unless it's 
'shared'.)
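
As a minimal sketch of the two points above (the names bump, tlsValue, 
gsValue and bumpBothInNewThread are made up for illustration, and 
__gshared is used only to keep the example short; it gives no type-level 
protection):

```d
import core.thread : Thread;

int tlsValue;           // thread-local: D's default for module-level variables
__gshared int gsValue;  // a single instance for the whole process

// Hypothetical routine that knows nothing about how *p was declared.
void bump(int* p)
{
    ++*p;
}

// Hypothetical helper: run the same two calls on a freshly started thread.
void bumpBothInNewThread()
{
    auto t = new Thread({
        bump(&tlsValue); // the new thread's own copy, which started at 0
        bump(&gsValue);  // the process-wide instance
    });
    t.start();
    t.join();
}

void main()
{
    int onStack;
    auto onHeap = new int;

    bump(&tlsValue);
    bump(&gsValue);
    bump(&onStack); // fine: onStack is still alive during the call
    bump(onHeap);
    assert(tlsValue == 1 && gsValue == 1 && onStack == 1 && *onHeap == 1);

    bumpBothInNewThread();
    assert(tlsValue == 1); // main's copy is untouched by the other thread
    assert(gsValue == 2);  // both threads bumped the same instance
}
```

The same bump works on every kind of storage; what differs is only which 
object the expression &tlsValue names in each thread.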

Ali

Sep 06 2017

Cecil Ward <d cecilward.com> writes:

On Wednesday, 6 September 2017 at 15:55:35 UTC, Ali Çehreli wrote:
 On 09/06/2017 08:27 AM, Cecil Ward wrote:
 If someone has some static data somewhere, be it in tls or marked shared
 __gshared or immutable or combinations (whatever), and someone takes the
 address of it and pass that address to some other routine of mine that
 does not have access to the source code of the original definition of
 the object in question, then is it possible to just use 'the address'
 passed without knowing anything about that data? I'm assuming that the
 answer might also depend on compilers, machine architectures and
 operating systems?

 If this kind of assumption is very ill-advised, is there anything
 written up about implementation details in different operating systems /
 compilers ?

 Yes, they are all valid operations. Further, the object need 
 not be a static one; you can do the same with any object even 
 it's on the stack. However,

 - The object must remain alive whenever the other routine uses 
 it. This precludes the case of the object being on the stack 
 and the other routine saving it for later use. When that later 
 use happens, there is no object any more. (An exception: The 
 object may be kept alive by a closure; so even that case is 
 valid.)

 - Remember that in D data is thread-local by default; e.g. a 
 module variable will appear to be on the same address to all 
 threads but each thread will have its own copy. So, if the data 
 is going to be used in another thread, it must be defined as 
 'shared'. Otherwise, although the code will look like it's 
 working, different threads will be accessing different data. 
 (Sometimes this is exactly what is desired but not what you're 
 looking for.) (Fortunately, many high-level thread operations 
 like the ones in std.concurrency will not let you share data 
 unless it's 'shared'.)

 Ali

Ali, I have worked on operating systems' development in r+d. My 
definitions of terms are hopefully the same as yours. If we refer 
to two threads that both belong to the same process, then they 
share a common address space, by my definition of the terms 
'thread' and 'process'. I use 'thread' to mean basically a stack 
plus a register set, i.e. a CPU execution context; it has nothing 
to do with virtual memory spaces or o/s ownership of resources, 
the one exception being a TLS space, which by definition is 
one-per-thread. A process is one or more threads plus an address 
space and the set of all the resources owned by the process 
according to the o/s. I'm just saying this so you know how I'm 
used to approaching this.

TLS could, I suppose, be dealt with by having allocated regions 
within a common address space that are all visible to one 
another. Objects inside a TLS region could then be referenced in 
one of several ways:

(1) By absolute virtual addresses that are meaningful to all the 
threads in the process, but not meaningful to threads belonging 
to other processes (by definition of 'process').

(2) Most often by section offsets, i.e. relative addresses from 
the start of a TLS section, which constantly have to be made 
usable by having the TLS base virtual address added to them 
before they can be dereferenced, adding a big runtime cost and 
making TLS very bad news. I have worked on a system like (2). But 
even in (2) the address of a TLS object can still be converted to 
a readily usable absolute virtual address and then used by any 
thread in the process with zero overhead.

(3) By processor segmentation, so that TLS objects have to be 
dereferenced using a segment-prefixed operation, and then it's 
impossible to have a single dereference operation such as star 
without knowing whether to use the segment prefix or not. If it 
is again possible to use forbidden or official knowledge to 
convert the segmented form into a process-wide meaningful 
straight address (as with 8086 20-bit addresses), I will term 
this 3a. If that is not possible because VM hardware translation 
is in use, I will term this 3b.

In 3a I am going to assume that VM hardware is used merely to 
provide relocation (address offsetting), so the segment prefix 
merely adds a per-thread fixed offset to the virtual address, and 
if you could discover that offset you would not need to bother 
with the segment prefix. In 3b, VM hardware maps virtual 
addresses to a set of per-TLS pages using who-knows-what 
mechanism, in any case something that apps cannot just bypass 
using forbidden knowledge to generate a single process-wide 
virtual address. This means that 3b threads probably break my 
definition of thread vs process, although the threads of one 
process do still have a common address space and share 
resources.

I don't know what D's assumptions are, if any. I have very briefly 
looked at some code generated by GDC and LDC for Linux x64. It 
seems to me that these are 3a systems, optimised strongly enough 
by the compilers to remove the 3a inefficiency that they are 
nearly type 1. But I must admit, I haven't looked into it 
properly; I just noted a few things in passing and haven't 
written any test cases, as I don't know D well enough yet. I 
haven't seen the code these compilers generate for Windows.

[Many thanks for your superb book btw, which I am just reading 
for the second time round. I wouldn't have got very far without 
it.]

Sep 10 2017

John Burton <john.burton jbmail.com> writes:

On Sunday, 10 September 2017 at 21:38:03 UTC, Cecil Ward wrote:
 On Wednesday, 6 September 2017 at 15:55:35 UTC, Ali Çehreli 
 wrote:
 [...]

 Ali, I have worked on operating systems' development in r+d. My 
 definitions of terms are hopefully the same as yours. If we 
 refer to two threads, if they both belong to the same process, 
 then they share a common address space, by my definition of the 
 terms 'thread' and 'process'. I use thread to mean basically a 
 stack, plus register set, a cpu execution context, but has 
 nothing to do with virtual memory spaces or o/s ownership of 
 resources, the one exception being a tls space, which by 
 definition is one-per-thread. A process is one or more threads 
 plus an address space and a set of all the resources owned by 
 the process according to the o/s. I'm just saying this so you 
 know how I'm used to approving this.

 [...]




I wrote this program :-

import std.stdio;
import std.concurrency;

int data;

void display()
{
     writeln("Address is ", &data);
}

void main()
{
     auto tid1 = spawn(&display);
     auto tid2 = spawn(&display);
     auto tid3 = spawn(&display);
}

It displayed :-

Address is 51AD20
Address is 51AD20
Address is 51F6D0
Address is 521AC0

This indicated to me that a thread-local variable does in fact 
have a different address from other threads' instances of the 
same variable, so you can in fact pass the address to another 
thread and access it from there via a pointer, which is what I'd 
hope.

Interestingly, it also quite often prints one of the lines twice. 
I wonder if this is the same "bug" as 
https://issues.dlang.org/show_bug.cgi?id=17797 that doesn't even 
require any reading? (Platform is Windows 7, DMD32 D Compiler 
v2.076.0.)

Sep 11 2017

Ali Çehreli <acehreli yahoo.com> writes:

On 09/11/2017 01:51 AM, John Burton wrote:

 I wrote this program :-

 import std.stdio;
 import std.concurrency;

 int data;

 void display()
 {
     writeln("Address is ", &data);
 }

 void main()
 {
     auto tid1 = spawn(&display);
     auto tid2 = spawn(&display);
     auto tid3 = spawn(&display);
 }

 It displayed :-

 Address is 51AD20
 Address is 51AD20
 Address is 51F6D0
 Address is 521AC0

 This indicated to me that a thread local variable does in fact have a
 different address to other thread's instances of the same thread so you
 can in fact pass the address to another thread and access it from there
 via pointer, which is what I'd hope.

The output is deceptive. 'data' is thread-local: Every thread has its 
own copy. The following program indicates the variables are different:

import std.stdio;
import std.concurrency;
import core.thread;

int data;

class Lock {
}

void display(shared Lock lock)
{
     synchronized (lock) {
         writeln("Address is ", &data);
         ++data;
         writeln(data);
     }
}

void main()
{
     auto lock = new shared Lock();
     auto tid1 = spawn(&display, lock);
     auto tid2 = spawn(&display, lock);
     auto tid3 = spawn(&display, lock);
     thread_joinAll();
     writeln(data);
}

Sample output (yes, 64-bit build):

Address is 7F8443BCE580
1
Address is 7F843BFFF580
1
Address is 7F844441F580
1
0

 Interesting it also (sometimes) prints one of the lines twice quite often.

That's a coincidence that different threads see 'data' at the same 
address value but they are still different objects. Actually, I'm 
surprised that they are reported differently. If I remember correctly, 
in the past it would report the same address. Perhaps a case of ASLR?

 I wonder if this is the same "bug" as
 https://issues.dlang.org/show_bug.cgi?id=17797 that doesnt even require
 any reading? (platform is windows 7 DMD32 D Compiler v2.076.0)

I doubt it unless you get 4 addresses instead of 3.

Ali

Sep 11 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 9/10/2017 2:38 PM, Cecil Ward wrote:
 Ali, I have worked on operating systems' development in r+d. My definitions of
 terms are hopefully the same as yours. If we refer to two threads, if they both
 belong to the same process, then they share a common address space, by my
 definition of the terms 'thread' and 'process'. I use thread to mean basically a
 stack, plus register set, a cpu execution context, but has nothing to do with
 virtual memory spaces or o/s ownership of resources, the one exception being a
 tls space, which by definition is one-per-thread. A process is one or more
 threads plus an address space and a set of all the resources owned by the
 process according to the o/s. I'm just saying this so you know how I'm used to
 approving this.
 
 Tls could I suppose either be dealt with by having allocated regions within a
 common address space that are all visible to one another. Objects inside a tls
 could (1) be referenced by absolute virtual addresses that are meaningful to all
 the threads in the process, but not meaningful to (threads belong to) other
 processes. (By definition of 'process'.) or (2) be referenced most often by
 section-offsets, relative addresses from the start of a tls section, which
 constantly have to be made usable by having the tls base virtual address added
 to them before they can be dereferenced adding a big runtime cost and making tls
 very bad news. I have worked on a system like (2). But even in (2) an address of
 a type-2 tls object can still be converted to a readily usable absolute virtual
 address and used by any thread in the process with zero overhead. A third option
 though could be to use processor segmentation, so tls objects have to (3a) be
 dereferenced using a segment prefixed operation, and then it's impossible to
 just have a single dereference operation such as star without knowing whether to
 use the segment prefix or not. But if it is again possible to use forbidden or
 official knowledge to convert the segmented form into a process-wide meaningful
 straight address (as in 8086 20-bit addresses) then we could term this 3a
 addressing. If this is not possible because vm hardware translation is in use
 then I will term this 3b. In 3a I am going to assume that vm hardware is used
 merely to provide relocation, address offsetting, so the use of a segmentation
 prefix basically merely adds a per-thread fixed offset to the virtual address
 and if you could discover that offset then you don't need to bother with the
 segment prefix. In 3b, vm hardware maps virtual addresses to a set of per-tls
 pages using who-knows-what mechanism, anyway something that apps cannot just
 bypass using forbidden knowledge to generate a single process-wide virtual
 address. This means that 3b threads are probably breaking my definition of
 thread vs process, although they threads of one process do also have a common
 address space and they share resources.
 
 I don't know what d's assumptions if any are. I have very briefly looked at some
 code generated by GDC and LDC for Linux x64. It seems to me that these are 3a
 systems, optimised strongly enough by the compilers to remove 3a inefficiency
 that they are nearly 1. But I must admit, I haven't looked into it properly,
 just noted a few things in passing and haven't written any test cases as I don't
 know d well enough yet. I haven't seen the code these compilers generate for
 Windows.

D tries very hard to use the exact same TLS method used by the local C or C++ 
compiler, so the same assumptions and methods apply. Only if the local C or C++ 
compiler does not support TLS does D provide its own implementation.

In the case of Windows and Linux (and others), TLS support is embedded into the 
standard linker, and D makes use of that.

If an address is taken to a TLS object, any relocations and adjustments are made 
at the time the pointer is generated, not when the pointer is dereferenced. 
Hence, the pointer may be passed from thread to thread, and will still point to 
the same object. There is only ONE pointer type in D. D does not support 
multiple pointer types, such as near/far, or pointers tagged with additional 
data saying how they should be dereferenced.

TLS data is all owned by the process. TLS is not a method for inter-process 
communication.

TLS code generation for D is the same as for C and C++.
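
A minimal sketch of that behaviour, assuming core.thread.Thread is used 
so that an ordinary int* can be handed over through a closure (the 
helper writeFromOtherThread is a made-up name for illustration):

```d
import core.thread : Thread;

int data; // thread-local: every thread gets its own instance

// Hypothetical helper: a new thread writes through `p` and also
// reports where its own instance of `data` lives.
int* writeFromOtherThread(int* p, int value)
{
    int* otherInstance;
    auto t = new Thread({
        *p = value;            // plain dereference of an ordinary pointer
        otherInstance = &data; // address of the new thread's own copy
    });
    t.start();
    t.join();
    // The other thread has ended, so this pointer is dangling:
    // compare it, but never dereference it.
    return otherInstance;
}

void main()
{
    int* mine = &data; // any TLS adjustment happens here, when the address is taken
    int* theirs = writeFromOtherThread(mine, 42);
    assert(data == 42);      // the other thread modified main's instance
    assert(theirs !is mine); // the two live instances had different addresses
}
```

The pointer carries no record of which thread created it; it is just an 
address in the process's address space.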

Sep 11 2017

Ali Çehreli <acehreli yahoo.com> writes:

On 09/11/2017 03:38 PM, Walter Bright wrote:

 If an address is taken to a TLS object, any relocations and adjustments
 are made at the time the pointer is generated, not when the pointer is
 dereferenced. Hence, the pointer may be passed from thread to thread,
 and will still point to the same object.

Since we're talking about TLS, the data is not shared. So, I think 
you're referring to an example where the value of the pointer is passed 
e.g. as a ulong. Otherwise, of course, std.concurrency.spawn would 
reject the call: it does not accept arguments that alias mutable 
thread-local data, such as a plain int*.

Continuing with John Burton's example, the following program 
demonstrates your point. The address of main's TLS 'data' is passed as 
ulong and then used as an int* by other threads:

import std.stdio;
import std.concurrency;
import core.thread;

int data;

class Lock {
}

void display(shared(Lock) lock, ulong u) {
     synchronized (lock) {
         int *p = cast(int*)u;
         writeln("Address is ", p);
         ++(*p);
         writeln(*p);
     }
}

void main()
{
     auto lock = new shared(Lock)();
     auto u = cast(ulong)&data;
     auto tid1 = spawn(&display, lock, u);
     auto tid2 = spawn(&display, lock, u);
     auto tid3 = spawn(&display, lock, u);
     thread_joinAll();
     writeln(data);
}

The output shows that all threads did modify the same data:

Address is 7F3E4DF5E740
1
Address is 7F3E4DF5E740
2
Address is 7F3E4DF5E740
3
3

Ali

Sep 11 2017

Moritz Maxeiner <moritz ucworks.org> writes:

On Monday, 11 September 2017 at 22:38:21 UTC, Walter Bright wrote:
 If an address is taken to a TLS object, any relocations and 
 adjustments are made at the time the pointer is generated, not 
 when the pointer is dereferenced.

Could you elaborate on that explanation more? The way I thought 
about it was that no matter where the data is actually stored 
(global, static, tls, heap, etc.), in order to access it by 
pointer it must be mapped into virtual memory (address) space. 
From that it follows that each thread will have its own "slice" 
of that address space. Thus, if you pass an address into such a 
slice (that happens to be mapped to the TLS of a thread) to other 
threads, you can manipulate the first thread's TLS data (and 
cause the usual data races without proper synchronization, of 
course).
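
One way to avoid such a race, sketched under the assumption that every 
access after the hand-over goes through core.atomic (the helper hammer 
and its parameters are made up for illustration):

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;

int counter; // thread-local instance of whichever thread names it

// Hypothetical helper: start `nThreads` threads that each add 1 to *target
// `addsPerThread` times, using an atomic read-modify-write instead of `++`.
int hammer(int* target, int nThreads, int addsPerThread)
{
    // Assumption: the cast to shared(int)* is acceptable here because
    // all further accesses are atomic.
    auto s = cast(shared(int)*) target;

    Thread[] threads;
    foreach (i; 0 .. nThreads)
    {
        threads ~= new Thread({
            foreach (j; 0 .. addsPerThread)
                atomicOp!"+="(*s, 1);
        });
    }
    foreach (t; threads) t.start();
    foreach (t; threads) t.join();
    return atomicLoad(*s);
}

void main()
{
    // All four threads update main's TLS instance through the pointer.
    assert(hammer(&counter, 4, 10_000) == 40_000);
    assert(counter == 40_000);
}
```

With a plain ++ in place of atomicOp, the final count could come out 
lower, because concurrent increments may overwrite one another.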

Sep 11 2017
