digitalmars.D - Type safety could prevent nuclear war

tsbockman (10/10) Feb 04 2016 The annual Underhanded C Contest announced their winners today.

H. S. Teoh via Digitalmars-d (21/34) Feb 04 2016 The C preprocessor accepts all sorts of nasty, nonsensical things. For

tsbockman (9/27) Feb 04 2016 Definitely. What puzzles me about the winning entry, though, is

Ola Fosheim Grøstad (10/18) Feb 04 2016 Linkers don't know anything about types. A type is a language

tsbockman (6/18) Feb 04 2016 Yes, that was my point...

Ola Fosheim Grøstad (16/19) Feb 04 2016 The context is a compilation system for building big software on

tsbockman (15/28) Feb 04 2016 OK. That's a good reason for C's original design.

Ola Fosheim Grøstad (29/41) Feb 04 2016 C has to be backwards compatible, but I don't know why people do

tsbockman (5/13) Feb 04 2016 Why would simply adding a warning change any of that?

Ola Fosheim Grøstad (10/24) Feb 04 2016 Not sure what you mean by adding a warning. You can probably find

H. S. Teoh via Digitalmars-d (24/29) Feb 04 2016 That's a lot more expensive than you think. There's a reason most modern

tsbockman (9/12) Feb 04 2016 I should also point out that D can link to (more or less)

H. S. Teoh via Digitalmars-d (49/52) Feb 04 2016 It cannot, because C symbols are not mangled. The function name uniquely

Walter Bright (5/6) Feb 04 2016 The preprocessor makes C++ into an inherently unreliable, unsafe program...

Ola Fosheim Grøstad (6/9) Feb 04 2016 AFAICT C would have complained if he had included <math.h>. This

tsbockman (7/16) Feb 04 2016 What restriction does not checking, by default, that the

Chris Wright (11/24) Feb 04 2016 C linkage does zero name mangling, which is the problem. C++ introduced

tsbockman (4/14) Feb 04 2016 That explains why the linker doesn't catch it. I still don't see

Ola Fosheim Grøstad (5/8) Feb 04 2016 The excuse is that C uses the same mechanism for creating bindings
Chris Wright (12/15) Feb 04 2016 Doing this sort of validation requires build system integration (track

tsbockman (17/25) Feb 04 2016 There is no need to take "as much time as compiling the whole

Chris Wright (12/25) Feb 04 2016 True. That works if this is baked into your compiler, or if your compile...

tsbockman (11/21) Feb 04 2016 On Friday, 5 February 2016 at 00:56:28 UTC, Ola Fosheim Grøstad

Chris Wright (5/7) Feb 04 2016 The compiler doesn't have all the information you need. You could add it...

tsbockman (15/20) Feb 04 2016 What information, specifically, is the compiler missing?

Chris Wright (31/39) Feb 04 2016 It doesn't know what targets I'm ultimately creating, and it doesn't kno...

tsbockman (16/50) Feb 04 2016 No spurious error is generated by my proposal in your example 2,

H. S. Teoh via Digitalmars-d (27/50) Feb 04 2016 This fails for multi-executable projects, which may legally have

tsbockman (15/49) Feb 04 2016 It's a small fraction of the total data being handled by the

H. S. Teoh via Digitalmars-d (37/87) Feb 05 2016 The problem is, the linker knows nothing about the language. Arguably it

Chris Wright (22/52) Feb 05 2016 I think you're talking about maintaining an in-memory, modifiable data

tsbockman (13/43) Feb 05 2016 It doesn't necessarily have to be slow when you only changed one

tsbockman (7/17) Feb 05 2016 I did some quick tests on my system, and even with 100,000,000

Ola Fosheim Grøstad (19/25) Feb 05 2016 Well, compilers "should" only implement the standard, then they

Ola Fosheim Grøstad (6/6) Feb 05 2016 Let me add to this that the superior approach is to compile to an

anonymous (22/26) Feb 04 2016 You can do the same thing in D, using extern(C) to get no mangling:

tsbockman (9/31) Feb 04 2016 You can do the same thing in D if you try, but it's not natural

anonymous (5/11) Feb 04 2016 We do have a lot of bindings to C libraries, though. When there's a

tsbockman (7/15) Feb 04 2016 The compiler cannot (in the general case) verify that `extern(C)`

Ola Fosheim Grøstad (15/18) Feb 04 2016 I guess D could do it, although this is a rather unlikely source

tsbockman (9/17) Feb 04 2016 Aliasing types like that can be useful sometimes, but only within

Daniel Murphy (12/18) Feb 05 2016 Currently D allows overloading extern(C) declarations, see

tsbockman (12/26) Feb 05 2016 I think it makes sense (when actually linking to C) to allow

Daniel Murphy (6/15) Feb 05 2016 Safety on C functions is always going to need to be hand verified, the

H. S. Teoh via Digitalmars-d (15/25) Feb 04 2016 Nah... while D, by default, tries to be type-safe and prevent guffaws

Chris Wright (6/10) Feb 04 2016 Which suggests a check of this sort should be a warning rather than an

tsbockman (2/7) Feb 04 2016 Yes.

tsbockman (11/28) Feb 04 2016 I'm not saying that `extern(C)` is bad in general; I understand

Adam D. Ruppe (10/13) Feb 04 2016 D allows that. This is why I recommend putting `static

tsbockman (12/21) Feb 04 2016 D *doesn't* allow that though - at least, not in a monolithic,

Adam D. Ruppe (25/29) Feb 04 2016 Well, technically, a .di file is just a .d file renamed, but it

tsbockman (16/46) Feb 04 2016 Thanks for the explanation. That does sound basically the same as

H. S. Teoh via Digitalmars-d (13/25) Feb 04 2016 [...]

tsbockman (9/18) Feb 04 2016 I should have clarified that I was considering static libraries,

tsbockman <thomas.bockman gmail.com> writes:

The annual Underhanded C Contest announced their winners today.

As always, the results are very entertaining, and also an 
excellent advertisement for languages-that-are-not-C.

The first place entry is particularly ridiculous; is there any 
modern language that would make it so easy to commit such an 
awful "mistake"?

http://www.underhanded-c.org/#winner

Actually, I'm surprised that this works even in C - I would have 
expected at least a compiler (or linker?) warning; this seems 
like it should be easy to detect automatically.

Feb 04 2016

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Feb 04, 2016 at 10:57:00PM +0000, tsbockman via Digitalmars-d wrote:
 The annual Underhanded C Contest announced their winners today.
 
 As always, the results are very entertaining, and also an excellent
 advertisement for languages-that-are-not-C.
 
 The first place entry is particularly ridiculous; is there any modern
 language that would make it so easy to commit such an awful "mistake"?
 
 http://www.underhanded-c.org/#winner
 
 Actually, I'm surprised that this works even in C - I would have
 expected at least a compiler (or linker?) warning; this seems like it
 should be easy to detect automatically.

The C preprocessor accepts all sorts of nasty, nonsensical things. For
example, the following code compiles and runs (without any warning(!) on
my Linux box's standard gcc installation), and prints "No":

	#include <stdio.h>
	#define if(a) if(!(a))
	int main() {
		int i = 1;
		if (i == 1)
			printf("Yes\n");
		else
			printf("No\n");
	}

Imagine if this nasty #define is buried somewhere under several layers
of #include's.

I'm pretty sure somebody can also concoct some nasty #define that will
break the standard #include headers in horrible ways by changing the
semantics of certain supposedly-built-in constructs.


T

-- 
Mediocrity has been pushed to extremes.

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Thursday, 4 February 2016 at 23:10:23 UTC, H. S. Teoh wrote:
 On Thu, Feb 04, 2016 at 10:57:00PM +0000, tsbockman via 
 Digitalmars-d wrote:
 The annual Underhanded C Contest announced their winners today.
 
 As always, the results are very entertaining, and also an 
 excellent advertisement for languages-that-are-not-C.
 
 The first place entry is particularly ridiculous; is there any 
 modern language that would make it so easy to commit such an 
 awful "mistake"?
 
 http://www.underhanded-c.org/#winner
 
 Actually, I'm surprised that this works even in C - I would 
 have expected at least a compiler (or linker?) warning; this 
 seems like it should be easy to detect automatically.

 The C preprocessor accepts all sorts of nasty, nonsensical 
 things.

Definitely. What puzzles me about the winning entry, though, is 
that the compiler and/or linker should be able to trivially 
detect the type mismatch *after* the preprocessor pass(es) are 
already done.

It should just see that the post-preprocessor signatures of 
`spectral_contrast()` in match.c and spectral_contrast.c are in 
conflict, and either issue a warning, or refuse to link them at 
all.

Feb 04 2016

Ola Fosheim Grøstad writes:

On Thursday, 4 February 2016 at 23:21:54 UTC, tsbockman wrote:
 Definitely. What puzzles me about the winning entry, though, is 
 that the compiler and/or linker should be able to trivially 
 detect the type mismatch *after* the preprocessor pass(es) are 
 already done.

Linkers don't know anything about types. A type is a language 
feature.

 It should just see that the post-preprocessor signatures of 
 `spectral_contrast()` in match.c and spectral_contrast.c are in 
 conflict, and either issue a warning, or refuse to link them at 
 all.

Has nothing to do with the preprocessor.

He defined float_t to be an alias for double in one compilation 
unit, and float_t to be an alias for float in another compilation 
unit.

In C, compilation units are completely independent, and can in 
fact come from different compilers and different languages. C is 
very much a system level programming language.
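
To make the float_t mismatch concrete, here is a minimal single-file sketch in C. It does not reproduce the contest entry; it only simulates what a callee built with `typedef float float_t;` would read from memory that a caller built with `typedef double float_t;` filled in. The helper names `first_float_seen_by_callee` and `show_mismatch` are made up for illustration, and the sketch assumes the usual 8-byte IEEE double and 4-byte IEEE float.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copies the first sizeof(float) bytes of a double's
   storage into a float. This mimics a callee that believes the buffer
   holds floats while the caller actually stored doubles. */
float first_float_seen_by_callee(double value)
{
    float seen;
    memcpy(&seen, &value, sizeof seen);  /* plain byte copy, no aliasing tricks */
    return seen;
}

/* Prints, for a few sample inputs, what the caller wrote and what the
   mismatched callee would read back. */
void show_mismatch(void)
{
    const double inputs[] = { 1.0, 2.5, 1000.0 };
    for (size_t i = 0; i < sizeof inputs / sizeof inputs[0]; i++)
        printf("caller wrote %g, callee reads %g\n",
               inputs[i], (double)first_float_seen_by_callee(inputs[i]));
}
```

Calling `show_mismatch()` from a `main` prints the pairs; the exact garbage depends on byte order, but neither compilation unit contains anything a compiler looking at it in isolation could flag.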

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Thursday, 4 February 2016 at 23:25:58 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 4 February 2016 at 23:21:54 UTC, tsbockman wrote:
 It should just see that the post-preprocessor signatures of 
 `spectral_contrast()` in match.c and spectral_contrast.c are 
 in conflict, and either issue a warning, or refuse to link 
 them at all.

 Has nothing to do with the preprocessor.

Yes, that was my point...

 He defined float_t to be an alias for double in one compilation 
 unit, and float_t to be an alias for float in another 
 compilation unit.

 In C, compilation units are completely independent, and can in 
 fact come from different compilers and different languages. C 
 is very much a system level programming language.

Just because *sometimes* the source code of the other module must 
be compiled independently, is a poor excuse to skip obvious, 
useful safety checks *all* the time.

Feb 04 2016

Ola Fosheim Grøstad writes:

On Thursday, 4 February 2016 at 23:35:46 UTC, tsbockman wrote:
 Just because *sometimes* the source code of the other module 
 must be compiled independently, is a poor excuse to skip 
 obvious, useful safety checks *all* the time.

The context is a compilation system for building big software on 
very slow CPUs with kilobytes of RAM.

C was designed for always compiling independently and compiling 
source files that are bigger than what can be held in RAM, and 
also for building executables that can fill most of system RAM. 
So the compilation system was designed for using external memory 
(disk) and that affects C a lot. The forerunner for C, BCPL was a 
bootstrap language for writing compilers. So C is minimal by 
design.

BTW, C++ programmers sometimes use similar unsafe hacks of 
"pruned header files" to break dependencies and speed up 
compilation. So this is not unique to C, but C++ introduced the 
mangling of types into names to support overloading of functions 
on parameter types, which is why C++ detects (some) type issues 
at link time.

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Thursday, 4 February 2016 at 23:53:58 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 4 February 2016 at 23:35:46 UTC, tsbockman wrote:
 Just because *sometimes* the source code of the other module 
 must be compiled independently, is a poor excuse to skip 
 obvious, useful safety checks *all* the time.

 The context is a compilation system for building big software 
 on very slow CPUs with kilobytes of RAM.

 C was designed for always compiling independently and compiling 
 source files that are bigger than what can be held in RAM, and 
 also for building executables that can fill most of system RAM. 
 So the compilation system was designed for using external 
 memory (disk) and that affects C a lot. The forerunner for C, 
 BCPL was a bootstrap language for writing compilers. So C is 
 minimal by design.

OK. That's a good reason for C's original design.

But it's 2016 and my PC has 32GiB of RAM. Why should a C compiler 
running on such a system skip safety checks just because they 
would be too expensive to run on some *other* computer?

This isn't even a particularly expensive (in compile-time costs) 
check to perform anyway; all that is necessary is to store a 
temporary table of symbol signatures somewhere (it doesn't need 
to be in RAM), and check that any duplicate entries are 
consistent with each other before linking.

This is already a solved problem in most other programming 
languages; there is no fundamental reason that the solutions used 
in D, C++, or Java could not be applied to C - without even 
changing any of the language semantics.

Feb 04 2016

Ola Fosheim Grøstad writes:

On Friday, 5 February 2016 at 00:14:11 UTC, tsbockman wrote:
 But it's 2016 and my PC has 32GiB of RAM. Why should a C 
 compiler running on such a system skip safety checks just 
 because they would be too expensive to run on some *other* 
 computer?

C has to be backwards compatible, but I don't know why people do 
larger projects in C in 2016.

Libraries are done in C for portability and because it provides a 
FFI interface defined as the ABI by hardware and OS vendors. BeOS 
tried to define a specific C++ compiler as their ABI, but it was 
problematic.

C++ does not have an ABI, you cannot link object files from 
different compilers.

So, basically, there is no suitable industry standard other than 
C.

 This is already a solved problem in most other programming 
 languages; there is no fundamental reason that the solutions 
 used in D, C++, or Java could not be applied to C - without 
 even changing any of the language semantics.

D and C++ change.  C uses the ABI defined by the hardware/OS 
vendor.  It is locked in stone, frozen, beyond discussion.

As mentioned BeOS adopted C++. Apple has adopted Objective-C and 
Swift. But how can you make _all_ the other vendors (Microsoft, 
Google, IBM etc) standardize on something that isn't C?

 Aliasing types like that can be useful sometimes, but only 
 within certain limits. In particular, the size (with alignment 
 padding) of the types in question must match, otherwise you 
 will corrupt the stack.

I see where you are coming from, but I meant what I said 
literally. Machine language only deals with bit patterns. When we 
interface with machine language we just add lots of constraints 
on what we hand over to it. Adding _more_ constraints than the 
creator of the machine language code intended is never wrong. Not 
adding enough constraints is not ideal, but often difficult to 
avoid if we care about performance.

So if I write a piece of machine language code and give you the 
object file you only have my words for what the input is supposed 
to be. And then you have to make a formulation of the constraints 
that fits your use case and is expressible in your language. 
Different languages have different levels of expressiveness for 
describing and enforcing type constraints.

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 00:41:52 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 5 February 2016 at 00:14:11 UTC, tsbockman wrote:
 But it's 2016 and my PC has 32GiB of RAM. Why should a C 
 compiler running on such a system skip safety checks just 
 because they would be too expensive to run on some *other* 
 computer?

 C has to be backwards compatible, but I don't know why people 
 do larger projects in C in 2016.
 [...]

Why would simply adding a warning change any of that?

No ABI changes are required. Backwards compatibility is not 
broken.

Feb 04 2016

Ola Fosheim Grøstad writes:

On Friday, 5 February 2016 at 00:50:32 UTC, tsbockman wrote:
 On Friday, 5 February 2016 at 00:41:52 UTC, Ola Fosheim Grøstad 
 wrote:
 On Friday, 5 February 2016 at 00:14:11 UTC, tsbockman wrote:
 But it's 2016 and my PC has 32GiB of RAM. Why should a C 
 compiler running on such a system skip safety checks just 
 because they would be too expensive to run on some *other* 
 computer?

 C has to be backwards compatible, but I don't know why people 
 do larger projects in C in 2016.
 [...]

 Why would simply adding a warning change any of that?

 No ABI changes are required. Backwards compatibility is not 
 broken.

Not sure what you mean by adding a warning. You can probably find 
sanitizers that do it, but the standard does not require warnings 
for anything (AFAIK). That is up to compiler vendors.

As for why C isn't displaced by something better, maybe the right 
question is: why don't new languages stick to the C ABI and 
provide sensible C code gen.

Well, they want more features... and features... and features...

There is probably a market for it, but nobody can be bothered to 
create and maintain a simple modern system level language.

Feb 04 2016

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Fri, Feb 05, 2016 at 12:14:11AM +0000, tsbockman via Digitalmars-d wrote:
[...]
 This isn't even a particularly expensive (in compile-time costs) check
 to perform anyway; all that is necessary is to store a temporary table
 of symbol signatures somewhere (it doesn't need to be in RAM), and
 check that any duplicate entries are consistent with each other before
 linking.

That's a lot more expensive than you think. There's a reason most modern
linkers do not do full cross-referencing of symbols -- because doing so
would be excruciatingly slow and consume gobs of memory. Even a 32GB
machine would not be able to hold *all* the symbols in some very large
software projects, and looking things up on disk is unacceptably slow
for software of those sizes. Most modern linkers instead use faster
algorithms that rely on clever scheduling of the order of symbol
resolution, just so they *don't* have to cross-reference all symbols at
once.

Besides, all this is unnecessary work. All you need to do is to have C
compilers mangle function names.  Mission accomplished.

(However, this *will* break a lot of existing inter-language code that
relies on being able to spell out symbols explicitly. So it probably will
not fly.  But, in theory, it *is* possible...)

And to paraphrase one of my favorite Walter quotes: fixing inconsistent
function signatures is only plugging one hole in a cheese grater. C has
far more dangerous gotchas than just function signature mismatches.


T

-- 
They say that "guns don't kill people, people kill people." Well I think
the gun helps. If you just stood there and yelled BANG, I don't think
you'd kill too many people. -- Eddie Izzard, Dressed to Kill

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Thursday, 4 February 2016 at 23:25:58 UTC, Ola Fosheim Grøstad 
wrote:
 In C, compilation units are completely independent, and can in 
 fact come from different compilers and different languages. C 
 is very much a system level programming language.

I should also point out that D can link to (more or less) 
anything that C can, and yet does not have the weakness exploited 
by the winning entry.

The only real reason that D is one whit less of a "system level 
programming language" than C is the heavyweight runtime library 
- but that is irrelevant to the problem of type-checking 
cross-module references within the same code base.

Feb 04 2016

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Feb 04, 2016 at 11:21:54PM +0000, tsbockman via Digitalmars-d wrote:
[...]
 Definitely. What puzzles me about the winning entry, though, is that the
 compiler and/or linker should be able to trivially detect the type mismatch
 *after* the preprocessor pass(es) are already done.

It cannot, because C symbols are not mangled. The function name uniquely
identifies the function, and the signature is not encoded anywhere.

The linker knows nothing about types or parameters; all it knows is that
within offset X of binary blob B, there's a binary number (usually a 32-
or 64-bit address) associated with a symbol that it needs to replace
with the value (i.e., address) of that symbol, which it obtains from the
object file that defines that symbol.

So as far as the linker is concerned, the function names match up, and
that's all there is to it.
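
The linker's view described above can be sketched as a toy model in C. The `symbol` and `relocation` structs and the `apply_relocation` function are hypothetical and far simpler than any real object format; the point is only that the lookup key is the name, so nothing in this step could notice a signature mismatch.

```c
#include <stdint.h>
#include <string.h>

/* Toy model of what a linker works with: names and addresses, no types. */
struct symbol     { const char *name; uint64_t address; };
struct relocation { const char *name; size_t offset; };

/* Writes the address of the symbol named by `rel` into `blob` at
   rel->offset. Returns 0 on success, or -1 if the offset does not fit
   in the blob or no definition carries that name. */
int apply_relocation(unsigned char *blob, size_t blob_size,
                     const struct relocation *rel,
                     const struct symbol *defs, size_t ndefs)
{
    if (rel->offset > blob_size || blob_size - rel->offset < sizeof(uint64_t))
        return -1;
    for (size_t i = 0; i < ndefs; i++) {
        if (strcmp(defs[i].name, rel->name) == 0) {
            /* The first definition with a matching name wins. */
            memcpy(blob + rel->offset, &defs[i].address, sizeof(uint64_t));
            return 0;
        }
    }
    return -1;
}
```

Because a match is decided by `strcmp` on the name alone, a caller expecting `func(double)` and a definition of `func(int, int)` resolve to each other without complaint.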

C provides zero protection against calling functions with mismatched
parameters if the caller is not in the same file, and does not have the
right declaration. E.g.:

	/* module1.c */
	void func(int a, int b) { /* ... */ }

	/* module2.c */
	extern int func(double x); /* I'm too lazy to #include a header */
	int main(void) {
		int x = func(1.0); /* kaboom: wrong parameters and return type, yet it links */
		return x;
	}

In theory, this problem is solved by #include'ing the appropriate header
file, but even that isn't free from accidents like forgetting to update
the header after you change the function signature.  Of course, most
sane C projects will also #include the header in the file that defines
the function, in which case, finally, the compiler will catch the
mistake. But you can see just how fragile this is, and how many points
of failure it has, and, believe it or not, there *are* still C projects
out there that don't follow the convention of one header per .c file,
and of those that do, a frightening number do not #include the header in
the .c file.

This isn't the whole story, either. Even if you follow said conventions
to prevent function signature mismatches, problems can still occur. For
instance, I once had to debug a mysterious crash in an enterprise
project whose cause, seemingly, could not be found in the code.  It
turned out that it was caused by two shared libraries that defined two
different functions under the same name. Since the conflicting functions
are in separately-compiled libraries, the compiler is oblivious to the
conflict. Furthermore, the linker doesn't detect it either, because,
being shared libraries, all the linker knows is that it found symbol X
in library1, so it didn't bother looking for symbol X again in library2
which is processed afterward. An unrelated code change caused the order
of libraries linked to change, and suddenly now the linker finds symbol
X in library2 first, leading to the function call being linked to the
wrong implementation.  So at runtime, kaboom.

Name mangling singlehandedly solves all of the above problems.
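
As a rough illustration of why mangling catches the func() mismatch above, here is a toy mangler in C. It is loosely modelled on the Itanium C++ ABI scheme (`_Z`, the name length, the name, then one letter per parameter type), handles only `int`, `double` and `float`, and `toy_mangle` is a made-up name, not a real API.

```c
#include <stdio.h>
#include <string.h>

/* Toy mangler: "_Z" + <name length> + <name> + one letter per parameter.
   Only "int" (i), "double" (d) and "float" (f) are recognised.
   Returns 0 on success, -1 on an unknown type or a too-small buffer. */
int toy_mangle(const char *name, const char *const *params, size_t nparams,
               char *out, size_t out_size)
{
    int written = snprintf(out, out_size, "_Z%zu%s", strlen(name), name);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    size_t pos = (size_t)written;
    for (size_t i = 0; i < nparams; i++) {
        char code;
        if (strcmp(params[i], "int") == 0)         code = 'i';
        else if (strcmp(params[i], "double") == 0) code = 'd';
        else if (strcmp(params[i], "float") == 0)  code = 'f';
        else return -1;
        if (pos + 1 >= out_size)   /* keep room for the terminator */
            return -1;
        out[pos++] = code;
        out[pos] = '\0';
    }
    return 0;
}
```

With this scheme the definition `func(int, int)` becomes `_Z4funcii` while the lazy declaration `func(double)` asks for `_Z4funcd`, so the link would fail with an undefined symbol instead of silently succeeding.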


T

-- 
Beware of bugs in the above code; I have only proved it correct, not tried it.
-- Donald Knuth

Feb 04 2016

Walter Bright <newshound2 digitalmars.com> writes:

On 2/4/2016 3:10 PM, H. S. Teoh via Digitalmars-d wrote:
 The C preprocessor accepts all sorts of nasty, nonsensical things.

The preprocessor makes C++ into an inherently unreliable, unsafe programming 
language. I've talked to some C++ committee members about this, about why there 
is no push to get rid of (or at least deprecate) all use of the preprocessor. The 
general reaction I get is that it is unimportant to do so.

Feb 04 2016

Ola Fosheim Grøstad writes:

On Thursday, 4 February 2016 at 22:57:00 UTC, tsbockman wrote:
 Actually, I'm surprised that this works even in C - I would 
 have expected at least a compiler (or linker?) warning; this 
 seems like it should be easy to detect automatically.

AFAICT C would have complained if he had included <math.h>. This 
is a rather unlikely mistake...

Anyway, in C being able to work around restrictions is sometimes 
desired, so... if you don't want the ability to do it, don't use 
C.

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Thursday, 4 February 2016 at 23:19:20 UTC, Ola Fosheim Grøstad 
wrote:
 On Thursday, 4 February 2016 at 22:57:00 UTC, tsbockman wrote:
 Actually, I'm surprised that this works even in C - I would 
 have expected at least a compiler (or linker?) warning; this 
 seems like it should be easy to detect automatically.

 AFAICT C would have complained if he had included <math.h>. 
 This is a rather unlikely mistake...

 Anyway, in C being able to work around restrictions is 
 sometimes desired, so... if you don't want the ability to do 
 it, don't use C.

What restriction does not checking, by default, that the 
parameter types match allow one to work around, though?

C already has `void*` and explicit casts, either of which would 
allow one to explicitly indicate that type checking is not 
desired.

Feb 04 2016

Chris Wright <dhasenan gmail.com> writes:

On Thu, 04 Feb 2016 22:57:00 +0000, tsbockman wrote:

 The annual Underhanded C Contest announced their winners today.
 
 As always, the results are very entertaining, and also an excellent
 advertisement for languages-that-are-not-C.
 
 The first place entry is particularly ridiculous; is there any modern
 language that would make it so easy to commit such an awful "mistake"?
 
 http://www.underhanded-c.org/#winner
 
 Actually, I'm surprised that this works even in C - I would have
 expected at least a compiler (or linker?) warning; this seems like it
 should be easy to detect automatically.

C linkage does zero name mangling, which is the problem. C++ introduced 
name mangling, so compiling with g++ would show the error rather quickly. 
C99 is pretty close to C++98, but there are enough differences that that 
isn't a reliable diagnostic. (Though if you're familiar with the 
differences, you could use it as a quick way to show potential problem 
areas.)

I suppose a compiler could produce two symbol tables, one featuring 
mangled names and one with unmangled names. The linker would prefer 
matching mangled names and issue a warning if it only had an unmangled 
match with a mangled false match.
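
A minimal sketch of that resolution rule, in C. Everything here is an assumption for illustration: the `dual_symbol` layout, the `resolve` function and the result names are invented, and a missing mangled name (NULL) stands for an object file produced by a tool that does not emit the second table.

```c
#include <string.h>

/* Each symbol carries a plain name and, optionally, a mangled one. */
struct dual_symbol {
    const char *plain;    /* unmangled name, e.g. "func" */
    const char *mangled;  /* name plus encoded signature, or NULL if absent */
};

enum resolve_result { RESOLVE_OK, RESOLVE_WARN_MISMATCH, RESOLVE_UNDEFINED };

/* Prefers a definition whose mangled name matches the reference. If only
   plain names match and some candidate's mangled name disagrees, the link
   would still succeed, but with a warning. */
enum resolve_result resolve(const struct dual_symbol *ref,
                            const struct dual_symbol *defs, size_t ndefs)
{
    int plain_match = 0;
    int mangled_conflict = 0;
    for (size_t i = 0; i < ndefs; i++) {
        if (strcmp(ref->plain, defs[i].plain) != 0)
            continue;
        plain_match = 1;
        if (ref->mangled && defs[i].mangled) {
            if (strcmp(ref->mangled, defs[i].mangled) == 0)
                return RESOLVE_OK;   /* signatures agree */
            mangled_conflict = 1;    /* same name, different signature */
        }
    }
    if (!plain_match)
        return RESOLVE_UNDEFINED;
    return mangled_conflict ? RESOLVE_WARN_MISMATCH : RESOLVE_OK;
}
```

Objects without the mangled table still link exactly as before, which is what would keep such a scheme compatible with existing C and non-C code.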

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Thursday, 4 February 2016 at 23:24:21 UTC, Chris Wright wrote:
 C linkage does zero name mangling, which is the problem. C++ 
 introduced name mangling, so compiling with g++ would show the 
 error rather quickly. C99 is pretty close to C++98, but there 
 are enough differences that that isn't a reliable diagnostic. 
 (Though if you're familiar with the differences, you could use 
 it as a quick way to show potential problem areas.)

 I suppose a compiler could produce two symbol tables, one 
 featuring mangled names and one with unmangled names. The 
 linker would prefer matching mangled names and issue a warning 
 if it only had an unmangled match with a mangled false match.

That explains why the linker doesn't catch it. I still don't see 
much excuse for the compiler allowing it though, beyond a desire 
to allow each module to be compiled independently.

Feb 04 2016

Ola Fosheim Grøstad writes:

On Thursday, 4 February 2016 at 23:29:10 UTC, tsbockman wrote:
 That explains why the linker doesn't catch it. I still don't 
 see much excuse for the compiler allowing it though, beyond a 
 desire to allow each module to be compiled independently.

The excuse is that C uses the same mechanism for creating bindings 
to C and non-C code. It is actually very handy. IF you want a 
system level language and full separation of compilation units 
(which allows for very fast compilation).

Feb 04 2016

Chris Wright <dhasenan gmail.com> writes:

On Thu, 04 Feb 2016 23:29:10 +0000, tsbockman wrote:

 That explains why the linker doesn't catch it. I still don't see much
 excuse for the compiler allowing it though, beyond a desire to allow
 each module to be compiled independently.

Doing this sort of validation requires build system integration (track 
the command line arguments that went into producing this object file; 
find which object files are combined into which targets; run the analysis 
on that) and costs as much time as compiling the whole project from 
scratch. Developing such a system is nontrivial, so it's not a matter of 
conjuring excuses; rather, someone would have to put in considerable 
effort to make it work.

I'm betting some of the commercial static analyzers for C do this, but 
they're not the sort of things you install on every dev machine and run 
on every build. Generally they're the sort of thing that you send off to 
the security company and they send you a report some weeks later.

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 00:03:56 UTC, Chris Wright wrote:
 Doing this sort of validation requires build system integration 
 (track the command line arguments that went into producing this 
 object file; find which object files are combined into which 
 targets; run the analysis on that) and costs as much time as 
 compiling the whole project from scratch.

There is no need to take "as much time as compiling the whole 
project from scratch".

The necessary information is already gathered during the normal 
course of compilation; all that is required is to actually save 
it somewhere until link-time, instead of throwing it away.

The time required for the check should be at most O(N log(N)), 
where N is the number of function and global variable 
declarations in the project. The space required for the table is 
O(N). In both cases the constant factors should be quite small.

 Developing such a system is nontrivial, so it's not a matter of
 conjuring excuses; rather, someone would have to put in
 considerable effort to make it work.

Adding any interesting feature to a build system is usually 
nontrivial, but I still think you're overestimating the cost of 
this one.

Again, the hard part (finding all the signatures and processing 
them into a semantically meaningful form) is already being done 
by the compiler. The results just need to be saved, sorted, and 
scanned for conflicts.
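
The save, sort, and scan idea can be sketched in a few lines of C. This is only an illustration under assumptions: the `sig_entry` record, the canonical signature strings and `report_conflicts` are invented here, and a real compiler would need an agreed-upon textual form for types. The standard does not promise a complexity for `qsort`, though typical implementations are O(N log N).

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One record per declaration or definition the compiler has seen.
   All field contents are illustrative assumptions. */
struct sig_entry {
    const char *name;       /* symbol name as the linker sees it */
    const char *signature;  /* canonical type string, typedefs resolved */
    const char *file;       /* where the declaration came from */
    int line;
};

/* Orders by name, then by signature, so equal names end up adjacent. */
static int compare_entries(const void *a, const void *b)
{
    const struct sig_entry *x = a;
    const struct sig_entry *y = b;
    int by_name = strcmp(x->name, y->name);
    return by_name != 0 ? by_name : strcmp(x->signature, y->signature);
}

/* Sorts the table in place and warns wherever two adjacent entries share
   a name but differ in signature. Returns the number of such pairs. */
size_t report_conflicts(struct sig_entry *entries, size_t n)
{
    size_t conflicts = 0;
    qsort(entries, n, sizeof entries[0], compare_entries);
    for (size_t i = 1; i < n; i++) {
        const struct sig_entry *prev = &entries[i - 1];
        const struct sig_entry *cur = &entries[i];
        if (strcmp(prev->name, cur->name) == 0 &&
            strcmp(prev->signature, cur->signature) != 0) {
            fprintf(stderr,
                    "warning: '%s' declared as %s at %s:%d but as %s at %s:%d\n",
                    cur->name, prev->signature, prev->file, prev->line,
                    cur->signature, cur->file, cur->line);
            conflicts++;
        }
    }
    return conflicts;
}
```

Fed one entry for `spectral_contrast` from match.c and a differently typed one from spectral_contrast.c, the scan reports a single conflict; the table itself is one small record per declaration.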

Feb 04 2016

Chris Wright <dhasenan gmail.com> writes:

On Fri, 05 Feb 2016 00:38:16 +0000, tsbockman wrote:

 On Friday, 5 February 2016 at 00:03:56 UTC, Chris Wright wrote:
 Doing this sort of validation requires build system integration (track
 the command line arguments that went into producing this object file;
 find which object files are combined into which targets; run the
 analysis on that) and costs as much time as compiling the whole project
 from scratch.

 
 There is no need to take "as much time as compiling the whole project
 from scratch".
 
 The necessary information is already gathered during the normal course
 of compilation; all that is required is to actually save it somewhere
 until link-time, instead of throwing it away.

True. That works if this is baked into your compiler, or if your compiler 
has plugin support. And you'd have to compile with this plugin or the 
relevant options turned on by default in order for you not to duplicate 
work.

That's partly an engineering issue (build this thing in this particular 
way) and partly a social issue (get people to run it by default; have 
them add the extra flag to the makefile to specify to create the relevant 
output; possibly get your compiler vendor to build it in, depending on 
what compiler your devs are using).

I imagine Google, to take a random example where I have experience, would 
add this as a presubmit step rather than requiring it on every build.

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 00:56:16 UTC, Chris Wright wrote:
 True. That works if this is baked into your compiler, or if 
 your compiler has plugin support. And you'd have to compile 
 with this plugin or the relevant options turned on by default 
 in order for you not to duplicate work.

On Friday, 5 February 2016 at 00:56:28 UTC, Ola Fosheim Grøstad 
wrote:
 Not sure what you mean by adding a warning. You can probably 
 find sanitizers that do it, but the standard does not require 
 warnings for anything (AFAIK). That is up to compiler vendors.

Quoting myself (emphasis added):

On Thursday, 4 February 2016 at 22:57:00 UTC, tsbockman wrote:
 Actually, I'm surprised that this works even in C - I would 
 have expected at least a COMPILER (or linker?) warning; this 
 seems like it should be easy to detect automatically.

All along I have been saying this is something that *compilers* 
should warn about. As far as I can recall, I never suggested 
using linters, sanitizers, changing the C standard - or even 
compiler plugins.

(I did suggest the linker as an alternative, but you all have 
already explained why that can't work for C.)

Feb 04 2016

Chris Wright <dhasenan gmail.com> writes:

On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:

 All along I have been saying this is something that *compilers* should
 warn about.

The compiler doesn't have all the information you need. You could add it 
to the build system or the linker as well as the compiler. Adding it to 
the linker is almost identical to my previous suggestion of adding 
optional name mangling to C.

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
 On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
 The compiler doesn't have all the information you need. You 
 could add it to the build system or the linker as well as the 
 compiler. Adding it to the linker is almost identical to my 
 previous suggestion of adding optional name mangling to C.

What information, specifically, is the compiler missing?

The compiler already computes the name and type signature of each 
function. As far as I can see, all that is necessary is to:

1) Insert that information (together with what file and line 
number it came from) into a big list in a temporary file.
2) After all modules have been compiled, go back and sort the 
list by function name.
3) Finally, scan the list for entries that share the same name, 
but have incompatible type signatures. Emit warning messages as 
needed. (The compiler should be used for this step, because it 
already has a lot of information about C's type system built into 
it that can help define "incompatible" sensibly.)

As far as I can see, this requires an extra pass, but no 
additional information. What am I missing?

Feb 04 2016

Chris Wright <dhasenan gmail.com> writes:

On Fri, 05 Feb 2016 04:02:41 +0000, tsbockman wrote:

 On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
 On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
 The compiler doesn't have all the information you need. You could add
 it to the build system or the linker as well as the compiler. Adding it
 to the linker is almost identical to my previous suggestion of adding
 optional name mangling to C.

 
 What information, specifically, is the compiler missing?

It doesn't know what targets I'm ultimately creating, and it doesn't know 
what files have been modified that I'm about to compile (but haven't 
compiled yet).

Example 1:

I compile one .c file referencing a function:
void foo(int);

That's going to end up in libfoo.so.

I compile another .c file in the same directory defining a function:
void foo(float);

That's going to end up in libbar.so.

No bug here. (The linker should tell us if someone depends on foo from 
libbar and foo from libfoo in the same executable.)

How does your putative compiler plugin handle it? Either I have to define 
a build rule for every source file to specify where to put this symbol 
cache (and you need to add parameters for the plugin to look for multiple 
caches, because libfoo and libbar share a lot of source files), or the 
plugin gives me false positives.

Example 2:

I compile a.c:
int foo(int i) { return i + 1; }

In the course of refactoring, I delete that function from a.c and add it 
to b.c with modifications:
int foo(int i, int increment) { return i + increment; }

My build script recompiles b.c before it recompiles a.c. Your compiler 
plugin produces a build error, halting my build. I have to make clean &&  
make in order to proceed -- and that's assuming I know your tool doesn't 
work well with incremental compilation.

The first problem might be uncommon, but the second would crop up 
constantly. They have the same fix: collect the information when you 
compile, evaluate it when you link.

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 06:05:49 UTC, Chris Wright wrote:
 It doesn't know what targets I'm ultimately creating, and it 
 doesn't know what files have been modified that I'm about to 
 compile (but haven't compiled yet).

 Example 1:

 I compile one .c file referencing a function:
 void foo(int);

 That's going to end up in libfoo.so.

 I compile another .c file in the same directory defining a 
 function:
 void foo(float);

 That's going to end up in libbar.so.

 No bug here. (The linker should tell us if someone depends on 
 foo from libbar and foo from libfoo in the same executable.)

 How does your putative compiler plugin handle it? Either I have 
 to define a build rule for every source file to specify where 
 to put this symbol cache (and you need to add parameters for 
 the plugin to look for multiple caches, because libfoo and 
 libbar share a lot of source files), or the plugin gives me 
 false positives.

 Example 2:

 I compile a.c:
 int foo(int i) { return i + 1; }

 In the course of refactoring, I delete that function from a.c 
 and add it
 to b.c with modifications:
 int foo(int i, int increment) { return i + increment; }

 My build script recompiles b.c before it recompiles a.c. Your 
 compiler plugin produces a build error, halting my build. I 
 have to make clean && make in order to proceed -- and that's 
 assuming I know your tool doesn't work well with incremental 
 compilation.

 The first problem might be uncommon, but the second would crop 
 up constantly. They have the same fix: collect the information 
 when you compile, evaluate it when you link.

No spurious error is generated by my proposal in your example 2, 
because I specifically stated that the extra pass must be done 
once, after *all* modules have been compiled.

I see, however, that this would require one of:

1) Modifying build scripts to pass the complete list of .c files 
to the compiler in a single command, or
2) Modifying build scripts to run the compiler one extra time 
after processing all the .c files, or
3) Run the final check at link-time.

For a C tool chain with a clean-sheet design, any of those would 
handle example 2 fine. (1) or (3) could also handle example 1 
without issue.

However, as you say, only (3) is backwards compatible with 
existing make files and what-not. (This is not a limitation of 
the C language or ABI, though.)

Feb 04 2016

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d wrote:
 On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:
The compiler doesn't have all the information you need. You could add it
to the build system or the linker as well as the compiler. Adding it to
the linker is almost identical to my previous suggestion of adding
optional name mangling to C.

 
 What information, specifically, is the compiler missing?
 
 The compiler already computes the name and type signature of each function.
 As far as I can see, all that is necessary is to:
 
 1) Insert that information (together with what file and line number it
 came from) into a big list in a temporary file.
 2) After all modules have been compiled, go back and sort the list by
 function name.

This would make compilation of large projects excruciatingly slow.


 3) Finally, scan the list for entries that share the same name, but
 have incompatible type signatures. Emit warning messages as needed.
 (The compiler should be used for this step, because it already has a
 lot of information about C's type system built into it that can help
 define "incompatible" sensibly.)

This fails for multi-executable projects, which may legally have
different functions under the same name. (Even though that's arguably a
very bad idea.)


 As far as I can see, this requires an extra pass, but no additional
 information. What am I missing?

The fact that the C compiler only sees one file at a time, and has no
idea which one, if any, of them will even end up in the final
executable. Many projects produce multiple executables with some shared
sources between them, and only the build system knows which file(s) go
with which executables.

So as others have said, this can only work for compilers that are aware
of a larger picture than just the single source file they're currently
compiling. Even in D, for a sufficiently large project the compiler
can't see everything at once either, because it won't fit into your RAM.
Thankfully, D doesn't suffer from this particular problem because of
name mangling.

Which is why I said, adding name mangling to the C compiler will solve
this problem. Except that it breaks existing inter-language code, so it
won't work for *all* C programs. And it will also break linkage with
existing shared libraries, which are *not* name-mangled. (Recompiling
said libraries may not be an option if they are OEM, binary-only blobs.)
So it can only work for self-contained, independent projects with no
inter-language linkage, which would be a very restricted subset of C
codebases.


T

-- 
Nobody is perfect.  I am Nobody. -- pepoluan, GKC forum

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
 On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via 
 Digitalmars-d wrote:
 On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:

 What information, specifically, is the compiler missing?
 
 The compiler already computes the name and type signature of 
 each function. As far as I can see, all that is necessary is 
 to:
 
 1) Insert that information (together with what file and line 
 number it
 came from) into a big list in a temporary file.
 2) After all modules have been compiled, go back and sort the 
 list by
 function name.

 This would make compilation of large projects excruciatingly 
 slow.

It's a small fraction of the total data being handled by the 
compiler (smaller than the source code), and the list could 
probably be directly generated in a partially sorted state. 
Little-to-no random access to the list is required at any point 
in the process. It does not ever need to all be in RAM at the 
same time.

I can see it may cost more than it's actually worth, but where 
does the "excruciatingly slow" part come from?

 3) Finally, scan the list for entries that share the same 
 name, but have incompatible type signatures. Emit warning 
 messages as needed. (The compiler should be used for this 
 step, because it already has a lot of information about C's 
 type system built into it that can help define "incompatible" 
 sensibly.)

 This fails for multi-executable projects, which may legally 
 have different functions under the same name. (Even though 
 that's arguably a very bad idea.)

Chris Wright pointed this out, as well. This just means the final 
pass should be done at link-time, though. It's not a fundamental 
problem with generating the warning.

 As far as I can see, this requires an extra pass, but no 
 additional information. What am I missing?

 The fact that the C compiler only sees one file at a time, and 
 has no idea which one, if any, of them will even end up in the 
 final executable. Many projects produce multiple executables 
 with some shared sources between them, and only the build 
 system knows which file(s) go with which executables.

This could be worked around with a little cooperation between the 
compiler and the linker. It's not even a feature of C the 
language - it's just the way current tool chains happen to work.

Feb 04 2016

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Fri, Feb 05, 2016 at 07:31:34AM +0000, tsbockman via Digitalmars-d wrote:
 On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d
wrote:
On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:

What information, specifically, is the compiler missing?

The compiler already computes the name and type signature of each
function. As far as I can see, all that is necessary is to:

1) Insert that information (together with what file and line number
it came from) into a big list in a temporary file.
2) After all modules have been compiled, go back and sort the list
by function name.

This would make compilation of large projects excruciatingly slow.

 
 It's a small fraction of the total data being handled by the compiler
 (smaller than the source code), and the list could probably be
 directly generated in a partially sorted state. Little-to-no random
 access to the list is required at any point in the process. It does
 not ever need to all be in RAM at the same time.
 
 I can see it may cost more than it's actually worth, but where does
 the "excruciatingly slow" part come from?

OK, probably I'm misunderstanding something here. :-P


3) Finally, scan the list for entries that share the same name, but
have incompatible type signatures. Emit warning messages as needed.
(The compiler should be used for this step, because it already has a
lot of information about C's type system built into it that can help
define "incompatible" sensibly.)

This fails for multi-executable projects, which may legally have
different functions under the same name. (Even though that's arguably
a very bad idea.)

 
 Chris Wright pointed this out, as well. This just means the final pass
 should be done at link-time, though. It's not a fundamental problem
 with generating the warning.

The problem is, the linker knows nothing about the language. Arguably it
should, but as things stand currently, it doesn't, and can't, because
usually linkers are shipped with the OS, and are expected to link object
files of *any* pedigree without needing to code for language-explicit
checks.

Perhaps this is slowly starting to change, as LTO and other recent
innovations are pushing the envelope of what the linker can do.  Maybe
one day there will emerge a language-agnostic way for the linker to
check for such errors... but I really don't see it happening, because
languages *other* than C have already solved the problem with name
mangling. There isn't much motivation for linkers to change just because
C has some language design issues.

(And note that I'm not trying to disagree with you -- I'm totally in
agreement that what C allows is oftentimes extremely dangerous and
rather unwise. But the way things are is just so entrenched that it's
unlikely to change in the near (or even distant) future.)


As far as I can see, this requires an extra pass, but no additional
information. What am I missing?

The fact that the C compiler only sees one file at a time, and has no
idea which one, if any, of them will even end up in the final
executable. Many projects produce multiple executables with some
shared sources between them, and only the build system knows which
file(s) go with which executables.

 
 This could be worked around with a little cooperation between the
 compiler and the linker. It's not even a feature of C the language -
 it's just the way current tool chains happen to work.

And that's where the sticky part lies. Current toolchains work in this,
arguably suboptimal, way partly because of historical baggage, but more
because doing otherwise would make the toolchain incompatible with
other existing toolchains and systems. The current divide between
compiler and linker is actually IMO not in the best place it could be,
as it hampers a lot of what, arguably, should be the compiler's job, not
the linker's. Nevertheless, changing this means you become incompatible
with much of the ecosystem and become a walled garden -- like Java (JNI
was an afterthought, and requires a very specific setup to even work --
there's definitely no way to link Java objects with OS-level object
files without jumping through lots of hoops with lots of caveats). I
just don't see this ever happening, especially not for something that,
in the big picture, really isn't *that* big of a deal. After all, C
coders have gotten used to working with far more dangerous things in C
than merely mismatched prototypes; it would take a LOT more than that
for people to accept changing the way things work.


T

-- 
Skill without imagination is craftsmanship and gives us many useful objects
such as wickerwork picnic baskets.  Imagination without skill gives us modern
art. -- Tom Stoppard

Feb 05 2016

Chris Wright <dhasenan gmail.com> writes:

On Fri, 05 Feb 2016 10:04:01 -0800, H. S. Teoh via Digitalmars-d wrote:

 On Fri, Feb 05, 2016 at 07:31:34AM +0000, tsbockman via Digitalmars-d
 wrote:
 On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via Digitalmars-d
wrote:
On Friday, 5 February 2016 at 03:46:37 UTC, Chris Wright wrote:
On Fri, 05 Feb 2016 01:10:53 +0000, tsbockman wrote:

What information, specifically, is the compiler missing?

The compiler already computes the name and type signature of each
function. As far as I can see, all that is necessary is to:

1) Insert that information (together with what file and line number
it came from) into a big list in a temporary file.
2) After all modules have been compiled, go back and sort the list by
function name.

This would make compilation of large projects excruciatingly slow.

 
 It's a small fraction of the total data being handled by the compiler
 (smaller than the source code), and the list could probably be directly
 generated in a partially sorted state. Little-to-no random access to
 the list is required at any point in the process. It does not ever need
 to all be in RAM at the same time.
 
 I can see it may cost more than it's actually worth, but where does the
 "excruciatingly slow" part come from?

 
 OK, probably I'm misunderstanding something here. :-P

I think you're talking about maintaining an in-memory, modifiable data 
structure, doing one insert per operation and one point query per use. 
That's useful for incremental compilation, but it's going to be pretty 
slow.

tsbockman is thinking of a single pass at link time that checks 
everything at once. You append an entry to a list for each prototype and 
definition, then later sort all those lists together by name. Error on 
duplicate names with mismatched signatures.

This is faster for fresh builds than it is for incremental compilation -- 
tsbockman mentioned a brief benchmark, and that cost would crop up on 
every build, even if you'd only changed one line of code. (Granted, that 
example was pretty huge.) But this might typically be faster than a bunch 
of point queries even with incremental compilation.

Anyway, that's why I'm thinking most people who used such a feature would 
turn it on in their continuous integration server or as a presubmit step 
rather than every build.

 The problem is, the linker knows nothing about the language.

We're only talking about a linker because we need to run this tool after 
compiling all your files, and it has to know what input files you're 
putting into the linker.

So this "linker" is really just a shell script that invokes our checker 
and then calls the system linker.

Feb 05 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 20:35:16 UTC, Chris Wright wrote:
 On Fri, 05 Feb 2016 10:04:01 -0800, H. S. Teoh via 
 Digitalmars-d wrote:

 On Fri, Feb 05, 2016 at 07:31:34AM +0000, tsbockman via 
 Digitalmars-d wrote:
 On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:

 OK, probably I'm misunderstanding something here. :-P

 I think you're talking about maintaining an in-memory, 
 modifiable data structure, doing one insert per operation and 
 one point query per use. That's useful for incremental 
 compilation, but it's going to be pretty slow.

 tsbockman is thinking of a single pass at link time that checks 
 everything at once. You append an entry to a list for each 
 prototype and definition, then later sort all those lists 
 together by name. Error on duplicate names with mismatched 
 signatures.

Yes.

 This is faster for fresh builds than it is for incremental 
 compilation -- tsbockman mentioned a brief benchmark, and that 
 cost would crop up on every build, even if you'd only changed 
 one line of code. (Granted, that example was pretty huge.) But 
 this might typically be faster than a bunch of point queries 
 even with incremental compilation.

 Anyway, that's why I'm thinking most people who used such a 
 feature would turn it on in their continuous integration server 
 or as a presubmit step rather than every build.

It doesn't necessarily have to be slow when you only changed one 
line:

* The list from the previous compilation could be re-used to 
speed things up considerably, although retaining it would cost 
some disk space.

* If that's still too expensive, just don't cross-check files 
that aren't being recompiled. The check will be less useful on 
incremental builds, but not *useless*. The CI server can still do 
the full check (using the compiler), as you suggest.

 The problem is, the linker knows nothing about the language.

 We're only talking about a linker because we need to run this 
 tool after compiling all your files, and it has to know what 
 input files you're putting into the linker.

 So this "linker" is really just a shell script that invokes our 
 checker and then calls the system linker.

Yes. (Or, it's the compiler with a special option set, which then 
calls the linker after it finishes its global pre-link tasks.)

Feb 05 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 07:05:06 UTC, H. S. Teoh wrote:
 On Fri, Feb 05, 2016 at 04:02:41AM +0000, tsbockman via 
 Digitalmars-d wrote:
 1) Insert that information (together with what file and line 
 number it
 came from) into a big list in a temporary file.
 2) After all modules have been compiled, go back and sort the 
 list by
 function name.

 This would make compilation of large projects excruciatingly 
 slow.

I did some quick tests on my system, and even with 100,000,000 
names (more names than there are lines of code in the Linux 
kernel...) this can be done in less than three minutes. Smaller 
projects take seconds or less.

I suspect there is a major disconnect between what I meant, and 
what you think I meant.

Feb 05 2016

Ola Fosheim Grøstad writes:

On Friday, 5 February 2016 at 01:10:53 UTC, tsbockman wrote:
 All along I have been saying this is something that *compilers* 
 should warn about. As far as I can recall, I never suggested 
 using linters, sanitizers, changing the C standard - or even 
 compiler plugins.

Well, compilers "should" only implement the standard, then they 
"may" add extra static analysis.

The direction C and C++ take is that increasing compilation 
times by doing extra static analysis on every build isn't 
desirable. Therefore compilers should focus on what is necessary 
for code gen and optimization and sanitizers should focus on 
correctness.

This is different from Rust, which does sanitization as part of 
compilation, but that makes the compiler more complicated and/or 
much _slower_.

 (I did suggest the linker as an alternative, but you all have 
 already explained why that can't work for C.)

It can work if you compile all source files with the same 
compiler. That has historically not been the case, as commercial 
libraries would be compiled with other compilers or be 
handwritten assembly.

C compilers that do Whole Program Analysis have dedicated linkers 
that should be able to do extended type checking if the IR used 
in the object file provides typing info. I don't know if Clang or 
GCC does emit typing info though, but they _could_. Yes.

Feb 05 2016

Ola Fosheim Grøstad writes:

Let me add to this that the superior approach is to compile to an 
intermediate high-level format that retains type information. I 
guess this is where Rust is heading.

It just isn't possible with C semantics to make a reasonable 
version of that, since the language itself is 90% unsafe and just 
a small step up from assembly (for good and bad).

Feb 05 2016

anonymous <anonymous example.com> writes:

On 04.02.2016 23:57, tsbockman wrote:
 http://www.underhanded-c.org/#winner

 Actually, I'm surprised that this works even in C - I would have
 expected at least a compiler (or linker?) warning; this seems like it
 should be easy to detect automatically.

You can do the same thing in D, using extern(C) to get no mangling:

main.d:
----
alias float_t = double;
extern(C) float_t deref(float_t* a);
void main()
{
     import std.stdio: writeln;
     float_t d = 1.23;
     writeln(deref(&d)); /* prints "1.01856e-314" */
}
----

deref.d:
----
alias float_t = float;
extern(C) float_t deref(float_t* a) {return *a;}
----

Command to build and run:
----
dmd main.d deref.d && ./main
----

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Thursday, 4 February 2016 at 23:40:13 UTC, anonymous wrote:
 You can do the same thing in D, using extern(C) to get no 
 mangling:

 main.d:
 ----
 alias float_t = double;
 extern(C) float_t deref(float_t* a);
 void main()
 {
     import std.stdio: writeln;
     float_t d = 1.23;
     writeln(deref(&d)); /* prints "1.01856e-314" */
 }
 ----

 deref.d:
 ----
 alias float_t = float;
 extern(C) float_t deref(float_t* a) {return *a;}
 ----

 Command to build and run:
 ----
 dmd main.d deref.d && ./main
 ----

You can do the same thing in D if you try, but it's not natural 
at all to use `extern(C)` for *internal* linkage of an all-D 
program like that.

Any competent reviewer would certainly question why you were 
using `extern(C)`; this scores much lower in "underhanded-ness" 
than the original C program.

Even so, I think that qualifies as a compiler bug or a hole in 
the D spec.

Feb 04 2016

anonymous <anonymous example.com> writes:

On 05.02.2016 00:47, tsbockman wrote:
 You can do the same thing in D if you try, but it's not natural at all
 to use `extern(C)` for *internal* linkage of an all-D program like that.

 Any competent reviewer would certainly question why you were using
 `extern(C)`; this scores much lower in "underhanded-ness" than the
 original C program.

We do have a lot of bindings to C libraries, though. When there's a 
wrong alias in one of them, you have the same scenario.

 Even so, I think that qualifies as a compiler bug or a hole in the D spec.

Can anything be done about it? The compiler simply has no way to verify 
declarations, has it?

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Thursday, 4 February 2016 at 23:51:57 UTC, anonymous wrote:
 We do have a lot of bindings to C libraries, though. When 
 there's a wrong alias in one of them, you have the same 
 scenario.

 On 05.02.2016 00:47, tsbockman wrote:
 Even so, I think that qualifies as a compiler bug or a hole in 
 the D spec.

 Can anything be done about it? The compiler simply has no way 
 to verify declarations, has it?

The compiler cannot (in the general case) verify that `extern(C)` 
declarations are *correct*. What it could do, though, is verify 
that they are *consistent*.

If the same `extern(C)` symbol is declared in multiple places in 
the D source code for a program, the compiler should issue at 
least a warning if the D signatures don't agree with each other.

Feb 04 2016

Ola Fosheim Grøstad writes:

On Friday, 5 February 2016 at 00:03:20 UTC, tsbockman wrote:
 If the same `extern(C)` symbol is declared multiple places in 
 the D source code for a program, the compiler should issue at 
 least a warning if the D signatures don't agree with each other.

I guess D could do it, although this is a rather unlikely source 
for bugs.

C cannot do it. It would be annoying as declarations are file 
local.

C doesn't really build programs, it builds object files that are 
linked into a program.

It makes perfect sense for one compilation unit to type a 
parameter as a pointer to float and another unit to type the same 
parameter as a simd-array of floats. The underlying code could be 
machine language. And in machine language there are no types (on 
current CPUs), only bit patterns. So you can have multiple 
reasonable interpretations of the same machine language entry.

A type is a constraint, but it isn't a property of the actual 
bits, it is a language specific interpretation.

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 00:12:07 UTC, Ola Fosheim Grøstad 
wrote:
 It makes perfect sense for one compilation unit to type a 
 parameter pointer to float  and another unit to type the same 
 parameter as a simd-array of floats. The underlying code could 
 be machine language. And in machine language there are no types 
 (on current CPUs), only bit patterns. So you can have multiple 
 reasonable interpretations of the same machine language entry.

 A type is a constraint, but it isn't a property of the actual 
 bits, it is a language specific interpretation.

Aliasing types like that can be useful sometimes, but only within 
certain limits. In particular, the size (with alignment padding) 
of the types in question must match, otherwise you will corrupt 
the stack.

It is often useful to cast from one pointer type to another, but 
that is why C has void* and explicit casts - so that one may 
document that the reinterpretation is intentional.

Feb 04 2016

Daniel Murphy <yebbliesnospam gmail.com> writes:

On 5/02/2016 11:03 AM, tsbockman wrote:
 The compiler cannot (in the general case) verify that `extern(C)`
 declarations are *correct*. What it could do, though, is verify that
 they are *consistent*.

 If the same `extern(C)` symbol is declared multiple places in the D
 source code for a program, the compiler should issue at least a warning
 if the D signatures don't agree with each other.

Currently D allows overloading extern(C) declarations, see
https://issues.dlang.org/show_bug.cgi?id=15217

Checking for invalid overloads with non-D linkage is covered here:
https://issues.dlang.org/show_bug.cgi?id=2789

But neither of these cover overloads that aren't simultaneously visible.
15217 shows us that this lack of checking, when combined with D's 
abundant binary-compatible-but-distinct types, is somewhat useful.

Apart from some scary ABI hacks there is nothing really stopping us from 
enforcing that all non-D functions in all modules included in a single 
compilation have distinct symbol names or (at least binary-compatible) 
matching D parameters.

Feb 05 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 10:49:50 UTC, Daniel Murphy wrote:
 Currently D allows overloading extern(C) declarations, see
 https://issues.dlang.org/show_bug.cgi?id=15217

 Checking for invalid overloads with non-D linkage is covered 
 here:
 https://issues.dlang.org/show_bug.cgi?id=2789

 But neither of these cover overloads that aren't simultaneously 
 visible.
 15217 shows us that this lack of checking, when combined with 
 D's abundant binary-compatible-but-distinct types, is somewhat 
 useful.

 Apart from some scary ABI hacks there is nothing really 
 stopping us from enforcing that all non-D functions in all 
 modules included in a single compilation have distinct symbol 
 names or (at least binary-compatible) matching D parameters.

I think it makes sense (when actually linking to C) to allow 
stuff like druntime's creative use of overloads. The signatures 
of the two bsd_signal() overloads are compatible (from C's 
perspective), so why not?

However, multiple `extern(C)` overloads that differ in the number 
or size of arguments should trigger a warning. Signed versus 
unsigned or even int versus floating point is more of a gray area.

Overloads with conflicting pointer types should definitely be 
allowed, but ideally the compiler would force them to be marked 
@system or @trusted, since there is an implied unsafe cast in 
there somewhere.

Feb 05 2016

Daniel Murphy <yebbliesnospam gmail.com> writes:

On 5/02/2016 10:07 PM, tsbockman wrote:
 I think it makes sense (when actually linking to C) to allow stuff like
 druntime's creative use of overloads. The signatures of the two
 bsd_signal() overloads are compatible (from C's perspective), so why not?

 However, multiple `extern(C)` overloads that differ in the number or
 size of arguments should trigger a warning. Signed versus unsigned or
 even int versus floating point is more of a gray area.

That's what I meant by binary compatible.

 Overloads with conflicting pointer types should definitely be allowed,
 but ideally the compiler would force them to be marked @system or
 @trusted, since there is an implied unsafe cast in there somewhere.

Safety on C functions is always going to need to be hand verified; the 
presence of overloads doesn't change that.  Conflicting pointer types 
are pretty much the same as a function taking void*: all the unsafe 
stuff is on the other side and invisible to the D compiler.

Feb 05 2016

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Thu, Feb 04, 2016 at 11:47:53PM +0000, tsbockman via Digitalmars-d wrote:
[...]
 You can do the same thing in D if you try, but it's not natural at all
 to use `extern(C)` for *internal* linkage of an all-D program like
 that.
 
 Any competent reviewer would certainly question why you were using
 `extern(C)`; this scores much lower in "underhanded-ness" than the
 original C program.
 
 Even so, I think that qualifies as a compiler bug or a hole in the D
 spec.

Nah... while D, by default, tries to be type-safe and prevent guffaws
like the above, it *is* also a systems programming language (or at
least, that's one of the stated goals), so it does allow you to go under
the hood to do things that you normally aren't allowed to do.

Linking to foreign languages is a use case for allowing extern(C)
function names: if you know the mangling scheme of the target language,
you can declare the mangled name under extern(C) and that will allow D
code to call functions written in the target language directly.
Otherwise you'd have to change the compiler (and wait for the next
release, etc.) before you could do that.


T

-- 
Do not reason with the unreasonable; you lose by definition.

Feb 04 2016

Chris Wright <dhasenan gmail.com> writes:

On Thu, 04 Feb 2016 15:59:06 -0800, H. S. Teoh via Digitalmars-d wrote:

 Nah... while D, by default, tries to be type-safe and prevent guffaws
 like the above, it *is* also a systems programming language (or at
 least, that's one of the stated goals), so it does allow you to go under
 the hood to do things that you normally aren't allowed to do.

Which suggests a check of this sort should be a warning rather than an 
error, or perhaps that a pragma or attribute could be offered to ignore 
it.

Systems languages let you go into "Here Be Dragons" territory, but it 
would be nice if they still pointed out the signs to you.

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 00:07:45 UTC, Chris Wright wrote:
 Which suggests a check of this sort should be a warning rather 
 than an error, or perhaps that a pragma or attribute could be 
 offered to ignore it.

 Systems languages let you go into "Here Be Dragons" territory, 
 but it would be nice if they still pointed out the signs to you.

Yes.

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Thursday, 4 February 2016 at 23:59:06 UTC, H. S. Teoh wrote:
 On Thu, Feb 04, 2016 at 11:47:53PM +0000, tsbockman via 
 Digitalmars-d wrote: [...]
 Even so, I think that qualifies as a compiler bug or a hole in 
 the D spec.

 Nah... while D, by default, tries to be type-safe and prevent 
 guffaws like the above, it *is* also a systems programming 
 language (or at least, that's one of the stated goals), so it 
 does allow you to go under the hood to do things that you 
 normally aren't allowed to do.

 Linking to foreign languages is a use case for allowing 
 extern(C) function names: if you know the mangling scheme of 
 the target language, you can declare the mangled name under 
 extern(C) and that will allow D code to call functions written 
 in the target language directly. Otherwise you'd have to change 
 the compiler (and wait for the next release, etc.) before you 
 could do that.


 T

I'm not saying that `extern(C)` is bad in general; I understand 
why it's necessary.

I'm saying that anonymous' example 
(http://forum.dlang.org/post/n90ngu$1r6v$1@digitalmars.com) 
showcases a hole in the spec, because in it the D compiler has 
access to the full source code of the function being linked to, 
and doesn't bother to verify that its signature in main.d is 
compatible with the definition in deref.d.

If the D compiler does *not* have access to the function's 
definition, then obviously it cannot perform this verification.

Feb 04 2016

Adam D. Ruppe <destructionator gmail.com> writes:

On Thursday, 4 February 2016 at 22:57:00 UTC, tsbockman wrote:
 The first place entry is particularly ridiculous; is there any 
 modern language that would make it so easy to commit such an 
 awful "mistake"?

D allows that. This is why I recommend putting `static 
assert(foo.sizeof == expectation);` in code that interfaces with 
external things, like C code, or D .di stuff.
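
A minimal sketch of that kind of guard, assuming a hypothetical binding struct named `Sample` whose C counterpart is believed to hold two 32-bit floats (the name and the expected numbers are illustrative, not taken from any real library):

```d
// Hypothetical D declaration mirroring a struct defined on the C side.
struct Sample
{
    float a;
    float b;
}

// Assumed layout of the C struct: two 4-byte floats, no padding.
// If someone edits the D declaration (say, float -> double), the
// build fails here instead of corrupting memory at run time.
static assert(Sample.sizeof == 8);
static assert(Sample.b.offsetof == 4);

void main()
{
}
```

The check only compares the D declaration against the numbers written in the assert, so it still depends on someone reading the expected size off the C side correctly.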

#include <math.h> /* sqrt */

that line is an interesting one too: the trick is depending on 
namespace pollution by the include. In D, you might write `import 
core.stdc.math : sqrt;` and make that misleading comment part of 
the code.... though then you could perhaps exploit that module 
bug (314?).

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 01:14:05 UTC, Adam D. Ruppe wrote:
 D allows that. This is why I recommend putting `static 
 assert(foo.sizeof == expectation);` in code that interfaces 
 with external things, like C code, or D .di stuff.

 #include <math.h> /* sqrt */

D *doesn't* allow that though - at least, not in a monolithic, 
idiomatic D program: there wouldn't be any duplicate declaration 
of `spectral_contrast()` to mess up.

Yes, you can force the matter using `extern(C)` like anonymous 
demonstrated earlier - but using `extern(C)` for internal linkage 
in an all-D program would certainly attract scrutiny from 
reviewers; it would score poorly on the "underhanded-ness" test.

As to the ".di" stuff - I've not used them. Care to educate me? 
How can they cause similar problems?

 that line is an interesting one too: the trick is depending on 
 namespace pollution by the include. In D, you might write 
 `import core.stdc.math : sqrt;` and make that misleading 
 comment part of the code.... though then you could perhaps 
 exploit that module bug (314?).

314 definitely has potential. Should we start an "Underhanded D" 
contest? Sounds like bad marketing, but a lot of fun :-P

Feb 04 2016

Adam D. Ruppe <destructionator gmail.com> writes:

On Friday, 5 February 2016 at 01:33:14 UTC, tsbockman wrote:
 As to the ".di" stuff - I've not used them. Care to educate me? 
 How can they cause similar problems?

Well, technically, a .di file is just a .d file renamed, but it 
tends to have the bodies stripped out. Separate compilation is a 
supported feature of D.

The way you'd do it is something like this:

struct Foo {
    float a;
    float b;
}

void bar(Foo* f) {
    f.b = whatever;
}


Then compile it with -lib and make a "header" file manually:

struct Foo {
    double a;
    double b;
}
void bar(Foo*);


You can now create D modules that import this and link against 
the compiled library. Very similar to C's model...

But I redefined Foo! The name mangling won't catch this. bar will 
be mangled to take `Foo` as an argument and the linker will catch 
if we change that, but it doesn't know what Foo actually is.

By changing that, we introduce the problem.
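
The effect can be simulated in a single file, as a rough sketch. The two structs have different names here (`LibFoo`, `HeaderFoo`, both hypothetical) only because one file cannot hold two definitions of `Foo`; the pointer cast stands in for what the linker silently permits when the library and the hand-written header disagree.

```d
// What the library was actually compiled with.
struct LibFoo
{
    float a;
    float b;
}

// What the hand-written "header" claims Foo looks like.
struct HeaderFoo
{
    double a;
    double b;
}

// Library code: writes a 4-byte float at offset 4.
void bar(LibFoo* f)
{
    f.b = 1.0f;
}

// Client code that believes the header.
HeaderFoo callThroughStaleHeader()
{
    HeaderFoo foo = HeaderFoo(0.0, 0.0);
    // The cast models the unchecked link between the two views of Foo.
    bar(cast(LibFoo*) &foo);
    return foo;
}

void main()
{
    auto foo = callThroughStaleHeader();
    // The write landed inside the bytes of foo.a; foo.b was never touched.
    assert(foo.b == 0.0);
    assert(foo.a != 0.0);
}
```

The client asked for `b` to be set, yet `b` is unchanged and `a` has been altered instead, with no diagnostic from the compiler or the linker.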

 314 definitely has potential. Should we start an "Underhanded 
 D" contest? Sounds like bad marketing, but a lot of fun :-P

it might be :)

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 04:25:09 UTC, Adam D. Ruppe wrote:
 On Friday, 5 February 2016 at 01:33:14 UTC, tsbockman wrote:
 As to the ".di" stuff - I've not used them. Care to educate 
 me? How can they cause similar problems?

 Well, technically, a .di file is just a .d file renamed, but it 
 tends to have the bodies stripped out. Separate compilation is 
 a supported feature of D.

 The way you'd do it is something like this:

 struct Foo {
    float a;
    float b;
 }

 void bar(Foo* f) {
    f.b = whatever;
 }


 Then compile it with -lib and make a "header" file manually:

 struct Foo {
    double a;
    double b;
 }
 void bar(Foo*);


 You can now create D modules that import this and link against 
 the compiled library. Very similar to C's model...

 But I redefined Foo! The name mangling won't catch this. bar 
 will be mangled to take `Foo` as an argument and the linker 
 will catch if we change that, but it doesn't know what Foo 
 actually is.

 By changing that, we introduce the problem.

 314 definitely has potential. Should we start an "Underhanded 
 D" contest? Sounds like bad marketing, but a lot of fun :-P

 it might be :)

Thanks for the explanation. That does sound basically the same as 
the C issue.

Since .di files are normally generated automatically, this seems 
like an easily solvable problem:

1) When compiling a library and its attendant .di file(s), 
generate a unique version identifier (such as a UUID or a hash of 
the completed binary) and append it to both the library and each 
.di file.

2) Whenever someone tries to link against the library, verify 
that the version ID matches. If it does not, issue a prominent 
warning.

Problem solved? Or is this harder than it looks?

(Of course there are various details to consider, such as how to 
efficiently share one set of .di files across many 
platforms/compiler settings; this is just a rough sketch.)
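
A rough sketch of the two steps, assuming the "hash of the completed binary" variant. The function names `versionId` and `stampsMatch` are hypothetical, and the byte arrays stand in for real library files; no existing compiler or linker feature is being described.

```d
import std.digest : toHexString;
import std.digest.sha : sha256Of;
import std.stdio : writeln;

// Step 1 (hypothetical): derive an ID from the bytes of the compiled
// library; the same string would be stamped into each generated .di file.
string versionId(const(ubyte)[] libraryBytes)
{
    auto hex = toHexString(sha256Of(libraryBytes));
    return hex[].idup;
}

// Step 2 (hypothetical): at link time, compare the ID recorded in the
// .di file with the ID of the library actually being linked.
bool stampsMatch(string headerId, string libraryId)
{
    return headerId == libraryId;
}

void main()
{
    immutable ubyte[] oldLib = [1, 2, 3]; // stand-in for the library the .di came from
    immutable ubyte[] newLib = [1, 2, 4]; // stand-in for a rebuilt library

    auto headerId = versionId(oldLib);
    if (!stampsMatch(headerId, versionId(newLib)))
        writeln("warning: .di file does not match the library being linked");
}
```

Because the ID here covers every byte of the binary, any rebuild changes it, including rebuilds that leave the interface untouched.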

Feb 04 2016

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Fri, Feb 05, 2016 at 04:39:13AM +0000, tsbockman via Digitalmars-d wrote:
[...]
 Thanks for the explanation. That does sound basically the same as the
 C issue.
 
 Since .di files are normally generated automatically, this seems like
 an easily solvable problem:
 
 1) When compiling a library and its attendant .di file(s), generate a
 unique version identifier (such as a UUID or a hash of the completed
 binary) and append it to both the library and each .di file.
 
 2) Whenever someone tries to link against the library, verify that the
 version ID matches. If it does not, issue a prominent warning.

[...]

This would break shared library upgrades that do not change the ABI.

Plus, it doesn't fix wrong linkage at runtime, because the dynamic
linker is part of the OS and the D compiler has no control over what it
does beyond the standard symbol matching and relocation mechanisms. If
you compile against libfoo, but at runtime the user happens to have a
stale, ABI-incompatible version of libfoo hanging around that gets
picked up by the dynamic linker, you'll have the same problem.


T

-- 
VI = Visual Irritation

Feb 04 2016

tsbockman <thomas.bockman gmail.com> writes:

On Friday, 5 February 2016 at 07:15:56 UTC, H. S. Teoh wrote:
 This would break shared library upgrades that do not change the 
 ABI.

 Plus, it doesn't fix wrong linkage at runtime, because the 
 dynamic linker is part of the OS and the D compiler has no 
 control over what it does beyond the standard symbol matching 
 and relocation mechanisms. If you compile against libfoo, but 
 at runtime the user happens to have a stale, ABI-incompatible 
 version of libfoo hanging around that gets picked up by the 
 dynamic linker, you'll have the same problem.

I should have clarified that I was considering static libraries, 
only. (I thought D's dynamic library support was kind of broken 
right at the moment, anyway?)

Dynamic libraries are definitely a harder problem. I think useful 
automated protection against bad .di files could be developed for 
dynamic libraries as well, but the scheme wouldn't be anywhere 
near as simple and it might require the maintainer to actually 
follow SemVer to be useful.

Feb 04 2016
