www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Ideas from Clang

reply bearophile <bearophileHUGS lycos.com> writes:
A belated comment on the GoingNative 2012 talk "Defending C++ fom Murphy's
Million Monkeys" by Chandler Carruth:
http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Clang-Defending-C-from-Murphy-s-Million-Monkeys

The slides:
http://ecn.channel9.msdn.com/events/GoingNative12/GN12Clang.pdf

This talk shows some ways used by Clang to help the programmer spot or fix some
mistakes in C++ code (according to Chandler, all bugs shown in the talk are
real bugs found in production software). I think some of those ideas are
interesting or useful for D too.

-----------------------------

Slides page 31 - 32, 14.48 in the video:

I have adapted the Clang example to this D code:


class BaseType {}
static int basetype;
class DerivedType : Basetype {}
void main() {}


The latest DMD gives this error:
test.d(3): Error: undefined identifier Basetype, did you mean variable basetype?


The idea here is to restrict the list of names to search for similar ones only
among the class names, because here "basetype" is an int so it can't be what
the programmer meant. There are few other groups of names for other situations.

This also reduces the work of the routines that look for similar names, because
they need to work on a shorter list of possibilities.

-----------------------------

Slides page 47 - 48, 21.24 in the video:

C++ code:

static const long long DiskCacheSize = 8 << 30; // 8 Gigs

Clang gives:

% clang++ -std=c++11 -fsyntax-only overflow.cpp
overflow.cpp:1:42: warning: signed shift result (0x200000000) requires 35 
bits to represent, but 'int' only has 32 bits [-Wshift-overflow]
static const long long DiskCacheSize = 8 << 30; // 8 Gigs
                                       ~ ^  ~~

In D/DMD this compiles with no errors or warnings:

static const long diskCacheSize = 8 << 30;
static assert(diskCacheSize == 0);
void main() {}


This kind of overflow errors are important and common enough, so I think D too
has to spot them, as Clang.


I have done a little test, and I've seen that with the latest Clang this C++
code gives no warnings:

const static unsigned int x = -1;

-----------------------------

Slides 57 - 58, 28.11 in the video:

C++ code:

void test(bool b, double x, double y) {
  if (b || x < y && x > 0) {
    // ...
  }
}


Clang gives:


% clang++ -std=c++11 -fsyntax-only -Wparentheses parentheses3.cpp
parentheses3.cpp:2:18: warning: '&&' within '||' [-Wlogical-op-
parentheses]
  if (b || x < y && x > 0) {
        ~~ ~~~~~~^~~~~~~~
parentheses3.cpp:2:18: note: place parentheses around the '&&' expression
to silence this warning
  if (b || x < y && x > 0) {
                 ^
           (             )
1 warning generated.


DMD compiles that code with no warnings. I think a similar warning is useful in
D/DMD too.

-----------------------------

Slides 59 - 60, 29.17 in the video:

int test(bool b, int x, int y) {
  return 42 + b ? x : y;
}


Clang gives:


% clang++ -std=c++11 -fsyntax-only -Wparentheses parentheses4.cpp
parentheses4.cpp:2:17: warning: operator '?:' has lower precedence than 
'+'; '+' will be evaluated first [-Wparentheses]
  return 42 + b ? x : y;
         ~~~~~~ ^
parentheses4.cpp:2:17: note: place parentheses around the '+' expression 
to silence this warning
  return 42 + b ? x : y;
                ^
         (     )
parentheses4.cpp:2:17: note: place parentheses around the '?:' expression
to evaluate it first
  return 42 + b ? x : y;
                ^
              (        )
1 warning generated.


DMD compiles that code with no warnings.
(Maybe here Don thinks that accepting int+bool in the language rules is bad in
the first place.)

-----------------------------

A little comment on slides 63 - 64: in D when I switch on an enumeration, I
always use a final switch. I think use cases for nonfinal switches on enums are
rare in D code.

-----------------------------

Slides 65 - 66, 31.57 in the video:


constexpr int arr_size = 42;
constexpr int N = 44;
void f(int);
int test() {
  int arr[arr_size];
  // ...
  f(arr[N]);
  // ...
  if (N < arr_size) return arr[N];
  return 0;
}


Clang gives:

% clang++ -std=c++11 -fsyntax-only deadcode.cpp
deadcode.cpp:7:5: warning: array index 44 is past the end of the array
(which contains 42 elements) [-Warray-bounds]
  f(arr[N]);
    ^   ~
deadcode.cpp:5:3: note: array 'arr' declared here
  int arr[arr_size];
  ^
1 warning generated.


Similar D code:

enum int arr_size = 42;
enum int N = 44;
void foo(int) {}
int test() {
    int[arr_size] arr;
    foo(arr[N]);
    if (N < arr_size)
        return arr[N];
    return 0;
}
void main() {}


DMD is able to catch the array overflow bugs, but it shows three errors:

test.d(6): Error: array index 44 is out of bounds arr[0 .. 42]
test.d(6): Error: array index 44 is out of bounds arr[0 .. 42]
test.d(8): Error: array index 44 is out of bounds arr[0 .. 42]


Clang doesn't give an error at the "return arr[N];" line because it's inside
the "if(N<arr_size)" branch. This is nice (despite this code looks silly, it's
real code reduced from a meaningful example, the enums come from conditional
compilation, etc).

As a related example, in D I have written code similar to:


void foo(ulong x) {
    if (x < 5_000)
        int y = x;
}
void main() {}


Here DMD asks for a cast(int) or to!int:
test.d(3): Error: cannot implicitly convert expression (x) of type ulong to int

But a bit smarter compiler/language is able to tell there's no need for casts
or conversions there.

-----------------------------

The lock annotations in slides 70 - 74 are nice.

-----------------------------

Regarding slides 76 - 79, is it useful to catch simple bugs like this
statically?


void main() {
    auto a = new int[10];
    a[10] = 2;
}

-----------------------------

Slides 80 - 81, 41.46 in the video:

Clang is able to catch this bug at runtime in C++ code:


#include <iostream>
int test(int n) {
  return 42 << n;
}
int main() {
  std::cerr << "Bogus: " << test(42)
            << "\n";
}



% clang++ -g -std=c++0x undef.cpp -o undef -fcatch-undefined-behavior
% gdb -x =(echo -e "r\nbt") --args ./undef
...
Program received signal SIGILL, Illegal instruction.
0x00000000004007fe in test (n=42) at undef.cpp:3
3         return 42 << n;
#0  0x00000000004007fe in test (n=42) at undef.cpp:3
#1  0x000000000040084b in main () at undef.cpp:6


Here Chandler says that this runtime instrumentation is light and fast. This is
good compiler sugar for future D compilers (maybe for LDC2).

-----------------------------

Taken one at a time those Clang features are small, but all at the same time
are impressive. Hopefully Clang will poke GCC developers to improve the
diagnostic features of their compiler.

I think some of those ideas are good for D too, so I'd like to open few D
enhancement requests based on this post, but I have to think more on them.
Suggestions and comments are welcome.

Bye,
bearophile
Feb 19 2012
next sibling parent reply deadalnix <deadalnix gmail.com> writes:
Le 19/02/2012 21:19, bearophile a écrit :
 A belated comment on the GoingNative 2012 talk "Defending C++ fom Murphy's
Million Monkeys" by Chandler Carruth:
 http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Clang-Defending-C-from-Murphy-s-Million-Monkeys

 The slides:
 http://ecn.channel9.msdn.com/events/GoingNative12/GN12Clang.pdf

 This talk shows some ways used by Clang to help the programmer spot or fix
some mistakes in C++ code (according to Chandler, all bugs shown in the talk
are real bugs found in production software). I think some of those ideas are
interesting or useful for D too.

 -----------------------------

 Slides page 31 - 32, 14.48 in the video:

 I have adapted the Clang example to this D code:


 class BaseType {}
 static int basetype;
 class DerivedType : Basetype {}
 void main() {}


 The latest DMD gives this error:
 test.d(3): Error: undefined identifier Basetype, did you mean variable
basetype?


 The idea here is to restrict the list of names to search for similar ones only
among the class names, because here "basetype" is an int so it can't be what
the programmer meant. There are few other groups of names for other situations.

 This also reduces the work of the routines that look for similar names,
because they need to work on a shorter list of possibilities.

 -----------------------------

 Slides page 47 - 48, 21.24 in the video:

 C++ code:

 static const long long DiskCacheSize = 8<<  30; // 8 Gigs

 Clang gives:

 % clang++ -std=c++11 -fsyntax-only overflow.cpp
 overflow.cpp:1:42: warning: signed shift result (0x200000000) requires 35
 bits to represent, but 'int' only has 32 bits [-Wshift-overflow]
 static const long long DiskCacheSize = 8<<  30; // 8 Gigs
                                         ~ ^  ~~

 In D/DMD this compiles with no errors or warnings:

 static const long diskCacheSize = 8<<  30;
 static assert(diskCacheSize == 0);
 void main() {}


 This kind of overflow errors are important and common enough, so I think D too
has to spot them, as Clang.


 I have done a little test, and I've seen that with the latest Clang this C++
code gives no warnings:

 const static unsigned int x = -1;

 -----------------------------

 Slides 57 - 58, 28.11 in the video:

 C++ code:

 void test(bool b, double x, double y) {
    if (b || x<  y&&  x>  0) {
      // ...
    }
 }


 Clang gives:


 % clang++ -std=c++11 -fsyntax-only -Wparentheses parentheses3.cpp
 parentheses3.cpp:2:18: warning: '&&' within '||' [-Wlogical-op-
 parentheses]
    if (b || x<  y&&  x>  0) {
          ~~ ~~~~~~^~~~~~~~
 parentheses3.cpp:2:18: note: place parentheses around the '&&' expression
 to silence this warning
    if (b || x<  y&&  x>  0) {
                   ^
             (             )
 1 warning generated.


 DMD compiles that code with no warnings. I think a similar warning is useful
in D/DMD too.

 -----------------------------

 Slides 59 - 60, 29.17 in the video:

 int test(bool b, int x, int y) {
    return 42 + b ? x : y;
 }


 Clang gives:


 % clang++ -std=c++11 -fsyntax-only -Wparentheses parentheses4.cpp
 parentheses4.cpp:2:17: warning: operator '?:' has lower precedence than
 '+'; '+' will be evaluated first [-Wparentheses]
    return 42 + b ? x : y;
           ~~~~~~ ^
 parentheses4.cpp:2:17: note: place parentheses around the '+' expression
 to silence this warning
    return 42 + b ? x : y;
                  ^
           (     )
 parentheses4.cpp:2:17: note: place parentheses around the '?:' expression
 to evaluate it first
    return 42 + b ? x : y;
                  ^
                (        )
 1 warning generated.


 DMD compiles that code with no warnings.
 (Maybe here Don thinks that accepting int+bool in the language rules is bad in
the first place.)

 -----------------------------

 A little comment on slides 63 - 64: in D when I switch on an enumeration, I
always use a final switch. I think use cases for nonfinal switches on enums are
rare in D code.

 -----------------------------

 Slides 65 - 66, 31.57 in the video:


 constexpr int arr_size = 42;
 constexpr int N = 44;
 void f(int);
 int test() {
    int arr[arr_size];
    // ...
    f(arr[N]);
    // ...
    if (N<  arr_size) return arr[N];
    return 0;
 }


 Clang gives:

 % clang++ -std=c++11 -fsyntax-only deadcode.cpp
 deadcode.cpp:7:5: warning: array index 44 is past the end of the array
 (which contains 42 elements) [-Warray-bounds]
    f(arr[N]);
      ^   ~
 deadcode.cpp:5:3: note: array 'arr' declared here
    int arr[arr_size];
    ^
 1 warning generated.


 Similar D code:

 enum int arr_size = 42;
 enum int N = 44;
 void foo(int) {}
 int test() {
      int[arr_size] arr;
      foo(arr[N]);
      if (N<  arr_size)
          return arr[N];
      return 0;
 }
 void main() {}


 DMD is able to catch the array overflow bugs, but it shows three errors:

 test.d(6): Error: array index 44 is out of bounds arr[0 .. 42]
 test.d(6): Error: array index 44 is out of bounds arr[0 .. 42]
 test.d(8): Error: array index 44 is out of bounds arr[0 .. 42]


 Clang doesn't give an error at the "return arr[N];" line because it's inside
the "if(N<arr_size)" branch. This is nice (despite this code looks silly, it's
real code reduced from a meaningful example, the enums come from conditional
compilation, etc).

 As a related example, in D I have written code similar to:


 void foo(ulong x) {
      if (x<  5_000)
          int y = x;
 }
 void main() {}


 Here DMD asks for a cast(int) or to!int:
 test.d(3): Error: cannot implicitly convert expression (x) of type ulong to int

 But a bit smarter compiler/language is able to tell there's no need for casts
or conversions there.

 -----------------------------

 The lock annotations in slides 70 - 74 are nice.

 -----------------------------

 Regarding slides 76 - 79, is it useful to catch simple bugs like this
statically?


 void main() {
      auto a = new int[10];
      a[10] = 2;
 }

 -----------------------------

 Slides 80 - 81, 41.46 in the video:

 Clang is able to catch this bug at runtime in C++ code:


 #include<iostream>
 int test(int n) {
    return 42<<  n;
 }
 int main() {
    std::cerr<<  "Bogus: "<<  test(42)
              <<  "\n";
 }



 % clang++ -g -std=c++0x undef.cpp -o undef -fcatch-undefined-behavior
 % gdb -x =(echo -e "r\nbt") --args ./undef
 ...
 Program received signal SIGILL, Illegal instruction.
 0x00000000004007fe in test (n=42) at undef.cpp:3
 3         return 42<<  n;
 #0  0x00000000004007fe in test (n=42) at undef.cpp:3
 #1  0x000000000040084b in main () at undef.cpp:6


 Here Chandler says that this runtime instrumentation is light and fast. This
is good compiler sugar for future D compilers (maybe for LDC2).

 -----------------------------

 Taken one at a time those Clang features are small, but all at the same time
are impressive. Hopefully Clang will poke GCC developers to improve the
diagnostic features of their compiler.

 I think some of those ideas are good for D too, so I'd like to open few D
enhancement requests based on this post, but I have to think more on them.
Suggestions and comments are welcome.

 Bye,
 bearophile

+100 Especially for parenthesis. « If you think you can remember precendence rules beyond +-*/, you're only 99% right. Also, nobody else can remember them, so your code is now 1% buggy, and 100% unmaintainable. Use brackets. » : http://home.comcast.net/~tom_forsyth/blog.wiki.html#OffendOMatic
Feb 22 2012
parent "so" <so so.so> writes:
On Wednesday, 22 February 2012 at 14:27:18 UTC, deadalnix wrote:
 Le 19/02/2012 21:19, bearophile a écrit :
 A belated comment on the GoingNative 2012 talk "Defending C++ 
 fom Murphy's Million Monkeys" by Chandler Carruth:
 http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Clang-Defending-C-from-Murphy-s-Million-Monkeys

 The slides:
 http://ecn.channel9.msdn.com/events/GoingNative12/GN12Clang.pdf

 This talk shows some ways used by Clang to help the programmer 
 spot or fix some mistakes in C++ code (according to Chandler, 
 all bugs shown in the talk are real bugs found in production 
 software). I think some of those ideas are interesting or 
 useful for D too.

 -----------------------------

 Slides page 31 - 32, 14.48 in the video:

 I have adapted the Clang example to this D code:


 class BaseType {}
 static int basetype;
 class DerivedType : Basetype {}
 void main() {}


 The latest DMD gives this error:
 test.d(3): Error: undefined identifier Basetype, did you mean 
 variable basetype?


 The idea here is to restrict the list of names to search for 
 similar ones only among the class names, because here 
 "basetype" is an int so it can't be what the programmer meant. 
 There are few other groups of names for other situations.

 This also reduces the work of the routines that look for 
 similar names, because they need to work on a shorter list of 
 possibilities.

 -----------------------------

 Slides page 47 - 48, 21.24 in the video:

 C++ code:

 static const long long DiskCacheSize = 8<<  30; // 8 Gigs

 Clang gives:

 % clang++ -std=c++11 -fsyntax-only overflow.cpp
 overflow.cpp:1:42: warning: signed shift result (0x200000000) 
 requires 35
 bits to represent, but 'int' only has 32 bits 
 [-Wshift-overflow]
 static const long long DiskCacheSize = 8<<  30; // 8 Gigs
                                        ~ ^  ~~

 In D/DMD this compiles with no errors or warnings:

 static const long diskCacheSize = 8<<  30;
 static assert(diskCacheSize == 0);
 void main() {}


 This kind of overflow errors are important and common enough, 
 so I think D too has to spot them, as Clang.


 I have done a little test, and I've seen that with the latest 
 Clang this C++ code gives no warnings:

 const static unsigned int x = -1;

 -----------------------------

 Slides 57 - 58, 28.11 in the video:

 C++ code:

 void test(bool b, double x, double y) {
   if (b || x<  y&&  x>  0) {
     // ...
   }
 }


 Clang gives:


 % clang++ -std=c++11 -fsyntax-only -Wparentheses 
 parentheses3.cpp
 parentheses3.cpp:2:18: warning: '&&' within '||' [-Wlogical-op-
 parentheses]
   if (b || x<  y&&  x>  0) {
         ~~ ~~~~~~^~~~~~~~
 parentheses3.cpp:2:18: note: place parentheses around the '&&' 
 expression
 to silence this warning
   if (b || x<  y&&  x>  0) {
                  ^
            (             )
 1 warning generated.


 DMD compiles that code with no warnings. I think a similar 
 warning is useful in D/DMD too.

 -----------------------------

 Slides 59 - 60, 29.17 in the video:

 int test(bool b, int x, int y) {
   return 42 + b ? x : y;
 }


 Clang gives:


 % clang++ -std=c++11 -fsyntax-only -Wparentheses 
 parentheses4.cpp
 parentheses4.cpp:2:17: warning: operator '?:' has lower 
 precedence than
 '+'; '+' will be evaluated first [-Wparentheses]
   return 42 + b ? x : y;
          ~~~~~~ ^
 parentheses4.cpp:2:17: note: place parentheses around the '+' 
 expression
 to silence this warning
   return 42 + b ? x : y;
                 ^
          (     )
 parentheses4.cpp:2:17: note: place parentheses around the '?:' 
 expression
 to evaluate it first
   return 42 + b ? x : y;
                 ^
               (        )
 1 warning generated.


 DMD compiles that code with no warnings.
 (Maybe here Don thinks that accepting int+bool in the language 
 rules is bad in the first place.)

 -----------------------------

 A little comment on slides 63 - 64: in D when I switch on an 
 enumeration, I always use a final switch. I think use cases 
 for nonfinal switches on enums are rare in D code.

 -----------------------------

 Slides 65 - 66, 31.57 in the video:


 constexpr int arr_size = 42;
 constexpr int N = 44;
 void f(int);
 int test() {
   int arr[arr_size];
   // ...
   f(arr[N]);
   // ...
   if (N<  arr_size) return arr[N];
   return 0;
 }


 Clang gives:

 % clang++ -std=c++11 -fsyntax-only deadcode.cpp
 deadcode.cpp:7:5: warning: array index 44 is past the end of 
 the array
 (which contains 42 elements) [-Warray-bounds]
   f(arr[N]);
     ^   ~
 deadcode.cpp:5:3: note: array 'arr' declared here
   int arr[arr_size];
   ^
 1 warning generated.


 Similar D code:

 enum int arr_size = 42;
 enum int N = 44;
 void foo(int) {}
 int test() {
     int[arr_size] arr;
     foo(arr[N]);
     if (N<  arr_size)
         return arr[N];
     return 0;
 }
 void main() {}


 DMD is able to catch the array overflow bugs, but it shows 
 three errors:

 test.d(6): Error: array index 44 is out of bounds arr[0 .. 42]
 test.d(6): Error: array index 44 is out of bounds arr[0 .. 42]
 test.d(8): Error: array index 44 is out of bounds arr[0 .. 42]


 Clang doesn't give an error at the "return arr[N];" line 
 because it's inside the "if(N<arr_size)" branch. This is nice 
 (despite this code looks silly, it's real code reduced from a 
 meaningful example, the enums come from conditional 
 compilation, etc).

 As a related example, in D I have written code similar to:


 void foo(ulong x) {
     if (x<  5_000)
         int y = x;
 }
 void main() {}


 Here DMD asks for a cast(int) or to!int:
 test.d(3): Error: cannot implicitly convert expression (x) of 
 type ulong to int

 But a bit smarter compiler/language is able to tell there's no 
 need for casts or conversions there.

 -----------------------------

 The lock annotations in slides 70 - 74 are nice.

 -----------------------------

 Regarding slides 76 - 79, is it useful to catch simple bugs 
 like this statically?


 void main() {
     auto a = new int[10];
     a[10] = 2;
 }

 -----------------------------

 Slides 80 - 81, 41.46 in the video:

 Clang is able to catch this bug at runtime in C++ code:


 #include<iostream>
 int test(int n) {
   return 42<<  n;
 }
 int main() {
   std::cerr<<  "Bogus: "<<  test(42)
             <<  "\n";
 }



 % clang++ -g -std=c++0x undef.cpp -o undef 
 -fcatch-undefined-behavior
 % gdb -x =(echo -e "r\nbt") --args ./undef
 ...
 Program received signal SIGILL, Illegal instruction.
 0x00000000004007fe in test (n=42) at undef.cpp:3
 3         return 42<<  n;
 #0  0x00000000004007fe in test (n=42) at undef.cpp:3
 #1  0x000000000040084b in main () at undef.cpp:6


 Here Chandler says that this runtime instrumentation is light 
 and fast. This is good compiler sugar for future D compilers 
 (maybe for LDC2).

 -----------------------------

 Taken one at a time those Clang features are small, but all at 
 the same time are impressive. Hopefully Clang will poke GCC 
 developers to improve the diagnostic features of their 
 compiler.

 I think some of those ideas are good for D too, so I'd like to 
 open few D enhancement requests based on this post, but I have 
 to think more on them. Suggestions and comments are welcome.

 Bye,
 bearophile

+100 Especially for parenthesis. « If you think you can remember precendence rules beyond +-*/, you're only 99% right. Also, nobody else can remember them, so your code is now 1% buggy, and 100% unmaintainable. Use brackets. » : http://home.comcast.net/~tom_forsyth/blog.wiki.html#OffendOMatic

It took me 12 scrolls to get here. Sweet revenge :)
Feb 22 2012
prev sibling parent reply Don <nospam nospam.com> writes:
On 19.02.2012 21:19, bearophile wrote:
 A belated comment on the GoingNative 2012 talk "Defending C++ fom Murphy's
Million Monkeys" by Chandler Carruth:
 http://channel9.msdn.com/Events/GoingNative/GoingNative-2012/Clang-Defending-C-from-Murphy-s-Million-Monkeys

 The slides:
 http://ecn.channel9.msdn.com/events/GoingNative12/GN12Clang.pdf

 This talk shows some ways used by Clang to help the programmer spot or fix
some mistakes in C++ code

[snip] Half of those are things you've already created bugzilla entries for.
Mar 02 2012
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
Don:

 Half of those are things you've already created bugzilla 
 entries for.

And when possible I'll think if few other ones are worth adding to bugzilla. Adding issues to Bugzilla sometimes is not enough, especially when they are enhancement requests or changes of the D language, because they often get ignored for years (and then sometimes they get closed because "it's too much late", even if maybe it wasn't too much later at the time they were added in Bugzilla). So they need to be discussed, appreciated or refused, and advertized here. Bye, bearophile
Mar 05 2012
parent Don Clugston <dac nospam.com> writes:
On 05/03/12 20:33, bearophile wrote:
 Don:

 Half of those are things you've already created bugzilla entries for.

And when possible I'll think if few other ones are worth adding to bugzilla. Adding issues to Bugzilla sometimes is not enough, especially when they are enhancement requests or changes of the D language, because they often get ignored for years (and then sometimes they get closed because "it's too much late", even if maybe it wasn't too much later at the time they were added in Bugzilla). So they need to be discussed, appreciated or refused, and advertized here.

You're dramatically underestimating your influence, I think. 286 bugs you've reported have been resolved. That's far more than anybody else. For comparison, 139 bugs reported by Andrei have been resolved. If something hasn't been addressed, it's because of other bugs you've reported! I'm not kidding - nearly 1/3 of all currently open bugs were reported by you.
Mar 06 2012