www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - More bugs found in OS code

reply bearophile <bearophileHUGS lycos.com> writes:
The now usual article that advertises the PVS-Studio tool that shows plenty of
(depressing) bugs found in already debugged source code of widely used C/C++
open source projects:

http://software.intel.com/en-us/articles/90-errors-in-open-source-projects/

There is a Reddit discussion too, but I find it useless:
http://www.reddit.com/r/programming/comments/lxjrb/examples_of_errors_detected_in_various_opensource/


In the Reddit discussion someone is free to list in D how many of the 91 bugs:
- Are not applicable or are not normally done in D;
- Are statically caught by D/DMD;
- Are always caught at runtime by DMD in non release mode;
- Are planned/discussed to be statically avoided or statically caught;
- Are planned/discussed to be always caught at runtime by DMD in non release
mode.



Here I list and comment about some of the more interesting parts. Below I have
divided the problems in 5 groups (I have not included all the bug groups shown
in the article).

A]========================================

Example 3. Fennec Media Project project. Complex expression.

uint32 CUnBitArrayOld::DecodeValueRiceUnsigned(uint32 k) 
{
  ...
  while (!(m_pBitArray[m_nCurrentBitIndex >> 5] &
    Powers_of_Two_Reversed[m_nCurrentBitIndex++ & 31])) {}
  ...
}

The error was found through the V567 diagnostic: Undefined behavior. The
'm_nCurrentBitIndex' variable is modified while being used twice at single
sequence point. MACLib unbitarrayold.cpp 78

------------------------

Example 4. Miranda IM project. Complex expression.

short ezxml_internal_dtd(ezxml_root_t root,
  char *s, size_t len)
{
  ...
  while (*(n = ++s + strspn(s, EZXML_WS)) && *n != '>') {
  ...
}

The error was found through the V567 diagnostic: Undefined behavior. The 's'
variable is modified while being used twice between sequence points.msne zxml.c
371

------------------------

Currently DMD accepts this code:

x = x++;

Currently D is using the C semantics, this means the result of that line of
code is undefined, this means you don't know what it will do once compiled by
different D compilers or even with different optimization level on the same
compiler. This means writing such code in C (and currently in D) is _always_ a
bug (because a program with undefined semantics is often useless. There are
other sources of undefined semantics, but the less there are, the better it
is). I don't understand why C compilers don't refuse such code statically with
an error. In my opinion this is not acceptable.

Two basic solutions for D:
1) Define exactly what that code does in D. Walter has expressed few times his
desire for this solution.
2) Turn similar lines of code into compile-time errors (and maybe accept them
forever if the -d switch is used, but I am not sure this is a good idea).

The first solution is good because it makes it simpler to port C code to D, it
allows C/C++/Java programmers to use that code still. But such C code has
undefined results, so I don't see a great advantage here (it's useful still to
port Java code).

But lately I am slowly leaning toward the second solution, because even if D
defines exactly the semantics of code like this, so it gives the same result on
all D compilers:

(*(n = ++s + strspn(s, EZXML_WS)) && *n != '>')

I don't want to read similar code in D programs I have to debug or modify. Even
if it's unambiguous for the D language, it's a bit too much hard for me to
understand. Go language has chosen this second solution.

B]========================================

Curretly D2 accepts the usage of & and | among booleans:


void main() {
    bool a, b;
    auto r1 = a & b;
    static assert(is(typeof(r1) == bool));
    auto r2 = a | b;
    static assert(is(typeof(r2) == bool));
}


But I am not sure this is useful enough to justify the risks that gives.

Using booleans as integers is useful when I have to count them and in few other
situations. But I don't remember the last time I've had to use bitwise
or/bitwise and on boolean values, I think I have never had to do this. So maybe
it's better to forbid this. The advantage of forbidding this operation is that
it avoids several mistakes caused by operator precedence like (!x & y).
Comments on this are welcome.

C]========================================

Example 5. IPP Samples project. Priorities of ?: and | operations.

vm_file* vm_file_fopen(...)
{
  ...
  mds[3] = FILE_ATTRIBUTE_NORMAL |
           (islog == 0) ? 0 : FILE_FLAG_NO_BUFFERING;
  ...
}

The error was found through the V502 diagnostic: Perhaps the '?:' operator
works in a different way than it was expected. The '?:' operator has a lower
priority than the '|' operator. vm vm_file_win.c 393

------------------------

Example 6. Newton Game Dynamics project. Priorities of ?: and * operations.

dgInt32 CalculateConvexShapeIntersection (...)
{
  ...
  den = dgFloat32 (1.0e-24f) *
        (den > dgFloat32 (0.0f)) ?
          dgFloat32 (1.0f) : dgFloat32 (-1.0f);
  ...
}

The error was found through the V502 diagnostic: Perhaps the '?:' operator
works in a different way than it was expected. The '?:' operator has a lower
priority than the '*' operator. physics dgminkowskiconv.cpp 1061

D]========================================

Currently this is not caught by D, it prints "12":

import std.stdio;
void main() {
    writefln("%d%d", 1, 2, 3);
}


I'd like this code to give a compile-time error.

If this is not possible, then I'd like this code to give a run-time error.

The third possibility is to print "123", but this is bad enough.

Printing just "12" and ignoring 3, as done by DMD 2.056, is not acceptable, in
my opinion.

E]========================================

Example 1. Shareaza project. Value range of char type.

void CRemote::Output(LPCTSTR pszName)
{

  ...
  CHAR* pBytes = new CHAR[ nBytes ];
  hFile.Read( pBytes, nBytes );
  ...
  if ( nBytes > 3 && pBytes[0] == 0xEF &&
       pBytes[1] == 0xBB && pBytes[2] == 0xBF )
  {
    pBytes += 3;
    nBytes -= 3;
    bBOM = true;
  }
  ...
}

The error was found through the V547 diagnostic: Expression 'pBytes [ 0 ] ==
0xEF' is always false. The value range of signed char type: [-128, 127].
Shareaza remote.cpp 350

------------------------

Example 3. VirtualDub project. Unsigned type is always >= 0.

typedef unsigned short wint_t;
...
void lexungetc(wint_t c) {
  if (c < 0)
    return;
   g_backstack.push_back(c);
}

The error was found through the V547 diagnostic: Expression 'c < 0' is always
false. Unsigned type value is never < 0. Ami lexer.cpp 225

The "c < 0" condition is always false because the variable of the unsigned type
is always above or equal to 0.

------------------------

Example 7. QT project. Dangerous loop.

bool equals( class1* val1, class2* val2 ) const{
{
  ...
  size_t size = val1->size();
  ...
  while ( --size >= 0 ){
    if ( !comp(*itr1,*itr2) )
      return false;
    itr1++;
    itr2++;
  }
  ...
}

The error was found through the V547 diagnostic: Expression '--size >= 0' is
always true. Unsigned type value is always >= 0. QtCLucene arrays.h 154

The (--size >= 0) condition is always true, since the size variable has the
unsigned type. It means that if two sequences being compared are alike, we will
get an overflow that will in its turn cause Access Violation or other program
failures.

------------------------

Example 8. MySQL project. Error in condition.

enum enum_mysql_timestamp_type
str_to_datetime(...)
{
  ...
  else if (str[0] != 'a' || str[0] != 'A')
    continue; /* Not AM/PM */
  ...
}

The error was found through the V547 diagnostic: Expression 'str [0] != 'a' ||
str [0] != 'A'' is always true. Probably the '&&' operator should be used here.
clientlib my_time.c 340

The condition is always true because the character is always either not equal
to 'a' or to 'A'. This is the correct check:

else if (str[0] != 'a' && str[0] != 'A')


Comments are welcome.

Bye,
bearophile
Nov 02 2011
next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
bearophile wrote:
 Currently this is not caught by D, it prints "12":
 import std.stdio;
 void main() {
    writefln("%d%d", 1, 2, 3);
 }

That's not really an error. You might change out the format string at runtime based on user preferences, perhaps for internationalization, or other reasons.
Nov 02 2011
parent reply bearophile <bearophileHUGS lycos.com> writes:
Adam D. Ruppe:

 bearophile wrote:
 Currently this is not caught by D, it prints "12":
 import std.stdio;
 void main() {
    writefln("%d%d", 1, 2, 3);
 }

That's not really an error. You might change out the format string at runtime based on user preferences, perhaps for internationalization, or other reasons.

Right, the format string is in general a run-time value, so you can't always catch this situation at compile-time. But how many times do you want to ignore some of the arguments listed? Even if this happens (and I don't remember needing this), I think it's not common enough to justify so permissive semantics at run-time. Recently Andrei A. has expressed some opinions on this topic, but I don't remember the bug report number. Bye, bearophile
Nov 02 2011
parent reply Adam D. Ruppe <destructionator gmail.com> writes:
bearophile:
 But how many times do you want to ignore some of the 
 arguments listed? 

I do a lot. The way I do it is the arguments are made available to the format, but it doesn't always need them at runtime. string f = showNames ? "%1$s\t%2$d" : "%2$d"; writefln(f, name, number); Though I don't literally use writefln for most my code the same idea applies.
Nov 02 2011
next sibling parent bcs <bcs example.com> writes:
On 11/02/2011 09:00 PM, Adam D. Ruppe wrote:
 bearophile:
 But how many times do you want to ignore some of the
 arguments listed?

I do a lot. The way I do it is the arguments are made available to the format, but it doesn't always need them at runtime. string f = showNames ? "%1$s\t%2$d" : "%2$d"; writefln(f, name, number); Though I don't literally use writefln for most my code the same idea applies.

If you use a literal format string, don't use indexed formatting and don't use all the args, I think it would be safe to call that a bug in most any case. Fail any one of those pre-conditions and you may have a point.
Nov 03 2011
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Adam D. Ruppe:

 I do a lot. The way I do it is the arguments are made
 available to the format, but it doesn't always need them
 at runtime.
 
 string f = showNames ? "%1$s\t%2$d" : "%2$d";
 writefln(f, name, number);
 
 Though I don't literally use writefln for most
 my code the same idea applies.

Python is strict here:
 "%d" % (1)



 "%d" % (1, 2)



File "<stdin>", line 1, in <module> TypeError: not all arguments converted during string formatting Even if you use that idiom, it has costs, measured in bugs. My experience tells me that a sloppy semantics, introduced or kept for a little convenience, always manages to find a way to bite your ass later. So I'd like more tidyness here. You are allowed to write: if (showNames) writeln("%1$s\t%2$d", name, number); else writeln("%2$d", name); There are also other solutions that don't compromise the already low safety of writeln, that is a dynamically typed isle in a statically typed language. Bye, bearophile
Nov 05 2011
prev sibling parent reply Andrej Mitrovic <andrej.mitrovich gmail.com> writes:
Personally I like using this for a sort of on/off switch:
bool val;
// ..
val ^= 1;

Maybe that's just stupid but I'm kind of used to it, lol. :p
Nov 02 2011
parent bearophile <bearophileHUGS lycos.com> writes:
Andrej Mitrovic:

 Personally I like using this for a sort of on/off switch:
 bool val;
 // ..
 val ^= 1;
 
 Maybe that's just stupid but I'm kind of used to it, lol. :p

I use this, I think it's more explicit (but you have to state the variable name two times): val = !val; Bye, bearophile
Nov 02 2011