digitalmars.D - More bugs found in OS code

bearophile (146/146) Nov 02 2011 The now usual article that advertises the PVS-Studio tool that shows ple...

Adam D. Ruppe (4/9) Nov 02 2011 That's not really an error. You might change out the format

Andrej Mitrovic (5/5) Nov 02 2011 Personally I like using this for a sort of on/off switch:

bearophile (5/11) Nov 02 2011 I use this, I think it's more explicit (but you have to state the variab...

bearophile (6/16) Nov 02 2011 Right, the format string is in general a run-time value, so you can't al...

Adam D. Ruppe (8/10) Nov 02 2011 I do a lot. The way I do it is the arguments are made

bcs (5/15) Nov 03 2011 If you use a literal format string, don't use indexed formatting and
bearophile (15/26) Nov 05 2011 '1'

bearophile <bearophileHUGS lycos.com> writes:

The now usual article that advertises the PVS-Studio tool that shows plenty of
(depressing) bugs found in already debugged source code of widely used C/C++
open source projects:

http://software.intel.com/en-us/articles/90-errors-in-open-source-projects/

There is a Reddit discussion too, but I find it useless:
http://www.reddit.com/r/programming/comments/lxjrb/examples_of_errors_detected_in_various_opensource/


In the Reddit discussion someone is free to list in D how many of the 91 bugs:
- Are not applicable or are not normally done in D;
- Are statically caught by D/DMD;
- Are always caught at runtime by DMD in non release mode;
- Are planned/discussed to be statically avoided or statically caught;
- Are planned/discussed to be always caught at runtime by DMD in non release
mode.



Here I list and comment about some of the more interesting parts. Below I have
divided the problems in 5 groups (I have not included all the bug groups shown
in the article).

A]========================================

Example 3. Fennec Media Project project. Complex expression.

uint32 CUnBitArrayOld::DecodeValueRiceUnsigned(uint32 k) 
{
  ...
  while (!(m_pBitArray[m_nCurrentBitIndex >> 5] &
    Powers_of_Two_Reversed[m_nCurrentBitIndex++ & 31])) {}
  ...
}

The error was found through the V567 diagnostic: Undefined behavior. The
'm_nCurrentBitIndex' variable is modified while being used twice at single
sequence point. MACLib unbitarrayold.cpp 78

------------------------

Example 4. Miranda IM project. Complex expression.

short ezxml_internal_dtd(ezxml_root_t root,
  char *s, size_t len)
{
  ...
  while (*(n = ++s + strspn(s, EZXML_WS)) && *n != '>') {
  ...
}

The error was found through the V567 diagnostic: Undefined behavior. The 's'
variable is modified while being used twice between sequence points.msne zxml.c
371

------------------------

Currently DMD accepts this code:

x = x++;

Currently D is using the C semantics, this means the result of that line of
code is undefined, this means you don't know what it will do once compiled by
different D compilers or even with different optimization level on the same
compiler. This means writing such code in C (and currently in D) is _always_ a
bug (because a program with undefined semantics is often useless. There are
other sources of undefined semantics, but the less there are, the better it
is). I don't understand why C compilers don't refuse such code statically with
an error. In my opinion this is not acceptable.

Two basic solutions for D:
1) Define exactly what that code does in D. Walter has expressed few times his
desire for this solution.
2) Turn similar lines of code into compile-time errors (and maybe accept them
forever if the -d switch is used, but I am not sure this is a good idea).

The first solution is good because it makes it simpler to port C code to D, it
allows C/C++/Java programmers to use that code still. But such C code has
undefined results, so I don't see a great advantage here (it's useful still to
port Java code).

But lately I am slowly leaning toward the second solution, because even if D
defines exactly the semantics of code like this, so it gives the same result on
all D compilers:

(*(n = ++s + strspn(s, EZXML_WS)) && *n != '>')

I don't want to read similar code in D programs I have to debug or modify. Even
if it's unambiguous for the D language, it's a bit too much hard for me to
understand. Go language has chosen this second solution.

B]========================================

Curretly D2 accepts the usage of & and | among booleans:


void main() {
    bool a, b;
    auto r1 = a & b;
    static assert(is(typeof(r1) == bool));
    auto r2 = a | b;
    static assert(is(typeof(r2) == bool));
}


But I am not sure this is useful enough to justify the risks that gives.

Using booleans as integers is useful when I have to count them and in few other
situations. But I don't remember the last time I've had to use bitwise
or/bitwise and on boolean values, I think I have never had to do this. So maybe
it's better to forbid this. The advantage of forbidding this operation is that
it avoids several mistakes caused by operator precedence like (!x & y).
Comments on this are welcome.

C]========================================

Example 5. IPP Samples project. Priorities of ?: and | operations.

vm_file* vm_file_fopen(...)
{
  ...
  mds[3] = FILE_ATTRIBUTE_NORMAL |
           (islog == 0) ? 0 : FILE_FLAG_NO_BUFFERING;
  ...
}

The error was found through the V502 diagnostic: Perhaps the '?:' operator
works in a different way than it was expected. The '?:' operator has a lower
priority than the '|' operator. vm vm_file_win.c 393

------------------------

Example 6. Newton Game Dynamics project. Priorities of ?: and * operations.

dgInt32 CalculateConvexShapeIntersection (...)
{
  ...
  den = dgFloat32 (1.0e-24f) *
        (den > dgFloat32 (0.0f)) ?
          dgFloat32 (1.0f) : dgFloat32 (-1.0f);
  ...
}

The error was found through the V502 diagnostic: Perhaps the '?:' operator
works in a different way than it was expected. The '?:' operator has a lower
priority than the '*' operator. physics dgminkowskiconv.cpp 1061

D]========================================

Currently this is not caught by D, it prints "12":

import std.stdio;
void main() {
    writefln("%d%d", 1, 2, 3);
}


I'd like this code to give a compile-time error.

If this is not possible, then I'd like this code to give a run-time error.

The third possibility is to print "123", but this is bad enough.

Printing just "12" and ignoring 3, as done by DMD 2.056, is not acceptable, in
my opinion.

E]========================================

Example 1. Shareaza project. Value range of char type.

void CRemote::Output(LPCTSTR pszName)
{

  ...
  CHAR* pBytes = new CHAR[ nBytes ];
  hFile.Read( pBytes, nBytes );
  ...
  if ( nBytes > 3 && pBytes[0] == 0xEF &&
       pBytes[1] == 0xBB && pBytes[2] == 0xBF )
  {
    pBytes += 3;
    nBytes -= 3;
    bBOM = true;
  }
  ...
}

The error was found through the V547 diagnostic: Expression 'pBytes [ 0 ] ==
0xEF' is always false. The value range of signed char type: [-128, 127].
Shareaza remote.cpp 350

------------------------

Example 3. VirtualDub project. Unsigned type is always >= 0.

typedef unsigned short wint_t;
...
void lexungetc(wint_t c) {
  if (c < 0)
    return;
   g_backstack.push_back(c);
}

The error was found through the V547 diagnostic: Expression 'c < 0' is always
false. Unsigned type value is never < 0. Ami lexer.cpp 225

The "c < 0" condition is always false because the variable of the unsigned type
is always above or equal to 0.

------------------------

Example 7. QT project. Dangerous loop.

bool equals( class1* val1, class2* val2 ) const{
{
  ...
  size_t size = val1->size();
  ...
  while ( --size >= 0 ){
    if ( !comp(*itr1,*itr2) )
      return false;
    itr1++;
    itr2++;
  }
  ...
}

The error was found through the V547 diagnostic: Expression '--size >= 0' is
always true. Unsigned type value is always >= 0. QtCLucene arrays.h 154

The (--size >= 0) condition is always true, since the size variable has the
unsigned type. It means that if two sequences being compared are alike, we will
get an overflow that will in its turn cause Access Violation or other program
failures.

------------------------

Example 8. MySQL project. Error in condition.

enum enum_mysql_timestamp_type
str_to_datetime(...)
{
  ...
  else if (str[0] != 'a' || str[0] != 'A')
    continue; /* Not AM/PM */
  ...
}

The error was found through the V547 diagnostic: Expression 'str [0] != 'a' ||
str [0] != 'A'' is always true. Probably the '&&' operator should be used here.
clientlib my_time.c 340

The condition is always true because the character is always either not equal
to 'a' or to 'A'. This is the correct check:

else if (str[0] != 'a' && str[0] != 'A')


Comments are welcome.

Bye,
bearophile

Nov 02 2011

Adam D. Ruppe <destructionator gmail.com> writes:

bearophile wrote:
 Currently this is not caught by D, it prints "12":
 import std.stdio;
 void main() {
    writefln("%d%d", 1, 2, 3);
 }

That's not really an error. You might change out the format
string at runtime based on user preferences, perhaps for
internationalization, or other reasons.

Nov 02 2011

Andrej Mitrovic <andrej.mitrovich gmail.com> writes:

Personally I like using this for a sort of on/off switch:
bool val;
// ..
val ^= 1;

Maybe that's just stupid but I'm kind of used to it, lol. :p

Nov 02 2011

bearophile <bearophileHUGS lycos.com> writes:

Andrej Mitrovic:

 Personally I like using this for a sort of on/off switch:
 bool val;
 // ..
 val ^= 1;
 
 Maybe that's just stupid but I'm kind of used to it, lol. :p

I use this, I think it's more explicit (but you have to state the variable name
two times):

val = !val;

Bye,
bearophile

Nov 02 2011

bearophile <bearophileHUGS lycos.com> writes:

Adam D. Ruppe:

 bearophile wrote:
 Currently this is not caught by D, it prints "12":
 import std.stdio;
 void main() {
    writefln("%d%d", 1, 2, 3);
 }

 
 That's not really an error. You might change out the format
 string at runtime based on user preferences, perhaps for
 internationalization, or other reasons.

Right, the format string is in general a run-time value, so you can't always
catch this situation at compile-time.

 But how many times do you want to ignore some of the arguments listed? Even if
this happens (and I don't remember needing this), I think it's not common
enough to justify so permissive semantics at run-time.

Recently Andrei A. has expressed some opinions on this topic, but I don't
remember the bug report number.

Bye,
bearophile

Nov 02 2011

Adam D. Ruppe <destructionator gmail.com> writes:

bearophile:
 But how many times do you want to ignore some of the 
 arguments listed? 

I do a lot. The way I do it is the arguments are made
available to the format, but it doesn't always need them
at runtime.

string f = showNames ? "%1$s\t%2$d" : "%2$d";
writefln(f, name, number);


Though I don't literally use writefln for most
my code the same idea applies.

Nov 02 2011

bcs <bcs example.com> writes:

On 11/02/2011 09:00 PM, Adam D. Ruppe wrote:
 bearophile:
 But how many times do you want to ignore some of the
 arguments listed?

 I do a lot. The way I do it is the arguments are made
 available to the format, but it doesn't always need them
 at runtime.

 string f = showNames ? "%1$s\t%2$d" : "%2$d";
 writefln(f, name, number);


 Though I don't literally use writefln for most
 my code the same idea applies.

If you use a literal format string, don't use indexed formatting and 
don't use all the args, I think it would be safe to call that a bug in 
most any case. Fail any one of those pre-conditions and you may have a 
point.

Nov 03 2011

bearophile <bearophileHUGS lycos.com> writes:

Adam D. Ruppe:

 I do a lot. The way I do it is the arguments are made
 available to the format, but it doesn't always need them
 at runtime.
 
 string f = showNames ? "%1$s\t%2$d" : "%2$d";
 writefln(f, name, number);
 
 Though I don't literally use writefln for most
 my code the same idea applies.

Python is strict here:


 "%d" % (1)



'1'
 "%d" % (1, 2)



Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: not all arguments converted during string formatting


Even if you use that idiom, it has costs, measured in bugs. My experience tells
me that a sloppy semantics, introduced or kept for a little convenience, always
manages to find a way to bite your ass later. So I'd like more tidyness here.

You are allowed to write:

if (showNames)
    writeln("%1$s\t%2$d", name, number);
else
    writeln("%2$d", name);


There are also other solutions that don't compromise the already low safety of
writeln, that is a dynamically typed isle in a statically typed language.

Bye,
bearophile

Nov 05 2011

D Programming

C/C++ Programming

Other

digitalmars.D - More bugs found in OS code