www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 6346] New: Make == null a warning for arrays

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6346

           Summary: Make == null a warning for arrays
           Product: D
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: jmdavisProg gmx.com



PDT ---
The correct way to test whether something in null in D is to use is. Using ==
null with strings is doing the same as checking whether their length == 0. This
is _not_ what most people are going to expect. So, if you actually understand
the rules, then the correct way to check for null is is null, and the correct
way to check for empty is to either use length == 0 or std.array.empty. As
such, there is no point in using == and if it's in the code, there's a good
chance that it's a bug.

It appears that there is _already_ such a warning when dealing with classes. I
think that it would be far less error-prone to make the same warning apply to
arrays. I don't think that there is really a reasonable case for use == null in
D at all, so in all cases, it should generate a warning. is null is what
actually checks for null, and there are better, clearer ways to check for empty
with arrays.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 18 2011
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6346


bearophile_hugs eml.cc changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bearophile_hugs eml.cc





 I don't think that there is really a reasonable case for use == null in
 D at all,
If this is true then I suggest to statically disallow such syntax, turning it into an error, instead of producing a warning. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 18 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6346


Steven Schveighoffer <schveiguy yahoo.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |schveiguy yahoo.com



13:25:45 PDT ---
I disagree with the idea that == null is so bad to require a warning.

Note that .empty requires importing std.array, and arr.length == 0 doesn't read
as well as array == null.

Also note that the problem with classes (actually this has been fixed) is that
obj == null would cause a segfault if obj actually was null.  The distinction
between arr == null and arr is null is far more subtle and far less dangerous.

Also, this change makes code behave inconsistently.

For example:

foo(int[] y)
{
  int[] x = null;
  if(y == x) // ok
  {}
  else if(y == null) // warning???
}

What happens if the former is folded into the latter during optimization?

I'd argue this is not worth a change to the language warnings.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 18 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6346




PDT ---
Well, that's why I said warning rather than error. But I would consider it bad
practice to use == null anywhere with arrays. I'm automatically go to look at
it and wonder whether the programmer really meant is null (and they probably
did). And since I'd expect most code to import std.array if it's doing much
with arrays at all, I'd fully expect it to have access to empty. And while
arr.length == 0 is not as short as arr == length, it's far less error prone. I
think that the used of arr == null implies a fundamental misunderstanding on
the part of the programmer using it. Now, maybe they do understand that it's
the same as arr.length == 0 or arr.empty, but the odds are so high that they
really meant is null that I do _not_ think that code should be using == null.
If you're talking about "code smells" as Bearophile likes to, arr == null
_definitely_ smells. Sure, it _might_ be that it does what the programmer
intended, but odds are that it doesn't.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 18 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6346




13:54:46 PDT ---
There is a flaw in your argument, however.  For the coder to have meant they
want to check specifically if the pointer is null, they have to know there is a
difference between null and an "empty but non-null" slice.  If this is the
case, then wouldn't they use is null instead of == null?

Most coders don't even know that an empty slice could have a non-null pointer,
or would even care.  They just know that comparing an empty slice to null
results in true.  Consider that there are very few, if at all, functions which
make a distinction between null and "empty but not null" slices.  So where
exactly does doing == null lead to a problem?

If you want to focus on confusing spots, the two problems I see are:

if(arr is null)

and 

if(arr)

Both of these act *differently* based on whether the pointer is null or not. 
Now that is confusing.

Note that this can be fixed by doing:

if(arr is null) => warn, recommend using if(arr.ptr is null)

if(arr) => should always be false if arr.length == 0

But I think == null is perfectly safe, and 99% of the time exactly what the
writer intended.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 18 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6346




PDT ---
In pretty much every other language that I've ever used, null is _not_ the same
thing as empty _ever_. Someone coming from C, C++, Java etc. is going to expect
dynamic arrays to be able to be null. The way that you check for null in all of
those languages is either == 0 or == null (or == NULL or == nullptr). is does
not exist in any other language that I've ever seen. I fully expect programmers
from C, C++, Java etc. to expect == null to be checking whether an array is
null - that is whether is null is true. But it doesn't do that. It checks
whether length == 0.

The fact that an empty array an null are almost synonymous in D is incredibly
bizarre. It simplifies declaring an array and then appending it or setting its
length, but it is _not_ what your average programmer is going to expect coming
from other C-based languages. As such, I expect that == null risks being a
frequent newbie mistake. The only reason it wouldn't be is if not that many
newbies need to or care about checking for null.

The fact that

if(arr)

and

if(arr is null)

don't do the same thing is a related but separate issue, and should probably
also be addressed. But I really think that == null is a problem which is going
to throw off a lot of people. If I see it in code, I'm either going to point it
out or change it (depending on the circumstances). The odds are just too high
that it doesn't do what the programmer intended or that someone else reading
the code will misinterpret what it does.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 18 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6346




05:31:19 PDT ---

 In pretty much every other language that I've ever used, null is _not_ the same
 thing as empty _ever_. Someone coming from C, C++, Java etc. is going to expect
 dynamic arrays to be able to be null. The way that you check for null in all of
 those languages is either == 0 or == null (or == NULL or == nullptr).
php just uses if(!arr) or if(arr == array()) or if(arr == null). All of those return true for empty arrays. But the point I was making is, for D null *is* synonymous with empty. D is not every other language, and shouldn't be bound to the rules of those languages. The larger question to answer here is, is it likely to be *wrong* when someone enters: if(arr == null) and expects this to check if the pointer is null. The answer is very much no. It's likely to be either *exactly* what you want, or harmless. The super-subtle difference between null arrays and empty-but-not-null arrays is seldom, if ever, used. Now, I agree that we could have a better built-in moniker for is this array empty, but do we really need a warning? The code is clear, unambiguous, and exactly what the developer wanted, or close enough to be effective. In java or C, checking an array for a null pointer is needed because an unallocated array throws an exception or segfault if you try to use it. This is not the case in D. D does not have this problem. Anyone's code who thinks it does is bound to be obvious: if(arr == null) { arr = new arr[5]; } No need to warn here, the code is fine (works exactly how the user wants). But the easy optimization is just to remove the if check.
 is does
 not exist in any other language that I've ever seen.
for it, since operators can't be overloaded.
 I fully expect programmers
 from C, C++, Java etc. to expect == null to be checking whether an array is
 null - that is whether is null is true. But it doesn't do that. It checks
 whether length == 0.
Yeah, but the point I'm trying to make is, checking for an actual null pointer means nothing in D! So checking if an array is empty is a much more useful implementation. Null arrays are not bad things in D, they don't throw exceptions or segfaults simply because they are null. If anything, the check for null is probably extraneous, and can be removed.
 But I really think that == null is a problem which is going
 to throw off a lot of people. If I see it in code, I'm either going to point it
 out or change it (depending on the circumstances). The odds are just too high
 that it doesn't do what the programmer intended or that someone else reading
 the code will misinterpret what it does.
I completely disagree, the odds are very low that it's damaging or incorrect at all. Experienced D coders use this feature all the time. You just gotta start thinking in D instead of your previous languages. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 19 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6346




05:32:02 PDT ---

 In pretty much every other language that I've ever used, null is _not_ the same
 thing as empty _ever_. Someone coming from C, C++, Java etc. is going to expect
 dynamic arrays to be able to be null. The way that you check for null in all of
 those languages is either == 0 or == null (or == NULL or == nullptr).
php just uses if(!arr) or if(arr == array()) or if(arr == null). All of those return true for empty arrays. But the point I was making is, for D null *is* synonymous with empty. D is not every other language, and shouldn't be bound to the rules of those languages. The larger question to answer here is, is it likely to be *wrong* when someone enters: if(arr == null) and expects this to check if the pointer is null. The answer is very much no. It's likely to be either *exactly* what you want, or harmless. The super-subtle difference between null arrays and empty-but-not-null arrays is seldom, if ever, used. Now, I agree that we could have a better built-in moniker for is this array empty, but do we really need a warning? The code is clear, unambiguous, and exactly what the developer wanted, or close enough to be effective. In java or C, checking an array for a null pointer is needed because an unallocated array throws an exception or segfault if you try to use it. This is not the case in D. D does not have this problem. Anyone's code who thinks it does is bound to be obvious: if(arr == null) { arr = new int[5]; } No need to warn here, the code is fine (works exactly how the user wants). But the easy optimization is just to remove the if check.
 is does
 not exist in any other language that I've ever seen.
for it, since operators can't be overloaded.
 I fully expect programmers
 from C, C++, Java etc. to expect == null to be checking whether an array is
 null - that is whether is null is true. But it doesn't do that. It checks
 whether length == 0.
Yeah, but the point I'm trying to make is, checking for an actual null pointer means nothing in D! So checking if an array is empty is a much more useful implementation. Null arrays are not bad things in D, they don't throw exceptions or segfaults simply because they are null. If anything, the check for null is probably extraneous, and can be removed.
 But I really think that == null is a problem which is going
 to throw off a lot of people. If I see it in code, I'm either going to point it
 out or change it (depending on the circumstances). The odds are just too high
 that it doesn't do what the programmer intended or that someone else reading
 the code will misinterpret what it does.
I completely disagree, the odds are very low that it's damaging or incorrect at all. Experienced D coders use this feature all the time. You just gotta start thinking in D instead of your previous languages. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Jul 19 2011
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6346




05:33:14 PDT ---
Sorry for the duplicate comment, I wanted the first to be overwritten by the
second (bugzilla told me that's what would happen), but it didn't :(  Ignore
comment 6 (/me wishes for comment editing/deleting)

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 19 2011
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6346




PDT ---
Except that null and empty are _not_ the same for arrays. True, checking for
null is not as necessary in D, but you _can_ write code which relies on whether
arrays are null or not. And the most likely thing that someone is going to do
if they're writing code like that is to use == null, unless they were paying a
lot of attention when reading the online docs or TDPL and remembered is and how
it differs from ==. If == null and is null were identical, it wouldn't be an
issue, but they're not. It can matter whether a function returns null or "" (or
[]). It can matter whether you use == or is. And I wouldn't expect many people
new to D to correctly use is with null. Rather, if they're checking for null,
they'll use == null. And since there is _zero_ need for == null given the
alternatives and given how it's likely that a newbie would use == where they
should use is, I see no reason that good D code should be using == null (at
best, it's a bit prettier than length == 0, and given that empty matters with
containers, empty should be encouraged over length == 0 anyway). So, given that
it's going to cause problems (and bugs) for newbies, and anyone who knows what
they're doing doesn't need it, I really think that there should be a warning
for using == null.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Jul 19 2011