digitalmars.D - Google C++ style guide

bearophile (70/96) Oct 03 2009 I have found this page linked from Reddit (click "Toggle all summaries" ...

Justin Johansson (12/15) Oct 03 2009 Coming from a career in acronym-city (aerospace), project management man...
Jeremie Pelletier (80/216) Oct 03 2009 I think these are more programming guidelines than language design

bearophile (20/30) Oct 03 2009 Yes, of course. But programming guidelines can give possible ideas to a ...

Jeremie Pelletier (25/37) Oct 03 2009 I'm not sure if that's a good thing, different companies enforce

Christopher Wright (6/15) Oct 04 2009 You use RTTI for dynamic casts, variadic functions, and the default

Jeremie Pelletier (4/21) Oct 04 2009 Yeah something like "don't generate type names" and other extra

Don (7/29) Oct 05 2009 I've often thought that a pragma for a module to "don't generate module

bearophile (9/11) Oct 05 2009 Do you use the LDC compiler?

Jeremie Pelletier (3/21) Oct 05 2009 I would much prefer these to be compiler switches, so you can make them

bearophile (4/6) Oct 05 2009 Compiler switches are a blunt tool. So I think module-wide switches are ...
Don (5/31) Oct 05 2009 That's a completely different use case, I think. For internal modules,

Sean Kelly (11/40) Oct 05 2009 One thing that can trip this up is structs containing floating point num...

sclytrack (8/11) Oct 04 2009 Function Default Arguments
"Jérôme M. Berger" (24/41) Oct 04 2009 a C++ style guide from a firm that uses C++ more than Google. On the oth...

Justin Johansson (31/54) Oct 04 2009 Ditto. A special use case to consider is when you have a function templ...

bearophile (4/6) Oct 04 2009 If the information isn't missing in D2 you can sometimes use "auto" retu...

Kagamin (10/40) Oct 05 2009 No. D has static constructors which do the same.

Kagamin (2/3) Oct 05 2009 A... I've misread the spec a little. Though I think, it's still a proble...
bearophile (7/10) Oct 05 2009 In D it's better to not use them when you want a strictly positive numbe...

bearophile (15/16) Oct 05 2009 We may even disallow all implicit conversions that lose a significant am...

bearophile (4/7) Oct 05 2009 We aren't talking about a feature here, but a standard syntax to denote ...

bearophile <bearophileHUGS lycos.com> writes:

I have found this page linked from Reddit (click "Toggle all summaries" at the
top to read the full page):
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml

At Google C++ isn't the most used language, so it may be better to use a C++
style guide from a firm that uses C++ more than Google. On the other hand
Google has hired many good programmers, and probably some of them have strong
C++ experience, so if you are interested in C++/D this style guide deserves to
be read.

This guide is mostly (as it often happens with C++) a list of features that are
forbidden, I think usually to reduce the total bug count of the programs. Some
of such imposed limits make me a little nervous, so I'd like to remove/relax
some of those limits, but I am ignorant regarding C++, while the people that
have written this document are expert, so their judgement has weight.

They forbid several features that are present in D too. Does it mean D has to
drop such features (or make them less "natural", so the syntax discourages
their use)?

Here are a few things from that document that I think are somehow interesting.
Some of those things may be added to D style guide, or they may even suggest
changes in the language itself.

-------------------

Function Parameter Ordering: When defining a function, parameter order is:
inputs, then outputs.<

D may even enforce this, allowing "out" only after "in" arguments.

-------------------

Nested Classes: Do not make nested classes public unless they are actually part
of the interface, e.g., a class that holds a set of options for some method.<

-------------------

Static and Global Variables: Static or global variables of class type are
forbidden: they cause hard-to-find bugs due to indeterminate order of
construction and destruction. [...] The order in which class constructors,
destructors, and initializers for static variables are called is only partially
specified in C++ and can even change from build to build, which can cause bugs
that are difficult to find. [...] As a result we only allow static variables to
contain POD data.<

I think D avoids such problem.

-------------------

Doing Work in Constructors: Do only trivial initialization in a constructor. If
at all possible, use an Init() method for non-trivial initialization. [...] If
the work calls virtual functions, these calls will not get dispatched to the
subclass implementations. Future modification to your class can quietly
introduce this problem even if your class is not currently subclassed, causing
much confusion.<
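
A minimal C++ sketch of the pitfall the guide describes. The class names are hypothetical and only serve as an illustration: while the Base constructor runs, the object is still a Base, so the virtual call does not reach the subclass, whereas the same call made from an Init() method after construction does.

```cpp
#include <string>

// Hypothetical classes, only to illustrate the rule quoted above.
struct Base {
    std::string seen_in_ctor;
    std::string seen_in_init;

    Base() { seen_in_ctor = Name(); }   // virtual call inside a constructor
    virtual ~Base() = default;
    virtual std::string Name() const { return "Base"; }

    // Two-phase alternative: called by the user after construction.
    void Init() { seen_in_init = Name(); }
};

struct Derived : Base {
    std::string Name() const override { return "Derived"; }
};
```

For a Derived object, seen_in_ctor holds "Base" while seen_in_init holds "Derived" once Init() has been called.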

-------------------

Declaration Order: Use the specified order of declarations within a class:
public: before private:, methods before data members (variables), etc.<

D may even enforce such order (Pascal does something similar).

-------------------

Reference Arguments: All parameters passed by reference must be labeled const.<

In fact it is a very strong convention in Google code that input arguments are
values or const references while output arguments are pointers. Input
parameters may be const pointers, but we never allow non-const reference
parameters.<

I think C solves part of such problem forcing the programmer to add "ref"
before the variable name in the calling place too. D may do the same.
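
A small C++ sketch of the convention quoted above, assuming a hypothetical SplitKeyValue function that is not taken from the guide: the input is a const reference and the outputs are pointers, so the call site has to spell out which arguments may be modified.

```cpp
#include <string>

// Hypothetical function following the convention: inputs by value or
// const reference, outputs by pointer.
void SplitKeyValue(const std::string& line, std::string* key, std::string* value) {
    const std::string::size_type pos = line.find('=');
    if (pos == std::string::npos) {   // no separator: whole line is the key
        *key = line;
        value->clear();
        return;
    }
    *key = line.substr(0, pos);
    *value = line.substr(pos + 1);
}
```

A call then reads SplitKeyValue(line, &key, &value); the explicit & plays roughly the role that a call-site "ref" marker would play.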

-------------------

Function Overloading: Use overloaded functions (including constructors) only in
cases where input can be specified in different types that contain the same
information.

Cons: One reason to minimize function overloading is that overloading can make
it hard to tell which function is being called at a particular call site.
Another one is that most people are confused by the semantics of inheritance if
a deriving class overrides only some of the variants of a function.<

Decision: If you want to overload a function, consider qualifying the name with
some information about the arguments, e.g., AppendString(), AppendInt() rather
than just Append().<


This is a strong limitation. One of the things that makes C++ more handy than
C. I accept it for normal code, but I refuse it for "library code". Library
code is designed to be more flexible and reusable, making syntax simpler, etc.
So I want D to keep overloaded functions.
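
The inheritance confusion mentioned in the Cons above can be sketched in C++ as follows. Writer and LogWriter are hypothetical names: the derived class overrides only one overload, and that declaration hides the other one for calls made through the derived type.

```cpp
#include <string>

struct Writer {
    virtual ~Writer() = default;
    virtual std::string Append(int) { return "Writer::Append(int)"; }
    virtual std::string Append(double) { return "Writer::Append(double)"; }
};

struct LogWriter : Writer {
    // Overrides only one variant. This declaration hides Append(double)
    // for calls made through LogWriter, so a double argument is converted
    // to int instead of selecting the base-class overload.
    std::string Append(int) override { return "LogWriter::Append(int)"; }
};
```

Calling Append(2.5) on a LogWriter object picks the int version, while the same call through a Writer reference picks Writer::Append(double). Adding "using Writer::Append;" to LogWriter would make both overloads visible again.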

-------------------

Default Arguments: We do not allow default function parameters.<

Cons: People often figure out how to use an API by looking at existing code
that uses it. Default parameters are more difficult to maintain because
copy-and-paste from previous code may not reveal all the parameters.
Copy-and-pasting of code segments can cause major problems when the default
arguments are not appropriate for the new code.<

Decision: We require all arguments to be explicitly specified, to force
programmers to consider the API and the values they are passing for each
argument rather than silently accepting defaults they may not be aware of.<


This too is a strong limitation. I understand that it may make life a little
more complex, but they are handy. So I think their usage has to be limited, but
I don't like to totally forbid them.
"Forcing the programmers to consider the API" has some negative side-effects
too that they seem to ignore. So I want D to keep its default function
parameters feature.

-------------------

Variable-Length Arrays and alloca(): We do not allow variable-length arrays or
alloca().<

Cons: Variable-length arrays and alloca [...] allocate a data-dependent amount
of stack space that can trigger difficult-to-find memory overwriting bugs: "It
ran fine on my machine, but dies mysteriously in production".<

Decision:  Use a safe allocator instead, such as scoped_ptr/scoped_array.<

After reading this page:
http://www.boost.org/doc/libs/1_40_0/libs/smart_ptr/scoped_array.htm
I think they are just a pointer that points to heap-allocated memory, plus it
gets deallocated when the scope ends.

In 99.5% of the cases a heap allocation is good enough in D (especially if the
GC gets better). But once in a while speed is more important, so for very small
arrays I'd like to have variable-length arrays in D (allocating large arrays on
the stack is always bad in production code).

-------------------

Run-Time Type Information (RTTI): We do not use Run Time Type Information
(RTTI).<

If you find yourself in need of writing code that behaves differently based on
the class of an object, consider one of the alternatives to querying the type.
Virtual methods are the preferred way of executing different code paths
depending on a specific subclass type. This puts the work within the object
itself. If the work belongs outside the object and instead in some processing
code, consider a double-dispatch solution, such as the Visitor design pattern.
This allows a facility outside the object itself to determine the type of class
using the built-in type system. If you think you truly cannot use those ideas,
you may use RTTI. But think twice about it. :-) Then think twice again. Do not
hand-implement an RTTI-like workaround. The arguments against RTTI apply just
as much to workarounds like class hierarchies with type tags. <

I think this is in most situations acceptable. On the other hand I'd like D to
have a better implemented reflection (within the bounds of the things that can
be done by a static compiler, even if future D implementations may run on a VM,
like a future alternative LDC), that can be useful in unittesting.

I am not sure about this, I don't use RTTI a lot in D code.
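
For readers who have not met the double-dispatch alternative the guide mentions, here is a minimal C++ sketch of the Visitor pattern. All type names are hypothetical; the point is only that the processing code lives outside the class hierarchy and needs neither typeid nor dynamic_cast.

```cpp
#include <string>

struct Circle;
struct Square;

// Visitor interface: one method per concrete type.
struct ShapeVisitor {
    virtual ~ShapeVisitor() = default;
    virtual void VisitCircle(const Circle&) = 0;
    virtual void VisitSquare(const Square&) = 0;
};

struct Shape {
    virtual ~Shape() = default;
    virtual void Accept(ShapeVisitor& v) const = 0;   // first dispatch
};

struct Circle : Shape {
    void Accept(ShapeVisitor& v) const override { v.VisitCircle(*this); }  // second dispatch
};

struct Square : Shape {
    void Accept(ShapeVisitor& v) const override { v.VisitSquare(*this); }
};

// Processing code that lives outside the shapes.
struct NameCollector : ShapeVisitor {
    std::string names;
    void VisitCircle(const Circle&) override { names += "circle "; }
    void VisitSquare(const Square&) override { names += "square "; }
};
```

The cost of this design is that adding a new Shape subclass requires touching every visitor, which is the usual trade-off against a type switch.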

-------------------

Casting: Use C++ casts like static_cast<>(). Do not use other cast formats like
int y = (int)x; or int y = int(x);.<

Pros: The problem with C casts is the ambiguity of the operation; sometimes you
are doing a conversion (e.g., (int)3.5) and sometimes you are doing a cast
(e.g., (int)"hello"); C++ casts avoid this. Additionally C++ casts are more
visible when searching for them.<

Do not use C-style casts. Instead, use these C++-style casts.

* Use static_cast as the equivalent of a C-style cast that does value
conversion, or when you need to explicitly up-cast a pointer from a class to
its superclass.
* Use const_cast to remove the const qualifier (see const).
* Use reinterpret_cast to do unsafe conversions of pointer types to and from
integer and other pointer types. Use this only if you know what you are doing
and you understand the aliasing issues.
* Do not use dynamic_cast except in test code. If you need to know type
information at runtime in this way outside of a unittest, you probably have a
design flaw.<

I agree with them that mixing all different kinds of cast as in D is bad. In D
I'd like to know what I'm doing in a more precise way. This is something that
can be improved in D.
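
To make the conversion-versus-reinterpretation distinction concrete, here is a small C++ sketch with hypothetical helper names. Note that the second helper uses std::memcpy rather than reinterpret_cast on a pointer, precisely to sidestep the aliasing issues the guide warns about, and that it assumes a 32-bit IEEE 754 float.

```cpp
#include <cstdint>
#include <cstring>

// Value conversion: static_cast<int> discards the fractional part.
inline int ToInt(double x) { return static_cast<int>(x); }

// Reinterpretation of the object representation. std::memcpy is used here
// instead of a reinterpret_cast on the pointer to avoid aliasing problems.
inline std::uint32_t BitsOf(float f) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    return bits;
}
```

The two operations look alike in a C-style cast but do very different things: the first changes the representation to keep (most of) the value, the second keeps the bits and changes their interpretation.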

-------------------

Integer Types:

You should not use the unsigned integer types such as uint32_t, unless the
quantity you are representing is really a bit pattern rather than a number, or
unless you need defined twos-complement overflow. In particular, do not use
unsigned types to say a number will never be negative. Instead, use assertions
for this.<

I'm for the removal of size_t from everywhere it's not strictly necessary (so
for example from array lengths) to avoid bugs.

See also the recent thread about signed-unsigned issues:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=17800

Integer overflow tests will help too.
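
A short C++ sketch of the kind of bug unsigned lengths invite. Both helpers are hypothetical: with an empty container, subtracting 1 from an unsigned size wraps around to a huge value instead of going negative.

```cpp
#include <cstddef>
#include <vector>

// Buggy for empty input: v.size() - 1 wraps around to the largest
// std::size_t value instead of becoming -1.
inline std::size_t LastIndexUnsigned(const std::vector<int>& v) {
    return v.size() - 1;
}

// Signed variant: an empty vector yields -1, which is easy to test for.
inline std::ptrdiff_t LastIndexSigned(const std::vector<int>& v) {
    return static_cast<std::ptrdiff_t>(v.size()) - 1;
}
```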

-------------------

Boost:

Cons: Some Boost libraries encourage coding practices which can hamper
readability, such as metaprogramming and other advanced template techniques,
and an excessively "functional" style of programming.<

Advanced use of templates makes the code less easy to understand. But
sometimes functional style makes code shorter, more readable, safer
multiprocessing-wise, sometimes even parallelizable, etc.

-------------------

Type Names: often I don't like the C++ practice of using a single uppercase
letter for a template type, like T. Better to give a meaningful name to types,
when possible.

-------------------

Class Data Members: Data members (also called instance variables or member
variables) are lowercase with optional underscores like regular variable names,
but always end with a trailing underscore.<

D may even enforce some simple syntax for class members, like that underscore
or something else. No other variable is allowed to share the same syntax (so
this syntax is used iff it's a class member). It makes conversions from other
languages a little more work, but I think it will pay off.

-------------------

Regular Functions: Functions should start with a capital letter and have a
capital letter for each new word. No underscores:<

That's ugly.

-------------------

Spaces vs. Tabs: Use only spaces, and indent 2 spaces at a time.<

4 spaces are more readable :-)

-------------------

Pointer and Reference Expressions:

// These are fine, space following.
char* c;    // but remember to do "char* c, *d, *e, ...;"!

That's good in D but bad in C/C++. They are wrong here.
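
The C/C++ trap can be checked directly. In the sketch below (variable names chosen only for illustration) the * binds to the declarator, not to the type, so only the first variable of the first line is a pointer.

```cpp
#include <type_traits>

char* c, d;     // only c is a pointer; d is a plain char
char *e, *f;    // both e and f are pointers

static_assert(std::is_same<decltype(c), char*>::value, "c is char*");
static_assert(std::is_same<decltype(d), char>::value, "d is char");
static_assert(std::is_same<decltype(e), char*>::value, "e is char*");
static_assert(std::is_same<decltype(f), char*>::value, "f is char*");
```

In D the pointer marker belongs to the type, so the equivalent of the first line declares two pointers, which is why the "char* c" spelling is harmless there.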

-------------------

Class Format: Sections in public, protected and private order, each indented
one space.<

There are no good solutions to this. I use 4 spaces for them too.

-------------------

Loops and Conditionals:

for ( ; i < 5 ; ++i) {  // For loops always have a space after the
  ...                   // semicolon, and may have a space before the
                        // semicolon.

That space before the ; is quite important. But I don't think there's a need
for a warning if it's absent.

-------------------

Bye,
bearophile

Oct 03 2009

Justin Johansson <no spam.com> writes:

bearophile Wrote:

Regular Functions: Functions should start with a capital letter and have a
capital letter for each new word. No underscores:<

 
 That's ugly.

Coming from a career in acronym-city (aerospace), project management mandated
that the use of acronyms in identifiers MUST be clearly indicated with uppercase
letters.  In the event that ambiguity could arise, such as in camel-cased
identifiers, the end of an acronym had to be separated by an underscore between
it and any following letter in the identifier.

This rule, whilst painful/ugly at times, was rigorously enforced in safety
critical systems lest there be any possibility, no matter how remote, of
confusion with interpretation of nomenclature in systems engineering documents.

So, for example, the following were verboten:

ParseXmlDocument    (The correct acronym for Xml is XML)
ParseXMLDocument    (XMLD might be erroneously interpreted as a 4 letter
acronym)

The required formulation of the identifier in this case was "ParseXML_Document".

Ciao
Justin Johansson

Oct 03 2009

Jeremie Pelletier <jeremiep gmail.com> writes:

bearophile wrote:
 I have found this page linked from Reddit (click "Toggle all summaries" at the
top to read the full page):
 http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
 
 At Google C++ isn't the most used language, so it may be better to use a C++
style guide from a firm that uses C++ more than Google. On the other hand
Google has hired many good programmers, and probably some of them have strong
C++ experience, so if you are interested in C++/D this style guide deserves to
be read.
 
 This guide is mostly (as it often happens with C++) a list of features that
are forbidden, I think usually to reduce the total bug count of the programs.
Some of such imposed limits make me a little nervous, so I'd like to
remove/relax some of those limits, but I am ignorant regarding C++, while the
people that have written this document are expert, so their judgement has
weight.
 
 They forbid several features that are present in D too. Does it mean D has to
drop such features (or make them less "natural", so the syntax discourages
their use)?
 
 Here are a few things from that document that I think are somehow interesting.
Some of those things may be added to D style guide, or they may even suggest
changes in the language itself.

I think these are more programming guidelines than language design 
rules. That's like most academic teachers saying "goto" is evil and 
should never be used, yet new languages like D still support it.

 -------------------
 
 Function Parameter Ordering: When defining a function, parameter order is:
inputs, then outputs.<

 
 D may even enforce this, allowing "out" only after "in" arguments.

That can be good for readability in most cases, but I also like to order 
parameters in logical order instead of storage class order, enforcing 
parameter order would also break lots of existing code.

 Static and Global Variables: Static or global variables of class type are
forbidden: they cause hard-to-find bugs due to indeterminate order of
construction and destruction. [...] The order in which class constructors,
destructors, and initializers for static variables are called is only partially
specified in C++ and can even change from build to build, which can cause bugs
that are difficult to find. [...] As a result we only allow static variables to
contain POD data.<

 
 I think D avoids such problem.

Indeed, static ctors/dtors are very useful but I like to keep their 
number down to a minimum and perform lazy initialization instead.
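
On the C++ side, the usual form of lazy initialization is a function-local static. A minimal sketch with a hypothetical registry: the object is constructed the first time the accessor is called, so its construction no longer depends on the order in which translation units are initialized. Destruction order at program exit is still a separate concern that this sketch does not address.

```cpp
#include <string>
#include <vector>

// Hypothetical registry. The vector is constructed on the first call to
// Registry(); since C++11 that initialization is thread-safe.
inline std::vector<std::string>& Registry() {
    static std::vector<std::string> instance;
    return instance;
}

inline void Register(const std::string& name) {
    Registry().push_back(name);
}
```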

 -------------------
 
 Declaration Order: Use the specified order of declarations within a class:
public: before private:, methods before data members (variables), etc.<

 
 D may even enforce such order (Pascal does something similar).

Again, I wouldn't want to enforce such an order, sometimes I declare a 
private helper method right next to the set of public methods using it 
so I don't have to scroll down 400 lines to view the two.

 -------------------
 
 Reference Arguments: All parameters passed by reference must be labeled const.<

 
 In fact it is a very strong convention in Google code that input arguments are
values or const references while output arguments are pointers. Input
parameters may be const pointers, but we never allow non-const reference
parameters.<

 
 I think C solves part of such problem forcing the programmer to add "ref"
before the variable name in the calling place too. D may do the same.

I don't recall C having a "ref" keyword :)

That guideline I agree with, that's also how I write my parameters, 
although I take it a step further in D with in/const/immutable:

'in' for variables that are not modified and don't escape the method's 
scope.
'const' for variables that are not modified but escape the method's 
scope, maybe with a copy because the data may be mutable somewhere else.
'immutable' for variables that are not modified but escape the method's 
scope, never copied because they're expected to never change for their 
entire lifetime.

 -------------------
 
 Function Overloading: Use overloaded functions (including constructors) only
in cases where input can be specified in different types that contain the same
information.
 
 Cons: One reason to minimize function overloading is that overloading can make
it hard to tell which function is being called at a particular call site.
Another one is that most people are confused by the semantics of inheritance if
a deriving class overrides only some of the variants of a function.<

 
 Decision: If you want to overload a function, consider qualifying the name
with some information about the arguments, e.g., AppendString(), AppendInt()
rather than just Append().<

 
 
 This is a strong limitation. One of the things that makes C++ more handy than
C. I accept it for normal code, but I refuse it for "library code". Library
code is designed to be more flexible and reusable, making syntax simpler, etc.
 So I want D to keep overloaded functions.

I partly agree, function overloading is very nice if you need generic 
code. But I also agree with the guideline in that you should keep your 
overloads short and to the point.

For example on my output stream interface I allow writes from direct 
data or data from an input stream, those have different names instead of 
an overload because there's nothing generic here.

Anyways, considering how easy it is to write method templates in D 
overloading for different primitive types is almost unneeded.

 -------------------
 
 Default Arguments: We do not allow default function parameters.<

 
 Cons: People often figure out how to use an API by looking at existing code
that uses it. Default parameters are more difficult to maintain because
copy-and-paste from previous code may not reveal all the parameters.
Copy-and-pasting of code segments can cause major problems when the default
arguments are not appropriate for the new code.<

 
 Decision: We require all arguments to be explicitly specified, to force
programmers to consider the API and the values they are passing for each
argument rather than silently accepting defaults they may not be aware of.<

 
 
 This too is a strong limitation. I understand that it may make life a little
more complex, but they are handy. So I think their usage has to be limited, but
I don't like to totally forbid them.
 "Forcing the programmers to consider the API" has some negative side-effects
too that they seem to ignore. So I want D to keep its default function
parameters feature.

I completely agree here, JavaScript for example has no default 
parameters and it's annoying as hell. Looking at existing code is really 
handy to learn about the usage of a function when the documentation is 
too vague, that documentation is still the best source to learn about 
the parameters.

 -------------------
 
 Variable-Length Arrays and alloca(): We do not allow variable-length arrays or
alloca().<

 
 Cons: Variable-length arrays and alloca [...] allocate a data-dependent amount
of stack space that can trigger difficult-to-find memory overwriting bugs: "It
ran fine on my machine, but dies mysteriously in production".<

 
 Decision:  Use a safe allocator instead, such as scoped_ptr/scoped_array.<

 
 After reading this page:
 http://www.boost.org/doc/libs/1_40_0/libs/smart_ptr/scoped_array.htm
 I think they are just a pointer that points to heap-allocated memory, plus it
gets deallocated when the scope ends.
 
 In 99.5% of the cases a heap allocation is good enough in D (especially if the
GC gets better). But once in a while speed is more important, so for very small
arrays I'd like to have variable-length arrays in D (allocating large arrays on
the stack is always bad in production code).

I barely use alloca at all, since you don't always know if the array is 
going to be 50 bytes or 20k bytes. If you know the array's size or at 
least the max size it can get then you can just use a fixed-size array 
which will get allocated on the stack.

 -------------------
 
 Run-Time Type Information (RTTI): We do not use Run Time Type Information
(RTTI).<

 
 If you find yourself in need of writing code that behaves differently based on
the class of an object, consider one of the alternatives to querying the type.
Virtual methods are the preferred way of executing different code paths
depending on a specific subclass type. This puts the work within the object
itself. If the work belongs outside the object and instead in some processing
code, consider a double-dispatch solution, such as the Visitor design pattern.
This allows a facility outside the object itself to determine the type of class
using the built-in type system. If you think you truly cannot use those ideas,
you may use RTTI. But think twice about it. :-) Then think twice again. Do not
hand-implement an RTTI-like workaround. The arguments against RTTI apply just
as much to workarounds like class hierarchies with type tags. <

 
 I think this is in most situations acceptable. On the other hand I'd like D to
have a better implemented reflection (within the bounds of the things that can
be done by a static compiler, even if future D implementations may run on a VM,
like a future alternative LDC), that can be useful in unittesting.
 
 I am not sure about this, I don't use RTTI a lot in D code.

Me neither, in fact I would *love* to see a -nrtti switch in DMD to 
disable the generation of all ClassInfo and TypeInfo instances, along 
with a version identifier, maybe "version = RTTI_Disabled;" to let code 
handle it.

I use RTTI a lot for simple debugging like printing the name of a class 
or type in generic code or meta programming, but not at all in 
production code. Most of the time I can rely on .stringof and a message 
pragma to do the same.

 -------------------
 
 Casting: Use C++ casts like static_cast<>(). Do not use other cast formats
like int y = (int)x; or int y = int(x);.<

 
 Pros: The problem with C casts is the ambiguity of the operation; sometimes
you are doing a conversion (e.g., (int)3.5) and sometimes you are doing a cast
(e.g., (int)"hello"); C++ casts avoid this. Additionally C++ casts are more
visible when searching for them.<

 
 Do not use C-style casts. Instead, use these C++-style casts.

 * Use static_cast as the equivalent of a C-style cast that does value
conversion, or when you need to explicitly up-cast a pointer from a class to
its superclass.
 * Use const_cast to remove the const qualifier (see const).
 * Use reinterpret_cast to do unsafe conversions of pointer types to and from
integer and other pointer types. Use this only if you know what you are doing
and you understand the aliasing issues.
 * Do not use dynamic_cast except in test code. If you need to know type
information at runtime in this way outside of a unittest, you probably have a
design flaw.<
 
 I agree with them that mixing all different kinds of cast as in D is bad. In D
I'd like to know what I'm doing in a more precise way. This is something that
can be improved in D.

I also agree with you here, static/dynamic/reinterpret casts aren't that 
hard to understand in C++ and really say what the programmer wants to 
do, as well as letting the compiler warn you when it's not a possible cast.

It's all neat to have a single cast keyword that does it all, but it's 
even better to know what's happening and to control it, maybe the cast 
syntax can be extended like this:

cast(Object, static)(new Foo);

as well as dynamic and reinterpret identifiers, which wouldn't be 
keywords anywhere else in the language (just like __traits and pragma do)

 -------------------
 
 Integer Types:
 
 You should not use the unsigned integer types such as uint32_t, unless the
quantity you are representing is really a bit pattern rather than a number, or
unless you need defined twos-complement overflow. In particular, do not use
unsigned types to say a number will never be negative. Instead, use assertions
for this.<

 
 I'm for the removal of size_t from everywhere it's not strictly necessary (so
for example from array lengths) to avoid bugs.

I don't think this guideline was about the size of integrals but rather 
their sign bit.

 See also the recent thread about signed-unsigned issues:
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=17800
 
 Integer overflow tests will help too.

Yeah I would like overflow tests in D too, although I don't like how you 
can't control which tests are used and which aren't, they're either all 
enabled or all disabled.

 -------------------
 
 Boost:
 
 Cons: Some Boost libraries encourage coding practices which can hamper
readability, such as metaprogramming and other advanced template techniques,
and an excessively "functional" style of programming.<

 
 Advanced use of templates makes the code less easy to understand. But
sometimes functional style makes code shorter, more readable, safer
multiprocessing-wise, sometimes even parallelizable, etc.

Boost is the best thing to happen to C++! I agree it can get very hard 
to maintain readability in C++, but D does not have that problem. 
Templates in D are very elegant and much more powerful than C++'s at the 
same time.

It really depends on what you're coding, for example I use very few 
templates in a GUI interface but I use templates on nearly every 
function to handle strings. I also use templates a lot as class/method 
traits to lower the runtime overhead.

 -------------------
 
 Type Names: often I don't like the C++ practice of using a single uppercase
letter for a template type, like T. Better to give a meaningful name to types,
when possible.

I think T fits generic template parameters the same way i fits for loops :)

 -------------------
 
 Class Data Members: Data members (also called instance variables or member
variables) are lowercase with optional underscores like regular variable names,
but always end with a trailing underscore.<

 
 D may even enforce some simple syntax for class members, like that underscore
or something else. No other variable is allowed to share the same syntax (so
this syntax is used iff it's a class member). It makes conversions from other
languages a little more work, but I think it will pay off.

I don't think it should be enforced by the language, it's a great 
guideline but the programmer should be free to select its flavor (ie 
m_var, mVar, _var, var_, etc)

 -------------------
 
 Regular Functions: Functions should start with a capital letter and have a
capital letter for each new word. No underscores:<

 
 That's ugly.

That's how I write my method names! Maybe I did too much code around the 
win32 api, the Mozilla code also uses these method names.

I like it that way cause I can easily differentiate variableNames from 
MethodNames from CONSTANT_NAMES :)

 -------------------
 
 Spaces vs. Tabs: Use only spaces, and indent 2 spaces at a time.<

 
 4 spaces are more readable :-)

Tabs are better since the editor can be set to whatever number of spaces 
you wish for them :) I use 4 myself.

 -------------------
 
 Loops and Conditionals:
 
 for ( ; i < 5 ; ++i) {  // For loops always have a space after the
   ...                   // semicolon, and may have a space before the
                         // semicolon.
 
 That space before the ; is quite important. But I don't think there's a need
for a warning if it's absent.

Why would there be a warning?

 -------------------
 
 Bye,
 bearophile

Oct 03 2009

bearophile <bearophileHUGS lycos.com> writes:

Jeremie Pelletier:

I think these are more programming guidelines than language design rules.<

Yes, of course. But programming guidelines can give possible ideas to a
language designer, because:
- if everyone is encouraged to follow a certain idiom to avoid bugs, it may be
good to let the language itself enforce the idiom (see D that disallows
for(...); ).
- if most similar guidelines suggest to not use a certain language feature,
such feature may need a redesign, or maybe to be made "less nice" syntax-wise,
so the syntax shows its usage is discouraged.
- if many guidelines suggest to do something in a standard way, to improve
uniformity, it may be good to add such thing too to help spreading and
transmission of code in the programmer community of that language. One of the
causes of Python success is that it forces a very uniform coding style, and
this helps people understand and modify each other's code, this helps a little
the creation of an ecosystem of reusable code. The compile-enforcing of syntax
for class attributes in D can be one of such things.


enforcing parameter order would also break lots of existing code.<

D2 is in flux still, every release breaks existing code.


I don't recall C having a "ref" keyword :)<




I completely agree here, JavaScript for example has no default parameters and
it's annoying as hell. Looking at existing code is really handy to learn about
the usage of a function when the documentation is too vague, that documentation
is still the best source to learn about the parameters.<

I'm waiting for named arguments too in D :-)


I barely use alloca at all, since you don't always know if the array is going
to be 50 bytes or 20k bytes. If you know the array's size or at least the max
size it can get then you can just use a fixed-size array which will get
allocated on the stack.<

I was talking about a smarter function that allocates on the heap if the
requested size is too large or if the stack is running out :-) But of course
fixed-size arrays are often enough.


I don't think this guideline was about the size of integrals but rather their
sign bit.<

Right, I meant unsigned integral numbers.


Yeah I would like overflow tests in D too, although I don't like how you can't
control which tests are used and which aren't, they're either all enabled or all
disabled.<

There are ways to solve this problem/limit. Putting basic tests in is a
starting point. I have given LLVM developers some small enhancement requests
to implement such tests more efficiently:
http://llvm.org/bugs/show_bug.cgi?id=4916
http://llvm.org/bugs/show_bug.cgi?id=4917
http://llvm.org/bugs/show_bug.cgi?id=4918
I have also discussed this topic with LDC developers, for possible
implementations.


I agree it can get very hard to maintain readability in C++, but D does not
have that problem. Templates in D are very elegant and much more powerful than
C++'s at the same time.<

D template programming can become very unreadable, trust me :-)


I think T fits generic template parameters the same way i fits for loops :)<

Sometimes I avoid "i" for loops :-)


I don't think it should be enforced by the language, it's a great guideline but
the programmer should be free to select its flavor (ie m_var, mVar, _var, var_,
etc)<

Here I don't agree with you. Uniformity in such thing is important enough.

Bye and thank you for your answers,
bearophile

Oct 03 2009

Jeremie Pelletier <jeremiep gmail.com> writes:

bearophile wrote:
 Jeremie Pelletier:
 
 I think these are more programming guidelines than language design rules.<

 
 Yes, of course. But programming guidelines can give possible ideas to a
language designer, because:
 - if everyone is encouraged to follow a certain idiom to avoid bugs, it may be
good to let the language itself enforce the idiom (see D that disallows
for(...); ).
 - if most similar guidelines suggest to not use a certain language feature,
such feature may need a redesign, or maybe to be made "less nice" syntax-wise,
so the syntax shows its usage is discouraged.
 - if many guidelines suggest doing something in a standard way, to improve
uniformity, it may be good to add such a thing too, to help the spreading and
transmission of code in the programmer community of that language. One of the
causes of Python's success is that it forces a very uniform coding style, and
this helps people understand and modify each other's code, which helps a little
the creation of an ecosystem of reusable code. Compiler-enforced syntax
for class attributes in D can be one of such things.

I'm not sure if that's a good thing, different companies enforce 
different guidelines for different reasons, and then you have 
independent programmers with their own guidelines too.

As for less nice syntax, I'd hate to use __goto, __traits is already 
ugly enough that I always hide it behind a template with a nicer name 
and let's not even talk about __gshared showing its ugly self all over my 
C bindings :)

Maybe the compiler could have a -strict switch to enforce a certain 
guideline over code; we already have -safe for enforcement over memory 
usage! Such an enforcement would then be an awesome feature for D to 
have. I'm not against the idea, I'm against making it the only available 
option!

 I was talking about a smarter function that allocates on the heap if the
requested size is too large or if the stack is running out :-) But of course
fixed-size arrays are often enough.

Those smarts have some overhead to them to first check the allocation 
size and the remaining stack size, and finally call the appropriate 
allocator, that overhead would almost make such a smart function useless 
when compared to direct heap allocations.

 D template programming can become very unreadable, trust me :-)

Not anymore than any other bit of code :)

 Sometimes I avoid "i" for loops :-)

Sometimes I avoid "T" for templates :)

 Here I don't agree with you. Uniformity in such thing is important enough.

Again I believe such an enforcement should be behind a -strict switch. I 
agree with you that uniformity can be a great thing and I can only 
imagine all the good it does to the Python community. However we're 
talking systems programming here, people want the choice between using 
the feature or not using it :)

Jeremie

Oct 03 2009

Christopher Wright <dhasenan gmail.com> writes:

Jeremie Pelletier wrote:
 Me neither, in fact I would *love* to see a -nrtti switch in DMD to 
 disable the generation of all ClassInfo and TypeInfo instances, along 
 with a version identifier, maybe "version = RTTI_Disabled;" to let code 
 handle it.
 
 I use RTTI a lot for simple debugging like printing the name of a class 
 or type in generic code or meta programming, but not at all in 
 production code. Most of the time I can rely on .stringof and a message 
 pragma to do the same.

You use RTTI for dynamic casts, variadic functions, and the default 
implementation of toString. You could safely eliminate some fields from 
ClassInfo and TypeInfo, but you can't get rid of them entirely.

The best you can do is make TypeInfo entirely opaque (no fields) and 
only include the base class, interfaces, and name for ClassInfo.

Oct 04 2009

Jeremie Pelletier <jeremiep gmail.com> writes:

Christopher Wright wrote:
 Jeremie Pelletier wrote:
 Me neither, in fact I would *love* to see a -nrtti switch in DMD to 
 disable the generation of all ClassInfo and TypeInfo instances, along 
 with a version identifier, maybe "version = RTTI_Disabled;" to let 
 code handle it.

 I use RTTI a lot for simple debugging like printing the name of a 
 class or type in generic code or meta programming, but not at all in 
 production code. Most of the time I can rely on .stringof and a 
 message pragma to do the same.

 
 You use RTTI for dynamic casts, variadic functions, and the default 
 implementation of toString. You could safely eliminate some fields from 
 ClassInfo and TypeInfo, but you can't get rid of them entirely.
 
 The best you can do is make TypeInfo entirely opaque (no fields) and 
 only include the base class, interfaces, and name for ClassInfo.

Yeah something like "don't generate type names" and other extra 
information would be a definite plus, that makes reverse engineering 
too easy :)

Oct 04 2009

Don <nospam nospam.com> writes:

Jeremie Pelletier wrote:
 Christopher Wright wrote:
 Jeremie Pelletier wrote:
 Me neither, in fact I would *love* to see a -nrtti switch in DMD to 
 disable the generation of all ClassInfo and TypeInfo instances, along 
 with a version identifier, maybe "version = RTTI_Disabled;" to let 
 code handle it.

 I use RTTI a lot for simple debugging like printing the name of a 
 class or type in generic code or meta programming, but not at all in 
 production code. Most of the time I can rely on .stringof and a 
 message pragma to do the same.

 You use RTTI for dynamic casts, variadic functions, and the default 
 implementation of toString. You could safely eliminate some fields 
 from ClassInfo and TypeInfo,  but you can't get rid of them entirely.

 The best you can do is make TypeInfo entirely opaque (no fields) and 
 only include the base class, interfaces, and name for ClassInfo.

 
 Yeah something like "don't generate type names" and other extra 
 information would be a definite plus, that makes reverse engineering 
 too easy :)

I've often thought that a pragma for a module to "don't generate module 
info" would be very useful for executable size. I'm particularly 
thinking of bindings like the Win32 headers, where there are a hundred 
modules, and the module info isn't actually useful. There could be a 
default ModuleInfo instance, with module name "ModuleInfoUnavailable", 
which all such modules would point to.

Oct 05 2009

bearophile <bearophileHUGS lycos.com> writes:

Don:

 I've often thought that a pragma for a module to "don't generate module 
 info" would be very useful for executable size.

Do you use the LDC compiler?

LDC has the pragmas:
pragma(no_typeinfo): You can use this pragma to stop typeinfo from being
implicitly generated for a declaration.

pragma(no_moduleinfo): You can use this pragma to stop moduleinfo from being
implicitly generated for a declaration.

I've never used those yet, I'll try them soon.

But you meant something more global, module-wide. Maybe you can ask the LDC
devs. I agree that having standard and not compiler-specific features is better.

Bye,
bearophile

Oct 05 2009

Jeremie Pelletier <jeremiep gmail.com> writes:

bearophile wrote:
 Don:
 
 I've often thought that a pragma for a module to "don't generate module 
 info" would be very useful for executable size.

 
 Do you use the LDC compiler?
 
 LDC has the pragmas:
 pragma(no_typeinfo): You can use this pragma to stop typeinfo from being
implicitly generated for a declaration.
 
 pragma(no_moduleinfo): You can use this pragma to stop moduleinfo from being
implicitly generated for a declaration.
 
 I've never used those yet, I'll try them soon.
 
 But you meant something more global, module-wide. Maybe you can ask the LDC
devs. I agree that having standard and not compiler-specific features is better.
 
 Bye,
 bearophile

I would much prefer these to be compiler switches, so you can make them 
global with very little effort.

Oct 05 2009

bearophile <bearophileHUGS lycos.com> writes:

Jeremie Pelletier:

 I would much prefer these to be compiler switches, so you can make them 
 global with very little effort.

Compiler switches are a blunt tool. So I think module-wide switches are better.

Bye,
bearophile

Oct 05 2009

Don <nospam nospam.com> writes:

Jeremie Pelletier wrote:
 bearophile wrote:
 Don:

 I've often thought that a pragma for a module to "don't generate 
 module info" would be very useful for executable size.

 Do you use the LDC compiler?

 LDC has the pragmas:
 pragma(no_typeinfo): You can use this pragma to stop typeinfo from 
 being implicitly generated for a declaration.

 pragma(no_moduleinfo): You can use this pragma to stop moduleinfo from 
 being implicitly generated for a declaration.


Sounds great. They should be standard.

 I've never used those yet, I'll try them soon.

 But you meant something more global, module-wide. Maybe you can ask the 
 LDC devs. I agree that having standard and not compiler-specific 
 features is better.

 Bye,
 bearophile

 
 I would much prefer these to be compiler switches, so you can make them 
 global with very little effort.

That's a completely different use case, I think. For internal modules, 
the existence of that module is an implementation detail, and shouldn't 
be externally visible even through reflection, IMHO.

Oct 05 2009

Sean Kelly <sean invisibleduck.org> writes:

== Quote from Don (nospam nospam.com)'s article
 Jeremie Pelletier wrote:
 Christopher Wright wrote:
 Jeremie Pelletier wrote:
 Me neither, in fact I would *love* to see a -nrtti switch in DMD to
 disable the generation of all ClassInfo and TypeInfo instances, along
 with a version identifier, maybe "version = RTTI_Disabled;" to let
 code handle it.

 I use RTTI a lot for simple debugging like printing the name of a
 class or type in generic code or meta programming, but not at all in
 production code. Most of the time I can rely on .stringof and a
 message pragma to do the same.

 You use RTTI for dynamic casts, variadic functions, and the default
 implementation of toString. You could safely eliminate some fields
 from ClassInfo and TypeInfo,  but you can't get rid of them entirely.

 The best you can do is make TypeInfo entirely opaque (no fields) and
 only include the base class, interfaces, and name for ClassInfo.

 Yeah something like "don't generate type names" and other extra
 information would be a definite plus, that makes reverse engineering
 too easy :)

 I've often thought that a pragma for a module to "don't generate module
 info" would be very useful for executable size. I'm particularly
 thinking of bindings like the Win32 headers, where there are a hundred
 modules, and the module info isn't actually useful. There could be a
 default ModuleInfo instance, with module name "ModuleInfoUnavailable",
 which all such modules would point to.

One thing that can trip this up is structs containing floating point numbers
or static arrays, since they have custom initializers.  I've taken to declaring
structs from C headers with an "= void" to eliminate the link dependency,
but maybe the initializer could be eliminated by declaring the struct as:

struct S
{
    char c      = 0;
    float[2] f = 0.0[];
}

Or something like that.

Oct 05 2009

sclytrack <idiot hotmail.com> writes:

Function Parameter Ordering: When defining a function, parameter order is:
inputs, then outputs.<

 D may even enforce this, allowing "out" only after "in" arguments.
 -------------------

Function Default Arguments

void foo(int x, int y = 3)
{
   ...
}
...
foo(4);   // same as foo(4, 3)

Oct 04 2009

Jérôme M. Berger <jeberger free.fr> writes:

bearophile wrote:
 I have found this page linked from Reddit (click "Toggle all summaries" at the
top to read the full page):
 http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
 
 At Google C++ isn't the most used language, so it may be better to use a C++
style guide from a firm that uses C++ more than Google. On the other hand
Google has hired many good programmers, and probably some of them have strong
C++ experience, so if you are interested in C++/D this style guide deserves to
be read.
 
 This guide is mostly (as it often happens with C++) a list of features that
are forbidden, I think usually to reduce the total bug count of the programs.
Some of such imposed limits make me a little nervous, so I'd like to
remove/relax some of those limits, but I am ignorant regarding C++, while the
people that have written this document are expert, so their judgement has
weight.
 
 They forbid several features that are present in D too. Does it means D has to
drop such features (or make them less "natural", so the syntax discourages
their use)?
 
 Here are few things from that document that I think are somehow interesting.
Some of those things may be added to D style guide, or they may even suggest
changes in the language itself.
 
 -------------------
 
 Function Parameter Ordering: When defining a function, parameter order is:
inputs, then outputs.<
 
 D may even enforce this, allowing "out" only after "in" arguments.

	I actually use the inverse convention: "out" arguments come first.
This way, it is easy to see that "a = b" and "assign (a, b)" modify
"a" and not "b".

		Jerome
-- 
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr

Oct 04 2009

Justin Johansson <no spam.com> writes:

Jérôme M. Berger Wrote:

 bearophile wrote:
 I have found this page linked from Reddit (click "Toggle all summaries" at the
top to read the full page):
 http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
 
 At Google C++ isn't the most used language, so it may be better to use a C++
style guide from a firm that uses C++ more than Google. On the other hand
Google has hired many good programmers, and probably some of them have strong
C++ experience, so if you are interested in C++/D this style guide deserves to
be read.
 
 This guide is mostly (as it often happens with C++) a list of features that
are forbidden, I think usually to reduce the total bug count of the programs.
Some of such imposed limits make me a little nervous, so I'd like to
remove/relax some of those limits, but I am ignorant regarding C++, while the
people that have written this document are expert, so their judgement has
weight.
 
 They forbid several features that are present in D too. Does it means D has to
drop such features (or make them less "natural", so the syntax discourages
their use)?
 
 Here are few things from that document that I think are somehow interesting.
Some of those things may be added to D style guide, or they may even suggest
changes in the language itself.
 
 -------------------
 
 Function Parameter Ordering: When defining a function, parameter order is:
inputs, then outputs.<

 
 D may even enforce this, allowing "out" only after "in" arguments.
 

 	I actually use the inverse convention: "out" arguments come first. 
 This way, it is easy to see that "a = b" and "assign (a, b)" modify 
 "a" and not "b".
 
 		Jerome

Ditto.  A special use case to consider is when you have a function template
that returns a type that is a template parameter and the types of the function
arguments are also template parameters.

Often type inference can be used to determine the type of the function
arguments without explicit qualification of the argument types in the
instantiation.  The return type must be specified however, since inference
cannot be made from missing information.

This suggests a natural order that results (and out arguments) should be on the
LHS and (in) arguments on the RHS.

So if one writes this:

R Foo(R, A1, A2)( A1 arg1, A2 arg2) {
  R r;
  return r;
}

auto r = Foo!(double)( 3, 4);

Isn't it more natural or consistent to write this also:

void Bar(R, A1, A2)( out R r, A1 arg1, A2 arg2) {
}

double r;
Bar!(double)( r, 3, 4);

I haven't tried it so not sure if this works but you get the idea.
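
For what it's worth, here is a minimal sketch of the out-first form in D2. The names `sum` and `total` are made up for illustration, and I have not checked it against any particular compiler release. Since the out argument already carries the result type, the compiler should be able to infer every template parameter from the call, so no explicit instantiation is needed:

```d
// Hypothetical helper: the result is written through the leading out parameter.
void sum(R, A1, A2)(out R r, A1 arg1, A2 arg2)
{
    r = arg1 + arg2; // an out parameter starts at R.init and is assigned here
}

unittest
{
    double total;
    sum(total, 3, 4); // R = double, A1 = A2 = int, all inferred from the call
    assert(total == 7.0);
}
```

With the return-value form, R has to be spelled out because nothing in the argument list mentions it.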

Another reason why outs/inouts should be before in arguments is the case of
functions taking variable length argument lists or variadic arguments.  Normally
there is only one output argument but there is an arbitrary number of input
arguments that the function can take.

Yet another reason is the analogy with output stream functions; an output
stream argument is analogous to an output value or reference.

Nearly all I/O libraries that I've seen have usage like this:

fprintf( stdout, /+args...+/);

write( os, value);

Rarely the other way around, namely, input arguments before the output
stream/file channel argument.

-- Justin Johansson

Oct 04 2009

bearophile <bearophileHUGS lycos.com> writes:

Justin Johansson:

 The return type must be specified however,
 since inference cannot be made from missing information.

If the information isn't missing in D2 you can sometimes use "auto" return type
for function templates and some functions, and in some other situations you can
also use typeof(return).
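
A small sketch of both, assuming a D2 compiler (the function names are mine, just for illustration):

```d
// The return type is inferred from the return expression.
auto average(A, B)(A a, B b)
{
    return (a + b) / 2.0;
}

// typeof(return) names the declared return type inside the function body.
double half(int x)
{
    typeof(return) result = x / 2.0;
    return result;
}

unittest
{
    static assert(is(typeof(average(3, 4)) == double));
    assert(average(3, 4) == 3.5);
    assert(half(5) == 2.5);
}
```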

Bye,
bearophile

Oct 04 2009

Kagamin <spam here.lot> writes:

bearophile Wrote:

Function Parameter Ordering: When defining a function, parameter order is:
inputs, then outputs.<

 
 D may even enforce this, allowing "out" only after "in" arguments.

I'm trying to do the reverse. Maybe I used fprintf and sprintf too much.

Static and Global Variables: Static or global variables of class type are
forbidden: they cause hard-to-find bugs due to indeterminate order of
construction and destruction. [...] The order in which class constructors,
destructors, and initializers for static variables are called is only partially
specified in C++ and can even change from build to build, which can cause bugs
that are difficult to find. [...] As a result we only allow static variables to
contain POD data.<

 
 I think D avoids such problem.

No. D has static constructors which do the same.

Doing Work in Constructors: Do only trivial initialization in a constructor. If
at all possible, use an Init() method for non-trivial initialization. [...] If
the work calls virtual functions, these calls will not get dispatched to the
subclass implementations. Future modification to your class can quietly
introduce this problem even if your class is not currently subclassed, causing
much confusion.<


I never understood this advice to split the construction of an object. What is
it trying to solve? And how do they plan to not dispatch calls to subclasses? Do
they overwrite the vtbl at the end of the constructor? In fact DMD has a bug here:
the spec says the this pointer must not be taken implicitly or explicitly, yet dmd
allows calling virtual methods on the object being constructed.

Declaration Order: Use the specified order of declarations within a class:
public: before private:, methods before data members (variables), etc.<

 
 D may even enforce such order (Pascal does something similar).

Methods before data seems unnatural to me.

Decision: If you want to overload a function, consider qualifying the name with
some information about the arguments, e.g., AppendString(), AppendInt() rather
than just Append().<

 
 
 This is a strong limitation. One of the things that makes C++ more handy than
C. I accept it for normal code, but I refuse it for "library code". Library
code is designed to be more flexible and reusable, making syntax simpler, etc.
 So I want D to keep overloaded functions.

A good example is BinaryWriter. It's unusable when implemented with overloaded
methods.

Default Arguments: We do not allow default function parameters.<

 
Decision: We require all arguments to be explicitly specified, to force
programmers to consider the API and the values they are passing for each
argument rather than silently accepting defaults they may not be aware of.<

 

Is it a solution? Default parameters can be emulated by overloads with a
different number of parameters, which call the actual method with defaults for
the rest of the parameters. Do they just propose to always use the full api? How
about going back to asm to consider your code rather than accepting compiler magic?
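
To show what I mean by emulation, a rough sketch in D (the `scale` function and its default of 3 are invented for the example):

```d
// Hypothetical API: the "full" version takes every argument explicitly.
int scale(int value, int factor)
{
    return value * factor;
}

// Overload standing in for a default argument of factor = 3.
int scale(int value)
{
    return scale(value, 3);
}

unittest
{
    assert(scale(4) == scale(4, 3));
    assert(scale(4) == 12);
}
```

The caller of `scale(4)` accepts a default they may not be aware of, exactly as with a default parameter, so banning the latter alone doesn't buy much.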

 Integer Types:
 
You should not use the unsigned integer types such as uint32_t, unless the
quantity you are representing is really a bit pattern rather than a number, or
unless you need defined twos-complement overflow. In particular, do not use
unsigned types to say a number will never be negative. Instead, use assertions
for this.<

 
 I'm for the removal of size_t from everywhere it's not strictly necessary (so
for example from array lengths) to avoid bugs.

Yess, unsigneds are evil. They must go to the camp of gotos and unsafe pointers.

 Type Names: often I don't like the C++ practice of using a single uppercase
letter for a template type, like T. Better to give a meaningful name to types,
when possible.
 

I thought it's a common practice that the length (meaningfulness) of the name
of a variable is determined more by the size of its scope rather than its
purpose.

Spaces vs. Tabs: Use only spaces, and indent 2 spaces at a time.<

 
 4 spaces are more readable :-)
 

I prefer 3. 4 is too much. Almost every editor has the option to specify the
tab width and people have different tastes.

Oct 05 2009

Kagamin <spam here.lot> writes:

Kagamin Wrote:

 In fact DMD has a bug here: the spec says the this pointer must not be taken
implicitly or explicitly, yet dmd allows calling virtual methods on the object
being constructed.

Ah... I've misread the spec a little. Though I think it's still a problem that
a constructor is allowed to call virtual methods.
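
To make the concern concrete, a small sketch (class names invented for the example). Unlike C++, D dispatches such a call to the subclass override, so the override can run before the subclass constructor body has executed:

```d
class Base
{
    string seen;

    this()
    {
        seen = describe(); // virtual call while the object is still being built
    }

    string describe() { return "Base"; }
}

class Derived : Base
{
    override string describe() { return "Derived"; }
}

unittest
{
    auto d = new Derived;
    assert(d.seen == "Derived"); // the override ran from Base's constructor
}
```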

Oct 05 2009

bearophile <bearophileHUGS lycos.com> writes:

Kagamin:

 I'm for the removal of size_t from everywhere it's not strictly necessary (so
for example from array lengths) to avoid bugs.

 
 Yess, unsigneds are evil. They must go to the camp of gotos and unsafe
pointers.

In D it's better to not use them when you want a strictly positive number, or
for general iteration purposes, etc. So I don't like to see them used in the
built-ins and std lib where they aren't necessary.
I use them when I need bitfields, or when I need the full range (but that's
less common).
If you want me to list something that's a little evil, it's the automatic silent
cast from an integral to its unsigned version. I'd like to disallow such silent
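
A tiny sketch of the kind of thing I mean (variable names are just for illustration; it assumes the current implicit conversion rules):

```d
unittest
{
    int i = -1;
    uint u = i;                   // accepted without a cast
    assert(u == uint.max);        // -1 silently became the largest uint

    int[] empty;
    auto last = empty.length - 1; // size_t arithmetic wraps around
    assert(last == size_t.max);
}
```

The second case is the usual trap with unsigned array lengths: the "index of the last item" of an empty array is a huge number, not -1.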


Regarding pointers, they are unsafe, but there are ways to increase their
safety a little, with no performance costs in release mode. I think this is
positive because it helps find and fix bugs in less time.

Bye,
bearophile

Oct 05 2009

bearophile <bearophileHUGS lycos.com> writes:

If you want me to list something that's a little evil, it's the automatic silent
cast from an integral to its unsigned version. I'd like to disallow such silent


We may even disallow all implicit conversions that lose a significant amount of
information:

double => float
real => float


And maybe even (but this is less handy, so I am not sure): real => double

------------------------

Even long => real sometimes loses a little information:

import std.stdio: writeln;
void main() {
    real r = long.min;
    writeln(r, " ", cast(long)r, " ", long.max-cast(long)r);
}

But for now I'm not interested in regulating long => real implicit casts.

Bye,
bearophile

Oct 05 2009

bearophile <bearophileHUGS lycos.com> writes:

Jeremie Pelletier:

 However we're 
 talking systems programming here, people want the choice between using 
 the feature or not using it :)

We aren't talking about a feature here, but a standard syntax to denote class
attributes. And D being a system language has nothing to do with being free to
make such choices. A system language has to give freedom in how you use
memory or how you use the CPU at runtime; that has nothing to do with the syntax
you use to write identifiers. Such freedom isn't required.

Bye,
bearophile

Oct 05 2009
