
digitalmars.D - Google C++ style guide

bearophile <bearophileHUGS lycos.com> writes:
I have found this page linked from Reddit (click "Toggle all summaries" at the
top to read the full page):
http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml

At Google C++ isn't the most used language, so it may be better to use a C++
style guide from a firm that uses C++ more than Google. On the other hand
Google has hired many good programmers, and probably some of them have strong
C++ experience, so if you are interested in C++/D this style guide deserves to
be read.

This guide is mostly (as it often happens with C++) a list of features that are
forbidden, I think usually to reduce the total bug count of the programs. Some
of such imposed limits make me a little nervous, so I'd like to remove/relax
some of those limits, but I am ignorant regarding C++, while the people that
have written this document are expert, so their judgement has weight.

They forbid several features that are present in D too. Does it mean D has to
drop such features (or make them less "natural", so the syntax discourages
their use)?

Here are a few things from that document that I think are somehow interesting.
Some of those things may be added to D style guide, or they may even suggest
changes in the language itself.

-------------------

Function Parameter Ordering: When defining a function, parameter order is:
inputs, then outputs.<
D may even enforce this, allowing "out" only after "in" arguments.

-------------------

Nested Classes: Do not make nested classes public unless they are actually part
of the interface, e.g., a class that holds a set of options for some method.<
-------------------
Static and Global Variables: Static or global variables of class type are
forbidden: they cause hard-to-find bugs due to indeterminate order of
construction and destruction. [...] The order in which class constructors,
destructors, and initializers for static variables are called is only partially
specified in C++ and can even change from build to build, which can cause bugs
that are difficult to find. [...] As a result we only allow static variables to
contain POD data.<
I think D avoids such problem.

-------------------

Doing Work in Constructors: Do only trivial initialization in a constructor. If
at all possible, use an Init() method for non-trivial initialization. [...] If
the work calls virtual functions, these calls will not get dispatched to the
subclass implementations. Future modification to your class can quietly
introduce this problem even if your class is not currently subclassed, causing
much confusion.<
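
A minimal C++ sketch of the dispatch problem the guide describes, assuming C++11. Base, Derived, Name() and Init() are hypothetical names chosen for illustration, not code from the guide:

```cpp
#include <string>

// Hypothetical classes, for illustration only.
class Base {
 public:
  // While Base is being constructed the object is still "a Base", so this
  // call resolves to Base::Name() even when a Derived is being created.
  Base() { name_ = Name(); }
  virtual ~Base() = default;

  virtual std::string Name() const { return "Base"; }

  // Called after construction has finished: dispatches to the subclass.
  void Init() { name_ = Name(); }

  const std::string& name() const { return name_; }

 private:
  std::string name_;
};

class Derived : public Base {
 public:
  std::string Name() const override { return "Derived"; }
};
```

Constructing a Derived leaves name() equal to "Base"; only after calling Init() does it become "Derived", which is the kind of quiet surprise the guide warns about.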
-------------------
Declaration Order: Use the specified order of declarations within a class:
public: before private:, methods before data members (variables), etc.<
D may even enforce such order (Pascal does something similar).

-------------------

Reference Arguments: All parameters passed by reference must be labeled const.<
In fact it is a very strong convention in Google code that input arguments are
values or const references while output arguments are pointers. Input
parameters may be const pointers, but we never allow non-const reference
parameters.<
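
A small sketch of that convention, assuming C++11. CountVowels is a hypothetical function used only to show the shape of the signature:

```cpp
#include <string>

// Hypothetical function: the input is a const reference, the output is a
// pointer, so the caller has to write &n and the mutation is visible
// at the call site.
void CountVowels(const std::string& text, int* count) {
  const std::string vowels = "aeiou";
  *count = 0;
  for (char c : text) {
    if (vowels.find(c) != std::string::npos) ++*count;
  }
}
```

A call then reads CountVowels(word, &n); the & plays roughly the role that a call-site "ref" marker would play.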
I think C solves part of such problem forcing the programmer to add "ref" before the variable name in the calling place too. D may do the same.

-------------------

Function Overloading: Use overloaded functions (including constructors) only
in cases where input can be specified in different types that contain the same
information.
Cons: One reason to minimize function overloading is that overloading can make
it hard to tell which function is being called at a particular call site.
Another one is that most people are confused by the semantics of inheritance if
a deriving class overrides only some of the variants of a function.<
Decision: If you want to overload a function, consider qualifying the name with
some information about the arguments, e.g., AppendString(), AppendInt() rather
than just Append().<
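
The inheritance confusion mentioned in the "Cons" can be sketched as follows, assuming C++11. Writer, IntWriter and FixedWriter are hypothetical types invented for this example:

```cpp
#include <string>

// Hypothetical base class with two overloads of Append.
struct Writer {
  virtual ~Writer() = default;
  virtual std::string Append(int) { return "Writer::Append(int)"; }
  virtual std::string Append(double) { return "Writer::Append(double)"; }
};

// Overrides only one variant. Inside IntWriter's scope the name Append now
// refers to this overload alone, so Append(double) is hidden.
struct IntWriter : Writer {
  std::string Append(int) override { return "IntWriter::Append(int)"; }
};

// Same override, but the using-declaration makes the base overloads
// visible again.
struct FixedWriter : Writer {
  using Writer::Append;
  std::string Append(int) override { return "FixedWriter::Append(int)"; }
};
```

Calling Append(2.5) on an IntWriter silently converts the double to int and picks the int overload, while the same call on a FixedWriter, or through a Writer reference, reaches Writer::Append(double).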
This is a strong limitation. One of the things that makes C++ more handy than C. I accept it for normal code, but I refuse it for "library code". Library code is designed to be more flexible and reusable, making syntax simpler, etc. So I want D to keep overloaded functions.

-------------------

Default Arguments: We do not allow default function parameters.<
Cons: People often figure out how to use an API by looking at existing code
that uses it. Default parameters are more difficult to maintain because
copy-and-paste from previous code may not reveal all the parameters.
Copy-and-pasting of code segments can cause major problems when the default
arguments are not appropriate for the new code.<
Decision: We require all arguments to be explicitly specified, to force
programmers to consider the API and the values they are passing for each
argument rather than silently accepting defaults they may not be aware of.<
This too is a strong limitation. I understand that it may make life a little more complex, but they are handy. So I think their usage has to be limited, but I don't like to totally forbid them. "Forcing the programmers to consider the API" has some negative side-effects too that they seem to ignore. So I want D to keep its default function parameters feature.

-------------------

Variable-Length Arrays and alloca(): We do not allow variable-length arrays or
alloca().<
Cons: Variable-length arrays and alloca [...] allocate a data-dependent amount
of stack space that can trigger difficult-to-find memory overwriting bugs: "It
ran fine on my machine, but dies mysteriously in production".<
Decision:  Use a safe allocator instead, such as scoped_ptr/scoped_array.<
After reading this page:
http://www.boost.org/doc/libs/1_40_0/libs/smart_ptr/scoped_array.htm
I think they are just a pointer that points to heap-allocated memory, plus it gets deallocated when the scope ends. In 99.5% of the cases a heap allocation is good enough in D (especially if the GC gets better). But once in a while speed is more important, so for very small arrays I'd like to have variable-length arrays in D (allocating large arrays on the stack is always bad in production code).

-------------------

Run-Time Type Information (RTTI): We do not use Run Time Type Information
(RTTI).<
If you find yourself in need of writing code that behaves differently based on
the class of an object, consider one of the alternatives to querying the type.
Virtual methods are the preferred way of executing different code paths
depending on a specific subclass type. This puts the work within the object
itself. If the work belongs outside the object and instead in some processing
code, consider a double-dispatch solution, such as the Visitor design pattern.
This allows a facility outside the object itself to determine the type of class
using the built-in type system. If you think you truly cannot use those ideas,
you may use RTTI. But think twice about it. :-) Then think twice again. Do not
hand-implement an RTTI-like workaround. The arguments against RTTI apply just
as much to workarounds like class hierarchies with type tags. <
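
A compact sketch of the double-dispatch (Visitor) alternative named above, assuming C++11. All class names here are hypothetical and chosen only for illustration:

```cpp
#include <string>

class Circle;
class Square;

// Hypothetical visitor interface: one Visit overload per concrete shape.
class ShapeVisitor {
 public:
  virtual ~ShapeVisitor() = default;
  virtual void Visit(const Circle& circle) = 0;
  virtual void Visit(const Square& square) = 0;
};

class Shape {
 public:
  virtual ~Shape() = default;
  // First dispatch: a virtual call on the shape itself.
  virtual void Accept(ShapeVisitor* visitor) const = 0;
};

class Circle : public Shape {
 public:
  // Second dispatch: the overload is chosen by the static type of *this.
  void Accept(ShapeVisitor* visitor) const override { visitor->Visit(*this); }
};

class Square : public Shape {
 public:
  void Accept(ShapeVisitor* visitor) const override { visitor->Visit(*this); }
};

// Processing code that lives outside the shape hierarchy.
class NameVisitor : public ShapeVisitor {
 public:
  void Visit(const Circle&) override { name_ = "circle"; }
  void Visit(const Square&) override { name_ = "square"; }
  const std::string& name() const { return name_; }

 private:
  std::string name_;
};

// Finds out which concrete shape it was given without typeid or dynamic_cast.
std::string Describe(const Shape& shape) {
  NameVisitor visitor;
  shape.Accept(&visitor);
  return visitor.name();
}
```

The price is that every new shape needs a new Visit overload in every visitor, which is one reason the pattern is not a free replacement for RTTI.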
I think this is in most situations acceptable. On the other hand I'd like D to have a better implemented reflection (within the bounds of the things that can be done by a static compiler, even if future D implementations may run on a VM, like a future alternative LDC), that can be useful in unittesting. I am not sure about this, I don't use RTTI a lot in D code.

-------------------

Casting: Use C++ casts like static_cast<>(). Do not use other cast formats like
int y = (int)x; or int y = int(x);.<
Pros: The problem with C casts is the ambiguity of the operation; sometimes you
are doing a conversion (e.g., (int)3.5) and sometimes you are doing a cast
(e.g., (int)"hello"); C++ casts avoid this. Additionally C++ casts are more
visible when searching for them.<
Do not use C-style casts. Instead, use these C++-style casts.
* Use static_cast as the equivalent of a C-style cast that does value
  conversion, or when you need to explicitly up-cast a pointer from a class to
  its superclass.
* Use const_cast to remove the const qualifier (see const).
* Use reinterpret_cast to do unsafe conversions of pointer types to and from
  integer and other pointer types. Use this only if you know what you are doing
  and you understand the aliasing issues.
* Do not use dynamic_cast except in test code. If you need to know type
  information at runtime in this way outside of a unittest, you probably have a
  design flaw.<

I agree with them that mixing all different kinds of cast as in D is bad. In D I'd like to know what I'm doing in a more precise way. This is something that can be improved in D.

-------------------

Integer Types:

You should not use the unsigned integer types such as uint32_t, unless the
quantity you are representing is really a bit pattern rather than a number, or
unless you need defined twos-complement overflow. In particular, do not use
unsigned types to say a number will never be negative. Instead, use assertions
for this.<
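
One common way unsigned arithmetic bites, sketched in C++ (C++11 assumed). LastIndexUnsigned and LastIndexSigned are hypothetical helpers written for this example:

```cpp
#include <cstddef>
#include <vector>

// Buggy on purpose: for an empty vector, size() - 1 wraps around to the
// largest value a std::size_t can hold instead of becoming -1.
std::size_t LastIndexUnsigned(const std::vector<int>& v) {
  return v.size() - 1;
}

// Signed variant: an empty vector yields -1, which a caller can test for
// or assert against.
std::ptrdiff_t LastIndexSigned(const std::vector<int>& v) {
  return static_cast<std::ptrdiff_t>(v.size()) - 1;
}
```

The wrap-around is well defined for unsigned types, so no diagnostic is required; the bug only shows up when the huge index is used.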
I'm for the removal of size_t from everywhere it's not strictly necessary (so for example from array lengths) to avoid bugs.

See also the recent thread about signed-unsigned issues:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=17800

Integer overflow tests too will help.

-------------------

Boost:

Cons: Some Boost libraries encourage coding practices which can hamper
readability, such as metaprogramming and other advanced template techniques,
and an excessively "functional" style of programming.<
Advanced use of templates makes the code less easy to understand. But sometimes functional style makes code shorter, more readable, safer multiprocessing-wise, sometimes even parallelizable, etc.

-------------------

Type Names: often I don't like the C++ practice of using a single uppercase
letter for a template type, like T. Better to give a meaningful name to types,
when possible.

-------------------

Class Data Members: Data members (also called instance variables or member
variables) are lowercase with optional underscores like regular variable names,
but always end with a trailing underscore.<
D may even enforce some simple syntax for class members, like that underscore or something else. No other variable is allowed to share the same syntax (so this syntax is used iff it's a class member). It makes conversions from other languages a little more work, but I think it will pay off.

-------------------

Regular Functions: Functions should start with a capital letter and have a
capital letter for each new word. No underscores:<
That's ugly.

-------------------

Spaces vs. Tabs: Use only spaces, and indent 2 spaces at a time.<
4 spaces are more readable :-)

-------------------

Pointer and Reference Expressions:

// These are fine, space following.
char* c;  // but remember to do "char* c, *d, *e, ...;"!

That's good in D but bad in C/C++. They are wrong here.

-------------------

Class Format: Sections in public, protected and private order, each indented
one space.<
There are no good solutions to this. I use 4 spaces for them too.

-------------------

Loops and Conditionals:

for ( ; i < 5 ; ++i) {  // For loops always have a space after the
  ...                   // semicolon, and may have a space before the
                        // semicolon.

That space before the ; is quite important. But I don't think there's a need
for a warning if it's absent.

-------------------

Bye,
bearophile
Oct 03 2009
Justin Johansson <no spam.com> writes:
bearophile Wrote:

Regular Functions: Functions should start with a capital letter and have a
capital letter for each new word. No underscores:<
That's ugly.
Coming from a career in acronym-city (aerospace), project management mandated that use of acronyms in identifiers MUST be clearly indicated with uppercase letters. In the event that ambiguity could arise, such as in camel-cased identifiers, the end of an acronym had to be separated by an underscore between it and any following letter in the identifier.

This rule, whilst painful/ugly at times, was rigorously enforced in safety critical systems lest there be any possibility, no matter how remote, of confusion with interpretation of nomenclature in systems engineering documents.

So, for example, the following were verboten:

ParseXmlDocument (The correct acronym for Xml is XML)
ParseXMLDocument (XMLD might be erroneously interpreted as a 4 letter acronym)

Required formulation of the identifier in this case must be "ParseXML_Document".

Ciao
Justin Johansson
Oct 03 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
bearophile wrote:
 I have found this page linked from Reddit (click "Toggle all summaries" at the
top to read the full page):
 http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
 
 At Google C++ isn't the most used language, so it may be better to use a C++
style guide from a firm that uses C++ more than Google. On the other hand
Google has hired many good programmers, and probably some of them have strong
C++ experience, so if you are interested in C++/D this style guide deserves to
be read.
 
 This guide is mostly (as it often happens with C++) a list of features that
are forbidden, I think usually to reduce the total bug count of the programs.
Some of such imposed limits make me a little nervous, so I'd like to
remove/relax some of those limits, but I am ignorant regarding C++, while the
people that have written this document are expert, so their judgement has
weight.
 
 They forbid several features that are present in D too. Does it means D has to
drop such features (or make them less "natural", so the syntax discourages
their use)?
 
 Here are few things from that document that I think are somehow interesting.
Some of those things may be added to D style guide, or they may even suggest
changes in the language itself.
I think these are more programming guidelines than language design rules. That's like most academic teachers saying "goto" is evil and should never be used, yet new languages like D still support it.
 -------------------
 
 Function Parameter Ordering: When defining a function, parameter order is:
inputs, then outputs.<
D may even enforce this, allowing "out" only after "in" arguments.
That can be good for readability in most cases, but I also like to order parameters in logical order instead of storage class order. Enforcing parameter order would also break lots of existing code.
 Static and Global Variables: Static or global variables of class type are
forbidden: they cause hard-to-find bugs due to indeterminate order of
construction and destruction. [...] The order in which class constructors,
destructors, and initializers for static variables are called is only partially
specified in C++ and can even change from build to build, which can cause bugs
that are difficult to find. [...] As a result we only allow static variables to
contain POD data.<
I think D avoids such problem.
Indeed, static ctors/dtors are very useful but I like to keep their number down to a minimum and perform lazy initialization instead.
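
In C++ terms, the lazy initialization mentioned above is often written with a function-local static (the "construct on first use" idiom). A minimal sketch, assuming C++11, where the initialization of such statics is thread-safe; Registry and Register are hypothetical names, and this is not something the guide itself prescribes:

```cpp
#include <map>
#include <string>

// Hypothetical registry: constructed the first time Registry() is called,
// not at some unspecified point during program start-up.
std::map<std::string, int>& Registry() {
  static std::map<std::string, int> instance;  // initialized on first use
  return instance;
}

// Hypothetical helper: returns a stable id for the key, creating one if
// needed. It works even when called from another static's constructor,
// because the map is guaranteed to exist by the time it is touched.
int Register(const std::string& key) {
  std::map<std::string, int>& registry = Registry();
  auto it = registry.find(key);
  if (it != registry.end()) return it->second;
  int id = static_cast<int>(registry.size());
  registry[key] = id;
  return id;
}
```

This fixes the construction order only; the order of destruction at program exit can still matter if other statics use the registry in their destructors.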
 -------------------
 
 Declaration Order: Use the specified order of declarations within a class:
public: before private:, methods before data members (variables), etc.<
D may even enforce such order (Pascal does something similar).
Again, I wouldn't want to enforce such an order, sometimes I declare a private helper method right next to the set of public methods using it so I don't have to scroll down 400 lines to view the two.
 -------------------
 
 Reference Arguments: All parameters passed by reference must be labeled const.<
 In fact it is a very strong convention in Google code that input arguments are
values or const references while output arguments are pointers. Input
parameters may be const pointers, but we never allow non-const reference
parameters.<
I think C solves part of such problem forcing the programmer to add "ref" before the variable name in the calling place too. D may do the same.
I don't recall C having a "ref" keyword :)

That guideline I agree with, that's also how I write my parameters, although I take it a step further in D with in/const/immutable:

'in' for variables that are not modified and don't escape the method's scope.
'const' for variables that are not modified but escape the method's scope, maybe with a copy because the data may be mutable somewhere else.
'immutable' for variables that are not modified but escape the method's scope, never copied because they're expected to never change for their entire lifetime.
 -------------------
 
 Function Overloading: Use overloaded functions (including constructors) only
in cases where input can be specified in different types that contain the same
information.
 
 Cons: One reason to minimize function overloading is that overloading can make
it hard to tell which function is being called at a particular call site.
Another one is that most people are confused by the semantics of inheritance if
a deriving class overrides only some of the variants of a function.<
 Decision: If you want to overload a function, consider qualifying the name
with some information about the arguments, e.g., AppendString(), AppendInt()
rather than just Append().<
This is a strong limitation. One of the things that makes C++ more handy than C. I accept it for normal code, but I refuse it for "library code". Library code is designed to be more flexible and reusable, making syntax simpler, etc. So I want D to keep overloaded functions.
I partly agree, function overloading is very nice if you need generic code. But I also agree with the guideline in that you should keep your overloads short and to the point. For example on my output stream interface I allow writes from direct data or data from an input stream, those have different names instead of an overload because there's nothing generic here. Anyways, considering how easy it is to write method templates in D overloading for different primitive types is almost unneeded.
 -------------------
 
 Default Arguments: We do not allow default function parameters.<
 Cons: People often figure out how to use an API by looking at existing code
that uses it. Default parameters are more difficult to maintain because
copy-and-paste from previous code may not reveal all the parameters.
Copy-and-pasting of code segments can cause major problems when the default
arguments are not appropriate for the new code.<
 Decision: We require all arguments to be explicitly specified, to force
programmers to consider the API and the values they are passing for each
argument rather than silently accepting defaults they may not be aware of.<
This too is a strong limitation. I understand that it may make life a little more complex, but they are handy. So I think their usage has to be limited, but I don't like to totally forbid them. "Forcing the programmers to consider the API" has some negative side-effects too that they seem to ignore. So I want D to keep its default function parameters feature.
I completely agree here, JavaScript for example has no default parameters and it's annoying as hell. Looking at existing code is really handy to learn about the usage of a function when the documentation is too vague, that documentation is still the best source to learn about the parameters.
 -------------------
 
 Variable-Length Arrays and alloca(): We do not allow variable-length arrays or
alloca().<
 Cons: Variable-length arrays and alloca [...] allocate a data-dependent amount
of stack space that can trigger difficult-to-find memory overwriting bugs: "It
ran fine on my machine, but dies mysteriously in production".<
 Decision:  Use a safe allocator instead, such as scoped_ptr/scoped_array.<
After reading this page:
http://www.boost.org/doc/libs/1_40_0/libs/smart_ptr/scoped_array.htm
I think they are just a pointer that points to heap-allocated memory, plus it gets deallocated when the scope ends. In 99.5% of the cases a heap allocation is good enough in D (especially if the GC gets better). But once in a while speed is more important, so for very small arrays I'd like to have variable-length arrays in D (allocating large arrays on the stack is always bad in production code).
I barely use alloca at all, since you don't always know if the array is going to be 50 bytes or 20k bytes. If you know the array's size or at least the max size it can get then you can just use a fixed-size array which will get allocated on the stack.
 -------------------
 
 Run-Time Type Information (RTTI): We do not use Run Time Type Information
(RTTI).<
 If you find yourself in need of writing code that behaves differently based on
the class of an object, consider one of the alternatives to querying the type.
Virtual methods are the preferred way of executing different code paths
depending on a specific subclass type. This puts the work within the object
itself. If the work belongs outside the object and instead in some processing
code, consider a double-dispatch solution, such as the Visitor design pattern.
This allows a facility outside the object itself to determine the type of class
using the built-in type system. If you think you truly cannot use those ideas,
you may use RTTI. But think twice about it. :-) Then think twice again. Do not
hand-implement an RTTI-like workaround. The arguments against RTTI apply just
as much to workarounds like class hierarchies with type tags. <
I think this is in most situations acceptable. On the other hand I'd like D to have a better implemented reflection (within the bounds of the things that can be done by a static compiler, even if future D implementations may run on a VM, like a future alternative LDC), that can be useful in unittesting. I am not sure about this, I don't use RTTI a lot in D code.
Me neither, in fact I would *love* to see a -nrtti switch in DMD to disable the generation of all ClassInfo and TypeInfo instances, along with a version identifier, maybe "version = RTTI_Disabled;" to let code handle it. I use RTTI a lot for simple debugging like printing the name of a class or type in generic code or meta programming, but not at all in production code. Most of the time I can rely on .stringof and a message pragma to do the same.
 -------------------
 
 Casting: Use C++ casts like static_cast<>(). Do not use other cast formats
like int y = (int)x; or int y = int(x);.<
 Pros: The problem with C casts is the ambiguity of the operation; sometimes
you are doing a conversion (e.g., (int)3.5) and sometimes you are doing a cast
(e.g., (int)"hello"); C++ casts avoid this. Additionally C++ casts are more
visible when searching for them.<
 Do not use C-style casts. Instead, use these C++-style casts.
 * Use static_cast as the equivalent of a C-style cast that does value
   conversion, or when you need to explicitly up-cast a pointer from a class to
   its superclass.
 * Use const_cast to remove the const qualifier (see const).
 * Use reinterpret_cast to do unsafe conversions of pointer types to and from
   integer and other pointer types. Use this only if you know what you are doing
   and you understand the aliasing issues.
 * Do not use dynamic_cast except in test code. If you need to know type
   information at runtime in this way outside of a unittest, you probably have a
   design flaw.<
I agree with them that mixing all different kinds of cast as in D is bad. In D I'd like to know what I'm doing in a more precise way. This is something that can be improved in D.
I also agree with you here, static/dynamic/reinterpret casts aren't that hard to understand in C++ and really say what the programmer wants to do, as well as letting the compiler warn you when it's not a possible cast. It's all neat to have a single cast keyword that does it all, but it's even better to know what's happening and to control it. Maybe the cast syntax can be extended like this:

cast(Object, static)(new Foo);

as well as dynamic and reinterpret identifiers, which wouldn't be keywords anywhere else in the language (just like __traits and pragma do).
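
For reference, a short C++ sketch of the three casts being discussed, assuming C++11. Base, Derived and the three helper functions are hypothetical names used only to show what each cast expresses:

```cpp
#include <cstdint>

// Hypothetical polymorphic types, for illustration only.
struct Base {
  virtual ~Base() = default;
};
struct Derived : Base {
  int value = 7;
};

// static_cast: a value conversion; double to int truncates toward zero.
int Truncate(double d) { return static_cast<int>(d); }

// dynamic_cast: a checked down-cast that yields nullptr when the object
// is not actually a Derived.
const Derived* AsDerived(const Base* b) {
  return dynamic_cast<const Derived*>(b);
}

// reinterpret_cast: reinterprets the pointer as an integer; the result is
// only guaranteed to be meaningful when cast back to the same pointer type.
std::uintptr_t AddressOf(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p);
}
```

Each spelling states the intent, so a reviewer (or grep) can tell a harmless numeric conversion from a pointer reinterpretation at a glance.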
 -------------------
 
 Integer Types:
 
 You should not use the unsigned integer types such as uint32_t, unless the
quantity you are representing is really a bit pattern rather than a number, or
unless you need defined twos-complement overflow. In particular, do not use
unsigned types to say a number will never be negative. Instead, use assertions
for this.<
I'm for the removal of size_t from everywhere it's not strictly necessary (so for example from array lengths) to avoid bugs.
I don't think this guideline was about the size of integrals but rather their sign bit.
 See also the recent thread about signed-unsigned issues:
 http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D.learn&article_id=17800
 
 Integer overflow tests too will help.
Yeah I would like overflow tests in D too, although I don't like how you can't control which tests are used and which aren't, they're either all enabled or all disabled.
 -------------------
 
 Boost:
 
 Cons: Some Boost libraries encourage coding practices which can hamper
readability, such as metaprogramming and other advanced template techniques,
and an excessively "functional" style of programming.<
Advanced use of templates makes the code less easy to understand. But sometimes functional style makes code shorter, more readable, safer multiprocessing-wise, sometimes even parallelizable, etc.
Boost is the best thing to happen to C++! I agree it can get very hard to maintain readability in C++, but D does not have that problem. Templates in D are very elegant and much more powerful than C++'s at the same time. It really depends on what you're coding, for example I use very little templates in a GUI interface but I use templates on nearly every function to handle strings. I also use templates a lot as class/method traits to lower the runtime overhead.
 -------------------
 
 Type Names: often I don't like the C++ practice of using a single uppercase
letter for a template type, like T. Better to give a meaningful name to types,
when possible.
I think T fits generic template parameters the same way i fits for loops :)
 -------------------
 
 Class Data Members: Data members (also called instance variables or member
variables) are lowercase with optional underscores like regular variable names,
but always end with a trailing underscore.<
D may even enforce some simple syntax for class members, like that underscore or something else. No other variable is allowed to share the same syntax (so this syntax is used iff it's a class member). It makes conversions from other languages a little more work, but I think it will pay off.
I don't think it should be enforced by the language, it's a great guideline but the programmer should be free to select its flavor (ie m_var, mVar, _var, var_, etc)
 -------------------
 
 Regular Functions: Functions should start with a capital letter and have a
capital letter for each new word. No underscores:<
That's ugly.
That's how I write my method names! Maybe I did too much code around the win32 api, the Mozilla code also uses these method names. I like it that way cause I can easily differentiate variableNames from MethodNames from CONSTANT_NAMES :)
 -------------------
 
 Spaces vs. Tabs: Use only spaces, and indent 2 spaces at a time.<
4 spaces are more readable :-)
Tabs are better since the editor can be set to whatever number of spaces you wish for them :) I use 4 myself.
 -------------------
 
 Loops and Conditionals:
 
 for ( ; i < 5 ; ++i) {  // For loops always have a space after the
   ...                   // semicolon, and may have a space before the
                         // semicolon.
 
 That space before the ; is quite important. But I don't think there's a need
for a warning if it's absent.
Why would there be a warning?
 -------------------
 
 Bye,
 bearophile
Oct 03 2009
bearophile <bearophileHUGS lycos.com> writes:
Jeremie Pelletier:

I think these are more programming guidelines than language design rules.<
Yes, of course. But programming guidelines can give possible ideas to a language designer, because:
- if everyone is encouraged to follow a certain idiom to avoid bugs, it may be good to let the language itself enforce the idiom (see D that disallows for(...); ).
- if most similar guidelines suggest to not use a certain language feature, such feature may need a redesign, or maybe to be made "less nice" syntax-wise, so the syntax shows its usage is discouraged.
- if many guidelines suggest to do something in a standard way, to improve uniformity, it may be good to add such thing too to help spreading and transmission of code in the programmer community of that language.

One of the causes of Python success is that it forces a very uniform coding style, and this helps people understand and modify each other's code, this helps a little the creation of an ecosystem of reusable code. The compile-enforcing of syntax for class attributes in D can be one of such things.
enforcing parameter order would also break lots of existing code.<
D2 is in flux still, every release breaks existing code.
I don't recall C having a "ref" keyword :)<
I completely agree here, JavaScript for example has no default parameters and
it's annoying as hell. Looking at existing code is really handy to learn about
the usage of a function when the documentation is too vague, that documentation
is still the best source to learn about the parameters.<
I'm waiting for named arguments too in D :-)
I barely use alloca at all, since you don't always know if the array is going
to be 50 bytes or 20k bytes. If you know the array's size or at least the max
size it can get then you can just use a fixed-size array which will get
allocated on the stack.<
I was talking about a smarter function, that allocates on the heap if the requested size is too large or if the stack is running out :-) But of course fixed sized arrays are often enough.
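
A rough C++ sketch of such a "smart" buffer, assuming C++11. ScratchBuffer is a hypothetical helper: it keeps small requests in an in-object array (on the stack when the object is a local variable) and falls back to the heap for larger ones. It does not check how much stack is left, for which there is no portable way:

```cpp
#include <cstddef>
#include <memory>

// Hypothetical helper, for illustration only.
template <std::size_t kInlineBytes>
class ScratchBuffer {
 public:
  // Allocates on the heap only when the request does not fit inline.
  explicit ScratchBuffer(std::size_t size)
      : size_(size),
        heap_(size > kInlineBytes ? new unsigned char[size] : nullptr) {}

  unsigned char* data() { return heap_ ? heap_.get() : inline_; }
  std::size_t size() const { return size_; }
  bool on_heap() const { return heap_ != nullptr; }

 private:
  std::size_t size_;
  std::unique_ptr<unsigned char[]> heap_;  // released automatically at scope exit
  unsigned char inline_[kInlineBytes];     // used when size <= kInlineBytes
};
```

The size check is a single comparison, but the inline array is reserved even when the heap path is taken, so the inline capacity has to stay small.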
I don't think this guideline was about the size of integrals but rather their
sign bit.<
Right, I meant unsigned integral numbers.
Yeah I would like overflow tests in D too, although I don't like how you can't
control which tests are used and which arent, they're either all enabled or all
disabled.<
There are ways to solve this problem/limit. Putting basic tests in is a starting point. I have given LLVM developers some small enhancements requests to implement such tests more efficiently:
http://llvm.org/bugs/show_bug.cgi?id=4916
http://llvm.org/bugs/show_bug.cgi?id=4917
http://llvm.org/bugs/show_bug.cgi?id=4918

I have also discussed this topic with LDC developers, for possible implementations.
I agree it can get very hard to maintain readability in C++, but D does not
have that problem. Templates in D are very elegant and much more powerful than
C++'s at the same time.<
D template programming can become very unreadable, trust me :-)
I think T fits generic template parameters the same way i fits for loops :)<
Sometimes I avoid "i" for loops :-)
I don't think it should be enforced by the language, it's a great guideline but
the programmer should be free to select its flavor (ie m_var, mVar, _var, var_,
etc)<
Here I don't agree with you. Uniformity in such thing is important enough.

Bye and thank you for your answers,
bearophile
Oct 03 2009
Jeremie Pelletier <jeremiep gmail.com> writes:
bearophile wrote:
 Jeremie Pelletier:
 
 I think these are more programming guidelines than language design rules.<
Yes, of course. But programming guidelines can give possible ideas to a language designer, because:
- if everyone is encouraged to follow a certain idiom to avoid bugs, it may be good to let the language itself enforce the idiom (see D that disallows for(...); ).
- if most similar guidelines suggest to not use a certain language feature, such feature may need a redesign, or maybe to be made "less nice" syntax-wise, so the syntax shows its usage is discouraged.
- if many guidelines suggest to do something in a standard way, to improve uniformity, it may be good to add such thing too to help spreading and transmission of code in the programmer community of that language.
One of the causes of Python success is that it forces a very uniform coding style, and this helps people understand and modify each other's code, this helps a little the creation of an ecosystem of reusable code. The compile-enforcing of syntax for class attributes in D can be one of such things.
I'm not sure if that's a good thing: different companies enforce different guidelines for different reasons, and then you have independent programmers with their own guidelines too.

As for less nice syntax, I'd hate to use __goto, __traits is already ugly enough that I always hide it behind a template with a nicer name, and let's not even talk about __gshared showing its ugly self all over my C bindings :)

Maybe the compiler could have a -strict switch to enforce a certain guideline over code; we already have -safe for enforcements over memory usage! Such an enforcement would then be an awesome feature for D to have. I'm not against the idea, I'm against making it the only available option!
 I was talking about smarter function, that allocates on the heap if the
requested size is too much large or if the stack is finishing :-) But of course
fixed sized arrays are often enough.
Those smarts have some overhead to them to first check the allocation size and the remaining stack size, and finally call the appropriate allocator, that overhead would almost make such a smart function useless when compared to direct heap allocations.
 D template programming can become very unreadable, trust me :-)
Not anymore than any other bit of code :)
 Sometimes I avoid "i" for loops :-)
Sometimes I avoid "T" for templates :)
 Here I don't agree with you. Uniformity in such things is important enough.
Again I believe such an enforcement should be behind a -strict switch. I agree with you that uniformity can be a great thing and I can only imagine all the good it does to the Python community. However we're talking systems programming here, people want the choice between using the feature or not using it :) Jeremie
Oct 03 2009
prev sibling parent reply Christopher Wright <dhasenan gmail.com> writes:
Jeremie Pelletier wrote:
 Me neither, in fact I would *love* to see a -nrtti switch in DMD to 
 disable the generation of all ClassInfo and TypeInfo instances, along 
 with a version identifier, maybe "version = RTTI_Disabled;" to let code 
 handle it.
 
 I use RTTI a lot for simple debugging like printing the name of a class 
 or type in generic code or meta programming, but not at all in 
 production code. Most of the time I can rely on .stringof and a message 
 pragma to do the same.
You use RTTI for dynamic casts, variadic functions, and the default implementation of toString. You could safely eliminate some fields from ClassInfo and TypeInfo, but you can't get rid of them entirely. The best you can do is make TypeInfo entirely opaque (no fields) and only include the base class, interfaces, and name for ClassInfo.
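A minimal sketch of where run-time type information shows up in ordinary class code; the class names Shape, Circle and Square are made up for illustration and are not from the thread:

```d
import std.stdio : writeln;

// Hypothetical classes, used only to illustrate the point above.
class Shape {}
class Circle : Shape {}
class Square : Shape {}

void main()
{
    Shape s = new Circle;

    // Dynamic cast: checked at run time against the object's ClassInfo,
    // and yields null when the object is not of the requested class.
    assert(cast(Circle) s !is null);
    assert(cast(Square) s is null);

    // typeid on a class reference reports the dynamic type.
    assert(typeid(s) == typeid(Circle));

    // TypeInfo carries the type name, also for built-in types.
    assert(typeid(int).toString() == "int");

    // The default Object.toString returns the module-qualified class name.
    writeln(s.toString());
}
```

Each of these would need some replacement mechanism if ClassInfo and TypeInfo were stripped entirely, which is why only trimming their fields looks feasible.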
Oct 04 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
Christopher Wright wrote:
 Jeremie Pelletier wrote:
 Me neither, in fact I would *love* to see a -nrtti switch in DMD to 
 disable the generation of all ClassInfo and TypeInfo instances, along 
 with a version identifier, maybe "version = RTTI_Disabled;" to let 
 code handle it.

 I use RTTI a lot for simple debugging like printing the name of a 
 class or type in generic code or meta programming, but not at all in 
 production code. Most of the time I can rely on .stringof and a 
 message pragma to do the same.
You use RTTI for dynamic casts, variadic functions, and the default implementation of toString. You could safely eliminate some fields from ClassInfo and TypeInfo, but you can't get rid of them entirely. The best you can do is make TypeInfo entirely opaque (no fields) and only include the base class, interfaces, and name for ClassInfo.
Yeah something like "don't generate type names" and other extra information would be a definite plus, that makes reverse engineering too easy :)
Oct 04 2009
parent reply Don <nospam nospam.com> writes:
Jeremie Pelletier wrote:
 Christopher Wright wrote:
 Jeremie Pelletier wrote:
 Me neither, in fact I would *love* to see a -nrtti switch in DMD to 
 disable the generation of all ClassInfo and TypeInfo instances, along 
 with a version identifier, maybe "version = RTTI_Disabled;" to let 
 code handle it.

 I use RTTI a lot for simple debugging like printing the name of a 
 class or type in generic code or meta programming, but not at all in 
 production code. Most of the time I can rely on .stringof and a 
 message pragma to do the same.
You use RTTI for dynamic casts, variadic functions, and the default implementation of toString. You could safely eliminate some fields from ClassInfo and TypeInfo, but you can't get rid of them entirely. The best you can do is make TypeInfo entirely opaque (no fields) and only include the base class, interfaces, and name for ClassInfo.
Yeah something like "don't generate type names" and other extra information would be a definite plus, that makes reverse engineering too easy :)
I've often thought that a pragma for a module to "don't generate module info" would be very useful for executable size. I'm particularly thinking of bindings like the Win32 headers, where there are a hundred modules, and the module info isn't actually useful. There could be a default ModuleInfo instance, with module name "ModuleInfoUnavailable", which all such modules would point to.
Oct 05 2009
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Don:

 I've often thought that a pragma for a module to "don't generate module 
 info" would be very useful for executable size.
Do you use the LDC compiler? LDC has the pragmas:

pragma(no_typeinfo): You can use this pragma to stop typeinfo from being implicitly generated for a declaration.
pragma(no_moduleinfo): You can use this pragma to stop moduleinfo from being implicitly generated for a declaration.

I've never used those yet, I'll try them soon. But you meant something more global, module-wide. Maybe you can ask the LDC devs. I agree that having standard and not compiler-specific features is better. Bye, bearophile
Oct 05 2009
parent reply Jeremie Pelletier <jeremiep gmail.com> writes:
bearophile wrote:
 Don:
 
 I've often thought that a pragma for a module to "don't generate module 
 info" would be very useful for executable size.
Do you use the LDC compiler? LDC has the pragmas: pragma(no_typeinfo): You can use this pragma to stop typeinfo from being implicitly generated for a declaration. pragma(no_moduleinfo): You can use this pragma to stop moduleinfo from being implicitly generated for a declaration. I've never used those yet, I'll try them soon. But you meant something more global, module-wide. Maybe you can ask to LDC devs. I agree that having standard and not compiler-specific features is better. Bye, bearophile
I would much prefer these to be compiler switches, so you can make them global with very little effort.
Oct 05 2009
next sibling parent bearophile <bearophileHUGS lycos.com> writes:
Jeremie Pelletier:

 I would much prefer these to be compiler switches, so you can make them 
 global with very little effort.
Compiler switches are a blunt tool. So I think module-wide switches are better. Bye, bearophile
Oct 05 2009
prev sibling parent Don <nospam nospam.com> writes:
Jeremie Pelletier wrote:
 bearophile wrote:
 Don:

 I've often thought that a pragma for a module to "don't generate 
 module info" would be very useful for executable size.
Do you use the LDC compiler? LDC has the pragmas: pragma(no_typeinfo): You can use this pragma to stop typeinfo from being implicitly generated for a declaration. pragma(no_moduleinfo): You can use this pragma to stop moduleinfo from being implicitly generated for a declaration.
Sounds great. They should be standard.
 I've never used those yet, I'll try them soon.

 But you meant something more global, module-wide. Maybe you can ask to 
 LDC devs. I agree that having standard and not compiler-specific 
 features is better.

 Bye,
 bearophile
I would much prefer these to be compiler switches, so you can make them global with very little effort.
That's a completely different use case, I think. For internal modules, the existence of that module is an implementation detail, and shouldn't be externally visible even through reflection, IMHO.
Oct 05 2009
prev sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Don (nospam nospam.com)'s article
 Jeremie Pelletier wrote:
 Christopher Wright wrote:
 Jeremie Pelletier wrote:
 Me neither, in fact I would *love* to see a -nrtti switch in DMD to
 disable the generation of all ClassInfo and TypeInfo instances, along
 with a version identifier, maybe "version = RTTI_Disabled;" to let
 code handle it.

 I use RTTI a lot for simple debugging like printing the name of a
 class or type in generic code or meta programming, but not at all in
 production code. Most of the time I can rely on .stringof and a
 message pragma to do the same.
You use RTTI for dynamic casts, variadic functions, and the default implementation of toString. You could safely eliminate some fields from ClassInfo and TypeInfo, but you can't get rid of them entirely. The best you can do is make TypeInfo entirely opaque (no fields) and only include the base class, interfaces, and name for ClassInfo.
Yeah something like "don't generate type names" and other extra information would be a definite plus, that makes reverse engineering too easy :)
I've often thought that a pragma for a module to "don't generate module info" would be very useful for executable size. I'm particularly thinking of bindings like the Win32 headers, where there are a hundred modules, and the module info isn't actually useful. There could be a default ModuleInfo instance, with module name "ModuleInfoUnavailable", which all such modules would point to.
One thing that can trip this up is structs containing floating point numbers or static arrays, since they have custom initializers. I've taken to declaring structs from C headers with an "= void" to eliminate the link dependency, but maybe the initializer could be eliminated by declaring the struct as: struct S { char c = 0; float[2] f = 0.0[]; } Or something like that.
Oct 05 2009
prev sibling next sibling parent sclytrack <idiot hotmail.com> writes:
Function Parameter Ordering: When defining a function, parameter order is:
inputs, then outputs.<
 D may even enforce this, allowing "out" only after "in" arguments.
 -------------------
Function Default Arguments
void foo(int x, int y = 3) { ... }
...
foo(4); // same as foo(4, 3)
Oct 04 2009
prev sibling next sibling parent reply "Jérôme M. Berger" <jeberger free.fr> writes:
bearophile wrote:
 I have found this page linked from Reddit (click "Toggle all summaries" at the top to read the full page):
 http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml

 At Google C++ isn't the most used language, so it may be better to use a C++ style guide from a firm that uses C++ more than Google. On the other hand Google has hired many good programmers, and probably some of them have strong C++ experience, so if you are interested in C++/D this style guide deserves to be read.

 This guide is mostly (as it often happens with C++) a list of features that are forbidden, I think usually to reduce the total bug count of the programs. Some of such imposed limits make me a little nervous, so I'd like to remove/relax some of those limits, but I am ignorant regarding C++, while the people that have written this document are expert, so their judgement has weight.

 They forbid several features that are present in D too. Does it means D has to drop such features (or make them less "natural", so the syntax discourages their use)?

 Here are few things from that document that I think are somehow interesting. Some of those things may be added to D style guide, or they may even suggest changes in the language itself.

 -------------------

 Function Parameter Ordering: When defining a function, parameter order is: inputs, then outputs.<

 D may even enforce this, allowing "out" only after "in" arguments.

I actually use the inverse convention: "out" arguments come first. This way, it is easy to see that "a = b" and "assign (a, b)" modify "a" and not "b".

Jerome
--
mailto:jeberger free.fr
http://jeberger.free.fr
Jabber: jeberger jabber.fr
Oct 04 2009
parent reply Justin Johansson <no spam.com> writes:
Jérôme M. Berger Wrote:

 bearophile wrote:
 I have found this page linked from Reddit (click "Toggle all summaries" at the
top to read the full page):
 http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml
 
 At Google C++ isn't the most used language, so it may be better to use a C++
style guide from a firm that uses C++ more than Google. On the other hand
Google has hired many good programmers, and probably some of them have strong
C++ experience, so if you are interested in C++/D this style guide deserves to
be read.
 
 This guide is mostly (as it often happens with C++) a list of features that
are forbidden, I think usually to reduce the total bug count of the programs.
Some of such imposed limits make me a little nervous, so I'd like to
remove/relax some of those limits, but I am ignorant regarding C++, while the
people that have written this document are expert, so their judgement has
weight.
 
 They forbid several features that are present in D too. Does it means D has to
drop such features (or make them less "natural", so the syntax discourages
their use)?
 
 Here are few things from that document that I think are somehow interesting.
Some of those things may be added to D style guide, or they may even suggest
changes in the language itself.
 
 -------------------
 
 Function Parameter Ordering: When defining a function, parameter order is:
inputs, then outputs.<
D may even enforce this, allowing "out" only after "in" arguments.
I actually use the inverse convention: "out" arguments come first. This way, it is easy to see that "a = b" and "assign (a, b)" modify "a" and not "b". Jerome
Ditto.

A special use case to consider is when you have a function template that returns a type that is a template parameter and the types of the function arguments are also template parameters. Often type inference can be used to determine the types of the function arguments without explicit qualification of the argument types in the instantiation. The return type must be specified, however, since inference cannot be made from missing information. This suggests a natural order in which results (and out arguments) are on the LHS and (in) arguments on the RHS.

So if one writes this:

R Foo(R, A1, A2)( A1 arg1, A2 arg2) { R r; return r; }
auto r = Foo!(double)( 3, 4);

Isn't it more natural or consistent to write this also:

void Bar(R, A1, A2)( out R r, A1 arg1, A2 arg2) { }
double r; Bar!(double)( r, 3, 4);

I haven't tried it so not sure if this works but you get the idea.

Another reason why outs/inouts should be before in arguments is in the case of functions taking variable length argument lists or variadic arguments. Normally there is only one output argument but there is an arbitrary number of input arguments that the function can take.

Yet another reason is by analogy with output stream functions; an output stream argument is analogous to an output value or reference. Nearly all I/O libraries that I've seen have usage like this:

fprintf( stdout, /+args...+/);
write( os, value);

Rarely the other way around, namely, input arguments before the output stream/file channel argument.

-- Justin Johansson
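For what it's worth, here is a small sketch of the two forms that should compile in D2. The lowercase names foo and bar and the addition in the bodies are assumptions added only so the result can be checked:

```d
// Result as return value: R cannot be inferred from the arguments,
// so the caller names it explicitly; A1 and A2 are still inferred.
R foo(R, A1, A2)(A1 arg1, A2 arg2)
{
    return cast(R)(arg1 + arg2);   // placeholder body (assumption)
}

// Result as a leading out parameter: R is inferred from the variable
// passed in, so no explicit template argument is required.
void bar(R, A1, A2)(out R r, A1 arg1, A2 arg2)
{
    r = cast(R)(arg1 + arg2);      // placeholder body (assumption)
}

void main()
{
    auto a = foo!(double)(3, 4);
    static assert(is(typeof(a) == double));
    assert(a == 7.0);

    double b;
    bar(b, 3, 4);                  // bar!(double)(b, 3, 4) works as well
    assert(b == 7.0);
}
```

With the out parameter first, the variadic case mentioned above also reads naturally, since the open-ended inputs can trail at the end.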
Oct 04 2009
parent bearophile <bearophileHUGS lycos.com> writes:
Justin Johansson:

 The return type must be specified however,
 since inference cannot be made from missing information.
If the information isn't missing in D2 you can sometimes use "auto" return type for function templates and some functions, and in some other situations you can also use typeof(return). Bye, bearophile
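A short sketch of both options in D2; the function names add and half are invented for this example:

```d
// "auto" return type: the compiler deduces it from the return statement.
auto add(A, B)(A a, B b)
{
    return a + b;
}

// typeof(return) names the declared return type inside the function body.
double half(int x)
{
    typeof(return) result = x;     // result is a double here
    return result / 2;
}

void main()
{
    // int + double gives double, without the caller spelling it out.
    static assert(is(typeof(add(1, 2.5)) == double));
    assert(add(1, 2.5) == 3.5);
    assert(half(3) == 1.5);
}
```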
Oct 04 2009
prev sibling next sibling parent reply Kagamin <spam here.lot> writes:
bearophile Wrote:

Function Parameter Ordering: When defining a function, parameter order is:
inputs, then outputs.<
D may even enforce this, allowing "out" only after "in" arguments.
I'm trying to do the reverse. Maybe I used fprintf and sprintf too much.
Static and Global Variables: Static or global variables of class type are
forbidden: they cause hard-to-find bugs due to indeterminate order of
construction and destruction. [...] The order in which class constructors,
destructors, and initializers for static variables are called is only partially
specified in C++ and can even change from build to build, which can cause bugs
that are difficult to find. [...] As a result we only allow static variables to
contain POD data.<
I think D avoids such problem.
No. D has static constructors which do the same.
Doing Work in Constructors: Do only trivial initialization in a constructor. If
at all possible, use an Init() method for non-trivial initialization. [...] If
the work calls virtual functions, these calls will not get dispatched to the
subclass implementations. Future modification to your class can quietly
introduce this problem even if your class is not currently subclassed, causing
much confusion.<
I never understood this advice to split the construction of an object. What is it trying to solve? And how do they plan to not dispatch calls to subclasses? Do they overwrite the vtbl at the end of the constructor? In fact DMD has a bug here: the spec says the this pointer must not be taken implicitly or explicitly, yet dmd allows calling virtual methods on the object being constructed.
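To make the D side of this concrete, here is a hedged sketch (the classes Base and Derived are hypothetical). In D a virtual call made from a base constructor does reach the subclass override, which then runs before the subclass constructor body has set its fields:

```d
class Base
{
    string log;

    this()
    {
        // Virtual call during construction: dispatched to the override.
        log = describe();
    }

    string describe() { return "Base"; }
}

class Derived : Base
{
    string name;                   // still null while Base's constructor runs

    this()
    {
        super();
        name = "ready";
    }

    override string describe()
    {
        return "Derived, name=" ~ (name is null ? "(unset)" : name);
    }
}

void main()
{
    auto d = new Derived;
    // The override ran before Derived's constructor body assigned name.
    assert(d.log == "Derived, name=(unset)");
    assert(d.name == "ready");
}
```

So the hazard is the mirror image of the C++ one: the call is dispatched, but to a method that sees a not yet fully constructed object.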
Declaration Order: Use the specified order of declarations within a class:
public: before private:, methods before data members (variables), etc.<
D may even enforce such order (Pascal does something similar).
Methods before data seems unnatural for me.
Decision: If you want to overload a function, consider qualifying the name with
some information about the arguments, e.g., AppendString(), AppendInt() rather
than just Append().<
This is a strong limitation. One of the things that makes C++ more handy than C. I accept it for normal code, but I refuse it for "library code". Library code is designed to be more flexible and reusable, making syntax simpler, etc. So I want D to keep overloaded functions.
A good example is BinaryWriter. It's unusable when implemented with overloaded methods.
Default Arguments: We do not allow default function parameters.<
Decision: We require all arguments to be explicitly specified, to force
programmers to consider the API and the values they are passing for each
argument rather than silently accepting defaults they may not be aware of.<
Is it a solution? Default parameters can be emulated by overloads with different number of parameters, which call actual method with defaults for the rest of the parameters. They just propose to always use the full api? How about going back to asm to consider your code rather than accepting compiler magic?
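A sketch of that emulation, with invented function names, next to the default-argument form it imitates:

```d
// Full API: every argument is explicit.
int scale(int x, int factor)
{
    return x * factor;
}

// Overload that forwards to the full version with the "default" filled in.
int scale(int x)
{
    return scale(x, 3);
}

// The equivalent declaration using a default argument.
int scaleWithDefault(int x, int factor = 3)
{
    return x * factor;
}

void main()
{
    assert(scale(4) == scale(4, 3));
    assert(scale(4) == scaleWithDefault(4));
    assert(scaleWithDefault(4, 5) == 20);
}
```

Either way the caller can still omit the argument, so banning the default-argument syntax alone does not force anyone to consider the full API.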
 Integer Types:
 
You should not use the unsigned integer types such as uint32_t, unless the
quantity you are representing is really a bit pattern rather than a number, or
unless you need defined twos-complement overflow. In particular, do not use
unsigned types to say a number will never be negative. Instead, use assertions
for this.<
I'm for the removal of size_t from everywhere it's not strictly necessary (so for example from array lengths) to avoid bugs.
Yess, unsigneds are evil. They must go to the camp of gotos and unsafe pointers.
 Type Names: often I don't like the C++ practice of using a single uppercase
letter for a template type, like T. Better to give a meaningful name to types,
when possible.
 
I thought it's a common practice that the length (meaningfulness) of the name of a variable is determined more by the size of its scope rather than its purpose.
Spaces vs. Tabs: Use only spaces, and indent 2 spaces at a time.<
4 spaces are more readable :-)
I prefer 3. 4 is too much. Almost every editor has the option to specify the tab width and people have different tastes.
Oct 05 2009
next sibling parent Kagamin <spam here.lot> writes:
Kagamin Wrote:

 In fact DMD has bug here: spec says, this pointer must not be taken implicitly
or explicitly, yet dmd allows calling virtual methods on the object being
constructed.
A... I've misread the spec a little. Though I think, it's still a problem that constructor allows to call virtual methods.
Oct 05 2009
prev sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Kagamin:

 I'm for the removal of size_t from everywhere it's not strictly necessary (so
for example from array lengths) to avoid bugs.
Yess, unsigneds are evil. They must go to the camp of gotos and unsafe pointers.
In D it's better to not use them when you want a strictly positive number, or for general iteration purposes, etc. So I don't like to see them used in the built-ins and std lib where they aren't necessary. I use them when I need bitfields, or when I need the full range (but that's less common). If you want me to list something that's a little evil, is the automatic silent cast from an integral to its unsigned version. I'd like to disallow such silent Regarding pointers, they are unsafe, but there are ways to increase their safety a little, with no performance costs in release mode. I think this is positive because it helps find and fix bugs in less time. Bye, bearophile
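Two small cases of the kind of bug meant here, as a sketch (the helper names are made up):

```d
// Unsigned length arithmetic: for an empty array, length - 1 wraps around.
size_t lastIndex(const int[] a)
{
    return a.length - 1;
}

// Silent signed -> unsigned conversion: accepted without a cast.
uint toUnsigned(int i)
{
    uint u = i;
    return u;
}

void main()
{
    int[] empty;
    assert(lastIndex(empty) == size_t.max);   // not -1
    assert(toUnsigned(-1) == uint.max);       // the sign is silently lost
}
```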
Oct 05 2009
parent bearophile <bearophileHUGS lycos.com> writes:
If you want me to list something that's a little evil, is the automatic silent
cast from an integral to its unsigned version. I'd like to disallow such silent

We may even disallow all implicit conversions that lose a significant amount of information:
double => float
real => float

And maybe even (but this is less handy, so I am not sure):
real => double

------------------------

Even long => real sometimes loses a little information:

import std.stdio: writeln;
void main() {
    real r = long.min;
    writeln(r, " ", cast(long)r, " ", long.max-cast(long)r);
}

But for now I'm not interested in regulating long => real implicit casts.

Bye,
bearophile
Oct 05 2009
prev sibling parent bearophile <bearophileHUGS lycos.com> writes:
Jeremie Pelletier:

 However we're 
 talking systems programming here, people want the choice between using 
 the feature or not using it :)
We aren't talking about a feature here, but a standard syntax to denote class attributes. And D being a system language has nothing to do with being free to make such kinds of choices. A system language has to give freedom in how you use memory or how you use the CPU at runtime; it has nothing to do with the syntax you use to write identifiers. Such freedom isn't required. Bye, bearophile
Oct 05 2009