D - array bounds checking


"Walter" <walter digitalmars.com> writes:

I've been working on implementing it. After turning it on and recompiling
the library and test code, it tripped and found 3 bugs in the regexp
implementation - code that I have a nice test suite for that was passing.

Just goes to show, array bounds checking really is valuable! And being able
to turn it off for performance code is why D is better than other languages
offering bounds checks.

Jan 19 2002

"H. Ellenberger" <ele1 gmx.ch> writes:

Walter wrote:

 I've been working on implementing it. After turning it on and recompiling
 the library and test code, it tripped and found 3 bugs in the regexp
 implementation - code that I have a nice test suite for that was passing.

 Just goes to show, array bounds checking really is valuable!

No surprise for people with experience in Topspeed Modula-2 which had
this RT check many years ago.

 And being able
 to turn it off for performance code is why D is better than other languages
 offering bounds checks.

My experience showed that a good implementation in most cases does not slow
down too much, so I often left all checks (array bounds, overflow, NIL pointer)
on, except for well tested library functions.

Jan 19 2002

"Walter" <walter digitalmars.com> writes:

"H. Ellenberger" <ele1 gmx.ch> wrote in message
news:3C49E692.C9DE8997 gmx.ch...
 Walter wrote:
 I've been working on implementing it. After turning it on and recompiling
 the library and test code, it tripped and found 3 bugs in the regexp
 implementation - code that I have a nice test suite for that was passing.
 Just goes to show, array bounds checking really is valuable!

 No surprise for people with experience in Topspeed Modula-2 which had
 this RT check many years ago.
 And being able
 to turn it off for performance code is why D is better than other languages
 offering bounds checks.

 My experience showed that a good implementation in most cases does not slow
 down too much, so I often left all checks (array bounds, overflow, NIL pointer)
 on, except for well tested library functions.

I figure by making it optional,  any objections to it should be addressed.

Jan 19 2002

"D" <s_nudds hotmail.com> writes:

Bounds checking will eventually be seen as essential for both array and
pointer operations.

A secondary bounded pointer type should be defined for that purpose.  It
should be a composite type, and it should be targeted so that it is
implemented in hardware, and throws an error when an access is attempted
outside the bounded range.

Walter <walter digitalmars.com> wrote in message
news:a2d4kq$1tdi$1 digitaldaemon.com...
 "H. Ellenberger" <ele1 gmx.ch> wrote in message
 news:3C49E692.C9DE8997 gmx.ch...
 Walter wrote:
 I've been working on implementing it. After turning it on and recompiling
 the library and test code, it tripped and found 3 bugs in the regexp
 implementation - code that I have a nice test suite for that was passing.
 Just goes to show, array bounds checking really is valuable!

 No surprise for people with experience in Topspeed Modula-2 which had
 this RT check many years ago.
 And being able
 to turn it off for performance code is why D is better than other languages
 offering bounds checks.

 My experience showed that a good implementation in most cases does not slow
 down too much, so I often left all checks (array bounds, overflow, NIL pointer)
 on, except for well tested library functions.

 I figure by making it optional,  any objections to it should be addressed.

Feb 04 2002

"Pavel Minayev" <evilone omen.ru> writes:

"D" <s_nudds hotmail.com> wrote in message
news:a3ll98$svr$1 digitaldaemon.com...
 Bounds checking will eventually be seen as essential for both array and
 pointer operations.

 A secondary bounded pointer type should be defined for that purpose.  It
 should be a composite type, and it should be targeted so that it is
 implemented in hardware, and throws an error when an access is attempted
 outside the bounded range.

"Bounded pointer" - i.e. a pointer that knows size of data
it points to - is a D dynamic array:

    int[] a = new int[5];
    ...
    b = a[10];    // throws ArrayBoundsError

Feb 04 2002

"Walter" <walter digitalmars.com> writes:

"D" <s_nudds hotmail.com> wrote in message
news:a3ll98$svr$1 digitaldaemon.com...
 Bounds checking will eventually be seen as essential for both array and
 pointer operations.

 A secondary bounded pointer type should be defined for that purpose.  It
 should be a composite type, and it should be targeted so that it is
 implemented in hardware, and throws an error when an access is attempted
 outside the bounded range.

Already in D! Well, a software version anyway.

Example: convert a pointer p into a "bounded pointer" bp:

char *p;
char[] bp = p[0..len];
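
For readers more at home in C, the idea behind such a slice can be sketched as a pointer paired with a length, with every indexed access checked against that length. This is only an illustration under stated assumptions: the names `char_slice`, `slice_from` and `slice_get` are hypothetical, and it does not show how the D compiler actually implements arrays.

```c
#include <stddef.h>

// Hypothetical "bounded pointer": a raw pointer plus the number of
// elements it is allowed to reach (roughly what p[0..len] builds in D).
typedef struct {
    char  *ptr;
    size_t len;
} char_slice;

// Wrap an unchecked pointer; the caller vouches for len.
char_slice slice_from(char *p, size_t len)
{
    char_slice s = { p, len };
    return s;
}

// Checked read: returns 1 and stores the element in *out when i is in
// range, returns 0 (where D would throw) when it is not.
int slice_get(char_slice s, size_t i, char *out)
{
    if (i >= s.len)
        return 0;
    *out = s.ptr[i];
    return 1;
}
```

The index is what gets checked here, not the pointer: the pointer inside the slice never moves.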

Feb 04 2002

"D" <s_nudds hotmail.com> writes:

Great.  I take it that all pointer increments and assignments are checked
against the upper and lower bounds of the array, and an exception is thrown
if the range is violated?

Walter <walter digitalmars.com> wrote in message
news:a3lp86$v31$1 digitaldaemon.com...
 "D" <s_nudds hotmail.com> wrote in message
 news:a3ll98$svr$1 digitaldaemon.com...
 Bounds checking will eventually be seen as essential for both array and
 pointer operations.

 A secondary bounded pointer type should be defined for that purpose.  It
 should be a composite type, and it should be targeted so that it is
 implemented in hardware, and throws an error when an access is attempted
 outside the bounded range.

 Already in D! Well, a software version anyway.

 Example: convert a pointer p into a "bounded pointer" bp:

 char *p;
 char[] bp = p[0..len];

Feb 04 2002

"Walter" <walter digitalmars.com> writes:

"D" <s_nudds hotmail.com> wrote in message
news:a3ni7d$29f5$1 digitaldaemon.com...
 Great.  I take it that all pointer increments and assignments are checked
 against the upper and lower bounds of the array, and an exception is thrown
 if the range is violated?

Not exactly. You don't increment dynamic arrays, but you do increment the
index.

 Walter <walter digitalmars.com> wrote in message
 news:a3lp86$v31$1 digitaldaemon.com...
 "D" <s_nudds hotmail.com> wrote in message
 news:a3ll98$svr$1 digitaldaemon.com...
 Bounds checking will eventually be seen as essential for both array and
 pointer operations.

 A secondary bounded pointer type should be defined for that purpose.  It
 should be a composite type, and it should be targeted so that it is
 implemented in hardware, and throws an error when an access is attempted
 outside the bounded range.

 Already in D! Well, a software version anyway.

 Example: convert a pointer p into a "bounded pointer" bp:

 char *p;
 char[] bp = p[0..len];

Feb 04 2002

"D" <s_nudds hotmail.com> writes:

Then it's not a pointer is it?

As I said, I recommend implementing a bounded pointer type.

Walter <walter digitalmars.com> wrote in message
news:a3nvlr$2f0q$3 digitaldaemon.com...
 "D" <s_nudds hotmail.com> wrote in message
 news:a3ni7d$29f5$1 digitaldaemon.com...
 Great.  I take it that all pointer increments and assignments are checked
 against the upper and lower bounds of the array, and an exception is thrown
 if the range is violated?

 Not exactly. You don't increment dynamic arrays, but you do increment the
 index.

Feb 06 2002

"Pavel Minayev" <evilone omen.ru> writes:

"D" <s_nudds hotmail.com> wrote in message
news:a3r373$28kk$1 digitaldaemon.com...
 Then it's not a pointer is it?

IT IS A POINTER THAT CANNOT BE PERFORMED POINTER MATH ON.

I always wondered, why do C geeks never consider something
that cannot be moved, a pointer?

Feb 06 2002

"D" <s_nudds hotmail.com> writes:

No paul.  Apparently it is not a pointer.  The array index is compared
against the array bounds.
That means array references are performed the typical way, by recomputing
the pointer from the index after an index change.  That requires a multiply
and add before at least the first reference after an index change.

A pointer is different. A pointer points to an area of memory.
An index is not a pointer.

Apparently D doesn't have a bounded pointer type.
I recommend one be added.

Pavel Minayev <evilone omen.ru> wrote in message
news:a3r4qu$29ai$1 digitaldaemon.com...
 "D" <s_nudds hotmail.com> wrote in message
 news:a3r373$28kk$1 digitaldaemon.com...
 Then it's not a pointer is it?

 IT IS A POINTER THAT CANNOT BE PERFORMED POINTER MATH ON.

 I always wondered, why do C geeks never consider something
 that cannot be moved, a pointer?

Feb 07 2002

"Walter" <walter digitalmars.com> writes:

"D" <s_nudds hotmail.com> wrote in message
news:a3tuq7$qtk$1 digitaldaemon.com...
 No paul.  Apparently it is not a pointer.  The array index is compared
 against the array bounds.
 That means array references are performed the typical way, by recomputing
 the pointer from the index
 after an index change.  That requires a multiply and add before at least the
 first reference after an index change.

In most cases, the multiply and add are done in hardware in the addressing
mode calculation and do not add any execution time.
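
As a rough illustration (an assumption about typical compilers, not a measurement): the two C loops below compute the same sum, one by index and one by moving a pointer. On architectures with scaled-index addressing, such as x86, a compiler can usually fold the `base + i * sizeof(int)` computation of the indexed form into the load instruction itself. The function names are hypothetical.

```c
#include <stddef.h>

// Indexed form: the address a + i * sizeof(int) is implied by a[i].
long sum_indexed(const int *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

// Pointer form: the pointer itself is advanced on each iteration.
long sum_pointer(const int *a, size_t n)
{
    long total = 0;
    for (const int *p = a, *end = a + n; p != end; p++)
        total += *p;
    return total;
}
```

Whether the two forms cost the same on a given CPU has to be checked in the generated code; the sketch only shows that they are interchangeable at the source level.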

Feb 07 2002

"D" <s_nudds hotmail.com> writes:

Walter, in most cases people use pointers rather than arrays.
If you believe that arrays can take the place of pointers, then drop
pointers from the language spec.

Further, delimited pointers would restrict access within structures as well.

Do you wish to fix the legion of problems in C or not?  If not, what is the
point of D?


 "D" <s_nudds hotmail.com> wrote in message
 news:a3tuq7$qtk$1 digitaldaemon.com...
 No paul.  Apparently it is not a pointer.  The array index is compared
 against the array bounds.
 That means array references are performed the typical way, by recomputing
 the pointer from the index
 after an index change.  That requires a multiply and add before at least the
 first reference after an index change.


Walter <walter digitalmars.com> wrote in message
news:a3ufcm$1nvu$2 digitaldaemon.com...
 In most cases, the multiply and add are done in hardware in the addressing
 mode calculation and do not add any execution time.

Feb 07 2002

"Walter" <walter digitalmars.com> writes:

"D" <s_nudds hotmail.com> wrote in message
news:a3vjlf$1ah0$1 digitaldaemon.com...
 Walter, in most cases people use pointers rather than arrays.
 If you believe that arrays can take the place of pointers, then drop
 pointers from the language spec.

Pointers are necessary for, if nothing else, compatibility with C API's.

 Further, delimited pointers would restrict access within structures as well
 Do you wish to fix the legion of problems in C or not?  If not, what is the
 point of D?

D addresses the worst of the problems with C.

Feb 07 2002

"D" <s_nudds hotmail.com> writes:

 Do you wish to fix the legion of problems in C or not?  If not, what is the
 point of D?


Walter <walter digitalmars.com> wrote in message
news:a3vsm5$1e1s$1 digitaldaemon.com...
 D addresses the worst of the problems with C.

Your changes to the C language are too minimal for D to survive as a
language.

I have just read with great interest the overview of your "D" language,
contained within the distribution file dmdalpha.zip and thought I would add
my 2 cents.

Well, ok.... 27K of uncommon sense.

Let me start by saying that I loathe both C and C++.  These languages are
abominations, abortions, unworthy of existence, a pox on the earth, etc...
C fails on so many levels that I don't know where to begin.  Even the
standard I/O library is an abomination....

Consider the function "GETS".  It takes a special kind of moron to write
such a function.  It takes a special class of mindless wonders to actually
decide to incorporate such a piece of filth into a "standard" library.
Those who wrote Gets in the first place, and those who voted to put
it in the standard simply are too stupid to justify their continued
existence.  Even torture and death is an inadequate punishment for their
criminal stupidity.  Words can not convey my loathing for these worthless
vermin.
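
The technical complaint is that `gets` receives no buffer size, so it cannot stop writing at the end of the caller's buffer. A minimal sketch of the bounded alternative, using the standard `fgets`; the helper name `read_line` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

// Reads at most size - 1 characters of one line into buf and strips the
// trailing newline if it fit. Returns 1 on success, 0 on EOF or error.
// Unlike gets, the buffer size is passed in, so buf cannot overflow.
int read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return 0;
    buf[strcspn(buf, "\n")] = '\0';
    return 1;
}
```

A call such as `read_line(stdin, buf, sizeof buf)` truncates an over-long line instead of writing past the end of `buf`; the rest of the line stays in the stream for the next read.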

However having got that off my chest, C/C++ still do have some nice
characteristics. And while the languages themselves are also unworthy of
existence, they can be salvaged.  I long for a day when C is replaced by a
similarly featured language that solves most of its ample list of
failings.

I come from an assembly language background so I am very familiar with how
machines operate at the lowest levels.  I'm intimately familiar with
pointers, and register sets, memory allocation, and rolling my own functions
and syntax as needed.

I appreciate the utility of high level languages, although I am turned off
by the pathetic level of optimizations they provide. To this day, compilers
still produce code that is 2 to 4 times larger and 2 to 4 times slower than
I can produce by hand.  Of course they can produce it a zillion times faster
than I can so in the current environment that is the more significant
factor.  I am also turned off by the lack of low level control that most
high level languages provide.

C/C++ strike a minimally acceptable balance between low level capability and
high level convenience.

Having said that, I recognize that C/C++ fail to provide adequate low level
control for many common low level functions.

These limitations are mostly a result of K&R (may they burn eternally in
hell), defining a language that appeals to the lowest common denominator
among all machines, i.e. it makes no assumptions about register sizes or the
manner in which variable data is represented.  Variables are considered as
simply abstractly numeric and not necessarily stored internally as a set of
binary bits.

Given that C/C++ is used to code I/O drivers, and write other very low level
code, the lack of concept of "bit" in the language necessarily makes these
programs non-compliant with proper C coding standards, and hence
non-portable.  It was foolish of K&R to create a median level language for
low level coding while defining it in such a manner as to make the
production of low level code, outside the sanctioned scope of the language.

Insanity!

As you observe, C/C++ can't be fixed.  At least not while remaining anywhere
near compatible.  It needs to be replaced.  You have my vote for "D".  It's
a step in the right direction.

By far your best decision has been to abandon the concept of code
compatibility with C/C++.  The need to maintain code compatibility with C
clearly placed nasty constraints on the development of C++, and as you
observe the language "standard" is vastly too convoluted to manage let alone
patch.

As I implied above, in my view, C/C++ is a very poor language because its
underlying core philosophy is often contradictory and simply wrong headed.
Consistency is replaced by special cases, and irrational behaviour and
inconsistent nomenclature.  Not only does this cause the standard to become
bloated and confused, but it also makes the language more difficult to use,
and the resulting programs more error prone.

There was a time when programs were written like spaghetti.  Programmers, it
was observed, didn't have the self-discipline to follow rational coding
standards.  As a result, the structured programming paradigm was invented in
order to provide a consistent approach to writing programs.  The result was
higher code quality, greater code simplicity and less error prone software.

The same argument is at the heart of the object model of programming.
Programmers, it was observed, didn't have the self-discipline to follow
rational coding standards and properly encapsulate the functionality of
their programs.  The Object programming paradigm, imposes for the most part
the "correct" behaviour on the programmer, or at least points him/her in the
proper direction.  Encapsulation is now foremost on everyone's mind.

Odd, I've been encapsulating Assembler functions since before the days of
C++.  The justification for encapsulation was clear.  The method was generic
and applicable to many other languages.  But not easily applicable to C,
because the language is so poorly defined.

The purpose of any language above assembler is two fold.  First, to make it
easier for programmers to write code, and second, to make it more probable
that the code produced will properly do its intended job.

Portability is often raised as a reason for the existence of high level
languages, but this is not a legitimate argument since low level languages
can be made to be just as portable as high level ones.  Java byte code and
other languages that generate intermediate code or Pcode have repeatedly
proven this.  JIT compilers are the final nail in the coffin.  Portability
is a feature of any language, and not as it is so often claimed by members
of the C/C++ religions, a characteristic of high level languages.

Ease of use and program correctness are related of course.  One pretty much
implies the other.  When a language is easy to use, it will typically be
easier to spot programming errors.  The opposite is also true.  When it is
difficult to write correct programs, a language is generally difficult to
use.  It's just common sense.

C is poor because its subtle behavioural anomalies and inconsistencies make
it difficult to follow its code.  Type conversions are often hidden,
assumptions must often be made about a variable's size and internal
representation which are outside of the language's specifications etc.

C is poor because it relies on the conscious effort of the programmer to
avoid hidden or concealed pitfalls, rather than avoiding them in the first
place through proper ergonomic language design.

The C philosophy is contradictory since as a higher level language, its
reason for existence is convenience.  But it then burdens the programmer
with its own irrational and inconsistent behaviours making the language
inconvenient and error prone.

At its very core the C philosophy is contradictory.  It is a structured
language, (clearly K&R recognized the superiority of structure in the
creation of well written programs).  But having recognized this, K&R then
foolishly refused to implement many important structural features that are
highly desirable, and trivially easy to implement.

Consider that C omits block typing.  Those structured languages that don't
descend from C, typically identify different kinds of blocks with different
bracketing keywords.  Do/While, If/Endif, Repeat/Until etc.

When reading a block typed program, if the programmer sees the word "until",
he/she knows that the statement ends a "repeat" block.

With C on the other hand the same delimiters are used for all blocks.  The
bracket that ends a C block could be the end of an if block or any other
kind of block.  The programmer is forced when reading the code, to search
back through the program to find which block is being closed.   For
convenience the strategy to combat this language failing is to progressively
indent blocks so that the start of a block can easily be identified.  And of
course since C is a free format language, we immediately run into the
problem of people having different indentation styles that typically make
following code written by another person difficult.

The indent requirement is utter lunacy.  No language should be able to have
its meaning be made essentially human unreadable through a simple loss of
indentation.

Insanity!


The C philosophy is that the language should be self documenting, yet the
language provides inadequate syntactic guards, redundancy and signposts that
would make it self documenting.

Specifically the omission of keywords like "then" after "if" begs the
question "if what?"

Yes, in an "if then" statement the keyword "then" is a redundancy.
Redundancy is a very good thing.

Natural language is full of redundancy.  It exists for a purpose.  If this
were not so, it would have been evolved out of the language.

Redundancy is maintained through the evolution of a language for a reason.
The reason is that it facilitates the conveyance of the intended meaning of
what is said.  A person may miss, or misinterpret one key idea in a
sentence, but through the language redundancy will often be guided to the
intended meaning.

Practical redundancy to enhance readability is a very good thing.  Syntactic
minimalism is a wish for the ignorant, and is to be avoided in any rationally
defined language.  APL is proof enough of that.

The more cryptic a language the more likely errors will be made.  The C
"religion" accepts that C is superior to assembler because the C syntax is
more readable and more convenient to use than assembler.  I agree.  However,
C dogma then goes on to reject alterations in the syntax that would provide
further benefits to readability and every other language convenience on the
grounds that they don't add any value to the language itself.  Clearly this
dogma is false.

I strongly urge you to move further away from the C syntax, and implement
block typing. If Then/End if, While/Whend, Do/Loop, SelectCase/EndCase,
Begin/End.

Note that I say SelectCase rather than Switch.  The word switch has no
contextual connection to the operation it identifies in C/C++.  In the real
world, a switch is a binary device.  It is either on or off.  In the real
world a switch does not allow the selection of multiple settings.  Dials,
and other kinds of SELECTORS do that kind of thing, and when they make
discrete choices, they are composed of <separate> "switches".
The keyword "switch" should therefore be changed to something meaningful,
something that describes what is being done.  How about using "Select"?
Seems to work for other languages.

The "switch" statement also has the unfortunate characteristic of not
allowing multiple comparisons for each case.  Also unfortunate is the need
for the "break" keyword to exit the statement.

Switch would be syntactically cleaner if it allowed multiple entries, and
the break keyword is removed.

Select (expression)
   Case a,b,c;
   Case d,e,f;
   default;
end select;

Rather than

switch (expression) {
   case a:
   case b:
   case c:
      break;
   case d:
   case e:
   case f:
      break;
   default:
      break;
}
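
For reference, a runnable C version of that grouping, where stacked `case` labels fall through to shared code and each group needs its own `break`. The function name and the values are made up for illustration.

```c
// Classifies n into groups: returns 1 for {1,2,3}, 2 for {4,5,6},
// and 0 for anything else.
int group_of(int n)
{
    int group;
    switch (n) {
    case 1:
    case 2:
    case 3:
        group = 1;
        break;      // without this break, control would fall into case 4
    case 4:
    case 5:
    case 6:
        group = 2;
        break;
    default:
        group = 0;
        break;
    }
    return group;
}
```

Deleting the first `break` still compiles, and `group_of(2)` would then return 2, which is the kind of silent error the proposed `Select` form is meant to rule out.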

The scope rules for C and C++ are also inconsistent.  All variables inside a
block are visible only within the block.  All variables inside a procedure
are visible only within the procedure.  C is inconsistent in that variables
and functions that are inside a translation unit or module, (whatever you
wish to call it), have global scope by default.

This behaviour is also plainly irrational.

Modules, should present <NO> external linkages at all unless those linkages
are explicitly indicated.  With the default behaviour set to make all module
level definitions global in scope, the programmer is forced to perform extra
work in order to code properly.  The unwary end up wasting compile
resources and throwing away potential optimizations when procedures are left
to the default global scope.

Insanity!  In order to minimize error and make a language convenient,
default behaviour should be that behaviour that is most consistent with good
programming practice.
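
In C the extra work in question is the `static` keyword: file-level names have external linkage unless the programmer marks them `static`. A small sketch with hypothetical names:

```c
// Internal linkage: invisible to other translation units, so the
// compiler is free to inline or discard it.
static int square(int x)
{
    return x * x;
}

// External linkage (the C default): any other file can declare and call it.
int sum_of_squares(int a, int b)
{
    return square(a) + square(b);
}
```

Under the rule argued for above, the defaults would be reversed: `square` would need no keyword and `sum_of_squares` would need an explicit export marker.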

C's method of defining variables is also fundamentally flawed. Not in the
syntax used, but in the manner in which the variables are sized.  C/C++
provides no clean and rational mechanism for programmers to know if the
variables they will be using are of adequate size or type to hold the data
they wish to represent.

C dogma holds that either the programmer write to the lowest common
denominator, or
use conditional compilation to select appropriate sized variables for the
task at hand.

That's not portability.  It's Insanity!

It is an extreme burden for the programmer to constantly have to worry if
his program is going to run correctly because the language may have altered
the size or type of his variable in some environments.  How many subtle
programming errors can be introduced into a program when its integer size
is altered from 8 to 32 bits?  Or more catastrophically the opposite?

The proper way to solve the problem is to provide the programmer with a set
of variable types that are guaranteed to be implemented in the language.  It
is the duty of the language to conform to the programmers demands.  If the
programmer wishes to use an 8 bit integer, and it is a sanctioned size, it
is the obligation of the language to synthesize one, out of the given CPU
registers even if the size is not natively supported.

Undefined variable sizes produce errors.

Compilers are often burdened to provide support for floating point types on
CPU's that don't have floating point variables.  On some CPU's floats must
be synthesized out of raw integers.  The programmer need not concern himself
as to whether the target CPU actually has native register support for these
variables.  Integers should be no different.

It is not relevant if the underlying CPU or environment supports 8 bit
integers or not. Such variables can be constructed with 16 or 32 bit
registers quite easily.  Program correctness must trump efficiency.  In
the case of variable definitions, C places efficiency above program
correctness.
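
A sketch of that synthesis in C, assuming only that `unsigned` is at least 16 bits wide: 8-bit arithmetic is obtained by masking the result back to 8 bits after each operation. The helper names are hypothetical.

```c
// Unsigned 8-bit add carried out in a wider unsigned int.
unsigned add_u8(unsigned a, unsigned b)
{
    return (a + b) & 0xFFu;     // keep only the low 8 bits: wraps at 256
}

// Signed 8-bit view of the same bits (two's complement), range -128..127.
int as_s8(unsigned bits)
{
    bits &= 0xFFu;
    return bits >= 0x80u ? (int)bits - 256 : (int)bits;
}
```

The cost is one extra mask per operation, which is the efficiency being traded for a guaranteed size.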

Insanity!

For D, your decision to forget about 8 bit CPU's will limit this problem to
some extent, but as a practical matter, integer variables are never going to
exceed 128 bits in size, and number representation is not going to stray
from typical binary representation found in modern CPU's.   So, this places
reasonable limits on the number of variable types that must be supported.

Some machines will fail to conform.  Too bad, those machines shouldn't
exist.

Rather than providing a set of integer variables of unknown size, I strongly
recommend that you define a set of integer variables of fixed size and sign
type.

I would suggest the following...

Byte   (unsigned 8 bit)
SByte  (Signed 8 bit)
Word   (unsigned 16 bit)
SWord  (signed 16 bit)
Dword  (unsigned 32 bits)
SDword (Signed 32 bits)
Qword  (unsigned 64 bits)
SQword (Signed 64 bits)
OWord  (Unsigned 128 bits)
SOWord (Signed 128 bits)

or ...

int8    (unsigned 8 bit)
sint8   (Signed 8 bit)
int16   (unsigned 16 bit)
sint16  (signed 16 bit)
int32   (unsigned 32 bits)
sint32  (Signed 32 bits)
int64   (unsigned 64 bits)
sint64  (Signed 64 bits)
int128  (Unsigned 128 bits)
sint128 (Signed 128 bits)
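
C99's `<stdint.h>` header offers exact-width types in much the same spirit. The mapping below onto the first set of names suggested above is only an illustration, not part of D or of this proposal; note that the C standard makes the exact-width types optional on machines that cannot provide them, and it defines no 128-bit type.

```c
#include <limits.h>
#include <stdint.h>

typedef uint8_t  Byte;    // unsigned 8 bit
typedef int8_t   SByte;   // signed 8 bit
typedef uint16_t Word;    // unsigned 16 bit
typedef int16_t  SWord;   // signed 16 bit
typedef uint32_t Dword;   // unsigned 32 bit
typedef int32_t  SDword;  // signed 32 bit
typedef uint64_t Qword;   // unsigned 64 bit
typedef int64_t  SQword;  // signed 64 bit

// Width of a type in bits, for checking the guarantees.
#define BITS_OF(T) (sizeof(T) * CHAR_BIT)

// The same arithmetic gives the same answer wherever these types exist.
Byte wrap_add(Byte a, Byte b)
{
    return (Byte)(a + b);   // 250 + 10 wraps to 4
}
```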

Let's have no optional behaviour in the language spec. None of this - a
character is an 8 bit signed number in some machines, and an unsigned number
in others - nonsense.

Another aspect of variable definition that should be changed is the names of
the types themselves.  Again the problem is ambiguity.  Ambiguity breeds
error.

Single.  Single what?    "flapjack" has just as much meaning.
Double.  Double what?    Double the flapjack of course.  How about "tall"?
Long.    Long what?      How long is a long?  Shouldn't "dint" be "short"?

With the integer data types shown previously there is no ambiguity.

I see nothing wrong with...

Float1
Float2
Float3

or

FSingle
FDouble
FExtra

As long as the floating point nature of the variable, if not its absolute
size, is made abundantly clear.

Specifying pointer sizes is clearly an issue.  As a practical matter no CPU
that I know of has more than one pointer size, so not specifying the size in
bits is a legitimate option.  It does however raise the issue of how the
program can ensure that a pointer subtraction can be contained within a
register of known size.

Fortunately pointer subtraction doesn't occur very often so in my view a
typedef based on variable size seems the most pragmatic and acceptable
solution.


C also fails when it comes to character types as it does not specify if they
are signed or unsigned.  Defining Char and SChar, WChar and SWChar will solve
that problem for both ASCII and wide Unicode characters.
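
In C itself the ambiguity can be sidestepped by never doing arithmetic on plain `char` and spelling out `signed char` or `unsigned char` instead. A short sketch (function names hypothetical) showing that, on a two's-complement machine, the same byte widens to different integers depending on the declared signedness:

```c
// Widening conversions: the declared signedness decides the result.
int widen_signed(signed char c)     { return c; }
int widen_unsigned(unsigned char c) { return c; }

// Reinterprets a signed byte as unsigned; well defined in C (modulo 256).
unsigned char to_unsigned(signed char c)
{
    return (unsigned char)c;
}
```

For example, the byte that reads as -23 through `signed char` reads as 233 through `unsigned char`; with plain `char` the language leaves the choice to the implementation.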

But why should characters be treated as numbers at all when they are
characters. Doing so is a rejection of type definition itself.  Rational
languages are strongly typed and conversion can be indicated through
explicit casting.

While I am on the subject of variables, I observe that very many programming
errors in C are caused by improper array bounds checking.  I have long
thought that a secondary "secure" pointer type should be implemented that
places bounds on the pointer's value. This would be a complex type that
contains not only the pointer itself, but the upper and lower bounding
values.

Of course you can implement bounded pointers through overloading, but who
actually does it?  Using a composite variable is also inefficient when
implemented in software by the compiler.  Ideally this type should be
implemented in hardware so that bounding tests can be performed in parallel
as the content of the pointer register is changed.
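
A software-only sketch of such a composite type in C, to make the idea concrete. Everything here is hypothetical (`bounded_ptr`, `bp_make`, `bp_move`, `bp_get`), and it carries exactly the overhead described above, since the bounds test runs in code on every move rather than in parallel in hardware.

```c
#include <stddef.h>

// Hypothetical "secure" pointer: the current position plus the bounds
// it must stay within. Valid positions are lo <= cur < hi.
typedef struct {
    int *cur;
    int *lo;    // first valid element
    int *hi;    // one past the last valid element
} bounded_ptr;

// count must be at least 1 so that the initial position is valid.
bounded_ptr bp_make(int *base, size_t count)
{
    bounded_ptr bp = { base, base, base + count };
    return bp;
}

// Moves the pointer by delta elements. Returns 1 on success; returns 0
// and leaves the pointer unchanged if the move would leave [lo, hi).
int bp_move(bounded_ptr *bp, ptrdiff_t delta)
{
    ptrdiff_t pos = bp->cur - bp->lo;   // current offset from lo
    ptrdiff_t len = bp->hi - bp->lo;
    if (delta < -pos || delta >= len - pos)
        return 0;
    bp->cur += delta;
    return 1;
}

// Dereference; safe as long as every move went through bp_move.
int bp_get(const bounded_ptr *bp)
{
    return *bp->cur;
}
```

Unlike a checked array index, the pointer itself moves here, and it is the move that is validated.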

Why this kind of thing isn't already implemented in hardware is beyond me.
It's not needed I guess.  All those buffer overflow problems that are the
source of 90% of all the security exploits must be figments of my
imagination...

Insanity!

In C and C++ /*Comments*/ can't nest.  I've never understood the
justification for such a limitation.  Certainly the preprocessor strips out
the comments before the compiler sees them, so it should have been trivial
to simply have the pre-processor use a counter rather than a boolean to
determine if it was inside a comment.
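
The counter idea can be sketched in a few lines of C. This is a simplified illustration with a hypothetical function name; it ignores string literals and `//` comments, which a real preprocessor would have to handle.

```c
#include <stddef.h>

// Copies src to dst with comments removed, allowing comments to nest.
// dst must hold at least strlen(src) + 1 bytes.
// Returns the nesting depth still open at end of input (0 if balanced).
int strip_nested_comments(const char *src, char *dst)
{
    int depth = 0;      // a counter instead of an inside/outside boolean
    size_t out = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] == '/' && src[i + 1] == '*') {
            depth++;    // comment opener, possibly nested
            i++;        // skip the second character of the opener
        } else if (depth > 0 && src[i] == '*' && src[i + 1] == '/') {
            depth--;    // comment closer
            i++;
        } else if (depth == 0) {
            dst[out++] = src[i];    // ordinary code, keep it
        }
    }
    dst[out] = '\0';
    return depth;
}
```

With a boolean, the first closer would end the comment and leave the tail of the outer comment exposed as code; with the counter, text is kept only when the depth is back at zero.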

C/C++ are free format languages, and there is nothing wrong with that.

However, it is often argued by C religionists, that the mandatory use of
semicolons is required if the language is to remain free format.  This is
clearly false.


their level of experience.  Why not remove their requirement?

This should be as simple as defining <EOL> to be equivalent to a semicolon,
and then defining a line continuation character that causes the next
occurrence of <EOL> to be ignored.  Semicolons can be reserved for putting
multiple statements on one logical line.

begin
  dint a,b,c
  c = b + a; b = a + c
end

Rather than...

{
  dint a;
  dint b;
  dint c;
  c = b + a;
  b = a + c;
}

a = FtnCall(VariableA, VariableB, _
            VariableC, VariableD)

Rather than...

a = FtnCall(VariableA, VariableB,
            VariableC, VariableD);
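
The rule can be tried out as a small source-to-source pass, written here in C. This is a sketch under stated assumptions, not a real front end: the function name is hypothetical, `_` as the last non-blank character of a line is taken as the continuation marker, and an identifier that happens to end in `_` at the end of a line would be misread.

```c
#include <stddef.h>

// Rewrites src into dst: a newline ends a statement (emitted as ";\n"),
// unless the last non-blank character on the line is '_', in which case
// the marker and the line break are replaced by a single space.
// dst must hold at least 2 * strlen(src) + 1 bytes.
void eol_to_semicolons(const char *src, char *dst)
{
    size_t out = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] != '\n') {
            dst[out++] = src[i];
            continue;
        }
        // Trim trailing blanks so the last real character can be inspected.
        size_t end = out;
        while (end > 0 && (dst[end - 1] == ' ' || dst[end - 1] == '\t'))
            end--;
        if (end > 0 && dst[end - 1] == '_') {
            out = end - 1;          // drop '_' and join with the next line
            dst[out++] = ' ';
        } else if (end > 0 && dst[end - 1] != '\n' && dst[end - 1] != ';') {
            out = end;
            dst[out++] = ';';       // end of line terminates the statement
            dst[out++] = '\n';
        } else {
            out = end;
            dst[out++] = '\n';      // blank line or already terminated
        }
    }
    dst[out] = '\0';
}
```

Explicit semicolons inside a line pass through untouched, so several statements can still share one line.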



Your choice in "D" is to remove operator overloading from the language spec.
Excellent, I concur with you that the ability to redefine operators causes
many more problems than it solves.  However, having the ability to use
operator notation does greatly simplify the coding of equations involving
complex numbers etc.

The idea of converting function calls to an operator syntax is a good one.
C++ (and other OOP languages) simply take the wrong tack.  Rather than
overload the existing operators, it is a much better idea to allow the
creation of new operators.
New operators should all have equal precedence.

Many languages have the facility to implement new operators rather than
simply overloading the existing ones.  If properly used this can provide a
means of operator typing and therefore improve code clarity.  You could for
example write statements such as...

c .c= a .c+ b

Where <.c=> is defined as complex assignment
      <.c+> is defined as complex add

Recc .Rec= Reca .Rec_Name_Greater Recb

Where .Rec= is defined as record assignment
      .Rec_Name_Greater  is defined as a comparison between named portions of
two records.

Once identified, the above syntax can easily be converted to a function-like
syntax by a preprocessor as follows.

Define Operator Rec_Name_Greater = as type Rec Rec_Name_Greater(type Rec,
type Rec)

The precompiler would then translate the statement ..

Recc .Rec= Reca .Rec_Name_Greater Recb

into the call ...

Recc = Rec_Name_Greater(Reca,Recb)

This implies that the compiler should be smart enough to be able to
automatically handle assignment to complex data types via a block move as
necessary.

It also implies that the internal organization of structures be defined
within the language.

Defining the internal variable arrangement of structures poses difficulties
for language portability and efficiency.  I would suggest that there be two
types of structures.  One defined for portability between platforms, and one
for internal use with the stipulation that the one defined for portability
between platforms be used only where structures are necessarily shared.

Since data formats are assumed to be binary and in the form of 8, 16, 32, 64
or 128 bits and strings, it is necessary to worry about endian formats, both
bitwise and bytewise. The most convenient way to do this would be to allow
the definition of new big endian/little endian variable types within
structures.  The best way to do this would be to define the types as
"overrides" in the structure type definition itself.

I.E.

new iotype wombat BigBitEndian, BigByteEndian
   int8  a ,b ,c
   int16 a1,b1,c1
   int32 a2,b2,c2
   ...
end type

iotypes should not be allowed to support any operation other than
assignment.  This will force the immediate conversion of such types to the
native format used by the machine.
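
In C terms, the "assignment only" rule amounts to converting a fixed-layout external field into a native value the moment it is read. The sketch below assumes a big-endian byte order on the external side and 8-bit bytes; the helper names load_be16, load_be32 and store_be32 are illustrative only. Because the conversion is written with shifts on individual bytes, it behaves the same on big-endian and little-endian hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian 16-bit value from two raw bytes. */
uint16_t load_be16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Decode a big-endian 32-bit value from four raw bytes. */
uint32_t load_be32(const unsigned char *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Encode a native 32-bit value as four big-endian bytes. */
void store_be32(unsigned char *p, uint32_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}
```

All arithmetic then happens on the native uint16_t and uint32_t values, and the external representation is touched only at the load and store boundaries.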

Again, coming from a machine language environment, I prefer to keep things
as compact as possible.  To that end, I often define one integer of storage
space and use it to hold multiple boolean flags.

const wombat  equ 0x0000000000000001
const wombat1 equ 0x0000000000000010
const wombat2 equ 0x0000000000000100
...
WombatFlg     dw  0x0000000000000000

You can do similar things in most other languages of course, but in every
case that I know of you end up with a whopping number of similarly named and
unassociated constants that may have the same numerical values.  Not good.

I would like to have the ability to associate a constant with a particular
data type.  For example

new type wombat
   const wombat  equ 0x0000000000000001
   const wombat1 equ 0x0000000000000010
   const wombat2 equ 0x0000000000000100
   ...
   WombatFlg     dw  0x0000000000000000
end type wombatx

wombatx.wombatflg |= wombat.wombat1
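
For comparison, the closest approximation in plain C is an enum paired with a struct and a few accessor functions. This is only a sketch with made-up names (wombat_flag, wombat_set and so on), and it falls short of the proposal: C enumerators still live in the global name space and the compiler will not stop an unrelated integer from being passed where a wombat_flag is expected.

```c
#include <assert.h>
#include <stddef.h>

/* Flag constants grouped under one enum so they are at least
   declared together with the type they belong to. */
typedef enum {
    WOMBAT_FLAG0 = 0x001,
    WOMBAT_FLAG1 = 0x010,
    WOMBAT_FLAG2 = 0x100
} wombat_flag;

/* One word of storage holding all of the boolean flags. */
typedef struct {
    unsigned flags;
} wombat;

/* Turn a flag on. */
void wombat_set(wombat *w, wombat_flag f)
{
    assert(w != NULL);
    w->flags |= (unsigned)f;
}

/* Turn a flag off. */
void wombat_clear(wombat *w, wombat_flag f)
{
    assert(w != NULL);
    w->flags &= ~(unsigned)f;
}

/* Return nonzero if the flag is currently set. */
int wombat_test(const wombat *w, wombat_flag f)
{
    assert(w != NULL);
    return (w->flags & (unsigned)f) != 0;
}
```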


The way C and C++ implement Preincrement and Postincrement is also
fundamentally wrong headed.  The C standard stipulates that there is no
guarantee where in an expression the pre or post increment will occur, only
that they will have been performed at the end of the expression.  Multiple
appearances of a variable in an expression provide different results for the
expression if the variable is incremented/decremented within the expression.

Insanity!

At the very least, C/C++ should check for such problems and REFUSE to
compile any program that contains such a problem.  Hopefully spitting out a
meaningful error message.

I would prefer the following...

The definition should stipulate that preincrements occur before the first
reference to the variable in question, and post increments occur after the
last reference to a variable.  Multiple increments/decrements of the same
variable in an expression should trigger a compile error.
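
To make the proposed rule concrete: in C an expression such as dst[i] = src[i++] has undefined behaviour, because i is read and modified without an intervening sequence point. The sketch below writes out by hand what the rule above would define such expressions to mean. The function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Proposed meaning of  dst[i] = src[i++] :
   the post-increment takes effect after the last reference to i. */
int copy_then_advance(int *dst, const int *src, int i)
{
    assert(dst != NULL && src != NULL);
    dst[i] = src[i];
    i = i + 1;
    return i;   /* the updated index */
}

/* Proposed meaning of  dst[i] = src[++i] :
   the pre-increment takes effect before the first reference to i. */
int advance_then_copy(int *dst, const int *src, int i)
{
    assert(dst != NULL && src != NULL);
    i = i + 1;
    dst[i] = src[i];
    return i;   /* the updated index */
}
```

Writing the increment as its own statement, as done here, is also the portable way to get a defined result in C today.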


Automatic variable type conversion is a legitimate convenience feature that
C provides. However, once again the language default behaviour is contrary
to common sense.

Casting errors are extremely common and often difficult to find particularly
when they occur in a function call.  To make this less likely, variable
casting should only occur if explicitly stated.  I would even go so far as
to require explicit casting where arrays are converted to pointer references
in function calls.  This may be going too far, but anything that will make
the conversion more explicit is beneficial.

I believe I read that "D" will pass arrays rather than converting them to
pointers.  This is a mistake, as passing arrays will clearly be very
inefficient.  Any attempt to pass an array as an array rather than a casted
pointer should generate an error.

Label case sensitivity is also an issue.  Here in the real world, words do
not change their meaning when they are capitalized.  A Banana is a BANANA.
So it should also be with programming labels and keywords.  All compilers
should ignore the case of all names and keywords, yet optionally keep the
case intact for the purpose of export and linkage.  Linkage within the
language itself should also be case insensitive.  I would recommend doing
this through the conversion of all exported labels to upper case.

Keeping case sensitivity to names and keywords promotes poor programming
practices like defining variables with names like HWin, hWin, hWIN, etc. Oh
ya, like that isn't found in every windows program ever written.

Insanity!

In the current C/C++ dogma, case sensitivity is considered to be a good
thing because it provides slightly greater flexibility in naming, but this
benefit is predicated on the creation of identically named identifiers that
differ only by case, yet creating such labels is considered (correctly so)
bad programming practice.  So the benefit is only had through bad
programming practice.  Another inconsistency.

Insanity!


Another feature that I would like to see is the ability to perform
initializations in the following manner

int16 a,b,c = 7;
int16 a,b,c = 7,8,9;

static int16 a,b,c = 0;


You have gotten rid of the . vs -> fiasco.  Wonderful.
Lost the forward reference, predefinition kludge.  Wonderful.
Lost macros.  Wonderful as long as a better facility is provided.
              Being able to define your own operators goes a long way in
this respect.
Improved error trapping.  Wonderful.
Decided to include strings as a native type.  Wonderful.

Improvements in object implementation.  Wonderful.

I wish you success in the development of "D", and I pray for a world
rational enough to abandon the abominations that are now being used, and
adopt your language.

Unfortunately, members of the C religion are far too deep into the
Plauktau - the blood fever - to listen to reason.

Landru - guide us!

All is chaos!

Feb 09 2002

"Richard Krehbiel" <rich kastle.com> writes:

"Walter" <walter digitalmars.com> wrote in message
news:a3ufcm$1nvu$2 digitaldaemon.com...
 "D" <s_nudds hotmail.com> wrote in message
 news:a3tuq7$qtk$1 digitaldaemon.com...
 No paul.  Apparently it is not a pointer.  The array index is compared
 against the array bounds.
 That means array references are performed the typical way, by recomputing
 the pointer from the index after an index change.  That requires a
 multiply and add before at least the first reference after an index change.

 In most cases, the multiply and add are done in hardware in the addressing
 mode calculation and do not add any execution time.

Should I take this to mean that you believe applying the "strength
reduction" optimization to array subscripting operations to be unnecessary?
I didn't think multiplies and adds were *that* free yet.

--
Richard Krehbiel, Arlington, VA, USA
rich kastle.com (work) or krehbiel3 home.com (personal)

Feb 08 2002

Russell Borogove <kaleja estarcion.com> writes:

Richard Krehbiel wrote:

 "Walter" <walter digitalmars.com> wrote in message
 news:a3ufcm$1nvu$2 digitaldaemon.com...

 "D" <s_nudds hotmail.com> wrote in message
 news:a3tuq7$qtk$1 digitaldaemon.com...
 That means array references are performed the typical way, by recomputing
 the pointer from the index after an index change.  That requires a
 multiply and add before at least the first reference after an index change.

 In most cases, the multiply and add are done in hardware in the addressing
 mode calculation and do not add any execution time.

 Should I take this to mean that you believe applying the "strength
 reduction" optimization to array subscripting operations to be unnecessary?
 I didn't think multiplies and adds were *that* free yet.



It depends on the exact architecture of course, and I think
that Walter's "in most cases" meant the very common multiply-
by-2 and multiply-by-4 to index 16- and 32-bit arrays.

-Russell B

Feb 08 2002

"Walter" <walter digitalmars.com> writes:

"Richard Krehbiel" <rich kastle.com> wrote in message
news:a40nur$1r5s$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 In most cases, the multiply and add are done in hardware in the addressing
 mode calculation and do not add any execution time.

 Should I take this to mean that you believe applying the "strength
 reduction" optimization to array subscripting operations to be unnecessary?
 I didn't think multiplies and adds were *that* free yet.

In my testing, doing such strength reduction for modern processors makes
things worse, not better. The DMC optimizer has specific code in it to
'undo' optimizations that would otherwise fit in an addressing mode.
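
For readers unfamiliar with the term, the two forms under discussion can be sketched in C as follows. The function names are invented for this illustration. The first loop indexes the array, leaving the scale-and-add to the addressing mode where the CPU has one; the second is the hand-applied "strength reduced" form that steps a pointer instead. Both compute the same sum, and which one runs faster depends on the compiler and the processor, so no timing claim is made here.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: the address of a[i] is base + i * sizeof(int),
   which many CPUs can compute inside the load instruction itself. */
long sum_indexed(const int *a, size_t n)
{
    long s = 0;
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Strength-reduced form: the multiply is replaced by stepping a
   pointer forward one element per iteration. */
long sum_strength_reduced(const int *a, size_t n)
{
    long s = 0;
    assert(a != NULL);
    for (const int *p = a, *end = a + n; p != end; ++p)
        s += *p;
    return s;
}
```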

Feb 08 2002

"D" <s_nudds hotmail.com> writes:

First, not all CPUs have the ability to perform op(add(add(shift))) within
a single instruction.
Second, the fact that the CPU actually performs a shift rather than a
multiply limits all such references to regular power-of-two byte boundaries.

These restrictions are suitable for array references, but they do not lend
themselves for pointer references.

That is after all the reason pointers exist in C/C++ in addition to arrays.

If arrays were sufficient, then pointers need not exist and would not be
implemented.

Clearly then if you believe that bounded arrays are a language requirement,
you must logically conclude that bounded pointers are also a requirement
since by including pointers in the first place you have admitted that
arrays are not sufficient.

Logic is not a pretty wreath of flowers that smell bad.

 "Richard Krehbiel" <rich kastle.com> wrote in message
 news:a40nur$1r5s$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 In most cases, the multiply and add are done in hardware in the addressing
 mode calculation and do not add any execution time.

 Should I take this to mean that you believe applying the "strength
 reduction" optimization to array subscripting operations to be unnecessary?
 I didn't think multiplies and adds were *that* free yet.



Walter <walter digitalmars.com> wrote in message
news:a41ime$2pnk$4 digitaldaemon.com...
 In my testing, doing such strength reduction for modern processors makes
 things worse, not better. The DMC optimizer has specific code in it to
 'undo' optimizations that would otherwise fit in an addressing mode.

Feb 09 2002

"Roberto Mariottini" <rmariottini lycosmail.com> writes:

"H. Ellenberger" <ele1 gmx.ch> wrote in message
news:3C49E692.C9DE8997 gmx.ch...
 Walter wrote:

 I've been working on implementing it. After turning it on and recompiling
 the library and test code, it tripped and found 3 bugs in the regexp
 implementation - code that I have a nice test suite for that was passing.
 Just goes to show, array bounds checking really is valuable!

 No surprise for people with experience in Topspeed Modula-2 which had
 this RT check many years ago.

Or in Turbo Pascal, which had this optional RT check in the 80s.

Jan 21 2002

Russell Borogove <kaleja estarcion.com> writes:

Walter wrote:

 I've been working on implementing it. After turning it on and recompiling
 the library and test code, it tripped and found 3 bugs in the regexp
 implementation - code that I have a nice test suite for that was passing.

And naturally you immediately added code to the test suite
that would have caught those bugs if the bounds check hadn't,
right? Belt and suspenders! Belt and suspenders!

Yes, the first time I started using a range-checked array
class, I was surprised at how many catches it made.

-R

Jan 19 2002
