D - array bounds checking

"Walter" <walter digitalmars.com> writes:
I've been working on implementing it. After turning it on and recompiling
the library and test code, it tripped and found 3 bugs in the regexp
implementation - code that I have a nice test suite for that was passing.

Just goes to show, array bounds checking really is valuable! And being able
to turn it off for performance code is why D is better than other languages
offering bounds checks.
Jan 19 2002
"H. Ellenberger" <ele1 gmx.ch> writes:
Walter wrote:

 I've been working on implementing it. After turning it on and recompiling
 the library and test code, it tripped and found 3 bugs in the regexp
 implementation - code that I have a nice test suite for that was passing.

 Just goes to show, array bounds checking really is valuable!

No surprise for people with experience in Topspeed Modula-2 which had this RT check many years ago.
 And being able
 to turn it off for performance code is why D is better than other languages
 offering bounds checks.

My experience showed that a good implementation in most cases does not slow down too much, so I often left all checks (array bounds, overflow, NIL pointer) on, except for well tested library functions.
Jan 19 2002
"Walter" <walter digitalmars.com> writes:
"H. Ellenberger" <ele1 gmx.ch> wrote in message
news:3C49E692.C9DE8997 gmx.ch...
 Walter wrote:
 I've been working on implementing it. After turning it on and recompiling
 the library and test code, it tripped and found 3 bugs in the regexp
 implementation - code that I have a nice test suite for that was passing.

 Just goes to show, array bounds checking really is valuable!

 No surprise for people with experience in Topspeed Modula-2 which had
 this RT check many years ago.

 And being able
 to turn it off for performance code is why D is better than other languages
 offering bounds checks.

 My experience showed that a good implementation in most cases does not slow
 down too much, so I often left all checks (array bounds, overflow, NIL pointer)
 on, except for well tested library functions.

I figure by making it optional, any objections to it should be addressed.
Jan 19 2002
"D" <s_nudds hotmail.com> writes:
Bounds checking will eventually be seen as essential for both array and
pointer operations.

A secondary bounded pointer type should be defined for that purpose.  It
should be a composite type, and it should be targeted so that it is
implemented in hardware, and throws an error when an access is attempted
outside the bounded range.

Walter <walter digitalmars.com> wrote in message
news:a2d4kq$1tdi$1 digitaldaemon.com...
 I figure by making it optional, any objections to it should be addressed.

Feb 04 2002
"Pavel Minayev" <evilone omen.ru> writes:
"D" <s_nudds hotmail.com> wrote in message
news:a3ll98$svr$1 digitaldaemon.com...
 Bounds checking will eventually be seen as essential for both array and
 pointer operations.

 A secondary bounded pointer type should be defined for that purpose.  It
 should be a composite type, and it should be targeted so that it is
 implemented in hardware, and throws an error when an access is attempted
 outside the bounded range.

"Bounded pointer" - i.e. a pointer that knows the size of the data it points to - is a D dynamic array:

    int[] a = new int[5];
    ...
    b = a[10];    // throws ArrayBoundsError
Feb 04 2002
"Walter" <walter digitalmars.com> writes:
"D" <s_nudds hotmail.com> wrote in message
news:a3ll98$svr$1 digitaldaemon.com...
 Bounds checking will eventually be seen as essential for both array and
 pointer operations.

 A secondary bounded pointer type should be defined for that purpose.  It
 should be a composite type, and it should be targeted so that it is
 implemented in hardware, and throws an error when an access is attempted
 outside the bounded range.

Already in D! Well, a software version anyway. Example: convert a pointer p into a "bounded pointer" bp:

    char *p;
    char[] bp = p[0..len];
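
For readers who think in C, the conversion above can be pictured as pairing the raw pointer with its length and checking every index against that length. This is only a sketch of the idea, not a description of how the D compiler implements dynamic arrays. The names char_slice, slice_from and slice_get are hypothetical, and a false return value stands in for the error D would raise.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "bounded pointer": the raw pointer travels with its length,
   roughly what p[0..len] produces in the D example. */
typedef struct {
    char  *ptr;
    size_t len;
} char_slice;

static char_slice slice_from(char *p, size_t len)
{
    assert(p != NULL);              /* this sketch does not model null slices */
    char_slice s = { p, len };
    return s;
}

/* Checked read: writes the element to *out and returns true when index is
   in range; returns false (rather than raising an error) when it is not. */
static bool slice_get(char_slice s, size_t index, char *out)
{
    if (index >= s.len)
        return false;
    *out = s.ptr[index];
    return true;
}
```

A caller would build the slice once with slice_from(p, len) and then go through slice_get for every access, so an out-of-range index is reported instead of silently reading past the buffer.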
Feb 04 2002
"D" <s_nudds hotmail.com> writes:
Great.  I take it that all pointer increments and assignments are checked
against the upper and lower bounds of the array, and an exception is thrown
if the range is violated?

Walter <walter digitalmars.com> wrote in message
news:a3lp86$v31$1 digitaldaemon.com...
 Already in D! Well, a software version anyway. Example: convert a pointer p
 into a "bounded pointer" bp:

     char *p;
     char[] bp = p[0..len];

Feb 04 2002
"Walter" <walter digitalmars.com> writes:
"D" <s_nudds hotmail.com> wrote in message
news:a3ni7d$29f5$1 digitaldaemon.com...
 Great.  I take it that all pointer increments and assignments are checked
 against the upper and lower bounds of the array, and an exception is thrown
 if the range is violated?

Not exactly. You don't increment dynamic arrays, but you do increment the index.
Feb 04 2002
"D" <s_nudds hotmail.com> writes:
Then it's not a pointer is it?

As I said, I recommend implementing a bounded pointer type.

Walter <walter digitalmars.com> wrote in message
news:a3nvlr$2f0q$3 digitaldaemon.com...
 Not exactly. You don't increment dynamic arrays, but you do increment the index.

Feb 06 2002
"Pavel Minayev" <evilone omen.ru> writes:
"D" <s_nudds hotmail.com> wrote in message
news:a3r373$28kk$1 digitaldaemon.com...
 Then it's not a pointer is it?

IT IS A POINTER ON WHICH POINTER MATH CANNOT BE PERFORMED. I always wondered why C geeks never consider something that cannot be moved to be a pointer.
Feb 06 2002
"D" <s_nudds hotmail.com> writes:
No Paul.  Apparently it is not a pointer.  The array index is compared
against the array bounds. That means array references are performed the
typical way, by recomputing the pointer from the index after an index
change.  That requires a multiply and add before at least the first
reference after an index change.

A pointer is different. A pointer points to an area of memory.
An index is not a pointer.

Apparently D doesn't have a bounded pointer type.
I recommend one be added.
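
To make the request concrete, here is one way such a composite type could look if written by hand in C: the pointer itself plus the lower and upper limits it may never leave, with every move checked. This is an illustrative sketch only. The names bounded_ptr, bp_make, bp_advance and bp_read are hypothetical, nothing like this exists in D or C as a built-in, and a false return value stands in for the error the proposal asks for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical composite "bounded pointer". Valid positions satisfy
   lo <= cur < hi. */
typedef struct {
    char *lo;   /* first valid element */
    char *hi;   /* one past the last valid element */
    char *cur;  /* current position */
} bounded_ptr;

static bounded_ptr bp_make(char *base, size_t len)
{
    assert(base != NULL);
    bounded_ptr bp = { base, base + len, base };
    return bp;
}

/* Move by delta elements; refuse (leaving bp unchanged) if the new
   position would fall outside [lo, hi). */
static bool bp_advance(bounded_ptr *bp, ptrdiff_t delta)
{
    ptrdiff_t pos = bp->cur - bp->lo;
    ptrdiff_t len = bp->hi - bp->lo;
    if (delta < -pos || delta >= len - pos)
        return false;
    bp->cur += delta;
    return true;
}

/* Checked dereference for reading. */
static bool bp_read(const bounded_ptr *bp, char *out)
{
    if (bp->cur < bp->lo || bp->cur >= bp->hi)
        return false;
    *out = *bp->cur;
    return true;
}
```

Unlike an index into an array, the position here is a real address that is moved in place; the cost is two comparisons on every move, which is the software overhead that a hardware version would be meant to hide.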

Pavel Minayev <evilone omen.ru> wrote in message
news:a3r4qu$29ai$1 digitaldaemon.com...
 "D" <s_nudds hotmail.com> wrote in message
 news:a3r373$28kk$1 digitaldaemon.com...
 Then it's not a pointer is it?

 IT IS A POINTER ON WHICH POINTER MATH CANNOT BE PERFORMED. I always wondered why C geeks never consider something that cannot be moved to be a pointer.

Feb 07 2002
"Walter" <walter digitalmars.com> writes:
"D" <s_nudds hotmail.com> wrote in message
news:a3tuq7$qtk$1 digitaldaemon.com...
 No Paul.  Apparently it is not a pointer.  The array index is compared
 against the array bounds.
 That means array references are performed the typical way, by recomputing
 the pointer from the index
 after an index change.  That requires a multiply and add before at least the
 first reference after an index change.

In most cases, the multiply and add are done in hardware in the addressing mode calculation and do not add any execution time.
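
The two traversal styles under discussion can be written side by side in C (hypothetical function names). Both compute the same sum. On x86, for example, an access like a[i] with 4-byte elements can normally be encoded with a scaled-index addressing mode, which is the kind of hardware multiply-and-add meant here; whether a given compiler actually emits that is not something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>

/* Index form: each access is a[i], i.e. base + i * sizeof(int). */
static long sum_indexed(const int *a, size_t n)
{
    assert(a != NULL);
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Pointer form: the pointer itself is advanced past each element. */
static long sum_pointer(const int *a, size_t n)
{
    assert(a != NULL);
    long total = 0;
    for (const int *p = a, *end = a + n; p != end; p++)
        total += *p;
    return total;
}
```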
Feb 07 2002
"D" <s_nudds hotmail.com> writes:
Walter, in most cases people use pointers rather than arrays.
If you believe that arrays can take the place of pointers, then drop
pointers from the language spec.

Further, delimited pointers would restrict access within structures as well.

Do you wish to fix the legion of problems in C or not?  If not, what is the
point of D?

Walter <walter digitalmars.com> wrote in message news:a3ufcm$1nvu$2 digitaldaemon.com...
 In most cases, the multiply and add are done in hardware in the addressing
 mode calculation and do not add any execution time.

Feb 07 2002
"Walter" <walter digitalmars.com> writes:
"D" <s_nudds hotmail.com> wrote in message
news:a3vjlf$1ah0$1 digitaldaemon.com...
 Walter, in most cases people use pointers rather than arrays.
 If you believe that arrays can take the place of pointers, then drop
 pointers from the language spec.

Pointers are necessary for, if nothing else, compatibility with C APIs.

 Further, delimited pointers would restrict access within structures as well.

 Do you wish to fix the legion of problems in C or not?  If not, what is the
 point of D?

D addresses the worst of the problems with C.
Feb 07 2002
"D" <s_nudds hotmail.com> writes:
 Do you wish to fix the legion of problems in C or not?  If not, what is the
 point of D?


Walter <walter digitalmars.com> wrote in message news:a3vsm5$1e1s$1 digitaldaemon.com...
 D addresses the worst of the problems with C.

Your changes to the C language are too minimal for D to survive as a language.

I have just read with great interest the overview of your "D" language, contained within the distribution file dmdalpha.zip, and thought I would add my 2 cents. Well, ok.... 27K of uncommon sense.

Let me start by saying that I loathe both C and C++. These languages are abominations, abortions, unworthy of existence, a pox on the earth, etc... C fails on so many levels that I don't know where to begin.

Even the standard I/O library is an abomination.... Consider the function "gets". It takes a special kind of moron to write such a function. It takes a special class of mindless wonders to actually decide to incorporate such a piece of filth into a "standard" library. Those who included gets in the first place, and those who voted to put it in the standard, simply are too stupid to justify their continued existence. Even torture and death is an inadequate punishment for their criminal stupidity. Words can not convey my loathing for these worthless vermin.

However, having got that off my chest, C/C++ still do have some nice characteristics. And while the languages themselves are also unworthy of existence, they can be salvaged. I long for a day when C is replaced by a similarly featured language that solves most of its ample list of failings.

I come from an assembly language background, so I am very familiar with how machines operate at the lowest levels. I'm intimately familiar with pointers, register sets, memory allocation, and rolling my own functions and syntax as needed.

I appreciate the utility of high level languages, although I am turned off by the pathetic level of optimization they provide. To this day, compilers still produce code that is 2 to 4 times larger and 2 to 4 times slower than what I can produce by hand. Of course they can produce it a zillion times faster than I can, so in the current environment that is the more significant factor.
I am also turned off by the lack of low level control that most high level languages provide. C/C++ strike a minimally acceptable balance between low level capability and high level convenience.

Having said that, I recognize that C/C++ fail to provide adequate low level control for many common low level functions. These limitations are mostly a result of K&R (may they burn eternally in hell) defining a language that appeals to the lowest common denominator among all machines, i.e. one that makes no assumptions about register sizes or the manner in which variable data is represented. Variables are considered as simply abstractly numeric and not necessarily stored internally as a set of binary bits.

Given that C/C++ is used to code I/O drivers and write other very low level code, the lack of a concept of "bit" in the language necessarily makes these programs non-compliant with proper C coding standards, and hence non-portable. It was foolish of K&R to create a median level language for low level coding while defining it in such a manner as to put the production of low level code outside the sanctioned scope of the language. Insanity!

As you observe, C/C++ can't be fixed. At least not while remaining anywhere near compatible. It needs to be replaced. You have my vote for "D". It's a step in the right direction.

By far your best decision has been to abandon the concept of code compatibility with C/C++. The need to maintain code compatibility with C clearly placed nasty constraints on the development of C++, and as you observe the language "standard" is vastly too convoluted to manage, let alone patch.

As I implied above, in my view C/C++ is a very poor language because its underlying core philosophy is often contradictory and simply wrong headed. Consistency is replaced by special cases, irrational behaviour and inconsistent nomenclature.
Not only does this cause the standard to become bloated and confused, but it also makes the language more difficult to use, and the resulting programs more error prone.

There was a time when programs were written like spaghetti. Programmers, it was observed, didn't have the self discipline to follow rational coding standards. As a result, the structured programming paradigm was invented in order to provide a consistent approach to writing programs. The result was higher code quality, greater code simplicity and less error prone software.

The same argument is at the heart of the object model of programming. Programmers, it was observed, didn't have the self discipline to follow rational coding standards and properly encapsulate the functionality of their programs. The object programming paradigm imposes, for the most part, the "correct" behaviour on the programmer, or at least points him/her in the proper direction. Encapsulation is now foremost on everyone's mind.

Odd, I've been encapsulating assembler functions since before the days of C++. The justification for encapsulation was clear. The method was generic and applicable to many other languages. But not easily applicable to C, because the language is so poorly defined.

The purpose of any language above assembler is twofold. First, to make it easier for programmers to write code, and second, to make it more probable that the code produced will properly do its intended job.

Portability is often raised as a reason for the existence of high level languages, but this is not a legitimate argument since low level languages can be made to be just as portable as high level ones. Java byte code and other languages that generate intermediate code or P-code have repeatedly proven this. JIT compilers are the final nail in the coffin. Portability is a feature of any language, and not, as is so often claimed by members of the C/C++ religions, a characteristic of high level languages.
Ease of use and program correctness are related, of course. One pretty much implies the other. When a language is easy to use, it will typically be easier to spot programming errors. The opposite is also true. When it is difficult to write correct programs, a language is generally difficult to use. It's just common sense.

C is poor because its subtle behavioural anomalies and inconsistencies make it difficult to follow its code. Type conversions are often hidden, assumptions must often be made about a variable's size and internal representation which are outside of the language's specifications, etc. C is poor because it relies on the conscious effort of the programmer to avoid hidden or concealed pitfalls, rather than avoiding them in the first place through proper ergonomic language design.

The C philosophy is contradictory since, as a higher level language, its reason for existence is convenience. But it then burdens the programmer with its own irrational and inconsistent behaviours, making the language inconvenient and error prone.

At its very core the C philosophy is contradictory. It is a structured language (clearly K&R recognized the superiority of structure in the creation of well written programs). But having recognized this, K&R then foolishly refused to implement many important structural features that are highly desirable, and trivially easy to implement.

Consider that C omits block typing. Those structured languages that don't descend from C typically identify different kinds of blocks with different bracketing keywords: Do/While, If/Endif, Repeat/Until etc. When reading a block typed program, if the programmer sees the word "until", he/she knows that the statement ends a "repeat" block. With C, on the other hand, the same delimiters are used for all blocks. The bracket that ends a C block could be the end of an if block or any other kind of block.
The programmer is forced, when reading the code, to search back through the program to find which block is being closed. For convenience, the strategy to combat this language failing is to progressively indent blocks so that the start of a block can easily be identified. And of course, since C is a free format language, we immediately run into the problem of people having different indentation styles that typically make following code written by another person difficult. The indent requirement is utter lunacy. No language should be able to have its meaning be made essentially human unreadable through a simple loss of indentation. Insanity!

The C philosophy is that the language should be self documenting, yet the language provides inadequate syntactic guards, redundancy and signposts that would make it self documenting. Specifically, the omission of keywords like "then" after "if" begs the question "if what?"

Yes, in an "if then" statement the keyword "then" is a redundancy. Redundancy is a very good thing. Natural language is full of redundancy. It exists for a purpose. If this were not so, it would have been evolved out of the language. Redundancy is maintained through the evolution of a language for a reason. The reason is that it facilitates the conveyance of the intended meaning of what is said. A person may miss or misinterpret one key idea in a sentence, but through the language's redundancy will often be guided to the intended meaning.

Practical redundancy to enhance readability is a very good thing. Syntactic minimalism is a wish for the ignorant, and is to be avoided in any rationally defined language. APL is proof enough of that. The more cryptic a language, the more likely errors will be made.

The C "religion" accepts that C is superior to assembler because the C syntax is more readable and more convenient to use than assembler. I agree.
However, C dogma then goes on to reject alterations in the syntax that would provide further benefits to readability and every other language convenience, on the grounds that they don't add any value to the language itself. Clearly this dogma is false.

I strongly urge you to move further away from the C syntax, and implement block typing: If Then/End if, While/Whend, Do/Loop, SelectCase/EndCase, Begin/End.

Note that I say SelectCase rather than Switch. The word switch has no contextual connection to the operation it identifies in C/C++. In the real world, a switch is a binary device. It is either on or off. In the real world a switch does not allow the selection of multiple settings. Dials and other kinds of SELECTORS do that kind of thing, and when they make discrete choices, they are composed of <separate> "switches". The keyword "switch" should therefore be changed to something meaningful, something that describes what is being done. How about using "Select"? Seems to work for other languages.

The "switch" statement also has the unfortunate characteristic of not allowing multiple comparisons for each case. Also unfortunate is the need for the "break" keyword to exit the statement. Switch would be syntactically cleaner if it allowed multiple entries, and the break keyword were removed.

    Select (expression)
       Case a,b,c;
       Case d,e,f;
       default;
    end select;

Rather than

    switch (expression) {
       case a:
       case b:
       case c:
          break;
       case d:
       case e:
       case f:
          break;
       default:
          break;
    }

The scope rules for C and C++ are also inconsistent. All variables inside a block are visible only within the block. All variables inside a procedure are visible only within the procedure. C is inconsistent in that variables and functions that are inside a translation unit or module (whatever you wish to call it) have global scope by default. This behaviour is also plainly irrational. Modules should present <NO> external linkages at all unless those linkages are explicitly indicated.
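
For reference, C does let the programmer hide module-level names, but the marking runs the opposite way round from what is asked for here: the private names must carry `static`, and anything unmarked is exported. A small sketch, with all names hypothetical:

```c
#include <assert.h>

/* File-scope `static` gives internal linkage: neither of the next two
   names is visible to other translation units. */
static int call_count = 0;

static int clamp_to_byte(int value)
{
    if (value < 0)   return 0;
    if (value > 255) return 255;
    return value;
}

/* No `static`: external linkage, which is what C picks by default. */
int next_sample(int raw)
{
    int result = clamp_to_byte(raw);
    assert(result >= 0 && result <= 255);
    call_count++;
    return result;
}

int samples_taken(void)
{
    return call_count;
}
```

Forgetting `static` on call_count or clamp_to_byte would still compile and run; the names would simply leak into the global namespace, which is the default being criticised.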
With the default behaviour set to make all module level definitions global in scope, the programmer is forced to perform extra work in order to code properly. The unwary end up wasting compile resources and throwing away potential optimizations when procedures are left to the default global scope. Insanity!

In order to minimize error and make a language convenient, default behaviour should be that behaviour that is most consistent with good programming practice.

C's method of defining variables is also fundamentally flawed. Not in the syntax used, but in the manner in which the variables are sized. C/C++ provides no clean and rational mechanism for programmers to know if the variables they will be using are of adequate size or type to hold the data they wish to represent. C dogma holds that either the programmer write to the lowest common denominator, or use conditional compilation to select appropriately sized variables for the task at hand. That's not portability. It's Insanity!

It is an extreme burden for the programmer to constantly have to worry if his program is going to run correctly because the language may have altered the size or type of his variable in some environments. How many subtle programming errors can be introduced into a program when its integer size is altered from 8 to 32 bits? Or, more catastrophically, the opposite?

The proper way to solve the problem is to provide the programmer with a set of variable types that are guaranteed to be implemented in the language. It is the duty of the language to conform to the programmer's demands. If the programmer wishes to use an 8 bit integer, and it is a sanctioned size, it is the obligation of the language to synthesize one out of the given CPU registers even if the size is not natively supported. Undefined variable sizes produce errors.

Compilers are often burdened to provide support for floating point types on CPUs that don't have floating point registers.
On some CPUs floats must be synthesized out of raw integers. The programmer need not concern himself as to whether the target CPU actually has native register support for these variables. Integers should be no different. It is not relevant if the underlying CPU or environment supports 8 bit integers or not. Such variables can be constructed with 16 or 32 bit registers quite easily. Program correctness must trump efficiency. In the case of variable definitions, C places efficiency above program correctness. Insanity!

For D, your decision to forget about 8 bit CPUs will limit this problem to some extent, but as a practical matter, integer variables are never going to exceed 128 bits in size, and number representation is not going to stray from the typical binary representation found in modern CPUs. So, this places reasonable limits on the number of variable types that must be supported. Some machines will fail to conform. Too bad, those machines shouldn't exist.

Rather than providing a set of integer variables of unknown size, I strongly recommend that you define a set of integer variables of fixed size and sign type. I would suggest the following...

    Byte   (unsigned 8 bit)      SByte   (signed 8 bit)
    Word   (unsigned 16 bit)     SWord   (signed 16 bit)
    Dword  (unsigned 32 bit)     SDword  (signed 32 bit)
    Qword  (unsigned 64 bit)     SQword  (signed 64 bit)
    OWord  (unsigned 128 bit)    SOWord  (signed 128 bit)

or ...

    int8   (unsigned 8 bit)      sint8   (signed 8 bit)
    int16  (unsigned 16 bit)     sint16  (signed 16 bit)
    int32  (unsigned 32 bit)     sint32  (signed 32 bit)
    int64  (unsigned 64 bit)     sint64  (signed 64 bit)
    int128 (unsigned 128 bit)    sint128 (signed 128 bit)

Let's have no optional behaviour in the language spec. None of this - a character is an 8 bit signed number in some machines, and an unsigned number in others - nonsense.

Another aspect of variable definition that should be changed is the names of the types themselves. Again the problem is ambiguity. Ambiguity breeds error.

Single. Single what? "Flapjack" has just as much meaning. Double. Double what? Double the flapjack of course. How about "tall"? Long. Long what? How long is a long? Shouldn't "dint" be "short"? With the integer data types shown previously there is no ambiguity.

I see nothing wrong with...

    Float1   Float2   Float3

or

    FSingle  FDouble  FExtra

as long as the floating point nature of the variable, if not its absolute size, is made abundantly clear.

Specifying pointer sizes is clearly an issue. As a practical matter no CPU that I know of has more than one pointer size, so not specifying the size in bits is a legitimate option. It does however raise the issue of how the program can ensure that a pointer subtraction can be contained within a register of known size. Fortunately pointer subtraction doesn't occur very often, so in my view a typedef based on variable size seems the most pragmatic and acceptable solution.

C also fails when it comes to character types, as it does not specify if they are signed or unsigned. Defining Char and SChar, WChar and SWChar will solve that problem for both ASCII and wide Unicode characters. But why should characters be treated as numbers at all when they are characters? Doing so is a rejection of type definition itself. Rational languages are strongly typed, and conversion can be indicated through explicit casting.

While I am on the subject of variables, I observe that very many programming errors in C are caused by improper array bounds checking. I have long thought that a secondary "secure" pointer type should be implemented that places bounds on the pointer's value. This would be a complex type that contains not only the pointer itself, but the upper and lower bounding values. Of course you can implement bounded pointers through overloading, but who actually does it? Using a composite variable is also inefficient when implemented in software by the compiler.
Ideally this type should be implemented in hardware so that bounding tests can be performed in parallel as the content of the pointer register is changed. Why this kind of thing isn't already implemented in hardware is beyond me. It's not needed, I guess. All those buffer overflow problems that are the source of 90% of all the security exploits must be figments of my imagination... Insanity!

In C and C++ /*Comments*/ can't nest. I've never understood the justification for such a limitation. Certainly the preprocessor strips out the comments before the compiler sees them, so it should have been trivial to simply have the preprocessor use a counter rather than a boolean to determine if it was inside a comment.

C/C++ are free format languages, and there is nothing wrong with that. However, it is often argued by C religionists that the mandatory use of semicolons is required if the language is to remain free format. This is clearly false. Missing semicolons are the #1 bugaboo of all C programmers no matter what their level of experience. Why not remove the requirement? This should be as simple as defining <EOL> to be equivalent to a semicolon, and then defining a line continuation character that causes the next occurrence of <EOL> to be ignored. Semicolons can be reserved for putting multiple statements on one logical line.

    begin
       dint a,b,c
       c = b + a; b = a + c
    end

Rather than...

    {
       dint a; dint b; dint c;
       c = b + a;
       b = a + c;
    }

    a = FtnCall(VariableA, VariableB, _
                VariableC, VariableD)

Rather than...

    a = FtnCall(VariableA, VariableB,
                VariableC, VariableD);

Your choice in "D" is to remove operator overloading from the language spec. Excellent, I concur with you that the ability to redefine operators causes many more problems than it solves. However, having the ability to use operator notation does greatly simplify the coding of equations involving complex numbers etc. The idea of converting function calls to an operator syntax is a good one.
C++ (and other OOP languages) simply take the wrong tack. Rather than overload the existing operators, it is a much better idea to allow the creation of new operators. New operators should all have equal precedence. Many languages have the facility to implement new operators rather than simply overloading the existing ones. If properly used this can provide a means of operator typing and therefore improve code clarity.

You could for example write statements such as...

    c .c= a .c+ b

where <.c=> is defined as complex assignment and <.c+> is defined as complex add, or

    Recc .Rec= Reca .Rec_Name_Greater Recb

where .Rec= is defined as record assignment and .Rec_Name_Greater is defined as a comparison between named portions of two records.

Once identified, the above syntax can easily be converted to a function-like syntax by a preprocessor as follows.

    Define Operator Rec_Name_Greater = as type Rec Rec_Name_Greater(type Rec, type Rec)

The precompiler would then translate the statement

    Recc .Rec= Reca .Rec_Name_Greater Recb

into the call

    Recc = Rec_Name_Greater(Reca,Recb)

This implies that the compiler should be smart enough to automatically handle assignment to complex data types via a block move as necessary. It also implies that the internal organization of structures be defined within the language.

Defining the internal variable arrangement of structures poses difficulties for language portability and efficiency. I would suggest that there be two types of structures: one defined for portability between platforms, and one for internal use, with the stipulation that the one defined for portability between platforms be used only where structures are necessarily shared.

Since data formats are assumed to be binary and in the form of 8, 16, 32, 64 or 128 bits and strings, it is necessary to worry about endian formats, both bitwise and bytewise. The most convenient way to do this would be to allow the definition of new big endian/little endian variable types within structures.
The best way to do this would be to define the types as "overrides" in the structure type definition itself. I.E.

    new iotype wombat BigBitEndian, BigByteEndian
      int8  a ,b ,c
      int16 a1,b1,c1
      int32 a2,b2,c2
      ...
    end type

iotypes should not be allowed to support any operation other than assignment. This will force the immediate conversion of such types to the native format used by the machine.

Again, coming from a machine language environment, I prefer to keep things as compact as possible. To that end, I often define one integer of storage space and use it to hold multiple boolean flags.

    const wombat  equ 0x0000000000000001
    const wombat1 equ 0x0000000000000010
    const wombat2 equ 0x0000000000000100
    ...
    WombatFlg dw 0x0000000000000000

You can do similar things in most other languages of course, but in every case that I know of you end up with a whopping number of similarly named and unassociated constants that may have the same numerical values. Not good.

I would like to have the ability to associate a constant with a particular data type. For example

    new type wombat
      const wombat  equ 0x0000000000000001
      const wombat1 equ 0x0000000000000010
      const wombat2 equ 0x0000000000000100
      ...
      WombatFlg dw 0x0000000000000000
    end type wombatx

    wombatx.wombatflg |= wombat.wombat1

The way C and C++ implement preincrement and postincrement is also fundamentally wrongheaded. The C standard stipulates that there is no guarantee where in an expression the pre or post increment will occur, only that they will have been performed at the end of the expression. Multiple appearances of a variable in an expression provide different results for the expression if the variable is incremented/decremented within the expression. Insanity!

At the very least, C/C++ should check for such problems and REFUSE to compile any program that contains such a problem, hopefully spitting out a meaningful error message. I would prefer the following...
The definition should stipulate that preincrements occur before the first reference to the variable in question, and postincrements occur after the last reference to a variable. Multiple increments/decrements of the same variable in an expression should trigger a compile error.

Automatic variable type conversion is a legitimate convenience feature that C provides. However, once again the language default behaviour is contrary to common sense. Casting errors are extremely common and often difficult to find, particularly when they occur in a function call. To make this less likely, variable casting should only occur if explicitly stated. I would even go so far as to require explicit casting where arrays are converted to pointer references in function calls. This may be going too far, but anything that will make the conversion more explicit is beneficial.

I believe I read that "D" will pass arrays rather than converting them to pointers. This is a mistake, as passing arrays will clearly be very inefficient. Any attempt to pass an array as an array rather than a casted pointer should generate an error.

Label case sensitivity is also an issue. Here in the real world, words do not change their meaning when they are capitalized. A Banana is a BANANA. So it should also be with programming labels and keywords. All compilers should ignore the case of all names and keywords, yet optionally keep the case intact for the purpose of export and linkage. Linkage within the language itself should also be case insensitive. I would recommend doing this through the conversion of all exported labels to upper case.

Keeping case sensitivity in names and keywords promotes poor programming practices like defining variables with names like HWin, hWin, hWIN, etc. Oh ya, like that isn't found in every Windows program ever written. Insanity!
In the current C/C++ dogma, case sensitivity is considered to be good because it provides slightly greater flexibility in naming, but this benefit is predicated on the creation of identifiers that differ only by case, yet creating such labels is considered (correctly so) bad programming practice. So the benefit is only had through bad programming practice. Another inconsistency. Insanity!

Another feature that I would like to see is the ability to perform initializations in the following manner

    int16 a,b,c = 7;
    int16 a,b,c = 7,8,9;
    static int16 a,b,c = 0;

You have gotten rid of the . vs -> fiasco. Wonderful.
Lost the forward reference, predefinition kludge. Wonderful.
Lost macros. Wonderful, as long as a better facility is provided. Being able to define your own operators goes a long way in this respect.
Improved error trapping. Wonderful.
Decided to include strings as a native type. Wonderful.
Improvements in object implementation. Wonderful.

I wish you success in the development of "D", and I pray for a world rational enough to abandon the abominations that are now being used, and adopt your language.

Unfortunately, members of the C religion are far too deep into the Plauktau - the blood fever - to listen to reason.

Landru - guide us! All is chaos!
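
As an illustration of the portable-structure ("iotype") idea proposed earlier in this post, here is a minimal C sketch of assignment-only conversion from a big-endian byte image to the machine's native format. It is only one possible reading of the proposal: the names `wombat_native`, `wombat_decode`, `load_be16`, `load_be32` and `WOMBAT_WIRE_SIZE` are hypothetical, only three of the fields are shown, and it handles byte order only, not the bitwise endianness the post also mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical native-format record; field names follow the post's example. */
struct wombat_native {
    uint8_t  a;
    uint16_t a1;
    uint32_t a2;
};

/* Size of the external image: 1 + 2 + 4 bytes, with no padding. */
enum { WOMBAT_WIRE_SIZE = 7 };

/* Assemble a 16-bit value from two bytes stored most significant first. */
static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Assemble a 32-bit value from four bytes stored most significant first. */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* The only operation offered on the external form: convert it to native. */
static struct wombat_native wombat_decode(const unsigned char *wire, size_t len)
{
    struct wombat_native w;
    assert(wire != NULL && len >= (size_t)WOMBAT_WIRE_SIZE);
    w.a  = wire[0];
    w.a1 = load_be16(wire + 1);
    w.a2 = load_be32(wire + 3);
    return w;
}
```

Because the bytes are combined with shifts rather than copied with a cast, the result does not depend on the byte order of the machine running the code.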
Feb 09 2002
prev sibling parent reply "Richard Krehbiel" <rich kastle.com> writes:
"Walter" <walter digitalmars.com> wrote in message
news:a3ufcm$1nvu$2 digitaldaemon.com...
 "D" <s_nudds hotmail.com> wrote in message
 news:a3tuq7$qtk$1 digitaldaemon.com...
 No paul.  Apparently it is not a pointer.  The array index is compared
 against the array bounds.
 That means array references are performed the typical way, by


 the pointer from the index
 after an index change.  That requires a multiply and add before at least

 first reference after an index change.

In most cases, the multiply and add are done in hardware in the addressing mode calculation and do not add any execution time.

Should I take this to mean that you believe applying the "strength reduction" optimization to array subscripting operations to be unnecessary? I didn't think multiplies and adds were *that* free yet.

--
Richard Krehbiel, Arlington, VA, USA
rich kastle.com (work) or krehbiel3 home.com (personal)
Feb 08 2002
next sibling parent Russell Borogove <kaleja estarcion.com> writes:
Richard Krehbiel wrote:

 "Walter" <walter digitalmars.com> wrote in message
 news:a3ufcm$1nvu$2 digitaldaemon.com...
 
"D" <s_nudds hotmail.com> wrote in message
news:a3tuq7$qtk$1 digitaldaemon.com...
That means array references are performed the typical way, by


the pointer from the index
after an index change.  That requires a multiply and add before at least

first reference after an index change.

mode calculation and do not add any execution time.

Should I take this to mean that you believe applying the "strength reduction" optimization to array subscripting operations to be unnecessary? I didn't think multiplies and adds were *that* free yet.

It depends on the exact architecture of course, and I think that Walter's "in most cases" meant the very common multiply-by-2 and multiply-by-4 to index 16- and 32-bit arrays.

-Russell B
Feb 08 2002
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
"Richard Krehbiel" <rich kastle.com> wrote in message
news:a40nur$1r5s$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 In most cases, the multiply and add are done in hardware in the


 mode calculation and do not add any execution time.

reduction" optimization to array subscripting operations to be

 I didn't think multiplies and adds were *that* free yet.

In my testing, doing such strength reduction for modern processors makes things worse, not better. The DMC optimizer has specific code in it to 'undo' optimizations that would otherwise fit in an addressing mode.
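
For readers following the exchange, the two loop shapes being compared can be sketched in C roughly as follows. This is an illustration only, not code from DMC; the function names `sum_indexed` and `sum_strength_reduced` are made up for the example, and which form runs faster depends on the processor and compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Indexed form: each access is conceptually base + i * sizeof(int).
   A scaled-index addressing mode can fold that into the load itself. */
static long sum_indexed(const int *a, size_t n)
{
    long total = 0;
    assert(a != NULL);
    for (size_t i = 0; i < n; i++)
        total += a[i];
    return total;
}

/* Strength-reduced form: the per-access multiply is replaced by a
   pointer that is bumped by one element on every iteration. */
static long sum_strength_reduced(const int *a, size_t n)
{
    long total = 0;
    assert(a != NULL);
    for (const int *p = a, *end = a + n; p != end; p++)
        total += *p;
    return total;
}
```

Both functions compute the same result; the difference is only in how the element address is produced inside the loop.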
Feb 08 2002
parent "D" <s_nudds hotmail.com> writes:
First, not all CPUs have the ability to perform op(add(add(shift))) within
a single instruction.
Second, the fact that the CPU actually performs a shift rather than a
multiply limits all such references to regular power-of-two byte boundaries.

These restrictions are suitable for array references, but they do not lend
themselves to pointer references.

That is after all the reason pointers exist in C/C++ in addition to arrays.

If arrays were sufficient, then pointers need not exist and would not be
implemented.

Clearly then if you believe that bounded arrays are a language requirement,
you must logically conclude that bounded pointers are also a requirement,
since by including pointers in the first place you have admitted that arrays
are not sufficient.

Logic is not a pretty wreath of flowers that smell bad.
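
To make "bounded pointer" concrete, here is a minimal C sketch of one way such a thing could be represented: a cursor that carries the limits of the object it points into. This is an assumption made for illustration, not something D or any compiler mentioned in this thread is stated to implement, and the names `bounded_ptr`, `bp_make`, `bp_advance` and `bp_read` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bounded pointer: a cursor plus the limits of its object. */
struct bounded_ptr {
    int *base;  /* first element */
    int *end;   /* one past the last element */
    int *cur;   /* current position, base <= cur <= end */
};

static struct bounded_ptr bp_make(int *array, size_t count)
{
    assert(array != NULL);
    struct bounded_ptr bp = { array, array + count, array };
    return bp;
}

/* Move by delta elements; refuse (return false) if that leaves the object. */
static bool bp_advance(struct bounded_ptr *bp, ptrdiff_t delta)
{
    ptrdiff_t pos = bp->cur - bp->base;
    ptrdiff_t len = bp->end - bp->base;
    if (delta < -pos || delta > len - pos)
        return false;
    bp->cur += delta;
    return true;
}

/* Read through the pointer; fails when the cursor sits at the end. */
static bool bp_read(const struct bounded_ptr *bp, int *out)
{
    if (bp->cur == bp->end)
        return false;
    *out = *bp->cur;
    return true;
}
```

The cost is visible in the struct: three machine words per pointer instead of one, plus a comparison on every move or dereference.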

 "Richard Krehbiel" <rich kastle.com> wrote in message
 news:a40nur$1r5s$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 In most cases, the multiply and add are done in hardware in the


 mode calculation and do not add any execution time.

reduction" optimization to array subscripting operations to be

 I didn't think multiplies and adds were *that* free yet.


Walter <walter digitalmars.com> wrote in message news:a41ime$2pnk$4 digitaldaemon.com...
 In my testing, doing such strength reduction for modern processors makes
 things worse, not better. The DMC optimizer has specific code in it to
 'undo' optimizations that would otherwise fit in an addressing mode.

Feb 09 2002
prev sibling parent "Roberto Mariottini" <rmariottini lycosmail.com> writes:
"H. Ellenberger" <ele1 gmx.ch> ha scritto nel messaggio
news:3C49E692.C9DE8997 gmx.ch...
 Walter wrote:

 I've been working on implementing it. After turning it on and


 the library and test code, it tripped and found 3 bugs in the regexp
 implementation - code that I have a nice test suite for that was


 Just goes to show, array bounds checking really is valuable!

No surprise for people with experience in Topspeed Modula-2 which had this RT check many years ago.

Or in Turbo Pascal, which had this optional RT check in the 80s.
Jan 21 2002
prev sibling parent Russell Borogove <kaleja estarcion.com> writes:
Walter wrote:

 I've been working on implementing it. After turning it on and recompiling
 the library and test code, it tripped and found 3 bugs in the regexp
 implementation - code that I have a nice test suite for that was passing.

And naturally you immediately added code to the test suite that would have caught those bugs if the bounds check hadn't, right? Belt and suspenders! Belt and suspenders!

Yes, the first time I started using a range-checked array class, I was surprised at how many catches it made.

-R
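
For anyone who has not used one, a range-checked array of the kind mentioned above can be sketched in C along these lines. It is a minimal illustration, not the class referred to in the post and not D's implementation; `int_array`, `int_array_in_bounds`, `int_array_at` and the `BOUNDS_CHECK` switch are hypothetical names. The compile-time switch mirrors the point made at the top of the thread about being able to turn the check off for performance code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch: build with -DBOUNDS_CHECK=0 to drop the check. */
#ifndef BOUNDS_CHECK
#define BOUNDS_CHECK 1
#endif

/* An array that knows its own length. */
struct int_array {
    int   *data;
    size_t length;
};

/* True when index refers to an existing element. */
static bool int_array_in_bounds(const struct int_array *arr, size_t index)
{
    assert(arr != NULL && arr->data != NULL);
    return index < arr->length;
}

/* Checked element access: aborts with a message on a bad index. */
static int *int_array_at(struct int_array *arr, size_t index)
{
#if BOUNDS_CHECK
    if (!int_array_in_bounds(arr, index)) {
        fprintf(stderr, "index %zu out of bounds (length %zu)\n",
                index, arr->length);
        abort();
    }
#endif
    return &arr->data[index];
}
```

Typical use is `*int_array_at(&arr, i) = value;` in place of `arr.data[i] = value;`, so an off-by-one error stops the program at the faulty access instead of silently corrupting a neighbouring object.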
Jan 19 2002