
D - dmd 30 bugs

reply Patrick Down <pat codemoon.com> writes:
Walter,

If you have a preferred bug reporting method let me
know.  Also it might be nice if you would put a 
version or build number in the compiler banner to
refer to.

=================

The following causes my dmd.exe to exception with a
null pointer reference.

import stream;  // http://int19h.tamb.ru/files.html

void Bar(int input)
{
}

int main(char[][] args)
{
  Bar(stream);
  
  return 0;
}
May 09 2002
next sibling parent reply "Walter" <walter digitalmars.com> writes:
Posting them here is fine. I thought the banner did put out the version
number. Execute DMD with no arguments.

"Patrick Down" <pat codemoon.com> wrote in message
news:Xns920A3EEB2579patcodemooncom 63.105.9.61...
 Walter,

 If you have a preferred bug reporting method let me
 know.  Also it might be nice if you would put a
 version or build number in the compiler banner to
 refer to.

 =================

 The following causes my dmd.exe to exception with a
 null pointer reference.

 import stream;  // http://int19h.tamb.ru/files.html

 void Bar(int input)
 {
 }

 int main(char[][] args)
 {
   Bar(stream);

   return 0;
 }
May 09 2002
parent reply Patrick Down <pat codemoon.com> writes:
"Walter" <walter digitalmars.com> wrote in news:abfp0m$1qa8$1
 digitaldaemon.com:

 Posting them here is fine. I thought the banner did put out the version
 number. Execute DMD with no arguments.
You are right, it does print the version when run with no arguments.

===================

The next one's a little weird. When I run the following code it prints...

here1
here2
err2

Then it hangs after printing err2. I don't know if this is yours or
Pavel's. I suspect that MemoryStream.writeBlock is throwing the error
but I don't know why the program hangs.

import stream;

int main(char[][] args)
{
  try
  {
    printf("here1\n");
    char[] foo = "variable=value\n"
                 "var=value\n";
    MemoryStream strm;
    printf("here2\n");
    strm.writeString(foo);
    printf("here3\n");
  }
  catch(WriteError e)
  {
    printf("err1\n");
  }
  catch(Exception e)
  {
    printf("err2\n");
  }

  return 0;
}
May 09 2002
next sibling parent "Pavel Minayev" <evilone omen.ru> writes:
"Patrick Down" <pat codemoon.com> wrote in message
news:Xns920A143BCE8FCpatcodemooncom 63.105.9.61...

 Then it hangs after printing err2.
 I don't know if this is your's or Pavel's.  I suspect
 that MemoryStream.writeBlock is throwing the error but
 I don't know why the program hangs.
Nah, it's not my fault. Let's see how it goes: writeString() calls
writeExact(), which in turn calls writeBlock(). Since MemoryStream is
just a wrapper around OutBuffer, it calls OutBuffer.write() - which
throws an exception "overlapping array copy". You can check it
yourself. Try:

import outbuffer, c.stdio;

int main(char[][] args)
{
  try
  {
    printf("start\n");
    char[] s = "foo";
    OutBuffer strm;
    printf("writing...");
    strm.write(s);
    printf("done\n");
  }
  catch (Exception e)
    printf("\nError: %.*s\n", e.msg);

  return 0;
}
May 10 2002
prev sibling parent reply Patrick Down <pat codemoon.com> writes:
Patrick Down <pat codemoon.com> wrote in 
news:Xns920A143BCE8FCpatcodemooncom 63.105.9.61:
     MemoryStream strm;
Aggg! I'm still thinking C++! Sorry for the false alarm.

MemoryStream strm = new MemoryStream;
May 10 2002
parent reply "Pavel Minayev" <evilone omen.ru> writes:
"Patrick Down" <pat codemoon.com> wrote in message
news:Xns920A56844ACEFpatcodemooncom 63.105.9.61...

 Aggg!  I'm still thinking C++!  Sorry for the
 false alarm.

 MemoryStream strm = new MemoryStream;
Hah! I didn't notice it either! =) Shame on me!
May 10 2002
parent reply MicroWizard <MicroWizard_member pathlink.com> writes:
In article <abgl73$2jq4$1 digitaldaemon.com>, Pavel Minayev says...
"Patrick Down" <pat codemoon.com> wrote in message
news:Xns920A56844ACEFpatcodemooncom 63.105.9.61...

 Aggg!  I'm still thinking C++!  Sorry for the
 false alarm.

 MemoryStream strm = new MemoryStream;
Hah! I didn't notice it either! =) Shame on me!
Well... well... What about having a run-time check against calling
through null references, instead of having the system hang?

Frankly, it would be very useful, since I'm a stupid C++ programmer and
I always forget to initialize object instances with an explicit 'new'.

Tamas Nagy
May 10 2002
parent "Pavel Minayev" <evilone omen.ru> writes:
"MicroWizard" <MicroWizard_member pathlink.com> wrote in message
news:abh43c$30k2$1 digitaldaemon.com...

 What about having a run-time check against calling through null
 references, instead of having the system hang?

 Frankly, it would be very useful. Since I'm a stupid C++ programmer
 and I always forget to initialize object instances with an explicit 'new'.
We've been discussing it for the last few days already =)
May 10 2002
prev sibling next sibling parent reply Patrick Down <pat codemoon.com> writes:
Is this right?
err.d(2): cannot have out parameter of type bit

void foo(out bit bar)
{
  bar = true;
}

int main(char[][] args)
{
  bit a;
  
  foo(a);

  return 0;
}
May 10 2002
parent reply "Pavel Minayev" <evilone omen.ru> writes:
"Patrick Down" <pat codemoon.com> wrote in message
news:Xns920B30A21022patcodemooncom 63.105.9.61...

 Is this right?
 err.d(2): cannot have out parameter of type bit
There can be no pointers to bits. Since out parameters are implicitly dereferenced pointers, they cannot be bit.
May 11 2002
next sibling parent reply "OddesE" <OddesE_XYZ hotmail.com> writes:
"Pavel Minayev" <evilone omen.ru> wrote in message
news:abiv29$2lhf$1 digitaldaemon.com...
 "Patrick Down" <pat codemoon.com> wrote in message
 news:Xns920B30A21022patcodemooncom 63.105.9.61...

 Is this right?
 err.d(2): cannot have out parameter of type bit
There can be no pointers to bits. Since out parameters are implicitly dereferenced pointers, they cannot be bit.
So we do need a bool type, byte sized!

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail
May 11 2002
parent reply Patrick Down <pat codemoon.com> writes:
"OddesE" <OddesE_XYZ hotmail.com> wrote in news:abjvsl$fem$1
 digitaldaemon.com: 
 
 So we do need a bool type, byte sized!
 
typedef ubyte bool;

I am sort of doubting the utility of the bit type for flags. I think I
may go back to schemes like this:

enum flags
{
  f1=1, f2=2, f3=4, f4=8, f5=16
}

flags a = flags.f1 | flags.f4;
May 11 2002
next sibling parent reply "Walter" <walter digitalmars.com> writes:
Where the bit type comes into its own is in very large arrays of bits.

"Patrick Down" <pat codemoon.com> wrote in message
news:Xns920BB36E7D600patcodemooncom 63.105.9.61...
 "OddesE" <OddesE_XYZ hotmail.com> wrote in news:abjvsl$fem$1
  digitaldaemon.com:
 So we do need a bool type, byte sized!
typedef ubyte bool;

I am sort of doubting the utility of the bit type for flags. I think I
may go back to schemes like this:

enum flags
{
  f1=1, f2=2, f3=4, f4=8, f5=16
}

flags a = flags.f1 | flags.f4;
May 11 2002
parent reply "Sean L. Palmer" <spalmer iname.com> writes:
If there's one thing a computer is good at, it's moving, scanning, and
otherwise manipulating bits.  Sucks that most languages don't extend down to
the bit type... the operations on bit arrays are like "detailed" versions of
the standard memcpy, memset, memmove, strchr, etc functions but at the bit
level.  And bit arrays are fairly powerful things.  I'd like to see more
attention to language features that deal with that kind of thing... maybe
for a D 2.0 feature.

The next area I'd focus on is color pixel processing.  The concept of color
is fundamental to our interface to the computer, the display subsystem, it
almost deserves to be a builtin type.  Some way of specifying operations on
a range of array entries and preferably even extend slicing to rectangular
slices of a 2d array, cubic slices of a 3d array... that could get powerful
pretty fast.  The compiler could generate SIMD code for that stuff pretty
nicely.  Color arrays are palettes, 2d arrays are like images or textures,
and 3d arrays are like movies or voxels.  I wonder if C++ has ever
considered a "lossy" modifier ("typedef unsigned lossy long color", which
would represent a data type that will be dealing with potentially 64 bit
values but if it couldn't do that, or it wasn't fast to use longs and you'd
selected high optimization settings, it could substitute a smaller type).

Standardizing filters and such (with wrapping!) could make D a great image
processing language, wonderful for writing things that manipulate bitmaps.
One doesn't have to know much about modern 3D graphics to see that a lot of
what goes on is specifying operations on millions of pixels by filling
triangles with gradients of values and merging that with the existing image
data somehow.  D could become the next Renderman or OpenGL.  Maybe too tight
a focus but the graphics would sure make a great library.  If we can chain
together operations on a source one could parse the language itself for the
operations..  source * x + dest * (1 - x)

I'd actually like to see bounded int types in the language.  Saturating int,
call it whatever you want.  Something that the hardware "tops out" or clamps
to the allowed range before storing into.  A lot of CPUs support this
concept, but no language that I know of ever has.  It's something the
compiler could do for you automatically (using software emulation if
necessary), and would sure be a convenient timesaver.  Pixels usually fall
into the category of saturated values.

struct Pixel
{
  saturated(0 <= byte <= 255) r,g,b,a;
  or
  saturated(0, 255) byte r,g,b,a;
}
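The saturated(...) qualifier above is Sean's proposal, not an existing feature; but the store it describes is easy to sketch in plain C today. This is a minimal illustration (function names are mine): compute in a wider type, clamp, then narrow.

```c
#include <assert.h>

/* Software rendering of a saturating byte store: do the arithmetic in
   int, clamp the result into 0..255, then narrow to unsigned char.
   This is roughly what a compiler could emit for a saturated(0,255)
   byte when no hardware saturating instruction is available. */
static unsigned char sat_add_u8(unsigned char a, unsigned char b)
{
    int t = (int)a + (int)b;            /* wider intermediate */
    return (unsigned char)(t > 255 ? 255 : t);
}

static unsigned char sat_sub_u8(unsigned char a, unsigned char b)
{
    int t = (int)a - (int)b;
    return (unsigned char)(t < 0 ? 0 : t);
}
```

On x86, MMX/SSE provide exactly this operation in hardware (e.g. packed saturating byte adds), which is why pixels are the motivating case.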

That leads me to another idea: optimizing only areas of the code that it can
tell are obvious inner loops (the part that needs optimizing the most).
Maybe some kind of depth-first search could traverse the code, optimizing
for a given amount of time, and when it's time to run it packs everything it
has so far together into an .exe and runs it.

Actually pixel could be a basic type.  There could be pixel which is the
ubiquitous 32-bit ARGB/RGBA whatever.  D could also have "named size" types
such as pixelARGB1555, pixelRGB565, pixelRGBA4444, etc, etc) actually this
is leading back to SIMD support because it's far easier to code this
yourself rather than rely on typedefs for the thousands of kinds of pixels
that are possible.  So I'd tackle the language feature at a lower level,
that of saturating scalar types, chains of math on values of 2d/3d arrays.

How would one describe mathematically what's going on in a typical SIMD
situation, such as compositing (blending) one black-and-white image onto
another based on the value in a third image.

void composite(in byte[][] image1, in byte[][] image2, in byte[][] mask,
               out byte[][] result)
{
  // make sure all arrays are the same size.  Note use of introspection
  // capability to inquire the dimensions of an array.
  assert(image1.dim(1) == image2.dim(1) == mask.dim(1) &&
         image1.dim(2) == image2.dim(2) == mask.dim(2));
  assert(image1.dim == image2.dim);  // shorthand for the above.

  // make the result the same size as the others
  result.resize(image1.dim(1), image2.dim(2));

  // for each value in result (recursively in the case of nested arrays)
  // do the mixing.  Writing it this way allows the compiler to write
  // the operation using SIMD.
  result.foreach[[x]]
  {
     result[[x]] = saturate_cast(byte) (image1[[x]] * mask[[x]] +
                                        image2[[x]] * ~mask[[x]]);
  }

  // do the same thing but now just for a slice of the arrays, and using
  // only the first column of the mask
  result.foreach[[x]][[y]] in [result.dim(1)/4 .. (result.dim(1)/4)*3]
                              [result.dim(2)/4 .. (result.dim(2)/4)*3]
  {
     result[[x]][[y]] = saturate_cast(byte) (image1[[x]][[y]] * mask[[x]][[0]] +
                                             image2[[x]][[y]] * ~mask[[x]][[0]]);
  }
}

the compiler can either transform that into

  for (int __i = 0; __i < result.dim(1); ++__i)
    for (int __j = 0; __j < result.dim(2); ++__j)
    {
       int temp = image1[__i][__j] * mask[__i][__j] +
                  image2[__i][__j] * ~mask[__i][__j];
       result[__i][__j] = (temp < 0 ? 0 : temp > 255 ? 255 : temp);
    }

or into an optimized version of that (pulling those unchanging values out of
the loop helps a lot) or even directly to SIMD asm code.  That SIMD stuff
would kick ass if it was integrated as tightly into the compiler and
optimizer as, say, the int type is.  ;)
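The expanded loop is ordinary code in any C-family language already; here is a compilable sketch with hypothetical fixed 4x4 dimensions. One adjustment to the pseudocode: the mask products are scaled back by 255, which the post elides, so a mid-range mask actually blends instead of always saturating.

```c
#include <assert.h>

#define W 4
#define H 4

/* Per-pixel blend: result = image1*mask + image2*(255-mask), scaled
   back into 0..255 and clamped.  W and H are hypothetical fixed sizes
   standing in for the dynamic dim(1)/dim(2) introspection above. */
static void composite(unsigned char image1[H][W],
                      unsigned char image2[H][W],
                      unsigned char mask[H][W],
                      unsigned char result[H][W])
{
    for (int i = 0; i < H; ++i)
        for (int j = 0; j < W; ++j) {
            int temp = (image1[i][j] * mask[i][j] +
                        image2[i][j] * (255 - mask[i][j])) / 255;
            result[i][j] = (unsigned char)(temp < 0 ? 0 : temp > 255 ? 255 : temp);
        }
}
```

A mask of 255 selects image1 exactly, a mask of 0 selects image2, which is the invariant a vectorizing compiler would also have to preserve.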

Anyone who has ever written a 2D memory move or, say, a highly optimized
memcpy, knows how difficult it is to write this stuff without bugs and fast,
*and* in a general way.  It could however be entirely automated by the
compiler.  I think the compiler will have a much easier time changing

  result.foreach[[x]][[y]] in [result.dim(1)/4 .. (result.dim(1)/4)*3]
                              [result.dim(2)/4 .. (result.dim(2)/4)*3]
  {
     result[[x]][[y]] = saturate_cast(byte) (image1[[x]][[y]] * mask[[x]][[0]] +
                                             image2[[x]][[y]] * ~mask[[x]][[0]]);
  }

into really tight SIMD code than it would trying to optimize the following
(which is the only way to express the idea to the compiler in languages such
as C):

  for (int __i = result.dim(1)/4; __i < (result.dim(1)/4)*3; ++__i)
    for (int __j = result.dim(2)/4; __j < (result.dim(2)/4)*3; ++__j)
    {
       int temp = image1[__i][__j] * mask[__i][__j] +
                  image2[__i][__j] * ~mask[__i][__j];
       result[__i][__j] = (temp < 0 ? 0 : temp > 255 ? 255 : temp);
    }

If the loop limits are known at compile time, or god forbid the address was
known at compile time, it could output some nice custom-crafted, highly
optimized code utilizing whatever instructions and methods are available.
Doing that stuff isn't easy, but I've done it and I know how repetitive and
error-prone it is; but it wouldn't be incredibly hard to write a program
that writes traversal loops that are well optimized if you start out with
the concept that the optimizer would otherwise have to infer:  that you're
going to do the exact same operation to a whole bunch of data.  With a
language extension, you say this directly.  If you assume the compiler can
figure it out for you, it could, but it's much easier for you to just tell
it what you mean instead of writing it yourself.

One might be able to approach that level of sophisticated code generation if
you had a really badass inline assembler and really badass template system
that let you manipulate values and execute logic at compile time, but making
the badass inline assembler and badass template system is a lot harder than
just correcting a basic language deficiency:  the ability to specify a set
of operations on many data as a single execution unit, leaving the
traversal code entirely up to the compiler.  For multidimensional
arrays, Walter's work now will save many a programmer countless man-hours of
work writing this type of code.  Code would be built custom for each loop
probably, and that's ok with me.

Now if we can find a way to integrate the concept of "brush" and "polygon"
and "mask" and "blend" and "filter" into the language, we're well on the way
to being a useful software rendering tool.

Have you ever written code to do bilinear filtering?  Conceptually it's
given 2 inputs and a 2d array, does 4 lookups, 2 "frac" modulo 1.0
operations, and blend the results together using 3 linear interpolations of
the form

byte BilinearFilter(byte[][] image, int x_times_16, int y_times_16)
{
  int ix = x_times_16>>4;
  int iy = y_times_16>>4;
  int fx = x_times_16&((1<<4)-1);
  int fy = y_times_16&((1<<4)-1);

  byte ul = image[iy][ix],   ur = image[iy][ix+1],
       bl = image[iy+1][ix], br = image[iy+1][ix+1];
  byte terp1 = (ul * fx + ur * (15 - fx))>>4;
  byte terp2 = (bl * fx + br * (15 - fx))>>4;
  return (terp1 * fy + terp2 * (15 - fy))>>4;
}
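A compilable C version of that sketch follows, hedged as an illustration rather than a drop-in fix. Two details differ from the post: the intermediate products need a type wider than byte, and the fractional weights go as (16 - f) on the left/top sample and f on the right/bottom one, so f == 0 returns the upper-left texel exactly.

```c
#include <assert.h>

/* Fixed-point (x.4) bilinear sample from a byte image stored row-major
   with the given stride.  ix,iy index the upper-left texel; fx,fy are
   4-bit fractions.  All arithmetic is done in int to avoid the byte
   overflow in the original sketch. */
static unsigned char bilinear(const unsigned char *image, int stride,
                              int x_times_16, int y_times_16)
{
    int ix = x_times_16 >> 4, iy = y_times_16 >> 4;
    int fx = x_times_16 & 15, fy = y_times_16 & 15;

    int ul = image[iy * stride + ix];
    int ur = image[iy * stride + ix + 1];
    int bl = image[(iy + 1) * stride + ix];
    int br = image[(iy + 1) * stride + ix + 1];

    int top = ul * (16 - fx) + ur * fx;    /* each in 0..255*16 */
    int bot = bl * (16 - fx) + br * fx;
    return (unsigned char)((top * (16 - fy) + bot * fy) >> 8);
}
```

Sampling the exact center of a 2x2 neighborhood returns the average of the four texels, which is a handy sanity check for any bilerp.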

Not sure what kind of language feature would make that kind of mess easier
to deal with.

Sean

"Walter" <walter digitalmars.com> wrote in message
news:abk9iv$n2j$3 digitaldaemon.com...
 Where the bit type comes into its own is in very large arrays of bits.

 "Patrick Down" <pat codemoon.com> wrote in message
 news:Xns920BB36E7D600patcodemooncom 63.105.9.61...
 "OddesE" <OddesE_XYZ hotmail.com> wrote in news:abjvsl$fem$1
  digitaldaemon.com:
 So we do need a bool type, byte sized!
typedef ubyte bool;

I am sort of doubting the utility of the bit type for flags. I think I
may go back to schemes like this:

enum flags
{
  f1=1, f2=2, f3=4, f4=8, f5=16
}

flags a = flags.f1 | flags.f4;
May 11 2002
next sibling parent reply "Walter" <walter digitalmars.com> writes:
You're far beyond me in graphics code. But one possibility for easily
generating loop code would be to, say, build a simple compiler for a
specialized language that generated D code. I use similar kinds of things
for building tables. This would be easily modifiable to generate custom
code.
May 12 2002
parent "Sean L. Palmer" <spalmer iname.com> writes:
"Walter" <walter digitalmars.com> wrote in message
news:abl940$1h50$1 digitaldaemon.com...
 You're far beyond me in graphics code. But one possibility for easily
 generating loop code would be to, say, build a simple compiler for a
 specialized language that generated D code. I use similar kinds of things
 for building tables. This would be easily modifiable to generate custom
 code.
This array manipulation is just the basics of graphics code.  Everything
today uses polygons, not rectangles.  I have some ideas for representing
triangles in D but I'll save it til the right time.

Sean
May 12 2002
prev sibling parent Russell Borogove <kaleja estarcion.com> writes:
Sean L. Palmer wrote:
 I wonder if C++ has ever
 considered a "lossy" modifier ("typedef unsigned lossy long color", which
 would represent a data type that will be dealing with potentially 64 bit
 values but if it couldn't do that, or it wasn't fast to use longs and you'd
 selected high optimization settings, it could substitute a smaller type)
This seems like such a small incremental win (an occasional increase in
portability for certain kinds of code) for the effort it would take to
specify and implement the feature that I don't see any language doing it
anytime soon.

Of course, you could put it in a class via operator overloading; it's up
to the library implementor to port it instead of the compiler implementor.
 
 I'd actually like to see bounded int types in the language.  Saturating int,
 call it whatever you want.  Something that the hardware "tops out" or clamps
 to the allowed range before storing into.  A lot of CPUs support this
 concept,
Not very many architectures do, and not very generally, though...
 but no language that I know of ever has.  It's something the
 compiler could do for you automatically (using software emulation if
 necessary), and would sure be a convenient timesaver. 
Yeah, I called for this one a while ago. You should be able to use it on
floats and in arbitrary non-power-of-two ranges as well. Besides
arbitrary saturating ranges, arbitrary wrapping ranges would be nice as
well.

typedef saturated( 1, 5 ) char review_rating;  // 1 to 5 stars
typedef wrapped( 0.0, 2.0*3.14159265358 ) float phase_angle;

wrapped types would effectively do a % or fmod every time they were
evaluated unless the compiler could guarantee they were already in-range.
 Have you ever written code to do bilinear filtering?  Conceptually it's
 given 2 inputs and a 2d array, does 4 lookups, 2 "frac" modulo 1.0
 operations, and blend the results together using 3 linear interpolations...

 Not sure what kind of language feature would make that kind of mess easier
 to deal with.
I've been playing with audio DSP stuff lately, and (usually 1D)
interpolations and fmod(x,1.0) or f-int(f) are also heavily used therein.

I have no trouble with this stuff as part of a standard library, but I'm
less comfortable with making it part of the language.

-Russell B
May 12 2002
prev sibling parent reply "OddesE" <OddesE_XYZ hotmail.com> writes:
"Patrick Down" <pat codemoon.com> wrote in message
news:Xns920BB36E7D600patcodemooncom 63.105.9.61...
 "OddesE" <OddesE_XYZ hotmail.com> wrote in news:abjvsl$fem$1
  digitaldaemon.com:
 So we do need a bool type, byte sized!
typedef ubyte bool;
*snif* And we are back in the bad old C days. This totally sucks and we
all know why! We need a byte sized *bool*!

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail
May 14 2002
parent "OddesE" <OddesE_XYZ hotmail.com> writes:
"OddesE" <OddesE_XYZ hotmail.com> wrote in message
news:abrr5k$1acl$1 digitaldaemon.com...
 "Patrick Down" <pat codemoon.com> wrote in message
 news:Xns920BB36E7D600patcodemooncom 63.105.9.61...
 "OddesE" <OddesE_XYZ hotmail.com> wrote in news:abjvsl$fem$1
  digitaldaemon.com:
 So we do need a bool type, byte sized!
typedef ubyte bool;
 *snif* And we are back in the bad old C days. This totally sucks and we
 all know why! We need a byte sized *bool*!
I am forking this off into a new thread, because I think it is important...

--
Stijn
OddesE_XYZ hotmail.com
http://OddesE.cjb.net
_________________________________________________
Remove _XYZ from my address when replying by mail
May 14 2002
prev sibling parent "Sean L. Palmer" <spalmer iname.com> writes:
That sucks.  The compiler should do some behind-the-scenes manipulation
there so that it *can* store into that bit, instead of telling the
programmer (with an error!) that he shouldn't be returning bits from
functions.  Either extend the concept of pointer for pointers to bit such
that the pointer is not just the pointer to the memory, but also a combined
bit index so the result bit can be shifted during the exit code for a
function from the return value buffer into the correct bit of whatever it
points to.  Slightly larger pointer.  Or have it implemented as a *byte and
have the calling code deal with putting the value there and stuffing it back
into its rightful place.
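The "slightly larger pointer" idea can be sketched directly in C: a byte address plus a bit index, with the shift-and-mask done on every read and write-back. The BitPtr name and helpers are mine, purely illustrative of what an out-bit parameter could lower to.

```c
#include <assert.h>

/* A "fat" bit pointer: the address of the containing byte plus the
   index of the bit within it. */
typedef struct {
    unsigned char *byte;   /* address of the containing byte */
    unsigned      index;   /* which bit, 0..7 */
} BitPtr;

static int bit_get(BitPtr p)
{
    return (*p.byte >> p.index) & 1;
}

static void bit_set(BitPtr p, int value)
{
    if (value) *p.byte |= (unsigned char)(1u << p.index);
    else       *p.byte &= (unsigned char)~(1u << p.index);
}

/* What "void foo(out bit bar) { bar = true; }" might lower to: the
   caller passes a BitPtr and the callee's store is shifted into the
   right bit of whatever it points to. */
static void foo(BitPtr bar)
{
    bit_set(bar, 1);
}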

But don't just disallow it.  Exceptions such as this will ruin a language.
Just have the compiler Do The Right Thing, which is exactly what you asked,
nay commanded it to do.  The "how" is an implementation question.  Good
programmers avoid using features which are too slow, but whether a
particular implementation is acceptable is up to the programmer.  I suppose
Walter's D compiler could even implement it to return an error, I just don't
think it should be part of the language spec that you can't use bits as out
parameters.

Sean

"Pavel Minayev" <evilone omen.ru> wrote in message
news:abiv29$2lhf$1 digitaldaemon.com...
 "Patrick Down" <pat codemoon.com> wrote in message
 news:Xns920B30A21022patcodemooncom 63.105.9.61...

 Is this right?
 err.d(2): cannot have out parameter of type bit
There can be no pointers to bits. Since out parameters are implicitly dereferenced pointers, they cannot be bit.
May 11 2002
prev sibling parent reply Patrick Down <pat codemoon.com> writes:
The following causes the compiler to generate:
Internal error: ..\ztc\cgcs.c 348

void foo(inout bit[] test)
{
  test[0] = 1;
  test[1] = 0;
  test[2] = 1;
}

int main(char[][] args)
{  
  
  bit[] a;
  a.length = 32;

  for(int i = 0; i < a.length; ++i)
    a[i] = 0;
  
  foo(a[3..6]); // foo(a) works ok
  
  for(int i = 0; i < a.length; ++i)
    printf("%d",a[i]);
    
  printf("\n");
  
  return 0;
}
May 11 2002
parent reply "Walter" <walter digitalmars.com> writes:
Thanks for reporting this. The bit type is turning out to be a constant
special case in the compiler.

"Patrick Down" <pat codemoon.com> wrote in message
news:Xns920B8C31915A4patcodemooncom 63.105.9.61...
 The following causes the compiler to generate:
 Internal error: ..\ztc\cgcs.c 348

 void foo(inout bit[] test)
 {
   test[0] = 1;
   test[1] = 0;
   test[2] = 1;
 }

 int main(char[][] args)
 {

   bit[] a;
   a.length = 32;

   for(int i = 0; i < a.length; ++i)
     a[i] = 0;

   foo(a[3..6]); // foo(a) works ok

   for(int i = 0; i < a.length; ++i)
     printf("%d",a[i]);

   printf("\n");

   return 0;
 }
May 11 2002
parent reply "Sean L. Palmer" <spalmer iname.com> writes:
Of course.  It's what the other types are built out of.  THE most
fundamental type that has storage.  I'm not surprised that it's a special
case.  All the implementation has to do is deal with it as an array contained in a
series of fixed-size arrays (the machine word), not as one array.  I think
you'll find that if you can write bit-packing code such as dynamic bit array
that SIMD actually uses the same concept and in fact it's just an extension
of the concept of ints being fixed-sized containers of bits with special
operations.  SIMD types are just fixed-sized containers of bytes or ints or
floats with special operations.

Wouldn't it be nice to generalize this entire concept into some kind of SIMD
type?

You could declare

typedef bit[[32]] machine_word;

or

typedef float[[4]] vector;

then write shorthand

vector a,b,c;

a[[]] += b[[]] * c[[]];

I'm using [[]] as shorthand for "everything in not only this dimension but
all smaller dimensions as well".

The compiler would be able to generate pretty optimal code using SIMD even
if the type you declare isn't exactly available as a basic SIMD register
type.  For instance if you declare

typedef float[[6]] vector6;

vector6 a,b,c;

a[[]] += b[[]] * c[[]];

and the compiler only had a 4-float SIMD register available it could still
use that for part of the operation and either another 4-float SIMD register
or 2 FPU registers to do the remainder of the work.

Sean

"Walter" <walter digitalmars.com> wrote in message
news:abk9iu$n2j$2 digitaldaemon.com...
 Thanks for reporting this. The bit type is turning out to be a constant
 special case in the compiler.

 "Patrick Down" <pat codemoon.com> wrote in message
 news:Xns920B8C31915A4patcodemooncom 63.105.9.61...
 The following causes the compiler to generate:
 Internal error: ..\ztc\cgcs.c 348

 void foo(inout bit[] test)
 {
   test[0] = 1;
   test[1] = 0;
   test[2] = 1;
 }

 int main(char[][] args)
 {

   bit[] a;
   a.length = 32;

   for(int i = 0; i < a.length; ++i)
     a[i] = 0;

   foo(a[3..6]); // foo(a) works ok

   for(int i = 0; i < a.length; ++i)
     printf("%d",a[i]);

   printf("\n");

   return 0;
 }
May 11 2002
parent reply "Sean L. Palmer" <spalmer iname.com> writes:
I've been thinking about this and it seems that it might be better to extend
the slicing mechanism D already has.

D already has:

typedef float[4] vector;
vector a,b,c;
a[] += b[] * c[];

from which the compiler can easily tell what you intend, but what are the rules
for what happens when the sizes of the arrays don't match?  What happens if
they're multidimensional?

D can also already do this:

a[1..3] += b[1..3] * c[1..3];

If one defined a range like so:

range int r = 1..3;
a[r] += b[r] * c[r];

would work as convenient shorthand.

For multidimensional slicing, it could work as follows:

typedef float[4][4] matrix;
matrix a,b,c;
range(int) i = 1..3;
range(int) j = 1..3;
a[i][j] = 2.0f * sqrt(b[i][j] + c[i][j+1]);

So I guess for this to work ranges would have to be a first-class type in the D
language.  They could be compile-time constants or variables, or a mixture
of the two.  The generated code would obviously benefit from the ranges
being compile-time constants, but that shouldn't be necessary.  If one
writes an expression with side-effects using a range in place of a value
inside an array access, there should be a few rules:

Only one range may be included in any pair of brackets, along with optional
integer offset or scale.

The entire expression is evaluated once for each permutation of
possibilities in each of the included ranges.  If you have a range R from
0..2 (two possibilities, 0 and 1) and a 3d array int x[][][], then
x[R][R][R] += 1 would evaluate to eight increments.

The order of evaluation is undefined; however, it would be nice if one could
control this behavior so that overlapping moves do the right thing.  It
should be possible to figure out whether to go forward or backward for each
range by evaluating the dependencies in the graph of information flow.  I
can't see how it would handle say a matrix transpose gracefully though:

typedef float[4][4] matrix;
matrix a;
range(int) i = 0..4;
range(int) j = 0..4;
swap(a[i][j],a[j][i]);

Writing memcpy() is hard on today's processors.  This job will be far more
difficult as what's needed is a program that generates optimized code to
execute a loop that deals with alignment and direction and multiple
operations instead of writing the one program yourself which only has just
copying.  I know memset is about half as difficult to write as memcpy...
this sounds like 4 or 5 times harder than writing optimized memcpy.  The
unoptimized version is still useful though and it's not hard to write at
all.  Walter already has most of what's needed.

Sean


"Sean L. Palmer" <spalmer iname.com> wrote in message
news:abkl59$vpu$1 digitaldaemon.com...
 Of course.  It's what the other types are built out of.  THE most
 fundamental type that has storage.  I'm not surprised that it's a special
 case.  All the implementation has to deal with it as an array contained in
a
 series of fixed-size arrays (the machine word), not as one array.  I think
 you'll find that if you can write bit-packing code such as dynamic bit
array
 that SIMD actually uses the same concept and in fact it's just an
extension
 of the concept of ints being fixed-sized containers of bits with special
 operations.  SIMD types are just fixed-sized containers of bytes or ints
or
 floats with special operations.

 Wouldn't it be nice to generalize this entire concept into some kind of
SIMD
 type?

 You could declare

 typedef bit[[32]] machine_word;

 or

 typedef float[[4]] vector;

 then write shorthand

 vector a,b,c;

 a[[]] += b[[]] * c[[]];

 I'm using [[]] as shorthand for "everything in not only this dimension but
 all smaller dimensions as well".

 The compiler would be able to generate pretty optimal code using SIMD even
 if the type you declare isn't exactly available as a basic SIMD register
 type.  For instance if you declare

 typedef float[[6]] vector6;

 vector6 a,b,c;

 a[[]] += b[[]] * c[[]];

 and the compiler only had a 4-float SIMD register available it could still
 use that for part of the operation and either another 4-float SIMD
register
 or 2 FPU registers to do the remainder of the work.

 Sean

 "Walter" <walter digitalmars.com> wrote in message
 news:abk9iu$n2j$2 digitaldaemon.com...
 Thanks for reporting this. The bit type is turning out to be a constant
 special case in the compiler.

 "Patrick Down" <pat codemoon.com> wrote in message
 news:Xns920B8C31915A4patcodemooncom 63.105.9.61...
 The following causes the compiler to generate:
 Internal error: ..\ztc\cgcs.c 348

 void foo(inout bit[] test)
 {
   test[0] = 1;
   test[1] = 0;
   test[2] = 1;
 }

 int main(char[][] args)
 {

   bit[] a;
   a.length = 32;

   for(int i = 0; i < a.length; ++i)
     a[i] = 0;

   foo(a[3..6]); // foo(a) works ok

   for(int i = 0; i < a.length; ++i)
     printf("%d",a[i]);

   printf("\n");

   return 0;
 }
May 12 2002
next sibling parent reply "Walter" <walter digitalmars.com> writes:
"Sean L. Palmer" <spalmer iname.com> wrote in message
news:abl6cj$1f1g$1 digitaldaemon.com...
 typedef float[4] vector;
 vector a,b,c;
 a[] += b[] * c[];
 which the compiler can easily tell what you intend, but what are the rules
 for what happens when the sizes of the arrays don't match?  What happens
if
 they're multidimensional?
That would be a runtime error.
 D can also already do this:

 a[1..3] += b[1..3] * c[1..3];

 If one defined a range like so:

 range int r = 1..3;
 a[r] += b[r] * c[r];

 would work as convenient shorthand.
I never thought of that.
 For multidimensional slicing, it could work as follows:

 typedef float[4][4] matrix;
 matrix a,b,c;
 range(int) i = 1..3;
 range(int) j = 1..3;
 a[i][j] = 2.0f * sqrt(b[i][j] + c[i][j+1]);
D doesn't currently allow multidimensional slicing, though it's an obvious future extension.
 The order of evaluation is undefined however it would be nice if one could
 control this behavior so that overlapping moves do the right thing.  It
 should be possible to figure out whether to go forward or backward for
 each range by evaluating the dependencies in the graph of information flow.
The best thing to do with overlapping ranges is disallow them. That cleans up the semantics, and enables better code generation.
May 12 2002
next sibling parent reply Patrick Down <pat codemoon.com> writes:
"Walter" <walter digitalmars.com> wrote in
news:abm4c0$25l6$1 digitaldaemon.com: 
 
 The best thing to do with overlapping ranges is disallow them. That
 cleans up the semantics, and enables better code generation.
I understand your reasons for disallowing overlapping ranges but it seems to make the language less consistent.  Would it be a problem to have a small check at the beginning of a copy to check for overlapping ranges and switch between two different copy routines based on the result?
May 12 2002
parent reply "Sean L. Palmer" <spalmer iname.com> writes:
"Patrick Down" <pat codemoon.com> wrote in message
news:Xns920C89BAF31DApatcodemooncom 63.105.9.61...
 "Walter" <walter digitalmars.com> wrote in
 news:abm4c0$25l6$1 digitaldaemon.com:
 The best thing to do with overlapping ranges is disallow them. That
 cleans up the semantics, and enables better code generation.
I understand your reasons for disallowing overlapping ranges but it seems to make the language less consistent.  Would it be a problem to have a small check at the beginning of a copy to check for overlapping ranges and switch between two different copy routines based on the result?
That's how memcpy is usually implemented.

If everything about the addresses involved is known at compile time, the compiler can select between several different code snippet templates to generate the code, ranging from completely unoptimized all the way down to just completely unrolling the loop.

I know this feels like one of the basics, Walter, and I know you're itching to move on to bigger and better things, but the fact that moving blocks of data is one of the basics of computing indicates that it needs robust language support, not special cases.

Sean
May 12 2002
next sibling parent Russell Borogove <kaleja estarcion.com> writes:
Sean L. Palmer wrote:
 "Patrick Down" <pat codemoon.com> wrote in message
 news:Xns920C89BAF31DApatcodemooncom 63.105.9.61...
I understand your reasons for disallowing overlapping ranges but
it seems to make the language less consistant.  Would it be a
problem to have a small check at the begining of a copy to check
for overlapping ranges and switch between two differnt copy routines
based on the result?
That's how memcpy is usually implemented.
ITYM memmove(); C standard memcpy() has undefined behavior for overlapping ranges, and I'd consider it a violation of the implicit performance contract for memcpy() to check the ranges.

-RB
May 12 2002
prev sibling parent reply "Walter" <walter digitalmars.com> writes:
"Sean L. Palmer" <spalmer iname.com> wrote in message
news:abmetg$2e7i$1 digitaldaemon.com...
 I know this feels like one of the basics, Walter, and I know you're
 itching to move on to bigger and better things, but the fact that moving
 blocks of data is one of the basics of computing indicates that it needs
 robust language support, not special cases.
You're right, it is a basic thing, and I have not just blown it off. I use memcpy() very regularly, and I depend on it. With a few observations, I think you'll see my reasoning for disallowing overlapping copies, even if you don't agree with it:

1) 99% of memory copies are not overlapping.

2) Overall program performance tends to be sensitive to memcpy speed.

3) Putting the test in for each copy will sacrifice performance and add bloat to satisfy only 1% of the cases.

4) I've seen memcpy code that *relied* on the wrong-way overlapping copy behavior.

5) Because of (4), I think if one needs to do an overlapping copy, one does need to decide just what it means to do an overlapping copy, and explicitly code appropriately. Leaving it implicit can be misleading.

6) In debug builds, checks for overlapping are inserted, so one will not inadvertently fall prey to copy direction bugs.

7) And most importantly, it gets rid of the "aliasing" problem C has that prevents array operations from being compiled as efficiently as FORTRAN.
May 14 2002
parent reply "Sean L. Palmer" <spalmer iname.com> writes:
"Walter" <walter digitalmars.com> wrote in message
news:abqfqk$574$1 digitaldaemon.com...
 "Sean L. Palmer" <spalmer iname.com> wrote in message
 news:abmetg$2e7i$1 digitaldaemon.com...
 I know this feels like one of the basics, Walter, and I know you're
 itching to move on to bigger and better things, but the fact that moving
 blocks of data is one of the basics of computing indicates that it needs
 robust language support, not special cases.
You're right, it is a basic thing, and I have not just blown it off. I use memcpy() very regularly, and I depend on it. With a few observations, I think you'll see my reasoning for disallowing overlapping copies, even if you don't agree with it:

1) 99% of memory copies are not overlapping.

2) Overall program performance tends to be sensitive to memcpy speed.

3) Putting the test in for each copy will sacrifice performance and add bloat to satisfy only 1% of the cases.
You are right. I suppose. <g>
 4) I've seen memcpy code that *relied* on the wrong way overlapping copy
 behavior.
I still don't like special cases, but I have actually seen code that used a wrong-way memcpy before too (though why they didn't just write a pattern-fill routine I don't know).

When writing an overlapping memcpy, 99% of the time you'll want to do it the "right" way. Leaving it explicit leaves open a possibility for the programmer to accidentally write bugs in what would otherwise be a perfectly formed compiler-generated memory shift. Sample bug: if I scroll up it works, but if I scroll down it destroys all entries with copies of the first entry.
 5) Because of (4), I think if one needs to do an overlapping copy, one
does
 need to decide just what it means to do an overlapping copy, and
explicitly
 code appropriately. Leaving it implicit can be misleading.
They can always code an explicit loop if they want explicit control. i.e. want to copy the "wrong" way.
 6) In debug builds, checks for overlapping are inserted, so one will not
 inadvertantly fall prey to copy direction bugs.
Won't inadvertently fall prey to some bugs, at least.

I hope your user's program doesn't die when his client tries to copy a record to the clipboard, then later tries to paste it back onto the original data... if the cut is lazy, you may very well end up with a "copying a range exactly onto itself" no-op which is caught by a runtime check and causes the program to die.

Maybe if the start and end of the ranges of memory are identical, it should fall through as a no-op instead of a runtime assert? But that's more runtime checks needed in the debug build.
 7) And most importantly, it gets rid of the "aliasing" problem C has that
 prevents array operations from being compiled as efficiently as FORTRAN.
I've never tried to code a highly optimized backend for arrays. Can you describe this problem in more detail? I'm curious to know what the compiler is probably doing under there...

I know aliasing more in terms of the bugs that happen when you pass the same argument in as both an input and an output (via two different parameters). If the implementation relies on the two objects not being the same, you get a bug. It prevents certain optimizations from being 100% safe. I prefer to code without aliasing when possible, so I can turn up the optimization more.

There are many ways to have aliasing; I'm just curious how potential aliasing affects the array optimization problem.

Sean
May 14 2002
parent reply "Walter" <walter digitalmars.com> writes:
"Sean L. Palmer" <spalmer iname.com> wrote in message
news:abqkno$9cv$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:abqfqk$574$1 digitaldaemon.com...
 7) And most importantly, it gets rid of the "aliasing" problem C has that
 prevents array operations from being compiled as efficiently as FORTRAN.
I've never tried to code a highly optimized backend for arrays. Can you describe this problem in more detail? I'm curious to know what the
compiler
 is probably doing under there...
Let's say you are trying to compute a=b+c, where a, b and c are arrays. In C, you'd write:

    for (i = 0; i < n; i++)
        a[i] = b[i] + c[i];

If a CPU has parallel execution capabilities, they can be exploited to optimize code (such as a vector add instruction). Also, multiple iterations of the loop can be interleaved with the stores delayed. But in C these optimizations cannot be done, because if the arrays overlap then the result of one iteration of the loop can be dependent on the result of the previous iteration.
May 14 2002
next sibling parent reply Patrick Down <pat codemoon.com> writes:
"Walter" <walter digitalmars.com> wrote in
news:abrcj6$tsq$1 digitaldaemon.com: 
 Let's say you are trying to compute a=b+c, where a, b and c are
 arrays. In C, you'd write:
 
     for (i = 0; i < n; i++)
         a[i] = b[i] + c[i];
 
 If a CPU has parallel execution capabilities, they can be exploited to
 optimize code (such as a vector add instruction). Also, multiple
 iterations of the loop can be interleaved with the stores delayed. But
 in C these optimizations cannot be done, because if the arrays overlap
 then the result of one iteration of the loop can be dependent on the
 result of the previous iteration.
So is a[] = a[] + b[]; also going to be classified as an overlapped copy?
May 14 2002
parent "Walter" <walter digitalmars.com> writes:
"Patrick Down" <pat codemoon.com> wrote in message
news:Xns920E81D566F62patcodemooncom 63.105.9.61...
 "Walter" <walter digitalmars.com> wrote in
 news:abrcj6$tsq$1 digitaldaemon.com:
 Let's say you are trying to compute a=b+c, where a, b and c are
 arrays. In C, you'd write:

     for (i = 0; i < n; i++)
         a[i] = b[i] + c[i];

 If a CPU has parallel execution capabilities, they can be exploited to
 optimize code (such as a vector add instruction). Also, multiple
 iterations of the loop can be interleaved with the stores delayed. But
 in C these optimizations cannot be done, because if the arrays overlap
 then the result of one iteration of the loop can be dependent on the
 result of the previous iteration.
So is a[] = a[] + b[]; also going to be classified as an overlapped copy?
Good question. It's not an overlapped copy because each array element is not dependent on any other array element in the same array.
May 14 2002
prev sibling parent "Sean L. Palmer" <spalmer iname.com> writes:
And if you can't guarantee that they don't overlap, then the optimization
can't be done safely.

Gotcha.

Usually there'll be one of two situations:  Either the compiler can tell
that they don't overlap at compile time, or it can't.  If it can't, it's
worth spending a few instructions checking so it can do the optimized inner
loop in the likely situation that they don't in fact overlap.  It's probably
gotta check for alignment anyway to use SIMD so it may as well check overlap
as well.  If they overlap, or if the alignment sucks, do the unoptimized
loop.  If the array sizes are large, the overhead of the checks will be
negligible in comparison.  If the array sizes are small, well, too bad
there's nothing we can do to help.

Or just disallow overlap.  Problem solved, albeit with a sledgehammer.

Sean

"Walter" <walter digitalmars.com> wrote in message
news:abrcj6$tsq$1 digitaldaemon.com...
 "Sean L. Palmer" <spalmer iname.com> wrote in message
 news:abqkno$9cv$1 digitaldaemon.com...
 "Walter" <walter digitalmars.com> wrote in message
 news:abqfqk$574$1 digitaldaemon.com...
 7) And most importantly, it gets rid of the "aliasing" problem C has that
 prevents array operations from being compiled as efficiently as FORTRAN.
 I've never tried to code a highly optimized backend for arrays.  Can you
 describe this problem in more detail?  I'm curious to know what the
 compiler is probably doing under there...
 Let's say you are trying to compute a=b+c, where a, b and c are arrays.
 In C, you'd write:

     for (i = 0; i < n; i++)
         a[i] = b[i] + c[i];

 If a CPU has parallel execution capabilities, they can be exploited to
 optimize code (such as a vector add instruction). Also, multiple
 iterations of the loop can be interleaved with the stores delayed. But in
 C these optimizations cannot be done, because if the arrays overlap then
 the result of one iteration of the loop can be dependent on the result of
 the previous iteration.
May 14 2002
prev sibling parent "Sean L. Palmer" <spalmer iname.com> writes:
"Walter" <walter digitalmars.com> wrote in message
news:abm4c0$25l6$1 digitaldaemon.com...
 The order of evaluation is undefined however it would be nice if one
 could control this behavior so that overlapping moves do the right thing.
 It should be possible to figure out whether to go forward or backward for
 each range by evaluating the dependencies in the graph of information flow.
The best thing to do with overlapping ranges is disallow them. That cleans up the semantics, and enables better code generation.
I'd weigh carefully before making it illegal. People use memcpy and strcpy with overlapping all the time. I'd strongly suggest trying to make the compiler "do the right thing". I suppose someone could always write their own (probably non-optimal) move function that deals with this.

There are simple rules for 1d overlapping array copies:

  if source doesn't overlap destination, do it however you please, i.e. whatever's fastest.
  if source begin is equal to destination begin, nothing needs to be done so long as you're just copying (math ops would still need to be done).
  if source begin is less than destination begin, iterate from the end to the start.
  if source begin is greater than destination begin, iterate from the start to the end.

I think these rules could be easily extended to multiple dimensions, by applying the same rule separately for each dimension.

If you ignore these rules and do it the other way, what it actually accomplishes is a "pattern fill" where the pattern size is equal to the offset between the starts of the ranges. But that's not what the user asked for; they wanted the entire range copied intact.

One could see problems arise when the copy is to be done using a larger memory unit than the size of the type, such as a copy using 128-bit blocks. But those kinds of copies usually have to be aligned anyway, so the compiler would have to know the addresses at compile time, or a "copy dispatch" wrapper would have had to decide that the large block move is ok. If not, it does the slower "one item at a time" move.

Here again we have an arbitrary decision to limit a language feature just because it makes the compiler implementation a bit easier. At least with your solution we don't end up with a language that doesn't specify the behavior ("implementation-defined" a la C++) ;)

Sean
May 12 2002
prev sibling parent reply "Stephen Fuld" <s.fuld.pleaseremove att.net> writes:
"Sean L. Palmer" <spalmer iname.com> wrote in message
news:abl6cj$1f1g$1 digitaldaemon.com...
 I've been thinking about this and it seems that it might be better to
 extend the slicing mechanism D already has.

 D already has:

 typedef float[4] vector;
 vector a,b,c;
 a[] += b[] * c[];

 which the compiler can easily tell what you intend, but what are the rules
 for what happens when the sizes of the arrays don't match?  What happens
 if they're multidimensional?

 D can also already do this:

 a[1..3] += b[1..3] * c[1..3];

 If one defined a range like so:

 range int r = 1..3;
 a[r] += b[r] * c[r];

 would work as convenient shorthand.
Yup.  That is one of the nice things you can do to extend the syntax once you have ranges.  Another is to simplify the common case of loop variables incrementing by 1:

range int r = 1..10;
for (r)
  . . .

etc.  And, since they are defined at compile time, the range variable name can be used to specify the size of an array.  There are a lot of advantages in maintainability, since you can change the range in one place and it automatically gets picked up in all the places it is used.

--
 - Stephen Fuld
   e-mail address disguised to prevent spam
May 12 2002
parent Russell Borogove <kaleja estarcion.com> writes:
Stephen Fuld wrote:
 "Sean L. Palmer" <spalmer iname.com> wrote in message
 news:abl6cj$1f1g$1 digitaldaemon.com...
If one defined a range like so:

range int r = 1..3;
a[r] += b[r] * c[r];

would work as convenient shorthand.
Yup. That is one of the nice things you can do to extend the syntax once you have ranges. Another is to simplify the common case of loop variables incrementing by 1:

range int r = 1..10;
for (r)
  . . .
Eeek! I would beg for a distinct keyword like "foreach" in both those cases, to make it totally clear that these were vectorized/iterated/looped operations in cases where the range definition was far away from the range usage:

range int r = 1..3;
foreach (r)
{
    a[r] += b[r] * c[r];
}

Other than that, it's good.

-RB
May 12 2002