
digitalmars.D - Unichar optimization: performance vs RAM usage. Your opinion?

reply Hauke Duden <H.NS.Duden gmx.net> writes:
I had an idea last night on how to optimize the unichar module to only 
use about 50 KB instead of the current 2 MB of RAM at runtime. It may 
also be possible to reduce the impact on the executable size (12 KB 
currently) but I haven't worked that out fully yet.

Unfortunately the change would make the lookup code slower. For example, 
translating "chr" to lower case would have to be changed from

return lowerData[chr]

to

return chr + lowerData[chr>>9][chr & 511];

In other words instead of one memory lookup it will do two memory 
lookups, a shift, an AND, and an addition. It is not THAT expensive, but 
in an inner loop it might add up.
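The two-level scheme can be sketched as follows. This is C rather than the module's D, and the names (pageIndex, zeroPage, initTables) are illustrative, with only an ASCII page filled in to keep the sketch small:

```c
#include <stdint.h>

#define PAGE_BITS 9
#define PAGE_SIZE (1 << PAGE_BITS)            /* 512 codepoints per page */
#define NUM_PAGES (0x110000 >> PAGE_BITS)     /* pages covering all of Unicode */

static int16_t zeroPage[PAGE_SIZE];           /* shared by every page with no mappings */
static int16_t asciiPage[PAGE_SIZE];          /* page 0 covers U+0000..U+01FF */
static const int16_t *pageIndex[NUM_PAGES];   /* first-level table of page pointers */

static void initTables(void)
{
    for (int i = 0; i < NUM_PAGES; i++)
        pageIndex[i] = zeroPage;              /* identical pages share one block */
    for (int c = 'A'; c <= 'Z'; c++)
        asciiPage[c] = 32;                    /* delta to lower case: 'A' + 32 == 'a' */
    pageIndex[0] = asciiPage;
}

uint32_t toLower(uint32_t chr)
{
    /* two memory lookups, a shift, an AND, and an addition */
    return chr + pageIndex[chr >> PAGE_BITS][chr & (PAGE_SIZE - 1)];
}
```

Only distinct pages cost memory: since most of Unicode has no case mapping, nearly every slot of pageIndex can point at the same zeroPage.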

And people have lots of RAM nowadays - 2 MB is a small amount, and since 
only small parts of it will actually be used (depending on the language 
the program is working with), most of it will be paged out to disk by the OS.

So what do you think: is the lower RAM usage worth the performance 
decrease? I could also implement both versions and select one at compile 
time, but then the question is which one should be the default?


Hauke
Jun 08 2004
next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca3vvv$gpf$1 digitaldaemon.com>, Hauke Duden says...

> return lowerData[chr]

Huh? There are over a million codepoints in Unicode (1,114,112, in fact). That must be a ----ing HUGE lookup table! Even if the data were only one byte per codepoint, that's a meg already.

Of course it's reasonable to compress that. There are huge swathes of codepoints that are just totally unused (yet), and many, many blocks in which all characters have identical properties. One humungous lookup table is definitely not how I would have done it. Maybe I'm just not enough of a speed freak.

Jill
Jun 08 2004
parent reply Hauke Duden <H.NS.Duden gmx.net> writes:
Arcane Jill wrote:

> In article <ca3vvv$gpf$1 digitaldaemon.com>, Hauke Duden says...
>
>> return lowerData[chr]
>
> Huh? There are over a million codepoints in Unicode (1,114,112, in fact). That must be a ----ing HUGE lookup table! Even if the data were only one byte per codepoint, that's a meg already.

Not quite ;). If you look at the data, you'll see that only the first ~200,000 codepoints actually have case mappings. That still translates to about 2 MB of needed RAM (for all the tables together).

> Of course it's reasonable to compress that. There are huge swathes of codepoints that are just totally unused (yet), and many, many blocks in which all characters have identical properties. One humungous lookup table is definitely not how I would have done it. Maybe I'm just not enough of a speed freak.

Dividing the Unicode range into pages and using the same memory for identical ones is exactly the optimization I am thinking about doing. The question is whether the modest (by today's standards) decrease in needed RAM is worth the decrease in performance.

Hauke
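The page sharing described here can be done at table-build time by interning pages: each freshly built page is compared against the distinct pages seen so far, and duplicates reuse the same memory. A sketch in C; PagePool and internPage are hypothetical names, not the module's code:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_BITS 9
#define PAGE_SIZE (1 << PAGE_BITS)   /* 512 entries per page */

/* Pool of distinct pages seen so far. */
typedef struct {
    const int16_t **unique;          /* array of distinct page pointers */
    int count;
} PagePool;

/* Return a shared pointer for this page: an existing identical page
   if one is in the pool, otherwise a freshly stored copy. */
const int16_t *internPage(PagePool *pool, const int16_t page[PAGE_SIZE])
{
    for (int i = 0; i < pool->count; i++)
        if (memcmp(pool->unique[i], page, PAGE_SIZE * sizeof(int16_t)) == 0)
            return pool->unique[i];  /* duplicate: reuse existing memory */

    int16_t *copy = malloc(PAGE_SIZE * sizeof(int16_t));
    memcpy(copy, page, PAGE_SIZE * sizeof(int16_t));
    pool->unique = realloc(pool->unique,
                           (pool->count + 1) * sizeof(*pool->unique));
    pool->unique[pool->count++] = copy;
    return copy;
}
```

With most of the Unicode range unassigned, the all-zero page is interned once and reused thousands of times, which is where the 2 MB to ~50 KB reduction would come from.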
Jun 08 2004
parent Kevin Bealer <Kevin_member pathlink.com> writes:
In article <ca4aap$11jk$1 digitaldaemon.com>, Hauke Duden says...
> Arcane Jill wrote:
>
>> In article <ca3vvv$gpf$1 digitaldaemon.com>, Hauke Duden says...
>>
>>> return lowerData[chr]
>>
>> Huh? There are over a million codepoints in Unicode (1,114,112, in fact). That must be a ----ing HUGE lookup table! Even if the data were only one byte per codepoint, that's a meg already.
>
> Not quite ;). If you look at the data, you'll see that only the first ~200,000 codepoints actually have case mappings. That still translates to about 2 MB of needed RAM (for all the tables together).
>
>> Of course it's reasonable to compress that. There are huge swathes of codepoints that are just totally unused (yet), and many, many blocks in which all characters have identical properties. One humungous lookup table is definitely not how I would have done it. Maybe I'm just not enough of a speed freak.
>
> Dividing the Unicode range into pages and using the same memory for identical ones is exactly the optimization I am thinking about doing. The question is whether the modest (by today's standards) decrease in needed RAM is worth the decrease in performance.
>
> Hauke

I've seen the idea of a two-level table mentioned on one of the Unicode sites; it was one of the recommended implementations for a lookup table. I used a version of this at IBM to handle ASCII/EBCDIC/Unicode translation.

I think doing an extra table lookup per character will not be a big issue, since the first table (the pointers to the other tables) will probably stay in cache.

Kevin
Jun 09 2004
prev sibling next sibling parent reply Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca3vvv$gpf$1 digitaldaemon.com>, Hauke Duden says...

> return chr + lowerData[chr>>9][chr & 511];

You can get faster than that.
       union U
       {
           dchar c;
           ubyte[3] n;
       }

       U u;
       u.c = chr;
       return lowerData[u.n[2]][u.n[1]][u.n[0]];

No bitshifting. You just need tables for blocks of 256 characters, then tables of 256 pointers to these, then one table of 17 pointers to those. Duplicate blocks (in particular, those for unassigned codepoints) can simply be the same block. Well, that's how I would have done it.

Jill
Jun 08 2004
next sibling parent Arcane Jill <Arcane_member pathlink.com> writes:
In article <ca46cu$rfj$1 digitaldaemon.com>, Arcane Jill says...
>        union U
>        {
>            dchar c;
>            ubyte[3] n;
>        }

That's assuming a little-endian architecture, of course. You'd need two versions to cover both possibilities.

Jill
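One way to sidestep the two-versions problem is to extract the three index bytes with shifts, which behaves identically on any architecture. A C sketch; the function name is made up:

```c
#include <stdint.h>

/* The union trick reads the low three bytes of a 32-bit codepoint, which
   lands on the right bytes only on little-endian machines. Plain shifts
   yield the same three indices regardless of byte order. */
void codepointBytes(uint32_t chr, uint8_t out[3])
{
    out[2] = (uint8_t)(chr >> 16);   /* plane index, 0..16 */
    out[1] = (uint8_t)(chr >> 8);    /* middle-table index */
    out[0] = (uint8_t)chr;           /* leaf-table index */
}
/* the lookup is then lowerData[out[2]][out[1]][out[0]] as in the post */
```

The cost is reintroducing the shifts the union was meant to avoid, so it trades a little speed for portability.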
Jun 08 2004
prev sibling parent Hauke Duden <H.NS.Duden gmx.net> writes:
Arcane Jill wrote:
> In article <ca3vvv$gpf$1 digitaldaemon.com>, Hauke Duden says...
>
>> return chr + lowerData[chr>>9][chr & 511];
>
> You can get faster than that.
>
>       union U
>       {
>           dchar c;
>           ubyte[3] n;
>       }
>
>       U u;
>       u.c = chr;
>       return lowerData[u.n[2]][u.n[1]][u.n[0]];
>
> No bitshifting. You just need tables for blocks of 256 characters, then tables of 256 pointers to these, then one table of 17 pointers to those. Duplicate blocks (in particular, those for unassigned codepoints) can simply be the same block. Well, that's how I would have done it.

Splitting the data into blocks and using the same memory for equal blocks is indeed the main idea behind the optimization. But doing 3 memory lookups with byte indices is probably slower than doing the bit shift and 2 lookups. Memory is usually the bottleneck, while shifts have been 1-cycle operations for ages (even on the 386, IIRC) and can be pipelined to some degree. Also, byte-sized operations are slower than int-sized ones on many CPUs.

Hauke
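The two access patterns being compared can be put side by side. A C sketch with toy tables and made-up names; both variants use case deltas as in Hauke's version, and the three-level variant is written with shifts here for portability, where the union trick would replace them with byte reads:

```c
#include <stdint.h>

static int16_t page[512];                 /* one shared 512-entry page */
static int16_t *twoLevel[0x110000 >> 9];  /* chr>>9 -> page */

static int16_t leaf[256];                 /* one shared leaf table */
static int16_t *mid[256];                 /* high byte -> leaf */
static int16_t **threeLevel[17];          /* plane -> mid */

static void initToyTables(void)
{
    for (int i = 0; i < 0x110000 >> 9; i++) twoLevel[i] = page;
    for (int i = 0; i < 256; i++) mid[i] = leaf;
    for (int i = 0; i < 17; i++) threeLevel[i] = mid;
    page['A'] = 32;                       /* toy mapping: 'A' -> 'a' */
    leaf['A'] = 32;
}

/* two dependent loads, plus shift/AND/add */
uint32_t lowerTwoLevel(uint32_t chr)
{
    return chr + twoLevel[chr >> 9][chr & 511];
}

/* three dependent loads; the union trick removes the shifts
   but the chain of loads remains */
uint32_t lowerThreeLevel(uint32_t chr)
{
    return chr + threeLevel[chr >> 16][(uint8_t)(chr >> 8)][(uint8_t)chr];
}
```

The loads are dependent (each index comes from the previous load), so with cold caches the three-level walk pays one extra memory latency per character, which is the bottleneck Hauke is pointing at.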
Jun 08 2004
prev sibling parent reply "Walter" <newshound digitalmars.com> writes:
It's such a huge reduction in memory usage, I'd go for it. Reducing memory
consumption has its own associated performance speedups - less swapping,
less memory for the gc to scan, etc. Keep the faster version there with a
version {} statement, though, in case experience shows otherwise.
Jun 08 2004
parent DemmeGod <me demmegod.com> writes:
I think what Walter means to say is that it'll never make it into Phobos
using 2 MB of RAM.  Yes?

John

On Tue, 08 Jun 2004 09:23:08 -0700, Walter wrote:

> It's such a huge reduction in memory usage, I'd go for it. Reducing memory
> consumption has its own associated performance speedups - less swapping,
> less memory for the gc to scan, etc. Keep the faster version there with a
> version {} statement, though, in case experience shows otherwise.

Jun 08 2004