digitalmars.D.learn - Variable-Length Bit-Level Encoding

=?UTF-8?B?Tm9yZGzDtnc=?= (12/12) Nov 12 2016 I'm looking for libraries/snippets (either in D or similar

=?UTF-8?B?Tm9yZGzDtnc=?= (3/6) Nov 12 2016 Doh, forget the normal distribution thing here. It, of course,
James Buren (9/12) Nov 12 2016 I don't recall the name, but there is an algorithm for encoding

LiNbO3 (3/15) Nov 13 2016 Maybe the LEB128 encoding [1] ?

Era Scarecrow (13/20) Nov 12 2016 Hmm off hand I'm recalling there's a different encoding that
Arafel (3/15) Nov 14 2016 If you have a sample of your data, perhaps Huffman codes

=?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

I'm looking for libraries/snippets (either in D or similar 
languages) that perform variable-length encoding of unsigned 
integers onto a bit-stream. Requirement is that smaller inputs 
(integer values) should be encoded with equal or fewer bits.

This

0 => [0]
1 => [1,0]
2 => [1,1,0]

is easy but assumes a too extreme input value distribution.

Does anybody have a suggestion for an encoder that is more 
suitable for real-world values that are, for instance, normally 
distributed?

Nov 12 2016

=?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:

On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:
 Does anybody have a suggestion for an encoder that is more 
 suitable for real-world values that are, for instance, normally 
 distributed?

Doh, forget the normal distribution thing here. It, of course, 
doesn't work with unsigned integers.

Nov 12 2016

James Buren <ryu0 ymail.com> writes:

On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:
 Does anybody have a suggestion for an encoder that is more 
 suitable for real-world values that are, for instance, normally 
 distributed?

I don't recall the name, but there is an algorithm for encoding 
data of an arbitrary number of bytes/bits into a stream of 
octets. It reserves the MSB of each octet for use as a marker. If 
it is set to 1, then there are yet more bits to read. If it is 
set to 0, then this is the last group of bits. This enables each 
octet to carry 7 bits at a time, while allowing you to encode 
data of any bit size into an octet stream. You just need to break 
your data down into groups of 7 bits.

Nov 12 2016

LiNbO3 <nosp m.please> writes:

On Saturday, 12 November 2016 at 19:19:58 UTC, James Buren wrote:
 On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:
 Does anybody have a suggestion for an encoder that is more 
 suitable for real-world values that are, for instance, 
 normally distributed?

 I don't recall the name, but there is an algorithm for encoding 
 data of an arbitrary number of bytes/bits into a stream of 
 octets. It reserves the MSB of each octet for use as a marker. 
 If it is set to 1, then there are yet more bits to read. If it 
 is set to 0, then this is the last group of bits. This enables 
 each octet to carry 7 bits at a time, while allowing you to 
 encode data of any bit size into an octet stream. You just need 
 to break your data down into groups of 7 bits.

Maybe the LEB128 encoding [1] ?

[1] https://en.wikipedia.org/wiki/LEB128

Nov 13 2016

Era Scarecrow <rtcvb32 yahoo.com> writes:

On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:
 0 => [0]
 1 => [1,0]
 2 => [1,1,0]

 is easy but assumes a too extreme input value distribution.

 Does anybody have a suggestion for an encoder that is more 
 suitable for real-world values that are, for instance, normally 
 distributed?

  Hmm off hand I'm recalling there's a different encoding that 
uses 0's to determine how long the encoding is, and then the 
first 1 breaks it off to the actual encoding. Great for short 
ints.

  1 = 0
  010 = 1
  011 = 2
  00101 = 4
  000011111 = 30

  https://en.wikipedia.org/wiki/Exponential-Golomb_coding

  If you know the range of the data, I'd almost be more inclined 
to do range encoding mixed with arithmetic coding.

Nov 12 2016

Arafel <er.krali gmail.com> writes:

On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:
 I'm looking for libraries/snippets (either in D or similar 
 languages) that perform variable-length encoding of unsigned 
 integers onto a bit-stream. Requirement is that smaller inputs 
 (integer values) should be encoded with equal or fewer bits.

 This

 0 => [0]
 1 => [1,0]
 2 => [1,1,0]

 is easy but assumes a too extreme input value distribution.

 Does anybody have a suggestion for an encoder that is more 
 suitable for real-world values that are, for instance, normally 
 distributed?

If you have a sample of your data, perhaps Huffman codes 
(https://en.wikipedia.org/wiki/Huffman_coding) might be an option?

Nov 14 2016

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Variable-Length Bit-Level Encoding