www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Variable-Length Bit-Level Encoding

reply =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
I'm looking for libraries/snippets (either in D or similar 
languages) that perform variable-length encoding of unsigned 
integers onto a bit-stream. Requirement is that smaller inputs 
(integer values) should be encoded with equal or fewer bits.

This

0 => [0]
1 => [1,0]
2 => [1,1,0]

is easy but assumes a too extreme input value distribution.

Does anybody have a suggestion for an encoder that is more 
suitable for real-world values that are, for instance, normally 
distributed?
Nov 12 2016
next sibling parent =?UTF-8?B?Tm9yZGzDtnc=?= <per.nordlow gmail.com> writes:
On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:
 Does anybody have a suggestion for an encoder that is more 
 suitable for real-world values that are, for instance, normally 
 distributed?
Doh, forget the normal distribution thing here. It, of course, doesn't work with unsigned integers.
Nov 12 2016
prev sibling next sibling parent reply James Buren <ryu0 ymail.com> writes:
On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:
 Does anybody have a suggestion for an encoder that is more 
 suitable for real-world values that are, for instance, normally 
 distributed?
I don't recall the name, but there is an algorithm for encoding data of an arbitrary number of bytes/bits into a stream of octets. It reserves the MSB of each octet for use as a marker. If it is set to 1, then there are yet more bits to read. If it is set to 0, then this is the last group of bits. This enables each octet to carry 7 bits at a time, while allowing you to encode data of any bit size into an octet stream. You just need to break your data down into groups of 7 bits.
Nov 12 2016
parent LiNbO3 <nosp m.please> writes:
On Saturday, 12 November 2016 at 19:19:58 UTC, James Buren wrote:
 On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:
 Does anybody have a suggestion for an encoder that is more 
 suitable for real-world values that are, for instance, 
 normally distributed?
I don't recall the name, but there is an algorithm for encoding data of an arbitrary number of bytes/bits into a stream of octets. It reserves the MSB of each octet for use as a marker. If it is set to 1, then there are yet more bits to read. If it is set to 0, then this is the last group of bits. This enables each octet to carry 7 bits at a time, while allowing you to encode data of any bit size into an octet stream. You just need to break your data down into groups of 7 bits.
Maybe the LEB128 encoding [1] ? [1] https://en.wikipedia.org/wiki/LEB128
Nov 13 2016
prev sibling next sibling parent Era Scarecrow <rtcvb32 yahoo.com> writes:
On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:
 0 => [0]
 1 => [1,0]
 2 => [1,1,0]

 is easy but assumes a too extreme input value distribution.

 Does anybody have a suggestion for an encoder that is more 
 suitable for real-world values that are, for instance, normally 
 distributed?
Hmm off hand I'm recalling there's a different encoding that uses 0's to determine how long the encoding is, and then the first 1 breaks it off to the actual encoding. Great for short ints. 1 = 0 010 = 1 011 = 2 00101 = 4 000011111 = 30 https://en.wikipedia.org/wiki/Exponential-Golomb_coding If you know the range of the data, I'd almost be more inclined to do range encoding mixed with arithmetic coding.
Nov 12 2016
prev sibling parent Arafel <er.krali gmail.com> writes:
On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:
 I'm looking for libraries/snippets (either in D or similar 
 languages) that perform variable-length encoding of unsigned 
 integers onto a bit-stream. Requirement is that smaller inputs 
 (integer values) should be encoded with equal or fewer bits.

 This

 0 => [0]
 1 => [1,0]
 2 => [1,1,0]

 is easy but assumes a too extreme input value distribution.

 Does anybody have a suggestion for an encoder that is more 
 suitable for real-world values that are, for instance, normally 
 distributed?
If you have a sample of your data, perhaps Huffman codes (https://en.wikipedia.org/wiki/Huffman_coding) might be an option?
Nov 14 2016