## digitalmars.D.learn - Variable-Length Bit-Level Encoding

- =?UTF-8?B?Tm9yZGzDtnc=?= (12/12) Nov 12 2016 I'm looking for libraries/snippets (either in D or similar
- =?UTF-8?B?Tm9yZGzDtnc=?= (3/6) Nov 12 2016 Doh, forget the normal distribution thing here. It, of course,
- James Buren (9/12) Nov 12 2016 I don't recall the name, but there is an algorithm for encoding
- LiNbO3 (3/15) Nov 13 2016 Maybe the LEB128 encoding [1] ?
- Era Scarecrow (13/20) Nov 12 2016 Hmm off hand I'm recalling there's a different encoding that
- Arafel (3/15) Nov 14 2016 If you have a sample of your data, perhaps Huffman codes

I'm looking for libraries/snippets (either in D or similar languages) that perform variable-length encoding of unsigned integers onto a bit-stream. Requirement is that smaller inputs (integer values) should be encoded with equal or fewer bits. This 0 => [0] 1 => [1,0] 2 => [1,1,0] is easy but assumes a too extreme input value distribution. Does anybody have a suggestion for an encoder that is more suitable for real-world values that are, for instance, normally distributed?

Nov 12 2016

On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:Does anybody have a suggestion for an encoder that is more suitable for real-world values that are, for instance, normally distributed?Doh, forget the normal distribution thing here. It, of course, doesn't work with unsigned integers.

Nov 12 2016

On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:Does anybody have a suggestion for an encoder that is more suitable for real-world values that are, for instance, normally distributed?I don't recall the name, but there is an algorithm for encoding data of an arbitrary number of bytes/bits into a stream of octets. It reserves the MSB of each octet for use as a marker. If it is set to 1, then there are yet more bits to read. If it is set to 0, then this is the last group of bits. This enables each octet to carry 7 bits at a time, while allowing you to encode data of any bit size into an octet stream. You just need to break your data down into groups of 7 bits.

Nov 12 2016

On Saturday, 12 November 2016 at 19:19:58 UTC, James Buren wrote:On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:Maybe the LEB128 encoding [1] ? [1] https://en.wikipedia.org/wiki/LEB128Does anybody have a suggestion for an encoder that is more suitable for real-world values that are, for instance, normally distributed?I don't recall the name, but there is an algorithm for encoding data of an arbitrary number of bytes/bits into a stream of octets. It reserves the MSB of each octet for use as a marker. If it is set to 1, then there are yet more bits to read. If it is set to 0, then this is the last group of bits. This enables each octet to carry 7 bits at a time, while allowing you to encode data of any bit size into an octet stream. You just need to break your data down into groups of 7 bits.

Nov 13 2016

On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:0 => [0] 1 => [1,0] 2 => [1,1,0] is easy but assumes a too extreme input value distribution. Does anybody have a suggestion for an encoder that is more suitable for real-world values that are, for instance, normally distributed?Hmm off hand I'm recalling there's a different encoding that uses 0's to determine how long the encoding is, and then the first 1 breaks it off to the actual encoding. Great for short ints. 1 = 0 010 = 1 011 = 2 00101 = 4 000011111 = 30 https://en.wikipedia.org/wiki/Exponential-Golomb_coding If you know the range of the data, I'd almost be more inclined to do range encoding mixed with arithmetic coding.

Nov 12 2016

On Saturday, 12 November 2016 at 19:13:13 UTC, Nordlöw wrote:I'm looking for libraries/snippets (either in D or similar languages) that perform variable-length encoding of unsigned integers onto a bit-stream. Requirement is that smaller inputs (integer values) should be encoded with equal or fewer bits. This 0 => [0] 1 => [1,0] 2 => [1,1,0] is easy but assumes a too extreme input value distribution. Does anybody have a suggestion for an encoder that is more suitable for real-world values that are, for instance, normally distributed?If you have a sample of your data, perhaps Huffman codes (https://en.wikipedia.org/wiki/Huffman_coding) might be an option?

Nov 14 2016