digitalmars.D.announce - Ecoji-d v1.0.0 is released - Base1024 using emojis

Anton Fediushin <fediushin.anton yandex.ru> writes:
🖖, I'm glad to announce that ecoji-d, a pure D implementation of the 
ecoji encoding, version 1️⃣.0️⃣.0️⃣, is finally released❗

What is ecoji?

Ecoji encodes data as base1024 with an emoji character set. It 
can be used instead of boring and old base64 🤮🤮🤮.
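
A minimal sketch of the base1024 idea, for illustration only: 1024 = 2^10, so each emoji carries 10 bits, and 5 input bytes (40 bits) map to exactly 4 emojis. The helper names and the big-endian bit order are assumptions made for this sketch; the actual emoji table and the padding rules for inputs that are not a multiple of 5 bytes are left out.

```python
def ten_bit_groups(chunk: bytes) -> list[int]:
    """Split exactly 5 bytes (40 bits) into four 10-bit indices (0..1023).

    Hypothetical helper: big-endian bit order is an assumption of this sketch.
    """
    if len(chunk) != 5:
        raise ValueError("this sketch only handles full 5-byte chunks")
    value = int.from_bytes(chunk, "big")
    # Take the four 10-bit slices from most significant to least significant.
    return [(value >> shift) & 0x3FF for shift in (30, 20, 10, 0)]


def from_ten_bit_groups(groups: list[int]) -> bytes:
    """Inverse of ten_bit_groups: pack four 10-bit indices back into 5 bytes."""
    if len(groups) != 4 or any(not 0 <= g < 1024 for g in groups):
        raise ValueError("expected four indices in the range 0..1023")
    value = 0
    for g in groups:
        value = (value << 10) | g
    return value.to_bytes(5, "big")
```

An encoder would then look each index up in a 1024-entry emoji table, and a decoder would do the reverse lookup before calling the inverse function.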

Encoding example:

---
$ echo "Base64 is so 1999, isn't there something better?" | 
ecoji-d
๐Ÿ—๐Ÿ“ฉ๐ŸŽฆ๐Ÿ‡๐ŸŽ›๐Ÿ“˜๐Ÿ”ฏ๐Ÿšœ๐Ÿ’ž๐Ÿ˜ฝ๐Ÿ†–๐ŸŠ๐ŸŽฑ๐Ÿฅ๐Ÿš„๐ŸŒฑ๐Ÿ’ž๐Ÿ˜ญ๐Ÿ’ฎ๐Ÿ‡ต๐Ÿ’ข๐Ÿ•ฅ๐Ÿญ๐Ÿ”ธ๐Ÿ‰๐Ÿšฒ๐Ÿฆ‘๐Ÿถ๐Ÿ’ข๐Ÿ•ฅ๐Ÿ”ฎ๐Ÿ”บ๐Ÿ‰๐Ÿ“ธ๐Ÿฎ๐ŸŒผ๐Ÿ‘ฆ๐ŸšŸ๐Ÿฅด๐Ÿ“‘
---

And decoding:

---
$ echo -n "๐Ÿ—๐Ÿ“ฉ๐ŸŽฆ๐Ÿ‡๐ŸŽ›๐Ÿ“˜๐Ÿ”ฏ๐Ÿšœ๐Ÿ’ž๐Ÿ˜ฝ๐Ÿ†–๐ŸŠ๐ŸŽฑ๐Ÿฅ๐Ÿš„๐ŸŒฑ๐Ÿ’ž๐Ÿ˜ญ๐Ÿ’ฎ๐Ÿ‡ต๐Ÿ’ข๐Ÿ•ฅ๐Ÿญ๐Ÿ”ธ๐Ÿ‰๐Ÿšฒ๐Ÿฆ‘๐Ÿถ๐Ÿ’ข๐Ÿ•ฅ๐Ÿ”ฎ๐Ÿ”บ๐Ÿ‰๐Ÿ“ธ๐Ÿฎ๐
Œผ๐Ÿ‘ฆ๐ŸšŸ๐Ÿฅด๐Ÿ“‘" | ecoji-d -d
Base64 is so 1999, isn't there something better?
---


Ecoji-d's features:

     โœ”๏ธ Range interface
     โœ”๏ธ Lazy encoding/decoding
     โœ”๏ธ Low memory usage
     โœ”๏ธ  safe and pure when possible
     โœ”๏ธ Many tests
     โœ”๏ธ Can be used as a library and as a CLI utility


The API consists of just 2️⃣ functions:

     👉 `encode`, which does encoding
     👉 `decode`, which does decoding


Links:

     📦 DUB package page: http://code.dlang.org/packages/ecoji-d
     👍 GitHub repository: https://github.com/ohdatboi/ecoji-d
     🤟 GitHub repository of the reference Go implementation: 
https://github.com/keith-turner/ecoji
Mar 14
bauss <jj_1337 live.dk> writes:
On Wednesday, 14 March 2018 at 17:30:18 UTC, Anton Fediushin 
wrote:
 🖖, I'm glad to announce that ecoji-d, a pure D implementation of the 
 ecoji encoding, version 1️⃣.0️⃣.0️⃣, is finally released❗

 [...]
Fun, but seems pretty useless in practice.
Mar 15
Anton Fediushin <fediushin.anton yandex.ru> writes:
On Thursday, 15 March 2018 at 09:32:50 UTC, bauss wrote:
 Fun, but seems pretty useless in practice.
I disagree. Ecoji (base1024) has a bigger character set, meaning that it can encode more information per emoji than base64 can encode per character. For example, ecoji-encoded "abcde" looks like this: "👖📸🎦🌭"
And the base64-encoded one looks like this: "YWJjZGU=".

Even though each emoji is 4 bytes long, there is a noticeable difference in size when we are talking about larger chunks of data:

---
$ dd if=/dev/urandom bs=4K count=16K of=test.raw
16384+0 records in
16384+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 1.90423 s, 35.2 MB/s
$ dd if=test.raw | ./ecoji-d | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 6.7699 s, 9.9 MB/s
71591534 # Size increased just by 6%
$ dd if=test.raw | base64 | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 0.750174 s, 89.5 MB/s
90655837 # 35%(!) increase in size
---

And if we move to real-world scenarios, where web pages are gzip'ped most of the time:

---
$ dd if=test.raw | gzip -c | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 5.49022 s, 12.2 MB/s
67119122 # Raw files are terrible for compression
$ dd if=test.raw | ./ecoji-d | gzip -c | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 27.9972 s, 2.4 MB/s
32178275 # 48% improvement
$ dd if=test.raw | base64 | gzip -c | wc -c
67108864 bytes (67 MB, 64 MiB) copied, 10.3381 s, 6.5 MB/s
68892893 # Pretty bad, yeah
---

So yeah, ecoji is better than base64 in everything but speed. Speed will be improved. Later.
Mar 15
bauss <jj_1337 live.dk> writes:
On Thursday, 15 March 2018 at 18:45:51 UTC, Anton Fediushin wrote:
 [...]
 So yeah, ecoji is better than base64 in everything but speed. Speed 
 will be improved. Later.
If you care about the size of your data, then you're not going to encode it anyway. The same goes for speed.

Besides, your encoding isn't going to work with actual web pages anyway, because your encoder doesn't have browser support.

Sure, you can encode your data and gzip it, but once it reaches the browser and the browser unzips it, then what? The browser doesn't know what to do with the data. You can't even use base64 for HTTP headers.

At most it could be used for email clients, since they do support "Content-Transfer-Encoding", but browsers don't. They only support "Content-Encoding", which at most can be a compression such as gzip.
Mar 16
Anton Fediushin <fediushin.anton yandex.ru> writes:
On Friday, 16 March 2018 at 08:25:30 UTC, bauss wrote:
 Besides your encoding isn't going to work with actual web-pages 
 anyway, because your encoder doesn't have browser support.
Well, the encoding is not *mine*, only the D implementation is. What do you mean by "browser support"? Indeed, ecoji-d cannot be used on the client side, but since the algorithm is simple and the code is publicly available, anyone can implement decoding in JavaScript or any other language.
 Sure you can encode your data and gzip it, but once it reaches 
 the browser and it unzips it, then what? The browser doesn't 
 know what to do with the data. You can't even use base64 for 
 http headers.
Then you use a client-side decoder, of course!
Mar 18
bauss <jj_1337 live.dk> writes:
On Sunday, 18 March 2018 at 12:51:23 UTC, Anton Fediushin wrote:
 On Friday, 16 March 2018 at 08:25:30 UTC, bauss wrote:
 Besides your encoding isn't going to work with actual 
 web-pages anyway, because your encoder doesn't have browser 
 support.
 Well, the encoding is not *mine*, only the D implementation is. What do you mean by "browser support"? Indeed, ecoji-d cannot be used on the client side, but since the algorithm is simple and the code is publicly available, anyone can implement decoding in JavaScript or any other language.
Yes, but that makes your example pointless, because decoding in JavaScript is not something that anybody in their right mind would ever do with a webpage or anything like that anyway.
Mar 18
Rainer Schuetze <r.sagitario gmx.de> writes:
On 15/03/2018 19:45, Anton Fediushin wrote:
 $ dd if=test.raw | ./ecoji-d | gzip -c | wc -c
 67108864 bytes (67 MB, 64 MiB) copied, 27.9972 s, 2.4 MB/s
 32178275 # 48% improvement
If you can compress random data to 52% of the original data, you should repeat this step until there is a single byte left.
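
The point can be checked with a small sketch. It uses Python's zlib, which implements the same DEFLATE algorithm that gzip uses; the helper name is made up for illustration, and the exact ratios will vary from run to run.

```python
import os
import zlib


def compressed_ratio(data: bytes, level: int = 9) -> float:
    """Return compressed size divided by original size (DEFLATE via zlib)."""
    return len(zlib.compress(data, level)) / len(data)


# Random bytes: the ratio stays at or slightly above 1.0, because there is
# no redundancy for the compressor to exploit.
random_ratio = compressed_ratio(os.urandom(1 << 20))

# Highly repetitive bytes: the ratio drops to a tiny fraction.
repetitive_ratio = compressed_ratio(b"abcde" * (1 << 18))

print(f"random:     {random_ratio:.4f}")
print(f"repetitive: {repetitive_ratio:.4f}")
```

A lossless pipeline that reports random input shrinking by half is therefore a strong hint that some of the data went missing along the way.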
Mar 16
Manu <turkeyman gmail.com> writes:
On 15 March 2018 at 11:45, Anton Fediushin via Digitalmars-d-announce <
digitalmars-d-announce puremagic.com> wrote:

 Even though each emoji is 4 bytes long, there is a noticeable difference in
 size when we are talking about larger chunks of data:
This doesn't make sense. For every 10 bits, you're emitting 32 bits... you're more than tripling the size of the data. Base64 takes 6 bits and emits 8 bits, which is a third larger. 1.333x is smaller than 3.2x. O_o
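
A rough sketch of that arithmetic, applied to the 67108864-byte test file from the thread. The function names are hypothetical, the 4-bytes-per-emoji figure is taken from the post being quoted, padding of a partial final ecoji group to 4 emojis is an assumption, and any line wrapping in the ecoji output is ignored. The base64 estimate assumes one newline per 76 output characters, which is the default wrapping of GNU base64.

```python
EMOJI_UTF8_BYTES = 4  # per the thread; assumed to hold for every emoji in the alphabet


def ecoji_size(n: int) -> int:
    """Estimated UTF-8 size of ecoji output for n input bytes.

    5 input bytes -> 4 emojis; a partial last group is assumed to be
    padded to 4 emojis. Line wrapping is not modelled.
    """
    groups = (n + 4) // 5
    return groups * 4 * EMOJI_UTF8_BYTES


def base64_size(n: int, line_width: int = 76) -> int:
    """Size of base64 output for n input bytes, with a newline per line.

    Pass line_width=0 to model unwrapped output.
    """
    chars = ((n + 2) // 3) * 4  # 3 input bytes -> 4 output characters
    newlines = (chars + line_width - 1) // line_width if line_width else 0
    return chars + newlines


n = 67108864
print(ecoji_size(n), ecoji_size(n) / n)    # about 3.2x the input
print(base64_size(n), base64_size(n) / n)  # about 1.35x the input
```

With these assumptions the base64 estimate comes out at exactly the 90655837 bytes reported by `base64 | wc -c` in the quoted post, while the ecoji estimate is roughly three times larger than the 71591534 bytes reported for ecoji-d.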
Mar 17
Cym13 <cpicard openmailbox.org> writes:
On Thursday, 15 March 2018 at 18:45:51 UTC, Anton Fediushin wrote:
 $ dd if=test.raw | gzip -c | wc -c
 67108864 bytes (67 MB, 64 MiB) copied, 5.49022 s, 12.2 MB/s
 67119122 # Raw files are terrible for compression
 $ dd if=test.raw | ./ecoji-d | gzip -c | wc -c
 67108864 bytes (67 MB, 64 MiB) copied, 27.9972 s, 2.4 MB/s
 32178275 # 48% improvement
 $ dd if=test.raw | base64 | gzip -c | wc -c
 67108864 bytes (67 MB, 64 MiB) copied, 10.3381 s, 6.5 MB/s
 68892893 # Pretty bad, yeah
Randomness isn't compressible. The fact that ecoji-d compresses anything above 1% shows only that there is a bug in your library:

```
$ dd if=/dev/urandom bs=4K count=16K of=test.raw
16384+0 records in
16384+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 0.373423 s, 180 MB/s
$ dd if=test.raw | ./ecoji-d | gzip -c | gzip -cd | ./ecoji-d -d > test2.raw
131072+0 records in
131072+0 records out
67108864 bytes (67 MB, 64 MiB) copied, 24.9523 s, 2.7 MB/s
$ wc -c test.raw test2.raw
67108864 test.raw
11185155 test2.raw
```

So definitely not the same files before and after compression/decompression. However the beginning is the same:

```
$ xxd test.raw | head
00000010: a05f c801 bf01 13c1 04a2 556a 6d79 a09c  ._........Ujmy..
00000020: 8032 523e 851d 419a b0d3 0c4f e7ba 93e1  .2R>..A....O....
00000030: 9fdc 7c55 2645 f6e7 3f9e f5db bc92 1e29  ..|U&E..?......)
00000040: 457a a3b9 c274 3b08 6bde 486a 1798 f281  Ez...t;.k.Hj....
00000050: 9d91 e97a f13f db8b 5d0c 114a 27be 2154  ...z.?..]..J'.!T
00000060: a9a2 3a17 36e4 9181 64f2 35b6 aa91 064d  ..:.6...d.5....M
00000070: 863a ddbd 8776 f87d 3eb2 634f 12dc 6e7f  .:...v.}>.cO..n.
00000080: 46c9 bc95 2620 b315 e84d 9ee4 8651 d172  F...& ...M...Q.r
00000090: 836d 7bf8 9e1c 09c3 0e10 b787 7e06 bc39  .m{.........~..9
$ xxd test2.raw | head
00000010: a05f c801 bf01 13c1 04a2 556a 6d79 a09c  ._........Ujmy..
00000020: 8032 523e 851d 419a b0d3 0c4f e7ba 93e1  .2R>..A....O....
00000030: 9fdc 7c55 2645 f6e7 3f9e f5db bc92 1e29  ..|U&E..?......)
00000040: 457a a3b9 c274 3b08 6bde 486a 1798 f281  Ez...t;.k.Hj....
00000050: 9d91 e97a f13f db8b 5d0c 114a 27be 2154  ...z.?..]..J'.!T
00000060: a9a2 3a17 36e4 9181 64f2 35b6 aa91 064d  ..:.6...d.5....M
00000070: 863a ddbd 8776 f87d 3eb2 634f 12dc 6e7f  .:...v.}>.cO..n.
00000080: 46c9 bc95 2620 b315 e84d 9ee4 8651 d172  F...& ...M...Q.r
00000090: 836d 7bf8 9e1c 09c3 0e10 b787 7e06 bc39  .m{.........~..9
```

So I think ecoji-d just truncates its input at some point.
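
For anyone who wants to locate where a round-tripped file starts to differ, a minimal sketch could look like this. The function name is hypothetical and this is not part of ecoji-d; it only reports how many leading bytes two binary streams have in common.

```python
from typing import BinaryIO


def common_prefix_length(a: BinaryIO, b: BinaryIO, block_size: int = 1 << 16) -> int:
    """Return the number of leading bytes that two binary streams share.

    Intended for regular files or in-memory buffers, where read() returns
    full blocks until the end of the data.
    """
    offset = 0
    while True:
        block_a = a.read(block_size)
        block_b = b.read(block_size)
        if block_a != block_b:
            # Walk the differing blocks to find the first mismatching byte,
            # or the end of the shorter block if one stream simply ended.
            for x, y in zip(block_a, block_b):
                if x != y:
                    return offset
                offset += 1
            return offset
        if not block_a:
            # Both streams ended at the same point with identical content.
            return offset
        offset += len(block_a)
```

Opening `test.raw` and `test2.raw` in binary mode and passing both to this function would return the size of `test2.raw` if the output is a pure truncation of the input, and a smaller number if the data is also corrupted before the cut.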
Mar 18
Anton Fediushin <fediushin.anton yandex.ru> writes:
On Sunday, 18 March 2018 at 11:25:45 UTC, Cym13 wrote:
 So I think ecoji-d just truncates its input at some point.
Indeed, there's an error somewhere. For some reason it stops after 7457792 bytes. I'll create an issue for that and look into it later.
Mar 18
Faux Amis <faux amis.com> writes:
On 2018-03-14 18:30, Anton Fediushin wrote:
 🖖, I'm glad to announce that ecoji-d, a pure D implementation of the ecoji 
 encoding, version 1️⃣.0️⃣.0️⃣, is finally released❗
 
 What is ecoji?
 
 Ecoji encodes data as base1024 with an emoji character set. It can be 
 used instead of boring and old base64 🤮🤮🤮.
 
 Encoding example:
 
 ---
 $ echo "Base64 is so 1999, isn't there something better?" | ecoji-d
 ๐Ÿ—๐Ÿ“ฉ๐ŸŽฆ๐Ÿ‡๐ŸŽ›๐Ÿ“˜๐Ÿ”ฏ๐Ÿšœ๐Ÿ’ž๐Ÿ˜ฝ๐Ÿ†–๐ŸŠ๐ŸŽฑ๐Ÿฅ๐Ÿš„๐ŸŒฑ๐Ÿ’ž๐Ÿ˜ญ๐Ÿ’ฎ๐Ÿ‡ต๐Ÿ’ข๐Ÿ•ฅ๐Ÿญ๐Ÿ”ธ๐Ÿ‰๐Ÿšฒ๐Ÿฆ‘๐Ÿถ๐Ÿ’ข๐Ÿ•ฅ๐Ÿ”ฎ๐Ÿ”บ๐Ÿ‰๐Ÿ“ธ๐Ÿฎ
ŸŒผ๐Ÿ‘ฆ๐ŸšŸ๐Ÿฅด๐Ÿ“‘ 
 
Useful feature: Easy manual verification.
Mar 17
Abdulhaq <alynch4047 gmail.com> writes:
On Wednesday, 14 March 2018 at 17:30:18 UTC, Anton Fediushin 
wrote:
 🖖, I'm glad to announce that ecoji-d, a pure D implementation of the 
 ecoji encoding, version 1️⃣.0️⃣.0️⃣, is finally released❗

 [...]
Congratulations, it's a nice bit of fun.
Mar 18