Cet assombrissement est soumis au nom de Paris.pm canal assombri.

This program perform a Huffman compression to its STDIN or to the files
given on the command-line. One drawback is that I was too lazy to write
the code to output the compressed data of different files to different
output files...

HOW DOES IT WORK?

First $/ and $" are undefined, so that the first use of the diamond operator
slurps the whole file, and that the double-quoting of array(slice)s don't
result in unwanted spaces. map is used instead of for because I like returning
lists in a void context. Furthermore, it saves a few ()'s.

Stphane Payrard (of OPC3 fame) teached me that @0 is a perfectly valid
array. That's a good enough reason to use it. The same goes for %0.

Thanks to O::Deparse, I learned a few tricks like writing $a->[1] as $$a[1]...
(Take this as a proof that one can learn something through obfuscation.)
Naturally, I doubt that O::Deparse helped you much, since here it spits out:

Can't call method "sibling" on an undefined value at /usr/lib/perl5/5.6.0/i386-linux/B/Deparse.pm line 257.
CHECK failed--call queue aborted.

I must admit I don't know at all how it is possible. But, hey that's on more
hurdle on your way to understanding this program.

So, %0 holds the count for each character (the character is the key, the count
is the value). Given the character, we create a array containing one array by
character, containg: one sub-tree of the Huffman tree (at first, it's only
a leaf (i.e. the character itself)), its weight, and a string containing all
the sub-tree leaves.

THE HUFFMAN TREE

The Huffman algorithm pairs the sub-trees by weight. As long as there are
more than one sub-tree, pairs the lightest two into one sub-tree. I chose
to have the lightest branches on the left (bit 0).

            45           For a file containing: 27 A
            /\                                  15 B
          18  A                                  3 C
          /\                                     1 D
         3  B                                    1 E
        /\                                       1 F
       C  4                                      1 G
         /\
        2  2             the resulting tree is (with the weight of each
       /\  /\            sub-tree noted at its root) drawed on the left.
      D  EF  G

This tree is encoded at the beginning of the file in the following manner:

            01           Two bits are used by node (in the previous example,
            /\           there are 6 nodes and 7 leaves), one for each branch.
          01  A
          /\             0 means non-terminal node and 1 means terminal node
        10  B            (i.e. there is a leaf hanging there).
        /\
       C 00              Our tree looks like the one on the left.
         /\
       11  11            The recursive procedure that builds the tree
       /\  /\            represents the data in the following manner:
      D  EF  G           [left-leaf][sub-tree][right-leaf]

Applying this procedure (named _) recursively, we obtain: 010110C0011DE11FGBA
(every character being replaced with its bit representation).

SO IT'S JUST A REGEX?

While we build the tree (line 2 and 3), we also create a HASH %t which keys
are the characters in the file to be compressed and which values are the bit
string (the path in the tree). This path is constructed as we creare the
tree itself. $; and $/ are used to create the needed 0's and 1's... $/ equals
2 (which modulus helps), and $; is first undefined and is an even number each
time we need it again in our map.

Finally, we perform a substitution on the data, replacing each character by
the bit string that represents the path in the tree. The compressed file is
the concatenation of the length of the data (since pack will pad the file with
null bits, we need this information for the decompressor), the dictionary
(which is self-explaining: no need for its size), and the data itself.

FINAL NOTE

This program was designed to have as little distinct characters as possible,
and also so as some character appear really a lot. It was too difficult
for me to write it so that the respective probability of each character
would be a negative power of 2 (1/2**$k if you like), which is when Huffman
encoding performs at its best. My goal was to write a compressor/decompressor
all in one obfuscation. Alas, Huffman doesn't perform well enough (it's that
lousy dictionary's fault!). OK, next time I'll try with Lempel-Ziv...	
