Qbasicnews.com

QBasic => QB Discussion & Programming Help => Topic started by: R@dioman on May 28, 2002, 03:33:53 AM



Title: Does anybody know anything about compression?
Post by: R@dioman on May 28, 2002, 03:33:53 AM
Not just algorithms, but does anyone actually know the math behind compressing information? I tried, and only managed to compress a few digits using an elaborate method based on a base-36 number system (involving Greek letters, our letters and our numbers). I would like to get into that. Is there a really good tutorial out there? Let me know  :)


Title: Does anybody know anything about compression?
Post by: wizardlife on May 28, 2002, 11:51:52 AM
The basic principle is that you assign single characters to recurring sequences that come up in the file a lot. I think I could set it up, but I'm not sure if it would be that great, or that fast.
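Something along those lines, as a rough QBasic sketch (the test string and the CHR$(1) marker byte are just placeholders, nothing like a real compressor): find the most common two-character pair and swap every occurrence for a single unused byte.
Code:

' Count every two-character pair and remember the most common one.
text$ = "the theory of the thing"
best$ = "": bestCount = 0
FOR i = 1 TO LEN(text$) - 1
    pair$ = MID$(text$, i, 2)
    n = 0
    FOR j = 1 TO LEN(text$) - 1
        IF MID$(text$, j, 2) = pair$ THEN n = n + 1
    NEXT j
    IF n > bestCount THEN bestCount = n: best$ = pair$
NEXT i
' Replace every occurrence of the winning pair with the byte CHR$(1).
out$ = ""
i = 1
DO WHILE i <= LEN(text$)
    IF MID$(text$, i, 2) = best$ THEN
        out$ = out$ + CHR$(1): i = i + 2
    ELSE
        out$ = out$ + MID$(text$, i, 1): i = i + 1
    END IF
LOOP
PRINT "Replaced '"; best$; "'"; bestCount; "times; length"; LEN(text$); "->"; LEN(out$)
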


Title: www.google.com
Post by: Agamemnus on January 28, 2003, 08:23:13 PM
data+compression+methods

 :bounce:


Title: Does anybody know anything about compression?
Post by: relsoft on January 29, 2003, 12:38:42 AM
RLE = Just about anybody can....

LZW = Antoni Gual/Rich GelDreich


Title: Does anybody know anything about compression?
Post by: wizardlife on January 29, 2003, 01:59:23 AM
Quote from: "relsoft"
RLE =Just about Anybody can....


lol. Depends what you're compressing. You won't get very far compressing a novel that way, but it works for images...
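For the record, here's a bare-bones RLE sketch in QBasic (the input string is made up, and it assumes no run is longer than 255 bytes): every run of identical bytes becomes a (count, value) pair, which is why it loves flat images and does little for English text.
Code:

' Run-length encode a string: each run becomes CHR$(count) + character.
src$ = "aaaaabbbcccccccd"
out$ = ""
i = 1
DO WHILE i <= LEN(src$)
    c$ = MID$(src$, i, 1)
    runLen = 1
    ' extend the run while the next byte matches (MID$ past the end returns "")
    DO WHILE MID$(src$, i + runLen, 1) = c$ AND runLen < 255
        runLen = runLen + 1
    LOOP
    out$ = out$ + CHR$(runLen) + c$
    i = i + runLen
LOOP
PRINT LEN(src$); "bytes in,"; LEN(out$); "bytes out"
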


Title: Does anybody know anything about compression?
Post by: LooseCaboose on January 29, 2003, 08:07:11 AM
*Digs out notes on compression*

If you just want algorithms then you'll find plenty of examples on the net, with RLE, LZH, gzip, bzip, Huffman encoding, etc. Also look at prefix codes and block codes for simple compression techniques.

The actual theoretical math is (IMHO) rather ugly. To start off with, you have the compression ratio
Code:

ratio = (LengthAfter / LengthBefore) * 100

and compression rate
Code:

rate = ((LengthBefore - LengthAfter) / LengthBefore) * 100


Those two are rather straightforward and are the basis of measurement for any given compression method. There are limits to compression: not all data can be compressed (when all messages are equally likely there is nothing to exploit), and not all data can be compressed equally well.
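As a quick sanity check of the two formulas (the file sizes are just made-up numbers):
Code:

' Hypothetical file: 100000 bytes before compression, 40000 bytes after.
lengthBefore = 100000
lengthAfter = 40000
ratio = (lengthAfter / lengthBefore) * 100                   ' 40 (smaller is better)
rate = ((lengthBefore - lengthAfter) / lengthBefore) * 100   ' 60 (bigger is better)
PRINT "ratio ="; ratio; "%   rate ="; rate; "%"
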

The main mathematical basis for compression is message probabilities and distributions. For example, with the ASCII set you have 256 possible messages; in a novel the message 'e' is far more probable than messages such as 'z' and 'x'. Encoding message 'e' with a short bit string and message 'z' with a longer one is how compression is achieved.
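The first step of any coder like that is simply counting how often each symbol occurs. A small QBasic sketch (NOVEL.TXT is only a placeholder filename):
Code:

' Tally how often each byte value occurs in a file. A Huffman-style
' coder would then give the most frequent bytes the shortest codes.
DIM freq(255) AS LONG
OPEN "NOVEL.TXT" FOR BINARY AS #1
FOR i& = 1 TO LOF(1)
    b$ = INPUT$(1, #1)
    freq(ASC(b$)) = freq(ASC(b$)) + 1
NEXT i&
CLOSE #1
PRINT "'e' occurs"; freq(ASC("e")); "times, 'z' occurs"; freq(ASC("z")); "times"
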

There are formulas for the upper and lower bounds of compression, as well as things like noiseless coding, block code construction, etc. (I won't post them here because they are too complex to write in plain text, unless you have LaTeX).

Do a Google search on Shannon, Kraft and Huffman, who all did large amounts of work on the mathematics and theory behind compression. Hope this helps!


Title: Does anybody know anything about compression?
Post by: Neo on January 29, 2003, 08:44:05 AM
There are different ways of compressing, of which RLE is (I think) the easiest.

A harder one is the RDX compression.


Title: Does anybody know anything about compression?
Post by: toonski84 on January 29, 2003, 08:53:58 AM
rle = representing repeated pixels longer than 3 as 3 characters.

lzw = representing repeated patterns as 2 characters.

huffman/jpg = placing a voodoo doll in the center of a room, lighting a circle of fire around you, and dancing around the room praying to Antoni Gual to send magic JPEG decoding rays from the sky.


Title: Does anybody know anything about compression?
Post by: relsoft on January 29, 2003, 08:57:04 AM
All hail Antoni!!!!!


 :rotfl:


Title: Does anybody know anything about compression?
Post by: Neo on January 29, 2003, 09:11:53 AM
:rotfl:  :bounce:  :king:  :bounce:  :rotfl:


Title: Does anybody know anything about compression?
Post by: LooseCaboose on January 29, 2003, 06:07:30 PM
I always thought Huffman encoding was easy; I had to write a Huffman encoder in Java as a first-year project, and it gives really nice results for the right kinds of data as well. :-)
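The core of it really does fit on one page. A rough QBasic sketch with invented frequencies (not the Java version, obviously): keep merging the two lightest live nodes; each symbol's code length is then its depth in the finished tree.
Code:

' Build a Huffman tree over five symbols with made-up weights.
DIM weight(20) AS LONG, parent(20) AS INTEGER, alive(20) AS INTEGER
nSym = 5
DATA 45, 13, 12, 16, 9
FOR i = 1 TO nSym: READ weight(i): alive(i) = 1: NEXT i
nodes = nSym
FOR m = 1 TO nSym - 1
    ' find the two live nodes with the smallest weights
    a = 0: b = 0
    FOR i = 1 TO nodes
        IF alive(i) THEN
            IF a = 0 OR weight(i) < weight(a) THEN
                b = a: a = i
            ELSEIF b = 0 OR weight(i) < weight(b) THEN
                b = i
            END IF
        END IF
    NEXT i
    ' merge them into a new internal node
    nodes = nodes + 1
    weight(nodes) = weight(a) + weight(b): alive(nodes) = 1
    alive(a) = 0: alive(b) = 0
    parent(a) = nodes: parent(b) = nodes
NEXT m
' a symbol's code length is the number of parent links up to the root
FOR i = 1 TO nSym
    depth = 0: p = i
    DO WHILE parent(p) <> 0
        depth = depth + 1
        p = parent(p)
    LOOP
    PRINT "symbol"; i; "weight"; weight(i); "code length"; depth
NEXT i
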

* digs out notes on JPEG format *

Yeah, that is nasty, even going so far as to have different compression formulas for PAL and NTSC JPEGs. The image is split into two chrominance values and one luminance value (the chrominance can be compressed harder because the eye responds more to luminance) and broken into 8x8 blocks. The blocks are then passed through a Discrete Cosine Transform (DCT) to create a set of 8x8 matrices. Quantization then removes the less important DCT values; each individual application must provide its own quantization tables for the loss/compression tradeoff (these aren't part of the JPEG standard). The differences between the {0, 0} (DC) values of successive blocks are then calculated, reducing most of them to small values.

The 64 elements are then linearized with a zig-zag scan and compressed using RLE (;-)); usually there are lots of zeros in the lower right corner of the matrix, so this compresses quite well. The resulting data is then Huffman encoded for easy transmission. JPEG gives about a 20:1 compression ratio. Running the algorithm backwards will decompress it in about the same time it takes to compress.
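The zig-zag step sounds exotic but it's only a few lines. A rough QBasic sketch (block() would hold the quantized DCT values, here it's just left at zero): walk the anti-diagonals of the 8x8 block, flipping direction on every other one, so the low-frequency values come out first and the zeros bunch up at the end where RLE eats them.
Code:

' Linearize an 8x8 block in JPEG-style zig-zag order.
DIM block(7, 7) AS INTEGER      ' quantized DCT coefficients (placeholder)
DIM zz(63) AS INTEGER           ' zig-zag ordered output
n = 0
FOR d = 0 TO 14                 ' the 15 anti-diagonals of an 8x8 matrix
    FOR i = 0 TO 7
        j = d - i
        IF j >= 0 AND j <= 7 THEN
            IF d MOD 2 = 0 THEN
                zz(n) = block(j, i)   ' even diagonals run bottom-left to top-right
            ELSE
                zz(n) = block(i, j)   ' odd diagonals run top-right to bottom-left
            END IF
            n = n + 1
        END IF
    NEXT i
NEXT d
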

JPEG is a bit tricky, but at least it isn't as bad as MPEG (which is the same only worse) or bzip, which one of my tutors commented on: "The author of bzip compression was clearly on heavy drugs at the time, and I could show you how the compression algorithm works, but you honestly wouldn't believe me" ;-).


Title: Does anybody know anything about compression?
Post by: Antoni Gual on January 29, 2003, 08:23:22 PM
The idea of how the LZW decoder predicts the next character is just brilliant!  :o (a very simple and effective algorithm, but not so easy to grasp).

The Huffman algorithm is easy to understand: very theoretical, not a spark of imagination, just statistics.
JPEG is a funny thing: an algorithm on top of an algorithm on top of... It's difficult to put it all together, and it's harder to guess why all that complication is needed. Because it was designed by a committee?  :???:

Some day I will try progressive JPEGs; they add a new layer of complexity, more fun! I need to master EMS first... :-?


Bzip sounds interesting. Where can I find something about it? :bounce:


Title: Does anybody know anything about compression?
Post by: LooseCaboose on January 29, 2003, 08:53:51 PM
The home page for bzip is at:
http://sources.redhat.com/bzip2/#what-is-bzip2
You can get the freely available source there, which is fully ANSI compliant so it should compile on any platform. Check the further reading section on the help page for information regarding the algorithm. The author comments that bzip doesn't actually present any new ideas in compression, but is more of an engineering of old ideas. I think you would have to use the source code in conjunction with the cited papers in order to get a full grasp of how the compression method works. I haven't actually looked at it myself because I haven't had the time, with so many other projects and issues in real life(tm).


Title: Does anybody know anything about compression?
Post by: red_Marvin on January 30, 2003, 12:24:16 PM
I have worked out an ingenious but bad way of compressing files: more or less, a file is made of ASCII chars, which are values from 0-255. If you then could make a mathematical algorithm to decide what ASCII number the character in the nth position has, something like
Code:

asciiChar=(filePosition*52)^filePosition/LotsOfOtherMathStuff


If you know what I mean

...But the disadvantage of it is that the equation tends to be
bigger than the initial file...

but it would be quite nice :wink: ...


Title: Does anybody know anything about compression?
Post by: Agamemnus on January 30, 2003, 02:16:46 PM
Yep, I thought of the same thing before. A minimal equation to represent your file. That should compress it to the theoretical maximum.

But how does one get such an equation?

Plot the ASCII characters on an X (LOF) / Y (0 to 255) graph..
 
You could use a genetic algorithm to find approximate matches.. but if the approximate match corrupts the whole file, what then?

It has to be perfect.

I suppose you could start with a bunch of *exact* (no decimals for ASCII chars!!) sine/cosine curve approximations... Then you could use some exotic logic system to shorten your N equations to M equations, where M is the theoretical maximum.

You could start by enveloping most of the curves as determined by a density analysis into one sine/cosine curve and then using another analysis make another one, and so forth...

The COS/SIN format can be provably the smallest in existence for the current data when you can't compress the actual SIN/COS data anymore!!!!


Title: Does anybody know anything about compression?
Post by: Antoni Gual on January 30, 2003, 05:41:00 PM
Just found a link to a series of articles on compression
http://www.rebolforces.com/articles/compression1.html
it seems interesting and well explained.


Title: Does anybody know anything about compression?
Post by: Hard Rock on January 30, 2003, 06:47:24 PM
PNG is based somewhat on zip compression, right? Anyway, that's something to look up: loading PNGs. The whole format is open source and comes with libs and stuff, so I guess it's a good start.

The reason I assume PNG uses some form of zip compression is because, well, it needs zlib to run.

And I think zip is LZW, right? And UPX uses a slightly different form of LZW.


Title: Does anybody know anything about compression?
Post by: LooseCaboose on January 30, 2003, 08:49:26 PM
Red_Marvin, your approach may actually work if you try to create the mathematical formula for blocks of data rather than the entire file. For example, if a sequence of bytes in the file were a, b, c, d, e, f, g, ... then you could create a formula for that block and store a block lookup table at the beginning of the compressed file.
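As a toy illustration of that (the input string is invented, and real data is rarely this friendly): scan for stretches where every byte goes up by a constant step, like the a, b, c, d... example, and store each stretch as a tiny (length, first byte, step) "formula" instead of the raw bytes.
Code:

' Encode runs with a constant byte-to-byte step as (length, firstByte, step).
src$ = "abcdefgXYZXYZnopqrst"
out$ = ""
i = 1
DO WHILE i <= LEN(src$)
    IF i < LEN(src$) THEN
        stp = ASC(MID$(src$, i + 1, 1)) - ASC(MID$(src$, i, 1))
    ELSE
        stp = 0
    END IF
    runLen = 1
    DO WHILE i + runLen <= LEN(src$) AND runLen < 255
        IF ASC(MID$(src$, i + runLen, 1)) - ASC(MID$(src$, i + runLen - 1, 1)) <> stp THEN EXIT DO
        runLen = runLen + 1
    LOOP
    out$ = out$ + CHR$(runLen) + MID$(src$, i, 1) + CHR$(stp AND 255)
    i = i + runLen
LOOP
PRINT LEN(src$); "bytes ->"; LEN(out$); "bytes"
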

The problem is that not all data can be represented by a mathematical formula (completely random data being the prime example), and it would be computationally expensive for a compression algorithm to try to derive formulas from the data: how would the machine know how big to make the blocks, or what the best arrangement is?

Most good compression algorithms (e.g. JPEG and bzip) are hybrids of several compression techniques. A good compression algorithm should also yield good results for most types of data; an algorithm that compresses some data very well but makes random data larger is not ideal.