Coding Delay

Coordinator
Aug 2, 2007 at 4:23 AM
Edited Aug 2, 2007 at 4:27 AM
I have been working on a major project at my current job. It's a new compression algorithm that supports compressing documents with mostly similar information into one archive file. This kind of compression does not exist, and to prove it, take two 1 MB text files that are exactly the same (except for the file name) and compress one of them into a zip file. Take a look at the size of the zip file, then compress the second, identical file into the same archive. Notice the zip file is now double the size! But why?
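If you want to try this yourself, here is a minimal sketch of the experiment, assuming modern .NET's System.IO.Compression ZipFile API and made-up file names (doc1.txt, doc2.txt, one.zip, two.zip):

using System;
using System.IO;
using System.IO.Compression;
using System.Linq;

class ZipDoubling
{
    static void Main()
    {
        // Two identical ~1 MB text files, differing only in name.
        string payload = string.Concat(Enumerable.Repeat("mostly similar document text line\n", 30000));
        File.WriteAllText("doc1.txt", payload);
        File.WriteAllText("doc2.txt", payload);

        // Archive containing only the first file.
        using (var zip = ZipFile.Open("one.zip", ZipArchiveMode.Create))
            zip.CreateEntryFromFile("doc1.txt", "doc1.txt");

        // Archive containing both files.
        using (var zip = ZipFile.Open("two.zip", ZipArchiveMode.Create))
        {
            zip.CreateEntryFromFile("doc1.txt", "doc1.txt");
            zip.CreateEntryFromFile("doc2.txt", "doc2.txt");
        }

        // two.zip comes out roughly double one.zip: zip deflates each
        // entry independently, so the second, identical file gains
        // nothing from the first one already being in the archive.
        Console.WriteLine("one.zip: " + new FileInfo("one.zip").Length + " bytes");
        Console.WriteLine("two.zip: " + new FileInfo("two.zip").Length + " bytes");
    }
}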

You will get the same doubled size if you compress both files into one zip file, and the same goes for any other archival compression system.

However, the problem is not only files that are exactly the same. The problem is when the second file has just a few bytes that are different: when you have millions of files, each only about 6 KB different from the others, you can see there is a lot of wasted space.
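To give a rough idea of why storing only the differences helps, here is a naive byte-level delta sketch. This is just an illustration, not the actual algorithm; it assumes the two files (doc1.txt and a lightly edited doc2.txt) have the same length and differ in only a few spots:

using System;
using System.Collections.Generic;
using System.IO;

class NaiveDelta
{
    // Record only the runs of bytes where the new file differs from the base.
    // Assumes both byte arrays have the same length.
    static List<(int Offset, byte[] Bytes)> Diff(byte[] baseData, byte[] newData)
    {
        var runs = new List<(int, byte[])>();
        int i = 0;
        while (i < newData.Length)
        {
            if (newData[i] == baseData[i]) { i++; continue; }
            int start = i;
            while (i < newData.Length && newData[i] != baseData[i]) i++;
            runs.Add((start, newData[start..i]));
        }
        return runs;
    }

    static void Main()
    {
        byte[] a = File.ReadAllBytes("doc1.txt");
        byte[] b = File.ReadAllBytes("doc2.txt"); // same file with a few edits

        var delta = Diff(a, b);
        int deltaBytes = 0;
        foreach (var (_, bytes) in delta) deltaBytes += bytes.Length;

        // Instead of storing another ~1 MB copy, an archive built this way
        // only needs the base file once plus a few kilobytes of differences.
        Console.WriteLine(delta.Count + " changed runs, " + deltaBytes + " bytes of delta");
    }
}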

That's where my new cool algorithm comes in. ;)
I'm doing it in C#, so I am happy about that. I have a deadline of Aug 14, so I have put all of my programming efforts there for now.

I will resume work here after my project at work is finished.

Cheers

Mr Dee