Capitalware has an MQ solution called MQ Message Replication (MQMR).
MQ Message Replication clones messages as they are written (via MQPUT or MQPUT1 API calls) to an application’s output queue, and writes the exact same messages to ‘n’ target queues (‘n’ can be up to 100). When MQMR replicates a message, both the message data and the message’s MQMD structure are cloned. This means that the fields of the MQMD structure (i.e. PutTime, MessageId, CorrelId, UserId, etc.) will be exactly the same as in the original message’s MQMD structure.
MQMR includes two auxiliary programs:
- The MQ Queue To SQLite DB (MQ2SDB) program offloads MQ messages to an SQLite database.
- The SQLite DB To MQ Queue (SDB2MQ) program loads SQLite database rows into messages on an MQ queue.
The SQLite databases created by the MQ2SDB program can grow to be extremely large when thousands or tens of thousands of messages are offloaded to them. A quick solution would be to run a nightly job that compresses/zips the previous day’s SQLite databases to free up disk space, or to move the SQLite databases to a different file system.
I had a thought: why not add an option to the MQ2SDB program to compress the message data before it is written to the SQLite database, and add code to the SDB2MQ program to decompress the data when it is put to a queue?
I did a fair amount of research, and compression algorithms are almost as complex as encryption algorithms. Compression algorithms are, however, far more dependent on the data than encryption algorithms: the type and structure of the data dictate how well and how fast a compression algorithm will perform.
I decided it was best to add a variety of lossless compression algorithms, so that end-users can select the compression algorithm that best fits their data.
The MQ2SDB program supports the following 8 lossless compression algorithms:
- LZ1 (aka LZ77) – I used Andy Herbert’s modified version with a pointer length bit-width of 5.
- LZ4 – It is promoted as extremely fast (which it is).
- LZW – I used Michael Dipperstein’s implementation of Lempel-Ziv-Welch.
- LZMA Fast – I used the LZMA SDK from 7-Zip with a Level set to 4.
- LZMA Best – I used the LZMA SDK from 7-Zip with a Level set to 5.
- RLE – Run Length Encoding – I wrote the code from pseudo code – very basic stuff.
- ZLIB Fast – I used Rich Geldreich’s miniz implementation of ZLIB with a Level of Z_BEST_SPEED.
- ZLIB Best – I used Rich Geldreich’s miniz implementation of ZLIB with a Level of Z_BEST_COMPRESSION.
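The “Fast” vs “Best” trade-off behind the LZMA and ZLIB pairs above is easy to demonstrate with the standard Python zlib and lzma modules (the same underlying algorithms, though not the exact miniz/LZMA SDK code used in MQ2SDB; the sample payload and preset values here are illustrative):

```python
import zlib
import lzma

# Sample payload with lots of repetition, like typical XML message data
data = b"<order><item>widget</item><qty>10</qty></order>" * 2000

# ZLIB at Z_BEST_SPEED (level 1) vs Z_BEST_COMPRESSION (level 9)
zlib_fast = zlib.compress(data, 1)
zlib_best = zlib.compress(data, 9)

# LZMA at a low and a high preset, analogous to "Fast" and "Best"
lzma_fast = lzma.compress(data, preset=1)
lzma_best = lzma.compress(data, preset=9)

for name, blob in [("ZLIB Fast", zlib_fast), ("ZLIB Best", zlib_best),
                   ("LZMA Fast", lzma_fast), ("LZMA Best", lzma_best)]:
    print(f"{name}: {len(data)} -> {len(blob)} bytes "
          f"({len(data) / len(blob):.2f} to 1)")

# Round-trip check: decompression must restore the original message exactly
assert zlib.decompress(zlib_best) == data
assert lzma.decompress(lzma_best) == data
```

The “Best” variants spend more CPU time searching for matches in exchange for a smaller output; whether that trade is worth it depends entirely on the data and on how latency-sensitive the application is.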
So, how do you know which compression algorithm is best for the end-user’s data? Well, to take the guesswork out of it, I wrote a simple program called TESTCMPRSN. It applies all 8 compression algorithms to a file and displays the results.
Here’s an example of the TESTCMPRSN program being run against a 2.89MB XML file:
C:\test>testcmprsn.exe msg5.xml
testcmprsn version 0.0.1 (Windows64) {Sep 2 2020}

msg5.xml size is 3034652 (2.89MB)

Time taken to perform memcpy() is 1.0757ms

Algorithm    Compressed Size      Compression    Compression    Decompression
                                  Time in ms     Ratio          Time in ms
LZ1          375173 (366.38KB)    541.6782        8.09 to 1      5.6972
LZ4          140692 (137.39KB)      4.9557       21.57 to 1      1.3401
LZMA Fast     75967 (74.19KB)      49.4750       39.95 to 1     10.7603
LZMA Best     71453 (69.78KB)     463.8315       42.47 to 1     10.7566
LZW          186484 (182.11KB)     76.0163       16.27 to 1     19.8878
RLE          4054366 (3.87MB)       8.1609        0.75 to 1      9.4421
ZLIB Fast    151404 (147.86KB)     15.3561       20.04 to 1      6.8379
ZLIB Best     84565 (82.58KB)      60.6147       35.89 to 1      6.0363

testcmprsn is ending.
Clearly, LZMA Best crushed it. It reduced a 2.89MB file to just 69.78KB, but at a cost of 463.8315 milliseconds. A better option for that type of data is LZMA Fast, but if speed is what you want, then LZ4 is by far the better choice.
As a baseline, the TESTCMPRSN program performs a memcpy() of the data, so that the end-user can compare each compression algorithm’s compression time against the memcpy() time.
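The same measurement approach can be sketched in a few lines. This is not the TESTCMPRSN code itself, just an illustration of the technique using Python’s stdlib codecs: time a plain memory copy as the baseline, then time each algorithm’s compress and decompress and compute the ratio:

```python
import time
import zlib
import lzma

def time_ms(fn, *args):
    """Run fn once and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000.0

data = b"<msg>" + b"payload " * 50_000 + b"</msg>"

# Baseline: a plain in-memory copy, analogous to TESTCMPRSN's memcpy() timing
_, copy_ms = time_ms(bytearray, data)
print(f"memory copy: {copy_ms:.4f} ms")

for name, comp, decomp in [("ZLIB", zlib.compress, zlib.decompress),
                           ("LZMA", lzma.compress, lzma.decompress)]:
    blob, c_ms = time_ms(comp, data)
    plain, d_ms = time_ms(decomp, blob)
    assert plain == data  # sanity check: the round trip must be lossless
    print(f"{name}: {len(blob)} bytes, {len(data) / len(blob):.2f} to 1, "
          f"compress {c_ms:.4f} ms, decompress {d_ms:.4f} ms")
```

Even this toy version makes the point of the memcpy() baseline: it shows how many multiples of a raw copy each algorithm costs, which is the number that matters when deciding whether compression is affordable on the message path.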
As they say: your mileage will vary. The only way to know which compression algorithm will work best for your data is to test it. Note: RLE should only be used with alphanumeric data (plain text) that has repeating characters and never with binary data.
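The RLE caveat is easy to demonstrate. A simple byte-oriented run-length encoder (a minimal sketch, not the code in MQ2SDB) stores each run as a (count, byte) pair, so text with long runs shrinks dramatically while data with no repeats doubles in size, just as the 0.75-to-1 ratio in the XML test above suggests:

```python
def rle_encode(data: bytes) -> bytes:
    """Encode as (count, byte) pairs; run lengths are capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes([run, data[i]])
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> bytes:
    """Expand each (count, byte) pair back into a run of bytes."""
    out = bytearray()
    for i in range(0, len(data), 2):
        out += bytes([data[i + 1]]) * data[i]
    return bytes(out)

runs = b"AAAAAAAAAABBBBBBBBBB"   # repetitive text: 20 bytes -> 4 bytes
binary = bytes(range(256))       # no repeated bytes: 256 bytes -> 512 bytes

assert rle_decode(rle_encode(runs)) == runs
assert rle_decode(rle_encode(binary)) == binary
print(len(runs), "->", len(rle_encode(runs)))
print(len(binary), "->", len(rle_encode(binary)))
```

This is why RLE belongs only on plain text with repeating characters: on binary data with few or no runs, every input byte costs two output bytes.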
I have completed a wide variety of tests and everything looks good.
If anyone would like to test out the latest release, then send an email to support@capitalware.com
Regards,
Roger Lacroix
Capitalware Inc.