INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL ® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel products are not intended for use in medical, life saving, life sustaining, critical control or safety systems, or in nuclear facility applications. Intel may make changes to specifications and product descriptions at anytime, without notice.
By downloading and installing this sample, you hereby agree that the accompanying Materials are being provided to you under the terms and conditions of the End User License Agreement for the Intel® Integrated Performance Primitives product previously accepted by you. Please refer to the file ippEULA.rtf or ippEULA.txt located in the root directory of your Intel® IPP product installation for more information.
The Data Compression-based IPP_GZIP Sample (IPP_GZIP) illustrates how to implement an effective lossless data compression solution using the Intel® Integrated Performance Primitives 6.0 Data Compression domain API. The sample also shows ways to parallelize a user application with OpenMP and other methods to fully benefit from modern Intel® microprocessor architectures.
The sample uses dictionary-based IPP functions that implement the Lempel-Ziv (LZ77) algorithm, and the original GZIP data formats as specified in RFC 1950, 1951 and 1952.
The resulting compressed data formats are fully compatible with the original GZIP formats, so the utilities are interchangeable. You can compress data with IPP_GZIP and decompress it with GZIP, or, vice versa, compress with GZIP and decompress with IPP_GZIP. However, to benefit from multiprocessor/multicore architectures, use IPP_GZIP.
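The three specifications named above describe nested formats: RFC 1951 is the raw deflate stream, RFC 1950 (zlib) and RFC 1952 (gzip) are containers around it. The following is a minimal sketch of that relationship, assuming Python's standard zlib module as a stand-in for the IPP functions; the helper name compress_as is hypothetical and is not part of the sample.

```python
import zlib

# wbits values understood by zlib for a 32 KB window.
FORMATS = {
    "zlib (RFC 1950)": 15,
    "raw deflate (RFC 1951)": -15,
    "gzip (RFC 1952)": 31,
}


def compress_as(data: bytes, wbits: int, level: int = 6) -> bytes:
    """Deflate `data`; `wbits` selects the container written around it."""
    comp = zlib.compressobj(level, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()


if __name__ == "__main__":
    payload = b"IPP_GZIP sample " * 64
    for name, wbits in FORMATS.items():
        packed = compress_as(payload, wbits)
        # Each format round-trips when decoded with the same wbits value.
        assert zlib.decompress(packed, wbits) == payload
        print(f"{name}: {len(payload)} -> {len(packed)} bytes")
```

Any tool that writes a conforming RFC 1952 stream, as the gzip variant here does, produces data that other gzip-compatible tools can read, which is the interchangeability described above.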
Additional information on this software as well as other Intel software performance products is available at http://developer.intel.com/software/products/.
The IPP_GZIP sample uses the following functions from the Intel® Integrated Performance Primitives Data Compression library:
IPP_GZIP Function | IPP Function | Function Description |
Compression/Test | ippsEncodeLZ77InitAlloc_8u | Allocates memory and initializes the encoding state |
Compression/Test | ippsEncodeLZ77SetStatus_8u | Sets the deflate status to the desired value in the LZ77 encoding state structure |
Compression/Test | ippsEncodeLZ77GetPairs_8u | Retrieves pair data from the LZ77 encoding state |
Compression/Test | ippsEncodeLZ77_8u | Performs LZ77 encoding |
Compression/Test | ippsEncodeLZ77FixedHuff_8u | Performs fixed Huffman encoding |
Compression/Test | ippsEncodeLZ77DynamicHuff_8u | Performs dynamic Huffman encoding |
Compression/Test | ippsEncodeLZ77Flush_8u | Writes the checksum and total length of the input data to the end of the stream |
Compression/Test | ippsLZ77Free_8u | Frees memory allocated for the LZ77 encoding and decoding structures |
Compression/Test | ippsCRC32_8u | Computes the CRC32 checksum for the source data buffer |
Decompression | ippsDecodeLZ77InitAlloc_8u | Allocates memory and initializes the LZ77 decoding structure |
Decompression | ippsDecodeLZ77SetStatus_8u | Sets the inflate status to the desired value in the LZ77 decoding state structure |
Decompression | ippsDecodeLZ77GetBlockType_8u | Determines the type of encoded data block |
Decompression | ippsDecodeLZ77FixedHuffFull_8u | Performs LZ77 and fixed Huffman decoding |
Decompression | ippsDecodeLZ77DynamicHuffFull_8u | Performs LZ77 and dynamic Huffman decoding |
Decompression | ippsDecodeLZ77StoredBlock_8u | Performs stored block (RFC 1951) decoding |
Recommended hardware:
Hardware requirements:
Software requirements:
Note: The IPP_GZIP sample currently cannot be built with Microsoft* Visual Studio 6.0 or Microsoft* Visual Studio 2003, because it uses OpenMP, which is available only in Microsoft* Visual Studio 2005/2008 and the Intel® C++ compilers. Future sample releases will use the Windows thread API and will also build with the VS6.0 and VS2003 compilers.
The Intel® IPP Data Compression IPP_GZIP Sample for Windows contains the following files:
File | Description |
.\ipp-samples | |
ippEULA.rtf or ippEULA.txt | End User License Agreement |
support.txt | Contains technical support information |
.\ipp-samples\data-compression\ipp_gzip\windows | |
Build32.bat | Batch file for building the sample for a Windows system based on the IA-32 architecture |
buildem64t.bat | Batch file for building the sample for a Windows system based on the Intel® 64 architecture |
Makefile | Make file to build the sample application |
readme.htm | This file |
compress.bat, uncompress.bat, zcat.bat, zgrep.bat, zless.bat, zmore.bat | Examples of how to use ipp_gzip in a Unix-like command-line environment |
.\ipp-samples\data-compression\ipp_gzip\src | |
ipp_gzip.c | Sample's main function and upper-level dispatching functions |
ipp_gzip_deflate.c | Single-threaded data compression function and directory processing function |
ipp_gzip_deflate_mt.c | Multi-threaded data compression function and its service functions |
ipp_gzip_inflate.c | Single-threaded data decompression function |
ipp_gzip_inflate_mt.c | Multi-threaded data decompression function and its service functions |
ipp_gzip_io.c | File input/output functions |
ipp_gzip_utils.c | Several common functions |
.\ipp-samples\data-compression\ipp_gzip\include | |
ipp_gzip.h | Common header file for the sample |
version.h | Application version text |
+ Extract all files in w_ipp-samples_*.zip to a desired folder. Make sure the directory structure is preserved.
+ Set up your build environment by creating an environment variable named IPPROOT that points to the root directory of your Intel® IPP installation. For example: C:\Program Files\Intel\IPP\6.0.x.xxx\
+ To build for a system based on the IA-32 architecture, run build32.bat. A successful build creates the following executable file: .\bin\win32\ipp_gzip.exe
+ To build for a system based on the Intel® 64 architecture, run buildem64t.bat. A successful build creates the following executable file: .\bin\winem64t\ipp_gzip.exe
Note that with default settings, the sample is built using the Intel C++ compiler. To change the default settings, see the Makefile for a brief description of the available variables and additional options.
To run this sample, the Intel® IPP 6.0 DLLs must be on the system's path. The DLLs are located in the bin subdirectory of the root directory of the Intel® IPP installation (IPPROOT). This can be done by setting the PATH environment variable manually or by invoking the batch file:
%IPPENV%
When built, the data-compression IPP_GZIP sample contains one executable file: ipp_gzip.exe.
Type the following command at the command prompt: ipp_gzip -V
If the IPP environment is set up correctly, you should see the following version output:
Copyright © 2007-2008 Intel Corporation. All rights reserved.
IPP gzip. Version 0.1
Target architecture: win32
Optimization: Speed
Library: dynamic
ipp_gzip: compressed data not written to a terminal. Use -f to force compression.
For help, type: ipp_gzip -h
To display the help page, run “ipp_gzip -h”:
Usage: ipp_gzip.exe [OPTION]... [FILE]...
Compress or uncompress FILEs (by default, compress FILES in-place)
Mandatory arguments to long options are mandatory for short options too.
  -c, --to-stdout       output to stdout
  -c, --stdout          the same
  -d, --decompress      decompress
  -d, --uncompress      the same
  -f, --force           force output file overwrite
  -h, --help            print this text
  -l, --list            list content of gzip archive
  -n, --no-name         don't save/restore original filename/time
  -N, --name            save/restore original filename/time
  -r, --recursive       recurse into directories
  -S, --suffix          make new suffix instead of .gz
  -t, --test            test archive integrity
  -T, --no-time         don't save/restore file date/timestamp
  -v, --verbose         print some intermediate information
  -V, --version         print current version number
  -1, --fast            use faster compression method
  -9, --best            use best compression method
  -m, --num-threads     DEBUG: set number of threads to create
  -b, --force-dynamic   force dynamic Huffman coding
  -j, --min-size        DEBUG: set minimum input file length to slice
Input/Ouput options:
  -D, --DEBUG           DEBUG mode: prints some debug info
  -y, --read-method     I/O: read method to use
                        -y 0 (low/level)
                        -y 1 (mmap)
  -w, --write-method    I/O: write method to use
                        -w 0 (low/level)
                        -w 1 (mmap)
  -u, --read-buffer     I/O: read buffer size in Kbytes
  -i, --write-buffer    I/O: write buffer size in Kbytes
  -s, --stat            display performance statistics
IPP_GZIP command line syntax is compatible with the original GZIP utility syntax and has the following common format:
> ipp_gzip [option] [option] … [file] [file] …
Option | Description | Usage Example |
-c
(or) --to-stdout (or) --stdout |
Writes output to “stdout”. “Stdout” is not always a terminal: with redirection (“> file”), “stdout” is the specified file; with the pipe operator (“| next_command”), “stdout” is a pipe. IPP_GZIP never writes compressed data to a terminal-like stdout. |
ipp_gzip -c table.txt > mytable.gz
(compresses the table.txt file and redirects “stdout” to the mytable.gz file)
ipp_gzip -c -d mytable.gz
(decompresses the mytable.gz file to the screen)
ipp_gzip -c table.txt | gzip -t -v
(compresses table.txt with IPP_GZIP and tests the compressed data integrity with the original GZIP) |
-d
(or) --decompress (or) --uncompress |
Forces decompression.
By default, when the “-d” option is not specified, the data is compressed. |
ipp_gzip -d table.txt.gz
(decompresses the table.txt.gz file into the table.txt file) |
-f
(or) --force |
Overwrites an existing output file without asking for confirmation.
By default, IPP_GZIP asks for the user's approval before overwriting an existing file with the same name. |
ipp_gzip -d -f table.txt.gz
(decompresses table.txt.gz into the table.txt file and overwrites the existing table.txt file, if any) |
-h
(or) --help |
Prints the list of options | ipp_gzip --help |
-l
(or) --list |
Prints the content of compressed file(s) |
ipp_gzip -l *.gz
(prints the content of all .gz files in current directory) |
-n
(or) --no-name |
Does not save or restore the original file name and date-time stamp.
By default, IPP_GZIP writes the name of the file it compresses into the GZIP header, so that the original file can be restored under its own name even if the archive has been renamed. When the “-n” option is used, IPP_GZIP derives the decompressed file name from the name of the archive. |
ipp_gzip -n table.txt
(compresses the table.txt file, producing a table.txt.gz file with no original name inside) |
-N
(or) --name |
Saves the original file name and date-time stamp (default case) | ipp_gzip -N table.txt |
-r
(or) --recursive |
When IPP_GZIP encounters a directory, it processes all files inside this directory. |
ipp_gzip -r archive
(if “archive” is a directory, compresses all files inside “archive”, including those in nested directories) |
-S .suff
(or) --suffix .suff |
Uses the specified “.suff” suffix instead of the default “.gz”. Note that the dot “.” is NOT added automatically. |
ipp_gzip -S .gzip table.txt
(compresses table.txt into the table.txt.gzip file) |
-t
(or) --test |
Does not decompress the file, but tests its integrity. Reports any inconsistencies found in the compressed file. |
ipp_gzip -t table.txt.gz
|
-v
(or) --verbose |
Prints additional information after the following operations:
- compression (prints the compression percentage)
- decompression (prints the compression percentage)
- list (prints the checksum, date stamp and compression method) |
ipp_gzip -d -v table.txt.gz
(decompresses the table.txt.gz file and prints the name of the restored file and the compression percentage) |
-V
(or) --version |
Prints sample name, copyright information, target architecture, compiler optimization and library linkage. Does nothing extra. | |
-1
(or) --fast |
Uses the fastest compression method. Saves time during compression at the cost of a lower compression ratio (produces larger files) | ipp_gzip --fast table.txt |
-9
(or) --best |
Uses the best compression method. Saves disk space, but loses in compression time (sometimes significantly) | ipp_gzip -9 table.txt |
-m number
(or) --num-threads number |
Sets the number of active threads during multi-threaded operations. See the next section for more details. |
ipp_gzip -m 8 huge_file.txt
(compresses huge_file.txt using 8 threads) |
-j size
(or) --min-size size |
Sets the minimum file length at which parallel compression is used. See the next section for more details. |
ipp_gzip -j 10000 small_file.dat
(if small_file.dat is less than 10000 bytes long, single-threaded compression is used) |
-y number | Selects the method used to read files. "-y 0" forces low-level I/O functions (POSIX read/write). "-y 1" forces memory-mapped file reads. Together with "-w", this option lets you choose the best I/O method for a particular platform. Use the "-s" option to get I/O statistics that help with this choice. | |
-w number | Selects the method used to write files. "-w 0" forces low-level I/O functions. "-w 1" forces memory mapping. | |
-u number
(or) -i number |
These options set the size of the read (-u) or write (-i) buffers used during I/O operations. By default, number=64, that is, 64 KB buffers are used. Larger I/O buffers sometimes improve the throughput of I/O operations. Use the "-s" option to find the best combination of I/O methods and buffer sizes. | |
-s | Prints the number of CPU clocks used per input symbol (that is, per input byte). Lower values mean better performance. | |
The goal of IPP_GZIP is not only to speed up file processing with the IPP Data Compression functions, but also to take full advantage of modern Intel® microprocessor architectures, so it uses multi-threading extensively. How multi-threading is applied depends on the IPP_GZIP operation.
Compression is the most CPU-intensive operation, so it benefits most from multi-threading. There are two ways of using multi-threading: multi-file threading and multi-chunk threading.
Multi-file threading is used when more than one file is specified on the command line. For example, to compress two files on a two-CPU computer (or on a computer with a single Intel® Core™2 Duo processor), the natural decision is to process each file in a separate thread and thus fully benefit from both CPUs. That is what IPP_GZIP does. For example:
> ipp_gzip file1 file2
compresses file1 on one CPU and file2 on the other CPU. If the system has more than two CPUs, the other CPUs are not used. If the number of files specified on the IPP_GZIP command line exceeds the number of available CPUs, the files are distributed among the existing CPUs and processed in parallel. For example, file1 on CPU1, file2 on CPU2, file3 on CPU1, etc.
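Multi-file threading as described above can be sketched with a thread pool whose size is the smaller of the CPU count and the file count. This is an illustrative sketch only: it uses Python's standard gzip module instead of the IPP functions, the names gzip_one and gzip_many are hypothetical, and a pool hands the next file to whichever worker becomes free, which may differ from the fixed file-to-CPU assignment in the example above.

```python
import gzip
import os
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor


def gzip_one(path: str) -> str:
    """Compress one file to `path + '.gz'` (the input is kept in this sketch)."""
    out_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


def gzip_many(paths, n_cpus=None):
    """Compress several files in parallel, one file per worker thread."""
    n_cpus = n_cpus or os.cpu_count() or 1
    # Never start more workers than there are files or CPUs.
    workers = max(1, min(n_cpus, len(paths)))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gzip_one, paths))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        names = []
        for i in range(3):
            name = os.path.join(tmp, f"file{i + 1}")
            with open(name, "wb") as f:
                f.write(b"sample data\n" * 1000)
            names.append(name)
        print(gzip_many(names, n_cpus=2))
```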
Multi-chunk file processing is used when we process a single file on a multi-CPU computer. Thus, on a 4-CPU computer the command line
> ipp_gzip a-very-huge-file.dat
splits the “a-very-huge-file.dat” file into 4 pieces (chunks) and compresses each chunk on a separate CPU, combining the processed data into a single output file, “a-very-huge-file.dat.gz”. The compression ratio in this case is slightly worse than with single-threaded compression, because LZ77 compression relies on previously seen data (history) to compress better and each chunk starts without the history of the preceding data. This overhead (about 1-2%) is the cost of the boosted compression performance (10x-20x faster on 4/8-CPU computers than the original GZIP).
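One generic way to combine independently compressed chunks into a single valid GZIP stream is sketched below. It is an assumption for illustration, not necessarily how IPP_GZIP joins its chunks: every chunk except the last is ended with a sync flush (byte-aligned, no final block), the last chunk is finished normally, and one header and one CRC32/length trailer wrap the result. The function names are hypothetical and Python's zlib stands in for the IPP functions.

```python
import gzip
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor

# Minimal RFC 1952 header: magic, deflate, no flags, no mtime, OS unknown.
GZIP_HEADER = b"\x1f\x8b\x08\x00\x00\x00\x00\x00\x00\xff"


def split_chunks(data: bytes, n: int):
    """Split `data` into at most `n` pieces of (nearly) equal size."""
    size = max(1, -(-len(data) // n))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)] or [b""]


def deflate_chunk(chunk: bytes, last: bool, level: int = 6) -> bytes:
    """Raw-deflate one chunk with its own compressor (no shared history)."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15)
    body = comp.compress(chunk)
    # Only the last chunk may contain the final deflate block.
    return body + comp.flush(zlib.Z_FINISH if last else zlib.Z_SYNC_FLUSH)


def gzip_in_chunks(data: bytes, n_threads: int = 4) -> bytes:
    chunks = split_chunks(data, n_threads)
    is_last = [i == len(chunks) - 1 for i in range(len(chunks))]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = list(pool.map(deflate_chunk, chunks, is_last))
    # Trailer: CRC32 and length (mod 2**32) of the uncompressed data.
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF,
                          len(data) & 0xFFFFFFFF)
    return GZIP_HEADER + b"".join(parts) + trailer


if __name__ == "__main__":
    sample = b"a-very-huge-file " * 100000
    packed = gzip_in_chunks(sample, 4)
    assert gzip.decompress(packed) == sample
    print(len(sample), "->", len(packed), "bytes in 4 chunks")
```

Because each chunk uses a fresh compressor, no chunk can refer back to data in an earlier chunk, which is the source of the small ratio loss mentioned above.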
The “-m” option controls the multi-threaded operations. For example, “-m 2” on a 4-CPU computer limits IPP_GZIP to two threads. Conversely, “-m 4” on a single-CPU computer produces archives as if they were compressed on a 4-core CPU. Forced multi-threading on a single-CPU computer does not speed up the compression, but it produces archives that can be decompressed in parallel on a multi-CPU system.
The “-j size” option controls multi-chunk compression. On a multiprocessor system, if the file to be processed is not big enough, splitting it may slow down the compression rather than speed it up because of thread creation/synchronization overhead. The default minimum file length is 256 KB and is defined by the “#define MIN_LENGTH_TO_SLICE …” value in the “ipp_gzip.h” file.
Decompression is a fairly simple operation that does not require significant CPU resources, so the benefit of multi-threaded decompression is smaller. Decompression is almost a copy operation and is limited almost entirely by input/output system performance. Nevertheless, multi-threaded decompression can be more than 2 times faster than single-threaded decompression.
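The interaction of the “-m” and “-j” options can be summarized as a small decision function. This is a hedged sketch rather than the sample's actual logic: the function name chunk_thread_count is hypothetical, the 256 KB default is taken from the text above, and the rule that the size threshold takes precedence over “-m” is an assumption.

```python
import os

# Default taken from the text; the real value is defined in ipp_gzip.h.
MIN_LENGTH_TO_SLICE = 256 * 1024


def chunk_thread_count(file_size, requested_threads=None,
                       min_size=MIN_LENGTH_TO_SLICE, n_cpus=None):
    """Return how many chunks/threads to use for one input file."""
    if file_size < min_size:
        # Too small: threading overhead would outweigh the gain ("-j").
        return 1
    if requested_threads is not None:
        # Explicit "-m": honored even if it exceeds the CPU count.
        return max(1, requested_threads)
    # No "-m": use every available CPU.
    return max(1, n_cpus or os.cpu_count() or 1)


if __name__ == "__main__":
    print(chunk_thread_count(100_000, n_cpus=4))
    print(chunk_thread_count(1_000_000, n_cpus=4))
    print(chunk_thread_count(1_000_000, requested_threads=2, n_cpus=4))
```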
Multi-file threading is the same as in the case of compression. If more than one file is specified on the IPP_GZIP command line, and the current system has more than one CPU, all files are decompressed in parallel.
Not every compressed file can be decompressed in parallel. Multi-chunk decompression is possible because, during multi-chunk compression, additional information about the chunks (chunk offsets and partial checksums) is written to the comment field of the GZIP file header. This field is ignored by the original GZIP, so its presence or absence does not affect usual GZIP operations. This means that files compressed by IPP_GZIP using multiple chunks can be decompressed by the original GZIP tool.
As in the case of compression, the “-m <number>” option controls the number of processors to be used. To limit decompression to a given number of threads, for example to save some CPU resources for other applications, use the “-m <less_thread_number>” option. If the “-m” option is not specified, IPP_GZIP tries to use as many processors as possible. For example, a file previously compressed on a 4-CPU system using 4 threads is decompressed on a 2-CPU system using both CPUs. If the number of chunks is less than the number of available processors, the remaining processors are not used.
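The role of the header comment field can be illustrated with a short sketch based on the RFC 1952 layout. The comment text used here ("chunks=2;offsets=0,250") is purely hypothetical, since the actual layout of the chunk information written by IPP_GZIP is not described in this document, and the function names are illustrative. Python's gzip module plays the part of a standard decoder that skips the comment.

```python
import gzip
import struct
import zlib

# FLG bits from RFC 1952.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def gzip_with_comment(data: bytes, comment: str) -> bytes:
    """Build a gzip member whose header carries a zero-terminated comment."""
    header = struct.pack("<BBBBIBB", 0x1F, 0x8B, 8, FCOMMENT, 0, 0, 255)
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate body
    body = comp.compress(data) + comp.flush()
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF,
                          len(data) & 0xFFFFFFFF)
    return header + comment.encode("latin-1") + b"\x00" + body + trailer


def read_comment(blob: bytes):
    """Return the header comment of a gzip member, or None if absent."""
    magic1, magic2, method, flags = blob[:4]
    if (magic1, magic2, method) != (0x1F, 0x8B, 8):
        raise ValueError("not a gzip/deflate stream")
    pos = 10  # fixed part of the header
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", blob, pos)
        pos += 2 + xlen
    if flags & FNAME:
        pos = blob.index(b"\x00", pos) + 1
    if not flags & FCOMMENT:
        return None
    end = blob.index(b"\x00", pos)
    return blob[pos:end].decode("latin-1")


if __name__ == "__main__":
    blob = gzip_with_comment(b"hello" * 100, "chunks=2;offsets=0,250")
    print(read_comment(blob))
    # A standard decoder skips the comment and restores the data.
    assert gzip.decompress(blob) == b"hello" * 100
```

A parallel decoder would read the comment first to locate chunk boundaries, while an ordinary decoder treats the same file as a plain GZIP stream.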
To receive support or provide feedback for the Intel® Integrated Performance Primitives Data Compression IPP_GZIP Sample for Windows*, please refer to the "Technical Support and Feedback" section of the release notes (ReleaseNotes.htm) provided in the Intel® IPP product installation. Your feedback on the Intel® IPP 6.0 samples is very important to us and your input will be considered for future releases. The Intel® IPP 6.0 sample code is intended only as an example of how to use the APIs to implement algorithms in different development environments. Please submit problems with installation, compiling, linking, runtime errors or incorrect output to Intel® Premier Support.
You can also share and discuss the experience of IPP sample usage with other developers at Intel Software Developer Forum.
BunnyPeople, Celeron, Celeron Inside, Centrino, Centrino logo, Core Inside, FlashFile, i960, InstantIP, Intel, Intel logo, Intel386, Intel486, Intel740, IntelDX2, IntelDX4, IntelSX2, Intel Core, Intel Inside, Intel Inside logo, Intel. Leap ahead., Intel. Leap ahead. logo, Intel NetBurst, Intel NetMerge, Intel NetStructure, Intel SingleDriver, Intel SpeedStep, Intel StrataFlash, Intel Viiv, Intel vPro, Intel XScale, IPLink, Itanium, Itanium Inside, MCS, MMX, Oplus, OverDrive, PDCharm, Pentium, Pentium Inside, skoool, Sound Mark, The Journey Inside, VTune, Xeon, and Xeon Inside are trademarks of Intel Corporation in the U.S. and other countries.
* Other names and brands may be claimed as the property of others.
Copyright © 2002-2008 Intel Corporation. All rights reserved.