Compression codecs
Codecs (Coder/Decoders) are used to compress and decompress data using various compression algorithms. Flume supports gzip, bzip2, lzo, and snappy, although you might have to install lzo yourself because of licensing issues, especially if you are using a distribution such as CDH.
If you want the HDFS sink to write compressed files, set the hdfs.codeC property. The codec's default file extension is also appended as the suffix of the files written to HDFS. For example, if you specify the following, all files that are written will have a .gz extension, so you don't need to specify the hdfs.fileSuffix property in this case:
agent.sinks.k1.hdfs.codeC=gzip
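For context, a slightly fuller sink definition might look like the following sketch. The agent name (agent), sink name (k1), channel name (c1), and HDFS path are placeholders chosen for illustration, not values taken from a real deployment. It also assumes you want plain compressed files rather than compressed sequence files, which is what setting hdfs.fileType to CompressedStream requests.

```properties
# Hypothetical HDFS sink "k1" on an agent named "agent".
agent.sinks.k1.type=hdfs
# Placeholder channel name; the channel must be defined elsewhere in the file.
agent.sinks.k1.channel=c1
# Placeholder target directory in HDFS.
agent.sinks.k1.hdfs.path=/flume/events
# Write a compressed stream; this file type requires hdfs.codeC to be set.
agent.sinks.k1.hdfs.fileType=CompressedStream
# Codec used to compress the output files.
agent.sinks.k1.hdfs.codeC=gzip
```

If hdfs.fileType is left at its default of SequenceFile, the codec is applied inside the sequence file instead, so it is worth checking which layout your downstream jobs expect.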
The codec you choose to use will require some research on your part. There are arguments for using gzip or bzip2 for their higher compression ratios at the cost of longer compression times, especially if your data is written once but will be read hundreds or thousands of times. On the other hand,...