arbtt capture logs' compressibility
Gwern Branwen
gwern at gwern.net
Fri Jan 2 22:11:11 CET 2015
I was wondering how well the append-logs work for removing redundancy
and how compressible capture logs are, so I measured how some common
compressors (gzip/zip/bzip2/xz) perform on my ~404MB (423485623 bytes)
of logs captured since 2012 (with varying interval settings).
It looks like even the weakest compression (gzip -1) shrinks my logs
to about 22% of their original size (a bit over 1/5th), and the
strongest reasonably available compressor (xz -9 --extreme; barring
exotic novelties like ZPAQ) gets them down to about 15% (roughly
1/7th).
Table of results (sizes in bytes; "Compression" is compressed size
divided by original size; "Time" is elapsed wall-clock time, mm:ss.ss):

Method    Setting   Result size   Compression   Time
--------  --------  -----------   -----------   --------
(none)    -           423485623         1.000   00:01.80
gzip      min          93571261         0.221   00:06.76
gzip      default      83031289         0.196   00:14.22
gzip      max          80882789         0.191   01:31.65
zip       min          93571387         0.221   00:07.41
zip       default      83031415         0.196   00:15.16
zip       max          80882915         0.191   01:37.52
bzip2     min          75065976         0.177   00:49.55
bzip2     max          71075812         0.168   00:54.65
xz        min          66347932         0.157   05:09.73
xz        default      63307572         0.150   08:01.83
xz        max          62339916         0.147   10:10.62
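
The Compression column is just the compressed byte count divided by
the uncompressed 423485623 bytes. A minimal sketch of that arithmetic,
assuming awk is available; the helper name `ratio` is made up here for
illustration and is not part of arbtt or any of the compressors:

```shell
# ratio COMPRESSED ORIGINAL: print compressed/original to 3 decimal places.
# (Hypothetical helper, named only for this example.)
ratio() {
    awk -v c="$1" -v o="$2" 'BEGIN { printf "%.3f\n", c / o }'
}

ratio 93571261 423485623   # gzip -1           -> 0.221
ratio 62339916 423485623   # xz -9 --extreme   -> 0.147
```

Taking the reciprocal of those two values gives roughly 4.5 and 6.8,
which is where the "1/5th" and "1/7th" figures come from.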
Shell commands:
# 404M total
$ du -c *.log
255692 2012-2013.log
47572 2013-2014.log
108148 2014.log
2168 capture.log
413580 total
# time to read off SSD:
$ cat *.log | time wc --bytes
423485623
0.00user 0.12system 0:01.80elapsed 7%CPU (0avgtext+0avgdata 1844maxresident)k
0inputs+0outputs (0major+86minor)pagefaults 0swaps
## GZIP
# min
$ cat *.log | gzip -1 --stdout - | time wc --bytes
93571261
0.01user 0.04system 0:06.76elapsed 0%CPU (0avgtext+0avgdata 1928maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
# default
$ cat *.log | gzip -6 --stdout - | time wc --bytes
83031289
0.00user 0.04system 0:14.22elapsed 0%CPU (0avgtext+0avgdata 1908maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps
# max
$ cat *.log | gzip -9 --stdout - | time wc --bytes
80882789
0.00user 0.06system 1:31.65elapsed 0%CPU (0avgtext+0avgdata 1796maxresident)k
0inputs+0outputs (0major+82minor)pagefaults 0swaps
## ZIP
# min
$ cat *.log | zip -1 - - | time wc --bytes
adding: - (deflated 78%)
93571387
0.01user 0.03system 0:07.41elapsed 0%CPU (0avgtext+0avgdata 1864maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
# default
$ cat *.log | zip -6 - - | time wc --bytes
adding: - (deflated 80%)
83031415
0.00user 0.05system 0:15.16elapsed 0%CPU (0avgtext+0avgdata 1860maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps
# max
$ cat *.log | zip -9 - - | time wc --bytes
adding: - (deflated 81%)
80882915
0.00user 0.05system 1:37.52elapsed 0%CPU (0avgtext+0avgdata 1920maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps
## BZIP2
# min
$ cat *.log | bzip2 -1 --stdout --compress --quiet | time wc --bytes
75065976
0.03user 0.06system 0:49.55elapsed 0%CPU (0avgtext+0avgdata 1864maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
# max (default)
$ cat *.log | bzip2 -9 --stdout --compress --quiet | time wc --bytes
71075812
0.00user 0.05system 0:54.65elapsed 0%CPU (0avgtext+0avgdata 1932maxresident)k
8inputs+0outputs (1major+87minor)pagefaults 0swaps
## XZ
# min
$ cat *.log | xz -0 --stdout | time wc --bytes
66347932
0.00user 0.03system 5:09.73elapsed 0%CPU (0avgtext+0avgdata 1924maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
# default
$ cat *.log | xz -6 --stdout | time wc --bytes
63307572
0.00user 0.04system 8:01.83elapsed 0%CPU (0avgtext+0avgdata 1800maxresident)k
# max
$ cat *.log | xz -9 --extreme --stdout | time wc --bytes
62339916
0.00user 0.03system 10:10.62elapsed 0%CPU (0avgtext+0avgdata 1840maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps
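
For anyone who wants to repeat this on their own logs, the runs above
could be wrapped in a loop. This is only a rough sketch, not the
commands I actually ran: the function names `compressed_size` and
`bench_logs` are invented for illustration, it assumes gzip, bzip2 and
xz are installed and that the logs match *.log in the current
directory, it skips zip (whose stdin/stdout invocation differs), and it
times each run in whole seconds with `date +%s` rather than `time`.

```shell
# compressed_size CMD [ARGS...]: compress stdin with CMD, print the byte count.
compressed_size() {
    "$@" | wc -c | tr -d ' '
}

# bench_logs: hypothetical driver (the name is mine). Prints one line per
# setting: command, compressed size in bytes, elapsed whole seconds.
bench_logs() {
    for cmd in "gzip -1" "gzip -6" "gzip -9" \
               "bzip2 -1" "bzip2 -9" \
               "xz -0" "xz -6" "xz -9 --extreme"; do
        start=$(date +%s)
        # $cmd is deliberately unquoted so it splits into command + flags
        # (this relies on sh/bash word splitting).
        size=$(cat -- *.log | compressed_size $cmd --stdout)
        end=$(date +%s)
        printf '%-16s %12s %5ss\n' "$cmd" "$size" "$((end - start))"
    done
}
```

Running `bench_logs` from the directory holding the logs prints one
row per setting; the sizes can then be divided by the uncompressed byte
count to get the ratios in the table.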
--
gwern
http://www.gwern.net