From gwern at gwern.net Fri Jan 2 22:11:11 2015
From: gwern at gwern.net (Gwern Branwen)
Date: Fri, 2 Jan 2015 16:11:11 -0500
Subject: arbtt capture logs' compressibility
Message-ID:

I was wondering how well the append-logs work for removing redundancy
and how compressible capture logs are, so I looked at some compression
algorithms' (zip/gzip/bzip2/xz) performance on my 403MB of logs
captured since 2012 (with varying interval settings).

It looks like even with the weakest compression, my logs compress to
1/5th the size, and with the strongest reasonably available
compression algorithms (barring exotic novelties like ZPAQ), down to
1/7th the size.

Table of results:

Method  Setting  Result size (bytes)  Compression ratio  Time (mm:ss)
------  -------  -------------------  -----------------  ------------
(none)  -        423485623            1.000              00:01.80
gzip    min      093571261            0.221              00:06.76
gzip    default  083031289            0.196              00:14.22
gzip    max      080882789            0.191              01:31.65
zip     min      093571387            0.221              00:07.41
zip     default  083031415            0.196              00:15.16
zip     max      080882915            0.191              01:37.52
bzip2   min      075065976            0.177              00:49.55
bzip2   max      071075812            0.168              00:54.65
xz      min      066347932            0.157              05:09.73
xz      default  063307572            0.150              08:01.83
xz      max      062339916            0.147              10:10.62

Shell commands:

# 404M total
$ du -c *.log
255692 2012-2013.log
47572 2013-2014.log
108148 2014.log
2168 capture.log
413580 total

# time to read off SSD:
$ cat *.log | time wc --bytes
423485623
0.00user 0.12system 0:01.80elapsed 7%CPU (0avgtext+0avgdata 1844maxresident)k
0inputs+0outputs (0major+86minor)pagefaults 0swaps

## GZIP
# min
$ cat *.log | gzip -1 --stdout - | time wc --bytes
93571261
0.01user 0.04system 0:06.76elapsed 0%CPU (0avgtext+0avgdata 1928maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
# default
$ cat *.log | gzip -6 --stdout - | time wc --bytes
83031289
0.00user 0.04system 0:14.22elapsed 0%CPU (0avgtext+0avgdata 1908maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps
# max
$ cat *.log | gzip -9 --stdout - | time wc --bytes
80882789
0.00user 0.06system 1:31.65elapsed 0%CPU (0avgtext+0avgdata 1796maxresident)k
0inputs+0outputs (0major+82minor)pagefaults 0swaps

## ZIP
# min
$ cat *.log | zip -1 - - | time wc --bytes
  adding: - (deflated 78%)
93571387
0.01user 0.03system 0:07.41elapsed 0%CPU (0avgtext+0avgdata 1864maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
# default
$ cat *.log | zip -6 - - | time wc --bytes
  adding: - (deflated 80%)
83031415
0.00user 0.05system 0:15.16elapsed 0%CPU (0avgtext+0avgdata 1860maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps
# max
$ cat *.log | zip -9 - - | time wc --bytes
  adding: - (deflated 81%)
80882915
0.00user 0.05system 1:37.52elapsed 0%CPU (0avgtext+0avgdata 1920maxresident)k
0inputs+0outputs (0major+85minor)pagefaults 0swaps

## BZIP2
# min
$ cat *.log | bzip2 -1 --stdout --compress --quiet | time wc --bytes
75065976
0.03user 0.06system 0:49.55elapsed 0%CPU (0avgtext+0avgdata 1864maxresident)k
0inputs+0outputs (0major+84minor)pagefaults 0swaps
# max (default)
$ cat *.log | bzip2 -9 --stdout --compress --quiet | time wc --bytes
71075812
0.00user 0.05system 0:54.65elapsed 0%CPU (0avgtext+0avgdata 1932maxresident)k
8inputs+0outputs (1major+87minor)pagefaults 0swaps

## XZ
# min
$ cat *.log | xz -0 --stdout | time wc --bytes
66347932
0.00user 0.03system 5:09.73elapsed 0%CPU (0avgtext+0avgdata 1924maxresident)k
0inputs+0outputs (0major+87minor)pagefaults 0swaps
# default
$ cat *.log | xz -6 --stdout | time wc --bytes
63307572
0.00user 0.04system 8:01.83elapsed 0%CPU (0avgtext+0avgdata 1800maxresident)k
# max
$ cat *.log | xz -9 --extreme --stdout | time wc --bytes
62339916
0.00user 0.03system 10:10.62elapsed 0%CPU (0avgtext+0avgdata 1840maxresident)k
0inputs+0outputs (0major+83minor)pagefaults 0swaps

--
gwern
http://www.gwern.net

From gwern at gwern.net Tue Jan 27 00:15:43 2015
From: gwern at gwern.net (gwern at gwern.net)
Date: Mon, 26 Jan 2015 15:15:43 -0800 (PST)
Subject: darcs patch: arbtt.xml: effective use: +irssi solution
Message-ID: <54c6ca9f.0b608c0a.a313.1554@mx.google.com>

1 patch for repository http://darcs.nomeata.de/arbtt:

Mon Jan 26 18:14:51 EST 2015  gwern at gwern.net
  * arbtt.xml: effective use: +irssi solution

-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-preview.txt
Type: text/x-darcs-patch
Size: 1086 bytes
Desc: Patch preview
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arbtt_xml_-effective-use_-_irssi-solution.dpatch
Type: application/x-darcs-patch
Size: 2169 bytes
Desc: A darcs patch for your repository!
URL:

From mail at joachim-breitner.de Tue Jan 27 09:45:57 2015
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Tue, 27 Jan 2015 09:45:57 +0100
Subject: darcs patch: arbtt.xml: effective use: +irssi solution
In-Reply-To: <54c6ca9f.0b608c0a.a313.1554@mx.google.com>
References: <54c6ca9f.0b608c0a.a313.1554@mx.google.com>
Message-ID: <1422348357.1951.0.camel@joachim-breitner.de>

Hi,

On Monday, 26.01.2015, at 15:15 -0800, gwern at gwern.net wrote:
> 1 patch for repository http://darcs.nomeata.de/arbtt:
>
> Mon Jan 26 18:14:51 EST 2015  gwern at gwern.net
>   * arbtt.xml: effective use: +irssi solution

thanks! applied.

Joachim

--
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL:

From jason at tunapanda.org Tue Jun 23 12:03:58 2015
From: jason at tunapanda.org (Jason Mule)
Date: Tue, 23 Jun 2015 13:03:58 +0300
Subject: 'arbtt-stats: Prelude.(!!): index too large' when running arbtt-stats
Message-ID:

I get this error when I run arbtt-stats 0.9.0.4 after running
arbtt-capture. My categorize.cfg has a single line:

$idle > 30 ==> tag inactive

I have omitted the capture log due to privacy reasons.
Any help or ideas towards resolving this will be appreciated.

--
Jason

From mail at joachim-breitner.de Tue Jun 23 12:28:07 2015
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Tue, 23 Jun 2015 12:28:07 +0200
Subject: 'arbtt-stats: Prelude.(!!): index too large' when running arbtt-stats
In-Reply-To:
References:
Message-ID: <1435055287.1308.16.camel@joachim-breitner.de>

Hi,

On Tuesday, 23.06.2015, at 13:03 +0300, Jason Mule wrote:
> I get this error when I run arbtt-stats 0.9.0.4 after running
> arbtt-capture. My categorize.cfg has a single line:
> $idle > 30 ==> tag inactive

hmm. There is only one use of !! in the code of arbtt-stats, and that
involves reading the capture log.

Does "arbtt-dump" work? Does it help to fix the log file with
"arbtt-recover"?

Greetings,
Joachim

From jason at tunapanda.org Tue Jun 23 14:23:35 2015
From: jason at tunapanda.org (Jason Mule)
Date: Tue, 23 Jun 2015 15:23:35 +0300
Subject: 'arbtt-stats: Prelude.(!!): index too large' when running arbtt-stats
In-Reply-To: <1435055287.1308.16.camel@joachim-breitner.de>
References: <1435055287.1308.16.camel@joachim-breitner.de>
Message-ID:

On Tue, Jun 23, 2015 at 1:28 PM, Joachim Breitner wrote:
> On Tuesday, 23.06.2015, at 13:03 +0300, Jason Mule wrote:
> > I get this error when I run arbtt-stats 0.9.0.4 after running
> > arbtt-capture. My categorize.cfg has a single line:
> > $idle > 30 ==> tag inactive
>
> hmm. There is only one use of !! in the code of arbtt-stats, and that
> involves reading the capture log.
>
> Does "arbtt-dump" work? Does it help to fix the log file with
> "arbtt-recover"?

arbtt-dump works up to a certain point and then the same error is
thrown.

arbtt-recover complains with the following error:
"Failed to read value at position 35696:
Unsupported TimeLogEntry version tag 87"

and many more like this.

I have attached the categorize.cfg that was actually used to generate
the log. Apologies for missing this and thanks for the quick response.

--
Jason
-------------- next part --------------
A non-text attachment was scrubbed...
Name: categorize.cfg
Type: application/octet-stream
Size: 982 bytes
Desc: not available
URL:

From mail at joachim-breitner.de Tue Jun 23 15:12:50 2015
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Tue, 23 Jun 2015 15:12:50 +0200
Subject: 'arbtt-stats: Prelude.(!!): index too large' when running arbtt-stats
In-Reply-To:
References: <1435055287.1308.16.camel@joachim-breitner.de>
Message-ID: <1435065170.1308.24.camel@joachim-breitner.de>

Hi,

On Tuesday, 23.06.2015, at 15:23 +0300, Jason Mule wrote:
> arbtt-dump works up to a certain point and then the same error is
> thrown.
>
> arbtt-recover complains with the following error:
> "Failed to read value at position 35696:
> Unsupported TimeLogEntry version tag 87"
>
> and many more like this.

So the log is corrupt. This sometimes happens, but never to me... It
also means the problem is independent of your categorize.cfg, which is
only used later on.

Your best bet is to let arbtt-recover run and follow the instructions
in the manpage:

    arbtt-recover tries to read the data samples
    recorded by arbtt-capture(1), skipping over
    possible broken entries. A fixed log file is
    written to ~/.arbtt/capture.log.recovered. If the
    recovery was successful, you should stop
    arbtt-capture and move the file to
    ~/.arbtt/capture.log.
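The manpage steps quoted above could be scripted roughly as follows. This is only a sketch: install_recovered_log is a hypothetical helper name, not an arbtt command, and keeping the damaged log as capture.log.broken is an extra precaution that the manpage does not mention. It assumes arbtt-recover has already been run and arbtt-capture has been stopped.

```shell
# Sketch only: install_recovered_log is a hypothetical helper, not part of arbtt.
# Run arbtt-recover first and stop arbtt-capture before calling it.
install_recovered_log() {
    dir="${1:-$HOME/.arbtt}"
    log="$dir/capture.log"
    recovered="$dir/capture.log.recovered"

    # Refuse to continue if there is no non-empty recovered log.
    if [ ! -s "$recovered" ]; then
        echo "no recovered log at $recovered" >&2
        return 1
    fi

    # Keep the damaged log around (an extra precaution, not in the manpage).
    if [ -f "$log" ]; then
        mv "$log" "$log.broken" || return 1
    fi

    # The step the manpage describes: move the fixed log into place.
    mv "$recovered" "$log"
}
```

Called with no argument, the helper acts on ~/.arbtt; passing a directory makes it easy to try on a copy of the logs first.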
Ideally, you should only lose a few broken entries in the log this way.

Greetings,
Joachim

From jason at tunapanda.org Tue Jun 23 16:56:29 2015
From: jason at tunapanda.org (Jason Mule)
Date: Tue, 23 Jun 2015 17:56:29 +0300
Subject: 'arbtt-stats: Prelude.(!!): index too large' when running arbtt-stats
In-Reply-To: <1435065170.1308.24.camel@joachim-breitner.de>
References: <1435055287.1308.16.camel@joachim-breitner.de> <1435065170.1308.24.camel@joachim-breitner.de>
Message-ID:

On Tue, Jun 23, 2015 at 4:12 PM, Joachim Breitner wrote:
> Your best bet is to let arbtt-recover run and follow the instructions
> in the manpage:
>
>     arbtt-recover tries to read the data samples
>     recorded by arbtt-capture(1), skipping over
>     possible broken entries. A fixed log file is
>     written to ~/.arbtt/capture.log.recovered. If the
>     recovery was successful, you should stop
>     arbtt-capture and move the file to
>     ~/.arbtt/capture.log.

I managed to run this to recover the log.

> Ideally, you should only lose a few broken entries in the log this way.

Unfortunately arbtt-stats doesn't output anything when using the
recovered log.

--
Jason

From mail at joachim-breitner.de Wed Jun 24 10:00:23 2015
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Wed, 24 Jun 2015 10:00:23 +0200
Subject: 'arbtt-stats: Prelude.(!!): index too large' when running arbtt-stats
In-Reply-To:
References: <1435055287.1308.16.camel@joachim-breitner.de> <1435065170.1308.24.camel@joachim-breitner.de>
Message-ID: <1435132823.28618.0.camel@joachim-breitner.de>

Hi,

On Tuesday, 23.06.2015, at 17:56 +0300, Jason Mule wrote:
> I managed to run this to recover the log.

how do the file sizes compare?

> > Ideally, you should only lose a few broken entries in the log this
> > way.
>
> Unfortunately arbtt-stats doesn't output anything when using the
> recovered log.

Not anything at all? What about "arbtt-stats --info"?

What about arbtt-dump?

Greetings,
Joachim

From jason at tunapanda.org Wed Jun 24 11:05:50 2015
From: jason at tunapanda.org (Jason Mule)
Date: Wed, 24 Jun 2015 12:05:50 +0300
Subject: 'arbtt-stats: Prelude.(!!): index too large' when running arbtt-stats
In-Reply-To: <1435132823.28618.0.camel@joachim-breitner.de>
References: <1435055287.1308.16.camel@joachim-breitner.de> <1435065170.1308.24.camel@joachim-breitner.de> <1435132823.28618.0.camel@joachim-breitner.de>
Message-ID:

Hi,

On Wed, Jun 24, 2015 at 11:00 AM, Joachim Breitner wrote:
> Not anything at all? What about "arbtt-stats --info"?
>
> What about arbtt-dump?

I managed to get data with a different categorize.cfg. The difference
in file sizes is 3K and the original log file was 76K. Thanks again
for your help!
--
Jason

From mail at joachim-breitner.de Wed Jun 24 11:08:37 2015
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Wed, 24 Jun 2015 11:08:37 +0200
Subject: 'arbtt-stats: Prelude.(!!): index too large' when running arbtt-stats
In-Reply-To:
References: <1435055287.1308.16.camel@joachim-breitner.de> <1435065170.1308.24.camel@joachim-breitner.de> <1435132823.28618.0.camel@joachim-breitner.de>
Message-ID: <1435136917.28618.11.camel@joachim-breitner.de>

Hi,

On Wednesday, 24.06.2015, at 12:05 +0300, Jason Mule wrote:
> I managed to get data with a different categorize.cfg. The difference
> in file sizes is 3K and the original log file was 76K. Thanks again
> for your help!

given that arbtt-recover also compactifies the data a bit, it looks as
if you did not lose much.

Greetings,
Joachim
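
The size comparison discussed in this thread could be done with a small shell helper along these lines. It is a sketch only: compare_log_sizes is a hypothetical name, not an arbtt command, and the two arguments are whatever the original and recovered logs are called on your system. The percentage is an integer, rounded down. For scale, a 3K difference on a 76K log is roughly 4%.

```shell
# Sketch: compare_log_sizes is a hypothetical helper, not an arbtt command.
compare_log_sizes() {
    original="$1"
    recovered="$2"

    # wc -c counts bytes; reading from stdin keeps the file name out of the
    # output, and the arithmetic expansion strips any padding wc adds.
    orig_bytes=$(( $(wc -c < "$original") ))
    rec_bytes=$(( $(wc -c < "$recovered") ))

    # Avoid dividing by zero on an empty original log.
    if [ "$orig_bytes" -eq 0 ]; then
        echo "original log is empty" >&2
        return 1
    fi

    diff_bytes=$((orig_bytes - rec_bytes))
    # Integer percentage (rounded down) by which the recovered log is smaller.
    pct=$((diff_bytes * 100 / orig_bytes))
    echo "original=$orig_bytes recovered=$rec_bytes difference=$diff_bytes (${pct}%)"
}
```

Because arbtt-recover also compacts the data, a small difference reported this way is an upper bound on what was lost, not an exact count of dropped entries.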