From mail at joachim-breitner.de  Wed Jan  1 02:31:41 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Wed, 01 Jan 2014 02:31:41 +0100
Subject: tagging inactive intervals
In-Reply-To: <l9v5jv$lgd$2@ger.gmane.org>
References: <l9stdt$qhk$1@ger.gmane.org> <1388511157.5709.27.camel@kirk>
 <l9v5jv$lgd$2@ger.gmane.org>
Message-ID: <1388539901.13556.1.camel@kirk>

Hi,

On Tuesday, 31.12.2013, at 19:23 +0000, Oren Gampel wrote:
> That's my problem. I usually use separate desktops for separate tasks so 
> the other task's windows stay open.
> The coming solution to the famous Issue #1 (https://bitbucket.org/nomeata/
> arbtt/issue/1) would actually solve all my troubles. :)

so you use a browser window on the workspace corresponding to your task?

Great, I don’t think Issue #1 will cause any problems, so expect me to
implement it soon.


> > I am a bit wary because it requires looking into the future of the
> > currently investigated tag, which raises the complexity of code and
> > algorithms, and will prevent you from getting accurate results for
> > samples taken just now, i.e. their tags will change depending on what
> > you do later.
> 
> I assume you're doing a "single pass" on the log, and I guess any 
> algorithm suggested should be a single pass one. I don't think there is a 
> "mathematically correct" solution, so keeping it simple should be indeed 
> the way to go.

Single-pass, preferably stateless, is of course preferred. If needed,
one can do something fancier... but it had better be worth it :-)

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0x4743206C
  Debian Developer: nomeata at debian.org

From mail at joachim-breitner.de  Wed Jan  1 15:33:54 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Wed, 01 Jan 2014 15:33:54 +0100
Subject: tagging inactive intervals
In-Reply-To: <1388539901.13556.1.camel@kirk>
References: <l9stdt$qhk$1@ger.gmane.org> <1388511157.5709.27.camel@kirk>
 <l9v5jv$lgd$2@ger.gmane.org> <1388539901.13556.1.camel@kirk>
Message-ID: <1388586834.2801.7.camel@kirk>

Hi,

On Wednesday, 01.01.2014, at 02:31 +0100, Joachim Breitner wrote:
> On Tuesday, 31.12.2013, at 19:23 +0000, Oren Gampel wrote:
> > That's my problem. I usually use separate desktops for separate tasks so 
> > the other task's windows stay open.
> > The coming solution to the famous Issue #1 (https://bitbucket.org/nomeata/
> > arbtt/issue/1) would actually solve all my troubles. :)
> 
> so you use a browser window on the workspace corresponding to your task?
> 
> Great, I don’t think Issue #1 will cause any problems, so expect me to
> implement it soon.

done! You can use "$desktop" in categorize.cfg. Can you check that it
works for you as requested?

This changes the on-disk format of capture.log. It is possible to
alternate between the old and the new arbtt-capture (i.e. it is safe to
kill the stable version, run the repository version for a while, and
return to the stable version later), but then only the repository
version of arbtt-stats can read the data.

Alternatively, use the command-line flags to work with a test file
instead of ~/.arbtt/capture.log.

Or simply switch to the darcs version altogether, and tell me about all
the bugs you find :-)

Greetings,
Joachim

From mail at joachim-breitner.de  Wed Jan  1 15:42:51 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Wed, 01 Jan 2014 15:42:51 +0100
Subject: several feature suggestion
In-Reply-To: <l9rram$g3e$1@ger.gmane.org>
References: <1388330745.2502.16.camel@kirk> <l9q34a$9qb$1@ger.gmane.org>
 <l9rkbe$vf$1@ger.gmane.org> <1388403621.2589.3.camel@kirk>
 <l9rram$g3e$1@ger.gmane.org>
Message-ID: <1388587371.2801.8.camel@kirk>

Hi,

On Monday, 30.12.2013, at 13:09 +0000, Oren Gampel wrote:
> >> > Ok, so there are two problems:
> >> >  1. Plasma desktops also appear as windows, and hence are useless to
> >> > record. This leads to the question: How to detect such windows
> >> > (without adding configuration options and preferably without
> >> > hard-coding their name). Does xprop list anything useful for them?
> >> > What is their _NET_WM_WINDOW_TYPE?
> >> 
> >> I'm afraid _NET_WM_WINDOW_TYPE(ATOM) = _NET_WM_WINDOW_TYPE_DESKTOP
> >> 
> >> but...
> > 
> > why “I’m afraid” – this is actually good. I guess arbtt-capture should
> > ignore all windows with that window type, right?
> 
> Ok, gotcha. Yes! That's just noise in the log as it is now...

also done. Can you make sure (e.g. with the new "arbtt-capture --dump"
feature) that these windows no longer appear in the log file?

Greetings,
Joachim

From oren at orengampel.com  Wed Jan  1 23:19:42 2014
From: oren at orengampel.com (Oren Gampel)
Date: Wed, 1 Jan 2014 22:19:42 +0000 (UTC)
Subject: tagging inactive intervals
References: <l9stdt$qhk$1@ger.gmane.org> <1388511157.5709.27.camel@kirk>
 <l9v5jv$lgd$2@ger.gmane.org> <1388539901.13556.1.camel@kirk>
 <1388586834.2801.7.camel@kirk>
Message-ID: <la249u$lgd$3@ger.gmane.org>

Hey,

>> Great, I don’t think Issue #1 will cause any problems, so expect me to
>> implement it soon.
> 
> done! You can use "$desktop" in categorize.cfg. Can you check that it
> works for you as requested?
> 

Perfect! Working well, on KDE + Plasma. 
This would make the .cfg file so much simpler.

Thanks a lot Joachim!


From oren at orengampel.com  Wed Jan  1 23:20:13 2014
From: oren at orengampel.com (Oren Gampel)
Date: Wed, 1 Jan 2014 22:20:13 +0000 (UTC)
Subject: several feature suggestion
References: <1388330745.2502.16.camel@kirk> <l9q34a$9qb$1@ger.gmane.org>
 <1388403621.2589.3.camel@kirk> <l9rram$g3e$1@ger.gmane.org>
 <1388587371.2801.8.camel@kirk>
Message-ID: <la24at$lgd$4@ger.gmane.org>

Hey,

>> > why “I’m afraid” – this is actually good. I guess arbtt-capture
>> > should ignore all windows with that window type, right?
>> 
>> Ok, gotcha. Yes! That's just noise in the log as it is now...
> 
> also done. Can you make sure (e.g. with the new "arbtt-capture --dump"
> feature) that these windows no longer appear in the log file?
> 

Working well, on KDE + Plasma. I no longer get the former (unnecessary) 
plasma entries.


From ianwojtowicz at gmail.com  Thu Jan  2 04:54:21 2014
From: ianwojtowicz at gmail.com (Ian Wojtowicz)
Date: Thu, 2 Jan 2014 03:54:21 +0000 (UTC)
Subject: arbtt as a library?
References: <52495BAD.30909@ocharles.org.uk> <1380540040.4640.28.camel@kirk>
Message-ID: <loom.20140102T045239-396@post.gmane.org>

> > As an aside, I thought I could make do with the csv output, but I can't
> > pipe that:
> > 
> > 
> > ollie <at> io ~> arbtt-stats -c IRC --output-format=csv | grep -i haskell
> > arbtt-stats: ioctl: invalid argument (Invalid argument)
> 
> For now I’d prefer interaction and integration via the commands. I just
> fixed this particular bug in the darcs repository.

I'm having the same problem with arbtt 0.7. I tried building the repo code, 
but it failed on a bunch of dependencies.

Is there a debian package of the fixed code I can get from somewhere?

Ian


From mail at joachim-breitner.de  Thu Jan  2 11:16:02 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Thu, 02 Jan 2014 11:16:02 +0100
Subject: arbtt as a library?
In-Reply-To: <loom.20140102T045239-396@post.gmane.org>
References: <52495BAD.30909@ocharles.org.uk> <1380540040.4640.28.camel@kirk>
 <loom.20140102T045239-396@post.gmane.org>
Message-ID: <1388657762.2542.1.camel@kirk>

Hi,

On Thursday, 02.01.2014, at 03:54 +0000, Ian Wojtowicz wrote:
> > > As an aside, I thought I could make do with the csv output, but I can't
> > > pipe that:
> > > 
> > > 
> > > ollie <at> io ~> arbtt-stats -c IRC --output-format=csv | grep -i haskell
> > > arbtt-stats: ioctl: invalid argument (Invalid argument)
> > 
> > For now I’d prefer interaction and integration via the commands. I just
> > fixed this particular bug in the darcs repository.
> 
> I'm having the same problem with arbtt 0.7. I tried building the repo code, 
> but it failed on a bunch of dependencies.
> 
> Is there a debian package of the fixed code I can get from somewhere?

I could manually build one, but it would be easier if you manage to
build from the repo. Can you try

$ apt-get build-dep arbtt
$ apt-get install cabal-install
$ cd ..../arbtt/
$ darcs pull      # or git pull
$ cabal install

and if that does not work, tell us the error message?

Greetings,
Joachim

From rich at racitup.com  Tue Jan  7 14:39:59 2014
From: rich at racitup.com (Richard Case)
Date: Tue, 7 Jan 2014 13:39:59 +0000
Subject: Nested if then else in categorize.cfg
Message-ID: <CAK1feJCVYLnhPx88px3GoiqaQfJJTc+mHd=6uFeMcChzwGgmDg@mail.gmail.com>

Hi guys,

I'm new to this so bear with me but I'm trying to do a rule like this for
Firefox using arbtt v0.7:

-- Firefox
current window ($program == "Navigator") ==>
  if $title =~ /^(.*) - (.*@.*) - .* Mail - Mozilla Firefox$/ then tag
Email:$2-$1 else
  if $title =~ /^(.*) - Calendar - Mozilla Firefox$/ then tag Calendar:$1
else
  if $title =~ /^(.*) - Mozilla Firefox$/ then tag Firefox:$1 else tag
Firefox,

But I get:
Parser error:
"/home/rich/.arbtt/categorize.cfg" (line 29, column 3):
unexpected "i"
expecting "else"

It looks to be correct according to the syntax rules; am I doing something
wrong or is this a bug?

Cheers,
Rich

From mail at joachim-breitner.de  Tue Jan  7 15:44:06 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Tue, 07 Jan 2014 14:44:06 +0000
Subject: Nested if then else in categorize.cfg
In-Reply-To: <CAK1feJCVYLnhPx88px3GoiqaQfJJTc+mHd=6uFeMcChzwGgmDg@mail.gmail.com>
References: <CAK1feJCVYLnhPx88px3GoiqaQfJJTc+mHd=6uFeMcChzwGgmDg@mail.gmail.com>
Message-ID: <1389105846.14997.1.camel@kirk>

Hi Rich,

On Tuesday, 07.01.2014, at 13:39 +0000, Richard Case wrote:

> I'm new to this so bear with me but I'm trying to do a rule like this
> for Firefox using arbtt v0.7:
> 
> -- Firefox
> current window ($program == "Navigator") ==>
>   if $title =~ /^(.*) - (.*@.*) - .* Mail - Mozilla Firefox$/ then tag
> Email:$2-$1 else
>   if $title =~ /^(.*) - Calendar - Mozilla Firefox$/ then tag
> Calendar:$1 else
>   if $title =~ /^(.*) - Mozilla Firefox$/ then tag Firefox:$1 else tag
> Firefox,
> 
> 
> But I get:
> Parser error:
> "/home/rich/.arbtt/categorize.cfg" (line 29, column 3):
> unexpected "i"
> expecting "else"
> 
> 
> 
> It looks to be correct according to the syntax rules; am I doing
> something wrong or is this a bug?

thanks for bringing this up. This is a bug, and my proposed fix also
uncovers a bug. See my answer on SE:
http://stackoverflow.com/a/20973950/946226

And also thanks for posting on Stackexchange; we are now present there
with our very own arbtt-tag:
http://stackoverflow.com/questions/tagged/arbtt

I’ll be happy to answer more arbtt-questions there.

Greetings,
Joachim

From ian at miscellaneousprojects.com  Mon Feb 10 00:35:06 2014
From: ian at miscellaneousprojects.com (Ian Wojtowicz)
Date: Sun, 9 Feb 2014 15:35:06 -0800
Subject: Time Formatting
Message-ID: <CAHVyvEfEBXajGG4pENQM6-9eKAW=iEkeiPYKZ36XsisSP3DsPQ@mail.gmail.com>

I would like to see future versions of arbtt improve the data export
functionality so we can chain this great utility to reporting and
visualization tools.

One small request would be to change the time format from ##h##m##s to
##:##:##. This would improve compatibility with most other software systems.
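
Until arbtt offers such a format itself, the conversion can be done on the
consuming side. The following is only a minimal Python sketch, not part of
arbtt: the function name to_clock is made up for illustration, and it assumes
durations built from optional day, hour, minute and second parts such as
"1h31m00s" or "49m00s".

```python
import re

# Matches durations such as "1h31m00s", "49m00s" or "5s".
# Assumption: only the units d, h, m and s occur, in that order.
_DURATION = re.compile(r"^(?:(\d+)d)?(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def to_clock(duration: str) -> str:
    """Convert e.g. '1h31m00s' to '01:31:00' (hours may exceed 24)."""
    match = _DURATION.match(duration.strip())
    # An empty string matches the pattern with every group unset; reject it.
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {duration!r}")
    days, hours, minutes, seconds = (int(g or 0) for g in match.groups())
    total = ((days * 24 + hours) * 60 + minutes) * 60 + seconds
    return f"{total // 3600:02d}:{total % 3600 // 60:02d}:{total % 60:02d}"
```

Days are folded into the hour field so that the result stays a single
HH:MM:SS value that spreadsheet and plotting tools can usually parse.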

The other request is about CSV exporting. I'm not sure what the best
solution would be, but I would like to be able to export project and tag
stats by time periods. For example, at the end of a week of work, I'd like
to be able to export a series of daily values for a set of projects and
tags.

Right now, this requires a lot of manual editing of CSV files. It should be
simpler...

_____________________________________________________
http://miscellaneousprojects.com
Talk to us. We're interactive.
617 466 4701

From mail at joachim-breitner.de  Mon Feb 10 23:53:28 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Mon, 10 Feb 2014 22:53:28 +0000
Subject: Time Formatting
In-Reply-To: <CAHVyvEfEBXajGG4pENQM6-9eKAW=iEkeiPYKZ36XsisSP3DsPQ@mail.gmail.com>
References: <CAHVyvEfEBXajGG4pENQM6-9eKAW=iEkeiPYKZ36XsisSP3DsPQ@mail.gmail.com>
Message-ID: <1392072808.18345.11.camel@kirk>

Hi,

On Sunday, 09.02.2014, at 15:35 -0800, Ian Wojtowicz wrote:
> I would like to see future versions of arbtt improve the data export
> functionality so we can chain this great utility to reporting and
> visualization tools.

Sure, any suggestions are welcome.

> One small request would be to change the time format from ##h##m##s to
> ##:##:##. This would improve compatibility with most other software
> systems.

This has come up so often now, I guess it’s about time (pun intended):
https://bitbucket.org/nomeata/arbtt/issue/7/machine-readable-time-format

Do you agree that for human-readable output, the current output is
easier to read, given the wildly varying ranges, from seconds to weeks?

> The other request is about CSV exporting. I'm not sure what the best
> solution would be, but I would like to be able to export project and
> tag stats by time periods. For example, at the end of a week of work,
> I'd like to be able to export a series of daily values for a set of
> projects and tags.
> 
> 
> Right now, this requires a lot of manual editing of CSV files. It
> should be simpler...

There are so many different ways of generating reports, so we’ll have to
see which are generally useful, or how to make the reporting
compositional.

Have you tried
$ arbtt-stats --for-each=day --filter='$date >= 2014-02-01'
which can be combined with your favorite report, and also with
--output-format=csv? That seems to be what you need.

(You need to run the repository version for that, though.)

Greetings,
Joachim


-- 
Joachim Breitner
  e-Mail: mail at joachim-breitner.de
  Homepage: http://www.joachim-breitner.de
  Jabber-ID: nomeata at joachim-breitner.de

From mail at joachim-breitner.de  Sat Mar 29 19:21:20 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Sat, 29 Mar 2014 19:21:20 +0100
Subject: arbtt
In-Reply-To: <5335CC99.3030908@unity.pl>
References: <5335CC99.3030908@unity.pl>
Message-ID: <1396117280.2549.32.camel@kirk>

Dear Krzysztof,

may I invite you to join the arbtt mailing list:
https://lists.nomeata.de/mailman/listinfo/arbtt

It is a better forum to discuss arbtt issues, as there are more users
who can contribute and comment.

Your feature request of "--only-tags" is sensible. I’m not fully sure
about the name. Note that "--only" selects which samples are to be
considered, _not_ which tags are to be shown. I believe you want
something like "--output-only-uncategorized-tags"...

If I wanted to avoid introducing a new option, maybe "--output-only :"
should do that – it could arguably mean “output only tags that have no
category”. What do you think?
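
As a stopgap until such an option exists, the report can be filtered after
the fact. This is only a rough Python sketch, not an arbtt feature: the
function name top_level_rows is made up, and it assumes the plain-text table
layout quoted below, i.e. the tag in the first "|"-separated column, a header
line padded with underscores, and a colon in every tag that has a category.

```python
def top_level_rows(report: str) -> list[str]:
    """Keep the header line and the rows whose tag has no 'Category:' prefix."""
    kept = []
    for line in report.splitlines():
        # The tag is everything before the first column separator.
        tag = line.split("|", 1)[0].strip()
        # Header lines are padded with underscores; data rows are kept
        # only if the tag column is non-empty and contains no colon.
        if tag.startswith("_") or (tag and ":" not in tag):
            kept.append(line)
    return kept
```

Feeding it the text report and printing the returned lines would leave only
rows such as "terminal" and "jabber", dropping "terminal:root_test4" and
similar.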

Greetings,
Joachim


On Friday, 28.03.2014, at 20:25 +0100, Krzysztof Jakowczyk wrote:
> Hello,
> 
> I miss the time summaries for each tag without category listing, e.g.
> now:
> 
> _______________________________________Tag_|______Time_|_Percentage_
>                                   terminal |  1h31m00s |      53.85
>                                        www |    49m00s |      28.99
>                                      opera |    45m00s |      26.63
>                              obsluga-biura |    26m00s |      15.38
>              terminal:xelite_s-kjakowcz___ |    19m00s |      11.24
>                    terminal:root_tpsprap01 |    17m00s |      10.06
>                                     jabber |    17m00s |      10.06
>                      jabber:sebastian_berc |    15m00s |       8.88
>                 terminal:arbtt-dump___less |     9m00s |       5.33
> terminal:categorize_cfg_______arbtt__-_VIM |     8m00s |       4.73
>           terminal:root_nova-controller___ |     6m00s |       3.55
>   terminal:categorize_cfg_____arbtt__-_VIM |     6m00s |       3.55
>       terminal:vim____arbtt_categorize_cfg |     5m00s |       2.96
>                        terminal:root_test4 |     5m00s |       2.96
>                    terminal:root_bjgprap01 |     5m00s |       2.96
>                                     chrome |     4m00s |       2.37
>                                  iceweasel |     3m00s |       1.78
>                                    icedove |     3m00s |       1.78
>                    terminal:ssh_10_1_1_199 |     2m00s |       1.18
> 
> 
> ...but I want only: (something like --only-tags)
> 
> _______________________________________Tag_|______Time_|_Percentage_
>                                   terminal |  1h31m00s |      53.85
>                                        www |    49m00s |      28.99
>                                      opera |    45m00s |      26.63
>                              obsluga-biura |    26m00s |      15.38
>                                     jabber |    17m00s |      10.06
>                                     chrome |     4m00s |       2.37
>                                  iceweasel |     3m00s |       1.78
>                                    icedove |     3m00s |       1.78
> 
> 
> 
> 

From mail at joachim-breitner.de  Fri Apr 18 22:28:10 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Fri, 18 Apr 2014 22:28:10 +0200
Subject: arbtt 0.8 released
Message-ID: <1397852890.2539.21.camel@kirk>

Dear arbtt users,

I just uploaded arbtt 0.8 to Hackage. See
http://arbtt.nomeata.de/doc/users_guide/release-notes.html#release-notes-0.8
for a list of changes.

This release is the result of slightly more than 100 commits, closes 8 bugs,
and took a bit more than a year. I hope you like it!

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org

From miffoljud at gmail.com  Fri Apr 18 23:12:59 2014
From: miffoljud at gmail.com (Arash Rouhani)
Date: Fri, 18 Apr 2014 23:12:59 +0200
Subject: arbtt 0.8 released
In-Reply-To: <1397852890.2539.21.camel@kirk>
References: <1397852890.2539.21.camel@kirk>
Message-ID: <5351955B.1050900@gmail.com>

Great job Joachim!

/Arash

On 2014-04-18 22:28, Joachim Breitner wrote:
> Dear arbtt users,
>
> I just uploaded arbtt 0.8 to Hackage. See
> http://arbtt.nomeata.de/doc/users_guide/release-notes.html#release-notes-0.8
> for a list of changes.
>
> This release is the result of slightly more than 100 commits, closes 8 bugs,
> and took a bit more than a year. I hope you like it!
>
> Greetings,
> Joachim
>
>
>
> _______________________________________________
> arbtt mailing list
> arbtt at lists.nomeata.de
> https://lists.nomeata.de/mailman/listinfo/arbtt

From fax at staffnet.de  Mon May 26 13:27:00 2014
From: fax at staffnet.de (fax at staffnet.de)
Date: Mon, 26 May 2014 08:27:00 -0300
Subject: =?utf-8?b?ZmF4IGF1cyAiKzQ5KDApMzA4MjY5NzM0OSIgLSAyMCBzZWl0ZW4=?=
Message-ID: <1401103584-systematises@pmdm.de>

Faxnachricht [Caller-ID: +49(0)3082697349]
Seiten: 20.
Datum: 2014.05.13 11:26:24 UTC.
Kennziffer: B19166058858EF2073A7.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: fax_B19166058858EF2073A7.zip
Type: application/x-zip-compressed
Size: 39974 bytes
Desc: not available
URL: <https://lists.nomeata.de/pipermail/arbtt/attachments/20140526/2147306f/attachment.bin>

From eual.jp at gmail.com  Tue Jun  3 18:03:15 2014
From: eual.jp at gmail.com (Alexander Batischev)
Date: Tue, 3 Jun 2014 19:03:15 +0300
Subject: arbtt-stats: Unsupported TimeLogEntry version tag 0
Message-ID: <20140603160315.GA32379@antaeus>

Hello,

I'm running version 0.8 here.

I start arbtt-capture from my ~/.xsession like this:

    arbtt-capture &

After running the daemon for some time, arbtt-stats starts reporting the
following error:

    arbtt-stats: Unsupported TimeLogEntry version tag 0

(and no report is shown, of course). I hypothesised that it is caused by
arbtt-capture being terminated in a wrong way when I terminate my
X session (which I do every few weeks).

I tried to run arbtt-capture in the foreground and terminate it with Ctrl-C,
but that didn't cause a problem. So I deleted the log, then ran arbtt-capture
for a few minutes to get some entries, and then truncated it by one
byte:

    dd if=capture.log of=new count=1331 ibs=1 && mv -f new capture.log

That did the trick: the error showed up again.

So it seems that I sometimes manage to terminate my X session while
arbtt-capture is writing its log. Can something be done to make
arbtt-capture more resilient to situations like that?

-- 
Regards,
Alexander Batischev


From mail at joachim-breitner.de  Tue Jun  3 23:42:29 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Tue, 03 Jun 2014 23:42:29 +0200
Subject: arbtt-stats: Unsupported TimeLogEntry version tag 0
In-Reply-To: <20140603160315.GA32379@antaeus>
References: <20140603160315.GA32379@antaeus>
Message-ID: <1401831749.15404.4.camel@kirk>

Hi,

thanks for the report.

First of all: You can usually recover from a broken log using
arbtt-recover. But that is not nice if it happens often.

I have been running arbtt-capture for years now, and I shut down X daily, and
haven’t seen that problem, so there must be more to it, but I have no idea what.

But you are right: The writing could be more reliable. I guess instead
of always appending, I could append and then write the length somewhere
in the beginning of the file (hoping that writing one word is atomic);
when opening the file again I truncate at that point...
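
A minimal sketch of the length-header idea in Python (this is not arbtt's actual file format or code): the first 8 bytes hold the committed length, new data is appended and synced first, and only then is the header updated. The names `open_log` and `append_entry` and the 8-byte little-endian header are assumptions made for illustration, and an 8-byte write is not guaranteed to be atomic on every filesystem.

```python
import os
import struct

# Hypothetical layout: 8-byte little-endian committed length, then the entries.
HEADER = struct.Struct("<Q")


def open_log(path):
    """Open (or create) the log and drop any bytes past the committed length."""
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(HEADER.pack(HEADER.size))
    f = open(path, "r+b")
    (committed,) = HEADER.unpack(f.read(HEADER.size))
    f.truncate(committed)  # discard a partially written tail, if any
    return f


def append_entry(f, payload):
    """Append payload, sync it to disk, then publish the new length."""
    f.seek(0)
    (committed,) = HEADER.unpack(f.read(HEADER.size))
    f.seek(committed)
    f.write(payload)
    f.flush()
    os.fsync(f.fileno())
    # Only after the data is on disk is the header allowed to point past it.
    f.seek(0)
    f.write(HEADER.pack(committed + len(payload)))
    f.flush()
    os.fsync(f.fileno())
```

On reopening, anything past the committed length is treated as a torn write and cut off, which corresponds to the truncation step described above.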

Greetings,
Joachim


-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From gwern at gwern.net  Thu Jun 26 22:53:49 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Thu, 26 Jun 2014 16:53:49 -0400
Subject: Extracting CSV summaries by day
In-Reply-To: <CAMwO0gzPqOtN+GJkoNb_dJizXAKdAYq0TFAzSu6BOLUdC_gD+g@mail.gmail.com>
References: <CAMwO0gwXG6AKE4nV6KQcF=JBVsscpAk-QDQtKn24OY+Zko8d8Q@mail.gmail.com>
 <1383508997.4978.8.camel@kirk>
 <CAMwO0gxTuJzFgZxjGX82fBxkhzUW7Ad8FFJsynXfFTroyRKZzg@mail.gmail.com>
 <1383605744.5011.6.camel@kirk>
 <CAMwO0gzzdHVJ9EBgz8qkQpShoVfTQkmmSdyudjoGD17hP4s=1Q@mail.gmail.com>
 <1383640126.2353.1.camel@kirk>
 <CAMwO0gx934fyLc8tqUa+_XbkPTQq6vyaTP56BxW1u0udZB9teQ@mail.gmail.com>
 <1383688444.18235.4.camel@kirk>
 <CAMwO0gxOCf_s62jvGeXDz7XiZMRoo-8Q8iLFOL4kNgZe3Pd64w@mail.gmail.com>
 <1383690170.18235.9.camel@kirk>
 <CAMwO0gzPqOtN+GJkoNb_dJizXAKdAYq0TFAzSu6BOLUdC_gD+g@mail.gmail.com>
Message-ID: <CAMwO0gwBi-mpV8RXo=_F1LdhR5Z=YqYPefRGGOEHT6r3ayS-QQ@mail.gmail.com>

On Tue, Nov 5, 2013 at 5:56 PM, Gwern Branwen <gwern at gwern.net> wrote:
> On Tue, Nov 5, 2013 at 5:22 PM, Joachim Breitner
> <mail at joachim-breitner.de> wrote:
>> ok, try this and tell me what you think of it:
>> http://people.debian.org/~nomeata/arbtt/arbtt_0.7.1-1~pre1_amd64.deb
>
> Looks good. I think I can use it. (R has routines for converting long
> to wide format, so the column vs row thing isn't a really big
> problem.)

Yes, turns out to work pretty nicely. I've been looking at a factor
analysis of my various metrics, and it turned out to be pretty easy to
incorporate the arbtt csv output (once I repaired the log with
arbtt-recover, yet again).

So for my current purpose my workflow goes:

    $ arbtt-stats --logfile=/home/gwern/doc/arbtt/2013-2014.log \
          --output-format="csv" --for-each="day" --min-percentage=0 \
          > 2013-2014-arbtt.txt
    $ emacs -nw 2013-2014-arbtt.txt # delete before 2 March 2014 and after 24 June 2014; rename 'Day'->'Date'
    $ mv 2013-2014-arbtt.txt 2014-marchjune-arbtt.csv
    $ R

Then in R we can get a nice clean wide format dataset thusly:

    arbtt <- read.csv("2014-marchjune-arbtt.csv")
    arbtt$Percentage <- NULL # we don't care

    # Convert time-lengths to second-counts: "0:16:40" to 1000 (seconds);
    # "7:57:30" to 28650 (seconds) etc.
    # We prefer units of seconds since arbtt has sub-minute resolution and
    # not all categories will have a lot of time each day.
    interval <- function(x) {
      if (!is.na(x)) {
        if (grepl(" s", x)) as.integer(sub(" s", "", x))
        else {
          y <- unlist(strsplit(x, ":"))
          as.integer(y[[1]])*3600 + as.integer(y[[2]])*60 + as.integer(y[[3]])
        }
      }
      else NA
    }
    arbtt$Time <- sapply(as.character(arbtt$Time), interval)

    library(reshape)
    arbtt <- reshape(arbtt, v.names="Time", timevar="Tag", idvar="Date",
                     direction="wide")
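
For readers who do not use R, roughly the same two steps can be sketched in Python with only the standard library. The column names Date, Tag and Time follow the renamed CSV above; the function names `to_seconds` and `long_to_wide` are made up for this example, and the " s" suffix handling simply mirrors the R code rather than any documented arbtt output format.

```python
from collections import defaultdict


def to_seconds(text):
    """Convert '7:57:30' or '45 s' to a number of seconds; empty input gives None."""
    text = text.strip()
    if not text:
        return None
    if text.endswith(" s"):
        return int(text[:-2])
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def long_to_wide(rows):
    """Turn rows with Date, Tag and Time keys into {date: {tag: seconds}}."""
    wide = defaultdict(dict)
    for row in rows:
        wide[row["Date"]][row["Tag"]] = to_seconds(row["Time"])
    return dict(wide)
```

Passing `csv.DictReader(open("2014-marchjune-arbtt.csv"))` to `long_to_wide` would give one dictionary per day, with tags missing on a given day simply absent rather than NA.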

-- 
gwern
http://www.gwern.net


From gwern at gwern.net  Sun Jun 29 23:35:49 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Sun, 29 Jun 2014 17:35:49 -0400
Subject: Listing samples which are not matched by any tags?
In-Reply-To: <CAMwO0gyw7W45WvCWabW8LXLhUf1bCwuukrfbx2B05x9SmzdUwQ@mail.gmail.com>
References: <CAMwO0gzuXhnoZOp-1yQE7mrpMar1HJ5tNBwzzmUZqg8dup_YaQ@mail.gmail.com>
 <1383508486.4978.2.camel@kirk>
 <CAMwO0gzGJE-PhVccnO2XYTn7-GN0eQ+Qnqp+9OU3knjCxcZE8w@mail.gmail.com>
 <1383517885.12868.10.camel@kirk>
 <CAMwO0gyw7W45WvCWabW8LXLhUf1bCwuukrfbx2B05x9SmzdUwQ@mail.gmail.com>
Message-ID: <CAMwO0gwVjUfvKJeJdJuAoa3=OtqxeUpiXvKxGvB=foLJcTry7Q@mail.gmail.com>

On Sun, Nov 3, 2013 at 6:36 PM, Gwern Branwen <gwern at gwern.net> wrote:
> On Sun, Nov 3, 2013 at 5:31 PM, Joachim Breitner
> <mail at joachim-breitner.de> wrote:
>> But now you surely want to know what these selected samples look like,
>> right? That leads us to the discussion we had on the list with Waldir
>> in June: What should the tool look like that combines the dumping of
>> arbtt-dump with the sample selection of arbtt-stats... I’m unsure about
>> the proper design here.
>
> To me, it seems pretty simple. Keeping the same interface, apply a
> categorize.cfg's set of rules to each sample and then print or not
> based on what tags matched or didn't match.

Has there been any more thought on this issue? After repairing my logs
& working out how to use the CSV, I wondered how much data I was
missing due to a lack of matching tags. This apparently is reported by
the -i flag. Even after adding some more tagging, this is what I get:

    $ arbtt-stats -i -m 0 -f '$sampleage <100:00'
    General Information
    ===================
                           FirstRecord | 2014-06-26 01:33:16.291076 UTC
                            LastRecord | 2014-06-29 21:28:30.625435 UTC
                     Number of records |                           7485
                   Total time recorded |                    3d19h31m00s
                   Total time selected |                    1d12h41m10s
       Fraction of total time recorded |                           100%
       Fraction of total time selected |                            40%
    Fraction of recorded time selected |                            40%

Given the existence of the flag '--also-inactive' ('include samples
with the tag "inactive"'), I infer all this recorded time
reported is active time. But that means fully *60%* of my activity is
not being classified in any way! That's a heck of a lot of lost data.

And I don't know what the lost data is: I already classified
everything I could think of. What am I missing? I have no way of
knowing unless arbtt will tell me and give me samples of active time
which don't match so I can go 'aha, I need to classify $X/Y/Z as tag
A! Much better.'
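
The request can be illustrated with a small Python sketch: apply every rule to each sample and keep the samples to which no rule assigned a tag. The rule list here is a hypothetical stand-in (a tag plus a regular expression matched against a window title); it is not the categorize.cfg syntax, and nothing below is part of arbtt.

```python
import re

# Hypothetical rules: (tag, pattern matched against the active window title).
RULES = [
    ("Web", re.compile(r"Firefox|Chromium")),
    ("Editing", re.compile(r"emacs", re.IGNORECASE)),
]


def tags_for(title, rules=RULES):
    """Return every tag whose pattern matches the title."""
    return [tag for tag, pattern in rules if pattern.search(title)]


def unmatched(titles, rules=RULES):
    """Return the titles of samples to which no rule assigned a tag."""
    return [title for title in titles if not tags_for(title, rules)]
```

Feeding the result into `collections.Counter(...).most_common(10)` would list the most frequent unclassified titles first, which is the 'aha, I need to classify X' view asked for above.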

-- 
gwern
http://www.gwern.net


From adrian.wilkins at gmail.com  Thu Jul  3 12:26:59 2014
From: adrian.wilkins at gmail.com (Adrian Wilkins)
Date: Thu, 03 Jul 2014 11:26:59 +0100
Subject: Hello and questions..
Message-ID: <53B52FF3.5040308@gmail.com>

Hi there.

Having just spent a day in Timesheet Hell I have sworn to automate the 
process as much as possible.

I conceived of something that had three stages

1. Record information about what I was doing
2. Process the log to distil values for each project / task
3. Automatically upload the digests to our time-tracking system
   * We use Redmine

I even got so far as to test some code to do stage 1 on Windows.... but 
lo and behold, arbtt would seem to do stages 1 and 2 already - on both 
the platforms I use. Hooray! This underscores just why I love the Free 
Software community.

So ; questions (and apologies if they are questions that have already 
been asked)

* Can arbtt aggregate multiple event logs?

The reason I ask is that my typical working day is conducted across two 
machines - my work-issued Windows laptop, and my personal Linux 
installation. The vast bulk of the work takes place on the Linux 
machine, so I imagine it would be a fair representation of my work time 
in the main, but I do sometimes have to switch to the Windows machine 
(for emails, for example).

* Can arbtt sample mouse position?

I switch between the machines via two methods ; one is by using remote 
desktop, the other by Synergy (a network mouse/keyboard sharing program) 
; I can imagine there might be a place for logging the mouse position at 
time of sampling for this reason - ordinarily it would be useless 
information but when using Synergy you could write rules based on 
whether the mouse was actually on the screen of the machine that is 
logging. (can already write rules to ignore all events collected while 
using a remote desktop, or allocate the time out depending on the 
machine being connected to).

* Can arbtt aggregate other sources of events?

I guess this is a corollary to the above - if, for example, someone were 
to write something that rummaged through your Outlook calendar and 
produced appropriate TimeLogEntry objects for the calendar event ; if 
I'm in a meeting, I want to book all the time in that meeting to the 
instigator of that meeting (regardless of what I'm actually doing IN the 
meeting).


From mail at joachim-breitner.de  Thu Jul  3 12:40:30 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Thu, 03 Jul 2014 12:40:30 +0200
Subject: Hello and questions..
In-Reply-To: <53B52FF3.5040308@gmail.com>
References: <53B52FF3.5040308@gmail.com>
Message-ID: <1404384030.19001.9.camel@kirk>

Hi Adrian,

Am Donnerstag, den 03.07.2014, 11:26 +0100 schrieb Adrian Wilkins:
> * Can arbtt aggregate multiple event logs?
> 
> The reason I ask is that my typical working day is conducted across two 
> machines - my work-issued Windows laptop, and my personal Linux 
> installation. The vast bulk of the work takes place on the Linux 
> machine, so I imagine it would be a fair representation of my work time 
> in the main, but I do sometimes have to switch to the Windows machine 
> (for emails, for example).

in principle, yes. Most reports (everything but the interval report,
IIRC) work correctly if you have samples in the wrong order. So you can
use "arbtt-dump --format=Show" to export your various binary logs,
concatenate them and load them back into a new file with
"arbtt-import". 

Eventually we can consider supporting passing --logfile multiple times
to arbtt-stats.
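
The dump, concatenate and import route described above could be scripted along these lines. This is a sketch only: `--format=Show` is taken from the text above, while passing `--logfile=` to arbtt-dump and arbtt-import is an assumption that should be checked against each tool's `--help` output. The function name `merge_logs` and the injectable `run` parameter are made up for the example.

```python
import subprocess


def merge_logs(inputs, output, run=subprocess.run):
    """Dump each binary log as text, concatenate, and import into a new log."""
    dumped = []
    for path in inputs:
        # Assumed flag: --logfile selects the binary log to read.
        result = run(
            ["arbtt-dump", "--format=Show", f"--logfile={path}"],
            check=True,
            capture_output=True,
        )
        dumped.append(result.stdout)
    # Assumed flag: --logfile selects the new file that arbtt-import writes.
    run(
        ["arbtt-import", f"--logfile={output}"],
        input=b"".join(dumped),
        check=True,
    )
```

The samples end up in the order of the input files rather than in time order, which according to the message above is fine for most reports but not for the interval report.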

> * Can arbtt sample mouse position?
> 
> I switch between the machines via two methods ; one is by using remote 
> desktop, the other by Synergy (a network mouse/keyboard sharing program) 
> ; I can imagine there might be a place for logging the mouse position at 
> time of sampling for this reason - ordinarily it would be useless 
> information but when using Synergy you could write rules based on 
> whether the mouse was actually on the screen of the machine that is 
> logging. (can already write rules to ignore all events collected while 
> using a remote desktop, or allocate the time out depending on the 
> machine being connected to).

Currently, arbtt does nothing about the mouse position. With Synergy,
what X window is under the mouse when it appears to you as if you are
working on the other machine? I would expect that you can somehow tell
the two situations apart by looking at the active window, but I don’t
know the details of Synergy.

> * Can arbtt aggregate other sources of events?
> 
> I guess this is a corollary to the above - if, for example, someone were 
> to write something that rummaged through your Outlook calendar and 
> produced appropriate TimeLogEntry objects for the calendar event ; if 
> I'm in a meeting, I want to book all the time in that meeting to the 
> instigator of that meeting (regardless of what I'm actually doing IN the 
> meeting).

That is not supported out of the box. But you can easily emulate that
using a small program that just gives you a text input box (or a drop
down or whatever) and puts the information into its title. Then you can
match on that program’s title.
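
Such a helper could be as small as the following Python/Tk sketch. The "CurrentTask: " prefix and the function names are invented for illustration; a categorize rule would then match on whatever prefix is chosen.

```python
def window_title(task):
    """Build the title that a rule could match on (the prefix is made up)."""
    return "CurrentTask: " + task.strip()


def run_gui():
    """Show a one-line entry box; pressing Return copies its text into the title."""
    import tkinter as tk  # imported here so window_title works without a display

    root = tk.Tk()
    entry = tk.Entry(root, width=40)
    entry.pack(padx=8, pady=8)

    def update(_event=None):
        root.title(window_title(entry.get()))

    entry.bind("<Return>", update)
    update()  # start with an empty task in the title
    root.mainloop()
```

Calling `run_gui()` opens the window; typing "Meeting about tortoises" and pressing Return sets the title to "CurrentTask: Meeting about tortoises" for as long as the window stays open.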


Greetings,
Joachim


-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org



From adrian.wilkins at gmail.com  Thu Jul  3 12:51:04 2014
From: adrian.wilkins at gmail.com (Adrian Wilkins)
Date: Thu, 03 Jul 2014 11:51:04 +0100
Subject: Windows build of ARBTT
Message-ID: <53B53598.1000402@gmail.com>

More questions from the noob ...

* Is there a technical reason why the Windows build of ARBTT has lagged 
behind the Linux build so much, or is it just inertia about building it?

 From my POV I'll process the stats on my Linux machine, so if 
arbtt-capture still has the same capability and format, I don't mind so 
much.. but...

* I tried to start up the current 0.61 version and got

OpenProcess: permission denied

This seems to be a problem with requesting the PROCESS_QUERY_INFORMATION 
permission. PROCESS_QUERY_LIMITED_INFORMATION might be more appropriate 
(but XP has no such permission level).

Also our corporate image has some unpleasant corporate malware on it, 
and I'm not making any effort to start arbtt-capture with elevated 
rights. I managed to collect window titles from the current focussed 
window by using hooks without this kind of problem and without 
elevating, so maybe alternate approaches / Win32 API calls might be 
worth looking at.


From adrian.wilkins at gmail.com  Thu Jul  3 13:06:29 2014
From: adrian.wilkins at gmail.com (Adrian Wilkins)
Date: Thu, 03 Jul 2014 12:06:29 +0100
Subject: Hello and questions..
In-Reply-To: <1404384030.19001.9.camel@kirk>
References: <53B52FF3.5040308@gmail.com> <1404384030.19001.9.camel@kirk>
Message-ID: <53B53935.8050605@gmail.com>

On 03/07/14 11:40, Joachim Breitner wrote:
> Hi Adrian,
>
> Am Donnerstag, den 03.07.2014, 11:26 +0100 schrieb Adrian Wilkins:
>> * Can arbtt aggregate multiple event logs?

> in principle, yes. Most reports (everything but the interval report,
> IIRC) work correctly if you have samples in the wrong order. So you can
> use "arbtt-dump --format=Show" to export your various binary logs,
> concatenate them and load them back into a new file with
> "arbtt-import".
>

On looking I'm thinking it might be appropriate to merge the set of 
CaptureData for a given sample interval together (assuming that the two 
machines are operating on the same clock time, you could work out which 
TimeLogEntry items go together even with slight clock drift, and merge 
their CaptureData sets, or is this overthinking things?)

> Eventually we can consider supporting passing --logfile multiple times
> to arbtt-stats.
>


>> * Can arbtt sample mouse position?

> With Synergy,
> what X window is under the mouse when it appears to you as if you are
> working on the other machine? I would expect that you can somehow tell
> the two situations apart by looking at the active window, but I don’t
> know the details of Synergy.
>

Next time I'm in the office and using it I'll have a look at the logs.

>> * Can arbtt aggregate other sources of events?

> That is not supported out of the box. But you can easily emulate that
> using a small program that just gives you a text input box (or a drop
> down or whatever) and puts the information into its title. Then you can
> match on that program’s title.
>

That would be a neat workaround ; just open a notepad with a file titled 
"MeetingFor<project>.txt" (for notepads that show the filename in the 
title) :-)

I'm envisaging a program that goes through your calendar and adds
entries for each minute of the meeting along with its title / requester
/ etc ; in keeping with the spirit of no interruptions and not having to
manually remember to do things. If you could aggregate logs, programs 
that served as alternate event sources would just naturally feed into 
that feature.

My Haskell is virtually non-existent, but then my time-logging blues are 
intense, so my motivation to try and help is strong :-)


From mail at joachim-breitner.de  Thu Jul  3 13:26:52 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Thu, 03 Jul 2014 13:26:52 +0200
Subject: Hello and questions..
In-Reply-To: <53B53935.8050605@gmail.com>
References: <53B52FF3.5040308@gmail.com> <1404384030.19001.9.camel@kirk>
 <53B53935.8050605@gmail.com>
Message-ID: <1404386812.19001.13.camel@kirk>

Hi,


Am Donnerstag, den 03.07.2014, 12:06 +0100 schrieb Adrian Wilkins:
> On 03/07/14 11:40, Joachim Breitner wrote:
> > Hi Adrian,
> >
> > Am Donnerstag, den 03.07.2014, 11:26 +0100 schrieb Adrian Wilkins:
> >> * Can arbtt aggregate multiple event logs?
> 
> > in principle, yes. Most reports (everything but the interval report,
> > IIRC) work correctly if you have samples in the wrong order. So you can
> > use "arbtt-dump --format=Show" to export your various binary logs,
> > concatenate them and load them back into a new file with
> > "arbtt-import".
> >
> 
> On looking I'm thinking it might be appropriate to merge the set of 
> CaptureData for a given sample interval together (assuming that the two 
> machines are operating on the same clock time, you could work out which 
> TimeLogEntry items go together even with slight clock drift, and merge 
> their CaptureData sets, or is this overthinking things?)

ah, I see. You are working on two machines simultaneously. That’s
currently not supported and would be non-trivial.

> >> * Can arbtt aggregate other sources of events?
> 
> > That is not supported out of the box. But you can easily emulate that
> > using a small program that just gives you a text input box (or a drop
> > down or whatever) and puts the information into its title. Then you can
> > match on that program’s title.
> >
> 
> That would be a neat workaround ; just open a notepad with a file titled 
> "MeetingFor<project>.txt" (for notepads that show the filename in the 
> title) :-)
> 
> I'm envisaging a program that goes through your calendar and adds 
> entries for each minute of the meeting along with its title / requester
> / etc ; in keeping with the spirit of no interruptions and not having to
> manually remember to do things.

Right. And this program can be technically completely independent from
arbtt, in the spirit of Unix and its small dedicated tools.

>  If you could aggregate logs, programs 
> that served as alternate event sources would just naturally feed into 
> that feature.

Sorry, I can’t follow. How is that related to aggregating logs?

> My Haskell is virtually non-existent, but then my time-logging blues are 
> intense, so my motivation to try and help is strong :-)

Great :-). And a few things you can do without Haskell, anyways.

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From mail at joachim-breitner.de  Thu Jul  3 13:29:43 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Thu, 03 Jul 2014 13:29:43 +0200
Subject: Windows build of ARBTT
In-Reply-To: <53B53598.1000402@gmail.com>
References: <53B53598.1000402@gmail.com>
Message-ID: <1404386983.19001.16.camel@kirk>

Hi,

Am Donnerstag, den 03.07.2014, 11:51 +0100 schrieb Adrian Wilkins:
> More questions from the noob ...
> 
> * Is there a technical reason why the Windows build of ARBTT has lagged 
> behind the Linux build so much, or is it just inertia about building it?

Inertia. I’m not aware of any Windows users... (and for building it
I need to install Haskell and mingw and the installer and stuff under
wine, which is quite annoying).

I would be very happy if someone would step up and maintain the Windows
version of it, given that I don’t run Windows and can’t properly test
this.

> From my POV I'll process the stats on my Linux machine, so if 
> arbtt-capture still has the same capability and format, I don't mind so 
> much.. but...
> 
> * I tried to start up the current 0.61 version and got
> 
> OpenProcess: permission denied
> 
> This seems to be a problem with requesting the PROCESS_QUERY_INFORMATION 
> permission. PROCESS_QUERY_LIMITED_INFORMATION might be more appropriate 
> (but XP has no such permission level).
> 
> Also our corporate image has some unpleasant corporate malware on it, 
> and I'm not making any effort to start arbtt-capture with elevated 
> rights. I managed to collect window titles from the current focussed 
> window by using hooks without this kind of problem and without 
> elevating, so maybe alternate approaches / Win32 API calls might be 
> worth looking at.

Hmm. I guess this needs someone to look into it, but that won’t be me...
it can be you, though :-)

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From adrian.wilkins at gmail.com  Thu Jul  3 13:48:50 2014
From: adrian.wilkins at gmail.com (Adrian Wilkins)
Date: Thu, 03 Jul 2014 12:48:50 +0100
Subject: Hello and questions..
In-Reply-To: <1404386812.19001.13.camel@kirk>
References: <53B52FF3.5040308@gmail.com> <1404384030.19001.9.camel@kirk>	
 <53B53935.8050605@gmail.com> <1404386812.19001.13.camel@kirk>
Message-ID: <53B54322.7060009@gmail.com>

On 03/07/14 12:26, Joachim Breitner wrote:
> Hi,
>
>

>> I'm envisaging a program that goes through your calendar and adds
>> entries
>
> Right. And this program can be technically completely independent from
> arbtt, in the spirit of Unix and its small dedicated tools.
>

Indeed. This puts me in mind of an alternate approach - a program (or 
more than one) that arbtt-capture interrogates for event data ; so you 
could have arbtt-meeting-daemon etc, and arbtt-capture could add their 
information to its TimeLogEntry records.

But I still like the idea of being able to work with multiple logs, if 
only because I could write the meeting log thing at the end of the month 
and be able to retrospectively include the events in my arbtt-stats run 
- in much the same way that you can write new rules and get a better 
quality report from the same logged events.

>>   If you could aggregate logs, programs
>> that served as alternate event sources would just naturally feed into
>> that feature.
>
> Sorry, I can’t follow. How is that related to aggregating logs?
>

I was thinking that in order for the separate program(s) discussed above 
to produce useful inputs for arbtt-stats, it would have to aggregate
the data in those logs together such that all events for a given sample 
are considered part of the same TimeLogEntry (when being categorized).

e.g. for the hypothetical meeting analyzer program, if it produces a 
TimeLogEntry for each minute of your meeting, you wouldn't want to 
consider those separately to the other TimeLogEntry objects logged by 
arbtt-capture (I take my laptop to meetings and the activity logged 
would disagree with the meeting category in some cases).

----  meeting-analyzer log
2014-07-03 12:32:00 Meeting:$title="Meeting about tortoises"
----  arbtt-capture log
2014-07-03 12:32:05 Program:$title="Web browser : page about snakes"

Rule sets that considered Meeting to take priority over everything else 
would still put the second log entry in the "Project:Snakes" category 
rather than the "Project:Tortoises" category, because that entry has no 
Meeting.

If you rolled those two entries together based on their time being close 
to each other, then Meeting can override the web browser. I'm presuming 
here that arbtt-stats analyzes each TimeLogEntry separately.
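
The "roll entries together when their times are close" idea can be sketched in Python as follows. The data model (a timestamp plus a list of strings), the 30-second tolerance and the name `merge_nearby` are assumptions for illustration and do not reflect arbtt's TimeLogEntry or CaptureData types.

```python
from datetime import timedelta


def merge_nearby(primary, extra, tolerance=timedelta(seconds=30)):
    """Attach to each primary sample the data of extra samples within tolerance.

    Both inputs are lists of (timestamp, [data, ...]) pairs; the result has
    exactly one entry per primary sample, in the same order.
    """
    merged = []
    for when, data in primary:
        combined = list(data)
        for other_when, other_data in extra:
            if abs(other_when - when) <= tolerance:
                combined.extend(other_data)
        merged.append((when, combined))
    return merged
```

With the two example entries above (12:32:00 and 12:32:05), the capture sample would carry both the browser title and the meeting title, so a rule that gives Meeting priority could apply. The nested loop is quadratic and only meant to show the matching rule; it also sidesteps the question of differing sample rates.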


From mail at joachim-breitner.de  Thu Jul  3 13:55:51 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Thu, 03 Jul 2014 13:55:51 +0200
Subject: Hello and questions..
In-Reply-To: <53B54322.7060009@gmail.com>
References: <53B52FF3.5040308@gmail.com> <1404384030.19001.9.camel@kirk>
 <53B53935.8050605@gmail.com> <1404386812.19001.13.camel@kirk>
 <53B54322.7060009@gmail.com>
Message-ID: <1404388551.19001.21.camel@kirk>

Hi,

Am Donnerstag, den 03.07.2014, 12:48 +0100 schrieb Adrian Wilkins:
> >>   If you could aggregate logs, programs
> >> that served as alternate event sources would just naturally feed into
> >> that feature.
> >
> > Sorry, I can’t follow. How is that related to aggregating logs?
> >
> 
> I was thinking that in order for the separate program(s) discussed above 
> to produce useful inputs for arbtt-stats, it would have to aggregate
> the data in those logs together such that all events for a given sample 
> are considered part of the same TimeLogEntry (when being categorized).
> 
> e.g. for the hypothetical meeting analyzer program, if it produces a 
> TimeLogEntry for each minute of your meeting, you wouldn't want to 
> consider those separately to the other TimeLogEntry objects logged by 
> arbtt-capture (I take my laptop to meetings and the activity logged 
> would disagree with the meeting category in some cases).
> 
> ----  meeting-analyzer log
> 2014-07-03 12:32:00 Meeting:$title="Meeting about tortoises"
> ----  arbtt-capture log
> 2014-07-03 12:32:05 Program:$title="Web browser : page about snakes"
> 
> Rule sets that considered Meeting to take priority over everything else 
> would still put the second log entry in the "Project:Snakes" category 
> rather than the "Project:Tortoises" category, because that entry has no 
> Meeting.

well, if the separate programs (which put additional information into
their title) run on the same machine as the browser, arbtt will take the
samples together, and there is no need to merge them.


But it is of course a valid feature request to intelligently merge log
files from two machines so that simultaneous samples appear as one
sample. But it is rather specialized and non-trivial (what to do with
non-matching sampling rates, e.g.), so it’s not on my TODO list yet.

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From adrian.wilkins at gmail.com  Thu Jul  3 15:05:02 2014
From: adrian.wilkins at gmail.com (Adrian Wilkins)
Date: Thu, 03 Jul 2014 14:05:02 +0100
Subject: Hello and questions..
In-Reply-To: <1404388551.19001.21.camel@kirk>
References: <53B52FF3.5040308@gmail.com> <1404384030.19001.9.camel@kirk>		
 <53B53935.8050605@gmail.com> <1404386812.19001.13.camel@kirk>	
 <53B54322.7060009@gmail.com> <1404388551.19001.21.camel@kirk>
Message-ID: <53B554FE.1060309@gmail.com>

On 03/07/14 12:55, Joachim Breitner wrote:

> well, if the separate programs (which put additional information into
> their title) run on the same machine as the browser, arbtt will take the
> samples together, and there is no need to merge them.
>

Yeah, the idea of making hidden windows works nicely (maybe a little 
systray applet that collects information and holds a collection of 
objects that arbtt-capture can see), but I still like the idea of having 
programs that you don't need to run 100% of the time for things like 
calendars that are not as dynamic as user activity.

>
> But it is of course a valid feature request to intelligently merge log
> files from two machines so that simultaneous samples appear as one
> sample. But it is rather specialized and non-trivial (what to do with
> non-matching sampling rates, e.g.), so it's not on my TODO list yet.
>

I'm guessing that a more limited case, where sample rates must match, 
would be the best place to start; I don't like the prospect of matching 
up events with different sample rates either. Clock drift and different 
process start times are bad enough.
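
A minimal sketch of that limited case, assuming both logs have already been parsed into time-sorted lists of (timestamp, entries) pairs taken at the same sample rate, and that two samples belong together when their timestamps differ by no more than a tolerance. It is written in Python purely for illustration; `merge_samples`, the 30-second tolerance and the data layout are hypothetical and not part of arbtt.

```python
from datetime import datetime, timedelta


def merge_samples(primary, secondary, tolerance=timedelta(seconds=30)):
    """Attach each secondary sample to the nearest primary sample in time.

    Both inputs are lists of (timestamp, [entries]) sorted by timestamp.
    Secondary samples with no primary sample within `tolerance` are dropped.
    """
    merged = [(ts, list(entries)) for ts, entries in primary]
    if not merged:
        return merged
    i = 0
    for ts, entries in secondary:
        # Move forward while the next primary sample is at least as close.
        while (i + 1 < len(merged)
               and abs(merged[i + 1][0] - ts) <= abs(merged[i][0] - ts)):
            i += 1
        if abs(merged[i][0] - ts) <= tolerance:
            merged[i][1].extend(entries)
    return merged


# Hypothetical data modelled on the example earlier in the thread.
capture = [(datetime(2014, 7, 3, 12, 32, 5),
            ["Program: Web browser : page about snakes"])]
meetings = [(datetime(2014, 7, 3, 12, 32, 0),
             ["Meeting: Meeting about tortoises"])]
combined = merge_samples(capture, meetings)
print(combined)
```

With the tortoise/snake example, the merged sample carries both the Program and the Meeting entry, so a rule giving Meeting priority could apply to it. Clock drift larger than the tolerance would still defeat this.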

In any case... getting it to work at all on Windows would seem to be a 
challenge, which makes my multi-machine ambitions a moot point right now 
- still getting that permission denied problem, even when I run the 
application elevated, although I'm not sure what's causing it - as 
mentioned, we have some pretty heinous corporate malware installed. The 
alternate approach I tried that didn't involve OpenProcess seems to work 
without issues just from my little hacky C# application, so I shall see 
about sorting that out when I have a spare minute.

Until then, I'll be tracking my other minutes on my Linux boxes with 
more accuracy :-)


From mail at joachim-breitner.de  Sat Jul  5 13:42:34 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Sat, 05 Jul 2014 13:42:34 +0200
Subject: Listing samples which are not matched by any tags?
In-Reply-To: <CAMwO0gwVjUfvKJeJdJuAoa3=OtqxeUpiXvKxGvB=foLJcTry7Q@mail.gmail.com>
References: <CAMwO0gzuXhnoZOp-1yQE7mrpMar1HJ5tNBwzzmUZqg8dup_YaQ@mail.gmail.com>
 <1383508486.4978.2.camel@kirk>
 <CAMwO0gzGJE-PhVccnO2XYTn7-GN0eQ+Qnqp+9OU3knjCxcZE8w@mail.gmail.com>
 <1383517885.12868.10.camel@kirk>
 <CAMwO0gyw7W45WvCWabW8LXLhUf1bCwuukrfbx2B05x9SmzdUwQ@mail.gmail.com>
 <CAMwO0gwVjUfvKJeJdJuAoa3=OtqxeUpiXvKxGvB=foLJcTry7Q@mail.gmail.com>
Message-ID: <1404560554.4066.7.camel@kirk>

Dear Gwern,

(sorry for the late reply)

Am Sonntag, den 29.06.2014, 17:35 -0400 schrieb Gwern Branwen:
> On Sun, Nov 3, 2013 at 6:36 PM, Gwern Branwen <gwern at gwern.net> wrote:
> > On Sun, Nov 3, 2013 at 5:31 PM, Joachim Breitner
> > <mail at joachim-breitner.de> wrote:
> >> But now you surely want to know what these selected samples look like,
> >> right? That leads us to the discussion we had on the list with Waldir
> >> in June: What should the tool look like that combines the dumping of
> >> arbtt-dump with the sample selection of arbtt-stats... I'm unsure about
> >> the proper design here.
> >
> > To me, it seems pretty simple. Keeping the same interface, apply a
> > categorize.cfg's set of rules to each sample and then print or not
> > based on what tags matched or didn't match.
> 
> Has there been any more thought on this issue?

Have you had a look at the features in 0.8? I believe they (partly)
address the issue:
 
 * arbtt-stats can print the actual samples selected, with 
   --dump-samples.
 (http://darcs.nomeata.de/arbtt/doc/users_guide/release-notes.html)

> After repairing my logs
> & working out how to use the CSV, I wondered how much data I was
> missing due to a lack of matching tag. This apparently is reported by
> the -i flag. Even after adding some more tagging, this is what I get:
> 
>     $ arbtt-stats -i -m 0 -f '$sampleage <100:00'
>     General Information
>     ===================
>                            FirstRecord | 2014-06-26 01:33:16.291076 UTC
>                             LastRecord | 2014-06-29 21:28:30.625435 UTC
>                      Number of records |                           7485
>                    Total time recorded |                    3d19h31m00s
>                    Total time selected |                    1d12h41m10s
>        Fraction of total time recorded |                           100%
>        Fraction of total time selected |                            40%
>     Fraction of recorded time selected |                            40%
> 
> Given the existence of the flag '--also-inactive         include
> samples with the tag "inactive"', I infer all this recorded time
> reported is active time. But that means fully *60%* of my activity is
> not being classified in any way! That's a heck of a lot of lost data.

I believe you understood the flag the wrong way around: Without
--also-inactive, inactive times are _not_ counted as selected. So the
40% in your report should go up when you use "--also-inactive".

Also, the --filter in the above command causes everything that is
older than 100h (if there is any) to be considered not selected.

> And I don't know what the lost data is: I already classified
> everything I could think of. What am I missing? I have no way of
> knowing unless arbtt will tell me and give me samples of active time
> which don't match so I can go 'aha, I need to classify $X/Y/Z as tag
> A! Much better.'

What if you have a tag "current-program" that will always be present?
With such a tag, the feature you describe is useless. I guess you mean
“show me the data from samples that are not categorized into one of
these tags: ...”. But that is already possible:

$ arbtt-stats --dump-samples --filter '$sampleage < 1:00' -x Web -x Project: -x ...

Greetings,
Joachim

-- 
Joachim Breitner
  e-Mail: mail at joachim-breitner.de
  Homepage: http://www.joachim-breitner.de
  Jabber-ID: nomeata at joachim-breitner.de

From gwern at gwern.net  Sat Jul  5 18:38:59 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Sat, 5 Jul 2014 12:38:59 -0400
Subject: Listing samples which are not matched by any tags?
In-Reply-To: <1404560554.4066.7.camel@kirk>
References: <CAMwO0gzuXhnoZOp-1yQE7mrpMar1HJ5tNBwzzmUZqg8dup_YaQ@mail.gmail.com>
 <1383508486.4978.2.camel@kirk>
 <CAMwO0gzGJE-PhVccnO2XYTn7-GN0eQ+Qnqp+9OU3knjCxcZE8w@mail.gmail.com>
 <1383517885.12868.10.camel@kirk>
 <CAMwO0gyw7W45WvCWabW8LXLhUf1bCwuukrfbx2B05x9SmzdUwQ@mail.gmail.com>
 <CAMwO0gwVjUfvKJeJdJuAoa3=OtqxeUpiXvKxGvB=foLJcTry7Q@mail.gmail.com>
 <1404560554.4066.7.camel@kirk>
Message-ID: <CAMwO0gwZ69sybqg3keiRJrwqzMYWiWa36J9wJvC3VhORHxNqjg@mail.gmail.com>

On Sat, Jul 5, 2014 at 7:42 AM, Joachim Breitner
<mail at joachim-breitner.de> wrote
> Have you had a look at the features in 0.8? I believe they (partly)
> address the issue:
>
>  * arbtt-stats can print the actual samples selected, with
>    --dump-samples.
>  (http://darcs.nomeata.de/arbtt/doc/users_guide/release-notes.html)

No, I didn't even know you had added that. (I assumed you had dropped
the issue back in November.)

> What if you have a tag "current-program" that will always be present?
> With such a tag, the feature you describe is useless. I guess you mean
> “show me the data from samples that are not categorized into one of
> these tags: ...”. But that is already possible:
>
> $ arbtt-stats --dump-samples --filter '$sampleage < 1:00' -x Web -x Project: -x ...

I expected you to object to that, and I was going to point out that it
could easily be solved on the UX level by letting the user specify
either whitelists or blacklists of tags (either tags to not consider as
a match or tags to exclude). But I see that's how you solved the
problem anyway.

Trying that out now, it seems to work! If I throw in an "| fgrep
'(*)'" to look at the active window only, it looks even better.
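
The same filtering can be sketched in Python, which also makes it easy to count how often each program was the active window. This assumes every window line of the `--dump-samples` output has the form `(*) program:  title` for the active window and `( ) program:  title` otherwise; the function name `active_programs` and the sample lines are made up for the example.

```python
from collections import Counter


def active_programs(dump_lines):
    """Count samples per program among lines marked as the active window."""
    counts = Counter()
    for line in dump_lines:
        line = line.strip()
        if not line.startswith("(*)"):
            continue  # skip inactive windows and any other output
        # Everything up to the first colon is taken to be the program name.
        program, _, _title = line[3:].partition(":")
        counts[program.strip()] += 1
    return counts


sample = [
    "( ) urxvt:          /home/gwern; 2014-09-03 13:11:45 xprop",
    "(*) emacs:          emacs at elan",
    "(*) urxvt:          irssi",
    "(*) urxvt:          irssi",
    "( ) puddletag:      puddletag: /home/gwern/music",
]
counts = active_programs(sample)
print(counts.most_common())
```

Programs that show up here with a high count but no matching rule are the obvious candidates for new tags.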

I can see a lot of programs I've failed to classify (eg when Google
Reader shut down, I switched to a local RSS reader, Liferea, but
forgot to add it to arbtt), and I've spotted a few instances where my
rule didn't actually work. (eg I had been matching the program
'FBReader' for tagging time spent reading ebooks, but now that I look
at the samples, it seems the program is actually 'fbreader' and it's
the *title* which has 'FBReader' in it. I don't know if FBReader
changed its X properties at some point or if I simply confused program
with title when I was looking at it in `xprop`, but either way, it's
not working.)

This seems like a critical tool for debugging one's arbtt rules and
expanding them. Has a discussion of this been added to the manual? I
only see a mention that the option exists in `docs/arbtt.xml`.

-- 
gwern
http://www.gwern.net


From mail at joachim-breitner.de  Sat Jul  5 18:44:39 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Sat, 05 Jul 2014 18:44:39 +0200
Subject: Listing samples which are not matched by any tags?
In-Reply-To: <CAMwO0gwZ69sybqg3keiRJrwqzMYWiWa36J9wJvC3VhORHxNqjg@mail.gmail.com>
References: <CAMwO0gzuXhnoZOp-1yQE7mrpMar1HJ5tNBwzzmUZqg8dup_YaQ@mail.gmail.com>
 <1383508486.4978.2.camel@kirk>
 <CAMwO0gzGJE-PhVccnO2XYTn7-GN0eQ+Qnqp+9OU3knjCxcZE8w@mail.gmail.com>
 <1383517885.12868.10.camel@kirk>
 <CAMwO0gyw7W45WvCWabW8LXLhUf1bCwuukrfbx2B05x9SmzdUwQ@mail.gmail.com>
 <CAMwO0gwVjUfvKJeJdJuAoa3=OtqxeUpiXvKxGvB=foLJcTry7Q@mail.gmail.com>
 <1404560554.4066.7.camel@kirk>
 <CAMwO0gwZ69sybqg3keiRJrwqzMYWiWa36J9wJvC3VhORHxNqjg@mail.gmail.com>
Message-ID: <1404578679.5548.2.camel@kirk>

Hi,

Am Samstag, den 05.07.2014, 12:38 -0400 schrieb Gwern Branwen:
> Trying that out now, it seems to work! If I throw in an "| fgrep
> '(*)'" to look at the active window only, it looks even better.
> 
> I can see a lot of programs I've failed to classify (eg when Google
> Reader shut down, I switched to a local RSS reader, Liferea, but
> forgot to add it to arbtt), and I've spotted a few instances where my
> rule didn't actually work. (eg I had been matching the program
> 'FBReader' for tagging time spent reading ebooks, but now that I look
> at the samples, it seems the program is actually 'fbreader' and it's
> the *title* which has 'FBReader' in it. I don't know if FBReader
> changed its X properties at some point or if I simply confused program
> with title when I was looking at it in `xprop`, but either way, it's
> not working.)

glad to hear it works (and it clearly demonstrates the usefulness of the
a posteriori approach that we take here).

> This seems like a critical tool for debugging one's arbtt rules and
> expanding them. Has a discussion of this been added to the manual? I
> only see a mention that the option exists in `docs/arbtt.xml`.

No, but the manual is quite short on “how do I do X”. Would you be
interested in contributing here? I think that _not_ being a developer
is an advantage when writing good documentation.

Greetings,
Joachim


-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org

From gwern at gwern.net  Sat Aug 16 23:02:18 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Sat, 16 Aug 2014 17:02:18 -0400
Subject: Example of time-tracking
Message-ID: <CAMwO0gywpXtOkMp1VqA+SKdS3am-NaivADS09_UskpB-CcNuNQ@mail.gmail.com>

http://lightonphiri.org/blog/quantified-self-the-three-year-long-time-tracking-experiment

> Between June 2011 and June 2013 I diligently tracked the way that I used my 24-hour cycles. I initiated this painstaking task after going through a long spell of productivity draught; I was obsessed with how exactly I spent my “Work -> Eat -> Sleep” cycles. To achieve this, I used Hamster Time Tracker [1]. I recently decided to dig into this wealth of information and have spent the last couple of days analyzing it. While the data I collected is not 100% accurate, visible patterns emerge.
>
> ...
> - I discovered that I slept an average of 5.1 hours per day (see table below)
> - Day-to-day tasks account for an average of 36.14%, implying that I had 63.86% of productivity time at my disposal
> - I work more and often in the morning (segmenting results into day slots - dawn, wee hours, morning, mid-morning, afternoon, evening - yielded even more interesting results...)
> ...
> - I was able to figure out when I am most productive and was thus able to plan my waking life accordingly
> - I knew where most productivity leaks were coming from (social networking sites for instance) and was able to cut down on those activities when I needed to reclaim time
> - I was able to identify tasks that I could easily perform when I was in “Zombie” mode (e.g. current affairs)
> - Perhaps the most prized outcome was figuring out when I was most productive - I wrote more in the morning, I read more in the morning and did most of my coding late at night

He used Project Hamster http://projecthamster.wordpress.com/about/
which is a manual self-tracking program:

> Whenever you change from doing one task to other, you change your current activity in Hamster. After a while you can see how many hours you have spent on what. Maybe print it out, or export to some suitable format, if time reporting is a request of your employee.

So capable of more semantics than arbtt, but also a lot less
fine-grained and more work.

-- 
gwern
http://www.gwern.net


From mail at joachim-breitner.de  Sat Aug 16 23:10:18 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Sat, 16 Aug 2014 23:10:18 +0200
Subject: Example of time-tracking
In-Reply-To: <CAMwO0gywpXtOkMp1VqA+SKdS3am-NaivADS09_UskpB-CcNuNQ@mail.gmail.com>
References: <CAMwO0gywpXtOkMp1VqA+SKdS3am-NaivADS09_UskpB-CcNuNQ@mail.gmail.com>
Message-ID: <1408223418.1972.2.camel@joachim-breitner.de>

Hi,

thanks for the link!

Greetings,
Joachim


-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org

From gwern at gwern.net  Sun Aug 17 00:20:51 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Sat, 16 Aug 2014 18:20:51 -0400
Subject: Example of time-tracking
In-Reply-To: <CAMwO0gywpXtOkMp1VqA+SKdS3am-NaivADS09_UskpB-CcNuNQ@mail.gmail.com>
References: <CAMwO0gywpXtOkMp1VqA+SKdS3am-NaivADS09_UskpB-CcNuNQ@mail.gmail.com>
Message-ID: <CAMwO0gyB5ra3HSGncDtK-ZpKFbhSi6xTERpBtnp2Kw8DD0zA2g@mail.gmail.com>

On Sat, Aug 16, 2014 at 5:02 PM, Gwern Branwen <gwern at gwern.net> wrote:
>> - I work more and often in the morning (segmenting results into day slots - dawn, wee hours, morning, mid-morning, afternoon, evening - yielded even more interesting results...)

I thought I'd try to extract my own 24-h data from arbtt and see how
my factor-analysis fared, but it looks like this can't be done with
arbtt right now: I need categorization of data usage by minute or by
hour, but it seems arbtt only supports extracting data to csv by
chunks of day/month/year

               --for-each=PERIOD       one of: day, month, year

Is there a workaround here? Or does --for-each need to be extended? I
think it would be enough to add 'minute' as a period, since arbtt
isn't generally used more fine-grained than that and I can aggregate
by hour in R if it turns out that there's not enough data for
plotting/regressing by minute over 24h.
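
For what it's worth, the aggregation step itself is simple once finer-grained data is available. A rough Python sketch, assuming hypothetical per-minute records of (timestamp, tag, seconds); arbtt does not currently export data in this form, and `by_hour_of_day` and the tag names are invented for the example:

```python
from collections import defaultdict
from datetime import datetime


def by_hour_of_day(records):
    """Sum seconds per (hour of day, tag) from (timestamp, tag, seconds) records."""
    totals = defaultdict(int)
    for ts, tag, seconds in records:
        # Samples from different days fall into the same hour-of-day slot.
        totals[(ts.hour, tag)] += seconds
    return dict(totals)


# Hypothetical per-minute records spanning two days.
records = [
    (datetime(2014, 8, 15, 9, 0), "Writing", 60),
    (datetime(2014, 8, 15, 9, 1), "Writing", 60),
    (datetime(2014, 8, 16, 9, 30), "Writing", 60),
    (datetime(2014, 8, 16, 23, 5), "Coding", 60),
]
hourly = by_hour_of_day(records)
print(hourly)
```

The result is keyed by hour of day rather than by date, which is the shape needed for a 24-hour profile.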

-- 
gwern
http://www.gwern.net


From gwern0 at gmail.com  Wed Sep  3 21:30:19 2014
From: gwern0 at gmail.com (gwern0 at gmail.com)
Date: Wed, 03 Sep 2014 12:30:19 -0700 (PDT)
Subject: darcs patch: arbtt.xml: fix duplicate ID in release n... (and 3 more)
Message-ID: <54076c4b.4533e00a.0a6b.ffffe9e5@mx.google.com>

4 patches for repository http://darcs.nomeata.de/arbtt:

Wed Sep  3 15:15:04 EDT 2014  gwern at gwern.net
  * arbtt.xml: fix duplicate ID in release notes

Wed Sep  3 15:15:35 EDT 2014  gwern at gwern.net
  * arbtt.xml: delete trailing whitespace

Wed Sep  3 15:15:46 EDT 2014  gwern at gwern.net
  * arbtt.xml: add self to doc authors, add a list of similar projects to intro to describe arbtt better

Wed Sep  3 15:16:40 EDT 2014  gwern at gwern.net
  * example categorize.cfg: add local variable to set emacs to haskell-mode
  by adding the metadata, emacs users copying the example config get appropriate
  syntax highlighting without additional work, and other users are reminded that
  their favorite editor's haskell mode would work well in displaying arbtt configs
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-preview.txt
Type: text/x-darcs-patch
Size: 10770 bytes
Desc: Patch preview
URL: <https://lists.nomeata.de/pipermail/arbtt/attachments/20140903/157d6aa8/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arbtt_xml_-fix-duplicate-id-in-release-notes.dpatch
Type: application/x-darcs-patch
Size: 11376 bytes
Desc: A darcs patch for your repository!
URL: <https://lists.nomeata.de/pipermail/arbtt/attachments/20140903/157d6aa8/attachment-0001.bin>

From mail at joachim-breitner.de  Wed Sep  3 21:59:26 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Wed, 03 Sep 2014 21:59:26 +0200
Subject: darcs patch: arbtt.xml: fix duplicate ID in release n... (and 3
 more)
In-Reply-To: <54076c4b.4533e00a.0a6b.ffffe9e5@mx.google.com>
References: <54076c4b.4533e00a.0a6b.ffffe9e5@mx.google.com>
Message-ID: <1409774366.1805.4.camel@joachim-breitner.de>

Hi,


Am Mittwoch, den 03.09.2014, 12:30 -0700 schrieb gwern0 at gmail.com:
> 4 patches for repository http://darcs.nomeata.de/arbtt:

thanks! Applied.


Greetings,
Joachim
-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org

From gwern at gwern.net  Wed Sep  3 21:32:09 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Wed, 3 Sep 2014 15:32:09 -0400
Subject: Listing samples which are not matched by any tags?
In-Reply-To: <1404578679.5548.2.camel@kirk>
References: <CAMwO0gzuXhnoZOp-1yQE7mrpMar1HJ5tNBwzzmUZqg8dup_YaQ@mail.gmail.com>
 <1383508486.4978.2.camel@kirk>
 <CAMwO0gzGJE-PhVccnO2XYTn7-GN0eQ+Qnqp+9OU3knjCxcZE8w@mail.gmail.com>
 <1383517885.12868.10.camel@kirk>
 <CAMwO0gyw7W45WvCWabW8LXLhUf1bCwuukrfbx2B05x9SmzdUwQ@mail.gmail.com>
 <CAMwO0gwVjUfvKJeJdJuAoa3=OtqxeUpiXvKxGvB=foLJcTry7Q@mail.gmail.com>
 <1404560554.4066.7.camel@kirk>
 <CAMwO0gwZ69sybqg3keiRJrwqzMYWiWa36J9wJvC3VhORHxNqjg@mail.gmail.com>
 <1404578679.5548.2.camel@kirk>
Message-ID: <CAMwO0gyVmeRYoT3Xr+APx2wPNC_dZ1hjzMjZE1EP+jF21L6mCw@mail.gmail.com>

On Sat, Jul 5, 2014 at 12:44 PM, Joachim Breitner
<mail at joachim-breitner.de> wrote:
> No, but the manual is quite short on “how do I do X”. Would you be
> interested in contributing here? I think that _not_ being a developer
> is an advantage when writing good documentation.

Maybe. I may not be an arbtt developer, but I'm still not a regular
user. Regardless, I think some of the tricks and observations I made
while working with arbtt are worth including in the manual; the manual
currently gives one little idea of how one would actually go about
effectively using arbtt. I wrote up some thoughts in Markdown (below)
- I have never used that XML stuff you are using and would probably
muck it up, so hopefully you can convert the Markdown version to XML.

If other arbtt users could mention various roadblocks and solutions
they came up with, that'd be helpful.

----

The idea is that this would be placed after 'Configuring the arbtt
categorizer (arbtt-stats)'
http://arbtt.nomeata.de/doc/users_guide/configuration.html#idp20408 -

# Effective Use of Arbtt

Now that the syntax has been described & the toolbox laid out, how
does one practically go about using and configuring arbtt?

## Enabling data collection

After installing arbtt, one needs to configure it to run. There are
many ways one can run the `arbtt-capture` daemon, but a standard way
on Unix systems would be to add it as a
[`cron`](https://en.wikipedia.org/wiki/Cron) job: for example, one
could edit one's crontab file (`crontab -e`) and add a line like this:

    DISPLAY=:0
    @reboot arbtt-capture --logfile=/home/username/doc/arbtt/capture.log

At boot, `arbtt-capture` will be run in the background and will
capture a snapshot of the X metadata for all open windows every 60
seconds (the default). If one wanted more fine-grained time data at
the expense of doubling storage use per day, one could increase the
sampling rate with an option like `--sample-rate=30`. To be resilient
to any errors or segfaults, one could also wrap it in an infinite loop
to restart the daemon should it ever crash, with a command like

    DISPLAY=:0
    @reboot while true; do arbtt-capture --sample-rate=30; sleep 1m; done

## Checking data availability

arbtt tracks X properties like window title, class, and running
program, and one writes regexp rules to classify the strings as one
wishes; but this assumes that the necessary data is present in those
properties.

For some programs, this is the case. For example, web browsers like
Firefox typically set the X title to the `<title>` of the web page in
the currently-focused tab, which is enough for classification.

Some programs do not set titles or class, and all arbtt sees is empty
strings like ""; or they may set the title/class to a constant like
"Liferea", which may be acceptable if that program is used for only
one purpose, but if it is used for many purposes, then one cannot
write a rule matching it without producing highly-misleading time
analyses. (For example, a web browser may be used for countless
purposes, ranging from work to research to music to writing to
programming; but if the web browser's title/class were always just
"Web browser", how would one classify 5 hours spent using the web
browser? If the 5 hours are classified as any or all of those
purposes, then the results will be misleading garbage - one probably
didn't spend 5 hours just listening to music, but a mixture of those
purposes, which changes from day to day.)

One should check for such problematic programs when one first starts
using arbtt. It would be unfortunate if one were to log for a few months, go
back for a detailed report for some reason, and discover that the
necessary data was never actually available for arbtt to log!

Such programs can sometimes be customized internally; otherwise, one
can file a bug report with the maintainers, or set their titles
externally with [`wmctrl`](https://en.wikipedia.org/wiki/Wmctrl) or
[`xprop`](http://jonisalonen.com/2014/setting-x11-window-properties-with-xprop/).

### `xprop`

One can check the X properties of a running window by running the
command [`xprop`](http://www.xfree86.org/current/xprop.1.html) and
clicking on the window; `xprop` will print out all the relevant X
information. For example, the output for Emacs might look like this:

    $ xprop | tail -5
    WM_CLASS(STRING) = "emacs", "Emacs"
    WM_ICON_NAME(STRING) = "emacs at elan"
    _NET_WM_ICON_NAME(UTF8_STRING) = "emacs at elan"
    WM_NAME(STRING) = "emacs at elan"
    _NET_WM_NAME(UTF8_STRING) = "emacs at elan"

This is not very helpful: it does not tell us the filename being
edited, the mode being used, or anything. One could classify time
spent in Emacs as "programming" or "writing", but this would be
imperfect, especially if one does both activities regularly. However,
Emacs can be customized by editing `~/.emacs`, and after some
searching with queries like "setting Emacs window title", the [Emacs
wiki](http://www.emacswiki.org/emacs-en/FrameTitle) &
[manual](https://www.gnu.org/software/emacs/manual/html_node/efaq/Displaying-the-current-file-name-in-the-titlebar.html)
advise us to put something like this Elisp in our `.emacs` file:

    (setq frame-title-format "%f")

Now the output looks different:

    $ xprop | tail -5
    WM_CLASS(STRING) = "emacs", "Emacs"
    WM_ICON_NAME(STRING) = "/home/gwern/arbtt.page"
    _NET_WM_ICON_NAME(UTF8_STRING) = "/home/gwern/arbtt.page"
    WM_NAME(STRING) = "/home/gwern/arbtt.page"
    _NET_WM_NAME(UTF8_STRING) = "/home/gwern/arbtt.page"

With this, we can usefully classify all such time samples as being "writing".

Another common gap is terminals/shells: they often do not include
information in the title like the current working directory or last
shell command. For example, urxvt/Bash:

    WM_COMMAND(STRING) = { "urxvt" }
    _NET_WM_ICON_NAME(UTF8_STRING) = "urxvt"
    WM_ICON_NAME(STRING) = "urxvt"
    _NET_WM_NAME(UTF8_STRING) = "urxvt"
    WM_NAME(STRING) = "urxvt"

Programmers may spend many hours in the shell doing a variety of
things (as with Emacs), so this is a problem. Fortunately, this is also
solvable by customizing one's `.bashrc` so that, before each command,
the shell emits an escape code interpreted by the terminal (baroque,
but it works). The following will include the working directory, the
last command, and (if `HISTTIMEFORMAT` is set) a timestamp:

    trap 'echo -ne "\033]2;$(pwd); $(history 1 | sed "s/^[ ]*[0-9]*[ ]*//g")\007"' DEBUG

Now the urxvt samples are useful:

    _NET_WM_NAME(UTF8_STRING) = "/home/gwern/wiki; 2014-09-03 13:39:32 arbtt-stats --help"

A rule could classify based on the directory one is working in, the
command one ran, or both. Other shells like zsh can be fixed this way
too but the exact command may differ; you will need to research &
experiment.
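
The logic of such a rule can be sketched in Python (this is not arbtt rule syntax, only an illustration of the matching). It assumes titles of the form `directory; [timestamp] command` as produced by the trap above; the tag names and the `parse_title`/`classify` helpers are hypothetical:

```python
import re

# directory, then "; ", then an optional "YYYY-MM-DD HH:MM:SS " stamp, then the command
TITLE_RE = re.compile(
    r"^(?P<cwd>/[^;]*); "
    r"(?:(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) )?"
    r"(?P<command>.*)$"
)


def parse_title(title):
    """Split a terminal title into (cwd, timestamp or None, command)."""
    m = TITLE_RE.match(title)
    if m is None:
        return None
    return m.group("cwd"), m.group("stamp"), m.group("command")


def classify(title):
    """Pick a tag from the working directory first, then from the command."""
    parsed = parse_title(title)
    if parsed is None:
        return "Unclassified"
    cwd, _stamp, command = parsed
    if cwd.startswith("/home/gwern/wiki"):
        return "Writing"
    if command.split(" ", 1)[0] in {"git", "arbtt-stats"}:
        return "Programming"
    return "Shell:other"


print(classify("/home/gwern/wiki; 2014-09-03 13:39:32 arbtt-stats --help"))
print(classify("/home/gwern; 2014-09-03 13:12:21 git log -p .emacs"))
```

Checking the directory before the command is a design choice here: it lets work done inside a project directory keep that project's tag regardless of which tool was run.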

Some programs can be tricky to set. The [X image viewer
feh](http://feh.finalrewind.org/) has a `--title` option but it cannot
be set in the configuration file, `.config/feh/themes`, because it
needs to be specified dynamically; so one needs to set up a shell
alias or script to wrap the command like `feh --title "$(pwd) / %f /
%n"`.

### Raw samples

`xprop` can be tedious to use on every running window and one may not
think to check rarer programs. A better approach is to use
`arbtt-stats`'s `--dump-samples` option: this option will print out
the collected data for specified time periods, allowing one to examine
the X properties en masse. This option can be combined with the
`-x`/`--exclude=` options to print only the *samples not matched by
existing rules*, which is indispensable for improving coverage and
suggesting ideas for new rules. A good way to
figure out what customizations to make is to run arbtt as a daemon for
a day or so, and then begin examining the raw samples for problems.

An example: suppose I create a simple category file named `foo` with
just the line

    $idle > 30 ==> tag inactive

I can then dump all my arbtt samples for the past day with a command like this:

    arbtt-stats --categorizefile=foo --m=0 --filter='$sampleage <24:00' --dump-samples

Because there are so many open windows, this produces a large amount
(26586 lines) of hard-to-read output:

    ...
    ( ) Navigator:      /r/Touhou's Favorite Arranges! Part 71: Retribution for the Eternal Night ~ Imperishable Night : touhou - Iceweasel
    ( ) Navigator:      Configuring the arbtt categorizer (arbtt-stats) - Iceweasel
    ( ) evince:         ATTACHMENT02
    ( ) evince:         2009-geisler.pdf ? Heart rate variability predicts self-control in goal pursuit
    ( ) urxvt:          /home/gwern; arbtt-stats --categorizefile=foo --m=0 --filter='$sampleage <24:00' --dump-samples
    ( ) mnemosyne:      Mnemosyne
    ( ) urxvt:          /home/gwern; 2014-09-03 13:11:45 xprop
    ( ) urxvt:          /home/gwern; 2014-09-03 13:42:17 history 1 | cut --delimiter=' ' --fields=5-
    ( ) urxvt:          /home/gwern; 2014-09-03 13:12:21 git log -p .emacs
    (*) emacs:          emacs at elan
    ( ) urxvt:          /home/gwern; 2014-09-01 14:50:30 while true; do cd ~/ && getmail_fetch --ssl pop.gmail.com gwern0 'ugaozoumbhwcijxb' ./mail/; done
    ( ) urxvt:          /home/gwern/blackmarket-mirrors/silkroad2-forums; 2014-08-31 23:20:10 mv /home/gwern/cookies.txt ./; http_proxy="localhost:8118" wget...
    ( ) urxvt:          /home/gwern/blackmarket-mirrors/agora; 2014-08-31 23:15:50 mv /home/gwern/cookies.txt ./; http_proxy="localhost:8118" wget --mirror ...
    ( ) urxvt:          /home/gwern/blackmarket-mirrors/evolution-forums; 2014-08-31 23:04:10 mv ~/cookies.txt ./; http_proxy="localhost:8118" wget --mirror ...
    ( ) puddletag:      puddletag: /home/gwern/music

Active windows are denoted by an asterisk, so I can focus & simplify
by adding a pipe like `| fgrep '(*)'`, producing more manageable
output like

    (*) urxvt:          irssi
    (*) urxvt:          irssi
    (*) urxvt:          irssi
    (*) Navigator:      Pyramid of Technology - NextNature.net - Iceweasel
    (*) Navigator:      Search results - gwern0 at gmail.com - Gmail - Iceweasel
    (*) Navigator:      [New comment] The Wrong Path -
gwern0 at gmail.com - Gmail - Iceweasel
    (*) Navigator:      Iceweasel
    (*) Navigator:      Litecoin Exchange Rate - $4.83 USD -
litecoinexchangerate.org - Iceweasel
    (*) Navigator:      PredictionBook: LiteCoin will trade at >=10
USD per ltc in 2 years, - Iceweasel
    (*) urxvt:          irssi
    (*) Navigator:      Bug#691547 closed by Mikhail Gusarov
<dottedmag at dottedmag.net> (Re: s3cmd: Man page: --default-mime-type
documentation incomplete...)
    (*) Navigator:      Bug#691547 closed by Mikhail Gusarov
<dottedmag at dottedmag.net> (Re: s3cmd: Man page: --default-mime-type
documentation incomplete...)
    (*) Navigator:      Bug#691547 closed by Mikhail Gusarov
<dottedmag at dottedmag.net> (Re: s3cmd: Man page: --default-mime-type
documentation incomplete...)
    (*) urxvt:          /home/gwern; 2014-09-02 14:25:17 man s3cmd
    (*) evince:         bayesiancausality.pdf
    (*) evince:         bayesiancausality.pdf
    (*) puddletag:      puddletag: /home/gwern/music
    (*) puddletag:      puddletag: /home/gwern/music
    (*) evince:         bayesiancausality.pdf
    (*) Navigator:      ? Umineko no Naku Koro ni Music Box 4 - ??????
?2?? ??? - YouTube - Iceweasel
    ...

This is better. We can see a few things: the windows all now produce
enough information to be usefully classified (Gmail can be classified
under email, irssi can be classified as IRC, the urxvt usage can
clearly be classified as programming, the PDF being read is
statistics, etc) in part because of customizations to bash/urxvt. The
duplication still impedes focus, and we don't know what's most common.
We can use another pipeline to sort, count duplicates, and sort by
number of duplicates (`| sort | uniq --count | sort
--general-numeric-sort`), yielding:

     ...
     14     (*) Navigator:      A Bluer Shade of White Chapter 4, a
frozen fanfic | FanFiction - Iceweasel
     14     (*) Navigator:      Iceweasel
     15     (*) evince:         2009-geisler.pdf ? Heart rate
variability predicts self-control in goal pursuit
     15     (*) Navigator:      Tool use by animals - Wikipedia, the
free encyclopedia - Iceweasel
     16     (*) Navigator:      Hacker News | Add Comment - Iceweasel
     17     (*) evince:         bayesiancausality.pdf
     17     (*) Navigator:      Comments - Less Wrong Discussion - Iceweasel
     17     (*) Navigator:      Keith Gessen ? Why not kill them all?:
In Donetsk ? LRB 11 September 2014 - Iceweasel
     17     (*) Navigator:      Notes on the Celebrity Data Theft |
Hacker News - Iceweasel
     18     (*) Navigator:      A Bluer Shade of White Chapter 1, a
frozen fanfic | FanFiction - Iceweasel
     19     (*) gl:             mplayer2
     19     (*) Navigator:      Neural networks and deep learning - Iceweasel
     20     (*) Navigator:      Harry Potter and the Philosopher's
Zombie, a harry potter fanfic | FanFiction - Iceweasel
     20     (*) Navigator:      [OBNYC] Time tracking app -
gwern0 at gmail.com - Gmail - Iceweasel
     25     (*) evince:         ps2007.pdf ? untitled
     35     (*) emacs:          /home/gwern/arbtt.page
     43     (*) Navigator:      CCC comments on The Octopus, the
Dolphin and Us: a Great Filter tale - Less Wrong - Iceweasel
     62     (*) evince:         The physics of information processing
superobjects - Anders Sandberg - 1999.pdf ? Brains2
     69     (*) liferea:        Liferea
     82     (*) evince:         BMS_raftery.pdf ? untitled
     84     (*) emacs:          emacs at elan
     87     (*) Navigator:      overview for gwern - Iceweasel
    109     (*) puddletag:      puddletag: /home/gwern/music
    150     (*) urxvt:          irssi
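
To see what each stage of that pipeline contributes, here is a small
sketch that can be run without arbtt. The sample lines are made up to
resemble `--dump-samples` output, and the function name `count_active`
is only an illustrative helper, not part of arbtt. It uses `grep -F`
(the current spelling of `fgrep`) and the short options `uniq -c` and
`sort -n`, which should behave the same as the long options above for
plain integer counts.

```shell
# Keep only active-window lines, then count duplicates, least common first.
count_active() {
    grep -F '(*)' | sort | uniq -c | sort -n
}

# Made-up sample lines standing in for `arbtt-stats --dump-samples` output.
printf '%s\n' \
    '    (*) urxvt:          irssi' \
    '    ( ) emacs:          emacs at elan' \
    '    (*) evince:         bayesiancausality.pdf' \
    '    (*) urxvt:          irssi' \
    | count_active
```

The inactive emacs line is dropped, and the two identical irssi lines
collapse into a single line with a count of 2, sorted last.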

Put this way, we can see what rules we should write: we could sort the
activities here into a few categories ("recreational", "statistics",
"music", "email", "IRC", "research", & "writing") and add rules like
these to `categorize.cfg`:

    $idle > 30 ==> tag inactive,

    current window $title =~ [/.*Hacker News.*/, /.*Less Wrong.*/, /.*overview for gwern.*/, /.*[fF]an[fF]ic.*/, /.* LRB .*/]
      || current window $program == "liferea" ==> tag Recreation,
    current window $title =~ [/.*puddletag.*/, /.*mplayer2.*/] ==> tag Music,
    current window $title =~ [/.*[bB]ayesian.*/, /.*[nN]eural [nN]etworks.*/, /.*ps2007.pdf.*/, /.*[Rr]aftery.*/] ==> tag Statistics,
    current window $title =~ [/.*Wikipedia.*/, /.*Heart rate variability.*/, /.*Anders Sandberg.*/] ==> tag Research,
    current window $title =~ [/.*Gmail.*/] ==> tag Email,
    current window $title =~ [/.*arbtt.*/] ==> tag Writing,
    current window $title == "irssi" ==> tag IRC,

If we reran the command, we'd see the same output, so we need to
leverage our new rules and *exclude* any samples matching our current
tags, so now we run a command like:

    arbtt-stats --categorizefile=foo --filter='$sampleage <24:00' --dump-samples \
                 --exclude=Recreation --exclude=Music --exclude=Statistics \
                 --exclude=Research --exclude=Email --exclude=Writing --exclude=IRC |
                 fgrep '(*)' | sort | uniq --count | sort --general-numeric-sort

Now the previous samples disappear, leaving us with a fresh batch of
unclassified samples to work with:

      9     (*) Navigator:      New Web Order > Nik Cubrilovic - -
Notes on the Celebrity Data Theft - Iceweasel
      9     ( ) urxvt:          /home/gwern; arbtt-stats
--categorizefile=foo --filter='$sampleage <24:00' --dump-samples |
fgrep '(*)' | less
     10     (*) evince:         ATTACHMENT02
     10     (*) Navigator:      These Giant Copper Orbs Show Just How
Much Metal Comes From a Mine | Design | WIRED - Iceweasel
     12     (*) evince:
[Jon_Elster]_Alchemies_of_the_Mind_Rationality_an(BookFi.org).pdf ?
Alchemies of the mind
     12     (*) Navigator:      Morality Quiz/Test your Morals, Values
& Ethics - YourMorals.Org - Iceweasel
     33     ( ) urxvt:          /home/gwern; arbtt-stats
--categorizefile=foo --filter='$sampleage <24:00' --dump-samples |
fgrep '(*)'...

We can add rules categorizing these as 'Recreation', 'Writing',
'Research', 'Recreation', 'Research', 'Writing', and 'Writing'
respectively; and we might decide at this point that 'Writing' is
starting to become overloaded, so we'll split it into two tags,
'Writing' and 'Programming'. And then after tossing another
`--exclude=Programming` into our command, we can repeat the process.

As we refine our rules, we will quickly spot instances where the
title/class/program are insufficient to allow accurate classification,
and we will figure out the best collection of tags for our particular
purposes. A few iterations is enough for most purposes.
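
Since the list of `--exclude` options grows with every iteration, it
can be convenient to generate it from a list of tag names. The
following is only a sketch: `exclude_flags` and `done_tags` are
hypothetical names, and the script prints the resulting command rather
than running it, so it works even where arbtt is not installed.

```shell
# Turn a list of tag names into the matching --exclude flags.
exclude_flags() {
    flags=""
    for tag in "$@"; do
        flags="$flags --exclude=$tag"
    done
    # Strip the leading space before printing.
    printf '%s\n' "${flags# }"
}

# Tags already covered by rules; extend this list after each iteration.
done_tags="Recreation Music Statistics Research Email Writing IRC"

# Print the command instead of running it.
echo "arbtt-stats --categorizefile=foo --filter='\$sampleage <24:00' --dump-samples $(exclude_flags $done_tags)"
```

After adding a 'Programming' tag, one would append it to `done_tags`
and rerun.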

## Categorizing advice

When building up rules, a few rules of thumb should be kept in mind:

1. categorize by purpose, not by program

    Categorizing by program produces misleading time reports. Avoid,
for example, lumping all web browser time into a single category named
'Internet'. Good categories describe an activity or goal, such as
'Work' or 'Recreation', not a tool, like 'Emacs' or 'Vim'.
2. when in doubt, write narrow rules and generalize later

    Regexps are tricky and it can be easy to write rules far broader
than one intended. The `--exclude` filters mean that one will never
see samples which are matched accidentally. If one is in doubt, it can
be helpful to take a specific sample one wants to match and several
similar strings and look at how well one's regexp rule works in
Emacs's [regexp-builder](http://www.emacswiki.org/emacs/ReBuilder) or
online regexp-testers like [regexpal](http://regexpal.com/).
3. don't try to classify everything

    You will never classify 100% of samples because sometimes programs
do not include useful X properties & cannot be fixed, you have samples
from before you fixed them, or they are too transient (like popups and
dialogues) to be worth fixing. It is not necessary to classify 100% of
your time, since as long as the most common programs and, say,
[80%](https://en.wikipedia.org/wiki/Pareto_principle) of your time is
classified, then you have most of the value. It is easy to waste more
time tweaking arbtt than one gains from increased accuracy or more
finely-grained tags.
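
For the second rule of thumb, a regexp can also be tried against a
handful of titles from the shell before it goes into a rule. This is a
rough sketch under two assumptions: `try_regexp` is a hypothetical
helper, and `grep -E` (POSIX extended regexps) stands in for arbtt's
own regexp engine, which is close enough only for simple patterns like
these. The first title is taken from the samples above; the other two
are invented near-misses.

```shell
# Print each candidate title with MATCH or miss for the given pattern.
# Assumption: grep -E approximates arbtt's regexp matching for simple patterns.
try_regexp() {
    pattern=$1
    while IFS= read -r title; do
        if printf '%s\n' "$title" | grep -Eq -- "$pattern"; then
            printf 'MATCH  %s\n' "$title"
        else
            printf 'miss   %s\n' "$title"
        fi
    done
}

# One title we want to match, plus similar ones we do not.
titles='A Bluer Shade of White Chapter 4, a frozen fanfic | FanFiction - Iceweasel
Fantastic Mr. Fox - Wikipedia, the free encyclopedia - Iceweasel
Fan control in Linux - Iceweasel'

echo "broad pattern [fF]an:"
printf '%s\n' "$titles" | try_regexp '[fF]an'
echo "narrow pattern [fF]an[fF]ic:"
printf '%s\n' "$titles" | try_regexp '[fF]an[fF]ic'
```

The broad pattern matches all three titles, while the narrow one
matches only the fanfic page, which is the kind of difference that is
easy to miss once `--exclude` hides the accidental matches.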

## Long-term storage

Each halving of the sampling interval doubles the number of samples
taken and hence the storage requirement; sampling intervals below 20s
are probably wasteful. But even the default 60s can accumulate into a
nontrivial amount of data over a year. A constantly-changing binary
file can interact poorly with backup systems, may make arbtt analyses
slower, and, if one's system occasionally crashes or experiences other
problems, can suffer some corruption, which means the nuisance of
having to run `arbtt-recover`.
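
As a back-of-the-envelope check on how quickly samples pile up, the
arithmetic can be done in the shell. This sketch assumes capturing
runs around the clock all year, so the figures are upper bounds;
`samples_per_year` is just an illustrative helper.

```shell
# Samples recorded per year for a given sampling interval in seconds,
# assuming capture runs 24 hours a day, 365 days a year (an upper bound).
samples_per_year() {
    echo $(( 365 * 24 * 3600 / $1 ))
}

for interval in 60 30 20; do
    echo "${interval}s interval: $(samples_per_year "$interval") samples/year"
done
```

At the default 60s this comes to 525600 samples a year, and each
halving of the interval doubles it.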

Thus it may be a good idea to archive one's `capture.log` on an annual
basis. If one needs to query the historical data, the particular log
file can be specified as an option like
`--logfile=/home/gwern/doc/arbtt/2013-2014.log`
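
The archiving step itself can be a one-line move. The sketch below is
hedged: `archive_log` is a hypothetical helper, it assumes the capture
daemon has been stopped first so the file is not being written to, and
it demonstrates on a throwaway file rather than a real `capture.log`.

```shell
# Move a log file into an archive directory under a dated name.
# Assumption: capturing has been stopped, so the file is not in use.
archive_log() {
    log=$1; archive_dir=$2; label=$3
    mkdir -p "$archive_dir" && mv -- "$log" "$archive_dir/$label.log"
}

# Demonstration on a throwaway file rather than a real capture.log.
tmp=$(mktemp -d)
echo "placeholder" > "$tmp/capture.log"
archive_log "$tmp/capture.log" "$tmp/archive" "2013-2014"
ls "$tmp/archive"
rm -r "$tmp"
```

In real use, the first argument would be the path of one's own
`capture.log` and the second the archive directory, after which the
archived file can be queried with `--logfile` as above.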

## Advanced queries

arbtt supports CSV export of time by category in various levels of
granularity in a 'long' format (multiple rows for each day, with each
row specifying one category's value for that day). These CSV exports
can be imported into statistical programs like R or Excel and
manipulated as desired.

R users may prefer to have their time data in a 'wide' format (each
row is 1 day, with one column for each possible category); this can
be done with the `reshape` function that ships with R. After reading
in the CSV, the time-intervals can be converted to counts of seconds
and the data to a wide data-frame with R code like the following:

    arbtt <- read.csv("arbtt.csv")
    # convert a duration such as "30 s" or "1:02:03" into seconds
    interval <- function(x) {
        if (is.na(x)) return(NA)
        if (grepl(" s", x)) {
            as.integer(sub(" s", "", x))
        } else {
            y <- unlist(strsplit(x, ":"))
            as.integer(y[[1]])*3600 + as.integer(y[[2]])*60 + as.integer(y[[3]])
        }
    }
    arbtt$Time <- sapply(as.character(arbtt$Time), interval)
    # reshape() comes from the stats package loaded by default, so no library() call is needed
    arbtt <- reshape(arbtt, v.names="Time", timevar="Tag", idvar="Day", direction="wide")

-----

-- 
gwern
http://www.gwern.net
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arbtt.page
Type: application/octet-stream
Size: 19514 bytes
Desc: not available
URL: <https://lists.nomeata.de/pipermail/arbtt/attachments/20140903/12431bea/attachment.obj>

From gwern at gwern.net  Sat Sep  6 04:43:07 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Fri, 5 Sep 2014 22:43:07 -0400
Subject: Listing samples which are not matched by any tags?
In-Reply-To: <CAMwO0gyVmeRYoT3Xr+APx2wPNC_dZ1hjzMjZE1EP+jF21L6mCw@mail.gmail.com>
References: <CAMwO0gzuXhnoZOp-1yQE7mrpMar1HJ5tNBwzzmUZqg8dup_YaQ@mail.gmail.com>
 <1383508486.4978.2.camel@kirk>
 <CAMwO0gzGJE-PhVccnO2XYTn7-GN0eQ+Qnqp+9OU3knjCxcZE8w@mail.gmail.com>
 <1383517885.12868.10.camel@kirk>
 <CAMwO0gyw7W45WvCWabW8LXLhUf1bCwuukrfbx2B05x9SmzdUwQ@mail.gmail.com>
 <CAMwO0gwVjUfvKJeJdJuAoa3=OtqxeUpiXvKxGvB=foLJcTry7Q@mail.gmail.com>
 <1404560554.4066.7.camel@kirk>
 <CAMwO0gwZ69sybqg3keiRJrwqzMYWiWa36J9wJvC3VhORHxNqjg@mail.gmail.com>
 <1404578679.5548.2.camel@kirk>
 <CAMwO0gyVmeRYoT3Xr+APx2wPNC_dZ1hjzMjZE1EP+jF21L6mCw@mail.gmail.com>
Message-ID: <CAMwO0gx4vc2jkzWpYp5Ayv7DzYWf5JbcKxRAtMvLiurOatorDg@mail.gmail.com>

On Wed, Sep 3, 2014 at 3:32 PM, Gwern Branwen <gwern at gwern.net> wrote:
> 3. don't try to classify everything
>
>     You will never classify 100% of samples because sometimes programs
> do not include useful X properties & cannot be fixed, you have samples
> from before you fixed them, or they are too transient (like popups and
> dialogues) to be worth fixing. It is not necessary to classify 100% of
> your time, since as long as the most common programs and, say,
> [80%](https://en.wikipedia.org/wiki/Pareto_principle) of your time is
> classified, then you have most of the value. It is easy to waste more
> time tweaking arbtt than one gains from increased accuracy or more
> finely-grained tags.

A fourth guideline just occurred to me.

4. avoid large and microscopic tags

    If a tag takes up more than a third or so of your time, it is
probably too large, masks variation, and can be broken down into more
meaningful tags. Conversely, a tag so narrow that it rarely shows up in
reports (because it falls below the default 1% filter) is probably not
helpful, and can be merged into the most similar tag to yield more
compact and easily interpreted reports.

-- 
gwern
http://www.gwern.net


From mail at joachim-breitner.de  Sun Sep 14 23:12:41 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Sun, 14 Sep 2014 23:12:41 +0200
Subject: Listing samples which are not matched by any tags?
In-Reply-To: <CAMwO0gymLW37NUJmtp5uCavnVJsK83o1B-BFmOgwAqz9Z4r88w@mail.gmail.com>
References: <CAMwO0gzuXhnoZOp-1yQE7mrpMar1HJ5tNBwzzmUZqg8dup_YaQ@mail.gmail.com>
 <1383508486.4978.2.camel@kirk>
 <CAMwO0gzGJE-PhVccnO2XYTn7-GN0eQ+Qnqp+9OU3knjCxcZE8w@mail.gmail.com>
 <1383517885.12868.10.camel@kirk>
 <CAMwO0gyw7W45WvCWabW8LXLhUf1bCwuukrfbx2B05x9SmzdUwQ@mail.gmail.com>
 <CAMwO0gwVjUfvKJeJdJuAoa3=OtqxeUpiXvKxGvB=foLJcTry7Q@mail.gmail.com>
 <1404560554.4066.7.camel@kirk>
 <CAMwO0gwZ69sybqg3keiRJrwqzMYWiWa36J9wJvC3VhORHxNqjg@mail.gmail.com>
 <1404578679.5548.2.camel@kirk>
 <CAMwO0gyVmeRYoT3Xr+APx2wPNC_dZ1hjzMjZE1EP+jF21L6mCw@mail.gmail.com>
 <CAMwO0gymLW37NUJmtp5uCavnVJsK83o1B-BFmOgwAqz9Z4r88w@mail.gmail.com>
Message-ID: <1410729161.2862.10.camel@joachim-breitner.de>

Hi Gwern,

Am Sonntag, den 14.09.2014, 17:06 -0400 schrieb Gwern Branwen:
> Any thoughts on this? The guide was helpful to at least one new arbtt
> user I gave a link to.

I'm terribly sorry for not replying to your mails in time, and the three
unread mails in the arbtt folder keep reminding me of that, but other
things keep (including travel to DebConf and ICFP, and the work that had
to wait for that) having a higher priority.

I'll set my mind to working through them this week.

Thanks for your patience,
Joachim

-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From gwern at gwern.net  Sun Sep 14 23:06:35 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Sun, 14 Sep 2014 17:06:35 -0400
Subject: Listing samples which are not matched by any tags?
In-Reply-To: <CAMwO0gyVmeRYoT3Xr+APx2wPNC_dZ1hjzMjZE1EP+jF21L6mCw@mail.gmail.com>
References: <CAMwO0gzuXhnoZOp-1yQE7mrpMar1HJ5tNBwzzmUZqg8dup_YaQ@mail.gmail.com>
 <1383508486.4978.2.camel@kirk>
 <CAMwO0gzGJE-PhVccnO2XYTn7-GN0eQ+Qnqp+9OU3knjCxcZE8w@mail.gmail.com>
 <1383517885.12868.10.camel@kirk>
 <CAMwO0gyw7W45WvCWabW8LXLhUf1bCwuukrfbx2B05x9SmzdUwQ@mail.gmail.com>
 <CAMwO0gwVjUfvKJeJdJuAoa3=OtqxeUpiXvKxGvB=foLJcTry7Q@mail.gmail.com>
 <1404560554.4066.7.camel@kirk>
 <CAMwO0gwZ69sybqg3keiRJrwqzMYWiWa36J9wJvC3VhORHxNqjg@mail.gmail.com>
 <1404578679.5548.2.camel@kirk>
 <CAMwO0gyVmeRYoT3Xr+APx2wPNC_dZ1hjzMjZE1EP+jF21L6mCw@mail.gmail.com>
Message-ID: <CAMwO0gymLW37NUJmtp5uCavnVJsK83o1B-BFmOgwAqz9Z4r88w@mail.gmail.com>

On Wed, Sep 3, 2014 at 3:32 PM, Gwern Branwen <gwern at gwern.net> wrote:
> Maybe. I may not be an arbtt developer, but I'm still not a regular
> user. Regardless, I think some of the tricks and observations I made
> while working with arbtt are worth including in the manual; the manual
> currently gives one little idea how one would actually go about
> effectively using arbtt. I wrote up some thoughts in Markdown (below)
> - I have never used that XML stuff you are using and would probably
> muck it up, so hopefully you can convert the Markdown version to XML.
>
> If other arbtt users could mention various roadblocks and solutions
> they came up with, that'd be helpful.
>
> ...

Any thoughts on this? The guide was helpful to at least one new arbtt
user I gave a link to.

-- 
gwern
http://www.gwern.net


From gwern at gwern.net  Sun Sep 14 23:26:20 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Sun, 14 Sep 2014 17:26:20 -0400
Subject: Listing samples which are not matched by any tags?
In-Reply-To: <1410729161.2862.10.camel@joachim-breitner.de>
References: <CAMwO0gzuXhnoZOp-1yQE7mrpMar1HJ5tNBwzzmUZqg8dup_YaQ@mail.gmail.com>
 <1383508486.4978.2.camel@kirk>
 <CAMwO0gzGJE-PhVccnO2XYTn7-GN0eQ+Qnqp+9OU3knjCxcZE8w@mail.gmail.com>
 <1383517885.12868.10.camel@kirk>
 <CAMwO0gyw7W45WvCWabW8LXLhUf1bCwuukrfbx2B05x9SmzdUwQ@mail.gmail.com>
 <CAMwO0gwVjUfvKJeJdJuAoa3=OtqxeUpiXvKxGvB=foLJcTry7Q@mail.gmail.com>
 <1404560554.4066.7.camel@kirk>
 <CAMwO0gwZ69sybqg3keiRJrwqzMYWiWa36J9wJvC3VhORHxNqjg@mail.gmail.com>
 <1404578679.5548.2.camel@kirk>
 <CAMwO0gyVmeRYoT3Xr+APx2wPNC_dZ1hjzMjZE1EP+jF21L6mCw@mail.gmail.com>
 <CAMwO0gymLW37NUJmtp5uCavnVJsK83o1B-BFmOgwAqz9Z4r88w@mail.gmail.com>
 <1410729161.2862.10.camel@joachim-breitner.de>
Message-ID: <CAMwO0gxzDRdr6JmSZ4TQRu5JGsaV73se20vReGhmFhnvWTFpLg@mail.gmail.com>

On Sun, Sep 14, 2014 at 5:12 PM, Joachim Breitner
<mail at joachim-breitner.de> wrote:
> I'm terribly sorry for not replying to your mails in time, and the three
> unread mails in the arbtt folder keep reminding me of that, but other
> things keep (including travel to DebConf and ICFP, and the work that had
> to wait for that) having a higher priority.

Alright. I wasn't sure if you were even getting my emails since they
seemed to be nonexistent & eaten by a filter for the list when I
checked.

-- 
gwern
http://www.gwern.net


From mail at joachim-breitner.de  Wed Sep 17 11:45:06 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Wed, 17 Sep 2014 11:45:06 +0200
Subject: Example of time-tracking
In-Reply-To: <CAMwO0gyB5ra3HSGncDtK-ZpKFbhSi6xTERpBtnp2Kw8DD0zA2g@mail.gmail.com>
References: <CAMwO0gywpXtOkMp1VqA+SKdS3am-NaivADS09_UskpB-CcNuNQ@mail.gmail.com>
 <CAMwO0gyB5ra3HSGncDtK-ZpKFbhSi6xTERpBtnp2Kw8DD0zA2g@mail.gmail.com>
Message-ID: <1410947105.2613.2.camel@joachim-breitner.de>

Hi Gwern,

on a train ride now, so enough time to work through your (very welcome!)
messages.


Am Samstag, den 16.08.2014, 18:20 -0400 schrieb Gwern Branwen:
> On Sat, Aug 16, 2014 at 5:02 PM, Gwern Branwen <gwern at gwern.net> wrote:
> >> - I work more and often in the morning (segmenting results into day slots -- dawn, wee hours, morning, mid-morning, afternoon, evening -- yielded even more interesting results?)
> 
> I thought I'd try to extract my own 24-h data from arbtt and see how
> my factor-analysis fared, but it looks like this can't be done with
> arbtt right now: I need categorization of data usage by minute or by
> hour, but it seems arbtt only supports extracting data to csv by
> chunks of day/month/year
> 
>                --for-each=PERIOD       one of: day, month, year
> 
> Is there a workaround here? Or does --for-each need to be extended? I
> think it would be enough to add 'minute' as a period, since arbtt
> isn't generally used more fine-grained than that and I can aggregate
> by hour in R if it turns out that there's not enough data for
> plotting/regressing by minute over 24h.

The easiest is to extend for-each; just did that with minute and hour.

Greetings,
Joachim


-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From mail at joachim-breitner.de  Wed Sep 17 12:34:13 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Wed, 17 Sep 2014 12:34:13 +0200
Subject: Listing samples which are not matched by any tags?
In-Reply-To: <CAMwO0gx4vc2jkzWpYp5Ayv7DzYWf5JbcKxRAtMvLiurOatorDg@mail.gmail.com>
References: <CAMwO0gzuXhnoZOp-1yQE7mrpMar1HJ5tNBwzzmUZqg8dup_YaQ@mail.gmail.com>
 <1383508486.4978.2.camel@kirk>
 <CAMwO0gzGJE-PhVccnO2XYTn7-GN0eQ+Qnqp+9OU3knjCxcZE8w@mail.gmail.com>
 <1383517885.12868.10.camel@kirk>
 <CAMwO0gyw7W45WvCWabW8LXLhUf1bCwuukrfbx2B05x9SmzdUwQ@mail.gmail.com>
 <CAMwO0gwVjUfvKJeJdJuAoa3=OtqxeUpiXvKxGvB=foLJcTry7Q@mail.gmail.com>
 <1404560554.4066.7.camel@kirk>
 <CAMwO0gwZ69sybqg3keiRJrwqzMYWiWa36J9wJvC3VhORHxNqjg@mail.gmail.com>
 <1404578679.5548.2.camel@kirk>
 <CAMwO0gyVmeRYoT3Xr+APx2wPNC_dZ1hjzMjZE1EP+jF21L6mCw@mail.gmail.com>
 <CAMwO0gx4vc2jkzWpYp5Ayv7DzYWf5JbcKxRAtMvLiurOatorDg@mail.gmail.com>
Message-ID: <1410950053.2613.8.camel@joachim-breitner.de>

Hi,


Am Freitag, den 05.09.2014, 22:43 -0400 schrieb Gwern Branwen:
> A fourth guideline just occurred to me.

again thanks, added!

(Personally, I don't mind such tags, e.g. I do have a tag for browsing
the web. As tags are not mutually exclusive, this does not in any way
interfere with more useful tags.)

Greetings,
Joachim

-- 
Joachim Breitner
  e-Mail: mail at joachim-breitner.de
  Homepage: http://www.joachim-breitner.de
  Jabber-ID: nomeata at joachim-breitner.de


From mail at joachim-breitner.de  Wed Sep 17 12:24:06 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Wed, 17 Sep 2014 12:24:06 +0200
Subject: Listing samples which are not matched by any tags?
In-Reply-To: <CAMwO0gyVmeRYoT3Xr+APx2wPNC_dZ1hjzMjZE1EP+jF21L6mCw@mail.gmail.com>
References: <CAMwO0gzuXhnoZOp-1yQE7mrpMar1HJ5tNBwzzmUZqg8dup_YaQ@mail.gmail.com>
 <1383508486.4978.2.camel@kirk>
 <CAMwO0gzGJE-PhVccnO2XYTn7-GN0eQ+Qnqp+9OU3knjCxcZE8w@mail.gmail.com>
 <1383517885.12868.10.camel@kirk>
 <CAMwO0gyw7W45WvCWabW8LXLhUf1bCwuukrfbx2B05x9SmzdUwQ@mail.gmail.com>
 <CAMwO0gwVjUfvKJeJdJuAoa3=OtqxeUpiXvKxGvB=foLJcTry7Q@mail.gmail.com>
 <1404560554.4066.7.camel@kirk>
 <CAMwO0gwZ69sybqg3keiRJrwqzMYWiWa36J9wJvC3VhORHxNqjg@mail.gmail.com>
 <1404578679.5548.2.camel@kirk>
 <CAMwO0gyVmeRYoT3Xr+APx2wPNC_dZ1hjzMjZE1EP+jF21L6mCw@mail.gmail.com>
Message-ID: <1410949446.2613.5.camel@joachim-breitner.de>

Hi,


Am Mittwoch, den 03.09.2014, 15:32 -0400 schrieb Gwern Branwen:
> On Sat, Jul 5, 2014 at 12:44 PM, Joachim Breitner
> <mail at joachim-breitner.de> wrote:
> > No, but the manual is quite short on "how do I do X". Would you be
> > interested in contributing here? I think that _not_ being a developer
> > is an advantage when writing good documentation.
> 
> Maybe. I may not be an arbtt developer, but I'm still not a regular
> user.

Not sure what you mean. As far as I know, you might actually be the only
user :-)

> Regardless, I think some of the tricks and observations I made
> while working with arbtt are worth including in the manual; the manual
> currently gives one little idea how one would actually go about
> effectively using arbtt. I wrote up some thoughts in Markdown (below)

Great, thanks a lot!

I have included it (conversion was very easy, thanks to the amazing
"pandoc"), and did one round of copy-editing -- as a separate patch, if
you want to review the changes. In particular, I found it nicer to
address the user directly with "you", instead of "one".

> - I have never used that XML stuff you are using and would probably
> muck it up, so hopefully you can convert the Markdown version to XML.

Don't be afraid of editing the docbook directly; it's rather simple, and
running "make" will tell you if you broke something, and immediately give
you the HTML output if not. And it has some possibilities that pandoc
does not have, such as linking to other sections, special mark-up for
examples etc.

And if in doubt, you can still use markdown and convert it using
pandoc :-)


Anyways, thanks a lot!
Joachim
-- 
Joachim Breitner
  e-Mail: mail at joachim-breitner.de
  Homepage: http://www.joachim-breitner.de
  Jabber-ID: nomeata at joachim-breitner.de


From gwern at gwern.net  Fri Sep 26 04:57:28 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Thu, 25 Sep 2014 22:57:28 -0400
Subject: Configuring arbtt guide, Re: Listing samples which are not matched by
 any tags?
Message-ID: <CAMwO0gwCRZuZUM_cn0Jgr9m0BwhUVELFEg8rbzdyD6oHrwknsQ@mail.gmail.com>

On Wed, Sep 17, 2014 at 6:34 AM, Joachim Breitner
<mail at joachim-breitner.de> wrote:
> again thanks, added!

I just took a look at
http://arbtt.nomeata.de/doc/users_guide/effective-use.html - the guide
looks good and I hope people find it helpful. While looking it over, I
noticed some typos and other issues, so I've sent in another patch.

-- 
gwern
http://www.gwern.net


From gwern0 at gmail.com  Fri Sep 26 04:56:11 2014
From: gwern0 at gmail.com (gwern0 at gmail.com)
Date: Thu, 25 Sep 2014 19:56:11 -0700 (PDT)
Subject: darcs patch: arbtt.xml: cpedit based on reading live version http:/...
Message-ID: <5424d5cb.e61b8c0a.2185.ffff96ac@mx.google.com>

1 patch for repository http://darcs.nomeata.de/arbtt:

Thu Sep 25 22:55:47 EDT 2014  gwern at gwern.net
  * arbtt.xml: cpedit based on reading live version http://arbtt.nomeata.de/doc/users_guide/effective-use.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-preview.txt
Type: text/x-darcs-patch
Size: 21102 bytes
Desc: Patch preview
URL: <https://lists.nomeata.de/pipermail/arbtt/attachments/20140925/d5835c7a/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arbtt_xml_-cpedit-based-on-reading-live-version-http___arbtt_nomeata_de_doc_users_guide_effective_use_html.dpatch
Type: application/x-darcs-patch
Size: 23615 bytes
Desc: A darcs patch for your repository!
URL: <https://lists.nomeata.de/pipermail/arbtt/attachments/20140925/d5835c7a/attachment-0001.bin>

From gwern at gwern.net  Sat Sep 27 22:58:18 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Sat, 27 Sep 2014 16:58:18 -0400
Subject: Installation documentation: what are the dependencies for compiling
 arbtt?
Message-ID: <CAMwO0gwwuc+9FzZeuqTS05w1b3FuLk1LfQX0YtMUgVoQ+gX5xw@mail.gmail.com>

An acquaintance wanted to use --dump-samples but her arbtt was too old
(apparently the Ubuntu package is well behind the times?) and so
installed a fresh one from Hackage when a compile error hit:

    cabal: Error: some packages failed to install:
    X11-1.6.1.2 failed during the configure step. The exception was:
    ExitFailure 1

As many Xmonad users know, this is the common problem of having not
installed the headers/-dev package for the X11 library, which then
makes the X11 Haskell bindings impossible to compile, which makes
arbtt impossible to compile:
http://www.haskell.org/haskellwiki/Xmonad/Frequently_asked_questions#Missing_X11_headers

The solution is to install libx11-dev (or xorg-dev) on Debian, which
isn't mentioned in http://arbtt.nomeata.de/#install .

But are there any other foreign dependencies which need to be
mentioned for people installing from source?

-- 
gwern
http://www.gwern.net


From gwern0 at gmail.com  Sat Sep 27 23:17:54 2014
From: gwern0 at gmail.com (gwern0 at gmail.com)
Date: Sat, 27 Sep 2014 14:17:54 -0700 (PDT)
Subject: darcs patch: arbtt.xml: cpedit based on reading live ... (and 1 more)
Message-ID: <54272982.853ce00a.182f.4fa5@mx.google.com>

2 patches for repository http://darcs.nomeata.de/arbtt:

Thu Sep 25 22:55:47 EDT 2014  gwern at gwern.net
  * arbtt.xml: cpedit based on reading live version http://arbtt.nomeata.de/doc/users_guide/effective-use.html

Sat Sep 27 17:11:46 EDT 2014  gwern at gwern.net
  * arbtt.cabal: specify bug tracker location for people looking at Hackage page for documentation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-preview.txt
Type: text/x-darcs-patch
Size: 22690 bytes
Desc: Patch preview
URL: <https://lists.nomeata.de/pipermail/arbtt/attachments/20140927/01c6bcae/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arbtt_xml_-cpedit-based-on-reading-live-version-http___arbtt_nomeata_de_doc_users_guide_effective_use_html.dpatch
Type: application/x-darcs-patch
Size: 25203 bytes
Desc: A darcs patch for your repository!
URL: <https://lists.nomeata.de/pipermail/arbtt/attachments/20140927/01c6bcae/attachment-0001.bin>

From mail at joachim-breitner.de  Mon Sep 29 14:51:04 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Mon, 29 Sep 2014 14:51:04 +0200
Subject: darcs patch: arbtt.xml: cpedit based on reading live ... (and 1
 more)
In-Reply-To: <54272982.853ce00a.182f.4fa5@mx.google.com>
References: <54272982.853ce00a.182f.4fa5@mx.google.com>
Message-ID: <1411995064.11958.2.camel@joachim-breitner.de>

Hi,


Am Samstag, den 27.09.2014, 14:17 -0700 schrieb gwern0 at gmail.com:
> 2 patches for repository http://darcs.nomeata.de/arbtt:
> 
> Thu Sep 25 22:55:47 EDT 2014  gwern at gwern.net
>   * arbtt.xml: cpedit based on reading live version http://arbtt.nomeata.de/doc/users_guide/effective-use.html
> 
> Sat Sep 27 17:11:46 EDT 2014  gwern at gwern.net
>   * arbtt.cabal: specify bug tracker location for people looking at Hackage page for documentation

thanks, both applied.

Greetings,
Joachim
-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From mail at joachim-breitner.de  Mon Sep 29 14:53:04 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Mon, 29 Sep 2014 14:53:04 +0200
Subject: Installation documentation: what are the dependencies for
 compiling arbtt?
In-Reply-To: <CAMwO0gwwuc+9FzZeuqTS05w1b3FuLk1LfQX0YtMUgVoQ+gX5xw@mail.gmail.com>
References: <CAMwO0gwwuc+9FzZeuqTS05w1b3FuLk1LfQX0YtMUgVoQ+gX5xw@mail.gmail.com>
Message-ID: <1411995184.11958.4.camel@joachim-breitner.de>

Hi,


Am Samstag, den 27.09.2014, 16:58 -0400 schrieb Gwern Branwen:
> An acquaintance wanted to use --dump-samples but her arbtt was too old
> (apparently the Ubuntu package is well behind the times?) and so
> installed a fresh one from Hackage when a compile error hit:
> 
>     cabal: Error: some packages failed to install:
>     X11-1.6.1.2 failed during the configure step. The exception was:
>     ExitFailure 1
> 
> As many Xmonad users know, this is the common problem of having not
> installed the headers/-dev package for the X11 library, which then
> makes the X11 Haskell bindings impossible to compile, which makes
> arbtt impossible to compile:
> http://www.haskell.org/haskellwiki/Xmonad/Frequently_asked_questions#Missing_X11_headers

Well spotted.

> The solution is to install libx11-dev on Debian or xorg-dev (which
> isn't mentioned in http://arbtt.nomeata.de/#install ).

I guess you'll send a patch for that soon? :-)

> But are there any other foreign dependencies which need to be
> mentioned for people installing from source?

Probably libpcre3-dev for pcre-light, and libxss-dev for arbtt itself.

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From gwern0 at gmail.com  Mon Sep 29 21:11:35 2014
From: gwern0 at gmail.com (gwern0 at gmail.com)
Date: Mon, 29 Sep 2014 12:11:35 -0700 (PDT)
Subject: darcs patch: arbtt.html: split install section into binary vs sourc...
Message-ID: <5429aee7.c306e00a.3091.ffffeadf@mx.google.com>

1 patch for repository http://darcs.nomeata.de/arbtt:

Mon Sep 29 15:10:57 EDT 2014  gwern at gwern.net
  * arbtt.html: split install section into binary vs source, and list non-haskell dependencies
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-preview.txt
Type: text/x-darcs-patch
Size: 19219 bytes
Desc: Patch preview
URL: <https://lists.nomeata.de/pipermail/arbtt/attachments/20140929/1d740bea/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arbtt_html_-split-install-section-into-binary-vs-source_-and-list-non_haskell-dependencies.dpatch
Type: application/x-darcs-patch
Size: 22100 bytes
Desc: A darcs patch for your repository!
URL: <https://lists.nomeata.de/pipermail/arbtt/attachments/20140929/1d740bea/attachment-0001.bin>

From rejuvyesh at gmail.com  Fri Nov 28 11:19:02 2014
From: rejuvyesh at gmail.com (rejuvyesh)
Date: Fri, 28 Nov 2014 15:49:02 +0530
Subject: Visualizing Daily Usage Statistics
Message-ID: <CAFqH0N=XevA5RJ4Y8FNoyi-fBM_5Ao=B=V_GqSZ2sTZyjw0-Lw@mail.gmail.com>

Greetings!

I wrote a `d3.js` based visualization for daily arbtt usage. Currently it
only works with a very simple categorization. I thought it could be helpful
to the community. Suggestions (especially for making it more general) are
most welcome:

https://github.com/rejuvyesh/dailystats

For a demo see:

http://rejuvyesh.com/dailystats/

Hope you all find it useful.

---
rejuvyesh
http://rejuvyesh.com

From mail at joachim-breitner.de  Sun Nov 30 19:17:00 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Sun, 30 Nov 2014 19:17:00 +0100
Subject: Visualizing Daily Usage Statistics
In-Reply-To: <CAFqH0N=XevA5RJ4Y8FNoyi-fBM_5Ao=B=V_GqSZ2sTZyjw0-Lw@mail.gmail.com>
References: <CAFqH0N=XevA5RJ4Y8FNoyi-fBM_5Ao=B=V_GqSZ2sTZyjw0-Lw@mail.gmail.com>
Message-ID: <1417371420.3101.7.camel@joachim-breitner.de>

Hi,


Am Freitag, den 28.11.2014, 15:49 +0530 schrieb rejuvyesh:

> I wrote a `d3.js` based visualization for daily arbtt usage. Currently
> it only works with a very simple categorization. I thought it could be
> helpful to the community. Suggestions (especially for making it more
> general) are most welcome:
> 
> https://github.com/rejuvyesh/dailystats
> 

this looks very cool, thanks for sharing!

I like how this is the beginning of a small ecosystem around arbtt, where
you don't have to wait for me to implement your particular feature.


If I run it with my own `categorize.cfg` it seems to miss something
about “totaltime”. Can you document what special tags your tool demands?
Or maybe we can find a way for you to get that information some other
way?

Are there other assumptions made, e.g. that tags are assigned
exclusively?

The current setup with the gh-pages branch is a bit strange. It would be
nice if the user could simply run one command and get usable output
in ./out or somewhere.

It seems that opening the HTML files from the local path does not work,
at least not here. A neat trick is to run
        python -m SimpleHTTPServer
in that directory. Maybe worth adding to the README?
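
On Python 3 the `SimpleHTTPServer` module lives in `http.server`, so the
one-liner becomes `python3 -m http.server`. As a minimal sketch, the same
trick can also be wrapped in a small script. The helper name
`serve_directory` and the default port are illustrative assumptions and not
part of the graphing tool:

```python
import functools
import http.server
import threading


def serve_directory(directory, port=8000):
    """Serve `directory` over HTTP on localhost and return the server.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    # Serve from a daemon thread so the caller keeps control.
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling `serve_directory(".")` in the directory with the rendered files and
then opening http://127.0.0.1:8000/ should behave like the one-liner;
`server.shutdown()` stops it again.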


Greetings,
Joachim


-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From rejuvyesh at gmail.com  Mon Dec  1 15:32:15 2014
From: rejuvyesh at gmail.com (rejuvyesh)
Date: Mon, 1 Dec 2014 20:02:15 +0530
Subject: Visualizing Daily Usage Statistics
In-Reply-To: <1417371420.3101.7.camel@joachim-breitner.de>
References: <CAFqH0N=XevA5RJ4Y8FNoyi-fBM_5Ao=B=V_GqSZ2sTZyjw0-Lw@mail.gmail.com>
 <1417371420.3101.7.camel@joachim-breitner.de>
Message-ID: <CAFqH0N=7CPvsrVprbUqtzwR7GqjeWbhO0cvW49nT3UFPyfHUjQ@mail.gmail.com>

Thanks a lot for your kind comments.

Currently there are a lot of issues with `categorize.cfg` handling. We can
define tags such that they never add up to 100% or add up to more than 100%
of the total time arbtt has captured. Handling that in a sufficiently
general way seems quite complicated (any suggestions are of course
welcome). This is the reason why even I am not using my default
`categorize.cfg` file here. So you are correct that the tags are assumed to
form an exclusive set.

You are also right that I should include the rendering directory right in
the master branch. I will do it right now.

I am still pretty new to Haskell. I was planning to write a complete app in
Haskell handling both parsing and serving it to the web. But writing a
simple parser was a lot easier in Python :( . I hope to do that some day ^TM.

Till then I had some issues, like handling things which I haven't been able
to track with tags and want to tag as, say, "others". Right now I would have
to exclude every other tag and then find them to add them to the `json`
files. It would be really great if there was some way in the config file to
automatically mark everything untagged as, say, "others".

For now I'll add the rendering files to a separate folder and add a simple
python serving script.

Cheers!

---
rejuvyesh
http://rejuvyesh.com



From mail at joachim-breitner.de  Mon Dec  1 15:45:55 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Mon, 01 Dec 2014 15:45:55 +0100
Subject: Visualizing Daily Usage Statistics
In-Reply-To: <CAFqH0N=7CPvsrVprbUqtzwR7GqjeWbhO0cvW49nT3UFPyfHUjQ@mail.gmail.com>
References: <CAFqH0N=XevA5RJ4Y8FNoyi-fBM_5Ao=B=V_GqSZ2sTZyjw0-Lw@mail.gmail.com>
 <1417371420.3101.7.camel@joachim-breitner.de>
 <CAFqH0N=7CPvsrVprbUqtzwR7GqjeWbhO0cvW49nT3UFPyfHUjQ@mail.gmail.com>
Message-ID: <1417445155.2364.4.camel@joachim-breitner.de>

Hi,


Am Montag, den 01.12.2014, 20:02 +0530 schrieb rejuvyesh:

> Currently there are a lot of issues with `categorize.cfg` handling. We
> can define tags such that they never add up to 100% or add up to more
> than 100% of the total time arbtt has captured. Handling that in a
> sufficiently general way seems quite complicated (any suggestions are
> of course welcome). This is the reason why even I am not using my
> default `categorize.cfg` file here. So you are correct that the tags
> are assumed to form an exclusive set.

I think this can be solved by categories. How about this: The user is
expected to assign tags to a specific category (e.g. Graph:). Your tool
will simply ignore all other tags, and just plot all tags of the form
Graph:*. This should guarantee that they never add up to more than
100%. This may also solve the problem of measuring the unmatched time,
at least if you build on the output of "arbtt-stats --category Graph".

Bonus points for making the name of the category configurable with a
command line argument to your tool, and maybe even supporting multiple
pie charts next to each other.


> Till then I had some issues, like handling things which I haven't been
> able to track with tags and want to tag as, say, "others". Right now I
> would have to exclude every other tag and then find them to add them
> to the `json` files. It would be really great if there was some way in
> the config file to automatically mark everything untagged as, say,
> "others".

If you use the "arbtt-stats --category" report, it will report unmatched
time. In fact, with
$ arbtt-stats --category Program --for-each=day --output-format csv
you should get the numbers for the pie charts directly, without further
processing.
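
As a rough sketch of how a graphing tool might turn such a per-day CSV
report into pie-chart shares: the column names (`Day`, `Tag`, `Time`,
`Percentage`), the `H:MM:SS` time format and the sample rows below are
assumptions made for illustration, so check them against the real output of
your arbtt-stats version before relying on them.

```python
import csv
import io
from collections import defaultdict


def parse_duration(text):
    """Convert an 'H:MM:SS' string to seconds (assumed time format)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def shares_per_day(csv_text):
    """Return {day: {tag: fraction of that day's reported time}}."""
    totals = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["Day"]][row["Tag"]] = parse_duration(row["Time"])
    shares = {}
    for day, tags in totals.items():
        day_total = sum(tags.values())
        if day_total:  # skip days without any recorded time
            shares[day] = {tag: secs / day_total for tag, secs in tags.items()}
    return shares


# Made-up sample data in the assumed layout, not real arbtt output.
SAMPLE = """Day,Tag,Time,Percentage
2014-12-01,Program:evolution,1:30:00,75.00
2014-12-01,Program:gnome-terminal,0:30:00,25.00
2014-12-02,Program:Navigator,2:00:00,100.00
"""
```

Because a sample gets at most one tag per category, the fractions computed
for one day cannot exceed 1 in total, which is what makes a single category
suitable for a pie chart.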


For the other reports, where tags are generally overlapping, an automatic
"other" tag might be less useful.


Greetings,
Joachim


-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From mail at joachim-breitner.de  Mon Dec  1 16:20:18 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Mon, 01 Dec 2014 16:20:18 +0100
Subject: Visualizing Daily Usage Statistics
In-Reply-To: <CAFqH0Nm3no6EPZRPRJxhTeaP_R5M3PueQDpWuyZOcgWtJJaPqA@mail.gmail.com>
References: <CAFqH0N=XevA5RJ4Y8FNoyi-fBM_5Ao=B=V_GqSZ2sTZyjw0-Lw@mail.gmail.com>
 <1417371420.3101.7.camel@joachim-breitner.de>
 <CAFqH0N=7CPvsrVprbUqtzwR7GqjeWbhO0cvW49nT3UFPyfHUjQ@mail.gmail.com>
 <1417445155.2364.4.camel@joachim-breitner.de>
 <CAFqH0Nm3no6EPZRPRJxhTeaP_R5M3PueQDpWuyZOcgWtJJaPqA@mail.gmail.com>
Message-ID: <1417447218.2364.6.camel@joachim-breitner.de>

Hi,


Am Montag, den 01.12.2014, 20:36 +0530 schrieb rejuvyesh:
> Categories are fantastic for pie charts. Although it seems they
> haven't been implemented for `--for-each=minute`.

It works here:

$ ./dist/build/arbtt-stats/arbtt-stats --category Program \
     --for-each=minute  --filter '$sampleage < 0:10' 

Statistics for category "Program" (Day 2014-12-01 16:05)
========================================================
___________________Tag_|___Time_|_Percentage_
Program:gnome-terminal |  1m00s |     100.00

Statistics for category "Program" (Day 2014-12-01 16:06)
========================================================
______________Tag_|___Time_|_Percentage_
Program:Navigator |  1m00s |     100.00

Statistics for category "Program" (Day 2014-12-01 16:07)
========================================================
______________Tag_|___Time_|_Percentage_
Program:evolution |  1m00s |     100.00

Statistics for category "Program" (Day 2014-12-01 16:09)
========================================================
___________________Tag_|___Time_|_Percentage_
Program:gnome-terminal |  1m00s |     100.00

Statistics for category "Program" (Day 2014-12-01 16:10)
========================================================
______________Tag_|___Time_|_Percentage_
Program:evolution |  1m00s |     100.00

Statistics for category "Program" (Day 2014-12-01 16:11)
========================================================
______________Tag_|___Time_|_Percentage_
Program:evolution |  1m00s |     100.00

Statistics for category "Program" (Day 2014-12-01 16:12)
========================================================
______________Tag_|___Time_|_Percentage_
Program:evolution |  1m00s |     100.00


But why would you need "--for-each=minute" for the per-day pie chart?


Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From rejuvyesh at gmail.com  Mon Dec  1 16:29:52 2014
From: rejuvyesh at gmail.com (rejuvyesh)
Date: Mon, 1 Dec 2014 20:59:52 +0530
Subject: Visualizing Daily Usage Statistics
In-Reply-To: <1417447218.2364.6.camel@joachim-breitner.de>
References: <CAFqH0N=XevA5RJ4Y8FNoyi-fBM_5Ao=B=V_GqSZ2sTZyjw0-Lw@mail.gmail.com>
 <1417371420.3101.7.camel@joachim-breitner.de>
 <CAFqH0N=7CPvsrVprbUqtzwR7GqjeWbhO0cvW49nT3UFPyfHUjQ@mail.gmail.com>
 <1417445155.2364.4.camel@joachim-breitner.de>
 <CAFqH0Nm3no6EPZRPRJxhTeaP_R5M3PueQDpWuyZOcgWtJJaPqA@mail.gmail.com>
 <1417447218.2364.6.camel@joachim-breitner.de>
Message-ID: <CAFqH0NkEZxGE4=yfGLAOxbHUFY7cHdCq-A3CPsqSUyKD54_4Wg@mail.gmail.com>

Ah yes, it does. It's needed for the barcode plot just below the bar charts,
which visualizes what exactly you were doing at a given time of day. Right
now I have separated them into "productive" and "unproductive", but that
can be defined in `settings.js` :)

Thanks!


From rejuvyesh at gmail.com  Mon Dec  1 19:20:16 2014
From: rejuvyesh at gmail.com (rejuvyesh)
Date: Mon, 1 Dec 2014 23:50:16 +0530
Subject: Visualizing arbtt-stats
Message-ID: <CAFqH0NkHwVEk+byfZmXZpHKHPtahEgyd-C5748LOkF0tvUvG6g@mail.gmail.com>

Greetings!

With numerous suggestions from Joachim, the graphing tool is ready to use.

See https://github.com/rejuvyesh/arbtt-graph

Let me know if you have any issues using it. It currently works with only a
single category.

---
rejuvyesh
http://rejuvyesh.com

From gwern0 at gmail.com  Mon Dec  8 20:12:11 2014
From: gwern0 at gmail.com (gwern0 at gmail.com)
Date: Mon, 08 Dec 2014 11:12:11 -0800 (PST)
Subject: darcs patch: stats-main.hs: --min-percentage docs: consistent abbre...
Message-ID: <5485f80b.e4538c0a.7fac.ffffb358@mx.google.com>

1 patch for repository http://darcs.nomeata.de/arbtt:

Mon Dec  8 14:11:28 EST 2014  gwern at gwern.net
  * stats-main.hs: --min-percentage docs: consistent abbreviation
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-preview.txt
Type: text/x-darcs-patch
Size: 1078 bytes
Desc: Patch preview
URL: <https://lists.nomeata.de/pipermail/arbtt/attachments/20141208/d5a94648/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: stats_main_hs_-__min_percentage-docs_-consistent-abbreviation.dpatch
Type: application/x-darcs-patch
Size: 4275 bytes
Desc: A darcs patch for your repository!
URL: <https://lists.nomeata.de/pipermail/arbtt/attachments/20141208/d5a94648/attachment-0001.bin>

From mail at joachim-breitner.de  Mon Dec  8 20:36:52 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Mon, 08 Dec 2014 20:36:52 +0100
Subject: darcs patch: stats-main.hs: --min-percentage docs: consistent
 abbre...
In-Reply-To: <5485f80b.e4538c0a.7fac.ffffb358@mx.google.com>
References: <5485f80b.e4538c0a.7fac.ffffb358@mx.google.com>
Message-ID: <1418067412.26842.0.camel@joachim-breitner.de>

Hi,

Am Montag, den 08.12.2014, 11:12 -0800 schrieb gwern0 at gmail.com:
> 1 patch for repository http://darcs.nomeata.de/arbtt:
> 
> Mon Dec  8 14:11:28 EST 2014  gwern at gwern.net
>   * stats-main.hs: --min-percentage docs: consistent abbreviation

thanks, applied.

Greetings,
Joachim
-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From mail at joachim-breitner.de  Mon Dec  8 21:03:04 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Mon, 08 Dec 2014 21:03:04 +0100
Subject: Visualizing arbtt-stats
In-Reply-To: <CAFqH0NkHwVEk+byfZmXZpHKHPtahEgyd-C5748LOkF0tvUvG6g@mail.gmail.com>
References: <CAFqH0NkHwVEk+byfZmXZpHKHPtahEgyd-C5748LOkF0tvUvG6g@mail.gmail.com>
Message-ID: <1418068984.26842.1.camel@joachim-breitner.de>

Hi,


Am Montag, den 01.12.2014, 23:50 +0530 schrieb rejuvyesh:

> With numerous suggestions from Joachim, the graphing tool is ready to
> use. 
>
> See https://github.com/rejuvyesh/arbtt-graph
> 

nice. I added a section to the User's guide about contributed tools, and
included yours there:
http://arbtt.nomeata.de/doc/users_guide/contributed.html

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From gwern at gwern.net  Sun Dec 14 21:39:43 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Sun, 14 Dec 2014 15:39:43 -0500
Subject: arbtt: use of database like sqlite3?
Message-ID: <CAMwO0gzQRNSciXnjSm6_JGbhvZ_SJxW=TcKEYZ7t3gjELKVzow@mail.gmail.com>

So I was waiting on arbtt to --dump-samples for the past 100 hours to
write a rule classifying a web serial I read as recreational, and I
began wondering: what is arbtt doing that it takes so long?

Is it because of the log structure that it has to read through, parse,
and classify my full 85M arbtt log just to get the last 100 hours of
data? I know from working with an 18GB sqlite3 db for Mnemosyne that
date range queries in databases can be *extremely* fast, and
arbtt-capture dumping into a db would probably be more reliable and
durable (ACID rather than arbtt-recover), and sqlite3 has had multiple
Haskell bindings for half a decade now.

Would switching to sqlite3 be an improvement?

-- 
gwern
http://www.gwern.net


From mail at joachim-breitner.de  Sun Dec 14 23:59:23 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Sun, 14 Dec 2014 23:59:23 +0100
Subject: arbtt: use of database like sqlite3?
In-Reply-To: <CAMwO0gzQRNSciXnjSm6_JGbhvZ_SJxW=TcKEYZ7t3gjELKVzow@mail.gmail.com>
References: <CAMwO0gzQRNSciXnjSm6_JGbhvZ_SJxW=TcKEYZ7t3gjELKVzow@mail.gmail.com>
Message-ID: <1418597963.3430.7.camel@joachim-breitner.de>

Hi Gwern,


Am Sonntag, den 14.12.2014, 15:39 -0500 schrieb Gwern Branwen:
> So I was waiting on arbtt to --dump-samples for the past 100 hours to
> write a rule classifying a web serial I read as recreational, and I
> began wondering: what is arbtt doing that it takes so long?

wait, it is taking 100 hours for one run of "arbtt-stats
--dump-samples"?

> Is it because of the log structure that it has to read through, parse,
> and classify my full 85M arbtt log just to get the last 100 hours of
> data? I know from working with an 18GB sqlite3 db for Mnemosyne that
> date range queries in databases can be *extremely* fast, and
> arbtt-capture dumping into a db would probably be more reliable and
> durable (ACID rather than arbtt-recover), and sqlite3 has had multiple
> Haskell bindings for half a decade now.
> 
> Would switching to sqlite3 be an improvement?

Possibly. My goal with the current system is to make arbtt-capture as
cheap as possible, by doing nothing more than simply appending a few
bytes to a file. I might have been over-optimizing here, but it is
certainly a good idea to pay attention to something that is going to run
constantly, even when on battery.

But that does not mean that the benefits of sqlite (such as date range
queries) or otherwise improved data formats would not outweigh this.

Or maybe an alternative route could be taken: arbtt-capture still writes
to a binary append-only log, but regularly (e.g. once a day), this log
is imported into a format more amenable to searching and flushed.

But note that there is another possibly important feature of the current
log format: It shares strings between a sample and its previous sample.
Otherwise, upon every sample, every window title would be stored again
and again, resulting in considerably larger files and higher arbtt-stats
memory consumption. I doubt that this is easily possible with sqlite.

Maybe it is also sufficient to create an index file for the log file,
with the offsets of, say, the first sample of each day. This would allow
arbtt-stats to categorize a specific time span faster. But this is also
tricky given the string-sharing format.
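
To illustrate why string sharing and an offset index pull in different
directions, here is a toy sketch of delta-encoding each sample against its
predecessor. This is not arbtt's actual on-disk format; the functions
`encode` and `decode` and the list-based representation are hypothetical and
only meant to show the principle.

```python
def encode(samples):
    """Delta-encode samples against their predecessor.

    Each sample is (timestamp, [titles]). A title already present in the
    previous sample is stored as its integer position there; new titles
    are stored as literal strings.
    """
    encoded, previous = [], []
    for timestamp, titles in samples:
        entry = [previous.index(t) if t in previous else t for t in titles]
        encoded.append((timestamp, entry))
        previous = titles
    return encoded


def decode(encoded, previous=()):
    """Invert encode(). `previous` is the title list of the sample just
    before the first encoded entry (empty at the start of the log)."""
    samples, previous = [], list(previous)
    for timestamp, entry in encoded:
        titles = [previous[item] if isinstance(item, int) else item
                  for item in entry]
        samples.append((timestamp, titles))
        previous = titles
    return samples
```

In this toy scheme, decoding cannot simply start at an arbitrary offset: an
entry in the middle of the log may refer to titles of its predecessor. An
index would therefore have to point at entries that happen to be stored in
full, or the decoder would need the preceding sample as context.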

Maybe it is also sufficient to keep the log format, but split it into
smaller files, e.g. one per day. Again, the data to read for a date range
would be smaller, plus it would be more backup-friendly and it would be
easier to delete certain dates manually.

So all in all quite a few options, and no clear best way to follow. What
do you think?

Greetings,
Joachim

-- 
Joachim “nomeata” Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From gwern at gwern.net  Mon Dec 15 01:40:16 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Sun, 14 Dec 2014 19:40:16 -0500
Subject: arbtt: use of database like sqlite3?
In-Reply-To: <1418597963.3430.7.camel@joachim-breitner.de>
References: <CAMwO0gzQRNSciXnjSm6_JGbhvZ_SJxW=TcKEYZ7t3gjELKVzow@mail.gmail.com>
 <1418597963.3430.7.camel@joachim-breitner.de>
Message-ID: <CAMwO0gzBQOcQdpzorLri0pnHp=NZnWiA2HRFnTCEgAS8HiSFiQ@mail.gmail.com>

On Sun, Dec 14, 2014 at 5:59 PM, Joachim Breitner
<mail at joachim-breitner.de> wrote:
> wait, it is taking 100 hours for one run of "arbtt-stats
> --dump-samples"?

No, I meant the query was restricted to the last 100 hours (which was
when I was reading it)

> Possibly. My goal with the current system is to make arbtt-capture as
> cheap as possible, by doing nothing more than simply appending a few
> bytes to a file. I might have been over-optimizing here, but it is
> certainly a good idea to pay attention to something that is going to run
> constantly, even when on battery.

I don't know how much more expensive an append is, but I suspect that
even a single run through a logfile will use up more joules than
whatever additional overhead sqlite3 has in INSERTs. (Since sqlite3 is
often used in embedded and resource-constrained applications or in
applications that write heavily to the database like Firefox, it can't
be *that* bad.)

> Or maybe an alternative route could be taken: arbtt-capture still writes
> to a binary append-only log, but regularly (e.g. once a day), this log
> is imported into a format more amenable to searching and flushed.

I think that would probably have the worst of both worlds; two
architectures means a lot more can go wrong.

> But note that there is another possibly important feature of the current
> log format: It shares strings between a sample and its previous sample.
> Otherwise, upon every sample, every window title would be stored again
> and again, resulting in considerably larger files and higher arbtt-stats
> memory consumption. I doubt that this is easily possible with sqlite.

That's a good point: sqlite3 has pretty minimal support for
compression. There's a plugin or two for coding in
compression/decompression on specific entries (such as
https://github.com/salviati/sqlite3-lz4 ), but that wouldn't save much
space since we want between-entry compression. There are 2 official
proprietary plugins, but besides being proprietary they are intended for
read-only databases.

We could bite the bullet and accept greater filesize. I wouldn't mind
doubling space usage if it made queries near-instant. Another option
is that the database could be structured to explicitly exploit
duplication; I saw this suggestion in
http://stackoverflow.com/a/1829601 :

> Compression is all about removing duplication, and in a log file most of the duplication is between entries rather than within each entry so compressing each entry individually is not going to be a huge win.
>
> This is off the top of my head so feel free to shoot it down in flames, but I would consider breaking the table into a set of smaller tables holding the individual parts of the entry. A log entry would then mostly consist of a timestamp (as DATE type rather than a string) plus a set of indexes into other tables (e.g. requesting IP, request type, requested URL, browser type etc.)

That is, in more Haskelly terms, each arbtt-capture sample is a
(timestamp, [String]); each string is assigned a unique ID and stored
in a hashmap if not already present, and the IDs are stored with the
timestamp. So a few seconds of samples would look something like

(1418603655,[1,54,20,333])
(1418603678,[1,53,333])
(1418603693,[1,53,333])
(1418603702,[1,53,333])
(1418603712,[53,333,801])

avoiding the worst of between-row redundancy; and another table would
store the definition of string #1, #53, #801, etc whenever one needed
them. This probably would compress even better than a log format which
looks one entry back for redundancy since it extends the lookback to
the entire database history, and indexes presumably mean the queries
remain as fast (since sqlite3 knows where the indices point to in the
other table).
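
A minimal sketch of that layout using Python's built-in `sqlite3` module.
The table and column names (`strings`, `samples`, `sample_strings`) and the
helper functions are hypothetical and chosen for illustration; this is not
an existing arbtt schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE strings (
    id    INTEGER PRIMARY KEY,
    value TEXT NOT NULL UNIQUE      -- UNIQUE also creates the lookup index
);
CREATE TABLE samples (
    id INTEGER PRIMARY KEY,
    ts INTEGER NOT NULL             -- Unix timestamp
);
CREATE INDEX samples_ts ON samples (ts);
CREATE TABLE sample_strings (
    sample_id INTEGER NOT NULL REFERENCES samples (id),
    string_id INTEGER NOT NULL REFERENCES strings (id)
);
CREATE INDEX sample_strings_sample ON sample_strings (sample_id);
"""


def open_db(path=":memory:"):
    """Open a database and create the (hypothetical) schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def intern_string(conn, value):
    """Return the id for `value`, inserting it on first sight."""
    conn.execute("INSERT OR IGNORE INTO strings (value) VALUES (?)", (value,))
    row = conn.execute(
        "SELECT id FROM strings WHERE value = ?", (value,)
    ).fetchone()
    return row[0]


def add_sample(conn, ts, titles):
    """Store one sample; each distinct title is stored only once."""
    with conn:  # one transaction per sample
        sample_id = conn.execute(
            "INSERT INTO samples (ts) VALUES (?)", (ts,)
        ).lastrowid
        conn.executemany(
            "INSERT INTO sample_strings (sample_id, string_id) VALUES (?, ?)",
            [(sample_id, intern_string(conn, t)) for t in titles],
        )


def titles_between(conn, start, end):
    """Return [(ts, title)] for samples with start <= ts < end."""
    return conn.execute(
        """SELECT s.ts, st.value
           FROM samples s
           JOIN sample_strings ss ON ss.sample_id = s.id
           JOIN strings st ON st.id = ss.string_id
           WHERE s.ts >= ? AND s.ts < ?
           ORDER BY s.ts, ss.rowid""",
        (start, end),
    ).fetchall()
```

The date-range query only touches the rows selected through the `samples_ts`
index, while every `add_sample` call pays for one lookup in the `strings`
index per title.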

-- 
gwern
http://www.gwern.net


From mail at joachim-breitner.de  Mon Dec 15 09:35:06 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Mon, 15 Dec 2014 09:35:06 +0100
Subject: arbtt: use of database like sqlite3?
In-Reply-To: <CAMwO0gzBQOcQdpzorLri0pnHp=NZnWiA2HRFnTCEgAS8HiSFiQ@mail.gmail.com>
References: <CAMwO0gzQRNSciXnjSm6_JGbhvZ_SJxW=TcKEYZ7t3gjELKVzow@mail.gmail.com>
 <1418597963.3430.7.camel@joachim-breitner.de>
 <CAMwO0gzBQOcQdpzorLri0pnHp=NZnWiA2HRFnTCEgAS8HiSFiQ@mail.gmail.com>
Message-ID: <1418632506.1489.8.camel@joachim-breitner.de>

Hi,


Am Sonntag, den 14.12.2014, 19:40 -0500 schrieb Gwern Branwen:

> That is, in more Haskelly terms, each arbtt-capture sample is a
> (timestamp, [String]); each string is assigned a unique ID and stored
> in a hashmap if not already present, and the IDs are stored with the
> timestamp. So a few seconds of samples would look something like
> 
> (1418603655,[1,54,20,333])
> (1418603678,[1,53,333])
> (1418603693,[1,53,333])
> (1418603702,[1,53,333])
> (1418603712,[53,333,801])
> 
> avoiding the worst of between-row redundancy; and another table would
> store the definition of string #1, #53, #801, etc whenever one needed
> them. This probably would compress even better than a log format which
> looks one entry back for redundancy since it extends the lookback to
> the entire database history, and indexes presumably mean the queries
> remain as fast (since sqlite3 knows where the indices point to in the
> other table).

yes, an interned string format would also work quite well, and if
used correctly on the Haskell side, could avoid having duplicated
strings in memory as well.
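
A minimal sketch of the interning idea described above, in Python rather
than Haskell for brevity. The names StringTable, encode_sample and
decode_sample are hypothetical and are not part of arbtt; the sketch only
shows how each distinct string gets one ID and how samples then carry IDs
instead of strings.

```python
from typing import Dict, List, Tuple


class StringTable:
    """Assigns a small integer ID to each distinct string (hypothetical helper)."""

    def __init__(self) -> None:
        self.ids: Dict[str, int] = {}   # string -> ID
        self.strings: List[str] = []    # ID -> string

    def intern(self, s: str) -> int:
        # Return the existing ID, or assign the next free one.
        if s not in self.ids:
            self.ids[s] = len(self.strings)
            self.strings.append(s)
        return self.ids[s]

    def lookup(self, i: int) -> str:
        return self.strings[i]


def encode_sample(table: StringTable, timestamp: int,
                  titles: List[str]) -> Tuple[int, List[int]]:
    # A sample becomes (timestamp, [IDs]); repeated titles cost one int each.
    return timestamp, [table.intern(t) for t in titles]


def decode_sample(table: StringTable,
                  sample: Tuple[int, List[int]]) -> Tuple[int, List[str]]:
    timestamp, ids = sample
    return timestamp, [table.lookup(i) for i in ids]
```

Two consecutive samples with the same window titles then share the same
IDs, and the table holds each title only once.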

But now the insertion is even more expensive: Upon every sample, for
every open window, sqlite will have to traverse an index of over a
million¹ entries to see if this particular window title has occurred
before. That's quite an increase both in computation time _and_ memory
consumption for the long-running process.

I think this variant is also only good if the data is first collected to
a log and then occasionally sorted into the database.
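
A rough sketch of that "collect first, import occasionally" variant, using
Python's standard sqlite3 module. The table names (strings, sample_strings)
and the functions open_db and import_batch are assumptions made for
illustration; nothing here reflects an actual arbtt schema.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical two-table layout: one row per distinct string,
# one row per (sample, string) pair.
SCHEMA = """
CREATE TABLE IF NOT EXISTS strings (
    id    INTEGER PRIMARY KEY,
    value TEXT NOT NULL UNIQUE      -- UNIQUE gives sqlite an index to search
);
CREATE TABLE IF NOT EXISTS sample_strings (
    timestamp INTEGER NOT NULL,
    string_id INTEGER NOT NULL REFERENCES strings(id)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def import_batch(conn: sqlite3.Connection,
                 samples: Iterable[Tuple[int, List[str]]]) -> None:
    # One transaction for the whole batch, so the index work happens
    # during the occasional import and not in the capture process.
    with conn:
        for timestamp, titles in samples:
            for title in titles:
                conn.execute(
                    "INSERT OR IGNORE INTO strings (value) VALUES (?)",
                    (title,))
                (string_id,) = conn.execute(
                    "SELECT id FROM strings WHERE value = ?",
                    (title,)).fetchone()
                conn.execute(
                    "INSERT INTO sample_strings (timestamp, string_id) "
                    "VALUES (?, ?)",
                    (timestamp, string_id))
```

The capture process would keep appending to its log as before, and
import_batch would be run over the accumulated samples from time to time.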

Greetings,
Joachim


¹ $ arbtt-dump | sort -u | wc -l
1116948
# and without deduplication:
$ arbtt-dump | wc -l
5767660


-- 
Joachim "nomeata" Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org


From gwern at gwern.net  Mon Dec 15 17:19:23 2014
From: gwern at gwern.net (Gwern Branwen)
Date: Mon, 15 Dec 2014 11:19:23 -0500
Subject: arbtt: use of database like sqlite3?
In-Reply-To: <1418632506.1489.8.camel@joachim-breitner.de>
References: <CAMwO0gzQRNSciXnjSm6_JGbhvZ_SJxW=TcKEYZ7t3gjELKVzow@mail.gmail.com>
 <1418597963.3430.7.camel@joachim-breitner.de>
 <CAMwO0gzBQOcQdpzorLri0pnHp=NZnWiA2HRFnTCEgAS8HiSFiQ@mail.gmail.com>
 <1418632506.1489.8.camel@joachim-breitner.de>
Message-ID: <CAMwO0gwtmei=Dmw7dJFzrKiOPUakgegyyZN5N1LXEiHX9G9Tkw@mail.gmail.com>

On Mon, Dec 15, 2014 at 3:35 AM, Joachim Breitner
<mail at joachim-breitner.de> wrote:
> But now the insertion is even more expensive: Upon every sample, for
> every open window, sqlite will have to traverse an index of over a
> million¹ entries to see if this particular window title has occurred
> before.

No; come on, these are *databases*, their raison d'etre is looking up
stuff. The ideas behind them are now at least 44 years old - give them
a little credit, they can do better than a linear scan.

Suppose inserts were as dumb as a binary tree - then the traverse only
involves ~20 lookups (log_2(1 million) = 19.9 levels of a tree). I
would be surprised if sqlite3 couldn't do thousands of linked inserts
per second, and I'd expect it to require microbenchmarking to show the
difference between a regular insert and an index-linked insert like
proposed.
It's really not an issue.
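
As a rough check of the arithmetic: log_2 of one million is about 19.9, so
a balanced binary tree over a million entries is about 20 levels deep.
SQLite's indexes are B-trees with a much larger fanout, so the real depth
should be smaller still. The sketch below computes the binary-tree depth
and times bulk inserts with and without a UNIQUE index; the function names
and the table layout are made up for illustration, and the timings will
vary by machine.

```python
import math
import sqlite3
import time


def binary_tree_depth(n: int) -> int:
    # Levels needed by a balanced binary tree holding n entries.
    return math.ceil(math.log2(n))


def time_inserts(n_rows: int, indexed: bool) -> float:
    # Insert n_rows distinct strings into an in-memory table and return
    # the elapsed seconds. With indexed=True the column carries a UNIQUE
    # index that sqlite must search and update on every insert.
    conn = sqlite3.connect(":memory:")
    unique = "UNIQUE" if indexed else ""
    conn.execute(
        f"CREATE TABLE t (id INTEGER PRIMARY KEY, value TEXT {unique})")
    start = time.perf_counter()
    with conn:  # single transaction
        conn.executemany(
            "INSERT INTO t (value) VALUES (?)",
            ((f"window title {i}",) for i in range(n_rows)))
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed


if __name__ == "__main__":
    print("binary tree depth for 1e6 entries:", binary_tree_depth(1_000_000))
    for indexed in (False, True):
        print("indexed" if indexed else "plain",
              time_inserts(100_000, indexed), "s")
```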

The real question is whether anyone wants the faster queries enough to
rewrite the backend for it and make everyone convert their old logs over.
(The existing append-only logs may be terrible for queries, but all
the code exists and presumably is debugged.)

-- 
gwern
http://www.gwern.net


From mail at joachim-breitner.de  Mon Dec 15 19:11:26 2014
From: mail at joachim-breitner.de (Joachim Breitner)
Date: Mon, 15 Dec 2014 19:11:26 +0100
Subject: arbtt: use of database like sqlite3?
In-Reply-To: <CAMwO0gwtmei=Dmw7dJFzrKiOPUakgegyyZN5N1LXEiHX9G9Tkw@mail.gmail.com>
References: <CAMwO0gzQRNSciXnjSm6_JGbhvZ_SJxW=TcKEYZ7t3gjELKVzow@mail.gmail.com>
 <1418597963.3430.7.camel@joachim-breitner.de>
 <CAMwO0gzBQOcQdpzorLri0pnHp=NZnWiA2HRFnTCEgAS8HiSFiQ@mail.gmail.com>
 <1418632506.1489.8.camel@joachim-breitner.de>
 <CAMwO0gwtmei=Dmw7dJFzrKiOPUakgegyyZN5N1LXEiHX9G9Tkw@mail.gmail.com>
Message-ID: <1418667086.27272.5.camel@joachim-breitner.de>

Hi,


On Monday, 15.12.2014 at 11:19 -0500, Gwern Branwen wrote:
> On Mon, Dec 15, 2014 at 3:35 AM, Joachim Breitner
> <mail at joachim-breitner.de> wrote:
> > But now the insertion is even more expensive: Upon every sample, for
> > every open window, sqlite will have to traverse an index of over a
> > million¹ entries to see if this particular window title has occurred
> > before.
> 
> No; come on, these are *databases*, their raison d'etre is looking up
> stuff. The ideas behind them are now at least 44 years old - give them
> a little credit, they can do better than a linear scan.

that's why I said "an index"!

> Suppose inserts were as dumb as a binary tree - then the traverse only
> involves ~20 lookups (log_2(1 million) = 19.9 levels of a tree). I
> would be surprised if sqlite3 couldn't do thousands of linked inserts
> per second, and I'd expect it to require microbenchmarking to show the
> difference between a regular insert and an index-linked insert like
> proposed.
> It's really not an issue.

You might be right...

> The real question is whether anyone wants the faster queries enough to
> rewrite the backend for it and make everyone convert their old logs over.

I don't think this will be too hard. Maybe I should simply give it a try
over the holidays.

Also, this would maybe make it easier to cache the actual tags. This
might require some refactoring and redesign, though. E.g. currently,
tagging may easily depend on the command line.

Anyways, I also added a ticket for it
https://bitbucket.org/nomeata/arbtt/issue/19/use-sqlite3-as-a-backend

> (The existing append-only logs may be terrible for queries, but all
> the code exists and presumably is debugged.)

presumably... unfortunately, they are not bulletproof (otherwise there
would be no need for arbtt-recover).


-- 
Joachim "nomeata" Breitner
  mail at joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nomeata at joachim-breitner.de  • GPG-Key: 0xF0FBF51F
  Debian Developer: nomeata at debian.org
