[qutebrowser] Extend ad blocking to allow single URL blocking.

Viacheslav Chimishuk vchimishuk at yandex.ru
Mon Nov 5 19:12:12 CET 2018


> Hey,
>
> (re-adding the list to the Cc, I assume you accidentally didn't include
> it in your reply)

Hi, Florian.

Yes. Sorry, my fault. Thanks for adding it back.

> That's true, and a valid concern. Currently we just can test for
> membership of a host in a set, which is O(1). Host blocking lookups are
> done quite a lot, and there are over 50k blocked hosts with the default
> blocklist, so having that O(n) instead of O(1) is probably a noticable
> performance difference.
>
> I added a benchmark test for adblock lookups:
> https://github.com/qutebrowser/qutebrowser/commit/27d4796c2f9ced450ceaf3569de228532ce08df7
>
> Currently looking up whether a host is blocked takes 3us (median).
> When changing the lookup to:
>
>   any(h == host for h in self._blocked_hosts)
>
> instead of
>
>   host in self._blocked_hosts
>
> it instead takes 5.6ms, so almost 2000 times slower. Seeing that there
> can be hundreds of adblock lookups for a page load, that's not
> acceptable.
>
>
> Like you say, the best way out would be having some kind of
> UrlPatternSet class which can be more intelligent about lookups. I've
> had some thoughts on that noted down somewhere already, but I can't find
> them right now. Basically, you could hash patterns by their hosts, and
> then if you want to check whether foobar.example.com is contained, make
> some O(1) lookups for:
>
> - foobar.example.com
> - *.example.com
> - *.com
> - *
>
> However, ensuring that this works efficiently for all imaginable cases
> probably requires some more thought.
>
> I'd instead propose the following:
>
> - Add a UrlPatternSet class which we can extend like explained above
>   over time.
> - For now, just add a special case for patterns in the form of
>   *://example.com/* (all schemes, constant host, all paths), by looking
>   them up in a Python set. For all patterns which are more complex than
>   that, fall back to iterating.
>
> That solution should be quite straightforward for now, and result in a
> huge speedup for the common case of adblocking lists.
>
> What do you think?

Sounds like an excellent plan. I'll implement it the way you
described. After that I'll try to investigate and find a better way to
speed it up.

--
Best regards, Viacheslav Chimishuk
vchimishuk at yandex.ru
Ukraine, Khmelnitsky



More information about the qutebrowser mailing list