Activity
08-Jun-22 16:00
assigned to @btasker
08-Jun-22 16:10
Currently, I provide quite a range of options at https://www.bentasker.co.uk/adblock/:
ABP/Ublock compatible list of blocked domains/zones [add Manual Blocks to ABP]
ABP/Ublock compatible list of blocked domains/zones without Social Media tracker domains [add Manual Blocks (no SM) to ABP]
Modified version of EasyList (ABP/Ublock compatible) [add Modified Easylist Blocks to ABP]
Modified version of EasyList (ABP/Ublock compatible) without Social Media tracker domains [add Modified Easylist Blocks (no SM) to ABP]
ABP compatible list of Social Media tracker domains [Add Social Media Trackers to ABP]
Pi-Hole compatible blocklist
Pi-Hole compatible blocklist with Social Media tracking domains
(There are some Greasemonkey scripts too, but I'll ignore those as they're largely static).
Workflow
These lists are refreshed, and often compiled, by a fairly clunky toolchain, initially triggered by update_addomains.sh.
The workflow was quickly hacked together quite some time ago, and hasn't really had the attention it needs to be less crap.
It has had some minor improvements, like the amendment to allow blocks to be broken out into dedicated files (allowing categorisation of domains), but as workflows go it's still pretty shoddy.
Delivery
My adblock lists are delivered via www.bentasker.co.uk using my standard CDN resources. This is unusual, in that most adblock lists tend to be delivered straight from Github.
The problem with delivering via www.bentasker.co.uk is that the cache needs to be invalidated whenever an update is made to the lists. This makes automation hard (as the server either needs creds to interact with the CDN, or you have to have quite short TTLs).
Management
Blocks are managed via a number of config files, and then compiled into the published files.
But, there's some legacy cruft from the old management approach, so lines may not always be consistently added in the correct place.
This could probably be addressed independently if needed, but it's worth including in the context of wider changes.
08-Jun-22 16:12
Option 1: Discontinue Lists entirely
Had it been just me using the lists, then it's possible that I might have considered this option (and instead locally hosted something for pihole to consume).
However, looking at my access log, there are thousands upon thousands of requests a month for the adblock lists.
Whilst I don't want to continue the lists in their current form, it feels like offering an alternative would be the decent thing.
08-Jun-22 16:17
Option 2: Discontinue (most) automation
This is currently my preferred option.
In this option, we'd spin out a new set of adblock lists, but with certain elements of the automation removed.
In particular, we'd no longer retrieve and rewrite easylist lists (so the config files easylist_append_lines.txt, easylist_strip.txt and easylist_strip_absolute.txt would be deprecated and removed).
Configuration would be revised, but wouldn't be that dissimilar to now.
However, the project would no longer be reliant on a cron script somewhere - individual lists would be compiled, on commit, by a git hook.
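Building on commit needs a little care, because a hook that amends the commit fires the hooks again (the 18:37 commit further down settles on a lockfile for exactly this reason). The sketch below shows one way a pre-commit/post-commit pair could hand off through a lockfile. The lockfile path, the build script name and the use of --no-verify on the amend are all assumptions rather than details of the project; the runner is injectable so the logic can be exercised without a real repository.

```python
"""Sketch of a pre-commit/post-commit lockfile handshake for rebuilding lists."""
import subprocess
from pathlib import Path
from typing import Callable, List

LOCKFILE = Path(".git/rebuild-lists.lock")  # hypothetical lockfile location
BUILD_CMD = ["./build_lists.sh"]  # hypothetical list-building script
LIST_DIR = "list"  # directory holding the generated lists

Runner = Callable[[List[str]], None]


def run_checked(cmd: List[str]) -> None:
    subprocess.run(cmd, check=True)


def pre_commit(lockfile: Path = LOCKFILE) -> None:
    """Called from pre-commit: just record that a rebuild is wanted."""
    lockfile.touch()


def post_commit(lockfile: Path = LOCKFILE, run: Runner = run_checked) -> bool:
    """Called from post-commit: rebuild once and amend. Returns True if it amended."""
    if not lockfile.exists():
        # No rebuild requested, or this is the post-commit fired by our own
        # --amend below: stop here rather than looping forever.
        return False
    lockfile.unlink()  # remove first so the re-triggered post-commit exits
    run(BUILD_CMD)
    run(["git", "add", LIST_DIR])
    # --no-verify skips pre-commit, so the amend doesn't recreate the lockfile
    run(["git", "commit", "--amend", "--no-edit", "--no-verify"])
    return True
```

In this arrangement the pre-commit hook would just call pre_commit() and the post-commit hook post_commit(); deleting the lockfile before amending is what stops the second post-commit run from rebuilding again.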
This change in build process means that external lists (such as the cname-trackers list) shouldn't be included. Whilst they could trivially be pulled in during a hook run, it doesn't make logical sense to present a "complete" list knowing that it'll only be updated when I add something to my own lists. Including third-party lists requires a regular refresh cadence, which I don't want to commit to in this project.
Delivery of the lists would be via Github rather than my CDN.
The idea here is that I should be able to publish updates to the lists more quickly, without changes to my own infra necessarily affecting them.
The existing adblock files would be left where they are - where possible, redirects would be added to the new version, but only where the new version is directly compatible with what the user-agent thinks it's requesting (i.e. if they're requesting something with no social media domains in it, we can't redirect them to a generic list).
08-Jun-22 16:19
Option 3: Do Nothing
The final option is to carry on as we are now - do nothing.
But, it's likely to impact the cadence of updates: I've had a few infra changes this year, and the cron to retrieve and publish updates needs revising (in part due to my move to using Nikola for my main site).
08-Jun-22 16:21
My intention is to look at moving to Option 2.
Ideally, I'll continue to track changes in this project, and given the option I'd like commits to be made into this project too (potentially with some sort of dual-remote setup in order to publish into Github).
The logical approach would be to fork the existing repo to make the necessary changes so that (where possible) a record of when domains were added (and why) is retained in the commit history.
08-Jun-22 16:32
mentioned in commit 81eeba99bd0de0ac00ed36edd9f3cd8f88f91195
Message
Create hook to update hooks after git-pull in preparation for jira-projects/ADBLK#1
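Git doesn't version the contents of .git/hooks, so hooks kept in the repository have to be copied into place. A minimal sketch of what a post-merge hook (which Git runs after a successful git pull) could do, assuming the tracked hooks live in a directory called hooks/ (a hypothetical name, not taken from the commit):

```python
#!/usr/bin/env python3
"""Hypothetical post-merge hook: refresh .git/hooks from a tracked directory."""
import shutil
import stat
from pathlib import Path
from typing import List


def install_hooks(tracked_dir: Path, git_hooks_dir: Path) -> List[str]:
    """Copy every regular file in tracked_dir into git_hooks_dir and make it executable.

    Returns the names of the installed hooks, in sorted order.
    """
    git_hooks_dir.mkdir(parents=True, exist_ok=True)
    installed = []
    for src in sorted(tracked_dir.iterdir()):
        if not src.is_file():
            continue
        dest = git_hooks_dir / src.name
        shutil.copyfile(src, dest)
        # Git ignores hooks that lack the executable bit
        dest.chmod(dest.stat().st_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
        installed.append(src.name)
    return installed
```

Git runs hooks from the top of the working tree, so the hook itself could simply call install_hooks(Path("hooks"), Path(".git/hooks")).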
08-Jun-22 16:34
mentioned in commit 9900a5f2641dd6d4e8ab7371c42b5075025dfbb1
Message
Add a post-commit hook in support of jira-projects/ADBLK#1
This will ensure that when a commit is made, it's pushed to both Gitlab and Github.
This may be refined later to only trigger when on the master branch - so that complex changes can be performed in a branch and only pushed once they're merged
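A post-commit hook that pushes to both remotes might look roughly like this. The remote names origin and github are assumptions, and ONLY_BRANCH implements the "only on master" refinement as an optional switch.

```python
#!/usr/bin/env python3
"""Hypothetical post-commit hook: push the current branch to every remote."""
import subprocess
from typing import List, Optional

REMOTES = ["origin", "github"]  # assumed remote names (Gitlab, Github)
ONLY_BRANCH: Optional[str] = "master"  # None pushes from every branch


def current_branch() -> str:
    result = subprocess.run(
        ["git", "rev-parse", "--abbrev-ref", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()


def push_commands(branch: str, remotes: List[str] = REMOTES,
                  only_branch: Optional[str] = ONLY_BRANCH) -> List[List[str]]:
    """One `git push` per remote, or nothing if this branch is filtered out."""
    if only_branch is not None and branch != only_branch:
        return []
    return [["git", "push", remote, branch] for remote in remotes]


def main() -> None:
    for cmd in push_commands(current_branch()):
        # check=False so a failure on one remote doesn't skip the other
        subprocess.run(cmd, check=False)
```

The hook file would just call main(). An alternative that needs no hook is giving a single remote several push URLs with git remote set-url --add --push origin <url>; once any push URL is configured Git pushes only to those, so the original URL has to be added as a push URL too.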
08-Jun-22 16:38
mentioned in commit 13b3199017936d9c237bf5aefd67ed0259928930
Message
Remove files that are defunct under jira-projects/ADBLK#1
These files will continue to be hosted at https://www.bentasker.co.uk/adblock/ but are not part of v2 of this project
08-Jun-22 16:50
OK, the easy bit is done - the next stage is to look at writing a pre-commit hook that can build lists for us.
We first need to define which lists we want to continue to provide. To a certain extent, that's going to be defined by which are actually in use.
Being ported:
autolist.txt: gets a few hundred requests a month, so should probably port that over
adblock_compiled.txt: ABP/Ublock compatible list of blocked domains/zones
blockeddomains.txt: Pihole compatible list
manualzones.txt: used for regex blocks in Pihole
Not being ported:
zoneblocks.unbound.txt: Unbound format version of manualzones.txt
adblock_compiled_no_sm.txt
easylist_modified.txt: modified easylist support is being deprecated
easylist_modified_no_sm.txt: modified easylist support is being deprecated
social_media_trackers.txt: very limited use
It doesn't look like there's any particular current interest in lists that identify social media trackers separately (or those that exclude them).
So, V2 will generate a much simpler subset of lists, which can be summarised as
08-Jun-22 17:04
mentioned in commit af75c5b9128769922004031e09c89f4a9e040621
Message
Start creating new list building script for jira-projects/ADBLK#1
This'll eventually be triggered as part of a hook.
Although I'm tweaking a little as I go, initially it probably won't be much less clunky than the original as I'm using that as the basis
08-Jun-22 17:15
mentioned in commit 58202238c9736382c38d1f32c73b0069b8c56ddb
Message
Add support for list of zones for jira-projects/ADBLK#1
This is missing some functionality from the original, should probably add that later
08-Jun-22 17:21
mentioned in commit 60feb062857d8ab6069dce2029a0d899b4d4e700
Message
Add AdblockPlus compatability for jira-projects/ADBLK#1
This generates a file compatible with ABP and UBlock Origin
08-Jun-22 17:22
We now have our 4 formats - the script is much simpler than its predecessor (largely because we're not having to mess about with modifying easylist).
08-Jun-22 17:23
mentioned in commit a2c9cfc1e38e24a85446360c08290f90b1706838
Message
As of jira-projects/ADBLK#1 social media trackers will no longer be accounted for in seperate blocklists.
Move into the general block config
08-Jun-22 17:23
mentioned in commit eefcd76f6abad9f6c0503991c6e1ca6401296bbc
Message
The easylist overrides are defunct as of jira-projects/ADBLK#1
Remove them
08-Jun-22 17:34
mentioned in commit 5ea126941f5ea5c8c3507e592c3b71b34e95405a
Message
Update script to install/publish the generated lists for jira-projects/ADBLK#1
08-Jun-22 18:37
mentioned in commit cae4954b315ff09e295996158c63b41201308d4c
Message
Have commit hooks rebuild and publish the lists for jira-projects/ADBLK#1
The process is a little contrived:
pre-commit will let you stage files, but not add them into the current commit.
So, instead, we write to a lockfile which post-commit checks for. If present, it'll rebuild the lists and amend the commit to include them.
The lockfile is used because otherwise the commit --amend will re-trigger post-commit, giving an infinite loop.
08-Jun-22 19:07
The list directory in the repository contains more or less a single adblock list, published in a number of different formats:
adblock_plus.txt: Adblock Plus and UBlock Origin compatible format
unbound.txt: Unbound config compatible format
blockeddomains.txt: A simple list of blocked domains
regexes.txt: A list of zone wide blocks
zones.txt: A list of zone wide blocks
The list of blocked zones can be used with a parser to generate regexes to feed into PiHole.
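Producing several formats from one source is mostly string templating. The sketch below renders a set of domains and zones into each of the files listed above. The ABP ||domain^ syntax and the Pi-hole (^|\.)zone$ regex style are standard, but the Unbound action (always_nxdomain) and which inputs feed which file are assumptions, since the project's build script isn't shown here.

```python
"""Render one set of blocked domains and zones into several list formats."""
from typing import Dict, Iterable, List


def adblock_plus(names: Iterable[str]) -> List[str]:
    # ABP/uBlock: ||example.com^ matches the domain and its subdomains
    return [f"||{n}^" for n in names]


def unbound(zones: Iterable[str]) -> List[str]:
    # Assumed action: answer NXDOMAIN for the zone and everything beneath it
    return [f'local-zone: "{z}" always_nxdomain' for z in zones]


def pihole_regex(zones: Iterable[str]) -> List[str]:
    # (^|\.)example\.com$ matches the zone apex and any subdomain
    out = []
    for z in zones:
        escaped = z.replace(".", "\\.")
        out.append("(^|\\.)" + escaped + "$")
    return out


def render_all(domains: Iterable[str], zones: Iterable[str]) -> Dict[str, List[str]]:
    """Map each output filename to its lines (the input-to-file mapping is assumed)."""
    doms = sorted(set(domains))
    zs = sorted(set(zones))
    return {
        "adblock_plus.txt": adblock_plus(sorted(set(doms) | set(zs))),
        "unbound.txt": unbound(zs),
        "blockeddomains.txt": doms,
        "regexes.txt": pihole_regex(zs),
        "zones.txt": zs,
    }
```

Deduplicating and sorting before rendering keeps the generated files stable between builds, so a commit only shows the entries that actually changed.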
08-Jun-22 19:08
Still TODO
09-Jun-22 07:28
Need to think carefully before doing any redirects, as we don't want to accidentally remove people's existing protection.
For example, because the original rewrites easylist (and pulls in a list of miner domains and cname trackers), there are quite a few entries in blockeddomains.txt, whereas the repo version has far fewer.
So, it'd be unwise to redirect blockeddomains.txt away, as it'd have the effect of unexpectedly removing ~30K domains from people's blocklists.
09-Jun-22 07:34
The same logic applies to the ABP format (5676 vs 1019).
The unbound format (previously autolist.txt) is less clear - the previous version had 394 entries, the current one has 390. Should look into why that is.
Similarly, the list of blocked zones is close (195 vs 197); again, we should look into why.
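Chasing down count discrepancies like these is easiest with a set difference between the old and new files. A sketch, assuming one entry per line and that lines starting with # or ! are comments (both assumptions about the file layout):

```python
"""Report which entries exist only in the old list or only in the new one."""
from pathlib import Path
from typing import Iterable, Set, Tuple


def entries(lines: Iterable[str]) -> Set[str]:
    """Normalise lines into a set, ignoring blanks and comment lines."""
    out = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith(("#", "!")):
            out.add(line)
    return out


def compare(old: Iterable[str], new: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (only_in_old, only_in_new)."""
    old_set, new_set = entries(old), entries(new)
    return old_set - new_set, new_set - old_set


def compare_files(old_path: Path, new_path: Path) -> Tuple[Set[str], Set[str]]:
    return compare(old_path.read_text().splitlines(),
                   new_path.read_text().splitlines())
```

Usage would be along the lines of compare_files(Path("autolist.txt"), Path("list/unbound.txt")) (hypothetical paths); the first set is what the new build dropped.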
09-Jun-22 07:35
The list of regex blocks can be redirected - they're identical.
09-Jun-22 07:49
The missing lines are
It looks like the build script is skipping it because it'll be blocked as a zone, but then the zone block never makes it into the file...
That happens because the check runs
Which returns true.
There's no good reason for the inversion (-v) there, so I'm removing that.
It returns true because the statement evaluates to
The initial check was intended to strip a subdomain off (so that a block for foo.bar.com wouldn't collide with a zone block for bar.com); obviously that doesn't work well if there is no subdomain in the name.
It's not trivially addressable either: we could check for a depth of 2, but foo.co.uk would fail that check and still return the same result.
I'll spin out a separate ticket to track that one (#2).
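One way to sidestep the label-depth problem entirely is to check every parent suffix of a name against the set of blocked zones, rather than stripping a fixed number of labels. This is an illustrative alternative, not necessarily what the fix for #2 does: it only skips a domain when a zone that covers it is actually present in the zone list.

```python
"""Decide whether a domain is already covered by a zone-wide block."""
from typing import Set


def covered_by_zone(domain: str, zones: Set[str]) -> bool:
    """True if the domain itself, or any parent of it, is a listed zone.

    `zones` is expected to hold lowercase names without a trailing dot.
    """
    labels = domain.lower().rstrip(".").split(".")
    # Check a.b.c.com, then b.c.com, then c.com, then com
    for i in range(len(labels)):
        if ".".join(labels[i:]) in zones:
            return True
    return False
```

Because the test is exact membership, foo.co.uk only counts as covered if foo.co.uk or co.uk is itself listed, so no knowledge of public suffixes is needed.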
09-Jun-22 07:54
mentioned in issue #2
09-Jun-22 14:03
#2 is resolved. It also accounts for (and has corrected) the difference between blockedzones.txt and zones.txt.
09-Jun-22 14:12
The new lists now have the same content as their predecessors, except for the third-party entries pulled in. So, I think we're good in that respect.
It's occurred to me that implementing any redirects is probably unwise. A number of my scripts do something like
Those would be broken by a redirect (because curl won't follow it by default). I can fix my own scripts, but that'd still leave the risk of breaking other people's deployments.
So, no redirects to implement.
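To illustrate the failure mode: curl only follows redirects when given -L (--location), and Python's http.client likewise never follows them. The sketch below runs a local stand-in server that answers the old list URL with a 301, fetches it without following the redirect, and shows a guard that keeps the existing list unless the response is a 200 with content. The server, paths and guard are hypothetical illustrations, not the author's scripts.

```python
"""Why a redirect would break a fetch-and-overwrite script (illustrative only)."""
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Tuple


class RedirectHandler(BaseHTTPRequestHandler):
    """Stand-in for the old list URL once it has been redirected."""

    def do_GET(self):
        self.send_response(301)
        # Hypothetical new location; .invalid is reserved, so nothing real is hit
        self.send_header("Location", "https://example.invalid/list/blockeddomains.txt")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass


def fetch(host: str, port: int, path: str) -> Tuple[int, bytes]:
    """GET without following redirects (like curl without -L)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, resp.read()
    finally:
        conn.close()


def safe_update(status: int, body: bytes, current: bytes) -> bytes:
    """Keep the existing list unless we got a 200 with a non-empty body."""
    if status == 200 and body.strip():
        return body
    return current


def demo() -> Tuple[int, bytes, bytes]:
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        status, body = fetch("127.0.0.1", server.server_address[1],
                             "/adblock/blockeddomains.txt")
    finally:
        server.shutdown()
        server.server_close()
    # A naive script would now write `body` (empty) over its blocklist
    kept = safe_update(status, body, b"ads.example.com\n")
    return status, body, kept
```

A guard like safe_update protects a consumer against any bad response, but it only helps scripts that have it, which is why not redirecting at all is the safer choice for existing deployments.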
09-Jun-22 19:27
Writeup published at https://www.bentasker.co.uk/posts/blog/general/replacing-my-ad-block-lists-with-a-newer-version.html.
I've disabled all automation around version 1 - I think we're done.