There's a section within the build script which attempts to detect whether a domain is part of a zone that's been blocked and avoid writing it into the unbound format output if so:
```bash
# Check if the domain exists within a zone that'll be blocked
# (${domain#*.} strips the leftmost label, e.g. foo.bar.com -> bar.com)
egrep -e "^${domain#*.}|^$domain" "$blocked_zones" > /dev/null
if [ "$?" == "1" ]
then
    echo "local-data: \"$domain A 127.0.0.1\"" >> "$unbound_listbuild"
    echo "local-data: \"$domain AAAA ::1\"" >> "$unbound_listbuild"
fi
```
This isn't just for efficiency purposes: if we write in `foo.bar.com` when there's a `local-zone` statement for `bar.com`, then Unbound will refuse to start, breaking people's installs.
The grep attempts to check for two scenarios:

- `bar.com` exists in the zone blocks
- `foo.bar.com` exists in the zone blocks

The first check is the one that's problematic.
In #1 we found that the domain `pecult.com` wasn't being included in the Unbound-format blocklist.
The reason is that the `egrep` line evaluates to

```bash
egrep -e "^com|^pecult.com" lists/zones.txt
```

which matches the following line, because `^com` matches any line that begins with `com`:

```
commoncannon.com
```
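To see the false positive in isolation, here's a minimal sketch that runs the same pattern against a throwaway zones file containing only `commoncannon.com`. The wrapper function `current_check_matches` and the temporary file are hypothetical. `grep -E` stands in for `egrep`, which is the same thing.

```bash
#!/usr/bin/env bash
# Reproduce the false positive from #1 outside the build script.
# current_check_matches is a hypothetical wrapper around the existing pattern.

# Succeeds (exit 0) if the current pattern matches anything in the zones file,
# i.e. the domain would NOT be written to the Unbound-format list.
current_check_matches() {
    local domain=$1 zones_file=$2
    # ${domain#*.} drops everything up to the first dot: pecult.com -> com
    grep -Eq -e "^${domain#*.}|^${domain}" "$zones_file"
}

zones_file=$(mktemp)
echo "commoncannon.com" > "$zones_file"

if current_check_matches "pecult.com" "$zones_file"; then
    echo "pecult.com skipped: ^com matched commoncannon.com"
else
    echo "pecult.com written"
fi

rm -f "$zones_file"
```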
We need the check to be able to tell whether the domain that's being checked is likely to be a second-level domain or not.
Activity
09-Jun-22 07:54
assigned to @btasker
09-Jun-22 07:54
mentioned in issue #1
09-Jun-22 07:55
One thing that's odd here is that the code statement was copied verbatim from the original adblock implementation, yet `pecult.com` made it into the Unbound-format output in that one. Should probably look at how and why, in case something hasn't been translated across.
09-Jun-22 12:42
In the original blocklists it's being added as a blocked zone:
The logic we're looking at (in `blockDomain()`) adds a domain record (i.e. `local-data` only). So whatever is adding it in the original is operating on zones (in other words, it's not added via `blockDomain()` in the original).
09-Jun-22 13:08
It must be being added by this block
So the domain is leaking into `autolist.zones.txt` somewhere...

Ah, this is a false positive: the list hadn't been regenerated following a few changes.
So there's nothing special about the original version that meant it made it in.
09-Jun-22 13:21
Back to the intent of this issue then.
We need to be able to handle TLDs of different depths, such as

- `news.bbc.co.uk`
- `foo.google.com`

and correctly search the zones file for the parent domain (`bbc.co.uk` and `google.com` respectively).

The simplest way might be to use the current `egrep` but capture into a variable and then check the number of periods: if there are none, we've trimmed too much. But that wouldn't handle the `bbc.co.uk` use case (we'd end up falsely matching `co.ukabcde.com` if it were in the zone list).
09-Jun-22 13:53
Test case then:
Currently returns both lines, so let's handle that.
Matches only `google.com`.
But we now need to make sure it caters to `news.bbc.co.uk`:
Looks good
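The committed fix isn't reproduced here, so the following is only a sketch of one way to get the behaviour described: search the zone blocks for parent zones without ever checking the bare TLD. It walks up the labels and uses whole-line, fixed-string matching. It assumes the zones file holds one bare domain per line with no trailing whitespace. The function name `in_blocked_zone` is hypothetical, and the sketch may well differ from what the commit does.

```bash
#!/usr/bin/env bash
# Hypothetical replacement check, not the code from the commit.
# Assumes one bare domain per line in the zones file (e.g. "bbc.co.uk").

# Succeeds if the domain, or any ancestor short of the bare TLD, is listed.
in_blocked_zone() {
    local domain=$1 zones_file=$2
    local candidate=$domain
    # Stop once the candidate has no dot left, i.e. it's just the TLD
    while [[ $candidate == *.* ]]; do
        # -F: literal string (dots aren't wildcards); -x: whole-line match,
        # so "co.uk" can't match "co.ukabcde.com"
        if grep -Fxq -e "$candidate" "$zones_file"; then
            return 0
        fi
        candidate=${candidate#*.}   # news.bbc.co.uk -> bbc.co.uk -> co.uk
    done
    return 1
}

zones_file=$(mktemp)
printf '%s\n' commoncannon.com co.ukabcde.com google.com bbc.co.uk > "$zones_file"

for d in pecult.com foo.google.com news.bbc.co.uk; do
    if in_blocked_zone "$d" "$zones_file"; then
        echo "$d: covered by a blocked zone, skip local-data"
    else
        echo "$d: write local-data"
    fi
done

rm -f "$zones_file"
```

Unlike the original pattern, which only looks one level up, this checks every ancestor. A deeper name such as `a.b.bar.com` would therefore also be caught by a `bar.com` zone block.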
09-Jun-22 13:58
mentioned in commit de03cdc6ebfec17049026202579ed02916fd9b20
Message
Fix domain filtering for jira-projects/ADBLK#2
This handles checking the zone blocks for a parent zone without accidentally checking for the TLD
09-Jun-22 14:02
The domain that started all this now gets into the unbound format list