jira-projects/ADBLK#2: Zone shortcircuit logic throws false positives



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: v2
Created: 09-Jun-22 07:54



Description

There's a section within the build script which attempts to detect whether a domain is part of a zone that's been blocked and avoid writing it into the unbound format output if so:

# Check if the domain exists within a zone that'll be blocked
egrep -e "^${domain#*.}|^$domain" $blocked_zones > /dev/null
if [ "$?" == "1" ]
then
    echo "local-data: \"$domain A 127.0.0.1\"" >> $unbound_listbuild
    echo "local-data: \"$domain AAAA ::1\"" >> $unbound_listbuild
fi

This isn't just for efficiency purposes - if we write in foo.bar.com when there's a local-zone statement for bar.com then unbound will refuse to start, breaking people's installs.

The grep attempts to check for two scenarios:

  • bar.com exists in the zone blocks
  • foo.bar.com exists in the zone blocks

The first check is the one that's problematic.

In #1 we found that the domain pecult.com wasn't being included in the Unbound format blocklist.

The reason is that the egrep line evaluates to

egrep -e "^com|^pecult.com" lists/zones.txt 

Which matches the following line

commoncannon.com

We need the check to be able to tell whether the domain that's being checked is likely to be a second level domain or not.
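
The false positive can be reproduced in isolation. This is a minimal sketch rather than the build script itself: the temporary file is a hypothetical stand-in for lists/zones.txt, and `grep -E` is used in place of `egrep`.

```shell
# Throwaway zones file containing only the line that triggers the false match
zones_file=$(mktemp)
echo "commoncannon.com" > "$zones_file"

domain="pecult.com"
parent="${domain#*.}"   # removes the shortest leading "label." so only "com" is left

# Same pattern shape as the build script: "^com" matches the start of "commoncannon.com"
false_match=$(grep -E -e "^${parent}|^${domain}" "$zones_file")

echo "parent=$parent"
echo "matched=$false_match"

rm -f "$zones_file"
```

Because the parent collapses to the bare TLD, any zone whose name begins with "com" is enough to suppress pecult.com.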




Activity


assigned to @btasker

mentioned in issue #1

One thing that's odd here is that the code statement was copied verbatim from the original adblock implementation, yet pecult.com made it into the unbound format in that one. Should probably look at how/why in case there's something that's not been translated across.

In the original blocklists it's being added as a blocked zone:

local-zone: "pecult.com" redirect
local-data: "pecult.com A 127.0.0.1"

The logic we're looking at (in blockDomain()) adds a domain record (i.e. local-data only).

So, whatever is adding it in the original must be operating on zones (in other words, it's not added via blockDomain() there).

It must be added by this block:

cat autolist.zones.txt | sort | uniq | egrep -v -e '^$' | while read -r domain
do

cat << EOM >> autolist.build.txt
local-zone: "$domain" redirect
local-data: "$domain A 127.0.0.1"
EOM


cat << EOM >> zoneblocks.unbound.txt
local-zone: "$domain" redirect
local-data: "$domain A 127.0.0.1"
EOM

echo "$domain" >> blockedzones.txt

done

So, the domain is leaking into autolist.zones.txt somewhere....

Ah, this is a false positive - the list hadn't been regenerated following a few changes.

So, there's nothing special about the original version that meant it made it in.

Back to the intent of this issue then.

We need to be able to handle domains whose suffixes have different depths

  • news.bbc.co.uk
  • foo.google.com

And correctly search the zones file for the domain (so bbc.co.uk and google.com respectively)

The simplest way might be to keep the current egrep but capture the trimmed parent into a variable and then check the number of periods in it - if there are none, we've trimmed too much. But that wouldn't handle the bbc.co.uk use case: trimming leaves co.uk, which still contains a period, so we'd end up falsely matching co.ukabcde.com if it were in the zone list.
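
A rough sketch of that period check, to show where it falls short. The function name `parent_by_period_check` is hypothetical and only exists for this illustration.

```shell
# Hypothetical helper: print the parent to search for, falling back to the
# full domain when trimming left no periods (i.e. only a TLD such as "com")
parent_by_period_check() {
    domain="$1"
    parent="${domain#*.}"
    case "$parent" in
        *.*) echo "$parent" ;;   # still contains a period, so keep it
        *)   echo "$domain" ;;   # trimmed too far, use the domain itself
    esac
}

parent_by_period_check "pecult.com"   # prints pecult.com
parent_by_period_check "bbc.co.uk"    # prints co.uk

# The second result is the problem: "^co.uk" still matches an unrelated zone
echo "co.ukabcde.com" | grep -E -e "^$(parent_by_period_check "bbc.co.uk")"
```

The check rescues pecult.com, but a domain registered directly under a two-part suffix still gets trimmed down to that suffix.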

Test case then:

cat << EOM > zones.txt

common.blah
google.com
EOM


domain="google.com"
egrep -e "^${domain#*.}|^$domain" zones.txt

Currently returns both lines, so let's handle that

domain="google.com"

dom_suffix="${domain#*.}"
tld=${dom_suffix#*.}

echo $dom_suffix
echo $tld

if [ "$tld" == "$dom_suffix" ]
then
    # there was no additional domain on the end
    dom_suffix=$domain
fi


echo $domain
egrep -e "^$dom_suffix|^$domain" zones.txt

Matches only google.com.

But, we now need to make sure it caters to news.bbc.co.uk

cat << EOM > zones.txt

common.blah
google.com
bbc.co.uk
EOM

domain="news.bbc.co.uk"

dom_suffix="${domain#*.}"
tld=${dom_suffix#*.}

if [ "$tld" == "$dom_suffix" ]
then
    # there was no additional domain on the end
    dom_suffix=$domain
fi

egrep -e "^$dom_suffix|^$domain" zones.txt

Looks good
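
For convenience, the same logic can be wrapped in a function and run against all of the cases discussed so far. This is only a sketch: `in_blocked_zone` is a hypothetical name (the build script does this inline), the zones file is a temporary one, and `grep -E` stands in for `egrep`.

```shell
# Hypothetical wrapper around the logic above: exits 0 when the domain, or its
# immediate parent, appears at the start of a line in the zones file
in_blocked_zone() {
    domain="$1"
    zones="$2"
    dom_suffix="${domain#*.}"
    tld="${dom_suffix#*.}"
    if [ "$tld" = "$dom_suffix" ]; then
        # Only a TLD was left after trimming, so search for the domain itself
        dom_suffix="$domain"
    fi
    grep -E -e "^${dom_suffix}|^${domain}" "$zones" > /dev/null
}

zones_file=$(mktemp)
printf '%s\n' "common.blah" "google.com" "bbc.co.uk" "commoncannon.com" > "$zones_file"

for d in google.com foo.google.com news.bbc.co.uk pecult.com; do
    if in_blocked_zone "$d" "$zones_file"; then
        echo "$d: covered by a blocked zone"
    else
        echo "$d: not covered"
    fi
done

rm -f "$zones_file"
```

With this zones file the first three domains are reported as covered and pecult.com is not, even though commoncannon.com is present.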

verified

mentioned in commit de03cdc6ebfec17049026202579ed02916fd9b20

Commit: de03cdc6ebfec17049026202579ed02916fd9b20
Author: B Tasker
Date: 2022-06-09T14:58:10.000+01:00

Message

Fix domain filtering for jira-projects/ADBLK#2

This handles checking the zone blocks for a parent zone without accidentally checking for the TLD

+1468 -3 (1471 lines changed)

The domain that started all this now gets into the unbound format list

$ grep pecu lists/unbound.txt 
local-data: "pecult.com A 127.0.0.1"
local-data: "pecult.com AAAA ::1"