Subnet matches were introduced in project-management-only/scraper-snitch-bot#9 to address the noise caused by Meta's crawlers.
Because of the size of the IPv6 address space, Meta's crawlers are able to connect from a wide range of IPs, with each ultimately triggering a notification toot. That's not necessarily Meta playing fast and loose so much as a reflection of how IPv6 addresses tend to be allocated.
The subnet matching functionality is something of a quick hack, but should serve to reduce this noise.
Within the config, I define known subnets:
grouped_prefixes:
- 2a03:2880::/32
- 2620:0:1c00::/40
If a misbehaving IP falls within one of these ranges, a few things happen:
- The receipts filename is overridden to reflect the subnet (so 2a03-2880-1ff1-ab.txt becomes 2a03-2880--32-subnet.txt)
- The toot and receipts filename text is updated to note that it's a subnet match (and include the subnet)
- The flag Subnet-Match is allocated
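As a rough illustration of the matching step, here is a minimal Python sketch. It assumes the prefixes from grouped_prefixes have been loaded into a list. The function names (find_subnet, safe_name, receipts_filename) and the character substitutions used to build the filename are hypothetical, not taken from the bot's code; they were picked only because they give names of the same shape as the examples above.

```python
import ipaddress

# Assumed to mirror grouped_prefixes from the config
GROUPED_PREFIXES = ["2a03:2880::/32", "2620:0:1c00::/40"]


def find_subnet(ip, prefixes=GROUPED_PREFIXES):
    """Return the first configured prefix containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    for prefix in prefixes:
        net = ipaddress.ip_network(prefix)
        # An IPv4 address can never sit inside an IPv6 prefix (and vice versa)
        if addr.version == net.version and addr in net:
            return prefix
    return None


def safe_name(value):
    """Hypothetical filename scheme: '::', ':' and '/' each become '-'."""
    return value.replace("::", "-").replace(":", "-").replace("/", "-")


def receipts_filename(ip, prefixes=GROUPED_PREFIXES):
    """Name the receipts file after the subnet on a match, else after the IP."""
    subnet = find_subnet(ip, prefixes)
    if subnet:
        return safe_name(subnet) + "-subnet.txt"
    return safe_name(ip) + ".txt"
```

Under these assumptions, receipts_filename("2a03:2880:1ff1::ab") gives 2a03-2880--32-subnet.txt, while an address outside every configured prefix keeps a per-IP filename.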
There will be no re-toot if a bot from another IP within that subnet comes along and misbehaves - the assumption is that users will have blocked the subnet as a result of the earlier alert.
The toot text will also change to refer to a subnet rather than an IP.
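The once-per-subnet behaviour can be sketched as below. This is an assumption-laden illustration: should_toot and the in-memory alerted_subnets set are hypothetical, and the text does not say how the bot actually records which subnets have already been reported.

```python
def should_toot(subnet, alerted_subnets):
    """Return True only the first time a given subnet is seen."""
    if subnet in alerted_subnets:
        # A bot in this subnet has already been reported: stay quiet
        return False
    alerted_subnets.add(subnet)
    return True


# Example: two misbehaving IPs that both matched the same prefix
alerted = set()
first = should_toot("2a03:2880::/32", alerted)   # first offender triggers a toot
second = should_toot("2a03:2880::/32", alerted)  # later offender is suppressed
```

A real deployment would need the set to survive restarts, otherwise every restart would re-alert on each subnet.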
Shortcomings
There are a few known shortcomings with this approach:
- Although not re-alerting was part of the aim, it also raises the cost of missing an alert, because later offenders in the same subnet will not trigger a fresh one
- Some of the stats that are reported will be wrong: the count of how many requests have been seen is still tied to the individual IP rather than reflecting the subnet as a whole (this will likely be fixed in future)
- Existing/known bots will trigger the subnet alert if they reappear (this is probably a good thing, but might be annoying whilst it's happening) - it'll only happen for the first match to each subnet though
However, these are probably outweighed by the benefit of reduced alert fatigue.
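On the request-count shortcoming listed above, one possible direction is to roll per-IP counts up to the subnet that contains them. The sketch below is not the planned fix, only an illustration; requests_per_subnet and the shape of per_ip_counts (a mapping of IP string to request count) are assumptions.

```python
import ipaddress
from collections import Counter


def requests_per_subnet(per_ip_counts, prefixes):
    """Sum per-IP request counts into totals keyed by matching prefix."""
    networks = [ipaddress.ip_network(p) for p in prefixes]
    totals = Counter()
    for ip, count in per_ip_counts.items():
        addr = ipaddress.ip_address(ip)
        for net in networks:
            if addr.version == net.version and addr in net:
                totals[str(net)] += count
                break  # count each IP against the first matching prefix only
    return totals
```

IPs that match no configured prefix are left out of the totals, so they would continue to be reported individually.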