The crawler already supports skipping URLs if they contain a substring defined in config/skipstrings.txt.
However, a plain substring isn't always suitable, for example when you only want to block a substring on one specific domain.
So, we also want to be able to provide regular expressions to apply against URLs.
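As a rough illustration of the idea (not the crawler's actual code), the sketch below checks a URL against both substring rules and regex rules. The function names, the one-rule-per-line format, and the example.com pattern are all assumptions made for the example.

```python
import re


def load_rules(lines):
    """Return the non-empty, stripped lines (assumed format: one rule per line)."""
    return [line.strip() for line in lines if line.strip()]


def should_skip(url, skip_strings, skip_patterns):
    """Return True if the URL matches any substring rule or any regex rule."""
    # Existing behaviour: skip on a plain substring match
    if any(s in url for s in skip_strings):
        return True
    # Requested behaviour: skip if any compiled regex matches anywhere in the URL
    return any(p.search(url) for p in skip_patterns)


# Hypothetical rules: one global substring, one regex scoped to a single domain
skip_strings = load_rules(["/login", ""])
skip_patterns = [
    re.compile(rule)
    for rule in load_rules([r"^https?://www\.example\.com/.*\?replytocom="])
]
```

With rules like these, `?replytocom=` is only blocked on `www.example.com`, while `/login` is still blocked everywhere. Compiling the patterns once at load time avoids recompiling them for every URL.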
Activity
29-Dec-23 10:37
assigned to @btasker
29-Dec-23 10:38
mentioned in commit 5d0437abe2ff5df1f9943949f8951c871c9fb142
Message
feat: add support for regex based skip rules (utilities/file_location_listing#6)