The crawler already supports skipping URLs if they contain a substring defined in config/skipstrings.txt.
However, substring rules aren't always suitable: for example, you may only want to block a substring on a specific domain.
So, the crawler should also accept regular expressions to apply against URLs.
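
As a rough illustration of the idea (not the crawler's actual code), a regex check could sit alongside the existing substring check. This is a minimal sketch: the function names `load_skip_regexes` and `should_skip`, the one-pattern-per-line format, and the example URLs are all assumptions made for illustration.

```python
import re


def load_skip_regexes(lines):
    """Compile one regular expression per non-empty line.

    Assumption: rules are supplied one pattern per line, in the same
    way substrings are listed in config/skipstrings.txt.
    """
    patterns = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        patterns.append(re.compile(line))
    return patterns


def should_skip(url, skip_strings, skip_regexes):
    """Return True if the URL matches any substring or regex rule."""
    # Existing behaviour: plain substring match
    if any(s in url for s in skip_strings):
        return True
    # New behaviour: regex match anywhere in the URL
    return any(r.search(url) is not None for r in skip_regexes)


# Hypothetical rules: block "?sort=" URLs, but only on example.com
skip_strings = ["/logout"]
skip_regexes = load_skip_regexes([r"^https?://example\.com/.*\?sort="])
```

With these hypothetical rules, `https://example.com/files/?sort=name` would be skipped while the same path on `example.org` would not, which is the domain-specific case that a plain substring can't express.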
Activity
29-Dec-23 10:37
assigned to @btasker
29-Dec-23 10:38
mentioned in commit 5d0437abe2ff5df1f9943949f8951c871c9fb142
Commit: 5d0437abe2ff5df1f9943949f8951c871c9fb142 Author: B Tasker Date: 2023-12-29T10:35:13.000+00:00 Message:
feat: add support for regex based skip rules (utilities/file_location_listing#6)