
utilities/file_location_listing#52: Exception in new deployment



Issue Information

Issue Type: issue
Status: closed
Reported By: btasker
Assigned To: btasker

Milestone: v0.2.7
Created: 31-May-24 15:25



Description

I'm in the process of deploying a public instance of this (jira-projects/CDN#65) and ran into an exception:

Traceback (most recent call last):
  File "/app/crawler/app/crawler.py", line 764, in <module>
    crawlPage(site, override = True)
  File "/app/crawler/app/crawler.py", line 577, in crawlPage
    if not shouldCrawlURL(url):
           ^^^^^^^^^^^^^^^^^^^
  File "/app/crawler/app/crawler.py", line 538, in shouldCrawlURL
    if config.shouldSkipURL(url, parsed.netloc):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/crawler/app/../../lib/config.py", line 213, in shouldSkipURL
    SITE_PERMIT_RULES = getPermitRules()
                        ^^^^^^^^^^^^^^^^
  File "/app/crawler/app/../../lib/config.py", line 176, in getPermitRules
    for line in l:
                ^
UnboundLocalError: cannot access local variable 'l' where it is not associated with a value

It may be that I've missed some config rather than it being a code bug. Either way, it should be handled better.




Activity


assigned to @btasker

Yeah it's a bug:

    if os.path.exists(f"{CONFIG_BASE}/config/site-allowregexes.txt"):
        with open(f"{CONFIG_BASE}/config/site-allowregexes.txt") as f:
            l = f.read().splitlines()

    s = {}
    for line in l:

The file site-allowregexes.txt doesn't exist, so we don't open it and l is never assigned, but the loop then tries to iterate over it anyway.
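For illustration, one way to avoid the UnboundLocalError is to bind the list before the existence check, so a missing file just yields no rules. This is a minimal sketch and not necessarily what the actual fix does; the helper name read_lines_or_empty is hypothetical and not part of lib/config.py.

```python
import os


def read_lines_or_empty(path):
    """Return the lines of a rules file, or an empty list if it is absent.

    Hypothetical helper, for illustration only.
    """
    # Bind the name up front so it always has a value, even if the
    # file is missing and the block below is skipped.
    lines = []
    if os.path.exists(path):
        with open(path) as f:
            lines = f.read().splitlines()
    return lines
```

With something like this, the loop in getPermitRules would iterate over an empty list when site-allowregexes.txt hasn't been created, rather than raising.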

verified

mentioned in commit 4d3d8e40ef2e2134619e0d2319f18eef9c510b29

Commit: 4d3d8e40ef2e2134619e0d2319f18eef9c510b29
Author: B Tasker
Date: 2024-05-31T16:27:33.000+01:00

Message

fix: don't error out if allow regexes aren't provided (utilities/file_location_listing#52)

+6 -4 (10 lines changed)

The workaround for released versions is simply to create the file:

touch files/file_location_listing/search_db/config/site-allowregexes.txt

mentioned in issue jira-projects/CDN#65