This project is the follow-on from misc/Python_Web_Crawler#12
I discontinued that project because I no longer had need for full-text search.
What I do continue to have a need for, though, is identifying where I stored a file - i.e. searching by filename, path etc.
The aim of this project is to stand up a simple crawler and web portal which allows me to search for files by location.
Although I'm not sure that it'll scale (in fact, I'm certain that it won't), I'd like the initial implementation to function without reliance on a traditional database - the focus should be on getting the crawler and information collection up and running.
The crawler should read a list of predefined domains from config and crawl pages on those domains. It should store the location (URL and path) of each file that it finds, so that it can be searched later.
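As a rough illustration of that behaviour, here's a minimal sketch of the config loading, scope check, and flat-file storage. The config format (`config.json` with a `domains` key) and the one-URL-per-line index layout are assumptions for illustration, not the actual design.

```python
import json
import urllib.parse


def load_domains(config_path):
    """Read the list of predefined domains to crawl from a JSON config file.

    Assumed format: {"domains": ["example.com", ...]}
    """
    with open(config_path) as f:
        return json.load(f)["domains"]


def in_scope(url, domains):
    """Only crawl pages whose host matches one of the configured domains."""
    host = urllib.parse.urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in domains)


def store_url(url, index_path):
    """Append a crawled URL to a plain-text index - no traditional database."""
    with open(index_path, "a") as f:
        f.write(url + "\n")
```

Appending to a flat file keeps the initial implementation trivial, at the cost of the scaling concerns noted above.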
Nice to haves
Activity
28-Dec-23 12:09
assigned to @btasker
28-Dec-23 12:11
Raised utilities/file_location_listing#2 for the crawler
28-Dec-23 17:52
OK, we have a working proof of concept.
The next thing to do will be to productise it:
- /search/
- API should be documented so that I can call it from CLI utilities

So far, it performs pretty well using the text storage and indexes - I suspect that'll cease to be true once we've done a full crawl though.
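The performance concern makes sense given what a text-index search implies: a linear scan over every stored line. A hedged sketch of that lookup (the index format and function name are assumptions, not the actual code):

```python
def search_index(index_lines, term):
    """Case-insensitive substring match against stored URLs/paths.

    This is O(n) in the number of indexed entries - fine for a partial
    crawl, but likely the reason performance will degrade once a full
    crawl has populated the index.
    """
    term = term.lower()
    return [line for line in index_lines if term in line.lower()]
```

A full crawl turns every query into a scan of the entire index, which is why moving to a proper database (or at least a sorted/partitioned index) may eventually be unavoidable.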