I have a mirror of some projects at https://projects.bentasker.co.uk/gils_projects/index.html
Most don't use the Wiki, but there's one where I wanted to start.
However, whilst the Wiki index at https://projects.bentasker.co.uk/gils_projects/wiki/project-management-only/scraper-snitch-bot/list.html displays correctly, the links to the pages themselves are broken.
They contain the original host name (docker-host), having not been replaced by wget for some reason.
Need to look at what's different about these links - that there's a hostname there at all suggests that they might be absolute rather than relative links.
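One way to check that suspicion is to scan a mirrored page for links that still carry the original hostname. The following is only a rough sketch: it assumes the page is plain HTML with ordinary anchor tags, and the helper names (LinkCollector, absolute_links) are made up for illustration rather than taken from the mirroring scripts.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def absolute_links(html, host):
    """Return the hrefs whose URL names `host` explicitly.

    Relative links have no hostname component, so they are skipped.
    """
    parser = LinkCollector()
    parser.feed(html)
    return [h for h in parser.hrefs if urlsplit(h).hostname == host]
```

Running absolute_links() over the saved list.html with "docker-host" as the host would list exactly the links that wget left unconverted.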
Activity
28-Jan-23 10:20
assigned to @btasker
28-Jan-23 13:59
Have had a look, this looks like there was an issue during mirroring - when the pages are requested from the mirror container directly they load fine. I think Gitlab's API might briefly have gone unavailable during mirroring, so wget wasn't able to fetch the pages and therefore didn't convert the links.
29-Jan-23 09:48
The mirror's not caught up overnight.
Although we can see those pages were requested, they failed again.
Curiously, we haven't logged anything on the server side. Turning display_errors on and running a capture at the same time to see whether we can collect anything.
Commit views seem to fail periodically with 500s too, but there are quite a lot of those to fetch (so might well just be hitting limits).
29-Jan-23 09:52
OK, looks like the network call failed
We're using a hostname for redis in config, so have replaced that with an IP. Re-running.
29-Jan-23 09:53
Obviously we should also fix the code so that it handles that gracefully (or logs the issue somewhere).
29-Jan-23 09:59
OK, so we already are catching redis failures
The "network" issue though, is presumably in docker - the hostname that we're trying to resolve is actually in /etc/hosts.
Whatever the cause though, if we're unable to look up redis's hostname, we're presumably also unable to use the network to reach gitlab's API, so we get nothing back from there, which we're failing to handle - the codebase largely assumes that we'll get a reply from the API.
To be fair, we correctly fail out with an error when we don't, we just don't log anything to aid troubleshooting.
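As a rough illustration of the kind of logging that's missing, the sketch below wraps an API fetch so that a network or decoding failure is written to a log before the caller errors out. It is not the project's code: the function name fetch_json, the logger name, and the injectable opener parameter are all assumptions made for the example, and Python is used purely for illustration.

```python
import json
import logging
import urllib.request

# Hypothetical logger name, chosen for this example only.
log = logging.getLogger("gils")


def fetch_json(url, opener=urllib.request.urlopen):
    """Fetch and decode JSON from `url`, logging any failure.

    Returns the decoded object, or None if the request or decoding
    failed. `opener` is injectable so the failure path can be
    exercised without touching the network.
    """
    try:
        with opener(url, timeout=10) as resp:
            return json.load(resp)
    except (OSError, ValueError) as exc:
        # OSError covers DNS failures, refused connections and
        # urllib's URLError; ValueError covers malformed JSON.
        log.error("API request to %s failed: %r", url, exc)
        return None
```

The caller can still return a 500 when it gets None back, but the log now records which request failed and why.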
29-Jan-23 10:06
mentioned in commit d91273730d0e33f35347e95fc8f50129db80354b
Message
Add error logging for websites/Gitlab-Issue-Listing-Script#58
Make it easier to troubleshoot when/why 500s are being returned.