Instances running the FediFetcher software started getting flagged in issue 8.
After the underlying issue was resolved, this page was created to provide an independent assessment of FediFetcher.
Background
FediFetcher is a Python script, which describes itself as follows:
FediFetcher is a tool for Mastodon that automatically fetches missing replies and posts from other fediverse instances, and adds them to your own Mastodon instance.
The author wrote about it on their blog and announced release on Mastodon.
Source code is shared at https://github.com/nanos/FediFetcher
Described Behaviour
The author posted a detailed explanation in response to questions/criticism.
On inspection, their explanation is borne out by the code (links below are to the most recent commit at time of writing):
At startup, it fetches information about any instances it has seen before, then moves on to the configured input sources (for example, by fetching the local user's lists)
For each configured source:
- It gets the timeline
- It fetches context for each of those toots, placing requests to get details of posts by mentioned users
- It adds those posts to the local instance:
  - It fetches the posts
  - It then iterates through them, running a search for each post's URL on the local instance
  - It checks whether each post has replies and whether its config says it should fetch that context (if so, it adds the URLs of the replies, etc.)
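The "fetch context, collect reply URLs" step can be sketched in Python. This is a minimal illustration, not FediFetcher's code. It assumes the origin server exposes Mastodon's public `GET /api/v1/statuses/:id/context` endpoint, and the helper names (`context_url`, `reply_urls`, `fetch_reply_urls`) and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical sketch (not FediFetcher's code): collect reply URLs for a toot,
# assuming its origin server implements Mastodon's public
# GET /api/v1/statuses/:id/context endpoint.


def context_url(server: str, status_id: str) -> str:
    """Build the context URL for a status on its origin server."""
    return f"https://{server}/api/v1/statuses/{urllib.parse.quote(status_id)}/context"


def reply_urls(context: dict) -> list[str]:
    """Return reply ("descendant") URLs from a context response, in order,
    without duplicates."""
    seen: set[str] = set()
    urls: list[str] = []
    for status in context.get("descendants", []):
        # Prefer the human-facing URL; fall back to the ActivityPub URI.
        url = status.get("url") or status.get("uri")
        if url and url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def fetch_reply_urls(server: str, status_id: str, timeout: float = 10.0) -> list[str]:
    """Fetch a status's context from its origin server and extract reply URLs."""
    request = urllib.request.Request(
        context_url(server, status_id),
        headers={"User-Agent": "reply-fetch-sketch"},  # hypothetical User-Agent
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return reply_urls(json.load(response))
```

Only `fetch_reply_urls` touches the network, so `reply_urls` can be exercised against a saved context response. Non-Mastodon servers may not offer this endpoint, and a real tool would have to handle that case.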
Depending on what's enabled in config, input sources might be:
- User's lists
- Posts that the local user replied to
- The local user's home timeline
- Posts from the n last users the local user followed
- Posts from the n last users to follow the local user
- Posts from the n last users to request to follow the local user
- Posts from the n last notifications
- Posts from the n last bookmarks
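The sketch below makes the list of sources more concrete by mapping each one to a Mastodon client API endpoint that could supply it. The option names in `SourceConfig` and the function `source_endpoints` are hypothetical and are not FediFetcher's real settings. The endpoint paths are standard Mastodon API routes, but FediFetcher's actual calls may differ.

```python
from dataclasses import dataclass

# Hypothetical config: option names are illustrative, not FediFetcher's settings.


@dataclass
class SourceConfig:
    lists: bool = False
    replied_to: bool = False
    home_timeline: bool = False
    followings: int = 0       # n last users the local user followed
    followers: int = 0        # n last users to follow the local user
    follow_requests: int = 0  # n last users to request to follow the local user
    notifications: int = 0    # n last notifications
    bookmarks: int = 0        # n last bookmarks


def source_endpoints(cfg: SourceConfig, account_id: str) -> list[str]:
    """List Mastodon API paths a tool could read for the enabled sources."""
    paths: list[str] = []
    if cfg.lists:
        paths.append("/api/v1/lists")  # then /api/v1/timelines/list/:list_id per list
    if cfg.replied_to:
        # The user's own posts; replies carry an in_reply_to_id.
        paths.append(f"/api/v1/accounts/{account_id}/statuses")
    if cfg.home_timeline:
        paths.append("/api/v1/timelines/home")
    if cfg.followings:
        paths.append(f"/api/v1/accounts/{account_id}/following?limit={cfg.followings}")
    if cfg.followers:
        paths.append(f"/api/v1/accounts/{account_id}/followers?limit={cfg.followers}")
    if cfg.follow_requests:
        paths.append(f"/api/v1/follow_requests?limit={cfg.follow_requests}")
    if cfg.notifications:
        paths.append(f"/api/v1/notifications?limit={cfg.notifications}")
    if cfg.bookmarks:
        paths.append(f"/api/v1/bookmarks?limit={cfg.bookmarks}")
    return paths
```

Mastodon caps the `limit` parameter server-side, so a larger n would need pagination through the `Link` response header.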
Behaviour Synopsis
Although it requests posts/toots, Fedifetcher itself does not do anything with the content of those toots.
What it does is discover the URLs of replies to specific toots and then tell its local Mastodon instance to fetch them.
Mastodon then attempts to fetch the toot as it would any other.
So, any blocks (instance or user level) will still be honoured (because the Mastodon instance will be unable to fetch those toots).
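Telling the local instance to fetch a toot can be done with Mastodon's search API: `GET /api/v2/search` with `resolve=true` asks the server to look the URL up remotely and import it. The sketch below assumes that mechanism, since FediFetcher is described above as running a search for the URL. `resolve_request` and `resolve_on_local` are hypothetical names, and the token is assumed to be the local user's access token.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical sketch: ask the LOCAL Mastodon instance to import a remote toot
# by searching for its URL with resolve=true. The local instance does the
# actual fetch, so its own instance- and user-level blocks still apply.


def resolve_request(local_server: str, token: str, toot_url: str) -> urllib.request.Request:
    """Build an authenticated search request asking the server to resolve a URL."""
    query = urllib.parse.urlencode({"q": toot_url, "resolve": "true", "type": "statuses"})
    return urllib.request.Request(
        f"https://{local_server}/api/v2/search?{query}",
        headers={"Authorization": f"Bearer {token}"},  # resolve=true needs a signed-in user
    )


def resolve_on_local(local_server: str, token: str, toot_url: str, timeout: float = 30.0) -> bool:
    """Return True if the local instance returned (and so imported) the toot.

    An empty result means the local instance did not, or could not, import it,
    which is what a block would be expected to produce.
    """
    request = resolve_request(local_server, token, toot_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return bool(json.load(response).get("statuses"))
```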
Additional notes:
- Honours rate-limiting responses (and also limits its own request rate)
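Honouring rate limits usually means reading Mastodon's `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers, plus the generic HTTP `Retry-After` header on a 429 if a server sends one. The function below is a hypothetical sketch of that logic, not FediFetcher's implementation. The `min_remaining` threshold and the 60-second `fallback` are arbitrary assumptions.

```python
from datetime import datetime, timezone


def seconds_to_wait(status: int, headers: dict[str, str], now: datetime,
                    min_remaining: int = 1, fallback: float = 60.0) -> float:
    """How long to pause before the next request to this server (0.0 = go ahead)."""
    h = {name.lower(): value for name, value in headers.items()}  # names are case-insensitive

    # An explicit Retry-After (in seconds) on a 429 wins, if the server sends one.
    if status == 429 and "retry-after" in h:
        try:
            return max(0.0, float(h["retry-after"]))
        except ValueError:
            pass  # the HTTP-date form of Retry-After is not handled in this sketch

    remaining = h.get("x-ratelimit-remaining")
    exhausted = remaining is not None and int(remaining) < min_remaining
    if status == 429 or exhausted:
        reset = h.get("x-ratelimit-reset")
        if reset:
            # ISO 8601 timestamp; normalise a trailing "Z" for older Pythons.
            reset_at = datetime.fromisoformat(reset.replace("Z", "+00:00"))
            return max(0.0, (reset_at - now).total_seconds())
        return fallback
    return 0.0


now = datetime(2024, 8, 18, 14, 33, 41, tzinfo=timezone.utc)
headers = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "2024-08-18T14:35:00.000000Z"}
print(seconds_to_wait(200, headers, now))  # 79.0
```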
Scenario: Normal Flow
Alice, Bob and Carlos are all on different instances. Bob follows Alice, but does not follow Carlos (who also follows Alice).
When Alice toots something interesting:
- Bob sees Alice's toot, and replies
- Carlos sees Alice's toot, and replies
At this point:
- Bob can see Alice's toot, but not Carlos' reply
- Carlos can see Alice's toot, but not Bob's reply
- Alice can see Alice's toot and both replies
After FediFetcher has run, though:
- Bob can see Alice's toot and Carlos' reply
- Carlos can see Alice's toot and Bob's reply
- Alice can see Alice's toot and both replies
Scenario: Blocked User
Alice, Bob and Mallory are all on different instances. Mallory follows Alice. Bob follows Alice and has blocked Mallory.
When Alice toots something interesting:
- Bob sees Alice's toot, and replies
- Mallory sees Alice's toot, and replies
At this point:
- Bob can see Alice's toot, but not Mallory's reply
- Mallory can see Alice's toot, but not Bob's reply
- Alice can see Alice's toot and both replies
The situation does not change after FediFetcher has run:
- Bob's instance won't fetch Mallory's reply (because Bob has blocked Mallory)
- Mallory's instance is unable to fetch Bob's reply (because Bob has blocked Mallory)
It's open to debate, but some might argue that this is, in fact, an improvement. If Bob were reliant on visiting Alice's profile (on Alice's instance), they would see Mallory's reply. With Fedifetcher, they will not.
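The two scenarios reduce to a simple visibility rule. The toy model below encodes it under the assumptions stated in this section: an author receives replies to their toot, FediFetcher only causes the viewer's own instance to fetch the remaining replies, and a block in either direction prevents that fetch. `can_see` and the `blocks` mapping are illustrative only and ignore real-world details such as post visibility settings and instance configuration.

```python
# Toy model of the two scenarios (illustrative only; ignores post visibility
# settings and instance configuration).


def blocked_either_way(blocks: dict[str, set[str]], a: str, b: str) -> bool:
    """True if a has blocked b or b has blocked a."""
    return b in blocks.get(a, set()) or a in blocks.get(b, set())


def can_see(viewer: str, replier: str, author: str,
            blocks: dict[str, set[str]], fetcher_ran: bool) -> bool:
    """Can `viewer` see the reply written by `replier` to `author`'s toot?"""
    if viewer == replier:
        return True  # everyone sees their own reply
    if blocked_either_way(blocks, viewer, replier):
        return False  # the viewer's instance will not fetch it, FediFetcher or not
    # The author receives replies directly; anyone else needs FediFetcher to
    # prompt their own instance to fetch the reply.
    return viewer == author or fetcher_ran
```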
Admins: Restricting Fedifetcher
At time of writing, FediFetcher appears to honour both the Allow and Disallow directives in robots.txt.
So, it should be possible to disallow or restrict access:
Disallow everything
User-agent: *
Disallow: /
Disallow only Fedifetcher
User-agent: FediFetcher
Disallow: /
Allow FediFetcher access to a specific user's statuses only
User-agent: FediFetcher
Allow: /users/ben/statuses/
Disallow: /
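The three rule sets can be checked with Python's standard `urllib.robotparser`. This assumes FediFetcher matches on the `FediFetcher` user-agent token and evaluates rules compatibly with that parser. The parser applies the first matching rule in a group, so the `Allow` line must come before `Disallow: /`. Parsers following RFC 9309 use the longest match instead, which gives the same answers for these examples.

```python
from urllib.robotparser import RobotFileParser

# The three admin examples, as robots.txt text.
EXAMPLES = {
    "disallow_everything": "User-agent: *\nDisallow: /",
    "disallow_fedifetcher": "User-agent: FediFetcher\nDisallow: /",
    "allow_one_user": "User-agent: FediFetcher\nAllow: /users/ben/statuses/\nDisallow: /",
}


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Evaluate robots.txt text for one user agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(agent, url)
```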
Users: Restricting FediFetcher
FediFetcher appears to support a selection of profile options:
- Adding the #NoBot tag to the profile
- Turning off featuring of posts in the profile
- Disabling inclusion in public search results
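A check against these opt-outs might look like the sketch below, which reads fields of a Mastodon Account entity (`note`, `discoverable`, `indexable`). The mapping is an assumption: the profile settings above are taken to correspond to `discoverable` and `indexable`. `opted_out` is a hypothetical function, not FediFetcher's code.

```python
import html
import re


def opted_out(account: dict) -> bool:
    """Hypothetical check of a Mastodon Account entity against profile opt-outs."""
    # The bio ("note") is HTML: drop tags so a rendered hashtag such as
    # '#<span>NoBot</span>' becomes '#NoBot', then decode entities.
    note = html.unescape(re.sub(r"<[^>]+>", "", account.get("note") or ""))
    if re.search(r"#nobot\b", note, re.IGNORECASE):
        return True
    # Assumed mapping: featuring posts -> discoverable, search -> indexable.
    # Only an explicit False counts; None or a missing field means "not set".
    return account.get("discoverable") is False or account.get("indexable") is False
```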
Admins: Preventing Local Use
Admins who do not want their users to run FediFetcher themselves should be aware that the tool also applies robots.txt to the API requests it makes to the local instance:
$ kubectl logs -f job.batch/fedifetcher-run-1
2024-08-18 14:33:41 UTC: Starting FediFetcher
2024-08-18 14:33:41 UTC: Getting context for home timeline
2024-08-18 14:33:41 UTC: Error getting timeline toots: Querying https://mastodon.bentasker.co.uk/api/v1/timelines/home prohibited by robots.txt
2024-08-18 14:33:41 UTC: Job failed after 0:00:00.158276.
Traceback (most recent call last):
File "/app/find_posts.py", line 1689, in <module>
timeline_toots = get_timeline(arguments.server, token, arguments.home_timeline_length)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/find_posts.py", line 362, in get_timeline
response = get_toots(url, access_token)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/find_posts.py", line 394, in get_toots
response = get( url, headers={
^^^^^^^^^^^^^^^^^^^
File "/app/find_posts.py", line 1142, in get
raise Exception(f"Querying {url} prohibited by robots.txt")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Exception: Querying https://mastodon.bentasker.co.uk/api/v1/timelines/home prohibited by robots.txt
So, robots.txt can also be used by an instance admin to prevent their users from running their own copies of the tool.
Conversely, admins who aren't concerned about its use should be aware that pre-existing robots.txt rules (specifically, Disallow: *) may also prevent their users from using the tool.
If you wanted to let your users run it to fetch replies for posts in their timelines, but not allow other instances to run it against you, you might do something like:
User-agent: *
Disallow: /
User-agent: FediFetcher
Allow: /api/v1/timelines/home
Disallow: /
Note: if everyone did this, the tool wouldn't be much use.
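A crawler uses the most specific `User-agent` group that matches it and does not merge that group with the `*` group. As a result, a `FediFetcher` group containing only an `Allow` line leaves every other path open to FediFetcher. The check below uses Python's `urllib.robotparser` as a stand-in for whatever parser FediFetcher actually uses, and compares that group with and without a trailing `Disallow: /`.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Evaluate robots.txt text for one user agent and URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# FediFetcher group with only an Allow line: nothing in it disallows other paths.
ALLOW_ONLY = """\
User-agent: *
Disallow: /
User-agent: FediFetcher
Allow: /api/v1/timelines/home
"""

# The same group closed off with Disallow: / after the Allow line.
ALLOW_THEN_DISALLOW = ALLOW_ONLY + "Disallow: /\n"

HOME = "https://mastodon.example/api/v1/timelines/home"
CONTEXT = "https://mastodon.example/api/v1/statuses/1/context"

for name, rules in [("allow only", ALLOW_ONLY), ("allow then disallow", ALLOW_THEN_DISALLOW)]:
    print(name, can_fetch(rules, "FediFetcher", HOME), can_fetch(rules, "FediFetcher", CONTEXT))
# allow only True True
# allow then disallow True False
```

The tool may also check robots.txt before its other local API calls, such as the search used to import replies. If it does, those paths would need their own `Allow` lines too. This is worth testing before relying on it.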
Possible Performance Impacts
FediFetcher's mode of operation means that remote instances may see increased load, especially when FediFetcher is working through a very active thread.
There are three associated load profiles:
- The local server: will see an increase in searches and the resulting fetches. API activity will also increase, though probably not steeply
- Instance with the original post: will see a request for the post and its replies. It might also see periodic re-requests (depending on FediFetcher settings)
- Instance with a reply: will see one request per reply from the local Mastodon instance, as well as the FediFetcher requests needed to check for profile-level opt-outs
At small scale, this is unlikely to translate into much load. However, load is likely to increase with particularly active threads where many replies come from one instance (whether from many users of that instance or from a single user).
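A back-of-envelope model makes the concentration effect concrete. The per-request costs in `estimate_requests` are assumptions taken loosely from the list of load profiles, not measurements: one context request to the origin instance per run, one fetch by the local Mastodon server per reply, and one FediFetcher profile lookup per distinct reply author.

```python
from collections import Counter

# Toy load model for one FediFetcher pass over one thread. The per-request
# costs are illustrative assumptions, not measurements. The local server's own
# searches (one per reply) are not counted here.


def estimate_requests(origin: str, replies: list[tuple[str, str]]) -> Counter:
    """Estimate requests received by each remote instance.

    origin:  instance hosting the original post (assumed: one context request)
    replies: (instance, author) for each reply in the thread
    """
    load = Counter({origin: 1})
    for instance, _author in replies:
        load[instance] += 1  # the local Mastodon server fetches each reply
    for instance, _author in set(replies):
        load[instance] += 1  # FediFetcher checks each distinct author's profile
    return load
```

Under these assumptions, 50 replies from a single user on `busy.example` cost that instance 51 requests, while 50 replies from 50 different users there cost 100.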