Work log
2016-04-20 12:25:36
Time Spent: 10 minutes
Log Entry: Adding headers to Issue pages and Attachments, testing
2016-04-20 12:42:04
Time Spent: 16 minutes
Log Entry: Theorising and waffling
2016-04-20 13:56:56
Time Spent: 60 minutes
Log Entry: Initial implementation of support for Conditional Requests
2016-04-22 13:12:28
Time Spent: 60 minutes
Log Entry: Adding revalidation support to Component/Version pages
2016-04-22 13:21:11
Time Spent: 8 minutes
Log Entry: Adding Etag/Last-Mod/Reval support to project indexes. Testing
Activity
2016-04-20 12:25:36
2016-04-20 12:34:24
I currently have an install of Sphider (http://www.sphider.eu/) which crawls JILS (and other sites) periodically. Sphider's bot is, being generous, less than clever.
The way it performs revalidations is to fetch the entire page, generate an MD5 of the content and then compare that to a stored hash to see whether it needs to reprocess.
This is less than efficient as it means the origin has to serve the entire page either way, which really starts to matter when your JIRA database grows beyond a certain (arbitrary) size.
Before it places a GET, the bot places a HEAD in order to check for non-200 results. So I've patched it to also extract Last-Modified from the headers (may well add ETag later). This is now compared against a date stored in the database (no changes there, the date was already stored), and the GET is skipped if the date remains the same.
So, for Sphider's purposes, as of commits 81c7055 and 5d22f2d revalidation is now possible on Issue pages and attachments (and thumbs, though it doesn't fetch those).
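A rough sketch of that HEAD-then-compare check, in Python purely for illustration. This is not Sphider's actual code: the function name needs_refetch and its arguments are made up for the example, and header lookup is simplified to a plain dict.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def needs_refetch(head_status, head_headers, stored_last_modified):
    """Decide whether a full GET is needed after the initial HEAD.

    head_status: status code returned by the HEAD request
    head_headers: dict of response headers from the HEAD request
    stored_last_modified: aware datetime saved at the last crawl, or None
    """
    if head_status != 200:
        # Non-200 results are dealt with elsewhere; nothing to fetch.
        return False

    raw = head_headers.get("Last-Modified")
    if raw is None or stored_last_modified is None:
        # No validator to compare against, so fall back to a full fetch.
        return True

    try:
        current = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        # Unparseable date: play safe and fetch.
        return True

    # Skip the GET only when the date is unchanged.
    return current != stored_last_modified


stored = datetime(2016, 4, 20, 12, 25, 36, tzinfo=timezone.utc)
unchanged = {"Last-Modified": "Wed, 20 Apr 2016 12:25:36 GMT"}
print(needs_refetch(200, unchanged, stored))  # False, the GET can be skipped
```

The origin still has to answer the HEAD, but it no longer has to serve the body for a page that hasn't changed.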
However - true revalidation still isn't possible. If (for example) a caching NGinx reverse proxy were to be placed downstream, it'd periodically need to re-fetch the entire content, as it (correctly) uses conditional requests to revalidate the content rather than placing a HEAD first.
So, the final step in this issue is to add support for a more normal use-case - conditional GETs (i.e. If-Modified-Since etc).
As no page currently returns a cache-control header, also need to implement a configuration option to allow one to be set if desired (the alternative is forcing a caching age either at Apache or on the downstream proxy). The configuration option should allow a per-class setting to be used, so that different ages can be set for Issue pages, Attachments, Issue indexes (i.e. project home pages, versions, components etc).
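One way the per-class setting could look, sketched in Python for illustration only. The class names, the ages and the cache_control_header helper are all assumptions for the example, not the actual configuration option.

```python
# Hypothetical per-class cache ages in seconds.
# A class that is missing (or set to None) gets no Cache-Control header.
CACHE_AGES = {
    "issue": 300,
    "attachment": 86400,
    "index": 60,
}


def cache_control_header(page_class, ages=CACHE_AGES):
    """Return a Cache-Control value for the page class, or None if unset."""
    age = ages.get(page_class)
    if age is None:
        return None
    return f"max-age={int(age)}"


print(cache_control_header("attachment"))  # max-age=86400
```

Leaving a class unset keeps the current behaviour (no header), so the age can still be forced at Apache or on the downstream proxy instead.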
2016-04-20 12:41:46
If so, what's the best way to do so? The idea of adding the headers is to make revalidations cheap, so the SQL statement used to get the necessary data needs to be as simple as possible, otherwise it may well be cheaper to simply re-enumerate the issues for that page.
But, there's also the question of how smart to be. Which of the following questions do we ask?
- Has any issue linked to from this page changed in some way?
- Has any issue linked to from this page changed in such a way it'll have caused this page to change?
The difference being that the former is (largely) just a raw check of the jiraaction table. The second involves looking for certain events (change of issue status/resolution, change of issue title, change of assignee, change of priority, change of type, new issue creation etc).
The latter would give a more accurate result, but comes at a cost:
- The SQL query will be more complex
- It ties the query to the current layout, making future changes to the layout more complex (will have to remember to update the query)
Going with the simpler route, though, means that the page will need to be re-indexed whenever any change is made to an issue within a project - even if that "change" is simply that a comment has been added, or an additional watcher has been added (not even displayed on the issue page currently). But, it does avoid the risk of forgetting to update the query in the future and having new changes not be picked up when they should.
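To give an idea of how small the simpler check can be, here's a sketch using Python and an in-memory SQLite database. The two tables are cut-down stand-ins with assumed column names (id, project, issueid, updated), not the real JIRA schema, and last_change_for_project is a made-up helper.

```python
import sqlite3


def last_change_for_project(conn, project_id):
    """Return the newest action timestamp for any issue in the project."""
    row = conn.execute(
        """
        SELECT MAX(a.updated)
        FROM jiraaction AS a
        JOIN jiraissue AS i ON i.id = a.issueid
        WHERE i.project = ?
        """,
        (project_id,),
    ).fetchone()
    return row[0]  # None when the project has no recorded actions


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE jiraissue (id INTEGER PRIMARY KEY, project INTEGER);
    CREATE TABLE jiraaction (
        id INTEGER PRIMARY KEY, issueid INTEGER, updated TEXT
    );
    INSERT INTO jiraissue VALUES (1, 10), (2, 10), (3, 20);
    INSERT INTO jiraaction VALUES
        (1, 1, '2016-04-20 12:25:36'),
        (2, 2, '2016-04-22 13:12:28'),
        (3, 3, '2016-04-29 14:53:58');
    """
)
print(last_change_for_project(conn, 10))  # 2016-04-22 13:12:28
```

A single aggregate like this doesn't care what kind of action it was, which is exactly the trade-off: any comment bumps the date, but the query never needs to know about the page layout.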
2016-04-20 12:42:04
2016-04-20 12:48:28
So will look for and honour
- If-Modified-Since
- If-None-Match
2016-04-20 12:48:30
2016-04-20 13:31:27
2016-04-20 13:34:30
As required by the RFC, If-None-Match takes precedence over If-Modified-Since.
However, strictly speaking, the implementation isn't technically RFC compliant:
We're not currently checking that it's a valid HTTP date, simply that it's a valid date. Also not currently checking the request method - the system only uses GET/HEAD (ideally Apache should be configured to return a 405 for anything else), but really should insert a check just to be safe
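For reference, the precedence rule and the method check could be sketched like this. It's Python for illustration only and not the committed implementation: is_not_modified and _opaque are made-up names, header lookup is a plain dict, and splitting If-None-Match on commas is a simplification.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def _opaque(tag):
    """Strip a weak-validator prefix so tags can be compared weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def is_not_modified(method, headers, etag, last_modified):
    """Return True when a 304 Not Modified would be appropriate.

    etag: current entity tag including its quotes, e.g. '"abc"'
    last_modified: timezone-aware datetime of the last change
    """
    if method not in ("GET", "HEAD"):
        return False

    inm = headers.get("If-None-Match")
    if inm is not None:
        # If-None-Match wins; If-Modified-Since is ignored entirely.
        candidates = [_opaque(t) for t in inm.split(",")]
        return "*" in candidates or _opaque(etag) in candidates

    ims = headers.get("If-Modified-Since")
    if ims is not None:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            return False  # not a usable date, so ignore the header
        if since.tzinfo is None:
            return False  # no timezone given, so don't trust it
        return last_modified.replace(microsecond=0) <= since

    return False


page_changed = datetime(2016, 4, 20, 13, 31, 27, tzinfo=timezone.utc)
request = {
    "If-None-Match": '"stale"',
    "If-Modified-Since": "Wed, 20 Apr 2016 13:31:27 GMT",
}
# The ETag no longer matches, so the matching date is never consulted.
print(is_not_modified("GET", request, '"current"', page_changed))  # False
```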
2016-04-20 13:39:26
2016-04-20 13:47:26
2016-04-20 13:50:53
But changing the date format:
The only thing it doesn't do correctly yet is check the timezone, so
So, it's not 2616 (https://tools.ietf.org/html/rfc2616#section-3.3) compliant yet
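A stricter check would only accept the three date formats listed in RFC 2616 section 3.3.1, all of which are GMT. The sketch below is Python for illustration (parse_http_date is a made-up name, not the function in the commit) and assumes an English/C locale, since strptime's day and month names are locale-dependent.

```python
from datetime import datetime, timezone

# The three HTTP-date formats from RFC 2616 section 3.3.1.
_HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 1123, the preferred format
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850, obsolete
    "%a %b %d %H:%M:%S %Y",       # ANSI C asctime(), implicitly GMT
)


def parse_http_date(value):
    """Return an aware UTC datetime, or None if not a valid HTTP-date."""
    for fmt in _HTTP_DATE_FORMATS:
        try:
            parsed = datetime.strptime(value, fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc)
    return None


print(parse_http_date("Sun, 06 Nov 1994 08:49:37 GMT"))    # 1994-11-06 08:49:37+00:00
print(parse_http_date("Sun, 06 Nov 1994 08:49:37 +0100"))  # None
```

Anything that is a valid date but not a valid HTTP-date (a different zone, an ISO timestamp) comes back as None, so the header can simply be ignored.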
2016-04-20 13:55:59
And with the correct TZ
2016-04-20 13:56:28
2016-04-20 13:56:56
2016-04-20 13:57:27
2016-04-21 11:19:55
Used the following in the NGinx configuration to allow revalidation and force a short cache-age (to make testing quicker)
Then ran a couple of timed requests against the NGinx box whilst tailing logs on the Apache origin
And for attachments
So it looks like that's working well.
Interestingly, although NGinx received a strong indicator (the E-Tag) in its original CACHE_MISS:
It used the weaker indicator (Last-Modified) when going upstream to revalidate
So worth keeping in mind when making changes in the future
2016-04-22 12:08:06
It avoids future complications if the pages are changed, and also means that anyone wanting to tweak the layout of the pages in their own install won't need to go off and look up field names to do so.
The downside being that a page will be re-indexed if any change has occurred on an issue linked to there (even if that's simply attaching a file), but also means that pages for old versions (for example) can be revalidated. If we explicitly avoid checking the jiraaction table, then this won't be true when comments are added.
At the moment, if a project is sitting there untouched, the listings pages will still need to be retrieved in full. This way we can get revalidation working for those to reduce overhead on the origin server.
2016-04-22 12:08:59
2016-04-22 12:41:26
2016-04-22 12:51:52
JILS-36 broke the page into two sections - issues that are (or will be) fixed in this version, and "known issues" (i.e. issues that affect this version but are/will be fixed in a later version).
The current ETag/Last-mod implementation takes into account the former, but not the latter. So, once a version has been marked as released, any known issues will display as they were at time of release (so if they're fixed a week later, they'll still show as "open" on that page).
Adjusting it to include the latter is relatively simple, but means that most of the work involved in generating the page has already been done by the time the check completes. So aside from bandwidth savings we don't gain very much.
I think a better route would be to adjust the "Last-Modified" query to look more like the pre JILS-36 version than the current incarnation, so that a single query checks anything linked to the current version.
2016-04-22 12:57:26
2016-04-22 12:57:27
2016-04-22 13:00:05
2016-04-22 13:01:27
2016-04-22 13:09:26
2016-04-22 13:11:50
Which leaves two classes of page to consider
- Project Pages
- Projects Index
As the latter is just a list of projects, it's currently pretty inexpensive to generate. Adding a query to check whether any issues have changed is going to increase cost significantly, so for the time being, I'm inclined to leave that one alone.
There's definitely some value in adding it to the Project Pages though. As has been done with Components and Versions, need to take any changes to the project itself into account.
2016-04-22 13:12:01
2016-04-22 13:12:28
2016-04-22 13:19:27
2016-04-22 13:20:42
The next task affects Projects/Version/Component pages:
At the moment, if the Project/Version/Component properties are changed (say an update to the description), the E-Tag will change accordingly. Last-Modified, however, will remain unchanged, which is clearly going to cause an issue if there's an NGinx cache downstream (given it only uses IMS).
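To make the mismatch concrete, here's a toy model in Python. How the validators are really derived isn't shown here, so make_validators is purely hypothetical: it assumes the ETag is hashed over the page's own properties plus the latest issue change, while Last-Modified only tracks the latest issue change.

```python
import hashlib


def make_validators(properties, last_issue_change):
    """Hypothetical derivation of (etag, last_modified) for a page."""
    # The ETag covers the page's own properties as well as issue activity.
    material = repr(sorted(properties.items())) + last_issue_change
    etag = '"' + hashlib.sha1(material.encode("utf-8")).hexdigest() + '"'
    # Last-Modified only reflects issue activity.
    return etag, last_issue_change


before = make_validators(
    {"description": "First release"}, "2016-04-22 13:12:28"
)
after = make_validators(
    {"description": "First public release"}, "2016-04-22 13:12:28"
)

print(before[0] != after[0])  # True: the ETag notices the edit
print(before[1] == after[1])  # True: Last-Modified does not
```

A cache revalidating with If-None-Match would get a fresh copy after the description edit, whereas one sending only If-Modified-Since would be told nothing has changed.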
Need to have a poke around the database and see whether the change gets recorded anywhere
2016-04-22 13:21:11
2016-04-22 13:30:12
Not sure if there's a way around it, but it's going to cause an issue. Where a downstream cache relies on IMS (it shouldn't, but does) it will incorrectly revalidate a page despite the information about that version/project/component having been updated.
The workaround, for the time being, is to ensure that an issue within that project/version/component gets manually updated (perhaps toggling to a different priority, then changing back).
I'm going to raise a separate bug for that so it can be tracked/listed as a known issue - JILS-42
2016-04-22 13:37:46
2016-04-22 13:37:47
2016-04-22 13:37:51
2016-04-29 14:53:58
2016-04-29 14:55:07
2016-04-29 14:58:32
2016-04-29 14:58:36