Want to run some experiments with possible setups for efficiently delivering streaming video via Tor Hidden Services. For what I've got in mind it needs to be adaptive bitrate (ABR), and I hate Smooth Streaming, so we'll go with HLS.
The aim is to build a tiered system with a single origin.
- node 1 - nginx caching reverse proxy - Hidden service 1
- node 2 - nginx caching reverse proxy - Hidden service 1
- node 3 - nginx caching reverse proxy - Hidden service 2
-- origin - Hidden service 3
Both node 1 and node 2 advertise the same hidden service, essentially using the descriptor publishing race condition to build the edge of a small tiered CDN.
Hidden service 2 then proxies onto the origin. The idea is that an additional cache could easily be introduced at that level of the hierarchy to further protect the origin, again by using the race condition.
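As a rough sketch of what an edge node might look like under this design (paths, cache sizes and the midtier address below are placeholders, not the actual config): each edge node carries a copy of the same hidden service keys, so both publish descriptors for the same .onion, and each fronts a local nginx cache which proxies up to the midtier's hidden service.

    # torrc on each edge node. /var/lib/tor/hls_edge must contain the SAME
    # private_key on both nodes, so that they publish descriptors for the
    # same .onion and "race" one another.
    HiddenServiceDir /var/lib/tor/hls_edge
    HiddenServicePort 80 127.0.0.1:8080

    # nginx on each edge node: cache fronting the midtier hidden service.
    # Reaching the .onion upstream relies on the host's transparent Tor
    # client (DNSPort/TransPort), as covered later in this log.
    proxy_cache_path /var/cache/nginx/hls levels=1:2 keys_zone=hls_edge:10m
                     max_size=1g inactive=30m;
    server {
        listen 127.0.0.1:8080;
        location / {
            proxy_cache hls_edge;
            proxy_cache_valid 200 10m;                    # keep good responses briefly
            proxy_cache_use_stale error timeout updating;
            proxy_pass http://midtierexample.onion:8091/; # placeholder address
        }
    }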
Once set up, I need to test against both VoD and linear content.
This is, in part, an expansion of MISC-2, in that CDN-like behaviour is already being used for some static content on http://6zdgh5a5e6zpchdz.onion/
The ultimate aim isn't actually to deliver video, but to gauge the feasibility of building an efficient, fully Tor-based CDN to aid scalability and fault resistance (without needing to maintain multiple origins). I've specifically chosen streaming video as a starting point because delivery is time sensitive and issues are easily observable.
The plan is to start with full HD video at 60fps and, once delivery of that has been improved, work down to delivering something more realistic (either 720p at 24fps or 480p at 24fps, plus small static files, e.g. images/CSS).
Activity
2015-12-14 13:15:40
A copy of Big Buck Bunny is currently being transcoded into a multi-bitrate HLS stream (using HLS Stream Creator).
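HLS Stream Creator is a wrapper around ffmpeg, so as a rough illustration (not the exact invocation used) each rendition of the ABR set boils down to something like the following; the bitrate, resolution, filenames and the 2 second target segment length here are assumptions:

    # One rendition; repeat per bitrate, then reference each rendition
    # playlist from a master playlist to get the ABR set
    ffmpeg -i BigBuckBunny.mp4 \
        -c:v libx264 -b:v 1000k -s 1280x720 \
        -c:a aac -b:a 128k \
        -hls_time 2 -hls_list_size 0 \
        -hls_segment_filename 'bbb_1000k_%05d.ts' \
        bbb_1000k.m3u8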
2015-12-14 13:15:56
2015-12-14 13:50:10
Further Considerations
Assuming reliable delivery to multiple clients can be sustained by splitting the edge, further consideration is needed regarding the impact on the Tor network itself.
At the time of writing, relays advertise about 150Gbps of bandwidth within the Tor network (according to https://metrics.torproject.org/bandwidth.html). That capacity could easily be saturated by widespread delivery of even low-bandwidth video.
A number of possible solutions come to mind:
- Make every node within the CDN a middle relay to give some bandwidth back
- Access the midtier and/or the origin via Clearnet
Both have potential privacy implications.
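For the first option, each CDN node would additionally run a non-exit relay. A minimal torrc sketch (nickname, rates and fingerprints are placeholders), assuming the relay should only ever act as a guard/middle:

    # Non-exit (guard/middle) relay alongside the CDN node
    ORPort 9001
    Nickname cdnedge01               # placeholder
    ExitPolicy reject *:*            # never act as an exit
    RelayBandwidthRate 10 MBytes     # give back roughly what the CDN consumes
    RelayBandwidthBurst 20 MBytes
    # Declare common operation so this operator's relays won't be picked
    # more than once for a single circuit
    MyFamily <fingerprint1>,<fingerprint2>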
2015-12-14 13:58:52
In other words, if a relay and a hidden service share a host:
- Relay 1 - foo.onion
If you can see that Relay 1 is down (relays are publicly listed) and foo.onion is down at the same time, there's a good chance foo.onion is hosted on the system running Relay 1.
Thinking about it though, the situation here is slightly different. There will be (at least) two edge caches advertising the same descriptor. So if we take the following topology
- Relay 1 / Edge 1 - foo.onion
- Relay 2 / Edge 2 - foo.onion
If Relay 1 goes down, foo.onion will remain available. There may be a short period between descriptors being published where that isn't the case though? Would need to measure.
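Measuring that window could be as simple as repeatedly fetching a small object through a local Tor SOCKS client while Relay 1 / Edge 1 is taken down, and timing how long requests fail for. A rough sketch (the SOCKS port, polling interval and probe path are assumptions):

    #!/bin/bash
    # Poll the shared .onion via a local Tor SOCKS proxy and log failures,
    # to see how long the service is unreachable after one edge disappears
    while true; do
        ts=$(date +%s)
        if curl -s -o /dev/null -m 10 \
                --socks5-hostname 127.0.0.1:9050 \
                http://foo.onion/test.txt; then
            echo "$ts OK"
        else
            echo "$ts FAIL"
        fi
        sleep 2
    done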
You'd probably want to run slightly more or slightly fewer relays than CDN nodes, though. If you've got 24 relays all declaring the same "MyFamily" (as should be the case given the common operator) and your CDN is made up of 24 nodes, the correlation is a bit close.
More to the point, if your CDN is built on systems with 10Gbps NICs, you might not want to run a relay on some of those, and instead offer some relays on 1Gbps NICs, to reduce the likelihood of commonality.
Will come back to thinking about this later, it's easy to tie yourself in knots with various permutations.
2015-12-14 14:01:02
With every tier of the CDN being a hidden service, a 1MB file coming from the origin will need to transit a circuit (of 6 hops) 3 times in order to reach the ultimate client - roughly 18 relay traversals for that single 1MB object.
That can be addressed by making the caching as efficient as possible so that the upstream path is rarely used, but it still doesn't address the question of what happens if a huge number of users start streaming video at the same time.
Even if everything is served directly from the caches on the edge, that's still potentially a large drain on Tor's bandwidth, so ideally you'd want to be giving some back (taking us back to Solution 1).
The best bet might be a combination of the two (or something else entirely), but the suitability of either solution will also depend on whether the CDN operator wishes to remain anonymous, or whether the aim is simply to protect client connections.
Either way, an assessment of the options really needs factoring into the final writeup.
2015-12-14 14:02:18
2015-12-14 14:28:44
2015-12-15 15:01:08
2015-12-15 16:21:28
2015-12-15 16:21:52
2015-12-15 16:27:46
2015-12-15 17:01:42
One interesting thing to note: testing so far has been via a transparent Tor client, with all players using that same client. When attempting to stream three copies of the stream concurrently, the client started returning connection refused and needed to be restarted before streaming could resume (the client on the midtier was fine though).
It will be interesting to see whether the same thing occurs when spreading requests across the edge. For the main tests, each player should have a dedicated Tor client though.
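Giving each player a dedicated client just means spinning up extra tor instances, each with its own SOCKS port and data directory; a minimal sketch (ports and paths are arbitrary):

    # One additional Tor client per player
    tor --RunAsDaemon 1 --SocksPort 9052 --DataDirectory /var/lib/tor-player2
    tor --RunAsDaemon 1 --SocksPort 9054 --DataDirectory /var/lib/tor-player3
    # Point each player's proxy settings at its own SOCKS port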
URLs so far
- Origin: https://streamingtest.bentasker.co.uk
- Midtier: http://cix7cricsvweeu6k.onion:8091/
Will look at building the edge tomorrow
2015-12-16 11:18:02
2015-12-16 11:18:45
2015-12-16 11:18:59
2015-12-16 12:04:58
The descriptor to use for accessing via the edge is http://f5jayrbaz7nmtyyr.onion
For some reason nginx was ignoring the resolver directive (most likely because a static hostname in proxy_pass is resolved once at startup via the system resolver, with the resolver directive only consulted when variables are involved), so I had to set Tor's DNSPort to listen on localhost:53 and then update resolv.conf to direct all queries through there (could also have transparently redirected in iptables, but it seemed better to be explicit).
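Roughly what that arrangement looks like (ports and the automap range here are assumptions based on a fairly standard transparent proxying setup):

    # torrc - answer DNS locally and map .onion names into a virtual range
    # (binding port 53 needs root or CAP_NET_BIND_SERVICE)
    DNSPort 127.0.0.1:53
    AutomapHostsOnResolve 1
    VirtualAddrNetworkIPv4 10.192.0.0/10
    TransPort 127.0.0.1:9040
    # TCP to the automapped range still gets redirected to TransPort by the
    # existing transparent proxy (iptables) rules

    # /etc/resolv.conf - send all queries through Tor's DNSPort
    nameserver 127.0.0.1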
2015-12-16 12:05:39
2015-12-16 12:06:06
2015-12-16 13:20:19
I've tweaked the output format of the stats, and added a request count to the later stats. The stats are specifically for test plays using the JW Player Stream Tester page - http://demo.jwplayer.com/stream-tester/ - with the bandwidth setting forcing the player to use the 1Mb/s stream.
Given that the segments are an average of 2 seconds long (see comments on HLS-5 for why it varies), the observed delivery time of 9 seconds risks the playback session breaking as the buffer underruns.
But, based on the stats, it looks like those long durations were the result of difficulties in getting the content to the client, rather than delays introduced trying to acquire from upstream (unless the delay was down to taking a while to establish an upstream connection).
Will look at getting the other edge node online now so we can see how/if requests balance across them. I'd expect that an individual player would probably use the same edge node for the duration of the playout session (at least, for something as short as Big Buck Bunny) but maybe that's not going to be the case.
2015-12-16 13:55:30
2015-12-16 13:55:42
2015-12-16 14:39:10
So, it looks like a single player will always go to the same edge cache, at least for a short playback session.
I'm happy the infrastructure seems to be working, so I can start the tests laid out in the design document.
2015-12-16 14:39:23
2015-12-16 14:48:29
So removing
- 2 edge caches online, proxying to origin (cold cache)
- 2 edge caches online, proxying to origin (warm cache)
- 2 edge caches online, mid-tier online, proxying to origin (caching)
From the initial set of tests leaves us with
- Direct to origin
- 1 edge cache online, proxying to origin (cold cache)
- 1 edge cache online, proxying to origin (warm cache)
- 1 edge cache online, mid-tier online, proxying to origin (caching)
- Multiple players, multiple VoD streams (with overlap between players)
- Multiple players, multiple VoD streams, limited cache space (to force LRU eviction - see the cache-sizing sketch below)
We've essentially already run some of those while testing the setup, but to keep things simple I'll just repeat those tests.
From here on out the bandwidth setting in JW Player will be set to auto, so that we can see how often it moves between the available options.
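For the limited cache space test, the eviction pressure can come from nginx's own cache manager by deliberately undersizing the cache; a sketch, with the values as placeholders (max_size wants to sit well below the total size of the rendition set):

    # Deliberately undersized cache so nginx's cache manager has to evict
    # (roughly least-recently-used) segments during the multi-player tests
    proxy_cache_path /var/cache/nginx/hls levels=1:2 keys_zone=hls_edge:10m
                     max_size=100m inactive=10m;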
2015-12-16 16:39:08
2015-12-17 09:16:49
2015-12-17 09:54:54
Tests during Setup
- test1 - One Edge node live (cold)
- test2 - One Edge node live (warm)
- test3 - One edge node live (cold), midtier (warm)
- test4 - Missed a number here.....
Formal tests
- test5 - Direct to origin (via HTTP)
- test6 - Direct to midtier (cold)
- test7 - Direct to midtier (warm)
- test8 - To edge (cold), midtier (warm)
- test9 - To edge (warm), midtier (warm)
- test10 - To edge (cold), midtier (cold)
The multi-client/multi-player tests are next.
The unsurprising observation so far is that when things go well, HLS playout is fine via an .onion, but when issues are experienced there isn't a lot of elbow room in a 2-second fragment to avoid it impacting playback.
2015-12-17 13:45:09
- test11 - Two players, Edge 2 cache warmish (from previous playout)
- test12 - Two players, Edge 2 (warm)
- test13 - One player, Edge 1 (cold), midtier (cold)
- test14 - Two players, Edge 1 cache warmish (from previous playout)
- test15 - Two players, Edge 1 (warm)
- test16 - Two players, Edge 1 (warm)
Test 16 was essentially a repeat of test 15, because there were far more revalidations than expected in 15 - likely a result of the artificial warming taking longer to run than expected.
Delivery is still a little shaky at times, and the higher-bandwidth stream still isn't really being utilised, so I don't see any point in proceeding with the linear tests at the moment (as the stream would be unwatchable).
I'm going to move on to using an increased segment size (10 seconds) and will run the same tests to see what improvement, if any, it gives. It might be that a midway point (4 or 6 seconds) yields more benefit though.
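In terms of the transcode, that's just a re-run with a larger target duration; against the ffmpeg illustration from earlier it's the -hls_time value that changes (again, a sketch rather than the exact HLS Stream Creator invocation):

    # Re-segment with a 10 second target duration instead of 2
    ffmpeg -i BigBuckBunny.mp4 -c:v libx264 -b:v 1000k -c:a aac -b:a 128k \
        -hls_time 10 -hls_list_size 0 \
        -hls_segment_filename 'bbb_1000k_10s_%05d.ts' bbb_1000k_10s.m3u8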
2015-12-17 13:45:52
2015-12-18 12:47:54
- Test 17 (Direct to Origin)
- Test 18 (Direct to midtier, cold cache)
- Test 19 (Direct to midtier, warm cache)
- Test 20 (Cold edge, warm midtier)
- Test 21 (Warm edge, Warm midtier)
- Test 22 (Cold edge, cold midtier)
- Test 23 (warm edge, warm midtier)
- Test 24 (Artificially warmed Edge/midtier)
Test 21 saw some serious delivery issues which appear (based on a quick glance) to have been caused by the circuit to the client collapsing. So Test 23 is essentially a repeat of 21.
Will move on to the multi-player tests in a while.
2015-12-18 12:48:22
2015-12-18 16:03:21
- Test 25 - Two players, cold edge
- Test 26 - Two players, warm edge
So that's all the 1080p VoD tests done.
The unsurprising conclusion so far is that Full HD delivery through tiered Tor Hidden Services is possible, but gives an unpredictable playback experience. But then, that's largely the case on the clearnet too.
I need to re-transcode the 720p stream, so I'll run the next set of tests against a 480p copy. Given this issue is already quite long, I'll raise a subtask for any subsequent steps required.
2015-12-18 16:05:56
2015-12-24 13:14:51
2015-12-26 16:48:54
2016-01-10 21:15:57
2016-01-10 21:16:18
2016-01-10 21:16:23
2016-01-10 21:18:00
2017-07-06 15:54:55
2017-07-06 15:54:58