GPXIN-22: Measure impact of Distance Calculations



Issue Information

Issue Type: Task
 
Priority: Minor
Status: Closed

Reported By:
Ben Tasker
Assigned To:
Ben Tasker
Project: PHP GPXIngest (GPXIN)
Resolution: Done (2015-08-27 05:22:13)
Affects Version: 1.03
Target version: 1.03
Components: Experimental Features

Created: 2015-08-13 18:03:14
Time Spent Working


Description
GPXIN-6 implemented automatic distance calculations between trackpoints based on changes in Latitude/Longitude.

The original feeling was that this might be unnecessarily processor-intensive, but in various tests that doesn't seem to have been the case.

So I need to run some additional tests and use the data to decide whether distance calculations should be enabled by default (if so, there should be a way to suppress them).


Attachments

run2.csv.gz

Issue Links

Activity


btasker changed priority from 'Major' to 'Minor'
Doing a very quick test run to get a basic overview.

ben@milleniumfalcon:~/tmp$ grep trkpt test.gpx | wc -l
10320


Created two test scripts - with_calcs.php and without_calcs.php
<?php
// without_calcs.php
require 'GPXIngest.class.php';

$gpx = new GPXIngest();
$gpx->loadFile('test.gpx');
$gpx->ingest();
print_r($gpx->getGPXNameSpaces());


<?php
// with_calcs.php
require 'GPXIngest.class.php';

$gpx = new GPXIngest();
// calcDistance is an experimental feature, so it has to be enabled explicitly
$gpx->enableExperimental('calcDistance');
$gpx->loadFile('test.gpx');
$gpx->ingest();
print_r($gpx->getGPXNameSpaces());


Test run (not interested in the script output at this point)
ben@milleniumfalcon:~/tmp$ for i in {1..6}; do (time php without_calcs.php > /dev/null) ; done; echo "With:"; for i in {1..6}; do (time php with_calcs.php > /dev/null) ; done;

real	0m0.446s
user	0m0.417s
sys	0m0.028s

real	0m0.463s
user	0m0.444s
sys	0m0.016s

real	0m0.442s
user	0m0.422s
sys	0m0.020s

real	0m0.443s
user	0m0.420s
sys	0m0.020s

real	0m0.443s
user	0m0.410s
sys	0m0.032s

real	0m0.445s
user	0m0.428s
sys	0m0.016s
With:

real	0m0.492s
user	0m0.460s
sys	0m0.012s

real	0m0.466s
user	0m0.450s
sys	0m0.016s

real	0m0.475s
user	0m0.421s
sys	0m0.052s

real	0m0.459s
user	0m0.446s
sys	0m0.012s

real	0m0.477s
user	0m0.432s
sys	0m0.044s

real	0m0.470s
user	0m0.426s
sys	0m0.044s


Which gives the following result
==========================================================
| Run    |   1   |   2   |  3    |   4   |  5    |   6   |
----------------------------------------------------------
| With   | 0.492 | 0.466 | 0.475 | 0.459 | 0.477 | 0.470 |
----------------------------------------------------------
| Without| 0.446 | 0.463 | 0.442 | 0.443 | 0.443 | 0.445 |
----------------------------------------------------------
| Diff   | 0.046 | 0.003 | 0.033 | 0.016 | 0.034 | 0.025 |
==========================================================

So on average, having the calculations enabled required an extra 0.026167 seconds of processing (the six differences sum to 0.157 seconds).

Per trackpoint (0.026167/10320) that's a cost of 0.00000254 seconds.
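The arithmetic behind those two figures can be rechecked with a few lines of shell. This is only a sketch: the helper names mean_overhead and per_point are made up for illustration and are not part of GPXIngest, and the timings are copied by hand from the table above. It should report a mean overhead of roughly 0.026 seconds and a per-trackpoint cost of roughly 0.0000025 seconds.

```shell
#!/usr/bin/env bash
# Real-time figures (seconds) copied from the six-run table.
with="0.492 0.466 0.475 0.459 0.477 0.470"
without="0.446 0.463 0.442 0.443 0.443 0.445"
trackpoints=10320   # from: grep trkpt test.gpx | wc -l

# mean_overhead WITH WITHOUT
# Mean of the pairwise differences (with - without), to 6 decimal places.
mean_overhead() {
    awk -v w="$1" -v wo="$2" 'BEGIN {
        n = split(w, a, " ")
        split(wo, b, " ")
        for (i = 1; i <= n; i++) sum += a[i] - b[i]
        printf "%.6f\n", sum / n
    }'
}

# per_point MEAN COUNT
# Cost per trackpoint, to 8 decimal places.
per_point() {
    awk -v m="$1" -v c="$2" 'BEGIN { printf "%.8f\n", m / c }'
}

mean=$(mean_overhead "$with" "$without")
echo "mean overhead: ${mean}s"
echo "per trackpoint: $(per_point "$mean" "$trackpoints")s"
```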

I want to do a few more test runs, with different files and bigger run sizes, but based on that I can't see any argument against turning the calculations on by default.
btasker changed status from 'Open' to 'In Progress'
btasker added 'run2.csv.gz' to Attachments
btasker removed 'run2.csv.gz' from Attachment
Larger run this time, triggered with
ben@milleniumfalcon:~/tmp$ for i in {1..1000}; do echo "$((time php without_calcs.php) |& grep real | cut -f2),$((time php with_calcs.php) |& grep real | cut -f2)," >> processing_times.csv ; done;
ben@milleniumfalcon:~/tmp$ less processing_times.csv # quick check to make sure nothing's > 60 seconds
ben@milleniumfalcon:~/tmp$ sed -i 's/0m//g' processing_times.csv
ben@milleniumfalcon:~/tmp$ sed -i 's/s,/,/g' processing_times.csv


I used real time as that's consistently been the highest value.
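The collection loop above relies on two sed passes to strip the "0m" prefix and "s" suffix from the output of time. As a possible alternative (a sketch only, not what was actually run for this issue), bash's TIMEFORMAT variable can make the time keyword print bare seconds directly. The function names time_real and collect are hypothetical, and the rows are written without the trailing comma that the original loop produced.

```shell
#!/usr/bin/env bash
# time_real CMD [ARGS...]
# Print the wall-clock ("real") time of CMD in seconds, e.g. 0.446.
# The command's own output is discarded.
time_real() {
    local LC_ALL=C          # keep "." as the decimal separator
    local TIMEFORMAT='%3R'  # real time only, 3 decimal places
    { time "$@" >/dev/null 2>&1; } 2>&1
}

# collect RUNS OUTFILE
# Append one "without,with" row per iteration to OUTFILE.
collect() {
    local runs=$1 out=$2 i
    for ((i = 0; i < runs; i++)); do
        echo "$(time_real php without_calcs.php),$(time_real php with_calcs.php)" >> "$out"
    done
}
```

Calling `collect 1000 processing_times.csv` would then mirror the 1000-iteration run. Because %R reports total seconds rather than minutes and seconds, the manual check for runs longer than 60 seconds should no longer be needed.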

Resulting data is attached (run2.csv.gz), but the overall stats for the extra time taken with calculations enabled (with minus without, in seconds) are
===================================================================
|         |   Mean      |   Median    |  Max        |   Min       |
-------------------------------------------------------------------
| Total   | 0.030436    | 0.0305      | 0.105       | -0.006      |
-------------------------------------------------------------------
| Per-Row | 0.000002949 | 0.000002955 | 0.000010174 | -0.000000581|
===================================================================
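Summary figures like the "Total" row can be derived from the cleaned CSV with sort and awk. The following is a minimal sketch, assuming each row has the form "without,with," in seconds (the layout the loop and sed commands above produce); summarise_diffs is a hypothetical helper name, and this is not necessarily how the figures in the table were obtained.

```shell
#!/usr/bin/env bash
# summarise_diffs FILE
# FILE rows look like "0.446,0.492," (without,with, in seconds).
# Prints mean, median, max and min of the per-run difference (with - without).
summarise_diffs() {
    # time reports milliseconds, so round each difference to 3 places
    # before sorting to avoid floating-point noise.
    awk -F, 'NF >= 2 { printf "%.3f\n", $2 - $1 }' "$1" |
        LC_ALL=C sort -n |
        awk '
            { v[NR] = $1; sum += $1 }
            END {
                if (NR == 0) exit 1
                # Middle value, or the mean of the two middle values.
                if (NR % 2) median = v[(NR + 1) / 2]
                else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
                printf "mean=%.6f median=%.4f max=%.3f min=%.3f\n",
                       sum / NR, median, v[NR], v[1]
            }'
}
```

Running `summarise_diffs processing_times.csv` prints the four totals on one line. Dividing each by the trackpoint count (10320) gives the per-row figures.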


So, the overall extra time required to perform the distance calculations is negligible, and in at least one case, the version with calculations ran (marginally) faster.

The test file I'm using represents a journey of more than 3 hours with varied speeds, so there should be a good range of calculations going on.

I'm fairly comfortable with the idea of raising an FR to move calcDistance out of experimental features, so that it's enabled by default (so long as it can be suppressed if needed).
GPXIN-23 has been raised to enable the functionality by default. Closing.
btasker changed status from 'In Progress' to 'Resolved'
btasker added 'Done' to resolution
btasker changed status from 'Resolved' to 'Closed'