This demo predicts the duration of currently active road traffic incidents in London.
Incidents are unscheduled disruptions caused by collisions, surface damage, burst water mains, etc., and do not include
planned roadworks or other scheduled disruptions.
The data for this demo is taken from the Transport for London (TfL)
Traffic Information Management Service feed.
Additional weather data for London is provided by the Weather Underground API.
All the data is stored in a Pivotal Greenplum Database.
The analysis and modelling are run using MADlib, Python
and scikit-learn every time the feed is updated. The graphs are produced using NVD3 and
D3. This website runs on the Pivotal CF hosted instance of Cloud Foundry.
- The current list of disruptions is retrieved from the TfL feed each time it is updated, which happens every five minutes.
- The reports in the feed are then parsed and inserted into the Pivotal Greenplum Database.
- The feed is not limited to new updates, so many reports are duplicated. These duplicates are removed in the database.
- To model the disruption pattern, features such as the day of the week and the number of affected streets are created from the incoming data. Additional
data from other datasets, including weather data, is incorporated.
- For this demo, only Traffic Incidents and Hazards are used to generate the model, as these are short-lived, unexpected disruptions,
unlike long-term planned roadworks, for example.
- The model is generated using MADlib, Python and scikit-learn, and predictions are made for the currently active incidents.
- The interactive plots and table are created using d3.js and NVD3.
The maps are created using Leaflet and Leaflet.heat.
- The public website is served by Cloud Foundry using a simple nginx buildpack.
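
As a rough illustration of the de-duplication, filtering and feature creation steps listed above, here is a minimal Python sketch. It is not the demo's actual code: the demo removes duplicates inside the Greenplum database, whereas this sketch works on an in-memory list. The record layout (`id`, `category`, `start`, `streets`), the helper names and the "Works" category are assumptions made for the example and do not reflect the real TfL feed schema.

```python
from datetime import datetime

# Hypothetical parsed reports; field names are illustrative, not the TfL schema.
reports = [
    {"id": "A1", "category": "Traffic Incidents",
     "start": "2014-03-03T08:15:00", "streets": ["A40", "Wood Lane"]},
    # The same report repeated in a later feed update.
    {"id": "A1", "category": "Traffic Incidents",
     "start": "2014-03-03T08:15:00", "streets": ["A40", "Wood Lane"]},
    {"id": "B2", "category": "Works",
     "start": "2014-03-04T09:00:00", "streets": ["High Street"]},
    {"id": "C3", "category": "Hazards",
     "start": "2014-03-08T22:30:00", "streets": ["Tower Bridge Road"]},
]

# Only these short-lived, unexpected categories are used for modelling.
MODELLED_CATEGORIES = {"Traffic Incidents", "Hazards"}


def deduplicate(reports):
    """Keep the first report seen for each id, preserving feed order."""
    seen = {}
    for report in reports:
        seen.setdefault(report["id"], report)
    return list(seen.values())


def build_features(report):
    """Derive simple model features from one report."""
    start = datetime.fromisoformat(report["start"])
    return {
        "id": report["id"],
        "day_of_week": start.weekday(),  # Monday is 0, Sunday is 6
        "num_streets": len(report["streets"]),
    }


def prepare(reports):
    """De-duplicate, keep modelled categories, then build features."""
    unique = deduplicate(reports)
    kept = [r for r in unique if r["category"] in MODELLED_CATEGORIES]
    return [build_features(r) for r in kept]


features = prepare(reports)
for row in features:
    print(row)
```

In a fuller version, weather observations would be joined onto each row before the rows are passed to the model.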
Head back to the predictions, analysis or details
of the models.
Created by Ian Huston
Thank you to the whole Data Science team for their help in producing this demo, especially Noelle and Vatsan.