ivanliu1989 / Driver Telematics Analysis

License: MIT

Projects that are alternatives of or similar to Driver Telematics Analysis

Cs224u
Code for Stanford CS224u
Stars: ✭ 857 (+7690.91%)
Mutual labels:  jupyter-notebook
Advanced pymc3
A talk illustrating some of the Advanced features of PyMC3
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Julia stats
Learning Julia
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Marketing campaign response prediction
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Awesome Google Colab
Google Colaboratory Notebooks and Repositories (by @firmai)
Stars: ✭ 863 (+7745.45%)
Mutual labels:  jupyter-notebook
Tf box classify
A simple TensorFlow example for training CNN models using input queues and labelled JPEGs
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Wikipediagenderinequality
Stars: ✭ 10 (-9.09%)
Mutual labels:  jupyter-notebook
D Script
Writer Identification of Handwritten Documents
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Data Science On Gcp
Source code accompanying book: Data Science on the Google Cloud Platform, Valliappa Lakshmanan, O'Reilly 2017
Stars: ✭ 864 (+7754.55%)
Mutual labels:  jupyter-notebook
Open data science east 2016
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Optimization Cookbook
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Neurally Embedded Emojis
Convolutional variational autoencoders and text-question, emoji-answer models
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Curso pdi
Materials for the PDI course (Jupyter)
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Pytorch Everybodydancenow
Implementation of Everybody Dance Now in PyTorch
Stars: ✭ 861 (+7727.27%)
Mutual labels:  jupyter-notebook
Pandastalks
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Idiomatic Robotframework
Stars: ✭ 10 (-9.09%)
Mutual labels:  jupyter-notebook
Pandas Tutorials
How To's and Tutorials in Jupyter Notebook
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Machine Learning
Machine Learning Projects using R, Mahout, Python and NLP
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Tutorials
Landlab tutorials
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook
Pdlsr
Pandas-aware non-linear least squares regression using Lmfit
Stars: ✭ 11 (+0%)
Mutual labels:  jupyter-notebook

Driver Telematics Analysis

For this competition, Kaggle participants must come up with a "telematic fingerprint" capable of distinguishing when a trip was driven by a given driver. The features of this driver fingerprint could help assess risk and form a crucial piece of a larger telematics puzzle.

FINAL SOLUTION:

  • An ensemble model of random forests, neural networks, and gradient boosting.
  • Also tried other models, including SVMs (linear and radial kernels), logistic regression, naive Bayes, k-NN, and k-means, but found no improvement.
  • Implemented a very simple trip-matching method on top of the basic ensemble to gain a small boost; a more advanced trip-matching method could be developed given more time and more capable machines.
  • For data cleaning, both a Kalman filter and a moving average are used to reduce noise and remove spurious outliers (see the sketch after this list).
  • Final ranking: 76th/1562 (0.92274 | top 5%)
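
As a concrete illustration of the cleaning step, here is a minimal sketch of a centered moving-average smoother applied to a trip's (x, y) trace. The window size `k` and the `read.csv` path are assumptions for illustration; the repository's actual cleaning code (including the Kalman filter) may differ.

```r
# Minimal sketch: smooth a trip's (x, y) trace with a centered moving average.
# Assumes a trip CSV with columns x and y, as in the competition data.
smooth_trip <- function(trip, k = 5) {
  # Centered moving average via stats::filter; NAs at the edges are
  # backfilled with the original values so the trip length is preserved.
  ma <- function(v) {
    s <- as.numeric(stats::filter(v, rep(1 / k, k), sides = 2))
    ifelse(is.na(s), v, s)
  }
  data.frame(x = ma(trip$x), y = ma(trip$y))
}

trip <- read.csv("drivers/1/1.csv")  # hypothetical path
clean <- smooth_trip(trip, k = 5)
```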

TIPS:

  1. One driver per car.
  2. The units of x and y are meters.
  3. In telematics, ignition "on" to ignition "off" is defined as a "trip".
  4. Driver 1 on trip 8 took a 200-meter detour and remained stopped for roughly 30 seconds (see the attached image). Can we assume this is a person dropping their kids off at school or grabbing a quick coffee, as opposed to a person who stops for 30 minutes, whose GPS shuts down, and whose GPS starts up again when the travel resumes?
  5. Smoothing vs. outlier removal.
  6. Rotation and scaling of the data.
  7. Number of turns; distribution of speed / acceleration.
  8. Basic trigonometry: identify the slope of the journey (the last y coordinate over the last x coordinate), then use the inverse tangent to calculate the overall angle of travel. Use this to rotate all journeys so they begin and end on the x-axis. At that stage flipping journeys becomes easy: just multiply the y coordinates by -1. Length of journey, max x, max y, and average y then give a reasonable stab at finding similar journeys (see the rotation sketch after this list).
  9. Open questions for matching journeys: What counts as a straight? How do you define a left turn, i.e., how many degrees of turning over how long a distance? Is speed important? What about all those tiny movements when the driver seems to be maneuvering into a parking spot? Once journeys are encoded as strings, how well do they have to match to be considered the same journey; is a 1.1 km straight the same as a 1.2 km straight? Driver 1, trip 1 is a very good example for trying to define turns and straights. For matching (and flipped) trips (and return trips), driver 1 provides several as well: trips 102, 167, 183, 197, 200 and trips 63, 83, 120, 148 are some sets that are pretty tough.
  10. Another idea that may be worth investigating is the various spatial packages in R:
     • convert the (x, y) points of each journey into a SpatialLines object (package sp);
     • use the elide routine from package maptools to rotate and flip journeys;
     • use gDistance from package rgeos to measure the distance or similarity between journeys.
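
A minimal sketch of the rotation and flipping described in tip 8, in base R. It assumes a trip is a data frame of x and y columns starting at the origin, as in the competition data; the function name `rotate_trip` is illustrative, not from the repository.

```r
# Minimal sketch of tip 8: rotate a trip so that it ends on the positive
# x-axis, then optionally flip it across that axis by negating y.
# 'trip' is a data frame of x and y coordinates starting at the origin.
rotate_trip <- function(trip, flip = FALSE) {
  n <- nrow(trip)
  theta <- atan2(trip$y[n], trip$x[n])           # overall angle of travel
  rot <- matrix(c(cos(theta), sin(theta),        # rotation by -theta for
                  -sin(theta), cos(theta)),      # row vectors (x, y)
                nrow = 2)
  xy <- as.matrix(trip[, c("x", "y")]) %*% rot
  if (flip) xy[, 2] <- -xy[, 2]                  # mirror across the x-axis
  data.frame(x = xy[, 1], y = xy[, 2])
}
```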

ALGORITHM:

  1. GBM: the training MSE will obviously decrease with the number of trees; should I stop at around 150 trees, assuming the model is over-fitting after that? (See the sketch after this list.)
  2. SVM: is SVM better than GBM at dealing with noise?
  3. Random forest: is this any better than GBM for noise?
  4. NNET
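
A minimal sketch of the GBM point above, using the R gbm package. The feature data frame `train` with a 0/1 `target` column is an assumption for illustration; rather than hard-coding 150 trees, `gbm.perf` estimates the iteration at which over-fitting begins.

```r
library(gbm)  # generalized boosted models

# 'train' is a hypothetical data frame of trip features with a 0/1 target.
fit <- gbm(target ~ ., data = train,
           distribution = "bernoulli",
           n.trees = 500, shrinkage = 0.05,
           interaction.depth = 3, bag.fraction = 0.7)

# Estimate the best iteration from the out-of-bag error instead of
# assuming a fixed cutoff such as 150 trees.
best_iter <- gbm.perf(fit, method = "OOB")
pred <- predict(fit, newdata = train, n.trees = best_iter, type = "response")
```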

VALUABLE TRY:

  1. Take the false-trip data (all of it, i.e., the trips from all the other drivers) and run k-means clustering with 200 or more clusters. Then use the centers of the resulting clusters as your target=0 data for every driver (see the first sketch after this list).
  2. The next idea is sort of a "by-hand" boosting algorithm, but with more weight added to the 0 labels only. Take lots of target=0 trips and use them with the driver's 200 trips (as target=1) to train a classifier. Then predict on just the target=0 trips and drop some percentage of those with the lowest predicted probability of being target=1; you are left with only the trips closest to the decision boundary. Retrain with the driver's original trips and these zeros only (see the second sketch after this list).
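
A minimal sketch of the k-means idea, assuming each trip has already been reduced to a numeric feature vector; `other_features` (trips from all other drivers) and the cluster count of 200 follow the description above, while the variable names are illustrative.

```r
# Sketch of idea 1: compress all other drivers' trips into 200 cluster
# centers and use those centers as the shared target=0 examples.
# 'other_features' is a hypothetical matrix, one row of features per trip.
set.seed(42)
km <- kmeans(other_features, centers = 200, iter.max = 50, nstart = 5)
negatives <- as.data.frame(km$centers)
negatives$target <- 0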
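
And a sketch of the second, "by-hand boosting" idea, keeping only the target=0 trips nearest the decision boundary. The `randomForest` call stands in for whatever classifier is used; all data frame names and the 50% drop fraction are assumptions.

```r
library(randomForest)

# 'driver_trips' (target = 1) and 'zero_trips' (target = 0) are hypothetical
# feature data frames. Train, score the zeros, and keep only the hard ones.
train1 <- rbind(transform(driver_trips, target = 1),
                transform(zero_trips,   target = 0))
fit <- randomForest(factor(target) ~ ., data = train1, ntree = 300)

# Probability that each zero-trip looks like the driver; drop the easy
# (lowest-probability) negatives, keeping those near the decision boundary.
p1 <- predict(fit, newdata = zero_trips, type = "prob")[, "1"]
hard_zeros <- zero_trips[p1 >= quantile(p1, 0.5), ]

# Retrain on the driver's trips plus only the hard negatives.
train2 <- rbind(transform(driver_trips, target = 1),
                transform(hard_zeros,   target = 0))
fit2 <- randomForest(factor(target) ~ ., data = train2, ntree = 300)
```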

SOME GOOD RESULTS:

  1. Speed quantile features and the gradient boosting method.
  2. Speed, acceleration, and curve features.
  3. Trimming outliers worked better than averaging them out.
  4. Using speed, acceleration, centripetal acceleration, heading change, and similar features processed by randomForest: 0.83578.
  5. Using randomForest produced the best result (0.88315).

Used 5 contrasting drivers, chosen at random (and found what RobRob found: increasing the number of drivers did not improve performance; tried with 10). The feature set includes distance, time, and percentiles for speed, acceleration, and turn (the same as the heading changes mentioned by RobRob?). Logistic regression performed far worse, with a best score of 0.77173 on the same feature set. As for outliers, using percentiles probably helps eliminate them, as long as you don't use mean values and don't include anything in the first and last 1 percent (total distance traveled is the only feature impacted, in the case of "hyper space" jumps). A sketch of this kind of percentile feature extraction follows.
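
As an illustration of the percentile features described above, here is a minimal sketch that derives speed and acceleration from a trip's (x, y) trace and summarizes them as quantiles. It assumes the points are sampled at a fixed 1-second interval (so the distance per step is the speed in m/s); the function name and quantile grid are illustrative.

```r
# Sketch: percentile (quantile) features for speed and acceleration.
# Assumes a trip data frame with x and y in meters, one row per second.
trip_features <- function(trip, probs = seq(0.1, 0.9, by = 0.2)) {
  dx <- diff(trip$x)
  dy <- diff(trip$y)
  speed <- sqrt(dx^2 + dy^2)   # m/s, assuming 1 Hz sampling
  accel <- diff(speed)         # m/s^2 under the same assumption
  c(distance = sum(speed),
    time = nrow(trip),
    setNames(quantile(speed, probs), paste0("speed_q", probs * 100)),
    setNames(quantile(accel, probs), paste0("accel_q", probs * 100)))
}
```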
