Working with Play-by-Play Sports Data: Using SQL & R


Here are the slides from an introduction to working with sports play-by-play event data that I gave at the Columbia University Statistics Department. The aim was to show how to build a SQL database from raw XML data and then explore that database using R. I used soccer data provided by Opta, which unfortunately I cannot share.
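Since the Opta data cannot be shared, here is a minimal sketch of the XML-to-SQL step on made-up data. It is not the code from the talk: the talk used R, whereas this uses Python's standard library (xml.etree.ElementTree and sqlite3) so it runs without extra packages. The XML layout, the element and attribute names, the team names and the events table are all invented for illustration and do not reflect Opta's actual feed format.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical XML: the element and attribute names are invented for
# illustration and are NOT Opta's actual feed format.
RAW_XML = """
<Game id="1" home_team="Home FC" away_team="Away United">
  <Event id="101" type="pass" minute="3" team="Home FC" x="45.2" y="30.1" outcome="1"/>
  <Event id="102" type="shot" minute="4" team="Home FC" x="88.0" y="52.3" outcome="0"/>
  <Event id="103" type="pass" minute="7" team="Away United" x="20.5" y="70.0" outcome="0"/>
  <Event id="104" type="shot" minute="9" team="Away United" x="91.4" y="48.9" outcome="1"/>
</Game>
"""


def build_database(xml_text, conn):
    """Parse the XML and load one row per event into an 'events' table."""
    game = ET.fromstring(xml_text.strip())
    conn.execute(
        """
        CREATE TABLE events (
            event_id INTEGER PRIMARY KEY,
            game_id  INTEGER,
            type     TEXT,
            minute   INTEGER,
            team     TEXT,
            x        REAL,
            y        REAL,
            outcome  INTEGER
        )
        """
    )
    # XML attributes arrive as strings, so convert them explicitly.
    rows = [
        (
            int(ev.get("id")),
            int(game.get("id")),
            ev.get("type"),
            int(ev.get("minute")),
            ev.get("team"),
            float(ev.get("x")),
            float(ev.get("y")),
            int(ev.get("outcome")),
        )
        for ev in game.iter("Event")
    ]
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
build_database(RAW_XML, conn)

# Explore the database: count shots per team.
shots_per_team = conn.execute(
    "SELECT team, COUNT(*) FROM events WHERE type = 'shot' "
    "GROUP BY team ORDER BY team"
).fetchall()
print(shots_per_team)
```

Once the events sit in a table like this, the same SELECT statement can be issued from R (or any other client) to pull only the rows needed for an analysis.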


In the talk I also discussed how to visualize the data using R and briefly touched on d3.js. Finally, I introduced the concept of expected goals, and we discussed how to build such a model from play-by-play data and what improvements could be made to it.
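To give a flavour of what an expected goals model looks like, here is a minimal sketch, not the model from the talk. It assumes the simplest possible setup: a logistic regression that predicts the probability of a shot being scored from its distance to goal alone, fitted by plain gradient ascent. The shot data, the function names fit_xg and expected_goals, and the learning-rate settings are all made up for illustration, and the code is in Python rather than R so that it runs with the standard library only.

```python
import math

# Hypothetical shots: (distance to goal in metres, 1 if scored else 0).
# These values are invented for illustration; they are not Opta data.
SHOTS = [
    (6, 1), (7, 1), (8, 1), (10, 0), (11, 1), (12, 0),
    (14, 0), (16, 0), (18, 1), (20, 0), (25, 0), (30, 0),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_xg(shots, lr=0.5, steps=5000):
    """Fit P(goal) = sigmoid(b0 + b1 * distance / 10) by gradient ascent
    on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(shots)
    for _ in range(steps):
        g0 = g1 = 0.0
        for dist, goal in shots:
            x = dist / 10.0  # rescale so the step size behaves well
            err = goal - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def expected_goals(dist, coefs):
    """Predicted scoring probability for a shot taken 'dist' metres out."""
    b0, b1 = coefs
    return sigmoid(b0 + b1 * dist / 10.0)


coefs = fit_xg(SHOTS)
close_xg = expected_goals(6, coefs)
far_xg = expected_goals(25, coefs)
print(f"xG from 6 m: {close_xg:.2f}, xG from 25 m: {far_xg:.2f}")
```

A distance-only model is deliberately crude. Typical improvements add features such as shot angle, the body part used, or the type of pass that led to the shot, each of which can be derived from play-by-play events.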




Slides here (they may take a short while to load because of the GIFs!)