Chapter 3: Predicting Sports Winners with Decision Trees

More on pandas

http://pandas.pydata.org/pandas-docs/stable/tutorials.html

The pandas library is a great package: most of the data-loading code you would normally write yourself is probably already implemented in pandas. You can learn more about it from the official tutorials linked above.
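As a small illustration of how much loading work pandas absorbs, a single `pd.read_csv` call parses a file, infers column types and converts dates. This is a minimal sketch that assumes a game-results CSV; the column names, team names and scores are made up, and an in-memory string stands in for a real file path.

```python
import io

import pandas as pd

# Made-up sample rows (hypothetical teams and scores) standing in for a CSV
# file of game results; in practice you would pass a file path instead.
CSV_TEXT = """Date,Visitor Team,VisitorPts,Home Team,HomePts
2014-01-01,Team A,90,Team B,95
2014-01-02,Team C,101,Team A,99
2014-01-03,Team B,88,Team C,84
"""

# read_csv handles parsing, type inference and date conversion in one call.
results = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["Date"])

# Derive a boolean class column: did the home team win this game?
results["HomeWin"] = results["HomePts"] > results["VisitorPts"]

print(results.dtypes)
print(results)
```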

There is also a great blog post by Chris Moffitt that gives an overview of common tasks people do in Excel and how to do them in pandas: http://pbpython.com/excel-pandas-comp.html

You can also handle large datasets with pandas. For an extensive overview of the process, see the answer from user Jeff (the top answer at the time of writing) to this Stack Overflow question: http://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas
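One basic way to keep memory use bounded is to read a CSV file in pieces with the `chunksize` parameter of `pd.read_csv` and combine small per-chunk results. This is a minimal sketch of that general technique, not a summary of the approach in the Stack Overflow answer; the column names and rows are made up, and the tiny chunk size only exists so the toy data spans several chunks.

```python
import io

import pandas as pd

# Made-up rows standing in for a CSV file too large to load in one go.
CSV_TEXT = """Home Team,HomePts
Team A,95
Team B,88
Team A,102
Team C,91
Team B,97
"""


def total_home_points(source, chunk_size=2):
    """Sum HomePts per home team, reading only chunk_size rows at a time."""
    partials = []
    # With chunksize set, read_csv yields DataFrames of up to chunk_size rows.
    for chunk in pd.read_csv(source, chunksize=chunk_size):
        partials.append(chunk.groupby("Home Team")["HomePts"].sum())
    # Combine the per-chunk sums; the same team can appear in several chunks.
    return pd.concat(partials).groupby(level=0).sum()


totals = total_home_points(io.StringIO(CSV_TEXT))
print(totals)
```

Only the small per-chunk aggregates are kept in memory, so the same pattern works when the file itself is much larger than the available RAM.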

Another great tutorial on pandas is written by Brian Connelly: http://bconnelly.net/2013/10/summarizing-data-in-python-with-pandas/

More complex features

http://www.basketball-reference.com/teams/ORL/2014_roster_status.html

A team's lineup can change from game to game. What would be an easy win can turn into a difficult game if a couple of the team's best players are injured. You can also get team rosters from basketball-reference. For example, the Orlando Magic's roster for the 2013-2014 season is available at the link above, and similar data is available for all NBA teams.

Writing code that measures how much a team's roster changes from game to game, and using that measure to add new features, could improve the model significantly. This task will take quite a bit of work, though!
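As a possible starting point, one roster-change feature is the fraction of the players available in a team's previous game who are missing from the current game. The sketch below assumes you have already turned roster data into a hypothetical long-format table with one row per team, game date and available player; the `appearances` table, its column names and the `roster_change` function are illustrative assumptions, not part of the chapter's code.

```python
import pandas as pd

# Hypothetical long-format availability data: one row per player available to
# a team for a given game. Team and player names are made up.
appearances = pd.DataFrame({
    "Team": ["Team A"] * 9,
    "Date": pd.to_datetime(
        ["2014-01-01"] * 3 + ["2014-01-03"] * 3 + ["2014-01-05"] * 3
    ),
    "Player": ["P1", "P2", "P3",
               "P1", "P2", "P4",
               "P1", "P5", "P6"],
})


def roster_change(appearances):
    """Fraction of each team's previous-game players missing from this game.

    Returns one row per (Team, Date); a team's first game gets NaN because
    there is no earlier roster to compare against.
    """
    rows = []
    for team, team_rows in appearances.groupby("Team"):
        previous = None
        # groupby sorts its keys by default, so games are visited in date order.
        for date, game_rows in team_rows.groupby("Date"):
            current = set(game_rows["Player"])
            if previous is None:
                change = float("nan")
            else:
                change = len(previous - current) / len(previous)
            rows.append({"Team": team, "Date": date, "RosterChange": change})
            previous = current
    return pd.DataFrame(rows)


print(roster_change(appearances))
```

The resulting rows could then be merged onto the game results by team and date, once for the home team and once for the visitor. Counting only departures is one design choice; using the symmetric difference between consecutive rosters would also count new arrivals.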
