Mastering pandas
by Femi Anthony
Table of Contents
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introduction to pandas and Data Analysis
Motivation for data analysis
We live in a big data world
4 V's of big data
Volume of big data
Velocity of big data
Variety of big data
Veracity of big data
So much data, so little time for analysis
The move towards real-time analytics
How Python and pandas fit into the data analytics mix
What is pandas?
Benefits of using pandas
Summary
2. Installation of pandas and the Supporting Software
Selecting a version of Python to use
Python installation
Linux
Installing Python from compressed tarball
Windows
Core Python installation
Third-party Python software installation
Mac OS X
Installation using a package manager
Installation of Python and pandas from a third-party vendor
Continuum Analytics Anaconda
Installing Anaconda
Linux
Mac OS X
Windows
Final step for all platforms
Other numeric or analytics-focused Python distributions
Downloading and installing pandas
Linux
Ubuntu/Debian
Red Hat
Ubuntu/Debian
Fedora
OpenSuse
Mac
Source installation
Binary installation
Windows
Binary Installation
Source installation
IPython
IPython Notebook
IPython installation
Linux
Windows
Mac OS X
Install via Anaconda (for Linux/Mac OS X)
Wakari by Continuum Analytics
Virtualenv
Virtualenv installation and usage
Summary
3. The pandas Data Structures
NumPy ndarrays
NumPy array creation
NumPy arrays via numpy.array
NumPy array via numpy.arange
NumPy array via numpy.linspace
NumPy array via various other functions
numpy.ones
numpy.zeros
numpy.eye
numpy.diag
numpy.random.rand
numpy.empty
numpy.tile
NumPy datatypes
NumPy indexing and slicing
Array slicing
Array masking
Complex indexing
Copies and views
Operations
Basic operations
Reduction operations
Statistical operators
Logical operators
Broadcasting
Array shape manipulation
Flattening a multi-dimensional array
Reshaping
Resizing
Adding a dimension
Array sorting
Data structures in pandas
Series
Series creation
Using numpy.ndarray
Using Python dictionary
Using scalar values
Operations on Series
Assignment
Slicing
Other operations
DataFrame
DataFrame Creation
Using dictionaries of Series
Using a dictionary of ndarrays/lists
Using a structured array
Using a Series structure
Operations
Selection
Assignment
Deletion
Alignment
Other mathematical operations
Panel
Using 3D NumPy array with axis labels
Using a Python dictionary of DataFrame objects
Using the DataFrame.to_panel method
Other operations
Summary
4. Operations in pandas, Part I – Indexing and Selecting
Basic indexing
Accessing attributes using dot operator
Range slicing
Label, integer, and mixed indexing
Label-oriented indexing
Selection using a Boolean array
Integer-oriented indexing
The .iat and .at operators
Mixed indexing with the .ix operator
MultiIndexing
Swapping and reordering levels
Cross sections
Boolean indexing
The isin and any/all methods
Using the where() method
Operations on indexes
Summary
5. Operations in pandas, Part II – Grouping, Merging, and Reshaping of Data
Grouping of data
The groupby operation
Using groupby with a MultiIndex
Using the aggregate method
Applying multiple functions
The transform() method
Filtering
Merging and joining
The concat function
Using append
Appending a single row to a DataFrame
SQL-like merging/joining of DataFrame objects
The join function
Pivots and reshaping data
Stacking and unstacking
The stack() function
Other methods to reshape DataFrames
Using the melt function
The pandas.get_dummies() function
Summary
6. Missing Data, Time Series, and Plotting Using Matplotlib
Handling missing data
Handling missing values
Handling time series
Reading in time series data
DateOffset and TimeDelta objects
Time series-related instance methods
Shifting/lagging
Frequency conversion
Resampling of data
Aliases for Time Series frequencies
Time series concepts and datatypes
Period and PeriodIndex
PeriodIndex
Conversions between Time Series datatypes
A summary of Time Series-related objects
Plotting using matplotlib
Summary
7. A Tour of Statistics – The Classical Approach
Descriptive statistics versus inferential statistics
Measures of central tendency and variability
Measures of central tendency
The mean
The median
The mode
Computing measures of central tendency of a dataset in Python
Measures of variability, dispersion, or spread
Range
Quartile
Deviation and variance
Hypothesis testing – the null and alternative hypotheses
The null and alternative hypotheses
The alpha and p-values
Type I and Type II errors
Statistical hypothesis tests
Background
The z-test
The t-test
Types of t-tests
A t-test example
Confidence intervals
An illustrative example
Correlation and linear regression
Correlation
Linear regression
An illustrative example
Summary
8. A Brief Tour of Bayesian Statistics
Introduction to Bayesian statistics
Mathematical framework for Bayesian statistics
Bayes theory and odds
Applications of Bayesian statistics
Probability distributions
Fitting a distribution
Discrete probability distributions
Discrete uniform distributions
The Bernoulli distribution
The binomial distribution
The Poisson distribution
The Geometric distribution
The negative binomial distribution
Continuous probability distributions
The continuous uniform distribution
The exponential distribution
The normal distribution
Bayesian statistics versus Frequentist statistics
What is probability?
How the model is defined
Confidence (Frequentist) versus Credible (Bayesian) intervals
Conducting Bayesian statistical analysis
Monte Carlo estimation of the likelihood function and PyMC
Bayesian analysis example – Switchpoint detection
References
Summary
9. The pandas Library Architecture
Introduction to pandas' file hierarchy
Description of pandas' modules and files
pandas/core
pandas/io
pandas/tools
pandas/sparse
pandas/stats
pandas/util
pandas/rpy
pandas/tests
pandas/compat
pandas/computation
pandas/tseries
pandas/sandbox
Improving performance using Python extensions
Summary
10. R and pandas Compared
R data types
R lists
R DataFrames
Slicing and selection
R-matrix and NumPy array compared
R lists and pandas series compared
Specifying column name in R
Specifying column name in pandas
R's DataFrames versus pandas' DataFrames
Multicolumn selection in R
Multicolumn selection in pandas
Arithmetic operations on columns
Aggregation and GroupBy
Aggregation in R
The pandas' GroupBy operator
Comparing matching operators in R and pandas
R %in% operator
The pandas isin() function
Logical subsetting
Logical subsetting in R
Logical subsetting in pandas
Split-apply-combine
Implementation in R
Implementation in pandas
Reshaping using melt
The R melt() function
The pandas melt() function
Factors/categorical data
An R example using cut()
The pandas solution
Summary
11. Brief Tour of Machine Learning
Role of pandas in machine learning
Installation of scikit-learn
Installing via Anaconda
Installing on Unix (Linux/Mac OS X)
Installing on Windows
Introduction to machine learning
Supervised versus unsupervised learning
Illustration using document classification
Supervised learning
Unsupervised learning
How machine learning systems learn
Application of machine learning – Kaggle Titanic competition
The Titanic: Machine Learning from Disaster problem
The problem of overfitting
Data analysis and preprocessing using pandas
Examining the data
Handling missing values
A naïve approach to the Titanic problem
The scikit-learn ML/classifier interface
Supervised learning algorithms
Constructing a model using Patsy for scikit-learn
General boilerplate code explanation
Logistic regression
Support vector machine
Decision trees
Random forest
Unsupervised learning algorithms
Dimensionality reduction
K-means clustering
Summary
Index