Understand, explore, and effectively present data using the powerful data visualization techniques of Python

Key Features

  • Use the power of Pandas and Matplotlib to easily solve data mining issues
  • Understand the basics of statistics to build powerful predictive data models
  • Grasp data mining concepts with helpful use-cases and examples

Book Description

Data mining, or parsing data to extract useful insights, is a niche skill that can transform your career as a data scientist. Python is a flexible programming language equipped with a strong suite of libraries and toolkits, and it gives you the perfect platform to sift through your data and mine the insights you seek. This Learning Path is designed to familiarize you with the Python libraries and the underlying statistics that you need to get comfortable with data mining.

You will learn how to use Pandas, Python's popular library for analyzing different kinds of data, and leverage the power of Matplotlib to generate appealing and impressive visualizations of the insights you have derived. You will also explore different machine learning techniques and statistics that enable you to build powerful predictive models.

By the end of this Learning Path, you will have the perfect foundation to take your data mining skills to the next level and set yourself on the path to becoming a sought-after data science professional.

This Learning Path includes content from the following Packt products:

  • Statistics for Machine Learning by Pratap Dangeti
  • Matplotlib 2.x By Example by Allen Yu, Claire Chung, Aldrin Yim
  • Pandas Cookbook by Theodore Petrou

What you will learn

  • Understand the statistical fundamentals to build data models
  • Split data into independent groups
  • Apply aggregations and transformations to each group
  • Create impressive data visualizations
  • Prepare your data and design models
  • Clean up data to ease data analysis and visualization
  • Create insightful visualizations with Matplotlib and Seaborn
  • Customize the model to suit your own predictive goals

Who this book is for

If you want to learn how to use the many libraries of Python to extract impactful information from your data and present it as engaging visuals, then this is the ideal Learning Path for you. Some basic knowledge of Python is enough to get started with this Learning Path.

Table of Contents

  1. Title Page
  2. Copyright
    1. Numerical Computing with Python
  3. Contributors
    1. About the authors
    2. About the reviewers
    3. Packt is searching for authors like you
  4. About Packt
    1. Why subscribe?
    2. Packt.com
  5. Preface
    1. Who this book is for
    2. What this book covers
    3. To get the most out of this book
      1. Download the example code files
      2. Conventions used
    4. Get in touch
      1. Reviews
  6. Journey from Statistics to Machine Learning
    1. Statistical terminology for model building and validation
      1. Machine learning
      2. Statistical fundamentals and terminology for model building and validation
      3. Bias versus variance trade-off
      4. Train and test data
    2. Summary
  7. Tree-Based Machine Learning Models
    1. Introducing decision tree classifiers
      1. Terminology used in decision trees
      2. Decision tree working methodology from first principles
    2. Comparison between logistic regression and decision trees
    3. Comparison of error components across various styles of models
    4. Remedial actions to push the model towards the ideal region
    5. HR attrition data example
    6. Decision tree classifier
    7. Tuning class weights in decision tree classifier
    8. Bagging classifier
    9. Random forest classifier
    10. Random forest classifier - grid search
    11. AdaBoost classifier
    12. Gradient boosting classifier
    13. Comparison between AdaBoosting versus gradient boosting
    14. Extreme gradient boosting - XGBoost classifier
    15. Ensemble of ensembles - model stacking
    16. Ensemble of ensembles with different types of classifiers
    17. Ensemble of ensembles with bootstrap samples using a single type of classifier
    18. Summary
  8. K-Nearest Neighbors and Naive Bayes
    1. K-nearest neighbors
      1. KNN voter example
      2. Curse of dimensionality
        1. Curse of dimensionality with 1D, 2D, and 3D example
    2. KNN classifier with breast cancer Wisconsin data example
    3. Tuning of k-value in KNN classifier
    4. Naive Bayes
    5. Probability fundamentals
      1. Joint probability
    6. Understanding Bayes theorem with conditional probability
    7. Naive Bayes classification
    8. Laplace estimator
    9. Naive Bayes SMS spam classification example
    10. Summary
  9. Unsupervised Learning
    1. K-means clustering
      1. K-means working methodology from first principles
      2. Optimal number of clusters and cluster evaluation
        1. The elbow method
      3. K-means clustering with the iris data example
    2. Principal Component Analysis - PCA
      1. PCA working methodology from first principles
      2. PCA applied on handwritten digits using scikit-learn
    3. Singular value decomposition - SVD
      1. SVD applied on handwritten digits using scikit-learn
    4. Deep auto encoders
    5. Model building technique using encoder-decoder architecture
    6. Deep auto encoders applied on handwritten digits using Keras
    7. Summary
  10. Reinforcement Learning
    1. Reinforcement learning basics
      1. Category 1 - value based
      2. Category 2 - policy based
      3. Category 3 - actor-critic
      4. Category 4 - model-free
      5. Category 5 - model-based
      6. Fundamental categories in sequential decision making
    2. Markov decision processes and Bellman equations
    3. Dynamic programming
      1. Algorithms to compute optimal policy using dynamic programming
    4. Grid world example using value and policy iteration algorithms with basic Python
    5. Monte Carlo methods
      1. Monte Carlo prediction
      2. The suitability of Monte Carlo prediction on grid-world problems
      3. Modeling Blackjack example of Monte Carlo methods using Python
    6. Temporal difference learning
      1. TD prediction
      2. Driving office example for TD learning
    7. SARSA on-policy TD control
    8. Q-learning - off-policy TD control
    9. Cliff walking example of on-policy and off-policy of TD control
    10. Further reading
    11. Summary
  11. Hello Plotting World!
    1. Hello Matplotlib!
      1. What is Matplotlib?
      2. What's new in Matplotlib 2.0?
        1. Changes to the default style
          1. Color cycle
          2. Colormap
          3. Scatter plot
          4. Legend
          5. Line style
          6. Patch edges and color
          7. Fonts
        2. Improved functionality or performance
          1. Improved color conversion API and RGBA support
          2. Improved image support
          3. Faster text rendering
          4. Change in the default animation codec
        3. Changes in settings
          1. New configuration parameters (rcParams)
          2. Style parameter blacklist
          3. Change in Axes property keywords
    2. Plotting our first graph
      1. Loading data for plotting
        1. Data structures
          1. List
          2. NumPy array
          3. pandas DataFrame
        2. Loading data from files
          1. The basic Python way
          2. The NumPy way
          3. The pandas way
      2. Importing the Matplotlib pyplot module
      3. Plotting a curve
      4. Viewing the figure
      5. Saving the figure
        1. Setting the output format
          1. PNG (Portable Network Graphics)
          2. PDF (Portable Document Format)
          3. SVG (Scalable Vector Graphics)
          4. Post (PostScript)
        2. Adjusting the resolution
    3. Summary
  12. Visualizing Online Data
    1. Typical API data formats
      1. CSV
      2. JSON
      3. XML
    2. Introducing pandas
      1. Importing online population data in the CSV format
      2. Importing online financial data in the JSON format
    3. Visualizing the trend of data
      1. Area chart and stacked area chart
    4. Introducing Seaborn
    5. Visualizing univariate distribution
      1. Bar chart in Seaborn
      2. Histogram and distribution fitting in Seaborn
    6. Visualizing a bivariate distribution
      1. Scatter plot in Seaborn
    7. Visualizing categorical data
      1. Categorical scatter plot
      2. Strip plot and swarm plot
      3. Box plot and violin plot
    8. Controlling Seaborn figure aesthetics
      1. Preset themes
      2. Removing spines from the figure
      3. Changing the size of the figure
      4. Fine-tuning the style of the figure
      5. More about colors
      6. Color scheme and color palettes
    9. Summary
  13. Visualizing Multivariate Data
    1. Getting End-of-Day (EOD) stock data from Quandl
      1. Grouping the companies by industry
      2. Converting the date to a supported format
      3. Getting the percentage change of the closing price
    2. Two-dimensional faceted plots
      1. Factor plot in Seaborn
      2. Faceted grid in Seaborn
      3. Pair plot in Seaborn
    3. Other two-dimensional multivariate plots
      1. Heatmap in Seaborn
      2. Candlestick plot in matplotlib.finance
        1. Visualizing various stock market indicators
      3. Building a comprehensive stock chart
    4. Three-dimensional (3D) plots
      1. 3D scatter plot
      2. 3D bar chart
      3. Caveats of Matplotlib 3D
    5. Summary
  14. Adding Interactivity and Animating Plots
    1. Scraping information from websites
    2. Non-interactive backends
    3. Interactive backends
      1. Tkinter-based backend
      2. Interactive backend for Jupyter Notebook
      3. Plot.ly-based backend
    4. Creating animated plots
      1. Installation of FFmpeg
      2. Creating animations
    5. Summary
  15. Selecting Subsets of Data
    1. Selecting Series data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    2. Selecting DataFrame rows
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Selecting DataFrame rows and columns simultaneously
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Selecting data with both integers and labels
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Speeding up scalar selection
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Slicing rows lazily
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    7. Slicing lexicographically
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
  16. Boolean Indexing
    1. Calculating boolean statistics
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    2. Constructing multiple boolean conditions
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Filtering with boolean indexing
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Replicating boolean indexing with index selection
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Selecting with unique and sorted indexes
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    6. Gaining perspective on stock prices
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    7. Translating SQL WHERE clauses
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    8. Determining the normality of stock market returns
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    9. Improving readability of boolean indexing with the query method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    10. Preserving Series with the where method
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    11. Masking DataFrame rows
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    12. Selecting with booleans, integer location, and labels
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
  17. Index Alignment
    1. Examining the Index object
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    2. Producing Cartesian products
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    3. Exploding indexes
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Filling values with unequal indexes
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Appending columns from different DataFrames
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    6. Highlighting the maximum value from each column
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    7. Replicating idxmax with method chaining
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    8. Finding the most common maximum
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
  18. Grouping for Aggregation, Filtration, and Transformation
    1. Defining an aggregation
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    2. Grouping and aggregating with multiple columns and functions
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    3. Removing the MultiIndex after grouping
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    4. Customizing an aggregation function
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    5. Customizing aggregating functions with *args and **kwargs
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    6. Examining the groupby object
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    7. Filtering for states with a minority majority
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    8. Transforming through a weight loss bet
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    9. Calculating weighted mean SAT scores per state with apply
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    10. Grouping by continuous variables
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    11. Counting the total number of flights between cities
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    12. Finding the longest streak of on-time flights
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
  19. Restructuring Data into a Tidy Form
    1. Tidying variable values as column names with stack
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    2. Tidying variable values as column names with melt
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    3. Stacking multiple groups of variables simultaneously
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Inverting stacked data
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Unstacking after a groupby aggregation
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    6. Replicating pivot_table with a groupby aggregation
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    7. Renaming axis levels for easy reshaping
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    8. Tidying when multiple variables are stored as column names
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    9. Tidying when multiple variables are stored as column values
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    10. Tidying when two or more values are stored in the same cell
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    11. Tidying when variables are stored in column names and values
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    12. Tidying when multiple observational units are stored in the same table
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
  20. Combining Pandas Objects
    1. Appending new rows to DataFrames
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    2. Concatenating multiple DataFrames together
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
    3. Comparing President Trump's and Obama's approval ratings
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    4. Understanding the differences between concat, join, and merge
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
    5. Connecting to SQL databases
      1. Getting ready
      2. How to do it...
      3. How it works...
      4. There's more...
      5. See also
  21. Other Books You May Enjoy
    1. Leave a review - let other readers know what you think