Preface

People often associate forecasting with financial data. In reality, forecasting is used in many industries, leveraging historical data to make future predictions. More specifically, this book is about time series analysis, a process to gain better insight from historical data, capture trends and cyclical patterns, and build a suitable forecasting model.

When you work with data whose observations change over time and are recorded at specific intervals, you are dealing with time series data. You will find time series data in many domains, and the discipline of time series analysis covers various use cases. For example, time series analysis is used in science (forecasting weather, earthquakes, air quality, or species growth), finance (forecasting stock return, budget, sales, or volatility), government (forecasting inflation, unemployment rates, GDP, or population birth rate), medicine (tracking infectious disease transmission, monitoring electrocardiogram or blood glucose, or forecasting healthcare costs), engineering (predictive maintenance, production decline analysis, or traffic volume forecasting), business (inventory management, product demand planning, or resource planning), and much more. In short, time series data is all around us, and you will almost certainly encounter it.
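As a minimal sketch of what such data looks like in Python, the following builds a small series of observations recorded at a fixed daily interval. The variable names and temperature values are made up for illustration and are not taken from the book's datasets.

```python
import pandas as pd

# Hypothetical example: seven daily temperature readings (values are made up)
index = pd.date_range(start="2022-01-01", periods=7, freq="D")
temperature = pd.Series(
    [21.5, 22.1, 19.8, 20.4, 23.0, 22.7, 21.9],
    index=index,
    name="temperature_c",
)

# The DatetimeIndex records the interval at which observations were taken
print(temperature.index.freqstr)

# A time-aware slice: select observations between two dates (inclusive)
print(temperature.loc["2022-01-03":"2022-01-05"])
```

The key point is the index: each observation is tied to a timestamp, and the timestamps follow a regular frequency.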

By picking up this book, you are looking for practical recipes that you can apply and use, with less focus on theory and more on practice. The book will take you through the complete journey of time series analysis, covering the end-to-end process, including acquiring and ingesting various types of time series data, exploring the data, transforming and manipulating the data, and training models to use in forecasting.

The book covers commonly used concepts, techniques, and algorithms, as well as more advanced and recent approaches with practical use. For example, you will learn to train and validate different models covering statistical methods, machine learning algorithms, and various deep learning architectures for forecasting and outlier (or anomaly) detection. Most importantly, the variety of datasets used in this book will give you a better insight into how these different models work and how you can pick the most appropriate approach to solve your specific problem.

Who this book is for

This book is for data analysts, business analysts, data scientists, data engineers, or Python developers who want practical Python recipes for time series analysis and forecasting techniques. Fundamental knowledge of Python programming is required. Although having a basic math and statistics background will be beneficial, it is not necessary. Prior experience working with time series data to solve business problems will also help you to better utilize and apply the different recipes in this book.

What this book covers

Chapter 1, Getting Started with Time Series Analysis, is a general introduction to Python development best practices. You will learn different techniques to create and manage virtual environments, install and manage Python packages, manage dependencies, and finally, how to install and manage Jupyter extensions.

Chapter 2, Reading Time Series Data from Files, is an introduction to time series data. This chapter shows you how to read data from various commonly used file types, whether stored locally or in the cloud. The recipes will highlight advanced options for ingesting, preparing, and transforming data into a time series DataFrame for later analysis.

Chapter 3, Reading Time Series Data from Databases, picks up from Chapter 2, Reading Time Series Data from Files, and focuses on reading data from various database systems, including relational (PostgreSQL and MySQL) and non-relational (MongoDB and InfluxDB), whether on-premises or a cloud service (Amazon Redshift and Snowflake). The recipes will highlight different methods and techniques to offer flexibility on how data can be ingested, prepared, and transformed into a time series DataFrame for later analysis.

Chapter 4, Persisting Time Series Data to Files, covers different options and use cases to store time series data for later retrieval. The techniques will cover various methods and file types, whether on-premises or in the cloud. In addition, this chapter covers serialization, compression, overwriting, or appending to files.

Chapter 5, Persisting Time Series Data to Databases, builds on Chapter 4, Persisting Time Series Data to Files, focusing on writing data for scale. This covers different techniques for writing data to relational and non-relational database systems like those discussed in Chapter 3, Reading Time Series Data from Databases, including on-premises and cloud services.

Chapter 6, Working with Date and Time in Python, takes a practical and intuitive approach to an intimidating topic. You will learn how to deal with the complexity of dates and time in your time series data. The chapter illustrates practical use cases for handling time zones, custom holidays, and business days, and for working with Unix epoch and UTC. This typically intimidating topic is presented in a fun and practical way that you will find helpful to apply right away.

Chapter 7, Handling Missing Data, explores different methods for identifying and handling missing data. You will learn different imputation and interpolation techniques. The chapter starts with simple statistical methods for univariate imputation and then explores various univariate interpolation algorithms and more advanced multivariate imputation.

Chapter 8, Outlier Detection Using Statistical Methods, covers statistical methods for outlier and anomaly detection. These practical yet straightforward techniques are easy to interpret and implement. The chapter uses data from the Numenta Anomaly Benchmark (NAB) to evaluate different anomaly detection algorithms.

Chapter 9, Exploratory Data Analysis and Diagnosis, dives into visualization techniques for effective Exploratory Data Analysis (EDA) with interactive visualizations. You will learn how to investigate and diagnose your time series data to test for specific assumptions such as stationarity and autocorrelation. Finally, the chapter covers practical recipes for transforming your time series data using a family of power transforms, decomposition, and differencing methods.

Chapter 10, Building Univariate Time Series Models Using Statistical Methods, kicks off the journey into modeling and forecasting time series. The chapter intuitively explains what autocorrelation function (ACF) and partial autocorrelation function (PACF) plots are and how they are used, and then moves on to training, diagnosing, and comparing different models, including exponential smoothing, autoregressive integrated moving average (ARIMA), and seasonal ARIMA (SARIMA). Additionally, this chapter introduces grid search and hyperparameter tuning.

Chapter 11, Additional Statistical Modeling Techniques for Time Series, picks up from Chapter 10, Building Univariate Time Series Models Using Statistical Methods, diving into more advanced and practical models, such as vector autoregressive (VAR) for multivariate time series, generalized autoregressive conditional heteroskedasticity (GARCH) for forecasting volatility, and an introduction to the Prophet algorithm and library.

Chapter 12, Forecasting Using Supervised Machine Learning, will take you from classical time series forecasting techniques to more advanced machine learning algorithms. The chapter shows how time series data can be transformed appropriately to be suitable for supervised machine learning. In addition, you will explore a variety of machine learning algorithms and implement multi-step forecasting, using both scikit-learn and sktime.

Chapter 13, Deep Learning for Time Series Forecasting, covers more advanced deep learning architectures using TensorFlow/Keras and PyTorch. The chapter starts with a high-level API (Keras) and then dives into more complex implementations, using a lower-level API (PyTorch).

Chapter 14, Outlier Detection Using Unsupervised Machine Learning, continues from Chapter 8, Outlier Detection Using Statistical Methods, but focuses on more advanced unsupervised machine learning methods. You will use the same datasets from the NAB to allow you to compare statistical and machine learning techniques using the same benchmark data. The techniques cover a variety of machine learning algorithms.

Chapter 15, Advanced Techniques for Complex Time Series, will introduce more complex time series data that contains multiple seasonal patterns. The chapter includes how such time series data can be decomposed and explores different modeling techniques, including state-space models.

To get the most out of this book

You should be comfortable coding in Python, with some familiarity with Matplotlib, NumPy, and pandas. The book covers a wide variety of libraries, and the first chapter will show you how to create different virtual environments for Python development. Working knowledge of the Python programming language will assist with understanding the key concepts covered in this book. It is recommended, but not required, to install either Anaconda, Miniconda, or Miniforge. Throughout the chapters, you will see instructions using either pip or Conda.

Alternatively, you can use Colab, and all you need is a browser.

Software/hardware covered in the book | Operating system requirements

Python 3.8/3.9+ | Windows, macOS, or Linux

JupyterLab or the Jupyter Notebook | Windows, macOS, or Linux
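Before starting, you may want to confirm that your environment meets these requirements. The following is a minimal sketch, not part of the book's code: the check_environment helper and the package list are assumptions chosen for illustration, based on the Python version and the libraries mentioned above.

```python
import sys
from importlib.util import find_spec

# Assumed minimum version and packages, based on the requirements listed above
MIN_PYTHON = (3, 8)
PACKAGES = ["numpy", "pandas", "matplotlib"]


def check_environment(min_python=MIN_PYTHON, packages=PACKAGES):
    """Return (python_ok, missing), where missing lists packages not found."""
    python_ok = sys.version_info[:2] >= min_python
    # find_spec looks a top-level package up without importing it
    missing = [name for name in packages if find_spec(name) is None]
    return python_ok, missing


python_ok, missing = check_environment()
print(f"Python {sys.version_info.major}.{sys.version_info.minor} OK: {python_ok}")
print(f"Missing packages: {missing if missing else 'none'}")
```

Any package reported as missing can then be installed with pip or Conda, depending on how you manage your environment.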

In Chapter 3, Reading Time Series Data from Databases, and Chapter 5, Persisting Time Series Data to Databases, you will be working with different databases, including PostgreSQL, MySQL, InfluxDB, and MongoDB. If you do not have access to such databases, you can install them locally on your machine, or use Docker and download the appropriate image from Docker Hub (https://hub.docker.com) using docker pull – for example, docker pull influxdb to download InfluxDB. You can download Docker from the official page here: https://docs.docker.com/get-docker/.

Alternatively, you can explore hosted services such as Aiven https://aiven.io, which offers a 30-day trial and supports PostgreSQL, MySQL, and InfluxDB. For the recipes using AWS Redshift and Snowflake, you will need to have a subscription. You can subscribe to the AWS free tier here: https://aws.amazon.com/free. You can subscribe for a 30-day Snowflake trial here: https://signup.snowflake.com.

Similarly, in Chapter 2, Reading Time Series Data from Files, and Chapter 4, Persisting Time Series Data to Files, you will learn how to read and write data to AWS S3 buckets. This will require an AWS service subscription and should be covered under the free tier. For a list of all services covered under the free tier, you can visit the official page here: https://aws.amazon.com/free.

If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.

To get the most value out of this book, it is important that you continue to experiment with the recipes further using different time series data. Throughout the recipes, you will see a recurring theme in which multiple time series datasets are used. This is done deliberately so that you can observe how the results vary on different data. You are encouraged to continue with that theme on your own.

If you are looking for additional datasets, in addition to those provided in the GitHub repository, you can check out some of the following links:

Download the example code files

You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Time-Series-Analysis-with-Python-Cookbook. If there's an update to the code, it will be updated in the GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Code in Action

The Code in Action videos for this book can be viewed at https://bit.ly/3xDwOG1.

Conventions used

There are a number of text conventions used throughout this book.

Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

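import pandas as pd
from pathlib import Path

file = Path("../../datasets/Ch8/nyc_taxi.csv")
# Use the 'timestamp' column as a parsed DatetimeIndex
nyc_taxi = pd.read_csv(file,
                       index_col='timestamp',
                       parse_dates=True)
# 30-minute observations; '30min' replaces the deprecated '30T' alias
nyc_taxi.index.freq = '30min'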

Any command-line input or output is written as follows:

conda install -c conda-forge pyod

Bold: Indicates a new term, an important word, or words that you see on screen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "Select System info from the Administration panel."

Tips or Important Notes

Appear like this.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).

To give clear instructions on how to complete a recipe, use these sections as follows.

Getting ready

This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make you more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Share Your Thoughts

Once you've read Time Series Analysis with Python Cookbook, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
