In this section, we provide brief descriptions of the various submodules and files that make up the pandas library.
The pandas/core module contains the core submodules of pandas. They are discussed as follows:
api.py
: This imports some key modules for later use.
array.py
: This isolates pandas' exposure to NumPy, that is, all direct NumPy usage.
base.py
: This defines fundamental classes such as StringMixin; PandasObject, which is the base class for various pandas objects such as sparse.array.SparseArray/SparseList, internals.Block, internals.BlockManager, generic.NDFrame, groupby.GroupBy, base.FrozenList, base.FrozenNDArray, io.sql.PandasSQL, io.sql.PandasSQLTable, and tseries.period.Period; FrozenList; FrozenNDArray; IndexOpsMixin; and DatetimeIndexOpsMixin.
common.py
: This defines common utility methods for handling data structures. For example, the isnull function detects missing values.
config.py
: This is the module for handling package-wide configurable objects. It defines the following classes: OptionError, DictWrapper, CallableDynamicDoc, option_context, and config_init.
datetools.py
: This is a collection of functions that deal with dates in Python.
frame.py
: This defines pandas' DataFrame class and its various methods. DataFrame inherits from NDFrame (see below).
generic.py
: This defines the generic NDFrame base class, which is a base class for pandas' DataFrame, Series, and Panel classes. NDFrame is derived from PandasObject, which is defined in base.py. An NDFrame can be regarded as an N-dimensional version of a pandas DataFrame. For more information on this, go to http://nullege.com/codes/search/pandas.core.generic.NDFrame.
categorical.py
: This defines Categorical, which is a class that derives from PandasObject and represents categorical variables a la R/S-plus (we will cover this in a bit more detail later).
format.py
: This defines a whole host of Formatter classes such as CategoricalFormatter, SeriesFormatter, TableFormatter, DataFrameFormatter, HTMLFormatter, CSVFormatter, ExcelCell, ExcelFormatter, GenericArrayFormatter, FloatArrayFormatter, IntArrayFormatter, Datetime64Formatter, Timedelta64Formatter, and EngFormatter.
groupby.py
: This defines various classes that enable the groupby functionality.
ops.py
: This defines an internal API for arithmetic operations on PandasObjects. It defines functions that add arithmetic methods to objects. It defines a _create_methods meta method, which is used to create other methods using arithmetic, comparison, and Boolean method constructors. The add_methods method takes a list of new methods, adds them to the existing list of methods, and binds them to their appropriate classes. The add_special_arithmetic_methods and add_flex_arithmetic_methods methods call _create_methods and add_methods to add arithmetic methods to a class. It also defines the _TimeOp class, which is a wrapper for datetime-related arithmetic operations. It contains wrapper functions for arithmetic, comparison, and Boolean operations on Series, DataFrame, and Panel objects: _arith_method_SERIES(..), _comp_method_SERIES(..), _bool_method_SERIES(..), _flex_method_SERIES(..), _arith_method_FRAME(..), _comp_method_FRAME(..), _flex_comp_method_FRAME(..), _arith_method_PANEL(..), and _comp_method_PANEL(..).
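To illustrate the difference between the special (operator) methods and the flex methods that ops.py attaches to pandas objects, the following minimal sketch compares the + operator with add(..) on two small Series. The data and variable names are made up for illustration, and the example goes through the public pandas API rather than the internal helpers named above.

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0], index=["a", "b"])

# Special method: the + operator aligns on the index, and a label that is
# missing on either side produces NaN in the result.
operator_sum = s1 + s2

# Flex method: add(..) performs the same aligned addition but accepts
# fill_value, which stands in for a label missing on one side.
flex_sum = s1.add(s2, fill_value=0)

print(operator_sum)
print(flex_sum)
```

The flex form is useful when a missing label should be treated as a neutral value instead of propagating NaN.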
index.py
: This defines the Index class and its related functionality. Index is used by all pandas objects (Series, DataFrame, and Panel) to store axis labels. Underneath, it is an immutable array that provides an ordered set that can be sliced.
internals.py
: This defines multiple object classes. These are listed as follows:
Block
: This is a homogeneously typed N-dimensional numpy.ndarray object with additional functionality for pandas. For example, it uses __slots__ to restrict the attributes of the object to 'ndim', 'values', and '_mgr_locs'. It acts as the base class for other Block subclasses.
NumericBlock
: This is the base class for Blocks with the numeric type.
FloatOrComplexBlock
: This is the base class for FloatBlock and ComplexBlock; it inherits from NumericBlock.
ComplexBlock
: This is the class that handles the Block objects with the complex type.
FloatBlock
: This is the class that handles the Block objects with the float type.
IntBlock
: This is the class that handles the Block objects with the integer type.
TimeDeltaBlock, BoolBlock, and DatetimeBlock
: These are the Block classes for timedelta, Boolean, and datetime.
ObjectBlock
: This is the class that handles Block objects for user-defined objects.
SparseBlock
: This is the class that handles sparse arrays of the same type.
BlockManager
: This is the class that manages a set of Block objects. It is not a public API class.
SingleBlockManager
: This is the class that manages a single Block.
JoinUnit
: This is the utility class for Block objects.
matrix.py
: This imports DataFrame as DataMatrix.
nanops.py
: These are the classes and functionality for handling NaN values.
ops.py
: This defines arithmetic operations for pandas objects. It is not a public API.
panel.py, panel4d.py, and panelnd.py
: These provide the functionality for the pandas Panel object.
series.py
: This defines the pandas Series class and its various methods. Series inherits from NDFrame and IndexOpsMixin.
sparse.py
: This defines imports for handling sparse data structures. Sparse data structures are compressed, whereby data points matching NaN or missing values are omitted. For more information on this, go to http://pandas.pydata.org/pandas-docs/stable/sparse.html.
strings.py
: This provides various functions for handling strings.
The pandas/io module contains various modules for data I/O. These are discussed as follows:
api.py
: This defines various imports for the data I/O API.
auth.py
: This defines various methods dealing with authentication.
common.py
: This defines the common functionality for the I/O API.
data.py
: This defines classes and methods for handling data. The DataReader method reads data from various online sources such as Yahoo and Google.
date_converters.py
: This defines date conversion functions.
excel.py
: This module parses and converts Excel data. It defines the ExcelFile and ExcelWriter classes.
ga.py
: This is the module for the Google Analytics functionality.
gbq.py
: This is the module for Google's BigQuery.
html.py
: This is the module for dealing with HTML I/O.
json.py
: This is the module for dealing with JSON I/O in pandas. It defines the Writer, SeriesWriter, FrameWriter, Parser, SeriesParser, and FrameParser classes.
packer.py
: This provides msgpack serializer support for reading and writing pandas data structures to disk.
parsers.py
: This is the module that defines various functions and classes that are used in parsing and processing files to create pandas DataFrames. All three read_* functions discussed as follows have multiple configurable options for reading. See this reference for more details: http://bit.ly/1e4Xqo1.
read_csv(..)
: This defines the pandas.read_csv() function, which reads the contents of a CSV file into a DataFrame.
read_table(..)
: This reads a tab-separated table file into a DataFrame.
read_fwf(..)
: This reads a fixed-width format file into a DataFrame.
TextFileReader
: This is the class that is used for reading text files.
ParserBase
: This is the base class for parser objects.
CParserWrapper, PythonParser
: These are the parsers for C and Python respectively. They both inherit from ParserBase.
FixedWidthReader
: This is the class for reading fixed-width data. A fixed-width data file contains fields in specific positions within the file.
FixedWidthFieldParser
: This is the class for parsing fixed-width fields; it inherits from PythonParser.
pickle.py
: This provides methods for pickling (serializing) pandas objects.
pytables.py
: This is an interface to the PyTables module for reading and writing pandas data structures to files on disk.
sql.py
: This is a collection of classes and functions used to enable the retrieval of data from relational databases that attempts to be database agnostic. These are discussed as follows:
PandasSQL
: This is the base class for interfacing pandas with SQL. It provides dummy read_sql and to_sql methods that must be implemented by subclasses.
PandasSQLAlchemy
: This is the subclass of PandasSQL that enables conversions between DataFrame and SQL databases using SQLAlchemy.
PandasSQLTable class
: This maps pandas tables (DataFrame) to SQL tables.
pandasSQL_builder(..)
: This returns the correct PandasSQL subclass based on the provided parameters.
PandasSQLTableLegacy class
: This is the legacy support version of PandasSQLTable.
PandasSQLLegacy class
: This is the legacy support version of PandasSQL.
get_schema(..)
: This gets the SQL database table schema for a given frame.
read_sql_table(..)
: This reads a SQL database table into a DataFrame.
read_sql_query(..)
: This reads a SQL query into a DataFrame.
read_sql(..)
: This reads a SQL query or table into a DataFrame.
to_sql(..)
: This writes records that are stored in a DataFrame to a SQL database.
stata.py
: This contains tools for processing Stata files into pandas DataFrames.
wb.py
: This is the module for downloading data from the World Bank's website.
The pandas/tools module contains the following files:
util.py
: This defines miscellaneous utility functions such as match(..), cartesian_product(..), and compose(..).
tile.py
: This has a set of functions that enable quantization of input data and hence tile functionality. Most of the functions are internal, except for cut(..) and qcut(..).
rplot.py
: This is the module that provides the functionality to generate trellis plots in pandas.
plotting.py
: This provides a set of plotting functions that take a Series or DataFrame as an argument.
scatter_matrix(..)
: This draws a matrix of scatter plots.
andrews_curves(..)
: This plots multivariate data as curves that are created using samples as coefficients for a Fourier series.
parallel_coordinates(..)
: This is a plotting technique that allows you to see clusters in data and visually estimate statistics.
lag_plot(..)
: This is used to check whether a dataset or a time series is random.
autocorrelation_plot(..)
: This is used for checking randomness in a time series.
bootstrap_plot(..)
: This plot is used to determine the uncertainty of a statistical measure such as mean or median in a visual manner.
radviz(..)
: This plot is used to visualize multivariate data.
The preceding information is from http://pandas.pydata.org/pandas-docs/stable/visualization.html.
pivot.py
: This module is for handling pivot tables in pandas. Its main function is pandas.tools.pivot.pivot_table(..), which creates a spreadsheet-like pivot table as a DataFrame.
The preceding information is from http://pandas.pydata.org/pandas-docs/stable/reshaping.html.
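As a quick illustration of the spreadsheet-like result that pivot_table(..) produces, here is a minimal sketch. The sales data and the column names are invented for the example, and the function is called through the top-level pandas namespace, which we assume is available in the installed version.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["East", "East", "West", "West"],
    "product": ["A", "B", "A", "B"],
    "units": [10, 20, 30, 40],
})

# Rows are regions, columns are products, and each cell sums the units
# for that (region, product) pair.
table = pd.pivot_table(sales, values="units", index="region",
                       columns="product", aggfunc="sum")
print(table)
```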
merge.py
: This provides functions for combining Series, DataFrame, and Panel objects, such as merge(..) and concat(..).
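The two functions named for merge.py combine objects in different ways, which the following minimal sketch tries to show. The frames and the key column are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "y": [20, 30, 40]})

# merge(..) performs a database-style join. The default is an inner join,
# so only the keys present in both frames ("b" and "c") survive.
joined = pd.merge(left, right, on="key")

# concat(..) stacks objects along an axis; ignore_index renumbers the rows.
stacked = pd.concat([left, left], ignore_index=True)

print(joined)
print(stacked)
```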
describe.py
: This provides a single value_range(..) function that returns the maximum and minimum of a DataFrame as a Series.
The pandas/sparse module provides sparse implementations of Series, DataFrame, and Panel. By sparse, we mean arrays where values such as missing or NA are omitted rather than kept as 0.
For more information on this, go to http://pandas.pydata.org/pandas-docs/version/stable/sparse.html.
api.py
: It is a set of convenience imports.
array.py
: It is an implementation of the SparseArray data structure.
frame.py
: It is an implementation of the SparseDataFrame data structure.
list.py
: It is an implementation of the SparseList data structure.
panel.py
: It is an implementation of the SparsePanel data structure.
series.py
: It is an implementation of the SparseSeries data structure.
The pandas/stats module contains the following files:
api.py
: This is a set of convenience imports.
common.py
: This defines internal functions called by other functions in the module.
fama_macbeth.py
: This contains class definitions and functions for the Fama-MacBeth regression. For more information on FM regression, go to http://en.wikipedia.org/wiki/Fama-MacBeth_regression.
interface.py
: It defines ols(..), which returns an Ordinary Least Squares (OLS) regression object. It imports from the pandas.stats.ols module.
math.py
: This defines some useful functions.
misc.py
: This is used for miscellaneous functions.
moments.py
: This provides rolling and expanding statistical measures, including moments, that are implemented in Cython. These methods include rolling_count(..), rolling_cov(..), rolling_corr(..), rolling_corr_pairwise(..), rolling_quantile(..), rolling_apply(..), rolling_window(..), expanding_count(..), expanding_quantile(..), expanding_cov(..), expanding_corr(..), expanding_corr_pairwise(..), expanding_apply(..), ewma(..), ewmvar(..), ewmstd(..), ewmcov(..), and ewmcorr(..).
ols.py
: This implements OLS and provides the OLS and MovingOLS classes. OLS runs a full-sample ordinary least squares regression, whereas MovingOLS generates a rolling or an expanding simple OLS.
plm.py
: This provides linear regression objects for Panel data.
var.py
: This provides vector autoregression classes, discussed as follows:
VAR
: This is the vector autoregression on multivariate data in Series and DataFrames.
PanelVAR
: This is the vector autoregression on multivariate data in Panel objects.
For more information on vector autoregression, go to http://en.wikipedia.org/wiki/Vector_autoregression.
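To make the rolling, expanding, and exponentially weighted measures listed under moments.py more concrete, here is a minimal sketch. It assumes a more recent pandas release, in which these measures are reached through the rolling(..), expanding(..), and ewm(..) methods of a Series rather than through the module-level functions named above. The sample data is made up.

```python
import pandas as pd

prices = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Rolling measure: mean over a sliding window of three observations.
# The first two entries are NaN because the window is not yet full.
rolling_mean = prices.rolling(window=3).mean()

# Expanding measure: mean over all observations seen so far.
expanding_mean = prices.expanding().mean()

# Exponentially weighted mean, the method-style counterpart of ewma(..).
ewm_mean = prices.ewm(span=3).mean()

print(rolling_mean)
print(expanding_mean)
print(ewm_mean)
```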
The pandas/util module contains the following files:
testing.py
: This provides the assertion, debug, unit test, and other classes/functions for use in testing. It contains many special assert functions that make it easier to check whether Series, DataFrame, or Panel objects are equivalent. Some of these functions include assert_equal(..), assert_series_equal(..), assert_frame_equal(..), and assert_panelnd_equal(..). The pandas.util.testing module is especially useful to the contributors of the pandas code base. It defines a util.TestCase class. It also provides utilities for handling locales, console debugging, file cleanup, comparators, and so on for testing by potential code base contributors.
terminal.py
: This module is mostly internal and has to do with obtaining certain specific details about the terminal. The single exposed function is get_terminal_size().
print_versions.py
: This defines the get_sys_info() function, which returns a dictionary of system information, and the show_versions(..) function, which displays the versions of available Python libraries.
misc.py
: This defines a couple of miscellaneous utilities.
decorators.py
: This defines some decorator functions and classes. The Substitution and Appender classes are decorators that perform substitution and appending on function docstrings. For more information on Python decorators, go to http://bit.ly/1zj8U0o.
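The idea of a decorator that appends to a docstring can be sketched with the standard library alone. The DocAppender class below is a hypothetical, simplified stand-in written for this illustration; it is not the pandas Appender class, and its name and parameters are assumptions.

```python
class DocAppender:
    """Decorator that appends text to the docstring of the wrapped function.

    Hypothetical helper for illustration only; not part of pandas.
    """

    def __init__(self, addendum, join=""):
        self.addendum = addendum
        self.join = join

    def __call__(self, func):
        # Keep the function itself unchanged and only extend its docstring.
        original = func.__doc__ or ""
        func.__doc__ = self.join.join([original, self.addendum])
        return func


@DocAppender("See also: a shared note added by the decorator.", join="\n\n")
def mean(values):
    """Return the arithmetic mean of values."""
    return sum(values) / len(values)


print(mean.__doc__)
```

A decorator of this kind lets several functions share one piece of documentation text without repeating it in every docstring.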
clipboard.py
: This contains cross-platform clipboard methods to enable the copy and paste functions from the keyboard. The pandas I/O API includes functions such as pandas.read_clipboard() and DataFrame.to_clipboard(..).
The pandas/rpy module attempts to provide an interface to the R statistical package if it is installed on the machine. It is deprecated in version 0.16.0 and later. Its functionality is replaced by the rpy2 module, which can be accessed from http://rpy.sourceforge.net.
The pandas/tests module provides many tests for various objects in pandas. The names of the specific library files are fairly self-explanatory, and I will not go into further details here, except to invite the reader to explore them.
The pandas/compat module provides functionality related to compatibility, explained as follows:
chainmap.py, chainmap_impl.py
: This provides a ChainMap class that can group multiple dicts or mappings in order to produce a single view that can be updated.
pickle_compat.py
: This provides functionality for pickling pandas objects in versions earlier than 0.12.
openpyxl_compat.py
: This checks the compatibility of openpyxl.
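The behavior of a ChainMap, as described for chainmap.py, can be tried out with the version in the Python standard library. The sketch below assumes Python 3.3 or later, where collections.ChainMap is available; the dictionaries are made up for illustration.

```python
from collections import ChainMap

defaults = {"color": "blue", "size": "medium"}
overrides = {"size": "large"}

# Lookups search the mappings from left to right, so overrides wins
# for "size" while "color" falls through to defaults.
settings = ChainMap(overrides, defaults)
print(settings["size"])
print(settings["color"])

# Updates go to the first mapping only; the later mappings are untouched.
settings["color"] = "red"
print(overrides)
print(defaults["color"])
```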
The pandas/computation module provides functionality for computation and is discussed as follows:
api.py
: This contains imports for eval and expr.
align.py
: This implements functions for data alignment.
common.py
: This contains a couple of internal functions.
engines.py
: This defines AbstractEngine, NumExprEngine, and PythonEngine. PythonEngine evaluates an expression and is used mainly for testing purposes.
eval.py
: This defines the all-important eval(..) function and also a few other important functions.
expressions.py
: This provides fast expression evaluation through numexpr. The numexpr library is used to accelerate certain numerical operations. It uses multiple cores as well as smart chunking and caching speedups. This module defines the evaluate(..) and where(..) methods.
ops.py
: This defines the operator classes used by eval. These are Term, Constant, Op, BinOp, Div, and UnaryOp.
pytables.py
: This provides a query interface for PyTables queries.
scope.py
: This is a module for scope operations. It defines a Scope class, which is an object to hold scope.
For more information on numexpr, go to https://code.google.com/p/numexpr/. For information on the usage of this module, go to http://pandas.pydata.org/pandas-docs/stable/computation.html.
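As a small illustration of the eval(..) function defined in eval.py, the following sketch evaluates two expression strings. The frame and its column names are invented for the example; whether numexpr or plain Python does the work depends on what is installed.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# The top-level eval(..) takes an expression string; the name df is
# resolved from the calling scope.
total = pd.eval("df.a + df.b")

# DataFrame.eval(..) resolves bare column names against the frame itself.
scaled = df.eval("a * 2 + b")

print(total)
print(scaled)
```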
The pandas/tseries module contains the following files:
api.py
: This is a set of convenience imports.
converter.py
: This defines a set of classes that are used to format and convert datetime-related objects. Upon import, pandas registers a set of unit converters with matplotlib through the register() function, as shown in the following session:
In [1]: import matplotlib.units as munits
In [2]: munits.registry
Out[2]: {}
In [3]: import pandas
In [4]: munits.registry
Out[4]:
{pandas.tslib.Timestamp: <pandas.tseries.converter.DatetimeConverter instance at 0x7fbbc4db17e8>,
 pandas.tseries.period.Period: <pandas.tseries.converter.PeriodConverter instance at 0x7fbbc4dc25f0>,
 datetime.date: <pandas.tseries.converter.DatetimeConverter instance at 0x7fbbc4dc2fc8>,
 datetime.datetime: <pandas.tseries.converter.DatetimeConverter instance at 0x7fbbc4dc2a70>,
 datetime.time: <pandas.tseries.converter.TimeConverter instance at 0x7fbbc4d61e18>}
Converter
: This group of classes includes TimeConverter, PeriodConverter, and DatetimeConverter.
Formatters
: This group of classes includes TimeFormatter, PandasAutoDateFormatter, and TimeSeries_DateFormatter.
Locators
: This group of classes includes PandasAutoDateLocator, MilliSecondLocator, and TimeSeries_DateLocator.
frequencies.py
: This defines the code for specifying frequencies (daily, weekly, quarterly, monthly, annual, and so on) of time series objects.
holiday.py
: This defines functions and classes for handling holidays. Holiday, AbstractHolidayCalendar, and USFederalHolidayCalendar are among the classes defined.
index.py
: This defines the DatetimeIndex class.
interval.py
: This defines the Interval, PeriodInterval, and IntervalIndex classes.
offsets.py
: This defines various classes, including Offsets, that deal with time-related periods. These are explained as follows:
DateOffset
: This is an interface for classes that provide the time period functionality such as Week, WeekOfMonth, LastWeekOfMonth, QuarterOffset, YearOffset, Easter, FY5253, and FY5253Quarter.
BusinessMixin
: This is the mixin class for business objects to provide functionality with time-related classes. It is inherited by the BusinessDay class. The BusinessDay subclass is derived from BusinessMixin and SingleConstructorOffset and provides an offset in business days.
MonthOffset
: This is the interface for classes that provide the functionality for month time periods such as MonthEnd, MonthBegin, BusinessMonthEnd, and BusinessMonthBegin.
MonthEnd and MonthBegin
: These are the date offsets of one month at the end or the beginning of a month.
BusinessMonthEnd and BusinessMonthBegin
: These are the date offsets of one month at the end or the beginning of a business day calendar.
YearOffset
: This offset is subclassed by classes that provide year period functionality: YearEnd, YearBegin, BYearEnd, and BYearBegin.
YearEnd and YearBegin
: These are the date offsets of one year at the end or the beginning of a year.
BYearEnd and BYearBegin
: These are the date offsets of one year at the end or the beginning of a business day calendar.
Week
: This provides the offset of 1 week.
WeekDay
: This provides a mapping from a weekday (TUE) to its day-of-week number (1).
WeekOfMonth and LastWeekOfMonth
: These describe dates in a week of a month.
QuarterOffset
: This is subclassed by classes that provide quarterly period functionality: QuarterEnd, QuarterBegin, BQuarterEnd, and BQuarterBegin.
QuarterEnd, QuarterBegin, BQuarterEnd, and BQuarterBegin
: These are the same as the Year* classes, except that the period is a quarter instead of a year.
FY5253, FY5253Quarter
: These classes describe a 52-53 week fiscal year. This is also known as a 4-4-5 calendar. You can get more information on this at http://en.wikipedia.org/wiki/4–4–5_calendar.
Easter
: This is the DateOffset for the Easter holiday.
Tick
: This is the base class for time unit classes such as Day, Hour, Minute, Second, Milli, Micro, and Nano.
period.py
: This defines the Period and PeriodIndex classes for pandas TimeSeries.
plotting.py
: This defines various plotting functions such as tsplot(..), which plots a Series.
resample.py
: This defines TimeGrouper, a custom groupby class for time-interval grouping.
timedeltas.py
: This defines the to_timedelta(..) function, which converts its argument into a timedelta object.
tools.py
: This defines utility functions such as to_datetime(..), parse_time_string(..), dateutil_parse(..), and format(..).
util.py
: This defines more utility functions as follows: