Symbols
.add method
working 261
.apply method
code, inspecting 567, 569, 570, 572, 574
performance 558, 560, 561, 562
performance, improving with Dask 562, 564, 565, 567
performance, improving with Pandarallel 562, 564, 565, 567
performance, improving with Swifter 562, 564, 566, 567
working 561
.cummax method
working
.describe method 139
.groupby method
using, for DataFrames with DatetimeIndex
.idxmax method
.join method 417, 419, 421, 424, 426
.merge method 417, 419, 421, 423, 426
A
aggregation
working 289
aggregation function
customizing, with *args parameter 307, 309, 310, 311
customizing, with **kwargs parameter 307, 309, 311
grouping with 303, 304, 305, 306
area charts
stacking, to discover trends 532, 533, 535, 536, 538
arithmetic operator
method 32
axis levels
renaming 384, 385, 386, 387, 388, 389
B
Boolean array 23
Boolean arrays
working 220
Boolean indexing
readability, improving with query method 235, 236
Booleans
selecting
working
Boolean statistics
working 211
C
Cartesian product
implementing 248, 250, 251, 252
working 251
categorical and categorical values
comparing 187
categorical data
about 148, 149, 151, 152, 154, 156, 158, 159
college campus diversity
determining
column
about
creating 44
deleting 44
selecting 13, 14, 15, 16, 131, 132
selecting, with method 48, 49, 50, 52
working
column names
columns
adding, from DataFrame 265, 266, 267, 269, 271
filtering, with time data 444, 445, 446, 447, 449
column types
working 143
comparison operator
method 32
continuous columns
comparing 176, 178, 179, 181, 182, 184, 185, 186
working 183
continuous data
about 160, 161, 163, 164, 167, 169
working 165
continuous values
comparing, with categories 170, 171, 173, 174, 175
continuous variables
grouping by 340, 341, 343, 345
Cramér's V measure
reference link
crime, by weekday and year
measuring 474, 475, 476, 477, 479, 480, 481, 482, 483
CSV files
reading 86, 87, 88, 89, 90, 91, 92, 93, 96
working 94
D
Dask
.apply method, performance improving 562, 564, 565, 567
data
selecting, with integers 205, 206, 208
selecting, with labels 205, 206, 208
data analysis routine
working 118
databases
data dictionary 121
DataFrame
attributes, working 6
columns, adding 265, 266, 267, 269, 271
creating, from scratch 81, 82, 83
slicing, with DatetimeIndex 438, 439, 440, 441, 442
DataFrame columns
working 204
DataFrame method
working 64
DataFrame operation
direction, transposing 79
working 71
DataFrame rows
selecting 196, 197, 200, 201, 203, 205
DataFrames
rows, appending to 401, 402, 403, 405, 407, 408
data integrity
managing, with Great Expectations tool 578, 582, 584, 585, 588
data types
working 11
DatetimeIndex
methods, using 449, 450, 452, 454, 458, 459
used, for grouping with anonymous functions
dunder methods 32
E
Excel files
working 100
Exploratory Data Analysis (EDA) 115
about 139
categorical data 148
column types 143
continuous data 160
summary statistics 139, 140, 141
working 142
F
flights dataset
visualizing 515, 518, 519, 520, 521, 522, 524, 525, 526, 528, 529, 530, 532
working
flow programming 34
functions
aggregating 290, 292, 294, 295, 297
grouping 290, 292, 294, 295, 297
working 293
G
Great Expectations tool
about 578
data integrity, managing 578, 582, 584, 585, 588
groupby aggregation
.unstack method, using 374, 375, 376, 377
used, for replicating pivot_table 378, 379, 381, 382, 384
groupby object
working 315
H
HTML tables
working
Hypothesis library
tests, generating
I
idxmax
replicating, with method chaining 282, 283, 284
indexes
working 257
index filtering
versus row filtering 221, 222, 225
index lexicographically
slice notation, using 208
index object
working 247
integer location
selecting
working
integers
used, for selecting data 205, 206, 208
IPython debugger (ipdb) 574
J
JavaScript Object Notation (JSON)
about 107
working 112
Jupyter
debugging 574, 575, 576, 577, 578
K
Kaggle
reference link 553
kernel density estimates (KDEs) 508
kernel density estimation (KDE) 164
L
labels
selecting
used, for selecting data 205, 206, 208
working
M
matplotlib
data, visualizing 500, 501, 502, 504, 505, 506
object-oriented guide 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499
melt
used, for tidying variable values as column names 358, 359, 361
memory
reducing, by changing data types 122, 123, 124, 127, 128, 129
method chaining
used, for replicating idxmax 282, 283, 284
missing values
MultiIndex
removing, after grouping 297, 298, 299, 300, 303
multiple Boolean condition
working 215
multiple columns
aggregating 290, 291, 292, 293, 295, 297
grouping 290, 291, 292, 294, 295
working 293
multiple DataFrame columns
selecting 45
working 47
multiple DataFrames
concatenating 411, 412, 413, 415
multiple values, stored in same cell scenario
tidying
multiple variables, stored as single column scenario
tidying 400
multiple variables, stored as column names scenario
tidying 391, 394, 395, 396, 397, 400
multivariate analysis
with seaborn Grids 546, 547, 548, 549, 550
N
NaN (not a number) 3
Numba library 567
n values
replicating, with sort_values 137
O
on-time flight performance
finding
P
Pandaral·lel library 564
Pandarallel
.apply method, performance improving 563, 564, 565, 567
pandas
about 1
importing 1
versus seaborn 538, 540, 541, 542, 543, 544, 545, 546
pandas profiling library
reference link
using
working
pivot_table
replicating, with groupby aggregation 378, 379, 381, 382, 384
pytest
using, with pandas 588, 591, 592
Python
reference link 436
versus pandas date tools 429, 430, 431, 432, 434, 435, 436
Q
query method
used, for improving Boolean indexing readability 235, 236
R
row filtering
versus index filtering 221, 222, 225
rows
appending, to DataFrames 401, 402, 403, 405, 407, 408
selecting 132, 133, 134, 135, 136
S
seaborn
Simpson's Paradox, uncovering in diamonds dataset
versus pandas 538, 539, 541, 542, 543, 544, 545, 546
seaborn Grids
multivariate analysis 546, 548, 549, 550
Series data
selecting 189, 190, 191, 192, 193
series methods
calling 18, 19, 20, 21, 22, 23, 25, 26
chaining 34, 35, 36, 37, 38, 39
series operations
working 30
Series size
preserving, with where method 237, 238, 239, 240, 241
Simpson's Paradox
uncovering, in diamonds dataset with seaborn
slice notation
used, for index lexicographically 208
sorted indexes
working 227
sort_values
used, for replicating n values 137
working
SQLAlchemy
SQL databases
connecting to
SQL WHERE clauses
translating 229, 230, 231, 234, 235
working 232
stack
used, for tidying variable values as column names 352, 354, 355, 356, 357
stacked data
inverting 366, 368, 369, 370, 371, 373
states
filtering, with minority majority 317, 319, 320, 322
stop order price
calculating
Structured Query Language (SQL) 229
Subject Matter Expert (SME) 121
summary statistics
working 142
Swifter
.apply method, performance improving 562, 564, 565, 567
T
tests
generating, with Hypothesis library
tidy data 350
time data
used, for filtering columns 444, 446, 447, 448, 449
Timestamps
grouping by
traffic accidents
aggregating 466, 467, 468, 469, 470, 472
transform data
working 558
U
unique indexes
working 227
V
value
highlighting, from column 272, 273, 274, 275, 276, 277, 278, 281
values
filling, with unequal indexes 259, 260, 261, 262, 265
variables
multiple groups, stacking simultaneously 362, 364, 366
variables, stored in column names and values
tidying
variable values
tidying, as column names with melt 358, 359, 361
tidying, as column names with stack 352, 354, 355, 356, 357
W
weekly crimes
aggregating 466, 467, 468, 469, 470, 472
numbers, counting 460, 461, 464
weighted mean SAT scores per state
weight loss bet
transforming through 322, 323, 324, 325, 326, 328, 329, 330, 332
where method
used, for preserving Series size 237, 238, 239, 240
Z
zip files