Index

Symbols

.add method

working 261

.apply method

code, inspecting 567, 569, 570, 572, 574

performance 558, 560, 561, 562

performance, improving with Dask 562, 564, 565, 567

performance, improving with Pandarallel 562, 564, 565, 567

performance, improving with Swifter 562, 564, 566, 567

working 561

.cummax method

working

.describe method 139

.groupby method

using, for DataFrames with DatetimeIndex

.idxmax method

working 279, 281

.join method 417, 419, 421, 424, 426

.merge method 417, 419, 421, 423, 426

A

aggregation

defining 286, 288

working 289

aggregation function

customizing, with *args parameter 307, 309, 310, 311

customizing, with **kwargs parameter 307, 309, 311

grouping with 303, 304, 305, 306

area charts

stacking, to discover trends 532, 533, 535, 536, 538

arithmetic operator

method 32

axis levels

renaming 384, 385, 386, 387, 388, 389

B

Boolean array 23

Boolean arrays

filtering 217, 219, 221

working 220

Boolean indexing

readability, improving with query method 235, 236

Booleans

selecting

working

Boolean statistics

calculating 209, 210, 212

working 211

C

Cartesian product

implementing 248, 250, 251, 252

working 251

categorical and categorical values

comparing 187

categorical data

about 148, 149, 151, 152, 154, 156, 158, 159

working 153, 154

college campus diversity

determining

column

about

creating 44

deleting 44

selecting 13, 14, 15, 16, 131, 132

selecting, with method 48, 49, 50, 52

working

column names

ordering 54, 56, 57

renaming 39, 41, 43

columns

adding, from DataFrame 265, 266, 267, 269, 271

filtering, with time data 444, 445, 446, 447, 449

column types

about 143, 144, 147

working 143

comparison operator

method 32

concat function 411, 417, 418

continuous columns

comparing 176, 178, 179, 181, 182, 184, 185, 186

working 183

continuous data

about 160, 161, 163, 164, 167, 169

working 165

continuous values

comparing, with categories 170, 171, 173, 174, 175

continuous variables

grouping by 340, 341, 343, 345

working 343, 345

Cramér's V measure

reference link

crime, by weekday and year

measuring 474, 475, 476, 477, 479, 480, 481, 482, 483

CSV files

reading 86, 87, 88, 89, 90, 91, 92, 93, 96

working 94

writing 84, 86

D

Dask

.apply method, performance improving 562, 564, 565, 567

data

selecting, with integers 205, 206, 208

selecting, with labels 205, 206, 208

data analysis routine

developing 115, 116, 117

working 118

databases

working with 106, 107

data dictionary 121

DataFrame

attributes 4, 5, 7

attributes, working 6

columns, adding 265, 266, 267, 269, 271

creating, from scratch 81, 82, 83

slicing, with DatetimeIndex 438, 439, 440, 441, 442

summarizing 57, 59, 60

using, in pandas 2, 3, 4

working 61, 268, 269

DataFrame columns

selecting 201, 204, 205

working 204

DataFrame method

chaining 62, 64, 65

working 64

DataFrame operation

about 66, 72

direction, transposing 79

performing 69, 70

working 71

DataFrame rows

masking 242, 244

selecting 196, 197, 200, 201, 203, 205

working 200, 204

DataFrames

rows, appending to 401, 402, 403, 405, 407, 408

data integrity

managing, with Great Expectations tool 578, 582, 584, 585, 588

data types

about 7, 10, 11

working 11

DatetimeIndex

methods, using 449, 450, 452, 454, 458, 459

used, for grouping with anonymous functions

dunder methods 32

E

Excel files

using 98, 100, 101

working 100

Exploratory Data Analysis (EDA) 115

about 139

categorical data 148

column types 143

continuous data 160

summary statistics 139, 140, 141

working 142

F

flights dataset

counting 346, 347

visualizing 515, 518, 519, 520, 521, 522, 524, 525, 526, 528, 529, 530, 532

working

flow programming 34

functions

aggregating 290, 292, 294, 295, 297

grouping 290, 292, 294, 295, 297

working 293

G

Great Expectations tool

about 578

data integrity, managing 578, 582, 584, 585, 588

groupby aggregation

.unstack method, using 374, 375, 376, 377

used, for replicating pivot_table 378, 379, 381, 382, 384

groupby object

examining 312, 313, 315, 317

working 315

H

HTML tables

reading 112, 113, 114

working

Hypothesis library

tests, generating

I

idxmax

replicating, with method chaining 282, 283, 284

indexes

exploding 253, 254, 255, 257

working 257

index filtering

versus row filtering 221, 222, 225

index lexicographically

slice notation, using 208

index object

examining 245, 246, 247, 248

working 247

integer location

selecting

working

integers

used, for selecting data 205, 206, 208

IPython debugger (ipdb) 574

J

JavaScript Object Notation (JSON)

about 107

reading 107, 108, 112

working 112

Jupyter

debugging 574, 575, 576, 577, 578

K

Kaggle

reference link 553

kernel density estimates (KDEs) 508

kernel density estimation (KDE) 164

L

labels

selecting

used, for selecting data 205, 206, 208

working

M

matplotlib

about 486, 487

data, visualizing 500, 501, 502, 504, 505, 506

object-oriented guide 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 499

melt

used, for tidying variable values as column names 358, 359, 361

memory

reducing, by changing data types 122, 123, 124, 127, 128, 129

method chaining 1, 34

used, for replicating idxmax 282, 283, 284

missing values

comparing 73, 75, 76, 77, 79

MultiIndex

removing, after grouping 297, 298, 299, 300, 303

multiple Boolean condition

constructing 213, 215

working 215

multiple columns

aggregating 290, 291, 292, 293, 295, 297

grouping 290, 291, 292, 294, 295

working 293

multiple DataFrame columns

selecting 45

working 47

multiple DataFrames

concatenating 411, 412, 413, 415

multiple values, stored in same cell scenario

tidying

multiple variables, stored as single column scenario

tidying 400

multiple variables, stored as column names scenario

tidying 391, 394, 395, 396, 397, 400

multivariate analysis

with seaborn Grids 546, 547, 548, 549, 550

N

NaN (not a number) 3

Numba library 567

n values

replicating, with sort_values 137

O

on-time flight performance

finding

P

Pandaral·lel library 564

Pandarallel

.apply method, performance improving 563, 564, 565, 567

pandas

about 1

DataFrame, using 2, 3, 4

importing 1

plotting 508, 511, 513

pytest, using 588, 591, 592

versus seaborn 538, 540, 541, 542, 543, 544, 545, 546

pandas profiling library

reference link

using

working

pivot_table

replicating, with groupby aggregation 378, 379, 381, 382, 384

pytest

using, with pandas 588, 591, 592

Python

reference link 436

versus pandas date tools 429, 430, 431, 432, 434, 435, 436

Q

query method

used, for improving Boolean indexing readability 235, 236

R

row filtering

versus index filtering 221, 222, 225

rows

appending, to DataFrames 401, 402, 403, 405, 407, 408

selecting 132, 133, 134, 135, 136

S

seaborn

Simpson's Paradox, uncovering diamonds dataset

versus pandas 538, 539, 541, 542, 543, 544, 545, 546

seaborn Grids

multivariate analysis 546, 548, 549, 550

Series data

selecting 189, 190, 191, 192, 193

working 194, 196

series methods

calling 18, 19, 20, 21, 22, 23, 25, 26

chaining 34, 35, 36, 37, 38, 39

series operations

about 26, 28, 29, 32, 34

working 30

Series size

preserving, with where method 237, 238, 239, 240, 241

Simpson's Paradox

uncovering, diamonds dataset with seaborn

slice notation

used, for index lexicographically 208

sorted indexes

selecting 225, 226, 227, 229

working 227

sort_values

used, for replicating n values 137

working

SQLAlchemy

SQL databases

connecting to

SQL WHERE clauses

translating 229, 230, 231, 234, 235

working 232

stack

used, for tidying variable values as column names 352, 354, 355, 356, 357

stacked data

inverting 366, 368, 369, 370, 371, 373

states

filtering, with minority majority 317, 319, 320, 322

stop order price

calculating

Structured Query Language (SQL) 229

Subject Matter Expert (SME) 121

summary statistics

about 139, 140, 141, 142

working 142

Swifter

.apply method, performance improving 562, 564, 565, 567

T

tests

generating, with Hypothesis library

tidy data 350

time data

used, for filtering columns 444, 446, 447, 448, 449

Timestamps

grouping by

traffic accidents

aggregating 466, 467, 468, 469, 470, 472

transform data

coding 553, 554, 555

working 558

U

unique indexes

selecting 225, 226, 227, 229

working 227

V

value

highlighting, from column 272, 273, 274, 275, 276, 277, 278, 281

values

filling, with unequal indexes 259, 260, 261, 262, 265

variables

multiple groups, stacking simultaneously 362, 364, 366

variables, stored in column names and values

tidying

variable values

tidying, as column names with melt 358, 359, 361

tidying, as column names with stack 352, 354, 355, 356, 357

W

weekly crimes

aggregating 466, 467, 468, 469, 470, 472

numbers, counting 460, 461, 464

weighted mean SAT scores per state

calculating 333, 334, 339

working 337, 339

weight loss bet

transforming through 322, 323, 324, 325, 326, 328, 329, 330, 332

where method

used, for preserving Series size 237, 238, 239, 240

Z

zip files

working with 101, 102, 103, 104
