

  • # (pound sign), 200

  • [ ] (square brackets), 103

  • { } (curly braces), 103

  • “ (quotation marks), 102

  • 1NF (first normal form), 170

  • 2NF (second normal form), 170

  • 3NF (third normal form), 170–171

  • 4NF (fourth normal form), 170


  • AAC (Advanced Audio Coding), 77

  • Abcya, 355

  • absolute measures of dispersion, 231

  • acceptable use policies (AUPs), 389–390

  • access control lists (ACLs), 378

  • access points (APs), 378

  • access requirements, 370–374

    • ACLs (access control lists), 378

    • DUAs (data use agreements), 373–374

    • MDM (master data management), 455

    • MDM (master data management) for, 456–457

    • RBAC (role-based access control), 331

    • release approvals, 374

    • role-based access control, 372

    • user group-based access control, 373

  • accessibility of data, 417

  • accuracy, 415

  • ACID design principle, 25–26

  • ACLs (access control lists), 378

  • acquisition, data, 127–135, 419

    • APIs (application programming interfaces), 121–125

    • data collection methods, 127–135

    • data integration, 114

    • data monetization, 114

    • delta load, 121

    • ELT (extract, load, and transform) process, 117–119, 120

    • ETL (extract, transform, and load) process, 115–117, 120

    • improving data quality in, 419

    • observation, 133–134

    • overview of, 113–114

    • public databases, 129

    • sampling, 132–133

    • sources, 258, 260–261

    • surveys, 129–132

    • web scraping, 128–129

    • web services, 121–125

  • active data dictionaries, 452

  • ad hoc reports, 362–363

  • Adaptive Transform Acoustic Coding (ATRAC), 79

  • administrative teams, reports for, 294

  • Advanced Audio Coding (AAC), 77

  • Advanced Encryption Standard (AES), 377

  • Advanced Systems Format (ASF), 80

  • AES (Advanced Encryption Standard), 377

  • aggregate functions, 177–179

  • aggregation, data, 167–168

  • AI (artificial intelligence), 390

  • AIFF (Audio Interchange File Format), 76

  • algorithms, encryption, 377, 378

  • alphanumeric data type, 63

  • alternative hypotheses, 246

  • ALUs (arithmetic logical units), 185–186

  • Amazon Web Services. See AWS (Amazon Web Services)

  • analogous color schemes, 308

  • analog-to-digital audio conversion, 73–75

  • analysis. See data analysis techniques; data analytics tools

  • Analysis Toolpak (Microsoft Excel), 212

  • AND function, 175–176

  • Apex, 220

  • APIs (application programming interfaces), 121–125

  • appending data, 160, 164–166

  • appendixes, in documentation, 319–320

  • Apple, 407

  • application programming interfaces. See APIs (application programming interfaces)

  • approvals, for dashboards, 331

  • APs (access points), 378

  • arithmetic logical units (ALUs), 185–186

  • array data type, 58

  • array keyword, 58

  • artificial intelligence (AI), 390

  • ASF (Advanced Systems Format), 80

  • assumption-free methods, 146

  • asymmetric encryption, 376

  • Athena (AWS), 381

  • atomicity, 26

  • ATRAC (Adaptive Transform Acoustic Coding), 79

  • attributes, data

    • attribute limitations, 417

    • in dashboards, 324–326

    • definition of, 35

  • audiences, report, 292–295

  • audio, 73–79

    • analog-to-digital conversion, 73–75

    • file formats, 75–79

  • Audio Interchange File Format (AIFF), 76

  • Audio Video Interleave (AVI), 79

  • audits, 433–434

  • AUPs (acceptable use policies), 389–390

  • automated validation, 430, 435–437

  • availability, 375

  • AVG function, 177–178

  • AVI (Audio Video Interleave), 79

  • AWS (Amazon Web Services)

  • Azure

    • Blobs, 385

    • Database for MySQL Flexible Server, 187–190

    • Synapse, 21


  • balanced tree, 194

  • bar charts, 346

  • BCNF (Boyce-Codd normal form), 170–171

  • best fit, line of, 248

  • BI (business intelligence) tools, 465

  • BigQuery, 95

  • binary data attributes, 325

  • bitmap (BMP) file format, 83

  • blending data, 161

  • Blowfish, 377

  • BLUF (Bottom Line Up Front) statements, 304

  • BMP (bitmap) file format, 83

  • “Bottom Line Up Front” statements, 304

  • Boyce-Codd normal form (BCNF), 170–171

  • branding, 313

  • bring-your-own-device (BYOD), 378

  • B-tree, 194

  • BTREE indexes, 194

  • bubble charts, 344–345

  • business intelligence (BI) tools, 465

  • business questions, reviewing, 259

  • business reports. See reports

  • business rules, 426

  • BusinessObjects, 219

  • BusinessObjects (SAP), 219

  • BYOD (bring-your-own-device), 378


  • cache, size of, 187

  • California Consumer Privacy Act (CCPA), 375

  • Canva, 353

  • cardinality, 395–396

  • categorical data type, 68–69

  • categorical variables, recoding, 158

  • CBR (constant bitrate) encoding, 78

  • CCPA (California Consumer Privacy Act), 375

  • central processing units (CPUs), 185–186

  • central tendency, measures of, 228–231

  • char keyword, 56

  • character data type, 56–57

  • Charts command (Insert menu), 343

  • charts/graphs. See also reports; statistical methods

  • cheat sheet for exam preparation, 460–461

  • check constraints, 395

  • child nodes, 194

  • chi-square test, 244–246

  • choropleth maps. See geographic maps

  • CIA triad, 375

  • citations, report, 318–319

  • classification of data, 401–405

    • examples of, 401

    • PCI-DSS (Payment Card Industry Data Security Standard), 367, 379, 404–405

    • PHI (personal health information), 201, 367, 403–404

    • PII (personally identifiable information), 201, 367, 401–403, 417

  • cleansing data, 139–149

    • data outliers, 146–148

    • data quality and, 139

    • data type validation, 148–149

    • definition of, 258

    • duplicate data, 140–141

    • goals of, 140

    • invalid data, 145–146

    • missing values, 143–145

    • non-parametric data, 146

    • redundant data, 141–143

    • specification mismatches, 148

  • C-level executive dashboards, 327

  • cloud-based storage, 385–386, 392

  • clustered indexes, 194

  • clustered sampling, 133

  • Cognos (IBM), 214

  • collection methods, 127–135

  • collective outliers, 147

  • color schemes, report, 306–309

  • color wheels, 307

  • Colorado Privacy Act, 375

  • columnar data stores, 10–11

  • columns, 92–94

  • comma-delimited files, 101–102

  • comma-separated values (CSV) format, 102

  • comparison

    • in data analysis, 270

    • of data over time, 268–271

      • comparison, 270

      • composition, 269

      • distribution, 270–271

      • relationships, 268

  • complementary color schemes, 308

  • completeness of data, 416

  • composition, in data analysis, 269

  • compression of data, 168

  • CompTIA Data+ exam

    • NDA (nondisclosure agreement), 460

    • post-exam career tips, 465

    • preparation for, 460–461

    • question types, 459–460

    • registration for, 459

    • test-taking tips, 461–465

  • CONCATENATE function, 162

  • concatenation, 162–164

  • confidence intervals, 240–241

  • confidence levels, 240

  • confidentiality, 375

  • conformity, 427–428

  • connectivity to DBMSs (database management systems), 7

  • consent, 369

  • consistency, 26, 149, 414–415

  • consolidation hubs, 446

  • consolidation of data fields, 446–448

  • constant bitrate (CBR) encoding, 78

  • constraints, data, 394–395

  • consumers, dashboard, 327–328

  • contextual outliers, 147

  • contingency tables, 274

  • continuous data attributes, 325

  • continuous data types, 69–72

  • convenience sampling, 133

  • conversion data, 420

  • corporate reporting standards, 312–313

  • correlation, 250–251

  • Cosmos DB, 10

  • COUNT function, 177–178

  • cover pages, 301–302

  • CPUs (central processing units), 185–186

  • crawlers, 129

  • create, read, update, and delete (CRUD) operations, 199, 221–222

  • CREATE INDEX statement, 196

  • CREATE TEMPORARY TABLE statement, 197

  • CREATE UNIQUE INDEX statement, 196

  • Crockford, Douglas, 102

  • cross-validation, 432–433

  • cryptography, 375–377

  • CSAT (customer satisfaction), 327

  • CSV (comma-separated values) format, 102

  • CURDATE function, 174

  • curly braces ({ }), 103

  • currency data type, 64–66

  • CURTIME function, 174

  • custodians, data, 368

  • customer satisfaction (CSAT), 327


  • dashboards, 321–335

    • consumer types for, 327–328

    • continuous/live data feeds, 326–327

    • data sources and attributes, 324–326

    • definition of, 322

    • delivery, 332–335

      • interactive dashboards, 333–335

      • scheduled, 332–333

      • subscriptions, 332

    • development process for, 328–332

      • approvals, 331

      • deployment to production, 331–332

      • high-level steps of, 328–329

      • mockups/wireframes, 329–331

      • role of designers in, 331

    • examples of, 323

    • improving data quality in, 421–422

    • static data, 326–327

  • data acquisition/data mining

    • APIs (application programming interfaces), 121–125

    • data collection methods, 127–135

    • data integration, 114

    • data monetization, 114

    • delta load, 121

    • ELT (extract, load, and transform) process, 117–119, 120

    • ETL (extract, transform, and load) process, 115–117, 120

    • improving data quality in, 419

    • overview of, 113–114

    • sources, 258, 260–261

    • web services, 121–125

  • data analysis techniques, 255–276. See also data analytics tools; reports

    • business questions, 259

    • comparison of data over time, 268–271

      • comparison, 270

      • composition, 269

      • distribution, 270–271

      • relationships, 268

    • connection of data points/pathway, 275–276

    • data collection sources, 258, 260–261

    • data needs, determining, 257–259

    • descriptive statistics for, 273–274

    • diagnostic analysis, 266

    • exploratory data analysis, 272–273

    • gap analysis, 261–263

    • importance of, 255–257

    • link analysis, 274–275

    • performance analysis, 271

    • predictive analysis, 266–267

    • prescriptive analysis, 267

    • projections to achieve goals, 272

    • statistical analysis, 266

    • text analysis, 266

    • tracking of measurements against defined goals, 271–272

    • trend analysis, 267–268

  • data analytics tools, 208–224. See also data analysis techniques

    • Apex, 220

    • AWS QuickSight, 221

    • data lakes, 21

    • data warehouses, 17

    • Datorama, 220

    • Domo, 220–221

    • IBM Cognos, 214

    • IBM Statistical Package Social Science (SPSS) Modeler, 214–215

    • Microsoft Excel, 211–213

    • Microsoft Power BI, 217–218

    • MicroStrategy, 219

    • Minitab, 221–222

    • Python, 211

    • Qlik, 218–219

    • R, 213

    • Rapid Miner, 214

    • SAP BusinessObjects, 219

    • SAS (Statistical Analytical System), 215–216

    • SQL (Structured Query Language), 8, 210–211

    • Stata, 221

    • Tableau, 216–217

  • data audits, 433–434

  • data breach reporting, 407–408

  • data classification, 401–405

    • examples of, 401

    • PCI-DSS (Payment Card Industry Data Security Standard), 367, 379, 404–405

    • PHI (personal health information), 201, 367, 403–404

    • PII (personally identifiable information), 201, 367, 401–403, 417

  • data cleansing. See cleansing data

  • data collection methods, 127–135

  • data compression, 168

  • data consent, 369

  • data consistency, 149

  • data constraints, 394–395

  • data custodians, 368

  • data de-identification, 379–381

  • data deletion, 123, 179, 391–392

  • data dictionaries, 451–453

  • data dimensions, 46–50

    • dimension data type, 68–69

    • dimension tables, 35

    • dimensionality reduction, 168

    • overview of, 46

    • SCDs (slowly changing dimensions), 46–50

  • data encryption, 375–377, 378

  • data fields

  • data governance. See governance

  • data ingestion layer, 20

  • data integration, 114

  • data integrity, 375, 416

  • data lakes, 20–23

    • advantages of, 21–22

    • data warehouses compared to, 22–23

    • definition of, 20

    • examples of, 21

    • importance of, 15–16

    • structure and components of, 20–21

  • data loss prevention (DLP), 379

  • data manipulation, 153–179. See also individual functions

    • appending data, 160, 164–166

    • blending data, 161

    • concatenation, 162–164

    • data reconciliation, 156

    • definition of, 155

    • derived variables, 159

    • DML (Data Manipulation Language), 26, 156

    • final product and reports, 421–422

    • importance of, 155

    • improving data quality in, 421

    • imputation, 166–167

    • merges, 160

    • normalization, 170–171

    • recoding, 156–159

    • reduction of data, 167–168

    • string manipulation, 171

    • transposition, 168–169

  • Data Manipulation Language (DML), 26, 156

  • data marts, 18–19

  • data masking, 379–381

  • data mining. See data acquisition/data mining

  • data modeling, 34

  • data monetization, 114

  • data needs, determining, 257–259

  • data outliers, 146–148

  • data owners, 368

  • data partitioning, 202

  • data points, connection of, 275–276

  • data processing, 390–391

  • data profiling. See profiling and cleansing data

  • data purging, 392

  • data quality. See quality

  • data queries. See queries

  • data ranges, validation of, 149

  • data reconciliation, 156

  • data release approvals, 374

  • data retention, 392–393

  • data sharding, 202

  • data sources, 20

    • in dashboards, 324–326

    • definition of, 17

    • documentation references to, 318–319

    • improving data quality at, 419

  • data sovereignty, 406

  • data stewards, 368

  • data stores

    • columnar, 10–11

    • graph, 12

    • key/value, 11

  • data structures

    • file formats, 98–109

      • comma-delimited, 101–102

      • DSV, 102

      • HTML (Hypertext Markup Language), 106–109

      • JSON (JavaScript Object Notation), 10, 102–103, 109

      • tab-delimited, 100–101

      • text/flat file, 99–100

      • XML (Extensible Markup Language), 104–106, 109

    • metadata, 96–97

    • overview of, 89–90

    • semi-structured data, 96

    • structured data, 90–94

      • abstract view of, 91

      • examples of, 91–92

      • key/value pairs, 94

      • rows/columns, 92–94

      • unstructured data compared to, 90–91

    • unstructured data

      • abstract view of, 94–95

      • processing and analysis of, 95–96

      • structured data compared to, 90–91

  • data transformation

    • data lakes, 20

    • improving data quality in, 419–420

  • data transmission, 378–379

  • data transposition, 168–169

  • data types, 55–71. See also types of data

    • alphanumeric, 63

    • array, 58

    • categorical, 68–69

    • character, 56–57

    • continuous, 69–72

    • currency, 64–66

    • date, 61–62

    • definition of, 55

    • dimension, 68–69

    • discrete, 69–70

    • double, 57–58

    • float, 57

    • integer, 57

    • length of, 55

    • numeric, 63–64

    • storage sizes of, 56

    • string, 58–59

    • text, 64

    • validation of, 148–149

  • data use agreements (DUAs), 373–374

  • data validation

    • conformity/non-conformity, 427–428

    • methods of, 430

    • rows passed/rows failed, 429–430

  • data visualization techniques, 341–364. See also dashboards; documentation

  • data warehouses, 6-

    • architecture of, 17–18

    • components of, 17

    • data lakes compared to, 22–23

    • data marts, 18–19

    • definition of, 16

    • examples of, 17, 18

    • importance of, 15–16

    • key functions of, 16

  • databases. See also DBMSs (database management systems)

    • data dimensions, 46–50

      • overview of, 46

      • SCDs (slowly changing dimensions), 46–50

    • definition of, 3

    • importance of, 3–4

    • non-relational, 9–13

      • columnar data stores, 10–11

      • definition of, 9

      • document data stores, 10

      • graph data stores, 12

      • key/value data stores, 11

      • relational databases compared to, 12–13

    • normalization, 43

    • public, 129

    • relational, 7–9

      • advantages of, 8–9

      • definition of, 7

      • Excel spreadsheets compared to, 8

      • non-relational databases compared to, 12–13

      • non-relational databases versus, 13

      • simple example of, 7

    • schemas, 32–43

    • simple example of, 4

    • views, 33

  • date data type, 61–62

  • date functions, 174

  • date ranges, in reports, 288–290

  • DATE_FORMAT function, 174

  • DATEDIFF function, 174

  • Datorama, 220

  • DBMSs (database management systems)

    • capabilities of, 4–7

    • connectivity to, 7

    • RDBMSs (relational database management systems), 9

    • simple example of, 6

  • deep redundancy, 142

  • default value constraints, 395

  • de-identification, 379–381

  • DELETE function, 179

  • Delete method, 123

  • deletion of data, 123, 179, 391–392

  • delimiters, 100–101

  • delimiter-separated values (DSV) file format, 102

  • delivery, of dashboards, 332–335

    • interactive dashboards, 333–335

    • scheduled, 332–333

    • subscriptions, 332

  • delta load, 121

  • dependent data marts, 19

  • deployment, dashboards, 331–332

  • derived variables, 159

  • descriptive statistics, 273–274

  • design elements, report, 306–313

  • designers, dashboard, 331

  • developers, reports for, 294

  • development process, dashboards, 328–332

    • approvals, 331

    • deployment to production, 331–332

    • high-level steps of, 328–329

    • mockups/wireframes, 329–331

    • role of designers in, 331

  • deviation

  • diagnostic analysis, 266

  • dictionaries, data, 451–453

  • digital rights management (DRM), 394

  • dimension data type, 68–69

  • dimension tables, 35

  • dimensionality reduction, 168

  • dimensions, data. See data dimensions

  • discrete data attributes, 325

  • discrete data types, 69–70

  • disk space, 185–186

  • dispersion, measures of, 231–234

    • absolute versus relative, 231

    • mean deviation, 232–233

    • quartile deviation, 232

    • range, 231–232

    • relative measures, 234

    • standard deviation, 233–234

    • variance, 233

  • distribution

  • distribution-free methods, 146

  • DLP (data loss prevention), 379

  • DML (Data Manipulation Language), 26, 156

  • document data stores, 10

  • documentation, 316–320

  • DocumentDB, 10

  • Domo, 220–221

  • DOUBLE data type, 57–58

  • double quotes (“), 102

  • drill-down, 333

  • drives

  • DRM (digital rights management), 394

  • DROP TABLE statement, 199

  • DSV (delimiter-separated values) file format, 102

  • DUAs (data use agreements), 373–374

  • duplicate data, 140–141

  • durability, 26

  • dynamic data masking, 379–381

  • dynamic reports, 361–362


  • ECC (Elliptic Curve Cryptography), 377

  • ecosystem, API (application programming interface), 122

  • electronic protected health information (ePHI), 403

  • Elliptic Curve Cryptography (ECC), 377

  • ELT (extract, load, and transform) process, 419–420

    • ETL process compared to, 120

    • stages of, 117–119

  • Encapsulated Postscript (EPS), 84

  • encryption, 375–377, 378

  • end-to-end encryption, 378

  • entity relationship requirements, 393–396

    • cardinality, 395–396

    • data constraints, 394–395

    • importance of, 393–394

    • record link restrictions, 394

  • ePHI (electronic protected health information), 403

  • EPS (Encapsulated Postscript), 84

  • ER (entity relationship) requirements, 393–396

    • cardinality, 395–396

    • data constraints, 394–395

    • importance of, 393–394

    • record link restrictions, 394

  • errors

  • escalation of data breach reporting, 407–408

  • ETL (extract, transform, and load) process, 140, 419–420

    • data warehouses, 17

    • ELT process compared to, 120

    • stages of, 115–117

  • exam, CompTIA Data+. See CompTIA Data+ exam

  • Excel. See Microsoft Excel

  • execution plans, query, 186, 187–190

  • executive summaries, 303

  • executives, reports for, 293–294

  • EXPLAIN statement, 188

  • exploratory data analysis, 272–273

  • Extensible Markup Language (XML), 104–106, 109

  • external customers, 327, 328

  • external data sources, 261

  • extract, load, and transform process. See ELT (extract, load, and transform) process

  • extract, transform, and load process. See ETL (extract, transform, and load) process

  • extract stage

    • ELT (extract, load, and transform) process, 118

    • ETL (extract, transform, and load) process, 115–116

  • extrapolation, 145


  • Facebook, 213

  • face-to-face surveys, 131–132

  • facts

    • definition of, 34

    • fact tables, 35

  • false negatives, 247

  • false positives, 247

  • Family Educational Rights and Privacy Act (FERPA), 379

  • FAQs (frequently asked questions), 319–320

  • FERPA (Family Educational Rights and Privacy Act), 379

  • field separators, 100–101

  • fields

    • JSON (JavaScript Object Notation), 103

    • standardization of, 448–451

    • tab-delimited files, 100

  • file formats

  • Filled Maps command (Maps menu), 351

  • filters, 171–172, 285–287

  • final product, improving data quality in, 421–422

  • first normal form (1NF), 170

  • FLAC (Free Lossless Audio Codec), 77–78

  • flat dimensions, 39

  • flat files, 99–100

  • FLOAT data type, 57

  • float keyword, 57

  • floating point numbers, 57–58

  • fonts, report, 310–311

  • foreign keys, 395

  • formal reports, 282

  • fourth normal form (4NF), 170

  • Free Lossless Audio Codec (FLAC), 77–78

  • frequencies, 234–235

  • frequency of reports, 291

  • frequently asked questions (FAQs), 319–320

  • functions


  • gap analysis, 261–263

  • GCP (Google Cloud Platform), 95, 211, 392

  • GDPR (General Data Protection Regulation), 201, 374–375, 379, 406

  • general public dashboards, 328

  • geographic maps, 350–351

  • Get method, 122

  • GIF (Graphics Interchange Format), 84

  • GLB (Gramm-Leach-Bliley) Act, 393

  • global outliers, 147

  • Global System for Mobile, 78

  • global temporary tables, 199

  • goals

    • projections to achieve, 272

    • tracking of measurements against, 271

  • goodness-of-fit, chi-square test of, 244

  • Google, 213

    • BigQuery, 21

    • GCP (Google Cloud Platform), 95, 392

    • Looker, 211, 333

  • governance, 367–408

    • access requirements, 370–374

      • DUAs (data use agreements), 373–374

      • RBAC (role-based access control), 331, 372

      • release approvals, 374

      • user group-based access control, 373

    • benefits of, 368–369

    • data breach reporting, 407–408

    • data classification, 401–405

      • examples of, 401

      • PCI-DSS (Payment Card Industry Data Security Standard), 367, 379, 404–405

      • PHI (personal health information), 201, 367, 403–404

      • PII (personally identifiable information), 201, 367, 401–403, 417

    • entity relationship requirements, 393–396

      • cardinality, 395–396

      • data constraints, 394–395

      • importance of, 393–394

      • record link restrictions, 394

    • jurisdiction requirements, 406–407

    • roles in, 368

    • security requirements, 374–381

      • CIA triad, 375

      • data de-identification/masking, 379–381

      • data encryption, 375–377, 378

      • data transmission, 378–379

      • GDPR (General Data Protection Regulation), 201, 374–375

    • storage environment requirements, 383–386

      • cloud-based storage, 385–386

      • local storage, 384–385

      • shared drives, 384

    • use requirements, 389–393

      • AUPs (acceptable use policies), 389–390

      • data deletion, 391–392

      • data processing, 390–391

      • data retention, 392–393

  • GPUs (graphics processing units), 185–186

  • Gramm-Leach-Bliley (GLB) Act, 393

  • graph data stores, 12

  • Graphics Interchange Format (GIF), 84

  • graphics processing units (GPUs), 185–186

  • graphs/charts. See also reports; statistical methods

  • GSM file format, 78


  • Health Insurance Portability and Accountability Act (HIPAA), 379, 393, 403

  • heat maps, 348–349

  • HIPAA (Health Insurance Portability and Accountability Act), 379, 393, 403

  • histograms, 274, 347

  • HTML (Hypertext Markup Language), 104, 106–109

  • hubs, 446

  • hybrid data marts, 19

  • Hypertext Markup Language (HTML), 104, 106–109

  • hypothesis testing, 239, 246–247


  • IBM

    • Cognos, 214

    • SPSS (Statistical Package Social Science), 157, 214–215

    • Watson, 214

  • IETF (Internet Engineering Task Force), 102

  • IF function, 177

  • images, 81–85

    • file formats, 83–85

    • raster versus vector, 81–85

  • imputation, 145, 166–167

  • independence, chi-square test of, 244

  • independent data marts, 19

  • indexing, 187, 193–196

  • inferential statistics, 238–251

  • infographics, 353

  • informal reports, 282

  • Informatica, 447

  • informational reports, 283

  • INSERT function, 179

  • Insert menu commands, Charts, 343

  • instructions, report, 303

  • INT data type, 57

  • integer data type, 57

  • integration data, 114

  • integrity, data, 375, 416

  • IntelliClean, 141

  • interactive dashboards, 333–335

  • internal customers, 327

  • internal data sources, 261

  • Internet Engineering Task Force (IETF), 102

  • Internet of Things (IoT), 95, 104

  • interpolation, 145

  • interval data attributes, 325

  • intrahops, improving data quality in, 419–420

  • invalid data, 145–146

  • IoT (Internet of Things), 95, 104

  • IS NOT NULL statement, 96

  • IS NULL statement, 96

  • isolation, 26

  • IT team dashboards, 328

  • IT/operations teams, 294


  • Jason Davies, 355

  • JavaScript Object Notation (JSON), 10, 102–103

  • JPEG (Joint Photographic Experts Group), 83

  • JSON (JavaScript Object Notation), 10, 102–103, 109

  • jurisdiction requirements, 406–407



  • Lake Formation (AWS), 381

  • lakes, data. See data lakes

  • languages

    • DML (Data Manipulation Language), 26, 156

    • HTML (Hypertext Markup Language), 104, 106–109

    • Python, 211

    • R, 213

    • SGML (Standard Generalized Markup Language), 104

    • SQL (Structured Query Language), 8, 210–211

    • VQL (Visual Query Language), 217

    • XML (Extensible Markup Language), 104–106

  • layout, reports, 309–310

  • leadership, reports for, 294

  • leadership dashboards, 327–328

  • leaf nodes, 194

  • legislation

    • CCPA (California Consumer Privacy Act), 375

    • Colorado Privacy Act, 375

    • FERPA (Family Educational Rights and Privacy Act), 379

    • GDPR (General Data Protection Regulation), 201, 374–375, 379, 406

    • GLB (Gramm-Leach-Bliley) Act, 393

    • HIPAA (Health Insurance Portability and Accountability Act), 379, 393, 403

    • impact of, 406

    • Patriot Act, 406

    • Virginia Consumer Data Privacy Act, 375

  • life cycle, data quality and

    • data acquisition/data source, 419

    • data manipulation, 421

    • data transformation/intrahops, 419–420

    • final product and reports, 421–422

    • overview of, 418

  • line charts, 342–343

  • line of best fit, 248

  • linear regression, simple, 248–249

  • links

    • definition of, 274

    • link analysis, 274–275

  • lists, 39

  • live data feeds, 326–327

  • load stage

    • ELT (extract, load, and transform) process, 119

    • ETL (extract, transform, and load) process, 116

  • local storage, 384–385

  • local temporary tables, 200

  • logical functions, 174–177

  • logical rules, imputation based on, 167

  • logical schemas, 33

  • logos, 313

  • LONGTEXT data type, 64

  • Looker (Google), 211, 333


  • management dashboards, 327–328

  • managers, reports for, 294

  • manipulation of data. See data manipulation

  • manpower gaps, 263

  • maps. See also charts/graphs

  • Maps menu commands, Filled Maps, 351

  • MAR (missing at random), 143–144

  • MariaDB, 10

  • markers, 343

  • market gaps, 263

  • marketing teams, reports for, 294

  • masking data, 379381

  • master data management. See MDM (master data management)

  • MAX function, 177–179

  • MCAR (missing completely at random), 143–144

  • MDM (master data management), 324, 441–458

    • capabilities of, 443

    • data consolidation, 446–448

    • data dictionaries, 451–453

    • data standardization, 448–451

    • definition of, 441, 443

    • example of, 444–446

    • interaction with business functions, 444–446

    • mergers and acquisitions, 455

    • policy compliance, 456

    • streamlined data access, 456–457

  • mean

    • calculation of, 228

    • imputation with, 145

  • mean deviation

  • measurements, tracking against defined goals, 271

  • measures

  • median

    • calculation of, 229–230

    • imputation with, 145

  • MEDIUMTEXT data type, 64

  • mergers and acquisitions, MDM (master data management) for, 455

  • merging data, 160

  • metadata, 6, 96–97, 451–453

  • metrics, quality, 413–414

    • accessibility, 417

    • accuracy, 415

    • attribute limitations, 417

    • completeness, 416

    • consistency, 414–415

    • integrity, 416

    • validity, 416

  • Microsoft Azure Synapse, 95

  • Microsoft Excel, 211–213

  • Microsoft Power BI, 217–218, 302

    • measuring data quality with, 421

    • mockups/wireframes, 329

    • scheduled deliveries in, 333

  • Microsoft PowerPoint

    • report options in, 302

    • versioning tools, 317

  • Microsoft Word

    • citation options, 318–319

    • versioning tools, 317

  • MicroStrategy, 219

  • MIN function, 177

  • mining data. See data mining

  • Minitab, 221–222

  • mismatches, specification, 148

  • missing values, 143–145

    • imputation of, 166–167

    • MAR (missing at random), 143–144

    • MCAR (missing completely at random), 143–144

  • mobile device management, 378

  • mockups, 329–331

  • mode, 230–231

  • modeling, data, 34

  • monetization, data, 114

  • MongoDB, 10

  • monochromatic color schemes, 307

  • most frequent values, imputation with, 145

  • MOV file format, 80

  • Moving Picture Experts Group. See MPEG (Moving Picture Experts Group)

  • MPEG (Moving Picture Experts Group), 76, 80

  • MPLS (Multiprotocol Label Switching), 378

  • multimedia

  • multiple-choice questions, 460

  • Multiprotocol Label Switching (MPLS), 378

  • MySQL, 170. See also data types


  • names, field, 448–451

  • name/value pairs, 103

  • NDA (nondisclosure agreement), for CompTIA Data+ exam, 460

  • negative correlation, 250–251

  • networks, in link analysis, 274

  • NF (normal forms), 143, 170

  • nodes

    • B-tree, 194

    • in link analysis, 274

  • nominal data attributes, 325

  • non-clustered indexes, 194

  • non-conformity, 427–428

  • nondisclosure agreement (NDA), for CompTIA Data+ exam, 460

  • non-parametric data, elimination of, 146

  • non-parametric methods, 146

  • non-random missing values, 144

  • non-relational databases, 9–13

    • columnar data stores, 10–11

    • definition of, 9

    • document data stores, 10

    • graph data stores, 12

    • key/value data stores, 11

    • relational databases compared to, 12–13

  • non-sensitive PII information, 402

  • normal forms (NF), 143, 170

  • normalization, 143, 170–171

  • NoSQL databases. See non-relational databases

  • NOT function, 176

  • NOW function, 174

  • null hypothesis, 246

  • null values, 96

  • Nullsoft Winamp, 76, 79

  • numbers

    • double data type, 57–58

    • float data type, 57–58

    • integer data type, 57

    • interval data attributes, 325

    • numeric data attributes, 325

    • numeric data type, 63–64

    • numerical values, recoding, 158–159

  • numeric data attributes, 325

  • numeric data type, 63–64

  • Numerical Python (NumPy), 211

  • numerical values, recoding, 158–159

  • numerosity reduction, 168



  • paired t-tests, 243

  • paper surveys, 131

  • parameterization, query, 187–190

  • parent nodes, 194

  • parsing data, 171

  • participant-based observation, 133–134

  • partitioning, data, 202

  • partner APIs (application programming interfaces), 122

  • passive data dictionaries, 452

  • pass-through data, 420

  • Patriot Act, 406

  • PAYG (pay-as-you-go), 446

  • PCI-DSS (Payment Card Industry Data Security Standard), 367, 379, 404–405

  • PDF (Portable Document Format), 84–85

  • Pearson VUE, registration with, 459

  • percent change, 235–236

  • percent difference, 235–236

  • performance analysis, 271

  • performance gaps, 263

  • performance-based questions, 460

  • personal health information. See PHI (personal health information)

  • personally identifiable information. See PII (personally identifiable information)

  • personas, audience, 293–294

  • PHI (personal health information), 201, 367, 403–404

  • physical schemas, 33

  • pie charts, 343–344

  • PII (personally identifiable information), 201, 367, 401–403, 417

  • plans, query execution, 186, 187–190

  • PlayStation Portable (PSP), 79

  • PNG (Portable Network Graphics), 84

  • point-of-sale (PoS) data, 95

  • policies

    • AUPs (acceptable use policies), 389–390

    • data deletion, 391–392

    • data retention, 392–393

    • policy compliance, 456

  • Portable Document Format (PDF), 84–85

  • Portable Network Graphics (PNG), 84

  • PoS (point-of-sale) data, 95

  • position, measures of, 274

  • positive correlation, 250–251

  • Post method, 123

  • pound sign (#), 200

  • Power BI, 217–218, 302

    • measuring data quality with, 421

    • mockups/wireframes, 329

    • scheduled deliveries in, 333

  • Power Query Editor, 165

  • PowerPoint

    • report options in, 302

    • versioning tools, 317

  • practice exams, 462

  • predictive analysis, 266–267

  • preparation for CompTIA Data+ exam

    • cheat sheet, 460–461

    • NDA (nondisclosure agreement), 460

    • post-exam career tips, 465

    • question types, 459–460

    • registration for, 459, 463

    • test-taking tips, 461–465

  • prescriptive analysis, 267

  • primary audience, 293

  • primary colors, in reports, 307

  • primary data sources, 260

  • primary keys, 395

  • privacy and security requirements. See security requirements

  • private APIs (application programming interfaces), 122

  • private information, 402

  • product gaps, 263

  • production, deploying dashboards to, 331–332

  • profiling, data, 426

  • profiling and cleansing data, 139–149, 430–432

    • data outliers, 146–148

    • data quality and, 139

    • data type validation, 148–149

    • definition of, 258

    • duplicate data, 140–141

    • goals of, 140

    • invalid data, 145–146

    • missing values, 143–145

    • non-parametric data, 146

    • redundant data, 141–143

    • specification mismatches, 148

  • profit gaps, 263

  • projections to achieve goals, 272

  • prolog, XML (Extensible Markup Language), 105

  • proprietary formats, 75

  • PSP (PlayStation Portable), 79

  • public APIs (application programming interfaces), 122

  • public databases, 129

  • public information, 403

  • purging data, 392

  • Put method, 123

  • p-values, 243–244

  • Python, 211


  • Qlik, 218–219

    • measuring data quality with, 421

    • scheduled deliveries in, 333

  • QT (QuickTime) file format, 80

  • quality, 139, 413–437

    • business rules, 426

    • circumstances to check for

      • data acquisition/data source, 419

      • data manipulation, 421

      • data quality life cycle overview, 418

      • data transformation/intrahops, 419–420

      • final product and reports, 421–422

    • data profiling, 426, 430–432

    • data validation methods, 430

    • data validation rules, 426–427

      • conformity/non-conformity, 427–428

      • rows passed/rows failed, 429–430

    • expectations for, 434

    • metrics, 413–414

      • accessibility, 417

      • accuracy, 415

      • attribute limitations, 417

      • completeness, 416

      • consistency, 414–415

      • integrity, 416

      • validity, 416

    • validity, 416

  • quartile deviation, 232

  • queries

    • optimization and testing, 184–202

    • parameterized, 187–190

    • tools

      • data lakes, 21

      • data warehouses, 17

  • QuickSight (AWS), 221

  • QuickTime (QT) file format, 80

  • quotation marks (“), 102


  • R, 213

  • RACI matrix, 391

  • random missing values, 143–144

  • random sampling, 133

  • range, 231–232

  • Rapid Miner, 214

  • raster images, 81–84

  • ratio data attributes, 325

  • RAW audio files, 78

  • RBAC (role-based access control), 121, 331, 372

  • RDBMSs (relational database management systems), 9

  • recoding data, 156–159

  • reconciliation, data, 156

  • record link restrictions, 394

  • records, subsets of, 200–202

  • recurring reports, 363–364

  • RedShift, 10

  • reduction of data, 167–168

  • redundant data, 141–143

  • reference citations, 318–319

  • refresh rates, 79

  • registration for CompTIA Data+ exam, 459, 463

  • regression, simple linear, 248–249

  • regulation-enforced data deletion, 392

  • regulations. See also legislation

    • compliance with, 456

    • impact of, 406

  • relational database management systems (RDBMSs), 9

  • relational databases, 7–9

    • advantages of, 8–9

    • definition of, 7

    • Excel spreadsheets compared to, 8

    • non-relational databases compared to, 12–13

    • simple example of, 7

  • relationship cardinality, 395–396

  • relationships, in data analysis, 268

  • relative measures of dispersion, 231, 234

  • release approvals, 374

  • Remote Procedure Call (RPC), 122

  • replacing missing values, 144

  • reports, 279–295, 301–313, 359–364. See also charts/graphs

    • ad hoc/one-time, 362–363

    • analytical versus informational, 283

    • audience for, 292–295

    • content of, 282–284

    • cover pages, 301–302

    • creating, 359

    • data breach reporting, 407–408

    • date range for, 288–290

    • design elements, 306–313

    • documentation, 316–320

    • dynamic, 361–362

    • executive summaries in, 303

    • executive summary, 304–306

    • filters in, 285–287

    • formal versus informal, 282

    • frequency of, 291

    • high-level objectives of, 279–282

    • improving data quality in, 421–422

    • instructions, 303

    • KPIs (key performance indicators)

      • selection of, 284

      • tracking against defined goals, 271–272

    • recurring, 363–364

    • self-service/on-demand, 363

    • static, 359–360

    • tactical/research, 364

    • views of, 287–288

  • requirements gathering, 257–258

  • REST (representational state transfer), 122

  • restricted information, 403

  • retention, data, 392–393

  • RFC 8259, 102

  • ROI (return on investment), 114

  • role-based access control (RBAC), 121, 331, 372

  • roll-up, 334

  • rows, 92–94

  • rows passed/rows failed rules, 429–430

  • RPC (Remote Procedure Call), 122

  • RSA, 377

  • rules

    • business, 426

    • validation, 426–427

      • conformity/non-conformity, 427–428

      • rows passed/rows failed, 429–430


  • S3 (AWS), 385

  • SaaS (software as a service), 220

    • solutions

      • AWS QuickSight, 221

      • Datorama, 220

      • Power BI Service, 218

      • Qlik Sense, 219

  • Salesforce, 220, 326

  • sampling, 132–133, 433

  • SAP, 219, 447

  • Sarbanes-Oxley Act (SOX), 379

  • SAS (Statistical Analytical System), 215–216

  • Scalable Vector Graphics (SVG), 84

  • scatter plots, 249, 274, 345–346

  • SCDs (slowly changing dimensions), 46–50

    • definition of, 46

    • Type 0, 47

    • Type 1, 48

    • Type 2, 48–49

    • Type 3, 49

    • Type 4, 49–50

  • scheduled delivery, 332–333

  • schemas, database, 32–43

    • importance of, 34–35

    • logical, 33

    • physical, 33

    • snowflake, 34, 41–44

      • characteristics of, 43

      • definition of, 41

      • examples of, 41–42

      • structure of, 41

    • star, 34, 38–40

      • characteristics of, 40

      • definition of, 38

      • examples of, 39–40

      • snowflake schema versus, 43–44

      • structure of, 38

    • view, 33

  • scrubbing data. See cleansing data

  • search engine optimization (SEO), 334

  • second normal form (2NF), 170

  • secondary audience, 293

  • secondary colors, in reports, 307

  • secondary data sources, 260

  • secure wired access, 378

  • secure wireless access, 378

  • security requirements, 374–381

    • CIA triad, 375

    • dashboards, 335

    • data de-identification/masking, 379–381

    • data encryption, 375–377, 378

    • data transmission, 378–379

    • GDPR (General Data Protection Regulation), 201, 374–375

  • self-service reports, 363

  • semi-structured data, 96

  • Sense, measuring data quality with, 421

  • sensitive PII information, 402

  • SEO (search engine optimization), 334

  • SGML (Standard Generalized Markup Language), 104

  • sharding, data, 202

  • shared drives, 384

  • shareholder dashboards, 328

  • SHOW TABLES statement, 198

  • signed char data type, 56

  • simple linear regression, 248–249

  • Simple Object Access Protocol (SOAP), 122

  • single-tier architecture, 17

  • size

    • of data types, 56

    • of report fonts, 310–311

  • slowly changing dimensions. See SCDs (slowly changing dimensions)

  • Snowflake, 21

  • snowflake schemas, 34, 41–44

    • characteristics of, 43

    • definition of, 41

    • examples of, 41–42

    • structure of, 41

  • SOAP (Simple Object Access Protocol), 122

  • sociograms, 274

  • software as a service. See SaaS (software as a service)

  • Sony Walkman, 79

  • sorting data, 172–173

  • sources, data collection, 258, 260–261

  • sovereignty, data, 406

  • SOX (Sarbanes-Oxley Act), 379

  • Spearman correlation test, 146

  • specification mismatches, 148

  • SPICE (Super-fast, Parallel, In-memory Calculation Engine), 221

  • spiders, 129

  • split complementary color schemes, 308

  • spontaneous observation, 133–134

  • spot checking, 432–433

  • spreadsheets, 8, 211–213. See also Microsoft Excel

  • SPSS (Statistical Package for the Social Sciences), 157, 214–215

  • SQL (Structured Query Language), 8, 188, 210–211. See also MySQL

  • square brackets ([ ]), 103

  • stacked charts, 352353

  • standard deviation, 233–234

  • Standard Generalized Markup Language (SGML), 104

  • standardization of names, 448–451

  • star schemas, 34, 38–40

    • characteristics of, 40

    • definition of, 38

    • illustration of, 38

    • snowflake schema versus, 43–44

    • structure of, 39–40

  • Stata, 221

  • statements, 188

    • CREATE INDEX, 196



    • DROP TABLE, 199

    • IS NOT NULL, 96

    • IS NULL, 96

    • SHOW TABLES, 198

  • static data, 326–327

  • static data masking, 379–380

  • static reports, 359–360

  • statistical analysis, 266

  • Statistical Analytical System (SAS), 215–216

  • statistical methods

  • Statistical Package for the Social Sciences (SPSS), 157, 214–215

  • stewards, data, 368

  • storage environment requirements, 383–386

    • cloud-based storage, 385–386

    • local storage, 384–385

    • shared drives, 384

    • storage sizes, 56

  • strategy gaps, 263

  • stratified sampling, 133

  • streamlined data access, MDM (master data management) for, 456–457

  • string data type, 58–59

  • strings

  • structure observation, 133–134

  • structured data, 90–94

    • abstract view of, 91

    • examples of, 91–92

    • key/value pairs, 94

    • rows/columns, 92–94

    • unstructured data compared to, 90–91

  • Structured Query Language. See MySQL; SQL (Structured Query Language)

  • style guides, 312–313

  • subqueries, optimization of, 187

  • subscriptions, 332

  • subsets of records, 200–202

  • SUM function, 177

  • summary section, reports, 303, 304–306

  • Super-fast, Parallel, In-memory Calculation Engine (SPICE), 221

  • superficial redundancy, 142

  • surveys, 129–132

  • SVG (Scalable Vector Graphics), 84

  • symmetric encryption, 376

  • system functions, 179

  • systematic sampling, 133


  • tab-delimited files, 100–101

  • Tableau, 216–217, 302, 465

    • measuring data quality with, 421

    • mockups/wireframes, 329

    • reports. See reports

    • scheduled deliveries in, 333

  • tables

  • tab-separated values (TSV), 100–101

  • tactical reports, 364

  • target audience, 293

  • telephone surveys, 131

  • temporary tables, 197–200

  • tertiary audience, 293

  • tertiary colors, in reports, 307

  • testing

  • test-taking tips, 461–465

  • tetradic color schemes, 308

  • text analysis, 266

  • text data type, 64

  • text databases, 99–100

  • text/flat files, 99–100

  • third normal form (3NF), 170–171

  • TIBCO, 447

  • time, comparison of data over, 268–271

    • comparison, 270

    • composition, 269

    • distribution, 270–271

    • relationships, 268

  • timely integration, 455

  • tracking of measurements against defined goals, 271

  • transactional processing. See OLTP (Online Transactional Processing)

  • transform stage

    • ELT (extract, load, and transform) process, 119

    • ETL (extract, transform, and load) process, 116

  • transformation, data, 419–420

  • transmission, data, 378–379

  • TRANSPOSE function, 169

  • transposition of data, 168–169

  • tree maps, 351–352

  • trend analysis, 267–268

  • triadic color schemes, 308

  • TSV (tab-separated values), 100–101

  • t-tests, 242–243

  • Twitter, 213

  • two-sample t-tests, 243

  • two-tier architecture, 17

  • Type 0 SCDs (slowly changing dimensions), 47

  • Type 1 SCDs (slowly changing dimensions), 48

  • Type 2 SCDs (slowly changing dimensions), 48–49

  • Type 3 SCDs (slowly changing dimensions), 49

  • Type 4 SCDs (slowly changing dimensions), 49–50

  • type I errors, 247

  • type II errors, 247

  • types of data. See also data types

  • typography, reports, 310–311


  • U test, 146

  • Uber, 213

  • undefined values, 96

  • uniqueness constraints, 395

  • unsigned char data type, 56

  • unstructured data

    • abstract view of, 94–95

    • processing and analysis of, 95–96

    • structured data compared to, 90–91

  • UPDATE function, 179

  • use requirements, 389–393

    • AUPs (acceptable use policies), 389–390

    • data deletion, 391–392

    • data processing, 390–391

    • data retention, 392–393

  • user group-based access control, 373

  • user request-based data deletion, 391

  • utilization of data, 455


  • validation

  • validity of data, 416

  • VARCHAR data type, 64

  • variable categories, 167

  • variables

  • variance, 233

  • vector images, 81–85

  • vendor dashboards, 328

  • version numbers, 317

  • video, 81–85

    • file formats, 79–81

    • refresh rates, 79

  • VideoLAN, 79

  • view schemas, 33

  • views

    • business reports, 287288

    • definition of, 33

  • Virginia Consumer Data Privacy Act, 375

  • virtual private networks (VPNs), 378

  • Visme, 302, 353

  • Vista, 353

  • Visual Query Language (VQL), 217

  • Vorbis, 81

  • VPNs (virtual private networks), 378

  • VQL (Visual Query Language), 217


  • W3C (World Wide Web Consortium), 84

  • warehouses, data. See data warehouses

  • waterfall charts, 348

  • watermarks, 313

  • WAV (Waveform Audio File Format), 75

  • Waveform Audio File Format (WAV), 75

  • web scraping, 128–129

  • web services, 121–125

  • WebM file format, 81

  • WebP file format, 84

  • WEP (Wired Equivalent Privacy), 378

  • Wi-Fi Protected Access 2 (WPA2), 378

  • Wi-Fi Protected Access 3 (WPA3), 378

  • wildcard searches, optimization of, 186

  • Windows Media Audio (WMA), 78

  • Windows Media Video (WMV), 80

  • wired access, secure, 378

  • Wired Equivalent Privacy (WEP), 378

  • wireframes, 329–331

  • wireless access, secure, 378

  • WMA (Windows Media Audio), 78

  • WMV (Windows Media Video), 80

  • Word

    • citation options, 318–319

    • versioning tools, 317

  • word clouds, 354–357

  • World Wide Web Consortium (W3C), 84

  • WPA2 (Wi-Fi Protected Access 2), 378

  • WPA3 (Wi-Fi Protected Access 3), 378


