Table of Contents

Cover image

Title page

Copyright

Dedication

Foreword

Preface

Organization of this book

Chapter Organization

How to Read this Book

Nota Bene

Glossary

Author Biography

Chapter 1: The Simple Life

Abstract

1.1 Simplification Drives Scientific Progress

1.2 The Human Mind is a Simplifying Machine

1.3 Simplification in Nature

1.4 The Complexity Barrier

1.5 Getting Ready

Open Source Tools

Glossary

Chapter 2: Structuring Text

Abstract

2.1 The Meaninglessness of Free Text

2.2 Sorting Text, the Impossible Dream

2.3 Sentence Parsing

2.4 Abbreviations

2.5 Annotation and the Simple Science of Metadata

2.6 Specifications Good, Standards Bad

Open Source Tools

Glossary

Chapter 3: Indexing Text

Abstract

3.1 How Data Scientists Use Indexes

3.2 Concordances and Indexed Lists

3.3 Term Extraction and Simple Indexes

3.4 Autoencoding and Indexing with Nomenclatures

3.5 Computational Operations on Indexes

Open Source Tools

Glossary

Chapter 4: Understanding Your Data

Abstract

4.1 Ranges and Outliers

4.2 Simple Statistical Descriptors

4.3 Retrieving Image Information

4.4 Data Profiling

4.5 Reducing Data

Open Source Tools

Glossary

Chapter 5: Identifying and Deidentifying Data

Abstract

5.1 Unique Identifiers

5.2 Poor Identifiers, Horrific Consequences

5.3 Deidentifiers and Reidentifiers

5.4 Data Scrubbing

5.5 Data Encryption and Authentication

5.6 Timestamps, Signatures, and Event Identifiers

Open Source Tools

Glossary

Chapter 6: Giving Meaning to Data

Abstract

6.1 Meaning and Triples

6.2 Driving Down Complexity with Classifications

6.3 Driving Up Complexity With Ontologies

6.4 The Unreasonable Effectiveness of Classifications

6.5 Properties That Cross Multiple Classes

Open Source Tools

Glossary

Chapter 7: Object-Oriented Data

Abstract

7.1 The Importance of Self-Explaining Data

7.2 Introspection and Reflection

7.3 Object-Oriented Data Objects

7.4 Working with Object-Oriented Data

Open Source Tools

Glossary

Chapter 8: Problem Simplification

Abstract

8.1 Random Numbers

8.2 Monte Carlo Simulations

8.3 Resampling and Permutating

8.4 Verification, Validation, and Reanalysis

8.5 Data Permanence and Data Immutability

Open Source Tools

Glossary

Index
