Table of Contents

Cover image

Title page

Copyright

Dedication

Introduction

Performance Apologetic

A Word on Premature Optimization

The Roadmap

Part 1: Background Knowledge

Chapter 1: Early Intel® Architecture

Abstract

1.1 Intel® 8086

1.2 Intel® 8087

1.3 Intel® 80286 and 80287

1.4 Intel® 80386 and 80387

Chapter 2: Intel® Pentium® Processors

Abstract

2.1 Intel® Pentium®

2.2 Intel® Pentium® Pro

2.3 Intel® Pentium® 4

Chapter 3: Intel® Core™ Processors

Abstract

3.1 Intel® Pentium® M

3.2 Second Generation Intel® Core™ Processor Family

Chapter 4: Performance Workflow

Abstract

4.1 Step 0: Defining the Problem

4.2 Step 1: Determine the Source of the Problem

4.3 Step 2: Determine Whether the Bottleneck Can Be Avoided

4.4 Step 3: Design a Reproducible Experiment

4.5 Step 4: Check Upstream

4.6 Step 5: Algorithmic Improvement

4.7 Step 6: Architectural Tuning

4.8 Step 7: Testing

4.9 Step 8: Performance Regression Testing

Chapter 5: Designing Experiments

Abstract

5.1 Choosing a Metric

5.2 Dealing with External Variables

5.3 Timing

5.4 Phoronix Test Suite

Part 2: Monitors

Chapter 6: Introduction to Profiling

Abstract

6.1 PMU

6.2 Top-Down Hierarchical Analysis

Chapter 7: Intel® VTune™ Amplifier XE

Abstract

7.1 Installation and Configuration

7.2 Data Collection and Reporting

Chapter 8: Perf

Abstract

8.1 Event Infrastructure

8.2 Perf Tool

Chapter 9: Ftrace

Abstract

9.1 DebugFS

9.2 Kernel Shark

Chapter 10: GPU Profiling Tools

Abstract

10.1 Traditional Graphics Stack

10.2 buGLe

10.3 Apitrace

Chapter 11: Other Helpful Tools

Abstract

11.1 GNU Profiler

11.2 Gcov

11.3 PowerTOP

11.4 LatencyTOP

11.5 Sysprof

Part 3: Optimization Techniques

Chapter 12: Toolchain Primer

Abstract

12.1 Compiler Flags

12.2 ELF and the x86/x86_64 ABIs

12.3 CPU Dispatch

12.4 Coding Style

12.5 x86 Unleashed

Chapter 13: Branching

Abstract

13.1 Avoiding Branches

13.2 Improving Prediction

Chapter 14: Optimizing Cache Usage

Abstract

14.1 Processor Cache Organization

14.2 Querying Cache Topology

14.3 Prefetch

14.4 Improving Locality

Chapter 15: Exploiting Parallelism

Abstract

15.1 SIMD

Chapter 16: Special Instructions

Abstract

16.1 Intel® Advanced Encryption Standard New Instructions (AES-NI)

16.2 PCLMUL - Packed Carry-Less Multiplication

16.3 CRC32

16.4 SSE4.2 String Functions

Index
