Index

Note: Page numbers followed by f indicate figures and t indicate tables.

A

Advanced Configuration and Power Interface (ACPI) specification
AML interpreter 44
C states 46–48
definition 44
EIST 45
P states 45–46
T states 44–45
Apitrace 185–187, 186f, 187f
Application Binary Interface (ABI) 
calling conventions 215–216
definition 214
natural alignment 215
Array of structures (AOS) 267–269
Asymmetric mode 50

B

blk tracer 171–174, 172t
Branching 
CC instruction 243–244, 245
CMP instruction 245
conditional branch 239
JCC instruction 243–244, 245
masks 241, 242
optional code 241
prediction accuracy 240–241
profile guided optimization 246–247
shift instructions 242
size calculation 242, 243
speculative execution 239–240
unconditional branch 239
Bus interface unit (BIU) 6–7

C

Caching 
associative cache 251
cache hit 249
cache line 251–252
cache miss 249
CPUID 253–256
data locality 257–259
dcache 251
icache 251
level 1 (L1) cache 251
level 2 (L2) cache 251
level 3 (L3) cache 251
LLC 250, 251
prefetching 256–257
storage hierarchy 249, 250f
sysfs interface 256
temporal data locality 249
Clock time 82–84
Collect action 
algorithmic/microarchitectural analyses 123
analysis type 123
command-line interface 123–124
configuration 125–127
graphical interface 124, 124f
results directory 124
XML file 123
Command-line interface 
analysis type  See (Collect action)
apitrace 185
callstacks report type 128
finalize action 127
general-exploration analysis 128–129
gprof-cc report type 128
hotspots report type 128
Phoronix Test Suite 87–88
result-dir arguments 127–128
summary report type 128
sysprof-module 200
top-down report type 128
Compiler flags (CFLAGS) 207–209, 207t, 208t
Complex Instruction Set Computer (CISC) 35–36
Conditional branch 239, 240, 241, 243
Controllable external variables 
CPU migrations 76–77
disk performance, variability of 75
HTML page 74–75
microbenchmarks 74
mmap(2) system call 76
noisy system 77
P and C states 76
remote resources 75
tmpfs file 75–76
Control unit (CU) 20
Counted event configuration 143–144, 144t
CPU cycles 
code snippet 82
frequency 80
TSC 80, 81–82
CPU dispatch 
CPUID 217–220
Procfs 221–222
runtime dispatching 222–225
CPU Identification instruction (CPUID) 217–220, 253–256
CRC32 275
Custom tests 
downloads.xml 96
install.sh script 96–97
new directory 95
results-definition.xml file 97
support-check.sh script 96
test-definition.xml file 95–96

D

Data collection 
analysis type  See (Collect action)
buGLe 182
command line interface 123
configuration option 122, 122f, 123
finalize action 127
graphical interface 122
kernelshark tool 175
LatencyTOP 199
perf tool 156–163
trace and trace_pipe file 168
DebugFS 
available_tracers file 168
blk tracer 171–174, 172t
busybox 166
current_tracer file 168
function and function_graph tracers 168–171
mount(1) command 167
nop tracer 174
trace and trace_pipe files 168
DecodedICache 110, 111
Direct Rendering Infrastructure (DRI) 
batchbuffer 180
DRM 179, 180
GEM 180
KMS support 180
Direct Rendering Manager (DRM) 179, 180

E

Enhanced Intel SpeedStep® technology (EIST) 45
Event configuration 143–144, 144t, 145t
Event counters 
APIC interrupt 106
bottleneck 107
edge detect bit 106
fixed-function 105
interface and hardware 105
microarchitectural performance analysis 107
MSR format 106
PEBS 106
programmable counters 107
user and operating system modes 106
Executable and Linkable Format (ELF) 
executable files 209
execution view 210
linking view 209–210
relocatable files 209
shared object files 209
Execution unit (EU) 6–7, 36–37
External variables 
uncontrollable 77–79

F

Fixed-function counter 105, 107
Flex Memory technology 49–50
Floating point unit (FPU) 20
Ftrace 
dynamic tracepoints 165–166
events 165
kernelshark 174–176, 175f
Linux distributions  See (DebugFS)
static tracepoints 165–166
function_graph tracer 168–171
function tracer 168–171

G

Git 
bisect 62–63
cleaning patches 63–65, 64f
sending patches 65–67, 66t, 67t
Global descriptor table (GDT) 22
Global Offset Table (GOT) 212–213
GNU code coverage (gcov) 192–193
GNU compiler toolchain (GCC)  See Toolchains
GNU profiler (gprof) 
binary data file 190
compiler support 189
flat profile 190
gcov 192–193
graph profile 190–192
limitations 192
workload testing 189–190
Graphical user interface (GUI) 
analysis type  See (Collect action)
finalize action 127
Phoronix Test Suite 87–88
report action 129–132
sysprof-module 200
Graphics Execution Manager (GEM) 180
Graphics processing unit (GPU) 
apitrace 185–187, 186f, 187f
buGLe 182–185
cairo 181–182
Mesa 180–181
pixman 182
Xlib/XCB library 178–179

I

IEEE 754 floating point 
C99 support 19
development 16
formats, precision, and environment 18–19
scientific notation 17
IFUNC 224–225
Instruction-level parallelism (ILP) 32, 33
Instruction pointer (IP) 9, 213, 243
Intel® 8086 
ADD instruction 11, 11t
BIU 6–7
boolean logic 12
data movement 11
execution unit 6–7
flow control 12
general purpose registers 8–9
IBM compatible PC 6
integer arithmetic 11–12
machine code format 13–16, 14t, 15t
operands 10–11
principles 5–6
ROM and BIOS 10
SDM 5
status registers 9–10
strings 12–13, 13t
system diagram 6, 6f
system state 7–8
x86 processor 5, 5f
Intel® 8087 
common and error-prone computations 16
IEEE 754 standard  See (IEEE 754 floating point)
x87 floating point 19–20
Intel® Advanced Vector Extensions (Intel® AVX) 263–264
Intel® AES New Instructions (AES-NI) 273–274
Intel® 80286 and 80287 
MMU 20–21
protected and real mode 21
protected mode segmentation 21–22
task control 22–23
Intel® 80386 and 80387 
32-bit mode 24–26
Linux kernel splits 28
Linux project 23–24
logical and linear addresses 26
page frame 26
PDE and PTE 26–27
TLB 27–28
Virtual 8086 mode 24
Intel® C and C++ Compilers (ICC) 220, 230, 235
Intel® Core™ processor 
Flex Memory technology 49–50
Intel HD graphics 49
Intel® Xeon® processor 48, 48f
RAPL 51–52
Turbo Boost technology 50–51
Intel® Graphics Media Accelerators (GMA) 49
Intel Hyper-Threading Technology 41
Intel® Pentium® 4 
Hyper-Threading 41
IA-32e mode 38–40
multi-core processor 40–41
P6 microarchitecture 38
Intel® Pentium® Pro 
µops 35–36
out-of-order execution 36–38
PAE 34–35
P6 architecture 34
Intel® Pentium® processors 
P5 microarchitecture 32
SIMD 32
superscalar 33–34
Intel® Threading Building Blocks (Intel® TBB) 262–263
Intel® VTune™ Amplifier XE 
apwr.ko driver 119
data collection  See (Data collection)
installation 118
kernel modules 119–120
pax driver 118
reporting  See (Report action)
sep driver 118
system configuration 120–121
vtsspp module 118–119

K

Kernel Mode Setting (KMS) support 180

L

Last Level Cache (LLC) 250, 251
LatencyTOP 199–200
libdrm 180–181
Linux dynamic tracepoints 165–166
Linux ftrace 
dynamic tracepoints 165–166
events 165
kernelshark 174–176, 175f
Linux distributions  See (DebugFS)
static tracepoints 165–166
Linux perf 
cooked/raw events 156–157, 157t
counting events 146–150
event selection  See (perf event)
measurement parameters 142, 142t, 144t, 145t
perf_event_open(2) function 136–137, 136t
perf record 158–161, 160f
perf stat 158
perf timechart 161, 162f, 163f
sampling events 150–156
Linux static tracepoints 165–166
Local descriptor table (LDT) 22
Low Level Virtual Machine toolchain (LLVM) 206, 230, 246, 247
L-shaped memory configuration 50

M

Machine code format 
CMP encoding 15–16
MOD field encodes 14, 15t
opcode 13
PUSH and POP instructions 14, 14t
REG field encodes 14, 15, 15t
variable length 13
Memory Type Range Registers (MTRR) 252
Microcode Sequencer 111
Micro-ops (µops)  See Top-down hierarchical analysis

N

Non-Uniform Memory Access (NUMA) 40
nop tracer 174
Numeric execution unit (NEU) 20

O

OpenGL 
apitrace 185–187, 186f, 187f
buGLe 182–185
Mesa 180–181
Open source project 
advantages 59–60
development method 59
Git  See (Git)
Linux distributions 60
results 61–62

P

Packed carry-less multiplication (PCLMUL) 274–275
Page Attribute Table (PAT) 252
Page directory entry (PDE) 26–27
Page directory pointer table (PDPT) 35, 38
Page table entry (PTE) 26–27
Parallelism 
COW 262
Hyper-Threading 41
Intel® 8086 5–6
Intel® TBB library 262–263
perf event 
data structure 137
event types 138–139
perf_event.h header file 138
PERF_TYPE_BREAKPOINT 142, 142t
PERF_TYPE_HARDWARE 139
PERF_TYPE_HW_CACHE 141–142, 141t, 142f
PERF_TYPE_RAW 139–140, 140t
PERF_TYPE_SOFTWARE 141
PERF_TYPE_TRACEPOINT 141
perf_event_attr.size field 143
Performance Monitoring Unit (PMU) 
architectural/nonarchitectural events 104
collection methodology 105
.CORE/.THREAD counts 104
event counters  See (Event counters)
top-down method  See (Top-down hierarchical analysis)
Performance workflow 
algorithmic tuning 67–68
architectural tuning 68–70
bottleneck 57
problem solving 54–56
reproducible experiment 57–59
testing 70–71
upstream  See (Open source project)
perf record 158–161, 160f
perf stat 158
perf timechart 161, 162f, 163f
perf tool 
cooked/raw events 156–157, 157t
counting events 146–150
event selection  See (perf event)
measurement parameters 142, 142t, 144t, 145t
perf_event_open(2) function 136–137, 136t
perf record 158–161, 160f
perf stat 158
perf timechart 161, 162f, 163f
sampling events 150–156
Phoronix Test Suite 
advantages 85
batch mode 94
benchmarks 88–89, 88t
command-line interface and GUI 87–88, 87f
commands 89
configuration 90
customs  See (Custom tests)
directory 90
execution 92–94
graphical table 85–87, 85f, 86f
installation 91–92
local tests and pts 89–90
OpenBenchmarking website 89
resources 98–100
test results 94–95
Physical Address Extensions (PAE) 34–35
Pipeline hazards 33
Position Independent Code (PIC) 9
CALL instruction 213
dynamic linking 211–212
LD_LIBRARY_PATH environment variable 214
LD_PRELOAD environment variable 214
relocatable code 212
static linking 210–211
writable code pages 213
PowerTOP 
Device Stats window 197, 198f
Frequency Stats window 196, 197f
Idle Stats window 195, 196f
ncurses-based interface 193
power consumption 193
tunables 197–199, 198f
wakeups 194–195, 194f
Precise Event Based Sampling (PEBS) 106
Procedure Linkage Table (PLT) 212–213
Process scheduling 23
Profiling 
apitrace tool 185–187, 186f, 187f
buGLe tool 182–185
definition 103
monitors 104

Q

Qapitrace 185

R

Reduced Instruction Set Computer (RISC) 35–36
Report action 
command-line interface 127–129
graphical interface 129–132
Running Average Power Limit (RAPL) 51–52

S

Sampled event configuration 144, 145t
Scale-index-base (SIB) byte 25
Single Instruction Multiple Data (SIMD) 
architectural tuning 68
data alignment 265–266
Intel® AVX 263–264
MOVDQA and MOVDQU instructions 264–265
operations 267–271, 268f
Pentium processor 32
SSE 263
vectorized and nonvectorized implementation 266–267
VMOVDQA and VMOVDQU instructions 264–265
Software Developer Manual (SDM) 5
Spatial locality 257–259
SSE4.2 instruction 275–276, 276f
Streaming SIMD Extensions (SSE) 263
Strings 
const qualifier 229
Intel® 8086 12–13, 13t
LatencyTOP 199–200
SSE4.2 275–276
Structure of arrays (SOA) 269–270
Sysprof 200–201, 201f

T

Temporal locality 257–259
Tick Tock model 44
Time-stamp counter (TSC) 80, 81–82
Timing 
clock time 82–84
cycle  See (CPU cycles)
monotonically increasing timer 79
process time 80
Unix time 82–84
wall time 80
Toolchains 
assembler 206
CFLAGS 207–209, 207t, 208t
compiler intrinsics 206, 232, 235–236
const qualifier 229
function dispatch  See (CPU dispatch)
inline assembly 232, 234–235, 236t
linker 206
loop unrolling 231
natural alignment 230–231
pointer aliasing 226–228
signed and unsigned qualifiers 228–229
standalone assembly 232–234
volatile qualifier 229
x86 and x86_64 architectures  See (Executable and Linkable Format (ELF))
Top-down hierarchical analysis 
bad speculation 115, 115f
bottleneck 107
Core Bound 114–115
decomposing µops 109, 109f
Front End Bound 110–111, 111f
Memory Bound 112–114
pipeline slot 108
retiring 115–116
Tracers 
available_tracers file 167, 168
blk tracer 171–174, 172t
current_tracer file 168
function and function_graph tracers 168–171
nop tracer 174
trace and trace_pipe files 168
tracing_on controls 168
Translation Lookaside Buffer (TLB) 27–28
Turbo Boost technology 50–51

U

Unconditional branch 239
Uncontrollable external variables 
data points 77
estimator 78
implementation 78–79
population and calculations 77, 78
standard deviation 78
Unix time 82–84

V

X

x87 floating point 19–20