Lists of Figures, Tables and Algorithms

LIST OF FIGURES

INTRODUCTION

I.1. Pipelined vs. simultaneous execution

I.2. Superscalar execution

I.3. Simple superscalar pipelined steps

I.4. VLIW processors

I.5. Block diagram of a TTA

CHAPTER 1

1.1. Classes of phase-ordering problems

1.2. Classes of best-parameters problems

CHAPTER 2

2.1. The ST220 cumulative resource availabilities and resource class requirements

2.2. Source code and the inner loop body code generator representation

2.3. The block scheduled loop body and the block schedule resource table

2.4. The software pipeline local schedule and the software pipeline resource table

CHAPTER 3

3.1. The time-indexed dependence inequalities of Christofides et al. [CHR 87]

CHAPTER 4

4.1. Original dependence graph

4.2. Augmented dependence graph

4.3. A reservation table, a regular reservation table and a reservation vector

4.4. Counted loop software pipeline construction with and without preconditioning

4.5. While-loop software pipeline construction without and with modulo expansion

4.6. The st200cc R3.2 compiler performances on the HP benchmarks

4.7. Definition of the contributions to the register pressure

4.8. Sample cyclic instruction scheduling problem

CHAPTER 5

5.1. Sample schedules for a two-resource scheduling problem (horizontal time)

5.2. Scoreboard Scheduling within the time window (window_size = 4)

5.3. Scoreboard Scheduling and moving the time window (window_size = 4)

5.4. Benchmark basic blocks and instruction scheduling results

5.5. Time breakdown for cycle scheduling and scoreboard scheduling

CHAPTER 6

6.1. Example of Alpha 21264 processor

6.2. Example of Power 4 processor

6.3. Cache behavior of Itanium 2 processor

6.4. Vectorization on Itanium 2

6.5. Stride patterns classification

CHAPTER 7

7.1. DAG example with acyclic register need

7.2. Periodic register need in software pipelining

7.3. Circular lifetime intervals

7.4. Relationship between the maximal clique and the width of a circular graph

7.5. Examples of DDG with unique possible killer per value

CHAPTER 8

8.2. Valid killing function and bipartite decomposition

8.3. Example of computing the acyclic register saturation

CHAPTER 9

9.1. Example for SIRA and reuse graphs

CHAPTER 10

10.1. Linear program based on shortest paths equations (SPE)

CHAPTER 11

11.1. Minimal unroll factor computation depending on phase ordering

11.2. Example to highlight the shortcomings of the MVE technique

11.3. SWP kernel unrolled with MVE

11.4. Example to explain the optimality of the meeting graph technique

11.5. Example for SIRA and reuse graphs

11.6. Graphical solution for the fixed loop unrolling problem

11.7. How to traverse the lattice S

11.8. Modifying reuse graphs to minimize loop unrolling factor

11.9. Loop unrolling values in the search space S

11.10. Example of loop unrolling reduction using meeting graph

11.11. The new search space S in the meeting graph

CHAPTER 12

12.1. Observed execution times of some SPEC OMP 2001 applications (compiled with gcc)

12.2. The Speedup-Test protocol for analyzing the average execution time

12.3. The Speedup-Test protocol for analyzing the median execution time

APPENDIX 1

A1.1. Histograms on the number of nodes (loop statements): ||V||

A1.2. Histograms on the number of statements writing inside general registers ||V^R,GR||

A1.3. Histograms on the number of statements writing inside branch registers ||V^R,BR||

A1.4. Histograms on the number of data dependences ||E||

A1.5. Histograms on MinII values

A1.6. Histograms on the number of strongly connected components

APPENDIX 2

A2.1. Accuracy of the GREEDY-K heuristic versus optimality

A2.2. Error ratios of the GREEDY-K heuristic versus optimality

A2.3. Execution times of the GREEDY-K heuristic

A2.4. Maximal periodic register need vs. initiation interval

A2.5. Periodic register saturation in unrolled loops

APPENDIX 3

A3.1. Percentage of DDG treated successfully by SIRALINA and the impact on the MII

A3.2. Average increase of the MII

A3.3. Boxplots of the execution times of SIRALINA (all DDG)

A3.4. Plugging SIRA into the ST231 compiler toolchain (LAO backend)

A3.5. The impact of SIRA on static code quality

A3.6. Loops where spill code disappears completely

A3.7. Speedups of the whole application using the standard input

A3.8. Performance characterization of some applications

A3.9. Performance characterization of the FFMPEG application

APPENDIX 4

A4.1. Execution times of UAL (in seconds)

A4.2. Execution times of CHECK (in seconds)

A4.3. Execution times of SPE (in seconds)

A4.4. Maximum observed number of iterations for SPE

A4.5. Comparison of the heuristics' ability to reduce the register pressure (SPEC2000)

A4.6. Comparison of the heuristics' ability to reduce the register pressure (MEDIABENCH)

A4.7. Comparison of the heuristics' ability to reduce the register pressure (SPEC2006)

A4.8. Comparison of the heuristics' ability to reduce the register pressure (FFMPEG)

APPENDIX 5

A5.1. Loop unrolling minimization experiments (random DDG, single register type)

A5.2. Average code compaction ratio (random DDG, single register type)

A5.3. Weighted harmonic mean for minimized loop unrolling degree

A5.4. Initial versus final loop unrolling in each configuration

A5.5. Observations on loop unrolling minimization

A5.6. Final loop unrolling factors after minimization

APPENDIX 6

A6.1. Execution time repartition for Spec benchmarks

A6.2. Efficiency of data prefetching and preloading. Note that prefetching is not applicable to all applications

A6.3. Initial and modified code sizes

APPENDIX 7

A7.1. The sample minimum is not necessarily a good estimation of the theoretical minimum

LIST OF TABLES

INTRODUCTION

I.1. Other contributors to the results presented in this book

CHAPTER 3

3.1. Polynomial-time solvable parallel machine scheduling problems

3.2. NP-hard parallel machine scheduling problems

3.3. Performance guarantees of the GLSA with arbitrary priority

3.4. Performance guarantees of the GLSA with a specific priority

3.5. Problems solved by the algorithm of Leung, Palem and Pnueli [LEU 01] steps 1 and 2

CHAPTER 6

6.1. Examples of measured performance degradation factors

6.2. Worst-case performance gain on Alpha 21264

6.3. Worst-case performance gain on Itanium 2

6.4. Examples of code and data regularity/irregularity

6.5. Examples of prefetch: simple case and using extra register case

CHAPTER 12

12.1. Number of non-statistically significant speedups in the tested benchmarks

APPENDIX 2

A2.1. Optimal versus approximate PRS

APPENDIX 5

A5.1. Machine with bounded number of registers

A5.2. Machine with bounded registers with option continue

A5.3. Number of unrolled loops compared to the number of resulting spilled loops (using meeting graphs)

A5.4. Arithmetic mean of initial loop unrolling, final loop unrolling and ratio

A5.5. Comparison between final loop unrolling factors and MAXLIVE

A5.6. Optimized loop unrolling factors of scheduled versus unscheduled loops

APPENDIX 7

A7.1. Monte Carlo simulation of a Gaussian distribution

A7.2. The two risk levels for hypothesis testing in statistical and probability theory

A7.3. SPEC OMP2001 on low-overhead environment

A7.4. SPECCPU2006 executed on low-overhead environment

A7.5. SPEC OMP2001 on high-overhead environment

A7.6. SPECCPU2006 on high-overhead environment

LIST OF ALGORITHMS

CHAPTER 1

1.1. Computing a good compilation sequence in the compilation cost model

1.2. Optimize_Node(n)

CHAPTER 8

8.1. GREEDY-K heuristic

CHAPTER 10

10.1. The algorithm IterativeSIRALINA

10.2. The function UpdateReuseDistances

CHAPTER 11

11.1. Fixed loop unrolling problem

11.4. LCM-MIN algorithm

11.5. General fixed loop unrolling problem
