Contents

Introduction

I.1. Inside this book

I.2. Other contributors

I.3. Basics on instruction-level parallelism processor architectures

Part 1 Prolog: Optimizing Compilation

1 On The Decidability Of Phase Ordering In Optimizing Compilation

1.1. Introduction to the phase ordering problem

1.2. Background on phase ordering

1.3. Toward a theoretical model for the phase ordering problem

1.4. Examples of decidable simplified cases

1.5. Compiler optimization parameter space exploration

1.6. Conclusion on phase ordering in optimizing compilation

Part 2 Instruction Scheduling

2 Instruction Scheduling Problems And Overview

2.1. VLIW instruction scheduling problems

2.2. Software pipelining

2.3. Instruction scheduling and register allocation

3 Applications Of Machine Scheduling To Instruction Scheduling

3.1. Advances in machine scheduling

3.2. List scheduling algorithms

3.3. Time-indexed scheduling problem formulations

4 Instruction Scheduling Before Register Allocation

4.1. Instruction scheduling for an ILP processor: case of a VLIW architecture

4.2. Large neighborhood search for the resource-constrained modulo scheduling problem

4.3. Resource-constrained modulo scheduling problem

4.4. Time-indexed integer programming formulations

4.5. Large neighborhood search heuristic

4.6. Summary and conclusions

5 Instruction Scheduling After Register Allocation

5.1. Introduction

5.2. Local instruction scheduling

5.3. Global instruction scheduling

5.4. Experimental results

5.5. Conclusions

6 Dealing In Practice With Memory Hierarchy Effects And Instruction Level Parallelism

6.1. The problem of hardware memory disambiguation at runtime

6.2. Data preloading and prefetching

Part 3. Register Optimization

7 The Register Need Of A Fixed Instruction Schedule

7.1. Data dependence graph and processor model for register optimization

7.2. The acyclic register need

7.3. The periodic register need

7.4. Computing the periodic register need

7.5. Some theoretical results on the periodic register need

7.6. Conclusion on the register requirement

8 The Register Saturation

8.1. Motivations on the register saturation concept

8.2. Computing the acyclic register saturation

8.3. Computing the periodic register saturation

8.4. Conclusion on the register saturation

9 Spill Code Reduction

9.1. Introduction to register constraints in software pipelining

9.2. Related work in periodic register allocation

9.3. SIRA: schedule independant register allocation

9.4. SIRALINA: an efficient polynomial heuristic for SIRA

9.5. Experimental results with SIRA

9.6. Conclusion on spill code reduction

10 Exploiting The Register Access Delays Before Instruction Scheduling

10.1. Introduction

10.2. Problem description of DDG circuits with non-positive distances

10.3. Necessary and sufficient condition to avoid non-positive circuits

10.4. Application to the SIRA framework

10.5. Experimental results on eliminating non-positive circuits

10.6. Conclusion on non-positive circuit elimination

11 Loop Unrolling Degree Minimization For Periodic Register Allocation

11.1. Introduction

11.2. Background

11.3. Problem description of unroll factor minimization for unscheduled loops

11.4. Algorithmic solution for unroll factor minimization: single register type

11.5. Unroll factor minimization in the presence of multiple register types

11.6. Unroll factor reduction for already scheduled loops

11.7. Experimental results

11.8. Related work

11.9. Conclusion on loop unroll degree minimization

Part 4 Epilog: Performance, Open Problems

12 Statistical Performance Analysis: The Speedup-Test Protocol

12.1. Code performance variation

12.2. Background and notations

12.3. Analyzing the statistical significance of the observed speedups

12.4. The Speedup-Test software

12.5. Evaluating the proportion of accelerated benchmarks by a confidence interval

12.6. Experiments and applications

12.7. Related work

12.8. Discussion and conclusion on the Speedup-Test protocol

Conclusion

Appendix 1 Presentation Of The Benchmarks Used In Our Experiments

A1.1. Qualitative benchmarks presentation

A1.2. Quantitative benchmarks presentation

A1.3. Changing the architectural configuration of the processor

Appendix 2 Register Saturation Computation On Stand-Alone Ddg

A2.1. The cyclic register saturation

A2.2. The periodic register saturation

Appendix 3 Efficiency Of Sira On The Benchmarks

A3.1. Efficiency of SIRALINA on stand-alone DDG

A3.2. Efficiency of SIRALINA plugged with an industrial compiler

Appendix 4 Efficiency Of Non-Positive Circuit Elimination In The Sira Framework

A4.1. Experimental setup

A4.2. Comparison between the heuristics execution times

A4.3. Convergence of the proactive heuristic (iterative SIRALINA)

A4.4. Qualitative analysis of the heuristics

A4.5. Conclusion on non-positive circuit elimination strategy

Appendix 5 Loop Unroll Degree Minimization: Experimental Results

A5.1. Stand-alone experiments with single register types

A5.2. Experiments with multiple register types

Appendix 6 Experimental Efficiency Of Software Data Preloading And Prefetching For Embedded Vliw

Appendix 7 Appendix Of The Speedup-Test Protocol

A7.1. Why is the observed minimal execution time not necessarily a good statistical estimation of program performances?

A7.2. Hypothesis testing in statistical and probability theory

A7.3. What is a reasonable large sample? Observing the central limit theorem in practice

Bibliography

Lists Of Figures, Tables And Algorithms

LIST OF FIGURES

LIST OF TABLES

LIST OF ALGORITHMS

Index

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset