I.3. Basics on instruction-level parallelism processor architectures
Part 1 Prolog: Optimizing Compilation
1 On The Decidability Of Phase Ordering In Optimizing Compilation
1.1. Introduction to the phase ordering problem
1.2. Background on phase ordering
1.3. Toward a theoretical model for the phase ordering problem
1.4. Examples of decidable simplified cases
1.5. Compiler optimization parameter space exploration
1.6. Conclusion on phase ordering in optimizing compilation
2 Instruction Scheduling Problems And Overview
2.1. VLIW instruction scheduling problems
2.3. Instruction scheduling and register allocation
3 Applications Of Machine Scheduling To Instruction Scheduling
3.1. Advances in machine scheduling
3.2. List scheduling algorithms
3.3. Time-indexed scheduling problem formulations
4 Instruction Scheduling Before Register Allocation
4.1. Instruction scheduling for an ILP processor: case of a VLIW architecture
4.2. Large neighborhood search for the resource-constrained modulo scheduling problem
4.3. Resource-constrained modulo scheduling problem
4.4. Time-indexed integer programming formulations
4.5. Large neighborhood search heuristic
5 Instruction Scheduling After Register Allocation
5.2. Local instruction scheduling
5.3. Global instruction scheduling
6 Dealing In Practice With Memory Hierarchy Effects And Instruction Level Parallelism
6.1. The problem of hardware memory disambiguation at runtime
6.2. Data preloading and prefetching
7 The Register Need Of A Fixed Instruction Schedule
7.1. Data dependence graph and processor model for register optimization
7.2. The acyclic register need
7.3. The periodic register need
7.4. Computing the periodic register need
7.5. Some theoretical results on the periodic register need
7.6. Conclusion on the register requirement
8.1. Motivations on the register saturation concept
8.2. Computing the acyclic register saturation
8.3. Computing the periodic register saturation
8.4. Conclusion on the register saturation
9.1. Introduction to register constraints in software pipelining
9.2. Related work in periodic register allocation
9.3. SIRA: schedule independant register allocation
9.4. SIRALINA: an efficient polynomial heuristic for SIRA
9.5. Experimental results with SIRA
9.6. Conclusion on spill code reduction
10 Exploiting The Register Access Delays Before Instruction Scheduling
10.2. Problem description of DDG circuits with non-positive distances
10.3. Necessary and sufficient condition to avoid non-positive circuits
10.4. Application to the SIRA framework
10.5. Experimental results on eliminating non-positive circuits
10.6. Conclusion on non-positive circuit elimination
11 Loop Unrolling Degree Minimization For Periodic Register Allocation
11.3. Problem description of unroll factor minimization for unscheduled loops
11.4. Algorithmic solution for unroll factor minimization: single register type
11.5. Unroll factor minimization in the presence of multiple register types
11.6. Unroll factor reduction for already scheduled loops
11.9. Conclusion on loop unroll degree minimization
Part 4 Epilog: Performance, Open Problems
12 Statistical Performance Analysis: The Speedup-Test Protocol
12.1. Code performance variation
12.2. Background and notations
12.3. Analyzing the statistical significance of the observed speedups
12.4. The Speedup-Test software
12.5. Evaluating the proportion of accelerated benchmarks by a confidence interval
12.6. Experiments and applications
12.8. Discussion and conclusion on the Speedup-Test protocol
Appendix 1 Presentation Of The Benchmarks Used In Our Experiments
A1.1. Qualitative benchmarks presentation
A1.2. Quantitative benchmarks presentation
A1.3. Changing the architectural configuration of the processor
Appendix 2 Register Saturation Computation On Stand-Alone Ddg
A2.1. The cyclic register saturation
A2.2. The periodic register saturation
Appendix 3 Efficiency Of Sira On The Benchmarks
A3.1. Efficiency of SIRALINA on stand-alone DDG
A3.2. Efficiency of SIRALINA plugged with an industrial compiler
Appendix 4 Efficiency Of Non-Positive Circuit Elimination In The Sira Framework
A4.2. Comparison between the heuristics execution times
A4.3. Convergence of the proactive heuristic (iterative SIRALINA)
A4.4. Qualitative analysis of the heuristics
A4.5. Conclusion on non-positive circuit elimination strategy
Appendix 5 Loop Unroll Degree Minimization: Experimental Results
A5.1. Stand-alone experiments with single register types
A5.2. Experiments with multiple register types
Appendix 6 Experimental Efficiency Of Software Data Preloading And Prefetching For Embedded Vliw
Appendix 7 Appendix Of The Speedup-Test Protocol
A7.2. Hypothesis testing in statistical and probability theory
A7.3. What is a reasonable large sample? Observing the central limit theorem in practice