An optimizing compiler is a compiler designed to generate code that is optimized in aspects such as minimizing program execution time, memory usage, storage size, and power consumption.[1] Optimization is generally implemented as a sequence of optimizing transformations, a.k.a. compiler optimizations – algorithms that transform code to produce semantically equivalent code optimized for some aspect.
Optimization is limited by a number of factors. Theoretical analysis indicates that some optimization problems are NP-complete, or even undecidable.[2] Also, producing perfectly optimal code is not possible since optimizing for one aspect often degrades performance for another. Optimization is a collection of heuristic methods for improving resource usage in typical programs.[3]: 585
Scope describes how much of the input code a compiler considers when applying optimizations.
Local scope optimizations use information local to a basic block.[4] Since basic blocks contain no control flow statements, these optimizations require minimal analysis, reducing time and storage requirements. However, no information is retained across jumps.
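The following C sketch (illustrative, not taken from a particular compiler) shows the distinction: the first function body is a single basic block, so a local pass can reason about all of it at once, while the branch in the second function splits its body into several basic blocks, and a purely local pass carries no information across the jump.

```c
#include <stdio.h>

/* A single basic block: straight-line code with one entry and one exit.
   A local optimizer can analyze the whole body at once, e.g. it can see
   that a * b is computed twice and reuse the first result. */
static int straight_line(int a, int b) {
    int x = a * b;
    int y = a * b + 1;   /* redundant computation visible within the block */
    return x + y;
}

/* The branch splits this body into several basic blocks. A purely local
   pass analyzes each block separately and retains no facts about a * b
   across the jump implied by the if statement. */
static int with_branch(int a, int b, int c) {
    int x = a * b;
    if (c)
        x = a * b + 1;   /* lives in a different basic block */
    return x;
}

int main(void) {
    printf("%d %d\n", straight_line(2, 3), with_branch(2, 3, 1));
    return 0;
}
```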
Global scope optimizations, also known as intra-procedural optimizations, operate on individual functions.[4] This gives them more information to work with but often makes expensive computations necessary. Worst-case assumptions need to be made when function calls occur or global variables are accessed because little information about them is available.
Peephole optimizations are usually performed late in the compilation process after machine code has been generated. This optimization examines a few adjacent instructions (similar to "looking through a peephole" at the code) to see whether they can be replaced by a single instruction or a shorter sequence of instructions.[3]: 554 For instance, a multiplication of a value by two might be more efficiently executed by left-shifting the value or by adding the value to itself (this example is also an instance of strength reduction).
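A minimal C sketch of the example above; which replacement a peephole pass actually chooses depends on the target instruction set, so the three forms are shown only as semantically equivalent alternatives.

```c
#include <stdio.h>

/* The multiplication a peephole optimizer might examine ... */
static unsigned times_two(unsigned x)       { return x * 2; }

/* ... and two cheaper forms it may substitute on many targets:
   a left shift, or adding the value to itself (strength reduction). */
static unsigned times_two_shift(unsigned x) { return x << 1; }
static unsigned times_two_add(unsigned x)   { return x + x; }

int main(void) {
    unsigned v = 21;
    printf("%u %u %u\n", times_two(v), times_two_shift(v), times_two_add(v));
    return 0;
}
```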
Interprocedural optimizations analyze all of a program's source code. The more information available, the more effective the optimizations can be. The information can be used for various optimizations, including function inlining, where a call to a function is replaced by a copy of the function body.
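An illustrative C sketch of function inlining (the function names are hypothetical): the call in caller() can be replaced by a copy of the callee's body, removing call overhead and exposing further optimizations such as constant folding.

```c
#include <stdio.h>

static int square(int x) {
    return x * x;
}

/* Before inlining: an ordinary call. */
static int caller(void) {
    return square(7);
}

/* After inlining, the compiler effectively sees the body in place,
   which it can then fold to the constant 49. */
static int caller_inlined(void) {
    return 7 * 7;
}

int main(void) {
    printf("%d %d\n", caller(), caller_inlined());
    return 0;
}
```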
Link-time optimization (LTO), or whole-program optimization, is a more general class of interprocedural optimization. During LTO, the compiler has visibility across translation units, which allows it to perform more aggressive optimizations such as cross-module inlining and devirtualization.
Machine code optimization involves using an object code optimizer to analyze the program after all machine code has been linked. Techniques such as macro compression, which conserves space by condensing common instruction sequences, become more effective when the entire executable task image is available for analysis.[5]
Most high-level programming languages share common programming constructs and abstractions, such as branching constructs (if, switch), looping constructs (for, while), and encapsulation constructs (structures, objects). Thus, similar optimization techniques can be used across languages. However, certain language features make some optimizations difficult. For instance, pointers in C and C++ make array optimization difficult; see alias analysis. However, languages such as PL/I that also support pointers implement optimizations for arrays. Conversely, some language features make certain optimizations easier. For example, in some languages, functions are not permitted to have side effects. Therefore, if a program makes several calls to the same function with the same arguments, the compiler can infer that the function's result only needs to be computed once. In languages where functions are allowed to have side effects, the compiler can restrict such optimization to functions that it can determine have no side effects.
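A small C sketch of the side-effect observation (distance() is a hypothetical pure function): because its result depends only on its arguments and it has no side effects, a compiler that can prove this may compute the value once and reuse it.

```c
#include <stdio.h>

/* A function with no side effects: its result depends only on its arguments. */
static int distance(int a, int b) {
    return a > b ? a - b : b - a;
}

int main(void) {
    int a = 10, b = 3;

    /* Two calls with identical arguments ... */
    int d1 = distance(a, b);
    int d2 = distance(a, b);

    /* ... may be optimized as if written:
       int tmp = distance(a, b);
       int d1 = tmp, d2 = tmp;              */
    printf("%d %d\n", d1, d2);
    return 0;
}
```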
Many optimizations that operate on abstract programming concepts (loops, objects, structures) are independent of the machine targeted by the compiler, but many of the most effective optimizations are those that best exploit special features of the target platform. Examples are instructions that do several things at once, such as decrement register and branch if not zero.
The following is an instance of a local machine-dependent optimization. To set a register to 0, the obvious way is to use the constant '0' in an instruction that sets a register value to a constant. A less obvious way is to XOR a register with itself or subtract it from itself. It is up to the compiler to know which instruction variant to use. On many RISC machines, both instructions would be equally appropriate, since they would both be the same length and take the same time. On many other microprocessors such as the Intel x86 family, it turns out that the XOR variant is shorter and probably faster, as there will be no need to decode an immediate operand, nor use the internal "immediate operand register"; the same applies on IBM System/360 and successors for the subtract variant.[6] A potential problem with this is that XOR or subtract may introduce a data dependency on the previous value of the register, causing a pipeline stall, which occurs when the processor must delay execution of an instruction because it depends on the result of a previous instruction. However, processors often treat the XOR of a register with itself or the subtract of a register from itself as a special case that does not cause stalls.
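A brief C sketch of the machine-dependent choice, with the typical instruction alternatives noted only in comments; the actual instructions emitted depend on the target and the compiler.

```c
#include <stdio.h>

/* Setting a value to zero. On x86, compilers commonly emit an idiom like
   "xor eax, eax" rather than "mov eax, 0", since the XOR form is shorter
   and needs no immediate operand; on System/360-style targets, subtracting
   a register from itself plays the same role. */
static int zero(void) {
    return 0;
}

int main(void) {
    printf("%d\n", zero());
    return 0;
}
```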
Optimization encompasses a number of themes and specific techniques, some of which can conflict with one another; several are described below.
Loop optimization acts on the statements that make up a loop, such as a for loop; an example is loop-invariant code motion. Loop optimizations can have a significant impact because many programs spend a large percentage of their time inside loops.[3]: 596
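A brief C sketch of loop-invariant code motion (the variable names are illustrative): the product rows * cols does not change inside the loop, so it can be computed once before the loop instead of on every iteration.

```c
#include <stdio.h>

#define N 8

int main(void) {
    int rows = 4, cols = 16;
    int a[N];

    /* Before: the invariant expression rows * cols is re-evaluated
       on every iteration. */
    for (int i = 0; i < N; i++)
        a[i] = i * (rows * cols);

    /* After loop-invariant code motion: the invariant computation is
       hoisted out of the loop. */
    int scale = rows * cols;
    for (int i = 0; i < N; i++)
        a[i] = i * scale;

    printf("%d\n", a[N - 1]);
    return 0;
}
```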
Some optimization techniques primarily designed to operate on loops include:

Induction variable analysis: roughly, if a variable in a loop is a simple linear function of the index variable, such as j := 4*i + 1, it can be updated appropriately each time the loop variable is changed (illustrated below). This is a strength reduction and also may allow the index variable's definitions to become dead code.[3]: 596–598 This information is also useful for bounds-checking elimination and dependence analysis, among other things.

Prescient store optimizations: these allow store operations to occur earlier than would otherwise be permitted in the context of threads and locks. The process needs some way of knowing ahead of time what value will be stored by the assignment that it should have followed. The purpose of this relaxation is to allow compiler optimizations to perform certain kinds of code rearrangements that preserve the semantics of properly synchronized programs.[9]
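A C sketch of the induction-variable case named above: j = 4*i + 1 is a linear function of the loop index, so instead of recomputing the multiplication each iteration, j can be initialized once and incremented by 4, a strength reduction that may also leave the original definition dead.

```c
#include <stdio.h>

#define N 10

int main(void) {
    int before[N], after[N];

    /* Before: j is recomputed with a multiplication on every iteration. */
    for (int i = 0; i < N; i++) {
        int j = 4 * i + 1;
        before[i] = j;
    }

    /* After strength reduction: j starts at 1 and is bumped by 4 each
       time the loop variable changes, eliminating the multiplication. */
    int j = 1;
    for (int i = 0; i < N; i++) {
        after[i] = j;
        j += 4;
    }

    printf("%d %d\n", before[N - 1], after[N - 1]);
    return 0;
}
```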
Data-flow optimizations, based on data-flow analysis, primarily depend on how certain properties of data are propagated by control edges in the control-flow graph. Some of these include:
Common subexpression elimination: in the expression (a + b) - (a + b)/4, "common subexpression" refers to the duplicated (a + b). Compilers implementing this technique realize that (a + b) will not change, and so calculate its value only once.[3]: 592–594

Constant folding and propagation: replacing expressions consisting of constants (e.g., 3 + 5) with their final value (8) at compile time, rather than doing the calculation at run time.[10] Used in most modern languages. Both techniques are illustrated in the sketch below.

SSA-based optimizations are intended to be done after transforming the program into a special form called static single assignment (SSA), in which every variable is assigned in only one place. Although some of these optimizations function without SSA, they are most effective with it. Many optimizations listed in other sections also benefit with no special changes, such as register allocation.
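A short C sketch of the common subexpression elimination and constant folding examples above: the duplicated subexpression (a + b) can be evaluated once, and the constant expression 3 + 5 can be folded to 8 at compile time.

```c
#include <stdio.h>

int main(void) {
    int a = 6, b = 2;

    /* Common subexpression elimination: (a + b) appears twice ... */
    int r1 = (a + b) - (a + b) / 4;

    /* ... so the compiler may evaluate it once and reuse the value: */
    int t  = a + b;
    int r2 = t - t / 4;

    /* Constant folding: 3 + 5 is replaced by 8 at compile time. */
    int c = 3 + 5;

    printf("%d %d %d\n", r1, r2, c);
    return 0;
}
```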
Instruction selection chooses among several equivalent ways of performing an operation; for example, on processors in the 68000 family (and similarly with x86 addressing modes), complex addressing modes can be used in statements like lea 25(a1,d5*4), a0, allowing a single instruction to perform a significant amount of arithmetic with less storage.

Functional-language optimizations: although many of these also apply to non-functional languages, they either originate in or are particularly critical in functional languages such as Lisp and ML.
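An illustrative C fragment of the kind of address arithmetic the lea example computes: a base address plus a scaled index plus a constant offset, which instruction selection may map onto a single address-calculation instruction on suitable targets.

```c
#include <stdio.h>

int main(void) {
    int data[64] = {0};
    long i = 5;

    /* Address arithmetic of the form base + index*4 + constant.
       On machines with rich addressing modes this whole computation
       may be selected as one instruction such as lea. */
    char *p = (char *)data + i * 4 + 25;

    printf("offset = %td\n", p - (char *)data);
    return 0;
}
```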
Conditional statements can also be simplified by merging adjacent branches: for example, if (c) { foo; } if (c) { bar; } can be rewritten as if (c) { foo; bar; }, and if (c) { foo; } if (!c) { bar; } as if (c) { foo; } else { bar; }.
Interprocedural optimization works on the entire program, across procedure and file boundaries. It works closely with its intraprocedural counterparts and is carried out with the cooperation of a local part and a global part. Typical interprocedural optimizations are procedure inlining, interprocedural dead-code elimination, interprocedural constant propagation, and procedure reordering. As usual, the compiler needs to perform interprocedural analysis before its actual optimizations. Interprocedural analyses include alias analysis, array access analysis, and the construction of a call graph.
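A small C sketch of interprocedural constant propagation (the function names are illustrative): when whole-program analysis shows that scale() is only ever called with a factor of 10, the constant can be propagated into the callee and the surrounding code simplified.

```c
#include <stdio.h>

/* With whole-program analysis, the compiler can see that this function
   is called only with factor == 10 ... */
static int scale(int x, int factor) {
    return x * factor;
}

int main(void) {
    /* ... so interprocedural constant propagation may specialize the
       callee to x * 10, potentially followed by inlining and further
       folding or strength reduction. */
    printf("%d %d\n", scale(3, 10), scale(7, 10));
    return 0;
}
```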
Interprocedural optimization is common in modern commercial compilers from SGI, Intel, Microsoft, and Sun Microsystems. For a long time, the open source GCC was criticized for a lack of powerful interprocedural analysis and optimizations, though this is now improving.[16] Another open-source compiler with full analysis and optimization infrastructure is Open64.
Due to the extra time and space required by interprocedural analysis, most compilers do not perform it by default. Users must use compiler options explicitly to tell the compiler to enable interprocedural analysis and other expensive optimizations.
Compilers can perform a wide range of optimizations, from simple and straightforward ones that take little compilation time to elaborate and complex ones that involve considerable amounts of compilation time.[3]: 15 Accordingly, compilers often provide options to their control command or procedure to allow the compiler user to choose how much optimization to request; for instance, the IBM FORTRAN H compiler allowed the user to specify no optimization, optimization at the registers level only, or full optimization.[3]: 737 By the 2000s, it was common for compilers, such as Clang, to have several compiler command options that could affect a variety of optimization choices, starting with the familiar -O2 switch.[17]
An approach to isolating optimization is the use of so-called post-pass optimizers (some commercial versions of which date back to mainframe software of the late 1970s).[18] These tools take the executable output by an optimizing compiler and optimize it even further. Post-pass optimizers usually work on the assembly language or machine code level (in contrast with compilers that optimize intermediate representations of programs). One such example is the Portable C Compiler (PCC) of the 1980s, which had an optional pass that would perform post-optimizations on the generated assembly code.[3]: 736
Another consideration is that optimization algorithms are complicated and, especially when being used to compile large, complex programming languages, can contain bugs that introduce errors in the generated code or cause internal errors during compilation. Compiler errors of any kind can be disconcerting to the user, but especially so in this case, since it may not be clear that the optimization logic is at fault.[19] In the case of internal errors, the problem can be partially ameliorated by a "fail-safe" programming technique in which the optimization logic in the compiler is coded such that a failure is trapped, a warning message issued, and the rest of the compilation proceeds to successful completion.[20]
Early compilers of the 1960s were often primarily concerned with simply compiling code correctly or efficiently, such that compile times were a major concern. One notable early optimizing compiler was the IBM FORTRAN H compiler of the late 1960s.[3]: 737 Another early and important optimizing compiler, which pioneered several advanced techniques, was the one for BLISS (1970), described in The Design of an Optimizing Compiler (1975).[3]: 740, 779 By the late 1980s, optimizing compilers were sufficiently effective that programming in assembly language declined. This co-evolved with the development of RISC chips and advanced processor features such as superscalar execution, out-of-order execution, and speculative execution, which were designed to be targeted by optimizing compilers rather than by human-written assembly code.[citation needed]