1/3/97 This document contains the summaries of some of the GCC passes written by the following persons. The partition of the passes was done on 12/2/96. --- PLZ note that the file specified may NOT be all that is needed to look at; refer to the simple description also and find out if you need to look at other files too. Xin Wang: jump.c jump opt. and jump threading Steve Schwinn: scan.c regclass.c local-alloc.c global.c reload.c register allocation related Yonghong Song: cse.c flow.c combine.c final.c common subexpression elimination and data flow; instruction generation related Sangyeun Cho: loop.c unroll.c sched.c loop opt. and scheduling Each one is supposed to look at the files and figure out what kind of code changes are made and how they can affect the HLI. Techniques to maintain the HLI properly need to be identified and reported. ================================================================ (Following is summarized by Yonghong Song) This document is for hli usage in GCC optimization. Main passes observed:cse.c flow.c final.c combine.c hli:line table,region table,ref/mod table for subregion and function call. Goal:figure out what kind of code changes are made and how they can affect the HLI.Techniques to maintain the HLI properly need to be identified and reported. 1.CSE(Commen Subexpression Elimination): The basic idea of CSE is to go through the code,keeping a record of expressioins that would have the same value at the current scan point,and replacing expressions encountered with the cheapest equivalent expression. Special Consideration: Each basic block are processed seperately Basic Data Structure: "quantity numbers" to record equivalent(pseudo) registers "hash table" to record general expressions. recording known equivalences among expressions in general. We should keep in accord all the info in "qty numbers" and "hash table" If some value is changed in memory,except special cases(array),all "qty" and "hash table" are removed. If some value is changed in register,all expressions containing refs to that reg must be removed. Code modifications(about RTL and global info): Some statements are replaced by cheaper ones and CSE finally tries to delete statements which store a reg that is never used or copies a reg to itself,etc. Main means: Based on each basic block from the first statement of BB CSE contructs data info of "qty equ" and "hash table" and related data structure. main function: cse_main main precessing function: cse_basic_block HLI affects: If given precise memory reference info,CSE can give better results since it assumes all exps/pseudos are clobbered during memory ref. A possible improvement: can we do global CSE? 2.FLOW(flow and life analysis and BB identification): This pass contains the data analysis pass of the compiler,which affects insn combiner & reg allocation. find basic blocks(identify each BB precessing LABEL & JUMP statement) life analysis(determine where each hard or pseudo reg is live ) also setting LOG_LINKS for insn combiner live reg info(basic_block_live_at_start--hard/pseudo,REG_NOTES) REG_NOTES:REG_DEAD(total/partial reg) add $1,$2 /* $2 */ REG_UNUSED(total/partial reg) add $1,$2 /* $1 */ LOG_LINKs:the LOG_LINKs of each insn identify the most recent assignment to each REG used in the insn. It's a list of previous insns,each of which contains a SET for a REG that is used in this insn and not used or set in between.Recording within BBs. setting other info field about reg usage: reg_n_refs,reg_n_deaths,reg_n_sets,..... Function: flow_analysis calling 1.find_basic_blocks 2.life_analysis find_basic_blocks: precessing LABEL and JUMP statement to find BB life_analysis: for each BB Method:bottom-up to record reg usage. Initially the last BB:stack reg,frame reg/hard frame,all global regs) Propagate life info through the BBs around the graph of BBs. This is a relaxation process.For example, I1,I2--->I3 If $1 is live in the beginning of I3 it is too live in the end of I1,I2 about loop: The loop depth may change in the middle of a BB.Since we scan from end to beginning,we start with the depth at the end of the current BB,and adjust as we pass ends end starts of loops. So for loop we may need to repeat our precessing until a fixed point. .....?.....(I need to further examine it) about function call: find all call insns and mark them as possibly jumping to all the nonlocal goto handler labels reg_n_calls_crossed: indexed by N,gives 1 if that reg is live across any CALL_INSNs.This information remains valid for the rest of the compilation of the current function;it is used to control reg alloc. (Any regs live at the time of a call insn must not go in a reg clobbered by calls.reg_n_calls_crossed records this info which is used by reg allocation.The use of this info is affected by call convention. Modifications about RTL and other info: BB/unreachable loop which cannot be reached is deleted. Any insn which copies a reg to itself is deleted. Any store insn whose result is totally used later is deleted. HLI may work: With HLI we may get more detailed info about mem(pseudo) refs about pseudo/hard reg usage.(subregion/func call MOD/REF table) I don't know the detailed info about how gcc deals with loops and functional call,but obviously if we have ref/mod info about loop(subregion) and functional call we will obtain more accurate info of reg(hard,pseudo) use. 3.FINAL(final code generation): looking at the trl code for a function and outputs assembler code. final_start_function final final_end_function Basic method:traversing RTL chain,and for each RTX final_scan_insn is called. Before generating asm code peephole optimization is executed if required then insn recognition(recog_memorized),insn extraction(insn_extract) then in case of pattern and reg use compatibility the final assemble code is outputed. I cannot figure out HLI use here up to now. 4.COMBINE(combine two/more instructions into one) combining possible insns into one.In flow analysis suitable logic link info has been recorded.The LOG_LINKs are much like UDC(Use-Define Chain),but GCC combiner is restricted within one basic block. To get better combine result combiner may:(two RTXs to one) change one insn mode to fit one combiner model. simplify some insn(if-then-else,set,logical,comparison...) applying some law(distributive law in maths if correct) spliting/extracting part of one insn(if cheaper) Main function: combine_instructions calling try_combine COMBINER mainly uses the result of flow analysis and its own knowledge of RTL ,and it may update many info about data flow. Contraints: 1.combine only when the earlier insn(s) consist of only a single assignment.2.to never combine when a subroutine call appears in the middle. 3.Because of flow analysis within one basic block. MODIFICATIONS to RTL and other data info: in case of combination earlier insn is deleted if useless. updating all necessary data flow info(reg_use...). HLI valuability: I think we may put our weight on flow analysis.Insn combination between different BBs needs:speculation and aggrassive flow analysis. But HLI will have little impact on insn combiner. Combiner:reducing # of insns.Beyond BBs it means speculation. Scheduler:possibly extracting more ILP. 5.Some other considerations: I don't know how to use HLI input file.will we read the same copy each time? Or we could read HLI file once for all and use it later in all optimizations? I think the latter is better.So the key is to figure out how to change HLI to RTL representation or other global data structure.And in this case the best position to merge HLI is just before optimization since HLI can effectively work on opts and not bring in too much complexity. I expect more details. ================================================================ (Following is summarized by Sangyeun Cho) Loop optimization pass: loop.c unroll.c --------------------------------------- This pass finds invariant computations within loops and moves them to the beginning of the loop. Then it identifies basic and general induction variables. Strength reduction is applied to the general induction variables, and induction variable elimination is applied to the basic induction variables. It also finds cases where a register is set within the loop by zero-extending a narrower value and changes these to zero the entire register once before the loop and merely copy the low part within the loop. . Driver function: loop_optimize () loop_optimize () sets up some of the variables regarding loops and calls scan_loop () to handle each loop it encounter from inner loops to outer. scan_loop () then characterizes each loop and tries to move invariant computations out of the loop and calls strength_reduce (). When GCC moves invariant computations in a loop to the outside of that loop, there can be possibly a memory reference moved out of the loop. In that case, the HLI might become inaccurate, since the item is now in a different region, and GCC creates a new rtl insn and deletes the old one. Thus, we need to track each item touched by this pass and take care of such changes by identifying exactly which item is moved and from which region to which. Strength reduction and induction variable elimination don't seem to affect HLI since they don't touch memory items. Loop unrolling (which is called from strengh reduction pass) does change HLI. Since a loop body can be copied multiple times, an item from the source code may be duplicated in the lower level code after this optimization. So there can be multiple rtl insns corresponding to one item in the source code. It can be conceived that rtl-to-item mapping can still be done by relevant work but item-to-rtl mapping is not possible since it is going to be one-to-many relation. If the source level unrolling is performed at high level and GCC unrolling is suppressed, this problem wouldn't exist. . Driver functinon: unroll_loop () this function is supposed to be called in strength_reduce () of loop.c. Loops are categorized into three classes and treated accordingly: complete unrolling, modulo unrolling, and naive unrolling based on whether or not the loop count can be known at compile time, run time, or not at all. Then loop body is copied based on the categorization. Instruction scheduling: sched.c ------------------------------- This pass rearranges instructions in each basic block in such a way that the total execution time may be minimized. A modified list scheduling is used. . Driver function: schedule_insns () schedule_insns () traverses through the basic blocks (BBs) caculated in flow.c and schedule insns in each BB by calling schedule_block (). schedule_block () then rearranges instructions contained in a BB using a modified list-scheduling to shorten the execution time. This pass does not seem to affect HLI: this pass only rearranges the rtl insns in its chain and does not create or add new items. Since insns don't move over BB boundaries and are not deleted, HLI should still be valid after this pass. As an immediate application of HLI, we can use HLI in this pass when GCC calculates and adds dependences between memory reference instructions. ================================================================ (Following was summarized by Xin Wang) Summery of jump.c --------------------------------------------------------------------------- Jump optimization may be called two or three times in GCC depending on the how optimization flags are used. Once before cse, sometimes once after cse, and i once after reload (before final). Jump_optimize deletes unreachable code and labels that are not used. Jump across is done after the register allocation. Jump optimization is done after cse when cse's constant-propagation causes jumps to become unconditional or to be deleted. Dead code elimination of unreachable loops can not be done here. It is the job of flow analysis. If we want to keep the correct mapping between HLI and rtl, there are two places we need to pay attention to: 1) Code elimination. Whenever any instructions are deleted, we need to check out wheather those instructions contains memory reference. 2) Code duplication. Jump optimization will duplicate the loop exit test if there is a NOTE_INSN_LOOP_BEG followed by an unconditional jump. In this case, we must check if there is any memory reference is dumplicated also. Another place that rtl is chenged is reverse_comparison. It will not generate new memory reference, so HLI will not be affected. ================================================================