1/3/97

This document contains the summaries of some of the GCC passes
written by the following persons. The partition of the passes
was done on 12/2/96.

---
PLZ note that the file specified may NOT be all
that is needed to look at; refer to the simple description also
and find out if you need to look at other files too.

Xin Wang: jump.c
			jump opt. and jump threading

Steve Schwinn: scan.c regclass.c local-alloc.c global.c reload.c
			register allocation related

Yonghong Song: cse.c flow.c combine.c final.c
			common subexpression elimination and data
			flow; instruction generation related

Sangyeun Cho: loop.c unroll.c sched.c
			loop opt. and scheduling

Each one is supposed to look at the files and figure out what
kind of code changes are made and how they can affect the HLI.
Techniques to maintain the HLI properly need to be identified
and reported.


================================================================

(Following is summarized by Yonghong Song)

This document is for hli usage in GCC optimization.
Main passes observed:cse.c flow.c final.c combine.c
hli:line table,region table,ref/mod table for subregion and function call.

Goal:figure out what kind of code changes are made and how they can affect 
the HLI.Techniques to maintain the HLI properly need to be identified and
reported.

1.CSE(Commen Subexpression Elimination):

The basic idea of CSE is to go through the code,keeping a record of expressioins that would have the same value at the current scan point,and replacing expressions encountered with the cheapest equivalent expression.
Special Consideration:
	Each basic block are processed seperately
Basic Data Structure:
	"quantity numbers" to record equivalent(pseudo) registers
	"hash table" to record general expressions.
	 recording known equivalences among expressions in general.
   We should keep in accord all the info in "qty numbers" and "hash table"
	If some value is changed in memory,except special cases(array),all "qty"
	and "hash table" are removed.
	If some value is changed in register,all expressions containing refs
	to that reg must be removed.
Code modifications(about RTL and global info):
	Some statements are replaced by cheaper ones and CSE finally tries
	to delete statements which store a reg that is never
	used or copies a reg to itself,etc.
Main means:
	Based on each basic block from the first statement of BB CSE contructs
	data info of "qty equ" and "hash table" and related data structure.
	main function:	cse_main
	main precessing function:	cse_basic_block
HLI affects:
	If given precise memory reference info,CSE can give better results
	since it assumes all exps/pseudos are clobbered during memory ref.
	A possible improvement:
		can we do global CSE?


2.FLOW(flow and life analysis and BB identification):
	This pass contains the data analysis pass of the compiler,which 
	affects insn combiner & reg allocation.
	find basic blocks(identify each BB precessing LABEL & JUMP statement)
	life analysis(determine where each hard or pseudo reg is live )
		also setting LOG_LINKS for insn combiner 
	live reg info(basic_block_live_at_start--hard/pseudo,REG_NOTES)
		REG_NOTES:REG_DEAD(total/partial reg) add $1,$2 /* $2 */
			  REG_UNUSED(total/partial reg) add $1,$2 /* $1 */
		LOG_LINKs:the LOG_LINKs of each insn identify the most
			recent assignment to each REG used in the insn.
			It's a list of previous insns,each of which contains
			a SET for a REG that is used in this insn and not
			used or set in between.Recording within BBs.
		setting other info field about reg usage:
			reg_n_refs,reg_n_deaths,reg_n_sets,.....
	Function: flow_analysis calling 1.find_basic_blocks
		               		2.life_analysis
	find_basic_blocks: precessing LABEL and JUMP statement to find BB
	life_analysis: for each BB
		<function call and loop?>
Method:bottom-up to record reg usage.
	Initially the last BB:stack reg,frame reg/hard frame,all global regs)
	Propagate life info through the BBs around the graph of BBs.
	This is a relaxation process.For example,
	I1,I2--->I3
	If $1 is live in the beginning of I3 it is too live in the end of I1,I2	
	about loop:
	The loop depth may change in the middle of a BB.Since we scan from
	end to beginning,we start with the depth at the end of the current 
	BB,and adjust as we pass ends end starts of loops.
	So for loop we may need to repeat our precessing until a fixed point.
	.....?.....(I need to further examine it)
	about function call:
	find all call insns and mark them as possibly jumping to all the
	nonlocal goto handler labels
	reg_n_calls_crossed: indexed by N,gives 1 if that reg is live across 
	any CALL_INSNs.This information remains valid for the rest of the
	compilation of the current function;it is used to control reg alloc.
	(Any regs live at the time of a call insn must not go in a reg
	clobbered by calls.reg_n_calls_crossed records this info which is
	used by reg allocation.The use of this info is affected by call
	convention.
Modifications about RTL and other info:
	BB/unreachable loop which cannot be reached is deleted.
	Any insn which copies a reg to itself is deleted.
	Any store insn whose result is totally used later is deleted.
HLI may work:
	With HLI we may get more detailed info about mem(pseudo) refs about
pseudo/hard reg usage.(subregion/func call MOD/REF table)
	I don't know the detailed info about how gcc deals with loops and
functional call,but obviously if we have ref/mod info about loop(subregion)
and functional call we will obtain more accurate info of reg(hard,pseudo) use.

3.FINAL(final code generation):
looking at the trl code for a function and outputs assembler code.
	final_start_function
	final
	final_end_function
Basic method:traversing RTL chain,and for each RTX final_scan_insn is called.
	Before generating asm code peephole optimization is executed if required 
	then insn recognition(recog_memorized),insn extraction(insn_extract)
	then in case of pattern and reg use compatibility the final assemble
	code is outputed.
I cannot figure out HLI use here up to now.

4.COMBINE(combine two/more instructions into one)
combining possible insns into one.In flow analysis suitable logic link info
has been recorded.The LOG_LINKs are much like UDC(Use-Define Chain),but 
GCC combiner is restricted within one basic block.
To get better combine result combiner may:(two RTXs to one)
	change one insn mode to fit one combiner model.
	simplify some insn(if-then-else,set,logical,comparison...)
	applying some law(distributive law in maths if correct)
	spliting/extracting part of one insn(if cheaper)
Main function:  combine_instructions
		calling try_combine
COMBINER mainly uses the result of flow analysis and its own knowledge of
RTL ,and it may update many info about data flow.
Contraints: 1.combine only when the earlier insn(s) consist of only a single
assignment.2.to never combine when a subroutine call appears in the middle.
3.Because of flow analysis within one basic block.
MODIFICATIONS to RTL and other data info:
	in case of combination earlier insn is deleted if useless.
	updating all necessary data flow info(reg_use...).
HLI valuability:
	I think we may put our weight on flow analysis.Insn combination
between different BBs needs:speculation and aggrassive flow analysis.
But HLI will have little impact on insn combiner.
<Combiner and Scheduler>
Combiner:reducing # of insns.Beyond BBs it means speculation.
Scheduler:possibly extracting more ILP.

5.Some other considerations:
	I don't know how to use HLI input file.will we read the same copy each time?
Or we could read HLI file once for all and use it later in all optimizations? I
think the latter is better.So the key is to figure out how to change HLI to RTL
representation or other global data structure.And in this case the best position
to merge HLI is just before optimization since HLI can effectively work on opts
and not bring in too much complexity.
	I expect more details.


================================================================

(Following is summarized by Sangyeun Cho)

Loop optimization pass: loop.c unroll.c
---------------------------------------

This pass finds invariant computations within loops and moves
them to the beginning of the loop.  Then it identifies basic and
general induction variables.  Strength reduction is applied
to the general induction variables, and induction variable
elimination is applied to the basic induction variables.

It also finds cases where a register is set within the loop
by zero-extending a narrower value and changes these to zero
the entire register once before the loop and merely copy the
low part within the loop.

. Driver function: loop_optimize ()
 loop_optimize () sets up some of the variables regarding
 loops and calls scan_loop () to handle each loop it encounter
 from inner loops to outer. scan_loop () then characterizes
 each loop and tries to move invariant computations out
 of the loop and calls strength_reduce ().

When GCC moves invariant computations in a loop to the outside
of that loop, there can be possibly a memory reference moved
out of the loop.  In that case, the HLI might become inaccurate,
since the item is now in a different region, and GCC creates
a new rtl insn and deletes the old one. Thus, we need to track
each item touched by this pass and take care of such changes
by identifying exactly which item is moved and from which region
to which.

Strength reduction and induction variable elimination don't
seem to affect HLI since they don't touch memory items.

Loop unrolling (which is called from strengh reduction pass)
does change HLI.  Since a loop body can be copied multiple times,
an item from the source code may be duplicated in the lower
level code after this optimization.  So there can be multiple
rtl insns corresponding to one item in the source code.
It can be conceived that rtl-to-item mapping can still be done
by relevant work but item-to-rtl mapping is not possible
since it is going to be one-to-many relation.  If the source
level unrolling is performed at high level and GCC unrolling
is suppressed, this problem wouldn't exist.

. Driver functinon: unroll_loop ()
 this function is supposed to be called in strength_reduce ()
 of loop.c. Loops are categorized into three classes and treated
 accordingly: complete unrolling, modulo unrolling, and naive
 unrolling based on whether or not the loop count can be known
 at compile time, run time, or not at all. Then loop body is
 copied based on the categorization.


Instruction scheduling: sched.c
-------------------------------

This pass rearranges instructions in each basic block in such a way
that the total execution time may be minimized. A modified
list scheduling is used.

. Driver function: schedule_insns ()
 schedule_insns () traverses through the basic blocks (BBs) caculated
 in flow.c and schedule insns in each BB by calling schedule_block ().
 schedule_block () then rearranges instructions contained in a BB using a
 modified list-scheduling to shorten the execution time.

This pass does not seem to affect HLI: this pass only rearranges the rtl
insns in its chain and does not create or add new items. Since insns don't
move over BB boundaries and are not deleted, HLI should still be valid
after this pass.

As an immediate application of HLI, we can use HLI in this pass when GCC
calculates and adds dependences between memory reference instructions.


================================================================

(Following was summarized by Xin Wang)

Summery of jump.c
---------------------------------------------------------------------------
Jump optimization may be called two or three times in GCC depending on the how
optimization flags are used. Once before cse, sometimes once after cse, and i
once after reload (before final).

Jump_optimize deletes unreachable code and labels that are not used. Jump
across is done after the register allocation. Jump optimization is done
after cse when cse's constant-propagation causes jumps to become
unconditional or to be deleted.

Dead code elimination of unreachable loops can not be done here. It is the
job of flow analysis.

If we want to keep the correct mapping between HLI and rtl, there are two
places we need to pay attention to:

1) Code elimination.
   Whenever any instructions are deleted, we need to check out wheather those
instructions contains memory reference.

2) Code duplication.
   Jump optimization will duplicate the loop exit test if there is a
NOTE_INSN_LOOP_BEG followed by an unconditional jump. In this case, we must
check if there is any memory reference is dumplicated also.

Another place that rtl is chenged is reverse_comparison. It will not generate
new memory reference, so HLI will not be affected.


================================================================