AGASSIZ Publications

Agassiz Static and Dynamic Compilers

**** Please also visit DynOpt Group for more information on dynamic compilers.

"Recovery Code Generation for General Speculative Optimizations", by J.Lin, W.C. Hsu, P.C. Yew, R.D.C. Ju, and T.F. Ngai, ACM Transactions on Architecture and Code Optimization (TACO), Vol.3, No.1, March 2006, pp. 67-89

Dynamic Code Region (DCR) Based Program Phase Tracking and Prediction for Dynamic Optimizations, by J. Kim, S.V. Kodakara, W.C. Hsu, D.J. Lilja, P.C. Yew, Lecture Notes in Computer Science, Volume 3793, Oct 2005, Pages 203 - 217.

"A General Compiler Framework for Speculative Optimizations Using Data Speculative Code Motion", by X. Dai, A. Zhai, W.C. Hsu and P.C. Yew, Proc. of the Third Annual IEEE/ACM Int'l Symp. on Code Generation and Optimization (CGO), March 2005.

"Performance of Runtime Optimization on BLAST", by A. Das, J. Lu, H. Chen, J. Kim, P.C. Yew, W.C. Hsu, D.Y. Chen, Proc. of the Third Annual IEEE/ACM Int'l Symp. on Code Generation and Optimization (CGO), March 2005.

Loop Selection for Thread-Level Speculation, S.Wang, X.Dai, K.Yellajyosula, A.Zhai, and P.C. Yew, Proc of the 18th Workshop on Languages and Compilers for Parallel Computing (LCPC), Aug. 2005

"A Compiler Framework for Recovery Code Generation in General Speculative Optimizations", J.Lin, W.C. Hsu, P.C. Yew, R.D. Ju and T.F. Ngai, Proc. of Int'l Conf. on Parallel Architectures and Compiler Techniques (PACT), September 2004, pp. 17-28.

"A Compiler Framework for Speculative Optimizations", by J.Lin, T.Chen, W.C. Hsu, P.C. Yew, R.D.C. Ju, T.F. Ngai and S.Chan, ACM Transactions on Architecture and Code Optimization (TACO), Vol.1, No.3, September 2004, pp. 247-271

"Design and Implementation of a Lightweight Dynamic Optimization System", by Jiwei Lu, Howard Chen, Pen-Chung Yew, Wei Chung Hsu, Journal of Instruction-Level Parallelism, Volume 6, 2004

"Interprocedural Induction Variable Analysis", by P.Y. Tang and P.C. Yew, International Journal of Foundations of Computer Science, World Scientific, Vol.14, No.3, June 2003, pp.405-423

Data Dependence Profiling for Speculative Optimizations, by Tong Chen, Jin Lin, Xiaoru Dai, Wei-Chung Hsu, and Pen-Chung Yew, International Conference on Compiler Construction (CC), Barcelona, Spain, March 2004

Alias and dependence profiling in ORC and their applications, by Tong Chen, Chu-Cheow Lim, Tin-fook Ngai and Roy Ju, the First Intel Dynamic Compilation and Profile-guided Optimization Conference, Nov. 2003

A Compiler Framework for Speculative Analysis and Optimizations, by Jin Lin, Tong Chen, Wei-Chung Hsu, Pen-Chung Yew, Roy Dz-Ching Ju, Tin-Fook Ngai, Sun Chan, Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), San Diego, June 2003.

Speculative Register Promotion Using Advanced Load Address Table (ALAT), by Jin Lin, Tong Chen, Wei-Chung Hsu, Pen-Chung Yew, Proceedings of the First Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO), San Francisco, March 2003.

An Empirical Study on the Granularity of Pointer Analysis in C Programs, by Tong Chen, Jin Lin, Wei-Chung Hsu, Pen-Chung Yew, Proceedings of 15th Workshop on Languages and Compilers for Parallel Computing (LCPC), August 2002.

On the Impact of Naming Methods for Heap-Oriented Pointers in C Programs, by Tong Chen, Jin Lin, Wei-Chung Hsu, Pen-Chung Yew, Proceedings of The 6th International Symposium on Parallel Architectures, Algorithms, and Networks, May 2002.

Integrating scalar analysis and optimizations in a Parallel and optimizing compiler, by B.Zheng, Ph.D. Thesis, Jan. 2000.

Designing the Agassiz Compiler for Concurrent Multithreaded Architectures, by B. Zheng, J.-Y. Tsai, B. Y. Zang, T. Chen, B. Huang, J. H Li, Y. H. Ding, J. Liang, Y. Zhen, P.-C. Yew, C.Q. Zhu, Workshop on Languages and Compilers for Parallel Computing (LCPC), August 1999.

A Hierarchical Approach to Context-Sensitive Interprocedural Alias Analysis, by Bixia Zheng and Pen-Chung Yew, TR99-018, Univ. of Minnesota

High-Level Information - An Approach for Integrating Front-End and Back-End Compilers, by S. Cho, J.-Y. Tsai, Y. Song, B. Zheng, S. J. Schwinn, X. Wang, Q. Zhao, Z. Li, D. J. Lilja, and P.-C. Yew, Proceedings of the 1998 International Conference on Parallel Processing (ICPP), August 1998. (Also as Technical Report #98-008, Dept. of Computer Science and Engineering, Univ. of Minnesota, February 1998.)

Compiler Techniques for Concurrent Multithreading with Hardware Speculation Support, by Z. Li, J.-Y. Tsai, X. Wang, P.-C. Yew, and B. Zheng, Proceedings of the 9th Workshop on Languages and Compilers for Parallel Computing (LCPC), August 1996.

An Efficient Algorithm for the Run-Time Parallelization of Doacross Loops, by D.K. Chen, D.A. Oesterreich, J. Torrellas, and P.-C. Yew, Technical Report #97-028, Dept. of Computer Science, Univ. of Minnesota, July 1997. Preliminary version appeared in Supercomputing '94.

Statement Reordering for Doacross Loops, by D.K. Chen and P.-C. Yew, Technical Report #97-029, Dept. of Computer Science, Univ. of Minnesota, July 1997. Preliminary version appeared in ICPP '94.

On Effective Execution of Non-Uniform DOACROSS Loops, by D.K. Chen and P.-C. Yew, IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 7, No. 5, May 1996.

Enhancing Multiple-Path Speculative Execution with Predicate Window Shifting, by J.Y. Tsai and P.-C. Yew, Journal of System Architecture - Special Issue on Microprocessor Architecture, 1998.

Speculative Multi-Threaded, Multi-Core Architectures

**** Please also visit Arctic Group for more information on superthreaded architectures.

"Supporting Speculative Multithreading on Simultaneous Multithreaded Processors", by V.Packirisamy, S.Wang, A. Zhai, W.C.Hsu and P.C.Yew, Proc. of Int'l Conf. on High Performance Computing (HiPC), Dec. 2006

The Superthreaded Processor Architecture, by J.-Y. Tsai, J. Huang, C. Amlo, D.J. Lilja, and P.-C. Yew, In the IEEE Transactions on Computers, Special Issue on Multithreaded Architectures, vol. 48, no. 9, Sep., 1999

Performance Study of a Concurrent Multithreaded Processor, by J.-Y. Tsai, Z. Jiang, E. Ness, and P.-C. Yew, Proceedings of the Fourth Int'l Conf. on High-Performance Computer Architecture (HPCA-4), Feb. 1998.

The Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence Checking and Control Speculation, by J.-Y. Tsai and P.-C. Yew, Proceedings of Int'l Conf. on Parallel Architectures and Compilation Techniques (PACT '96), Oct. 1996.

Compiler Techniques for the Superthreaded Architectures, by J.-Y. Tsai, Z. Jiang, and P.-C. Yew, International Journal of Parallel Programming - Special Issue on Languages and Compilers for Parallel Computing, June 1998.

Superthreading: Integrating Compilation Technology and Processor Architecture for Cost-Effective Concurrent Multithreading, by J.-Y. Tsai, Z. Jiang, Z. Li, D.J. Lilja, X. Wang, P.-C. Yew, B. Zheng, and S. Schwinn, Journal of Information Science and Engineering, March 1998.

Improving Instruction Throughput and Memory Latency Using Two-Dimensional Superthreading, by J.-Y. Tsai, B. Zheng, and P.-C. Yew, Technical Report

Program Optimization for Concurrent Multithreaded Architectures, by J.-Y. Tsai, Z. Jiang, and P.-C. Yew, Proceedings of the 10th Workshop on Languages and Compilers for Parallel Computing, Aug. 1997.

Integrating Compilation Technology and Processor Architecture for Cost-Effective Concurrent Multithreading, by J.-Y. Tsai, Ph.D. Thesis, Computer Science, University of Illinois at Urbana-Champaign, April 1998.

Compiler and Architecture Issues for Concurrent Multi-threaded Architectures, by P.-C. Yew, Presentation Material for the Intel MRL Research Forum, Nov. 1996.

Integrating Compilation Technology and Processor Architecture for Cost-Effective Concurrent Multithreading, by P.-C. Yew, Presentation Material for the SGI/CRAY Future Architecture Seminar, July 1997.

Speculative Execution

Decoupled Value Prediction on Trace Processors, by S.J. Lee, Y. Wang and P.C. Yew, Proceedings of the 6th Int'l Conf. on High Performance Computer Architecture (HPCA-6), Toulouse, France, Jan. 2000

Exploiting Basic Block Value Locality with Block Reuse, by J. Huang and D. J. Lilja, Proceedings of the 5th Int'l Symposium on High Performance Computer Architecture (HPCA-5), Orlando, Jan., 1999

System Software

Live Updating Operating Systems Using Virtualization, by H.Chen, R.Chen, F.Zhang, B.Zang, P.C.Yew, Proc. of 2nd Int'l Conf. on Virtual Execution Environments (VEE), pp. 35-44, June 2006.

High-Performance Memory System Design

Processor memory system design

A High-Bandwidth Memory Pipeline for Wide Issue Processors, by S. Cho, P.-C. Yew, and G. Lee, IEEE Transactions on Computers, Vol. 50, No. 7, July 2001.

Decoupling Local Variable Accesses in a Wide-Issue Superscalar Processor, by S. Cho, P.-C. Yew, and G. Lee, Proceedings of the 26th Int'l Symp. on Computer Architecture (ISCA), May 1999. (Also as Technical Report #98-020, Dept. of Computer Sci. and Eng., Univ. of Minnesota, May 1998)

Access Region Locality for High-Bandwidth Processor Memory System Design, by S. Cho, P.-C. Yew, and G. Lee, Proceedings of the 32nd Int'l Symp. on Microarchitecture (MICRO-32), Nov. 1999.

Multiprocessor cache and memory design

Efficient integration of compiler-directed cache coherence and data prefetching, by H.-B. Lim and P.-C. Yew, Journal of Parallel and Distributed Computing, Vol. 61, No. 12, Dec. 2001, pp. 1775-1802.

Efficient integration of compiler-directed cache coherence and data prefetching, by H.-B. Lim and P.-C. Yew, Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2000), May 2000 (Best Paper Award).

Binding Time in Distributed Shared Memories, by J. Kong, PhD Thesis, June 1999.

Maintaining Cache Coherence through Compiler-directed Data Prefetching, by H.-B. Lim and P.-C. Yew, Journal of Parallel and Distributed Computing, Vol. 53, No. 2, Sep. 1998, pp. 144-173.

Hardware and Compiler-Directed Cache Coherence in Large-Scale Multiprocessors, by L. Choi and P.-C. Yew, Technical Report #97-030, Dept. of Computer Science, Univ. of Minnesota, July 1997.

Compiler Analysis for Cache Coherence, by L. Choi and P.-C. Yew, Technical Report #97-031, Dept. of Computer Science, Univ. of Minnesota, July 1997.

A compiler-directed cache coherence scheme using data prefetching, by H.-B. Lim and P.-C. Yew, Proceedings of the 1997 International Parallel Processing Symposium, April 1997.

Techniques for compiler-directed cache coherence, by L. Choi, H.-B. Lim, and P.-C. Yew, IEEE Parallel & Distributed Technology, Winter 1996, pp. 23-34.

Compiler support for maintaining cache coherence using data prefetching, by H. B. Lim, L. Choi, and P.-C. Yew, Extended abstract in Proceedings of the Ninth Workshop on Languages and Compilers for Parallel Computing (LCPC '96), Santa Clara, CA, Aug. 1996.

Program Analysis for Cache Coherence: Beyond Procedural Boundaries, by L. Choi and P.-C. Yew, Proceedings of International Conference on Parallel Processing, Aug. 1996.

Compiler and Hardware Support for Cache Coherence in Large-Scale Multiprocessors: Design Considerations and Performance Evaluation, by L. Choi and P.-C. Yew, Proceedings of International Symposium on Computer Architecture, May 1996, pp. 283-294.

Eliminating Stale Data References through Array Data-Flow Analysis, by L. Choi and P.-C. Yew, Proceedings of International Parallel Processing Symposium, April 1996.

Interprocedural Array Data-Flow Analysis for Cache Coherence, by L. Choi and P.-C. Yew, Proceedings of Eighth Workshop on Languages and Compilers for Parallel Computing, Aug. 1995.

Compiler Assistance for Directory-Based Cache Coherence Enforcement, by David J. Lilja, Proceedings of Workshop on Challenges for Parallel Processing, International Conference on Parallel Processing, Aug. 1995, pp. 133-138.

The Potential of Compile-Time Analysis to Adapt the Cache Coherence Enforcement Strategy to the Data Sharing Characteristics, by Farnaz Mounes-Toussi and David J. Lilja, IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 5, May 1995, pp. 470-481.

Using Compiler Assistance to Reduce the Network Traffic Requirements of a Directory-Based Cache Coherence Mechanism, by Zhiyuan Li, Farnaz Mounes-Toussi, and David J. Lilja, High-Performance Parallel Computing Research Group Technical Report #HPPC-95-01, Jan. 1995.

Reducing the Impact of False-Sharing Using a Write-Through Cache with Partial Block Invalidation, by Farnaz Mounes-Toussi and David J. Lilja, High-Performance Parallel Computing Research Group Technical Report #HPPC-94-15, Dec. 1994.

A Compiler-Directed Cache Coherence Scheme with Improved Intertask Locality, by L. Choi and P.-C. Yew, Proceedings of Supercomputing '94, Washington, D.C., Nov. 1994, pp. 773-782.

A Superassociative Tagged Cache Coherence Directory, by David J. Lilja and Shanthi Ambalavanan, Proceedings of International Conference on Computer Design, Oct. 1994, pp. 42-45. (Extended version)

A Compiler-Assisted Scheme for Adaptive Cache Coherence Enforcement, by Trung N. Nguyen, Farnaz Mounes-Toussi, David J. Lilja, and Zhiyuan Li, Proceedings of IFIP International Conference on Parallel Architectures and Compilation Techniques, Aug. 1994, pp. 69-78.

An Evaluation of a Compiler Optimization for Improving the Performance of a Coherence Directory, by Farnaz Mounes-Toussi, David J. Lilja, and Zhiyuan Li, Proceedings of ACM International Conference on Supercomputing, July 1994, pp. 75-84.

Software Assistance for Directory-Based Caches, by Z. Li, Proceedings of the 8th IEEE International Parallel Processing Symposium, 1994.

Performance Limits of Compiler-Directed Multiprocessor Cache Coherence Enforcement, by Farnaz Mounes-Toussi and David J. Lilja, The Interaction of Compilation Technology and Computer Architecture, D. J. Lilja and P. L. Bird (eds.) Kluwer Academic Publishers, Boston, MA, 1994, pp. 161-190.

Improving Memory Utilization in Cache Coherence Directories, by David J. Lilja and Pen-Chung Yew, IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 10, Oct. 1993, pp. 1130-1146.

Cache Coherence in Large-Scale Shared-Memory Multiprocessors: Issues and Comparisons, by David J. Lilja, ACM Computing Surveys, Vol. 25, No. 3, Sep. 1993, pp. 303-338.

Compiler Support for the Efficient Use of Cache Coherence Directories, by Trung N. Nguyen, Zhiyuan Li, and David J. Lilja, High-Performance Parallel Computing Research Group Technical Report #HPPC-94-19, Dec. 1994. (Also appeared as "Efficient Use of Dynamically Tagged Directories Through Compiler Analysis," by Trung N. Nguyen, Zhiyuan Li, and David J. Lilja, Proceedings of International Conference on Parallel Processing, Vol. II: Software, Aug. 1993, pp. 112-119)

Locality enhancement and latency hiding

Integrating Fine-Grained Message Passing in Cache Coherent Shared-Memory Multiprocessors, by D. Poulsen and P.-C. Yew, Journal of Parallel and Distributed Computing, Vol. 33, No. 2, March 1996, pp. 172-188.

Write Buffer Design for Cache-Coherent Shared-Memory Multiprocessors, by Farnaz Mounes-Toussi and David J. Lilja, Proceedings of International Conference on Computer Design, Oct. 1995, pp. 506-511.

An Interprocedural Parallelizing Compiler and Its Support for Memory Hierarchy Research, by J. Gu, Z. Li, and T.N. Nguyen, Languages and Compilers for Parallel Computing, Lecture Notes in Computer Science, 1033, Springer-Verlag, Aug. 1995.

Data Prefetching and Data Forwarding in Shared-Memory Multiprocessors, by D. Poulsen and P.-C. Yew, Proceedings of the International Conference on Parallel Processing, Vol. II, Aug. 1994, pp. 276-280.

Performance Analysis and Simulation Tools

An Efficient Strategy for Developing a Simulator for a Novel Concurrent Multithreaded Processor Architecture, by J. Huang and D. Lilja, Proceedings of the 6th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems, July, 1998

Processor Self-Scheduling in Parallel Discrete Event Simulation, by P. Konas and P.-C. Yew, Proceedings of 1995 Winter Simulation Conference, Dec. 1995.

Parallel Simulations of Multiprocessors, by P. Konas and P.-C. Yew, Simulation: Practice and Theory, Elsevier Science Publisher, 1994.

Execution-Driven Tools for Parallel Simulation of Parallel Architecture and Applications, by D. Poulsen and P.-C. Yew, Proceedings of Supercomputing '93, Nov. 1993, pp. 860-869.

Benchmarking and Parallel Applications

Performance and Program Complexity in Contemporary Network-based Parallel Computing Systems, by Steven VanderWiel, Dafna Nathanson, and David J. Lilja, High-Performance Parallel Computing Research Group Technical Report #HPPC-96-02, Mar. 1996.

A Data Parallel Implementation of the TRFD Program from the Perfect Benchmarks, by David J. Lilja and Jonathan Schmitt, EUROSIM International Conference on Massively Parallel Processing Applications and Development, Delft, The Netherlands, June 1994, pp. 355-362.

Scheduling for Parallel Systems

Performance Analysis and Prediction of Processor Scheduling Strategies in Multiprogrammed Shared-Memory Multiprocessors, by Kelvin K. Yue and David J. Lilja, Proceedings of International Conference on Parallel Processing, Aug. 1996.

Dynamic Scheduling Strategies for Shared-Memory Multiprocessors, by Babak Hamidzadeh and David J. Lilja, Proceedings of International Conference on Distributed Computing Systems, May 1996.

Efficient Execution of Parallel Applications in Multiprogrammed Multiprocessor Systems, by Kelvin K. Yue and David J. Lilja, Proceedings of International Parallel Processing Symposium, April 1996, pp. 448-456.

Dynamic Scheduling Techniques for Heterogeneous Computing Systems, by Babak Hamidzadeh, David J. Lilja, and Yacine Atif, Concurrency: Practice and Experience, Special Issue on Resource Management in Parallel and Distributed Systems, Vol. 7, No. 7, Oct. 1995.

Parallel Loop Scheduling for High-Performance Computers, by Kelvin K. Yue and David J. Lilja, High Performance Computing: Technology, Methods, and Applications, by J. Dongarra, L. Grandinetti, G. Joubert and J. Kowalik (eds.), Elsevier Publishing Company, Amsterdam, Sep. 1995.

Parameter Estimation for a Generalized Parallel Loop Scheduling Algorithm, by Kelvin K. Yue and David J. Lilja, Practical Handbook of Genetic Algorithms, Volume 2: New Frontiers, by Lance D. Chambers (ed.), CRC Press, Inc., Boca Raton, Florida, Aug. 1995.

Performance Evaluation of Different Scheduling Schemes on Multiprocessor Architectures, by Arundhati Kalavade, High-Performance Parallel Computing Research Group Technical Report, #HPPC-95-03 (also M.S. thesis), June 1995.

Partitioning Tasks Between a Pair of Interconnected Heterogeneous Processors: A Case Study, by David J. Lilja, Concurrency: Practice and Experience, Vol. 7, No. 3, May 1995, pp. 209-223. (short version)

Loop-Level Process Control: An Effective Processor Allocation Policy for Multiprogrammed Shared-Memory Multiprocessors, by Kelvin K. Yue and David J. Lilja, Proceedings of Workshop on Job Scheduling Strategies for Parallel Processing, International Parallel Processing Symposium, D.G. Feitelson and L. Rudolph (eds.), Springer-Verlag Lecture Notes in Computer Science, Vol. 949, April 1995, pp. 182-199. (Also as High-Performance Parallel Computing Research Group Technical Report #HPPC-95-02, Jan. 1995)

Categorizing Parallel Loops Based on Iteration Execution Time Variances, by Kelvin K. Yue and David J. Lilja, High-Performance Parallel Computing Research Group Technical Report #HPPC-94-13, Nov. 1994.

Self-Adjusting Scheduling: An On-Line Optimization Technique for Locality Management and Load Balancing, by Babak Hamidzadeh and David J. Lilja, Proceedings of International Conference on Parallel Processing, Volume II: Software, Aug. 1994, pp. 39-46.

The Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared-Memory Multiprocessor, by David J. Lilja, IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No. 6, June 1994, pp. 573-584.

Exploiting the Parallelism Available in Loops, by David J. Lilja, IEEE Computer, Vol. 27, No. 2, Feb. 1994, pp. 13-26.

