Agassiz Static and Dynamic Compilers
**** Please also visit DynOpt Group for
more information on dynamic compilers.
"Recovery Code Generation for General Speculative Optimizations",
by J.Lin, W.C. Hsu, P.C. Yew, R.D.C. Ju, and T.F. Ngai,
ACM Transactions on Architecture and Code Optimization (TACO), Vol.3,
No.1, March 2006, pp. 67-89
Dynamic Code Region (DCR) Based Program Phase Tracking
and Prediction for Dynamic Optimizations,
by J. Kim, S.V. Kodakara, W.C. Hsu, D.J. Lilja, P.C. Yew,
Lecture Notes in Computer Science, Volume 3793, Oct 2005, Pages 203 - 217.
"A General Compiler Framework for Speculative Optimizations
Using Data Speculative Code Motion",
by X. Dai, A. Zhai, W.C. Hsu and P.C. Yew,
Proc. of the Third Annual IEEE/ACM Int'l Symp. on Code Generation and
Optimization (CGO), March 2005.
"Performance of Runtime Optimization on BLAST",
by A. Das, J. Lu, H. Chen, J. Kim, P.C. Yew, W.C. Hsu, D.Y. Chen,
Proc. of the Third Annual IEEE/ACM Int'l Symp. on Code Generation and
Optimization (CGO), March 2005.
Loop Selection for Thread-Level Speculation,
S.Wang, X.Dai, K.Yellajyosula, A.Zhai, and P.C. Yew,
Proc. of the 18th Workshop on Languages and Compilers for
Parallel Computing (LCPC), Aug. 2005
"A Compiler Framework for Recovery Code Generation in General
Speculative Optimizations",
J.Lin, W.C. Hsu, P.C. Yew, R.D. Ju and T.F. Ngai,
Proc. of Int'l Conf. on Parallel
Architectures and Compilation Techniques (PACT), September 2004,
pp. 17-28.
"A Compiler Framework for Speculative Optimizations",
by J.Lin, T.Chen, W.C. Hsu, P.C. Yew, R.D.C. Ju, T.F. Ngai and S.Chan,
ACM Transactions on Architecture and Code Optimization (TACO), Vol.1,
No.3, September 2004, pp. 247-271
"Design and Implementation of a Lightweight Dynamic Optimization System",
by Jiwei Lu, Howard Chen, Pen-Chung Yew, Wei Chung Hsu,
Journal of Instruction-Level Parallelism, Volume 6, 2004
"Interprocedural Induction Variable Analysis",
by P.Y. Tang and P.C. Yew,
International Journal of Foundations of Computer Science,
World Scientific, Vol.14, No.3, June 2003, pp.405-423
Data Dependence Profiling for Speculative Optimizations,
by Tong Chen, Jin Lin, Xiaoru Dai, Wei-Chung Hsu, and Pen-Chung Yew,
International Conference on Compiler Construction (CC), Barcelona, Spain, March 2004
Alias and dependence profiling in ORC and their applications,
by Tong Chen, Chu-Cheow Lim, Tin-fook Ngai and Roy Ju,
the First Intel Dynamic Compilation and Profile-guided Optimization Conference, Nov. 2003
A Compiler Framework for Speculative Analysis and Optimizations,
by Jin Lin, Tong Chen, Wei-Chung Hsu, Pen-Chung Yew, Roy Dz-Ching Ju, Tin-Fook Ngai, Sun Chan,
Proceedings of ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), San Diego, June 2003.
Speculative Register Promotion Using Advanced Load Address Table (ALAT),
by Jin Lin, Tong Chen, Wei-Chung Hsu, Pen-Chung Yew,
Proceedings of the First Annual IEEE/ACM International Symposium on Code Generation and Optimization (CGO), San Francisco, March 2003.
An Empirical Study on the Granularity of Pointer Analysis in C Programs,
by Tong Chen, Jin Lin, Wei-Chung Hsu, Pen-Chung Yew,
Proceedings of 15th Workshop on Languages and Compilers for Parallel Computing (LCPC), August 2002.
On the Impact of Naming Methods for Heap-Oriented Pointers in C Programs,
by Tong Chen, Jin Lin, Wei-Chung Hsu, Pen-Chung Yew,
Proceedings of The 6th International Symposium on Parallel Architectures, Algorithms, and Networks, May 2002.
Integrating
scalar analysis and optimizations in a Parallel and optimizing compiler,
by B.Zheng, Ph.D. Thesis, Jan. 2000.
Designing
the Agassiz Compiler for Concurrent Multithreaded Architectures,
by B. Zheng, J.-Y. Tsai, B. Y. Zang, T. Chen, B. Huang, J. H. Li, Y. H.
Ding, J. Liang, Y. Zhen, P.-C. Yew, C.Q. Zhu,
Workshop on Languages and Compilers for Parallel Computing (LCPC), August 1999.
A
Hierarchical Approach to Context-Sensitive Interprocedural Alias Analysis,
by Bixia Zheng and Pen-Chung Yew,
TR99-018, Univ. of Minnesota
High-Level
Information - An Approach for Integrating Front-End and Back-End Compilers,
by S. Cho, J.-Y. Tsai, Y. Song, B. Zheng, S. J. Schwinn, X. Wang, Q. Zhao,
Z. Li, D. J. Lilja, and P.-C. Yew,
Proceedings of the 1998 International Conference on Parallel Processing (ICPP),
August 1998.
(Also as Technical
Report #98-008, Dept. of Computer Science and Engineering, Univ. of
Minnesota, February 1998.)
Compiler
Techniques for Concurrent Multithreading with Hardware Speculation Support,
by Z. Li, J.-Y. Tsai, X. Wang, P.-C. Yew, and B. Zheng,
Proceedings of the 9th Workshop on Languages and Compilers for Parallel
Computing (LCPC), August 1996.
An
Efficient Algorithm for the Run-Time Parallelization of Doacross Loops,
by D.K. Chen, D.A. Oesterreich, J. Torrellas, and P.-C. Yew,
Technical Report #97-028, Dept. of Computer Science, Univ. of Minnesota,
July 1997. Preliminary version appeared in Supercomputing '94.
Statement
Reordering for Doacross Loops,
by D.K. Chen and P.-C. Yew,
Technical Report #97-029, Dept. of Computer Science, Univ. of Minnesota, July 1997.
Preliminary version appeared in ICPP '94.
On
Effective Execution of Non-Uniform DOACROSS Loops,
by D.K. Chen and P.-C. Yew,
IEEE Transactions on Parallel and Distributed Systems (TPDS), Vol. 7, No. 5, May 1996.
Enhancing
Multiple-Path Speculative Execution with Predicate Window Shifting,
by J.Y. Tsai and P.-C. Yew,
Journal of Systems Architecture - Special Issue on Microprocessor Architecture,
1998.
Speculative Multi-Threaded, Multi-Core Architectures
**** Please also visit Arctic Group for
more information on superthreaded architectures.
"Supporting Speculative Multithreading on Simultaneous
Multithreaded Processors",
by V.Packirisamy, S.Wang, A. Zhai, W.C.Hsu and P.C.Yew,
Proc. of Int'l Conf. on High Performance Computing (HiPC), Dec. 2006
The
Superthreaded Processor Architecture,
by J.-Y. Tsai, J. Huang, C. Amlo, D.J. Lilja, and P.-C. Yew,
In the IEEE Transactions on Computers, Special Issue on Multithreaded
Architectures, vol. 48, no. 9, Sep., 1999
Performance
Study of a Concurrent Multithreaded Processor,
by J.-Y. Tsai, Z. Jiang, E. Ness, and P.-C. Yew,
Proceedings of the Fourth Int'l Conf. on High-Performance Computer
Architecture (HPCA-4), Feb. 1998.
The
Superthreaded Architecture: Thread Pipelining with Run-Time Data Dependence
Checking and Control Speculation,
by J.-Y. Tsai and P.-C. Yew,
Proceedings of Int'l Conf. on Parallel Architectures and Compilation
Techniques (PACT '96), Oct. 1996.
Compiler
Techniques for the Superthreaded Architectures,
by J.-Y. Tsai, Z. Jiang, and P.-C. Yew,
International Journal of Parallel Programming - Special Issue on Languages
and Compilers for Parallel Computing, June 1998.
Superthreading:
Integrating Compilation Technology and Processor Architecture for Cost-Effective
Concurrent Multithreading,
by J.-Y. Tsai, Z. Jiang, Z. Li, D.J. Lilja, X. Wang, P.-C. Yew, B. Zheng,
and S. Schwinn,
Journal of Information Science and Engineering, March 1998.
Improving
Instruction Throughput and Memory Latency Using Two-Dimensional Superthreading,
by J.-Y. Tsai, B. Zheng, and P.-C. Yew, Technical Report
Program
Optimization for Concurrent Multithreaded Architectures,
by J.-Y. Tsai, Z. Jiang, and P.-C. Yew,
Proceedings of the 10th Workshop on Languages and Compilers for Parallel
Computing, Aug. 1997.
Integrating
Compilation Technology and Processor Architecture for Cost-Effective Concurrent
Multithreading,
by J.-Y. Tsai,
Ph.D. Thesis, Computer Science, University of Illinois at Urbana-Champaign,
April 1998.
Compiler
and Architecture Issues for Concurrent Multi-threaded Architectures,
by P.-C. Yew, Presentation Material for the Intel MRL Research Forum, Nov. 1996.
Integrating
Compilation Technology and Processor Architecture for Cost-Effective Concurrent
Multithreading,
by P.-C. Yew,
Presentation Material for the SGI/CRAY Future Architecture Seminar,
July 1997
Speculative Execution
Decoupled
Value Prediction on Trace Processors,
by S.J. Lee, Y. Wang and P.C. Yew,
Proceedings of the 6th Int'l Conf. on High Performance Computer Architecture
(HPCA-6), Toulouse, France, Jan. 2000
Exploiting
Basic Block Value Locality with Block Reuse,
by J. Huang and D. J. Lilja,
Proceedings of the 5th Int'l Symposium on High Performance Computer
Architecture (HPCA-5), Orlando, Jan., 1999
System Software
Live Updating Operating Systems Using Virtualization,
by H.Chen, R.Chen, F.Zhang, B.Zang, P.C.Yew,
Proc. of 2nd Int'l Conf. on Virtual Execution Environments (VEE),
pp. 35-44, June 2006.
High-Performance Memory System Design
Processor memory system design
A High-Bandwidth Memory Pipeline for Wide Issue Processors,
by S. Cho, P.-C. Yew, and G. Lee,
IEEE Transactions on Computers, Vol. 50, No. 7, July 2001.
Decoupling Local Variable Accesses in a Wide-Issue Superscalar Processor,
by S. Cho, P.-C. Yew, and G. Lee,
Proceedings of the 26th Int'l Symp. on Computer Architecture (ISCA), May 1999.
(Also as Technical Report #98-020, Dept. of Computer Sci. and Eng.,
Univ. of Minnesota, May 1998)
Access Region Locality for High-Bandwidth Processor Memory System Design,
by S. Cho, P.-C. Yew, and G. Lee,
Proceedings of the 32nd Int'l Symp. on Microarchitecture (MICRO-32),
Nov. 1999.
Multiprocessor cache and memory design
Efficient integration of compiler-directed cache coherence and data
prefetching,
by H.-B. Lim and P.-C. Yew,
Journal of Parallel and Distributed Computing, Vol. 61, No. 12, Dec.
2001, pp. 1775-1802.
Efficient integration of compiler-directed cache coherence and data
prefetching,
by H.-B. Lim and P.-C. Yew,
Proceedings of the International Parallel and Distributed Processing
Symposium (IPDPS 2000), May 2000 (Best Paper Award).
Binding
Time in Distributed Shared Memories,
by J. Kong, PhD Thesis, June 1999.
Maintaining
Cache Coherence through Compiler-directed Data Prefetching,
by H.-B. Lim and P.-C. Yew,
Journal of Parallel and Distributed Computing, Vol. 53, No. 2, Sep.
1998, pp. 144-173.
Hardware
and Compiler-Directed Cache Coherence in Large-Scale Multiprocessors,
by L. Choi and P.-C. Yew,
Technical Report #97-030, Dept. of Computer Science, Univ. of Minnesota,
July 1997.
Compiler
Analysis for Cache Coherence,
by L. Choi and P.-C. Yew,
Technical Report #97-031, Dept. of Computer Science, Univ. of Minnesota,
July 1997.
A
compiler-directed cache coherence scheme using data prefetching,
by H.-B. Lim and P.-C. Yew,
Proceedings of the 1997 International Parallel Processing Symposium,
April 1997.
Techniques
for compiler-directed cache coherence,
by L. Choi, H.-B. Lim, and P.-C. Yew,
IEEE Parallel & Distributed Technology, Winter 1996, pp. 23-34.
Compiler
support for maintaining cache coherence using data prefetching,
by H. B. Lim, L. Choi, and P.-C. Yew,
Extended abstract in Proceedings of the Ninth Workshop on Languages
and Compilers for Parallel Computing (LCPC '96), Santa Clara, CA, Aug. 1996.
Program
Analysis for Cache Coherence: Beyond Procedural Boundaries,
by L. Choi and P.-C. Yew,
Proceedings of International Conference on Parallel Processing, Aug. 1996.
Compiler
and Hardware Support for Cache Coherence in Large-Scale Multiprocessors:
Design Considerations and Performance Evaluation,
by L. Choi and P.-C. Yew,
Proceedings of International Symposium on Computer Architecture, May
1996, pp. 283-294.
Eliminating
Stale Data References through Array Data-Flow Analysis,
by L. Choi and P.-C. Yew,
Proceedings of International Parallel Processing Symposium, April 1996.
Interprocedural
Array Data-Flow Analysis for Cache Coherence,
by L. Choi and P.-C. Yew,
Proceedings of Eighth Workshop on Languages and Compilers for Parallel
Computing, Aug. 1995.
Compiler
Assistance for Directory-Based Cache Coherence Enforcement,
by David J. Lilja,
Proceedings of Workshop on Challenges for Parallel Processing, International
Conference on Parallel Processing, Aug. 1995, pp. 133-138.
The
Potential of Compile-Time Analysis to Adapt the Cache Coherence Enforcement
Strategy to the Data Sharing Characteristics,
by Farnaz Mounes-Toussi and David J. Lilja,
IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 5, May 1995, pp. 470-481.
Using
Compiler Assistance to Reduce the Network Traffic Requirements of a Directory-Based
Cache Coherence Mechanism,
by Zhiyuan Li, Farnaz Mounes-Toussi, and David J. Lilja,
High-Performance Parallel Computing Research Group Technical Report
#HPPC-95-01, Jan. 1995.
Reducing
the Impact of False-Sharing Using a Write-Through Cache with Partial Block
Invalidation,
by Farnaz Mounes-Toussi and David J. Lilja,
High-Performance Parallel Computing Research Group Technical Report
#HPPC-94-15, Dec. 1994.
A
Compiler-Directed Cache Coherence Scheme with Improved Intertask Locality,
by L. Choi and P.-C. Yew,
Proceedings of Supercomputing '94, Washington, D.C., Nov. 1994, pp.
773-782.
A
Superassociative Tagged Cache Coherence Directory,
by David J. Lilja and Shanthi Ambalavanan,
Proceedings of International Conference on Computer Design, Oct. 1994,
pp. 42-45.
(Extended
version)
A
Compiler-Assisted Scheme for Adaptive Cache Coherence Enforcement,
by Trung N. Nguyen, Farnaz Mounes-Toussi, David J. Lilja, and Zhiyuan Li,
Proceedings of IFIP International Conference on Parallel Architectures
and Compilation Techniques, Aug. 1994, pp. 69-78.
An
Evaluation of a Compiler Optimization for Improving the Performance of
a Coherence Directory,
by Farnaz Mounes-Toussi, David J. Lilja, and Zhiyuan Li,
Proceedings of ACM International Conference on Supercomputing, July
1994, pp. 75-84.
Software
Assistance for Directory-Based Caches,
by Z. Li,
Proceedings of the 8th IEEE International Parallel Processing Symposium,
1994.
Performance
Limits of Compiler-Directed Multiprocessor Cache Coherence Enforcement,
by Farnaz Mounes-Toussi and David J. Lilja,
The Interaction of Compilation Technology and Computer Architecture,
D. J. Lilja and P. L. Bird (eds.)
Kluwer Academic Publishers, Boston, MA, 1994, pp. 161-190.
Improving
Memory Utilization in Cache Coherence Directories,
by David J. Lilja and Pen-Chung Yew,
IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No.
10, Oct. 1993, pp. 1130-1146.
Cache
Coherence in Large-Scale Shared-Memory Multiprocessors: Issues and Comparisons,
by David J. Lilja,
ACM Computing Surveys, Vol. 25, No. 3, Sep. 1993, pp. 303-338.
Compiler
Support for the Efficient Use of Cache Coherence Directories,
by Trung N. Nguyen, Zhiyuan Li, and David J. Lilja,
High-Performance Parallel Computing Research Group Technical Report
#HPPC-94-19, Dec. 1994.
(Also appeared as "Efficient Use of Dynamically Tagged Directories
Through Compiler Analysis,"
by Trung N. Nguyen, Zhiyuan Li, and David J. Lilja,
Proceedings of International Conference on Parallel Processing, Vol.
II: Software, Aug. 1993, pp. 112-119)
Locality enhancement and latency hiding
Integrating
Fine-Grained Message Passing in Cache Coherent Shared-Memory Multiprocessors,
by D. Poulsen and P.-C. Yew,
Journal of Parallel and Distributed Computing, Vol. 33, No. 2, March
1996, pp. 172-188.
Write
Buffer Design for Cache-Coherent Shared-Memory Multiprocessors,
by Farnaz Mounes-Toussi and David J. Lilja,
Proceedings of International Conference on Computer Design, Oct. 1995,
pp. 506-511.
An
Interprocedural Parallelizing Compiler and Its Support for Memory Hierarchy
Research,
by J. Gu, Z. Li, and T.N. Nguyen,
Languages and Compilers for Parallel Computing, Lecture Notes in Computer
Science, 1033, Springer-Verlag, Aug. 1995.
Data
Prefetching and Data Forwarding in Shared-Memory Multiprocessors,
by D. Poulsen and P.-C. Yew,
Proceedings of the International Conference on Parallel Processing,
Vol. II, Aug. 1994, pp. 276-280.
Performance Analysis and Simulation Tools
An
Efficient Strategy for Developing a Simulator for a Novel Concurrent Multithreaded
Processor Architecture,
by J. Huang and D. Lilja,
Proceedings of the 6th International Symposium on Modeling, Analysis,
and Simulation of Computer and Telecommunication Systems, July, 1998
Processor
Self-Scheduling in Parallel Discrete Event Simulation,
by P. Konas and P.-C. Yew,
Proceedings of 1995 Winter Simulation Conference, Dec. 1995.
Parallel
Simulations of Multiprocessors,
by P. Konas and P.-C. Yew,
Simulation: Practice and Theory, Elsevier Science Publisher, 1994.
Execution-Driven
Tools for Parallel Simulation of Parallel Architecture and Applications,
by D. Poulsen and P.-C. Yew,
Proceedings of Supercomputing '93, Nov. 1993, pp. 860-869.
Benchmarking and Parallel Applications
Performance
and Program Complexity in Contemporary Network-based Parallel Computing
Systems,
by Steven VanderWiel, Dafna Nathanson, and David J. Lilja,
High-Performance Parallel Computing Research Group Technical Report
#HPPC-96-02, Mar. 1996.
A
Data Parallel Implementation of the TRFD Program from the Perfect Benchmarks,
by David J. Lilja and Jonathan Schmitt,
EUROSIM International Conference on Massively Parallel Processing Applications
and Development, Delft, The Netherlands, June 1994, pp. 355-362.
Scheduling for Parallel Systems
Performance
Analysis and Prediction of Processor Scheduling Strategies in Multiprogrammed
Shared-Memory Multiprocessors,
by Kelvin K. Yue and David J. Lilja,
Proceedings of International Conference on Parallel Processing, Aug.
1996.
Dynamic
Scheduling Strategies for Shared-Memory Multiprocessors,
by Babak Hamidzadeh and David J. Lilja,
Proceedings of International Conference on Distributed Computing Systems,
May 1996.
Efficient
Execution of Parallel Applications in Multiprogrammed Multiprocessor Systems,
by Kelvin K. Yue and David J. Lilja,
Proceedings of International Parallel Processing Symposium, April 1996,
pp. 448-456.
Dynamic
Scheduling Techniques for Heterogeneous Computing Systems,
by Babak Hamidzadeh, David J. Lilja, and Yacine Atif,
Concurrency: Practice and Experience, Special Issue on Resource Management
in Parallel and Distributed Systems, Vol. 7, No. 7, Oct. 1995.
Parallel
Loop Scheduling for High-Performance Computers,
by Kelvin K. Yue and David J. Lilja,
High Performance Computing: Technology, Methods, and Applications,
by J. Dongarra, L. Grandinetti, G. Joubert and J. Kowalik (eds.), Elsevier
Publishing Company, Amsterdam, Sep. 1995.
Parameter
Estimation for a Generalized Parallel Loop Scheduling Algorithm,
by Kelvin K. Yue and David J. Lilja,
Practical Handbook of Genetic Algorithms, Volume 2: New Frontiers,
by Lance D. Chambers (ed.),
CRC Press, Inc., Boca Raton, Florida, Aug. 1995.
Performance
Evaluation of Different Scheduling Schemes on Multiprocessor Architectures,
by Arundhati Kalavade,
High-Performance Parallel Computing Research Group Technical Report,
#HPPC-95-03 (also M.S. thesis), June 1995.
Partitioning
Tasks Between a Pair of Interconnected Heterogeneous Processors: A Case
Study,
by David J. Lilja,
Concurrency: Practice and Experience, Vol. 7, No. 3, May 1995, pp. 209-223.
(short
version)
Loop-Level
Process Control: An Effective Processor Allocation Policy for Multiprogrammed
Shared-Memory Multiprocessors,
by Kelvin K. Yue and David J. Lilja,
Proceedings of Workshop on Job Scheduling Strategies for Parallel Processing,
International Parallel Processing Symposium,
D.G. Feitelson and L. Rudolph (eds.)
Springer-Verlag Lecture Notes in Computer Science, Vol. 949, April
1995, pp. 182-199.
(Also as High-Performance Parallel Computing Research Group Technical
Report #HPPC-95-02, Jan. 1995)
Categorizing
Parallel Loops Based on Iteration Execution Time Variances,
by Kelvin K. Yue and David J. Lilja,
High-Performance Parallel Computing Research Group Technical Report
#HPPC-94-13, Nov. 1994.
Self-Adjusting
Scheduling: An On-Line Optimization Technique for Locality Management and
Load Balancing,
by Babak Hamidzadeh and David J. Lilja,
Proceedings of International Conference on Parallel Processing, Volume
II: Software, Aug. 1994, pp. 39-46.
The
Impact of Parallel Loop Scheduling Strategies on Prefetching in a Shared-Memory
Multiprocessor,
by David J. Lilja,
IEEE Transactions on Parallel and Distributed Systems, Vol. 5, No.
6, June 1994, pp. 573-584.
Exploiting
the Parallelism Available in Loops,
by David J. Lilja,
IEEE Computer, Vol. 27, No. 2, Feb. 1994, pp. 13-26.