Agassiz Static and Dynamic Compilers

Compilers are an essential component in supporting speculative multi-threaded, multi-core processors. Very aggressive compiler optimizations are needed to extract thread-level parallelism from general-purpose applications. New analysis approaches, such as profiling, allow data dependences, aliases, and control dependences to be obtained efficiently to support such aggressive optimizations and speculation. Recovery code must also be generated to guarantee correct execution when mis-speculation occurs. Compiler optimizations can also be performed at runtime, when more information about the application program is available; these are called dynamic optimizations. Special considerations and techniques, such as phase detection, overhead reduction through efficient sampling, and code cache management, are needed to support such dynamic schemes. The targeted languages are C and Fortran.
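
As a simplified illustration of speculation with recovery code, the sketch below uses multi-versioning with a runtime alias check in C as a stand-in for true speculation support: an optimized copy routine runs when two pointers are assumed not to alias, and a recovery routine that preserves the original loop semantics runs when the assumption fails. The function names are hypothetical and this is not the code the Agassiz compiler actually generates.

    /* Sketch: speculate that src and dst do not overlap, run an optimized
     * version, and fall back to a recovery version that reproduces the
     * original loop semantics when the runtime check detects overlap. */
    #include <stdio.h>
    #include <string.h>

    /* Optimized version: valid only when src and dst do not alias. */
    static void copy_optimized(int *dst, const int *src, int n)
    {
        memcpy(dst, src, n * sizeof(int));   /* free to reorder/vectorize */
    }

    /* Recovery version: element-by-element copy in original program order,
     * safe even when the regions overlap. */
    static void copy_recovery(int *dst, const int *src, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = src[i];
    }

    /* Speculative entry point: a cheap overlap test picks the path. */
    static void copy_speculative(int *dst, const int *src, int n)
    {
        if (dst + n <= src || src + n <= dst)   /* speculation holds */
            copy_optimized(dst, src, n);
        else                                    /* mis-speculation: recover */
            copy_recovery(dst, src, n);
    }

    int main(void)
    {
        int a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
        copy_speculative(a + 1, a, 7);          /* overlapping: recovery path */
        for (int i = 0; i < 8; i++)
            printf("%d ", a[i]);
        printf("\n");
        return 0;
    }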

Speculative Multi-Threaded, Multi-Core Architectures

Multi-threaded, multi-core processors are on the road map of all major microprocessor vendors, and even embedded processor vendors. The low communication latency on a chip allows such systems to exploit finer-grained parallelism in general-purpose (i.e., integer-intensive) applications for better performance and lower power consumption. However, more advanced architectural and compiler support, such as speculation, is required to achieve these goals. Superthreaded architectures allow multiple threads to execute concurrently, using thread-level control speculation and runtime data dependence checking to speed up single-program execution. The detailed execution-driven simulator for this novel architecture can be downloaded here.
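
The sequential C sketch below illustrates, purely in software, the kind of runtime data dependence check such an architecture performs: a speculative successor thread logs the addresses it reads, and a later store by a predecessor thread to a logged address triggers a squash and re-execution. It illustrates the idea only, not the superthreaded pipeline or the simulator's implementation.

    /* Sketch of thread-level data speculation with a dependence check. */
    #include <stdio.h>
    #include <stdbool.h>

    #define MAX_LOG 16

    typedef struct {
        int load_addr[MAX_LOG];   /* array indices read speculatively */
        int nloads;
    } spec_log_t;

    /* Did the predecessor write a location the successor already read? */
    static bool dependence_violated(const spec_log_t *log, int store_addr)
    {
        for (int i = 0; i < log->nloads; i++)
            if (log->load_addr[i] == store_addr)
                return true;
        return false;
    }

    int main(void)
    {
        int data[8] = {0, 1, 2, 3, 4, 5, 6, 7};
        spec_log_t log = { .nloads = 0 };

        /* Successor thread speculatively reads data[3] early. */
        log.load_addr[log.nloads++] = 3;
        int speculative_value = data[3];

        /* Predecessor thread now stores to data[3]: a flow dependence
         * the speculation missed. */
        int pred_store_addr = 3;
        data[pred_store_addr] = 99;

        if (dependence_violated(&log, pred_store_addr)) {
            /* Squash: discard the speculative value and re-execute. */
            speculative_value = data[3];
            printf("mis-speculation, re-executed load: %d\n", speculative_value);
        } else {
            printf("speculation succeeded: %d\n", speculative_value);
        }
        return 0;
    }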

System Software

Operating system design in conjunction with middleware development is crucial to multi-core embedded systems because of their small on-chip memories as well as their low-power and real-time requirements. On the other hand, virtualization could be used to satisfy many system requirements, but might incur too much overhead. Virtualization "on demand" is one way to address such issues.

High-Performance Memory System Design

Processor memory system design
Techniques to improve memory system performance (i.e., memory latency and memory bandwidth) in dynamically scheduled, multiple-issue processors are investigated.

Multiprocessor cache and memory design
This project focuses on combined hardware/software techniques for maintaining cache coherence and for tolerating and avoiding false-sharing in multiprocessors.
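
For a concrete picture of false sharing, the sketch below has two threads update distinct counters that happen to share a cache line, and shows the usual fix of padding each counter onto its own line. It assumes a 64-byte cache line and a POSIX threads environment (compile with -pthread), and illustrates the phenomenon rather than the hardware/software techniques developed in this project.

    /* Two threads increment distinct counters; when the counters share a
     * cache line the line ping-pongs between caches even though there is
     * no true data sharing.  Padding places each counter on its own line. */
    #include <pthread.h>
    #include <stdio.h>

    #define ITERS      10000000L
    #define CACHE_LINE 64                     /* assumed line size */

    struct counters {                         /* a and b share a cache line */
        long a;
        long b;
    };

    struct padded_counters {                  /* pad keeps b on another line */
        long a;
        char pad[CACHE_LINE - sizeof(long)];
        long b;
    };

    static struct counters        shared;
    static struct padded_counters padded;

    /* Each thread increments one counter ITERS times. */
    static void *bump(void *arg)
    {
        long *counter = arg;
        for (long i = 0; i < ITERS; i++)
            (*counter)++;
        return NULL;
    }

    static void run_pair(long *x, long *y)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, bump, x);
        pthread_create(&t2, NULL, bump, y);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
    }

    int main(void)
    {
        run_pair(&shared.a, &shared.b);   /* falsely shared counters */
        run_pair(&padded.a, &padded.b);   /* padded counters */
        printf("shared: %ld %ld   padded: %ld %ld\n",
               shared.a, shared.b, padded.a, padded.b);
        return 0;
    }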

Locality enhancement and latency hiding
Compiler techniques are developed to reduce the average data reference latency in parallel programs by enhancing data locality and hiding memory latency. For uniform memory access (UMA) multiprocessors, task alignment techniques are developed to increase the cache hit rate. For nonuniform memory access (NUMA) multiprocessors, data and tasks are co-allocated to processors to enhance data affinity. For programs whose locality is difficult to improve, compiler techniques for hiding memory latency are explored, including data forwarding, in the form of a compiler-assisted write-update cache, as well as software prefetching. Experiments are conducted both on SGI multiprocessors and on simulators.
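
As an example of the kind of prefetch code a compiler might insert, the sketch below hand-writes software prefetching for a simple reduction using the GCC/Clang __builtin_prefetch intrinsic. The prefetch distance PF_DIST is a tuning assumption, and the code is an illustration rather than actual Agassiz output.

    /* Software prefetching for a strided reduction: each iteration
     * requests the element PF_DIST ahead so it is in cache when needed. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N       (1 << 20)
    #define PF_DIST 16                    /* elements ahead to prefetch */

    int main(void)
    {
        double *x = malloc(N * sizeof(double));
        if (!x)
            return 1;
        for (int i = 0; i < N; i++)
            x[i] = 1.0;

        double sum = 0.0;
        for (int i = 0; i < N; i++) {
            if (i + PF_DIST < N)
                __builtin_prefetch(&x[i + PF_DIST], 0, 1);  /* read, low reuse */
            sum += x[i];
        }

        printf("sum = %f\n", sum);
        free(x);
        return 0;
    }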

Performance Analysis and Simulation Tools

Parallel discrete-event simulation techniques are developed on shared-memory multiprocessors to reduce simulation time.
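
For context, the core of a discrete-event simulator is an event loop that repeatedly removes and processes the pending event with the smallest timestamp. The sequential C sketch below shows only that loop; a parallel version would additionally partition events across processors and synchronize them so that no event is processed out of timestamp order.

    /* Minimal sequential discrete-event simulation kernel. */
    #include <stdio.h>

    #define MAX_EVENTS 64

    typedef struct { double time; int id; } event_t;

    static event_t queue[MAX_EVENTS];
    static int nevents = 0;

    static void schedule(double time, int id)
    {
        queue[nevents].time = time;
        queue[nevents].id = id;
        nevents++;
    }

    /* Remove and return the earliest pending event (linear scan). */
    static event_t next_event(void)
    {
        int best = 0;
        for (int i = 1; i < nevents; i++)
            if (queue[i].time < queue[best].time)
                best = i;
        event_t e = queue[best];
        queue[best] = queue[--nevents];
        return e;
    }

    int main(void)
    {
        schedule(3.0, 1);
        schedule(1.0, 2);
        schedule(2.0, 3);

        double clock = 0.0;
        while (nevents > 0) {
            event_t e = next_event();
            clock = e.time;               /* advance simulated time */
            printf("t=%.1f: event %d\n", clock, e.id);
        }
        return 0;
    }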

Benchmarking and Parallel Applications

This work is developing metrics to quantify the relationship between performance, programming complexity, and portability of different parallel programming paradigms. This work requires the development of new parallel application programs and the porting of existing applications to many different languages and architectures.

Scheduling for Parallel Systems

A critical consideration in minimizing the execution time of a program is choosing how to allocate parallel tasks to the individual processors. This work is developing new scheduling algorithms for fully exploiting the parallelism available in an application program when executed in multiprogrammed parallel systems.
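
As a point of reference, the sketch below implements one simple textbook heuristic, longest-processing-time-first (LPT) list scheduling, which assigns each task to the currently least-loaded processor. The task costs are made up, and this is a generic illustration, not the scheduling algorithms developed in this project.

    /* LPT list scheduling: sort tasks by decreasing cost, then greedily
     * assign each task to the least-loaded processor. */
    #include <stdio.h>
    #include <stdlib.h>

    #define NTASKS 8
    #define NPROCS 3

    static int cmp_desc(const void *a, const void *b)
    {
        return *(const int *)b - *(const int *)a;
    }

    int main(void)
    {
        int cost[NTASKS] = {5, 3, 8, 2, 7, 4, 6, 1};   /* hypothetical costs */
        int load[NPROCS] = {0};

        qsort(cost, NTASKS, sizeof(int), cmp_desc);

        for (int t = 0; t < NTASKS; t++) {
            int p = 0;                        /* find least-loaded processor */
            for (int q = 1; q < NPROCS; q++)
                if (load[q] < load[p])
                    p = q;
            load[p] += cost[t];
            printf("task cost %d -> processor %d\n", cost[t], p);
        }

        int makespan = 0;
        for (int p = 0; p < NPROCS; p++)
            if (load[p] > makespan)
                makespan = load[p];
        printf("makespan = %d\n", makespan);
        return 0;
    }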


