CAPSL Gelato | Advancing Linux Itanium HP Open64 at SourceForge Tsinghua University University of Delaware

Google Summer of Code 2008

Google Summer of Code (GSoC) is a program that offers student developers stipends to write code for various open source projects. Google will be working with a several open source, free software, and technology-related groups to identify and fund several projects over a three month period.




Open64 Projects Funded by Google


1. Implementation of a static analysis phase for detecting potential memory leaking and duplicate free bugs
Student:  Thomas St. John; Mentor: Handong Ye
Abs.: In a language that doesn't provide garbage collection support/facilities, such as C, C++, programmers are required to manage memory themselves. However, this may cause some problems, such as duplicate free of same memory region, memory leaking, etc. There are some tools that can help programmers debug this program. However, many of them requires programmers to run the programs. Also, the detection result highly depends on the input and behavior of the program. Statically detecting memory leaks has been extensively studied in research community, such as "Practical Memory Leak Detection using Guarded Value-Flow Analysis" (Sigmund Cherem Lonnie Princehouse Radu Rugina PLDI' 07). However, Open64 compiler currently does not have any phases for memory safety detection. This proposal seeks to implement one of the static analysis techniques for detecting memory leaking, so that potential memory leaking/duplicate free problems can be detected at compilation time. The advantage with this is that programmers can be informed of possible memory operation issues at compile-time.

2. Advanced dead function elimination
Student: Chris Adamopoulos; Mentor: Gan, Ge
Abs.: Open64 Dead Function Elimination suffers from the fact that all virtual functions are address exposed via virtual tables. In this project, class hierarchy analysis and determination of the possible set of overrider functions for a class hierarchy needs to be performed. Based on this analysis, Dead Function Elimination can be made much sharper than it is today.

3. Implementation of New LNO Heuristics and Cost Models on X8664
Student: Chuntao Hong; Mentor: Robert Hundt
Abs.: Heuristics play an essential role in LNO. However, the heuristics and the models implemented in Open64 are implemented long ago. The computer architecture, especially the x86 architecture, has come a long way since then. We now have powerful Out Of Order (OOO) units, deep pipeline, and more registers on X8664. As a result, the old models in Open64 cannot reflect the characteristics of the new architecture, causing the heuristics to give wrong decisions. Through implementing new heuristics and cost models, we can better leverage the power of new X8664 CPUs, and make the Open64 compiler more powerful.

4. Function/Loop Specific Optimization
Student: Eun Jung Park Mentor: John Cavazos
Abs.: Nowadays, compilers have a plethora of optimizations that would confuse most users. Specific combinations of these optimizations can increase performance, reduce power consumption, reduce code size, or improve all of these metrics. However, good combinations are difficult to find since the space of optimizations can be enormous. Because of this, many researchers have focused on finding the best combination of optimizations or a good ordering of them given a specific architecture and application. However, previously proposed techniques have involved search (therefore are expensive) and require user-intervention. It remains an open problem to find the best set of optimization and ordering of these optimizations efficiently. This project focuses on finding a good ordering of optimizations for real-world applications, for example, bioinformatics, financial, and data-mining applications. Our techniques will be designed to be portable across a wide range of applications and architectures and it will use machine learning algorithms to learn good optimization orderings. Just as a human can learn how to solve complex problems, this project will develop a machine learning framework to use important characteristics of applications and hardware to help the compiler find the best combination and orderings of optimizations. This project will use several bioinformatics, data mining, and financial applications compiled using the Open64 compiler targeted to different architectures, including IA64 and X86-64.




Idea List:

1. Advanced dead function elimination:
Open64 DFE suffers from the fact that all virtual functions are address exposed via virtual tables. In this project, class hierarchy analysis and determination of the possible set of overrider functions for a class hierarchy needs to be performed. Based on this analysis, DFE can be made much sharper than it is today.



2. Rewrite GCM in CG:


The current global code motion (GCM) has a bad coding style, not consistent with the other parts. The open64 coding is a c++ style while GCM is the traditional C style, with lots of global variables and global functions. It would be nice if we can rewrite this part in CG. 3+ years C++, STL programming experience, at least have read Aho's dragon book or Muchnick's compiler text book.





3. Rewrite LRA in CG:


Current LRA was created many years ago when there are few compilers written in C++, so the design and coding style is far different than other parts of the compiler. The functions and data structures are not organized well, and this adds extra efforts for new developers of open64. This project is to redesign and implement a new LRA, to meet the style of the whole compiler. The student will get a good opportunities to understand how local register allocation is done, and will be trained of good programmer.

4. Implement inline heuristics to find out most profitable inline decisions for applications.


Inlining can expand the optimization scope for many optimizations such as partial redundancy elimination, global code motion, data layout, etc. It's also a brutal method to convert context insensitive points-to analysis to context sensitive for inlined callsites. It can also eliminate call overhead. However, the negative side for inlining is that it will introduce register pressure and I-Cache pressure, which will finally harm performance; and also the code explosion should also be prevented because of its resulting compile time explosion. So the problem "how to find the most important inline decisions" contains two questions: how to find the callsites inlining which will bring most performance gain; and how to find the callsites inlining which will harm most performance. This can be done by fine tuned heuristics using static program analysis techniques, and also we can borrow profiles of runtime information to help make better decisions.

5. Implement latest points-to research result into Open64 compiler, and evaluate their effectiveness, add profiling support for points-to analysis:



Points-to is one of most important part in compiler and other program analysis tools, research community had invented a lot of new and practical algorithms to solve the scalability and precision issue. In Open64, the pointer analysis part is very old, use the algorithm based on Steesgaard'96 paper. In this work, we try to implement the new flow-insensitive and context-insensitive algorithm based on Calvin Lin's paper. Also we can use profiling information and static heuristic to identify some important alias pair, then apply more expensive algorithm, such as context sensitive, to these pair in a demand driven way.

6. Software Prefetching:



Implement advanced profiling guide software prefetch, investigate the interaction between software: Research and implement prefetch for pointer chasing access patterns,Tune stride prefetch for array based access style code. Collaborating of  software prefetch and hardware prefetch on multiple platforms,research on bandwidth efficiency and power efficiency optimization opportunities.

7. Implement a practical unified data structure optimization framework for memory systems.



Hiding the latency of memory system is critical to achieving high performance on modern machines. In the past, lots of work have been done to improve locality of matrix accessing in loops, as well as prefetching of regular stride memory access. But completed data structures, for example, array of structures, pointer chasing structures etc, are not well handled. Recently, there are several researches addressing this issue. Sacral optimizations are proposed, such as field reordering, dead field removing, structure peeling, structure splitting etc. Preliminary results of those optimizations are promising. However, the work of those researches are either not practical, for example the target language is a tailored type-safe C. Or not easy to use, for example need huge memory trace. Or not well integrated. The work proposed here unifies all the optimizations above as well as readdressing and prefetching for complicated data structures with the help of auto memory pooling. The framework will be implemented on Open64 and has no pre-diction on its target language.

8. Implement a static program analysis tool based on Open64 compiler:



Static program analysis became one of most powerful tools to identify the bugs in type-unsafe languages, such as c. Open64 compiler provide a very good infra-structure for inter-procedural analysis. in this work, we need to implement a static program analysis tool based on Open64's facilities.

9. Advanced loop nest optimization


Loop nest optimization is still proven to be a very useful phase to improve the performance of a program, in this work, we need to survey some latest loop optimization research result, and implemented it into the open64 compiler. Also we can do some enhancement on open64's existed LNO, such as loop unrolling,etc.

10. Improving the Open64 backend for GPUs



Nvidia is using an open64 port of its backend to generate code for several of their GPU cards.  This port probably could be improved in several ways, for example in its register allocation or instruction scheduling. Also, several backend
optimizations could be added that might significantly improve the performance of GPU code generated by the open64 backend.

11. Function/Loop specific optimizations


Selecting the best set of optimizations for a function or loop has been the focus of optimizing compiler research for decades.  Determining the best set of optimizations, however is notoriously difficult.   The project will try to develop techniques using machine learning to drive an optimizing compiler to perform the best set of optimizations for functions and loops based on the underlying architecture.

12. Improve the Dead Store Elimination in WOPT so it does effective optimization in the presence of aliasing.  Potentially, extend the implementation to cover indirect stores.

13. Rewrite the WHIRL simplifier using templates.  Look for more peephole optimization opportunities and add them to the WHIRL simplifier.

14. Implement run-time versioning for pointers that cannot be de-aliased at compile time. When two pointers are determined at run-time to point to non-overlapping memory, then branch to the more optimized version of code.