Effective Global Memory Sharing and Coordination in Cluster Systems

High-end clusters have become standard, cost-effective distributed platforms for large-scale scientific, commercial, and Web service applications. With rapid advances in CPU chip and networking technology, and with increasingly large global memory capacity available in clusters, parallel and distributed application workloads on high-end clusters face four technical challenges: (1) growing Internet usage over high-speed networks puts increasing pressure on cluster computing nodes to deliver fast service responses and fast data accesses; (2) the latency of disk accesses has become a critical bottleneck, increasing the penalty for page faults and other I/O operations; (3) workloads are becoming increasingly data-intensive, and their interactions on computing nodes increasingly dynamic; and (4) systems can be heterogeneous and nondedicated, with computing capability and memory capacity varying among nodes and with multiple jobs competing for both CPU and memory resources during execution.

We are conducting several research projects on evaluating, designing, and implementing effective resource management schemes that adapt to these technology changes. The targeted workloads are large, memory-intensive scientific applications, data-intensive Internet accesses, and data processing for commercial databases.


Impact of the Global Memory Hierarchy in Clusters on Parallel Computing

For a given budget and for certain types of applications, the cost-effectiveness of a cluster computing platform is mainly determined by the cluster's memory hierarchy and its interconnection network configuration. We have developed an analytical model for evaluating the performance impact of memory hierarchies and networks on cluster computing.
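The model itself is not given here. As a hedged illustration of the kind of reasoning such a model supports, the following Python sketch computes a weighted average access time over a three-level cluster hierarchy (local memory, remote memory, local disk), with the remote cost derived from network latency and bandwidth. The functions `average_access_time_us` and `remote_block_time_us`, and every latency, bandwidth, and hit-fraction value, are illustrative assumptions rather than the authors' model or measurements.

```python
from dataclasses import dataclass

@dataclass
class Level:
    name: str
    fraction: float   # fraction of all accesses served at this level
    time_us: float    # cost of one access served at this level, in microseconds

def average_access_time_us(levels):
    """Weighted mean access time over the hierarchy; fractions must sum to 1."""
    total = sum(level.fraction for level in levels)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"access fractions sum to {total}, expected 1.0")
    return sum(level.fraction * level.time_us for level in levels)

def remote_block_time_us(latency_us, block_bytes, bandwidth_gbps):
    """Fixed network latency plus transfer time (1 Gb/s = 1e3 bits per microsecond)."""
    return latency_us + block_bytes * 8 / (bandwidth_gbps * 1e3)

if __name__ == "__main__":
    # All numbers below are assumptions chosen only to illustrate the model.
    remote = remote_block_time_us(latency_us=10, block_bytes=4096, bandwidth_gbps=10)
    disk = 5000.0   # assumed random-read cost of a local disk
    with_remote = [Level("local", 0.99, 0.1),
                   Level("remote", 0.009, remote),
                   Level("disk", 0.001, disk)]
    disk_only = [Level("local", 0.99, 0.1), Level("disk", 0.01, disk)]
    print(f"remote block:     {remote:.2f} us")
    print(f"with remote RAM:  {average_access_time_us(with_remote):.2f} us")
    print(f"local disk only:  {average_access_time_us(disk_only):.2f} us")
```

Under these assumed inputs, serving 90% of the non-local accesses from remote memory cuts the average access time by nearly a factor of ten, because the disk term dominates the sum even at small fractions.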

Representative Publications


Effective Load Sharing and Job Scheduling in Clusters

We have developed and examined job migration policies that consider effective use of global memory in addition to CPU load sharing in distributed systems. We have looked into issues such as scheduling jobs with dynamic memory demands and load sharing on heterogeneous cluster systems.
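The specific policies are not described here. The following Python sketch shows one plausible memory-aware placement rule under stated assumptions: a job stays on its local node only if that node is overloaded in neither CPU (job count below a threshold) nor memory; otherwise it migrates to the qualifying node with the most idle memory, breaking ties by lighter CPU load. The `Node` class, `place_job`, and the `cpu_threshold` parameter are hypothetical names introduced for illustration, and heterogeneity is modeled only as differing memory capacities and thresholds.

```python
from dataclasses import dataclass, field

@dataclass(eq=False)
class Node:
    name: str
    mem_capacity_mb: int
    cpu_threshold: int                         # hypothetical: max jobs before CPU counts as overloaded
    jobs: list = field(default_factory=list)   # memory demands (MB) of resident jobs

    @property
    def mem_free_mb(self):
        return self.mem_capacity_mb - sum(self.jobs)

    def can_accept(self, demand_mb):
        # A node qualifies only if neither its CPU nor its memory is overloaded.
        return len(self.jobs) < self.cpu_threshold and self.mem_free_mb >= demand_mb

def place_job(local, nodes, demand_mb):
    """Pick the node that should run a new job with the given memory demand."""
    if local.can_accept(demand_mb):
        return local
    candidates = [n for n in nodes if n is not local and n.can_accept(demand_mb)]
    if not candidates:
        return local   # nowhere better: run locally and tolerate paging or queuing
    # Prefer the most idle memory, then the lightest CPU load.
    return max(candidates, key=lambda n: (n.mem_free_mb, -len(n.jobs)))

if __name__ == "__main__":
    a = Node("a", 1000, 4, [600])
    b = Node("b", 2000, 4, [200])
    c = Node("c", 1500, 4, [])
    print(place_job(a, [a, b, c], 500).name)   # node a has only 400 MB free
```

Ranking candidates by idle memory before CPU load reflects the motivation above: a page fault costs far more than waiting for a CPU share, so avoiding memory overload is weighted first in this sketch.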

Representative Publications


Resource Allocations Adaptive to Known, Unknown, and/or Unexpectedly Large Memory Demands in Clusters

We have developed and examined job scheduling policies that adaptively handle data-intensive jobs with known, unknown, and/or unexpectedly large memory demands. Beyond improving average job response time, we also aim to be fair to submitted jobs of all sizes.
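As a hedged sketch of how these two goals can interact, the following Python code reserves memory for each job from its known demand or, when the demand is unknown, from an initial guess. A job that outgrows its reservation has the reservation doubled in place if memory allows, or is requeued at the front with the larger estimate. For fairness, smaller jobs may backfill past a large job that does not fit, but only a bounded number of times. The class `AdaptiveScheduler`, the doubling rule, and the `initial_guess_mb` and `max_bypass` parameters are all assumptions for illustration, not the published policies.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional

@dataclass(eq=False)
class Job:
    name: str
    known_demand_mb: Optional[int] = None   # None means the demand is unknown
    reservation_mb: int = 0
    bypassed: int = 0                       # times a later job started ahead of it

class AdaptiveScheduler:
    def __init__(self, mem_total_mb, initial_guess_mb=256, max_bypass=3):
        self.mem_free_mb = mem_total_mb
        self.initial_guess_mb = initial_guess_mb   # hypothetical starting estimate
        self.max_bypass = max_bypass               # hypothetical fairness bound
        self.queue = deque()
        self.running = []

    def submit(self, job):
        known = job.known_demand_mb
        job.reservation_mb = known if known is not None else self.initial_guess_mb
        self.queue.append(job)

    def schedule(self):
        """Start jobs in arrival order; backfill past a blocked job only
        until it has been bypassed max_bypass times."""
        started, blocked = [], None
        for job in list(self.queue):
            if job.reservation_mb <= self.mem_free_mb:
                if blocked is not None:
                    if blocked.bypassed >= self.max_bypass:
                        break   # let memory drain for the long-waiting job
                    blocked.bypassed += 1
                self._start(job)
                started.append(job)
            elif blocked is None:
                blocked = job
        return started

    def _start(self, job):
        self.queue.remove(job)
        self.running.append(job)
        self.mem_free_mb -= job.reservation_mb

    def finish(self, job):
        self.running.remove(job)
        self.mem_free_mb += job.reservation_mb

    def on_memory_exceeded(self, job, usage_mb):
        """Grow an outgrown reservation in place, else requeue the job first."""
        new_res = max(2 * job.reservation_mb, usage_mb)
        extra = new_res - job.reservation_mb
        if extra <= self.mem_free_mb:
            self.mem_free_mb -= extra
            job.reservation_mb = new_res
            return "grown"
        self.finish(job)                 # release the old reservation
        job.reservation_mb = new_res
        self.queue.appendleft(job)       # retry first with the larger estimate
        return "requeued"
```

Bounding the number of bypasses trades some average response time (memory may sit idle while it drains) for a guarantee that a large job is not starved indefinitely by a stream of small ones.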

Representative Publications


Global Memory Sharing and Cooperation to Improve Performance of Virtual Memory and Buffer Caching in Clusters

Advances in networking technology have dramatically improved data transfer bandwidth in clusters, creating opportunities to share global memory space and thereby reduce the disk accesses needed to handle page faults and fetch I/O data. For example, with 10 Gigabit Ethernet or InfiniBand, transferring a data block between computing nodes in a cluster can be two orders of magnitude faster than reading the block from a fast local disk. We have designed and evaluated several protocols for network RAM, which pages to remote memory instead of the local disk, and for cooperative buffer caching, which uses memory space in remote buffer caches for fast access to I/O data.
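The following Python sketch illustrates the basic network RAM idea under simplifying assumptions: local memory is an LRU set of pages; an evicted page is sent to idle remote memory if any capacity remains and to the local disk otherwise; a later fault on that page is served from wherever it went. The class `NetworkRamPager` and the per-page costs are hypothetical, write-back costs are ignored, and none of this reproduces the specific protocols referred to above.

```python
from collections import OrderedDict

# Illustrative per-page costs in microseconds (assumptions, not measurements).
LOCAL_US = 0.1
REMOTE_US = 15.0     # e.g. one 4 KB page over a 10 Gb/s link plus software overhead
DISK_US = 5000.0     # random read from a local disk

class NetworkRamPager:
    def __init__(self, local_pages, remote_capacity_pages):
        self.local = OrderedDict()            # resident pages, least recently used first
        self.local_pages = local_pages
        self.remote = set()                   # pages parked in idle remote memory
        self.remote_capacity = remote_capacity_pages
        self.on_disk = set()                  # pages paged out to local swap
        self.time_us = 0.0                    # accumulated access time (reads only)

    def access(self, page):
        if page in self.local:
            self.local.move_to_end(page)      # refresh LRU position
            self.time_us += LOCAL_US
            return "local"
        if page in self.remote:               # fault served by network RAM
            self.remote.discard(page)
            self.time_us += REMOTE_US
            source = "remote"
        else:                                 # cold miss or page swapped to disk
            self.on_disk.discard(page)
            self.time_us += DISK_US
            source = "disk"
        self._make_room()
        self.local[page] = None
        return source

    def _make_room(self):
        if len(self.local) < self.local_pages:
            return
        victim, _ = self.local.popitem(last=False)   # evict the LRU page
        if len(self.remote) < self.remote_capacity:
            self.remote.add(victim)                  # page out to remote memory
        else:
            self.on_disk.add(victim)                 # fall back to the local disk
```

For example, replaying a trace that cycles over six pages three times on a node with room for four pages, `NetworkRamPager(4, 100)` pays the disk cost only for the six cold misses, while `NetworkRamPager(4, 0)` (no remote memory) pays it on all eighteen accesses.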

Representative Publications


Multi-Level Buffer Caching in High-End Data Servers and Data Grid

In distributed data centers or data grids, data are cached in multi-level memory storage, where data access patterns and locality strengths differ from level to level. We have proposed several data placement and replacement protocols to exploit locality across multiple cache levels.
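The group's own protocols are not described here. To show why placement across levels matters, the following Python sketch implements a generic, well-known technique instead: exclusive two-level caching with demotion. Each block lives in at most one level; a block evicted from the upper (client) cache is demoted into the lower (server) cache, and a block read from the lower cache is removed there. The class `TwoLevelExclusiveCache` and its sizes are illustrative assumptions.

```python
from collections import OrderedDict

class TwoLevelExclusiveCache:
    """Client cache (L1) over server cache (L2), both LRU and kept exclusive,
    so together they can hold l1_size + l2_size distinct blocks."""

    def __init__(self, l1_size, l2_size):
        self.l1 = OrderedDict()   # least recently used first
        self.l2 = OrderedDict()
        self.l1_size = l1_size
        self.l2_size = l2_size
        self.hits = {"l1": 0, "l2": 0, "miss": 0}

    def access(self, block):
        if block in self.l1:
            self.l1.move_to_end(block)
            self.hits["l1"] += 1
            return "l1"
        if block in self.l2:
            del self.l2[block]    # promote: drop from L2 to stay exclusive
            level = "l2"
        else:
            level = "miss"        # read from disk straight into L1
        self.hits[level] += 1
        self._insert_l1(block)
        return level

    def _insert_l1(self, block):
        if len(self.l1) >= self.l1_size:
            victim, _ = self.l1.popitem(last=False)
            self._demote(victim)  # evicted L1 block moves down, not out
        self.l1[block] = None

    def _demote(self, block):
        if len(self.l2) >= self.l2_size:
            self.l2.popitem(last=False)   # discard L2's least recently used block
        self.l2[block] = None
```

On a trace that cycles over six blocks with three slots at each level, this cache serves every access after the first pass from L2. In a simplified model where both levels run plain LRU and L2 caches every block that passes through it, the two levels hold the same recent blocks and the same trace misses on every access, which illustrates the different locality each level sees.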

Representative Publications


Auto-CFD: a Precompiler for Parallelizing CFD Applications in Clusters

We have developed and implemented a precompiler for CFD applications, called AUTO-CFD, which automatically parallelizes structured CFD applications for clusters.
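The transformations AUTO-CFD performs are not detailed here. As a generic, hedged sketch of what parallelizing a structured-grid solver typically involves, the following Python code splits a 1D grid into contiguous blocks, gives each block one ghost cell per side, exchanges halo values between neighbors before every sweep, and then updates each block independently, as separate processes would. It checks the result against the serial sweep. The functions `split_ranges`, `jacobi_serial`, and `jacobi_decomposed` are hypothetical, and the Jacobi averaging stencil stands in for a real CFD kernel.

```python
def split_ranges(n, p):
    """Split indices 0..n-1 into p contiguous blocks whose sizes differ by at most 1."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for r in range(p):
        size = base + (1 if r < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

def jacobi_serial(u, steps):
    """Reference sweep: interior cells become the mean of their neighbours."""
    u = list(u)
    for _ in range(steps):
        u = [u[0]] + [(u[i - 1] + u[i + 1]) / 2 for i in range(1, len(u) - 1)] + [u[-1]]
    return u

def jacobi_decomposed(u, steps, p):
    """Same sweep on p blocks with ghost cells; requires len(u) >= p."""
    n = len(u)
    ranges = split_ranges(n, p)
    # Each block stores [left ghost] + owned cells + [right ghost].
    blocks = [[0.0] + list(u[a:b]) + [0.0] for a, b in ranges]
    for _ in range(steps):
        # Halo exchange: copy each neighbour's edge cell into the ghost cell.
        for r in range(p):
            if r > 0:
                blocks[r][0] = blocks[r - 1][-2]
            if r < p - 1:
                blocks[r][-1] = blocks[r + 1][1]
        # Local update of each block, independent of the others.
        new_blocks = []
        for (a, _b), blk in zip(ranges, blocks):
            new = list(blk)
            for k in range(1, len(blk) - 1):
                g = a + k - 1              # global index of local cell k
                if 0 < g < n - 1:          # global boundary cells stay fixed
                    new[k] = (blk[k - 1] + blk[k + 1]) / 2
            new_blocks.append(new)
        blocks = new_blocks
    # Gather owned cells (drop ghosts) back into one global array.
    return [x for blk in blocks for x in blk[1:-1]]
```

Because each block computes exactly the same floating-point expressions as the serial sweep, the decomposed result matches it exactly for any block count from 1 to the grid size.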

Representative Publications