Xiangyong (Shangyong) Ouyang

I am now in the fifth year of my PhD study at the
Network Based Computing Lab,
Dept. of Computer Science and Engineering,
The Ohio State University.

My advisor is Dr. Dhabaleswar K. (DK) Panda.

My Email: ouyangx@cse.ohio-state.edu





Research Interests

  • Leveraging SSD technology to accelerate parallel data access

  • Optimizing Parallel IO with a wide spectrum of technologies including RDMA and SSD

  • High-performance MPI checkpointing and proactive process migration leveraging InfiniBand and RDMA


Projects

  • SSD-Assisted Hybrid Memory (SAM)
  • SAM augments RAM with SSD to enlarge the effective memory size (up to the combined size of the SSD and RAM). SAM provides a slab-based memory allocation interface to the programmer. Internally, SAM keeps recently accessed or allocated objects packed in RAM in LRU order to reduce access latency, and it transparently evicts objects to SSD or loads them back from SSD as needed. As a proof of concept, we have integrated SAM into Memcached, where it significantly expands the memory available to the Memcached server. Initial experiments show large performance gains. We are currently working on a transparent way to integrate SAM into applications.
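
    The RAM/SSD tiering idea can be illustrated with a minimal sketch. This is not SAM's actual interface: the class and method names are hypothetical, both tiers are plain Python dictionaries, and capacity is counted in objects rather than slab bytes.

```python
from collections import OrderedDict


class HybridStore:
    """Toy two-tier store: a small 'RAM' tier backed by a larger 'SSD' tier.

    Both tiers are in-memory dicts here; a real system would place the
    second tier on flash. All names are hypothetical, not SAM's API.
    """

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity   # max objects kept in the RAM tier
        self.ram = OrderedDict()           # least recently used entry first
        self.ssd = {}                      # stand-in for the SSD tier

    def put(self, key, value):
        self.ssd.pop(key, None)            # drop any stale copy in the SSD tier
        self.ram[key] = value
        self.ram.move_to_end(key)          # mark as most recently used
        self._evict_if_needed()

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)      # refresh recency on a RAM hit
            return self.ram[key]
        value = self.ssd.pop(key)          # RAM miss: load from the SSD tier
        self.put(key, value)               # promote, possibly evicting another
        return value

    def _evict_if_needed(self):
        while len(self.ram) > self.ram_capacity:
            old_key, old_value = self.ram.popitem(last=False)  # LRU victim
            self.ssd[old_key] = old_value


store = HybridStore(ram_capacity=2)
for k in ("a", "b", "c"):
    store.put(k, k.upper())
in_ram_after_puts = list(store.ram)   # "a" has been evicted to the SSD tier
value_a = store.get("a")              # loads "a" back and evicts "b"
in_ram_after_get = list(store.ram)
```

    The caller only sees put and get; which tier currently holds an object is hidden, which is the property the description above refers to as transparent eviction and loading.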

  • Non-Contiguous Atomic IO with SSD
  • Many applications perform extensive non-contiguous I/O and require data consistency across concurrent, overlapping accesses. Existing solutions cannot meet this requirement. We leverage the fast random access provided by SSD to address the problem. We propose a new I/O primitive that batches multiple discrete I/O requests into a logical group that is completed in a single I/O call. We have extended the SSD Flash Translation Layer (FTL) to guarantee the atomic completion of such a compound operation. This primitive, called Atomic Write, improves MySQL transaction throughput by 30%. This design has been integrated into Fusion-io SSD products.

    Publication(s): HPCA 2011.
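
    The all-or-nothing behaviour of a batched write can be sketched with a toy log-structured device. This is a generic illustration, not the FTL design from the paper: the class, the fail_after test hook, and the in-memory log are all assumptions made for the example.

```python
class ToyFlashDevice:
    """Toy log-structured device: logical block -> position in an append-only log."""

    def __init__(self):
        self.log = []        # append-only list of written payloads
        self.mapping = {}    # logical block number -> index into self.log

    def read(self, block):
        return self.log[self.mapping[block]]

    def atomic_write(self, writes, fail_after=None):
        """Apply all (block, data) pairs or none of them.

        fail_after is a hypothetical test hook that simulates a crash after
        that many payloads have been appended to the log.
        """
        staged = {}
        for count, (block, data) in enumerate(writes):
            if fail_after is not None and count >= fail_after:
                return False             # crash: mapping untouched, old data visible
            self.log.append(data)        # new data never overwrites old data in place
            staged[block] = len(self.log) - 1
        self.mapping.update(staged)      # single commit point for the whole group
        return True


dev = ToyFlashDevice()
dev.atomic_write([(0, "old0"), (7, "old7")])

# A simulated crash halfway through the group leaves both blocks unchanged.
crashed_ok = dev.atomic_write([(0, "new0"), (7, "new7")], fail_after=1)
after_crash = (dev.read(0), dev.read(7))

# A complete group becomes visible for both non-contiguous blocks together.
committed_ok = dev.atomic_write([(0, "new0"), (7, "new7")])
after_commit = (dev.read(0), dev.read(7))
```

    Because new data is appended rather than written in place, the old version of every block stays readable until the mapping is switched, so no separate undo log is needed in this model.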

  • High Performance Process Migration with RDMA
  • Process migration is widely used for fault tolerance, cluster-wide load balancing, server consolidation, performance isolation, and similar purposes. Our research shows that inefficient I/O and network transfer are the principal causes of its high overhead. We have designed Pipelined Process Migration with RDMA (PPMR) to overcome these overheads. PPMR fully pipelines the data write, data transfer, and data read operations across the phases of a migration cycle. PPMR has been implemented in MVAPICH2, a popular high-performance open-source MPI-2 implementation. Experimental results show that PPMR achieves a 10.7X speedup over conventional approaches.

    Publication(s): CCGrid 2011, Cluster 2010
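
    The pipelining idea (write, transfer, and read overlapping on different chunks) can be sketched with threads and bounded queues. This is only an illustration under simplifying assumptions: no RDMA or MPI is involved, the three stage functions are hypothetical placeholders, and it is not the PPMR implementation.

```python
import queue
import threading

SENTINEL = None  # marks the end of the chunk stream; chunks must not be None


def stage(work, inbox, outbox):
    """Run one pipeline stage: apply work() to each chunk and pass it on."""
    while True:
        chunk = inbox.get()
        if chunk is SENTINEL:
            outbox.put(SENTINEL)   # propagate shutdown to the next stage
            return
        outbox.put(work(chunk))


def run_pipeline(chunks, stages, depth=2):
    """Stream chunks through the stages; bounded queues limit chunks in flight."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage, args=(work, queues[i], queues[i + 1]))
        for i, work in enumerate(stages)
    ]

    def feed():
        for chunk in chunks:
            queues[0].put(chunk)
        queues[0].put(SENTINEL)

    threads.append(threading.Thread(target=feed))
    for t in threads:
        t.start()

    results = []
    while True:                    # drain the last queue on the calling thread
        item = queues[-1].get()
        if item is SENTINEL:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


# Hypothetical stand-ins for the three phases of a migration cycle.
def write_stage(chunk):
    return chunk + ":written"


def transfer_stage(chunk):
    return chunk + ":sent"


def read_stage(chunk):
    return chunk + ":restored"


chunks = [f"chunk{i}" for i in range(5)]
migrated = run_pipeline(chunks, [write_stage, transfer_stage, read_stage])
```

    While one chunk is being restored, the next can already be in transfer and a third can be written, so the stages overlap instead of running back to back. Each stage is a single FIFO worker, so chunk order is preserved.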

  • Enhancing Checkpoint/Restart Performance
  • Checkpoint/Restart (C/R) mechanisms have been widely adopted to achieve fault tolerance. However, a major limitation is the I/O bottleneck they create. We propose a scheme, Write Aggregation with Dynamic Buffer and Interleaving, to reduce the overhead of checkpoint creation. By aggregating checkpoint writes into a dynamic buffer pool and overlapping application progress with the file writes, we can significantly reduce checkpoint creation overhead. We have developed CRFS, a stackable filesystem, with these optimizations built in. CRFS achieves up to a 5.5X speedup in checkpoint writing performance to the Lustre filesystem. CRFS has been integrated into MVAPICH2, a popular high-performance open-source MPI-2 implementation.

    Publication(s): ICPP 2011, SNAPI 2010, ICPP 2009, HiPC 2009
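
    Write aggregation with overlapped flushing can be sketched as follows. The sketch makes several assumptions: the class name and parameters are hypothetical, the buffer pool has a fixed rather than dynamic size, interleaving is not modeled, and an in-memory BytesIO stands in for the parallel filesystem. It is not CRFS code.

```python
import io
import queue
import threading


class AggregatingWriter:
    """Coalesce many small writes into large chunks flushed by a background thread."""

    def __init__(self, sink, chunk_size=4096, pool_size=4):
        self.sink = sink
        self.chunk_size = chunk_size
        self.buffer = bytearray()
        self.full_chunks = queue.Queue(maxsize=pool_size)  # bounded buffer pool
        self.sink_writes = 0
        self.flusher = threading.Thread(target=self._flush_loop)
        self.flusher.start()

    def write(self, data):
        """Called by the application; returns as soon as the data is buffered."""
        self.buffer += data
        while len(self.buffer) >= self.chunk_size:
            chunk = bytes(self.buffer[:self.chunk_size])
            del self.buffer[:self.chunk_size]
            self.full_chunks.put(chunk)   # blocks only if the pool is exhausted

    def close(self):
        if self.buffer:
            self.full_chunks.put(bytes(self.buffer))  # final partial chunk
            self.buffer.clear()
        self.full_chunks.put(None)                    # tell the flusher to stop
        self.flusher.join()

    def _flush_loop(self):
        while True:
            chunk = self.full_chunks.get()
            if chunk is None:
                return
            self.sink.write(chunk)        # one large write instead of many small ones
            self.sink_writes += 1


sink = io.BytesIO()
writer = AggregatingWriter(sink, chunk_size=1024)
for _ in range(1000):
    writer.write(b"x" * 10)   # 1000 small application writes, 10000 bytes in total
writer.close()
total_bytes = len(sink.getvalue())
```

    In this run the 1000 ten-byte writes reach the sink as nine full 1024-byte chunks plus one final partial chunk, and the application thread only waits when every buffer in the pool is still being flushed.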


Publications

  • J. Huang, X. Ouyang, J. Jose, M. Wasi-ur-Rahman, H. Wang, M. Luo, H. Subramoni, C. Murthy and D. K. Panda, High-Performance Design of HBase with RDMA over InfiniBand, Int'l Parallel and Distributed Processing Symposium (IPDPS 2012), May 2012.
  • V. Meshram, X. Besseron, X. Ouyang, R. Rajachandrasekar and D. K. Panda, Can a Decentralized Metadata Service Layer benefit Parallel Filesystems? Workshop on Interfaces and Architectures for Scientific Data Storage (IASDS '11), held in conjunction with Cluster '11, Sept. 2011. Conference Slides
  • X. Ouyang, R. Rajachandrasekar, X. Besseron, H. Wang, J. Huang and D. K. Panda, CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart, Int'l Conference on Parallel Processing (ICPP '11), Sept. 2011. Conference Slides
  • R. Rajachandrasekar, X. Ouyang, X. Besseron, V. Meshram and D. K. Panda, Can Checkpoint/Restart Mechanisms Benefit from Hierarchical Data Staging?, Resiliency in High Performance Computing in Clusters, Clouds and Grids (Resilience 2011), in conjunction with Euro-Par, Aug. 2011. Conference Slides
  • H. Wang, S. Potluri, M. Luo, A. Kumar Singh, X. Ouyang, S. Sur, D. K. Panda, Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2, IEEE International Conference on Cluster Computing (Cluster'11), September 26-30, 2011.
  • J. Jose, H. Subramoni, M. Luo, M. Zhang, J. Huang, M. W. Rahman, N. S. Islam, X. Ouyang, S. Sur and D. K. Panda, Memcached Design on High Performance RDMA Capable Interconnects, Int'l Conference on Parallel Processing (ICPP '11), Sept. 2011. Conference Slides
  • X. Ouyang, R. Rajachandrasekar, X. Besseron, D. K. Panda, High Performance Pipelined Process Migration with RDMA, The 11th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGrid 2011), Newport Beach, CA, May 2011. (Conference Slides).
  • X. Ouyang, D. Nellans, R. Wipfel, D. Flynn, D. K. Panda, Beyond Block I/O: Rethinking Traditional Storage Primitives, The 17th IEEE International Symposium on High Performance Computer Architecture (HPCA-17), February 2011, San Antonio, Texas. (Conference Slides).
  • S. Sur, H. Wang, J. Huang, X. Ouyang and D. K. Panda, Can High-Performance Interconnects Benefit Hadoop Distributed File System?, Workshop on Micro Architectural Support for Virtualization, Data Center Computing (MASVDC), and Clouds, In Conjunction with MICRO 2010, Dec 2010, Atlanta, GA, USA. Conference Slides.

  • X. Ouyang, S. Marcarelli, R. Rajachandrasekar and D. K. Panda, RDMA-Based Job Migration Framework for MPI over InfiniBand, IEEE International Conference on Cluster Computing 2010 (Cluster '10), Sept. 2010 Conference Slides.
  • X. Ouyang, S. Marcarelli and D. K. Panda, Enhancing Checkpoint Performance with Staging IO and SSD, IEEE International Workshop on Storage Network Architecture and Parallel I/Os (SNAPI 2010), May 2010. Conference Slides.
  • X. Ouyang, K. Gopalakrishnan, T. Gangadharappa and D. K. Panda, Fast Checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on Multicore Architecture, Int'l Conference on High Performance Computing (HiPC '09), Dec. 2009. Conference Slides.
  • X. Ouyang, K. Gopalakrishnan and D. K. Panda, Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems, Int'l Conference on Parallel Processing (ICPP '09), Sept. 2009. Conference Slides.
  • R. Noronha, X. Ouyang and D. K. Panda, Designing a High-Performance Clustered NAS: A Case Study With pNFS over RDMA on InfiniBand, International Conference on High Performance Computing (HiPC '08), December 2008. Conference Slides.
  • L. Chai, X. Ouyang, R. Noronha and D.K. Panda. pNFS/PVFS2 over InfiniBand: Early Experiences. Petascale Data Storage Workshop 2007 (PDSW 2007), in conjunction with SuperComputing (SC) 2007, Reno, NV, November 2007. Conference Slides

Industry Experience

  • Fusion-io, Salt Lake City, USA, June 2010 - September 2010

    Graduate Technical Intern, Mentors: Drex Dixon, David Nellans

    Designed and implemented a new block I/O primitive that takes advantage of state-of-the-art solid-state storage technology. This primitive, Atomic-Write, batches multiple discrete I/O requests into a logical group that is either committed or rolled back as a unit. It can greatly simplify a transaction manager by removing the need for a complicated protocol to ensure the ACID properties.

    Extended the MySQL InnoDB storage engine to leverage the Atomic-Write primitive and boost its performance.

    Initial evaluation showed a 30% improvement in transaction throughput. Please see the paper for detailed studies.

  • Fusion-io, Salt Lake City, USA, June 2009 - September 2009

    Graduate Technical Intern, Mentors: Drex Dixon, Jeremy Garff

    Investigated the Solid State Storage driver stack.
    Designed a filesystem prototype (called DirectFS), which is built directly on top of Solid State Storage.
    Significantly reduced the metadata overhead in DirectFS to fully exploit the potential of SSD.


Technical Strengths

  • InfiniBand network communication development, RDMA programming
  • Linux filesystem development, Linux kernel & driver development
  • C, C++, Unix programming, Shell programming

Education

  • 2006 ~ present: Ph.D. student, Dept. of Computer Science & Engineering, The Ohio State University
  • 2004 ~ 2006: Visiting student, Tokyo Institute of Technology
  • 2001 ~ 2004: Master of Science in Electronic Engineering, Tsinghua University, Beijing, China
  • 1997 ~ 2001: Bachelor of Science in Electronic Engineering, Tsinghua University, Beijing, China

Useful Links

  • Prog Language Perf Comparison