Post Doctoral Researcher
Dept. of Computer Science and Engineering
The Ohio State University
2015 Neil Avenue
Columbus, OH-43210, USA
khaledhamidouche (at) gmail.com
I have been a postdoctoral researcher
in the Department of Computer Science
and Engineering at The Ohio State
University since December 2012.
My research advisor is Prof. Dhabaleswar
K. (DK) Panda.
I am a member of the Network-Based Computing Laboratory.
My research interests include parallel systems, distributed systems, and heterogeneous HPC architectures, including GPUs and MIC coprocessors.
Before that, I was a postdoctoral researcher with the HP2 team (High Performance and Parallel) at Telecom Sud-Paris, Evry. In the area of compilation and code generation for parallel architectures, I worked on both the optimization of the source-to-source transformation tool (STEP) and its port to many-core architectures.
I defended my PhD thesis at the LRI, Laboratoire de Recherche en Informatique (Parall Team), in November 2011 on parallel computing and parallel architectures. My thesis work (dissertation available here), supervised by Prof. Daniel Etiemble and Dr. Joel Falcou, focused on a programming model and on deployment and hybrid code generation for hierarchical and heterogeneous parallel architectures, including the development of supporting tools.
In September 2008, I graduated from University of Paris-Sud 11 with a Master's degree in Computer Science.
Events and Announcements:
MVAPICH User Group (MUG) Meeting
held in Columbus, Ohio, USA during August 26-27, 2013.
Presentation Submission Deadline: July 1, 2013. Advance
Registration Deadline: July 15, 2013.
The Hadoop-RDMA software package,
designed and developed by our research group,
provides native InfiniBand Verbs-level support
for multiple components (HDFS, MapReduce and RPC) of Apache Hadoop
for Big Data processing.
Sample performance numbers and download instructions are
available from the project website.
- High Performance Computing
- High-level parallel programming
- Tools and environments for parallel and heterogeneous architectures
- Compilation and code generation
- Multi-core and many-core architectures
- 16) A. Venkatesh, S. Potluri, R. Rajachandrasekar, M. Luo, K. Hamidouche and D. K. Panda, High Performance Alltoall and Allgather designs for InfiniBand MIC Clusters. IEEE International Parallel & Distributed Processing Symposium (IPDPS '14). May 2014, Phoenix, USA
- 15) M. Luo, X. Lu, K. Hamidouche, K. Kandalla and D. K. Panda, Initial Study of Multi-Endpoint Runtime for MPI+OpenMP Hybrid Applications on Multi-Core Systems. International Symposium on Principles and Practice of Parallel Programming (PPoPP '14). February 2014, Orlando, USA
- 14) R. Shi, S. Potluri, K. Hamidouche, X. Lu, K. Tomko and D. K. Panda, A Scalable and Portable Approach to Accelerate Hybrid HPL on Heterogeneous CPU-GPU Clusters. IEEE Cluster (Cluster13). Best Student Paper Award. September 2013, Indianapolis, USA
- 13) S. Potluri, D. Bureddy, K. Hamidouche, A. Venkatesh, K. Kandalla, H. Subramoni and D. K. Panda, MVAPICH-PRISM: A Proxy-based Communication Framework using InfiniBand and SCIF for Intel MIC Clusters. IEEE/ACM International Conference on Supercomputing (SC13). November 2013, Denver, CO, USA
- 12) S. Potluri, K. Hamidouche, D. Bureddy and D. K. Panda, MVAPICH2-MIC: A High-Performance MPI Library for Xeon Phi Clusters with InfiniBand. Extreme Scaling Workshop. August 2013, Boulder, CO, USA
- 11) K. Kandalla, A. Venkatesh, K. Hamidouche, S. Potluri, D. Bureddy and D. K. Panda, Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters. IEEE International Symposium on High-Performance Interconnects (HotI 2013). August 2013, San Jose, CA, USA
- 10) S. Potluri, K. Hamidouche, A. Venkatesh, D. Bureddy and D. K. Panda, Efficient Inter-node MPI Communication using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs. IEEE International Conference on Parallel Processing (ICPP 2013). October 2013, Lyon, France
- 9) M. Li, S. Potluri, K. Hamidouche, J. Jose and D. K. Panda, Efficient and Truly Passive MPI-3 RMA Using InfiniBand Atomics. EuroMPI 13. September 2013, Madrid, Spain
- 8) K. Hamidouche, S. Potluri, H. Subramoni, K. Kandalla and D. K. Panda, MIC-RO: Enabling Efficient Remote Offload on Heterogeneous Many Integrated Core (MIC) Clusters with InfiniBand. ACM International Conference on Supercomputing (ICS 2013). June 2013, Oregon, USA
- 7) K. Hamidouche, F. M. Mendonca, J. Falcou, A. C. M. A. Melo, D. Etiemble, Parallel Smith-Waterman Comparison on Multicore and Manycore Computing Platforms with BSP++. International Journal of Parallel Programming (IJPP). August 2012
- 6) K. Hamidouche, F. M. Mendonca, J. Falcou, D. Etiemble, Parallel Biological Sequence Comparison on Heterogeneous High Performance Computing Platforms with BSP++. 23rd IEEE International Symposium on Computer Architecture and High Performance Computing, SBAC-PAD'2011, Vitoria, Espirito Santo, Brazil, October 26-29, 2011.
- 5) K. Hamidouche, J. Falcou, D. Etiemble, A Framework for an Automatic Hybrid MPI+OpenMP code generation, ACM High Performance Computing Symposium, HPC-11, Boston, USA, April 3-7, 2011.
- 4) K. Hamidouche, J. Falcou, D. Etiemble, Hybrid Bulk Synchronous Parallelism Library for Clustered SMP Architectures, ACM International workshop on High Level Parallel Programming and Applications, HLPP2010, Affiliated to ICFP 2010, Baltimore, USA, September 25, 2010.
- 3) K. Hamidouche, A. Borghi, P. Esterie, J. Falcou, S. Peyronnet, Three High Performance Architectures in the Parallel APMC Boat, IEEE International Workshop on Parallel and Distributed Methods in Verification, PDMC 2010, Enschede, Netherlands, September 30, 2010.
- 2) C. Tadonki, L. Lacassagne, T. Saidani, J. Falcou, K. Hamidouche, The Harris algorithm revisited on the CELL processor, International workshop on highly-Efficient Accelerators and Reconfigurable Technologies, HEART 2010, June 1, 2010, Tsukuba, Japan
- 1) K. Hamidouche, F. Cappello, D. Etiemble, Comparaison de MPI, OpenMP et MPI+OpenMP sur un noeud multiprocesseur multicoeurs AMD a memoire partagee, Rencontres Francophones du Parallelisme, RenPar 2009, Toulouse, September 9-11, 2009.
- BSP++ Library
BSP++ is a generic library built on C++ templates. Based on a hierarchical model, it takes hybrid architectures (multicore clusters and Cell BE accelerator-based clusters) as native targets. Using a small set of primitives and intuitive concepts, BSP++ provides a simple way to program hybrid and heterogeneous architectures. It generates MPI, OpenMP, MPI+OpenMP, Cell BE and MPI+Cell BE code from the same version of the user's base code (the target machine is selected simply by a preprocessor symbol).
- BSPGen Framework
BSPGen is a tool for automatic generation of hybrid, multi-level hierarchical code (MPI+OpenMP or MPI+Cell BE). Using the BSP++ cost model, BSPGen predicts and generates the appropriate hierarchical hybrid code (BSP++ code) for a given application on a target architecture. The prediction is based on a new pass in the LLVM compiler. BSPGen generates hybrid code from a list of sequential functions and a description of the parallel algorithm (XML file).
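The idea of choosing a code variant from a cost model can be illustrated with a minimal C++ sketch. It assumes the textbook BSP superstep cost, w + h*g + L, which is not necessarily the exact BSP++ cost model, and every name in it (`BspParams`, `Candidate`, `superstep_cost`, `pick_cheapest`) is hypothetical and not part of BSP++ or BSPGen.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical machine parameters for one candidate configuration.
struct BspParams {
    double g;  // communication gap: cost per word exchanged
    double L;  // cost of one barrier synchronization
};

// Hypothetical description of one candidate code variant.
struct Candidate {
    std::string name;   // e.g. "MPI" or "MPI+OpenMP"
    double w;           // maximum local computation in the superstep
    double h;           // maximum words sent or received by one process
    BspParams machine;  // parameters measured for this configuration
};

// Textbook BSP cost of a single superstep: w + h*g + L.
double superstep_cost(const Candidate& c) {
    return c.w + c.h * c.machine.g + c.machine.L;
}

// Return the index of the candidate with the lowest predicted cost.
// Ties keep the earliest candidate.
std::size_t pick_cheapest(const std::vector<Candidate>& candidates) {
    assert(!candidates.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        if (superstep_cost(candidates[i]) < superstep_cost(candidates[best])) {
            best = i;
        }
    }
    return best;
}
```

In this sketch a hybrid variant wins whenever its lower communication volume outweighs any extra computation and synchronization cost; a real predictor would sum the cost over all supersteps and over each level of the hierarchy.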
- I was an instructor at the University of Paris-Sud 11. During my PhD, I taught several courses at different levels:
Polytechnique - IFIPS - University of Paris-Sud 11
- (5th year - Formation Continue) Parallel and Distributed Programming
- (5th year - Apprentis 3) Parallel and Distributed Programming
IUT d'Orsay - University of Paris-Sud 11
- (3rd year) Operating Systems
UFR d'Orsay - University of Paris-Sud 11
- (1st year) Introduction to Algorithms and the C Language