Documented Scientific Discoveries and Technical Innovations

Welcome to the High Performance Computing and Software Laboratory Technical Report Browser

This document lists the titles of selected technical reports (published or to be published) of the High Performance Computing and Software Laboratory since 1994, with links to the corresponding abstracts. The heading of each abstract includes a link to download the full technical report.

Papers sorted by year

2020

``Automating incremental and asynchronous evaluation for recursive aggregate data processing", Proceedings of 2020 ACM SIGMOD Conference on Management of Data (SIGMOD 2020), Portland, OR, USA, June 14-19, 2020.


2019

``Catfish: adaptive RDMA-enabled R-tree for low latency and high throughput", Proceedings of 39th IEEE International Conference on Distributed Computing Systems (ICDCS 2019), Dallas, Texas, July 7-9, 2019.

``HYPHA: a framework based on separation of parallelism to accelerate persistent homology matrix reduction", Proceedings of 33rd ACM International Conference on Supercomputing (ICS 2019), Phoenix, Arizona, June 26-28, 2019.

``DirectLoad: a fast web-scale index system across large regional centers", Proceedings of 35th IEEE International Conference on Data Engineering (ICDE 2019), Macau, China, April 8-11, 2019.

``SEP-Graph: finding shortest execution paths for graph processing under a hybrid framework on GPU", Proceedings of 24th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2019), Washington DC, USA, February 16-20, 2019.


2018

``A low-cost disk solution enabling LSM-tree to achieve high performance for mixed read/write workloads", ACM Transactions on Storage, Vol. 14, No. 2, April 2018.

``Software-defined-Software: a perspective of machine learning based software production", Proceedings of 38th International Conference on Distributed Computing Systems (ICDCS'18), Vienna, Austria, July 2-5, 2018.

``SQLoop: high performance iterative processing in data management", Proceedings of 38th International Conference on Distributed Computing Systems (ICDCS'18), Vienna, Austria, July 2-5, 2018.


2017

``A distributed in-memory key-value store system on heterogeneous CPU–GPU cluster", The VLDB Journal, August 2017, pages 1-22.

``Software support inside and outside solid-state-devices for high performance and high efficiency", Proceedings of the IEEE, Volume 105, Issue 9, September 2017, pages 1650-1665.

``LSbM-tree: re-enabling buffer caching in data management for mixed reads and writes", Proceedings of 37th International Conference on Distributed Computing Systems (ICDCS'17), Atlanta, Georgia, June 5-8, 2017.

``Feisu: fast query execution over heterogeneous data sources on large-scale clusters", Proceedings of 33rd International Conference on Data Engineering (ICDE'17), San Diego, California, April 19-22, 2017.


2016

``Spark-GPU: an accelerated in-memory data processing engine on clusters", Proceedings of 2016 IEEE International Conference on Big Data (IEEE BigData 2016), Washington DC, USA, December 5-8, 2016.

``BCC: reducing false aborts in optimistic concurrency control with low cost for in-memory databases", Proceedings of 42nd International Conference on Very Large Data Bases (VLDB 2016), New Delhi, India, September 5-9, 2016.


2015

``Mega-KV: a case for GPUs to maximize the throughput of in-memory key-value stores", Proceedings of 41st International Conference on Very Large Data Bases (VLDB 2015), Hawaii, USA, August 31 - September 4, 2015.


2014

``Concurrent analytical query processing with GPUs", Proceedings of 40th International Conference on Very Large Data Bases (VLDB 2014), Hangzhou, China, September 1-5, 2014.

``Understanding insights into the basic structure and essential issues of table placement methods in clusters", Proceedings of 40th International Conference on Very Large Data Bases (VLDB 2014), Hangzhou, China, September 1-5, 2014. (this paper was accepted in PVLDB 2013).

``Major technical advancements in Apache Hive", Proceedings of 2014 ACM SIGMOD Conference on Management of Data (SIGMOD 2014), Snowbird, Utah, June 22-27, 2014.

``GDM: device memory management for GPGPU computing", Proceedings of 2014 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 2014), Austin, Texas, June 16-20, 2014.


2013

``UNIK: Unsupervised social network spam detection", Proceedings of 22nd ACM International Conference on Information and Knowledge Management (CIKM 2013), San Francisco, October 27 - November 1, 2013.

``S-CAVE: effective SSD caching to improve virtual machine storage performance", Proceedings of 22nd International Conference on Parallel Architectures and Compilation Techniques (PACT 2013), Edinburgh, Scotland, September 7-11, 2013.

``Hadoop-GIS: a high performance spatial data warehousing system over MapReduce", Proceedings of 39th International Conference on Very Large Data Bases (VLDB 2013), Riva del Garda, Trento, Italy, August 26-30, 2013.

``The Yin and Yang of processing data warehousing queries on GPU devices", Proceedings of 39th International Conference on Very Large Data Bases (VLDB 2013), Riva del Garda, Trento, Italy, August 26-30, 2013.

``LDPC-in-SSD: making advanced error correction codes work effectively in solid state drives", Proceedings of 11th USENIX Conference on File and Storage Technologies (FAST'13), San Jose, California, February 12-15, 2013.


2012

``Accelerating pathology image data cross-comparison on CPU-GPU hybrid systems", Proceedings of 38th International Conference on Very Large Data Bases (VLDB 2012), Istanbul, Turkey, August 27-31, 2012.

The PixelBox algorithm in this paper has been adopted in the Geometric Performance Primitive Library.

``hStorage-DB: heterogeneity-aware data management to exploit full capacity of hybrid storage systems", Proceedings of 38th International Conference on Very Large Data Bases (VLDB 2012), Istanbul, Turkey, August 27-31, 2012.

``Spam behavior analysis and detection in user generated content on social networks", Proceedings of 32nd International Conference on Distributed Computing Systems (ICDCS 2012), Macau, China, June 18-21, 2012.

``BWS: Balanced Work Stealing for time-sharing multicores", Proceedings of ACM EuroSys'12, Bern, Switzerland, April 10-13, 2012.
BWS is open source software.


2011

``DOT: a matrix model for analyzing, optimizing and deploying software for big data analytics in distributed systems", Proceedings of 2nd ACM Symposium on Cloud Computing (SOCC 2011), Cascais, Portugal, October 27-28, 2011.

``YSmart: Yet another SQL-to-MapReduce Translator", Proceedings of 31st International Conference on Distributed Computing Systems (ICDCS 2011), Minneapolis, Minnesota, June 20-24, 2011. Best Paper Award.

YSmart has been merged into big data warehousing production systems.

``Hystor: Making the Best Use of Solid State Drives in High Performance Storage Systems", Proceedings of 25th ACM International Conference on Supercomputing (ICS 2011), Tucson, Arizona, May 31 - June 4, 2011. Best Paper Award.

Hystor has influenced commercial hybrid storage products, including Apple's Fusion Drive.

``SRM-Buffer: An OS Buffer Management Technique to Prevent Last Level Caches from Thrashing in Multicores", Proceedings of ACM EuroSys'11, Salzburg, Austria, April 10-13, 2011.

``RCFile: a fast and space-efficient data placement structure in MapReduce-based Warehouse systems", Proceedings of International Conference on Data Engineering (ICDE 2011), Hannover, Germany, April 11-16, 2011.

RCFile has been adopted in big data warehouse production systems.

``CAFTL: a content-aware flash translation layer enhancing the lifespan of flash memory based solid state drives", Proceedings of 9th USENIX Conference on File and Storage Technologies (FAST'11), San Jose, California, February 15-17, 2011.

``Essential roles of exploiting internal parallelism of flash memory based solid state drives in high-speed data processing", Proceedings of 17th International Symposium on High Performance Computer Architecture (HPCA-17), San Antonio, Texas, February 12-16, 2011.

``ULCC: a user-level facility for optimizing shared cache performance on multicores", Proceedings of 16th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming (PPoPP 2011), San Antonio, Texas, February 12-16, 2011.


2010

``PS-BC: power-saving considerations in design of buffer caches serving heterogeneous storage devices", Proceedings of 16th ACM International Symposium on Low Power Electronics and Design (ISLPED 2010), Austin, Texas, August 18-20, 2010.

``Splitter: a proxy-based approach for post-migration testing of Web applications", Proceedings of ACM EuroSys 2010, Paris, France, April 13-16, 2010.

``TopBT: a topology-aware and infrastructure-independent BitTorrent client", Proceedings of INFOCOM'10, San Diego, California, March 15-19, 2010.
TopBT is open source software.


2009

``CUBS: coordinated upload bandwidth sharing in residential networks", Proceedings of 17th International Conference on Network Protocols (ICNP 2009), Princeton, NJ, October 13-16, 2009.

``Enabling software management for multicore caches with a lightweight hardware support", Proceedings of 22nd ACM/IEEE Annual Conference on Supercomputing (SC09), Portland, Oregon, November 14-20, 2009.

``Soft-OLP: improving hardware cache performance through software-controlled object-level partitioning", Proceedings of 18th International Conference on Parallel Architectures and Compilation Techniques (PACT 2009), Raleigh, North Carolina, September 12-16, 2009.

``MCC-DB: minimizing cache conflicts in multi-core processors for databases", Proceedings of 35th International Conference on Very Large Data Bases (VLDB 2009), Lyon, France, August 24-28, 2009.

``Analyzing patterns of user content generation in online social networks", Proceedings of 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-09), Paris, France, June 28 - July 1, 2009.

``Understanding intrinsic characteristics and system implications of flash memory based solid state drives", Proceedings of 2009 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS/Performance 2009), Seattle, WA, June 15-19, 2009.

``BP-Wrapper: a system framework making any replacement algorithms (almost) lock contention free", Proceedings of 25th International Conference on Data Engineering (ICDE'09), Shanghai, China, March 29 - April 4, 2009.


2008

``Automatic software fault diagnosis by exploiting application signatures", Proceedings of 22nd USENIX Conference on Large Installation System Administration (LISA'08), San Diego, California, November 9-14, 2008. (Best Paper Award).

``The stretched exponential distribution of Internet media access patterns", Proceedings of 27th ACM Symposium on Principles of Distributed Computing (PODC 2008), Toronto, Canada, August 18-21, 2008.

``Caching for Bursts (C-Burst): let hard disks sleep well and work energetically", Proceedings of 13th ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'08), Bangalore, India, August 11-13, 2008.

``LightFlood: minimizing redundant messages and maximizing scope of peer-to-peer search", IEEE Transactions on Parallel and Distributed Systems, Vol. 19, No. 5, 2008.

``Gaining insights into multicore cache partitioning: bridging the gap between simulation and real systems", Proceedings of the 14th International Symposium on High Performance Computer Architecture (HPCA'08), Salt Lake City, Utah, February 16-20, 2008.

The OS-based cache partitioning method in this paper has been used in the Linux kernel for production systems.


2007

``PSM-Throttling: minimizing energy consumption for bulk data communications in WLANs", Proceedings of the 15th International Conference on Network Protocols (ICNP'07), Beijing, China, October 16-19, 2007.

``SProxy: a caching infrastructure to support Internet streaming", IEEE Transactions on Multimedia, Vol. 9, No. 5, 2007.

``Cost-aware caching algorithms for distributed storage servers", Proceedings of the 21st International Symposium on Distributed Computing (DISC'07), Lemesos, Cyprus, September 24-26, 2007.

``Maintaining strong cache consistency for the Domain Name System", IEEE Transactions on Knowledge and Data Engineering, Vol. 19, No. 8, 2007.

``SCAP: Smart Caching in wireless Access Points to improve P2P streaming", Proceedings of the 27th International Conference on Distributed Computing Systems (ICDCS'07), Toronto, Canada, June 25-29, 2007.

``STEP: Sequentiality and Thrashing Detection based Prefetching to improve performance of networked storage servers", Proceedings of the 27th International Conference on Distributed Computing Systems (ICDCS'07), Toronto, Canada, June 25-29, 2007.

``DiskSeen: exploiting disk layout and access history to enhance I/O prefetch", Proceedings of 2007 USENIX Annual Technical Conference (USENIX'07), Santa Clara, California, June 17-22, 2007.

``Does Internet media traffic really follow Zipf-like distribution?", (an extended abstract), Proceedings of 2007 ACM International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS'07), San Diego, California, June 12-16, 2007.

``Design and Analysis of Sensing Scheduling Algorithms under Partial Coverage for Object Detection in Sensor Networks", IEEE Transactions on Parallel and Distributed Systems, Vol. 18, No. 3, 2007.

``Cooperative Relay Service in a Wireless LAN", IEEE Journal on Selected Areas in Communications, Vol. 25, No. 2, 2007.

``A Performance Study of BitTorrent-like Peer-to-Peer Systems", IEEE Journal on Selected Areas in Communications, Vol. 25, No. 1, 2007.

``Coordinated multilevel buffer cache management with consistent access locality quantification", IEEE Transactions on Computers, Vol. 56, No. 1, 2007.


2006

``Delving into Internet streaming media delivery: a quality and resource utilization perspective", Proceedings of ACM SIGCOMM Internet Measurement Conference (IMC'06), Rio de Janeiro, Brazil, October 25-27, 2006.

``SmartSaver: turning flash drive into a disk energy saver for mobile computers", Proceedings of 11th ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED'06), Tegernsee, Germany, October 4-6, 2006.

``DNScup: a strong cache consistency protocol for DNS", Proceedings of the 26th International Conference on Distributed Computing Systems (ICDCS'06), Lisbon, Portugal, July 4-7, 2006.

``ASAP: an AS-Aware Peer-relay protocol for high quality VoIP", Proceedings of the 26th International Conference on Distributed Computing Systems (ICDCS'06), Lisbon, Portugal, July 4-7, 2006.

``A locality-aware cooperative cache management protocol to improve network file system performance", Proceedings of the 26th International Conference on Distributed Computing Systems (ICDCS'06), Lisbon, Portugal, July 4-7, 2006.

``Segment-based streaming media proxy: modeling and optimization", IEEE Transactions on Multimedia, Vol. 8, No. 2, 2006.

``Design and evaluation of a scalable and reliable P2P assisted proxy for on-demand streaming media delivery", IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 5, 2006.

``MESA: reducing cache conflicts by integrating static and run-time methods", Proceedings of International Symposium on Performance Analysis of Systems and Software (ISPASS-2006), Austin, Texas, March 19-21, 2006.

``Exploiting idle communication power to improve wireless network performance and energy efficiency", Proceedings of INFOCOM'06, Barcelona, Spain, April 23-29, 2006.


2005

``Fast proxy delivery of multiple streaming sessions in shared running buffers", IEEE Transactions on Multimedia, Vol. 7, No. 6, December 2005.

``DULO: an effective buffer cache management scheme to exploit both temporal and spatial localities", Proceedings of the 4th USENIX Conference on File and Storage Technologies (FAST'05), San Francisco, CA, December 14-16, 2005.

``Coordinated data prefetching for Web contents", Computer Communications, Vol. 28, Issue 17, October 2005.

``Look-ahead architecture adaptation to reduce processor power consumption", IEEE Micro, Vol. 25, No. 4, July/August, 2005.

``Measurement, analysis, and modeling of BitTorrent-like systems", Proceedings of ACM SIGCOMM Internet Measurement Conference (IMC'05), New Orleans, LA, October 19-21, 2005.

``Making LRU friendly to weak locality workloads: a novel replacement algorithm to improve buffer cache performance", IEEE Transactions on Computers, Vol. 54, No. 8, 2005.

``Segment-based proxy caching for Internet streaming media delivery", IEEE Multimedia, Vol. 12, No. 3, July-September, 2005.

``Fast and low-cost search schemes by exploiting localities in P2P networks", Journal of Parallel and Distributed Computing, Vol. 65, Issue 6, 2005.

``Design and analysis of wave sensing scheduling protocols for object-tracking applications", Proceedings of the First International Conference on Distributed Computing in Sensor Systems (DCOSS '05), Marina del Rey, California, June 30 - July 1, 2005.

``Analyzing object detection quality under probabilistic coverage in sensor networks", Proceedings of the 13th International Workshop on Quality of Service, (IWQoS'05), Passau, Germany, June 21 - 23, 2005.

``Analysis of multimedia workloads with implications for internet streaming", Proceedings of the 14th International World Wide Web Conference, (WWW'2005), Chiba, Japan, May 10-14, 2005.

``DISC: Dynamic Interleaved Segment Caching for interactive streaming accesses", Proceedings of the 25th International Conference on Distributed Computing Systems, (ICDCS'2005), Columbus, Ohio, June 6-9, 2005.

``Locality awareness in unstructured peer-to-peer systems", IEEE Transactions on Parallel and Distributed Systems, Vol. 16, No. 2, February 2005.

``CLOCK-Pro: an effective improvement of the CLOCK replacement", Proceedings of 2005 USENIX Annual Technical Conference (USENIX'05), Anaheim, CA, April 10-15, 2005.

CLOCK-Pro has been adopted in OS kernels and other data processing systems.

``SCOPE: scalable consistency maintenance in structured P2P systems", Proceedings of IEEE INFOCOM 2005 Conference, Miami, Florida, March 13-17, 2005.

``Token-ordered LRU: an effective page replacement policy and its implementation in Linux systems", Performance Evaluation, Vol. 60, Issue 1-4, 2005.

The token algorithm is part of the Linux kernel.


2004

``A study on object tracking quality under probabilistic coverage in sensor networks", a poster presentation in MobiCom'04, Philadelphia, Pennsylvania, September 26 to October 1, 2004; an extended abstract published in ACM Mobile Computing and Communication Review (MC2R), Vol. 9, No. 1, pp. 73-76, January 2005.

``Enforcing direct communications between clients and Web servers to improve proxy performance and security", Software: Practice and Experience, Vol. 34, Issue 12, October 2004.

``Exploiting content localities for efficient search in P2P systems", Proceedings of the 18th International Symposium on Distributed Computing (DISC 2004), Amsterdam, Netherlands, October 4 - 8, 2004.

``Strong cache consistency support for domain name system", a poster presentation in SIGCOMM'04, Portland, Oregon, August 31 - September 3, 2004.

``Design and optimization of large size and low overhead off-chip caches", IEEE Transactions on Computers, Vol. 53, No. 7, 2004.

``Building a large and efficient hybrid peer-to-peer Internet caching system", IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 6, 2004.

``Adaptive memory allocations in clusters to handle unexpectedly large data-intensive jobs", IEEE Transactions on Parallel and Distributed Systems, Vol. 15, No. 7, 2004.

``SAT-Match: a self-adaptive topology matching method to achieve low lookup latency in structured P2P overlay networks", Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS'04), Santa Fe, New Mexico, April 26-30, 2004.

``PROP: a scalable and reliable P2P assisted streaming proxy system", Proceedings of the 24th International Conference on Distributed Computing Systems, (ICDCS'04), Tokyo, Japan, March 23-26, 2004.

``ULC: A file block placement and replacement protocol to effectively exploit hierarchical locality in multi-level buffer caches", Proceedings of the 24th International Conference on Distributed Computing Systems, (ICDCS'04), Tokyo, Japan, March 23-26, 2004.

``SRB: Shared Running Buffers in proxy to exploit memory locality of multiple streaming media sessions", Proceedings of the 24th International Conference on Distributed Computing Systems, (ICDCS'04), Tokyo, Japan, March 23-26, 2004.

``Locality-aware topology matching in P2P systems", Proceedings of IEEE INFOCOM'04, Hong Kong, March 7-11, 2004.

``Designs of high quality streaming proxy systems", Proceedings of IEEE INFOCOM'04, Hong Kong, March 7-11, 2004.

``Investigating performance insights of segment-based proxy caching of streaming media strategies", Proceedings of ACM International Conference on Multimedia Computing and Networking (MMCN'04), January 21-22, 2004.


2003

``Auto-CFD: efficiently parallelizing CFD applications on clusters", Proceedings of IEEE International Conference on Cluster Computing, (Cluster'03), December 1-4, 2003.

``Efficient Distributed Disk Caching in Data Grid Management", Proceedings of IEEE International Conference on Cluster Computing, (Cluster'03), December 1-4, 2003.

``On scalable and locality aware Web file sharing", Journal of Parallel and Distributed Computing, Vol. 63, No. 10, 2003.

``Low cost and reliable mutual anonymity protocols in peer-to-peer networks", IEEE Transactions on Parallel and Distributed Systems, Vol. 14, No. 9, 2003.

``Accurately modeling workload interactions for deploying prefetching in Web servers", Proceedings of 2003 International Conference on Parallel Processing, (ICPP'03), Kaohsiung, Taiwan, China, October 6-9, 2003.

``LightFlood: an efficient flooding scheme for file search in unstructured peer-to-peer systems", Proceedings of 2003 International Conference on Parallel Processing, (ICPP'03), Kaohsiung, Taiwan, China, October 6-9, 2003.

``Adaptive and lazy segmentation based proxy caching for streaming media delivery", Proceedings of 13th ACM International Workshop on Network and Operating Systems Support for Digital Audio and Video, (NOSSDAV'03), Monterey, California, USA, June 1-3, 2003.

``Mutual anonymity protocols for hybrid peer-to-peer systems", Proceedings of 23rd International Conference on Distributed Computing Systems, (ICDCS'03), Providence, Rhode Island, May 19-22, 2003.

``A popularity-based prediction model for Web prefetching", IEEE Computer, Vol. 36, No. 3, March, 2003.


2002

``Detective browsers: a software technique to improve Web access performance and security", Proceedings of the 7th International Workshop on Web Content Caching and Distribution, (WCW'02), Boulder, Colorado, August 14-16, 2002.

``Access-mode predictions for low-power cache design", IEEE Micro, Vol. 22, No. 2, March/April, 2002.

``LIRS: an efficient low inter-reference recency set replacement to improve buffer cache performance", Proceedings of the 2002 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, (SIGMETRICS'02), Marina Del Rey, California, June 15-19, 2002.

The LIRS algorithm has been adopted in major database and software systems.

``Adaptive and virtual reconfigurations for dynamic job scheduling in clusters", Proceedings of 22nd International Conference on Distributed Computing Systems, (ICDCS'02), Vienna, Austria, July 2-5, 2002.

``On reliable and scalable peer-to-peer web document sharing", Proceedings of 2002 International Parallel and Distributed Processing Symposium, (IPDPS'02), Fort Lauderdale, Florida, April 15-19, 2002.

``TPF: a system thrashing protection facility", Software: Practice and Experience, Vol. 32, Issue 3, 2002.

``Dynamic cluster resource allocations for jobs with known and unknown memory demands", IEEE Transactions on Parallel and Distributed Systems, Vol. 13, No. 3, 2002.

``Fine-grain priority scheduling on multi-channel memory systems", Proceedings of the 8th International Symposium on High Performance Computer Architecture, (HPCA-8), Cambridge, Massachusetts, February 2-6, 2002.


2001

``Breaking address mapping symmetry at multi-level of memory hierarchy to reduce DRAM row-buffer conflicts", Journal of Instruction-Level Parallelism, Vol. 3, 2001.

``Adaptive page replacement to protect thrashing in Linux", Proceedings of the 5th USENIX Annual Linux Showcase and Conference, (ALS'01), Oakland, California, November 5-10, 2001.

``Cached DRAM for ILP processor memory access latency reduction", IEEE Micro, Vol. 21, No. 4, July/August, 2001.

``Coordinated data prefetching by utilizing reference information at both proxy and Web servers", Proceedings of the ACM Workshop on Performance and Architecture of Web Servers, (PAWS-2001), Boston, Massachusetts, June 16-17, 2001.

``Exploiting neglected data locality in browsers", Proceedings of the 10th International World Wide Web Conference, (WWW10), Hong Kong, May 1-5, 2001, (an extended abstract).

``Dynamic load sharing with unknown memory demands in clusters", Proceedings of the 21st International Conference on Distributed Computing Systems, (ICDCS'2001), Phoenix, Arizona, April 16-19, 2001.

``Fast bit-reversals on uniprocessors and shared-memory multiprocessors", SIAM Journal on Scientific Computing, Vol. 22, No. 6, 2001.

``Architectural effects of symmetric multiprocessors on TPC-C commercial workload", Journal of Parallel and Distributed Computing, Vol. 61, 2001.


2000

``A permutation-based page interleaving scheme to reduce row-buffer conflicts and exploit data locality", Proceedings of the 33rd Annual International Symposium on Microarchitecture, (Micro-33), Monterey, California, December 10-13, 2000.

The permutation technique has been widely adopted in different commercial processors.

``Improving memory performance of sorting algorithms", ACM Journal on Experimental Algorithmics, Vol. 5, 2000.

``Memory hierarchy considerations for cost-effective cluster computing", IEEE Transactions on Computers, Vol. 49, No. 9, 2000.

``Incorporating job migration and network RAM to share memory resources", Proceedings of the 9th IEEE International Symposium on High Performance Distributed Computing, (HPDC-9), Pittsburgh, Pennsylvania, August 1-4, 2000.

``Effective Load Sharing on Heterogeneous Networks of Workstations", Proceedings of the 2000 International Parallel and Distributed Processing Symposium, (IPDPS'2000), Cancun, Mexico, May 1-5, 2000.

``Cacheminer: a runtime approach to exploit cache locality on SMP", IEEE Transactions on Parallel and Distributed Systems, Vol. 11, No. 4, 2000.

``Improving distributed workload performance by sharing both CPU and memory resources", Proceedings of the 20th International Conference on Distributed Computing Systems, (ICDCS'2000), Taipei, Taiwan, April 10-13, 2000.


1999

``Cache-optimal methods for bit-reversals", Proceedings of Supercomputing'99, (SC'99), Portland, Oregon, November 1999.

``Analysis of commercial workload on SMP multiprocessors", Proceedings of Performance'99, August 1999.

``Profit-effective parallel computing", IEEE Concurrency, Vol. 7, No. 2, 1999.

``The impact of memory hierarchies on cluster computing", Proceedings of 13th International Parallel Processing Symposium & 10th Symposium on Parallel and Distributed Processing, (Second Merged Symposium IPPS/SPDP'99), April, 1999.

``Engineering workstations", Encyclopedia of Electrical and Electronics Engineering, John Wiley & Sons, Publishers, February, 1999.

``Performance models and simulation", Chapter 6, High Performance Cluster Computing, Volume 1, edited by R. Buyya, Prentice Hall, New Jersey, 1999.

``Comparative evaluation and case studies of shared-memory and data-parallel execution patterns", Scientific Programming, Vol. 7, No. 1, 1999.


1998

``Lock Bypassing: an efficient algorithm for concurrently accessing priority heaps", ACM Journal on Experimental Algorithmics, Vol. 3, No. 3, 1998.

``A memory-layout oriented run-time technique for locality optimization", Proceedings of 1998 International Conference on Parallel Processing, (ICPP'98), August 1998.

``Characterizing and scheduling communication tasks of parallel and sequential jobs on networks of workstations", Computer Communications, Vol. 21, Issue 5, 1998.

``An Integrated Approach of Performance Prediction on Networks of Workstations", Chapter 4, Advanced Computer System Design, K. Bagchi, J. Walrand and G. Zobrist, Eds., Gordon and Breach Publishers, 1998.

Exploiting Cache Locality on Symmetric Multiprocessors: A Run-Time Approach, Ph.D. Dissertation, College of William and Mary, May 1998.


1997

``Two fast and high-associativity cache schemes", IEEE Micro, October, 1997.

``A comparative evaluation of hierarchical network architecture of the HP-Convex Exemplar", Proceedings of ICCD'97.

``Coordinating parallel processes on networks of workstations", Journal of Parallel and Distributed Computing, Vol. 46, No. 2, 1997.

``Effectively scheduling parallel tasks and communications on networks of workstations", Proceedings of Euro-Par'97.

``Nova visualization for optimization of data-parallel programs", Proceedings of Euro-Par'97.

``Distributed edge detection: issues and implementations", IEEE Computational Science and Engineering, Spring Issue, 1997.

``Software support for multiprocessor latency measurement and evaluation", IEEE Transactions on Software Engineering, Vol. 23, No. 1, 1997.

``Adaptively scheduling parallel loops on distributed shared-memory systems", IEEE Transactions on Parallel and Distributed Systems, Vol. 8, No. 1, 1997.


1996

``Semi-empirical multiprocessor performance predictions", Journal of Parallel and Distributed Computing, Vol. 39, No. 1, 1996.

``An effective and practical performance prediction model for parallel computing on non-dedicated heterogeneous NOW", Journal of Parallel and Distributed Computing, Vol. 38, No. 1, 1996.

``An adaptive loop scheduling algorithm on shared-memory systems", Proceedings of the 8th Symposium on Parallel and Distributed Processing, IEEE Computer Society Press, October, 1996.

``Evaluating and designing software mutual exclusion algorithms on shared-memory multiprocessors", IEEE Parallel & Distributed Technology, Spring Issue, 1996.

``Simulation of heterogeneous networks of workstations", Proceedings of MASCOTS'96, IEEE Computer Society Press, February, 1996.

``A fast token-chasing mutual exclusion algorithm in arbitrary network topologies", Journal of Parallel and Distributed Computing, Vol. 35, No. 2, 1996.

``Parallelizing FDTD Methods for Solving Electromagnetic Scattering Problems", Applications on Advanced Architecture Computers, G. Astfalk, Ed., SIAM Press, 1996.


1995

``Comparative modeling and evaluation of CC-NUMA and COMA on hierarchical ring architectures", IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 12, 1995.

``Modeling and characterizing parallel computing performance on heterogeneous networks of workstations", Proceedings of the 7th IEEE Symposium on Parallel and Distributed Processing, IEEE Computer Society Press, October, 1995.

``*Graph: a tool for visualizing communication and optimizing layout in data-parallel programs", Proceedings of the 1995 International Conference on Parallel Processing, CRC Press, Vol. 2, August, 1995.

``Software support for asynchronous computing across networks", Proceedings of the 19th Annual International Computer Software and Applications Conference, IEEE Computer Society Press, August, 1995.

``Multiprocessor scalability predictions through detailed program execution analysis", Proceedings of the 9th ACM International Conference on Supercomputing, ACM Press, July, 1995. (Best Paper Award).

``Comparative performance analysis and evaluation of hot spots on network-based shared-memory architectures", IEEE Transactions on Parallel and Distributed Systems, Vol. 6, No. 8, 1995.

``Parallelizing an oil refining simulation: numerical methods, implementations and experience", Parallel Computing, Vol. 21, No. 4, 1995.


1994

``Distributed computation of electromagnetic scattering problems using finite-difference time-domain decompositions", Proceedings of the Third IEEE International Symposium on High-Performance Distributed Computing, IEEE Computer Society Press, August, 1994.

``Latency metric: an experimental method for measuring and evaluating parallel program and architecture scalability", Journal of Parallel and Distributed Computing, Vol. 22, No. 3, 1994.

``Comparative performance evaluation of spin-lock synchronization on MIN-based and HR-based multiprocessors", IEEE Parallel and Distributed Technology, Spring Issue, 1994.

``Computation and communication patterns of large-scale image convolutions on parallel architectures", Proceedings of the 8th International Parallel Processing Symposium, IEEE Computer Society Press, April, 1994.

Tutorial on Multiprocessor Performance Measurement and Evaluation, IEEE Computer Society Press, 1994.

``Triangular decomposition methods for solving reducible nonlinear systems of equations", SIAM Journal on Optimization, Vol. 5, No. 2, 1994.

``Spin-lock synchronization on the Butterfly and KSR1", IEEE Parallel & Distributed Technology, Vol. 2, Spring Issue, 1994.

