I am a Research Scientist in the Computation Structures Group at MIT CSAIL, working with Prof. Arvind. I work on computer systems for Graph AI. My research aims to democratize Graph AI (e.g., Graph ML, Graph Mining) by designing algorithms and building software & hardware systems. Before joining MIT, I was a research fellow in the Intelligent Software Systems group at the University of Texas at Austin, and a visiting Ph.D. student in the IMPACT Research group at the University of Illinois Urbana-Champaign.
We are currently looking for UROP/MEng students to join our research projects. If you are interested in working on machine learning, systems, and architecture, feel free to reach out to me by email. These projects have the potential to become an MEng thesis, and we have 6-A program opportunities available.
Google Scholar» CV (PDF)» GitHub» Reading List for Graph Computing» Graph Mining Benchmarks» Graph Neural Network Benchmarks» GARDENIA Benchmark Suite»
(OSDI 2022)
[PDF]
[Code]
Xuhao Chen, Arvind
Efficient and Scalable Graph Pattern Mining on GPUs,
16th USENIX Symposium on Operating Systems Design and Implementation, 2022
(GNNSys 2021)
[PDF]
[Poster]
[BibTeX]
[Zoom Talk]
[Code]
Loc Hoang, Xuhao Chen, Hochan Lee, Roshan Dathathri, Gurbinder Gill, Keshav Pingali,
Efficient Distribution for Deep Learning on Large Graphs,
Workshop on Graph Neural Networks and Systems, 2021
(ISCA 2021)
[PDF]
[Talk (PPTX)]
[BibTeX]
[Zoom Talk]
[Code]
Xuhao Chen*, Tianhao Huang*, Shuotao Xu, Thomas Bourgeat, Chanwoo Chung, Arvind
FlexMiner: A Pattern-Aware Accelerator for Graph Pattern Mining,
International Symposium on Computer Architecture, 2021 (*: equal contribution)
(ICS 2021)
[PDF]
[Talk (PPTX)]
[BibTeX]
[Code]
Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Loc Hoang, Keshav Pingali,
Sandslash: A Two-Level Framework for Efficient Graph Pattern Mining,
International Conference on Supercomputing, 2021
(VLDB 2020)
[PDF]
[Talk (PPTX)]
[YouTube]
[Bilibili]
[BibTeX]
[Code]
Xuhao Chen, Roshan Dathathri, Gurbinder Gill, Keshav Pingali,
Pangolin: An Efficient and Flexible Graph Mining System on CPU and GPU,
PVLDB 13(8): 1190-1205, 2020
(HPEC 2019)
[PDF]
[Talk (PPTX)]
[BibTeX]
[Code]
Loc Hoang, Vishwesh Jatala, Xuhao Chen, Udit Agarwal, Roshan Dathathri, Gurbinder Gill, Keshav Pingali,
DistTC: High Performance Distributed Triangle Counting,
IEEE High Performance Extreme Computing Conference (HPEC), 2019
(arXiv)
[PDF]
[Code]
Xuhao Chen,
GraphCage: Cache Aware Graph Processing on GPUs,
CoRR, https://arxiv.org/abs/1904.02241
(JETC)
[PDF]
[Code]
Zhen Xu, Xuhao Chen, Jie Shen, Yang Zhang, Cheng Chen, Canqun Yang,
GARDENIA: A Graph Processing Benchmark Suite for Next-generation Accelerators,
ACM Journal on Emerging Technologies in Computing Systems, 15(1): 1-13, 2019
(arXiv)
[PDF]
[Code]
Xuhao Chen,
Escort: Efficient Sparse Convolutional Neural Networks on GPUs,
CoRR, https://arxiv.org/abs/1802.10280
(Parco)
[PDF]
[Code]
[BibTeX]
Xuhao Chen, Cheng Chen, Jie Shen, Jianbin Fang, Tao Tang, Canqun Yang, Zhiying Wang,
Orchestrating Parallel Detection of Strongly Connected Components on GPUs,
Parallel Computing, Vol 78, Pages 101–114, 2018
(PMAM 2017)
[PDF]
[Talk (PPTX)]
[Code]
[BibTeX]
Pingfan Li, Xuhao Chen, Jie Shen, Jianbin Fang, Tao Tang, Canqun Yang,
High Performance Detection of Strongly Connected Components in Sparse Graphs on GPUs,
In the Proceedings of the International Workshop on Programming Models and Applications for Multicores and Manycores,
in conjunction with the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Austin, TX, Feb 2017
(CPE)
[PDF]
[Code]
[BibTeX]
Xuhao Chen, Pingfan Li, Jianbin Fang, Tao Tang, Zhiying Wang, Canqun Yang,
Efficient and High-quality Sparse Graph Coloring on the GPU,
Concurrency and Computation: Practice and Experience, Volume 29, Issue 10, 17 April 2017
(PPL)
[PDF]
[BibTeX]
Jianbin Fang, Peng Zhang, Zhaokui Li, Tao Tang, Xuhao Chen, Cheng Chen, Canqun Yang,
Evaluating Multiple Streams on Heterogeneous Platforms,
Parallel Processing Letters, Vol. 26, No. 4, 2016
(JETC)
[PDF]
[BibTeX]
Hang Zhang, Xuhao Chen, Nong Xiao, Lei Wang, Fang Liu, Wei Chen, Zhiguang Chen,
Shielding STT-RAM Based Register Files on GPUs Against Read Disturbance,
ACM Journal on Emerging Technologies in Computing Systems, Vol. 10, No. 5, Article 39, March 2016
(DAC 2016)
[PDF]
[Talk (PPTX)]
[BibTeX]
Hang Zhang, Xuhao Chen, Nong Xiao, Fang Liu,
Optimizing STT-RAM Based Register File Energy Consumption on GPGPU with Delta Compression,
In the Proceedings of the 53rd Design Automation Conference (DAC-53), Austin, TX, June 2016
(Acceptance rate: 152/876 ≈ 17.4%)
(GLSVLSI 2016)
[PDF]
[Poster (PPTX)]
[BibTeX]
Hang Zhang, Xuhao Chen, Nong Xiao, Fang Liu, Zhiguang Chen,
Red-Shield: Shielding Read Disturbance for STT-RAM Based Register Files on GPUs,
In the Proceedings of the 26th Great Lakes Symposium on VLSI (GLSVLSI), Boston, MA, May 2016
(IPDPSW 2016)
[PDF]
[Code]
[BibTeX]
Pingfan Li, Xuhao Chen, Zhe Quan, Jianbin Fang, Huayou Su, Tao Tang, Canqun Yang,
High Performance Parallel Graph Coloring on GPGPUs,
In the Proceedings of the 30th IEEE International Parallel & Distributed Processing Symposium Workshop (IPDPSW), Chicago, IL, May 2016
(MICRO 2014)
[PDF]
[Talk (PPTX)]
[Poster]
[Lightning]
[BibTeX]
Xuhao Chen, Li-Wen Chang, Christopher I. Rodrigues, Jie Lv, Zhiying Wang, Wen-Mei W. Hwu,
Adaptive Cache Management for Energy-efficient GPU Computing,
In the Proceedings of the 47th International Symposium on Microarchitecture (MICRO), Cambridge, UK, December 2014
(Acceptance rate: 53/273 ≈ 19.4%)
(MES 2014)
[PDF]
[Talk (PPTX)]
[BibTeX]
Xuhao Chen, Shengzhao Wu, Li-Wen Chang, Wei-Sheng Huang, Carl Pearson, Zhiying Wang, Wen-Mei W. Hwu,
Adaptive Cache Bypass and Insertion for Many-core Accelerators,
In the Proceedings of the 2nd ACM International Workshop on Many-core Embedded Systems (MES '14),
in conjunction with ISCA-41, Minneapolis, MN, June 2014
(Science China)
Xuhao Chen, Li Shen, Zhiying Wang, Zhong Zheng, Wei Chen,
Binary Compatibility for Embedded Systems using Greedy Subgraph Mapping,
SCIENCE CHINA Information Sciences, Volume 57, Issue 7, pp 1-16, July 2014
Hardware Accelerator for Graph Pattern Mining
With Prof. Arvind
Designed and implemented FlexMiner, a pattern-aware accelerator for graph pattern mining (GPM).
FlexMiner offers an order of magnitude speedup over state-of-the-art software GPM solutions.
Programming Framework for Graph Pattern Mining
With Prof. Keshav Pingali
Designed and implemented two graph pattern mining (GPM) frameworks, Pangolin and Sandslash.
Pangolin targets both CPU and GPU. It is the first GPM system that supports GPU mining,
and it is orders of magnitude faster than previous GPM systems.
Sandslash targets CPU only.
It provides a novel two-level programming interface and supports advanced GPM optimizations,
which together offer an 8x speedup over Pangolin on CPU.
Parallel Graph Algorithms on GPU
With Prof. Zhiying Wang
Designed and implemented various parallel graph algorithms, frameworks and benchmarks on the GPU,
including vertex coloring, strongly connected components and sparse neural networks.
Cache Architecture for Irregular Algorithms on GPU
With Prof. Wen-Mei Hwu
and Prof. Zhiying Wang
Designed and implemented efficient cache architectures for irregular applications on GPUs.
Computer Architecture (undergraduate course) Fall 2008 NUDT
With Professor Zhiying Wang
Teaching Assistant to mentor students on lab assignments and final projects
Design and Analysis of Algorithms (undergraduate course) Fall 2010 NUDT
With Professor Jianping Yin
Teaching Assistant to mentor students on labs and final projects and help with scoring
CS 380C: Advanced Topics in Compilers (graduate course) Fall 2019, UT Austin
With Professor Keshav Pingali
Teaching Assistant to set up the course project and mentor students
6.886: Algorithm Engineering (graduate course) Spring 2021, MIT
With Professor Julian Shun
Guest lecture on Pangolin [Slides]
GPU Programming (course in development) TBD
Topic list: introduction to parallel computing, GPU architecture, CUDA programming model, CUDA memory model, matrix multiplication,
tiling, scratchpad memory, constant memory, SIMT, memory coalescing, control divergence, convolution, stencil, histogram,
atomic operations, reduction tree, prefix sum, merging, sorting, sparse matrix representations and processing, graph traversal,
streaming, intra-warp synchronization, dynamic parallelism, multi-GPU programming, etc.
Computer Organization (course in development) TBD
Topic list: computer abstractions, performance, power wall, numeral systems, logic design, logic gates, truth tables,
Karnaugh maps, combinational logic, decoders, multiplexors, ALUs, sequential logic, flip-flops, latches, register file, SRAM,
DRAM, finite state machine, multiplication hardware, assembly programming, instruction set architecture, instruction formats,
calling conventions, floating point representation, floating point hardware, building a datapath, pipelining the datapath,
structural hazards, data hazards, control hazards, memory hierarchy, locality, direct-mapped caches, associative caches,
cache writing modes, cache performance, virtual memory, address translation, page tables, translation lookaside buffer.
Parallel Computing (course in development) TBD
Topic list: parallel architectures, matrix multiplication, tiling, convolution, stencil, histogram, atomic operations,
reduction tree, prefix sum, merging, sorting, sparse matrix representations and processing, graph traversal,
synchronization, and computational thinking.
Graph Challenge 2019 Student Innovation Awards 2019
China Computer Federation (CCF) Distinguished PhD Dissertation Award Nominee 2015
Ci Yun-Gui Computer Technology Scholarship for Graduate, NUDT (top 1%) 2010
Meritorious Winner, Mathematical Contest in Modeling (MCM), COMAP 2009
Distinguished Graduate, NUDT (top 1%), 2009
First rank, Scholarship of Excellent Achievements, NUDT (top 3%) 2009
Ci Yun-Gui Computer Technology Scholarship for Undergraduate, NUDT (top 1%) 2008
First-rank Prize, China Undergraduate Mathematical Contest in Modeling 2007
First rank, Scholarship of Excellent Achievements, NUDT (top 3%) 2007
Invited reviewer for IEEE Transactions on Knowledge and Data Engineering
Invited reviewer for ACM Transactions on Architecture and Code Optimization
Invited reviewer for ACM Transactions on Modeling and Performance Evaluation of Computing Systems
Invited reviewer for Microprocessors and Microsystems: Embedded Hardware Design
Invited reviewer for Journal of Supercomputing
Google Scholar Profile
LinkedIn Profile
DBLP Entry
Another Homepage
GitHub Entry
ORCID iD
Faculty job talks: tips from the faculty
Computer Science Graduate Job and Interview Guide
Getting an academic job
Preparing and Giving a Good Talk
Python Programming and Numerical Methods - A Guide for Engineers and Scientists