Cheng Li
Latest
The Design and Implementation of a Scalable DL Benchmarking Platform
DLSpec: A Deep Learning Task Exchange Specification
DLBricks: Composable Benchmark Generation to Reduce Deep Learning Benchmarking Effort on CPUs
MLModelScope: Evaluate and Introspect Cognitive Pipelines
Accelerating Reduction and Scan Using Tensor Core Units
TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments
Evaluating Characteristics of CUDA Communication Primitives on High-Bandwidth Interconnects
Accelerating Reduction Using Tensor Core Units
SCOPE: C3SR Systems Characterization and Benchmarking Framework
Matrix Factorization on GPUs with Memory Optimization and Approximate Computing
RAI: A Scalable Project Submission System for Parallel Programming Courses
KLAP: Kernel Launch Aggregation and Promotion for Optimizing Dynamic Parallelism
DjiNN and Tonic: DNN as a Service and Its Implications for Future Warehouse Scale Computers
Sirius: An Open End-to-End Voice and Vision Personal Assistant and Its Implications for Future Warehouse Scale Computers
Stochastic Circuits for Real-Time Image-Processing Applications