Lecture 15 | Efficient Methods and Hardware for Deep Learning - Stanford University School of Engineering - Deep Learning Open Course - Cupoy

In Lecture 15, guest lecturer Song Han discusses algorithms and specialized hardware that can be used to accelerate training and inference of deep learning workloads. We discuss pruning, weight sharing, quantization, and other techniques for accelerating inference, as well as parallelization, mixed precision, and other techniques for accelerating training. We discuss specialized hardware for deep learning such as GPUs, FPGAs, and ASICs, including the Tensor Cores in NVIDIA’s latest Volta GPUs as well as Google’s Tensor Processing Units (TPUs).

Keywords: Hardware, CPU, GPU, ASIC, FPGA, pruning, weight sharing, quantization, low-rank approximations, binary networks, ternary networks, Winograd transformations, EIE, data parallelism, model parallelism, mixed precision, FP16, FP32, model distillation, Dense-Sparse-Dense training, NVIDIA Volta, Tensor Core, Google TPU, Google Cloud TPU

Slides: http://cs231n.stanford.edu/slides/201...