The Straight-Through Estimator (STE) is widely used for back-propagating gradients through the quantization function, but the technique still lacks a complete theoretical understanding. We propose an alternative methodology…
Learning low-precision neural networks without Straight-Through Estimator(STE).pdf
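As background for this abstract, a minimal NumPy sketch of the STE idea it refers to (not the paper's proposed alternative): the forward pass uses quantized weights, while the backward pass pretends the round operation has unit gradient. The scalar model and all names here are illustrative assumptions.

```python
import numpy as np

def quantize(w, num_bits=2):
    """Uniform symmetric quantizer onto a fixed grid in [-1, 1]."""
    levels = 2 ** (num_bits - 1) - 1
    return np.round(np.clip(w, -1.0, 1.0) * levels) / levels

def ste_step(w, x, y, lr=0.1):
    """One SGD step on a toy scalar model y_hat = quantize(w) * x.

    Forward pass uses the quantized weight; the backward pass applies the
    straight-through estimator, treating d quantize(w)/dw as 1, so the
    gradient flows to the full-precision weight unchanged.
    """
    wq = quantize(w)
    y_hat = wq * x
    grad_yhat = 2.0 * (y_hat - y)  # derivative of the squared loss (y_hat - y)^2
    grad_w = grad_yhat * x         # STE: skip the non-differentiable round()
    return w - lr * grad_w
```

The full-precision weight `w` keeps accumulating small updates even though the forward pass only ever sees its quantized value, which is exactly the behavior the STE is meant to enable.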
The Winograd or Cook-Toom class of algorithms helps to reduce the overall compute complexity of many modern deep convolutional neural networks (CNNs). Although there has been…
Efficient Winograd or Cook-Toom Convolution Kernel Implementation on Widely Used Mobile CPUs.pdf
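To illustrate the algorithm class the abstract refers to, here is a sketch of the smallest 1-D Winograd instance, F(2,3), which produces 2 outputs of a 3-tap convolution with 4 multiplies instead of the direct method's 6. The transform matrices are the standard published ones; the function name is an assumption.

```python
import numpy as np

# Standard F(2,3) Winograd transform matrices: input transform B^T,
# filter transform G, and output transform A^T.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """F(2,3): 4 input samples d, 3 filter taps g -> 2 convolution outputs.

    The only elementwise product is the 4-multiply Hadamard product between
    the transformed filter and the transformed input tile.
    """
    return AT @ ((G @ g) * (BT @ d))
```

In a CNN kernel the filter transform `G @ g` is computed once per layer and reused across all input tiles, which is where the savings compound.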
Recurrent neural networks (RNNs) have shown state-of-the-art results for speech recognition, natural language processing, image captioning, and video summarization applications. Many of these applications…
Measuring scheduling efficiency of RNNs for NLP applications.pdf
Machine learning-based applications are increasingly prevalent in IoT devices. The power and storage constraints of these devices make it particularly challenging to run modern neural networks, limiting the…
Ternary Hybrid Neural-Tree Networks for Highly Constrained IoT Applications.pdf
Continuous computer vision (CV) tasks increasingly rely on convolutional neural networks (CNNs). However, CNNs have massive compute demands that far exceed the performance and energy constraints…
Euphrates- Algorithm-SoC Co-Design for Low-Power Mobile Continuous Vision.pdf
Machine learning is playing an increasingly significant role in emerging mobile application domains such as AR/VR, ADAS, etc. Accordingly, hardware architects have designed customized hardware for machine learning…
Mobile Machine Learning Hardware at ARM- A Systems-on-Chip (SoC) Perspective.pdf
On-device CNN inference for real-time computer vision applications can result in computational demands that far exceed the energy budgets of mobile devices. This paper proposes FixyNN, a co-designed…
Energy Efficient Hardware for On-Device CNN Inference via Transfer Learning.pdf
Systolic Arrays are one of the most popular compute substrates within Deep Learning accelerators today, as they provide extremely high efficiency for running dense matrix multiplications…
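To make the dataflow the abstract describes concrete, here is a small cycle-by-cycle simulation of an output-stationary systolic array doing a dense matrix multiplication. It is a behavioral sketch under assumed conventions (skewed operand streaming, one MAC per PE per cycle), not a model of any particular accelerator.

```python
import numpy as np

def systolic_matmul(A, B):
    """Behavioral sketch of an output-stationary systolic array.

    A is M x K, B is K x N. PE (i, j) accumulates output element C[i, j];
    rows of A stream in from the left and columns of B from the top, each
    skewed by one cycle per row/column so that operands A[i, k] and B[k, j]
    meet at PE (i, j) on cycle t = i + j + k.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N))
    total_cycles = M + N + K - 2  # cycles until the last wavefront drains
    for t in range(total_cycles):
        for i in range(M):
            for j in range(N):
                k = t - i - j  # operand index reaching PE (i, j) this cycle
                if 0 <= k < K:
                    C[i, j] += A[i, k] * B[k, j]
    return C
```

The diagonal wavefront (`t = i + j + k`) is why every PE performs useful work on almost every cycle once the array fills, which is the source of the efficiency the abstract mentions.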