Quantization

DeepGEMM: Accelerated Ultra Low-Precision Inference on CPU Architectures using Lookup Tables

A lot of recent progress has been made in ultra lowbit quantization, promising significant improvements in latency, memory footprint and energy consumption on edge devices. Quantization methods such as Learned Step Size Quantization can achieve model …

Inter-Operability of Compression Techniques for Efficient Deployment of CNNs on Microcontrollers

Machine Learning (ML) has become state of the art for various tasks, including classification of accelerometer data. In the world of Internet of Things (IoT), the available hardware with low-power consumption is often microcontrollers. However, one …