I conduct research at the IMT Atlantique in Brest on hardware and software implementations of signal processing and AI algorithms. I teach computer engineering and digital electronics.
My PhD thesis focused on the implementation of polar codes decoders. I proposed the fastest software implementation of the Adaptive SC List decoding algorithm to date. This implementation is integrated in the AFF3CT toolbox to which I actively contribute.
I currently focus on efficient hardware and software implementations of Neural Networks, aiming at low latency and energy efficiency, through multiple industrial collaborations, and in the near future as the coordinator of a JCJC ANR project, ProPruNN.
PhD in Electronics, 2018
PhD in Electronics, 2018
University of Bordeaux
MEng in Embedded Electronics, 2015
Enseirb-Matmeca, Bordeaux INP
Introduced in the late 1980s for generalization purposes, pruning has now become a staple for compressing deep neural networks. Despite many innovations in recent decades, pruning approaches still face core issues that hinder their performance or scalability. Drawing inspiration from early work in the field, and especially the use of weight decay to achieve sparsity, we introduce Selective Weight Decay (SWD), which carries out efficient, continuous pruning throughout training. Our approach, theoretically grounded on Lagrangian smoothing, is versatile and can be applied to multiple tasks, networks, and pruning structures. We show that SWD compares favorably to state-of-the-art approaches, in terms of performance-to-parameters ratio, on the CIFAR-10, Cora, and ImageNet ILSVRC2012 datasets.
Flexibility is one mandatory aspect of channel coding in modern wireless communication systems. Among other things, the channel decoder has to support several code lengths and code rates. This need for flexibility applies to polar codes that are considered for control channels in the future 5G standard. This paper presents a new generic and flexible implementation of a software Successive Cancellation List (SCL) decoder. A large set of parameters can be fine-tuned dynamically without re-compiling the software source code: the code length, the code rate, the frozen bits set, the puncturing patterns, the cyclic redundancy check, the list size, the type of decoding algorithm, the tree-pruning strategy and the data quantization. This generic and flexible SCL decoder enables to explore tradeoffs between throughput, latency and decoding performance. Several optimizations are proposed to achieve a competitive decoding speed despite the constraints induced by the genericity and the flexibility. The resulting polar list decoder is about 4 times faster than a generic software decoder and only 2 times slower than a non-flexible unrolled decoder. Thanks to the flexibility of the decoder, the fully adaptive SCL algorithm can be easily implemented and achieves higher throughput than any other similar decoder in the literature (up to 425 Mb/s on a single processor core for N = 2048 and K = 1723 at 4.5 dB).
In this paper, the first transport triggered architecture (TTA) customized for the decoding of polar codes is proposed. A first version of this programmable processor is optimized for the successive cancellation (SC) decoding of polar codes while a second architecture is further specialized to also support Soft CANcellation (SCAN) decoding. Both architectures were fully validated on FPGA device by prototyping. The first architecture was also synthesized in 28nm ASIC technology. It runs at a frequency of 800 MHz and reaches a throughput of 352 Mbps for a (1024, 512) polar code decoded with the SC algorithm. Compared to previous work, the energy consumption is reduced by one order of magnitude (0.14 nJ / bit) and the throughput is increased fivefold. Compared to an optimized software implementation on a general purpose processor (x86 architecture), the throughput is 37 % higher and the energy consumption is two orders of magnitude lower. TTA can be seen as a way to reduce the gap between programmable and dedicated polar decoders.
Cloud Radio Access Network is foreseen as one of the key features of the future 5G mobile communication standard. In this context, all the baseband processing is intended to be performed on CPUs in order to keep a high level of flexibility. The challenge is then to propose efficient software implementations of baseband processing algorithms that guarantee a sufficient throughput, while limiting the energy consumption. In this paper, as an alternative to general purpose processors, we propose an implementation of an Application Specific Instruction set Processor customized for the Successive Cancellation decoding of polar codes. The resulting software decoder achieves throughputs similar to state-of-the-art ARM processor implementations, while reducing the energy consumption by a factor 10.