Abstract: Optimization of deep learning models for embedded CPUs presents numerous challenges stemming from limited computational resources, memory constraints, thread synchronization overhead, and ...