Abstract: Storing and running inference on deep neural network models is resource-intensive, which limits their deployment on edge devices. Structured pruning methods can reduce the resource requirements for ...