Abstract: Computing-in-memory (CIM) chips have demonstrated promising energy efficiency for artificial intelligence (AI) applications such as neural networks (NNs), Transformer, and recommendation ...
Abstract: Structured sparsity enables deploying large language models (LLMs) on resource-constrained systems. Approaches like dense-to-sparse fine-tuning are particularly compelling, achieving ...