Abstract
Over the past decade, deep learning has come to dominate many domains of artificial intelligence, including natural language processing, computer vision, and biomedical signal processing. While model accuracy has improved remarkably, deploying these models on lightweight devices such as mobile phones and microcontrollers is constrained by their limited resources. In this survey, we provide comprehensive design guidance tailored to such devices, covering the design of lightweight model architectures, compression methods, and hardware acceleration strategies. The principal goal of this work is to survey methods and concepts for overcoming hardware constraints without compromising model accuracy. We also explore two notable directions for the future of lightweight deep learning: deployment techniques for TinyML and for Large Language Models. Although both directions undoubtedly hold potential, they also present significant challenges that invite research into as-yet-unexplored areas.