Research Details

Title: A Novel Approach to Transformer-in-Transformer Network for Knowledge Distillation
Research type: Peer review and supervision of research activities
Keywords: Knowledge Distillation, Vision Transformer, Attention Mechanism, Image Classification
Abstract: This paper presents a novel knowledge distillation architecture that leverages efficient transformer networks for effective image classification. Recently, the vision transformer model has demonstrated its potential in many tasks, but training a transformer network is time-consuming. Moreover, when transformers meet knowledge distillation, the student network does not provide satisfactory results on image classification problems. As a trade-off between performance and training time, we propose a transformer-in-transformer student network for knowledge distillation in which the teacher network is VGG16. The proposed network uses transformer modules at both the input and output of the student model in order to learn strong representations. The transformer-in-transformer network is lightweight due to its distillation process in the feature extraction layer. Substantial experiments on MNIST, CIFAR-10, and CIFAR-100 show the strength of the proposed network in top-1 and top-5 accuracy. Extensive ablative analysis shows that the selected parameters and settings are effective against other state-of-the-art methods.
Researchers: سید علیرضا بشیری موسوی (reviewer)
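The abstract describes distilling knowledge from a VGG16 teacher into a transformer-based student, but does not specify the distillation objective. A minimal sketch of the standard temperature-scaled distillation loss (as in Hinton et al.'s original formulation) that such a teacher–student setup would typically minimize, in plain Python, is shown below; the function names and the temperature value are illustrative assumptions, not details from the paper:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T yields a softer distribution.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, T=4.0):
    # KL divergence from the softened teacher distribution to the softened
    # student distribution, scaled by T^2 to keep gradient magnitudes
    # comparable across temperatures (hypothetical settings for illustration).
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return (T ** 2) * sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Identical student and teacher logits give zero distillation loss.
print(kd_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))  # 0.0
```

In practice this term is combined with the ordinary cross-entropy loss on the ground-truth labels, weighted by a mixing coefficient.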