Image recognition using multiple Vision Transformers in parallel having different patch sizes

Authors

  • A. M. Hafiz

DOI:

https://doi.org/10.17762/msea.v71i4.615

Abstract

With the advent of Transformers, which are attention-based mechanisms, many research directions have emerged. Their prowess in natural language processing tasks is well known, and their extension to computer vision is natural. Recently, Vision Transformers (ViTs) have achieved very good results on popular image recognition datasets. However, training Transformers is difficult due to the large computational resources required. Parallel processing is a well-known phenomenon in Nature's most efficient data processors. Inspired by this, I use a novel technique in which multiple ViTs with different patch sizes are run in parallel, and their probability vectors are averaged for final classification. Using medium-sized ViTs, I show that state-of-the-art results are achieved on popular datasets without resorting to huge model scales.
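The following is a minimal sketch of the parallel-ViT averaging scheme described above, assuming PyTorch and the timm library; the model names, input resolution, and pretrained weights are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F
import timm

# Two ViTs that differ only in patch size (16 vs. 32); these timm model
# names are assumptions standing in for the paper's actual variants.
model_names = ["vit_base_patch16_224", "vit_base_patch32_224"]
models = [timm.create_model(n, pretrained=True).eval() for n in model_names]

@torch.no_grad()
def ensemble_predict(images: torch.Tensor) -> torch.Tensor:
    """Run all ViTs in parallel on the same batch and average their
    softmax probability vectors.

    images: (N, 3, 224, 224) batch, already normalized.
    Returns: (N, num_classes) averaged probabilities.
    """
    probs = [F.softmax(m(images), dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0)

# Usage: the final label is the argmax of the averaged probabilities.
# batch = ...  # preprocessed image batch
# preds = ensemble_predict(batch).argmax(dim=-1)
```

Averaging the probability vectors (rather than logits) matches the combination rule stated in the abstract; each member model can be trained independently, which keeps the per-model compute at medium scale.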

Published

2022-08-29

How to Cite

A. M. Hafiz. (2022). Image recognition using multiple Vision Transformers in parallel having different patch sizes. Mathematical Statistician and Engineering Applications, 71(4), 1183–1194. https://doi.org/10.17762/msea.v71i4.615
