Efficient Transformers

Convolutional Neural Networks have been a boon to the computer vision community. Deep learning from high-bandwidth image/video datasets can be computationally and statistically much more efficient using the inductive bias of strong locality. This streamlines inference over big datasets or on resource-limited hardware. To model sequential dependence in short sequences of low-dimensional data, we have often used LSTMs. However, researchers have recently found success adapting Transformer architectures to learn from image and video, both applications traditionally dominated by CNNs....

 · 7 min · Terry Rodriguez & Salma Mayorquin