Transformer (deep learning architecture)
Machine learning algorithm used for natural-language processing
A transformer is a deep learning architecture developed by Google and based on the multi-head attention mechanism, proposed in the 2017 paper "Attention Is All You Need".[1] Text is converted to numerical representations called tokens, and each token is converted into a vector by looking it up in a word embedding table.[1] At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and that of less important tokens to be diminished. The transformer architecture builds on the softmax-based attention mechanism proposed by Bahdanau et al. in 2014 for machine translation,[2][3] and on the Fast Weight Controller, proposed in 1992, which is similar to a transformer.[4][5][6]
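The per-layer contextualization described above can be illustrated with a minimal sketch of masked multi-head self-attention. The sketch below uses NumPy only; the function names, shapes, and toy weights are illustrative assumptions, not the paper's reference implementation or any library's API.

```python
# Minimal sketch of masked multi-head self-attention (illustrative, not the
# original implementation). Shapes: X is (seq_len, d_model); each projection
# matrix is (d_model, d_model); d_model must be divisible by num_heads.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads, mask=None):
    seq_len, d_model = X.shape
    d_head = d_model // num_heads

    # Project token vectors to queries, keys, values, then split into heads.
    def split(M):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return M.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)

    # Scaled dot-product attention, computed for all heads in parallel.
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # masked positions get ~zero weight
    weights = softmax(scores, axis=-1)         # important tokens get larger weights

    # Weighted sum of values; concatenate heads and apply the output projection.
    out = (weights @ V).transpose(1, 0, 2).reshape(seq_len, d_model)
    return out @ Wo

# Toy usage: 4 tokens, d_model = 8, 2 heads, with a causal (look-back-only) mask.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                    # stand-in for embedded tokens
Wq, Wk, Wv, Wo = (rng.normal(size=(8, 8)) for _ in range(4))
causal = np.tril(np.ones((4, 4), dtype=bool))  # each token sees itself and earlier tokens
print(multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads=2, mask=causal).shape)  # (4, 8)
```

Because every token's attention weights are computed in one batched matrix product rather than one step at a time, the whole sequence can be processed in parallel, which is the source of the training-time advantage over recurrent architectures noted below.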
Transformers have the advantage of having no recurrent units, and thus require less training time than earlier recurrent neural architectures such as long short-term memory (LSTM).[7] Later variants have been widely adopted for training large language models (LLMs) on large (language) datasets, such as the Wikipedia corpus and Common Crawl.[8]
This architecture is now used not only in natural language processing and computer vision,[10] but also in audio[11] and multi-modal processing. It has also led to the development of pre-trained systems, such as generative pre-trained transformers (GPTs)[12] and BERT[13] (Bidirectional Encoder Representations from Transformers).