
Transformer


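Architecture introduced in the paper "Attention Is All You Need" (Vaswani et al., 2017)

Encoder-decoder structure built on self-attention

No RNN → all positions processed in parallel → faster training

Basis for GPT (decoder-only), BERT (encoder-only), T5 (encoder-decoder)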
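
Rough sketch of self-attention, to make the notes above concrete. The paper's formula is Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. This toy version assumes Q = K = V = the raw token vectors (real layers first apply learned projections, omitted here), and the names `softmax`, `self_attention`, and `tokens` are made up for illustration. Pure Python, no libraries.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    x is a list of token vectors (lists of floats). Assumption: the
    learned projections W_Q, W_K, W_V are skipped to keep this short.
    """
    d_k = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Score this position against every position, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in x
        ]
        w = softmax(scores)
        # Output is the attention-weighted average of all value vectors.
        out = [sum(w[j] * x[j][i] for j in range(len(x))) for i in range(d_k)]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
```

Each pass through the `for q in x` loop reads only the input `x`, never a previous position's output, so the positions could all be computed at once. An RNN cannot do this, because each step needs the hidden state from the step before.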



#ml-notes
