Transformer
Architecture from the "Attention Is All You Need" paper (Vaswani et al., 2017)
Encoder-decoder structure built on self-attention
No RNN → tokens processed in parallel → faster training
Basis for GPT (decoder-only), BERT (encoder-only), T5 (encoder-decoder)
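Rough sketch of the self-attention step, to make the "no RNN → parallel" point concrete. Assumptions (not from the paper's full setup): a single head, no masking, and identity projections (Q = K = V = x) instead of learned weight matrices. The names `softmax`, `self_attention`, and `tokens` are illustrative only.

```python
import math

def softmax(row):
    # Subtract the max for numerical stability before exponentiating.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: identity projections (Q = K = V = x); a real Transformer
    layer would first multiply x by learned query/key/value matrices.
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Score this token's query against every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([sum(wj * v[i] for wj, v in zip(w, x)) for i in range(d)])
    return outputs, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(tokens)
```

Each pass of the outer loop reads only the fixed input `x`, never the output for another position, so all positions could be computed at once (in practice as one matrix product). An RNN cannot do this because step t needs the hidden state from step t-1.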
#ml-notes