Transformers

Attention-based architecture without recurrence

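→ Largely replaced recurrent-neural-network models in NLP
→ Self-attention relates every token to every other token, so the entire sequence is processed in parallel rather than one step at a time
→ Foundation for GPT-3, BERT, and T5
→ Positional encoding injects token order, which self-attention alone does not capture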
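The parallel self-attention step can be sketched as single-head scaled dot-product attention, softmax(QKᵀ / √d_k)·V, which is the formulation used in the original Transformer paper. This is a minimal pure-Python sketch under stated assumptions: one head, no masking, no batching, and hypothetical helper names (`self_attention`, `matmul`, `softmax`). Real implementations use tensor libraries and learned projection matrices.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply an (n x k) matrix by a (k x m) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention (hypothetical helper).

    x holds one row per token (seq_len x d_model). Every output row is computed
    from the whole sequence at once; there is no recurrence over positions.
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Similarity of every query with every key, scaled by sqrt(d_k).
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    # Each row of weights sums to 1 and mixes the value vectors.
    weights = [softmax(row) for row in scaled]
    return matmul(weights, v)


if __name__ == "__main__":
    # Identity projections are an illustrative assumption; real models learn them.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    out = self_attention(tokens, identity, identity, identity)
    print(len(out), len(out[0]))  # 3 2
```

Reversing the input tokens only reverses the output rows. Self-attention has no built-in notion of order, which is why Transformers add positional information to the token embeddings.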

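One way to supply token order is the fixed sinusoidal encoding from the original Transformer paper: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), added element-wise to the token embeddings. This is a minimal sketch with hypothetical function names. Many later models use learned or relative position schemes instead.

```python
import math
from typing import List

Matrix = List[List[float]]


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> Matrix:
    """Fixed sinusoidal position table (hypothetical helper name).

    Even dimensions use sine and odd dimensions use cosine. Each sine/cosine
    pair shares the frequency 1 / 10000^(2i / d_model).
    """
    table = []
    for pos in range(seq_len):
        row = []
        for dim in range(d_model):
            i = dim // 2  # index of the sine/cosine pair
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def add_positions(embeddings: Matrix, pe: Matrix) -> Matrix:
    """Add position vectors to token embeddings element-wise."""
    return [[e + p for e, p in zip(emb_row, pe_row)]
            for emb_row, pe_row in zip(embeddings, pe)]


if __name__ == "__main__":
    pe = sinusoidal_positional_encoding(4, 6)
    print(pe[0])  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```

After the position vectors are added, the same token at two different positions produces different inputs to self-attention. This lets the model distinguish word order.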
