Transformers
Attention-based architecture without recurrence
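→ Largely replaced recurrent neural networks (RNNs) in NLP
→ Self-attention processes the entire sequence in parallel instead of token by token
→ Foundation for GPT-3, BERT, T5
→ Key component: positional encoding, which injects token order because self-attention by itself is order-agnostic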
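A minimal pure-Python sketch of single-head scaled dot-product self-attention may make the "entire sequence in parallel" point concrete. It assumes small hand-written projection matrices, and the names `self_attention`, `w_q`, `w_k`, `w_v` are illustrative only, not any library's API. Real implementations use multiple heads, batching and tensor libraries. The Python loops here run sequentially. The point is structural: no position's output depends on a previous step's output, so the matrix products can run in parallel on hardware.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Row-by-column product: a is (n x k), b is (k x m), result is (n x m).
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def softmax(row: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix, w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention (illustrative sketch).

    x is (seq_len x d_model); each projection matrix is (d_model x d_k).
    Every position is compared with every other position at once; there is
    no recurrence over time steps.
    """
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    v = matmul(x, w_v)
    d_k = len(w_k[0])
    # scores[i][j]: how strongly position i attends to position j.
    scores = matmul(q, transpose(k))
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
    weights = [softmax(row) for row in scaled]
    # Each output row is a weighted average of the value vectors.
    return matmul(weights, v)
```

Permuting the input rows only permutes the output rows, which is exactly why a separate signal for token order is needed.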
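One common positional-encoding scheme is the fixed sinusoidal table from the original Transformer paper: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). Other models, such as BERT, learn position embeddings instead. The sketch below assumes the sinusoidal variant, and the helper names `sinusoidal_positional_encoding` and `add_positions` are hypothetical.

```python
import math
from typing import List


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> List[List[float]]:
    """Return a (seq_len x d_model) table of fixed sinusoidal position vectors.

    Even dimensions use sine and odd dimensions use cosine; each sin/cos pair
    shares a frequency that falls geometrically with the dimension index.
    """
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        # i is the even dimension index (2i in the paper's notation).
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def add_positions(embeddings: List[List[float]]) -> List[List[float]]:
    # Token embeddings and position vectors are summed element-wise, so the
    # same token at different positions produces different attention inputs.
    pe = sinusoidal_positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(row, prow)] for row, prow in zip(embeddings, pe)]
```

Adding the table instead of concatenating it keeps the model width at d_model; three identical token embeddings become three distinct vectors after `add_positions`.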