How did transformers get this good?

Scale (175B parameters) and RLHF (reinforcement learning from human feedback).

RLHF

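1. HF part: collect human preference judgments on model outputs and train a reward model on them

2. RL part: fine-tune the language model with reinforcement learning to score well under that reward model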
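
A minimal sketch of the two parts, under stated assumptions rather than as the recipe of any particular system. For the HF part it assumes a pairwise preference loss, -log(sigmoid(r_chosen - r_rejected)), which is one common way to train a reward model. For the RL part it assumes the reward model's score is reduced by a penalty for drifting away from the original (reference) model. The function names and the default beta are hypothetical, chosen only for illustration.

```python
import math


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """HF part (assumed form): pairwise loss -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    answer above the rejected one, and large when it ranks them the wrong way.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch to avoid overflow in exp().
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def kl_penalized_reward(
    reward: float,
    logprob_policy: float,
    logprob_reference: float,
    beta: float = 0.1,  # hypothetical penalty weight
) -> float:
    """RL part (assumed form): reward model score minus a drift penalty.

    The penalty grows when the fine-tuned policy assigns a sample a much
    higher log-probability than the reference model does.
    """
    return reward - beta * (logprob_policy - logprob_reference)


if __name__ == "__main__":
    # Reward model agrees with the human ranking: low loss.
    print(preference_loss(2.0, 0.0))
    # Reward model disagrees with the human ranking: high loss.
    print(preference_loss(0.0, 2.0))
    # Same reward model score, but the policy has drifted from the reference.
    print(kl_penalized_reward(1.0, logprob_policy=-0.5, logprob_reference=-1.0))
```

In a real pipeline both quantities would be computed over batches of tensors and the second would feed an RL algorithm. The scalar versions here only show what each part optimizes.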

[1] Lambert, et al., "Illustrating Reinforcement Learning from Human Feedback (RLHF)", Hugging Face Blog, 2022.