Exploring Mixture of Experts Architectures

Writing

Mixture of Experts (MoE) is how the industry is scaling LLMs beyond the limits of dense transformers. These notes study the key architectures and their implementation details.

Reading List

Key Insights