Speeding Up Transformers MoE Model Fine-Tuning with One Line of Code

Summary

NVIDIA NeMo AutoModel is an open-source library built on Transformers v5 that adds Expert Parallelism, DeepEP fused all-to-all dispatch, and TransformerEngine kernels. In mixture-of-experts (MoE) model fine-tuning...

Details

**Fine-tuning with NVIDIA NeMo AutoModel** has been drawing a lot of attention in the AI community recently.

What Happened

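NVIDIA NeMo AutoModel is an open-source library built on Transformers v5 that adds Expert Parallelism (EP), DeepEP fused all-to-all dispatch, and TransformerEngine kernels. When fine-tuning mixture-of-experts (MoE) models, it delivers 3.4 to 3.7x higher training throughput than stock v5 and uses 29 to 32% less GPU memory, and switching requires changing only one import line. For full fine-tuning of Nemotron 3 Ultra 550B A55B on 16 nodes with 128 H100 GPUs, v5 cannot run because it runs out of memory, while AutoModel makes training feasible through expert parallelism with EP=64. Single-node 30B MoE models such as Qwen3-30B-A3B also see measurable performance gains.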
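To make "EP=64 on 128 GPUs" more concrete, here is a minimal pure-Python sketch of one plausible expert placement. It assumes contiguous rank numbering (ranks 0 to 63 form one EP group, ranks 64 to 127 the other) and a hypothetical expert count of 512; neither detail comes from NeMo AutoModel or the Nemotron model card, and `expert_placement` is an illustrative helper, not a library function.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    ep_group: int   # which expert-parallel group this rank belongs to
    ep_rank: int    # position of this rank inside its EP group
    experts: range  # global expert IDs whose weights live on this rank


def expert_placement(rank: int, world_size: int, ep_size: int, num_experts: int) -> Placement:
    """Hypothetical layout: consecutive ranks form an EP group; experts are split evenly."""
    if not 0 <= rank < world_size:
        raise ValueError("rank out of range")
    if world_size % ep_size != 0 or num_experts % ep_size != 0:
        raise ValueError("world_size and num_experts must be divisible by ep_size")
    per_rank = num_experts // ep_size
    ep_group, ep_rank = divmod(rank, ep_size)
    start = ep_rank * per_rank
    return Placement(ep_group, ep_rank, range(start, start + per_rank))


if __name__ == "__main__":
    WORLD_SIZE, EP_SIZE = 128, 64  # from the article: 16 nodes x 8 H100s, EP=64
    NUM_EXPERTS = 512              # hypothetical expert count, for illustration only
    for r in (0, 63, 64, 127):
        p = expert_placement(r, WORLD_SIZE, EP_SIZE, NUM_EXPERTS)
        print(f"rank {r}: EP group {p.ep_group}, EP rank {p.ep_rank}, "
              f"experts {p.experts.start}-{p.experts.stop - 1}")
```

Under this assumed layout each rank stores only 1/64 of the expert weights, every expert lives on exactly two GPUs (one per EP group), and the two EP groups act as data-parallel replicas of each other.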
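The all-to-all dispatch step in MoE training can be illustrated with a simplified, unfused simulation. Each rank routes its tokens to their top-k experts, and the resulting (token, expert) pairs are bucketed by the rank that owns each expert; the `send_counts` matrix is what an all-to-all collective would exchange. This is a sketch under assumed contiguous expert sharding and plain top-k routing with toy sizes, not DeepEP's implementation, which fuses and optimizes this communication. All function names are hypothetical.

```python
import random


def owner_rank(expert_id: int, experts_per_rank: int) -> int:
    # Assumed contiguous sharding: EP rank i holds experts [i*E, (i+1)*E).
    return expert_id // experts_per_rank


def top_k(scores: list[float], k: int) -> list[int]:
    # Indices of the k highest router scores for a single token.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def build_dispatch(router_scores, k: int, experts_per_rank: int, ep_size: int):
    """Bucket (token, expert) pairs by destination rank.

    router_scores[src][tok] holds the router scores of token `tok` on rank `src`.
    Returns (send_counts, send_buffers): send_counts[src][dst] is how many pairs
    rank src sends to rank dst, and send_buffers[src][dst] lists those pairs.
    """
    send_counts = [[0] * ep_size for _ in range(ep_size)]
    send_buffers = [[[] for _ in range(ep_size)] for _ in range(ep_size)]
    for src, tokens in enumerate(router_scores):
        for tok, scores in enumerate(tokens):
            for expert in top_k(scores, k):
                dst = owner_rank(expert, experts_per_rank)
                send_counts[src][dst] += 1
                send_buffers[src][dst].append((tok, expert))
    return send_counts, send_buffers


if __name__ == "__main__":
    EP_SIZE, EXPERTS_PER_RANK, TOKENS_PER_RANK, TOP_K = 4, 4, 8, 2  # toy sizes
    rng = random.Random(0)
    num_experts = EP_SIZE * EXPERTS_PER_RANK
    scores = [[[rng.random() for _ in range(num_experts)] for _ in range(TOKENS_PER_RANK)]
              for _ in range(EP_SIZE)]
    counts, _ = build_dispatch(scores, TOP_K, EXPERTS_PER_RANK, EP_SIZE)
    for src, row in enumerate(counts):
        print(f"rank {src} sends {row}")  # each row sums to TOKENS_PER_RANK * TOP_K
```

Column sums of `send_counts` give how many pairs each rank receives for its local experts. This naive version sends one copy per (token, expert) pair; optimized dispatch kernels may instead send each token at most once per destination rank.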
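The reported ratios can be translated into wall-clock time and memory with simple arithmetic. The 10-hour and 60 GB baselines below are hypothetical numbers chosen only to illustrate the calculation, and training time is assumed to scale inversely with throughput.

```python
def time_after_speedup(baseline_hours: float, speedup: float) -> float:
    # The same number of training tokens at `speedup`x throughput takes 1/speedup as long.
    return baseline_hours / speedup


def memory_after_reduction(baseline_gb: float, reduction: float) -> float:
    # A fractional reduction r leaves (1 - r) of the original GPU memory in use.
    return baseline_gb * (1.0 - reduction)


if __name__ == "__main__":
    BASELINE_HOURS = 10.0  # hypothetical baseline job length
    BASELINE_GB = 60.0     # hypothetical baseline per-GPU memory use
    for speedup in (3.4, 3.7):  # throughput range from the article
        print(f"{speedup}x throughput: {time_after_speedup(BASELINE_HOURS, speedup):.2f} h")
    for reduction in (0.29, 0.32):  # memory-reduction range from the article
        print(f"{reduction:.0%} less memory: "
              f"{memory_after_reduction(BASELINE_GB, reduction):.1f} GB")
```

Put differently, a 3.4 to 3.7x throughput gain cuts training time to roughly 27 to 29% of the baseline, under the stated assumption.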
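A rough estimate shows why a 550B-parameter model needs its weights and optimizer state sharded before it can be fully fine-tuned at all. The sketch assumes mixed-precision Adam at about 16 bytes per parameter, ideal even sharding across all 128 GPUs, and the 80 GB H100 variant, and it ignores activations and communication buffers. It is a back-of-envelope calculation, not a measurement of either Transformers v5 or AutoModel.

```python
# Assumption: bf16 weights (2 B) + bf16 grads (2 B) + fp32 master weights (4 B)
# + fp32 Adam first and second moments (4 B + 4 B) = 16 bytes per parameter.
BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4
H100_MEMORY_GB = 80  # assumes the 80 GB H100 variant


def per_gpu_state_gb(num_params: float, num_gpus: int,
                     bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Model + optimizer state per GPU (decimal GB) under ideal, even sharding."""
    return num_params * bytes_per_param / num_gpus / 1e9


if __name__ == "__main__":
    total_params = 550e9  # "550B" in Nemotron 3 Ultra 550B A55B
    print(f"unsharded: {per_gpu_state_gb(total_params, 1):,.0f} GB")
    print(f"ideally sharded over 128 GPUs: {per_gpu_state_gb(total_params, 128):.2f} GB "
          f"per GPU (of {H100_MEMORY_GB} GB)")
```

Unsharded, these states alone would need about 8,800 GB; even spread perfectly over 128 GPUs they occupy about 68.75 GB of each 80 GB card before any activations, which illustrates why a sharding scheme such as EP=64 is needed to make the run fit.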

Why It Matters

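Technically, the gains come from combining expert parallelism, fused all-to-all dispatch, and optimized kernels behind a one-line import change. The practical significance is that MoE models which previously could not be fully fine-tuned on a given cluster, such as the 550B Nemotron model on 128 H100s, become trainable, and smaller MoE models train faster on the same hardware.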

What This Means for Everyday Users

Although this is highly technical, it could eventually reach the products we use every day: faster, lower-memory fine-tuning reduces the cost of customizing large MoE models, which may make the AI assistants and tools built on them more capable or cheaper.

Industry Reaction

So far, the industry's reaction appears largely positive. Many experts see this as the right direction and worth following, while others remain cautious and want more time to validate the results.

Conclusion

Overall, this is a development worth watching. How it will ultimately play out is still uncertain, but the direction looks right.

For everyday users, there is no need to worry too much or to get too excited. Keep an eye on it, use the tools that serve you, and learn what is worth learning; that is the sensible attitude.

Source: Hugging Face Blog (RSS)

Updated: 2026-06-25