Paper Reading

Notes in Chinese on the core question, line of method, and assessment of results for important papers. Few and clear, for easy review.

Recently read: Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories. Lin et al., Oxford / Stanford, 2026

LLM

Large Language Models

4 papers
  1. Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

    Liang et al., Stanford / Meta, 2025

    MoT splits a Transformer's non-embedding parameters by modality while keeping global self-attention, so multimodal pretraining reaches dense-baseline quality with fewer FLOPs.

  2. Training Compute-Optimal Large Language Models

    Hoffmann et al., DeepMind, 2022

    Chinchilla re-estimates compute-optimal scaling: model parameters and training tokens should grow in roughly equal proportion.

  3. Scaling Laws for Neural Language Models

    Kaplan et al., OpenAI, 2020

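    Language-model loss falls as a power law in parameter count, dataset size, and compute; under a fixed compute budget, prioritize a larger model and stop training early.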

  4. Outrageously Large Neural Networks

    Shazeer et al., Google Brain, 2017

    Sparsely gated MoE gives a model a very large total parameter count while activating only a few experts per input.
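
The sparse-gating idea in the last entry can be illustrated with a minimal sketch. It is a simplification and not the paper's exact layer: it omits the noise term and load-balancing losses, takes the gate logits as a given input instead of computing them from a learned gating network, and uses scalar "experts". The names `top_k_gate` and `moe_forward` are hypothetical, chosen only for this illustration.

```python
import math


def top_k_gate(logits, k):
    """Softmax over the k largest logits; every other expert gets weight 0."""
    # Indices of the k largest logits (assumed illustration, no noise term).
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits[i] for i in top)
    exps = {i: math.exp(logits[i] - m) for i in top}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


def moe_forward(x, experts, gate_logits, k=2):
    """Weighted sum of expert outputs, evaluating only the selected experts."""
    weights = top_k_gate(gate_logits, k)
    out = 0.0
    for w, expert in zip(weights, experts):
        if w > 0.0:  # experts with zero gate weight are never called
            out += w * expert(x)
    return out
```

With four experts and `k=2`, only two expert functions run per input, while all four still count toward the model's total parameters. That is the trade-off the entry describes.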