MMAU‑Pro

A Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence

Authors

Examples

Overview of the MMAU-Pro benchmark. MMAU-Pro provides comprehensive coverage across all three core audio domains (speech, sound, and music) and extends evaluation to their mixtures. It further includes multi-audio reasoning, long-form audio (up to 10 minutes), voice-chat QA, spatial audio understanding, open-ended QA, and multimodal instruction following, offering a broad and realistic assessment of audio intelligence.

Abstract

Highlights

    Skill Coverage

    (Left) Distribution of the audio perception skills required by questions in MMAU-Pro across the domains of sound, speech, and music. (Right) Distribution of the auditory reasoning skills required by questions in MMAU-Pro. Each question requires the model to apply one or more of these perception and reasoning skills to produce a reliable and accurate response.

    Skills overview diagram

    What the skills test

    • Speech: ASR plus reasoning (semantics, coreference, intent).
    • Sound: non‑speech events; causal and physical reasoning.
    • Music: instruments, rhythm, theory descriptors.
    • Spatial: binaural cues, relative positions, motion.
    • Multi‑audio: mixture attribution, stream segregation.
    • Voice‑chat: persona, prosody, multi‑turn QA.
    • Instruction following: constrained multi‑step tasks.

    At a Glance

    Breakdown

    Model Performance — Leaderboard

    Sorted by overall average (descending)

    # | Model | Size | Average (%)

    Medals: 🥇 🥈 🥉 for top 3 (excluding Human/Random baselines).
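The ordering and medal rule described above can be sketched as follows. This is a minimal illustration, not the page's actual code: the row fields (`model`, `size`, `average`), the baseline names in `BASELINES`, and the helper `rank_leaderboard` are all assumptions, since the schema of `results.json` is not shown here. The example rows are placeholders, not real results.

```python
from typing import Dict, List

MEDALS = ["🥇", "🥈", "🥉"]
# Assumed baseline labels; the real results file may name them differently.
BASELINES = {"Human", "Random"}


def rank_leaderboard(rows: List[Dict]) -> List[Dict]:
    """Sort rows by overall average (descending) and attach medals.

    Baseline rows keep their position in the sorted table but never
    receive a medal, so the top three non-baseline models get them.
    """
    ordered = sorted(rows, key=lambda r: r["average"], reverse=True)
    ranked = []
    next_medal = 0
    for position, row in enumerate(ordered, start=1):
        medal = ""
        if row["model"] not in BASELINES and next_medal < len(MEDALS):
            medal = MEDALS[next_medal]
            next_medal += 1
        ranked.append({**row, "rank": position, "medal": medal})
    return ranked


# Placeholder rows for illustration only (hypothetical models and scores).
example_rows = [
    {"model": "model-c", "size": "7B", "average": 50.0},
    {"model": "Random", "size": "-", "average": 25.0},
    {"model": "model-a", "size": "-", "average": 60.0},
    {"model": "Human", "size": "-", "average": 80.0},
    {"model": "model-d", "size": "3B", "average": 40.0},
    {"model": "model-b", "size": "13B", "average": 55.0},
]

if __name__ == "__main__":
    for row in rank_leaderboard(example_rows):
        print(row["rank"], row["medal"], row["model"], row["size"], row["average"])
```

Whether baseline rows consume a rank number is also an assumption; here they do, and only the medals skip them.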

    Full Results Table

    Data source: results.json

    BibTeX

    @article{kumar2025mmau,
      title={MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence},
      author={Kumar, Sonal and Sedl{\'a}{\v{c}}ek, {\v{S}}imon and Lokegaonkar, Vaibhavi and L{\'o}pez, Fernando and Yu, Wenyi and Anand, Nishit and Ryu, Hyeonggon and Chen, Lichang and Pli{\v{c}}ka, Maxim and Hlav{\'a}{\v{c}}ek, Miroslav and others},
      journal={arXiv preprint arXiv:2508.13992},
      year={2025}
    }

    Links & Contact