MMAU‑Pro

Overview of the MMAU-Pro benchmark. MMAU-Pro provides comprehensive coverage across all three core audio domains (speech, sound, and music) and extends evaluation to their mixtures. It further includes multi-audio reasoning, long-form audio (up to 10 minutes), voice-chat QA, spatial audio understanding, open-ended QA, and multimodal instruction following, offering a broad and realistic assessment of audio intelligence.

Abstract

Highlights

Skill Coverage

(Left) Distribution of the audio perception skills required by questions in MMAU-Pro across the sound, speech, and music domains. (Right) Distribution of the auditory reasoning skills required by questions in MMAU-Pro. Each question requires the model to apply one or more of these perception and reasoning skills to produce a reliable and accurate response.

What the skills test

Speech: ASR‑plus reasoning (semantics, coreference, intent).
Sound: non‑speech events; causal and physical reasoning.
Music: instruments, rhythm, theory descriptors.
Spatial: binaural cues, relative positions, motion.
Multi‑audio: mixture attribution, stream segregation.
Voice‑chat: persona, prosody, multi‑turn QA.
Instruction following: constrained multi‑step tasks.

At a Glance

Breakdown

Model Performance — Leaderboard

Sorted by overall average (desc)


#	Model	Size	Average (%)

Medals: 🥇 🥈 🥉 for top 3 (excluding Human/Random baselines).
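The ordering and medal rule described above can be sketched in a few lines of Python. This is only an illustration, not the page's actual code: the record fields `model` and `average`, the baseline labels `Human` and `Random`, and the function name `rank_leaderboard` are all assumptions, and the rows in the example are placeholders rather than real benchmark results.

```python
# Illustrative sketch: sort by overall average (descending) and award
# medals to the top three entries, skipping baseline rows.
# Field names and baseline labels are assumptions, not the real schema.

BASELINES = {"human", "random"}  # assumed labels for the baseline rows
MEDALS = ["🥇", "🥈", "🥉"]


def rank_leaderboard(rows):
    """Return rows sorted by 'average' (desc) with 'rank' and 'medal' added."""
    ordered = sorted(rows, key=lambda r: r["average"], reverse=True)
    ranked = []
    medals_given = 0
    for position, row in enumerate(ordered, start=1):
        is_baseline = row["model"].strip().lower() in BASELINES
        medal = ""
        # Baselines keep their place in the ordering but never take a medal.
        if not is_baseline and medals_given < len(MEDALS):
            medal = MEDALS[medals_given]
            medals_given += 1
        ranked.append({**row, "rank": position, "medal": medal})
    return ranked


if __name__ == "__main__":
    # Placeholder rows for demonstration only; not actual MMAU-Pro scores.
    demo_rows = [
        {"model": "Human", "average": 90.0},
        {"model": "model-a", "average": 50.0},
        {"model": "model-b", "average": 60.0},
        {"model": "Random", "average": 25.0},
    ]
    for entry in rank_leaderboard(demo_rows):
        print(entry["rank"], entry["medal"], entry["model"], entry["average"])
```

Keeping baselines in the sorted list while excluding them from medals means they still serve as reference points in the table without displacing a model from the top three.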

Full Results Table

Data source: results.json


BibTeX

@article{kumar2025mmau,
  title={MMAU-Pro: A Challenging and Comprehensive Benchmark for Holistic Evaluation of Audio General Intelligence},
  author={Kumar, Sonal and Sedl{\'a}{\v{c}}ek, {\v{S}}imon and Lokegaonkar, Vaibhavi and L{\'o}pez, Fernando and Yu, Wenyi and Anand, Nishit and Ryu, Hyeonggon and Chen, Lichang and Pli{\v{c}}ka, Maxim and Hlav{\'a}{\v{c}}ek, Miroslav and others},
  journal={arXiv preprint arXiv:2508.13992},
  year={2025}
}

Links & Contact

📄 Paper (PDF): Open
🐙 Code: Repository
🔗 Dataset: Landing page
✉️ Corresponding author: sonalkum@umd.edu