Metaphor Abstraction and Reasoning Corpus v2
May 11, 2026
MARC2 is a dataset of abstract reasoning puzzles drawn from ARC-AGI2 where figurative language—metaphors, analogies, and domain-grounded reframings—demonstrably helps AI models solve tasks they cannot crack from examples alone.
The dataset exploits the capability gap between models. Claude Opus 4.6 (82% on ARC-AGI2 training) solves tasks and distills its reasoning into language-complete descriptions. Smaller subject models are then tested under controlled conditions—examples only, language only, and both together—to identify where figurative reframings unlock understanding that neither modality provides on its own.
Each verified puzzle ships with 15 figurative variants drawn from different domains, ranging from music theory to fluid dynamics, enabling fine-grained study of how conceptual framing interacts with model reasoning.
Upper Bound — Claude Opus 4.6
Lower Bound — Subject Models
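A task has the MARC property for a given model when the model fails from examples alone, fails from the figurative description alone, and succeeds when given the figurative description together with the examples.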
This three-way contrast isolates the unique contribution of figurative language—it is neither a substitute for examples nor redundant with them.
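As a minimal sketch, the MARC property can be written as a predicate over three pass/fail outcomes, following the definition in the results-table caption (examples fail, figurative alone fails, figurative plus examples succeed). The `TrialOutcomes` record and its field names are hypothetical and chosen for illustration; they are not part of the dataset's published schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialOutcomes:
    """Hypothetical record: one model's results for one task and one figurative clue."""
    examples_only: bool             # solved from input/output examples alone
    figurative_only: bool           # solved from the figurative description alone
    figurative_plus_examples: bool  # solved when given both together


def has_marc_property(outcomes: TrialOutcomes) -> bool:
    """True when neither modality suffices alone but the combination succeeds."""
    return (
        not outcomes.examples_only
        and not outcomes.figurative_only
        and outcomes.figurative_plus_examples
    )


# Only one of the eight possible outcome patterns qualifies.
print(has_marc_property(TrialOutcomes(False, False, True)))  # True
print(has_marc_property(TrialOutcomes(False, True, True)))   # False
```

A case where the figurative description succeeds on its own is excluded because the language would then be acting as a substitute for the examples rather than as a complement to them.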
| Model | Ex. Only | Lang. Only | Both | MARC Tasks | MARC Clues |
|---|---|---|---|---|---|
| qwen3.6-35b | 15.3% | 35.9% | 31.7% | 126 | 455 |
| gemma-4-26b | 25.8% | 59.3% | 54.6% | 110 | 506 |
| gpt-oss-120b | 25.8% | 58.2% | 51.5% | 104 | 848 |
| qwen3.6-27b | 12.8% | 39.3% | 29.3% | 101 | 358 |
| gemma-4-31b | 37.8% | 68.7% | 68.0% | 96 | 445 |
| qwen3.5-122b | 11.6% | 40.8% | 31.7% | 77 | 322 |
| gpt-oss-20b | 11.0% | 42.1% | 27.1% | 67 | 261 |
| nemotron-3-super | 9.6% | 41.0% | 35.5% | 57 | 235 |
Baseline accuracy on 791 validated tasks under three conditions. "MARC Tasks" = unique tasks satisfying the full MARC property (examples fail, figurative alone fails, figurative + examples succeed). "MARC Clues" = total figurative descriptions exhibiting the property across all variants.
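The difference between the two rightmost columns can be illustrated with a short sketch that tallies both counts from per-clue results. The record layout, the `count_marc` helper, and the toy data are all made up for illustration. The sketch also assumes that a task counts as a MARC task when at least one of its figurative variants exhibits the property, which the caption implies but does not state outright.

```python
from typing import Iterable, Tuple

# Hypothetical record layout:
# (task_id, clue_id, examples_only_ok, figurative_only_ok, both_ok)
Record = Tuple[str, str, bool, bool, bool]


def count_marc(records: Iterable[Record]) -> Tuple[int, int]:
    """Return (number of MARC tasks, number of MARC clues) for one model."""
    marc_tasks = set()
    marc_clues = 0
    for task_id, _clue_id, ex_ok, fig_ok, both_ok in records:
        # MARC property: examples fail, figurative alone fails, both succeed.
        if not ex_ok and not fig_ok and both_ok:
            marc_clues += 1
            marc_tasks.add(task_id)  # assumption: one qualifying clue is enough
    return len(marc_tasks), marc_clues


# Toy data, not drawn from the dataset.
toy_records = [
    ("task_a", "music", False, False, True),    # MARC clue
    ("task_a", "fluid", False, False, True),    # MARC clue, same task
    ("task_a", "cooking", False, True, True),   # figurative alone suffices
    ("task_b", "music", True, False, True),     # examples alone suffice
    ("task_c", "music", False, False, False),   # nothing helps
]

print(count_marc(toy_records))  # (1, 2)
```

Because a single task can contribute several qualifying clues, the clue count can be several times the task count, as in the table.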
Featured views use gpt-oss-120b as the primary subject model. Views for all eight subject models are available below.