MIRROR: A Hierarchical Benchmark for Metacognitive Calibration in Large Language Models - Databubble