Model Detail
Reason-ModernColBERT
Rethinking Math Reasoning Evaluation: A Robust LLM-as-a-Judge Framework Beyond Symbolic Rigidity
arXiv:2604.22597v1 Announce Type: new Abstract: Recent advancements in large language models have led to significant improvements across various tasks, including mathematical reasoning, which is used to assess models' intelligence in logical reasoning and problem-solving. Models are evaluated on mat
LLMs as Assessors: Right for the Right Reason?
arXiv:2601.08919v2 Announce Type: replace-cross Abstract: A good deal of recent research has focused on how Large Language Models (LLMs) may be used as judges in place of humans to evaluate the quality of the output produced by various text / image processing systems. Within this broader context, a
Emergent Strategic Reasoning Risks in AI: A Taxonomy-Driven Evaluation Framework
arXiv:2604.22119v1 Announce Type: new Abstract: As reasoning capacity and deployment scope grow in tandem, large language models (LLMs) gain the capacity to engage in behaviors that serve their own objectives, a class of risks we term Emergent Strategic Reasoning Risks (ESRRs). These include, but ar
Sum-of-Checks: Structured Reasoning for Surgical Safety with Large Vision-Language Models
arXiv:2604.22156v1 Announce Type: new Abstract: Purpose: Accurate assessment of the Critical View of Safety (CVS) during laparoscopic cholecystectomy is essential to prevent bile duct injury, a complication associated with significant morbidity and mortality. While large vision-language models (LVLM
Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought
arXiv:2604.22709v1 Announce Type: new Abstract: While long, explicit chains-of-thought (CoT) have proven effective on complex reasoning tasks, they are costly to generate during inference. Non-verbal reasoning methods have emerged with shorter generation lengths by leveraging continuous representati
Contexts are Never Long Enough: Structured Reasoning for Scalable Question Answering over Long Document Sets
arXiv:2604.22294v1 Announce Type: cross Abstract: Real-world document question answering is challenging. Analysts must synthesize evidence across multiple documents and different parts of each document. However, any fixed LLM context window can be exceeded as document collections grow. A common work