DataBubble
arXiv · Published July 1, 2026 at 4:00 AM

BLUEX v2: Benchmarking LLMs on Open-Ended Questions from Brazilian University Entrance Exams

Source: arxiv.org (full article)
Publisher summary (verbatim):

arXiv:2606.22723v2 (Announce Type: replace)

Abstract: Although Large Language Models (LLMs) excel in many tasks, their assessment in Portuguese has received less attention, particularly for open-ended, discursive tasks that demand deeper reasoning and generation capabilities. While the original BLUEX

Originally published on arXiv.