arxiv · Jun 20

DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction

arXiv:2606.18191v2 Announce Type: replace Abstract: Deep research (DR) systems are increasingly used for complex information-seeking tasks, but existing works mainly focus on generating reports and summaries. In contrast, many enterprise tasks instead require an agent to identify concrete workflows

DR1 model · #workflow #benchmark #personalization

arxiv · May 1 · bearish

Claw-Eval-Live: A Live Agent Benchmark for Evolving Real-World Workflows

arXiv:2604.28139v1 Announce Type: cross Abstract: LLM agents are expected to complete end-to-end units of work across software tools, business services, and local workspaces. Yet many agent benchmarks freeze a curated task set at release time and grade mainly the final response, making it difficult

#benchmark #workflow #evaluation

openai · Apr 10 · bullish

Using custom GPTs

Learn how to build and use custom GPTs to automate workflows, maintain consistent outputs, and create purpose-built AI assistants.

OPCU2 models · #customization #productivity #workflow

When Does Multi-Agent RL Improve LLM Workflows? Workflow, Scale, and Policy-Sharing Tradeoffs
