Tag

#open-source

43 articles tagged #open-source

arxiv · 4d ago

Incomplete Prompt Jailbreaks in Large Language Models

arXiv:2607.20473v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly released as open-weight models with safeguards against harmful requests. Nevertheless, sentence completion remains vulnerable to incomplete harmful prompts. In this work, we formalize this phenomenon as inc

#safety #open-source #language-models Read on arxiv →

arxiv · 4d ago

From Static Bibliometrics to Dynamic Knowledge Graphs: An LLM-Powered Framework for Modernizing Science, Technology, and Innovation (STI) Analytics

arXiv:2607.21327v1 Announce Type: cross Abstract: Bibliometric indicators - citation counts, h-indexes, co-authorship networks - have long anchored science, technology, and innovation (STI) analytics, yet suffer from temporal lag, semantic shallowness, and an inability to capture the non-linear dyna

#open-source #collaboration #community Read on arxiv →

arxiv · 5d ago

Are Attributions of Consciousness to AI Chatbots Epistemically Innocent?

arXiv:2607.20001v1 Announce Type: cross Abstract: Artificial intelligence (AI) chatbots (e.g., ChatGPT) can communicate in strikingly humanlike ways. This has prompted many chatbot users to attribute psychological properties, including consciousness, to these systems. However, there is little scient

#open-source #community #collaboration Read on arxiv →

techcrunch · 5d ago

Arcee, a US open source AI lab, says Chinese models are not inherently dangerous

As Chinese AI models grow in capability and popularity among U.S. companies, the arguing over what should be done about them has reached a fever pitch.

KI QW · 2 models #open-source #security #regulation Read on techcrunch →

arxiv · Jul 21 · bullish

Masked Diffusion Language Models are Strong and Steerable Text-Based World Models for Agentic RL

arXiv:2607.16204v1 Announce Type: new Abstract: Recent growth in reinforcement learning (RL) has surfaced a need for diverse, specialized training environments. Hand-curated environments with fixed task and reward difficulties become ineffective signals as model performance improves, and sparse rewa

LL MD LF · 5 models · +2 #reinforcement-learning #world-models #autoregressive-models Read on arxiv →

arxiv · Jul 21 · bullish

Scaling Model and Data for Multilingual Machine Translation with Open Large Language Models

arXiv:2602.11961v3 Announce Type: replace Abstract: Open large language models (LLMs) have demonstrated improving multilingual capabilities in recent years. In this paper, we present a study of open LLMs for multilingual machine translation (MT) across a range of languages, and investigate the effec

MI SE HY · 7 models · +4 #multilingual #machine-translation #open-source Read on arxiv →

arxiv · Jul 18 · bullish

PhasorFlow: A Python Library for Unit Circle Based Computing

arXiv:2603.15886v4 Announce Type: replace-cross Abstract: We present PhasorFlow, an open-source Python library for computing on the $S^1$ unit circle. Inputs are encoded as complex phasors $z=e^{i\phi}$ on the $N$-torus ($\mathbb{T}^N$); as computation proceeds through unitary wave-interference gate

PH VA PH · 4 models · +1 #open-source #machine learning #artificial intelligence Read on arxiv →

huggingface · Jul 16 · bullish

NVIDIA Nemotron 3 Embed Ranks #1 Overall on RTEB, Advancing Agentic Retrieval

NE NE NE · 3 models #retrieval #embeddings #open-source Read on huggingface →

arxiv · Jul 16 · bullish

From Language to Navigation Goals: A Vision-Language Approach for Semantic Navigation of Mobile Robots Using RGB-D Perception

arXiv:2607.13624v1 Announce Type: cross Abstract: Natural language interaction provides an intuitive way for non-expert users to communicate with robotic platforms. However, transforming user requests into executable navigation actions remains a challenging task, requiring the integration of languag

#robotics #navigation #open-source Read on arxiv →

arxiv · Jul 14 · bullish

Index SLM Technical Report

arXiv:2607.09885v1 Announce Type: new Abstract: We present Index-1.9B, a series of open small language models developed at Bilibili. The series comprises four models: Index-1.9B-Base, a foundation model with 1.9 billion non-embedding parameters pre-trained on 2.8 trillion predominantly Chinese and E

IN IN IN · 4 models · +1 #open-source #language-models #pre-training Read on arxiv →

arxiv · Jul 14 · bullish

SETA: Scaling Environments for Terminal Agents

arXiv:2607.10891v1 Announce Type: new Abstract: Large language models (LLMs) are rapidly shifting toward agents that solve tasks through diverse interfaces, including web and graphical user interfaces (GUIs). Among these, the terminal command line provides a text-based, general-purpose interface, co

QW DE · 2 models #reinforcement learning #large language models #open-source Read on arxiv →

arxiv · Jul 13 · bullish

EvoLP: Self-Evolving Latency Predictor for Model Compression in Real-Time Edge Systems

arXiv:2607.09063v1 Announce Type: new Abstract: Edge devices are increasingly utilized for deploying deep learning applications on embedded systems. The real-time nature of many applications and the limited resources of edge devices necessitate latency-targeted neural network compression. However, m

#edge-devices #model-compression #latency-prediction Read on arxiv →

arxiv · Jul 11 · bullish

Svarna: An Open Corpus Workbench for Modern Greek

arXiv:2607.00970v5 Announce Type: replace Abstract: This paper introduces Svarna, a free, open-source, web-based corpus workbench for modern Greek. Svarna integrates five databases covering various registers, institutional, literary, dialectal, social media, and historical, to provide a total of mor

#open-source #language-technology #corpus Read on arxiv →

arxiv · Jul 2

Constructing Epistemic AI Literacy: Detecting Epistemic Aims and Processes in Student-AI Co-Programming

arXiv:2607.00211v1 Announce Type: new Abstract: Epistemic thinking plays a central role in students' learning processes when applying generative artificial intelligence (GenAI), particularly in programming contexts where learners must construct queries, evaluate and validate AI-generated outputs, an

#open-source #collaboration #community Read on arxiv →

arxiv · Jul 2 · bullish

WorkBench Revisited: Workplace Agents Two Years On

arXiv:2606.13715v2 Announce Type: replace Abstract: The best agent on WorkBench in March 2024, GPT-4, completed just 43% of tasks. We revisit the benchmark in June 2026 and find that the best agent to date, Claude Fable 5, now completes 98%. Beyond this considerable progress in frontier agent perfor

OP CL · 2 models #benchmark #safety #open-source Read on arxiv →

arxiv · Jun 27 · bullish

Boundary-Aware Context Grounding for A Low-Channel EEG Agent

arXiv:2606.26519v1 Announce Type: new Abstract: Large language models (LLMs) can make scientific software easier to use. However, a general model does not automatically know which measurements a particular sensor can support, which algorithms are implemented in the current software, or which conclus

NE · 1 model #open-source #eeg #scientific-software Read on arxiv →

arxiv · Jun 25 · bullish

FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation

arXiv:2507.16696v3 Announce Type: replace-cross Abstract: Industrial signal analysis is hindered by severe data heterogeneity, which we characterize as the M5 problem. Existing solutions rely on specialized models that lack robustness and scalability, while large-scale pre-training has rarely been i

FI · 1 model #industrial-signal-analysis #multi-modal #pre-training Read on arxiv →

arxiv · Jun 18 · bullish

Guava: An Effective and Universal Harness for Embodied Manipulation

arXiv:2606.18363v1 Announce Type: cross Abstract: Language models trained on large-scale vision-language data have demonstrated strong potential for embodied agents. Harnessing models through embodied tools use offers a promising alternative to end-to-end vision-language-action systems by combining

GU · 1 model #robotics #embodied ai #open-source Read on arxiv →

arxiv · Jun 18 · bullish

ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"

arXiv:2508.04086v3 Announce Type: replace Abstract: Prior work synthesizes tool-use LLM datasets by first generating a user query, followed by complex tool-use annotations like depth-first search (DFS). This leads to inevitable annotation failures and low efficiency in data generation. We introduce

TO · 1 model #llm #dataset #open-source Read on arxiv →

huggingface · Jun 17 · bullish

GLM-5.2: Built for Long-Horizon Tasks

GL GL OP · 7 models · +4 #open-source #benchmark #long-horizon tasks Read on huggingface →

arxiv · Jun 16 · bullish

ESBMC-PLC: Formal Verification of IEC 61131-3 Ladder Diagram Programs Using SMT-Based Model Checking

arXiv:2606.15461v1 Announce Type: new Abstract: PLCs execute safety-critical programs across industrial sectors. The dominant PLC notation, ladder diagram (LD) per IEC 61131-3, remains absent from formal verification: SMT-based model checkers cannot process LD's rung-and-coil graphics. This paper pr

ES PL · 2 models #formal-verification #industrial-control #open-source Read on arxiv →

arxiv · Jun 16 · bullish

SorryDB: Can AI Provers Complete Real-World Lean Theorems?

arXiv:2603.02668v2 Announce Type: replace Abstract: We present SorryDB, a dynamically-updating benchmark of open Lean tasks drawn from 78 real world formalization projects on GitHub. Unlike existing static benchmarks, often composed of competition problems, hillclimbing the SorryDB benchmark will yi

GE · 1 model #benchmark #open-source #mathematics Read on arxiv →

arxiv · Jun 12 · bullish

EurekAgent: Agent Environment Engineering is All You Need For Autonomous Scientific Discovery

arXiv:2606.13662v1 Announce Type: new Abstract: LLM-based agents have shown increasing potential in automating scientific discovery. Given an optimizable metric and an execution environment, they can propose, validate, and iterate scientific solutions, and have produced results that outperform human

EU · 1 model #autonomous-research #scientific-discovery #environment-engineering Read on arxiv →

arxiv · Jun 11

Evaluating Bias in Phoneme-Based Automatic Speech Recognition Systems: An Analysis of IPA Transcription Models

arXiv:2606.11639v1 Announce Type: new Abstract: The popularization of automatic speech recognition (ASR) systems has increased exploration of the demographic biases related to race, age, gender, and accent, often formed from imbalanced training data. Most of these studies focused on standard graphem

WH ZI · 2 models #speech recognition #bias #multilingual Read on arxiv →

arxiv · Jun 5 · bullish

SciDER: Scientific Data-centric End-to-end Researcher

arXiv:2603.01421v3 Announce Type: replace Abstract: While large language models accelerate scientific discovery, existing agents face severe limitations in adaptability, domain generalization, and multimodal scalability, often struggling to autonomously process raw, domain-specific experimental data

SC OP · 2 models #scientific discovery #multimodal scalability #open-source Read on arxiv →

arxiv · Jun 2

OmniEEG-Bench: A Standardized Evaluation Benchmark for EEG Foundation Models

arXiv:2606.00815v1 Announce Type: new Abstract: Electroencephalography (EEG) supports a variety of brain-computer interface (BCI) tasks ranging from brain-state monitoring to human-LLM interactions. EEG foundation models are emerging, but evaluation remains fragmented due to heterogeneous datasets a

#benchmark #machine-learning #neuroscience Read on arxiv →

arxiv · May 29 · bullish

Formalizing Mathematics at Scale

arXiv:2605.29955v1 Announce Type: new Abstract: We present AutoformBot, a multi-agent system for building an Autoformalized Textbook Library At Scale (Atlas) in Lean 4. AutoformBot orchestrates thousands of LLM agents, equipped with formal verification tools, dependency-aware task scheduling, and co

AU · 1 model #autoformalization #mathematics #verification Read on arxiv →

arxiv · May 26

AI-Driven Adaptive Adversaries and the Erosion of Cryptographic Trust in Public Key Systems

arXiv:2605.24542v1 Announce Type: cross Abstract: This paper examines the erosion of Public Key Cryptography (PKC) security under adaptive adversarial optimisation driven by artificial intelligence. The problem addressed is the growing mismatch between algorithm-centric cryptographic security models

#open-source #collaboration #community Read on arxiv →

arxiv · May 19 · bullish

SAME: A Semantically-Aligned Music Autoencoder

arXiv:2605.18613v1 Announce Type: cross Abstract: Latent representations are at the heart of the majority of modern generative models. In the audio domain they are typically produced by a neural-audio-codec autoencoder. In this work we introduce SAME (Semantically-Aligned Music autoEncoder), an auto

SA SA SA · 3 models #audio #generative #autoencoder Read on arxiv →

arxiv · May 15 · bullish

A Large Language Model Based Pipeline for Review of Systems Entity Recognition from Clinical Notes

arXiv:2506.11067v3 Announce Type: replace Abstract: Objective: Develop a cost-effective, large language model (LLM)-based pipeline for automatically extracting Review of Systems (ROS) entities from clinical notes. Materials and Methods: The pipeline extracts ROS section from the clinical note using

ME GE MI · 4 models · +1 #healthcare #language-models #open-source Read on arxiv →

arxiv · May 14

BEHAVE: A Hybrid AI Framework for Real-Time Modeling of Collective Human Dynamics

arXiv:2605.12730v1 Announce Type: new Abstract: Existing AI systems for modeling human behavior operate at the level of individuals or detect events after they occur. As a result, they systematically fail to capture the collective dynamics that determine whether a group remains stable or transitions

#open-source #collaboration #community Read on arxiv →

arxiv · May 14 · bullish

Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation

arXiv:2604.10720v2 Announce Type: replace Abstract: Artificial students -- models that simulate how learners act and respond within educational systems -- are a promising tool for evaluating tutoring strategies and feedback mechanisms at scale. However, most existing approaches rely on prompting lar

QW · 1 model #open-source #education #programming Read on arxiv →

arxiv · May 1 · bullish

QED: An Open-Source Multi-Agent System for Generating Mathematical Proofs on Open Problems

arXiv:2604.24021v2 Announce Type: replace Abstract: We explore a central question in AI for mathematics: can AI systems produce original, nontrivial proofs for open research problems? Despite strong benchmark performance, producing genuinely novel proofs remains an outstanding challenge for LLMs. Th

LL QE · 2 models #proof-generation #open-source #mathematics Read on arxiv →

arxiv · Apr 30

Structural Generalization on SLOG without Hand-Written Rules

arXiv:2604.26157v1 Announce Type: cross Abstract: Structural generalization in semantic parsing requires systems to apply learned compositional rules to novel structural combinations. Existing approaches either rely on hand-written algebraic rules (AM-Parser) or fail to generalize structurally (Tran

#open-source #collaboration #community Read on arxiv →

arxiv · Apr 27 · bullish

Towards Safe Mobility: A Unified Transportation Foundation Model enabled by Open-Ended Vision-Language Dataset

arXiv:2604.22260v1 Announce Type: cross Abstract: Urban transportation systems face growing safety challenges that require scalable intelligence for emerging smart mobility infrastructures. While recent advances in foundation models and large-scale multimodal datasets have strengthened perception an

UN · 1 model #open-source #dataset #computer-vision Read on arxiv →

arxiv · Apr 22

Do Agents Dream of Root Shells? Partial-Credit Evaluation of LLM Agents in Capture The Flag Challenges

arXiv:2604.19354v1 Announce Type: new Abstract: Large Language Model (LLM) agents are increasingly proposed for autonomous cybersecurity tasks, but their capabilities in realistic offensive settings remain poorly understood. We present DeepRed, an open-source benchmark for evaluating LLM-based agent

LL · 1 model #cybersecurity #benchmark #open-source Read on arxiv →

arxiv · Apr 13

A novel hybrid approach for positive-valued DAG learning

arXiv:2604.08935v1 Announce Type: cross Abstract: Causal discovery from observational data remains a fundamental challenge in machine learning and statistics, particularly when variables represent inherently positive quantities such as gene expression levels, asset prices, company revenues, or popul

#open-source #collaboration #community Read on arxiv →

arxiv · Apr 4 · bullish

J-CHAT: Japanese Large-scale Spoken Dialogue Corpus for Spoken Dialogue Language Modeling

arXiv:2407.15828v2 Announce Type: replace Abstract: Spoken dialogue is essential for human-AI interactions, providing expressive capabilities beyond text. Developing effective spoken dialogue systems (SDSs) requires large-scale, high-quality, and diverse spoken dialogue corpora. However, existing da

#open-source #dataset #speech Read on arxiv →

arxiv · Apr 4 · bullish

Countering Catastrophic Forgetting of Large Language Models for Better Instruction Following via Weight-Space Model Merging

arXiv:2604.01538v1 Announce Type: new Abstract: Large language models have been adopted in the medical domain for clinical documentation to reduce clinician burden. However, studies have reported that LLMs often "forget" a significant amount of instruction-following ability when fine-tuned using a t

GA ME · 2 models #open-source #clinical #domain-adaptation Read on arxiv →

arxiv · Apr 3

A Safety-Aware Role-Orchestrated Multi-Agent LLM Framework for Behavioral Health Communication Simulation

arXiv:2604.00249v1 Announce Type: new Abstract: Single-agent large language model (LLM) systems struggle to simultaneously support diverse conversational functions and maintain safety in behavioral health communication. We propose a safety-aware, role-orchestrated multi-agent LLM framework designed

#research #open-source #collaboration Read on arxiv →

arxiv · Apr 3

(PAC-)Learning state machines from data streams: A generic strategy and an improved heuristic (Extended version)

arXiv:2604.02244v1 Announce Type: cross Abstract: This is an extended version of our publication Learning state machines from data streams: A generic strategy and an improved heuristic, International Conference on Grammatical Inference (ICGI) 2023, Rabat, Morocco. It has been extended with a formal

#state-machines #machine-learning #open-source Read on arxiv →

arxiv · Apr 3 · bullish

Open, Reliable, and Collective: A Community-Driven Framework for Tool-Using AI Agents

arXiv:2604.00137v1 Announce Type: new Abstract: Tool-integrated LLMs can retrieve, compute, and take real-world actions via external tools, but reliability remains a key bottleneck. We argue that failures stem from both tool-use accuracy (how well an agent invokes a tool) and intrinsic tool accuracy

#reliability #benchmark #open-source Read on arxiv →

arxiv · Apr 1 · bullish

A Multi-Agent Rhizomatic Pipeline for Non-Linear Literature Analysis

arXiv:2603.28336v2 Announce Type: replace Abstract: Systematic literature reviews in the social sciences overwhelmingly follow arborescent logics -- hierarchical keyword filtering, linear screening, and taxonomic classification -- that suppress the lateral connections, ruptures, and emergent pattern

AL RH LA · 3 models #open-source #literature-review #complexity Read on arxiv →
