arxiv · 2d ago
arXiv:2606.02998v1 Announce Type: new Abstract: Automated cough analysis offers a path to low-cost respiratory screening, but most existing work stops at binary COVID-19 detection. A practical tool needs to tell apart several respiratory conditions from one cough recording on a consumer smartphone.
arxiv · 2d ago
arXiv:2606.03576v1 Announce Type: new Abstract: Scholarly text classification supports literature organization, subject indexing, and research intelligence, but Chinese scholarly corpora often contain imbalanced and semantically adjacent disciplinary labels. We propose AutoTail-BSFGM, a class-balanc
arxiv · 2d ago
arXiv:2606.02604v1 Announce Type: cross Abstract: ESG and climate risk data remain fragmented across heterogeneous Scope 1, Scope 2, and Scope 3 reporting environments, while conventional validation pipelines lack provenance-aware auditability, hidden drift detection, and reproducibility-oriented go
arxiv · 2d ago
arXiv:2606.02979v1 Announce Type: cross Abstract: We present a novel compact deep multi-task learning model to handle various autonomous driving perception tasks in one forward pass. The model performs multiple views of semantic segmentation, depth estimation, light detection and ranging (LiDAR) seg
arxiv · 3d ago
arXiv:2606.02545v1 Announce Type: new Abstract: Self-harm is a major public health concern, but current surveillance relying on hospital presentations is inadequate due to the low sensitivity of diagnostic codes. Emergency Department (ED) triage notes, recorded at the initial point of contact, provi
arxiv · 3d ago
arXiv:2606.00647v1 Announce Type: cross Abstract: Detecting psychological defense mechanisms in conversational text remains a challenging clinical NLP problem. For the PsyDefDetect 2026 shared task (nine-class utterance classification evaluated via macro F1), our team LinguIUTics achieves a macro F1
arxiv · 3d ago
arXiv:2602.23161v4 Announce Type: replace Abstract: Time series reasoning demands both the perception of complex dynamics and logical depth. However, existing LLM-based approaches exhibit two limitations: they often treat time series merely as text or images, failing to capture the patterns like tre
arxiv · 3d ago
arXiv:2606.01329v1 Announce Type: new Abstract: We show that computing the log-partition function (free energy) of conditioned inhomogeneous Curie-Weiss spin Hamiltonians reduces to an unbalanced $2 \to 1$ norm computation, and design a polynomial-time SDP algorithm for this problem with a lower bo
arxiv · 3d ago
arXiv:2605.17839v3 Announce Type: replace-cross Abstract: Knowledge distillation transfers knowledge from a high-capacity teacher to a compact student using a mixture of hard and soft losses. On imbalanced data, a fixed weighting between hard and soft losses makes the learning process brittle. Rec
arxiv · 3d ago
arXiv:2606.00298v1 Announce Type: cross Abstract: Data-driven reduced-order modeling is an essential component in the computer-aided design of control systems. In this work, we present a novel symmetric Hermite formulation of the quadrature-based balanced truncation algorithm that constructs linear
arxiv · 3d ago
arXiv:2412.16209v5 Announce Type: replace Abstract: When using machine learning for imbalanced binary classification problems, it is common to subsample the majority class to create a (more) balanced training dataset. This biases the model's predictions because the model learns from data that is not
arxiv · 3d ago
arXiv:2606.01883v1 Announce Type: new Abstract: Open-set recognition (OSR) requires a classifier to reject inputs from unseen classes, which is essential in safety-critical settings such as medical imaging. Simplex-based methods, which fix class prototypes at the vertices of a regular simplex and the
arxiv · 3d ago
arXiv:2602.22101v3 Announce Type: replace-cross Abstract: Many real-world applications generate continuous data streams for regression. Hoeffding trees and their variants have a long-standing tradition due to their effectiveness, either alone or as base models in broader ensembles. Recent batch-lear
arxiv · 3d ago
arXiv:2606.01221v1 Announce Type: cross Abstract: Imbalanced learning is a critical challenge in machine learning, where underrepresented target values can bias models and degrade prediction performance on rare but important cases. Although extensively studied in classification, imbalanced regressio
arxiv · 4d ago
arXiv:2605.30487v1 Announce Type: new Abstract: Aligning large language models (LLMs) to heterogeneous and rapidly evolving safety requirements remains a critical challenge. Existing instruction-tuned LLMs and standalone safety classifiers often fail to generalize to new safety configurations, motiv
arxiv · 4d ago
arXiv:2605.31484v1 Announce Type: new Abstract: Low-Rank Adaptation (LoRA) is the most widely adopted method for fine-tuning large language models. Notably, LoRA is inherently overparameterized: multiple pairs of low-rank factors can yield the same adapted weight matrix. We show--both theoretically
arxiv · 4d ago
arXiv:2605.31452v1 Announce Type: new Abstract: Building on our previous work, this paper develops practical, low-barrier methods for freelance translators and smaller language service providers to evaluate translation technologies using rigorous yet accessible analytic methods. Here we address a hi
arxiv · May 29
arXiv:2605.30135v1 Announce Type: cross Abstract: Various algorithms have been proposed to address the challenges posed by class-imbalanced learning from real-world data with long-tailed distributions. While these algorithms reduce prediction bias through rebalancing techniques, they often introduce
arxiv · May 29
arXiv:2603.07916v2 Announce Type: replace Abstract: To enable a fully data-driven learning paradigm on relational databases (RDBs), relational deep learning (RDL) has recently been proposed; it structures the RDB as a heterogeneous entity graph and adopts a graph neural network (GNN) as the predi
arxiv · May 29
arXiv:2605.29121v1 Announce Type: cross Abstract: We propose a minimal dynamical model of adaptive softmax routing for a two-expert Mixture-of-Experts (MoE) layer. The model is obtained as a mean-field limit of a discrete reinforcement rule: the selected expert receives a small score increment, whil
arxiv · May 29
arXiv:2603.00454v2 Announce Type: replace-cross Abstract: Generative Flow Networks (GFlowNets) enable fine-tuning large language models to approximate reward-proportional posteriors, but they remain prone to mode collapse, manifesting as prefix collapse and length bias. We attribute this to two fact
arxiv · May 29
arXiv:2605.00553v2 Announce Type: replace Abstract: Large Language Model (LLM) Red-Teaming, which proactively identifies vulnerabilities of LLMs, is an essential process for ensuring safety. Finding effective and diverse attacks in red-teaming is important, but achieving both is challenging. Generat
arxiv · May 28
arXiv:2605.28745v1 Announce Type: new Abstract: Prediction markets such as Polymarket aggregate crowd beliefs into real-time probability estimates, and the comments traders post beneath each market contain rich directional stance signals that prices alone cannot capture. This work introduces the fir
arxiv · May 28
arXiv:2605.28440v1 Announce Type: new Abstract: DPO has become a widely adopted alternative to RLHF for aligning LLMs with human preferences, eliminating the need for a separate reward model or RL loop. Recent theoretical analysis uncovers an asymmetric gradient behavior in DPO: the loss suppresses
arxiv · May 28
arXiv:2605.28109v1 Announce Type: new Abstract: Recent advances in online reinforcement learning (RL) for large language models (LLMs) have demonstrated promising performance in complex reasoning tasks. However, they often exhibit an imbalanced exploration-exploitation trade-off, resulting in unstab
arxiv · May 27
arXiv:2605.26166v1 Announce Type: cross Abstract: The rapid proliferation of Internet of Things (IoT) devices has created an urgent demand for adaptive, resource-efficient Intrusion Detection Systems (IDS) capable of handling dynamic and evolving cyber threats. This paper investigates AOC-IDS, a sta
arxiv · May 27
arXiv:2601.07085v2 Announce Type: replace-cross Abstract: Large language model (LLM)-based conversational AI systems present a challenge to human cognition that current frameworks for understanding misinformation and persuasion do not adequately address. This paper proposes that a significant episte
arxiv · May 27
arXiv:2512.21602v3 Announce Type: replace Abstract: Every year, millions of patients pass through emergency departments and intensive care units, where clinicians must make high-stakes decisions under time pressure and uncertainty. Machine learning could support prediction of deterioration, triage,
arxiv · May 27
arXiv:2605.26579v1 Announce Type: new Abstract: Open-ended generation in LLMs usually requires multi-dimensional rubrics to adequately assess quality and guide the improvement of reinforcement learning. However, a critical dilemma inherent in this training paradigm is the imbalanced reward polar
arxiv · May 27
arXiv:2605.22904v2 Announce Type: replace-cross Abstract: Understanding and monitoring human behavior in metro stations play an important role in supporting suicide prevention efforts, where early identification of high-risk situations can enable timely intervention. This requires assessing suicide
arxiv · May 26
arXiv:2605.24779v1 Announce Type: cross Abstract: Submodular optimization has become a fundamental paradigm for data selection, retrieval, summarization, and representation learning due to its ability to model coverage, diversity, and representativeness. However, classical submodular objectives opti
arxiv · May 26
arXiv:2605.24908v1 Announce Type: cross Abstract: Class imbalance in deep neural networks (DNNs) has received rapidly increasing research attention in recent years. However, the varying accounts in the pertinent literature of the reasons behind the poor performance of DNNs on imbalanced data show that
arxiv · May 26
arXiv:2011.10254v3 Announce Type: replace-cross Abstract: Incomplete multi-view clustering is an important technique to deal with real-world incomplete multi-view data. Previous works assume that all views have the same incompleteness, i.e., balanced incompleteness. However, different views often ha
arxiv · May 26
arXiv:2603.05143v3 Announce Type: replace Abstract: Understanding reasoning in large language models is complicated by evaluations that conflate multiple reasoning types. We isolate analogical reasoning, where a model transfers an attribute between entities that share known properties, and study whe
arxiv · May 26
arXiv:2506.17326v3 Announce Type: replace Abstract: Class imbalance remains a practical obstacle in the development of clinical prediction models for conditions such as diabetes mellitus, where the number of confirmed cases is often much smaller than the number of controls. The Synthetic Minority Ov
arxiv · May 26
arXiv:2512.23995v2 Announce Type: replace-cross Abstract: Mixture-of-Experts architectures have become the standard for scaling large language models due to their superior parameter efficiency. To accommodate the growing number of experts in practice, modern inference systems commonly adopt expert p
arxiv · May 26
arXiv:2602.19333v2 Announce Type: replace Abstract: This research introduces the first large-scale, well-balanced Persian social media text classification dataset, specifically designed to address the lack of comprehensive resources in this domain. The dataset comprises 36,000 posts across nine cate
arxiv · May 26
arXiv:2605.23964v1 Announce Type: cross Abstract: The growing share of Renewable Energy Sources (RES) in modern power systems increases both grid imbalances and frequency deviations, reinforcing the need for ancillary services such as Frequency Containment Reserve (FCR) and passive balancing. Batter
arxiv · May 26
arXiv:2605.25605v1 Announce Type: cross Abstract: In the past decade, numerous studies have applied deep neural networks (DNNs) to auditory attention decoding (AAD) from electroencephalogram (EEG) signals via stimulus reconstruction. However, the influence of dataset balance on the decoding performanc
arxiv · May 25
arXiv:2605.23453v1 Announce Type: new Abstract: We conducted a reproducibility-oriented re-evaluation of prior migraine classification studies, correcting for data leakage and metric bias. We then introduced (i) a clinically motivated aggregation of two hemiplegic subtypes following ICHD-3 §1.2.3
arxiv · May 25
arXiv:2605.23378v1 Announce Type: cross Abstract: Ambulance response is time-critical in out-of-hospital cardiac arrest (OHCA), where dispatchers must balance timely arrivals with limited fleet capacity. Static territories and deterministic travel-time estimates are vulnerable to dynamic congestion,
arxiv · May 25
arXiv:2603.17879v2 Announce Type: replace-cross Abstract: This work presents a multi-label temporal event detection framework for video capsule endoscopy (VCE) that addresses the extreme class imbalance inherent in the Galar dataset by combining two principal contributions: an Angular Separation Los
arxiv · May 22
arXiv:2605.21565v1 Announce Type: new Abstract: Multimodal Emotion Recognition in Conversations (MERC) is a crucial task for understanding human interactions, where multimodal approaches integrating language, facial expressions, and vocal tone have achieved significant progress. However, modality mi
arxiv · May 22
arXiv:2511.04838v2 Announce Type: replace Abstract: Molecular property regression struggles with cases in chemically relevant target ranges that are underrepresented in datasets. Standard average error minimization approaches underperform in these highly relevant cases, and oversampling approaches l
arxiv · May 22
arXiv:2605.21507v1 Announce Type: cross Abstract: Atmospheric visibility is a critical variable for transportation safety and air quality management; however, accurate prediction remains challenging due to the complex interactions between meteorological conditions and air pollutants, as well as the
arxiv · May 22
arXiv:2605.18678v2 Announce Type: replace-cross Abstract: We present Lance, a lightweight native unified model supporting multimodal understanding, generation, and editing for both images and videos. Rather than relying on model capacity scaling or text-image-dominant designs, Lance explores a pract
arxiv · May 22
arXiv:2605.20405v1 Announce Type: cross Abstract: Class imbalance is a fundamental challenge in medical image segmentation, where frequent classes typically dominate training at the expense of rare classes. Loss-based approaches mitigate imbalance by reweighting the per-pixel loss within the batch,
arxiv · May 22
arXiv:2605.21742v1 Announce Type: new Abstract: Prior-data fitted networks (PFNs) have achieved exceptional performance on tabular classification tasks. However, like other classifiers, they can suffer under class imbalance, resulting in poor performance on rare classes.