arxiv · 2d ago
arXiv:2605.25645v2 Announce Type: replace-cross Abstract: We present the first end-to-end demonstration of fine-tuning and serving Google's Gemma 4 31B model on TPU hardware, providing an empirical comparison of TPU and GPU platforms for large language model adaptation. Using LoRA on a Google TPU v5
arxiv · May 20
arXiv:2605.00333v2 Announce Type: replace-cross Abstract: Frozen Gemma 4 31B weights pretrained exclusively on text, unmodified, transfer through a thin trainable interface to non-text modalities the substrate has never processed. On the L24–L29 slice (192 attention heads), an English-text TxtCopy
arxiv · May 7
arXiv:2605.05159v1 Announce Type: new Abstract: We present our system for SemEval-2026 Task 9: Multilingual Polarization Detection, a binary classification task spanning 22 languages. Our approach fine-tunes separate Gemma 3 models (12B and 27B parameters) per language using Low-Rank Adaptation (LoR
arxiv · May 6
arXiv:2604.05081v2 Announce Type: replace Abstract: We introduce MedGemma 1.5 4B, the latest model in the MedGemma collection. MedGemma 1.5 expands on MedGemma 1 by integrating additional capabilities: high-dimensional medical imaging (CT/MRI volumes and histopathology whole slide images), anatomica
arxiv · Apr 29
arXiv:2604.24070v1 Announce Type: cross Abstract: Small instruct-tuned LLMs produce degenerate verbal confidence under minimal elicitation: ceiling rates above 95%, near-chance Type-2 AUROC, and Invalid validity profiles. We test whether confidence-conditioned supervised fine-tuning (CSFT) with self
arxiv · Apr 11
arXiv:2604.07490v1 Announce Type: new Abstract: Representation learning for geospatial and spatio-temporal data plays a critical role in enabling general-purpose geospatial intelligence. Recent geospatial foundation models, such as the Population Dynamics Foundation Model (PDFM), encode complex popu
arxiv · Apr 9
arXiv:2604.07035v1 Announce Type: new Abstract: Mixture-of-experts (MoE) language models are often expected to offer better quality-efficiency tradeoffs than dense models because only a subset of parameters is activated per token, but the practical value of that advantage depends on end-to-end behav
arxiv · Apr 8
arXiv:2507.05201v4 Announce Type: replace Abstract: Artificial intelligence (AI) has significant potential in healthcare applications, but its training and deployment face challenges due to healthcare's diverse data, complex tasks, and the need to preserve privacy. Foundation models that perform we