Model Detail
nomic-embed-text-v1.5
nomic-embed-text-v1.5 is a text embedding model released by nomic-ai and listed here with 68M parameters. It is registered under the sentence-similarity pipeline tag on Hugging Face and distributed under the apache-2.0 license.
The apache-2.0 license is permissive: it allows commercial deployment and derivative work without per-seat fees, provided the license text and attribution notices are retained.
Downloads of nomic-embed-text-v1.5 have moved -0.6% over the past 24 hours. That is a slight downtrend, consistent with normal cooling as newer models compete for the same workloads. Treat the figure as a signal rather than a guarantee: download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.
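For reference, a 24-hour movement like the one quoted above is a plain percentage change between two daily totals. The sketch below shows the arithmetic; the two download counts are hypothetical values chosen to reproduce -0.6%, not figures from Hugging Face.

```python
def pct_change(previous: int, current: int) -> float:
    """Percentage change from `previous` to `current`."""
    if previous == 0:
        raise ValueError("previous count must be non-zero")
    return (current - previous) / previous * 100.0


# Hypothetical daily download totals, for illustration only.
downloads_yesterday = 100_000
downloads_today = 99_400

change = round(pct_change(downloads_yesterday, downloads_today), 1)
print(f"{change:+.1f}%")
```

Because a single noisy day (a CI scrape, a mirror sync) can shift either total, a change this small is better read alongside a longer window than on its own.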
nomic-embed-text-v1.5 is best suited to semantic search, retrieval-augmented generation, clustering, and text classification, where inputs are compared by embedding similarity. It is a poor fit for tasks that require generated output, such as code completion or chat, because an embedding model returns vectors rather than text. Treat this as a starting matrix rather than a benchmark verdict: the right deployment usually depends on an evaluation suite that mirrors your workload.
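As a rough illustration of how a sentence-similarity model is typically used, the sketch below ranks documents by cosine similarity to a query. The three-dimensional vectors are hand-written placeholders, not output from nomic-embed-text-v1.5, and the names `cosine` and `rank` are hypothetical helpers; in practice the vectors would come from the model.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: list[float], doc_vecs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption, for illustration).
docs = {
    "doc_a": [0.9, 0.1, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [0.7, 0.7, 0.0],
}
query = [1.0, 0.0, 0.0]

ranking = rank(query, docs)
for doc_id, score in ranking:
    print(f"{doc_id}: {score:.3f}")
```

The same ranking step applies whether the task is search, deduplication, or clustering; only the source of the vectors and the decision threshold change.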