DataBubble

Model Detail

DeepSeek-R1

▲ 2.8%

Provider: DeepSeek · Category: code · Pipeline: text-generation

DB Score: 1.1

Downloads: 8.6M

Likes: 13K

GitHub Stars: 92K

Day: +2.8%

Week: +0.0%

Month: +0.0%

Overview

DeepSeek-R1 is a code generation model with 342.3B parameters, released by DeepSeek. It is registered under the text-generation pipeline tag on Hugging Face, takes text input and produces text output (text->text), and is distributed under the permissive MIT license.

Pricing & Throughput

DeepSeek-R1 is priced at $1.35/M input tokens and $5.40/M output tokens, so each output token costs four times as much as an input token. The model offers a 164K-token context window, which matters when sizing it for prompt-heavy or latency-sensitive workloads. Pricing in this range sits in the middle of the API market, neither the cheapest nor the most expensive option per token, so cost-fit mostly depends on how much output you generate.
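To make the input/output trade-off concrete, here is a minimal sketch of a per-request cost estimate using the prices quoted above. The function name `estimate_cost`, the reading of "164K" as 164,000 tokens, and the rule that input and output together must fit in the context window are assumptions for illustration, not documented provider behavior.

```python
# Prices from this page, in USD per million tokens.
INPUT_PRICE_PER_M = 1.35
OUTPUT_PRICE_PER_M = 5.40

# Assumption: "164K" is read as 164,000 tokens; the exact limit may differ.
CONTEXT_WINDOW_TOKENS = 164_000


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Assumption: prompt and completion share the same context window.
    if input_tokens + output_tokens > CONTEXT_WINDOW_TOKENS:
        raise ValueError("request does not fit in the assumed context window")
    input_cost = input_tokens * INPUT_PRICE_PER_M / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_M / 1_000_000
    return input_cost + output_cost


def output_share(input_tokens: int, output_tokens: int) -> float:
    """Return the fraction of the request cost that comes from output tokens."""
    total = estimate_cost(input_tokens, output_tokens)
    if total == 0:
        return 0.0
    return (output_tokens * OUTPUT_PRICE_PER_M / 1_000_000) / total
```

Under these assumptions, a request with 10,000 input tokens and 2,000 output tokens costs about $0.0243, and output accounts for roughly 44% of that even though it is only a sixth of the tokens.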

Technical

DeepSeek-R1 ships with 342.3B parameters. The published knowledge cutoff is 2024-07-31, so later events will not be reflected in zero-shot answers without retrieval. The total weight footprint is approximately 684.5 GB, or about 2 bytes per parameter. This is the baseline figure when planning VRAM for local inference; KV cache and activations add to it. The MIT license is permissive: it allows commercial deployment and derivative work without per-seat fees, provided the copyright and license notice are retained.
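The footprint figure can be sanity-checked with some quick arithmetic. The sketch below derives bytes per parameter from the two numbers on this page and estimates how many accelerators are needed to hold the weights alone. The 80 GB device size, the `overhead_fraction` parameter, and the reading of GB as 10^9 bytes are illustrative assumptions, and the result ignores KV cache and activation memory.

```python
import math

# Figures from this page.
PARAMETER_COUNT = 342.3e9   # 342.3B parameters
WEIGHT_BYTES = 684.5e9      # 684.5 GB, assuming 1 GB = 10**9 bytes


def bytes_per_parameter() -> float:
    """Average storage per parameter implied by the page's two figures."""
    return WEIGHT_BYTES / PARAMETER_COUNT


def min_devices_for_weights(device_gb: float, overhead_fraction: float = 0.0) -> int:
    """Lower bound on devices needed to hold the weights only.

    device_gb and overhead_fraction are hypothetical planning inputs;
    overhead_fraction reserves part of each device for everything
    that is not weights (KV cache, activations, runtime buffers).
    """
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    usable_bytes = device_gb * 1e9 * (1.0 - overhead_fraction)
    return math.ceil(WEIGHT_BYTES / usable_bytes)
```

With these inputs, `bytes_per_parameter()` comes out at roughly 2.0, which is what 16-bit weights would imply, and `min_devices_for_weights(80)` gives 9 devices before any headroom is reserved.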

Trending Signal

Downloads of DeepSeek-R1 have moved +2.8% over the past 24 hours, while the weekly and monthly changes are flat at +0.0%. That is a slight daily uptick rather than a sustained trend in either direction. These numbers are a signal, not a guarantee: download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.


Use Cases

DeepSeek-R1 is best suited to code completion, repository-scale Q&A, and pair-programming integrations. It is a weaker fit for one-shot generation of security-critical code without review. Treat this as a starting matrix rather than a benchmark verdict: the right deployment usually depends on an evaluation suite that mirrors your workload.

[Chart: Download History]

Pricing

Input ($/M tokens): $1.35

Output ($/M tokens): $5.40

Context Window: 164K

Research Paper

arXiv: 2501.12948

Model Info

License: MIT

Modality: text->text

Knowledge Cutoff: 2024-07-31

Citations: 5,476 (932 influential)

Recent news

Instruction Finetuning DeepSeek-R1-8B Model Using LoRA and NEFTune

arXiv:2606.10392v1 Announce Type: new Abstract: Financial named-entity recognition (NER) is essential for translating unstructured financial reports and news into structured knowledge graphs. However, general-purpose large language models (LLMs) often misclassify financial entities or ignore domain-

huggingface · 539d ago

Open-R1: a fully open reproduction of DeepSeek-R1

arxiv · 9h ago

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

arXiv:2606.09079v3 Announce Type: replace-cross Abstract: Conventional LLMs keep the full KV cache loaded during decoding, causing a severe GPU memory bottleneck for ultra-long context serving. In this report, we propose Lookahead Sparse Attention (LSA), a novel inference paradigm powered b

arxiv · neutral · 31d ago

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

arXiv:2606.19348v1 Announce Type: cross Abstract: We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models -- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) -- both supporting a

arxiv · 56d ago

DeepSeekMath Meets Order Book: Group-Aware Policy Optimization for High-Frequency Directional Trading

arXiv:2605.25527v1 Announce Type: new Abstract: This paper studies reinforcement learning for high-frequency trading on limit order books by pairing an Order-Flow-based state model with policy-gradient methods. Instead of value-based RL techniques like tabular Q-learning, our approach deploys policy

arxiv · 56d ago

SoK: A Comprehensive Security Analysis of Jailbreak Resilience in GPT and DeepSeek Models

arXiv:2506.18543v2 Announce Type: replace-cross Abstract: The rapid proliferation of Large Language Models (LLMs) has heightened concerns regarding their exposure to jailbreak attacks, which craft adversarial inputs designed to elicit unsafe content. Although proprietary models such as GPT-4 have be

Related Models

DeepSeek-V3.2

DeepSeek · 11.2M downloads

DeepSeek-OCR

DeepSeek · 3.0M downloads

all-MiniLM-L6-v2

SBERT · 245.7M downloads

nomic-embed-text-v1.5

nomic-ai · 17.1M downloads