Model Detail
DeepSeek-R1
DeepSeek-R1 is a code generation model with 342.3B parameters, released by DeepSeek. It is registered under the text-generation pipeline tag on Hugging Face, takes text input and produces text output, and is distributed under the permissive MIT license.
DeepSeek-R1 is priced at $1.35 per million input tokens and $5.40 per million output tokens. The model offers a 164K-token context window, which matters when sizing it for prompt-heavy or latency-sensitive workloads. This pricing sits in the middle of the API market: it is neither the cheapest nor the most expensive option per token. Output tokens cost four times as much as input tokens, so whether the model fits a budget usually depends on how much output a workload generates.
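To make the pricing arithmetic concrete, here is a minimal cost estimator built from the listed rates. It assumes prices are in USD per million tokens. It also assumes the "164K" window means 164,000 tokens shared by prompt and completion; the exact limit may differ slightly, and `request_cost` and `CONTEXT_WINDOW` are illustrative names, not part of any DeepSeek API.

```python
# Sketch: estimate the cost of one DeepSeek-R1 request from the listed prices.
# Assumptions (not from any official API): prices are USD per 1M tokens, and
# the "164K" context window is taken as 164,000 tokens for prompt + completion.

INPUT_PRICE_PER_M = 1.35   # USD per 1M input tokens (from the listing)
OUTPUT_PRICE_PER_M = 5.40  # USD per 1M output tokens (from the listing)
CONTEXT_WINDOW = 164_000   # assumed token limit; the real limit may differ slightly


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + output_tokens > CONTEXT_WINDOW:
        raise ValueError("request exceeds the assumed context window")
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # A prompt-heavy request: 100K tokens in, 20K tokens out.
    print(f"${request_cost(100_000, 20_000):.4f}")
```

For that example, 100K input tokens cost about $0.135 and 20K output tokens cost about $0.108. Despite being one fifth the volume, the output accounts for nearly half of the roughly $0.243 total.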
The published knowledge cutoff is 2024-07-31, so events after that date will not be reflected in zero-shot answers without retrieval. At 342.3B parameters, the total weight footprint is approximately 684.5 GB, which is the relevant figure when planning VRAM for local inference. The MIT license is permissive: it allows commercial deployment and derivative work without per-seat fees, though the copyright and permission notice must be retained in all copies or substantial portions of the software.
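The footprint figure follows from parameter count times bytes per parameter. The sketch below reproduces it under the assumption of 2 bytes per parameter (BF16/FP16), which is what the listed 684.5 GB / 342.3B ratio implies. The `headroom` fraction and the 80 GB card size in the usage line are hypothetical planning inputs. KV cache, activations, and runtime overhead are deliberately left out.

```python
# Sketch: estimate raw weight memory and a lower bound on GPU count.
# Assumption: 2 bytes/parameter (BF16/FP16), implied by 684.5 GB / 342.3B.
# KV cache, activations, and framework overhead are NOT included.

BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_footprint_gb(num_params: float, dtype: str = "bf16") -> float:
    """Return the weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def gpus_needed(num_params: float, gpu_mem_gb: float, dtype: str = "bf16",
                headroom: float = 0.8) -> int:
    """Minimum GPUs to hold the weights if only `headroom` of each card is used.

    `headroom` is a hypothetical safety margin, not a published figure.
    """
    usable = gpu_mem_gb * headroom
    need = weight_footprint_gb(num_params, dtype)
    return int(-(-need // usable))  # ceiling division


if __name__ == "__main__":
    params = 342.3e9
    print(f"{weight_footprint_gb(params):.1f} GB of weights")
    print(gpus_needed(params, gpu_mem_gb=80), "x 80 GB GPUs (weights only)")
```

Under these assumptions, 342.3B BF16 parameters come to about 684.6 GB, which matches the listed ~684.5 GB up to rounding of the parameter count. Holding the weights alone would take at least eleven 80 GB GPUs at 80% usable memory, before any KV cache for long contexts.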
Downloads of DeepSeek-R1 rose 16.4% over the trailing thirty days, an upward trend in adoption. These numbers are a signal, not a guarantee: Hugging Face download counts also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.
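The page does not say exactly how its trailing-thirty-day change is computed. One plausible reading compares total downloads in the most recent 30 days with the 30 days before. The sketch below implements that reading on hypothetical daily counts. The `flat_band` noise threshold is likewise an assumption, chosen to keep small fluctuations from being labeled as trends.

```python
# Sketch: trailing-window percentage change in download counts.
# Assumes the metric compares the last `window` days with the preceding
# `window` days; the listing does not specify its exact method.

from typing import Sequence


def trailing_change_pct(daily: Sequence[int], window: int = 30) -> float:
    """Percent change of the last `window` days versus the `window` days before."""
    if len(daily) < 2 * window:
        raise ValueError("need at least two full windows of data")
    current = sum(daily[-window:])
    previous = sum(daily[-2 * window:-window])
    if previous == 0:
        raise ValueError("previous window has no downloads")
    return (current - previous) / previous * 100


def describe(change_pct: float, flat_band: float = 2.0) -> str:
    """Label the direction; `flat_band` is a hypothetical noise threshold in %."""
    if change_pct > flat_band:
        return "uptrend"
    if change_pct < -flat_band:
        return "downtrend"
    return "flat"


if __name__ == "__main__":
    # Hypothetical data: 1,000/day for 30 days, then 1,164/day for 30 days.
    counts = [1000] * 30 + [1164] * 30
    change = trailing_change_pct(counts)
    print(f"{change:+.1f}% -> {describe(change)}")
```

With these made-up counts the change is +16.4%, which this scheme labels an uptrend. A positive change of that size should never be read as a downtrend.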
DeepSeek-R1 is best suited to code completion, repository-scale Q&A, and pair-programming integrations. It is a less obvious choice for one-shot generation of security-critical code without review. Treat this as a starting matrix rather than a benchmark verdict: the right deployment usually depends on running an evaluation suite that mirrors your own workload.
Open-R1: a fully open reproduction of DeepSeek-R1
SoK: A Comprehensive Security Analysis of Jailbreak Resilience in GPT and DeepSeek Models
arXiv:2506.18543v2. Abstract: The rapid proliferation of Large Language Models (LLMs) has heightened concerns regarding their exposure to jailbreak attacks, which craft adversarial inputs designed to elicit unsafe content. Although proprietary models such as GPT-4 have…
DeepSeekMath Meets Order Book: Group-Aware Policy Optimization for High-Frequency Directional Trading
arXiv:2605.25527v1. Abstract: This paper studies reinforcement learning for high-frequency trading on limit order books by pairing an Order-Flow-based state model with policy-gradient methods. Instead of value-based RL techniques like tabular Q-learning, our approach deploys policy…
RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference
arXiv:2605.00392v3. Abstract: DeepSeek-OCR leverages visual-text compression to reduce long-text processing costs and accelerate inference, yet visual tokens remain prone to redundant textual and structural information. Moreover, current token pruning methods for…
DeepSeek could hit $45B valuation from its first investment round
The Chinese AI lab came to prominence in early 2025 after launching a large language model that trained on a fraction of the compute power and at a fraction of the cost of the big U.S. models like those from OpenAI and Anthropic.
Refining and Reusing Annotation Guidelines for LLM Annotation
arXiv:2605.20809v1. Abstract: While Large Language Models (LLMs) demonstrate remarkable performance on zero-shot annotation tasks, they often struggle with the specialized conventions of gold-standard benchmarks. We propose the systematic reuse and refinement of annotation…