DataBubble

Model Detail

DeepSeek-OCR

Provider: DeepSeek · Category: code · Pipeline: image-text-to-text

DB Score

4.7

Downloads

3.0M

Likes

Day

+0.0%

Week

+0.0%

Month

+0.0%

Overview

DeepSeek-OCR is an optical character recognition (OCR) model with 1.7B parameters released by DeepSeek. It is registered under the image-text-to-text pipeline tag on Hugging Face and distributed under the permissive MIT license.

Technical

DeepSeek-OCR ships with 1.7B parameters. The total weight footprint is approximately 3.3 GB. Treat that figure as a lower bound when planning local-inference VRAM, since activations and runtime buffers need memory on top of the weights. The MIT license is permissive, allowing commercial deployment and derivative work without per-seat fees, provided the copyright and license notice is retained in copies of the software.
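The relationship between parameter count and weight size can be checked with simple arithmetic. The sketch below assumes 16-bit weights (2 bytes per parameter), which the listing does not state; under that assumption 1.7B parameters come to about 3.4 GB, close to the quoted 3.3 GB given that the parameter count is rounded. The function name `weight_footprint_gb` is illustrative only.

```python
def weight_footprint_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Estimate weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Covers the weights only; activations and runtime buffers are extra.
    """
    return num_params * bytes_per_param / 1e9


params = 1.7e9  # parameter count quoted in this listing (rounded)

# Byte widths per parameter for common storage formats.
# Which format the published weights use is an assumption here.
for name, width in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1)]:
    print(f"{name}: {weight_footprint_gb(params, width):.1f} GB")
```

Actual VRAM use during inference will be higher than any of these figures, so they are best read as a floor rather than a budget.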

Use Cases

DeepSeek-OCR is best fit for optical character recognition workloads: extracting text from scanned pages, document images, and similar inputs that match its image-text-to-text pipeline. It is a less obvious choice for text-only tasks such as code completion or repository-scale Q&A, despite the code category shown in this listing. Treat this as a starting matrix rather than a benchmark verdict: the right deployment usually depends on an evaluation suite that mirrors your workload.
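For a model under the image-text-to-text pipeline, one workload-specific check is character error rate (CER) on your own documents. The following is a minimal sketch, not a metric reported by this listing: it computes CER as Levenshtein edit distance divided by reference length, and the function names and sample strings are hypothetical.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference chars into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER = edit distance / reference length; lower is better."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical ground-truth transcript and model output for one page.
reference = "Invoice total: 1,240.00"
hypothesis = "Invoice total: 1,24O.00"  # letter O misread for digit 0
print(f"CER: {character_error_rate(reference, hypothesis):.3f}")
```

Averaging this over a sample of pages from your own corpus gives a number that reflects your workload more directly than a generic score.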

Research Paper

arXiv: 2401.02954

Model Info

License: MIT

Citations: 768 (69 influential)

Recent news

RTPrune: Reading-Twice Inspired Token Pruning for Efficient DeepSeek-OCR Inference

arXiv:2605.00392v3 Announce Type: replace-cross Abstract: DeepSeek-OCR leverages visual-text compression to reduce long-text processing costs and accelerate inference, yet visual tokens remain prone to redundant textual and structural information. Moreover, current token pruning methods for conventi

arxiv · neutral · 90d ago

Fine-tuning DeepSeek-OCR-2 for Molecular Structure Recognition

arXiv:2604.03476v2 Announce Type: replace-cross Abstract: Optical Chemical Structure Recognition (OCSR) is critical for converting 2D molecular diagrams from printed literature into machine-readable formats. While Vision-Language Models have shown promise in end-to-end OCR tasks, their direct applic

arxiv · 9h ago

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

arXiv:2606.09079v3 Announce Type: replace-cross Abstract: Conventional LLMs keep the full KV cache loaded during decoding, causing a severe GPU memory bottleneck for ultra-long context serving. In this report, we propose Lookahead Sparse Attention (LSA), a novel inference paradigm powered b

arxiv · neutral · 31d ago

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence

arXiv:2606.19348v1 Announce Type: cross Abstract: We present a preview version of DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models -- DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) -- both supporting a

arxiv · 41d ago

Instruction Finetuning DeepSeek-R1-8B Model Using LoRA and NEFTune

arXiv:2606.10392v1 Announce Type: new Abstract: Financial named-entity recognition (NER) is essential for translating unstructured financial reports and news into structured knowledge graphs. However, general-purpose large language models (LLMs) often misclassify financial entities or ignore domain-

arxiv · 56d ago

DeepSeekMath Meets Order Book: Group-Aware Policy Optimization for High-Frequency Directional Trading

arXiv:2605.25527v1 Announce Type: new Abstract: This paper studies reinforcement learning for high-frequency trading on limit order books by pairing an Order-Flow-based state model with policy-gradient methods. Instead of value-based RL techniques like tabular Q-learning, our approach deploys policy

Related Models

DeepSeek-V3.2

DeepSeek · 11.2M downloads

DeepSeek-R1

DeepSeek · 7.8M downloads

all-MiniLM-L6-v2

SBERT · 246.4M downloads

nomic-embed-text-v1.5

nomic-ai · 17.1M downloads