DataBubble

Model Detail

Llama-3.3-70B-Instruct

▼ 2.0%

Provider: Meta · Category: llm · Pipeline: text-generation · Parameters: 70B

DB Score

34.0

Downloads

693K

Likes

GitHub Stars

29K

Day

-2.0%

Week

-0.2%

Month

+0.0%

Overview

Llama-3.3-70B-Instruct is a 70B-parameter large language model released by Meta. It is registered under the text-generation pipeline tag on Hugging Face, takes text input and produces text output (text->text), and is released under the llama3.3 license.

Performance

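Open-LLM-Leaderboard scoring places it at MMLU-Pro 48, GPQA 11, IFEval 90, and BBH 57 (rounded). IFEval measures instruction following, BBH and MMLU-Pro measure reasoning and knowledge, and GPQA measures graduate-level QA, so together they give a sense of how the model handles each of these in absolute terms.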

Pricing & Throughput

Llama-3.3-70B-Instruct is priced at $0.71/M input tokens and $0.71/M output tokens. It offers a 128K-token context window, which matters when sizing it for prompt-heavy or latency-sensitive workloads. Pricing in this range is the working middle of the API market, neither the cheapest nor the most expensive option per token. Because input and output tokens cost the same here, cost-fit depends mainly on the total number of tokens a workload processes rather than on the split between input and output.
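The pricing arithmetic above can be sketched as a small cost estimator. This is a minimal illustration, not a billing tool: the prices are the ones quoted on this page, the function names are hypothetical, and reading "128K" as 128,000 tokens is an assumption (the exact limit may differ by provider).

```python
# Figures quoted on this page; actual provider pricing may differ.
INPUT_PRICE_PER_M = 0.71    # USD per million input tokens
OUTPUT_PRICE_PER_M = 0.71   # USD per million output tokens
CONTEXT_WINDOW = 128_000    # assumption: "128K" read as 128,000 tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    total = (input_tokens * INPUT_PRICE_PER_M
             + output_tokens * OUTPUT_PRICE_PER_M)
    return total / 1_000_000


def fits_context(input_tokens: int, output_tokens: int) -> bool:
    """Check that prompt plus completion stay inside the context window."""
    return input_tokens + output_tokens <= CONTEXT_WINDOW


if __name__ == "__main__":
    # Hypothetical prompt-heavy request: 100K tokens in, 2K tokens out.
    print(f"cost: ${request_cost(100_000, 2_000):.5f}")
    print("fits:", fits_context(100_000, 2_000))
```

Since both rates are identical, the estimate depends only on the sum of input and output tokens; the two prices are kept separate so the sketch still works if they diverge.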

Technical

Llama-3.3-70B-Instruct uses the LlamaForCausalLM architecture, is categorized as a chat model (RLHF, DPO, IFT, ...), and has 70B parameters. The published knowledge cutoff is 2023-12-31, so newer events will not be reflected in zero-shot answers without retrieval. Total weight footprint is listed as approximately 70.6 GB, which is the relevant figure when planning local-inference VRAM. That figure works out to roughly one byte per parameter, so 16-bit weights would need about twice as much. Access is gated on Hugging Face under the llama3.3 license, which means a manual approval step before weights can be downloaded.
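For VRAM planning, the weight footprint can be approximated from the parameter count and the bytes used per parameter. The sketch below assumes exactly 70 billion parameters (the page's rounded figure) and a hypothetical table of common precisions; it covers weights only and ignores the KV cache, activations, and framework overhead, all of which add to real memory use.

```python
PARAMS = 70e9  # assumption: the page's rounded "70B" parameter count

# Bytes per parameter for common weight precisions (illustrative table).
BYTES_PER_PARAM = {
    "bf16/fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), weights only."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for precision, nbytes in BYTES_PER_PARAM.items():
        gb = weight_footprint_gb(PARAMS, nbytes)
        print(f"{precision:>10}: ~{gb:.0f} GB")
```

Under these assumptions, the listed footprint of about 70 GB is closest to the one-byte-per-parameter row.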

Trending Signal

Downloads of Llama-3.3-70B-Instruct have moved -2.0% over the past 24 hours and -0.2% over the trailing seven days. That is a slight downtrend, consistent with normal cooling as newer models compete for the same workloads. These numbers are a signal, not a guarantee: week-over-week download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.
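A percentage move like the ones above is a simple ratio of two download counts. The sketch below shows that calculation with hypothetical counts; the function names and the `flat_band` threshold are illustrative assumptions, not DataBubble's actual method.

```python
def percent_change(previous: int, current: int) -> float:
    """Percentage change from the previous count to the current one."""
    if previous <= 0:
        raise ValueError("previous count must be positive")
    return (current - previous) * 100 / previous


def describe_trend(change_pct: float, flat_band: float = 0.1) -> str:
    """Label a change as up, down, or flat.

    flat_band is an illustrative threshold chosen for this sketch.
    """
    if abs(change_pct) <= flat_band:
        return "flat"
    return "up" if change_pct > 0 else "down"


if __name__ == "__main__":
    # Hypothetical daily counts that reproduce a -2.0% day-over-day move.
    change = percent_change(1000, 980)
    print(f"{change:+.1f}% ({describe_trend(change)})")
```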

Use Cases

Llama-3.3-70B-Instruct is best suited to general-purpose chat and instruction-following workloads. Treat this as a starting point rather than a benchmark verdict: the right deployment usually depends on an evaluation suite that mirrors your workload.

Pricing

Input ($/M tokens): $0.71

Output ($/M tokens): $0.71

Context Window: 128K

Research Paper

arXiv: 2204.05149

Benchmark Scores

IFEval: 90.0

BBH: 56.6

GPQA: 10.5

MMLU-Pro: 48.1

MATH: 48.3

MUSR: 15.6

Average: 44.8
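The Average figure appears to be the unweighted mean of the six benchmark scores listed. The sketch below checks that reading using the displayed (rounded) values; the equal-weight formula is an assumption, so a small difference from the listed average is expected.

```python
# Scores as displayed on this page (already rounded to one decimal).
scores = {
    "IFEval": 90.0,
    "BBH": 56.6,
    "GPQA": 10.5,
    "MMLU-Pro": 48.1,
    "MATH": 48.3,
    "MUSR": 15.6,
}

# Assumption: the average is an unweighted mean over all six benchmarks.
average = sum(scores.values()) / len(scores)

LISTED_AVERAGE = 44.8
print(f"computed mean: {average:.2f}, listed: {LISTED_AVERAGE}")
```

The computed mean lands within a tenth of a point of the listed value, which is consistent with the average having been taken over unrounded scores.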

Model Info

License: llama3.3

Architecture: LlamaForCausalLM

Type: 💬 chat models (RLHF, DPO, IFT, ...)

Modality: text->text

Knowledge Cutoff: 2023-12-31

Citations: 16,069 (3016 influential)

Recent news

The Capacity of Thought: Benchmarking Llama 3.2 in Semantic fMRI Neural Language Decoding and Improving the Huth Encoding-Model Baseline

arXiv:2607.12079v1 Announce Type: new Abstract: Decoding continuous language from fMRI signals remains a core challenge in non-invasive brain-computer interface research. We present two complementary investigations. First, we improve the Huth et al. ridge regression encoding pipeline through expande

techcrunch · neutral · 12d ago

Popular open source AI developer tool Ollama raises $65M, grows to nearly 9M users

Benchmark-backed Ollama has amassed 176,000 stars, and nearly 17,000 forks on GitHub by helping developers easily run AI on their PCs.

arxiv · neutral · 35d ago

Frame-Conditioned Moral Computation in LLaMA 3.1-8B-Instruct: A Mechanistic Interpretability Audit of Ethical Reasoning

arXiv:2606.15507v1 Announce Type: new Abstract: Behavioral audits of Large Language Models on moral prompts measure what the model says, not the internal computation producing it. We use Transluce, an AI-driven mechanistic-interpretability platform, to examine LLaMA 3.1-8B-Instruct on 54 moral promp

arxiv · neutral · 36d ago

Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

arXiv:2512.22671v3 Announce Type: replace-cross Abstract: Structured width pruning of GLU-MLP layers in Llama-3.2 models, guided by the Peak-to-Peak Magnitude (PPM) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model capabilities. While performance o

arxiv · neutral · 39d ago

A Quantitative Experimental Repeated Measures Study of Training Dynamics in a Small Llama Style Language Model Under a Compute-Aware Token Budget

arXiv:2606.13370v1 Announce Type: new Abstract: This study examines training dynamics in a small Llama-style language model trained under a fixed, compute-constrained token budget. Rather than evaluating efficiency solely through endpoint performance, the study uses a quantitative experimental repea

arxiv · 41d ago

The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring

arXiv:2606.10327v1 Announce Type: new Abstract: Automated Essay Scoring (AES) systems must judge interdependent discourse elements (e.g., lead, claim, evidence, conclusion), yet most approaches treat these in isolation, harming coherence and generalization. We investigate task-aware fine-tuning of L

Related Models

Llama-3.2-1B-Instruct

Meta · 8.6M downloads

Llama-3.1-8B-Instruct

Meta · 8.2M downloads

bert-base-uncased

google-bert · 69.6M downloads

paraphrase-multilingual-MiniLM-L12-v2

SBERT · 48.6M downloads
