Does Tone Change the Answer? Evaluating Prompt Politeness Effects on Modern LLMs: GPT, Gemini, and LLaMA
arXiv:2512.12812v2 Announce Type: replace-cross Abstract: Prompt engineering has emerged as a critical factor influencing large language model (LLM) performance, yet the impact of pragmatic elements such as linguistic tone and politeness remains underexplored, particularly across different model families…
GGML and llama.cpp join HF to ensure the long-term progress of Local AI
New in llama.cpp: Model Management
Measuring Open-Source Llama Nemotron Models on DeepResearch Bench
SAGE: Sign-Adaptive Gradient for Memory-Efficient LLM Optimization
arXiv:2604.07663v1 Announce Type: new Abstract: The AdamW optimizer, while standard for LLM pretraining, is a critical memory bottleneck, consuming optimizer states equivalent to twice the model's size. Although light-state optimizers like SinkGD attempt to address this issue, we identify the embedding…
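The abstract's "twice the model's size" claim follows directly from AdamW keeping two moment tensors (first and second moments) per parameter. A minimal back-of-the-envelope sketch, assuming fp32 states and an illustrative 8B-parameter model (the function name and figures are assumptions, not from the paper):

```python
def adamw_state_bytes(num_params: int, dtype_bytes: int = 4) -> int:
    """Estimate AdamW optimizer-state memory.

    AdamW stores two tensors per parameter (exp. moving averages of the
    gradient and of its square), so the state footprint is 2x the model's
    own footprint at the same precision.
    """
    return 2 * num_params * dtype_bytes


num_params = 8_000_000_000           # illustrative 8B-parameter model
model_bytes = num_params * 4         # fp32 weights
state_bytes = adamw_state_bytes(num_params)

print(f"model:  {model_bytes / 2**30:.1f} GiB")
print(f"states: {state_bytes / 2**30:.1f} GiB")  # exactly 2x the model
```

For 8B fp32 parameters this comes to roughly 30 GiB of weights against 60 GiB of optimizer state, which is the 2x overhead the abstract describes.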
The Last Fingerprint: How Markdown Training Shapes LLM Prose
arXiv:2603.27006v1 Announce Type: cross Abstract: Large language models produce em dashes at varying rates, and the observation that some models "overuse" them has become one of the most widely discussed markers of AI-generated text. Yet no mechanistic account of this pattern exists, and the parallel…