Model Detail
Llama-3.2-1B-Instruct
▼ 0.8%Llama-3.2-1B-Instruct is a large language model with 1B parameters released by Meta. The model is registered under the text-generation pipeline tag on Hugging Face, and supports text->text inputs, released under the llama3.2 license.
Open-LLM-Leaderboard scoring places it at MMLU-Pro 8, GPQA 3, IFEval 57, BBH 9, giving a sense of how it handles instruction following, reasoning, and graduate-level QA in absolute terms.
Llama-3.2-1B-Instruct is priced at $0.027/M input tokens and $0.2/M output tokens. Operationally the model offers a 131K-token context window, which matters when sizing it for prompt-heavy or latency-sensitive workloads. At this input rate the model sits in the commodity tier and is suitable for high-volume workloads where per-call cost dominates the decision.
Llama-3.2-1B-Instruct ships as a LlamaForCausalLM / 💬 chat models (RLHF, DPO, IFT, ...) architecture with 1B parameters. The published knowledge cutoff is 2023-12-31, so newer events will not be reflected in zero-shot answers without retrieval. Total weight footprint is approximately 1.2 GB, which is the relevant figure when planning local-inference VRAM. Access is gated on Hugging Face under the llama3.2 license, which means a manual approval step before weights can be downloaded.
Downloads of Llama-3.2-1B-Instruct have moved -0.8% over the past 24 hours, -1.4% over the trailing seven days, +29.6% over the trailing thirty days. That is a slight downtrend, consistent with normal cooling as newer models compete for the same workloads. These numbers are signal, not guarantee — week-over-week download counts on Hugging Face also reflect mirror traffic, CI scrapes, and one-off benchmarking runs.
Llama-3.2-1B-Instruct is best fit for general-purpose chat and instruction-following workloads, and high-volume batch jobs where per-call cost dominates the budget. Treat this as a starting matrix rather than a benchmark verdict — the right deployment usually depends on the specific evaluation suite that mirrors your workload.
LLAMA LIMA: A Living Meta-Analysis on the Effects of Generative AI on Learning Mathematics
arXiv:2601.18685v3 Announce Type: replace-cross Abstract: The capabilities of generative AI in mathematics education are rapidly evolving, posing significant challenges for research to keep pace. Research syntheses remain scarce and risk being outdated by the time of publication. To address this iss
Llamas on the Web: Memory-Efficient, Performance-Portable, and Multi-Precision LLM Inference with WebGPU
arXiv:2605.20706v1 Announce Type: cross Abstract: Running language models in the browser presents a unique opportunity to build efficient, private, and portable AI applications, but requires contending with constrained memory availability and heterogeneous hardware targets. To realize this opportuni
From Llama to Cria: Scaling Down Neural Networks via Neuron-Level Spectral Structural Importance Evaluation
arXiv:2605.18860v1 Announce Type: new Abstract: This paper proposes a neuron pruning framework based on neuron-level spectral structural importance evaluation. Given a trained neural network, we record the hidden states of each hidden layer during inference and model neurons as graph nodes, with hid
Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2
arXiv:2512.22671v2 Announce Type: replace Abstract: Structured width pruning of GLU-MLP layers, guided by the Maximum Absolute Weight (MAW) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model capabilities. While performance on tasks relying on parame
Arithmetic in the Wild: Llama uses Base-10 Addition to Reason About Cyclic Concepts
arXiv:2605.01148v1 Announce Type: new Abstract: Does structure in representations imply structure in computation? We study how Llama-3.1-8B reasons over cyclic concepts (e.g., "what month is six months after August?"). Even though Llama-3.1-8B's representations for these concepts are circularly stru
A Practice of Post-Training on Llama-3 70B with Optimal Selection of Additional Language Mixture Ratio
arXiv:2409.06624v4 Announce Type: replace-cross Abstract: Large Language Models (LLM) often need to be Continual Pre-Trained (CPT) to obtain unfamiliar language skills or adapt to new domains. The huge training cost of CPT often asks for cautious choice of key hyper-parameters such as the mixture ra