# Gemma 4 VLA Demo on Jetson Orin Nano Super
Talk to Gemma 4, and she'll decide on her own whether she needs to look through the webcam to answer you. All running locally on a Jetson Orin Nano Super.

```
You speak → Parakeet STT → Gemma 4 → [Webcam if needed] → Kokoro TTS → Speaker
```

Press SPACE to record, SPACE again to stop.

This is a simple VLA: the model decides on its own whether to act based on the context of what you asked, with no keyword triggers and no hardcoded logic. If your question needs Gemma to open her eyes, she'll decide to take a photo, interpret it, and answer you with that context in mind. She's not describing the picture; she's answering your actual question using what she saw. And honestly? It's pretty impressive that this runs on a Jetson Orin Nano. :)

## Get the code

The full script for this tutorial lives on GitHub, in my Google_Gemma repo next to the Gemma 2 demos: github.com/asierarranz/Google_Gemma

Grab it with either of these (pick one):

```bash
# Option 1: clone the whole repo
git clone https://github.com/asierarranz/Google_Gemma.git
cd Google_Gemma/Gemma4

# Option 2: just download the script
wget https://raw.githubusercontent.com/asierarranz/Google_Gemma/main/Gemma4/Gemma4_vla.py
```

That single file (`Gemma4_vla.py`) is all you need. It pulls the STT/TTS models and voice assets from Hugging Face on first run.
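To make the "model decides on its own" part concrete, here's a minimal sketch of the loop. The `<look>` marker and the `ask_model` / `capture_photo` helpers are hypothetical names for illustration; the actual `Gemma4_vla.py` may signal the camera decision differently.

```python
# Sketch of the VLA decision loop (hypothetical marker protocol; the real
# script may use a different mechanism to signal "I need to see").
LOOK_MARKER = "<look>"

def wants_image(reply: str) -> bool:
    """True if the model asked to look through the webcam."""
    return LOOK_MARKER in reply

def answer(user_text: str, ask_model, capture_photo) -> str:
    # First pass: the model decides, from context alone, whether it needs eyes.
    reply = ask_model(user_text)
    if wants_image(reply):
        # Second pass: re-ask the same question with the captured frame attached,
        # so the model answers the question rather than describing the photo.
        reply = ask_model(user_text, image=capture_photo())
    return reply.replace(LOOK_MARKER, "").strip()
```

The point is that the branch is driven entirely by the model's own output, not by keyword matching on the user's question.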
## Hardware

What we used:

- NVIDIA Jetson Orin Nano Super (8 GB)
- Logitech C920 webcam (built-in mic)
- USB speaker
- USB keyboard (to press SPACE)

You're not tied to these exact devices: any webcam, USB mic, and USB speaker that Linux sees should work.
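A quick way to confirm Linux actually sees your webcam is to look for `/dev/video*` nodes (for audio, `arecord -l` and `aplay -l` from alsa-utils do the same job). A small sketch:

```python
# Sanity check: list the video capture nodes the kernel has enumerated.
import glob

def linux_webcams() -> list:
    """Paths of /dev/video* devices, sorted."""
    return sorted(glob.glob("/dev/video*"))

def first_webcam(devices: list):
    """Pick the first candidate device, or None if nothing is plugged in."""
    return devices[0] if devices else None

if __name__ == "__main__":
    cams = linux_webcams()
    print("Webcams:", cams or "none found")
```

Note that one physical camera often exposes several `/dev/video*` nodes; `v4l2-ctl --list-devices` (installed in Step 1 via v4l-utils) shows which node belongs to which device.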
## Step 1: System packages

Fresh Jetson? Let's install the basics:

```bash
sudo apt update
sudo apt install -y git build-essential cmake curl wget pkg-config \
  python3-pip python3-venv python3-dev alsa-utils pulseaudio-utils \
  v4l-utils psmisc ffmpeg libsndfile1
```

`build-essential` and `cmake` are only needed if you go the native llama.cpp route (Option A in Step 4). The rest covers audio, the webcam, and Python.
## Step 2: Python environment

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install opencv-python-headless onnx_asr kokoro-onnx soundfile huggingface-hub numpy
```
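Before moving on, it's worth checking that everything resolved, since a failed wheel build on aarch64 can slip by in the pip output. A small sketch; the import names here are my assumption of how the packages above import (e.g. opencv-python-headless installs as `cv2`):

```python
# Check the demo's dependencies without actually importing them
# (find_spec only locates the module, so this is fast and side-effect free).
import importlib.util

REQUIRED = ["cv2", "onnx_asr", "kokoro_onnx", "soundfile", "huggingface_hub", "numpy"]

def missing_modules(names):
    """Return the subset of module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    gone = missing_modules(REQUIRED)
    print("All set" if not gone else "Missing: " + ", ".join(gone))
```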
## Step 3: Free up RAM (optional but recommended)

Heads up: this step may not be strictly necessary. But we're pushing this 8 GB board pretty hard with a fairly capable model, so giving ourselves some headroom makes the whole experience smoother, especially if you've been playing with Docker or other heavy stuff before this. These are just the commands that worked nicely for me. Use them if they help.

### Add some swap

Swap won't speed up inference, but it acts as a safety net during model loading so you don't get OOM-killed at the worst moment.

```bash
sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab
```

### Kill memory hogs

```bash
sudo systemctl stop docker 2>/dev/null || true
sudo systemctl stop containerd 2>/dev/null || true
pkill -f tracker-miner-fs-3 || true
pkill -f gnome-software || true
free -h
```

Close browser tabs, IDE windows, anything you don't need. Every MB counts. If you're going with the Docker route in Step 4, obviously don't stop Docker here; you'll need it. Still kill the rest, though.

### Still tight on RAM?

From our tests, Q4_K_M (native build) and Q4_K_S (Docker) run comfortably on the 8 GB board once you've done the cleanup above. But if you've got other stuff you can't kill and memory is still tight, you can drop one step down to a Q3 quant: same model, a bit less smart, noticeably lighter. Just swap the filename in Step 4:

```bash
gemma-4-E2B-it-Q3_K_M.gguf   # instead of Q4_K_M
```

Honestly, though, stick with Q4_K_M if you can. It's the sweet spot.
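If you want to check your headroom programmatically rather than eyeballing `free -h`, you can parse `MemAvailable` from `/proc/meminfo`. A minimal sketch (the "enough" threshold is just a rough guess for a Q4 model plus the STT/TTS pipeline):

```python
# Pre-flight check: how much RAM is actually available before loading models.
def mem_available_gb(meminfo_text: str) -> float:
    """Parse the MemAvailable line of /proc/meminfo into GB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kb = int(line.split()[1])  # value is reported in kB
            return kb / (1024 * 1024)
    raise ValueError("MemAvailable not found")

if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        gb = mem_available_gb(f.read())
    # ~4 GB free is a rough comfort zone guess for the Q4 quant on this board.
    print(f"{gb:.1f} GB available", "(should be fine)" if gb >= 4.0 else "(tight; consider Q3)")
```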
## Step 4: Serve Gemma 4

You need a running llama-server with Gemma 4 before launching the demo. We'll build llama.cpp natively on the Jetson: it gives the best performance and full control over the vision projector that the VLA demo needs.

### Build llama.cpp

```bash
cd ~
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
```
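Once llama-server is up, the demo talks to it over its OpenAI-compatible chat endpoint. Here's a hedged sketch of how the request payload can be built for text-only and text-plus-image turns; the base64 `image_url` form assumes a server started with the vision projector loaded (`--mmproj`), and `"gemma"` is a placeholder model name:

```python
# Build an OpenAI-style payload for llama-server's /v1/chat/completions.
# Assumption: the server was launched with the Gemma GGUF plus its --mmproj
# vision projector, so it accepts base64 data-URL images in the message content.
import base64

def chat_payload(question: str, jpeg_bytes: bytes = None, model: str = "gemma") -> dict:
    content = [{"type": "text", "text": question}]
    if jpeg_bytes is not None:
        b64 = base64.b64encode(jpeg_bytes).decode()
        content.append({
            "type": "image_url",
            "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
        })
    return {"model": model, "messages": [{"role": "user", "content": content}]}
```

A text-only turn produces a single-element content list; when the VLA loop decides to look, the webcam frame is attached as a second content part and the same question is sent again.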