NVIDIA has announced the general availability of Google DeepMind’s Gemma 3n models on both NVIDIA Jetson and RTX platforms. This development marks a significant milestone in bringing advanced multimodal AI capabilities—including text, vision, and audio—to a wide range of devices, from edge robotics to AI-powered PCs.
Gemma 3n: A Leap in Multimodal AI
Gemma 3n introduces two new models optimized for on-device, multimodal applications. Building on the Gemma 3 foundation, Gemma 3n adds audio processing alongside text and vision, integrating leading research models for each modality:
- Audio: Universal Speech Model
- Vision: MobileNet v4
- Text: MatFormer
Per-Layer Embeddings: Efficient Memory Usage
A standout feature of Gemma 3n is the introduction of Per-Layer Embeddings (PLE). This innovation substantially reduces the RAM required to hold model parameters: per-layer embedding parameters can be kept outside the accelerator's memory, so the Gemma 3n E4B model, with 8 billion raw parameters, operates with a memory footprint comparable to a 4B model. This allows developers to deploy higher-quality models in resource-constrained environments.
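As a rough illustration of those savings, the back-of-the-envelope arithmetic below compares the accelerator-resident footprint of the E4B model with and without offloading roughly half its parameters via PLE. The fp16 assumption and the 8B/4B split are illustrative, not measured figures:

```python
def footprint_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Rough memory needed to hold the weights (fp16 = 2 bytes per parameter)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

# E4B: 8B raw parameters; with Per-Layer Embeddings, roughly half can live
# outside the accelerator (illustrative split, not an official spec).
raw = footprint_gb(8)        # all 8B parameters resident
effective = footprint_gb(4)  # ~4B parameters resident after PLE offload

print(f"resident without PLE: {raw:.0f} GB")    # 16 GB at fp16
print(f"resident with PLE:    {effective:.0f} GB")  # 8 GB at fp16
```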
Model Specifications
| Model Name | Raw Parameters | Input Context Length | Output Context Length | Size on Disk |
|---|---|---|---|---|
| E2B | 5B | 32K | 32K minus request | 1.55GB |
| E4B | 8B | 32K | 32K minus request | 2.82GB |

Table: Gemma 3n model specifications.
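The "32K minus request" output limit means the prompt and the response share one context window: the longer the input, the less room remains for generation. A small sketch of that token budget (treating 32K as 32,768 tokens, an assumption about how the limit is counted):

```python
CONTEXT_WINDOW = 32 * 1024  # 32K window shared by prompt and response (assumed 32,768 tokens)

def max_output_tokens(prompt_tokens: int) -> int:
    """Tokens left for the model's reply after the prompt fills part of the window."""
    return max(CONTEXT_WINDOW - prompt_tokens, 0)

print(max_output_tokens(2_000))   # 30768 tokens available for output
print(max_output_tokens(32_768))  # 0 -- the prompt consumed the whole window
```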
Powering Robotics and Edge AI with Jetson
The Gemma 3n models are well-suited for NVIDIA Jetson devices, which are designed for edge applications such as robotics, smart cameras, and industrial automation. The combination of lightweight architecture and dynamic memory management enables these models to function efficiently in resource-constrained environments.
Gemma 3n Impact Challenge
Developers working with Jetson can participate in the Gemma 3n Impact Challenge on Kaggle. This initiative encourages the creation of impactful solutions in fields like accessibility, education, healthcare, environmental sustainability, and crisis response. Cash prizes, starting at $10,000, are awarded for top submissions and for leveraging on-device deployment technologies such as Jetson.
NVIDIA RTX: AI for Windows Developers and Enthusiasts
NVIDIA RTX AI PCs make it easy for developers and AI enthusiasts to deploy Gemma 3n models using the Ollama platform. These models can be integrated into popular applications like AnythingLLM and LM Studio, benefiting from RTX acceleration.
Quick Start with Ollama
Deploying Gemma 3n locally on RTX and Jetson devices is straightforward:

- Download and install Ollama for Windows.
- Open a terminal and run:

```shell
ollama pull gemma3n:e4b
ollama run gemma3n:e4b "Summarize Shakespeare's Hamlet"
```
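Beyond the CLI, Ollama also exposes a local REST API (on port 11434 by default), so the same model can be called from application code. A minimal sketch using only the standard library, assuming an Ollama server is running locally with `gemma3n:e4b` already pulled:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint; stream=False returns one JSON object."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST the prompt to the local Ollama server and return the generated text."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires the Ollama server to be running):
# print(generate("gemma3n:e4b", "Summarize Shakespeare's Hamlet"))
```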
NVIDIA collaborates with Ollama to optimize performance for RTX GPUs, leveraging the GGML library for efficient model execution.
Customizing Gemma with NVIDIA NeMo Framework
Developers can further tailor Gemma 3n models using the open-source NVIDIA NeMo Framework, available on GitHub. NeMo provides an end-to-end workflow for post-training Gemma models, enabling fine-tuning with enterprise-specific data for improved accuracy.
NeMo Workflow Overview
- Data Curation (NeMo Curator): Prepares high-quality datasets for pretraining or fine-tuning by extracting, filtering, and deduplicating large data volumes.
- Fine-Tuning (NeMo): Supports parameter-efficient fine-tuning (PEFT) techniques such as LoRA (Low-Rank Adaptation), as well as full-parameter tuning, for comprehensive model customization.
- Model Evaluation (NeMo Evaluator): Assesses the performance of adapted models using custom tests and benchmarking.
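LoRA, mentioned in the fine-tuning step above, freezes the pretrained weight matrix W and learns only a low-rank update BA, so just r·(d_in + d_out) parameters are trained instead of d_in·d_out. The NumPy sketch below illustrates that math in isolation; it is not the NeMo API, and the dimensions are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r = 64, 128, 4                # adapter rank r << min(d_in, d_out)

W = rng.standard_normal((d_out, d_in))      # frozen pretrained weight
A = rng.standard_normal((r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                    # trainable up-projection, zero-init

def lora_forward(x: np.ndarray) -> np.ndarray:
    """y = (W + B @ A) x: the frozen base path plus the low-rank adapter path."""
    return W @ x + B @ (A @ x)

x = rng.standard_normal(d_in)
# With B zero-initialized, the adapter starts as a no-op on the base model:
assert np.allclose(lora_forward(x), W @ x)

full = d_out * d_in                         # 8192 params in a full fine-tune
adapter = r * (d_in + d_out)                # 768 trainable params with LoRA
print(f"trainable params: {adapter} vs full fine-tune: {full}")
```

The zero-initialized B matrix is the standard LoRA trick: training starts exactly at the pretrained model and the adapter only gradually perturbs it.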
Advancing Open-Source AI and Community Collaboration
NVIDIA actively contributes to the open-source AI ecosystem and has released hundreds of projects under open licenses. By supporting open models like Gemma, NVIDIA promotes AI transparency and encourages collaborative progress in AI safety and resilience.
Conclusion
The availability of Gemma 3n on NVIDIA Jetson and RTX platforms empowers developers to bring advanced multimodal AI capabilities to both edge and desktop environments. With innovations in memory efficiency, robust developer tools, and a commitment to open-source collaboration, this partnership sets a new standard for accessible, high-performance AI deployment.