June 25, 2025

Gemini Robotics On-Device outperforms Google’s other models


Google DeepMind on Tuesday introduced a new AI model called Gemini Robotics On-Device, which can run tasks locally on robots without an internet connection. The new model, which builds on the company’s previous Gemini Robotics model released in March, can control a robot’s movements. Google said the vision-language-action (VLA) model is small and efficient enough to run directly on a robot, and that developers can control and fine-tune it to suit various needs using natural language prompts.

Robotics On-Device outperforms Google’s other models

“We’re bringing powerful AI directly onto robots with Gemini Robotics On-Device. 🤖 It’s our first vision-language-action model to help make robots faster, highly efficient, and adaptable to new tasks and environments – without needing a constant internet connection.” – Google DeepMind (@GoogleDeepMind), June 24, 2025

Carolina Parada, head of robotics at Google DeepMind, said the original Gemini Robotics model uses a hybrid approach, allowing it to operate both on-device and in the cloud. With the new device-only model, she said, developers get offline capability that comes close to the flagship’s. The company claims the model performs at a level close to the cloud-based Gemini Robotics model in benchmarks, and that it outperforms other on-device models in general benchmarks, though it didn’t name those models.

“The Gemini Robotics hybrid model is still more powerful, but we’re actually quite surprised at how strong this on-device model is. I would think about it as a starter model or as a model for applications that just have poor connectivity.” – Carolina Parada, Head of Robotics at Google DeepMind

In a demo, the firm showed robots running the local model unzipping bags and folding clothes. Google said that while the model was trained for ALOHA robots, it later adapted it to work on a bi-arm Franka FR3 robot and Apptronik’s Apollo humanoid robot. The company claims the bi-arm Franka FR3 successfully tackled scenarios and objects it hadn’t seen before, such as doing assembly on an industrial belt.

The firm said developers can adapt the model to new tasks by showing robots 50 to 100 demonstrations, using the MuJoCo physics simulator. Google DeepMind also announced a software development kit, the Gemini Robotics SDK, which it says provides the full lifecycle tooling needed to work with Gemini Robotics models: accessing checkpoints, serving a model, evaluating it on the robot and in simulation, uploading data, and fine-tuning. The on-device Gemini Robotics model and the SDK will initially be available to a group of trusted testers while Google continues to work toward minimizing safety risks.
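To give a rough sense of what collecting demonstrations in a physics simulator involves, here is a minimal Python sketch using the open-source MuJoCo bindings. It is purely illustrative: the toy scene, the scripted control signal, and the recorded trajectory are hypothetical stand-ins, and none of this uses Google’s Gemini Robotics SDK, whose actual APIs are not described in this article.

import mujoco
import numpy as np

# Hypothetical minimal scene: a single hinge-jointed arm link with one motor.
SCENE_XML = """
<mujoco>
  <worldbody>
    <body name="arm">
      <joint name="shoulder" type="hinge" axis="0 1 0"/>
      <geom type="capsule" fromto="0 0 0 0.3 0 0" size="0.02"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="shoulder" ctrlrange="-1 1"/>
  </actuator>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(SCENE_XML)
data = mujoco.MjData(model)

# Replay a scripted "demonstration": apply a small torque and log state/action pairs.
trajectory = []
for step in range(500):
    data.ctrl[:] = 0.1 * np.sin(0.01 * step)   # placeholder control signal
    mujoco.mj_step(model, data)                # advance the physics by one timestep
    trajectory.append((data.qpos.copy(), data.qvel.copy(), data.ctrl.copy()))

print(f"Recorded {len(trajectory)} state/action samples")

In practice, demonstrations for a VLA model would come from teleoperated real or simulated robots rather than a scripted torque, but the basic loop of stepping the physics and logging state and action data is the same pattern.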
Tech companies join the robotics race

Other companies that build AI models are also showing interest in robotics. Nvidia is building a platform to create foundation models for humanoids. The firm’s CEO, Jensen Huang, noted that building foundation models for general humanoid robots is one of the most exciting problems to solve in AI today. Huang argued that the humanoid form factor is one of the most contested topics in robotics at the moment, raising venture capital by the boatload while generating massive skepticism along the way.

Nvidia has also been championing robotic innovation through initiatives like Isaac and Jetson. Last year in March, at its annual GTC developer conference, the company joined the humanoid race with Project GR00T, which it described as a general-purpose foundation model for humanoid robots. The firm said GR00T will also be supported by new Nvidia hardware.

Hugging Face is not only developing open models and datasets for robotics, it is also working on robots. Earlier this month, the firm revealed an open AI model for robotics called SmolVLA. The company claims the model is trained on community-shared datasets and outperforms much larger robotics models in both virtual and real-world environments. Hugging Face says SmolVLA aims to democratize access to vision-language-action (VLA) models and accelerate research toward generalist robotic agents. Last year, the firm launched LeRobot, a collection of robotics-focused models, datasets, and tools. More recently, Hugging Face acquired Pollen Robotics, a robotics startup based in France, and announced several inexpensive robotics systems, including humanoids, for purchase.


Source: Cryptopolitan
