Cloud Wars
AI and Copilots

Google Develops ‘RT-2’ AI Model to Bridge General-Purpose Robot Functional Gaps

By Toni Witt | November 3, 2023 | 5 Mins Read

We’ve all seen movies featuring the stereotypical AI robot that looks and talks like a human but is out to get someone or cause damage. It’s not a new concept in science fiction. But the reality looks very different: advanced robots like the ones depicted in movies filmed decades ago still aren’t possible, and there are many reasons why. Robotics remains a field with many limitations.

While we do have pre-programmed robots that follow predefined paths, like those that assemble cars or vacuum our floors, what we really want are robots that can help us with anything: general-purpose robots. Hopefully, they will be kinder than their cinematic counterparts. Building general-purpose robots poses real challenges, and Google has developed a transformer model to bridge those gaps.


Challenges With General Purpose Robotics

One of the problems with creating these kinds of robots is that you have to explicitly train a computer vision system to recognize each kind of object and scenario, then provide it with a precise list of instructions to execute when that object or situation is recognized. This is time-consuming and, given the randomness of real life, ultimately unfeasible: there will always be unforeseen circumstances.
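To see why this explicit approach breaks down, here is a minimal sketch of a hand-coded perception-to-action mapping. The object classes and action names are hypothetical, chosen purely for illustration:

```python
# A minimal sketch of the explicitly-programmed approach: every object
# class and its response must be enumerated by hand, and anything
# unseen falls through to a no-op. (Illustrative only; the class
# names and actions are hypothetical.)

EXPLICIT_RULES = {
    "soda_can": "pick_and_discard",
    "apple": "leave_in_place",
    "crumpled_paper": "pick_and_discard",
}

def decide_action(detected_class: str) -> str:
    # Unforeseen objects get no useful behavior -- the core limitation.
    return EXPLICIT_RULES.get(detected_class, "do_nothing")

print(decide_action("soda_can"))      # a rule exists: pick_and_discard
print(decide_action("banana_peel"))   # no rule: the robot does nothing
```

Every new object or situation requires another hand-written rule, which is exactly what makes the approach unscalable in an open-ended environment.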

For example, if you’re building a robot that picks up trash, you don’t want it to pick up food items that aren’t trash. Yet it’s very difficult for a robot to distinguish a full bag of chips, which is not trash, from a bag that’s half empty and needs to be thrown away. Discerning between the two requires a degree of reasoning, a trait that humans have but robots don’t. Even if you account for that specific scenario and explicitly train the robot to deal with half-full bags of chips, it still might not recognize the difference between full and half-full.

The technical challenge here is that high-level reasoning models like ChatGPT, which can understand the difference between a full bag of chips and an empty one that has become trash, aren’t connected to the low-level software that drives a robot’s physical actions.

Source: Google DeepMind: RT-2: New model translates vision and language into action

To bridge this gap, Google recently developed a novel kind of AI model that brings these functionalities together. Called RT-2, or the Robotics Transformer 2, this is a first-of-its-kind model trained on text and images from the web as well as actual robotics movement data. RT-2 can directly output robotic actions. Google’s innovation has implications for any field that relies on robotics, including healthcare, logistics, manufacturing, and more.

Google’s RT-2 Model

In short, Google built an AI model that translates high-level reasoning into low-level machine-executable instructions (move this joint 30 degrees, change the position of this object from X to Y, and so on). Google achieved this by pretraining RT-2 on web data covering a vast range of situations the robot might encounter, allowing the robot to draw on this pretraining when novel situations arise. This makes RT-2 extremely powerful at handling unforeseen situations, making robots that run on this model much more useful as all-purpose machines.
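Google has described RT-2’s action outputs as strings of discretized integer tokens (a terminate flag, position and rotation deltas, and a gripper command) that a controller decodes back into continuous motion. A hedged sketch of what that decoding step might look like follows; the bin count, value ranges, and function names here are illustrative assumptions, not details from Google’s implementation:

```python
# Hedged sketch: decode an 8-token action string (terminate flag,
# 3 position deltas, 3 rotation deltas, gripper) back into a
# continuous end-effector command. Bin counts and value ranges
# are assumptions for illustration.

def detokenize(token_string: str, bins: int = 256,
               pos_range=(-0.1, 0.1), rot_range=(-0.5, 0.5)):
    """Map a model output like '0 128 91 241 5 101 127 200'
    to a continuous command dictionary."""
    toks = [int(t) for t in token_string.split()]
    assert len(toks) == 8, "expected 8 action tokens"

    def unbin(tok, lo, hi):
        # Center of the tok-th of `bins` equal-width buckets in [lo, hi].
        return lo + (tok + 0.5) * (hi - lo) / bins

    return {
        "terminate": bool(toks[0]),
        "delta_position": [unbin(t, *pos_range) for t in toks[1:4]],
        "delta_rotation": [unbin(t, *rot_range) for t in toks[4:7]],
        "gripper": unbin(toks[7], 0.0, 1.0),
    }

cmd = detokenize("0 128 91 241 5 101 127 200")
print(cmd["terminate"], cmd["delta_position"])
```

Because the action is just another text string, the same transformer that answers visual questions can emit robot commands with no change to its output interface.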

Demonstration of chain-of-thought reasoning capability and prompting like that used in LLMs.
Source: Google DeepMind: RT-2: New model translates vision and language into action

RT-2 is built on previous work from Google and others. Google calls RT-2 the first vision-language-action (VLA) model, meaning it can translate easily between visuals, language, and robotic action. VLA models are built on top of VLMs, an earlier class of vision-language models trained on web-scale datasets to translate between images and language. In essence, RT-2 combines VLM pretraining with robot data that allows it to directly control a robot.
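The combination described above amounts to co-fine-tuning: training batches mix web-scale vision-language examples with robot trajectory examples so the model keeps its general knowledge while learning to emit actions. The sketch below illustrates the idea only; the sample data, field names, and `robot_fraction` mixing ratio are assumptions, not details of Google’s training setup:

```python
# Sketch of co-fine-tuning batch construction: web VQA examples and
# robot trajectories share one (image, text) -> text interface, so
# they can be mixed freely in a batch. All data and the mixing
# ratio are illustrative assumptions.
import random

web_vqa = [
    ("photo of a kitchen", "What is on the counter?", "a mug"),
    ("photo of a park", "What is the dog holding?", "a stick"),
]
robot_data = [
    ("camera frame", "pick up the bag of chips",
     "0 132 88 240 4 99 130 255"),
]

def sample_batch(batch_size=4, robot_fraction=0.5, rng=random):
    batch = []
    for _ in range(batch_size):
        source = robot_data if rng.random() < robot_fraction else web_vqa
        image, prompt, target = rng.choice(source)
        # For robot examples, the target text *is* the action string,
        # so no separate action head is needed.
        batch.append({"image": image, "prompt": prompt, "target": target})
    return batch
```

Treating actions as just another text target is what lets one model serve both as a question answerer and a robot controller.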

It’s also worth noting that RT-2 and other VLAs are transformer models, a class of machine learning models that also includes ChatGPT and most LLMs. Transformers are great at transferring “learned concepts” from their training data to unforeseen scenarios. This is why ChatGPT can still answer specific questions it has never seen in its training: it’s able to generalize.

While RT-2 is still a research project rather than a product, there’s a clear pathway for this technology to reach the market. It can reduce failure rates and increase the flexibility of robots deployed in healthcare, manufacturing, and other commercial environments. It could eventually power autonomous robots like Boston Dynamics’ Spot, which is already used at accident scenes and by the military, as well as drones across various industries. It may also play a role in autonomous vehicles. And, of course, it lays the foundation for robot companions that can help us with everyday tasks.

I’m excited to see how innovation in AI and transformer models continues to trickle down into other fields, including robotics. RT-2 is no doubt a huge step toward the sci-fi vision of robots that are actually useful day to day.



Toni Witt

Co-founder, Sweet
Cloud Wars analyst

Areas of Expertise
  • AI/ML
  • Entrepreneurship
  • Partners Ecosystem

In addition to keeping up with the latest in AI and corporate innovation, Toni Witt co-founded Sweet, a startup redefining hospitality through zero-fee payments infrastructure. He also runs a nonprofit community of young entrepreneurs, influencers, and change-makers called GENESIS. Toni brings his analyst perspective to Cloud Wars on AI, machine learning, and other related innovative technologies.

