How to Build a Voice Interaction Model in ROS 2

Voice control has become one of the most natural and intuitive ways to interact with robots. From simple voice commands to complex conversational systems, enabling a robot to understand and respond to spoken language opens the door to more human-like and user-friendly experiences.

In this tutorial, we will build a Voice Interaction Model in ROS 2 that listens to a voice command, processes it through speech recognition, and triggers a robot’s action accordingly.

We will go step-by-step — first understanding the theory, then implementing the system with complete ROS 2 code examples. As always, I’ll provide the full code first, and then we’ll go into a Let’s break down the code section where we dissect each line to understand what it does and why.

1. Understanding the Voice Interaction Pipeline

A voice interaction model in robotics typically consists of three main components:

  1. Audio Capture — getting the voice signal from a microphone.

  2. Speech Recognition — converting the audio to text (using online or offline engines).

  3. Command Interpretation and Execution — parsing the recognized text and triggering robot actions.

In ROS 2, these components can be implemented as separate nodes or integrated into a single node depending on the complexity of the system.

2. Implementing the Voice Interaction Node

Here’s the full Python implementation of our ROS 2 voice interaction node:

				
					import rclpy
from rclpy.node import Node
import speech_recognition as sr
from std_msgs.msg import String

class VoiceInteractionNode(Node):
    def __init__(self):
        super().__init__('voice_interaction_node')
        self.publisher_ = self.create_publisher(String, 'voice_commands', 10)
        self.timer = self.create_timer(1.0, self.listen_command)
        self.recognizer = sr.Recognizer()
        self.microphone = sr.Microphone()
        self.get_logger().info("Voice Interaction Node started and listening...")

    def listen_command(self):
        with self.microphone as source:
            self.get_logger().info("Listening for a command...")
            audio = self.recognizer.listen(source)
        try:
            command = self.recognizer.recognize_google(audio)
            self.get_logger().info(f"Recognized command: {command}")
            msg = String()
            msg.data = command
            self.publisher_.publish(msg)
        except sr.UnknownValueError:
            self.get_logger().warn("Could not understand the audio.")
        except sr.RequestError as e:
            self.get_logger().error(f"Speech Recognition service error: {e}")

def main(args=None):
    rclpy.init(args=args)
    node = VoiceInteractionNode()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

				
			

Let’s break down the code

				
					import rclpy
from rclpy.node import Node
import speech_recognition as sr
from std_msgs.msg import String

				
			

We import:

  • rclpy — ROS 2 Python client library.

  • Node — base class for creating ROS 2 nodes.

  • speech_recognition — Python library for speech-to-text.

  • String — ROS 2 standard message type for publishing recognized text.

Node Initialization

				
					class VoiceInteractionNode(Node):
    def __init__(self):
        super().__init__('voice_interaction_node')

				
			

We create a ROS 2 node named "voice_interaction_node".

Publisher

				
					self.publisher_ = self.create_publisher(String, 'voice_commands', 10)

				
			

This publishes recognized commands on the voice_commands topic.

Timer Callback

				
					self.timer = self.create_timer(1.0, self.listen_command)

				
			

Every 1 second, the node calls the listen_command() function to capture audio.

 

Speech Recognition Setup

				
					self.recognizer = sr.Recognizer()
self.microphone = sr.Microphone()

				
			

We initialize the speech recognizer and the default microphone input.

Listening for Commands

				
					with self.microphone as source:
    self.get_logger().info("Listening for a command...")
    audio = self.recognizer.listen(source)

				
			

We open the microphone and listen for audio input.

Recognizing Speech

				
					command = self.recognizer.recognize_google(audio)

				
			

We use Google’s speech recognition API to transcribe the audio.

Publishing Commands

				
					msg = String()
msg.data = command
self.publisher_.publish(msg)

				
			

The recognized text is published as a ROS 2 String message.

Error Handling

We catch two exceptions:

  • UnknownValueError — when speech cannot be recognized.

  • RequestError — when there’s an issue contacting the speech recognition service.

Main Entry Point

				
					if __name__ == '__main__':
    main()

				
			

Runs the ROS 2 node.

3. Testing the Voice Interaction Node

Once you have the code ready, it’s time to test your voice interaction model.

Step 1 – Install Dependencies

Make sure you have the required Python packages installed:

				
					pip install SpeechRecognition
sudo apt install portaudio19-dev
pip install pyaudio

				
			
  • pip install SpeechRecognition → Installs the Python speech recognition library.

  • sudo apt install portaudio19-dev → Installs audio device support needed for microphone input.

  • pip install pyaudio → Provides the interface between Python and PortAudio.

Step 2 – Run the Node

In one terminal, source your workspace and run:

				
					. install/setup.bash
ros2 run voice_interaction_pkg voice_interaction_node

				
			

Let’s break down the code

  • . install/setup.bash → Sources your ROS 2 workspace so ROS knows about your package.

  • ros2 run voice_interaction_pkg voice_interaction_node → Executes the node from your package.

Step 3 – Subscribe to the Commands

In another terminal:

				
					. install/setup.bash
ros2 topic echo /voice_commands

				
			
  • ros2 topic echo /voice_commands → Displays every message published to the /voice_commands topic in real time.

You should see the exact text of the voice command you spoke.

4. Integrating with Robot Control

Recognizing voice commands is just the first step. The real power comes when you connect these commands to robot actions.

Let’s say you have a robot that can move forward, turn, and stop. We’ll create a command interpreter node that listens to /voice_commands and sends velocity commands.

 

Full Command Interpreter Node (Python)

				
					import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from geometry_msgs.msg import Twist

class CommandInterpreter(Node):
    def __init__(self):
        super().__init__('command_interpreter')
        self.subscription = self.create_subscription(
            String,
            'voice_commands',
            self.listener_callback,
            10)
        self.publisher_ = self.create_publisher(Twist, 'cmd_vel', 10)
        self.get_logger().info("Command Interpreter Node started.")

    def listener_callback(self, msg):
        command = msg.data.lower()
        twist = Twist()

        if "forward" in command:
            twist.linear.x = 0.2
        elif "backward" in command:
            twist.linear.x = -0.2
        elif "left" in command:
            twist.angular.z = 0.5
        elif "right" in command:
            twist.angular.z = -0.5
        elif "stop" in command:
            twist.linear.x = 0.0
            twist.angular.z = 0.0

        self.publisher_.publish(twist)
        self.get_logger().info(f"Executing command: {command}")

def main(args=None):
    rclpy.init(args=args)
    node = CommandInterpreter()
    rclpy.spin(node)
    node.destroy_node()
    rclpy.shutdown()

if __name__ == '__main__':
    main()

				
			

🧠Let’s break down the code

Subscribing to Voice Commands
				
					self.subscription = self.create_subscription(
    String,
    'voice_commands',
    self.listener_callback,
    10)

				
			
  • Listens to the voice_commands topic.

  • When a new message arrives, it calls listener_callback.

Publishing Velocity Commands
				
					self.publisher_ = self.create_publisher(Twist, 'cmd_vel', 10)

				
			
  • Publishes robot velocity messages to cmd_vel.
Parsing Commands
				
					command = msg.data.lower()

				
			
  • Converts recognized command to lowercase for easy matching.

Movement Logic
  • "forward" → Moves straight ahead at 0.2 m/s.

  • "backward" → Moves backward.

  • "left" → Rotates counter-clockwise.

  • "right" → Rotates clockwise.

  • "stop" → Stops all movement.

Publishing
				
					self.publisher_.publish(twist)

				
			
  • Sends the velocity command to the robot.

5. Testing the Full System

Once both nodes are running:

  1. Start voice_interaction_node to capture and recognize speech.

  2. Start command_interpreter to interpret and act on those commands.

  3. Speak commands like:

    • “Forward” → Robot moves ahead.

    • “Left” → Robot rotates left.

    • “Stop” → Robot stops.

6. Possible Improvements

  • Offline Recognition — Use Vosk or PocketSphinx to avoid reliance on internet connection.

  • Noise Filtering — Apply audio preprocessing to improve recognition in noisy environments.

  • Multi-Language Support — Enable commands in multiple languages.

  • Context Awareness — Different actions depending on the robot’s current state.

By following this tutorial, you’ve created a complete voice interaction system in ROS 2 — from capturing audio to making a robot move based on what you say.

Voice control brings an extra layer of intuitiveness to human-robot interaction, making robots more accessible and engaging. This foundation can be expanded into more sophisticated systems, from conversational AI to voice-driven task planning.

Leave a Reply

Your email address will not be published. Required fields are marked *

en_US