How to Build a Voice Interaction Model in ROS 2
- ROS 2, Tutoriales

Voice control has become one of the most natural and intuitive ways to interact with robots. From simple voice commands to complex conversational systems, enabling a robot to understand and respond to spoken language opens the door to more human-like and user-friendly experiences.
In this tutorial, we will build a Voice Interaction Model in ROS 2 that listens to a voice command, processes it through speech recognition, and triggers a robot’s action accordingly.
We will go step-by-step — first understanding the theory, then implementing the system with complete ROS 2 code examples. As always, I’ll provide the full code first, and then we’ll go into a Let’s break down the code section where we dissect each line to understand what it does and why.
1. Understanding the Voice Interaction Pipeline
A voice interaction model in robotics typically consists of three main components:
Audio Capture — getting the voice signal from a microphone.
Speech Recognition — converting the audio to text (using online or offline engines).
Command Interpretation and Execution — parsing the recognized text and triggering robot actions.
In ROS 2, these components can be implemented as separate nodes or integrated into a single node depending on the complexity of the system.
2. Implementing the Voice Interaction Node
Here’s the full Python implementation of our ROS 2 voice interaction node:
import rclpy
from rclpy.node import Node
import speech_recognition as sr
from std_msgs.msg import String
class VoiceInteractionNode(Node):
def __init__(self):
super().__init__('voice_interaction_node')
self.publisher_ = self.create_publisher(String, 'voice_commands', 10)
self.timer = self.create_timer(1.0, self.listen_command)
self.recognizer = sr.Recognizer()
self.microphone = sr.Microphone()
self.get_logger().info("Voice Interaction Node started and listening...")
def listen_command(self):
with self.microphone as source:
self.get_logger().info("Listening for a command...")
audio = self.recognizer.listen(source)
try:
command = self.recognizer.recognize_google(audio)
self.get_logger().info(f"Recognized command: {command}")
msg = String()
msg.data = command
self.publisher_.publish(msg)
except sr.UnknownValueError:
self.get_logger().warn("Could not understand the audio.")
except sr.RequestError as e:
self.get_logger().error(f"Speech Recognition service error: {e}")
def main(args=None):
rclpy.init(args=args)
node = VoiceInteractionNode()
rclpy.spin(node)
node.destroy_node()
rclpy.shutdown()
if __name__ == '__main__':
main()
Let’s break down the code
import rclpy
from rclpy.node import Node
import speech_recognition as sr
from std_msgs.msg import String
We import:
rclpy — ROS 2 Python client library.
Node — base class for creating ROS 2 nodes.
speech_recognition — Python library for speech-to-text.
String — ROS 2 standard message type for publishing recognized text.
Node Initialization
class VoiceInteractionNode(Node):
def __init__(self):
super().__init__('voice_interaction_node')
We create a ROS 2 node named "voice_interaction_node"
.
Publisher
self.publisher_ = self.create_publisher(String, 'voice_commands', 10)
This publishes recognized commands on the voice_commands
topic.
Timer Callback
self.timer = self.create_timer(1.0, self.listen_command)
Every 1 second, the node calls the listen_command()
function to capture audio.
Speech Recognition Setup
self.recognizer = sr.Recognizer()
self.microphone = sr.Microphone()
We initialize the speech recognizer and the default microphone input.
Listening for Commands
with self.microphone as source:
self.get_logger().info("Listening for a command...")
audio = self.recognizer.listen(source)
We open the microphone and listen for audio input.
Recognizing Speech
command = self.recognizer.recognize_google(audio)
We use Google’s speech recognition API to transcribe the audio.
Publishing Commands
msg = String()
msg.data = command
self.publisher_.publish(msg)
The recognized text is published as a ROS 2 String
message.
Error Handling
We catch two exceptions:
UnknownValueError
— when speech cannot be recognized.RequestError
— when there’s an issue contacting the speech recognition service.
Main Entry Point
if __name__ == '__main__':
main()
Runs the ROS 2 node.
3. Testing the Voice Interaction Node
Once you have the code ready, it’s time to test your voice interaction model.
Step 1 – Install Dependencies
Make sure you have the required Python packages installed:
pip install SpeechRecognition
sudo apt install portaudio19-dev
pip install pyaudio
pip install SpeechRecognition
→ Installs the Python speech recognition library.sudo apt install portaudio19-dev
→ Installs audio device support needed for microphone input.pip install pyaudio
→ Provides the interface between Python and PortAudio.
Step 2 – Run the Node
In one terminal, source your workspace and run:
. install/setup.bash
ros2 run voice_interaction_pkg voice_interaction_node
Let’s break down the code
. install/setup.bash
→ Sources your ROS 2 workspace so ROS knows about your package.ros2 run voice_interaction_pkg voice_interaction_node
→ Executes the node from your package.
Step 3 – Subscribe to the Commands
In another terminal:
. install/setup.bash
ros2 topic echo /voice_commands
ros2 topic echo /voice_commands
→ Displays every message published to the/voice_commands
topic in real time.
You should see the exact text of the voice command you spoke.
4. Integrating with Robot Control
Recognizing voice commands is just the first step. The real power comes when you connect these commands to robot actions.
Let’s say you have a robot that can move forward, turn, and stop. We’ll create a command interpreter node that listens to /voice_commands
and sends velocity commands.
Full Command Interpreter Node (Python)
import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from geometry_msgs.msg import Twist
class CommandInterpreter(Node):
def __init__(self):
super().__init__('command_interpreter')
self.subscription = self.create_subscription(
String,
'voice_commands',
self.listener_callback,
10)
self.publisher_ = self.create_publisher(Twist, 'cmd_vel', 10)
self.get_logger().info("Command Interpreter Node started.")
def listener_callback(self, msg):
command = msg.data.lower()
twist = Twist()
if "forward" in command:
twist.linear.x = 0.2
elif "backward" in command:
twist.linear.x = -0.2
elif "left" in command:
twist.angular.z = 0.5
elif "right" in command:
twist.angular.z = -0.5
elif "stop" in command:
twist.linear.x = 0.0
twist.angular.z = 0.0
self.publisher_.publish(twist)
self.get_logger().info(f"Executing command: {command}")
def main(args=None):
rclpy.init(args=args)
node = CommandInterpreter()
rclpy.spin(node)
node.destroy_node()
rclpy.shutdown()
if __name__ == '__main__':
main()
🧠Let’s break down the code
Subscribing to Voice Commands
self.subscription = self.create_subscription(
String,
'voice_commands',
self.listener_callback,
10)
Listens to the
voice_commands
topic.When a new message arrives, it calls
listener_callback
.
Publishing Velocity Commands
self.publisher_ = self.create_publisher(Twist, 'cmd_vel', 10)
- Publishes robot velocity messages to
cmd_vel
.
Parsing Commands
command = msg.data.lower()
Converts recognized command to lowercase for easy matching.
Movement Logic
"forward"
→ Moves straight ahead at0.2 m/s
."backward"
→ Moves backward."left"
→ Rotates counter-clockwise."right"
→ Rotates clockwise."stop"
→ Stops all movement.
Publishing
self.publisher_.publish(twist)
Sends the velocity command to the robot.
5. Testing the Full System
Once both nodes are running:
Start
voice_interaction_node
to capture and recognize speech.Start
command_interpreter
to interpret and act on those commands.Speak commands like:
“Forward” → Robot moves ahead.
“Left” → Robot rotates left.
“Stop” → Robot stops.
6. Possible Improvements
Offline Recognition — Use Vosk or PocketSphinx to avoid reliance on internet connection.
Noise Filtering — Apply audio preprocessing to improve recognition in noisy environments.
Multi-Language Support — Enable commands in multiple languages.
Context Awareness — Different actions depending on the robot’s current state.
By following this tutorial, you’ve created a complete voice interaction system in ROS 2 — from capturing audio to making a robot move based on what you say.
Voice control brings an extra layer of intuitiveness to human-robot interaction, making robots more accessible and engaging. This foundation can be expanded into more sophisticated systems, from conversational AI to voice-driven task planning.