Part V: Vision-Language-Action
Part Overview
Vision-Language-Action (VLA) systems combine visual perception, language understanding, and robot action generation to enable natural human-robot interaction.
Chapters
- Ch 16: VLA concepts, LLM + robotics convergence
- Ch 17: Whisper speech recognition, voice-to-action
- Ch 18: LLM planning, task decomposition, ROS mapping
- Ch 19: Multimodal interaction, gesture + voice + vision
Learning Outcomes
- Understand VLA architecture patterns
- Integrate Whisper for speech recognition
- Use LLMs for task planning
- Build multimodal interaction systems
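As a preview of the voice-to-action theme running through these chapters, the sketch below stubs out a minimal pipeline: a transcribed utterance is mapped to a robot action name. All names here (`COMMAND_MAP`, `transcript_to_action`, the action strings) are illustrative inventions; in the chapters themselves, the transcript would come from Whisper and the actions would be published to ROS topics.

```python
from __future__ import annotations

# Hypothetical phrase-to-action table; real systems would use an LLM
# planner (Ch 18) rather than fixed keyword matching.
COMMAND_MAP = {
    "pick up": "arm.grasp",
    "move to": "base.navigate",
    "stop": "base.halt",
}

def transcript_to_action(transcript: str) -> str | None:
    """Map a transcribed utterance to an action name (longest phrase first)."""
    text = transcript.lower()
    for phrase in sorted(COMMAND_MAP, key=len, reverse=True):
        if phrase in text:
            return COMMAND_MAP[phrase]
    return None  # no recognized command in the utterance

if __name__ == "__main__":
    print(transcript_to_action("Please pick up the red block"))  # arm.grasp
```

The keyword stub stands in for the two ends of the pipeline the part actually covers: Whisper replaces the string input, and an LLM planner replaces the lookup table.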
Time: 28-36 hours