Course Contents
This advanced seminar introduces fundamental algorithms for [b]Robotic Embodied AI Systems (REAIS)[/b] that can autonomously perceive, navigate, and manipulate objects in unstructured environments such as homes, restaurants, and supermarkets.
It addresses the complex and timely challenge of understanding and developing intelligent robotic agents that can interact with and change their world. The seminar will discuss fundamental problems in embodied AI and robotics that connect [b]Multimodal Perception to Action[/b].
The seminar will combine an introductory lecture and a reading group to discuss and learn about advanced algorithmic approaches in robotics and embodied AI.
This semester, the theme of the seminar is "[b]Interactive Robot Perception and Learning[/b]".
A tentative list of papers includes:
Synergies Between Affordance and Geometry: 6-DoF Grasp Detection via Implicit Representations https://arxiv.org/abs/2104.01542
Learning Agent-Aware Affordances for Closed-Loop Interaction with Articulated Objects https://arxiv.org/abs/2209.05802
Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models https://arxiv.org/abs/2207.11514
The (Un)Surprising Effectiveness of Pre-Trained Vision Models for Control https://arxiv.org/abs/2203.03580
R3M: A Universal Visual Representation for Robot Manipulation https://arxiv.org/abs/2203.12601
Real-World Robot Learning with Masked Visual Pre-training https://arxiv.org/abs/2210.03109
Offline Visual Representation Learning for Embodied Navigation https://arxiv.org/abs/2204.13226
The Surprising Effectiveness of Representation Learning for Visual Imitation https://arxiv.org/abs/2112.01511
VideoDex: Learning Dexterity from Internet Videos https://arxiv.org/abs/2212.04498
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances https://arxiv.org/abs/2204.01691
CLIPort: What and Where Pathways for Robotic Manipulation https://arxiv.org/abs/2109.12098
VIMA: General Robot Manipulation with Multimodal Prompts https://arxiv.org/abs/2210.03094
GATO: A Generalist Agent https://arxiv.org/abs/2205.06175
PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training https://arxiv.org/abs/2209.11133
Learning Universal Policies via Text-Guided Video Generation https://arxiv.org/abs/2302.00111
Literature
We recommend watching the online course on Modern Robotics: [url]https://youtube.com/playlist?list=PLggLP4f-rq02vX0OQQ5vrCxbJrzamYDfx[/url]
Preconditions
Recommended:
Students should have fundamental knowledge of robotics and linear algebra. The courses Fundamentals of Robotics, Robot Learning, and/or Computer Vision I are also recommended.
Online Offerings
Moodle.
- Instructor: Georgia Chalvatzaki
Semester: ST 2024