Course Contents
[list]
[*]Review of Probability Theory
[*]Markov Property and Markov Decision Processes
[*]The Multi-Armed Bandit Problem vs. the Full Reinforcement Learning Problem
[*]Taxonomy of Multi-Armed Bandit Problems (e.g., Stochastic vs. Adversarial Rewards, Contextual MAB)
[*]Algorithms for Multi-Armed Bandit Problems (e.g., Upper Confidence Bound (UCB), Epsilon-Greedy, Softmax, LinUCB) and their Application to Cyber-Physical Networking
[*]Fundamentals of Dynamic Programming and Bellman Equations
[*]Taxonomy of Approaches for the Full Reinforcement Learning Problem (e.g., Temporal-Difference Learning, Policy Gradient and Actor-Critic)
[*]Algorithms for the Full Reinforcement Learning Problem (e.g., Q-Learning, SARSA, Policy Gradient, Actor-Critic) and their Application to Cyber-Physical Networking
[*]Linear Function Approximation
[*]Non-linear Function Approximation
[/list]

Literature
[list]
[*]Richard S. Sutton and Andrew G. Barto, "Reinforcement Learning: An Introduction", A Bradford Book, Cambridge, MA, USA, 2018.
[*]Aleksandrs Slivkins, "Introduction to Multi-Armed Bandits", Foundations and Trends in Machine Learning, Vol. 12, No. 1-2, 2019.
[/list]

Prerequisites
[list]
[*]Basic knowledge of Python or MATLAB
[*]Engineering mathematics and probability theory
[/list]

Semester: Summer 2023