Abstract: The computational framework of inverse reinforcement learning (IRL) attempts to model human behavior as reward maximization by inferring the "reward(s)" that people appear to be pursuing. This reward model can predict future behavior and serve as a description of "what humans want" that can guide an autonomous system attempting to be helpful to users. Reward models like these underlie many of the machine-learning systems we interact with every day, from recommender systems to social media to autonomous vehicles to large language models like ChatGPT. Despite recent progress, much work remains at the intersection of computer science, cognitive science, neuroscience, economics, and philosophy to understand and model human rewards and preferences. I will talk about current approaches and open problems in this vital and interdisciplinary space.
Seminar
US Mountain Time
Speaker:
Brian Christian
Our campus is closed to the public for this event.
Brian Christian, Author
SFI Host:
Melanie Mitchell