As AI systems surpass human capabilities and progress toward superintelligence, ensuring their alignment with human values becomes paramount. These systems will need to act on our behalf in situations we cannot foresee, which requires superalignment. In this talk, we explore how to imbue such values into systems more intelligent than ourselves through multi-agent approaches.
Akbir Khan is a member of the technical staff at Anthropic, where he focuses on building safe superintelligence. His research centers on Scalable Oversight techniques, primarily through multi-agent learning approaches. His recent work on debate received a Best Paper Award at ICML 2024.