Watch Richard Ngo, from OpenAI's Policy Frontiers team, explore different types of AI alignment goals at our recent Intelligent Cooperation workshop.
Ngo categorizes alignment goals into single-single, single-multi, multi-single, and multi-multi alignment, each with its own considerations. He focuses on single-single alignment and the question of which aspects of a human's goals or values the AI should be aligned with, noting a spectrum of options ranging from literal instructions to idealized values. The central challenge, as he frames it, lies in balancing obedience against paternalism. He proposes empowerment as a principled approach to this tension: the AI empowers its user to make long-term choices and execute plans without incoherent or contradictory goals. He concludes that treating empowerment as a precise goal for single-single alignment can help balance competing alignment objectives and avoid conflicts between what users want and what the AI nudges them toward.
See the full talk here:
https://lnkd.in/dUynUWHT