Aligning smart AI using slightly less smart AI
One reason some researchers focused on machine learning safety give for their relative optimism about the difficulty of AI alignment is that humans would not need to align a superintelligence directly. Instead, we would only need to align AI systems slightly smarter than ourselves; each "generation" of aligned systems would then align the next, slightly smarter generation, and so on.
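A minimal, purely illustrative sketch of the chain this strategy imagines is below. All names here (`Model`, `align`, `capability`) are hypothetical placeholders rather than a real alignment procedure; the whole strategy hinges on the assumption, encoded in `align`, that an aligned overseer can reliably align a system only slightly more capable than itself.

```python
# Toy sketch of bootstrapped alignment across generations.
# Hypothetical names and thresholds; not an actual alignment method.
from dataclasses import dataclass

@dataclass
class Model:
    capability: float  # how capable this generation is (humans normalized to 1.0)
    aligned: bool      # whether its goals match its overseer's intent

def align(overseer: Model, student: Model) -> Model:
    # Core assumption of the strategy: an aligned overseer that is only
    # slightly less capable than its student can still steer and verify it.
    small_gap = (student.capability - overseer.capability) < 1.0
    student.aligned = overseer.aligned and small_gap
    return student

# Humans are the base case; each generation then aligns the next one.
overseer = Model(capability=1.0, aligned=True)
for generation in range(1, 6):
    student = Model(capability=overseer.capability + 0.5, aligned=False)
    overseer = align(overseer, student)
    print(f"gen {generation}: capability={overseer.capability:.1f}, aligned={overseer.aligned}")
```

In this toy picture alignment propagates up the chain only as long as every capability gap stays small, which is exactly the condition the strategy bets on.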
External links
- Paul Christiano makes the argument here
- Richard Ngo brings up this argument in [1]