Iterated amplification
Latest revision as of 03:58, 26 April 2020
Iterated amplification (also called iterated distillation and amplification, and abbreviated IDA) is the technical alignment agenda that Paul Christiano works on.
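As a rough illustration of the amplify-then-distill loop the name refers to, here is a toy Python sketch. It is not Paul's implementation. The task (summing a tuple of integers), the hard-coded split-in-half rule standing in for the human overseer, and the memorizing `distill` standing in for supervised training are all illustrative assumptions, and every function name is hypothetical. In this toy, each round the overseer answers a question by splitting it in half and consulting the current model on each half (amplification). A new model is then trained to imitate those amplified answers (distillation).

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Question = Tuple[int, ...]
Model = Callable[[Question], Optional[int]]  # None means "doesn't know"


def base_model(q: Question) -> Optional[int]:
    """Hypothetical weak starting model: only answers trivial questions."""
    if len(q) == 0:
        return 0
    if len(q) == 1:
        return q[0]
    return None


def amplify(model: Model) -> Model:
    """Toy overseer: split the question in half, ask the model about each
    half, and add the two answers. Stands in for a human consulting copies
    of the current model (an assumption, not a real overseer)."""
    def amplified(q: Question) -> Optional[int]:
        if len(q) <= 1:
            return model(q)
        mid = len(q) // 2
        left, right = model(q[:mid]), model(q[mid:])
        if left is None or right is None:
            return None
        return left + right
    return amplified


def distill(teacher: Model, training_questions: Iterable[Question]) -> Model:
    """Toy 'training': memorize the teacher's answers on the training set.
    A real system would fit a learned model instead of a lookup table."""
    table: Dict[Question, Optional[int]] = {q: teacher(q) for q in training_questions}

    def student(q: Question) -> Optional[int]:
        return table.get(q)

    return student


def iterated_amplification(target: Question, rounds: int) -> Model:
    """Alternate amplification and distillation for a fixed number of rounds,
    training on every contiguous slice of the target question."""
    training = [target[i:j] for i in range(len(target))
                for j in range(i + 1, len(target) + 1)]
    model: Model = base_model
    for _ in range(rounds):
        model = distill(amplify(model), training)
    return model


if __name__ == "__main__":
    target = (3, 1, 4, 1, 5, 9, 2, 6)
    print(iterated_amplification(target, 2)(target))  # None: only slices of length <= 4 are learned
    print(iterated_amplification(target, 3)(target))  # 31
```

Because each amplification step combines two answers the current model already knows, the length of question the distilled model can answer doubles each round, so the 8-element example needs three rounds.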
Terminology (not all of it specific to IDA, but these terms are frequently used by Paul):
- informed oversight
- adequate oversight
- overseer
- bandwidth of the overseer, high bandwidth oversight, low bandwidth oversight
- reward engineering
- HCH, Strong HCH, Weak HCH, Humans consulting HCH
- amplification
- capability amplification
- distillation
- factored cognition, factored evaluation, factored generation
- corrigibility
- benign
- aligned
- robustness
- red teaming
- ALBA
- optimization daemons
- act-based agent vs goal-directed agent
- approval-directed agent
- steering problem
- prosaic AI
- bootstrapping
- catastrophe
- reliability amplification
- security amplification
- universality
- narrow value learning vs ambitious value learning
- learning with catastrophes, optimizing worst-case performance