Difference between revisions of "Competence gap"
(→notes) |
(2 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
− | '''Competence gap''' is the gap between an AI system's ability design better | + | '''Competence gap''' is the gap between an AI system's ability design better (not necessarily aligned) AI systems and its ability to solve alignment problems (i.e. design better ''aligned'' AI systems). |
==History== | ==History== | ||
Line 12: | Line 12: | ||
in a situation where AI algorithms are creating other AI algorithms (this includes recursive self-improvement, but is also more general/relaxed), to what extent will the AI be helping with alignment (rather than just pushing forward capabilities)? how big will the "competence gap" be? [https://arbital.com/p/KANSI/?l=1fy#subpage-1h6] [https://agentfoundations.org/item?id=64] If there is a big competence gap, this leads to the situation [[Nate]] described [https://eaforum.issarice.com/posts/SEL9PW8jozrvLnkb4/my-current-thoughts-on-miri-s-highly-reliable-agent-design#Z6TbXivpjxWyc8NYM]: "your team runs into an alignment roadblock and can easily tell that the system is currently misaligned, but can’t figure out how to achieve alignment in any reasonable amount of time." i.e. paul's approach gets to like aligned IQ 80 AIs or whatever, then when it tries to train IQ 81 AIs, we get alignment problems, but the IQ 80 AIs can't really help us align the IQ 81 AIs, and humans can't solve this in a reasonable amount of time either. | in a situation where AI algorithms are creating other AI algorithms (this includes recursive self-improvement, but is also more general/relaxed), to what extent will the AI be helping with alignment (rather than just pushing forward capabilities)? how big will the "competence gap" be? [https://arbital.com/p/KANSI/?l=1fy#subpage-1h6] [https://agentfoundations.org/item?id=64] If there is a big competence gap, this leads to the situation [[Nate]] described [https://eaforum.issarice.com/posts/SEL9PW8jozrvLnkb4/my-current-thoughts-on-miri-s-highly-reliable-agent-design#Z6TbXivpjxWyc8NYM]: "your team runs into an alignment roadblock and can easily tell that the system is currently misaligned, but can’t figure out how to achieve alignment in any reasonable amount of time." i.e. 
paul's approach gets to like aligned IQ 80 AIs or whatever, then when it tries to train IQ 81 AIs, we get alignment problems, but the IQ 80 AIs can't really help us align the IQ 81 AIs, and humans can't solve this in a reasonable amount of time either. | ||
+ | |||
+ | Maybe another question to ask, or possibly an equivalent phrasing, is "will AIs be doing most of the safety work in the future?" (e.g. see [https://www.greaterwrong.com/posts/HekjhtWesBWTQW5eF/agis-as-populations#comment-eNjvffgpR875HeuaZ] for someone who thinks safety work will be done by AIs) | ||
+ | |||
+ | "if one of the major use cases for your first advanced AI is helping to build your second advanced AI, STEM AI fails hard on that metric, as it advances our technology without also advancing our understanding of alignment. In particular, unlike every other approach on this list, because STEM AI is confined solely to STEM, it can’t be used to do alignment work." [https://lw2.issarice.com/posts/fRsjBseRuvRhMPPE5/an-overview-of-11-proposals-for-building-safe-advanced-ai] | ||
[[Category:AI safety]] | [[Category:AI safety]] |
Latest revision as of 00:01, 30 May 2020
Competence gap is the gap between an AI system's ability to design better (not necessarily aligned) AI systems and its ability to solve alignment problems (i.e. design better aligned AI systems).
History
The term seems to have first been used online by Daniel Dewey, who credits Nick Bostrom for the term. [1]
It's not clear when the concept (under a different term, or without any term being introduced) was first discussed.
notes
To what extent does Paul's approach look like humans trying to align arbitrarily large black boxes ("corralling hostile superintelligences"), versus humans plus pretty smart aligned AIs trying to align slightly larger black boxes? (This is somewhat analogous to Rapid capability gain vs AGI progress, where again Eliezer is imagining a big leap from just humans to suddenly superhuman AI, whereas Paul is imagining a smoother transition, which is what powers his optimism.) In other words, how much easier is it to align large black boxes if we have pretty smart aligned AIs to help us? [2] [3]
In a situation where AI algorithms are creating other AI algorithms (this includes recursive self-improvement, but is also more general/relaxed), to what extent will the AI be helping with alignment (rather than just pushing forward capabilities)? How big will the "competence gap" be? [4] [5] If there is a big competence gap, this leads to the situation Nate described [6]: "your team runs into an alignment roadblock and can easily tell that the system is currently misaligned, but can’t figure out how to achieve alignment in any reasonable amount of time." For example, suppose Paul's approach gets us to aligned IQ 80 AIs. When it then tries to train IQ 81 AIs, we get alignment problems, but the IQ 80 AIs can't really help us align the IQ 81 AIs, and humans can't solve this in a reasonable amount of time either.
Maybe another question to ask, or possibly an equivalent phrasing, is "will AIs be doing most of the safety work in the future?" (e.g. see [7] for someone who thinks safety work will be done by AIs).
"if one of the major use cases for your first advanced AI is helping to build your second advanced AI, STEM AI fails hard on that metric, as it advances our technology without also advancing our understanding of alignment. In particular, unlike every other approach on this list, because STEM AI is confined solely to STEM, it can’t be used to do alignment work." [8]