List of disagreements in AI safety

From Issawiki

Revision as of 23:27, 23 February 2020

This is a list of disagreements in AI safety: it collects the things that people in AI safety seem to disagree about most frequently and most deeply.

Many of the items are from "Clarifying some key hypotheses in AI alignment"[1] (there are more posts like this, I think? find them)

  • how doomed ML safety approaches are; see e.g. the discussion at https://www.greaterwrong.com/posts/suxvE2ddnYMPJN9HD/realism-about-rationality#comment-32dCL2u6p8L8td9BA and the page How doomed are ML safety approaches?
    • the roughly opposite question: how doomed is MIRI's approach? If there turns out to be no simple core algorithm for agency, or if understanding agency better doesn't help us build an AGI, then we might be no better placed to align AI.
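  • can MIRI-type research be done in time to help with AGI? See this comment: https://www.greaterwrong.com/posts/suxvE2ddnYMPJN9HD/realism-about-rationality#comment-Dk5LmWMEL55ufkTB5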
  • prior on the difficulty of alignment, and ideas like "if ML-based safety were to have any shot at working, wouldn't we just go all the way and expect the default (no EA intervention) approach to AGI to just produce basically ok outcomes?"
  • probability of doom
  • civilizational adequacy
  • probability of doom without any special EA intervention
  • shape of takeoff
  • what precursors/narrow systems we will see prior to AGI
  • AI timelines
  • what the first AGI will look like
  • how big of a problem collusion between subsystems of an AI will be
  • how likely optimization daemons/mesa-optimizers are, and what they will look like
  • whether there is a basin of attraction for corrigibility
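  • something-like-realism-about-rationality, e.g. "Is there a theory of rationality that is sufficiently precise to build hierarchies of abstraction?" (https://www.greaterwrong.com/posts/suxvE2ddnYMPJN9HD/realism-about-rationality#comment-YMNwHcPNPd4pDK7MR)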
  • whether MIRI-type work can be done in time
  • whether ML-based approaches are doomed
  • whether "weird recursions" are a good idea
  • whether we can correct mistakes when deploying AI systems as they come up (i.e. how catastrophic the initial problems will be)
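  • how many insights are needed to create an AGI, and how "lumpy" those insights are
    • "the degree of complexity of useful combination, and the degree to which a simple general architecture search and generation process can find such useful combinations for particular tasks" (https://www.facebook.com/yudkowsky/posts/10155848910529228?comment_id=10155848951264228&reply_comment_id=10155849315174228)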
  • how much sharing/trading there will be between different AI companies (Eliezer Yudkowsky vs Robin Hanson). This one is downstream of the lumpiness of insights: Hanson expects that if very few insights are needed to get to AGI, then there won't be any need for sharing (so in that case even Hanson would agree with Eliezer).
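  • how important it is to get the right architecture, e.g. "That is what I meant by suggesting that architecture isn’t the key to AGI." (https://www.greaterwrong.com/posts/D3NspiH2nhKA6B2PE/what-evidence-is-alphago-zero-re-agi-complexity). Dario Amodei's comment (https://www.facebook.com/yudkowsky/posts/10155848910529228?comment_id=10155849004324228&reply_comment_id=10155849068769228) takes the opposite view.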
  • is it possible to turn a small lead in AGI development into a big lead?
  • will AGI be agent-like?
  • will AGI look like a utility maximizer?
  • will AGI appear rational to humans? (efficient relative to humans)
  • will current ML techniques scale to AGI?
  • will there be small-scale AI failures prior to the end of the world?
  • will failure be conspicuous?
  • how likely is a treacherous turn/context change type of failure?
  • how much overlap is there between AI capabilities work and safety work? (e.g. is it reasonable to say things like "making progress on safety requires advancing capabilities"?)
  • what will failure look like? Yudkowskian takeover vs Paul Christiano's "we get what we measure, and our ability to get what we specify outstrips our ability to measure what we truly want" vs Paul Christiano's influence-seeking optimizers/daemons vs ...
  • how strong of a guarantee do we need for the safety of AI? proof-level (does anyone actually argue this?) vs security mindset vs whatever ML safety people believe
  • are deep insights needed to build an aligned AGI? It might not take deep insights to build just any AGI, but still take deep insights to build an aligned one; see also Different senses of claims about AGI.
  • how useful is each kind of research, e.g. Paul Christiano's vs MIRI's?
  • what lessons can we learn by looking at the evolutionary history of chimps vs humans?
  • what lessons can we learn from AlphaGo?
  • is prosaic AI possible? See https://srconstantin.wordpress.com/2017/02/21/strong-ai-isnt-here-yet/ for a post arguing against.
  • how short is the window between "clearly infrahuman" and "clearly superhuman" for important real-world tasks like "doing AI research"?

See also

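References

  1. Clarifying some key hypotheses in AI alignment. https://drive.google.com/file/d/1wI21XP-lRa6mi5h0dq_USooz0LpysdhS/view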