Difference between revisions of "Future planning"
Line 11: | Line 11: | ||
* can MIRI-type research be done in time to help with AGI? see [https://www.greaterwrong.com/posts/suxvE2ddnYMPJN9HD/realism-about-rationality#comment-Dk5LmWMEL55ufkTB5 this comment] | * can MIRI-type research be done in time to help with AGI? see [https://www.greaterwrong.com/posts/suxvE2ddnYMPJN9HD/realism-about-rationality#comment-Dk5LmWMEL55ufkTB5 this comment] | ||
* prior on difficulty of alignment, and ideas like "if ML-based safety were to have any shot at working, wouldn't we just go all the way and expect the default (no EA intervention) approach to AGI to just produce basically ok outcomes?" | * prior on difficulty of alignment, and ideas like "if ML-based safety were to have any shot at working, wouldn't we just go all the way and expect the default (no EA intervention) approach to AGI to just produce basically ok outcomes?" | ||
− | * list of things people disagree about: | + | * list of things people disagree about:<ref>https://drive.google.com/file/d/1wI21XP-lRa6mi5h0dq_USooz0LpysdhS/view</ref> |
** probability of doom | ** probability of doom | ||
** civilizational adequacy | ** civilizational adequacy | ||
Line 21: | Line 21: | ||
** what the first AGI will look like | ** what the first AGI will look like | ||
** how big of a problem collusion between subsystems of an AI will be | ** how big of a problem collusion between subsystems of an AI will be | ||
− | ** how likely optimization daemons are or what they will look like | + | ** how likely optimization daemons/mesa-optimizers are or what they will look like |
− | ** whether there is a | + | ** whether there is a basin of attraction for corrigibility |
− | |||
** something-like-realism-about-rationality | ** something-like-realism-about-rationality | ||
** whether MIRI-type work can be done in time | ** whether MIRI-type work can be done in time | ||
Line 34: | Line 33: | ||
** how important it is to get the right architecture e.g. "That is what I meant by suggesting that architecture isn’t the key to AGI." [https://www.greaterwrong.com/posts/D3NspiH2nhKA6B2PE/what-evidence-is-alphago-zero-re-agi-complexity]. There is [[Dario Amodei]]'s comment [https://www.facebook.com/yudkowsky/posts/10155848910529228?comment_id=10155849004324228&reply_comment_id=10155849068769228 here] which is the opposite view. | ** how important it is to get the right architecture e.g. "That is what I meant by suggesting that architecture isn’t the key to AGI." [https://www.greaterwrong.com/posts/D3NspiH2nhKA6B2PE/what-evidence-is-alphago-zero-re-agi-complexity]. There is [[Dario Amodei]]'s comment [https://www.facebook.com/yudkowsky/posts/10155848910529228?comment_id=10155849004324228&reply_comment_id=10155849068769228 here] which is the opposite view. | ||
** is it possible to turn a small lead in AGI development into a big lead? | ** is it possible to turn a small lead in AGI development into a big lead? | ||
+ | ** will AGI be agent-like? | ||
+ | ** whether an AGI will look like a utility maximizer? | ||
+ | ** will AGI appear rational to humans? (efficient relative to humans) | ||
+ | ** will current ML techniques scale to AGI? | ||
+ | ** will there be small-scale AI failures prior to the end of the world? | ||
+ | ** will failure be conspicuous? | ||
+ | ** how much overlap is there between AI capabilities work and safety work? (e.g. is it reasonable to say things like "making progress on safety requires advancing capabilities"?) |
Revision as of 05:17, 23 February 2020
things to talk about:
- the most decision-relevant questions for me right now (everything else should feed into one of these questions):
- AI safety vs something else? right now AI safety seems like the best candidate for the biggest/soonest change, but I want to investigate some other things.
- if AI safety, then what technical agenda seems best? this matters for (1) deciding what to do technical research on, if at all; (2) what technical research to follow/promote/give money to.
- if AI safety, then what will the end of the world look like? (this matters for prepping)
- how likely is the end of the world?
- when will AGI come?
- how doomed ML safety approaches are e.g. see discussion here -- How doomed are ML safety approaches?
- there's the roughly opposite question: how doomed is MIRI's approach? i.e. if there turns out to be no simple core algorithm for agency, or if understanding agency better doesn't help us build an AGI, then we might not be in a better place with respect to aligning AI.
- can MIRI-type research be done in time to help with AGI? see this comment
- prior on difficulty of alignment, and ideas like "if ML-based safety were to have any shot at working, wouldn't we just go all the way and expect the default (no EA intervention) approach to AGI to just produce basically ok outcomes?"
- list of things people disagree about:[1]
  - probability of doom
  - civilizational adequacy
  - probability of doom without any special EA intervention
  - shape of takeoff
  - what precursors/narrow systems we will see prior to AGI
  - AI timelines
  - what the first AGI will look like
  - how big of a problem collusion between subsystems of an AI will be
  - how likely optimization daemons/mesa-optimizers are or what they will look like
  - whether there is a basin of attraction for corrigibility
  - something-like-realism-about-rationality
  - whether MIRI-type work can be done in time
  - whether ML-based approaches are doomed
  - whether "weird recursions" are a good idea
  - whether we can correct mistakes when deploying AI systems as they come up (i.e. how catastrophic the initial problems will be)
  - how many insights are needed to create an AGI, and how "lumpy" they are
  - "the degree of complexity of useful combination, and the degree to which a simple general architecture search and generation process can find such useful combinations for particular tasks" [1]
  - how much sharing/trading there will be between different AI companies (Eliezer vs. Robin Hanson) -- this one is downstream of lumpiness of insights, because Hanson expects that if very few insights are needed to get to AGI, then there won't be any need for sharing (so in that case even Hanson would agree with Eliezer).
  - how important it is to get the right architecture e.g. "That is what I meant by suggesting that architecture isn’t the key to AGI." [2]. There is Dario Amodei's comment here which is the opposite view.
  - is it possible to turn a small lead in AGI development into a big lead?
  - will AGI be agent-like?
  - will AGI look like a utility maximizer?
  - will AGI appear rational to humans? (efficient relative to humans)
  - will current ML techniques scale to AGI?
  - will there be small-scale AI failures prior to the end of the world?
  - will failure be conspicuous?
  - how much overlap is there between AI capabilities work and safety work? (e.g. is it reasonable to say things like "making progress on safety requires advancing capabilities"?)