Late 2021 MIRI conversations

https://www.lesswrong.com/s/n945eovrA3oDueqtq

{| class="wikitable"
! Title !! Summary !! My thoughts !! Further reading/keywords
|-
| [https://www.lesswrong.com/posts/vwLxd6hhFvPbvKmBH/yudkowsky-and-christiano-discuss-takeoff-speeds Yudkowsky and Christiano discuss "Takeoff Speeds"]
|
|
|
|-
| [https://www.lesswrong.com/s/n945eovrA3oDueqtq/p/hwxj4gieR7FWNwYfa Ngo and Yudkowsky on AI capability gains]
| This conversation jumps between a lot of different topics, including both object-level points and meta points (about how one ought to reason in these kinds of situations). Some of the main topics covered are: (1) recursive self-improvement and consequentialism (Eliezer defends the position that these are useful abstractions that apply to the real world, whereas Richard thinks they are wrong, or at least messier to apply to the real world than Eliezer believes, or at the very least that the validity of these concepts has not been comprehensively argued for by Eliezer); (2) a meta-disagreement about what Eliezer got wrong in the [[Hanson-Yudkowsky debate]] (Eliezer thinks he made a 'correct argument about a different subject' rather than an 'incorrect argument about the correct subject', and thinks he didn't properly take into account the [[law of earlier failure]] -- Hanson's arguments failed much more prosaically than Eliezer expected; Richard, on the other hand, believes Eliezer's error is that the abstractions of consequentialism/recursive self-improvement are messier to apply to the real world than Eliezer expected); (3) the concept of utility (Eliezer defends the concept as useful, coherent, and applicable to analyzing advanced AI; Richard is skeptical and keeps pressing for the advance predictions the concept has produced); (4) the competence of government response in handling the AI situation (Eliezer is generally pessimistic, analogizing to COVID and the [[wikipedia:Subprime mortgage crisis|subprime mortgage crisis]]; Richard thinks Eliezer is cherry-picking the negative data points); (5) whether there will be a period of rapid economic progress from "pre-scary" AI before "scary" cognition appears (Eliezer doesn't think this is likely, but also holds that there is no single principle prohibiting "hanging around" for, say, 5 years at a level where GDP goes up dramatically but the world doesn't end -- multiple smaller things would each have to go unexpectedly for it to happen, but it can still happen; Richard doesn't really say what his opinion is, but a lot of his hope seems to lie here?). Eliezer also reveals the least impressive technology that his model rules out before the end of the world, namely [[copy-pasting strawberries]].
| I found this conversation frustrating to read. As the participants themselves noted, the conversation spends too much time debating meta principles and not enough time on object-level disagreements. Richard also didn't state his own opinions most of the time, making it difficult to see where his questions are coming from or why he is choosing particular lines of questioning. This makes the conversation feel "jerky". I actually thought [[Nate Soares]] provided a lot of good framing/moderation, and wish he had been more heavy-handed in proposing topics and getting the main participants to stick to the object level. The main cruxes seem to be the ones Nate brought up: (1) can "non-scary" cognition end the acute risk period? (Eliezer thinks no, that you need a really powerful AI, whereas Richard seems to think yes, which is why he asks what Eliezer thinks is the hardest thing you can do with a "non-scary" AI.) (2) Will there be a longish regime of "pre-scary" cognition that we can study and learn from to help better align "scary" cognition? (Eliezer would be surprised to see such a period where we just "hang around", whereas Richard seems to think it is likely.)
|
|-
| [https://www.lesswrong.com/s/n945eovrA3oDueqtq/p/gf9hhmSvpZfyfS34B Ngo's view on alignment difficulty]
| [[Richard Ngo]] puts forth his own case for why he is more optimistic (compared to [[Eliezer Yudkowsky]]) about humanity handling the creation of [[AGI]] well. Ngo's case relies on several points: (1) he expects a [[continuous takeoff]] where more and more tasks are automated (including systems that can "only answer questions" but do so at human level) without achieving AGI; (2) the difficulty of achieving AGI (he distinguishes between task-based reinforcement learning and open-ended reinforcement learning, and says the latter is what leads to AI catastrophe, but also that the latter is much more difficult due to the slowness of real-world feedback and the difficulty of creating sufficiently rich artificial environments); (3) "The US and China preventing any other country from becoming a leader in AI requires about as much competent power as banning chemical/biological weapons"; (4) there is enough competent power at the level of 'banning chemical/biological weapons'; (5) this competent power will be used to halt progress on AI outside a US-China collaboration (?) (the optimism here relies on (1): continuous takeoff allows compelling cases of misalignment to occur and convince governments). His actual case (which is only implicit in the post, so I am reading between the lines here) is then something like: the difficulty of AGI buys us some time (2); meanwhile, progress on task-based/narrow AI will continue (1) and will produce compelling cases of AI misalignment, leading the US and China to halt progress on AI outside a US-China collaboration (3, 4, 5).
| I did not like how this document was written; I was expecting a clear case for AI optimism to be stated explicitly, but instead it was a meandering sequence of points, most of them not obviously related to AI optimism. At the end I was left thinking, "What even is the case here?", and had to backtrack and skip around the post a few more times to figure out what the argument seems to be (even now, I am not sure I fully understand it).
| [[ASML]]
|}