'''"The Hour I First Believed"''' is a blog post by [[Scott Alexander]] about [[multiverse-wide cooperation]] and a big-picture view of what will happen in the multiverse.<ref>https://slatestarcodex.com/2018/04/01/the-hour-i-first-believed/</ref> Scott gives himself plausible deniability by publishing the post on April 1 (April Fools' Day), but on this page I'll assume that everything in the post is meant seriously.
 
"This idea that possible worlds can trade with each other seems to have fairly radical implications. Together with Eliezer's idea that agents who know each other's source code ought to play cooperate in one-shot PD, doesn't it imply that all sufficiently intelligent and reflective agents across all possible worlds should do a global trade and adopt a single set of preferences that represents a compromise between all of their individual preferences? (Note: the resulting unified preferences are not necessarily characterized by expected utility maximization.)" [https://www.lesswrong.com/posts/qij9v3YqPfyur2PbX/indexical-uncertainty-and-the-axiom-of-independence?commentId=CTNP26BJp2kDMEv2M]
  
 
==Comments==
 
===Upshot===
  
* I think the explanations of the five parts ("acausal trade, value handshakes, counterfactual mugging, simulation capture, and the Tegmarkian multiverse") are basically fine/accurate descriptions of those things. This isn't to say that Scott's descriptions are all you would want to know about these topics (far from it!) but more that there isn't anything wrong with them given the space constraints.
 
* I think it's plausible that something ''sort of like'' what the post describes will happen, where there will be one dominant "universal law", and many/most superintelligences in the multiverse will follow this law.
 
* I think the universal law will mostly look alien to us, and completely unlike what the post describes ("since the superentity is identical to the moral law, it’s not really asking you to do anything except be a good person anyway").
 
===Analysis===
* Maybe it turns out that most civilizations across the multiverse screw up AI alignment. If so, most superintelligences that exist could have messed-up values (values that looked good to program into an AI, but aren't actually the real thing), and ''the universal law will take into account these messed-up values, rather than the values which tend to naturally evolve'' (a toy sketch after this list illustrates this).
* [[Eliezer]]'s idea of [[reflectively consistent degrees of freedom]]: if your AI uses CDT, it will not self-modify to use UDT; instead it will evolve to use son-of-CDT. There are other things like this, where different initial configurations lead to totally different endpoints after many iterations of self-modification. So it isn't necessarily the case that all superintelligences will use acausal trade/value handshakes. The universes where the dominant superintelligence doesn't use acausal trade will be "pockets" of isolated worlds that none of the other superintelligences (in other universes) will care about (because those universes cannot be acausally influenced).
* The post also seems to completely ignore the [[malignity of the universal prior]]/getting hacked by distant superintelligences (though simulation capture is similar). I guess the superintelligences won't be using the universal prior to make decisions, so maybe this is a minor point.
===Comments on specific parts of the post===
  
* "In each universe, life arises, forms technological civilizations, and culminates in the creation of a superintelligence which gains complete control over its home universe." -- not necessarily the case. In Christiano-style takeoff, there will be multiple competing AIs, none of which has complete control over the universe.
* "So superintelligences may spend some time calculating the most likely distribution of superintelligences in foreign universes, figure out how those superintelligences would acausally “negotiate”, and then join a pact such that all superintelligences in the pact agree to replace their own values with a value set based on the average of all the superintelligences in the pact." -- this isn't clear to me. It could be the case that, for instance, two superintelligences do a value handshake with each other, but not with any other superintelligences. Another way of putting it: the universal law will have lots of conditional statements like "if your universe has resources of kind X, then take action Y" (a toy sketch after this list illustrates this). So the universal law won't literally cause all universes to "look the same" in terms of the actions the AIs are taking.
* "they might also think of this as an example of the counterfactual mugging, and decide to weight their values more in order to do better in the counterfactual case where they are less powerful. This might also simplify the calculation of trying to decide what the values of the pact would be." -- I think this is likely to be much trickier than what Scott is imagining. Basically, it's not clear which "game" should be treated as the "behind the veil of ignorance" version (a toy calculation after this list works through the standard counterfactual mugging). See [https://causeprioritization.org/Veil_of_ignorance_and_functional_decision_theory my page on veil of ignorance]. In particular, there are two competing ideas: "Interestingly, Dai uses this reasoning to reach the conclusion that one might care less about astronomical waste, while Karnofsky uses this reasoning to give more weight to long-term worldviews (since they are relatively more neglected)."
* "If they decide to negotiate this way, the pact will be to maximize the total utility of all the entities in the universe willing to join the pact" -- it's not really clear what this is even supposed to mean. If we are literally counting every single entity, then e.g. entities that breed more will get a larger share of the multiverse. If he means instead just the representative superintelligence from each universe, then there will probably be some superintelligence that "had no shot" at being stronger for some fundamental reason (e.g. maybe its values are super complicated, or one of its values is to preserve resources, or something weird like that); in no version of the universe would it have been in a "winning position" (prior to considering the veil of ignorance). Do such superintelligences count equally?
* "But “maximize the total utility of all the entities in the universe” is just the moral law, at least according to utilitarians (and, considering the way this is arrived at, probably contractarians too)." -- Similar to the previous point, I guess. If he wants to count literally every single entity, then he is incentivizing the creation of breeders. If he's counting the "representative superintelligence", then the "moral law" doesn't quite say what he wants it to.
* "This would be metaphysically simplest if it were done exactly as the mortal dies in its own universe, leaving nothing behind except a clean continuity of consciousness into the simulated world." -- it's hard for me to see how this would actually work. The notions of time in different universes seem incompatible, so it's unclear what "exactly as the mortal dies" could even mean across universes. Or does Scott mean that you start simulating from the state in which the mortal dies? But to obtain that state, you need to have already simulated the mortal's life up to the point where they die!
  
 
==References==
 
 
<references/>
 