MIRI vs Paul research agenda hypotheses - Revision history

Issa at 03:57, 26 April 2020

2020-04-26T03:57:58Z

Issa at 20:02, 15 March 2020

2020-03-15T20:02:30Z

Issa at 20:01, 15 March 2020

2020-03-15T20:01:19Z

Issa at 19:55, 15 March 2020

2020-03-15T19:55:15Z

Issa at 19:52, 15 March 2020

2020-03-15T19:52:30Z

Issa at 08:28, 5 March 2020

2020-03-05T08:28:05Z

Issa at 08:00, 5 March 2020

2020-03-05T08:00:10Z

Issa at 07:58, 5 March 2020

2020-03-05T07:58:24Z

Issa at 07:57, 5 March 2020

2020-03-05T07:57:41Z

Issa at 07:56, 5 March 2020

2020-03-05T07:56:51Z

Revision as of 03:57, 26 April 2020

@@ Line 44: / Line 44: @@
 <references/>
+
+[[Category:AI safety]]

@@ Line 17: / Line 17: @@
 * (M.4) AI systems capable of pivotal acts with goals not aligned with human interests will cause catastrophe.
-key hopes listed in https://www.greaterwrong.com/posts/HCv2uwgDGf5dyX5y6/preface-to-the-sequence-on-iterated-amplification (TODO: see if e.g. [[Eliezer]]'s criticisms of [[IDA]] can be seen as attacking each of the "key hopes")
+key hopes listed in https://www.greaterwrong.com/posts/HCv2uwgDGf5dyX5y6/preface-to-the-sequence-on-iterated-amplification
 * (P.1) "If you have an overseer who is smarter than the agent you are trying to train, you can safely use that overseer’s judgment as an objective."
@@ Line 34: / Line 34: @@
 * (E.4) "since Paul wants to use big unaligned neural nets to imitate humans, we have to worry about the possibility of adversarial behavior. He has suggested using large ensembles of agents and detecting and pruning the ones that are adversarial. However, this would require millions of samples per unaligned agent, which is prohibitively expensive."
 ** This one seems to be attacking (P.2), i.e. it's saying we can't use sparse feedback to train the system.
 ==See also==

@@ Line 31: / Line 31: @@
 ** This directly goes against (P.3), in particular the "smarter than any individual agent" part. (Well, to be precise, (P.3) does not actually state that these agents can reach ''arbitrary'' levels of capability, but Paul thinks IDA can compete with unaligned AGI, so the capabilities part would naturally fall under (P.3).)
 * (E.3) "while it is true that exact imitation of a human would avoid the issues of RL, it is harder to create exact imitation than to create superintelligence, and as soon as you have any imperfection in your imitation of a human, you very quickly get back the problems of RL."
 * (E.4) "since Paul wants to use big unaligned neural nets to imitate humans, we have to worry about the possibility of adversarial behavior. He has suggested using large ensembles of agents and detecting and pruning the ones that are adversarial. However, this would require millions of samples per unaligned agent, which is prohibitively expensive."
 ==See also==

@@ Line 27: / Line 27: @@
 * (E.1) "a collection of aligned agents interacting does not necessarily lead to aligned behavior"
 * (E.2) "it’s unclear that even with high bandwidth oversight, that a collection of agents could reach arbitrary levels of capability. For example, how could agents with an understanding of arithmetic invent Hessian-free optimization?"
 * (E.3) "while it is true that exact imitation of a human would avoid the issues of RL, it is harder to create exact imitation than to create superintelligence, and as soon as you have any imperfection in your imitation of a human, you very quickly get back the problems of RL."
 * (E.4) "since Paul wants to use big unaligned neural nets to imitate humans, we have to worry about the possibility of adversarial behavior. He has suggested using large ensembles of agents and detecting and pruning the ones that are adversarial. However, this would require millions of samples per unaligned agent, which is prohibitively expensive."

@@ Line 23: / Line 23: @@
 * (P.3) "A team of aligned agents may be smarter than any individual agent, while remaining aligned."
 ** this contradicts (M.3), which says that eventually the team of aligned agents will become "packaged with a random goal" in some sense.
 ==See also==

@@ Line 9: / Line 9: @@
 Taking [[Owen]]'s suggestion,<ref>https://agentfoundations.org/item?id=1242</ref> we can change this to:
-* "The first AI systems capable of pivotal acts will use good consequentialist reasoning."
+* (M.1) "The first AI systems capable of pivotal acts will use good consequentialist reasoning."
 ** this could be false if we have something like [[KANSI]] or [[Drexler]]'s [[CAIS]]
-* "The default AI development path will not produce good consequentialist reasoning at the top level."
+* (M.2) "The default AI development path will not produce good consequentialist reasoning at the top level."
-* "Consequentialist subsystem reasoning will likely come “packaged with a random goal” in some sense, and this goal will not be aligned with human interests."
+* (M.3) "Consequentialist subsystem reasoning will likely come “packaged with a random goal” in some sense, and this goal will not be aligned with human interests."
 ** this is the hypothesis Paul attacks: he is saying that, even without top-level consequentialist reasoning, we can align AI systems.
 ** I guess this is also the premise that "AI will be safe by default" people would reject: the top-level reasoning stays dominant, so the subsystems aren't really packaged with a random goal.
-* AI systems capable of pivotal acts with goals not aligned with human interests will cause catastrophe.
+* (M.4) AI systems capable of pivotal acts with goals not aligned with human interests will cause catastrophe.
 key hopes listed in https://www.greaterwrong.com/posts/HCv2uwgDGf5dyX5y6/preface-to-the-sequence-on-iterated-amplification (TODO: see if e.g. [[Eliezer]]'s criticisms of [[IDA]] can be seen as attacking each of the "key hopes")
-* "If you have an overseer who is smarter than the agent you are trying to train, you can safely use that overseer’s judgment as an objective."
+* (P.1) "If you have an overseer who is smarter than the agent you are trying to train, you can safely use that overseer’s judgment as an objective."
-* "We can train an RL system using very sparse feedback, so it’s OK if that overseer is very computationally expensive."
+* (P.2) "We can train an RL system using very sparse feedback, so it’s OK if that overseer is very computationally expensive."
-* "A team of aligned agents may be smarter than any individual agent, while remaining aligned."
+* (P.3) "A team of aligned agents may be smarter than any individual agent, while remaining aligned."
 ==See also==