MIRI vs Paul research agenda hypotheses
from "The concern" in https://agentfoundations.org/item?id=1220
- "The first AI systems capable of pivotal acts will use good consequentialist reasoning."
- "The default AI development path will not produce good consequentialist reasoning at the top level."
- "Therefore, on the default AI development path, the first AI systems capable of pivotal acts will have good consequentialist subsystem reasoning but not good consequentialist top-level reasoning."
- "Consequentialist subsystem reasoning will likely come “packaged with a random goal” in some sense, and this goal will not be aligned with human interests."
- "Therefore, the default AI development path will produce, as the first AI systems capable of pivotal acts, AI systems with goals not aligned with human interests, causing catastrophe."
Taking Owen's suggestion (https://agentfoundations.org/item?id=1242), we can change this to:
- (M.1) "The first AI systems capable of pivotal acts will use good consequentialist reasoning."
- (M.2) "The default AI development path will not produce good consequentialist reasoning at the top level."
- (M.3) "Consequentialist subsystem reasoning will likely come “packaged with a random goal” in some sense, and this goal will not be aligned with human interests."
  - This is the hypothesis Paul attacks: he argues that even without good consequentialist reasoning at the top level, we can align AI systems.
  - This also seems to be the premise that "AI will be safe by default" proponents would reject: on their view, the top-level reasoning stays dominant, so the subsystems aren't really packaged with a random goal.
- (M.4) AI systems capable of pivotal acts with goals not aligned with human interests will cause catastrophe.
Key hopes listed in https://www.greaterwrong.com/posts/HCv2uwgDGf5dyX5y6/preface-to-the-sequence-on-iterated-amplification (TODO: see whether e.g. Eliezer's criticisms of IDA can be seen as attacking each of these "key hopes"); a rough sketch of the amplification/distillation loop these hopes refer to follows the list:
- (P.1) "If you have an overseer who is smarter than the agent you are trying to train, you can safely use that overseer’s judgment as an objective."
- (P.2) "We can train an RL system using very sparse feedback, so it’s OK if that overseer is very computationally expensive."
- (P.3) "A team of aligned agents may be smarter than any individual agent, while remaining aligned."