MIRI vs Paul research agenda hypotheses
from "The concern" in https://agentfoundations.org/item?id=1220
- "The first AI systems capable of pivotal acts will use good consequentialist reasoning."
- "The default AI development path will not produce good consequentialist reasoning at the top level."
- "Therefore, on the default AI development path, the first AI systems capable of pivotal acts will have good consequentialist subsystem reasoning but not good consequentialist top-level reasoning."
- "Consequentialist subsystem reasoning will likely come “packaged with a random goal” in some sense, and this goal will not be aligned with human interests."
- "Therefore, the default AI development path will produce, as the first AI systems capable of pivotal acts, AI systems with goals not aligned with human interests, causing catastrophe."
Taking Owen's suggestion (https://agentfoundations.org/item?id=1242), we can change this to:
- (M.1) "The first AI systems capable of pivotal acts will use good consequentialist reasoning."
- (M.2) "The default AI development path will not produce good consequentialist reasoning at the top level."
- (M.3) "Consequentialist subsystem reasoning will likely come “packaged with a random goal” in some sense, and this goal will not be aligned with human interests."
  - This is the hypothesis Paul attacks: he argues that even without good consequentialist reasoning at the top level, we can align AI systems.
  - This also seems to be the premise that "AI will be safe by default" proponents would reject: on their view, the top-level reasoning stays dominant, so the subsystems aren't really packaged with a random goal.
- (M.4) AI systems capable of pivotal acts with goals not aligned with human interests will cause catastrophe.
Key hopes listed in https://www.greaterwrong.com/posts/HCv2uwgDGf5dyX5y6/preface-to-the-sequence-on-iterated-amplification (TODO: see whether e.g. Eliezer's criticisms of IDA can be seen as attacking each of these "key hopes"); a rough sketch of the amplification/distillation loop these hopes refer to follows the list:
- (P.1) "If you have an overseer who is smarter than the agent you are trying to train, you can safely use that overseer’s judgment as an objective."
- (P.2) "We can train an RL system using very sparse feedback, so it’s OK if that overseer is very computationally expensive."
- (P.3) "A team of aligned agents may be smarter than any individual agent, while remaining aligned."