Goalpost for usefulness of HRAD work

When thinking about the question of "How useful is [[HRAD]] work?", what standards/goalposts should we use? There's a pattern I see where:

* people advocating [[HRAD]] research bring up historical cases like Turing, Shannon, etc., where formalization worked well. There is also the [[deconfusion research]] framing, where just understanding what's going on better is a form of progress.
* people arguing against HRAD research talk about how "complete axiomatic descriptions" haven't been useful so far in AI, and how they aren't used to describe machine learning systems.

It seems like there's a question of what the relevant goalpost is for deciding whether HRAD work is useful.

Here is an example of what I mean when I say that MIRI sets the goalpost at an easier spot: "Techniques you can actually adapt in a safe AI, come the day, will probably have very simple cores — the sort of core concept that takes up three paragraphs, where any reviewer who didn’t spend five years struggling on the problem themselves will think, “Oh I could have thought of that.” Someday there may be a book full of clever and difficult things to say about the simple core — contrast the simplicity of the core concept of causal models, versus the complexity of proving all the clever things Judea Pearl had to say about causal models. But the planetary benefit is mainly from posing understandable problems crisply enough so that people can see they are open, and then from the simpler abstract properties of a found solution — complicated aspects will not carry over to real AIs later." [https://intelligence.org/files/OpenPhil2016Supplement.pdf#page=13]

In contrast, the kind of goalpost [[Daniel Dewey]] sets in [https://eaforum.issarice.com/posts/SEL9PW8jozrvLnkb4/my-current-thoughts-on-miri-s-highly-reliable-agent-design] seems much harder/more restrictive.

----

* will early advanced AI systems be understandable in terms of HRAD's formalisms? [https://eaforum.issarice.com/posts/SEL9PW8jozrvLnkb4/my-current-thoughts-on-miri-s-highly-reliable-agent-design#3__What_do_I_think_about_HRAD_]
** what will early advanced AI systems look like?
* how convincing the historical examples are (e.g. Shannon, Turing, Bayes, Pearl, Kolmogorov, null-terminated strings in C [https://eaforum.issarice.com/posts/SEL9PW8jozrvLnkb4/my-current-thoughts-on-miri-s-highly-reliable-agent-design#Z6TbXivpjxWyc8NYM], [https://www.facebook.com/danielfilan/posts/10212534556141446], [https://www.facebook.com/robbensinger/posts/10160644236785447]; Eliezer also brings up the Shannon vs Poe chess example). See also [[selection effect for successful formalizations]].

==See also==

* [[List of success criteria for HRAD work]]
* [[Something like realism about rationality]]

==External links==

* https://www.greaterwrong.com/posts/BGxTpdBGbwCWrGiCL/plausible-cases-for-hrad-work-and-locating-the-crux-in-the

[[Category:AI safety]]
Latest revision as of 20:17, 26 June 2020