Something like realism about rationality

From Issawiki
Revision as of 01:45, 16 June 2020 by Issa (Topics of disagreement)

Something like realism about rationality is a topic of debate among people working on AI safety. The "something like" reflects the fact that what the disagreement is even about is itself disputed. The term therefore gestures at a broad cluster of related questions rather than one specific disagreement. The topic comes up when discussing the value of MIRI's HRAD work.

History

The general idea has been discussed for a long time under various names (e.g. "intelligibility of intelligence"?).

The phrase "realism about rationality" was introduced by Richard Ngo in September 2018.[1]

Topics of disagreement

  • the original "realism about rationality" claim: [describe the claim here]. This was then discussed on LessWrong, where several people took issue with the framing, among other things
  • "Is there a theory of rationality that is sufficiently precise to build hierarchies of abstraction?"[2]
    • I think a more precise version of the above is the following: Will mathematical theories of embedded rationality always be the relatively-imprecise theories that can’t scale to “2+ levels above” in terms of abstraction?[3]

Notes

"The main case for HRAD problems is that we expect them to help in a gestalt way with many different known failure modes (and, plausibly, unknown ones). E.g., 'developing a basic understanding of counterfactual reasoning improves our ability to understand the first AGI systems in a general way, and if we understand AGI better it's likelier we can build systems to address deception, edge instantiation, goal instability, and a number of other problems'." [1]

This suggests the disagreement may instead (or additionally) be about the question "at what level of 'making it easy to design the right systems' is HRAD work justified?" On this reading, MIRI's position is roughly "being analogous to Bayesian justifications in modern ML is enough", while skeptics hold that "you need a theory sufficiently precise to build a hierarchy of abstractions/axiomatically specify AGI!"

(See also Wei Dai's list, where he specifically says that concretely applying decision theory to an AI is not what motivates him, and Rob's post, which has more information.)

Eliezer: "Techniques you can actually adapt in a safe AI, come the day, will probably have very simple cores — the sort of core concept that takes up three paragraphs, where any reviewer who didn’t spend five years struggling on the problem themselves will think, “Oh I could have thought of that.” Someday there may be a book full of clever and difficult things to say about the simple core — contrast the simplicity of the core concept of causal models, versus the complexity of proving all the clever things Judea Pearl had to say about causal models. But the planetary benefit is mainly from posing understandable problems crisply enough so that people can see they are open, and then from the simpler abstract properties of a found solution — complicated aspects will not carry over to real AIs later." [2]

See also

References