Something like realism about rationality

From Issawiki
Revision as of 18:51, 16 June 2020 by Issa (Topics of disagreement)

Something like realism about rationality is a topic of debate among people working on AI safety. The "something like" reflects the fact that what the disagreement is even about is itself disputed, so the term gestures at a broad cluster of related disagreements rather than one specific disagreement. The topic comes up when discussing the value of MIRI's highly reliable agent design (HRAD) work.

History

The general idea has been discussed for a long time under various names (e.g. "intelligibility of intelligence"?).

The phrase "realism about rationality" was introduced by Richard Ngo in September 2018.[1]

Topics of disagreement

  • the original "realism about rationality" claim: "a mindset in which reasoning and intelligence are more like momentum than like fitness".[2] This was then discussed on LessWrong, where a number of people took issue with the framing.
  • "Is there a theory of rationality that is sufficiently precise to build hierarchies of abstraction?"[3]
    • I think a more precise version of the above is the following: Will mathematical theories of embedded rationality always be the relatively-imprecise theories that can’t scale to “2+ levels above” in terms of abstraction?[4]

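Following Rohin's comment, we can build the case against "rationality realism" from three premises which, taken together, imply that mathematical theories of embedded rationality will be very hard to use in building real AGI systems: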

  1. It's very hard to use relatively-imprecise theories to build things "2+ levels above".
  2. Real AGI systems are "2+ levels above" mathematical theories of embedded rationality.
  3. Mathematical theories of embedded rationality will always be the relatively-imprecise theories that can't scale to "2+ levels above".

In contrast, Richard's original case seems to be more like "mathematical theories of embedded rationality do not exist to be discovered", which seems off the mark.

I think Abram would probably reject premise (3), since he agreed with the crux Rohin presented. Nate and Eliezer, by contrast, seem more likely to reject premise (1), judging by what they have been saying.

In [5] Abram gives two possible meanings for "model agency exactly" (which I take to be what a mathematical theory of embedded rationality must do): building agents from the ground up, versus being able to take an arbitrary AI system (e.g. one built with ML) and predict roughly what it does.

Notes

"The main case for HRAD problems is that we expect them to help in a gestalt way with many different known failure modes (and, plausibly, unknown ones). E.g., 'developing a basic understanding of counterfactual reasoning improves our ability to understand the first AGI systems in a general way, and if we understand AGI better it's likelier we can build systems to address deception, edge instantiation, goal instability, and a number of other problems'." [1] -- so perhaps the disagreement is more like (or additionally about) "at what level of 'making it easy to design the right systems' is HRAD work justified?", where MIRI is like "being analogous to Bayesian justifications in modern ML is enough" and skeptics are like "you need a theory sufficiently precise to build a hierarchy of abstractions/axiomatically specify AGI!"

(See also Wei Dai's list, where he says specifically that concretely applying decision theory to an AI is not what motivates him, and Rob's post, which has more information.)

References