Something like realism about rationality

Something like realism about rationality is a topic of debate among people working on AI safety. The "something like" reflects the fact that there is disagreement even about what the disagreement is about, so the term gestures at a broad-ish cluster of things rather than one specific disagreement. The topic comes up when discussing the value of MIRI's HRAD (highly reliable agent designs) work.

History

The general idea has been discussed for a long time under various names (e.g. perhaps "intelligibility of intelligence").

The phrase "realism about rationality" was introduced by Richard Ngo in September 2018.[1]

Topics of disagreement

  • the original "realism about rationality" claim: "a mindset in which reasoning and intelligence are more like momentum than like fitness".[2] This was then discussed on LessWrong, where several people took issue with the framing. (The analogy is unpacked just below this list.)
  • "Is there a theory of rationality that is sufficiently precise to build hierarchies of abstraction?"[3]
    • I think a more precise version of the above is: will mathematical theories of embedded rationality always be relatively-imprecise theories that can't scale to "2+ levels above" in terms of abstraction?[4]
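
As a rough gloss on the momentum-vs-fitness analogy (my illustration, not part of the original post): momentum is the kind of concept the "realist" mindset expects intelligence to resemble, in that it has a single exact definition and an exact conservation law,

\[
\vec{p} = m\vec{v}, \qquad \frac{d}{dt}\sum_i \vec{p}_i = \vec{F}_{\text{ext}},
\]

so for an isolated system (\(\vec{F}_{\text{ext}} = 0\)) total momentum is exactly conserved. "Fitness" in evolutionary biology, by contrast, is a genuinely useful abstraction with no comparably crisp, exceptionless formula, and the opposing mindset expects theories of reasoning and intelligence to look more like that.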

From Rohin's comment here, we can build the case against "rationality realism" in three steps:

  1. It's very hard to use relatively-imprecise theories to build things "2+ levels above".
  2. Real AGI systems are "2+ levels above" mathematical theories of embedded rationality.
  3. Mathematical theories of embedded rationality will always be the relatively-imprecise theories that can't scale to "2+ levels above".

Together, these three steps imply that it will be very hard to use mathematical theories of embedded rationality to build real AGI systems.
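
To spell out how the steps combine, here is a minimal sketch in notation of my own (not Rohin's): write \(I(T)\) for "theory \(T\) is relatively imprecise", \(A(x,T)\) for "\(x\) is 2+ levels above \(T\)", and \(B(x,T)\) for "\(T\) can feasibly be used to build \(x\)". With \(E\) a mathematical theory of embedded rationality and \(g\) a real AGI system, the argument is roughly:

\[
\text{(1)}\ \forall T, x:\ I(T) \wedge A(x,T) \Rightarrow \neg B(x,T), \qquad
\text{(2)}\ A(g,E), \qquad
\text{(3)}\ I(E) \quad\Longrightarrow\quad \neg B(g,E).
\]

(Since step 1 is really "very hard" rather than "impossible", the conclusion should also be read as "very hard" rather than "impossible".)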

In contrast, Richard's original case rests more on the claim that "mathematical theories of embedded rationality do not exist to be discovered", which seems off the mark.

Notes

"The main case for HRAD problems is that we expect them to help in a gestalt way with many different known failure modes (and, plausibly, unknown ones). E.g., 'developing a basic understanding of counterfactual reasoning improves our ability to understand the first AGI systems in a general way, and if we understand AGI better it's likelier we can build systems to address deception, edge instantiation, goal instability, and a number of other problems'." [1] -- so perhaps the disagreement is more like (or additionally about) "at what level of 'making it easy to design the right systems' is HRAD work justified?", where MIRI is like "being analogous to Bayesian justifications in modern ML is enough" and skeptics are like "you need a theory sufficiently precise to build a hierarchy of abstractions/axiomatically specify AGI!"

(See also Wei Dai's list, where he specifically says that concretely applying decision theory to an AI is not what motivates him, and Rob's post, which goes into more detail.)

See also

References