Something like realism about rationality

Something like realism about rationality is a topic of debate among people working on AI safety. The "something like" refers to the fact that even the question of what the disagreement is about is itself disputed. The term therefore gestures at a broad cluster of related disagreements rather than at one specific claim.

History

The general idea has been discussed for a long time under various names (e.g. possibly "intelligibility of intelligence").

The phrase "realism about rationality" was introduced by Richard Ngo in September 2018.[1]

Topics of disagreement

  • the original "realism about rationality" claim: roughly, the mindset that rationality or intelligence admits a simple, precise theory, so that it is a crisp concept like momentum rather than a fuzzy one like evolutionary fitness. This was then discussed on LessWrong, where a number of people took issue with the framing.
  • "Is there a theory of rationality that is sufficiently precise to build hierarchies of abstraction?" [1]

Notes

"An example of how this relates to HRAD is that I think that Bayesian justifications are useful in ML, and that a good formal model of rationality in the face of logical uncertainty is likely to be useful in analogous ways. When I speak of foundational understanding making it easy to design the right systems, I’m trying to point at things like the usefulness of Bayesian justifications in modern ML." [2]

"The main case for HRAD problems is that we expect them to help in a gestalt way with many different known failure modes (and, plausibly, unknown ones). E.g., 'developing a basic understanding of counterfactual reasoning improves our ability to understand the first AGI systems in a general way, and if we understand AGI better it's likelier we can build systems to address deception, edge instantiation, goal instability, and a number of other problems'." [3] -- so perhaps the disagreement is more like (or additionally about) "at what level of 'making it easy to design the right systems' is HRAD work justified?", where MIRI is like "being analogous to Bayesian justifications in modern ML is enough" and skeptics are like "you need a theory sufficiently precise to build a hierarchy of abstractions/axiomatically specify AGI!"

(See also Wei Dai's list, where he specifically says that concretely applying decision theory results to an AI is not what motivates him, and Rob's post, which has more information.)

Eliezer: "Techniques you can actually adapt in a safe AI, come the day, will probably have very simple cores — the sort of core concept that takes up three paragraphs, where any reviewer who didn’t spend five years struggling on the problem themselves will think, “Oh I could have thought of that.” Someday there may be a book full of clever and difficult things to say about the simple core — contrast the simplicity of the core concept of causal models, versus the complexity of proving all the clever things Judea Pearl had to say about causal models. But the planetary benefit is mainly from posing understandable problems crisply enough so that people can see they are open, and then from the simpler abstract properties of a found solution — complicated aspects will not carry over to real AIs later." [4]

See also

  • Simple core of consequentialist reasoning
  • Goalpost for usefulness of HRAD work
  • List of success criteria for HRAD work

References