List of success criteria for HRAD work
This page is a list of success criteria that have been proposed for HRAD (highly reliable agent design) work. Most of these criteria are correlated, so this is not anything like a list of independent ways HRAD could succeed; rather, the aim is to spell out more concrete ways in which HRAD work would be useful.
- resembles the work of Turing, Shannon, Bayes, etc.
- HRAD will help in a way analogous to how Bayesian justifications are useful in machine learning [1] (see the worked example after this list)
- helps AGI programmers avoid design mistakes analogous to the use of null-terminated strings in C (see the C sketch after this list)
- early advanced AI systems will be understandable in terms of HRAD's formalisms [2] (need to clarify what it means to be understandable in terms of a formalism)
- helps AGI programmers fix problems in early advanced AI systems
- helps AGI programmers predict problems in early advanced AI systems
- helps AGI programmers postdict/explain problems in early advanced AI systems
- ideas from HRAD will be a "useful source of inspiration" for ML/AGI work [3]
- when applying HRAD to actual systems, there will be "theoretically satisfying approximation methods" that make this application possible [4]
- when applying HRAD to actual systems, the approximation methods used will preserve the important desirable properties of HRAD work [5]
- the conceptual framework chosen in HRAD work and the conceptual framework that best describes early advanced AI systems will be compatible enough that using HRAD to describe these systems is enlightening [6]
- helps for "broadly understanding how the system is reasoning about the world" [7]
- helps with checking whether AI systems are aligned
- "developing a basic understanding of counterfactual reasoning improves our ability to understand the first AGI systems in a general way, and if we understand AGI better it's likelier we can build systems to address deception, edge instantiation, goal instability, and a number of other problems" [8]
- results in a theory of rationality that is sufficiently precise to build hierarchies of abstraction [9]
- "directly use the results of the research to build an AI system, rather than using it to inform existing efforts to build AI" [10]
- helps to "make advance plans and predictions, shoot for narrow design targets, and understand what they’re doing well enough to avoid the kinds of kludgey, opaque, non-modular, etc. approaches that aren’t really compatible with how secure or robust software is developed" [11]
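One standard instance of the kind of "Bayesian justification" mentioned above (an illustration chosen here, not taken from the cited source) is the reading of L2-regularized least squares as maximum a posteriori (MAP) estimation: the regularization term is not an ad hoc penalty but falls out of a Gaussian prior on the weights. Assuming observations $y_i = w^\top x_i + \epsilon_i$ with noise $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$ and prior $w \sim \mathcal{N}(0, \tau^2 I)$,

$$
\hat{w}_{\text{MAP}} = \arg\max_w \, p(w \mid D) = \arg\min_w \left[ \sum_{i=1}^n \bigl(y_i - w^\top x_i\bigr)^2 + \lambda \lVert w \rVert_2^2 \right], \qquad \lambda = \frac{\sigma^2}{\tau^2}.
$$

The hope expressed in the criterion is that HRAD results will relate to practical AGI systems in a similarly clarifying, if indirect, way.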
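For the null-terminated-strings analogy, the underlying point is that a small foundational representation choice can make a whole class of errors easy to commit. A minimal C sketch of the classic failure mode (illustrative only; the function name and buffer size are made up for the example):

```c
#include <string.h>

/* strcpy copies until it sees the terminating '\0', so any source
 * string longer than 7 characters silently writes past the end of
 * buf. Nothing in the null-terminated representation itself flags
 * or prevents the overflow. */
void copy_name(const char *name) {
    char buf[8];
    strcpy(buf, name);   /* undefined behavior if strlen(name) >= 8 */
}
```

A length-prefixed representation (storing the size alongside the data) makes the same mistake much harder to write, which is the sense in which a better foundational choice helps programmers avoid mistakes.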