How doomed are ML safety approaches?
I want to better understand the MIRI case for thinking that ML-based safety approaches (like Paul Christiano's agenda) are so hopeless as to not be worth working on (or something like that).
in particular, which one of the following is the MIRI case closest to?
1. a highly intelligent AI would see things humans cannot see, can arrive at unanticipated solutions, etc. therefore it seems pretty imprudent/careless/whatever to go ahead and try to build an AGI via ML without really understanding what is going on.
2. in addition to (1), we have some sketchy ideas of why things could be bad by default. for instance, optimization daemons could be a thing (humans are an existence proof that this isn't impossible). we cannot tell if any of these sketchy ideas are likely, but they are theoretically possible.
3. in addition to the stuff in (1) and (2), we actually additionally have one of the following:
   - (3.1) a really good argument for why ML-based approaches are doomed, but it's really hard to write down / too long to write down / we don't have enough time to write it down / there's too much inferential distance to cover
   - (3.2) a really strong intuition that the problems in (1) and (2) really will actually be problems, but it's super hard to articulate this intuition.
https://intelligence.org/2015/07/27/miris-approach/ -- this article, in particular the section "Creating a powerful AI system without understanding why it works is dangerous", gives a basic case for (1).
https://agentfoundations.org/item?id=1220 -- this is somewhere between (1) and (3.1) i think? like, it's more detailed than just (1), but i wouldn't call this a super watertight argument. specifically, point (4) in the argument seems sketchy; it's basically just saying "advanced ML systems cannot be aligned".
"The biggest indication that [top-level reasoning might not stay dominant on default AI development path] is that we currently don’t have an in-principle theory for good reasoning (e.g. we’re currently confused about logical uncertainty and multi-level models), and it doesn’t look like these theories will be developed without a concerted effort. Usually, theory lags behind common practice." "“MIRI” has a strong intuition that this won’t be the case, and personally I’m somewhat confused about the details; see Nate’s comments below for details."
Paul: 'As far as I can tell, the MIRI view is that my work is aimed at problem which is *not possible,* not that it is aimed at a problem which is too easy.' 'One part of this is the disagreement about whether the overall approach I'm taking could possibly work, with my position being "something like 50-50" the MIRI position being "obviously not" (and normal ML researchers' positions being skepticism about our perspective on the problem).' [1] -- i'm really confused about where this "obviously not" is coming from.