How doomed are ML safety approaches?

Revision as of 00:41, 20 February 2020 by Issa (talk | contribs)

I want to better understand the MIRI case for thinking that ML-based safety approaches (like Paul Christiano's agenda) are so hopeless as to not be worth working on (or something like that).

In particular, which of the following is closest to MIRI's actual position?

  1. A highly intelligent AI would see things humans cannot see, could arrive at unanticipated solutions, and so on. It therefore seems imprudent/careless to go ahead and try to build an AGI via ML without really understanding what is going on.
  2. In addition to (1), we have some sketchy ideas about why things could go badly by default. For instance, optimization daemons could be a thing (humans are an existence proof that this isn't impossible). We cannot tell whether any of these sketchy ideas are likely, but they are theoretically possible.
  3. In addition to (1) and (2), we also have one of the following:
    1. a really good argument for why ML-based approaches are doomed, but it's really hard to write down / too long to write down / we don't have enough time to write it down / there's too much inferential distance to cover; or
    2. a really strong intuition that the problems in (1) and (2) really will be problems in practice, but this intuition is very hard to articulate.

https://intelligence.org/2015/07/27/miris-approach/ -- This article, in particular the section "Creating a powerful AI system without understanding why it works is dangerous", makes a basic case for (1).

https://agentfoundations.org/item?id=1220 -- This is somewhere between (1) and (3.1), I think? It's more detailed than just (1), but I wouldn't call it a watertight argument. Specifically, point (4) in the argument seems sketchy; it basically just asserts that advanced ML systems cannot be aligned.