Search results

  • ...is trying to train. So we won't have something like the agent pushing the overseer over so that it can press the "I approve" button.
    9 KB (1,597 words) - 00:27, 6 October 2020
  • * [[overseer]] * [[bandwidth of the overseer]], [[high bandwidth oversight]], [[low bandwidth oversight]]
    1 KB (125 words) - 03:58, 26 April 2020
  • ...is smarter than the agent you are trying to train, you can safely use that overseer’s judgment as an objective." ...We can train an RL system using very sparse feedback, so it’s OK if that overseer is very computationally expensive."
    4 KB (702 words) - 03:57, 26 April 2020