<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://wiki.issarice.com/index.php?action=history&amp;feed=atom&amp;title=Stupid_questions</id>
	<title>Stupid questions - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://wiki.issarice.com/index.php?action=history&amp;feed=atom&amp;title=Stupid_questions"/>
	<link rel="alternate" type="text/html" href="https://wiki.issarice.com/index.php?title=Stupid_questions&amp;action=history"/>
	<updated>2026-04-10T07:48:39Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.31.6</generator>
	<entry>
		<id>https://wiki.issarice.com/index.php?title=Stupid_questions&amp;diff=2104&amp;oldid=prev</id>
		<title>Issa at 20:58, 26 March 2021</title>
		<link rel="alternate" type="text/html" href="https://wiki.issarice.com/index.php?title=Stupid_questions&amp;diff=2104&amp;oldid=prev"/>
		<updated>2021-03-26T20:58:24Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 20:58, 26 March 2021&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l10&quot; &gt;Line 10:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 10:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* i&amp;#039;m confused about whether/how the distilled agent in [[IDA]] is producing explanations of its own outputs, and also what these explanations look like.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* i&amp;#039;m confused about whether/how the distilled agent in [[IDA]] is producing explanations of its own outputs, and also what these explanations look like.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* was the timeline discrepancy between [[eliezer]] and [[carl]] ever resolved? if so, what was the resolution/new estimate?&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* was the timeline discrepancy between [[eliezer]] and [[carl]] ever resolved? if so, what was the resolution/new estimate?&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;[[Category:AI safety]]&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Issa</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.issarice.com/index.php?title=Stupid_questions&amp;diff=2103&amp;oldid=prev</id>
		<title>Issa at 20:58, 26 March 2021</title>
		<link rel="alternate" type="text/html" href="https://wiki.issarice.com/index.php?title=Stupid_questions&amp;diff=2103&amp;oldid=prev"/>
		<updated>2021-03-26T20:58:11Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 20:58, 26 March 2021&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot; &gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* there&amp;#039;s a bunch of different considerations that people talk about (like different takeoff &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;scenarios&lt;/del&gt;, comparisons to nuclear arms control, etc.) and it&amp;#039;s unclear to me how the answers to these questions should influence our actions. even if we hammer out these strategy questions, would that change any of our actions? like if we suddenly knew with 100% certainty that there are three big insights needed to go from chimpanzee brains to human brains (but we wouldn&amp;#039;t know the content of the insights), what does that mean, in terms of what to do about AI safety?&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* there&amp;#039;s a bunch of different considerations that people talk about (like different &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;takeoff &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;scenario]]s&lt;/ins&gt;, comparisons to nuclear arms control, etc.) and it&amp;#039;s unclear to me how the answers to these questions should influence our actions. even if we hammer out these strategy questions, would that change any of our actions? like if we suddenly knew with 100% certainty that there are three big insights needed to go from chimpanzee brains to human brains (but we wouldn&amp;#039;t know the content of the insights), what does that mean, in terms of what to do about AI safety?&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what is the minimal set of background assumptions/parameters that are needed to characterize the debate between eliezer and paul? (i am thinking of each person&amp;#039;s views as being &amp;quot;emergent&amp;quot; from some set of background assumptions.) e.g. [https://lw2.issarice.com/posts/mJ5oNYnkYrd4sD5uE/clarifying-some-key-hypotheses-in-ai-alignment] captures some of these, but i don&amp;#039;t think this is a minimal set (there are several others that i think are missing, and also some of the hypotheses listed here might be redundant/irrelevant.)&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what is the minimal set of background assumptions/parameters that are needed to characterize the debate between eliezer and paul? (i am thinking of each person&amp;#039;s views as being &amp;quot;emergent&amp;quot; from some set of background assumptions.) e.g. [https://lw2.issarice.com/posts/mJ5oNYnkYrd4sD5uE/clarifying-some-key-hypotheses-in-ai-alignment] captures some of these, but i don&amp;#039;t think this is a minimal set (there are several others that i think are missing, and also some of the hypotheses listed here might be redundant/irrelevant.)&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &amp;lt;strike&amp;gt;in paul&amp;#039;s iterated amplification scheme, i don&amp;#039;t understand why we can&amp;#039;t just stop after the first iteration and use the human-level AI to do things; why do we have to keep amplifying?&amp;lt;/strike&amp;gt; -- i figured out the answer. i was mistaken about how capable the first round of IDA is, because the writeup itself was confusing. see [https://lw2.issarice.com/posts/HqLxuZ4LhaFhmAHWk/iterated-distillation-and-amplification-1#GeBLKN5FJjzDmdGqn my comment here] for more.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &amp;lt;strike&amp;gt;in paul&amp;#039;s iterated amplification scheme, i don&amp;#039;t understand why we can&amp;#039;t just stop after the first iteration and use the human-level AI to do things; why do we have to keep amplifying?&amp;lt;/strike&amp;gt; -- i figured out the answer. i was mistaken about how capable the first round of IDA is, because the writeup itself was confusing. see [https://lw2.issarice.com/posts/HqLxuZ4LhaFhmAHWk/iterated-distillation-and-amplification-1#GeBLKN5FJjzDmdGqn my comment here] for more.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what is the difference between informed oversight and reward engineering?&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what is the difference between &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;informed oversight&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]] &lt;/ins&gt;and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;reward engineering&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;?&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what are some &amp;quot;easy&amp;quot;/doable open problems in [[agent foundations]] research? (if someone was doing a PhD in agent foundations, what problems would their advisor suggest for them?)&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what are some &amp;quot;easy&amp;quot;/doable open problems in [[agent foundations]] research? (if someone was doing a PhD in agent foundations, what problems would their advisor suggest for them?)&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what happened to intelligence amplification? in the early days of AI safety, people talked a lot about various intelligence amplification methods for navigating the singularity (e.g. cloning human geniuses, [[whole brain emulation]], cognitive enhancement drugs or implanting technology). The idea is that intelligence amplification will give us aligned entities that are smarter than us, which will help us to eventually get a friendly AI. Intelligence amplification was one of the three &amp;quot;main&amp;quot; paths that were discussed (along with technical alignment work, aka FAI theory, and coordination). nowadays when you look at sort of &amp;quot;who is working on AI safety full time?&amp;quot;, you have the technical researchers, who work on different agendas but all of them focused on aligning de novo AI, and then you have the policy people, who are looking at coordination.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what happened to &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;intelligence amplification&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;? in the early days of AI safety, people talked a lot about various intelligence amplification methods for navigating the singularity (e.g. cloning human geniuses, [[whole brain emulation]], cognitive enhancement drugs or implanting technology). 
The idea is that intelligence amplification will give us aligned entities that are smarter than us, which will help us to eventually get a friendly AI. Intelligence amplification was one of the three &amp;quot;main&amp;quot; paths that were discussed (along with technical alignment work, aka FAI theory, and coordination). nowadays when you look at sort of &amp;quot;who is working on AI safety full time?&amp;quot;, you have the technical researchers, who work on different agendas but all of them focused on aligning de novo AI, and then you have the policy people, who are looking at coordination.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* is there an easy problem of corrigibility? if so, what is it? if not, why did &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;eliezer &lt;/del&gt;introduce the hard problem of corrigibility?&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* is there an easy problem of &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;corrigibility&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;? if so, what is it? if not, why did &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[Eliezer]] &lt;/ins&gt;introduce the hard problem of corrigibility?&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* AI safety prepping: what can individuals do to maximize their chances of surviving the singularity?&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* AI safety prepping: what can individuals do to maximize their chances of surviving the singularity?&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* does subagent imply mesa-optimizer? (also in the mesa-optimizers paper, &amp;quot;learned algorithm&amp;quot; doesn&amp;#039;t seem like it needs to be learned, which is confusing)&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* does subagent imply &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;mesa-optimizer&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;? (also in the mesa-optimizers paper, &amp;quot;learned algorithm&amp;quot; doesn&amp;#039;t seem like it needs to be learned, which is confusing)&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* i&amp;#039;m confused about whether/how the distilled agent in [[IDA]] is producing explanations of its own outputs, and also what these explanations look like.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* i&amp;#039;m confused about whether/how the distilled agent in [[IDA]] is producing explanations of its own outputs, and also what these explanations look like.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* was the timeline discrepancy between [[eliezer]] and [[carl]] ever resolved? if so, what was the resolution/new estimate?&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* was the timeline discrepancy between [[eliezer]] and [[carl]] ever resolved? if so, what was the resolution/new estimate?&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Issa</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.issarice.com/index.php?title=Stupid_questions&amp;diff=4&amp;oldid=prev</id>
		<title>Issa at 23:36, 17 February 2020</title>
		<link rel="alternate" type="text/html" href="https://wiki.issarice.com/index.php?title=Stupid_questions&amp;diff=4&amp;oldid=prev"/>
		<updated>2020-02-17T23:36:02Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 23:36, 17 February 2020&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l3&quot; &gt;Line 3:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 3:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &amp;lt;strike&amp;gt;in paul&amp;#039;s iterated amplification scheme, i don&amp;#039;t understand why we can&amp;#039;t just stop after the first iteration and use the human-level AI to do things; why do we have to keep amplifying?&amp;lt;/strike&amp;gt; -- i figured out the answer. i was mistaken about how capable the first round of IDA is, because the writeup itself was confusing. see [https://lw2.issarice.com/posts/HqLxuZ4LhaFhmAHWk/iterated-distillation-and-amplification-1#GeBLKN5FJjzDmdGqn my comment here] for more.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* &amp;lt;strike&amp;gt;in paul&amp;#039;s iterated amplification scheme, i don&amp;#039;t understand why we can&amp;#039;t just stop after the first iteration and use the human-level AI to do things; why do we have to keep amplifying?&amp;lt;/strike&amp;gt; -- i figured out the answer. i was mistaken about how capable the first round of IDA is, because the writeup itself was confusing. see [https://lw2.issarice.com/posts/HqLxuZ4LhaFhmAHWk/iterated-distillation-and-amplification-1#GeBLKN5FJjzDmdGqn my comment here] for more.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what is the difference between informed oversight and reward engineering?&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what is the difference between informed oversight and reward engineering?&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what are some &amp;quot;easy&amp;quot;/doable open problems in agent foundations research? (if someone was doing a PhD in agent foundations, what problems would their advisor suggest for them?)&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what are some &amp;quot;easy&amp;quot;/doable open problems in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;agent foundations&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]] &lt;/ins&gt;research? (if someone was doing a PhD in agent foundations, what problems would their advisor suggest for them?)&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what happened to intelligence amplification? in the early days of AI safety, people talked a lot about various intelligence amplification methods for navigating the singularity (e.g. cloning human geniuses, whole brain emulation, cognitive enhancement drugs or implanting technology). The idea is that intelligence amplification will give us aligned entities that are smarter than us, which will help us to eventually get a friendly AI. Intelligence amplification was one of the three &amp;quot;main&amp;quot; paths that were discussed (along with technical alignment work, aka FAI theory, and coordination). nowadays when you look at sort of &amp;quot;who is working on AI safety full time?&amp;quot;, you have the technical researchers, who work on different agendas but all of them focused on aligning de novo AI, and then you have the policy people, who are looking at coordination.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* what happened to intelligence amplification? in the early days of AI safety, people talked a lot about various intelligence amplification methods for navigating the singularity (e.g. cloning human geniuses, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;whole brain emulation&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]]&lt;/ins&gt;, cognitive enhancement drugs or implanting technology). 
The idea is that intelligence amplification will give us aligned entities that are smarter than us, which will help us to eventually get a friendly AI. Intelligence amplification was one of the three &amp;quot;main&amp;quot; paths that were discussed (along with technical alignment work, aka FAI theory, and coordination). nowadays when you look at sort of &amp;quot;who is working on AI safety full time?&amp;quot;, you have the technical researchers, who work on different agendas but all of them focused on aligning de novo AI, and then you have the policy people, who are looking at coordination.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* is there an easy problem of corrigibility? if so, what is it? if not, why did eliezer introduce the hard problem of corrigibility?&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* is there an easy problem of corrigibility? if so, what is it? if not, why did eliezer introduce the hard problem of corrigibility?&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* AI safety prepping: what can individuals do to maximize their chances of surviving the singularity?&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* AI safety prepping: what can individuals do to maximize their chances of surviving the singularity?&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* does subagent imply mesa-optimizer? (also in the mesa-optimizers paper, &amp;quot;learned algorithm&amp;quot; doesn&amp;#039;t seem like it needs to be learned, which is confusing)&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* does subagent imply mesa-optimizer? (also in the mesa-optimizers paper, &amp;quot;learned algorithm&amp;quot; doesn&amp;#039;t seem like it needs to be learned, which is confusing)&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* i&amp;#039;m confused about whether/how the distilled agent in IDA is producing explanations of its own outputs, and also what these explanations look like.&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* i&amp;#039;m confused about whether/how the distilled agent in &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;IDA&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]] &lt;/ins&gt;is producing explanations of its own outputs, and also what these explanations look like.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&#039;diff-marker&#039;&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* was the timeline discrepancy between eliezer and carl ever resolved? if so, what was the resolution/new estimate?&lt;/div&gt;&lt;/td&gt;&lt;td class=&#039;diff-marker&#039;&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* was the timeline discrepancy between &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;eliezer&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]] &lt;/ins&gt;and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;[[&lt;/ins&gt;carl&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]] &lt;/ins&gt;ever resolved? if so, what was the resolution/new estimate?&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Issa</name></author>
		
	</entry>
	<entry>
		<id>https://wiki.issarice.com/index.php?title=Stupid_questions&amp;diff=3&amp;oldid=prev</id>
		<title>Issa: Created page with &quot;* there&#039;s a bunch of different considerations that people talk about (like different takeoff scenarios, comparisons to nuclear arms control, etc.) and it&#039;s unclear to me how t...&quot;</title>
		<link rel="alternate" type="text/html" href="https://wiki.issarice.com/index.php?title=Stupid_questions&amp;diff=3&amp;oldid=prev"/>
		<updated>2020-02-17T23:34:43Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;* there&amp;#039;s a bunch of different considerations that people talk about (like different takeoff scenarios, comparisons to nuclear arms control, etc.) and it&amp;#039;s unclear to me how t...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;* there&amp;#039;s a bunch of different considerations that people talk about (like different takeoff scenarios, comparisons to nuclear arms control, etc.) and it&amp;#039;s unclear to me how the answers to these questions should influence our actions. even if we hammer out these strategy questions, would that change any of our actions? like if we suddenly knew with 100% certainty that there are three big insights needed to go from chimpanzee brains to human brains (but we wouldn&amp;#039;t know the content of the insights), what does that mean, in terms of what to do about AI safety?&lt;br /&gt;
* what is the minimal set of background assumptions/parameters that are needed to characterize the debate between eliezer and paul? (i am thinking of each person&amp;#039;s views as being &amp;quot;emergent&amp;quot; from some set of background assumptions.) e.g. [https://lw2.issarice.com/posts/mJ5oNYnkYrd4sD5uE/clarifying-some-key-hypotheses-in-ai-alignment] captures some of these, but i don&amp;#039;t think this is a minimal set (there are several others that i think are missing, and also some of the hypotheses listed here might be redundant/irrelevant.)&lt;br /&gt;
* &amp;lt;strike&amp;gt;in paul&amp;#039;s iterated amplification scheme, i don&amp;#039;t understand why we can&amp;#039;t just stop after the first iteration and use the human-level AI to do things; why do we have to keep amplifying?&amp;lt;/strike&amp;gt; -- i figured out the answer. i was mistaken about how capable the first round of IDA is, because the writeup itself was confusing. see [https://lw2.issarice.com/posts/HqLxuZ4LhaFhmAHWk/iterated-distillation-and-amplification-1#GeBLKN5FJjzDmdGqn my comment here] for more.&lt;br /&gt;
* what is the difference between informed oversight and reward engineering?&lt;br /&gt;
* what are some &amp;quot;easy&amp;quot;/doable open problems in agent foundations research? (if someone was doing a PhD in agent foundations, what problems would their advisor suggest for them?)&lt;br /&gt;
* what happened to intelligence amplification? in the early days of AI safety, people talked a lot about various intelligence amplification methods for navigating the singularity (e.g. cloning human geniuses, whole brain emulation, cognitive enhancement drugs, or implanted technology). the idea is that intelligence amplification will give us aligned entities that are smarter than us, which will eventually help us get a friendly AI. intelligence amplification was one of the three &amp;quot;main&amp;quot; paths that were discussed (along with technical alignment work, aka FAI theory, and coordination). nowadays when you look at &amp;quot;who is working on AI safety full time?&amp;quot;, you have the technical researchers, who work on different agendas but are all focused on aligning de novo AI, and then you have the policy people, who are looking at coordination.&lt;br /&gt;
* is there an easy problem of corrigibility? if so, what is it? if not, why did eliezer introduce the hard problem of corrigibility?&lt;br /&gt;
* AI safety prepping: what can individuals do to maximize their chances of surviving the singularity?&lt;br /&gt;
* does subagent imply mesa-optimizer? (also in the mesa-optimizers paper, &amp;quot;learned algorithm&amp;quot; doesn&amp;#039;t seem like it needs to be learned, which is confusing)&lt;br /&gt;
* i&amp;#039;m confused about whether/how the distilled agent in IDA is producing explanations of its own outputs, and also what these explanations look like.&lt;br /&gt;
* was the timeline discrepancy between eliezer and carl ever resolved? if so, what was the resolution/new estimate?&lt;/div&gt;</summary>
		<author><name>Issa</name></author>
		
	</entry>
</feed>