Big World, Small World
Jan. 12th, 2019 04:46 pm

At the last AI Safety reading group meeting, we discussed the Decision Theory section of MIRI's Embedded Agency post.
We had some confusion over the idea of an agent being 'bigger than the world' or 'smaller than the world'. This is a braindump of my thoughts on the spectrum of 'small' vs 'big' worlds.
It's a Small World
At one extreme, there are problems that are completely computable, such that the agent can hold a complete and precise map of the game 'in their head'. Examples include very simple discrete games like tic-tac-toe. You can also think of simple utility-maximization problems you'd see in basic microeconomics, optimizing a differentiable f(x1,x2) subject to differentiable constraints g(x1,x2) = C, h(x1,x2) = D. (At the continuous limit, where f, g, h become integrals over a continuous variable x, this is solved by one of my favorite tools, the Euler-Lagrange equation.)
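For concreteness, here is the standard machinery that paragraph leans on, written out in textbook form (nothing here is specific to this post): the Lagrangian stationarity condition for the discrete problem, and the Euler-Lagrange equation for the continuous limit.

```latex
% Discrete case: maximize f(x_1,x_2) subject to g(x_1,x_2)=C and h(x_1,x_2)=D.
% Candidate optima are stationary points of the Lagrangian:
\mathcal{L}(x_1,x_2,\lambda,\mu)
  = f(x_1,x_2) - \lambda\,[\,g(x_1,x_2)-C\,] - \mu\,[\,h(x_1,x_2)-D\,],
\qquad \nabla_{x_1,x_2,\lambda,\mu}\,\mathcal{L} = 0.

% Continuous limit: extremize \int L(x, y, y')\,dx over paths y(x).
% The Euler-Lagrange equation is the stationarity condition:
\frac{\partial L}{\partial y} - \frac{d}{dx}\!\left(\frac{\partial L}{\partial y'}\right) = 0.
```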
At first blush, this class of problems seems pretty irrelevant compared to those that, y'know, require actual intellectual work. On reflection, that's not entirely true. What falls into this class is determined by what we have tractable algorithms for, so it has widened over time as theory and computing power have advanced. Also, difficult problems are often solved in part by approximating them as a fully solvable problem. (Example: generate a game tree N moves out, evaluate the game states at depth N, then choose the best move the same way you would in tic-tac-toe.)
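Here's a minimal sketch of that example, assuming a hypothetical game object with legal_moves(), apply(move), and evaluate() methods; it's just standard depth-limited minimax, not anything from the original post.

```python
# Depth-limited minimax: expand the game tree N moves out, score the frontier
# states with a static evaluation function, then back values up exactly as
# you would in a fully solved game like tic-tac-toe.
# `game` is a hypothetical interface with .legal_moves(), .apply(move), and
# .evaluate() (a heuristic score from the maximizer's point of view).

def minimax(game, depth, maximizing=True):
    moves = game.legal_moves()
    if depth == 0 or not moves:
        return game.evaluate(), None  # frontier: fall back to the heuristic

    best_move = None
    best_value = float('-inf') if maximizing else float('inf')
    for move in moves:
        value, _ = minimax(game.apply(move), depth - 1, not maximizing)
        if (maximizing and value > best_value) or (not maximizing and value < best_value):
            best_value, best_move = value, move
    return best_value, best_move
```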
A Little Bit Taller
Moving up the ladder, we have worlds that we can construct good representations or approximate solutions for. Often a solution is found through iteration over approximate solutions, with the convergence to some 'good' solution usually only empirically supported. At the simpler end, there may be theoretical convergence guarantees. Examples: finding the thermodynamic distribution of a molecular system through molecular modeling; playing chess or Go; playing Atari games.
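As one concrete instance of 'iterate and check convergence empirically', here's a toy Metropolis Monte Carlo sketch that samples a one-dimensional Boltzmann distribution; the harmonic energy function, step size, and temperature are illustrative choices, not anything from the post.

```python
import math
import random

# Metropolis Monte Carlo: repeatedly propose local moves and accept them with
# probability min(1, exp(-dE/kT)).  The empirical distribution of visited
# states converges to the Boltzmann distribution exp(-E/kT); in practice you
# usually just check convergence empirically.

def energy(x):
    return 0.5 * x * x          # toy harmonic potential (illustrative)

def metropolis(n_steps=100_000, kT=1.0, step=0.5):
    x, samples = 0.0, []
    for _ in range(n_steps):
        x_new = x + random.uniform(-step, step)
        dE = energy(x_new) - energy(x)
        if dE <= 0 or random.random() < math.exp(-dE / kT):
            x = x_new               # accept the proposed move
        samples.append(x)
    return samples

samples = metropolis()
print(sum(samples) / len(samples))  # should hover near 0 for this potential
```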
Lots of useful problems lie around here. If I knew more computational complexity stuff, there would probably be useful things to say about polynomial vs exponential time.
Playing Ball
Now we introduce the second aspect of 'bigness': how good a representation the world has of the agent. At the simplest level, the world may be able to approximate the agent as aiming for some fixed (i.e. non-agent-dependent) utility function. For example, most chess or Go algorithms implicitly assume the agent (their opponent) is optimizing for the algorithm's own position-evaluation function. This can also matter when the agent plays against imperfect players of solvable games like tic-tac-toe. In either case, it's possible to do better by modeling opponents correctly. There's probably some interesting stuff in this question of modeling an opponent's capacity. Note that the basic game-theoretic move of computing Nash equilibria assumes that the opponent is capable of computing a Nash equilibrium, will choose to pursue it (i.e. simulates the agent as a Nash player), and that there is a common-knowledge map from game rewards to each player's utility (each agent knows the other's utility function).
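To make the common-knowledge point concrete, here's a minimal sketch that checks whether a pure strategy profile is a Nash equilibrium of a two-player game; note that the check needs both players' full payoff matrices. The Stag Hunt numbers are a standard illustrative choice.

```python
# A pure strategy profile (i, j) is a Nash equilibrium iff neither player can
# gain by deviating unilaterally.  The check requires full knowledge of *both*
# players' payoff matrices -- exactly the common-knowledge assumption above.

def is_pure_nash(payoff_row, payoff_col, i, j):
    row_ok = all(payoff_row[i][j] >= payoff_row[k][j] for k in range(len(payoff_row)))
    col_ok = all(payoff_col[i][j] >= payoff_col[i][k] for k in range(len(payoff_col[i])))
    return row_ok and col_ok

# Example: Stag Hunt.  Action 0 = Stag, 1 = Hare.
stag_hunt_row = [[4, 0], [3, 3]]   # row player's payoffs
stag_hunt_col = [[4, 3], [0, 3]]   # column player's payoffs
print(is_pure_nash(stag_hunt_row, stag_hunt_col, 0, 0))  # True: (Stag, Stag)
print(is_pure_nash(stag_hunt_row, stag_hunt_col, 1, 1))  # True: (Hare, Hare)
print(is_pure_nash(stag_hunt_row, stag_hunt_col, 0, 1))  # False
```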
Things get more complicated when the representation of the agent is parameterized in some way, such that the agent's actions provide information about its goals. In the simplest case, the agent knows exactly how its actions affect other players, and can tractably include this in its game tree or utility function. In reality, the details of others' thinking are often complex, though we can hope that a simplified representation captures most of the information. A trading algorithm, for example, might be able to accurately model the market impact of its buy and sell orders, and use this information to maximize expected profits. However, in a real market, it usually has no way to learn that trader N has become more skittish in the X market, at least until it sees evidence that its trades there are having more impact than usual. Trading is a useful model here because the simulation effect varies continuously with trader size. Small traders in liquid markets can generally assume their trades have effectively no impact, and so are free to simply optimize on price plus their models of other traders. In contrast, traders who provide a substantial portion of liquidity must assume that others are modeling them, opening the door to the potentially infinite hierarchy of models-of-models.
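Here's a toy version of 'the agent knows exactly how its actions affect the market': with an assumed linear price-impact model, expected profit per trade is quadratic in size and trivially maximized. The model and numbers are purely illustrative, not a description of any real trading system.

```python
# Toy linear-impact model: trading q shares moves the execution price by
# impact * q, so expected profit is q * edge - impact * q**2.
# Maximizing this quadratic gives q* = edge / (2 * impact).

def expected_profit(q, edge, impact):
    return q * edge - impact * q * q

def optimal_size(edge, impact):
    return edge / (2 * impact)

edge, impact = 0.10, 0.002   # $0.10/share of perceived edge; illustrative impact coefficient
q_star = optimal_size(edge, impact)
print(q_star, expected_profit(q_star, edge, impact))  # 25.0 shares, $1.25 expected
```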
A very simple version of simulation is in cooperative coordination games, such as the Stag Hunt. For a slightly more sophisticated example, consider the Cross Game: each player chooses Left, Right, Top, Bottom, or Middle. If they both choose the same square, they get reward 1; else, reward 0. I like this model because it's easy to tweak in interesting ways, by changing the symmetry of the actions and introducing asymmetry in the rewards. You can introduce penalties for guessing wrong, or penalties for choosing the obvious Schelling point. What's 'correct' might depend a lot on what the players know about each other's mental structure, or what they think is deducible from first principles. What if the game is to choose a 5-digit decimal between .00001 and .99999, not inclusive? As with all things, this leads back to Solomonoff Induction, which relates to some neat ideas. And it's also pretty fun to think about humans rather than algorithms playing these games, though I make no claim that it's especially useful.
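Here's the Cross Game written out as code: both players pick a square independently, reward 1 for a match, 0 otherwise. The expected-reward calculation under independent mixed strategies is just the obvious formalization.

```python
# Cross Game: each player independently picks one of five squares; both get
# reward 1 iff they match.  Under independent mixed strategies p and q, the
# expected reward is sum_s p[s] * q[s] -- maximized by both players piling
# probability onto the same square (a Schelling point).

SQUARES = ["Left", "Right", "Top", "Bottom", "Middle"]

def expected_reward(p, q):
    return sum(p[s] * q[s] for s in SQUARES)

uniform = {s: 0.2 for s in SQUARES}
middle = {s: (1.0 if s == "Middle" else 0.0) for s in SQUARES}

print(expected_reward(uniform, uniform))  # 0.2: no coordination
print(expected_reward(middle, middle))    # 1.0: both commit to the Schelling point
```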
Head Games
At the extreme, of course, you get worlds with simulations of the agent. The simple end of this is Newcomb's problem. Functional Decision Theory is an attempt to deal with this, and I think it makes sense, but apparently it's not as concretely specified as traditional decision theorists would like. Not sure what the gaps are there, but that's an obvious place to look for deep conceptual difficulties/needs.
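For reference, here is the expected-value arithmetic that makes Newcomb's problem bite, using the usual $1,000,000 / $1,000 payoffs and a predictor with accuracy p (the standard setup, nothing specific to this post).

```python
# Newcomb's problem with predictor accuracy p: if you are predicted to
# one-box, the opaque box holds $1,000,000; the transparent box always holds
# $1,000.  Conditioning on the prediction tracking your actual choice:
#   EV(one-box) = p * 1_000_000
#   EV(two-box) = 1_000 + (1 - p) * 1_000_000

def ev_one_box(p):
    return p * 1_000_000

def ev_two_box(p):
    return 1_000 + (1 - p) * 1_000_000

for p in (0.5, 0.9, 0.99):
    print(p, ev_one_box(p), ev_two_box(p))
# One-boxing pulls ahead once p exceeds roughly 0.5005; the decision-theory
# dispute is over whether this conditioning is the right way to compute the
# expected value in the first place.
```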
Before that point, though, there's a lot of fun, meaty stuff with regard to co-simulation. I'm no expert, but my sense is that this is quite difficult to handle analytically, since you can't use your clean optimization tools as easily. A lot can depend on the precise level of the agent's ability to simulate its opponents, and vice versa. One neat post is Scott Garrabrant on The Prisoners' Dilemma With Modeling Costs. It seems like the literature on bounded rationality and the like would be useful here. Anecdotally, a lot of rationalist advice for humans rests on tools that assume arbitrary computing power and world-knowledge (Solomonoff Induction, Bayes, expected-utility-maximization); that might lead us pretty far astray, so it would be nice to have tools for bounded agents. That notion seems to be at the heart of the Embedded Agency post.