davv:
The impractical optimizer layout I gave has been discovered before -- that's where I took it from, actually. It's called AIXI. The predictor system is called a "universal prior" (though to be truly universal it depends on the assumption that the universe is computable, i.e. the Church-Turing thesis). The attempt at a more practical system is called MC-AIXI (MC for Monte Carlo). I was dimly aware of it (I even linked to it earlier), but I hadn't actually read the paper.
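To make the "universal prior" a bit more concrete, here's a toy sketch in Python -- my own illustration, not anything from the paper. The real thing runs every possible program on a universal machine (and is uncomputable); here candidate_programs is a hypothetical finite stand-in, and predicts is a made-up callback for "how much probability does this program give to the observation".

```python
# Toy illustration only: the universal prior weighs every program by
# 2^-length, so shorter programs (simpler explanations) start out more
# probable, and predictions are a weighted mixture over all of them.
# "candidate_programs" and "predicts" are hypothetical stand-ins.

def prior_weight(program: str) -> float:
    """2^-length weighting: simpler hypotheses get more initial credence."""
    return 2.0 ** (-len(program))

def mixture_probability(observation, candidate_programs, predicts):
    """Probability the weighted mixture assigns to `observation`.

    `predicts(program, observation)` should return the probability that
    `program` assigns to the observation.
    """
    total = sum(prior_weight(p) for p in candidate_programs)
    return sum(
        (prior_weight(p) / total) * predicts(p, observation)
        for p in candidate_programs
    )
```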

Once I actually read it, amid lots of mathematics I wasn't really equipped to reason about, I found that the authors had already thought of my idea. They make use of the clever predictor itself and say, more or less:

If we have some program that consistently shows up with a high weight in our predictions of reality, then go through this program (and its variants) first.

This is a very nice solution. If reality starts to diverge from the already-stored predictors, they'll drop in weight and disappear. If not, they're handled by the same kind of process as the baseline, so there's never an abrupt jump between the baseline reasoning and "baseline, augmented" reasoning.

In the plain old impractical optimizer, memoizing like this serves no purpose because the system goes through every possible program anyway. But to be workable in the real world, MC-AIXI samples. That's the Monte Carlo part. And when sampling, one doesn't go through every program, only some of them... so storing what has worked in the past proves useful indeed.
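Here's a rough sketch of how I picture that memoization working -- my framing, not the authors' code. Keep a pool of candidate predictor programs with weights, scale each weight by how well its program predicted the latest observation (so models that diverge from reality fade away), and when you can only afford to sample a few programs, sample the proven high-weight ones first:

```python
import random

class PredictorPool:
    def __init__(self, programs, prior_weight):
        # Start every program at its prior weight (e.g. 2^-length).
        self.weights = {p: prior_weight(p) for p in programs}

    def update(self, observation, predicts):
        """Scale each weight by the probability its program assigned to
        what actually happened; programs that diverge from reality fade."""
        for p in list(self.weights):
            self.weights[p] *= predicts(p, observation)
            if self.weights[p] < 1e-12:   # effectively falsified, drop it
                del self.weights[p]

    def sample(self, k):
        """Pick k programs to reason with: the proven high-weight ones
        first, then a random assortment of the rest."""
        ranked = sorted(self.weights, key=self.weights.get, reverse=True)
        proven, rest = ranked[:k // 2], ranked[k // 2:]
        return proven + random.sample(rest, min(len(rest), k - len(proven)))
```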

And unlike the impractical optimizer, the Monte-Carlo version isn't uncomputable. It actually works on toy problems with current computers, though it still takes a long time. (Here it plays partially observable Pacman.)

Will this thing take over the world? Probably not. Say the impractical optimizer is like looking at a game and divining optimal play from the entire tree of "if I do this, he does that, then I do this, then he does that" in an instant. The Monte-Carlo variant is like an early computer Go player. More cleverness will probably be needed to make a working General Optimizer... and there's always the question of whether we should be given that kind of power :)
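The Go-player analogy can be spelled out a little, too: instead of expanding the whole game tree, you sample a bunch of playouts from each candidate action through your model of the world and keep the action with the best average return. Another sketch of mine, where model.step is a hypothetical simulator rather than anything from the paper:

```python
import random

def monte_carlo_plan(model, state, actions, horizon=10, rollouts=100):
    """Pick the action whose random playouts score best on average.
    `model.step(state, action)` is a hypothetical simulator returning
    (next_state, reward)."""

    def rollout(s, first_action):
        total, a = 0.0, first_action
        for _ in range(horizon):
            s, reward = model.step(s, a)
            total += reward
            a = random.choice(actions)  # random playout after the first move
        return total

    def average_return(a):
        return sum(rollout(state, a) for _ in range(rollouts)) / rollouts

    return max(actions, key=average_return)
```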

What we really got out of it is:
- From the impractical optimizer, we have an idea of how to do general optimization in principle, even if running it as described is impossible.
- From the Monte-Carlo optimizer, we have something that works, albeit very slowly.

These are footholds. The people who made MC-AIXI haven't climbed the mountain, but they have given us some ideas of what kind of climbing could work.