davv: (Corvid)
[personal profile] davv
In the domain of artificial intelligence, I've been thinking about what one may call a "General Optimizer" (or perhaps Great Optimizer, though there can be no one single Great - in the Platonic/Vasai sense - Optimizer). The General Optimizer is a system that you provide with a fitness function, and then it acts to maximize the value of that fitness function, given the manipulators and actions available to it. What makes the optimizer General is that the fitness function can be anything you can evaluate. It might be the proportion of red pixels in a picture the Optimizer gets from a camera. It might be the fraction of satisfied clauses in a SAT instance; or it might be the average return on some stock market data according to a strategy it devises.

The interesting thing about the General Optimizer is that now we have an idea of how it would work. The algorithm is completely impractical, but at least we have an algorithm and a general idea of what modules are necessary.

The impractical algorithm consists of two components. First, there is an explanation aspect, which tries to produce a predictor of reality (the domain where changes may be effected) and of the fitness function. Second, there is a planner aspect, which determines the action to perform so that the best prediction so far, when combined with that action, predicts as large an improvement in the fitness function as possible.

The predictor then uses a clever trick. It assumes that reality is computable, and that simpler models are more likely to be true. In other words, all laws of physics (or mathematics) can be run through an ordinary computer, and simple laws are more likely to be true than are complex laws. (There's a related proof that every kind of predictor has to make *some* kind of assumption, and so the reasoning goes that this assumption of computability seems to be a good one in practice.)

So the impractical predictor takes a sum over every computer program that reproduces what it has seen so far, weighting a program of length L by 2^(-L). This is like taking a vote, but one where short programs (simple laws) have a much stronger vote than long programs. Yes, this is uncomputable, since it could be used to find a data set's ultimate compression (its Kolmogorov complexity), which is known to be impossible on an ordinary computer -- but the point here is to find the ideal.

The impractical planner is much simpler. Since the predictor now has a correctly weighted "committee" (as one may say, of all programs), it can extend the prediction as far into the future as it desires. This may take a very long time, but again: ideal! Thus it goes through every possible action it can make, and checks what the predictor says will happen if it performs that action. One of these actions will eventually give the best return on the fitness function, and it then picks that one.
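A minimal sketch of that exhaustive planner, with every name hypothetical: try each available action, ask the predictor what results, and keep whichever action the fitness function scores highest.

```python
# Minimal sketch of the exhaustive planner: try every available action,
# ask the predictor what state results, and keep whichever action the
# fitness function scores highest.  All names are illustrative.

def plan(state, actions, predict, fitness):
    best_action, best_value = None, float("-inf")
    for action in actions:
        value = fitness(predict(state, action))
        if value > best_value:
            best_action, best_value = action, value
    return best_action

# Tiny worked example: the state is a number, actions nudge it, and the
# fitness function rewards being close to 10.
actions = [-1, 0, +1]
predict = lambda s, a: s + a          # a perfect model of this toy world
fitness = lambda s: -abs(s - 10)
print(plan(7, actions, predict, fitness))  # moves toward 10: prints 1
```

The real version would, of course, roll the predictor forward over whole action sequences rather than a single step; this only shows the argmax-over-actions shape of the idea.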

If reality diverges from the prediction, that's no problem. As long as reality is computable, that only provides additional data for the next step of the impractical predictor.

Date: 2013-02-25 03:01 am (UTC)
From: [personal profile] lhexa
I think you could cause some mischief with the "fitness function". For instance, suppose your fitness function is a measure of how well a program solves some differential equation. Then you have a hierarchy of increasingly long programs:

One using Euler's method: on the order of dt (time step) in error.
One using the Improved Euler's method: order of dt^2, now.
A third-order Runge-Kutta: dt^3
...

It keeps going, without any theoretical bound -- the general notion is that each step lets you implicitly incorporate more derivatives. So maximizing the fitness function leads you to ever longer programs.
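That hierarchy is easy to see numerically. A small demonstration on y' = y, y(0) = 1 (exact value e at t = 1): halving the step should roughly halve Euler's error (order dt) and quarter the improved Euler error (order dt^2).

```python
import math

# Error hierarchy on y' = y, y(0) = 1, whose exact solution at t = 1 is e.
# Halving dt should roughly halve Euler's error (order dt) and quarter
# the improved Euler error (order dt^2).

def euler(f, y, dt, steps):
    for _ in range(steps):
        y += dt * f(y)
    return y

def improved_euler(f, y, dt, steps):   # a.k.a. Heun's method
    for _ in range(steps):
        k1 = f(y)
        k2 = f(y + dt * k1)
        y += dt * (k1 + k2) / 2
    return y

f = lambda y: y
exact = math.e
for method in (euler, improved_euler):
    e1 = abs(method(f, 1.0, 1 / 100, 100) - exact)
    e2 = abs(method(f, 1.0, 1 / 200, 200) - exact)
    print(method.__name__, round(e1 / e2, 2))  # ~2 for euler, ~4 for improved
```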

The way that I see around this is to always have your fitness function incorporate some measure of the very computer's abilities, rather than being based entirely on program output... so, for instance, the function could weigh the order of approximation against the total rounding error (each higher-order approximation adds more terms and thus more rounding error). Or something else...

Oh, and, the reason I bring up this issue is that my advisor wants me to convert an Improved Euler code into a Runge-Kutta code. *sighs*

Date: 2013-12-07 09:03 am (UTC)
lhexa: (literate)
From: [personal profile] lhexa
That's good, to be sure. I just hope it wasn't entirely arcane, so that the "1-6-4-6-1" stuff made some sense.

Date: 2014-01-04 07:59 am (UTC)
lhexa: (literate)
From: [personal profile] lhexa
It sounds like you know as much as I do, at this point. I still don't know what sorts of ODEs the series-expansion methods fail on (solutions with poles, maybe?), or what motivates the choice of what order of convergence to use (since you can go arbitrarily high), and the textbook that I used (on differential equations in general, not numerical methods specifically) didn't say.

Also, just wait until you start studying numerical methods for PDEs... *rolls eyes*

Date: 2014-01-08 07:02 pm (UTC)
lhexa: (literate)
From: [personal profile] lhexa
You shouldn't have to take derivatives, unless you're deriving the method itself. For instance, the first and second derivatives have approximate expressions:

f'(t) ≈ (1/(2*h)) * (f(t+h) - f(t-h))
f''(t) ≈ (1/h^2) * (f(t+h) - 2*f(t) + f(t-h))

and more for higher derivatives. If you take the Taylor expansion for a function and plug in these expressions (using lower-order methods for the terms not yet evaluated, like f(t+h)), you will end up with an expression for f(t+h) in terms of f(t), f(t-h), etc., but with no explicit derivatives. I think going up to the second derivative gives you the third-order method, and going up to the third derivative gives you RK4. But there's no trickiness in that regard, unless evaluating f(t) is itself tricky.
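Those two central-difference formulas are easy to sanity-check numerically, say on f = sin at t = 1, where f'(1) = cos(1) and f''(1) = -sin(1):

```python
import math

# Quick numerical check of the two central-difference formulas above,
# using f = sin at t = 1, where f'(1) = cos(1) and f''(1) = -sin(1).

def d1(f, t, h):
    return (f(t + h) - f(t - h)) / (2 * h)

def d2(f, t, h):
    return (f(t + h) - 2 * f(t) + f(t - h)) / h**2

t, h = 1.0, 1e-3
print(abs(d1(math.sin, t, h) - math.cos(t)))   # tiny: second-order accurate
print(abs(d2(math.sin, t, h) + math.sin(t)))   # also tiny
```

Both formulas are accurate to order h^2, which is why the errors here come out so small even with a modest step.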

Not that I'm particularly good at, or knowledgeable about, numerical methods. *sighs*
