We consider two co-adapting agents, each with a different and incomplete view of the environment and each trying to optimize a different objective function, which depends on the behavior of both agents. The agents learn to increase their objective functions using gradient ascent, and learn the behavior of their opponent using stochastic approximation. In the limit of an infinitesimal step size we show the following results. Gradient ascent with perfect information about the opponent does not converge, but it can be made to converge using the lagging anchor algorithm of Dahl. When the agents have to model each other's behavior, there is a phase transition between stable and unstable behavior.
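
As a rough illustration of the first two results, the sketch below runs simultaneous gradient ascent on a toy bilinear zero-sum game, with and without an anchor term. Everything in it is an assumption made for illustration and is not taken from the paper: the objectives f1 = x*y and f2 = -x*y, the unconstrained scalar strategies, the names `simulate` and `anchor_rate`, and the particular form of the anchor update (each strategy is pulled toward a slowly moving anchor, which in turn drifts toward the strategy). It is meant only to convey the qualitative effect of a lagging anchor, not to reproduce Dahl's algorithm exactly.

```python
import math


def simulate(anchor_rate, step=0.01, n_steps=20000):
    """Simultaneous gradient ascent on a toy game (illustrative assumption).

    Agent 1 controls x and ascends f1 = x * y.
    Agent 2 controls y and ascends f2 = -x * y.
    With anchor_rate > 0 each strategy is also pulled toward its own
    anchor, and each anchor drifts toward the current strategy.
    Returns the final distance of (x, y) from the equilibrium (0, 0).
    """
    x, y = 1.0, 0.0      # initial strategies
    ax, ay = x, y        # anchors start at the strategies
    for _ in range(n_steps):
        grad_x = y       # d f1 / d x
        grad_y = -x      # d f2 / d y
        # Compute all updates from the old state, then assign together.
        new_x = x + step * (grad_x + anchor_rate * (ax - x))
        new_y = y + step * (grad_y + anchor_rate * (ay - y))
        new_ax = ax + step * anchor_rate * (x - ax)
        new_ay = ay + step * anchor_rate * (y - ay)
        x, y, ax, ay = new_x, new_y, new_ax, new_ay
    return math.hypot(x, y)


plain_radius = simulate(anchor_rate=0.0)     # plain gradient ascent
anchored_radius = simulate(anchor_rate=1.0)  # with the anchor term
print(f"plain gradient ascent: distance from equilibrium = {plain_radius:.3f}")
print(f"with anchor term:      distance from equilibrium = {anchored_radius:.3e}")
```

In this toy setting the plain dynamics circle the equilibrium without approaching it (with a finite step they slowly drift outward), while the anchored dynamics spiral inward. The example assumes perfect information about the opponent; it does not model the stochastic-approximation case or the phase transition.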