Optimization of large-scale simulations is a crucial driver for petascale computing, particularly when the simulations represent the solution of PDEs. The optimization problem can be orders of magnitude more challenging to solve than the simulation problem: it can be ill-posed even when the simulation problem is well posed, 4D in space-time when the simulation problem is 3D evolutionary, dense when the simulation problem is sparse, and require many forward simulations for convergence. We discuss, by way of example, challenges in scaling up optimization methods to thousands of processors and billions of unknowns. Examples include inverse problems, optimal control, and optimal design problems for elliptic, parabolic, and hyperbolic systems. We conclude that parallel scalability can be achieved as straightforwardly as for the forward simulation. Algorithmic scalability is often harder to achieve, but in some cases optimization is achieved in a small multiple of the cost of the forward simulation.