The large gap between sustained and peak performance on current high-performance computing platforms remains a serious problem. There are many causes, but two are commonly cited: the difficulty of achieving good single-node performance and the complexity of managing distributed-memory code. Various solutions have been proposed, including the development of new high-productivity parallel languages. However, addressing both issues effectively requires tackling data locality, in local memory hierarchies as well as in distributed-memory parallelism. Experience shows that this is a very challenging problem for compilers, even in the most thoroughly studied cases. This suggests that the system software and the programmer need to cooperate more closely.