Many important applications of high performance computing involve frequent, unstructured memory accesses. Among these applications are graph algorithms which arise in a wide range of important applications including linear algebra, biology and informatics. Graph operations often involve following sequences of edges, which requires minimal computation but frequent accesses to unpredictable locations in global memory. These characteristics result in poor performance on traditional microprocessors, and even worse performance on common parallel computers. In recent work, we have explored the performance of graph algorithms on the massively multithreaded Cray MTA-2. The MTA's latency tolerance and fine-grained synchronization mechanisms allow for high performance of single processor and parallel graph algorithms. As large-scale, data-centric computing continues to grow in importance, these results have important lessons for future architectures.