According to conventional wisdom, it is best to use powerful processors, in moderate numbers, to build supercomputers. However, rising power dissipation in microprocessor chips is driving computer architects towards a variety of solutions, all of which require exploiting greater degrees of parallelism. There are several hurdles to exploiting greater levels of parallelism, such as programming complexity, communication bottlenecks, interference from operating system services, and system management costs. In this talk, we describe our experiences with the IBM Blue Gene project, carried out in collaboration with Lawrence Livermore National Laboratory, in pushing the limits of scalability in all aspects of system design. We have shown that a system with over a hundred thousand processors can operate reliably enough to support production science. We have also been able to scale several codes to unprecedented levels of parallelism, ranging from tens of thousands of processors to over a hundred thousand processors. We describe some of the modifications to applications and system software that were needed to eliminate specific bottlenecks, and outline several outstanding challenges as we move towards petascale computing.