Computational modeling and simulation are now accepted as a common mode of knowledge discovery and design across science, engineering, and manufacturing. In many such applications, the models are based on partial differential equations, which are typically solved numerically using sparse or irregular scientific computing kernels. Such sparse kernels differ greatly in their memory, cache, and FPU usage from the SPEC floating-point benchmark kernels and the more traditional dense-matrix LINPACK benchmark. For example, dense matrix kernels have high levels of data locality and reuse that enable them to execute at near-peak rates on high-performance microprocessor architectures with deep cache hierarchies and high clock frequencies. Sparse kernels, in contrast, can utilize only a fraction of the available processing speed: they perform a large number of data accesses per floating-point operation, and their data locality and reuse remain limited despite algorithmic changes and considerable code tuning through blocking and loop unrolling. In this paper, we take into account current trends in power-aware high-performance architecture design to explore the impact of memory subsystem optimizations on sparse scientific codes. We propose a variety of features to enable energy-aware, high-performance sparse scientific applications. We use simulations with SimpleScalar and Wattch to demonstrate the effectiveness of our approach for an optimized sparse matrix-vector multiplication kernel and two sparse kernels from the NAS benchmarks. Our results indicate that, in certain cases, our optimizations can reduce cache miss rates by 90\%, application time by 80\%, and the energy $\times$ time metric by 90\%, at system power levels less than or equal to those in the original configuration.
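To make the ratio of data accesses to floating-point operations concrete, the following is a minimal sketch of a sparse matrix-vector product. It assumes the compressed sparse row (CSR) storage format, which the text above does not specify; the function name `csr_spmv` and the small test matrix are illustrative only and are not the optimized kernel evaluated in the paper.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # Nonzeros of row i occupy positions row_ptr[i] .. row_ptr[i+1]-1.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # Three loads (values[k], col_idx[k], x[col_idx[k]]) feed only
            # two floating-point operations (one multiply, one add).
            # The access to x is indirect, so its locality depends on the
            # sparsity pattern rather than on the loop structure.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x3 matrix:
#   [[4, 0, 1],
#    [0, 3, 0],
#    [2, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
x = [1.0, 2.0, 3.0]

y = csr_spmv(row_ptr, col_idx, values, x)
print(y)
```

In this sketch each matrix entry is used exactly once, so only the vector `x` offers any opportunity for cache reuse, whereas a blocked dense kernel reuses each loaded operand many times.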