Moore's prediction -- commonly known as Moore's "Law," although it is not a scientific law in the strict sense -- indicates that within the next few years we will have digital circuits with ten billion transistors (Intel already ships a processor chip with more than a billion transistors). Clearly, a portion of the billion-transistor integrated circuit market will consist of traditional ASICs, e.g., for very high-volume devices such as cell phones. Another portion will be dominated by processor designs such as Intel's Merced/Itanium architecture. The rest of the picture is less clear; however, some share will likely go to customizable heterogeneous multiprocessor chips in which a substantial fraction of the chip (say, 30-60 percent) consists of reconfigurable and custom digital logic. For lack of a better term, we will refer to such Customizable Heterogeneous Multiprocessor chips as CHM chips. One example of a CHM chip is the Virtex-4. The standard argument in favor of RISC holds that a processor's compiler and architecture must be designed together, or codesigned. Similarly, we will argue that CHM chips require codesign of the architecture and the RTOS that runs on it. The Hardware/Software Codesign Group at Georgia Tech is working on several ideas in this domain.
Specifically, we will give a brief overview of three recent projects: (i) a System-on-a-Chip Lock Cache (SoCLC), in which lock variables are placed in a special lock cache in a CHM chip -- a client-server example using SoCLC shows lock latency reduced by a factor of up to 3.65X, resulting in an overall speedup of 31% for the application; (ii) a specialized hardware structure and associated algorithm, implemented in reconfigurable logic, that speeds up deadlock detection by two to three orders of magnitude compared with software algorithms, resulting in a 38% overall speedup in a practical deadlock scenario; and (iii) a SoC Dynamic Memory Management Unit (SoCDMMU), integrated with a software RTOS, that provides worst-case second-level memory allocation in 16 cycles in a four-processor SoC example -- in that example, average-case application transition time is 4.4X faster and worst-case application transition time is over 10X faster with the SoCDMMU than with the traditional software approach. The talk will end with a brief description of a hardware/software RTOS generation framework able to integrate any mix of the three hardware RTOS units (i, ii or iii above) with a software RTOS. This talk was given as the keynote at the opening of the FPGAworld 2004 Conference.