Fine-grain data-level parallelism can be exploited using SIMD or short-vector architectures, which have become ubiquitous in DSPs, desktops and servers. Programmers have long benefited from such instructions, both through programming-language extensions and through automatic vectorization (simdization) by compilers. Yet several limitations, mostly related to gathering data to feed the SIMD processing engine, still hinder the wider use and efficiency of SIMD extensions; these include limitations in programmability, compilability and portability. In this short talk we'll discuss several approaches that aim to overcome these limitations.
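As a hedged illustration of the data-gathering problem (a sketch of our own, not material from the talk), the two C functions below compute the same kind of sum but differ in how they access memory. The function names are hypothetical. The first reads consecutive elements, which a vectorizing compiler can typically pack into contiguous vector loads. The second reads elements through an index array, so each vector's operands must be gathered from scattered addresses.

```c
#include <stddef.h>

/* Hypothetical example: elements are read from consecutive addresses,
   so a vectorizing compiler can typically fill a SIMD register with a
   single contiguous vector load. */
long sum_contiguous(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Hypothetical example: each element's address depends on idx[i], so the
   operands are scattered in memory. To vectorize this loop, the compiler
   needs a gather instruction (such as AVX2's vpgatherdd) or must assemble
   each vector from separate scalar loads. */
long sum_gathered(const int *a, const int *idx, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[idx[i]];
    return sum;
}
```

When `idx` is a permutation of `0..n-1`, both functions return the same result; only the memory access pattern, and therefore how easily the loop maps onto SIMD hardware, differs.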