Discovering useful knowledge in large image and video datasets is a common interest in content-based retrieval. Such knowledge can serve as "entry points" for exploring the index structure, content summarization, and semantic analysis. In this talk, we present two of our recent works: (i) automatic discovery of near-duplicate keyframes in broadcast videos, and (ii) automatic discovery of common patterns in images. Near-duplicate keyframes (NDKs) are, by definition, similar keyframes that have undergone a variety of changes, including changes in lighting, viewpoint, color, contrast, and editing. While human subjects could probably identify them by taking the time to browse through the entire set of keyframes, it is difficult for a machine to discover them automatically, partly because of the various local variations introduced into NDKs. We present techniques to effectively detect NDK pairs and efficiently track them into groups by utilizing both content and external cues. For the content cue, we demonstrate that local interest points are a good choice for this problem, given appropriate choices of features, point-to-point matching strategy, and learning mechanism. For the external cue, we utilize the time-span distribution and the transitive nature of NDKs to rapidly grow and discover NDK groups. A different version of the problem, compared to NDK discovery, is the automatic identification of common patterns in images. The patterns may appear amid background clutter and under various viewpoint and scale changes. The task is challenging if we simply want to find common concepts that appear frequently and are potentially useful for describing an arbitrarily given image data set. For human subjects, it is easy to identify the items common to a few images, but doing so becomes time-consuming for a moderately large image data set. For a machine, naturally but also interestingly, it is easier to identify common items among three images than between two.
To tackle this problem, we propose a novel segmentation-insensitive approach for mining common patterns from as few as two images. Based on the Earth Mover's Distance (EMD), we propose an approach, namely local flow maximization (LFM), to find the best estimate of the location and scale of a common pattern under the multiple instance learning (MIL) setting. In addition to finding common patterns, we also demonstrate that this approach can yield encouraging results for image retrieval with a feedback mechanism.
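
For readers unfamiliar with the Earth Mover's Distance on which LFM builds, the following is a minimal sketch of EMD in its simplest setting: two one-dimensional histograms with equal total mass, defined over the same unit-spaced bins. It is not the LFM method described in the talk, and the function name `emd_1d` is a hypothetical one chosen for illustration. In this special case the distance reduces to the sum of absolute cumulative differences, because the running surplus after each bin is exactly the mass that has to be moved across to the next bin.

```python
from itertools import accumulate


def emd_1d(p, q):
    """Earth Mover's Distance between two equal-mass 1D histograms.

    Illustrative assumption: both histograms share the same bins and
    adjacent bins are one unit apart. This is NOT the LFM method.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    if abs(sum(p) - sum(q)) > 1e-9:
        raise ValueError("histograms must have equal total mass")
    # Per-bin surplus of p over q.
    diffs = (a - b for a, b in zip(p, q))
    # The cumulative surplus after each bin is the mass that must be
    # carried over the boundary to the next bin; each carry costs one unit.
    return sum(abs(carry) for carry in accumulate(diffs))


if __name__ == "__main__":
    # All mass moves two bins to the right, so the cost is 2.
    print(emd_1d([1, 0, 0], [0, 0, 1]))
```

General EMD between sets of weighted image features, as used for pattern mining, requires solving a transportation problem rather than this closed form; the sketch only conveys the notion of "flow" between two distributions.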