The classic problem of ad hoc information retrieval involves a user with an information need, a representation or expression of that information need (the query), and a system or retrieval engine that compares the query against a collection of items in order to return the items most relevant to the user's information need. Despite numerous and obvious exceptions, text retrieval generally exhibits a fairly high correlation between the syntax of a query as expressed in language and the semantics of the information need: textual similarity is highly correlated with relevance. In content-based multimedia retrieval (images, video, music, 3D models), on the other hand, objects carry many kinds of semantics along many different dimensions. In music, for example, there are properties of pitch, tempo, rhythm, timbre, singer characteristics, genre, instrumentation, year of production, and so on. Here the correlation between similarity and relevance is much lower. Two pieces of music might be similar because they share instruments, timbres, tempos and singers, but they are not necessarily both relevant to my information need if I am looking for waltzes and one piece is in 3/4 while the other is in 4/4. The currently popular solution to this problem, characterized by buzzwords such as "collective intelligence", "wisdom of crowds" and "Web 2.0", is to bypass content altogether. By aggregating the media interactions (playlists, tags, click behavior, etc.) of massive numbers of people, the collective intelligence approach hopes to determine relevance directly, without the need for content-based methods. If people are not only the ultimate consumers but also the ultimate producers of relevance, why waste any effort on a problem as difficult as content-based retrieval? In our presentation we reject this notion of complete reliance on collective intelligence methods and argue that content-based methods are necessary.
Aggregate crowd relevance information may be able to tell us what should be retrieved, but it still will not tell us why something was retrieved. For that, we still need to rely on the explanatory power of content. Therefore, we propose the "cognitive disclosure" paradigm, in which semantic representations are chosen a priori by the designers of a content retrieval system, e.g. the content features necessary to call a piece of music a "waltz", or to call an image a "landscape". These semantic categories are then revealed to users at retrieval time, allowing them to select more intelligently the types of information that are relevant to them. This problem is still very difficult and there are no easy solutions. However, our purpose is simply to explain why "wisdom of crowds" approaches will inevitably fall short, and why content-based methods will remain necessary.
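
As a purely illustrative sketch of the waltz example and of what disclosing categories at retrieval time could look like, consider the toy Python below. Everything in it is an assumption made for illustration, not part of any system described here: the `Track` fields, the Jaccard-overlap similarity, the `CATEGORIES` table, and the `disclose` and `retrieve` helpers are all hypothetical, and real content features would have to be extracted from audio rather than entered by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Track:
    """Hypothetical hand-entered content features for one piece of music."""
    title: str
    beats_per_bar: int      # 3 for 3/4 time, 4 for 4/4 time
    instruments: frozenset  # names of the instruments used


def instrument_similarity(a: Track, b: Track) -> float:
    """Jaccard overlap of instrument sets (one assumed similarity measure)."""
    union = a.instruments | b.instruments
    return len(a.instruments & b.instruments) / len(union) if union else 0.0


# Semantic categories chosen a priori by the (hypothetical) system designer,
# each defined by a test on content features.
CATEGORIES = {
    "waltz": lambda t: t.beats_per_bar == 3,
    "string ensemble": lambda t: {"violin", "cello"} <= t.instruments,
}


def disclose(track: Track) -> list:
    """List every designer-defined category the track satisfies."""
    return [name for name, test in CATEGORIES.items() if test(track)]


def retrieve(tracks, category: str) -> list:
    """Return matching titles together with the categories that explain them."""
    test = CATEGORIES[category]
    return [(t.title, disclose(t)) for t in tracks if test(t)]


piece_a = Track("Piece A", 3, frozenset({"violin", "cello", "piano"}))
piece_b = Track("Piece B", 4, frozenset({"violin", "cello", "piano"}))

# Identical instrumentation, so this similarity measure cannot separate them.
print(instrument_similarity(piece_a, piece_b))
# Only the piece in 3/4 is relevant to a request for waltzes.
print(retrieve([piece_a, piece_b], "waltz"))
```

In this toy setting the two pieces are indistinguishable by similarity, yet only one is relevant, and the returned category list gives the user a content-based reason for the result.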