¶ Search Is Not a Goal · 5 March 2006 essay/tech
The main human information goal is understanding. Or wisdom, depending on your precise taxonomy. But either way, searching is plainly a means, not an end, and the current common incarnation of Search, which involves arbitrarily flattening a content space into a set of independent and logically equivalent Pages and then filtering them based on the presence or absence of words in their text, not only isn't an end, but is barely even a means to a means. This form of two-dimensional, context-stripping, schema-oblivious, answer-better-already-exist-somewhere searching is properly the very last resort, and it's a grotesque testament to the poverty of our information spaces that at the moment our last resort is often our only resort.
The first big improvement in searching is giving it schema awareness. I doubt the people behind IMDb spend much time thinking about themselves as technology visionaries, but IMDb Search is a wildly instructive model of what is not only possible but arguably almost inevitable if you know something about the structure of your data. IMDb presents both the search widgetry and the answers in the vocabulary of the data-schema of movies and the people who work on them, not in "keywords" and "pages", and understands intimately that in IMDb's information-space search exists almost exclusively for the purpose of finding an entry point at which to start browsing. You go to Google to "look for something", you go to IMDb to "look something up"; the former phrase implies difficulty and disappointment in its very phrasing, the latter the comfortable assumption of success.
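To make the contrast concrete, here is a minimal sketch of the two styles of searching over the same three records. Nothing in it reflects how IMDb or Google actually work; the `Movie` type and the `keyword_search`, `lookup_title`, and `co_stars` functions are hypothetical names invented for illustration, and the data is a toy sample.

```python
from dataclasses import dataclass, field


@dataclass
class Movie:
    """A toy record with a schema: a title, a year, and a cast."""
    title: str
    year: int
    cast: list[str] = field(default_factory=list)


# Toy sample data, chosen only to illustrate the point.
MOVIES = [
    Movie("Heat", 1995, ["Al Pacino", "Robert De Niro"]),
    Movie("The Godfather Part II", 1974, ["Al Pacino", "Robert De Niro"]),
    Movie("Body Heat", 1981, ["William Hurt", "Kathleen Turner"]),
]


def flatten(movie: Movie) -> str:
    """Throw the schema away: the record becomes a bag of words on a 'page'."""
    return f"{movie.title} {movie.year} {' '.join(movie.cast)}".lower()


def keyword_search(query: str, movies: list[Movie]) -> list[Movie]:
    """Flat search: keep every page that contains all of the query words."""
    words = query.lower().split()
    return [m for m in movies if all(w in flatten(m).split() for w in words)]


def lookup_title(title: str, movies: list[Movie]) -> list[Movie]:
    """Schema-aware lookup: the question is asked in terms of the title field."""
    return [m for m in movies if m.title == title]


def co_stars(person: str, movies: list[Movie]) -> list[str]:
    """Browsing from an entry point: everyone who shares a cast with `person`."""
    return sorted({p for m in movies if person in m.cast
                   for p in m.cast if p != person})


if __name__ == "__main__":
    print([m.title for m in keyword_search("heat", MOVIES)])
    print([m.title for m in lookup_title("Heat", MOVIES)])
    print(co_stars("Al Pacino", MOVIES))
```

The flat search for "heat" can't tell the film called Heat from any other page that happens to contain the word, while the schema-aware lookup returns the one entry and leaves you somewhere you can start browsing from.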
On the web at large, of course, there is no meaningful schema, and it's impossible to make any simplifying assumptions about the subject matter of your question before you ask it. It is more productive to search in IMDb than with Google not because IMDb's searching is better, but because its data is better. But this does not even fractionally exonerate Google, or anybody else who is currently trying to solve an information problem by defining it as a search problem. They're all data problems. Google has the hardest version of this problem, since they don't directly control the information-space they're searching, but they have more than enough power and credibility to lead a revolution if they can muster the vision and organization. And anybody building an "enterprise" search tool has no such excuse; the enterprise does control its information-space, at least out to the edges where it touches the public space, and every second that can be invested in improving the data will be at least as productive as an hour sunk into flatly searching it.
So if I worked for a Searching company right now, I'd start madly redefining ourselves tomorrow. We are not a searching company, we are an information organization company. The last resort is necessary, but neither sufficient nor transformative. I'd pull the smartest people I had off of "search" and put them to work on tools for the other end of the information process, reaching to the humans who are creating it and giving them the power to communicate not just the words of what they know but the structure of it, and to the collective mass of people to help them communicate and recognize and refine their collective knowledge about the schemas of known and knowable things. This is why Google Base holds the future of Google, and why you should sell your Google stock right now if they keep treating it as mainly a way for someone to buy your unused exercise equipment from you using a credit card. It should be the world's de facto public forum for the negotiation of the schema of all human knowledge, and if it isn't, every other decision Google makes will be forced by whatever is.
But well-structured data, though necessary, isn't sufficient either. The good news for "search" companies is that improving the data is itself just a means to an end. Ideal data only encodes what we already know. The problems of useful inference from known data are hugely harder and super-hugely more valuable than the current forms of searching, especially when you realize that the boundary between private and public data is an obstacle and an opportunity in both directions, not a wall to hide behind or run away from. The real future of "search" is in providing humans with the tools to form questions that haven't already been answered, and assemble the possible pieces of the answer, from threads of reasoning that traverse all kinds of territories of partial knowledge, into some form that synthesizes ideas that have never before even been juxtaposed, and onto which humans can further apply human powers where machine powers really fail -- fail because the machines are machines, not where they fail because we didn't take the time to let them be more thoroughly themselves -- so that they in turn can help us be more completely and wisely human.