7 June 2009 to 12 January 2009
Recognition: "Pitatoes!"
Speculation: "Lyra have pitatoes?"
Direction: "Lyra have pitatoes on OWN PLATE!"
Clarification: "Lyra have pitatoes on own LITTLE plate RIGHT NOW!"
Description: "Taste like bannas!"
Extrapolation: "Daddy's pitatoes taste like bannas?"
Invention: "Lyra make more pitatoes in da PLATE!" (Stirs remaining sweet potatoes vigorously.)
Context: "Chicken TOO BIG! CUT IT!" (Small piece of chicken is bisected; both halves are immediately crammed in mouth at once.)
Misdirection: "Affer dinner, have yogit pop OUTSIDE?"
Capitalization: "ALL DONE WIF DINNER!! Have yogit pop OUTSIDE!"
Speculation: "Lyra have pitatoes?"
Direction: "Lyra have pitatoes on OWN PLATE!"
Clarification: "Lyra have pitatoes on own LITTLE plate RIGHT NOW!"
Description: "Taste like bannas!"
Extrapolation: "Daddy's pitatoes taste like bannas?"
Invention: "Lyra make more pitatoes in da PLATE!" (Stirs remaining sweet potatoes vigorously.)
Context: "Chicken TOO BIG! CUT IT!" (Small piece of chicken is bisected; both halves are immediately crammed in mouth at once.)
Misdirection: "Affer dinner, have yogit pop OUTSIDE?"
Capitalization: "ALL DONE WIF DINNER!! Have yogit pop OUTSIDE!"
Since I've done this before for a couple other search engines, here's a side-by-side Bing vs Google comparator.
I haven't tried enough things to have a very considered opinion, but here are the first few tests that went into my preliminary unimpressedness with Bing:
kate bush covers
president twice
president obama's birthday
thread query language
where the rose is sown
capital of estonia
boston to asheville
36 jfk street
austin population
primitons reissue
ellen barkin buckaroo banzai picture
[11 June note: this post itself is now (temporarily?) towards the top of Google's results for some of these terms! But not Bing's. Good demonstration of the observer effect, at least.]
¶ The Problem With Answers · 27 March 2009 essay/tech
In his post explaining his departure from Google, Douglas Bowman says "Yes, it's true that a team at Google couldn't decide between two blues, so they're testing 41 shades between each blue to see which one performs better. I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. I can't operate in an environment like that."
He's not really trying to have an opinion-changing last word in an argument against data-driven product decisions, but if he were, this is not how to do it. If you believe that the proper width of a border can be tested, then Bowman's refusal to subject his intuitions to quantitative confirmation just sounds like petulant prima-donna nonsense. If you can test 41 shades of blue, this line of reasoning goes, you don't need to guess, so a guessing specialist is an annoying waste of everybody's time.
The great advantage of testing and data, of course, is that you get precise, decisive answers you can act on. Shade 31, with 3.7%, trouncing runner-up shade 14 with only 3.4%! Apply shade 31, declare progress.
But the great disadvantage of testing and data is that you get precise, decisive answers you can and will act on, but you almost never know what question you really asked. Sure, the people who saw shade 31 did some measurable thing at some measurable rate. But why? Is it shade 31? Or is it the contrast between shade 31 and the old shade? Or is it the interplay between shade 31 and some other thing you aren't thinking about, or possibly don't even control? Are you going to run your test again in a month to see if the results have changed? Did you cross-correlate them with HTTP_REFERER and index the colors on the pages people came from? What about all the combinations of these 41 shades and 41 backgrounds and 8 fonts and 3 border widths (12 if you vary each side of the box separately!) and 41 line-heights and 19 title-bar wordings and the color of the tie Jon Stewart was wearing the night before? Which things matter? How do you know?
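Just to put arithmetic on that last question, here's a back-of-the-envelope Python sketch of the full factorial space, using the counts from the paragraph above (Jon Stewart's tie omitted):

# Rough arithmetic only: the variables and counts named above.
variables = {
    "shade of blue": 41,
    "background": 41,
    "font": 8,
    "border width": 12,   # 3 widths, or 12 if each side varies separately
    "line-height": 41,
    "title-bar wording": 19,
}
cells = 1
for count in variables.values():
    cells *= count
print(f"{cells:,}")   # 125,711,904 test cells

A hundred and twenty-five million cells, each needing a statistically meaningful sample before you can tell them apart.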
You don't. And if you need to add some new element, tomorrow, you don't know which of the tests you've already run it invalidates. Are you going to rerun all of them, permuted 41 more new ways?
No. You are going to sheepishly post a job opening for a new guessing specialist. Bowman already had his last word. It was "Goodbye".
¶ echoes · 12 March 2009
Between this (first song on the album) and this (listen to the lyrics) (the latter by this band), that's now two bands I once wrote about (and who know that I wrote about them) who have later used my column's title in songs! Neither seems to be about me or the column, but still...
¶ How We Talk About What We Know · 2 March 2009 essay/tech
An adequate computer language allows humans to communicate with machines about machine concerns.
A good computer language also facilitates communication between humans about machine concerns.
A great language allows machines to participate in conversations between humans about human concerns.
There are not very many of this last sort. As I've mentioned before, I'm trying to write one. I've been calling it a query language, but I've started to think I shouldn't. It's a language for talking about data-relationships, where most other things called "query languages" are for excerpting data, and the two are qualitatively different goals even when the individual tasks end up being logistically similar. I'm trying to do for data-relationships what the system for symbolic algebra did for numbers. Not what algebra did for numbers, thankfully, just what we accomplished by making up a written syntax for expressing algebra compactly and precisely.
So here's just one real-world example from yesterday. We were talking, elsewhere, about how you calculate overall ratings for bands in a large reviews database. The simplest thing is just to average all their ratings. In Thread, my data-relationship language, this is:
Artist|(.Album.Rating._Average)
I.e.: For each artist, get their albums, then get those albums' ratings, then average all the ratings. But this is maybe not the best statistic, as it weights albums proportionally to the number of reviews. Maybe we want to average the ratings for each album, and then average the album-averages to get the artist average. That's a hard sentence for a person to read, and the computer can't read it at all. But in Thread it's just:
Artist|(.Album.(.Rating._Average)._Average)
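For contrast, here's roughly the same pair of calculations in Python, over a made-up toy model in which each artist maps to albums and each album to a list of ratings. The data and names are invented for illustration, not the real schema:

def average(xs):
    return sum(xs) / len(xs)

# Toy data: artist -> album -> ratings. Invented numbers.
artists = {
    "Band A": {"First": [95, 90, 88], "Second": [70]},
    "Band B": {"Only": [100]},
}

# Artist|(.Album.Rating._Average) -- pool every rating together
flat = {artist: average([r for ratings in albums.values() for r in ratings])
        for artist, albums in artists.items()}

# Artist|(.Album.(.Rating._Average)._Average) -- average each album first
nested = {artist: average([average(ratings) for ratings in albums.values()])
          for artist, albums in artists.items()}

Same two answers, several times the ceremony, and the intent already starting to hide in the comprehensions.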
Run this, though, and you see that the top of the list is dominated by bands with very small numbers of very high ratings. Not really what we're trying to find out. So let's include only bands with at least 25 ratings:
Artist:(.Album.Rating::#25)|(.Album.(.Rating._Average)._Average)
This is better, but maybe not as much better as you'd think. It turns out that there are a number of bands for which a small number of people have written a large number of reviews. Maybe what we really want is to average the ratings for each user, not for each album. That way one person giving the same high rating to 8 different albums counts as 1, not 8. And we'll only consider artists with ratings from 25 different users, not just 25 ratings total. This is:
Artist|(.Album.Rating/User::#25.(.group._Average)._Average)
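The cryptic-looking "/User::#25.(.group._Average)" is the regrouping step. In Python, again over an assumed toy model where each of an artist's ratings carries its user, it looks something like this:

from collections import defaultdict

def average(xs):
    return sum(xs) / len(xs)

# Assumed toy model: one artist's ratings as (user, rating) pairs.
def per_user_average(pairs, min_users=25):
    by_user = defaultdict(list)
    for user, rating in pairs:
        by_user[user].append(rating)     # /User -- partition ratings by user
    if len(by_user) < min_users:         # ::#25 -- demand 25 distinct users
        return None
    # (.group._Average)._Average -- average each user's group, then the averages
    return average([average(group) for group in by_user.values()])

per_user_average([("u1", 95), ("u1", 90), ("u2", 88)], min_users=2)  # -> 90.25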
Better, but it's still pretty easy to game this by creating new accounts and filing one very high rating from each of them. We can mitigate that, though, by trusting only ratings from users who have rated, say, at least 5 different albums, from at least 3 different artists. That's:
Album|Trusted Rating=(.Rating:(.User:(.Rating.Album::#5.Artist::#3)))
Artist|(.Album.Trusted Rating/User::#25.(.group._Average)._Average)
Better again. But there are still a few pretty obscure things at the top of the list. This doesn't prove that the results are flawed, of course, but scrutinizing them, and thinking about the sample-size effects of rating variation at this scale, reveals that the highest and lowest ratings are having pretty dramatic effects. Perhaps it would be smart to toss out the top and bottom 10% of the per-reviewer averages, averaging only the middle 80%. This keeps one perspective-challenged fan or one vengeful ex-bassist from single-handedly jumping the ratings up or down. Thus:
Album|Trusted Rating=(.Rating:(.User:(.Rating.Album::#5.Artist::#3)))
Artist|(.Album.Trusted Rating/User::#25.(.group._Average)#._Trim 10%._Average)
The result of this, in fact, is this leaderboard. By these rules Immolation is currently the top-ranked band in the Encyclopaedia Metallum.
The English version of this final formulation is "bands with 25+ reviewers of their full-length albums, counting only reviewers who have filed at least 5 reviews and covered at least 3 bands; scored by averaging the ratings from each reviewer, dropping the top and bottom 10% of these reviewer-averages, and then averaging the remainder". This is a long sentence for people, and a useless sentence for machines, and as long as this is our canonical format, we will be at considerable risk for error every time we retranslate into a computer language. Put this in SPARQL or SQL or MQL, though, and it would be essentially inaccessible to people. So you choose between knowing what you want and not necessarily getting it, or knowing what you're getting but not whether it's what you want.
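To make the translation cost concrete, here's that final formulation as a Python sketch over an assumed flat table of (user, artist, album, rating) rows. Every name in it is invented, and it ignores the full-length-album restriction; the point is the bulk, not the fidelity:

from collections import defaultdict

def average(xs):
    return sum(xs) / len(xs)

def leaderboard(rows, min_users=25, min_albums=5, min_artists=3, trim=0.10):
    # Trusted users: rated at least 5 distinct albums from at least 3 artists.
    albums_by_user = defaultdict(set)
    artists_by_user = defaultdict(set)
    for user, artist, album, rating in rows:
        albums_by_user[user].add((artist, album))
        artists_by_user[user].add(artist)
    trusted = {u for u in albums_by_user
               if len(albums_by_user[u]) >= min_albums
               and len(artists_by_user[u]) >= min_artists}
    # Group each artist's trusted ratings by user.
    by_artist = defaultdict(lambda: defaultdict(list))
    for user, artist, album, rating in rows:
        if user in trusted:
            by_artist[artist][user].append(rating)
    scores = {}
    for artist, by_user in by_artist.items():
        if len(by_user) < min_users:          # 25+ distinct reviewers
            continue
        user_avgs = sorted(average(g) for g in by_user.values())  # the "#"
        k = int(len(user_avgs) * trim)        # drop top and bottom 10%
        kept = user_avgs[k:len(user_avgs) - k] or user_avgs
        scores[artist] = average(kept)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

Every clause of the English sentence is in there somewhere, but you have to decompile it back out to check.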
I think we have to do better. The human stakes for data-comprehension are approaching critical levels, and our tools have not kept up. Worse, the shiny new tools in the big labs are not ready yet and not even that great.
So Thread is my own personal attempt at doing better. Could it be the language we could actually share, humans and computers, to talk about data? I can't prove it is yet, and the project in which it's embedded is still working towards its public debut, so you can't make up your own mind yet, either. But for the past couple years I've been using it to talk to computers, and to myself, and even to a few coworkers, and the experience at least gives me hope. I know it's powerful, and I know it's compact.
Like any language, of course, we'd have to learn it. I make no claims of it being "intuitive", whatever meaning that term might have for a symbolic-reasoning language, nor do I claim it's trivially implemented at scale. It's cryptic in its own particular way, and poses its own technical challenges. But I'm not trying to minimize anybody's absolute difficulty, I'm trying to maximize the ratio of power to difficulty. If, reading those examples above, without a formal tutorial or even an actual diagram of the data model in question, you have at least a sense of what might be going on, then it's at least possible I'm getting somewhere.
[Note from a few days later: in re-reading these queries I actually noticed a methodological error! The first time I did this, I neglected to sort the ratings before trimming the first and last few. That is, I did this:
Album|Trusted Rating=(.Rating:(.User:(.Rating.Album::#5.Artist::#3)))
Artist|(.Album.Trusted Rating/User::#25.(.group._Average)._Trim 10%._Average)
where I should have done this:
Album|Trusted Rating=(.Rating:(.User:(.Rating.Album::#5.Artist::#3)))
Artist|(.Album.Trusted Rating/User::#25.(.group._Average)#._Trim 10%._Average)
The operative difference is the "#" for sorting right before "._Trim 10%" in the second query, which is what makes the trim function take off the highest and lowest ratings, rather than just the first and last.
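In Python terms the whole difference is one sort before the slice; a small sketch of the same mistake:

def trim_average(values, trim=0.10, sort_first=True):
    vs = sorted(values) if sort_first else list(values)   # the "#"
    k = int(len(vs) * trim)
    kept = vs[k:len(vs) - k] or vs
    return sum(kept) / len(kept)

ratings = [100, 20, 80, 85, 90, 75, 95, 70, 88, 92]
trim_average(ratings)                    # drops 20 and 100, the extremes
trim_average(ratings, sort_first=False)  # drops 100 and 92: first and last filed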
But even this error is kind of my point. The language is a tool for me to talk to myself over time.]
¶ 7 Habits of Partly Successful People · 25 February 2009
1. The tendency to forget completely about anything for which someone else is vaguely expected to take the next step.
2. A reluctance to accept that quantifying one's nostalgia does not mitigate its mortality.
3. A vigilant willingness to challenge abstrusely tangential orthodoxies.
4. A failure, when not concentrating, to properly aspirate the letter H in the words "humor" and "human".
5. A fear of widths.
6. The maintenance of short but meticulous lists of inconclusive evidence for undeniable truths.
7. Always, or almost, allowing the silent moments at the ends of experiences to complete without crossfade.
Lyra was supervising while I cooked dinner tonight, and I gave her a few chickpeas as I was putting them in a salad, mostly because it's so irresistibly cute to hear her call them "bickies".
"Moh?", she said, with a cartoonish upward lilt as if she read in a guidebook that that's how you ask for something in grownup. This means "more", in this case "more chickpeas?". Never mind how I know this. Parenting skills.
"I'm making yummy dinner", I pointed out, reasonably. I'm a fairly reasonable person, which I think she appreciates. Or will, by the time she's 36 or so.
She considered this for a moment, pressing a tiny finger into the dot of bickish water the last chickpea left behind on her tray, then looked up again, a tiny easy-bake-oven light-bulb clicking on above her head.
"One?"
"One? You want one chickpea?" I said. I'm assuming that my habit of asking her for clarification will become decreasingly inane as time goes on. She nodded enthusiastically. You might think, from the time-ordering, that she was answering my question, but I've conducted tests, and it turns out that she nods no matter what you say. The nodding is her answer to the implied question "Do you still want whatever you wanted before?" Which is, to be fair, what most of our questions to her amount to.
"Well, I can hardly deny you one single chickpea." It's OK to indulge children as long as they understand the careful logic behind your actions. I plucked a chickpea off the top of the salad and centered it precisely on the tray in front of her. "One", I explained, pointing to it for helpful pedagogical emphasis.
She nodded three or seven times, then picked up the chickpea, crammed her whole fist into her mouth, somehow extracted the chickpea from her grip while her hand was inside her mouth, and then pulled her hand out with that great sweeping flourish she's been working on in case she ends up needing a career in rodeo. I turned back to the stove, wondering whether you can say that you've learned to count "to" one. It's kind of "from" one, really.
Behind me I heard a small finger tap a plastic tray once, moistly.
"Five?", she said.
"Moh?", she said, with a cartoonish upward lilt as if she read in a guidebook that that's how you ask for something in grownup. This means "more", in this case "more chickpeas?". Never mind how I know this. Parenting skills.
"I'm making yummy dinner", I pointed out, reasonably. I'm a fairly reasonable person, which I think she appreciates. Or will, by the time she's 36 or so.
She considered this for a moment, pressing a tiny finger into the dot of bickish water the last chickpea left behind on her tray, then looked up again, a tiny easy-bake-oven light-bulb clicking on above her head.
"One?"
"One? You want one chickpea?" I said. I'm assuming that my habit of asking her for clarification will become decreasingly inane as time goes on. She nodded enthusiastically. You might think, from the time-ordering, that she was answering my question, but I've conducted tests, and it turns out that she nods no matter what you say. The nodding is her answer to the implied question "Do you still want whatever you wanted before?" Which is, to be fair, what most of our questions to her amount to.
"Well, I can hardly deny you one single chickpea." It's OK to indulge children as long as they understand the careful logic behind your actions. I plucked a chickpea off the top of the salad and centered it precisely on the tray in front of her. "One", I explained, pointing to it for helpful pedagogical emphasis.
She nodded three or seven times, then picked up the chickpea, crammed her whole fist into her mouth, somehow extracted the chickpea from her grip while her hand was inside her mouth, and then pulled her hand out with that great sweeping flourish she's been working on in case she ends up needing a career in rodeo. I turned back to the stove, wondering whether you can say that you've learned to count "to" one. It's kind of "from" one, really.
Behind me I heard a small finger tap a plastic tray once, moistly.
"Five?", she said.
¶ Kvltosis · 25 January 2009
I've been calculating voter-centricity in polls for several years now, so I can't believe I only just thought of the way to re-apply voter-centricity to the things being voted on: Retabulate the album (or whatever) ranking, inverse-weighting each vote by the voter's centricity. I.e., the closer the voter was to the consensus, the less their vote is worth. Then take the ratio of weighted scores to vote-counts, and you get a measure not of popularity, but of cultishness. You probably want to get rid of the albums that got very few votes, but in the 30-voter ILM Metal poll I only had to eliminate albums that got fewer than 3 votes before the results started looking interesting. In the 577-voter Pazz & Jop poll I cut off the albums with 5 votes or fewer, but even the 6- and 7-vote albums are distributed across the score-range pretty well.
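A minimal sketch of the retabulation, with made-up ballots and an assumed weight of 1/centricity standing in for the real inverse-weighting:

from collections import defaultdict

# Made-up ballots: (voter, album, points). Higher centricity = closer to consensus.
centricity = {"u1": 2.0, "u2": 0.5, "u3": 1.0, "u4": 0.4}
ballots = [("u1", "Album X", 10), ("u3", "Album X", 10),
           ("u2", "Album Y", 10), ("u4", "Album Y", 10)]

weighted = defaultdict(float)
counts = defaultdict(int)
for voter, album, points in ballots:
    weighted[album] += points / centricity[voter]   # consensus voters count less
    counts[album] += 1

MIN_VOTES = 2   # toy cutoff; 3 for the ILM poll, 6 for Pazz & Jop
kvltosis = {album: weighted[album] / counts[album]
            for album in weighted if counts[album] >= MIN_VOTES}
# Same raw points, but Album Y's voters sit farther from the consensus,
# so it comes out more kvlt: {"Album X": 7.5, "Album Y": 22.5}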
The only real metric of idiot statistics tricks like this is whether you find out anything new by looking at them. In this case, you can make up your own mind. I have named this new stat "kvltosis" (in a combined metal/statistics joke for which possibly I am the entire target-audience), and added it to my Pazz & Jop analysis. If the poll's consensus bores you, perhaps this can be another antidote. (If the poll's consensus thrills you, on the other hand, just mentally invert this list and you have consensus squared...)
¶ Numbers upon numbers · 24 January 2009
If you haven't had enough music-poll stats after this, I also helped tabulate the ILM Metal Poll, and posted even more geekery on the ILM thread discussing the Pazz & Jop results.
If you want to make the case for "improved" searching via the wonders of semantic-web technology, as this blog post and this demo attempt to, you need to make your demo demonstrate something compelling.
In the blog post announcing that demo, Kingsley suggests "Microsoft" as the query example. As of this moment, doing that query on that demo produces a page of unintelligibly elided URLs, misrendered characters, and random blobs of text that contain the word "Microsoft" in them. The UI opens with this stirring invocation:
Displaying values and text summaries associated with pattern: (NULL)1
(NULL)1 contains "microsoft" in any property value.
And the first search result begins, and I feel like I have to clarify that I am not making this up, with the words "Mac OS X Leopard" (and then some gibberish that I'm guessing used to be Italian).
If you do the search "Microsoft" on Google right now, you get some news items about Microsoft, followed by the Microsoft site itself.
But maybe that was just an unfortunate example. So I tried looking for Cyndi Lauper. Google's results for this begin with Cyndi's official site, then the Wikipedia page about her, then her MySpace page. OpenLink's begin with "The Parking Lot 03.09.2007 at SmartLemming.com", again in a page-layout that isn't even funny as a parody of good information design.
If you want to amuse yourself by trying more examples, I've put up an easy form for running a search on both sites side-by-side:
cyndi lauper
microsoft
(try your own)
Be patient with the OpenLink side...
To state the obvious caveat: the claim OpenLink is making about this demo is not that it delivers better search-term relevance, so the ranking of search results is not the main criterion on which it is intended to be assessed.
On the other hand, one of the things they are bragging about is that their server will automatically cut off long-running queries. So how do you like your first page of results?
And on the other other hand, the big claim OpenLink is making about this demo is that the aggregate experience of using it is better than the aggregate experience of using "traditional" search. So go ahead, use it. If you can.
Now, did your opinion of the potential of the "semantic web" go up or down during your experience?
[Update: Kingsley responds here, and suggests that "glenn mcdonald" would actually be a better example query. So here you go: glenn mcdonald. Did your opinion change?
Just to be clear, I think Kingsley is exactly right that we need a universal data browser, and quite possibly right that Virtuoso's underlying technology is capable of being an engine for such a thing. But this thing he's showing isn't a data browser, it's a data-representation browser. It's as if the first web-browser only did View Source. We will no more sell the new web by showing people URIs than we sold the old web by showing them hrefs. Exactly the opposite: we sold the old web by not showing people ULs and OLs and TD/TD/TD/TD and CELLPADDING=0. And we'll sell this new web by not showing them meta-schema and triples and reification and inverse-link entailment.]