furialog

¶ Never Mind the Semantic Web (or, 13 Reasons Not to Let a Computer Scientist Choose a Name (or a Problem)) · 3 June 2008 essay/tech

1. "Semantic". By starting the name this way, you have essentially, avoidably, uselessly doomed the whole named enterprise before it starts. Most people don't have the slightest idea what this word even means, most of the people who do have an idea think it implies pointless distinctions, and everybody left after you eliminate those two groups will still have to argue about what "semantic" means. This is a rare actual example of begging the question. Or to put it in terms you will understand: congratulations, you've introduced terminological head recursion. Any wonder the program never gets around to doing anything?

2. "The Semantic Web". The "The" and the "Web" and the capitalization combine to suggest, even before anybody compounds the error by stating it explicitly, that this thing, which nobody can coherently explain, is intended to compete with a thing we already grok and see and fetishize. But this is totally not the point. The web is good. What we're talking about are new tools for how computers work with data. Or, really, what we're talking about are actually old tools for working with data, but ones that a) weren't as valuable or critical until the web made us more aware of our data and more aware of how badly it is serving us, and b) weren't as practical to implement until pretty recently in processor-speed and memory-size history.

3. "FOAF". There have been worse acronyms, obviously, but this one is especially bad for the mildness of its badness. It sounds like some terrible dessert your friends pressured you to eat at a Renaissance Festival after you finally finished gnawing your baseball-bat-sized Turkey Sinew to death.

4. FOAF as the stock example. You could have started anywhere, and almost any other start would have been better for explaining the true linked nature of data than this. "Friend" is the second farthest thing from a clean semantic annotation in anybody's daily experience. I'm barely in control of the meaning of my own friend lists, and certain wouldn't do anything with anybody else's without human context.

5. Tagging as the stock example. "Tagged as" is the first farthest thing from a clean semantic annotation in anybody's daily experience.

6. Blogging as the stock example. Even if your hand-typed RDFa annotations are nuggets of precious ontological purity, you can't generate enough of them by hand to matter. Your writing is for humans, not machines, and wasting brains the size of planets on chasing pingbacks is squandering electricity. We already know how to add to humanity's knowledge one fact at a time. The problem is in grasping the facts en masse, in turning data to information to knowledge to wisdom to the icecaps not melting on us.

7. Anything AI. Natural-language-processing and entity-extraction are interesting information-science problems, and somebody, somewhere, probably ought to be working on them. But those tools are going to pretty much suck for general-purpose uses for a really long time. So keep them out of our way while we try to actually improve the world in the meantime.

8. "Giant Global" Graph. The "Giant" and "Global" parts are menacing and unnecessary, and maybe ultimately just wrong. In data-modeling, the more giant and global you try to be, the harder it is to accomplish anything. What we're trying to do is make it possible to connect data at the point where humans want it to connect, not make all data connected. We're not trying to build one graph any more than the World Wide Web was trying to build one site.

9. Giant Global "Graph". This is a classic jargon failure: using an overloaded term with a normal meaning that makes sense in most of the same sentences. I don't know the right answer to this one, since "web" and "network" and "mesh" and "map" are all overloaded, too. We may have to use a new term here just so people know we're talking about something new. "Nodeset", possibly. "Graph" is particularly bad because it plays into the awful idea that "visualization" is all about turning already-elusive meaning into splendidly gradient-filled, non-question-answering splatter-plots.

10. URIs. Identifying things is a terrific idea, but "Uniform" is part of the same inane pipe-dream distraction as "Giant" and "Global", and "Resource" and the associated crap about protocols and representations munge together so many orthogonal issues that here again the discussions all end up being Zenotic debates over how many pins can be shoved halfway up which dancing angel.

11. "Metadata". There is no such thing as "metadata". Everything is relative. Everything is data. Every bit of data is meta to everything else, and thus to nothing. It doesn't matter whether the map "is" the terrain, it just matters that you know you're talking about maps when you're talking about maps. (And it usually doesn't matter if the computer knows the difference, regardless...)

12. RDF. It's insanely brilliant to be able to represent any kind of data structure in a universal lowest-common-denominator form. It's just insane to think that this particular brilliance is of pressing interest to anybody but data-modeling specialists, any more than hungry people want to hear your lecture about the atomic structure of food before they eat. RDF will be the core of the new model in the same way that SGML was the core of the web.

13. The Open-World Hypothesis. See "Global", above. Acknowledging the ultimate unknowability of knowledge is a profound philosophical and moral project, but not one for which we need computer assistance. Meanwhile, computers could be helping us make use of what we do know in all our little worlds that are already more than closed enough.