furia furialog · Every Noise at Once · New Particles · The War Against Silence · Aedliga (songs) · photography · other things · contact
It is becoming increasingly possible for separate systems to perform each of the six major functions of data applications: storage, transformation (including creation), categorization (including tagging, indexing, search and retrieval), visualization, monitoring and the administration of trust.  

Historically, of course, these functions were usually not only performed by a single unified system, but mostly limited to that system. At its most insular, old-world applications entailed embedded storage in proprietary formats, integrated authoring tools and UI, integrated (if any) notification, and integrated (if any) user-management.  

In the old world of personal data applications, like spreadsheets and word-processors and whatever, standardized file systems at least separated storage and categorization from application logic (you could put your Excel file on a floppy disk, or in your "Budgets" folder). Semi-standardization of formats helped open data transformation and/or visualization a little bit (you could use Word's search-and-replace tools on an RTF file, or run Crystal Reports on your Paradox databases), but published formats are not quite the same thing as open formats. And monitoring and trust were usually expendable for personal applications, or solvable a function at a time.  

Old-world online applications changed the distribution of insularities. You could actually use several different tools to send, receive and monitor CompuServe data. Prodigy let you use any tool you wanted, as long as it was a construction-paper hammer designed by runt warthogs for use by cartoon infants. But the online service very clearly owned the physical storage, the content space and the identity space.  

The early web was a combination of progress and regress, pretty much no matter which directions you think those go. HTML offered the tantalizing prospect of separating the presentation logic from the data structures, but in practice browser convergence quickly resulted in this being true in only a somewhat obscure development-tools sense. You could produce your HTML files with different software than they would be read with, but you still had to take the client constraints heavily into account. HTML files could be moved around fairly easily, but cross-server transclusion got pretty ugly the moment you tried to move much beyond linking. And identity management was reinvented from scratch anywhere it was unavoidable.  

But we now have at least nominal rudimentary pieces of the ability to separate all of these. XML offers an interchangeable application-neutral storage format (or at least a meta-format), XML+HTTP gives us a way to virtualize storage (as long as we don't virtualize it too far), Google has demonstrated the scalable separation of categorization and some amount of visualization, and RSS is at least a conceptual step towards the separation of tracking. LDAP separates identity management for at least certain kinds of communities. These may not be the solutions, but they are indications that the solutions are possible and closer.  

But the next steps, in all cases, are huge, and at least as difficult culturally as they will be technically.  

Storage  

All systems must be prepared to handle external storage transparently to the data's owner, whether this actually means live reading and writing over the network or caching and mirroring to simulate it. An indexer must be able to hand you back the indexes it makes and updates, an image organizer must allow you to store the images on your own server, etc.  

Transformation  

All data must be stored in as neutral and open a format as possible. Application-neutral information must be tagged in standard self-describing ways. Proprietary information is acceptable only when mandated by definition (for internal security functions and precious little else), and where necessary must be clearly identified and attributed. These will be practical imperatives, not just moral ones. Secrecy is fragile, and the net routes around it instinctively.  

Categorization  

Anything that exists can be categorized. In many cases, the categorization will end up being qualitatively more valuable than the original information. The only difference between data and meta-data is that meta-data is data the owner of the thing didn't anticipate or provide for. The more fluidly a system can re-integrate the meta-data it spawns, the more powerful it will be. The more afraid you are of your audience, the faster they will depart.  

Visualization  

Similarly, the more readily a system opens itself to external visualization, the better off it will be. Whatever it is you own and control, it's never more than part of the experience. The default techno-social goal of a data application is to be the reference source for some kind of data. (The default business goal is to have some way to make money, not from that data but from that status.)  

Monitoring  

Various malformed and over-constrained attempts have been made to generalize the problems of monitoring, change tracking and notification into email, IM, RSS, Trackback, Konfabulator, Dashboard and countless proprietary and special-purpose schemes. The next generation has to supply a version that scales to the entire world, including not only its size and its bandwidth but also its heterogeneity and its self-organization. The new system has to rationalize all flows, including the malevolent ones.  

Trust  

Ultimately, though, the native currency of the new connected world will be trust. Every interaction of people and systems relies on it, usually inherently and implicitly. Existing systems have mostly survived on trust by exclusivity (firewalls, closed networks, internal identity management) obscurity (mostly self-selection) or informal accountability (feedback and self-policing). None of these scale. The new identity systems must be built not to administer specific applications but to provide universal credentials that verify a user's membership in defined communities. The new data systems must be built so that unknown individuals can be accepted on the basis of delegated authority. In the old world people were "users", users existed inside careful boundaries, and outside of those boundaries all there were were names. In the new world, people are the signals themselves, and a name is a name only by virtue of some authority, and maybe that authority by virtue of another one. In the new data world, where the scope of the network is as big as the scope of the planet, and the size is exponentially larger, the primary component of every transaction of storage, transformation, categorization, visualization or monitoring will be the intimate initialization of the basis of trust under which any two of us say anything to each other at all.
Site contents published by glenn mcdonald under a Creative Commons BY/NC/ND License except where otherwise noted.