¶ (Just Slightly) Beyond CSV · 29 July 2008 essay/tech
I like CSV. It addresses a common data-exchange need, and does it with admirably little conceptual or syntactic overhead. As formats go, it's totally unglamorous, but glamor is very definitely not an end in itself. If you need to exchange a simple table of simple data, one datum per row/column, CSV is a fine tool.
I also like JSON. Actually, I really like JSON. It has the data-modeling virtues of my old Elemental XML proposal, with a cleaner semantic distinction between hashes and arrays. I would like to see the spec amended to allow keys to be strings or numbers, rather than only strings, but other than that I think the whole thing is a solid, self-consistent, useful, appropriately scaled solution to an impressively large set of data-exchange and serialization problems. Certainly anything you could model with CSV, you could put into JSON instead, and not at all vice versa.
But the Law of Conservation of Complexity applies here, as in most things. Although JSON syntax is pretty simple, any particular JSON data model can be arbitrarily complex, and the modeling issues in data-exchange are always far more involved than the syntactic ones. The important simplicity of CSV is in constraining the data model to be a table.
Flexibility, however, can be employed in moderation. It is possible, for example, to render a CSV model in JSON syntax. That is, to turn this:
Artist,Albums
Cradle of Filth,7
Nightwish,8
To/Die/For,4
into this:
[
["Artist","Albums"],
["Cradle of Filth","7"],
["Nightwish","8"],
["To\/Die\/For","4"]
]
The advantage of doing so will, I hope, become quickly obvious once you realize that JSON then makes it trivially easy to have cell values be arrays. If we want a list of artists, each with their list of albums, CSV forces us to refactor:
Album,Artist
Cruelty and the Beast,Cradle of Filth
Damnation and a Day,Cradle of Filth
Dusk...and Her Embrace,Cradle of Filth
Midian,Cradle of Filth
Nymphetamine,Cradle of Filth
The Principle of Evil Made Flesh,Cradle of Filth
Thornography,Cradle of Filth
Angels Fall First,Nightwish
Bless the Child,Nightwish
Century Child,Nightwish
Dark Passion Play,Nightwish
Oceanborn,Nightwish
Once,Nightwish
Over the Hills and Far Away,Nightwish
Wishmaster,Nightwish
All Eternity,To/Die/For
Epilogue,To/Die/For
IV,To/Die/For
Jaded,To/Die/For
This is annoying at minimum, and unworkable if we also wanted to see other properties of artists. But in JSON, or this modeling reduction of JSON that we might as well call "JSV", we can simply say:
[
["Artist","Album"],
["Cradle of Filth",["The Principle of Evil Made Flesh","Dusk...and Her Embrace","Cruelty and the Beast","Midian","Damnation and a Day","Nymphetamine","Thornography"]],
["Nightwish",["Angels Fall First","Oceanborn","Wishmaster","Over the Hills and Far Away","Century Child","Bless the Child","Once","Dark Passion Play"]],
["To\/Die\/For",["All Eternity","Epilogue","Jaded","IV"]]
]
JSV retains the modeling simplicity of CSV (a single row of "fields" defined at the top), but allows a cell to be either a single value or a list of them. This is still unglamorous, but to me it's a big gain in expressivity for essentially no technical cost. I suggest that if combining lists and tables isn't a big deal to you (and you're the sort of person to whom a data-serialization format could be a big deal), maybe you've allowed tabular single-mindedness to beat all the natural human list-thinking out of you. Time to reclaim your right to multiplicity. Time to reclaim lots of your rights, really. Yet another argument for more lists, everywhere around us.
I also like JSON. Actually, I really like JSON. It has the data-modeling virtues of my old Elemental XML proposal, with a cleaner semantic distinction between hashes and arrays. I would like to see the spec amended to allow keys to be strings or numbers, rather than only strings, but other than that I think the whole thing is a solid, self-consistent, useful, appropriately scaled solution to an impressively large set of data-exchange and serialization problems. Certainly anything you could model with CSV, you could put into JSON instead, and not at all vice versa.
But the Law of Conservation of Complexity applies here, as in most things. Although JSON syntax is pretty simple, any particular JSON data model can be arbitrarily complex, and the modeling issues in data-exchange are always far more involved than the syntactic ones. The important simplicity of CSV is in constraining the data model to be a table.
Flexibility, however, can be employed in moderation. It is possible, for example, to render a CSV model in JSON syntax. That is, to turn this:
Artist,Albums
Cradle of Filth,7
Nightwish,8
To/Die/For,4
into this:
[
["Artist","Albums"],
["Cradle of Filth","7"],
["Nightwish","8"],
["To\/Die\/For","4"]
]
The advantage of doing so will, I hope, become quickly obvious once you realize that JSON then makes it trivially easy to have cell values be arrays. If we want a list of artists, each with their list of albums, CSV forces us to refactor:
Album,Artist
Cruelty and the Beast,Cradle of Filth
Damnation and a Day,Cradle of Filth
Dusk...and Her Embrace,Cradle of Filth
Midian,Cradle of Filth
Nymphetamine,Cradle of Filth
The Principle of Evil Made Flesh,Cradle of Filth
Thornography,Cradle of Filth
Angels Fall First,Nightwish
Bless the Child,Nightwish
Century Child,Nightwish
Dark Passion Play,Nightwish
Oceanborn,Nightwish
Once,Nightwish
Over the Hills and Far Away,Nightwish
Wishmaster,Nightwish
All Eternity,To/Die/For
Epilogue,To/Die/For
IV,To/Die/For
Jaded,To/Die/For
This is annoying at minimum, and unworkable if we also wanted to see other properties of artists. But in JSON, or this modeling reduction of JSON that we might as well call "JSV", we can simply say:
[
["Artist","Album"],
["Cradle of Filth",["The Principle of Evil Made Flesh","Dusk...and Her Embrace","Cruelty and the Beast","Midian","Damnation and a Day","Nymphetamine","Thornography"]],
["Nightwish",["Angels Fall First","Oceanborn","Wishmaster","Over the Hills and Far Away","Century Child","Bless the Child","Once","Dark Passion Play"]],
["To\/Die\/For",["All Eternity","Epilogue","Jaded","IV"]]
]
JSV retains the modeling simplicity of CSV (a single row of "fields" defined at the top), but allows a cell to be either a single value or a list of them. This is still unglamorous, but to me it's a big gain in expressivity for essentially no technical cost. I suggest that if combining lists and tables isn't a big deal to you (and you're the sort of person to whom a data-serialization format could be a big deal), maybe you've allowed tabular single-mindedness to beat all the natural human list-thinking out of you. Time to reclaim your right to multiplicity. Time to reclaim lots of your rights, really. Yet another argument for more lists, everywhere around us.