26 September 2007 to 23 June 2007
The Deciblog just published Justin Foley's reply to my implication that he botched his analysis of first letters of heavy-metal band names. [Read those if you want the rest of this to make any sense, not that I'm saying you need to want that...]
Foley cc'ed a bunch of other people in the actual email, and in an ensuing thread that got well-underway before I noticed it in my spam filter (which wouldn't have happened if I'd had the good sense to put all Southern Lord label personnel in my Address Book proactively), someone beat me to taking statistical issue with Foley's idea that my 50,000+ EM-derived sample-size was "too large", but agreed that in the abstract some sort of weighting scheme could account for the idea that Metallica earns M more points than some unkown band called The Austerity Program earns for A (or, in Foley's original analysis, T).
To all of which I said:
Weighting is easy. Let's say that a band only counts if somebody has actually bothered to write a review of one of their releases, and we'll weight them by the number of releases that have reviews. This method counts 6778 of EM's artists, who have 14057 releases between them.
Here are the percentages from the whole sample, the smaller sample unweighted, and the smaller sample weighted:
As you see, both restricting the sample and weighting do make small differences in the percentages, but S still wins, and D is still only in third.
It's also easy to rigorously calculate the most metal of all names, in essentially exactly the way [FH] suggests. Using only the smaller sample, we can build up the name by at each position taking the most common letter (again weighting each band name by the number of reviewed releases) among the names which match what we have so far, working towards a goal length obtained in the same weighted-average fashion. This produces this incremental search result:
goal length: 10
searching: [] 6778 partial matches
searching: [s] 718 partial matches
searching: [sa] 122 partial matches
searching: [sac] 23 partial matches
searching: [sacr] 23 partial matches
searching: [sacri] 9 partial matches
searching: [sacrif] 5 partial matches
searching: [sacrifi] 5 partial matches
searching: [sacrific] 5 partial matches
searching: [sacrifici] 4 partial matches
searching: [sacrificia] 3 partial matches
I submit that when Daree Eeee and the mighty Sacrificia tour together, Daree Eeee will be going on first, and carrying their own mangy amps off the stage when they're done with their 3 crappy songs...
glenn
PS: I most definitely did not type in any numbers by hand.
PPS: Excel is a fine tool for lots of things. Not *these* things, though.
The aforementioned FH then clarified the less-rigorous most-metal algorithm he had in mind, which was also easy to produce:
It's more or less just as easy to do it that way, considering only the weighted likelihood of a given letter at a given position with a given preceeding character.
searching: [] 6778 candidates
searching: [s] 718 candidates
searching: [sa] 1109 candidates
searching: [sar] 870 candidates
searching: [sara] 533 candidates
searching: [saran] 450 candidates
searching: [saran ] 521 candidates
searching: [saran o] 419 candidates
searching: [saran or] 271 candidates
searching: [saran ore] 270 candidates
searching: [saran orer] 182 candidates
I think Saran Orer get a guitar tech and some sandwiches, and go on after Daree Eeee, but they're still playing for people who are there to hail Sacrificia.
I hope everything is clear now, as I'm way overdue to get back to posting pictures of my daughter...
[Discussion, if you can bear the thought, here on vF.]
Foley cc'ed a bunch of other people in the actual email, and in an ensuing thread that got well-underway before I noticed it in my spam filter (which wouldn't have happened if I'd had the good sense to put all Southern Lord label personnel in my Address Book proactively), someone beat me to taking statistical issue with Foley's idea that my 50,000+ EM-derived sample-size was "too large", but agreed that in the abstract some sort of weighting scheme could account for the idea that Metallica earns M more points than some unkown band called The Austerity Program earns for A (or, in Foley's original analysis, T).
To all of which I said:
Weighting is easy. Let's say that a band only counts if somebody has actually bothered to write a review of one of their releases, and we'll weight them by the number of releases that have reviews. This method counts 6778 of EM's artists, who have 14057 releases between them.
Here are the percentages from the whole sample, the smaller sample unweighted, and the smaller sample weighted:
? | All | SU | SW |
# | 0.3 | 0.4 | 0.3 |
A | 9.1 | 9.8 | 9.8 |
B | 5.9 | 6.2 | 6.2 |
C | 6.3 | 6.4 | 6.0 |
D | 8.9 | 8.1 | 8.3 |
E | 4.9 | 4.6 | 4.3 |
F | 3.6 | 3.7 | 3.1 |
G | 3.0 | 3.5 | 3.4 |
H | 3.9 | 3.9 | 3.7 |
I | 3.7 | 3.5 | 3.7 |
J | 0.6 | 0.6 | 0.8 |
K | 2.2 | 2.3 | 2.6 |
L | 3.1 | 3.0 | 2.8 |
M | 7.4 | 6.8 | 8.2 |
N | 4.2 | 4.0 | 4.0 |
O | 2.3 | 2.4 | 2.6 |
P | 4.0 | 3.7 | 3.6 |
Q | 0.2 | 0.2 | 0.3 |
R | 3.3 | 2.9 | 3.2 |
S | 10.8 | 10.6 | 10.5 |
T | 4.5 | 4.6 | 4.4 |
U | 1.3 | 1.3 | 1.2 |
V | 2.7 | 2.8 | 2.8 |
W | 2.7 | 3.3 | 2.8 |
X | 0.3 | 0.4 | 0.4 |
Y | 0.2 | 0.3 | 0.3 |
Z | 0.7 | 0.7 | 0.5 |
As you see, both restricting the sample and weighting do make small differences in the percentages, but S still wins, and D is still only in third.
It's also easy to rigorously calculate the most metal of all names, in essentially exactly the way [FH] suggests. Using only the smaller sample, we can build up the name by at each position taking the most common letter (again weighting each band name by the number of reviewed releases) among the names which match what we have so far, working towards a goal length obtained in the same weighted-average fashion. This produces this incremental search result:
goal length: 10
searching: [] 6778 partial matches
searching: [s] 718 partial matches
searching: [sa] 122 partial matches
searching: [sac] 23 partial matches
searching: [sacr] 23 partial matches
searching: [sacri] 9 partial matches
searching: [sacrif] 5 partial matches
searching: [sacrifi] 5 partial matches
searching: [sacrific] 5 partial matches
searching: [sacrifici] 4 partial matches
searching: [sacrificia] 3 partial matches
I submit that when Daree Eeee and the mighty Sacrificia tour together, Daree Eeee will be going on first, and carrying their own mangy amps off the stage when they're done with their 3 crappy songs...
glenn
PS: I most definitely did not type in any numbers by hand.
PPS: Excel is a fine tool for lots of things. Not *these* things, though.
The aforementioned FH then clarified the less-rigorous most-metal algorithm he had in mind, which was also easy to produce:
It's more or less just as easy to do it that way, considering only the weighted likelihood of a given letter at a given position with a given preceeding character.
searching: [] 6778 candidates
searching: [s] 718 candidates
searching: [sa] 1109 candidates
searching: [sar] 870 candidates
searching: [sara] 533 candidates
searching: [saran] 450 candidates
searching: [saran ] 521 candidates
searching: [saran o] 419 candidates
searching: [saran or] 271 candidates
searching: [saran ore] 270 candidates
searching: [saran orer] 182 candidates
I think Saran Orer get a guitar tech and some sandwiches, and go on after Daree Eeee, but they're still playing for people who are there to hail Sacrificia.
I hope everything is clear now, as I'm way overdue to get back to posting pictures of my daughter...
[Discussion, if you can bear the thought, here on vF.]
And for completeness, here are the top bands by average rating across all releases, counting only the bands that have reviews from at least 10 different reviewers.
Vintersorg and Windir are the only bands to get an average above 90 with 20 or more reviewers. So clearly those are the greatest bands in all of heavy metal.
The worst metal band in the world is Apocalypse, who got an average review of 7.0 from 14 reviewers. Dishonorable mention to Six Feet Under, the only band with at least 4 releases and 10 reviewers who averaged below 50 (49.91 from 47 reviewers).
# | Artist | Reviewers | Average | Spread |
1 | Repulsion | 11 | 96.43 | 3.064 |
2 | Esoteric (UK) | 13 | 96.2 | 3.544 |
3 | Gorguts | 15 | 95.43 | 4.03 |
4 | Lykathea Aflame | 12 | 94.6 | 4.363 |
5 | Atheist | 12 | 94.56 | 4.272 |
6 | Solitude Aeturnus | 10 | 93.9 | 5.485 |
7 | Sacramentum | 11 | 93.7 | 6.067 |
8 | Disembowelment | 10 | 93.43 | 3.959 |
9 | Deeds of Flesh | 12 | 93.33 | 5.375 |
10 | The Axis of Perdition | 10 | 93.33 | 4.607 |
11 | Martyr (Can) | 11 | 93.29 | 4.399 |
12 | Cult of Luna | 10 | 92.56 | 6.735 |
13 | Persuader | 11 | 92.4 | 5.886 |
14 | Katharsis (Ger) | 10 | 92.0 | 5.715 |
15 | Novembre | 12 | 91.86 | 5.436 |
16 | Vintersorg | 21 | 91.67 | 6.968 |
17 | Demilich | 15 | 91.58 | 7.522 |
18 | Saint Vitus | 13 | 91.36 | 6.526 |
19 | Belphegor (Aut) | 15 | 90.94 | 6.571 |
20 | Manticora | 11 | 90.64 | 7.889 |
21 | Windir | 21 | 90.64 | 6.986 |
22 | Agent Steel | 15 | 90.2 | 6.002 |
23 | Negurã Bunget | 10 | 90.17 | 10.123 |
24 | Pentagram (US) | 10 | 90.13 | 6.827 |
25 | Maudlin of the Well | 11 | 90.08 | 8.558 |
26 | Deströyer 666 | 15 | 90.07 | 7.676 |
Vintersorg and Windir are the only bands to get an average above 90 with 20 or more reviewers. So clearly those are the greatest bands in all of heavy metal.
The worst metal band in the world is Apocalypse, who got an average review of 7.0 from 14 reviewers. Dishonorable mention to Six Feet Under, the only band with at least 4 releases and 10 reviewers who averaged below 50 (49.91 from 47 reviewers).
And here are the 25 least consistent:
Most of these follow the "great once, crap now" pattern (I think we can now officially call this "Sepulturding"), which makes one wonder whether developing a fan-base is really worth the bother in the end. Deathspell Omega deserve a special note: if they'd had the sense to release Infernal Battles under a different name, their other 4 albums would give them a standard deviation of 1.66 on an average of 92.86, and we could have a very obscure statistical argument over whether that means they are in fact even greater than Fates Warning.
# | Artist | Spread | Average |
1 | Sepultura | 25.022 | 67.55 |
2 | In Flames | 21.621 | 60.74 |
3 | Megadeth | 19.333 | 71.54 |
4 | Krieg | 18.455 | 71.19 |
5 | Deicide | 18.007 | 74.15 |
6 | Deathspell Omega | 17.925 | 83.93 |
7 | Metallica | 17.863 | 69.16 |
8 | Virgin Steele | 17.747 | 74.5 |
9 | Six Feet Under (US) | 17.601 | 55.15 |
10 | Dissection (Swe) | 17.275 | 70.67 |
11 | Sentenced | 16.626 | 66.79 |
12 | Moonspell | 16.312 | 78.28 |
13 | Nuclear Assault | 16.234 | 72.04 |
14 | Mayhem (Nor) | 15.754 | 67.69 |
15 | Machine Head (US) | 15.453 | 55.06 |
16 | Within Temptation | 15.379 | 61.25 |
17 | Slayer (US) | 14.537 | 73.97 |
18 | Children of Bodom | 13.943 | 77.63 |
19 | Black Label Society | 13.937 | 75.05 |
20 | Pantera | 13.921 | 69.97 |
21 | Celtic Frost | 13.717 | 73.4 |
22 | Cannibal Corpse | 13.476 | 74.89 |
23 | Motörhead | 13.319 | 78.23 |
24 | Danzig | 13.037 | 80.0 |
25 | Pain of Salvation | 12.934 | 87.77 |
Most of these follow the "great once, crap now" pattern (I think we can now officially call this "Sepulturding"), which makes one wonder whether developing a fan-base is really worth the bother in the end. Deathspell Omega deserve a special note: if they'd had the sense to release Infernal Battles under a different name, their other 4 albums would give them a standard deviation of 1.66 on an average of 92.86, and we could have a very obscure statistical argument over whether that means they are in fact even greater than Fates Warning.
My analytical tools make various otherwise-elusive questions easy to answer, so while I'm playing with heavy-metal data, here's another thing I wondered about: which bands have the narrowest and widest ranges of ratings? To answer this meaningfully I counted only releases that have 4 or more reviews, and only bands that have 4 or more of these releases and at least 10 different reviewers. For these I then averaged the ratings for each such release, and ran standard deviations on the sets of averages. So a low standard deviation means there's some consensus that the quality of the band's output is consistent. High means consensus that the quality varies widely.
Here are 25 most consistent. "Spread" is the standard deviation, "Average" is the average rating of the releases used in the calculation.
I sense a hastily-assembled cash-in Coroner boxset in our future. I think this also means that Fates Warning is the most consistently great band in all of heavy metal. So now we know. And Lamb of God gets some sort of weird prize for being the most consistently mediocre.
Here are 25 most consistent. "Spread" is the standard deviation, "Average" is the average rating of the releases used in the calculation.
# | Artist | Spread | Average |
1 | Coroner | 0.908 | 88.21 |
2 | Helstar | 1.455 | 90.54 |
3 | Moonsorrow | 1.676 | 89.98 |
4 | Dark Angel (US) | 1.767 | 82.15 |
5 | Candlemass | 1.842 | 89.78 |
6 | Lamb of God | 1.845 | 68.5 |
7 | Obituary | 2.004 | 85.32 |
8 | Type O Negative | 2.035 | 89.16 |
9 | Accept | 2.193 | 88.06 |
10 | Agent Steel | 2.479 | 90.49 |
11 | Fates Warning | 2.531 | 93.36 |
12 | Alice in Chains | 2.538 | 88.83 |
13 | Iron Savior | 3.025 | 88.25 |
14 | Falconer | 3.083 | 84.42 |
15 | Therion (Swe) | 3.159 | 90.38 |
16 | Sodom | 3.294 | 83.4 |
17 | Kamelot | 3.463 | 90.52 |
18 | Gorgoroth | 3.496 | 84.71 |
19 | Judas Iscariot | 3.602 | 89.03 |
20 | Bolt Thrower | 3.652 | 88.31 |
21 | Suffocation (US) | 3.701 | 86.48 |
22 | Angra | 3.758 | 88.63 |
23 | Enslaved (Nor) | 3.926 | 88.85 |
24 | Vader | 4.162 | 85.78 |
25 | Bal-Sagoth | 4.249 | 89.9 |
I sense a hastily-assembled cash-in Coroner boxset in our future. I think this also means that Fates Warning is the most consistently great band in all of heavy metal. So now we know. And Lamb of God gets some sort of weird prize for being the most consistently mediocre.
¶ Death to False(ly Tabulated) Metal · 18 September 2007 essay/tech
If you're going to waste your time doing obsessive analysis of data on which nobody's life or ecology depends, you ought to at least do it diligently and efficiently.
About a month ago The Deciblog published Justin Foley's attempt to answer the timeless question "How likely is a metal band to start their name with a particular letter of the alphabet?". For his sample set, Foley took the combined rosters of several metal labels (he doesn't reveal either list), which gave him 814 names, for which he then calculated the first-letter distributions, reaching the startling conclusion that the most likely letter is S.
Foley put his results in a bar-chart, which I assume means he used a spreadsheet, so hopefully he didn't spend a whole lot of time hand-counting. But he should have spent even less. The Encyclopaedia Metallum is not only just sitting there with a collaboratively-amassed and collectively-moderated database of 50,000+ metal bands, but they've even already split it up by first-letter and there are band-counts right at the top of each letter-page. Add, divide, and you're done.
Here, then, is the much better-informed version of this still-pointless breakdown:
Most of Foley's numbers aren't that far off. His small sample-size leads him to overestimate J and Y, and underestimate Q and V, but the absolute numbers for these letters are small anyway. He also seems to underestimate A, P and R, for reasons which are not apparent in his opaque reporting, but might have to do with language tendences, as EM's list is probably more global than his.
But the biggest discrepancy in Foley's numbers, by far, is T, which he credits with 9.3% of the band-names, where EM data indicates less than half that. Here I have a wearyingly mundane but highly plausible theory: Foley has accidentally counted all the bands whose names begin with "The " as T, despite specifically saying that he didn't. This is obviously both philosophically and methodologically repugnant, and although I regret the maelstrom of blogospheric outrage that will undoubtably accompany my public exposure of this error, I think we owe ourselves (the) Truth.
Of course, the thousands of metal fans around the world who have put time and effort into building the Encyclopaedia Metallum did it because they care about the music, not the alphabet. The site is the definitive central reference source for most metal-related matters, and certainly the final arbiter of obscurity for the vast unknown majority of the bands it lists.
It also has, in addition to its factual content, tens of thousands of percentage-scored, user-attributed, peer-moderated reviews of metal recordings. What it does not have is any sort of similarity analysis to make use of the huge data-graph represented by the connections between bands, users and ratings. There is a wearyingly mundane reason for this, too: trying to do similarity analysis with SQL queries will make you want to eat your own neck.
So here is an actual contribution to the world's knowledge on this admittedly peripheral subject: empath, the missing similarity analysis of EM user/review/band data, accurate as of yesterday. Pick a band, see the other bands that people who like the first band also like. Data wants to form shapes. In a better world, this would be just as easy for EM to do themselves, updating live, as it is for them to serve their raw data into web pages.
Easier, actually. I bet it took me less time to do this analysis than it took Foley to make a bar-chart of first letters. But I have better tools. I have better tools because at the moment I'm paid to design better tools. If I do my job well enough, eventually you'll have my better tools, too. I'm not designing them to tabulate heavy metal, I'm designing them to answer questions. Not all answers turn out to be shaped like Truths, of course. But if you can't answer them, you can't be sure which are which.
About a month ago The Deciblog published Justin Foley's attempt to answer the timeless question "How likely is a metal band to start their name with a particular letter of the alphabet?". For his sample set, Foley took the combined rosters of several metal labels (he doesn't reveal either list), which gave him 814 names, for which he then calculated the first-letter distributions, reaching the startling conclusion that the most likely letter is S.
Foley put his results in a bar-chart, which I assume means he used a spreadsheet, so hopefully he didn't spend a whole lot of time hand-counting. But he should have spent even less. The Encyclopaedia Metallum is not only just sitting there with a collaboratively-amassed and collectively-moderated database of 50,000+ metal bands, but they've even already split it up by first-letter and there are band-counts right at the top of each letter-page. Add, divide, and you're done.
Here, then, is the much better-informed version of this still-pointless breakdown:
? | % | * |
# | 0.3% | |
A | 9.1% | ********* |
B | 5.9% | ***** |
C | 6.3% | ****** |
D | 8.9% | ******** |
E | 4.9% | **** |
F | 3.6% | *** |
G | 3.0% | *** |
H | 3.9% | *** |
I | 3.7% | *** |
J | 0.6% | |
K | 2.2% | ** |
L | 3.1% | *** |
M | 7.4% | ******* |
N | 4.2% | **** |
O | 2.3% | ** |
P | 4.0% | *** |
Q | 0.2% | |
R | 3.3% | *** |
S | 10.8% | ********** |
T | 4.5% | **** |
U | 1.3% | * |
V | 2.7% | ** |
W | 2.7% | ** |
X | 0.3% | |
Y | 0.2% | |
Z | 0.7% |
Most of Foley's numbers aren't that far off. His small sample-size leads him to overestimate J and Y, and underestimate Q and V, but the absolute numbers for these letters are small anyway. He also seems to underestimate A, P and R, for reasons which are not apparent in his opaque reporting, but might have to do with language tendences, as EM's list is probably more global than his.
But the biggest discrepancy in Foley's numbers, by far, is T, which he credits with 9.3% of the band-names, where EM data indicates less than half that. Here I have a wearyingly mundane but highly plausible theory: Foley has accidentally counted all the bands whose names begin with "The " as T, despite specifically saying that he didn't. This is obviously both philosophically and methodologically repugnant, and although I regret the maelstrom of blogospheric outrage that will undoubtably accompany my public exposure of this error, I think we owe ourselves (the) Truth.
Of course, the thousands of metal fans around the world who have put time and effort into building the Encyclopaedia Metallum did it because they care about the music, not the alphabet. The site is the definitive central reference source for most metal-related matters, and certainly the final arbiter of obscurity for the vast unknown majority of the bands it lists.
It also has, in addition to its factual content, tens of thousands of percentage-scored, user-attributed, peer-moderated reviews of metal recordings. What it does not have is any sort of similarity analysis to make use of the huge data-graph represented by the connections between bands, users and ratings. There is a wearyingly mundane reason for this, too: trying to do similarity analysis with SQL queries will make you want to eat your own neck.
So here is an actual contribution to the world's knowledge on this admittedly peripheral subject: empath, the missing similarity analysis of EM user/review/band data, accurate as of yesterday. Pick a band, see the other bands that people who like the first band also like. Data wants to form shapes. In a better world, this would be just as easy for EM to do themselves, updating live, as it is for them to serve their raw data into web pages.
Easier, actually. I bet it took me less time to do this analysis than it took Foley to make a bar-chart of first letters. But I have better tools. I have better tools because at the moment I'm paid to design better tools. If I do my job well enough, eventually you'll have my better tools, too. I'm not designing them to tabulate heavy metal, I'm designing them to answer questions. Not all answers turn out to be shaped like Truths, of course. But if you can't answer them, you can't be sure which are which.
1. Good light.
2. Elevate child to proper height.
3. Remove awkward outer casing from instrument.
2. Elevate child to proper height.
3. Remove awkward outer casing from instrument.
It should be possible to refer explicitly to something particular on a web page. To anything, whether or not the page's author thought to assign it a (secret) labeled ahead of time.
The most obvious way to mostly provide this, it seems to me, is for the HTTP URL syntax to include a way to specify a search string at the end, which the browser simply plugs into its own Find function after loading the page. I will randomly suggest that since we already have "#" for fragments at the end of a URL, and valid fragment IDs must begin with an alphanumeric or an underscore, "#=" could be used for passing search text. So where
http://www.furia.com/page.cgi?type=log&id=273
refers to this blog-entry as a whole page, you could also do
http://www.furia.com/page.cgi?type=log&id=273#=this+phrase
to refer directly to this phrase.
For extra credit we could also support "@" and a number for getting to the Nth use of that text. So
http://www.furia.com/page.cgi?type=log&id=273#=this+phrase@2
would refer directly to this phrase, not the one above.
Useful, semantically reasonable, eminently implementable.
[See the discussion on vF.]
The most obvious way to mostly provide this, it seems to me, is for the HTTP URL syntax to include a way to specify a search string at the end, which the browser simply plugs into its own Find function after loading the page. I will randomly suggest that since we already have "#" for fragments at the end of a URL, and valid fragment IDs must begin with an alphanumeric or an underscore, "#=" could be used for passing search text. So where
http://www.furia.com/page.cgi?type=log&id=273
refers to this blog-entry as a whole page, you could also do
http://www.furia.com/page.cgi?type=log&id=273#=this+phrase
to refer directly to this phrase.
For extra credit we could also support "@" and a number for getting to the Nth use of that text. So
http://www.furia.com/page.cgi?type=log&id=273#=this+phrase@2
would refer directly to this phrase, not the one above.
Useful, semantically reasonable, eminently implementable.
[See the discussion on vF.]
¶ Death to False (Metal) · 8 August 2007 essay/tech
Two of the very bad features of the SQL-based conception of relational data-modeling, I think, are these:
- one-to-many relationships are significantly harder to handle than one-to-one relationships (and, inversely, many-to-one relationships easiest to handle by defactoring)
- absence is significantly harder to handle than presence
It's easy to add an Artist column to an Album table, for example, so we get:
But if you want to model the fact that collaboration albums can have multiple artists, you need an Albums table, an Artists table, a join table, and a bunch of IDs:
This is already a pain in the ass, and thus you get a world filled with cop-outs like this:
and this:
both of which are tolerable enough to just look at, but awful when you go to try to get the computer to answer even perfectly sensible questions like what albums Boris has done.
And then, when you want to add a little more detail, you really ought to do this:
but instead you'll probably try to get away with:
or
because if you have the latter, you can do:
which is easy to understand, although it won't work because you forgot you don't have a field called plain "Artist" anymore. Whereas with the five-table form you have to do something like:
and that's probably still wrong.
I submit that this is all a colossal mess, and it's a wonder the database-driven software we get driven out of it isn't even more woeful than it is. A better paradigm would embrace the opposites of these two flaws:
- the modeling of a relationship should be the same whether the relationship is one-to-one, one-to-many, many-to-one or many-to-many, and should be the same no matter how many "many"s you have
- absence and presence should be equally easy to assess
And I think a useful rough metric for such a new system is that it allows you to model your data without ever needing to say FALSE.
And I think this, I think, because I think this is how we think and talk about relationships between things when nobody is getting in our way:
Black One was recorded by Sunn O))), and was released in CD, LP and Download formats. Altar was recorded by Sunn O))) and Boris, and was released in CD and LP formats. All of these albums were released on CD and LP. Altar is the only one that wasn't released in Download format.
Take these out of sentence form and make the data-types explicit and everything is still perfectly straightforward:
or, looking at the same data from a different perspective:
and then the old questions are also easier:
Subliminating the one/many distinction allows us to talk about albums with 3 or 8 or 0 artists just as readily as we talk about albums with 1 or 2. Fixing the query language to handle absence lets us cope with the introduction of a new edition into our dataset, say, without having to add a new field for it and/or go through asserting that everything we already knew about doesn't have that format. I suspect pretty much every database field headed X and filled with TRUEs and FALSEs could at least be more usefully modeled in my new world as a relationship called "Characteristics" that either does or doesn't include X. And usually, as with formats, there's actually a meaningful label that contains information itself. The point is that things have relationships to their characteristics, or their formats. They don't, except in a very existential sense, have a relationship to falseness. FALSE is a machine thing.
The relationally astute will note that the five-table version is the fully-normalized representation that would support an SQL-based implementation of my new form as a display abstraction, and presumably my pseudo-query-code could be preprocessed into the messy SQL so I'd never see it. But that's my point: SQL is too low an abstraction, which makes it too hard to do things naturally, which makes it too hard to pass along natural behavior to the victims of the system. I'm mainly arguing for a better abstraction. (Although I also expect it's still always going to be bad that there's no better underlying way to represent lists, so I'm also probably arguing for the better abstraction to not merely be layered on top of the bad one.)
The discographically astute will note that Altar is available on iTunes, and that I have vastly undermodeled the rich and complicated space of Sunn O))) release formats. But if I'd done the examples in real detail, I'd still be typing out the bad versions...
- one-to-many relationships are significantly harder to handle than one-to-one relationships (and, inversely, many-to-one relationships easiest to handle by defactoring)
- absence is significantly harder to handle than presence
It's easy to add an Artist column to an Album table, for example, so we get:
Album | Artist |
Black One | Sunn O))) |
But if you want to model the fact that collaboration albums can have multiple artists, you need an Albums table, an Artists table, a join table, and a bunch of IDs:
Albums
| Artists
| Authorship
|
This is already a pain in the ass, and thus you get a world filled with cop-outs like this:
Album | Artist |
Black One | Sunn O))) |
Altar | Sunn O))) / Boris |
and this:
Album | Artist 1 | Artist 2 | Artist 3 |
Black One | Sunn O))) | FALSE | FALSE |
Altar | Sunn O))) | Boris | FALSE |
both of which are tolerable enough to just look at, but awful when you go to try to get the computer to answer even perfectly sensible questions like what albums Boris has done.
And then, when you want to add a little more detail, you really ought to do this:
Albums
| Artists
| Formats
| Authorship
| Formatship
|
but instead you'll probably try to get away with:
Album | Artist | FormatCode |
Black One | Sunn O))) | CD/LP/D |
Altar | Sunn O))) & Boris | CD/LP |
or
Album | Artist 1 | Artist 2 | Artist 3 | CD | LP | D |
Black One | Sunn O))) | FALSE | FALSE | TRUE | TRUE | TRUE |
Altar | Sunn O))) | Boris | FALSE | TRUE | TRUE | FALSE |
because if you have the latter, you can do:
SELECT Album, Artist
FROM Albums
WHERE D=FALSE
which is easy to understand, although it won't work because you forgot you don't have a field called plain "Artist" anymore. Whereas with the five-table form you have to do something like:
SELECT Albums.Album, Artists.Artist
FROM Albums, Artists, Authorship
WHERE Albums.ID=Authorship.AlbumID
AND Artists.ID=Authorship.ArtistID
AND NOT EXISTS (
SELECT *
FROM Formats, Formatship
WHERE Albums.ID=Formatship.AlbumID
AND Formats.ID=Formatship.FormatID
AND Formats.Format='Download'
)
and that's probably still wrong.
I submit that this is all a colossal mess, and it's a wonder the database-driven software we get driven out of it isn't even more woeful than it is. A better paradigm would embrace the opposites of these two flaws:
- the modeling of a relationship should be the same whether the relationship is one-to-one, one-to-many, many-to-one or many-to-many, and should be the same no matter how many "many"s you have
- absence and presence should be equally easy to assess
And I think a useful rough metric for such a new system is that it allows you to model your data without ever needing to say FALSE.
And I think this, I think, because I think this is how we think and talk about relationships between things when nobody is getting in our way:
Black One was recorded by Sunn O))), and was released in CD, LP and Download formats. Altar was recorded by Sunn O))) and Boris, and was released in CD and LP formats. All of these albums were released on CD and LP. Altar is the only one that wasn't released in Download format.
Take these out of sentence form and make the data-types explicit and everything is still perfectly straightforward:
Album: [Black One]
Artists: [Sunn O)))]
Formats: [CD] [LP] [Download]
Album: [Altar]
Artists: [Sunn O)))] [Boris]
Formats: [CD] [LP]
or, looking at the same data from a different perspective:
Format: [CD]
Albums: [Black One] [Altar]
Format: [LP]
Albums: [Black One] [Altar]
Format: [Download]
Albums: [Black One]
and then the old questions are also easier:
FIND Formats WHOSE Albums INCLUDE 'Black One' and 'Altar'
FIND Albums WHOSE Formats DON'T INCLUDE 'Download'
Subliminating the one/many distinction allows us to talk about albums with 3 or 8 or 0 artists just as readily as we talk about albums with 1 or 2. Fixing the query language to handle absence lets us cope with the introduction of a new edition into our dataset, say, without having to add a new field for it and/or go through asserting that everything we already knew about doesn't have that format. I suspect pretty much every database field headed X and filled with TRUEs and FALSEs could at least be more usefully modeled in my new world as a relationship called "Characteristics" that either does or doesn't include X. And usually, as with formats, there's actually a meaningful label that contains information itself. The point is that things have relationships to their characteristics, or their formats. They don't, except in a very existential sense, have a relationship to falseness. FALSE is a machine thing.
The relationally astute will note that the five-table version is the fully-normalized representation that would support an SQL-based implementation of my new form as a display abstraction, and presumably my pseudo-query-code could be preprocessed into the messy SQL so I'd never see it. But that's my point: SQL is too low an abstraction, which makes it too hard to do things naturally, which makes it too hard to pass along natural behavior to the victims of the system. I'm mainly arguing for a better abstraction. (Although I also expect it's still always going to be bad that there's no better underlying way to represent lists, so I'm also probably arguing for the better abstraction to not merely be layered on top of the bad one.)
The discographically astute will note that Altar is available on iTunes, and that I have vastly undermodeled the rich and complicated space of Sunn O))) release formats. But if I'd done the examples in real detail, I'd still be typing out the bad versions...
¶ 1 July 2007 child/photo
What Muybridge did for horses galloping, I will become famous by doing for tiny children sneezing.
¶ first loves · 23 June 2007 child/photo
My parents report, and I vaguely remember this myself, that for the first few years of my being allowed to go to movie theaters and see real (i.e., non-"family") movies, I would invariably come home after each one and deliver some epic panegyric insisting that whatever it was was the best movie I had ever seen. The only specific movies I recall thinking this of as an impressionable early movie-goer are The Eagle Has Landed, Capricorn One and Clash of the Titans, and I submit that it is not unreasonable to consider this an ascending order of quality, albeit in the last case maybe only if one was a 14-year-old boy in 1981 and hadn't seen pointedly naked movie-breasts in anything else yet.
You will eventually get to the age where we begin worrying about how to stage your exposure to adult concepts, too, and before that the age where we start having to negotiate with your own loopily underinformed idea of your personal tastes. But for now your comprehension is blissfully abstract, and pretty much anything is likely to be the greatest thing ever, or close enough for us to amuse ourselves by inferring the nuances of your reactions from the gradually-less-random expressions that flicker across your face.
I have been taking advantage of this state by attempting to give you a sweepingly unguarded pre-self-consciousness grounding in modern rock music. I realize this might sound suspiciously like me just listening to music while I'm holding you, but your reactions are both surprisingly organized and not what I claimed I thought they were going to be, so I feel like the experiments are at least marginally non-solipsistic. Later, no doubt, you will turn out to mostly like some new kind of music B and I fundamentally fail to grasp, but at least you'll have the benefit of some Manic Street Preachers b-sides buried deep in your psyche.
One of my favorite dopey anticipatory theories, developed a while ago after reading an evocative article about the sonic harshness of the womb, was that babies are naturally intuitive fans of deeply textural black metal. The coursing blood all around you, the sudden tumbling motions, the frantic gnashing hum of your own brain growing. Might as well call it birth metal.
This idea proves to have some limited sense to it, at least, in that churning noise at high volume has some undeniable visceral effect. There's a long stretch of the middle of Part II of Metal Machine Music that puts you to sleep approximately as effectively as the "ocean" loop on the sound-module that came with your pack-and-play. Likewise a few carefully-selected sections of Aube's Aqua Syndrome. The ocean noises also kind of work on Bethany, though, whereas Reed and Aube kind of make her want to encase her head in cement, so given the open format of our house, and our affection for her head, the ocean sounds usually seem like the better option in practice.
Little Tiggers do not turn out to like death metal at all, to my mild surprise and pro-forma-ly brooding disappointment. Even the Leviathan and Xasthur pieces I holistically experience as most textural prove to have far too many dynamic and tonal shifts at the second-by-second scale, and you're perpetually either getting restless during a quiet part or being startled when the noise blasts in again. Sad. Statistically this was the most likely age at which you'd appreciate death metal, in that there's not even the outside danger of your accidentally figuring out any of the words. Later you'll realize it's another silly thing that only boys like, and thus yet another thing for which I might as well apologize in advance.
What you like best, by far, is steady, propulsive, fast, opportunistically rock-inflected pop songs. Four minutes at most, three is better. No long intros or outros. Female singers more than men (but James Bradfield and Paul Smith are OK), and big bonus points for any break where everything drops out except the drums and vocals. B thinks this is all about how I dance with you during those kinds of songs, but you and I know it's more than that. Fefe Dobson's "Rock It Till You Drop It" is, after all, a very fine song about perseverance, or maybe equanimity, and I'm only dancing the way anybody should feel inspired to. The Go-Go's are cultural icons, and give us a way to understand how the Spice Girls embody the dangerous nominality of post-ironic identity. Shakira represents both the emotional vitality of culture and the insistent triumph over it of individuality and hair. Tommy heavenly6 is, I agree, universally charming, and by the time you're old enough to wonder what she's singing, maybe I'll again have enough spare time to translate it.
But in case you're ever really curious, your favorite song for your first two months has been, unquestionably, "U & Ur Hand" by Pink. I don't remember how I discovered this. You like "Who Knew" and "Leave Me Alone (I'm Lonely)" fairly well, too, but "U & Ur Hand" is clearly special. We have a special dance we've invented for it, you and I, and it has never, ever failed to please you. Once you went from beet-red squalling to fast asleep before it even got to the two-minute mark, but usually you just dance with me. Or, anyway, I dance, and you lie in my arms with your head nestled into my left armpit, gazing up at me with love. Or past me towards the nearest skylight, or lamp, or faint diagonal shadow, but I collect the love en passant.
As you'll appreciate when you're considerably older, "U & Ur Hand" is not exactly intended for children. I usually sing along, anyway. It's a song of aggressive feminine self-affirmation, mostly, so even if its specific defiance and injunctions aren't really age-appropriate for you, or gender-appropriate for me, I think it's healthy for you to be exposed to its energy.
And if B is right, and it's really only me you're responding to, then you're learning about how my goofy tastes in music and defiance become manifest in my body's movements and my heart's resonances, and although I don't need you to love exactly what I love, I want desperately for you to love what you love at least as jubilantly as I love what I love.
I've long since seen a lot of movies I liked more than Capricorn One, but more than a few I liked less. "U & Ur Hand" probably won't always be your favorite song, and it'll probably save your mom and me some tense parent-teacher conferences along the way if it isn't. But not everything always has to change, and your first intuitions aren't necessarily wrong. I'll play this for you again someday, when you've forgotten, and we'll see what we think. And, maybe, remember what this felt like, when every next minute with you was so amazingly likely to be the best one yet.