¶ The Problem With Answers · 27 March 2009 essay/tech
In his post explaining his departure from Google, Douglas Bowman says "Yes, it's true that a team at Google couldn't decide between two blues, so they're testing 41 shades between each blue to see which one performs better. I had a recent debate over whether a border should be 3, 4 or 5 pixels wide, and was asked to prove my case. I can't operate in an environment like that."
He's not really trying to have an opinion-changing last word in an argument against data-driven product decisions, but if he were, this is not how to do it. If you believe that the proper width of a border can be tested, then Bowman's refusal to subject his intuitions to quantitative confirmation just sounds like petulant prima-donna nonsense. If you can test 41 shades of blue, this line of reasoning goes, you don't need to guess, so a guessing specialist is an annoying waste of everybody's time.
The great advantage of testing and data, of course, is that you get precise, decisive answers you can act on. Shade 31, with 3.7%, trouncing runner-up shade 14 with only 3.4%! Apply shade 31, declare progress.
But the great disadvantage of testing and data is that you get precise, decisive answers you can and will act on, but you almost never know what question you really asked. Sure, the people who saw shade 31 did some measurable thing at some measurable rate. But why? Is it shade 31? Or is it the constrast between shade 31 and the old shade? Or is it the interplay between shade 31 and some other thing you aren't thinking about, or possibly don't even control? Are you going to run your test again in a month to see if the results have changed? Did you cross-correlate them with HTTP_REFERER and index the colors on the pages people came from? What about all the combinations of these 41 shades and 41 backgrounds and 8 fonts and 3 border widths (12 if you vary each side of the box separately!) and 41 line-heights and 19 title-bar wordings and the color of the tie Jon Stewart was wearing the night before? Which things matter? How do you know?
You don't. And if you need to add some new element, tomorrow, you don't know which of the tests you've already run it invalidates. Are you going to rerun all of them, permuted 41 more new ways?
No. You are going to sheepishly post a job opening for a new guessing specialist. Bowman already had his last word. It was "Goodbye".
He's not really trying to have an opinion-changing last word in an argument against data-driven product decisions, but if he were, this is not how to do it. If you believe that the proper width of a border can be tested, then Bowman's refusal to subject his intuitions to quantitative confirmation just sounds like petulant prima-donna nonsense. If you can test 41 shades of blue, this line of reasoning goes, you don't need to guess, so a guessing specialist is an annoying waste of everybody's time.
The great advantage of testing and data, of course, is that you get precise, decisive answers you can act on. Shade 31, with 3.7%, trouncing runner-up shade 14 with only 3.4%! Apply shade 31, declare progress.
But the great disadvantage of testing and data is that you get precise, decisive answers you can and will act on, but you almost never know what question you really asked. Sure, the people who saw shade 31 did some measurable thing at some measurable rate. But why? Is it shade 31? Or is it the constrast between shade 31 and the old shade? Or is it the interplay between shade 31 and some other thing you aren't thinking about, or possibly don't even control? Are you going to run your test again in a month to see if the results have changed? Did you cross-correlate them with HTTP_REFERER and index the colors on the pages people came from? What about all the combinations of these 41 shades and 41 backgrounds and 8 fonts and 3 border widths (12 if you vary each side of the box separately!) and 41 line-heights and 19 title-bar wordings and the color of the tie Jon Stewart was wearing the night before? Which things matter? How do you know?
You don't. And if you need to add some new element, tomorrow, you don't know which of the tests you've already run it invalidates. Are you going to rerun all of them, permuted 41 more new ways?
No. You are going to sheepishly post a job opening for a new guessing specialist. Bowman already had his last word. It was "Goodbye".