30 August 2005 to 8 August 2005
It will be of interest to approximately nobody, but if this particular approximation misses you, I have written up an unsolicited idea about simplified XML.
I give this about a 97% chance of being completely pointless, and a 3% chance of being an inspired idea that will ultimately be the foundation of my reputation as an esoteric technical visionary.
And I wish I knew whether saying 3% is an example of humility or egotism.
The default architectural approach for a dynamic web site, these days, is to put all the content into a relational database system, retrieve pieces of it with SQL queries, and format the content on the fly with some combination of code, style-sheets and style transformations. There are many good reasons for this, among them scalable performance, manageable storage, transactional updating and a full relational query system for use in subsetting and searching.
When I was writing the forum software for vF, though, my primary goals were statelessness and an absolute minimum of installation dependencies, and I was willing to assume relatively small scales, so instead of using a database I opted to store notes as simple XML files directly in the file system. In my scalability tests this actually worked better than I'd expected, so I used a variation on the same approach for the new architecture for the rest of the site when I recently rewrote it.
I did wonder, though, how much speed I was giving up in the interest of file-system transparency and simplicity. So today I got around to dumping the back issues of my music-review column into a MySQL database so I could do a couple of performance tests of the two approaches.
The first test was to build an alphabetized list of issue titles. The SQL version of this is very simple, as the titles are stored in a discrete field, and SQL lets you demand sorting, so a query like "SELECT title,id FROM thistable ORDER BY title" gets the right data in the right order, and some simple post-processing formats it into HTML. The XML version builds a file list by operating-system glob expansion (which is every bit as glamorous as it sounds), reads in each file, uses string-matching to find the content of the appropriate XML mark-up and stuff it into an array of hashes (again, very sexy), and then sorts the array by the title hash key using a custom comparison function. This test is slightly biased in favor of the SQL approach, as my XML code does some extra string-processing in between retrieval and sorting that I left out of the SQL version because there isn't a directly corresponding intermediate step. The XML version could be optimized in several obvious ways, but the SQL version would also be faster with an index.
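The XML side of this first test can be sketched roughly like so, in Python for illustration only; the actual site code may be in another language entirely, and the `<title>` element name and flat-directory layout are assumptions, not the site's real markup:

```python
import glob
import re

def build_title_list(directory):
    """Alphabetized list of (title, file) entries, brute-force style.

    Globs the directory for XML files, string-matches each file's
    <title> element rather than parsing the XML properly, stuffs the
    results into a list of dicts, and sorts by title. The <title>
    tag name is a hypothetical stand-in for the real markup.
    """
    entries = []
    for path in glob.glob(directory + "/*.xml"):
        with open(path, encoding="utf-8") as f:
            text = f.read()
        match = re.search(r"<title>(.*?)</title>", text, re.S)
        if match:
            entries.append({"title": match.group(1), "file": path})
    # A case-insensitive key standing in for the custom comparison function
    entries.sort(key=lambda e: e["title"].lower())
    return entries
```

Reading and regex-scanning every file on every request is exactly the brute force described above; caching the extracted titles would be one of the obvious optimizations.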
The second test was to retrieve the back-issues that contain a particular text phrase. Again, in SQL this is very easy: "SELECT title,id,content FROM thistable WHERE content LIKE '%this phrase%'". The XML method is the same as in the first test, plus some arcane string-matching and some even more arcane string-unmatching to avoid matches that occur inside of HTML tags. This second test is more significantly biased in favor of the SQL approach, because my XML code does a whole XML-HTML transformation step that I didn't bother plugging into the SQL version, plus the SQL version would produce false-positives inside tags that in production use would have to be caught in post-processing. There aren't any trivial accelerations for either approach to this problem, and it's a much more processing-intensive example than the first one, so this is the more interesting of the two tests.
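The "string-unmatching" half of the second test, skipping matches that fall inside tags, can be sketched like this; again the sketch is illustrative Python, not the site's actual code, and the tag-stripping regex is the crudest workable approach:

```python
import glob
import re

def search_issues(directory, phrase):
    """Return the files whose visible text contains the phrase.

    Strips anything that looks like a tag before matching, so that a
    phrase occurring only inside markup (an href, a class name) does
    not count as a hit. This is the step the SQL LIKE query skips,
    which is why that version would return false positives.
    """
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    hits = []
    for path in glob.glob(directory + "/*.xml"):
        with open(path, encoding="utf-8") as f:
            text = f.read()
        stripped = re.sub(r"<[^>]*>", " ", text)  # crude tag removal
        if pattern.search(stripped):
            hits.append(path)
    return hits
```

The per-file read-strip-scan loop is what makes this the processing-intensive case; that it still holds its own against the database is the interesting result below.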
In the first test, the SQL approach reliably generated the title list in about 45 milliseconds, and the XML approach generally took about 89ms, for an SQL advantage of a factor of 2, more or less. This is actually much less of a difference than I anticipated, given the absurd brute-force nature of my current XML approach to this problem.
In the second test, even handicapped by post-processing, the XML method actually beat the SQL method. Every time, although not by a lot: SQL times range from 1.5s to 1.8s, while XML times hover more consistently around 1.3s to 1.4s.
Neither of these tests was remotely scientific, and I have made no attempt to run them in any environment other than the one I really use, or at any scale other than the one I'm really dealing with. So it would be insane to conclude that a global revolution against databases is imminent. But maybe, at least, I'm not as crazy as I feared for trying to see how far I can get without them.
¶ a life in numbers · 19 August 2005
My lifetime running distance, where for running purposes my life only began a year ago June, will hit 1000 miles about halfway through tomorrow's run. My first week of post-run/walk running I covered 9 miles; most weeks now I do 20. As of today I am on track for my arbitrary goal of running 1000 miles during calendar 2005. This isn't very much in serious running terms, but it's as far as I can go in as much time as I care to devote to the cause, and it's appealing to have an order of magnitude for a goal.
I am slightly behind on my equally arbitrary goal of reading 50 books in 2005. I've finished 28, which suggests 43 or 44 for the year, but the numbers don't account for the fact that I spent most of May and June reading travel guides and grammar manuals, so I expect the current extrapolation underestimates the eventual total. If I get to 46, I'll have read more books in 2005 than in any year since 1996. A graph plotting my reading against my writing would explain most of the fluctuations in the former, as if there is a conservation of the time I spend with written words.
Out of my goals of gaining and losing zero pounds in 2005, I have so far gained zero pounds and lost zero pounds, both of which trends point to a total calendar-year weight-gain and -loss of zero pounds.
Out of my goal of having been married for one year during the first year of my marriage, I completed one year, for 100% of my goal. During the first seven days of the second year of my marriage, I have successfully been married for seven days. My goal is to reach two years of marriage by the end of the second year.
Math is a comfort. If only anything important could be so simply measured.
The Field Mice: "Emma's House" (1.7M mp3)
I'm still not getting through sunny days without listening to Waltham's blast of uncomplicatedly shallow puppy-romantic zeal at least once, but it's good to have counterpoint, and if Waltham are a recursivist's dream of a band trying to impress girls by playing songs about trying to impress girls by being in a band, then the Field Mice were their polar opposite, a band about trying to figure out how best to end up alone. This one reduces melancholy almost past its essence, understanding perfectly that the core of redemptive loneliness is self-circumscribing, and that in the perfect portrait of sadness details exist only to anchor atmosphere.
She touches three keys with the same fingers she must have run through your hair, and then we are away and I will never have to see you again.
She stands by windows onto ten worlds, watching a hundred billion people dodge through each other's enmities, and we duel quietly with our convictions about what she hopes to see among them.
It is only through the invisible mercy of infinitesimal machines that she can breathe in this air and my company.
You have no idea how much more courage it took to come out here alone with what I know and brought with me than to land on these rocks where we know nothing and owe nothing.
In the logs it is at first Minerva, and only self-consciously do we leave off the catalog number; and then later Beta, when discovered implications begin to eclipse portaged expectations; and in my mind it is half of the time Home, and half of the time only Without You.
¶ new design, new complaints · 12 August 2005
I have switched over to my new site-design. Hopefully everything is working. If you find anything broken, please let me know.
If the new systems are working, they are even capable of delivering you a new list of complaints about the sixth Harry Potter book, which contains spoilers, so don't read it if you're trying to preserve ignorance.
I've fixed a few more bugs in the new code (most notably: the TWAS artist indices are working again, all embedded links and images should now work in the RSS as well as HTML, and furialog tag-filtering is automatically reflected in the attached RSS feed) and tweaked the visual layout very slightly.
For my own reference, here are the primary features added in the new system:
- consistent page design across all content ("all" meaning TWAS/furialog/photo/songs/code/misc)
- unified search across all content
- unified search-hit highlighting in all content
- unified dynamic RSS across all content, filterable by source, text and tag
- unified back-end content processing rendering HTML out of XML content files driven by XML indices, including embeddable magic functions for photosets and csv tables
- improved table embedder that handles sorting/filtering inside of other pages, blank cells in any field type, and tagless sorting of HTML fields
- automatic context-sensitive RSS feeds on all pages
- previewing for configurable RSS feeds
- site navigation bar on all pages
- sidebar section navigation in all sections
- better use of permalinks, searching and tagging in furialog
- paging of furialog entries (including text-/tag-filters and permalink mode)
- replacement of section index pages with pass-through to latest content
- replacement of individual old-style copyright notices with global CC notice
- single streamlined semi-semantic style sheet for all content
- automatic current-section nav highlighting in main navigation bar, sidebar navigation and subsections
- automatic previous/next issue navigation in TWAS
- automatic previous/next list navigation in TWAS best-of lists
- layout scalable to browser and text sizes