Part of what QA departments do is a type of automated testing in which a script runs a piece of software through its paces, entering all sorts of rule-breaking text strings in input fields and seeing whether the software handles the rule-breaking gracefully. These test strings comprise all the weird cases analysts can think of: trying to type 100 characters into a field that is 30 characters long, entering non-Latin characters into a field that expects Latin characters, typing letters into a number field, and so forth.
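For readers who have not seen one, here is a minimal sketch of that kind of script in Python. The two validators, accepts_name and accepts_quantity, are hypothetical stand-ins for whatever rules a real form enforces, and the list of rule-breaking strings is only a sample of the weird cases described above.

```python
# Hypothetical field rules: a name field of at most 30 Latin characters
# and a quantity field that takes ASCII digits only.
def accepts_name(text: str, max_len: int = 30) -> bool:
    if not 1 <= len(text) <= max_len:
        return False
    return all(ch.isascii() and (ch.isalpha() or ch in " -'") for ch in text)


def accepts_quantity(text: str) -> bool:
    return text.isascii() and text.isdigit()


# Each entry pairs a validator with an input it ought to reject.
RULE_BREAKERS = [
    (accepts_name, "x" * 100),      # 100 characters into a 30-character field
    (accepts_name, "Юлия"),         # non-Latin characters
    (accepts_name, "<Redacted>"),   # angle brackets
    (accepts_quantity, "twelve"),   # letters in a number field
]


def run_checks():
    """Return the (validator name, input) pairs that were wrongly accepted."""
    return [(fn.__name__, text) for fn, text in RULE_BREAKERS if fn(text)]


if __name__ == "__main__":
    failures = run_checks()
    print("all rule-breakers rejected" if not failures else failures)
```

A real test suite would drive the actual user interface rather than call validators directly, but the shape is the same: feed in bad input and check that it is refused gracefully.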
My serendipitous test case
My post yesterday is titled “<Redacted>.” Note that it begins and ends with angle brackets. Angle brackets have a special status in markup languages that derive from SGML, such as HTML, XHTML, and the others that the Web is built on. Angle brackets signal to the markup language interpreter that what is between them is a tag, information the software uses to decide, at the simplest level, how to display what follows (until a closing tag is encountered).
As a consequence of this special status, if I want you to see an angle bracket in the displayed text, I have to use a workaround. The workaround is to put in a character code (called an HTML entity) that will be interpreted as a mathematical less-than or greater-than symbol. Knowing this, what I typed into the Title field for yesterday’s post was &lt;Redacted&gt; (and I just applied a similar trick to make that come out right). So far so good.
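To make the workaround concrete, here is a small sketch using Python's standard html module. It illustrates entity escaping in general, not the code Blogger or any feed reader actually runs, and the variable names are mine.

```python
import html

title = "<Redacted>"

# Escaping swaps each angle bracket for its HTML entity.
escaped = html.escape(title)
print(escaped)                    # &lt;Redacted&gt;

# A browser (or html.unescape) turns the entities back into brackets.
print(html.unescape(escaped))     # <Redacted>

# Escaping text that is already escaped also escapes the ampersands,
# so the reader ends up seeing the entity codes instead of the brackets.
double_escaped = html.escape(escaped)
print(double_escaped)             # &amp;lt;Redacted&amp;gt;
print(html.unescape(double_escaped))  # &lt;Redacted&gt;
```

The last two lines show how one extra round of escaping somewhere along the way is enough to put raw entity codes in front of the reader.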
But as you know, a blog post is more than a simple HTML web page. When I click the Publish Post button, my browser sends information to a server that triggers software to assign a URL-friendly name to the post and store what I’ve typed in a database. That database has rules for how text is stored. Other software extracts text from the database and sends an HTML page to your browser so you can read the post. Other software extracts the information in another way to supply your RSS feed reader. When you view my post, either as a web page on my domain or in your feed reader or in your email or…wherever, other software has intervened to process the text.
So there are lots of places where my angle brackets have to be interpreted and processed.
Back to <Redacted>
Typically, Blogger creates a URL for blog posts based on the post title. It strips out punctuation and words such as a, an, the, and a few others. For example, my post titled “Do you have the willies?” became http://www.ampersandvirgule.com/2009/12/do-you-have-willies.html. For yesterday’s post, though, Blogger looked at the post title and threw up its hands (wise move), basing the post URL on the first line of the post body, instead: http://www.ampersandvirgule.com/2009/12/you-may-have-seen-story-other-day-about.html. So far so good.
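I don't know how Blogger actually builds these names, but the behavior described above can be approximated in a few lines of Python. Everything here is an assumption made for illustration: the stop-word list, the 39-character limit, the idea that anything shaped like a tag is discarded before the words are collected, and the sample body text in the usage lines.

```python
import re

STOP_WORDS = {"a", "an", "the"}  # assumed subset; the real list is longer


def slugify(text: str, max_len: int = 39) -> str:
    """Lowercase, drop tag-like spans, punctuation and stop words, join with hyphens."""
    text = re.sub(r"<[^>]*>", " ", text)          # assumption: tags are thrown away
    words = re.findall(r"[a-z0-9]+", text.lower())
    slug = ""
    for word in words:
        if word in STOP_WORDS:
            continue
        candidate = f"{slug}-{word}" if slug else word
        if len(candidate) > max_len:              # assumed length cap
            break
        slug = candidate
    return slug


def post_slug(title: str, body: str) -> str:
    """Use the title if anything survives; otherwise fall back to the first body line."""
    first_line = body.splitlines()[0] if body.strip() else ""
    return slugify(title) or slugify(first_line)


print(slugify("Do you have the willies?"))   # do-you-have-willies
print(repr(slugify("<Redacted>")))           # '' (nothing left to build a name from)

# Hypothetical first line of a post body, used only to show the fallback.
sample_body = "You may have seen the story the other day about a thing."
print(post_slug("<Redacted>", sample_body))  # you-may-have-seen-story-other-day-about
```

Under these assumptions, a title made up entirely of one bracketed word leaves the title slug empty, which would explain the fallback to the post body.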
But the post title gets reported in many other interfaces. It has variously shown up as:
- <Redacted> [correct]
- &lt;Redacted&gt; [user-unfriendly but not wrong]
- Untitled Entry [uninformative, but a sign of recognition there was a problem]