words / myth / ampers & virgule: Interesting software QA test case

Saturday, December 12, 2009

Hold those cynical thoughts for a minute. There really is such a thing as software quality assurance testing, an activity that employs a great many skilled and intelligent people. Despite the annoying flaws we users complain about in commercial software, it wouldn’t be on the market at all without the diligent and unsung work of QA departments everywhere.

Part of what QA departments do is a type of automated testing in which a script runs a piece of software through its paces, entering all sorts of rule-breaking text strings in input fields and seeing whether the software handles the rule-breaking gracefully. These test strings comprise all the weird cases analysts can think of: typing 100 characters into a field that is 30 characters long, entering non-Latin characters into a field that expects Latin characters, typing letters into a number field, and so forth.
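A minimal sketch of that kind of scripted test, in Python. The validator `validate_field`, its rules, and the three cases are hypothetical stand-ins for whatever the software under test really does; the point is only the shape of the loop, which feeds each rule-breaking string in and records either a verdict or a crash.

```python
def validate_field(text, max_len=30, numeric=False):
    """Hypothetical field validator: returns (ok, message)."""
    if len(text) > max_len:
        return False, "too long"
    if not text.isascii():
        return False, "non-Latin characters"
    if numeric and not text.isdigit():
        return False, "not a number"
    return True, "ok"

# Rule-breaking inputs taken from the cases described above.
CASES = [
    ("x" * 100, {}),             # 100 characters into a 30-character field
    ("Ωμέγα", {}),               # non-Latin characters in a Latin-only field
    ("abc", {"numeric": True}),  # letters in a number field
]

def run_battery(validator, cases):
    """Run every case; a crash counts as a failure rather than stopping the run."""
    report = []
    for text, options in cases:
        try:
            verdict = validator(text, **options)
        except Exception as exc:
            verdict = (False, f"crashed: {exc!r}")
        report.append((text, verdict))
    return report

for text, verdict in run_battery(validate_field, CASES):
    print(repr(text[:20]), verdict)
```

Graceful handling here means every case comes back with a polite rejection instead of an exception.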

My serendipitous test case
My post yesterday is titled “<Redacted>.” Note that it begins and ends with angle brackets. Angle brackets have a special status in markup languages that derive from SGML, such as HTML, XHTML, and the others that the Web is built on. Angle brackets signal to the markup language interpreter that what is between them is a tag, information the software uses to decide, at the simplest level, how to display what follows (until a closing tag is encountered).

As a consequence of this special status, if I want you to see an angle bracket in the displayed text, I have to use a workaround. The workaround is to put in a character code (called an HTML entity) that will be interpreted as a mathematical less-than or greater-than symbol. Knowing this, what I typed into the Title field for yesterday’s post was &lt;Redacted&gt; (and I just applied a similar trick to make that come out right). So far so good.
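For readers who want to see the workaround spelled out, here is a small Python sketch using the standard library's `html` module. The variable names are mine; the two function calls are the ordinary way to convert between a literal angle bracket and its entity.

```python
import html

title = "<Redacted>"

# Replace the markup-significant characters with HTML entities.
escaped = html.escape(title)
print(escaped)                 # &lt;Redacted&gt;

# Decoding the entities gives back the original characters.
print(html.unescape(escaped))  # <Redacted>
```

The escaped form is what has to travel inside the HTML; the decoded form is what the reader is meant to see.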

But as you know, a blog post is more than a simple HTML web page. When I click the Publish Post button, my browser sends information to a server that triggers software to assign a URL-friendly name to the post and store what I’ve typed in a database. That database has rules for how text is stored. Other software extracts text from the database and sends an HTML page to your browser so you can read the post. Other software extracts the information in another way to supply your RSS feed reader. When you view my post, either as a web page on my domain or in your feed reader or in your email or…wherever, other software has intervened to process the text.

So there are lots of places where my angle brackets have to be interpreted and processed.

Complicating matters is that a lot of low-level text processing takes place inside software modules that are freely distributed to programmers. These modules may be written in a programming language different from that of the surrounding program, and the programmer who uses them may not fully understand all that goes on inside them. For example, if I want to build a web form that asks for a phone number, I may search around for a JavaScript program that validates entries to ensure they are legitimate phone numbers. I don’t have to know how that works; I only need to know how to use it.

Back to <Redacted>
Typically, Blogger creates a URL for blog posts based on the post title. It strips out punctuation and words such as a, an, the, and a few others. For example, my post titled “Do you have the willies?” became http://www.ampersandvirgule.com/2009/12/do-you-have-willies.html. For yesterday’s post, though, Blogger looked at the post title and threw up its hands (wise move), basing the post URL on the first line of the post body, instead: http://www.ampersandvirgule.com/2009/12/you-may-have-seen-story-other-day-about.html. So far so good.
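I don't know how Blogger actually does this, so treat the following Python as a guess at the general technique, not a description of Blogger's code. The stop-word list, the tag-stripping rule, and the function names `slugify` and `post_slug` are all assumptions. The sketch builds a URL name from the title and falls back to the first line of the body when nothing usable is left.

```python
import html
import re

STOP_WORDS = {"a", "an", "the"}  # assumed; the real list has "a few others"

def slugify(text):
    """Lowercase, drop punctuation and stop words, join with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(w for w in words if w not in STOP_WORDS)

def post_slug(title, body):
    """Slug from the title, or from the body's first line if the title yields nothing."""
    # Assumption: entities are decoded and anything tag-shaped is discarded.
    visible = re.sub(r"<[^>]*>", "", html.unescape(title))
    slug = slugify(visible)
    if slug:
        return slug
    lines = body.strip().splitlines()
    return slugify(lines[0]) if lines else ""

print(post_slug("Do you have the willies?", "Body text"))
# do-you-have-willies

# Hypothetical body text, used only to show the fallback path.
print(post_slug("&lt;Redacted&gt;", "An opening line for the post\nMore text"))
# opening-line-for-post
```

Under these assumptions, a title made up entirely of something tag-shaped leaves an empty slug, which is one plausible reason for falling back to the post body.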

But the post title gets reported in many other interfaces. It has variously shown up as:

<Redacted> [correct]
&lt;Redacted&gt; [user-unfriendly but not wrong]
Untitled Entry [uninformative, but a sign of recognition there was a problem]
[blank]
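I can only guess at what each piece of software did, but all four results can be reproduced with a few lines of Python. In this sketch `browser_view` is a crude stand-in for a browser (drop tags, then decode entities), and the three handling strategies are my assumptions about how a program might treat the stored title, not anything I have confirmed.

```python
import html
import re

def browser_view(markup):
    """Crude stand-in for a browser: remove tags, then decode entities."""
    return html.unescape(re.sub(r"<[^>]*>", "", markup))

stored = "&lt;Redacted&gt;"  # what was typed into the Title field

# 1. Pass the stored text through untouched.
print(repr(browser_view(stored)))                      # '<Redacted>'

# 2. Escape it a second time before sending it out.
print(repr(browser_view(html.escape(stored))))         # '&lt;Redacted&gt;'

# 3. Decode too early, so the title looks like a tag; substitute a placeholder.
decoded = html.unescape(stored)
print(repr(browser_view(decoded) or "Untitled Entry")) # 'Untitled Entry'

# 4. Decode too early with no placeholder.
print(repr(browser_view(decoded)))                     # ''
```

Each extra layer of software is one more chance to escape once too often or decode once too soon.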

My humble suggestion to software QA professionals everywhere is that this is a test case they may want to add to their battery.

5 comments:

Michael Bolton http://www.developsense.com said...: Yes. There are other motivations for exercising this test idea, too. One important motivation is to investigate the possibility of vulnerability to cross-site scripting (XSS).

In this post, I also appreciate the fact that you've avoided the use of the term "special characters". Any character is potentially special in one context or another.

---Michael B.; 10:45 AM
Dick Margulis said...: Michael,

Thanks for your comment. I spent a decade as a tech writer, so I know just enough to be dangerous ;-); 10:58 AM
Mike Starr said...: I've encountered variations on your experience as I'm prone to using &ltgrin&gt rather than the ubiquitous "smiley" to make my humorous intent obvious.

Note: in attempting to use &ltgrin&gt but with the actual symbols rather than the pseudocode, I received the error message "Your HTML cannot be accepted: Tag is not allowed: GRIN".

However, given that I'm somewhat of a fellow curmudgeon, I've been trying to avoid the common usage "angle bracket" to denote the "less than" and "greater than" symbols.

I've also resisted drinking the Kool-Aid® on the usages "square brackets" for the [ and ] symbols (brackets) and "curly brackets" for the { and } symbols (braces).

I know it's not true but sometimes it seems like I'm the only one out there who has retained the memory of the correct names for these symbols.

And Dick, in spite of your statement that you know just enough to be dangerous, you're one of the bright lights of the web and you know far more than many so-called experts.

Mike; 11:42 AM
Dick Margulis said...: Mike,

Angle brackets (〈 and 〉) are angle brackets (and that's what they're properly called). You're correct that they're not less than and greater than symbols. However, the math symbols, because they're on the keyboard and because programmers and program language designers other than Donald Knuth have never understood squat about typographic symbols, are substituted for angle brackets in program code.

See http://en.wikipedia.org/wiki/Angle_bracket for further discussion.

And thanks for your kind exaggeration. 〈grin /〉; 12:02 PM
simi said...: Hi,

This Software Testing article is very useful for me. I am a QA Engineer and am always looking to learn something new. I would like to introduce another good Software Testing blog. Have a look.

http://SoftwareTestingNet.com

simi; 4:58 AM