Are you writing or maintaining a web-based application that uses forms? If so, you had better be aware of some of the nasty characters that are around!
The < character, when echoed back from a user's input 'unchallenged', may form the start of a tag. So in a relatively benign case, a user who enters <em> at the start of his name will have his name emphasised back to him ... and to anyone else to whom that data is echoed, unless your application cleans up.
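As a rough illustration of that clean-up in Python, the standard library's html.escape converts the characters that could start a tag into harmless entities. The helper name clean_for_html and the sample name are made up for this sketch; they are not from any of our library code.

```python
import html

def clean_for_html(text):
    """Escape &, < and > so user input is displayed as text, not as markup."""
    return html.escape(text, quote=False)

# A user who types a tag at the start of his name
name = "<em>Fred"
print("Hello, " + clean_for_html(name))   # prints: Hello, &lt;em&gt;Fred
```

The browser then shows the characters the user typed rather than acting on them.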
The " character can cause problems too when echoed - if it gets written into a tag attribute that's already quoted, you can get some odd results. A user who enters 44" type="password into an unchallenged box that's echoed may be able to make the next form come up with the field he is entering showing blobs rather than the actual characters typed in the box.
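A minimal Python sketch of the fix for that case: when a value goes inside a quoted attribute, the quote characters need converting as well, which html.escape does when quote=True. The function input_tag and the field name "age" are hypothetical, chosen just for this example.

```python
import html

def input_tag(name, value):
    """Build an <input> tag with the user's value safely inside the quotes."""
    return '<input name="{}" value="{}">'.format(
        html.escape(name, quote=True),
        html.escape(value, quote=True),
    )

# The troublesome entry from the text above stays inside the value attribute
print(input_tag("age", '44" type="password'))
# prints: <input name="age" value="44&quot; type=&quot;password">
```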
The ' character can be a snare too - if your application stores the entry uncleaned in a database, then a user who follows the quote with appropriate code (I am not giving an example here!) can do severe damage.
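One common way to close that loophole, sketched here in Python, is to hand the user's text to the database driver as a parameter rather than pasting it into the SQL string yourself. Note that this is a different technique from escaping the string by hand. The table name visitors and the function store_name are assumptions for the sketch, and an in-memory SQLite database stands in for whatever database you really use.

```python
import sqlite3

def store_name(conn, name):
    """Insert a user-supplied name; the ? placeholder keeps it as pure data."""
    conn.execute("INSERT INTO visitors (name) VALUES (?)", (name,))
    conn.commit()

conn = sqlite3.connect(":memory:")          # throwaway database for the demo
conn.execute("CREATE TABLE visitors (name TEXT)")

store_name(conn, "O'Brien")                 # the quote is stored, not executed
stored = [row[0] for row in conn.execute("SELECT name FROM visitors")]
print(stored)                               # prints: ["O'Brien"]
```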
And those are just three examples of special characters that can cause problems if they are not carefully considered; others include ` . + \ & % and even the humble space. And if you are unwise enough to treat a user's input as a regular expression, you're opening the way for the user to start performing all sorts of nasties with other characters too, such as * ? [ ] | ( and ) (and this list is not - and is not intended to be - complete!).
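If you do need to search for text that a user has typed, one option in Python is to pass it through re.escape first, so that every special character is matched literally. This is only a sketch; find_literal is a name invented for the example.

```python
import re

def find_literal(needle, haystack):
    """Look for the user's text exactly as typed, even with * ? [ ] | ( )."""
    return re.search(re.escape(needle), haystack) is not None

print(find_literal("a|b", "the string a|b here"))   # prints: True
print(find_literal("a|b", "just a here"))           # prints: False
```

Without the escape, the second search would succeed, because a|b as a pattern means "a or b", and an entry such as a lone ( would raise an error rather than match anything.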
Have I frightened you so much that you never want to provide a user input box again? I hope not, because there are robust and easy solutions!
I find it helpful to draw diagrams to show how the variables flow through my code and are processed, labelling each of the legs with the function / code necessary to clean up and close loopholes. The variable conditions ("from web", "in memory", "as part of XML string", "in database" and "sent back to web") will be the same no matter what language you're using. The labels on the flow lines will vary, depending on the functions in the language and how much work the web / database interfaces in the language do for you, and how much is left up to you.

Here is the diagram for PHP; you'll typically use "stripslashes" to bring a string into memory, with most of the rest of the work done by PHP. "addslashes" or "mysql_real_escape_string" converts the data for database storage, and "htmlspecialchars" gets it ready for sending back to the web.

For Perl, you can use a module like CGI.pm, or you can roll your own. Personally, I have a sub that I call collectform that turns up via a use in most of my apps, and another called webify that cleans for output. They need to handle things like hex codes (%2B) and + characters, which PHP handles silently for you (one of the differences between the ethos of the languages - Perl being general purpose, whereas PHP is written by a web programmer, for web programmers).
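To show what that decoding step involves, here is a small Python sketch (not a translation of collectform, which isn't reproduced in this article). In form data a + stands for a space, and a hex code such as %2B stands for a literal + character; urllib.parse.unquote_plus undoes both. The name decode_field is made up for the example.

```python
from urllib.parse import unquote_plus

def decode_field(raw):
    """Turn one raw form value back into the text the user typed."""
    return unquote_plus(raw)

print(decode_field("1%2B1+is+2"))   # prints: 1+1 is 2
```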
With Python, the cgi module provides methods such as cgi.FieldStorage and cgi.escape which add, in single calls, the necessary converters to the language. There's an example in our source code library here (and further examples linked from that page too!).
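A note for anyone on a recent Python: cgi.escape was removed in Python 3.8 and the cgi module itself in Python 3.13, so the sketch below uses urllib.parse.parse_qs and html.escape from the standard library to cover the same two legs ("from web" and "sent back to web"). The function greeting, the field name "name" and the fallback "stranger" are assumptions for illustration only.

```python
import html
from urllib.parse import parse_qs

def greeting(query_string):
    """Decode a form submission, then clean one field for echoing back."""
    fields = parse_qs(query_string)                  # decodes %xx codes and +
    name = fields.get("name", ["stranger"])[0]       # first value, or a default
    return "<p>Hello, {}</p>".format(html.escape(name))

print(greeting("name=%3Cem%3EFred+Bloggs"))
# prints: <p>Hello, &lt;em&gt;Fred Bloggs</p>
```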
If you're using Tcl as your server side scripting language, we have a sample of source code that tidies up nasty characters here. And if you're a Lua programmer, then we have an example here.
(written 2009-06-14, updated 2009-06-21)