Are you writing or maintaining a web-based application that uses forms? If so, you had better be aware of some of the nasty characters that are around!
The < character, when echoed back from a user's input 'unchallenged', may form the start of a tag. So in a relatively benign case, a user who enters <em> at the start of his name will have his name emphasised back to him ... and to anyone else to whom that data is echoed, unless your application cleans up.
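As a minimal sketch of that clean up (in Python, using the standard library's html.escape; the function name clean_for_html and the sample name are just illustrations, not code from our library), the < can be turned into an entity before the name is echoed:

```python
import html


def clean_for_html(text):
    """Replace &, < and > with entities so the browser displays them
    literally instead of treating them as markup."""
    return html.escape(text, quote=False)


# Hypothetical user input: a name with a tag in front of it
entered = "<em>Fred"
print(clean_for_html(entered))  # &lt;em&gt;Fred
```

The browser then shows the characters the user typed, rather than acting on them.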
The " character too can cause problems when echoed - if it gets written into a tag that already has a quoted attribute, you can get some odd results. A user who enters 44" type="password into an unchallenged box that's echoed may be able to make the next form come up with that field showing blobs rather than the actual characters typed in the box.
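The attribute case can be sketched in the same way. This assumes Python's html.escape with quote=True, which converts quote characters as well as < and &; the helper text_input and the field name "age" are made up for the illustration:

```python
import html


def text_input(name, value):
    """Build an <input> tag, escaping both attribute values so that a
    quote in the user's data cannot end the attribute early."""
    safe_name = html.escape(name, quote=True)
    safe_value = html.escape(value, quote=True)
    return '<input type="text" name="{}" value="{}">'.format(safe_name, safe_value)


# The awkward entry from the paragraph above
result = text_input("age", '44" type="password')
print(result)
# <input type="text" name="age" value="44&quot; type=&quot;password">
```

The quotes arrive in the page as &quot; entities, so the whole entry stays inside the value attribute and the field remains an ordinary text box.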
The ' character can be a snare too - if your application stores the entry uncleaned in a database, then a user who follows the quote with appropriate code (I am not giving an example here!) can do severe damage.
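One common defence, sketched here in Python with the standard sqlite3 module, is to hand the value to the database driver through a placeholder rather than pasting it into the SQL text yourself. Note that this is a different technique from escaping the string by hand, and that the table name visitors and the function store_name are assumptions made up for this example:

```python
import sqlite3


def store_name(conn, name):
    """Insert a name using a ? placeholder; the driver passes the value
    separately from the SQL, so a quote in it is treated as data."""
    conn.execute("INSERT INTO visitors (name) VALUES (?)", (name,))
    conn.commit()


# In-memory database, purely for demonstration
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visitors (name TEXT)")

store_name(conn, "O'Brien")
rows = [row[0] for row in conn.execute("SELECT name FROM visitors")]
print(rows)  # ["O'Brien"]
```

The name comes back out exactly as it went in, quote and all.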
And those are just three examples of special characters that can cause problems if they are not carefully considered; others include ` . + \ & % and even the humble space. And if you are unwise enough to treat a user's input as a regular expression, you're opening the way for the user to start performing all sorts of nasties with other characters too, such as * ? [ ] | ( and ) (and this list is not - and is not intended to be - complete!).
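If you do need to search for what the user typed, one option is to neutralise the special characters first. The sketch below assumes Python's re.escape, which puts a backslash in front of regular expression metacharacters; the function contains_literal is a hypothetical helper for the illustration:

```python
import re


def contains_literal(haystack, user_text):
    """Search for the user's text as literal characters, not as a pattern."""
    pattern = re.escape(user_text)
    return re.search(pattern, haystack) is not None


print(contains_literal("price (net) is 5", "(net)"))  # True
print(contains_literal("abc", "a.c"))                 # False: the . is literal
```

Without the escaping step, an entry such as an unmatched ( would not even compile as a pattern, and an entry such as .* would match everything.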
Have I frightened you so much that you never want to provide a user input box again? I hope not, because there are robust and easy solutions!
I find it helpful to draw diagrams to show how the variables flow through my code and are processed, labelling each of the legs with the function / code necessary to clean up and close loopholes. The variable conditions ("from web", "in memory", "as part of XML string", "in database" and "sent back to web") will be the same no matter what language you're using. The labels on the flow lines will vary, depending on the functions in the language and how much work the web / database interfaces in the language do for you, and how much is left up to you.
Here is the diagram for PHP; you'll typically use "stripslashes" to bring a string into memory, with most of the rest of the work done by PHP. "addslashes" or "mysql_real_escape_string" converts the data for database storage, and "htmlspecialchars" gets it ready for sending back to the web.
For Perl, you can use a module like CGI.pm, or you can roll your own. Personally, I have a sub that I call collectform that turns up via a use in most of my apps, and another called webify that cleans for output. They need to handle things like hex codes (%2B) and + characters, which PHP handles silently for you (one of the differences between the ethos of the languages - Perl being general purpose, whereas PHP is written by a web programmer, for web programmers).
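To show what that decoding step involves, here is a rough sketch in Python rather than Perl, assuming the standard urllib.parse functions. The function collect_form is a hypothetical stand-in, not my collectform sub, and the query string is invented for the example:

```python
from urllib.parse import parse_qs, unquote_plus


def collect_form(query_string):
    """Decode a URL-encoded form submission into a dict, keeping the
    first value given for each field."""
    fields = parse_qs(query_string, keep_blank_values=True)
    return {name: values[0] for name, values in fields.items()}


# + stands for a space, and %2B is the hex code for a real + sign
form = collect_form("name=Fred+Bloggs&sum=1%2B1")
print(form)  # {'name': 'Fred Bloggs', 'sum': '1+1'}

# The same decoding applied to a single string
print(unquote_plus("1%2B1+%3D+2"))  # 1+1 = 2
```

Note the order that matters here: the + characters become spaces before the hex codes are expanded, otherwise a genuine + sent as %2B would be lost.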
With Python, the cgi module provides facilities such as cgi.FieldStorage and cgi.escape which add, in single calls, the necessary converters to the language. There's an example in our source code library here (and further examples linked from that page too!).
If you're using Tcl as your server side scripting language, we have a sample of source code that tidies up nasty characters here. And if you're a Lua programmer, then we have an example here.
(written 2009-06-14, updated 2009-06-21)