Handling nasty characters - Perl, PHP, Python, Tcl, Lua

Are you writing or maintaining a web-based application that uses forms? If so, you had better be aware of some of the nasty characters that are around!

The < character, when echoed back from a user's input 'unchallenged', may form the start of a tag. So in a relatively benign case, a user who enters <em> at the start of his name will have his name emphasised back to him ... and to anyone else to whom that data is echoed, unless your application cleans up.
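As a minimal sketch of that clean-up step, here is how the echo could be made safe in Python using the standard library's html.escape. The function name clean_for_html is purely illustrative and is not taken from any of the sample code referred to in this article.

```python
import html

def clean_for_html(text):
    """Escape &, < and > (and quotes) so user input is displayed as text, not markup."""
    return html.escape(text)

name = "<em>Graham"
print(clean_for_html(name))  # &lt;em&gt;Graham
```

The browser then shows the characters <em> literally rather than treating them as the start of a tag.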

The " character too can cause problems when echoed - if it gets written into a tag that already has a quoted attribute, you can get some odd results. A user who enters 44" type="password into an unchallenged box that's echoed may be able to make the next form come up with the field he is entering showing blobs rather than the actual characters typed in the box.
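A hedged illustration of the attribute case, again in Python: the echoed value is escaped with quote=True before it is placed inside a double-quoted attribute. The helper text_input is a hypothetical name, and it assumes that field_name is a trusted constant chosen by the programmer rather than by the user.

```python
import html

def text_input(field_name, value):
    """Build an <input> tag, escaping the echoed value for a double-quoted attribute."""
    # quote=True also converts " and ' so the value cannot close the attribute early
    safe_value = html.escape(value, quote=True)
    return f'<input type="text" name="{field_name}" value="{safe_value}">'

print(text_input("size", '44" type="password'))
# <input type="text" name="size" value="44&quot; type=&quot;password">
```

With the quotes converted to &quot; the whole entry stays inside the value attribute, so no second type attribute is introduced.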

The ' character can be a snare too - if your application stores the entry uncleaned in a database, then a user who follows the quote with appropriate code (I am not giving an example here!) can do severe damage.
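One way to close this loophole, sketched here in Python with the standard sqlite3 module, is to pass the value through a placeholder so that it never becomes part of the SQL text. This is a different technique from escaping the quote by hand, and the table name visitors and the function store_name are assumptions made up for the example.

```python
import sqlite3

def store_name(conn, name):
    """Insert a user-supplied name; the ? placeholder keeps it out of the SQL text."""
    conn.execute("INSERT INTO visitors (name) VALUES (?)", (name,))

conn = sqlite3.connect(":memory:")          # throwaway in-memory database
conn.execute("CREATE TABLE visitors (name TEXT)")
store_name(conn, "O'Brien")                 # the quote is stored as ordinary data
stored = conn.execute("SELECT name FROM visitors").fetchall()
print(stored)  # [("O'Brien",)]
```

The same idea is available through the placeholder mechanisms of most database interfaces, though the placeholder syntax varies from one driver to another.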

And those are just three examples of special characters that can cause problems if they are not carefully considered; others include ` . + \ & % and even the humble space. And if you are unwise enough to treat a user's input as a regular expression, you're opening the way for the user to start performing all sorts of nasties with other characters too, such as * ? [ ] | ( and ) (and this list is not - and is not intended to be - complete!).
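If user input really must be used inside a pattern, the special characters can be neutralised first. The following Python sketch uses re.escape for that; contains_literal is an illustrative name only.

```python
import re

def contains_literal(haystack, user_text):
    """Search for user_text as plain text, with regex metacharacters switched off."""
    pattern = re.escape(user_text)   # e.g. * becomes \* and ( becomes \(
    return re.search(pattern, haystack) is not None

print(contains_literal("price is 3*4 (approx)", "3*4"))  # True
print(contains_literal("price is 3334", "3*4"))          # False
```

Without the escape, the input 3*4 would be read as "any number of 3s followed by a 4" and would match 3334, and an input consisting of a lone ( would not even compile as a pattern.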

Have I frightened you so much that you never want to provide a user input box again? I hope not, because there are robust and easy solutions!

I find it helpful to draw diagrams to show how the variables flow through my code and are processed, labelling each of the legs with the function / code necessary to clean up and close loopholes. The variable conditions ("from web", "in memory", "as part of XML string", "in database" and "sent back to web") will be the same no matter what language you're using. The labels on the flow lines will vary, depending on the functions in the language and how much work the web / database interfaces in the language do for you, and how much is left up to you.

[Diagram: PHP - how to prevent injection attacks]

Here is the diagram for PHP; you'll typically use "stripslashes" to bring a string into memory, with most of the rest of the work done by PHP. "addslashes" or "mysql_real_escape_string" converts the data for database storage, and "htmlspecialchars" gets it ready for sending back to the web.


[Diagram: Perl, Web, injection attacks]

For Perl, you can use a module like CGI.pm, or you can roll your own. Personally, I have a sub that I call collectform that turns up via a use in most of my apps, and another called webify that cleans for output. They need to handle things like hex codes (%2B) and + characters, which PHP handles silently for you (one of the differences between the ethos of the languages - Perl being general purpose, whereas PHP is written by a web programmer, for web programmers).
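To show what such a hand-rolled collector has to do, here is a rough Python equivalent. It is not the author's Perl sub; the name collect_form merely echoes it, and the sketch assumes a simple query string in which a repeated field name overwrites the earlier value.

```python
from urllib.parse import unquote_plus

def collect_form(query_string):
    """Split a raw query string into fields, decoding + and %XX hex codes."""
    fields = {}
    for pair in query_string.split("&"):
        if not pair:
            continue                         # skip empty segments such as a trailing &
        key, _, value = pair.partition("=")
        # unquote_plus turns + into a space and %2B back into a + character
        fields[unquote_plus(key)] = unquote_plus(value)
    return fields

print(collect_form("name=Graham+Ellis&sum=1%2B1"))
# {'name': 'Graham Ellis', 'sum': '1+1'}
```

In practice Python's standard library already offers urllib.parse.parse_qs for this job; the loop is spelled out here only to make the decoding steps visible.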


With Python, the cgi module provides facilities such as cgi.FieldStorage and cgi.escape which supply, in single calls, the necessary converters. There's an example in our source code library here (and further examples linked from that page too!).

If you're using Tcl as your server side scripting language, we have a sample of source code that tidies up nasty characters here. And if you're a Lua programmer, then we have an example here.
(written 2009-06-14, updated 2009-06-21)

 



This is a page archived from The Horse's Mouth at http://www.wellho.net/horse/ - the diary and writings of Graham Ellis. Every attempt was made to provide current information at the time the page was written, but things do move forward in our business - new software releases, price changes, new techniques. Please check back via our main site for current courses, prices, versions, etc - any mention of a price in "The Horse's Mouth" cannot be taken as an offer to supply at that price.


© WELL HOUSE CONSULTANTS LTD., 2014: Well House Manor • 48 Spa Road • Melksham, Wiltshire • United Kingdom • SN12 7NY
PH: 01144 1225 708225 • FAX: 01144 1225 899360 • EMAIL: info@wellho.net • WEB: http://www.wellho.net • SKYPE: wellho

PAGE: http://www.wellho.net/mouth/2238_Han ... l-Lua.html • PAGE BUILT: Thu Sep 18 15:30:25 2014 • BUILD SYSTEM: WomanWithCat