Security and web sites: how to find the causes of code injection attacks, validation techniques


So, after a thorough analysis of a web site’s performance, we’ve found out how vulnerable it is, and in how many ways.  We talked about code injection attacks and considered the causes of this type of vulnerability; perhaps you also discovered that some of your own web sites are open to such attacks.  Since the task of these articles is not only to evaluate web sites’ weaknesses but also to strengthen their defences, how do we go about finding and eliminating the vulnerabilities of our web sites?

“Everything is in the hands of the enemy.  Do not trust anyone” – Fox Mulder would have been an excellent programmer!

Surely, that’s bad news.  The good news is that we can entirely avoid code injection attacks by checking our code carefully and practicing well-known programming techniques such as validation, escaping, and indirection.

In the rest of this article, to keep things as simple as possible, I’ll be talking exclusively about JavaScript and PHP.  The same concepts, however, apply to ASP and VBScript, JSP, Java, Ruby or any other client-side or server-side programming language.

Let’s get some help – the server

First of all, it’s very important to make sure that our PHP server’s register_globals flag is off.  Some of you might have heard about it, but what is it for, and why is it really “dangerous” if it’s left on?

The register_globals flag indicates how the PHP interpreter deals with global variables: in particular, variables such as those contained in the URL or passed with the POST method, cookies, and session variables (in other words, all external variables), defined as global because they can be accessed by any function without being declared.

If the flag is on, as it used to be by default in early versions of PHP, all external variables automatically become PHP variables.  If it’s off, as is the case in later versions (the default has been off since PHP 4.2.0, and the flag was removed altogether in PHP 5.4.0), external variables can be accessed exclusively through the $_POST, $_GET, $_REQUEST, $_SESSION arrays, etc.

Why is this a problem?  Because in PHP the value of a variable is often used without being explicitly initialized, since PHP treats an uninitialized variable as empty, as in the following code sample:

    if ($_REQUEST["flag"] == "yes")
        $message = "active flag";
    echo $message;

The idea is that, if we call this script as follows:

http://www.yoursite.com/script.php?flag=yes

what gets displayed is the message “active flag”.  If instead we called it like so:

http://www.yoursite.com/script.php?flag=no

nothing would be displayed.  Now, if the register_globals flag were on, I could do something like this:

http://www.yoursite.com/script.php?message=unforeseen

and the message “unforeseen” would be displayed!

This is so because PHP would automatically register the variable “message”, which was passed in the URL, as a global PHP variable.  This variable, not having been initialized beforehand, has in fact created an area vulnerable to code injection attacks.  Obviously, in the scenario outlined above the damage would be limited, but let’s imagine what would happen if the variable were used to communicate with a database or to write to a file: at that point, I’d have created a direct entry point from the URL to my database!

It’s immediately clear how effective this attack technique is, especially when the attacker knows the PHP code.  We already know that basing security on secrecy is a mistake: the code could have been read, perhaps by an untrustworthy person at the web farm, or exposed because of some problem in the web server, or it could even be open source, hence visible to everybody.

Sometimes the code is compromised in a more obvious way: the programmer has shared parts of it on programmers’ forums.  That might let some ill-intentioned individuals work out the conventions by which we name variables and apply them to other works of ours, or notice the weaknesses in our programming style so as to attack them.

Very often the problem is a subtle one:

  // UNSAFE CODE, DO NOT USE!
  //
  if ($_REQUEST["sector"] == "commercial")
      $email = "commercial@yoursite.com";
  else if ($_REQUEST["sector"] == "marketing")
      $email = "marketing@yoursite.com";

  // $email is never initialized: with register_globals on, ?email=... fills it in
  if ($email == "") {
      echo "Specify an email!";
      exit;
  }

  mail($email, "Contact received", "Contact received from $_REQUEST[surname] $_REQUEST[name]");

In this case the script would be safe if the register_globals flag were off (almost, that is, apart from the call to mail … but we’ll be talking about that later).  If it’s on, however, it’s enough for me to call the script as follows:

http://www.yoursite.com/script.php?email=your%20mail%20address

so as to make some spammers really happy: I’ve just created a mail gateway that can be used to send emails to anyone without my knowing about it, at my expense (or, worse, at the expense of the client, who’s less accommodating than I am …).  PHP mail gateways are neither rare nor little used among spammers.

Take note of the following important point: the register_globals flag isn’t “bad” in itself.  None of the problems mentioned above would occur if I initialized all the variables, which is a practice always to be followed.  By setting it to off, however, I’m protected from this kind of problem at the root, which benefits my code.
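
As a minimal sketch of that practice, here is how the sector lookup from the unsafe script might look once $email is explicitly initialized.  The function name recipient_for_sector is my own invention for illustration, not part of the original script.  Wrapping the logic in a function also helps, since register_globals only ever creates variables in the global scope.

```php
<?php
// Hypothetical helper (the name is an assumption made for this sketch).
// Maps the requested sector to a fixed address; returns "" when unknown.
function recipient_for_sector(array $input)
{
    // Explicit initialization: no external variable can pre-fill $email.
    $email = "";

    // Read the value once, tolerating a missing or non-string field.
    $sector = (isset($input["sector"]) && is_string($input["sector"]))
        ? $input["sector"]
        : "";

    if ($sector === "commercial") {
        $email = "commercial@yoursite.com";
    } elseif ($sector === "marketing") {
        $email = "marketing@yoursite.com";
    }

    return $email;
}

// Usage: an injected "email" parameter is simply ignored.
$email = recipient_for_sector(array("email" => "victim@example.com"));
if ($email === "") {
    echo "Specify an email!";
}
```

In a real script the argument would be $_REQUEST; passing it in explicitly just makes the function easier to try out.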

A checklist for the server

  1. let’s verify by means of the phpinfo function that the register_globals flag is off;
  2. if it’s on, let’s set it to off.  At the level of Apache (we can’t do it from within the script itself!), it’s sufficient to add php_flag register_globals off to the .htaccess file in the main directory of our web site;
  3. if for any reason we’re unable to set it to off (let’s always verify by means of phpinfo that the value has changed, and if we want to be really diligent let’s verify it automatically as our scripts start running), let’s ask the web farm to do it for us;
  4. if for any reason it can’t be done at the global level (old web sites developed in PHP 4 might stop working) and it can’t be done at the local level either, perhaps because we can’t modify .htaccess (and, above all, can’t afford the best providers!), let’s check and initialize all the variables used in our PHP code.  Some help comes from the error_reporting function, which, when set to detailed error reporting, lets us know if uninitialized variables are being used.
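
Here is a minimal sketch of the automatic check mentioned in point 3, together with the detailed error reporting from point 4.  The helper name ini_flag_is_on is an assumption of mine; ini_get and error_reporting are standard PHP functions.  On PHP 5.4 and later the directive no longer exists, so ini_get returns false and the check simply passes.

```php
<?php
// Report everything, including notices about uninitialized variables.
error_reporting(E_ALL);

// Hypothetical helper: true when an ini value read with ini_get means "on".
function ini_flag_is_on($value)
{
    // ini_get() returns false for a directive that does not exist,
    // otherwise a string such as "1", "On", "0", "Off" or "".
    if ($value === false) {
        return false;
    }
    return in_array(strtolower(trim($value)), array("1", "on", "yes", "true"), true);
}

// Run this at the very start of every script (e.g. in a common include).
if (ini_flag_is_on(ini_get("register_globals"))) {
    error_log("register_globals is on: refusing to run");
    exit;
}
```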

It’s important to stress that detailed error reporting is a safe and recommended practice even when there are no problems in managing the register_globals flag.  A piece of code that doesn’t generate any kind of error is a better piece of code, hence less subject to unforeseen circumstances and consequently less open to attacks.

Programming Techniques

Once we’ve secured the server’s “help”, it’s time to clean our code.  I’m talking about cleaning, but normally this is done during coding, not afterwards: it’s an integral part of it.  The idea of first jotting down a rough copy to be cleaned up later invites future problems.  Do it only if you really have to, and only for very small and temporary jobs: it’s better to integrate good coding practices from the outset and to “think” the code through very carefully.

The three techniques that we’re going to look at are:

  1. validation, that is, making sure the data are valid both syntactically (numbers are made of digits, names are made of letters, etc.) and, where possible, semantically: dates made of valid numbers, email addresses made of their appropriate components, labels matching the ones that we expect, non-empty strings where they must not be empty.  Even though it isn’t exactly a correct definition, this phase includes the tacit conversion of value types: if I read a number in my PHP code, I convert it into a numeric type; if I read a date in JavaScript, I convert it into a Date object.  I won’t leave them as a string and an array of numbers respectively;
  2. escaping, that is, replacing symbols and characters that have a special meaning in a given language (think of the characters ' and " in SQL strings and JavaScript, or of < > and & in HTML) with what are historically called “escape sequences”: for example, in SQL strings every ' is replaced with \'.  That leading \ is called the escape character.  In HTML, for example, the escape sequence of < is &lt;
  3. indirection, which allows us to “isolate” input data from the rest of our code by replacing them by means of conversion tables.  It’s an extremely effective technique, even though descriptions of it are rather unclear: it’ll be much clearer with the help of some examples.

Generally, validation is used when data are being read, escaping during a change of language, while indirection is useful during both phases.
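
To make the last two definitions a little more concrete before we get to validation, here is a rough sketch.  htmlspecialchars is a standard PHP function; the helper names, the page names and the file paths are all hypothetical, chosen only for illustration.

```php
<?php
// Escaping: the value changes language (PHP string -> HTML), so its
// special characters are replaced with HTML escape sequences.
function html_escape($text)
{
    return htmlspecialchars($text, ENT_QUOTES, "UTF-8");
}

// Indirection: the input never reaches the file system; it only selects
// an entry from a conversion table that we wrote ourselves.
function page_file($requested)
{
    $pages = array(
        "home"    => "pages/home.php",
        "contact" => "pages/contact.php",
    );
    if (is_string($requested) && isset($pages[$requested])) {
        return $pages[$requested];
    }
    return $pages["home"];   // unknown input falls back to a safe default
}

echo html_escape("<script>"), "\n";        // prints &lt;script&gt;
echo page_file("../../etc/passwd"), "\n";  // prints pages/home.php
```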

Validation

The first point open to code injection attacks is when data are being read.  In a typical PHP page, data reading occurs:

  1. by using the $_GET, $_POST, $_REQUEST arrays, that is, when I read the data passed by the user;
  2. obviously, by reading a file or an external HTML page (fopen, popen, fread, etc.);
  3. by loading the records obtained from a database query.

To keep things simple, we’ll only offer examples that refer to $_REQUEST, which is often the most important source we have to manage, as well as the first one.

Let’s look for all the places where $_REQUEST has been used: every occurrence is a possible entry point for code injection attacks.  In order to eliminate the vulnerability, the values taken directly from $_REQUEST:

  • must not appear in file names or in URLs;
  • must not appear in SQL queries;
  • must not appear in the HTML output;
  • must absolutely not appear in calls to the operating system or to the shell (system, passthru, etc.);
  • if they get assigned to a variable, as a whole or in concatenation, that variable has to be checked in the same way!
  • if (God forbid) they are used as a parameter of an eval call … have recourse to punishments, hang them by the feet, beat them …

So, what do we have to do with these values if we can’t apparently do anything with them?

We have to validate them, that is, we have to guarantee that the values they contain are what they should be, and where possible also convert them.  PHP offers excellent functions for checking variable values, which I recommend that you study and use wherever possible rather than writing your own version: http://it.php.net/manual/en/ref.var.php

For example, if I expect an integer, let’s check that the value is numeric, convert it, and then verify that it falls within the valid range:

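// is_numeric rejects anything that is not a number, intval converts it
if (is_numeric($_REQUEST["nut"]))
    $nut = intval($_REQUEST["nut"]);
else
    exit;
// the converted value must be between 1 and 6
if (($nut < 1) || (6 < $nut))
    exit;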
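
A possible alternative, sketched here under the assumption that the filter extension is available (it’s enabled by default since PHP 5.2), is to let filter_var do the check, the conversion and the range test in one step.  The function name read_die_roll is hypothetical.  One reason to consider it: is_numeric also accepts values such as "4.5", which intval then silently truncates to 4.

```php
<?php
// Hypothetical helper: returns an integer between 1 and 6, or null.
function read_die_roll(array $input)
{
    if (!isset($input["nut"])) {
        return null;                       // field missing
    }
    $value = filter_var(
        $input["nut"],
        FILTER_VALIDATE_INT,
        array("options" => array("min_range" => 1, "max_range" => 6))
    );
    // filter_var returns false when the value is not an integer in range.
    return ($value === false) ? null : $value;
}

var_dump(read_die_roll(array("nut" => "4")));    // int(4)
var_dump(read_die_roll(array("nut" => "4.5")));  // NULL
var_dump(read_die_roll(array("nut" => "7")));    // NULL
```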


If I expect a label, as is the case for the guestbook’s radio buttons, let’s check that it’s a valid one:

switch ($_REQUEST["howmuch"]) {
case "alot":
case "boh":
    $quanto = $_REQUEST["howmuch"];
    break;  // without this, a valid label would fall through to default
default:
    // I get angry, or I set $quanto to the default value
    error_log("someone is having a go at my guestbook!!1!11!");
    exit;
}


or:

$comments_valid = array("alot"=>true, "boh"=>true);
if (array_key_exists($_REQUEST["howmuch"], $comments_valid))
    $howmuch = $_REQUEST["howmuch"];
else {
    error_log("someone is having a go at my guestbook!!1!11!");
    exit;
}


Other useful checks include the following:

  • length of form fields: a very frequent mistake is to believe that we can enforce the maximum length of text fields in a form by means of HTML.  As I’ve already said, never trust what we get from the HTML page.  This also applies when we simply need to check that a field has been filled in, that is, that it has a length of at least 1;
  • validity of dates: PHP offers a verifying function called checkdate: use it, do not write your own;
  • the existence of values inserted in secondary fields.  For instance, the classic select with the list of countries in contact forms: check that the returned value is correct and that it exists, do not trust JavaScript.
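
The three checks above could be sketched roughly as follows.  checkdate, strlen, in_array and filter_var are standard PHP functions; the helper names, the length limits and the country list are assumptions made up for the example.

```php
<?php
// Length: enforce on the server what maxlength only suggests in HTML.
function valid_length($text, $min, $max)
{
    if (!is_string($text)) {
        return false;
    }
    $length = strlen(trim($text));       // bytes; use mb_strlen for characters
    return $length >= $min && $length <= $max;
}

// Date: convert the three parts to integers, then let checkdate decide.
function valid_date($day, $month, $year)
{
    $d = filter_var($day, FILTER_VALIDATE_INT);
    $m = filter_var($month, FILTER_VALIDATE_INT);
    $y = filter_var($year, FILTER_VALIDATE_INT);
    if ($d === false || $m === false || $y === false) {
        return false;
    }
    return checkdate($m, $d, $y);        // note the order: month, day, year
}

// Select field: the value must be one of the options we actually offered.
function valid_country($code)
{
    $countries = array("IT", "GB", "US");
    return in_array($code, $countries, true);
}
```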

Regular Expressions

Among the most powerful tools for validating data at the level of syntax, both in PHP and in JavaScript, are regular expressions (often referred to as RE).  Among the possible uses of regular expressions in the context of validation, we find:

  • email address validation;
  • URL validation;
  • National Insurance Number and Company Registration Number validation.

Let’s be clear: syntax validation (numbers and characters in the right place and in the right amount), powerful as it is, doesn’t guarantee that the email address exists, that the URL can be reached or that the National Insurance Number and the Company Registration Number are valid.  To make semantic checks, when possible, we have to use specialized code, discussion of which is beyond the scope of this article.
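
As a small illustration of a purely syntactic check, here is a sketch based on preg_match.  The pattern is deliberately simplified and is my own assumption, not a complete description of valid email addresses; PHP’s filter_var with FILTER_VALIDATE_EMAIL is a ready-made alternative worth considering.

```php
<?php
// Hypothetical helper: checks the general shape "something@domain.tld".
function looks_like_email($text)
{
    if (!is_string($text)) {
        return false;
    }
    // \A and \z anchor the whole string, so nothing can be smuggled in
    // before or after the part that matches (not even a trailing newline).
    $pattern = '/\A[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\z/';
    return preg_match($pattern, $text) === 1;
}

var_dump(looks_like_email("user@example.com"));    // bool(true)
var_dump(looks_like_email("user@example"));        // bool(false)
var_dump(looks_like_email("user@example.com\n"));  // bool(false)
```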

When is it appropriate to use validation?

The rule is that values must be validated every time they are read, therefore both when they’re read from $_REQUEST and when they’re read from a database or file (which might have been compromised).  Now, anyone who’s written more than a couple of applications in PHP understands very well that such an uncompromising approach quickly becomes hard to manage: many checks to perform, therefore much complication, and complication often engenders additional problems, until one becomes paranoid.

It’s clear that the decision of how much paranoia to put up with must depend on many factors, including how critical the environment is (banks, risky targets, high traffic web sites), the allocated budget, and the intrinsic value of the data.

One mistake to be avoided is to validate in multiple places and more than once.  Concentrate all validation code in the same place in your script and run it at the same time, variable by variable: this way you’ll validate just once.  Clearly, if you have to validate multiple sources (e.g., $_REQUEST, files) you can’t have just one validation, but you can still validate an entire data source in one place.  Follow this advice and when you review the code again, perhaps in a few years’ time, you’ll be happy you did!
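
One way to keep the validation of an entire data source in a single place, sketched here as an assumption rather than as the only approach, is a table of rules passed to filter_var_array (part of PHP’s filter extension).  The field names nut and howmuch come from the earlier examples; the email field and the function name validate_guestbook are hypothetical.

```php
<?php
// Hypothetical helper: validates a whole request in one place and returns
// only the expected, converted fields, or null if anything is wrong.
function validate_guestbook(array $source)
{
    // One rule per expected field; fields not listed here are dropped.
    $rules = array(
        "nut" => array(
            "filter"  => FILTER_VALIDATE_INT,
            "options" => array("min_range" => 1, "max_range" => 6),
        ),
        "email" => FILTER_VALIDATE_EMAIL,
        "howmuch" => array(
            "filter"  => FILTER_VALIDATE_REGEXP,
            "options" => array("regexp" => '/\A(?:alot|boh)\z/'),
        ),
    );

    $clean = filter_var_array($source, $rules);

    // A field is null when it was missing and false when its filter failed.
    foreach ($clean as $value) {
        if ($value === null || $value === false) {
            return null;
        }
    }
    return $clean;
}
```

The rest of the script then works only with the returned array and never touches $_REQUEST again.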

Does it make sense to validate in JavaScript?

This is a never-ending topic for debate where there are different and sometimes contrary views.

The objective point: validation must always be performed, in any case, on the server side, therefore in PHP.  Never trust data input from the browser, therefore never trust JavaScript validation.  NEVER!   If you’ve read the preceding sections, it should be absolutely clear why.

The subjective point: for reasons of efficiency, that is, in order to lower the number of calls to the server, it makes sense to validate in JavaScript as well.  Let’s keep the small number of ill-intentioned people in mind, sure, but not at the expense of the majority of real users: if we already validate in JavaScript, our application will feel faster and smoother, because the user doesn’t have to wait for the server response.

But… there’s a but.  This necessarily implies code duplication: I’ve got to check for an empty string, a correct email address, the existence of some fields both in JavaScript and in PHP.  Every code duplication is subject to synchronization problems: if I modify one part, I must also remember to modify the other; if I eliminate one validation on one side, I must eliminate it on the other as well.  Every synchronization is a likely source of errors or of security problems, besides being costly.

My advice is to tackle the problem case by case.  If the response time for a validation performed only in PHP is acceptable and the additional traffic thereby generated isn’t a problem, let’s validate only in PHP.  If we’re dealing with high traffic web sites or with saturated web farms, let’s use double validation: we’re already within an extreme context, and the costs can be acceptable when compared with significant performance improvements.

In many cases, the hybrid solution is the best one: alongside a complete validation in PHP, let’s keep a “basic” validation at the level of JavaScript.  For example, in a form that requests consent to the processing of personal details, a JavaScript validation of the checkbox is a perfect way to reduce server traffic; the same check will be performed by the server in any case, together with the rest of the fields that require validation.
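
The server-side half of that checkbox example might look like the sketch below.  The field name privacy and the function name consent_given are assumptions; what is certain is that a checked checkbox with no value attribute is submitted as "on", while an unchecked one isn’t submitted at all.

```php
<?php
// Hypothetical helper: repeats on the server the check that the browser
// may (or may not) already have done in JavaScript.
function consent_given(array $input, $field = "privacy")
{
    return isset($input[$field]) && $input[$field] === "on";
}

// Usage: refuse the form when the box was not ticked.
if (!consent_given(array("name" => "Ann"))) {
    echo "You must accept the privacy terms.";
}
```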

Conclusions

Given the wide scope of this topic, I’d better stop here before I risk creating more confusion by adding new information.  In the next article I’ll go on with a discussion of the other two techniques.

In the meantime, I leave you to assimilate the theory of validation, and I invite you to talk about your experience with this fascinating technology.  There are many practical approaches to the problem of validation, which include both object-oriented techniques and old procedural coding practices: which one is yours, how do you tackle validation in your web sites, and which side are you on: all validation in PHP or a hybrid approach?

The Author

Cristian is a freelance computer specialist who focuses on the design and production of web sites and, more generally, on technology and Internet-related mischief.
