Why URL validation with filter_var might not be a good idea

Since PHP 5.2 introduced the filter_var function, the days of monsters like this one have been over (taken from here):

// eregi() was removed in PHP 7.0; preg_match() with ~ delimiters and the i (case-insensitive) modifier performs the same check
$urlregex = "~^(https?|ftp)\:\/\/([a-z0-9+!*(),;?&=\$_.-]+(\:[a-z0-9+!*(),;?&=\$_.-]+)?@)?[a-z0-9+\$_-]+(\.[a-z0-9+\$_-]+)*(\:[0-9]{2,5})?(\/([a-z0-9+\$_-]\.?)+)*\/?(\?[a-z+&\$_.-][a-z0-9;:@/&%=+\$_.-]*)?(#[a-z_.-][a-z0-9+\$_.-]*)?\$~i";
if (preg_match($urlregex, $url)) {echo "good";} else {echo "bad";}

The simple, yet effective syntax:

filter_var($url, FILTER_VALIDATE_URL)
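A quick note on the return value, since the examples below all compare against false: filter_var returns the URL string itself on success and false on failure. A small sketch (the URLs are arbitrary illustrations):

```php
<?php
// filter_var() returns the (unchanged) URL string on success and false on failure,
// so the result should be compared strictly against false.
$ok  = filter_var('https://example.com/path?q=1', FILTER_VALIDATE_URL);
$bad = filter_var('not a url', FILTER_VALIDATE_URL);

var_dump($ok);  // string(28) "https://example.com/path?q=1"
var_dump($bad); // bool(false)
```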

Filter flags can be passed as the third parameter. For URL validation, the following four flags are available:

FILTER_FLAG_SCHEME_REQUIRED
FILTER_FLAG_HOST_REQUIRED
FILTER_FLAG_PATH_REQUIRED
FILTER_FLAG_QUERY_REQUIRED

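The first two, FILTER_FLAG_SCHEME_REQUIRED and FILTER_FLAG_HOST_REQUIRED, are always applied by default. (Because they are implied anyway, passing them explicitly was deprecated in PHP 7.3, and both constants were removed in PHP 8.0.)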
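The remaining flags are combined with the bitwise OR operator. A short sketch of how the path and query requirements change the outcome (the example URLs are made up for illustration):

```php
<?php
// Flags are combined with | and passed as the third argument to filter_var().
$flags = FILTER_FLAG_PATH_REQUIRED | FILTER_FLAG_QUERY_REQUIRED;

var_dump(filter_var('https://example.com', FILTER_VALIDATE_URL, $flags) !== false);              // bool(false): no path, no query
var_dump(filter_var('https://example.com/search', FILTER_VALIDATE_URL, $flags) !== false);       // bool(false): path, but no query
var_dump(filter_var('https://example.com/search?q=php', FILTER_VALIDATE_URL, $flags) !== false); // bool(true)
```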

Get started!

Alright, let’s look at some critical examples.

filter_var('http://example.com/"><script>alert("xss")</script>', FILTER_VALIDATE_URL) !== false; //true
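filter_var only checks URL syntax, so keeping such a payload harmless is the job of output escaping. A minimal sketch, assuming the URL ends up in an href attribute; the helper name url_to_link is hypothetical:

```php
<?php
// Hypothetical helper: escape a user-supplied URL before placing it into HTML.
// ENT_QUOTES encodes both " and ', so the value cannot break out of the quoted href.
function url_to_link(string $url): string
{
    $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
    return '<a href="' . $safe . '">' . $safe . '</a>';
}

// The "><script> part is rendered as text instead of closing the attribute
echo url_to_link('http://example.com/"><script>alert("xss")</script>');
```

Escaping only prevents breaking out of the attribute; it does not neutralize a dangerous scheme inside the href.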

Well, nobody said that filter_var was built to fight XSS. Let’s accept this and move on:

filter_var('php://filter/read=convert.base64-encode/resource=/etc/passwd', FILTER_VALIDATE_URL) !== false; //true

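This one is far more critical: any scheme passes the filter. Accepting http(s) and ftp would have been fine, but php:// is a real problem. Passed on to a function such as file_get_contents(), this URL returns the base64-encoded contents of /etc/passwd. A URL validator has to cope with all the evil a URL can contain, and filter_var does not.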

filter_var('foo://bar', FILTER_VALIDATE_URL) !== false; //true
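One way to close this hole is an explicit scheme allow-list on top of filter_var. A minimal sketch using parse_url(); the helper name has_allowed_scheme and the default list of http, https and ftp are assumptions for illustration:

```php
<?php
// Hypothetical helper: run filter_var's syntax check, then accept only
// schemes from an explicit allow-list.
function has_allowed_scheme(string $url, array $allowed = ['http', 'https', 'ftp']): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = parse_url($url, PHP_URL_SCHEME);
    // parse_url() yields null if there is no scheme; schemes are case-insensitive
    return is_string($scheme) && in_array(strtolower($scheme), $allowed, true);
}

var_dump(has_allowed_scheme('php://filter/read=convert.base64-encode/resource=/etc/passwd')); // bool(false)
var_dump(has_allowed_scheme('foo://bar'));            // bool(false)
var_dump(has_allowed_scheme('https://example.com/')); // bool(true)
```

Unlike a prefix check with strpos, parse_url() also accepts upper-case schemes such as HTTPS://.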

And the best one:

filter_var('javascript://test%0Aalert(321)', FILTER_VALIDATE_URL) !== false; //true

Let’s take a closer look: javascript is the scheme. Type javascript:alert(1+2+3+4); into the address bar of your browser and you’ll see an alert box showing 10:

[Screenshot: Javascript-URL]

This is how bookmarklets work, so it is no secret. But let’s move on: the double // starts an ordinary JavaScript comment and, at the same time, convinces filter_var that it is dealing with a valid URL (scheme://host, just like in the examples above). After that comes the sequence %0A, which is exactly the output of the following code:

echo urlencode("\n");

Get it? The URL-encoded newline ends the JavaScript comment started by //, and whatever follows is arbitrary JavaScript code. Imagine a dating site that validates user URLs with filter_var and displays them as links on the front page: a single click runs the attacker’s script. Very evil. Try it yourself.
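To make the mechanism concrete, here is a small sketch that percent-decodes the part after the javascript: scheme, roughly as a browser does before executing it, and prints the resulting lines. It uses rawurldecode() rather than urldecode(), because the latter would also turn + into a space; the variable names are illustrative only:

```php
<?php
// Strip the scheme and percent-decode the rest of the payload.
$payload = 'javascript://test%0Aalert(321)';
$decoded = rawurldecode(substr($payload, strlen('javascript:')));

// Line 1 is "//test", a JavaScript comment; line 2 is the code that actually runs.
foreach (explode("\n", $decoded) as $i => $line) {
    echo ($i + 1) . ': ' . $line . PHP_EOL;
}
// 1: //test
// 2: alert(321)
```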

And now?

The following wrapper around filter_var could be worthwhile:

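function validate_url($url)
{
    $url = trim($url);

    // Only accept http:// and https:// URLs, then let filter_var check the syntax.
    // Scheme and host are always required by FILTER_VALIDATE_URL; the explicit
    // FILTER_FLAG_SCHEME_REQUIRED | FILTER_FLAG_HOST_REQUIRED flags were removed in PHP 8.0.
    return ((strpos($url, "http://") === 0 || strpos($url, "https://") === 0) &&
            filter_var($url, FILTER_VALIDATE_URL) !== false);
}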

But even with this wrapper, the (to say the least, very unusual) URL http://x passes validation. Maybe the regex monsters aren’t that bad after all ;). And before I forget: filter_var cannot handle multibyte characters. The perfectly valid URL http://스타벅스코리아.com is rejected:

var_dump(filter_var("http://스타벅스코리아.com", FILTER_VALIDATE_URL) !== false); //bool(false)
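Internationalized domain names do have an ASCII form (Punycode, starting with xn--), which filter_var accepts. A minimal sketch, assuming the intl extension is loaded so that idn_to_ascii() is available; the helper name validate_idn_url is hypothetical and the host replacement is deliberately simplistic:

```php
<?php
// Hypothetical helper: convert the host to its ASCII (Punycode) form, then validate.
// Assumes the intl extension, which provides idn_to_ascii().
function validate_idn_url(string $url): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }
    $asciiHost = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($asciiHost === false) {
        return false;
    }
    // Simplification: replace the first occurrence of the host string,
    // assuming it does not also appear earlier (e.g. in the user info).
    $pos = strpos($url, $host);
    $asciiUrl = substr_replace($url, $asciiHost, $pos, strlen($host));

    return filter_var($asciiUrl, FILTER_VALIDATE_URL) !== false;
}

var_dump(validate_idn_url('http://스타벅스코리아.com'));
```

This does nothing about the http://x case; it could be combined with the scheme check from validate_url above.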

To conclude: use filter_var with care, adapt it to your situation, and be aware of its weaknesses. Finally, I’d like to recommend this nice collection of filter_var tests broken down by filter flag. Oh, and have a look at Symfony 2’s URL validator, if you like.


This entry was posted in php, PHP-WTF, Security, webdev.

One response to Why URL validation with filter_var might not be a good idea

  1. Sebastian says:

    Thank you very much for this helpful article. I have already incorporated your URL validation into my own code, see: http://sklueh.de/2012/09/lightweight-validator-in-php/

    public function check_url($mValue)
    {
    //Thanks to David Müller (http://www.d-mueller.de)
