Debugging a segfault on a production service: tutorial/example based on a true story: Apache2 mod_rpaf segfault

What follows are notes taken while debugging segfaults of Apache on a production environment. It might be useful to me for future debugging needs, and may be of use to other newbies as well.

 .
/!\  warning: It has been written by a newbie, for himself and for newbies. 
---  I have a very basic understanding of C debugging and the apache internals.  

(Please tell me if there are errors)

Environment and symptoms

Apache sometimes segfault. After investigation, it segfaults under some web vulnerability scans.

First step: get a backtrace / coredump

Make Apache create a coredump when it segfaults, for being able to study the issue under a debugger.

On Debian, read /usr/share/doc/apache2.2-common/README.backtrace.

Basically, you'll need:

CoreDumpDirectory /var/cache/apache2 # in apache's configuration
ulimit -c unlimited # before starting apache

You might need to recompile apache with debugging and no optimisations. See below.

Have a look to the coredump

You'll need debugging symbols and a debugger.

On debian:

apt-get install apache2-dbg libapr1-dbg libaprutil1-dbg gdb

debug symbols, no optimisations

However, compiler does optimisation so that some values won't be printable by the debugger. So, you might need to recompile Apache with debugging and no optimisation.

In general:

CFLAGS= -g -O0 # debugging, no optimisation

And, of course, do not strip the binaries.

On Debian:

apt-get build-dep apache2
apt-get source apache2
cd apache2-2.2.16
export DEB_BUILD_OPTIONS="nostrip noopt"
fakeroot debian/rules binary

Starting the debugger (GDB)

gdb /usr/sbin/apache2 /path/to/coredump
bt full # show the backtrace

In this case, the backtrace was:

#0  0x00007fb11e60ef32 in ap_add_common_vars (r=0x7fb1229f4d30) at /usr/src/apache/apache2-2.2.16/server/util_script.c:241
        e = 0x7fb1229f3860
        s = 0x7fb11fefbd00
        c = 0x7fb1229ecee0
        rem_logname = 0x7fb1229f4cb8 "x̞\"\261\177"
        env_path = 0x7fff9585df40 "/usr/local/bin:/usr/bin:/bin"
        host = 0x0
        hdrs_arr = 0x7fb1229f5010
        hdrs = 0x7fb1229f5138
        i = 4
        rport = 0
#1  0x00007fb119e983f9 in suphp_script_handler (r=0x7fb1229f4d30) at mod_suphp.c:834
        p = 0x7fb1229f4cb8
        sconf = 0x7fb1214a1c68
        dconf = 0x7fb1229f2d88
        finfo = {pool = 0x7fb1229f4cb8, valid = 7598960, protection = 1604, filetype = APR_REG, user = 1000, group = 1000, inode = 10193601, device = 18, nlink = 1, size = 19981, csize = 20480, atime = 1309525506000000, mtime = 1317514847000000, ctime = 1341403789000000, 
          fname = 0x7fb1229f3988 "/var/www/index.php", name = 0x7fb11dd22968 "\205\300u\nH\213E\330H\213@\b\353\024H\203E\330\030H\213E\330H;E\320v\301\270", filehand = 0x7fb11ed6e0b0}
        procattr = 
        argv = 0x7fb1229f39b0
        env = 
        rv = 
        strbuf = '\000' , "\023\257\035\261\177\000\000\060؞\"\261\177\000\000\260\300\205\225\377\177\000\000\200\023\257\035\261\177\000\000x\332}\035\261\177\000\000 \000\000\000\060\000\000\000@\267\205\225\377\177\000\000\200\266\205\225\377\177\000\000\000\000\000\000\000\000\000\000p\267\205\225\377\177\000\000q\363\000\000\000\000\000\000\060؞\"\261\177\000\000\340Ξ\"\261\177\000\000\200\023\257\035\261\177\000\000\031\000\000\000\377\177\000\000\000\000\000\000\004\002\000\000\000\000\000\000\031", '\000' , "\b\000\000p\264\205\225\377\177\000\000\260\300\205\225\377\177\000\000\000\000\000\000\000\000\000\000\060؞\"\261\177\000\000\000\023\257\035\261\177\000\000\260\300\205\225\377\177\000\000\006\000\000\000\000\000\000\000\060ͅ\225\377\177\000\000q\363\000\000\000\000\000\000\060؞\"\261\177\000\000\061P\207\031\261\177\000\000\060ͅ\225\377\177\000\000(ͅ\225\377\177\000\000Lͅ\225\377\177\000\000Hͅ\225\377\177", '\000' ...
        tmpbuf = 
        auth_user = 
        auth_pass = 0x7fb1229f4d30 "\270L\237\"\261\177"
        ud_user = 0x0
        ud_group = 0x0
        bb = 
        b = 
#2  0x00007fb119e991b8 in suphp_handler (r=0x7fb1229f4d30) at mod_suphp.c:569
        dconf = 0x7fb1229f2d88
#3  0x00007fb11e607316 in ap_run_handler (r=0x7fb1229f4d30) at /usr/src/apache/apache2-2.2.16/server/config.c:159
        pHook = 0x7fb11e86c1b0
        n = 5
        rv = -1
#4  0x00007fb11e607c25 in ap_invoke_handler (r=0x7fb1229f4d30) at /usr/src/apache/apache2-2.2.16/server/config.c:377
        handler = 0x7fb11ed6e0b0 "application/x-httpd-php5"
        p = 0x0
        result = 0
        old_handler = 0x0
        ignore = 0x7fb11e5f434a "H\213U\370H\211BP\353\031\203}\344\002u\023\350Ia\377\377H\213U\370H\211BX\353\004\220\353\001\220\311\303UH\211\345\211}\374\211u\370\203", 
#5  0x00007fb11e61cc21 in ap_process_request (r=0x7fb1229f4d30) at /usr/src/apache/apache2-2.2.16/modules/http/http_request.c:282
        access_status = 0
#6  0x00007fb11e6198fa in ap_process_http_connection (c=0x7fb1229ecee0) at /usr/src/apache/apache2-2.2.16/modules/http/http_core.c:190
        r = 0x7fb1229f4d30
        csd = 0x0
#7  0x00007fb11e611458 in ap_run_process_connection (c=0x7fb1229ecee0) at /usr/src/apache/apache2-2.2.16/server/connection.c:43
        pHook = 0x7fb11e86c890
        n = 1
        rv = -1
#8  0x00007fb11e61195d in ap_process_connection (c=0x7fb1229ecee0, csd=0x7fb1229eccf0) at /usr/src/apache/apache2-2.2.16/server/connection.c:190
        rc = -2
#9  0x00007fb11e624427 in child_main (child_num_arg=3) at /usr/src/apache/apache2-2.2.16/server/mpm/prefork/prefork.c:662
        current_conn = 0x7fb1229ecee0
        csd = 0x7fb1229eccf0
        ptrans = 0x7fb1229ecc78
        allocator = 0x7fb1229e98b0
        status = 0
        i = -1
        lr = 0x7fb11e845f10
        pollset = 0x7fb1229ead50
        sbh = 0x7fb1229ead48
        bucket_alloc = 0x7fb1229f0c98
        last_poll_idx = 0
#10 0x00007fb11e6245f6 in make_child (s=0x7fb11e84dd78, slot=3) at /usr/src/apache/apache2-2.2.16/server/mpm/prefork/prefork.c:758
        pid = 0

[...]

The source code can be displayed within GDB:

ctrl-x 2

What can we learn from this backtrace ?

The segfault happens in function ap_add_common_vars() in server/util_script.c at line 241.

Line 241 is:

rport = c->remote_addr->port;

Let's have a look to the pointers:

(gdb) print c
$2 = (conn_rec *) 0x7fb1229ecee0
(gdb) print c->remote_addr
$3 = (apr_sockaddr_t *) 0x0

Arrg. c->remote_addr is a null pointer (0x0). Were are dealing with a NULL pointer dereference.

(... read some code...)

This is kind of weird, since remote_addr store network information (ip, port, inet familly...) about the remote peer (the HTTP client), and therefore this pointer shouldn't be NULL at that place.

The core of Apache manages the creation of this c (connection) object and should have aborted if the remote_addr thingies couldn't have been retrieved from the socket (and debug messages would have pop up the in error log in this case)...

Remember that we are in ap_add_common_vars() (... read some code...), so the connection has been established already and we are now reading the request and setting up HTTP headers.

Also, Apache segfaults only under some vulnerability scans. There must be something something in between the socket handling and the ap_add_common_vars() call.

Right: Modules ! In the backtrace, only suphp shows up. But after backward inspecting the frames of the stacktrace (see below), you can see that c->remote_addr was already NULL.

Often, especially on a simple own software, the upper frames help to find where such a bug was introduced (where was the pointer nulled). However, Apache relies on hooks and modules, and here it seems that we cannot find this precious information.

Inspecting the upper frames:

(gdb) frame 6
#6  0x00007fb11e6198fa in ap_process_http_connection (c=0x7fb1229ecee0) at /usr/src/apache/apache2-2.2.16/modules/http/http_core.c:190
(gdb) print c->remote_addr
$4 = (apr_sockaddr_t *) 0x0

Track down the issue

Let's have a guess: c->remote_addr is correct at first. Then it is nulled, but we cannot see where from the backtrace. It might have been done by a module. A module that modifies c->remote_addr.

apache2ctl -M # or apache2 -M
[...]
core_module (static)
log_config_module (static)
logio_module (static)
rpaf_module (shared)
[...]

Uho ! What is mod_rpaf again ?

module for Apache2 which takes the last IP from the 'X-Forwarded-For' header

ding ding ding We might have a winner !

Let's read the source:

apt-get source libapache2-mod-rpaf
cd libapache2-mod-rpaf-0.5

Arg... Nothing that seems too dangerous... at first...

$ view mod_rpaf-2.0.c
[...]
            r->connection->remote_addr->sa.sin.sin_addr.s_addr = inet_addr(r->connection->remote_ip);
[...]
$ for p in debian/patches/*; do patch -p1 < $p; done
$ view mod_rpaf-2.0.c
[...]
            r->connection->remote_ip = apr_pstrdup(r->connection->pool, last_not_in_array(arr, cfg->proxy_ips));
            r->connection->remote_addr->sa.sin.sin_addr.s_addr = inet_addr(r->connection->remote_ip);
            tmppool = r->connection->remote_addr->pool;
            tmpport = r->connection->remote_addr->port;
            memset(r->connection->remote_addr, '\0', sizeof(apr_sockaddr_t));
            r->connection->remote_addr = NULL;
            apr_sockaddr_info_get(&(r->connection->remote_addr), r->connection->remote_ip, APR_UNSPEC, tmpport, 0, tmppool);

[...]

Hm :) remote_addr is nulled there ! Then, it is passed by reference to apr_sockaddr_info_get() to be filled up with the new data (the module uses the value of the X-Forwarded-For HTTP header to rewrite the ip the of client, remember?).

However, there is no sanity checks. What happens if apr_sockaddr_info_get() fails ?

r->connection->remote_addr = NULL;, so c->remote_addr is NULL later on in ap_add_common_vars() where it is dereferenced…

Fine, apr_sockaddr_info_get() should not fail. But here, it might be the case.

Let's check that out:

--- libapache2-mod-rpaf-0.5/mod_rpaf-2.0.c      2012-07-28 15:40:09.000000000 +0200
+++ rpaf_/mod_rpaf-2.0.c        2012-07-27 15:32:37.000000000 +0200
@@ -186,6 +186,7 @@
     apr_pool_t *tmppool;
     rpaf_server_cfg *cfg = (rpaf_server_cfg *)ap_get_module_config(r->server->module_config,
                                                                    &rpaf_module);
+    int rv;

     if (!cfg->enable)
         return DECLINED;
@@ -206,7 +207,11 @@
             tmpport = r->connection->remote_addr->port;
             memset(r->connection->remote_addr, '\0', sizeof(apr_sockaddr_t));
             r->connection->remote_addr = NULL;
-            apr_sockaddr_info_get(&(r->connection->remote_addr), r->connection->remote_ip, APR_UNSPEC, tmpport, 0, tmppool);
+            rv = apr_sockaddr_info_get(&(r->connection->remote_addr), r->connection->remote_ip, APR_UNSPEC, tmpport, 0, tmppool);
+            if(rv != APR_SUCCESS) {
+              ap_log_error(APLOG_MARK, APLOG_ERR, 0, NULL,
+                             "MYDEBUG apr_sockaddr_info_get != sucess -- %s : %d", r->connection->remote_ip, tmpport);
+            }
             if (cfg->sethostname) {
                 const char *hostvalue;
                 if (hostvalue = apr_table_get(r->headers_in, "X-Forwarded-Host")) {
$ tail -f /var/log/apache/error.log | egrep '(MYDEBUG|Segmentation)' 
[date] [error] MYDEBUG apr_sockaddr_info_get != success -- remote_ip: GARBAGE, 172.16.0.1 tmpport: 38601
[date] [notice] child pid 9282 exit signal Segmentation fault (11), possible coredump in /var/cache/apache2/coredump

Gotcha ! There is some garbage in remote_ip !

The silly request

I've been able to track the request by using tcpdump and looking for the garbage pattern:

# tcpdump -s 1500 -lnXvvv port 80 > /tmp/log
0x0080:  703a 2031 2722 3430 3030 0d0a 782d 666f  XXXXXXXXXXX.x-fo
0x0090:  7277 6172 6465 642d 666f 723a 2031 2722  rwarded-for:.YYY
0x00a0:  3530 3030 0d0a 6163 6365 7074 2d6c 616e  YYYY..accept-lan

Where YYYYYYY is the garbage pattern.

It is eventually a security issue

Well, a single request makes Apache segfault. On one environment that I tested, it even made the apache processes becoming zombies and killed the HTTP service totally.

Even if there are no zombies and only a single process dies, it is an issue: Apache will have to clone() for each new request.

This can lead to a DOS.

Conclusion:

A specific X-forwarded-For header ?

From my experiments:

A specific value is needed when targeting the reverse proxy. The reverse proxy adds his own X-forwarded-For header, and mod_rpaf is supposed to used the last one. Injecting a X-forwarded-For header can make apache segfault but it has to be a specific value (toto is not, for example)

I won't give the one value I've tested (and approved) before Debian updates its package (or tell me they are fine with this issue... Which I doubt)

Fix
What is the wrong part of the code ?

I do not understand why there is some garbage in remote_ip. There must be a bug in the memory handling in mod_rpaf.

Here is a quote from the CHANGES document released with mod_rpaf 0.6:

mod_rpaf was incorrectly using r->pool to allocate memory for the
ip. The correct pool for this when you are dealing wth keep-alive
requests was r->connection->pool.
UPDATE 0.6 version DOES NOT fix it

I previously stated that version 0.6 fixes the issue. I was wrong, I messed up when testing custom patches and packages.

Do not patch

The bug does not trigger when using unpatched 0.5 or 0.6 sources (remote_addr is not nulled.)

Workaround

Remove extra X-Forwarded-For headers added by (malicious?) clients on the reverse proxy before passing the requests to the backends.

affected environments and warnings sent to upstream

affected:

warnings sent to upstream:

CVE-2012-3526

A CVE number has been assigned to this issue: CVE-2012-3526