PHP exited on signal 7 (SIGBUS)
It breaks
Debug of the day: random PHP crashes (SIGBUS) on a production platform. The website is Prestashop, a (quite) well known online shop software.
PHP crashes and 500 errors had been reported for this website, running under SuPHP/CGI mode. When I migrated this website to a new platform, running PHP-FPM, crashes seemed to be more prevalent, and, more importantly, had a much more important impact:
I/O (or maybe random) functions were "hanging", so that parts of the website were not working at all (timeout). Indeed, a crash of a PHP process in a FPM pool happen to make the whole pool feel... bad.
Well, emergency_restart_threshold / emergency_restart_interval are there for a reason.
However, if a whole pool of PHP processes behave wrongly every single or two crashes, and that crashes are as regular as 1 every 3 or 5 minutes, you'ld better... fix the software.
Necessary advisory
.
/!\ warning: I am not a developper, and don't know anything about prestashop
--- other than "it is a shop software, and some are hosted on servers I manage"
(Please tell me if there are errors)
Look at the source of the problem
So here we go.
# echo "/tmp/core-%e.%p" > /proc/sys/kernel/core_pattern # vi /etc/init.d/php5-fpm # add 'ulimit -c unlimited' at a right place # /etc/init.f/php5-fpm restart
Make yourself a coffee. Wait for coredumps.
Let's see what the coredump can tell us (probably on a dev/preprod machine)
# apt-get install gdb php5-dbg build-essential
# gdb /usr/sbin/php5-fpm /tmp/core-php5-fpm.6739
Core was generated by `php-fpm: pool bigotter '.
Program terminated with signal 7, Bus error.
#0 lex_scan (zendlval=0x7fff20bda898) at Zend/zend_language_scanner.l:2000
(gdb) bt
#0 lex_scan (zendlval=0x7fff20bda898) at Zend/zend_language_scanner.l:2000
#1 0x00000000006885e0 in zendlex (zendlval=0x7fff20bda890) at /tmp/buildd/php5-5.4.4/Zend/zend_compile.c:6692
#2 0x0000000000662c42 in zendparse () at /tmp/buildd/php5-5.4.4/Zend/zend_language_parser.c:3432
#3 0x00000000006680fb in compile_file (file_handle=<incomplete type>, type=2) at Zend/zend_language_scanner.l:582
#4 0x000000000051d27a in phar_compile_file (file_handle=<incomplete type>, type=32767) at /tmp/buildd/php5-5.4.4/ext/phar/phar.c:3391
#5 0x00000000006682b0 in compile_filename (type=2, filename=0x7fa871878ce8) at Zend/zend_language_scanner.l:625
#6 0x00000000006e1e2b in ZEND_INCLUDE_OR_EVAL_SPEC_TMP_HANDLER (execute_data=0x7fa871878af8) at /tmp/buildd/php5-5.4.4/Zend/zend_vm_execute.h:7006
#7 0x00000000006fed17 in execute (op_array=0x7fa8718b1af8) at /tmp/buildd/php5-5.4.4/Zend/zend_vm_execute.h:410
#8 0x000000000069eb5e in zend_execute_scripts (type=8, retval=0x7fa8718a8d38, file_count=3) at /tmp/buildd/php5-5.4.4/Zend/zend.c:1279
#9 0x000000000063e133 in php_execute_script (primary_file=0x6aabfc) at /tmp/buildd/php5-5.4.4/main/main.c:2473
#10 0x0000000000431ac0 in main (argc=1904694192, argv=0xdc2660) at /tmp/buildd/php5-5.4.4/sapi/fpm/fpm/fpm_main.c:1885
(gdb) frame 5
#5 0x00000000006682b0 in compile_filename (type=2, filename=0x7fa871878ce8) at Zend/zend_language_scanner.l:625
625 in Zend/zend_language_scanner.l
(gdb) print *filename
$1 = {value = {lval = 1904908816, dval = 6.9347763598749545e-310, str = {val = 0x7fa8718a9a10 "/var/www/bigotter/www/cache/class_index.php", len = 43}, ht = 0x7fa8718a9a10, obj = {handle = 1904908816, handlers = 0x2b}}, refcount__gc = 1904909728, type = 6 '\006', is_ref__gc = 127 '\177'}
So:
- Crash happens in the lexer, a step of the parsing of the PHP file which converts a sequence of characters into a sequence of tokens... Simple guess: the file is corrupted while reading it
- The file which causes this crash is cache/class_index.php
After some researches, it appears that this file is regularly rewritten, used in a cache system for Prestashop.
Let's see how it is written:
// Write classes index on disc to cache it
$filename = $this->root_dir.Autoload::INDEX_FILE;
if ((file_exists($filename) && !is_writable($filename)) || !is_writable(dirname($filename)))
throw new PrestaShopException($filename.' is not writable, please give write permissions (chmod 666) on this file.');
else
{
// Let's write index content in cache file
// In order to be sure that this file is correctly written, a check is done on the file content
$loop_protection = 0;
do
{
$integrity_is_ok = false;
file_put_contents($filename, $content);
if ($loop_protection++ > 10)
break;
// If the file content end with PHP tag, integrity of the file is ok
if (preg_match('#\?>\s*$#', file_get_contents($filename)))
$integrity_is_ok = true;
}
while (!$integrity_is_ok);
if (!$integrity_is_ok)
{
file_put_contents($filename, '');
throw new PrestaShopException('Your file '.$filename.' is corrupted. Please remove this file, a new one will be regenerated automatically');
}
}
Facepalm. I posted that on a pastebin and shared it around among friends, technical people. It seems that I'm not the only one to be amazed.
What's wrong?
- A loop for writing a single file up to 12 times if previous attempts failed (WTF?!)
- No check for the return code of file_put_contents...
- ...but a 'preg_match' check afterwards. WTF?!
Well... A totally broken approach of handling "corruption problems", as stated in this commit, a "bugfix" for this report.
The good question is not "how to work around corruption problems?". It is "how do corruption problems appear, and how to prevent them?".
Notes:
- An advisory lock have been added to file_put_contents() recently. It's worth having it, but it is useless. PHP's file_put_contents() and file_get_contents() do not respect advisory locks (moar reading: should LOCK_EX on both read & write be atomic?)
- PHP does not respect exclusive locks for require(), include(), and does not handle broken files nicely: see Bug #52752 Crash when lexing
Concurrency
The issue here is concurrency. Or rather the lack of handling of concurrency.
cache/class_index.php is a PHP file. It is certainly included in some other PHP file, probably loaded for each request to the prestashop application.
It is also rewritten regularly, using the PHP function file_put_contents() which is not atomic: 3 ops: open, write, close.
But, hey, PHP developper... There is more than one PHP process on a loaded production server!
Sometimes, the file needs to be refreshed, so a process opens the file... Begins to write into it. And another process loads the file, as it is required for the application, before the file is fully written.
Boum, big badaboum. "Corruption problem".
The issue is more and more prevalent as more and more PHP processes are running concurrently.
Sad, but true.
The more you get visitors on your website, the more it breaks.
How do I fix that?
Use atomic operations. The correct way to refresh the file is to create a temporary file, and then rename it to the proper name. Why? rename is atomic. file_put_contents is not.
This is an untested patch (against master on github):
--- a/classes/Autoload.php
+++ b/classes/Autoload.php
@@ -136,27 +136,13 @@ class Autoload
}
else
{
- // Let's write index content in cache file
- // In order to be sure that this file is correctly written, a check is done on the file content
- $loop_protection = 0;
- do
- {
- $integrity_is_ok = false;
- file_put_contents($filename, $content, LOCK_EX);
- if ($loop_protection++ > 10)
- break;
-
- // If the file content end with PHP tag, integrity of the file is ok
- if (preg_match('#\?>\s*$#', file_get_contents($filename)))
- $integrity_is_ok = true;
- }
- while (!$integrity_is_ok);
-
- if (!$integrity_is_ok)
- {
- file_put_contents($filename, '', LOCK_EX);
- // Cannot use PrestaShopException in this context
- die('Your file '.$filename.' is corrupted. Please remove this file, a new one will be regenerated automatically');
+ $filename_tmp = $filename.'.tmp.'.getmypid();
+ if(file_put_contents($filename_tmp, $content, LOCK_EX) !== FALSE) {
+ rename($filename_tmp, $filename);
+ } else {
+ // $filename_tmp couldn't be written. $filename should be there anyway (even if outdated),
+ // no need to die.
+ error_log('Cannot write file '.$filename_tmp);
}
}
Notes:
- getmypid() permits to compute a unique temporary filename. 'mktemp' like functions/libs could be used instead.
- moar checking could be done... (but hey... Errors in that part of prestashop is currently handled with... die())
Then, no more corruption problems, no more SIGBUS issues, PHP is in peace with Prestashop.
Update: proper fix merged in Prestashop
I proposed my cheesy patch (with getmypid()) to Prestashop to bring the issue to their attention. I eventually made a PR with a proper fix using tempnam() which has been merged into the development branch so that upcoming Prestashop releases will hopefully be free of this crap :)