PHP session lock issue with MemcacheD

Nginx throws a 502 Bad Gateway on the session_start() call in the include.php script.

PHP session storage is handled by MemcacheD.

# nginx -v
nginx version: nginx/1.4.6 (Ubuntu)
# php5-fpm -v
PHP 5.5.9-1ubuntu4.14 (fpm-fcgi) (built: Oct 28 2015 01:38:24)
Copyright (c) 1997-2014 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2014 Zend Technologies
    with Zend OPcache v7.0.3, Copyright (c) 1999-2014, by Zend Technologies
# memcached -h
memcached 1.4.14
# pecl list
Installed packages, channel pecl.php.net:
=========================================
Package   Version State
memcached 2.1.0   stable
# php -c /etc/php5/fpm/php.ini -i | grep session
session
session.auto_start => Off => Off
session.cache_expire => 180 => 180
session.cache_limiter => nocache => nocache
session.cookie_domain => no value => no value
session.cookie_httponly => Off => Off
session.cookie_lifetime => 0 => 0
session.cookie_path => / => /
session.cookie_secure => Off => Off
session.entropy_file => /dev/urandom => /dev/urandom
session.entropy_length => 32 => 32
session.gc_divisor => 1000 => 1000
session.gc_maxlifetime => 1440 => 1440
session.gc_probability => 0 => 0
session.hash_bits_per_character => 5 => 5
session.hash_function => 0 => 0
session.name => PHPSESSID => PHPSESSID
session.referer_check => no value => no value
session.save_handler => memcached => memcached
session.save_path => 127.0.0.1:11211 => 127.0.0.1:11211
session.serialize_handler => php => php
session.upload_progress.cleanup => On => On
session.upload_progress.enabled => On => On
session.upload_progress.freq => 1% => 1%
session.upload_progress.min_freq => 1 => 1
session.upload_progress.name => PHP_SESSION_UPLOAD_PROGRESS => PHP_SESSION_UPLOAD_PROGRESS
session.upload_progress.prefix => upload_progress_ => upload_progress_
session.use_cookies => On => On
session.use_only_cookies => On => On
session.use_strict_mode => Off => Off
session.use_trans_sid => 0 => 0
# php -c /etc/php5/fpm/php.ini -i | grep memcached
/etc/php5/cli/conf.d/20-memcached.ini,
memcached
memcached support => enabled
libmemcached version => 1.0.8
memcached.compression_factor => 1.3 => 1.3
memcached.compression_threshold => 2000 => 2000
memcached.compression_type => fastlz => fastlz
memcached.serializer => php => php
memcached.sess_binary => no value => no value
memcached.sess_lock_wait => 150000 => 150000
memcached.sess_locking => 1 => 1
memcached.sess_prefix => memc.sess.key. => memc.sess.key.
Registered save handlers => files user memcached
session.save_handler => memcached => memcached

While digging into the system calls, I found a possible cause of the bad gateway.

In the php5-fpm process I get a lot of these:

# strace -p 12927 -ff -tt
Process 12927 attached
11:13:01.205991 restart_syscall(<... resuming interrupted call ...>) = 0
11:13:01.309243 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:01.309411 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:01.309535 nanosleep({0, 150000000}, NULL) = 0
11:13:01.459913 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:01.460049 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:01.460118 nanosleep({0, 150000000}, NULL) = 0
11:13:01.610353 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:01.610480 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:01.610521 nanosleep({0, 150000000}, NULL) = 0
11:13:01.760785 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:01.760944 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:01.761064 nanosleep({0, 150000000}, NULL) = 0
11:13:01.911438 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:01.911575 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:01.911643 nanosleep({0, 150000000}, NULL) = 0
11:13:02.061920 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:02.062088 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:02.062211 nanosleep({0, 150000000}, NULL) = 0
11:13:02.212470 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:02.212611 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:02.212693 nanosleep({0, 150000000}, NULL) = 0
11:13:02.362917 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:02.362999 recvfrom(6, 0x2967068, 8196, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
11:13:02.363065 poll([{fd=6, events=POLLIN}], 1, 5000) = 1 ([{fd=6, revents=POLLIN}])
11:13:02.363196 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:02.363241 nanosleep({0, 150000000}, NULL) = 0
11:13:02.513457 sendto(6, "add memc.sess.key.lock.oqr9vso3a"..., 69, MSG_DONTWAIT|MSG_NOSIGNAL, NULL, 0) = 69
11:13:02.513531 recvfrom(6, 0x2967068, 8196, 64, 0, 0) = -1 EAGAIN (Resource temporarily unavailable)
11:13:02.513581 poll([{fd=6, events=POLLIN}], 1, 5000) = 1 ([{fd=6, revents=POLLIN}])
11:13:02.513619 recvfrom(6, "NOT_STORED\r\n", 8196, MSG_DONTWAIT, NULL, NULL) = 12
11:13:02.513651 nanosleep({0, 150000000}, ^CProcess 12927 detached

This goes on in an endless loop until nginx runs out of patience and throws the 502 error.

Running strace on the memcached process shows the same exchange.

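As far as I understand, a session with this identifier already exists (the trace shows the memc.sess.key.lock.… key being added), so when the same key is added again memcached returns NOT_STORED, which leads to the timeout.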
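To make the retry pattern in the trace concrete, here is a minimal sketch of an add-based spin lock, assuming the session handler behaves roughly as the strace suggests: add a lock key, sleep for memcached.sess_lock_wait microseconds on NOT_STORED, then retry. The FakeMemcache class, acquireLock(), the example key, and the retry cap are hypothetical stand-ins for illustration, not the memcached extension's actual code.

```php
<?php
// Hypothetical in-memory stand-in for memcached's "add" semantics:
// add() stores the key only if it does not exist yet.
class FakeMemcache
{
    private $data = array();

    public function add($key, $value)
    {
        if (array_key_exists($key, $this->data)) {
            return false; // memcached would answer NOT_STORED
        }
        $this->data[$key] = $value;
        return true; // memcached would answer STORED
    }

    public function delete($key)
    {
        unset($this->data[$key]);
    }
}

// Try to take the lock, sleeping $waitMicros between attempts.
// $maxAttempts is an assumption added here so the sketch terminates.
function acquireLock(FakeMemcache $mc, $lockKey, $waitMicros, $maxAttempts)
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        if ($mc->add($lockKey, '1')) {
            return $attempt; // lock acquired on this attempt
        }
        usleep($waitMicros); // corresponds to the nanosleep() in the trace
    }
    return false; // lock never became free
}

$mc = new FakeMemcache();
$lockKey = 'memc.sess.key.lock.example'; // hypothetical session ID suffix

$first = acquireLock($mc, $lockKey, 1000, 3);  // lock is free
$second = acquireLock($mc, $lockKey, 1000, 3); // still held, gives up
$mc->delete($lockKey);                         // holder releases the lock
$third = acquireLock($mc, $lockKey, 1000, 3);  // free again

var_dump($first, $second, $third);
```

If the request holding the lock never releases it, every later request for the same session ID would stay in this loop, which would match the repeated add / NOT_STORED / nanosleep pattern above.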

Any hints on where I should dig further to find a solution?

Thanks a lot!

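I found that the PHP code contained a recursive function that caused a nested loop and resulted in a PHP error. The error was hidden in production and only showed up in xdebug once the database had been copied to development: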

Fatal error: Maximum function nesting level of '100' reached, aborting! in...

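Fixing the PHP error got me further in development, but it does not solve the original problem: why there is an endless stream of NOT_STORED messages when session_start() is called.

In the session section of the PHP manual I found this: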

void session_write_close ( void )

End the current session and store session data.

Session data is usually stored after your script terminated without the need to call session_write_close(), but as session data is locked to prevent concurrent writes only one script may operate on a session at any time. When using framesets together with sessions you will experience the frames loading one by one due to this locking. You can reduce the time needed to load all the frames by ending the session as soon as all changes to session variables are done.
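As an illustration of that advice, here is a minimal sketch that holds the session lock only while session data is being read or changed. The files handler, the temp-dir save path, and the last_seen variable are assumptions chosen so the example runs on its own from the command line; under php5-fpm the configured memcached handler would be used instead.

```php
<?php
// Sketch: release the session lock as soon as session changes are done.
// Assumption: these two settings exist only to make the example
// self-contained; they are not part of the configuration shown above.
ini_set('session.save_handler', 'files');
session_save_path(sys_get_temp_dir());

session_start();                     // takes the session lock
$_SESSION['last_seen'] = time();     // hypothetical session variable
$lastSeen = $_SESSION['last_seen'];  // copy what the rest of the script needs
session_write_close();               // stores the data and releases the lock

// Long-running work would go here; other requests with the same
// session ID no longer have to wait for this script to finish.
$statusAfterClose = session_status();

var_dump($statusAfterClose === PHP_SESSION_NONE);
```

This shortens the time the lock is held, but it would not by itself explain a lock that is never released, for example when a request dies before the session is closed.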

Did you ever manage to solve this problem?