(php-fpm generated a lot of load when resetting opcache, rendering server unresponsive)

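Posted on 2021-03-19 05:21:40

For the past few weeks we have been seeing a problem when deploying code: the server sometimes becomes unresponsive for a few minutes.

Here is an example of the server load when this happens:

The only related log entries I could find are in /var/log/php7.2-fpm.log, where I sometimes (but not always) see entries like these (note: they are from a different incident than the one in the graph above, although the same thing happened):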

[22-Mar-2019 15:33:50] WARNING: [pool api] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 32 children, there are 0 idle, and 231 total children
[22-Mar-2019 15:33:52] WARNING: [pool api] server reached pm.max_children setting (250), consider raising it
[22-Mar-2019 15:34:05] WARNING: [pool app] seems busy (you may need to increase pm.start_servers, or pm.min/max_spare_servers), spawning 8 children, there are 47 idle, and 104 total children
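
To see which pools run into their limits during such an incident, the warnings can be tallied per pool. This is only a minimal sketch that assumes the log format shown above; the function name fpm_warnings_per_pool is made up for illustration.

```shell
# Count php-fpm WARNING lines per pool.
# Reads the files given as arguments, or stdin when there are none.
fpm_warnings_per_pool() {
    awk '/WARNING: \[pool / {
        pool = $0
        sub(/.*\[pool /, "", pool)   # drop everything up to the pool name
        sub(/\].*/, "", pool)        # drop everything after the pool name
        count[pool]++
    }
    END { for (p in count) print p, count[p] }' "$@" | sort
}

# Usage: fpm_warnings_per_pool /var/log/php7.2-fpm.log
```

Each output line is a pool name followed by the number of warnings logged for it.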

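What happens is that we deploy to this server:

git status --porcelain to check for local modifications
git pull origin master to update the files
reset the opcache, i.e. we call an endpoint that executes opcache_reset()
clear the local cache files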
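
The four steps could be scripted roughly as follows. This is a sketch, not the actual deployment script: the endpoint URL, the cache directory and the function names are hypothetical placeholders, since none of them are given in the post.

```shell
# Hypothetical values; neither the endpoint nor the cache path is given in the post.
RESET_URL="${RESET_URL:-http://127.0.0.1/opcache-reset.php}"
CACHE_DIR="${CACHE_DIR:-/var/www/app/cache}"

# Succeeds only when `git status --porcelain` prints nothing,
# i.e. the working tree has no local modifications.
worktree_is_clean() {
    [ -z "$(git status --porcelain)" ]
}

deploy() {
    worktree_is_clean || { echo "local modifications, aborting" >&2; return 1; }
    git pull origin master || return 1
    # Call the endpoint that runs opcache_reset().
    curl -fsS "$RESET_URL" >/dev/null || return 1
    # Clear local cache files; ${CACHE_DIR:?} aborts if the variable is empty.
    rm -rf "${CACHE_DIR:?}"/*
}
```

Nothing runs until deploy is called explicitly, so the functions can be sourced and tried one at a time.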

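After some experimenting, I was able to narrow the load problem down to this: opcache_reset()

As soon as this call is executed (regardless of any earlier or later deployment steps; it also happens in isolation when I call only this endpoint), the system load may suddenly spike.

If that happens and the load gets "too high" (from experience, I'd say above roughly 200), the system becomes unresponsive for anywhere from a few seconds to a few minutes, depending on how long it takes for everything to calm down.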

Specs:

VM running on VMware (not hosted by us, but by a partner)
4 vCPUs
8GB RAM
8GB swap
Ubuntu 18.04 LTS
nginx 1.14.0 (Ubuntu default)
PHP 7.2 (via https://launchpad.net/~ondrej/+archive/ubuntu/php)

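PHP-FPM configuration:

we use 6 pools with different virtual hosts
start_servers adds up to 320 php-fpm processes in total (also confirmed via ps auxw|grep -i fpm|grep -v grep |wc -l)
max_children across all pools adds up to about 870

Maybe the totals here are too high; the intention was to cope with the occasional spikes on individual virtual hosts.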
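
As a rough sanity check on those totals, here is the RAM each worker would get if the 8GB were split evenly. This is back-of-the-envelope arithmetic only: it ignores the OS, nginx, swap and opcache shared memory, the post does not say how much memory a worker really uses, and ram_per_worker_mib is a name made up for this sketch.

```shell
# Integer MiB of RAM per worker if the total were split evenly.
ram_per_worker_mib() {
    total_mib=$1
    workers=$2
    echo $(( total_mib / workers ))
}

ram_per_worker_mib 8192 320   # all start_servers processes: prints 25
ram_per_worker_mib 8192 870   # every pool at max_children: prints 9
```

With 4 vCPUs, 320 running workers also means 80 processes per core competing for CPU whenever they all become busy at once.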

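In htop, the system usually looks like this:

And the load is usually low, except for these spikes, which I have just found out are all related to the opcache reset:

I understand that once the cache is reset, all processes have to repopulate it, and that this is CPU-intensive.

What I don't understand, though:

this only started happening recently, maybe 1-2 months ago, but the unresponsiveness only became noticeable in the last two weeks
it does not always happen; sometimes nothing happens when the cache is reset

Here is the output of opcache_get_status(false) right before the deployment: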

{
  "opcache_enabled": true,
  "cache_full": false,
  "restart_pending": false,
  "restart_in_progress": false,
  "memory_usage": {
    "used_memory": 67353640,
    "free_memory": 66864088,
    "wasted_memory": 0,
    "current_wasted_percentage": 0
  },
  "interned_strings_usage": {
    "buffer_size": 8388608,
    "used_memory": 5215176,
    "free_memory": 3173432,
    "number_of_strings": 89109
  },
  "opcache_statistics": {
    "num_cached_scripts": 2873,
    "num_cached_keys": 5063,
    "max_cached_keys": 7963,
    "hits": 633581523,
    "start_time": 1553172771,
    "last_restart_time": 1553248200,
    "oom_restarts": 0,
    "hash_restarts": 0,
    "manual_restarts": 6,
    "misses": 9512,
    "blacklist_misses": 0,
    "blacklist_miss_ratio": 0,
    "opcache_hit_rate": 99.9984987161316
  }
}

And here it is afterwards:

{
  "opcache_enabled": true,
  "cache_full": false,
  "restart_pending": false,
  "restart_in_progress": false,
  "memory_usage": {
    "used_memory": 57745856,
    "free_memory": 76471872,
    "wasted_memory": 0,
    "current_wasted_percentage": 0
  },
  "interned_strings_usage": {
    "buffer_size": 8388608,
    "used_memory": 4337168,
    "free_memory": 4051440,
    "number_of_strings": 75163
  },
  "opcache_statistics": {
    "num_cached_scripts": 2244,
    "num_cached_keys": 3925,
    "max_cached_keys": 7963,
    "hits": 5893926,
    "start_time": 1553172771,
    "last_restart_time": 1553265235,
    "oom_restarts": 0,
    "hash_restarts": 0,
    "manual_restarts": 7,
    "misses": 4962,
    "blacklist_misses": 0,
    "blacklist_miss_ratio": 0,
    "opcache_hit_rate": 99.91588245106536
  }
}
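
The opcache_hit_rate values in both dumps are consistent with hits / (hits + misses) * 100. A small sketch to recompute them from the numbers above; the shell function name is only illustrative and simply mirrors the JSON key.

```shell
# Hit rate in percent: hits / (hits + misses) * 100, rounded to 4 decimals.
opcache_hit_rate() {
    awk -v hits="$1" -v misses="$2" \
        'BEGIN { printf "%.4f\n", hits / (hits + misses) * 100 }'
}

opcache_hit_rate 633581523 9512   # before the reset: prints 99.9985
opcache_hit_rate 5893926 4962     # after the reset:  prints 99.9159
```

Between the two dumps manual_restarts goes from 6 to 7, and the hit counter starts again from a much lower value, which matches a single opcache_reset() call during the deployment.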

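Other things I observed:

php-fpm stops responding quite soon
nginx still works unless the load is really high; I confirmed this because nginx serves the configured 500 page when php-fpm is basically unreachable

What really causes these load spikes? How can I avoid them?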

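Update after accepting the answer:

Basically, not calling opcache_reset at all and reverting most of my customized opcache settings to their defaults (i.e. no longer imposing them) fixed it.

This step had actually been part of my deployment process. I tried to find out the original reason for it, and as far as I can tell it was related to deployment problems where classes referenced new code that had not yet been loaded/refreshed.

In hindsight I'm not even sure that was the actual problem, but here we are.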

Questioner

mark

Michael Hampton 2019-03-25 21:31:10

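By default, PHP checks file timestamps to invalidate opcache entries. This can be turned off, which is the only scenario I can think of in which opcache_reset() would be useful. Of course, it also causes the problem you are having.

I suggest going back to the defaults: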

opcache.validate_timestamps = 1
opcache.revalidate_freq = 2
opcache.revalidate_path = 0
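
To find out which opcache directives are currently being overridden, the active (uncommented) opcache.* lines can be listed from the ini files. This is a sketch: the function name is invented, and the paths in the usage line are an assumption based on the usual layout of the PHP 7.2 packages on Ubuntu, so adjust them to your setup.

```shell
# Print active (uncommented) opcache.* directives as name=value.
# Reads the files given as arguments, or stdin when there are none.
active_opcache_settings() {
    grep -hE '^[[:space:]]*opcache\.[a-z_]+[[:space:]]*=' "$@" |
        sed -E 's/[[:space:]]*=[[:space:]]*/=/; s/^[[:space:]]+//'
}

# Usage (assumed paths):
# active_opcache_settings /etc/php/7.2/fpm/php.ini /etc/php/7.2/fpm/conf.d/*.ini
```

Any opcache.validate_timestamps, opcache.revalidate_freq or opcache.revalidate_path line that shows up with a value other than the ones above is a candidate for removal.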

mark

Thank you very much; although it seems obvious in hindsight, it was not so obvious to realize that this was not necessary at all :-)
