Symptoms
I get "communication failure" errors when I list the users of my system (listing the groups works) or when I want to edit the ACL of a shared folder.
Exactly 1 minute after the HTTP request is sent, my browser gets a 504 Gateway Timeout and Nginx logs a 499 HTTP response code in its access.log file.
Context
I have 539 users and 175 groups (including those coming from LDAP via nslcd, libnss-ldapd, libpam-ldapd), so I guessed it was a "too many things to iterate over in too short a time" issue.
Debugging
I tested with /usr/sbin/omv-rpc:
time /usr/sbin/omv-rpc "UserMgmt" "getGroupList" '{"start":0,"limit":null,"sortfield":null,"sortdir":null}'
time /usr/sbin/omv-rpc "UserMgmt" "getUserList" '{"start":0,"limit":null,"sortfield":null,"sortdir":null}'
and got the following timings:
- getGroupList:
- getUserList:
There you go: getting the groups takes about 2s, but getting the users takes 1m24s, which exceeds the 1-minute timeout.
By instrumenting the following source files (I added microtime(true) calls and logged the results to a file in /tmp):
- /usr/share/php/openmediavault/system/user.inc: for getUsers()
- /usr/share/php/openmediavault/system/group.inc: for getGroups()
- /usr/share/openmediavault/engined/rpc/usermgmt.inc: for enumerateUsersByType(), enumerateGroupsByType(), getUserList(), getGroupList()
I got the following execution details:
- getGroupList:
OMV\System\Group::getGroups(): cmd executed in 0.035168
OMV\System\Group::getGroups(): output of 175 items parsed in 0.000187
Engined\Rpc\UserMgmt::enumerateGroupsByType(): got list of 175 items in 0.035927
Engined\Rpc\UserMgmt::enumerateGroupsByType(): result array of 108 items filled in 1.582890
Engined\Rpc\UserMgmt::getGroupList(): got list of 108 items in 1.619559
Engined\Rpc\UserMgmt::getGroupList(): groups array of 108 items enriched in 0.079005
- getUserList:
OMV\System\User::getUsers(): cmd executed in 0.103545
OMV\System\User::getUsers(): output of 539 items parsed in 0.000745
Engined\Rpc\UserMgmt::enumerateUsersByType(): got list of 539 items in 0.105240
Engined\Rpc\UserMgmt::enumerateUsersByType(): result array of 500 items filled in 84.242414
Engined\Rpc\UserMgmt::getUserList(): got list of 500 items in 84.348474
Engined\Rpc\UserMgmt::getUserList(): users array of 500 items enriched in 0.181667
Analysis
The foreach in UserMgmt::enumerateUsersByType() calls UserMgmt::getUserInfo(), which instantiates a new OMV\System\User and fetches its data using new getent or id system calls (about 3 per user, according to OMV\System\User::getData()).
I think my "bottleneck" is here: 539 users * 3 system calls: *ouch*.
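As a rough sanity check, the timings logged above can be turned into a per-item cost. The sketch below only does arithmetic on the figures from those log lines; per_item_ms is a helper name made up for illustration, and the divisor of 3 calls per user is an assumption taken from my reading of getData(), not a measured value.

```shell
#!/bin/sh
# per_item_ms TOTAL_SECONDS ITEMS: average cost per item, in milliseconds.
# Hypothetical helper for illustration; not part of openmediavault.
per_item_ms() {
    awk -v total="$1" -v items="$2" 'BEGIN { printf "%.1f\n", total / items * 1000 }'
}

# Figures taken from the "result array of N items filled in T" log lines.
echo "groups: $(per_item_ms 1.582890 108) ms per group"
echo "users:  $(per_item_ms 84.242414 500) ms per user"

# Assuming about 3 external calls per user (500 * 3 = 1500 calls in total):
echo "calls:  $(per_item_ms 84.242414 1500) ms per getent/id call"
```

This works out to roughly 15 ms per group against 168 ms per user, so each user costs about eleven times as much as each group, which is consistent with several external lookups per user.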
Unless there is a way to cache this data (at various levels), I don't think I can get the list of users in less than 1 minute. And I can live with that.
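One form such a cache could take is reading every entry from a single getent passwd call instead of doing one lookup per user. This is only a minimal sketch of the idea, not how openmediavault does it: parse_passwd is a hypothetical name, and it assumes the fields needed are all in the passwd database (group membership and shadow data are not, so it would not cover everything getData() fetches).

```shell
#!/bin/sh
# parse_passwd: read passwd-format lines on stdin and print
# "name uid gid home shell" for each well-formed entry, in a single pass.
# Hypothetical helper for illustration; not part of openmediavault.
parse_passwd() {
    awk -F: 'NF >= 7 { print $1, $3, $4, $6, $7 }'
}

# One external call for all users known to NSS, parsed once.
# Skipped quietly on systems without getent.
if command -v getent >/dev/null 2>&1; then
    getent passwd | parse_passwd | head -n 5
fi
```

The getUsers() timing above (539 items in about 0.1s) suggests that one bulk call is cheap; the cost comes from repeating lookups per user.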
Workaround (yet to be found)
But I would like Nginx/FPM/WebGUI to wait those 1m24s because (I've checked) the UserMgmt::getUserList() call triggered from the WebGUI does finish its job even after Nginx has timed out.
I tried setting the following in the Nginx configuration (/etc/nginx/sites-enabled/openmediavault-webgui), without luck:
server {
    # [...]
    location ~ \.php$ {
        # [...]
        fastcgi_read_timeout 180s;
        client_header_timeout 180s;
    }
    client_body_timeout 180s;
}
And I could not find anything in the /etc/php/7.3/*/php.ini or /etc/php/7.3/fpm/pool.d/*.conf files.
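To double-check which timeout values are really in effect, the full configuration can be dumped and searched. A sketch under assumptions: list_timeouts is a made-up helper name, and max_execution_time (php.ini) and request_terminate_timeout (FPM pool) are simply the timeout-related directives worth looking for; nothing here confirms that either of them is the missing setting.

```shell
#!/bin/sh
# list_timeouts: print, with line numbers, every *_timeout directive found
# in the Nginx configuration text read from stdin. Commented-out lines are
# skipped. Hypothetical helper for illustration.
list_timeouts() {
    grep -nE '^[[:space:]]*[a-z_]+_timeout[[:space:]]' || true
}

# nginx -T tests the configuration and dumps it, including included files,
# so a value set in another file would show up here as well.
if command -v nginx >/dev/null 2>&1; then
    nginx -T 2>/dev/null | list_timeouts
fi

# Timeout-related settings on the PHP side (path as on my system).
grep -rnE 'max_execution_time|request_terminate_timeout' /etc/php/7.3/ 2>/dev/null || true
```

Running nginx -t after editing the site file also shows whether Nginx accepted the new directives at all.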
There must be a setting I have overlooked, but which one?
Thanks for any help you could provide.
Versions used
- Debian v10
- Kernel v5.10.0
- openmediavault v5.6.13
- Nginx v1.14.2
- PHP v7.3.29
- PHP-FPM v7.3.29