Running Ubuntu 12.04.3 LTS.
A user submitting a batch job to the queue was receiving a 'too many open files' error.
Investigation suggested changing /etc/security/limits.conf, which has been done, specifying a hard limit of 65536 both for all users and for root individually. `ulimit -a` now reports the limit of 65536. The line `session required pam_limits.so` has also been added to the /etc/pam.d/common-session, common-session-noninteractive, login, and ssh files.
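For reference, the changes look roughly like this (the wildcard entry and exact layout are my reconstruction of what is described above, not a verbatim copy of the files):

```
# /etc/security/limits.conf
*       hard    nofile  65536
root    hard    nofile  65536

# appended to /etc/pam.d/common-session, common-session-noninteractive,
# login, and ssh
session required pam_limits.so
```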
The problem persists, and can be seen by looking at the /proc/<pid>/limits files for processes listed by lsof. There, the nofile limit for processes whose command is sh remains at the original system defaults: a soft limit of 1024 and a hard limit of 4096. The limits for other processes, for example those with command grep, have changed to the new 65536 limit.
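For anyone wanting to repeat the check, this is roughly how the per-process limits were read (using the current shell's PID as a stand-in for the PIDs taken from lsof):

```shell
#!/bin/sh
# Show the effective nofile limits of a running process.
# Substitute the PID of one of the sh processes reported by lsof for $$.
grep 'Max open files' /proc/$$/limits
```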
As written, the script being run requires a short burst of a large number of open files, from which individual entries are collected and transferred to another composite file. The failure always occurs at the 512th file, which seems to indicate that it is bumping into a limit.
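A quick way to confirm what limit the batch shell itself inherits (a hypothetical check, intended to be run from inside a submitted job) is to ask an sh child directly:

```shell
#!/bin/sh
# Print the soft and hard nofile limits as seen by an sh child,
# which is what the batch job's script actually inherits.
sh -c 'ulimit -Sn; ulimit -Hn'
```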
What needs to be changed to increase the limits for the processes running under sh?