Ingress controller reported: epoll_create() failed (24: Too many open files)
The Ingress controller does not work on a POWER machine that has 160 cores. The Ingress controller might fail when it runs on a node with a large number of cores.
The Ingress controller might be running on a node that has too many cores. The maximum number of open file descriptors is calculated with the following formula: (RLIMIT_NOFILE/worker-processes) - 1024. When one worker process is started per core, a large core count makes this value very small or even negative. To resolve the issue, you can either decrease the number of worker processes or increase the RLIMIT_NOFILE value of the container.
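To see how the formula behaves as the worker count grows, it can be worked through numerically. The following is a minimal Python sketch, not the controller's actual code. The function name, the RLIMIT_NOFILE value of 65536, and the use of integer division are assumptions made for illustration.

```python
def max_open_files_per_worker(rlimit_nofile: int, worker_processes: int) -> int:
    """Apply the formula (RLIMIT_NOFILE / worker-processes) - 1024."""
    if worker_processes < 1:
        raise ValueError("worker_processes must be at least 1")
    # Integer division is an assumption; the formula does not specify rounding.
    return rlimit_nofile // worker_processes - 1024


if __name__ == "__main__":
    assumed_rlimit = 65536  # hypothetical container limit, not taken from the source
    for workers in (2, 16, 160):
        print(workers, max_open_files_per_worker(assumed_rlimit, workers))
```

With these assumed numbers, 2 workers leave a budget of 31744 descriptors, while 160 workers leave a negative budget of -615.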
Solution one: Edit the configMap of nginx-ingress-controller with a decreased value of worker-processes.
To edit the configmap of nginx-ingress-controller, run the following command:
kubectl -n kube-system edit cm nginx-ingress-controller
Add worker-processes: "2" to the configMap, as shown in the following example. Note: The appropriate value might not be 2, depending on your sysctl configuration.
# Edit the following object. Lines beginning with a '#' are ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  body-size: "0"
  disable-access-log: "true"
  worker-processes: "2"
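If interactive editing is inconvenient, the same change can probably be applied as a merge patch with kubectl patch. The Python sketch below only builds and prints the command; it does not run it. The helper name build_patch_command is hypothetical, and the namespace and configMap name are copied from the edit command above.

```python
import json
import shlex
from typing import List


def build_patch_command(worker_processes: int) -> List[str]:
    """Build a kubectl merge-patch command that sets worker-processes."""
    # ConfigMap values are strings, so the number is quoted in the patch.
    patch = {"data": {"worker-processes": str(worker_processes)}}
    return [
        "kubectl", "-n", "kube-system",
        "patch", "cm", "nginx-ingress-controller",
        "--type", "merge",
        "-p", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Print a shell-safe form of the command for review before running it.
    print(shlex.join(build_patch_command(2)))
```

Printing the command first makes it easy to review the patch before applying it to the cluster.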
Just deploy a fresh k8s cluster with crio, deploy ingress-nginx and try to load ingresses. Nginx throws errors because it cannot respawn its worker processes:
2022/04/05 09:27:49 [alert] 56#56: sendmsg() failed (9: Bad file descriptor)
2022/04/05 09:27:49 [alert] 56#56: sendmsg() failed (9: Bad file descriptor)
2022/04/05 09:27:49 [alert] 1415#1415: pthread_create() failed (11: Resource temporarily unavailable)
2022/04/05 09:27:49 [alert] 1411#1411: pthread_create() failed (11: Resource temporarily unavailable)
2022/04/05 09:27:49 [alert] 1431#1431: pthread_create() failed (11: Resource temporarily unavailable)
2022/04/05 09:27:50 [alert] 56#56: worker process 1190 exited with fatal code 2 and cannot be respawned
2022/04/05 09:27:50 [alert] 56#56: worker process 1191 exited with fatal code 2 and cannot be respawned
If I just switch from crio to docker, it works without any error, which is why I think the issue is related to crio and not to ingress-nginx.
The issue only happens if nginx has many worker processes. From my testing, I think the sweet spot is around 14 to 18 worker processes.
We ran into the issue because, by default, nginx-ingress uses an auto setting for worker processes, which spawns as many workers as there are detected cores. If you have large systems with xxx cores, the issue occurs.
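The effect of the auto setting can be modeled with a small Python sketch. This is a simplified illustration of the behavior described here, not nginx's implementation. The function name and the optional cap parameter are hypothetical, and the cap of 16 is an assumed value inside the 14 to 18 range reported above.

```python
import os
from typing import Optional


def resolve_worker_processes(setting: str, detected_cores: int,
                             cap: Optional[int] = None) -> int:
    """Model of 'auto' as one worker per detected core, with an optional cap."""
    workers = detected_cores if setting == "auto" else int(setting)
    if cap is not None:
        workers = min(workers, cap)
    return max(workers, 1)


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cores visible to this Python process
    print("auto on this host:", resolve_worker_processes("auto", cores))
    # 160 cores matches the POWER machine mentioned earlier.
    print("auto on 160 cores:", resolve_worker_processes("auto", 160))
    print("capped at 16:", resolve_worker_processes("auto", 160, cap=16))
```

An explicit worker-processes value plays the role of the cap in this model, since it replaces the per-core default entirely.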