Ticket #50 (new bug)

Opened 2 months ago

Last modified 2 months ago

need better error handling of "too many open files" condition

Reported by: carns
Assigned to: pcarns
Priority: minor
Component: BMI
Version: HEAD
Keywords: open files thread bmi
Cc:

Description

Kevin ran into this scenario on a server that had accidentally been configured with a low ulimit for the maximum number of open files:

(11:12:53 AM) hams: [E 06/23 05:55] Error: accept: Too many open files
(11:12:54 AM) hams: [E 06/23 05:55] src/io/bmi/bmi.c line 1046: Error: critical BMI_testcontext failure.
(11:12:54 AM) hams: [E 06/23 05:55] critical BMI failure.
(11:12:54 AM) hams: : Too many open files
(11:12:54 AM) hams: [E 06/23 05:55] bmi_thread_function thread terminating

In this specific case the server could print an error message, and maybe a suggestion about how to fix it, but then just try to keep processing. It would be better if the server recovered once the number of connections had dropped, rather than killing the BMI thread (which cripples the server until it is restarted).
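
A rough sketch of the "log it and keep going" behavior, to make the idea concrete. This is not the existing BMI code: accept_error_is_recoverable() and try_accept() are hypothetical names, and which errno values count as recoverable is an assumption that would need review.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of BMI): returns 1 if an accept() errno
 * indicates resource exhaustion that may clear up once connections close. */
int accept_error_is_recoverable(int err)
{
    switch (err) {
    case EMFILE:   /* per-process open file limit reached */
    case ENFILE:   /* system-wide file table is full */
    case ENOBUFS:  /* out of socket buffer space */
    case ENOMEM:   /* out of memory */
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical wrapper around accept().
 * Returns  1 and stores the new descriptor in *new_fd on success,
 *          0 if nothing was accepted but the caller should try again later,
 *         -1 on an error the caller should treat as fatal. */
int try_accept(int listen_fd, int *new_fd)
{
    int fd = accept(listen_fd, NULL, NULL);
    if (fd >= 0) {
        *new_fd = fd;
        return 1;
    }

    int err = errno;
    if (accept_error_is_recoverable(err)) {
        /* Report the problem and a hint, but do not tear anything down. */
        fprintf(stderr,
                "Error: accept: %s; not accepting new connections for now "
                "(check the open file limit, e.g. ulimit -n)\n",
                strerror(err));
        return 0;
    }
    if (err == EINTR || err == EAGAIN || err == EWOULDBLOCK ||
        err == ECONNABORTED) {
        /* Ordinary transient conditions; nothing to report. */
        return 0;
    }
    return -1;
}
```

Note that the pending connection stays in the listen queue after an EMFILE, so a caller driven by poll() would likely want to back off briefly when try_accept() returns 0 rather than spin on the same socket.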

Change History

06/23/08 10:42:06 changed by carns

Some other possible approaches:
- check getrlimit on startup and warn if limit is low?
- have a config file value for num open files?
- actually do a setrlimit if limit is low?
- warn in BMI if we detect that we are getting within a certain percentage of limit?
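
To illustrate the getrlimit/setrlimit ideas above, here is a minimal sketch using the standard POSIX calls. The function names, the "wanted" parameter (which could come from a config file value), and the percentage threshold are all hypothetical and only meant as a starting point.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical startup check (not existing server code).
 * Tries to make the soft RLIMIT_NOFILE limit at least `wanted`, raising it
 * up to the hard limit if needed.
 * Returns 0 if the limit is sufficient, 1 if it is still too low (a warning
 * is printed), -1 if the limit could not be queried. */
int ensure_open_file_limit(rlim_t wanted)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur == RLIM_INFINITY || rl.rlim_cur >= wanted)
        return 0;

    /* The soft limit may only be raised as far as the hard limit. */
    rlim_t target = wanted;
    if (rl.rlim_max != RLIM_INFINITY && rl.rlim_max < target)
        target = rl.rlim_max;
    if (target > rl.rlim_cur) {
        rl.rlim_cur = target;
        if (setrlimit(RLIMIT_NOFILE, &rl) != 0)
            perror("setrlimit");
    }

    /* Re-read the limit to see what is actually in effect. */
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return -1;
    }
    if (rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur < wanted) {
        fprintf(stderr,
                "Warning: open file limit is %llu, wanted at least %llu\n",
                (unsigned long long)rl.rlim_cur,
                (unsigned long long)wanted);
        return 1;
    }
    return 0;
}

/* Hypothetical runtime check: returns 1 if `in_use` descriptors is at or
 * above `percent` percent of `limit`, 0 otherwise (or if unlimited). */
int near_fd_limit(rlim_t in_use, rlim_t limit, unsigned percent)
{
    if (limit == RLIM_INFINITY)
        return 0;
    return (double)in_use >= (double)limit * percent / 100.0;
}
```

The server would have to track its own count of open sockets to feed near_fd_limit(); how BMI would obtain that count is left open here.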