#48040 Need to check if listener fd is disabled before closing
Closed: wontfix. Opened 9 years ago by mreynolds.

If the nunc-stans listener is disabled because the FDs are exhausted, the server crashes when it is stopped.


{{{
Program received signal SIGSEGV, Segmentation fault.
0x00007f28508547e3 in PR_Close () from /lib64/libnspr4.so
(gdb) where
#0 0x00007f28508547e3 in PR_Close () from /lib64/libnspr4.so
#1 0x000000000041d339 in slapd_daemon (ports=0x7fff55974da0) at ../ds/ldap/servers/slapd/daemon.c:1562
#2 0x0000000000426537 in main (argc=7, argv=0x7fff55974ed8) at ../ds/ldap/servers/slapd/main.c:1279
(gdb) up
#1 0x000000000041d339 in slapd_daemon (ports=0x7fff55974da0) at ../ds/ldap/servers/slapd/daemon.c:1562
1562 PR_Close( *fdesp );
(gdb) p *fdesp[0]
$4 = {methods = 0x0, secret = 0x1b08480, lower = 0x0, higher = 0x0, dtor = 0x0, identity = -1}
}}}

When the nunc-stans listener is enabled, we see:
{{{
(gdb) p *fdesp[0]
$5 = {methods = 0x7fb873e7e060, secret = 0x15e5300, lower = 0x0, higher = 0x0, dtor = 0x0, identity = 0}
}}}

Need to check that the identity is not set to -1 before closing the FD.
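The check proposed above amounts to a guard like the following. This is only a self-contained sketch: mock_fd, mock_close() and guarded_close() are hypothetical names, and mock_fd models just the two fields that the gdb dumps show changing (methods and identity); it is not NSPR's real PRFileDesc. The mock reports a second close instead of crashing, whereas the real second PR_Close() faulted, which is consistent with methods being 0x0 in the dump.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for NSPR's PRFileDesc: only the two fields the
 * gdb dumps show changing on close are modelled. */
typedef struct mock_fd {
    const void *methods; /* non-NULL while open, NULL after close */
    int identity;        /* 0 while open, -1 after close */
} mock_fd;

/* Dummy object so an "open" mock_fd has a non-NULL methods pointer. */
static const int mock_methods_table = 0;

/* Mimics the before/after states printed by gdb. Returns -1 for a
 * second close, where the real PR_Close crashed in the backtrace. */
static int mock_close(mock_fd *fd)
{
    if (fd->methods == NULL) {
        return -1;
    }
    fd->methods = NULL;
    fd->identity = -1;
    return 0;
}

/* The proposed guard: skip the close when identity is already -1.
 * Returns 1 if the close was skipped, otherwise mock_close()'s result. */
static int guarded_close(mock_fd *fd)
{
    assert(fd != NULL);
    if (fd->identity == -1) {
        return 1;
    }
    return mock_close(fd);
}
```

Note that such a guard also assumes the closed descriptor's memory is still readable and has not been reused, which is the kind of reliance on prfd internals that the next comment objects to.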

We should not have to check the internals of the prfd. Something odd is going on: it looks like we are closing the prfd twice, and the second close causes the crash.

So does disabling a listener via ns_job_modify(listener->ns_job, NS_JOB_DISABLE_ONLY) close the listener fd? If so, where? If not, how do we close the disabled listener fd?

Replying to [comment:4 rmeggins]:
> So does disabling a listener via ns_job_modify(listener->ns_job, NS_JOB_DISABLE_ONLY) close the listener fd? If so, where? If not, how do we close the disabled listener fd?

The FD is definitely being closed in nunc-stans. It appears to be happening here:
{{{
event_del(job->ns_event_fw_fd); --> this closes the FD
ns_event_fw_io_event_remove (ns_event_fw_ctx=0x2aab240, job=0x2af40c0) at ns_event_fw_event.c:98
#0 ns_event_fw_mod_io (ns_event_fw_ctx=0x2aab240, job=0x2af40c0) at ns_event_fw_event.c:181
#1 0x00007f9ed129269f in update_event (job=0x2af40c0) at ns_thrpool.c:163
#2 0x00007f9ed1292872 in event_q_notify (job=0x2af40c0) at ns_thrpool.c:215
#3 0x00007f9ed1293125 in ns_job_modify (job=0x2af40c0, job_type=0) at ns_thrpool.c:583
#4 0x000000000041c90b in ns_disable_listener (listener=0x2abb0d0) at ../ds/ldap/servers/slapd/daemon.c:1206
#5 0x000000000041fc06 in ns_handle_new_connection (job=0x2af40c0) at ../ds/ldap/servers/slapd/daemon.c:3215
#6 0x00007f9ed12929f9 in event_cb (job=0x2af40c0) at ns_thrpool.c:289
#7 0x00007f9ed12916bf in event_cb (fd=7, event=2, arg=0x2af40c0) at ns_event_fw_event.c:48
#8 0x00007f9ece237a44 in event_base_loop () from /lib64/libevent-2.0.so.5
#9 0x00007f9ed1291b9b in ns_event_fw_loop (ns_event_fw_ctx=0x2aab240) at ns_event_fw_event.c:252
#10 0x00007f9ed1292982 in event_loop_thread_func (arg=0x2ab22b0) at ns_thrpool.c:263
#11 0x00007f9ecee94c2b in _pt_root () from /lib64/libnspr4.so
}}}

Typical PR_Close (before and after):
{{{
(gdb) p *fd
$1 = {methods = 0x7fe1e2ac1060, secret = 0x11ef210, lower = 0x0, higher = 0x0, dtor = 0x0, identity = 0}
(gdb) p *fd
$2 = {methods = 0x0, secret = 0x11ef210, lower = 0x0, higher = 0x0, dtor = 0x0, identity = -1}
}}}

The latter is exactly how the FD looks when the listener is disabled.

AFAICT, event_del does not close the fd; it just tells the event framework to stop listening for events on this fd. I've looked at the libevent source code, and there is no call to close(ev->fd).
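One way to confirm this empirically is to probe the native descriptor before and after the disable call. The sketch below is a hypothetical helper, fd_is_open(), built on POSIX fcntl(F_GETFD), which fails with EBADF for a descriptor that is not open. A pipe stands in for the listener socket and libevent is not linked, so it only demonstrates the probe itself.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* True if fd currently refers to an open file description.
 * fcntl(F_GETFD) fails with EBADF only for a descriptor that is not open;
 * any other failure is treated conservatively as "open". */
static bool fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}

/* Demo: the descriptor stays open until something actually calls close()
 * on it. Returns 0 when the probe behaves as expected. */
static int probe_demo(void)
{
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }
    assert(fd_is_open(fds[0]));
    /* An event_del() on fds[0] would go here; libevent is not linked in
     * this sketch, so nothing touches the descriptor at this point. */
    assert(fd_is_open(fds[0]));
    close(fds[0]);
    close(fds[1]);
    return fd_is_open(fds[0]) ? 1 : 0;
}
```

In slapd one could obtain the native descriptor from the PRFileDesc (for example with NSPR's PR_FileDesc2NativeHandle) and probe it around the ns_job_modify() call. A closed descriptor number can be reused by a later open, so this is a debugging aid rather than a correctness check.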

Okay, this is what is happening. The FD table gets full, and we go to disable the listener:
{{{
#0 ns_job_modify (job=0x2671120, job_type=0) at ns_thrpool.c:581
#1 0x000000000041c90b in ns_disable_listener (listener=0x2637160) at ../ds/ldap/servers/slapd/daemon.c:1206
#2 0x000000000041fc06 in ns_handle_new_connection (job=0x2671120) at ../ds/ldap/servers/slapd/daemon.c:3217
#3 0x00007f173afad9f9 in event_cb (job=0x2671120) at ns_thrpool.c:289
#4 0x00007f173afac6bf in event_cb (fd=7, event=2, arg=0x2671120) at ns_event_fw_event.c:48
#5 0x00007f1737f52a44 in event_base_loop () from /lib64/libevent-2.0.so.5
#6 0x00007f173afacb9b in ns_event_fw_loop (ns_event_fw_ctx=0x25fb900) at ns_event_fw_event.c:252
#7 0x00007f173afad982 in event_loop_thread_func (arg=0x262f2c0) at ns_thrpool.c:263
}}}

This code path sets job->job_type to 0. Then, during the shutdown event, we run into this code, which closes that FD:
{{{
#0 0x00007f06f90cf7e0 in PR_Close () from /lib64/libnspr4.so
#1 0x00007f06fb4e75a1 in internal_ns_job_done (job=0x129b120) at ns_thrpool.c:146
#2 0x00007f06fb4e75db in update_event (job=0x129b120) at ns_thrpool.c:158
#3 0x00007f06fb4e7872 in event_q_notify (job=0x129b120) at ns_thrpool.c:215
#4 0x00007f06fb4e7cc4 in ns_job_done (job=0x129b120) at ns_thrpool.c:382
#5 0x0000000000420272 in ns_set_shutdown (job=0x129b000) at ../ds/ldap/servers/slapd/daemon.c:3510
#6 0x00007f06fb4e79f9 in event_cb (job=0x129b000) at ns_thrpool.c:289
#7 0x00007f06fb4e66bf in event_cb (fd=15, event=8, arg=0x129b000) at ns_event_fw_event.c:48
#8 0x00007f06f848cd6b in event_base_loop () from /lib64/libevent-2.0.so.5
#9 0x00007f06fb4e6b9b in ns_event_fw_loop (ns_event_fw_ctx=0x1225900) at ns_event_fw_event.c:252
#10 0x00007f06fb4e7982 in event_loop_thread_func (arg=0x12592c0) at ns_thrpool.c:263
}}}

So in internal_ns_job_done(), since job->job_type was set to 0, we close the FD when the listener is disabled. When the listener is enabled, the job type is not zero (it's 321), and the FD is not closed. slapd_daemon() then calls PR_Close() on the same FD, which is the second close that crashes.
The question is: should we stop internal_ns_job_done() from closing FDs, or stick with my fix? It seems like we should leave internal_ns_job_done() as is.

Ah, ok. Looks like we should call this:
{{{
ns_disable_listener(listener_info *listener)
...
ns_job_modify(listener->ns_job, NS_JOB_DISABLE_ONLY|NS_JOB_PRESERVE_FD);
...
}}}
That should tell nunc-stans not to close the fd.

Replying to [comment:8 rmeggins]:

I was just looking into those flags :-) Very good, new patch attached.

{{{
To git+ssh://git.engineering.redhat.com/srv/git/users/mareynol/ds.git
   9bef065..14d8059  nunc-stans -> nunc-stans
commit 14d80591be13c82ca63248f18dbd6a6c45379792
}}}

Metadata Update from @mreynolds:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.3.4 backlog

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1371

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago
