#48412 worker threads do not detect abnormally closed connections
Closed: wontfix None Opened 8 years ago by mreynolds.

This only applies to post-nunc-stans: 1.3.4 and up

When a connection is abruptly closed due to an error, and there is still some data left to be read(buffer offset vs buffer bytes), connection_threadmain starts to loop. The connection is marked as closed, which prevents the buffer bytes/offset to be updated(e.g. in connection_read_operation()). So the worker thread endlessly tries to read the "data" over and over, but it should just remove the connection from the conn table since it's marked as closed.

While it's looping it's repeatedly taking the factory extension lock which creates a lot of contention with the other worker threads. Ultimately, all the worker threads are trying to read data on closed connections, and the server stops accepting new connections.


There are a few places where conn_buffered_data_avail_nolock is called:

In connection_read_operation() at line 1153: ok because just before this the connection was checked for invalid or closing

In connection_read_operation() at line 1280: I think conn_buffered_data_avail_nolock() needs some sort of way to return the fact that the connection is closed or invalid. In this case, at this location, the code needs to set ret = CONN_DONE and goto DONE, but only in the case where the connection is closed.

In connection_threadmain at line 1786: not sure what it should do here if the connection is closed or invalid - perhaps set more_data = 0 and thread_turbo_flag = 0, and just let the connection go through the normal release process in lines 1790-1816?

Replying to [comment:3 rmeggins]:

There are a few places where conn_buffered_data_avail_nolock is called:

In connection_read_operation() at line 1153: ok because just before this the connection was checked for invalid or closing

In connection_read_operation() at line 1280: I think conn_buffered_data_avail_nolock() needs some sort of way to return the fact that the connection is closed or invalid. In this case, at this location, the code needs to set ret = CONN_DONE and goto DONE, but only in the case where the connection is closed.

In connection_threadmain at line 1786: not sure what it should do here if the connection is closed or invalid - perhaps set more_data = 0 and thread_turbo_flag = 0, and just let the connection go through the normal release process in lines 1790-1816?

This is what the code does in 1.3.3. We don't check the buffer bytes/offset and just close it. The current fix does set more_data to 0, and the turbo flag is already 0.

Maybe we need to return -1 to show that the connection was closed instead of zero data to read? Then we can catch it in connection_read_operation() as you suggested above.

I'll work on this...

Replying to [comment:4 mreynolds]:

Replying to [comment:3 rmeggins]:

There are a few places where conn_buffered_data_avail_nolock is called:

In connection_read_operation() at line 1153: ok because just before this the connection was checked for invalid or closing

In connection_read_operation() at line 1280: I think conn_buffered_data_avail_nolock() needs some sort of way to return the fact that the connection is closed or invalid. In this case, at this location, the code needs to set ret = CONN_DONE and goto DONE, but only in the case where the connection is closed.

In connection_threadmain at line 1786: not sure what it should do here if the connection is closed or invalid - perhaps set more_data = 0 and thread_turbo_flag = 0, and just let the connection go through the normal release process in lines 1790-1816?

This is what the code does in 1.3.3. We don't check the buffer bytes/offset and just close it. The current fix does set more_data to 0, and the turbo flag is already 0.

Maybe we need to return -1 to show that the connection was closed instead of zero data to read? Then we can catch it in connection_read_operation() as you suggested above.

Ok, but note that the function currently returns size_t, which is unsigned. Maybe have it return PRInt64?

I'll work on this...

Replying to [comment:5 rmeggins]:

Maybe we need to return -1 to show that the connection was closed instead of zero data to read? Then we can catch it in connection_read_operation() as you suggested above.

Ok, but note that the function currently returns size_t, which is unsigned. Maybe have it return PRInt64?

Yeah I noticed that, so I just added an integer pointer parameter to mark if the connection is closed. This is cleaner anyway. Wrapping up some testing now, then I'll attach the new patch...

e3c009f..30c4852 master -> master
commit 30c4852
Author: Mark Reynolds mreynolds@redhat.com
Date: Fri Jan 15 11:35:16 2016 -0500

Forgot to commit to 1.3.4:

407c545..cd45d03 389-ds-base-1.3.4 -> 389-ds-base-1.3.4
commit cd45d03
Author: Mark Reynolds mreynolds@redhat.com
Date: Fri Jan 15 11:35:16 2016 -0500

Metadata Update from @rmeggins:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.3.4.7

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1743

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago

Login to comment on this ticket.

Metadata