On a multi-CPU system, the ns-slapd process aborts. The stack trace shows it aborted in a PR_ASSERT() at the following location in connection.c (release version of 389 dirsrv is 1.2.0). The problem occurs on the latest code as well.
connection.c:

{{{
1690 int connection_read_operation(Connection *conn, Operation *op, ber_tag_t *tag, int *remaining_data)
1691 {
...
1734     }
1735     /* If we still haven't seen a complete PDU, read from the network */
1736     while (*tag == LBER_DEFAULT) {
1737         int ioblocktimeout_waits = config_get_ioblocktimeout() / CONN_TURBO_TIMEOUT_INTERVAL;
1738         /* We should never get here with data remaining in the buffer */
1739         PR_ASSERT( !new_operation || 0 == (conn->c_private->c_buffer_bytes - conn->c_private->c_buffer_offset) ); <-- aborted calling PR_ASSERT()
1740         /* We make a non-blocking read call */
}}}
Using an instrumented image I collected data and found the following:
While one thread is working on a connection's private buffer, another thread belonging to an older connection makes the connection readable again by setting c_gettingber to zero. This wakes up one more thread. As a result, more than one thread works simultaneously on the connection's private buffer, and ns-slapd aborts.
It happens as below:
Thread X is working on Connection-1's last operation.
After performing the LDAP operation (connection_dispatch_operation()), it decrements the connection's c_refcnt by calling connection_release_nolock() in the connection_threadmain() function.
Since this is the last operation, c_refcnt becomes zero.
Thread X has yet to execute the code that updates c_threadnumber and c_gettingber. But as soon as the connection's c_refcnt becomes zero, the main thread allocates the same connection object/structure to a new incoming connection, Connection-2. Thread Y starts executing an operation belonging to this newer connection (Connection-2).
connection.c:

{{{
2008 static void
2009 connection_threadmain()
2010 {
...
2187             if (!thread_turbo_flag && !more_data) {
2188                 connection_release_nolock(conn); <-- Thread-X decremented c_refcnt and now c_refcnt = 0
2189             }
2190             PR_Unlock( conn->c_mutex );
2191         }
2192         /* Since we didn't do so earlier, we need to make a replication connection readable again here */
2193         if ( ((1 == is_timedout) || (replication_connection && !thread_turbo_flag)) && !more_data)
2194             connection_make_readable(conn);
2195         pb = NULL;
2196
2197         if (!thread_turbo_flag && !more_data) { /* Don't do this in turbo mode */
2198             PR_Lock( conn->c_mutex );
2199             /* if the threadnumber is now below the maximum, wakeup
2200              * the listener thread so that we start polling on this
2201              * connection again
2202              */
2203             /* DBDB I think this code is bogus -- we already signaled the listener above here */
2204             if (conn->c_threadnumber == config_get_maxthreadsperconn())
2205                 need_wakeup = 1;
2206             else
2207                 need_wakeup = 0;
2208             conn->c_threadnumber--;
2209             PR_Unlock( conn->c_mutex );
2210
2211             if (need_wakeup)
2212                 signal_listner();
2213         }
2214
2215
2216     } /* while (1) */
2217 }
}}}
Thread Y starts processing operation data belonging to the newer connection, Connection-2 (connection_read_operation()).
Now Thread X of Connection-1 calls connection_make_readable() and makes the connection readable again by setting c_gettingber to 0.
This wakes up another thread for Connection-2, which also starts processing operation data (connection_read_operation()).
So at this point more than one thread is working on the connection's private data.
As two threads are working on the connection's c_buffer, c_buffer_bytes and c_buffer_offset, at some point the conditions are met to abort() at the following location:
Connection.c: connection_read_operation()
{{{
1734     }
1735     /* If we still haven't seen a complete PDU, read from the network */
1736     while (*tag == LBER_DEFAULT) {
1737         int ioblocktimeout_waits = config_get_ioblocktimeout() / CONN_TURBO_TIMEOUT_INTERVAL;
1738         /* We should never get here with data remaining in the buffer */
1739         PR_ASSERT( !new_operation || 0 == (conn->c_private->c_buffer_bytes - conn->c_private->c_buffer_offset) ); <-- aborted calling PR_ASSERT()
1740         /* We make a non-blocking read call */
}}}
A probable fix could be:
In the connection_threadmain() function, if the result is CONN_DONE, don't make the connection readable again after decrementing c_refcnt.
So should we handle the CONN_DONE and CONN_TIMEDOUT situations differently, as below?
Present code:
{{{
2097     switch (ret) {
2098     case CONN_DONE:
2099         /* This means that the connection was closed, so clear turbo mode */
2100         /*FALLTHROUGH*/
2101     case CONN_TIMEDOUT:
2102         thread_turbo_flag = 0;
2103         is_timedout = 1;
}}}
new code:
{{{
2097     switch (ret) {
2098     case CONN_DONE:
2099         /* This means that the connection was closed, so clear turbo mode */
             is_conn_done = 0;      <--
             thread_turbo_flag = 0; <--
2101     case CONN_TIMEDOUT:
2102         thread_turbo_flag = 0;
2103         is_timedout = 1;
}}}
Note that you don't have to set thread_turbo_flag = 0 in the CONN_DONE case because it will be set when it falls through to the CONN_TIMEDOUT case.
Present:
{{{
2193         if ( ((1 == is_timedout) || (replication_connection && !thread_turbo_flag)) && !more_data )
2194             connection_make_readable(conn);
2195         pb = NULL;
}}}
new:
{{{
2193 ------>  if ( !is_conn_done && ((1 == is_timedout) || (replication_connection && !thread_turbo_flag)) && !more_data )
2194             connection_make_readable(conn);
2195         pb = NULL;
}}}
If ret is CONN_DONE, but you set is_conn_done = 0, how does the code know the connection is really done?
If I understand the problem correctly, it is that we should not touch the conn object at all after doing the connection_release_nolock() - is that correct?
So perhaps the simplest solution would be to move the connection_release_nolock() to after the code that does the connection_make_readable() and the threadnumber/wakeup handling. Also, in the doshutdown case, we have to release the connection before doing the return. We might also need a connection_make_readable_nolock() so that we can reset the gettingber flag, decrement the refcount, and do the threadnumber handling inside the same mutex.
I think in your proposed solution, the code will not do the connection_make_readable() until it loops again - this seems like a minor change in behavior, but the code in connection_threadmain is very sensitive - small changes can have very large unintended consequences.
reproducer - c code
reproducer - header
reproducer - Makefile
As you said, decrementing the reference count should be the last thing before control goes back to the start of the while loop in connection_threadmain(). But does this change have any ill effects, given that signal_listner() and connection_make_readable() are called before decrementing the reference count? (That is, waking up the listener thread and setting gettingber to zero before decrementing the reference count.) I'd appreciate your input on this.
connection_make_readable() - as long as all code that accesses gettingber first acquires the conn->c_mutex lock, it should be fine
signal_listner() - probably better to set the need_wakeup flag inside the mutex, then call signal_listner() after doing the connection_release and the PR_Unlock()
In case of sync operations, most of the time the thread number doesn't become equal to maxthreadsperconn. You'll have to heavily load the server to see that case with the default, or just lower the maxthreadsperconn value to 2.

If it becomes equal (need_wakeup=1), how does calling signal_listner() (waking up early from select) help to poll on the connection again?

signal_listner() will write to the signal_pipe, which will wake up this POLL_FN in daemon.c:

{{{
select_return = POLL_FN(the_connection_table->fd, num_poll, pr_timeout);
}}}
If there is any "real" data to read, the code will then call handle_new_connection() and/or handle_pr_read_ready(). If not, then it will loop back around and call setup_pr_read_pds() to add connections that are ready to read data back to the poll array.
Similarly, we call signal_listner() in connection_make_readable() after setting gettingber to zero. Why do we wake up early when we set gettingber to zero?

The connection is ready to read the next operation. If POLL_FN (see above) is idle, we don't want to wait for it to time out; we want it to break immediately and add the connection back to the poll array (in setup_pr_read_pds()).

Could you please briefly explain the significance of signal_listner()?

See above.

And how is c_threadnumber used?

c_threadnumber is used to limit the number of threads servicing a single connection. We don't want a single connection to use up all of the threads.
jpeg of code-diff tool
One problem is that the function connection_make_readable() called at line 36 does this:
{{{
void
connection_make_readable(Connection *conn)
{
    PR_Lock( conn->c_mutex );
    conn->c_gettingber = 0;
    PR_Unlock( conn->c_mutex );
    signal_listner();
}
}}}

That's why I suggested creating a new version, connection_make_readable_nolock(), which does not acquire and release the c_mutex and does not call signal_listner(). That way, you could do the conn->c_gettingber = 0, the refcnt decrement, and the threadnumber decrement all inside the same mutex.
Another problem is at line 43 - I think you meant conn->c_gettingber instead of c_gettingber.
Tested 1.2.10.2 with 32 simultaneous reproducer programs in an endless loop, but it could not duplicate the problem.
The test system: OS - Fedora 15 64-bit; HW - 4 CPUs.
tentative patch
I ran the same test against the locally built server with the above patch. There was no problem in the run nor in the shutdown, either.
There is no good evidence that the original problem reported on 1.2.0 still exists, nor that the patch solves the problem. So, for now, we are deferring this ticket.
set default ticket origin to Community
Added initial screened field value.
I need to run some more testing. Moved the milestone to 1.3.0 rc1.
git patch file (master) -- very close to the original patch from the reporter. It passed the long running stress test.
git patch file (master) -- with connection_make_readable_nolock, which does not acquire and release the c_mutex and does not call signal_listner(). It does not pass the stress test. Typical error: Unable to receive the response for a startReplication extended operation to consumer (Timed out)
git patch file (master)
Bug description: When a connection is to be released, the current code releases the connection object before making it readable, which leaves a small window in which multiple threads can access the same private buffer.
Fix description: This patch moves the release of the connection object to after the connection is made readable.
Nice!
Reviewed by Rich (Thanks!!)
Pushed to master: commit 48b35ec
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=918696
Metadata Update from @nkinder: - Issue assigned to nhosoi - Issue set to the milestone: 1.3.0.rc1
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/276
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)