#276 Multiple threads simultaneously working on connection's private buffer causes ns-slapd to abort
Closed: wontfix None Opened 12 years ago by rmeggins.

On a multi-CPU system, the ns-slapd process aborts. The stack trace shows it aborted in a PR_ASSERT() call at the following location in connection.c (the release version of 389 Directory Server is 1.2.0; the problem occurs on the latest code as well).

connection.c
1690 int connection_read_operation(Connection *conn, Operation *op,
                                   ber_tag_t *tag, int *remaining_data)
1691 {

1734         }
1735         /* If we still haven't seen a complete PDU, read from the network */
1736         while (*tag == LBER_DEFAULT) {
1737                 int ioblocktimeout_waits = config_get_ioblocktimeout() / CONN_TURBO_TIMEOUT_INTERVAL;
1738                 /* We should never get here with data remaining in the buffer */
1739                 PR_ASSERT( !new_operation || 0 == (conn->c_private->c_buffer_bytes - conn->c_private->c_buffer_offset) );<------------------ aborted calling PR_ASSERT()
1740                 /* We make a non-blocking read call */
-----
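For context, the quantity the assertion tests is simply the number of unread bytes left in the connection's private buffer. Restated for readability (the local variable and its type are illustrative, not from the source):

{{{
/* Restatement of the assertion above, for readability only. */
int remaining = conn->c_private->c_buffer_bytes - conn->c_private->c_buffer_offset;

/* A new operation must start with an empty private buffer; leftover bytes
 * mean another thread has been reading into (or consuming from) the same
 * buffer, which is exactly the symptom analyzed below. */
PR_ASSERT(!new_operation || remaining == 0);
}}}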

Using an instrumented image, I collected data and found the following:

While one thread is working on a connection's private buffer, another thread, belonging to the older connection that previously used the same connection structure, makes the connection readable again by setting c_gettingber to zero.
This wakes up one more thread. In the end, more than one thread works simultaneously on the connection's private buffer, and ns-slapd aborts.

It happens as below:

  1. Thread-X is working on Connection-1's last operation.

  2. After performing the LDAP operation (connection_dispatch_operation()), it decrements the connection's c_refcnt by calling connection_release_nolock() in connection_threadmain().

  3. Since this is the last operation, c_refcnt becomes zero.

  4. Thread-X has not yet executed the code that updates c_threadnumber and c_gettingber. But as soon as the connection's c_refcnt becomes zero, the main thread allocates the same connection object/structure to a new incoming connection, Connection-2, and Thread-Y starts executing an operation that belongs to this newer connection.

    connection.c

    2008 static void
    2009 connection_threadmain()
    2010 {

    2187                 if (!thread_turbo_flag && !more_data) {
    2188                         connection_release_nolock(conn);<------------------ Thread-X decremented c_refcnt and now c_refcnt = 0
    2189                 }
    2190                 PR_Unlock( conn->c_mutex );
    2191         }
    2192         /* Since we didn't do so earlier, we need to make a replication connection readable again here */
    2193         if ( ((1 == is_timedout) || (replication_connection && !thread_turbo_flag)) && !more_data )
    2194                 connection_make_readable(conn);
    2195         pb = NULL;
    2196
    2197         if (!thread_turbo_flag && !more_data) { /* Don't do this in turbo mode */
    2198                 PR_Lock( conn->c_mutex );
    2199                 /* if the threadnumber is now below the maximum, wakeup
    2200                  * the listener thread so that we start polling on this
    2201                  * connection again
    2202                  */
    2203                 /* DBDB I think this code is bogus -- we already signaled the listener above here */
    2204                 if (conn->c_threadnumber == config_get_maxthreadsperconn())
    2205                         need_wakeup = 1;
    2206                 else
    2207                         need_wakeup = 0;
    2208                 conn->c_threadnumber--;
    2209                 PR_Unlock( conn->c_mutex );
    2210
    2211                 if (need_wakeup)
    2212                         signal_listner();
    2213         }
    2214
    2215
    2216 } /* while (1) */
    2217 }

  5. Thread-Y starts processing operation data belonging to the newer connection, Connection-2 (connection_read_operation()).

  6. Now Thread-X of Connection-1 calls connection_make_readable() and makes the connection readable again by setting c_gettingber to 0.

  7. This wakes up another thread for Connection-2, which also starts processing operation data (connection_read_operation()).

  8. So at this point more than one thread is working on the connection's private data.

  9. With two threads working on the connection's c_buffer, c_buffer_bytes, and c_buffer_offset, at some point the conditions are met to abort() at the following location (a toy sketch of this ordering follows the excerpt):

connection.c: connection_read_operation()

1734         }
1735         /* If we still haven't seen a complete PDU, read from the network */
1736         while (*tag == LBER_DEFAULT) {
1737                 int ioblocktimeout_waits = config_get_ioblocktimeout() / CONN_TURBO_TIMEOUT_INTERVAL;
1738                 /* We should never get here with data remaining in the buffer */
1739                 PR_ASSERT( !new_operation || 0 == (conn->c_private->c_buffer_bytes - conn->c_private->c_buffer_offset) );<------------------ aborted calling PR_ASSERT()
1740                 /* We make a non-blocking read call */
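To make the window concrete, here is a toy model of the ordering described in steps 1 through 9. It is not the actual connection.c code: the struct is stripped down, the field names merely mirror the real ones, and the listener/worker interplay is only indicated in comments.

{{{
/* Toy model of the ordering problem; not the real connection.c code. */
#include <pthread.h>

typedef struct toy_conn {
    pthread_mutex_t c_mutex;
    int c_refcnt;       /* when this drops to 0, the listener may recycle the slot   */
    int c_gettingber;   /* non-zero while a worker owns the connection's read buffer */
    int c_threadnumber; /* number of worker threads servicing this connection        */
} toy_conn;

/* Tail of connection_threadmain() as the current code orders it (simplified). */
static void
worker_finish_operation(toy_conn *conn)
{
    pthread_mutex_lock(&conn->c_mutex);
    conn->c_refcnt--;                     /* steps 2-3: last reference dropped       */
    pthread_mutex_unlock(&conn->c_mutex);

    /* The window opens here: with c_refcnt == 0 the listener can hand this
     * very slot to Connection-2, and Thread-Y starts reading into its buffer. */

    pthread_mutex_lock(&conn->c_mutex);   /* connection_make_readable()              */
    conn->c_gettingber = 0;               /* step 6: Thread-X marks the *recycled*   */
    pthread_mutex_unlock(&conn->c_mutex); /* slot readable, waking a second worker   */
                                          /* for Connection-2 (step 7)               */
    pthread_mutex_lock(&conn->c_mutex);
    conn->c_threadnumber--;               /* threadnumber bookkeeping also touches   */
    pthread_mutex_unlock(&conn->c_mutex); /* the already-recycled slot               */
}
}}}

The fixes discussed below all amount to keeping the reference-count decrement last, so the slot cannot be recycled while Thread-X is still touching it.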

A probable fix could be:

In connection_threadmain(), if ret is CONN_DONE, don't make the connection readable again after decrementing c_refcnt.

So should we handle the CONN_DONE and CONN_TIMEDOUT situations differently, as below?

Present code:

2097                 switch (ret) {
2098                         case CONN_DONE:
2099                                 /* This means that the connection was closed, so clear turbo mode */
2100                         /*FALLTHROUGH*/
2101                         case CONN_TIMEDOUT:
2102                                 thread_turbo_flag = 0;
2103                                 is_timedout = 1;

new code:

2097                 switch (ret) {
2098                         case CONN_DONE:
2099                                 /* This means that the connection was closed, so clear turbo mode */
                                         is_conn_done = 0;<-----------------
                                         thread_turbo_flag = 0;<-------------
2101                         case CONN_TIMEDOUT:
2102                                 thread_turbo_flag = 0;
2103                                 is_timedout = 1;

Note that you don't have to set thread_turbo_flag = 0 in the CONN_DONE case because it will be set when it falls through to the CONN_TIMEDOUT case.

Present:

2193                 if ( ((1 == is_timedout) || (replication_connection && !thread_turbo_flag)) && !more_data )
2194                         connection_make_readable(conn);
2195                 pb = NULL;

new:

2193    --------->   if ( !is_conn_done && ((1 == is_timedout) || (replication_connection && !thread_turbo_flag)) && !more_data )
2194                         connection_make_readable(conn);
2195                 pb = NULL;

If ret is CONN_DONE, but you set is_conn_done = 0, how does the code know the connection is really done?
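(Presumably the proposal means to set the flag rather than clear it. A minimal sketch of that reading, keeping the fallthrough note above; is_conn_done is assumed to be a local flag reset to 0 at the top of each pass through the loop:)

{{{
switch (ret) {
case CONN_DONE:
        /* The connection was closed; remember that, then fall through
         * so the CONN_TIMEDOUT case clears turbo mode and sets is_timedout. */
        is_conn_done = 1;
        /* FALLTHROUGH */
case CONN_TIMEDOUT:
        thread_turbo_flag = 0;
        is_timedout = 1;
}}}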


If I understand the problem correctly, it is that we should not touch
the conn object at all after doing the connection_release_nolock() -
is that correct?

So perhaps the simplest solution will be to move the
connection_release_nolock() to after the code that does the
connection_make_readable() and the threadnumber/wakeup handling? And
also in the doshutdown case - have to release the connection before
doing the return. Might also need a
connection_make_readable_nolock() so that we can reset the gettingber
flag and decrement the refcount and do the threadnumber handling inside the same mutex.

I think in your proposed solution, the code will not do the
connection_make_readable() until it loops again - this seems like a
minor change in behavior, but the code in connection_threadmain is
very sensitive - small changes can have very large unintended consequences.

As you said, decrementing the reference count should be the last thing before control goes back to the start of the while loop in connection_threadmain. But does this change have any ill effects, given that signal_listner()
and connection_make_readable() are called before decrementing the reference count? (That is, waking up the listener thread and
setting gettingber to zero before decrementing the reference count.) I would appreciate your input on this.

connection_make_readable() - as long as all code that accesses
gettingber first acquires the conn->c_mutex lock, it should be fine

signal_listner() - probably better to set the need_wakeup flag inside
the mutex, then call signal_listner() after doing the connection_release
and the PR_Unlock()
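Put together, the suggested ordering would look roughly like this. This is a sketch of the comment above only, not the committed patch; connection_make_readable_nolock() is the hypothetical helper mentioned earlier, and need_wakeup is a local flag.

{{{
/* Sketch of the suggested ordering, not the committed code. */
PR_Lock( conn->c_mutex );
/* remember whether the listener needs a wakeup, while we still hold the mutex */
if (conn->c_threadnumber == config_get_maxthreadsperconn())
        need_wakeup = 1;
else
        need_wakeup = 0;
conn->c_threadnumber--;
connection_make_readable_nolock(conn);  /* hypothetical: just conn->c_gettingber = 0 */
connection_release_nolock(conn);        /* drop the reference last, inside the mutex */
PR_Unlock( conn->c_mutex );
if (need_wakeup)
        signal_listner();               /* wake the poll loop outside the lock */
}}}

(As noted further down, the variant built around connection_make_readable_nolock() did not survive the stress test, so treat this purely as an illustration of the suggestion above.)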

In the case of sync operations, most of the time the thread number doesn't become equal to maxthreadsperconn.
You'll have to heavily load the server to see that case in the default, or just lower the maxthreadsperconn value to 2.

If it becomes equal (need_wakeup = 1), how does calling signal_listner() (waking up early from select) help to poll on the connection again?
signal_listner() writes to the signal pipe, which wakes up this POLL_FN call in daemon.c:
select_return = POLL_FN(the_connection_table->fd, num_poll, pr_timeout);

If there is any "real" data to read, the code will then call
handle_new_connection() and/or handle_pr_read_ready(). If not, it will loop back around and call setup_pr_read_pds() to add connections that are ready to read data back to the poll array.
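For readers unfamiliar with the mechanism, this is the usual self-pipe pattern. The following is a generic illustration of it, not the actual daemon.c code; signal_pipe is assumed to have been created with pipe() at startup, and the rebuild step stands in for setup_pr_read_pds().

{{{
#include <poll.h>
#include <unistd.h>

static int signal_pipe[2];          /* [0] is polled, [1] is written by workers */

static void
wake_poll_loop(void)                /* what signal_listner() amounts to */
{
    char c = 0;
    (void)write(signal_pipe[1], &c, 1);  /* one byte is enough to make [0] readable */
}

static void
poll_loop(struct pollfd *fds, int nfds, int timeout_ms)
{
    for (;;) {
        fds[0].fd = signal_pipe[0];
        fds[0].events = POLLIN;
        if (poll(fds, nfds, timeout_ms) > 0 && (fds[0].revents & POLLIN)) {
            char buf[64];
            (void)read(signal_pipe[0], buf, sizeof(buf));  /* drain the pipe */
        }
        /* rebuild the poll set here, picking up connections that became
         * readable in the meantime (the setup_pr_read_pds() step above) */
    }
}
}}}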

Similarly, we call signal_listner() in connection_make_readable() after setting gettingber to zero.
Why do we wake up early when we set gettingber to zero?
The connection is ready to read the next operation. If POLL_FN (see
above) is idle, we don't want to wait for it to time out; we want it to break immediately and add the connection back to the poll array (in setup_pr_read_pds()).

Could you please briefly explain the significance of signal_listner()?
See above.

And c_threadnumber, how is it used?
c_threadnumber is used to limit the number of threads servicing a single connection. We don't want a single connection to use up all of the threads.
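In other words, something along these lines happens when a worker picks up an operation. This is a rough sketch of the bookkeeping described above, not the exact connection.c code:

{{{
/* Sketch only: decide whether this connection may take another worker thread. */
PR_Lock( conn->c_mutex );
if (conn->c_threadnumber >= config_get_maxthreadsperconn()) {
        /* this connection already has its full quota of worker threads;
         * leave the operation for later rather than starving other connections */
        PR_Unlock( conn->c_mutex );
        return;
}
conn->c_threadnumber++;  /* decremented again in the tail of connection_threadmain() */
PR_Unlock( conn->c_mutex );
}}}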

One problem is that the function connection_make_readable() called at line 36 does this:

{{{
void connection_make_readable(Connection *conn)
{
    PR_Lock( conn->c_mutex );
    conn->c_gettingber = 0;
    PR_Unlock( conn->c_mutex );
    signal_listner();
}
}}}
That's why I suggested creating a new version, connection_make_readable_nolock(), which does not acquire and release the c_mutex and does not call signal_listner(). That way, you could set conn->c_gettingber = 0, decrement the refcnt, and decrement the threadnumber all inside the same mutex.
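A minimal sketch of that helper, assuming the caller already holds conn->c_mutex and calls signal_listner() itself after unlocking:

{{{
/* Sketch only; the caller must hold conn->c_mutex and is responsible
 * for calling signal_listner() after releasing the lock. */
void
connection_make_readable_nolock(Connection *conn)
{
    conn->c_gettingber = 0;
}
}}}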

Another problem is at line 43 - I think you meant conn->c_gettingber instead of c_gettingber.

Tested 1.2.10.2 with 32 simultaneous reproducer programs in an endless loop, but could not duplicate the problem.

The test system: OS - Fedora 15 64-bit; HW - 4 CPUs.

I ran the same test against the locally built server with the above patch. There was no problem during the run or during shutdown, either.

There is no good evidence that the original problem reported on 1.2.0 still exists, nor that the patch solves it. So, for now, we are deferring this ticket.

set default ticket origin to Community

Added initial screened field value.

I need to run some more testing. Moved the milestone to 1.3.0 rc1.

git patch file (master) -- very close to the original patch from the reporter. It passed the long running stress test.

git patch file (master) -- with connection_make_readable_nolock which does not acquire and release the c_mutex, and does not signal_listner(). It does not pass the stress test. Typical error: Unable to receive the response for a startReplication extended operation to consumer (Timed out)

git patch file (master)

Bug description: When a connection is to be released, the current
code releases the connection object before making it readable,
which leaves a small window in which multiple threads can access
the same private buffer.

Fix description: This patch moves the release of the connection
object to after the connection is made readable.

Reviewed by Rich (Thanks!!)

Pushed to master: commit 48b35ec

Metadata Update from @nkinder:
- Issue assigned to nhosoi
- Issue set to the milestone: 1.3.0.rc1

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/276

If you want to receive further updates on the issue, please navigate to the github issue
and click on the subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago
