#49091 remove changelog semaphore
Closed: wontfix 6 years ago Opened 7 years ago by lkrispen.

Turning an older email thread into a ticket:

In cl5_api we have a semaphore which, according to a code comment, is used to limit the number of concurrent writes; the default value is 2.

But the semaphore is only used:

  • when writing an update to the changelog in write_changelog_and_ruv(), which is serialized by the backend lock, so there will only ever be one writer at a time
  • in log_ruv_elements (when the changelog is reloaded)
  • during ldif import

It is NOT used in changelog trimming, purging of cleaned RIDs, changelog compaction.

So, in the cases where we do use it, there will be no more than two parallel updates; in cases where there is, or could be, heavy update contention, such as changelog trimming while updates are applied, it is not used.
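To illustrate the point about the backend lock, here is a minimal sketch in C. It is not the cl5_api code: `demo_t`, `writer` and `max_concurrent_writers` are hypothetical names, a plain pthread mutex stands in for the backend lock, and an unnamed POSIX semaphore stands in for the changelog semaphore. It assumes a platform where `sem_init` is supported (e.g. Linux) and needs to be built with `-pthread`. Each writer takes the mutex first and the semaphore second, and the sketch records how many writers ever hold a semaphore slot at the same time.

```c
#include <pthread.h>
#include <semaphore.h>
#include <stdatomic.h>

#define MAX_THREADS 64

/* Hypothetical stand-ins, not the real backend or changelog structures. */
typedef struct {
    pthread_mutex_t backend_lock; /* plays the role of the backend lock */
    sem_t cl_sem;                 /* plays the role of the changelog semaphore */
    atomic_int in_sem;            /* writers currently holding a semaphore slot */
    atomic_int max_in_sem;        /* highest value in_sem ever reached */
    int writes_per_thread;
} demo_t;

static void *writer(void *arg)
{
    demo_t *d = arg;
    for (int i = 0; i < d->writes_per_thread; i++) {
        /* Outer lock: only one writer gets past this point at a time. */
        pthread_mutex_lock(&d->backend_lock);
        /* Inner semaphore: never has to wait, the mutex already serialized us. */
        sem_wait(&d->cl_sem);

        int now = atomic_fetch_add(&d->in_sem, 1) + 1;
        int seen = atomic_load(&d->max_in_sem);
        /* Record a new maximum; on CAS failure 'seen' is refreshed and we retry. */
        while (now > seen &&
               !atomic_compare_exchange_weak(&d->max_in_sem, &seen, now)) {
        }
        atomic_fetch_sub(&d->in_sem, 1);

        sem_post(&d->cl_sem);
        pthread_mutex_unlock(&d->backend_lock);
    }
    return NULL;
}

/* Returns the maximum number of writers seen inside the semaphore at once,
 * or -1 on invalid arguments or setup failure. */
int max_concurrent_writers(int n_threads, int sem_limit, int writes_per_thread)
{
    demo_t d;
    pthread_t tids[MAX_THREADS];
    int created = 0;

    if (n_threads < 1 || n_threads > MAX_THREADS || sem_limit < 1) {
        return -1;
    }
    if (sem_init(&d.cl_sem, 0, (unsigned int)sem_limit) != 0) {
        return -1;
    }
    pthread_mutex_init(&d.backend_lock, NULL);
    atomic_init(&d.in_sem, 0);
    atomic_init(&d.max_in_sem, 0);
    d.writes_per_thread = writes_per_thread;

    for (int i = 0; i < n_threads; i++) {
        if (pthread_create(&tids[i], NULL, writer, &d) != 0) {
            break;
        }
        created++;
    }
    for (int i = 0; i < created; i++) {
        pthread_join(tids[i], NULL);
    }

    int result = (created == n_threads) ? atomic_load(&d.max_in_sem) : -1;
    sem_destroy(&d.cl_sem);
    pthread_mutex_destroy(&d.backend_lock);
    return result;
}
```

Calling `max_concurrent_writers(4, 2, 1000)` starts four writer threads against a semaphore with two slots. Because every writer holds the mutex while it is inside the semaphore, the second slot can never be taken, which is the sense in which the semaphore adds nothing on this path.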

I think it was possibly useful when writing the changelog was a real postop, so the semaphore played the role of the backend lock.

So far I have only seen problems reported about the semaphore file, when it could not be deleted or recreated, and I don't see any benefit in keeping it.


It was introduced to solve deadlocks found in the reliab test suites.

If the tests pass w/o the semaphore, there is nothing to stop it. ;)

The comment at the place where the semaphore is created says:

{{{
/*
 * Considerations for setting up cl semaphore:
 * (1) The NT version of SleepyCat uses test-and-set mutexes
 * at the DB page level instead of blocking mutexes. That has
 * proven to be a killer for the changelog DB, as this DB is
 * accessed by multiple a reader threads (the repl thread) and
 * writer threads (the server ops threads) usually at the last
 * pages of the DB, due to the sequential nature of the changelog
 * keys. To avoid the test-and-set mutexes, we could use semaphore
 * to serialize the writers and avoid the high mutex contention
 * that SleepyCat is unable to avoid.
 * (2) DS 6.2 introduced the semaphore on all platforms (replaced
 * the serial lock used on Windows and Linux described above).
 * The number of the concurrent writes now is configurable by
 * nsslapd-changelogmaxconcurrentwrites (the server needs to
 * be restarted).
 */
}}}

So, in my understanding, it was used to replace a serial lock, but now writing to the changelog is done inside the txn and is already serialized by the backend lock.

On the other hand, there are deadlocks in the changelog (see ticket 49040) which are not prevented by the semaphore, since it is only used for writing.

I will try to run the reliab tests to get more confidence.

Metadata Update from @lkrispen:
- Issue set to the milestone: 1.3.6.0

7 years ago

Metadata Update from @firstyear:
- Custom field component reset (from Replication - General)
- Issue close_status updated to: None
- Issue tagged with: Hot, Investigate

7 years ago

Metadata Update from @mreynolds:
- Issue set to the milestone: 1.3.7 backlog (was: 1.3.6.0)

6 years ago

I tested with four masters and six backends, all replicated in full mesh, continuously applying changes to all backends with changelog trimming running every 30 sec.
No problems noticed, so I will send the following patch for review:

0001-remove-usage-of-changelog-semaphore.patch

Metadata Update from @lkrispen:
- Issue assigned to lkrispen

6 years ago

Metadata Update from @lkrispen:
- Custom field reviewstatus adjusted to review

6 years ago

The patch looks good to me. Ack

Metadata Update from @tbordaz:
- Custom field reviewstatus adjusted to ack (was: review)

6 years ago

Metadata Update from @mreynolds:
- Custom field component adjusted to None

6 years ago

@lkrispen Are we okay to merge this for you?

oops, it is already committed:
commit 13b8001

committed to master before the creation of the 1.3.7 branch, so we should be fine. no need to backport to 1.3.6

Metadata Update from @firstyear:
- Issue close_status updated to: fixed
- Issue status updated to: Closed (was: Open)

6 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2150

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: fixed)

3 years ago
