Issue #47975: Replication flow control: monitoring and tuning - 389-ds-base

Closed: wontfix 3 years ago by spichugi. Opened 9 years ago by tbordaz.

A replication agreement can hang during a replication session (total or incremental update), depending on how fast the consumer can process the updates/entries it receives.

Ticket https://fedorahosted.org/389/ticket/47942 implements flow control based on two configurable attributes (nsds5ReplicaFlowControlWindow and nsds5ReplicaFlowControlPause).

This new ticket is to enhance this flow control:
- make the default values (window/pause) match the general-purpose use case
- describe a procedure so that administrators can determine their own tuning
- implement an automatic tuning that would use the recent updates/entries rate
- implement monitoring of flow control events (e.g. cn=monitor,<RA>)
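For readers unfamiliar with the window/pause mechanism, here is a minimal sketch of the decision a supplier makes before sending the next update. It assumes the window is the maximum number of sent-but-unacknowledged entries and the pause is a delay in milliseconds. The type and function names (flow_control_cfg, fc_unacked, fc_pause_needed) are hypothetical and are not taken from the 389-ds-base source.

```c
#include <stdint.h>

/* Hypothetical per-agreement settings; the two fields stand in for
 * nsds5ReplicaFlowControlWindow and nsds5ReplicaFlowControlPause. */
typedef struct {
    uint32_t window;   /* max entries/updates sent but not yet acknowledged */
    uint32_t pause_ms; /* pause applied once the window is exceeded */
} flow_control_cfg;

/* Number of messages sent for which no result has been received yet. */
static uint32_t fc_unacked(uint32_t last_sent, uint32_t last_received)
{
    return (last_sent > last_received) ? last_sent - last_received : 0;
}

/* Returns the pause (in ms) to apply before sending the next update,
 * or 0 if the supplier may keep sending. */
static uint32_t fc_pause_needed(const flow_control_cfg *cfg,
                                uint32_t last_sent, uint32_t last_received)
{
    if (cfg->window == 0) {
        return 0; /* assumption of this sketch only: 0 disables the check */
    }
    return (fc_unacked(last_sent, last_received) > cfg->window)
               ? cfg->pause_ms
               : 0;
}
```

With an illustrative window of 1000 and pause of 2000 ms, a supplier that has sent message 2001 while only message 1000 was acknowledged would pause for 2000 ms; at 2000 sent it would keep going.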

nhosoi commented 7 years ago

Per triage, push the target milestone to 1.3.6.

Metadata Update from @nhosoi:
- Issue set to the milestone: 1.3.6.0

7 years ago

Metadata Update from @mreynolds:
- Issue close_status updated to: None
- Issue set to the milestone: 1.4 backlog (was: 1.3.6.0)

6 years ago

tbordaz commented 3 years ago

@msauton, do you think this would be valuable to help with tuning? If yes, what kind of information would you expect?

msauton commented 3 years ago

that may need some debate and thought, but I will try:

with larger IPA deployments, plus replica and host provisioning in cloud environments, I would say yes, such a feature would help.

often, the static configuration does not fit or scale to bursts of activity: this is true for cache(s), threads and dblocks, and, for online total and incremental updates, for nsds5ReplicaFlowControlPause and nsds5ReplicaFlowControlWindow.

the errors log file should have messages with a severity level matching what happened: WARN (?) for flow control events and their trigger conditions, INFO (?) for configuration changes.
maybe also the possibility of some monitoring output at INFO level, so we could collect historical data.

ideally, we would like to see a general entries/second value, but we should probably also see some more protocol-related values, like:
- the replica id and the replication agreement id
- number of entries sent without acknowledgment
- last_message_id_received and last_message_id_sent
- the delay from flowControlPause
- maybe the busywaittime and pausetime
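As a rough illustration of the values listed above, the following sketch groups them into one record and renders them as a single monitoring line. Everything here is an assumption made for illustration: the fc_event struct, its field names, the output format and the helper functions are hypothetical, and the busywaittime/pausetime values are left out to keep the example short.

```c
#include <stdio.h>

/* Hypothetical snapshot of one flow control event on a supplier. */
typedef struct {
    unsigned int replica_id;               /* replica id */
    const char *agmt_name;                 /* replication agreement name */
    unsigned int last_message_id_sent;
    unsigned int last_message_id_received;
    unsigned int pause_ms;                 /* delay applied by flow control */
    unsigned long entries_sent;            /* entries sent in the sample period */
    double elapsed_sec;                    /* length of the sample period */
} fc_event;

/* General throughput figure: entries per second over the sample period. */
static double fc_entries_per_sec(const fc_event *ev)
{
    return (ev->elapsed_sec > 0.0)
               ? (double)ev->entries_sent / ev->elapsed_sec
               : 0.0;
}

/* Formats the event as one line; returns what snprintf returns. */
static int fc_format_event(const fc_event *ev, char *buf, size_t len)
{
    unsigned int unacked =
        (ev->last_message_id_sent > ev->last_message_id_received)
            ? ev->last_message_id_sent - ev->last_message_id_received
            : 0;
    return snprintf(buf, len,
                    "rid=%u agmt=\"%s\" sent=%u received=%u unacked=%u "
                    "pause_ms=%u rate=%.1f/s",
                    ev->replica_id, ev->agmt_name,
                    ev->last_message_id_sent, ev->last_message_id_received,
                    unacked, ev->pause_ms, fc_entries_per_sec(ev));
}
```

A line in this shape could be written at INFO level or exposed under a monitoring entry; which of the two is better is exactly the kind of choice this ticket leaves open.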

one detail, and possibly a different topic, related to replication logging in general: take for example a call like
slapi_log_err(SLAPI_LOG_REPL, repl_plugin_name,
    "repl5_inc_waitfor_async_results - %d %d\n",
    rd->last_message_id_received, rd->last_message_id_sent);

and, for example, a message like
ERR - NSMMReplicationPlugin - repl5_inc_waitfor_async_results - Timed out waiting for responses: 69564 69578

in such messages, the replica id and the replication agreement id would be interesting info to collect, as nowadays we have many more replication agreements.
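To make the suggestion concrete, here is a small sketch of what such an enriched message could look like. It only builds the string with snprintf; in the server the result would go through the existing logging call. The function name format_async_timeout, its parameters and the exact message layout are hypothetical and not part of 389-ds-base.

```c
#include <stdio.h>

/* Builds a timeout message that also identifies the agreement and replica.
 * Returns what snprintf returns (length of the full message). */
static int format_async_timeout(char *buf, size_t len,
                                const char *agmt_name, unsigned int replica_id,
                                unsigned int last_received,
                                unsigned int last_sent)
{
    return snprintf(buf, len,
                    "repl5_inc_waitfor_async_results - agmt=\"%s\" rid=%u - "
                    "Timed out waiting for responses: %u %u\n",
                    agmt_name, replica_id, last_received, last_sent);
}
```

Keeping the two message ids at the end of the line, in the same order as today, means existing log parsers that read the last two fields would keep working.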

is the "automatic tuning that would use the recent updates/entries rate" a one-time setting, or is it dynamic, with regular checks?

spichugi commented 3 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1306

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix
- Issue status updated to: Closed (was: Open)

3 years ago

Metadata
- Assignee: None
- Tags: None
- Blocking: None
- Depending on: None
- Priority: major
- Milestone: 1.4 backlog
- reviewstatus: None
- rhbz: None
- origin: Community
