#1679 Primary server status is not always reset after failover to backup server happened
Closed: Fixed None Opened 11 years ago by jhrozek.

https://bugzilla.redhat.com/show_bug.cgi?id=880956 (Red Hat Enterprise Linux 6)

Description of problem:
Primary server status is not always reset after failover to backup server
happened

Version-Release number of selected component (if applicable):
sssd-1.9.2-25

How reproducible:
Always, if TTL of primary server is more than 30s.

Steps to Reproduce:
1.sssd.conf domain section has:
ldap_uri = ldap://SERVER1
ldap_backup_uri = ldap://SERVER2

2. getent passwd user1_from_server1
Works

3. Stop ldap on SERVER1

4. getent passwd user1_from_server2
Works

5. Start ldap on Server1

6. sleep 40

7. getent passwd user2_from_server1
Fails (Should have worked, since backup to primary failover is 30s)

8. sleep 40

9. getent passwd user2_from_server1
Works fine now

Actual results:
Primary Server is not contacted back after 30s after failover to backup server
happened.
From the logs, it seems primary server was looked up in 30s. However, since the
TTL of primary server is set to more than 30 in dns, the primary server status
is not reset.
See additional info for logs.

Expected results:
Failover from backup to primary should reset the primary server status after
30s.

Additional info:

(Wed Nov 28 13:25:52 2012) [sssd[be[LDAP]]]
[be_primary_server_timeout_activate] (0x2000): Primary server reactivation
timeout set to 30 seconds

...
After 30 seconds
...

(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]] [be_primary_server_timeout]
(0x0400): Looking for primary server!
(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100):
Trying to resolve service 'LDAP'
(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]] [get_server_status] (0x1000):
Status of server 'SERVER2' is 'working'
(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]] [get_port_status] (0x1000): Port
status of port 389 for server 'SERVER2' is 'working'
(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]] [get_server_status] (0x1000):
Status of server 'SERVER1' is 'working'
(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]] [get_port_status] (0x1000): Port
status of port 389 for server 'SERVER1' is 'not working'
(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]] [get_server_status] (0x1000):
Status of server 'SERVER2' is 'working'
(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]] [get_port_status] (0x1000): Port
status of port 389 for server 'SERVER2' is 'working'
(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]]
[fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 10
seconds
(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]] [get_server_status] (0x1000):
Status of server 'SERVER2' is 'working'
(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]] [be_resolve_server_process]
(0x1000): Saving the first resolved server
(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]] [be_resolve_server_process]
(0x0200): Found address for server SERVER2: [192.168.122.22] TTL 59
(Wed Nov 28 13:26:22 2012) [sssd[be[LDAP]]]
[be_primary_server_timeout_activate] (0x2000): Primary server reactivation
timeout set to 30 seconds

...
After 30 seconds
...

(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [be_primary_server_timeout]
(0x0400): Looking for primary server!
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [fo_resolve_service_send] (0x0100):
Trying to resolve service 'LDAP'
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [get_server_status] (0x1000):
Status of server 'SERVER1' is 'working'
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [get_server_status] (0x0100):
Hostname resolution expired, resetting the server status of 'SERVER1'
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [set_server_common_status]
(0x0100): Marking server 'SERVER1' as 'name not resolved'
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [get_port_status] (0x1000): Port
status of port 389 for server 'SERVER1' is 'not working'
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [get_port_status] (0x0100):
Reseting the status of port 389 for server 'SERVER1'
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]]
[fo_resolve_service_activate_timeout] (0x2000): Resolve timeout set to 10
seconds
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [get_server_status] (0x1000):
Status of server 'SERVER1' is 'name not resolved'
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [resolv_is_address] (0x4000):
[cobra.lab.eng.pnq.redhat.com] does not look like an IP address
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [resolv_gethostbyname_step]
(0x2000): Querying files
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [resolv_gethostbyname_files_send]
(0x0100): Trying to resolve A record of 'SERVER1' in files
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [set_server_common_status]
(0x0100): Marking server 'SERVER1' as 'resolving name'
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [resolv_gethostbyname_step]
(0x2000): Querying files
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [resolv_gethostbyname_files_send]
(0x0100): Trying to resolve AAAA record of 'SERVER1' in files
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [resolv_gethostbyname_next]
(0x0200): No more address families to retry
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [resolv_gethostbyname_step]
(0x2000): Querying DNS
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [resolv_gethostbyname_dns_query]
(0x0100): Trying to resolve A record of 'SERVER1' in DNS
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [schedule_request_timeout]
(0x2000): Scheduling a timeout of 5 seconds
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [schedule_timeout_watcher]
(0x2000): Scheduling DNS timeout watcher
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [unschedule_timeout_watcher]
(0x4000): Unscheduling DNS timeout watcher
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [resolv_gethostbyname_dns_parse]
(0x1000): Parsing an A reply
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [request_watch_destructor]
(0x0400): Deleting request watch
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [set_server_common_status]
(0x0100): Marking server 'SERVER1' as 'name resolved'
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [be_resolve_server_process]
(0x1000): Saving the first resolved server
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [be_resolve_server_process]
(0x0200): Found address for server SERVER1: [192.168.122.11] TTL 60
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [sdap_uri_callback] (0x0400):
Constructed uri 'ldap://SERVER1'
(Wed Nov 28 13:26:52 2012) [sssd[be[LDAP]]] [be_run_reconnect_cb] (0x0400):
Reconnecting. Running callbacks.

Fields changed

blockedby: =>
blocking: =>
coverity: =>
design: =>
design_review: => 0
feature_milestone: =>
fedora_test_page: =>
milestone: NEEDS_TRIAGE => SSSD 1.9.4
testsupdated: => 0

Fields changed

owner: somebody => pbrezina
status: new => assigned

Fields changed

patch: 0 => 1

resolution: => fixed
status: assigned => closed

Metadata Update from @jhrozek:
- Issue assigned to pbrezina
- Issue set to the milestone: SSSD 1.9.4

7 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/2721

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Login to comment on this ticket.

Metadata