#664 Default LDAP connection timeout is too long
Closed: Invalid None Opened 13 years ago by dpal.

After the system lost connection over night (VPN connection expired) in the morning I got (for the first time the following)

(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): dbus conn: 8A1A678
(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): Dispatching.
(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_message_handler] (9): Received SBUS method [getAccountInfo]
(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [be_get_account_info] (4): Got request for [4097][1][idnumber=12785]
(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline
(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [fo_resolve_service_send] (4): Trying to resolve service 'LDAP'
(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.server' is 'working'
(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [get_port_status] (7): Port status of port 389 for server 'ldap.server' is 'not working'
(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [get_port_status] (4): Reseting the status of port 389 for server 'ldap.bos.redhat.com'
(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.server' is 'working'
(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [be_resolve_server_done] (4): Found address for server ldap.server: [xx.xx.xx.xx]
(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [setup_ldap_connection_callbacks] (9): LDAP connection callbacks are not supported.
(Tue Nov  2 08:13:33 2010) [sssd[be[redhat.com]]] [simple_bind_send] (4): Executing simple bind as: (null)





90 seconds time out is too long! It should be under 10 seconds by default. If you can't establish connection in 5 seconds something is really wrong so better go offline and do not annoy user. The customer might decide to adjust this if he has a bad network but in the common case 5 should be enough.




(Tue Nov  2 08:14:53 2010) [sssd[be[redhat.com]]] [simple_bind_send] (1): ldap_bind failed (-1)[Can't contact LDAP server]
(Tue Nov  2 08:14:53 2010) [sssd[be[redhat.com]]] [fo_set_port_status] (4): Marking port 389 of server 'ldap.server' as 'not working'
(Tue Nov  2 08:14:53 2010) [sssd[be[redhat.com]]] [sdap_handle_release] (8): Trace: sh[0x8a929b8], connected[0], ops[(nil)], ldap[0x91abab0], destructor_lock[0], release_memory[0]
(Tue Nov  2 08:14:53 2010) [sssd[be[redhat.com]]] [be_mark_offline] (8): Going offline!
(Tue Nov  2 08:14:53 2010) [sssd[be[redhat.com]]] [be_run_offline_cb] (3): Going offline. Running callbacks.
(Tue Nov  2 08:14:53 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): dbus conn: 8A1A678
(Tue Nov  2 08:14:53 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): Dispatching.

The config file

[sssd]
config_file_version = 2
reconnection_retries = 3
services = nss, pam
domains = redhat.com
debug_timestamps = true
debug_level = 9


[nss]
filter_groups = root
filter_users = root
entry_cache_nowait_timeout = 50

[pam]

[domain/redhat.com]
id_provider = ldap
ldap_uri = ldap://ldap.server
ldap_user_search_base = ou=Users,dc=redhat,dc=com
ldap_group_search_base = ou=Groups,dc=redhat,dc=com
enumerate = false

auth_provider = krb5
krb5_kdcip = kerberos.server, kerberos2.server
krb5_auth_timeout = 5
krb5_realm = REDHAT.COM
cache_credentials = true

access_provider = permit
bhpass_provider = krb5

krb5_store_password_if_offline = True

Fields changed

milestone: NEEDS_TRIAGE => SSSD 1.5.0
owner: somebody => sbose

Fields changed

description: After the system lost connection over night (VPN connection expired) in the morning I got (for the first time the following)

{{{
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): dbus conn: 8A1A678
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): Dispatching.
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_message_handler] (9): Received SBUS method [getAccountInfo]
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_get_account_info] (4): Got request for [4097][1][idnumber=12785]
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [fo_resolve_service_send] (4): Trying to resolve service 'LDAP'
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.bos.redhat.com' is 'working'
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_port_status] (7): Port status of port 389 for server 'ldap.bos.redhat.com' is 'not working'
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_port_status] (4): Reseting the status of port 389 for server 'ldap.bos.redhat.com'
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.bos.redhat.com' is 'working'
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_resolve_server_done] (4): Found address for server ldap.bos.redhat.com: [10.16.255.3]
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [setup_ldap_connection_callbacks] (9): LDAP connection callbacks are not supported.
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [simple_bind_send] (4): Executing simple bind as: (null)

90 seconds time out is too long! It should be under 10 seconds by default. If you can't establish connection in 5 seconds something is really wrong so better go offline and do not annoy user. The customer might decide to adjust this if he has a bad network but in the common case 5 should be enough.

(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [simple_bind_send] (1): ldap_bind failed (-1)[Can't contact LDAP server]
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [fo_set_port_status] (4): Marking port 389 of server 'ldap.bos.redhat.com' as 'not working'
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sdap_handle_release] (8): Trace: sh[0x8a929b8], connected[0], ops[(nil)], ldap[0x91abab0], destructor_lock[0], release_memory[0]
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [be_mark_offline] (8): Going offline!
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [be_run_offline_cb] (3): Going offline. Running callbacks.
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): dbus conn: 8A1A678
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): Dispatching.
}}}

The config file

{{{
[sssd]
config_file_version = 2
reconnection_retries = 3
services = nss, pam
domains = redhat.com
debug_timestamps = true
debug_level = 9

[nss]
filter_groups = root
filter_users = root
entry_cache_nowait_timeout = 50

[pam]

[domain/redhat.com]
id_provider = ldap
ldap_uri = ldap://ldap.bos.redhat.com
ldap_user_search_base = ou=Users,dc=redhat,dc=com
ldap_group_search_base = ou=Groups,dc=redhat,dc=com
enumerate = false

auth_provider = krb5
krb5_kdcip = kerberos.bos.redhat.com, kerberos.corp.redhat.com
krb5_auth_timeout = 5
krb5_realm = REDHAT.COM
cache_credentials = true

access_provider = permit
bhpass_provider = krb5

krb5_store_password_if_offline = True
}}}

=> After the system lost connection over night (VPN connection expired) in the morning I got (for the first time the following)

{{{
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): dbus conn: 8A1A678
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): Dispatching.
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_message_handler] (9): Received SBUS method [getAccountInfo]
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_get_account_info] (4): Got request for [4097][1][idnumber=12785]
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [fo_resolve_service_send] (4): Trying to resolve service 'LDAP'
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.server' is 'working'
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_port_status] (7): Port status of port 389 for server 'ldap.server' is 'not working'
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_port_status] (4): Reseting the status of port 389 for server 'ldap.bos.redhat.com'
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.server' is 'working'
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_resolve_server_done] (4): Found address for server ldap.server: [xx.xx.xx.xx]
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [setup_ldap_connection_callbacks] (9): LDAP connection callbacks are not supported.
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [simple_bind_send] (4): Executing simple bind as: (null)

90 seconds time out is too long! It should be under 10 seconds by default. If you can't establish connection in 5 seconds something is really wrong so better go offline and do not annoy user. The customer might decide to adjust this if he has a bad network but in the common case 5 should be enough.

(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [simple_bind_send] (1): ldap_bind failed (-1)[Can't contact LDAP server]
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [fo_set_port_status] (4): Marking port 389 of server 'ldap.server' as 'not working'
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sdap_handle_release] (8): Trace: sh[0x8a929b8], connected[0], ops[(nil)], ldap[0x91abab0], destructor_lock[0], release_memory[0]
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [be_mark_offline] (8): Going offline!
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [be_run_offline_cb] (3): Going offline. Running callbacks.
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): dbus conn: 8A1A678
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): Dispatching.
}}}

The config file

{{{
[sssd]
config_file_version = 2
reconnection_retries = 3
services = nss, pam
domains = redhat.com
debug_timestamps = true
debug_level = 9

[nss]
filter_groups = root
filter_users = root
entry_cache_nowait_timeout = 50

[pam]

[domain/redhat.com]
id_provider = ldap
ldap_uri = ldap://ldap.server
ldap_user_search_base = ou=Users,dc=redhat,dc=com
ldap_group_search_base = ou=Groups,dc=redhat,dc=com
enumerate = false

auth_provider = krb5
krb5_kdcip = kerberos.server, kerberos2.server
krb5_auth_timeout = 5
krb5_realm = REDHAT.COM
cache_credentials = true

access_provider = permit
bhpass_provider = krb5

krb5_store_password_if_offline = True
}}}

Can you give the exact version numbers of the RPM you are using?

The delay occurs during the execution of ldap_sasl_bind() which is part of the asynchronous API of OpenLDAP. Maybe you are seeing some kinf of TCP timeout here, which is not handled by the version of OpenLDAP you are using.

I'm wondering why you do not see some message about a request to the RootDSE, because this should happen before the call to ldap_sasl_bind().

I think I saw this issue again. The scenario was the following:

- I was online connected to the central server. 
- I clicked suspend, closed the laptop and moved to a completely different network
- When it tried to wake up I saw a delay of about 90 seconds before the prompt for the password came up.

The complication is that I am hitting another bug related to the file rotation. The log got rotated not long before I closed the lid and nothing is getting logged into the sssd log any more.

I'm pretty confident that this delay is due to the fact that the version of the OpenLDAP client libraries you are using (2.3.43) does not really support asynchronous connections (one indication is that there is no LDAP_OPT_CONNECT_ASYNC option).

I would be very useful to see if this delay can be reproduced with more recent OpenLDAP versions. Although we do not set LDAP_OPT_CONNECT_ASYNC the newer code might handle timeouts during the creation if the connection more gracefully and respect other timeout settings.

Fields changed

status: new => assigned

I just opened a ticket #743 it has a lot of details int the log. The ticket #743 itself tracks the invalid fail over logic that I traced down. It causes the delay to happen twice!!!

Please take a look at the delay there! It is 2 minutes sharp! I do not know why. I did not change the configuration. I am really ready to try latest openldap libraries and see if this issue would go away.

coverity: =>

I am hitting the 2 min delay constantly. From time to time the machine freezes for 2 minutes waiting for the LDAP timeout.

(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sbus_dispatch] (9): Dispatching.
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sbus_message_handler] (9): Received SBUS method [getAccountInfo]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [be_get_account_info] (4): Got request for [4099][1][name=postfix]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_id_op_connect_step] (9): beginning to connect
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [fo_resolve_service_send] (4): Trying to resolve service 'LDAP'
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [get_server_status] (7): Status of server 'ldap.foobar.com' is 'working'
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [get_port_status] (7): Port status of port 389 for server 'ldap.foobar.com' is 'not working'
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [get_port_status] (4): Reseting the status of port 389 for server 'ldap.foobar.com'
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [get_server_status] (7): Status of server 'ldap.foobar.com' is 'working'
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [be_resolve_server_done] (4): Found address for server ldap.foobar.com: [xxx.xxx.xxx.xxx]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [setup_ldap_connection_callbacks] (9): LDAP connection callbacks are not supported.
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_rootdse_send] (9): Getting rootdse
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (6): calling ldap_search_ext with [(objectclass=*)][].
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [*]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [altServer]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [namingContexts]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedControl]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedExtension]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedFeatures]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedLDAPVersion]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedSASLMechanisms]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [defaultNamingContext]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [lastUSN]
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [highestCommittedUSN]
(Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (3): ldap_search_ext failed: Can't contact LDAP server
(Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (3): Connection error: (null)
(Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [sdap_install_ldap_callbacks] (8): Trace: sh[0x9afd8c8], connected[1], ops[(nil)], fde[0x9ae3de8], ldap[0xd5fd928]
(Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [fo_set_port_status] (4): Marking port 389 of server 'ldap.foobar.com' as 'not working'
(Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [sdap_handle_release] (8): Trace: sh[0x9afd8c8], connected[1], ops[(nil)], ldap[0xd5fd928], destructor_lock[0], release_memory[0]
(Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [fo_resolve_service_send] (4): Trying to resolve service 'LDAP'
(Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [get_server_status] (7): Status of server 'ldap.foobar.com' is 'working'
(Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [get_port_status] (7): Port status of port 389 for server 'ldap.foobar.com' is 'not working'
(Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [fo_resolve_service_send] (1): No available servers for service 'LDAP'
(Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [sdap_id_op_connect_done] (1): Failed to connect, going offline (5 [Input/output error])

I also noticed that it tries to establish the connection on any new request. Is there a config parameter that would control for how long it should wait before it retries?

The more I think about it the more I come to conclusion that we need to introduce machine profiles like mentioned in the ticket #743. We need to define a set of assumptions about client connectivity, timeouts and fail over. If it is a "laptop" we can assume that the network would most likely be changing frequently and that laptop is frequently off the corporate network thus it does not make sense to try to connect to the server frequently if it was detected that it can't. I will file a separate ticket.

When using Calc and trying to open File menu calc immediately tries to perform an nss lookup. It hits SSSD and it tries to go online. Since I am not connected to the corporate network the connection fails, SSSD goes offline (after 2 min) and then I hit the File menu again and all repeats again so at this point I can't use Calc at all.

Fields changed

milestone: SSSD 1.5.0 => SSSD 1.5.1

Fields changed

milestone: SSSD 1.5.1 => SSSD 1.5.2
upgrade: => 0

Fields changed

resolution: => worksforme
status: assigned => closed

Fields changed

rhbz: => 0

Metadata Update from @dpal:
- Issue assigned to sbose
- Issue set to the milestone: SSSD 1.5.2

7 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/1706

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Login to comment on this ticket.

Metadata