Learn more about these different git repos.
Other Git URLs
After the system lost connection over night (VPN connection expired) in the morning I got (for the first time the following)
(Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): dbus conn: 8A1A678 (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): Dispatching. (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_message_handler] (9): Received SBUS method [getAccountInfo] (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_get_account_info] (4): Got request for [4097][1][idnumber=12785] (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [fo_resolve_service_send] (4): Trying to resolve service 'LDAP' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.server' is 'working' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_port_status] (7): Port status of port 389 for server 'ldap.server' is 'not working' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_port_status] (4): Reseting the status of port 389 for server 'ldap.bos.redhat.com' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.server' is 'working' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_resolve_server_done] (4): Found address for server ldap.server: [xx.xx.xx.xx] (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [setup_ldap_connection_callbacks] (9): LDAP connection callbacks are not supported. (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [simple_bind_send] (4): Executing simple bind as: (null) 90 seconds time out is too long! It should be under 10 seconds by default. If you can't establish connection in 5 seconds something is really wrong so better go offline and do not annoy user. The customer might decide to adjust this if he has a bad network but in the common case 5 should be enough. (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [simple_bind_send] (1): ldap_bind failed (-1)[Can't contact LDAP server] (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [fo_set_port_status] (4): Marking port 389 of server 'ldap.server' as 'not working' (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sdap_handle_release] (8): Trace: sh[0x8a929b8], connected[0], ops[(nil)], ldap[0x91abab0], destructor_lock[0], release_memory[0] (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [be_mark_offline] (8): Going offline! (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [be_run_offline_cb] (3): Going offline. Running callbacks. (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): dbus conn: 8A1A678 (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): Dispatching.
The config file
[sssd] config_file_version = 2 reconnection_retries = 3 services = nss, pam domains = redhat.com debug_timestamps = true debug_level = 9 [nss] filter_groups = root filter_users = root entry_cache_nowait_timeout = 50 [pam] [domain/redhat.com] id_provider = ldap ldap_uri = ldap://ldap.server ldap_user_search_base = ou=Users,dc=redhat,dc=com ldap_group_search_base = ou=Groups,dc=redhat,dc=com enumerate = false auth_provider = krb5 krb5_kdcip = kerberos.server, kerberos2.server krb5_auth_timeout = 5 krb5_realm = REDHAT.COM cache_credentials = true access_provider = permit bhpass_provider = krb5 krb5_store_password_if_offline = True
Fields changed
milestone: NEEDS_TRIAGE => SSSD 1.5.0 owner: somebody => sbose
description: After the system lost connection over night (VPN connection expired) in the morning I got (for the first time the following)
{{{ (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): dbus conn: 8A1A678 (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): Dispatching. (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_message_handler] (9): Received SBUS method [getAccountInfo] (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_get_account_info] (4): Got request for [4097][1][idnumber=12785] (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [fo_resolve_service_send] (4): Trying to resolve service 'LDAP' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.bos.redhat.com' is 'working' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_port_status] (7): Port status of port 389 for server 'ldap.bos.redhat.com' is 'not working' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_port_status] (4): Reseting the status of port 389 for server 'ldap.bos.redhat.com' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.bos.redhat.com' is 'working' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_resolve_server_done] (4): Found address for server ldap.bos.redhat.com: [10.16.255.3] (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [setup_ldap_connection_callbacks] (9): LDAP connection callbacks are not supported. (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [simple_bind_send] (4): Executing simple bind as: (null)
90 seconds time out is too long! It should be under 10 seconds by default. If you can't establish connection in 5 seconds something is really wrong so better go offline and do not annoy user. The customer might decide to adjust this if he has a bad network but in the common case 5 should be enough.
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [simple_bind_send] (1): ldap_bind failed (-1)[Can't contact LDAP server] (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [fo_set_port_status] (4): Marking port 389 of server 'ldap.bos.redhat.com' as 'not working' (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sdap_handle_release] (8): Trace: sh[0x8a929b8], connected[0], ops[(nil)], ldap[0x91abab0], destructor_lock[0], release_memory[0] (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [be_mark_offline] (8): Going offline! (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [be_run_offline_cb] (3): Going offline. Running callbacks. (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): dbus conn: 8A1A678 (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): Dispatching. }}}
{{{ [sssd] config_file_version = 2 reconnection_retries = 3 services = nss, pam domains = redhat.com debug_timestamps = true debug_level = 9
[nss] filter_groups = root filter_users = root entry_cache_nowait_timeout = 50
[pam]
[domain/redhat.com] id_provider = ldap ldap_uri = ldap://ldap.bos.redhat.com ldap_user_search_base = ou=Users,dc=redhat,dc=com ldap_group_search_base = ou=Groups,dc=redhat,dc=com enumerate = false
auth_provider = krb5 krb5_kdcip = kerberos.bos.redhat.com, kerberos.corp.redhat.com krb5_auth_timeout = 5 krb5_realm = REDHAT.COM cache_credentials = true
access_provider = permit bhpass_provider = krb5
krb5_store_password_if_offline = True }}}
=> After the system lost connection over night (VPN connection expired) in the morning I got (for the first time the following)
{{{ (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): dbus conn: 8A1A678 (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): Dispatching. (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [sbus_message_handler] (9): Received SBUS method [getAccountInfo] (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_get_account_info] (4): Got request for [4097][1][idnumber=12785] (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [fo_resolve_service_send] (4): Trying to resolve service 'LDAP' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.server' is 'working' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_port_status] (7): Port status of port 389 for server 'ldap.server' is 'not working' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_port_status] (4): Reseting the status of port 389 for server 'ldap.bos.redhat.com' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [get_server_status] (7): Status of server 'ldap.server' is 'working' (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [be_resolve_server_done] (4): Found address for server ldap.server: [xx.xx.xx.xx] (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [setup_ldap_connection_callbacks] (9): LDAP connection callbacks are not supported. (Tue Nov 2 08:13:33 2010) [sssd[be[redhat.com]]] [simple_bind_send] (4): Executing simple bind as: (null)
(Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [simple_bind_send] (1): ldap_bind failed (-1)[Can't contact LDAP server] (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [fo_set_port_status] (4): Marking port 389 of server 'ldap.server' as 'not working' (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sdap_handle_release] (8): Trace: sh[0x8a929b8], connected[0], ops[(nil)], ldap[0x91abab0], destructor_lock[0], release_memory[0] (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [be_mark_offline] (8): Going offline! (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [be_run_offline_cb] (3): Going offline. Running callbacks. (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): dbus conn: 8A1A678 (Tue Nov 2 08:14:53 2010) [sssd[be[redhat.com]]] [sbus_dispatch] (9): Dispatching. }}}
[domain/redhat.com] id_provider = ldap ldap_uri = ldap://ldap.server ldap_user_search_base = ou=Users,dc=redhat,dc=com ldap_group_search_base = ou=Groups,dc=redhat,dc=com enumerate = false
auth_provider = krb5 krb5_kdcip = kerberos.server, kerberos2.server krb5_auth_timeout = 5 krb5_realm = REDHAT.COM cache_credentials = true
Can you give the exact version numbers of the RPM you are using?
The delay occurs during the execution of ldap_sasl_bind() which is part of the asynchronous API of OpenLDAP. Maybe you are seeing some kinf of TCP timeout here, which is not handled by the version of OpenLDAP you are using.
I'm wondering why you do not see some message about a request to the RootDSE, because this should happen before the call to ldap_sasl_bind().
I think I saw this issue again. The scenario was the following:
- I was online connected to the central server. - I clicked suspend, closed the laptop and moved to a completely different network - When it tried to wake up I saw a delay of about 90 seconds before the prompt for the password came up.
The complication is that I am hitting another bug related to the file rotation. The log got rotated not long before I closed the lid and nothing is getting logged into the sssd log any more.
I'm pretty confident that this delay is due to the fact that the version of the OpenLDAP client libraries you are using (2.3.43) does not really support asynchronous connections (one indication is that there is no LDAP_OPT_CONNECT_ASYNC option).
I would be very useful to see if this delay can be reproduced with more recent OpenLDAP versions. Although we do not set LDAP_OPT_CONNECT_ASYNC the newer code might handle timeouts during the creation if the connection more gracefully and respect other timeout settings.
status: new => assigned
I just opened a ticket #743 it has a lot of details int the log. The ticket #743 itself tracks the invalid fail over logic that I traced down. It causes the delay to happen twice!!!
Please take a look at the delay there! It is 2 minutes sharp! I do not know why. I did not change the configuration. I am really ready to try latest openldap libraries and see if this issue would go away.
coverity: =>
I am hitting the 2 min delay constantly. From time to time the machine freezes for 2 minutes waiting for the LDAP timeout.
(Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sbus_dispatch] (9): Dispatching. (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sbus_message_handler] (9): Received SBUS method [getAccountInfo] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [be_get_account_info] (4): Got request for [4099][1][name=postfix] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [be_get_account_info] (4): Request processed. Returned 1,11,Fast reply - offline (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_id_op_connect_step] (9): beginning to connect (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [fo_resolve_service_send] (4): Trying to resolve service 'LDAP' (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [get_server_status] (7): Status of server 'ldap.foobar.com' is 'working' (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [get_port_status] (7): Port status of port 389 for server 'ldap.foobar.com' is 'not working' (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [get_port_status] (4): Reseting the status of port 389 for server 'ldap.foobar.com' (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [get_server_status] (7): Status of server 'ldap.foobar.com' is 'working' (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [be_resolve_server_done] (4): Found address for server ldap.foobar.com: [xxx.xxx.xxx.xxx] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [setup_ldap_connection_callbacks] (9): LDAP connection callbacks are not supported. (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_rootdse_send] (9): Getting rootdse (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (6): calling ldap_search_ext with [(objectclass=*)][]. (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [*] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [altServer] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [namingContexts] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedControl] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedExtension] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedFeatures] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedLDAPVersion] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [supportedSASLMechanisms] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [defaultNamingContext] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [lastUSN] (Sun Dec 19 11:00:04 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (7): Requesting attrs: [highestCommittedUSN] (Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (3): ldap_search_ext failed: Can't contact LDAP server (Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [sdap_get_generic_send] (3): Connection error: (null) (Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [sdap_install_ldap_callbacks] (8): Trace: sh[0x9afd8c8], connected[1], ops[(nil)], fde[0x9ae3de8], ldap[0xd5fd928] (Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [fo_set_port_status] (4): Marking port 389 of server 'ldap.foobar.com' as 'not working' (Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [sdap_handle_release] (8): Trace: sh[0x9afd8c8], connected[1], ops[(nil)], ldap[0xd5fd928], destructor_lock[0], release_memory[0] (Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [fo_resolve_service_send] (4): Trying to resolve service 'LDAP' (Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [get_server_status] (7): Status of server 'ldap.foobar.com' is 'working' (Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [get_port_status] (7): Port status of port 389 for server 'ldap.foobar.com' is 'not working' (Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [fo_resolve_service_send] (1): No available servers for service 'LDAP' (Sun Dec 19 11:01:56 2010) [sssd[be[foobar.com]]] [sdap_id_op_connect_done] (1): Failed to connect, going offline (5 [Input/output error])
I also noticed that it tries to establish the connection on any new request. Is there a config parameter that would control for how long it should wait before it retries?
The more I think about it the more I come to conclusion that we need to introduce machine profiles like mentioned in the ticket #743. We need to define a set of assumptions about client connectivity, timeouts and fail over. If it is a "laptop" we can assume that the network would most likely be changing frequently and that laptop is frequently off the corporate network thus it does not make sense to try to connect to the server frequently if it was detected that it can't. I will file a separate ticket.
When using Calc and trying to open File menu calc immediately tries to perform an nss lookup. It hits SSSD and it tries to go online. Since I am not connected to the corporate network the connection fails, SSSD goes offline (after 2 min) and then I hit the File menu again and all repeats again so at this point I can't use Calc at all.
milestone: SSSD 1.5.0 => SSSD 1.5.1
milestone: SSSD 1.5.1 => SSSD 1.5.2 upgrade: => 0
resolution: => worksforme status: assigned => closed
rhbz: => 0
Metadata Update from @dpal: - Issue assigned to sbose - Issue set to the milestone: SSSD 1.5.2
SSSD is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in SSSD's github repository.
This issue has been cloned to Github and is available here: - https://github.com/SSSD/sssd/issues/1706
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Login to comment on this ticket.