#922 1.5.11 regression - cannot resolve domain if not defined in /etc/hosts
Closed: Fixed None Opened 12 years ago by dpiddock.

The domain name cannot be resolved if it's not defined in /etc/hosts. This gives a chain reaction:
- sssd's resolver tries to resolve DNS based on the plain hostname (say, $hostname) - no such domain.
- I don't have krb5_server defined so it queries for _SRV _KERBEROS._udp.$hostname - no such domain.
- Then I see a query for _SRV _KERBEROS._udp.default (I was lazy and left "domains = default" in sssd.conf) which also fails.
- Repeat for _KERBEROS._tcp.
- Kerberos agent fails to initialize as there are no servers defined.
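
For readers following the chain above, here is a minimal Python sketch of how the SRV names in those queries are put together from a service name and a discovery domain. The helper name `srv_query_names` is made up for illustration and is not taken from the SSSD source; the lowercase labels rely on DNS names being case-insensitive.

```python
def srv_query_names(service, domain, protocols=("udp", "tcp")):
    """Return the SRV owner names for a service, one per transport protocol."""
    domain = domain.strip(".")
    if not domain:
        # Without a discovery domain there is nothing sensible to query.
        raise ValueError("a discovery domain is required")
    return [f"_{service.lower()}._{proto}.{domain}" for proto in protocols]
```

With the placeholder domain `example.com`, `srv_query_names("KERBEROS", "example.com")` gives `_kerberos._udp.example.com` and `_kerberos._tcp.example.com`. When the domain is wrong (a bare hostname, or the sssd domain name "default"), the resulting names simply do not exist in DNS.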

This setup works with sssd 1.5.8

Log level 3:

[sssd[be[default]]] [sssm_krb5_auth_init] (1): Missing krb5_server option, using service discovery!
[sssd[be[default]]] [fo_new_service] (3): Creating new service 'KERBEROS'
[sssd[be[default]]] [fo_add_srv_server] (3): Adding new SRV server in domain 'unknown', to service 'KERBEROS' using udp
[sssd[be[default]]] [fo_add_srv_server] (3): Adding new SRV server in domain 'unknown', to service 'KERBEROS' using tcp
[sssd[be[default]]] [fo_new_service] (3): Creating new service 'KPASSWD'
[sssd[be[default]]] [fo_add_srv_server] (3): Adding new SRV server in domain 'unknown', to service 'KPASSWD' using udp
[sssd[be[default]]] [fo_add_srv_server] (3): Adding new SRV server in domain 'unknown', to service 'KPASSWD' using tcp
[sssd[be[default]]] [check_and_export_options] (1): No KDC explicitly configured, using defaults.
[sssd[be[default]]] [check_and_export_options] (1): No kpasswd server explicitly configured, using the KDC or defaults.
[sssd[be[default]]] [main] (1): Backend provider (default) started!
[sssd[be[default]]] [sdap_control_create] (3): Server does not support the requested control [1.3.6.1.4.1.42.2.27.8.5.1].
[sssd[be[default]]] [simple_bind_done] (3): Bind result: Success(0), (null)
[sssd[be[default]]] [resolve_get_domain_done] (2): Could not get fully qualified name for host name naab resolver returned: [2]: No such file or directory
[sssd[be[default]]] [resolve_srv_done] (1): SRV query failed: [Domain name not found]
[sssd[be[default]]] [resolve_srv_done] (1): SRV query failed: [Domain name not found]
[sssd[be[default]]] [resolve_get_domain_done] (2): Could not get fully qualified name for host name naab resolver returned: [2]: No such file or directory
[sssd[be[default]]] [resolve_srv_done] (1): SRV query failed: [Domain name not found]
[sssd[be[default]]] [resolve_srv_done] (1): SRV query failed: [Domain name not found]
[sssd[be[default]]] [fo_resolve_service_send] (1): No available servers for service 'KERBEROS'
[sssd[be[default]]] [be_run_offline_cb] (3): Going offline. Running callbacks.

That looks a mess, sorry. I've attached the log as a file. Wiki-formatted original report:

The domain name cannot be resolved if it's not defined in /etc/hosts. This gives a chain reaction:

  • sssd's resolver tries to resolve DNS based on the plain hostname (say, $hostname) - no such domain.
  • I don't have krb5_server defined so it queries for _SRV _KERBEROS._udp.$hostname - no such domain.
  • Then I see a query for _SRV _KERBEROS._udp.default (I was lazy and left "domains = default" in sssd.conf) which also fails.
  • Repeat for _KERBEROS._tcp.
  • Kerberos agent fails to initialize as there are no servers defined.

This setup previously worked with sssd 1.5.8

Fixed up the description to be readable.

description: (previous unformatted text, identical to the original report above) => The domain name cannot be resolved if it's not defined in /etc/hosts. This gives a chain reaction:
* sssd's resolver tries to resolve DNS based on the plain hostname (say, $hostname) - no such domain.
* I don't have krb5_server defined so it queries for _SRV _KERBEROS._udp.$hostname - no such domain.
* Then I see a query for _SRV _KERBEROS._udp.default (I was lazy and left "domains = default" in sssd.conf) which also fails.
* Repeat for _KERBEROS._tcp.
* Kerberos agent fails to initialize as there are no servers defined.

This setup works with sssd 1.5.8

Log level 3:
{{{
[sssd[be[default]]] [sssm_krb5_auth_init] (1): Missing krb5_server option, using service discovery!
[sssd[be[default]]] [fo_new_service] (3): Creating new service 'KERBEROS'
[sssd[be[default]]] [fo_add_srv_server] (3): Adding new SRV server in domain 'unknown', to service 'KERBEROS' using udp
[sssd[be[default]]] [fo_add_srv_server] (3): Adding new SRV server in domain 'unknown', to service 'KERBEROS' using tcp
[sssd[be[default]]] [fo_new_service] (3): Creating new service 'KPASSWD'
[sssd[be[default]]] [fo_add_srv_server] (3): Adding new SRV server in domain 'unknown', to service 'KPASSWD' using udp
[sssd[be[default]]] [fo_add_srv_server] (3): Adding new SRV server in domain 'unknown', to service 'KPASSWD' using tcp
[sssd[be[default]]] [check_and_export_options] (1): No KDC explicitly configured, using defaults.
[sssd[be[default]]] [check_and_export_options] (1): No kpasswd server explicitly configured, using the KDC or defaults.
[sssd[be[default]]] [main] (1): Backend provider (default) started!
[sssd[be[default]]] [sdap_control_create] (3): Server does not support the requested control [1.3.6.1.4.1.42.2.27.8.5.1].
[sssd[be[default]]] [simple_bind_done] (3): Bind result: Success(0), (null)
[sssd[be[default]]] [resolve_get_domain_done] (2): Could not get fully qualified name for host name naab resolver returned: [2]: No such file or directory
[sssd[be[default]]] [resolve_srv_done] (1): SRV query failed: [Domain name not found]
[sssd[be[default]]] [resolve_srv_done] (1): SRV query failed: [Domain name not found]
[sssd[be[default]]] [resolve_get_domain_done] (2): Could not get fully qualified name for host name naab resolver returned: [2]: No such file or directory
[sssd[be[default]]] [resolve_srv_done] (1): SRV query failed: [Domain name not found]
[sssd[be[default]]] [resolve_srv_done] (1): SRV query failed: [Domain name not found]
[sssd[be[default]]] [fo_resolve_service_send] (1): No available servers for service 'KERBEROS'
[sssd[be[default]]] [be_run_offline_cb] (3): Going offline. Running callbacks.
}}}

SSSD calls the {{{gethostname()}}} system function and uses that as the hostname. We then check DNS (and /etc/hosts) for whether this hostname exists and has a preferred FQDN.

Then we take the hostname, chop off the part before the first dot and use the remainder as the DNS discovery domain.

It looks like on your system, {{{gethostname()}}} returns "naab" (not fully-qualified), and the hostname is not listed in either /etc/hosts or DNS with a fully-qualified variant. As a result, we have no way to automatically determine your domain name.
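
The derivation described above can be sketched in a few lines of Python. This is only an illustration of the "chop off the first label" step, not the SSSD code; the function name `discovery_domain` and the host name `naab.example.com` are hypothetical.

```python
def discovery_domain(fqdn):
    """Drop the first label of a host name; return None if no domain is left."""
    _host, _dot, domain = fqdn.strip(".").partition(".")
    return domain or None


print(discovery_domain("naab"))              # prints: None
print(discovery_domain("naab.example.com"))  # prints: example.com
```

A short name such as "naab" leaves nothing after the first dot, which is why no discovery domain can be determined in that case.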

In this situation, you should be using the 'dns_discovery_domain' option in sssd.conf to manually set the DNS domain against which to issue SRV record lookups.
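
As a sketch of that setting, assuming the domain section is named "default" (as in the reporter's "domains = default") and using `example.com` as a placeholder for the real DNS domain:

```ini
[domain/default]
# Placeholder value; set this to the DNS domain that holds the SRV records.
dns_discovery_domain = example.com
```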

This would never have worked with 1.5.8 any more than with 1.5.11. I can only surmise that at some point your machine changed hostnames to not include the domain part.

component: SSSD => Failover
resolution: => worksforme
status: new => closed

I have a network of 100 hosts saying that 1.5.8 works in this configuration ;)

I think we're looking at the wrong spot. I've been able to reproduce the problem with virtual machines:

{{{
$ hostname
fedora14-i686
$ hostname -f
fedora14-i686.int.corefiling.com
}}}

Using the currently working 1.5.8-1.fc14.i686 at debug_level 9:

{{{
(Tue Jul 12 18:04:04 2011) [sssd[be[default]]] [resolve_get_domain_send] (7): Host name is: fedora14-i686
(Tue Jul 12 18:04:04 2011) [sssd[be[default]]] [resolv_gethostbyname_send] (4): Trying to resolve A record of 'fedora14-i686'
(Tue Jul 12 18:04:04 2011) [sssd[be[default]]] [schedule_timeout_watcher] (9): Scheduling DNS timeout watcher
(Tue Jul 12 18:04:04 2011) [sssd[be[default]]] [unschedule_timeout_watcher] (9): Unscheduling DNS timeout watcher
(Tue Jul 12 18:04:04 2011) [sssd[be[default]]] [resolve_get_domain_done] (7): The full FQDN is: fedora14-i686.int.corefiling.com
}}}

With wireshark I also see a DNS query for fedora14-i686.int.corefiling.com

Now try the testing 1.5.11-2.fc14.i686:

{{{
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolve_get_domain_send] (7): Host name is: fedora14-i686
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolv_is_address] (9): [fedora14-i686] does not look like an IP address
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolv_gethostbyname_step] (8): Querying files
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolv_gethostbyname_files_send] (4): Trying to resolve A record of 'fedora14-i686' in files
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolv_gethostbyname_step] (8): Querying files
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolv_gethostbyname_files_send] (4): Trying to resolve AAAA record of 'fedora14-i686' in files
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolv_gethostbyname_next] (5): No more address families to retry
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolv_gethostbyname_step] (8): Querying DNS
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [schedule_timeout_watcher] (9): Scheduling DNS timeout watcher
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolv_gethostbyname_dns_query] (4): Trying to resolve A record of 'fedora14-i686' in DNS
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [unschedule_timeout_watcher] (9): Unscheduling DNS timeout watcher
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolv_gethostbyname_step] (8): Querying DNS
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [schedule_timeout_watcher] (9): Scheduling DNS timeout watcher
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolv_gethostbyname_dns_query] (4): Trying to resolve AAAA record of 'fedora14-i686' in DNS
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [unschedule_timeout_watcher] (9): Unscheduling DNS timeout watcher
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolv_gethostbyname_next] (5): No more address families to retry
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolv_gethostbyname_next] (4): No more hosts databases to retry
(Tue Jul 12 18:01:06 2011) [sssd[be[default]]] [resolve_get_domain_done] (2): Could not get fully qualified name for host name fedora14-i686 resolver returned: [2]: No such file or directory
}}}

The DNS query that goes past is for the plain fedora14-i686. Something has changed in the DNS resolver so that it does not include the default search domain. This is probably the real cause of the apparent regression.

/etc/resolv.conf:

{{{
search int.corefiling.com
nameserver 172.16.252.10
nameserver 172.16.252.89
}}}
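
To make the role of the search line concrete, here is a rough Python sketch of how a stub resolver typically expands a short name using resolv.conf. It is a simplification under stated assumptions: only the `search` directive is handled, the default `ndots` of 1 is used, and the function names are invented for this example rather than taken from SSSD or c-ares.

```python
def search_domains(resolv_conf_text):
    """Return the domains from the last 'search' line of resolv.conf text."""
    domains = []
    for line in resolv_conf_text.splitlines():
        if line.startswith(("#", ";")):
            continue  # comment line
        fields = line.split()
        if fields and fields[0] == "search":
            domains = fields[1:]  # a later search line replaces an earlier one
    return domains


def candidate_names(name, domains, ndots=1):
    """List the names a resolver would try, in order, for the given name."""
    if name.endswith("."):
        return [name.rstrip(".")]  # already absolute, no search list applied
    suffixed = [f"{name}.{domain}" for domain in domains]
    if name.count(".") >= ndots:
        return [name] + suffixed   # enough dots: try the name as given first
    return suffixed + [name]       # short name: try the search domains first
```

With the resolv.conf shown above, the short name `fedora14-i686` would be tried as `fedora14-i686.int.corefiling.com` first, which matches the DNS query seen with 1.5.8.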

resolution: worksforme =>
status: closed => reopened

Can you confirm whether both setups are running the same version of the c-ares package? I'd like to rule out whether this is a bug in the DNS resolution library we use.

keywords: => Regression
milestone: NEEDS_TRIAGE => SSSD 1.5.12
owner: somebody => jhrozek
priority: major => blocker
status: reopened => new

I only changed the sssd version inside the VM; the c-ares package stayed at 1.7.3-4.fc14. I also recompiled the sssd 1.5.8 and 1.5.11 packages against this version of c-ares to double-check.

Replying to [comment:4 dpiddock]:

The DNS query that goes past is for the plain fedora14-i686. Something has changed in the DNS resolver so that it does not include the default search domain. This is probably the real cause of the apparent regression.

/etc/resolv.conf:
{{{
search int.corefiling.com
nameserver 172.16.252.10
nameserver 172.16.252.89
}}}

Yes, that is exactly the reason. When I rewrote our resolver to take TTL values into account, I used the wrong API call from c-ares, one that does not take the search or domain directives into account.

I'm sorry about the breakage. A patch is already on the list. Thank you for reporting the issue.
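
The difference between the two kinds of lookup can be shown with a toy Python model. c-ares does provide both `ares_query()`, which sends the name exactly as given, and `ares_search()`, which applies the search list; whether those are the exact calls involved in the patch is an assumption, since this ticket does not name them. The zone contents and the address below (from the 192.0.2.0/24 documentation range) are made up.

```python
# Toy DNS data: only the fully qualified name exists.
ZONE = {"fedora14-i686.int.corefiling.com": "192.0.2.10"}


def single_query(name, zone):
    """Look the name up exactly as given, like a plain one-shot query."""
    return zone.get(name)


def search_query(name, zone, search):
    """Try the name with each search domain appended, then as given."""
    for candidate in [f"{name}.{domain}" for domain in search] + [name]:
        address = zone.get(candidate)
        if address is not None:
            return candidate, address
    return None
```

Here `single_query("fedora14-i686", ZONE)` finds nothing, while `search_query("fedora14-i686", ZONE, ["int.corefiling.com"])` resolves the fully qualified name, mirroring the 1.5.11 and 1.5.8 logs respectively.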

patch: 0 => 1
status: new => assigned

master: 55a89b8

sssd-1.5: dd51e97

resolution: => fixed
status: assigned => closed

Can confirm that the patch fixes the issue. Thank you!

Fields changed

rhbz: => 0

Metadata Update from @dpiddock:
- Issue assigned to jhrozek
- Issue set to the milestone: SSSD 1.5.12

7 years ago

SSSD is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in SSSD's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/SSSD/sssd/issues/1964

If you want to receive further updates on the issue, please navigate to the GitHub issue
and click the subscribe button.

Thank you for understanding. We apologize for all inconvenience.
