This is not the first time I've had this occur, so I'm reporting it now. One of my zones has stopped signing, and so for resolvers that validate DNSSEC (eg Google's public resolvers), the zone is effectively gone from the Internet. Last time it happened I just resigned the zone and waited for the propagation delays, but given the time periods involved, and that it has happened again, I'd like to try and get it fixed - having zones missing for days on a semi-regular basis is not a tenable situation.
Below is the stack trace I get when ipa-dnskeysyncd tries to synchronize the zone:
Sep 30 20:04:19 dnsmaster.example.com ipa-dnskeysyncd[25797]: ipa.ipapython.dnssec.bindmgr.BINDMgr: INFO Sep 30 20:04:19 dnsmaster.example.com ipa-dnskeysyncd[25797]: ipa.ipapython.dnssec.bindmgr.BINDMgr: INFO Synchronizing zone example.com. Sep 30 20:04:19 dnsmaster.example.com ipa-dnskeysyncd[25797]: ipa.ipapython.dnssec.bindmgr.BINDMgr: INFO attrs: {'idnsseckeyref': ['pkcs11:object=hash1'], 'dn': 'cn=ZSK-20150531030418Z-hash1,cn=keys,idnsname=example.com.,cn=dns,dc=example,dc=com', 'cn': ['ZSK-20150531030418Z-hash1'], 'idnsseckeypublish': ['20150531030419Z'], 'objectclass': ['idnsSecKey'], 'idnssecalgorithm': ['RSASHA256'], 'idnsseckeyzone': ['TRUE'], 'idnsseckeycreated': ['20150531030418Z']} Sep 30 20:04:19 dnsmaster.example.com python2[25797]: detected unhandled Python exception in '/usr/libexec/ipa/ipa-dnskeysyncd' Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: Traceback (most recent call last): Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: File "/usr/libexec/ipa/ipa-dnskeysyncd", line 112, in <module> Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: while ldap_connection.syncrepl_poll(all=1, msgid=ldap_search): Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: File "/usr/lib64/python2.7/site-packages/ldap/syncrepl.py", line 409, in syncrepl_poll Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: self.syncrepl_refreshdone() Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: File "/usr/lib/python2.7/site-packages/ipapython/dnssec/keysyncer.py", line 115, in syncrepl_refreshdone Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: self.bindmgr.sync() Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: File "/usr/lib/python2.7/site-packages/ipapython/dnssec/bindmgr.py", line 194, in sync Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: self.sync_zone(zone) Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: File "/usr/lib/python2.7/site-packages/ipapython/dnssec/bindmgr.py", line 177, in sync_zone Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: self.install_key(zone, uuid, attrs, tempdir) Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: File "/usr/lib/python2.7/site-packages/ipapython/dnssec/bindmgr.py", line 113, in install_key Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: basename = ipautil.run(cmd)[0].strip() Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: File "/usr/lib/python2.7/site-packages/ipapython/ipautil.py", line 373, in run Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: raise CalledProcessError(p.returncode, arg_string, stdout) Sep 30 20:04:20 dnsmaster.example.com ipa-dnskeysyncd[25797]: subprocess.CalledProcessError: Command ''/usr/sbin/dnssec-keyfromlabel-pkcs11' '-K' '/var/named/dyndb-ldap/ipa/master/example.com/tmpazgVEt' '-a' 'RSASHA256' '-l' 'pkcs11:object=hash1;pin-source=/var/lib/ipa/dnssec/softhsm_pin' '-P' '20150531030419' 'example.com.'' returned non-zero exit status 1 Sep 30 20:04:20 dnsmaster.example.com systemd[1]: ipa-dnskeysyncd.service: main process exited, code=exited, status=1/FAILURE Sep 30 20:04:20 dnsmaster.example.com systemd[1]: Unit ipa-dnskeysyncd.service entered failed state. Sep 30 20:04:20 dnsmaster.example.com systemd[1]: ipa-dnskeysyncd.service failed.
Please let me know what additional information I can provide. This is a lesser-used domain, so I have a little leeway, but I would certainly appreciate any help that might be obtained in a timely fashion so it's not gone for too long.
Hrm, this has resolved itself somehow this time, though it was out for nearly 12 hours. I'll mark this as deferred for now, until I'm in a position to trouble-shoot again (if this is not the correct course of action, please let me know or update as you see fit).
Hmm, interesting. First of all, please add line debug=True into file /etc/ipa/default.conf and restart IPA. It will give us more details when this happens next time.
debug=True
/etc/ipa/default.conf
When it happens, please run following commands:
sudo -u named -s /bin/bash source /etc/sysconfig/named export PKCS11_PROVIDER export SOFTHSM2_CONF python2 /usr/lib/python2.*/site-packages/ipapython/dnssec/localhsm.py <execute the dnssec-keyfromlabel-pkcs11 command which is failing according to the log>
Thank you!
Most probably duplicate of #5348.
@pspacek this doesn't appear to be a duplicate of #5348 since mbasti notes that restarting named-pkcs11 fixes that, however doing so has no impact on this issue. It is occurring for me now, so I have a window to troubleshoot before it either resolves itself as last time, or I have to bite the bullet and replace the keys (which I had to do the time before, and is obviously very painful due to timeouts and propagation).
I can't execute the failing command usefully because it depends on the stemp dir being created by whatever other process precedes it during normal execution.
Does the localhsm.py output include sensitive information? If so, can I send it to you off-tracker?
Debug ipa-dnskeysyncd log ipa-dnskeysyncd.log
Replying to [comment:5 pdf]:
Ah, I forgot about that. Just replace the path to the directory with something existing, /tmp should be okay. The generated files do not contain any sensitive information, just timestamps and key ID.
/tmp
Does the localhsm.py output include sensitive information? If so, can I send it to you off-tracker? No, there is nothing sensitive in there, just key ID and attributes. The private key will never leave 'HSM' in an unencrypted form.
Is the error popping out on DNSSEC key master? (Running opendnssec's ods-enforcered service?) Or a replica? (Without ods-enforcered service?) Or both?
ods-enforcered
Replying to [comment:6 pspacek]:
Executing the command simply outputs:
dnssec-keyfromlabel: fatal: failed to get key failingdomain.com/RSASHA256: not found
No, there is nothing sensitive in there, just key ID and attributes. The private key will never leave 'HSM' in an unencrypted form.
Will attach momentarily.
This is occurring on the master only.
localhsm.py output localhsm.py.txt
Interesting ... Could you please upload output from localhsm.py (using the same commands) also from one other replica where it works?
Also, please provide output generated by command ods-ksmutil key list --verbose running on DNSSEC key master as described on page http://www.freeipa.org/page/Troubleshooting#DNS_keys_are_not_generated_by_OpenDNSSEC
ods-ksmutil key list --verbose
Thank you.
localhsm.py output from replica localhsm.py-replica.txt
ods-ksmutil key list --verbose ods-ksmutil.txt
Attached, thanks for taking a look.
Thank you very much. This is really really weird because the key is present on replica but not on the master, hmm. I will try to reproduce it locally. In meantime you might try to delete LDAP object cn=ZSK-20150530170525Z-ce36572b364d1ca95962273e57c28240,cn=keys,idnsname=failingdomain.com.,cn=dns,dc=example,dc=com and restart the ipa-dnskeysyncd, it might be enough to workaround it.
cn=ZSK-20150530170525Z-ce36572b364d1ca95962273e57c28240,cn=keys,idnsname=failingdomain.com.,cn=dns,dc=example,dc=com
ipa-dnskeysyncd
The affected key should not be used for signing anyway (inactivity period started at 20150920123018Z, it is apparently the old, already rotated ZSK key) so you might be lucky enough if you do not have TTLs longer than 3 days or so.
20150920123018Z
Please let me know if it helped or not.
Looks like I had one more key in the same state, having deleted them both I believe I'm in good shape - ipa-dnskeysyncd is happy and keys appear to be tracing correctly on all domains. Thanks you greatly.
Is there anything else I can do to help track down the source of this in future?
Ticket has not been triaged.
I think that we have enough information now, thank you very much for cooperation! It seems the we do not properly handle key purging in OpenDNSSEC which happens 14 days after the key is not used anymore. Considering the fact that ZSK is rotatech each 3 months, it explains why it happens so rarely. We have to fix this.
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1296214
master:
ipa-4-3:
ipa-4-2 needs rebase - later
ipa-4-2:
Look at list of keys associated with the DNS zone (on DNSSEC key master):
$ ods-ksmutil key list --verbose
Compare this list with all the keys in HSM on master and all replicas:
$ python2 /usr/lib/python2.*/site-packages/ipapython/dnssec/localhsm.py
Wait for key rotation to happen (see /etc/opendnssec/kasp.xml for config)
At this point, some of previously-used keys should be purged from HSM and LDAP. Check it the deleted keys were removed from OpenDNSSEC & HSM on all the replicas: 6. Look at list of keys associated with the DNS zone (on DNSSEC key master):
Metadata Update from @pdf: - Issue assigned to pspacek - Issue set to the milestone: FreeIPA 4.2.4
Login to comment on this ticket.