#4078 renewal can fail if CA startup times out
Closed: Fixed None Opened 10 years ago by rcritten.

Have seen this backtrace on RHEL 7 during forced certificate renewal by moving time forward:

service.py:166:__wait_until_running:RuntimeError: CA did not start in 120s

Traceback (most recent call last):
  File "/usr/lib64/ipa/certmonger/renew_ca_cert", line 87, in <module>
    update_cert_config(nickname, cert)
  File "/usr/lib/python2.7/site-packages/ipaserver/install/cainstance.py", line 
1732, in update_cert_config
    quotes=False, separator='=')
  File "/usr/lib64/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/lib/python2.7/site-packages/ipaserver/install/installutils.py", lin
e 822, in stopped_service
    ipaservices.knownservices[service].start(instance_name)
  File "/usr/lib/python2.7/site-packages/ipapython/platform/fedora16/service.py"
, line 172, in start
    self.__wait_until_running()
  File "/usr/lib/python2.7/site-packages/ipapython/platform/fedora16/service.py"
, line 166, in __wait_until_running
    raise RuntimeError('CA did not start in %ss' % timeout)
RuntimeError: CA did not start in 120s

This is from /var/tmp/abrt/Python-2015-10-31-03:12:51-31354/backtrace.

Looking at the CS log from the same time, /var/log/pki/pki-tomcat/catalina.2015-10-31.log, I can see a startup request at Oct 31, 2015 3:10:07 AM and completed startup at Oct 31, 2015 3:13:10 AM.

So 180 seconds may be more appropriate, if not significantly more since this is automation.

The result of this failure was that the ou=pkidbuser entry (subsystemCert) wasn't updated so dogtag couldn't bind to the LDAP server causing all subsequent certificate operations to fail, including renewing the LDAP and Apache certs.

I've never been able to reproduce this myself but Yi has seen this multiple times in his QE automation.


It occurs to me that this can also be non-fatal from a renewal perspective. Sure, it's bad, but it shouldn't prevent the people entry from being updated with the new information, should it? Maybe this should be caught and logged and let the reset proceed.

Probably. We want to make the renewal as robust as possible. It is easier to increase the timeout than to renew manually.

It might be possible that this bug is caused by that renew_ca_cert starts dogtag at wrong time: it calls cainstance.update_cert_config, which calls installutils.stopped_service, which currently starts the service even if it was previously stopped, and dogtag is always stopped at that point in renew_ca_cert.

I fixed this in patch 179 (http://www.redhat.com/archives/freeipa-devel/2013-October/msg00247.html). Rob, can you please check if the patch fixes the problem?

master:[[BR]]
7ce3320 Do not start the service in stopped_service if it was not running before.[[BR]]
cb411ff ipaserver/install/installutils: clean up properly after yield[[BR]]

ipa-3-3:[[BR]]
8e874b1 Do not start the service in stopped_service if it was not running before.[[BR]]
f4739bc ipaserver/install/installutils: clean up properly after yield[[BR]]

I forgot to push the origin patch to increase default timeout:

master: be2c5d2[[BR]]
ipa-3-3: 876a00a

Metadata Update from @rcritten:
- Issue assigned to jcholast
- Issue set to the milestone: FreeIPA 3.3.x - 2014/01 (bug fixing)

7 years ago

Login to comment on this ticket.

Metadata