#48747 dirsrv service fails to start when nsslapd-listenhost is configured
Closed: wontfix. Opened 8 years ago by nhosoi.

Description of problem:
In some environments, network initialization takes a long time (slow DHCP, etc.).
When nsslapd-listenhost is configured, the dirsrv systemd service fails to start
because it doesn't wait for the network to be available (see the attached
bootchart).

Only dirsrv.target waits for the network (its unit has After=network.target). To make it
more confusing, dirsrv.target reports that it is active even though no dirsrv instances
have been started:

[root@rhel7ds ~]# systemctl status dirsrv.target
● dirsrv.target - 389 Directory Server
   Loaded: loaded (/usr/lib/systemd/system/dirsrv.target; enabled; vendor preset: disabled)
   Active: active since Sat 2016-02-27 11:35:14 CET; 5min ago

[root@rhel7ds ~]# systemctl status dirsrv@rhel7ds.service -l
● dirsrv@rhel7ds.service - 389 Directory Server rhel7ds.
   Loaded: loaded (/usr/lib/systemd/system/dirsrv@.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Sat 2016-02-27 11:35:14 CET; 5min ago
  Process: 728 ExecStart=/usr/sbin/ns-slapd -D /etc/dirsrv/slapd-%i -i /var/run/dirsrv/slapd-%i.pid -w /var/run/dirsrv/slapd-%i.startpid (code=exited, status=1/FAILURE)

Feb 27 11:35:13 rhel7ds.brq.redhat.com ns-slapd[728]: [27/Feb/2016:11:35:13 +0100] createprlistensockets - PR_Bind() on 172.25.1.3 port 389 failed: Netscape Portable Runtime error -5986 (Network address not available (in use?).)

How reproducible:
often

Steps to Reproduce:

1. Configure nsslapd-listenhost to use external IP address.
2. Disable the link on that interface (unplug the cable or, for a libvirt VM: virsh domif-setlink $VM_NAME vnetX down).
3. Restart the machine
4. Enable the link during startup while NetworkManager is waiting for it
(that's the tricky part).

Actual results:
dirsrv service fails to start

Expected results:
dirsrv should wait for network.service


On behalf of Viktor, pushed to master:
333f963..1e2cfe2 master -> master
commit 1e2cfe2
Author: Viktor Ashirov vashirov@redhat.com
Date: Mon Feb 29 19:07:12 2016 +0100

I think the After= line in dirsrv@.service should match the one in dirsrv.target. Or dirsrv.target should have no After= line, and we rely on the @.service only.

We should probably add ntpd.service to dirsrv@.service too...

We used to have ntpd in there somewhere, until someone told us to get rid of it for chrony instead.

Input from Lukas Slebodnik (Thanks!!)

On (29/02/16 11:36), Noriko Hosoi wrote:

On 02/29/2016 11:01 AM, Lukas Slebodnik wrote:

On (29/02/16 18:43), Noriko Hosoi wrote:

wrappers/systemd.template.service.in | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

New commits:
commit 1e2cfe2
Author: Viktor Ashirov vashirov@redhat.com
Date: Mon Feb 29 19:07:12 2016 +0100

Ticket 48747 - dirsrv service fails to start when nsslapd-listenhost is configured
Bug Description:
In some environments network initialization takes a long time (slow DHCP, etc).
When nsslapd-listenhost is configured, dirsrv systemd service fails to start,
because it doesn't wait for the network to be available.
Fix Description:
Make dirsrv@.service wait for network.service
https://fedorahosted.org/389/ticket/48747
Reviewed by nhosoi@redhat.com.

diff --git a/wrappers/systemd.template.service.in b/wrappers/systemd.template.service.in
index 629c1ad..3eb0789 100644
--- a/wrappers/systemd.template.service.in
+++ b/wrappers/systemd.template.service.in
@@ -15,7 +15,7 @@
[Unit]
Description=@capbrand@ Directory Server %i.
PartOf=@systemdgroupname@
-After=chronyd.service
+After=chronyd.service network.service
^^^^^^^^^^^^^^^
I'm not sure it is the right solution.
network.service:
network.service - LSB: Bring up/down networking
Loaded: loaded (/etc/rc.d/init.d/network; bad; vendor preset: disabled)
Active: inactive (dead)
Docs: man:systemd-sysv-generator(8)

I think you should use "network-online.target"
more details in man 7 systemd.special
or even better http://www.freedesktop.org/wiki/Software/systemd/NetworkTarget

@see also ntpdate.service chrony-dnssrv@.service

LS
Thanks for your input, Lukas!

I'm reading the docs you mentioned ...

https://www.freedesktop.org/wiki/Software/systemd/NetworkTarget/

Alternatively, you can change your service that needs the network to be up,
to include After=network-online.target and Wants=network-online.target.
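For illustration, the pattern the freedesktop.org page describes would look roughly like this in the [Unit] section of a service that needs the network to be up. This is a minimal sketch of the documented pattern, not the change that was eventually committed for this ticket:

```ini
[Unit]
# Order this unit after network-online.target...
After=network-online.target
# ...and actively pull network-online.target into the boot transaction.
# Whatever is hooked into that target (for example
# NetworkManager-wait-online.service, if it is enabled) then runs first.
Wants=network-online.target
```

The After= line alone only orders; the Wants= line is what makes this unit the "consumer" that activates the target.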

man 7 systemd.special

network-online.target
Units that strictly require a configured network connection should pull in
network-online.target (via a Wants= type dependency) and order themselves after it.
This target unit is intended to pull in a service that delays further execution
until the network is sufficiently set up. What precisely this requires is
left to the implementation of the network managing service.

Note the distinction between this unit and network.target. This unit is an active
unit (i.e. pulled in by the consumer rather than the provider of this functionality)
and pulls in a service which possibly adds substantial delays to further execution.
In contrast, network.target is a passive unit (i.e. pulled in by the provider
of the functionality, rather than the consumer) that usually does not delay execution
much. Usually, network.target is part of the boot of most systems, while
network-online.target is not, except when at least one unit requires it. Also see Running
Services After the Network is up[1] for more information.

All mount units for remote network file systems automatically pull in this unit,
and order themselves after it. Note that networking daemons that simply
provide functionality to other hosts generally do not need to pull this in.

So, you think this does not delay the Directory Server's start enough?
I'm not sure about the delay, but "network.service" is not mentioned on that
page, and it's not used by many services:

grep -Rn "network.service" /usr/lib/systemd/
/usr/lib/systemd/system/arp-ethers.service:5:After=network.service
/usr/lib/systemd/system/NetworkManager.service:4:Before=network.target network.service

+After=chronyd.service network.service

And you suggest we should do
+After=chronyd.service network-online.target
as well as this?
+Wants=network-online.target

I think that this one should be used,
because "network.service" is mentioned only in NetworkManager.service on my
machine, and IMHO it needn't guarantee that the network is online.
network-online.target should be more reliable.

/usr/lib/systemd/system/NetworkManager.service:4:Before=network.target network.service

But you may ask the systemd developers about the difference between "network.service"
and "network-online.target":

{lnykryn,msekleta} at redhat dot com
or on IRC

LS

Replying to [comment:5 rmeggins]:

We used to have ntpd in there somewhere, until someone told us to get rid of it for chrony instead.

Hmmm, this file has ntpd.service...
wrappers/systemd.group.in:After=syslog.target network.target ntpd.service

William, could you confirm this is what you suggest?
{{{
diff --git a/wrappers/systemd.group.in b/wrappers/systemd.group.in
index 135affc..9dd3a86 100644
--- a/wrappers/systemd.group.in
+++ b/wrappers/systemd.group.in
@@ -1,6 +1,5 @@
[Unit]
Description=@capbrand@ Directory Server
-After=syslog.target network.target ntpd.service

[Install]
WantedBy=multi-user.target
diff --git a/wrappers/systemd.template.service.in b/wrappers/systemd.template.service.in
index 3eb0789..12efbcc 100644
--- a/wrappers/systemd.template.service.in
+++ b/wrappers/systemd.template.service.in
@@ -15,7 +15,8 @@
[Unit]
Description=@capbrand@ Directory Server %i.
PartOf=@systemdgroupname@
-After=chronyd.service network.service
+After=chronyd.service network-online.target syslog.target ntpd.service
+Wants=network-online.target

[Service]
Type=notify
}}}

Take a look at https://fedorahosted.org/389/ticket/47947 - can we have a dependency on ntpd.service?

I think you should make the After= line in both:

{{{
After=chronyd.service ntpd.service network.target network-online.target syslog.target
}}}

I do not recommend putting network-online.target into the Wants section.

This way, when the target is reached, we already know these services are ready. If we restart a single dirsrv instance, we know the services will be ready. If the admin is running NM and has wait-online in place (which is the default), then this change fixes the issue.

I have one concern about the After= in the .target, mainly to do with the behaviour of systemctl isolate on the dirsrv.target. I'll need to test this to be sure.

I don't think that network-online.target should be in the Wants section. If we put it in Wants, we implicitly force the use of NetworkManager-wait-online on servers, even if the admin explicitly disabled it. This can delay boots and access to a login prompt. As a system admin in the middle of an emergency, this is the last thing I would want forced on me.
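If the shipped unit only orders itself after network-online.target, an admin who does want dirsrv to force the wait can still opt in locally with a standard systemd drop-in, without patching the packaged unit. A minimal sketch; the drop-in directory for the template unit is the usual systemd location, but the file name wait-online.conf is hypothetical:

```ini
# /etc/systemd/system/dirsrv@.service.d/wait-online.conf  (hypothetical file name)
# Local opt-in only: the packaged unit keeps After=network-online.target
# without Wants=. This drop-in additionally pulls the target in, so it is
# reached, and any *enabled* wait-online service runs, even when no other
# unit on the host requests network-online.target.
[Unit]
Wants=network-online.target
```

After creating or editing a drop-in, run `systemctl daemon-reload` so systemd picks it up.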

The other issue here is often servers DO NOT use NetworkManager anyway! Because NetworkManager has such a bad rap, on servers it's very common to see:

{{{
systemctl disable NetworkManager
systemctl enable network # <<-- This enables the /etc/rc.d/init.d/network script.
}}}

It's also common to see:

{{{
/etc/sysconfig/network-scripts/ifcfg-eth0
...
NM_CONTROLLED=no
}}}

This means that even if the above wasn't run, NetworkManager doesn't own the interface, and the ifup / ifdown scripts are used.

In both of these scenarios NetworkManager-wait-online can't really tell if an address is ready yet because it's out of NetworkManager's hands.

Summary:

This fix will work only for the subset of systems that use NetworkManager on servers and retain the default wait-online. For most admins, I would expect this not to be the case, and no matter what we do, it's hard to avoid. This may fix a default install, but after that, it's system admin error. No amount of patching can save us from that...

git patch file (master) -- take 2 -- updated following the discussions
0001-Ticket-48747-dirsrv-service-fails-to-start-when-nssl.2.patch

In the After line you have:

{{{
After=chronyd.service ntpd.service network.target network-online.target syslog.target
}}}

The definition of network-online.target is:

{{{
...
After=network.target
}}}

So I think we can make this

{{{
After=chronyd.service ntpd.service network-online.target syslog.target
}}}
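With that simplification, the [Unit] section of wrappers/systemd.template.service.in would read roughly as below, and the same After= line would go into wrappers/systemd.group.in. This is a sketch reconstructed from the diffs in this ticket rather than a copy of the committed file; @capbrand@ and @systemdgroupname@ are the template placeholders substituted when the .in files are processed:

```ini
[Unit]
Description=@capbrand@ Directory Server %i.
PartOf=@systemdgroupname@
# After= is ordering only: a listed unit that is not installed or not part of
# the current transaction (e.g. ntpd.service on a chrony host) is ignored,
# so listing both chronyd.service and ntpd.service is harmless.
After=chronyd.service ntpd.service network-online.target syslog.target
```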

New version will be attached.

I have a question about this comment:

replacing network.service with network.target and network-online.target.

Are we replacing {network.service and network.target} with network-online.target?
{{{
3 After=syslog.target network.target ntpd.service
3 After=chronyd.service ntpd.service network-online.target syslog.target
18 After=chronyd.service network.service
18 After=chronyd.service ntpd.service network-online.target syslog.target
18 After=chronyd.service network.service
18 After=chronyd.service ntpd.service network-online.target syslog.target
}}}
Assuming we are, the fix itself looks good.

Yes. If you have NetworkManager-wait-online.service enabled, it is hooked into network-online.target, so we will block properly waiting for it. But if you have NetworkManager-wait-online disabled, this is a no-op.

network-online.target also pulls in network.target / network.service, so I think this is okay.

Thanks for your clarification. You have my full ack now. :)

commit 40c2aa6
Writing objects: 100% (6/6), 980 bytes | 0 bytes/s, done.
Total 6 (delta 4), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
f215fb6..40c2aa6 master -> master

Metadata Update from @nhosoi:
- Issue assigned to firstyear
- Issue set to the milestone: 1.3.5.5

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/1807

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago
