Running "make -j4 distcheck" on RHEL7 or Fedora sometimes fails.
The problem always seems to be a corrupted libsss_ldap_common.so.
See CI output (with links to logs) for examples:
http://sssd-ci.idm.lab.eng.brq.redhat.com:8080/job/new_private_master_fedora20/199/console
http://sssd-ci.idm.lab.eng.brq.redhat.com:8080/job/new_private_master_rhel7/203/console
Or "make -j4 distcheck" output specifically (also to be attached):
http://sssd-ci.idm.lab.eng.brq.redhat.com:8080/job/new_private_master_fedora20/199/artifact/ci-build-debug/ci-make-distcheck.log
http://sssd-ci.idm.lab.eng.brq.redhat.com:8080/job/new_private_master_rhel7/203/artifact/ci-build-debug/ci-make-distcheck.log
Interestingly, this is never a problem on Debian, although it uses a different set of configure options: http://sssd-ci.idm.lab.eng.brq.redhat.com:8080/job/new_private_master_debian_testing/30/console
attachment rhel7-ci-make-distcheck.log.xz
attachment fedora20-ci-make-distcheck.log.xz
attachment debian-testing-ci-make-distcheck.log.xz
I was not able to reproduce this problem on Fedora 20, but I found some interesting lines in the log files on that machine.
Out of memory: Kill process 21701 (memcheck-amd64-) score 148 or sacrifice child
Killed process 21701 (memcheck-amd64) total-vm:268836kB, anon-rss:83404kB, file-rss 308kB
cc: => lslebodn@redhat.com
OOM could be the cause of this issue and of the issue in ticket #2350. Could you try increasing the VM's memory?
Make would probably have noticed a process dying. Still, how much more memory would you like me to add?
You wrote in the ticket description: Execution of "make -j4 distcheck" on RHEL7 or Fedora fails sometimes. I am not sure how to reproduce it. In the log files, I saw something like "file was truncated".
I just want to eliminate potential sources of problems. I expect there is some Java process from Jenkins that can consume memory. In my opinion, it is worth trying to increase the memory. If that does not help, we can continue investigating this problem. But an OOM is not a good sign, and it would be better to get rid of it.
Sure. I'll increase the memory to 1GB then, as a start.
Failed on a Debian VM with 1GB of RAM. Freshly rebooted, went through a few builds without a problem, then failed, now working fine again. No OOM killer messages in the log.
CI output (with links to logs): http://sssd-ci.idm.lab.eng.brq.redhat.com:8080/job/new_private_master_debian_testing/40/console
"make -j4 distcheck" output: http://sssd-ci.idm.lab.eng.brq.redhat.com:8080/job/new_private_master_debian_testing/40/artifact/ci-build-debug/ci-make-distcheck.log
The make output will be attached in a moment.
attachment debian-testing-ci-make-distcheck2.log.xz
In the log file debian-testing-ci-make-distcheck2.log.xz, the problem is with relinking the library libsss_ldap_common.la:
libtool: install: warning: relinking `libsss_ldap_common.la'
libtool: install: (cd /var/lib/jenkins/workspace/new_private_master_debian_testing/ ...
libtool: relink: gcc -shared -fPIC -DPIC src/providers/ldap/.libs/libsss_ldap_common_la-ldap_id.o ...
/usr/bin/ld: cannot find -lsss_idmap
libtool: install: (cd /var/lib/jenkins/workspace/new_private_master_debian_testing/ci-build-debug/sssd-1.11.92/_inst/lib && { ln -s -f libipa_hbac.so.0.0.1 libipa_hbac.so || { rm -f libipa_hbac.so && ln -s libipa_hbac.so.0.0.1 libipa_hbac.so; }; })
collect2: error: ld returned 1 exit status
Relinking the libraries is necessary in distcheck because the files are installed into a different directory than the one they should be installed into according to the values from the configure script (/usr/local/{lib,lib64}).
An install-time relinking error can happen if libraries are listed in the wrong order in an automake variable, e.g. lib_LTLIBRARIES:
# Plugin Libraries #
####################

# libsss_krb5_common must be installed before libsss_ldap_common
# because libtool tries to relink libsss_ldap_common when installing
# libsss_ldap_common and therefore make distcheck fails
pkglib_LTLIBRARIES += libsss_krb5_common.la
pkglib_LTLIBRARIES += libsss_ldap_common.la
Between different variables (say, lib_LTLIBRARIES and pkglib_LTLIBRARIES) there is currently no guaranteed installation ordering at all. Between different Makefiles of a package, the traversal order given by the SUBDIRS variables of all Makefiles needs to walk the libraries in dependency order. http://gnu-automake.7480.n7.nabble.com/relinking-error-td6954.html
It looks like the only solution is to run make distcheck without multiple jobs (-j1).
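To make the ordering constraint concrete, here is a small Python sketch (not part of SSSD, automake, or libtool). It assumes a dependency map built only from this ticket: libsss_ldap_common is relinked against libsss_krb5_common (per the Makefile.am comment) and against libsss_idmap (per the "cannot find -lsss_idmap" error). The names DEPS, relink_failures and safe_install_order are hypothetical. It models only the ordering rule, not make's actual parallel scheduling.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map, limited to what this ticket shows;
# it is NOT the full SSSD library dependency graph.
DEPS = {
    "libsss_ldap_common.la": {"libsss_krb5_common.la", "libsss_idmap.la"},
    "libsss_krb5_common.la": set(),
    "libsss_idmap.la": set(),
}


def relink_failures(install_order, deps):
    """Return (library, missing_dependency) pairs for every library that
    would be relinked before one of its dependencies is installed."""
    installed = set()
    failures = []
    for lib in install_order:
        for dep in sorted(deps.get(lib, ())):
            if dep not in installed:
                failures.append((lib, dep))
        installed.add(lib)
    return failures


def safe_install_order(deps):
    """Return an order in which every library follows its dependencies."""
    # TopologicalSorter expects node -> predecessors, which matches DEPS.
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    racy = ["libsss_krb5_common.la", "libsss_ldap_common.la", "libsss_idmap.la"]
    print(relink_failures(racy, DEPS))
    order = safe_install_order(DEPS)
    print(order, relink_failures(order, DEPS))
```

For the first order the sketch reports that libsss_ldap_common.la is relinked before libsss_idmap.la is installed, while the topologically sorted order has no failures. With -j4 the effective order across different variables is not fixed, which is why only -j1 reliably avoids the race.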
Aargh. That's a shame. Thank you for the research.
I'll consider making the distcheck single-process and moving it to the extended CI run, or removing it altogether.
I think we can close this as "cantfix".
make distcheck works fine on my machine with multiple jobs, and it fails only occasionally in CI. There is a race condition, and we cannot fix it (I agree). Fortunately, we have a workaround for the issue in automake.
owner: somebody => lslebodn
What workaround do we have?
There won't be a race condition if make distcheck is run without multiple jobs (-j1).
Ah, yes, but it's not a workaround for parallel (and faster) execution, unfortunately.
This is the only possible solution (workaround). Feel free to send a patch to automake :-)
Fields changed
resolution: => cantfix status: new => closed
Metadata Update from @nkondras: - Issue assigned to lslebodn - Issue set to the milestone: NEEDS_TRIAGE
SSSD is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in SSSD's GitHub repository.
This issue has been cloned to GitHub and is available here: - https://github.com/SSSD/sssd/issues/3396
If you want to receive further updates on the issue, please navigate to the GitHub issue and click the subscribe button.
Thank you for understanding. We apologize for any inconvenience.