Learn more about these different git repos.
Other Git URLs
https://bugzilla.redhat.com/show_bug.cgi?id=883429 (Fedora)
sss_nss_check_header() in src/sss_client/nss_mc_common.c lacks a memory barrier. As a result, this loop: /* retry barrier protected reading max 5 times then give up */ for (count = 5; count > 0; count--) { memcpy(&h, ctx->mmap_base, sizeof(struct sss_mc_header)); if (MC_VALID_BARRIER(h.b1) && h.b1 == h.b2) { /* record is consistent so we can proceed */ break; } } if (count == 0) { /* couldn't successfully read header we have to give up */ return EIO; } is compiled to: 0x0000000000004d20 <+0>: mov 0x10(%rdi),%rdx 0x0000000000004d24 <+4>: mov %rbx,-0x28(%rsp) 0x0000000000004d29 <+9>: mov $0x5,%eax 0x0000000000004d2e <+14>: mov %rbp,-0x20(%rsp) 0x0000000000004d33 <+19>: mov %r12,-0x18(%rsp) 0x0000000000004d38 <+24>: mov %r13,-0x10(%rsp) 0x0000000000004d3d <+29>: mov %r14,-0x8(%rsp) 0x0000000000004d42 <+34>: mov (%rdx),%esi 0x0000000000004d44 <+36>: mov 0x28(%rdx),%ebp 0x0000000000004d47 <+39>: mov 0x20(%rdx),%ebx 0x0000000000004d4a <+42>: mov 0x30(%rdx),%r8d 0x0000000000004d4e <+46>: mov 0x8(%rdx),%r9d 0x0000000000004d52 <+50>: mov 0xc(%rdx),%r11d 0x0000000000004d56 <+54>: mov 0x10(%rdx),%r12d 0x0000000000004d5a <+58>: mov 0x1c(%rdx),%r13d 0x0000000000004d5e <+62>: mov %esi,%ecx 0x0000000000004d60 <+64>: mov 0x4(%rdx),%r10d 0x0000000000004d64 <+68>: mov 0x14(%rdx),%r14d 0x0000000000004d68 <+72>: and $0xff000000,%ecx 0x0000000000004d6e <+78>: cmp $0xf0000000,%ecx 0x0000000000004d74 <+84>: je 0x4da0 <sss_nss_check_header+128> 0x0000000000004d76 <+86>: sub $0x1,%eax 0x0000000000004d79 <+89>: jne 0x4d6e <sss_nss_check_header+78> 0x0000000000004d7b <+91>: mov $0x5,%eax 0x0000000000004d80 <+96>: mov -0x28(%rsp),%rbx 0x0000000000004d85 <+101>: mov -0x20(%rsp),%rbp 0x0000000000004d8a <+106>: mov -0x18(%rsp),%r12 0x0000000000004d8f <+111>: mov -0x10(%rsp),%r13 0x0000000000004d94 <+116>: mov -0x8(%rsp),%r14 0x0000000000004d99 <+121>: retq 0x0000000000004d9a <+122>: nopw 0x0(%rax,%rax,1) 0x0000000000004da0 <+128>: cmp %r8d,%esi That is, %ecx is never reloaded from memory. In sss_nss_mc_get_record(), __sync_synchronize() is needed before the final barrier check. sss_mc_add_rec_to_chain() in src/responder/nss/nsssrv_mmap_cache.c contains this comment: /* changing a single uint32_t is atomic, so there is no * need to use barriers in this case */ I think the comment is misleading because it's not just atomicity that matters, ordering can also be relevant. However, in this particular case, it does not appear to matter in what order the writes to the links in the hash chain happen. A client can pick up a reference to a hash table slot which is about to be invalidated, resulting in a lookup failure when the record is actually present in the cache. This is probably not a problem for this application. You need to deal with counter overflow in your barriers (the ABA problem). One way to do this is to switch to switch to a different file when it happens, and rename it over the existing file. Concurrent readers will still have a consistent view. It seems you use this scheme because you need wait-free writers, so that the privileged process which updates the cache does not block on readers. Correct? Otherwise, there are probably simpler approaches.
Fields changed
blockedby: => blocking: => coverity: => design: => design_review: => 0 feature_milestone: => fedora_test_page: => milestone: NEEDS_TRIAGE => SSSD 1.9.3 owner: somebody => simo status: new => assigned testsupdated: => 0
patch: 0 => 1
A synchronization instruction right after the memcpy effectively fixes this problem. This is the new code:
Dump of assembler code for function sss_nss_check_header: 0x00000000000056dc <+0>: push %rbp 0x00000000000056dd <+1>: mov %rsp,%rbp 0x00000000000056e0 <+4>: mov %rdi,-0x48(%rbp) 0x00000000000056e4 <+8>: movl $0x5,-0x4(%rbp) 0x00000000000056eb <+15>: jmp 0x574a <sss_nss_check_header+110> 0x00000000000056ed <+17>: mov -0x48(%rbp),%rax 0x00000000000056f1 <+21>: mov 0x10(%rax),%rax 0x00000000000056f5 <+25>: mov (%rax),%rdx 0x00000000000056f8 <+28>: mov %rdx,-0x40(%rbp) 0x00000000000056fc <+32>: mov 0x8(%rax),%rdx 0x0000000000005700 <+36>: mov %rdx,-0x38(%rbp) 0x0000000000005704 <+40>: mov 0x10(%rax),%rdx 0x0000000000005708 <+44>: mov %rdx,-0x30(%rbp) 0x000000000000570c <+48>: mov 0x18(%rax),%rdx 0x0000000000005710 <+52>: mov %rdx,-0x28(%rbp) 0x0000000000005714 <+56>: mov 0x20(%rax),%rdx 0x0000000000005718 <+60>: mov %rdx,-0x20(%rbp) 0x000000000000571c <+64>: mov 0x28(%rax),%rdx 0x0000000000005720 <+68>: mov %rdx,-0x18(%rbp) 0x0000000000005724 <+72>: mov 0x30(%rax),%eax 0x0000000000005727 <+75>: mov %eax,-0x10(%rbp) 0x000000000000572a <+78>: mfence 0x000000000000572d <+81>: mov -0x40(%rbp),%eax 0x0000000000005730 <+84>: and $0xff000000,%eax 0x0000000000005735 <+89>: cmp $0xf0000000,%eax 0x000000000000573a <+94>: jne 0x5746 <sss_nss_check_header+106> 0x000000000000573c <+96>: mov -0x40(%rbp),%edx 0x000000000000573f <+99>: mov -0x10(%rbp),%eax 0x0000000000005742 <+102>: cmp %eax,%edx 0x0000000000005744 <+104>: je 0x5752 <sss_nss_check_header+118> 0x0000000000005746 <+106>: subl $0x1,-0x4(%rbp) 0x000000000000574a <+110>: cmpl $0x0,-0x4(%rbp) 0x000000000000574e <+114>: jg 0x56ed <sss_nss_check_header+17> 0x0000000000005750 <+116>: jmp 0x5753 <sss_nss_check_header+119> 0x0000000000005752 <+118>: nop 0x0000000000005753 <+119>: cmpl $0x0,-0x4(%rbp) 0x0000000000005757 <+123>: jne 0x5763 <sss_nss_check_header+135> 0x0000000000005759 <+125>: mov $0x5,%eax 0x000000000000575e <+130>: jmpq 0x5852 <sss_nss_check_header+374> 0x0000000000005763 <+135>: mov -0x3c(%rbp),%eax 0x0000000000005766 <+138>: cmp $0x1,%eax ....
Patch sent to the list.
I have not changed anything for ABA because updates are so rare that it is effectively impossible for any meaningful client to run into this issue IMO.
resolution: => fixed status: assigned => closed
The fix was not complete, the Fedora bugzilla has been reopened as well.
resolution: fixed => status: closed => reopened
milestone: SSSD 1.9.3 => SSSD 1.9.4 patch: 1 => 0 version: => 1.9.3
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=886757
rhbz: [https://bugzilla.redhat.com/show_bug.cgi?id=883429 883429] => [https://bugzilla.redhat.com/show_bug.cgi?id=883429 883429], [https://bugzilla.redhat.com/show_bug.cgi?id=886757 886757]
resolution: => fixed status: reopened => closed
Metadata Update from @jhrozek: - Issue assigned to simo - Issue set to the milestone: SSSD 1.9.4
SSSD is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in SSSD's github repository.
This issue has been cloned to Github and is available here: - https://github.com/SSSD/sssd/issues/2736
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
subscribe
Thank you for understanding. We apologize for all inconvenience.
Login to comment on this ticket.