#48996 Update DS to accommodate the changes to NS 0.2.0
Closed: wontfix. Opened 7 years ago by firstyear.

In the update to NS 0.2.0 the API has changed slightly. We should fix the current NS integration so it can consume NS 0.2.0 and continue to support users who have enabled it.


Tested with DS and ldclt. Results are:

{{{
Without NS
ldclt[14959]: Average rate: 567.57/thr (1702.70/sec), total: 17027
ldclt[14959]: Average rate: 413.67/thr (1241.00/sec), total: 12410
ldclt[14959]: Average rate: 116.70/thr ( 350.10/sec), total: 3501
ldclt[14959]: Average rate: 104.90/thr ( 314.70/sec), total: 3147
ldclt[14959]: Average rate: 116.73/thr ( 350.20/sec), total: 3502
ldclt[14959]: Average rate: 171.60/thr ( 514.80/sec), total: 5148
ldclt[14959]: Average rate: 305.03/thr ( 915.10/sec), total: 9151
ldclt[14959]: Average rate: 391.33/thr (1174.00/sec), total: 11740
ldclt[14959]: Average rate: 227.07/thr ( 681.20/sec), total: 6812
ldclt[14959]: Average rate: 190.77/thr ( 572.30/sec), total: 5723
ldclt[14959]: Number of samples achieved. Bye-bye...
ldclt[14959]: All threads are dead - exit.
ldclt[14959]: Global average rate: 2605.37/thr (781.61/sec), total: 78161

With NS 0.2.0
ldclt[15114]: Average rate: 720.77/thr (2162.30/sec), total: 21623
ldclt[15114]: Average rate: 309.23/thr ( 927.70/sec), total: 9277
ldclt[15114]: Average rate: 138.47/thr ( 415.40/sec), total: 4154
ldclt[15114]: Average rate: 110.50/thr ( 331.50/sec), total: 3315
ldclt[15114]: Average rate: 98.77/thr ( 296.30/sec), total: 2963
ldclt[15114]: Average rate: 103.03/thr ( 309.10/sec), total: 3091
ldclt[15114]: Average rate: 115.67/thr ( 347.00/sec), total: 3470
ldclt[15114]: Average rate: 282.23/thr ( 846.70/sec), total: 8467
ldclt[15114]: Average rate: 552.80/thr (1658.40/sec), total: 16584
ldclt[15114]: Average rate: 311.60/thr ( 934.80/sec), total: 9348
ldclt[15114]: Number of samples achieved. Bye-bye...
ldclt[15114]: All threads are dead - exit.
ldclt[15114]: Global average rate: 2743.07/thr (822.92/sec), total: 82292
}}}

A second patch will be added here too for creation of slapi_ch_memalign, to malloc aligned memory.
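Though that patch isn't attached inline here, a minimal sketch of what such an aligned-allocation wrapper might look like, assuming it wraps posix_memalign(3) and mirrors the abort-on-failure behaviour of the other slapi_ch_ allocators (the function name and signature below are illustrative, not the committed API):

{{{
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/* Illustrative sketch only -- not the committed slapi_ch_memalign() API. */
void *
example_ch_memalign(size_t size, size_t alignment)
{
    void *ptr = NULL;
    /* posix_memalign(3) requires alignment to be a power of two and a
     * multiple of sizeof(void *). */
    if (posix_memalign(&ptr, alignment, size) != 0) {
        /* The slapi_ch_ allocators abort on allocation failure; mirror that. */
        abort();
    }
    return ptr;
}
}}}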

We don't keep the option without NS?

Also, if you discontinue an API, you should clean up all the underlying implementation?
libglobs.c:config_get_maxdescriptors(void)

{{{
ldclt[10895]: Average rate: 728.37/thr (2185.10/sec), total: 21851
ldclt[10895]: Average rate: 338.57/thr (1015.70/sec), total: 10157
ldclt[10895]: Average rate: 232.60/thr ( 697.80/sec), total: 6978
ldclt[10895]: Average rate: 171.37/thr ( 514.10/sec), total: 5141
ldclt[10895]: Average rate: 135.10/thr ( 405.30/sec), total: 4053
ldclt[10895]: Average rate: 151.73/thr ( 455.20/sec), total: 4552
ldclt[10895]: Average rate: 597.80/thr (1793.40/sec), total: 17934
ldclt[10895]: Average rate: 364.37/thr (1093.10/sec), total: 10931
ldclt[10895]: Average rate: 215.43/thr ( 646.30/sec), total: 6463
ldclt[10895]: Average rate: 207.90/thr ( 623.70/sec), total: 6237
ldclt[10895]: Number of samples achieved. Bye-bye...
ldclt[10895]: All threads are dead - exit.
ldclt[10895]: Global average rate: 3143.23/thr (942.97/sec), total: 94297

}}}

Those numbers look good! What exact ldclt command did you use? Number of threads?

In addition to Noriko's comments can you add a small comment to slapi_ch_memalign() stating what it does. We know what it does, but it might throw off others looking at the code.

/opt/dirsrv/bin/ldclt -h localhost -p 389 -n 30 -N 10 -D "uid=testXXXX,dc=example,dc=com" -w passwordXXXX -e "randombinddn,randombinddnlow=0001,randombinddnhigh=9999" -e bindeach,bindonly

With 10,000 objects in the DS.

> We don't keep the option without NS?

What do you mean?

> Also, if you discontinue an API, you should clean up all the underlying implementation?
> libglobs.c:config_get_maxdescriptors(void)

Ahhh, I didn't realise that this was the only place we used it. I'll have a look.

Okay, I will add the comment to the memalign. Thanks for the review.

Merge the two patches into one. Update based on comments.
0001-Ticket-48996-update-DS-for-ns-0.2.0.3.patch

Updated patch based on comments. Merged the two for easier review.

It looks like config_get_maxdescriptors is used in get_configured_connection_table_size, which is used to set
the connection table size when nunc-stans is disabled. So I'm going to leave that alone.

I'm including everyone in the CC list. :) I've had a short conversation with Nathan regarding this topic -- how to switch from the current connection table to the Nunc-Stans implementation.

We used to have a configuration switch to support two methods in some improvements, which turned out to be not so useful after all, considering the support burden and the code's readability. So we do not want to do that here. Instead, can we put the new DS code for NS 0.2.0 into one new file (let me call it new_connection.c temporarily) and switch between the old code (connection.c + conntable.c?) and new_connection.c by a configure option?

Until NS 0.2.0 and new_connection.c are fully stabilized, we should keep the current code available. Otherwise, when debugging some other place that is affected by the new code, we could easily get confused...

This is just a proposal. Your comments would be greatly appreciated.

I want to clear something up.

This ticket does not remove or alter the behaviour of the connection table. That behaviour stays. For discussion of changes to the connection table and its removal, there is bug #48977. I completely agree that when we are working on that ticket we should do as you suggest, and make it in a new .c file in parallel. I think at that time, when it is complete, we should take away the nsslapd-enable-nunc-stans: on setting, because "today" we have a customer who relies on this setting and its effect on accept().

Now let's look at the current change.

Currently, nunc-stans 0.1.8 is broken IMO. It's not reliable and has a lot of stability flaws. The work towards 0.2.0 has all been focused on correctness and stability, but that work has involved changing the API of nunc-stans. This is why this ticket alters the current nunc-stans integration code, to make it possible to use 0.2.0 as a stopgap until we implement #48977.

In summary:

- #48996 (this ticket) is to use 0.2.0 with the current nunc-stans accept() and connection table code, keeping the status quo.
- #48977 is to use 0.2.0 as the thread workers, and to remove the connection table. When this is complete, we would #ifdef out the current connection code with a compile flag.
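To illustrate the compile-flag idea mentioned for #48977 (the macro name and functions below are hypothetical placeholders, not the actual 389-ds code):

{{{
#include <stdio.h>

/* Hypothetical sketch of switching connection-handling implementations at
 * compile time; building with -DENABLE_NS_CONN_EXAMPLE selects the new path. */
#ifdef ENABLE_NS_CONN_EXAMPLE
static void
handle_new_connection(int fd)
{
    printf("fd %d: dispatched via the nunc-stans event framework\n", fd);
}
#else
static void
handle_new_connection(int fd)
{
    printf("fd %d: slotted into the legacy connection table\n", fd);
}
#endif

int
main(void)
{
    handle_new_connection(7);
    return 0;
}
}}}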

Does that seem like a reasonable explanation and path forwards with this?

Ah, ok. I thought you were trying to get rid of, e.g., config_get_maxdescriptors, which is not needed for NS 0.2.0 any more. But sorry, you are right. You are removing it only inside #ifdef ENABLE_NUNC_STANS.

Since it is really hard to review with lots of indentation changes, I'm attaching a diff without them and reviewing based upon that file...

Made from 0001-Ticket-48996-update-DS-for-ns-0.2.0.3.patch, ignoring space changes.
diffs.txt

Please fix the indentation...
{{{
ldap/servers/slapd/backend_manager.c
15 + Slapi_PBlock pb;
}}}
I think you don't want to remove this memset.
{{{
ldap/servers/slapd/pblock.c
264 +    /* memset( pb, '\0', sizeof(Slapi_PBlock) ); */
}}}
I think you'd better follow the doxygen format? (Yes, I know the other slapi_ch_ functions don't even have comments... :)
{{{
ldap/servers/slapd/slapi-plugin.h
276 +/*
277 + * memalign returns an aligned block of memory as a multiple of alignment.
278 + * alignment must be a power of 2. This is not normally needed, but is required
279 + * for memory that works with certain cpu operations. It's basically malloc
280 + * with some extra guarantees.
281 + */
}}}
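For illustration, a doxygen-style rendering of that comment might look like this (a sketch, not the text that was actually committed):

{{{
/**
 * \brief Return a block of memory aligned to a given boundary.
 *
 * Behaves like malloc() but guarantees that the returned pointer is a
 * multiple of \p alignment, which must be a power of two. This is not
 * normally needed, but is required for memory used with certain CPU
 * operations.
 *
 * \param size      Number of bytes to allocate.
 * \param alignment Required alignment; must be a power of two.
 * \return Pointer to the aligned block of memory.
 */
}}}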

Just for the ticket, we do want to remove the memset. This is the only place we call pblock_init_common. The memset actually triggers a stack overflow when used. We shouldn't be using memset like this in 2016, as we can declare the struct on the stack, zero-initialized, on each loop iteration using:

{{{
Slapi_PBlock pb = {0};
}}}

Which is much cleaner, doesn't cause the overflow, and is a correct, modern way to do this.
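As a self-contained illustration of the pattern (using a stand-in struct, since the real Slapi_PBlock definition is large):

{{{
/* Illustrative sketch only -- a stand-in for the real Slapi_PBlock. */
struct example_pblock {
    int op;
    void *backend;
    char buf[64];
};

void
process_backends(int n)
{
    for (int i = 0; i < n; i++) {
        /* A freshly zeroed struct on every iteration, with no explicit
         * memset(); C99 guarantees members not named in the initializer
         * are implicitly zeroed. */
        struct example_pblock pb = {0};
        pb.op = i;
        /* ... hand pb to the per-backend operation ... */
    }
}
}}}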

Replying to [comment:12 firstyear]:

> Just for the ticket, we do want to remove the memset. This is the only place we call pblock_init_common. The memset actually triggers a stack overflow when used.

Then we should increase the stack size.

> We shouldn't be using memset like this in 2016, as we can declare the struct on the stack, zero-initialized, on each loop iteration using:
>
> {{{
> Slapi_PBlock pb = {0};
> }}}
>
> Which is much cleaner, doesn't cause the overflow, and is a correct, modern way to do this.

In the past, this wouldn't work. Did C99 change automatic stack variable structure initialization to be able to initialize the entire structure to zeros using a simple {0}, and not {0, 0, ...} for each structure member?

If in both cases we are initializing a pblock structure that is too large on the stack, why does memset fail but variable initialization succeed? Aren't they both accessing past the end of a struct that is too large on the stack?

Replying to [comment:13 rmeggins]:

> Replying to [comment:12 firstyear]:
>
> > Just for the ticket, we do want to remove the memset. This is the only place we call pblock_init_common. The memset actually triggers a stack overflow when used.
>
> Then we should increase the stack size.

Stack size doesn't affect it. Despite all my investigation, I could never work out what was really going wrong here, and this was the only reliable fix.

> > We shouldn't be using memset like this in 2016, as we can declare the struct on the stack, zero-initialized, on each loop iteration using:
> >
> > {{{
> > Slapi_PBlock pb = {0};
> > }}}
> >
> > Which is much cleaner, doesn't cause the overflow, and is a correct, modern way to do this.
>
> In the past, this wouldn't work. Did C99 change automatic stack variable structure initialization to be able to initialize the entire structure to zeros using a simple {0}, and not {0, 0, ...} for each structure member?

I believe so. I'm not seeing any compiler warnings for backend_manager.c.

> If in both cases we are initializing a pblock structure that is too large on the stack, why does memset fail but variable initialization succeed? Aren't they both accessing past the end of a struct that is too large on the stack?

I tried for ages to solve this. I was disassembling the libraries, running under gdb, everything. It didn't make any sense whatsoever. But ASAN would detect that we were over-running the struct, causing the overflow, when we called this memset. Curiously, this happens only while nunc-stans is enabled (which led to another slew of NS fixes, but still no dice). What makes it weirder is that the threads aren't even part of the nunc-stans worker pools, so I'm just as lost now as to why it's happening.

So I could spend the next week working it out, or I could just supply a fix that is cleaner, more modern, and removes a memset.

Replying to [comment:14 firstyear]:

> Replying to [comment:13 rmeggins]:
>
> > Replying to [comment:12 firstyear]:
> >
> > > Just for the ticket, we do want to remove the memset. This is the only place we call pblock_init_common. The memset actually triggers a stack overflow when used.
> >
> > Then we should increase the stack size.
>
> Stack size doesn't affect it. Despite all my investigation, I could never work out what was really going wrong here, and this was the only reliable fix.
>
> > > We shouldn't be using memset like this in 2016, as we can declare the struct on the stack, zero-initialized, on each loop iteration using:
> > >
> > > {{{
> > > Slapi_PBlock pb = {0};
> > > }}}
> > >
> > > Which is much cleaner, doesn't cause the overflow, and is a correct, modern way to do this.
> >
> > In the past, this wouldn't work. Did C99 change automatic stack variable structure initialization to be able to initialize the entire structure to zeros using a simple {0}, and not {0, 0, ...} for each structure member?
>
> I believe so. I'm not seeing any compiler warnings for backend_manager.c.

Would you see compiler warnings if all of the pblock members were not initialized to 0?

At any rate, on Fedora 24, a simple test shows that all structure members are initialized to 0, whether {{{ = {0} }}} or memset is used. It is a trivial example, and perhaps the behavior would change in a large program, but I don't know why that would be the case.
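A minimal version of such a test might look like this (a sketch with a stand-in struct, not the real Slapi_PBlock; note that = {0} guarantees zeroed members, though not padding bytes):

{{{
#include <assert.h>
#include <stddef.h>

/* Stand-in struct for the test; the real Slapi_PBlock is much larger. */
struct big {
    int a;
    double b;
    char buf[128];
    void *p;
};

int
main(void)
{
    struct big x = {0};
    /* C99 6.7.8p21: members not named in the initializer are zeroed. */
    assert(x.a == 0 && x.b == 0.0 && x.p == NULL);
    for (size_t i = 0; i < sizeof(x.buf); i++) {
        assert(x.buf[i] == 0);
    }
    return 0;
}
}}}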

> > If in both cases we are initializing a pblock structure that is too large on the stack, why does memset fail but variable initialization succeed? Aren't they both accessing past the end of a struct that is too large on the stack?
>
> I tried for ages to solve this. I was disassembling the libraries, running under gdb, everything. It didn't make any sense whatsoever. But ASAN would detect that we were over-running the struct, causing the overflow, when we called this memset.

Then it would seem to have nothing to do with the stack size. It doesn't make sense unless the size of the pblock structure is somehow different between the two contexts.

> Curiously, this happens only while nunc-stans is enabled (which led to another slew of NS fixes, but still no dice). What makes it weirder is that the threads aren't even part of the nunc-stans worker pools, so I'm just as lost now as to why it's happening.
>
> So I could spend the next week working it out, or I could just supply a fix that is cleaner, more modern, and removes a memset.

Yes, but I am concerned that we are just masking a real bug that will come back to haunt us later.

> > > In the past, this wouldn't work. Did C99 change automatic stack variable structure initialization to be able to initialize the entire structure to zeros using a simple {0}, and not {0, 0, ...} for each structure member?
> >
> > I believe so. I'm not seeing any compiler warnings for backend_manager.c.
>
> Would you see compiler warnings if all of the pblock members were not initialized to 0?
>
> At any rate, on Fedora 24, a simple test shows that all structure members are initialized to 0, whether {{{ = {0} }}} or memset is used. It is a trivial example, and perhaps the behavior would change in a large program, but I don't know why that would be the case.

Glad that you tested it and got the same result as I did. When I see things like this, I'll probably start replacing them over time. If I had more time, I'd probably do an en masse cleanup of whitespace and things like this, TBH.

> > > If in both cases we are initializing a pblock structure that is too large on the stack, why does memset fail but variable initialization succeed? Aren't they both accessing past the end of a struct that is too large on the stack?
> >
> > I tried for ages to solve this. I was disassembling the libraries, running under gdb, everything. It didn't make any sense whatsoever. But ASAN would detect that we were over-running the struct, causing the overflow, when we called this memset.
>
> Then it would seem to have nothing to do with the stack size. It doesn't make sense unless the size of the pblock structure is somehow different between the two contexts.

Which I was testing for, but couldn't work out. It didn't make any sense what was occurring, because in gdb, reading the last field would show an address at 704 bytes with a length of 8 bytes (to make 712), and you could read/write that just fine. It was only the application of the memset that had the issue.

> > Curiously, this happens only while nunc-stans is enabled (which led to another slew of NS fixes, but still no dice). What makes it weirder is that the threads aren't even part of the nunc-stans worker pools, so I'm just as lost now as to why it's happening.
> >
> > So I could spend the next week working it out, or I could just supply a fix that is cleaner, more modern, and removes a memset.
>
> Yes, but I am concerned that we are just masking a real bug that will come back to haunt us later.

It could just as easily be an issue with ASAN as it could be a real bug. Again, we've satisfied the tools and the test cases, so I think the question here is whether it's worth me investigating further. I'm still continually running dynamic analysis tools, so I'll pick up issues pretty quickly if they do crop up. I'm trying to be proactive rather than reactive here. I'm not adding "more" issues here, and this change allows us to go from crashing to passing a large load test. If it were likely to cause "another issue", surely we would see this one turn into another overflow later, but that's not what I saw.

Okay, undoing the memset change, but the patch stays as is otherwise.

commit ff9dafa

{{{
Compressing objects: 100% (14/14), done.
Writing objects: 100% (14/14), 3.44 KiB | 0 bytes/s, done.
Total 14 (delta 12), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   617b64f..ff9dafa  master -> master
}}}

Once your patch is applied, the build started working again. Thanks.

commit e23f2af

{{{
Compressing objects: 100% (23/23), done.
Writing objects: 100% (23/23), 9.31 KiB | 0 bytes/s, done.
Total 23 (delta 20), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
   b96806c..5cc301f  master -> master
}}}

Metadata Update from @firstyear:
- Issue assigned to firstyear
- Issue set to the milestone: 1.3.6 backlog

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/2055

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago
