#408 Create a normalized dn cache
Closed: wontfix None Opened 11 years ago by mreynolds.

DN normalization is very expensive, it would be much more efficient to implement a cache of normalized DNs. This would definitely give an overall performance boost, especially around static groups.


set default ticket origin to Community

Added initial screened field value.

I don't think you need cache_misses - you don't really need it except for monitoring and you can always derive it there

There are a lot of calls to strlen, strdup, and other functions that have to iterate the string first. I suggest calculating the length once.

For example, ndn_cache_lookup - return the udn_len with the udn - avoid using strdup and instead use malloc/memcpy OR use a berval for the dn len/val and use slapi_ber_bvcpy and avoid passing around separate dn and len.
{{{
2708 if(strlen(ndn_ht_val->ndn) == dn_len ){
}}}
Why not use ndn_ht_val->size == dn_len?

same with ndn_cache_add - avoid strlen/strdup if possible

Replying to [comment:7 rmeggins]:

I don't think you need cache_misses - you don't really need it except for monitoring and you can always derive it there
Good point. I only added it for debugging actually.

There are a lot of calls to strlen, strdup, and other functions that have to iterate the string first. I suggest calculating the length once.

For example, ndn_cache_lookup - return the udn_len with the udn - avoid using strdup and instead use malloc/memcpy OR use a berval for the dn len/val and use slapi_ber_bvcpy and avoid passing around separate dn and len.
{{{
2708 if(strlen(ndn_ht_val->ndn) == dn_len ){
}}}
Why not use ndn_ht_val->size == dn_len?
No I can't, but it was easy to add the len to the struct and fill it.

same with ndn_cache_add - avoid strlen/strdup if possible

I got rid of all the slapi_ch_strdup's, strcmp's, and strlen's that I could.

Running my basic performance test(randoming searching on users, I think there is improvement with these changes. Some of my tests got as high as a 5% increase, but the average is still around 2-3% which is what I've been observing from the start.

Looks good to me. One tiny corner case. When flushing (ndn_cache_flush), if the nodes in the lru list is shorter than FLUSH_COUNT (10000), something bad happens?
2867 ndn_cache_flush()
2868 {
[...]
2872 node = ndn_cache->tail;
2873 for(i = 0; i < FLUSH_COUNT; i++){
2874 flush_node = node;
[...]

Replying to [comment:9 nhosoi]:

Looks good to me. One tiny corner case. When flushing (ndn_cache_flush), if the nodes in the lru list is shorter than FLUSH_COUNT (10000), something bad happens?
2867 ndn_cache_flush()
2868 {
[...]
2872 node = ndn_cache->tail;
2873 for(i = 0; i < FLUSH_COUNT; i++){
2874 flush_node = node;
[...]

Nice catch! I used to have it at 1000, but after I saw how many DN's were being cache I figured the number needed to be bumped up. Its possible to set the max cache size to 1mb, and trying to flush 10000 dn's would definitely over run and crash. Anyway I set a minimum number of entries to maintain, so we will never run over. Thanks.

Is this necessary?
{{{
ndn_cache_add()
...
if(strlen(ndn) > ndn_len){
/ we need to null terminate the ndn /
*(ndn + ndn_len) = '\0';
}
}}}
why (dn_len * 2)?
{{{
size = (dn_len * 2) + ndn_len + sizeof(PLHashEntry) + sizeof(struct ndn_hash_val) + sizeof(struct ndn_cache_lru);
}}}

in ndn_cache_delete() are you supposed to free ht_val->key? I didn't see where that was freed

One problem with our DN implementation is that there is no way for the normalize code to return a const char * val - that is, it either normalizes the string in place, or it returns a malloc'd string. We might be able to get a performance boost by allowing the use of a const string - that is, one that cannot be changed, and does not have to be freed.

Replying to [comment:11 rmeggins]:

Is this necessary?
{{{
ndn_cache_add()
...
if(strlen(ndn) > ndn_len){
/ we need to null terminate the ndn /
*(ndn + ndn_len) = '\0';
}
}}}

Absolutely! I was testing with DN's that had many many spaces to be removed. The "normalized" string still contained characters from the udn: "uid=mark,dc=example,dc=com dc = example , dc = com ". Now, we need to null terminate the value before processing that key for the hash. If you look at all the places where slapi_dn_normalize_ext is called. If the return value is 1, the string is null terminated after its normalized. So we need to do "that" null terminating at this phase for the ndn_cache.

I originally was going to add this null terminating to the normalize function, but there are so many places where it does this as separate step. I was trying to keep the fix simple. We can always open a new ticket to make this all one step.

why (dn_len * 2)?
{{{
size = (dn_len * 2) + ndn_len + sizeof(PLHashEntry) + sizeof(struct ndn_hash_val) + sizeof(struct ndn_cache_lru);
}}}

Ok, the one dn is the key(in the hash table), the other one is the lru node key, and ndn is in the hash entry.

in ndn_cache_delete() are you supposed to free ht_val->key? I didn't see where that was freed

It's freed in ndn_cache_flush() right after the call to ndn_cache_delete.

One problem with our DN implementation is that there is no way for the normalize code to return a const char * val - that is, it either normalizes the string in place, or it returns a malloc'd string. We might be able to get a performance boost by allowing the use of a const string - that is, one that cannot be changed, and does not have to be freed.

Replying to [comment:12 mreynolds]:

Replying to [comment:11 rmeggins]:

Is this necessary?
{{{
ndn_cache_add()
...
if(strlen(ndn) > ndn_len){
/ we need to null terminate the ndn /
*(ndn + ndn_len) = '\0';
}
}}}

Absolutely! I was testing with DN's that had many many spaces to be removed. The "normalized" string still contained characters from the udn: "uid=mark,dc=example,dc=com dc = example , dc = com ". Now, we need to null terminate the value before processing that key for the hash. If you look at all the places where slapi_dn_normalize_ext is called. If the return value is 1, the string is null terminated after its normalized. So we need to do "that" null terminating at this phase for the ndn_cache.

I originally was going to add this null terminating to the normalize function, but there are so many places where it does this as separate step. I was trying to keep the fix simple. We can always open a new ticket to make this all one step.

Thanks for considering this. Some ACL code handles a long string which contains a pre-normalized DN in the middle. Adding NULL to such ACL string broke some cases and I gave it up...

git merge ticket408
Updating 1f775d7..1d6dd39
Fast-forward
ldap/ldif/template-dse.ldif.in | 3 +-
ldap/servers/slapd/attrsyntax.c | 23 --
ldap/servers/slapd/back-ldbm/monitor.c | 26 +++-
ldap/servers/slapd/dn.c | 347 ++++++++++++++++++++++++++++++++
ldap/servers/slapd/libglobs.c | 71 +++++++-
ldap/servers/slapd/main.c | 3 +
ldap/servers/slapd/proto-slap.h | 7 +
ldap/servers/slapd/schema.c | 17 --
ldap/servers/slapd/slap.h | 6 +
ldap/servers/slapd/slapi-private.h | 6 +-
10 files changed, 462 insertions(+), 47 deletions(-)

[mareynol@localhost servers]$ git push origin master
Counting objects: 33, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (17/17), done.
Writing objects: 100% (17/17), 5.98 KiB, done.
Total 17 (delta 15), reused 0 (delta 0)
To ssh://git.fedorahosted.org/git/389/ds.git
1f775d7..1d6dd39 master -> master

git merge ticket408
Updating 26aad75..03c90f0
Fast-forward
ldap/servers/slapd/back-ldbm/monitor.c | 2 +-
ldap/servers/slapd/dn.c | 93 +++++++++++++++++++++++++-------
ldap/servers/slapd/main.c | 1 +
ldap/servers/slapd/slapi-private.h | 2 +
4 files changed, 78 insertions(+), 20 deletions(-)

git push origin master
26aad75..03c90f0 master -> master

commit 03c90f0
Author: Mark Reynolds mreynolds@redhat.com
Date: Thu Jan 16 15:21:28 2014 -0500

1.3.2
d06de4e..50ad64a 389-ds-base-1.3.2 -> 389-ds-base-1.3.2

1.3.1
4aa849f..e0d85be 389-ds-base-1.3.1 -> 389-ds-base-1.3.1

1.3.0
9936fdd..075a54e 389-ds-base-1.3.0 -> 389-ds-base-1.3.0

git patch file (389-ds-base-1.3.1) -- covscan Defect type: FORWARD_NULL
0001-Ticket-408-create-a-normalized-dn-cache.2.patch

Reviewed by Rich (Thank you!!)

Pushed to master:
272bd14..383db4a master -> master
commit 5ac0803

Pushed to 389-ds-base-1.3.2:
5bf85e6..8b92149 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
commit 977c1b2

Pushed to 389-ds-base-1.3.2:
4f9ec32..86b76ef 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit 056d390

Found second FORWARD_NULL in addition to commit 5ac0803

Pushed to master:
383db4a..ea13cda master -> master
commit ea13cda

Pushed to 389-ds-base-1.3.2:
8b92149..dfa36fd 389-ds-base-1.3.2 -> 389-ds-base-1.3.2
commit dfa36fd

Pushed to 389-ds-base-1.3.1:
86b76ef..347ffb7 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit 347ffb7

83ecd45..2a8da7e 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit 2a8da7e
Author: Mark Reynolds mreynolds@redhat.com
Date: Fri Dec 19 16:36:11 2014 -0500

git patch file (1.2.11 branch) -- fixing the type of nsslapd-ndn-cache-enabled
0001-Ticket-408-Backport-of-Normalized-DN-Cache.patch

Thank you for the review, Rich!

Pushed to 389-ds-base-1.2.11:
88ecf0c..37d5696 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit 37d5696

Metadata Update from @rmeggins:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.2.11.33

7 years ago

389-ds-base is moving from Pagure to Github. This means that new issues and pull requests
will be accepted only in 389-ds-base's github repository.

This issue has been cloned to Github and is available here:
- https://github.com/389ds/389-ds-base/issues/408

If you want to receive further updates on the issue, please navigate to the github issue
and click on subscribe button.

Thank you for understanding. We apologize for all inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago

Login to comment on this ticket.

Metadata