With very large databases, some queries do a lot of work to build huge ID lists for filter components that match many IDs. For example, a search for (&(objectclass=inetorgperson)(uid=foo)) may build a huge ID list for objectclass=inetorgperson, only to discard nearly all of it when intersecting it with uid=foo. In these cases, it would be useful to be able to tell the indexing code to use a different idlistscanlimit for certain indexes, or to use no ID list at all. In the above case, it would be useful to tell the indexing code to skip building an ID list for objectclass=inetorgperson, but still use the default idlistscanlimit for other objectclass searches (e.g. objectclass=groupOfNames).
This would also help in https://fedorahosted.org/389/ticket/47474 - if there are several million IDs for each of the objectclass= filter components, being able to skip id list generation for the objectclass values would make that query very fast.
We can't reuse nsslapd-idlistscanlimit, so perhaps a new attribute:

    dn: cn=attrname,cn=index,...
    objectclass: nsIndex
    nsIndexIDSize: NNNN[:type][:eqvalue:eqvalue:...]

Where:
NNNN is the max ID list size (or 0 for no list at all)
type is the type of index (sub, pres, eq)
eqvalue are for equality indexes - these are the values to which the max ID list size applies
So in the case of ticket/47474, something like

    dn: cn=objectclass,...
    objectclass: nsIndex
    nsIndexType: eq
    nsIndexIDSize: 0:eq:organizationalPerson:inetOrgPerson:organization:organizationalUnit:groupOfNames:groupOfUniqueNames:group
Would effectively disable id list generation for the objectclass values listed.
Note that this will apply to all queries for any of the objectclass values, not just their use in conjunction with this particular search filter.
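To make the proposed value syntax concrete, here is a minimal C sketch of splitting NNNN[:type][:eqvalue:eqvalue:...] into its parts. It only illustrates the format as proposed in this ticket; the attribute syntax that was eventually implemented may differ, and struct idsize_spec, next_field and parse_idsize_spec are hypothetical names, not functions from the 389-ds source.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_EQ_VALUES 16

/* Hypothetical holder for one parsed nsIndexIDSize value. */
struct idsize_spec {
    long limit;                  /* max ID list size; 0 means build no list */
    char type[8];                /* "eq", "sub", "pres", or "" if omitted */
    char *values[MAX_EQ_VALUES]; /* equality values; they point into buf */
    int nvalues;
    char buf[256];               /* private copy of the text being split */
};

/* Return the next ':'-separated field, or NULL when none are left. */
static char *
next_field(char **cursor)
{
    char *start = *cursor;
    char *colon;

    if (start == NULL) {
        return NULL;
    }
    colon = strchr(start, ':');
    if (colon != NULL) {
        *colon = '\0';
        *cursor = colon + 1;
    } else {
        *cursor = NULL;
    }
    return start;
}

/* Parse "NNNN[:type][:eqvalue:...]". Returns 0 on success, -1 if malformed. */
int
parse_idsize_spec(const char *text, struct idsize_spec *out)
{
    char *cursor, *field, *end = NULL;

    assert(text != NULL && out != NULL);
    memset(out, 0, sizeof(*out));
    if (strlen(text) >= sizeof(out->buf)) {
        return -1;
    }
    strcpy(out->buf, text);
    cursor = out->buf;

    /* First field: the numeric limit. */
    field = next_field(&cursor);
    out->limit = strtol(field, &end, 10);
    if (end == field || *end != '\0' || out->limit < 0) {
        return -1;
    }

    /* Optional second field: the index type. */
    field = next_field(&cursor);
    if (field == NULL) {
        return 0;
    }
    if (strcmp(field, "eq") != 0 && strcmp(field, "sub") != 0 &&
        strcmp(field, "pres") != 0) {
        return -1;
    }
    strcpy(out->type, field); /* at most 4 characters, fits in type[8] */

    /* Remaining fields: values, which only make sense for eq indexes. */
    while ((field = next_field(&cursor)) != NULL) {
        if (strcmp(out->type, "eq") != 0 || *field == '\0' ||
            out->nvalues == MAX_EQ_VALUES) {
            return -1;
        }
        out->values[out->nvalues++] = field;
    }
    return 0;
}
```

With this sketch, "0:eq:inetOrgPerson:groupOfNames" parses to limit 0, type eq and two values, while "4000" alone parses to a limit with no type restriction. Values containing a ':' could not be expressed in this syntax, which is one limitation of the proposal as written.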
This looks like a good flexible approach that would be useful for many different situations.
Ticket has been cloned to Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1004876
A couple of questions...

1. Is "nsIndexIDListScanLimit" perpendicular to these config params?
    nsslapd-pagedlookthroughlimit: 0
    nsslapd-pagedidlistscanlimit: 0
    nsslapd-rangelookthroughlimit: 5000
2. I think the answer is yes :), but if "nsIndexIDListScanLimit" is set, is this original config param ignored?
    nsslapd-idlistscanlimit: 4000
3. This is a request... It'd be nice to check if values is NULL or values does not contain '=' (possibility of ptr == NULL)? If there is no such chance, could you put in a comment (or PR_ASSERT)?

    attr_index_parse_idlistsize_values(Slapi_Attr *attr, struct index_idlistsizeinfo *idlinfo, char *values, const char *strval, char *returntext)
    {
        ...
        char *ptr = PL_strchr(values, '=');
        ...
        ++ptr;

    attr_index_parse_idlistsize_limit(char *ptr, struct index_idlistsizeinfo *idlinfo, char *returntext)
    {
        ...
        ptr++;

    attr_index_parse_idlistsize_type(char *ptr, struct attrinfo *ai, struct index_idlistsizeinfo *idlinfo, const char *val, const char *strval, char *returntext)
    {
        ...
        do {
            ++ptr;

    attr_index_parse_idlistsize_flags(char *ptr, struct index_idlistsizeinfo *idlinfo, const char *val, const char *strval, char *returntext)
    {
        ...
        do {
            ++ptr;
Replying to [comment:5 nhosoi]:
> A couple of questions...
> 1. Is "nsIndexIDListScanLimit" perpendicular to these config params?
>     nsslapd-pagedlookthroughlimit: 0
>     nsslapd-pagedidlistscanlimit: 0
>     nsslapd-rangelookthroughlimit: 5000
I did not implement any special support for ranges. I will need to do that, and matching rules. But otherwise, yes, the new code will override nsslapd-pagedidlistscanlimit if set.
> 2. I think the answer is yes :), but if "nsIndexIDListScanLimit" is set, is this original config param ignored?
>     nsslapd-idlistscanlimit: 4000
Yes. If there is a matching request, the matching request will override this value. Otherwise, if there is no matching request, the default value of nsslapd-idlistscanlimit/nsslapd-pagedidlistscanlimit will be used.
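The override behaviour described here can be sketched in C as a lookup that returns a per-index limit when a rule matches and the global default otherwise. This is only an illustration under assumptions: struct scan_limit_rule, ci_equal and effective_scan_limit are hypothetical names, "first matching rule wins" is an assumed policy, and case-insensitive value comparison is assumed because objectclass values are case-insensitive.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-index rule: limit applies to one index type and value. */
struct scan_limit_rule {
    const char *type;  /* "eq", "sub" or "pres"; NULL matches any type */
    const char *value; /* equality value; NULL matches any value */
    int limit;         /* ID list scan limit to use; 0 means no ID list */
};

/* Case-insensitive string equality (ASCII only). */
static int
ci_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        ++a;
        ++b;
    }
    return *a == *b;
}

/*
 * Return the limit of the first rule matching (type, value), or
 * default_limit (e.g. the value of nsslapd-idlistscanlimit) if none match.
 */
int
effective_scan_limit(const struct scan_limit_rule *rules, size_t nrules,
                     const char *type, const char *value, int default_limit)
{
    size_t i;

    assert(type != NULL);
    for (i = 0; i < nrules; ++i) {
        const struct scan_limit_rule *r = &rules[i];

        if (r->type != NULL && strcmp(r->type, type) != 0) {
            continue; /* rule is for a different index type */
        }
        if (r->value != NULL && (value == NULL || !ci_equal(r->value, value))) {
            continue; /* rule is for a different value */
        }
        return r->limit;
    }
    return default_limit;
}
```

With a single rule {"eq", "inetOrgPerson", 0} and a default of 4000, a lookup for objectclass=inetorgperson yields 0 (no ID list), while objectclass=groupOfNames falls back to 4000.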
> 3. This is a request... It'd be nice to check if values is NULL or values do not contain '=' (possibility of ptr == NULL)? If no such chance, could you put the comment (or PR_ASSERT?)
Ok. At this point in the code, ptr should always be set. So I'll add PR_ASSERT.
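For readers weighing the two options discussed here, this is a minimal sketch of the explicit-check alternative. It uses the standard strchr as a stand-in for PL_strchr so that it compiles without NSPR, and value_after_equals is a hypothetical helper, not a function from the patch. As far as I know, PR_ASSERT is only active in debug builds, so an explicit check like this one is what still guards a release build.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: given "key=value", return a pointer to the text
 * after the first '=', or NULL if the input is NULL or has no '='.
 * The caller must check the result before dereferencing or advancing it.
 */
const char *
value_after_equals(const char *pair)
{
    const char *ptr;

    if (pair == NULL) {
        return NULL;
    }
    ptr = strchr(pair, '='); /* stand-in for PL_strchr */
    if (ptr == NULL) {
        return NULL; /* no '=', so there is nothing safe to step past */
    }
    return ptr + 1;
}
```

If the caller has already validated the input, as described above, an assertion documents that invariant; the check is for the case where that guarantee might not hold.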
Thanks for the answers, Rich. Ack.
final version of patch 0001-Ticket-47504-idlistscanlimit-per-index-type-value.patch
changes since the previous patch newdiffs
The design document for this functionality is located here:
http://port389.org/wiki/Design/Fine_Grained_ID_List_Size
To ssh://git.fedorahosted.org/git/389/ds.git
5005db5..b5ad052 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit b5ad052
Author: Rich Megginson rmeggins@redhat.com
Date: Mon Sep 16 09:49:14 2013 -0600
e61009e..3ea8e58 389-ds-base-1.3.0 -> 389-ds-base-1.3.0
commit 3ea8e58
Author: Rich Megginson rmeggins@redhat.com
Date: Mon Sep 16 09:49:14 2013 -0600
c244a9b..b348886 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit b348886
Author: Rich Megginson rmeggins@redhat.com
Date: Mon Sep 16 09:49:14 2013 -0600
385b5dc..824b301 master -> master
commit 824b301
Author: Rich Megginson rmeggins@redhat.com
Date: Mon Sep 16 09:49:14 2013 -0600
Linked to Bugzilla bug: https://bugzilla.redhat.com/show_bug.cgi?id=1011539 (''Red Hat Enterprise Linux 7'')
To ssh://git.fedorahosted.org/git/389/ds.git
b5ad052..d83311a 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit d83311a
Author: Rich Megginson rmeggins@redhat.com
Date: Tue Sep 24 08:18:57 2013 -0600
3ea8e58..527c3e4 389-ds-base-1.3.0 -> 389-ds-base-1.3.0
commit 527c3e4
Author: Rich Megginson rmeggins@redhat.com
Date: Tue Sep 24 08:18:57 2013 -0600
b348886..e95d7d6 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit e95d7d6
Author: Rich Megginson rmeggins@redhat.com
Date: Tue Sep 24 08:18:57 2013 -0600
824b301..36f506d master -> master
commit 36f506d
Author: Rich Megginson rmeggins@redhat.com
Date: Tue Sep 24 08:18:57 2013 -0600
To ssh://git.fedorahosted.org/git/389/ds.git
d83311a..373e36a 389-ds-base-1.2.11 -> 389-ds-base-1.2.11
commit 373e36a
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Sep 25 08:51:12 2013 -0600
527c3e4..c96eaa0 389-ds-base-1.3.0 -> 389-ds-base-1.3.0
commit c96eaa0
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Sep 25 08:51:12 2013 -0600
e95d7d6..e5405e6 389-ds-base-1.3.1 -> 389-ds-base-1.3.1
commit e5405e6
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Sep 25 08:51:12 2013 -0600
d9f25b7..058d01d master -> master
commit 058d01d
Author: Rich Megginson rmeggins@redhat.com
Date: Wed Sep 25 08:51:12 2013 -0600
Metadata Update from @nkinder: - Issue set to the milestone: 1.3.2 - 09/13 (September)
389-ds-base is moving from Pagure to Github. This means that new issues and pull requests will be accepted only in 389-ds-base's github repository.
This issue has been cloned to Github and is available here: - https://github.com/389ds/389-ds-base/issues/841
If you want to receive further updates on the issue, please navigate to the github issue and click on subscribe button.
Thank you for understanding. We apologize for all inconvenience.
Metadata Update from @spichugi: - Issue close_status updated to: wontfix (was: Fixed)