#386 Overconsumption of memory with large cachememsize and heavy use of ldapmodify
Closed: wontfix. Opened 11 years ago by beall.

The directory server consumes large amounts of memory, in proportion to the configured cachememsize, when hit with a continuous stream of ldapmodify requests (regardless of whether the entries fill the cache or not).

Examples:
A 12 GB cachememsize followed by a stream of ldapmodify requests quickly fills and crashes a 32 GB machine.

A 3 GB cachememsize followed by heavy modify activity levels off at a memory usage of 24 GB.

A 1 GB cachememsize followed by heavy modify activity levels off at a memory usage of 11-12 GB (including only 2 GB of DB cache).

A 0 GB cachememsize, which is reset to the minimum of 512000, followed by heavy modify activity results in no noticeable memory growth.

The server seems to believe it can use an in-memory workspace of several multiples of the cachememsize. The ratio appears to be somewhere around 7 * cachememsize.
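
For anyone reproducing this, here is a minimal sketch of how a cachememsize like the ones above can be set; the "userRoot" backend name and the Directory Manager bind are assumptions, so adjust them for your instance:
{{{
# Hypothetical example: set the entry cache to 3 GB on the userRoot backend.
ldapmodify -x -D "cn=Directory Manager" -W <<EOF
dn: cn=userRoot,cn=ldbm database,cn=plugins,cn=config
changetype: modify
replace: nsslapd-cachememsize
nsslapd-cachememsize: 3221225472
EOF

# A restart may be needed for the new size to take effect.
service dirsrv restart
}}}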

This behavior is seen both in the default Red Hat 6.2 installation from the EPEL repository, which was 1.2.9, and in the latest 1.2.10 release from the rmeggins repo.


beall,

Do you have more info on the setup?

What ldapmodify operations are you running?
How many existing entries are there?
It appears you set the entry cache (nsslapd-cachememsize) to a high value, but did you use the default db cache size (nsslapd-dbcachesize)?

I just want to make sure I'm following your exact steps.

Thanks,
Mark

I have been able to reproduce the memory growth. In my initial setup, after priming the entry cache (250K entries) with a cachememsize of 3 gigs, the process was around 4 gigs in size. After modifying each entry a few times, it got above 8 gigs.

I ran the same test case under valgrind, and as expected there is no leak. This is such a basic operation that if it did leak, we would have seen it a long time ago.

I think what we are facing is simply memory fragmentation.

I'm going to run some more tests to see if I can manipulate the results.

Hi mreynolds,

It seems you were able to reproduce the issue, so I'd be glad to provide more info if you need it, but I think you have it.

My environment uses over 10 gigs of memory to hold the entire set of entries in cache prior to any growth. Once I start a stream of modifies, this grows beyond the boundaries of the machine I have available. We are in the process of purchasing hardware, but we are sizing the machines to handle more than 8 times the basic entry cache requirements. Hopefully with that size hardware, the usage will level off.

It would be much better if we don't have to buy that huge hardware.

Is there perhaps a Linux system setting that would let us pre-allocate a large chunk of memory to the process so that it never fragments?

Thanks,
Russ.

Hi Russ,

Looks like we identified a new malloc setting that might help. We are preparing a new test build, and I hope you will be willing to test this for us. The issue is that we don't have a perfect recommendation on what the setting should be.

You just set the env var SLAPD_MXFAST to a value between 0 and 128 (128 is the default value).

0 disables the "fastbin" feature. This seems to reduce the fragmentation the most (smallest memory footprint), but there is a small performance impact.

Setting it to 64 reduced the fragmentation, but did not impact the performance as much.

All of this depends on the size of the entries, how many entries, overall usage, etc.
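
In case it helps, a minimal sketch of trying one value follows; the sysconfig path is the stock location on RHEL 6, and under the hood this presumably maps to glibc's mallopt(M_MXFAST, n), which is why 0 disables the fastbins entirely:
{{{
# Sketch: pick a value, restart, and watch the resident size while the
# modify load runs. 0 disables the glibc fastbins; 64 is a middle ground.
echo "export SLAPD_MXFAST=64" >> /etc/sysconfig/dirsrv
service dirsrv restart

# ns-slapd RSS and VSZ in KB, sampled every 60 seconds (assumes one instance).
watch -n 60 'ps -o rss=,vsz= -p $(pidof ns-slapd)'
}}}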

So, we would like you to test this and play around with the value (0, 32, 64, etc.), and report your results. Is this something you would be willing to test for us?

If you will test it, I will let you know when/where to get the new 389 package.

Thanks,
Mark

I would be very glad to test this. We are at the critical stage of determining what hardware to buy, and if we can reduce the overconsumption of memory, we can buy considerably cheaper nodes than we would otherwise need.

Russ, what version of 389 should we create the patch for? And can you confirm the OS and version?

Thanks,
Mark

Using RedHat 6. uname -a follows:
Linux gds-dev5.usc.edu 2.6.32-279.2.1.el6.x86_64 #1 SMP Thu Jul 5 21:08:58 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux

Oh, and the 389 version can be your latest. I am not tied to a specific version. The last version I was testing with is this:
389-Directory/1.2.10.12 B2012.180.1623

Russ,

You can get the patch from the rmeggins testing repo: 389-ds-base-1.2.10.14-2

Keep me posted on your results.

Thanks,
Mark

Ok. I'm running a test but the memory is still growing. I think it is growing a bit slower than before. I'd like to make sure I made the setting correctly. I put the following line in my /etc/sysconfig/dirsrv file:
{{{
export SLAPD_MXFAST=0
}}}

This is the version now showing in my errors logfile after enabling the testing repo and running a yum update on 389-ds-base:
389-Directory/1.2.10.14 B2012.250.1848

Am I correctly set up to test this?

That should work. I also set it from the command line. I saw the same initial growth (regardless of the setting), but in my tests the growth slowed down and stopped after a while.

In my tests I had a 3 gig entry cache with 250k entries. I saw it grow up to 9-11 gigs, but with this set to zero, it only grew to 6-7 gigs.

Correction, setting from the command line does NOT work. It must be in /etc/sysconfig/dirsrv.
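
A quick way to confirm the running ns-slapd actually picked the variable up from /etc/sysconfig/dirsrv (a sketch, assuming a single instance):
{{{
# The init script sources /etc/sysconfig/dirsrv, so after a restart the
# variable should show up in the daemon's environment.
tr '\0' '\n' < /proc/$(pidof ns-slapd)/environ | grep SLAPD_MXFAST
}}}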

Ok. Initial results are that my 15G cache grows to the 30G hard limit I set for virtual memory usage, but instead of dying for lack of memory, the system is slowly continuing to churn. It will be a while before I know how much slower it is. Definitely good that it is not dying.

I tried to research Linux memory fragmentation, and specifically I wanted a tool that would let me scrape a process and defragment it, but I don't see one. If such a tool existed, we could get by very well since we could just run that once a day or each time the update script completes.

Yeah, I don't think such a tool exists.

Also, don't forget to try other values for SLAPD_MXFAST. I saw good results with using 64. This might work for you, or maybe some other value.
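
(Not a true defragmenter -- it cannot move live allocations -- but for experimentation glibc's malloc_trim(3) asks the allocator to hand completely free heap pages back to the kernel. A sketch of poking it on a test instance with gdb, strictly as an experiment:)
{{{
# Experimental only: attach gdb to a *test* instance, call malloc_trim(0),
# and compare the resident size before and after.
ps -o rss= -p $(pidof ns-slapd)
gdb -p $(pidof ns-slapd) --batch -ex 'call (int) malloc_trim(0)'
ps -o rss= -p $(pidof ns-slapd)
}}}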

My 2 cents: the other solution to avoid memory fragmentation would be to use huge pages for large memory sizes (like Oracle databases do, for example), but it would require a great deal of rewriting of the server code, I think...

If you want to make sure you are using the binary with the mxfast option, do
{{{
strings /usr/sbin/ns-slapd | grep SLAPD_MXFAST
}}}

Replying to [comment:21 pj101]:

> My 2 cents: the other solution to avoid memory fragmentation would be to use huge pages for large memory sizes (like Oracle databases do, for example), but it would require a great deal of rewriting of the server code, I think...

Can you point me at more information about this?

Hi Rich,

Huge pages are memory pages of 2 MB instead of 4 KB (at least, in Linux). They are usually pinned in memory and not swappable. They can be used (I think) only through shared memory and must be preallocated - that's a lot of constraints.

The advantage is a much lower overhead for kernel memory management at large memory sizes (starting from ~8 GB). As for fragmentation, since the number of huge pages is much smaller for the same amount of memory, the fragmentation overhead should be smaller.

They are used in large-memory installations for Oracle databases (maybe some other vendors do it too; I do it regularly during Oracle installations) and give a large performance benefit...

Huge pages support in Linux: http://lwn.net/Articles/375098/

A recent presentation (from RedHat) and information about huge pages in RHEL6:
http://www.slideshare.net/raghusiddarth/transparent-hugepages-in-rhel-6
https://access.redhat.com/knowledge/solutions/46111
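
For reference, reserving huge pages on RHEL 6 is just a sysctl; the page count below is an arbitrary example, and as noted above ns-slapd would still need code changes before it could actually allocate from the pool:
{{{
# Reserve 4096 x 2 MB pages (8 GB) -- example value only.
echo "vm.nr_hugepages = 4096" >> /etc/sysctl.conf
sysctl -p

# See how many pages were actually reserved and how many are in use.
grep -i huge /proc/meminfo
}}}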

Looks like the env var setting shows up, so I am using the setting. It seems to be changing the behavior to a very small degree.

My latest testing, though, shows basically little difference in the behavior. The memory still fills up quickly and crashes the server regardless of the MXFAST setting, and I have some "startling" new test results from another angle that show something fishy going on.

I have been experimenting this morning with a new way to exercise the cache. Perhaps the server isn't meant to do this generally, but it shows up some surprising behavior.

The test is simple (a command-level sketch follows the list):
1. Start the server with a cachesize of 1 instead of -1 for unlimited.
2. Use ldapmodify on the running server to change that back to -1.
3. Run ldapsearch on objectclass=* returning the dn.
   - All entries get loaded into the cache by this.
   - Actual process memory rises by exactly double the reported "currententrycachesize".
4. Use ldapmodify on the running server to change cachesize back to 1.
   - All entries are deleted from the cache according to a search over cn=config, but nothing about the system memory footprint changes.
5. Run the cycle again, loading all entries.
   - Entries begin to load and the memory footprint immediately starts rising, instead of reusing existing process memory that should have been freed.
   - The next time cachesize is set back to 1, a large chunk of the memory footprint is freed up in the system, and it goes back to about what it was the first time the entire cache was loaded -- approximately double the original memory used (currententrycachesize plus the dbcachesize) -- even though no entries are in the cache and the reported currententrycachesize is the size of one entry.
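
As referenced above, a command-level sketch of one iteration of the cycle; the userRoot backend name, the example suffix, and the Directory Manager bind are assumptions:
{{{
BACKEND="cn=userRoot,cn=ldbm database,cn=plugins,cn=config"

# Steps 2/4: flip nsslapd-cachesize between -1 (unlimited) and 1.
ldapmodify -x -D "cn=Directory Manager" -W <<EOF
dn: $BACKEND
changetype: modify
replace: nsslapd-cachesize
nsslapd-cachesize: -1
EOF

# Step 3: walk every entry so the cache fills.
ldapsearch -x -D "cn=Directory Manager" -W -b "dc=example,dc=com" \
    "(objectclass=*)" dn > /dev/null

# Compare the reported cache size with the actual process footprint.
ldapsearch -x -D "cn=Directory Manager" -W -b "cn=monitor,$BACKEND" currententrycachesize
ps -o rss=,vsz= -p $(pidof ns-slapd)
}}}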

My conclusion from this is that something in the system is holding onto large chunks of memory instead of freeing them. The server may not be "leaking" them per se, and ultimately freeing the space on server shutdown, but while running it is definitely not freeing large amounts of memory that probably should be freed. Fragmentation shouldn't be an issue when much of the memory should be completely freed up. It is acting like there is a copy or history of the previously cached entries being kept around in addition to the existing current entry cache. Then, adding fragmentation to that usage pattern could definitely cause even greater memory growth over time. The huge pages solution may help with the fragmentation, but it seems there is something more going on.

This patch doesn't fix the problem - it allows us to set different values for mxfast.

The remainder of the work is still scheduled for 1.3.0.a1

master
commit changeset:20dc4bc/389-ds-base
Author: Mark Reynolds mreynolds@redhat.com
Date: Thu Sep 6 13:21:27 2012 -0400
389-ds-base-1.2.11
commit changeset:8b33f23/389-ds-base
Author: Mark Reynolds mreynolds@redhat.com
Date: Fri Sep 7 13:47:10 2012 -0400

Ok. Didn't know that you guys were already looking at more than just fragmentation issues.

I ran some more tests on and off today and focused specifically on checking the functionality of the MXFAST setting. I set my server to cache only half of the entries, so they would generally fit well into memory. Here is what I found.

At each of the values (0, 32, 64), memory fragmentation did not overwhelm the server, and the memory footprint would float up and down within the range of 19G to 27G using my newer cache exercise test. No noticeable speed difference was noted.

When SLAPD_MXFAST was commented out of /etc/sysconfig/dirsrv, the server quickly became overwhelmed with what must be fragmentation, slowed way down for a little while, and then crashed.

If you need a speed comparison of MXFAST versus no MXFAST, I'll have to change my cache parameters and run it all again. It seems clear though that this setting has a significant effect on fragmentation.

Thanks again for running these tests! Did you notice which setting (0, 32, 64) caused the least amount of memory growth?

I did not notice much of a difference, but I think 0 was the best, as might be expected. I'll probably let more tests keep running in the background while I'm getting other stuff done. If I find out more, I'll let you know.

Enabling TRIM_FASTBINS is likely to work as well and is preferable to setting MXFAST to zero. How about exposing that malloc option in a similar way?

TRIM_FASTBINS is a compile-time option, while MXFAST can be set via an environment variable.

Created a new ticket to investigate using TRIM_FASTBINS:

https://fedorahosted.org/389/ticket/489

Closing this ticket.

Metadata Update from @beall:
- Issue assigned to mreynolds
- Issue set to the milestone: 1.3.0.a1

7 years ago

389-ds-base is moving from Pagure to GitHub. This means that new issues and pull requests will be accepted only in 389-ds-base's GitHub repository.

This issue has been cloned to GitHub and is available here:
- https://github.com/389ds/389-ds-base/issues/386

If you want to receive further updates on the issue, please navigate to the GitHub issue and click on the subscribe button.

Thank you for understanding. We apologize for any inconvenience.

Metadata Update from @spichugi:
- Issue close_status updated to: wontfix (was: Fixed)

3 years ago
