Ebusiness Load Runner Ramp up Performance Bottleneck - Part 1

A series of brain dumps on getting ebusiness up to speed when running LoadRunner against it. Part 1 deals with changing the Apps 11i Apache defaults to handle a larger load.

If your ebusiness system is specced to allow many thousands of concurrent users to be logged in, but during load testing you hit a performance issue or a possible hang during the ramp up phase, and you have looked at all the usual things (CPUs, memory, JServs etc.), then you may be suffering from an issue with Apache's ability to handle the frequency of the connections.

You may never see this issue in a system with real users, as they tend to be less mechanical than a load testing tool. But you might see it if your production system went down during your busiest period and, as soon as it was available again, people flooded back on.

In a load test with a tool like LoadRunner the symptoms may present as follows

- during ramp up the system shows poor response
- at some point in ramp up the system hangs

The above issues are related but not fixed by the same thing. You won't necessarily get both symptoms, but if you get the second you are most likely suffering from the first as well.

How easy it is to diagnose the issue depends on how your Apache logging is set up. Bear in mind too that this may be just a symptom of the real issue, i.e. maybe the LoadRunner tool is going wrong and causing it.

With the default logging you will see the message

[error] server reached MaxClients setting, consider raising the MaxClients setting

when Apache runs out of clients. Generally the default is 512 and the maximum value is 1024, although this can be increased by recompiling Apache.

To get a better idea of a more optimum value, and to see how fast more processes are being spawned, increase the Apache log level (LogLevel) to

debug

You will then see a number of INFO messages like

[info] server seems busy, (you may need to increase StartServers, or Min/MaxSpareServers), spawning 16 children, there are 0 idle, and 507 total children

From the above message you can work out what to set the parameters to in your httpd.conf file via the autoconfig XML file. The parameters you are interested in, shown with defaults, are:

MinSpareServers 5
MaxSpareServers 10
StartServers 5
MaxClients 512

MaxClients, as the name suggests, is the parameter that controls the maximum number of simultaneous connections. The default is 512 and the maximum in Apps 11i (unless you recompile Apache) is 1024. If your system is running out of clients, and this is not caused by a runaway process, it should be safe on a modern system to increase this to 1024 with no ill effect. This won't cure the slow response but should stop it falling over completely.

MinSpareServers and MaxSpareServers control the pool of spare processes kept in reserve to deal with a flood of new connections. The pool exists because creating a process for each incoming connection on demand is expensive, especially if many arrive simultaneously. This is a probable cause of poor performance in ramp up, and you can see the rate of spawning from the INFO messages above. Conversely, a pool of processes that is too big wastes resources. So when the number of spare processes drops below MinSpareServers, Apache creates enough new ones to bring the pool back up to the minimum level. When the number of idle processes is greater than MaxSpareServers, it kills off idle processes until the pool returns to an acceptable level.
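To get a feel for why on-demand spawning hurts during ramp up, here is a rough sketch in Python. It assumes, as I understand Apache 1.3 to behave, that spare processes are topped up about once a second in batches that start at 1 and double up to 32 while the server stays short of idle processes (the "spawning 16 children" message above is consistent with this). The function name and the worst-case assumption that every new process is immediately busy are mine, for illustration only.

```python
def seconds_to_reach(target, start_servers=5, max_spawn_rate=32):
    """Estimate seconds of spawning needed to grow from start_servers to target.

    Assumption: one spawn batch per second, starting at 1 process and
    doubling each second up to max_spawn_rate, with every new process
    taken by a connection straight away (worst case).
    """
    children = start_servers
    rate = 1
    seconds = 0
    while children < target:
        children += rate
        seconds += 1
        rate = min(rate * 2, max_spawn_rate)
    return seconds


print(seconds_to_reach(512, start_servers=5))    # prints 20
print(seconds_to_reach(512, start_servers=512))  # prints 0
```

Under these assumptions the default StartServers of 5 needs roughly 20 seconds of forking to reach 512 processes, during which new connections queue up, while a pre-started pool needs none.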

StartServers is the number of processes that should be pre-created when Apache starts up. If you have a ramp up performance issue you will probably need to increase this to the expected maximum number of connections. You can work this out from the INFO messages, as they tell you the total number of processes, so you would set StartServers to the maximum number of processes reached in an initial test run. You may have to tweak this number, along with MaxSpareServers, until you can do a run that spawns no new processes.
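Picking the peak out of the INFO messages by eye gets tedious on a long run, so a small Python sketch like the following might help. It only assumes the message format quoted above; the function name is made up, and the peak it reports is the highest "total children" figure logged, so treat it as a lower bound on what the run really needed.

```python
import re

# Matches the tail of the "server seems busy" INFO line quoted above.
BUSY_RE = re.compile(
    r"spawning (\d+) children, there are (\d+) idle, and (\d+) total children"
)


def summarize_busy_lines(lines):
    """Return (busy_messages, peak_total_children, spawn_requests_total)."""
    busy_messages = 0
    peak_total = 0
    spawn_total = 0
    for line in lines:
        match = BUSY_RE.search(line)
        if match is None:
            continue  # not a "server seems busy" line
        spawning, _idle, total = (int(group) for group in match.groups())
        busy_messages += 1
        spawn_total += spawning
        peak_total = max(peak_total, total)
    return busy_messages, peak_total, spawn_total


sample = [
    "[info] server seems busy, (you may need to increase StartServers, "
    "or Min/MaxSpareServers), spawning 16 children, there are 0 idle, "
    "and 507 total children",
    "[error] server reached MaxClients setting, consider raising the MaxClients setting",
]
print(summarize_busy_lines(sample))  # prints (1, 507, 16)
```

To run it against a real log, pass an open file instead of the sample list, e.g. summarize_busy_lines(open(path)) where path points at your Apache error log. A run that reports zero busy messages is the target described above.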

If you start seeing "too many open files" errors then it might be the usual shortage of file descriptors (sometimes the ebusiness Apache control templates have the ulimit hardcoded). If they are accompanied by the MaxClients error, they are probably a side effect of hitting that limit.

If you see no errors alluding to MaxClients then double check:

- soft/hard limits in ulimits.conf
- max-nofiles kernel parameter
- adapcctl.sh for a hardcoded ulimit
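A quick way to see what open-file limits a process actually ends up with is to ask from Python, using the standard resource module (Unix only). This is just a sketch: it reports the limits of the Python process itself, so run it as the same OS user, from the same kind of shell, that starts Apache, and remember that a ulimit hardcoded in the control script would still override what you see here.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = nofile_limits()
# resource.RLIM_INFINITY means "no limit" for that value.
print(f"open files: soft={soft} hard={hard}")
```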

If Apache is still, probably randomly, spawning processes until it hits the compiled-in limit you have 2 further choices.

If you think you really do just need more processes and there is not some other underlying 'leak', then recompile Apache with a higher limit [not covered here].

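If you think that you do have a leak you can try changing this parameter

MaxRequestsPerChild

from 0 (unlimited) to 10000, so that each child process exits after serving 10000 requests and whatever it has leaked is released.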

Note that a short load test might now pass after altering the 4 initial parameters above, but you may find that when you perform an endurance test (24hrs+ at an average user load) you then hit this more obvious process leakage issue.

For us this led to the following changes to the httpd.conf file to pass the peak and endurance load tests on ebusiness.

MinSpareServers 32
MaxSpareServers 512
StartServers 512
MaxClients 1024
MaxRequestsPerChild 10000
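When juggling these numbers between test runs it is easy to end up with values that fight each other, so a small sanity check can be handy. The following Python sketch is not part of autoconfig or Apache. The function name and the rules are my own reading of how the parameters relate, with the 1024 ceiling taken from the Apps 11i limit mentioned above.

```python
def check_prefork(settings, hard_limit=1024):
    """Return a list of problems found; an empty list means no obvious conflict."""
    problems = []
    if settings["MinSpareServers"] > settings["MaxSpareServers"]:
        problems.append("MinSpareServers is greater than MaxSpareServers")
    if settings["MaxSpareServers"] > settings["MaxClients"]:
        problems.append("MaxSpareServers is greater than MaxClients")
    if settings["StartServers"] > settings["MaxClients"]:
        problems.append("StartServers is greater than MaxClients")
    if settings["MaxClients"] > hard_limit:
        problems.append("MaxClients is above the compiled-in limit")
    return problems


# The values we ended up with, as listed above.
tuned = {
    "MinSpareServers": 32,
    "MaxSpareServers": 512,
    "StartServers": 512,
    "MaxClients": 1024,
}
print(check_prefork(tuned))  # prints []
```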

Other things that might help to further tweak your ebusiness performance include OS and NFS tuning (covered in separate blog entries) or the traditional database and code tuning.
