Wilco van Bragt

My CPU is at 100%. What can I do about it?

One of the main challenges in a Terminal Server environment is high usage of resources such as the CPU. This article looks at what causes this high usage and what you can do to control it.

 

 

Processor capacity and Terminal Servers

On a Terminal Server a considerable number of users work on the same server at the same time. All those users share the resources available in the system, so every user notices when a resource becomes scarce. Of the resources that commonly run short, the CPU is definitely number one.

The main reasons are the fluctuating availability of CPU capacity and the unpredictable CPU usage of applications.

When CPU resources are scarce, users notice it immediately. The performance of the Terminal Server decreases, which usually leaves users with the feeling that their application is not responding. Terminal Server administrators therefore need to ensure that server performance stays acceptable to the users. In other words, the administrator needs to control CPU usage in such a way that users do not experience a drop in performance.

 

What if the CPU resources of your Terminal Servers are (almost) completely exhausted? To solve this, the environment needs to be investigated thoroughly. The first step of this investigation is to collect information so that you can analyze why the behavior occurs.

 

Collecting information




Unfortunately, CPU exhaustion usually does not show up immediately after the Terminal Server environment is implemented, but only after a while.

That is why it is important to take baseline measurements of your servers. The following baseline measurements are ideal for analyzing your environment:

  • Resource usage with no users connected, right after installation and configuration of the server(s).
  • Resource usage with a normal number of users doing their normal work, just after the roll-out.

It is also a good idea to collect the same performance counters on a regular basis for trend analysis. I advise monitoring the following performance counters:

  • Processor: % Processor Time
  • System: Processor Queue Length
  • Memory: Pages/sec
  • Memory: % Committed Bytes In Use
  • Network Interface: Bytes Total/sec
  • PhysicalDisk: % Disk Time
  • PhysicalDisk: Avg. Disk Queue Length
  • Paging File: % Usage
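
One way to collect the counters listed above on a schedule is the `typeperf` command-line tool that ships with Windows. The following is a minimal sketch that only builds the command line. The `_Total` and `*` instances, the 15-second interval, the sample count and the file name `baseline.csv` are assumptions for illustration, not values from this article.

```python
# Counter paths for the baseline list above, in the \Object(instance)\Counter
# form that typeperf expects. The _Total and * instances are assumptions.
BASELINE_COUNTERS = [
    r"\Processor(_Total)\% Processor Time",
    r"\System\Processor Queue Length",
    r"\Memory\Pages/sec",
    r"\Memory\% Committed Bytes In Use",
    r"\Network Interface(*)\Bytes Total/sec",
    r"\PhysicalDisk(_Total)\% Disk Time",
    r"\PhysicalDisk(_Total)\Avg. Disk Queue Length",
    r"\Paging File(_Total)\% Usage",
]


def build_typeperf_command(counters, interval_seconds, sample_count, output_csv):
    """Return the argument list for one typeperf logging run."""
    return [
        "typeperf",
        *counters,
        "-si", str(interval_seconds),  # seconds between samples
        "-sc", str(sample_count),      # number of samples to collect
        "-f", "CSV",                   # output file format
        "-o", output_csv,              # output file
    ]


if __name__ == "__main__":
    # Hypothetical schedule: one sample every 15 seconds for one hour.
    command = build_typeperf_command(BASELINE_COUNTERS, 15, 240, "baseline.csv")
    print(" ".join(f'"{part}"' if " " in part else part for part in command))
```

On the Terminal Server itself the returned list could be handed to `subprocess.run(command)`. Keeping the command builder separate makes it easy to reuse the same counter set for the idle baseline, the baseline with users and the later trend measurements.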

When your server actually experiences high CPU usage, monitor these performance counters again. You will probably want to extend your monitoring with more detailed counters such as:

  • PhysicalDisk: % Disk Read Time
  • PhysicalDisk: % Disk Write Time
  • PhysicalDisk: Current Disk Queue Length
  • Paging File: % Usage Peak
  • Processor: % Interrupt Time
  • Processor: % User Time
  • Network Interface: Bytes Received/sec
  • Network Interface: Bytes Sent/sec
  • Memory: Page Faults/sec
  • Memory: Page Reads/sec
  • Memory: Page Writes/sec
  • Process: % Processor Time
  • Thread: % Processor Time

It is important to monitor several kinds of resources to analyze your CPU behavior.
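
To make the comparison with the baseline concrete, here is a small sketch that flags counters whose current average has risen well above the baseline average. The function name, the factor of 1.5 and the sample values are assumptions chosen for illustration. Suitable limits depend on your own environment.

```python
def counters_above_baseline(baseline, current, factor=1.5):
    """Return counters whose current average is at least `factor` times the baseline.

    Both arguments map a counter name to its average value over a
    measurement period. The result maps each flagged counter to the ratio
    current / baseline.
    """
    flagged = {}
    for name, base_value in baseline.items():
        value = current.get(name)
        if value is None:
            continue  # counter was not collected this time
        if base_value == 0:
            # A baseline of zero cannot be divided by: any rise is flagged.
            if value > 0:
                flagged[name] = float("inf")
            continue
        ratio = value / base_value
        if ratio >= factor:
            flagged[name] = ratio
    return flagged


if __name__ == "__main__":
    # Hypothetical averages, not measurements from a real server.
    baseline = {
        "Processor: % Processor Time": 40.0,
        "Memory: Pages/sec": 20.0,
        "System: Processor Queue Length": 0.0,
    }
    current = {
        "Processor: % Processor Time": 90.0,
        "Memory: Pages/sec": 22.0,
        "System: Processor Queue Length": 6.0,
    }
    for name, ratio in counters_above_baseline(baseline, current).items():
        print(f"{name}: {ratio:.2f}x baseline")
```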

 

Analyzing performance counters

Collecting all these performance counters is easy. Analyzing them and drawing a conclusion from the data is something else. Every situation is different, so there is no standard manual for analyzing the data and pinpointing a cause. The most common situations are described below.

Single application level

One possible cause is a single application that consumes too many CPU resources. The main reason is code that was badly written for use on a Terminal Server. Such an application often claims the CPU and does not release it when its task is finished, or one specific task consumes all CPU capacity for a long time without giving other threads the possibility to use some of it. If this behavior is present, the counters Processor: % Processor Time, System: Processor Queue Length, Process: % Processor Time and Thread: % Processor Time will show high values.

Multiple application level

When there are multiple applications on a system, these applications may be fighting each other for CPU capacity. Because of these competing claims, the CPU can become so busy handling the requests that the tasks behind them can no longer be processed. Because we are talking about CPU capacity, the counter Processor: % Processor Time will have a high value. In addition, the counter Processor: % Interrupt Time will have a high value, while the counter Processor: % User Time will have a low value.

Resource level

When analyzing the performance data you may see that other resources show high values as well, in counters such as Paging File: % Usage, Memory: Page Faults/sec, PhysicalDisk: % Disk Time and comparable counters. In this situation another hardware resource may be overloaded, which in turn drives CPU usage up. In this case you should analyze which component also shows high values and try to figure out why that resource is overloaded.

Too many users level

In some cases no counter other than Processor: % Processor Time shows high values, in particular not the counters Process: % Processor Time and Thread: % Processor Time. In other words, there is no demonstrable cause. This could mean that normal usage of the applications already pushes the CPU to its limit. If this happens, the sizing of the Terminal Servers should be revisited.

Too many applications

This situation is comparable to the too many users level. Again, the sizing of the Terminal Server should be revisited.
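
The situations described in this section can be summarized as a rough decision function. This is only a sketch: the thresholds (80% for "high", 20% for interrupt time, 30% for "low" user time), the order of the checks and the `Sample` structure are assumptions, and real data rarely separates this cleanly. The too many users and too many applications levels are returned together because the counters alone do not distinguish them.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Average counter values over one measurement period (all in percent)."""
    processor_time: float      # Processor: % Processor Time
    interrupt_time: float      # Processor: % Interrupt Time
    user_time: float           # Processor: % User Time
    top_process_time: float    # highest Process: % Processor Time of one process
    paging_file_usage: float   # Paging File: % Usage
    disk_time: float           # PhysicalDisk: % Disk Time


def classify(sample, high=80.0, interrupt_high=20.0, user_low=30.0):
    """Map a sample to one of the situations described in the text.

    All thresholds are illustrative assumptions, not recommended values.
    """
    if sample.processor_time < high:
        return "no CPU bottleneck"
    if sample.paging_file_usage >= high or sample.disk_time >= high:
        # Another resource is overloaded as well.
        return "resource level"
    if sample.top_process_time >= high:
        # One process accounts for most of the CPU time.
        return "single application level"
    if sample.interrupt_time >= interrupt_high and sample.user_time <= user_low:
        return "multiple application level"
    # High CPU without a demonstrable cause.
    return "too many users or too many applications level"


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    print(classify(Sample(95.0, 2.0, 90.0, 88.0, 30.0, 20.0)))
    print(classify(Sample(95.0, 40.0, 20.0, 15.0, 30.0, 20.0)))
```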

Setting up a solution

After analyzing the data and drawing a conclusion, the next step is to set up a solution. Naturally, the solution depends on the conclusion.

Adding more hardware

If the CPU is constantly at a high level, you could consider upgrading the servers. But remember that systems cannot be extended endlessly. Best practice shows that a Terminal Server performs best with two processors. When more processors are added, other resources become the bottleneck in the system, so the processors are not fully used. Also, many applications don’t use additional processors because the software was not written for multi-processor systems. Adding a CPU can be considered when the machine currently has only one CPU and the cause is the too many users level, the too many applications level or the single application level.

Adding more Terminal Servers

Adding more Terminal Servers to your environment also adds CPU capacity. Dividing the users over more servers automatically means more resources are available per user. With more Terminal Servers in your infrastructure, however, the administrative workload increases: every server needs some administration and monitoring, and budget must be available to purchase the additional hardware. This solution can be used for the resource level and the too many users level conclusions.
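
As a rough illustration of how the two baseline measurements can feed a sizing estimate, the sketch below derives an average CPU cost per user and from that a server count. It assumes that CPU usage grows linearly with the number of users, which is a simplification. The 70% target and all numbers in the example are hypothetical.

```python
import math


def servers_needed(total_users, idle_cpu, cpu_with_users, users_in_baseline,
                   target_cpu=70.0):
    """Estimate how many servers are needed to keep CPU usage under target_cpu.

    idle_cpu:          average % Processor Time with no users connected
    cpu_with_users:    average % Processor Time during the baseline with users
    users_in_baseline: number of users connected during that baseline
    Assumes CPU usage scales linearly with the number of users.
    """
    cpu_per_user = (cpu_with_users - idle_cpu) / users_in_baseline
    if cpu_per_user <= 0:
        raise ValueError("baseline with users must be above the idle baseline")
    users_per_server = math.floor((target_cpu - idle_cpu) / cpu_per_user)
    if users_per_server < 1:
        raise ValueError("target_cpu leaves no room for even one user")
    return math.ceil(total_users / users_per_server)


if __name__ == "__main__":
    # Hypothetical: 5% idle, 50% with 30 users, 200 users in total.
    print(servers_needed(200, idle_cpu=5.0, cpu_with_users=50.0,
                         users_in_baseline=30))
```

With these example numbers each user costs 1.5% CPU on average, so one server holds 43 users below the 70% target and 200 users need 5 servers.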

Introducing Silo concept

The silo concept is a well-known way to solve application conflicts. Applications that are known to cause conflicts are placed on dedicated Terminal Servers. Only that application (or a small set of applications) is installed on such a server. Through this separation the most used (normal) applications do not suffer from the influence of those “bad” applications. This solution also requires additional hardware to be purchased and adds extra administrative work for the IT department. The concept can be used for the single application level, the multiple application level and the too many applications level conclusions.

Performance Management tools

The solutions mentioned above all rely on additional hardware in some way. You can also use software to solve this kind of behavior. These so-called performance management tools can be divided into two groups. The first group uses the priority mechanism within Windows. In several ways (algorithms, fixed values) these products determine which processes should get a lower priority. Products that work this way are RES Powerfuse and Max-IT from Provision Networks. The second group uses so-called CPU clamping. With CPU clamping, when a specified CPU usage is exceeded the product controls all threads in such a way that CPU usage drops below this value within a few seconds. Another approach in this product category is setting a maximum CPU usage for each thread. Well-known products are Appsense Performance Suite, WMSoftware’s Relevos and Threadmaster.
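
The difference between the two groups can be illustrated with a small sketch. It does not reflect how any of the products named above are implemented. The function names, the fixed thresholds and the proportional scaling are assumptions that only show the idea: the priority method selects processes to run at a lower priority, while clamping reduces the CPU share of every thread until the total is back under a limit.

```python
def processes_to_deprioritize(cpu_by_process, threshold=50.0):
    """Priority method: pick processes whose CPU usage exceeds a fixed value.

    A real tool would then lower the Windows priority of these processes.
    """
    return sorted(name for name, cpu in cpu_by_process.items() if cpu > threshold)


def clamp_to_limit(cpu_by_thread, limit=80.0):
    """CPU clamping: scale every thread's share so the total stays within limit.

    Proportional scaling is an assumption made for this illustration.
    """
    total = sum(cpu_by_thread.values())
    if total <= limit:
        return dict(cpu_by_thread)  # nothing to clamp
    scale = limit / total
    return {name: cpu * scale for name, cpu in cpu_by_thread.items()}


if __name__ == "__main__":
    # Hypothetical CPU usage in percent, keyed by process name.
    usage = {"winword.exe": 10.0, "report.exe": 70.0, "excel.exe": 20.0}
    print(processes_to_deprioritize(usage))
    print(clamp_to_limit(usage))
```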

Conclusion

The most important step when dealing with high CPU usage is to analyze your infrastructure thoroughly before drawing a conclusion. Once the cause is found, several solutions are available. Compare the possible solutions and choose the one that best fits your organization and infrastructure.

This article was previously published at MSTerminalServices.org.