At the start of almost every project, one of the first questions is how many users the environment can handle, or how much hardware should be ordered so that all users have a good user experience. It is probably the most difficult question to answer at that point. If you give the right answer, which is “it depends”, you will probably get the follow-up that you have experience at other organizations, so aren’t there benchmarks or best practices available? In this article I will try to explain my view that you cannot determine this at the start of the project, based on my practical experience at several projects.
I already stated that I think you cannot determine the exact number of users when your project starts. There is one exception in which you can make a good estimate, and that is when you are already running a similar infrastructure. For example, if you are running XenApp 6.5 on Windows 2008R2 and migrating to XenDesktop 7.x, you can probably take your current ratio and subtract the overhead figures published by the community for the new OS you will be using. In all other cases it will be very difficult to “guess” the number of users per XenApp server (VM) or the number of VDIs that can be hosted on one physical server. The number really depends on the usage and actions of the users, and even more on the type of applications used in the organization. Sometimes people show me results from a hardware vendor or a software vendor (Microsoft, Citrix and VMware) with hundreds of users on one system. Most of the time those tests use one lightweight application (like Notepad) to load the system, which is obviously not representative of real-world scenarios.
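The estimate for the similar-infrastructure case is simple arithmetic, and can be sketched as follows. This is only an illustration: the function names, the 25% overhead and the user counts are made-up assumptions, not figures from any community benchmark, and the real overhead for your new OS has to come from published tests or your own measurements.

```python
import math


def estimate_users_per_server(current_users_per_server: float,
                              os_overhead_fraction: float) -> int:
    """Scale the current users-per-server ratio down by the expected
    overhead of the new OS (hypothetical value, e.g. 0.25 for 25%)."""
    if not 0 <= os_overhead_fraction < 1:
        raise ValueError("os_overhead_fraction must be in the range [0, 1)")
    return math.floor(current_users_per_server * (1 - os_overhead_fraction))


def servers_needed(total_users: int, users_per_server: int,
                   spare_servers: int = 0) -> int:
    """Number of servers for the whole user population, plus optional
    spare capacity (the spare count is an assumption, not a rule)."""
    if users_per_server <= 0:
        raise ValueError("users_per_server must be positive")
    return math.ceil(total_users / users_per_server) + spare_servers


if __name__ == "__main__":
    # Assumed example: 60 users per server today, 25% overhead on the new OS.
    new_ratio = estimate_users_per_server(60, 0.25)
    print("Estimated users per server:", new_ratio)
    print("Servers for 900 users:", servers_needed(900, new_ratio, spare_servers=1))
```

Treat the outcome as a starting point for the Proof of Concept, not as the final number.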
Therefore it is really important to run a decent Proof of Concept and Pilot phase. During these phases, one of the activities is to determine the number of users you can host.
Proof of Concept phase: Determine final (hardware) configuration
In the Proof of Concept phase you want to determine the final (hardware) configuration. The best method is to set up a configuration based on the best practices from the community and validate whether this is indeed the best configuration for the organization. I have seen several cases where the final result differed from the best practices. The best way to determine this is to use a load test product. LoginVSI is one of the best known, but there are of course other vendors in the market, with Denamik LoadGen, Leaptest, LoadRunner and CitraTest VU as some other examples. Personally I have experience with LoginVSI and LoadGen.
With these products you should do two kinds of tests. The first is a “default” test, so you can compare your results with those of other organizations. With LoginVSI you can run one of the default load tests and compare the results (or have them compared) with other LoginVSI tests. This gives a good impression of whether your infrastructure delivers results comparable to other organizations on similar hardware, and whether any configuration errors were made. The second is a test script that reflects your own organization, including the most used applications and the most resource-intensive applications. This can be a challenge, because you need to know which applications are used most and which are resource intensive. You also need to know which activities are performed within each application and whether those can be used as repeatable tasks. Last but not least, these tasks need to be added to a script, including timers that measure how long each task takes. This can be very challenging. If this is your first time, I strongly advise hiring a representative of the software vendor to help you with it.
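To illustrate what “tasks with timers” means, here is a minimal sketch in plain Python. It is not LoginVSI or LoadGen syntax; those products have their own scripting facilities. The task functions are hypothetical placeholders that only sleep, where a real script would drive the actual application.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(task_name, results):
    """Measure how long the wrapped task takes and store the duration
    (in seconds) under its name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results.setdefault(task_name, []).append(time.perf_counter() - start)


def open_document():
    # Hypothetical placeholder for "open a document in the business application".
    time.sleep(0.01)


def run_report():
    # Hypothetical placeholder for a resource-intensive task.
    time.sleep(0.02)


def run_workload(results, iterations=3):
    """Repeat the recurring tasks and record a timing for each repetition."""
    for _ in range(iterations):
        with timed("open_document", results):
            open_document()
        with timed("run_report", results):
            run_report()


if __name__ == "__main__":
    timings = {}
    run_workload(timings)
    for task, durations in timings.items():
        print(f"{task}: slowest run {max(durations):.3f} s over {len(durations)} runs")
```

The recorded durations are what you compare between test runs: when the same task starts taking noticeably longer as more sessions are added, the system is approaching its limit.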
After the test script is made, you need to run the test multiple times with multiple configurations. With those tests you are looking for the most efficient configuration of VM resources (CPU, memory) and the number of VMs on the hypervisor. If you are working with physical hardware (for XenApp this is still done), you would preferably test different hardware set-ups. In practice the hardware decision has usually already been made, and the test is only done to see how many sessions the server can host (and hopefully that number is what the organization is expecting). Also run the same test more than once, to ensure that the result is reproducible and not a “lucky or bad shot”.
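One way to check whether repeated runs agree with each other is to look at the spread of the results. The sketch below assumes each run yields one number (the maximum session count with acceptable performance) and flags a configuration as inconsistent when the coefficient of variation exceeds a threshold. The 10% threshold and the sample numbers are assumptions for illustration, not a recommendation from any load test vendor.

```python
from statistics import mean, stdev


def summarize_runs(max_sessions_per_run, max_cv=0.10):
    """Summarize repeated runs of the same test on the same configuration.

    max_sessions_per_run: one result per run (needs at least two runs).
    max_cv: allowed coefficient of variation (stdev / mean); assumed value.
    """
    if len(max_sessions_per_run) < 2:
        raise ValueError("run the same test at least twice")
    avg = mean(max_sessions_per_run)
    spread = stdev(max_sessions_per_run)
    cv = spread / avg
    return {"mean": avg, "stdev": spread, "cv": cv, "consistent": cv <= max_cv}


if __name__ == "__main__":
    # Made-up results for two hypothetical VM configurations.
    runs_by_config = {
        "4 vCPU / 16 GB": [80, 78, 82],
        "8 vCPU / 32 GB": [80, 45, 82],  # one run looks like a "bad shot"
    }
    for config, runs in runs_by_config.items():
        summary = summarize_runs(runs)
        verdict = "consistent" if summary["consistent"] else "rerun needed"
        print(f"{config}: mean {summary['mean']:.1f} sessions, {verdict}")
```

When a configuration is flagged, investigate the outlier run before trusting the average.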
Those load tests are good to do, but they actually do not say much about how many users can work in practice in the VM and/or on the hardware layer. Why? Because it is really difficult to simulate the way users actually work. I have seen it go both ways. In the first case the test script is much “heavier” than the actual user activities, so in the end you can handle more users than the load test shows. A good example is the script we ran at a hospital (on physical hardware). The run showed that we could host around 80 users with good performance on one server. By coincidence I was also involved at another hospital running the same set-up, where we saw that a server with 70 active users had almost no load at all. In other words, many more users could be hosted per server.
The opposite is also possible. For a VDI project we ran the load test. Later in the project the users complained that the amount of memory was way too low and that the responsiveness of the system was not what they expected. It turned out that the script we made included a long-running step during which the script took no other actions, while real users continued with other activities that required additional resources.
Both scenarios show that a load test alone does not provide real-world results in all cases.
Therefore you need to validate the results by running a pilot with actual users who perform their actual activities on the infrastructure. I would advise buying a small portion of the required hardware and selecting a group of pilot users with the following characteristics: eager to adopt new technologies, precise, punctual, representative and understanding. Those users will encounter issues, and maybe some performance issues. Actually, I like to have users in the group who are often tagged as whiners, because they will find 95% of the issues so you won’t have those in production. Choosing this kind of user also helps to determine the number of users, because they will probably do all of their daily activities on the new system. Unfortunately I have seen too many pilots in which the users just did a few tests and then went back to working in the current environment, so in the end we did not have accurate data. It is really important to have really active users in the pilot, using all kinds of available applications in the right mix, so that the pilot is a full reflection of your organization. In that case you can see the actual usage and determine the actual number of users your host can handle. As said before, this can be much lower or much higher than the load tests showed. Don’t forget to create a baseline, so you can compare the results during the production phase.
Based on the pilot results we start deploying in production. If the pilot was good, the number of users per host will still be right during this phase. Of course you need to monitor the environment intensively and compare it with the baseline created during the pilot phase. You also need to respond if the results are not comparable with the pilot, and create a plan to remediate the issue.
When you have rolled out and are fully running in production, the process does not actually end. Changes will be applied to the infrastructure, and these can influence the number of users you can host. Think of big changes like a new version of a business application, but sometimes small fixes have the same effect. Don’t forget about updates to the VDI software as well, especially when you are using the Current Release of Citrix XenDesktop (see Citrix LTSR or CR). To check the impact of a change you can rerun the load test script (to see if the result is still comparable with the earlier tests), use a specific monitoring tool (check my article about Citrix Monitoring for more information), or log performance counters and compare them with the baseline from the pilot phase.
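Comparing logged counters with the pilot baseline can be as simple as the sketch below. The counter names, the values and the 15% tolerance are hypothetical; in practice you would load averages exported from your monitoring tool or from the performance counter logs, and pick a tolerance that fits your environment.

```python
def compare_to_baseline(baseline, current, tolerance=0.15):
    """Return the counters whose current value deviates from the baseline
    by more than the tolerance, with the relative change as a fraction."""
    deviations = {}
    for counter, base_value in baseline.items():
        if counter not in current or base_value == 0:
            continue  # nothing to compare against
        change = (current[counter] - base_value) / base_value
        if abs(change) > tolerance:
            deviations[counter] = change
    return deviations


if __name__ == "__main__":
    # Hypothetical per-host averages: pilot baseline versus production
    # after an application update.
    pilot_baseline = {"cpu_percent": 40.0, "memory_gb": 20.0, "logon_seconds": 20.0}
    after_update = {"cpu_percent": 42.0, "memory_gb": 26.0, "logon_seconds": 21.0}

    for counter, change in compare_to_baseline(pilot_baseline, after_update).items():
        print(f"{counter} changed by {change:+.0%} compared with the pilot baseline")
```

A counter that shows up here after a change is a signal to re-check how many users the host can still handle.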
In this article I wrote down my opinion on determining the number of users for an SBC/VDI infrastructure like Citrix XenDesktop/XenApp, Microsoft or VMware View. I prefer to use two methods together. Start with a load test product (if there is budget available), and then determine the exact number in the pilot phase, with real users doing their daily job on the infrastructure. Last but not least, you should keep track of the infrastructure after the migration. Changes can alter the number of users you can host, so this should be checked on a regular basis, using the load test software, a specific monitoring solution, or a comparison of performance counters with the earlier created baseline.