Like almost every product and solution, PVS can be set up and configured so that the PVS infrastructure is highly available. Within the PVS infrastructure, several components relate to high availability. Some of them, such as PXE and TFTP, are well described in several good articles about building a highly available solution. However, there are also components and configuration settings that are rarely mentioned in the HA discussion but can have a big impact on the availability of your PVS infrastructure. In this article I’m going to describe which components should be taken into account, why, and how you could or should use them in a highly available PVS infrastructure.
Site
The first component within a PVS infrastructure I would like to discuss is the PVS Site. Within a PVS farm you need at least one Site, but you can create as many as needed. A PVS site consists of PVS servers, Device Collections and a vDisk Pool. At least one PVS server is required, but more can be added as needed. The same applies to Device Collections: one is required and more can be created. Within a Device Collection you will find your Target Devices (PVS clients). Additional PVS sites are often created for Delegation of Control (there is a specific Site Administrator role), which can be a good reason. Another reason I often see is separating geographic locations, with a separate site for each location. This is a valid reason; however, it can introduce a challenge for high availability. Although a PVS site is not the highest level (that is the PVS farm), a site is an autonomous object. Only the parts within a site can communicate with each other, so a Target Device can only communicate with the PVS server(s) in its own site. If that server or those servers are not available, the Target Device will not fail over to another site, but will just “hang” and wait until a PVS server in its site is available again.
As you can imagine, a site with a single server gives its Target Devices a single point of failure (as shown in image 1). So to have high availability in a PVS farm, each site needs at least two PVS servers. If you have many sites this is not very economical, and you still have only one server per site to fall back on. If both servers fail, that specific site is out of order. This is not necessary. You can use fewer sites by combining geographical locations in one site (or even putting everything in one site) and use the Load Balancing algorithm to assign Target Devices to the PVS servers in their own datacenter.
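The two-servers-per-site rule is easy to check against an inventory. The following is a minimal sketch in Python, assuming you keep a simple mapping of server names to site names yourself; the server and site names are hypothetical, and nothing here queries PVS.

```python
from collections import Counter


def sites_without_redundancy(server_sites, minimum=2):
    """Return the sites that have fewer than `minimum` PVS servers."""
    counts = Counter(server_sites.values())
    return sorted(site for site, count in counts.items() if count < minimum)


# Hypothetical inventory: server name -> site name
inventory = {
    "pvs-ams-01": "Amsterdam",
    "pvs-ams-02": "Amsterdam",
    "pvs-rtm-01": "Rotterdam",
}

print(sites_without_redundancy(inventory))
```

In this example only Rotterdam is reported, because it is the only site whose Target Devices depend on a single server.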
On each vDisk you can configure which Load Balancing algorithm is used. There are three options available:
- None: The least busy of all PVS servers in the site is used (default).
- Best Effort: A Target Device is assigned to the least busy PVS server in the same subnet (VLAN). When no PVS server is available (anymore) in that subnet, the least busy PVS server from the whole site is used.
- Fixed: A Target Device is assigned to the least busy PVS server in the same subnet (VLAN). When no PVS server is available (anymore) in that subnet, the Target Device cannot connect to the PVS infrastructure.
Of these algorithms, Best Effort offers the highest availability. When you have several (geographical) locations in one site, the PVS servers local to the Target Devices are used by default. Only in a worst-case scenario, when none of those are available, are servers from another location used. The Fixed algorithm has the same effect as using separate sites, so it adds no value for high availability.
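To make the difference between the three options concrete, here is a small Python model of the selection rules as described above. It is only an illustration under my own assumptions: the `Server` class, the numeric `load` value and the mode names are made up for this sketch and are not part of PVS.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    subnet: str
    load: int  # hypothetical "busyness" value; lower means less busy
    online: bool = True


def pick_server(servers, device_subnet, mode):
    """Model of the None / Best Effort / Fixed rules described in the text."""
    online = [s for s in servers if s.online]
    local = [s for s in online if s.subnet == device_subnet]
    if mode == "none":
        candidates = online            # subnet is ignored
    elif mode == "best_effort":
        candidates = local or online   # fall back to the whole site
    elif mode == "fixed":
        candidates = local             # never leave the subnet
    else:
        raise ValueError(f"unknown mode: {mode}")
    if not candidates:
        return None                    # the Target Device cannot connect
    return min(candidates, key=lambda s: s.load)


site = [
    Server("pvs-a1", subnet="A", load=50),
    Server("pvs-b1", subnet="B", load=10),
]

# The only server in subnet A goes down.
site[0].online = False
for mode in ("none", "best_effort", "fixed"):
    chosen = pick_server(site, "A", mode)
    print(mode, "->", chosen.name if chosen else "no server")
```

With the local server offline, Best Effort still finds a server in the other subnet, while Fixed returns nothing. That is the same behaviour you would get from two separate sites.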
Bootstrap
I think this is the most forgotten high availability component. However, because most people automatically add more than one server for start-up high availability, not many run into issues with this feature. The configuration depends a bit on the method used to connect to the PVS farm: PXE or BDM.
For more details on when and why to use PXE or BDM, I refer to my earlier article To PXE or not to PXE.
When using BDM you have the option to specify a maximum of four servers based on their IP address. Luckily you can also use a DNS alias, so more than four servers can be used. I also explained how to configure that in the article To PXE or not to PXE. If you configure this wisely, you have high availability.
When you use the PXE/TFTP option, you configure the servers in the bootstrap configuration. Unfortunately, you can only specify the servers by IP address here (there is no DNS alias option). The same maximum of four applies.
But the tricky part is how this configuration is actually used (both with BDM and with the PXE/TFTP alternative). Most people think it is only used at the moment a Target Device starts: the device uses this information to find the PVS infrastructure and is then redirected to the least busy server (based on the load balancing algorithm). If you need more information about making this part highly available, check the article Load Balancing TFTP – Anything but trivial.
However, the servers defined in this configuration step have another important role. When the Target Device loses the connection with the PVS server the vDisk was streamed from, the servers listed in the bootstrap are used to connect to the PVS infrastructure again. If at that moment none of the servers configured in the bootstrap are available, the Target Device just hangs and cannot reconnect to the PVS infrastructure, even though other PVS servers are available. So specify as many servers as possible (a maximum of four when based on IP address), spread over your infrastructure. Remember that by default only the server's own IP address is added, so the Target Devices are provided with only one IP address.
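If you maintain the bootstrap server list in documentation or a script, a quick sanity check can catch both limits mentioned here. This is a minimal Python sketch using the standard `ipaddress` module; the function name and the example addresses are my own and have nothing to do with the PVS console or its tooling.

```python
import ipaddress

MAX_BOOTSTRAP_SERVERS = 4  # the limit of four described in the text


def validate_bootstrap(entries):
    """Check a planned PXE/TFTP bootstrap list: IP addresses only, at most four."""
    if not entries:
        raise ValueError("at least one server is required")
    if len(entries) > MAX_BOOTSTRAP_SERVERS:
        raise ValueError(f"at most {MAX_BOOTSTRAP_SERVERS} servers can be listed")
    # ip_address() raises ValueError for anything that is not an IP address,
    # such as a DNS alias.
    return [str(ipaddress.ip_address(entry)) for entry in entries]


print(validate_bootstrap(["10.0.1.11", "10.0.2.11"]))
```

Passing a host name such as `pvs.example.local` raises a `ValueError`, which mirrors the fact that a DNS alias is not accepted in this configuration.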
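The reconnect behaviour can be summarised in a few lines of Python. This is a simplified model of what is described above, not how the Target Device software is implemented; the function name and addresses are hypothetical.

```python
def can_reconnect(bootstrap_servers, online_servers):
    """A device can reconnect only if at least one bootstrap server is online."""
    online = set(online_servers)
    return any(server in online for server in bootstrap_servers)


# Two PVS servers are still running, but 10.0.1.11 has failed.
online_now = {"10.0.1.12", "10.0.2.11"}

# Default bootstrap: only the server's own IP address is listed.
print(can_reconnect(["10.0.1.11"], online_now))

# Bootstrap with servers spread over the infrastructure.
print(can_reconnect(["10.0.1.11", "10.0.2.11"], online_now))
```

The first call returns `False` even though two PVS servers are still running, because neither of them is in the bootstrap list. The second call returns `True`.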
Conclusion
In this article I discussed two components of the PVS infrastructure that are not much mentioned in high availability articles and blogs, although they absolutely can have an impact on your availability. I explained why the site and the bootstrap configuration are important to take into account and what you could or should do to set up the most highly available PVS infrastructure possible.