The following guide provides information on how to get a NetScaler cluster up and running.
NetScaler clustering provides active-active traffic processing across 2 to 32 NetScaler appliances, either physical or virtual. Every appliance must have the same license type, software build, and hardware build (MPX/SDX) or virtual build (VPX/SDX).
Clustering is part of NetScaler Enterprise or Platinum licenses.
The following NetScaler features are not supported in a cluster:
- SSL Certificate Bundles
- TFTP (Note: Is supported on NetScaler 12.0+)
The following NetScaler 11.1 features are supported at a node level, meaning that they can run on a single node in a cluster.
- Integrated Caching http://www.jgspiers.com/netscaler-integrated-caching/
- Surge protection
- Full VPN or Clientless VPN
For a full list of features not supported at all or supported at a node level, see https://docs.citrix.com/en-us/netscaler/11-1/clustering/cluster-features-supported.html
♣ Create Cluster
♣ Synchronize Cluster Files
♣ Precision Time Protocol
♣ Cluster SNIPs
♣ Node Groups
♣ Traffic Flow
♣ Cluster Quorum
♣ Admin, Operational and Health states
♣ Equal Cost Multiple Path (ECMP)
♣ Cluster Link Aggregation
♣ Retain connections on cluster
♣ Command Propagation Failures
♣ Node level configurations that are supported
♣ Migrate NetScaler High Availability pair to a NetScaler Cluster
♣ Backup and Restore
Create Cluster
Enter a Cluster instance id and IP for the cluster (CLIP). If the cluster nodes reside on different networks, check INC Mode. If you do not want packets to undergo any steering, check Process Local. Click Continue.
Note: When using INC Mode, you’ll likely have SNIPs for each NetScaler cluster node, also known as spotted SNIPs. Having a single SNIP across all nodes (striped SNIP) is not supported when INC Mode is enabled. Cluster nodes from the same network must be added to a nodegroup together with the other nodes from that network. As an example, take a London and a Glasgow datacentre, with the cluster spanning both regions using INC Mode. Two nodegroups are created, one containing the Glasgow appliances and the second containing the London nodes. Nodegroups are explained later.
Give this node a unique Node ID. In my case I’ll have three nodes labelled 1, 2 and 3. Specify the Backplane interface, which is the interface used to communicate with other nodes. This should not be the management interface. Additional interfaces should be added to each node to serve client and server traffic. Set the State to Active; this can also be Passive or Spare. Set the priority on all nodes to the same value if you want the cluster to choose a random node as the Configuration Coordinator. If priorities differ between nodes, the node with the lowest priority number is elected as the Configuration Coordinator for as long as it is online and healthy. If this node goes down, the node with the next-lowest priority number is elected, and so on. Click Done.
Note: A Configuration Coordinator owns the Cluster IP address, and replicates configuration out to the other cluster nodes using the NSIP address. It acts as a master node. The first node you use to create a cluster is the Configuration Coordinator until either it goes down or a node with a lower priority number is added to the cluster.
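The election rule described above can be sketched in a few lines of Python. This is only an illustrative model, not NetScaler code: the `Node` class and `elect_coordinator` function are hypothetical names, and the random tie-break simply mirrors the statement that equal priorities lead to a random node being chosen.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    priority: int          # lower number wins the election
    healthy: bool = True   # online and healthy


def elect_coordinator(nodes, rng=random):
    """Return the node that would become Configuration Coordinator, or None."""
    candidates = [n for n in nodes if n.healthy]
    if not candidates:
        return None
    best = min(n.priority for n in candidates)
    tied = [n for n in candidates if n.priority == best]
    # Equal priorities: one of the tied nodes is picked at random.
    return rng.choice(tied)


nodes = [Node(1, 10), Node(2, 20), Node(3, 30)]
print(elect_coordinator(nodes).node_id)  # 1
```

If node 1 is marked unhealthy, the same call returns node 2, which matches the "next-lowest number takes over" behaviour.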
Note: The state of MAC spoofing on Hyper-V, VMware or XenServer hypervisors can affect the steering mechanism on the cluster backplane. Make sure MAC spoofing is disabled on XenServer, enabled on Hyper-V, and enabled on VMware along with Forged Transmits.
Click Yes. The cluster will be ready to accept new nodes at this stage.
Log back in to the node, navigate to System -> Cluster -> Nodes -> Add. This process adds a second node remotely.
Enter a unique Node id, enter the NSIP address of the second node and specify the Backplane interface. The first value of “2” in the Backplane Interface field is the Node id number. The 0/1 value is simply the interface ID. This allows the cluster to uniquely identify each node. Specify the same priority as node 1 or a different priority if you prefer. Check Execute join command and reboot the remote system.
Enter nsroot credentials of the remote node and the current configuration coordinator. Click Create.
The second node will show as UNKNOWN whilst it is being restarted.
Note: When a node is added to the cluster, clear ns config -extended is run on the node to clear the node’s configuration apart from VLAN and NSVLAN. If the cluster is an L3 cluster (nodes in separate networks using INC), the network configuration is not cleared.
Once the node comes online it should return an operational state of ACTIVE and be joined to the cluster.
Another way to add a node, which I’ll use for the third and final node, is to log on to one of the existing cluster nodes and navigate to System -> Cluster -> Manage cluster -> Add. Enter details of the node as below and click Create, this time leaving Execute join command and reboot the remote system unchecked.
The node will appear as unknown for now.
Log on to the third node and navigate to System -> Cluster -> Join cluster.
Enter the CLIP and the password of the nsroot account on the Configuration Coordinator. Click OK.
At this stage, you have a three-node, all-active cluster sharing the same configuration. You can also discover NetScalers that are on the same subnet as the current node by clicking the Discover NetScalers button and scanning a range of IP addresses. Discovered NetScaler appliances can then be selectively added to the cluster.
If you need to remove a node from the cluster, check the node and select Remove. Doing this executes a clear ns config -extended command on the node being removed to clear the cluster configuration. The default VLAN and NSVLAN are not cleared from the node. When the node being removed is the Configuration Coordinator, that role, along with the CLIP, is assigned to the remaining node with the highest priority (lowest number). The cluster IP address is deleted when the last node is removed from the cluster.
Synchronize Cluster Files
Using the drop-down box, you can choose to manually synchronize all files or certain ones such as SSL Certificate and Keys (when manually copied to NetScaler), or DNS records.
I mention SSL Certificate and Keys because when these are manually copied to the NetScaler Configuration Coordinator they aren’t automatically available on other cluster nodes.
All the files mentioned below are synchronized from the Configuration Coordinator once a node is added to the cluster or when you run a new command on the CLIP:
Note: While configuration synchronization runs after a new node is added to the cluster, there may be intermittent traffic loss.
Click OK to perform a manual sync.
You can also synchronize files using the CLI command sync cluster files all or sync cluster files DNS for example.
Precision Time Protocol
Clicking Configure Precision Time Protocol (PTP) Settings allows you to change the PTP state. The default state is ENABLE. PTP is a protocol, mainly used with Linux systems, that synchronizes time between systems, in our case the cluster nodes. If you disable PTP as a result of time issues or for other reasons, make sure the cluster is configured for NTP with a reliable time source. You can also disable PTP using CLI command set ptp -state disable
Clicking Statistics shows cluster related statistics such as the total amount of steered packets, the cluster status, size and so on.
You can also view the statistics of a cluster node via GUI or using CLI command stat cluster node nodeid
Cluster SNIPs
Just like a single NetScaler, the cluster can be assigned a SNIP address which is shared between all nodes (striped) or assigned to one or more single nodes (spotted). To create a striped SNIP, select ALL_Nodes under Owner Node when creating the SNIP.
Note: When USIP is disabled, Citrix recommend using spotted SNIP addresses. Also, when running in INC Mode, striped SNIPs are not supported.
The SNIP will appear as below.
Node Groups
Node Groups are optional groups of one or more nodes, used when you want certain configurations (entities) to be active only on certain nodes or when you want to configure redundancy between nodegroups. As you are now aware, the Configuration Coordinator replicates configuration out to all nodes (striped). Whilst you cannot stop that, you can stop configurations from being active on specific nodes. For example, you may want only certain Load Balancing vServers or services to be active on some but not all of the nodes. If an entity is active on one node (in a Sticky nodegroup) it is referred to as a spotted entity. If an entity is active on more than one node, it is referred to as partially striped. To create a nodegroup, navigate to System -> Cluster -> Node Group -> Add.
Specify a name. You can optionally set the state of the Node Group (Active/Spare/Passive) for redundancy and failover, for example having an Active Node Group and a second Passive or Spare Node Group which is capable of taking over. Keep in mind that any node you make a member of this group will use the state of the Node Group. If a single node has an admin state of passive and is added to an active Node Group, the node will become active.
The Strict checkbox, if checked, means no other cluster node is added to the Node Group if one of the existing Node Group members goes down. If unchecked and a member node goes down, a non-member is added to the Node Group and picks up the configuration. This node is called the replacement node. When the original node comes back online, it takes over from the replacement node again.
The Sticky checkbox, when enabled, results in only one node being allowed to be bound to the Node Group. This is the owner node. When Sticky is checked and the owner node goes down, a backup node takes its place and continues to serve traffic even when the original owner node comes back online. When unchecked, the original owner node takes over when back online. Sticky and Strict cannot be used together, so one of the options will be greyed out depending on what you have chosen.
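To make the difference between Strict and Sticky easier to follow, here is a small Python model of the behaviour described above. It is a sketch under assumptions rather than anything NetScaler runs: the function names are hypothetical, node IDs are plain integers, and the order in which a replacement or backup node is picked (first available in the list) is my own simplification.

```python
def serving_members(members, non_members, up, strict):
    """Nodes serving the nodegroup's entities, given the set of nodes that are up."""
    serving = [n for n in members if n in up]
    if strict:
        # Strict: a failed member is never replaced.
        return serving
    # Not strict: each failed member is covered by an available non-member.
    replacements = [n for n in non_members if n in up]
    missing = len(members) - len(serving)
    return serving + replacements[:missing]


def sticky_owner(current, original, backups, up, sticky):
    """Node that serves a single-owner nodegroup after a state change."""
    if not sticky and original in up:
        # Not sticky: the original owner takes over again once it is back.
        return original
    if current in up:
        # Sticky (or original still down): whoever is serving keeps serving.
        return current
    # The serving node is down: fail over to the first available candidate.
    return next((n for n in [original, *backups] if n in up), None)


print(serving_members([1, 2], [3], up={1, 3}, strict=False))  # [1, 3]
print(serving_members([1, 2], [3], up={1, 3}, strict=True))   # [1]
print(sticky_owner(current=3, original=1, backups=[3], up={1, 3}, sticky=True))   # 3
print(sticky_owner(current=3, original=1, backups=[3], up={1, 3}, sticky=False))  # 1
```

The last two lines show the key difference: once node 1 is back, a Sticky nodegroup stays on node 3, while a non-Sticky one returns to node 1.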
To bind nodes to the Node Group click No Cluster Node.
Select one or more nodes to bind.
Now you can attach entities to the Node Group. These objects will be made active on the Node Group members. Select Load Balancing Virtual Servers.
Select one or more servers, click Select.
You can also create multiple nodegroups for redundancy. For example, you could create nodegroup1 with two active nodes and nodegroup2 with two spare nodes. When one of the nodes in nodegroup1 goes down, the remaining nodes in the nodegroup are counted and compared with the number of nodes in other active nodegroups, in priority order. If another active nodegroup has a higher or equal node count, that nodegroup is made operationally active. If not, the spare nodegroup (nodegroup2) is checked.
Note: You cannot bind entities to a nodegroup when that entity has dependencies on another entity that isn’t bound to the nodegroup. Remove the dependency first, add both entities to the nodegroup then add the dependency back. For example, a vServer and spillover vServer or a Content Switching vServer with multiple LB vServers attached.
Note: GSLB is supported when GSLB sites are bound to nodegroups that contain a single cluster node. NetScaler Gateway is supported when VPN vServers are bound to nodegroups that contain a single cluster node with Sticky enabled on the nodegroup. AppFirewall is supported in striped or partially striped configurations. https://docs.citrix.com/en-us/netscaler/11/security/application-firewall/appendixes/cluster-configuration.html
Traffic Flow
Since all nodes act as a single entity using the CLIP, traffic that reaches the cluster must be routed to an appropriate node (flow receiver). Having a cluster with nodes that are active, spare or passive will obviously affect how traffic is routed. In the case of active nodes, ECMP (Equal Cost Multiple Path) or Cluster Link Aggregation is used to distribute traffic effectively. When NetScalers are operating in INC Mode, only ECMP can be used. When traffic reaches the flow receiver, the flow receiver determines which node must process the traffic. Once a node is determined, the traffic is steered to the flow processor over the backplane if the receiver and processor nodes are on the same network, or through an unencrypted GRE tunnel if they are on different networks.
Note: The fact that GRE tunnels are unencrypted means you are advised to secure the tunnel with IPSec encryption if traffic flows over the internet.
You can disable steering by checking Process Local on the cluster instance. This means once the receiver receives traffic, it also acts as the processor. An exception to this is when the receiver does not own the required entity in a spotted or partially striped configuration. In this case, steering is still performed, but to the node which holds the service.
Note: When steering is disabled, striped SNIPs are not supported, L2 mode must be disabled, traffic flowing to the CLIP is steered to the Configuration Coordinator, and MPTCP and FTP do not work.
To disable steering on a particular Virtual Server, run command set lb vserver vServerName -processLocal ENABLED.
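The steering decisions in this section can be summarised with a short Python sketch. It is a simplified model only: `Node`, `choose_processor` and `steering_path` are hypothetical names, how the cluster normally selects a flow processor is not modelled (it is passed in as `cluster_choice`), and picking the first owner when the receiver does not own the entity is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    node_id: int
    network: str


def choose_processor(receiver, cluster_choice, owners, process_local):
    """Pick the flow processor for traffic that arrived at `receiver`.

    cluster_choice: node the cluster would normally select (not modelled here).
    owners: list of nodes on which the target entity is active.
    """
    if not process_local:
        return cluster_choice
    if receiver in owners:
        # Process Local: the receiver handles the traffic itself.
        return receiver
    # Receiver does not own the entity: steer to a node that does.
    return owners[0]


def steering_path(receiver, processor):
    """How traffic travels from the flow receiver to the flow processor."""
    if receiver == processor:
        return "none"
    if receiver.network == processor.network:
        return "backplane"
    return "gre-tunnel"


glasgow1, glasgow2, london1 = Node(1, "glasgow"), Node(2, "glasgow"), Node(3, "london")
everyone = [glasgow1, glasgow2, london1]

p = choose_processor(glasgow1, cluster_choice=london1, owners=everyone, process_local=False)
print(steering_path(glasgow1, p))  # gre-tunnel

p = choose_processor(glasgow1, cluster_choice=london1, owners=everyone, process_local=True)
print(steering_path(glasgow1, p))  # none
```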
Cluster Quorum
Majority – Recommended. Requires the majority of nodes (in my case two) to be online for the cluster to stay UP.
None – No quorum is defined.
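The Majority rule is simple arithmetic, sketched below in Python. The function names are hypothetical, and I am assuming "majority" means more than half of the configured nodes (n // 2 + 1), which matches the two-out-of-three example above. Treating a cluster with no quorum as up while at least one node is online is also my assumption.

```python
def quorum_required(total_nodes):
    """Minimum number of online nodes for a majority."""
    return total_nodes // 2 + 1


def cluster_is_up(total_nodes, online_nodes, quorum_type="MAJORITY"):
    if quorum_type == "NONE":
        # Assumption: without a quorum, any online node keeps the cluster up.
        return online_nodes >= 1
    return online_nodes >= quorum_required(total_nodes)


print(quorum_required(3))        # 2
print(cluster_is_up(3, 2))       # True
print(cluster_is_up(3, 1))       # False
print(cluster_is_up(3, 1, "NONE"))  # True
```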
Admin, Operational and Health states
Admin states:
Active – Nodes are in sync with the cluster and serve traffic that reaches the cluster.
Passive – Nodes are in sync with the cluster but do not serve any traffic that reaches the cluster. This state can only be changed manually.
Spare – Nodes are in sync with the cluster and act as backup to active state nodes. When a spare node takes the place of a down active node, the admin state remains spare but the operational state changes from inactive to active.
Note: If preemption is disabled (default), spare nodes continue to serve traffic even when the active node comes back online. If enabled, the spare node is preempted by an active node when the active node comes back online and finds the spare node serving traffic.
Operational states:
Active – The node is healthy and is either in an active admin state or in a spare admin state and currently serving as backup.
Inactive – The node is in a passive admin state regardless of health, in an active admin state but not healthy, or in a spare admin state and not being used as backup.
Unknown – The node cannot receive heartbeats from other nodes.
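The relationship between admin state, health and operational state can be written down as a small decision function. This Python sketch is my own reading of the rules listed above, with hypothetical names, and the order of the checks (heartbeats first) is an assumption. The second function models the preemption note: whether a spare node keeps serving once the active node it replaced comes back.

```python
def operational_state(admin_state, healthy, heartbeats_seen=True, backing_up=False):
    """Derive the operational state from admin state and health."""
    if not heartbeats_seen:
        return "UNKNOWN"
    if admin_state == "PASSIVE":
        return "INACTIVE"  # passive nodes never serve, regardless of health
    if not healthy:
        return "INACTIVE"
    if admin_state == "ACTIVE":
        return "ACTIVE"
    if admin_state == "SPARE":
        # A spare is only operationally active while covering for a down node.
        return "ACTIVE" if backing_up else "INACTIVE"
    raise ValueError(f"unexpected admin state: {admin_state}")


def spare_keeps_serving(preemption):
    """True if the spare keeps serving after the active node returns."""
    return not preemption


print(operational_state("SPARE", healthy=True))                   # INACTIVE
print(operational_state("SPARE", healthy=True, backing_up=True))  # ACTIVE
print(spare_keeps_serving(preemption=False))                      # True
```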
Health states are UP, NOT UP or UNKNOWN.
Health state criteria:
Up – When the following criteria are satisfied:
- Interfaces are up and enabled including backplane interface.
- Cluster synchronization operation is enabled and completed.
- SSL cards are available.
- Cluster Link Aggregation member(s) are up (if used).
Not up – One or more of the above criteria is not satisfied.
Unknown – The node cannot receive any heartbeats from other cluster nodes.
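As a quick way to reason about the criteria above, the following Python sketch combines them into a single health result. The `NodeChecks` fields and the `health_state` function are hypothetical names that mirror the list, not values exposed by NetScaler under these names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeChecks:
    interfaces_up: bool             # all interfaces, including the backplane
    sync_complete: bool             # cluster synchronization enabled and completed
    ssl_cards_available: bool
    cla_members_up: Optional[bool] = None  # None when CLAG is not used
    heartbeats_seen: bool = True


def health_state(checks):
    if not checks.heartbeats_seen:
        return "UNKNOWN"
    criteria = [checks.interfaces_up, checks.sync_complete, checks.ssl_cards_available]
    if checks.cla_members_up is not None:
        criteria.append(checks.cla_members_up)
    return "UP" if all(criteria) else "NOT UP"


print(health_state(NodeChecks(True, True, True)))                        # UP
print(health_state(NodeChecks(True, False, True)))                       # NOT UP
print(health_state(NodeChecks(True, True, True, heartbeats_seen=False)))  # UNKNOWN
```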
Equal Cost Multiple Path (ECMP)
As mentioned previously, traffic is distributed to cluster nodes effectively using either CLAG or ECMP. In the case of ECMP (requires NetScaler Standard edition), virtual server IP addresses are advertised by all active cluster nodes, which means traffic can be received by any node, and steering directs the traffic to the correct node for processing. If you have spotted or partially striped virtual servers, these virtual IP addresses are advertised only by the nodes where the virtual server is active. This helps to reduce redundant steering.
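Which nodes advertise a given virtual IP can be expressed as a set intersection. The Python sketch below is illustrative only: `advertising_nodes` is a hypothetical helper, and passing `owner_nodes=None` is my way of representing a virtual server that is active on every node.

```python
def advertising_nodes(active_nodes, owner_nodes=None):
    """Node IDs that advertise a virtual server IP address via ECMP."""
    active = set(active_nodes)
    if owner_nodes is None:
        # Active on all nodes: every active node advertises the VIP.
        return sorted(active)
    # Active on some nodes only: just the active owner nodes advertise it.
    return sorted(active & set(owner_nodes))


print(advertising_nodes([1, 2, 3]))                      # [1, 2, 3]
print(advertising_nodes([1, 2, 3], owner_nodes=[2]))     # [2]
print(advertising_nodes([1, 3], owner_nodes=[2, 3]))     # [3]
```

Because upstream routers only learn the VIP from nodes that can process it locally, less traffic has to be steered between nodes.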
To configure ECMP, read the below useful resource links:
Cluster Link Aggregation
As mentioned previously, traffic is distributed to cluster nodes effectively using either CLAG or ECMP. In the case of CLAG, interfaces from different cluster nodes are grouped together. CLAG is not supported on VPX running on XenServer, Hyper-V or AWS. There are also some other requirements to consider when running CLAG across VPX nodes.
To configure CLAG, read the below useful resource links:
To remove a node from a cluster that uses CLAG, set the node as Passive and then remove the node’s interface from the Link Aggregation channel and cluster. To remove the node from the cluster, use CLI command rm cluster node nodeid or use the GUI by navigating to Manage Cluster.
Retain connections on cluster
ake it offline during this period. When for example you are upgrading a cluster, one node should be taken offline at a time. When gracefully shutting down or upgrading a node, navigate to System -> Cluster -> Manage Cluster and check Retain Connections on Cluster.
Now edit the node you want to upgrade, set the state to Passive and configure a delay in minutes which will determine how long the NetScaler maintains existing connections before purging them.
After the upgrade is done, return the node to an active state.
Note: When performing cluster upgrades or downgrades, configuration propagation is disabled until all nodes return to the same software version. This means you cannot make any configuration changes using the CLIP.
Command Propagation Failures
Node level configurations that are supported
NetScaler nodes in a cluster generally cannot be configured individually; configuration has to go through the NetScaler Cluster IP. There are some exceptions though, and the following configurations can be done at a node level:
- save ns config
- disable ntp sync
- sync cluster files
- force cluster sync
- add, remove, set, unset route
- add, remove, send -all arp
- set, enable, disable interface
- start, stop, show nstrace
- set, remove cluster node
- set, remove, enable, disable cluster interface
Migrate NetScaler High Availability pair to a NetScaler Cluster
Backup and Restore
When restoring, if Node1 were to fail, it is removed from the cluster and rebuilt/replaced. When Node1 is back online, add it back to the cluster and restore the previously backed up ns.conf file and any other files as mentioned at http://docs.citrix.com/en-us/netscaler/11-1/system/basic-operations/backup-restore-netscaler-appliance.html
If a cluster needs to be restored, you need to rebuild the cluster and repeat the above procedure on each node as it is added back to the cluster.
There is a limitation when using CLA (Cluster Link Aggregation): if a MAC was generated from a faulty node, the above procedure does not work.
show cluster instance – Outputs the details of the cluster including the health state for each node, admin state and operational state. If there are any errors on a node such as no heartbeats being received, this will be flagged.
show cluster node id – Get specific information for a specific cluster node, including the sync state, health state, interfaces on which heartbeats are not seen etc.
- Note: You can disable heartbeats on specific node interfaces on NetScaler 12+. You cannot disable heartbeats on the backplane interface since it is required for maintaining connectivity between nodes.
show techsupport -scope cluster – Gather techsupport bundle to send to Citrix technical support/upload to CIS.
nstrace – Run nstrace on a single node or on the CLIP, which captures a trace on all nodes and stores them individually on each appliance /var/nstrace/. http://www.jgspiers.com/citrix-netscaler-traffic-capture-using-nstrace-nstcpdump/
Common Issues and troubleshooting tips – https://docs.citrix.com/en-us/netscaler/11-1/clustering/cluster-troubleshooting/ns-cluster-trouble-tip-ref.html