Node Failure Recovery (MPP Only)

This node failure recovery process applies only to Teradata Database MPP systems.

Node failures can result from hardware, software, operating system, network, or Azure platform issues. When a node fails, it is automatically stopped/deallocated, and diagnostic information may be lost. A failed node can be recovered only after it is stopped/deallocated, because the Azure platform does not allow its NICs to be detached otherwise.

When a node fails, a replacement node (similar to a hot standby node) automatically spins up. The network-attached storage and NICs are detached from the failed node and reattached to the new VM, and the configuration is reinstated. The replacement node is based on a snapshot of a healthy operating system disk of the currently active (control) node, and it has the same private IPs and public IP as the node it replaces.

Node failures are handled differently when the VM has local storage. When such a node fails, the data on its local storage is lost. Although the node is replaced and comes back online, the AMPs on the recovered VM are shown as FATAL and offline. The other vprocs on the system are online and in the configuration. To fully restore a VM that has local storage, you must run Fallback Recovery and rebuild the AMPs. For more information, see Rebuilding AMPs after Failure and Running the Script to Rebuild AMPs. For assistance, contact Teradata Customer Support.
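The recovery behavior described above can be pictured with a small model. The following Python sketch is illustrative only: the Node class, the replace_failed_node function, and the sample names and addresses are hypothetical and are not part of any Teradata or Azure tool. It only mirrors the documented outcome, in which the replacement keeps the same IPs and network-attached storage, while AMPs on a VM with local storage come back as FATAL.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for one MPP node (VM)."""
    name: str
    private_ips: List[str]
    public_ip: str
    network_disks: List[str]         # network-attached storage, survives the VM
    has_local_storage: bool = False  # data on local storage is lost with the VM
    amp_status: str = "ONLINE"
    state: str = "running"


def replace_failed_node(failed: Node, replacement_name: str) -> Node:
    """Model the documented sequence: stop the failed VM, then move its
    NICs and network-attached storage to a replacement VM."""
    # The failed VM is stopped/deallocated first so its NICs can be detached.
    failed.state = "deallocated"

    replacement = Node(
        name=replacement_name,
        private_ips=list(failed.private_ips),      # same private IPs
        public_ip=failed.public_ip,                # same public IP
        network_disks=list(failed.network_disks),  # storage is reattached
        has_local_storage=failed.has_local_storage,
        # Local storage does not move, so AMPs stay FATAL until rebuilt.
        amp_status="FATAL" if failed.has_local_storage else "ONLINE",
    )

    # The failed VM no longer owns the network resources.
    failed.private_ips = []
    failed.public_ip = ""
    failed.network_disks = []
    return replacement


if __name__ == "__main__":
    # Sample values are placeholders, not real deployment data.
    old = Node("node2", ["10.0.0.5"], "203.0.113.5", ["data-disk-0"],
               has_local_storage=True)
    new = replace_failed_node(old, "node2-replacement")
    print(new.name, new.private_ips, new.public_ip, new.amp_status)
```

In this model, a node with only network-attached storage returns with its AMPs online, whereas a node with local storage needs Fallback Recovery and an AMP rebuild before it is fully restored.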

Node failure recovery takes longer than a typical TPA reset. A node can fail for many reasons, and the cause may be difficult to determine. If your node does not recover automatically within 10 to 15 minutes, first check the deployment logs in your Azure resource group. For additional assistance, contact Teradata Customer Support.
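If you monitor recovery from a script, the 10 to 15 minute guideline can be expressed as a polling loop with a timeout. The following Python sketch is one possible approach, not a Teradata-supplied utility: wait_for_recovery and the is_node_online callback are hypothetical names, and you must supply your own check for whether the node is back online.

```python
import time
from typing import Callable


def wait_for_recovery(
    is_node_online: Callable[[], bool],
    timeout_s: float = 15 * 60,   # upper end of the 10 to 15 minute guideline
    poll_s: float = 30,           # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_node_online until it returns True or timeout_s elapses.

    Returns True if the node came back within the timeout, otherwise False.
    """
    deadline = clock() + timeout_s
    while True:
        if is_node_online():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

If the function returns False, the next step is the one described above: check the deployment logs in your Azure resource group, and contact Teradata Customer Support if you need further help.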

To set up node failure recovery, do the following:
  1. Create an Azure Active Directory application and the service principal. See Creating an Azure Active Directory Application and the Service Principal.
  2. Enable the node failure recovery feature. See Enabling Node Failure Recovery.

Optionally, you can configure the VM to be terminated instead of stopped when a node failure occurs. You must set this option before a failure occurs. For more information, see Configuring the VM State for Node Failure Recovery.

