High Availability White Papers

A Failure Predictive and Policy-Based High Availability Strategy for Linux High Performance Computing Cluster

Overview

The Open Source Cluster Application Resources (OSCAR) is a fully integrated cluster software stack designed for building and maintaining Linux Beowulf clusters. As OSCAR has become a popular tool for building cost-effective HPC clusters, High Availability (HA) becomes equally important, because an unavailable cluster delivers no performance. To combine HA and HPC features, the HA-OSCAR solution was created. It eliminates the numerous single points of failure in HPC systems and reduces unplanned downtime through self-healing mechanisms and component redundancy. This paper reports newly introduced ideas and experiments on hardware-level failure detection and prediction based on the Service Availability Forum's Hardware Platform Interface (OpenHPI).

Further White Paper Details
Publisher: Louisiana Tech University
File Format: PDF
Date Published: March 2004
Format: White Papers
