High Availability
NOTE: Online addition/replacement of cell boards is not currently supported; it will be available in a future HP-UX
release. Online addition/replacement of individual CPUs and memory DIMMs will never be supported.
The Superdome high availability offering is as follows:
CPU: The features below nearly eliminate the downtime associated with CPU cache errors (which are the majority
of CPU errors).
Dynamic processor resilience with Instant Capacity enhancement
CPU cache ECC protection and automatic de-allocation
CPU bus parity protection
Redundant DC conversion
Memory: The memory subsystem is designed so that a single SDRAM chip contributes no more than 1 bit to
each ECC word. Therefore, a multiple-bit memory error from SDRAMs can occur only if more than one
SDRAM fails at the same time (a rare event). The system is also resilient to any cosmic ray or alpha particle strike
because such a strike can affect multiple bits only within a single SDRAM. If a location in memory is "bad", the
physical page is de-allocated dynamically and replaced with a new page without any OS or application
interruption. In addition, a combination of hardware and software scrubbing is used for memory. The software
scrubber reads/writes all memory locations periodically. However, it does not have access to "locked-down" pages.
Therefore, a hardware memory scrubber is provided for full coverage. Finally, data is protected by
address/control parity protection.
Memory DRAM fault tolerance (i.e., recovery of a single SDRAM failure)
DIMM address/control parity protection
Dynamic memory resilience (i.e., page de-allocation of bad memory pages during operation)
Hardware and software memory scrubbing
Redundant DC conversion
Cell COD
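The one-bit-per-SDRAM layout described in the Memory paragraph above can be illustrated with a small model. The
sketch below is only an illustration under stated assumptions: it uses a Hamming(7,4) single-error-correcting code as
a stand-in for the actual Superdome ECC code (which this document does not specify), and the Memory class, its seven
"chips", and all function names are hypothetical. It shows that when each chip supplies exactly one bit of every ECC
word, the failure of an entire chip appears as at most one bad bit per word and is corrected on every read, whereas
two simultaneous chip failures are not.

```python
# Illustrative model only: Hamming(7,4) stands in for the real ECC code,
# and each of 7 hypothetical "chips" supplies exactly one bit of every word.

def encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword (bit positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    word = [0] * 8                       # index 0 is unused
    word[3], word[5], word[6], word[7] = d
    word[1] = word[3] ^ word[5] ^ word[7]
    word[2] = word[3] ^ word[6] ^ word[7]
    word[4] = word[5] ^ word[6] ^ word[7]
    return word[1:]

def decode(bits):
    """Correct up to one flipped bit, then return the 4 data bits as an int."""
    word = [0] + list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos              # XOR of the positions of all set bits
    if syndrome:
        word[syndrome] ^= 1              # a single error sits at this position
    d = [word[3], word[5], word[6], word[7]]
    return sum(b << i for i, b in enumerate(d))

class Memory:
    def __init__(self, data):
        words = [encode(n) for n in data]
        # chips[c][addr] is the one bit that chip c contributes to word addr
        self.chips = [[w[c] for w in words] for c in range(7)]

    def fail_chip(self, c):
        """Model a whole-chip failure: every bit held by chip c is inverted."""
        self.chips[c] = [b ^ 1 for b in self.chips[c]]

    def read_all(self):
        n = len(self.chips[0])
        return [decode([chip[a] for chip in self.chips]) for a in range(n)]

data = list(range(16))
mem = Memory(data)
mem.fail_chip(3)
print(mem.read_all() == data)   # True: one failed chip is fully corrected
mem.fail_chip(5)
print(mem.read_all() == data)   # False: two failed chips exceed the code
```

The same reasoning covers a particle strike that upsets several bits inside one SDRAM: in this layout those bits
belong to different ECC words, so each word still sees at most a single-bit error.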
I/O: Partitions configured with dual path I/O can be configured to have no shared components between the paths,
thus preventing I/O cards from creating faults on other I/O paths. I/O cards in hardware partitions (nPars) are fully
isolated from I/O cards in other hardware partitions. It is not possible for an I/O failure to propagate across hardware
partitions. I/O cards can be repaired and added dynamically in a running partition.
Full single-wire error detection and correction on I/O links
I/O cards fully isolated from each other
Hardware for the prevention of silent corruption of data going to I/O
On-line addition/replacement (OLAR) for individual I/O cards, some external peripherals, SUB/HUB
Parity protected I/O paths
Dual path I/O
Crossbar and Cabinet Infrastructure:
Recovery of a single crossbar wire failure
Localization of crossbar failures to the partitions using the link
Automatic de-allocation of bad crossbar link upon boot
Redundant and hot-swap DC converters for the crossbar backplane
ASIC full burn-in and "high quality" production process
Full "test to failure" and accelerated life testing on all critical assemblies
Strong emphasis on quality for multiple-nPartition single points of failure (SPOFs)
System resilience to Management Processor (MP)
Isolation of nPartition failure
Protection of nPartitions against spurious interrupts or memory corruption
Hot-swap redundant fans (main and I/O) and power supplies (main and backplane power bricks)
Dual power source
Phone-Home capability
"HA Cluster-In-A-Box" Configuration: The "HA Cluster-In-A-Box" allows for failover of users' applications
between hardware partitions (nPars) on a single Superdome system. All providers of mission-critical solutions agree
that failover between clustered systems provides the safest availability: no single points of failure (SPOFs) and no
ability to propagate failures between systems. However, HP supports the configuration of HA cluster software in a
single system to allow the highest possible availability for those users who need the benefits of a non-clustered
solution, such as scalability and manageability. Superdome with this configuration provides the highest availability
configurable in a single system. Since no single-system solution in the industry provides protection against a SPOF,
users who still need this kind of safety and HP's highest availability should use HA cluster software in a
multiple-system HA configuration. Multiple Serviceguard or Serviceguard Extension for RAC clusters can be configured
within a single Superdome system (e.g., two 4-node clusters configured within a 32-way Superdome system).
QuickSpecs
HP 9000 Superdome Servers
(PA-8600, PA-8700 and PA-8800)
Configuration
DA - 11721 North America — Version 13 — April 1, 2005