Oracle® Database High Availability Overview 10g Release 2 (10.2) Part Number B14210-02
This chapter describes high availability architectures in an Oracle environment.
Oracle Database 10g provides a full range of capabilities to protect from all causes of system downtime, both planned and unplanned. Table 4-1 shows the outage types and the Oracle database capabilities and features that most effectively prevent, tolerate, or repair each outage type.
Table 4-1 Oracle Database High Availability Architectures
Outage Type | Database Capabilities and Features |
---|---|
Unplanned |
|
Computer failures |
|
Storage failures |
|
Human errors |
|
Data corruption |
|
Site failures |
|
Planned |
|
Data changes |
|
System changes |
|
This section describes the following top database architectures that address various high availability business needs:
Oracle Database 10g
Oracle Database 10g running a single database on a standalone computer contains significant high availability features and capabilities. For more information, see Chapter 2, "Oracle Database High Availability Solutions".
Oracle Database 10g with RAC
Oracle Real Application Clusters (RAC) builds upon the features and capabilities of Oracle Database 10g. RAC comprises several Oracle instances running on many clustered computers that access a shared database residing on shared disk. RAC combines the processing power of these multiple interconnected computers to provide system redundancy, scalability, and high availability. Applications scale in a RAC environment to meet increasing data processing demands without changes to the application code. In addition, planned downtime can be reduced because maintenance operations can occur on a subset of components in the cluster while the application continues to run on the rest of the cluster.
Oracle Database 10g with Data Guard
Oracle Data Guard builds upon the features and capabilities of Oracle Database 10g. Data Guard maintains up to nine standby databases—each of which is a real-time copy of the production database—to protect against all threats: computer failures, storage failures, human errors, data corruption, and site failures. If a failure occurs on the production (primary) database, data processing can fail over to one of the standby databases (which will become the new primary database). In addition, planned downtime for maintenance can be reduced because production processing can quickly and easily switch over from the current primary database to a standby database, and then back again.
Oracle Database 10g with RAC and Data Guard - MAA
Maximum Availability Architecture (MAA) combines the scalability and availability advantages of RAC with the site protection capabilities of Data Guard. An MAA environment consists of a site containing a RAC production database, along with a second site containing a cluster that either hosts both logical and physical standby databases, or at least one physical or logical standby database. This architecture provides the most comprehensive set of solutions for both unplanned and planned outages because it inherits the capabilities and advantages of both the Oracle Database 10g with RAC and Oracle Database 10g with Data Guard architectures.
Oracle Database 10g with Streams
Oracle Streams enables the propagation and management of data, transactions, and events in a data stream either within a database, or from one database to another. Similar to Oracle Database 10g with Data Guard, Oracle Database 10g with Streams can capture database changes, propagate them to destinations, and apply the changes at these destinations. Using a Streams environment may require additional administrative overhead, but it offers increased flexibility that might be required to meet specific business requirements.
Oracle provides a wide array of high availability architectural solutions. The Oracle Database 10g architecture contains many availability features and assets that are used by all other architectures and is the starting point for most customers. Oracle Database 10g with RAC, Oracle Database 10g with Data Guard, and Oracle Database 10g with Streams provide high availability capabilities beyond those of Oracle Database 10g. MAA incorporates both RAC and Data Guard advantages and represents the architecture with maximum availability. Choosing an architecture with more availability features does not necessarily lead to higher costs. In fact, RAC technology and grid computing enable a more available and resilient architecture to be attained with lower total cost of ownership than most legacy high availability features. Figure 4-1 illustrates the hierarchy of the different high availability architectures.
Figure 4-1 Hierarchy of High Availability Architectures
The following sections provide further details on the various Oracle database high availability architectures:
Oracle provides high availability features that can be used in any of the database architectures. These features make the standalone database on a single computer attractive and available:
Oracle Database 10g with RAC architecture uses Real Application Clusters and is an inherently high availability system. The clusters that are typical of RAC environments can provide continuous service for both planned and unplanned outages. RAC builds higher levels of availability on top of the standard Oracle features. All single instance high availability features, such as flashback technologies and online reorganization, apply to RAC as well.
In addition to the standard Oracle features, RAC exploits the redundancy that is provided by clustering to deliver availability with n - 1 node failures in an n-node cluster. All users have access to all nodes as long as there is one available node in the cluster.
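The n - 1 tolerance described above can be made concrete with a small calculation. The following Python sketch is an illustration only: it assumes that node failures are independent and that every node has the same availability, neither of which is stated in this chapter, and it models nodes only, not the shared storage or interconnect. The helper names `survives` and `cluster_availability` are hypothetical and are not Oracle utilities.

```python
def survives(nodes: int, failed: int) -> bool:
    """Return True while at least one node remains, that is, failed <= n - 1."""
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    return 0 <= failed < nodes


def cluster_availability(node_availability: float, nodes: int) -> float:
    """Probability that at least one of `nodes` nodes is up.

    Assumes independent failures and identical per-node availability
    (illustrative assumptions, not figures from Oracle).
    """
    if not 0.0 <= node_availability <= 1.0:
        raise ValueError("node_availability must be between 0 and 1")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    # The database is unreachable only if all n nodes are down at once.
    return 1.0 - (1.0 - node_availability) ** nodes
```

Under these assumptions, adding a node multiplies the probability of a total node outage by the per-node unavailability, which is why even a small cluster can tolerate individual node failures well.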
This architecture provides the following benefits:
Fast node failover (measured in minutes) and instance failover (measured in seconds)
Integrated and intelligent connection and service failover across various instances
Planned node, instance, and service switchover and switchback
Rolling patch upgrades
Rolling release upgrades of Oracle Clusterware
Multiple active instance availability and scalability across multiple nodes
Comprehensive manageability that integrates database and cluster features
Extensive cluster and application services that allow the database and application services to be restarted or relocated in case of failures
Figure 4-2 shows Oracle Database 10g with RAC architecture.
Figure 4-2 Oracle Database 10g with RAC Architecture
Oracle Data Guard ensures high availability, data protection, and disaster recovery for enterprise data. Data Guard provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases to enable Oracle databases to survive disasters and data corruption. Data Guard maintains these standby databases as transactionally consistent copies of the production database. If the production database becomes unavailable due to a planned or unplanned outage, Data Guard can switch any standby database to the production role, minimizing the downtime associated with the outage. Data Guard can be used with traditional backup, restoration, and cluster technologies to provide a high level of data protection and availability. With Data Guard, administrators can optionally improve production database performance by diverting resource-intensive backup and reporting operations to standby systems.
Using a backup copy of the primary database, it is possible to create up to nine standby databases and integrate them in a Data Guard configuration. Once created, Data Guard automatically maintains each standby database by transmitting redo data from the primary database and applying the redo to the standby database. Similar to a primary database, a standby database can be either an Oracle single-instance or RAC database.
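The create-then-maintain cycle described above can be pictured with a toy model. The following Python sketch is not Data Guard code and uses no Oracle API: the classes `Database` and `DataGuardConfig` are hypothetical, redo is reduced to simple key and value changes ordered by a change number, and redo is shipped and applied immediately, which real configurations need not do. Only the limit of nine standby databases comes from this chapter.

```python
class Database:
    """Toy database: a dictionary plus the last change number applied."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied_scn = 0  # hypothetical stand-in for a system change number

    def apply(self, scn, key, value):
        # Apply redo in order and ignore anything already applied.
        if scn <= self.applied_scn:
            return
        self.data[key] = value
        self.applied_scn = scn


class DataGuardConfig:
    MAX_STANDBYS = 9  # limit stated in this chapter

    def __init__(self, primary):
        self.primary = primary
        self.standbys = []

    def add_standby(self, standby):
        if len(self.standbys) >= self.MAX_STANDBYS:
            raise ValueError("at most nine standby databases are supported")
        # A standby is created from a backup copy of the primary.
        standby.data = dict(self.primary.data)
        standby.applied_scn = self.primary.applied_scn
        self.standbys.append(standby)

    def commit(self, key, value):
        scn = self.primary.applied_scn + 1
        self.primary.apply(scn, key, value)
        # Transmit the same redo to every standby, which then applies it.
        for standby in self.standbys:
            standby.apply(scn, key, value)
        return scn


config = DataGuardConfig(Database("prod"))
config.commit("acct:1", "100")          # change made before the standby exists
standby = Database("stby1")
config.add_standby(standby)             # instantiated from a copy of the primary
config.commit("acct:1", "90")           # maintained afterward through redo
```

After the last call, the standby holds the same data as the primary because it started from a copy and then applied every subsequent change in order.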
A standby database can be either a physical standby database or a logical standby database. A physical standby database provides a physically identical copy of the primary database, with on-disk database structures that are identical to the primary database on a block-for-block basis. A physical standby database is synchronized with the primary database through Redo Apply, which recovers the redo data received from the primary database and applies it to the physical standby database. A physical standby database can be used for business purposes other than disaster recovery on a limited basis.
Physical standby databases provide these advantages:
Protection from user errors and logical corruption
Protection from disasters and site failures if located remotely
Fast site and database failover (less than 1 minute to five minutes)
Fast-start failover provides the ability to automatically, quickly, and reliably fail over to a designated, synchronized standby database in the event of primary database failure
Fast site and database planned switchovers for maintenance
Using Flashback Database, a Redo Apply standby database can diverge for reporting or testing purposes and resynchronize with its primary database once complete
Backups can be taken from the physical standby database instead of the production database, relieving the load on the production database
Read-only capability, resulting in better use of system resources
Greater support for fast application notification and application callouts resulting in better full-stack application failover
A logical standby database can be used for other business purposes in addition to disaster recovery. Users can access a logical standby database for queries and reporting purposes. Using a logical standby database, it is possible to upgrade Oracle database software and patch sets with minimal downtime. Therefore, a logical standby database can be concurrently used for data protection, reporting, and database upgrade purposes.
In addition to disaster recovery and data protection, logical standby databases provide the following benefits:
Enable the standby database to be open for normal operations with both read-only and read/write accessibility
Enable additional objects to be built and maintained
Enable rolling database upgrades of the production database
A recommended configuration for many cases includes both physical and logical standby databases. They can reside on the same database computer or cluster, but they should be remote from the production database. The physical standby database can be reserved for failovers in case of disaster, and the logical standby database can continue to be used for reporting. The physical standby database provides a faster apply technology because redo logs do not have to be converted to SQL.
Figure 4-3 shows the production database at the primary site and the standby databases at the secondary site.
Figure 4-3 Oracle Database 10g with Data Guard Architecture on Primary and Secondary Sites
See Also:
Oracle Data Guard Concepts and Administration for more information about datatypes supported by logical standby databases
The papers about standby databases at
http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
RAC and Data Guard provide the basis of Oracle Database 10g - MAA. Maximum Availability Architecture (MAA) provides the most comprehensive architecture for reducing downtime for scheduled outages and preventing, detecting, and recovering from unscheduled outages. The recommended MAA has two identical sites. The primary site contains the RAC primary database, and the secondary site contains a RAC standby database.
Identical site configuration is recommended to ensure that performance is not sacrificed after a failover or switchover. Symmetric sites also enable processes and procedures to be kept the same between sites, making operational tasks easier to maintain and execute.
MAA encompasses RAC, Data Guard, and a set of recommended best practices for configuring and managing the architecture as well as recovering from various outages. For more information, visit the MAA Web site at:
http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
Figure 4-4 provides an overview of Oracle Database 10g with RAC and Data Guard - MAA.
Figure 4-4 Oracle Database 10g with RAC and Data Guard - MAA
Oracle Streams is meant for information sharing and distribution. It can also provide an efficient and highly available architecture.
Oracle Database 10g with Streams provides granularity and control over what is replicated and how it is replicated. It supports bidirectional replication, data transformations, subsetting, custom apply functions, and heterogeneous platforms. It also gives users complete control over the routing of change records from the primary database to a replica database. The capture of data changes can be performed at the primary database or downstream at a replica database. This enables users to build hub and spoke network configurations that can support hundreds of replica databases.
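The capture, propagate, and apply flow described above, including subsetting and transformation, can be sketched as a small pipeline. This Python example is a conceptual illustration under stated assumptions: it does not use any Oracle Streams API, and the names `Replica`, `capture`, `propagate`, and `uppercase_table`, as well as the shape of a change record, are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional

Change = Dict[str, object]  # hypothetical change record, for example {"table": ..., "op": ...}


class Replica:
    """A destination that applies changes, optionally transforming them first."""

    def __init__(self, name: str,
                 transform: Optional[Callable[[Change], Change]] = None):
        self.name = name
        self.transform = transform
        self.applied: List[Change] = []

    def apply(self, change: Change) -> None:
        if self.transform is not None:
            change = self.transform(change)
        self.applied.append(change)


def capture(changes: Iterable[Change],
            rule: Callable[[Change], bool]) -> List[Change]:
    """Keep only the changes that satisfy the capture rule (subsetting)."""
    return [change for change in changes if rule(change)]


def propagate(captured: Iterable[Change], replicas: Iterable[Replica]) -> None:
    """Route every captured change to every replica (hub and spoke)."""
    targets = list(replicas)
    for change in captured:
        for replica in targets:
            replica.apply(dict(change))  # copy so replicas stay independent


def uppercase_table(change: Change) -> Change:
    """Example transformation: rename the target table to upper case."""
    return {**change, "table": str(change["table"]).upper()}


changes = [
    {"table": "orders", "op": "INSERT", "id": 1},
    {"table": "audit_log", "op": "INSERT", "id": 2},
]
spoke_a = Replica("spoke_a")
spoke_b = Replica("spoke_b", transform=uppercase_table)
propagate(capture(changes, lambda c: c["table"] == "orders"), [spoke_a, spoke_b])
```

In this sketch only the `orders` change is captured, both replicas receive it, and the second replica applies it in transformed form, which mirrors the control over what is replicated and how.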
Oracle Database 10g with Streams should be evaluated if one or more of the following conditions are true:
A full active/active site configuration is required, including bidirectional changes
Site configurations are on heterogeneous platforms
Different character sets are required between the primary database and its replicas
Fine control of information and data sharing is required
More investment and expertise to build and maintain an integrated high availability solution is available
For disaster recovery, Data Guard is Oracle's recommended solution.
Figure 4-5 shows Oracle Database 10g with Streams with local capture running at the primary database.
Figure 4-5 Oracle Database 10g with Streams
This section summarizes the advantages of the high availability architectures discussed in this chapter and provides guidelines for you to choose the correct high availability architecture for your business.
Oracle Database 10g with RAC and Oracle Database 10g with Data Guard are the most common Oracle high availability architectures, and each provides very significant high availability advantages. MAA provides the most redundant and robust high availability architecture. It prevents, detects, and recovers from different outages to meet stringent RTO and RPO requirements, as well as preventing or minimizing downtime for maintenance. Oracle Database 10g with Streams is an alternative high availability solution, but it requires more customization and administrative effort, and may not be as transparent to the application.
The baseline high availability architecture is Oracle Database 10g. Consider using:
Oracle Database 10g with RAC if:
Maximum Recovery Time Objective (RTO) for instance or node failure is in seconds or minutes
Database scalability beyond one instance or node is required
Oracle Database 10g with Data Guard if:
Maximum RTO for instance or node failure is in seconds to minutes or more
Maximum RTO for data corruption or site failure is less than 1 minute to five minutes
MAA if:
Projected planned maintenance is in hours or less for each year
Both RAC and Data Guard are required
Oracle Database 10g with Streams if an active/active replicated system or a heterogeneous solution is required
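The guidelines above can be read as a simple decision procedure. The following Python sketch is one possible encoding, not an Oracle tool: the `Requirements` fields and the `recommend` function are hypothetical, each guideline is treated as a sufficient trigger on its own, and the order in which the checks run (Streams first, then MAA) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # RTO for instance or node failure must be in seconds or minutes.
    fast_node_failover: bool = False
    # Database scalability beyond one instance or node is required.
    scale_beyond_one_node: bool = False
    # RTO for data corruption or site failure must be five minutes or less.
    fast_site_or_corruption_recovery: bool = False
    # Projected planned maintenance is in hours or less for each year.
    minimal_planned_maintenance: bool = False
    # An active/active replicated system or heterogeneous solution is required.
    active_active_or_heterogeneous: bool = False


def recommend(req: Requirements) -> str:
    """Map requirements to an architecture name (illustrative precedence)."""
    if req.active_active_or_heterogeneous:
        return "Oracle Database 10g with Streams"
    wants_rac = req.fast_node_failover or req.scale_beyond_one_node
    wants_data_guard = req.fast_site_or_corruption_recovery
    if (wants_rac and wants_data_guard) or req.minimal_planned_maintenance:
        return "Oracle Database 10g with RAC and Data Guard - MAA"
    if wants_rac:
        return "Oracle Database 10g with RAC"
    if wants_data_guard:
        return "Oracle Database 10g with Data Guard"
    # With no additional requirement, the baseline architecture applies.
    return "Oracle Database 10g"
```

For example, `recommend(Requirements(scale_beyond_one_node=True))` selects the RAC architecture, while combining that requirement with fast site recovery selects MAA. A real evaluation would also weigh cost and administrative effort, which this sketch ignores.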
Table 4-2 identifies the additional capabilities provided by the architectures that build upon Oracle Database 10g.
Table 4-2 Additional Capabilities of High Level Oracle High Availability Architectures
Oracle High Availability Architecture | Key Characteristics and Additional Capabilities |
---|---|
Oracle Database 10g with RAC | Transparent to application; Fast repair for human error; Fast failover for computer failure and storage failure; Scalability beyond a single system; Reduced downtime for computer maintenance |
Oracle Database 10g with Data Guard | Transparent to application; Fast repair for human error; Fast failover for computer failure, storage failure, and data corruption; Protection from site failure; Reduced downtime for computer or site maintenance |
Oracle Database 10g with RAC and Data Guard - MAA | Transparent to application; Fast repair for human error; Fast failover for computer failure, storage failure, and data corruption; Protection from site failure; Scalability beyond a single system; Reduced downtime for computer or site maintenance |
Oracle Database 10g with Streams | Fast repair for human error; Replica database(s) available for read/write use; Provides heterogeneous platform support; Fast failover for computer failure and storage failure; Protection from site failure; Reduced downtime for computer or site maintenance |
Footnote 1 Requires planning and overhead to make solution robust
Table 4-3 shows the attainable recovery times for all types of unplanned downtime for each Oracle high availability architecture.
Table 4-3 Attainable Recovery Times for Unplanned Outages
Outage Type | Oracle Database 10g | RAC | Data Guard | MAA | Streams |
---|---|---|---|---|---|
Computer failure | Minutes to hours (Footnote 1) | No downtime (Footnote 2) | Seconds to five minutes | No downtime | No downtime |
Storage failure | No downtime (Footnote 3) | No downtime (Footnote 3) | No downtime (Footnote 3) | No downtime (Footnote 3) | No downtime (Footnote 3) |
Human error | < 30 minutes (Footnote 4) | < 30 minutes (Footnote 4) | < 30 minutes (Footnote 4) | < 30 minutes (Footnote 4) | < 30 minutes (Footnote 4) |
Data corruption | HARD prevents data corruption (Footnote 5); Potentially hours (Footnote 6) | HARD prevents data corruption (Footnote 5); Potentially hours (Footnote 6) | HARD prevents data corruption (Footnote 5); Seconds to five minutes | HARD prevents data corruption (Footnote 5); Seconds to five minutes | HARD prevents data corruption (Footnote 5); Seconds to five minutes |
Site failure | Hours to days | Hours to days | Seconds to five minutes (Footnote 7) | Seconds to five minutes (Footnote 8) | No downtime (Footnote 7) |
Footnote 1 Recovery time consists largely of the time it takes to restore the failed system.
Footnote 2 Database is still available, but portion of application connected to failed system is temporarily affected.
Footnote 3 Storage failures are prevented by using ASM with mirroring and its automatic rebalance capability.
Footnote 4 Recovery time for human errors depends primarily on detection time. If it takes seconds to detect a malicious DML or DDL transaction, it typically requires only seconds to flash back the appropriate transactions. A longer detection time usually leads to a longer recovery time to repair the appropriate transactions. An exception is undropping a table, which is literally instantaneous regardless of detection time.
Footnote 5 Not all types of data corruption are prevented. For the most recent information about the HARD initiative, refer to http://www.oracle.com/technology/deploy/availability/htdocs/HARD.html.
Footnote 6 Recovery time depends on the age of the backup used for recovery and the number of log changes scanned to make the corrupt data consistent with the database.
Footnote 7 Recovery time indicated applies to database and existing connection failover. Network connection changes and other site-specific failover activities may lengthen overall recovery time.
Footnote 8 The portion of any application connected to the failed system is temporarily affected.
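When comparing architectures against a recovery time requirement, the entries of Table 4-3 can be treated as a lookup. The following Python sketch transcribes the table values with footnote markers and the HARD note omitted. The ranking in `RANK` is an assumption introduced here for comparison purposes, since the table itself does not order its entries, and the function name `architectures_within` is hypothetical.

```python
ARCHITECTURES = ["Oracle Database 10g", "RAC", "Data Guard", "MAA", "Streams"]

# Values transcribed from Table 4-3, one entry per architecture in the order above.
RECOVERY_TIMES = {
    "Computer failure": ["Minutes to hours", "No downtime",
                         "Seconds to five minutes", "No downtime", "No downtime"],
    "Storage failure": ["No downtime"] * 5,
    "Human error": ["< 30 minutes"] * 5,
    "Data corruption": ["Potentially hours", "Potentially hours",
                        "Seconds to five minutes", "Seconds to five minutes",
                        "Seconds to five minutes"],
    "Site failure": ["Hours to days", "Hours to days",
                     "Seconds to five minutes", "Seconds to five minutes",
                     "No downtime"],
}

# Assumed ordering from best (0) to worst (4); not part of the original table.
RANK = {
    "No downtime": 0,
    "Seconds to five minutes": 1,
    "< 30 minutes": 2,
    "Minutes to hours": 3,
    "Potentially hours": 3,
    "Hours to days": 4,
}


def architectures_within(outage: str, worst_acceptable: str) -> list:
    """List architectures whose recovery time is no worse than the given bound."""
    limit = RANK[worst_acceptable]
    times = RECOVERY_TIMES[outage]
    return [arch for arch, time in zip(ARCHITECTURES, times) if RANK[time] <= limit]
```

For instance, asking which architectures recover from a site failure within "Seconds to five minutes" returns Data Guard, MAA, and Streams. The footnotes to Table 4-3 still apply to any result and should be read alongside it.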
Table 4-4 shows the attainable recovery times for all types of planned downtime for each Oracle high availability architecture.
Table 4-4 Attainable Recovery Times for Planned Outages
Outage Type | Oracle Database 10g | RAC | Data Guard | MAA | Streams |
---|---|---|---|---|---|
System changes - Dynamic Resource Provisioning | No downtime | No downtime | No downtime | No downtime | No downtime |
System changes - Rolling Upgrades | | | | | |
System level upgrade | Minutes to hours | No downtime | Seconds to five minutes | No downtime | No downtime |
Cluster or site wide upgrade | Minutes to hours | Minutes to hours | Seconds to five minutes | Seconds to five minutes | No downtime (Footnote 1) |
Storage migration | No downtime (Footnote 2) | No downtime (Footnote 2) | No downtime (Footnote 2) | No downtime (Footnote 2) | No downtime (Footnote 2) |
Database one-off patch | Minutes to hours | No downtime (Footnote 3) | Seconds to five minutes | No downtime (Footnote 3) | No downtime |
Database patchset and version upgrade | Minutes to hours | Minutes to hours | Seconds to five minutes | Seconds to five minutes | No downtime (Footnote 1) |
Platform migration | Minutes to hours | Minutes to hours | Minutes to hours | Minutes to hours | No downtime (Footnote 1) |
Data changes - Online Reorganization and Redefinition | No downtime | No downtime | No downtime (Footnote 4) | No downtime (Footnote 4) | No downtime (Footnote 4) |
Footnote 1 The portion of any application connected to the failed system is temporarily affected.
Footnote 2 ASM automatically rebalances stored data when disks are added or removed while the database remains online. For storage migration, you will require both storage arrays to be leveraged by ASM temporarily.
Footnote 3 For qualified one-off patches only
Footnote 4 Tables can be reorganized online using the DBMS_REDEFINITION package. However, the online changes are not supported by SQL Apply or data capture, and therefore the effects of this subprogram are not visible on the logical standby or replica database. For more information, see Oracle Data Guard Concepts and Administration or Oracle Streams Replication Administrator's Guide.
There are other Oracle and non-Oracle high availability and enterprise computing architectures. This section focuses on the most common variants.
Table 4-5 describes common alternative high availability architectures, their disadvantages, and the recommended Oracle high availability architectures.
Table 4-5 Comparison of High Availability Architectures
Alternative Architecture | Disadvantages | Recommended Oracle Architecture |
---|---|---|
Single instance database on hardware cluster |
|
|
Remote mirrored single instance database |
|
|
RAC database in a stretch cluster configuration |
|
|
RAC database with standby database on same site |
|
Oracle Database 10g with RAC and Data Guard - MAA |
Single instance database with standby database on same site |
|
Oracle Database 10g with RAC and Data Guard - MAA |
Table 4-6 describes common traditional enterprise computing architectures, their disadvantages, and the recommended Oracle enterprise computing architectures.
Table 4-6 Comparison of Enterprise Computing Architectures
Traditional Architecture | Disadvantages | Recommended Architecture |
---|---|---|
Monolithic database server |
|
|
Monolithic storage array |
|
|
See Also:
Oracle Resilient Low-Cost Storage Initiative Web site at
http://www.oracle.com/technology/deploy/availability/htdocs/lowcoststorage.html