Oracle® Database High Availability Overview 10g Release 2 (10.2) Part Number B14210-02 |
|
|
View PDF |
Oracle Database 10g offers an integrated suite of high availability solutions that increase availability and eliminate or minimize both planned and unplanned downtime. These solutions also help enterprises maintain 24x7 business continuity:
Oracle provides the following features for high availability:
Oracle Real Application Clusters (RAC) allows the Oracle database to run any packaged or custom application unchanged across a set of clustered servers. This capability provides the highest levels of availability and the most flexible scalability. If a clustered server fails, the Oracle database will continue running on the surviving servers. When more processing power is needed, another server can be added without interrupting user access to data.
RAC enables multiple instances that are linked by an interconnect to share access to an Oracle database. In a RAC environment, the Oracle database runs on two or more systems in a cluster while concurrently accessing a single shared database. The result is a single database system that spans multiple hardware systems yet appears as a single unified database system to the application. This enables RAC to provide high availability, scalability, and redundancy during failures within the cluster. RAC accommodates all system types, from read-only data warehouse (DSS) systems to update-intensive online transaction processing (OLTP) systems.
High availability configurations have redundant hardware and software that maintain operations by avoiding single points-of-failure. To accomplish this, the Oracle Clusterware is installed as part of the RAC installation process. Oracle Clusterware is a portable solution that is integrated and designed specifically for the Oracle database. In a RAC environment, Oracle Clusterware monitors all Oracle components (such as instances and listeners). If a failure occurs, Oracle Clusterware will automatically attempt to restart the failed component. Other non-Oracle processes can also be managed by Oracle Clusterware. During outages, Oracle Clusterware relocates the processing performed by the inoperative component to a backup component. For example, if a node in the cluster fails, Oracle Clusterware will cause client processes running on the failed node to reconnect and resume running on a surviving node.
The Oracle Clusterware requires two files, the Oracle Cluster Registry (OCR) and the voting disk. To avoid single points-of-failure, the Oracle Clusterware automatically maintains redundant copies of these files. Oracle Clusterware also enables you to replace a damaged copy of the OCR online. Oracle's recovery processes quickly re-master resources, recover partial or failed transactions, and rapidly restore the system.
RAC provides the following benefits:
Ability to tolerate and quickly recover from computer and instance failures
Fast, automatic, and intelligent connection and service relocation and failover
Rolling patch upgrades for qualified one-off patches
Rolling release upgrades of Oracle Clusterware
Load balancing advisory
Runtime connection load balancing
Flexibility to scale up processing capacity using commodity hardware without downtime or changes to the application
Comprehensive manageability integrating database and cluster features
Oracle Data Guard provides a comprehensive set of services that create, maintain, manage, and monitor one or more standby databases to enable production Oracle databases to survive failures, disasters, errors, and data corruption. Data Guard maintains these standby databases as transactionally consistent copies of the production database. Then, if the production database becomes unavailable due to a planned or an unplanned outage, Data Guard can switch any standby database to the production role, thus greatly reducing the downtime caused by the outage. The failover of data processing from the production to the standby database can be completely automatic and done without any human intervention, thereby reducing the management costs associated with the Data Guard configuration. Data Guard can be used with traditional backup, restore, and clustering solutions to provide a high level of data protection and data availability.
A Data Guard configuration consists of one production database and one or more physical or logical standby databases. The databases in a Data Guard configuration are connected by Oracle Net and may be dispersed geographically. There are no restrictions on where the databases are located if they can communicate with each other. For example, you can have a standby database in the same building as your primary database to help manage planned downtime and two or more standby databases in other locations for use in disaster recovery.
Oracle Data Guard provides the following benefits:
Maintains real-time, transactionally consistent database copies to provide protection against unplanned downtime and disaster
Complete data protection against computer failures, human errors, data corruption, and site failures
Reduces planned downtime for hardware and system upgrades, and Oracle patch set and database upgrades
Automatic detection and resolution of missing data following temporary loss of connectivity between the primary and standby database
Multiple levels of data protection and performance to balance data availability against system performance requirements
Allows efficient use of system resources by diverting reporting and backup operations from the production database to standby databases
Ability to diverge a Redo Apply standby database for reporting or testing purposes and resynchronize it with primary database once complete
Managed and automatic role transition and application notification to minimize planned and unplanned downtime
Automatic resynchronization of a failed primary database following a failover
All systems managed as a single configuration for simplified administration
Oracle Streams enables the propagation and management of data, transactions, and events in a data stream, either within a database or from one database to another. Streams provides a set of elements that allow users to control what information is put into a data stream, how the stream is routed from node to node, what happens to events in the stream as they flow into each node, and how the stream terminates.
Streams can be used to replicate a database or a subset of a database. This enables users and applications to simultaneously update data at multiple locations. If a failure occurs at one of the locations, then users and applications at the surviving sites can continue to access and update data.
Streams can be used to build distributed applications that replicate changes at the application level using message queuing. If an application fails, then the surviving applications can continue to operate and provide access to data through locally maintained copies.
Streams provides granularity and control over what is replicated and how it is replicated. It supports bidirectional replication, data transformations, subsetting, custom apply functions, and heterogeneous platforms. It also gives users complete control over the routing of change records from the primary database to a replica database.
As with Streams, Oracle Data Guard in SQL Apply mode can capture database changes, propagate them to destinations, and apply the changes at these destinations. Although Streams and Data Guard in SQL Apply mode share much of the same underlying technologies for high availability, Data Guard in SQL Apply mode is easier to implement and manage than a Streams-based high availability solution.
Oracle Streams provides the following benefits:
Data protection by maintaining a full or partial remote copy of the database
Achieves little or no downtime during database upgrade or maintenance operations such as migrating a database to a different platform or character set, modifying database objects to support upgrades to user-created applications, and applying an Oracle software patch
Data replication by capturing DML and DDL changes made to database objects and replicating these changes to one or more other databases
Event management and notification by enqueuing messages or capturing events, propagating the messages and events through queues, and dequeuing and applying or acting upon the message or event
Supports heterogeneous platforms across databases within the configuration
Allows character sets to differ between replicas
Permits fine-grained control of data sharing
Note:
The increased flexibility and capability of Oracle Streams over Oracle Data Guard with SQL Apply requires more investment and expertise to build and maintain an integrated high availability solution.Flashback technology provides a set of features to view and rewind data back and forth in time. The flashback features offer the capability to query past versions of schema objects, query historical data, perform change analysis, and perform self-service repair to recover from logical corruption while the database is online.
Flashback technology provides a SQL interface to quickly analyze and repair human errors. Flashback provides fine-grained analysis and repair for localized damage such as deleting the wrong customer order. Flashback technology also enables correction of more widespread damage, yet does it quickly to avoid long downtime. Flashback technology is unique to the Oracle Database and supports recovery at all levels including row, transaction, table, tablespace, and database.
Flashback technology includes the following features:
Oracle Flashback Query provides the ability to view the data as it existed in the past by utilizing the Automatic Undo Management system to obtain metadata and historical data for transactions. Undo data is persistent and will survive a database malfunction or shutdown. The unique features of Flashback Query not only provide the ability to query previous versions of tables, they also provide a powerful mechanism to recover from erroneous operations.
Uses of Flashback Query include:
Recovering lost data or undoing incorrect, committed changes. For example, rows that have been deleted or updated can be immediately repaired even after they have been committed.
Comparing current data with the corresponding data at some time in the past. For example, using a daily report that shows the changes in data from yesterday, it is possible to compare individual rows of table data, or find intersections or unions of sets of rows.
Checking the state of transactional data at a particular time, such as verifying the account balance on a certain day.
Simplifying application design by removing the need to store certain types of temporal data. Using a Flashback Query, it is possible to retrieve past data directly from the database.
Applying packaged applications, such as report generation tools, to past data.
Providing self-service error correction for an application, enabling users to undo and correct their errors.
Oracle Flashback Versions Query is an extension to SQL that can be used to retrieve the versions of rows in a given table that existed in a specific time interval. Oracle Flashback Versions Query returns a row for each version of the row that existed in the specified time interval. For any given table, a new row version is created each time the COMMIT
statement is executed.
Flashback Versions Query is a powerful tool for the DBA to run analysis to determine what happened. Additionally, application developers can use Flashback Versions Query to build customized applications for auditing purposes.
Oracle Flashback Transaction Query provides a mechanism to view all changes made to the database at the transaction level. When used in conjunction with Flashback Versions Query, it offers a fast and efficient means to recover from a user or application error. Flashback Transaction Query increases the ability to perform online diagnosis of problems in the database by returning the user that changed the row, and performs analysis and audits on transactions.
Oracle Flashback Table enables users to recover a table to a previous point in time. It provides a fast, online solution for recovering a table or set of tables that has been erroneously modified by a user or application. In most cases, Flashback Table alleviates the need for administrators to perform more complicated point-in-time recovery operations. Even after a flashback, the data in the original table is not lost; it can later be reverted back to the original state.
Dropping objects by accident has always been a problem for users and DBAs alike. Historically, there is no easy way to recover dropped tables, indexes, constraints, or triggers. Oracle Flashback Drop provides a safety net when dropping objects. When a user drops a table, Oracle automatically places it into the Recycle Bin. The Recycle Bin is a virtual container where all dropped objects reside. Users can continue to query data in a dropped table.
Oracle Flashback Database provides a more efficient alternative to database point-in-time recovery. With Oracle Flashback Database, current datafiles can be reverted to their contents at a past time. The result is much like the result of a point-in-time recovery using datafile backups and redo logs, but it is not necessary to restore datafiles from backup, or to re-apply as many individual changes in the redo logs as required in conventional media recovery.
Enabling Oracle Flashback Database provides the following benefits:
Eliminate the time to restore a backup when fixing human error that has a database-wide impact.
Allows standby databases to use real-time apply to synchronize with the primary database because humans errors can be quickly undone.
Allows quick standby database reinstantiation after a database failover.
When an Oracle Flashback recovery operation is performed on the database, the DBA must determine the point in time—identified by the System Change Number (SCN) or timestamp—to which the data can later be flashed back. Oracle Flashback restore points are user-defined labels that can be substituted for the SCN or transaction time used in Flashback Database, Flashback Table, and Recovery Manager (RMAN) operations. Furthermore, a database can be flashed back through a previous database recovery and open resetlogs by using guaranteed restore points. Guaranteed restore points allow major database changes—such as database batch jobs, upgrade, or patch—to be quickly undone by ensuring that the undo required to rewind the database is retained.
Using a combination of Oracle Data Guard, Flashback restore points and RMAN incremental backups, a physical standby database can be opened temporarily in read/write mode for development, reporting, or testing purposes. The physical standby database can then be resynchronized as an updated physical standby database by flashing back to the restore point and applying a recent incremental backup from the primary database.
Using Oracle Flashback restore points provides the following benefits:
Provides the ability to quickly cancel planned database changes that produced undesirable results, such as a failed batch job or application upgrade
Can be used in conjunction with Oracle Data Guard and RMAN incremental backups to quickly resynchronize a read/write clone database with the primary database
Automatic Storage Management (ASM) provides a vertically integrated file system and volume manager directly in the Oracle kernel, resulting in:
Significantly less work to provision database storage
Higher level of availability
Elimination of the expense, installation, and maintenance of specialized storage products
Unique capabilities for database applications
For optimal performance, ASM spreads files across all available storage. To protect against data loss, ASM extends the concept of SAME (stripe and mirror everything) and adds more flexibility in that it can mirror at the database file level rather than the entire disk level.
More importantly, ASM eliminates complexities associated with managing data and disks; it simplifies the processes of setting up mirroring, adding disks, and removing disks. Instead of managing hundreds and possibly thousands of files (as in a large data warehouse), DBAs using ASM create and administer a larger-grained object—the disk group—which identifies the set of disks that will be managed as a logical unit. Automation of file naming and placement of the underlying database files save DBAs time and ensures adherence to standard best practices.
The ASM native mirroring mechanism (2-way or 3-way) is an option that is used to protect against storage failures. With ASM mirroring, an additional level of data protection can be provided with the use of failures groups. A failure group is a set of disks sharing a common resource (disk controller or an entire disk array) whose failure can be tolerated. Once defined, an ASM failure group will intelligently place redundant copies of the data in separate failure groups to ensure that the data will be available and transparently protected against the failure of any component in the storage subsystem.
ASM provides the following benefits:
Provides the ability to mirror across drives and storage arrays
Automatically re-mirrors from a failed drive to remaining drives
Automatically rebalances stored data when disks are added or removed while the database remains online
Allows for operational simplicity in managing a database storage grid
See Also:
Oracle Database Administrator's GuideRecovery Manager (RMAN) is an Oracle utility to manage the backup and, more importantly, the recovery of the database. It eliminates operational complexity while providing superior performance and availability of the database.
Recovery Manager determines the most efficient method of executing the requested backup, restoration, or recovery operation and then submits these operations to the Oracle database server for processing. Recovery Manager and the server automatically identify modifications to the structure of the database and dynamically adjust the required operation to adapt to the changes.
RMAN provides the following benefits:
Automated channel failover on backup and restore operations
Automatic failover to a previous backup when the restore operation discovers a missing or corrupt backup
Automated creation of new database and temporary files during recovery
Automated recovery through a previous point-in-time recovery—recovery through resetlogs
Block media recovery enables the datafile to remain online while fixing the block corruption
Fast incremental backups using block change tracking
Merge incremental backups into image copies in the background providing up-to-date recoverability
Optimized backup and restore of required files only
Retention policy ensures that relevant backups are retained
Resumable backup and restore of previously failed operations
Automatic backup of the control file and the server parameter file ensuring that backup metadata is available in times of database structural changes as well as media failure and disasters
Online backup does not require the database to be placed into hot backup mode
The flash recovery area is a unified storage location for all recovery-related files and activities in an Oracle database. After this feature is enabled, all RMAN backups, archive logs, control file autobackups, and datafile copies are automatically written to a specified file system or automatic storage management disk group, and the management of this disk space is handled by RMAN and the database server.
Making a backup to disk is faster because using the flash recovery area eliminates the bottleneck of writing to tape. More importantly, if database media recovery is required, then datafile backups are readily available. Restoration and recovery time is reduced because you do not need to find a tape and a free tape device to restore the needed datafiles and archive logs.
The flash recovery area provides:
Unified storage location of related recovery files
Management of the disk space allocated for recovery files to simplify database administration tasks
Fast, reliable disk-based backup and restoration
The best protection against human errors is to prevent their occurrence. The best way to prevent human errors is to restrict user access to data and services to only those they truly need to perform their business functions. Oracle provides a wide range of security tools to control user access to application data by authenticating users and then enabling administrators to grant users only those privileges required to perform their duties.
In addition, the security model of the Oracle database provides the ability to restrict data access at a row level using the Virtual Private Database feature, thereby further isolating users from data that they do not need to access.
Oracle security features include:
Authentication control to validate the identities of entities using the networks, databases, and applications
Authorization control to provide limits to user access and actions linked by user identities and roles
Access control to objects, providing protection regardless of the entity seeking to access or alter them
Auditing control to monitor and gather data about specific database activities, investigate suspicious activity, deter users (or others) from inappropriate activities, and detect problems with authorization or access control implementation
Security policy management using profiles
Encryption of data residing within the database and backups, or transferred to and from databases
See Also:
Oracle Database Security GuideOracle provides fast and predictable recovery from system faults and database failures. The Fast-Start Fault Recovery technology included in the Oracle database automatically bounds database recovery time upon startup by using its self-tuned checkpoint processing. This makes recovery time fast and predictable, and improves the ability to meet service level objectives. Oracle's Fast-Start Fault Recovery can reduce recovery time on a heavily-laden database from tens of minutes to a few seconds.
Fast-Start Fault Recovery features include:
Predictable, bounded recovery from computer failures
Database checkpointing is self-tuning to maintain desired recovery time objective
Oracle log files contain useful information about the activities and history of the Oracle database. Log files contain all data necessary to perform a database recovery, and also record all changes made to the data and metadata within the database.
LogMiner is a fully relational tool that allows redo log files to be read, analyzed, and interpreted using SQL. Analysis of the log files with LogMiner can be used to:
Track or audit changes to data
Provide supplemental information for tuning and capacity planning
Retrieve critical information for debugging complex applications
Recover deleted data
LogMiner features include:
Pinpoint when a logical corruption to the database—such as errors made at the application level—may have occurred
Determine the necessary actions to perform fine-grained recovery at the transaction level
Performance tuning and capacity planning through trend analysis
Perform post-auditing
See Also:
Oracle Database UtilitiesOracle9i introduced the Hardware Assisted Resilient Data (HARD) Initiative, a program designed to prevent data corruption before it happens. Data corruption is very rare, but when it happens, it can have a catastrophic effect on a database, and therefore a business.
Under the HARD Initiative, Oracle works with selected system and storage vendors to build operating system and storage components that can detect corruption early and prevent corrupted data from being written to disk. The key approach is block checking where the storage subsystem validates the Oracle block contents.
To use HARD validation, all datafiles and log files are placed on HARD-compliant storage. The user must also enable the HARD validation feature on the storage, using the vendor-provided interface. When Oracle writes data to the storage, the storage system validates the data. If the data appears to be corrupted, then the write is either rejected with an error, or it is accepted with an error logged by the storage in the internal logs.
Storage vendors may choose to implement some or all of the checks in their implementation. Also, each vendor's implementation is unique and their control interfaces may have different features. Please check with the HARD initiative page for the latest vendor and implementation information.
http://www.oracle.com/technology/deploy/availability/htdocs/HARD.html
Oracle provides high availability solutions for all types of unplanned downtime:
Table 2-1 describes the various Oracle high availability solutions for unplanned downtime along with the recovery time that can be attained with each solution.
Table 2-1 Oracle High Availability Solutions for Unplanned Downtime
Outage Type | Oracle Solution | Benefits | Recovery Time |
---|---|---|---|
Computer failures |
Fast-Start Fault Recovery |
Tunable and predictable cache recovery |
Minutes to hoursFoot 1 |
RAC |
Automatic recovery of failed nodes and instances, fast connection failover, and service failover |
No downtimeFoot 2 |
|
Data Guard |
Fast Start Failover and fast connection failover |
Seconds to 5 minutes |
|
Oracle Streams |
Online replica database |
No downtime2 |
|
Storage failures |
ASM |
Mirroring and online automatic rebalance |
No downtime |
RMAN with flash recovery area |
Fully managed database recovery and managed disk-based backups |
Minutes to hours |
|
Data Guard |
Fast Start Failover and fast connection failover |
Seconds to 5 minutes |
|
Oracle Streams |
Online replica database |
No downtime2 |
|
Human errors |
Oracle security features |
Restrict user access as prevention |
No downtime |
Oracle Flashback technology |
Fine-grained and database-wide rewind capability |
< 30 minutesFoot 3 |
|
LogMiner |
Log analysis |
Minutes to hours |
|
Data corruption |
HARD |
Corruption prevention within a storage array |
No downtime |
RMAN with flash recovery area |
Online block media recovery and managed disk-based backups |
Minutes to hours |
|
Data Guard |
Automatic validation of redo blocks before they are applied, execute fast failover to an uncorrupted standby database |
Seconds to 5 minutes |
|
Oracle Streams |
Online replica database |
No downtime2 |
|
Site failures |
RMAN |
Fully managed database recovery and integration with tape management vendors |
Hours to days |
Data Guard |
Fast Start Failover and fast connection failover |
Seconds to 5 minutesFoot 4 |
|
Oracle Streams |
Online replica database |
Seconds to 5 minutes4 |
Footnote 1 Recovery time consists largely of the time it takes to restore the failed system.
Footnote 2 Database is still available, but portion of application connected to failed system is affected.
Footnote 3 Recovery time for human errors depend primarily on detection time. If it takes seconds to detect a malicious DML or DLL transaction, it typically only requires seconds to flashback the appropriate transactions. Longer detection time usually leads to longer recovery time required to repair the appropriate transactions. An exception is undropping a table, which is literally instantaneous regardless of detection time.
Footnote 4 Recovery time indicated applies to database and existing connection failover. Network connection changes and other site-specific failover activities may lengthen overall recovery time.
A computer failure outage occurs when the system running the database becomes unavailable because it has shut down or is no longer accessible. Downtime caused by computer failures can be reduced by employing rapid database recovery upon startup, or avoided by using cluster technology or data mirroring techniques.
Oracle offers the following high availability solutions to address computer failures:
For information on the benefits and attainable recovery time for each solution, see Table 2-1.
A storage failure outage occurs when the storage holding some or all of the database contents becomes unavailable because it has shut down or is no longer accessible. Downtime caused by storage failures can be reduced by keeping disk-based backups, copies, or replicas of the database, or avoided by using storage mirroring.
Oracle offers the following high availability solutions to address storage failures:
For information on the benefits and attainable recovery time for each solution, see Table 2-1.
A human error outage occurs when unintentional or malicious actions are committed that cause data within the database to become logically corrupt or unusable. The service level impact of a human error outage can vary significantly depending on the amount and critical nature of the affected data. The best protection against a human error outage is to prevent humans errors from occurring where possible, and when prevention is not possible, to detect and undo the errors quickly.
Oracle offers the following high availability solutions to address human errors:
For information on the benefits and attainable recovery time for each solution, see Table 2-1.
A data corruption outage occurs when a hardware or software component causes corrupt data to be read or written to the database. The service level impact of a data corruption outage may vary, from a small portion of the database (down to a single database block) to a large portion of the database (making it essentially unusable). If not prevented—or quickly detected and repaired—data corruption can disrupt the entire database or cause key business data to be lost.
Oracle offers the following high availability solutions to prevent—or detect and repair—data corruption:
Block checking db_block_checking
and block checksumming with db_block_checksum
DBVERIFY
, ANALYZE
, and DBMS_REPAIR
detection tools
For information on the benefits and attainable recovery time for each solution, see Table 2-1.
A site failure outage occurs when an event causes all or a significant portion of an application to stop processing or slow to an unusable level. A site failure may affect all processing at a data center, or a subset of applications supported by a data center. Downtime caused by a site failure can be minimized by keeping copies or replicas of the database updated in real time.
Oracle offers the following high availability solutions to address site failures:
For information on the benefits and attainable recovery time for each solution, see Table 2-1.
Planned downtime can be just as disruptive to operations as unplanned downtime. This holds especially true for global enterprises that need to support users in multiple time zones, or for those that need to provide 24x7 Internet access to their customers.
Planned downtime usually becomes necessary when performing routine operations, periodic maintenance, and new deployments. Routine operations include frequent maintenance tasks such as backup, performance tuning, user management, security enhancements, and batch operations. Periodic maintenance—such as patching or reconfiguring the system—may be necessary to update the database, application, operating system, middleware, or network on occasion. New deployments include major upgrades or new rollouts of the hardware, database, application, operating system, middleware, or network.
When the volume of data stored in a database becomes very large, such maintenance operations that require planned downtime may become quite time consuming. It thus becomes very important that these maintenance operations be performed without affecting the users' access to the data.
Oracle provides the following high availability solutions to address planned downtime:
For system changes:
For data changes:
Oracle continues to broaden support for dynamic reconfiguration of the database, enabling it to adapt to changes in hardware demands without any service interruptions. The Oracle database dynamically accommodates various changes to hardware and database configurations:
Add and remove processors from an SMP server
Add and remove nodes and instances in an Oracle Real Application Cluster (RAC) environment
Dynamically grow and shrink its shared memory allocation and automatically tune memory online using Automatic Shared Memory Management
Add and remove database disks online without disturbing database activities using Automatic Storage Management (ASM)
Add and remove storage arrays online without disturbing database activities using ASM
Automatically rebalance I/O load across the database storage using ASM
Move datafiles online when adding or dropping disks using ASM, which automatically rebalances database storage whenever the storage configuration is changed
These capabilities provide no-cost system changes and capacity on-demand provisioning, both of which are fundamental requirements of enterprise Grid computing.
Memory and storage management have improved significantly with the advent of Automatic Shared Memory Management and Automatic Storage Management (ASM). By setting the SGA_TARGET
parameter to a nonzero value, the shared pool, large pool, Java pool, Streams pool, and buffer cache can automatically and dynamically resize as needed. ASM automates and simplifies the layout of datafiles, control files, and log files. Database files are automatically distributed across all available disks, and database storage is rebalanced whenever the storage configuration changes, including adding and removing disks or storage arrays. ASM provides redundancy through the mirroring of database files, and provides optimal performance by automatically distributing database files across all available disks. Rebalancing of the database storage automatically occurs whenever the storage configuration changes.
Another type of dynamic reconfiguration occurs when Oracle polls the operating system to detect changes in the number of available CPUs and reallocates internal resources. In addition, almost all initialization parameters can be changed without shutting down the instance. Simply use the ALTER SESSION
statement to change the value of a parameter during a session, or the ALTER SYSTEM
statement to change the value of a parameter in all sessions of an instance for the duration of the instance.
See Also:
Oracle Database Concepts and Oracle Database Administrator's Guide for information on Automatic Shared Memory Management and Automatic Storage ManagementThe Oracle database continues to reduce downtime required for system, software, and application upgrades. Oracle provides the following benefits:
Zero downtime for system and hardware upgrades with RAC
Zero downtime for operating system upgrades with RAC
Zero downtime for qualified one-off database patches with RAC
Zero downtime for storage migration with ASM
Minimum downtime for system or cluster upgrades with Data Guard
Minimum downtime for patchset or database upgrades with Data Guard
Minimum downtime for database upgrade with Transportable Tablespace
Minimum downtime for platform migration using Transportable Tablespace and potentially Data Guard
Minimum downtime for database upgrade with Oracle Streams
Minimum downtime for platform migration with Oracle Streams
Table 2-2 describes the various Oracle high availability solutions for planned downtime along with the recovery time that can be attained with each solution and their known considerations. For all cases, extensive testing is highly recommended prior to performing any rolling upgrade.
Table 2-2 Oracle High Availability Solutions for Planned Downtime
Maintenance Type | Oracle Solution | Description | Recovery Time | Considerations |
---|---|---|---|---|
System and hardware upgrades |
RAC |
To avoid downtime:
|
No downtime |
Need to check for system restrictions. Need to check if the database and clusterware versions are certified with the new system and hardware changes. |
Operating system upgrade |
RAC |
To avoid application downtime:
|
No downtime |
Need to check if the database and the clusterware versions are certified for both operating system patch releases. |
Oracle one-off patches |
RAC |
"One-off" patches—or interim patches—to database software are usually applied to implement known fixes for software problems, or to apply diagnostic patches to gather information on a problem. Such patch application is often performed during a schedule maintenance outage. Oracle provides the capability to do rolling patch upgrades with RAC with little or no database downtime using the A RAC rolling upgrade enables at least some instances of the RAC installation to be available during the scheduled outage required for patch upgrades. Only the RAC instance that is currently being patched must be disabled. The other instance remains available. This means that the impact on the application downtime required for scheduled outages is further reduced. The Oracle |
No downtime |
Rolling upgrade is only available for patches that are certified for rolling upgrades. Typically, patches that can be installed in a rolling upgrade include:
RAC cannot be used for rolling upgrade of patch sets. |
CRS upgrades |
RAC |
All upgrades to Oracle Clusterware can be done in a rolling fashion. |
No downtime |
|
Storage migrationFoot 1 |
ASM |
ASM enables you to add all disks in one storage array and subsequently drop all disks from another array. ASM will automatically rebalance and migrate data to the new storage while the database remains operational. |
No downtime |
Before removing the source storage array, ensure that the rebalancing is complete. |
System and cluster upgrades, and ASM upgrades that are not rolling upgradeable |
Data Guard |
For system upgrades that are not rolling upgradable with RAC due to system restrictions or cluster firmware upgrades that require downtime, upgrade the standby first and then leverage Data Guard to switch over to a physical or logical standby database:
|
Seconds to minutes |
For fastest switchover, the standby database should be using real-time apply and synchronized prior to the switchover operation. This is the best approach if RAC rolling upgrade is not possible. |
Patchset and database upgrades |
Data Guard using SQL Apply |
Leverage Data Guard using SQL Apply to upgrade an Oracle database:
|
Seconds to minutes |
Only supported for Oracle database versions 10.1.0.3 and higher. SQL Apply has some data type restrictions. For more information, see Oracle Data Guard Concepts and Administration. This is the best approach if RAC rolling upgrade is not possible and there are no data type restrictions. |
Database upgrades and platform migration |
Transportable tablespace |
Transporting a database requires only copying datafiles and integrating the tablespace structural information. Tablespaces can even be transported between databases from different releases. With Oracle database 10g, tablespaces can be transported across platforms. To perform a database upgrade or platform migration:
If the target database resides on a separate host but on the same platform, create a physical standby database from the initial primary database co-located with the target database. After a Data Guard Switchover, transport the tablespaces from the source to the target without incurring the file transfer time as part of the downtime.Foot 2 |
Minutes to hours |
Transportable tablespace has limitations and restrictions in regard to character sets, opaque types, and system tablespace objects. Unlike previous solutions, the steps are not automated. Transportable tablespaces do provide the following benefits:
This is the best rolling upgrade approach if both of the following are true:
|
Database upgrades and platform migration |
Oracle Streams |
Like Data Guard using SQL Apply, Oracle Streams can capture database changes, propagate them to destinations, and apply the changes at these destinations. Oracle Streams is optimized for replicating data and can capture changes locally in the online redo log as it is written or remotely at a downstream database. The captured changes can then be propagated asynchronously to replica databases, or the redo can be sent to the downstream database synchronously or asynchronously and applied in real time by the downstream capture process. These optimizations can reduce latency and enable the replicas to lag the primary database by no more than a few seconds. Unlike Data Guard using SQL Apply, Oracle Streams enables updates on the replica and provides support for heterogeneous platforms with different database releases. Therefore, Oracle Streams may provide the fastest approach for database upgrades and platform migration. |
Seconds to minutes to hours |
Oracle Streams also has data type limitations and restrictions, such as for advanced queue and object types. But in some cases you can work around them by creating shadow tables on the source database. You can create a trigger on tables with unsupported data types to capture and propagate changes to tables with supported data types. Those changes will be replicated by way of Streams to the target database. You can customize apply to apply the changes to the original tables in the target database. Oracle Streams implementations will require additional investment for setup and configuration because it is designed to be a more flexible architecture. This is the best approach if you are currently using Oracle Streams. |
Platform migration to another platform with same endianness |
Transportable Database |
Transportable Database, a new feature in Oracle Database 10g Release 2 (10.2), is the recommended method for migrating an entire database to another platform that has the same endian format. |
Varies |
The downtime required for a platform migration when using Transportable Database is determined by the time needed to:
|
Footnote 1 An example is migration from traditional storage to low cost storage
Footnote 2 For more information, refer to the best practices white papers available at http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
.
See Also:
Oracle Data Guard Concepts and Administration for more information on using Data Guard with SQL Apply to upgrade an Oracle Database
Oracle Database Concepts and Oracle Database Administrator's Guide for more information on transportable tablespace
The best practices white papers on rolling upgrades at
http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm
One way to enhance availability and manageability is to allow users full access to the database during a data reorganization operation. The Online Reorganization and Redefinition feature in Oracle Database 10g offers administrators significant flexibility to modify the physical attributes of a table and transform both data and table structure while allowing users full access to the database. This capability improves data availability, query performance, response time, and disk space utilization—all of which are important in a mission-critical environment and make the application upgrade process easier, safer, and faster.
This online architecture provides the following benefits:
Tables can be reorganized and redefined online
Any physical attribute of the table can be changed online. The table can be moved to a new location, partitioned, and converted from one organization (such as heap-organized) to another (such as index-organized).
Many logical attributes can also be changed. Column names, types, and sizes can be changed. Columns can be added, deleted, or merged. One restriction is that the primary key of the table cannot be modified.
All index operation can be performed online
Indexes can be created online and analyzed simultaneously. Online repair of the physical guess component of logical rowids (used in secondary indexes and in the mapping table for index-organized tables) can also be used.
An index-organized table and secondary indexes can be reorganized online to eliminate the reorganization maintenance window. Secondary indexes support efficient use of block hints (physical guesses). Invalid physical guesses of logical rowids stored in secondary indexes on index-organized table can also be repaired online.
An index-organized table or table partition can be reorganized without rebuilding its secondary indexes, resulting in a short reorganization maintenance window.
Online move of a partitioned table
Online reorganization support for advanced queues, clustered tables, materialized views, and abstract data types (objects)
Depending on the type of online reorganization that is required, the following types of data reorganization can be performed using the DBMS_REDEFINITION
package or the SQL CREATE/ALTER TABLE
and INDEX
commands:
Modify table storage parameters
Move table to a different tablespace
Add support for parallel queries
Add or drop partitioning support
Recreate table to avoid fragmentation
Change from table to IOT or from IOT to table
Add or drop a column
Transform a column using a function
Create indexes online
Rebuild indexes online
Coalesce indexes online
Move index-organized tables online
Copy dependent objects (such as triggers, constraints, and indexes)
Convert LONG and LONG RAW columns to a LOB
Use a unique key as an alternative to a primary key or rowid
Specify columns to order data by
Change a table without recompiling stored procedures
Online segment shrink
Reorganize a single partition
Reorganize advanced queue and clustered tables
Reorganize a table containing an ADT
Retain and clone statistics
Copies check and not null constraints
Copies dependent objects for nested tables
See Also:
Oracle Database Administrator's GuideThe traditional Oracle database configuration consists of a monolithic server connected to a monolithic storage array. Trends in computing technology are enabling a new IT architecture called Grid computing. Grid computing is a new computing architecture that effectively pools large numbers of servers and storage into a flexible, on-demand computing resource for all enterprise computing needs. Technology innovations such as low-cost blade servers, small and inexpensive multi-processor servers, modular storage technologies, and open source operating systems such as Linux provide the raw materials for the Grid. By harnessing these technologies and leveraging the Grid technology available in Oracle Database 10g, enterprises can deliver extremely high quality of service to their users while vastly reducing their expenditures on IT.
Oracle Database 10g captures the cost advantages of Grid enterprise computing without sacrificing performance, scalability, security, manageability, functionality, or system availability. A Database Server Grid is a collection of commodity servers connected together to run on one or more databases. A Database Storage Grid is a collection of low-cost modular storage arrays combined together and accessed by the computers in the Database Server Grid.
Figure 2-2 illustrates the Database Server Grid and Database Storage Grid in a Grid enterprise computing environment.
This section covers the following topics:
The availability of low-cost and reliable blade servers, small multi-processor servers, and inexpensive open-source operating systems such as Linux, has made it possible to build a Database Server Grid that is highly available, scalable, flexible, and manageable.
Oracle Real Application Clusters (RAC) is the technology that enables a Database Server Grid by providing a single database that spans multiple low-cost servers yet appears to the application as a single, unified database system. RAC provides flexibility to dynamically provision resources and services within the Grid as computing needs change, and to add systems to the Grid as capacity demands increase. In addition, RAC provides protection from system failures by automatically recovering the processing of a failed node by any of the surviving systems running the database, and facilitating the reconnection of clients and redistribution of load affected by the failed system.
The availability of low-cost ATA disk-based storage arrays and low-cost storage networks has made it possible to use a Database Storage Grid with the Oracle database at very low cost. A DBA can use the Automatic Storage Management (ASM) interface to specify the disks within the Database Storage Grid that ASM should manage across all server and storage platforms. ASM partitions the disk space and evenly distributes the data storage throughout the entire storage array. Additionally, ASM automatically redistributes the data storage as storage arrays are added or removed from the Database Storage Grid.
When used with Oracle Data Guard, Oracle Flashback, and Recovery Manager (RMAN), the Database Storage Grid can achieve high availability and performance for the Oracle database. Oracle Data Guard protects against site disasters and data corruption by automatically maintaining a standby database. Oracle Flashback protects against human errors using snapshot capabilities, allowing a DBA to rewind a table or the entire database to a specific moment in time before the error occurred. RMAN provides full and incremental backup to disk or tape for archival and protection against data corruption, while the Flash Recovery Area fully manages all disk-based recovery related files and database activities.
To help customers successfully deploy a Database Storage Grid, Oracle has launched the Resilient Low-Cost Storage Initiative. This initiative is analogous to the Oracle Storage Compatibility Program (OSCP) but with a focus on low-cost storage. Its goal is to ensure that low-cost storage performance can be manageable, reliable, and optimal.
This initiative partners Oracle with storage vendors to validate that their low-cost storage arrays meet the minimum price, functionality, and performance requirements for use in a Database Storage Grid. Vendors use Oracle-supplied tools to characterize their storage array performance. After verifying that all requirements can be satisfied, the storage array is certified for this initiative and the vendor—together with Oracle—produces a Best Practice white paper to instruct Oracle customers on how to optimally configure the storage array for a Database Storage Grid environment.
See Also:
Oracle Resilient Low-Cost Storage Initiative Web site at
http://www.oracle.com/technology/deploy/availability/htdocs/lowcoststorage.html
Oracle Storage Compatibility Program (OSCP) Web site at
http://www.oracle.com/technology/deploy/availability/htdocs/oscpf.html
Oracle Enterprise Manager Grid Control is an HTML-based user interface that provides the administrator with complete monitoring across the entire Oracle technology stack—business applications, application servers, databases, and the E-Business Suite—as well as non-Oracle components within the Grid. If a component within the Grid becomes unavailable or experiences performance problems, an alert is automatically generated to the Enterprise Manager console to inform the administrator so appropriate action can be taken.
The components of the Oracle Enterprise Manager Grid Control include:
Oracle Management Service (OMS) - a J2EE Web application that renders the user interface for the Grid Control Console, works with all Management Agents to process monitoring information, and uses the Management Repository as its persistent data store
Oracle Management Agents - processes deployed on each monitored host to monitor all targets on the host, communicate that information to the Management Service, and maintain the host and its targets
Oracle Management Repository - schema in the Oracle database that contains all available information about administrators, targets, and applications managed by Enterprise Manager
Communication between the console, the OMS, and Oracle Management Agents is done through HTTP. SSL can also be enabled to allow secure communications between tiers within firewall-protected environments. The Management Agent uploads collected monitoring data to the OMS, which in turn loads the data into the Management Repository. Changes in a target state (such as an availability state change) result in an alert being generated to the Enterprise Manager Console.
Using Oracle Enterprise Manager Grid Control, an administrator can:
Monitor architecture components and be alerted when a failure occurs
View overall system status, such as the number of nodes in the database cluster and their current status
View alerts aggregated across all instances
Set thresholds for alert generation on a database cluster-wide basis
Monitor performance metric across all instances
Perform database cluster-wide operations such as backup and recovery
Viewing hardware and operating system information in the entire Grid as a whole
Interconnect monitoring of cluster databases
See Also:
Oracle Enterprise Manager Concepts and Oracle Enterprise Manager Administrator's Guide for more information Oracle Enterprise Manager
The best practices white papers on configuring Enterprise Manager for high availability at
http://www.oracle.com/technology/deploy/availability/htdocs/maa.htm