Packt+ | Advance your knowledge in tech

You're reading from Oracle Data Guard 11gR2 Administration : Beginner's Guide

Product type Book

Published in Jun 2013

Publisher Packt

ISBN-13 9781849687904

Pages 404 pages

Edition 1st Edition

Languages

Concepts

Database Administration

Table of Contents (19) Chapters

Oracle Data Guard 11gR2 Administration Beginner's Guide

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Preface

Pop Quiz Answers

Getting Started

Configuring the Oracle Data Guard Physical Standby Database

Configuring Oracle Data Guard Logical Standby Database

Oracle Data Guard Broker

Data Guard Protection Modes

Data Guard Role Transitions

Active Data Guard, Snapshot Standby, and Advanced Techniques

Integrating Data Guard with the Complete Oracle Environment

Data Guard Configuration Patching

Common Data Guard Issues

Data Guard Best Practices

Index

Chapter 10. Common Data Guard Issues

Data Guard administrators need to know methods to resolve some specific issues. These issues may originate from configuration changes, misconfiguration, or user errors. Another important point is the use of diagnostic data to identify these issues. Now we'll cover handling the most common of these Data Guard issues and the methods to access and use diagnostic data.

In this chapter, we will discuss the following topics:

Recreating the standby control file
Dealing with redo transport authentication problems
Dealing with UNNAMED datafiles
Closing a gap with RMAN incremental backups
Fixing NOLOGGING changes in a standby database
Turning on Data Guard tracing
Gathering diagnostic data

Let's start with renewing the standby control file of a standby database.

Recreating the standby control file

A standby control file essentially keeps the same information of the primary database with the control file, which is the physical structure of the database. It also contains some specific information about the Data Guard, such as whether an archive log sequence is applied or not. A standby control file is mandatory to mount a physical standby database, and we should consider keeping multiple copies of the standby control file, preferably on different disks, which is known as multiplexing.

In some cases, we may want to renew a standby control file by using a newly created one on the primary database. For example, before a switchover it's a good practice to renew the standby control file in order to guarantee that all of the redo, temp file structure, and historical archived log data are the same. In general, this is a three-step operation:

Create a copy of the standby control file from the primary database.
Transfer this standby control file from the primary...

Time for action – recreating the standby control file

This action shows how to renew the standby control file in a Data Guard environment with OMF.

In the primary database, create a backup of the standby control file with the following RMAN statements:

$rman target /
Recovery Manager: Release 11.2.0.1.0 - Production on Wed Dec 19 22:18:05 2012
Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.
connected to target database: ORCL (DBID=1319333016)

RMAN> BACKUP CURRENT CONTROLFILE FOR STANDBY FORMAT 'standbyctl.bkp';
Starting backup at 19-DEC-12
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=149 device type=DISK
channel ORA_DISK_1: starting full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
including standby control file in backup set
channel ORA_DISK_1: starting piece 1 at 19-DEC-12
channel ORA_DISK_1: finished piece 1 at 19-DEC-12
piece handle=/u01/app/oracle2...

Dealing with redo transport authentication problems

By default, the SYS user is used for redo transport in Data Guard configurations. Data Guard communication uses password files in the standby databases to authenticate redo transport sessions. If we change the password of the SYS user in the primary database, redo transport sessions will not be authenticated because the password file in the standby site will be outdated. So redo transport will raise the ORA-01017: invalid username/password or ORA-01031: insufficient privileges error. The primary database alert logfile will include the following lines:

Error 1017 received logging on to the standby
PING[ARC0]: Heartbeat failed to connect to standby 'INDIAPS'. Error is 16191.

It can also include the following lines:

ORA-01031: insufficient privileges
PING[ARC2]: Heartbeat failed to connect to standby 'INDIAPS'. Error is 1031.

Time for action – changing the SYS password in a Data Guard environment

The way to change the SYS password without breaking the redo transport service includes copying the primary database's password file to the standby server after changing the password. The following steps show how this can be done:

Stop redo transport from the primary database to the standby database. We can execute the DEFER command to defer the log destination with the ALTER SYSTEM statement:
```
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2 = 'DEFER';

System altered.
```
If the Data Guard broker is being used, we can use the following statement:
```
DGMGRL> EDIT DATABASE TURKEY_UN SET STATE = 'LOG-TRANSPORT-OFF';
```

Change the SYS user's password in the primary database:

SQL> ALTER USER SYS IDENTIFIED BY newpassword;

User altered.

Copy the primary database's password file to the standby site:

$ cd $ORACLE_HOME/dbs
$ scp orapwTURKEY standbyhost:/u01/app/oracle/product/11.2.0/ dbhome_1/dbs/orapwINDIAPS

Try logging into the...

Time for action – changing the redo transport user

Follow these steps to change the redo transport user in the Data Guard configuration:

Create a new database, which will be used for redo transport in the primary database. Grant the SYSOPER privileges to this user and ensure that the standby database has applied these changes:
```
SQL> CREATE USER DGUSER IDENTIFIED BY SOMEPASSWORD;
SQL> GRANT SYSOPER to DGUSER;
```
Tip
Don't forget that if the password expires periodically for this user, this will pose a problem in Data Guard redo transport. So ensure that the default profile does not include the PASSWORD_LIFE_TIME and PASSWORD_GRACE_TIME settings. If it does, choose another profile for this user.
Stop the redo transport from the primary database to the standby databases. We can execute the DEFER command to defer the log destination with the ALTER SYSTEM statement:
```
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2 = 'DEFER';
```
Change the redo transport user by setting the REDO_TRANSPORT_ USER...

Dealing with UNNAMED datafiles

There are some reasons for a file being created as UNNAMED in the standby database, including insufficient disk space on the standby site, non-privileged directory structure on standby database, or improper parameter settings related to file management.

The STANDBY_FILE_MANAGEMENT parameter enables or disables automatic standby file management in Data Guard. When automatic standby file management is enabled, file additions and deletions in the primary database are replicated to the standby database.

For example, we add a datafile in the primary database when the STANDBY_FILE_MANAGEMENT parameter on the standby database is set to MANUAL. Due to this parameter setting, it will create an UNNAMED file under $ORACLE_HOME/dbs, and this will cause the MRP process to be killed. Errors in the alert log file will be as follows:

Errors in file /u01/app/oracle2/diag/rdbms/india_ps/INDIAPS/trace/INDIAPS_pr00_691.trc:
ORA-01111: name for data file 10 is unknown - rename to...

Time for action – resolving UNNAMED datafile errors

Now we'll see how to resolve an UNNAMED datafile issue in a Data Guard configuration:

Check for the datafile number that needs to be recovered from the standby database:

SQL> SELECT * FROM V$RECOVER_FILE WHERE ERROR LIKE '%MISSING%';

     FILE# ONLINE  ONLINE_ ERROR                   CHANGE# TIME
---------- ------- ------- ----------------- ---------- ----------
       10  ONLINE  ONLINE  FILE MISSING                  0

Identify datafile 10 in the primary database:

SQL> SELECT FILE#,NAME FROM V$DATAFILE WHERE FILE#=10;

     FILE# NAME
---------- -----------------------------------------------
       536 /u01/app/oracle2/datafile/ORCL/users03.dbf

Identify the dummy filename created in the standby database:

SQL> SELECT FILE#,NAME FROM V$DATAFILE WHERE FILE#=10;

     FILE# NAME
---------- -------------------------------------------------------
       536 /u01/app/oracle2/product/11.2.0/dbhome_1/dbs/UNNAMED00010

If the reason for...

Closing a gap with an RMAN incremental backup

When a standby database falls behind the primary database in time because of any interruption in redo transport or apply, the database can be synchronized again by applying the archived logs produced in the no-synchronization period. However, even if one of the necessary archived logfiles is not accessible, there is nothing to do for Data Guard to close the gap.

In such a case, we have to restore these archived logfiles from backups if they exist. If not, the only way to close the gap is by using an RMAN incremental backup taken from the primary database, especially to close the gap in question. We use the BACKUP INCREMENTAL FROM SCN RMAN statement for this special-purpose backup.

Time for action – closing a gap with an RMAN incremental backup

Let's see all the required steps to practice this recovery operation:

In this practice, assume that there are missing archived logs (gap) in the standby database, and we're not able to restore these archived logs. We'll synchronize Data Guard using the RMAN incremental backup. To represent this situation, execute the DEFER command to defer the log destination in the primary database, and execute the following operation that will generate redo in the primary database:
```
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_STATE_2 = 'DEFER';
```
Now we have a standby database behind the primary database, and we'll use RMAN to reflect the primary database's changes to the standby database. Stop Redo Apply in the standby database:
```
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
```
Query the current system change number (SCN) of the standby database that will be used as the limit for an incremental backup of the primary database. Run...

Fixing NOLOGGING changes on the standby database

It's possible to limit redo generation for specific operations on Oracle databases, which provide higher performance. These operations include bulk inserts, creation of tables as select operations, and index creations. When we work using the NOLOGGING clause, redo will not include all the changes to data on the related segments. This means if we perform a restore/recovery of the related datafile, or of the whole database after the NOLOGGING operations, it'll not be possible to recover the data created with the NOLOGGING option.

The same problem exists with Data Guard. When the NOLOGGING operation is executed in the primary database, Data Guard is not able to reflect all the data changes in the standby database. In this case, when we activate a standby database or open it in the read-only mode, we'll see the following error messages:

ORA-01578: ORACLE data block corrupted (file # 1, block # 2521)
ORA-01110: data file 1: '/u01/app/oracle2/datafile...

Time for action – fixing NOLOGGING changes on a standby database with incremental datafile backups

As a prerequisite for this exercise, first put the primary database in the no-force logging mode using the ALTER DATABASE NO FORCE LOGGING statement. Then perform some DML operations in the primary database using the NOLOGGING clause so that we can fix the issue in the standby database with the following steps:

Run the following query to identify the datafiles that are affected by NOLOGGING changes:

SQL> SELECT FILE#, FIRST_NONLOGGED_SCN FROM V$DATAFILE WHERE FIRST_NONLOGGED_SCN > 0;

FILE#      FIRST_NONLOGGED_SCN
---------- -------------------
         4            20606544

First we need to put the affected datafiles in the OFFLINE state in the standby database. For this purpose, stop Redo Apply in the standby database, execute the ALTER DATABASE DATAFILE ... OFFLINE statement, and start Redo Apply again:
```
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SQL> ALTER DATABASE...
```

Time for action – fixing NOLOGGING changes in the standby database with incremental database backups

Determine the SCN that we'll use in the RMAN incremental database backup by querying the minimum FIRST_NONLOGGED_SCN column of the V$DATAFILE view in the standby database:

SQL> SELECT MIN(FIRST_NONLOGGED_SCN) FROM V$DATAFILE WHERE FIRST_NONLOGGED_SCN>0;

MIN(FIRST_NONLOGGED_SCN)
------------------------
                20606544

Stop Redo Apply on the standby database:

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

Now we'll take an incremental backup of the database using the FROM SCN keyword. The SCN value will be the output of the execution of the query in the first step. Connect to the primary database as the RMAN target and execute the following RMAN BACKUP statement:
```
RMAN> BACKUP INCREMENTAL FROM SCN 20606344 DATABASE FORMAT '/data/DB_Inc_%U' TAG 'FOR STANDBY';
```
Copy the backup files from the primary site to the standby site with FTP or SCP:
```
scp /data/DB_Inc_...
```

Turning on Data Guard tracing

When database administrators work on a Data Guard problem or plan an important Data Guard operation such as role transition, they generally prefer to gather comprehensive trace information about the activity of Data Guard-related processes. For this purpose, Oracle offers the LOG_ARCHIVE_TRACE parameter. By setting this parameter to an appropriate value, it's possible to have detailed information about log archiving, redo transport, and Redo Apply activities.

The default value of this initialization parameter is 0, which means the additional tracing feature is off, and Oracle will continue generating its default alert and trace entries related to error conditions. It's possible to change the value of this parameter in the primary and/or standby databases online using the ALTER SYSTEM statement. For example, look at the following statement:

SQL> ALTER SYSTEM SET LOG_ARCHIVE_TRACE=15;

In the Real Application Cluster database it's possible to set different tracing...

Gathering diagnostic data

We need to access diagnostic data about the Data Guard configuration, especially when there's a problem in the Redo Apply or redo transport services. After the first diagnosis of the problem, it's possible to decide whether to search for detailed information in the primary database or in the standby database. If the issue is about sending redo, it's more likely that the necessary information can be found on the primary site. However, if it's about Redo Apply, it's better to search for detailed information on the standby site.

No matter where we search for diagnostic data, we need to know where to search for the related logfiles and how to query diagnostic data in the database. The most commonly referenced files in a Data Guard issue are the primary and standby alert log files. If the Data Guard broker is used in the configuration, Data Guard Monitor (DMON) logfiles can also be helpful for troubleshooting. If necessary, we can also query Data Guard-related dynamic...

Time for action – monitoring the database alert log using ADRCI

Let's see an example of monitoring the database alert log using the ADRCI utility:

Ensure that the ORACLE_HOME and PATH environment variables are set properly. The PATH environment variable must include the ORACLE_HOME/bin directory.
```
export ORACLE_HOME=/u01/app/oracle2/product/11.2.0/dbhome_1
export PATH=$PATH:$ORACLE_HOME/bin
```

Start the ADRCI command-line tool:

$ adrci

ADRCI: Release 11.2.0.1.0 - Production on Thu Dec 20 02:06:49 2012

Copyright (c) 1982, 2009, Oracle and/or its affiliates.  All rights reserved.

ADR base = "/u01/app/oracle2"

We can run the HELP command to get help on the usage of this utility:

adrci> HELP
HELP [topic]
   Available Topics:
        CREATE REPORT
        ECHO
        EXIT
        HELP
        HOST
        IPS
        PURGE
        RUN
        SET BASE
        SET BROWSER
        SET CONTROL
        SET ECHO
        SET EDITOR
        SET HOMES | HOME | HOMEPATH
        SET TERMOUT
        SHOW...

Summary

In this chapter, we have covered common Data Guard issues using diagnostic data in a Data Guard environment. As Data Guard administrators, we have to identify the underlying reason of a Data Guard issue using this diagnostic data, and resolve the issue in the correct way. We think that the information and examples that we've seen in this chapter will be helpful for this purpose. The next chapter is about Data Guard best practices.

The rest of the chapter is locked