Chapter 10. Common Data Guard Issues
Data Guard administrators need to know methods to resolve some specific issues. These issues may originate from configuration changes, misconfiguration, or user errors. Another important point is the use of diagnostic data to identify these issues. Now we'll cover handling the most common of these Data Guard issues and the methods to access and use diagnostic data.
In this chapter, we will discuss the following topics:
Recreating the standby control file
Dealing with redo transport authentication problems
Dealing with UNNAMED
datafiles
Closing a gap with RMAN incremental backups
Fixing NOLOGGING
changes in a standby database
Turning on Data Guard tracing
Gathering diagnostic data
Let's start with renewing the standby control file of a standby database.
Recreating the standby control file
A
standby control file essentially keeps the same information of the primary database with the control file, which is the physical structure of the database. It also contains some specific information about the Data Guard, such as whether an archive log sequence is applied or not. A standby control file is mandatory to mount a physical standby database, and we should consider keeping multiple copies of the standby control file, preferably on different disks, which is known as multiplexing.
In some cases, we may want to renew a standby control file by using a newly created one on the primary database. For example, before a switchover it's a good practice to renew the standby control file in order to guarantee that all of the redo, temp file structure, and historical archived log data are the same. In general, this is a three-step operation:
Create a copy of the standby control file from the primary database.
Transfer this standby control file from the primary...
Time for action – recreating the standby control file
This action shows how to renew the standby control file in a Data Guard environment with OMF.
In the primary database, create a backup of the standby control file with the following RMAN statements:
Dealing with redo transport authentication problems
By default, the SYS
user is used for redo transport in Data Guard configurations. Data Guard communication uses password files in the standby databases to authenticate redo transport sessions. If we change the password of the SYS
user in the primary database, redo transport sessions will not be authenticated because the password file in the standby site will be outdated. So redo transport will raise the ORA-01017: invalid username/password
or ORA-01031: insufficient privileges
error. The primary database alert logfile will include the following lines:
It can also include the following lines:
Time for action – changing the SYS password in a Data Guard environment
The way to change the SYS
password without breaking the redo transport service includes copying the primary database's password file to the standby server after changing the password. The following steps show how this can be done:
Stop redo transport from the primary database to the standby database. We can execute the DEFER
command to defer the log destination with the ALTER SYSTEM
statement:
If the Data Guard broker is being used, we can use the following statement:
Change the SYS
user's password in the primary database:
Copy the primary database's password file to the standby site:
Try logging into the...
Time for action – changing the redo transport user
Follow these steps to change the redo transport user in the Data Guard configuration:
Create a new database, which will be used for redo transport in the primary database. Grant the SYSOPER
privileges to this user and ensure that the standby database has applied these changes:
Tip
Don't forget that if the password expires periodically for this user, this will pose a problem in Data Guard redo transport. So ensure that the default profile does not include the PASSWORD_LIFE_TIME
and PASSWORD_GRACE_TIME
settings. If it does, choose another profile for this user.
Stop the redo transport from the primary database to the standby databases. We can execute the DEFER
command to defer the log destination with the ALTER SYSTEM
statement:
Change the redo transport user by setting the REDO_TRANSPORT_ USER
...
Dealing with UNNAMED datafiles
There are some reasons for a file being created as UNNAMED
in the standby database, including insufficient disk space on the standby site, non-privileged directory structure on standby database, or improper parameter settings related to file management.
The STANDBY_FILE_MANAGEMENT
parameter enables or disables automatic standby file management in Data Guard. When automatic standby file management is enabled, file additions and deletions in the primary database are replicated to the standby database.
For example, we add a datafile in the primary database when the STANDBY_FILE_MANAGEMENT
parameter on the standby database is set to MANUAL
. Due to this parameter setting, it will create an UNNAMED
file under $ORACLE_HOME/dbs
, and this will cause the MRP process to be killed. Errors in the alert log file will be as follows:
Time for action – resolving UNNAMED datafile errors
Now we'll see how to resolve an UNNAMED
datafile issue in a Data Guard configuration:
Check for the datafile number that needs to be recovered from the standby database:
Identify datafile 10
in the primary database:
Identify the dummy filename created in the standby database:
If the reason for...
Closing a gap with an RMAN incremental backup
When a standby database falls behind the primary database in time because of any interruption in redo transport or apply, the database can be synchronized again by applying the archived logs produced in the no-synchronization period. However, even if one of the necessary archived logfiles is not accessible, there is nothing to do for Data Guard to close the gap.
In such a case, we have to restore these archived logfiles from backups if they exist. If not, the only way to close the gap is by using an RMAN incremental backup taken from the primary database, especially to close the gap in question. We use the BACKUP INCREMENTAL FROM SCN
RMAN statement for this special-purpose backup.
Time for action – closing a gap with an RMAN incremental backup
Let's see all the required steps to practice this recovery operation:
In this practice, assume that there are missing archived logs (gap) in the standby database, and we're not able to restore these archived logs. We'll synchronize Data Guard using the RMAN incremental backup. To represent this situation, execute the DEFER
command to defer the log destination in the primary database, and execute the following operation that will generate redo in the primary database:
Now we have a standby database behind the primary database, and we'll use RMAN to reflect the primary database's changes to the standby database. Stop Redo Apply in the standby database:
Query the current system change number (SCN) of the standby database that will be used as the limit for an incremental backup of the primary database. Run...
Fixing NOLOGGING changes on the standby database
It's possible to limit redo generation for specific operations on Oracle databases, which provide higher performance. These operations include bulk inserts, creation of tables as select operations, and index creations. When we work using the NOLOGGING
clause, redo will not include all the changes to data on the related segments. This means if we perform a restore/recovery of the related datafile, or of the whole database after the NOLOGGING
operations, it'll not be possible to recover the data created with the NOLOGGING
option.
The same problem exists with Data Guard. When the NOLOGGING
operation is executed in the primary database, Data Guard is not able to reflect all the data changes in the standby database. In this case, when we activate a standby database or open it in the read-only mode, we'll see the following error messages:
Time for action – fixing NOLOGGING changes on a standby database with incremental datafile backups
As a prerequisite for this exercise, first put the primary database in the no-force logging mode using the ALTER DATABASE NO FORCE LOGGING
statement. Then perform some DML operations in the primary database using the NOLOGGING
clause so that we can fix the issue in the standby database with the following steps:
Run the following query to identify the datafiles that are affected by NOLOGGING
changes:
First we need to put the affected datafiles in the OFFLINE
state in the standby database. For this purpose, stop Redo Apply in the standby database, execute the ALTER DATABASE DATAFILE ... OFFLINE
statement, and start Redo Apply again:
Turning on Data Guard tracing
When database administrators work on a Data Guard problem or plan an important Data Guard operation such as role transition, they generally prefer to gather comprehensive trace information about the activity of Data Guard-related processes. For this purpose, Oracle offers the LOG_ARCHIVE_TRACE
parameter. By setting this parameter to an appropriate value, it's possible to have detailed information about log archiving, redo transport, and Redo Apply activities.
The default value of this initialization parameter is 0
, which means the additional tracing feature is off, and Oracle will continue generating its default alert and trace entries related to error conditions. It's possible to change the value of this parameter in the primary and/or standby databases online using the ALTER SYSTEM
statement. For example, look at the following statement:
In the Real Application Cluster database it's possible to set different tracing...
Gathering diagnostic data
We need to access diagnostic data about the Data Guard configuration, especially when there's a problem in the Redo Apply or redo transport services. After the first diagnosis of the problem, it's possible to decide whether to search for detailed information in the primary database or in the standby database. If the issue is about sending redo, it's more likely that the necessary information can be found on the primary site. However, if it's about Redo Apply, it's better to search for detailed information on the standby site.
No matter where we search for diagnostic data, we need to know where to search for the related logfiles and how to query diagnostic data in the database. The most commonly referenced files in a Data Guard issue are the primary and standby alert log files. If the Data Guard broker is used in the configuration, Data Guard Monitor (DMON) logfiles can also be helpful for troubleshooting. If necessary, we can also query Data Guard-related dynamic...
Time for action – monitoring the database alert log using ADRCI
Let's see an example of monitoring the database alert log using the ADRCI utility:
Ensure that the ORACLE_HOME
and PATH
environment variables are set properly. The PATH
environment variable must include the ORACLE_HOME/bin
directory.
Start the ADRCI command-line tool:
We can run the HELP
command to get help on the usage of this utility:
In this chapter, we have covered common Data Guard issues using diagnostic data in a Data Guard environment. As Data Guard administrators, we have to identify the underlying reason of a Data Guard issue using this diagnostic data, and resolve the issue in the correct way. We think that the information and examples that we've seen in this chapter will be helpful for this purpose. The next chapter is about Data Guard best practices.