Once, while attending an Oracle conference, I overheard a heated cell phone conversation with the phrase "Step away from the keyboard!" Things were not going well back at their office: the backup Database Administrator (DBA) was attempting to fix a problem and there was a debate about what to do next. The database was down and the less-experienced DBA was attempting to bring it up without investigating why it was down in the first place. Just taking an extra minute to look at the database alert log before attempting to bring up the database might prevent the need to do a time-intensive restore process from a backup. In this chapter, we shall take on the philosophy as well as the motivation behind the DBA's role in the Enterprise, which can make or break a technical career.
A DBA will not know exactly how to fix every problem or issue that arises, but they should be well-versed in the tools available for diagnosing and eventual resolution. Attention to detail while not losing sight of the big picture is an important attribute to have. Focusing too much on the smaller, less essential items would be a waste of valuable time. All tasks and accomplishments should be centered on improving service to the client, so it makes sense that maximizing availability is a primary objective for a DBA. This chapter also includes a list of tips to avoid making a major blunder or a possible career-ending mistake such as the example above.
The story above shows that protecting the database is considered one of the ongoing responsibilities of a DBA. So, how does one accomplish this goal? You do it with the right knowledge and tools. At the same time you can't make everyone else an enemy while staunchly protecting the database. There are security decisions that you will have to stick to, but you should try not to stand in the way of getting the work done. It is a position of respect that you are striving for, not a position based on fear.
How does one gain this knowledge? Obviously, one of the first answers is to read the supplied documentation, but that isn't enough. Other less obvious ways are interacting with your peers at conferences, joining Oracle User Groups, conversations with other DBAs, office meetings with IT personnel, writing papers and giving presentations, interacting with technical support sites, and e-mailing lists. Try gathering information about a particular issue or problem from as many sources as possible before implementing a possible solution. Always take advice with a pinch of salt (including this book). You will be able to make a better and more informed decision by doing as much research and using as many resources as possible.
With that being said, acquiring information is only one of the initial steps to resolving a problem. These investigate different scenarios, as there will usually be more than one approach to fix a complex issue or problem. It is only by working through an issue with hands-on experience that a DBA gains the confidence along with the ability to survive a production-crisis situation. To accomplish this, you need adequate resources, such as the correct hardware, software, and most importantly, scheduled time to do a practice run through various scenarios.
A DBA needs a box that no one else is dependent on so that they can create and destroy databases at will. Adequate hardware for testing purposes is almost as important as the production hardware for database administration. The software side is significant as well and requires much more work to keep up with all the technological and security updates. Change is constant in the software industry. You will always be migrating to the next software version (Oracle software or operating system), patch set, Critical Patch Update (CPU), Patch Set Updates (PSU), one-off patch, or hardware replacements.
One of the questions most frequently asked by DBA managers is: what type of person should I hire to be the DBA? This usually means they weren't happy with the previous candidates. To be good at the job takes a tough but not mean attitude, getting it just right, and paying careful attention to the smallest of details without losing sight of the big picture. You must know everything about the Oracle software and if you are not able to find someone who does, step out of the office, communicate with other IT professionals, and pass on the information you learn to others who need help as well. Teaching someone else is the best test to see if you really know it well. Above all, don't be afraid to do the right thing when it comes to database security and privacy issues—in most situations, a DBA is all powerful, but needs a certain amount of self-restraint.
This book assumes that you could be any type of DBA, working on a single, small database, right up to many large databases. A Very Large Database (known as VLDB) will take several DBAs to run, so teamwork is important. Don't be surprised, as the newcomer, if you are relegated to minor duties for several years. Also, don't assume that if your database is not large, its tuning isn't as important: Oracle Maximum Availability Architecture (MAA) and Optimal Flexible Architecture (OFA) standards will apply to all Oracle customers, not just VLDBs. Both MAA and OFA are discussed later in this book.
There are general practitioners (GPs) who perform routine health exams and treat common, everyday problems such as the flu or a muscle strain. You go to a specialist when the GP has reached his or her limit of expertise. If your organization is large, then there will be an opportunity for you to become a specialist as you gain experience. For now you should become familiar with all aspects of the database as a generalist DBA. Don't ignore or discount certain areas of database administration as being unimportant; if you are responsible for this database, then all areas will be important. Your customers will often see their requirements as more important than yours, so adapting to meet everyone's needs will require flexibility on your part.
Your tools should be extensible, flexible, and ubiquitous. Now that you have some knowledge, what tools are needed? As you go through these chapters, there will be mention of what tools are available time and again. Reasons for choosing a certain tool will be provided: make your decisions based on availability, cost, and adaptation to your particular need or environment, and personal preferences. Be open and willing to change to a new tool because it may just make you more efficient. Don't let the flashy software with high price tags and marketing hype influence purchasing decisions. There are a lot of great open source tools that are available, which may require a little more investment in time to configure, install, or maintain, but which are well worth it. You will only be as good as the tools you pick for this job. It is time to pick the programs you will become best at—command line or GUI-based or command line-based Unix scripting in combination with SQL*Plus are excellent for manipulating both the database and operating system for almost any task that needs to be done. Graphic-based tools have their place, but there will be times that you will need to revert to command-line versions of those tools, especially for troubleshooting. Proficiency will be the only way to be effective, efficient (across the enterprise) and ubiquitous—common among all of the systems or databases you will have to administer.
Oracle Enterprise Manager (OEM—database control) is for a single database, and the Grid Control version of OEM is for the enterprise-deployed tool. The browser-based GUI tool has links to almost all of the other tools in this section of the chapter, and OEM database control has command-line equivalent
with limited functionality for specific tasks. You can migrate from database control to grid control with the
These are all of the graphic-based, command-line Oracle tools:
Database Configuration Assistant (DBCA): This is used for database creation, templates, and installing/uninstalling database options. It can use a utility to create scripts that can be saved to the OS, edited, and run manually for database creation. It has no real command-line equivalent.
Oracle Universal Installer (OUI): This is used for the installation of Oracle software and options. Its command-line equivalent would be to use the installer in silent mode with a text-based input file.
Network Configuration Assistant (NETCA): There isn't a command-line version of all the features of this utility. Most often the DBA edits the resulting network configuration files directly. The command-line equivalent to the Listener subcomponent of this utility is
Database Upgrade Assistant (DBUA): This is the utility for changing a database from one version to another; it will have specific compatibility requirements, and can do all commands manually using SQL*Plus and OS commands.
Oracle Environment (
coraenvdepending on Unix shell): These are the command-line utilities for configuring Oracle Environmental variables, and they work on entries located in the
Oracle Internet Directory Configuration Assistant (
oidca): This is the command-line utility for Oracle's version of the LDAP directory. There are several LDAP-specific command-line utilities for use with Oracle that all start with
Automatic Diagnostic Repository Control Interface (
adrci): This is the command-line utility for tracing, diagnostic packaging, and logging output for most of the utilities in this list. It is meant to be run in a single location for all Oracle products installed on a single server.
For Unix shell scripting, install the open source terminal emulator Cygwin on the Windows systems, otherwise you will have to learn Windows scripting. But why learn two languages? Standardize across the enterprise because it is only a matter of time before you will be taking on new applications and databases, as the business needs change for your organization.
What is a dinosaur? It is someone who is stuck in their ways and too comfortable where they are to move (change things). You can spot a dinosaur if the answers to the following questions are positive.
Do they only stay in their office, cubicle, or work area and rarely show up to meetings with other human beings?
Are their questions and answers short, curt, and rarely reveal what really happened when things went awry?
Do people wonder what they do all day?
Do they go to conferences without any business cards to pass around?
Do they just linger on technical e-mails, discussion lists, or forums?
Now that you know how to spot a dinosaur, the reasons for avoiding them may not be obvious. It is the sign of an insecure, cloistered individual who refuses to change because of fear of the unknown. Realize that there is no real way to know everything about Oracle and that you will need to be open and willing to change in order to keep ahead of the steep learning curve ahead of you.
It is normal that you might be a bit overwhelmed as a new DBA for your first year. If this situation extends for much longer than that, then something is wrong and it is obviously time to ask for long-term help. This situation is especially true if you are the only DBA or the first Oracle DBA in your organization. Help can come in the form of changing your phone number to non-published within your organization, changing who you directly answer to, or adding another DBA (on-site, remote, or temporary consultant). Compare and evaluate the regular tasks you perform with other DBAs. It doesn't matter if you are only a backup or junior DBA at this point, it is the best time to learn these valuable concepts and still be protected from disrupting the production environment. Everyone wants you to succeed as a DBA because it is a position with a huge amount of responsibility, long hours, and it requires ongoing training to stay competent.
Are you barely able to answer requests as they pour in? Ask that those requests be filtered or first reviewed by a front-line tech support personnel. The exception would be if that is your job—a front-line DBA to serve as part of a tech support organization. On a side note, third-party vendors will put tech people in support positions who give DBA-like advice, but be very wary as they often have little to no actual experience. Remember, a lot of technical helpers only play with databases all day. They don't actually have to manage a live production one with your critical data in it. Learn to say no to these individuals.
A DBA is most often placed underneath the head of IT (most often the CIO) because the database is often central to the entire IT department. If you are one of the many DBAs, your organization will most often have an immediate DBA supervisor that you will answer to first, before the head of IT.
The DBA is not the sole source for any issue related to Oracle, especially if your entire IT department is new to Oracle. Their job is not to train everyone on the basics of Oracle—they don't have the time and probably aren't the best people for that job. Most DBAs don't have the experience of teaching others; they are usually best in a one-on-one situation training a backup or junior DBA. They also shouldn't be subjected to the whims of upper management when there is a technical issue that can't be resolved by the standard front-line personnel, or when the perception is that the issue won't be handled quickly enough.
After a year on the job, you should be proficient enough with My Oracle Support (MOS, Oracle's support site renamed from Metalink) to find out most of the answers on your own and be comfortable enough to implement the solution without help. Using an Internet search engine may be another source of information (beware that the information may not be timely, complete, or even accurate), but it is a source you turn to when other more dependable ones aren't productive.
When you create a MOS Service Request, don't expect an answer that day unless the severity level and contract agreement dictates that level of support. Oracle Engineer support help may be hours or even days from now. Reach out to the Oracle community for help at the same time you enter a Service Request, which often will get a quicker if not better answer than Oracle support, because these people will have encountered the same problem at some point. Not everyone will give reliable advice; learn to recognize those that do. Most often a test case with reproducible results is the best indicator of expertise, or at least a well-thought-out process. E-mail lists and forums have been known in the past for shooting down people who don't read the documentation first. The exception would be if you are at a total loss on how to fix a problem or are experiencing a disaster at a particular moment.
Responsibilities include installing, configuring, and managing the database, and these responsibilities can be divided into tasks scheduled to occur at certain intervals. This is a generalized list and, depending on your environment, may or may not be applicable. Most of the outlined tasks will be investigated further in later chapters in the book.
Monitoring and Log Rotation tasks can be done with Enterprise Manager, Grid Control, Unix shell scripting, DBMS_Scheduler, Perl, third-party database tools, or a combination of any of these.
Let's run through the priority tasks you need to cover. Scheduling will depend on your environment, application needs, and overall job priorities.
Backups—these are usually incremental or cumulative, weekly fulls, and logs are archived and e-mailed to DBA upon failure
Database Alert Logs—such as ORA-errors, automatic notifications through e-mail, pagers
ADRCI—Automatic Repository Utility and Log Rotation
Operating System File space, CPU and I/O statistics—depends on system admin support
SQL Tuning Sets—Top 5 to 10 SQL statements
Corruption—RMAN logs, export and/or datapump logs, dbverify,
Tablespace growth—Extension, Partition Management, Temporary Tablespace, Undo
Data Guard—Log Shipping/Application in Synch
SQL*NET Listener Logs—intrusion detection
Audit trails and logs—intrusion detection, removal of unused accounts
Core Dumps and User Dumps—file space, Oracle bugs
New account creation—should be at least partially automated
Personnel security changes—At least 24 hours notice
Migrate schema and code changes or ad hoc SQL updates
Large table growth, coalescing tablespace
Keeping a log of daily changes to the database—publishing it for certain IT staff
Cloning for non-production databases—automated or scripted
Tablespace growth—daily rolled up to weekly
Oracle upgrade or patch set Migration Projects—Milestone updates
Data Guard site testing
Check for updates from My Oracle Support—new patches, updates, or news releases
Local Intranet updates on operational procedures
Cloning for non-production databases—automated or scripted
Monitoring tablespace growth—weekly rolled up to monthly
Trends and forecasts—CPU utilization, I/O stats, logons
Password changes on production—sys, system, wallet, schema, grid control, OAS
Oracle licensing usage and high water marks
Practicing recovery scenarios
Applying CPUs and PSUs into production with planned downtime. Applying CPUs, PSUs, one-offs into non-production instances
Monitoring tablespace growth—monthly rolled up to yearly
Oracle training updates—Oracle University (online or in-class), books, informal meetings
Trends and forecast rollups
Tablespace growth—yearly report
Trends and forecast rollups
Attend Oracle-oriented conferences—regional or national Oracle user groups
Oracle upgrades with planned downtime—version + patch sets + PSUs + one-offs
Software licensing and warranty renewals
Hardware evaluation and replacement
SSL Certificate renewals, Oracle Wallets
Yes, these look like a daunting number of tasks that need to be accomplished, but you will have help in the form of tools such as OEM, Grid Control, third-party monitoring, or home-grown scripts. That is why I will reiterate that automating these tasks is of paramount importance.
A few years ago having the database down on a regular basis was normal and considered necessary just for backups. But it is no longer needed in these days of 24x7 IT operations and expanded Service Level Agreements (SLAs).
The database most often will only have to be down for patches or upgrades, which can be either Oracle or application-specific. You should no longer need to have the database down to do backups. If cold backups are a norm at your workplace, then this is a sign of a dinosaur. Little to no downtime applies to production instances, but non-production should be mostly up during working hours with only intermittent outages.
Each organization has its own Outage Handling Procedures—depending on whether it is planned or unplanned downtime. Most DBAs are assigned a database to be the primary contact when there is an outage issue on call. Outage handling usually includes something similar to the following:
Initial troubleshooting to determine the type of outage: Evaluate any automatic failover procedures to check for success.
Forecasting the amount of time before resolution: This is the point for making the decision if a manual fail over is needed.
Bringing the application or database back online: Not all failures are due to the database being down, even when that is what first appears to be the case.
Root cause analysis: What was the real reason for the outage? This is not always evident at first glance.
Future preventive actions: Evaluating and rewriting the outage procedures, reassigning team members for outage coverage.
Outage handling is an important process and includes quite a few non-DBA team members who must coordinate efforts, and not just point fingers to get this issue resolved. These types of procedures should be well documented (in both print and online form for disasters) with a definite line of authority as to who can execute the procedures with administrative approval.
There are many things that could cause the database to crash or become unavailable to end users:
Operating system issues
ASM or RAC specific problems
Critical Oracle processes dying
Certain ORA-600 errors
Certain Oracle bugs
Listener not running
Speaking of the human side of things, the following list details how to avoid the really bad things that can happen to even experienced DBAs. Remember if you are a novice or new DBA, you shouldn't have access to certain servers or databases because your superior understands how easy it is to do the wrong thing at the wrong time. The following list may seem harsh, full of should and don't statements, but I felt it was important to state exactly what others have experienced or witnessed personally. Think of it as an experienced DBA giving someone under them some good advice about what to avoid.
rm -rf *.*for any reason at anytime, do
*.trc: It is safer to back up a directory and use
rmdirinstead. It would be even better if you renamed the entire directory and left it in place renamed for a day or two.
Assuming that all of the datafiles in a certain directory only pertain to one database is a recipe for disaster, those files can be created anywhere on the filesystem as long as Oracle has write access.
Modifying access for a production instance at the SQL*Plus level is unusual and generally not granted to programming staff unless there is a single point of accountability, such as a lead programmer.
It is best to use the Unix utility called
fuseragainst a database file before using an
mvcommand because it checks if the file is actively being used. Another way would be to force a database checkpoint and check the timestamp before removing. If it is an active datafile, the timestamp would be current.
ORACLE_SIDand user into the SQL prompt. This will prevent many a disaster by visually checking the prompt before running a script in what you think is a non-production database. Instructions on how to do this come later in the book.
Use the extended Unix prompt that puts in the hostname, user, and
ORACLE_SID. This will add more visual clues to ensure that you know exactly what you are modifying.
Copying and pasting directly into a SQL*Plus or other command-line utility window can lead to the wrong code being executed. Copy and paste into a text file and run it instead. This double checks exactly what is in the copy/paste buffer.
Type the word
productioninto the command-line window after you finish using it. This will prevent disasters if you accidentally switch windows and run something you shouldn't have. It will only produce an error because there is no command called production.
It is best to run recovery scenarios on a different server from any running production. Also, test operating system restores. Disaster recovery sites should also be located on a different server for true failover capabilities.
Make sure you know how to use the command line for all of the Oracle utilities and Unix
vieditor just in case you have nothing else at your disposal.
It is suggested to make your production windows, application, or command-line utility like PuTTY a completely different color for production versus non-production, and the scrolling history as large as possible. Unix has a history capture utility called
Tell someone else you are modifying something… just in case. Saying it aloud may give someone else time to stop you or at least give you a mental check on what you are doing.
Log rotating scripts can play havoc with naming the online redo logs with a file extension of
log. Using the letters
rdowould be safer.
Unknown outside consultants won't necessarily give the best advice. Be wary until you are sure of their expertise and ability. If at all possible, ask to do the work under their guidance so that you know what is actually occurring.
Using the number 8 in any type of scripting,
ORACLE_SIDname, or the like can play havoc with scripting or command-line executions because the all-inclusive wild card character * is above the eight—it's too easy to type it accidentally.
Double check by tracking the operating system's performance on a server, especially running out of file space.
Beware the reuse clause when adding or altering a database file. This command can overwrite an existing datafile, which destroys any existing data.
Be wary of scripts generated by third-party tools, they can be too powerful. A script to recreate an object usually drops it in the first line. This can be disastrous if the data has not been saved.
You are responsible for backups. It is not wise to delegate this in any way.
Be sure to investigate the addition of resource limits for any users that have ad hoc SQL access in production. Those types of users can easily hog CPU or I/O, which is needed more by the OLTP application.
Make sure the system administrators know not to modify, move, or delete anything that belongs to the Oracle accounts.
The Unix root account is not intended for everyday use and especially not suited for Oracle tasks. Investigate the use of
sudofor tracking root-authorized activities.
This is the most important blunder-avoiding tip—it is wisest not to do anything that you can't undo, reverse, or fix.
As a new DBA, one of the hardest things to figure out is the philosophy behind the position. The best DBAs in the business seem to have an underlying sense or gut feeling when something is wrong, and when to speak up and say no.
Treat others as you would like to be treated with a certain amount of respect. There should be lines drawn based on your position in the organization. Those boundaries are there for a reason—responsibilities and duties are attributed to the team player based on those boundaries. You will be working closely with the team and you will depend on them doing their job. A database is dependent on a reliable framework of underlying software and hardware, which most often is not your responsibility; this makes your job very dependent on your fellow team members' expertise.
This chapter helped to show what a DBA does all day. All rules may need to be bent or modified to suit your organization's needs. Only experience and testing will allow your team to decide how best to proceed on any single rule or suggestion in this chapter.