Studies show that many corporations world wide expect their IT footprint to grow in the coming years. They expect more servers, more databases, more data, and more of everything.
They require more floor space in their data centers, and correspondingly a greater power footprint. Have you heard of a data center where no more servers can be added as the power supply has reached its limit, or the uninterruptible power supply (UPS) can no longer cope? This story is not new, it happened a few years ago.
The growth seems to be endless—and this is fuelled by today's information age, where larger and larger volumes of data need to be stored and distributed to satisfy an ever-growing demand. More applications are using those databases, on more and more application servers.
So, for an IT manager, this will mean more of everything in his/her data centre. There may be different hardware platforms, different operating systems, for example, Solaris, Linux, IBM AIX, or Microsoft Windows, and in each such case there may be different versions such as the different flavors of Linux supplied by different vendors, including Oracle Enterprise Linux, Red Hat, SUSE Linux, and so on.
In the database arena, if a company has no policy of standardization for one particular database vendor, there may be different databases, such as Oracle, IBM DB2, or Microsoft SQL Server, in use by different projects.
Even if the databases belong to only one vendor, for example Oracle, the databases may be of different versions, such as Oracle Database 9i, 10g, or 11g. In the real world, it is very difficult to standardize on one version, as all applications may not be certified to use on that one database version. You may have some application vendors that say they are certified on Oracle Database 10.2.0.3 and not 10.2.0.5, and some that say they only use a particular version of Oracle Database 11g Release 1 and no other version.
So multiple database versions need to be installed separately, managed, patched as required, and upgraded when required. Also, development as well as test, staging, and production environments need to be provisioned (created) for each such database version. This level of complexity is the ground reality in today's data centers.
First, a business project identifies the need for an application server and/or a database server. A formal request is made for these. The request is approved, and arrives at the doorstep (or mailbox) of the Unix/Windows team. Teams such as these are normally present in any reasonably sized IT department.
The Unix/Windows team then procures the necessary servers, gets access, and installs the operating system—frequently following manual checklists where they tick off each step. This team may also need to install additional packages/patches at the OS level, as requested by the project managers, and this information may or may not be available to the management at this stage. It may well be the case that any missing OS-level packages could cause delays and annoyances later down the track, but let us say this information is known, and the additional packages/patches are applied by the Unix/Windows team.
The hardware with the installed operating system is then forwarded to the database team in the IT department, where the DBAs get access to the server and install the database software, such as the Enterprise Edition of the Oracle database.
Certain options for the Oracle database may also need to be installed at this stage, such as the partitioning option or the advanced security option, depending on the requirements of the project and the licenses available with the company. Let us presume there is no standardization at this stage, so everything has to be decided manually, or guessed—if there are no clear instructions from the project side.
At this point, if certain OS packages/patches are missing, the DBAs may redirect the servers back to the Unix/Windows team for the missing components to be installed. Assuming this is done in a day or so, the DBAs then re-attempt the database software installation the next day (if they are lucky).
The database software thus installed may itself need to be patched, for example, if Oracle Database 10g Release 2 (10.2.0.1.0) had been installed as the base release on the Solaris Operating System (SPARC) (64-bit), it should be patched on to the ultimate release of Oracle Database 10.2 on this platform, such as 10.2.0.5. It is always recommended to do this for production databases. Not patching for the ultimate or penultimate release can lead to issues later on when applying regular security patches that are often provided only for the latest releases of any version.
After this, the actual database is created. If there are no company standards, it is possible that each database created by different DBAs even in the same team may be different, for example one database may have the Sample Schemas provided by Oracle installed but another database may not have these installed. This is normally done via a simple manual selection in Database Configuration Assistant (DBCA)—Oracle's standalone graphical utility for creating new databases. One DBA may decide to select this, the other DBA may not.
There are other examples of similar configuration drifts. One DBA may decide to use Oracle Managed Files (OMF) for the data files of the database, another may not. It may depend on each individual and his/her likes or dislikes. For example, I, personally, have never used Oracle Managed Files even though there is nothing wrong with using them.
Even if there are published company standards for database creation, they may not be followed 100% of the time as standards are difficult to enforce when manual methods of installation are used and no control is enforced when performing the installation.
There is no separation of roles in the manual installation, there is no DBA Designer (the Senior DBA) who designs the database to be installed, and there is no DBA Operator (the Junior DBA) who follows the design and installs the database. It just depends on who does the installation and who does a better job—the Senior DBA or the Junior DBA. And frequently, the Senior DBA does not even have the time for repetitive installations and just leaves it to the Junior DBA. The Junior DBA then decides to try out different things for educational purposes – possibly.
The same scenario may be applicable to the middleware team when they install the application server software, such as Oracle WebLogic Server or Oracle Application Server, or application servers belonging to other vendors. Manual methods will lead to major or minor differences in the setup and configuration, and if there are no automated methods of checking the configuration compliance, it may lead to a near-impossible situation and there would be no way to enforce company standards on the middleware servers and domains either during or after installation.
The DBAs install the database software, patch it to the required level, and then create the databases required by the applications. They repeat this process for development, test, staging, and production environments.
After a reasonably successful installation, the DBA team is also requested to set up the backup of each database and also to set up the standby databases for disaster recovery using some tool, for example, Oracle DataGuard.
The setup of each backup and standby database is reasonably complicated, especially the latter, and involves a number of detailed steps. Also, each backup and disaster recovery scenario needs to be tested to ensure that it works.
Where is the time to do all this if everything is done manually?
Once the databases are released to the production environment, either the same team of DBAs or a different production team looks into the day-to-day workings of each database, and attempts to ensure that the application is guaranteed a certain level of performance—and this is a difficult order without automated tools.
Every database sooner or later needs diagnosis and tuning—as databases are not static; they change, their data changes, their users change, and their application changes. More load is placed on the database. More data is used in queries. All these changes are inevitable and eventually lead to performance issues.
First of all, to gain an understanding of these issues, performance information from the database is collected, manually analyzed, and certain worst-performing SQL statements are identified.
Each such SQL statement is then painstakingly fixed, often by adding indexes, perhaps without understanding the effect of extra indexes placing a heavier strain on inserts and updates.
Such manual performance diagnosis (finding out the problem) and tuning (fixing the problem) is obviously a very lengthy and tedious process, and is compounded when there are many tens or even hundreds of databases. A team of DBAs would be needed just to look at and fix day-to-day performance issues.
What about patching these databases?
The CPU is a collection of security patches, and the PSU includes the CPU as well as other patches that fix separate issues.
If a particular company is security conscious and wants to make sure that the latest security patches have been applied to all its databases, there will be a need for the DBAs to apply the CPUs or PSUs every calendar quarter, and this involves downloading the appropriate patch, FTPing or SCPing it to each server, and then going through a series of manual steps to apply the patch.
This is fine if there are few databases, but if there are hundreds compounded with several environments for development, testing, and production, then the patches must be painstakingly applied on each and every database home and database.
It is possible, but it takes a lot of time manually. The DBAs will never get their sleep in this case, when they stay behind to complete the manual patching of the numerous databases. Also, the manual repetitive tasks they follow for this patching will be more prone to mistakes—as the mind itself gets numbed by constant repetition of the same steps.
Let us say, on top of all this, there are development projects that constantly upgrade their applications and send database scripts to the DBAs to make schema changes corresponding to the new application release.
This may include new tables, new columns, a modification of existing columns, or new database structures as well as new/modified users and their privileges on the objects.
To implement these changes at the database level for the purpose of such application upgrades, they would use a collection of SQL or PL/SQL commands in a script that they supply to the DBA—to run in the test and then in the staging environments. Finally, the scripts would be run on the production database if the application is found to work well on the test and staging databases.
First of all, the scripts themselves need to be checked thoroughly by the DBA, and this takes time.
The DBA needs to ensure the script is doing the right thing, which he/she does with a manual check, then perhaps a dry run on a separate test database to work out script issues, for example, the creation of indexes on nonexistent columns, the granting of privileges before the table is created, errors in SQL syntax, and so on.
The script then has to be modified and re-run before it is finally ready to be executed in the test environment, and the DBA may send a few stern emails to the developer team to be more careful next time.
Once this is done, the project team commences testing the application on the test database. After a green signal is received, the DBA will be asked to run the script on the staging and production databases.
The DBA then rests easy, thinking that the job is over, but suddenly receives a shock—the developer tells him that the application is not working in production (although it is working in staging and testing).
The DBA calls his/her home to say that he/she will be coming home late today, then starts to investigate the issue by painstakingly comparing the staging and the production databases. Every table, user, and privilege in the two databases has to be compared until the DBA finds out that certain privileges are missing on the production database, whereas the user had higher privileges on the other databases.
This simple reason has taken hours to be discovered. The DBA fixes the script, creates a final version for the particular release, and runs the necessary part of granting the extra privileges (by a cut and paste from the final script), and the application finally works.
Also, the only way the history of all these changes can be preserved in this case is by retaining the script files on the database server, versioning them, and storing them in subdirectories. This is sometimes done instead on the DBA's Windows PC.
Trying to look into such files in this directory structure to find out a past change is normally a very manual, labour-intensive, and painstaking process. I have done this many times all by myself, so am well aware of the issues.
When manual effort is the norm, in such data centers, it is inevitable that the DBAs and other administrators spend much of their time being reactive. Junior and less-trained staff on night shift hours would compound the issue, as would offshore technicians working remotely without a proper understanding of the internal environment.
For example, if the OS maintenance is off sourced and, say, the admin is asked to apply an OS upgrade, and as a part of the process the admin decided to change some OS properties under the impression that it will improve the OS performance. However, the next day, the on-site DBA finds that the standby database has stopped working, and he/she has no way of knowing what has changed as there is no automatic capture or history of configuration changes in such a manual environment. All the DBA has is the general information that the OS was upgraded, and he/she has to then investigate further by comparing, bit-by-bit, the configuration of the server on which the database is still working and the server on which the database has had the problem.
You can guess how much time and effort this may take?
There is usually a team of Unix administrators and another team of Windows administrators who are responsible for manually preparing each and every piece of hardware by installing the operating system and patching it to the required level.
These administrators are also responsible for resolving issues with the systems they provision, such as missing pieces in the installation or performance issues that may be due to improper setup of the operating system (wrong values supplied for OS properties, for example, network buffer properties).
There is another team of Database Administrators (DBAs). These DBAs may specialize in Oracle or DB2 or SQL Server, and frequently in companies that seek to combine multiple roles, may dabble in all of these. (Indeed in the DBA world, it was once considered a plus point to know as many databases as possible, until the realization dawned that a real expert in one main database was more of a valuable asset than a DBA who knew multiple databases and their nuances, but only superficially.)
These teams of Unix, Windows, database and also the middleware administrators are put into action in their brute numbers, and this is normally seen in the highly-populated countries in the world today where there are a great number of administrators in the job market. The admin labor is available at a low cost in such markets, and consequently more administrators can be hired.
Such administrators, in an effort to be extremely competitive against their peers, and to appear extremely loyal to their work, proudly say "we never sleep" (sacrificing their family happiness in the process) and make themselves available for tackling all the issues mentioned—albeit in a manual, uncontrolled, haphazard manner that would be prone to multiple and deadly mistakes.
However, brute force, by throwing reams of administrators at the manual tasks, does work at fighting fires and keeping them under control. This technique is employed by a number of companies to handle their data centers. But then, they get used to fighting fires every other day.
The other scenario is the company that prides itself on the thousands of reams of scripts running its data center. These countless scripts are used in an attempt to automate the manual steps of managing the data center. They are used for provisioning, to collect the configuration, for patching, for applying the changes to the schemas, for backing up, and for creating and monitoring the standby disaster recovery databases.
However, these scripts are not a magic bullet—there needs to be an effort to write and maintain these scripts. As technology changes, more and more complicated scripts need to be written. The scripts may be layered unnecessarily and may become quickly outdated—for example, an Oracle RMAN script used to back up an Oracle 9i database may still be used to back up an Oracle 10g database, without using the new features such as Block Change Tracking and Fast Incremental Backups, present in the later releases of RMAN.
This is the very problem with scripts—they stay static.
The languages are not easy, and require expertise to write scripts—which is somewhat rare. The writers of such scripts soon establish a position for themselves in the company as heroes. They are available to script everything.
In this chapter, we have seen the common issues in data centers (albeit from the eyes of the administrators) and the chaos that can be seen in any such center if there is no controlled management and automation. We hope you have gained a good insight.
Brute manual force and multitudes of scripts are often used as the last bastion of defense against the chaos, but ultimately succumb. They cannot cope with rapid change.
Even the script heroes are eventually engulfed by the rising tide of continuous scripting and sooner or later quit the company.
In the next chapter, we will take our first look at powerful Enterprise-Management software from Oracle that can be used to reduce this chaos to a negligible extent, so that the data center can be managed efficiently from a central console, and with the barest minimum of scripts. Read on.