Generally, jobs run immediately upon being enabled, or when we call the run_job procedure of the dbms_scheduler package. Many jobs are time-based; they are controlled by a schedule based on some kind of calendar.
However, not everything in real life can be controlled by a calendar. Many things need an action on an ad hoc basis, depending on the occurrence of some other thing. This is called event-based scheduling. Events also exist as the outcome of a job. We can define a job to raise an event in several ways—when it ends, or when it ends in an error, or when it does not end within the expected runtime. Let's start with creating job events in order to make job monitoring a lot easier for you.
Monitoring job events
Most of the time when jobs just do their work as expected, there is not much to monitor. In most cases, the job controller has to fix application-specific problems (for example, sometimes file systems or table spaces get filled up). To make this easier, we can incorporate events. We can make jobs raise events when something unexpected happens, and we can have the Scheduler generate events when a job runs for too long. This gives us tremendous power. We can also use this to make chains a little easier to maintain.
Events in chains
A chain consists of steps that depend on each other. In many cases, it does not make sense to continue to step 2 when step 1 fails. For example, when a create table fails, why try to load data into the nonexistent table? So it is logical to terminate the job if no other independent steps can be performed.
One of the ways to handle this is implementing error steps in the chain. This might be a good idea, but the disadvantage is that this quickly doubles the steps involved in the chain, where most of the steps—hopefully—will not be executed. Another disadvantage is that the chain becomes less maintainable. It's a lot of extra code, and more code (mostly) gives us less oversight.
If a job chain has to be terminated because of a failure, using the option of creating an event handler to raise a Scheduler event is recommended instead of adding extra steps that try to tell which error possibly happened. This makes event notification a lot simpler because it's all in separate code and not mixed up with the application code.
Another situation is when the application logic has to take care of steps that fail, and has well-defined countermeasures to be executed that make the total outcome of the job a success.
An example is a situation that starts with a test for the existence of a fi le. If the test fails, get it by FTP; and if this succeeds, load it into the database. In this case, the first step can fail and go to the step that gets the file. As there is no other action possible when the FTP action fails, this should raise a Scheduler event that triggers—for example—a notification action. The same should happen when the load fails.
In other third-party scheduling packages, I have seen these notification actions implemented as part of the chain definitions because they lack a Scheduler event queue. In such packages, messages are sent by mail in extra chain steps. In the Oracle Scheduler, this queue is present and is very useful for us. Compared to 10g, nothing has changed in 11g. An event monitoring package can de-queue from the SCHEDULER$_EVENT_QUEUE variable into a sys.scheduler$_event_info type variable. The definition is shown in the following screenshot:
What you can do with an event handler is up to your imagination. The following DB Console screenshot shows the interface that can be used to specify which events to raise:
It is easy to generate an event for every possible event listed above and have the handler decide (by the rules defined in tables) what to do. This does sound a little creepy, but it is not very different from having a table that can be used as a lookup for the job found in the event message where—most of the time—a notification mail is sent, or not sent. Sometimes, a user wants to get a message when a job starts running; and most of the time, they want a message when a job ends.
In a chain, it is especially important to be able to tell in which step the event happened and what that step was supposed to do. In the event message, only the job name is present and so you have to search a bit to find the name of the step that failed.
For this, we can use the LOG_ID to find the step name in the SCHEDULER_JOB_LOGS (user/dba/all_SCHEDULER_JOB_LOG) view, where the step name is listed as JOB_SUBNAME. The following query can be used to find the step_name from the dba all_scheduler_log view, assuming that the event message is received in msg:
select job_subname from all_scheduler_job_log where
log_id = msg.log_id;
To enable the delivery of all the events a job can generate, we can set the raise_events attribute to a value of:
dbms_scheduler.job_started + dbms_scheduler.job_succeeded +
dbms_scheduler.job_failed + dbms_scheduler.job_broken +
dbms_scheduler.job_completed + dbms_scheduler.job_stopped +
dbms_scheduler.job_sch_lim_reached + dbms_scheduler.job_disabled +
Or in short, we can set it to: dbms_scheduler.job_all_events.
There are many things that can be called events. In the job system, there are basically two types of events: events caused by jobs (which we already discussed) and events that makes a job execute.
On many occasions, a calendar will do fine for scheduling jobs. However, there are situations that require an immediate action and which cannot wait for the next activation based on a calendar. An example might be of a user who logs on to the database and then, using a logon trigger, more actions are executed.
Another example could be a situation in which we want a backup server to be utilized to the maximum, but not beyond that. We schedule all the backups independent of each other and have each backup raise an event when ready, which tells the system that another backup can go ahead. By letting the backup jobs wait for an event that essentially flags "there is backup capacity available now", we make sure that a backup does not take longer than needed. We also make sure that the backup system is pushed to the highest throughput.
When we just use a preset date and time to start the backups, chances are that more backups are running at the same time (possibly caused by the growth of one or more databases, which is potentially causing their backups to be longer than anticipated). On the other hand, when we make sure that more backups are never ever run at the same time, we will likely have lots of idle time in the backup system.
This is a reason enough to learn how we can make good use of events. However, there are a few things we need to do. It essentially comes down to:
- Creating a queue and defining a payload for that queue
- Having a process that puts the message on the queue
- Coupling one or more job definition(s) to the queue
Again, this gives a kind of control that is hard to find in third-party scheduling packages.
Event messages are placed on an event queue and this is handled by AQ. So we need to call the AQ packages and for that we require DBMS_AQ and DBMS_AQADM. In the days before Oracle 10g, we needed to set the AQ_TM_PROCESSES parameter to a non-zero value to work. Since Oracle 10g, this is no longer the case and we can leave the AQ_TM_PROCESSES value to zero.
First, make sure we can use AQ. Ensure that the user has the right privileges. Let's assume we have a user MARVIN created for our system. The privilege can be granted using:
select grantee, privilege, table_name
where table_name in ( 'DBMS_AQ', 'DBMS_AQADM')
and grantee = 'MARVIN';
The expected output is as shown in the following screenshot:
If this query does not show MARVIN having the EXECUTE privileges on both DBMS_AQ and DBMS_AQADM, we need to give them to our user.
As a DBA, execute the following:
grant execute on dbms_aq to marvin;
grant execute on dbms_aqadm to marvin;
grant select on dba_aq_agents to marvin;
grant create type to marvin;
alter user marvin quota unlimited on users;
dbms_aqadm.grant_system_privilege ('ENQUEUE_ANY', 'marvin', FALSE);
dbms_aqadm.grant_system_privilege ('DEQUEUE_ANY', 'marvin', FALSE);
dbms_aqadm.grant_system_privilege ('MANAGE_ANY', 'marvin', TRUE);
This makes sure that marvin has enough privileges to be able to create and use queues. Now connect as marvin, create an object type that we can use to put a message on the queue, and read from the queue later on.
create or replace type bckup_msgt as object ( msg varchar2(20) )
This defines a type consisting of one msg field of 20-character length. This is the type we will be using in the queue for which we create a queue table next:
queue_table => 'bckup_qt',
queue_payload_type => 'bckup_msgt',
multiple_consumers => TRUE
queue_name => 'bckup_q',
queue_table => 'bckup_qt'
dbms_aqadm.start_queue ( queue_name => 'bckup_q' ) ;
This creates a queue table called bckup_qt, which contains messages defined by bckup_msgt. After that, bckup_q starts immediately.
The following objects show up in the schema, which are created to support the queue table:
This also explains why MARVIN needs quota on his default tablespace. The queue definitions part is ready. Now, we can tie a job to the queue. First, create a job as follows:
job_name => '"MARVIN"."BCKUP_01"',
job_type => 'EXECUTABLE',
job_action => '/home/oracle/bin/rman.sh',
event_condition => 'tab.user_data.msg=''GO''',
queue_spec => '"MARVIN"."BCKUP_Q"',
start_date => systimestamp at time zone 'Europe/Amsterdam',
job_class => '"LONGER"',
comments => 'backup a database',
auto_drop => FALSE,
number_of_arguments => 1,
enable => FALSE
name => '"MARVIN"."BCKUP_01"',
attribute => 'raise_events',
value => dbms_scheduler.job_started +
job_name => '"MARVIN"."BCKUP_01"',
argument_position => 1,
argument_value => 'db_01'
name => '"MARVIN"."BCKUP_01"',
attribute => 'destination',
value => 'pantzer:15021'
name => '"MARVIN"."BCKUP_01"',
attribute => 'credential_name',
value => '"MARVIN"."JOBS_CRED2"'
This is just a simple remote external job that calls an RMAN script with an argument for the database to back up. As the backup will take longer than a few seconds, it looks obvious to put it in the job_class called LONGER that we defined a while ago. The queue that is coupled to this job is the queue we defined before. It is bckup_q as defined by the queue_spec parameter. As soon as the GO message appears in the payload of the queue, all of the jobs that listen to this queue and those waiting for this GO message will get started. The code listed for the MARVIN job can also be put together using DB Console. In the following Schedule screen, select Event as
As the job was not Enabled, it now looks like the following:
So, let's enable the job:
sys.dbms_scheduler.enable( '"MARVIN"."BCKUP_01"' );
This produces the following:
The job is currently scheduled, but not on a date. All we need to do now is have someone put a GO message in the bckup_q.
sys.dbms_aq.enqueue('marvin.bckup_q', enqopts, props,
The result is that all of the jobs waiting for the GO message are started at the same time. With the health of the backup system in mind, it would be wiser to query the jobs view, find the backup job that was scheduled first, and give that job its specific event. In that case, the BCKUP_01 job will wait for the message "BCKUP_01"; and BCKUP_02 will wait for the message "BCKUP_02".
Another option is that Oracle can allow us to define an event that is delivered to exactly one waiting job at a time. An enhancement request has been filed for this. It will make this kind of waiting a bit easier because normal queuing behavior is then saved. This means that things such as job priorities will be honored. When we define a separate event for every job, we have manual control but we cannot influence the selection order of the jobs in the queue, for example, by raising the priority of a job.
When a backup is ready, the backup system can handle the next backup. We can utilize the enqueue operation by putting our next GO message into the queue in the epilogue of the backup script. However, what will happen if the script crashes? The next backup will never be released. Again, a smarter location for this code would be in an event handler routine that just waits for termination messages from the Scheduler event queue. As soon as it sees the termination of a backup, it can decide to call in the next one by giving that waiting job a signal at its location.
This article gave you a quick look at the two basic types of events that play a role in the Scheduler: the event generated by status changes in a running job and the event that causes a job to run. Both are very important factors in the power of the Scheduler.