- Create a file called network.py to store the network application.
- Create a class definition to represent a network event.
class Event(object):
def __init__(self, hostname, condition, severity, event_time):
self.hostname = hostname
self.condition = condition
self.severity = severity
self.id = -1
def __str__(self):
return "(ID:%s) %s:%s - %s" % (self.id, self.hostname,
self.condition, self.severity)
- hostname: It is assumed that all network alarms originate from pieces of equipment that have a hostname.
- condition: Indicates the type of alarm being generated. Two different alarming conditions can come from the same device.
- severity: 1 indicates a clear, green status; and 5 indicates a faulty, red status.
- id: The primary key value used when the event is stored in a database.
- Create a new file called network.sql to contain the SQL code.
- Create a SQL script that sets up the database and adds the definition for storing network events.
CREATE TABLE EVENTS (
ID INTEGER PRIMARY KEY,
HOST_NAME TEXT,
SEVERITY INTEGER,
EVENT_CONDITION TEXT
);
- Code a high-level algorithm where events are assessed for impact to equipment and customer services and add it to network.py.
from springpython.database.core import*
class EventCorrelator(object):
def __init__(self, factory):
self.dt = DatabaseTemplate(factory)
def __del__(self):
del(self.dt)
def process(self, event):
stored_event, is_active = self.store_event(event)
affected_services, affected_equip = self.impact(event)
updated_services = [
self.update_service(service, event)
for service in affected_services]
updated_equipment = [
self.update_equipment(equip, event)
for equip in affected_equip]
return (stored_event, is_active, updated_services,
updated_equipment)
The __init__ method contains some setup code to create a DatabaseTemplate. This is a Spring Python utility class used for database operations. See http://static.springsource.org/spring- python/1.2.x/sphinx/html/dao.html for more details. We are also using sqlite3 as our database engine, since it is a standard part of Python.
The process method contains some simple steps to process an incoming event.
- We first need to store the event in the EVENTS table. This includes evaluating whether or not it is an active event, meaning that it is actively impacting a piece of equipment.
- Then we determine what equipment and what services the event impacts.
- Next, we update the affected services by determining whether it causes any service outages or restorations.
- Then we update the affected equipment by determining whether it fails or clears a device.
- Finally, we return a tuple containing all the affected assets to support any screen interfaces that could be developed on top of this.
- Implement the store_event algorithm.
def store_event(self, event):
try:
max_id = self.dt.query_for_int("""select max(ID)
from EVENTS""")
except DataAccessException, e:
max_id = 0
event.id = max_id+1
self.dt.update("""insert into EVENTS
(ID, HOST_NAME, SEVERITY,
EVENT_CONDITION)
values
(?,?,?,?)""",
(event.id, event.hostname,
event.severity, event.condition))
is_active =
self.add_or_remove_from_active_events(event)
return (event, is_active)
This method stores every event that is processed. This supports many things including data mining and post mortem analysis of outages. It is also the authoritative place where other event-related data can point back using a foreign key.
- The store_event method looks up the maximum primary key value from the EVENTS table.
- It increments it by one.
- It assigns it to event.id.
- It then inserts it into the EVENTS table.
- Next, it calls a method to evaluate whether or not the event should be add to the list of active events, or if it clears out existing active events. Active events are events that are actively causing a piece of equipment to be unclear.
- Finally, it returns a tuple containing the event and whether or not it was classified as an active event.
For a more sophisticated system, some sort of partitioning solution needs to be implemented. Querying against a table containing millions of rows is very inefficient. However, this is for demonstration purposes only, so we will skip scaling as well as performance and security.
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime
- Implement the method to evaluate whether to add or remove active events.
def add_or_remove_from_active_events(self, event):
"""Active events are current ones that cause equipment
and/or services to be down."""
if event.severity == 1:
self.dt.update("""delete from ACTIVE_EVENTS
where EVENT_FK in (
select ID
from EVENTS
where HOST_NAME = ?
and EVENT_CONDITION = ?)""",
(event.hostname,event.condition))
return False
else:
self.dt.execute("""insert into ACTIVE_EVENTS
(EVENT_FK) values (?)""",
(event.id,))
return True
When a device fails, it sends a severity 5 event. This is an active event and in this method, a row is inserted into the ACTIVE_EVENTS table, with a foreign key pointing back to the EVENTS table. Then we return back True, indicating this is an active event.
- Add the table definition for ACTIVE_EVENTS to the SQL script.
CREATE TABLE ACTIVE_EVENTS (
ID INTEGER PRIMARY KEY,
EVENT_FK,
FOREIGN KEY(EVENT_FK) REFERENCES EVENTS(ID)
);
This table makes it easy to query what events are currently causing equipment failures.
Later, when the failing condition on the device clears, it sends a severity 1 event. This means that severity 1 events are never active, since they aren't contributing to a piece of equipment being down. In our previous method, we search for any active events that have the same hostname and condition, and delete them. Then we return False, indicating this is not an active event.
- Write the method that evaluates the services and pieces of equipment that are affected by the network event.
def impact(self, event):
"""Look up this event has impact on either equipment
or services."""
affected_equipment = self.dt.query(
"""select * from EQUIPMENT
where HOST_NAME = ?""",
(event.hostname,),
rowhandler=DictionaryRowMapper())
affected_services = self.dt.query(
"""select SERVICE.*
from SERVICE
join SERVICE_MAPPING SM
on (SERVICE.ID = SM.SERVICE_FK)
join EQUIPMENT
on (SM.EQUIPMENT_FK = EQUIPMENT.ID
where EQUIPMENT.HOST_NAME = ?""",
(event.hostname,),
rowhandler=DictionaryRowMapper())
return (affected_services, affected_equipment)
- We first query the EQUIPMENT table to see if event.hostname matches anything.
- Next, we join the SERVICE table to the EQUIPMENT table through a many-to many relationship tracked by the SERVICE_MAPPING table. Any service that is related to the equipment that the event was reported on is captured.
- Finally, we return a tuple containing both the list of equipment and list of services that are potentially impacted.
Spring Python provides a convenient query operation that returns a list of objects mapped to every row of the query. It also provides an out-of-the-box DictionaryRowMapper that converts each row into a Python dictionary, with the keys matching the column names.
- Add the table definitions to the SQL script for EQUIPMENT, SERVICE, and SERVICE_MAPPING.
CREATE TABLE EQUIPMENT (
ID INTEGER PRIMARY KEY,
HOST_NAME TEXT UNIQUE,
STATUS INTEGER
);
CREATE TABLE SERVICE (
ID INTEGER PRIMARY KEY,
NAME TEXT UNIQUE,
STATUS TEXT
);
CREATE TABLE SERVICE_MAPPING (
ID INTEGER PRIMARY KEY,
SERVICE_FK,
EQUIPMENT_FK,
FOREIGN KEY(SERVICE_FK) REFERENCES SERVICE(ID),
FOREIGN KEY(EQUIPMENT_FK) REFERENCES EQUIPMENT(ID)
);
- Write the update_service method that stores or clears service-related even and then updates the service's status based on the remaining active events.
def update_service(self, service, event):
if event.severity == 1:
self.dt.update("""delete from SERVICE_EVENTS
where EVENT_FK in (
select ID
from EVENTS
where HOST_NAME = ?
and EVENT_CONDITION = ?)""",
(event.hostname,event.condition))
else:
self.dt.execute("""insert into SERVICE_EVENTS
(EVENT_FK, SERVICE_FK)
values (?,?)""",
(event.id,service["ID"]))
try:
max = self.dt.query_for_int(
"""select max(EVENTS.SEVERITY)
from SERVICE_EVENTS SE
join EVENTS
on (EVENTS.ID = SE.EVENT_FK)
join SERVICE
on (SERVICE.ID = SE.SERVICE_FK)
where SERVICE.NAME = ?""",
(service["NAME"],))
except DataAccessException, e:
max = 1
if max > 1 and service["STATUS"] == "Operational":
service["STATUS"] = "Outage"
self.dt.update("""update SERVICE
set STATUS = ?
where ID = ?""",
(service["STATUS"], service["ID"]))
if max == 1 and service["STATUS"] == "Outage":
service["STATUS"] = "Operational"
self.dt.update("""update SERVICE
set STATUS = ?
where ID = ?""",
(service["STATUS"], service["ID"]))
if event.severity == 1:
return {"service":service, "is_active":False}
else:
return {"service":service, "is_active":True}
Service-related events are active events related to a service. A single event can be related to many services. For example, what if we were monitoring a wireless router that provided Internet service to a lot of users, and it reported a critical error? This one event would be mapped as an impact to all the end users. When a new active event is processed, it is stored in SERVICE_EVENTS for each related service.
Then, when a clearing event is processed, the previous service event must be deleted from the SERVICE_EVENTS table.
- Add the table defnition for SERVICE_EVENTS to the SQL script.
CREATE TABLE SERVICE_EVENTS (
ID INTEGER PRIMARY KEY,
SERVICE_FK,
EVENT_FK,
FOREIGN KEY(SERVICE_FK) REFERENCES SERVICE(ID),
FOREIGN KEY(EVENT_FK) REFERENCES EVENTS(ID)
);
It is important to recognize that deleting an entry from SERVICE_EVENTS doesn't mean that we delete the original event from the EVENTS table. Instead, we are merely indicating that the original active event is no longer active and it does not impact the related service.
- Prepend the entire SQL script with drop statements, making it possible to run the script for several recipes
DROP TABLE IF EXISTS SERVICE_MAPPING;
DROP TABLE IF EXISTS SERVICE_EVENTS;
DROP TABLE IF EXISTS ACTIVE_EVENTS;
DROP TABLE IF EXISTS EQUIPMENT;
DROP TABLE IF EXISTS SERVICE;
DROP TABLE IF EXISTS EVENTS;
- Append the SQL script used for database setup with inserts to preload some equipment and services.
INSERT into EQUIPMENT (ID, HOST_NAME, STATUS) values (1,
'pyhost1', 1);
INSERT into EQUIPMENT (ID, HOST_NAME, STATUS) values (2,
'pyhost2', 1);
INSERT into EQUIPMENT (ID, HOST_NAME, STATUS) values (3,
'pyhost3', 1);
INSERT into SERVICE (ID, NAME, STATUS) values (1, 'service-abc',
'Operational');
INSERT into SERVICE (ID, NAME, STATUS) values (2, 'service-xyz',
'Outage');
INSERT into SERVICE_MAPPING (SERVICE_FK, EQUIPMENT_FK) values
(1,1);
INSERT into SERVICE_MAPPING (SERVICE_FK, EQUIPMENT_FK) values
(1,2);
INSERT into SERVICE_MAPPING (SERVICE_FK, EQUIPMENT_FK) values
(2,1);
INSERT into SERVICE_MAPPING (SERVICE_FK, EQUIPMENT_FK) values
(2,3);
- Finally, write the method that updates equipment status based on the current active events.
def update_equipment(self, equip, event):
try:
max = self.dt.query_for_int(
"""select max(EVENTS.SEVERITY)
from ACTIVE_EVENTS AE
join EVENTS
on (EVENTS.ID = AE.EVENT_FK)
where EVENTS.HOST_NAME = ?""",
(event.hostname,))
except DataAccessException:
max = 1
if max != equip["STATUS"]:
equip["STATUS"] = max
self.dt.update("""update EQUIPMENT
set STATUS = ?""",
(equip["STATUS"],))
return equip