In this article, we play the number game once more. Let's build a formula to calculate the complexity of processes. The purpose is not to teach how to design good processes (the earlier chapters tackled that), but to 'score' them on their control-flow complexity and flag those that exceed a particular threshold. Processes with lower scores are more readable, maintainable, and testable than those with higher scores. Processes with high scores need rework!
Most SOA developers, today, struggle with process complexity. They write impeccable, highly-structured Java or C# code, but their processes tend to branch out and meander in every possible direction. They would never dream of embedding an if-then inside a for loop inside a while loop inside of a catch inside a do-while in Java, but they are quick to bury, deep in an SOA process, a pick inside a sequence inside a flow inside of switch inside a scope. Their processes are often far too complex.
On the other hand, when they review each others' processes, they know intuitively what 'too complex' means. Process (b) in the following figure, for example, appears much more complex than Process (c), because it has far too many arrows and is difficult to navigate. Process (c) looks tidier and more structured by comparison. Process (a) is harder to judge. Although it is well-structured and easy to navigate, it is also absurdly long; having so many steps in a sequence is poor design.
In this article, we quantify 'complex'. We build a formula that rules in favor of Process (c), and penalizes (a) for its excessive sequence and (b) for its nesting and excessive branching. In addition, we demonstrate that processes designed in 'flat form' (introduced in Chapter 6) score lower than 'naïve' processes.
Applying McCabe's Formula for BPEL and TIBCO BusinessWorks
In this section, we study McCabe's formula for complexity, and describe how it can be used to measure the complexity of processes in BPEL and BusinessWorks.
Calculating McCabe Complexity
The best-known measure of programmatic complexity is Thomas McCabe's cyclomatic complexity. In his landmark paper, published in 1976 (A Complexity Measure, IEEE Transactions on Software Engineering, v. SE-2, no. 4, pp. 308—320, http://www.literateprogramming.com/mccabe.pdf), McCabe shows how to score the complexity of computer programs (FORTRAN is McCabe's preferred language) using concepts from graph theory. McCabe's approach is twofold:
- He shows how to represent a computer program as a directed graph, with nodes representing steps and arrows representing the flow of control between these steps. The program must have exactly one entry point, from which all nodes are reachable, and exactly one exit point, which is reachable from all nodes. If the program has several modules or procedures, each is represented as a separate directed graph.
- The complexity of this graph, and thus the complexity of the computer program, is E – N + 2P, where E is the number of edges, N the number of nodes, and P the number of modules. Assuming the program is a single unit with no separate modules, its complexity is E ‑ N + 2. Alternatively, the complexity of the program is A + 1, where A is the number of decisions or alternate paths in the graph; if a node branches in D directions, it has 1 normal path and D-1 alternate paths. This second method is easier to calculate: start with 1, then for each decision node add the number of outgoing branches less one. For example, add 1 for each binary decision, 2 for each ternary decision, and so on.
Process (a) in the figure above has 16 edges and 17 nodes, so its complexity is 16 – 17 + 2, or 1. Alternatively, it has no decisions, so its complexity is 0 + 1, or 1. Process (b) has 23 edges and 10 nodes, and thus, has a complexity of 23 – 10 + 2, or 15. Alternatively, in process (b), node D3 has one alternate path (that is, it is a binary decision), D1, D4, and D5 each have two alternative paths, D6 has three alternate paths, and D2 has four alternate paths; the total number of alternate paths is thus 1 + 2 + 2 + 2 + 3 + 4, or 14. So the complexity of Process (b) is 14 + 1, or 15. Process (c) has 19 edges and 14 nodes, and thus a complexity of 19 – 14 + 2, or 7; alternatively, it has six alternate paths—five for D1, one for D2—so its complexity is 6 + 1, or 7.
McCabe advocates the use of his cyclomatic measure on actual software projects. The team should agree on an acceptable upper bound for complexity (say 20), score each program, and decide how to rework those that score too high.
McCabe's measure is not perfect. For one thing, it does not penalize processes for having too many nodes. The rather preposterous sequence of 17 activities in Process (a) in the preceding figure has a perfect McCabe score of 1. A process consisting of a million consecutive activities, or even one with as many activities as there are particles in the universe, would also score 1, and would pass review!
Secondly, McCabe does not penalize for nested branching. The two processes shown in the following figure have the same complexity, although the process on the bottom is intuitively more complex than that on the top. Each process scores 7, because each contains three ternary (or 3-way) decisions: D1, D2, and D3. In the top process, those decisions come consecutively, whereas in the bottom, they are nested three levels deep.
McCabe Complexity for BPEL
Despite its flaws, McCabe's formula forms a part of our scoring mechanism for SOA processes; it is a component of a larger measure we develop in the last section of this article. As a pre-requisite for that discussion, we now consider how to apply the McCabe score to processes developed in BPEL and in TIBCO's BusinessWorks.
BPEL processes are mapped to McCabe's directed graph form as follows:
- A BPEL sequence maps easily to a line of nodes, such as fragment shown in the next figure, where the BPEL sequence of activities A, B, and C is drawn as three nodes—A, B, and C—with arrows from A to B and from B to C. A sequence does not add to the complexity of the process.
- A BPEL pick, flow, or switch is represented as an N-ary decision in the following figure. The beginning and end points of the structure are designated by the nodes labeled Split and Join. For a pick, the branches are onMessage or onAlarm handlers. For a switch, the branches are case or otherwise structures. For a flow, the branches are the activities to be run in parallel (A, B, and C in the figure). (If the flow has inter-activity links, the links are represented as edges.) A pick, switch, or flow adds N-1 to the overall complexity.
- A BPEL while activity is represented as a GOTO-style loop. The loop begins with a conditional check called Test (as shown in the following figure) and then branches either to A (to perform the while activity if the condition is true), or out of the loop to End. When A completes, it loops back to Test for another iteration. As it contains one binary decision, the while structure adds 1 to the complexity.
- Error handling is dicier. A single unnested scope with a single error handler and N basic activities (at any level within the scope) adds as much as N to the complexity, assuming each of those basic activities might encounter the error that the handler is meant to catch. The reason for this is that each basic activity must have a binary choice whether to continue down the happy path or route directly to the handler. The scope shown in process (a) in the following figure, contains activities A, B, and C in a sequence (Sc denotes the start of the scope, End its end), but each of these activities has an error path to the handler Error, thus adding 3 to the complexity. In cases with nested scopes and multiple handlers, things get more complicated. An example of this is shown in Process (b). The outer scope, bounded by Sc1 and End1, has two handlers: E11, which is accessible from activity A; and E12, accessible from E. The inner scope, bounded by Sc2 and End2, has its own two handlers, E21 and E22, both of which are accessible from the activities B and C. The E21 handler, in turn, can throw an exception to the outer handler E11. The complexity introduced by all of this is 7: one for the decision at A, one for the decision at E, one for the decision at E21, and two each for the decisions at B and C.
With these fundamentals out of the way, we now calculate the McCabe complexity of more substantial BPEL processes. We start with the event-based flat-form representation, shown in the following figure:
In the previous figure (and in those that follow), a small circle with a label of the form '+N' indicates the number of alternate paths for a BPEL activity. For example, the '+14' next to the pick indicates that there are 15 handlers in the pick, or 14 alternate paths introduced by the pick.
The event-based process has a score of 22, which is broken down as follows:
- We add 1 for the while loop.
- We add 14 for the main switch (which has 15 handlers).
- Three of the switch cases have inner switches of 2, 3, and 4 cases, so we add 1 + 2 + 3 for these.
- There are 21 alternate paths, the sum of those counted in the previous bullets. Using the formula A + 1, we add 1 to 21 to get a score of 22.
The state-based disputes process is shown in the following figure:
The McCabe complexity of the state-based process is 37, which implies that it has 36 alternate paths. These paths are the following:
- The while loop has one.
- The main switch has five cases and therefore four alternate paths.
- Three cases in the main switch have inner switches of 2, 5, and 5 cases. These inner switches therefore have 1, 4, and 4 alternate paths.
- There are nested picks in these inner switches with 5, 3, 2, 2, 2, 3, 4, 3, 3, 3, 2, and 2 handlers. The number of alternate paths in the nested picks is therefore the sum of 4, 2, 1, 1, 1, 2, 3, 2, 2, 2, 1, and 2.
The flow-based process is shown in the following figure:
The flow-based process scores 35. Its 34 alternate paths break down as follows:
- The while loop adds 1.
- The outer switch, which has 12 cases, adds 11.
- Several cases contain inner picks. Altogether the inner cases add 22 to the score. The reader can quickly verify this from the figure.
The naïve representation, shown in the following figure, scores 23:
The process is divided into three parts:
- The Capturing part has a score of 5: 1 for the while loop, 3 for the outer pick (which has 4 handlers), and 1 for the inner pick (with 2 handlers).
- The Investigating part scores 9. The outer pick, which has 4 handlers, scores 3. The inner picks have 2, 2, 2, and 4 handlers, and thus have 1, 1, 1, and 3 alternative paths. The total is thus 3 + 1 + 1 + 1 + 3, or 9.
- The Charging Back stage has picks at various levels of nesting with 3, 3, 3, 2, and 2 handlers, so its score is 2 + 2 + 2 + 1 + 1, or 8.
The three parts have 5 + 9 + 8 alternate paths, or 22 in total. The score for the process is therefore 1 + 22, or 23. (Note that the scoped fault handlers don't add to the complexity as they are linked directly to throw activities in the main flow.)
The following table
Form |
Score |
Calculation |
Naïve |
23 |
The process is divided into three parts. For the Capturing part, add 1 for the while loop, 3 for the outer pick (which has 4 handlers), and 1 for the inner pick (with 2 handlers), for a total of 5. For the Investigating part, add 3 for the outer pick (with 4 handlers); the inner picks have 2, 2, 2, and 4 handlers, so add 1, 1, 1, and 3 for these. The total for the Investigating part is thus 3 + 1 + 1 + 1 + 3, or 9. The Charging Back stage has picks at various levels of nesting with 3, 3, 3, 2, and 2 handlers, so its score is 2 + 2 + 2 + 1 + 1, or 8. The scoped fault handlers don't add to the complexity since they are linked directly to throw activities in the main flow. |
State-Based |
37 |
Add 1 for the while loop. Add 4 for the main switch (which has 5 cases). Three cases in the main switch have inner switches of 2, 5, and 5 cases, so add 1 + 4 + 4 for this. There are nested picks in these inner switches with 5, 3, 2, 2, 2, 3, 4, 3, 3, 3, 2, and 2 handlers, so add 4, 2, 1, 1, 1, 2, 3, 2, 2, 2, 1, and 1 for this. |
Event-Based |
22 |
Add 1 for the while loop. Add 14 for the main switch (which has 15 handlers). Three of the switch cases have inner switches of 2, 3, and 4 cases, so add 1 + 2 + 3 for this. |
Flow-Based |
35 |
Add 1 for the while loop. Add 11 for the outer switch, which has 12 cases. Several cases contain inner picks. Altogether the inner cases add 22 to the score. |
The results are astonishing and reinforce why McCabe scoring by itself is insufficient. The naïve process, with a complexity of 23, is, according to the McCabe number, much simpler than the state-based and flow-based representations, which score 37 and 35 respectively. The event-based representation has the lowest score, 22, but beats the naïve form by a margin of only one. What's happening? McCabe scoring favours the naïve approach. Part of the answer, as we discuss further in the last section of this article, is overhead: most of the decisioning in flat form is machinery, and if we deduct its cost from the score, we achieve the results we were looking for. Specifically:
- In the event form, the outer while and outer pick, which together form the 'event loop', account for 15 of the 18 complexity points, and hence are overhead.
- In the flow form, the outer while and outer switch, which together form the 'route loop', account for 12 of the 35 complexity points, are thus overhead.
- The complexity of the state form is entirely overhead! The while drives the machine, the switch selects states, and the pick drives transitions.
McCabe Complexity for TIBCO's BusinessWorks
As a contrast to BPEL—a block-structured language that also allows, through its flow activity, a graph-structured modelling style—consider TIBCO's process integration centrepiece BusinessWorks, a graph-structured SOA process language with support for block-structured conditionals, loops, picks, and exception handlers. The top-half of the following figure shows a BusinessWorks process that mixes these two styles:
In the happy path, the process logs its starting point (TraceStart) and formats its input request (Format Request) before entering into a for-each-style loop, called CreateCases. (The loop is enclosed in a box known as an iterate group; the symbol at the top left corner of the box is an 'i' in a counter-clockwise arrow, which conveys iteration.) The group moves through a list of items, for each creating a case (CreateCase) and populating a database (CreateReqOnDB). When the loop is finished, the process performs either Assign Email IDs or UseDefault, following a conditional path to one or of the other, before finally sending an email (Send Email Notification), logging its status (TraceEnd), and completing. If an error occurs along the way, the process catches it (Catch) and cleans itself up (Cleanup).
The bottom half of the diagram shows the process as a directed graph that can be scored for McCabe complexity. We represent the loop and the exception handler the same way we did in BPEL: the while starts with a node (Start Iter) that branches either to its first activity (Create Case) or to the end (End Iter), and the error handler (Catch) is linked from every node in the happy path that can encounter the error. The complexity, which is 13, is calculated as follows:
- Eight activities (Trace Start, Format Request, Create Case, Create Req on DB, Assign Email IDs, Use Default, Send Email Notification, and Trace End) have a binary decision either to continue on the happy path or route to the error handler, and thus, each adds one to the complexity.
- Start Iter and End Iter are ternary decisions, and thus, each adds two to the complexity.
- In total, there are 12 alternate paths, so the complexity is 13.
Summary
In this article, we discussed the following applying McCabe's Formula for BPEL and TIBCO BusinessWorks which includes:
- Calculating McCabe Complexity
- McCabe Complexity for BPEL
- McCabe Complexity for TIBCO's BusinessWorks