Hands-On Software Engineering with Python: Move beyond basic programming and construct reliable and efficient software with complex code

Nimesh Verma

Brian Allbee

Paperback Oct 2018 736 pages 1st Edition

Programming versus Software Engineering

Development shops often have specific levels, grades, or ranks that their developers fall into, indicating the levels of experience, expertise, and industry wisdom expected of staff at each level. These may vary (perhaps wildly) from location to location, but a typical structure looks something like the following:

Junior developers: A junior developer is typically someone who doesn't have much programming experience. They probably know the basics of writing code, but they are not expected to know much beyond that.
Developers: Mid-level developers (referred to by whatever formal title might apply) usually have enough experience that they can be relied on to write reasonably solid code, with little to no supervision. They probably have enough experience to determine implementation details and strategies, and they will often have some understanding of how different chunks of code can (and do) interact with each other, and what approaches will minimize difficulties in those interactions.
Senior developers: Senior developers have enough experience - even if it's focused on a set of specific products/projects - to firmly grasp all of the technical skills involved in typical development efforts. At this point in their careers, they will almost always have a solid handle on a lot of the non-technical (or semi-technical) skills that are involved, as well—especially policies and procedures, and strategies and tactics that encourage or enforce business values such as stability and the predictability of development efforts. They may not be experts in those areas, but they will know when to call out risks, and they will often have several options to suggest for mitigating those risks.

Above the level of the senior developer, the terminology and definitions often vary even more wildly, and the skill set usually starts to focus more on business-related abilities and responsibilities (scope and influence) than on technical capabilities or expertise.

The dividing line between programming and software engineering falls somewhere within the differences between developers and senior developers, as far as technical capabilities and expertise are concerned. At a junior level, and sometimes at a developer level, efforts are often centered around nothing more than writing code to meet whatever requirements apply, and conforming to whatever standards are in play. Software engineering, at a senior developer level, has a big-picture view of the same end results. The bigger picture involves awareness of, and attention paid to, the following things:

Standards, both technical/developmental and otherwise, including best practices
The goals that code is written to accomplish, including the business values that are attached to them
The shape and scope of the entire system that the code is a part of

The bigger picture

So, what does this bigger picture look like? There are three easily-identifiable areas of focus, with a fourth (call it user interaction) that either weaves through the other three or is broken down into its own groups.

Software engineering must pay heed to standards, especially non-technical (business) ones, and also best practices. These may or may not be followed but, since they are standards or best practices for a reason, not following them is something that should always be a conscious (and defensible) decision. It's not unusual for business-process standards and practices to span multiple software components, which can make them difficult to track if a certain degree of discipline and planning isn't factored into the development process to make them more visible. On the purely development-related side, standards and best practices can drastically impact the creation and upkeep of code, its ongoing usefulness, and even just the ability to find a given chunk of code, when necessary.

It's rare for code to be written simply for the sake of writing code. There's almost always some other value associated with it, especially if there's business value or actual revenue associated with a product that the code is a part of. In those cases, understandably, the people who are paying for the developmental effort will be very interested in ensuring that everything works as expected (code-quality) and can be deployed when expected (process-predictability).

Code-quality concerns will be addressed during the development of the hms_sys project a few chapters from now, and process-predictability is mostly impacted by the developmental methodologies discussed in Chapter 5, The hms_sys System-Project.

The remaining policy-and-procedure related concerns are generally managed by setting up and following various standards, processes, and best practices during the startup of a project (or perhaps a development team). Those items - things such as setting up source control, having standard coding conventions, and planning for repeatable, automated testing - will be examined in some detail during the setup chapter for the hms_sys project. Ideally, once these kinds of developmental processes are in place, the ongoing activities that keep them running and reliable will just become habits, a part of the day-to-day process, almost fading into the background.

Finally, with more of a focus on the code side, software engineering must, by necessity, pay heed to entire systems, keeping a universal view of the system in mind. Software is composed of a lot of elements that might be classified as atomic; they are indivisible units in and of themselves, under normal circumstances. Just like their real-world counterparts, when they start to interact, things get interesting, and hopefully useful. Unfortunately, that's also when unexpected (or even dangerous) behaviors—bugs—usually start to appear.

This awareness is, perhaps, one of the more difficult items to cultivate. It relies on knowledge that may not be obvious, documented, or readily available. In large or complex systems, it may not even be obvious where to start looking, or what kinds of questions to ask to try to find the information needed to acquire that knowledge.

Asking questions

There can be as many distinct questions that can be asked about any given chunk of code as there are chunks of code to ask about—even very simple code, living in a complex system, can raise questions in response to questions, and more questions in response to those questions.

If there isn't an obvious starting point, starting with the following really basic questions is a good first step:

Who will be using the functionality?
What will they be doing with it?
When, and where, will they have access to it?
What problem is it trying to solve? In other words, why do they need it?
How does it have to work? If detail is lacking, breaking this one down into two separate questions is useful:
- What should happen if it executes successfully?
- What should happen if the execution fails?

Teasing out more information about the whole system usually starts with something as basic as the following questions:

What other parts of the system does this code interact with?
How does it interact with them?

Having identified all of the moving parts, thinking about "What happens if…" scenarios is a good way to identify potential points where things will break, risks, and dangerous interactions. You can ask questions such as the following:

What happens if this argument, which expects a number, is handed a string?
What happens if that property isn't the object that's expected?
What happens if some other object tries to change this object while it's already being changed?

Whenever one question has been answered, simply ask, What else? This can be useful for verifying whether the current answer is reasonably complete.
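
Many "What happens if…" questions can be answered by simply trying the call and recording the outcome. The following is a minimal sketch of that idea; the probe helper and the halve function are hypothetical names used only for illustration, and are not part of the system described in this chapter:

```python
def probe(func, *args, **kwargs):
    """Call func and report what happened instead of letting it crash."""
    try:
        return ('returned', func(*args, **kwargs))
    except Exception as error:
        return ('raised', type(error).__name__)


def halve(number):
    # Hypothetical function under inspection; it expects a number
    return number / 2


# What happens if the argument is a number, as expected?
print(probe(halve, 10))     # ('returned', 5.0)
# What happens if this argument, which expects a number, is handed a string?
print(probe(halve, 'ten'))  # ('raised', 'TypeError')
# What else? For example, what if nothing meaningful is passed at all?
print(probe(halve, None))   # ('raised', 'TypeError')
```

Recording the outcome, rather than stopping at the first failure, makes it easier to work through a whole list of questions in one pass.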

Let's see this process in action. To provide some context, a new function is being written for a system that keeps track of mineral resources on a map-grid, for three resources: gold, silver, and copper. Grid locations are measured in meters from a common origin point, and each grid location keeps track of a floating-point number, from 0.0 to 1.0, which indicates how likely it is that the resource will be found in the grid square. The developmental dataset already includes four default nodes - at (0,0), (0,1), (1,0), and (1,1) - with no values, as follows:

The system already has some classes defined to represent individual map nodes, and functions to provide basic access to those nodes and their properties, from whatever central data store they live in:

Constants, exceptions, and functions for various purposes already exist, as follows:

node_resource_names: This contains all of the resource names that the system is concerned with, and can be thought of and treated as a list of strings: ['gold','silver','copper']
NodeAlreadyExistsError: An exception that will be raised if an attempt is made to create a MapNode that already exists
NonexistentNodeError: An exception that will be raised if a request is made for a MapNode that doesn't exist
OutOfMapBoundsError: An exception that will be raised if a request is made for a MapNode that isn't allowed to exist in the map area
create_node(x,y): Creates and returns a new, default MapNode, registering it in the global dataset of nodes in the process
get_node(x,y): Finds and returns a MapNode at the specified (x, y) coordinate location in the global dataset of available nodes

A developer makes an initial attempt at writing the code to set a value for a single resource at a given node, as a part of a project. The resulting code looks as follows (assume that all necessary imports already exist):

def SetNodeResource(x, y, z, r, v):
    n = get_node(x,y)
    n.z = z
    n.resources.add(r, v)

This code is functional, from the perspective that it will do what it's supposed to (and what the developer expected) for a set of simple tests; for example, executing the following:

SetNodeResource(0,0,None,'gold',0.25)
print(get_node(0,0))
SetNodeResource(0,0,None,'silver',0.25)
print(get_node(0,0))
SetNodeResource(0,0,None,'copper',0.25)
print(get_node(0,0))

The results are in the following output:

By that measure, there's nothing wrong with the code and its functions, after all. Now, let's ask some of our questions, as follows:

Who will be using this functionality?: The function may be called by either of two different application front-ends: one used by on-site surveyors, the other by post-survey assayers. The surveyors probably won't use it often, but if they see obvious signs of a deposit during the survey, they're expected to log it with a 100% certainty of finding the resource(s) at that grid location; otherwise, they'll leave the resource rating completely alone.

What will they be doing with it?: Between the base requirements (to set a value for a single resource at a given node) and the preceding answer, this feels like it's already been answered.

When, and where, do they have access to it?: Through a library that's used by the surveyor and assayer applications. No one will use it directly, but it will be integrated into those applications.

How should it work?: This has already been answered, but raises the question: Will there ever be a need to add more than one resource rating at a time? That's probably worth noting, if there's a good place to implement it.

What other parts of the system does this code interact with?: There's not much here that isn't obvious from the code; it uses MapNode objects, those objects' resources, and the get_node function.

What happens if an attempt is made to alter an existing MapNode?: With the code as it was originally written, this behaves as expected. This is the happy path that the code was written to handle, and it works.

What happens if a node doesn't already exist?: The fact that there is a NonexistentNodeError defined is a good clue that at least some map operations require a node to exist before they can complete. Execute a quick test against that by calling the existing function, as follows:

SetNodeResource(0,6,None,'gold',0.25)

The preceding command results in the following:

This is the result because the development data doesn't have a MapNode at that location yet.

What happens if a node can't exist at a given location?: Similarly, there's an OutOfMapBoundsError defined. Since there are no out-of-bounds nodes in the development data, and the code won't currently get past the fact that an out-of-bounds node doesn't exist, there's no good way to see what happens if this is attempted.

What happens if the z-value isn't known at the time?: Since the create_node function doesn't even expect a z-value, but MapNode instances have one, there's a real risk that calling this function on an existing node would overwrite that node's existing z-altitude value. That, in the long run, could be a critical bug.

Does this meet all of the various developmental standards that apply?: Without any details about standards, it's probably fair to assume that any standards that were defined would include, at a minimum, the following:
- Naming conventions for code elements, such as function names and arguments; with an existing function at the same logical level named get_node, using SetNodeResource as the name of the new function, while perfectly legal syntactically, may violate a naming convention standard.
- At least some effort toward documentation, of which there's none.
- Some inline comments (maybe), if there is a need to explain parts of the code to future readers. There are none of these either, although, given the amount of code in this version and the relatively straightforward approach, it's arguable whether there would be any need.

What should happen if the execution fails?: It should probably throw explicit errors, with reasonably detailed error messages, if something fails during execution.

What happens if an invalid value is passed for any of the arguments?: Some of them can be tested by executing the current function (as was done previously), while supplying invalid arguments: an out-of-range number first, then an invalid resource name.

Consider the following code, executed with an invalid number:

SetNodeResource(0,0,None,'gold',2)

The preceding code results in the following output:

Also, consider the following code, with an invalid resource type:

SetNodeResource(0,0,None,'tin',0.25)

The preceding code results in the following:

The function itself can either succeed or raise an error during execution, judging by these examples; so, ultimately, all that really needs to happen is that those potential errors have to be accounted for, in some fashion.
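
Of the risks uncovered by these questions, the z-value overwrite is the easiest to miss, because nothing visibly fails when it happens. The following minimal sketch contrasts assigning z unconditionally with assigning it only when a value is supplied. It uses a hypothetical stand-in object and hypothetical helper names (set_z_always and set_z_if_given) rather than the system's MapNode class, and the altitude of 120 is an arbitrary example value:

```python
from types import SimpleNamespace


def set_z_always(node, z):
    # Mirrors the first draft: z is assigned even when it is None
    node.z = z


def set_z_if_given(node, z=None):
    # Only touch the altitude when the caller actually supplied one
    if z is not None:
        node.z = z


# Hypothetical stand-in for an existing node with a known altitude
surveyed = SimpleNamespace(x=0, y=0, z=120)
set_z_always(surveyed, None)
print(surveyed.z)  # None: the known altitude has been lost

guarded = SimpleNamespace(x=0, y=0, z=120)
set_z_if_given(guarded)
print(guarded.z)   # 120: the known altitude is preserved
```

Making z optional, and leaving the existing value alone when it isn't supplied, is one way to account for callers that don't know the altitude at the time of the call.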

Other questions may come to mind, but the preceding questions are enough to implement some significant changes. The final version of the function, after considering the implications of the preceding answers and working out how to handle the issues that those answers exposed, is as follows:

def set_node_resource(x, y, resource_name, 
    resource_value, z=None):
    """
Sets the value of a named resource for a specified 
node, creating that node in the process if it doesn't 
exist.

Returns the MapNode instance.

Arguments:
 - x ................ (int, required, non-negative) The
                      x-coordinate location of the node 
                      that the resource type and value is 
                      to be associated with.
 - y ................ (int, required, non-negative) The 
                      y-coordinate location of the node 
                      that the resource type and value is 
                      to be associated with.
 - z ................ (int, optional, defaults to None) 
                      The z-coordinate (altitude) of the 
                      node.
 - resource_name .... (str, required, member of 
                      node_resource_names) The name of the 
                      resource to associate with the node.
 - resource_value ... (float, required, between 0.0 and 1.0, 
                      inclusive) The presence of the 
                      resource at the node's location.

Raises
 - RuntimeError if any errors are detected.
"""
    # Get the node, if it exists
    try:
        node = get_node(x,y)
    except NonexistentNodeError:
        # The node doesn't exist, so create it and 
        # populate it as applicable
        node = create_node(x, y)
    # If z is specified, set it
    if z is not None:
        node.z = z
# TODO: Determine if there are other exceptions that we can 
#       do anything about here, and if so, do something 
#       about them. For example:
#    except Exception as error:
#        # Handle this exception
    # FUTURE: If there's ever a need to add more than one 
    #    resource-value at a time, we could add **resources 
    #    to the signature, and call node.resources.add once 
    #    for each resource.
    # All our values are checked and validated by the add 
    # method, so set the node's resource-value
    try:
        node.resources.add(resource_name, resource_value)
        # Return the newly-modified/created node in case 
        # we need to keep working with it.
        return node
    except Exception as error:
        raise RuntimeError(
            'set_node_resource could not set %s to %0.3f '
            'on the node at (%d,%d).'
            % (resource_name, resource_value, node.x,
            node.y)
        ) from error

Stripping out the comments and documentation for the moment, this may not look much different from the original code—only nine lines of code were added—but the differences are significant, as follows:

It doesn't assume that a node will always be available.
If the requested node doesn't exist, it creates a new one to operate on, using the existing function defined for that purpose.
It doesn't assume that every attempt to add a new resource will succeed.
When such an attempt fails, it raises an error that shows what happened.

All of these additional items are direct results of the questions asked earlier, and of making conscious decisions on how to deal with the answers to those questions. That kind of end result is where the difference between the programming and software engineering mindsets really appears.

Key benefits

Master the tools and techniques used in software engineering

Evaluate available database options and select one for the final Central Office system components

Experience the iterations software goes through and craft enterprise-grade systems

Description

Software Engineering is about more than just writing code—it includes a host of soft skills that apply to almost any development effort, no matter what the language, development methodology, or scope of the project. Being a senior developer all but requires awareness of how those skills, along with their expected technical counterparts, mesh together through a project's life cycle. This book walks you through that discovery by going over the entire life cycle of a multi-tier system and its related software projects. You'll see what happens before any development takes place, and what impact the decisions and designs made at each step have on the development process. The development of the entire project, over the course of several iterations based on real-world Agile iterations, will be executed, sometimes starting from nothing, in one of the fastest growing languages in the world—Python. Application of practices in Python will be laid out, along with a number of Python-specific capabilities that are often overlooked. Finally, the book will implement a high-performance computing solution, from first principles through complete foundation.

What you will learn

Understand what happens over the course of a system's life (SDLC)

Establish what to expect from the pre-development life cycle steps

Find out how the development-specific phases of the SDLC affect development

Uncover what a real-world development process might be like, in an Agile way

Find out how to do more than just write the code

Identify the existence of project-independent best practices and how to use them

Find out how to design and implement a high-performance computing process
