## Background briefing – math and numbers

We'll review basics of Python programming before we start any of our more serious missions. If you already know a little Python, this should be a review. If you don't know any Python, this is just an overview and many details will be omitted.

If you've never done any programming before, this briefing may be a bit too brief. You might want to get a more in-depth tutorial. If you're completely new to programming, you might want to look over this page for additional tutorials: https://wiki.python.org/moin/BeginnersGuide/NonProgrammers. For more help to start with expert Python programming, go to http://www.packtpub.com/expert-python-programming/book.

### The usual culprits

Python provides the usual mix of arithmetic and comparison operators. However, there are some important wrinkles and features. Rather than assuming you're aware of them, we'll review the details.

The conventional arithmetic operators are: `+`, `-`, `*`, `/`, `//`, `%`, and `**`. There are two variations on division: an exact division (`/`) and an integer division (`//`). You must choose whether you want an exact, floating-point result, or an integer result:

    >>> 355/113
    3.1415929203539825
    >>> 355//113
    3
    >>> 355.0/113.0
    3.1415929203539825
    >>> 355.0//113.0
    3.0

The exact division (`/`) produces a `float` result from two integers. The integer division produces an integer result. When we use `float` values, we expect exact division to produce `float`. Even with two floating-point values, the integer division produces a rounded-down floating-point result.

We have this extra division operator to avoid having to use wordy constructs such as `int(a/b)` or `math.floor(a/b)`.
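One detail worth checking for ourselves is how `//` treats negative values. The following sketch (the variable names are ours, purely for illustration) compares `//` with the two wordier constructs. It suggests that `//` behaves like `math.floor(a/b)`, while `int(a/b)` truncates toward zero:

```python
import math

a, b = -7, 2

floor_div = a // b              # integer division rounds down
via_floor = math.floor(a / b)   # same answer, more typing
via_int = int(a / b)            # int() truncates toward zero instead

print(floor_div, via_floor, via_int)
```

For positive values all three agree, so the difference only matters when a value can go negative.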

Beyond conventional arithmetic, there are some additional **bit fiddling** operators that are available: `&`, `|`, `^`, `>>`, `<<`, and `~`. These operators work on integers (and sets). These are emphatically not Boolean operators; they don't work on the narrow domain of `True` and `False`. They work on the individual bits of an integer.

We'll use binary values with the `0b` prefix to show what the operators do, as shown in the following code. We'll look at details of this `0b` prefix later.

    >>> bin(0b0101 & 0b0110)
    '0b100'
    >>> bin(0b0101 ^ 0b0110)
    '0b11'
    >>> bin(0b0101 | 0b0110)
    '0b111'
    >>> bin(~0b0101)
    '-0b110'

The `&` operator does bitwise `AND`. The `^` operator does bitwise exclusive `OR` (`XOR`). The `|` operator does inclusive `OR`. The `~` operator is the complement of the bits. The result has many 1 bits and is shown as a negative number.

The `<<` and `>>` operators are for doing left and right shifts of the bits, as shown in the following code:

    >>> bin( 0b110 << 4 )
    '0b1100000'
    >>> bin( 0b1100000 >> 3 )
    '0b1100'

It may not be obvious, but shifting a value left by `x` bits is like multiplying it by `2**x`, except it may operate faster. Similarly, shifting right by `b` bits amounts to integer division (`//`) by `2**b`.
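We can confirm the relationship between shifts and powers of two with a quick check. This is only a sketch; the names `left_matches` and `right_matches` are our own, and the range of shift counts is arbitrary:

```python
value = 0b110      # 6
big = 0b1100000    # 96

# Shifting left by x bits matches multiplying by 2**x.
left_matches = all((value << x) == value * 2**x for x in range(8))

# Shifting right by b bits matches integer division by 2**b.
right_matches = all((big >> b) == big // 2**b for b in range(8))

print(left_matches, right_matches)
```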

We also have all of the usual comparison operators: `<`, `<=`, `>`, `>=`, `==`, and `!=`.

In Python, we can combine comparison operators without including the `and` operator:

    >>> 7 <= 11 < 17
    True
    >>> 7 <= 11 and 11 < 17
    True

This simplification really does implement our conventional mathematical understanding of how comparisons can be written. We don't need to say `7 <= 11 and 11 < 17`.

There's another comparison operator that's used in some specialized situations: `is`. The `is` operator will appear, for now, to be the same as `==`. Try it. `3 is 3` and `3 == 3` seem to do the same thing. Later, when we start using the `None` object, we'll see the most common use for the `is` operator. For more advanced Python programming, there's a need to distinguish between two references to the same object (`is`) and two objects which claim to have the same value (`==`).
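To see where `is` and `==` part ways, we can use an object that isn't a small integer. Here's a minimal sketch using lists; the variable names are made up for this illustration:

```python
first = [3, 1, 4]
second = [3, 1, 4]   # a separate list with the same contents
alias = first        # a second name for the very same list

same_value = first == second      # the contents match
same_object = first is second     # but these are two distinct objects
alias_is_same = alias is first    # one object, two names

print(same_value, same_object, alias_is_same)
```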

### The ivory tower of numbers

Python gives us a variety of numbers, plus the ability to easily add new kinds of numbers. We'll focus on the built-in numbers here. Adding new kinds of numbers is the sort of thing that takes up whole chapters in more advanced books.

Python ranks the numbers into a kind of tower. At the top are numbers with the fewest features. Each subclass extends that number with more and more features. We'll look at the tower from the bottom up, starting with the integers that have the most features, and moving towards the complex numbers that have the fewest features. The following sections cover the various kinds of numbers we'll need to use.

#### Integer numbers

We can write integer values in base 10, 16, 8, or 2. Base 10 numbers don't need a prefix; the other bases use a simple two-character prefix, as shown in the following snippet:

    48813
    0xbead
    0b1011111010101101
    0o137255

We also have functions that will convert numbers into handy strings in different bases. We can use the `hex()`, `oct()`, and `bin()` functions to see a value in base 16, 8, or 2.
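As a quick illustration, here's the same value pushed through all three functions. The variable names are ours; the round trip through `int()` with an explicit base is an extra step that isn't covered in the text above:

```python
code = 48813

as_hex = hex(code)   # base 16 string
as_oct = oct(code)   # base 8 string
as_bin = bin(code)   # base 2 string
print(as_hex, as_oct, as_bin)

# int() with an explicit base converts the string back to an integer.
round_trip = int(as_hex, 16)
```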

The question of integer size is common. Python integers don't have a maximum size. They're not artificially limited to 32 or 64 bits. Try this:

    >>> 2**256
    115792089237316195423570985008687907853269984665640564039457584007913129639936

Large numbers work. They may be a bit slow, but they work perfectly fine.

#### Rational numbers

Rational numbers are not commonly used. They must be imported from the standard library. We must import the `fractions.Fraction` class definition. It looks like this:

    >>> from fractions import Fraction

Once we have the `Fraction` class defined, we can use it to create numbers. Let's say we were sent out to track down a missing device. Details of the device are strictly need-to-know. Since we're new agents, all that HQ will release to us is the overall size in square inches.

Here's an exact calculation of the area of a device we found. It is measured as 4⅞" multiplied by 2¼":

    >>> length=4+Fraction("7/8")
    >>> width=2+Fraction("1/4")
    >>> length*width
    Fraction(351, 32)

Okay, the area is 351/32, which is—what?—in real inches and fractions.

We can use Python's `divmod()` function to work this out. The `divmod()` function gives us a quotient and a remainder, as shown in the following code:

    >>> divmod(351,32)
    (10, 31)

That's 10 31/32 square inches. It's about 5 × 2, so the value seems to fit within our rough approximation. We can transmit that as the proper result. If we found the right device, we'll be instructed on what to do with it. Otherwise, we might have blown the assignment.
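If we'd rather not retype the numerator and denominator, a `Fraction` object exposes them as attributes. The following is a small sketch of turning the area into whole inches plus a leftover fraction; the names `whole`, `leftover`, and `approx` are our own:

```python
from fractions import Fraction

area = (4 + Fraction(7, 8)) * (2 + Fraction(1, 4))

# Split the improper fraction into whole inches and a leftover fraction.
whole, remainder = divmod(area.numerator, area.denominator)
leftover = Fraction(remainder, area.denominator)

print(whole, leftover, "square inches")

# A float is handy for a rough sanity check against 5 x 2.
approx = float(area)
```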

#### Floating-point numbers

We can write floating-point values in common or scientific notation as follows:

    3.1415926
    6.22E12

The presence of a decimal point or an exponent distinguishes a float from an integer.

These are ordinary double-precision floating-point numbers. It's important to remember that floating-point values are only approximations. They usually have a 64-bit implementation.
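A classic way to see the approximation at work is to add two simple decimal fractions. This sketch uses `math.isclose()` from the standard library as one reasonable way to compare floats; the variable names are illustrative:

```python
import math

total = 0.1 + 0.2

exactly_equal = total == 0.3              # tiny representation errors get in the way
close_enough = math.isclose(total, 0.3)   # compare within a tolerance instead

print(repr(total), exactly_equal, close_enough)
```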

If you're using CPython, they're explicitly based on the C compiler that was shown in the `sys.version` startup message. We can also get information from the `platform` package as shown in the following code snippet:

    >>> import platform
    >>> platform.python_build()
    ('v3.3.4:7ff62415e426', 'Feb 9 2014 00:29:34')
    >>> platform.python_compiler()
    'GCC 4.2.1 (Apple Inc. build 5666) (dot 3)'

This tells us which compiler was used. That, in turn, can tell us what floating-point libraries were used. This may help determine which underlying mathematical libraries are in use.

#### Decimal numbers

We need to be careful with money. *Words to live by: the accountants watching over spies are a tight-fisted bunch*.

What's important is that floating-point numbers are an approximation. We can't rely on approximations when working with money. For currency, we need exact decimal values; nothing else will do. Decimal numbers are available through a standard library module. We'll import the `decimal.Decimal` class definition to work with currency. It looks like this:

    >>> from decimal import Decimal

The informant we bribed to locate the device wants to be paid 50,000 Greek Drachma for the information on the missing device. When we submit our expenses, we'll need to include everything, including the cab fare (23.50 dollars) and the expensive lunch we had to buy her (12,900 GRD).

*Why wouldn't the informant accept Dollars or Euros? We don't want to know, we just want their information*. Recently, Greek Drachma were trading at 247.616 per dollar.

What's the exact budget for the information? In drachma and dollars?

First, we will convert currency exact to the mil (1/1000 of a dollar):

    >>> conversion=Decimal("247.616")
    >>> conversion
    Decimal('247.616')

The tab for our lunch, converted from drachma to dollars, is calculated as follows:

    >>> lunch=Decimal("12900")
    >>> lunch/conversion
    Decimal('52.09679503747738433703799431')

What? How is that mess going to satisfy the accountants?

All those digits are a consequence of exact division: we get a lot of decimal places of precision; not all of them are really relevant. We need to formalize the idea of *rounding off* the value so that the government accountants will be happy. The nearest penny will do. In the `Decimal` class, we'll use the `quantize()` method. The term **quantize** refers to rounding up, rounding down, and truncating a given value. The `decimal` module offers a number of quantizing rules. The default rule is `ROUND_HALF_EVEN`: round to the nearest value; in the case of a tie, prefer the even value. The code looks as follows:

    >>> penny=Decimal('.00')
    >>> (lunch/conversion).quantize(penny)
    Decimal('52.10')

That's much better. How much was the bribe we needed to pay?

    >>> bribe=50000
    >>> (bribe/conversion).quantize(penny)
    Decimal('201.93')

Notice that the division involved an integer and a decimal. Python's definition of decimal will quietly create a new decimal number from the integer so that the math will be done using decimal objects.

The cab driver charged us US Dollars. We don't need to do much of a conversion. So, we will add this amount to the final amount, as shown in the following code:

    >>> cab=Decimal('23.50')

That gets us to the whole calculation: lunch plus bribe, converted, plus cab.

    >>> ((lunch+bribe)/conversion).quantize(penny)+cab
    Decimal('277.52')

Wait. We seem to be off by a penny. Why didn't we get 277.53 dollars as an answer?

Rounding. Each individual amount (`52.10` and `201.93`) had a fraction of a penny rounded up to the nearest penny. (The more detailed values were `52.097` and `201.926`.) When we computed the sum of the drachma before converting, the total was rounded only once, so it didn't include the two separately rounded-up fractions of a penny.

We have a very fine degree of control over this. There are a number of rounding schemes, and there are a number of ways to define when and how to round. Also, some algebra may be required to see how it all fits together.
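To get a feel for how the rounding schemes differ, we can pass an explicit `rounding` argument to `quantize()`. This is a sketch only; the amount `2.345` is a made-up value chosen because it sits exactly on a tie:

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_DOWN

penny = Decimal('.00')
amount = Decimal('2.345')   # hypothetical value, exactly halfway between pennies

half_even = amount.quantize(penny, rounding=ROUND_HALF_EVEN)  # tie goes to the even digit
half_up = amount.quantize(penny, rounding=ROUND_HALF_UP)      # tie goes up
truncated = amount.quantize(penny, rounding=ROUND_DOWN)       # extra digits are dropped

print(half_even, half_up, truncated)
```

Which rule is appropriate is a question for the accountants, not for Python; the point is that the choice can be made explicit.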

#### Complex numbers

We also have complex numbers in Python. They're written with two parts: a real and an imaginary value, as shown in the following code:

    >>> 2+3j
    (2+3j)

If we mix complex values with most other kinds of numbers, the results will be complex. The exception is decimal numbers. But why would we be mixing engineering data and currency? If any mission involves scientific and engineering data, we have a way to deal with the complex values.
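Here's a brief sketch of that mixing rule. The variable names are ours, and the `Decimal` case is wrapped in a `try` statement because we expect it to be refused with a `TypeError`:

```python
from decimal import Decimal
from fractions import Fraction

z = 2 + 3j

with_int = z * 2                     # complex result
with_float = z + 0.5                 # complex result
with_fraction = z + Fraction(1, 2)   # complex result

try:
    z + Decimal('1.50')
    decimal_mixes = True
except TypeError:
    # Decimal values don't combine with complex values.
    decimal_mixes = False

print(with_int, with_float, with_fraction, decimal_mixes)
```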

### Outside the numbers

Python includes a variety of data types that aren't numbers. In the *Handling text and strings* section, we'll look at Python strings. We'll look at collections in Chapter 2, *Acquiring Intelligence Data*.

Boolean values, `True` and `False`, form their own little domain. We can extract a Boolean value from most objects using the `bool()` function. Here are some examples:

    >>> bool(5)
    True
    >>> bool(0)
    False
    >>> bool('')
    False
    >>> bool(None)
    False
    >>> bool('word')
    True

The general pattern is that most objects have a value `True` and a few exceptional objects have a value `False`. Empty collections, `0`, and `None` have a value `False`. Boolean values have their own special operators: `and`, `or`, and `not`. These have an additional feature. Here's an example:

    >>> True and 0
    0
    >>> False and 0
    False

When we evaluate `True and 0`, both sides of the `and` operator are evaluated; the right-hand value was the result. But when we evaluated `False and 0`, only the left-hand side of `and` was evaluated. Since it was already `False`, there was no reason to evaluate the right-hand side.

The `and` and `or` operators are *short-circuit* operators. If the left side of `and` is `False`, that's sufficient and the right-hand side is ignored. If the left-hand side of `or` is `True`, that's sufficient and the right-hand side is ignored.
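We can watch the short circuit happen by recording which operands actually get evaluated. The helper `check()` and the list `calls` below are hypothetical names invented for this sketch:

```python
calls = []

def check(label, result):
    """Record that this operand was evaluated, then return its value."""
    calls.append(label)
    return result

and_result = check("left", False) and check("right", True)
or_result = check("first", True) or check("second", False)

# Only the left-hand operands ran; the right-hand ones were skipped.
print(calls)

# A common idiom built on this: fall back to a default for an empty entry.
entry = ""
name = entry or "unknown"
```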

Python's rules for evaluation follow mathematical practice closely. Arithmetic operations have the highest priority. Comparison operators have a lower priority than arithmetic operations. The logical operators have a very low priority. This means that `a+2 > b/3 or c==15` will be done in phases: first the arithmetic, then the comparison, and finally the logic.

The arithmetic operators follow the usual mathematical rules. `**` has a higher priority than `*`, `/`, `//`, or `%`. The `+` and `-` operators come next. When we write `2*3+4`, the `2*3` operation must be performed first. The bit fiddling is even lower in priority. When you have a sequence of operations of the same priority (`a+b+c`), the computations are performed from left to right; the one exception is `**`, which groups from right to left. Of course, if there's any doubt, it's sensible to use parentheses.
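A few spot checks make the priority rules concrete. In this sketch the values of `a`, `b`, and `c` are arbitrary, and the fully parenthesized version is what we believe Python does implicitly:

```python
a, b, c = 4, 9, 15

implicit = a+2 > b/3 or c == 15
explicit = ((a+2) > (b/3)) or (c == 15)   # same phases, spelled out

power_first = 2 * 3**2         # ** binds tighter than *
shift_after_add = 1 << 2 + 1   # + binds tighter than <<, so this is 1 << 3
right_to_left = 2 ** 3 ** 2    # ** groups right to left: 2 ** (3 ** 2)

print(implicit, explicit, power_first, shift_after_add, right_to_left)
```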

### Assigning values to variables

We've been using the REPL feature of our Python toolset. In the long run, this isn't ideal. We'll be much happier writing scripts. The point behind using a computer for intelligence gathering is to automate data collection. Our scripts will require assignment to variables. They will also require explicit output and input.

We've shown the simple, obvious assignment statement in several examples previously. Note that we don't declare variables in Python. We simply assign values to variables. If the variable doesn't exist, it gets created. If the variable does exist, the previous value is replaced.

Let's look at some more sophisticated technology for creating and changing variables. We have multiple assignment statements. The following code will assign values to several variables at once:

    >>> length, width = 2+Fraction(1,4), 4+Fraction(7,8)
    >>> length
    Fraction(9, 4)
    >>> width
    Fraction(39, 8)
    >>> length >= width
    False

We've set two variables, `length` and `width`. However, we also made a small mistake. The length isn't the larger value; we've switched the values of `length` and `width`. We can swap them very simply using a multiple assignment statement as follows:

    >>> length, width = width, length
    >>> length
    Fraction(39, 8)
    >>> width
    Fraction(9, 4)

This works because the right-hand side is computed in its entirety. In this case, it's really simple. Then all of the values are broken down and assigned to the named variables. Clearly, the number of values on the right has to match the number of variables on the left or this won't work.
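What "won't work" looks like in practice is a `ValueError`. Here's a small sketch with made-up measurements, three values against two names:

```python
values = (4.875, 2.25, 1.0)   # hypothetical measurements: one too many

try:
    length, width = values    # only two names on the left
    unpacked = True
except ValueError as problem:
    unpacked = False
    message = str(problem)
    print("Can't unpack:", message)
```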

We also have *augmented* assignment statements. These couple an arithmetic operator with the assignment statement. The following code is an example of `+=`: using assignment augmented with addition. Here's an example of computing a sum from various bits and pieces:

    >>> total= 0
    >>> total += (lunch/conversion).quantize(penny)
    >>> total += (bribe/conversion).quantize(penny)
    >>> total += cab
    >>> total
    Decimal('277.53')

We don't have to write `total = total +...`. Instead, we can simply write `total += ...`. It's a nice clarification of what our intent is.

All of the arithmetic operators are available as augmented assignment statements. We might have a hard time finding a use for `%=` or `**=`, but the statements are part of the language.

The idea of a nice clarification should lead to some additional thinking. For example, the variable named `conversion` is a perfectly opaque name. Secrecy for data is one thing: we'll look at ways to encrypt data. Obscurity through shabby processing of that data often leads to a nightmarish mess. Maybe we should have called it something that defines more clearly what it means. We'll revisit this problem of obscurity in some examples later on.

### Writing scripts and seeing output

Most of our missions will involve gathering and analyzing data. We won't be creating a very sophisticated **User Interface** (**UI**). Python has tools for building websites and complex **graphical user interfaces** (**GUIs**). The complexity of those topics leads to entire books to cover GUI and web development.

We don't want to type each individual Python statement at the `>>>` prompt. That makes it easy to learn Python, but our goal is to create programs. In GNU/Linux parlance, our Python application programs can be called **scripts**. This is because Python programs fit the definition for a *scripting* language.

For our purposes, we'll focus on scripts that use the **command-line interface** (**CLI**). Everything we'll write will run in a simple terminal window. The advantage of this approach is speed and simplicity. We can add graphic user interfaces later. Or we can expand the essential core of a small script into a web service, once it works.

What is an application or a script? A script is simply a plain text file. We can use any text editor to create this file. A word processor is rarely a good idea, since word processors aren't good at producing plain text files.

If we're not working from the `>>>` REPL prompt, we'll need to explicitly display the output. We'll display output from a script using the `print()` function.

Here's a simple script we can use to produce a receipt for bribing (*encouraging*) our informant.

    from decimal import Decimal

    PENNY= Decimal('.00')
    grd_usd= Decimal('247.616')
    lunch_grd= Decimal('12900')
    bribe_grd= 50000
    cab_usd= Decimal('23.50')
    lunch_usd= (lunch_grd/grd_usd).quantize(PENNY)
    bribe_usd= (bribe_grd/grd_usd).quantize(PENNY)
    print( "Lunch", lunch_grd, "GRD", lunch_usd, "USD" )
    print( "Bribe", bribe_grd, "GRD", bribe_usd, "USD" )
    print( "Cab", cab_usd, "USD" )
    print( "Total", lunch_usd+bribe_usd+cab_usd, "USD" )

Let's break this script down so that we can follow it. Reading a script is a lot like putting a tail on an informant. We want to see where the script goes and what it does.

First, we imported the `Decimal` definition. This is essential for working with currency. We defined a value, `PENNY`, that we'll use to round off currency calculations to the nearest penny. We used a name in all caps to make this variable distinctive. It's not an ordinary variable; we should *never* see it on the left-hand side of an assignment statement again in the script.

We created the currency conversion factor, and named it `grd_usd`. That's a name that seems more meaningful than `conversion` in this context. Note that we also added a small suffix to our amount names. We used names such as `lunch_grd`, `bribe_grd`, and `cab_usd` to emphasize which currency is being used. This can help prevent head-scrambling problems.

Given the `grd_usd` conversion factor, we created two more variables, `lunch_usd` and `bribe_usd`, with the amounts converted to dollars and rounded to the nearest penny. If the accountants want to fiddle with the conversion factor (perhaps they can use a different bank than us spies), they can tweak the number and prepare a different receipt.

The final step was to use the `print()` function to write the receipt. We printed the three items we spent money on, showing the amounts in GRD and USD. We also computed the total. This will help the accountants to properly reimburse us for the mission.

We'll describe the output as *primitive but acceptable*. After all, they're only accountants. We'll look into pretty formatting separately.

### Gathering user input

The simplest way to gather input is to copy and paste it into the script. That's what we did previously. We pasted the Greek Drachma conversion into the script: `grd_usd= Decimal('247.616')`. We could annotate this with a comment to help the accountants make any changes.

Additional comments come at the end of the line, after a `#` sign. They look like this:

    grd_usd= Decimal('247.616') # Conversion from Mihalis Bank 5/15/14

This extra text is part of the application, but it doesn't actually do anything. It's a note to ourselves, our accountants, our handler, or the person who takes over our assignments when we disappear.

This kind of data line is easy to edit. But sometimes the people we work with want more flexibility. In that case, we can gather this value as input from a person. For this, we'll use the `input()` function.

We often break user input down into two steps like this:

    entry= input("GRD conversion: ")
    grd_usd= Decimal(entry)

The first line will write a prompt and wait for the user to enter the amount. The amount will be a string of characters, assigned to the variable `entry`. Python can't use the characters directly in arithmetic statements, so we need to explicitly convert them to a useful numeric type.

The second line will try to convert the user's input to a useful `Decimal` object. We have to emphasize the `try` part of this. If the user doesn't enter a string that represents a valid `Decimal` number, there will be a major crisis. Try it.

The crisis will look like this:

    >>> entry= input("GRD conversion: ")
    GRD conversion: 123.%$6
    >>> grd_usd= Decimal(entry)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

We entered `123.%$6`. Rather than this, enter a good number.

The bletch starting with `Traceback` indicates that Python raised an exception. A crisis in Python always results in an exception being raised. Python defines a variety of exceptions to make it possible for us to write scripts that deal with these kinds of crises.

Once we've seen how to deal with crises, we can look at string data and some simple clean-up steps that can make the user's life a little easier. We can't fix their mistakes, but we can handle a few common problems that stem from trying to type numbers on a keyboard.

#### Handling exceptions

An exception such as `decimal.InvalidOperation` is raised when the `Decimal` class can't parse the given string to create a valid `Decimal` object. What can we do with this exception?

We can ignore it. In that case, our application program crashes. It stops running and the agents using it are unhappy. Not really the best approach.

Here's the basic technique for catching an exception:

    import decimal
    from decimal import Decimal

    entry= input("GRD conversion: ")
    try:
        grd_usd= Decimal(entry)
    except decimal.InvalidOperation:
        print("Invalid: ", entry)

We've wrapped the `Decimal()` conversion and assignment in a `try:` statement. If every statement in the `try:` block works, the `grd_usd` variable will be set. If, on the other hand, a `decimal.InvalidOperation` exception is raised inside the `try:` block, the `except` clause will be processed. This writes a message and does not set the `grd_usd` variable.

We can handle an exception in a variety of ways. The most common kind of exception handling will clean up in the event of some failure. For example, a script that attempts to create a file might delete the useless file if an exception was raised. The problem hasn't been solved: the program still has to stop. But it can stop in a clean, pleasant way instead of a messy way.

We can also handle an exception by computing an alternate answer. We might be gathering information from a variety of web services. If one doesn't respond in time, we'll get a timeout exception. In this case, we may try an alternate web service.

In another common exception-handling case, we may reset the state of the computation so that an action can be tried again. In this case, we'll wrap the exception handler in a loop that can repeatedly ask the user for input until they provide a valid number.

These choices aren't exclusive and some handlers can perform combinations of the previous exception handlers. We'll look at the third choice, trying again, in detail.
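Before we turn to retrying, here's a rough sketch of the second choice, computing an alternate answer. The helper `to_decimal()` and the fallback rate are hypothetical, made up for this illustration; a real script would need to decide whether a silent fallback is acceptable to the accountants:

```python
import decimal
from decimal import Decimal

def to_decimal(text, default):
    """Hypothetical helper: convert text to a Decimal, or fall back to default."""
    try:
        return Decimal(text)
    except decimal.InvalidOperation:
        # The alternate answer: use the default instead of crashing.
        return default

fallback_rate = Decimal('247.616')
good = to_decimal("250.5", fallback_rate)
bad = to_decimal("123.%$6", fallback_rate)
print(good, bad)
```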

#### Looping and trying again

Here's a common recipe for getting input from the user:

    import decimal
    from decimal import Decimal

    grd_usd= None
    while grd_usd is None:
        entry= input("GRD conversion: ")
        try:
            grd_usd= Decimal(entry)
        except decimal.InvalidOperation:
            print("Invalid: ", entry)
    print( grd_usd, "GRD = 1 USD" )

We'll add a tail to this and follow it around for a bit. The goal is to get a valid decimal value for our currency conversion, `grd_usd`. We'll initialize that variable as Python's special `None` object.

The `while` statement makes a formal declaration of our intent. We're going to execute the body of the `while` statement while the `grd_usd` variable remains set to `None`. Note that we're using the `is` operator to compare `grd_usd` to `None`. We're emphasizing a detail here: there's only one `None` object in Python and we're using that single instance. It's technically possible to tweak the definition of `==`; we can't tweak the definition of `is`.

At the end of the `while` statement, `grd_usd is None` must be `False`; we can say `grd_usd is not None`. When we look at the body of the statement, we can see that only one statement sets `grd_usd`, so we're assured that it must be a valid `Decimal` object.

Within the body of the `while` statement, we've used our exception-handling recipe. First, we prompt and get some input, setting the `entry` variable. Then, inside the `try` statement, we attempt to convert the string to a `Decimal` value. If that conversion works, then `grd_usd` will have that `Decimal` object assigned. The object will not be `None` and the loop will terminate. Victory!

If the conversion of `entry` to a `Decimal` value fails, the exception will be raised. We'll print a message, and leave `grd_usd` alone. It will still have a value of `None`. The loop will continue until a valid value is entered.
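One way to reuse this recipe is to wrap it in a function. The sketch below is an assumption on our part, not something from the recipe itself: the name `read_decimal()` and its `reader` parameter are hypothetical, and `reader` exists only so that we can feed in canned entries instead of typing at a prompt:

```python
import decimal
from decimal import Decimal

def read_decimal(prompt, reader=input):
    """Hypothetical helper: keep asking until the entry parses as a Decimal."""
    value = None
    while value is None:
        entry = reader(prompt)
        try:
            value = Decimal(entry)
        except decimal.InvalidOperation:
            print("Invalid: ", entry)
    return value

# Simulate a user who mistypes once, then gets it right.
fake_entries = iter(["123.%$6", "247.616"])
grd_usd = read_decimal("GRD conversion: ", reader=lambda prompt: next(fake_entries))
print(grd_usd, "GRD = 1 USD")
```

In a real script, we'd call `read_decimal("GRD conversion: ")` and let the default `input()` function do the asking.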

Python has other kinds of loops; we'll get to them later in this chapter.