Unit Testing

Packt
18 Feb 2015
18 min read
In this article by Mikael Lundin, author of the book Testing with F#, we will see how unit testing is the art of designing our program in such a way that we can easily test each function as an isolated unit and so verify its correctness. Unit testing is not only a tool for verifying functionality, but, even more, a tool for designing that functionality in a testable way. What you gain is a means of finding problems early, facilitating change, documentation, and design. In this article, we will dive into how to write good unit tests using F#:

- Testing in isolation
- Finding the abstraction level

FsUnit

The current state of unit testing in F# is good. You can get all the major test frameworks running with little effort, but there is still something that feels a bit off with the way tests and asserts are expressed:

    open NUnit.Framework

    Assert.That(result, Is.EqualTo(42))

Using FsUnit, you can achieve much higher expressiveness in writing unit tests by simply reversing the way the assert is written:

    open FsUnit

    result |> should equal 42

The FsUnit framework is not a test runner in itself, but uses an underlying test framework to execute. The underlying framework can be MSTest, NUnit, or xUnit. FsUnit can best be described as providing a different structure and syntax for writing tests. While this syntax is denser, the need for structure still exists, and the Arrange-Act-Assert (AAA) pattern is needed more than ever. Consider the following test example:

    [<Measure>] type EUR
    [<Measure>] type SEK

    type Country =
        | Sweden
        | Germany
        | France

    let calculateVat country (amount : float<'u>) =
        match country with
        | Sweden -> amount * 0.25
        | Germany -> amount * 0.19
        | France -> amount * 0.2

    open NUnit.Framework
    open FsUnit

    [<Test>]
    let ``Sweden should have 25% VAT`` () =
        let amount = 200.<SEK>
        calculateVat Sweden amount |> should equal 50<SEK>

This code calculates the Swedish VAT on an amount in Swedish currency. What is interesting is that when we break the test code down, we see that it actually follows the AAA structure, even though it doesn't explicitly say so:

    [<Test>]
    let ``Germany should have 19% VAT`` () =
        // arrange
        let amount = 200.<EUR>
        // act
        calculateVat Germany amount
        //assert
        |> should equal 38<EUR>

The only thing I did here was add the annotations for AAA. It gives us a perspective on what we're doing, what frame we're working inside, and the rules for writing good unit tests.

Assertions

We have already seen the equal assertion, which verifies that the test result is equal to the expected value:

    result |> should equal 42

You can negate this assertion by using the not' operator, as follows:

    result |> should not' (equal 43)

With strings, it's quite common to assert that a string starts or ends with some value, as follows:

    "$12" |> should startWith "$"
    "$12" |> should endWith "12"

And you can also negate that, as follows:

    "$12" |> should not' (startWith "€")
    "$12" |> should not' (endWith "14")

You can verify that a result is within a boundary.
This will, in turn, verify that the result is somewhere between the values 35 and 45:

    result |> should (equalWithin 5) 40

And you can also negate that, as follows:

    result |> should not' ((equalWithin 1) 40)

With the collection types list, array, and sequence, you can check that a collection contains a specific value:

    [1..10] |> should contain 5

And you can also negate it to verify that a value is missing, as follows:

    [1; 1; 2; 3; 5; 8; 13] |> should not' (contain 7)

It is common to test the boundaries of a function and then its exception handling. This means you need to be able to assert exceptions, as follows:

    let getPersonById id = failwith "id cannot be less than 0"
    (fun () -> getPersonById -1 |> ignore) |> should throw typeof<System.Exception>

There is a be function that can be used in a lot of interesting ways. Even in situations where the equal assertion could replace some of these be constructs, we can opt for a more semantic way of expressing our assertions, providing better error messages. Let us see some examples, as follows:

    // true or false
    1 = 1 |> should be True
    1 = 2 |> should be False

    // strings as result
    "" |> should be EmptyString
    null |> should be NullOrEmptyString

    // null is nasty in functional programming
    [] |> should not' (be Null)

    // same reference
    let person1 = new System.Object()
    let person2 = person1
    person1 |> should be (sameAs person2)

    // not same reference, because of copy by value
    let a = System.DateTime.Now
    let b = a
    a |> should not' (be (sameAs b))

    // greater and lesser
    result |> should be (greaterThan 0)
    result |> should not' (be lessThan 0)

    // of type
    result |> should be ofExactType<int>

    // list assertions
    [] |> should be Empty
    [1; 2; 3] |> should not' (be Empty)

With this, you should be able to assert most of the things you're looking for, but there still might be a few edge cases out there that the default FsUnit asserts won't catch.

Custom assertions

FsUnit is extensible, which makes it easy to add your own assertions on top of the chosen test runner. This has the potential to make your tests extremely readable. The first example is a custom assert that verifies that a given string matches a regular expression. This will be implemented using NUnit as a framework, as follows:

    open FsUnit
    open NUnit.Framework.Constraints
    open System.Text.RegularExpressions

    // NUnit: implement a new assert
    type MatchConstraint(n) =
        inherit Constraint() with
            override this.WriteDescriptionTo(writer : MessageWriter) : unit =
                writer.WritePredicate("matches")
                writer.WriteExpectedValue(sprintf "%s" n)
            override this.Matches(actual : obj) =
                match actual with
                | :? string as input -> Regex.IsMatch(input, n)
                | _ -> failwith "input must be of string type"

    let match' n = MatchConstraint(n)

    open NUnit.Framework

    [<Test>]
    let ``NUnit custom assert`` () =
        "2014-10-11" |> should match' "\d{4}-\d{2}-\d{2}"
        "11/10 2014" |> should not' (match' "\d{4}-\d{2}-\d{2}")

In order to create your own assert, you need to create a type that implements the NUnit.Framework.Constraints.IConstraint interface, which is easily done by inheriting from the Constraint base class. You need to override both the WriteDescriptionTo() and Matches() methods, where the first controls the message that will be output from the test, and the second is the actual test. In this implementation, I verify that the input is a string, or the test will fail. Then, I use the Regex.IsMatch() static function to verify the match.
Next, we create an alias for the MatchConstraint() function, match', with the extra apostrophe to avoid a conflict with the built-in F# match expression, and then we can use it like any other assert function in FsUnit. Doing the same for xUnit requires a completely different implementation. First, we need to add a reference to the NHamcrest API, which we can find by searching for the package in the NuGet Package Manager. Instead of inheriting from an NUnit constraint, we write an implementation that uses the NHamcrest API, a .NET port of the Java Hamcrest library for building matchers for test expressions, as follows:

    open System.Text.RegularExpressions
    open NHamcrest
    open NHamcrest.Core

    // test assertion for regular expression matching
    let match' pattern =
        CustomMatcher<obj>(sprintf "Matches %s" pattern, fun c ->
            match c with
            | :? string as input -> Regex.IsMatch(input, pattern)
            | _ -> false)

    open Xunit
    open FsUnit.Xunit

    [<Fact>]
    let ``Xunit custom assert`` () =
        "2014-10-11" |> should match' "\d{4}-\d{2}-\d{2}"
        "11/10 2014" |> should not' (match' "\d{4}-\d{2}-\d{2}")

The functionality in this implementation is the same as in the NUnit version, but the implementation itself is much easier. We create a function that receives an argument and returns a CustomMatcher<obj> object. This only takes the output message for the test and the function that tests the match. Writing an assertion for FsUnit driven by MSTest works exactly the same way as it does with xUnit: by creating a CustomMatcher<obj> object with NHamcrest.

Unquote

There is another F# assertion library that is completely different from FsUnit but, with a different design philosophy, accomplishes the same thing: making F# unit tests more functional. Just like FsUnit, this library provides the means of writing assertions, but relies on NUnit as a testing framework. Instead of working with a DSL like FsUnit, or an API such as the NUnit framework's, the Unquote library bases its assertions on F# code quotations. Code quotations are a fairly little-known feature of F# with which you can turn any code into an abstract syntax tree. Namely, when the F# compiler finds a code quotation in your source file, it will not compile it, but rather expand it into a syntax tree that represents an F# expression. The following is an example of a code quotation:

    <@ 1 + 1 @>

If we execute this in F# Interactive, we'll get the following output:

    val it : Quotations.Expr = Call (None, op_Addition, [Value (1), Value (1)])

This is truly code as data, and we can use it to write code that operates on code as if it were data, which in this case it is. It brings us closer to what a compiler does, and gives us a lot of power in the metaprogramming space. We can use this to write assertions with Unquote. Start by adding the Unquote NuGet package to your test project. Now we can implement our first test using Unquote, as follows:

    open NUnit.Framework
    open Swensen.Unquote

    [<Test>]
    let ``Fibonacci sequence should start with 1, 1, 2, 3, 5`` () =
        test <@ fibonacci |> Seq.take 5 |> List.ofSeq = [1; 1; 2; 3; 5] @>

This works by Unquote first finding the equals operation, and then reducing each side of the equals sign until both sides are equal or can no longer be reduced. This is most easily explained by writing a test that fails and watching the output.
The following test should fail because 9 is not a prime number:

    [<Test>]
    let ``prime numbers under 10 are 2, 3, 5, 7, 9`` () =
        test <@ primes 10 = [2; 3; 5; 7; 9] @> // fail

The test will fail with the following message:

    Test Name: prime numbers under 10 are 2, 3, 5, 7, 9
    Test FullName: chapter04.prime numbers under 10 are 2, 3, 5, 7, 9
    Test Outcome: Failed
    Test Duration: 0:00:00.077

    Result Message:
    primes 10 = [2; 3; 5; 7; 9]
    [2; 3; 5; 7] = [2; 3; 5; 7; 9]
    false

    Result StackTrace:
    at Microsoft.FSharp.Core.Operators.Raise[T](Exception exn)
    at chapter04.prime numbers under 10 are 2, 3, 5, 7, 9()

In the resulting message, we can see both sides of the equals sign reduced until only false remains. It's a very elegant way of breaking down a complex assertion.

Assertions

The assertions in Unquote are not as specific or extensive as the ones in FsUnit. The point of having lots of specific assertions for different situations is to get very descriptive error messages when tests fail. Since Unquote outputs the whole reduction of the statements when a test fails, the need for explicit assertions is not that high; you'll get a descriptive error message anyway. The most common assertion by far is the check for equality, as shown before. You can also verify that two expressions are not equal:

    test <@ 1 + 2 = 4 - 1 @>
    test <@ 1 + 2 <> 4 @>

We can check whether a value is greater or smaller than the expected value:

    test <@ 42 < 1337 @>
    test <@ 1337 > 42 @>

You can check for a specific exception, or just any exception:

    raises<System.NullReferenceException> <@ (null : string).Length @>
    raises<exn> <@ System.String.Format(null, null) @>

Here, the Unquote syntax excels compared to FsUnit, which uses a unit lambda expression to do the same thing in a rather quirky way. The Unquote library also exposes its reduce functionality in the public API, making it possible for you to reduce and analyze an expression. Using the reduceFully syntax, we can get the reduction as a list, as shown in the following:

    > <@ (1+2)/3 @> |> reduceFully |> List.map decompile;;
    val it : string list = ["(1 + 2) / 3"; "3 / 3"; "1"]

If we just want the output printed to the console, we can run the unquote command directly:

    > unquote <@ [for i in 1..5 -> i * i] = ([1..5] |> List.map (fun i -> i * i)) @>;;
    Seq.toList (seq (Seq.delay (fun () -> Seq.map (fun i -> i * i) {1..5}))) = ([1..5] |> List.map (fun i -> i * i))
    Seq.toList (seq seq [1; 4; 9; 16; ...]) = ([1; 2; 3; 4; 5] |> List.map (fun i -> i * i))
    Seq.toList seq [1; 4; 9; 16; ...] = [1; 4; 9; 16; 25]
    [1; 4; 9; 16; 25] = [1; 4; 9; 16; 25]
    true

It is important to know what tools are out there, and Unquote is one of those tools that is fantastic to know about when you run into a testing problem in which you want to reduce both sides of an equals sign. Most often, this applies to difference computations or algorithms such as price calculations. We have also seen that Unquote provides a great way of expressing tests for exceptions, unmatched by FsUnit.

Testing in isolation

One of the most important aspects of unit testing is to test in isolation. This does not only mean faking any external dependency; it also means that the test code itself should not be tied up with other test code. If you're not testing in isolation, there is a risk that your test fails not because of the system under test, but because of state that has lingered from a previous test run, or because of external dependencies. Writing pure functions without any state is one way of making sure your tests run in isolation.
Another way is by making sure that each test creates all the state it needs itself. Shared state between tests, such as connections, is a bad idea. Using TestFixtureSetUp/TearDown attributes to set up state for a set of tests is a bad idea. Keeping slow resources around because they're expensive to set up is a bad idea. The most common kinds of shared state are the following:

- The ASP.NET Model View Controller (MVC) session state
- Dependency injection setup
- A database connection (even though using one means the test is no longer strictly a unit test)

Here's how one should think about unit testing in isolation: each test is responsible for setting up the SUT and its database/web service stubs in order to perform the test and assert on the result. It is equally important that the test cleans up after itself, which in the case of unit tests can most often be handed over to the garbage collector and doesn't need to be done explicitly. It is common to think that one should only isolate a test fixture from other test fixtures, but this notion of a test fixture is unhelpful. Instead, one should strive to have each test stand on its own to as large an extent as possible, and not depend on outside setup. This does not mean you will end up with unnecessarily long unit tests, provided you write the SUT and the tests well within that context. The problem we often run into is that the SUT itself maintains some kind of state that is present between tests. The state can simply be a value that is set in the application domain and persists between different test runs, as follows:

    let getCustomerFullNameByID id =
        if cache.ContainsKey(id) then
            (cache.[id] :?> Customer).FullName
        else
            // get from database
            // NOTE: stub code
            let customer = db.getCustomerByID id
            cache.[id] <- customer
            customer.FullName

The problem we see here is that the cache will persist from one test to another, so when the second test is running, it needs to make sure it is running with a clean cache, or the result might not be as expected. One way to test this properly would be to separate the core logic from the cache and test each independently. Another would be to treat it as a black box and ignore the cache completely: if the cache makes the test fail, then the functionality fails as a whole. Which to choose depends on whether we see the cache as an implementation detail of the function or as functionality in its own right. Testing implementation details, or private functions, is dirty, because our tests might break even when the functionality hasn't changed. And yet, there might be benefits to taking the implementation detail into account. In this case, we could use the cache functionality to easily stub out the database without the need for any mocking framework.

Vertical slice testing

Most often, we treat dependencies as something we need to mock away, whereas the better option would be to implement a test harness directly into the product. We know what kind of data and what kind of calls we need to make to the database, so right there we have a public API for the database. This is often called a data access layer in a three-tier architecture (but no one ever does those anymore, right?).
As we have a public data access layer, we can easily provide an in-memory implementation that can be used not only by our tests, but also while developing the product. When you're running the application in development mode, you configure it to use the in-memory version of the dependency. This provides you with the following benefits:

- You get a faster development environment
- Your tests become simpler
- You have complete control of your dependency

As your development environment does everything in memory, it becomes blazingly fast. And as you develop your application, you will appreciate adjusting that public API and coming to understand exactly what you expect from the dependency. It leads to a cleaner API, where very few side effects are allowed to seep through. Your tests become much simpler because, instead of mocking away the dependency, you can call the in-memory dependency and set whatever state you want. Here's an example of what a public data access API might look like:

    type IDataAccess =
        abstract member GetCustomerByID : int -> Customer
        abstract member FindCustomerByName : string -> Customer option
        abstract member UpdateCustomerName : int -> string -> Customer
        abstract member DeleteCustomerByID : int -> bool

This is surely a very simple API, but it demonstrates the point: there is a database with customers in it, and we want to perform some operations on them. In this case, our in-memory implementation would look like this:

    type InMemoryDataAccess() =
        let data = new System.Collections.Generic.Dictionary<int, Customer>()

        // expose the add method
        member this.Add customer = data.Add(customer.ID, customer)

        interface IDataAccess with
            // throw exception if not found
            member this.GetCustomerByID id =
                data.[id]

            member this.FindCustomerByName fullName =
                data.Values |> Seq.tryFind (fun customer -> customer.FullName = fullName)

            member this.UpdateCustomerName id fullName =
                data.[id] <- { data.[id] with FullName = fullName }
                data.[id]

            member this.DeleteCustomerByID id =
                data.Remove(id)

This simple implementation provides the same functionality as the database would, but in memory. This makes it possible to run the tests completely in isolation without worrying about mocking away the dependencies. The dependencies are already substituted with in-memory replacements, and as this example shows, the in-memory replacement doesn't have to be very extensive. The only function beyond the interface implementation is Add(), which lets us set the state prior to the test, as this is something the interface itself doesn't provide for us.
Now, in order to tie it together with the real implementation, we need a configuration setting that selects which version to use, as shown in the following code:

    open System.Configuration
    open System.Collections.Specialized

    // TryGetValue extension method to NameValueCollection
    type NameValueCollection with
        member this.TryGetValue (key : string) =
            if this.Get(key) = null then
                None
            else
                Some (this.Get key)

    let dataAccess : IDataAccess =
        match ConfigurationManager.AppSettings.TryGetValue("DataAccess") with
        | Some "InMemory" -> new InMemoryDataAccess() :> IDataAccess
        | Some _ | None -> new DefaultDataAccess() :> IDataAccess

    // usage
    let fullName = (dataAccess.GetCustomerByID 1).FullName

Again, with only a few lines of code, we manage to select the appropriate IDataAccess instance and execute against it, without using dependency injection or taking a penalty in code readability, as we would in C#. The code is straightforward and easy to read, and we can execute any tests we want without touching the external dependency, or in this case, the database.

Finding the abstraction level

In order to start unit testing, you have to start writing tests; this is what they'll tell you. If you want to get good at it, just start writing tests, any of them and a lot of them; the rest will solve itself. I've watched experienced developers sit staring dumbfounded at an empty screen because they couldn't work out how to get started or what to test. The question is not unfounded. In fact, it is still debated in the Test Driven Development (TDD) community what should be tested. The ground rule is that a test should bring at least as much value as the cost of writing it, but that is a bad rule for someone new to testing, because for them all tests are expensive to write.

Summary

In this article, we learned how to write unit tests using the appropriate tools at our disposal: NUnit, FsUnit, and Unquote. We also learned about different techniques for handling external dependencies, using interfaces and functional signatures, and performing dependency injection through constructors, properties, and methods.


Financial Management with Microsoft Dynamics AX 2012 R3

Packt
11 Feb 2015
4 min read
In this article by Mohamed Aamer, author of Microsoft Dynamics AX 2012 R3 Financial Management, we will see that the core foundation of Enterprise Resource Planning (ERP) is financial management; it is vital to understand the financial characteristics of Microsoft Dynamics AX 2012 R3 from a practical perspective, grounded in how the application works. It is important to cover the following topics:

- Understanding financial management aspects in Microsoft Dynamics AX 2012 R3
- Covering the business rationale, basic setups, and configuration
- Real-life business requirements and their solutions
- Implementation tips and tricks, in addition to the key points to consider during analysis, design, deployment, and operation

The Microsoft Dynamics AX 2012 R3 Financial Management book covers the main characteristics of the general ledger and its integration with the other subledgers (accounts payable, accounts receivable, fixed assets, cash and bank management, and inventory). It also covers the core features of main accounts, account categorization and its controls, along with the opening balance process and concept, and the closing procedure. It then discusses subledger functionality (accounts payable, accounts receivable, fixed assets, cash and bank management, cash flow management, and inventory) in more detail by walking through the master data, controls, and transactions and their effects on the general ledger. It explores financial reporting, which is one of the basic implementation cornerstones. The main principles for reporting are the reliability of business information and the ability to get the right information at the right time to the right person. Reports that analyze ERP data in an expressive way represent the output of the ERP implementation; they are considered the cream of the implementation, the next level of value that solution stakeholders should target. This ultimate outcome results from building all reports on a single point of information.

Planning reporting needs for ERP

The Microsoft Dynamics AX implementation team should challenge the management's reporting needs in the analysis phase of the implementation, with a particular focus on exploring the data required to build the reports. These data requirements should then be cross-checked against the real data entry activities that end users will execute, to ensure that business users will get vital information from the reports. The reporting levels are as follows:

- Operational management
- Middle management
- Top management

Understanding the information technology value chain

The model of a management information system is most applicable to the Information Technology (IT) manager or Chief Information Officer (CIO) of a business. Business owners likely don't care as much about the specifics, as long as these aspects of the solution deliver the required results. The following are the basic layers of the value chain:

- Database management
- Business processes
- Business intelligence
- Frontend

Understanding Microsoft Dynamics AX information source blocks

This section explores the information sources that ultimately determine the strategic value of Business Intelligence (BI) reporting and analytics. These are divided into three blocks:
- Detailed transactions block
- Business intelligence block
- Executive decisions block

Discovering Microsoft Dynamics AX reporting

The reporting options offered by Microsoft Dynamics AX are:

- Inquiry forms
- SQL Server Reporting Services (SSRS) reports
- The original transaction
- The Original document function
- Audit trail

Reporting currency

Companies report their transactions in a specific currency, known as the accounting currency or local currency. It is normal to post transactions in a different currency; such an amount is translated to the home currency using the current exchange rate.

Autoreports

The Autoreport wizard is a user-friendly tool. The end user can easily generate a report starting from any form in Microsoft Dynamics AX. This wizard helps the user create a report based on the information in the form and save the report.

Summary

In this article, we covered financial reporting, from planning to the consideration of reporting levels. We covered important points that affect reporting quality by considering the reporting value chain, which consists of infrastructure, database management, business processes, business intelligence, and the frontend. We also discussed the information source blocks, which consist of the detailed transactions block, the business intelligence block, and the executive decisions block. Then we learned about the reporting possibilities in Microsoft Dynamics AX, such as inquiry forms and SSRS reports, and the autoreport capabilities in Microsoft Dynamics AX 2012 R3.


How to Build a Koa Web Application - Part 2

Christoffer Hallas
08 Feb 2015
5 min read
In Part 1 of this series, we got everything in place for our Koa app using Jade and Mongel. In this post, we will cover Jade templates and how to build the listing and viewing pages. Please note that this series requires Node.js version 0.11+.

Jade templates

Rendering HTML is always an important part of any web application. Luckily, when using Node.js there are many great choices, and for this article we've chosen Jade. Keep in mind, though, that we will only touch on a tiny fraction of Jade's functionality. Let's create our first Jade template. Create a file called create.jade and put in the following:

    doctype html
    html(lang='en')
      head
        title Create Page
      body
        h1 Create Page
        form(method='POST', action='/create')
          input(type='text', name='title', placeholder='Title')
          input(type='text', name='contents', placeholder='Contents')
          input(type='submit')

For all the Jade questions you have that we won't answer in this series, I refer you to the excellent official Jade website at http://jade-lang.com. If you add the statement app.listen(3000); to the end of index.js, you should be able to run the program from your terminal using the following command and visit http://localhost:3000 in your browser:

    $ node --harmony index.js

The --harmony flag just tells the node program that we need support for generators in our program.

Listing and viewing pages

Now that we can create a page in our MongoDB database, it is time to actually list and view these pages. For this purpose, we need to add another middleware to our index.js file, after the first middleware:

    app.use(function* () {
      if (this.method != 'GET') {
        this.status = 405;
        this.body = 'Method Not Allowed';
        return
      }
      …
    });

As you can probably already tell, this new middleware is very similar to the first one we added, which handled the creation of pages. First we make sure that the method of the request is GET, and if not, we respond appropriately and return:

    var params = this.path.split('/').slice(1);
    var id = params[0];

    if (id.length == 0) {
      var pages = yield Page.find();
      var html = jade.renderFile('list.jade', { pages: pages });
      this.body = html;
      return
    }

Then we proceed to inspect the path attribute of the Koa context, looking for an ID that represents the page in the database. Remember how we redirected using the ID in the previous middleware. We inspect the path by splitting it into an array of strings separated by the forward slashes of the URL; this way, the path /1234 becomes an array of '' and '1234'. Because the path starts with a forward slash, the first item in the array will always be the empty string, so we discard it by default. Then we check the length of the id parameter, and if it's zero we know that there is in fact no ID in the path; we should just look up the pages in the database and render our list.jade template with those pages made available to the template as the variable pages. Making data available in templates is also known as providing locals to the template.
Here's what list.jade looks like:

    doctype html
    html(lang="en")
      head
        title Your Web Application
      body
        h1 Your Web Application
        ul
          - each page in pages
            li
              a(href='/#{page._id}')= page.title

But if the length of id was not zero, we assume that it's an ID, we try to load that specific page from the database instead of all the pages, and we render our view.jade template with the page:

    var page = yield Page.findById(id);
    var html = jade.renderFile('view.jade', page);
    this.body = html;

And here is view.jade:

    doctype html
    html(lang="en")
      head
        title= title
      body
        h1= title
        p= contents

That's it

You should now be able to run the app as previously described, create a page, list all of your pages, and view them. If you want to, you can continue and build a simple CMS system. Koa is very simple to use and doesn't enforce a lot of functionality on you, allowing you to pick and choose the libraries that you need and want to use. There are many possibilities, and that is one of Koa's biggest strengths.

About the author

Christoffer Hallas is a software developer and entrepreneur from Copenhagen, Denmark. He is a computer polyglot and contributes to and maintains a number of open source projects. When not contemplating his next grand idea (which remains an idea), he enjoys music, sports, and design of all kinds. Christoffer can be found on GitHub as hallas and on Twitter as @hamderhallas.


The Five Kinds of Python Functions Python 3.4 Edition

Packt
06 Feb 2015
33 min read
This article is written by Steven Lott, author of the book Functional Python Programming. You can find more about him at http://slott-softwarearchitect.blogspot.com.

What's This About?

We're going to look at the various ways that Python 3 lets us define things which behave like functions. The proper term here is callable: we're looking at objects that can be called like a function. We'll look at the following Python constructs:

- Function definitions
- Higher-order functions
- Function wrappers (around methods)
- Lambdas
- Callable objects
- Generator functions and the yield statement

And yes, we're aware that the list above has six items on it. That's because higher-order functions in Python aren't really all that complex or different. In some languages, functions that take functions as arguments involve special syntax. In Python, it's simple and common, and barely worth mentioning as a separate topic. We'll look at when it's appropriate, and inappropriate, to use one or the other of these various functional forms.

Some background

Let's take a quick peek at a basic bit of mathematical formalism. We'll look at a function as an abstract formalism, often annotated like this: y = f(x). This shows us that f() is a function; it has one argument, x, and maps it to a single value, y. Some mathematical functions are written in front of the argument, for example y = sin x. Some are written in other places around the argument, for example y = |x|. In Python, the syntax is more consistent; we use a function like this:

    >>> abs(-5)
    5

We've applied the abs() function to an argument value of -5. The argument value was mapped to a value of 5.

Terminology

Consider a function of a pair of values that produces a pair of values, f(a, b) = (q, r). In this definition, the argument is the pair (a, b). The set of argument values for which the function is defined is called the domain; outside this domain, the function is not defined. In Python, we get a TypeError exception if we provide one value or three values as the argument. The function maps the domain pair to a pair of values, (q, r). The set of values that can be returned by the function is called the range of the function.

Mathematical function features

As we look at the abstract mathematical definition of functions, we note that functions are generally assumed to have no hysteresis; they have no history or memory of prior use. This is sometimes called the property of being idempotent: the results are always the same for a given argument value. We see this in Python as a common feature, but it's not universally true. We'll look at a number of exceptions to the rule of idempotence. Here's an example of the usual situation:

    >>> int("10f1", 16)
    4337

The value returned from the evaluation of int("10f1", 16) never changes. There are, however, some common examples of non-idempotent functions in Python.

Examples of hysteresis

Here are some common situations where a function has hysteresis. In some cases, results vary based on history; in other cases, results vary based on events in some external environment:

- Random number generators. We don't want them to produce the same value over and over again. The Python random.randrange() function is not idempotent.
- OS functions, which depend on the state of the machine as a whole.
The os.listdir() function, for example, returns values that depend on the use of functions such as os.unlink(), os.rename(), and open() (among several others). While the rules are generally simple, the results depend on stateful objects outside the narrow world of the code itself. These are examples of Python functions that don't completely fit the formal mathematical definition; they lack idempotence, and their values depend on history, other functions, or both.

Function Definitions

Python has two statements that are essential features of function definition. The def statement specifies the domain, and the return statement(s) specify the range. A simplified gloss of the syntax is as follows:

    def name(params):
        body
        return expression

In effect, the function's domain is defined by the parameters provided in the def statement. This list of parameter names is not all the information on the domain, however. Even if we use one of the Python extensions to add type annotations, that's still not all the information. There may be if statements in the body of the function that impose additional explicit restrictions. There may be other functions that impose their own kind of implicit restrictions; if, for example, the body included math.sqrt(), then there would be an implicit restriction that some values must be non-negative. The return statements provide the function's range. An empty return statement means a range of simply None values. When there are multiple return statements, the range is the union of the ranges of all the return statements. This mapping between Python syntax and mathematical concepts isn't very complete; we need more information about a function.

Example definition

Here's an example of a function definition:

    def odd(n):
        """odd(n) -> boolean, true if n is odd."""
        return n % 2 == 1

What does this definition tell us? Several things, such as:

- Domain: We know that this function accepts n, a single object.
- Range: A Boolean value, True if n is an odd number. This is the most likely interpretation. It's also remotely possible that the class of n has repurposed the __mod__() or __rmod__() methods, in which case the semantics can be pretty obscure.

Because of the inherent ambiguity in Python, this function provides a triple-quoted """Docstring""" with a summary of the function. This is a best practice, and should be followed universally, except in articles like this where it gets too long-winded to include a docstring everywhere. In this case, the docstring doesn't state unambiguously that n is intended to be a number. There are two ways to handle this gap:

- Actually include words like "n is a number" in the docstring
- Include in the docstring test cases that show the required behavior

Either is acceptable. Both are preferable.

Using a function

To complete this example, here's how we'd use this odd little function named odd():

    >>> odd(3)
    True
    >>> odd(4)
    False

This kind of example text can be included in the docstring to create two test cases that offer insight into what the function really means.

The lack of declarations

More verbose type declarations, as used in many popular programming languages, aren't actually enough information to fully specify a function's domain and range. To be rigorously complete, we need type definitions that include optional predicates. Take a look at the following expression:

    isinstance(n, int) and n >= 0

The assert statement is a good place for this kind of additional argument domain checking.
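As a minimal sketch, not taken from the article, here is how the odd() function might combine both suggestions, a doctest example plus an assert-based domain check; the exact docstring wording and the non-negative restriction are my own assumptions:

    def odd(n):
        """odd(n) -> boolean, true if n is odd.

        n is a non-negative integer:

        >>> odd(3)
        True
        >>> odd(4)
        False
        """
        # runtime domain check; skipped when Python runs with -O
        assert isinstance(n, int) and n >= 0, "n must be a non-negative integer"
        return n % 2 == 1

Running python -m doctest against the module would then exercise the examples embedded in the docstring.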
This isn't the perfect solution, because assert statements can be disabled very easily. It can, however, help during design and testing, and it can help people read your code. The fussy formal type declarations used in other languages are not really needed in Python. Python replaces an up-front claim about required types with a runtime search for appropriate class methods. This works because each Python object has all of its type information bound to it. Static compile-time type information is redundant, since the runtime type information is complete. A Python function definition is pretty spare. It includes the minimal amount of information about the function; there are no formal declarations of parameter types or the return type. This odd little function will work with any object that implements the % operator. Generally, this means any object that implements __mod__() or __rmod__(), which covers most subclasses of numbers.Number. It also means instances of any class that happens to provide these methods. That could become very weird, but it is still possible. We hesitate to think about non-numeric objects that work with the number-like % operator.

Some Python features

In Python, the functions we declare are proper first-class objects. This means that they are objects with attributes, which can be assigned to variables and placed into collections. Quite a few clever things can be done with function objects. One of the most elegant is to use a function as an argument or a return value of another function. The ability to do this means that we can easily create and use higher-order functions in Python. For folks who know languages such as C (and C++), functions there aren't proper first-class objects. A pointer to a function is a first-class object in C, but the function itself is a block of code that can't easily be manipulated. We'll look at a number of simple ways in which we can write and use higher-order functions in Python.

Functions are objects

Consider the following example:

    >>> not_even = odd
    >>> not_even(3)
    True

We've assigned the odd little function object to a new variable, not_even. This creates an alias for the function. While this isn't always the best idea, there are times when we might want to provide an alternate name for a function, for example as part of maintaining backward compatibility with a previous release of a library.

Using functions

Consider the following function definition:

    def some_test(function, value):
        print(function, value)
        return function(value)

This function's domain includes arguments named function and value. We can see that it prints the arguments, then applies the function argument to the given value. When we use the preceding function, it looks like this:

    >>> some_test(odd, 3)
    <function odd at 0x613978> 3
    True

The some_test() function accepted a function as an argument. When we printed the function, we got a summary, <function odd at 0x613978>, that shows us some information about the object. We also see a summary of the argument value, 3. When we applied the function to the value, we got the expected result. We can, of course, extend this concept. In particular, we can apply a single function to many values.

Higher-order Functions

Higher-order functions become particularly useful when we apply them to collections of objects. The built-in map() function applies a simple function to each value in an argument sequence. Here's an example:

    >>> list(map(odd, [1,2,3,4]))
    [True, False, True, False]

We've used the map() function to apply the odd() function to each value in the sequence.
This is a lot like evaluating:

    >>> [odd(x) for x in [1,2,3,4]]

Here we've created a list comprehension instead of applying a higher-order map() function. Both are equivalent to the following snippet:

    [odd(1), odd(2), odd(3), odd(4)]

Here, we've manually applied the odd() function to each value in a sequence. We'll use a diesel engine alternator (and some hoses) as the subject for some concrete examples of higher-order functions.

Diesel engine background

Some basic diesel engine mechanics:

- The engine turns the alternator.
- The alternator generates pulses that drive the tachometer, amongst other things, like charging the batteries.
- The alternator therefore provides an indirect measurement of engine RPMs. Direct measurement would involve connecting to a small geared shaft; it's difficult and expensive.
- We already have a tachometer; it's just incorrect. The new alternator has new wheels, so the ratio between engine and alternator has changed.

We're not interested in installing a new tachometer. Instead, we'll create a conversion from a number on the tachometer, which is calibrated to the old alternator, to a proper number of engine RPMs. This has to allow for the change in ratio between the original alternator and the new one. Let's collect some data and see what we can figure out about engine RPMs.

New alternator

First approximation: all we did was fit new wheels, and we can presume that the old tachometer was correct. Since the new wheel is smaller, we'll have higher alternator RPMs, which means higher readings on the old tachometer. Here's the key question: how far wrong are the RPMs? The ratio of the old wheel to the new wheel is approximately 3.5 to 2.5. We can compute the potential ratio between what the tach says and what the engine is really doing:

    >>> 3.5/2.5
    1.4
    >>> 1/_
    0.7142857142857143

That's nice. Is it right? Can we really just multiply the displayed RPMs by 0.7 to get actual engine RPMs? Let's create the conversion card first, then collect some more data.

Use case

Given the RPM on the tachometer, what's the real RPM of the engine? Use the following function to find out:

    def eng(r):
        return r/1.4

Use it like the following:

    >>> eng(2100)
    1500.0

This seems useful. Tach says 2100, engine (theoretically) spinning at 1500, more or less. Let's confirm our hypothesis with some real data.

Data collection

Over a period of time, we recorded tachometer readings and actual RPMs using a visual RPM measuring device. The visual device requires a strip of reflective tape on one of the engine wheels; it uses a laser and counts returns per minute. Simple. Elegant. Accurate. And really inconvenient. But it got us some data we could digest. Skipping some boring statistics, we wind up with the following function that maps displayed RPMs to actual RPMs:

    def eng2(r):
        return 0.7724*r**1.0134

Here's a sample result:

    >>> eng2(2100)
    1797.1291903589386

When the tach says 2100, the engine is measured as spinning at about 1800 RPM. That's not quite the same as the theoretical model, but it's so close that it gives us a lot of confidence in this version. Of course, the number displayed is hideous; all that floating-point cruft is crazy. What can we do? Rounding is only part of the solution. We need to think through the use case. After all, we use this standing at the helm of the boat; how much detail is appropriate?

Limits and ranges

The engine has governors and only runs between 800 and 2500 RPM. That's a very tight limit.
Realistically, we're talking about this small range of values:

    >>> list(range(800, 2500, 200))
    [800, 1000, 1200, 1400, 1600, 1800, 2000, 2200, 2400]

There's no sensible reason for providing any more detailed engine RPMs. It's a sailboat; top speed is 7.5 knots (nautical miles per hour). Wind and current have far more impact on boat speed than the difference between 1600 and 1700 RPM. The tach can't be read to closer than 100-200 RPM; it's not digital, it's a red pointer near little tick lines. There's no reason to preserve more than a few bits of precision.

Example of Tach translation

Given the engine RPMs and the conversion function, we can deduce that the tachometer display will be between 1000 and 3200. This will map to engine RPMs in the range of about 800 to 2500. We can confirm this with a mapping like this:

    >>> list(map(eng2, range(1000,3200,200)))
    [847.3098694826986, 1019.258964596305, 1191.5942982618956, 1364.2609728487703, 1537.2178605443924, 1710.4329833319157, 1883.8807562755746, 2057.5402392829747, 2231.3939741669838, 2405.4271806626366, 2579.627182659544]

We've applied the eng2() mapping from tach to engine RPM. For tach readings between 1000 and 3200 in steps of 200, we've computed the actual engine RPMs. For those who use spreadsheets a lot, the range() function is like filling a column with values, and the map(eng2, ...) function is like filling an adjacent column with a calculation. We've created the result of applying a function to each value of a given range. As shown, this is a little difficult to use. We need to do a little more cleanup. What other function do we need to apply to the results?

Round to 100

Here's a function that will round to the nearest 100:

    def next100(n):
        return int(round(n, -2))

We could call this a kind of composite function built from a partial application of the round() and int() functions. If we map this function to the previous results, we get something a little easier to work with. How does this look?

    >>> tach = range(1000,3200,200)
    >>> list(map(next100, map(eng2, tach)))
    [800, 1000, 1200, 1400, 1500, 1700, 1900, 2100, 2200, 2400, 2600]

This expression is a bit complex; let's break it down into three discrete steps:

- First, map the eng2() function to tach numbers between 1000 and 3200. The result is effectively a sequence of values (it's not actually a list, it's a generator, a potential list).
- Second, map the next100() function to the results of the previous mapping.
- Finally, collect a single list object from the results.

We've applied two functions, eng2() and next100(), to a list of values. In principle, we've created a kind of composite function, next100(eng2(rpm)). Python doesn't support function composition directly, hence the complex-looking map-of-map syntax.

Interleave sequences of values

The final step is to create a table that shows both the tachometer reading and the computed engine RPMs. We need to interleave the input and output values into a single list of pairs. Here are the tach readings we're working with:

    >>> tach = range(1000,3200,200)

Here are the engine RPMs:

    >>> engine = list(map(next100, map(eng2, tach)))

Here's how we can interleave the two to create something that shows our tachometer reading and engine RPMs:

    >>> list(zip(tach, engine))
    [(1000, 800), (1200, 1000), (1400, 1200), (1600, 1400), (1800, 1500), (2000, 1700), (2200, 1900), (2400, 2100), (2600, 2200), (2800, 2400), (3000, 2600)]

The rest is pretty-printing.
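As a small illustration that isn't part of the original text, the pretty-printing could be as simple as a loop over the zipped pairs; the column headings here are my own assumption:

    # a minimal sketch: print the conversion card from the zipped pairs
    print("Tach  Engine")
    for t, e in zip(tach, engine):
        print("{0:>4d}  {1:>6d}".format(t, e))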
What's important is that we could take functions like eng() or eng2() and apply them to columns of numbers, creating columns of results. The map() function means that we don't have to write explicit for loops simply to apply a function to a sequence of values.

Map is lazy

We have a few other observations about the Python higher-order functions. First, these functions are lazy: they don't compute any results until required to by other statements or expressions. Because they don't create intermediate list objects, they may be quite fast. The laziness feature is true for the built-in higher-order functions map() and filter(). It's also true for many of the functions in the itertools library. Many of these functions don't simply create a list object; they yield values as requested. For debugging purposes, we use list() to see what's being produced. If we don't apply list() to the result of a lazy function, we simply see that it's a lazy function. Here's an example:

    >>> map(lambda x: x*1.4, range(1000,3200,200))
    <map object at 0x102130610>

We don't see a proper result here, because the lazy map() function didn't do anything. The list(), tuple(), or set() functions will force a lazy map() function to actually get up off the couch and compute something.

Function Wrappers

There are a number of Python functions which are syntactic sugar for method functions. One example is the len() function. This function behaves as if it had the following definition:

    def len(obj):
        return obj.__len__()

The function acts as if it simply invokes the object's built-in __len__() method. There are several Python functions that exist only to make the syntax a little more readable. Postfix syntax purists would prefer to see syntax such as some_list.len(). Those who like their code to look a little more mathematical prefer len(some_list). Some people will go so far as to claim that the presence of prefix functions means that Python isn't strictly object-oriented. This is false; Python is very strictly object-oriented. It simply doesn't use only postfix method notation. We can write function wrappers to make some method functions a little more palatable. Another good example is the divmod() function. This relies on two method functions, as follows:

    a.__divmod__(b)
    b.__rdivmod__(a)

The usual operator rules apply here. If the class of object a implements __divmod__(), then that's used to compute the result. If not, the same test is made for the class of object b; if there's an implementation, it will be used to compute the result. Otherwise, the operation is undefined and we'll get an exception.

Why wrap a method?

Function wrappers for methods are syntactic sugar. They exist to make object methods look like simple functions. In some cases, the functional view is more succinct and expressive. Sometimes the object involved is obvious. For example, the os module functions provide access to OS-level libraries; the OS object is concealed inside the module. Sometimes the object is implied. For example, the random module makes a Random instance for us. We can simply call random.randint() without worrying about the object that was required for this to work properly.

Lambdas

A lambda is an anonymous function with a degenerate body. It's like a function in some respects, and it's unlike a function because of the following two things:

- A lambda has no name
- A lambda has no statements

A lambda's body is a single expression, nothing more.
This expression can have parameters, however, which is why a lambda is a handy form of a callable function. The syntax is essentially as follows:

    lambda params: expression

Here's a concrete example:

    lambda r: 0.7724*r**1.0134

You may recognize this as the eng2() function defined previously. We don't always need a complete, formal function; sometimes we just need an expression that has parameters. Speaking theoretically, a lambda is a one-argument function. When we have multi-argument functions, we can transform them into a series of one-argument lambda forms. This transformation can be helpful for optimization. None of that applies to Python, so we'll move on.

Using a Lambda with map

Here are two equivalent expressions:

    map(eng2, tach)
    map(lambda r: 0.7724*r**1.0134, tach)

Here's a previous example, using the lambda instead of the function:

    >>> tach = range(1000,3200,200)
    >>> list(map(lambda r: 0.7724*r**1.0134, tach))
    [847.3098694826986, 1019.258964596305, 1191.5942982618956, 1364.2609728487703, 1537.2178605443924, 1710.4329833319157, 1883.8807562755746, 2057.5402392829747, 2231.3939741669838, 2405.4271806626366, 2579.627182659544]

You can scroll back to see that the results are the same. If we're doing a small thing once only, a lambda object might be clearer than a complete function definition. The emphasis here is on small and once only. If we start trying to reuse a lambda object, or feel the need to assign a lambda object to a variable, we should really consider a function definition and the associated docstring and doctest features.

Another use of Lambdas

A common use of lambdas is with three other higher-order functions: the sort() method and the built-in min() and max() functions. We might use one of these with a list object:

    some_list.sort(key=lambda x: expr)
    min(some_list, key=lambda x: expr)
    max(some_list, key=lambda x: expr)

In each case, we're using a lambda object to embed an expression into the argument values of a function. In some cases, the expression might be very sophisticated; in other cases, it might be something as trivial as lambda x: x[1]. When the expression is trivial, a lambda object is a good idea. If the expression is going to get reused, however, a lambda object might be a bad idea.

You can do this… But…

The following kind of statement makes sense:

    some_name = lambda x: 3*x+1

We've created a callable object that takes a single argument value and returns a numeric value, much like the following function definition:

    def some_name(x): return 3*x+1

There are some differences, most notably the following:

- A lambda object is all on one line of code. A possible advantage.
- There's no docstring. A disadvantage for lambdas of any complexity.
- Nor is there any doctest in the missing docstring. A significant problem for a lambda object that requires testing. There are ways to test lambdas with doctest outside a docstring, but it seems simpler to switch to a full function definition.
- We can't easily apply decorators to it; we lose the @decorator syntax.
- We can't use any Python statements in it. In particular, no try-except block is possible.

For these reasons, we suggest limiting the use of lambdas to truly trivial situations.

Callable Objects

A callable object fits the model of a function. The unifying feature of all of the things we've looked at is that they're callable. Functions are the primary example of being callable, but objects can also be callable. Callable objects can be subclasses of collections.abc.Callable. Because of Python's flexibility, this isn't a requirement; it's merely a good idea.
To be callable, a class only needs to provide a __call__() method. Here's a complete callable class definition:

from collections.abc import Callable

class Engine(Callable):
    def __call__(self, tach):
        return 0.7724*tach**1.0134

We've imported the collections.abc.Callable class. This will provide some assurance that any class that extends this abstract superclass will provide a definition for the __call__() method. This is a handy error-checking feature. Our class extends Callable by providing the needed __call__() method. In this case, the __call__() method performs a calculation on the single parameter value, returning a single result. Here's a callable object built from this class:

eng = Engine()

This creates a function that we can then use. We can evaluate eng(1000) to get the engine RPMs when the tach reads 1000.

Callable objects step-by-step

There are two parts to making a function a callable object. We'll emphasize these for folks who are new to object-oriented programming:

1. Define a class. Generally, we make this a subclass of collections.abc.Callable. Technically, we only need to implement a __call__() method. It helps to use the proper superclass because it might help catch a few common mistakes.
2. Create an instance of the class. This instance will be a callable object.

The object that's created will be very similar to a defined function. And very similar to a lambda object that's been assigned to a variable. While it will be similar to a def statement, it will have one important additional feature: hysteresis. This can be the source of endless bugs. It can also be a way to improve performance.

Callables can have hysteresis

Here's an example of a callable object that uses hysteresis as a kind of optimization:

class Factorial(Callable):
    def __init__(self):
        self.previous = {}
    def __call__(self, n):
        if n not in self.previous:
            self.previous[n] = self.compute(n)
        return self.previous[n]
    def compute(self, n):
        if n == 0: return 1
        return n*self.__call__(n-1)

Here's how we can use this:

>>> fact= Factorial()
>>> fact(5)
120

We create an instance of the class, and then call the instance to compute a value for us.

The initializer

The initialization method looks like this:

    def __init__(self):
        self.previous = {}

This function creates a cache of previously computed values. This is a technique called memoization. If we've already computed a result once, it's in the self.previous cache; we don't need to compute it again, we already know the answer.

The Callable interface

The required __call__() method looks like this:

    def __call__(self, n):
        if n not in self.previous:
            self.previous[n] = self.compute(n)
        return self.previous[n]

We've checked the memoization cache first. If the value is not there, we're forced to compute the answer, and insert it into the cache. The final answer is always a value in the cache. A common what if question is what if we have a function of multiple arguments? There are two minuscule changes to support more complex arguments. Use def __call__(self, *n): and self.compute(*n). Since we're only computing factorial, there's no need to over-generalize.

The Compute method

The essential computation has been allocated to a method called compute. It looks like this:

    def compute(self, n):
        if n == 0: return 1
        return n*self.__call__(n-1)

This does the real work of the callable object: it computes n!. In this case, we've used a pretty standard recursive factorial definition.
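Before looking at that recursion more closely, here is a brief informal exploration of our own that makes the hysteresis visible; it assumes the Factorial class defined above is already in scope.

fact = Factorial()

print(fact(5))                 # 120; this call fills the cache for 0 through 5
print(sorted(fact.previous))   # [0, 1, 2, 3, 4, 5]

print(fact(7))                 # 5040; only 6 and 7 are newly computed
print(sorted(fact.previous))   # [0, 1, 2, 3, 4, 5, 6, 7]

# Idempotence is preserved: repeating a call returns the same cached answer.
assert fact(5) == 120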
This recursion relies on the __call__() method to check the cache for previous values. If we don't expect to compute values larger than 1000! (a 2,568 digit number, by the way), the recursion works nicely. If we think we need to compute really large factorials, we'll need to use a different approach. Execute the following code to compute very large factorials:

functools.reduce(operator.mul, range(1,n+1))

Either way, we can depend on the internal memoization to leverage previous results.

Note the potential issue

Hysteresis—memory of what came before—is available to callable objects. We call functions and lambdas stateless, whereas callable objects can be stateful. This may be desirable to optimize performance. We can memoize the previous results, or we can design an object that's simply confusing. Consider a function like divmod() that returns two values. We could try to define a callable object that first returns the quotient and, on the second call with the same arguments, returns the remainder:

>>> crazy_divmod(355,113)
3
>>> crazy_divmod(355,113)
16

This is technically possible. But it's crazy. Warning: Stay away. We generally expect idempotence: functions do the same thing each time. Implementing memoization didn't alter the basic idempotence of our factorial function.

Generator Functions

Here's a fun example: the Collatz function. The function creates a sequence using a simple pair of rules. We could call this rule Half-Or-Three-Plus-One (HOTPO). We'll call it collatz():

def collatz(n):
    if n % 2 == 0:
        return n//2
    else:
        return 3*n+1

Each integer argument yields a next integer. These can form a chain. For example, if we start with collatz(13), we get 40. The value of collatz(40) is 20. Here's the sequence of values:

13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

At 1, it loops: 1 → 4 → 2 → 1 …

Interestingly, all chains seem to lead—eventually—to 1. To explore this, we need a simple function that will build a chain from a given starting value.

Successive values

Here's a function that will build a list object. It iterates through values in the sequence until it reaches 1, when it terminates:

def col_list(n):
    seq = [n]
    while n != 1:
        n = collatz(n)
        seq.append(n)
    return seq

This is not wrong. But it's not really the most useful implementation. This always creates a sequence object. In many cases, we don't really want an object, we only want information about the sequence. We might only want the length, or the largest numbers, or the sum. This is where a generator function might be more useful. A generator function yields elements from the sequence instead of building the sequence as a single object.

Generator functions

To create a generator function, we write a function that has a loop; inside the loop, there's a yield statement. A function with a yield statement is effectively an Iterable object; it can be used in a for statement to produce values. It doesn't create a big list object, it creates the items that can be accumulated into a list or tuple object. A generator function is lazy: it doesn't compute anything unless forced to by another function needing results. We can iterate through as many (or as few) of the results as we need. For example, list(some_generator()) forces all values to be returned. For another example of a lazy sequence, look at how range() objects work. If we evaluate range(10), we only get a lazy range object. If we evaluate list(range(10)), we get a list object.
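Here is a tiny, self-contained sketch of our own showing the same lazy behavior with a handwritten generator function; nothing runs until something iterates over it.

def countdown(n):
    # A generator function: it yields values one at a time instead of building a list.
    while n > 0:
        yield n
        n -= 1

g = countdown(3)
print(g)        # prints something like <generator object countdown at 0x...>
print(list(g))  # [3, 2, 1]; list() forces the generator to run to completion
print(list(g))  # []; the generator is now exhausted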
The Collatz generator

Here's a generator function that will produce sequences of values using the collatz() rule shown previously:

def col_iter(n):
    yield n
    while n != 1:
        n = collatz(n)
        yield n

When we use this in a for loop or with the list() function, it will first yield the argument number. While the number is not 1, it will apply the collatz() function and yield successive values in the chain. When it has yielded 1, it will terminate. One common pattern for generator functions is to replace all list-accumulation statements with yield statements. Instead of building a list one item at a time, we yield each item. The col_iter() generator is lazy. We don't get an answer unless we use list() or tuple() or some variation of a for statement context.

Using a generator function

Here's how this function looks in practice:

>>> for i in col_iter(3):
...     print(i)
3
10
5
16
8
4
2
1

We've used the generator function in a for loop so that it will yield all of the values until it terminates.

Collatz function sequences

Now we can do some exploration of this Collatz sequence. Here are a few evaluations of the col_iter() function:

>>> list(col_iter(3))
[3, 10, 5, 16, 8, 4, 2, 1]
>>> list(col_iter(5))
[5, 16, 8, 4, 2, 1]
>>> list(col_iter(6))
[6, 3, 10, 5, 16, 8, 4, 2, 1]
>>> list(col_iter(13))
[13, 40, 20, 10, 5, 16, 8, 4, 2, 1]

There's an interesting pattern here. It seems that from 16, we know the rest. Generalizing this: from any number we've already seen, we know the rest. Wait. This means that memoization might be a big help in exploring the values created by this sequence. When we start combining function design patterns like this, we're doing functional programming. We're stepping outside the box of purely object-oriented Python.

Alternate styles

Here is an alternative version of the collatz() function:

def collatz2(n):
    return n//2 if n%2 == 0 else 3*n+1

This simply collapses the if statements into a single if expression and may not help much. We also have this:

collatz3 = lambda n: n//2 if n%2 == 0 else 3*n+1

We've collapsed the expression into a lambda object. Helpful? Perhaps not. On the other hand, the function doesn't really need all of the overhead of a full function definition and multiple statements. The lambda object seems to capture everything relevant.

Functions as objects

There's a higher-level function that will produce values until some ending condition is met. We can plug in one of the versions of the collatz() function and a termination test into this general-purpose function:

def recurse_until(ending, the_function, n):
    yield n
    while not ending(n):
        n = the_function(n)
        yield n

This requires two plug-in functions, which are as follows:

ending() is a function to test whether we're done, for example, lambda n: n==1
the_function() is a form of the Collatz function

We've completely uncoupled the general idea of recursively applying a function from a specific function and a specific terminating condition.

Using the recurse_until() function

We can apply this higher-order recurse_until() function like this:

>>> recurse_until(lambda n: n==1, collatz2, 9)
<generator object recurse_until at 0x1021278c0>

What's that? That's how a lazy generator looks: it didn't return any values because we didn't demand any values. We need to use it in a loop or some kind of expression that iterates through all available values. The list() function, for example, will collect all of the values.
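Because the result is a lazy generator, we can also feed it straight into a reduction without ever materializing a list. The following is a small sketch of our own; it assumes the recurse_until() and collatz2() definitions shown above are in scope.

chain = recurse_until(lambda n: n == 1, collatz2, 27)

# Count the values without building a list; each one is consumed as it is produced.
print(sum(1 for value in chain))                             # 112 values, starting from 27

# That generator is now exhausted, so create a fresh one for the next reduction.
print(max(recurse_until(lambda n: n == 1, collatz2, 27)))    # the chain peaks at 9232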
Getting the list of values

Here's how we make the lazy generator do some work:

>>> list(_)
[9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]

The _ variable is the previously computed value. It relieves us from the burden of having to write an assignment statement. We can write an expression, see the results, and know the results were automatically saved in the _ variable.

Project Euler #14

Which starting number, under one million, produces the longest chain? Try it without memoization, on a small scale first. Here's a version that checks only the starting values from 1 to 10:

>>> collatz_len = [len(list(recurse_until(lambda n: n==1, collatz2, i)))
...     for i in range(1,11)]
>>> results = zip(collatz_len, range(1,11))
>>> max(results)
(20, 9)
>>> list(col_iter(9))
[9, 28, 14, 7, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1]

We defined collatz_len as a list. We're writing a list comprehension; the expression inside it evaluates len(something) for i in range(1,11). This means we'll be collecting ten values into the list, each of which is the length of something. The something is a list object built from the recurse_until(lambda n: n==1, collatz2, i) function that we discussed. This will compute the sequence of Collatz values starting from i and proceeding until the value in the sequence is 1. We've zipped the lengths with the original values of i. This will create pairs of lengths and starting numbers. The maximum length will now have a starting value associated with it so that we can confirm that the results match our expectations. And yes, this Project Euler problem could—in principle—be solved in a single line of code. Will this scale to 100? 1,000? 1,000,000? How much will memoization help?

Summary

In this article, we've looked at five (or six) kinds of Python callables. They all fit the y = f(x) model of a function to varying degrees. When is it appropriate to use each of these different ways to express the same essential concept?

Functions are created with def and return. It shouldn't come as a surprise that this should cover most cases. This allows a docstring comment and doctest test cases. We could call these def functions, since they're built with the def statement.

Higher-order functions—map(), filter(), and the itertools library—are generally written as plain-old def functions. They're higher-order because they accept functions as arguments or return functions as results. Otherwise, they're just functions.

Function wrappers—len(), divmod(), pow(), str(), and repr()—are function syntax wrapped around object methods. These are def'd functions with very tiny bodies. We use them because a.pow(2) doesn't seem as clear as pow(a, 2).

Lambdas are appropriate for one-time use of something so simple that it doesn't deserve being wrapped in a def statement body. In some cases, we have a small nugget of code that seems clearer when written as a lambda expression rather than a more complete function definition. Simple filter rules and simple computations are often more clearly shown as a lambda object.

The Callable objects have a special property that other functions lack: hysteresis. They can retain the results of previous calculations. We've used this hysteresis property to implement memoizing. This can be a huge performance boost. Callable objects can be used badly, however, to create objects that have simply bizarre behavior. Most functions should strive for idempotence—the same arguments should yield the same results.
Generator functions are created with a def statement and at least one yield statement. These functions are iterable. They can be used in a for statement to examine each resulting value. They can also be used with functions like list(), tuple(), and set() to create an actual object from the iterable sequence of values. We might combine them with higher-order functions to do complex processing, one item at a time. It's important to work with each of these kinds of callables. If you only have one tool—a hammer—then every problem has to be reshaped into a nail before you can solve it. Once you have multiple tools available, you can pick the tools that provides the most succinct and expressive solution to the problem. Resources for Article: Further resources on this subject: Expert Python Programming [article] Python Network Programming Cookbook [article] Learning Python Design Patterns [article]
Working with WebStart and the Browser Plugin

Packt
06 Feb 2015
12 min read
 In this article by Alex Kasko, Stanislav Kobyl yanskiy, and Alexey Mironchenko, authors of the book OpenJDK Cookbook, we will cover the following topics: Building the IcedTea browser plugin on Linux Using the IcedTea Java WebStart implementation on Linux Preparing the IcedTea Java WebStart implementation for Mac OS X Preparing the IcedTea Java WebStart implementation for Windows Introduction For a long time, for end users, the Java applets technology was the face of the whole Java world. For a lot of non-developers, the word Java itself is a synonym for the Java browser plugin that allows running Java applets inside web browsers. The Java WebStart technology is similar to the Java browser plugin but runs remotely on loaded Java applications as separate applications outside of web browsers. The OpenJDK open source project does not contain the implementations for the browser plugin nor for the WebStart technologies. The Oracle Java distribution, otherwise matching closely to OpenJDK codebases, provided its own closed source implementation for these technologies. The IcedTea-Web project contains free and open source implementations of the browser plugin and WebStart technologies. The IcedTea-Web browser plugin supports only GNU/Linux operating systems and the WebStart implementation is cross-platform. While the IcedTea implementation of WebStart is well-tested and production-ready, it has numerous incompatibilities with the Oracle WebStart implementation. These differences can be seen as corner cases; some of them are: Different behavior when parsing not well-formed JNLP descriptor files: The Oracle implementation is generally more lenient for malformed descriptors. Differences in JAR (re)downloading and caching behavior: The Oracle implementation uses caching more aggressively. Differences in sound support: This is due to differences in sound support between Oracle Java and IcedTea on Linux. Linux historically has multiple different sound providers (ALSA, PulseAudio, and so on) and IcedTea has more wide support for different providers, which can lead to sound misconfiguration. The IcedTea-Web browser plugin (as it is built on WebStart) has these incompatibilities too. On top of them, it can have more incompatibilities in relation to browser integration. User interface forms and general browser-related operations such as access from/to JavaScript code should work fine with both implementations. But historically, the browser plugin was widely used for security-critical applications like online bank clients. Such applications usually require security facilities from browsers, such as access to certificate stores or hardware crypto-devices that can differ from browser to browser, depending on the OS (for example, supports only Windows), browser version, Java version, and so on. Because of that, many real-world applications can have problems running the IcedTea-Web browser plugin on Linux. Both WebStart and the browser plugin are built on the idea of downloading (possibly untrusted) code from remote locations, and proper privilege checking and sandboxed execution of that code is a notoriously complex task. Usually reported security issues in the Oracle browser plugin (most widely known are issues during the year 2012) are also fixed separately in IcedTea-Web. Building the IcedTea browser plugin on Linux The IcedTea-Web project is not inherently cross-platform; it is developed on Linux and for Linux, and so it can be built quite easily on popular Linux distributions. 
The two main parts of it (stored in corresponding directories in the source code repository) are netx and plugin. NetX is a pure Java implementation of the WebStart technology. We will look at it more thoroughly in the following recipes of this article. Plugin is an implementation of the browser plugin using the NPAPI plugin architecture that is supported by multiple browsers. Plugin is written partly in Java and partly in native code (C++), and it officially supports only Linux-based operating systems. There exists an opinion about NPAPI that this architecture is dated, overcomplicated, and insecure, and that modern web browsers have enough built-in capabilities to not require external plugins. And browsers have gradually reduced support for NPAPI. Despite that, at the time of writing this book, the IcedTea-Web browser plugin worked on all major Linux browsers (Firefox and derivatives, Chromium and derivatives, and Konqueror). We will build the IcedTea-Web browser plugin from sources using Ubuntu 12.04 LTS amd64. Getting ready For this recipe, we will need a clean Ubuntu 12.04 running with the Firefox web browser installed. How to do it... The following procedure will help you to build the IcedTea-Web browser plugin: Install prepackaged binaries of OpenJDK 7: sudo apt-get install openjdk-7-jdk Install the GCC toolchain and build dependencies: sudo apt-get build-dep openjdk-7 Install the specific dependency for the browser plugin: sudo apt-get install firefox-dev Download and decompress the IcedTea-Web source code tarball: wget http://icedtea.wildebeest.org/download/source/icedtea-web-1.4.2.tar.gz tar xzvf icedtea-web-1.4.2.tar.gz Run the configure script to set up the build environment: ./configure Run the build process: make Install the newly built plugin into the /usr/local directory: sudo make install Configure the Firefox web browser to use the newly built plugin library: mkdir ~/.mozilla/plugins cd ~/.mozilla/plugins ln -s /usr/local/IcedTeaPlugin.so libjavaplugin.so Check whether the IcedTea-Web plugin has appeared under Tools | Add-ons | Plugins. Open the http://java.com/en/download/installed.jsp web page to verify that the browser plugin works. How it works... The IcedTea browser plugin requires the IcedTea Java implementation to be compiled successfully. The prepackaged OpenJDK 7 binaries in Ubuntu 12.04 are based on IcedTea, so we installed them first. The plugin uses the GNU Autconf build system that is common between free software tools. The xulrunner-dev package is required to access the NPAPI headers. The built plugin may be installed into Firefox for the current user only without requiring administrator privileges. For that, we created a symbolic link to our plugin in the place where Firefox expects to find the libjavaplugin.so plugin library. There's more... The plugin can also be installed into other browsers with NPAPI support, but installation instructions can be different for different browsers and different Linux distributions. As the NPAPI architecture does not depend on the operating system, in theory, a plugin can be built for non-Linux operating systems. But currently, no such ports are planned. Using the IcedTea Java WebStart implementation on Linux On the Java platform, the JVM needs to perform the class load process for each class it wants to use. This process is opaque for the JVM and actual bytecode for loaded classes may come from one of many sources. 
For example, this method allows the Java Applet classes to be loaded from a remote server to the Java process inside the web browser. Remote class loading also may be used to run remotely loaded Java applications in standalone mode without integration with the web browser. This technique is called Java WebStart and was developed under Java Specification Request (JSR) number 56. To run the Java application remotely, WebStart requires an application descriptor file that should be written using the Java Network Launching Protocol (JNLP) syntax. This file is used to define the remote server to load the application form along with some metainformation. The WebStart application may be launched from the web page by clicking on the JNLP link, or without the web browser using the JNLP file obtained beforehand. In either case, running the application is completely separate from the web browser, but uses a sandboxed security model similar to Java Applets. The OpenJDK project does not contain the WebStart implementation; the Oracle Java distribution provides its own closed-source WebStart implementation. The open source WebStart implementation exists as part of the IcedTea-Web project. It was initially based on the NETwork eXecute (NetX) project. Contrary to the Applet technology, WebStart does not require any web browser integration. This allowed developers to implement the NetX module using pure Java without native code. For integration with Linux-based operating systems, IcedTea-Web implements the javaws command as shell script that launches the netx.jar file with proper arguments. In this recipe, we will build the NetX module from the official IcedTea-Web source tarball. Getting ready For this recipe, we will need a clean Ubuntu 12.04 running with the Firefox web browser installed. How to do it... The following procedure will help you to build a NetX module: Install prepackaged binaries of OpenJDK 7: sudo apt-get install openjdk-7-jdk Install the GCC toolchain and build dependencies: sudo apt-get build-dep openjdk-7 Download and decompress the IcedTea-Web source code tarball: wget http://icedtea.wildebeest.org/download/source/icedtea-web-1.4.2.tar.gz tar xzvf icedtea-web-1.4.2.tar.gz Run the configure script to set up a build environment excluding the browser plugin from the build: ./configure –disable-plugin Run the build process: make Install the newly-built plugin into the /usr/local directory: sudo make install Run the WebStart application example from the Java tutorial: javaws http://docs.oracle.com/javase/tutorialJWS/samples/ deployment/dynamictree_webstartJWSProject/dynamictree_webstart.jnlp How it works... The javaws shell script is installed into the /usr/local/* directory. When launched with a path or a link to the JNLP file, javaws launches the netx.jar file, adding it to the boot classpath (for security reasons) and providing the JNLP link as an argument. Preparing the IcedTea Java WebStart implementation for Mac OS X The NetX WebStart implementation from the IcedTea-Web project is written in pure Java, so it can also be used on Mac OS X. IcedTea-Web provides the javaws launcher implementation only for Linux-based operating systems. In this recipe, we will create a simple implementation of the WebStart launcher script for Mac OS X. Getting ready For this recipe, we will need Mac OS X Lion with Java 7 (the prebuilt OpenJDK or Oracle one) installed. We will also need the netx.jar module from the IcedTea-Web project, which can be built using instructions from the previous recipe. 
How to do it... The following procedure will help you to run WebStart applications on Mac OS X: Download the JNLP descriptor example from the Java tutorials at http://docs.oracle.com/javase/tutorialJWS/samples/deployment/dynamictree_webstartJWSProject/dynamictree_webstart.jnlp. Test that this application can be run from the terminal using netx.jar: java -Xbootclasspath/a:netx.jar net.sourceforge.jnlp.runtime.Boot dynamictree_webstart.jnlp Create the wslauncher.sh bash script with the following contents: #!/bin/bash if [ "x$JAVA_HOME" = "x" ] ; then JAVA="$( which java 2>/dev/null )" else JAVA="$JAVA_HOME"/bin/java fi if [ "x$JAVA" = "x" ] ; then echo "Java executable not found" exit 1 fi if [ "x$1" = "x" ] ; then echo "Please provide JNLP file as first argument" exit 1 fi $JAVA -Xbootclasspath/a:netx.jar net.sourceforge.jnlp.runtime.Boot $1 Mark the launcher script as executable: chmod 755 wslauncher.sh Run the application using the launcher script: ./wslauncher.sh dynamictree_webstart.jnlp How it works... The next.jar file contains a Java application that can read JNLP files and download and run classes described in JNLP. But for security reasons, next.jar cannot be launched directly as an application (using the java -jar netx.jar syntax). Instead, netx.jar is added to the privileged boot classpath and is run specifying the main class directly. This allows us to download applications in sandbox mode. The wslauncher.sh script tries to find the Java executable file using the PATH and JAVA_HOME environment variables and then launches specified JNLP through netx.jar. There's more... The wslauncher.sh script provides a basic solution to run WebStart applications from the terminal. To integrate netx.jar into your operating system environment properly (to be able to launch WebStart apps using JNLP links from the web browser), a native launcher or custom platform scripting solution may be used. Such solutions lay down the scope of this book. Preparing the IcedTea Java WebStart implementation for Windows The NetX WebStart implementation from the IcedTea-Web project is written in pure Java, so it can also be used on Windows; we also used it on Linux and Mac OS X in previous recipes in this article. In this recipe, we will create a simple implementation of the WebStart launcher script for Windows. Getting ready For this recipe, we will need a version of Windows running with Java 7 (the prebuilt OpenJDK or Oracle one) installed. We will also need the netx.jar module from the IcedTea-Web project, which can be built using instructions from the previous recipe in this article. How to do it... The following procedure will help you to run WebStart applications on Windows: Download the JNLP descriptor example from the Java tutorials at http://docs.oracle.com/javase/tutorialJWS/samples/deployment/dynamictree_webstartJWSProject/dynamictree_webstart.jnlp. 
Test that this application can be run from the terminal using netx.jar: java -Xbootclasspath/a:netx.jar net.sourceforge.jnlp.runtime.Boot dynamictree_webstart.jnlp Create the wslauncher.sh bash script with the following contents: #!/bin/bash if [ "x$JAVA_HOME" = "x" ] ; then JAVA="$( which java 2>/dev/null )" else JAVA="$JAVA_HOME"/bin/java fi if [ "x$JAVA" = "x" ] ; then echo "Java executable not found" exit 1 fi if [ "x$1" = "x" ] ; then echo "Please provide JNLP file as first argument" exit 1 fi $JAVA -Xbootclasspath/a:netx.jar net.sourceforge.jnlp.runtime.Boot $1 Mark the launcher script as executable: chmod 755 wslauncher.sh Run the application using the launcher script: ./wslauncher.sh dynamictree_webstart.jnlp How it works... The netx.jar module must be added to the boot classpath as it cannot be run directly because of security reasons. The wslauncher.bat script tries to find the Java executable using the JAVA_HOME environment variable and then launches specified JNLP through netx.jar. There's more... The wslauncher.bat script may be registered as a default application to run the JNLP files. This will allow you to run WebStart applications from the web browser. But the current script will show the batch window for a short period of time before launching the application. It also does not support looking for Java executables in the Windows Registry. A more advanced script without those problems may be written using Visual Basic script (or any other native scripting solution) or as a native executable launcher. Such solutions lay down the scope of this book. Summary In this article we covered the configuration and installation of WebStart and browser plugin components, which are the biggest parts of the Iced Tea project.
Visualforce Development with Apex

Packt
06 Feb 2015
12 min read
In this article by Matt Kaufman and Michael Wicherski, authors of the book Learning Apex Programming, we will see how we can use Apex to extend the Salesforce1 Platform. We will also see how to create a customized Force.com page. (For more resources related to this topic, see here.) Apex on its own is a powerful tool to extend the Salesforce1 Platform. It allows you to define your own database logic and fully customize the behavior of the platform. Sometimes, controlling "what happens behind the scenes isn't enough. You might have a complex process that needs to step users through a wizard or need to present data in a format that isn't native to the Salesforce1 Platform, or maybe even make things look like your corporate website. Anytime you need to go beyond custom logic and implement a custom interface, you can turn to Visualforce. Visualforce is the user interface framework for the Salesforce1 Platform. It supports the use of HTML, JavaScript, CSS, and Flash—all of which enable you to build your own custom web pages. These web pages are stored and hosted by the Salesforce1 Platform and can be exposed to just your internal users, your external community users, or publicly to the world. But wait, there's more! Also included with Visualforce is a robust markup language. This markup language (which is also referred to as Visualforce) allows you to bind your web pages to data and actions stored on the platform. It also allows you to leverage Apex for code-based objects and actions. Like the rest of the platform, the markup portion of Visualforce is upgraded three times a year with new tags and features. All of these features mean that Visualforce is very powerful. s-con-what? Before the "introduction of Visualforce, the Salesforce1 Platform had a feature called s-controls. These were simple files where you could write HTML, CSS, and JavaScript. There was no custom markup language included. In order to make things look like the Force.com GUI, a lot of HTML was required. If you wanted to create just a simple input form for a new Account record, so much HTML code was required. 
The following is just a" small, condensed excerpt of what the HTML would look like if you wanted to recreate such a screen from scratch: <div class="bPageTitle"><div class="ptBody"><div class="content"> <img src="/s.gif" class="pageTitleIcon" title="Account" /> <h1 class="pageType">    Account Edit<span class="titleSeparatingColon">:</span> </h1> <h2 class="pageDescription"> New Account</h2> <div class="blank">&nbsp;</div> </div> <div class="links"></div></div><div   class="ptBreadcrumb"></div></div> <form action="/001/e" method="post" onsubmit="if   (window.ffInAlert) { return false; }if (window.sfdcPage   &amp;&amp; window.sfdcPage.disableSaveButtons) { return   window.sfdcPage.disableSaveButtons(); }"> <div class="bPageBlock brandSecondaryBrd bEditBlock   secondaryPalette"> <div class="pbHeader">    <table border="0" cellpadding="0" cellspacing="0"><tbody>      <tr>      <td class="pbTitle">      <img src="/s.gif" width="12" height="1" class="minWidth"         style="margin-right: 0.25em;margin-right: 0.25em;margin-       right: 0.25em;">      <h2 class="mainTitle">Account Edit</h2>      </td>      <td class="pbButton" id="topButtonRow">      <input value="Save" class="btn" type="submit">      <input value="Cancel" class="btn" type="submit">      </td>      </tr>    </tbody></table> </div> <div class="pbBody">    <div class="pbSubheader brandTertiaryBgr first       tertiaryPalette" >    <span class="pbSubExtra"><span class="requiredLegend       brandTertiaryFgr"><span class="requiredExampleOuter"><span       class="requiredExample">&nbsp;</span></span>      <span class="requiredMark">*</span>      <span class="requiredText"> = Required Information</span>      </span></span>      <h3>Account Information<span         class="titleSeparatingColon">:</span> </h3>    </div>    <div class="pbSubsection">    <table class="detailList" border="0" cellpadding="0"     cellspacing="0"><tbody>      <tr>        <td class="labelCol requiredInput">        <label><span class="requiredMark">*</span>Account         Name</label>      </td>      <td class="dataCol col02">        <div class="requiredInput"><div         class="requiredBlock"></div>        <input id="acc2" name="acc2" size="20" type="text">        </div>      </td>      <td class="labelCol">        <label>Website</label>      </td>      <td class="dataCol">        <span>        <input id="acc12" name="acc12" size="20" type="text">        </span>      </td>      </tr>    </tbody></table>    </div> </div> <div class="pbBottomButtons">    <table border="0" cellpadding="0" cellspacing="0"><tbody>    <tr>      <td class="pbTitle"><img src="/s.gif" width="12" height="1"       class="minWidth" style="margin-right: 0.25em;margin-right:       0.25em;margin-right: 0.25em;">&nbsp;</td>      <td class="pbButtonb" id="bottomButtonRow">      <input value=" Save " class="btn" title="Save"         type="submit">      <input value="Cancel" class="btn" type="submit">      </td>    </tr>    </tbody></table> </div> <div class="pbFooter secondaryPalette"><div class="bg"> </div></div> </div> </form> We did our best to trim down this HTML to as little as possible. Despite all of our efforts, it still "took up more space than we wanted. The really sad part is that all of that code only results in the following screenshot: Not only was it time consuming to write all this HTML, but odds were that we wouldn't get it exactly right the first time. 
Worse still, every time the business requirements changed, we had to go through the exhausting effort of modifying the HTML code. Something had to change in order to provide us relief. That something was the introduction of Visualforce and its markup language. Your own personal Force.com The markup "tags in Visualforce correspond to various parts of the Force.com GUI. These tags allow you to quickly generate HTML markup without actually writing any HTML. It's really one of the greatest tricks of the Salesforce1 Platform. You can easily create your own custom screens that look just like the built-in ones with less effort than it would take you to create a web page for your corporate website. Take a look at the Visualforce markup that corresponds to the HTML and screenshot we showed you earlier: <apex:page standardController="Account" > <apex:sectionHeader title="Account Edit" subtitle="New Account"     /> <apex:form>    <apex:pageBlock title="Account Edit" mode="edit" >      <apex:pageBlockButtons>        <apex:commandButton value="Save" action="{!save}" />        <apex:commandButton value="Cancel" action="{!cancel}" />      </apex:pageBlockButtons>      <apex:pageBlockSection title="Account Information" >        <apex:inputField value="{!account.Name}" />        <apex:inputField value="{!account.Website}" />      </apex:pageBlockSection>    </apex:pageBlock> </apex:form> </apex:page> Impressive! With "merely these 15 lines of markup, we can render nearly 100 lines of earlier HTML. Don't believe us, you can try it out yourself. Creating a Visualforce page Just like" triggers and classes, Visualforce pages can "be created and edited using the Force.com IDE. The Force.com GUI also includes a web-based editor to work with Visualforce pages. To create a new Visualforce page, perform these simple steps: Right-click on your project and navigate to New | Visualforce Page. The Create New Visualforce Page window appears as shown: Enter" the label and name for your "new page in the Label and Name fields, respectively. For this example, use myTestPage. Select the API version for the page. For this example, keep it at the default value. Click on Finish. A progress bar will appear followed by your new Visualforce page. Remember that you always want to create your code in a Sandbox or Developer Edition org, not directly in Production. It is technically possible to edit Visualforce pages in Production, but you're breaking all sorts of best practices when you do. Similar to other markup languages, every tag in a Visualforce page must be closed. Tags and their corresponding closing tags must also occur in a proper order. The values of tag attributes are enclosed by double quotes; however, single quotes can be used inside the value to denote text values. Every Visualforce page starts with the <apex:page> tag and ends with </apex:page> as shown: <apex:page> <!-- Your content goes here --> </apex:page> Within "the <apex:page> tags, you can paste "your existing HTML as long as it is properly ordered and closed. The result will be a web page hosted by the Salesforce1 Platform. Not much to see here If you are" a web developer, then there's a lot you can "do with Visualforce pages. Using HTML, CSS, and images, you can create really pretty web pages that educate your users. If you have some programming skills, you can also use JavaScript in your pages to allow for interaction. If you have access to web services, you can use JavaScript to call the web services and make a really powerful application. 
Check out the following Visualforce page for an example of what you can do: <apex:page> <script type="text/javascript"> function doStuff(){    var x = document.getElementById("myId");    console.log(x); } </script> <img src="http://www.thisbook.com/logo.png" /> <h1>This is my title</h1> <h2>This is my subtitle</h2> <p>In a world where books are full of code, there was only one     that taught you everything you needed to know about Apex!</p> <ol>    <li>My first item</li>    <li>Etc.</li> </ol> <span id="myId"></span> <iframe src="http://www.thisbook.com/mypage.html" /> <form action="http://thisbook.com/submit.html" >    <input type="text" name="yoursecret" /> </form> </apex:page> All of this code is standalone and really has nothing to do with the Salesforce1 Platform other than being hosted by it. However, what really makes Visualforce powerful is its ability to interact with your data, which allows your pages to be more dynamic. Even better, you" can write Apex code to control how "your pages behave, so instead of relying on client-side JavaScript, your logic can run server side. Summary In this article we learned how a few features of Apex and how we can use it to extend the SalesForce1 Platform. We also created a custom Force.com page. Well, you've made a lot of progress. Not only can you write code to control how the database behaves, but you can create beautiful-looking pages too. You're an Apex rock star and nothing is going to hold you back. It's time to show your skills to the world. If you want to dig deeper, buy the book and read Learning Apex Programming in a simple step-by-step fashion by using Apex, the language for extension of the Salesforce1 Platform. Resources for Article: Further resources on this subject: Learning to Fly with Force.com [article] Building, Publishing, and Supporting Your Force.com Application [article] Adding a Geolocation Trigger to the Salesforce Account Object [article]
Contexts and Dependency Injection in NetBeans

Packt
06 Feb 2015
18 min read
In this article by David R. Heffelfinger, the author of Java EE 7 Development with NetBeans 8, we will introduce Contexts and Dependency Injection (CDI) and other aspects of it. CDI can be used to simplify integrating the different layers of a Java EE application. For example, CDI allows us to use a session bean as a managed bean, so that we can take advantage of the EJB features, such as transactions, directly in our managed beans. In this article, we will cover the following topics: Introduction to CDI Qualifiers Stereotypes Interceptor binding types Custom scopes (For more resources related to this topic, see here.) Introduction to CDI JavaServer Faces (JSF) web applications employing CDI are very similar to JSF applications without CDI; the main difference is that instead of using JSF managed beans for our model and controllers, we use CDI named beans. What makes CDI applications easier to develop and maintain are the excellent dependency injection capabilities of the CDI API. Just as with other JSF applications, CDI applications use facelets as their view technology. The following example illustrates a typical markup for a JSF page using CDI: <?xml version='1.0' encoding='UTF-8' ?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html      >    <h:head>        <title>Create New Customer</title>    </h:head>    <h:body>        <h:form>            <h3>Create New Customer</h3>            <h:panelGrid columns="3">                <h:outputLabel for="firstName" value="First Name"/>                <h:inputText id="firstName" value="#{customer.firstName}"/>                <h:message for="firstName"/>                  <h:outputLabel for="middleName" value="Middle Name"/>                <h:inputText id="middleName"                  value="#{customer.middleName}"/>                <h:message for="middleName"/>                  <h:outputLabel for="lastName" value="Last Name"/>                <h:inputText id="lastName" value="#{customer.lastName}"/>                <h:message for="lastName"/>                  <h:outputLabel for="email" value="Email Address"/>                <h:inputText id="email" value="#{customer.email}"/>                <h:message for="email"/>                <h:panelGroup/>                <h:commandButton value="Submit"                  action="#{customerController.navigateToConfirmation}"/>            </h:panelGrid>        </h:form>    </h:body> </html> As we can see, the preceding markup doesn't look any different from the markup used for a JSF application that does not use CDI. The page renders as follows (shown after entering some data): In our page markup, we have JSF components that use Unified Expression Language expressions to bind themselves to CDI named bean properties and methods. 
Let's take a look at the customer bean first: package com.ensode.cdiintro.model;   import java.io.Serializable; import javax.enterprise.context.RequestScoped; import javax.inject.Named;   @Named @RequestScoped public class Customer implements Serializable {      private String firstName;    private String middleName;    private String lastName;    private String email;      public Customer() {    }      public String getFirstName() {        return firstName;    }      public void setFirstName(String firstName) {        this.firstName = firstName;    }      public String getMiddleName() {        return middleName;    }      public void setMiddleName(String middleName) {        this.middleName = middleName;    }      public String getLastName() {        return lastName;    }      public void setLastName(String lastName) {        this.lastName = lastName;    }      public String getEmail() {        return email;    }      public void setEmail(String email) {        this.email = email;    } } The @Named annotation marks this class as a CDI named bean. By default, the bean's name will be the class name with its first character switched to lowercase (in our example, the name of the bean is "customer", since the class name is Customer). We can override this behavior if we wish by passing the desired name to the value attribute of the @Named annotation, as follows: @Named(value="customerBean") A CDI named bean's methods and properties are accessible via facelets, just like regular JSF managed beans. Just like JSF managed beans, CDI named beans can have one of several scopes as listed in the following table. The preceding named bean has a scope of request, as denoted by the @RequestScoped annotation. Scope Annotation Description Request @RequestScoped Request scoped beans are shared through the duration of a single request. A single request could refer to an HTTP request, an invocation to a method in an EJB, a web service invocation, or sending a JMS message to a message-driven bean. Session @SessionScoped Session scoped beans are shared across all requests in an HTTP session. Each user of an application gets their own instance of a session scoped bean. Application @ApplicationScoped Application scoped beans live through the whole application lifetime. Beans in this scope are shared across user sessions. Conversation @ConversationScoped The conversation scope can span multiple requests, and is typically shorter than the session scope. Dependent @Dependent Dependent scoped beans are not shared. Any time a dependent scoped bean is injected, a new instance is created. As we can see, CDI has equivalent scopes to all JSF scopes. Additionally, CDI adds two additional scopes. The first CDI-specific scope is the conversation scope, which allows us to have a scope that spans across multiple requests, but is shorter than the session scope. The second CDI-specific scope is the dependent scope, which is a pseudo scope. A CDI bean in the dependent scope is a dependent object of another object; beans in this scope are instantiated when the object they belong to is instantiated and they are destroyed when the object they belong to is destroyed. Our application has two CDI named beans. We already discussed the customer bean. 
The other CDI named bean in our application is the controller bean: package com.ensode.cdiintro.controller;   import com.ensode.cdiintro.model.Customer; import javax.enterprise.context.RequestScoped; import javax.inject.Inject; import javax.inject.Named;   @Named @RequestScoped public class CustomerController {      @Inject    private Customer customer;      public Customer getCustomer() {        return customer;    }      public void setCustomer(Customer customer) {        this.customer = customer;    }      public String navigateToConfirmation() {        //In a real application we would        //Save customer data to the database here.          return "confirmation";    } } In the preceding class, an instance of the Customer class is injected at runtime; this is accomplished via the @Inject annotation. This annotation allows us to easily use dependency injection in CDI applications. Since the Customer class is annotated with the @RequestScoped annotation, a new instance of Customer will be injected for every request. The navigateToConfirmation() method in the preceding class is invoked when the user clicks on the Submit button on the page. The navigateToConfirmation() method works just like an equivalent method in a JSF managed bean would, that is, it returns a string and the application navigates to an appropriate page based on the value of that string. Like with JSF, by default, the target page's name with an .xhtml extension is the return value of this method. For example, if no exceptions are thrown in the navigateToConfirmation() method, the user is directed to a page named confirmation.xhtml: <?xml version='1.0' encoding='UTF-8' ?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html      >    <h:head>        <title>Success</title>    </h:head>    <h:body>        New Customer created successfully.        <h:panelGrid columns="2" border="1" cellspacing="0">            <h:outputLabel for="firstName" value="First Name"/>            <h:outputText id="firstName" value="#{customer.firstName}"/>              <h:outputLabel for="middleName" value="Middle Name"/>            <h:outputText id="middleName"              value="#{customer.middleName}"/>              <h:outputLabel for="lastName" value="Last Name"/>            <h:outputText id="lastName" value="#{customer.lastName}"/>              <h:outputLabel for="email" value="Email Address"/>            <h:outputText id="email" value="#{customer.email}"/>          </h:panelGrid>    </h:body> </html> Again, there is nothing special we need to do to access the named beans properties from the preceding markup. It works just as if the bean was a JSF managed bean. The preceding page renders as follows: As we can see, CDI applications work just like JSF applications. However, CDI applications have several advantages over JSF, for example (as we mentioned previously) CDI beans have additional scopes not found in JSF. Additionally, using CDI allows us to decouple our Java code from the JSF API. Also, as we mentioned previously, CDI allows us to use session beans as named beans. Qualifiers In some instances, the type of bean we wish to inject into our code may be an interface or a Java superclass, but we may be interested in injecting a subclass or a class implementing the interface. For cases like this, CDI provides qualifiers we can use to indicate the specific type we wish to inject into our code. 
A CDI qualifier is an annotation that must be decorated with the @Qualifier annotation. This annotation can then be used to decorate the specific subclass or interface. In this section, we will develop a Premium qualifier for our customer bean; premium customers could get perks that are not available to regular customers, for example, discounts. Creating a CDI qualifier with NetBeans is very easy; all we need to do is go to File | New File, select the Contexts and Dependency Injection category, and select the Qualifier Type file type. In the next step in the wizard, we need to enter a name and a package for our qualifier. After these two simple steps, NetBeans generates the code for our qualifier: package com.ensode.cdiintro.qualifier;   import static java.lang.annotation.ElementType.TYPE; import static java.lang.annotation.ElementType.FIELD; import static java.lang.annotation.ElementType.PARAMETER; import static java.lang.annotation.ElementType.METHOD; import static java.lang.annotation.RetentionPolicy.RUNTIME; import java.lang.annotation.Retention; import java.lang.annotation.Target; import javax.inject.Qualifier;   @Qualifier @Retention(RUNTIME) @Target({METHOD, FIELD, PARAMETER, TYPE}) public @interface Premium { } Qualifiers are standard Java annotations. Typically, they have retention of runtime and can target methods, fields, parameters, or types. The only difference between a qualifier and a standard annotation is that qualifiers are decorated with the @Qualifier annotation. Once we have our qualifier in place, we need to use it to decorate the specific subclass or interface implementation, as shown in the following code: package com.ensode.cdiintro.model;   import com.ensode.cdiintro.qualifier.Premium; import javax.enterprise.context.RequestScoped; import javax.inject.Named;   @Named @RequestScoped @Premium public class PremiumCustomer extends Customer {      private Integer discountCode;      public Integer getDiscountCode() {        return discountCode;    }      public void setDiscountCode(Integer discountCode) {        this.discountCode = discountCode;    } } Once we have decorated the specific instance we need to qualify, we can use our qualifiers in the client code to specify the exact type of dependency we need: package com.ensode.cdiintro.controller;   import com.ensode.cdiintro.model.Customer; import com.ensode.cdiintro.model.PremiumCustomer; import com.ensode.cdiintro.qualifier.Premium;   import java.util.logging.Level; import java.util.logging.Logger; import javax.enterprise.context.RequestScoped; import javax.inject.Inject; import javax.inject.Named;   @Named @RequestScoped public class PremiumCustomerController {      private static final Logger logger = Logger.getLogger(            PremiumCustomerController.class.getName());    @Inject    @Premium    private Customer customer;      public String saveCustomer() {          PremiumCustomer premiumCustomer =          (PremiumCustomer) customer;          logger.log(Level.INFO, "Saving the following information n"                + "{0} {1}, discount code = {2}",                new Object[]{premiumCustomer.getFirstName(),                    premiumCustomer.getLastName(),                    premiumCustomer.getDiscountCode()});          //If this was a real application, we would have code to save        //customer data to the database here.          
return "premium_customer_confirmation";    } } Since we used our @Premium qualifier to decorate the customer field, an instance of the PremiumCustomer class is injected into that field. This is because this class is also decorated with the @Premium qualifier. As far as our JSF pages go, we simply access our named bean as usual using its name, as shown in the following code; <?xml version='1.0' encoding='UTF-8' ?> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html      >    <h:head>        <title>Create New Premium Customer</title>    </h:head>    <h:body>        <h:form>            <h3>Create New Premium Customer</h3>            <h:panelGrid columns="3">                <h:outputLabel for="firstName" value="First Name"/>                 <h:inputText id="firstName"                    value="#{premiumCustomer.firstName}"/>                <h:message for="firstName"/>                  <h:outputLabel for="middleName" value="Middle Name"/>                <h:inputText id="middleName"                     value="#{premiumCustomer.middleName}"/>                <h:message for="middleName"/>                  <h:outputLabel for="lastName" value="Last Name"/>                <h:inputText id="lastName"                    value="#{premiumCustomer.lastName}"/>                <h:message for="lastName"/>                  <h:outputLabel for="email" value="Email Address"/>                <h:inputText id="email"                    value="#{premiumCustomer.email}"/>                <h:message for="email"/>                  <h:outputLabel for="discountCode" value="Discount Code"/>                <h:inputText id="discountCode"                    value="#{premiumCustomer.discountCode}"/>                <h:message for="discountCode"/>                   <h:panelGroup/>                <h:commandButton value="Submit"                      action="#{premiumCustomerController.saveCustomer}"/>            </h:panelGrid>        </h:form>    </h:body> </html> In this example, we are using the default name for our bean, which is the class name with the first letter switched to lowercase. Now, we are ready to test our application: After submitting the page, we can see the confirmation page. Stereotypes A CDI stereotype allows us to create new annotations that bundle up several CDI annotations. For example, if we need to create several CDI named beans with a scope of session, we would have to use two annotations in each of these beans, namely @Named and @SessionScoped. Instead of having to add two annotations to each of our beans, we could create a stereotype and annotate our beans with it. To create a CDI stereotype in NetBeans, we simply need to create a new file by selecting the Contexts and Dependency Injection category and the Stereotype file type. Then, we need to enter a name and package for our new stereotype. 
At this point, NetBeans generates the following code: package com.ensode.cdiintro.stereotype;   import static java.lang.annotation.ElementType.TYPE; import static java.lang.annotation.ElementType.FIELD; import static java.lang.annotation.ElementType.METHOD; import static java.lang.annotation.RetentionPolicy.RUNTIME; import java.lang.annotation.Retention; import java.lang.annotation.Target; import javax.enterprise.inject.Stereotype;   @Stereotype @Retention(RUNTIME) @Target({METHOD, FIELD, TYPE}) public @interface NamedSessionScoped { } Now, we simply need to add the CDI annotations that we want the classes annotated with our stereotype to use. In our case, we want them to be named beans and have a scope of session; therefore, we add the @Named and @SessionScoped annotations as shown in the following code: package com.ensode.cdiintro.stereotype;   import static java.lang.annotation.ElementType.TYPE; import static java.lang.annotation.ElementType.FIELD; import static java.lang.annotation.ElementType.METHOD; import static java.lang.annotation.RetentionPolicy.RUNTIME; import java.lang.annotation.Retention; import java.lang.annotation.Target; import javax.enterprise.context.SessionScoped; import javax.enterprise.inject.Stereotype; import javax.inject.Named;   @Named @SessionScoped @Stereotype @Retention(RUNTIME) @Target({METHOD, FIELD, TYPE}) public @interface NamedSessionScoped { } Now we can use our stereotype in our own code: package com.ensode.cdiintro.beans;   import com.ensode.cdiintro.stereotype.NamedSessionScoped; import java.io.Serializable;   @NamedSessionScoped public class StereotypeClient implements Serializable {      private String property1;    private String property2;      public String getProperty1() {        return property1;    }      public void setProperty1(String property1) {        this.property1 = property1;    }      public String getProperty2() {        return property2;    }      public void setProperty2(String property2) {        this.property2 = property2;    } } We annotated the StereotypeClient class with our NamedSessionScoped stereotype, which is equivalent to using the @Named and @SessionScoped annotations. Interceptor binding types One of the advantages of EJBs is that they allow us to easily perform aspect-oriented programming (AOP) via interceptors. CDI allows us to write interceptor binding types; this lets us bind interceptors to beans and the beans do not have to depend on the interceptor directly. Interceptor binding types are annotations that are themselves annotated with @InterceptorBinding. Creating an interceptor binding type in NetBeans involves creating a new file, selecting the Contexts and Dependency Injection category, and selecting the Interceptor Binding Type file type. Then, we need to enter a class name and select or enter a package for our new interceptor binding type. At this point, NetBeans generates the code for our interceptor binding type: package com.ensode.cdiintro.interceptorbinding;   import static java.lang.annotation.ElementType.TYPE; import static java.lang.annotation.ElementType.METHOD; import static java.lang.annotation.RetentionPolicy.RUNTIME; import java.lang.annotation.Inherited; import java.lang.annotation.Retention; import java.lang.annotation.Target; import javax.interceptor.InterceptorBinding;   @Inherited @InterceptorBinding @Retention(RUNTIME) @Target({METHOD, TYPE}) public @interface LoggingInterceptorBinding { } The generated code is fully functional; we don't need to add anything to it. 
In order to use our interceptor binding type, we need to write an interceptor and annotate it with our interceptor binding type, as shown in the following code: package com.ensode.cdiintro.interceptor;   import com.ensode.cdiintro.interceptorbinding.LoggingInterceptorBinding; import java.io.Serializable; import java.util.logging.Level; import java.util.logging.Logger; import javax.interceptor.AroundInvoke; import javax.interceptor.Interceptor; import javax.interceptor.InvocationContext;   @LoggingInterceptorBinding @Interceptor public class LoggingInterceptor implements Serializable{      private static final Logger logger = Logger.getLogger(            LoggingInterceptor.class.getName());      @AroundInvoke    public Object logMethodCall(InvocationContext invocationContext)            throws Exception {          logger.log(Level.INFO, new StringBuilder("entering ").append(                invocationContext.getMethod().getName()).append(                " method").toString());          Object retVal = invocationContext.proceed();          logger.log(Level.INFO, new StringBuilder("leaving ").append(                invocationContext.getMethod().getName()).append(                " method").toString());          return retVal;    } } As we can see, other than being annotated with our interceptor binding type, the preceding class is a standard interceptor similar to the ones we use with EJB session beans. In order for our interceptor binding type to work properly, we need to add a CDI configuration file (beans.xml) to our project. Then, we need to register our interceptor in beans.xml as follows: <?xml version="1.0" encoding="UTF-8"?> <beans               xsi_schemaLocation="http://>    <interceptors>          <class>        com.ensode.cdiintro.interceptor.LoggingInterceptor      </class>    </interceptors> </beans> To register our interceptor, we need to set bean-discovery-mode to all in the generated beans.xml and add the <interceptor> tag in beans.xml, with one or more nested <class> tags containing the fully qualified names of our interceptors. The final step before we can use our interceptor binding type is to annotate the class to be intercepted with our interceptor binding type: package com.ensode.cdiintro.controller;   import com.ensode.cdiintro.interceptorbinding.LoggingInterceptorBinding; import com.ensode.cdiintro.model.Customer; import com.ensode.cdiintro.model.PremiumCustomer; import com.ensode.cdiintro.qualifier.Premium; import java.util.logging.Level; import java.util.logging.Logger; import javax.enterprise.context.RequestScoped; import javax.inject.Inject; import javax.inject.Named;   @LoggingInterceptorBinding @Named @RequestScoped public class PremiumCustomerController {      private static final Logger logger = Logger.getLogger(            PremiumCustomerController.class.getName());    @Inject    @Premium    private Customer customer;      public String saveCustomer() {          PremiumCustomer premiumCustomer = (PremiumCustomer) customer;          logger.log(Level.INFO, "Saving the following information n"                + "{0} {1}, discount code = {2}",                new Object[]{premiumCustomer.getFirstName(),                    premiumCustomer.getLastName(),                    premiumCustomer.getDiscountCode()});          //If this was a real application, we would have code to save        //customer data to the database here.          return "premium_customer_confirmation";    } } Now, we are ready to use our interceptor. 
After executing the preceding code and examining the GlassFish log, we can see our interceptor binding type in action. The lines entering saveCustomer method and leaving saveCustomer method were added to the log by our interceptor, which was indirectly invoked by our interceptor binding type. Custom scopes In addition to providing several prebuilt scopes, CDI allows us to define our own custom scopes. This functionality is primarily meant for developers building frameworks on top of CDI, not for application developers. Nevertheless, NetBeans provides a wizard for us to create our own CDI custom scopes. To create a new CDI custom scope, we need to go to File | New File, select the Contexts and Dependency Injection category, and select the Scope Type file type. Then, we need to enter a package and a name for our custom scope. After clicking on Finish, our new custom scope is created, as shown in the following code: package com.ensode.cdiintro.scopes;   import static java.lang.annotation.ElementType.TYPE; import static java.lang.annotation.ElementType.FIELD; import static java.lang.annotation.ElementType.METHOD; import static java.lang.annotation.RetentionPolicy.RUNTIME; import java.lang.annotation.Inherited; import java.lang.annotation.Retention; import java.lang.annotation.Target; import javax.inject.Scope;   @Inherited @Scope // or @javax.enterprise.context.NormalScope @Retention(RUNTIME) @Target({METHOD, FIELD, TYPE}) public @interface CustomScope { } To actually use our scope in our CDI applications, we would need to create a custom context which, as mentioned previously, is primarily a concern for framework developers and not for Java EE application developers. Therefore, it is beyond the scope of this article. Interested readers can refer to JBoss Weld CDI for Java Platform, Ken Finnigan, Packt Publishing. (JBoss Weld is a popular CDI implementation and it is included with GlassFish.) Summary In this article, we covered NetBeans support for CDI, a new Java EE API introduced in Java EE 6. We provided an introduction to CDI and explained additional functionality that the CDI API provides over standard JSF. We also covered how to disambiguate CDI injected beans via CDI Qualifiers. Additionally, we covered how to group together CDI annotations via CDI stereotypes. We also, we saw how CDI can help us with AOP via interceptor binding types. Finally, we covered how NetBeans can help us create custom CDI scopes. Resources for Article: Further resources on this subject: Java EE 7 Performance Tuning and Optimization [article] Java EE 7 Developer Handbook [article] Java EE 7 with GlassFish 4 Application Server [article]

Multiplying Performance with Parallel Computing

Packt
06 Feb 2015
22 min read
In this article, by Aloysius Lim and William Tjhi, authors of the book R High Performance Programming, we will learn how to write and execute a parallel R code, where different parts of the code run simultaneously. So far, we have learned various ways to optimize the performance of R programs running serially, that is in a single process. This does not take full advantage of the computing power of modern CPUs with multiple cores. Parallel computing allows us to tap into all the computational resources available and to speed up the execution of R programs by many times. We will examine the different types of parallelism and how to implement them in R, and we will take a closer look at a few performance considerations when designing the parallel architecture of R programs. (For more resources related to this topic, see here.) Data parallelism versus task parallelism Many modern software applications are designed to run computations in parallel in order to take advantage of the multiple CPU cores available on almost any computer today. Many R programs can similarly be written in order to run in parallel. However, the extent of possible parallelism depends on the computing task involved. On one side of the scale are embarrassingly parallel tasks, where there are no dependencies between the parallel subtasks; such tasks can be made to run in parallel very easily. An example of this is, building an ensemble of decision trees in a random forest algorithm—randomized decision trees can be built independently from one another and in parallel across tens or hundreds of CPUs, and can be combined to form the random forest. On the other end of the scale are tasks that cannot be parallelized, as each step of the task depends on the results of the previous step. One such example is a depth-first search of a tree, where the subtree to search at each step depends on the path taken in previous steps. Most algorithms fall somewhere in between with some steps that must run serially and some that can run in parallel. With this in mind, careful thought must be given when designing a parallel code that works correctly and efficiently. Often an R program has some parts that have to be run serially and other parts that can run in parallel. Before making the effort to parallelize any of the R code, it is useful to have an estimate of the potential performance gains that can be achieved. Amdahl's law provides a way to estimate the best attainable performance gain when you convert a code from serial to parallel execution. It divides a computing task into its serial and potentially-parallel parts and states that the time needed to execute the task in parallel will be no less than this formula: T(n) = T(1)(P + (1-P)/n), where: T(n) is the time taken to execute the task using n parallel processes P is the proportion of the whole task that is strictly serial The theoretical best possible speed up of the parallel algorithm is thus: S(n) = T(1) / T(n) = 1 / (P + (1-P)/n) For example, given a task that takes 10 seconds to execute on one processor, where half of the task can be run in parallel, then the best possible time to run it on four processors is T(4) = 10(0.5 + (1-0.5)/4) = 6.25 seconds. The theoretical best possible speed up of the parallel algorithm with four processors is 1 / (0.5 + (1-0.5)/4) = 1.6x . The following figure shows you how the theoretical best possible execution time decreases as more CPU cores are added. Notice that the execution time reaches a limit that is just above five seconds. 
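If you want to reproduce the numbers behind this curve yourself, a quick sketch in R (not taken from the book's code; the variable names are invented for illustration) evaluates Amdahl's formula for this example, with T(1) = 10 seconds and P = 0.5:

# Best possible execution time under Amdahl's law for this example
T1 <- 10          # serial execution time in seconds
P  <- 0.5         # proportion of the task that is strictly serial
n  <- 1:16        # number of CPU cores
Tn <- T1 * (P + (1 - P) / n)
round(Tn, 2)      # starts at 10.00, 7.50, 6.67, 6.25, ... and approaches 5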
This corresponds to the half of the task that must be run serially, where parallelism does not help.

Best possible execution time versus number of CPU cores

In general, Amdahl's law means that the fastest execution time for any parallelized algorithm is limited by the time needed for the serial portions of the algorithm. Bear in mind that Amdahl's law provides only a theoretical estimate. It does not account for the overheads of parallel computing (such as starting and coordinating tasks) and assumes that the parallel portions of the algorithm are infinitely scalable. In practice, these factors might significantly limit the performance gains of parallelism, so use Amdahl's law only to get a rough estimate of the maximum speedup possible. There are two main classes of parallelism: data parallelism and task parallelism. Understanding these concepts helps to determine what types of tasks can be modified to run in parallel. In data parallelism, a dataset is divided into multiple partitions. Different partitions are distributed to multiple processors, and the same task is executed on each partition of data. Take, for example, the task of finding the maximum value in a vector dataset, say one that has one billion numeric data points. A serial algorithm to do this would look like the following code, which iterates over every element of the data in sequence to search for the largest value. (This code is intentionally verbose to illustrate how the algorithm works; in practice, the max() function in R, though also serial in nature, is much faster.)

serialmax <- function(data) {
    max = -Inf
    for (i in data) {
        if (i > max)
            max = i
    }
    return(max)
}

One way to parallelize this algorithm is to split the data into partitions. If we have a computer with eight CPU cores, we can split the data into eight partitions of 125 million numbers each. Here is the pseudocode for how to perform the same task in parallel:

# Run this in parallel across 8 CPU cores
part.results <- run.in.parallel(serialmax(data.part))
# Compute global max
global.max <- serialmax(part.results)

This pseudocode runs eight instances of serialmax() in parallel—one for each data partition—to find the local maximum value in each partition. Once all the partitions have been processed, the algorithm finds the global maximum value by finding the largest value among the local maxima. This parallel algorithm works because the global maximum of a dataset must be the largest of the local maxima from all the partitions. The following figure depicts data parallelism pictorially. The key behind data parallel algorithms is that each partition of data can be processed independently of the other partitions, and the results from all the partitions can be combined to compute the final results. This is similar to the mechanism of the MapReduce framework from Hadoop. Data parallelism allows algorithms to scale up easily as data volume increases—as more data is added to the dataset, more computing nodes can be added to a cluster to process new partitions of data.

Data parallelism

Other examples of computations and algorithms that can be run in a data parallel way include:

Element-wise matrix operations such as addition and subtraction: The matrices can be partitioned and the operations are applied to each pair of partitions.

Means: The sums and number of elements in each partition can be added to find the global sum and number of elements from which the mean can be computed.

K-means clustering: After data partitioning, the K centroids are distributed to all the partitions.
Finding the closest centroid is performed in parallel and independently across the partitions. The centroids are updated by first calculating the sums and the counts of their respective members in parallel, and then consolidating them in a single process to get the global means.

Frequent itemset mining using the Partition algorithm: In the first pass, the frequent itemsets are mined from each partition of data to generate a global set of candidate itemsets; in the second pass, the supports of the candidate itemsets are summed from each partition to filter out the globally infrequent ones.

The other main class of parallelism is task parallelism, where tasks are distributed to and executed on different processors in parallel. The tasks on each processor might be the same or different, and the data that they act on might also be the same or different. The key difference between task parallelism and data parallelism is that the data is not divided into partitions. An example of a task parallel algorithm performing the same task on the same data is the training of a random forest model. A random forest is a collection of decision trees built independently on the same data. During the training process for a particular tree, a random subset of the data is chosen as the training set, and the variables to consider at each branch of the tree are also selected randomly. Hence, even though the same data is used, the trees are different from one another. In order to train a random forest of, say, 100 decision trees, the workload could be distributed to a computing cluster with 100 processors, with each processor building one tree. All the processors perform the same task on the same data (or exact copies of the data), but the data is not partitioned. The parallel tasks can also be different. For example, computing a set of summary statistics on the same set of data can be done in a task parallel way. Each process can be assigned to compute a different statistic—the mean, standard deviation, percentiles, and so on. Pseudocode of a task parallel algorithm might look like this:

# Run 4 tasks in parallel across 4 cores
for (task in tasks)
    run.in.parallel(task)
# Collect the results of the 4 tasks
results <- collect.parallel.output()
# Continue processing after all 4 tasks are complete

Implementing data parallel algorithms

Several R packages allow code to be executed in parallel. The parallel package that comes with R provides the foundation for most parallel computing capabilities in other packages. Let's see how it works with an example. This example involves finding documents that match a regular expression. Regular expression matching is a fairly computationally expensive task, depending on the complexity of the regular expression. The corpus, or set of documents, for this example is a sample of the Reuters-21578 dataset for the topic corporate acquisitions (acq) from the tm package. Because this dataset contains only 50 documents, they are replicated 100,000 times to form a corpus of 5 million documents so that parallelizing the code will lead to meaningful savings in execution times.

library(tm)
data("acq")
textdata <- rep(sapply(content(acq), content), 1e5)

The task is to find documents that match the regular expression \d+(,\d+)? mln dlrs, which represents monetary amounts in millions of dollars. In this regular expression, \d+ matches a string of one or more digits, and (,\d+)? optionally matches a comma followed by one or more digits.
For example, the strings 12 mln dlrs, 1,234 mln dlrs and 123,456,789 mln dlrs will match the regular expression. First, we will measure the execution time to find these documents serially with grepl():

pattern <- "\\d+(,\\d+)? mln dlrs"
system.time(res1 <- grepl(pattern, textdata))
##   user  system elapsed
## 65.601   0.114  65.721

Next, we will modify the code to run in parallel and measure the execution time on a computer with four CPU cores:

library(parallel)
detectCores()
## [1] 4
cl <- makeCluster(detectCores())
part <- clusterSplit(cl, seq_along(textdata))
text.partitioned <- lapply(part, function(p) textdata[p])
system.time(res2 <- unlist(
    parSapply(cl, text.partitioned, grepl, pattern = pattern)))
##   user  system elapsed
##  3.708   8.007  50.806
stopCluster(cl)

In this code, the detectCores() function reveals how many CPU cores are available on the machine where this code is executed. Before running any parallel code, makeCluster() is called to create a local cluster of processing nodes with all four CPU cores. The corpus is then split into four partitions using the clusterSplit() function to determine the ideal split of the corpus such that each partition has roughly the same number of documents. The actual parallel execution of grepl() on each partition of the corpus is carried out by the parSapply() function. Each processing node in the cluster is given a copy of the partition of data that it is supposed to process along with the code to be executed and other variables that are needed to run the code (in this case, the pattern argument). When all four processing nodes have completed their tasks, the results are combined in a similar fashion to sapply(). Finally, the cluster is destroyed by calling stopCluster(). It is good practice to ensure that stopCluster() is always called in production code, even if an error occurs during execution. This can be done as follows:

doSomethingInParallel <- function(...) {
    cl <- makeCluster(...)
    on.exit(stopCluster(cl))
    # do something
}

In this example, running the task in parallel on four processors resulted in a 23 percent reduction in the execution time. This is not in proportion to the amount of compute resources used to perform the task; with four times as many CPU cores working on it, a perfectly parallelizable task might experience as much as a 75 percent runtime reduction. However, remember Amdahl's law—the speed of parallel code is limited by the serial parts, which include the overheads of parallelization. In this case, calling makeCluster() with the default arguments creates a socket-based cluster. When such a cluster is created, additional copies of R are run as workers. The workers communicate with the master R process using network sockets, hence the name. The worker R processes are initialized with the relevant packages loaded, and data partitions are serialized and sent to each worker process. These overheads can be significant, especially in data parallel algorithms where large volumes of data need to be transferred to the worker processes. Besides parSapply(), parallel also provides the parApply() and parLapply() functions; these functions are analogous to the standard sapply(), apply(), and lapply() functions, respectively. In addition, the parLapplyLB() and parSapplyLB() functions provide load balancing, which is useful when the execution of each parallel task takes variable amounts of time. Finally, parRapply() and parCapply() are parallel row and column apply() functions for matrices.
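To get a feel for the load-balancing variants just mentioned, here is a minimal sketch (not from the book; the task durations are invented purely for illustration) that uses parLapplyLB() to spread tasks of uneven length across the workers:

library(parallel)

cl <- makeCluster(detectCores())
task.durations <- c(0.2, 1.0, 0.1, 0.8, 0.3, 0.6)   # invented workloads

# parLapplyLB() hands the next task to whichever worker finishes first,
# instead of pre-assigning an equal share of tasks to every worker.
res <- parLapplyLB(cl, task.durations, function(d) {
    Sys.sleep(d)    # simulate work that takes a variable amount of time
    d * 2
})

stopCluster(cl)
unlist(res)

With tasks that all take about the same time, the plain parLapply() would perform just as well; load balancing only pays off when the task lengths vary.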
On non-Windows systems, parallel supports another type of cluster that often incurs lower overheads: forked clusters. In these clusters, new worker processes are forked from the parent R process with a copy of the data. However, the data is not actually copied in memory unless it is modified by a child process. This means that, compared to socket-based clusters, initializing child processes is quicker and the memory usage is often lower. Another advantage of using forked clusters is that parallel provides a convenient and concise way to run tasks on them via the mclapply(), mcmapply(), and mcMap() functions. (These functions start with mc because they were originally a part of the multicore package.) There is no need to explicitly create and destroy the cluster, as these functions do this automatically. We can simply call mclapply() and state the number of worker processes to fork via the mc.cores argument:

system.time(res3 <- unlist(
    mclapply(text.partitioned, grepl, pattern = pattern,
             mc.cores = detectCores())))
##    user  system elapsed
## 127.012   0.350  33.264

This shows a 49 percent reduction in execution time compared to the serial version, and a 35 percent reduction compared to parallelizing with a socket-based cluster. For this example, forked clusters provide the best performance. Due to differences in system configuration, you might see very different results when you try the examples in your own environment. When you develop parallel code, it is important to test the code in an environment that is similar to the one that it will eventually run in.

Implementing task parallel algorithms

Let's now see how to implement a task parallel algorithm using both socket-based and forked clusters. We will look at how to run the same task and different tasks on workers in a cluster.

Running the same task on workers in a cluster

To demonstrate how to run the same task on a cluster, the task for this example is to generate 500 million Poisson random numbers. We will do this by using L'Ecuyer's combined multiple-recursive generator, which is the only random number generator in base R that supports multiple streams to generate random numbers in parallel. The random number generator is selected by calling the RNGkind() function. We cannot just use any random number generator in parallel because the randomness of the data depends on the algorithm used to generate random data and the seed value given to each parallel task. Most other algorithms were not designed to produce random numbers in multiple parallel streams, and might produce multiple highly correlated streams of numbers, or worse, multiple identical streams! First, we will measure the execution time of the serial algorithm:

RNGkind("L'Ecuyer-CMRG")
nsamples <- 5e8
lambda <- 10
system.time(random1 <- rpois(nsamples, lambda))
##   user  system elapsed
## 51.905   0.636  52.544

To generate the random numbers on a cluster, we will first distribute the task evenly among the workers. In the following code, the integer vector samples.per.process contains the number of random numbers that each worker needs to generate on a four-core CPU. The seq() function produces ncores+1 numbers evenly distributed between 0 and nsamples, with the first number being 0 and the next ncores numbers indicating the approximate cumulative number of samples across the worker processes.
The round() function rounds off these numbers into integers and diff() computes the difference between them to give the number of random numbers that each worker process should generate.

ncores <- detectCores()
cl <- makeCluster(ncores)
samples.per.process <-
    diff(round(seq(0, nsamples, length.out = ncores+1)))

Before we can generate the random numbers on a cluster, each worker needs a different seed from which it can generate a stream of random numbers. The seeds need to be set on all the workers before running the task, to ensure that all the workers generate different random numbers. For a socket-based cluster, we can call clusterSetRNGStream() to set the seeds for the workers, and then run the random number generation task on the cluster. When the task is completed, we call stopCluster() to shut down the cluster:

clusterSetRNGStream(cl)
system.time(random2 <- unlist(
    parLapply(cl, samples.per.process, rpois,
              lambda = lambda)))
##   user  system elapsed
##  5.006   3.000  27.436
stopCluster(cl)

Using four parallel processes in a socket-based cluster reduces the execution time by 48 percent. The performance of this type of cluster for this example is better than that of the data parallel example because there is less data to copy to the worker processes—only an integer that indicates how many random numbers to generate. Next, we run the same task on a forked cluster (again, this is not supported on Windows). The mclapply() function can set the random number seeds for each worker for us when the mc.set.seed argument is set to TRUE; we do not need to call clusterSetRNGStream(). Otherwise, the code is similar to that of the socket-based cluster:

system.time(random3 <- unlist(
    mclapply(samples.per.process, rpois,
             lambda = lambda,
             mc.set.seed = TRUE, mc.cores = ncores)))
##   user  system elapsed
## 76.283   7.272  25.052

On our test machine, the forked cluster runs slightly faster than the socket-based cluster, but the times are close, indicating that the overheads for this task are similar for both types of clusters.
cores <- detectCores()cl <- makeCluster(cores)calls <- list(pois = list("rpois", list(n = nsamples,                                        lambda = pois.lambda)),              unif = list("runif", list(n = nsamples)),              norm = list("rnorm", list(n = nsamples)),              exp = list("rexp", list(n = nsamples)))clusterSetRNGStream(cl)system.time(    random2 <- parLapply(cl, calls,                         function(call) {                             do.call(call[[1]], call[[2]])                         }))##  user  system elapsed ## 2.185   1.629  10.403stopCluster(cl) On forked clusters on non-Windows machines, the mcparallel() and mccollect() functions offer a more intuitive way to run different tasks on different workers. For each task, mcparallel() sends the given task to an available worker. Once all the workers have been assigned their tasks, mccollect() waits for the workers to complete their tasks and collects the results from all the workers. mc.reset.stream()system.time({    jobs <- list()    jobs[[1]] <- mcparallel(rpois(nsamples, pois.lambda),                            "pois", mc.set.seed = TRUE)    jobs[[2]] <- mcparallel(runif(nsamples),                            "unif", mc.set.seed = TRUE)    jobs[[3]] <- mcparallel(rnorm(nsamples),                            "norm", mc.set.seed = TRUE)    jobs[[4]] <- mcparallel(rexp(nsamples),                            "exp", mc.set.seed = TRUE)    random3 <- mccollect(jobs)})##   user  system elapsed ## 14.535   3.569   7.97 Notice that we also had to call mc.reset.stream() to set the seeds for random number generation in each worker. This was not necessary when we used mclapply(), which calls mc.reset.stream() for us. However, mcparallel() does not, so we need to call it ourselves. Summary In this article, we learned about two classes of parallelism: data parallelism and task parallelism. Data parallelism is good for tasks that can be performed in parallel on partitions of a dataset. The dataset to be processed is split into partitions and each partition is processed on a different worker processes. Task parallelism, on the other hand, divides a set of similar or different tasks to amongst the worker processes. In either case, Amdahl's law states that the maximum improvement in speed that can be achieved by parallelizing code is limited by the proportion of that code that can be parallelized. Resources for Article: Further resources on this subject: Using R for Statistics, Research, and Graphics [Article] Learning Data Analytics with R and Hadoop [Article] Aspects of Data Manipulation in R [Article]

The Chain of Responsibility Pattern

Packt
05 Feb 2015
12 min read
In this article by Sakis Kasampalis, author of the book Mastering Python Design Patterns, we will see a detailed description of the Chain of Responsibility design pattern with the help of a real-life example as well as a software example. Also, its use cases and implementation are discussed. (For more resources related to this topic, see here.) When developing an application, most of the time we know which method should satisfy a particular request in advance. However, this is not always the case. For example, we can think of any broadcast computer network, such as the original Ethernet implementation [j.mp/wikishared]. In broadcast computer networks, all requests are sent to all nodes (broadcast domains are excluded for simplicity), but only the nodes that are interested in a sent request process it. All computers that participate in a broadcast network are connected to each other using a common medium such as the cable that connects the three nodes in the following figure: If a node is not interested or does not know how to handle a request, it can perform the following actions: Ignore the request and do nothing Forward the request to the next node The way in which the node reacts to a request is an implementation detail. However, we can use the analogy of a broadcast computer network to understand what the chain of responsibility pattern is all about. The Chain of Responsibility pattern is used when we want to give a chance to multiple objects to satisfy a single request, or when we don't know which object (from a chain of objects) should process a specific request in advance. The principle is the same as the following: There is a chain (linked list, tree, or any other convenient data structure) of objects. We start by sending a request to the first object in the chain. The object decides whether it should satisfy the request or not. The object forwards the request to the next object. This procedure is repeated until we reach the end of the chain. At the application level, instead of talking about cables and network nodes, we can focus on objects and the flow of a request. The following figure, courtesy of a title="Scala for Machine Learning" www.sourcemaking.com [j.mp/smchain], shows how the client code sends a request to all processing elements (also known as nodes or handlers) of an application: Note that the client code only knows about the first processing element, instead of having references to all of them, and each processing element only knows about its immediate next neighbor (called the successor), not about every other processing element. This is usually a one-way relationship, which in programming terms means a singly linked list in contrast to a doubly linked list; a singly linked list does not allow navigation in both ways, while a doubly linked list allows that. This chain organization is used for a good reason. It achieves decoupling between the sender (client) and the receivers (processing elements) [GOF95, page 254]. A real-life example ATMs and, in general, any kind of machine that accepts/returns banknotes or coins (for example, a snack vending machine) use the chain of responsibility pattern. There is always a single slot for all banknotes, as shown in the following figure, courtesy of www.sourcemaking.com: When a banknote is dropped, it is routed to the appropriate receptacle. When it is returned, it is taken from the appropriate receptacle [j.mp/smchain], [j.mp/c2chain]. 
We can think of the single slot as the shared communication medium and the different receptacles as the processing elements. The result contains cash from one or more receptacles. For example, in the preceding figure, we see what happens when we request $175 from the ATM. A software example I tried to find some good examples of Python applications that use the Chain of Responsibility pattern but I couldn't, most likely because Python programmers don't use this name. So, my apologies, but I will use other programming languages as a reference. The servlet filters of Java are pieces of code that are executed before an HTTP request arrives at a target. When using servlet filters, there is a chain of filters. Each filter performs a different action (user authentication, logging, data compression, and so forth), and either forwards the request to the next filter until the chain is exhausted, or it breaks the flow if there is an error (for example, the authentication failed three consecutive times) [j.mp/soservl]. Apple's Cocoa and Cocoa Touch frameworks use Chain of Responsibility to handle events. When a view receives an event that it doesn't know how to handle, it forwards the event to its superview. This goes on until a view is capable of handling the event or the chain of views is exhausted [j.mp/chaincocoa]. Use cases By using the Chain of Responsibility pattern, we give a chance to a number of different objects to satisfy a specific request. This is useful when we don't know which object should satisfy a request in advance. An example is a purchase system. In purchase systems, there are many approval authorities. One approval authority might be able to approve orders up to a certain value, let's say $100. If the order is more than $100, the order is sent to the next approval authority in the chain that can approve orders up to $200, and so forth. Another case where Chain of Responsibility is useful is when we know that more than one object might need to process a single request. This is what happens in an event-based programming. A single event such as a left mouse click can be caught by more than one listener. It is important to note that the Chain of Responsibility pattern is not very useful if all the requests can be taken care of by a single processing element, unless we really don't know which element that is. The value of this pattern is the decoupling that it offers. Instead of having a many-to-many relationship between a client and all processing elements (and the same is true regarding the relationship between a processing element and all other processing elements), a client only needs to know how to communicate with the start (head) of the chain. The following figure demonstrates the difference between tight and loose coupling. The idea behind loosely coupled systems is to simplify maintenance and make it easier for us to understand how they function [j.mp/loosecoup]: Implementation There are many ways to implement Chain of Responsibility in Python, but my favorite implementation is the one by Vespe Savikko [j.mp/savviko]. Vespe's implementation uses dynamic dispatching in a Pythonic style to handle requests [j.mp/ddispatch]. Let's implement a simple event-based system using Vespe's implementation as a guide. The following is the UML class diagram of the system: The Event class describes an event. 
We'll keep it simple, so in our case an event has only name:

class Event:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

The Widget class is the core class of the application. The parent aggregation shown in the UML diagram indicates that each widget can have a reference to a parent object, which, by convention, we assume is a Widget instance. Note, however, that according to the rules of inheritance, an instance of any of the subclasses of Widget (for example, an instance of MsgText) is also an instance of Widget. The default value of parent is None:

class Widget:
    def __init__(self, parent=None):
        self.parent = parent

The handle() method uses dynamic dispatching through hasattr() and getattr() to decide who is the handler of a specific request (event). If the widget that is asked to handle an event does not support it, there are two fallback mechanisms. If the widget has a parent, then the handle() method of the parent is executed. If the widget has no parent but a handle_default() method, handle_default() is executed:

    def handle(self, event):
        handler = 'handle_{}'.format(event)
        if hasattr(self, handler):
            method = getattr(self, handler)
            method(event)
        elif self.parent:
            self.parent.handle(event)
        elif hasattr(self, 'handle_default'):
            self.handle_default(event)

At this point, you might have realized why the Widget and Event classes are only associated (no aggregation or composition relationships) in the UML class diagram. The association is used to show that the Widget class "knows" about the Event class but does not have any strict references to it, since an event needs to be passed only as a parameter to handle(). MainWindow, MsgText, and SendDialog are all widgets with different behaviors. Not all of these three widgets are expected to be able to handle the same events, and even if they can handle the same event, they might behave differently. MainWindow can handle only the close and default events:

class MainWindow(Widget):
    def handle_close(self, event):
        print('MainWindow: {}'.format(event))

    def handle_default(self, event):
        print('MainWindow Default: {}'.format(event))

SendDialog can handle only the paint event:

class SendDialog(Widget):
    def handle_paint(self, event):
        print('SendDialog: {}'.format(event))

Finally, MsgText can handle only the down event:

class MsgText(Widget):
    def handle_down(self, event):
        print('MsgText: {}'.format(event))

The main() function shows how we can create a few widgets and events, and how the widgets react to those events. All events are sent to all the widgets. Note the parent relationship of each widget. The sd object (an instance of SendDialog) has as its parent the mw object (an instance of MainWindow). However, not all objects need to have a parent that is an instance of MainWindow.
For example, the msg object (an instance of MsgText) has the sd object as a parent: def main(): mw = MainWindow() sd = SendDialog(mw) msg = MsgText(sd) for e in ('down', 'paint', 'unhandled', 'close'): evt = Event(e) print('nSending event -{}- to MainWindow'.format(evt)) mw.handle(evt) print('Sending event -{}- to SendDialog'.format(evt)) sd.handle(evt) print('Sending event -{}- to MsgText'.format(evt)) msg.handle(evt) The following is the full code of the example (chain.py): class Event: def __init__(self, name): self.name = name def __str__(self): return self.name class Widget: def __init__(self, parent=None): self.parent = parent def handle(self, event): handler = 'handle_{}'.format(event) if hasattr(self, handler): method = getattr(self, handler) method(event) elif self.parent: self.parent.handle(event) elif hasattr(self, 'handle_default'): self.handle_default(event) class MainWindow(Widget): def handle_close(self, event): print('MainWindow: {}'.format(event)) def handle_default(self, event): print('MainWindow Default: {}'.format(event)) class SendDialog(Widget): def handle_paint(self, event): print('SendDialog: {}'.format(event)) class MsgText(Widget): def handle_down(self, event): print('MsgText: {}'.format(event)) def main(): mw = MainWindow() sd = SendDialog(mw) msg = MsgText(sd) for e in ('down', 'paint', 'unhandled', 'close'): evt = Event(e) print('nSending event -{}- to MainWindow'.format(evt)) mw.handle(evt) print('Sending event -{}- to SendDialog'.format(evt)) sd.handle(evt) print('Sending event -{}- to MsgText'.format(evt)) msg.handle(evt) if __name__ == '__main__': main() Executing chain.py gives us the following results: >>> python3 chain.py Sending event -down- to MainWindow MainWindow Default: down Sending event -down- to SendDialog MainWindow Default: down Sending event -down- to MsgText MsgText: down Sending event -paint- to MainWindow MainWindow Default: paint Sending event -paint- to SendDialog SendDialog: paint Sending event -paint- to MsgText SendDialog: paint Sending event -unhandled- to MainWindow MainWindow Default: unhandled Sending event -unhandled- to SendDialog MainWindow Default: unhandled Sending event -unhandled- to MsgText MainWindow Default: unhandled Sending event -close- to MainWindow MainWindow: close Sending event -close- to SendDialog MainWindow: close Sending event -close- to MsgText MainWindow: close There are some interesting things that we can see in the output. For instance, sending a down event to MainWindow ends up being handled by the default MainWindow handler. Another nice case is that although a close event cannot be handled directly by SendDialog and MsgText, all the close events end up being handled properly by MainWindow. That's the beauty of using the parent relationship as a fallback mechanism. If you want to spend some more creative time on the event example, you can replace the dumb print statements and add some actual behavior to the listed events. Of course, you are not limited to the listed events. Just add your favorite event and make it do something useful! Another exercise is to add a MsgText instance during runtime that has MainWindow as the parent. Is this hard? Do the same for an event (add a new event to an existing widget). Which is harder? Summary In this article, we covered the Chain of Responsibility design pattern. This pattern is useful to model requests / handle events when the number and type of handlers isn't known in advance. 
Examples of systems that fit well with Chain of Responsibility are event-based systems, purchase systems, and shipping systems. In the Chain Of Responsibility pattern, the sender has direct access to the first node of a chain. If the request cannot be satisfied by the first node, it forwards to the next node. This continues until either the request is satisfied by a node or the whole chain is traversed. This design is used to achieve loose coupling between the sender and the receiver(s). ATMs are an example of Chain Of Responsibility. The single slot that is used for all banknotes can be considered the head of the chain. From here, depending on the transaction, one or more receptacles is used to process the transaction. The receptacles can be considered the processing elements of the chain. Java's servlet filters use the Chain of Responsibility pattern to perform different actions (for example, compression and authentication) on an HTTP request. Apple's Cocoa frameworks use the same pattern to handle events such as button presses and finger gestures. Resources for Article: Further resources on this subject: Exploring Model View Controller [Article] Analyzing a Complex Dataset [Article] Automating Your System Administration and Deployment Tasks Over SSH [Article]

OpenLayers' Key Components

Packt
04 Feb 2015
13 min read
In this article by, Thomas Gratier, Paul Spencer, and Erik Hazzard, authors of the book OpenLayers 3 Beginner's Guide, we will see the various components of OpenLayers and a short description about them. (For more resources related to this topic, see here.) The OpenLayers library provides web developers with components useful for building web mapping applications. Following the principles of object-oriented design, these components are called classes. The relationship between all the classes in the OpenLayers library is part of the deliberate design, or architecture, of the library. There are two types of relationships that we, as developers using the library, need to know about: relationships between classes and inheritance between classes. Relationships between classes describe how classes, or more specifically, instances of classes, are related to each other. There are several different conceptual ways that classes can be related, but basically a relationship between two classes implies that one of the class uses the other in some way, and often vice-versa. Inheritance between classes shows how behavior of classes, and their relationships are shared with other classes. Inheritance is really just a way of sharing common behavior between several different classes. We'll start our discussion of the key components of OpenLayers by focusing on the first of these – the relationship between classes. We'll start by looking at the Map class – ol.Map. Its all about the map Instances of the Map class are at the center of every OpenLayers application. These objects are instances of the ol.Map class and they use instances of other classes to do their job, which is to put an interactive map onto a web page. Almost every other class in the OpenLayers is related to the Map class in some direct or indirect relationship. The following diagram illustrates the direct relationships that we are most interested in: The preceding diagram shows the most important relationships between the Map class and other classes it uses to do its job. It tells us several important things: A map has 0 or 1 view instances and it uses the name view to refer to it. A view may be associated with multiple maps, however. A map may have 0 or more instances of layers managed by a Collection class and a layer may be associated with 0 or one Map class. The Map class has a member variable named layers that it uses to refer to this collection. A map may have 0 or more instances of overlays managed by a Collection class and an overlay may be associated with 0 or one Map class. The Map class has a member variable named overlays that it uses to refer to this collection. A map may have 0 or more instances of controls managed by a class called ol.Collection and controls may be associated with 0 or one Map class. The Map class has a member variable named controls that it uses to refer to this collection. A map may have 0 or more instances of interactions managed by a Collection class and an interaction may be associated with 0 or one Map class. The Map class has a member variable named interactions that it uses to refer to this collection. Although these are not the only relationships between the Map class and other classes, these are the ones we'll be working with the most. The View class (ol.View) manages information about the current position of the Map class. If you are familiar with the programming concept of MVC (Model-View-Controller), be aware that the view class is not a View in the MVC sense. 
It does not provide the presentation layer for the map, rather it acts more like a controller (although there is not an exact parallel because OpenLayers was not designed with MVC in mind). The Layer class (ol.layer.Base) is the base class for classes that provide data to the map to be rendered. The Overlay class (ol.Overlay) is an interactive visual element like a control, but it is tied to a specific geographic position. The Control class (ol.control.Control) is the base class for a group of classes that collectively provide the ability to a user to interact with the Map. Controls have a visible user interface element (such as a button or a form input element) with which the user interacts. The Interaction class (ol.interaction.Interaction) is the base class for a group of classes that also allow the user to interact with the map, but differ from controls in which they have no visible user interface element. For example, the DragPan interaction allows the user to click on and drag the map to pan around. Controlling the Map's view The OpenLayers view class, ol.View, represents a simple two-dimensional view of the world. It is responsible for determining where, and to some degree how, the user is looking at the world. It is responsible for managing the following information: The geographic center of the map The resolution of the map, which is to say how much of the map we can see around the center The rotation of the map Although you can create a map without a view, it won't display anything until a view is assigned to it. Every map must have a view in order to display any map data at all. However, a view may be shared between multiple instances of the Map class. This effectively synchronizes the center, resolution, and rotation of each of the maps. In this way, you can create two or more maps in different HTML containers on a web page, even showing different information, and have them look at the same world position. Changing the position of any of the maps (for instance, by dragging one) automatically updates the other maps at the same time! Displaying map content So, if the view is responsible for managing where the user is looking in the world, which component is responsible for determining what the user sees there? That's the job of layers and overlays. A layer provides access to a source of geospatial data. There are two basic kinds of layers, that is, raster and vector layers: In computer graphics, the term raster (raster graphics) refers to a digital image. In OpenLayers, a raster layer is one that displays images in your map at specific geographic locations. In computer graphics, the term vector (vector graphics) refers to images that are defined in terms of geometric shapes, such as points, lines, and polygons—or mathematic formulae such as Bézier curves. In OpenLayers, a vector layer reads geospatial data from vector data (such as a KML file) and the data can then be drawn onto the map. Layers are not the only way to display spatial information on the map. The other way is to use an overlay. We can create instances of ol.Overlay and add them to the map at specific locations. The overlay then positions its content (an HTML element) on the map at the specified location. The HTML element can then be used like any other HTML element. The most common use of overlays is to display spatially relevant information in a pop-up dialog in response to the mouse moving over, or clicking on a geographic feature. 
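To make these components concrete, here is a minimal sketch of how they might fit together in code. It is not taken from the book's examples: it assumes a page that loads the OpenLayers 3 library as the global ol object and that contains two hypothetical elements, a div with id "map" and a div with id "popup".

// A map with a view, one raster layer, and an overlay tied to a coordinate
var map = new ol.Map({
    target: 'map',                      // id of the map's HTML container
    layers: [
        new ol.layer.Tile({             // a raster layer...
            source: new ol.source.OSM() // ...drawing OpenStreetMap tiles
        })
    ],
    view: new ol.View({
        center: [0, 0],                 // where the user is looking
        zoom: 2                         // how much of the world is visible
    })
});

// An overlay positions an ordinary HTML element at a geographic location
var popup = new ol.Overlay({
    element: document.getElementById('popup'),
    position: [0, 0]
});
map.addOverlay(popup);

Sharing the same ol.View instance between two maps, as described above, is just a matter of passing the same view object to both map constructors.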
Interacting with the map As mentioned earlier, the two components that allow users to interact with the map are Interactions and Controls. Let's look at them in a bit more detail. Using interactions Interactions are components that allow the user to interact with the map via some direct input, usually by using the mouse (or a finger with a touch screen). Interactions have no visible user interface. The default set of interactions are: ol.interaction.DoubleClickZoom: If you double-click the left mouse button, the map will zoom in by a factor of 2 ol.interaction.DragPan: If you drag the map, it will pan as you move the mouse ol.interaction.PinchRotate: On touch-enabled devices, placing two fingers on the device and rotating them in a circular motion will rotate the map ol.interaction.PinchZoom: On touch-enabled devices, placing two fingers on the device and pinching them together or spreading them apart will zoom the map out and in respectively ol.interaction.KeyboardPan: You can use the arrow keys to pan the map in the direction of the arrows ol.interaction.KeyboardZoom: You can use the + and – keys to zoom in and out ol.interaction.MouseWheelZoom: You can use the scroll wheel on a mouse to zoom the map in and out ol.interaction.DragZoom: If you hold the Shift key while dragging on map, a rectangular region will be drawn and when you release the mouse button, you will zoom into that area Controls Controls are components that allow the user to modify the map state via some visible user interface element, such as a button. In the examples we've seen so far, we've seen zoom buttons in the top-left corner of the map and an attribution control in the bottom-right corner of the map. In fact, the default controls are: ol.control.Zoom: This displays the zoom buttons in the top-left corner. ol.control.Rotate: This is a button to reset rotation to 0; by default, this is only displayed when the map's rotation is not 0. Ol.control.Attribution: This displays attribution text for the layers currently visible in the map. By default, the attributions are collapsed to a single icon in the bottom-right corner and clicking the icon will show the attributions. This concludes our brief overview of the central components of an OpenLayers application. We saw that the Map class is at the center of everything and there are some key components—the view, layers, overlays, interactions, and controls—that it uses to accomplish its job of putting an interactive map onto a web page. At the beginning of this article, we talked about both relationships and inheritance. So far, we've only covered the relationships. In the next section, we'll show the inheritance architecture of the key components and introduce three classes that have been working behind the scenes to make everything work. OpenLayers' super classes In this section, we will look at three classes in the OpenLayers library that we won't often work directly with, but which provide an enormous amount of functionality to most of the other classes in the library. The first two classes, Observable and Object, are at the base of the inheritance tree for OpenLayers—the so-called super classes that most classes inherit from. The third class, Collection, isn't actually a super class but is used as the basis for many relationships between classes in OpenLayers—we've already seen that the Map class relationships with layers, overlays, interactions, and controls are managed by instances of the Collection class. 
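As a small preview of how these pieces surface in application code (a sketch only, reusing the map from the earlier example and assuming someLayer is any layer instance you have created), observing the map's layer collection might look like this:

// map.getLayers() returns the ol.Collection holding the map's layers
var layers = map.getLayers();

// Collections are Observables, so we can listen for the 'add' event
layers.on('add', function(event) {
    console.log('Layer added; collection length is now ' + layers.getLength());
});

// Pushing onto the collection triggers the listener above
layers.push(someLayer);   // someLayer is assumed to be an existing layer

The add, remove, and length-related events used here are covered in more detail in the Collection sections that follow.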
This concludes our brief overview of the central components of an OpenLayers application. We saw that the Map class is at the center of everything and that there are some key components—the view, layers, overlays, interactions, and controls—that it uses to accomplish its job of putting an interactive map onto a web page. At the beginning of this article, we talked about both relationships and inheritance. So far, we've only covered the relationships. In the next section, we'll show the inheritance architecture of the key components and introduce three classes that have been working behind the scenes to make everything work.

OpenLayers' super classes

In this section, we will look at three classes in the OpenLayers library that we won't often work with directly, but which provide an enormous amount of functionality to most of the other classes in the library. The first two classes, Observable and Object, are at the base of the inheritance tree for OpenLayers—the so-called super classes that most classes inherit from. The third class, Collection, isn't actually a super class but is used as the basis for many relationships between classes in OpenLayers—we've already seen that the Map class's relationships with layers, overlays, interactions, and controls are managed by instances of the Collection class.

Before we jump into the details, take a look at the inheritance diagram for the components we've already discussed. As you can see, the Observable class, ol.Observable, is the base class for every component of OpenLayers that we've seen so far. In fact, there are very few classes in the OpenLayers library that do not inherit from the Observable class or one of its subclasses. Similarly, the Object class, ol.Object, is the base class for many classes in the library and is itself a subclass of Observable.

The Observable and Object classes aren't very glamorous. You can't see them in action, and they don't do anything very exciting from a user's perspective. What they do, though, is provide two common sets of behavior that you can expect to be able to use on almost every object you create or access through the OpenLayers library: event management and Key-Value Observing (KVO).

Event management with the Observable class

An event is basically what it sounds like—something happening. Events are a fundamental part of how various components of OpenLayers—the map, layers, controls, and pretty much everything else—communicate with each other. It is often important to know when something has happened and to react to it. One type of event that is very useful is a user-generated event, such as a mouse click or a touch on a mobile device's screen. Knowing when the user has clicked and dragged on the map allows code to react to this and move the map to simulate panning. Other types of events are internal, such as the map being moved or data finishing loading. To continue the previous example, once the map has moved to simulate panning, another event is issued by OpenLayers to say that the map has finished moving, so that other parts of OpenLayers can react by updating the user interface with the center coordinates or by loading more data.

Key-Value Observing with the Object class

OpenLayers' Object class inherits from Observable and implements a software pattern called Key-Value Observing (KVO). With KVO, an object representing some data maintains a list of other objects that wish to observe it. When the data value changes, the observers are notified automatically.

Working with Collections

The last section of this article is about OpenLayers' Collection class, ol.Collection. As mentioned, the Collection class is not a super class like Observable and Object, but it is an integral part of the relationship model. Many classes in OpenLayers use the Collection class to manage one-to-many relationships. At its core, the Collection class is a JavaScript array with additional convenience methods. It also inherits directly from the Object class, so it gains the functionality of both Observable and Object. This makes the Collection class extremely powerful.

Collection properties

A Collection, through the Object class it inherits from, has one observable property: length. When a collection changes (elements are added or removed), its length property is updated. This means it also emits an event, change:length, when the length property is changed.

Collection events

A Collection also inherits the functionality of the Observable class (via the Object class) and emits two other events: add and remove. Event handler functions registered for either event will receive a single argument, a CollectionEvent, which has an element property containing the element that was added or removed.
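A short sketch ties the three classes together. The event names are real OpenLayers events, but the handlers and log messages are purely illustrative:

// Observable: listen for an event on the map (Map inherits from Observable)
map.on('moveend', function() {
  console.log('The map has finished moving');
});

// Object / KVO: watch a property on the view and read it back with get()
map.getView().on('change:resolution', function() {
  console.log('New resolution:', map.getView().get('resolution'));
});

// Collection: the map's layers are kept in an ol.Collection
var layers = map.getLayers();
layers.on('add', function(event) {
  // event.element is the layer that was just added
  console.log('Layer added, count is now', layers.getLength());
});
layers.push(new ol.layer.Tile({source: new ol.source.OSM()}));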
Summary

This wraps up our overview of the key concepts in the OpenLayers library. We took a quick look at the key components of the library from two different aspects—relationships and inheritance. With the Map class as the central object of any OpenLayers application, we looked at its main relationships to other classes, including views, layers, overlays, interactions, and controls. We briefly introduced each of these classes to give an overview of their primary purpose. We then investigated inheritance related to these objects and reviewed the super classes that provide functionality to most classes in the OpenLayers library—the Observable and Object classes. The Observable class provides a basic event mechanism, and the Object class adds observable properties with a powerful binding feature. Lastly, we looked at the Collection class. Although it isn't part of the inheritance structure, it is central to how one-to-many relationships work throughout the library (including the Map class's relationships with layers, overlays, interactions, and controls).

Resources for Article:

Further resources on this subject:
OGC for ESRI Professionals [Article]
Improving proximity filtering with KNN [Article]
OpenLayers: Overview of Vector Layer [Article]

Adding Authentication

Packt
23 Jan 2015
15 min read
This article, written by Mat Ryer, the author of Go Programming Blueprints, focuses on high-performance transmission of messages from the clients to the server and back again, but our users have no way of knowing who they are talking to. One solution to this problem is to build some kind of signup and login functionality, letting our users create accounts and authenticate themselves before they can open the chat page.

(For more resources related to this topic, see here.)

Whenever we are about to build something from scratch, we must ask ourselves how others have solved this problem before (it is extremely rare to encounter genuinely original problems), and whether any open solutions or standards already exist that we can make use of. Authorization and authentication are hardly new problems, especially in the world of the Web, with many different protocols out there to choose from. So how do we decide the best option to pursue? As always, we must look at this question from the point of view of the user. A lot of websites these days allow you to sign in using accounts you already have elsewhere, on a variety of social media or community websites. This saves users the tedious job of entering all their account information over and over again as they decide to try out different products and services. It also has a positive effect on the conversion rates for new sites.

In this article, we will enhance our chat codebase to add authentication, which will allow our users to sign in using Google, Facebook, or GitHub, and you'll see how easy it is to add other sign-in portals too. In order to join the chat, users must first sign in. Following this, we will use the authorized data to augment our user experience so everyone knows who is in the room, and who said what.

In this article, you will learn to:

Use the decorator pattern to wrap http.Handler types to add additional functionality to handlers
Serve HTTP endpoints with dynamic paths
Use the Gomniauth open source project to access authentication services
Get and set cookies using the http package
Encode objects as Base64 and back to normal again
Send and receive JSON data over a web socket
Give different types of data to templates
Work with channels of your own types

Handlers all the way down

For our chat application, we implemented our own http.Handler type in order to easily compile, execute, and deliver HTML content to browsers. Since this is a very simple but powerful interface, we are going to continue to use it wherever possible when adding functionality to our HTTP processing. In order to determine whether a user is authenticated, we will create an authentication wrapper handler that performs the check, and passes execution on to the inner handler only if the user is authenticated. Our wrapper handler will satisfy the same http.Handler interface as the object inside it, allowing us to wrap any valid handler. In fact, even the authentication handler we are about to write could later be encapsulated inside a similar wrapper if needed.

Diagram of a chaining pattern when applied to HTTP handlers

The preceding figure shows how this pattern could be applied in a more complicated HTTP handler scenario. Each object implements the http.Handler interface, which means that the object could be passed into the http.Handle function to directly handle a request, or it can be given to another object, which adds some kind of extra functionality. The Logging handler might write to a logfile before and after the ServeHTTP method is called on the inner handler.
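To make the decorator idea concrete before we write the real authentication handler, here is a hedged sketch of what such a Logging wrapper could look like; it is not part of the chat application's code, just an illustration of the pattern:

package main

import (
    "log"
    "net/http"
)

// logHandler wraps another http.Handler and logs around it.
type logHandler struct {
    next http.Handler
}

func (h *logHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    log.Println("before:", r.URL.Path)
    h.next.ServeHTTP(w, r) // pass execution on to the wrapped handler
    log.Println("after:", r.URL.Path)
}

// MustLog decorates any http.Handler with logging.
func MustLog(handler http.Handler) http.Handler {
    return &logHandler{next: handler}
}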
Because the inner handler is just another http.Handler, any other handler can be wrapped in (or decorated with) the Logging handler. It is also common for an object to contain logic that decides which inner handler should be executed. For example, our authentication handler will either pass the execution on to the wrapped handler, or handle the request itself by issuing a redirect to the browser. That's plenty of theory for now; let's write some code. Create a new file called auth.go in the chat folder:

package main

import (
    "net/http"
)

type authHandler struct {
    next http.Handler
}

func (h *authHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
    if _, err := r.Cookie("auth"); err == http.ErrNoCookie {
        // not authenticated
        w.Header().Set("Location", "/login")
        w.WriteHeader(http.StatusTemporaryRedirect)
    } else if err != nil {
        // some other error
        panic(err.Error())
    } else {
        // success - call the next handler
        h.next.ServeHTTP(w, r)
    }
}

func MustAuth(handler http.Handler) http.Handler {
    return &authHandler{next: handler}
}

The authHandler type not only implements the ServeHTTP method (which satisfies the http.Handler interface) but also stores (wraps) an http.Handler in the next field. Our MustAuth helper function simply creates an authHandler value that wraps any other http.Handler. This pattern allows us to easily add authentication to our code in main.go. Let's tweak the following root mapping line:

http.Handle("/", &templateHandler{filename: "chat.html"})

Let's change the first argument to make it explicit that this page is meant for chatting. Next, let's use the MustAuth function to wrap templateHandler for the second argument:

http.Handle("/chat", MustAuth(&templateHandler{filename: "chat.html"}))

Wrapping templateHandler with the MustAuth function will cause execution to run first through our authHandler, and only reach templateHandler if the request is authenticated. The ServeHTTP method in our authHandler will look for a special cookie called auth, and use the Header and WriteHeader methods on http.ResponseWriter to redirect the user to a login page if the cookie is missing. Build and run the chat application and try to hit http://localhost:8080/chat:

go build -o chat
./chat -host=":8080"

You need to delete your cookies to clear out previous auth tokens, or any other cookies that might be left over from other development projects served through localhost. If you look in the address bar of your browser, you will notice that you are immediately redirected to the /login page. Since we cannot handle that path yet, you'll just get a 404 page not found error.
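If you would rather confirm the redirect without a browser, a small test along the following lines should work. The file name auth_test.go and the use of http.NotFoundHandler as a stand-in for templateHandler are assumptions for this sketch; only MustAuth and authHandler come from the code above:

package main

import (
    "net/http"
    "net/http/httptest"
    "testing"
)

func TestMustAuthRedirectsWithoutCookie(t *testing.T) {
    handler := MustAuth(http.NotFoundHandler()) // any inner handler will do
    req, err := http.NewRequest("GET", "/chat", nil)
    if err != nil {
        t.Fatal(err)
    }
    res := httptest.NewRecorder()
    handler.ServeHTTP(res, req)
    if res.Code != http.StatusTemporaryRedirect {
        t.Errorf("expected status %d, got %d", http.StatusTemporaryRedirect, res.Code)
    }
    if loc := res.Header().Get("Location"); loc != "/login" {
        t.Errorf("expected redirect to /login, got %q", loc)
    }
}

Run it with go test from inside the chat folder.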
Making a pretty social sign-in page

There is no excuse for building ugly apps, and so we will build a social sign-in page that is as pretty as it is functional. Bootstrap is a frontend framework used to develop responsive projects on the Web. It provides CSS and JavaScript code that solves many user-interface problems in a consistent and good-looking way. While sites built using Bootstrap all tend to look the same (although there are plenty of ways in which the UI can be customized), it is a great choice for early versions of apps, or for developers who don't have access to designers. If you build your application using the semantic standards set forth by Bootstrap, it becomes easy to make a Bootstrap theme for your site or application, and you know it will slot right into your code.

We will use the version of Bootstrap hosted on a CDN so we don't have to worry about downloading and serving our own version through our chat application. This means that in order to render our pages properly, we will need an active Internet connection, even during development. If you prefer to download and host your own copy of Bootstrap, you can do so. Keep the files in an assets folder and add the following call to your main function (it uses http.Handle to serve the assets via your application):

http.Handle("/assets/", http.StripPrefix("/assets",
    http.FileServer(http.Dir("/path/to/assets/"))))

Notice how the http.StripPrefix and http.FileServer functions return objects that satisfy the http.Handler interface, as per the decorator pattern that we implement with our MustAuth helper function. In main.go, let's add an endpoint for the login page:

http.Handle("/chat", MustAuth(&templateHandler{filename: "chat.html"}))
http.Handle("/login", &templateHandler{filename: "login.html"})
http.Handle("/room", r)

Obviously, we do not want to wrap our login page with MustAuth, because that would cause an infinite redirection loop. Create a new file called login.html inside our templates folder, and insert the following HTML code:

<html>
  <head>
    <title>Login</title>
    <link rel="stylesheet" href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css">
  </head>
  <body>
    <div class="container">
      <div class="page-header">
        <h1>Sign in</h1>
      </div>
      <div class="panel panel-danger">
        <div class="panel-heading">
          <h3 class="panel-title">In order to chat, you must be signed in</h3>
        </div>
        <div class="panel-body">
          <p>Select the service you would like to sign in with:</p>
          <ul>
            <li><a href="/auth/login/facebook">Facebook</a></li>
            <li><a href="/auth/login/github">GitHub</a></li>
            <li><a href="/auth/login/google">Google</a></li>
          </ul>
        </div>
      </div>
    </div>
  </body>
</html>

Restart the web server and navigate to http://localhost:8080/login. You will notice that it now displays our sign-in page.

Endpoints with dynamic paths

Pattern matching for the http package in the Go standard library isn't the most comprehensive and fully featured implementation out there. For example, Ruby on Rails makes it much easier to have dynamic segments inside the path:

"auth/:action/:provider_name"

This then provides a data map (or dictionary) containing the values that it automatically extracted from the matched path. So if you visit auth/login/google, then params[:provider_name] would equal google, and params[:action] would equal login. The most the http package lets us specify by default is a path prefix, which we can do by leaving a trailing slash at the end of the pattern:

"auth/"

We would then have to manually parse the remaining segments to extract the appropriate data. This is acceptable for relatively simple cases, which suits our needs for the time being, since we only need to handle a few different paths, such as:

/auth/login/google
/auth/login/facebook
/auth/callback/google
/auth/callback/facebook

If you need to handle more advanced routing situations, you might want to consider using dedicated packages such as Goweb, Pat, Routes, or mux. For extremely simple cases such as ours, the built-in capabilities will do. We are going to create a new handler that powers our login process. In auth.go, add the following loginHandler code:

// loginHandler handles the third-party login process.
// format: /auth/{action}/{provider}
func loginHandler(w http.ResponseWriter, r *http.Request) {
    segs := strings.Split(r.URL.Path, "/")
    action := segs[2]
    provider := segs[3]
    switch action {
    case "login":
        log.Println("TODO handle login for", provider)
    default:
        w.WriteHeader(http.StatusNotFound)
        fmt.Fprintf(w, "Auth action %s not supported", action)
    }
}

In the preceding code, we break the path into segments using strings.Split before pulling out the values for action and provider (note that loginHandler relies on the fmt, log, and strings packages, so make sure they appear in the import block of auth.go). If the action value is known, we will run the specific code; otherwise, we will write out an error message and return an http.StatusNotFound status code (which, in the language of HTTP status codes, is a 404). We will not bullet-proof our code right now, but it's worth noticing that if someone hits loginHandler with too few segments, our code will panic because it expects segs[2] and segs[3] to exist. For extra credit, see whether you can protect against this and return a nice error message instead of a panic if someone hits /auth/nonsense; one possible answer is sketched at the end of this section.

Our loginHandler is only a function and not an object that implements the http.Handler interface. This is because, unlike other handlers, we don't need it to store any state. The Go standard library supports this, so we can use the http.HandleFunc function to map it in a way similar to how we used http.Handle earlier. In main.go, update the handlers:

http.Handle("/chat", MustAuth(&templateHandler{filename: "chat.html"}))
http.Handle("/login", &templateHandler{filename: "login.html"})
http.HandleFunc("/auth/", loginHandler)
http.Handle("/room", r)

Rebuild and run the chat application:

go build -o chat
./chat -host=":8080"

Hit the following URLs and notice the output logged in the terminal:

http://localhost:8080/auth/login/google outputs TODO handle login for google
http://localhost:8080/auth/login/facebook outputs TODO handle login for facebook

We have successfully implemented a dynamic path-matching mechanism that so far just prints out TODO messages; we need to integrate with authentication services in order to make our login process work.
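As one possible answer to that extra-credit suggestion, the path parsing could be guarded before indexing into the slice. This is only a sketch, not the version used in the rest of the article:

func loginHandler(w http.ResponseWriter, r *http.Request) {
    segs := strings.Split(r.URL.Path, "/")
    if len(segs) < 4 {
        // e.g. /auth/nonsense - respond politely instead of panicking
        w.WriteHeader(http.StatusNotFound)
        fmt.Fprintln(w, "Auth path must look like /auth/{action}/{provider}")
        return
    }
    action := segs[2]
    provider := segs[3]
    switch action {
    case "login":
        log.Println("TODO handle login for", provider)
    default:
        w.WriteHeader(http.StatusNotFound)
        fmt.Fprintf(w, "Auth action %s not supported", action)
    }
}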
OAuth2

OAuth2 is an open authentication and authorization standard designed to allow resource owners to give clients delegated access to private data (such as wall posts or tweets) via an access token exchange handshake. Even if you do not wish to access the private data, OAuth2 is a great option that allows people to sign in using their existing credentials, without exposing those credentials to a third-party site. In this case, we are the third party and we want to allow our users to sign in using services that support OAuth2.

From a user's point of view, the OAuth2 flow is:

The user selects the provider with whom they wish to sign in to the client app.
The user is redirected to the provider's website (with a URL that includes the client app ID) where they are asked to give permission to the client app.
The user signs in from the OAuth2 service provider and accepts the permissions requested by the third-party application.
The user is redirected back to the client app with an authorization code.
In the background, the client app sends the authorization code to the provider, who sends back an access token.
The client app uses the access token to make authorized requests to the provider, such as to get user information or wall posts.

To avoid reinventing the wheel, we will look at a few open source projects that have already solved this problem for us.

Open source OAuth2 packages

Andrew Gerrand has been working on the core Go team since February 2010, that is, two years before Go 1.0 was officially released. His goauth2 package (see https://code.google.com/p/goauth2/) is an elegant implementation of the OAuth2 protocol written entirely in Go. Andrew's project inspired Gomniauth (see https://github.com/stretchr/gomniauth). An open source Go alternative to Ruby's omniauth project, Gomniauth provides a unified solution to access different OAuth2 services. In the future, when OAuth3 (or whatever next-generation authentication protocol it is) comes out, in theory, Gomniauth could take on the pain of implementing the details, leaving the user code untouched. For our application, we will use Gomniauth to access OAuth services provided by Google, Facebook, and GitHub, so make sure you have it installed by running the following command:

go get github.com/stretchr/gomniauth

Some of the project dependencies of Gomniauth are kept in Bazaar repositories, so you'll need to head over to http://wiki.bazaar.canonical.com to download them.

Tell the authentication providers about your app

Before we ask an authentication provider to help our users sign in, we must tell them about our application. Most providers have some kind of web tool or console where you can create applications to kick off this process. Here's one from Google:

In order to identify the client application, we need to create a client ID and secret. Despite the fact that OAuth2 is an open standard, each provider has its own language and mechanism to set things up, so you will most likely have to play around with the user interface or the documentation to figure it out in each case. At the time of writing this, in the Google Developer Console, you navigate to APIs & auth | Credentials and click on the Create new Client ID button. In most cases, for added security, you have to be explicit about the host URLs from which requests will come. For now, since we're hosting our app locally on localhost:8080, you should use that. You will also be asked for a redirect URI; this is the endpoint in our chat application to which the user will be redirected after successfully signing in. The callback will be another action on our loginHandler, so the redirection URL for the Google client will be http://localhost:8080/auth/callback/google. Once you finish the authentication process for the providers you want to support, you will be given a client ID and secret for each provider. Make a note of these, because we will need them when we set up the providers in our chat application. If we host our application on a real domain, we have to create new client IDs and secrets, or update the appropriate URL fields on our authentication providers to ensure that they point to the right place. Either way, it is good practice to keep separate sets of development and production keys for security.

Summary

This article showed how to add OAuth to our chat application so that we can keep track of who is saying what, while letting users log in using Google, Facebook, or GitHub. We also learned how to use handlers for efficient coding, and how to make a pretty social sign-in page.

Resources for Article:

Further resources on this subject:
WebSockets in Wildfly [article]
Using Socket.IO and Express together [article]
The Importance of Securing Web Services [article]

ArcGIS Spatial Analyst

Packt
20 Jan 2015
16 min read
In this article by Daniela Cristiana Docan, author of ArcGIS for Desktop Cookbook, we will learn that the ArcGIS Spatial Analyst extension offers a lot of great tools for geoprocessing raster data. Most of the Spatial Analyst tools generate a new raster output. Before starting a raster analysis session, it's best practice to set the main analysis environment parameters settings (for example, scratch the workspace, extent, and cell size of the output raster). In this article, you will store all raster datasets in file geodatabase as file geodatabase raster datasets. (For more resources related to this topic, see here.) Analyzing surfaces In this recipe, you will represent 3D surface data in a two-dimensional environment. To represent 3D surface data in the ArcMap 2D environment, you will use hillshades and contours. You can use the hillshade raster as a background for other raster or vector data in ArcMap. Using the surface analysis tools, you can derive new surface data, such as slope and aspect or locations visibility. Getting ready In the surface analysis context: The term slope refers to the steepness of raster cells Aspect defines the orientation or compass direction of a cell Visibility identifies which raster cells are visible from a surface location In this recipe, you will prepare your data for analysis by creating an elevation surface named Elevation from vector data. The two feature classes involved are the PointElevation point feature class and the ContourLine polyline feature class. All other output raster datasets will derive from the Elevation raster. How to do it... Follow these steps to prepare your data for spatial analysis: Start ArcMap and open the existing map document, SurfaceAnalysis.mxd, from <drive>:PacktPublishingDataSpatialAnalyst. Go to Customize | Extensions and check the Spatial Analyst extension. Open ArcToolbox, right-click on the ArcToolbox toolbox, and select Environments. Set the geoprocessing environment as follows: Workspace | Current Workspace: DataSpatialAnalystTOPO5000.gdb and Scratch Workspace: DataSpatialAnalystScratchTOPO5000.gdb. Output Coordinates: Same as Input. Raster Analysis | Cell Size: As Specified below: type 0.5 with unit as m. Mask: SpatialAnalystTOPO5000.gdbTrapezoid5k. Raster Storage | Pyramid: check Build pyramids and Pyramid levels: type 3. Click on OK. In ArcToolbox, expand Spatial Analyst Tools | Interpolation, and double-click on the Topo to Raster tool to open the dialog box. Click on Show Help to see the meaning of every parameter. Set the following parameters: Input feature data: PointElevation Field: Elevation and Type: PointElevation ContourLine Field: Elevation and Type: Contour WatercourseA Type: Lake Output surface raster: ...ScratchTOPO5000.gdbElevation Output extent (optional): ContourLine Drainage enforcement (optional): NO_ENFORCE Accept the default values for all other parameters. Click on OK. The Elevation raster is a continuous thematic raster. The raster cells are arranged in 4,967 rows and 4,656 columns. Open Layer Properties | Source of the raster and explore the following properties: Data Type (File Geodatabase Raster Dataset), Cell Size (0.5 meters) or Spatial Reference (EPSG: 3844). In the Layer Properties window, click on the Symbology tab. Select the Stretched display method for the continuous raster cell values as follows: Show: Stretched and Color Ramp: Surface. Click on OK. 
Explore the cell values using the following two options: Go to Layer Properties | Display and check Show MapTips Add the Spatial Analyst toolbar, and from Customize | Commands, add the Pixel Inspector tool Let's create a hillshade raster using the Elevation layer: Expand Spatial Analyst Tools | Interpolation and double-click on the Hillshade tool to open the dialog box. Set the following parameters: Input raster: ScratchTOPO5000.gdbElevation Output raster: ScratchTOPO5000.gdbHillshade Azimuth (optional): 315 and Altitude (optional): 45 Accept the default value for Z factor and leave the Model shadows option unchecked. Click on OK. From time to time, please ensure to save the map document as MySurfaceAnalysis.mxd at ...DataSpatialAnalyst. The Hillshade raster is a discrete thematic raster that has an associated attribute table known as Value Attribute Table (VAT). Right-click on the Hillshade raster layer and select Open Attribute Table. The Value field stores the illumination values of the raster cells based on the position of the light source. The 0 value (black) means that 25406 cells are not illuminated by the sun, and 254 value (white) means that 992 cells are entirely illuminated. Close the table. In the Table Of Contents section, drag the Hillshade layer below the Elevation layer, and use the Effects | Transparency tool to add a transparency effect for the Elevation raster layer, as shown in the following screenshot: In the next step, you will derive a raster of slope and aspect from the Elevation layer. Expand Spatial Analyst Tools | Interpolation and double-click on the Slope tool to open the dialog box. Set the following parameters: Input raster: Elevation Output raster: ScratchTOPO5000.gdbSlopePercent Output measurement (optional): PERCENT_RISE Click on OK. Symbolize the layer using the Classified method, as follows: Show: Classified. In the Classification section, click on Classify and select the Manual classification method. You will add seven classes. To add break values, right-click on the empty space of the Insert Break graph. To delete one, select the break value from the graph, and right-click to select Delete Break. Do not erase the last break value, which represents the maximum value. Secondly, in the Break Values section, edit the following six values: 5; 7; 15; 20; 60; 90, and leave unchanged the seventh value (496,6). Select Slope (green to red) for Color Ramp. Click on OK. The green areas represent flatter slopes, while the red areas represent steep slopes, as shown in the following screenshot: Expand Spatial Analyst Tools | Interpolation and double click on the Aspect tool to open the dialog box. Set the following parameters: Input raster: Elevation Output raster: ScratchTOPO5000.gdbAspect Click on OK. Symbolize the Aspect layer. For Classify, click on the Manual classification method. You will add five classes. To add or delete break values, right-click on the empty space of the graph, and select Insert / Delete Break. Secondly, edit the following four values: 0; 90; 180; 270, leaving unchanged the fifth value in the Break Values section. Click on OK. In the Symbology window, edit the labels of the five classes as shown in the following picture. Click on OK. In the Table Of Contents section, select the <VALUE> label, and type Slope Direction. The following screenshot is the result of this action: In the next step, you will create a raster of visibility between two geodetic points in order to plan some topographic measurements using an electronic theodolite. 
You will use the TriangulationPoint and Elevation layers: In the Table Of Contents section, turn on the TriangulationPoint layer, and open its attribute table to examine the fields. There are two geodetic points with the following supplementary fields: OffsetA and OffsetB. OffsetA is the proposed height of the instrument mounted on its tripod above stations 8 and 72. OffsetB is the proposed height of the reflector (or target) above the same points. Close the table. Expand Spatial Analyst Tools | Interpolation and double-click on the Visibility tool to open the dialog box. Click on Show Help to see the meaning of every parameter. Set the following parameters: Input raster: Elevation Input point or polyline observer features: TOPO5000.gdbGeodeticPointsTriangulationPoint Output raster: ScratchTOPO5000.gdbVisibility Analysis type (optional): OBSERVERS Observer parameters | Surface offset (optional): OffsetB Observer offset (optional): OffsetA Outer radius (optional): For this, type 1600 Notice that OffsetA and OffsetB were automatically assigned. The Outer radius parameter limits the search distance, and it is the rounded distance between the two geodetic points. All other cells beyond the 1,600-meter radius will be excluded from the visibility analysis. Click on OK. Open the attribute table of the Visibility layer to inspect the fields and values. The Value field stores the value of cells. Value 0 means that cells are not visible from the two points. Value 1 means that 6,608,948 cells are visible only from point 8 (first observer OBS1). Value 2 means that 1,813,578 cells are visible only from point 72 (second observer OBS2). Value 3 means that 4,351,861 cells are visible from both points. In conclusion, there is visibility between the two points if the height of the instrument and reflector is 1.5 meters. Close the table. Symbolize the Visibility layer, as follows: Show: Unique Values and Value Field: Value. Click on Add All Values and choose Color Scheme: Yellow-Green Bright. Select <Heading> and change the Label value to Height 1.5 meters. Double-click on the symbol for Value as 0, and select No Color. Click on OK. The Visibility layer is symbolized as shown in the following screenshot: Turn off all layers except the Visibility, TriangulationPoint, and Hillshade layers. Save your map as MySurfaceAnalysis.mxd and close ArcMap. You can find the final results at <drive>:PacktPublishingDataSpatialAnalystSurfaceAnalysis. How it works... You have started the exercise by setting the geoprocessing environment. You will override those settings in the next recipes. At the application level, you chose to build pyramids. By creating pyramids, your raster will be displayed faster when you zoom out. The pyramid levels contain the copy of the original raster at a low resolution. The original raster will have a cell size of 0.5 meters. The pixel size will double at each level of the pyramid, so the first level will have a cell size of 1 meter; the second level will have a cell size of 2 meters; and the third level will have a cell size of 4 meters. Even if the values of cells refer to heights measured above the local mean sea level (zero-level surface), you should consider the planimetric accuracy of the dataset. Please remember that TOPO5000.gdb refers to a product at the scale 1:5,000. This is the reason why you have chosen 0.5 meters for the raster cell size. At step 4, you used the PointElevation layer as supplementary data when you created the Elevation raster. 
If one of your ArcToolbox tools fails to execute or you have obtained an empty raster output, you have some options here: Open the Results dialog from the Geoprocessing menu to explore the error report. This will help you to identify the parameter errors. Right-click on the previous execution of the tool and choose Open (step 1). Change the parameters and click on OK to run the tool. Choose Re Run if you want to run the tool with the parameters unchanged (step 2) as shown in the following screenshot: Run the ArcToolbox tool from the ArcCatalog application. Before running the tool, check the geoprocessing environment in ArcCatalog by navigating to Geoprocessing | Environments. There's more... What if you have a model with all previous steps? Open ArcCatalog, and go to ...DataSpatialAnalystModelBuilder. In the ModelBuilder folder, you have a toolbox named MyToolbox, which contains the Surface Analysis model. Right-click on the model and select Properties. Take your time to study the information from the General, Parameters, and Environments tabs. The output (derived data) will be saved in Scratch Workspace: ModelBuilder ScratchTopo5000.gdb. Click on OK to close the Surface Analysis Properties window. Running the entire model will take you around 25 minutes. You have two options: Tool dialog option: Right-click on the Surface Analysis model and select Open. Notice the model parameters that you can modify and read the Help information. Click on OK to run the model. Edit mode: Right-click on the Surface Analysis model and select Edit. The colored model elements are in the second state—they are ready to run the Surface Analysis model by using one of those two options: To run the entire model at the same time, select Run Entire Model from the Model menu. To run the tools (yellow rounded rectangle) one by one, select the Topo to Raster tool with the Select tool, and click on the Run tool from the Standard toolbar. Please remember that a shadow behind a tool means that the model element has already been run. You used the Visibility tool to check the visibility between two points with 1.5 meters for the Observer offset and Surface offset parameters. Try yourself to see what happens if the offset value is less than 1.5 meters. To again run the Visibility tool in the Edit mode, right-click on the tool, and select Open. For Surface offset and Observer offset, type 0.5 meters and click on OK to run the tool. Repeat these steps for a 1 meter offset. Interpolating data Spatial interpolation is the process of estimating an unknown value between two known values taking into account Tobler's First Law: "Everything is related to everything else, but near things are more related than distant things." This recipe does not undertake to teach you the advanced concept of interpolation because it is too complex for this book. Instead, this recipe will guide you to create a terrain surface using the following: A feature class with sample elevation points Two interpolation methods: Inverse Distance Weighted (IDW) and Spline For further research, please refer to: Geographic Information Analysis, David O'Sullivan and David Unwin, John Wiley & Sons, Inc., 2003, specifically the 8.3 Spatial interpolation recipe of Chapter 8, Describing and Analyzing Fields, pp.220-234. Getting ready In this recipe, you will create a terrain surface stored as a raster using the PointElevation sample points. 
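If you prefer scripting to the tool dialogs or ModelBuilder, the same surface-analysis steps can also be driven from Python with the arcpy site package. The following is only a rough sketch, with an assumed workspace path, and it mirrors the Hillshade, Slope, and Aspect parameters used in the recipe above:

import arcpy
from arcpy.sa import Hillshade, Slope, Aspect

arcpy.CheckOutExtension("Spatial")   # requires a Spatial Analyst license
# Assumed path - point this at your own ScratchTOPO5000.gdb
arcpy.env.workspace = r"C:\PacktPublishing\Data\SpatialAnalyst\ScratchTOPO5000.gdb"
arcpy.env.cellSize = 0.5

elevation = "Elevation"              # the raster created by the Topo to Raster step

Hillshade(elevation, 315, 45).save("Hillshade")
Slope(elevation, "PERCENT_RISE").save("SlopePercent")
Aspect(elevation).save("Aspect")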
Your sample data has the following characteristics: The average distance between points is 150 meters The density of sample points is not the same on the entire area of interest There are not enough points to define the cliffs and the depressions There are not extreme differences in elevation values How to do it... Follow these steps to create a terrain surface using the IDW tool: Start ArcMap and open an existing map document Interpolation.mxd from <drive>:PacktPublishingDataSpatialAnalyst. Set the geoprocessing environment, as follows: Workspace | Current Workspace: DataSpatialAnalystTOPO5000.gdb and Scratch Workspace: DataSpatialAnalystScratchTOPO5000.gdb Output Coordinates: Same as the PointElevation layer Raster Analysis | Cell Size: As Specified Below: 1 Mask: DataSpatialAnalystTOPO5000.gdbTrapezoid5k In the next two steps, you will use the IDW tool. Running IDW with barrier polyline features will take you around 15 minutes: In ArcToolbox, go to Spatial Analyst Tools | Interpolation, and double-click on the IDW tool. Click on Show Help to see the meaning of every parameter. Set the following parameters: Input point features: PointElevation Z value field: Elevation Output raster: ScratchTOPO5000.gdbIDW_1 Power (optional): 0.5 Search radius (optional): Variable Search Radius Settings | Number of points: 6 Maximum distance: 500 Input barrier polyline features (optional): TOPO5000.gdbHydrographyWatercourseL Accept the default value of Output cell size (optional). Click on OK. Repeat step 3 by setting the following parameters: Input point features: PointElevation Z value field: Elevation Output raster: ScratchTOPO5000.gdbIDW_2 Power (optional): 2 The rest of the parameters are the same as in step 3. Click on OK. Symbolize the IDW_1 and IDW_2 layers as follows: Show: Classified; Classification: Equal Interval: 10 classes; Color Scheme: Surface. Click on OK. You should obtain the following results: In the following steps, you will use the Spline tool to generate the terrain surface: In ArcToolbox, go to Spatial Analyst Tools | Interpolation, and double-click on the Spline tool. Set the following parameters: Input point features: PointElevation Z value field: Elevation Output raster: ScratchTOPO5000.gdbSpline_Regular Spline type (optional): REGULARIZED Weight (optional): 0.1 and Number of points (optional): 6 Accept the default value of Output cell size (optional). Click on OK. Run again the Spline tool with the following parameters: Input point features: PointElevation Z value field: Elevation Output raster: ScratchTOPO5000.gdbSpline_Tension Spline type (optional): TENSION Weight (optional): 0.1 and Number of points (optional): 6 Accept the default value of Output cell size (optional). Click on OK. Symbolize the Spline_Regular and Spline_Tension raster layers using the Equal Interval method classification with 10 classes and the Surface color ramp: In the next steps, you will use the Spline with Barriers tool to generate a terrain surface using an increased number of sample points. You will transform the ContourLine layer in a point feature class. You will combine those new points with features from the PointElevation layer: In ArcToolbox, go to Data Management Tools | Features, and double-click on the Feature vertices to Points tool. Set the following parameters: Input features: ContourLine Output Feature Class: TOPO5000.gdbRelief ContourLine_FeatureVertices Point type (optional): ALL Click on OK. Inspect the attribute table of the newly created layer. 
In the Catalog window, go to ...TOPO5000.gdbRelief and create a copy of the PointElevation feature class. Rename the new feature class as ContourAndPoint. Right-click on ContourAndPoint and select Load | Load Data. Set the following parameters from the second and fourth panels: Input data: ContourLine_FeatureVertices Target Field: Elevation Matching Source Field: Elevation Accept the default values for the rest of the parameters and click on Finish. In ArcToolbox, go to Spatial Analyst Tools | Interpolation, and double-click on the Spline with Barriers tool. Set the following parameters: Input point features: ContourAndPoint Z value field: Elevation Input barrier features (optional): TOPO5000.gdbHydrographyWatercourseA Output raster: ScratchTOPO5000.gdbSpline_WaterA Smoothing Factor (optional): 0 Accept the default value of Output cell size (optional). Click on OK. You should obtain a similar terrain surface to what's shown here: Explore the results by comparing the similarities or differences of the terrain surface between interpolated raster layers and the ContourLine vector layer. The IDW method works well with a proper density of sample points. Try to create a new surface using the IDW tool and the ContourAndPoint layer as sample points. Save your map as MyInterpolation.mxd and close ArcMap. You can find the final results at <drive>:PacktPublishingDataSpatialAnalyst Interpolation. How it works... The IDW method generated an average surface that will not cross through the known point elevation values and will not estimate the values below the minimum or above the maximum given point values. The IDW tool allows you to define polyline barriers or limits in searching sample points for interpolation. Even if the WatercourseL polyline feature classes do not have elevation values, river features can be used to interrupt the continuity of interpolated surfaces. To obtain fewer averaged estimated values (reduce the IDW smoother effect) you have to: Reduce the sample size to 6 points Choose a variable search radius Increase the power to 2 The Power option defines the influence of sample point values. This value increases with the distance. There is a disadvantage because around a few sample points, there are small areas raised above the surrounding surface or small hollows below the surrounding surface. The Spline method has generated a surface that crosses through all the known point elevation values and estimates the values below the minimum or above the maximum sample point values. Because the density of points is quite low, we reduced the sample size to 6 points and defined a variable search radius of 500 meters in order to reduce the smoothening effect. The Regularized option estimates the hills or depressions that are not cached by the sample point values. The Tension option will force the interpolated values to stay closer to the sample point values. Starting from step 12, we increased the number of sample points in order to better estimate the surface. At step 14, notice that the Spline with Barriers tool allows you to use the polygon feature class as breaks or barriers in searching sample points for interpolation. Summary In this article, we learned about the ArcGIS Spatial Analyst extension and its tools. Resources for Article:   Further resources on this subject: Posting Reviews, Ratings, and Photos [article] Enterprise Geodatabase [article] Adding Graphics to the Map [article]

Customization in Microsoft Dynamics CRM

Packt
30 Dec 2014
24 min read
 In this article by Nicolae Tarla, author of the book Dynamics CRM Application Structure, we looked at the basic structure of Dynamics CRM, the modules comprising the application, and what each of these modules contain. Now, we'll delve deeper into the application and take a look at how we can customize it. In this chapter, we will take a look at the following topics: Solutions and publishers Entity elements Entity types Extending entities Entity forms, quick view, and quick create forms Entity views and charts Entity relationships Messages Business rules We'll be taking a look at how to work with each of the elements comprising the sales, service, and marketing modules. We will go through the customization options and see how we can extend the system to fit new business requirements. (For more resources related to this topic, see here.) Solutions When we are talking about customizations for Microsoft Dynamics CRM, one of the most important concepts is the solution. The solution is a container of all the configurations and customizations. This packaging method allows customizers to track customizations, export and reimport them into other environments, as well as group specific sets of customizations by functionality or project cycle. Managing solutions is an aspect that should not be taken lightly, as down the road, a properly designed solution packaging model can help a lot, or an incorrect one can create difficulties. Using solutions is a best practice. While you can implement customizations without using solutions, these customizations will be merged into the base solutions and "you will not be able to export the customizations separately from the core elements of the platform. For a comprehensive description of solutions, you can refer to the MSDN documentation available at http://msdn.microsoft.com/en-gb/library/gg334576.aspx#BKMK_UnmanagedandManagedSolutions. Types of solutions Within the context of Dynamics CRM, there are two types of solutions that you will commonly use while implementing customizations: Unmanaged solutions Managed solutions Each one of these solution types has its own strengths and properties and are recommended to be used in various circumstances. In order to create and manage solutions as well as perform system customizations, the user account must be configured as a system customizer or system administrator. Unmanaged solutions An unmanaged solution is the default state of a newly created solution. A solution is unmanaged for the period of time while customization work is being performed in the context of the solution. You cannot customize a managed solution. An unmanaged solution can be converted to a managed solution by exporting it as managed. When the work is completed and the unmanaged solution is ready to be distributed, it is recommended that you package it as a managed solution for distribution. A managed solution, if configured as such, prevents further customizations to the solution elements. For this reason, solution vendors package their solutions as managed. In an unmanaged solution, the system customizer can perform various tasks, "which include: Adding and removing components Deleting components that allow deletion Exporting and importing the solution as an unmanaged solution Exporting the solution as a managed solution Changes made to the components in an unmanaged solution are also applied to all the unmanaged solutions that include these components. 
This means that all changes from all unmanaged solutions are also applied to the default solution. Deleting an unmanaged solution results in the removal of the container alone, "while the unmanaged components of the solution remain in the system. Deleting a component in an unmanaged solution results in the deletion of this component from the system. In order to remove a component from an unmanaged solution, the component should be removed from the solution, not deleted. Managed solutions Once work is completed in an unmanaged solution and the solution is ready to be distributed, it can be exported as a managed solution. Packaging a solution as a managed solution presents the following advantages: Solution components cannot be added or removed from a managed solution A managed solution cannot be exported from the environment it was deployed in Deleting a managed solution results in the uninstallation of all the component customizations included with the solution. It also results "in the loss of data associated with the components being deleted. A managed solution cannot be installed in the same organization that contains the unmanaged solution which was used to create it. Within a managed solution, certain components can be configured to allow further customization. Through this mechanism, the managed solution provider can enable future customizations that modify aspects of the solution provided. The guidance provided by Microsoft when working with various solution types states that a solution should be used in an unmanaged state between development and test environments, and it should be exported as a managed solution when it is ready to be deployed to a production environment. Solution properties Besides the solution type, each solution contains a solution publisher. This is a set of properties that allow the solution creators to communicate different information to the solution's users, including ways to contact the publisher for additional support. The solution publisher record will be created in all the organizations where the solution is being deployed. The solution publisher record is also important when releasing an update to an existing solution. Based on this common record and the solution properties, an update solution can be released and deployed on top of an existing solution. Using a published solution also allows us to define a custom prefix for all new custom fields created in the context of the solution. The default format for new custom field names is a new field name. Using a custom publisher, we can change "the "new" prefix to a custom prefix specific to our solution. Solution layering When multiple solutions are deployed in an organization, there are two methods by which the system defines the order in which changes take precedence. These methods are merge and top wins. The user interface elements are merged by default. As such, elements such as the default forms, ribbons, command bars, and site map are merged, and all base elements and new custom elements are rendered. For all other solution components, the top wins approach is taken, where the last solution that makes a customization takes precedence. The top wins approach is also taken into consideration when a subset of customizations is being applied on top of a previously applied customization. The system checks the integrity of all solution exports, imports, and other operations. As such, when exporting a solution, if dependent entities are not included, a warning is presented. 
The customizer has the option to ignore this warning. When importing a solution, if the dependent entities are missing, the import is halted and it fails. Also, deleting a component from a solution is prevented if dependent entities require it to be present. The default solution Dynamics CRM allows you to customize the system without taking advantage of solutions. By default, the system comes with a solution. This is an unmanaged solution, and all system customizations are applied to it by default. The default solution includes all the default components and customizations defined within Microsoft Dynamics CRM. This solution defines the default application behavior. Most of the components in this solution can be further customized. "This solution includes all the out-of-the-box customizations. Also, customizations applied through unmanaged solutions are being merged into the default solution. Entity elements Within a solution, we work with various entities. In Dynamics CRM, there are three main entity types: System entities Business entities Custom entities Each entity is composed of various attributes, while each attribute is defined as a value with a specific data type. We can consider an entity to be a data table. Each "row represents and entity record, while each column represents an entity attribute. As with any table, each attribute has specific properties that define its data type. The system entities in Dynamics CRM are used internally by the application and "are not customizable. Also, they cannot be deleted. As a system customizer or developer, we will work mainly with business management entities and custom entities. Business management entities are the default entities that come with the application. Some are customizable and can "be extended as required. Custom entities are all net new entities that are created "as part of our system customizations. The aspects related to customizing an entity include renaming the entity; modifying, adding, or removing entity attributes; or changing various settings and properties. Let's take a look at all these in detail. Renaming an entity One of the ways to customize an entity is by renaming it. In the general properties "of the entity, the field's display name allows us to change the name of an entity. "The plural name can also be updated accordingly. When renaming an entity, make sure that all the references and messages are updated to reflect the new entity name. Views, charts, messages, business rules, hierarchy settings, and even certain fields can reference the original name, and they should be updated to reflect the new name assigned to the entity. The display name of an entity can be modified for the default value. This is a very common customization. In many instances, we need to modify the default entity name to match the business for which we are customizing the system. For instance, many customers use the term organization instead of account. This is a very easy customization achieved by updating the Display Name and Plural Name fields. While implementing this change, make sure that you also update the entity messages, as a lot of them use the original name of the entity by default.   You can change a message value by double-clicking on the message and entering the new message into the Custom Display String field. Changing entity settings and properties When creating and managing entities in Dynamics CRM, there are generic "entity settings that we have to pay attention to. 
We can easily get to these settings and properties by navigating to Components | Entities within a solution and selecting an entity from the list. We will get an account entity screen similar to "the following screenshot:   The settings are structured in two main tabs, with various categories on each tab. We will take a look at each set of settings and properties individually in the next sections. Entity definition This area of the General tab groups together general properties and settings related to entity naming properties, ownership, and descriptions. Once an entity is created, the Name value remains fixed and cannot be modified. If the internal Name field needs to be changed, a new entity with the new Name field must be created. Areas that display this entity This section sets the visibility of this entity. An entity can be made available in only one module or more standard modules of the application. The account is a good example as it is present in all the three areas of the application. Options for entity The Options for Entity section contains a subset of sections with various settings "and properties to configure the main properties of the entity, such as whether the entity can be customized by adding business process flows, notes and activities, "and auditing as well as other settings. Pay close attention to the settings marked with a plus, as once these settings are enabled, they cannot be disabled. If you are not sure whether you need these features, disable them. The Process section allows you to enable the entity for Business Process Flows. When enabling an entity for Business Process Flows, specific fields to support this functionality are created. For this reason, once an entity is enabled for Business Process Flows, it cannot be disabled at a later time. In the communication and collaboration area, we can enable the use of notes, related activities, and connections as well as enable sending of e-mails and queues on the entity. Enabling these configurations creates the required fields and relationships in the system, and you cannot disable them later. In addition, you can enable the entity for mail merge for use with access teams and also for document management. Enabling an entity for document management allows you to store documents related to the records of this type in SharePoint if the organization is configured to integrate with SharePoint. The data services section allows you to enable the quick create forms for this entity's records as well as to enable or disable duplicate detection and auditing. When you are enabling auditing, auditing must also be enabled at the organization level. Auditing is a two-step process. The next subsections deal with Outlook and mobile access. Here, we can define whether the entity can be accessed from various mobile devices as well as Outlook and whether the access is read-only or read/write on tablets. The last section allows us to define a custom help section for a specific entity. "Custom help must be enabled at the organization level first. Primary field settings The Primary Field settings tab contains the configuration properties for the entity's primary field. Each entity in the Dynamics CRM platform is defined by a primary field. This field can only be a text field, and the size can be customized as needed. The display name can be adjusted as needed. Also, the requirement level can be selected from one of the three values: optional, business-recommended, or business-required. 
When it is marked as business-required, the system will require users to enter a value if they are creating or making changes to an entity record form. The primary fields are also presented for customization in the entity field's listing. Business versus custom entities As mentioned previously, there are two types of customizable entities in Dynamics CRM. They are business entities and custom entities. Business entities are customizable entities that are created by Microsoft and come as part of the default solution package. They are part of the three modules: sales, service, and marketing. Custom entities are all the new entities that are being created as part of the customization and platform extending process. Business entities Business entities are part of the default customization provided with the application by Microsoft. They are either grouped into one of the three modules of functionality or are spread across all three. For example, the account and contact entities are present in all the modules, while the case entity belongs to the service module. "Some other business entities are opportunity, lead, marketing list, and so on. Most of the properties of business entities are customizable in Dynamics CRM. However, there are certain items that are not customizable across these entities. These are, in general, the same type of customizations that are not changeable "when creating a custom entity. For example, the entity internal name (the schema name) cannot be changed once an entity has been created. In addition, the primary field properties cannot be modified once an entity is created. Custom entities All new entities created as part of a customization and implemented in Dynamics CRM are custom entities. When creating a new custom entity, we have the freedom to configure all the settings and properties as needed from the beginning. We can use a naming convention that makes sense to the user and generate all the messages from the beginning, taking advantage of this name. A custom entity can be assigned by default to be displayed in one or more of the three main modules or in the settings and help section. If a new module is created and custom entities need to be part of this new module, we can achieve this by customizing the application navigation, commonly referred to as the application sitemap. While customizing the application navigation might not be such a straightforward process, the tools released to the community are available, which makes this job a lot easier and more visual. The default method to customize the navigation is described in detail in the SDK, and it involves exporting a solution with the navigation sitemap configuration, modifying the XML data, and reimporting the updated solution. Extending entities Irrespective of whether we want to extend a customizable business entity or a custom entity, the process is similar. We extend entities by creating new entity forms, views, charts, relationships, and business rules.   Starting with Dynamics CRM 2015, entities configured for hierarchical relationships now support the creation and visualization of hierarchies through hierarchy settings. We will be taking a look at each of these options in detail in the next sections. Entity forms Entities in Dynamics CRM can be accessed from various parts of the system, and their information can be presented in various formats. This feature contributes to "the 360-degree view of customer data. 
In order to enable this functionality, the entities in Dynamics CRM present a variety of standard forms that are available for customization. These include standard entity forms, quick create forms, and quick view forms. In addition, we can customize mobile forms for mobile devices.
Form types
With the current version of Dynamics CRM 2015, most of the updated entities now have four different form types, as follows:
The main form
The mobile form
The quick create form
The quick view form
Various other forms can be created on an entity, either from scratch or by opening an existing form and saving it with a new name. When complex forms need to be created, it is often much easier to start from an existing entity form rather than recreating everything. We have role-based forms, which change based on the user's security role, and we can also have more than one form available for users to select from. We can control which form is presented to the user based on specific form rules or other business requirements. It is good practice to define a fallback form for each entity and to give all users view permissions to this form. Once more than one main form is created for an entity, you can define the order in which the forms are presented based on permissions. If the user does not have access to any of the higher precedence forms, they will still be able to access the fallback form. Working with contingency forms is quite similar; here, a form is defined to be available to users who cannot access any other forms on an entity. The approach for configuring this is a little different though. You create a form with minimal information displayed on it, assign only the system administrator role to this form, and enable it for fallback. With this, you specify a form that will not be visible to anybody other than the system administrator. In addition, configuring the form in this manner also makes it available to users whose security roles do not have a form specified. With such a configuration, if a user is added to a restrictive group that does not allow them to see most forms, they will still have this one form available.
The main form
The main form is the default form associated with an entity. This form will be available by default when you open a record. There can be more than one main form, and these forms can be configured to be available to various security roles. A role must have at least one form available to it. If more than one form is available for a specific role, the users will be given the option to select the form they want to use to visualize a record. Forms that are available to various roles are called role-based forms. As an example, the human resources role can have a specific view of an account, showing more information than a form available to a sales role. At the time of editing, the main form of an entity will look similar to the following screenshot:
A mobile form
A mobile form is a stripped-down form that is available for mobile devices with small screens. When customizing mobile forms, you should pay attention not only to the fact that a small screen can only render so much before extensive scrolling becomes exhausting, but also to the fact that most mobile devices transfer data wirelessly and, as such, the amount of data should be limited. At the time of editing, the Mobile Entity form looks similar to the Account Mobile form shown in the following screenshot. This is basically just a listing of the fields that are available and the order in which they are presented to the user.
The quick create form
The quick create form, while serving a different purpose than the quick view form, is confined to the same minimalistic approach. Of course, a system customizer is not necessarily limited to a certain amount of data on these forms, but they should be mindful of where these forms are being used and how much screen real estate is dedicated to them. In a quick create form, the minimal amount of data to be added is the required fields. In order to save a new record, all business-required fields must be filled in; as such, they should be added to the quick create form. Quick create forms are created in the same way as any other type of form. In the solution package, navigate to entities, select the entity for which you want to customize an existing quick create form or add a new one, and expand the forms section; you will see all the existing forms for the specific entity. Here, you can select the form you want to modify or click on New to create a new one. Once the form is open for editing, the process of customizing the form is exactly the same for all forms. You can add or remove fields, customize labels, rearrange fields on the form, and so on. In order to remind the customizer that this is a quick create form, a minimalistic three-column grid is provided by default for this type of form in edit mode, as shown in the following screenshot: Pay close attention to the fact that you can add only a limited set of control types to a quick create form. Items such as iframes and sub-grids are not available. That's not to say that the layout cannot be changed; you can be as creative as needed when customizing the quick create form. Once you have created the form, save and publish it. Since we created a relationship between the account and the project earlier, we can add a grid view to the account displaying all the related child projects. Now, navigating to an account, we can quickly add a new child project by going to the project's grid view and clicking on the plus symbol to add a project. This will launch the quick create view of the project we just customized. This is how the project window will look: As you can see in the previous screenshot, the quick create view is displayed as an overlay over the main form. For this reason, the amount of data should be kept to a minimum. This type of form is not meant to replace a full-fledged form but to allow a user to create a new record with minimal input and with no navigation to other records. Another way to access the quick create view for an entity is by clicking on the Create button situated at the top-right corner of most Dynamics CRM pages, right before the field that displays your username. This presents the user with the option to create common out-of-the-box record types available in the system, as seen in the following screenshot: Selecting any one of the Records options presents the quick create view. If you opt to create activities in this way, you will not be presented with a quick create form; rather, you will be taken to the full activity form. Once a record is created through a quick create form, the form closes and a notification is displayed to the user with an option to navigate to the newly created record.
This is how the final window should look:   The quick view form The quick view form is a feature added with Dynamics CRM 2013 that allows system customizers to create a minimalistic view to be presented in a related record form. This form presents a summary of a record in a condensed format that allows you to insert it into a related record's form. The process to use a quick view form comprises the following two steps: Create the quick view form for an entity Add the quick view form to the related record The process of creating a quick view form is similar to the process of creating "any other form. The only requirement here is to keep the amount of information minimal, in order to avoid taking up too much real estate on the related record "form. The following screenshot describes the standard Account quick create form:   A very good example is the quick view form for the account entity. This view is created by default in the system. It only includes the account name, e-mail and "phone information, as well as a grid of recent cases and recent activities. We can use this view in a custom project entity. In the project's main form, add a lookup field to define the account related to the project. In the project's form customization, add a Quick View Form tab from the ribbon, as shown in the following screenshot:   Once you add a Quick View Form tab, you are presented with a Quick View Control Properties window. Here, define the name and label for the control and whether you want the label to be displayed in the form. In addition, on this form, you get to define the rules on what is to be displayed "on the form. In the Data Source section, select Account in the Lookup Field and Related Entity dropdown list and in the Quick View Form dropdown list, select "the account card form. This is the name of the account's quick view form defined "in the system. The following screenshot shows the Data Source configuration and the Selected quick view forms field:   Once complete, save and publish the form. Now, if we navigate to a project record, we can select the related account and the quick view will automatically be displayed on the project form, as shown in the "next screenshot:   The default quick view form created for the account entity is displayed now on the project form with all the specified account-related details. This way any updates to the account are immediately reflected in the project form. Taking this approach, it is now much easier to display all the needed information on the same screen so that the user does not have to navigate away and click through a maze to get to all the data needed. Summary Throughout this chapter, we looked at the main component of the three system modules: an entity. We defined what an entity is and we looked at what an entity is composed of. Then, we looked at each of the components in detail and we discussed ways in which we can customize the entities and extend the system. We investigated ways to visually represent the data related to entities and how to relate entities for data integrity. We also looked at how to enhance entity behavior with business rules and the limitations that the business rules have versus more advanced customizations, using scripts or other developer-specific methods. The next chapter will take you into the business aspect of the Dynamics CRM platform, with an in-depth look at all the available business processes. 
We will revisit business rules, and we will take a look at other ways to enforce business-specific rules and processes using the wizard-driven customizations available with the platform. Resources for Article: Further resources on this subject: Form customizations [article] Introduction to Reporting in Microsoft Dynamics CRM [article] Overview of Microsoft Dynamics CRM 2011 [article]

Middleware

Packt
30 Dec 2014
13 min read
In this article by Mario Casciaro, the author of the book Node.js Design Patterns, we will look at the importance of the middleware pattern. One of the most distinctive patterns in Node.js is definitely middleware. Unfortunately, it's also one of the most confusing for the inexperienced, especially for developers coming from the enterprise programming world. The reason for the disorientation is probably connected with the meaning of the term middleware, which in enterprise architecture jargon represents the various software suites that help to abstract lower-level mechanisms such as OS APIs, network communications, memory management, and so on, allowing the developer to focus only on the business case of the application. In this context, the term middleware recalls topics such as CORBA, Enterprise Service Bus, Spring, and JBoss, but in its more generic meaning it can also define any kind of software layer that acts like glue between lower-level services and the application (literally, the software in the middle). (For more resources related to this topic, see here.)
Middleware in Express
Express (http://expressjs.com) popularized the term middleware in the Node.js world, binding it to a very specific design pattern. In express, in fact, a middleware represents a set of services, typically functions, that are organized in a pipeline and are responsible for processing incoming HTTP requests and the corresponding responses. An express middleware has the following signature:
function(req, res, next) { ... }
Here, req is the incoming HTTP request, res is the response, and next is the callback to be invoked when the current middleware has completed its tasks; invoking it triggers the next middleware in the pipeline. Examples of the tasks carried out by an express middleware are the following:
Parsing the body of the request
Compressing/decompressing requests and responses
Producing access logs
Managing sessions
Providing Cross-site Request Forgery (CSRF) protection
If we think about it, these are all tasks that are not strictly related to the main functionality of an application; rather, they are accessories, components providing support to the rest of the application and allowing the actual request handlers to focus only on their main business logic. Essentially, those tasks are software in the middle.
Middleware as a pattern
The technique used to implement middleware in express is not new; in fact, it can be considered the Node.js incarnation of the Intercepting Filter pattern and the Chain of Responsibility pattern. In more generic terms, it also represents a processing pipeline, which reminds us of streams. Today, in Node.js, the word middleware is used well beyond the boundaries of the express framework, and indicates a particular pattern whereby a set of processing units, filters, and handlers, in the form of functions, are connected to form an asynchronous sequence in order to perform preprocessing and postprocessing of any kind of data. The main advantage of this pattern is flexibility; it allows us to obtain a plugin infrastructure with incredibly little effort, providing an unobtrusive way of extending a system with new filters and handlers. If you want to know more about the Intercepting Filter pattern, the following article is a good starting point: http://www.oracle.com/technetwork/java/interceptingfilter-142169.html.
A nice overview of the Chain of Responsibility pattern is available at this URL: http://java.dzone.com/articles/design-patterns-uncovered-chain-of-responsibility. The following diagram shows the components of the middleware pattern: The essential component of the pattern is the Middleware Manager, which is responsible for organizing and executing the middleware functions. The most important implementation details of the pattern are as follows: New middleware can be registered by invoking the use() function (the name of this function is a common convention in many implementations of this pattern, but we can choose any name). Usually, new middleware can only be appended at the end of the pipeline, but this is not a strict rule. When new data to process is received, the registered middleware is invoked in an asynchronous sequential execution flow. Each unit in the pipeline receives in input the result of the execution of the previous unit. Each middleware can decide to stop further processing of the data by simply not invoking its callback or by passing an error to the callback. An error situation usually triggers the execution of another sequence of middleware that is specifically dedicated to handling errors. There is no strict rule on how the data is processed and propagated in the pipeline. The strategies include: Augmenting the data with additional properties or functions Replacing the data with the result of some kind of processing Maintaining the immutability of the data and always returning fresh copies as result of the processing The right approach that we need to take depends on the way the Middleware Manager is implemented and on the type of processing carried out by the middleware itself. Creating a middleware framework for ØMQ Let's now demonstrate the pattern by building a middleware framework around the ØMQ (http://zeromq.org) messaging library. ØMQ (also known as ZMQ, or ZeroMQ) provides a simple interface for exchanging atomic messages across the network using a variety of protocols; it shines for its performances, and its basic set of abstractions are specifically built to facilitate the implementation of custom messaging architectures. For this reason, ØMQ is often chosen to build complex distributed systems. The interface of ØMQ is pretty low-level, it only allows us to use strings and binary buffers for messages, so any encoding or custom formatting of data has to be implemented by the users of the library. In the next example, we are going to build a middleware infrastructure to abstract the preprocessing and postprocessing of the data passing through a ØMQ socket, so that we can transparently work with JSON objects but also seamlessly compress the messages traveling over the wire. Before continuing with the example, please make sure to install the ØMQ native libraries following the instructions at this URL: http://zeromq.org/intro:get-the-software. Any version in the 4.0 branch should be enough for working on this example. The Middleware Manager The first step to build a middleware infrastructure around ØMQ is to create a component that is responsible for executing the middleware pipeline when a new message is received or sent. 
For the purpose, let's create a new module called zmqMiddlewareManager.js and let's start defining it: function ZmqMiddlewareManager(socket) { this.socket = socket; this.inboundMiddleware = []; //[1] this.outboundMiddleware = []; var self = this; socket.on('message', function(message) { //[2] self.executeMiddleware(self.inboundMiddleware, { data: message }); }); } module.exports = ZmqMiddlewareManager; This first code fragment defines a new constructor for our new component. It accepts a ØMQ socket as an argument and: Creates two empty lists that will contain our middleware functions, one for the inbound messages and another one for the outbound messages. Immediately, it starts listening for the new messages coming from the socket by attaching a new listener to the message event. In the listener, we process the inbound message by executing the inboundMiddleware pipeline. The next method of the ZmqMiddlewareManager prototype is responsible for executing the middleware when a new message is sent through the socket: ZmqMiddlewareManager.prototype.send = function(data) { var self = this; var message = { data: data}; self.executeMiddleware(self.outboundMiddleware, message,    function() {    self.socket.send(message.data);    } ); } This time the message is processed using the filters in the outboundMiddleware list and then passed to socket.send() for the actual network transmission. Now, we need a small method to append new middleware functions to our pipelines; we already mentioned that such a method is conventionally called use(): ZmqMiddlewareManager.prototype.use = function(middleware) { if(middleware.inbound) {    this.inboundMiddleware.push(middleware.inbound); }if(middleware.outbound) {    this.outboundMiddleware.unshift(middleware.outbound); } } Each middleware comes in pairs; in our implementation it's an object that contains two properties, inbound and outbound, that contain the middleware functions to be added to the respective list. It's important to observe here that the inbound middleware is pushed to the end of the inboundMiddleware list, while the outbound middleware is inserted at the beginning of the outboundMiddleware list. This is because complementary inbound/outbound middleware functions usually need to be executed in an inverted order. For example, if we want to decompress and then deserialize an inbound message using JSON, it means that for the outbound, we should instead first serialize and then compress. It's important to understand that this convention for organizing the middleware in pairs is not strictly part of the general pattern, but only an implementation detail of our specific example. Now, it's time to define the core of our component, the function that is responsible for executing the middleware: ZmqMiddlewareManager.prototype.executeMiddleware = function(middleware, arg, finish) {var self = this;(    function iterator(index) {      if(index === middleware.length) {        return finish && finish();      }      middleware[index].call(self, arg, function(err) { if(err) {        console.log('There was an error: ' + err.message);      }      iterator(++index);    }); })(0); } The preceding code should look very familiar; in fact, it is a simple implementation of the asynchronous sequential iteration pattern. 
Each function in the middleware array received in input is executed one after the other, and the same arg object is provided as an argument to each middleware function; this is the trickthat makes it possible to propagate the data from one middleware to the next. At the end of the iteration, the finish() callback is invoked. Please note that for brevity we are not supporting an error middleware pipeline. Normally, when a middleware function propagates an error, another set of middleware specifically dedicated to handling errors is executed. This can be easily implemented using the same technique that we are demonstrating here. A middleware to support JSON messages Now that we have implemented our Middleware Manager, we can create a pair of middleware functions to demonstrate how to process inbound and outbound messages. As we said, one of the goals of our middleware infrastructure is having a filter that serializes and deserializes JSON messages, so let's create a new middleware to take care of this. In a new module called middleware.js; let's include the following code: module.exports.json = function() { return {    inbound: function(message, next) {      message.data = JSON.parse(message.data.toString());      next();    },    outbound: function(message, next) {      message.data = new Buffer(JSON.stringify(message.data));      next();    } } } The json middleware that we just created is very simple: The inbound middleware deserializes the message received as an input and assigns the result back to the data property of message, so that it can be further processed along the pipeline The outbound middleware serializes any data found into message.data Design Patterns Please note how the middleware supported by our framework is quite different from the one used in express; this is totally normal and a perfect demonstration of how we can adapt this pattern to fit our specific need. Using the ØMQ middleware framework We are now ready to use the middleware infrastructure that we just created. To do that, we are going to build a very simple application, with a client sending a ping to a server at regular intervals and the server echoing back the message received. From an implementation perspective, we are going to rely on a request/reply messaging pattern using the req/rep socket pair provided by ØMQ (http://zguide. zeromq.org/page:all#Ask-and-Ye-Shall-Receive). We will then wrap the socketswith our zmqMiddlewareManager to get all the advantages from the middleware infrastructure that we built, including the middleware for serializing/deserializing JSON messages. The server Let's start by creating the server side (server.js). In the first part of the module we initialize our components: var zmq = require('zmq'); var ZmqMiddlewareManager = require('./zmqMiddlewareManager'); var middleware = require('./middleware'); var reply = zmq.socket('rep'); reply.bind('tcp://127.0.0.1:5000'); In the preceding code, we loaded the required dependencies and bind a ØMQ 'rep' (reply) socket to a local port. Next, we initialize our middleware: var zmqm = new ZmqMiddlewareManager(reply); zmqm.use(middleware.zlib()); zmqm.use(middleware.json()); We created a new ZmqMiddlewareManager object and then added two middlewares, one for compressing/decompressing the messages and another one for parsing/ serializing JSON messages. For brevity, we did not show the implementation of the zlib middleware. 
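The zlib middleware referenced above is not listed in this article. The following is a minimal sketch of how it could be added to middleware.js, mirroring the structure of the json middleware and using Node's built-in zlib module; the option-free deflate/inflate calls and the error handling are assumptions, not the book's actual implementation:

var zlib = require('zlib');

module.exports.zlib = function() {
  return {
    inbound: function(message, next) {
      // Decompress the raw buffer received from the socket before the
      // json middleware tries to parse it.
      zlib.inflate(message.data, function(err, decompressed) {
        if (err) return next(err); // the simple manager above just logs errors
        message.data = decompressed;
        next();
      });
    },
    outbound: function(message, next) {
      // Compress the buffer produced by the json middleware before it is
      // written to the socket.
      zlib.deflate(message.data, function(err, compressed) {
        if (err) return next(err);
        message.data = compressed;
        next();
      });
    }
  };
};

Because use() pushes inbound filters to the end of the list and unshifts outbound filters to the front, registering zlib before json gives the desired order: decompress then parse on the way in, serialize then compress on the way out.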
Now we are ready to handle a request coming from the client, we will do this by simply adding another middleware, this time using it as a request handler: zmqm.use({ inbound: function(message, next) { console.log('Received: ',    message.data); if(message.data.action === 'ping') {     this.send({action: 'pong', echo: message.data.echo});  }    next(); } }); Since this last middleware is defined after the zlib and json middlewares, we can transparently use the decompressed and deserialized message that is available in the message.data variable. On the other hand, any data passed to send() will be processed by the outbound middleware, which in our case will serialize then compress the data. The client On the client side of our little application, client.js, we will first have to initiate a new ØMQ req (request) socket connected to the port 5000, the one used by our server: var zmq = require('zmq'); var ZmqMiddlewareManager = require('./zmqMiddlewareManager'); var middleware = require('./middleware'); var request = zmq.socket('req'); request.connect('tcp://127.0.0.1:5000'); Then, we need to set up our middleware framework in the same way that we did for the server: var zmqm = new ZmqMiddlewareManager(request); zmqm.use(middleware.zlib()); zmqm.use(middleware.json()); Next, we create an inbound middleware to handle the responses coming from the server: zmqm.use({ inbound: function(message, next) {    console.log('Echoed back: ', message.data);    next(); } }); In the preceding code, we simply intercept any inbound response and print it to the console. Finally, we set up a timer to send some ping requests at regular intervals, always using the zmqMiddlewareManager to get all the advantages of our middleware: setInterval(function() { zmqm.send({action: 'ping', echo: Date.now()}); }, 1000); We can now try our application by first starting the server: node server We can then start the client with the following command: node client At this point, we should see the client sending messages and the server echoing them back. Our middleware framework did its job; it allowed us to decompress/compress and deserialize/serialize our messages transparently, leaving the handlers free to focus on their business logic! Summary In this article, we learned about the middleware pattern and the various facets of the pattern, and we also saw how to create a middleware framework and how to use. Resources for Article:  Further resources on this subject: Selecting and initializing the database [article] Exploring streams [article] So, what is Node.js? [article]

How Vector Features are Displayed

Packt
30 Dec 2014
23 min read
In this article by Erik Westra, author of the book Building Mapping Applications with QGIS, we will learn how QGIS symbols and renderers are used to control how vector features are displayed on a map. In addition to this, we will also learn saw how symbol layers work. The features within a vector map layer are displayed using a combination of renderer and symbol objects. The renderer chooses which symbol is to be used for a given feature, and the symbol does the actual drawing. There are three basic types of symbols defined by QGIS: Marker symbol: This displays a point as a filled circle Line symbol: This draws a line using a given line width and color Fill symbol: This draws the interior of a polygon with a given color These three types of symbols are implemented as subclasses of the qgis.core.QgsSymbolV2 class: qgis.core.QgsMarkerSymbolV2 qgis.core.QgsLineSymbolV2 qgis.core.QgsFillSymbolV2 You might be wondering why all these classes have "V2" in their name. This is a historical quirk of QGIS. Earlier versions of QGIS supported both an "old" and a "new" system of rendering, and the "V2" naming refers to the new rendering system. The old rendering system no longer exists, but the "V2" naming continues to maintain backward compatibility with existing code. Internally, symbols are rather complex, using "symbol layers" to draw multiple elements on top of each other. In most cases, however, you can make use of the "simple" version of the symbol. This makes it easier to create a new symbol without having to deal with the internal complexity of symbol layers. For example: symbol = QgsMarkerSymbolV2.createSimple({'width' : 1.0,                                        'color' : "255,0,0"}) While symbols draw the features onto the map, a renderer is used to choose which symbol to use to draw a particular feature. In the simplest case, the same symbol is used for every feature within a layer. This is called a single symbol renderer, and is represented by the qgis.core.QgsSingleSymbolRenderV2class. Other possibilities include: Categorized symbol renderer (qgis.core.QgsCategorizedSymbolRendererV2): This renderer chooses a symbol based on the value of an attribute. The categorized symbol renderer has a mapping from attribute values to symbols. Graduated symbol renderer (qgis.core.QgsGraduatedSymbolRendererV2): This type of renderer has a series of ranges of attribute values, and maps each range to an appropriate symbol. Using a single symbol renderer is very straightforward: symbol = ... renderer = QgsSingleSymbolRendererV2(symbol) layer.setRendererV2(renderer) To use a categorized symbol renderer, you first define a list of qgis.core.QgsRendererCategoryV2 objects, and then use that to create the renderer. For example: symbol_male = ... symbol_female = ...   categories = [] categories.append(QgsRendererCategoryV2("M", symbol_male, "Male")) categories.append(QgsRendererCategoryV2("F", symbol_female,                                        "Female"))   renderer = QgsCategorizedSymbolRendererV2("", categories) renderer.setClassAttribute("GENDER") layer.setRendererV2(renderer) Notice that the QgsRendererCategoryV2 constructor takes three parameters: the desired value, the symbol to use, and the label used to describe that category. Finally, to use a graduated symbol renderer, you define a list of qgis.core.QgsRendererRangeV2 objects and then use that to create your renderer. For example: symbol1 = ... symbol2 = ...   
ranges = [] ranges.append(QgsRendererRangeV2(0, 10, symbol1, "Range 1")) ranges.append(QgsRendererRange(11, 20, symbol2, "Range 2"))   renderer = QgsGraduatedSymbolRendererV2("", ranges) renderer.setClassAttribute("FIELD") layer.setRendererV2(renderer) Working with symbol layers Internally, symbols consist of one or more symbol layers that are displayed one on top of the other to draw the vector feature: The symbol layers are drawn in the order in which they are added to the symbol. So, in this example, Symbol Layer 1 will be drawn before Symbol Layer 2. This has the effect of drawing the second symbol layer on top of the first. Make sure you get the order of your symbol layers correct, or you may find a symbol layer completely obscured by another layer. While the symbols we have been working with so far have had only one layer, there are some clever tricks you can perform with multilayer symbols. When you create a symbol, it will automatically be initialized with a default symbol layer. For example, a line symbol (an instance of QgsLineSymbolV2) will be created with a single layer of type QgsSimpleLineSymbolLayerV2. This layer is used to draw the line feature onto the map. To work with symbol layers, you need to remove this default layer and replace it with your own symbol layer or layers. For example: symbol = QgsSymbolV2.defaultSymbol(layer.geometryType()) symbol.deleteSymbolLayer(0) # Remove default symbol layer.   symbol_layer_1 = QgsSimpleFillSymbolLayerV2() symbol_layer_1.setFillColor(QColor("yellow"))   symbol_layer_2 = QgsLinePatternFillSymbolLayer() symbol_layer_2.setLineAngle(30) symbol_layer_2.setDistance(2.0) symbol_layer_2.setLineWidth(0.5) symbol_layer_2.setColor(QColor("green"))   symbol.appendSymbolLayer(symbol_layer_1) symbol.appendSymbolLayer(symbol_layer_2) The following methods can be used to manipulate the layers within a symbol: symbol.symbolLayerCount(): This returns the number of symbol layers within this symbol symbol.symbolLayer(index): This returns the given symbol layer within the symbol. Note that the first symbol layer has an index of zero. symbol.changeSymbolLayer(index, symbol_layer): This replaces a given symbol layer within the symbol symbol.appendSymbolLayer(symbol_layer): This appends a new symbol layer to the symbol symbol.insertSymbolLayer(index, symbol_layer): This inserts a symbol layer at a given index symbol.deleteSymbolLayer(index): This removes the symbol layer at the given index Remember that to use the symbol once you've created it, you create an appropriate renderer and then assign that renderer to your map layer. For example: renderer = QgsSingleSymbolRendererV2(symbol) layer.setRendererV2(renderer) The following symbol layer classes are available for you to use: PyQGIS class Description Example QgsSimpleMarkerSymbolLayerV2 This displays a point geometry as a small colored circle.   QgsEllipseSymbolLayerV2 This displays a point geometry as an ellipse.   QgsFontMarkerSymbolLayerV2 This displays a point geometry as a single character. You can choose the font and character to be displayed.   QgsSvgMarkerSymbolLayerV2 This displays a point geometry using a single SVG format image.   QgsVectorFieldSymbolLayer This displays a point geometry by drawing a displacement line. One end of the line is the coordinate of the point, while the other end is calculated using attributes of the feature.   QgsSimpleLineSymbolLayerV2 This displays a line geometry or the outline of a polygon geometry using a line of a given color, width, and style.   
QgsMarkerLineSymbolLayerV2 This displays a line geometry or the outline of a polygon geometry by repeatedly drawing a marker symbol along the length of the line.   QgsSimpleFillSymbolLayerV2 This displays a polygon geometry by filling the interior with a given solid color and then drawing a line around the perimeter.   QgsGradientFillSymbolLayerV2 This fills the interior of a polygon geometry using a color or grayscale gradient.   QgsCentroidFillSymbolLayerV2 This draws a simple dot at the centroid of a polygon geometry.   QgsLinePatternFillSymbolLayer This draws the interior of a polygon geometry using a repeated line. You can choose the angle, width, and color to use for the line.   QgsPointPatternFillSymbolLayer This draws the interior of a polygon geometry using a repeated point.   QgsSVGFillSymbolLayer This draws the interior of a polygon geometry using a repeated SVG format image.   These predefined symbol layers, either individually or in various combinations, give you enormous flexibility in how features are to be displayed. However, if these aren't enough for you, you can also implement your own symbol layers using Python. We will look at how this can be done later in this article. Combining symbol layers By combining symbol layers, you can achieve a range of complex visual effects. For example, you could combine an instance of QgsSimpleMarkerSymbolLayerV2 with a QgsVectorFieldSymbolLayer to display a point geometry using two symbols at once: One of the main uses of symbol layers is to draw different LineString or PolyLine symbols to represent different types of roads. For example, you can draw a complex road symbol by combining multiple symbol layers, like this: This effect is achieved using three separate symbol layers: Here is the Python code used to generate the above map symbol: symbol = QgsLineSymbolV2.createSimple({}) symbol.deleteSymbolLayer(0) # Remove default symbol layer.   symbol_layer = QgsSimpleLineSymbolLayerV2() symbol_layer.setWidth(4) symbol_layer.setColor(QColor("light gray")) symbol_layer.setPenCapStyle(Qt.FlatCap) symbol.appendSymbolLayer(symbol_layer)   symbol_layer = QgsSimpleLineSymbolLayerV2() symbol_layer.setColor(QColor("black")) symbol_layer.setWidth(2) symbol_layer.setPenCapStyle(Qt.FlatCap) symbol.appendSymbolLayer(symbol_layer)   symbol_layer = QgsSimpleLineSymbolLayerV2() symbol_layer.setWidth(1) symbol_layer.setColor(QColor("white")) symbol_layer.setPenStyle(Qt.DotLine) symbol.appendSymbolLayer(symbol_layer) As you can see, you can set the line width, color, and style to create whatever effect you want. As always, you have to define the layers in the correct order, with the back-most symbol layer defined first. By combining line symbol layers in this way, you can create almost any type of road symbol that you want. You can also use symbol layers when displaying polygon geometries. For example, you can draw QgsPointPatternFillSymbolLayer on top of QgsSimpleFillSymbolLayerV2 to have repeated points on top of a simple filled polygon, like this: Finally, you can make use of transparency to allow the various symbol layers (or entire symbols) to blend into each other. For example, you can create a pinstripe effect by combining two symbol layers, like this: symbol = QgsFillSymbolV2.createSimple({}) symbol.deleteSymbolLayer(0) # Remove default symbol layer.   
symbol_layer = QgsGradientFillSymbolLayerV2() symbol_layer.setColor2(QColor("dark gray")) symbol_layer.setColor(QColor("white")) symbol.appendSymbolLayer(symbol_layer)   symbol_layer = QgsLinePatternFillSymbolLayer() symbol_layer.setColor(QColor(0, 0, 0, 20)) symbol_layer.setLineWidth(2) symbol_layer.setDistance(4) symbol_layer.setLineAngle(70) symbol.appendSymbolLayer(symbol_layer) The result is quite subtle and visually pleasing: In addition to changing the transparency for a symbol layer, you can also change the transparency for the symbol as a whole. This is done by using the setAlpha() method, like this: symbol.setAlpha(0.3) The result looks like this: Note that setAlpha() takes a floating point number between 0.0 and 1.0, while the transparency of a QColor object, like the ones we used earlier, is specified using an alpha value between 0 and 255. Implementing symbol layers in Python If the built-in symbol layers aren't flexible enough for your needs, you can implement your own symbol layers using Python. To do this, you create a subclass of the appropriate type of symbol layer (QgsMarkerSymbolLayerV2, QgsLineSymbolV2, or QgsFillSymbolV2) and implement the various drawing methods yourself. For example, here is a simple marker symbol layer that draws a cross for a Point geometry: class CrossSymbolLayer(QgsMarkerSymbolLayerV2):    def __init__(self, length=10.0, width=2.0):        QgsMarkerSymbolLayerV2.__init__(self)        self.length = length        self.width = width   def layerType(self):        return "Cross"   def properties(self):        return {'length' : self.length,               'width' : self.width}      def clone(self): return CrossSymbolLayer(self.length, self.width)      def startRender(self, context):        self.pen = QPen()        self.pen.setColor(self.color()) self.pen.setWidth(self.width)      def stopRender(self, context): self.pen = None   def renderPoint(self, point, context):        left = point.x() - self.length        right = point.x() + self.length        bottom = point.y() - self.length        top = point.y() + self.length          painter = context.renderContext().painter()        painter.setPen(self.pen)        painter.drawLine(left, bottom, right, top)        painter.drawLine(right, bottom, left, top) Using this custom symbol layer in your code is straightforward: symbol = QgsMarkerSymbolV2.createSimple({}) symbol.deleteSymbolLayer(0)   symbol_layer = CrossSymbolLayer() symbol_layer.setColor(QColor("gray"))   symbol.appendSymbolLayer(symbol_layer) Running this code will draw a cross at the location of each point geometry, as follows: Of course, this is a simple example, but it shows you how to use custom symbol layers implemented in Python. Let's now take a closer look at the implementation of the CrossSymbolLayer class, and see what each method does: __init__(): Notice how the __init__ method accepts parameters that customize the way the symbol layer works. These parameters, which should always have default values assigned to them, are the properties associated with the symbol layer. If you want to make your custom symbol available within the QGIS Layer Properties window, you will need to register your custom symbol layer and tell QGIS how to edit the symbol layer's properties. We will look at this shortly. layerType(): This method returns a unique name for your symbol layer. properties(): This should return a dictionary that contains the various properties used by this symbol layer. 
The properties returned by this method will be stored in the QGIS project file, and used later to restore the symbol layer. clone(): This method should return a copy of the symbol layer. Since we have defined our properties as parameters to the __init__ method, implementing this method simply involves creating a new instance of the class and copying the properties from the current symbol layer to the new instance. startRender(): This method is called before the first feature in the map layer is rendered. This can be used to define any objects that will be required to draw the feature. Rather than creating these objects each time, it is more efficient (and therefore faster) to create them only once to render all the features. In this example, we create the QPen object that we will use to draw the Point geometries. stopRender(): This method is called after the last feature has been rendered. This can be used to release the objects created by the startRender() method. renderPoint(): This is where all the work is done for drawing point geometries. As you can see, this method takes two parameters: the point at which to draw the symbol, and the rendering context (an instance of QgsSymbolV2RenderContext) to use for drawing the symbol. The rendering context provides various methods for accessing the feature being displayed, as well as information about the rendering operation, the current scale factor, etc. Most importantly, it allows you to access the PyQt QPainter object needed to actually draw the symbol onto the screen. The renderPoint() method is only used for symbol layers that draw point geometries. For line geometries, you should implement the renderPolyline() method, which has the following signature: def renderPolyline(self, points, context): The points parameter will be a QPolygonF object containing the various points that make up the LineString, and context will be the rendering context to use for drawing the geometry. If your symbol layer is intended to work with polygons, you should implement the renderPolygon() method, which looks like this: def renderPolygon(self, outline, rings, context): Here, outline is a QPolygonF object that contains the points that make up the exterior of the polygon, and rings is a list of QPolygonF objects that define the interior rings or "holes" within the polygon. As always, context is the rendering context to use when drawing the geometry. A custom symbol layer created in this way will work fine if you just want to use it within your own external PyQGIS application. However, if you want to use a custom symbol layer within a running copy of QGIS, and in particular, if you want to allow end users to work with the symbol layer using the Layer Properties window, there are some extra steps you will have to take, which are as follows: If you want the symbol to be visually highlighted when the user clicks on it, you will need to change your symbol layer's renderXXX() method to see if the feature being drawn has been selected by the user, and if so, change the way it is drawn. The easiest way to do this is to change the geometry's color. For example: if context.selected():    color = context.selectionColor() else:    color = self.color To allow the user to edit the symbol layer's properties, you should create a subclass of QgsSymbolLayerV2Widget, which defines the user interface to edit the properties. 
For example, a simple widget for the purpose of editing the length and width of a CrossSymbolLayer can be defined as follows: class CrossSymbolLayerWidget(QgsSymbolLayerV2Widget):    def __init__(self, parent=None):        QgsSymbolLayerV2Widget.__init__(self, parent)        self.layer = None          self.lengthField = QSpinBox(self)        self.lengthField.setMinimum(1)        self.lengthField.setMaximum(100)        self.connect(self.lengthField,                      SIGNAL("valueChanged(int)"),                      self.lengthChanged)          self.widthField = QSpinBox(self)        self.widthField.setMinimum(1)        self.widthField.setMaximum(100)        self.connect(self.widthField,                      SIGNAL("valueChanged(int)"),                      self.widthChanged)          self.form = QFormLayout()        self.form.addRow('Length', self.lengthField)        self.form.addRow('Width', self.widthField)          self.setLayout(self.form)      def setSymbolLayer(self, layer):        if layer.layerType() == "Cross":            self.layer = layer            self.lengthField.setValue(layer.length)            self.widthField.setValue(layer.width)      def symbolLayer(self):        return self.layer      def lengthChanged(self, n):        self.layer.length = n        self.emit(SIGNAL("changed()"))      def widthChanged(self, n):        self.layer.width = n        self.emit(SIGNAL("changed()")) We define the contents of our widget using the standard __init__() initializer. As you can see, we define two fields, lengthField and widthField, which let the user change the length and width properties respectively, for our symbol layer. The setSymbolLayer() method tells the widget which QgsSymbolLayerV2 object to use, while the symbolLayer() method returns the QgsSymbolLayerV2 object this widget is editing. Finally, the two XXXChanged() methods are called when the user changes the value of the fields, allowing us to update the symbol layer's properties to match the value set by the user. Finally, you will need to register your symbol layer. To do this, you create a subclass of QgsSymbolLayerV2AbstractMetadata and pass it to the QgsSymbolLayerV2Registry object's addSymbolLayerType() method. Here is an example implementation of the metadata for our CrossSymbolLayer class, along with the code to register it within QGIS: class CrossSymbolLayerMetadata(QgsSymbolLayerV2AbstractMetadata):    def __init__(self):        QgsSymbolLayerV2AbstractMetadata.__init__(self, "Cross", "Cross marker", QgsSymbolV2.Marker)      def createSymbolLayer(self, properties):        if "length" in properties:            length = int(properties['length'])        else:            length = 10        if "width" in properties:            width = int(properties['width'])        else:            width = 2        return CrossSymbolLayer(length, width)      def createSymbolLayerWidget(self, layer):        return CrossSymbolLayerWidget()   registry = QgsSymbolLayerV2Registry.instance() registry.addSymbolLayerType(CrossSymbolLayerMetadata()) Note that the parameters of QgsSymbolLayerV2AbstractMetadata.__init__() are as follows: The unique name for the symbol layer, which must match the name returned by the symbol layer's layerType() method. A display name for this symbol layer, as shown to the user within the Layer Properties window. The type of symbol that this symbol layer will be used for. 
The createSymbolLayer() method is used to restore the symbol layer based on the properties stored in the QGIS project file when the project was saved. The createSymbolLayerWidget() method is called to create the user interface widget that lets the user view and edit the symbol layer's properties. Implementing renderers in Python If you need to choose symbols based on more complicated criteria than what the built-in renderers will provide, you can write your own custom QgsFeatureRendererV2 subclass using Python. For example, the following Python code implements a simple renderer that alternates between odd and even symbols as point features are displayed: class OddEvenRenderer(QgsFeatureRendererV2):    def __init__(self): QgsFeatureRendererV2.__init__(self, "OddEvenRenderer")        self.evenSymbol = QgsMarkerSymbolV2.createSimple({})        self.evenSymbol.setColor(QColor("light gray"))        self.oddSymbol = QgsMarkerSymbolV2.createSimple({})        self.oddSymbol.setColor(QColor("black"))        self.n = 0      def clone(self):        return OddEvenRenderer()      def symbolForFeature(self, feature):        self.n = self.n + 1        if self.n % 2 == 0:            return self.evenSymbol        else:            return self.oddSymbol      def startRender(self, context, layer):        self.n = 0        self.oddSymbol.startRender(context)        self.evenSymbol.startRender(context)      def stopRender(self, context):        self.oddSymbol.stopRender(context)        self.evenSymbol.stopRender(context)      def usedAttributes(self):        return [] Using this renderer will cause the various point geometries to be displayed in alternating colors, for example: Let's take a closer look at how this class was implemented, and what the various methods do: __init__(): This is your standard Python initializer. Notice how we have to provide a unique name for the renderer when calling the QgsFeatureRendererV2.__init__() method; this is used to keep track of the various renderers within QGIS itself. clone(): This creates a copy of this renderer. If your renderer uses properties to control how it works, this method should copy those properties into the new renderer object. symbolForFeature(): This returns the symbol to use for drawing the given feature. startRender(): This prepares to start rendering the features within the map layer. As the renderer can make use of multiple symbols, you need to implement this so that your symbols are also given a chance to prepare for rendering. stopRender(): This finishes rendering the features. Once again, you need to implement this so that your symbols can have a chance to clean up once the rendering process has finished. usedAttributes():This method should be implemented to return the list of attributes that the renderer requires if your renderer makes use of feature attributes to choose between the various symbols,. If you wish, you can also implement your own widget that lets the user change the way the renderer works. This is done by subclassing QgsRendererV2Widget and setting up the widget to edit the renderer's various properties in the same way that we implemented a subclass of QgsSymbolLayerV2Widget to edit the properties for a symbol layer. You will also need to provide metadata about your new renderer (by subclassing QgsRendererV2AbstractMetadata) and use the QgsRendererV2Registry object to register your new renderer. 
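The renderer registration code itself is not shown in this article. As a rough sketch, assuming the same pattern used above for the symbol layer metadata and that the PyQGIS 2.x registry exposes an addRenderer() method, it could look like the following; check the exact signatures against the PyQGIS API documentation for your QGIS version:

class OddEvenRendererMetadata(QgsRendererV2AbstractMetadata):
    def __init__(self):
        # Unique renderer name and the display name shown to the user.
        QgsRendererV2AbstractMetadata.__init__(self, "OddEvenRenderer",
                                               "Odd/even renderer")

    def createRenderer(self, element):
        # Called when QGIS needs to recreate the renderer, for example when a
        # saved project is loaded. The element argument (assumed here to be the
        # renderer's saved XML element) is ignored because our renderer has no
        # properties to restore.
        return OddEvenRenderer()

    def createRendererWidget(self, layer, style, renderer):
        # Return a QgsRendererV2Widget subclass here to let the user edit the
        # renderer's properties; returning None means no configuration UI.
        return None

registry = QgsRendererV2Registry.instance()
registry.addRenderer(OddEvenRendererMetadata())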
If you do this, the user will be able to select your custom renderer for new map layers, and change the way your renderer works by editing the renderer's properties. Summary In this article, we learned how QGIS symbols and renderers are used to control how vector features are displayed on a map. We saw that there are three standard types of symbols: marker symbols for drawing points, line symbols for drawing lines, and fill symbols for drawing the interior of a polygon. We then learned how to instantiate a "simple" version of each of these symbols for use in your programs. We next looked at the built-in renderers, and how these can be used to choose the same symbol for every feature (using the QgsSingleSymbolRenderV2 class), to select a symbol based on the exact value of an attribute (using QgsCategorizedSymbolRendererV2), and to choose a symbol based on a range of attribute values (using the QgsGraduatedSymbolRendererV2 class). We then saw how symbol layers work, and how to manipulate the layers within a symbol. We looked at all the different types of symbol layers built into QGIS, and learned how they can be combined to produce sophisticated visual effects. Finally, we saw how to implement our own symbol layers using Python, and how to write your own renderer from scratch if one of the existing renderer classes doesn't meet your needs. Using these various PyQGIS classes, you have an extremely powerful set of tools at your disposal for displaying vector data within a map. While simple visual effects can be achieved with a minimum of fuss, you can produce practically any visual effect you want using an appropriate combination of built-in or custom-written QGIS symbols and renderers. Resources for Article: Further resources on this subject: Combining Vector and Raster Datasets [article] QGIS Feature Selection Tools [article] Creating a Map [article]