Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7018 Articles
article-image-machine-learning-practice
Packt
18 Feb 2016
13 min read
Save for later

Machine learning in practice

Packt
18 Feb 2016
13 min read
In this article, we will learn how we can implement machine learning in practice. To apply the learning process to real-world tasks, we'll use a five-step process. Regardless of the task at hand, any machine learning algorithm can be deployed by following this plan: Data collection: The data collection step involves gathering the learning material an algorithm will use to generate actionable knowledge. In most cases, the data will need to be combined into a single source like a text file, spreadsheet, or database. Data exploration and and preparation: The quality of any machine learning project is based largely on the quality of its input data. Thus, it is important to learn more about the data and its nuances during a practice called data exploration. Additional work is required to prepare the data for the learning process. This involves fixing or cleaning so-called "messy" data, eliminating unnecessary data, and recoding the data to conform to the learner's expected inputs. Model training: By the time the data has been prepared for analysis, you are likely to have a sense of what you are capable of learning from the data. The specific machine learning task chosen will inform the selection of an appropriate algorithm and the algorithm will represent the data in the form of a model. Model evaluation: Because each machine learning model results in a biased solution to the learning problem, it is important to evaluate how well the algorithm learns from its experience. Depending on the type of model used, you might be able to evaluate the accuracy of the model using a test dataset or you may need to develop measures of performance specific to the intended application. Model improvement: If better performance is needed, it becomes necessary to utilize more advanced strategies to augment the performance of the model. Sometimes, it may be necessary to switch to a different type of model altogether. You may need to supplement your data with additional data or perform additional preparatory work as in step two of this process. (For more resources related to this topic, see here.) After these steps are completed, if the model appears to be performing well, it can be deployed for its intended task. As the case may be, you might utilize your model to provide score data for predictions (possibly in real time), for projections of financial data, to generate useful insight for marketing or research, or to automate tasks such as mail delivery or flying aircraft. The successes and failures of the deployed model might even provide additional data to train your next generation learner. Types of input data The practice of machine learning involves matching the characteristics of input data to the biases of the available approaches. Thus, before applying machine learning to real-world problems, it is important to understand the terminology that distinguishes among input datasets. The phrase unit of observation is used to describe the smallest entity with measured properties of interest for a study. Commonly, the unit of observation is in the form of persons, objects or things, transactions, time points, geographic regions, or measurements. Sometimes, units of observation are combined to form units such as person-years, which denote cases where the same person is tracked over multiple years; each person-year comprises of a person's data for one year. The unit of observation is related but not identical to the unit of analysis, which is the smallest unit from which the inference is made. Although it is often the case, the observed and analyzed units are not always the same. For example, data observed from people might be used to analyze trends across countries. Datasets that store the units of observation and their properties can be imagined as collections of data consisting of: Examples: Instances of the unit of observation for which properties have been recorded Features: Recorded properties or attributes of examples that may be useful for learning It is the easiest to understand features and examples through real-world cases. To build a learning algorithm to identify spam e-mail, the unit of observation could be e-mail messages, examples would be specific messages, and its features might consist of the words used in the messages. For a cancer detection algorithm, the unit of observation could be patients, the examples might include a random sample of cancer patients, and its features may be the genomic markers from biopsied cells as well as the characteristics of patient such as weight, height, or blood pressure. While examples and features do not have to be collected in any specific form, they are commonly gathered in the matrix format, which means that each example has exactly the same features. The following spreadsheet shows a dataset in the matrix format. In the matrix data, each row in the spreadsheet is an example and each column is a feature. Here, the rows indicate examples of automobiles, while the columns record various each automobile's feature, such as price, mileage, color, and transmission type. Matrix format data is by far the most common form used in machine learning, though other forms are used occasionally in specialized cases: Features also come in various forms. If a feature represents a characteristic measured in numbers, it is unsurprisingly called numeric. Alternatively, if a feature is an attribute that consists of a set of categories, the feature is called categorical or nominal. A special case of categorical variables is called ordinal, which designates a nominal variable to the categories falling in an ordered list. Some examples of ordinal variables include clothing sizes such as small, medium, and large, or a measurement of customer satisfaction on a scale from "not at all happy" to "very happy." It is important to consider what the features represent, as the type and number of features in your dataset will assist in determining an appropriate machine learning algorithm for your task. Types of machine learning algorithms Machine learning algorithms are divided into categories according to their purpose. Understanding the categories of learning algorithms is an essential first step towards using data to drive the desired action. A predictive model is used for tasks that involve, as the name implies, the prediction of one value using other values in the dataset. The learning algorithm attempts to discover and model the relationship between the target feature (the feature being predicted) and the other features. Despite the common use of the word "prediction" to imply forecasting, predictive models need not necessarily foresee events in the future. For instance, a predictive model could be used to predict past events such as the date of a baby's conception using the mother's present-day hormone levels. Predictive models can also be used in real time to control traffic lights during rush hours. Because predictive models are given clear instruction on what they need to learn and how they are intended to learn it, the process of training a predictive model is known as supervised learning. The supervision does not refer to human involvement, but rather to the fact that the target values provide a way for the learner to know how well it has learned the desired task. Stated more formally, given a set of data, a supervised learning algorithm attempts to optimize a function (the model) to find the combination of feature values that result in the target output. The often used supervised machine learning task of predicting which category an example belongs to is known as classification. It is easy to think of potential uses for a classifier. For instance, you could predict whether: An e-mail message is spam A person has cancer A football team will win or lose An applicant will default on a loan In classification, the target feature to be predicted is a categorical feature known as the class and is divided into categories called levels. A class can have two or more levels, and the levels may or may not be ordinal. Because classification is so widely used in machine learning, there are many types of classification algorithms, with strengths and weaknesses suited for different types of input data. Supervised learners can also be used to predict numeric data such as income, laboratory values, test scores, or counts of items. To predict such numeric values, a common form of numeric prediction fits linear regression models to the input data. Although regression models are not the only type of numeric models, they are, by far, the most widely used. Regression methods are widely used for forecasting, as they quantify in exact terms the association between inputs and the target, including both, the magnitude and uncertainty of the relationship. Since it is easy to convert numbers into categories (for example, ages 13 to 19 are teenagers) and categories into numbers (for example, assign 1 to all males, 0 to all females), the boundary between classification models and numeric prediction models is not necessarily firm. A descriptive model is used for tasks that would benefit from the insight gained from summarizing data in new and interesting ways. As opposed to predictive models that predict a target of interest, in a descriptive model, no single feature is more important than the other. In fact, because there is no target to learn, the process of training a descriptive model is called unsupervised learning. Although it can be more difficult to think of applications for descriptive models—after all, what good is a learner that isn't learning anything in particular—they are used quite regularly for data mining. For example, the descriptive modeling task called pattern discovery is used to identify useful associations within data. Pattern discovery is often used for market basket analysis on retailers' transactional purchase data. Here, the goal is to identify items that are frequently purchased together, such that the learned information can be used to refine marketing tactics. For instance, if a retailer learns that swimming trunks are commonly purchased at the same time as sunscreens, the retailer might reposition the items more closely in the store or run a promotion to "up-sell" customers on associated items. Originally used only in retail contexts, pattern discovery is now starting to be used in quite innovative ways. For instance, it can be used to detect patterns of fraudulent behavior, screen for genetic defects, or identify hot spots for criminal activity. The descriptive modeling task of dividing a dataset into homogeneous groups is called clustering. This is, sometimes, used for segmentation analysis that identifies groups of individuals with similar behavior or demographic information so that advertising campaigns could be tailored for particular audiences. Although the machine is capable of identifying the clusters, human intervention is required to interpret them. For example, given five different clusters of shoppers at a grocery store, the marketing team will need to understand the differences among the groups in order to create a promotion that best suits each group. This is almost certainly easier than trying to create a unique appeal for each customer. Lastly, a class of machine learning algorithms known as meta-learners is not tied to a specific learning task, but is rather focused on learning how to learn more effectively. A meta-learning algorithm uses the result of some learnings to inform additional learning. This can be beneficial for very challenging problems or when a predictive algorithm's performance needs to be as accurate as possible. Matching input data to algorithms The following table lists the general types of machine learning algorithms, each of which may be implemented in several ways. These are the basis on which nearly all the other more advanced methods are based. Although this covers only a fraction of the entire set of machine learning algorithms, learning these methods will provide a sufficient foundation to make sense of any other method you may encounter in the future. Model Learning task Supervised Learning Algorithms Nearest Neighbor Classification Naive Bayes Classification Decision Trees Classification Classification Rule Learners Classification Linear Regression Numeric prediction Regression Trees Numeric prediction Model Trees Numeric prediction Neural Networks Dual use Support Vector Machines Dual use Unsupervised Learning Algorithms Association Rules Pattern detection k-means clustering Clustering Meta-Learning Algorithms Bagging Dual use Boosting Dual use Random Forests Dual use  To begin applying machine learning to a real-world project, you will need to determine which of the four learning tasks your project represents: classification, numeric prediction, pattern detection, or clustering. The task will drive the choice of algorithm. For instance, if you are undertaking pattern detection, you are likely to employ association rules. Similarly, a clustering problem will likely utilize the k-means algorithm and numeric prediction will utilize regression analysis or regression trees. For classification, more thought is needed to match a learning problem to an appropriate classifier. In these cases, it is helpful to consider various distinctions among algorithms—distinctions that will only be apparent by studying each of the classifiers in depth. For instance, within classification problems, decision trees result in models that are readily understood, while the models of neural networks are notoriously difficult to interpret. If you were designing a credit-scoring model, this could be an important distinction because law often requires that the applicant must be notified about the reasons he or she was rejected for the loan. Even if the neural network is better at predicting loan defaults, if its predictions cannot be explained, then it is useless for this application. Although you will sometimes find that these characteristics exclude certain models from consideration. In many cases, the choice of algorithm is arbitrary. When this is true, feel free to use whichever algorithm you are most comfortable with. Other times, when predictive accuracy is primary, you may need to test several algorithms and choose the one that fits the best or use a meta-learning algorithm that combines several different learners to utilize the strengths of each. Summary Machine learning originated at the intersection of statistics, database science, and computer science. It is a powerful tool, capable of finding actionable insight in large quantities of data. Still, caution must be used in order to avoid common abuses of machine learning in the real world. Conceptually, learning involves the abstraction of data into a structured representation and the generalization of this structure into action that can be evaluated for utility. In practical terms, a machine learner uses data containing examples and features of the concept to be learned and summarizes this data in the form of a model, which is then used for predictive or descriptive purposes. These purposes can be grouped into tasks, including classification, numeric prediction, pattern detection, and clustering. Among the many options, machine learning algorithms are chosen on the basis of the input data and the learning task. R provides support for machine learning in the form of community-authored packages. These powerful tools are free to download, but need to be installed before they can be used. Resources for Article:   Further resources on this subject: Introduction to Machine Learning with R [article] Machine Learning with R [article] Spark – Architecture and First Program [article]
Read more
  • 0
  • 0
  • 3238

article-image-test-all-things-python
Packt
18 Feb 2016
20 min read
Save for later

Test all the things with Python

Packt
18 Feb 2016
20 min read
The first testing tool we're going to look at is called doctest. The name is short for "document testing" or perhaps a "testable document". Either way, it's a literate tool designed to make it easy to write tests in such a way that computers and humans both benefit from them. Ideally, doctest tests both, informs human readers, and tells the computer what to expect. Mixing tests and documentation helps us: Keeps the documentation up-to-date with reality Make sure that the tests express the intended behavior Reuse some of the efforts involved in the documentation and test creation (For more resources related to this topic, see here.) Where doctest performs best The design decisions that went into doctest make it particularly well suited to writing acceptance tests at the integration and system testing levels. This is because doctest mixes human-only text with examples that both humans and computers can read. This structure doesn't support or enforce any of the formalizations of testing, but it conveys information beautifully and it still provides the computer with the ability to say that works or that doesn't work. As an added bonus, it is about the easiest way to write tests you'll ever see. In other words, a doctest file is a truly excellent program specification that you can have the computer check against your actual code any time you want. API documentation also benefits from being written as doctests and checked alongside your other tests. You can even include doctests in your docstrings. The basic idea you should be getting from all this is that doctest is ideal for uses where humans and computers will both benefit from reading them. The doctest language Like program source code, doctest tests are written in plain text. The doctest module extracts the tests and ignores the rest of the text, which means that the tests can be embedded in human-readable explanations or discussions. This is the feature that makes doctest suitable for uses such as program specifications. Example – creating and running a simple doctest We are going to create a simple doctest file, to show the fundamentals of using the tool. Perform the following steps: Open a new text file in your editor, and name it test.txt. Insert the following text into the file: This is a simple doctest that checks some of Python's arithmetic >>> 2 + 2 4 >>> 3 * 3 10 We can now run the doctest. At the command prompt, change to the directory where you saved test.txt. Type the following command: $ python3 ‑m doctest test.txt When the test is run, you should see output like this: Result – three times three does not equal ten You just wrote a doctest file that describes a couple of arithmetic operations, and ran it to check whether Python behaved as the tests said it should. You ran the tests by telling Python to execute doctest on the file containing the tests. In this case, Python's behavior differed from the tests because, according to the tests, three times three equals ten. However, Python disagrees on that. As doctest expected one thing and Python did something different, doctest presented you with a nice little error report showing where to find the failed test, and how the actual result differed from the expected result. At the bottom of the report is a summary showing how many tests failed in each file tested, which is helpful when you have more than one file containing tests. The syntax of doctests You might have already figured it out from looking at the previous example: doctest recognizes tests by looking for sections of text that look like they've been copied and pasted from a Python interactive session. Anything that can be expressed in Python is valid within a doctest. Lines that start with a >>> prompt are sent to a Python interpreter. Lines that start with a ... prompt are sent as continuations of the code from the previous line, allowing you to embed complex block statements into your doctests. Finally, any lines that don't start with >>> or ..., up to the next blank line or >>> prompt, represent the output expected from the statement. The output appears as it would in an interactive Python session, including both the return value and anything printed to the console. If you don't have any output lines, doctest assumes it to mean that the statement is expected to have no visible result on the console, which usually means that it returns None. The doctest module ignores anything in the file that isn't part of a test, which means that you can put explanatory text, HTML, line-art diagrams, or whatever else strikes your fancy in between your tests. We took advantage of this in the previous doctest to add an explanatory sentence before the test itself. Example – a more complex test Add the following code to your test.txt file, separated from the existing code by at least one blank line: Now we're going to take some more of doctest's syntax for a spin.   >>> import sys >>> def test_write(): ...     sys.stdout.write("Hellon") ...     return True >>> test_write() Hello True Now take a moment to consider before running the test. Will it pass or fail? Should it pass or fail? Result – five tests run? Just as we discussed before, run the test using the following command: python3 -m doctest test.txt You should see a result like this: Because we added the new tests to the same file containing the tests from before, we still see the notification that three times three does not equal 10. Now, though, we also see that five tests were run, which means our new tests ran and were successful. Why five tests? As far as doctest is concerned, we added the following three tests to the file: The first one says that, when we import sys, nothing visible should happen The second test says that, when we define the test_write function, nothing visible should happen The third test says that, when we call the test_write function, Hello and True should appear on the console, in that order, on separate lines Since all three of these tests pass, doctest doesn't bother to say much about them. All it did was increase the number of tests reported at the bottom from two to five. Expecting exceptions That's all well and good for testing that things work as expected, but it is just as important to make sure that things fail when they're supposed to fail. Put another way: sometimes your code is supposed to raise an exception, and you need to be able to write tests that check that behavior as well. Fortunately, doctest follows nearly the same principle in dealing with exceptions as it does with everything else; it looks for text that looks like a Python interactive session. This means it looks for text that looks like a Python exception report and traceback, and matches it against any exception that gets raised. The doctest module does handle exceptions a little differently from the way it handles other things. It doesn't just match the text precisely and report a failure if it doesn't match. Exception tracebacks tend to contain many details that are not relevant to the test, but that can change unexpectedly. The doctest module deals with this by ignoring the traceback entirely: it's only concerned with the first line, Traceback (most recent call last):, which tells it that you expect an exception, and the part after the traceback, which tells it which exception you expect. The doctest module only reports a failure if one of these parts does not match. This is helpful for a second reason as well: manually figuring out what the traceback will look like, when you're writing your tests, would require a significant amount of effort and would gain you nothing. It's better to simply omit them. Example – checking for an exception This is yet another test that you can add to test.txt, this time testing some code that ought to raise an exception. Insert the following text into your doctest file, as always separated by at least one blank line: Here we use doctest's exception syntax to check that Python is correctly enforcing its grammar. The error is a missing ) on the def line. >>> def faulty(: ...     yield from [1, 2, 3, 4, 5] Traceback (most recent call last): SyntaxError: invalid syntax The test is supposed to raise an exception, so it will fail if it doesn't raise the exception or if it raises the wrong exception. Make sure that you have your mind wrapped around this: if the test code executes successfully, the test fails, because it expected an exception. Run the tests using the following doctest: python3 -m doctest test.txt Result – success at failing The code contains a syntax error, which means this raises a SyntaxError exception, which in turn means that the example behaves as expected; this signifies that the test passes. When dealing with exceptions, it is often desirable to be able to use a wildcard matching mechanism. The doctest provides this facility through its ellipsis directive that we'll discuss shortly. Expecting blank lines The doctest uses the first blank line after  >>> to identify the end of the expected output, so what do you do when the expected output actually contains a blank line? The doctest handles this situation by matching a line that contains only the text <BLANKLINE> in the expected output with a real blank line in the actual output. Controlling doctest behavior with directives Sometimes, the default behavior of doctest makes writing a particular test inconvenient. For example, doctest might look at a trivial difference between the expected and real outputs and wrongly conclude that the test has failed. This is where doctest directives come to the rescue. Directives are specially formatted comments that you can place after the source code of a test and that tell doctest to alter its default behavior in some way. A directive comment begins with # doctest:, after which comes a comma-separated list of options that either enable or disable various behaviors. To enable a behavior, write a + (plus symbol) followed by the behavior name. To disable a behavior, white a – (minus symbol) followed by the behavior name. We'll take a look at the several directives in the following sections. Ignoring part of the result It's fairly common that only part of the output of a test is actually relevant to determining whether the test passes. By using the +ELLIPSIS directive, you can make doctest treat the text ... (called an ellipsis) in the expected output as a wildcard that will match any text in the output. When you use an ellipsis, doctest will scan until it finds text matching whatever comes after the ellipsis in the expected output, and continue matching from there. This can lead to surprising results such as an ellipsis matching against a 0-length section of the actual output, or against multiple lines. For this reason, it needs to be used thoughtfully. Example – ellipsis test drive We're going to use the ellipsis in a few different tests to better get a feel of how it works. As an added bonus, these tests also show the use of doctest directives. Add the following code to your test.txt file: Next up, we're exploring the ellipsis. >>> sys.modules # doctest: +ELLIPSIS {...'sys': <module 'sys' (built-in)>...}   >>> 'This is an expression that evaluates to a string' ... # doctest: +ELLIPSIS 'This is ... a string'   >>> 'This is also a string' # doctest: +ELLIPSIS 'This is ... a string'   >>> import datetime >>> datetime.datetime.now().isoformat() # doctest: +ELLIPSIS '...-...-...T...:...:...' Result – ellipsis elides The tests all pass, where they would all fail without the ellipsis. The first and last tests, in which we checked for the presence of a specific module in sys.modules and confirmed a specific formatting while ignoring the contents of a string, demonstrate the kind of situation where ellipsis is really useful, because it lets you focus on the part of the output that is meaningful and ignore the rest of the test. The middle tests demonstrate how different outputs can match the same expected result when ellipsis is in play. Look at the last test. Can you imagine any output that wasn't an ISO-formatted time stamp, but that would match the example anyway? Remember that the ellipsis can match any amount of text. Ignoring white space Sometimes, white space (spaces, tabs, newlines, and their ilk) is more trouble than it's worth. Maybe you want to be able to break a single line of expected output across several lines in your test file, or maybe you're testing a system that uses lots of white space but doesn't convey any useful information with it. The doctest gives you a way to "normalize" white space, turning any sequence of white space characters, in both the expected output and in the actual output, into a single space. It then checks whether these normalized versions match. Example – invoking normality We're going to write a couple of tests that demonstrate how whitespace normalization works. Insert the following code into your doctest file: Next, a demonstration of whitespace normalization. >>> [1, 2, 3, 4, 5, 6, 7, 8, 9] ... # doctest: +NORMALIZE_WHITESPACE [1, 2, 3,  4, 5, 6,  7, 8, 9]   >>> sys.stdout.write("This textn contains weird     spacing.n") ... # doctest: +NORMALIZE_WHITESPACE This text contains weird spacing. 39 Result – white space matches any other white space Both of these tests pass, in spite of the fact that the result of the first one has been wrapped across multiple lines to make it easy for humans to read, and the result of the second one has had its strange newlines and indentations left out, also for human convenience. Notice how one of the tests inserts extra whitespace in the expected output, while the other one ignores extra whitespace in the actual output? When you use +NORMALIZE_WHITESPACE, you gain a lot of flexibility with regard to how things are formatted in the text file. You may have noted the value 39 on the last line of the last example. Why is that there? It's because the write() method returns the number of bytes that were written, which in this case happens to be 39. If you're trying this example in an environment that maps ASCII characters to more than one byte, you will see a different number here; this will cause the test to fail until you change the expected number of bytes. Skipping an example On some occasions, doctest will recognize some text as an example to be checked, when in truth you want it to be simply text. This situation is rarer than it might at first seem, because usually there's no harm in letting doctest check everything it can. In fact, usually it's very helpful to have doctest check everything it can. For those times when you want to limit what doctest checks, though, there's the +SKIP directive. Example – humans only Append the following code to your doctest file: Now we're telling doctest to skip a test   >>> 'This test would fail.' # doctest: +SKIP If it were allowed to run. Result – it looks like a test, but it's not Before we added this last example to the file, doctest reported thirteen tests when we ran the file through it. After adding this code, doctest still reports thirteen tests. Adding the skip directive to the code completely removed it from consideration by doctest. It's not a test that passes, nor a test that fails. It's not a test at all. The other directives There are a number of other directives that can be issued to doctest, should you find the need. They're not as broadly useful as the ones already mentioned, but the time might come when you require one or more of them. The full documentation for all of the doctest directives can be found at http://docs.python.org/3/library/doctest.html#doctest-options. The remaining directives of doctest in the Python 3.4 version are as follows: DONT_ACCEPT_TRUE_FOR_1: This makes doctest differentiate between boolean values and numbers DONT_ACCEPT_BLANKLINE: This removes support for the <BLANKLINE> feature IGNORE_EXCEPTION_DETAIL: This makes doctest only care that an exception is of the expected type Strictly speaking, doctest supports several other options that can be set using the directive syntax, but they don't make any sense as directives, so we'll ignore them here. The execution scope of doctest tests When doctest is running the tests from text files, all the tests from the same file are run in the same execution scope. This means that, if you import a module or bind a variable in one test, that module or variable is still available in later tests. We took advantage of this fact several times in the tests written so far in this article: the sys module was only imported once, for example, although it was used in several tests. This behavior is not necessarily beneficial, because tests need to be isolated from each other. We don't want them to contaminate each other because, if a test depends on something that another test does, or if it fails because of something that another test does, these two tests are in some sense combined into one test that covers a larger section of your code. You don't want that to happen, because then knowing which test has failed doesn't give you as much information about what went wrong and where it happened. So, how can we give each test its own execution scope? There are a few ways to do it. One would be to simply place each test in its own file, along with whatever explanatory text that is needed. This works well in terms of functionality, but running the tests can be a pain unless you have a tool to find and run all of them for you. Another problem with this approach is that this breaks the idea that the tests contribute to a human-readable document. Another way to give each test its own execution scope is to define each test within a function, as follows: >>> def test1(): ...     import frob ...     return frob.hash('qux') >>> test1() 77 By doing this, the only thing that ends up in the shared scope is the test function (named test1 here). The frob module and any other names bound inside the function are isolated with the caveat that things that happen inside imported modules are not isolated. If the frob.hash() method changes a state inside the frob module, that state will still be changed if a different test imports the frob module again. The third way is to exercise caution with the names you create, and be sure to set them to known values at the beginning of each test section. In many ways this is the easiest approach, but this is also the one that places the most burden on you, because you have to keep track of what's in the scope. Why does doctest behave in this way, instead of isolating tests from each other? The doctest files are intended not just for computers to read, but also for humans. They often form a sort of narrative, flowing from one thing to the next. It would break the narrative to be constantly repeating what came before. In other words, this approach is a compromise between being a document and being a test framework, a middle ground that works for both humans and computers. Check your understanding Once you've decided on your answers to these questions, check them by writing a test document and running it through doctest: How does doctest recognize the beginning of a test in a document? How does doctest know when a test continues to further lines? How does doctest recognize the beginning and end of the expected output of a test? How would you tell doctest that you want to break the expected output across several lines, even though that's not how the test actually outputs it? Which parts of an exception report are ignored by doctest? When you assign a variable in a test file, which parts of the file can actually see that variable? Why do we care what code can see the variables created by a test? How can we make doctest not care what a section of output contains? Exercise – English to doctest Time to stretch your wings a bit. I'm going to give you a description of a single function in English. Your job is to copy the description into a new text file, and then add tests that describe all the requirements in a way that the computer can understand and check. Try to make the doctests so that they're not just for the computer. Good doctests tend to clarify things for human readers as well. By and large, this means that you present them to human readers as examples interspersed with the text. Without further ado, here is the English description: The fib(N) function takes a single integer as its only parameter N. If N is 0 or 1, the function returns 1. If N is less than 0, the function raises a ValueError. Otherwise, the function returns the sum of fib(N – 1) and fib(N – 2). The returned value will never be less than 1. A naïve implementation of this function would get very slow as N increased. I'll give you a hint and point out that the last sentence about the function being slow, isn't really testable. As computers get faster, any test you write that depends on an arbitrary definition of "slow" will eventually fail. Also, there's no good way to test the difference between a slow function and a function stuck in an infinite loop, so there's not much point in trying. If you find yourself needing to do that, it's best to back off and try a different solution. Not being able to tell whether a function is stuck or just slow is called the halting problem by computer scientists. We know that it can't be solved unless we someday discover a fundamentally better kind of computer. Faster computers won't do the trick, and neither will quantum computers, so don't hold your breath. The next-to-last sentence also provides some difficulty, since to test it completely would require running every positive integer through the fib() function, which would take forever (except that the computer will eventually run out of memory and force Python to raise an exception). How do we deal with this sort of thing, then? The best solution is to check whether the condition holds true for a random sample of viable inputs. The random.randrange() and random.choice() functions in the Python standard library make that fairly easy to do. Summary We learned the syntax of doctest, and went through several examples describing how to use it. After that, we took a real-world specification for the AVL tree, and examined how to formalize it as a set of doctests, so that we could use it to automatically check the correctness of an implementation. Specifically, we covered doctest's default syntax and the directives that alter it, how to write doctests in text files, how to write doctests in Python docstrings, and what it feels like to use doctest to turn a specification into tests. If, you want to learn more about Python Testing then you can refer the following books: Expert Python Programming: https://www.packtpub.com/application-development/expert-python-programming Python Testing Cookbook: https://www.packtpub.com/application-development/python-testing-cookbook Resources for Article:   Further resources on this subject: Façade Pattern – Being Adaptive with Façade [article] Predicting Sports Winners with Decision Trees and pandas [article] Gradient Descent at Work [article]
Read more
  • 0
  • 0
  • 3540

article-image-scraping-web-python-quick-start
Packt
17 Feb 2016
9 min read
Save for later

Scraping the Web with Python - Quick Start

Packt
17 Feb 2016
9 min read
In this article we're going to acquire intelligence data from a variety of sources. We might interview people. We might steal files from a secret underground base. We might search the World Wide Web (WWW). (For more resources related to this topic, see here.) Accessing data from the Internet The WWW and Internet are based on a series of agreements called Request for Comments (RFC). The RFCs define the standards and protocols to interconnect different networks, that is, the rules for internetworking. The WWW is defined by a subset of these RFCs that specifies the protocols, behaviors of hosts and agents (servers and clients), and file formats, among other details. In a way, the Internet is a controlled chaos. Most software developers agree to follow the RFCs. Some don't. If their idea is really good, it can catch on, even though it doesn't precisely follow the standards. We often see this in the way some browsers don't work with some websites. This can cause confusion and questions. We'll often have to perform both espionage and plain old debugging to figure out what's available on a given website. Python provides a variety of modules that implement the software defined in the Internet RFCs. We'll look at some of the common protocols to gather data through the Internet and the Python library modules that implement these protocols. Background briefing – the TCP/IP protocols The essential idea behind the WWW is the Internet. The essential idea behind the Internet is the TCP/IP protocol stack. The IP part of this is the internetworking protocol. This defines how messages can be routed between networks. Layered on top of IP is the TCP protocol to connect two applications to each other. TCP connections are often made via a software abstraction called a socket. In addition to TCP, there's also UDP; it's not used as much for the kind of WWW data we're interested in. In Python, we can use the low-level socket library to work with the TCP protocol, but we won't. A socket is a file-like object that supports open, close, input, and output operations. Our software will be much simpler if we work at a higher level of abstraction. The Python libraries that we'll use will leverage the socket concept under the hood. The Internet RFCs defines a number of protocols that build on TCP/IP sockets. These are more useful definitions of interactions between host computers (servers) and user agents (clients). We'll look at two of these: Hypertext Transfer Protocol (HTTP) and File Transfer Protocol (FTP). Using http.client for HTTP GET The essence of web traffic is HTTP. This is built on TCP/IP. HTTP defines two roles: host and user agent, also called server and client, respectively. We'll stick to server and client. HTTP defines a number of kinds of request types, including GET and POST. A web browser is one kind of client software we can use. This software makes GET and POST requests, and displays the results from the web server. We can do this kind of client-side processing in Python using two library modules. The http.client module allows us to make GET and POST requests as well as PUT and DELETE. We can read the response object. Sometimes, the response is an HTML page. Sometimes, it's a graphic image. There are other things too, but we're mostly interested in text and graphics. Here's a picture of a mysterious device we've been trying to find. We need to download this image to our computer so that we can see it and send it to our informant from http://upload.wikimedia.org/wikipedia/commons/7/72/IPhone_Internals.jpg: Here's a picture of the currency we're supposed to track down and pay with: We need to download this image. Here is the link: http://upload.wikimedia.org/wikipedia/en/c/c1/1drachmi_1973.jpg Here's how we can use http.client to get these two image files: import http.client import contextlib path_list = [ "/wikipedia/commons/7/72/IPhone_Internals.jpg", "/wikipedia/en/c/c1/1drachmi_1973.jpg", ] host = "upload.wikimedia.org" with contextlib.closing(http.client.HTTPConnection( host )) as connection: for path in path_list: connection.request( "GET", path ) response= connection.getresponse() print("Status:", response.status) print("Headers:", response.getheaders()) _, _, filename = path.rpartition("/") print("Writing:", filename) with open(filename, "wb") as image: image.write( response.read() ) We're using http.client to handle the client side of the HTTP protocol. We're also using the contextlib module to politely disentangle our application from network resources when we're done using them. We've assigned a list of paths to the path_list variable. This example introduces list objects without providing any background. It's important that lists are surrounded by [] and the items are separated by ,. Yes, there's an extra , at the end. This is legal in Python. We created an http.client.HTTPConnection object using the host computer name. This connection object is a little like a file; it entangles Python with operating system resources on our local computer plus a remote server. Unlike a file, an HTTPConnection object isn't a proper context manager. As we really like context managers to release our resources, we made use of the contextlib.closing() function to handle the context management details. The connection needs to be closed; the closing() function assures that this will happen by calling the connection's close() method. For all of the paths in our path_list, we make an HTTP GET request. This is what browsers do to get the image files mentioned in an HTML page. We print a few things from each response. The status, if everything worked, will be 200. If the status is not 200, then something went wrong and we'll need to read up on the HTTP status code to see what happened. If you use a coffee shop Wi-Fi connection, perhaps you're not logged in. You might need to open a browser to set up a connection. An HTTP response includes headers that provide some additional details about the request and response. We've printed the headers because they can be helpful in debugging any problems we might have. One of the most useful headers is ('Content-Type', 'image/jpeg'). This confirms that we really did get an image. We used _, _, filename = path.rpartition("/") to locate the right-most / character in the path. Recall that the partition() method locates the left-most instance. We're using the right-most one here. We assigned the directory information and separator to the variable _. Yes, _ is a legal variable name. It's easy to ignore, which makes it a handy shorthand for we don't care. We kept the filename in the filename variable. We create a nested context for the resulting image file. We can then read the body of the response—a collection of bytes—and write these bytes to the image file. In one quick motion, the file is ours. The HTTP GET request is what underlies much of the WWW. Programs such as curl and wget are expansions of this example. They execute batches of GET requests to locate one or more pages of content. They can do quite a bit more, but this is the essence of extracting data from the WWW. Changing our client information An HTTP GET request includes several headers in addition to the URL. In the previous example, we simply relied on the Python http.client library to supply a suitable set of default headers. There are several reasons why we might want to supply different or additional headers. First, we might want to tweak the User-Agent header to change the kind of browser that we're claiming to be. We might also need to provide cookies for some kinds of interactions. For information on the user agent string, see http://en.wikipedia.org/wiki/User_agent_string#User_agent_identification. This information may be used by the web server to determine if a mobile device or desktop device is being used. We can use something like this: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/537.75.14 This makes our Python request appear to come from the Safari browser instead of a Python application. We can use something like this to appear to be a different browser on a desktop computer: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0 We can use something like this to appear to be an iPhone instead of a Python application: Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_1 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D201 Safari/9537.53 We make this change by adding headers to the request we're making. The change looks like this: connection.request( "GET", path, headers= { 'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 7_1_1 like Mac OS X) AppleWebKit/537.51.2 (KHTML, like Gecko) Version/7.0 Mobile/11D201 Safari/9537.53', }) This will make the web server treat our Python application like it's on an iPhone. This might lead to a more compact page of data than might be provided to a full desktop computer that makes the same request. The header information is a structure with the { key: value, } syntax. It's important that dictionaries are surrounded by {}, the keys and values are separated by :, and each key-value pair is separated by ,. Yes, there's an extra , at the end. This is legal in Python. There are many more HTTP headers we can provide. The User-Agent header is perhaps most important to gather different kinds of intelligence data from web servers. You can refer more book related to this topic on the following links: Python for Secret Agents - Volume II: (https://www.packtpub.com/application-development/python-secret-agents-volume-ii) Expert Python Programming: (https://www.packtpub.com/application-development/expert-python-programming) Raspberry Pi for Secret Agents: (https://www.packtpub.com/hardware-and-creative/raspberry-pi-secret-agents) Resources for Article: Further resources on this subject: Python Libraries[article] Optimization in Python[article] Introduction to Object-Oriented Programming using Python, JavaScript, and C#[article]
Read more
  • 0
  • 0
  • 15858

article-image-introducing-object-oriented-programmng-typescript
Packt
17 Feb 2016
13 min read
Save for later

Introducing Object Oriented Programmng with TypeScript

Packt
17 Feb 2016
13 min read
In this article, we will see how to group our functions in reusable components, such as classes or modules. This article will cover the following topics: SOLID principles Classes Association, aggregation, and composition Inheritance (For more resources related to this topic, see here.) SOLID principles In the early days of software development, developers used to write code with procedural programming languages. In procedural programming languages, the programs follow a top-to-bottom approach and the logic is wrapped with functions. New styles of computer programming, such as modular programming or structured programming, emerged when developers realized that procedural computer programs could not provide them with the desired level of abstraction, maintainability, and reusability. The development community created a series of recommended practices and design patterns to improve the level of abstraction and reusability of procedural programming languages, but some of these guidelines required a certain level of expertise. In order to facilitate adherence to these guidelines, a new style of computer programming known as object-oriented programming (OOP) was created. Developers quickly noticed some common OOP mistakes and came up with five rules that every OOP developer should follow to create a system that is easy to maintain and extend over time. These five rules are known as the SOLID principles. SOLID is an acronym introduced by Michael Feathers, which stands for the following principles: Single responsibility principle (SRP): This principle states that a software component (function, class, or module) should focus on one unique task (have only one responsibility). Open/closed principle (OCP): This principle states that software entities should be designed with application growth (new code) in mind (should be open to extension), but the application growth should require the fewer possible number of changes to the existing code (be closed for modification). Liskov substitution principle (LSP): This principle states that we should be able to replace a class in a program with another class as long as both classes implement the same interface. After replacing the class, no other changes should be required, and the program should continue to work as it did originally. Interface segregation principle (ISP): This principle states that we should split interfaces that are very large (general-purpose interfaces) into smaller and more specific ones (many client-specific interfaces) so that clients will only need to know about the methods that are of interest to them. Dependency inversion principle (DIP): This principle states that entities should depend on abstractions (interfaces) as opposed to depending on concretion (classes). In this article, we will see how to write TypeScript code that adheres to these principles so that our applications are easy to maintain and extend over time. Classes In this section, we will look at some details and OOP concepts through examples. Let's start by declaring a simple class: class Person { public name : string; public surname : string; public email : string; constructor(name : string, surname : string, email : string){ this.email = email; this.name = name; this.surname = surname; } greet() { alert("Hi!"); } } var me : Person = new Person("Remo", "Jansen", "remo.jansen@wolksoftware.com"); We use classes to represent the type of an object or entity. A class is composed of a name, attributes, and methods. The class in the preceding example is named Person and contains three attributes or properties (name, surname, and email) and two methods (constructor and greet). Class attributes are used to describe the object's characteristics, while class methods are used to describe its behavior. A constructor is a special method used by the new keyword to create instances (also known as objects) of our class. We have declared a variable named me, which holds an instance of the Person class. The new keyword uses the Person class's constructor to return an object whose type is Person. A class should adhere to the single responsibility principle (SRP). The Person class in the preceding example represents a person, including all their characteristics (attributes) and behaviors (methods). Now let's add some email as validation logic: class Person { public name : string; public surname : string; public email : string; constructor(name : string, surname : string, email : string) { this.surname = surname; this.name = name; if(this.validateEmail(email)) { this.email = email; } else { throw new Error("Invalid email!"); } } validateEmail() { var re = /S+@S+.S+/; return re.test(this.email); } greet() { alert("Hi! I'm " + this.name + ". You can reach me at " + this.email); } } When an object doesn't follow the SRP and it knows too much (has too many properties) or does too much (has too many methods), we say that the object is a God object. The Person class here is a God object because we have added a method named validateEmail that is not really related to the Person class's behavior. Deciding which attributes and methods should or should not be part of a class is a relatively subjective decision. If we spend some time analyzing our options, we should be able to find a way to improve the design of our classes. We can refactor the Person class by declaring an Email class, responsible for e-mail validation, and use it as an attribute in the Person class: class Email { public email : string; constructor(email : string){ if(this.validateEmail(email)) { this.email = email; } else { throw new Error("Invalid email!"); } } validateEmail(email : string) { var re = /S+@S+.S+/; return re.test(email); } } Now that we have an Email class, we can remove the responsibility of validating the emails from the Person class and update its email attribute to use the type Email instead of string: class Person { public name : string; public surname : string; public email : Email; constructor(name : string, surname : string, email : Email){ this.email = email; this.name = name; this.surname = surname; } greet() { alert("Hi!"); } } Making sure that a class has a single responsibility makes it easier to see what it does and how we can extend/improve it. We can further improve our Person and Email classes by increasing the level of abstraction of our classes. For example, when we use the Email class, we don't really need to be aware of the existence of the validateEmail method; so this method could be invisible from outside the Email class. As a result, the Email class would be much simpler to understand. When we increase the level of abstraction of an object, we can say that we are encapsulating the object's data and behavior. Encapsulation is also known as information hiding. For example, the Email class allows us to use emails without having to worry about e-mail validation because the class will deal with it for us. We can make this clearer by using access modifiers (public or private) to flag as private all the class attributes and methods that we want to abstract from the use of the Email class: class Email { private email : string; constructor(email : string){ if(this.validateEmail(email)) { this.email = email; } else { throw new Error("Invalid email!"); } } private validateEmail(email : string) { var re = /S+@S+.S+/; return re.test(email); } get():string { return this.email; } } We can then simply use the Email class without needing to explicitly perform any kind of validation: var email = new Email("remo.jansen@wolksoftware.com"); Interfaces The feature that we will miss the most when developing large-scale web applications with JavaScript is probably interfaces. We have seen that following the SOLID principles can help us to improve the quality of our code, and writing good code is a must when working on a large project. The problem is that if we attempt to follow the SOLID principles with JavaScript, we will soon realize that without interfaces, we will never be able to write SOLID OOP code. Fortunately for us, TypeScript features interfaces. Traditionally, in OOP, we say that a class can extend another class and implement one or more interfaces. An interface can implement one or more interfaces and cannot extend another class or interface. Wikipedia's definition of interfaces in OOP is as follows: In object-oriented languages, the term interface is often used to define an abstract type that contains no data or code, but defines behaviors as method signatures. Implementing an interface can be understood as signing a contract. The interface is a contract, and when we sign it (implement it), we must follow its rules. The interface rules are the signatures of the methods and properties, and we must implement them. We will see many examples of interfaces later in this article. In TypeScript, interfaces don't strictly follow this definition. The two main differences are that in TypeScript: An interface can extend another interface or class An interface can define data and behaviors as opposed to only behaviors Association, aggregation, and composition In OOP, classes can have some kind of relationship with each other. Now, we will take a look at the three different types of relationships between classes. Association We call association those relationships whose objects have an independent lifecycle and where there is no ownership between the objects. Let's take an example of a teacher and student. Multiple students can associate with a single teacher, and a single student can associate with multiple teachers, but both have their own lifecycles (both can be create and delete independently); so when a teacher leaves the school, we don't need to delete any students, and when a student leaves the school, we don't need to delete any teachers. Aggregation We call aggregation those relationships whose objects have an independent lifecycle, but there is ownership, and child objects cannot belong to another parent object. Let's take an example of a cell phone and a cell phone battery. A single battery can belong to a phone, but if the phone stops working, and we delete it from our database, the phone battery will not be deleted because it may still be functional. So in aggregation, while there is ownership, objects have their own lifecycle. Composition We use the term composition to refer to relationships whose objects don't have an independent lifecycle, and if the parent object is deleted, all child objects will also be deleted. Let's take an example of the relationship between questions and answers. Single questions can have multiple answers, and answers cannot belong to multiple questions. If we delete questions, answers will automatically be deleted. Objects with a dependent life cycle (answers, in the example) are known as weak entities. Sometimes, it can be a complicated process to decide if we should use association, aggregation, or composition. This difficulty is caused in part because aggregation and composition are subsets of association, meaning they are specific cases of association. Inheritance One of the most fundamental object-oriented programming features is its capability to extend existing classes. This feature is known as inheritance and allows us to create a new class (child class) that inherits all the properties and methods from an existing class (parent class). Child classes can include additional properties and methods not available in the parent class. Let's return to our previously declared Person class. We will use the Person class as the parent class of a child class named Teacher: class Person { public name : string; public surname : string; public email : Email; constructor(name : string, surname : string, email : Email){ this.name = name; this.surname = surname; this.email = email; } greet() { alert("Hi!"); } } This example is included in the companion source code. Once we have a parent class in place, we can extend it by using the reserved keyword extends. In the following example, we declare a class called Teacher, which extends the previously defined Person class. This means that Teacher will inherit all the attributes and methods from its parent class: class Teacher extends Person { class Teacher extends Person { teach() { alert("Welcome to class!"); } } Note that we have also added a new method named teach to the class Teacher. If we create instances of the Person and Teacher classes, we will be able to see that both instances share the same attributes and methods with the exception of the teach method, which is only available for the instance of the Teacher class: var teacher = new Teacher("remo", "jansen", new Email("remo.jansen@wolksoftware.com")); var me = new Person("remo", "jansen", new Email("remo.jansen@wolksoftware.com")); me.greet(); teacher.greet(); me.teach(); // Error : Property 'teach' does not exist on type 'Person' teacher.teach(); Sometimes, we will need a child class to provide a specific implementation of a method that is already provided by its parent class. We can use the reserved keyword super for this purpose. Imagine that we want to add a new attribute to list the teacher's subjects, and we want to be able to initialize this attribute through the teacher constructor. We will use the super keyword to explicitly reference the parent class constructor inside the child class constructor. We can also use the super keyword when we want to extend an existing method, such as greet. This OOP language feature that allows a subclass or child class to provide a specific implementation of a method that is already provided by its parent classes is known as method overriding. class Teacher extends Person { public subjects : string[]; constructor(name : string, surname : string, email : Email, subjects : string[]){ super(name, surname, email); this.subjects = subjects; } greet() { super.greet(); alert("I teach " + this.subjects); } teach() { alert("Welcome to Maths class!"); } } var teacher = new Teacher("remo", "jansen", new Email("remo.jansen@wolksoftware.com"), ["math", "physics"]); We can declare a new class that inherits from a class that is already inheriting from another. In the following code snippet, we declare a class called SchoolPrincipal that extends the Teacher class, which extends the Person class: class SchoolPrincipal extends Teacher { manageTeachers() { alert("We need to help students to get better results!"); } } If we create an instance of the SchoolPrincipal class, we will be able to access all the properties and methods from its parent classes (SchoolPrincipal, Teacher, and Person): var principal = new SchoolPrincipal("remo", "jansen", new Email("remo.jansen@wolksoftware.com"), ["math", "physics"]); principal.greet(); principal.teach(); principal.manageTeachers(); It is not recommended to have too many levels in the inheritance tree. A class situated too deeply in the inheritance tree will be relatively complex to develop, test, and maintain. Unfortunately, we don't have a specific rule that we can follow when we are unsure whether we should increase the depth of the inheritance tree (DIT). We should use inheritance in such a way that it helps us to reduce the complexity of our application and not the opposite. We should try to keep the DIT between 0 and 4 because a value greater than 4 would compromise encapsulation and increase complexity. Summary In this article, we saw how to work with classes, and interfaces in depth. We were able to reduce the complexity of our application by using techniques such as encapsulation and inheritance. To learn more about TypeScript, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended: TypeScript Essentials (https://www.packtpub.com/web-development/typescript-essentials) Mastering TypeScript(https://www.packtpub.com/web-development/mastering-typescript) Resources for Article: Further resources on this subject: Writing SOLID JavaScript code with TypeScript [article] Introduction to TypeScript [article] An Introduction to Mastering JavaScript Promises and Its Implementation in Angular.js [article]
Read more
  • 0
  • 0
  • 27935

article-image-using-collider-based-system
Packt
17 Feb 2016
10 min read
Save for later

Using a collider-based system

Packt
17 Feb 2016
10 min read
In this article by Jorge Palacios, the author of the book Unity 5.x Game AI Programming Cookbook, you will learn how to implement agent awareness using a mixed approach that considers the previous learnt sensory-level algorithms. (For more resources related to this topic, see here.) Seeing using a collider-based system This is probably the easiest way to simulate vision. We take a collider, be it a mesh or a Unity primitive, and use it as the tool to determine whether an object is inside the agent's vision range or not. Getting ready It's important to have a collider component attached to the same game object using the script on this recipe, as well as the other collider-based algorithms in this chapter. In this case, it's recommended that the collider be a pyramid-based one in order to simulate a vision cone. The lesser the polygons, the faster it will be on the game. How to do it… We will create a component that is able to see enemies nearby by performing the following steps: Create the Visor component, declaring its member variables. It is important to add the corresponding tags into Unity's configuration: using UnityEngine; using System.Collections; public class Visor : MonoBehaviour { public string tagWall = "Wall"; public string tagTarget = "Enemy"; public GameObject agent; } Implement the function for initializing the game object in case the component is already assigned to it: void Start() { if (agent == null) agent = gameObject; } Declare the function for checking collisions for every frame and build it in the following steps: public void OnCollisionStay(Collision coll) { // next steps here } Discard the collision if it is not a target: string tag = coll.gameObject.tag; if (!tag.Equals(tagTarget)) return; Get the game object's position and compute its direction from the Visor: GameObject target = coll.gameObject; Vector3 agentPos = agent.transform.position; Vector3 targetPos = target.transform.position; Vector3 direction = targetPos - agentPos; Compute its length and create a new ray to be shot soon: float length = direction.magnitude; direction.Normalize(); Ray ray = new Ray(agentPos, direction); Cast the created ray and retrieve all the hits: RaycastHit[] hits; hits = Physics.RaycastAll(ray, length); Check for any wall between the visor and target. If none, we can proceed to call our functions or develop our behaviors to be triggered: int i; for (i = 0; i < hits.Length; i++) { GameObject hitObj; hitObj = hits[i].collider.gameObject; tag = hitObj.tag; if (tag.Equals(tagWall)) return; } // TODO // target is visible // code your behaviour below How it works… The collider component checks every frame to know whether it is colliding with any game object in the scene. We leverage the optimizations to Unity's scene graph and engine, and focus only on how to handle valid collisions. After checking whether a target object is inside the vision range represented by the collider, we cast a ray in order to check whether it is really visible or there is a wall in between. Hearing using a collider-based system In this recipe, we will emulate the sense of hearing by developing two entities; a sound emitter and a sound receiver. It is based on the principles proposed by Millington for simulating a hearing system, and uses the power of Unity colliders to detect receivers near an emitter. Getting ready As with the other recipes based on colliders, we will need collider components attached to every object to be checked and rigid body components attached to either emitters or receivers. How to do it… We will create the SoundReceiver class for our agents and SoundEmitter for things such as alarms: Create the class for the SoundReceiver object: using UnityEngine; using System.Collections; public class SoundReceiver : MonoBehaviour { public float soundThreshold; } We define the function for our own behavior to handle the reception of sound: public virtual void Receive(float intensity, Vector3 position) { // TODO // code your own behavior here } Now, let's create the class for the SoundEmitter object: using UnityEngine; using System.Collections; using System.Collections.Generic; public class SoundEmitter : MonoBehaviour { public float soundIntensity; public float soundAttenuation; public GameObject emitterObject; private Dictionary<int, SoundReceiver> receiverDic; } Initialize the list of receivers nearby and emitterObject in case the component is attached directly: void Start() { receiverDic = new Dictionary<int, SoundReceiver>(); if (emitterObject == null) emitterObject = gameObject; } Implement the function for adding new receivers to the list when they enter the emitter bounds: public void OnCollisionEnter(Collision coll) { SoundReceiver receiver; receiver = coll.gameObject.GetComponent<SoundReceiver>(); if (receiver == null) return; int objId = coll.gameObject.GetInstanceID(); receiverDic.Add(objId, receiver); } Also, implement the function for removing receivers from the list when they are out of reach: public void OnCollisionExit(Collision coll) { SoundReceiver receiver; receiver = coll.gameObject.GetComponent<SoundReceiver>(); if (receiver == null) return; int objId = coll.gameObject.GetInstanceID(); receiverDic.Remove(objId); } Define the function for emitting sound waves to nearby agents: public void Emit() { GameObject srObj; Vector3 srPos; float intensity; float distance; Vector3 emitterPos = emitterObject.transform.position; // next step here } Compute sound attenuation for every receiver: foreach (SoundReceiver sr in receiverDic.Values) { srObj = sr.gameObject; srPos = srObj.transform.position; distance = Vector3.Distance(srPos, emitterPos); intensity = soundIntensity; intensity -= soundAttenuation * distance; if (intensity < sr.soundThreshold) continue; sr.Receive(intensity, emitterPos); } How it works… The collider triggers help register agents in the list of agents assigned to an emitter. The sound emission function then takes into account the agent's distance from the emitter in order to decrease its intensity using the concept of sound attenuation. There is more… We can develop a more flexible algorithm by defining different types of walls that affect sound intensity. It works by casting rays and adding up their values to the sound attenuation: Create a dictionary to store wall types as strings (using tags) and their corresponding attenuation: public Dictionary<string, float> wallTypes; Reduce sound intensity this way: intensity -= GetWallAttenuation(emitterPos, srPos); Define the function called in the previous step: public float GetWallAttenuation(Vector3 emitterPos, Vector3 receiverPos) { // next steps here } Compute the necessary values for ray casting: float attenuation = 0f; Vector3 direction = receiverPos - emitterPos; float distance = direction.magnitude; direction.Normalize(); Cast the ray and retrieve the hits: Ray ray = new Ray(emitterPos, direction); RaycastHit[] hits = Physics.RaycastAll(ray, distance); For every wall type found via tags, add up its value (stored in the dictionary): int i; for (i = 0; i < hits.Length; i++) { GameObject obj; string tag; obj = hits[i].collider.gameObject; tag = obj.tag; if (wallTypes.ContainsKey(tag)) attenuation += wallTypes[tag]; } return attenuation; Smelling using a collider-based system Smelling can be simulated by computing collision between an agent and odor particles, scattered throughout the game level. Getting ready In this recipe based on colliders, we will need collider components attached to every object to be checked, which can be simulated by computing a collision between an agent and odor particles. How to do it… We will develop the scripts needed to represent odor particles and agents able to smell: Create the particle's script and define its member variables for computing its lifespan: using UnityEngine; using System.Collections; public class OdorParticle : MonoBehaviour { public float timespan; private float timer; } Implement the Start function for proper validations: void Start() { if (timespan < 0f) timespan = 0f; timer = timespan; } Implement the timer and destroy the object after its life cycle: void Update() { timer -= Time.deltaTime; if (timer < 0f) Destroy(gameObject); } Create the class for representing the sniffer agent: using UnityEngine; using System.Collections; using System.Collections.Generic; public class Smeller : MonoBehaviour { private Vector3 target; private Dictionary<int, GameObject> particles; } Initialize the dictionary for storing odor particles: void Start() { particles = new Dictionary<int, GameObject>(); } Add to the dictionary the colliding objects that have the odor-particle component attached: public void OnCollisionEnter(Collision coll) { GameObject obj = coll.gameObject; OdorParticle op; op = obj.GetComponent<OdorParticle>(); if (op == null) return; int objId = obj.GetInstanceID(); particles.Add(objId, obj); UpdateTarget(); } Release the odor particles from the local dictionary when they are out of the agent's range or are destroyed: public void OnCollisionExit(Collision coll) { GameObject obj = coll.gameObject; int objId = obj.GetInstanceID(); bool isRemoved; isRemoved = particles.Remove(objId); if (!isRemoved) return; UpdateTarget(); } Create the function for computing the odor centroid according to the current elements in the dictionary: private void UpdateTarget() { Vector3 centroid = Vector3.zero; foreach (GameObject p in particles.Values) { Vector3 pos = p.transform.position; centroid += pos; } target = centroid; } Implement the function for retrieving the odor centroid, if any: public Vector3? GetTargetPosition() { if (particles.Keys.Count == 0) return null; return target; } How it works… Just like the hearing recipe based on colliders, we use the trigger colliders to register odor particles to an agent's perception (implemented using a dictionary). When a particle is included or removed, the odor centroid is computed. However, we implement a function to retrieve that centroid because when no odor particle is registered, the internal centroid position is not updated. There is more… The particle emission logic is left behind to be implemented according to our game's needs and it basically instantiates odor-particle prefabs. Also, it is recommended to attach the rigid body components to the agents. Odor particles are prone to be massively instantiated, reducing the game's performance. Seeing using a graph-based system We will start a recipe oriented to use graph-based logic in order to simulate sense. Again, we will start by developing the sense of vision. Getting ready It is important to grasp the chapter regarding path finding in order to understand the inner workings of the graph-based recipes. How to do it… We will just implement a new file: Create the class for handling vision: using UnityEngine; using System.Collections; using System.Collections.Generic; public class VisorGraph : MonoBehaviour { public int visionReach; public GameObject visorObj; public Graph visionGraph; } Validate the visor object: void Start() { if (visorObj == null) visorObj = gameObject; } Define and start building the function needed to detect visibility of a given set of nodes: public bool IsVisible(int[] visibilityNodes) { int vision = visionReach; int src = visionGraph.GetNearestVertex(visorObj); HashSet<int> visibleNodes = new HashSet<int>(); Queue<int> queue = new Queue<int>(); queue.Enqueue(src); } Implement a breath-first search algorithm: while (queue.Count != 0) { if (vision == 0) break; int v = queue.Dequeue(); List<int> neighbours = visionGraph.GetNeighbors(v); foreach (int n in neighbours) { if (visibleNodes.Contains(n)) continue; queue.Enqueue(v); visibleNodes.Add(v); } } Compare the set of visible nodes with the set of nodes reached by the vision system: foreach (int vn in visibleNodes) { if (visibleNodes.Contains(vn)) return true; } Return false if there is no match between the two sets of nodes: return false; How it works… The recipe uses the breath-first search algorithm in order to discover nodes within its vision reach, and then compares this set of nodes with the set of nodes where the agents reside. Summary In this article, we explained some algorithms for simulating senses and agent awareness. Resources for Article: Further resources on this subject: Animation and Unity3D Physics[article] Unity 3-0 Enter the Third Dimension[article] Animation features in Unity 5[article]
Read more
  • 0
  • 0
  • 31157

Packt
17 Feb 2016
29 min read
Save for later

Spark – Architecture and First Program

Packt
17 Feb 2016
29 min read
In this article by Sumit Gupta and Shilpi Saxena, the authors of Real-Time Big Data Analytics, we will discuss the architecture of Spark and its various components in detail. We will also briefly talk about the various extensions/libraries of Spark, which are developed over the core Spark framework. (For more resources related to this topic, see here.) Spark is a general-purpose computing engine that initially focused to provide solutions to the iterative and interactive computations and workloads. For example, machine learning algorithms, which reuse intermediate or working datasets across multiple parallel operations. The real challenge with iterative computations was the dependency of the intermediate data/steps on the overall job. This intermediate data needs to be cached in the memory itself for faster computations because flushing and reading from a disk was an overhead, which, in turn, makes the overall process unacceptably slow. The creators of Apache Spark not only provided scalability, fault tolerance, performance, distributed data processing but also provided in-memory processing of distributed data over the cluster of nodes. To achieve this, a new layer abstraction of distributed datasets that is partitioned over the set of machines (cluster) was introduced, which can be cached in the memory to reduce the latency. This new layer of abstraction was known as resilient distributed datasets (RDD). RDD, by definition, is an immutable (read-only) collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost. It is important to note that Spark is capable of performing in-memory operations, but at the same time, it can also work on the data stored on the disk. High-level architecture Spark provides a well-defined and layered architecture where all its layers and components are loosely coupled and integration with external components/libraries/extensions is performed using well-defined contracts. Here is the high-level architecture of Spark 1.5.1 and its various components/layers: The preceding diagram shows the high-level architecture of Spark. Let's discuss the roles and usage of each of the architecture components: Physical machines: This layer represents the physical or virtual machines/nodes on which Spark jobs are executed. These nodes collectively represent the total capacity of the cluster with respect to the CPU, memory, and data storage. Data storage layer: This layer provides the APIs to store and retrieve the data from the persistent storage area to Spark jobs/applications. This layer is used by Spark workers to dump data on the persistent storage whenever the cluster memory is not sufficient to hold the data. Spark is extensible and capable of using any kind of filesystem. RDD, which hold the data, are agnostic to the underlying storage layer and can persist the data in various persistent storage areas, such as local filesystems, HDFS, or any other NoSQL database such as HBase, Cassandra, MongoDB, S3, and Elasticsearch. Resource manager: The architecture of Spark abstracts out the deployment of the Spark framework and its associated applications. Spark applications can leverage cluster managers such as YARN (http://tinyurl.com/pcymnnf) and Mesos (http://mesos.apache.org/) for the allocation and deallocation of various physical resources, such as the CPU and memory for the client jobs. The resource manager layer provides the APIs that are used to request for the allocation and deallocation of available resource across the cluster. Spark core libraries: The Spark core library represents the Spark Core engine, which is responsible for the execution of the Spark jobs. It contains APIs for in-memory distributed data processing and a generalized execution model that supports a wide variety of applications and languages. Spark extensions/libraries: This layer represents the additional frameworks/APIs/libraries developed by extending the Spark core APIs to support different use cases. For example, Spark SQL is one such extension, which is developed to perform ad hoc queries and interactive analysis over large datasets. The preceding architecture should be sufficient enough to understand the various layers of abstraction provided by Spark. All the layers are loosely coupled, and if required, can be replaced or extended as per the requirements. Spark extensions is one such layer that is widely used by architects and developers to develop custom libraries. Let's move forward and talk more about Spark extensions, which are available for developing custom applications/jobs. Spark extensions/libraries In this section, we will briefly discuss the usage of various Spark extensions/libraries that are available for different use cases. The following are the extensions/libraries available with Spark 1.5.1: Spark Streaming: Spark Streaming, as an extension, is developed over the core Spark API. It enables scalable, high-throughput, and fault-tolerant stream processing of live data streams. Spark Streaming enables the ingestion of data from various sources, such as Kafka, Flume, Kinesis, or TCP sockets. Once the data is ingested, it can be further processed using complex algorithms that are expressed with high-level functions, such as map, reduce, join, and window. Finally, the processed data can be pushed out to filesystems, databases, and live dashboards. In fact, Spark Streaming also facilitates the usage Spark's machine learning and graph processing algorithms on data streams. For more information, refer to http://spark.apache.org/docs/latest/streaming-programming-guide.html. Spark MLlib: Spark MLlib is another extension that provides the distributed implementation of various machine learning algorithms. Its goal is to make practical machine learning library scalable and easy to use. It provides implementation of various common machine learning algorithms used for classification, regression, clustering, and many more. For more information, refer to http://spark.apache.org/docs/latest/mllib-guide.html. Spark GraphX: GraphX provides the API to create a directed multigraph with properties attached to each vertex and edges. It also provides the various common operators used for the aggregation and distributed implementation of various graph algorithms, such as PageRank and triangle counting. For more information, refer to http://spark.apache.org/docs/latest/graphx-programming-guide.html. Spark SQL: Spark SQL provides the distributed processing of structured data and facilitates the execution of relational queries, which are expressed in a structured query language. (http://en.wikipedia.org/wiki/SQL). It provides the high level of abstraction known as DataFrames, which is a distributed collection of data organized into named columns. For more information, refer to http://spark.apache.org/docs/latest/sql-programming-guide.html. SparkR: R (https://en.wikipedia.org/wiki/R_(programming_language) is a popular programming language used for statistical computing and performing machine learning tasks. However, the execution of the R language is single threaded, which makes it difficult to leverage in order to process large data (TBs or PBs). R can only process the data that fits into the memory of a single machine. In order to overcome the limitations of R, Spark introduced a new extension: SparkR. SparkR provides an interface to invoke and leverage Spark distributed execution engine from R, which allows us to run large-scale data analysis from the R shell. For more information, refer to http://spark.apache.org/docs/latest/sparkr.html. All the previously listed Spark extension/libraries are part of the standard Spark distribution. Once we install and configure Spark, we can start using APIs that are exposed by the extensions. Apart from the earlier extensions, Spark also provides various other external packages that are developed and provided by the open source community. These packages are not distributed with the standard Spark distribution, but they can be searched and downloaded from http://spark-packages.org/. Spark packages provide libraries/packages for integration with various data sources, management tools, higher level domain-specific libraries, machine learning algorithms, code samples, and other Spark content. Let's move on to the next section where we will dive deep into the Spark packaging structure and execution model, and we will also talk about various other Spark components. Spark packaging structure and core APIs In this section, we will briefly talk about the packaging structure of the Spark code base. We will also discuss core packages and APIs, which will be frequently used by the architects and developers to develop custom applications with Spark. Spark is written in Scala (http://www.scala-lang.org/), but for interoperability, it also provides the equivalent APIs in Java and Python as well. For brevity, we will only talk about the Scala and Java APIs, and for Python APIs, users can refer to https://spark.apache.org/docs/1.5.1/api/python/index.html. A high-level Spark code base is divided into the following two packages: Spark extensions: All APIs for a particular extension is packaged in its own package structure. For example, all APIs for Spark Streaming are packaged in the org.apache.spark.streaming.* package, and the same packaging structure goes for other extensions: Spark MLlib—org.apache.spark.mllib.*, Spark SQL—org.apcahe.spark.sql.*, Spark GraphX—org.apache.spark.graphx.*. For more information, refer to http://tinyurl.com/q2wgar8 for Scala APIs and http://tinyurl.com/nc4qu5l for Java APIs. Spark Core: Spark Core is the heart of Spark and provides two basic components: SparkContext and SparkConfig. Both of these components are used by each and every standard or customized Spark job or Spark library and extension. The terms/concepts Context and Config are not new and more or less they have now become a standard architectural pattern. By definition, a Context is an entry point of the application that provides access to various resources/features exposed by the framework, whereas a Config contains the application configurations, which helps define the environment of the application. Let's move on to the nitty-gritty of the Scala APIs exposed by Spark Core: org.apache.spark: This is the base package for all Spark APIs that contains a functionality to create/distribute/submit Spark jobs on the cluster. org.apache.spark.SparkContext: This is the first statement in any Spark job/application. It defines the SparkContext and then further defines the custom business logic that is is provided in the job/application. The entry point for accessing any of the Spark features that we may want to use or leverage is SparkContext, for example, connecting to the Spark cluster, submitting jobs, and so on. Even the references to all Spark extensions are provided by SparkContext. There can be only one SparkContext per JVM, which needs to be stopped if we want to create a new one. The SparkContext is immutable, which means that it cannot be changed or modified once it is started. org.apache.spark.rdd.RDD.scala: This is another important component of Spark that represents the distributed collection of datasets. It exposes various operations that can be executed in parallel over the cluster. The SparkContext exposes various methods to load the data from HDFS or the local filesystem or Scala collections, and finally, create an RDD on which various operations such as map, filter, join, and persist can be invoked. RDD also defines some useful child classes within the org.apache.spark.rdd.* package such as PairRDDFunctions to work with key/value pairs, SequenceFileRDDFunctions to work with Hadoop sequence files, and DoubleRDDFunctions to work with RDDs of doubles. We will read more about RDD in the subsequent sections. org.apache.spark.annotation: This package contains the annotations, which are used within the Spark API. This is the internal Spark package, and it is recommended that you do not to use the annotations defined in this package while developing our custom Spark jobs. The three main annotations defined within this package are as follows: DeveloperAPI: All those APIs/methods, which are marked with DeveloperAPI, are for advance usage where users are free to extend and modify the default functionality. These methods may be changed or removed in the next minor or major releases of Spark. Experimental: All functions/APIs marked as Experimental are officially not adopted by Spark but are introduced temporarily in a specific release. These methods may be changed or removed in the next minor or major releases. AlphaComponent: The functions/APIs, which are still being tested by the Spark community, are marked as AlphaComponent. These are not recommended for production use and may be changed or removed in the next minor or major releases. org.apache.spark.broadcast: This is one of the most important packages, which are frequently used by developers in their custom Spark jobs. It provides the API for sharing the read-only variables across the Spark jobs. Once the variables are defined and broadcast, they cannot be changed. Broadcasting the variables and data across the cluster is a complex task, and we need to ensure that an efficient mechanism is used so that it improves the overall performance of the Spark job and does not become an overhead. Spark provides two different types of implementations of broadcasts—HttpBroadcast and TorrentBroadcast. The HttpBroadcast broadcast leverages the HTTP server to fetch/retrieve the data from Spark driver. In this mechanism, the broadcast data is fetched through an HTTP Server running at the driver itself and further stored in the executor block manager for faster accesses. The TorrentBroadcast broadcast, which is also the default implementation of the broadcast, maintains its own block manager. The first request to access the data makes the call to its own block manager, and if not found, the data is fetched in chunks from the executor or driver. It works on the principle of BitTorrent and ensures that the driver is not the bottleneck in fetching the shared variables and data. Spark also provides accumulators, which work like broadcast, but provide updatable variables shared across the Spark jobs but with some limitations. You can refer to https://spark.apache.org/docs/1.5.1/api/scala/index.html#org.apache.spark.Accumulator. org.apache.spark.io: This provides implementation of various compression libraries, which can be used at block storage level. This whole package is marked as Developer API, so developers can extend and provide their own custom implementations. By default, it provides three implementations: LZ4, LZF, and Snappy. org.apache.spark.scheduler: This provides various scheduler libraries, which help in job scheduling, tracking, and monitoring. It defines the directed acyclic graph (DAG) scheduler (http://en.wikipedia.org/wiki/Directed_acyclic_graph). The Spark DAG scheduler defines the stage-oriented scheduling where it keeps track of the completion of each RDD and the output of each stage and then computes DAG, which is further submitted to the underlying org.apache.spark.scheduler.TaskScheduler API that executes them on the cluster. org.apache.spark.storage: This provides APIs for structuring, managing, and finally, persisting the data stored in RDD within blocks. It also keeps tracks of data and ensures that it is either stored in memory, or if the memory is full, it is flushed to the underlying persistent storage area. org.apache.spark.util: These are the utility classes used to perform common functions across the Spark APIs. For example, it defines MutablePair, which can be used as an alternative to Scala's Tuple2 with the difference that MutablePair is updatable while Scala's Tuple2 is not. It helps in optimizing memory and minimizing object allocations. Spark execution model – master worker view Let's move on to the next section where we will dive deep into the Spark execution model, and we will also talk about various other Spark components. Spark essentially enables the distributed in-memory execution of a given piece of code. We discussed the Spark architecture and its various layers in the previous section. Let's also discuss its major components, which are used to configure the Spark cluster, and at the same time, they will be used to submit and execute our Spark jobs. The following are the high-level components involved in setting up the Spark cluster or submitting a Spark job: Spark driver: This is the client program, which defines SparkContext. The entry point for any job that defines the environment/configuration and the dependencies of the submitted job is SparkContext. It connects to the cluster manager and requests resources for further execution of the jobs. Cluster manager/resource manager/Spark master: The cluster manager manages and allocates the required system resources to the Spark jobs. Furthermore, it coordinates and keeps track of the live/dead nodes in a cluster. It enables the execution of jobs submitted by the driver on the worker nodes (also called Spark workers) and finally tracks and shows the status of various jobs running by the worker nodes. Spark worker/executors: A worker actually executes the business logic submitted by the Spark driver. Spark workers are abstracted and are allocated dynamically by the cluster manager to the Spark driver for the execution of submitted jobs. The following diagram shows the high-level components and the master worker view of Spark: The preceding diagram depicts the various components involved in setting up the Spark cluster, and the same components are also responsible for the execution of the Spark job. Although all the components are important, but let's briefly discuss the cluster/resource manager, as it defines the deployment model and allocation of resources to our submitted jobs. Spark enables and provides flexibility to choose our resource manager. As of Spark 1.5.1, the following are the resource managers or deployment models that are supported by Spark: Apache Mesos: Apache Mesos (http://mesos.apache.org/) is a cluster manager that provides efficient resource isolation and sharing across distributed applications or frameworks. It can run Hadoop, MPI, Hypertable, Spark, and other frameworks on a dynamically shared pool of nodes. Apache Mesos and Spark are closely related to each other (but they are not the same). The story started way back in 2009 when Mesos was ready and there were talks going on about the ideas/frameworks that can be developed on top of Mesos, and that's exactly how Spark was born. Refer to http://spark.apache.org/docs/latest/running-on-mesos.html for more information on running Spark jobs on Apache Mesos. Hadoop YARN: Hadoop 2.0 (http://tinyurl.com/qypb4xm), also known as YARN, was a complete change in the architecture. It was introduced as a generic cluster computing framework that was entrusted with the responsibility of allocating and managing the resources required to execute the varied jobs or applications. It introduced new daemon services, such as the resource manager (RM), node manager (NM), and application master (AM), which are responsible for managing cluster resources, individual nodes, and respective applications. YARN also introduced specific interfaces/guidelines for application developers where they can implement/follow and submit or execute their custom applications on the YARN cluster. The Spark framework implements the interfaces exposed by YARN and provides the flexibility of executing the Spark applications on YARN. Spark applications can be executed in the following two different modes in YARN: YARN client mode: In this mode, the Spark driver executes the client machine (the machine used for submitting the job), and the YARN application master is just used for requesting the resources from YARN. All our logs and sysouts (println) are printed on the same console, which is used to submit the job. YARN cluster mode: In this mode, the Spark driver runs inside the YARN application master process, which is further managed by YARN on the cluster, and the client can go away just after submitting the application. Now as our Spark driver is executed on the YARN cluster, our application logs/sysouts (println) are also written in the log files maintained by YARN and not on the machine that is used to submit our Spark job. For more information on executing Spark applications on YARN, refer to http://spark.apache.org/docs/latest/running-on-yarn.html. Standalone mode: The Core Spark distribution contains the required APIs to create an independent, distributed, and fault tolerant cluster without any external or third-party libraries or dependencies. Local mode: Local mode should not be confused with standalone mode. In local mode, Spark jobs can be executed on a local machine without any special cluster setup by just passing local[N] as the master URL, where N is the number of parallel threads. Writing and executing our first Spark program In this section, we will install/configure and write our first Spark program in Java and Scala. Hardware requirements Spark supports a variety of hardware and software platforms. It can be deployed on commodity hardware and also supports deployments on high-end servers. Spark clusters can be provisioned either on cloud or on-premises. Though there is no single configuration or standards, which can guide us through the requirements of Spark, but to create and execute Spark examples provided in this article, it would be good to have a laptop/desktop/server with the following configuration: RAM: 8 GB. CPU: Dual core or Quad core. DISK: SATA drives with a capacity of 300 GB to 500 GB with 15 k RPM. Operating system: Spark supports a variety of platforms that include various flavors of Linux (Ubuntu, HP-UX, RHEL, and many more) and Windows. For our examples, we will recommend that you use Ubuntu for the deployment and execution of examples. Spark core is coded in Scala, but it offers several development APIs in different languages, such as Scala, Java, and Python, so that developers can choose their preferred weapon for coding. The dependent software may vary based on the programming languages but still there are common sets of software for configuring the Spark cluster and then language-specific software for developing Spark jobs. In the next section, we will discuss the software installation steps required to write/execute Spark jobs in Scala and Java on Ubuntu as the operating system. Installation of the basic softwares In this section, we will discuss the various steps required to install the basic software, which will help us in the development and execution of our Spark jobs. Spark Perform the following steps to install Spark: Download the Spark compressed tarball from http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.4.tgz. Create a new directory spark-1.5.1 on your local filesystem and extract the Spark tarball into this directory. Execute the following command on your Linux shell in order to set SPARK_HOME as an environment variable: export SPARK_HOME=<Path of Spark install Dir> Now, browse your SPARK_HOME directory and it should look similar to the following screenshot: Java Perform the following steps to install Java: Download and install Oracle Java 7 from http://www.oracle.com/technetwork/java/javase/install-linux-self-extracting-138783.html. Execute the following command on your Linux shell to set JAVA_HOME as an environment variable: export JAVA_HOME=<Path of Java install Dir> Scala Perform the following steps to install Scala: Download the Scala 2.10.5 compressed tarball from http://downloads.typesafe.com/scala/2.10.5/scala-2.10.5.tgz?_ga=1.7758962.1104547853.1428884173. Create a new directory, Scala 2.10.5, on your local filesystem and extract the Scala tarball into this directory. Execute the following commands on your Linux shell to set SCALA_HOME as an environment variable, and add the Scala compiler to the $PATH system: export SCALA_HOME=<Path of Scala install Dir> Next, execute the command in the following screenshot to ensure that the Scala runtime and Scala compiler is available and the version is 2.10.x: Spark 1.5.1 supports the 2.10.5 version of Scala, so it is advisable to use the same version to avoid any runtime exceptions due to mismatch of libraries. Eclipse Perform the following steps to install Eclipse: Based on your hardware configuration, download Eclipse Luna (4.4) from http://www.eclipse.org/downloads/packages/eclipse-ide-java-eedevelopers/lunasr2: Next, install the IDE for Scala in Eclipse itself so that we can write and compile our Scala code inside Eclipse (http://scala-ide.org/download/current.html). We are now done with the installation of all the required software. Let's move on and configure our Spark cluster. Configuring the Spark cluster The first step to configure the Spark cluster is to identify the appropriate resource manager. We discussed the various resource managers in the Spark execution model – master worker view section (Yarn, Mesos, and standalone). Standalone is the most preferred resource manager for development because it is simple/quick and does not require installation of any other component or software. We will also configure the standalone resource manager for all our Spark examples, and for more information on Yarn and Mesos, refer to the Spark execution model – master worker view section. Perform the following steps to bring up an independent cluster using Spark binaries: The first step to set up the Spark cluster is to bring up the master node, which will track and allocate the systems' resource. Open your Linux shell and execute the following command: $SPARK_HOME/sbin/start-master.sh The preceding command will bring up your master node, and it will also enable a UI, the Spark UI to monitor the nodes/jobs in the Spark cluster, http://<host>:8080/. The <host> is the domain name of the machine on which the master is running. Next, let's bring up our worker node, which will execute our Spark jobs. Execute the following command on the same Linux shell: $SPARK_HOME/bin/spark-class org.apache.spark.deploy.worker.Worker <Spark-Master> & In the preceding command, replace the <Spark-Master> with the Spark URL, which is shown at the top of the Spark UI, just beside Spark master at. The preceding command will start the Spark worker process in the background and the same will also be reported in the Spark UI. The Spark UI shown in the preceding screenshot shows the three different sections, providing the following information: Workers: This reports the health of a worker node, which is alive or dead and also provides drill-down to query the status and details logs of the various jobs executed by that specific worker node Running applications: This shows the applications that are currently being executed in the cluster and also provides drill-down and enables viewing of application logs Completed application: This is the same functionality as running applications; the only difference being that it shows the jobs, which are finished We are done!!! Our Spark cluster is up and running and ready to execute our Spark jobs with one worker node. Let's move on and write our first Spark application in Scala and Java and further execute it on our newly created cluster. Coding Spark job in Scala In this section, we will code our first Spark job in Scala, and we will also execute the same job on our newly created Spark cluster and will further analyze the results. This is our first Spark job, so we will keep it simple. We will use the Chicago crimes dataset for August 2015and will count the number of crimes reported in August 2015. Perform the following steps to code the Spark job in Scala for aggregating the number of crimes in August 2015: Open Eclipse and create a Scala project called Spark-Examples. Expand your newly created project and modify the version of the Scala library container to 2.10. This is done to ensure that the version of Scala libraries used by Spark and the custom jobs developed/deployed are the same. Next, open the properties of your project Spark-Examples and add the dependencies for the all libraries packaged with the Spark distribution, which can be found at $SPARK_HOME/lib. Next, create a chapter.six Scala package, and in this package, define a new Scala object by the name of ScalaFirstSparkJob. Define a main method in the Scala object and also import SparkConfand SparkContext. Now, add the following code to the main method of ScalaFirstSparkJob: object ScalaFirstSparkJob { def main(args: Array[String]) { println("Creating Spark Configuration") //Create an Object of Spark Configuration val conf = new SparkConf() //Set the logical and user defined Name of this Application conf.setAppName("My First Spark Scala Application") println("Creating Spark Context") //Create a Spark Context and provide previously created //Object of SparkConf as an reference. val ctx = new SparkContext(conf) //Define the location of the file containing the Crime Data val file = "file:///home/ec2-user/softwares/crime-data/ Crimes_-Aug-2015.csv"; println("Loading the Dataset and will further process it") //Loading the Text file from the local file system or HDFS //and converting it into RDD. //SparkContext.textFile(..) - It uses the Hadoop's //TextInputFormat and file is broken by New line Character. //Refer to http://hadoop.apache.org/docs/r2.6.0/api/org/ apache/hadoop/mapred/TextInputFormat.html //The Second Argument is the Partitions which specify the parallelism. //It should be equal or more then number of Cores in the cluster. val logData = ctx.textFile(file, 2) //Invoking Filter operation on the RDD, and counting the number of lines in the Data loaded in RDD. //Simply returning true as "TextInputFormat" have already divided the data by "\n" //So each RDD will have only 1 line. val numLines = logData.filter(line => true).count() //Finally Printing the Number of lines. println("Number of Crimes reported in Aug-2015 = " + numLines) } } We are now done with the coding! Our first Spark job in Scala is ready for execution. Now, from Eclipse itself, export your project as a .jar fie, name it spark-examples.jar, and save this .jar file in the root of $SPARK_HOME. Next, open your Linux console, go to $SPARK_HOME, and execute the following command: $SPARK_HOME/bin/spark-submit --class chapter.six.ScalaFirstSparkJob --master spark://ip-10-166-191-242:7077 spark-examples.jar In the preceding command, ensure that the value given to --masterparameter is the same as it is shown on your Spark UI. The Spark-submit is a utility script, which is used to submit the Spark jobs to the cluster. As soon as you click on Enter and execute the preceding command, you will see lot of activity (log messages) on the console, and finally, you will see the output of your job at the end: Isn't that simple! As we move forward and discuss Spark more, you will appreciate the ease of coding and simplicity provided by Spark for creating, deploying, and running jobs in a distributed framework. Your completed job will also be available for viewing at the Spark UI: The preceding image shows the status of our first Scala job on the UI. Now let's move forward and develop the same Job using Spark Java APIs. Coding Spark job in Java Perform the following steps to code the Spark job in Java for aggregating the number of crimes in August 2015: Open your Spark-Examples Eclipse project (created in the previous section). Add a new chapter.six.JavaFirstSparkJobJava file, and add the following code snippet: import org.apache.spark.SparkConf; import org.apache.spark.api.java.JavaRDD; import org.apache.spark.api.java.JavaSparkContext; import org.apache.spark.api.java.function.Function; public class JavaFirstSparkJob { public static void main(String[] args) { System.out.println("Creating Spark Configuration"); // Create an Object of Spark Configuration SparkConf javaConf = new SparkConf(); // Set the logical and user defined Name of this Application javaConf.setAppName("My First Spark Java Application"); System.out.println("Creating Spark Context"); // Create a Spark Context and provide previously created //Objectx of SparkConf as an reference. JavaSparkContext javaCtx = new JavaSparkContext(javaConf); System.out.println("Loading the Crime Dataset and will further process it"); String file = "file:///home/ec2-user/softwares/crime-data/Crimes_-Aug-2015.csv"; JavaRDD<String> logData = javaCtx.textFile(file); //Invoking Filter operation on the RDD. //And counting the number of lines in the Data loaded //in RDD. //Simply returning true as "TextInputFormat" have already divided the data by "\n" //So each RDD will have only 1 line. long numLines = logData.filter(new Function<String, Boolean>() { public Boolean call(String s) { return true; } }).count(); //Finally Printing the Number of lines System.out.println("Number of Crimes reported in Aug-2015 = "+numLines); javaCtx.close(); } } Next, compile the preceding JavaFirstSparkJob from Eclipse itself and perform steps 7, 8, and 9 of the previous section in which we executed the Spark Scala job. We are done! Analyze the output on the console; it should be the same as the output of the Scala job, which we executed in the previous section. Troubleshooting – tips and tricks In this section, we will talk about troubleshooting tips and tricks, which are helpful in solving the most common errors encountered while working with Spark. Port numbers used by Spark Spark binds various network ports for communication within the cluster/nodes and also exposes the monitoring information of jobs to developers and administrators. There may be instances where the default ports used by Spark may not be available or may be blocked by the network firewall which in turn will result in modifying the default Spark ports for master/worker or driver. Here is a list of all the ports utilized by Spark and their associated parameters, which need to be configured for any changes (http://spark.apache.org/docs/latest/security.html#configuring-ports-for-network-security). Classpath issues – class not found exception Classpath is the most common issue and it occurs frequently in distributed applications. Spark and its associated jobs run in a distributed mode on a cluster. So, if your Spark job is dependent upon external libraries, then we need to ensure that we package them into a single JAR fie and place it in a common location or the default classpath of all worker nodes or define the path of the JAR file within SparkConf itself: val sparkConf = new SparkConf().setAppName("myapp").setJars(<path of Jar file>)) Other common exceptions In this section, we will talk about few of the common errors/issues/exceptions encountered by architects/developers when they set up Spark or execute Spark jobs: Too many open files: This increases the ulimit on your Linux OS by executingsudo ulimit –n 20000. Version of Scala: Spark 1.5.1 supports Scala 2.10, so if you have multiple versions of Scala deployed on your box, then ensure that all versions are the same, that is, Scala 2.10. Out of memory on workers in standalone mode: This configures SPARK_WORKER_MEMORY in $SPARK_HOME/conf/spark-env.sh. By default, it provides a total memory of 1 G to workers, but at the same time, you should analyze and ensure that you are not loading or caching too much data on worker nodes. Out of memory in applications executed on worker nodes: This configures spark.executor.memory in your SparkConf, as follows: val sparkConf = new SparkConf().setAppName("myapp") .set("spark.executor.memory", "1g") The preceding tips will help you solve basic issues in setting up Spark clusters, but as you move ahead, there could be more complex issues, which are beyond the basic setup, and for all those issues, post your queries at http://stackoverflow.com/questions/tagged/apache-spark or mail at user@spark.apache.org. Summary In this article, we discussed the architecture of Spark and its various components. We also configured our Spark cluster and executed our first Spark job in Scala and Java. Resources for Article:   Further resources on this subject: Data mining [article] Python Data Science Up and Running [article] The Design Patterns Out There and Setting Up Your Environment [article]
Read more
  • 0
  • 0
  • 3188
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime
article-image-chrome-custom-tabs
Packt
17 Feb 2016
16 min read
Save for later

Chrome Custom Tabs

Packt
17 Feb 2016
16 min read
Well, most of us know tabs from every day Internet browsing. It doesn't really matter which browser you use; all browsers support tabs and multiple tabs' browsing. This allows us to have more than one website open at the same time and navigate between the opened instances. In Android, things are much the same, but when using WebView, you don't have tabs. This article will give highlights about WebView and the new feature of Android 6, Chrome custom tabs. (For more resources related to this topic, see here.) What is WebView? WebView is the part in the Android OS that's responsible for rendering web pages in most Android apps. If you see web content in an Android app, chances are you're looking at WebView. The major exceptions to this rule are some of the Android browsers, such as Chrome, Firefox, and so on. In Android 4.3 and lower, WebView uses code based on Apple's Webkit. In Android 4.4 and higher, WebView is based on the Chromium project, which is the open source base of Google Chrome. In Android 5.0, WebView was decoupled into a separate app that allowed timely updates through Google Play without requiring firmware updates to be issued, and the same technique was used with Google Play services. Now, let's talk again about a simple scenario: we want to display web content (URL-related) in our application. We have two options: either launch a browser or build our own in-app browser using WebView. Both options have trade-offs or disadvantages if we write them down. A browser is an external application and you can't really change its UI; while using it, you push the users to other apps and you may lose them in the wild. On the other hand, using WebView will keep the users tightly inside. However, actually dealing with all possible actions in WebView is quite an overhead. Google heard our rant and came to the rescue with Chrome custom tabs. Now we have better control over the web content in our application, and we can stitch web content into our app in a cleaner, prettier manner. Customization options Chrome custom tabs allow several modifications and tweaks: The toolbar color Enter and exit animations Custom actions for the toolbar and overflow menu Prestarted and prefetched content for faster loading When to use Chrome custom tabs Ever since WebView came out, applications have been using it in multiple ways, embedding content—local static content inside the APK and dynamic content as loading web pages that were not designed for mobile devices at the beginning. Later on we saw the rise of the mobile web era complete with hybrid applications). Chrome custom tabs are a bit more than just loading local content or mobile-compatible web content. They should be used when you load web data and want to allow simple implementation and easier code maintenance and, furthermore, make the web content part of your application—as if it's always there within your app. Among the reasons why you should use custom tabs are the following: Easy implementation: you use the support library when required or just add extras to your View intent. It's that simple. In app UI modifications, you can do the following: Set the toolbar color Add/change the action button Add custom menu items to the overflow menu Set and create custom in/out animations when entering the tab or exiting to the previous location Easier navigation and navigation logic: you can get a callback notifying you about an external navigation, if required. You know when the user navigates to web content and where they should return when done. Chrome custom tabs allow added performance optimizations that you can use: You can keep the engine running, so to speak, and actually give the custom tab a head start to start itself and do some warm up prior to using it. This is done without interfering or taking away precious application resources. You can provide a URL to load in advance in the background while waiting for other user interactions. This speeds up the user-visible page loading time and gives the user a sense of blazing fast application where all the content is just a click away. While using the custom tab, the application won't be evicted as the application level will still be in the foreground even though the tab is on top of it. So, we remain at the top level for the entire usage time (unless a phone call or some other user interaction leads to a change). Using the same Chrome container means that users are already signed in to sites they connected to in the past; specific permissions that were granted previously apply here as well; even fill data, autocomplete, and sync work here. Chrome custom tabs allow us give the users the latest browser implementation on pre-Lollipop devices where WebView is not the latest version. The implementation guide As discussed earlier, we have a couple of features integrated into Chrome custom tabs. The first customizes the UI and interaction with the custom tabs. The second allows pages to be loaded faster and keeps the application alive. Can we use Chrome custom tabs? Before we start using custom tabs, we want to make sure they're supported. Chrome custom tabs expose a service, so the best check for support is to try and bind to the service. Success means that custom tabs are supported and can be used. You can check out this gist, which shows a helper how to to check it, or check the project source code later on at https://gist.github.com/MaTriXy/5775cb0ff98216b2a99d. After checking and learning that support exists, we will start with the UI and interaction part. Custom UI and tab interaction Here, we will use the well-known ACTION_VIEW intent action, and by appending extras to the intent sent to Chrome, we will trigger changes in the UI. Remember that the ACTION_VIEW intent is compatible with all browsers, including Chrome. There are some phones without Chrome out there, or there are instances where the device's default browser isn't Chrome. In these cases, the user will navigate to the specific browser application. Intent is a convenient way to pass that extra data we want Chrome to get. Don't use any of these flags when calling to the Chrome custom tabs: FLAG_ACTIVITY_NEW_TASK FLAG_ACTIVITY_NEW_DOCUMENT Before using the API, we need to add it to our gradle file: compile 'com.android.support:customtabs:23.1.0' This will allow us to use the custom tab support library in our application: CustomTabsIntent.EXTRA_SESSION The preceding code is an extra from the custom tabs support library; it's used to match the session. It must be included in the intent when opening a custom tab. It can be null if there is no need to match any service-side sessions with the intent. We have a sample project to show the options for the UI called ChubbyTabby at https://github.com/MaTriXy/ChubbyTabby. We will go over the important parts here as well. Our main interaction comes from a special builder from the support library called CustomTabsIntent.Builder; this class will help us build the intent we need for the custom tab: CustomTabsIntent.Builder intentBuilder = new CustomTabsIntent.Builder(); //init our Builder //Setting Toolbar Color int color = getResources().getColor(R.color.primary); //we use primary color for our toolbar as well - you can define any color you want and use it. intentBuilder.setToolbarColor(color); //Enabling Title showing intentBuilder.setShowTitle(true); //this will show the title in the custom tab along the url showing at the bottom part of the tab toolbar. //This part is adding custom actions to the over flow menu String menuItemTitle = getString(R.string.menu_title_share); PendingIntent menuItemPendingIntent = createPendingShareIntent(); intentBuilder.addMenuItem(menuItemTitle, menuItemPendingIntent); String menuItemEmailTitle = getString(R.string.menu_title_email); PendingIntent menuItemPendingIntentTwo = createPendingEmailIntent(); intentBuilder.addMenuItem(menuItemEmailTitle, menuItemPendingIntentTwo); //Setting custom Close Icon. intentBuilder.setCloseButtonIcon(mCloseButtonBitmap); //Adding custom icon with custom action for the share action. intentBuilder.setActionButton(mActionButtonBitmap, getString(R.string.menu_title_share), createPendingShareIntent()); //Setting start and exit animation for the custom tab. intentBuilder.setStartAnimations(this, R.anim.slide_in_right, R.anim.slide_out_left); intentBuilder.setExitAnimations(this, android.R.anim.slide_in_left, android.R.anim.slide_out_right); CustomTabActivityHelper.openCustomTab(this, intentBuilder.build(), Uri.parse(URL), new WebviewFallback(), useCustom);  A few things to notice here are as follows: Every menu item uses a pending intent; if you don't know what a pending intent is, head to http://developer.android.com/reference/android/app/PendingIntent.html When we set custom icons, such as close buttons or an action button, for that matter, we use bitmaps and we must decode the bitmap prior to passing it to the builder Setting animations is easy and you can use animations' XML files that you created previously; just make sure that you test the result before releasing the app The following screenshot is an example of a Chrome custom UI and tab: The custom action button As developers, we have full control over the action buttons presented in our custom tab. For most use cases, we can think of a share action or maybe a more common option that your users will perform. The action button is basically a bundle with an icon of the action button and a pending intent that will be called by Chrome when your user hits the action button. The icon should be 24 dp in height and 24-48 dp in width according to specifications: //Adding custom icon with custom action for the share action intentBuilder.setActionButton(mActionButtonBitmap, getString(R.string.menu_title_share), createPendingShareIntent()); Configuring a custom menu By default, Chrome custom tabs usually have a three-icon row with Forward, Page Info, and Refresh on top at all times and Find in page and Open in Browser (Open in Chrome can appear as well) at the footer of the menu. We, developers, have the ability to add and customize up to three menu items that will appear between the icon row and foot items as shown in the following screenshot: The menu we see is actually represented by an array of bundles, each with menu text and a pending intent that Chrome will call on your behalf when the user taps the item: //This part is adding custom buttons to the over flow menu String menuItemTitle = getString(R.string.menu_title_share); PendingIntent menuItemPendingIntent = createPendingShareIntent(); intentBuilder.addMenuItem(menuItemTitle, menuItemPendingIntent); String menuItemEmailTitle = getString(R.string.menu_title_email); PendingIntent menuItemPendingIntentTwo = createPendingEmailIntent(); intentBuilder.addMenuItem(menuItemEmailTitle, menuItemPendingIntentTwo); Configuring custom enter and exit animations Nothing is complete without a few animations to tag along. This is no different, as we have two transitions to make: one for the custom tab to enter and another for its exit; we have the option to set a specific animation for each start and exit animation: //Setting start and exit animation for the custom tab. intentBuilder.setStartAnimations(this,R.anim.slide_in_right, R.anim.slide_out_left); intentBuilder.setExitAnimations(this, android.R.anim.slide_in_left, android.R.anim.slide_out_right); Chrome warm-up Normally, after we finish setting up the intent with the intent builder, we should call CustomTabsIntent.launchUrl (Activity context, Uri url), which is a nonstatic method that will trigger a new custom tab activity to load the URL and show it in the custom tab. This can take up quite some time and impact the impression of smoothness the app provides. We all know that users demand a near-instantaneous experience, so Chrome has a service that we can connect to and ask it to warm up the browser and its native components. Calling this will ask Chrome to perform the following: The DNS preresolution of the URL's main domain The DNS preresolution of the most likely subresources Preconnection to the destination, including HTTPS/TLS negotiation The process to warm up Chrome is as follows: Connect to the service. Attach a navigation callback to get notified upon finishing the page load. On the service, call warmup to start Chrome behind the scenes. Create newSession; this session is used for all requests to the API. Tell Chrome which pages the user is likely to load with mayLaunchUrl. Launch the intent with the session ID generated in step 4. Connecting to the Chrome service Connecting to the Chrome service involves dealing with Android Interface Definition Language (AIDL). If you don't know about AIDL, read http://developer.android.com/guide/components/aidl.html. The interface is created with AIDL, and it automatically creates a proxy service class for you: CustomTabsClient.bindCustomTabsService() So, we check for the Chrome package name; in our sample project, we have a special method to check whether Chrome is present in all variations. After we set the package, we bind to the service and get a CustomTabsClient object that we can use until we're disconnected from the service: pkgName - This is one of several options checking to see if we have a version of Chrome installed can be one of the following static final String STABLE_PACKAGE = "com.android.chrome"; static final String BETA_PACKAGE = "com.chrome.beta"; static final String DEV_PACKAGE = "com.chrome.dev"; static final String LOCAL_PACKAGE = "com.google.android.apps.chrome"; private CustomTabsClient mClient; // Binds to the service. CustomTabsClient.bindCustomTabsService(myContext, pkgName, new CustomTabsServiceConnection() { @Override public void onCustomTabsServiceConnected(ComponentName name, CustomTabsClient client) { // CustomTabsClient should now be valid to use mClient = client; } @Override public void onServiceDisconnected(ComponentName name) { // CustomTabsClient is no longer valid which also invalidates sessions. mClient = null; } });  After we bind to the service, we can call the proper methods we need. Warming up the browser process The method for this is as follows: boolean CustomTabsClient.warmup(long flags) //With our valid client earlier we call the warmup method. mClient.warmup(0); Flags are currently not being used, so we pass 0 for now. The warm-up procedure loads native libraries and the browser process required to support custom tab browsing later on. This is asynchronous, and the return value indicates whether the request has been accepted or not. It returns true to indicate success. Creating a new tab session The method for this is as follows: boolean CustomTabsClient.newSession(ICustomTabsCallback callback) The new tab session is used as the grouping object tying the mayLaunchUrl call, the VIEW intent that we build, and the tab generated altogether. We can get a callback associated with the created session that would be passed for any consecutive mayLaunchUrl calls. This method returns CustomTabsSession when a session is created successfully; otherwise, it returns Null. Setting the prefetching URL The method for this is as follows: boolean CustomTabsSession.mayLaunchUrl (Uri url, Bundle extras, List<Bundle> otherLikelyBundles) This method will notify the browser that a navigation to this URL will happen soon. Make sure that you call warmup() prior to calling this method—this is a must. The most likely URL has to be specified first, and you can send an optional list of other likely URLs (otherLikelyBundles). Lists have to be sorted in a descending order and the optional list may be ignored. A new call to this method will lower the priority of previous calls and can result in URLs not being prefetched. Boolean values inform us whether the operation has been completed successfully. Custom tabs connection callback The method for this is as follows: void CustomTabsCallback.onNavigationEvent (int navigationEvent, Bundle extras) We have a callback triggered upon each navigation event in the custom tab. The int navigationEvent element is one of the six that defines the state the page is in. Refer to the following code for more information: //Sent when the tab has started loading a page. public static final int NAVIGATION_STARTED = 1; //Sent when the tab has finished loading a page. public static final int NAVIGATION_FINISHED = 2; //Sent when the tab couldn't finish loading due to a failure. public static final int NAVIGATION_FAILED = 3; //Sent when loading was aborted by a user action. public static final int NAVIGATION_ABORTED = 4; //Sent when the tab becomes visible. public static final int TAB_SHOWN = 5; //Sent when the tab becomes hidden. public static final int TAB_HIDDEN = 6; private static class NavigationCallback extends CustomTabsCallback { @Override public void onNavigationEvent(int navigationEvent, Bundle extras) { Log.i(TAG, "onNavigationEvent: Code = " + navigationEvent); } } Summary In this article, we learned about a newly added feature, Chrome custom tabs, which allows us to embed web content into our application and modify the UI. Chrome custom tabs allow us to provide a fuller, faster in-app web experience for our users. We use the Chrome engine under the hood, which allows faster loading than regular WebViews or loading the entire Chrome (or another browser) application. We saw that we can preload pages in the background, making it appear as if our data is blazing fast. We can customize the look and feel of our Chrome tab so that it matches our app. Among the changes we saw were the toolbar color, transition animations, and even the addition of custom actions to the toolbar. Custom tabs also benefit from Chrome features such as saved passwords, autofill, tap to search, and sync; these are all available within a custom tab. For developers, integration is quite easy and requires only a few extra lines of code in the basic level. The support library helps with more complex integration, if required. This is a Chrome feature, which means you get it on any Android device where the latest versions of Chrome are installed. Remember that the Chrome custom tab support library changes with new features and fixes, which is the same as other support libraries, so please update your version and make sure that you use the latest API to avoid any issues. To learn more about Chrome custom tabs and Android 6, refer to the following books: Android 6 Essentials (https://www.packtpub.com/application-development/android-6-essentials) Augmented Reality for Android Application Development (https://www.packtpub.com/application-development/augmented-reality-android-application-development) Resources for Article: Further resources on this subject: Android and iOS Apps Testing at a Glance [Article] Working with Xamarin.Android [Article] Mobile Phone Forensics – A First Step into Android Forensics [Article]
Read more
  • 0
  • 0
  • 5817

article-image-openstack-performance-availability
Packt
17 Feb 2016
21 min read
Save for later

OpenStack Performance, Availability

Packt
17 Feb 2016
21 min read
In this article by Tony Campbell, author of the book Troubleshooting OpenStack, we will cover some of the chronic issues that might be early signs of trouble. This article is more about prevention and aims to help you avoid emergency troubleshooting as much as possible. (For more resources related to this topic, see here.) Database Many OpenStack services make heavy use of a database. Production deployments will typically use MySQL or Postgres as the backend database server. As you may have learned, a failing or misconfigured database will quickly lead to trouble in your OpenStack cluster. Database problems can also present more subtle concerns that may grow into huge problems if neglected. Availability This database server can become a single point of failure if your database server is not deployed in a highly available configuration. OpenStack does not require a high availability installation of your database, and as a result, many installations may skip this step. However, production deployments of OpenStack should take care to ensure that their database can survive the failure of a single database server. MySQL with Galera Cluster For installations using the MySQL database engine, there are several options for clustering your installation. One popular method is to leverage Galera Cluster (http://http://galeracluster.com/). Galera Cluster for MySQL leverages synchronous replication and provides a multi-master cluster, which offers high availability for your OpenStack databases. Postgres Installations that use the Postgres database engine have several options such as high availability, load balancing, and replication. These options include block device replication with DRBD, log shipping, Master-Standby replication based on triggers, statement-based replication, and asynchronous multi-master replication. For details, refer to the Postgres High Availability Guide (http://www.postgresql.org/docs/current/static/high-availability.html). Performance Database performance is one of those metrics that can degrade over time. For an administrator who does not pay attention to small problems in this area, these can eventually become large problems. A wise administrator will regularly monitor the performance of their database and will constantly be on the lookout for slow queries, high database loads, and other indications of trouble. MySQL There are several options for monitoring your MySQL server, some of which are commercial and many others that are open source. Administrators should evaluate the options available and select a solution that fits their current set of tools and operating environment. There are several performance metrics you will want to monitor. Show Status The MySQL SHOW STATUS statement can be executed from the mysql command prompt. The output of this statement is server status information with over 300 variables reported. To narrow down the information, you can leverage a LIKE clause on the variable_name to display the sections you are interested in. Here is an abbreviated list of the output instances returned by SHOW STATUS: mysql> SHOW STATUS; +------------------------------------------+-------------+ | Variable_name                            | Value       | +------------------------------------------+-------------+ | Aborted_clients                          | 29          | | Aborted_connects                         | 27          | | Binlog_cache_disk_use                    | 0           | | Binlog_cache_use                         | 0           | | Binlog_stmt_cache_disk_use               | 0           | | Binlog_stmt_cache_use                    | 0           | | Bytes_received                           | 614         | | Bytes_sent                               | 33178       | Mytop Mytop is a command-line utility inspired by the Linux top command. Mytop retrieves data from the MySql SHOW PROCESSLIST command and the SHOW STATUS command. Data from these commands is refreshed, processed, and displayed in the output of the Mytop command. The Mytop output includes a header, which contains summary data followed by a thread section. Mytop header section Here is an example of the header output from the Mytop command: MySQL on localhost (5.5.46)                                                                                                                    load 1.01 0.85 0.79 4/538 23573 up 5+02:19:24 [14:35:24]  Queries: 3.9M     qps:    9 Slow:     0.0         Se/In/Up/De(%):    49/00/08/00  Sorts:     0 qps now:   10 Slow qps: 0.0  Threads:   30 (   1/   4) 40/00/12/00  Cache Hits: 822.0 Hits/s:  0.0 Hits now:   0.0  Ratio:  0.0%  Ratio now:  0.0%  Key Efficiency: 97.3%  Bps in/out:  1.7k/ 3.1k   Now in/out:  1.0k/ 3.9k As demonstrated in the preceding output, the header section for the Mytop command includes the following information: The hostname and MySQL version The server load The MySQL server uptime The total number of queries The average number of queries Slow queries The percentage of Select, Insert, Update, and Delete queries Queries per second Threads Cache hits Key efficiency Mytop thread section They Mytop thread section will list as many threads as it can display. The threads are ordered by the Time column, which displays the threads' idle time:        Id      User         Host/IP         DB       Time    Cmd    State Query                                                                                                                                --      ----         -------         --       ----    ---    ----- ----------                                                                                                                         3461   neutron  174.143.201.98    neutron   5680  Sleep                                                                                                                                            3477    glance  174.143.201.98     glance   1480  Sleep                                                                                                                                            3491      nova  174.143.201.98     nova      880  Sleep                                                                                                                                             3512      nova  174.143.201.98    nova      281  Sleep                                                                                                                                             3487  keystone  174.143.201.98   keystone        280  Sleep                                                                                                                                             3489    glance  174.143.201.98     glance        280  Sleep                                                                                                                                            3511  keystone  174.143.201.98   keystone        280  Sleep                                                                                                                                            3513   neutron  174.143.201.98    neutron        280  Sleep                                                                                                                                             3505  keystone  174.143.201.98   keystone        279  Sleep                                                                                                                                             3514  keystone  174.143.201.98   keystone        141  Sleep                                                                                                                                            ... The Mytop thread section displays the ID of each thread followed by the user and host. Finally, this section will display the database, idle time, and state or command query. Mytop will allow you to keep an eye on the performance of your MySql database server. Percona toolkit The Percona Toolkit is a very useful set of command-line tools for performing MySQL operations and system tasks. The toolkit can be downloaded from Percona at https://www.percona.com/downloads/percona-toolkit/. The output from these tools can be fed into your monitoring system allowing you to effectively monitor your MyQL installation. Postgres Like MySQL, the Postgres database also has a series of tools, which can be leveraged to monitor database performance. In addition to standard Linux troubleshooting tools, such as top and ps, Postgres also offers its own collection of statistics. The PostgreSQL statistics collector The statistics collector in Postgres allows you to collect data about server activity. The statistics collected in this tool are varied and may be helpful for troubleshooting or system monitoring. In order to leverage the statistics collector, you must turn on the functionality in the postgresql.conf file. The settings are commented out by default in the RUNTIME STATISTICS section of the configuration file. Uncomment the lines in the Query/Index Statistics Collector subsection. #------------------------------------------------------------------------------ # RUNTIME STATISTICS #------------------------------------------------------------------------------   # - Query/Index Statistics Collector -   track_activities = on track_counts = on track_io_timing = off track_functions = none                 # none, pl, all track_activity_query_size = 1024       # (change requires restart) update_process_title = on stats_temp_directory = 'pg_stat_tmp' Once the statistics collector is configured, restart the database server or execute a pg_ctl reload for the configuration to take effect. Once the collector has been configured, there will be a series of views created and named with the prefix “pg_stat”. These views can be queried for relevant statistics in the Posgres database server. Database bckups A diligent operator will ensure that a backup of the database for each OpenStack project is created. Since most OpenStack services make heavy use of the database for persisting things such as states and metadata, a corruption or loss of data can render your OpenStack cloud unusable. The current database backups can help rescue you from this fate. MySQL users can use the mysqldump utility to back up all of the OpenStack datbases. mysqldump --opt --all-databases > all_openstack_dbs.sql Similarly, Postgres users can back up all OpenStack databases with a command similar to the following: pg_dumpall > all_openstack_dbs.sql Your cadence for backups will depend on your environment and tolerance for data corruption of loss. You should store these backups in a safe place and occasional deploy test restores from the data to ensure that they work as expected. Monitoring Monitoring is often your early warning system that something is going wrong in your cluster. Your monitoring system can also be a rich source of information when the time comes to troubleshoot issues with the cluster. There are multiple options available for monitoring OpenStack. Many of your current application monitoring platforms will handle OpenStack just as well as any other Linux system. Regardless of the tool you select to for monitoring, there are several parts of OpenStack that you should focus on. Resource monitoring OpenStack is typically deployed on a series of Linux servers. Monitoring the resources on those servers is essential. A set it and forget it attitude is a recipe for disaster. The things you may want to monitor on your host servers include the following: CPU Disk Memory Log file size Network I/O Database Message broker OpenStack qotas OpenStack operators have the option to set usage quotas for each tenant/project. As an administrator, it is helpful to monitor a project’s usage as it pertains to these quotas. Once users reach a quota, they may not be able to deploy additional resources. Users may misinterpret this as an error in the system and report it to you . By keeping an eye on the quotas, your can proactively warn users as they reach their thresholds or you can decide to increase the quotas as appropriate. Some of the services have client commands that can be used to retrieve quota statistics. As an example, we demonstrate the nova absolute-limits command here: nova absolute-limits +--------------------+------+-------+ | Name               | Used | Max   | +--------------------+------+-------+ | Cores              | 1    | 20    | | FloatingIps        | 0    | 10    | | ImageMeta          | -    | 128   | | Instances          | 1    | 10    | | Keypairs           | -    | 100   | | Personality        | -    | 5     | | Personality Size   | -    | 10240 | | RAM                | 512  | 51200 | | SecurityGroupRules | -    | 20    | | SecurityGroups     | 1    | 10    | | Server Meta        | -    | 128   | | ServerGroupMembers | -    | 10    | | ServerGroups       | 0    | 10    | +--------------------+------+-------+ The absolute-limits command in Nova is nice because it displays the project’s current usage alongside the quota maximum, making it easy to notice that a project/tenant is coming close to the limit. RabbitMQ RabbitMQ is the default message broker used in OpenStack installations. However, if it is installed as is out the box, it can become a single point of failure. Administrators should consider clustering RabbitMQ and activating mirrored queues. Summary OpenStack is the leading open source software for running private clouds. Its popularity has grown exponentially since it was founded by Rackspace and NASA. The output of this engaged community is staggering, resulting in plenty of new features finding their way into OpenStack with each release. The project is at a size now where no one can truly know the details of each service. When working with such a complex project, it is inevitable that you will run into problems, bugs, errors, issues, and plain old trouble. Resources for Article:   Further resources on this subject: Concepts for OpenStack [article] Implementing OpenStack Networking and Security [article] Using the OpenStack Dashboard [article]
Read more
  • 0
  • 0
  • 12568

article-image-putting-fun-functional-python
Packt
17 Feb 2016
21 min read
Save for later

Putting the Fun in Functional Python

Packt
17 Feb 2016
21 min read
Functional programming defines a computation using expressions and evaluation—often encapsulated in function definitions. It de-emphasizes or avoids the complexity of state change and mutable objects. This tends to create programs that are more succinct and expressive. In this article, we'll introduce some of the techniques that characterize functional programming. We'll identify some of the ways to map these features to Python. Finally, we'll also address some ways in which the benefits of functional programming accrue when we use these design patterns to build Python applications. Python has numerous functional programming features. It is not a purely functional programming language. It offers enough of the right kinds of features that it confers to the benefits of functional programming. It also retains all optimization power available from an imperative programming language. We'll also look at a problem domain that we'll use for many of the examples in this book. We'll try to stick closely to Exploratory Data Analysis (EDA) because its algorithms are often good examples of functional programming. Furthermore, the benefits of functional programming accrue rapidly in this problem domain. Our goal is to establish some essential principles of functional programming. We'll focus on Python 3 features in this book. However, some of the examples might also work in Python 2. (For more resources related to this topic, see here.) Identifying a paradigm It's difficult to be definitive on what fills the universe of programming paradigms. For our purposes, we will distinguish between just two of the many programming paradigms: Functional programming and Imperative programming. One important distinguishing feature between these two is the concept of state. In an imperative language, like Python, the state of the computation is reflected by the values of the variables in the various namespaces. The values of the variables establish the state of a computation; each kind of statement makes a well-defined change to the state by adding or changing (or even removing) a variable. A language is imperative because each statement is a command, which changes the state in some way. Our general focus is on the assignment statement and how it changes state. Python has other statements, such as global or nonlocal, which modify the rules for variables in a particular namespace. Statements like def, class, and import change the processing context. Other statements like try, except, if, elif, and else act as guards to modify how a collection of statements will change the computation's state. Statements like for and while, similarly, wrap a block of statements so that the statements can make repeated changes to the state of the computation. The focus of all these various statement types, however, is on changing the state of the variables. Ideally, each statement advances the state of the computation from an initial condition toward the desired final outcome. This "advances the computation" assertion can be challenging to prove. One approach is to define the final state, identify a statement that will establish this final state, and then deduce the precondition required for this final statement to work. This design process can be iterated until an acceptable initial state is derived. In a functional language, we replace state—the changing values of variables—with a simpler notion of evaluating functions. Each function evaluation creates a new object or objects from existing objects. Since a functional program is a composition of a function, we can design lower-level functions that are easy to understand, and we will design higher-level compositions that can also be easier to visualize than a complex sequence of statements. Function evaluation more closely parallels mathematical formalisms. Because of this, we can often use simple algebra to design an algorithm, which clearly handles the edge cases and boundary conditions. This makes us more confident that the functions work. It also makes it easy to locate test cases for formal unit testing. It's important to note that functional programs tend to be relatively succinct, expressive, and efficient when compared to imperative (object-oriented or procedural) programs. The benefit isn't automatic; it requires a careful design. This design effort is often easier than functionally similar procedural programming. Subdividing the procedural paradigm We can subdivide imperative languages into a number of discrete categories. In this section, we'll glance quickly at the procedural versus object-oriented distinction. What's important here is to see how object-oriented programming is a subset of imperative programming. The distinction between procedural and object-orientation doesn't reflect the kind of fundamental difference that functional programming represents. We'll use code examples to illustrate the concepts. For some, this will feel like reinventing a wheel. For others, it provides a concrete expression of abstract concepts. For some kinds of computations, we can ignore Python's object-oriented features and write simple numeric algorithms. For example, we might write something like the following to get the range of numbers: s = 0 for n in range(1, 10): if n % 3 == 0 or n % 5 == 0: s += n print(s) We've made this program strictly procedural, avoiding any explicit use of Python's object features. The program's state is defined by the values of the variables s and n. The variable, n, takes on values such that 1 ≤ n < 10. As the loop involves an ordered exploration of values of n, we can prove that it will terminate when n == 10. Similar code would work in C or Java using their primitive (non-object) data types. We can exploit Python's Object-Oriented Programming (OOP) features and create a similar program: m = list() for n in range(1, 10): if n % 3 == 0 or n % 5 == 0: m.append(n) print(sum(m)) This program produces the same result but it accumulates a stateful collection object, m, as it proceeds. The state of the computation is defined by the values of the variables m and n. The syntax of m.append(n) and sum(m) can be confusing. It causes some programmers to insist (wrongly) that Python is somehow not purely Object-oriented because it has a mixture of the function()and object.method() syntax. Rest assured, Python is purely Object-oriented. Some languages, like C++, allow the use of primitive data type such as int, float, and long, which are not objects. Python doesn't have these primitive types. The presence of prefix syntax doesn't change the nature of the language. To be pedantic, we could fully embrace the object model, the subclass, the list class, and add a sum method: class SummableList(list): def sum( self ): s= 0 for v in self.__iter__(): s += v return s If we initialize the variable, m, with the SummableList() class instead of the list() method, we can use the m.sum() method instead of the sum(m) method. This kind of change can help to clarify the idea that Python is truly and completely object-oriented. The use of prefix function notation is purely syntactic sugar. All three of these examples rely on variables to explicitly show the state of the program. They rely on the assignment statements to change the values of the variables and advance the computation toward completion. We can insert the assert statements throughout these examples to demonstrate that the expected state changes are implemented properly. The point is not that imperative programming is broken in some way. The point is that functional programming leads to a change in viewpoint, which can, in many cases, be very helpful. We'll show a function view of the same algorithm. Functional programming doesn't make this example dramatically shorter or faster. Using the functional paradigm In a functional sense, the sum of the multiples of 3 and 5 can be defined in two parts: The sum of a sequence of numbers A sequence of values that pass a simple test condition, for example, being multiples of three and five The sum of a sequence has a simple, recursive definition: def sum(seq): if len(seq) == 0: return 0 return seq[0] + sum(seq[1:]) We've defined the sum of a sequence in two cases: the base case states that the sum of a zero length sequence is 0, while the recursive case states that the sum of a sequence is the first value plus the sum of the rest of the sequence. Since the recursive definition depends on a shorter sequence, we can be sure that it will (eventually) devolve to the base case. The + operator on the last line of the preceeding example and the initial value of 0 in the base case characterize the equation as a sum. If we change the operator to * and the initial value to 1, it would just as easily compute a product. Similarly, a sequence of values can have a simple, recursive definition, as follows: def until(n, filter_func, v): if v == n: return [] if filter_func(v): return [v] + until( n, filter_func, v+1 ) else: return until(n, filter_func, v+1) In this function, we've compared a given value, v, against the upper bound, n. If v reaches the upper bound, the resulting list must be empty. This is the base case for the given recursion. There are two more cases defined by the given filter_func() function. If the value of v is passed by the filter_func() function, we'll create a very small list, containing one element, and append the remaining values of the until() function to this list. If the value of v is rejected by the filter_func() function, this value is ignored and the result is simply defined by the remaining values of the until() function. We can see that the value of v will increase from an initial value until it reaches n, assuring us that we'll reach the base case soon. Here's how we can use the until() function to generate the multiples of 3 or 5. First, we'll define a handy lambda object to filter values: mult_3_5= lambda x: x%3==0 or x%5==0 (We will use lambdas to emphasize succinct definitions of simple functions. Anything more complex than a one-line expression requires the def statement.) We can see how this lambda works from the command prompt in the following example: >>> mult_3_5(3) True >>> mult_3_5(4) False >>> mult_3_5(5) True This function can be used with the until() function to generate a sequence of values, which are multiples of 3 or 5. The until() function for generating a sequence of values works as follows: >>> until(10, lambda x: x%3==0 or x%5==0, 0) [0, 3, 5, 6, 9] We can use our recursive sum() function to compute the sum of this sequence of values. The various functions, such as sum(), until(), and mult_3_5() are defined as simple recursive functions. The values are computed without restoring to use intermediate variables to store state. We'll return to the ideas behind this purely functional recursive function definition in several places. It's important to note here that many functional programming language compilers can optimize these kinds of simple recursive functions. Python can't do the same optimizations. Using a functional hybrid We'll continue this example with a mostly functional version of the previous example to compute the sum of the multiples of 3 and 5. Our hybrid functional version might look like the following: print( sum(n for n in range(1, 10) if n%3==0 or n%5==0) ) We've used nested generator expressions to iterate through a collection of values and compute the sum of these values. The range(1, 10) method is an iterable and, consequently, a kind of generator expression; it generates a sequence of values . The more complex expression, n for n in range(1, 10) if n%3==0 or n%5==0, is also an iterable expression. It produces a set of values . A variable, n, is bound to each value, more as a way of expressing the contents of the set than as an indicator of the state of the computation. The sum() function consumes the iterable expression, creating a final object, 23. The bound variable doesn't change once a value is bound to it. The variable, n, in the loop is essentially a shorthand for the values available from the range() function. The if clause of the expression can be extracted into a separate function, allowing us to easily repurpose this with other rules. We could also use a higher-order function named filter() instead of the if clause of the generator expression. As we work with generator expressions, we'll see that the bound variable is at the blurry edge of defining the state of the computation. The variable, n, in this example isn't directly comparable to the variable, n, in the first two imperative examples. The for statement creates a proper variable in the local namespace. The generator expression does not create a variable in the same way as a for statement does: >>> sum( n for n in range(1, 10) if n%3==0 or n%5==0 ) 23 >>> n Traceback (most recent call last): File "<stdin>", line 1, in <module> NameError: name 'n' is not defined Because of the way Python uses namespaces, it might be possible to write a function that can observe the n variable in a generator expression. However, we won't. Our objective is to exploit the functional features of Python, not to detect how those features have an object-oriented implementation under the hood. Looking at object creation In some cases, it might help to look at intermediate objects as a history of the computation. What's important is that the history of a computation is not fixed. When functions are commutative or associative, then changes to the order of evaluation might lead to different objects being created. This might have performance improvements with no changes to the correctness of the results. Consider this expression: >>> 1+2+3+4 10 We are looking at a variety of potential computation histories with the same result. Because the + operator is commutative and associative, there are a large number of candidate histories that lead to the same result. Of the candidate sequences, there are two important alternatives, which are as follows: >>> ((1+2)+3)+4 10 >>> 1+(2+(3+4)) 10 In the first case, we fold in values working from left to right. This is the way Python works implicitly. Intermediate objects 3 and 6 are created as part of this evaluation. In the second case, we fold from right-to-left. In this case, intermediate objects 7 and 9 are created. In the case of simple integer arithmetic, the two results have identical performance; there's no optimization benefit. When we work with something like the list append, we might see some optimization improvements when we change the association rules. Here's a simple example: >>> import timeit >>> timeit.timeit("((([]+[1])+[2])+[3])+[4]") 0.8846941249794327 >>> timeit.timeit("[]+([1]+([2]+([3]+[4])))") 1.0207440659869462 In this case, there's some benefit in working from left to right. What's important for functional design is the idea that the + operator (or add() function) can be used in any order to produce the same results. The + operator has no hidden side effects that restrict the way this operator can be used. The stack of turtles When we use Python for functional programming, we embark down a path that will involve a hybrid that's not strictly functional. Python is not Haskell, OCaml, or Erlang. For that matter, our underlying processor hardware is not functional; it's not even strictly object-oriented—CPUs are generally procedural. All programming languages rest on abstractions, libraries, frameworks and virtual machines. These abstractions, in turn, may rely on other abstractions, libraries, frameworks and virtual machines. The most apt metaphor is this: the world is carried on the back of a giant turtle. The turtle stands on the back of another giant turtle. And that turtle, in turn, is standing on the back of yet another turtle. It's turtles all the way down.                                                                                                             – Anonymous
There's no practical end to the layers of abstractions. More importantly, the presence of abstractions and virtual machines doesn't materially change our approach to designing software to exploit the functional programming features of Python. Even within the functional programming community, there are more pure and less pure functional programming languages. Some languages make extensive use of monads to handle stateful things like filesystem input and output. Other languages rely on a hybridized environment that's similar to the way we use Python. We write software that's generally functional with carefully chosen procedural exceptions. Our functional Python programs will rely on the following three stacks of abstractions: Our applications will be functions—all the way down—until we hit the objects The underlying Python runtime environment that supports our functional programming is objects—all the way down—until we hit the turtles The libraries that support Python are a turtle on which Python stands The operating system and hardware form their own stack of turtles. These details aren't relevant to the problems we're going to solve. A classic example of functional programming As part of our introduction, we'll look at a classic example of functional programming. This is based on the classic paper Why Functional Programming Matters by John Hughes. The article appeared in a paper called Research Topics in Functional Programming, edited by D. Turner, published by Addison-Wesley in 1990. Here's a link to the paper Research Topics in Functional Programming: http://www.cs.kent.ac.uk/people/staff/dat/miranda/whyfp90.pdf This discussion of functional programming in general is profound. There are several examples given in the paper. We'll look at just one: the Newton-Raphson algorithm for locating the roots of a function. In this case, the function is the square root. It's important because many versions of this algorithm rely on the explicit state managed via loops. Indeed, the Hughes paper provides a snippet of the Fortran code that emphasizes stateful, imperative processing. The backbone of this approximation is the calculation of the next approximation from the current approximation. The next_() function takes x, an approximation to the sqrt(n) method and calculates a next value that brackets the proper root. Take a look at the following example: def next_(n, x): return (x+n/x)/2 This function computes a series of values . The distance between the values is halved each time, so they'll quickly get to converge on the value such that, which means . We don't want to call the method next() because this name would collide with a built-in function. We call it the next_() method so that we can follow the original presentation as closely as possible. Here's how the function looks when used in the command prompt: >>> n= 2 >>> f= lambda x: next_(n, x) >>> a0= 1.0 >>> [ round(x,4) for x in (a0, f(a0), f(f(a0)), f(f(f(a0))),) ] [1.0, 1.5, 1.4167, 1.4142] We've defined the f() method as a lambda that will converge on . We started with 1.0 as the initial value for . Then we evaluated a sequence of recursive evaluations: , and so on. We evaluated these functions using a generator expression so that we could round off each value. This makes the output easier to read and easier to use with doctest. The sequence appears to converge rapidly on . We can write a function, which will (in principle) generate an infinite sequence of values converging on the proper square root: def repeat(f, a): yield a for v in repeat(f, f(a)): yield v This function will generate approximations using a function, f(), and an initial value, a. If we provide the next_() function defined earlier, we'll get a sequence of approximations to the square root of the n argument. The repeat() function expects the f() function to have a single argument, however, our next_() function has two arguments. We can use a lambda object, lambda x: next_(n, x), to create a partial version of the next_() function with one of two variables bound. The Python generator functions can't be trivially recursive, they must explicitly iterate over the recursive results, yielding them individually. Attempting to use a simple return repeat(f, f(a)) will end the iteration, returning a generator expression instead of yielding the sequence of values. We have two ways to return all the values instead of returning a generator expression, which are as follows: We can write an explicit for loop as follows: for x in some_iter: yield x. We can use the yield from statement as follows: yield from some_iter. Both techniques of yielding the values of a recursive generator function are equivalent. We'll try to emphasize yield from. In some cases, however, the yield with a complex expression will be more clear than the equivalent mapping or generator expression. Of course, we don't want the entire infinite sequence. We will stop generating values when two values are so close to each other that we can call either one the square root we're looking for. The common symbol for the value, which is close enough, is the Greek letter Epsilon, ε, which can be thought of as the largest error we will tolerate. In Python, we'll have to be a little clever about taking items from an infinite sequence one at a time. It works out well to use a simple interface function that wraps a slightly more complex recursion. Take a look at the following code snippet: def within(ε, iterable): def head_tail(ε, a, iterable): b= next(iterable) if abs(a-b) <= ε: return b return head_tail(ε, b, iterable) return head_tail(ε, next(iterable), iterable) We've defined an internal function, head_tail(), which accepts the tolerance, ε, an item from the iterable sequence, a, and the rest of the iterable sequence, iterable. The next item from the iterable bound to a name b. If , then the two values that are close enough together that we've found the square root. Otherwise, we use the b value in a recursive invocation of the head_tail() function to examine the next pair of values. Our within() function merely seeks to properly initialize the internal head_tail() function with the first value from the iterable parameter. Some functional programming languages offer a technique that will put a value back into an iterable sequence. In Python, this might be a kind of unget() or previous() method that pushes a value back into the iterator. Python iterables don't offer this kind of rich functionality. We can use the three functions next_(), repeat(), and within() to create a square root function, as follows: def sqrt(a0, ε, n): return within(ε, repeat(lambda x: next_(n,x), a0)) We've used the repeat() function to generate a (potentially) infinite sequence of values based on the next_(n,x) function. Our within() function will stop generating values in the sequence when it locates two values with a difference less than ε. When we use this version of the sqrt() method, we need to provide an initial seed value, a0, and an ε value. An expression like sqrt(1.0, .0001, 3) will start with an approximation of 1.0 and compute the value of to within 0.0001. For most applications, the initial a0 value can be 1.0. However, the closer it is to the actual square root, the more rapidly this method converges. The original example of this approximation algorithm was shown in the Miranda language. It's easy to see that there are few profound differences between Miranda and Python. The biggest difference is Miranda's ability to construct cons, a value back into an iterable, doing a kind of unget. This parallelism between Miranda and Python gives us confidence that many kinds of functional programming can be easily done in Python. Summary We've looked at programming paradigms with an eye toward distinguishing the functional paradigm from two common imperative paradigms in details. For more information kindly take a look at the following books, also by Packt Publishing: Learning Python (https://www.packtpub.com/application-development/learning-python) Mastering Python (https://www.packtpub.com/application-development/mastering-python) Mastering Object-oriented Python (https://www.packtpub.com/application-development/mastering-object-oriented-python) Resources for Article: Further resources on this subject: Saying Hello to Unity and Android [article] Using Specular in Unity [article] Unity 3.x Scripting-Character Controller versus Rigidbody [article]
Read more
  • 0
  • 1
  • 6081

article-image-developing-basic-site-nodejs-and-express
Packt
17 Feb 2016
21 min read
Save for later

Developing a Basic Site with Node.js and Express

Packt
17 Feb 2016
21 min read
In this article, we will continue with the Express framework. It's one of the most popular frameworks available and is certainly a pioneering one. Express is still widely used and several developers use it as a starting point. (For more resources related to this topic, see here.) Getting acquainted with Express Express (http://expressjs.com/) is a web application framework for Node.js. It is built on top of Connect (http://www.senchalabs.org/connect/), which means that it implements middleware architecture. In the previous chapter, when exploring Node.js, we discovered the benefit of such a design decision: the framework acts as a plugin system. Thus, we can say that Express is suitable for not only simple but also complex applications because of its architecture. We may use only some of the popular types of middleware or add a lot of features and still keep the application modular. In general, most projects in Node.js perform two functions: run a server that listens on a specific port, and process incoming requests. Express is a wrapper for these two functionalities. The following is basic code that runs the server: var http = require('http'); http.createServer(function (req, res) { res.writeHead(200, {'Content-Type': 'text/plain'}); res.end('Hello Worldn'); }).listen(1337, '127.0.0.1'); console.log('Server running at http://127.0.0.1:1337/'); var http = require('http'); http.createServer(function (req, res) { res.writeHead(200, {'Content-Type': 'text/plain'}); res.end('Hello Worldn'); }).listen(1337, '127.0.0.1'); console.log('Server running at http://127.0.0.1:1337/'); This is an example extracted from the official documentation of Node.js. As shown, we use the native module http and run a server on the port 1337. There is also a request handler function, which simply sends the Hello world string to the browser. Now, let's implement the same thing but with the Express framework, using the following code: var express = require('express'); var app = express(); app.get("/", function(req, res, next) { res.send("Hello world"); }).listen(1337); console.log('Server running at http://127.0.0.1:1337/'); It's pretty much the same thing. However, we don't need to specify the response headers or add a new line at the end of the string because the framework does it for us. In addition, we have a bunch of middleware available, which will help us process the requests easily. Express is like a toolbox. We have a lot of tools to do the boring stuff, allowing us to focus on the application's logic and content. That's what Express is built for: saving time for the developer by providing ready-to-use functionalities. Installing Express There are two ways to install Express. We'll will start with the simple one and then proceed to the more advanced technique. The simpler approach generates a template, which we may use to start writing the business logic directly. In some cases, this can save us time. From another viewpoint, if we are developing a custom application, we need to use custom settings. We can also use the boilerplate, which we get with the advanced technique; however, it may not work for us. Using package.json Express is like every other module. It has its own place in the packages register. If we want to use it, we need to add the framework in the package.json file. The ecosystem of Node.js is built on top of the Node Package Manager. It uses the JSON file to find out what we need and installs it in the current directory. So, the content of our package.json file looks like the following code: { "name": "projectname", "description": "description", "version": "0.0.1", "dependencies": { "express": "3.x" } } These are the required fields that we have to add. To be more accurate, we have to say that the mandatory fields are name and version. However, it is always good to add descriptions to our modules, particularly if we want to publish our work in the registry, where such information is extremely important. Otherwise, the other developers will not know what our library is doing. Of course, there are a bunch of other fields, such as contributors, keywords, or development dependencies, but we will stick to limited options so that we can focus on Express. Once we have our package.json file placed in the project's folder, we have to call npm install in the console. By doing so, the package manager will create a node_modules folder and will store Express and its dependencies there. At the end of the command's execution, we will see something like the following screenshot: The first line shows us the installed version, and the proceeding lines are actually modules that Express depends on. Now, we are ready to use Express. If we type require('express'), Node.js will start looking for that library inside the local node_modules directory. Since we are not using absolute paths, this is normal behavior. If we miss running the npm install command, we will be prompted with Error: Cannot find module 'express'. Using a command-line tool There is a command-line instrument called express-generator. Once we run npm install -g express-generator, we will install and use it as every other command in our terminal. If you use the framework inseveral projects, you will notice that some things are repeated. We can even copy and paste them from one application to another, and this is perfectly fine. We may even end up with our own boiler plate and can always start from there. The command-line version of Express does the same thing. It accepts few arguments and based on them, creates a skeleton for use. This can be very handy in some cases and will definitely save some time. Let's have a look at the available arguments: -h, --help: This signifies output usage information. -V, --version: This shows the version of Express. -e, --ejs: This argument adds the EJS template engine support. Normally, we need a library to deal with our templates. Writing pure HTML is not very practical. The default engine is set to JADE. -H, --hogan: This argument is Hogan-enabled (another template engine). -c, --css: If wewant to use the CSS preprocessors, this option lets us use LESS(short forLeaner CSS) or Stylus. The default is plain CSS. -f, --force: This forces Express to operate on a nonempty directory. Let's try to generate an Express application skeleton with LESS as a CSS preprocessor. We use the following line of command: express --css less myapp A new myapp folder is created with the file structure, as seen in the following screenshot: We still need to install the dependencies, so cd myapp && npm install is required. We will skip the explanation of the generated directories for now and will move to the created app.js file. It starts with initializing the module dependencies, as follows: var express = require('express'); var path = require('path'); var favicon = require('static-favicon'); var logger = require('morgan'); var cookieParser = require('cookie-parser'); var bodyParser = require('body-parser'); var routes = require('./routes/index'); var users = require('./routes/users'); var app = express(); Our framework is express, and path is a native Node.js module. The middleware are favicon, logger, cookieParser, and bodyParser. The routes and users are custom-made modules, placed in local for the project folders. Similarly, as in the Model-View-Controller(MVC) pattern, these are the controllers for our application. Immediately after, an app variable is created; this represents the Express library. We use this variable to configure our application. The script continues by setting some key-value pairs. The next code snippet defines the path to our views and the default template engine: app.set('views', path.join(__dirname, 'views')); app.set('view engine', 'jade'); The framework uses the methods set and get to define the internal properties. In fact, we may use these methods to define our own variables. If the value is a Boolean, we can replace set and get with enable and disable. For example, see the following code: app.set('color', 'red'); app.get('color'); // red app.enable('isAvailable'); The next code adds middleware to the framework. Wecan see the code as follows: app.use(favicon()); app.use(logger('dev')); app.use(bodyParser.json()); app.use(bodyParser.urlencoded()); app.use(cookieParser()); app.use(require('less-middleware')({ src: path.join(__dirname, 'public') })); app.use(express.static(path.join(__dirname, 'public'))); The first middleware serves as the favicon of our application. The second is responsible for the output in the console. If we remove it, we will not get information about the incoming requests to our server. The following is a simple output produced by logger: GET / 200 554ms - 170b GET /stylesheets/style.css 200 18ms - 110b The json and urlencoded middleware are related to the data sent along with the request. We need them because they convert the information in an easy-to-use format. There is also a middleware for the cookies. It populates the request object, so we later have access to the required data. The generated app uses LESS as a CSS preprocessor, and we need to configure it by setting the directory containing the .less files. Eventually, we define our static resources, which should be delivered by the server. These are just few lines, but we've configured the whole application. We may remove or replace some of the modules, and the others will continue working. The next code in the file maps two defined routes to two different handlers, as follows: app.use('/', routes); app.use('/users', users); If the user tries to open a missing page, Express still processes the request by forwarding it to the error handler, as follows: app.use(function(req, res, next) { var err = new Error('Not Found'); err.status = 404; next(err); }); The framework suggests two types of error handling:one for the development environment and another for the production server. The difference is that the second one hides the stack trace of the error, which should be visible only for the developers of the application. As we can see in the following code, we are checking the value of the env property and handling the error differently: // development error handler if (app.get('env') === 'development') { app.use(function(err, req, res, next) { res.status(err.status || 500); res.render('error', { message: err.message, error: err }); }); } // production error handler app.use(function(err, req, res, next) { res.status(err.status || 500); res.render('error', { message: err.message, error: {} }); }); At the end, the app.js file exports the created Express instance, as follows: module.exports = app; To run the application, we need to execute node ./bin/www. The code requires app.js and starts the server, which by default listens on port 3000. #!/usr/bin/env node var debug = require('debug')('my-application'); var app = require('../app'); app.set('port', process.env.PORT || 3000); var server = app.listen(app.get('port'), function() { debug('Express server listening on port ' + server.address().port); }); The process.env declaration provides an access to variables defined in the current development environment. If there is no PORT setting, Express uses 3000 as the value. The required debug module uses a similar approach to find out whether it has to show messages to the console. Managing routes The input of our application is the routes. The user visits our page at a specific URL and we have to map this URL to a specific logic. In the context of Express, this can be done easily, as follows: var controller = function(req, res, next) { res.send("response"); } app.get('/example/url', controller); We even have control over the HTTP's method, that is, we are able to catch POST, PUT, or DELETE requests. This is very handy if we want to retain the address path but apply a different logic. For example, see the following code: var getUsers = function(req, res, next) { // ... } var createUser = function(req, res, next) { // ... } app.get('/users', getUsers); app.post('/users', createUser); The path is still the same, /users, but if we make a POST request to that URL, the application will try to create a new user. Otherwise, if the method is GET, it will return a list of all the registered members. There is also a method, app.all, which we can use to handle all the method types at once. We can see this method in the following code snippet: app.all('/', serverHomePage); There is something interesting about the routing in Express. We may pass not just one but many handlers. This means that we can create a chain of functions that correspond to one URL. For example, it we need to know if the user is logged in, there is a module for that. We can add another method that validates the current user and attaches a variable to the request object, as follows: var isUserLogged = function(req, res, next) { req.userLogged = Validator.isCurrentUserLogged(); next(); } var getUser = function(req, res, next) { if(req.userLogged) { res.send("You are logged in. Hello!"); } else { res.send("Please log in first."); } } app.get('/user', isUserLogged, getUser); The Validator class is a class that checks the current user's session. The idea is simple: we add another handler, which acts as an additional middleware. After performing the necessary actions, we call the next function, which passes the flow to the next handler, getUser. Because the request and response objects are the same for all the middlewares, we have access to the userLogged variable. This is what makes Express really flexible. There are a lot of great features available, but they are optional. At the end of this chapter, we will make a simple website that implements the same logic. Handling dynamic URLs and the HTML forms The Express framework also supports dynamic URLs. Let's say we have a separate page for every user in our system. The address to those pages looks like the following code: /user/45/profile Here, 45 is the unique number of the user in our database. It's of course normal to use one route handler for this functionality. We can't really define different functions for every user. The problem can be solved by using the following syntax: var getUser = function(req, res, next) { res.send("Show user with id = " + req.params.id); } app.get('/user/:id/profile', getUser); The route is actually like a regular expression with variables inside. Later, that variable is accessible in the req.params object. We can have more than one variable. Here is a slightly more complex example: var getUser = function(req, res, next) { var userId = req.params.id; var actionToPerform = req.params.action; res.send("User (" + userId + "): " + actionToPerform) } app.get('/user/:id/profile/:action', getUser); If we open http://localhost:3000/user/451/profile/edit, we see User (451): edit as a response. This is how we can get a nice looking, SEO-friendly URL. Of course, sometimes we need to pass data via the GET or POST parameters. We may have a request like http://localhost:3000/user?action=edit. To parse it easily, we need to use the native url module, which has few helper functions to parse URLs: var getUser = function(req, res, next) { var url = require('url'); var url_parts = url.parse(req.url, true); var query = url_parts.query; res.send("User: " + query.action); } app.get('/user', getUser); Once the module parses the given URL, our GET parameters are stored in the .query object. The POST variables are a bit different. We need a new middleware to handle that. Thankfully, Express has one, which is as follows: app.use(express.bodyParser()); var getUser = function(req, res, next) { res.send("User: " + req.body.action); } app.post('/user', getUser); The express.bodyParser() middleware populates the req.body object with the POST data. Of course, we have to change the HTTP method from .get to .post or .all. If we want to read cookies in Express, we may use the cookieParser middleware. Similar to the body parser, it should also be installed and added to the package.json file. The following example sets the middleware and demonstrates its usage: var cookieParser = require('cookie-parser'); app.use(cookieParser('optional secret string')); app.get('/', function(req, res, next){ var prop = req.cookies.propName }); Returning a response Our server accepts requests, does some stuff, and finally, sends the response to the client's browser. This can be HTML, JSON, XML, or binary data, among others. As we know, by default, every middleware in Express accepts two objects, request and response. The response object has methods that we can use to send an answer to the client. Every response should have a proper content type or length. Express simplifies the process by providing functions to set HTTP headers and sending content to the browser. In most cases, we will use the .send method, as follows: res.send("simple text"); When we pass a string, the framework sets the Content-Type header to text/html. It's great to know that if we pass an object or array, the content type is application/json. If we develop an API, the response status code is probably going to be important for us. With Express, we are able to set it like in the following code snippet: res.send(404, 'Sorry, we cannot find that!'); It's even possible to respond with a file from our hard disk. If we don't use the framework, we will need to read the file, set the correct HTTP headers, and send the content. However, Express offers the .sendfile method, which wraps all these operations as follows: res.sendfile(__dirname + "/images/photo.jpg"); Again, the content type is set automatically; this time it is based on the filename's extension. When building websites or applications with a user interface, we normally need to serve an HTML. Sure, we can write it manually in JavaScript, but it's good practice to use a template engine. This means we save everything in external files and the engine reads the markup from there. It populates them with some data and, at the end, provides ready-to-show content. In Express, the whole process is summarized in one method, .render. However, to work properly, we have to instruct the framework regarding which template engine to use. We already talked about this in the beginning of this chapter. The following two lines of code, set the path to our views and the template engine: app.set('views', path.join(__dirname, 'views')); app.set('view engine', 'jade'); Let's say we have the following template ( /views/index.jade ): h1= title p Welcome to #{title} Express provides a method to serve templates. It accepts the path to the template, the data to be applied, and a callback. To render the previous template, we should use the following code: res.render("index", {title: "Page title here"}); The HTML produced looks as follows: <h1>Page title here</h1><p>Welcome to Page title here</p> If we pass a third parameter, function, we will have access to the generated HTML. However, it will not be sent as a response to the browser. The example-logging system We've seen the main features of Express. Now let's build something real. The next few pages present a simple website where users can read only if they are logged in. Let's start and set up the application. We are going to use Express' command-line instrument. It should be installed using npm install -g express-generator. We create a new folder for the example, navigate to it via the terminal, and execute express --css less site. A new directory, site, will be created. If we go there and run npm install, Express will download all the required dependencies. As we saw earlier, by default, we have two routes and two controllers. To simplify the example, we will use only the first one: app.use('/', routes). Let's change the views/index.jade file content to the following HTML code: doctype html html head title= title link(rel='stylesheet', href='/stylesheets/style.css') body h1= title hr p That's a simple application using Express. Now, if we run node ./bin/www and open http://127.0.0.1:3000, we will see the page. Jade uses indentation to parse our template. So, we should not mix tabs and spaces. Otherwise, we will get an error. Next, we need to protect our content. We check whether the current user has a session created; if not, a login form is shown. It's the perfect time to create a new middleware. To use sessions in Express, install an additional module: express-session. We need to open our package.json file and add the following line of code: "express-session": "~1.0.0" Once we do that, a quick run of npm install will bring the module to our application. All we have to do is use it. The following code goes to app.js: var session = require('express-session'); app.use(session({ secret: 'app', cookie: { maxAge: 60000 }})); var verifyUser = function(req, res, next) { if(req.session.loggedIn) { next(); } else { res.send("show login form"); } } app.use('/', verifyUser, routes); Note that we changed the original app.use('/', routes) line. The session middleware is initialized and added to Express. The verifyUser function is called before the page rendering. It uses the req.session object, and checks whether there is a loggedIn variable defined and if its value is true. If we run the script again, we will see that the show login form text is shown for every request. It's like this because no code sets the session exactly the way we want it. We need a form where users can type their username and password. We will process the result of the form and if the credentials are correct, the loggedIn variable will be set to true. Let's create a new Jade template, /views/login.jade: doctype html html head title= title link(rel='stylesheet', href='/stylesheets/style.css') body h1= title hr form(method='post') label Username: br input(type='text', name='username') br label Password: br input(type='password', name='password') br input(type='submit') Instead of sending just a text with res.send("show login form"); we should render the new template, as follows: res.render("login", {title: "Please log in."}); We choose POST as the method for the form. So, we need to add the middleware that populates the req.body object with the user's data, as follows: app.use(bodyParser()); Process the submitted username and password as follows: var verifyUser = function(req, res, next) { if(req.session.loggedIn) { next(); } else { var username = "admin", password = "admin"; if(req.body.username === username && req.body.password === password) { req.session.loggedIn = true; res.redirect('/'); } else { res.render("login", {title: "Please log in."}); } } } The valid credentials are set to admin/admin. In a real application, we may need to access a database or get this information from another place. It's not really a good idea to place the username and password in the code; however, for our little experiment, it is fine. The previous code checks whether the passed data matches our predefined values. If everything is correct, it sets the session, after which the user is forwarded to the home page. Once you log in, you should be able to log out. Let's add a link for that just after the content on the index page (views/index.jade ): a(href='/logout') logout Once users clicks on this link, they will be forward to a new page. We just need to create a handler for the new route, remove the session, and forward them to the index page where the login form is reflected. Here is what our logging out handler looks like: // in app.js var logout = function(req, res, next) { req.session.loggedIn = false; res.redirect('/'); } app.all('/logout', logout); Setting loggedIn to false is enough to make the session invalid. The redirect sends users to the same content page they came from. However, this time, the content is hidden and the login form pops up. Summary In this article, we learned about one of most widely used Node.js frameworks, Express. We discussed its fundamentals, how to set it up, and its main characteristics. The middleware architecture, which we mentioned in the previous chapter, is the base of the library and gives us the power to write complex but, at the same time, flexible applications. The example we used was a simple one. We required a valid session to provide page access. However, it illustrates the usage of the body parser middleware and the process of registering the new routes. We also updated the Jade templates and saw the results in the browser. For more information on Node.js Refer to the following URLs: https://www.packtpub.com/web-development/instant-nodejs-starter-instant https://www.packtpub.com/web-development/learning-nodejs-net-developers https://www.packtpub.com/web-development/nodejs-essentials Resources for Article: Further resources on this subject: Writing a Blog Application with Node.js and AngularJS [article] Testing in Node and Hapi [article] Learning Node.js for Mobile Application Development [article]
Read more
  • 0
  • 0
  • 14618
article-image-understanding-docker
Packt
17 Feb 2016
26 min read
Save for later

Understanding Docker

Packt
17 Feb 2016
26 min read
This article will cover the Docker basics that you should already have a pretty good handle on. But if you don't already have the required knowledge at this point, this article will help give you the basics. (For more resources related to this topic, see here.) In this article, we're going to review the following higher level topics with subtopics in each section: Understanding Docker Docker versus typical VMs The Dockerfile and its function Docker networking/linking Docker installers/installation Types of installers and how they operate Controlling your Docker daemon The Kitematic GUI Docker commands Useful commands for Docker, Docker images, and Docker containers Understanding Docker In this section, we will be covering the structure of Docker and the flow of what happens behind the scenes in this world. We will also take a look at Dockerfile and all the magic it can do. Lastly, in this section, we will look at the Docker networking/linking. Difference between Docker and typical VMs First, we must know what exactly Docker is and does. Docker is a container management system that helps easily manage Linux Containers (LXC) in an easier and universal fashion. This lets you create images in virtual environments on your laptop and run commands or operations against them. The actions you do to the containers that you run in these environments locally on your own machine will be the same commands or operations you run against them when they are running in your production environment. This helps in not having to do things differently when you go from a development environment like that on your local machine to a production environment on your server. Now, let's take a look at the differences between Docker containers and the typical virtual machine environments. In the following illustration, we can see the typical Docker setup on the right-hand side versus the typical VM setup on the left-hand side: This illustration gives us a lot of insight into the biggest key benefit of Docker; and that is its no need for a full operating system every time we need to bring up a new container, which cuts down on the overall size of containers. Docker relies on using the host OS's Linux kernel (since almost all the versions of Linux use the standard kernel models) for the OS it was built upon, such as Red Hat, CentOS, Ubuntu, and so on. For this reason, you can have almost any Linux OS as your host operating system (Ubuntu in the previous illustration) and be able to layer other OSes on top of the host. For example, in the earlier illustration, we could have Red Hat running for one app (the one on the left) and Debian running for the other app (the one on the right), but there would never be a need to actually install Red Hat or Debian on the host. Thus, another benefit of Docker is the size of images when they are born. They are not built with the largest piece: the kernel or the operating system. This makes them incredibly small, compact, and easy to ship. Dockerfile Next, let's take a look at the most important file pertaining to Docker: Dockerfile. Dockerfile is the core file that contains instructions to be performed when an image is built. For example, in an Ubuntu-based system, if you want to install the Apache package, you would first do an apt-get update followed by an apt-get install -y apache2. These would be the type of instructions you would find inside a typical Dockerfile. Items such as commands, calls to other scripts, setting environmental variables, adding files, and setting permissions can all be done via Dockerfile. Dockerfile is also where you specify what image is to be used as your base image for the build. Let's take a look at a very basic Dockerfile and then go over the individual pieces that make one up and what they all do: FROM ubuntu:latest MAINTAINER Scott P. Gallagher <email@somewhere.com> RUN apt-get update && apt-get install -y apache2 ADD 000-default.conf /etc/apache2/sites-available/ RUN chown root:root /etc/apache2/sites-available/000-default.conf EXPOSE 80 CMD ["/usr/sbin/apache2ctl", "-D", "FOREGROUND"] These are the typical items you would find in a basic Dockerfile. The first line states the image we want to start off with when we build the container. In this example, we will be using Ubuntu; the item after the colon can be called if you want a specific version of it. In this case, I am just going to say use the latest version of Ubuntu; but you will also specify trusty, precise, raring, and so on. The second line is the line that is relevant to the maintainer of Dockerfile. In this case, I just have my information in there; well, at least, my name is there. This is for people to contact you if they have any questions or find any errors in your file. Typically, most people just include their name and e-mail address. The next line is a typical line you will see while pulling updates and packages in a Ubuntu environment. You might think they should be separate and wonder why they should be put on the same line separated by &&. Well, in the Dockerfile, it helps by only having to run one process to encompass the entire line. If you were to split it into separate lines, it would have to run one process, finish the process, then start the next process, and finish it. With this, it helps speed up the process by pairing the processes together. They still run one after another, but with more efficiency. The next two lines complement each other. The first adds your custom configurations to the path you specified and changes the owner and group owner to the root user. The EXPOSE line will expose the ports to anything external to the container and to the host it is running on. (This will, by default, expose the container externally beyond the host, unless the firewall is enabled and protecting it.) The last line is the command that is run when the container is launched. This particular command in a Dockerfile should only be used once. If it is used more than once, the last CMD in the Dockerfile will be launched upon the container that is running. This also helps emphasize the one process per container rule. The idea is to spread out the processes so that each process runs in its own container, thus the value of the containers will become more understandable. Essentially, something that runs in the foreground, such as the earlier command to keep the Apache running in the foreground. If we were to use CMD ["service apache2 start"], the container would start and then immediately stop. There is nothing to keep the container running. You can also have other instructions, such as ENV to specify the environmental variables that users can pass upon runtime. These are typically used and are useful while using shell scripts to perform actions such as specifying a database to be created in MySQL or setting permission databases. Docker networking/linking Another important aspect that needs to be understood is how Docker containers are networked or linked together. The way they are networked or linked together highlights another important and large benefit of Docker. When a container is created, it creates a bridge network adapter for which it is assigns an address; it is through these network adapters that the communication flows when you link containers together. Docker doesn't have the need to expose ports to link containers. Let's take a look at it with the help of the following illustration: In the preceding illustration, we can see that the typical VM has to expose ports for others to be able to communicate with each other. This can be dangerous if you don't set up your firewalls or, in this case with MySQL, your MySQL permissions correctly. This can also cause unwanted traffic to the open ports. In the case of Docker, you can link your containers together, so there is no need to expose the ports. This adds security to your setup, as there is now a secure connection between your containers. We've looked at the differences between Docker and typical VMs, as well as the Dockerfile structure and the components that make up the file. We also looked at how Docker containers are linked together for security purposes as opposed to typical VMs. Now, let's review the installers for Docker and the structure behind the installation once they are installed, manipulating them to ensure they are operating correctly. Docker installers/installation Installers are one of the first pieces you need to get up and running with Docker on both your local machine as well as your server environments. Let's first take a look at what environments you can install Docker in: Apple OS X (Mac) Windows Linux (various Linux flavors) Cloud (AWS, DigitalOcean, Microsoft Azure, and so on) Types of installers With the various types of installers listed earlier, there are different ways Docker actually operates on the operating system. Docker natively runs on Linux; so if you are using Linux, then it's pretty straightforward how Docker runs right on your system. However, if you are using Windows or Mac OS X, then it operates a little differently, since it relies on using Linux. With these operating systems, they need Linux in some sort of way, thus enters the virtual machine needed to run the Linux part that Docker operates on, which is called boot2docker. The installers for both Windows and Mac OS X are bundled with the boot2docker package alongside the virtual machine software that, by default, is the Oracle VirtualBox. Now, it is worthwhile to note that Docker recently moved away from offering boot2docker. But, I feel, it is important to understand the boot2docker terms and commands in case you run across anyone running the previous version of the Docker installer. This will help you understand what is going on and move forward to the new installer(s). Currently, they are offering up Docker Toolbox that, like the name implies, includes a lot of items that the installer will install for you. The installers for each OS contain different applications with regards to Docker such as: Docker Toolbox piece Mac OS X Windows Docker Client X X Docker Machine X X Docker Compose X   Docker Kitematic X X VirtualBox X X First, let's take a look at the older style commands of boot2docker. Then, we will take a look at the new commands or application that you can use to achieve these outcomes. Controlling the Docker VM (boot2docker) Now, there are ways to run boot2docker on different VM software. But to start off, VirtualBox is the best and easiest way to operate boot2docker: $ boot2docker Usage: boot2docker [<options>] {help|init|up|ssh|save|down|poweroff|reset|restart|config|status|info|ip|shellinit|delete|download|upgrade|version} [<args>] Now, after we have installed Docker on Linux, OS X, or Windows, how do we go about controlling this virtual machine in the events when we need to start it up, restart it, or even shut it down? This is where the boot2docker command-line parameters come into play. As you can see in the earlier illustration, there are a lot of options you can use for your boot2docker instance. The options you will use mostly are up, down, poweroff, restart, status, ip, upgrade, and version. Some of these commands you will use mostly to troubleshoot items when you are trying to see why the Docker commands might hang, or when you run into any other issues with your boot2docker virtual machine. You can see what each command does by executing the following command: $ boot2docker help The most useful command that I have found while troubleshooting is the boot2docker status command: $ boot2docker status Another useful boot2docker command is: $ boot2docker version This command will help see what version of boot2docker you are currently running. This is helpful in knowing when to use the boot2docker upgrade command. The last command we will look at with respect to boot2docker is the boot2docker ip command. This command is very useful when you need to know what IP address is to be used to access the machines you have been running on a particular host: $ boot2docker ip 192.168.59.103 As you can see, the earlier command gives us the IP address of the boot2docker client running on my OS X machine inside VirtualBox. By using this IP, I can now access the containers I might have been running using the IP address alongside any of the open ports I have exposed. Docker Machine – the new boot2docker So, with boot2docker on its way out, there needs to be a new way to do what boot2docker does. This being said, enter Docker Machine. With Docker Machine, you can do the same things you did with boot2docker, but now in Machine. The following table shows the commands you used in boot2docker and what they are now in Machine: Command boot2docker Docker Machine command boot2docker docker-machine help boot2docker help docker-machine help status boot2docker status docker-machine status version boot2docker version docker-machine sionus i ip boot2docker ip docker-machine ip Kitematic Now that we have covered all the basics of controlling your boot2docker VM, let's take a look at another way you can run Docker containers on your local machine. Let's take a look at Kitematic. Kitematic is a recent addition to the Docker portfolio. Up until now, everything we have done has been command line-based. With Kitematic, you can manage your Docker containers through a GUI. Kitematic can be used either on Windows or OS X, just not on Linux; besides who needs a GUI on Linux anyways! Kitematic, just like boot2docker, operates on a VM defaulting to VirtualBox. Pictures are worth a thousand words, so let's take a look at some screenshots of Kitematic: The previous screenshot depicts what you will see when you launch Kitematic for the first time. After you start running the containers, they will show up on the left-hand side column. You can manipulate and get information about them through the GUI. You can search for prebuilt images on the Docker Hub and click on the CREATE button once you have found the one you want to use or test. In the preceding screenshot, we have created and are running the hello-world-nginx image inside Kitematic. We can now use the STOP, RESTART, and EXEC commands against the container as well as view the settings of the running container. In the following screenshot, we can go to settings and view what ports are exposed from the container to the outside: In the following screenshot, you can see that you can use your login credentials to log in to the Docker Hub and view the repositories you have created and pushed there: The Docker commands We have covered the types of installers and what they can be run on. We have also seen how to control the Docker VM that gets created for you and how to use Kitematic. Let's look at some Docker commands that you should be familiar with already. We will start with some common commands and then take a peek at the commands that are used for the Docker images. We will then take a dive into the commands that are used for the containers. The first command we will be taking a look at will be one of the most useful commands not only in Docker but in any command-line utility you use—the help command. It is run simply by executing the command as follows: $ docker help The earlier command will give you a full list of all the Docker commands at your disposal and a brief description of what each command does. For further help with a particular command, you can run the following: $ docker COMMAND --help You will then receive additional information on using the command, such as the switches, arguments, and descriptions of the arguments. Similar to the boot2docker version command we ran earlier, there is also a version command for the Docker daemon: $ docker version Now, this command will give us a little bit more information than the boot2docker command output, as follows: Client version: 1.7.0 Client API version: 1.19 Go version (client): go1.4.2 Git commit (client): 0baf609 OS/Arch (client): darwin/amd64 Server version: 1.7.0 Server API version: 1.19 Go version (server): go1.4.2 Git commit (server): 0baf609 OS/Arch (server): linux/amd64 This is helpful when you want to see the version of the Docker daemon you may be running to see if you need/want to upgrade. The Docker images Next, let's take a dive into the Docker images. You will learn how to view the images you currently have that you can run, search for images on the Docker Hub, and pull them down to your environment, so you can run them. Let's first take a look at the docker images command. Upon running the command, we will get an output similar to the following output: REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE ubuntu 14.10 ab57dbafeeea 11 days ago 194.5 MB ubuntu trusty 6d4946999d4f 11 days ago 188.3 MB ubuntu latest 6d4946999d4f 11 days ago 188.3 MB Your output will differ based on whether you have any images at all in your Docker environment or upon what images you do have. There are a few important pieces you need to understand from the output you see. Let's go over the columns and what is contained in each. The first column you see is the REPOSITORY column; this column contains the name of the repository as it exists in the Docker Hub. If you were to have a repository that was from someone's user account, it may show up as follows: REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE scottpgallagher/mysql latest 57df9c7989a1 9 weeks ago 321.7 MB The next column, the TAG column, will show you different versions of a repository. As you can see in the preceding example with the Ubuntu repository, there are tag names for the different versions. So, if you want to specify a particular version of a repository in your Dockerfile (as we saw earlier), you are able to. This is useful, so you're not always reliant on having to use the latest version of an operating system and can use the one your application supports the best. It can also help you do backward compatibility testing for your application. The next column is labeled IMAGE ID and it is based on a unique 64 hexadecimal digit string of characters. The image ID simplifies this down to the first 12 digits for easier viewing. Imagine if you had to view all 64 bits on one line! You will learn when to use this unique image ID for later tasks. The last two columns are pretty straightforward; the first being the creation date for the repository, followed by the virtual size of the image. The size is very important as you want to keep or use images that are very small in size if you plan to be moving them around a lot. The smaller the image, the faster is the load time; and who doesn't like it faster? Searching for the Docker images Okay, so let's look at how we can search for the images that are in the Docker Hub using the Docker commands. The command we will be looking at is docker search. With the docker search command, you can search based on the different criteria you are looking for. For example, we can search for all the images with the term ubuntu in them and see what all is available. Here is what we would get back in our results; it would go as follows: $ docker search ubuntu We would get back our results: NAME DESCRIPTION STARS OFFICIAL AUTOMATED ubuntu Ubuntu is a Debian-based Linux operating s... 1835 [OK] ubuntu-upstart Upstart is an event-based replacement for ... 26 [OK] tutum/ubuntu Ubuntu image with SSH access. For the root... 25 [OK] torusware/speedus-ubuntu Always updated official Ubuntu docker imag... 25 [OK] ubuntu-debootstrap debootstrap --variant=minbase --components... 10 [OK] rastasheep/ubuntu-sshd Dockerized SSH service, built on top of of... 4 [OK] maxexcloo/ubuntu Docker base image built on Ubuntu with Sup... 2 [OK] nuagebec/ubuntu Simple always updated Ubuntu docker images... 2 [OK] nimmis/ubuntu This is a docker images different LTS vers... 1 [OK] alsanium/ubuntu Ubuntu Core image for Docker 1 [OK] Based on these results, we can now decipher some information. We can see the name of the repository, a reduced description, how many people have starred and think it is a good repository, whether it's an official repository; which means it's been approved by the Docker team, as well as if it's an automated build. An automated build is typically a Docker image that is built automatically when a Git repository it is linked to is updated. The code gets updated, the web hook is called, and a new Docker image is built in the Docker Hub. If we find an image we want to use, we can simply pull it using its repository name with the docker pull command, as follows: $ docker pull tutum/ubuntu The image will be downloaded and show up in our list when we perform the docker images command we ran earlier. We now know how to search for Docker images and pull them down to our machine. What if we want to get rid of them? That's where the docker rmi command comes into play. With the docker rmi command, you can remove unwanted images from your machine(s). So, let's take look at the images we currently have on our machine with the docker images command. We will get the following: REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE ubuntu 14.10 ab57dbafeeea 11 days ago 194.5 MB ubuntu trusty 6d4946999d4f 11 days ago 188.3 MB ubuntu latest 6d4946999d4f 11 days ago 188.3 MB We can see that we have duplicate images here taking up space. We can see this by looking at the image ID and seeing the exact image ID for both ubuntu:trusty and ubuntu:latest. We now know that ubuntu:trusty is the latest Ubuntu image, so there is no need to keep them both around. Let's free up some space by removing ubuntu:trusty and just keeping ubuntu:latest. We do this by using the docker rmi command, as follows: $ docker rmi ubuntu:trusty If you issue the docker images command now, you will see that ubuntu:trusty no longer shows up in your images list and has been removed. Now, you can remove machines based on their image ID as well. But be careful while you do so; in this scenario, not only will you remove ubuntu:trusty, but you will also remove ubuntu:latest as they have the same image ID. Manipulating the Docker images We have gone over the images and know how to obtain and manipulate them in some ways. Next, we are going to take a look at what it takes to fire them up and manipulate them. This is the part where the images become containers! Let's first go over the basics of the docker run command and how to run containers. We will cover some basic docker run items in this article. So, let's just look at how to get images up, running, and turned into containers. The most basic way to run a container is as follows: $ docker run -i -t <image_name>:<tag> /bin/bash Upon closer inspection of the earlier command, we start off with the docker run command, followed by two switches: -i and -t. The -i gives us an interactive shell into the running container, the -t will allocate a pseudo-tty that, while using interactive processes, must be used together with the -i switch. You can also use switches together; for example, -it is commonly used for these two switches. This will help you test the container to see how it operates before running it as a daemon. Once you are comfortable with your container, you can test how it operates in the daemon mode: $ docker run -d <image_name>:<tag> If the container is set up correctly and has an entry point setup, you should be able to see the running container by issuing the docker ps command. You will see something similar to the following: $ docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES cc1fefcfa098 ubuntu:14.10 "/bin/bash" 3 seconds ago Up 3 seconds boring_mccarthy Based on the earlier command, we get a lot of other important information indicating that the container is running. We can see the container ID, the image name that is running, the command that is running to keep the image alive, when the container started, its current status, if any ports were exposed they would be listed here, as well as the name given to the container. Now, these names are random, unless it is specified otherwise by the --name= switch. You can also the expose the ports on your containers by using the -p switch as follows: $ docker run -d -p <host_port>:<container_port> <image>:<tag> $ docker run -d -p 8080:80 ubuntu:14.10 This will run the ubuntu 14.10 container in the demonized mode, exposing port 8080 on the Docker host to port 80 on the running container: CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES 55cfdcb6beb6 ubuntu:14.10 "/bin/bash" 2 seconds ago Up 2 seconds 0.0.0.0:8080->80/tcp babbage Now, there will come a time when containers don't want to behave. For this, you can see the issues you have by using the docker logs command. The command is very straightforward. You specify the container you want to see the logs off. For this command, you need to use the container ID or the name of the container from the docker ps output: $ docker logs 55cfdcb6beb6 Or: $ docker logs babbage You can also get this ID when you first initiate the docker run command: $ docker run -d ubuntu:14.10 /bin/bash da92261485db98c7463fffadb43e3f684ea9f47949f287f92408fd0f3e4f2bad Stopping containers Now, let's take a look at how we can stop these containers. For various reasons, we would want to do this. There are a few commands we could use; they are docker kill, docker stop, docker pause, and docker unpause. Let's cover them briefly as they are fairly straightforward. First, let's look at the difference between docker kill and docker stop. The docker kill command will do just that—kill the container immediately. For a graceful shutdown of the container, you would want to use the docker stop command. Mostly, when you are testing, you will be using docker kill. When you're in your production environments, you will want to use docker stop to ensure you don't corrupt any data you might have in the Docker volumes. The commands are used exactly like the docker logs command, where you can use the container ID, the random name given to the container, or the one you might specify with the --name= switch. Now, let's take a dive into how we can execute some commands, view information on our running containers, and manipulate them in a small sense. The first thing we want to take a look at, which will make things a little easier with the upcoming commands, is the docker rename command. With the docker rename command, we can change the name that has been randomly generated for the container. When we performed the docker run command, a random name was assigned to our container; most times, these names are fine. But if you are looking for an easy way to manage the containers, a name can be sometimes easier to remember. For this, you can use the docker rename command as follows: $ docker rename <current_container_name> <new_container_name> Now that we have an easily recognizable and rememberable name, let's take a peek inside our containers with the docker stats and docker top commands, taking them in order: $ docker stats <container_name> CONTAINER CPU % MEM USAGE/LIMIT MEM % NET I/O web1 0.00% 1.016 MB/2.099 GB 0.05% 0 B/0 B The other command docker top provides a list of all running processes inside the container. Again, we can use the name of the container to pull the information: $ docker top <container_name> We will receive an output similar to the following one based on what processes are running inside the container: UID PID PPID C STIME TTY TIME CMD root 8057 1380 0 13:02 pts/0 00:00:00 /bin/bash We can see who is running the process (in this case, the root user), the command being run (in this case, /bin/bash), as well as the other information that might be useful. Lastly, let's cover how we can remove the containers. The same way we looked at removing images earlier with the docker rmi command, we can use the docker rm command to remove unwanted containers. This is useful if you want to reuse a name you provided to a container: $ docker rm <container_name> Summary In this article, we have gone over the basics of what Docker is and how it is compared to typical virtual machines. We looked at the Dockerfile structure and the networking and linking of containers. We went over the installers, how they operate on different operating systems, and how to control them through the command line. We briefly looked at the latest Docker addition Kitematic for those interested in a GUI version for Windows or OS X. Then, we took a small but deep dive into the basic Docker commands to get you started. Resources for Article: Further resources on this subject: Introduction to Docker[article] Docker in Production[article] Speeding Vagrant Development With Docker[article]
Read more
  • 0
  • 0
  • 50741

article-image-setting-project-atomic-host
Packt
17 Feb 2016
6 min read
Save for later

Setting up a Project Atomic host

Packt
17 Feb 2016
6 min read
With DockerTM, containers are becoming mainstream and enterprises are ready to use them. Docker and its ecosystem are evolving at a very high pace, so it is very important to understand the basics and build group up to adopt to new concepts and tools. In this article, we will cover the following recipes: Setting up a Project Atomic host Doing atomic update/rollback with Project Atomic (For more resources related to this topic, see here.) Setting up a Project Atomic host Project Atomic facilitates application-centric IT architecture by providing an end-to-end solution to deploy containerized applications quickly and reliably, with atomic update and rollback for the application and host alike. This is achieved by running applications in containers on a Project Atomic host, which is a lightweight operating system specially designed to run containers. The hosts can be based on Fedora, CentOS, or Red Hat Enterprise Linux. Next, we will elaborate on the building blocks of the Project Atomic host. OSTree and rpm-OSTree OSTree (https://wiki.gnome.org/action/show/Projects/OSTree) is a tool to manage bootable, immutable, and versioned filesystem trees. Using this, we can build client-server architecture in which the server hosts an OSTree repository and the client subscribed to it can incrementally replicate the content. rpm-OSTree is a system to decompose RPMs on the server side into the OSTree repository to which the client can subscribe and perform updates. With each update, a new root is created, which is used for the next reboot. During updates, /etc is rebased and /var is untouched. Container runtime As of now Project Atomic only supports Docker as container runtime. systemd Project Atomic uses Kubernetes (http://kubernetes.io/) for application deployment over clusters of container hosts. Project Atomic can be installed on bare metal, cloud providers, VMs, and so on. In this recipe, let’s see how we can install it on a VM using virt-manager on Fedora. Getting ready Download the image: $ wget http://download.fedoraproject.org/pub/fedora/linux/releases/test/22_Beta/Cloud/x86_64/Images/Fedora-Cloud-Atomic-22_Beta-20150415.x86_64.raw.xz I have downloaded the beta image for Fedora 22 Cloud image For Containers. You should look for the latest cloud image For Containers at https://getfedora.org/en/cloud/download/. Uncompress this image by using the following command: $ xz -d Fedora-Cloud-Atomic-22_Beta-20150415.x86_64.raw.xz How to do it… We downloaded the cloud image that does not have any password set for the default user fedora. While booting the VM, we have to provide a cloud configuration file through which we can customize the VM. To do this, we need to create two files, meta-data and user-data, as follows: $ cat meta-data instance-id: iid-local01 local-hostname: atomichost $ cat user-data #cloud-config password: atomic ssh_pwauth: True chpasswd: { expire: False } ssh_authorized_keys: - ssh-rsa AAAAB3NzaC1yc......... In the preceding code, we need to provide the complete SSH public key. We then need to create an ISO image consisting of these files, which we will use to boot to the VM. As we are using a cloud image, our setting will be applied to the VM during the boot process. This means the hostname will be set to atomichost, the password will be set to atomic, and so on. To create the ISO, run the following command: $ genisoimage -output init.iso -volid cidata -joliet -rock user-data meta-data Start virt-manager. Select New Virtual Machine and then import the existing disk image. Enter the image path of the Project Atomic image we downloaded earlier. Select OS type as Linux and Version as Fedora 20/Fedora 21 (or later), and click on Forward. Next, assign CPU and Memory and click on Forward. Then, give a name to the VM and select Customize configuration before install. Finally, click on Finish and review the details. Next, click on Add Hardware, and after selecting Storage, attach the ISO (init.iso) file we created to the VM and select Begin Installation: Once booted, you can see that its hostname is correctly set and you will be able to log in through the password given in the cloud init file. The default user is fedora and password is atomic as set in the user-data file. How it works… In this recipe, we took a Project Atomic Fedora cloud image and booted it using virt-manager after supplying the cloud init file. There’s more… After logging in, if you do file listing at /, you will see that most of the traditional directories are linked to /var because it is preserved across upgrades. After logging in, you can run the Docker command as usual $sudo docker run -it fedora bash See also The virtual manager documentation at https://virt-manager.org/documentation/ More information on package systems, image systems, and RPM-OSTree at https://github.com/projectatomic/rpm-ostree/blob/master/doc/background.md The quick-start guide on the Project Atomic website at http://www.projectatomic.io/docs/quickstart/ The resources on cloud images at https://www.technovelty.org//linux/running-cloud-images-locally.html and http://cloudinit.readthedocs.org/en/latest/ How to set up Kubernetes with an Atomic host at http://www.projectatomic.io/blog/2014/11/testing-kubernetes-with-an-atomic-host/ and https://github.com/cgwalters/vagrant-atomic-cluster Doing atomic update/rollback with Project Atomic To get to the latest version or to roll back to the older version of Project Atomic, we use the atomic host command, which internally calls rpm-ostree. Getting ready Boot and log in to the Atomic host. How to do it… Just after the boot, run the following command: $ atomic host status You will see details about one deployment that is in use now. To upgrade, run the following command: This changes and/or adds new packages. After the upgrade, we will need to reboot the system to use the new update. Let’s reboot and see the outcome: As we can see, the system is now booted with the new update. The *, which is at the beginning of the first line, specifies the active build. To roll back, run the following command: $ sudo atomic host rollback We will have to reboot again if we want to use older bits. How it works… For updates, the Atomic host connects to the remote repository hosting the newer build, which is downloaded and used from the next reboot onwards until the user upgrades or rolls back. In the case rollback older build available on the system used after the reboot. See also The documentation Project Atomic website, which can be found at http://www.projectatomic.io/docs/os-updates/ Summary Docker and its ecosystem are evolving at a very high pace, so it is very important to understand the basics and build group up to adopt to new concepts and tools. Additional information on Docker can be gained be referring to: https://www.packtpub.com/networking-and-servers/docker-high-performance https://www.packtpub.com/virtualization-and-cloud/monitoring-docker Resources for Article: Further resources on this subject: Understanding Docker [article] Hands On with Docker Swarm [article] Introduction to Docker [article]
Read more
  • 0
  • 0
  • 2952

article-image-create-your-first-react-element
Packt
17 Feb 2016
22 min read
Save for later

Create Your First React Element

Packt
17 Feb 2016
22 min read
From the 7th to the 13th of November 2016, you can save up to 80% on some of our top ReactJS content - so what are you waiting for? Dive in here before the week ends! As many of you know, creating a simple web application today involves writing the HTML, CSS, and JavaScript code. The reason we use three different technologies is because we want to separate three different concerns: Content (HTML) Styling (CSS) Logic (JavaScript) (For more resources related to this topic, see here.) This separation works great for creating a web page because, traditionally, we had different people working on different parts of our web page: one person structured the content using HTML and styled it using CSS, and then another person implemented the dynamic behavior of various elements on that web page using JavaScript. It was a content-centric approach. Today, we mostly don't think of a website as a collection of web pages anymore. Instead, we build web applications that might have only one web page, and that web page does not represent the layout for our content—it represents a container for our web application. Such a web application with a single web page is called (unsurprisingly) a Single Page Application (SPA). You might be wondering, how do we represent the rest of the content in a SPA? Surely, we need to create an additional layout using HTML tags? Otherwise, how does a web browser know what to render? These are all valid questions. Let's take a look at how it works in this article. Once you load your web page in a web browser, it creates a Document Object Model (DOM) of that web page. A DOM represents your web page in a tree structure, and at this point, it reflects the structure of the layout that you created with only HTML tags. This is what happens regardless of whether you're building a traditional web page or a SPA. The difference between the two is what happens next. If you are building a traditional web page, then you would finish creating your web page's layout. On the other hand, if you are building a SPA, then you would need to start creating additional elements by manipulating the DOM with JavaScript. A web browser provides you with the JavaScript DOM API to do this. You can learn more about it at https://developer.mozilla.org/en-US/docs/Web/API/Document_Object_Model. However, manipulating (or mutating) the DOM with JavaScript has two issues: Your programming style will be imperative if you decide to use the JavaScript DOM API directly. This programming style leads to a code base that is harder to maintain. DOM mutations are slow because they cannot be optimized for speed, unlike other JavaScript code. Luckily, React solves both these problems for us. Understanding virtual DOM Why do we need to manipulate the DOM in the first place? Because our web applications are not static. They have a state represented by the user interface (UI) that a web browser renders, and that state can be changed when an event occurs. What kind of events are we talking about? There are two types of events that we're interested in: User events: When a user types, clicks, scrolls, resizes, and so on Server events: When an application receives data or an error from a server, among others What happens while handling these events? Usually, we update the data that our application depends on, and that data represents a state of our data model. In turn, when a state of our data model changes, we might want to reflect this change by updating a state of our UI. Looks like what we want is a way of syncing two different states: the UI state and the data model state. We want one to react to the changes in  the other and vice versa. How can we achieve this? One of the ways to sync your application's UI state with an underlying data model's state is two-way data binding. There are different types of two-way data binding. One of them is key-value observing (KVO), which is used in Ember.js, Knockout, Backbone, and iOS, among others. Another one is dirty checking, which is used in Angular. Instead of two-way data binding, React offers a different solution called the virtual DOM. The virtual DOM is a fast, in-memory representation of the real DOM, and it's an abstraction that allows us to treat JavaScript and DOM as if they were reactive. Let's take a look at how it works: Whenever the state of your data model changes, the virtual DOM and React will rerender your UI to a virtual DOM representation. React then calculates the difference between the two virtual DOM representations: the previous virtual DOM representation that was computed before the data was changed and the current virtual DOM representation that was computed after the data was changed. This difference between the two virtual DOM representations is what actually needs to be changed in the real DOM. React updates only what needs to be updated in the real DOM. The process of finding a difference between the two representations of the virtual DOM and rerendering only the updated patches in a real DOM is fast. Also, the best part is, as a React developer, that you don't need to worry about what actually needs to be rerendered. React allows you to write your code as if you were rerendering the entire DOM every time your application's state changes. If you would like to learn more about the virtual DOM, the rationale behind it, and how it can be compared to data binding, then I would strongly recommend that you watch this very informative talk by Pete Hunt from Facebook at https://www.youtube.com/watch?v=-DX3vJiqxm4. Now that we've learnt about the virtual DOM, let's mutate a real DOM by installing React and creating our first React element. Installing React To start using the React library, we need to first install it. I am going to show you two ways of doing this: the simplest one and the one using the npm install command. The simplest way is to add the <script> tag to our ~/snapterest/build/index.html file: For the development version of React, add the following command: <script src="https://cdnjs.cloudflare.com/ajax/libs/react/0.14.0-beta3/react.js"></script> For the production version version of React, add the following command: <script src="https://cdnjs.cloudflare.com/ajax/libs/react/0.14.0-beta3/react.min.js"></script> For our project, we'll be using the development version of React. At the time of writing, the latest version of React library is 0.14.0-beta3. Over time, React gets updated, so make sure you use the latest version that is available to you, unless it introduces breaking changes that are incompatible with the code samples provided in this article. Visit https://github.com/fedosejev/react-essentials to learn about any compatibility issues between the code samples and the latest version of React. We all know that Browserify allows us to import all the dependency modules for our application using the require() function. We'll be using require() to import the React library as well, which means that, instead of adding a <script> tag to our index.html, we'll be using the npm install command to install React: Navigate to the ~/snapterest/ directory and run this command:  npm install --save react@0.14.0-beta3 react-dom@0.14.0-beta3 Then, open the ~/snapterest/source/app.js file in your text editor and import the React and ReactDOM libraries to the React and ReactDOM variables, respectively: var React = require('react'); var ReactDOM = require('react-dom'); The react package contains methods that are concerned with the key idea behind React, that is, describing what you want to render in a declarative way. On the other hand, the react-dom package offers methods that are responsible for rendering to the DOM. You can read more about why developers at Facebook think it's a good idea to separate the React library into two packages at https://facebook.github.io/react/blog/2015/07/03/react-v0.14-beta-1.html#two-packages. Now we're ready to start using the React library in our project. Next, let's create our first React Element! Creating React Elements with JavaScript We'll start by familiarizing ourselves with a fundamental React terminology. It will help us build a clear picture of what the React library is made of. This terminology will most likely update over time, so keep an eye on the official documentation at http://facebook.github.io/react/docs/glossary.html. Just like the DOM is a tree of nodes, React's virtual DOM is a tree of React nodes. One of the core types in React is called ReactNode. It's a building block for a virtual DOM, and it can be any one of these core types: ReactElement: This is the primary type in React. It's a light, stateless, immutable, virtual representation of a DOM Element. ReactText: This is a string or a number. It represents textual content and it's a virtual representation of a Text Node in the DOM. ReactElements and ReactTexts are ReactNodes. An array of ReactNodes is called a ReactFragment. You will see examples of all of these in this article. Let's start with an example of a ReactElement: Add the following code to your ~/snapterest/source/app.js file: var reactElement = React.createElement('h1'); ReactDOM.render(reactElement, document.getElementById('react-application')); Now your app.js file should look exactly like this: var React = require('react'); var ReactDOM = require('react-dom'); var reactElement = React.createElement('h1'); ReactDOM.render(reactElement, document.getElementById('react-application')); Navigate to the ~/snapterest/ directory and run Gulp's default task: gulp You will see the following output: Starting 'default'... Finished 'default' after 1.73 s Navigate to the ~/snapterest/build/ directory, and open index.html in a web browser. You will see a blank web page. Open Developer Tools in your web browser and inspect the HTML markup for your blank web page. You should see this line, among others: <h1 data-reactid=".0"></h1> Well done! We've just created your first React element. Let's see exactly how we did it. The entry point to the React library is the React object. This object has a method called createElement() that takes three parameters: type, props, and children: React.createElement(type, props, children); Let's take a look at each parameter in more detail. The type parameter The type parameter can be either a string or a ReactClass: A string could be an HTML tag name such as 'div', 'p', 'h1', and so on. React supports all the common HTML tags and attributes. For a complete list of HTML tags and attributes supported by React, you can refer to http://facebook.github.io/react/docs/tags-and-attributes.html. A ReactClass is created via the React.createClass() method. The type parameter describes how an HTML tag or a ReactClass is going to be rendered. In our example, we're rendering the h1 HTML tag. The props parameter The props parameter is a JavaScript object passed from a parent element to a child element (and not the other way around) with some properties that are considered immutable, that is, those that should not be changed. While creating DOM elements with React, we can pass the props object with properties that represent the HTML attributes such as class, style, and so on. For example, run the following commands: var React = require('react'); var ReactDOM = require('react-dom'); var reactElement = React.createElement('h1', { className: 'header' }); ReactDOM.render(reactElement, document.getElementById('react-application')); The preceding code will create an h1 HTML element with a class attribute set to header: <h1 class="header" data-reactid=".0"></h1> Notice that we name our property className rather than class. The reason is that the class keyword is reserved in JavaScript. If you use class as a property name, it will be ignored by React, and a helpful warning message will be printed on the web browser's console: Warning: Unknown DOM property class. Did you mean className?Use className instead. You might be wondering what this data-reactid=".0" attribute is doing in our h1 tag? We didn't pass it to our props object, so where did it come from? It is added and used by React to track the DOM nodes; it might be removed in a future version of React. The children parameter The children parameter describes what child elements this element should have, if any. A child element can be any type of ReactNode: a virtual DOM element represented by a ReactElement, a string or a number represented by a ReactText, or an array of other ReactNodes, which is also called ReactFragment. Let's take a look at this example: var React = require('react'); var ReactDOM = require('react-dom'); var reactElement = React.createElement('h1', { className: 'header' }, 'This is React'); ReactDOM.render(reactElement, document.getElementById('react-application')); The following code will create an h1 HTML element with a class attribute and a text node, This is React: <h1 class="header" data-reactid=".0">This is React</h1> The h1 tag is represented by a ReactElement, while the This is React string is represented by a ReactText. Next, let's create a React element with a number of other React elements as it's children: var React = require('react'); var ReactDOM = require('react-dom');   var h1 = React.createElement('h1', { className: 'header', key: 'header' }, 'This is React'); var p = React.createElement('p', { className: 'content', key: 'content' }, "And that's how it works."); var reactFragment = [ h1, p ]; var section = React.createElement('section', { className: 'container' }, reactFragment);   ReactDOM.render(section, document.getElementById('react-application')); We've created three React elements: h1, p, and section. h1 and p both have child text nodes, "This is React" and "And that's how it works.", respectively. The section has a child that is an array of two ReactElements, h1 and p, called reactFragment. This is also an array of ReactNodes. Each ReactElement in the reactFragment array must have a key property that helps React to identify that ReactElement. As a result, we get the following HTML markup: <section class="container" data-reactid=".0">   <h1 class="header" data-reactid=".0.$header">This is React</h1>   <p class="content" data-reactid=".0.$content">And that's how it works.</p> </section> Now we understand how to create React elements. What if we want to create a number of React elements of the same type? Does it mean that we need to call React.createElement('type') over and over again for each element of the same type? We can, but we don't need to because React provides us with a factory function called React.createFactory(). A factory function is a function that creates other functions. This is exactly what React.createFactory(type) does: it creates a function that produces a ReactElement of a given type. Consider the following example: var React = require('react'); var ReactDOM = require('react-dom');   var listItemElement1 = React.createElement('li', { className: 'item-1', key: 'item-1' }, 'Item 1'); var listItemElement2 = React.createElement('li', { className: 'item-2', key: 'item-2' }, 'Item 2'); var listItemElement3 = React.createElement('li', { className: 'item-3', key: 'item-3' }, 'Item 3');   var reactFragment = [ listItemElement1, listItemElement2, listItemElement3 ]; var listOfItems = React.createElement('ul', { className: 'list-of-items' }, reactFragment);   ReactDOM.render(listOfItems, document.getElementById('react-application')); The preceding example produces this HTML: <ul class="list-of-items" data-reactid=".0">   <li class="item-1" data-reactid=".0.$item-1">Item 1</li>   <li class="item-2" data-reactid=".0.$item-2">Item 2</li>   <li class="item-3" data-reactid=".0.$item-3">Item 3</li> </ul> We can simplify it by first creating a factory function: var React = require('react'); var ReactDOM = require('react-dom'); var createListItemElement = React.createFactory('li'); var listItemElement1 = createListItemElement({ className: 'item-1', key: 'item-1' }, 'Item 1'); var listItemElement2 = createListItemElement({ className: 'item-2', key: 'item-2' }, 'Item 2'); var listItemElement3 = createListItemElement({ className: 'item-3', key: 'item-3' }, 'Item 3'); var reactFragment = [ listItemElement1, listItemElement2, listItemElement3 ]; var listOfItems = React.createElement('ul', { className: 'list-of-items' }, reactFragment); ReactDOM.render(listOfItems, document.getElementById('react-application')); In the preceding example, we're first calling the React.createFactory() function and passing a li HTML tag name as a type parameter. Then, the React.createFactory() function returns a new function that we can use as a convenient shorthand to create elements of type li. We store a reference to this function in a variable called createListItemElement. Then, we call this function three times, and each time we only pass the props and children parameters, which are unique for each element. Notice that React.createElement() and React.createFactory() both expect the HTML tag name string (such as li) or the ReactClass object as a type parameter. React provides us with a number of built-in factory functions to create the common HTML tags. You can call them from the React.DOM object; for example, React.DOM.ul(), React.DOM.li(), React.DOM.div(), and so on. Using them, we can simplify our previous example even further: var React = require('react'); var ReactDOM = require('react-dom');   var listItemElement1 = React.DOM.li({ className: 'item-1', key: 'item-1' }, 'Item 1'); var listItemElement2 = React.DOM.li({ className: 'item-2', key: 'item-2' }, 'Item 2'); var listItemElement3 = React.DOM.li({ className: 'item-3', key: 'item-3' }, 'Item 3');   var reactFragment = [ listItemElement1, listItemElement2, listItemElement3 ]; var listOfItems = React.DOM.ul({ className: 'list-of-items' }, reactFragment);   ReactDOM.render(listOfItems, document.getElementById('react-application')); Now we know how to create a tree of ReactNodes. However, there is one important line of code that we need to discuss before we can progress further: ReactDOM.render(listOfItems, document.getElementById('react-application')); As you might have already guessed, it renders our ReactNode tree to the DOM. Let's take a closer look at how it works. Rendering React Elements The ReactDOM.render() method takes three parameters: ReactElement, a regular DOMElement, and a callback function: ReactDOM.render(ReactElement, DOMElement, callback); ReactElement is a root element in the tree of ReactNodes that you've created. A regular DOMElement is a container DOM node for that tree. The callback is a function executed after the tree is rendered or updated. It's important to note that if this ReactElement was previously rendered to a parent DOM Element, then ReactDOM.render() will perform an update on the already rendered DOM tree and only mutate the DOM as it is necessary to reflect the latest version of the ReactElement. This is why a virtual DOM requires fewer DOM mutations. So far, we've assumed that we're always creating our virtual DOM in a web browser. This is understandable because, after all, React is a user interface library, and all the user interfaces are rendered in a web browser. Can you think of a case when rendering a user interface on a client would be slow? Some of you might have already guessed that I am talking about the initial page load. The problem with the initial page load is the one I mentioned at the beginning of this article—we're not creating static web pages anymore. Instead, when a web browser loads our web application, it receives only the bare minimum HTML markup that is usually used as a container or a parent element for our web application. Then, our JavaScript code creates the rest of the DOM, but in order for it to do so it often needs to request extra data from the server. However, getting this data takes time. Once this data is received, our JavaScript code starts to mutate the DOM. We know that DOM mutations are slow. How can we solve this problem? The solution is somewhat unexpected. Instead of mutating the DOM in a web browser, we mutate it on a server. Just like we would with our static web pages. A web browser will then receive an HTML that fully represents a user interface of our web application at the time of the initial page load. Sounds simple, but we can't mutate the DOM on a server because it doesn't exist outside a web browser. Or can we? We have a virtual DOM that is just a JavaScript, and as you know using Node.js, we can run JavaScript on a server. So technically, we can use the React library on a server, and we can create our ReactNode tree on a server. The question is how can we render it to a string that we can send to a client? React has a method called ReactDOMServer.renderToString() just to do this: var ReactDOMServer = require('react-dom/server'); ReactDOMServer.renderToString(ReactElement); It takes a ReactElement as a parameter and renders it to its initial HTML. Not only is this faster than mutating a DOM on a client, but it also improves the Search Engine Optimization (SEO) of your web application. Speaking of generating static web pages, we can do this too with React: var ReactDOMServer = require('react-dom/server'); ReactDOM.renderToStaticMarkup(ReactElement); Similar to ReactDOM.renderToString(), this method also takes a ReactElement as a parameter and outputs an HTML string. However, it doesn't create the extra DOM attributes that React uses internally, it produces shorter HTML strings that we can transfer to the wire quickly. Now you know not only how to create a virtual DOM tree using React elements, but you also know how to render it to a client and server. Our next question is whether we can do it quickly and in a more visual manner. Creating React Elements with JSX When we build our virtual DOM by constantly calling the React.createElement() method, it becomes quite hard to visually translate these multiple function calls into a hierarchy of HTML tags. Don't forget that, even though we're working with a virtual DOM, we're still creating a structure layout for our content and user interface. Wouldn't it be great to be able to visualize that layout easily by simply looking at our React code? JSX is an optional HTML-like syntax that allows us to create a virtual DOM tree without using the React.createElement() method. Let's take a look at the previous example that we created without JSX: var React = require('react'); var ReactDOM = require('react-dom');   var listItemElement1 = React.DOM.li({ className: 'item-1', key: 'item-1' }, 'Item 1'); var listItemElement2 = React.DOM.li({ className: 'item-2', key: 'item-2' }, 'Item 2'); var listItemElement3 = React.DOM.li({ className: 'item-3', key: 'item-3' }, 'Item 3');   var reactFragment = [ listItemElement1, listItemElement2, listItemElement3 ]; var listOfItems = React.DOM.ul({ className: 'list-of-items' }, reactFragment);   ReactDOM.render(listOfItems, document.getElementById('react-application')); Translate this to the one with JSX: var React = require('react'); var ReactDOM = require('react-dom');   var listOfItems = <ul className="list-of-items">                     <li className="item-1">Item 1</li>                     <li className="item-2">Item 2</li>                     <li className="item-3">Item 3</li>                   </ul>; ReactDOM.render(listOfItems, document.getElementById('react-application'));   As you can see, JSX allows us to write HTML-like syntax in our JavaScript code. More importantly, we can now clearly see what our HTML layout will look like once it's rendered. JSX is a convenience tool and it comes with a price in the form of an additional transformation step. Transformation of the JSX syntax into valid JavaScript syntax must happen before our "invalid" JavaScript code is interpreted. We know that the babely module transforms our JSX syntax into a JavaScript one. This transformation happens every time we run our default task from gulpfile.js: gulp.task('default', function () {   return browserify('./source/app.js')         .transform(babelify)         .bundle()         .pipe(source('snapterest.js'))         .pipe(gulp.dest('./build/')); }); As you can see, the .transform(babelify) function call transforms JSX into JavaScript before bundling it with the other JavaScript code. To test our transformation, run this command: gulp Then, navigate to the ~/snapterest/build/ directory, and open index.html in a web browser. You will see a list of three items. The React team has built an online JSX Compiler that you can use to test your understanding of how JSX works at http://facebook.github.io/react/jsx-compiler.html. Using JSX, you might feel very unusual in the beginning, but it can become a very intuitive and convenient tool to use. The best part is that you can choose whether to use it or not. I found that JSX saves me development time, so I chose to use it in this project that we're building. If you choose to not use it, then I believe that you have learned enough in this article to be able to translate the JSX syntax into a JavaScript code with the React.createElement() function calls. If you have a question about what we have discussed in this article, then you can refer to https://github.com/fedosejev/react-essentials and create a new issue. Summary We started this article by discussing the issues with single web page applications and how they can be addressed. Then, we learned what a virtual DOM is and how React allows us to build it. We also installed React and created our first React element using only JavaScript. Then, we also learned how to render React elements in a web browser and on a server. Finally, we looked at a simpler way of creating React elements with JSX. Resources for Article: Further resources on this subject: Changing Views [article] Introduction to Akka [article] ECMAScript 6 Standard [article]
Read more
  • 0
  • 0
  • 17426
Packt
17 Feb 2016
14 min read
Save for later

"D-J-A-N-G-O... The D is silent." - Authentication in Django

Packt
17 Feb 2016
14 min read
The authentication module saves a lot of time in creating space for users. The following are the main advantages of this module: The main actions related to users are simplified (connection, account activation, and so on) Using this system ensures a certain level of security Access restrictions to pages can be done very easily (For more resources related to this topic, see here.) It's such a useful module that we have already used it without noticing. Indeed, access to the administration module is performed by the authentication module. The user we created during the generation of our database was the first user of the site. This article greatly alters the application we wrote earlier. At the end of this article, we will have: Modified our UserProfile model to make it compatible with the module Created a login page Modified the addition of developer and supervisor pages Added the restriction of access to connected users How to use the authentication module In this section, we will learn how to use the authentication module by making our application compatible with the module. Configuring the Django application There is normally nothing special to do for the administration module to work in our TasksManager application. Indeed, by default, the module is enabled and allows us to use the administration module. However, it is possible to work on a site where the web Django authentication module has been disabled. We will check whether the module is enabled. In the INSTALLED_APPS section of the settings.py file, we have to check the following line: 'django.contrib.auth', Editing the UserProfile model The authentication module has its own User model. This is also the reason why we have created a UserProfile model and not just User. It is a model that already contains some fields, such as nickname and password. To use the administration module, you have to use the User model on the Python33/Lib/site-package/django/contrib/auth/models.py file. We will modify the UserProfile model in the models.py file that will become the following: class UserProfile(models.Model): user_auth = models.OneToOneField(User, primary_key=True) phone = models.CharField(max_length=20, verbose_name="Phone number", null=True, default=None, blank=True) born_date = models.DateField(verbose_name="Born date", null=True, default=None, blank=True) last_connexion = models.DateTimeField(verbose_name="Date of last connexion", null=True, default=None, blank=True) years_seniority = models.IntegerField(verbose_name="Seniority", default=0) def __str__(self): return self.user_auth.username We must also add the following line in models.py: from django.contrib.auth.models import User In this new model, we have: Created a OneToOneField relationship with the user model we imported Deleted the fields that didn't exist in the user model The OneToOne relation means that for each recorded UserProfile model, there will be a record of the User model. In doing all this, we deeply modify the database. Given these changes and because the password is stored as a hash, we will not perform the migration with South. It is possible to keep all the data and do a migration with South, but we should develop a specific code to save the information of the UserProfile model to the User model. The code should also generate a hash for the password, but it would be long. To reset South, we must do the following: Delete the TasksManager/migrations folder and all the files contained in this folder Delete the database.db file To use the migration system, we have to use the following commands: manage.py schemamigration TasksManager --initial manage.py syncdb –migrate After the deletion of the database, we must remove the initial data in create_developer.py. We must also delete the URL developer_detail and the following line in index.html: <a href="{% url "developer_detail" "2" %}">Detail second developer (The second user must be a developer)</a><br /> Adding a user The pages that allow you to add a developer and supervisor no longer work because they are not compatible with our recent changes. We will change these pages to integrate our style changes. The view contained in the create_supervisor.py file will contain the following code: from django.shortcuts import render from TasksManager.models import Supervisor from django import forms from django.http import HttpResponseRedirect from django.core.urlresolvers import reverse from django.contrib.auth.models import User def page(request): if request.POST: form = Form_supervisor(request.POST) if form.is_valid(): name = form.cleaned_data['name'] login = form.cleaned_data['login'] password = form.cleaned_data['password'] specialisation = form.cleaned_data['specialisation'] email = form.cleaned_data['email'] new_user = User.objects.create_user(username = login, email = email, password=password) # In this line, we create an instance of the User model with the create_user() method. It is important to use this method because it can store a hashcode of the password in database. In this way, the password cannot be retrieved from the database. Django uses the PBKDF2 algorithm to generate the hash code password of the user. new_user.is_active = True # In this line, the is_active attribute defines whether the user can connect or not. This attribute is false by default which allows you to create a system of account verification by email, or other system user validation. new_user.last_name=name # In this line, we define the name of the new user. new_user.save() # In this line, we register the new user in the database. new_supervisor = Supervisor(user_auth = new_user, specialisation=specialisation) # In this line, we create the new supervisor with the form data. We do not forget to create the relationship with the User model by setting the property user_auth with new_user instance. new_supervisor.save() return HttpResponseRedirect(reverse('public_empty')) else: return render(request, 'en/public/create_supervisor.html', {'form' : form}) else: form = Form_supervisor() form = Form_supervisor() return render(request, 'en/public/create_supervisor.html', {'form' : form}) class Form_supervisor(forms.Form): name = forms.CharField(label="Name", max_length=30) login = forms.CharField(label = "Login") email = forms.EmailField(label = "Email") specialisation = forms.CharField(label = "Specialisation") password = forms.CharField(label = "Password", widget = forms.PasswordInput) password_bis = forms.CharField(label = "Password", widget = forms.PasswordInput) def clean(self): cleaned_data = super (Form_supervisor, self).clean() password = self.cleaned_data.get('password') password_bis = self.cleaned_data.get('password_bis') if password and password_bis and password != password_bis: raise forms.ValidationError("Passwords are not identical.") return self.cleaned_data The create_supervisor.html template remains the same, as we are using a Django form. You can change the page() method in the create_developer.py file to make it compatible with the authentication module (you can refer to downloadable Packt code files for further help): def page(request): if request.POST: form = Form_inscription(request.POST) if form.is_valid(): name = form.cleaned_data['name'] login = form.cleaned_data['login'] password = form.cleaned_data['password'] supervisor = form.cleaned_data['supervisor'] new_user = User.objects.create_user(username = login, password=password) new_user.is_active = True new_user.last_name=name new_user.save() new_developer = Developer(user_auth = new_user, supervisor=supervisor) new_developer.save() return HttpResponse("Developer added") else: return render(request, 'en/public/create_developer.html', {'form' : form}) else: form = Form_inscription() return render(request, 'en/public/create_developer.html', {'form' : form}) We can also modify developer_list.html with the following content: {% extends "base.html" %} {% block title_html %} Developer list {% endblock %} {% block h1 %} Developer list {% endblock %} {% block article_content %} <table> <tr> <td>Name</td> <td>Login</td> <td>Supervisor</td> </tr> {% for dev in object_list %} <tr> <!-- The following line displays the __str__ method of the model. In this case it will display the username of the developer --> <td><a href="">{{ dev }}</a></td> <!-- The following line displays the last_name of the developer --> <td>{{ dev.user_auth.last_name }}</td> <!-- The following line displays the __str__ method of the Supervisor model. In this case it will display the username of the supervisor --> <td>{{ dev.supervisor }}</td> </tr> {% endfor %} </table> {% endblock %} Login and logout pages Now that you can create users, you must create a login page to allow the user to authenticate. We must add the following URL in the urls.py file: url(r'^connection$', 'TasksManager.views.connection.page', name="public_connection"), You must then create the connection.py view with the following code: from django.shortcuts import render from django import forms from django.contrib.auth import authenticate, login # This line allows you to import the necessary functions of the authentication module. def page(request): if request.POST: # This line is used to check if the Form_connection form has been posted. If mailed, the form will be treated, otherwise it will be displayed to the user. form = Form_connection(request.POST) if form.is_valid(): username = form.cleaned_data["username"] password = form.cleaned_data["password"] user = authenticate(username=username, password=password) # This line verifies that the username exists and the password is correct. if user: # In this line, the authenticate function returns None if authentication has failed, otherwise it returns an object that validates the condition. login(request, user) # In this line, the login() function allows the user to connect. else: return render(request, 'en/public/connection.html', {'form' : form}) else: form = Form_connection() return render(request, 'en/public/connection.html', {'form' : form}) class Form_connection(forms.Form): username = forms.CharField(label="Login") password = forms.CharField(label="Password", widget=forms.PasswordInput) def clean(self): cleaned_data = super(Form_connection, self).clean() username = self.cleaned_data.get('username') password = self.cleaned_data.get('password') if not authenticate(username=username, password=password): raise forms.ValidationError("Wrong login or passwsord") return self.cleaned_data You must then create the connection.html template with the following code: {% extends "base.html" %} {% block article_content %} {% if user.is_authenticated %} <-- This line checks if the user is connected.--> <h1>You are connected.</h1> <p> Your email : {{ user.email }} <-- In this line, if the user is connected, this line will display his/her e-mail address.--> </p> {% else %} <!-- In this line, if the user is not connected, we display the login form.--> <h1>Connexion</h1> <form method="post" action="{{ public_connection }}"> {% csrf_token %} <table> {{ form.as_table }} </table> <input type="submit" class="button" value="Connection" /> </form> {% endif %} {% endblock %} When the user logs in, Django will save his/her data connection in session variables. This example has allowed us to verify that the audit login and password was transparent to the user. Indeed, the authenticate() and login() methods allow the developer to save a lot of time. Django also provides convenient shortcuts for the developer such as the user.is_authenticated attribute that checks if the user is logged in. Users prefer when a logout link is present on the website, especially when connecting from a public computer. We will now create the logout page. First, we need to create the logout.py file with the following code: from django.shortcuts import render from django.contrib.auth import logout def page(request): logout(request) return render(request, 'en/public/logout.html') In the previous code, we imported the logout() function of the authentication module and used it with the request object. This function will remove the user identifier of the request object, and delete flushes their session data. When the user logs out, he/she needs to know that the site was actually disconnected. Let's create the following template in the logout.html file: {% extends "base.html" %} {% block article_content %} <h1>You are not connected.</h1> {% endblock %} Restricting access to the connected members When developers implement an authentication system, it's usually to limit access to anonymous users. In this section, we'll see two ways to control access to our web pages. Restricting access in views The authentication module provides simple ways to prevent anonymous users from accessing some pages. Indeed, there is a very convenient decorator to restrict access to a view. This decorator is called login_required. In the example that follows, we will use the designer to limit access to the page() view from the create_developer module in the following manner: First, we must import the decorator with the following line: from django.contrib.auth.decorators import login_required Then, we will add the decorator just before the declaration of the view: @login_required def page(request): # This line already exists. Do not copy it. With the addition of these two lines, the page that lets you add a developer is only available to the logged-in users. If you try to access the page without being connected, you will realize that it is not very practical because the obtained page is a 404 error. To improve this, simply tell Django what the connection URL is by adding the following line in the settings.py file: LOGIN_URL = 'public_connection' With this line, if the user tries to access a protected page, he/she will be redirected to the login page. You may have noticed that if you're not logged in and you click the Create a developer link, the URL contains a parameter named next. The following is the screen capture of the URL: This parameter contains the URL that the user tried to consult. The authentication module redirects the user to the page when he/she connects. To do this, we will modify the connection.py file we created. We add the line that imports the render() function to import the redirect() function: from django.shortcuts import render, redirect To redirect the user after they log in, we must add two lines after the line that contains the code login (request, user). There are two lines to be added: if request.GET.get('next') is not None: return redirect(request.GET['next']) This system is very useful when the user session has expired and he/she wants to see a specific page. Restricting access to URLs The system that we have seen does not simply limit access to pages generated by CBVs. For this, we will use the same decorator, but this time in the urls.py file. We will add the following line to import the decorator: from django.contrib.auth.decorators import login_required We need to change the line that corresponds to the URL named create_project: url (r'^create_project$', login_required(CreateView.as_view(model=Project, template_name="en/public/create_project.html", success_url = 'index')), name="create_project"), The use of the login_required decorator is very simple and allows the developer to not waste too much time. Summary In this article, we modified our application to make it compatible with the authentication module. We created pages that allow the user to log in and log out. We then learned how to restrict access to some pages for the logged in users. To learn more about Django, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended: Django By Example (https://www.packtpub.com/web-development/django-example) Learning Django Web Development (https://www.packtpub.com/web-development/learning-django-web-development) Resources for Article: Further resources on this subject: Code Style in Django [Article] Adding a developer with Django forms [Article] Enhancing Your Blog with Advanced Features [Article]
Read more
  • 0
  • 0
  • 10521

article-image-trends-ux-design
Packt
17 Feb 2016
10 min read
Save for later

Trends UX Design

Packt
17 Feb 2016
10 min read
In this article written by Scott Faranello, author of the book Practical UX Design, as UX has come a long way since the early days of the Internet and it’s still evolving. UX may be more popular today than ever, but that doesn’t mean it’s perfected. There are still many unknowns in our field and we are still a long way off from any kind of universal understanding of what UX is, its value and its importance. There are many “truths” about UX – many of which was presented in this book – but with each new project and each new experience it becomes clear that the practice of UX is evolving and must continue to do so in order to make what we do even easier to understand. Them more popular something becomes the more those outside of the practice become interested. This is both a great thing and a bad thing. Good because it becomes better understand and accepted and bad because it can also become a “shiny object” that companies suddenly feel they must have without having any idea what it is and what it means to incorporate it into their culture. (For more resources related to this topic, see here.) Learning the fundamentals, as presented in this book, is the first step. Following that, staying abreast of what’s new will keep you moving forward and relevant as a practitioner. There is a classic movie called Glengarry Glen Ross in which the phrase Always Be Closing (ABC) came to define the practice of selling and salesmanship. Well, UX has a phrase as well, one I call ABL or Always Be Learning. Trends may come and go but learning is life long. As for trends, I recommend doing a Google search at the beginning and end of each year where you type: “UX Trends” followed by the upcoming year. If you haven’t done so do it for this year and make it a habit to do that going forward. You will find plenty of information about what coming and also what’s passed to see how much of it actually came true. Look at UX tends as more than predictions though. Trends in UX often become so because of what has already been happening, but starting to appear more often. Read about them, share them with your colleagues, notice them and seek them out in other’s work. Where some things like patterns and IA might be universal, the way we approach them can evolve. For example, here are some common areas of UX that while fundamental in our understanding of the practice are ever changing: Presentation of Design Uniformity of design UX’s Role in Design Advancement in Technologies Generational Changes and Expectations Presentation of design Today there are two ways to present design in terms of UX: Visibly and Invisibly. Visible design: Visible design is what we know and see when we go online and look at our phones, our watches and anything else that requires our attention to interactive screens using typical design elements (Colors, fonts, patterns, and so on.) Invisible design: Invisible design doesn’t require an interface in the traditional sense. We still need to interact, but none of the typical design elements are necessary. All that’s needed is an ability to read and to follow very easy instructions. For example: https://digit.co/ Digit is an application that automatically saves the user money by transferring small amounts, or whatever Digit deems “safe” to transfer without the user even noticing. It does this by tracking the users spending habits and then every few days transferring small amounts ($5 up to $50) into a Digit account where it sits until the user is ready to transfer it back to their account to spend on whatever they choose. There is no fee to use Digit. To understand how it makes money, please read about it on their website. The point is, the “interface” of digit is text on the users' phone. Invisible design is a “conversation” between itself and the user that works seamlessly and without the need for a traditional app or screen and it’s becoming more real every day. Invisible design is less about the app and more about what the app provides to the user without the user having to do anything except receive the message. Recall Dieter Rams’ Ten Principles of Design from Chapter 4 when he pointed out that good design is unobtrusive; good design does not distract or rely on the user stopping to figure it out; good design just works. This is true of all good design, both visible and invisible, but while some require an interface to engage and interact, truly invisible design does not. Some additional examples of invisible design: Automated airline text messages for flight delays Weather alerts Auto updates of apps on any device Knock (which is something you need to see to understand and even then, but it’s intriguing: http://www.knocktounlock.com/). Internet of Things (IoT) Invisible design is still is still evolving. Will it one day overtake traditional apps or has it already? Uniformity of design Ask designers today what they think of the state of design and you will get varied answers. One that seems to be getting louder is the notion that design is losing its identity. Google articles with titles such as “Where Has All The Soul Gone”; “Fall of the Designer”; “Why Web Design Is Dead”, “The End of Design As We Know It” and you will find arguments about how templates have taken over. Sites like WordPress, Drupal and others like Medium that allow content to be created by the user and seen my millions without any of the heavy lifting of web design, has created, in some eyes, a world of bland design where creativity suffers, patterns have matured and where “up and running” has become more important than what we are actually running with. There may be something to the argument, but I will leave it to you to decide. UX’s Role in Design As UX becomes more prevalent and as companies become more aware of it there is a risk of UX becoming less of a stand-alone discipline and more of something that everyone just does. A similar thing happened with Customer Experience, or UX, where now companies just consider themselves “customer centric” even if they aren’t and where the mantra is “Customer First”, even though that might not be the case in reality. Mantras are easy; living up to them at a high level is something else. As a result of these changes, UX as a practice has to clarify itself and its value more than ever. Where once just being good at design and making wireframes were a priority, today these are skills that even developers can lay claim to. To remain relevant means being able to address the more challenging concepts and activities that many may not be able to grasp or have the time to focus on, like ethnography, human factors and on going, in-depth research around usability, user needs and analyzing usage patterns and analytics as they relate to productivity. While these may not be solely within the UX domain, they are areas of focus that many companies are not spending enough time on, to their detriment. They are, however, areas that UX can easily focus on and in so doing close a widening gap that benefits everyone involved. Advancement in Technologies As technology becomes easier to use, it usually requires greater complexity to make that happen. As a result, UX practitioners need to be more aware of and on top of the latest trends as well as increasing their understanding of the technologies underlying them. This is not to suggest that UX practitioners become programmers or even a strong understanding of how to code. What is suggested is that UX practitioners become more aware of how technology is becoming more integral and intertwined into the everyday. For example, we will continue to hear about the Internet of Things (IoT) for some time, at least in the near future and probably longer. The more invisible the user experience becomes and the more untethered we become from standalone devices the more important it is going to be for UX practitioners to think more conceptually and more creatively to see how seemingly disparate objects and ideas can go together to make something entirely new. Some examples of IoT technology: Prescription refill and regimen reminders (glowcaps.com) Remote biometric readers to track a patients’ heart rate, blood pressure, safe activity levels, etc. (preventicesolutions.com) Controlling the home from afar with smart outlets that can be used to turn appliances and thermostats on and off (nest.com, belkin.com) Smart trash cans (bigbelly.com) Efficient “smart” street light systems (Echelon) The list literally goes on and on because there really is no end to what technology can provide to almost anything and all of it will require user engagement in one way or another. Knowing what, where, when and how these changes will take place and how they will affect the end user is our role and our responsibility as UX practitioners to get right the first time. Generational Changes and Expectations Baby Boomers, Gen X, Gen Y, Millenials, Gen Z. With each passing year and decade it gets harder and harder to know what the next generation will expect and demand from technology. One thing is for sure, they will expect things to work without question and without the need to stop and thin about it. Every company will require meeting such demands over at least the next decade and anyone not thinking about this and the workforce requirements to meet these demands is going to disappear. Even today we see it happening as institutions like WalMart fall to Amazon, Taxi’s to Uber, TV networks to Netflix, and who knows what else. Trends may come and go, but change, whether we are ready for it or not is going to happen faster than we’ve ever experienced it. Will you be ready? Summary In this article we looked at Tools and Trends and the importance of staying engaged and attuned to how the field of UX is evolving. In addition, choosing the tools available to make our jobs easier should be based on your needs and the needs of your team, however, when it comes to standard tools like ethnography, empathy and journey maps and usability studies there is less to debate and more to learn in terms of using these tools to get the most from your customers/users in terms of understanding and need. Resources for Article: Further resources on this subject: Performance by Design[article] Unity 3.x Scripting-Character Controller versus Rigidbody[article] Using Specular in Unity[article]
Read more
  • 0
  • 0
  • 26648
Modal Close icon
Modal Close icon