
How-To Tutorials

7019 Articles

An Introduction to Python Lists and Dictionaries

Packt
09 Mar 2016
10 min read
In this article by Jessica Ingrassellino, the author of Python Projects for Kids, you will learn that Python has very efficient ways of storing data, which is one reason why it is popular among many companies that make web applications. You will learn about the two most important ways to store and retrieve data in Python—lists and dictionaries. (For more resources related to this topic, see here.) Lists Lists have many different uses when coding and many different operations can be performed on lists, thanks to Python. In this article, you will only learn some of the many uses of lists. However, if you wish to learn more about lists, the Python documentation is very detailed and available at https://docs.python.org/3/tutorial/datastructures.html?highlight=lists#more-on-lists. First, it is important to note that a list is made by assigning it a name and putting the items in the list inside of square brackets []. In your Python shell, type the three lists, one on each line: fruit = ['apple', 'banana', 'kiwi', 'dragonfruit'] years = [2012,  2013,  2014,  2015] students_in_class = [30,  22,  28,  33] The lists that you just typed have a particular kind of data inside. However, one good feature of lists is that they can mix up data types within the same list. For example, I have made this list that combines strings and integers: computer_class = ['Cynthia', 78, 42, 'Raj', 98, 24, 35, 'Kadeem', 'Rachel'] Now that we have made the lists, we can get the contents of the list in many ways. In fact, once you create a list, the computer remembers the order of the list, and the order stays constant until it is changed purposefully. The easiest way for us to see that the order of lists is maintained is to run tests on the lists that we have already made. The first item of a Python list is always counted as 0 (zero). So, for our first test, let's see if asking for the 0 item actually gives us the first item. Using our fruit list, we will type the name of the list inside of the print statement, and then add square brackets [] with the number 0: print(fruit[0]) Your output will be apple, since apple is the first fruit in the list that we created earlier. So, we have evidence that counting in Python does start with 0. Now, we can try to print the fourth item in the fruit list. You will notice that we are entering 3 in our print command. This is because the first item started at 0. Type the following code into your Python shell: print(fruit[3]) What is your outcome? Did you expect dragonfruit to be the answer? If so, good, you are learning to count items in lists. If not, remember that the first item in a list is the 0 item. With practice, you will become better at counting items in Python lists. For extra practice, work with the other lists that we made earlier, and try printing different items from the list by changing the number in the following line of code: print(list_name[item_number]) Where the code says list_name, write the name of the list that you want to use. Where the code says item_number, write the number of the item that you want to print. Remember that lists begin counting at 0. Changing the list – adding and removing information Even though lists have order, lists can be changed. Items can be added to a list, removed from a list, or changed in a list. Again, there are many ways to interact with lists. We will only discuss a few here, but you can always read the Python documentation for more information. To add an item to our fruit list, for example, we can use a method called list.append(). 
To use this method, type the name of the list, a dot, the method name append, and then parenthesis with the item that you would like to add contained inside. If the item is a string, remember to use single quotes. Type the following code to add an orange to the list of fruits that we have made:   fruit.append('orange') Then, print the list of fruit to see that orange has been added to the list:     print(fruit) Now, let's say that we no longer wish for the dragonfruit to appear on our list. We will use a method called list.remove(). To do this, we will type the name of our list, a dot, the method name called remove, and the name of the item that we wish to remove:     fruit.remove('dragonfruit') Then, we will print the list to see that the dragonfruit has been removed:     print(fruit) If you have more than one of the same item in the list, list.remove() will only remove the first instance of that item. The other items with the same name need to be removed separately. Loops and lists Lists and for loops work very well together. With lists, we can do something called iteration. By itself, the word iteration means to repeat a procedure over and over again. We know that the for loops repeat things for a limited and specific number of times. In this sample, we have three colors in our list. Make this list in your Python terminal: colors = ['green', 'yellow', 'red'] Using our list, we may decide that for each color in the list, we want to print the statement called I see and add each color in our list. Using the for loop with the list, we can type the print statement one time and get three statements in return. Type the following for loop into your Python shell: for color in colors:        print('I see  ' + str(color)  +  '.') Once you are done typing the print line and pressing Enter twice, your for loop will start running, and you should see the following statements printed out in your Python shell: As you can imagine, lists and the for loops are very powerful when used together. Instead of having to type the line three times with three different pieces of code, we only had to type two lines of code. We used the str() method to make sure that the sentence that we printed combined with the list items. Our for loop is helpful because those two lines of code would work if there were 20 colors in our list. Dictionaries Dictionaries are another way to organize data. At first glance, a dictionary may look just like a list. However, dictionaries have different jobs, rules, and syntax. Dictionaries have names and use curly braces to store information. For example, if we wanted to make a dictionary called sports, we would then put the dictionary entries inside of curly braces. Here is a simple example: numbers = {'one': 1, 'two': 2, 'three': 3} Key/value pairs in dictionaries A dictionary stores information with things called keys and values. In a dictionary of items, for example, we may have keys that tell us the names of each item and values that tell us how many of each item we have in our inventory. Once we store these items in our dictionary, we can add or remove new items (keys), add new amounts (values), or change the amounts of existing items. Here is an example of a dictionary that could hold some information for a game. Let’s suppose that the hero in our game has some items needed to survive. Here is a dictionary of our hero's items: items = {'arrows' : 200, 'rocks' : 25, 'food' : 15, 'lives' : 2} Unlike lists, a dictionary uses keys and values to find information. 
So, this dictionary has the keys called arrows, rocks, food, and lives. Each of the numbers tells us the number of each item that our hero has. Dictionaries have different characteristics than lists do. So, we can look up certain items in our dictionary using the print function: print(items['arrows']) The result of this print command will be 200, as this is the number of arrows our hero has in his inventory.

Changing the dictionary – adding and removing information

Python offers us ways not only to make a dictionary but also to add and remove things from our dictionaries. For example, let's say that in our game, we allow the player to discover a fireball later in the game. To add the item to the dictionary, we will use what is called the subscript method to add a new key and a new value to our dictionary. This means that we will use the name of the dictionary and square brackets to write the name of the item that we wish to add, and finally, we will set the value to how many items we want to put into our dictionary:   items['fireball'] = 10 If we print the entire dictionary of items, you will see that fireball has been added:   print(items)   {'arrows': 200, 'rocks': 25, 'food': 15, 'lives': 2, 'fireball': 10} We can also change the number of items in our dictionary using the dict.update() method. This method uses the name of the dictionary and the word update. Then, in parentheses (), we use curly braces {} to type the name of the item that we wish to update, a colon (:), and the new number of items we want in the dictionary. Try this in your Python shell:   items.update({'rocks': 10})   print(items) You will notice that if you have done print(items), then you will now have 10 rocks instead of 25. We have successfully updated our number of items. To remove something from a dictionary, one must reference the key, or the name of the item, and delete it. By doing so, the value that goes with that key will also be removed. In Python, this means using del along with the name of the dictionary and the name of the item you wish to remove. Using the items dictionary as our example, let's remove lives, and then use a print statement to test and see whether the lives key was removed:   del items['lives']   print(items) The items dictionary will now look as follows: {'arrows': 200, 'rocks': 10, 'food': 15, 'fireball': 10} With dictionaries, information is stored and retrieved differently than with lists, but we can still perform the same operations of adding and removing information, as well as making changes to information.

Summary

Lists and dictionaries are two different ways to store and retrieve information in Python. Although the sample data that we saw in this article was small, Python can handle large sets of data and process them at high speeds. Learning to use lists and dictionaries will allow you to solve complicated programming problems in many realms, including gaming, web app development, and data analysis.

Resources for Article: Further resources on this subject: Exception Handling in MySQL for Python [article] Configuring and securing PYTHON LDAP Applications Part 1 [article] Web scraping with Python (Part 2) [article]
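To tie the pieces of this article together, here is a short, self-contained sketch that runs the list and dictionary operations discussed above in one place (a practice sketch; the values follow the examples in the text):

    # Lists: creating, indexing, adding, removing, and looping
    fruit = ['apple', 'banana', 'kiwi', 'dragonfruit']
    print(fruit[0])              # apple -- counting starts at 0
    fruit.append('orange')       # add an item to the end of the list
    fruit.remove('dragonfruit')  # remove the first matching item
    colors = ['green', 'yellow', 'red']
    for color in colors:
        print('I see ' + str(color) + '.')

    # Dictionaries: key/value storage, adding, updating, and deleting
    items = {'arrows': 200, 'rocks': 25, 'food': 15, 'lives': 2}
    print(items['arrows'])       # 200
    items['fireball'] = 10       # add a new key and value
    items.update({'rocks': 10})  # change an existing value
    del items['lives']           # remove a key and its value
    print(items)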

Python Data Structures

Packt
09 Mar 2016
23 min read
In this article written by Dusty Phillips, author of the book Python 3 Object-oriented Programming - Second Edition we'll be discussing the object-oriented features of data structures, when they should be used instead of a regular class, and when they should not be used. In particular, we'll be covering: Tuples and named tuples Dictionaries (For more resources related to this topic, see here.) Empty objects Let's start with the most basic Python built-in, one that we've seen many times already, the one that we've extended in every class we have created: the object. Technically, we can instantiate an object without writing a subclass: >>> o = object() >>> o.x = 5 Traceback (most recent call last):   File "<stdin>", line 1, in <module> AttributeError: 'object' object has no attribute 'x' Unfortunately, as you can see, it's not possible to set any attributes on an object that was instantiated directly. This isn't because the Python developers wanted to force us to write our own classes, or anything so sinister. They did this to save memory; a lot of memory. When Python allows an object to have arbitrary attributes, it takes a certain amount of system memory to keep track of what attributes each object has, for storing both the attribute name and its value. Even if no attributes are stored, memory is allocated for potential new attributes. Given the dozens, hundreds, or thousands of objects (every class extends object) in a typical Python program; this small amount of memory would quickly become a large amount of memory. So, Python disables arbitrary properties on object, and several other built-ins, by default. It is possible to restrict arbitrary properties on our own classes using slots. You now have a search term if you are looking for more information. In normal use, there isn't much benefit to using slots, but if you're writing an object that will be duplicated thousands of times throughout the system, they can help save memory, just as they do for object. It is, however, trivial to create an empty object class of our own; we saw it in our earliest example: class MyObject:     pass And, as we've already seen, it's possible to set attributes on such classes: >>> m = MyObject() >>> m.x = "hello" >>> m.x 'hello' If we wanted to group properties together, we could store them in an empty object like this. But we are usually better off using other built-ins designed for storing data. It has been stressed that classes and objects should only be used when you want to specify both data and behaviors. The main reason to write an empty class is to quickly block something out, knowing we'll come back later to add behavior. It is much easier to adapt behaviors to a class than it is to replace a data structure with an object and change all references to it. Therefore, it is important to decide from the outset if the data is just data, or if it is an object in disguise. Once that design decision is made, the rest of the design naturally falls into place. Tuples and named tuples Tuples are objects that can store a specific number of other objects in order. They are immutable, so we can't add, remove, or replace objects on the fly. This may seem like a massive restriction, but the truth is, if you need to modify a tuple, you're using the wrong data type (usually a list would be more suitable). The primary benefit of tuples' immutability is that we can use them as keys in dictionaries, and in other locations where an object requires a hash value. 
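For example, a (symbol, date) pair makes a perfectly good dictionary key, where a list never could; here is a minimal sketch (the dates and prices are invented, loosely echoing the stock examples used later in this article):

    # Tuples are hashable, so they can index a dictionary directly.
    daily_range = {
        ('FB', '2014-10-31'): (75.03, 74.90),      # (symbol, date) -> (high, low)
        ('MSFT', '2014-10-31'): (30.70, 30.19),
    }
    print(daily_range[('FB', '2014-10-31')])       # (75.03, 74.9)
    # A list key such as ['FB', '2014-10-31'] would raise TypeError: unhashable type: 'list'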
Tuples are used to store data; behavior cannot be stored in a tuple. If we require behavior to manipulate a tuple, we have to pass the tuple into a function (or method on another object) that performs the action. Tuples should generally store values that are somehow different from each other. For example, we would not put three stock symbols in a tuple, but we might create a tuple of stock symbol, current price, high, and low for the day. The primary purpose of a tuple is to aggregate different pieces of data together into one container. Thus, a tuple can be the easiest tool to replace the "object with no data" idiom. We can create a tuple by separating the values with a comma. Usually, tuples are wrapped in parentheses to make them easy to read and to separate them from other parts of an expression, but this is not always mandatory. The following two assignments are identical (they record a stock, the current price, the high, and the low for a rather profitable company): >>> stock = "FB", 75.00, 75.03, 74.90 >>> stock2 = ("FB", 75.00, 75.03, 74.90) If we're grouping a tuple inside of some other object, such as a function call, list comprehension, or generator, the parentheses are required. Otherwise, it would be impossible for the interpreter to know whether it is a tuple or the next function parameter. For example, the following function accepts a tuple and a date, and returns a tuple of the date and the middle value between the stock's high and low value: import datetime def middle(stock, date):     symbol, current, high, low = stock     return (((high + low) / 2), date)   mid_value, date = middle(("FB", 75.00, 75.03, 74.90),         datetime.date(2014, 10, 31)) The tuple is created directly inside the function call by separating the values with commas and enclosing the entire tuple in parenthesis. This tuple is then followed by a comma to separate it from the second argument. This example also illustrates tuple unpacking. The first line inside the function unpacks the stock parameter into four different variables. The tuple has to be exactly the same length as the number of variables, or it will raise an exception. We can also see an example of tuple unpacking on the last line, where the tuple returned inside the function is unpacked into two values, mid_value and date. Granted, this is a strange thing to do, since we supplied the date to the function in the first place, but it gave us a chance to see unpacking at work. Unpacking is a very useful feature in Python. We can group variables together to make storing and passing them around simpler, but the moment we need to access all of them, we can unpack them into separate variables. Of course, sometimes we only need access to one of the variables in the tuple. We can use the same syntax that we use for other sequence types (lists and strings, for example) to access an individual value: >>> stock = "FB", 75.00, 75.03, 74.90 >>> high = stock[2] >>> high 75.03 We can even use slice notation to extract larger pieces of tuples: >>> stock[1:3] (75.00, 75.03) These examples, while illustrating how flexible tuples can be, also demonstrate one of their major disadvantages: readability. How does someone reading this code know what is in the second position of a specific tuple? They can guess, from the name of the variable we assigned it to, that it is high of some sort, but if we had just accessed the tuple value in a calculation without assigning it, there would be no such indication. 
They would have to paw through the code to find where the tuple was declared before they could discover what it does. Accessing tuple members directly is fine in some circumstances, but don't make a habit of it. Such so-called "magic numbers" (numbers that seem to come out of thin air with no apparent meaning within the code) are the source of many coding errors and lead to hours of frustrated debugging. Try to use tuples only when you know that all the values are going to be useful at once and it's normally going to be unpacked when it is accessed. If you have to access a member directly or using a slice and the purpose of that value is not immediately obvious, at least include a comment explaining where it came from. Named tuples So, what do we do when we want to group values together, but know we're frequently going to need to access them individually? Well, we could use an empty object, as discussed in the previous section (but that is rarely useful unless we anticipate adding behavior later), or we could use a dictionary (most useful if we don't know exactly how many or which specific data will be stored), as we'll cover in the next section. If, however, we do not need to add behavior to the object, and we know in advance what attributes we need to store, we can use a named tuple. Named tuples are tuples with attitude. They are a great way to group read-only data together. Constructing a named tuple takes a bit more work than a normal tuple. First, we have to import namedtuple, as it is not in the namespace by default. Then, we describe the named tuple by giving it a name and outlining its attributes. This returns a class-like object that we can instantiate with the required values as many times as we want: from collections import namedtuple Stock = namedtuple("Stock", "symbol current high low") stock = Stock("FB", 75.00, high=75.03, low=74.90) The namedtuple constructor accepts two arguments. The first is an identifier for the named tuple. The second is a string of space-separated attributes that the named tuple can have. The first attribute should be listed, followed by a space (or comma if you prefer), then the second attribute, then another space, and so on. The result is an object that can be called just like a normal class to instantiate other objects. The constructor must have exactly the right number of arguments that can be passed in as arguments or keyword arguments. As with normal objects, we can create as many instances of this "class" as we like, with different values for each. The resulting namedtuple can then be packed, unpacked, and otherwise treated like a normal tuple, but we can also access individual attributes on it as if it were an object: >>> stock.high 75.03 >>> symbol, current, high, low = stock >>> current 75.00 Remember that creating named tuples is a two-step process. First, use collections.namedtuple to create a class, and then construct instances of that class. Named tuples are perfect for many "data only" representations, but they are not ideal for all situations. Like tuples and strings, named tuples are immutable, so we cannot modify an attribute once it has been set. For example, the current value of my company's stock has gone down since we started this discussion, but we can't set the new value: >>> stock.current = 74.98 Traceback (most recent call last):   File "<stdin>", line 1, in <module> AttributeError: can't set attribute If we need to be able to change stored data, a dictionary may be what we need instead. 
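To see how a named tuple addresses the readability concern raised earlier, here is the middle() example rewritten so that the stock's values are accessed by name instead of by position (a small sketch reusing the Stock named tuple defined above):

    import datetime
    from collections import namedtuple

    Stock = namedtuple("Stock", "symbol current high low")

    def middle(stock, date):
        # No positional unpacking needed; the attribute names document themselves
        return (stock.high + stock.low) / 2, date

    fb = Stock("FB", 75.00, high=75.03, low=74.90)
    mid_value, date = middle(fb, datetime.date(2014, 10, 31))
    print(mid_value, date)    # 74.965 2014-10-31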
Dictionaries Dictionaries are incredibly useful containers that allow us to map objects directly to other objects. An empty object with attributes to it is a sort of dictionary; the names of the properties map to the property values. This is actually closer to the truth than it sounds; internally, objects normally represent attributes as a dictionary, where the values are properties or methods on the objects (see the __dict__ attribute if you don't believe me). Even the attributes on a module are stored, internally, in a dictionary. Dictionaries are extremely efficient at looking up a value, given a specific key object that maps to that value. They should always be used when you want to find one object based on some other object. The object that is being stored is called the value; the object that is being used as an index is called the key. We've already seen dictionary syntax in some of our previous examples. Dictionaries can be created either using the dict() constructor or using the {} syntax shortcut. In practice, the latter format is almost always used. We can prepopulate a dictionary by separating the keys from the values using a colon, and separating the key value pairs using a comma. For example, in a stock application, we would most often want to look up prices by the stock symbol. We can create a dictionary that uses stock symbols as keys, and tuples of current, high, and low as values like this: stocks = {"GOOG": (613.30, 625.86, 610.50),           "MSFT": (30.25, 30.70, 30.19)} As we've seen in previous examples, we can then look up values in the dictionary by requesting a key inside square brackets. If the key is not in the dictionary, it will raise an exception: >>> stocks["GOOG"] (613.3, 625.86, 610.5) >>> stocks["RIM"] Traceback (most recent call last):   File "<stdin>", line 1, in <module> KeyError: 'RIM' We can, of course, catch the KeyError and handle it. But we have other options. Remember, dictionaries are objects, even if their primary purpose is to hold other objects. As such, they have several behaviors associated with them. One of the most useful of these methods is the get method; it accepts a key as the first parameter and an optional default value if the key doesn't exist: >>> print(stocks.get("RIM")) None >>> stocks.get("RIM", "NOT FOUND") 'NOT FOUND' For even more control, we can use the setdefault method. If the key is in the dictionary, this method behaves just like get; it returns the value for that key. Otherwise, if the key is not in the dictionary, it will not only return the default value we supply in the method call (just like get does), it will also set the key to that same value. Another way to think of it is that setdefault sets a value in the dictionary only if that value has not previously been set. Then it returns the value in the dictionary, either the one that was already there, or the newly provided default value. >>> stocks.setdefault("GOOG", "INVALID") (613.3, 625.86, 610.5) >>> stocks.setdefault("BBRY", (10.50, 10.62, 10.39)) (10.50, 10.62, 10.39) >>> stocks["BBRY"] (10.50, 10.62, 10.39) The GOOG stock was already in the dictionary, so when we tried to setdefault it to an invalid value, it just returned the value already in the dictionary. BBRY was not in the dictionary, so setdefault returned the default value and set the new value in the dictionary for us. We then check that the new stock is, indeed, in the dictionary. Three other very useful dictionary methods are keys(), values(), and items(). 
The first two return an iterator over all the keys and all the values in the dictionary. We can use these like lists or in for loops if we want to process all the keys or values. The items() method is probably the most useful; it returns an iterator over tuples of (key, value) pairs for every item in the dictionary. This works great with tuple unpacking in a for loop to loop over associated keys and values. This example does just that to print each stock in the dictionary with its current value: >>> for stock, values in stocks.items(): ...     print("{} last value is {}".format(stock, values[0])) ... GOOG last value is 613.3 BBRY last value is 10.50 MSFT last value is 30.25 Each key/value tuple is unpacked into two variables named stock and values (we could use any variable names we wanted, but these both seem appropriate) and then printed in a formatted string. Notice that the stocks do not show up in the same order in which they were inserted. Dictionaries, due to the efficient algorithm (known as hashing) that is used to make key lookup so fast, are inherently unsorted. So, there are numerous ways to retrieve data from a dictionary once it has been instantiated; we can use square brackets as index syntax, the get method, the setdefault method, or iterate over the items method, among others. Finally, as you likely already know, we can set a value in a dictionary using the same indexing syntax we use to retrieve a value: >>> stocks["GOOG"] = (597.63, 610.00, 596.28) >>> stocks['GOOG'] (597.63, 610.0, 596.28) Google's price is lower today, so I've updated the tuple value in the dictionary. We can use this index syntax to set a value for any key, regardless of whether the key is in the dictionary. If it is in the dictionary, the old value will be replaced with the new one; otherwise, a new key/value pair will be created. We've been using strings as dictionary keys, so far, but we aren't limited to string keys. It is common to use strings as keys, especially when we're storing data in a dictionary to gather it together (instead of using an object with named properties). But we can also use tuples, numbers, or even objects we've defined ourselves as dictionary keys. We can even use different types of keys in a single dictionary: random_keys = {} random_keys["astring"] = "somestring" random_keys[5] = "aninteger" random_keys[25.2] = "floats work too" random_keys[("abc", 123)] = "so do tuples"   class AnObject:     def __init__(self, avalue):         self.avalue = avalue   my_object = AnObject(14) random_keys[my_object] = "We can even store objects" my_object.avalue = 12 try:     random_keys[[1,2,3]] = "we can't store lists though" except:     print("unable to store listn")   for key, value in random_keys.items():     print("{} has value {}".format(key, value)) This code shows several different types of keys we can supply to a dictionary. It also shows one type of object that cannot be used. We've already used lists extensively, and we'll be seeing many more details of them in the next section. Because lists can change at any time (by adding or removing items, for example), they cannot hash to a specific value. Objects that are hashable basically have a defined algorithm that converts the object into a unique integer value for rapid lookup. This hash is what is actually used to look up values in a dictionary. For example, strings map to integers based on the characters in the string, while tuples combine hashes of the items inside the tuple. 
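The hashing behaviour described above is easy to poke at directly in an interactive shell; a quick sketch (the exact hash numbers vary between Python runs, so only the relationships matter):

    # Equal strings and equal tuples produce equal hashes...
    print(hash('abc') == hash('abc'))                  # True
    print(hash(('abc', 123)) == hash(('abc', 123)))    # True

    # ...but a mutable list cannot be hashed at all, which is why it cannot be a key
    try:
        hash([1, 2, 3])
    except TypeError as error:
        print(error)                                   # unhashable type: 'list'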
Any two objects that are somehow considered equal (like strings with the same characters or tuples with the same values) should have the same hash value, and the hash value for an object should never ever change. Lists, however, can have their contents changed, which would change their hash value (two lists should only be equal if their contents are the same). Because of this, they can't be used as dictionary keys. For the same reason, dictionaries cannot be used as keys into other dictionaries. In contrast, there are no limits on the types of objects that can be used as dictionary values. We can use a string key that maps to a list value, for example, or we can have a nested dictionary as a value in another dictionary. Dictionary use cases Dictionaries are extremely versatile and have numerous uses. There are two major ways that dictionaries can be used. The first is dictionaries where all the keys represent different instances of similar objects; for example, our stock dictionary. This is an indexing system. We use the stock symbol as an index to the values. The values could even have been complicated self-defined objects that made buy and sell decisions or set a stop-loss, rather than our simple tuples. The second design is dictionaries where each key represents some aspect of a single structure; in this case, we'd probably use a separate dictionary for each object, and they'd all have similar (though often not identical) sets of keys. This latter situation can often also be solved with named tuples. These should typically be used when we know exactly what attributes the data must store, and we know that all pieces of the data must be supplied at once (when the item is constructed). But if we need to create or change dictionary keys over time or we don't know exactly what the keys might be, a dictionary is more suitable. Using defaultdict We've seen how to use setdefault to set a default value if a key doesn't exist, but this can get a bit monotonous if we need to set a default value every time we look up a value. For example, if we're writing code that counts the number of times a letter occurs in a given sentence, we could do this: def letter_frequency(sentence):     frequencies = {}     for letter in sentence:         frequency = frequencies.setdefault(letter, 0)         frequencies[letter] = frequency + 1     return frequencies Every time we access the dictionary, we need to check that it has a value already, and if not, set it to zero. When something like this needs to be done every time an empty key is requested, we can use a different version of the dictionary, called defaultdict: from collections import defaultdict def letter_frequency(sentence):     frequencies = defaultdict(int)     for letter in sentence:         frequencies[letter] += 1     return frequencies This code looks like it couldn't possibly work. The defaultdict accepts a function in its constructor. Whenever a key is accessed that is not already in the dictionary, it calls that function, with no parameters, to create a default value. In this case, the function it calls is int, which is the constructor for an integer object. Normally, integers are created simply by typing an integer number into our code, and if we do create one using the int constructor, we pass it the item we want to create (for example, to convert a string of digits into an integer). But if we call int without any arguments, it returns, conveniently, the number zero. 
In this code, if the letter doesn't exist in the defaultdict, the number zero is returned when we access it. Then we add one to this number to indicate we've found an instance of that letter, and the next time we find one, that number will be returned and we can increment the value again. The defaultdict is useful for creating dictionaries of containers. If we want to create a dictionary of stock prices for the past 30 days, we could use a stock symbol as the key and store the prices in list; the first time we access the stock price, we would want it to create an empty list. Simply pass list into the defaultdict, and it will be called every time an empty key is accessed. We can do similar things with sets or even empty dictionaries if we want to associate one with a key. Of course, we can also write our own functions and pass them into the defaultdict. Suppose we want to create a defaultdict where each new element contains a tuple of the number of items inserted into the dictionary at that time and an empty list to hold other things. Nobody knows why we would want to create such an object, but let's have a look: from collections import defaultdict num_items = 0 def tuple_counter():     global num_items     num_items += 1     return (num_items, [])   d = defaultdict(tuple_counter) When we run this code, we can access empty keys and insert into the list all in one statement: >>> d = defaultdict(tuple_counter) >>> d['a'][1].append("hello") >>> d['b'][1].append('world') >>> d defaultdict(<function tuple_counter at 0x82f2c6c>, {'a': (1, ['hello']), 'b': (2, ['world'])}) When we print dict at the end, we see that the counter really was working. This example, while succinctly demonstrating how to create our own function for defaultdict, is not actually very good code; using a global variable means that if we created four different defaultdict segments that each used tuple_counter, it would count the number of entries in all dictionaries, rather than having a different count for each one. It would be better to create a class and pass a method on that class to defaultdict. Counter You'd think that you couldn't get much simpler than defaultdict(int), but the "I want to count specific instances in an iterable" use case is common enough that the Python developers created a specific class for it. The previous code that counts characters in a string can easily be calculated in a single line: from collections import Counter def letter_frequency(sentence):     return Counter(sentence) The Counter object behaves like a beefed up dictionary where the keys are the items being counted and the values are the number of such items. One of the most useful functions is the most_common() method. It returns a list of (key, count) tuples ordered by the count. You can optionally pass an integer argument into most_common() to request only the top most common elements. For example, you could write a simple polling application as follows: from collections import Counter   responses = [     "vanilla",     "chocolate",     "vanilla",     "vanilla",     "caramel",     "strawberry",     "vanilla" ]   print(     "The children voted for {} ice cream".format(         Counter(responses).most_common(1)[0][0]     ) ) Presumably, you'd get the responses from a database or by using a complicated vision algorithm to count the kids who raised their hands. Here, we hardcode it so that we can test the most_common method. It returns a list that has only one element (because we requested one element in the parameter). 
This single element is a (choice, count) tuple, and the name of the top choice sits at position zero of that tuple, hence the double [0][0] at the end of the call: the first [0] picks the tuple out of the list, and the second picks the name out of the tuple. I think they look like a surprised face, don't you? Your computer is probably amazed it can count data so easily. Its ancestor, Hollerith's Tabulating Machine for the 1890 US census, must be so jealous!

Summary

We've covered several built-in data structures and attempted to understand how to choose one for specific applications. Sometimes, the best thing we can do is create a new class of objects, but often, one of the built-ins provides exactly what we need. When it doesn't, we can always use inheritance or composition to adapt them to our use cases. We can even override special methods to completely change the behavior of built-in syntaxes. Be sure to pick up the full title, Python 3 Object-oriented Programming - Second Edition, to continue your learning in the world of object-oriented Python. If you want to go above and beyond, then there's no better way than building on what you've discovered with Mastering Object-oriented Python and Learning Object-Oriented Programming too!

Resources for Article: Further resources on this subject: Python LDAP applications - extra LDAP operations and the LDAP URL library [article] Exception Handling in MySQL for Python [article] Data Transactions Made Easy with MySQL and Python [article]
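As one more illustration of the defaultdict-of-containers pattern described in this article, here is a minimal sketch that collects prices per stock symbol into lists without ever checking for missing keys (the symbols and prices echo the stock examples above):

    from collections import defaultdict

    prices_by_symbol = defaultdict(list)    # every new key starts out as an empty list

    for symbol, price in [('GOOG', 613.30), ('MSFT', 30.25), ('GOOG', 597.63)]:
        prices_by_symbol[symbol].append(price)

    print(prices_by_symbol['GOOG'])    # [613.3, 597.63]
    print(prices_by_symbol['MSFT'])    # [30.25]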

Magento 2 – the New E-commerce Era

Packt
08 Mar 2016
17 min read
In this article by Ray Bogman and Vladimir Kerkhoff, the authors of the book, Magento 2 Cookbook, we will cover the basic tasks related to creating a catalog and products in Magento 2. You will learn the following recipes: Creating a root catalog Creating subcategories Managing an attribute set (For more resources related to this topic, see here.) Introduction This article explains how to set up a vanilla Magento 2 store. If Magento 2 is totally new for you, then lots of new basic whereabouts are pointed out. If you are currently working with Magento 1, then not a lot has changed since. The new backend of Magento 2 is the biggest improvement of them all. The design is built responsively and has a great user experience. Compared to Magento 1, this is a great improvement. The menu is located vertically on the left of the screen and works great on desktop and mobile environments: In this article, we will see how to set up a website with multiple domains using different catalogs. Depending on the website, store, and store view setup, we can create different subcategories, URLs, and product per domain name. There are a number of different ways customers can browse your store, but one of the most effective one is layered navigation. Layered navigation is located in your catalog and holds product features to sort or filter. Every website benefits from great Search Engine Optimization (SEO). You will learn how to define catalog URLs per catalog. Throughout this article, we will cover the basics on how to set up a multidomain setup. Additional tasks required to complete a production-like setup are out of the scope of this article. Creating a root catalog The first thing that we need to start with when setting up a vanilla Magento 2 website is defining our website, store, and store view structure. So what is the difference between website, store, and store view, and why is it important: A website is the top-level container and most important of the three. It is the parent level of the entire store and used, for example, to define domain names, different shipping methods, payment options, customers, orders, and so on. Stores can be used to define, for example, different store views with the same information. A store is always connected to a root catalog that holds all the categories and subcategories. One website can manage multiple stores, and every store has a different root catalog. When using multiple stores, it is not possible to share one basket. The main reason for this has to do with the configuration setup where shipping, catalog, customer, inventory, taxes, and payment settings are not sharable between different sites. Store views is the lowest level and mostly used to handle different localizations. Every store view can be set with a different language. Besides using store views just for localizations, it can also be used for Business to Business (B2B), hidden private sales pages (with noindex and nofollow), and so on. The option where we use the base link URL, for example, (yourdomain.com/myhiddenpage) is easy to set up. The website, store, and store view structure is shown in the following image: Getting ready For this recipe, we will use a Droplet created at DigitalOcean, https://www.digitalocean.com/. We will be using NGINX, PHP-FPM, and a Composer-based setup including Magento 2 preinstalled. No other prerequisites are required. How to do it... 
For the purpose of this recipe, let's assume that we need to create a multi-website setup including three domains (yourdomain.com, yourdomain.de, and yourdomain.fr) and separate root catalogs. The following steps will guide you through this: First, we need to update our NGINX. We need to configure the additional domains before we can connect them to Magento. Make sure that all domain names are connected to your server and DNS is configured correctly. Go to /etc/nginx/conf.d, open the default.conf file, and include the following content at the top of your file: map $http_host $magecode { hostnames; default base; yourdomain.de de; yourdomain.fr fr; } Your configuration should look like this now: map $http_host $magecode { hostnames; default base; yourdomain.de de; yourdomain.fr fr; } upstream fastcgi_backend { server 127.0.0.1:9000; } server { listen 80; listen 443 ssl http2; server_name yourdomain.com; set $MAGE_ROOT /var/www/html; set $MAGE_MODE developer; ssl_certificate /etc/ssl/yourdomain-com.cert; ssl_certificate_key /etc/ssl/yourdomain-com.key; include /var/www/html/nginx.conf.sample; access_log /var/log/nginx/access.log; error_log /var/log/nginx/error.log; location ~ /\.ht { deny all; } } Now let's go to the Magento 2 configuration file in /var/www/html/ and open the nginx.conf.sample file. Go to the bottom and look for the following: location ~ (index|get|static|report|404|503)\.php$ Now we add the following lines to the file under fastcgi_pass   fastcgi_backend;: fastcgi_param MAGE_RUN_TYPE website; fastcgi_param MAGE_RUN_CODE $magecode; Your configuration should look like this now (this is only a small section of the bottom section): location ~ (index|get|static|report|404|503)\.php$ { try_files $uri =404; fastcgi_pass fastcgi_backend; fastcgi_param MAGE_RUN_TYPE website; fastcgi_param MAGE_RUN_CODE $magecode; fastcgi_param PHP_FLAG "session.auto_start=off \n suhosin.session.cryptua=off"; fastcgi_param PHP_VALUE "memory_limit=256M \n max_execution_time=600"; fastcgi_read_timeout 600s; fastcgi_connect_timeout 600s; fastcgi_param MAGE_MODE $MAGE_MODE; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; include fastcgi_params; } The current setup is using the MAGE_RUN_TYPE website variable. You may change website to store depending on your setup preferences. When changing the variable, you need your default.conf mapping codes as well. Now, all you have to do is restart NGINX and PHP-FPM to use your new settings. Run the following command: service nginx restart && service php-fpm restart Before we continue we need to check whether our web server is serving the correct codes. Run the following command in the Magento 2 web directory: var/www/html/pub echo "<?php header("Content-type: text/plain"); print_r($_SERVER); ?>" > magecode.php Don't forget to update your nginx.conf.sample file with the new magecode code. It's located at the bottom of your file and should look like this: location ~ (index|get|static|report|404|503|magecode)\.php$ { Restart NGINX and open the file in your browser. The output should look as follows. As you can see, the created MAGE_RUN variables are available. Congratulations, you just finished configuring NGINX including additional domains. Now let's continue connecting them in Magento 2. Now log in to the backend and navigate to Stores | All Stores. By default, Magento 2 has one Website, Store, and Store View setup. 
Now click on Create Website and commit the following details: Name My German Website Code de Next, click on Create Store and commit the following details: Web site My German Website Name My German Website Root Category Default Category (we will change this later)  Next, click on Create Store View and commit the following details: Store My German Website Name German Code de Status Enabled  Continue the same step for the French domain. Make sure that the Code in Website and Store View is fr. The next important step is connecting the websites with the domain name. Navigate to Stores | Configuration | Web | Base URLs. Change the Store View scope at the top to My German Website. You will be prompted when switching; press ok to continue. Now, unset the checkbox called Use Default from Base URL and Base Link URL and commit your domain name. Save and continue the same procedure for the other website. The output should look like this: Save your entire configuration and clear your cache. Now go to Products | Categories and click on Add Root Category with the following data: Name Root German Is Active Yes Page Title My German Website Continue the same step for the French domain. You may add additional information here but it is not needed. Changing the current Root Category called Default Category to Root English is also optional but advised. Save your configuration, go to Stores | All Stores, and change all of the stores to the appropriate Root Catalog that we just created. Every Root Category should now have a dedicated Root Catalog. Congratulations, you just finished configuring Magento 2 including additional domains and dedicated Root Categories. Now let's open a browser and surf to your created domain names: yourdomain.com, yourdomain.de, and yourdomain.fr. How it works… Let's recap and find out what we did throughout this recipe. In steps 1 through 11, we created a multistore setup for .com, .de, and .fr domains using a separate Root Catalog. In steps 1 through 4, we configured the domain mapping in the NGINX default.conf file. Then, we added the fastcgi_param MAGE_RUN code to the nginx.conf.sample file, which will manage what website or store view to request within Magento. In step 6, we used an easy test method to check whether all domains run the correct MAGE_RUN code. In steps 7 through 9, we configured the website, store, and store view name and code for the given domain names. In step 10, we created additional Root Catalogs for the remaining German and French stores. They are then connected to the previously created store configuration. All stores have their own Root Catalog now. There's more… Are you able to buy additional domain names but like to try setting up a multistore? Here are some tips to create one. Depending on whether you are using Windows, Mac OS, or Linux, the following options apply: Windows: Go to C:\Windows\System32\drivers\etc, open up the hosts file as an administrator, and add the following: (Change the IP and domain name accordingly.) 123.456.789.0 yourdomain.de 123.456.789.0 yourdomain.fr 123.456.789.0 www.yourdomain.de 123.456.789.0 www.yourdomain.fr Save the file and click on the Start button; then search for cmd.exe and commit the following: ipconfig /flushdns Mac OS: Go to the /etc/ directory, open the hosts file as a super user, and add the following: (Change the IP and domain name accordingly.) 
123.456.789.0 yourdomain.de 123.456.789.0 yourdomain.fr 123.456.789.0 www.yourdomain.de 123.456.789.0 www.yourdomain.fr Save the file and run the following command on the shell: dscacheutil -flushcache Depending on your Mac version, check out the different commands here: http://www.hongkiat.com/blog/how-to-clear-flush-dns-cache-in-os-x-yosemite/ Linux: Go to the /etc/ directory, open the hosts file as a root user, and add the following: (Change the IP and domain name accordingly.) 123.456.789.0 yourdomain.de 123.456.789.0 yourdomain.fr 123.456.789.0 www.yourdomain.de 123.456.789.0 www.yourdomain.fr Save the file and run the following command on the shell: service nscd restart Depending on your Linux version, check out the different commands here: http://www.cyberciti.biz/faq/rhel-debian-ubuntu-flush-clear-dns-cache/ Open your browser and surf to the custom-made domains. These domains work only on your PC. You can copy these IP and domain names on as many PCs as you prefer. This method also works great when you are developing or testing and your production domain is not available on your development environment. Creating subcategories After creating the foundation of the website, we need to set up a catalog structure. Setting up a catalog structure is not difficult, but needs to be thought out well. Some websites have an easy setup using two levels, while others sometimes use five or more subcategories. Always keep in mind the user experience; your customer needs to crawl the pages easily. Keep it simple! Getting ready For this recipe, we will use a Droplet created at DigitalOcean, https://www.digitalocean.com/. We will be using NGINX, PHP-FPM, and a Composer-based setup including Magento 2 preinstalled. No other prerequisites are required. How to do it... For the purpose of this recipe, let's assume that we need to set up a catalog including subcategories. The following steps will guide you through this: First, log in to the backend of Magento 2 and go to Products | Categories. As we have already created Root Catalogs, we start with using the Root English catalog first. Click on the Root English catalog on the left and then select the Add Subcategory button above the menu. Now commit the following and repeat all steps again for the other Root Catalogs: Name Shoes (Schuhe) (Chaussures) Is Active Yes Page Title Shoes (Schuhe) (Chaussures) Name Clothes (Kleider) (Vêtements) Is Active Yes Page Title Clothes (Kleider) (Vêtements) As we have created the first level of our catalog, we can continue with the second level. Now click on the first level that you need to extend with a subcategory and select the Add Subcategory button. Now commit the following and repeat all steps again for the other Root Catalogs: Name Men (Männer) (Hommes) Is Active Yes Page Title Men (Männer) (Hommes) Name Women (Frau) (Femmes) Is Active Yes Page Title Women (Frau) (Femmes) Congratulations, you just finished configuring subcategories in Magento 2. Now let's open a browser and surf to your created domain names: yourdomain.com, yourdomain.de, and yourdomain.fr. Your categories should now look as follows: How it works… Let's recap and find out what we did throughout this recipe. In steps 1 through 4, we created subcategories for the English, German, and French stores. In this recipe, we created a dedicated Root Catalog for every website. This way, every store can be configured using their own tax and shipping rules. There's more… In our example, we only submitted Name, Is Active, and Page Title. 
You may continue to commit the Description, Image, Meta Keywords, and Meta Description fields. By default, the URL key is the same as the Name field; you can change this depending on your SEO needs. Every category or subcategory has a default page layout defined by the theme. You may need to override this. Go to the Custom Design tab and click the drop-down menu of Page Layout. We can choose from the following options: 1 column, 2 columns with left bar, 2 columns with right bar, 3 columns, or Empty. Managing an attribute set Every product has a unique DNA; some products such as shoes could have different colors, brands, and sizes, while a snowboard could have weight, length, torsion, manufacture, and style. Setting up a website with all the attributes does not make sense. Depending on the products that you sell, you should create attributes that apply per website. When creating products for your website, attributes are the key elements and need to be thought through. What and how many attributes do I need? How many values does one need? All types of questions that could have a great impact on your website and, not to forget, the performance of it. Creating an attribute such as color and having 100 K of different key values stored is not improving your overall speed and user experience. Always think things through. After creating the attributes, we combine them in attribute sets that can be picked when starting to create a product. Some attributes can be used more than once, while others are unique to one product of an attribute set. Getting ready For this recipe, we will use a Droplet created at DigitalOcean, https://www.digitalocean.com/. We will be using NGINX, PHP-FPM, and a Composer-based setup including Magento 2 preinstalled. No other prerequisites are required. How to do it... For the purpose of this recipe, let's assume that we need to create product attributes and sets. The following steps will guide you through this: First, log in to the backend of Magento 2 and go to Stores | Products. As we are using a vanilla setup, only system attributes and one attribute set is installed. Now click on Add New Attribute and commit the following data in the Properties tab: Attribute Properties Default label shoe_size Catalog Input Type for Store Owners Dropdown Values Required No Manage Options (values of your attribute) English Admin French German 4 4 35 35 4.5 4.5 35 35 5 5 35-36 35-36 5.5 5.5 36 36 6 6 36-37 36-37 6.5 6.5 37 37 7 7 37-38 37-38 7.5 7.5 38 38 8 8 38-39 38-39 8.5 8.5 39 39 Advanced Attribute Properties Scope Global Unique Value No Add to Column Options Yes Use in Filer Options Yes As we have already set up a multi-website that sells shoes and clothes, we stick with this. The attributes that we need to sell shoes are: shoe_size, shoe_type, width, color, gender, and occasion. Continue with the rest of the chart accordingly (http://www.shoesizingcharts.com). Click on Save and Continue Edit now and continue on the Manage Labels tab with the following information: Manage Titles (Size, Color, etc.) 
English French German Size Taille Größe Click on Save and Continue Edit now and continue on the Storefront Properties tab with the following information: Storefront Properties Use in Search No Comparable in Storefront No Use in Layered Navigation Filterable (with result) Use in Search Result Layered Navigation No Position 0 Use for Promo Rule Conditions No Allow HTML Tags on Storefront Yes Visible on Catalog Pages on Storefront Yes Used in Product Listing No Used for Sorting in Product Listing No Click on Save Attribute now and clear the cache. Depending on whether you have set up the index management accordingly through the Magento 2 cronjob, it will automatically update the newly created attribute. The additional shoe_type, width, color, gender, and occasion attributes configuration can be downloaded at https://github.com/mage2cookbook/chapter4. After creating all of the attributes, we combine them in an attribute set called Shoes. Go to Stores | Attribute Set, click on Add Attribute Set, and commit the following data: Edit Attribute Set Name Name Shoes Based On Default Now click on the Add New button in the Groups section and commit the group name called Shoes. The newly created group is now located at the bottom of the list. You may need to scroll down before you see it. It is possible to drag and drop the group higher up in the list. Now drag and drop the created attributes, shoe_size, shoe_type, width, color, gender, and occasion to the group and save the configuration. The notice of the cron job is automatically updated depending on your settings. Congratulations, you just finished creating attributes and attribute sets in Magento 2. This can be seen in the following screenshot: How it works… Let's recap and find out what we did throughout this recipe. In steps 1 through 10, we created attributes that will be used in an attribute set. The attributes and sets are the fundamentals for every website. In steps 1 through 5, we created multiple attributes to define all details about the shoes and clothes that we would like to sell. Some attributes are later used as configurable values on the frontend while others only indicate the gender or occasion. In steps 6 through 9, we connected the attributes to the related attribute set so that when creating a product, all correct elements are available. There's more… After creating the attribute set for Shoes, we continue to create an attribute set for Clothes. Use the following attributes to create the set: color, occasion, apparel_type, sleeve_length, fit, size, length, and gender. Follow the same steps as we did before to create a new attribute set. You may reuse the attributes, color, occasion, and gender. All detailed attributes can be found at https://github.com/mage2cookbook/chapter4#clothes-set. The following is the screenshot of the Clothes attribute set: Summary In this article, you learned how to create a Root Catalog, subcategories, and manage attribute sets. For more information on Magento 2, Refer the following books by Packt Publishing: Magento 2 Development Cookbook (https://www.packtpub.com/web-development/magento-2-development-cookbook) Magento 2 Developer's Guide (https://www.packtpub.com/web-development/magento-2-developers-guide) Resources for Article: Further resources on this subject: Social Media in Magento [article] Upgrading from Magneto 1 [article] Social Media and Magento [article]

article-image-getting-started-deep-learning
Packt
07 Mar 2016
12 min read
Save for later

Getting Started with Deep Learning

In this article by Joshua F. Wiley, author of the book, R Deep Learning Essentials, we will discuss deep learning, a powerful multilayered architecture for pattern recognition, signal detection, classification, and prediction. Although deep learning is not new, it has gained popularity in the past decade due to the advances in the computational capacity and new ways of efficient training models, as well as the availability of ever growing amount of data. In this article, you will learn what deep learning is. What is deep learning? To understand what deep learning is, perhaps it is easiest to start with what is meant by regular machine learning. In general terms, machine learning is devoted to developing and using algorithms that learn from raw data in order to make predictions. Prediction is a very general term. For example, predictions from machine learning may include predicting how much money a customer will spend at a given company, or whether a particular credit card purchase is fraudulent. Predictions also encompass more general pattern recognition, such as what letters are present in a given image, or whether a picture is of a horse, dog, person, face, building, and so on. Deep learning is a branch of machine learning where a multi-layered (deep) architecture is used to map the relations between inputs or observed features and the outcome. This deep architecture makes deep learning particularly suitable for handling a large number of variables and allows deep learning to generate features as part of the overall learning algorithm, rather than feature creation being a separate step. Deep learning has proven particularly effective in the fields of image recognition (including handwriting as well as photo or object classification) and natural language processing, such as recognizing speech. There are many types of machine learning algorithms. In this article, we are primarily going to focus on neural networks as these have been particularly popular in deep learning. However, this focus does not mean that it is the only technique available in machine learning or even deep learning, nor that other techniques are not valuable or even better suited, depending on the specific task. Conceptual overview of neural networks As their name suggests, neural networks draw their inspiration from neural processes and neurons in the body. Neural networks contain a series of neurons, or nodes, which are interconnected and process input. The connections between neurons are weighted, with these weights based on the function being used and learned from the data. Activation in one set of neurons and the weights (adaptively learned from the data) may then feed into other neurons, and the activation of some final neuron(s) is the prediction. To make this process more concrete, an example from human visual perception may be helpful. The term grandmother cell is used to refer to the concept that somewhere in the brain there is a cell or neuron that responds specifically to a complex and specific object, such as your grandmother. Such specificity would require thousands of cells to represent every unique entity or object we encounter. Instead, it is thought that visual perception occurs by building up more basic pieces into complex representations. 
For example, the following is a picture of a square:

Figure 1

Rather than our visual system having cells, or neurons, that are activated only upon seeing the gestalt, or entirety, of a square, we can have cells that recognize horizontal and vertical lines, as shown in the following:

Figure 2

In this hypothetical case, there may be two neurons, one that is activated when it senses horizontal lines and another that is activated when it senses vertical lines. Finally, a higher-order process recognizes that it is seeing a square when both the lower-order neurons are activated simultaneously.

Neural networks share some of these same concepts, with inputs being processed by a first layer of neurons that may go on to trigger another layer. Neural networks are sometimes shown as graphical models. In Figure 3, Inputs are data represented as squares. These may be pixels in an image, different aspects of sounds, or something else. The next layer of Hidden neurons is neurons that recognize basic features, such as horizontal lines, vertical lines, or curved lines. Finally, the output may be a neuron that is activated by the simultaneous activation of two of the hidden neurons. In this article, observed data or features are depicted as squares, and unobserved or hidden layers as circles:

Figure 3

The term neural network refers to a broad class of models and algorithms. Hidden neurons are generated based on some combination of the observed data, similar to a basis expansion in other statistical techniques; however, rather than choosing the form of the expansion, the weights used to create the hidden neurons are learned from the data. Neural networks can involve a variety of activation functions, which are transformations of the weighted raw data inputs used to create the hidden neurons. Common choices for activation functions are the sigmoid function, σ(x) = 1 / (1 + e^(-x)), and the hyperbolic tangent function, tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)). Finally, radial basis functions are sometimes used as they are efficient function approximators. Although there are a variety of these, the Gaussian form is common: φ(x) = exp(-(x - c)^2 / (2σ^2)).

In a shallow neural network such as the one shown in Figure 3, with only a single hidden layer, going from the hidden units to the outputs is essentially a standard regression or classification problem. The hidden units can be denoted by h and the outputs by Y. Different outputs can be denoted by the subscripts i = 1, …, k and may represent different possible classifications, such as (in our case) a circle or a square. The paths from each hidden unit to each output are the weights; those for the ith output are denoted by wi. These weights are also learned from the data, just like the weights used to create the hidden layer. For classification, it is common to use a final transformation, the softmax function, softmax(zi) = e^(zi) / Σj e^(zj), as this ensures that the estimates are positive (using the exponential function) and that the probabilities of being in any given class sum to one. For linear regression, the identity function, which returns its input, is commonly used. Confusion may arise as to why there are paths between every hidden unit and output as well as every input and hidden unit. These are commonly drawn to represent that, a priori, any of these relations are allowed to exist. The weights must then be learned from the data, with zero or near-zero weights essentially equating to dropping unnecessary relations. This only scratches the surface of the conceptual and practical aspects of neural networks.
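To make the softmax step concrete, here is a small worked example; the numbers are invented for illustration and are not taken from the book. Suppose that, for a particular image, the weighted combinations of the hidden units produce a score of 1.0 for the square output and -0.5 for the circle output. Applying the softmax function gives:

p(square) = e^(1.0) / (e^(1.0) + e^(-0.5)) = 2.718 / (2.718 + 0.607) ≈ 0.82
p(circle) = e^(-0.5) / (e^(1.0) + e^(-0.5)) = 0.607 / (2.718 + 0.607) ≈ 0.18

Both values are positive and sum to one, so the network would label this input a square with roughly 82 percent confidence.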
For a slightly more in-depth introduction to neural networks, see Chapter 11 of The Elements of Statistical Learning, Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2009) also freely available at http://statweb.stanford.edu/~tibs/ElemStatLearn/. Next, we will turn to a brief introduction to deep neural networks. Deep neural networks Perhaps the simplest, if not the most informative, definition of a deep neural network (DNN) is that it is a neural network with multiple hidden layers. Although a relatively simple conceptual extension of neural networks, such deep architecture provides valuable advances in terms of the capability of the models and new challenges in training them. Using multiple hidden layers allows a more sophisticated build-up from simple elements to more complex ones. When discussing neural networks, we considered the outputs to be whether the object was a circle or a square. In a deep neural network, many circles and squares could be combined to form other more advanced shapes. One can consider two complexity aspects of a model's architecture. One is how wide or narrow it is—that is, how many neurons in a given layer. The second is how deep it is, or how many layers of neurons there are. For data that truly has such deep architectures, a DNN can fit it more accurately with fewer parameters than a neural network (NN), because more layers (each with fewer neurons) can be a more efficient and accurate representation; for example, because the shallow NN cannot build more advanced shapes from basic pieces, in order to provide equal accuracy to the DNN it must represent each unique object. Again considering pattern recognition in images, if we are trying to train a model for text recognition the raw data may be pixels from an image. The first layer of neurons could be trained to capture different letters of the alphabet, and then another layer could recognize sets of these letters as words. The advantage is that the second layer does not have to directly learn from the pixels, which are noisy and complex. In contrast, a shallow architecture may require far more parameters, as each hidden neuron would have to be capable of going directly from pixels in an image to a complete word, and many words may overlap, creating redundancy in the model. One of the challenges in training deep neural networks is how to efficiently learn the weights. The models are often complex and local minima abound making the optimization problem a challenging one. One of the major advancements came in 2006, when it was shown that Deep Belief Networks (DBNs) could be trained one layer at a time (Refer A Fast Learning Algorithm for Deep Belief Nets, by Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh, (2006) at http://www.cs.toronto.edu/~fritz/absps/ncfast.pdf). A DBN is a type of DNN where multiple hidden layers and connections between (but not within) layers (that is, a neuron in layer 1 may be connected to a neuron in layer 2, but may not be connected to another neuron in layer 1). This is the essentially the same definition of a Restricted Boltzmann Machine (RBM)—an example is diagrammed in Figure 4, except that a RBM typically has one input layer and one hidden layer: Figure 4 The restriction of no connections within a layer is valuable as it allows for much faster training algorithms to be used, such as the contrastive divergence algorithm. If several RBMs are stacked together, they can form a DBN. Essentially, the DBN can then be trained as a series of RBMs. 
The first RBM layer is trained and used to transform raw data into hidden neurons, which are then treated as a new set of inputs in a second RBM, and the process is repeated until all layers have been trained. The benefits of the realization that DBNs could be trained one layer at a time extend beyond just DBNs, however. DBNs are sometimes used as a pre-training stage for a deep neural network. This allows the comparatively fast, greedy layer-by-layer training to be used to provide good initial estimates, which are then refined in the deep neural network using other, slower, training algorithms such as back propagation. So far we have been primarily focused on feed-forward neural networks, where the results from one layer and neuron feed forward to the next. Before closing this section, two specific kinds of deep neural networks that have grown in popularity are worth mentioning. The first is a Recurrent Neural Network (RNN) where neurons send feedback signals to each other. These feedback loops allow RNNs to work well with sequences. A recent example of an application of RNNs was to automatically generate click-bait such as One trick to great hair salons don't want you to know or Top 10 reasons to visit Los Angeles: #6 will shock you!. RNNs work well for such jobs as they can be seeded from a large initial pool of a few words (even just trending search terms or names) and then predict/generate what the next word should be. This process can be repeated a few times until a short phrase is generated, the click-bait. This example is drawn from a blog post by Lars Eidnes, available at http://larseidnes.com/2015/10/13/auto-generating-clickbait-with-recurrent-neural-networks/. The second type is a Convolutional Neural Network (CNN). CNNs are most commonly used in image recognition. CNNs work by having each neuron respond to overlapping subregions of an image. The benefits of CNNs are that they require comparatively minimal pre-processing yet still do not require too many parameters through weight sharing (for example, across subregions of an image). This is particularly valuable for images as they are often not consistent. For example, imagine ten different people taking a picture of the same desk. Some may be closer or farther away or at positions resulting in essentially the same image having different heights, widths, and the amount of image captured around the focal object. As for neural networks, this description only provides the briefest of overviews as to what DNNs are and some of the use cases to which they can be applied. Summary This article presented a brief introduction to NNs and DNNs. Using multiple hidden layers, DNNs have been a revolution in machine learning by providing a powerful unsupervised learning and feature-extraction component that can be standalone or integrated as part of a supervised model. There are many applications of such models and they are increasingly used by large-scale companies such as Google, Microsoft, and Facebook. Examples of tasks for deep learning are image recognition (for example, automatically tagging faces or identifying keywords for an image), voice recognition, and text translation (for example, to go from English to Spanish, or vice versa). Work is being done on text recognition, such as sentiment analysis to try to identify whether a sentence or paragraph is generally positive or negative, which is particularly useful to evaluate perceptions about a product or service. 
Imagine being able to scrape reviews and social media for any mention of your product and analyze whether it was being discussed more favorably than the previous month or year or not! Resources for Article:   Further resources on this subject: Dealing with a Mess [article] Design with Spring AOP [article] Probability of R? [article]

article-image-common-wlan-protection-mechanisms-and-their-flaws
Packt
07 Mar 2016
19 min read
Save for later

Common WLAN Protection Mechanisms and their Flaws

In this article by Vyacheslav Fadyushin, the author of the book Building a Pentesting Lab for Wireless Networks, we will discuss various WLAN protection mechanisms and the flaws present in them. To be able to protect a wireless network, it is crucial to clearly understand which protection mechanisms exist and which security flaws do they have. This topic will be useful not only for those readers who are new to Wi-Fi security, but also as a refresher for experienced security specialists. (For more resources related to this topic, see here.) Hiding SSID Let's start with one of the common mistakes done by network administrators: relying only on the security by obscurity. In the frames of the current subject, it means using a hidden WLAN SSID (short for service set identification) or simply WLAN name. Hidden SSID means that a WLAN does not send its SSID in broadcast beacons advertising itself and doesn't respond to broadcast probe requests, thus making itself unavailable in the list of networks on Wi-Fi-enabled devices. It also means that normal users do not see that WLAN in their available networks list. But the lack of WLAN advertising does not mean that a SSID is never transmitted in the air—it is actually transmitted in plaintext with a lot of packet between access points and devices connected to them regardless of a security type used. Therefore, SSIDs are always available for all the Wi-Fi network interfaces in a range and visible for any attacker using various passive sniffing tools. MAC filtering To be honest, MAC filtering cannot be even considered as a security or protection mechanism for a wireless network, but it is still called so in various sources. So let's clarify why we cannot call it a security feature. Basically, MAC filtering means allowing only those devices that have MAC addresses from a pre-defined list to connect to a WLAN, and not allowing connections from other devices. MAC addresses are transmitted unencrypted in Wi-Fi and are extremely easy for an attacker to intercept without even being noticed (refer to the following screenshot): An example of wireless traffic sniffing tool easily revealing MAC addresses Keeping in mind the extreme simplicity of changing a physical address (MAC address) of a network interface, it becomes obvious why MAC filtering should not be treated as a reliable security mechanism. MAC filtering can be used to support other security mechanisms, but it should not be used as an only security measure for a WLAN. WEP Wired equivalent privacy (WEP) was born almost 20 years ago at the same time as the Wi-Fi technology and was integrated as a security mechanism for the IEEE 802.11 standard. Like it often happens with new technologies, it soon became clear that WEP contains weaknesses by design and cannot provide reliable security for wireless networks. Several attack techniques were developed by security researchers that allowed them to crack a WEP key in a reasonable time and use it to connect to a WLAN or intercept network communications between WLAN and client devices. Let's briefly review how WEP encryption works and why is it so easy to break. WEP uses so-called initialization vectors (IV) concatenated with a WLAN's shared key to encrypt transmitted packets. After encrypting a network packet, an IV is added to a packet as it is and sent to a receiving side, for example, an access point. 
This process is depicted in the following flowchart: The WEP encryption process An attacker just needs to collect enough IVs, which is also a trivial task using additional reply attacks to force victims to generate more IVs. Even worse, there are attack techniques that allow an attacker to penetrate WEP-protected WLANs even without connected clients, which makes those WLANs vulnerable by default. Additionally, WEP does not have a cryptographic integrity control, which makes it vulnerable not only to attacks on confidentiality. There are numerous ways how an attacker can abuse a WEP-protected WLAN, for example: Decrypt network traffic using passive sniffing and statistical cryptanalysis Decrypt network traffic using active attacks (reply attack, for example) Traffic injection attacks Unauthorized WLAN access Although WEP was officially superseded by the WPA technology in 2003, it still can be sometimes found in private home networks and even in some corporate networks (mostly belonging to small companies nowadays). But this security technology has become very rare and will not be used anymore in future, largely due to awareness in corporate networks and because manufacturers no longer activate WEP by default on new devices. In our humble opinion, device manufacturers should not include WEP support in their new devices to avoid its usage and increase their customers' security. From the security specialist's point of view, WEP should never be used to protect a WLAN, but it can be used for Wi-Fi security training purposes. Regardless of a security type in use, shared keys always add the additional security risk; users often tend to share keys, thus increasing the risk of key compromise and reducing accountability for key privacy. Moreover, the more devices use the same key, the greater amount of traffic becomes suitable for an attacker during cryptanalytic attacks, increasing their performance and chances of success. This risk can be minimized by using personal identifiers (key, certificate) for users and devices. WPA/WPA2 Due to numerous WEP security flaws, the next generation of Wi-Fi security mechanism became available in 2003: Wi-Fi Protected Access (WPA). It was announced as an intermediate solution until WPA2 is available and contained significant security improvements over WEP. Those improvements include: Stronger encryption Cryptographic integrity control: WPA uses an algorithm called Michael instead of CRC used in WEP. This is supposed to prevent altering data packets on the fly and protects from resending sniffed packets. Usage of temporary keys: The Temporal Key Integrity Protocol (TKIP) automatically changes encryption keys generated for every packet. This is the major improvement over the static WEP where encryption keys should be entered manually in an AP config. TKIP also operates RC4, but the way how it is used was improved. Support of client authentication: The capability to use dedicated authentication servers for user and device authentication made WPA suitable for use in large enterprise networks. The support of cryptographically strong algorithm Advanced Encryption Standard (AES) was implemented in WPA, but it was not set as mandatory, only optional. Although WPA was a significant improvement over WEP, it was a temporary solution before WPA2 was released in 2004 and became mandatory for all new Wi-Fi devices. 
WPA2 works very similar to WPA and the main differences between WPA and WPA2 is in the algorithms used to provide security: AES became the mandatory algorithm for encryption in WPA2 instead of default RC4 in WPA TKIP used in WPA was replaced by Counter Cipher Mode with Block Chaining Message Authentication Code Protocol (CCMP) Because of the very similar workflows, WPA and WPA2 are also vulnerable to the similar or the same attacks and usually called and spelled in one word WPA/WPA2. Both WPA and WPA2 can work in two modes: pre-shared key (PSK) or personal mode and enterprise mode. Pre-shared key mode Pre-shared key or personal mode was intended for home and small office use where networks have low complexity. We are more than sure that all our readers have met this mode and that most of you use it at home to connect your laptops, mobile phones, tablets, and so on to home networks. The general idea of PSK mode is using the same secret key on an access point and on a client device to authenticate the device and establish an encrypted connection for networking. The process of WPA/WPA2 authentication using a PSK consists of four phases and is also called a 4-way handshake. It is depicted in the following diagram: WPA/WPA2 4-way handshake The main WPA/WPA2 flaw in PSK mode is the possibility to sniff a whole 4-way handshake and brute-force a security key offline without any interaction with a target WLAN. Generally, the security of a WLAN mostly depends on the complexity of a chosen PSK. Computing a PMK (short for primary master key) used in 4-way handshakes (refer to the handshake diagram) is a very time-consuming process compared to other computing operations and computing hundreds of thousands of them can take very long. But in case of a short and low complexity PSK in use, a brute-force attack does not take long even on a not-so-powerful computer. If a key is complex and long enough, cracking it can take much longer, but still there are ways to speed up this process: Using powerful computers with CUDA (short for Compute Unified Device Architecture), which allows a software to directly communicate with GPUs for computing. As GPUs are natively designed to perform mathematical operations and do it much faster than CPUs, the process of cracking works several times faster with CUDA. Using rainbow tables that contain pairs of various PSKs and their corresponding precomputed hashes. They save a lot of time for an attacker because the cracking software just searches for a value from an intercepted 4-way handshake in rainbow tables and returns a key corresponding to the given PMK if there was a match instead of computing PMKs for every possible character combination. Because WLAN SSIDs are used in 4-way handshakes analogous to a cryptographic salt, PMKs for the same key will differ for different SSIDs. This limits the application of rainbow tables to a number of the most popular SSIDs. Using cloud computing is another way to speed up the cracking process, but it usually costs additional money. The more computing power can an attacker rent (or get through another ways), the faster goes the process. There are also online cloud-cracking services available in Internet for various cracking purposes including cracking 4-way handshakes. Furthermore, as with WEP, the more users know a WPA/WPA2 PSK, the greater the risk of compromise—that's why it is also not an option for big complex corporate networks. 
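To see why each PSK guess is so expensive, recall how the PMK described above is derived in WPA/WPA2-PSK: it is PBKDF2 with HMAC-SHA1, 4,096 iterations, the SSID as the salt, and a 256-bit output. The following PowerShell sketch reproduces one such derivation with the .NET Rfc2898DeriveBytes class and times it; the passphrase and SSID are made-up placeholders, and a real cracking tool would of course use optimized native or GPU code rather than PowerShell:

$Passphrase = 'Summer2015!'                                      # hypothetical pre-shared key
$Ssid       = [Text.Encoding]::ASCII.GetBytes('CorpWLAN-Guest')  # the SSID acts as the salt
$Pbkdf2     = New-Object -TypeName System.Security.Cryptography.Rfc2898DeriveBytes `
                         -ArgumentList $Passphrase, $Ssid, 4096
$Pmk        = $Pbkdf2.GetBytes(32)                               # 256-bit PMK
[BitConverter]::ToString($Pmk)

# Rough cost of a single guess during a brute-force run
Measure-Command {
    $Guess = New-Object -TypeName System.Security.Cryptography.Rfc2898DeriveBytes `
                        -ArgumentList 'WrongGuess1', $Ssid, 4096
    $null  = $Guess.GetBytes(32)
} | Select-Object -ExpandProperty TotalMilliseconds

Multiply that single-guess time by the size of a realistic dictionary, and the value of long, unusual passphrases and non-default SSIDs (which defeat precomputed rainbow tables) becomes obvious.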
WPA/WPA2 PSK mode provides the sufficient level of security for home and small office networks only when a key is long and complex enough and is used with a unique (or at least not popular) WLAN SSID. Enterprise mode As it was already mentioned in the previous section, using shared keys itself poses a security risk and in case of WPA/WPA2 highly relies on a key length and complexity. But there are several factors in enterprise networks that should be taken into account when talking about WLAN infrastructure: flexibility, manageability, and accountability. There are various components that implement those functions in big networks, but in the context of our topic, we are mostly interested in two of them: AAA (short for authentication, authorization, and accounting) servers and wireless controllers. WPA-Enterprise or 802.1x mode was designed for enterprise networks where a high security level is needed and the use of AAA server is required. In most cases, a RADIUS server is used as an AAA server and the following EAP (Extensible Authentication Protocol) types are supported (and several more, depending on a wireless device) with WPA/WPA2 to perform authentication: EAP-TLS EAP-TTLS/MSCHAPv2 PEAPv0/EAP-MSCHAPv2 PEAPv1/EAP-GTC PEAP-TLS EAP-FAST You can find a simplified WPA-Enterprise authentication workflow in the following diagram: WPA-Enterprise authentication Depending on an EAP-type configured, WPA-Enterprise can provide various authentication options. The most popular EAP-type (based on our own experience in numerous pentests) is PEAPv0/MSCHAPv2, which is relatively easily integrated with existing Microsoft Active Directory infrastructures and is relatively easy to manage. But this type of WPA-protection is relatively easy to defeat by stealing and bruteforcing user credentials with a rogue access point. The most secure EAP-type (at least, when configured and managed correctly) is EAP-TLS, which employs certificate-based authentication for both users and authentication servers. During this type of authentication, clients also check server's identity and a successful attack with a rogue access point becomes possible only if there are errors in configuration or insecurities in certificates maintenance and distribution. It is recommended to protect enterprise WLANs with WPA-Enterprise in EAP-TLS mode with mutual client and server certificate-based authentication. But this type of security requires additional work and resources. WPS Wi-Fi Protected Setup (WPS) is actually not a security mechanism, but a key exchange mechanism which plays an important role in establishing connections between devices and access points. It was developed to make the process of connecting a device to an access point easier, but it turned out to be one of the biggest wholes in modern WLANs if activated. WPS works with WPA/WPA2-PSK and allows connecting devices to WLANs with one of the following methods: PIN: Entering a PIN on a device. A PIN is usually printed on a sticker at the back side of a Wi-Fi access point. Push button: Special buttons should be pushed on both an access point and a client device during the connection phase. Buttons on devices can be physical and virtual. NFC: A client should bring a device close to an access point to utilize the Near Field Communication technology. USB drive: Necessary connection information exchange between an access point and a device is done using a USB drive. 
Because WPS PINs are very short and their first and second parts are validated separately, online brute-force attack on a PIN can be done in several hours allowing an attacker to connect to a WLAN. Furthermore, the possibility of offline PIN cracking was found in 2014 which allows attackers to crack pins in 1 to 30 seconds, but it works only on certain devices. You should also not forget that a person who is not permitted to connect to a WLAN but who can physically access a Wi-Fi router or access point can also read and use a PIN or connect via push button method. Getting familiar with the Wi-Fi attack workflow In our opinion (and we hope you agree with us), planning and building a secure WLAN is not possible without the understanding of various attack methods and their workflow. In this topic, we will give you an overview of how attackers work when they are hacking WLANs. General Wi-Fi attack methodology After refreshing our knowledge about wireless threats and Wi-Fi security mechanisms, let's have a look at the attack methodology used by attackers in the real world. Of course, as with all other types of network attack, wireless attack workflows depend on certain situations and targets, but they still align with the following general sequence in almost all cases: The first step is planning. Normally, attackers need to plan what are they going to attack, how can they do it, which tools are necessary for the task, when is the best time and place to attack certain targets, and which configuration templates will be useful so that they can be prepared in advance. White-hat hackers or penetration testers need to set schedules and coordinate project plans with their customers, choose contact persons on a customer side, define project deliverables, and do some other organizational work if demanded. As with every penetration testing project, the better a project was planned (and we can use the word "project" for black-hat hackers' tasks), the more chances of a successful result. The next step is survey. Getting as accurate as possible and as much as possible information about a target is crucial for a successful hack, especially in uncommon network infrastructures. To hack a WLAN or its wireless clients, an attacker would normally collect at least SSIDs or MAC addresses of access points and clients and information about a security type in use. It is also very helpful for an attacker to understand if WPS is enabled on a target access point. All that data allows attackers not only to set proper configs and choose the right options for their tools, but also to choose appropriate attack types and conditions for a certain WLAN or Wi-Fi client. All collected information, especially non-technical (for example, company and department names, brands, or employee names), can also become useful at the cracking phase to build dictionaries for brute-force attacks. Depending on a type of security and attacker's luck, a data collected at the survey phase can even make the active attack phase unnecessary and allow an attacker to proceed directly with the cracking phase. The active attacks phase involves active interaction between an attacker and targets (WLANs and Wi-Fi clients). At this phase, attackers have to create conditions necessary for a chosen attack type and execute it. It includes sending various Wi-Fi management and control frames and installing rogue access points. If an attacker wants to cause a denial of service in a target WLAN as a goal, such attacks are also executed at this phase. 
Some of active attacks are essential to successfully hack a WLAN, but some of them are intended to just speed up hacking and can be omitted to avoid causing alarm on various wireless intrusion detection/prevention systems (wIDPS), which can possibly be installed in a target network. Thus, the active attacks phase can be called optional. Cracking is another important phase where an attacker cracks 4-way-hadshakes, WEP data, NTLM hashes, and so on, which were intercepted at the previous phases. There are plenty of various free and commercial tools and services including cloud cracking services. In case of success at this phase, an attacker gets the target WLAN's secret(s) and can proceed with connecting to the WLAN, decrypt intercepted traffic, and so on. The active attacking phase Let's have a closer look to the most interesting parts of the active attack phase—WPA-PSK and WPA-Enterprise attacks—in the following topics. WPA-PSK attacks As both WPA and WPA2 are based on the 4-way handshake, attacking them doesn't differ—an attacker needs to sniff a 4-way handshake in the moment, establishing a connection between an access point and an arbitrary wireless client and bruteforcing a matching PSK. It does not matter whose handshake is intercepted, because all clients use the same PSK for a given target WLAN. Sometimes, attackers have to wait long until a device connects to a WLAN to intercept a 4-way handshake and of course they would like to speed up the process when possible. For that purpose, they force already connected device to disconnect from an access point sending control frames (deauthentication attack) on behalf of a target access point. When a device receives such a frame, it disconnects from a WLAN and tries to reconnect again if the "automatic reconnect" feature is enabled (it is enabled by default on most devices), thus performing another one 4-way handshake that can be intercepted by an attacker. Another possibility to hack a WPA-PSK protected network is to crack a WPS PIN if WPS is enabled on a target WLAN. Enterprise WLAN attacks Attacking becomes a little bit more complicated if WPA-Enterprise security is in place, but could be executed in several minutes by a properly prepared attacker by imitating a legitimate access point with a RADIUS server and by gathering user credentials for further analysis (cracking). To settle this attack, an attacker needs to install a rogue access point with a SSID identical to a target WLAN's SSID and set other parameter (like EAP type) similar to the target WLAN to increase chances on success and reduce the probability of the attack to be quickly detected. Most user Wi-Fi devices choose an access point for a connection to a certain WLAN by a signal strength—they connect to that one which has the strongest signal. That is why an attacker needs to use a powerful Wi-Fi interface for a rogue access point to override signals from legitimate ones and make devices around connect to the rogue access point. A RADIUS server used during such attacks should have a capability to record authentication data, NTLM hashes, for example. From a user perspective, being attacked in such way looks just being unable to connect to a WLAN for an unknown reason and could even be not seen if a user is not using a device at the moment and just passing by a rogue access point. It is worth mentioning that classic physical security or wireless IDPS solutions are not always effective in such cases. 
An attacker or a penetration tester can install a rogue access point outside of the range of a target WLAN. It will allow hacker to attack user devices without the need to get into a physically controlled area (for example, office building), thus making the rogue access point unreachable and invisible for wireless IDPS systems. Such a place could be a bus or train station, parking or a café where a lot of users of a target WLAN go with their Wi-Fi devices. Unlike WPA-PSK with only one key shared between all WLAN users, the Enterprise mode employs personified credentials for each user whose credentials could be more or less complex depending only on a certain user. That is why it is better to collect as many user credentials and hashes as possible thus increasing the chances of successful cracking. Summary In this article, we looked at the security mechanisms that are used to secure access to wireless networks, their typical threads, and common misconfigurations that lead to security breaches and allow attacker to harm corporate and private wireless networks. The brief attack methodology overview has given us a general understanding of how attackers normally act during wireless attacks and how they bypass common security mechanisms by abusing certain flaws in those mechanisms. We also saw that the most secure and preferable way to protect a wireless network is to use WPA2-Enterprise security along with a mutual client and server authentication, which we are going to implement in our penetration testing lab. Resources for Article: Further resources on this subject: Pentesting Using Python [article] Advanced Wireless Sniffing [article] Wireless and Mobile Hacks [article]

article-image-enhancing-scripting-experience
Packt
07 Mar 2016
22 min read
Save for later

Enhancing the Scripting Experience

In this article, Chris Halverson, the author of the book PowerCLI Essentials, discusses a few important tips on effective scripting. He starts off with a simple, everyday example that illustrates the situation in a humorous manner. Monday morning rolls around and the boss comes into your office with a big smile on his face. "Great report!", he says in his loud booming voice. "The information you provided helped the department secure another $150K for the budget. We are supposed to build a new automation framework that will allow us to produce these types of reports for the management team on a weekly basis." So you reply, "So we got a $150K budget for developing this framework in-house?" The boss chuckles and says, "No, it is for an office renovation. I'm getting my corner window office!" and he walks away laughing. "Typical!" you think.

In building an automation framework, there needs to be a methodology to document procedures, create central stores, establish tools, design development standards, and form repositories. Each process takes time and, in some cases, capital. However, setting a precedent beforehand makes the later task of sorting and running scripts manageable and usable. This article deals with building a central script store and establishing a consistent, usable scripting experience to provision, report on, or configure a virtual environment in a way that is simple and straightforward. We'll be looking at the following topics:

Building a code repository
Scripting with the intention of reuse
Establishing quality control

Code repositories – why would I need that?

The first problem that may arise when developing code is that you spend hours designing and writing a script and then the script cannot be found. Or you pass the script along to another member of the team and they respond, "It doesn't work". Having a well-defined repository helps to save the script in an open and robust place, and the repository can later be used by everyone in the team to build something new. A repository can consist of a shared folder on a centralized file server, a document management system such as SharePoint, a commercial code repository such as CVS or Team Foundation Server, or a public-facing code-sharing site such as a personal Software Change Manager (SCM) account or even a personal blog site. The value of the repository depends upon the sensitivity of the script data being placed in the store. If the scripts are being written with the intention of sharing them with a work associate, a file share may be enough; however, if the scripts are very generic, contain no company-sensitive data, and there is a desire to get more feedback and more contributors, a public SCM account might be beneficial.

Building an internal repository

Let's start with a file share layout. Most IT shops have a central server that tends to be a "jump box" or a utility server that runs general scripts or contains a multitude of administration tools. This server could be set up with a file folder named Scripts that will contain the repository. Next, let's think through the structure. Any good store of scripts should have an area for final scripts and one for scripts under development; it needs a way to label versions and should have a runtime area that holds input files and output files.
This enables consistency, helps to control the script's content, and allows additional functionality to be developed while the current version of the script can still be run. With this in mind, here is an example of a folder structure for Scripts:

Scripts
    Final
        Input
        Output
        Modules
    Test
        Input
        Output
        Modules
    Development
        Input
        Output
        Modules

Storing the scripts in this layout keeps the repository organized and usable. Start by building a script in the Development folder; once the script is functional, a copy is placed in the Test folder, where it is available for testing by the rest of the operational staff. Once confirmed, the script is copied to the Final, or production, folder. This allows multiple copies of a script to exist for the different stages of the code base. The main disadvantage is that it quickly becomes unruly and loses its effectiveness once the script repository exceeds a couple of hundred scripts or the user/developer base exceeds about 10 individuals. This is largely due to the check-in and checkout process: it is very manual and prone to overwrites and loss. Other issues include multiple people trying to edit the same script at the same time, versions getting out of sync (editing version 1.4 while the script in test is version 1.1 and the one in production is version 1.3), and not knowing which version is the latest one to work with. The fact that there is no way to lock a script or see its history can be one of the biggest issues.

Using a third-party tool for code check-in and checkout

Downloading and integration aside, there are numerous solutions that could be used to help alleviate the check-in/checkout problem. Using a document management solution such as Microsoft SharePoint or EMC Documentum can resolve some of the version control and sign-in/sign-out issues; however, these can be large and expensive solutions requiring local support staff, and the issues they solve here are relatively small. If these tools are already running in the environment, look at them as options for the Development portion of the repository. The alternative is to utilize a Version Control System (VCS) or Software Change Manager (SCM), which is smaller, has a number of open source forks, and can solve a number of the previously mentioned issues. As with any solution, there is the typical problem of having another product to learn and another product to support. Any solution that is to be installed in the site needs to be robust and be jointly agreed upon by all the parties that will use it: development, administration, and users. Having one individual who supports and operates a proprietary solution will become either a bottleneck or defeat the point of the original solution when that person moves on to a different job.

Send it to the cloud! Public-facing script repositories

There are a number of places to store code snippets, one-liners, and short examples for a team to view and comment on. Blog sites such as Blogger.com or WordPress can provide a static means of storing scripts, and a wiki can provide simple collaboration for the team. When these items are shared within a common source, they can be very useful for the team. For most individuals, this static area provides a place to share their experience.
GitHub.com is an example of a cloud-based VCS and provides up to a 5-user repository for free. This type of site adds to the VCS options mentioned in the previous section and allows scripts to be worked on collaboratively by a team whose members may not reside in the same geographic area or even in the same time zone.

Conclusions

No matter what method is chosen, the organization of scripts should be planned beforehand. Planning the structure helps to ensure that, whatever automation mechanism is chosen in the future, such as VMware vRealize Automation, VMware vRealize Orchestrator, or some other custom-developed workflow, the scripts will be available to anyone who needs to run them.

Scripting with the intention of reuse

Changing gears from storing scripts, we now look at repurposing them, or using portions of a script in multiple areas. Functions, workflows, and modules are critical to this end, and each will be described in this section of the book.

Functions, functions, functions – the fun with functions

What is a function? The help file defines it as a named block of code that performs an action. So writing a function would look like this:

Function Get-PS {Get-Process PowerShell}

The output would be nothing unless the function is called, like this:

Function Get-PS {Get-Process PowerShell}
Get-PS

The output of this is the same as Get-Process PowerShell, but it is reusable and could be called at any point in the script. There is a simple and useful function that I personally use in many of the developed scripts to arrive at a decision and return a true or a false value:

Function Get-Decision ($Decision) {
  Switch ($Decision.ToUpper()) {
    "Y" {$Decision = $True}
    "N" {$Decision = $False}
    Default {
      # Re-prompt until a valid answer is given
      $Decision = Get-Decision (Read-Host "Enter a 'Y' for a Yes or a 'N' for No")
    }
  }
  Return ($Decision)
}

As you can see, the preceding function is called Get-Decision; it accepts an object, usually gathered through Read-Host, and returns the decision. So the command that calls it would be something like this:

$MakeCSV = Get-Decision $(Read-Host "`nDo you need to make this a CSV to duplicate the settings and create more than one? (Y/N)")

The $MakeCSV variable would then be either $True or $False depending on the outcome of the user prompt, and the next line would do something based upon that decision.

Important tip: All functions must come before the mainline code. PoSH runs in a linear fashion from start to end, so if a function is being called, its definition has to have been run already so that it is loaded into memory. A function is also removed from memory when the command window is closed, so think through where the function is needed and load it accordingly.

An example of a multifunction script is as follows:

Premise: Build a script that automates the building of a VM.
Inputs: The script should capture the VM name, how many CPUs, and how much memory, and build a VM with the settings captured and a 50 GB hard disk.
Output: Builds the VM.
Assumption: $cred is captured using the $cred = Get-Credential command before this script is run.
Script: Insert the following code snippet:

If ($Cred -ne $null) {
  Get-VC vCenter.domain.local -Credential $Cred | Out-Null
}
Else {
  # Stop here if no credential was captured beforehand
  Exit
}

Function Gather-Data {
  $Sys = @{
    Name = Read-Host "What is the name of the VM?";
    NumCPU = Read-Host "What is the amount of CPUs required?";
    MeminGB = Read-Host "How much RAM is required?"
    VMHost = Get-VMHost | Select -First 1
  }
  Return ($Sys)
}

Function Build-VM ($Sys) {
  New-VM -Name $Sys.Name -NumCPU $Sys.NumCPU -MemoryGB $Sys.MeminGB -DiskGB 50 -VMHost $Sys.VMHost
}

$output = Gather-Data
Build-VM($output)

This is a pretty simple example of building a hash table in one function, outputting the hash table, and using it as the input for a second function. The information stored in the hash table is used to populate a New-VM command. The point is that the Gather-Data function can be reused to collect data for other things, such as a CSV file, or for a different function that looks up the same information and removes the VM. The best part is that the information is discarded once the window is closed and exists only while the script is run.

Using modules

Building modules

As building modules is a slightly more advanced topic, this book will only discuss their use. However, a little understanding of what they are may whet your appetite to delve deeper into the topic. MSDN provides an outline of four different module types: the script module, the binary module, the manifest module, and the dynamic module (https://msdn.microsoft.com/en-us/library/dd878324%28v=vs.85%29.aspx).

Historically, PowerCLI cmdlets were delivered as snap-ins (plug-ins) rather than modules; they required a separate launch of the PowerCLI icon to make their cmdlets available. Recent versions have changed that and now use modules. Type Get-Help About_PSSnapins in your PowerShell interface to learn more about the difference.

A module is, again as pointed out in the help file, a package of commands. It is a collection of cmdlets, scripts, workflows, aliases, variables, and functions that, when imported, allows the user to run tried and tested commands. Every install of PoSH has modules installed by default. To view the installed modules, type Get-Module -ListAvailable and the command will list them. After the installation of the PowerCLI package, a number of modules are added to the available list of packages:

Directory: C:\Program Files (x86)\VMware\Infrastructure\vSphere PowerCLI\Modules

ModuleType Version   Name                          ExportedCommands
---------- -------   ----                          ----------------
Binary     6.0.0.0   VMware.VimAutomation.Cis.Core
Binary     6.0.0.0   VMware.VimAutomation.Cloud
Manifest   6.0.0.0   VMware.VimAutomation.Core
Binary     6.0.0.0   VMware.VimAutomation.HA
Binary     1.0.0.0   VMware.VimAutomation.License
Binary     6.0.0.0   VMware.VimAutomation.PCloud
Manifest   6.0.0.0   VMware.VimAutomation.SDK
Binary     6.0.0.0   VMware.VimAutomation.Storage
Binary     6.0.0.0   VMware.VimAutomation.Vds
Binary     1.0.0.0   VMware.VimAutomation.vROps
Binary     6.0.0.0   VMware.VumAutomation

Each of these modules includes numerous pieces of code that perform various functions within the VMware environment. To list the commands available in a module such as VMware.VimAutomation.Core, run the Get-Command -Module VMware.VimAutomation.Core command; the output is, as of PowerCLI v6.0 R3, 291 individual cmdlets.

With the use of modules and PoSH v3, importing modules from another system is now a reality. This import isn't a permanent addition, but it can be very handy in a pinch. Firstly, WinRM must be running and configured for remote management; this is done by running the WinRM quickconfig command in an administrative PowerShell window.
Once WinRM allows remote connections, type $PS = New-PSSession -ComputerName <computername>; this establishes a remote connection through WinRM to the other computer. Once this PSSession is established, commands can be run on the alternate computer and modules can be imported to the local computer:

$PS = New-PSSession -ComputerName TestComputer
Get-Module -PSSession $PS -ListAvailable

This shows the available modules on the remote computer. Adding the Import-Module -PSSession $PS Module_name method will import a module into the current PoSH window. So if VMware.VimAutomation.Core is not installed on the local system, the preceding method allows it to run as if it were already installed. The commands run more slowly because they go through the remote connection; however, they do run and can get the information needed. Note that this method doesn't transfer the aliases as well, so try to remember the full syntax of each cmdlet.

Calling other scripts within the script

Calling other scripts within a script is fairly straightforward, as it is a single command. Either invoke-expression -command .\script.ps1 or "c:\scripts\sample_script.ps1" | invoke-expression allows an external script to be run and the output of that script to be captured within the main script. Think of it this way: a wrapper .ps1, or a scripting framework that has been developed, can call another script that someone else has written and capture the output from the external expression. It does not matter what that output is unless it needs to be parsed and processed into a different output. We will use the same example that was written in the preceding multifunction section, where there was a need to capture the VM name, how many CPUs, and how much memory, and then build a VM with the settings captured and a 50 GB hard disk.

An example of a framework script is as follows:

Premise: Build a script that automates the building of a VM.
Inputs: The script should capture the VM name, how many CPUs, and how much memory, and then build a VM with the settings captured and a 50 GB hard disk.
Output: Builds the VM.
Assumption: This script must call other scripts through invoke-expression.
Script: Insert the following code snippet:

Connect-vCenter.ps1

<#
.Synopsis
  Does the lifting of connecting to a vCenter
.Description
  Starts VIMAutomation Module
  Gets Credentials
  Gets the vCenter name and connects to vCenter
.Input
  User Input
    Credentials
    vCenter Name
.Output
  None
.Author
  Chris Halverson
.Change Log
  11/6/2015
.FileName
  Connect-vCenter.ps1
.Version
  Draft 0.1
#>

#Call the Module
If ($(Get-Module -ListAvailable) -notmatch "VMware.VimAutomation.Core") {
    Import-Module VMware.VimAutomation.Core
}

#Test for stored credentials
If ($Cred -ne $null) {
  Write-Host 'Using the Credential '$Cred.UserName
}
Else {
  $Cred = Get-Credential
}
$VC = Read-Host "What is the name of the vCenter server? "

#Trying a ping of the vCenter Host
If (Test-Connection -ComputerName $VC -Count 2 -Quiet) {
  Connect-VIServer $VC -Credential $Cred | Out-Null
  Write-Host "Connected to vCenter $VC " -ForegroundColor Green
  [Console]::ForegroundColor = "Gray"
}
Else {
  Write-Host "vCenter not found. Exiting script" -ForegroundColor Red
  [Console]::ForegroundColor = "Gray"
  Break
}

Gather-Data.ps1

<#
.Synopsis
  Collects Information for build data
.Description
  Gets Appropriate data and outputs to a variable
.Input
  User Input
    VM Name
    Number of CPUs
    Amount of RAM in GB
.Output
  Variable
.Author
  Chris Halverson
.Change Log
  11/6/2015
.FileName
  Gather-Data.ps1
.Version
  Draft 0.1
#>

$global:Sys = @{
  Name = Read-Host "What is the name of the VM?";
  NumCPU = Read-Host "What is the amount of CPUs required?";
  MeminGB = Read-Host "How much RAM is required?"
}

Build-VM.ps1

<#
.Synopsis
  Builds a VM
.Description
  Calls the Connect-vCenter.ps1 Script to connect to vCenter
  Gathers Data from Gather-Data.ps1
  Builds VM based on Specs
.Input
  Called through other Scripts
.Output
  Builds VM
.Author
  Chris Halverson
.Change Log
  11/6/2015
.FileName
  Build-VM.ps1
.Version
  Draft 0.1
#>

Invoke-Expression -Command .\Connect-vCenter.ps1
Invoke-Expression -Command .\Gather-Data.ps1

$vHost = Get-VMHost | Select -First 1

New-VM -Name $Sys.Name -NumCPU $Sys.NumCPU -MemoryGB $Sys.MeminGB `
  -DiskGB 50 -VMHost $vHost

Upon completion of the Build-VM.ps1 script, a new VM is created. Although the guest OS type is Windows XP and some of the other configuration is a little different from what a typical build would be, the script was successful and it has components that can be reused in a loop or a workflow.

Building a framework that others can build upon

A framework in this context is a set of standard components that every other script that has been developed would want to access: for example, a wrapper that can e-mail a final output without running the command and specifying certain information each time, or something that automatically opens the connection to the vCenter environment. These are some of the things that building a wrapper or workflow can do. Some of the advantages of workflows are that they can run in parallel instead of serially, as compared to a standard PowerShell script; they can be paused, stopped, or restarted as needed; and they can be run on remote hosts and can increase performance tenfold if there are enough remote engines to run them. Logging is included in the workflow engine. These advantages become huge as more and more administrative tasks need parallelism to run properly. One important thing to note is that workflows are built using the .NET framework Windows Workflow Foundation (WWF), and the PowerShell code that is run in the workflow is actually translated into XAML for WWF. The XAML code can actually be seen once a workflow is created by typing this:

Get-Command [Workflow name] | Select-Object XamlDefinition

Typing Help About_Workflows gives a lengthy bit of information that discusses what a workflow is, its benefits, and why it is needed:

ABOUT WORKFLOWS

Workflows are commands that consist of an ordered sequence of related activities. Typically, they run for an extended period of time, gathering data from and making changes to hundreds of computers, often in heterogeneous environments.

Workflows can be written in XAML, the language used in Windows Workflow Foundation, or in the Windows PowerShell language. Workflows are typically packaged in modules and include help.

Workflows are critical in an IT environment because they can survive reboots and recover automatically from common failures.
You can disconnect and reconnect from sessions and computers running workflows without interrupting workflow processing, and suspend and resume workflows transparently without data loss. Each activity in a workflow can be logged and audited for reference. Workflows can run as jobs and can be scheduled by using the Scheduled Jobs feature of Windows PowerShell.

   The state and data in a workflow is saved or "persisted" at the beginning and end of the workflow and at points that you specify. Workflow persistence points work like database snapshots or program checkpoints to protect the workflow from the effects of interruptions and failures. In the case of a failure from which the workflow cannot recover, you can use the persisted data and resume from the last persistence point, instead of rerunning an extensive workflow from the beginning.

To add to the description from the help file, these are jobs or scheduled jobs that can be started, run, rerun, and paused for a reboot. The workflow engine requires PoSH v3 or higher to run, and it gains many advantages when run through these jobs. A workflow requires the following:
A client computer to run the workflow
A workflow session (a PowerShell session, otherwise known as a PSSession) on the client computer (it can be local or remote)
A target for the workflow activities
Running a workflow
To run a workflow, there are some caveats to make sure things run smoothly. Firstly, make sure that PowerShell is running in an administrative window. Either right-click on the PowerShell icon and select Run as Administrator (which triggers a User Access Control or UAC verification window), or type Start-Process PowerShell –verb RunAs, which does the same thing. Next, enable the client or the remote client to accept a remote PSSession by typing Enable-PSRemoting –Force or Set-WSManQuickConfig or, as seen in a previous section, WinRM QuickConfig, which allows the computer to accept WinRM or remote management commands. These commands will start the WinRM service, configure a firewall exception for the service, and allow Kerberos authentication for the commands to be run. Running Disable-PSRemoting –Force returns the configuration to the original condition of the OS. A typical way to set up a remote session is to use the New-PSWorkflowSession cmdlet. Let's see what this actually looks like and then process a simple workflow script.
Premise: Build a workflow that gets a WMI entry for Win32_BIOS, lists running processes, lists the stopped services on the computer, and does all that in parallel.
Inputs: All inputs will be hardcoded for simplicity.
Output: The contents of Win32_BIOS, the running processes, and the stopped services.
Assumption: A workflow is being used.
Script: Insert the following code snippet:

Workflow pTest {
  Parallel {
    Get-CimInstance -ClassName Win32_Bios
    Get-Process | Select Id, ProcessName
    Get-Service | ? Status –match "Stopped" | Select DisplayName
  }
}

When this workflow is run, it is easy to see that these commands are run in parallel, as the output is not displayed in the order of execution. In fact, the CimInstance completes first, then the services, and finally the processes, and the output shows that these were run in parallel. The order would change depending on what completes first. 
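Because pTest is declared as a workflow, PowerShell automatically adds the common workflow parameters to it, such as -AsJob and -PSComputerName, which is the easiest way to see the job-related benefits described in the help text. The following is only a minimal sketch; the target computer name is an assumed placeholder:

# Run the workflow as a job on a remote machine; -AsJob and -PSComputerName
# are added automatically because pTest is a workflow, not a plain function
$job = pTest -AsJob -PSComputerName TestComputer

# Workflow jobs can be suspended and resumed without losing their state
Suspend-Job $job
Resume-Job $job

# Wait for completion and collect the output
Receive-Job $job -Wait

Because the state is persisted, the job survives interruptions in a way that a plain script block run through Invoke-Command would not.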
These workflows will be developed further when we mix VMware administration scripts with Windows administration scripts to round out remote development. Summary This article summarized good coding practices, outlined the use of repeatable code, and showed ways to put that code to more general use. It built up a longer code base and gave methods for moving from concept to practice, transforming thought into action. This article also summed up a version control technique for keeping historical information in the script management structure. Further resources on this subject: Introduction to vSphere Distributed switches Creating Custom Reports and Notifications for vSphere Creating an Image Profile by cloning an existing one
article-image-docker-hosts
Packt
07 Mar 2016
19 min read
Save for later

Docker Hosts

In this article by Scott Gallagher, the author of Securing Docker, we are glad you decided to read this article and we want to make sure that the resources you are using are being secured in proper ways to ensure system integrity and data loss prevention. It is also important to understand why you should care about the security. If data loss prevention doesn't scare you already, thinking about the worst possible scenario—a full system compromise and the possibility of your secret designs being leaked or stolen by others—might help to reinforce security. In this article, we will be taking a look at securing Docker hosts and will be covering the following topics: Docker host overview Discussing Docker host Virtualization and isolation Attack surface of Docker daemon Securing Docker hosts Docker Machine SELinux and AppArmor Auto-patching hosts (For more resources related to this topic, see here.) Docker host overview Before we get in depth and dive in, let's first take a step back and review exactly what the Docker host is. In this section, we will look at the Docker host itself to get an understanding of what we are referring to when we are talking about the Docker host. We will also be looking at the virtualization and isolation that Docker uses to ensure security. Discussing Docker host When we think of a Docker host, what comes to our mind? If you put it in terms of virtual machines that almost all of us are familiar with, we would then compare what a VM host is compared with a Docker host. A VM host is what the virtual machines actually run on top of. Typically, this is something like VMware ESXi if you are using VMware or Windows Server if you are using Hyper-V. Let's take a look at how they are as compared so that you can get a visual representation of the two, as shown in the following diagram: The preceding image depicts the similarities between a VM host and Docker host. As stated previously, the host of any service is simply the system that the underlying virtual machines or containers in Docker run on top of. Therefore, a host is the operating system or service that contains and operates the underlying systems that you install and set up a service on such as web servers, databases, and more. Virtualization and isolation To understand how Docker hosts can be secured, we must first understand how the Docker host is set up and what items are contained in the Docker host. Again, like VM hosts, they contain the operating system that the underlying service operates on. With VMs, you are creating a whole new operating system on top of this VM host operating system. However, on Docker, you are not doing that and are sharing the Linux kernel that the Docker host is using. Let's take a look at the following diagram to help us represent this: As we can see from the preceding image, there is a distinct difference between how items are set up on a VM host and on a Docker host. On a VM host, each virtual machine has all of its own items inclusive to itself. Each containerized application brings its own set of libraries, whether it is Windows or Linux. Now, on the Docker host, we don't see that. We see that they share the Linux kernel version that is being used on the Docker host. That being said, there are some security aspects that need to be addressed on the Docker host side of things. Now, on the VM host side, if someone does compromise a virtual machine, the operating system is isolated to just that one virtual machine. 
Back on the Docker host side of things, if the kernel is compromised on the Docker host, then all the containers running on that host are now at high risk as well. So, now you should see how important it is that we focus on security when it comes to Docker hosts. Docker hosts do use some isolation that will help protect against kernel or container compromises in a few ways. Two of these ways are by implementing namespaces and cgroups. Before we can discuss how they help, let's first give a definition for each of them. Kernel namespaces, as they are commonly known, provide a form of isolation for the containers that will be running on your hosts. What does this mean? It means that each container that you run on top of your Docker hosts will be given its own network stack so that it doesn't get privileged access to another container's sockets or interfaces. However, by default, all Docker containers sit on the bridged interface so that they can communicate with each other easily. Think of the bridged interface as a network switch that all the containers are connected to. Namespaces also provide process isolation and mount isolation. Processes running in one container can't affect or even see processes running in another Docker container. Isolation for mount points is also on a container-by-container basis, which means that mount points in one container can't see or interact with mount points in another container. On the other hand, control groups are what control and limit resources for the containers that will be running on top of your Docker hosts. What does this boil down to, meaning how will it benefit you? It means that cgroups, as they will be called going forward, help each container get its fair share of memory, disk I/O, CPU, and much more. So, a container cannot bring down an entire host by exhausting all the resources available on it. This will help to ensure that even if an application misbehaves, the other containers won't be affected by it and your other applications can be assured uptime. 
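To make the cgroups point a little more concrete, resource limits can be set right on the docker run command line. The following is only a minimal sketch; the image name and the limit values are arbitrary examples rather than recommendations:

# Cap the container at 512 MB of RAM and give it a smaller share of CPU time;
# the kernel enforces both limits through cgroups
$ docker run -d --memory=512m --cpu-shares=256 --name limited-nginx nginx

If the process inside this container tries to exceed its memory allowance, it is the one that gets constrained or killed, while the other containers on the host keep their fair share and stay up.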
If possible, it is also recommended to drop the setuid and setgid capabilities from containers that will be running on your hosts. If you are going to run Docker, it's recommended to only use Docker on this server and not other applications. Docker also starts containers with a very restricted set of capabilities, which helps in your favor to address security concerns. To drop the setuid or setgid capabilities when you start a Docker container you will do something similar to the following: $ docker run -d --cap-drop SETGID --cap-drop SETUID nginx This would start the nginx container and would drop the SETGID and SETUID capabilities for the container. Docker's end goal is to map the root user to a non-root user that exists on the Docker host. They also are working towards allowing the Docker daemon to run without requiring root privileges. These future improvements will only help facilitate how much focus Docker does take when they are implementing their feature sets. Protecting the Docker daemon To protect the Docker daemon even more, we can secure the communications that our Docker daemon is using. We can do this by generating certificates and keys. There are are few terms to understand before we dive into the creation of the certificates and keys. A Certificate Authority (CA) is an entity that issues certificates. This certificate certifies the ownership of the public key by the subject that is specified in the certificate. By doing this, we can ensure that your Docker daemon will only accept communication from other daemons that have a certificate that was also signed by the same CA. Now, we will be looking at how to ensure that the containers you will be running on top of your Docker hosts will be secure in a few pages; however, first and foremost, you want to make sure the Docker daemon is running securely. To do this, there are some parameters you will need to enable for when the daemon starts. Some of the things you will need beforehand will be as follows: Create a CA. $ openssl genrsa -aes256 -out ca-key.pem 4096 Generating RSA private key, 4096 bit long modulus ......................................................................................................................................................................................................................++ ....................................................................++ e is 65537 (0x10001) Enter pass phrase for ca-key.pem: Verifying - Enter pass phrase for ca-key.pem: You will need to specify two values, pass phrase and pass phrase. This needs to be between 4 and 1023 characters. Anything less than 4 or more than 1023 won't be accepted. $ openssl req -new -x509 -days <number_of_days> -key ca-key.pem -sha256 -out ca.pem Enter pass phrase for ca-key.pem: You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter '.', the field will be left blank. ----- Country Name (2 letter code) [AU]:US State or Province Name (full name) [Some-State]:Pennsylvania Locality Name (eg, city) []: Organization Name (eg, company) [Internet Widgits Pty Ltd]: Organizational Unit Name (eg, section) []: Common Name (e.g. server FQDN or YOUR name) []: Email Address []: There are a couple of items you will need. You will need pass phrase you entered earlier for ca-key.pem. 
You will also need the Country, State, city, Organization Name, Organizational Unit Name, fully qualified domain name (FQDN), and Email Address to be able to finalize the certificate. Create a client key and signing certificate.

$ openssl genrsa -out key.pem 4096
$ openssl req -subj '/CN=<client_DNS_name>' -new -key key.pem -out client.csr

Sign the public key.

$ openssl x509 -req -days <number_of_days> -sha256 -in client.csr -CA ca.pem -CAkey ca-key.pem -CAcreateserial -out cert.pem

Change permissions.

$ chmod -v 0400 ca-key.pem key.pem server-key.pem
$ chmod -v 0444 ca.pem server-cert.pem cert.pem

Now, you can make sure that your Docker daemon only accepts connections from those to which you provide the signed certificates:

$ docker daemon --tlsverify --tlscacert=ca.pem --tlscert=server-cert.pem --tlskey=server-key.pem -H=0.0.0.0:2376

Make sure that the certificate files are in the directory you are running the command from or you will need to specify the full path to the certificate files. On each client, you will need to run the following:

$ docker --tlsverify --tlscacert=ca.pem --tlscert=cert.pem --tlskey=key.pem -H=<$DOCKER_HOST>:2376 version

Again, the location of the certificates is important. Make sure to either have them in a directory where you plan to run the preceding command or specify the full path to the certificate and key file locations. You can read more about using TLS by default with your Docker daemon by going to the following link: http://docs.docker.com/engine/articles/https/ For more reading on Docker Secure Deployment Guidelines, the following link provides a table that can be used to gain insight into some other items you can utilize as well: https://github.com/GDSSecurity/Docker-Secure-Deployment-Guidelines Some of the highlights from that website are: Collecting security and audit logs Utilizing the privileged switch when running Docker containers Device control groups Mount points Security audits Securing Docker hosts Where do we start to secure our hosts? What tools do we need to start with? We will take a look at using Docker Machine in this section and how to ensure that the hosts we are creating are created in a secure manner. Docker hosts are like the front door of your house; if you don't secure them properly, then anybody can just walk right in. We will also take a look at Security-Enhanced Linux (SELinux) and AppArmor to ensure that you have an extra layer of security on top of the hosts that you are creating. Lastly, we will take a look at some of the operating systems that auto-patch themselves when a security vulnerability is discovered. Docker Machine Docker Machine is the tool that allows you to install the Docker daemon onto your virtual hosts. You can then manage these Docker hosts with Docker Machine. Docker Machine can be installed through the Docker Toolbox on Windows and Mac. If you are using Linux, you will install Docker Machine through a simple curl command:

$ curl -L https://github.com/docker/machine/releases/download/v0.6.0/docker-machine-`uname -s`-`uname -m` > /usr/local/bin/docker-machine && chmod +x /usr/local/bin/docker-machine

The first command installs Docker Machine into the /usr/local/bin directory and the second command changes the permissions on the file and sets it to executable. We will be using Docker Machine in the following walkthrough to set up a new Docker host. Docker Machine is what you should be, or will be, using to set up your hosts. 
For this reason, we will start with it to ensure your hosts are set up in a secure manner. We will take a look at how you can tell if your hosts are secure when you create them using the Docker Machine tool. Let's take a look at what it looks like when you create a Docker host using Docker Machine, as follows: $ docker-machine create --driver virtualbox host1 Running pre-create checks... Creating machine... Waiting for machine to be running, this may take a few minutes... Machine is running, waiting for SSH to be available... Detecting operating system of created instance... Provisioning created instance... Copying certs to the local machine directory... Copying certs to the remote machine...   Setting Docker configuration on the remote daemon... From the preceding output, as the create is running, it is doing things such as creating the machine, waiting for SSH to become available, performing actions, copying the certificates to the correct location, and setting up the Docker configuration, as follows: To see how to connect Docker to this machine, run: docker-machine env host1 $ docker-machine env host1 export DOCKER_TLS_VERIFY="1" export DOCKER_HOST="tcp://192.168.99.100:2376" export DOCKER_CERT_PATH="/Users/scottpgallagher/.docker/machine/machines/host1" export DOCKER_MACHINE_NAME="host1" # Run this command to configure your shell: # eval "$(docker-machine env host1)" The preceding commands output shows the commands that were run to set this machine up as the one that Docker commands will now run against:  eval "$(docker-machine env host1)" We can now run the regular Docker commands, such as docker info, and it will return information from host1, now that we have set it as our environment. We can see from the preceding highlighted output that the host is being set up secure from the start from two of the export lines. Here is the first highlighted line by itself: export DOCKER_TLS_VERIFY="1" From the other highlighted output, DOCKER_TLS_VERIFY is being set to 1 or true. Here is the second highlighted line by itself: export DOCKER_HOST="tcp://192.168.99.100:2376" We are setting the host to operate on the secure port of 2376 as opposed to the insecure port of 2375. We can also gain this information by running the following command: $ docker-machine ls NAME      ACTIVE   DRIVER       STATE     URL                         SWARM                     host1              *        virtualbox     Running   tcp://192.168.99.100:2376   Make sure to check the TLS switch options that can be used with Docker Machine if you have used the previous instructions to setup your Docker hosts and Docker containers to use TLS. These switches would be helpful if you have existing certificates that you want to use as well. These switches can be found by running the following command: $ docker-machine --help   Options:   --debug, -D      Enable debug mode   -s, --storage-path "/Users/scottpgallagher/.docker/machine"   Configures storage path [$MACHINE_STORAGE_PATH]   --tls-ca-cert      CA to verify remotes against [$MACHINE_TLS_CA_CERT]   --tls-ca-key      Private key to generate certificates [$MACHINE_TLS_CA_KEY]   --tls-client-cert     Client cert to use for TLS [$MACHINE_TLS_CLIENT_CERT]   --tls-client-key       Private key used in client TLS auth [$MACHINE_TLS_CLIENT_KEY]   --github-api-token     Token to use for requests to the Github API [$MACHINE_GITHUB_API_TOKEN]   --native-ssh      Use the native (Go-based) SSH implementation. 
[$MACHINE_NATIVE_SSH]   --help, -h      show help   --version, -v      print the version You can also regenerate TLS certificates for a machine using the regenerate-certs subcommand in the event that you want that peace of mind or that your keys do get compromised. An example command would look similar to the following command: $ docker-machine regenerate-certs host1    Regenerate TLS machine certs?  Warning: this is irreversible. (y/n): y Regenerating TLS certificates Copying certs to the local machine directory... Copying certs to the remote machine... Setting Docker configuration on the remote daemon... SELinux and AppArmor Most Linux operating systems are based on the fact that they can leverage SELinux or AppArmor for more advanced access controls to files or locations on the operating system. With these components, you can limit a containers ability to execute a program as the root user with root privileges. Docker does ship a security model template that comes with AppArmor and Red Hat comes with SELinux policies as well for Docker. You can utilize these provided templates to add an additional layer of security on top of your environments. For more information about SELinux and Docker, I would recommend visiting the following website: https://www.mankier.com/8/docker_selinux While, on the other hand, if you are in the market for some more reading on AppArmor and Docker, I would recommend visiting the following website: https://github.com/docker/docker/tree/master/contrib/apparmor Here you will find a template.go file, which is the template that Docker ships with its application that is the AppArmor template. Auto-patching hosts If you really want to get into advanced Docker hosts, then you could use CoreOS and Amazon Linux AMI, which perform auto-patching, both in a different way. While CoreOS will patch your operating system when a security update comes out and will reboot your operating system, the Amazon Linux AMI will complete the updates when you reboot. So when choosing which operating system to use when you are setting up your Docker hosts, make sure to take into account the fact that both of these operating system implement some form of auto-patching, but in a different way. You will want to make sure you are implementing some type of scaling or failover to address the needs of something that is running on CoreOS so that there is no downtime when a reboot occurs to patch the operating system. Summary In this article, we looked at how to secure our Docker hosts. The Docker hosts are the fire line of defense as they are the starting point where your containers will be running and communicating with each other and end users. If these aren't secure, then there is no purpose of moving forward with anything else. You learned how to set up the Docker daemon to run securely running TLS by generating the appropriate certificates for both the host and the clients. We also looked at the virtualization and isolation benefits of using Docker containers, but make sure to remember the attack surface of the Docker daemon too. Other items included how to use Docker Machine to easily create Docker hosts on secure operating systems with secure communication and ensure that they are being setup using secure methods when you use it to setup your containers. Using items such as SELinux and AppArmor also help to improve your security footprint as well. Lastly, we covered some Docker host operating systems that you can use for auto-patching as well, such as CoreOS and Amazon Linux AMI. 
Resources for Article:   Further resources on this subject: Understanding Docker [article] Hands On with Docker Swarm [article] Introduction to Docker [article]

article-image-common-grunt-plugins
Packt
07 Mar 2016
9 min read
Save for later

Common Grunt Plugins

In this article, by Douglas Reynolds author of the book Learning Grunt, you will learn about Grunt plugins as they are core of the Grunt functionality and are an important aspect of Grunt as plugins are what we use in order to design an automated build process. (For more resources related to this topic, see here.) Common Grunt plugins and their purposes At this point, you should be asking yourself, what plugins can benefit me the most and why? Once you ask these questions, you may find that a natural response will be to ask further questions such as "what plugins are available?" This is exactly the intended purpose of this section, to introduce useful Grunt plugins and describe their intended purpose. contrib-watch This is, in the author's opinion, probably the most useful plugin available. The contrib-watch plugin responds to changes in files defined by you and runs additional tasks upon being triggered by the changed file events. For example, let's say that you make changes to a JavaScript file. When you save, and these changes are persisted, contrib-watch will detect that the file being watched has changed. An example workflow might be to make and save changes in a JavaScript file, then run Lint on the file. You might paste the code into a Lint tool, such as http://www.jslint.com/, or you might run an editor plugin tool on the file to ensure that your code is valid and has no defined errors. Using Grunt and contrib-watch, you can configure contrib-watch to automatically run a Grunt linting plugin so that every time you make changes to your JavaScript files, they are automatically lined. Installation of contrib-watch is straight forward and accomplished using the following npm-install command: npm install grunt-contrib-watch –save-dev The contrib-watch plugin will now be installed into your node-module directory. This will be located in the root of your project, you may see the Angular-Seed project for an example. Additionally, contrib-watch will be registered in package.json, you will see something similar to the following in package.json when you actually run this command: "devDependencies": { "grunt": "~0.4.5", "grunt-contrib-watch": "~0.4.0" } Notice the tilde character (~) in the grunt-contrib-watch: 0.5.3 line, the tilde actually specifies that the most recent minor version should be used when updating. Therefore, for instance, if you updated your npm package for the project, it would use 0.5.4 if available; however, it would not use 0.6.x as that is a higher minor version. There is also a caret character (^) that you may see. It will allow updates matching the most recent major version. In the case of the 0.5.3 example, 0.6.3 will be allowed, while 1.x versions are not allowed. At this point, contrib-watch is ready to be configured into your project in a gruntfile, we will look more into the gruntfile later. It should be noted that this, and many other tasks, can be run manually. In the case of contrib-watch, once installed and configured, you can run the grunt watch command to start watching the files. It will continue to watch until you end your session. The contrib-watch has some useful options. While we won't cover all of the available options, the following are some notable options that you should be aware of. Make sure to review the documentation for the full listing of options: Options.event: This will allow you to configure contrib-watch to only trigger when certain event types occur. The available types are all, changed, added, and deleted. 
You may configure more than one type if you wish. The all will trigger any file change, added will respond to new files added, and deleted will be triggered on removal of a file. Options.reload: This will trigger the reload of the watch task when any of the watched files change. A good example of this is when the file in which the watch is configured changes, it is called gruntfile.js. This will reload the gruntfile and restart contrib-watch, watching the new version of gruntfile.js. Options.livereload: This is different than reload, therefore, don't confuse the two. Livereload starts a server that enables live reloading. What this means is that when files are changed, your server will automatically update with the changed files. Take, for instance, a web application running in a browser. Rather than saving your files and refreshing your browser to get the changed files, livereload automatically reloads your app in the browser for you. contrib-jshint The contrib-jshint plugin is to run automated JavaScript error detection and will help with identifying potential problems in your code, which may surface during the runtime. When the plugin is run, it will scan your JavaScript code and issue warnings on the preconfigured options. There are a large number of error messages that jshint might provide and it can be difficult at times to understand what exactly a particular message might be referring to. Some examples are shown in the following: The array literal notation [] is preferable The {a} is already defined Avoid arguments.{a} Bad assignment Confusing minuses The list goes on and there are resources such as http://jslinterrors.com/, whose purpose is to help you understand what a particular warning/error message means. Installation of contrib-jshint follows the same pattern as other plugins, using npm to install the plugin in your project, as shown in the following: npm install grunt-contrib-jshint --save-dev This will install the contrib-jshint plugin in your project's node-modules directory and register the plugin in your devDependencies section of the package.json file at the root of your project. It will be similar to the following: "devDependencies": { "grunt": "~0.4.5", "grunt-contrib-jshint": "~0.4.5" } Similar to the other plugins, you may manually run contrib-jshint using the grunt jshint command. The contrib-jshint is jshint, therefore, any of the options available in jshint may be passed to contrib-jshint. Take a look at http://jshint.com/docs/options/ for a complete listing of the jshint options. Options are configured in the gruntfile.js file that we will cover in detail later in this book. Some examples of options are as follows: curly: This enforces that curly braces are used in code blocks undef: This ensures that all the variables have been declared maxparams: This checks to make sure that the number of arguments in a method does not exceed a certain limit The contrib-jshint allows you to configure the files that will be linted, the order in which the linting will occur, and even control linting before and after the concatenation. Additionally, contrib-jshint allows you to suppress warnings in the configuration options using the ignore_warning option. contrib-uglify Compression and minification are important for reducing the file sizes and contributing for better loading times to improve the performance. The contrib-uglify plugin provides the compression and minification utility by optimizing the JavaScript code and removing unneeded line breaks and whitespaces. 
It does this by parsing JavaScript and outputting regenerated, optimized code with shortened variable names, for example. The contrib-uglify plugin is installed in your project using the npm install command, just as you will see with all the other Grunt plugins, as follows: npm install grunt-contrib-uglify --save-dev After you run this command, contrib-uglify will be installed in your node-modules directory at the root of your application. The plugin will also be registered in your devDependencies section of package.json. You should see something similar to the following in devDependencies: "devDependencies": { "grunt": "~0.4.5", "grunt-contrib-uglify": "~0.4.0" } In addition to running as an automated task, the contrib-uglify plugin may be run manually by issuing the grunt uglify command. The contrib-uglify plugin is configured to process specific files as defined in the gruntfile.js configuration file. Additionally, contrib-uglify will have defined output destination files that will be created for the processed minified JavaScript. There is also a beautify option that can be used to revert the minified code, should you wish to easily debug your JavaScript. A useful option that is available in contrib-uglify is banners. Banners allow you to configure banner comments to be added to the minified output files. For example, a banner could be created with the current date and time, author, version number, and any other important information that should be included. You may reference your package.json file in order to get information, such as the package name and version, directly from the package.json configuration file. Another notable option is the ability to configure directory-level compiling of files. You achieve this through the configuration of the files option to use wildcard path references with a file extension, such as **/*.js. This is useful when you want to minify all the contents in a directory. contrib-less The contrib-less plugin compiles LESS files into CSS files. LESS provides extensibility to standard CSS by allowing variables, mixins (declaring a group of style declarations at once that can be reused anywhere in the stylesheet), and even conditional logic to manage styles throughout the document. Just as with other plugins, contrib-less is installed into your project using the npm install command with the following command: npm install grunt-contrib-less --save-dev The npm install will add contrib-less to your node-modules directory, located at the root of your application. As we are using --save-dev, the task will be registered in devDependencies of package.json. The registration will look similar to the following: "devDependencies": { "grunt": "~0.4.5", "grunt-contrib-less": "~0.4.5" } Typical of Grunt tasks, you may also run contrib-less manually using the grunt less command. The contrib-less plugin is configured using the path and file options to define the location of source and destination output files. The contrib-less plugin can also be configured with multiple environment-type options, for example dev, test, and production, in order to apply different options that may be needed for different environments. 
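For reference, a task such as contrib-less is typically configured in Gruntfile.js with per-target options. The following is a minimal sketch only; the file paths, target names, and banner text are assumptions for illustration and are not taken from this article:

// A minimal example Gruntfile.js configuration for grunt-contrib-less;
// all paths and target names here are placeholders
module.exports = function (grunt) {
  grunt.initConfig({
    pkg: grunt.file.readJSON('package.json'),
    less: {
      // Development target: uncompressed output for easier debugging
      dev: {
        options: { compress: false },
        files: { 'build/styles.css': 'src/styles.less' }
      },
      // Production target: compressed output with a banner comment
      production: {
        options: {
          compress: true,
          banner: '/* <%= pkg.name %> - built <%= grunt.template.today("yyyy-mm-dd") %> */\n'
        },
        files: { 'build/styles.min.css': 'src/styles.less' }
      }
    }
  });

  grunt.loadNpmTasks('grunt-contrib-less');
  grunt.registerTask('default', ['less:dev']);
};

Running grunt less:production would then build only the compressed target, while grunt less runs every target defined under the task.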
Some typical options used in contrib-less include the following: paths: These are the directories that should be scanned compress: This shows whether to compress output to remove the whitespace plugins: This is the mechanism for including additional plugins in the flow of processing banner: This shows the banner to use in the compiled destination files There are several more options that are not listed here, make sure to refer to the documentation for the full listing of contrib-less options and example usage. Summary In this article we have covered some basic grunt plugins. Resources for Article: Further resources on this subject: Grunt in Action [article] Optimizing JavaScript for iOS Hybrid Apps [article] Welcome to JavaScript in the full stack [article]

article-image-app-development-using-react-native-vs-androidios
Manuel Nakamurakare
03 Mar 2016
6 min read
Save for later

App Development Using React Native vs. Android/iOS

Until two years ago, I had exclusively done Android native development. I had never developed iOS apps, but that changed last year, when my company decided that I had to learn iOS development. I was super excited at first, but all that excitement started to fade away as I started developing our iOS app and I quickly saw how my productivity was declining. I realized I had to basically re-learn everything I learnt in Android: the framework, the tools, the IDE, etc. I am a person who likes going to meetups, so suddenly I started going to both Android and iOS meetups. I needed to keep up-to-date with the latest features in both platforms. It was very time-consuming and at the same time somewhat frustrating since I was feeling my learning pace was not fast enough. Then, React Native for iOS came out. We didn’t start using it until mid 2015. We started playing around with it and we really liked it. What is React Native? React Native is a technology created by Facebook. It allows developers to use JavaScript in order to create mobile apps in both Android and iOS that look, feel, and are native. A good way to explain how it works is to think of it as a wrapper of native code. There are many components that have been created that are basically wrapping the native iOS or Android functionality. React Native has been gaining a lot of traction since it was released because it has basically changed the game in many ways. Two Ecosystems One reason why mobile development is so difficult and time consuming is the fact that two entirely different ecosystems need to be learned. If you want to develop an iOS app, then you need to learn Swift or Objective-C and Cocoa Touch. If you want to develop Android apps, you need to learn Java and the Android SDK. I have written code in the three languages, Swift, Objective C, and Java. I don’t really want to get into the argument of comparing which of these is better. However, what I can say is that they are different and learning each of them takes a considerable amount of time. A similar thing happens with the frameworks: Cocoa Touch and the Android SDK. Of course, with each of these frameworks, there is also a big bag of other tools such as testing tools, libraries, packages, etc. And we are not even considering that developers need to stay up-to-date with the latest features each ecosystem offers. On the other hand, if you choose to develop on React Native, you will, most of the time, only need to learn one set of tools. It is true that there are many things that you will need to get familiar with: JavaScript, Node, React Native, etc. However, it is only one set of tools to learn. Reusability Reusability is a big thing in software development. Whenever you are able to reuse code that is a good thing. React Native is not meant to be a write once, run everywhere platform. Whenever you want to build an app for them, you have to build a UI that looks and feels native. For this reason, some of the UI code needs to be written according to the platform's best practices and standards. However, there will always be some common UI code that can be shared together with all the logic. Being able to share code has many advantages: better use of human resources, less code to maintain, less chance of bugs, features in both platforms are more likely to be on parity, etc. Learn Once, Write Everywhere As I mentioned before, React Native is not meant to be a write once, run everywhere platform. 
As the Facebook team that created React Native says, the goal is to be a learn once, write everywhere platform. And this totally makes sense. Since all of the code for Android and iOS is written using the same set of tools, it is very easy to imagine having a team of developers building the app for both platforms. This is not something that will usually happen when doing native Android and iOS development because there are very few developers that do both. I can even go farther and say that a team that is developing a web app using React.js will not have a very hard time learning React Native development and start developing mobile apps. Declarative API When you build applications using React Native, your UI is more predictable and easier to understand since it has a declarative API as opposed to an imperative one. The difference between these approaches is that when you have an application that has different states, you usually need to keep track of all the changes in the UI and modify them. This can become a complex and very unpredictable task as your application grows. This is called imperative programming. If you use React Native, which has declarative APIs, you just need to worry about what the current UI state looks like without having to keep track of the older ones. Hot Reloading The usual developer routine when coding is to test changes every time some code has been written. For this to happen, the application needs to be compiled and then installed in either a simulator or a real device. In case of React Native, you don’t, most of the time, need to recompile the app every time you make a change. You just need to refresh the app in the simulator, emulator, or device and that’s it. There is even a feature called Live Reload that will refresh the app automatically every time it detects a change in the code. Isn’t that cool? Open Source React Native is still a very new technology; it was made open source less than a year ago. It is not perfect yet. It still has some bugs, but, overall, I think it is ready to be used in production for most mobile apps. There are still some features that are available in the native frameworks that have not been exposed to React Native but that is not really a big deal. I can tell from experience that it is somewhat easy to do in case you are familiar with native development. Also, since React Native is open source, there is a big community of developers helping to implement more features, fix bugs, and help people. Most of the time, if you are trying to build something that is common in mobile apps, it is very likely that it has already been built. As you can see, I am really bullish on React Native. I miss native Android and iOS development, but I really feel excited to be using React Native these days. I really think React Native is a game-changer in mobile development and I cannot wait until it becomes the to-go platform for mobile development!

article-image-rxjs-observable-promise-users-part-2
Charanjit Singh
02 Mar 2016
12 min read
Save for later

(RxJS) Observable for Promise Users : Part 2

In the first post of this series we saw how RxJS Observables are similar in use to Promises and we covered how powerful they are. In this post we'll experience the power Observables bring to the table. Now, as if our little app wasn't inspirational enough, we want to overload it with inspiration. Instead of 1, we want to show 10 Chuck Norris inspirations one by one, with a delay of 2 seconds each. Let's implement that with Promises first: Promise JsBin: http://jsbin.com/wupoya/1/edit?js,output import superagent from 'superagent'; let apiUrl, inspireHTML, addPromiscuousInspiration, add10Inspirations; apiUrl = `http://api.icndb.com/jokes/random?limitTo=[nerdy,explicit]`; inspireHTML = (parentId, inspiration) => { let parentNode, inspirationalNode; parentNode = document.getElementById(parentId); inspirationalNode = document.createElement('p'); inspirationalNode.innerHTML = inspiration; parentNode.insertBefore(inspirationalNode, parentNode.firstChild); }; addPromiscuousInspiration = () => { let promiscuousInspiration; promiscuousInspiration = new Promise((resolve, reject) => { superagent .get(apiUrl) .end((err, res) => { if (err) { return reject(err); } let data, inspiration; data = JSON.parse(res.text); inspiration = data.value.joke; console.log('Inspiration has arrived!'); return resolve(inspiration); }); }); promiscuousInspiration.then((inspiration) => { let parentId; parentId = `inspiration`; inspireHTML(parentId, inspiration); }, (err) => { console.error('Error while getting inspired: ', err); }); }; add10Inspirations = () => { let maxTries, tries, interval; maxTries = 10; tries = 1; interval = setInterval(() => { addPromiscuousInspiration(); if (tries < maxTries) { tries++; } else { clearInterval(interval); } }, 2000); }; add10Inspirations(); Note: From now on we are injecting inspirations into HTML (as you'd have guessed from the code). So keep the ES6/Babel and Output panels open in JsBin Well there, we solved the problem. The code is convoluted, but it is just plain JavaScript we already know, so I am not going to explain it step-by-step. Let's try to implement the same using Observable. Observable JsBin: http://jsbin.com/wupoya/3/edit?js,output import superagent from 'superagent'; import {Observable} from 'rx'; let apiUrl, inspireHTML, reactiveInspirations; apiUrl = `http://api.icndb.com/jokes/random?limitTo=[nerdy,explicit]`; inspireHTML = (parentId, inspiration) => { let parentNode, inspirationalNode; parentNode = document.getElementById(parentId); inspirationalNode = document.createElement('p'); inspirationalNode.innerHTML = inspiration; parentNode.insertBefore(inspirationalNode, parentNode.firstChild); }; reactiveInspirations = Observable.create((observer) => { let interval, maxTries, tries; maxTries = 10; tries = 0; interval = setInterval(() => { superagent .get(apiUrl) .end((err, res) => { if (err) { return observer.onError(err); } let data, inspiration; data = JSON.parse(res.text); inspiration = data.value.joke; observer.onNext(inspiration); }); if (tries < maxTries) { tries++; } else { observer.onCompleted(); } }, 2000); return () => { console.log('Releasing the Kraken!'); clearInterval(interval); }; }); reactiveInspirations.subscribe({ onNext: (inspiration) => { let parentId; parentId = `inspiration`; inspireHTML(parentId, inspiration); }, onError: (error) => { console.error('Error while getting inspired', error); }, onCompleted: () => { console.log('Done getting inspired!'); }, }); Easter Egg: There is a tiny little easter egg in this code. 
Try find it if you want some candy! Hint: Observable can stop right when asked to. There are better ways of doing this (and we'll use them later), but to try comparing apples to apples I took the same approach as Promises. We'll use an interval that makes a call to the server and gets us an inspiration. You may have noticed some major differences. The consumer of our Observable (subscriber) is not changed at all, it is still assuming the same thing it was earlier. And that is a big thing. The only change we made is in the creation of our Observable. Earlier we would create a single value and called it completed, but now we set an interval and emit values making an Ajax request every 2 seconds. Take another look at how we have written the Observable creator. Notice how we are clearing the interval we set up. We put the code responsible for clearing the interval in the dispose method, because its responsibility is to release resources. And then when we have made 10 requests, we simply execute onCompleted and all the resources that need to be cleared (the interval in this case) are released. If you still can't see the power this declarative way of disposing resources brings, let's assume another case. Assume you are the consumer (subscriber) of this observable, and now you want only five inspirations. But you can't change the implementation of the Observable. How would you go about it? We can go around counting the inspirations and ignore that after we've received five, but that means we waste five requests. We want to make only five requests to the server, and then we want to stop the interval and make no more requests. We can actually do that without making a single change to how the Observable is created. When we do a reactiveInspiration.subscribe, it returns us a Disposable object. We can call reactiveInspiration.subscribe(...).dispose() any time to stop the Observable right there and release all its resources. Turns out there are many such use cases which come up more than often when you're dealing with streams/collections of asynchronous operations. RxJS provides very nice API to deal with a lot of them. Instead of manually counting the number of items emitted by our Observable and then disposing it, we can use Observable.prototype.take operator. Observable.prototype.take takes a number as input, and call dispose on the Observable after it has emitted that many values. Here we go: JsBin: http://jsbin.com/wupoya/4/edit?js,output reactiveInspirations .take(5) .subscribe({ onNext: (inspiration) => { let parentId; parentId = `inspiration`; inspireHTML(parentId, inspiration); }, onError: (error) => { console.error('Error while getting inspired', error); }, onCompleted: () => { console.log('Done getting inspired!'); }, }); If you notice in the console, you would see Releasing the Kraken! logged after five inspirations, and no more requests will be made to the server. take is one of the many operators available on the Observable that we can use to declaratively manipulate asynchronous collections. Doing the same thing in the present implementation with promises would involve making changes all over the place. Some would argue that we could create a list of promises and use a library like Q to work over it sequentially, or sequentially create a list of promises with Q and then sequentially operate on it (or something better), but that's not the point. The point is that our use case is to handle a list of asynchronous operations, and Promises are not designed for it. 
At the end of the day both Promise and Observable are just abstractions for same operation, the one which makes the job easier wins. Easy here is not just the "easy to implement", but easy includes: easy to think easy to implement easy to read easy to maintain easy to extend easy to reuse I don't know about you but the expected behavior of a Promise for me is that it will execute when I want it to, not when I create it. That is what I expected of it when I first tried one. I mean they advertised it as a "unit of asynchronous operation" not as a "unit of asynchronous operation which is already done, here's the dump it took on you". I won't go into explaining each point about which code stands on which of the above points, since that's your job. In my opinion Observables encourage good architecture with maintainable/modular code. From this point on we'll abandon the Promise-based implementation and build little more on Observable implementations. For starters, let's clean it up and do it the right way. import superagent from 'superagent'; import {Observable} from 'rx'; let apiUrl, inspireHTML, getInspiration; apiUrl = `http://api.icndb.com/jokes/random?limitTo=[nerdy,explicit]`; inspireHTML = (parentId, inspiration) => { let parentNode, inspirationalNode; parentNode = document.getElementById(parentId); inspirationalNode = document.createElement('p'); inspirationalNode.innerHTML = inspiration; parentNode.insertBefore(inspirationalNode, parentNode.firstChild); }; getInspiration = () => new Promise((resolve, reject) => { superagent .get(apiUrl) .end((err, res) => { if (err) { return reject(err); } let data, inspiration; data = JSON.parse(res.text); inspiration = data.value.joke; resolve(inspiration); }); }); Observable .interval(2000) .take(10) .flatMap(getInspiration) .subscribe({ onNext: (inspiration) => { let parentId; parentId = `inspiration`; inspireHTML(parentId, inspiration); }, onError: (error) => { console.error('Error while getting inspired', error); }, onCompleted: () => { console.log('Done getting inspired!'); }, }); Promises are not bad. They are designed to represent single asynchronous operations and they do their job fine. (Another) Best thing about RxJS is interoperability. They play well with almost all (a)synchronous forms, including regular arrays, promises, generators, events, and even ES7 async functions. Promises in particular are smooth to work with in RxJS, because when we return a promise in an Observable chain, it gets automatically converted to an Observable. Let's dissect that code from bottom up again. At the bottom of that code is an Observable chain. Yup, a chain! It's a chain of small independent operations that do a single task in our operation. Let's start from the top of this chain. On the top of the chain sits an Observable.interval(2000). Observable.create that we used earlier is one of many ways to create an Observable. Many times (I'd say mostly) we use special purpose operators to create Observable from promises, callbacks, events, and intervals. Observable.interval(N) creates an Observable which emits a value every N milliseconds. It just constantly keeps emitting the values indefinitely. To ask it to stop after emitting 10 values, we use Observable.prototype.take(N). take will accept N values, and then ask its parent to stop. Take, like most other operators, creates an Observable, so we can chain it further. Next in the chain we have a flatMap. 
If you have never worked with Array-extras like map/filter/reduce, flatMap might be hard to understand at first. Array.prototype.map takes a function, and you can apply it to each item in the array, and return a new array from returned values of the functions. There is another method not included in the default native array extras (but is provided by utility libraries like underscore/lodash/ramda) called flatten. flatten takes an array, and if that array further contains arrays, it flattens them all and returns a single dimensional array. It converts arrays like [1, 2, 3, [4, 5]] to [1, 2, 3, 4, 5, 6]. flatMap is a combination of map and flatten. It takes a function, invokes it on every item in the collection, and then flattens the resulting collection. RxJs tries to mimic the Array-extra API, which makes working with asynchronous collections so much more natural and intuitive. So you have map, filter, reduce and friends available on Observables. flatMap is also great. It flattens the Observable within the Observable to a value and passes it down the chain (i.e creates another Observable that emits flattened values, but let's not get technical). To understand it well, let's take another look at our example: .flatMap(getInspiration). getInspiration returns a Promise. As I said earlier, Promises inside the Observable chain get automatically converted to Observable. So we can safely say that getInspiration returns an Observable. If we used a simple map instead of flatMap, it would just return an Observable for each value it gets from Observable.interval, so it would give us an Observable every 2 seconds. But flatMap goes a step ahead and flattens this Observable so it will resolve this Observable, and give us its value. What value does the Promise/Observable returned by getInspiration resolve to? It resolves to the inspiration. So that's what we get in the next part of the chain. Wow! Such complex task done with such a simple API. Next in the chain is our subscriber, i.e the end of the chain. Here we expect whatever end result we wanted. Here we just do some work on it (append it in our page). I hope it wasn't too much for you to grasp. If it was, perhaps you should go ahead and take a tour of this awesome rxjs tutorial. I am sure you are as impressed by now as I am. The ability to compose asynchronous operations like this brings great power and agility to our code. Plus it is so much fun to code this way. Let's add some more features for our inspirational app. Why not remove the 10 inspiration limit and instead add a button to stop getting inspired when we feel like we're done? 
JsBin: http://jsbin.com/jenule/1/edit?js,output

import superagent from 'superagent';
import {Observable} from 'rx';

let apiUrl, inspireHTML, getInspiration, stopButtonId, stopGettingInspired;

apiUrl = `http://api.icndb.com/jokes/random?limitTo=[nerdy,explicit]`;

inspireHTML = (parentId, inspiration) => {
  let parentNode, inspirationalNode;

  parentNode = document.getElementById(parentId);
  inspirationalNode = document.createElement('p');
  inspirationalNode.innerHTML = inspiration;

  parentNode.insertBefore(inspirationalNode, parentNode.firstChild);
};

getInspiration = () => new Promise((resolve, reject) => {
  superagent
    .get(apiUrl)
    .end((err, res) => {
      if (err) {
        return reject(err);
      }

      let data, inspiration;

      data = JSON.parse(res.text);
      inspiration = data.value.joke;

      resolve(inspiration);
    });
});

stopButtonId = 'stop';
stopGettingInspired = Observable
  .fromEvent(document.getElementById(stopButtonId), 'click')
  .do((event) => event.preventDefault());

Observable
  .interval(2000)
  .takeUntil(stopGettingInspired)
  .flatMap(getInspiration)
  .subscribe({
    onNext: (inspiration) => {
      let parentId;

      parentId = `inspiration`;
      inspireHTML(parentId, inspiration);
    },
    onError: (error) => {
      console.error('Error while getting inspired', error);
    },
    onCompleted: () => {
      console.log('Done getting inspired!');
    },
  });

Let's quickly go over what we've added:

Stop button: We've added a stop button in the HTML with the id stop. In our JavaScript, we created a new Observable for click events on this button; Observable.fromEvent does that for you. I won't go into much detail here (I am already way over my allotted space, but this is really powerful, and I did promise I'd show you events in action). It allows us to treat click events on our button like any other asynchronous collection, which we can combine with other Observables. That's exactly what we do: we want to stop our Observable.interval Observable when the stop button is clicked.

takeUntil: take is one of many operators for restricting an Observable. Another one is takeUntil. takeUntil works exactly like take, but instead of taking a number, it takes another Observable; when that Observable emits a value, it stops its parent Observable. So in our case, when our stopGettingInspired Observable emits a value (it emits the click event when we click the button), our interval is stopped. That's some great extensibility!

That's all the fun we're going to have for today. If you felt like it was too much, I'd recommend you read this tutorial again. Or, if you felt it was too simple, I wrote a small (but usable) RSS reader as a tutorial here.

About the author

Charanjit Singh is a freelance developer based out of Punjab, India. He can be found on GitHub @channikhabra.

Introducing OpenStack Trove

Packt
02 Mar 2016
17 min read
In this article, Alok Shrivastwa and Sunil Sarat, authors of the book OpenStack Trove Essentials, explain how OpenStack Trove truly and remarkably is a treasure or collection of valuable things, especially for open source lovers like us and, of course, it is an apt name for the Database as a Service (DBaaS) component of OpenStack. In this article, we shall see why this component shows the potential and is on its way to becoming one of the crucial components in the OpenStack world. In this article, we will cover the following: DBaaS and its advantages An introduction to OpenStack's Trove project and its components Database as a Service Data is a key component in today's world, and what would applications do without data? Data is very critical, especially in the case of businesses such as the financial sector, social media, e-commerce, healthcare, and streaming media. Storing and retrieving data in a manageable way is absolutely key. Databases, as we all know, have been helping us manage data for quite some time now. Databases form an integral part of any application. Also, the data-handling needs of different type of applications are different, which has given rise to an increase in the number of database types. As the overall complexity increases, it becomes increasingly challenging and difficult for the database administrators (DBAs) to manage them. DBaaS is a cloud-based service-oriented approach to offering databases on demand for storing and managing data. DBaaS offers a flexible and scalable platform that is oriented towards self-service and easy management, particularly in terms of provisioning a business' environment using a database of choice in a matter of a few clicks and in minutes rather than waiting on it for days or even, in some cases, weeks. The fundamental building block of any DBaaS is that it will be deployed over a cloud platform, be it public (AWS, Azure, and so on) or private (VMware, OpenStack, and so on). In our case, we are looking at a private cloud running OpenStack. So, to the extent necessary, you might come across references to OpenStack and its other services, on which Trove depends. XaaS (short for Anything/Everything as a Service, of which DBaaS is one such service) is fast gaining momentum. In the cloud world, everything is offered as a service, be it infrastructure, software, or, in this case, databases. Amazon Web Services (AWS) offers various services around this: the Relational Database Service (RDS) for the RDBMS (short for relational database management system) kind of system; SimpleDB and DynamoDB for NoSQL databases; and Redshift for data warehousing needs. The OpenStack world was also not untouched by the growing demand for DBaaS, not just by users but also by DBAs, and as a result, Trove made its debut with the OpenStack release Icehouse in April 2014 and since then is one of the most popular advanced services of OpenStack. It supports several SQL and NoSQL databases and provides the full life cycle management of the databases. Advantages Now, you must be wondering why we must even consider DBaaS over traditional database management strategies. Here are a few points you might want to consider that might make it worth your time. Reduced database management costs In any organization, most of their DBAs' time is wasted in mundane tasks such as creating databases, creating instances, and so on. 
They are not able to concentrate on tasks such as fine-tuning SQL queries so that applications run faster, not to mention the time taken to do it all manually (or with a bunch of scripts that need to be fired manually), so this in effect is wasting resources in terms of both developers' and DBAs' time. This can be significantly reduced using a DBaaS. Faster provisioning and standardization With DBaaS, databases that are provisioned by the system will be compliant with standards as there is very little human intervention involved. This is especially helpful in the case of heavily regulated industries. As an example, let's look at members of the healthcare industry. They are bound by regulations such as HIPAA (short for Health Insurance Portability and Accountability Act of 1996), which enforces certain controls on how data is to be stored and managed. Given this scenario, DBaaS makes the database provisioning process easy and compliant as they only need to qualify the process once, and then every other database coming out of the automated provisioning system is then compliant with the standards or controls set. Easier administration Since DBaaS is cloud based, which means there will be a lot of automation, administration becomes that much more automated and easier. Some important administration tasks are backup/recovery and software upgrade/downgrade management. As an example, with most databases, we should be able to push configuration modifications within minutes to all the database instances that have been spun out by the DBaaS system. This ensures that any new standards being thought of can easily be implemented. Scaling and efficiency Scaling (up or down) becomes immensely easy, and this reduces resource hogging, which developers used as part of their planning for a rainy day, and in most cases, it never came. In the case of DBaaS, since you don't commit resources upfront and only scale up or down as and when necessary, resource utilization will be highly efficient. These are some of the advantages available to organizations that use DBaaS. Some of the concerns and roadblocks for organizations in adopting DBaaS, especially in a public cloud model, are as follows: Companies don't want to have sensitive data leave their premises. Database access and speed are key to application performance. Not being able to manage the underlying infrastructure inhibits some organizations from going to a DBaaS model. In contrast to public cloud-based DBaaS, concerns regarding data security, performance, and visibility reduce significantly in the case of private DBaaS systems such as Trove. In addition, the benefits of a cloud environment are not lost either. Trove OpenStack Trove, which was originally called Red Dwarf, is a project that was initiated by HP, and many others contributed to it later on, including Rackspace. The project was in incubation till the Havana release of OpenStack. It was formally introduced in the Icehouse release in April 2014, and its mission is to provide scalable and reliable cloud DBaaS provisioning functionality for relational and non-relational database engines. As of the Liberty release, Trove is considered as a big-tent service. Big-tent is a new approach that allows projects to enter the OpenStack code namespace. In order for a service to be a big-tent service, it only needs to follow some basic rules, which are listed here. 
This allows the projects to have access to the shared teams in OpenStack, such as the infrastructure teams, release management teams, and documentation teams. The project should: Align with the OpenStack mission Subject itself to the rulings of the OpenStack Technical Committee Support Keystone authentication Be completely open source and open community based At the time of writing the article, the adoption and maturity levels are as shown here: The previous diagram shows that the Age of the project is just 2 YRS and it has a 27% Adoption rate, meaning 27 of 100 people running OpenStack also run Trove. The maturity index is 1 on a scale of 1 to 5. It is derived from the following five aspects: The presence of an installation guide Whether the Adoption percentage is greater or lesser than 75 Stable branches of the project Whether it supports seven or more SDKs Corporate diversity in the team working on the project Without further ado, let's take a look at the architecture that Trove implements in order to provide DBaaS. Architecture The trove project uses some shared components and some dedicated project-related components as mentioned in the following subsections. Shared components The Trove system shares two components with the other OpenStack projects, the backend database (MySQL/MariaDB), and the message bus. The message bus The AMQP (short for Advanced Message Queuing Protocol) message bus brokers the interactions between the task manager, API, guest agent, and conductor. This component ensures that Trove can be installed and configured as a distributed system. MySQL/MariaDB MySQL or MariaDB is used by Trove to store the state of the system. API This component is responsible for providing the RESTful API with JSON and XML support. This component can be called the face of Trove to the external world since all the other components talk to Trove using this. It talks to the task manager for complex tasks, but it can also talk to the guest agent directly to perform simple tasks, such as retrieving users. The task manager The task manager is the engine responsible for doing the majority of the work. It is responsible for provisioning instances, managing the life cycle, and performing different operations. The task manager normally sends common commands, which are of an abstract nature; it is the responsibility of the guest agent to read them and issue database-specific commands in order to execute them. The guest agent The guest agent runs inside the Nova instances that are used to run the database engines. The agent listens to the messaging bus for the topic and is responsible for actually translating and executing the commands that are sent to it by the task manager component for the particular datastore. Let's also look at the different types of guest agents that are required depending on the database engine that needs to be supported. The different guest agents (for example, the MySQL and PostgreSQL guest agents) may even have different capabilities depending on what is supported on the particular database. This way, different datastores with different capabilities can be supported, and the system is kept extensible. The conductor The conductor component is responsible for updating the Trove backend database with the information that the guest agent sends regarding the instances. It eliminates the need for direct database access by all the guest agents for updating information. 
This is like the way the guest agent also listens to the topic on the messaging bus and performs its functions based on it. The following diagram can be used to illustrate the different components of Trove and also their interaction with the dependent services: Terminology Let's take a look at some of the terminology that Trove uses. Datastore Datastore is the term used for the RDBMS or NoSQL database that Trove can manage; it is nothing more than an abstraction of the underlying database engine, for example, MySQL, MongoDB, Percona, Couchbase, and so on. Datastore version This is linked to the datastore and defines a set of packages to be installed or already installed on an image. As an example, let's take MySQL 5.5. The datastore version will also link to a base image (operating system) that is stored in Glance. The configuration parameters that can be modified are also dependent on the datastore and the datastore version. Instance An instance is an instantiation of a datastore version. It runs on OpenStack Nova and uses Cinder for persistent storage. It has a full OS and additionally has the guest agent of Trove. Configuration group A configuration group is a bunch of options that you can set. As an example, we can create a group and associate a number of instances to one configuration group, thereby maintaining the configurations in sync. Flavor The flavor is similar to the Nova machine flavor, but it is just a definition of memory and CPU requirements for the instance that will run and host the databases. Normally, it's a good idea to have a high memory-to-CPU ratio as a flavor for running database instances. Database This is the actual database that the users consume. Several databases can run in a single Trove instance. This is where the actual users or applications connect with their database clients. The following diagram shows these different terminologies, as a quick summary. Users or applications connect to databases, which reside in instances. The instances run in Nova but are instantiations of the Datastore version belonging to a Datastore. Just to explain this a little further, say we have two versions of MySQL that are being serviced. We will have one datastore but two datastore versions, and any instantiation of that will be called an instance, and the actual MySQL database that will be used by the application will be called the database (shown as DB in the diagram). A multi-datastore scenario One of the important features of the Trove system is that it supports multiple databases to various degrees. In this subsection, we will see how Trove works with multiple Trove datastores. In the following diagram, we have represented all the components of Trove (the API, task manager, and conductor) except the Guest Agent databases as Trove Controller. The Guest Agent code is different for every datastore that needs to be supported and the Guest Agent for that particular datastore is installed on the corresponding image of the datastore version. The guest agents by default have to implement some of the basic actions for the datastore, namely, create, resize, and delete, and individual guest agents have extensions that enable them to support additional features just for that datastore. The following diagram should help us understand the command proxy function of the guest agent. Please note that the commands shown are only indicative, and the actual commands will vary. 
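Those internal commands are Trove's concern; from a user's point of view, the whole workflow boils down to a few client calls. The following is a rough sketch using the python-troveclient CLI — the flavor ID, names, and credentials are made-up examples, and exact flag names can vary between releases, so treat it as illustrative rather than definitive:

# Request a 5 GB MySQL 5.6 instance using flavor ID 3
trove create inspiration-db 3 \
  --size 5 \
  --datastore mysql \
  --datastore_version 5.6 \
  --databases appdb \
  --users appuser:s3cr3t

# Watch the instance build, then inspect its details
trove list
trove show inspiration-db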
At the time of writing this article, Trove's guest agents can only be installed on Linux; hence, only databases running on Linux systems are supported. Feature requests (https://blueprints.launchpad.net/trove/+spec/mssql-server-db-support) were created for the ability to build a guest agent for Windows and support Microsoft SQL Server databases, but they had not been approved at the time of writing and remain a remote possibility.

Database software distribution support

Trove supports various databases; the following table shows the databases supported by this service at the time of writing. Automated installation is available for all of them, but there is some difference in the configuration capabilities Trove offers for each database. This has a lot to do with the lack of a common configuration base among the different databases. At the time of writing, MySQL and MariaDB have the most configuration options available.

Database       Version
MySQL          5.5, 5.6
Percona        5.5, 5.6
MariaDB        5.5, 10.0
Couchbase      2.2, 3.0
Cassandra      2.1
Redis          2.8
PostgreSQL     9.3, 9.4
MongoDB        2.6, 3.0
DB2 Express    10.5
CouchDB        1.6

So, as you can see, almost all the major database applications that can run on Linux are already supported by Trove.

Putting it all together

Now that we have understood the architecture and terminology, let's take a look at the general steps that are followed when an instance is provisioned:

1. Horizon or the Trove CLI requests a new database instance, passing the datastore name and version along with the flavor ID and volume size as mandatory parameters. Optional parameters such as the configuration group, AZ, replica-of, and so on can also be passed.
2. The Trove API asks Nova for an instance built from the particular image, with a Cinder volume of the requested size attached to it.
3. The Nova instance boots and follows these steps:
   - The cloud-init scripts are run (like all other Nova instances)
   - The configuration files (for example, trove-guestagent.conf) are copied down to the instance
   - The guest agent is installed
4. The Trove API will also have sent the request to the task manager, which then sends the prepare call to the message bus topic.
5. After booting, the guest agent listens to the message bus for any work addressed to it, and once it finds a message for itself, it processes the prepare command and performs the following functions:
   - Installing the database distribution (if not already installed on the image)
   - Creating the configuration file with the default configuration for the database engine (with any configuration from the associated configuration groups overriding the defaults)
   - Starting the database engine and enabling auto-start
   - Polling the database engine for availability (until the database engine is available or the timeout is reached)
   - Reporting the status back to the Trove backend using the Trove conductor
6. The task manager reports back to the API and the status of the instance is changed.

Use cases

So, if you are wondering where we can use Trove, it fits in rather nicely with the following use cases.

Dev/test databases

Dev/test databases are an absolute killer feature, and almost all companies that start using Trove will use it for their dev/test environments first. It gives developers the ability to freely create and dispose of database instances at will. This helps them be more productive and removes the lag between when they want a database and when they get it.
The capability of being able to take a backup, run a database, and restore the backup to another server is especially key when it comes to these kinds of workloads. Web application databases Trove is used in production for any database that supports low-risk applications, such as some web applications. With the introduction of different redundancy mechanisms, such as master-slave in MySQL, this is becoming more suited to many production environments. Features Trove is moving fast in terms of the features being added in the various releases. In this section, we will take a look at the features of three releases: the current release and the past two. The Juno release The Juno release saw a lot of features being added to the Trove system. Here is a non-exhaustive list: Support for Neutron: Now we can use both nova-network and Neutron for networking purposes Replication: MySQL master/slave replication was added. The API also allowed us to detach a slave for it to be promoted Clustering: Mongo DB cluster support was added Configuration group improvements: The functionality of using a default configuration group for a datastore version was added. This allows us to build the datastore version with a base configuration of your company standards Basic error checking was added to configuration groups The Kilo release The Kilo release majorly worked on introducing a new datastore. The following is the list of major features that were introduced: Support for the GTID (short for global transaction identifier) replication strategy New datastores, namely Vertica, DB2, and CouchDB, are supported The Liberty release The Liberty release introduced the following features to Trove. This is a non-exhaustive list. Configuration groups for Redis and MongoDB Cluster support for Redis and MongoDB Percona XtraDB cluster support Backup and restore for a single instance of MongoDB User and database management for Mongo DB Horizon support for database clusters A management API for datastores and versions The ability to deploy Trove instances in a single admin tenant so that the Nova instances are hidden from the user In order to see all the features introduced in the releases, please look at the release notes of the system, which can be found at these URLs: Juno : https://wiki.openstack.org/wiki/ReleaseNotes/Juno Kilo : https://wiki.openstack.org/wiki/ReleaseNotes/Kilo Liberty : https://wiki.openstack.org/wiki/ReleaseNotes/Liberty Summary In this article, we were introduced to the basic concepts of DBaaS and how Trove can help with this. With several changes being introduced and a score of one on five with respect to maturity, it might seem as if it is too early to adopt Trove. However, a lot of companies are giving Trove a go in their dev/test environments as well as for some web databases in production, which is why the adoption percentage is steadily on the rise. A few companies that are using Trove today are giants such as eBay, who run their dev/test Test databases on Trove; HP Helion Cloud, Rackspace Cloud, and Tesora (which is also one of the biggest contributors to the project) have DBaaS offerings based on the Trove component. Trove is increasingly being used in various companies, and it is helping in reducing DBAs' mundane work and improving standardization. Resources for Article: Further resources on this subject: OpenStack Performance, Availability [article] Concepts for OpenStack [article] Implementing OpenStack Networking and Security [article]

Responsive Visualizations Using D3.js and Bootstrap

Packt
01 Mar 2016
21 min read
In this article by Christoph Körner, the author of the book Learning Responsive Data Visualization, we will design and implement a responsive data visualization using Bootstrap and Media Queries based on real data. We will cover the following topics:

- Absolute and relative units in the browser
- Drawing charts with percentage values
- Adapting charts using JavaScript event listeners
- Learning to adapt the resolution of the data
- Using Bootstrap's Media Queries
- Understanding how to use Media Queries in CSS, LESS, and JavaScript
- Learning how to use Bootstrap's grid system

(For more resources related to this topic, see here.)

First, we will discuss the most important absolute and relative units that are available in modern browsers. You will learn the difference between absolute pixels, relative percentages, em, rem, and many more. In the next section, we will take a look at what is really needed for a chart to be responsive. Adapting the width to the parent element is one of the requirements, and you will learn about two different ways to implement this. After this section, you will know when to use percentage values or JavaScript event listeners. We will also take a look at adapting the data resolution, which is another important property of responsive visualizations. In the next section, we will explore Media Queries and understand how we can use them to make viewport-dependent responsive charts. We will take advantage of Bootstrap's definitions of Media Queries for the most common device resolutions and integrate them into our responsive chart using CSS or LESS. Finally, we will also see how to include Media Queries in JavaScript. In the last section, we will take a look at Bootstrap's grid system and learn how to seamlessly integrate it with the charts. This will give us not only great flexibility but will also make it easier to combine multiple charts into one big dashboard application.

Units and lengths in the browser

Creating a responsive design, be it a website or graphics, depends strongly on the units and lengths that a browser can interpret. We can easily create an element that fills the entire width of a container using percentage values that are relative to the parent container, whereas achieving the same result with absolute values could be very tricky. Thus, mastering responsive graphics also means knowing all the absolute and relative units that are available in the browser.

Units for Absolute lengths

The most convenient and popular way in web design and development is to define and measure lengths and dimensions in absolute units, usually in pixels. The reason for this is that designers and developers often want to specify the exact dimensions of an object. The pixel unit, px, was introduced as a visual unit based on a physical measurement, intended to be read on a device at a distance of approximately one arm's length; however, modern browsers also allow lengths to be defined in physical units. The following list shows the most common absolute units and their relations to each other:

- cm: centimeters (1 cm = 96 px/2.54)
- mm: millimeters (1 mm = 1/10th of 1 cm)
- in: inches (1 in = 2.54 cm = 96 px)
- pt: points (1 pt = 1/72th of 1 in)
- px: pixels (1 px = 1/96th of 1 in)

More information on the origin and meaning of the pixel unit can be found in the CSS3 Specification available at http://www.w3.org/TR/css3-values/#viewport-relative-lengths.
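As a quick illustration of how these absolute units relate to each other, the following declarations all describe the same rendered length (the class names are made up purely for this example):

/* All four widths resolve to one inch */
.length-in { width: 1in; }
.length-cm { width: 2.54cm; } /* 2.54 cm = 1 in */
.length-pt { width: 72pt; }   /* 72 pt  = 1 in */
.length-px { width: 96px; }   /* 96 px  = 1 in */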
Units for Relative lengths In addition to Absolute lengths, relative lengths that are expressed as the percentage of the width or height of a parent element has also been a common technique to the style dynamic elements of web pages. Traditionally, the % unit has always been the unit of choice for the preceding reason. However, with CSS3, a couple of additional relative units have found their way into the browsers, for example, to define a length relative to the font size of the element. Here is a list of the relative length units that have been specified in the CSS3 specifications and will soon be available in modern browsers: %: the percentage of the width/height of the absolute container Em: the factor of font size of the element Rem: the factor of font size of the root element Vw: 1% of viewport's width vh: 1% viewport's height vmin: 1% of the viewport's smaller dimension (either vw or vh) vmax: 1% of the viewport's larger dimension (either vw or vh) I am aware that as a web developer, we cannot really take the advantage of any technology that will be supported soon; however, I want to point out one unit that will play an important role for feature web developers: the rem unit. The rem unit defines the length of an element based on the font size of the root node (the html node in this case), rather than the font size of the current element such as em. This rem unit is very powerful if we use it to define the lengths and spacings of a layout because the layout can also adapt when the user increases the font size of the browser (that is, for readability). I want to mention that Bootstrap 4 will replace all absolute pixel units for Media Queries by the rem units because of this reason. If we look at the following figure, we also see that rem units are already supported in all major browsers. I recommend you to start replacing all your absolute pixel units of your layout and spacing by the rem units: Cross-browser compatibility of rem units However, percentage units are not dead; we can still use them when they are appropriate. We will use them later to draw SVG elements with dimensions based on their parent elements' dimensions. Units for Resolution To round up the section of relative and absolute units, I want to mention that we can also use resolutions in different units. These resolution units can be used in Media Queries together with the min-resolution or max-resolution attribute: Dpi: dots per inch dpcm: dots per centimeter dppx: dots per px unit Mathematical Expressions We often have the problem of dealing with rational numbers or expressions in CSS; just imagine defining a 3-column grid with a width of 33% per column, or imagine the need of computing a simple expression in the CSS file. CSS3 provides a simple solution for this: calc(exp): This computes the mathematical expression called exp, which can consist of lengths, values, and the operators called +, -, /, and * Note that the + and -operators must be surrounded by whitespace. Otherwise, they will be interpreted as a sign of the second number rather than the operator. Both the other operators called * and / don't require a whitespace, but I encourage you to add them for consistency. We can use these expression in the following snippets. 
.col-4 { width: calc(100%/3); } .col-sp-2 { width: calc(50% - 2em); } The preceding examples look great; however, as we can see in the following figure, we need to take care of the limitations of the browser compatibility: Cross-browser compatibility of the calc() expression Responsive charts Now that we know some basics about absolute and relative units, we can start to define, design, and implement responsive charts. A responsive chart is a chart that automatically adapts its look and feel to the resolution of the user's device; thus, responsive charts need to adapt the following properties: The dimension (width and height) The resolution of data points The interactions and interaction areas. Adapting the dimensions is most obvious. The chart should always scale and adapt to the width of its parent element. In the previous section, you learned about relative and absolute lengths, so one might think that simply using relative values for the chart's dimensions would be enough. However, there are multiple ways with advantages and disadvantages to achieve this; in this section, we will discuss three of them. Adapting the resolution of the data is a little less obvious and often neglected. The resolution of data points (the amount of data point per pixel) should adapt, so that we can see more points on a device with a higher resolution and less points on a low resolution screen. In this section we will see that this can only be achieved using JavaScript event listeners and by redrawing/updating the whole chart manually. Adapting interactions and interaction areas is important for not just using different screen resolutions but also different devices. We interact differently with a TV than a computer, and we use different input devices on a desktop and mobile phone. However, the chart will allow interactions and interaction areas that are appropriate for a given device and screen resolution.. Using Relative Lengths in SVG The first and most obvious solution for adapting the dimensions of a chart to its parent container is the use of relative values for lengths and coordinates. This means that when we define the chart once with relative values and the browser takes care of recomputing all the values when the dimension of the parent container has changed, there is no manual redrawing of the chart required. First, we will add some CSS styles to scale the SVG element to the full width of the parent container: .chart { height: 16rem; position: relative; } .chart > svg { width: 100%; height: 100%; } Next, we modify all our scales to work on a range at [0, 100] and subtract a padding from both sides: var xScale = d3.scale.ordinal() .domain(flatData.map(xKey)) .rangeBands([padding, 100 - 2*padding]); var yScale = d3.scale.linear() .domain([0, d3.max(flatData, yKey)]) .range([100 - 2*padding, padding]); Finally, we can draw the chart as before, but simply adding percentage signs % at the end of the attributes—to indicate the use of percentage units: $$bars .attr('x', function(d) { return (xScale(d.x) + i*barWidth ) + '%'; }) .attr('y', function(d) { return yScale(d.y) + '%'; }) .attr('height', function(d) { return (yScale(0) - yScale(d.y)) + '%'; }) .attr('width', barWidth + '%') .attr('fill', colors(data.key)); Observe that we only slightly modified the code to plot the bars in the bar chart in order to use percentage values as coordinates for the attributes. However, the effect of this small change is enormous. 
In the following figure, we can see the result of the chart in a browser window: Bar chart using relative lengths If we now increase the size of the browser, the bar chart scales nicely to the full width of the parent container. We can see the scaled chart in the following figure: Scaled bar chart using relative lengths If you are not impressed by it now, you better be. This is awesome in my opinion because it leaves all the hard work of recomputing the SVG element dimensions to the browser. We don't have to care about them, and these native computations give us some maximal performance. The previous example shows how to use percentage values to create a simple bar chart. However, what we didn't explain so far is why we didn't add any axis and labels to the chart. Well, despite the idea that we can exploit native rescaling of the browser, we need to face the limitations of this technique. Relative values are only allowed in standard attributes, such as width, height, x, y, cx, cy—but not in SVG paths or transform functions. Conclusion about using Relative Lengths While this sounds like an excellent solution (and indeed it is wonderful for certain use cases), it has two major drawbacks: The percentage values are not accepted for the SVG transform attributes and for the d attribute in the path elements, only in standard attributes Due to the fact that the browser is recomputing all the values automatically, we cannot adapt the resolution of the data of the charts The first point is the biggest drawback, which means we can only position elements using the standard attributes called width, height, x, y, cx, cy, and more. However, we can still draw a bar chart that seamlessly adapts according to the parent elements without the use of JavaScript event listeners. The second argument doesn't play a huge role anymore compared to the first one, and it can be circumvented using additional JavaScript event listeners, but I am sure you get the point. Using the JavaScript Resize event The last option is to use JavaScript event handlers and redraw the chart manually when the dimensions of the parent container change. Using this technique, we can always measure the width of the parent container (in absolute units) and use this length to update and redraw the chart accordingly. This gives us great flexibility over the data resolution, and we can adapt the chart to a different aspect ratio when needed as well. The Native Resize event Theoretically, this solution sounds brilliant. We simply watch the parent container or even the SVG container itself (if it uses a width of 100%) for the resize events, and then redraw the chart when the dimensions of the element change. However, there does not exist a native resize event on the div or svg element; modern browsers only support the resize events on the window element. Hence, it triggers only if the dimensions of the browser window changes. This means also that we need to clean up listeners once we remove a chart from the page. Although this is a limitation, in most cases, we can still use the windowresize event to adapt the chart to its parent container; we have to just keep this in our mind. 
Let's always use the parent container's absolute dimensions for drawing and redrawing the chart; we need to define the following things inside a redraw function: var width = chart.clientWidth; var height = width / aspectRatio; Now, we can add a resize event listener to the window element and call the redraw function whenever the window dimensions change: window.addEventListener('resize', function(event){ redraw(); }); The benefit of this solution is that we can do everything that we want in the redraw function, for example, modifying the aspect ratio, adapting the labels of the axis, or modifying the number of displayed elements. The following figure shows a resized version of the previous chart; we observe that this time, the axis ticks adapt nicely and don't overlap anymore. Moreover, the axis ticks now take the full available space: Resized chart with adapted axis ticks Adapting the Resolution of the Data However, there is another problem that can be nicely solved using these types of manual redraws—the problem of data resolution. How much data should be displayed in a small chart and how much in a bigger chart? Small chart with high data resolution I think you agree that in the preceding figure, we display too much data for the size of the graphic. This is bad and makes the chart useless. Moreover, we should really adapt the resolution of the data according to the small viewport in the redrawing process. Let's implement a function that returns only ever i-th element of an array: function adaptResolution(data, resolution) { resolution = resolution ? Math.ceil(resolution) : 1; return data.filter(function(d, i) { return i % resolution === 0; }); } Great, let's define a width depended data resolution and filter the data accordingly: var pixelsPerData = 20; var resolution = pixelsPerData * (flatData.length) / width; In the previous code, we observed that we can now define the minimum amount of pixel that one data point should have and remove the amount of values accordingly by calling the following: var flatDataRes = adaptResolution(flatData, resolution); The following image shows a small chart with a low number of values, which is perfectly readable even though it is very small: Small chart with a proper data resolution In the next figure, we can see the same chart based on the same data drawn with a bigger container. We immediately observe that also the data resolutions adapts accordingly; and again, the chart looks nice: Big chart with a proper data resolution Conclusion of using Resize events This is the most flexible solution, and therefore, in many situations, it is the solution of choice. However, you need to be aware that there are also drawbacks in using this solution: There is no easy way to listen for resize events of the parent container We need to add event listeners We need to make sure that event listeners are removed properly We need to manually redraw the chart Using Bootstrap's Media Queries Bootstrap is an awesome library that gets you started quickly with new projects. It not just includes a huge amount of useful HTML components but also normalized amd standardized CSS styles. One particular style is the implementation of Media Queries for four typical device types (five types in Bootstrap 4). In this section, we will take a look at how to make use of these Media Queries in our styles and scripts. 
The great thing about Bootstrap is that it successfully standardizes typical device dimensions for web developers thus, beginners can simply use them without rethinking over and over which pixel width could be the most common one for tablets. Media Queries in CSS The quickest way to use Bootstrap's Media Queries is to simply copy them from the compiled source code. The queries are here: /* Extra small devices (phones, etc. less than 768px) */ /* No media query since this is the default in Bootstrap */ /* Small devices (tablets, etc.) */ @media (min-width: 768px) { ... } /* Medium devices (desktops, 992px and up) */ @media (min-width: 992px) { ... } /* Large devices (large desktops, 1200px and up) */ @media (min-width: 1200px) { ... } We can easily add these queries to our CSS styles and define certain properties and styles for our visualizations, such as four predefined widths, aspect ratios, spacing, and so on in order to adapt the chart appearance to the device type of the user. Bootstrap 4 is currently in alpha; however, I think you can already start using the predefined device types in your CSS. The reason I am strongly arguing for Bootstrap 4 is because of its shift towards the em units instead of pixels: // Extra small devices (portrait phones, etc.) // No media query since this is the default in Bootstrap // Small devices (landscape phones, etc.) @media (min-width: 34em) { ... } // Medium devices (tablets, etc.) @media (min-width: 48em) { ... } // Large devices (desktops, etc.) @media (min-width: 62em) { ... } // Extra large devices (large desktops, etc.) @media (min-width: 75em) { ... } Once again, the huge benefit of this is that the layout can adapt when the user increases the font size of the browser, for example, to enhance readability. Media Queries in LESS/SASS In Bootstrap 3, you can include Media Query mixins to your LESS file, which then gets compiled to plain CSS. To use these mixins, you have to create a LESS file instead of CSS and import the Bootstrap variables.less file. In this file, Bootstrap defines all its dimensions, colors, and other variables. Let's create a style.less file and import variables.less: // style.less @import "bower_components/bootstrap/less/variables.less"; Perfect, that's all. Now, we can go ahead and start using Bootstrap's device types in our LESS file. /* Extra small devices (phones, etc. less than 768px) */ /* No media query since this is the default in Bootstrap */ /* Small devices (tablets, etc.) */ @media (min-width: @screen-sm-min) { ... } /* Medium devices (desktops, etc.) */ @media (min-width: @screen-md-min) { ... } /* Large devices (large desktops, etc.) */ @media (min-width: @screen-lg-min) { ... } Finally, we need to use a LESS compiler to transform our style.less file to plain CSS. To achieve this, we run the following command from the terminal: lessc styles.less styles.css As we can see, the command requires the LESS compiler called lessc being installed. If it's not yet installed on your system, go ahead and install it using the following command: npm install -g less If you are new to LESS, I recommend you to read through the LESS documentation on http://lesscss.org/. Once you check out of LESS, you can also look at the very similar SASS format, which is favored by Bootstrap 4. You can find the SASS documentation at http://sass-lang.com/. We can use the Bootstrap 4 Media Queries in a SASS file by the following mixins: @include media-breakpoint-up(xs) { ... } @include media-breakpoint-up(sm) { ... 
} @include media-breakpoint-up(md) { ... } @include media-breakpoint-up(lg) { ... } @include media-breakpoint-up(xl) { ... } In my opinion, including Bootstrap's LESS/SASS mixins to the styles of your visualization is the cleanest solution because you always compile your CSS from the latest Bootstrap source, and you don't have to copy CSS into your project. Media Queries in JavaScript Another great possibility of using Bootstrap's Media Queries to adapt your visualization to the user's device is to use them directly in JavaScript. The native window.matchMedia (mediaQuery) function gives you the same control over your JavaScript as Media Queries gives us over CSS. Here is a little example on how to use it: if (window.matchMedia("(min-width: 1200px)").matches) { /* the viewport is at least 1200 pixels wide */ } else { /* the viewport is less than 1200 pixels wide */ } In the preceding code, we see that this function is quite easy to use and adds almost infinite customization possibilities to our visualization. More information about the matchMedia function can be found on the Mozilla Website https://developer.mozilla.org/de/docs/Web/API/Window/matchMedia. However, apart from using the watchMedia function directly, we could also use a wrapper around the native API call. I can really recommend the enquire.js library by Nick Williams, which allows you to declare event listeners for viewport changes. It can be installed via the package manager bower by running the following command from the terminal: bower install enquire Then, we need to add enquire.js to the website and use in the following snippet: enquire.register("screen and (min-width:1200px)", { // triggers when the media query matches. match : function() { /* the viewport is at least 1200 pixels wide */ }, // optional; triggers when the media query transitions unmatch : function() { /* the viewport is less than 1200 pixels wide */ }, }); In the preceding code, we see that we can now can add the match and unmatch listeners almost in the same way as listening for resize events—just much more flexible. More information about require.js can be found on the GitHub page of the project at https://github.com/WickyNilliams/enquire.js. If we would like to use the Bootstrap device types, we could easily implement them (as needed) with enquire.js and trigger events for each device type. However, I prefer being very flexible and using the bare wrapper. Using Bootstrap's Grid System Another great and quick way of making your charts responsive and play nicely together with Bootstrap is to integrate them into Bootstrap's gird system. It is the best and cleanest integration however, is to separate concerns—and make the visualization as general and adaptive as possible. Let's take our bar chart example with the custom resize events and integrate it into a simple grid layout. 
As usual, you can find the full source code of the example in the code examples: <div class="container"> <div class="row"> <div class="col-md-8"> <div class="chart" data-url="…" …> </div> </div> <div class="col-md-4"> <h2>My Dashboard</h2> <p>This is a simple dashboard</p> </div> </div> <div class="row"> <div class="col-md-4"> <div class="chart" data-url="…" …> </div> </div> <div class="col-md-4"> <div class="chart" data-url="…" …> </div> </div> <div class="col-md-4"> <div class="chart" data-url="…" …> </div> </div> </div> <div class="row"> <div class="col-md-6"> <div class="chart" data-url="…" …> </div> </div> <div class="col-md-6"> <div class="chart" data-url="…" …> </div> </div> </div> We observe that by making use of the parent containers' width, we can simply add the charts as the div elements in the columns of the grid. This is the preferred integration where two components play together nicely but are not depended on each other. In the following figure, we can see a screenshot from the simple dashboard that we just built. We observe that the visualizations already fit nicely into our grid layout, which makes it easy to compose them together: A simple dashboard using Bootstrap's grid layout Summary In this article, you learned the essentials about absolute and relative units to define lengths in a browser. We remember that the em and rem unit plays an important role because it allows a layout to adapt when a user increases the font size of the web site. Then, you learned about how to use relative units and JavaScript resize events to adapt the chart size and the data resolution according to the current container size. We looked into Media Queries in CSS, LESS, und JavaScript. Finally, we saw how to integrate charts with Bootstrap's grid system and implemented a simple Google Analytics-like dashboard with multiple charts.

Breaking the Bank

Packt
01 Mar 2016
32 min read
In this article by Jon Jenkins, author of the book Learning Xero, covers the Xero core bank functionalities, including one of the most innovative tools of our time: automated bank feeds. We will walk through how to set up the different types of bank account you may have and the most efficient way to reconcile your bank accounts. If they don't reconcile, you will be shown how you can spot and correct any errors. Automated bank feeds have revolutionized the way in which a bank reconciliation is carried out and the speed at which you can do it. Thanks to Xero, there is no longer an excuse to not keep on top of your bank accounts and therefore maintain accurate and up-to-date information for your business's key decision makers. These are the topics we'll be covering in this article: Setting up a bank feed Using rules to speed up the process Completing a bank reconciliation Dealing with common bank reconciliation errors (For more resources related to this topic, see here.) Bank overview Reconciling bank accounts has never been as easy or as quick, and we are only just at the beginning of this journey as Xero continues to push the envelope. Xero is working with banks to not only bring bank data into your accounting system but to also push it back again. That's right; you could mark a supplier invoice paid in Xero and it could actually send the payment from your bank account. Dashboard When you log in to Xero, you are presented with the dashboard, which gives a great overview of what is going on within the business. It is also an excellent place to navigate to the main parts of Xero that you will need. If you have several bank accounts, the summary pane that shows the cash in and out during a month, as shown below, is very useful as you can hover over the bar chart to get a quick snapshot of the total cash in and out for the month with no effort at all. If you want the chart to sit at the top of your dashboard, click on Edit Dashboard at the bottom of the page, and drag and drop the chart. When finished, click on Done at the bottom of the page to lock them in place. Reconciling bank accounts is fundamental to good bookkeeping, and only once the accounts have been reconciled do you know that your records are up-to-date. It isn't worth spending lots of time looking at reports if the bank accounts haven't been reconciled, as there may be important items missing from your records. By default, all bank accounts added will be shown on your dashboard, which shows the main account information, the number of unreconciled items, and the balance per Xero and per statement. You may wish to just see a few key bank accounts, in which case you can turn some of them off by going to Accounts | Bank Accounts, where you will see a list of all bank accounts. Here, you can choose to remove the bank accounts showing on the dashboard by unchecking the Show account on Dashboard option. You can also choose the Change order option, which allows you to decide the order in which you see the bank accounts on the dashboard. Click on the up and down arrows to move the accounts as required. Bank feeds If you did not set up a bank account when you were setting up Xero, then we recommend you do that now, as you cannot set up a bank feed without one. You can do this from the dashboard by clicking on the Add Bank Account button or by navigating to Accounts | Bank Accounts and Add Bank Account. Then, you are presented with of list of options including Bank Account, Credit Card, or PayPal. 
Enter your account details as requested and click on Save. It is very important at this stage to note that you may be presented with several options for your bank as they offer different accounts. If you choose the wrong one, your feed will not work. Some banks charge for a direct bank feed; you do not have to adhere to this, so ignore the feeds ending with Direct Bank Feed and select the alternative one. The difference between a Direct Bank Feed and the Yodlee service that Xero uses is that the data comes directly from the bank and not via a third party, so is deemed to be more reliable. Now that you have a bank account, you can add the feed by clicking on the Get bank feeds button, as shown in the following screenshot: On the Activate Bank Feed page, the fields will vary depending on the bank you use and the type of account you selected earlier. Enter the User Credentials as requested. You will then see a screen with the text Connecting to Bank, which states it might take a few minutes, so please bear with it; you are almost there. When prompted, select the bank account from the dropdown called Select the matching bank feed... that matches the account you are adding and choose whether you wish to import from a certain date or collect as much data as possible. How far it goes back varies by bank. If you are converting from another system, it would be wise to use the conversion date as the date from which you wish to start transactions, in order to avoid bringing in transactions processed in your old system (that is if your conversion date was May 31, 2015, you would use June 1, 2015. Once you are happy with the information provided, click on OK. If you have several accounts, such as a savings account, then simply follow the process again for each account. Refresh feed Each bank feed is different, and some will automatically update; others, however, require refreshing. You can usually tell which bank accounts require a manual refresh, as they will show the following message at the bottom of the bank account tile on the dashboard: To refresh the feed from the dashboard, find the account to update and click the Manage button in the top right-hand corner and then Refresh Bank Feed. You can also do this from within Accounts | Bank Accounts | Manage Accounts | Refresh Bank Feed. To update your bank feed, you will need to refresh the feed each time you want to reconcile the bank account. Import statements You get over most disappointments in life, unlike when you find out that the bank account you have does not have a bank feed available. Your options here are simple: go and change banks. But if that is too much hassle for you, then you could always just import a file. Xero accepts OFX, QIF, and CSV. Should your bank offer a selection, go with them in this order. OFX and QIF files are formatted and should import without too many problems. CSV, on the other hand, is a different matter. Each bank CSV download will come in a different format and will need some work before it is ready for importing. This takes some time, so I would recommend using the Xero Help Guide and searching Import a Bank Statement to get the file formatted correctly. If you only do things on a monthly basis, uploading a statement is not too much of a chore. We would say at this point that the automated bank feed is one of the most revolutionary things to come out of accounting software, so not using it is a crime. You simply are not enjoying the full benefits of using Xero and cloud software without it. 
Petty cash It probably costs the business more to find missing petty cash discrepancies than the discrepancy totals. Our advice is simple: try not to maintain a petty cash account if you can; it is just one more thing to reconcile and manage. We would advocate using a company debit card where possible, as the transactions will then go through your main bank account and you will know what is missing. Get staff to submit an expense claim, and if that is too much hassle, treat payments in and out as if they have gone through the director's loan account, as that is what happens in most small businesses. Should you wish to operate a petty cash account, you will need to mark the transactions as reconciled manually, as there is no feed to match bank payments and receipts against. In order to do this, you must first turn on the ability to mark transactions as reconciled manually. This can be found hiding in the Help section. When you click on Help, you should then see an option to Enable Mark as Reconciled, which you will need to click to turn on this functionality. Now that you have the ability to Mark as Reconciled, you can begin to reconcile the petty cash. Go to Manage | Reconcile Account. You will be presented with the four tabs below (if Cash Coding is not turned on for your user role, you will not see that tab). The Reconcile tab should be blank, as there is no feed or import in place. You will want to go to Account transactions, which is where the items you have marked as paid from petty cash will live. Underneath this section, you will also find a button called New Transactions, where you can create transactions on the fly that you may have missed or are not going to raise a sales invoice or supplier bill for. You can see from the following screenshot that we have added an example of a Spend Money transaction, but you can also create a Receive Money transaction when clicking on the New Transaction button. Click on Save when you have finished entering the details of your transaction. If you have outstanding sales invoices or purchase invoices that have been paid through the petty cash account, then you will need to mark them as paid using the petty cash bank account in order for them to appear in the Account Transactions section. To do this, navigate to those items and then complete the Make a Payment or Receive a Payment section in the bottom left-hand corner. From the main Account transactions screen, you can then mark your transactions as reconciled. Check off the items you wish to reconcile using the checkboxes on the left, then click on More | Mark as Reconciled. When you have completed the process, your bank account balance in Xero should match that of your petty cash sheet. The status of transactions marked as Reconciled Manually will change to Reconciled in black. When a transaction is reconciled, it has come in from either an import or a bank feed, and it will be green. If it is unreconciled, it will be orange. Loan account A loan account works in the same way as a bank account, and we would recommend that you set it up if a feed is available from your bank. Managing loans in Xero is easy, as you can set up a bank rule for the interest and use the Transfer facility to reconcile any loan repayments. Credit cards Like adding bank accounts, you can add credit cards from the dashboard by clicking on the Add Bank Account button or by navigating to Accounts |Bank Accounts |Add Bank Account | Credit Card. 
Add a card
You may see several options for the bank you use, so double-check you are using the right option or the feed will not work. If there is no feed set up for your particular bank or credit card account, you will be notified as follows: In this case, you will need to either import a statement in the OFX, QIF, or CSV format, or manually reconcile your credit card account. The process is the same as that detailed above for reconciling petty cash, and transactions will be matched against your credit card statement. If a feed is available, enter the login credentials requested to set up the feed in the same fashion as when adding a new bank account feed.

Common issues
Credit cards act in a different way from a bank account, in that each card is an account on its own, separate from the main account to which interest and balance payments are allocated. This means that even if you have just one business credit card, you will, in effect, have two accounts with the bank. You can add a feed for each account if you wish, but for the main account, the only transactions that will go through it are any interest accrued and balance payments to clear the card. We would suggest you set up the credit card account as a feed, as this is where you will see most transactions and therefore save the most time in processing. Each time interest is accrued, you will need to post it as a Spend Money transaction, and each time you make a payment, the amount will be a Transfer from one of your other accounts. Both these transactions will need to be marked as Reconciled Manually, as they will not appear on your credit card feed. This is done in the same way as outlined in the Petty cash section earlier.

PayPal
Just like a bank account, you can sync Xero with your PayPal account, even in multiple currencies. The ability to do this, coupled with using bank rules, can help supercharge your processing ability and cut down on posting errors.

Add a feed
There is a little bit of configuration required to set up a PayPal account. Go to Accounts | Bank Accounts | Add Bank Account | PayPal. Add the account name as you wish for it to appear in Xero and the currency for the account. To use the feed (why wouldn't you?), check the Set up automatic PayPal import option, which will then bring up the other options shown in the following screenshot: As previously suggested, if you are converting from another system, import transactions from the conversion date onwards, as all previous transactions should be dealt with in your old accounting system. Click on Save, and you will receive an e-mail from Xero to confirm your PayPal e-mail address. Click on the activation link in the e-mail. To complete the setup process, you need to update your PayPal settings in order for Xero to turn on the automatic feeds. In PayPal, go to My Account | Profile | My Selling Tools. Next to the API access, click on Update | Option 1. The box should then read Grant API Permission. In the Third Party Permission field, enter paypal_api1.xero.com, then click on Lookup. Under Available Permissions, make sure you check the following options: Click on Add, and you have finished the setup process. If you have multiple currency accounts, then complete this process for each currency.

Bank rules
Bank rules give you the ability to automate the processing of recurring payments and receipts based on your chosen criteria, and they can be a massive time saver if you have many transactions to deal with. An example would be the processing of PayPal fees on your PayPal account.
Rather than having to process each one, you could set up a bank rule to deal with the transaction. Bank rules cannot be used for bank transfers or allocating supplier payment or customer receipts. Add bank rules You can add a bank rule directly from the bank statement line by clicking on Create rule above the bank line details. This means waiting for something to come through the bank first, which we think makes sense, as that detail is taken into consideration when setting up the bank rule, making it simpler to set up. You can also enter bank rules you know will need adding by going to the relevant bank account and clicking Manage Account | Bank Rules | Create Rule. We have broken the bank rule down into different sections. Section 1 allows you to set the conditions that must be present in order for the bank rule to trigger. Using equals means the payee or description in this example must match exactly. If you were to change it to contains, then only part of the description need be present. This can be very useful when the description contains a reference number that changes each month. You do not want the bank rule to fail, so you might choose to remove the reference number and change the condition to contains instead. You must set at least one condition. Section 2 allows you to set a contact, which we suggest you do; otherwise, you will have to do this on the Bank Account screen each time before being able to reconcile that bank statement line. Section 3 allows you to fix a value to an account code. This can be useful if the bank rule you are setting up contains an element of a fixed amount and variable amount. An example might be a telephone bill where the line rental is fixed and the balance is for call charges that will vary month by month. Section 4 allows you to allocate a percentage to an account code. If there is not a fixed value amount in section 3, you can just use section 4 and post 100% of the cost to the account code of your choice. Likewise, if you had a bank statement line that you wanted to split between account codes, then you could do so by entering a second line and using the percentage column. Section 5 allows you to set a reference to be used when the bank rule runs and there are five options. We would suggest not using the by me during bank rec option, as this again creates extra work, since you will have to fill it in each time before you can reconcile that bank statement line. Section 6 allows you to choose which bank account you want the bank rule to run on. This is useful if you start paying for an item out of a different bank account, as you can edit the rule and change the bank account rather than having to create the rule all over again. Section 7 allows you to set a title for the bank rule. Use something that will make it easy for you to remember when on the Bank Rules screen. Edit bank rules If your bank rules are not firing the way you expected or at all, then you will want to edit them to get them right. It is worth spending time in this area, as once you have mastered setting up bank rules, they will save you time. To edit a bank rule, you will need to navigate to Accounts | Bank Accounts | Manage Accounts | Bank Rules. Click on the bank rule you wish to edit, make the necessary adjustments, and then click on Save. You will know if the bank rule is working, as it appears like the following when reconciling your bank account. 
If you do not wish to apply the rule, you can click on Don't apply rule in the bottom-left corner, or if you wish to check what it is going to do, click on View details first to verify the bank rule is going to post where you prefer. Re-order bank rules The order in which your bank rules sit is the order in which Xero runs them. This is important to remember if you have bank rules set up that may conflict with each other and not return the result you were expecting. An example might be you purchasing different items from a supermarket, such as petrol and stationery. In this example, we will call it Xeroco. In most instances, the bank statement line will show two different descriptions or references, in this case Xeroco for the main store and Xeroco Fuel for the gas station. You will need to set up your rules carefully, as using only contains for Xeroco will mean that your postings could end up going to the wrong account code. You would want the Xeroco Fuel bank rule to sit above the Xeroco rule. Because they both contain the same description, if Xeroco was first, it would always trigger and everything would get posted to stationery, including the petrol. If you set Xeroco Fuel as the first bank rule to run if the bank statement line does not contain both words, it will continue and then run the Xeroco rule, which would prove successful. Gas will get posted to fuel and stationery will get posted to stationery. You can drag and drop bank rules to change the order in which they run. Hover over the two dots to the left of the bank rule number and you can drag them to the appropriate position. Bank reconciliation Bank reconciliation is one of the main drivers in knowing when your books and records are up-to-date. The introduction of automated bank feeds has revolutionized the way in which we can complete a bank reconciliation, which is the process of matching what has gone through the bank account and what has been posted in your accounting system. Below are some ways to utilize all the functionality in the Xero toolkit. Auto Suggest Xero is an intuitive system; it learns how you process transactions and is also able to make suggestions based on what you have posted. As shown below, Xero has found a match for the bank statement line on the left, which is why the right-hand panel is now green and you can see the OK button to reconcile the transaction, provided you are happy it is the correct selection. You can choose to turn Auto Suggest off. At the bottom of each page in the bank screen, you will find a checkbox, as shown in the following screenshot. Simply uncheck the Suggest previous entries box to turn it off. The more you use Xero, the better it learns, so we would advise sticking with it. It is not a substitute for checking, however, so please check before hitting the OK button. Find & Match When Xero cannot find a match and you know the bank statement line in question probably has an associated invoice or bill in the system, you can use Find & Match in the upper right-hand corner, as shown in the following screenshot: You can look through the list of unreconciled bank transactions shown in the panel or you can opt to use the Search facility and search by name, reference, or amount. In this example, you can see that we have now found two transactions from SMART Agency that total the £4,500 spent. As you can see in the following screenshot, there is also an option next to the monetary amounts that will allow you to split the transaction. 
This is useful if someone has not paid the invoice in full. If the amount received was only £2,500 in total, for example, you could use Split to allocate £1,000 against the first transaction and £1,500 against the second transaction. When checked off, these turn green, and you can click on OK to reconcile that bank statement line. If you cannot find a match, you will need to investigate what it relates to and if you are missing some paperwork. If we had been clever when making the original supplier payment, we could have used the Batch Payment option in Accounts |Purchases |Awaiting Payment, checking off the items that make up the amount paid, and Batch Payment would have enabled us to tell Xero that there was a payment made totaling £4,500. Auto Suggest would have picked this up, making the reconciliation easier and quicker. You have already done the hard bit by working out how much to pay suppliers; you don't want to have to do it again when an amount comes through the bank and you can't remember what it was for. It is also good practice, as it means you will not inadvertently pay the same supplier again since the bill will be marked as paid. The same can be done for customer invoices using the Deposit button. This is very helpful when receiving check deposits or remittance advice well in advance of the actual receipt. By marking the invoices as paid, you will not chase customers for money, unnecessarily causing bad feelings along the way. Create There will be occasions when you will not have a bill or invoice in Xero from which to reconcile the bank statement line. In these situations, you will need to create a transaction to clear the bank statement line. In the example below, you can see that we have entered who the contact is, what the cost relates to, and added why we spent the money. Xero will now allow us to clear the bank statement line, as the OK button is visible. We would suggest that this option be used sparingly, as you should have paperwork posted into Xero in the form of a bill or invoice to deal with the majority of your bank statement lines. Bank transfers When you receive money in or transfer money out to another bank account, you have set up within Xero a very simple way to deal with those transactions. Click on the Transfer tab and choose the bank account from the dropdown. You will then be able to reconcile that bank statement line. In the account that you have made the transfer to, you will find that Xero will make the auto suggest for you when you reconcile that bank account. Discuss and comments When you are performing the bank reconciliation, you may find that you get stuck and you genuinely do not know what to do with it. This is where the Discuss tab, shown in the following screenshot, can help: You can simply enter a note to yourself for someone else in the business or for your advisor to take a look at. Don't forget to click on Save when you are done. If someone can answer your query, they can then enter their comment in the Discuss tab and save it. Note that at present there is no notification process when you save a comment in the Discuss tab, so you are reliant on someone regularly checking it. You will see something similar to the note underneath the business name when you log in to Xero, so you can see there is a comment that needs action. Reconciliation Report This is the major tool in your armory to check whether your bank reconciles. 
There is no greater feeling in life than your bank account reconciling and there being no unpresented items left hanging around. To run the report from within a bank account, click on the Reconciliation Report button next to the Manage Account button. From here, you can choose which bank account you wish to run the report from, so you do not need to keep moving between the accounts, and also a date option as you will probably want to run the report to various dates, especially if you encounter a problem. On the reconciliation report, you will see the balance in Xero, which is the cashbook balance (that is what would be in the bank if all the outstanding payments and receipts posted in Xero cleared and all the bank statement lines that have come from the bank feed were processed). The outstanding payments and receipts are invoices and bills you have marked as paid in Xero but have not been matched to a bank statement line yet. You need to keep an eye on these, as older unreconciled items would normally indicate a misallocation or uncleared item. Plus Un-Reconciled Bank Statement Lines are those items that have come through on a feed but have not yet been allocated to something in Xero. This might mean that there are missing bills or sales invoices in Xero, for example. Statement Balance is the number that should match what is on your online or paper statement, whether it is a bank account, credit card, or PayPal account. If the figures do not match, then it will need investigating. In the next section, we have highlighted some of the things that may have caused the imbalance and some ideas of what to do to rectify the situation. Manual reconciliation If you are unable to set up a bank feed or import a statement, then you can still reconcile bank accounts in Xero; it just feels a bit like going back in time. To complete a manual reconciliation, you will need to follow the same process as used to process petty cash, as discussed earlier in this article. Common errors and corrections Despite all the innovation and technological advances Xero has made in this area, there are still things that can go wrong—some human, some machine. The main thing is to recognize this and know how to deal with it in the event that it happens. We have highlighted some of the more common issues and resolutions in the following subsections. Account not reconciling There is no greater feeling than when you get that big green checkmark telling you that you have reconciled all your transactions. Fantastic job done, you think! But not quite. You need to check your bank statement, credit card statement, loan statement, or PayPal statement to make sure it definitely matches as per the preceding bank reconciliation report section. The job's not done until you have completed the manual check. Duplicated statement lines With all things technology, there is a chance that things can go wrong, and every now and again, you may find that your bank feed has duplicated a line item. This is why it is so important to check the actual statement against that in Xero. It is the only way to truly know if the accounts reconcile. A direct bank feed that costs money is deemed to be more robust, and some feeds through Yodlee are better than others. It all depends on your bank, so it is worth checking with the Xero community to get some guidance from fellow Xeroes. 
If you are convinced that you have a duplicated bank statement line, you can choose to delete it by finding the offending item in your account and clicking on the cross in the top left-hand corner of the bank statement line. When you hover over the cross, the item will turn red. Use this sparingly and only when you know you have a duplicated line.

Missing statement lines
As with duplicated statement lines, there is also the possibility of a bank statement line not being synced, and this can be picked up when checking that the Xero bank account figure matches that of your online or paper bank statement. If they do not match, then we recommend using the reconciliation report and working backwards month by month and then week by week to try and isolate the date at which the bank last reconciled. Once you have narrowed it down to a week, you can then work day by day until you find the date, and then check off the items in Xero against those on your bank statement until you find the missing items. If there are several missing items, we would probably suggest doing an import via OFX, QIF, or CSV, but if there are only a few, then it would probably be best to enter them manually and then mark them as reconciled so that the bank will reconcile.

Remove & Redo
We know you are great at what you do, but everyone has an off day. If you have allocated something incorrectly, you can easily amend it. Auto Suggest is fantastic, but you may get carried away and just keep hitting the OK button without paying enough attention. This is particularly problematic for businesses dealing with lots of invoices for similar amounts. If you do find that you have made an error, then you can remove the original allocation and redo it. You can do this by going to Accounts | Bank Accounts | Manage Account | Reconcile Account | Account Transactions. As you can see in the following screenshot, once you have found the offending item, you can check it off and then click on Remove & Redo. You will also find a Search button on the right-hand side, which will allow you to search for particular transactions rather than having to scroll through endless pages. If you happen to be in the actual bank transaction when you identify a problem, then you can click on Options | Remove & Redo. This will then push the transaction back into the bank account screen to reconcile again. Note that if you Remove & Redo a manually entered bank transaction, it will not reappear in the bank account for you to reconcile, as it was never there in the first place. What you will need to do is post the payment or receipt against the correct invoice or bill, and then manually mark it as reconciled again or create another spend or receive money transaction.

Manually marked as reconciled
A telltale sign that someone has inadvertently caused a problem is when you look at the Account Transactions tab in the bank account and there is a sea of green and reconciled statuses, and then you spot the odd black reconciled status. This is an indication that something has been posted to that account and marked as reconciled manually. This will need investigating, as it may be genuine, such as some missing bank statement lines, or it could be that someone has made a mistake and it needs to be removed.

Understanding reports
If the bank account does not reconcile and it is not something obvious, then we would suggest looking at the bank statements imported into Xero to see if there are any obvious problems.
Go to Accounts | Bank Accounts | Manage Account | Bank Statements. From this screen, have a look to see if there is any overlap of dates imported and then drill into the statements to check for anything that doesn't look right. If you do come across duplicated lines, you can remove them by checking off the box on the left and then clicking on the Delete button. You can see below that the bank statement line has been grayed out and has a status of Deleted next to it. If you later discover that you have made a mistake, then you can restore the bank statement line by checking off the box again but clicking on Restore this time. If you have incorrectly imported a duplicate statement or the bank feed has done so rather than deleting the transactions, you can choose to delete the entire statement. This can be achieved by clicking on the Delete Entire Statement button at the bottom-left of the Bank Statements screen: Make sure you have checked the Also delete reconciled transactions for this statement option before clicking Delete. If you are deleting the statement because it is incorrect, it only makes sense that you also clear any transactions associated with this statement to avoid further issues. Summary We have successfully added bank feeds that are now ready for automating the bank reconciliation process. In this article, we ran through the major bank functions and set up your bank feeds, exploring how to set up bank rules to make the bank reconciliation task even easier and quicker. On top of that, we also explored the possibilities of what could go wrong, but more importantly, how to identify errors and put them right. One of the biggest bookkeeping tasks you will undertake should now seem a lot easier. Resources for Article:   Further resources on this subject: Probability of R? [article] Dealing with a Mess [article] Essbase ASO (Aggregate Storage Option) [article]
Analyzing Transport Layer Protocols

Packt
01 Mar 2016
13 min read
In this article by Charit Mishra, the author of Mastering Wireshark, we will help the reader understand the following topics: checking for different analysis flags in Wireshark, understanding UDP traffic, and practice questions. (For more resources related to this topic, see here.)

How to check for different analysis flags in Wireshark
The analysis of the flags present in TCP packets is quite simple. While using Wireshark, there is an individual section available in the details pane for every TCP packet. Let's take a TCP packet and see how the flags are presented in the details pane, and then we will try to create a display filter corresponding to the same. Refer to the following screenshot: Now we will see what each pointer mentions:
1: The SYN packet sent from the client to the server to initiate the three-way handshake can be seen in the list pane.
2: The flags related to the same packet are set, and the hex equivalent of the flag bits 000000000010 is 0x002.
3: For the corresponding TCP packet, the SYN flag bit is set to 1. The same can be seen in the details pane. The rest of them are still 0.
Now, if you wish to create a display filter to see only the SYN packets you have in the trace file, apply the filter shown in the following screenshot. As a result, you will see only the SYN packets present in your trace file, as illustrated in the following screenshot: Let's try to create one more filter to view only the SYN, ACK packets in the list pane. Perform the following steps to create the same: Open your trace file. Choose any TCP SYN, ACK packet. Note the corresponding hex equivalent value for the SYN, ACK flags set. Create your filter using the hex equivalent you have. Your filter must look something like the following screenshot.

User Datagram Protocol
As defined in RFC 768, User Datagram Protocol (UDP) is a connectionless protocol, which is great for transmitting real-time data between hosts. It is often termed an unreliable form of communication, because UDP doesn't care about the delivery of packets, and any lost packets are not recovered, since the sender is never informed about packets dropped or discarded during transmission. Even so, many protocols such as DHCP, DNS, TFTP, and SIP rely on it. The protocols that use UDP as a transport mechanism have to rely upon other techniques to ensure data delivery and error checking; such features are built into the protocols themselves and can provide some level of reliability during transmission. A point not to forget is that UDP provides faster transmission of packets, as it is not concerned with connection initiation or graceful termination as seen in TCP. That's why UDP is also referred to as a transaction-oriented protocol, not a message-oriented protocol like TCP.

UDP header
The size of a usual UDP header is 8 bytes; the datagram can theoretically be 65535 bytes long (practically, 65507 bytes of data). The UDP header is quite small compared to the TCP header. It has just four fields, namely Source Port, Destination Port, Packet Length, and Checksum, as follows:
Source Port: The port number used by the sending side to receive any replies if needed. Most of the time in TCP and UDP, the port number chosen for the sending side of the socket is ephemeral, while the port number on the other side of the communication usually falls in the category of well-known port numbers.
Destination Port: This field of the header identifies the port number used by the server or the receiving side, and all the data will be transmitted to this port. Well-known port numbers are assigned to particular services by IANA and are permanently associated with those services. For example, port 53 is for DNS and should not be assigned to any other service (it is not advisable).
Packet Length: This field specifies the length of the packet starting from the header to the end of the data; the minimum length you will observe is 8 bytes, which is the length of the UDP header alone.
Checksum: As discussed earlier, a checksum is computed over the contents of the packet to ensure data integrity, that is, that what is sent from the sender side is the same as what the receiver got. A couple of checksum algorithms come to the rescue to verify this. Sometimes, while working with UDP, you will see a checksum value of 0 in a received packet, which means that the checksum is not required to be validated.

How it works
To understand the way UDP works, let's go ahead and analyze some of the protocols that use UDP as a delivery protocol. First, I would like to discuss DHCP, and then we will see DNS traffic as well. For the purpose of this analysis, I have a default gateway configured at 192.168.1.1 and a client at 192.168.1.106. Using the client, I will try to generate DHCP and DNS traffic, which will be captured by Wireshark, and then I will try to dissect each protocol's communication process as well as the different components utilized during the whole session.

DHCP
Dynamic Host Configuration Protocol (DHCP) is the common and important protocol that assigns IP addresses to devices and enables them to communicate on the network. Now, from the client, I will try to release the IP address that the client already holds, which will generate a DHCP packet that will be captured by the sniffer we have in place. Look at the following screenshot: In the list pane, we can see a DHCP release packet, which was generated implicitly by the client in order to release the current IP address (I used the dhclient -v -r command on the Linux terminal to release the IP address; be careful while using this command, as it may disconnect your machine from the network, hence making it incapable of network communication). The client at IP 192.168.1.106 initiated the request to the server at 192.168.1.1. The port numbers used by the client and the server in the case of DHCP are fixed; they will be the same in your case too, unless manually configured otherwise. The DHCP server port number is 67 and the DHCP client port number is 68 by default. You can see the same in the preceding screenshot (highlighted as 3). The fourth field I have highlighted is the packet length field, which specifies the length of the packet starting from the first byte until the end of the data in the packet. Out of the 308 bytes, 8 bytes is the UDP header length and the remaining 300 bytes are the application data appended. Interestingly, if a machine is power cycled, it will ask the DHCP server to allocate an IP address, which will generate a couple of packets relating to DHCP request, release, offer, and various others, all of which also use UDP as the transport mechanism. I filtered the packets listed to show only the DHCP packets using the filter udp.port==67. As a result, only the DHCP packets will be listed in the list pane.
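If you would like to generate some simple UDP traffic of your own to capture alongside the DHCP example, a few lines of Node.js are enough. The following sketch is not from the book; it uses Node's built-in dgram module, and the port number (9999) and message text are arbitrary choices for this demonstration. Run it, capture on your loopback interface (if your platform allows it), and filter on udp.port == 9999 to see a single datagram go by with no handshake and no acknowledgement around it:

// udp-ping.js: a tiny UDP sender/receiver to generate traffic for capture
const dgram = require('dgram');

const PORT = 9999; // arbitrary port chosen for this demo

// A "server" that just logs whatever datagram arrives and then closes.
const server = dgram.createSocket('udp4');
server.on('message', (msg, rinfo) => {
  console.log(`Received "${msg}" from ${rinfo.address}:${rinfo.port}`);
  server.close();
});

server.bind(PORT, () => {
  // Once bound, fire a single datagram at it. There is no handshake and
  // no delivery guarantee: if the packet were lost, nobody would be told.
  const client = dgram.createSocket('udp4');
  client.send(Buffer.from('hello over UDP'), PORT, '127.0.0.1', (err) => {
    if (err) { console.error(err); }
    client.close();
  });
});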
TFTP
Trivial File Transfer Protocol (TFTP) is a lightweight version of FTP that is used for transferring files between hosts. Unlike the FTP protocol, TFTP does not ask users for any sort of credentials. TFTP uses UDP as a transport mechanism. Most commonly, TFTP is used in LAN environments and when dealing with manageable devices such as switches and routers. Network administrators use TFTP servers to back up configuration files and to update the firmware running on those devices. TFTP is also used by security professionals to transfer files from their system to yours in order to escalate privileges (gain more rights on a compromised system). I have a TFTP server running at 192.168.1.106 and a client at 192.168.1.104. There is a text file, abc.txt, that I've created on the server, which the client will try to download, and our sniffer in place will capture the traffic generated. The traffic generated by the transaction between the two hosts was successfully captured, and the packets corresponding to it are shown in the following screenshot: Now, let's see what each pointer mentions:
1: The transfer was initiated as soon as the client requested the file abc.txt. The request frame can be seen in the list pane.
2: As discussed, TFTP uses UDP as the transport mechanism. The related details for the request are shown in the details pane, which states that the request was initiated from an ephemeral port number on the client, destined to port 69 on the server (69 is the well-known port for the TFTP protocol).
3: The request was specific to the file abc.txt, which is also shown in the details pane in the TFTP protocol section.
You must be wondering about the acknowledgement packets that are shared between the two hosts. As we studied, UDP is an unreliable form of communication, so why are we seeing some sort of ACKs in UDP? The reason is that the TFTP server I am using has a reliability feature built into it. Even on the client side, over the standard console, after initiating the request I received quite interactive messages from the server, such as "the file of size 3 bytes has been transferred successfully", and various other details were listed along with the message. An interesting thing to know here is that port 69 was involved only in the first packet; the rest of the packets were sent and received by the acknowledging feature the server is embedded with. So, the statement that some protocols using UDP as a transport protocol have an inbuilt feature for ensuring delivery is true, and we have just witnessed it.

Unusual UDP traffic
Suppose the resource we are looking for is not available on the server; what will the traffic look like? Refer to the following screenshot: As seen in the preceding screenshot, the client requested an invalid resource that the server wasn't able to locate, so the server returned an error code and the summary message File not found. The same message was shown over the standard console to the client. Sometimes, it is also possible that the server daemon is not running and the client requests a certain resource. In such cases, the client would receive ICMP destination unreachable with error code 3.
Refer to the following screenshot for the same: Now we will see what each pointer mentions:
1: The server returned an ICMP destination unreachable message because the TFTP server daemon is not functional.
2: The client received an ICMP message of type 3 with error code 3.
3: The details regarding the request are mentioned in the reply under the UDP protocol section, which states that the request was sent to port 69, which is currently nonfunctional.
4: The requested resource is shown under the TFTP protocol section.
Unusual DNS requests are also seen very often when a client initiates a request to look for the name servers associated with an address. It would look similar to the following screenshot: Now, we will see what each pointer mentions:
1: As seen in the list pane, the client at 192.168.1.106 initiated a request to look up the address 8.0.0.0 and received a response in frame number 2.
2: The request was sent to the default gateway, which holds the DNS cache.
3: The gateway responded with a No such name error.
There can be multiple scenarios where you will see unusual traffic related to UDP. The most important one to look for is TFTP traffic, which might be generated by a TFTP client in your network. It may be malicious traffic that you would like to take note of.

Practice questions
List at least five differences between the TCP and UDP protocols.
Capture the three-way handshake and tear down packets using your own FTP server.
Explain the purpose of window scaling and checksum offloading, and their significance in terms of TCP communications.
In what way can a TCP-based communication recover from packet loss or unexpected termination? Imitate any scenario that can generate such traffic.
Create a display filter to show only the TCP FIN, ACK packets sent to your machine from your default gateway in the list pane.
What is the difference between the absolute and relative numbering systems used by Wireshark to keep track of packets?
What is the purpose of the options field at the end of the TCP header? What kind of arguments does it contain?
There is one more way to create filters for viewing packets with specific flags set. Without providing the hex equivalent, figure out what it is and how you can filter packets with the PSH flag set using the same technique.
Find out why the length of the data can practically be 65507 bytes when working with UDP.
What kind of packets will you see in the list pane if the server daemon for TFTP is not running?
Try performing a zone transfer on your locally configured DNS server and capture the traffic for analysis. What interesting facts did you notice about the packets? Explain in brief.

Summary
TCP is a reliable form of communication that uses processes like the three-way handshake and tear down to make the connection reliable and interactive. A lot of traffic is generated by the packets taking part in the creation of the session between the two hosts. The TCP header is 20 bytes long and consists of various fields such as source and destination port, SEQ and ACK numbers, offset, window size, flag bits, checksum, and options. The presence of the various flags and header fields lets the sender and receiver make sure of the delivery as well as the integrity of the data being sent. SEQ and ACK numbers are used by TCP-based communications to keep track of how much data has been sent between the hosts taking part.
Several popular protocols, such as HTTP and FTP, use TCP as the transport mechanism, and in the case of packet loss, retransmission of the lost frames takes place to ensure successful delivery. UDP is a connectionless protocol and a nonreliable means of communication over IP; lost and discarded packets are never recovered. UDP does, however, provide faster transmission and easier creation of sessions. The UDP header is 8 bytes long and has very few fields: source and destination port, packet length, and checksum. The application data is appended at the end. UDP is utilized by many protocols today; one of the most important uses of UDP you will see is streaming live video from the Internet. Common protocols such as DHCP, TFTP, DNS, RTP, and SIP use UDP as the transport mechanism, and these are some of the major services we deal with in our everyday life. To make the connection reliable, some of these protocols support their own acknowledgement features, which come built into them. Resources for Article: Further resources on this subject: Wireshark: Working with Packet Streams [article] Capturing Wireshark Packets [article] Wireshark [article]
(RxJS) Observable for Promise Users : Part 1

Charanjit Singh
29 Feb 2016
12 min read
I heard you like Promises! You should like them; they are awesome. I assume you are here because you have heard of something even more awesome and want to know more. In this small two-part post series, we'll explore RxJS Observable with hints from your existing knowledge of JavaScript Promise. We'll see how Observable can be used in places where you might be (incorrectly) using Promise now; we'll understand what an Observable is, how similar/different from Promise it is; and we'll see how Promise and Observable are actually best friends in disguise. For our journey we'll build a small app we will call "Code like Chuck Norris app." It's a ludicrously small/simple app, which will just show quotes about Chuck Norris and programming. Our real objective is to get inspired from Chuck Norris and eventually become a better coder. For that we need to get the Chuck Norris inspirational quotes. We'll do so by requesting quotes from the holy ICNDB. We'll use superagent for making requests. It's a really simple library that allows making Ajax requests from both Node.js and the browser. For starters, let's turn that console in your browser to an inspirational one. Here's our first test run: import superagent from 'superagent'; let apiUrl; apiUrl = `http://api.icndb.com/jokes/random?limitTo=[nerdy,explicit]`; superagent .get(apiUrl) .end((err, res) => { let data, inspiration; data = JSON.parse(res.text); inspiration = data.value.joke; console.log(inspiration); }); Note that we are using ES6. You can check out the live examples on jsbin: http://jsbin.com/loyemo/edit. Fear not if you see a blank page opening that link. Just click on the Edit in JsBin button on the top right, and then turn on ES6/Babel and Console tabs in jsbin. Click on the run button to see the code running. Also make sure you have http://jsbin.com/* and not https. You won't be able to make a request to ICNDB over https. The code is fairly straight-forward. We ask superagent to make a GET request to our apiUrl, and when the request finishes, superagent invokes our callback. We don't handle the error, parse the response JSON and console.log the inspiration. Now that we know how to get inspired in the console, let's turn this callback inspiration to promise based inspiration. In this post, we'll keep building our inspirational app in 2 codebases. Every part that we build, we'll build it using both Promises and Observables. This will give you a good perspective of how things are done with both approaches. Plus, we are going to create both Promise and Observable from scratch, so to get an idea of how things work under the hood, and to see the similarities and differences between the two. Promise Promises are awesome. I liked them when I first heard of them. 
I mean, the ability to pass around the asynchronous computations like they are variables, what more can you ask from life?Here's what our promise-based inspiration looks like (JsBin: http://jsbin.com/qolido/edit) import superagent from 'superagent'; let apiUrl, promiscuousInspiration; apiUrl = `http://api.icndb.com/jokes/random?limitTo=[nerdy,explicit]`; promiscuousInspiration = new Promise((resolve, reject) => { superagent .get(apiUrl) .end((err, res) => { if (err) { return reject(err); } let data, inspiration; data = JSON.parse(res.text); inspiration = data.value.joke; return resolve(inspiration); }); }); promiscuousInspiration .then((inspiration) => { console.log('Inspiration: ', inspiration); }, (err) => { console.error('Error while getting inspired: ', err); }); For all the power they bring, Promises are simple business. Here we're using the ES6 native Promise. For creating a new promise we pass the Promise constructor a function that receives two functions: resolve and reject. Inside this function we do our asynchronous work. We execute reject when an error happens, else we execute resolve when the asynchronous job gets its result. And then we can carry this Promise around like a variable, and work with it elsewhere in the program. To use a promise, we use its then method, which takes two callbacks for success and error. Promises can do plenty more than this, but you already know that, right? It's about time to address the elephant in the room. Let's create an Observable for the same task and see how it looks. Observable Observables are similar to Promises in that they both represent asynchronous jobs as first class things in Javascript. The similarity doesn't end here. You can in fact think of an Observable as a Promise that can resolve more than once. Or more aptly, we have the plural of Promise. We'll come to that later, for now, let's port our inspirational console to use Observable. JsBin: http://jsbin.com/foweyo/1/edit import superagent from 'superagent'; import {Observable} from 'rx'; let apiUrl, reactiveInspiration; apiUrl = `http://api.icndb.com/jokes/random?limitTo=[nerdy,explicit]`; reactiveInspiration = Observable.create((observer) => { superagent .get(apiUrl) .end((err, res) => { if (err) { return observer.onError(err); } let data, inspiration; data = JSON.parse(res.text); inspiration = data.value.joke; observer.onNext(inspiration); observer.onCompleted(); }); return () => { console.log('Release the kraken!'); }; }); reactiveInspiration.subscribe( (inspiration) => { console.log('Inspiration: ', inspiration); }, (error) => { console.error('Error while getting inspired', error); }, () => { console.log('Done getting inspired!'); } ); Observable.create is a helper for creating Observable for a specified subscribe method implementation (we'll see what that means in a minute). It looks very similar to creating a Promise but there are some differences. Let's walk over them from bottom up. reactiveInspiration.subscribe To use the value of a Promise, we use its 'then' method. For using an Observable, the equivalent is its subscribe method. First question that pops to mind is why the weird name? I mean aPromise.then makes perfect sense "when this promise resolves, then do this". As I hinted above, an Observable is not just a single value. It represents a series of values. You can think of it as a data stream to which we subscribe. Three callbacks to subscribe The second thing to note is three callbacks being passed to subscribe instead of just two. 
Again, two callbacks for success and error make perfect sense, but what is that third one for? Well, an Observable is not a single computation that'll succeed or fail; it is a series. Arrays have an implicit value that marks the termination of the array, a value that tells the loop (or whatever uses the array), "We are done." Observables are the same as arrays in that they're a collection, or rather an asynchronous collection of asynchronous values, but still a collection (actually, calling it an Iterable would make more sense, but they're similar enough to be treated as the same for this case). A collection needs to tell its consumer when it finishes. That is what the third callback does. It's called the onCompleted callback, and it gets invoked when the Observable is completed and will not send any more values. In our simple example, it gets completed right after receiving one value. The other two callbacks are called onNext and onError, in that order. Together, this trio makes up a subscriber of the Observable. We can actually pass an object to the subscribe method with these three methods as keys (JsBin: http://jsbin.com/foweyo/2/edit):

reactiveInspiration.subscribe({
  onNext: (inspiration) => {
    console.log('Inspiration: ', inspiration);
  },
  onError: (error) => {
    console.error('Error while getting inspired', error);
  },
  onCompleted: () => {
    console.log('Done getting inspired!');
  },
});

Implicitly, RxJS will create a Subscriber with the methods we provide and use defaults for those we don't provide, but let's not get technical about it. Now that you know what a subscriber is for an Observable, let's move up the code and see how to create an Observable from scratch with Observable.create, which creates an Observable from a specified subscribe method implementation. It's actually very simple in practice. Observable.create's callback is given an object as an argument, which provides our three favorite methods: onNext, onError, and onCompleted. Just like we do with a Promise, we do our asynchronous work in the callback we pass to Observable.create. We call the appropriate callback when the asynchronous job gets its value, when some error happens, or, in the case of an Observable, when we are done sending values from the Observable (that is, when the asynchronous job is finished). In our app, the first thing we do is check whether the request has resulted in an error. If it has, we call the onError method of the observer we receive in the callback. When we get the value, we call observer.onNext. In our case we are only interested in one value, so we mark our Observable finished right after that by calling onCompleted on the observer.

Release the Kraken?
This was all quite similar to the Promise, but what is that function we are returning at the end? It's a function that releases all the Krakens you use in your asynchronous operation: all of the resources that you need to get rid of when your Observable needs to quit. In our case we don't have anything we want to get rid of, but there are times when we do. For example, if you haven't already guessed, we can represent almost any asynchronous operation in JavaScript as an Observable. That includes operations like events (yes, including DOM events). The dark art of DOM events can end up being pretty expensive if we forget to remove the event listeners. So that's what we can release in this function. Just imagine how powerful this can end up being. We defined what resources to release right when we started using them!
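To make the dispose function a little more concrete, here is a small sketch that wraps DOM click events in an Observable; the listener removal lives in the dispose function. This example is not part of our inspiration app, and the event target, handler, and log messages are just illustrative choices (RxJS also ships an Observable.fromEvent helper that does essentially this for you):

import {Observable} from 'rx';

// Wrap clicks on the document in an Observable.
let clicks = Observable.create((observer) => {
  let handler = (event) => observer.onNext(event);

  document.addEventListener('click', handler);

  // The dispose function: runs when the subscription is disposed of,
  // or when the Observable completes or errors.
  return () => {
    console.log('Removing the click listener');
    document.removeEventListener('click', handler);
  };
});

let subscription = clicks.subscribe((event) => {
  console.log('Clicked at: ', event.clientX, event.clientY);
});

// Later, when we no longer care about clicks, disposing the subscription
// triggers the dispose function above and the listener goes away.
subscription.dispose();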
If you can't yet see the power it delivers, then just trust me on this one (as well). Being able to get rid of potentially expensive resources declaratively with a standard interface is very powerful. We'll taste a sip of this power in next part of this post. This function can be called the dispose function of the Observable. And the Observable can be said to be implementing the Disposable interface if you're into those things. The finishing part of the asynchronous operation is the default in case of a Promise. When a promise gets a value or when it meets an error, it is finished. An Observable differs from a Promise only in case of getting a value. The Observable doesn't finish when we invoke onNext. But for the other part (facing an error), they are the same as Promise. An Observable stops in two situations: when it receives onCompleted when it faces an error If any error occurs in the Observable, it stops right there and will not emit any more values unless explicitly asked to do so. It also calls its dispose function when an Observable finishes. So it's guaranteed that an Observable's resources will get freed whenever it finishes, naturally or by error. That's all there is you need to know to get started with Observables. A Promise represents a single asynchronous value, like a regular variable. Observables on the other hand represents a series of asynchronous values, like an array. Laziness The second noticeable difference you might need to know before we move to next part of this tutorial is that Observables are lazy while promises are not. In our example, let's write a comment in both cases whenever a value is arrived from the API. JsBin: http://jsbin.com/zavidi/1/edit import superagent from 'superagent'; let apiUrl, promiscuousInspiration; apiUrl = `http://api.icndb.com/jokes/random?limitTo=[nerdy,explicit]`; promiscuousInspiration = new Promise((resolve, reject) => { superagent .get(apiUrl) .end((err, res) => { if (err) { return reject(err); } let data, inspiration; data = JSON.parse(res.text); inspiration = data.value.joke; console.log('Inspiration has arrived!'); return resolve(inspiration); }); }); // promiscuousInspiration // .then((inspiration) => { // console.log('Inspiration: ', inspiration); // }, (err) => { // console.error('Error while getting inspired: ', err); // }); And let's do equivalent for Observable part (JsBin: http://jsbin.com/zavidi/3/edit): //in observable.js import superagent from 'superagent'; import {Observable} from 'rx'; let apiUrl, reactiveInspiration; apiUrl = `http://api.icndb.com/jokes/random?limitTo=[nerdy,explicit]`; reactiveInspiration = Observable.create((observer) => { superagent .get(apiUrl) .end((err, res) => { if (err) { return observer.onError(err); } let data, inspiration; data = JSON.parse(res.text); inspiration = data.value.joke; console.log('Inspiration has arrived!'); observer.onNext(inspiration); observer.onCompleted(); }); return () => { console.log('Release the Kraken!'); }; }); // reactiveInspiration.subscribe({ // onNext: (inspiration) => { // console.log('Inspiration: ', inspiration); // }, // onError: (error) => { // console.error('Error while getting inspired', error); // }, // onCompleted: () => { // console.log('Done getting inspired!'); // }, // }); Notice that we have commented out the part of the code that actually uses our Promise/Observable. 
Now if you run the app for both files, you'll notice that the promise code makes the request and invokes the code (you'll see a log in console) even though we don't really use the Promise. The Observable, however, acts as if it doesn't even exist. This is because Observables are lazy. An Observable won't execute code until it has at least one subscriber. We can deduce here that a let p = new Promise(...) actually carries the "value" of the promise which was obtained ages ago when the promise was created. On the other hand, an Observable carries the complete computation with it, which is executed on-demand. This laziness (like most other features of Observable) makes them very powerful. This also means that we can reproduce (re-execute) an Observable from itself (for example when you wanna retry on error), while to do the same for a Promise we need to have access to the code that generates the promise. Now that we understand what Observables are, we shall practice their real powers in next part of this tutorial. About the author Charanjit Singh is a freelance developer based out of Punjab, India. He can be found on GitHub @channikhabra.