
User Story Map – The First User Experience Map in a Product’s Life

Packt
23 Jun 2017
18 min read
In this article by Peter W. Szabo, the author of the book User Experience Mapping, we will explore user story maps: the first user experience map in a product's life. User story maps solve the user's problems in the form of a discussion. Your job as a product manager or user experience consultant should be to make the world better through user-centric products, essentially solving the user's problems. Contrary to popular belief, user story maps are not just cash cows for agile experts. They will help a product to succeed by increasing the team's understanding of the system. Not just what's inside it, but what will happen to the world as a result. By focusing on the opportunity and outcomes, the team can prioritize development. In reality, this often means stopping the proliferation of features, and underdoing your competition. Wait a minute, did you just read underdoing? As in, fewer features, not making bold promises and significantly less customizability and options? Yes indeed. The founders of Basecamp (formerly 37signals) are the champions of building less. In their book ReWork: Change the Way You Work Forever, they tell Basecamp's success story while giving vital advice to anyone trying to build a product or run a startup:

“When things aren't working, the natural inclination is to throw more at the problem. More people, time, and money. All that ends up doing is making the problem bigger. The right way to go is the opposite direction: Cut back. So do less. Your project won't suffer nearly as much as you fear. In fact, there's a good chance it'll end up even better.” (Jason Fried)

User Story Maps will help you to throw less at the problem, chopping down extras until you reach an awesome product, which is actually done. One of the problems with long product backlogs or nightmarish requirement documents is that they never get done. Literally never.

Once I had to work on improving the user experience of a bank's backend. It was a gargantuan task, as this backend was a large collection of distributed microservices, which meant hundreds of different forms with hard-to-understand functions and a badly designed multi-level menu that connected them together. I knew almost nothing about banking, and they knew almost nothing about UX, so this was a match made in heaven. They gave me a twelve-page document. That was just the non-disclosure agreement. The project had many 100+ page documents, detailing various things and how they are done, complete with business processes and banking jargon. They wanted us to compile an expert review on what needs to be redesigned and create a detailed strategy for that. I found a better use of their money than wasting time on expert reviews and redesign strategies at that stage. Recording or even watching bank employees while they used the system during their work was out of the question. So we went for the quick win and did user story mapping in the first week of the project. Among the attendees of the user story mapping sessions, there were a few non-manager-level bank employees who used the backend extensively. One of them was quite new to her job, but fortunately, quite talkative about it. It was immediately evident that most employees almost never used at least 95% of the functionality. Those were reserved for specially trained people, usually managers.
After creating the user story map with the most essential and frequently used features, I suggested a backend interface which at first only contained about 1% of the functionality of the old system, with the mention of other features to be added later. (As a UX consultant you should avoid saying no; instead, try saying later. It has the same effect for the project but keeps everyone happy.) No one in the room believed that such a crazy idea would go through senior management, although they supported the proposal. Quite the contrary, it went extremely well with senior management. The senior managers understood that by creating a simple and fast backend user interface, they would be able to reduce the queues without hiring new employees. Moreover, if they needed to hire people, training would be easier and faster. The new UI could also reduce the number of human errors. Almost all of the old backend was still online two years later, although used only by a few employees. This made both the product and the information security team happy, not to mention HR. The functionality of the new application extended only slightly in 24 months. Nobody complained, and the bank's customers were happy with smaller queues. All this was achieved with a pack of colored sticky notes, some markers and, much more importantly, a discussion and shared understanding. This is just one example of how a simple technique like user story mapping could save millions of dollars for a company.

Just tell the story

Drawing a map, any map, will lead to solving the problem. User story maps aim to replace document hand-overs with discussions and collaboration. Enterprises tend to have some sort of formal approval process, usually with a sign-off. That's perfectly fine, and most of the time unavoidable. Just make sure that the sign-off happens after the mapping and story discussions. Ideally, right after the discussion, not days or weeks later. There is a reason why product managers, UX experts and all stakeholders love stories: they are humans. As such, we all have a natural tendency to love an emotionally satisfying tale. Most of our entertainment revolves around stories, and we want to hear good stories. A great story revolves around conflicts in a memorable and exciting way. How to tell a story? Telling a story is an easy task. We all did that as kids, yet we tend to forget about that skill we possess when we get into a serious product management discussion. How to tell a great story? There are a few rules to consider; the most important one is that you should talk about something that captivates the audience.

The audience

You should focus on the audience. What are their problems? What would make them listen actively, instead of texting or catching Pokémon, while at a user story discussion? Even if the project is about scratching your own itch, you should spin the story so it's their itch that is scratched. Engaging the audience can indeed be a challenge. Once upon a time I wrote a sci-fi novel. Actually, it was published in 2000, with the title Tűzsugár, in Hungarian. The English title would be Ray of Fire, but fortunately for my future writing career, it was never translated into English. The book had everything my 15-16-year-old self considered fun: for instance, a great space war or passionate love between members of different sapient spacefaring races. The characters were miscellaneous human and non-human life-forms stuck in a spaceship for most of the story.
Some of my characters had a keen resemblance to miscellaneous video-game characters, from games like Mortal Kombat 2 or Might & Magic VI. They certainly lacked emotional struggles over insignificant things like mass murder or the end of the universe. As I certainly hope you will never read that book, I will spoil the ending for you. A whole planet died, hinting that the entire galaxy might share the same fate, with a faint hope for salvation. This could have led to a sequel, but fortunately for all sci-fi lovers, I stopped writing the sequel after nine chapters. The book seemed to be a success. A national TV channel made an interview with me, if that's any measure of success. More importantly, I had lots of fun writing it. But the book itself was hard to understand and probably impossible to appreciate. My biggest mistake was writing only what I considered fun. To be honest, I still write for fun, but now I have an audience in mind. I tell the story of my passion for user experience mapping to a great audience: you. I try to find things that are fun to write and still interesting to my target audience. As a true UX geek, I create the main persona of my audience before writing anything and tell a story to her. This article's main persona is called Maya, and she shares many traits with my daughter. Could I say I'm writing this book for my daughter? Of course I do, but I keep in mind all the other personas. Hopefully one of them is a bit like you. Before a talk at a conference, I always ask the organizers about the audience. Even if the topic stays the same, the audience completely changes the story and the way I present it. I might tell the same user story differently to one of my team members, to the leader of another team, or to a chief executive. Differently internally, to a client or a third party. When telling a story, contrary to a written story, you will see immediate feedback, or the lack of it, from your audience. You should go even further and shape the story based on this feedback. Telling a story is an interactive experience. Engage with the audience. Ask them questions, and let them ask you questions as a start, then turn this into a shared storytelling experience, where the story is being told by all participants together (not at the same time, though, unless you turn the workshop into a UX carol). When you tell a fairy tale to your daughter, she might ask you why the princess can't escape using her sharp wits and cunning, instead of relying on a prince challenging the dragon to a bloody duel. (Then you might start appreciating the story of My Little Pony, where the girl ponies solve challenges mostly non-violently while working together as a team of friends, instead of acting as a prize to be won.) So why not spin a tale of heroic princesses with fluffy pink cats?

Start with action

Beginning in medias res, as in starting the narrative in the midst of action, is a technique used by masters such as Shakespeare or Homer, and it is also a powerful tool in your user story arsenal. While telling a story, always try to add as little background as possible, and start with drama or something to catch the attention of the audience, whenever possible. At the beginning of The Odyssey, quite a few unruly men want to marry Telemachus' mother, while his father has still not returned home from the Trojan War. There is no lengthy introduction explaining how those men ended up in Ithaca, or why the goddess, flashing-eyed Athena, cares about Odysseus.
The poem was composed in an oral tradition and was more likely to be heard than read at the time of composition. While literacy has skyrocketed since Homer's time, you want to tell and discuss your user stories. Therefore you should consider a similar start. (Maybe not mentioning the user's mother or her rascally suitors.)

Simplify

In literary fiction, a complex story can be entertaining. A Game of Thrones and its sequels in the A Song of Ice and Fire series are a good example of that. The thing is, George R. R. Martin writes those novels, and he certainly has no intention to discuss them during sprint planning meetings with stakeholders. User Story Maps are more similar to sagas, folktales and other stories formed in an oral tradition. They develop in a discussion, and their understandability is granted by their simplicity. We need to create a map as simple and small as possible, with as few story cards as possible. So how big should the story map be? Jim Shore defines story card hell as something that happens when you have 300 story cards and you have to keep track of them. Madness, huh? This is not Sparta! Sorry Jim for the bad pun, but you are absolutely right: in the 300 range, you will not understand the map, and the whole discussion part will completely fail. The user stories will be lost, and the audience will not even try to understand what's happening. There is no ideal number of cards in a story map, but aim low. Then eliminate most of the cards. Clutter will destroy the conversation. In most card games you will have from two to seven cards in hand, with some rare exceptions. The most popular card game both online and offline is Texas Hold 'em Poker. In that game, each player is dealt only two cards. This is because human thought processes and discussions work best with a small number of objects. Sometimes the number of objects in the real world is high. Our mind is good at simplifying, classifying and grouping things into manageable units. With that said, most books and conference presentations about user story mapping show us a photo of a wall covered with sticky notes. The viewer will have absolutely no idea what's on them, but one thing is certain: it looks like a complex project. I have bad news for you: projects with a complex user story map never get finished, and if they do get finished to a degree, they will fail. The abundance of sticky notes means that the communication and simplification process needs one or more iterations. Throw away most of the sticky notes! To do that, you need to understand the problem better.

Tell the story of your passion

Seriously. Find someone, and tell her the user story of the next big thing. The app or hardware which will change the world. Try it now. Be bold and let your imagination flow. I believe that in this century we will be able to digitalize human beings. This will be the key to both humankind's survival as a species and our exploration of space. The digital society would have no famine, no plagues and no poverty. This would solve all major problems we face today. Digital humans would even defeat death. Sounds like a wild sci-fi idea? It is, but then again, smartphones were also a wild sci-fi idea a few decades ago. Now I will tell you the story of something we can build today.

The grocery surplus webshop

We will create the user story map for a grocery surplus webshop. Using this eCommerce site, we will sell clearance food and drink at a discount price. This means food that would be thrown away at a regular store.
For example, food past its expiry date or with damaged packaging. This idea is popular in developed countries like Denmark or the UK, and it might help cut down on the amount of food wasted every year, totaling 1.3 billion metric tonnes worldwide. We are trying to create the online-only version of WeFood (https://donate.danchurchaid.org/wefood). Our users can be environmentally conscious shoppers or low-income families with limited budgets, just to give two examples. In this article I will not introduce personas and treat them separately, so for now, we will only think about them as shoppers.

The opportunity to scratch your own itch

Mapping will help you to achieve the most with as little as possible. In other words: maximize the opportunity, while minimizing the outputs. To use the mapping lingo: the outputs are products of the map's usage, while the outcomes are the results. The opportunity is the desired outcome we plan to achieve with the aid of the map. This is how we want to change the world. We should start the map with the opportunity. The opportunity should not be defined as selling surplus food and drink to our visitors. If you approach a project or a business without solving the users' problem, the project might become a failure. The best way to find out what our users want is through valid user research, and remote and lab-based user experience testing. Sometimes we need to settle for the second-best solution, which happens to be free. That's solving your own problem, in other words, scratching your own itch. Probably the best summary of this mantra comes from Jason Fried, the founder and CEO of Basecamp: “When you solve your own problem, you create a tool that you're passionate about. And passion is key. Passion means you'll truly use it and care about it. And that's the best way to get others to feel passionate about it too.” (Getting Real: The Smarter, Faster, Easier Way to Build a Successful Web Application) We will create the web store we would love to use. Although, as the cliché goes, there is no I in team, but there is certainly an I in writer. My ideal eCommerce site could be different to yours. When following the examples of this article, please try to think of your itch, your ideal web store, and use my words only as guidance. You can create the user story map for any other project, ideally something you are passionate about. I would encourage you to pick something that's not a webshop, or maybe not even a digital product if you feel adventurous. You need to tell the story of your passion. (No, not that passion. This is not an adults-only website.) My passion is reducing food waste (that's also the poor excuse I'm using when looking at the bathroom scale). Here is my attempt to phrase the opportunity. The opportunity: Our shoppers want to save money while reducing global food waste. They understand and accept what surplus food and drink means, and they are happy to shop with us. Actually, the first sentence would be enough. Remember, you want to have a simple one- or two-sentence opportunity definition.

I ended up working for two tapestry web shops as a consultant. Not at the same time, though, and the second company approached me mostly as a result of how successful the first one was. It's a relatively small industry in Europe, and business owners and decision-makers know each other by name. I still recall the pleasant experience I had meeting the owners of the first web shop.
They invited me to dinner at a homely restaurant in Budapest. We had a great discussion and they shared their passion. They were an elderly couple, so they must have spent most of their life in the communist era. In the early 90's they decided to start a business, selling tapestry in a brick-and-mortar store. Obviously, they had no background in management or running a capitalist business, but that didn't matter; they only wanted to help people to make their homes beautiful. They loved tapestry, so they started importing and selling it. When I visited their physical store, I saw them talking to a customer. They spent more than an hour discussing interior decoration with someone who just popped by to ask the square meter prices of tapestry. Tapestry is not sold per square meter, but they did the math for the customer, among many other things. They showed her many different patterns and types and discussed application methods. After leaving the shop, the customer knew more about tapestry than most other people ever will. Fast forward to the second contract. I only talked to the client on Skype, and that's perfectly fine because most of my clients don't invite me to dinner. I saw many differences between this client's approach and the previous one's. At some point, I asked him, “Why do you sell tapestry? Is tapestry your passion?” He was a bit startled by the question, but he promptly replied: “To make money, why else? You need to be pretty crazy to have tapestry as a passion.” Seven years later the second business no longer exists, yet the first one is still successful. Treating your work as your passion works wonders. Passion is an important contributor to the success of an idea. Whenever possible, pour your passion into a product and summarize it as the opportunity.

What's next?

If you buy my new book, User Experience Mapping, you will find more about user story maps in the second chapter. In that chapter, we will explore user story maps and how they help you to create requirements through collaboration (and a few sticky notes): we will create user stories and arrange them as a user story map, we will discuss the reasons behind creating them, and we will learn how to tell a story. The grocery surplus webshop's user story map will be the example I will create in this chapter. To do this, we will explore user story templates, characteristics of a good user story (INVEST) and epics. With the 3 Cs (Card, Conversation and Confirmation) process we will turn the stories into reality. We will create a user story map on a wall with sticky notes, then digitally using StoriesOnBoard. And that's just the second chapter; each of the eleven chapters contains different user experience maps. The book reveals two advanced mapping techniques for the first time in print: the behavioural change map and the 4D UX map. You will also explore user story maps, task models and journey maps. You will create wireflows, mental model maps, ecosystem maps and solution maps. In this book, we will show you how to use insights from real users to create and improve your maps and your product. Start mapping your products now to change your users' lives!

Resources for Article:
Further resources on this subject:
Learning D3.js Mapping
Data Acquisition and Mapping
Creating User Interfaces


String Encryption and Decryption

Packt
22 Jun 2017
21 min read
In this article by Brenton J.W. Blawat, author of the book Enterprise PowerShell Scripting Bootcamp, we will learn about string encryption and decryption. Large enterprises often have very strict security standards that are required by industry-specific regulations. When you are creating your Windows server scanning script, you will need to approach the script carefully with certain security concepts in mind. One of the most common situations you may encounter is the need to leverage sensitive data, such as credentials, in your script. While you could prompt for sensitive data during runtime, most enterprises want to automate the full script using zero-touch automation. Zero-touch automation requires that the scripts are self-contained and have all of the required credentials and components to successfully run. The problem with incorporating sensitive data in the script, however, is that the data can be obtained in clear text. The usage of clear text passwords in scripts is a bad practice and violates many regulatory and security standards. As a result, PowerShell scripters need a method to securely store and retrieve sensitive data for use in their scripts. One of the popular methods to secure sensitive data is to encrypt the sensitive strings. This article explores RijndaelManaged symmetric encryption, and how to use it to encrypt and decrypt strings using PowerShell.

In this article, we will cover the following topics:
Learn about RijndaelManaged symmetric encryption
Understand the salt, init, and password for the encryption algorithm
Script a method to create randomized salt, init, and password values
Encrypt and decrypt strings using RijndaelManaged encryption
Create an encoding and data separation security mechanism for encryption passwords

The examples in this article build upon each other. You will need to execute the scripts sequentially to have the final script in this article work properly.

RijndaelManaged encryption

When you are creating your scripts, it is best practice to leverage some sort of obfuscation or encryption for sensitive data. There are many different strategies that you can use to secure your data. One is leveraging string and script encoding. Encoding takes your human-readable string or script and scrambles it to make it more difficult for someone to see what the actual code is. The downsides of encoding are that you must decode the script to make changes to it, and decoding does not require the use of a password or passphrase. Thus, someone can easily decode your sensitive data using the same method you would use to decode the script. The alternative to encoding is leveraging an encryption algorithm. Encryption algorithms provide multiple mechanisms to secure your scripts and strings. While you can encrypt your entire script, it's most commonly used to encrypt the sensitive data in the scripts themselves, or in answer files. One of the most popular encryption algorithms to use with PowerShell is RijndaelManaged. RijndaelManaged is a symmetric block cipher algorithm, which was selected by the United States National Institute of Standards and Technology (NIST) for its implementation of the Advanced Encryption Standard (AES). When using RijndaelManaged for the AES standard, it supports 128-bit, 192-bit, and 256-bit encryption. In contrast to encoding, encryption algorithms require additional information to be able to properly encrypt and decrypt the string.
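To see why encoding alone is not protection, here is a minimal sketch (the sample value is made up): anyone who knows the encoding scheme can reverse it, no password required.

# Base64 encoding hides a string from casual view only; reversing it needs no key
$secret  = "P@ssw0rd-Demo-Only"
$encoded = [Convert]::ToBase64String([Text.Encoding]::UTF8.GetBytes($secret))
$decoded = [Text.Encoding]::UTF8.GetString([Convert]::FromBase64String($encoded))
Write-Host "Encoded: $encoded"
Write-Host "Decoded: $decoded"   # prints the original string back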
When implementing RijndaelManaged in PowerShell, the algorithm requires salt, a password, and the InitializationVector (IV). The salt is typically a randomized value that changes each time you leverage the encryption algorithm. The purpose of salt in a traditional encryption scenario is to change the encrypted value each time the encryption function is used. This is important in scenarios where you are encrypting multiple passwords or strings with the same value. If two users are using the same password, the encryption value in the database would also be the same. By changing the salt each time, the passwords, though the same value, would have different encrypted values in the database. In this article, we will be leveraging a static salt value. The password typically is a value that is manually entered by a user, or fed into the script using a parameter block. You can also derive the password value from a certificate, Active Directory attribute values, or a multitude of other sources. In this article, we will be leveraging three sources for the password. The InitializationVector (IV) is a hash generated from the IV string and is used for the EncryptionKey. The IV string is also typically a randomized value that changes each time you leverage the encryption algorithm. The purpose of the IV string is to strengthen the hash created by the encryption algorithm. This was created to thwart a hacker who is leveraging a rainbow attack using precalculated hash tables built with no IV strings, or commonly used strings. Since you are setting the IV string, the number of hash combinations exponentially increases, and it reduces the effectiveness of a rainbow attack. In this article, we will be using a static initialization vector value. The randomization of the salt and initialization vector strings becomes more important in scenarios where you are encrypting a large set of data. An attacker can intercept hundreds of thousands of packets, or strings, which reveal an increasing amount of information about your IV. With this, the attacker can guess the IV and derive the password. The most notable hack of IVs was with the Wired Equivalent Privacy (WEP) wireless protocol, which used a weak, or small, initialization vector. After capturing enough packets, an IV hash could be guessed and a hacker could easily obtain the passphrase used on the wireless network.

Creating random salt, initialization vector, and passwords

As you are creating your scripts, you will want to make sure you use complex random values for the salt, IV string, and password. This is to prevent dictionary attacks where an individual may use common passwords and phrases to guess the salt, IV string, and password. When you create your salt and IVs, make sure they are a minimum of 10 random characters each. It is also recommended that you use a minimum of 30 random characters for the password. To create random passwords in PowerShell, you can do the following:

Function create-password {
    # Declare password variable outside of loop.
    $password = ""
    # For numbers between 33 and 126
    For ($a=33; $a -le 126; $a++) {
        # Add the ASCII text for the ASCII number referenced.
        $ascii += ,[char][byte]$a
    }
    # Generate a random character from the $ascii character set.
    # Repeat 30 times, or create 30 random characters.
    1..30 | ForEach { $password += $ascii | get-random }
    # Return the password
    return $password
}

# Create four 30 character passwords
create-password
create-password
create-password
create-password

The output of this command would look like the following:

This function will create a string with 30 random characters for use with random password creation. You first start by declaring the create-password function. You then declare the $password variable for use within the function by setting it equal to "". The next step is creating a For command to loop through a set of numbers. These numbers represent the ASCII character numbers that you can select from for the password. You then create the For command by writing For ($a=33; $a -le 126; $a++). This means starting at the number 33, increase the value by one ($a++), and continue until the number is less than or equal to 126. You then declare the $ascii variable and construct the variable using the += assignment operator. As the For loop goes through its iterations, it adds a character to the array values. The script then leverages the [char], or character value, of the [byte] number contained in $a. After this section, the $ascii array will contain all the ASCII characters with byte values between 33 and 126. You then continue to the random character generation. You declare the 1..30 command, which means for numbers 1 to 30, repeat the following command. You pipe this to ForEach {, which designates each of the 30 iterations. You then call the $ascii array and pipe it to the get-random cmdlet. The get-random cmdlet will randomly select one of the characters in the $ascii array. This value is then joined to the existing values in the $password string using the assignment operator +=. After the 30 iterations, there will be 30 random values in the $password variable. Lastly, you leverage return $password to return this value to the script. After declaring the function, you call the function four times using create-password. This creates four random passwords for use. To create strings that are less than 30 random characters in length, you can modify the 1..30 to be any value that you want. If you want a 15-random-character salt and initialization vector, you would use 1..15 instead.

Encrypting and decrypting strings

To start using RijndaelManaged encryption, you need to import the .NET System.Security assembly into your script. Much like importing a module to provide additional cmdlets, using .NET assemblies provides an extension to a variety of classes you wouldn't normally have access to in PowerShell. Importing the assembly isn't persistent. This means you will need to import the assembly each time you want to use it in a PowerShell session, or each time you want to run the script. To load the .NET assembly, you can use the Add-Type cmdlet with the -AssemblyName parameter and the System.Security argument. Since the cmdlet doesn't actually output anything to the screen, you may choose to print a message to the screen on successful import of the assembly. To import the System.Security assembly with display information, you can do the following:

Write-host "Loading the .NET System.Security Assembly For Encryption"
Add-Type -AssemblyName System.Security -ErrorAction SilentlyContinue -ErrorVariable err
if ($err) {
    Write-host "Error Importing the .NET System.Security Assembly."
    PAUSE
    EXIT
}
# if err is not set, it was successful.
if (!$err) {
    Write-host "Successfully loaded the .NET System.Security Assembly For Encryption"
}

The output from this command looks like the following:

In this example, you successfully import the .NET System.Security assembly for use with PowerShell. You first start by writing "Loading the .NET System.Security Assembly For Encryption" to the screen using the Write-host command. You then leverage the Add-Type cmdlet with the -AssemblyName parameter with the System.Security argument, the -ErrorAction parameter with the SilentlyContinue argument, and the -ErrorVariable parameter with the err argument. You then create an if statement to see if $err contains data. If it does, it will use the Write-host cmdlet to print "Error Importing the .NET System.Security Assembly." to the screen. It will PAUSE the script so the error can be read. Finally, it will exit the script. If $err is $null, designated by if (!$err) {, it will use the Write-host cmdlet to print "Successfully loaded the .NET System.Security Assembly For Encryption" to the screen. At this point, the script or PowerShell window is ready to leverage the System.Security assembly. After you load the System.Security assembly, you can start creating the encryption function. The RijndaelManaged encryption requires a four-step process to encrypt the strings. The RijndaelManaged encryption process is as follows: The process starts by creating the encryptor. The encryptor is derived from the encryption key (password and salt) and initialization vector. After you define the encryptor, you will need to create a new memory stream using the IO.MemoryStream object. A memory stream is what stores values in memory for use by the encryption assembly. Once the memory stream is open, you define a System.Security.Cryptography.CryptoStream object. The CryptoStream is the mechanism that uses the memory stream and the encryptor to transform the unencrypted data to encrypted data. In order to leverage the CryptoStream, you need to write data to the CryptoStream. The final step is to use the IO.StreamWriter object to write the unencrypted value into the CryptoStream. The output from this transformation is placed into the MemoryStream. To access the encrypted value, you read the data in the memory stream. To learn more about the System.Security.Cryptography.RijndaelManaged class, you can view the following MSDN article: https://msdn.microsoft.com/en-us/library/system.security.cryptography.rijndaelmanaged(v=vs.110).aspx.
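A stripped-down sketch of just those four steps may help before reading the full function; it assumes $r already holds a RijndaelManaged object with its Key and IV populated, exactly as the complete script that follows sets them up:

# Assumes $r is a RijndaelManaged object with $r.Key and $r.IV already set
$c  = $r.CreateEncryptor()                                           # 1. create the encryptor from the key and IV
$ms = new-Object IO.MemoryStream                                     # 2. in-memory buffer that will hold the ciphertext
$cs = new-Object Security.Cryptography.CryptoStream $ms,$c,"Write"   # 3. stream that performs the transformation
$sw = new-Object IO.StreamWriter $cs                                 # 4. writer that feeds plaintext into the CryptoStream
$sw.Write("Some plain text")
$sw.Close(); $cs.Close()
[byte[]]$cipherBytes = $ms.ToArray()                                 # the encrypted bytes now live in the memory stream
$ms.Close()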
To create a script that encrypts strings using the RijndaelManaged encryption, you would perform the following:

Add-Type -AssemblyName System.Security

function Encrypt-String {
    param($String, $Pass, $salt="CreateAUniqueSalt", $init="CreateAUniqueInit")
    try{
        $r = new-Object System.Security.Cryptography.RijndaelManaged
        $pass = [Text.Encoding]::UTF8.GetBytes($pass)
        $salt = [Text.Encoding]::UTF8.GetBytes($salt)
        $init = [Text.Encoding]::UTF8.GetBytes($init)
        $r.Key = (new-Object Security.Cryptography.PasswordDeriveBytes $pass, $salt, "SHA1", 50000).GetBytes(32)
        $r.IV = (new-Object Security.Cryptography.SHA1Managed).ComputeHash($init)[0..15]
        $c = $r.CreateEncryptor()
        $ms = new-Object IO.MemoryStream
        $cs = new-Object Security.Cryptography.CryptoStream $ms,$c,"Write"
        $sw = new-Object IO.StreamWriter $cs
        $sw.Write($String)
        $sw.Close()
        $cs.Close()
        $ms.Close()
        $r.Clear()
        [byte[]]$result = $ms.ToArray()
    }
    catch {
        $err = "Error Occurred Encrypting String: $_"
    }
    if($err) {
        # Report Back Error
        return $err
    }
    else {
        return [Convert]::ToBase64String($result)
    }
}

Encrypt-String "Encrypt This String" "A_Complex_Password_With_A_Lot_Of_Characters"

The output of this script would look like the following:

This function displays how to encrypt a string leveraging the RijndaelManaged encryption algorithm. You first start by importing the System.Security assembly by leveraging the Add-Type cmdlet, using the -AssemblyName parameter with the System.Security argument. You then declare the function of Encrypt-String. You include a parameter block to accept and set values in the function. The first value is $String, which is the unencrypted text. The second value is $Pass, which is used for the encryption key. The third is a predefined $salt variable set to "CreateAUniqueSalt". You then define the $init variable, which is set to "CreateAUniqueInit". After the parameter block, you declare try { to handle any errors in the .NET assembly. The first step is to declare the encryption class using the new-Object cmdlet with the System.Security.Cryptography.RijndaelManaged argument. You place this object inside the $r variable. You then convert the $pass, $salt, and $init values to the character encoding standard of UTF8 and store the character byte values in a variable. This is done by specifying [Text.Encoding]::UTF8.GetBytes($pass) for the $pass variable, [Text.Encoding]::UTF8.GetBytes($salt) for the $salt variable, and [Text.Encoding]::UTF8.GetBytes($init) for the $init variable. After setting the proper character encoding, you proceed to create the encryption key for the RijndaelManaged encryption algorithm. This is done by setting the RijndaelManaged $r.Key attribute to the object created by (new-Object Security.Cryptography.PasswordDeriveBytes $pass, $salt, "SHA1", 50000).GetBytes(32). This object leverages the Security.Cryptography.PasswordDeriveBytes class and creates a key using the $pass variable, the $salt variable, the "SHA1" hash name, and iterating the derivative 50000 times. Each iteration of this class generates a different key value, making it more complex to guess the key. You then leverage the .GetBytes(32) method to return the 32-byte value of the key. The RijndaelManaged 256-bit encryption is a derivative of the 32 bytes in the key: 32 bytes times 8 bits per byte is 256 bits. To create the initialization vector for the algorithm, you set the RijndaelManaged $r.IV attribute to the object created by (new-Object Security.Cryptography.SHA1Managed).ComputeHash($init)[0..15].
This section of the code leverages Security.Cryptography.SHA1Managed and computes the hash based on the $init value. When you invoke the [0..15] range operator, it will obtain the first 16 bytes of the hash and place them into the $r.IV attribute. The RijndaelManaged default block size for the initialization vector is 128 bits: 16 bytes times 8 bits per byte is 128 bits. After setting up the required attributes, you are now ready to start encrypting data. You first start by leveraging the $r RijndaelManaged object with the $r.Key and $r.IV attributes defined. You use the $r.CreateEncryptor() method to generate the encryptor. Once you've generated the encryptor, you have to create a memory stream to do the encryption in memory. This is done by declaring the new-Object cmdlet, set to the IO.MemoryStream class, and placing the memory stream object in the $ms variable. Next, you create the CryptoStream. The CryptoStream is used to transform the unencrypted data into the encrypted data. You first declare the new-Object cmdlet with the Security.Cryptography.CryptoStream argument. You also define the memory stream of $ms, the encryptor of $c, and the operator of "Write" to tell the class to write unencrypted data to the encryption stream in memory. After creating the CryptoStream, you are ready to write the unencrypted data into the CryptoStream. This is done using the IO.StreamWriter class. You declare a new-Object cmdlet with the IO.StreamWriter argument, and define the CryptoStream of $cs for writing. Last, you take the unencrypted string stored in the $String variable and pass it into the StreamWriter $sw with $sw.Write($String). The encrypted value is now stored in the memory stream. To stop the writing of data to the CryptoStream and MemoryStream, you close the StreamWriter with $sw.Close(), close the CryptoStream with $cs.Close(), and close the memory stream with $ms.Close(). For security purposes, you also clear out the encryptor data by declaring $r.Clear(). After the encryption process is done, you will need to export the memory stream to a byte array. This is done by calling the $ms.ToArray() method and setting it to the $result variable with the [byte[]] data type. The contents are stored in a byte array in $result. This section of the code is where you declare your catch { statement. If there were any errors in the encryption process, the script will execute this section. You declare the variable of $err with the "Error Occurred Encrypting String: $_" argument. The $_ will be the pipeline error that occurred during the try {} section. You then create an if statement to determine whether there is data in the $err variable. If there is data in $err, it returns the error string to the script. If there were no errors, the script will enter the else { section of the script. It will convert the $result byte array to a Base64 string by leveraging [Convert]::ToBase64String($result). This converts the byte array to a string for use in your scripts. After defining the encryption function, you call the function for use. You first start by calling Encrypt-String followed by "Encrypt This String". You also declare the second argument as the password for the encryptor, which is "A_Complex_Password_With_A_Lot_Of_Characters". After execution, this example receives the encrypted value of hK7GHaDD1FxknHu03TYAPxbFAAZeJ6KTSHlnSCPpJ7c= generated from the function. Your results will vary depending on the salt, init, and password you use for the encryption algorithm.
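As a small usage sketch (the file path and credential below are made up), the Base64 output can be written to an answer file so a zero-touch run can read it back later; decrypting it, covered next, requires the same password, salt, and init values that produced it:

# Encrypt once and persist the Base64 ciphertext (hypothetical path and value)
$cipherText = Encrypt-String "svc_account_P@ssw0rd" "A_Complex_Password_With_A_Lot_Of_Characters"
$cipherText | Out-File -FilePath "C:\Scripts\secure\dbcred.txt"

# Later, during the zero-touch run, read the stored ciphertext back for decryption
$stored = Get-Content -Path "C:\Scripts\secure\dbcred.txt" -Raw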
Decrypting strings

The decryption of strings is very similar to the process you performed for encrypting strings. Instead of writing data to the memory stream, the function reads the data in the memory stream. Also, instead of using the .CreateEncryptor() method, the decryption process leverages the .CreateDecryptor() method. To create a script that decrypts encrypted strings using the RijndaelManaged encryption, you would perform the following:

Add-Type -AssemblyName System.Security

function Decrypt-String {
    param($Encrypted, $pass, $salt="CreateAUniqueSalt", $init="CreateAUniqueInit")
    if($Encrypted -is [string]){
        $Encrypted = [Convert]::FromBase64String($Encrypted)
    }
    $r = new-Object System.Security.Cryptography.RijndaelManaged
    $pass = [System.Text.Encoding]::UTF8.GetBytes($pass)
    $salt = [System.Text.Encoding]::UTF8.GetBytes($salt)
    $init = [Text.Encoding]::UTF8.GetBytes($init)
    $r.Key = (new-Object Security.Cryptography.PasswordDeriveBytes $pass, $salt, "SHA1", 50000).GetBytes(32)
    $r.IV = (new-Object Security.Cryptography.SHA1Managed).ComputeHash($init)[0..15]
    $d = $r.CreateDecryptor()
    $ms = new-Object IO.MemoryStream @(,$Encrypted)
    $cs = new-Object Security.Cryptography.CryptoStream $ms,$d,"Read"
    $sr = new-Object IO.StreamReader $cs
    try {
        $result = $sr.ReadToEnd()
        $sr.Close()
        $cs.Close()
        $ms.Close()
        $r.Clear()
        Return $result
    }
    Catch {
        Write-host "Error Occurred Decrypting String: Wrong String Used In Script."
    }
}

Decrypt-String "hK7GHaDD1FxknHu03TYAPxbFAAZeJ6KTSHlnSCPpJ7c=" "A_Complex_Password_With_A_Lot_Of_Characters"

The output of this script would look like the following:

This function displays how to decrypt a string leveraging the RijndaelManaged encryption algorithm. You first start by importing the System.Security assembly by leveraging the Add-Type cmdlet, using the -AssemblyName parameter with the System.Security argument. You then declare the Decrypt-String function. You include a parameter block to accept and set values for the function. The first value is $Encrypted, which is the encrypted text. The second value is $pass, which is used for the encryption key. The third is a predefined $salt variable set to "CreateAUniqueSalt". You then define the $init variable, which is set to "CreateAUniqueInit". After the parameter block, you check to see whether the encrypted value is formatted as a string by using if ($Encrypted -is [string]) {. If this evaluates to True, you convert the Base64 string to bytes using [Convert]::FromBase64String($Encrypted) and place the resulting bytes in the $Encrypted variable. Next, you declare the decryption class using the new-Object cmdlet with the System.Security.Cryptography.RijndaelManaged argument. You place this object inside the $r variable. You then convert the $pass, $salt, and $init values to the character encoding standard of UTF8 and store the character byte values in a variable. This is done by specifying [Text.Encoding]::UTF8.GetBytes($pass) for the $pass variable, [Text.Encoding]::UTF8.GetBytes($salt) for the $salt variable, and [Text.Encoding]::UTF8.GetBytes($init) for the $init variable. After setting the proper character encoding, you proceed to create the encryption key for the RijndaelManaged encryption algorithm. This is done by setting the RijndaelManaged $r.Key attribute to the object created by (new-Object Security.Cryptography.PasswordDeriveBytes $pass, $salt, "SHA1", 50000).GetBytes(32).
This object leverages the Security.Cryptography.PasswordDeriveBytes class and creates a key using the $pass variable, the $salt variable, the "SHA1" hash name, and iterating the derivative 50000 times. Each iteration of this class generates a different key value, making it more complex to guess the key. You then leverage the .GetBytes(32) method to return the 32-byte value of the key. To create the initialization vector for the algorithm, you set the RijndaelManaged $r.IV attribute to the object created by (new-Object Security.Cryptography.SHA1Managed).ComputeHash($init)[0..15]. This section of the code leverages the Security.Cryptography.SHA1Managed class and computes the hash based on the $init value. When you invoke the [0..15] range operator, the first 16 bytes of the hash are obtained and placed into the $r.IV attribute. After setting up the required attributes, you are now ready to start decrypting data. You first start by leveraging the $r RijndaelManaged object with the $r.Key and $r.IV attributes defined. You use the $r.CreateDecryptor() method to generate the decryptor. Once you've generated the decryptor, you have to create a memory stream to do the decryption in memory. This is done by declaring the new-Object cmdlet with the IO.MemoryStream class argument. You then reference the $Encrypted values to place in the memory stream object with @(,$Encrypted), and store the populated memory stream in the $ms variable. Next, you create the CryptoStream, which is used to transform the encrypted data into the decrypted data. You first declare the new-Object cmdlet with the Security.Cryptography.CryptoStream class argument. You also define the memory stream of $ms, the decryptor of $d, and the operator of "Read" to tell the class to read the encrypted data from the encryption stream in memory. After creating the CryptoStream, you are ready to read the decrypted data from the CryptoStream. This is done using the IO.StreamReader class. You declare new-Object with the IO.StreamReader class argument, and define the CryptoStream of $cs to read from. At this point, you use try { to catch any error messages that are generated from reading the data in the StreamReader. You call $sr.ReadToEnd(), which calls the StreamReader, reads the complete decrypted value, and places the data in the $result variable. To stop the reading of data from the CryptoStream and MemoryStream, you close the StreamReader with $sr.Close(), close the CryptoStream with $cs.Close(), and close the memory stream with $ms.Close(). For security purposes, you also clear out the decryptor data by declaring $r.Clear(). If the decryption is successful, you return the value of $result to the script. After defining the decryption function, you call the function for use. You first start by calling Decrypt-String followed by "hK7GHaDD1FxknHu03TYAPxbFAAZeJ6KTSHlnSCPpJ7c=". You also declare the second argument as the password for the decryptor, which is "A_Complex_Password_With_A_Lot_Of_Characters". After execution, you will receive the decrypted value of "Encrypt This String" generated from the function.

Summary

In this article, we learned about RijndaelManaged 256-bit encryption. We first started with the basics of the encryption process. Then, we proceeded to learn how to create randomized salt, init, and password values in scripts. We ended the article by learning how to encrypt and decrypt strings.

Resources for Article:
Further resources on this subject:
WLAN Encryption Flaws [article]
Introducing PowerShell Remoting [article]
SQL Server with PowerShell [article]


Getting Started with Metasploit

Packt
22 Jun 2017
10 min read
In this article by Nipun Jaswal, the author of the book Metasploit Bootcamp, we will be covering the following topics:
Fundamentals of Metasploit
Benefits of using Metasploit

Penetration testing is the art of performing a deliberate attack on a network, web application, server or any device that requires a thorough check-up from the security perspective. The idea of a penetration test is to uncover flaws while simulating real-world threats. A penetration test is performed to figure out vulnerabilities and weaknesses in the systems so that vulnerable systems can stay immune to threats and malicious activities. Achieving success in a penetration test largely depends on using the right set of tools and techniques. A penetration tester must choose the right set of tools and methodologies in order to complete a test. While talking about the best tools for penetration testing, the first one that comes to mind is Metasploit. It is considered one of the most practical tools to carry out penetration testing today. Metasploit offers a wide variety of exploits, a great exploit development environment, information gathering and web testing capabilities, and much more.

The fundamentals of Metasploit

Now that we have completed the setup of Kali Linux, let us talk about the big picture: Metasploit. Metasploit is a security project that provides exploits and tons of reconnaissance features to aid a penetration tester. Metasploit was created by H.D. Moore back in 2003, and since then, its rapid development has led it to be recognized as one of the most popular penetration testing tools. Metasploit is entirely a Ruby-driven project and offers a great deal of exploits, payloads, encoding techniques, and loads of post-exploitation features. Metasploit comes in various editions, as follows: Metasploit Pro: This edition is a commercial edition, offers tons of great features such as web application scanning and exploitation and automated exploitation, and is quite suitable for professional penetration testers and IT security teams. The Pro edition is used for advanced penetration tests and enterprise security programs. Metasploit Express: The Express edition is used for baseline penetration tests. Features in this version of Metasploit include smart exploitation, automated brute forcing of credentials, and much more. This version is quite suitable for IT security teams of small to medium-sized companies. Metasploit Community: This is a free version with reduced functionalities compared to the Express edition. However, for students and small businesses, this edition is a favorable choice. Metasploit Framework: This is a command-line version with all manual tasks such as manual exploitation, third-party import, and so on. This release is entirely suitable for developers and security researchers. You can download Metasploit from the following link: https://www.rapid7.com/products/metasploit/download/editions/ We will be using the Metasploit community and framework versions. Metasploit also offers various types of user interfaces, as follows: The graphical user interface (GUI): This has all the options available at a click of a button. This interface offers a user-friendly interface that helps to provide cleaner vulnerability management. The console interface: This is the most preferred interface and the most popular one as well. This interface provides an all-in-one approach to all the options offered by Metasploit.
This interface is also considered one of the most stable interfaces. The command-line interface: This is the more potent interface that supports the launching of exploits to activities such as payload generation. However, remembering each and every command while using the command-line interface is a difficult job. Armitage: Armitage by Raphael Mudge added a neat hacker-style GUI interface to Metasploit. Armitage offers easy vulnerability management, built-in NMAP scans, exploit recommendations, and the ability to automate features using the Cortanascripting language. Basics of Metasploit framework Before we put our hands onto the Metasploit framework, let us understand basic terminologies used in Metasploit. However, the following modules are not just terminologies but modules that are heart and soul of the Metasploit project: Exploit: This is a piece of code, which when executed, will trigger the vulnerability at the target. Payload: This is a piece of code that runs at the target after a successful exploitation is done. It defines the type of access and actions we need to gain on the target system. Auxiliary: These are modules that provide additional functionalities such as scanning, fuzzing, sniffing, and much more. Encoder: Encoders are used to obfuscate modules to avoid detection by a protection mechanism such as an antivirus or a firewall. Meterpreter: This is a payload that uses in-memory stagers based on DLL injections. It provides a variety of functions to perform at the target, which makes it a popular choice. Architecture of Metasploit Metasploit comprises of various components such as extensive libraries, modules, plugins, and tools. A diagrammatic view of the structure of Metasploit is as follows: Let's see what these components are and how they work. It is best to start with the libraries that act as the heart of Metasploit. Let's understand the use of various libraries as explained in the following table: Library name Uses REX Handles almost all core functions such as setting up sockets, connections, formatting, and all other raw functions MSF CORE Provides the underlying API and the actual core that describes the framework MSF BASE Provides friendly API support to modules We have many types of modules in Metasploit, and they differ regarding their functionality. We have payload modules for creating access channels to exploited systems. We have auxiliary modules to carry out operations such as information gathering, fingerprinting, fuzzing an application, and logging into various services. Let's examine the basic functionality of these modules, as shown in the following table: Module type Working Payloads Payloads are used to carry out operations such as connecting to or from the target system after exploitation or performing a particular task such as installing a service and so on. Payload execution is the next step after the system is exploited successfully. Auxiliary Auxiliary modules are a special kind of module that performs specific tasks such as information gathering, database fingerprinting, scanning the network to find a particular service and enumeration, and so on. Encoders Encoders are used to encode payloads and the attack vectors to (or intending to) evade detection by antivirus solutions or firewalls. NOPs NOP generators are used for alignment which results in making exploits stable. 
Exploits The actual code that triggers a vulnerability

Metasploit framework console and commands

Having gathered knowledge of the architecture of Metasploit, let us now run Metasploit to get hands-on knowledge of the commands and different modules. To start Metasploit, we first need to establish a database connection so that everything we do can be logged into the database. Usage of a database also speeds up Metasploit's load time by making use of cache and indexes for all modules. Therefore, let us start the postgresql service by typing in the following command at the terminal:

root@beast:~# service postgresql start

Now, to initialize Metasploit's database, let us initialize msfdb as shown in the following screenshot: It is clearly visible in the preceding screenshot that we have successfully created the initial database schema for Metasploit. Let us now start Metasploit's database using the following command:

root@beast:~# msfdb start

We are now ready to launch Metasploit. Let us issue msfconsole in the terminal to start Metasploit as shown in the following screenshot: Welcome to the Metasploit console; let us run the help command to see what other commands are available to us: The commands in the preceding screenshot are core Metasploit commands, which are used to set/get variables, load plugins, route traffic, unset variables, print the version, find the history of commands issued, and much more. These commands are pretty general. Let's see module-based commands as follows: Everything related to a particular module in Metasploit comes under the module controls section of the Help menu. Using the preceding commands, we can select a particular module, load modules from a particular path, get information about a module, show core and advanced options related to a module, and even edit a module inline.
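As a quick illustration of how these module controls fit together, here is a sketch of a typical console session using the TCP port scan auxiliary module referenced in the command table that follows (the target address is hypothetical, and option names vary by module, so always check show options):

msf > use auxiliary/scanner/portscan/tcp
msf auxiliary(tcp) > show options
msf auxiliary(tcp) > set RHOSTS 192.168.10.112
msf auxiliary(tcp) > set PORTS 1-1000
msf auxiliary(tcp) > run
msf auxiliary(tcp) > back
msf >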
Let us learn some basic commands in Metasploit and familiarize ourselves with their syntax and semantics:

use [auxiliary/exploit/payload/encoder]: To select a particular module. Examples: msf> use exploit/unix/ftp/vsftpd_234_backdoor, msf> use auxiliary/scanner/portscan/tcp
show [exploits/payloads/encoder/auxiliary/options]: To see the list of available modules of a particular type. Examples: msf> show payloads, msf> show options
set [options/payload]: To set a value for a particular object. Examples: msf> set payload windows/meterpreter/reverse_tcp, msf> set LHOST 192.168.10.118, msf> set RHOST 192.168.10.112, msf> set LPORT 4444, msf> set RPORT 8080
setg [options/payload]: To assign a value to a particular object globally, so the value does not change when a module is switched. Example: msf> setg RHOST 192.168.10.112
run: To launch an auxiliary module after all the required options are set. Example: msf> run
exploit: To launch an exploit. Example: msf> exploit
back: To unselect a module and move back. Example: msf(ms08_067_netapi)> back
info: To list the information related to a particular exploit/module/auxiliary. Examples: msf> info exploit/windows/smb/ms08_067_netapi, msf(ms08_067_netapi)> info
search: To find a particular module. Example: msf> search hfs
check: To check whether a particular target is vulnerable to the exploit or not. Example: msf> check
sessions: To list the available sessions. Example: msf> sessions [session number]

The following are basic Meterpreter commands:

sysinfo: To list system information of the compromised host. Example: meterpreter> sysinfo
ifconfig: To list the network interfaces on the compromised host. Examples: meterpreter> ifconfig, meterpreter> ipconfig (Windows)
arp: To list the IP and MAC addresses of hosts connected to the target. Example: meterpreter> arp
background: To send an active session to the background. Example: meterpreter> background
shell: To drop a cmd shell on the target. Example: meterpreter> shell
getuid: To get the current user details. Example: meterpreter> getuid
getsystem: To escalate privileges and gain system access. Example: meterpreter> getsystem
getpid: To get the process ID of the Meterpreter session. Example: meterpreter> getpid
ps: To list all the processes running at the target. Example: meterpreter> ps

If you are using Metasploit for the very first time, refer to http://www.offensive-security.com/metasploit-unleashed/Msfconsole_Commands for more information on basic commands.

Benefits of using Metasploit

Metasploit is an excellent choice when compared to traditional manual techniques because of certain factors, which are listed as follows:

The Metasploit framework is open source
Metasploit supports large testing networks by making use of CIDR identifiers
Metasploit offers quick generation of payloads, which can be changed or switched on the fly
Metasploit leaves the target system stable in most cases
The GUI environment provides a fast and user-friendly way to conduct penetration testing

Summary

Throughout this article, we learned the basics of Metasploit. We learned the syntax and semantics of various Metasploit commands. We also learned the benefits of using Metasploit.

Resources for Article: Further resources on this subject: Approaching a Penetration Test Using Metasploit [article] Metasploit Custom Modules and Meterpreter Scripting [article] So, what is Metasploit? [article]

Inbuilt Data Types in Python

Packt
22 Jun 2017
4 min read
This article by Benjamin Baka, author of the book Python Data Structures and Algorithms, explains the inbuilt data types in Python. Python data types can be divided into three categories: numeric, sequence, and mapping. There is also the None object that represents a Null, or absence of a value. It should not be forgotten either that other objects such as classes, files, and exceptions can also properly be considered types; however, they will not be considered here. (For more resources related to this topic, see here.)

Every value in Python has a data type. Unlike many programming languages, in Python you do not need to explicitly declare the type of a variable. Python keeps track of object types internally. Python inbuilt data types are outlined in the following table:

None: None (the null object)
Numeric: int (integer), float (floating point number), complex (complex number), bool (Boolean: True or False)
Sequences: str (string of characters), list (list of arbitrary objects), tuple (group of arbitrary items), range (a range of integers)
Mapping: dict (dictionary of key-value pairs), set (mutable, unordered collection of unique items), frozenset (immutable set)

None type

The None type is immutable and has one value, None. It is used to represent the absence of a value. It is returned by objects that do not explicitly return a value and evaluates to False in Boolean expressions. It is often used as the default value in optional arguments to allow the function to detect whether the caller has passed a value.

Numeric types

All numeric types, apart from bool, are signed and they are all immutable. Booleans have two possible values, True and False. These values are mapped to 1 and 0 respectively. The integer type, int, represents whole numbers of unlimited range. Floating point numbers are represented by the native double precision floating point representation of the machine. Complex numbers are represented by two floating point numbers. They are assigned using the j operator to signify the imaginary part of the complex number. For example:

a = 2+3j

We can access the real and imaginary parts with a.real and a.imag respectively.

Representation error

It should be noted that the native double precision representation of floating point numbers leads to some unexpected results. For example, consider the following:

In [14]: 1-0.9
Out[14]: 0.09999999999999998
In [15]: 1-0.9 == 0.1
Out[15]: False

This is a result of the fact that most decimal fractions are not exactly representable as a binary fraction, which is how most underlying hardware represents floating point numbers. For algorithms or applications where this may be an issue, Python provides a decimal module. This module allows for the exact representation of decimal numbers and facilitates greater control over properties such as rounding behaviour, number of significant digits, and precision. It defines two objects: a Decimal type, representing decimal numbers, and a Context type, representing various computational parameters such as precision, rounding, and error handling. An example of its usage can be seen in the following:

In [1]: import decimal
In [2]: x = decimal.Decimal(3.14); y = decimal.Decimal(2.74)
In [3]: x*y
Out[3]: Decimal('8.60360000000001010036498883')
In [4]: decimal.getcontext().prec = 4
In [5]: x * y
Out[5]: Decimal('8.604')

Here we have created a global context and set the precision to 4. The Decimal object can be treated pretty much as you would treat an int or a float.
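To make the contrast concrete, here is a small illustrative sketch (standard library only; the values are arbitrary) showing that the subtraction that fails the equality test with binary floats behaves as expected when the operands are Decimal objects built from strings:

from decimal import Decimal

# With binary floats, 1 - 0.9 is not exactly 0.1
print(1 - 0.9 == 0.1)                                   # False

# Building Decimal values from strings keeps them exact
print(Decimal('1') - Decimal('0.9'))                    # 0.1
print(Decimal('1') - Decimal('0.9') == Decimal('0.1'))  # True

Note that the Decimal values here are constructed from strings; Decimal(0.9) would inherit the binary float's rounding error and the comparison would fail again.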
Decimal objects are subject to all the same mathematical operations as the other numeric types and can be used as dictionary keys, placed in sets, and so on. In addition, Decimal objects also have several methods for mathematical operations, such as natural exponents, x.exp(), natural logarithms, x.ln(), and base 10 logarithms, x.log10(). Python also has a fractions module that implements a rational number type. The following shows several ways to create fractions:

In [62]: import fractions
In [63]: fractions.Fraction(3, 4)   # creates the fraction 3/4
Out[63]: Fraction(3, 4)
In [64]: fractions.Fraction(0.5)    # creates a fraction from a float
Out[64]: Fraction(1, 2)
In [65]: fractions.Fraction('.25')  # creates a fraction from a string
Out[65]: Fraction(1, 4)

It is also worth mentioning here the NumPy extension. This has types for mathematical objects such as arrays, vectors, and matrices, and capabilities for linear algebra, calculation of Fourier transforms, eigenvectors, logical operations, and much more.

Summary

We have looked at the built-in data types and some internal Python modules, most notably the decimal and fractions modules. There are also a number of external libraries, such as the SciPy stack, that build further on these types.

Resources for Article: Further resources on this subject: Python Data Structures [article] Getting Started with Python Packages [article] An Introduction to Python Lists and Dictionaries [article]

Understanding Microservices

Packt
22 Jun 2017
19 min read
This article by Tarek Ziadé, author of the book Python Microservices Development explains the benefits and implementation of microservices with Python. While the microservices architecture looks more complicated than its monolithic counterpart, its advantages are multiple. It offers the following benefits. (For more resources related to this topic, see here.) Separation of concerns First of all, each microservice can be developed independently by a separate team. For instance, building a reservation service can be a full project on its own. The team in charge can make it in whatever programming language and database, as long as it has a well-documented HTTP API. That also means the evolution of the app is more under control than with monoliths. For example, if the payment system changes its underlying interactions with the bank, the impact is localized inside that service and the rest of the application stays stable and under control. This loose coupling improves a lot the overall project velocity as we're applying at the service level a similar philosophy than the single responsibility principle. The single responsibility principle was defined by Robert Martin to explain that a class should have only one reason to change - in other words, each class should be providing a single, well-defined feature. Applied to microservices, it means that we want to make sure that each microservice focuses on a single role. Smaller projects The second benefit is breaking the complexity of the project. When you are adding a feature to an application like the PDF reporting, even if you are doing it cleanly, you are making the base code bigger, more complicated and sometimes slower. Building that feature in a separate application avoids this problem, and makes it easier to write it with whatever tools you want. You can refactor it often and shorten your release cycles, and stay on the top of things. The growth of the application remains under your control. Dealing with a smaller project also reduces risks when improving the application: if a team wants to try out the latest programming language or framework, they can iterate quickly on a prototype that implements the same microservice API, try it out, and decide whether or not to stick with it. One real-life example in mind is the Firefox Sync storage microservice. There are currently some experiments to switch from the current Python+MySQL implementation to a Go based one that stores users data in standalone SQLite databases. That prototype is highly experimental, but since we have isolated the storage feature in a microservice with a well-defined HTTP API, it's easy enough to give it a try with a small subset of the user base. Scaling and deployment Last, having your application split into components makes it easier to scale depending on your constraints. Let's say you are starting to get a lot of customers that are booking hotels daily, and the PDF generation is starting to heat up the CPUs. You can deploy that specific microservice in some servers that have bigger CPUs. Another typical example is RAM-consuming microservices like the ones that are interacting with memory databases like Redis or Memcache. You could tweak your deployments consequently by deploying them on servers with less CPU and a lot more RAM. To summarize microservices benefits: A team can develop each microservice independently, and use whatever technological stack makes sense. They can define a custom release cycle. The tip of the iceberg is its language agnostic HTTP API. 
Developers break the application complexity into logical components. Each microservice focuses on doing one thing well. Since microservices are standalone applications, there's finer control over deployments, which makes scaling easier.

Microservices architectures are good at solving a lot of the problems that may arise once your application starts to grow. However, we need to be aware of some of the new issues they also bring in practice.

Implementing microservices with Python

Python is an amazingly versatile language. As you probably already know, it's used to build many different kinds of applications, from simple system scripts that perform tasks on a server, to large object-oriented applications that run services for millions of users. According to a study conducted by Philip Guo in 2014, published on the Association for Computing Machinery (ACM) website, Python has surpassed Java in top U.S. universities and is the most popular language to learn Computer Science. This trend is also true in the software industry. Python now sits in the top 5 languages in the TIOBE index (http://www.tiobe.com/tiobe-index/), and it's probably even bigger in web development land, since languages like C are rarely used as main languages to build web applications. However, some developers criticize Python for being slow and unfit for building efficient web services. Python is slow, and this is undeniable. But it's still a language of choice for building microservices, and many major companies are happily using it. This section will give you some background on the different ways you can write microservices using Python, some insights on asynchronous versus synchronous programming, and conclude with some details on Python performance. It's composed of five parts: The WSGI standard; Greenlet & Gevent; Twisted & Tornado; asyncio; Language performances.

The WSGI standard

What strikes most web developers who are starting with Python is how easy it is to get a web application up and running. The Python web community has created a standard, inspired by the Common Gateway Interface (CGI), called the Web Server Gateway Interface (WSGI), which greatly simplifies how you can write a Python application whose goal is to serve HTTP requests. When your code uses that standard, your project can be executed by standard web servers like Apache or nginx, using WSGI extensions like uwsgi or mod_wsgi. Your application just has to deal with incoming requests and send back JSON responses, and Python includes all that goodness in its standard library. You can create a fully functional microservice that returns the server's local time with a vanilla Python module of fewer than ten lines:

import json
import time

def application(environ, start_response):
    headers = [('Content-type', 'application/json')]
    start_response('200 OK', headers)
    return [bytes(json.dumps({'time': time.time()}), 'utf8')]

Since its introduction, the WSGI protocol has become an essential standard, and the Python web community has widely adopted it. Developers wrote middlewares, which are functions you can hook before or after the WSGI application function itself, to do something within the environment. Some web frameworks were created specifically around that standard, like Bottle (http://bottlepy.org), and soon enough every framework out there could be used through WSGI in one way or another. The biggest problem with WSGI, though, is its synchronous nature.
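To see that synchronous model in action, here is a minimal sketch that serves the module above with wsgiref, the reference WSGI server shipped in the standard library; it is meant for local experimentation only (the port number is an arbitrary choice), not as a substitute for uwsgi or mod_wsgi:

from wsgiref.simple_server import make_server
import json
import time

def application(environ, start_response):
    # The same WSGI callable as above: one environ dict in, an iterable of bytes out
    headers = [('Content-type', 'application/json')]
    start_response('200 OK', headers)
    return [bytes(json.dumps({'time': time.time()}), 'utf8')]

if __name__ == '__main__':
    # Serve on localhost:8000 and block, handling one request at a time
    httpd = make_server('', 8000, application)
    httpd.serve_forever()

Running the file and hitting http://localhost:8000/ returns the JSON time payload; while one request is being handled, the next one waits, which is exactly the behavior discussed next.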
The application function you see above is called exactly once per incoming request, and when the function returns, it has to send back the response. That means that every time you are calling the function, it will block until the response is ready. And writing microservices means your code will be waiting for responses from various network resources all the time. In other words, your application will idle and just block the client until everything is ready. That's an entirely okay behavior for HTTP APIs. We're not talking about building bidirectional applications like web socket based ones. But what happens when you have several incoming requests that are calling your application at the same time? WSGI servers will let you run a pool of threads to serve several requests concurrently. But you can't run thousands of them, and as soon as the pool is exhausted, the next request will be blocking even if your microservice is doing nothing but idling and waiting for backend services responses. That's one of the reasons why non-WSGI frameworks like Twisted, Tornado and in Javascript land Node.js became very successful - it's fully async. When you're coding a Twisted application, you can use callbacks to pause and resume the work done to build a response. That means you can accept new requests and start to treat them. That model dramatically reduces the idling time in your process. It can serve thousands of concurrent requests. Of course, that does not mean the application will return each single response faster. It just means one process can accept more concurrent requests and juggle between them as the data is getting ready to be sent back. There's no simple way with the WSGI standard to introduce something similar, and the community has debated for years to come up with a consensus - and failed. The odds are that the community will eventually drop the WSGI standard for something else. In the meantime, building microservices with synchronous frameworks is still possible and completely fine if your deployments take into account the one request == one thread limitation of the WSGI standard. There's, however, one trick to boost synchronous web applications: greenlets. Greenlet & Gevent The general principle of asynchronous programming is that the process deals with several concurrent execution contexts to simulate parallelism. Asynchronous applications are using an event loop that pauses and resumes execution contexts when an event is triggered - only one context is active, and they take turns. Explicit instruction in the code will tell the event loop that this is where it can pause the execution. When that occurs, the process will look for some other pending work to resume. Eventually, the process will come back to your function and continue it where it stopped - moving from an execution context to another is called switching. The Greenlet project (https://github.com/python-greenlet/greenlet) is a package based on the Stackless project, a particular CPython implementation, and provides greenlets. Greenlets are pseudo-threads that are very cheap to instantiate, unlike real threads, and that can be used to call python functions. Within those functions, you can switch and give back the control to another function. The switching is done with an event loop and allows you to write an asynchronous application using a Thread-like interface paradigm. 
Here's an example from the Greenlet documentation def test1(x, y): z = gr2.switch(x+y) print z def test2(u): print u gr1.switch(42) gr1 = greenlet(test1) gr2 = greenlet(test2) gr1.switch("hello", " world") The two greenlets are explicitly switching from one to the other. For building microservices based on the WSGI standard, if the underlying code was using greenlets we could accept several concurrent requests and just switch from one to another when we know a call is going to block the request - like performing a SQL query. Although, switching from one greenlet to another has to be done explicitly, and the resulting code can quickly become messy and hard to understand. That's where Gevent can become very useful. The Gevent project (http://www.gevent.org/) is built on the top of Greenlet and offers among other things an implicit and automatic way of switching between greenlets. It provides a cooperative version of the socket module that will use greenlets to automatically pause and resume the execution when some data is made available in the socket. There's even a monkey patch feature that will automatically replace the standard lib socket with Gevent's version. That makes your standard synchronous code magically asynchronous every time it uses sockets - with just one extra line. from gevent import monkey; monkey.patch_all() def application(environ, start_response): headers = [('Content-type', 'application/json')] start_response('200 OK', headers) # ...do something with sockets here... return result This implicit magic comes with a price, though. For Gevent to work well, all the underlying code needs to be compatible with the patching Gevent is doing. Some packages from the community will continue to block or even have unexpected results because of this. In particular, if they use C extensions and bypass some of the features of the standard library Gevent patched. But for most cases, it works well. Projects that are playing well with Gevent are dubbed "green," and when a library is not functioning well, and the community asks its authors to "make it green," it usually happens. That's what was used to scale the Firefox Sync service at Mozilla for instance. Twisted and Tornado If you are building microservices where increasing the number of concurrent requests you can hold is important, it's tempting to drop the WSGI standard and just use an asynchronous framework like Tornado (http://www.tornadoweb.org/) or Twisted (https://twistedmatrix.com/trac/). Twisted has been around for ages. To implement the same microservices you need to write a slightly more verbose code: import time from twisted.web import server, resource from twisted.internet import reactor, endpoints class Simple(resource.Resource): isLeaf = True def render_GET(self, request): request.responseHeaders.addRawHeader(b"content-type", b"application/json") return bytes(json.dumps({'time': time.time()}), 'utf8') site = server.Site(Simple()) endpoint = endpoints.TCP4ServerEndpoint(reactor, 8080) endpoint.listen(site) reactor.run() While Twisted is an extremely robust and efficient framework, it suffers from a few problems when building HTTP microservices: You need to implement each endpoint in your microservice with a class derived from a Resource class, and that implements each supported method. For a few simple APIs, it adds a lot of boilerplate code. Twisted code can be hard to understand & debug due to its asynchronous nature. 
It's easy to fall into callback hell when you're chaining too many functions that are getting triggered successively one after the other - and the code can get messy Properly testing your Twisted application is hard, and you have to use Twisted-specific unit testing model. Tornado is based on a similar model but is doing a better job in some areas. It has a lighter routing system and does everything possible to make the code closer to plain Python. Tornado is also using a callback model, so debugging can be hard. But both frameworks are working hard at bridging the gap to rely on the new async features introduced in Python 3. asyncio When Guido van Rossum started to work on adding async features in Python 3, part of the community pushed for a Gevent-like solution because it made a lot of sense to write applications in a synchronous, sequential fashion - rather than having to add explicit callbacks like in Tornado or Twisted. But Guido picked the explicit technique and experimented in a project called Tulip that Twisted inspired. Eventually, asyncio was born out of that side project and added into Python. In hindsight, implementing an explicit event loop mechanism in Python instead of going the Gevent way makes a lot of sense. The way the Python core developers coded asyncio and how they elegantly extended the language with the async and await keywords to implement coroutines, made asynchronous applications built with vanilla Python 3.5+ code look very elegant and close to synchronous programming. By doing this, Python did a great job at avoiding the callback syntax mess we sometimes see in Node.js or Twisted (Python 2) applications. And beyond coroutines, Python 3 has introduced a full set of features and helpers in the asyncio package to build asynchronous applications, see https://docs.python.org/3/library/asyncio.html. Python is now as expressive as languages like Lua to create coroutine-based applications, and there are now a few emerging frameworks that have embraced those features and will only work with Python 3.5+ to benefit from this. KeepSafe's aiohttp (http://aiohttp.readthedocs.io) is one of them, and building the same microservice, fully asynchronous, with it would simply be these few elegant lines. from aiohttp import web import time async def handle(request): return web.json_response({'time': time.time()}) if __name__ == '__main__': app = web.Application() app.router.add_get('/', handle) web.run_app(app) In this small example, we're very close to how we would implement a synchronous app. The only hint we're async is the async keyword marking the handle function as being a coroutine. And that's what's going to be used at every level of an async Python app going forward. Here's another example using aiopg - a Postgresql lib for asyncio. From the project documentation: import asyncio import aiopg dsn = 'dbname=aiopg user=aiopg password=passwd host=127.0.0.1' async def go(): pool = await aiopg.create_pool(dsn) async with pool.acquire() as conn: async with conn.cursor() as cur: await cur.execute("SELECT 1") ret = [] async for row in cur: ret.append(row) assert ret == [(1,)] loop = asyncio.get_event_loop() loop.run_until_complete(go()) With a few async and await prefixes, the function that's performing a SQL query and send back the result looks a lot like a synchronous function. 
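Both of those examples rely on third-party packages, but the switching machinery itself needs nothing outside the standard library; the following sketch (arbitrary names and sleep durations, purely illustrative) shows two coroutines sharing a single event loop and taking turns whenever they await:

import asyncio

async def worker(name, delay):
    # awaiting hands control back to the event loop while this coroutine "waits"
    print(name, 'started')
    await asyncio.sleep(delay)
    print(name, 'finished after', delay, 'seconds')

async def main():
    # Both coroutines run concurrently within a single thread
    await asyncio.gather(worker('first', 2), worker('second', 1))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

The second worker finishes before the first one even though it was scheduled after it, with no threads or callbacks in sight.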
But asynchronous frameworks and libraries based on Python 3 are still emerging, and if you are using asyncio or a framework like aiohttp, you will need to stick with particular asynchronous implementations for each feature you need. If you require using a library that is not asynchronous in your code, using it from your asynchronous code means you will need to go through some extra and challenging work if you want to prevent blocking the event loop. If your microservices are dealing with a limited number of resources, it could be manageable. But it's probably a safer bet at this point (2017) to stick with a synchronous framework that's been around for a while rather than an asynchronous one. Let's enjoy the existing ecosystem of mature packages, and wait until the asyncio ecosystem gets more sophisticated. And there are many great synchronous frameworks to build microservices with Python, like Bottle, Pyramid with Cornice or Flask. Language performances In the previous sections we've been through the two different ways to write microservices - asynchronous vs. synchronous, and whatever technique you are using, the speed of Python is directly impacting the performance of your microservice. Of course, everyone knows Python is slower than Java or Go - but execution speed is not always the top priority. A microservice is often a thin layer of code that is sitting most of its life waiting for some network responses from other services. Its core speed is usually less important than how fast your SQL queries will take to return from your Postgres server because the latter will represent most of the time spent to build the response. But wanting an application that's as fast as possible is legitimate. One controversial topic in the Python community around speeding up the language is how the Global Interpreter Lock (GIL) mutex can ruin performances because multi-threaded applications cannot use several processes. The GIL has good reasons to exist. It protects non thread-safe parts of the CPython interpreter and exists in other languages like Ruby. And all attempts to remove it so far have failed to produce a faster CPython implementation. Larry Hasting is working on a GIL-free CPython project called Gilectomy - https://github.com/larryhastings/gilectomy - its minimal goal is to come up with a GIL-free implementation that can run a single-threaded application as fast as CPython. As of today (2017), this implementation is still slower that CPython. But it's interesting to follow this work and see if it reaches speed parity one day. That would make a GIL-free CPython very appealing. For microservices, besides preventing the usage of multiple cores in the same process, the GIL will slightly degrade performances on high load, because of the system calls overhead introduced by the mutex. Although, all the scrutiny around the GIL had one beneficial impact: some work has been done in the past years to reduce its contention in the interpreter, and in some area, Python performances have improved a lot. But bear in mind that even if the core team removes the GIL, Python is an interpreted language and the produced code will never be very efficient at execution time. Python provides the dis module if you are interested to see how the interpreter decomposes a function. In the example below, the interpreter will decompose a simple function that yields incremented values from a sequence in no less than 29 steps! >>> def myfunc(data): ... for value in data: ... yield value + 1 ... 
>>> import dis >>> dis.dis(myfunc) 2 0 SETUP_LOOP 23 (to 26) 3 LOAD_FAST 0 (data) 6 GET_ITER >> 7 FOR_ITER 15 (to 25) 10 STORE_FAST 1 (value) 3 13 LOAD_FAST 1 (value) 16 LOAD_CONST 1 (1) 19 BINARY_ADD 20 YIELD_VALUE 21 POP_TOP 22 JUMP_ABSOLUTE 7 >> 25 POP_BLOCK >> 26 LOAD_CONST 0 (None) 29 RETURN_VALUE A similar function written in a statically compiled language will dramatically reduce the number of operations required to produce the same result. There are ways to speed up Python execution, though. One is to write part of your code into compiled code by building C extensions or using a static extension of the language like Cython (http://cython.org/) - but that makes your code more complicated. Another solution, which is the most promising one, is by simply running your application using the PyPy interpreter (http://pypy.org/). PyPy implements a Just-In-Time compiler (JIT). This compiler is directly replacing at run time pieces of Python with machine code that can be directly used by the CPU. The whole trick for the JIT is to detect in real time, ahead of the execution, when and how to do it. Even if PyPy is always a few Python versions behind CPython, it reached a point where you can use it in production, and its performances can be quite amazing. In one of our projects at Mozilla that needs fast execution, the PyPy version was almost as fast as the Go version, and we've decided to use Python there instead. The Pypy Speed Center website is a great place to look at how PyPy compares to CPython - http://speed.pypy.org/ However, if your program uses C extensions, you will need to recompile them for PyPy, and that can be a problem. In particular, if other developers maintain some of the extensions you are using. But if you are building your microservice with a standard set of libraries, the chances are that will it work out of the box with the PyPy interpreter, so that's worth a try. In any case, for most projects, the benefits of Python and its ecosystem largely surpasses the performances issues described in this section because the overhead in a microservice is rarely a problem. Summary In this article we saw that Python is considered to be one of the best languages to write web applications, and therefore microservices - for the same reasons, it's a language of choice in other areas and also because it provides tons of mature frameworks and packages to do the work. Resources for Article: Further resources on this subject: Inbuilt Data Types in Python [article] Getting Started with Python Packages [article] Layout Management for Python GUI [article]

Tangled Web? Not At All!

Packt
22 Jun 2017
20 min read
In this article by Clif Flynt, the author of the book Linux Shell Scripting Cookbook - Third Edition, we can see a collection of shell-scripting recipes that talk to services on the Internet. This articleis intended to help readers understand how to interact with the Web using shell scripts to automate tasks such as collecting and parsing data from web pages. This is discussed using POST and GET to web pages, writing clients to web services. (For more resources related to this topic, see here.) In this article, we will cover the following recipes: Downloading a web page as plain text Parsing data from a website Image crawler and downloader Web photo album generator Twitter command-line client Tracking changes to a website Posting to a web page and reading response Downloading a video from the Internet The Web has become the face of technology and the central access point for data processing. The primary interface to the web is via a browser that's designed for interactive use. That's great for searching and reading articles on the web, but you can also do a lot to automate your interactions with shell scripts. For instance, instead of checking a website daily to see if your favorite blogger has added a new blog, you can automate the check and be informed when there's new information. Similarly, twitter is the current hot technology for getting up-to-the-minute information. But if I subscribe to my local newspaper's twitter account because I want the local news, twitter will send me all news, including high-school sports that I don't care about. With a shell script, I can grab the tweets and customize my filters to match my desires, not rely on their filters. Downloading a web page as plain text Web pages are simply text with HTML tags, JavaScript and CSS. The HTML tags define the content of a web page, which we can parse for specific content. Bash scripts can parse web pages. An HTML file can be viewed in a web browser to see it properly formatted. Parsing a text document is simpler than parsing HTML data because we aren't required to strip off the HTML tags. Lynx is a command-line web browser which download a web page as plaintext. Getting Ready Lynx is not installed in all distributions, but is available via the package manager. # yum install lynx or apt-get install lynx How to do it... Let's download the webpage view, in ASCII character representation, in a text file by using the -dump flag with the lynx command: $ lynx URL -dump > webpage_as_text.txt This command will list all the hyperlinks <a href="link"> separately under a heading References, as the footer of the text output. This lets us parse links separately with regular expressions. For example: $lynx -dump http://google.com > plain_text_page.txt You can see the plaintext version of text by using the cat command: $ cat plain_text_page.txt Search [1]Images [2]Maps [3]Play [4]YouTube [5]News [6]Gmail [7]Drive [8]More » [9]Web History | [10]Settings | [11]Sign in [12]St. Patrick's Day 2017 _______________________________________________________ Google Search I'm Feeling Lucky [13]Advanced search [14]Language tools [15]Advertising Programs [16]Business Solutions [17]+Google [18]About Google © 2017 - [19]Privacy - [20]Terms References Parsing data from a website The lynx, sed, and awk commands can be used to mine data from websites. How to do it... 
Let's go through the commands used to parse details of actresses from the website: $ lynx -dump -nolist http://www.johntorres.net/BoxOfficefemaleList.html | grep -o "Rank-.*" | sed -e 's/ *Rank-([0-9]*) *(.*)/1t2/' | sort -nk 1 > actresslist.txt The output is: # Only 3 entries shown. All others omitted due to space limits 1 Keira Knightley 2 Natalie Portman 3 Monica Bellucci How it works... Lynx is a command-line web browser—it can dump a text version of a website as we would see in a web browser, instead of returning the raw html as wget or cURL do. This saves the step of removing HTML tags. The -nolist option shows the links without numbers. Parsing and formatting the lines that contain Rank is done with sed: sed -e 's/ *Rank-([0-9]*) *(.*)/1t2/' These lines are then sorted according to the ranks. See also The Downloading a web page as plain text recipe in this article explains the lynx command. Image crawler and downloader Image crawlers download all the images that appear in a web page. Instead of going through the HTML page by hand to pick the images, we can use a script to identify the images and download them automatically. How to do it... This Bash script will identify and download the images from a web page: #!/bin/bash #Desc: Images downloader #Filename: img_downloader.sh if [ $# -ne 3 ]; then echo "Usage: $0 URL -d DIRECTORY" exit -1 fi while [ -n $1 ] do case $1 in -d) shift; directory=$1; shift ;; *) url=$1; shift;; esac done mkdir -p $directory; baseurl=$(echo $url | egrep -o "https?://[a-z.-]+") echo Downloading $url curl -s $url | egrep -o "<imgsrc=[^>]*>" | sed's/<imgsrc="([^"]*).*/1/g' | sed"s,^/,$baseurl/,"> /tmp/$$.list cd $directory; while read filename; do echo Downloading $filename curl -s -O "$filename" --silent done < /tmp/$$.list An example usage is: $ ./img_downloader.sh http://www.flickr.com/search/?q=linux -d images How it works... The image downloader script reads an HTML page, strips out all tags except <img>, parses src="URL" from the <img> tag, and downloads them to the specified directory. This script accepts a web page URL and the destination directory as command-line arguments. The [ $# -ne 3 ] statement checks whether the total number of arguments to the script is three, otherwise it exits and returns a usage example. Otherwise, this code parses the URL and destination directory: while [ -n "$1" ] do case $1 in -d) shift; directory=$1; shift ;; *) url=${url:-$1}; shift;; esac done The while loop runs until all the arguments are processed. The shift command shifts arguments to the left so that $1 will take the next argument's value; that is, $2, and so on. Hence, we can evaluate all arguments through $1 itself. The case statement checks the first argument ($1). If that matches -d, the next argument must be a directory name, so the arguments are shifted and the directory name is saved. If the argument is any other string it is a URL. The advantage of parsing arguments in this way is that we can place the -d argument anywhere in the command line: $ ./img_downloader.sh -d DIR URL Or: $ ./img_downloader.sh URL -d DIR The egrep -o "<imgsrc=[^>]*>"code will print only the matching strings, which are the <img> tags including their attributes. The phrase [^>]*matches all the characters except the closing >, that is, <imgsrc="image.jpg">. sed's/<imgsrc="([^"]*).*/1/g' extracts the url from the string src="url". There are two types of image source paths—relative and absolute. Absolute paths contain full URLs that start with http:// or https://. 
Relative URLs start with / or the image name itself. An example of an absolute URL is http://example.com/image.jpg. An example of a relative URL is /image.jpg. For relative URLs, the starting / should be replaced with the base URL to transform it to http://example.com/image.jpg. The script initializes the baseurl by extracting it from the initial url with the command:

baseurl=$(echo $url | egrep -o "https?://[a-z.-]+")

The output of the previously described sed command is piped into another sed command to replace a leading / with the baseurl, and the results are saved in a file named for the script's PID: /tmp/$$.list.

sed "s,^/,$baseurl/," > /tmp/$$.list

The final while loop iterates through each line of the list and uses curl to download the images. The --silent argument is used with curl to avoid extra progress messages from being printed on the screen.

Web photo album generator

Web developers frequently create photo albums of full-sized and thumbnail images. When a thumbnail is clicked, a large version of the picture is displayed. This requires resizing and placing many images. These actions can be automated with a simple bash script. The script creates thumbnails, places them in exact directories, and generates the code fragment for <img> tags automatically.

Getting ready

This script uses a for loop to iterate over every image in the current directory. The usual Bash utilities such as cat and convert (from the ImageMagick package) are used. These will generate an HTML album, using all the images, in index.html.

How to do it...

This Bash script will generate an HTML album page:

#!/bin/bash
#Filename: generate_album.sh
#Description: Create a photo album using images in current directory
echo "Creating album.."
mkdir -p thumbs
cat <<EOF1 > index.html
<html>
<head>
<style>
body { width:470px; margin:auto; border: 1px dashed grey; padding:10px; }
img { margin:5px; border: 1px solid black; }
</style>
</head>
<body>
<center><h1> #Album title </h1></center>
<p>
EOF1
for img in *.jpg; do
  convert "$img" -resize "100x" "thumbs/$img"
  echo "<a href="$img">" >> index.html
  echo "<img src="thumbs/$img" title="$img" /></a>" >> index.html
done
cat <<EOF2 >> index.html
</p>
</body>
</html>
EOF2
echo Album generated to index.html

Run the script as follows:

$ ./generate_album.sh
Creating album..
Album generated to index.html

How it works...

The initial part of the script is used to write the header part of the HTML page. The following script redirects all the contents up to EOF1 to index.html:

cat <<EOF1 > index.html
contents...
EOF1

The header includes the HTML and CSS styling. for img in *.jpg; iterates over the file names and evaluates the body of the loop. convert "$img" -resize "100x" "thumbs/$img" creates images of 100 px width as thumbnails.
The following statements generate the required <img> tag and appends it to index.html: echo "<a href="$img">" echo "<imgsrc="thumbs/$img" title="$img" /></a>">> index.html Finally, the footer HTML tags are appended with cat as done in the first part of the script. Twitter command-line client Twitter is the hottest micro-blogging platform, as well as the latest buzz of the online social media now. We can use Twitter API to read tweets on our timeline from the command line! Twitter is the hottest micro-blogging platform, as well as the latest buzz of the online social media now. We can use Twitter API to read tweets on our timeline from the command line! Let's see how to do it. Getting ready Recently, Twitter stopped allowing people to log in by using plain HTTP Authentication, so we must use OAuth to authenticate ourselves.  Perform the following steps: Download the bash-oauth library from https://github.com/livibetter/bash-oauth/archive/master.zip, and unzip it to any directory. Go to that directory and then inside the subdirectory bash-oauth-master, run make install-all as root.Go to https://apps.twitter.com/ and register a new app. This will make it possible to use OAuth. After registering the new app, go to your app's settings and change Access type to Read and Write. Now, go to the Details section of the app and note two things—Consumer Key and Consumer Secret, so that you can substitute these in the script we are going to write. Great, now let's write the script that uses this. How to do it... This Bash script uses the OAuth library to read tweets or send your own updates. #!/bin/bash #Filename: twitter.sh #Description: Basic twitter client oauth_consumer_key=YOUR_CONSUMER_KEY oauth_consumer_scret=YOUR_CONSUMER_SECRET config_file=~/.$oauth_consumer_key-$oauth_consumer_secret-rc if [[ "$1" != "read" ]] && [[ "$1" != "tweet" ]]; then echo -e "Usage: $0 tweet status_messagen ORn $0 readn" exit -1; fi #source /usr/local/bin/TwitterOAuth.sh source bash-oauth-master/TwitterOAuth.sh TO_init if [ ! -e $config_file ]; then TO_access_token_helper if (( $? == 0 )); then echo oauth_token=${TO_ret[0]} > $config_file echo oauth_token_secret=${TO_ret[1]} >> $config_file fi fi source $config_file if [[ "$1" = "read" ]]; then TO_statuses_home_timeline'''YOUR_TWEET_NAME''10' echo $TO_ret | sed's/,"/n/g' | sed's/":/~/' | awk -F~ '{} {if ($1 == "text") {txt=$2;} else if ($1 == "screen_name") printf("From: %sn Tweet: %snn", $2, txt);} {}' | tr'"''' elif [[ "$1" = "tweet" ]]; then shift TO_statuses_update''"$@" echo 'Tweeted :)' fi Run the script as follows: $./twitter.sh read Please go to the following link to get the PIN: https://api.twitter.com/oauth/authorize?oauth_token=LONG_TOKEN_STRING PIN: PIN_FROM_WEBSITE Now you can create, edit and present Slides offline. - by A Googler $./twitter.sh tweet "I am reading Packt Shell Scripting Cookbook" Tweeted :) $./twitter.sh read | head -2 From: Clif Flynt Tweet: I am reading Packt Shell Scripting Cookbook How it works... First of all, we use the source command to include the TwitterOAuth.sh library, so we can use its functions to access Twitter. The TO_init function initializes the library. Every app needs to get an OAuth token and token secret the first time it is used. If these are not present, we use the library function TO_access_token_helper to acquire them. Once we have the tokens, we save them to a config file so we can simply source it the next time the script is run. The library function TO_statuses_home_timeline fetches the tweets from Twitter. 
This data is retuned as a single long string in JSON format, which starts like this: [{"created_at":"Thu Nov 10 14:45:20 +0000 "016","id":7...9,"id_str":"7...9","text":"Dining... Each tweet starts with the created_at tag and includes a text and a screen_nametag. The script will extract the text and screen name data and display only those fields. The script assigns the long string to the variable TO_ret. The JSON format uses quoted strings for the key and may or may not quote the value. The key/value pairs are separated by commas, and the key and value are separated by a colon :. The first sed to replaces each," character set with a newline, making each key/value a separate line. These lines are piped to another sed command to replace each occurrence of ": with a tilde ~ which creates a line like screen_name~"Clif_Flynt" The final awk script reads each line. The -F~ option splits the line into fields at the tilde, so $1 is the key and $2 is the value. The if command checks for text or screen_name. The text is first in the tweet, but it's easier to read if we report the sender first, so the script saves a text return until it sees a screen_name, then prints the current value of $2 and the saved value of the text. The TO_statuses_updatelibrary function generates a tweet. The empty first parameter defines our message as being in the default format, and the message is a part of the second parameter. Tracking changes to a website Tracking website changes is useful to both web developers and users. Checking a website manually impractical, but a change tracking script can be run at regular intervals. When a change occurs, it generate a notification. Getting ready Tracking changes in terms of Bash scripting means fetching websites at different times and taking the difference by using the diff command. We can use curl and diff to do this. How to do it... This bash script combines different commands, to track changes in a webpage: #!/bin/bash #Filename: change_track.sh #Desc: Script to track changes to webpage if [ $# -ne 1 ]; then echo -e "$Usage: $0 URLn" exit 1; fi first_time=0 # Not first time if [ ! -e "last.html" ]; then first_time=1 # Set it is first time run fi curl --silent $1 -o recent.html if [ $first_time -ne 1 ]; then changes=$(diff -u last.html recent.html) if [ -n "$changes" ]; then echo -e "Changes:n" echo "$changes" else echo -e "nWebsite has no changes" fi else echo "[First run] Archiving.." fi cp recent.html last.html Let's look at the output of the track_changes.sh script on a website you control. First we'll see the output when a web page is unchanged, and then after making changes. Note that you should change MyWebSite.org to your website name. First, run the following command: $ ./track_changes.sh http://www.MyWebSite.org [First run] Archiving.. Second, run the command again. $ ./track_changes.sh http://www.MyWebSite.org Website has no changes Third, run the following command after making changes to the web page: $ ./track_changes.sh http://www.MyWebSite.org Changes: --- last.html 2010-08-01 07:29:15.000000000 +0200 +++ recent.html 2010-08-01 07:29:43.000000000 +0200 @@ -1,3 +1,4 @@ +added line :) data How it works... The script checks whether the script is running for the first time by using [ ! -e "last.html" ];. If last.html doesn't exist, it means that it is the first time and, the webpage must be downloaded and saved as last.html. If it is not the first time, it downloads the new copy recent.html and checks the difference with the diff utility. 
Any changes will be displayed as diff output.Finally, recent.html is copied to last.html. Note that changing the website you're checking will generate a huge diff file the first time you examine it. If you need to track multiple pages, you can create a folder for each website you intend to watch. Posting to a web page and reading the response POST and GET are two types of requests in HTTP to send information to, or retrieve information from a website. In a GET request, we send parameters (name-value pairs) through the webpage URL itself. The POST command places the key/value pairs in the message body instead of the URL. POST is commonly used when submitting long forms or to conceal the information submitted from a casual glance. Getting ready For this recipe, we will use the sample guestbook website included in the tclhttpd package.  You can download tclhttpd from http://sourceforge.net/projects/tclhttpd and then run it on your local system to create a local webserver. The guestbook page requests a name and URL which it adds to a guestbook to show who has visited a site when the user clicks the Add me to your guestbook button. This process can be automated with a single curl (or wget) command. How to do it... Download the tclhttpd package and cd to the bin folder. Start the tclhttpd daemon with this command: tclsh httpd.tcl The format to POST and read the HTML response from generic website resembles this: $ curl URL -d "postvar=postdata2&postvar2=postdata2" Consider the following example: $ curl http://127.0.0.1:8015/guestbook/newguest.html -d "name=Clif&url=www.noucorp.com&http=www.noucorp.com" curl prints a response page like this: <HTML> <Head> <title>Guestbook Registration Confirmed</title> </Head> <Body BGCOLOR=white TEXT=black> <a href="www.noucorp.com">www.noucorp.com</a> <DL> <DT>Name <DD>Clif <DT>URL <DD> </DL> www.noucorp.com </Body> -d is the argument used for posting. The string argument for -d is similar to the GET request semantics. var=value pairs are to be delimited by &. You can POST the data using wget by using --post-data "string". For example: $ wgethttp://127.0.0.1:8015/guestbook/newguest.cgi --post-data "name=Clif&url=www.noucorp.com&http=www.noucorp.com" -O output.html Use the same format as cURL for name-value pairs. The text in output.html is the same as that returned by the cURL command. The string to the post arguments (for example, to -d or --post-data) should always be given in quotes. If quotes are not used, & is interpreted by the shell to indicate that this should be a background process. How to do it... If you look at the website source (use the View Source option from the web browser), you will see an HTML form defined, similar to the following code: <form action="newguest.cgi"" method="post"> <ul> <li> Name: <input type="text" name="name" size="40"> <li> Url: <input type="text" name="url" size="40"> <input type="submit"> </ul> </form> Here, newguest.cgi is the target URL. When the user enters the details and clicks on the Submit button, the name and url inputs are sent to newguest.cgi as a POST request, and the response page is returned to the browser. Downloading a video from the internet There are many reasons for downloading a video. If you are on a metered service, you might want to download videos during off-hours when the rates are cheaper. You might want to watch videos where the bandwidth doesn't support streaming, or you might just want to make certain that you always have that video of cute cats to show your friends. 
Getting ready

One program for downloading videos is youtube-dl. This is not included in most distributions and the repositories may not be up to date, so it's best to go to the youtube-dl main site: http://yt-dl.org You'll find links and information on that page for downloading and installing youtube-dl.

How to do it...

Using youtube-dl is easy. Open your browser and find a video you like. Then copy/paste that URL to the youtube-dl command line:

youtube-dl https://www.youtube.com/watch?v=AJrsl3fHQ74

While youtube-dl is downloading the file, it will generate a status line on your terminal.

How it works...

The youtube-dl program works by sending a GET message to the server, just as a browser would do. It masquerades as a browser so that YouTube or other video providers will download a video as if the device were streaming. The --list-formats (-F) option will list the formats a video is available in, and the --format (-f) option will specify which format to download. This is useful if you want to download a higher-resolution video than your internet connection can reliably stream.

Summary

In this article, we learned how to download and parse website data, send data to forms, and automate website-usage tasks and similar activities. We can automate many activities that we perform interactively through a browser with a few lines of scripting.

Resources for Article: Further resources on this subject: Linux Shell Scripting – various recipes to help you [article] Linux Shell Script: Tips and Tricks [article] Linux Shell Script: Monitoring Activities [article]

Setting up Intel Edison

Packt
21 Jun 2017
8 min read
In this article by Avirup Basu, the author of the book Intel Edison Projects, we will be covering the following topics: Setting up the Intel Edison Setting up the developer environment (For more resources related to this topic, see here.) In every Internet of Things(IoT) or robotics project, we have a controller that is the brain of the entire system. Similarly we have Intel Edison. The Intel Edison computing module comes in two different packages. One of which is a mini breakout board the other of which is an Arduino compatible board. One can use the board in its native state as well but in that case the person has to fabricate his/hers own expansion board. The Edison is basically a size of a SD card. Due to its tiny size, it's perfect for wearable devices. However it's capabilities makes it suitable for IoT application and above all, the powerful processing capability makes it suitable for robotics application. However we don't simply use the device in this state. We hook up the board with an expansion board. The expansion board provides the user with enough flexibility and compatibility for interfacing with other units. The Edison has an operating system that is running the entire system. It runs a Linux image. Thus, to setup your device, you initially need to configure your device both at the hardware and at software level. Initial hardware setup We'll concentrate on the Edison package that comes with an Arduino expansion board. Initially you will get two different pieces: The Intel® Edison board The Arduino expansion board The following given is the architecture of the device: Architecture of Intel Edison. Picture Credits: https://software.intel.com/en-us/ We need to hook these two pieces up in a single unit. Place the Edison board on top of the expansion board such that the GPIO interfaces meet at a single point. Gently push the Edison against the expansion board. You will get a click sound. Use the screws that comes with the package to tighten the set up. Once, this is done, we'll now setup the device both at hardware level and software level to be used further. Following are the steps we'll cover in details: Downloading necessary software packages Connecting your Intel® Edison to your PC Flashing your device with the Linux image Connecting to a Wi-Fi network SSH-ing your Intel® Edison device Downloading necessary software packages To move forward with the development on this platform, we need to download and install a couple of software which includes the drivers and the IDEs. Following is the list of the software along with the links that are required: Intel® Platform Flash Tool Lite (https://01.org/android-ia/downloads/intel-platform-flash-tool-lite) PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html) Intel XDK for IoT (https://software.intel.com/en-us/intel-xdk) Arduino IDE (https://www.arduino.cc/en/Main/Software) FileZilla FTP client (https://filezilla-project.org/download.php) Notepad ++ or any other editor (https://notepad-plus-plus.org/download/v7.3.html) Drivers and miscellaneous downloads Latest Yocto* Poky image Windows standalone driver for Intel Edison FTDI drivers (http://www.ftdichip.com/Drivers/VCP.htm) The 1st and the 2nd packages can be downloaded from (https://software.intel.com/en-us/iot/hardware/edison/downloads) Plugging in your device After all the software and drivers installation, we'll now connect the device to a PC. You need two Micro-B USB Cables(s) to connect your device to the PC. 
You can also use a 9V power adapter and a single Micro-B USB cable, but for now we will not use the power adapter:

Different sections of the Arduino expansion board of Intel Edison

A small switch exists between the USB port and the OTG port. This switch must be set towards the OTG port, because we're going to power the device from the OTG port and not through the DC power port. Once the device is connected to your PC, open your device manager and expand the Ports section. If all the drivers were installed successfully, you should see two ports:

Intel Edison virtual com port
USB serial port

Flashing your device

Once your device is successfully detected and installed, you need to flash it with the Linux image. For this we'll use the flash tool provided by Intel:

Open the Flash Lite tool and connect your device to the PC:

Intel® Phone Flash Lite tool

Once the flash tool is open, click on Browse... and browse to the .zip file of the Linux image you have downloaded. After you click on OK, the tool will automatically unzip the file. Next, click on Start to flash:

Intel® Phone Flash Lite tool – stage 1

You will be asked to disconnect and reconnect your device. Do as the tool says and the board should start flashing. It may take some time before the flashing is completed. Do not tamper with the device during the process. Once the flashing is completed, we'll configure the device:

Intel® Phone Flash Lite tool – complete

Configuring the device

After flashing completes successfully, we'll configure the device. We're going to use the PuTTY console for the configuration. PuTTY is an SSH and Telnet client, originally developed by Simon Tatham for the Windows platform. We're going to use its serial mode here. Before opening the PuTTY console, open the device manager and note the port number of the USB serial port. This will be used in your PuTTY console:

Ports for Intel® Edison in PuTTY

Next, select Serial in the PuTTY console and enter the port number. Use a baud rate of 115200. Press Open to open the window for communicating with the device:

PuTTY console – login screen

Once you are in the PuTTY console, you can execute commands to configure your Edison. The following is the set of tasks we'll do in the console to configure the device:

Provide your device a name
Provide a root password (to SSH into your device)
Connect your device to Wi-Fi

Initially, when in the console, you will be asked to log in. Type in root and press Enter. Once logged in, you will see the root@edison prompt, which means that you are logged in as the root user:

PuTTY console – login success

Now we are in the Linux terminal of the device. First, we'll enter the following command for setup:

configure_edison --setup

Press Enter after entering the command and the rest of the configuration is fairly straightforward:

PuTTY console – set password

First, you will be asked to set a password. Type in a password and press Enter. You need to type in your password again for confirmation. Next, we'll set up a name for the device:

PuTTY console – set name

Give your device a name. Please note that this is not the login name for your device; it's just an alias. Also, the name should be at least 5 characters long. Once you have entered the name, it will ask for confirmation; press y to confirm. Then it will ask you to set up Wi-Fi. Again, select y to continue. It's not mandatory to set up Wi-Fi, but it's recommended.
We need the Wi-Fi for file transfers, downloading packages, and so on:

PuTTY console – set Wi-Fi

Once the scan is completed, we'll get a list of available networks. Select the number corresponding to your network and press Enter. In this case it is 5, which corresponds to avirup171, my Wi-Fi network. Enter the network credentials. After you do that, your device will connect to the Wi-Fi. You should get an IP address once your device is connected:

PuTTY console – set Wi-Fi - 2

After a successful connection you should get this screen. Make sure your PC is connected to the same network. Open the browser on your PC and enter the IP address shown in the console. You should get a screen similar to this:

Wi-Fi setup – completed

Now we are done with the initial setup. However, Wi-Fi setup normally doesn't happen in one go. Sometimes your device doesn't get connected to the Wi-Fi, and sometimes you cannot reach the page shown before. In those cases you need to start wpa_cli to manually configure the Wi-Fi. Refer to the following link for the details: http://www.intel.com/content/www/us/en/support/boards-and-kits/000006202.html

Summary

In this article, we have covered the initial setup of the Intel Edison and connecting it to the network. We have also covered how to transfer files to the Edison and vice versa.

Resources for Article:

Further resources on this subject:

Getting Started with Intel Galileo [article]
Creating Basic Artificial Intelligence [article]
Using IntelliTrace to Diagnose Problems with a Hosted Service [article]
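As a brief addendum: the steps listed at the start of this article mention SSH-ing into the Intel® Edison device, and the summary mentions file transfer. The following is a minimal sketch of how that typically looks from a PC on the same Wi-Fi network, using the standard OpenSSH client and scp; the IP address 192.168.1.23 and the file names are placeholders for your own values, and FileZilla (listed among the downloads earlier) achieves the same result graphically over SFTP:

ssh root@192.168.1.23                          # log in with the root password you set during configure_edison --setup
scp blink.js root@192.168.1.23:/home/root/     # copy a file from the PC to the Edison
scp root@192.168.1.23:/home/root/log.txt .     # copy a file from the Edison back to the PC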


Understanding the Basics of RxJava

Packt
20 Jun 2017
15 min read
In this article by Tadas Subonis, the author of the book Reactive Android Programming, we will go through the core basics of RxJava so that we can fully understand what it is, what the core elements are, and how they work. Before that, let's take a step back and briefly discuss how RxJava is different from other approaches. RxJava is about reacting to results. It might be an item that originated from some source. It can also be an error. RxJava provides a framework to handle these items in a reactive way and to create complicated manipulation and handling schemes through a very easy-to-use interface. Things like waiting for the arrival of an item before transforming it become very easy with RxJava. To achieve all this, RxJava provides some basic primitives:

Observables: A source of data
Subscriptions: An activated handle to the Observable that receives data
Schedulers: A means to define where (on which Thread) the data is processed

First of all, we will cover Observables--the source of all the data and the core structure/class that we will be working with. We will explore how they are related to Disposables (Subscriptions). Furthermore, the life cycle and hook points of an Observable will be described, so we will actually know what's happening when an item travels through an Observable and what the different stages are that we can tap into. Finally, we will briefly introduce Flowable--a big brother of Observable that lets you handle big amounts of data with high rates of publishing. To summarize, we will cover these aspects:

What is an Observable?
What are Disposables (formerly Subscriptions)?
How do items travel through the Observable?
What is backpressure and how can we use it with Flowable?

Let's dive into it!

(For more resources related to this topic, see here.)

Observables

Everything starts with an Observable. It's a source of data that you can observe for emitted data (hence the name). In almost all cases, you will be working with the Observable class. It is possible to (and we will!) combine different Observables into one Observable. Basically, it is a universal interface to tap into data streams in a reactive way. There are lots of different ways in which one can create Observables. The simplest way is to use the .just() method like we did before:

Observable.just("First item", "Second item");

It is usually a perfect way to glue non-Rx-like parts of the code to an Rx-compatible flow. When an Observable is created, it is not usually defined when it will start emitting data. If it was created using simple tools such as .just(), it won't start emitting data until there is a subscription to the observable. How do you create a subscription? It's done by calling .subscribe():

Observable.just("First item", "Second item")
.subscribe();

Usually (but not always), the Observable will be activated the moment somebody subscribes to it. So, if a new Observable was just created, it won't magically start sending data "somewhere".

Hot and Cold Observables

Quite often, the terms Hot and Cold Observables can be found in the literature and documentation. A Cold Observable is the most common Observable type. For example, it can be created with the following code:

Observable.just("First item", "Second item")
.subscribe();

A Cold Observable means that items won't be emitted by the Observable until there is a Subscriber. This means that before .subscribe() is called, no items will be produced, and thus none of the items that are intended to be emitted will be missed; everything will be processed.
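As a small illustration of this laziness (a sketch of my own, not code from the book--the log messages and the returned string are made up for demonstration), a cold Observable built with Observable.fromCallable() runs its producer code only when someone subscribes, and runs it again for every new subscriber:

Observable<String> cold = Observable.fromCallable(() -> {
    // this runs once per subscription, not when the Observable is created
    Log.d("APP", "producing an item");
    return "First item";
});

Log.d("APP", "Observable created, nothing produced yet");
cold.subscribe(e -> Log.d("APP", "first subscriber got: " + e));   // "producing an item" is logged only now
cold.subscribe(e -> Log.d("APP", "second subscriber got: " + e));  // and again, once per subscriber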
Hot Observable is an Observable that will begin producing (emitting) items internally as soon as it is created. The status updates are produced constantly and it doesn't matter if there is something that is ready to receive them (like Subscription). If there were no subscriptions to the Observable, it means that the updates will be lost. Disposables A disposable (previously called Subscription in RxJava 1.0) is a tool that can be used to control the life cycle of an Observable. If the stream of data that the Observable is producing is boundless, it means that it will stay active forever. It might not be a problem for a server-side application, but it can cause some serious trouble on Android. Usually, this is the common source of memory leaks. Obtaining a reference to a disposable is pretty simple: Disposable disposable = Observable.just("First item", "Second item") .subscribe(); Disposable is a very simple interface. It has only two methods: dispose() and isDisposed() .  dispose() can be used to cancel the existing Disposable (Subscription). This will stop the call of .subscribe()to receive any further items from Observable, and the Observable itself will be cleaned up. isDisposed() has a pretty straightforward function--it checks whether the subscription is still active. However, it is not used very often in regular code as the subscriptions are usually unsubscribed and forgotten. The disposed subscriptions (Disposables) cannot be re-enabled. They can only be created anew. Finally, Disposables can be grouped using CompositeDisposable like this: Disposable disposable = new CompositeDisposable( Observable.just("First item", "Second item").subscribe(), Observable.just("1", "2").subscribe(), Observable.just("One", "Two").subscribe() ); It's useful in the cases when there are many Observables that should be canceled at the same time, for example, an Activity being destroyed. Schedulers As described in the documentation, a scheduler is something that can schedule a unit of work to be executed now or later. In practice, it means that Schedulers control where the code will actually be executed and usually that means selecting some kind of specific thread. Most often, Subscribers are used to executing long-running tasks on some background thread so that it wouldn't block the main computation or UI thread. This is especially relevant on Android when all long-running tasks must not be executed on MainThread. Schedulers can be set with a simple .subscribeOn() call: Observable.just("First item", "Second item") .subscribeOn(Schedulers.io()) .subscribe(); There are only a few main Schedulers that are commonly used: Schedulers.io() Schedulers.computation() Schedulers.newThread() AndroidSchedulers.mainThread() The AndroidSchedulers.mainThread() is only used on Android systems. Scheduling examples Let's explore how schedulers work by checking out a few examples. 
Let's run the following code: Observable.just("First item", "Second item") .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e)) .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e)); The output will be as follows: on-next:main:First item subscribe:main:First item on-next:main:Second item subscribe:main:Second item Now let's try changing the code to as shown: Observable.just("First item", "Second item") .subscribeOn(Schedulers.io()) .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e)) .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e)); Now, the output should look like this: on-next:RxCachedThreadScheduler-1:First item subscribe:RxCachedThreadScheduler-1:First item on-next:RxCachedThreadScheduler-1:Second item subscribe:RxCachedThreadScheduler-1:Second item We can see how the code was executed on the main thread in the first case and on a new thread in the next. Android requires that all UI modifications should be done on the main thread. So, how can we execute a long-running process in the background but process the result on the main thread? That can be done with .observeOn() method: Observable.just("First item", "Second item") .subscribeOn(Schedulers.io()) .doOnNext(e -> Log.d("APP", "on-next:" + Thread.currentThread().getName() + ":" + e)) .observeOn(AndroidSchedulers.mainThread()) .subscribe(e -> Log.d("APP", "subscribe:" + Thread.currentThread().getName() + ":" + e)); The output will be as illustrated: on-next:RxCachedThreadScheduler-1:First item on-next:RxCachedThreadScheduler-1:Second item subscribe:main:First item subscribe:main:Second item You will note that the items in the doOnNext block were executed on the "RxThread", and the subscribe block items were executed on the main thread. Investigating the Flow of Observable The logging inside the steps of an Observable is a very powerful tool when one wants to understand how they work. If you are in doubt at any point as to what's happening, add logging and experiment. A few quick iterations with logs will definitely help you understand what's going on under the hood. Let's use this technique to analyze a full flow of an Observable. We will start off with this script: private void log(String stage, String item) { Log.d("APP", stage + ":" + Thread.currentThread().getName() + ":" + item); } private void log(String stage) { Log.d("APP", stage + ":" + Thread.currentThread().getName()); } Observable.just("One", "Two") .subscribeOn(Schedulers.io()) .doOnDispose(() -> log("doOnDispose")) .doOnComplete(() -> log("doOnComplete")) .doOnNext(e -> log("doOnNext", e)) .doOnEach(e -> log("doOnEach")) .doOnSubscribe((e) -> log("doOnSubscribe")) .doOnTerminate(() -> log("doOnTerminate")) .doFinally(() -> log("doFinally")) .observeOn(AndroidSchedulers.mainThread()) .subscribe(e -> log("subscribe", e)); It can be seen that it has lots of additional and unfamiliar steps (more about this later). They represent different stages during the processing of an Observable. So, what's the output of the preceding script?: doOnSubscribe:main doOnNext:RxCachedThreadScheduler-1:One doOnEach:RxCachedThreadScheduler-1 doOnNext:RxCachedThreadScheduler-1:Two doOnEach:RxCachedThreadScheduler-1 doOnComplete:RxCachedThreadScheduler-1 doOnEach:RxCachedThreadScheduler-1 doOnTerminate:RxCachedThreadScheduler-1 doFinally:RxCachedThreadScheduler-1 subscribe:main:One subscribe:main:Two doOnDispose:main Let's go through some of the steps. 
First of all, by calling .subscribe(), the doOnSubscribe block was executed. This started the emission of items from the Observable, as we can see on the doOnNext and doOnEach lines. Finally, the stream finished and the termination life cycle hooks were activated--doOnComplete, doOnTerminate, and doFinally. Also, the reader will note that the doOnDispose block was called on the main thread along with the subscribe block. The flow will be a little different if the .subscribeOn() and .observeOn() calls aren't there:

doOnSubscribe:main
doOnNext:main:One
doOnEach:main
subscribe:main:One
doOnNext:main:Two
doOnEach:main
subscribe:main:Two
doOnComplete:main
doOnEach:main
doOnTerminate:main
doOnDispose:main
doFinally:main

You will readily note that now the doFinally block was executed after doOnDispose, while in the former setup doOnDispose was the last. This happens due to the way the Android Looper schedules code blocks for execution and the fact that we used two different threads in the first case. The takeaway here is that whenever you are unsure of what is going on, start logging actions (and the thread they are running on) to see what's actually happening.

Flowable

Flowable can be regarded as a special type of Observable (but internally it isn't). It has almost the same method signatures as the Observable as well. The difference is that Flowable allows you to process items that are emitted by the source faster than some of the following steps can handle. It might sound confusing, so let's analyze an example. Assume that you have a source that can emit a million items per second. However, the next step uses those items to do a network request. We know, for sure, that we cannot do more than 50 requests per second. That poses a problem. What will we do after 60 seconds? There will be almost 60 million items in the queue waiting to be processed. The items are accumulating at a rate of roughly 1 million items per second between the first and the second steps because the second step processes them at a much slower rate. Clearly, the problem here is that the available memory will be exhausted and the program will fail with an OutOfMemory (OOM) exception. For example, this script will cause excessive memory usage because the processing step just won't be able to keep up with the pace the items are emitted at:

PublishSubject<Integer> observable = PublishSubject.create();
observable
.observeOn(Schedulers.computation())
.subscribe(v -> log("s", v.toString()), this::log);

for (int i = 0; i < 1000000; i++) {
observable.onNext(i);
}

private void log(Throwable throwable) {
Log.e("APP", "Error", throwable);
}

By converting this to a Flowable, we can start controlling this behavior:

observable.toFlowable(BackpressureStrategy.MISSING)
.observeOn(Schedulers.computation())
.subscribe(v -> log("s", v.toString()), this::log);

Since we have chosen not to specify how we want to handle items that cannot be processed (this is called backpressure), it will throw a MissingBackpressureException. However, if the number of items was 100 instead of a million, it would have been just fine as it wouldn't hit the internal buffer of Flowable. By default, the size of the Flowable queue (buffer) is 128. There are a few backpressure strategies that define how the excessive amount of items should be handled.

Drop Items

Dropping means that if the downstream processing steps cannot keep up with the pace of the source Observable, just drop the data that cannot be handled.
This can only be used in the cases when losing data is okay, and you care more about the values that were emitted in the beginning. There are a few ways in which items can be dropped. The first one is just to specify Backpressure strategy, like this: observable.toFlowable(BackpressureStrategy.DROP) Alternatively, it will be like this: observable.toFlowable(BackpressureStrategy.MISSING) .onBackpressureDrop() A similar way to do that would be to call .sample(). It will emit items only periodically, and it will take only the last value that's available (while BackpressureStrategy.DROP drops it instantly unless it is free to push it down the stream). All the other values between "ticks" will be dropped: observable.toFlowable(BackpressureStrategy.MISSING) .sample(10, TimeUnit.MILLISECONDS) .observeOn(Schedulers.computation()) .subscribe(v -> log("s", v.toString()), this::log); Preserve Latest Item Preserving the last items means that if the downstream cannot cope with the items that are being sent to them, stop emitting values and wait until they become available. While waiting, keep dropping all the values except the last one that arrived and when the downstream becomes available to send the last message that's currently stored. Like with Dropping, the "Latest" strategy can be specified while creating an Observable: observable.toFlowable(BackpressureStrategy.LATEST) Alternatively, by calling .onBackpressure(): observable.toFlowable(BackpressureStrategy.MISSING) .onBackpressureLatest() Finally, a method, .debounce(), can periodically take the last value at specific intervals: observable.toFlowable(BackpressureStrategy.MISSING) .debounce(10, TimeUnit.MILLISECONDS) Buffering It's usually a poor way to handle different paces of items being emitted and consumed as it often just delays the problem. However, this can work just fine if there is just a temporal slowdown in one of the consumers. In this case, the items emitted will be stored until later processing and when the slowdown is over, the consumers will catch up. If the consumers cannot catch up, at some point the buffer will run out and we can see a very similar behavior to the original Observable with memory running out. Enabling buffers is, again, pretty straightforward by calling the following: observable.toFlowable(BackpressureStrategy.BUFFER) or observable.toFlowable(BackpressureStrategy.MISSING) .onBackpressureBuffer() If there is a need to specify a particular value for the buffer, one can use .buffer(): observable.toFlowable(BackpressureStrategy.MISSING) .buffer(10) Completable, Single, and Maybe Types Besides the types of Observable and Flowable, there are three more types that RxJava provides: Completable: It represents an action without a result that will be completed in the future Single: It's just like Observable (or Flowable) that returns a single item instead of a stream Maybe: It stands for an action that can complete (or fail) without returning any value (like Completable) but can also return an item like Single However, all these are used quite rarely. Let's take a quick look at the examples. Completable Since Completable can basically process just two types of actions--onComplete and onError--we will cover it very briefly. Completable has many static factory methods available to create it but, most often, it will just be found as a return value in some other libraries. 
For example, the Completable can be created by calling the following: Completable completable = Completable.fromAction(() -> { log("Let's do something"); }); Then, it is to be subscribed with the following: completable.subscribe(() -> { log("Finished"); }, throwable -> { log(throwable); }); Single Single provides a way to represent an Observable that will return just a single item (thus the name). You might ask, why it is worth having it at all? These types are useful to tell the developers about the specific behavior that they should expect. To create a Single, one can use this example: Single.just("One item") The Single and the Subscription to it can be created with the following: Single.just("One item") .subscribe((item) -> { log(item); }, (throwable) -> { log(throwable); }); Make a note that this differs from Completable in that the first argument to the .subscribe() action now expects to receive an item as a result. Maybe Finally, the Maybe type is very similar to the Single type, but the item might not be returned to the subscriber in the end. The Maybe type can be created in a very similar fashion as before: Maybe.empty(); or like Maybe.just("Item"); However, the .subscribe() can be called with arguments dedicated to handling onSuccess (for received items), onError (to handle errors), and onComplete (to do a final action after the item is handled): Maybe.just("Item") .subscribe( s -> log("success: " + s), throwable -> log("error"), () -> log("onComplete") ); Summary In this article, we covered the most essentials parts of RxJava. Resources for Article: Further resources on this subject: The Art of Android Development Using Android Studio [article] Drawing and Drawables in Android Canvas [article] Optimizing Games for Android [article]
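As a closing illustration of how the pieces covered in this article are typically combined on Android (a sketch of my own, not code from the book--the method names, log tags, and the simulated one-second delay are assumptions), a Single does the background work on Schedulers.io(), the result is observed on the main thread, and the resulting Disposable is added to a CompositeDisposable that is cleared when the Activity is destroyed:

private final CompositeDisposable disposables = new CompositeDisposable();

private void loadGreeting() {
    Disposable d = Single.fromCallable(() -> {
            // simulate a slow, blocking call on a background thread
            Thread.sleep(1000);
            return "Hello from a background thread";
        })
        .subscribeOn(Schedulers.io())
        .observeOn(AndroidSchedulers.mainThread())
        .subscribe(
            item -> Log.d("APP", "success: " + item),
            throwable -> Log.e("APP", "error", throwable));
    disposables.add(d);
}

@Override
protected void onDestroy() {
    super.onDestroy();
    disposables.clear();   // cancel any in-flight work when the Activity goes away
}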


Grouping Sets in Advanced SQL

Packt
20 Jun 2017
6 min read
In this article by Hans-Jürgen Schönig, the author of the book Mastering PostgreSQL 9.6, we will learn about advanced SQL.

Introducing grouping sets

Every advanced user of SQL should be familiar with the GROUP BY and HAVING clauses. But are you also aware of CUBE, ROLLUP, and GROUPING SETS? If not, this article might be worth reading for you.

Loading some sample data

To make this article a pleasant experience for you, I have compiled some sample data, which has been taken from the BP energy report at http://www.bp.com/en/global/corporate/energy-economics/statistical-review-of-world-energy.html. Here is the data structure, which will be used:

test=# CREATE TABLE t_oil (
    region text,
    country text,
    year int,
    production int,
    consumption int
);
CREATE TABLE

The test data can be downloaded from our website using curl directly:

test=# COPY t_oil FROM PROGRAM 'curl www.cybertec.at/secret/oil_ext.txt';
COPY 644

On some operating systems, curl is not there by default or has not been installed, so downloading the file beforehand might be the easier option for many people. Altogether there is data for 14 nations between 1965 and 2010, which are in two regions of the world:

test=# SELECT region, avg(production) FROM t_oil GROUP BY region;
    region     |          avg
---------------+-----------------------
 Middle East   | 1992.6036866359447005
 North America | 4541.3623188405797101
(2 rows)

Applying grouping sets

The GROUP BY clause will turn many rows into one row per group. However, if you do reporting in real life, you might also be interested in the overall average. One additional line might be needed. Here is how this can be achieved:

test=# SELECT region, avg(production) FROM t_oil GROUP BY ROLLUP (region);
    region     |          avg
---------------+-----------------------
 Middle East   | 1992.6036866359447005
 North America | 4541.3623188405797101
               | 2607.5139860139860140
(3 rows)

The ROLLUP clause will inject an additional line, which will contain the overall average. If you do reporting, it is highly likely that a summary line will be needed. Instead of running two queries, PostgreSQL can provide the data by running just a single query. Of course, this kind of operation can also be used if you are grouping by more than just one column:

test=# SELECT region, country, avg(production)
       FROM t_oil
       WHERE country IN ('USA', 'Canada', 'Iran', 'Oman')
       GROUP BY ROLLUP (region, country);
    region     | country |          avg
---------------+---------+-----------------------
 Middle East   | Iran    | 3631.6956521739130435
 Middle East   | Oman    |  586.4545454545454545
 Middle East   |         | 2142.9111111111111111
 North America | Canada  | 2123.2173913043478261
 North America | USA     | 9141.3478260869565217
 North America |         | 5632.2826086956521739
               |         | 3906.7692307692307692
(7 rows)

In this example, PostgreSQL will inject three lines into the result set: one line for Middle East, one for North America, and on top of that a line for the overall average. If you are building a web application, the current result is ideal because you can easily build a GUI to drill into the result set by filtering out the NULL values. The ROLLUP clause is nice in case you instantly want to display a result. I have always used it to display final results to end users. However, if you are doing reporting, you might want to pre-calculate more data to ensure more flexibility.
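One practical note when labeling or filtering these injected rows: a NULL produced by ROLLUP is indistinguishable from a NULL stored in the table, so PostgreSQL (9.5 and later) offers the GROUPING() function to mark super-aggregate rows explicitly. The query below is a small sketch of that idea against the same t_oil table; the is_total column name is just my own label:

test=# SELECT region, GROUPING(region) AS is_total, avg(production)
       FROM t_oil
       GROUP BY ROLLUP (region);
    region     | is_total |          avg
---------------+----------+-----------------------
 Middle East   |        0 | 1992.6036866359447005
 North America |        0 | 4541.3623188405797101
               |        1 | 2607.5139860139860140
(3 rows)

Rows where is_total is 1 are the summary lines injected by ROLLUP, which makes them easy to label or filter in reports. If you want to go further and pre-calculate several levels of aggregation in a single query, read on.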
The CUBE keyword is what you might have been looking for:

test=# SELECT region, country, avg(production)
       FROM t_oil
       WHERE country IN ('USA', 'Canada', 'Iran', 'Oman')
       GROUP BY CUBE (region, country);
    region     | country |          avg
---------------+---------+-----------------------
 Middle East   | Iran    | 3631.6956521739130435
 Middle East   | Oman    |  586.4545454545454545
 Middle East   |         | 2142.9111111111111111
 North America | Canada  | 2123.2173913043478261
 North America | USA     | 9141.3478260869565217
 North America |         | 5632.2826086956521739
               |         | 3906.7692307692307692
               | Canada  | 2123.2173913043478261
               | Iran    | 3631.6956521739130435
               | Oman    |  586.4545454545454545
               | USA     | 9141.3478260869565217
(11 rows)

Note that even more rows have been added to the result. The CUBE will create the same data as GROUP BY region, country + GROUP BY region + GROUP BY country + the overall average. So the whole idea is to extract many results and various levels of aggregation at once. The resulting cube contains all possible combinations of groups. ROLLUP and CUBE are really just convenience features on top of GROUPING SETS. With the GROUPING SETS clause you can explicitly list the aggregates you want:

test=# SELECT region, country, avg(production)
       FROM t_oil
       WHERE country IN ('USA', 'Canada', 'Iran', 'Oman')
       GROUP BY GROUPING SETS ( (), region, country);
    region     | country |          avg
---------------+---------+-----------------------
 Middle East   |         | 2142.9111111111111111
 North America |         | 5632.2826086956521739
               |         | 3906.7692307692307692
               | Canada  | 2123.2173913043478261
               | Iran    | 3631.6956521739130435
               | Oman    |  586.4545454545454545
               | USA     | 9141.3478260869565217
(7 rows)

In this case I went for three grouping sets: the overall average, GROUP BY region, and GROUP BY country. In case you want region and country combined, use (region, country).

Investigating performance

Grouping sets are a powerful feature which helps to reduce the number of expensive queries. Internally, PostgreSQL will basically turn to traditional GroupAggregates to make things work. A GroupAggregate node requires sorted data, so be prepared that PostgreSQL might do a lot of temporary sorting:

test=# explain SELECT region, country, avg(production)
       FROM t_oil
       WHERE country IN ('USA', 'Canada', 'Iran', 'Oman')
       GROUP BY GROUPING SETS ( (), region, country);
                           QUERY PLAN
---------------------------------------------------------------
 GroupAggregate  (cost=22.58..32.69 rows=34 width=52)
   Group Key: region
   Group Key: ()
   Sort Key: country
     Group Key: country
   ->  Sort  (cost=22.58..23.04 rows=184 width=24)
         Sort Key: region
         ->  Seq Scan on t_oil  (cost=0.00..15.66 rows=184 width=24)
               Filter: (country = ANY ('{USA,Canada,Iran,Oman}'::text[]))
(9 rows)

Hash aggregates are only supported for normal GROUP BY clauses involving no grouping sets. According to the developer of grouping sets (Atri Sharma), adding support for hashes is not worth the effort, so it seems PostgreSQL already has an efficient implementation, even if the optimizer has fewer choices than it has with normal GROUP BY statements.

Combining grouping sets with the FILTER clause

In real-world applications, grouping sets can often be combined with so-called FILTER clauses. The idea behind FILTER is to be able to run partial aggregates.
Here is an example:

test=# SELECT region,
              avg(production) AS all,
              avg(production) FILTER (WHERE year < 1990) AS old,
              avg(production) FILTER (WHERE year >= 1990) AS new
       FROM t_oil
       GROUP BY ROLLUP (region);
    region     |      all       |      old       |      new
---------------+----------------+----------------+----------------
 Middle East   | 1992.603686635 | 1747.325892857 | 2254.233333333
 North America | 4541.362318840 | 4471.653333333 | 4624.349206349
               | 2607.513986013 | 2430.685618729 | 2801.183150183
(3 rows)

The idea here is that not all columns will use the same data for aggregation. The FILTER clauses allow you to selectively pass data to those aggregates. In my example, the second aggregate will only consider data before 1990, while the third aggregate will take care of more recent data. If it is possible to move a condition to the WHERE clause, that is always more desirable, as less data has to be fetched from the table. FILTER is only useful if the data left by the WHERE clause is not needed by every aggregate. FILTER works for all kinds of aggregates and offers a simple way to pivot your data.

Summary

We have learned about advanced features provided by SQL. On top of the simple aggregates PostgreSQL provides, grouping sets allow you to create custom aggregates.

Resources for Article:

Further resources on this subject:

PostgreSQL in Action [article]
PostgreSQL as an Extensible RDBMS [article]
Recovery in PostgreSQL 9 [article]


Introduction to NFRs

Packt
20 Jun 2017
14 min read
In this article by Sameer Paradkar, the author of the book Mastering Non-Functional Requirements, we will learn that non-functional requirements are those aspects of the IT system that, while not directly affecting the business functionality of the application, have a profound impact on the efficiency and effectiveness of business systems for end users as well as for the people responsible for supporting the program. The definition of these requirements is an essential factor in developing a total customer solution that delivers business goals. Non-functional requirements are used primarily to drive the operational aspects of the architecture; in other words, to address major operational and technical areas of the system to ensure the robustness and ruggedness of the application. A benchmark or proof-of-concept can be used to verify whether the implementation meets these requirements, or to indicate whether corrective action is necessary. Ideally, a series of tests should be planned that maps to the development schedule and grows in complexity.

The topics that are covered in this article are as follows:

Definition of NFRs
NFR KPIs and metrics

(For more resources related to this topic, see here.)

Introducing NFR

The following pointers state the purpose of NFRs:

To define requirements and constraints on the IT system
As a basis for cost estimates and early system sizing
To assess the viability of the proposed IT system
NFRs are an important determining factor of the architecture and design of the operational models
As a guideline for the design phase to meet NFRs such as performance, scalability, and availability

The NFRs for each of the domains, for example, scalability, availability, and so on, must be understood to facilitate the design and development of the target operating model. These include the servers, networks, and platforms, including the application runtime environments. They are critical for the execution of benchmark tests. They also affect the design of technical and application components. End users have expectations about the effectiveness of the application. These characteristics include ease of software use, speed, reliability, and recoverability when unexpected conditions arise. The NFRs define these aspects of the IT system. The non-functional requirements should be defined precisely, which involves quantifying them. NFRs should provide measurements the application must meet. For example, the maximum time allowed to execute a process, the number of hours in a day an application must be available, the maximum size of a database on disk, and the number of concurrent users supported are typical NFRs the software must implement.

Figure 1: Key Non-Functional Requirements

There are many kinds of non-functional requirements, including:

Performance

Performance is the responsiveness of the application when performing specific actions in a given time span. Performance is measured in terms of throughput or latency. Latency is the time taken by the application to respond to an event. Throughput is the number of events processed in a given time interval. An application's performance can directly impact its scalability. Enhancing an application's performance often enhances scalability by reducing contention for shared resources. Performance attributes specify the timing characteristics of the application. Certain features are more time-sensitive than others; the NFRs should identify such software tasks that have constraints on their performance.
Response time relates to the time needed to complete specific business processes, batch or interactive, within the target business system. The system must be designed to fulfil the agreed upon response time requirements, while supporting the defined workload mapped against the given static baseline, on a system platform that does not exceed the stated utilization. The following attributes are: Throughput: The ability of the system to execute a given number of transactions within a given unit of time Response times: The distribution of time which the system takes to respond to the request Scalability Scalability is the ability to handle an increase in the work load without impacting the performance, or the ability to quickly expand the architecture. Itis the ability to expand the architecture to accommodate more users, more processes, more transactions, additional systems and services as the business requirements change and the systems evolve to meet the future business demands. This permits existing systems to be extended without replacing them. Thisdirectly affects the architecture and the selection of software components and hardware. The solution must allow the hardware and the deployed software services and components to be scaled horizontally as well as vertically. Horizontal scaling involves replicating the same functionality across additional nodes vertical scaling involves the same functionality across bigger and more powerful nodes. Scalability definitions measure volumes of users and data the system should support. There are two key techniques for improving both vertical and horizontal scalability. Vertical Scaling is also known as scaling up and includes adding more resources such as memory, CPUand hard disk to a system. Horizontal scaling is also know as scaling out and includes adding more nodes to a cluster forwork load sharing. The following attributes are: Throughput: Number of maximum transactions your system needs to handle. E.g., thousand a day or A million Storage: Amount  of data you going to need to store Growth requirements: Data growth in the next 3-5 years Availability Availability is the time frame in which the system functions normally and without failures. Availability is measured as the percentage of total application downtime over a defined time period. Availability is affected by failures, exceptions, infrastructure issues, malicious attacks, and maintenance and upgrades. It is the uptime or the amount of time the system is operational and available for use. This is specified because some systems are architected with expected downtime for activities like database upgrades and backups. Availability also conveys the number of hours or days per week or weeks per year the application will be available to its end customers, as well as how rapidly it can recover from faults. Since the architecture establishes software, hardware, and networking entities, this requirement extends to all of them. Hardware availability, recoverability, and reliability definitions measure system up-time. For example, it is specified in terms of mean time between failures or “MTBF”. The following attributes are: Availability: Application availability considering the weekends, holidays and maintenance times and failures. Locations of operation: Geographic location, Connection requirements and the restrictions of the network prevail. Offline Requirement: Time available for offline operations including batch processing & system maintenance. 
Length of time between failures Recoverability: Time required by the system can resume operation in the event of failure. Resilience: The reliability characteristics of the system and sub-components Capacity This non-functional requirement defines the ways in which the system is expected to scale-up by increasing capacity, hardware or adding machines based on business objectives. Capacity is delivering enough functionality required for the end users.  A request for a web service to provide 1,000 requests per second when the server is only capable of 100 requests a second, may not succeed.  While this sounds like an availability issue, it occurs because the server is unable to handle the requisite capacity. A single node may not be able to provide enough capacity, and one needs to deploy multiple nodes with a similar configuration to meet organizational capacity requirements. Capacity to identify a failing node and restart it on another machine or VM is a non-functional requirement. The following attributes are: Throughput is the number of peak transactions the system needs to handle Storage: Volume of data the system can persist at run time to disk and relates to the memory/disk Year-on-yeargrowthrequirements (users, processing and storage) e-channel growth projections Different types of things (for example, activities or transactions supported, and so on) For each type of transaction, volumes on an hourly, daily, weekly, monthly, and so on During the specific time of the day (for example, at lunch), week, month or year are volumes significantly higher Transaction volume growth expected and additional volumes you will be able to handle Security Security is the ability of an application to avoid malicious incidences and events outside of the designed system usage, and prevent disclosure or loss of information. Improving security increases the reliability of application by reducing the likelihood of an attack succeeding and impairing operations. Adding security controls protects assets and prevents unauthorized access and manipulation of critical information. The factors that affect an application security are confidentiality and integrity. The key security controls used to secure systems are authorization, authentication, encryption, auditing, and logging. Definition and monitoring of effectiveness in meeting the security requirements of the system, for example, to avoid financial harm in accounting systems, is critical. Integrityrequirements are restrictingaccess to functionality or data to certain users and protecting the privacyof data entered into the software. The following attributes are: Authentication: Correct identification of parties attempting to access systems and protection of systems from unauthorized parties Authorization: Mechanism required to authorize users to perform different functions within the systems Encryption(data at rest or data in flight): All external communications between the data server and clients must beencrypted Data confidentiality: All data must be protectively marked, stored and protected Compliance: The process to confirm systems compliance with the organization's security standards and policies  Maintainability Maintainability is the ability of any application to go through modifications and updates with a degree of ease. This is the degree of flexibility with which the application can be modified, whether for bug fixes or to update functionality. 
These changes may impact any of the components, services, functionality, or interfaces in the application landscape while modifying to fix errors, or to meet changing business requirements. This is also a degree of time it takes to restore the system to its normal state following a failure or fault. Improving maintainability can improve the availability and reduce the run-time defects. Application’s maintainability is dependent on the overall quality attributes. It is critical as a large chunk of the IT budget is spent on maintenance of systems. The more maintainable a system is the lower the total cost of ownership. The following attributes are: Conformance to design standards, coding standards, best practices, reference architectures, and frameworks. Flexibility: The degree to which the system is intended to support change Release support: The way in which the system supports the introduction of initial release, phased rollouts and future releases Manageability Manageability is the ease with which the administrators can manage the application, through useful instrumentation exposed for monitoring. It is the ability of the system or the group of the system to provide key information to the operations and support team to be able to debug, analyze and understand the root cause of failures. It deals with compliance/governance with the domain frameworks and polices. The key is to design the application that is easy to manage, by exposing useful instrumentation for monitoring systems and for understanding the cause of failures. The following attributes are: System must maintain total traceability of transactions Businessobjectsand database fields are part of auditing User and transactional timestamps. File characteristics include size before, size after and structure Getting events and alerts as thresholds (for example, memory, storage, processor) are breached Remotely manage applications and create new virtual instances at the click of a button Rich graphical dashboard for all key applications metrics and KPI Reliability Reliability is the ability of the application to maintain its integrity and veracity over a time span and also in the event of faults or exceptions. It is measured as the probability that the software will not fail and that it will continue functioning for a defined time interval. It alsospecifies the ability of the system to maintain its performance over a time span. Unreliable software is prone to failures anda few processes may be more sensitive to failure than others, because such processes may not be able to recover from a fault or exceptions. The following attributes are: The characteristic of a system to perform its functions under stated conditions for a specificperiod of time. Mean Time To Recovery: Time is available to get the system back up online. Mean Time Between Failures – Acceptable threshold for downtime Data integrity is also known as referential integrity in database tables and interfaces Application Integrity and Information Integrity: during transactions Fault trapping (I/O): Handling failures and recovery Extensibility Extensibility is the ability of a system to cater to future changes through flexible architecture, design or implementation. Extensible applications have excellent endurance, which prevents the expensive processes of procuring large inflexible applications and retiring them due to changes in business needs. Extensibility enables organizations to take advantage of opportunities and respond to risks. 
While there is a significant difference extensibility is often tangled with modifiability quality. Modifiability means that is possible to change the software whereas extensibility means that change has been planned and will be effortless. Adaptability is at times erroneously leveraged with extensibility. However, adaptability deals with how the user interactions with the system are managed and governed. Extensibilityallows a system, people, technology, information, and processes all working together to achieve following objectives: The following attributes are: Handle new information types Manage new or changed business entities Consume or provide new feeds Recovery In the event of a natural calamity for example, flood or hurricane, the entire facility where the application is hosted may become inoperable or inaccessible. Business-critical applications should have a strategy to recover from such disasters within a reasonable amount of time frame. The solution implementing various processes must be integrated with the existing enterprise disaster recovery plan. The processes must be analysed to understand the criticality of each process to the business, the impact of loss to the business in case of non-availability of the process. Based on this analysis, appropriate disaster procedures must be developed, and plans should be outlined. As part of disaster recovery, electronic backups of data and procedures must be maintained at the recovery location and be retrievable within the appropriate time frames for system function restoration. In the case of high criticality, real-time mirroring to a mirror site should be deployed. The following attributes are: Recoveryprocess: Recovery Time Objectives(RTO) / Recovery Point Objectives(RPO) Restore time: Time required switching to the secondary site when the primary fails RPO/Backup time: Time it takes to back your data Backup frequencies: Frequency of backing-up the transaction data, configuration data and code Interoperability Interoperability is the ability to exchange information and communicate with internal and external applications and systems. Interoperable systems make it easier to exchange information both internally and externally. The data formats, transport protocols and interfaces are the key attributes for architecting interoperable systems. Standardization of data formats, transport protocols and interfaces are the key aspect to be considered when architecting interoperable system. Interoperability is achieved through: Publishing and describing interfaces Describing the syntax used to communicate Describing the semantics of information it produces and consumes Leveraging open standards to communicate with external systems Loosely coupled with external systems The following attributes are: Compatibility with shared applications: Other system it needs to integrate Compatibility with 3rd party applications: Other systems it has to live with amicably Compatibility with various OS: Different OS compatibility Compatibility on different platforms: Hardware platforms it needs to work on Usability Usability measures characteristics such as consistency and aesthetics in the user interface. Consistency is the constant use of mechanisms employed in the user interface while Aesthetics refers to the artistic, visual quality of the user interface. It is the ease at which the users operate the system and make productive use of it. 
Usability is discussed with relation to the system interfaces, but it can just as well be applied to any tool, device, or rich system. This addresses the factors that establish the ability of the software to be understood, used, and learned by its intended users. The application interfaces must be designed with end users in mind so that they are intuitive to use, are localized, provide access for differently abled users, and provide an excellent overall user experience. The following attributes are: Look and feel standards: Layout and flow, screen element density, keyboard shortcuts, UI metaphors, colors. Localization/Internationalization requirements: Keyboards, paper sizes, languages, spellings, and so on Summary It explains he introduction of NFRs and why NFRs are a critical for building software systems. The article also explained various KPI for each of the key of NFRs i.e. scalability, availability, reliability and do on.  Resources for Article: Further resources on this subject: Software Documentation with Trac [article] The Software Task Management Tool - Rake [article] Installing Software and Updates [article]

Analyzing Social Networks with Facebook

Packt
20 Jun 2017
15 min read
In this article by Raghav Bali, Dipanjan Sarkar and Tushar Sharma, the authors of the book Learning Social Media Analytics with R, we got a good flavor of the various aspects related to the most popular social micro-blogging platform, Twitter. In this article, we will look more closely at the most popular social networking platform, Facebook. With more than 1.8 billion monthly active users, over 18 billion dollars annual revenue and record breaking acquisitions for popular products including Oculus, WhatsApp and Instagram have truly made Facebook the core of the social media network today. (For more resources related to this topic, see here.) Before we put Facebook data under the microscope, let us briefly look at Facebook’s interesting origins! Like many popular products, businesses and organizations, Facebook too had a humble beginning. Originally starting off as Mark Zuckerberg’s brainchild in 2004, it was initially known as “Thefacebook” located at thefacebook.com, which was branded as an online social network, connecting university and college students. While this social network was only open to Harvard students in the beginning, it soon expanded within a month by including students from other popular universities. In 2005, the domain facebook.com was finally purchased and “Facebook” extended its membership to employees of companies and organizations for the first time. Finally in 2006, Facebook was finally opened to everyone above 13 years of age and having a valid email address. The following snapshot shows us how the look and feel of the Facebook platform has evolved over the years! Facebook’s evolving look over time While Facebook has a primary website, also known as a web application, it has also launched mobile applications for the major operating systems on handheld devices. In short, Facebook is not just a social network website but an entire platform including a huge social network of connected people and organizations through friends, followers and pages. We will leverage Facebook’s social “Graph API” to access actual Facebook data to perform various analyses. Users, brands, business, news channels, media houses, retail stores and many more are using Facebook actively on a daily basis for producing and consuming content. This generates vast amount of data and a substantial amount of this is available to users through its APIs.  From a social media analytics perspective, this is really exciting because this treasure trove of data with easy to access APIs and powerful open source libraries from R, gives us enormous potential and opportunity to get valuable information from analyzing this data in various ways. We will follow a structured path in this article and cover the following major topics sequentially to ensure that you do not get overwhelmed with too much content at once. Accessing Facebook data Analyzing your personal social network Analyzing an English football social network Analyzing English football clubs’ brand page engagements We will use libraries like Rfacebook, igraph and ggplot2 to retrieve, analyze and visualize data from Facebook. All the following sections of the book assume that you have a Facebook account which is necessary to access data from the APIs and analyze it. In case you do not have an account, do not despair. You can use the data and code files for this article to follow along with the hands-on examples to gain a better understanding of the concepts of social network and engagement analysis.    
Accessing Facebook data You will find a lot of content in several books and on the web about various techniques to access and retrieve data from Facebook. There are several official ways of doing this which include using the Facebook Graph API either directly through low level HTTP based calls or indirectly through higher level abstract interfaces belonging to libraries like Rfacebook. Some alternate ways of retrieving Facebook data would be to use registered applications on Facebook like Netvizz or the GetNet application built by Lada Adamic, used in her very popular “Social Network Analysis” course (Unfortunately http://snacourse.com/getnet is not working since Facebook completely changed its API access permissions and privacy settings). Unofficial ways include techniques like web scraping and crawling to extract data. Do note though that Facebook considers this to be a violation of its terms and conditions of accessing data and you should try and avoid crawling Facebook for data especially if you plan to use it for commercial purposes. In this section, we will take a closer look at the Graph API and the Rfacebook package in R. The main focus will be on how you can extract data from Facebook using both of them. Understanding the Graph API To start using the Graph API, you would need to have an account on Facebook to be able to use the API. You can access the API in various ways. You can create an application on Facebook by going to https://developers.facebook.com/apps/ and then create a long-lived OAuth access token using the fbOAuth(…)function from the Rfacebook package. This enables R to make calls to the Graph API and you can also store this token on the disk and load it for future use. An easier way is to create a short-lived token which would let you access the API data for about two hours by going to the Facebook Graph API Explorer page which is available at https://developers.facebook.com/tools/explorer and get a temporary access token from there. The following snapshot depicts how to get an access token for the Graph API from Facebook. Facebook’s Graph API explorer On clicking “Get User Access Token” in the above snapshot, it will present a list of checkboxes with various permissions which you might need for accessing data including user data permissions, events, groups and pages and other miscellaneous permissions. You can select the ones you need and click on the “Get Access Token” button in the prompt. This will generate a new access token the field depicted in the above snapshot and you can directly copy and use it to retrieve data in R. Before going into that, we will take a closer look at the Graph API explorer which directly allows you to access the API from your web browser itself and helps if you want to do some quick exploratory analysis. A part of it is depicted in the above snapshot. The current version of the API when writing this book is v2.8 which you can see in the snapshot beside the GET resource call. Interestingly, the Graph API is so named because Facebook by itself can be considered as a huge social graph where all the information can be classified into the following three categories. Nodes: These are basically users, pages, photos and so on. Nodes indicate a focal point of interest which is connected to other points. Edges: These connect various nodes together forming the core social graph and these connections are based on various relations like friends, followers and so on. 
Fields: These are specific attributes or properties of nodes; examples would be a user's address, birthday, name, and so on. As we mentioned before, the API is HTTP based: you make HTTP GET requests to nodes or edges, and all requests are passed to graph.facebook.com to get data. Each node usually has a specific identifier, and you can use it to query information about a node as depicted in the following snippet. GET graph.facebook.com/{node-id} You can also use edge names in addition to the identifier to get information about the edges of the node. The following snippet depicts how you can do the same. GET graph.facebook.com/{node-id}/{edge-name} The following snapshot shows how we can get information about our own profile. Querying your details in the Graph API explorer Now suppose I want to retrieve information about a Facebook page, "Premier League", which represents the top tier competition in English football, using its identifier, and also take a look at the pages it has liked. I can do so using the following request. Querying information about a Facebook page using the Graph API explorer From the above figure, you can clearly see the node identifier, page name, and likes for the page "Premier League". It must be clear by now that all API responses are returned in the very popular JSON format, which is easy to parse and reformat as needed for analysis. Besides this, there also used to be another way of querying the social graph in Facebook, known as FQL or Facebook Query Language, an SQL-like interface for querying and retrieving data. Unfortunately, Facebook has deprecated its use and hence covering it is out of our present scope. Now that you have a firm grasp of the syntax of the Graph API and have also seen a few examples of how to retrieve data from Facebook, we will take a closer look at the Rfacebook package. Understanding Rfacebook Since we will be accessing and analyzing data from Facebook using R, it makes sense to have a robust mechanism to query Facebook directly and retrieve data instead of going to the browser every time like we did in the earlier section. Fortunately, there is an excellent package in R called Rfacebook, which has been developed by Pablo Barberá. You can either install it from CRAN or get its most updated version from GitHub. The following snippet depicts how you can do both. Remember, you might need to install the devtools package if you don't have it already, to download and install the latest version of the Rfacebook package from GitHub. install.packages("Rfacebook") # install from CRAN # install from GitHub library(devtools) install_github("pablobarbera/Rfacebook/Rfacebook") Once you install the package, you can load it using library(Rfacebook) and start using it to retrieve data from Facebook with the access token you generated earlier. The following snippet shows how you can access your own details, as mentioned in the previous section, but this time using R. > token = 'XXXXXX' > me <- getUsers("me", token=token) > me$name [1] "Dipanjan Sarkar" > me$id [1] "1026544" The beauty of this package is that you directly get the results as curated and neatly formatted data frames, and you do not need to spend extra time trying to parse the raw JSON response objects from the Graph API. The package is well documented and has high-level functions for accessing personal profile data on Facebook as well as page and group level data points.
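As a quick illustration of those page-level functions, the short sketch below is not taken from the original text; it assumes the access token created earlier and that "premierleague" is the Facebook username of the page we looked up in the Graph API explorer (substitute the page name or numeric ID you found there).

# Illustrative sketch: fetch recent posts from a Facebook page with Rfacebook.
# Assumes `token` from the previous section; "premierleague" is an assumed
# page username -- replace it with the page name or ID from your own lookup.
premier_league_posts <- getPage("premierleague", token = token, n = 50)

# each row is a post; engagement counts come back as ready-made columns
head(premier_league_posts[, c("message", "likes_count",
                              "comments_count", "shares_count")])

A similar getGroup() function covers group-level data in the same way.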
We will now take a quick look at Netvizz a Facebook application, which can also be used to extract data easily from Facebook. Understanding Netvizz The Netvizz application was developed by Bernhard Rieder and is a tool which can be used to extract data from Facebook pages, groups, get statistics about links and also extract social networks from Facebook pages based on liked pages from each connected page in the network. You can access Netvizz at https://apps.facebook.com/netvizz/ and on registering the application on your profile, you will be able to see the following screen. The Netvizz application interface From the above app snapshot, you can see that there are various links based on the type of operation you want to execute to extract data. Feel free to play around with this tool and we will be using its “page like network” capability later on in one of our analyses in a future section. Data Access Challenges There are several challenges with regards to accessing data from Facebook. Some of the major issues and caveats have been mentioned in the following points: Facebook will keep evolving and updating its data access APIs and this can and will lead to changes and deprecation of older APIs and access patterns just like FQL was deprecated. Scope of data available keeps changing with time and evolving of Facebook’s API and privacy settings. For instance we can no longer get details of all our friends from the API any longer. Libraries and Tools built on top of the API can tend to break with changes to Facebook’s APIs and this has happened before with Rfacebook as well as Netvizz. Besides this, Lada Adamic’s GetNet application has stopped working permanently ever since Facebook changed the way apps are created and the permissions they require. You can get more information about it here http://thepoliticsofsystems.net/2015/01/the-end-of-netvizz/ Thus what was used in the book today for data retrieval might not be working completely tomorrow if there are any changes in the APIs though it is expected it will be working fine for at least the next couple of years. However to prevent any hindrance on analyzing Facebook data, we have provided the datasets we used in most of our analyses except personal networks so that you can still follow along with each example and use-case. Personal names have been anonymized wherever possible to protect their privacy. Now that we have a good idea about Facebook’s Graph API and how to access data, let’s analyze some social networks! Analyzing your personal social network Like we had mentioned before, Facebook by itself is a massive social graph, connecting billions of users, brands and organization. Consider your own Facebook account if you have one. You will have several friends which are your immediate connections, they in turn will be having their own set of friends including you and you might be friends with some of them and so on. Thus you and your friends form the nodes of the network and edges determine the connections. In this section we will analyze a small network of you and your immediate friends and also look at how we can extract and analyze some properties from the network. Before we jump into our analysis, we will start by loading the necessary packages needed which are mentioned in the following snippet and storing the Facebook Graph API access token in a variable. 
library(Rfacebook) library(gridExtra) library(dplyr) # get the Graph API access token token = 'XXXXXXXXXXX' You can refer to the file fb_personal_network_analysis.R for the code snippets used in the examples depicted in this section. Basic descriptive statistics In this section, we will try to get some basic information and descriptive statistics about our personal social network on Facebook. To start with, let us look at some details of our own profile on Facebook using the following code. # get my personal information me <- getUsers("me", token=token, private_info = TRUE) > View(me[c('name', 'id', 'gender', 'birthday')]) This shows us a few fields from the data frame containing our personal details retrieved from Facebook. We use the View function, which invokes a spreadsheet-style data viewer on R objects like data frames. Now, let us get information about the friends in our personal network. Do note that Facebook currently only lets you access information about those friends who have allowed access to the Graph API, and hence you may not be able to get information pertaining to all friends in your friend list. We have anonymized their names below for privacy reasons. anonymous_names <- c('Johnny Juel', 'Houston Tancredi',..., 'Julius Henrichs', 'Yong Sprayberry') # getting friends information friends <- getFriends(token, simplify=TRUE) friends$name <- anonymous_names # view top few rows > View(head(friends)) This gives us a peek at some people from the list of friends we just retrieved from Facebook. Let's now analyze some descriptive statistics based on personal information about our friends, like where they are from, their gender, and so on. # get personal information friends_info <- getUsers(friends$id, token, private_info = TRUE) # get the gender of your friends > View(table(friends_info$gender)) This gives us the gender of my friends; it looks like more male friends have authorized access to the Graph API in my network! # get the location of your friends > View(table(friends_info$location)) This depicts the location of my friends (wherever available) in the following data frame. # get relationship status of your friends > View(table(friends_info$relationship_status)) From the statistics in the following table, I can see that a lot of my friends have gotten married over the past couple of years. Boy, that does make me feel old! Suppose I want to look at the relationship status of my friends grouped by gender; we can do so using the following snippet. # get relationship status of friends grouped by gender View(table(friends_info$relationship_status, friends_info$gender)) The following table gives us the desired results, and you can see the distribution of friends by their gender and relationship status. Summary In this article, we got a taste of some basic analysis of social networks with the help of R, and of the R packages used to retrieve and analyze Facebook data. Resources for Article: Further resources on this subject: How to integrate social media with your WordPress website [article] Social Media Insight Using Naive Bayes [article] Social Media in Magento [article]

CORS in Node.js

Packt
20 Jun 2017
14 min read
In this article by Randall Goya and Rajesh Gunasundaram, the authors of the book CORS Essentials, we look at CORS in Node.js. Node.js is a cross-platform JavaScript runtime environment that executes JavaScript code on the server side. This enables a unified language across web application development: JavaScript runs on both the client side and the server side. (For more resources related to this topic, see here.) In this article we will learn about the following: Node.js is a JavaScript platform for developing server-side web applications. Node.js can provide the web server for other frameworks including Express.js, AngularJS, Backbone.js, Ember.js and others. Some other JavaScript frameworks such as ReactJS, Ember.js and Socket.IO may also use Node.js as the web server. Isomorphic JavaScript can add server-side functionality for client-side frameworks. JavaScript frameworks are evolving rapidly. This article reviews some of the current techniques, and syntax specific to some frameworks. Make sure to check the documentation for the project to discover the latest techniques. Once you understand CORS concepts, you may create your own solutions, because JavaScript is a loosely structured language. All the examples are based on the fundamentals of CORS, with allowed origin(s), methods, and headers such as Content-Type, or preflight, that may be required according to the CORS specification. JavaScript frameworks are very popular JavaScript is sometimes called the lingua franca of the Internet, because it is cross-platform and supported by many devices. It is also a loosely-structured language, which makes it possible to craft solutions for many types of applications. Sometimes an entire application is built in JavaScript. Frequently JavaScript provides a client-side front-end for applications built with Symfony, Content Management Systems such as Drupal, and other back-end frameworks. Node.js is server-side JavaScript and provides a web server as an alternative to Apache, IIS, Nginx and other traditional web servers. Introduction to Node.js Node.js is an open-source and cross-platform library that enables the development of server-side web applications. Applications written in JavaScript for Node.js can run on many operating systems, including OS X, Microsoft Windows, Linux, and many others. Node.js provides non-blocking I/O and an event-driven architecture designed to optimize an application's performance and scalability for real-time web applications. The biggest difference between PHP and Node.js is that PHP is a blocking language, where commands execute only after the previous command has completed, while Node.js is a non-blocking language, where commands execute in parallel and use callbacks to signal completion. Node.js can move files, payloads from services, and data asynchronously, without waiting for some command to complete, which improves performance. Most JS frameworks that work with Node.js use the concept of routes to manage pages and other parts of the application. Each route may have its own set of configurations. For example, CORS may be enabled only for a specific page or route. Node.js loads modules for extending functionality via the npm package manager. The developer selects which packages to load with npm, which reduces bloat. The developer community has created a large number of npm packages for specific functions. JXcore is a fork of Node.js targeting mobile devices and IoTs (Internet of Things devices).
JXcore can use both Google V8 and Mozilla SpiderMonkey as its JavaScript engine. JXcore can run Node applications on iOS devices using Mozilla SpiderMonkey. MEAN is a popular JavaScript software stack with MongoDB (a NoSQL database), Express.js and AngularJS, all of which run on a Node.js server. JavaScript frameworks that work with Node.js Node.js provides a server for other popular JS frameworks, including AngularJS, Express.js. Backbone.js, Socket.IO, and Connect.js. ReactJS was designed to run in the client browser, but it is often combined with a Node.js server. As we shall see in the following descriptions, these frameworks are not necessarily exclusive, and are often combined in applications. Express.js is a Node.js server framework Express.js is a Node.js web application server framework, designed for building single-page, multi-page, and hybrid web applications. It is considered the "standard" server framework for Node.js. The package is installed with the command npm install express –save. AngularJS extends static HTML with dynamic views HTML was designed for static content, not for dynamic views. AngularJS extends HTML syntax with custom tag attributes. It provides model–view–controller (MVC) and model–view–viewmodel (MVVM) architectures in a front-end client-side framework.  AngularJS is often combined with a Node.js server and other JS frameworks. AngularJS runs client-side and Express.js runs on the server, therefore Express.js is considered more secure for functions such as validating user input, which can be tampered client-side. AngularJS applications can use the Express.js framework to connect to databases, for example in the MEAN stack. Connect.js provides middleware for Node.js requests Connect.js is a JavaScript framework providing middleware to handle requests in Node.js applications. Connect.js provides middleware to handle Express.js and cookie sessions, to provide parsers for the HTML body and cookies, and to create vhosts (virtual hosts) and error handlers, and to override methods. Backbone.js often uses a Node.js server Backbone.js is a JavaScript framework with a RESTful JSON interface and is based on the model–view–presenter (MVP) application design. It is designed for developing single-page web applications, and for keeping various parts of web applications (for example, multiple clients and the server) synchronized. Backbone depends on Underscore.js, plus jQuery for use of all the available fetures. Backbone often uses a Node.js server, for example to connect to data storage. ReactJS handles user interfaces ReactJS is a JavaScript library for creating user interfaces while addressing challenges encountered in developing single-page applications where data changes over time. React handles the user interface in model–view–controller (MVC) architecture. ReactJS typically runs client-side and can be combined with AngularJS. Although ReactJS was designed to run client-side, it can also be used server-side in conjunction with Node.js. PayPal and Netflix leverage the server-side rendering of ReactJS known as Isomorphic ReactJS. There are React-based add-ons that take care of the server-side parts of a web application. Socket.IO uses WebSockets for realtime event-driven applications Socket.IO is a JavaScript library for event-driven web applications using the WebSocket protocol ,with realtime, bi-directional communication between web clients and servers. It has two parts: a client-side library that runs in the browser, and a server-side library for Node.js. 
Although it can be used as simply a wrapper for WebSocket, it provides many more features, including broadcasting to multiple sockets, storing data associated with each client, and asynchronous I/O. Socket.IO provides better security than WebSocket alone, since allowed domains must be specified for its server. Ember.js can use Node.js Ember is another popular JavaScript framework with routing that uses Moustache templates. It can run on a Node.js server, or also with Express.js. Ember can also be combined with Rack, a component of Ruby On Rails (ROR). Ember Data is a library for  modeling data in Ember.js applications. CORS in Express.js The following code adds the Access-Control-Allow-Origin and Access-Control-Allow-Headers headers globally to all requests on all routes in an Express.js application. A route is a path in the Express.js application, for example /user for a user page. app.all sets the configuration for all routes in the application. Specific HTTP requests such as GET or POST are handled by app.get and app.post. app.all('*', function(req, res, next) { res.header("Access-Control-Allow-Origin", "*"); res.header("Access-Control-Allow-Headers", "X-Requested-With"); next(); }); app.get('/', function(req, res, next) { // Handle GET for this route }); app.post('/', function(req, res, next) { // Handle the POST for this route }); For better security, consider limiting the allowed origin to a single domain, or adding some additional code to validate or limit the domain(s) that are allowed. Also, consider limiting sending the headers only for routes that require CORS by replacing app.all with a more specific route and method. The following code only sends the CORS headers on a GET request on the route/user, and only allows the request from http://www.localdomain.com. app.get('/user', function(req, res, next) { res.header("Access-Control-Allow-Origin", "http://www.localdomain.com"); res.header("Access-Control-Allow-Headers", "X-Requested-With"); next(); }); Since this is JavaScript code, you may dynamically manage the values of routes, methods, and domains via variables, instead of hard-coding the values. CORS npm for Express.js using Connect.js middleware Connect.js provides middleware to handle requests in Express.js. You can use Node Package Manager (npm) to install a package that enables CORS in Express.js with Connect.js: npm install cors The package offers flexible options, which should be familiar from the CORS specification, including using credentials and preflight. It provides dynamic ways to validate an origin domain using a function or a regular expression, and handler functions to process preflight. Configuration options for CORS npm origin: Configures the Access-Control-Allow-Origin CORS header with a string containing the full URL and protocol making the request, for example http://localdomain.com. Possible values for origin: Default value TRUE uses req.header('Origin') to determine the origin and CORS is enabled. When set to FALSE CORS is disabled. It can be set to a function with the request origin as the first parameter and a callback function as the second parameter. It can be a regular expression, for example /localdomain.com$/, or an array of regular expressions and/or strings to match. methods: Sets the Access-Control-Allow-Methods CORS header. Possible values for methods: A comma-delimited string of HTTP methods, for example GET, POST An array of HTTP methods, for example ['GET', 'PUT', 'POST'] allowedHeaders: Sets the Access-Control-Allow-Headers CORS header. 
Possible values for allowedHeaders: A comma-delimited string of allowed headers, for example 'Content-Type, Authorization' An array of allowed headers, for example ['Content-Type', 'Authorization'] If unspecified, it defaults to the value specified in the request's Access-Control-Request-Headers header exposedHeaders: Sets the Access-Control-Expose-Headers header. Possible values for exposedHeaders: A comma-delimited string of exposed headers, for example 'Content-Range, X-Content-Range' An array of exposed headers, for example ['Content-Range', 'X-Content-Range'] If unspecified, no custom headers are exposed credentials: Sets the Access-Control-Allow-Credentials CORS header. Possible values for credentials: TRUE: pass the header FALSE or unspecified: omit the header maxAge: Sets the Access-Control-Max-Age header. Possible values for maxAge: An integer value in seconds, specifying how long the preflight response may be cached If unspecified, the preflight response is not cached preflightContinue: Passes the CORS preflight response to the next handler. The default configuration without setting any values allows all origins and methods without preflight. Keep in mind that complex CORS requests using methods other than GET, HEAD, or POST will fail without preflight, so make sure you enable preflight in the configuration when using them. Without setting any values, the configuration defaults to: { "origin": "*", "methods": "GET,HEAD,PUT,PATCH,POST,DELETE", "preflightContinue": false } Code examples for CORS npm These examples demonstrate the flexibility of CORS npm for specific configurations. Note that the express and cors packages are always required. Enable CORS globally for all origins and all routes The simplest implementation of CORS npm enables CORS for all origins and all requests. The following example enables CORS for an arbitrary route "/product/:id" for a GET request by telling the entire app to use CORS for all routes: var express = require('express') , cors = require('cors') , app = express(); app.use(cors()); // this tells the app to use CORS for all requests and all routes app.get('/product/:id', function(req, res, next){ res.json({msg: 'CORS is enabled for all origins'}); }); app.listen(80, function(){ console.log('CORS is enabled on the web server listening on port 80'); }); Allow CORS for dynamic origins for a specific route The following example uses corsOptions to check whether the domain making the request is in the whitelist array, using a callback function that reports whether a match was found. This CORS option is passed to the route "product/:id", which is the only route that has CORS enabled. The allowed origins can be dynamic by changing the value of the variable "whitelist."
var express = require('express') , cors = require('cors') , app = express(); // define the whitelisted domains and set the CORS options to check them var whitelist = ['http://localdomain.com', 'http://localdomain-other.com']; var corsOptions = { origin: function(origin, callback){ var originWhitelisted = whitelist.indexOf(origin) !== -1; callback(null, originWhitelisted); } }; // add the CORS options to a specific route /product/:id for a GET request app.get('/product/:id', cors(corsOptions), function(req, res, next){ res.json({msg: 'A whitelisted domain matches and CORS is enabled for route product/:id'}); }); // log that CORS is enabled on the server app.listen(80, function(){ console.log('CORS is enabled on the web server listening on port 80'); }); You may set different CORS options for specific routes, or sets of routes, by defining the options assigned to unique variable names, for example "corsUserOptions." Pass the specific configuration variable to each route that requires that set of options. Enabling CORS preflight CORS requests that use an HTTP method other than GET, HEAD, or POST (for example DELETE), or that use custom headers, are considered complex and require a preflight request before proceeding with the CORS requests. Enable preflight by adding an OPTIONS handler for the route: var express = require('express') , cors = require('cors') , app = express(); // add the OPTIONS handler app.options('/products/:id', cors()); // options is added to the route /products/:id // use the OPTIONS handler for the DELETE method on the route /products/:id app.del('/products/:id', cors(), function(req, res, next){ res.json({msg: 'CORS is enabled with preflight on the route /products/:id for the DELETE method for all origins!'}); }); app.listen(80, function(){ console.log('CORS is enabled on the web server listening on port 80'); }); You can enable preflight globally on all routes with the wildcard: app.options('*', cors()); Configuring CORS asynchronously One of the reasons to use Node.js frameworks is to take advantage of their asynchronous abilities, handling multiple tasks at the same time. Here we use a callback function corsDelegateOptions and add it to the cors parameter passed to the route /products/:id. The callback function can handle multiple requests asynchronously. var express = require('express') , cors = require('cors') , app = express(); // define the allowed origins stored in a variable var whitelist = ['http://example1.com', 'http://example2.com']; // create the callback function var corsDelegateOptions = function(req, callback){ var corsOptions; if(whitelist.indexOf(req.header('Origin')) !== -1){ corsOptions = { origin: true }; // the requested origin in the CORS response matches and is allowed }else{ corsOptions = { origin: false }; // the requested origin in the CORS response doesn't match, and CORS is disabled for this request } callback(null, corsOptions); // callback expects two parameters: error and options }; // add the callback function to the cors parameter for the route /products/:id for a GET request app.get('/products/:id', cors(corsDelegateOptions), function(req, res, next){ res.json({msg: 'A whitelisted domain matches and CORS is enabled for route product/:id'}); }); app.listen(80, function(){ console.log('CORS is enabled on the web server listening on port 80'); }); Summary We have covered the essentials of applying CORS in Node.js.
Let us have a quick recap of what we have learnt: Node.js provides a web server built with JavaScript, and can be combined with many other JS frameworks as the application server. Although some frameworks have specific syntax for implementing CORS, they all follow the CORS specification by specifying allowed origin(s) and method(s). More robust frameworks allow custom headers such as Content-Type, and preflight when required for complex CORS requests. JavaScript frameworks may depend on the jQuery XHR object, which must be configured properly to allow Cross-Origin requests. JavaScript frameworks are evolving rapidly. The examples here may become outdated. Always refer to the project documentation for up-to-date information. With knowledge of the CORS specification, you may create your own techniques using JavaScript based on these examples, depending on the specific needs of your application. https://en.wikipedia.org/wiki/Node.js Resources for Article: Further resources on this subject: An Introduction to Node.js Design Patterns [article] Five common questions for .NET/Java developers learning JavaScript and Node.js [article] API with MongoDB and Node.js [article]

Scraping a Web Page

Packt
20 Jun 2017
11 min read
In this article by Katharine Jarmul, author of the book Python Web Scraping - Second Edition, we look at an example: suppose I have a shop selling shoes and want to keep track of my competitor's prices. I could go to my competitor's website each day and compare each shoe's price with my own; however, this will take a lot of time and will not scale well if I sell thousands of shoes or need to check price changes frequently. Or maybe I just want to buy a shoe when it's on sale. I could come back and check the shoe website each day until I get lucky, but the shoe I want might not be on sale for months. These repetitive manual processes could instead be replaced with an automated solution using the web scraping techniques covered in this book. In an ideal world, web scraping wouldn't be necessary and each website would provide an API to share the data in a structured format. Indeed, some websites do provide APIs, but they typically restrict the data that is available and how frequently it can be accessed. Additionally, a website developer might change, remove or restrict the backend API. In short, we cannot rely on APIs to access the online data we may want and therefore, we need to learn about web scraping techniques. (For more resources related to this topic, see here.) Three approaches to scrape a web page Now that we understand the structure of this web page, we will investigate three different approaches to scraping its data, first with regular expressions, then with the popular BeautifulSoup module, and finally with the powerful lxml module. Regular expressions If you are unfamiliar with regular expressions or need a reminder, there is a thorough overview available at https://docs.python.org/3/howto/regex.html. Even if you use regular expressions (or regex) with another programming language, I recommend stepping through it for a refresher on regex with Python. To scrape the country area using regular expressions, we will first try matching the contents of the <td> element, as follows: >>> import re >>> from advanced_link_crawler import download >>> url = 'http://example.webscraping.com/view/UnitedKingdom-239' >>> html = download(url) >>> re.findall(r'<td class="w2p_fw">(.*?)</td>', html) ['<img src="/places/static/images/flags/gb.png" />', '244,820 square kilometres', '62,348,447', 'GB', 'United Kingdom', 'London', 'EU', '.uk', 'GBP', 'Pound', '44', '@# #@@|@## #@@|@@# #@@|@@## #@@|@#@ #@@|@@#@ #@@|GIR0AA', '^(([A-Z]\d{2}[A-Z]{2})|([A-Z]\d{3}[A-Z]{2})|([A-Z]{2}\d{2} [A-Z]{2})|([A-Z]{2}\d{3}[A-Z]{2})|([A-Z]\d[A-Z]\d[A-Z]{2}) |([A-Z]{2}\d[A-Z]\d[A-Z]{2})|(GIR0AA))$', 'en-GB,cy-GB,gd', 'IE '] This result shows that the <td class="w2p_fw"> tag is used for multiple country attributes. If we simply want to scrape the country area, we can select the second matching element, as follows: >>> re.findall('<td class="w2p_fw">(.*?)</td>', html)[1] '244,820 square kilometres' This solution works but could easily fail if the web page is updated. Consider if this table is changed and the area is no longer in the second matching element. If we just need to scrape the data now, future changes can be ignored. However, if we want to rescrape this data at some point, we want our solution to be as robust against layout changes as possible. To make this regular expression more specific, we can include the parent <tr> element, which has an ID, so it ought to be unique: >>> re.findall('<tr id="places_area__row"><td class="w2p_fl"><label for="places_area" id="places_area__label">Area: </label></td><td class="w2p_fw">(.*?)</td>', html) ['244,820 square kilometres'] This iteration is better; however, there are many other ways the web page could be updated in a way that still breaks the regular expression.
For example, double quotation marks might be changed to single, extra spaces could be added between the tags, or the area_label could be changed. Here is an improved version to try and support these various possibilities: >>> re.findall('''<tr id="places_area__row">.*?<td\s*class=["']w2p_fw["']>(.*?)</td>''', html) ['244,820 square kilometres'] This regular expression is more future-proof but is difficult to construct, and quite unreadable. Also, there are still plenty of other minor layout changes that would break it, such as if a title attribute was added to the <td> tag or if the tr or td elements changed their CSS classes or IDs. From this example, it is clear that regular expressions provide a quick way to scrape data but are too brittle and easily break when a web page is updated. Fortunately, there are better data extraction solutions, such as the parsing libraries covered next. Beautiful Soup Beautiful Soup is a popular library that parses a web page and provides a convenient interface to navigate content. If you do not already have this module, the latest version can be installed using this command: pip install beautifulsoup4 The first step with Beautiful Soup is to parse the downloaded HTML into a soup document. Many web pages do not contain perfectly valid HTML and Beautiful Soup needs to correct improper open and close tags. For example, consider this simple web page containing a list with missing attribute quotes and closing tags: <ul class=country> <li>Area <li>Population </ul> If the Population item is interpreted as a child of the Area item instead of the list, we could get unexpected results when scraping. Let us see how Beautiful Soup handles this: >>> from bs4 import BeautifulSoup >>> broken_html = '<ul class=country><li>Area<li>Population</ul>' >>> # parse the HTML >>> soup = BeautifulSoup(broken_html, 'html.parser') >>> fixed_html = soup.prettify() >>> print(fixed_html) <ul class="country"> <li> Area <li> Population </li> </li> </ul> We can see that using the default html.parser did not result in properly parsed HTML. We can see from the previous snippet that it has used nested li elements, which might make it difficult to navigate. Luckily there are more options for parsers. We can install lxml or we can also use html5lib. To install html5lib, simply use pip: pip install html5lib Now, we can repeat this code, changing only the parser like so: >>> soup = BeautifulSoup(broken_html, 'html5lib') >>> fixed_html = soup.prettify() >>> print(fixed_html) <html> <head> </head> <body> <ul class="country"> <li> Area </li> <li> Population </li> </ul> </body> </html> Here, BeautifulSoup using html5lib was able to correctly interpret the missing attribute quotes and closing tags, as well as add the <html> and <body> tags to form a complete HTML document. You should see similar results if you used lxml. Now, we can navigate to the elements we want using the find() and find_all() methods: >>> ul = soup.find('ul', attrs={'class':'country'}) >>> ul.find('li') # returns just the first match <li>Area</li> >>> ul.find_all('li') # returns all matches [<li>Area</li>, <li>Population</li>] For a full list of available methods and parameters, the official documentation is available at http://www.crummy.com/software/BeautifulSoup/bs4/doc/.
Now, using these techniques, here is a full example to extract the country area from our example website: >>> from bs4 import BeautifulSoup >>> url = 'http://example.webscraping.com/places/view/United-Kingdom-239' >>> html = download(url) >>> soup = BeautifulSoup(html) >>> # locate the area row >>> tr = soup.find(attrs={'id':'places_area__row'}) >>> td = tr.find(attrs={'class':'w2p_fw'}) # locate the data element >>> area = td.text # extract the text from the data element >>> print(area) 244,820 square kilometres This code is more verbose than regular expressions but easier to construct and understand. Also, we no longer need to worry about problems in minor layout changes, such as extra whitespace or tag attributes. We also know if the page contains broken HTML that BeautifulSoup can help clean the page and allow us to extract data from very broken website code. Lxml Lxml is a Python library built on top of the libxml2 XML parsing library written in C, which helps make it faster than Beautiful Soup but also harder to install on some computers, specifically Windows. The latest installation instructions are available at http://lxml.de/installation.html. If you run into difficulties installing the library on your own, you can also use Anaconda to do so:  https://anaconda.org/anaconda/lxml. If you are unfamiliar with Anaconda, it is a package and environment manager primarily focused on open data science packages built by the folks at Continuum Analytics. You can download and install Anaconda by following their setup instructions here: https://www.continuum.io/downloads. Note that using the Anaconda quick install will set your PYTHON_PATH to the Conda installation of Python. As with Beautiful Soup, the first step when using lxml is parsing the potentially invalid HTML into a consistent format. Here is an example of parsing the same broken HTML: >>> from lxml.html import fromstring, tostring >>> broken_html = '<ul class=country><li>Area<li>Population</ul>' >>> tree = fromstring(broken_html) # parse the HTML >>> fixed_html = tostring(tree, pretty_print=True) >>> print(fixed_html) <ul class="country"> <li>Area</li> <li>Population</li> </ul> As with BeautifulSoup, lxml was able to correctly parse the missing attribute quotes and closing tags, although it did not add the <html> and <body> tags. These are not requirements for standard XML and so are unnecessary for lxml to insert. After parsing the input, lxml has a number of different options to select elements, such as XPath selectors and a find() method similar to Beautiful Soup. Instead, we will use CSS selectors here, because they are more compact and can be reused later when parsing dynamic content. Some readers will already be familiar with them from their experience with jQuery selectors or use in front-end web application development. We will compare performance of these selectors with XPath. To use CSS selectors, you might need to install the cssselect library like so: pip install cssselect Now we can use the lxml CSS selectors to extract the area data from the example page: >>> tree = fromstring(html) >>> td = tree.cssselect('tr#places_area__row > td.w2p_fw')[0] >>> area = td.text_content() >>> print(area) 244,820 square kilometres By using the cssselect method on our tree, we can utilize CSS syntax to select a table row element with the places_area__row ID, and then the child table data tag with the w2p_fw class. 
Since cssselect returns a list, we then index the first result and call the text_content method, which will iterate over all child elements and return concatenated text of each element. In this case, we only have one element, but this functionality is useful to know for more complex extraction examples. Summary We have walked through a variety of ways to scrape data from a web page. Regular expressions can be useful for a one-off scrape or to avoid the overhead of parsing the entire web page, and BeautifulSoup provides a high-level interface while avoiding any difficult dependencies. However, in general, lxml will be the best choice because of its speed and extensive functionality, so we will use it in future examples. Resources for Article: Further resources on this subject: Web scraping with Python (Part 2) [article] Scraping the Web with Python - Quick Start [article] Scraping the Data [article]

Manipulating functions in functional programming

Packt
20 Jun 2017
6 min read
In this article by Wisnu Anggoro, author of the book Learning C++ Functional Programming, you will learn to apply functional programming techniques to C++ to build highly modular, testable, and reusable code. In this article, you will learn the following topics: Applying a first-class function in all functions Passing a function as other functions parameter Assigning a function to a variable Storing a function in the container (For more resources related to this topic, see here.) Applying a first-class function in all functions There's nothing special with the first-class function since it's a normal class. We can treat the first-class function like any other data type. However, in the language that supports the first-class function, we can perform the following tasks without invoking the compiler recursively: Passing a function as other function parameters Assigning functions to a variable Storing functions in collections Fortunately, C++ can be used to solve the preceding tasks. We will discuss it in depth in the following topics. Passing a function as other functions parameter Let's start passing a function as the function parameter. We will choose one of four functions and invoke the function from its main function. The code will look as follows: /* first-class-1.cpp */ #include <functional> #include <iostream> using namespace std; typedef function<int(int, int)> FuncType; int addition(int x, int y) { return x + y; } int subtraction(int x, int y) { return x - y; } int multiplication(int x, int y) { return x * y; } int division(int x, int y) { return x / y; } void PassingFunc(FuncType fn, int x, int y) { cout << "Result = " << fn(x, y) << endl; } int main() { int i, a, b; FuncType func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; switch(i) { case 1: PassingFunc(addition, a, b); break; case 2: PassingFunc(subtraction, a, b); break; case 3: PassingFunc(multiplication, a, b); break; case 4: PassingFunc(division, a, b); break; } return 0; } From the preceding code, we can see that we have four functions, and we want the user to choose one from them, and then run it. In the switch statement, we will invoke one of the four functions based on the choice of the user. We will pass the selected function to PassingFunc(), as we can see in the following code snippet: case 1: PassingFunc(addition, a, b); break; case 2: PassingFunc(subtraction, a, b); break; case 3: PassingFunc(multiplication, a, b); break; case 4: PassingFunc(division, a, b); break; The result we get on the screen should look like the following screenshot: The preceding screenshot shows that we selected the Subtraction mode and gave 8 to a and 3 to b. As we expected, the code gives us 5 as a result. Assigning a function to variable We can also assign a function to the variable so that we can call the function by calling the variable. We will refactor first-class-1.cpp, and it will be as follows: /* first-class-2.cpp */ #include <functional> #include <iostream> using namespace std; // Adding the addition, subtraction, multiplication, and // division function as we've got in first-class-1.cpp int main() { int i, a, b; function<int(int, int)> func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. 
Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; switch(i) { case 1: func = addition; break; case 2: func = subtraction; break; case 3: func = multiplication; break; case 4: func = division; break; } cout << "Result = " << func(a, b) << endl; return 0; } We will now assign the four functions based on the user choice. We will store the selected function in func variable inside the switch statement, as follows: case 1: func = addition; break; case 2: func = subtraction; break; case 3: func = multiplication; break; case 4: func = division; break; After the func variable is assigned with the user's choice, the code will just call the variable like calling the function as follows: cout << "Result = " << func(a, b) << endl; Moreover, we will obtain the same output on the console if we run the code. Storing a function in the container Now, let's save the function to the container. Here, we will use the vector as the container. The code is as follows: /* first-class-3.cpp */ #include <vector> #include <functional> #include <iostream> using namespace std; typedef function<int(int, int)> FuncType; // Adding the addition, subtraction, multiplication, and // division function as we've got in first-class-1.cpp int main() { vector<FuncType> functions; functions.push_back(addition); functions.push_back(subtraction); functions.push_back(multiplication); functions.push_back(division); int i, a, b; function<int(int, int)> func; cout << "Select mode:" << endl; cout << "1. Addition" << endl; cout << "2. Subtraction" << endl; cout << "3. Multiplication" << endl; cout << "4. Division" << endl; cout << "Choice: "; cin >> i; cout << "a = "; cin >> a; cout << "b = "; cin >> b; cout << "Result = " << functions.at(i - 1)(a, b) << endl; return 0; } From the preceding code, we can see that we created a new vector named functions, then stored four different functions to it. Same with our two previous code samples, we ask the user to select the mode as well. However, now the code becomes simpler since we don't need to add the switch statement as we can select the function directly by selecting the vector index, as we can see in the following line of code: cout << "Result = " << functions.at(i - 1)(a, b) << endl; However, since the vector is a zero-based index, we have to adjust the index with the menu choice. The result will be the same with our two previous code samples. Summary In this article, we discussed that there are some techniques to manipulate a function to produce the various purpose on it. Since we can implement the first-class function in C++ language, we can pass a function as other functions parameter. We can treat a function as a data object so we can assign it to a variable and store it in the container. Resources for Article: Further resources on this subject: Introduction to the Functional Programming [article] Functional Programming in C# [article] Putting the Function in Functional Programming [article]

Monitoring, Logging, and Troubleshooting

Packt
20 Jun 2017
6 min read
In this article by Gigi Sayfan, the author of the book Mastering Kubernetes, we will learn how to monitor Kubernetes with Heapster. (For more resources related to this topic, see here.) Monitoring Kubernetes with Heapster Heapster is a Kubernetes project that provides a robust monitoring solution for Kubernetes clusters. It runs as a pod (of course), so it can be managed by Kubernetes itself. Heapster supports Kubernetes and CoreOS clusters. It has a very modular and flexible design. Heapster collects both operational metrics and events from every node in the cluster, stores them in a persistent backend (with a well-defined schema) and allows visualization and programmatic access. Heapster can be configured to use different backends (or sinks, in Heapster's parlance) and their corresponding visualization frontends. The most common combination is InfluxDB as backend and Grafana as frontend. The Google Cloud platform integrates Heapster with the Google monitoring service. There are many other less common backends, such as the following:
Log
InfluxDB
Google Cloud monitoring
Google Cloud logging
Hawkular-Metrics (metrics only)
OpenTSDB
Monasca (metrics only)
Kafka (metrics only)
Riemann (metrics only)
Elasticsearch
You can use multiple backends by specifying sinks on the command line: --sink=log --sink=influxdb:http://monitoring-influxdb:80/ cAdvisor cAdvisor is part of the kubelet, which runs on every node. It collects information about the CPU/cores usage, memory, network, and file systems of each container. It provides a basic UI on port 4194, but, most importantly for Heapster, it provides all this information through the kubelet. Heapster records the information collected by cAdvisor on each node and stores it in its backend for analysis and visualization. The cAdvisor UI is useful if you want to quickly verify that a particular node is set up correctly, for example, while creating a new cluster when Heapster is not hooked up yet. Here is what it looks like: InfluxDB backend InfluxDB is a modern and robust distributed time-series database. It is very well suited to, and used broadly for, centralized metrics and logging. It is also the preferred Heapster backend (outside the Google Cloud platform). The only caveat is that InfluxDB clustering and high availability are part of the enterprise offering. The storage schema The InfluxDB storage schema defines the information that Heapster stores in InfluxDB, which is available for querying and graphing later. The metrics are divided into multiple categories, called measurements. You can treat and query each metric separately, or you can query a whole category as one measurement and receive the individual metrics as fields. The naming convention is <category>/<metrics name> (except for uptime, which has a single metric). If you have a SQL background, you can think of measurements as tables. Each metric is stored per container.
Each metric is labeled with the following information: pod_id – Unique ID of a pod pod_name – User-provided name of a pod pod_namespace – The namespace of a pod container_base_image – Base image for the container container_name – User-provided name of the container or full cgroup name for system containers host_id – Cloud-provider-specified or user-specified Identifier of a node hostname – Hostname where the container ran labels – Comma-separated list of user-provided labels; format is key:value’ namespace_id – UID of the namespace of a pod resource_id – A unique identifier used to differentiate multiple metrics of the same type, for example, FS partitions under filesystem/usage Here are all the metrics grouped by category. As you can see, it is quite extensive. CPU cpu/limit – CPU hard limit in millicores cpu/node_capacity – CPU capacity of a node cpu/node_allocatable – CPU allocatable of a node cpu/node_reservation – Share of CPU that is reserved on the node allocatable cpu/node_utilization – CPU utilization as a share of node allocatable cpu/request – CPU request (the guaranteed amount of resources) in millicores cpu/usage – Cumulative CPU usage on all cores cpu/usage_rate – CPU usage on all cores in millicores File system filesystem/usage – Total number of bytes consumed on a filesystem filesystem/limit – The total size of the filesystem in bytes filesystem/available – The number of available bytes remaining in the filesystem Memory memory/limit – Memory hard limit in bytes memory/major_page_faults – Number of major page faults memory/major_page_faults_rate – Number of major page faults per second memory/node_capacity – Memory capacity of a node memory/node_allocatable – Memory allocatable of a node memory/node_reservation – Share of memory that is reserved on the node allocatable memory/node_utilization – Memory utilization as a share of memory allocatable memory/page_faults – Number of page faults memory/page_faults_rate – Number of page faults per second memory/request – Memory request (the guaranteed amount of resources) in bytes memory/usage – Total memory usage memory/working_set – Total working set usage; working set is the memory being used and not easily dropped by the kernel Network network/rx – Cumulative number of bytes received over the network network/rx_errors – Cumulative number of errors while receiving over the network network/rx_errors_rate – Number of errors per second while receiving over the network network/rx_rate – Number of bytes received over the network per second network/tx – Cumulative number of bytes sent over the network network/tx_errors – Cumulative number of errors while sending over the network network/tx_errors_rate – Number of errors while sending over the network network/tx_rate – Number of bytes sent over the network per second Uptime uptime – Number of milliseconds since the container was started You can work with InfluxDB directly if you’re familiar with it. You can either connect to it using its own API or use its web interface. Type the following command to find its port: k describe service monitoring-influxdb --namespace=kube-system | grep NodePort Type: NodePort NodePort: http 32699/TCP NodePort: api 30020/TCP Now you can browse to the InfluxDB web interface using the HTTP port. You’ll need to configure it to point to the API port. The username and password are root and root by default: Once you’re setup you can select what database to use (see top-right corner). The Kubernetes database is called k8s. 
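If you want to pull these metrics into your own tooling rather than the web interface, InfluxDB also exposes an HTTP query endpoint. The following is a minimal sketch, not taken from the original article, that queries the k8s database from R (the language used in the Facebook analysis earlier on this page) with the httr and jsonlite packages; the node IP placeholder, the NodePort value shown above, and the "value" field name written by Heapster's InfluxDB sink are assumptions you should verify against your own cluster.

# Rough sketch only: query Heapster metrics through InfluxDB's HTTP API.
# Assumptions: <node-ip> is a placeholder for one of your node IPs, 32699 is
# the HTTP NodePort reported above, and Heapster stores each metric in a
# field named "value" -- check your own deployment before relying on this.
library(httr)
library(jsonlite)

influx_query <- 'SELECT mean("value") FROM "cpu/usage_rate" WHERE time > now() - 1h GROUP BY "pod_name"'

response <- GET("http://<node-ip>:32699/query",
                query = list(db = "k8s",
                             q  = influx_query,
                             u  = "root",
                             p  = "root"))

# parse the JSON payload into R lists/data frames
metrics <- fromJSON(content(response, as = "text"), flatten = TRUE)
str(metrics$results)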
You can now query the metrics using the InfluxDB query language. Grafana visualization Grafana runs in its own container and serves a sophisticated dashboard that works well with InfluxDB as a data source. To locate the port, type the following command: k describe service monitoring-grafana --namespace=kube-system | grep NodePort Type: NodePort NodePort: <unset> 30763/TCP Now you can access the Grafana web interface on that port. The first thing you need to do is set up the data source to point to the InfluxDB backend: Make sure to test the connection and then go explore the various options in the dashboards. There are several default dashboards, but you should be able to customize them to your preferences. Grafana is designed to let you adapt it to your needs. Summary In this article we have learned how to monitor Kubernetes with Heapster. Resources for Article: Further resources on this subject: The Microsoft Azure Stack Architecture [article] Building A Recommendation System with Azure [article] Setting up a Kubernetes Cluster [article]