
How-To Tutorials

7018 Articles
Packt
21 Feb 2018
6 min read

Play With Functions

This article by Igor Wojda and Marcin Moskala, authors of the book Android Development with Kotlin, introduces functions in Kotlin, together with different ways of calling functions.

Single-expression functions

In typical programming, many functions contain only one expression. Here is an example of this kind of function:

    fun square(x: Int): Int {
        return x * x
    }

Here is another one, which can often be found in Android projects. It is a pattern used in an Activity to define methods that just get text from a view, or provide some other data from the view, so that the presenter can read it:

    fun getEmail(): String {
        return emailView.text.toString()
    }

Both functions are defined to return the result of a single expression. In the first example it is the result of the multiplication x * x, and in the second it is the result of the expression emailView.text.toString(). Functions of this kind are used all around Android projects. Here are some common use cases:

• Extracting small operations
• Using polymorphism to provide values specific to a class
• Functions that only create an object
• Functions that pass data between architecture layers (as in the preceding example, where the Activity passes data from the view to the presenter)
• Functional-programming-style functions based on recursion

Such functions are used often, so Kotlin has a dedicated notation for them. When a function returns a single expression, the curly braces and the return keyword can be omitted, and the expression is specified directly after an equals sign. Functions defined this way are called single-expression functions. Let's update our square function and define it as a single-expression function:

    fun square(x: Int): Int = x * x

As we can see, a single-expression function has an expression body instead of a block body. This notation is shorter, but the whole body needs to be just a single expression.
In single-expression functions, declaring the return type is optional, because the compiler can infer it from the type of the expression. This is why we can simplify the square function further and define it this way:

    fun square(x: Int) = x * x

There are many places inside an Android application where we can use single-expression functions. Consider a RecyclerView adapter that provides a layout ID and creates a ViewHolder:

    class AddressAdapter : ItemAdapter<AddressAdapter.ViewHolder>() {
        override fun getLayoutId() = R.layout.choose_address_view
        override fun onCreateViewHolder(itemView: View) = ViewHolder(itemView)
        // Rest of methods
    }

In this example, we achieve high readability thanks to single-expression functions. Single-expression functions are also very popular in the functional world.

The single-expression notation also pairs well with the when structure. Here is an example of the two combined, used to get specific data from an object according to a key (a use case from a big Kotlin project):

    fun valueFromBooking(key: String, booking: Booking?) = when(key) { // 1
        "patient.nin" -> booking?.patient?.nin
        "patient.email" -> booking?.patient?.email
        "patient.phone" -> booking?.patient?.phone
        "comment" -> booking?.comment
        else -> null
    }

At the line marked 1, we don't need to declare a return type, because it is inferred from the when expression.

Another common Android example is combining a when expression with the Activity method onOptionsItemSelected, which handles top bar menu clicks:

    override fun onOptionsItemSelected(item: MenuItem): Boolean = when {
        item.itemId == android.R.id.home -> {
            onBackPressed()
            true
        }
        else -> super.onOptionsItemSelected(item)
    }

As we can see, single-expression functions can make our code more concise and more readable. They are commonly used in Kotlin Android projects, and they are really popular in functional programming. Here is an example.
Let's suppose that we need to filter all the odd values from the following list:

    val list = listOf(1, 2, 3, 4, 5)

We will use the following helper function, which returns true if its argument is odd and false otherwise:

    fun isOdd(i: Int) = i % 2 != 0 // != 0 also handles negative odd numbers

In the imperative programming style, we specify the steps of processing, which are tied to the execution process (iterate through the list, check whether the value is odd, add the value to a new list if it is). Here is an implementation of this functionality that is typical of the imperative style:

    var oddList = emptyList<Int>()
    for (i in list) {
        if (isOdd(i)) {
            oddList += i
        }
    }

In the declarative programming style, the way of thinking about code is different: we think about what the required result is and simply use functions that will give us this result. The Kotlin stdlib provides a lot of functions that support the declarative style. Here is how we could implement the same functionality using one of them, called filter:

    val oddList = list.filter(::isOdd)

filter is a function that keeps only the elements for which the predicate returns true. Here the function isOdd is used as the predicate.

Different ways of calling a function

Sometimes we need to call a function and provide only selected arguments. In Java we could create multiple overloads of the same method, but this solution has some limitations. The first problem is that the number of possible overloads grows very quickly (2^n for n optional parameters), which makes them very difficult to maintain. The second problem is that overloads must be distinguishable from each other so that the compiler knows which one to call, so when a method defines several parameters of the same type, we can't define all possible overloads. That's why in Java we often need to pass multiple null values to a method:

    // Java
    printValue("abc", null, null, "!");

Multiple null parameters add boilerplate and greatly decrease method readability.
In Kotlin there is no such problem, because Kotlin has features called default arguments and named argument syntax.

Default argument values

Default arguments are mostly known from C++, which is one of the oldest languages supporting them. A default argument provides a value for a parameter in case it is not provided during the call. Each function parameter can have a default value. It can be any value that matches the specified type, including null. This way we can define functions that can be called in multiple ways. For example, a printValue function consistent with the calls that follow can be defined like this:

    fun printValue(value: String, inBracket: Boolean = true, prefix: String = "", suffix: String = "") {
        print(prefix)
        if (inBracket) {
            print("($value)")
        } else {
            print(value)
        }
        println(suffix)
    }

We can use this function the same way as a normal function (a function without default argument values) by providing values for each parameter (all arguments):

    printValue("str", true, "", "") // Prints: (str)

Thanks to default argument values, we can call the function by providing arguments only for the parameters without default values:

    printValue("str") // Prints: (str)

We can also provide all the parameters without default values, and only some of those that have a default value:

    printValue("str", false) // Prints: str

Named argument syntax

Sometimes we only want to pass a value for the last argument. Let's suppose that we want to define a value for suffix, but not for prefix and inBracket (which are defined before suffix). Normally we would have to provide values for all the previous parameters, including those with default values:

    printValue("str", true, "", "!") // Prints: (str)!

By using named argument syntax, we can pass a specific argument using its name:

    printValue("str", suffix = "!") // Prints: (str)!

We can also use named argument syntax together with the classic call. The only restriction is that once we start using the named syntax, we cannot use the classic one for the arguments that follow:

    printValue("str", true, "")
    printValue("str", true, prefix = "")
    printValue("str", inBracket = true, prefix = "")

Summary

In this article, we learned about single-expression functions as a way of defining functions in application development.
We also briefly looked at different ways of calling a function, using default argument values and named argument syntax.

Savia Lobo
21 Feb 2018
5 min read

How to create a conversational assistant or chatbot using Python

Note: This article is an excerpt from the book Natural Language Processing with Python Cookbook, written by Krishna Bhavsar, Naresh Kumar, and Pratap Dangeti. The book includes recipes that teach various aspects of performing Natural Language Processing with NLTK, the leading Python platform for the task.

Today we will learn to create a conversational assistant, or chatbot, using the Python programming language.

Conversational assistants or chatbots are not very new. One of the earliest of this kind is ELIZA, which was created in the mid-1960s and is worth exploring.

To be successful, a conversational engine should take care of the following things:

1. Understand the target audience
2. Understand the natural language in which communication happens
3. Understand the intent of the user
4. Come up with responses that can answer the user and give further clues

NLTK has a module, nltk.chat, which simplifies building these engines by providing a generic framework. Let's see the available engines in NLTK:

Engine    Module
Eliza     nltk.chat.eliza
Iesha     nltk.chat.iesha
Rude      nltk.chat.rude
Suntsu    nltk.chat.suntsu
Zen       nltk.chat.zen

To interact with these engines, we can just load these modules in our Python program and invoke the demo() function. This recipe will show us how to use the built-in engines and also how to write our own simple conversational engine using the framework provided by the nltk.chat module.

Getting ready

You should have Python installed, along with the nltk library. Having an understanding of regular expressions also helps.

How to do it...

1. Open the Atom editor (or your favorite programming editor).
2. Create a new file called Conversational.py.
3. Type the following source code:
4. Save the file.
5. Run the program using the Python interpreter.
6. You will see the following output:

How it works...
Let's try to understand what we are trying to achieve here.

    import nltk

This instruction imports the nltk library into the current program.

    def builtinEngines(whichOne):

This instruction defines a new function called builtinEngines that takes a string parameter, whichOne:

        if whichOne == 'eliza':
            nltk.chat.eliza.demo()
        elif whichOne == 'iesha':
            nltk.chat.iesha.demo()
        elif whichOne == 'rude':
            nltk.chat.rude.demo()
        elif whichOne == 'suntsu':
            nltk.chat.suntsu.demo()
        elif whichOne == 'zen':
            nltk.chat.zen.demo()
        else:
            print("unknown built-in chat engine {}".format(whichOne))

These if, elif, else instructions are typical branching instructions that decide which chat engine's demo() function is invoked, depending on the argument passed in the whichOne variable. When the user passes an unknown engine name, the function displays a message saying that it's not aware of this engine. It's good practice to handle unknown cases as well as known ones; it makes our programs more robust in unexpected situations.

    def myEngine():

This instruction defines a new function called myEngine(); this function does not take any parameters.

        chatpairs = (
            (r"(.*?)Stock price(.*)",
                ("Today stock price is 100",
                 "I am unable to find out the stock price.")),
            (r"(.*?)not well(.*)",
                ("Oh, take care. May be you should visit a doctor",
                 "Did you take some medicine ?")),
            (r"(.*?)raining(.*)",
                ("Its monsoon season, what more do you expect ?",
                 "Yes, its good for farmers")),
            (r"How(.*?)health(.*)",
                ("I am always healthy.",
                 "I am a program, super healthy!")),
            (r".*",
                ("I am good. How are you today ?",
                 "What brings you here ?"))
        )

This is a single instruction in which we define a nested tuple data structure and assign it to chatpairs.
Let's pay close attention to the data structure:

• We are defining a tuple of tuples
• Each subtuple consists of two elements:
  • The first member is a regular expression (this is the user's question in regex format)
  • The second member is another tuple (these are the possible answers)

        def chat():
            print("!"*80)
            print(" >> my Engine << ")
            print("Talk to the program using normal english")
            print("="*80)
            print("Enter 'quit' when done")
            chatbot = nltk.chat.util.Chat(chatpairs, nltk.chat.util.reflections)
            chatbot.converse()

We are defining a subfunction called chat() inside the myEngine() function. This is permitted in Python. The chat() function displays some information to the user on the screen and instantiates the built-in nltk.chat.util.Chat class with the chatpairs variable. It passes nltk.chat.util.reflections as the second argument. Finally, we call the converse() method on the object that was created from the Chat class.

        chat()

This instruction calls the chat() function, which shows a prompt on the screen and accepts the user's requests. It shows responses according to the regular expressions that we built before:

    if __name__ == '__main__':
        for engine in ['eliza', 'iesha', 'rude', 'suntsu', 'zen']:
            print("=== demo of {} ===".format(engine))
            builtinEngines(engine)
            print()
        myEngine()

These instructions are run when the program is invoked as a standalone program (not imported). They do two things:

• Invoke the built-in engines one after another (so that we can experience them)
• Once all five built-in engines have exited, call our myEngine(), where our custom engine comes into play

We have learned to create a chatbot of our own using Python. To learn more about how to use NLTK efficiently and implement text classification, identify parts of speech, tag words, and so on, check out Natural Language Processing with Python Cookbook.
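The way a tuple of (pattern, responses) pairs drives the conversation can be sketched without NLTK. The following is a simplified stand-in written with the standard library only, not NLTK's implementation: the PAIRS data and the respond() function are hypothetical names, and the sketch assumes case-insensitive matching from the start of the input and skips the reflections handling that nltk.chat.util.Chat adds.

```python
import random
import re

# Hypothetical pairs in the same (pattern, responses) shape as chatpairs.
PAIRS = (
    (r"(.*?)raining(.*)", ("It's monsoon season.", "Yes, it's good for farmers.")),
    (r"How(.*?)health(.*)", ("I am always healthy.",)),
    (r".*", ("What brings you here?",)),  # catch-all, so it must come last
)


def respond(text, pairs=PAIRS):
    """Return a reply from the first pair whose pattern matches text."""
    for pattern, responses in pairs:
        # re.match anchors at the start of the input, which is why the
        # patterns begin with (.*?) when the keyword may appear anywhere.
        if re.match(pattern, text, re.IGNORECASE):
            return random.choice(responses)
    return None


if __name__ == "__main__":
    for line in ("Is it raining today", "How is your health", "hello"):
        print("{!r} -> {!r}".format(line, respond(line)))
```

Because the pairs are tried in order, the broad r".*" pattern only answers when nothing more specific has matched; moving it to the top would shadow every other pair.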

Amarabha Banerjee
21 Feb 2018
7 min read

Installing TensorFlow in Windows, Ubuntu and Mac OS

Note: This article is taken from the book Machine Learning with TensorFlow 1.x, written by Quan Hua, Shams Ul Azeem, and Saif Ahmed. The book will help you tackle common commercial machine learning problems with Google's TensorFlow 1.x library.

Today, we shall explore the basics of getting started with TensorFlow: its installation and configuration process.

The proliferation of large public datasets, inexpensive GPUs, and an open-minded developer culture has revolutionized machine learning efforts in recent years. Training data, the lifeblood of machine learning, has become widely available and easily consumable. Cheaper computing power has made the required horsepower available to small businesses and even individuals. The current decade is incredibly exciting for data scientists.

Some of the top platforms used in the industry include Caffe, Theano, and Torch. While the underlying platforms are actively developed and openly shared, usage is limited largely to machine learning practitioners due to difficult installations, non-obvious configurations, and difficulty with productionizing solutions.

TensorFlow has one of the easiest installations of any platform, bringing machine learning capabilities squarely into the realm of casual tinkerers and novice programmers. Meanwhile, high-performance features, such as multi-GPU support, make the platform exciting for experienced data scientists and industrial use as well. TensorFlow also provides a reimagined process and multiple user-friendly utilities, such as TensorBoard, to manage machine learning efforts. Finally, the platform has significant backing and community support from the world's largest machine learning powerhouse, Google. All this is before even considering the compelling underlying technical advantages, which we'll dive into later.

Installing TensorFlow

TensorFlow conveniently offers several types of installation and operates on multiple operating systems.
The basic installation is CPU-only, while more advanced installations unleash serious horsepower by pushing calculations onto the graphics card, or even onto multiple graphics cards. We recommend starting with a basic CPU installation. More complex GPU and CUDA installations will be discussed in Appendix, Advanced Installation.

Even with just a basic CPU installation, TensorFlow offers multiple options, which are as follows:

• A basic Python pip installation
• A segregated Python installation via Virtualenv
• A fully segregated container-based installation via Docker

Ubuntu installation

Ubuntu is one of the best Linux distributions for working with TensorFlow. We highly recommend that you use an Ubuntu machine, especially if you want to work with a GPU. We will do most of our work in the Ubuntu terminal. We will begin by installing python-pip and python-dev via the following command:

    sudo apt-get install python-pip python-dev

A successful installation will appear as follows:

[Screenshot: successful python-pip and python-dev installation]

If you find missing packages, you can correct them via the following command:

    sudo apt-get update --fix-missing

Then, you can continue the python and pip installation.

We are now ready to install TensorFlow. The CPU installation is initiated via the following command:

    sudo pip install tensorflow

A successful installation will appear as follows:

[Screenshot: successful TensorFlow installation on Ubuntu]

macOS installation

If you use Python, you will probably already have the Python package installer, pip. However, if not, you can easily install it using the easy_install pip command. You'll note that we actually executed sudo easy_install pip; the sudo prefix was required because the installation requires administrative rights.

We will make the fair assumption that you already have the basic package installer, easy_install, available; if not, you can install it from https://pypi.python.org/pypi/setuptools.
A successful installation will appear as shown in the following screenshot:

[Screenshot: successful pip installation on macOS]

Next, we will install the six package:

    sudo easy_install --upgrade six

A successful installation will appear as shown in the following screenshot:

[Screenshot: successful six installation on macOS]

Surprisingly, those are the only two prerequisites for TensorFlow, and we can now install the core platform. We will use the pip package installer mentioned earlier and install TensorFlow directly from Google's site. The most recent version at the time of writing this book is v1.3, but you should change this to the latest version you wish to use:

    sudo pip install tensorflow

The pip installer will automatically gather all the other required dependencies. You will see each individual download and installation until the software is fully installed.

A successful installation will appear as shown in the following screenshot:

[Screenshot: successful TensorFlow installation on macOS]

That's it! If you were able to get to this point, you can start to train and run your first model. Skip to Chapter 2, Your First Classifier, to train your first model.

macOS users wishing to completely segregate their installation can use a VM instead, as described in the Windows installation.

Windows installation

As we mentioned earlier, TensorFlow with Python 2.7 does not function natively on Windows. In this section, we will guide you through installing TensorFlow with Python 3.5 or 3.6, and through setting up a VM with Linux if you want to use TensorFlow with Python 2.7.

First, we need to install Python 3.5.x or 3.6.x 64-bit from the following links:

https://www.python.org/downloads/release/python-352/
https://www.python.org/downloads/release/python-362/

Make sure that you download the 64-bit version of Python, where the name of the installer contains amd64, such as python-3.6.2-amd64.exe. The Python 3.6.2 installation looks like this:

[Screenshot: Python 3.6.2 installer]

We will select Add Python 3.6 to PATH and click Install Now.
The installation process will complete with the following screen:

[Screenshot: Python setup completed]

We will click Disable path length limit and then click Close to finish the Python installation. Now, let's open the Windows PowerShell application under the Windows menu. We will install the CPU-only version of TensorFlow with the following command:

    pip3 install tensorflow

The result of the installation will look like this:

[Screenshot: successful TensorFlow installation in PowerShell]

Congratulations, you can now use TensorFlow on Windows with Python 3.5.x or 3.6.x support.

In the next section, we will show you how to set up a VM to use TensorFlow with Python 2.7. However, you can skip to the Test installation section of Chapter 2, Your First Classifier, if you don't need Python 2.7.

Now, we will show you how to set up a VM with Linux to use TensorFlow with Python 2.7. We recommend the free VirtualBox system available at https://www.virtualbox.org/wiki/Downloads. The latest stable version at the time of writing is v5.1.28, available at the following URL:

http://download.virtualbox.org/virtualbox/5.1.28/VirtualBox-5.1.28-117968-Win.exe

A successful installation will allow you to run the Oracle VM VirtualBox Manager dashboard, which looks like this:

[Screenshot: Oracle VM VirtualBox Manager dashboard]

Testing the installation

In this section, we will use TensorFlow to compute a simple math operation. First, open your terminal on Linux/macOS or Windows PowerShell on Windows.

Now, we need to start Python in order to use TensorFlow, with the following command:

    python

Enter the following program in the Python shell:

    import tensorflow as tf
    a = tf.constant(1.0)
    b = tf.constant(2.0)
    c = a + b
    sess = tf.Session()
    print(sess.run(c))

The result will look like the following screen, where 3.0 is printed at the end:

[Screenshot: Python shell output ending in 3.0]

We covered TensorFlow installation on the three major operating systems, so that you are up and running with the platform. Windows users faced an extra challenge, as TensorFlow on Windows only supports the 64-bit versions of Python 3.5.x and 3.6.x. However, even Windows users should now be up and running.
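The requirements mentioned in the installation steps (a 64-bit Python build, pip, and the tensorflow package itself) can be checked before running the test program. The following is a minimal sketch that uses only the standard library; describe_environment() is a hypothetical helper name, and the script only reports what it finds rather than installing or importing anything.

```python
import importlib.util
import struct
import sys


def describe_environment():
    """Collect the facts the installation steps depend on."""
    # Size of a C pointer in bits: 64 on a 64-bit Python build.
    bits = struct.calcsize("P") * 8
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "is_64_bit": bits == 64,
        # find_spec returns None when a top-level package is not installed,
        # so nothing is actually imported here.
        "pip_available": importlib.util.find_spec("pip") is not None,
        "tensorflow_installed": importlib.util.find_spec("tensorflow") is not None,
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("{}: {}".format(key, value))
```

If tensorflow_installed is reported as False after a pip installation, a likely cause is that pip installed into a different Python interpreter than the one running the script.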
If you liked this article, be sure to check out Machine Learning with TensorFlow 1.x, which works through implementing TensorFlow 1.x with contextual examples and will help you take on the challenges you may face in your machine learning environment.

Packt
21 Feb 2018
29 min read

Open and Proprietary Next Generation Networks

In this article by Steven Noble, the author of the book Building Modern Networks, we will discuss networking concepts such as hyper-scale networking, software-defined networking, and network hardware and software design, along with a litany of network design ideas utilized in NGN.

The term Next Generation Network (NGN) has been around for over 20 years and refers to the current state-of-the-art network equipment, protocols, and features. A big driver in NGN is the constant stream of newer, better, faster forwarding ASICs coming out of companies like Barefoot, Broadcom, Cavium, Nephos (MediaTek), and others. The advent of commodity networking chips has shortened the development time for generic switches, allowing hyper-scale networking end users to build equipment upgrades into their network designs.

At the time of writing, multiple companies have announced 6.4 Tbps switching chips. In layman's terms, a 6.4 Tbps switching chip can handle 64x100GbE of evenly distributed network traffic without losing any packets. To put the number in perspective, the entire internet in 2004 was about 4 Tbps, so all of the internet traffic in 2004 could have crossed this one switching chip without issue. (Internet Traffic 1.3 EB/month http://blogs.cisco.com/sp/the-history-and-future-of-internet-traffic)

A hyper-scale network is one that is operated by companies such as Facebook, Google, Twitter, and others that add hundreds if not thousands of new systems a month to keep up with demand.

Examples of next generation networking

At the start of the commercial internet age (1994), software routers running on minicomputers, such as BBN's PDP-11-based IP routers designed in the 1970s, were still in use, and hubs were simply dumb hardware devices that broadcast traffic everywhere. At that time, the state of the art in networking was the Cisco 7000 series router, introduced in 1993.
The next generation router was the Cisco 7500 (1995), while the Cisco 12000 series (gigabit) routers and the Juniper M40 were only concepts.

When we say next generation, we are speaking of the current state of the art and the near future of networking equipment and software. For example, 100 Gb Ethernet is the current state of the art, while 400 Gb Ethernet is in the pipeline.

A modern network is a network that contains one or more of the following concepts:

• Software-defined Networking (SDN)
• Network design concepts
• Next generation hardware
• Hyper-scale networking
• Open networking hardware and software
• Network Function Virtualization (NFV)
• Highly configurable traffic management

Both open and closed network hardware vendors have been innovating at a high rate, with the help of and due to hyper-scale companies like Google, Facebook, and others that need next generation high speed network devices. This provides the network architect with a reasonable pipeline of equipment to be used in designs.

Google and Facebook are both companies with hyper-scale networks. A hyper-scale network is one where the data stored, transferred, and updated on the network grows exponentially. Hyper-scale companies deploy new equipment, software, and configurations weekly or even daily to support the needs of their customers. These companies have needs that are outside of what normal networking equipment offers, so they must innovate by building their own next generation network devices, designing multi-tiered networks (like a three stage Clos network), and automating the installation and configuration of the next generation networking devices.

The need of hyper-scalers is well summed up by Google's Amin Vahdat in a 2014 Wired article: "We couldn't buy the hardware we needed to build a network of the size and speed we needed to build".

Terms and concepts in networking

Here you will find the definitions of some terms that are important in networking.
They have been broken into groups of similar concepts.

Routing and switching concepts

In network devices and network designs there are many important concepts to understand. Here we begin with the way data is handled. The easiest way to discuss networking is to look at the OSI model and point out where each device sits.

OSI layers with respect to routers and switches:

• Layer 1 (Physical): Layer 1 includes cables, hubs, and switch ports. This is how all of the devices connect to each other, including copper cables (CatX), fiber optics, and Direct Attach Cables (DAC), which connect SFP ports without fiber.
• Layer 2 (Data link layer): Layer 2 includes the raw data sent over the links and manages the Media Access Control (MAC) addresses for Ethernet.
• Layer 3 (Network layer): Layer 3 includes packets that carry more than just Layer 2 data, such as IP, IPX (Novell's protocol), and AppleTalk (Apple's protocol).

Routers and switches

In a network you will have equipment that switches and/or routes traffic. A switch is a networking device that connects multiple devices such as servers, provides local connectivity, and provides an uplink to the core network. A router is a network device that computes paths to remote and local devices, providing connectivity to devices across a network. Both switches and routers can use copper and fiber connections to interconnect.

There are a few parts to a networking device: the forwarding chip, the TCAM, and the network processor. Some newer switches have Baseboard Management Controllers (BMCs), which manage the power, fans, and other hardware, lessening the burden on the NOS to manage these devices.

Currently routers and switches are very similar, as there are many Layer 3 forwarding capable switches and some Layer 2 forwarding capable routers. Making a switch Layer 3 capable is less of an issue than making a router Layer 2 capable, as the switch is already doing Layer 2 forwarding and adding Layer 3 on top is straightforward.
A router does not do Layer 2 forwarding in general, so it has to be modified to allow ports to switch rather than route.

Control plane

The control plane is where all of the information about how packets should be handled is kept. Routing protocols live in the control plane and are constantly scanning the information received to determine the best path for traffic to flow. This data is then packed into a simple table and pushed down to the data plane.

Data plane

The data plane is where forwarding happens. In a software router, this would be done in the device's CPU; in a hardware router, this would be done using the forwarding chip and its associated memories.

VLAN/VXLAN

A Virtual Local Area Network (VLAN) is a way of creating separate logical networks within a physical network. VLANs are generally used to separate or combine different users, or network elements such as phones, servers, workstations, and so on. You can have up to 4,096 VLANs on a network segment.

A Virtual Extensible LAN (VXLAN) was created to allow for large, dynamic, isolated logical networks for virtualized and multi-tenant networks. You can have up to 16 million VXLANs on a network segment.

A VXLAN Tunnel Endpoint (VTEP) is a set of two logical interfaces: inbound, which encapsulates incoming traffic into VXLANs, and outbound, which removes the encapsulation of outgoing traffic, returning it from VXLAN to its original state.

Network design concepts

Network design requires knowledge of the physical structure of the network so that the proper design choices are made. For example, in a data center you would have a local area network; if you have multiple data centers near each other, they would be considered a metro area network.

LAN

A Local Area Network (LAN) is generally considered to be within the same building. These networks can be bridged (switched) or routed. In general, LANs are segmented into areas to avoid large broadcast domains.
MAN

A Metro Area Network (MAN) is generally defined as multiple sites in the same geographic area or city, that is, a metropolitan area. A MAN generally runs at the same speed as a LAN but is able to cover larger distances.

WAN

A Wide Area Network (WAN) is essentially everything that is not a LAN or MAN. WANs generally use fiber optic cables to transmit data from one location to another. WAN circuits can be provided via multiple connections and data encapsulations, including MPLS, ATM, and Ethernet.

Most large network providers utilize Dense Wavelength Division Multiplexing (DWDM) to put more bits on their fiber networks. DWDM puts multiple colors of light onto the fiber, allowing up to 128 different wavelengths to be sent down a single fiber. DWDM has just entered open networking with the introduction of Facebook's Voyager system.

Leaf-Spine design

In a Leaf-Spine network design, there are Leaf switches (the switches that connect to the servers), sometimes called Top of Rack (ToR) switches, connected to a set of Spine switches (the switches that connect the leaves together), sometimes called End of Rack (EoR) switches.

Clos network

A Clos network is one of the ways to design a multi-stage network. Based on the switching network design by Charles Clos in 1952, a three stage Clos is the smallest version of a Clos network. It has an ingress, a middle, and an egress stage. Some hyper-scale networks use a five stage Clos, where the middle stage is replaced with another three stage Clos. In a five stage Clos there is an ingress, a middle ingress, a middle, a middle egress, and an egress stage. All stages are connected to their neighbor, so in the example shown, Ingress 1 is connected to all four of the middle stages, just as Egress 1 is connected to all four of the middle stages.

A Clos network can be built with an odd number of stages starting at 3, so a 5, 7, and so on stage Clos is possible. For even numbered designs, Benes designs are usable.
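The neighbor-to-neighbor wiring of a three stage Clos can be sketched in a few lines. This is an illustrative model only: the function names and switch labels are hypothetical, and the 4x4x4 size is an assumption chosen to echo the four middle switches mentioned above, not a figure taken from the book.

```python
from itertools import product


def build_three_stage_clos(n_ingress, n_middle, n_egress):
    """Return the links of a three stage Clos as (from_switch, to_switch) pairs."""
    ingress = ["ingress{}".format(i) for i in range(1, n_ingress + 1)]
    middle = ["middle{}".format(i) for i in range(1, n_middle + 1)]
    egress = ["egress{}".format(i) for i in range(1, n_egress + 1)]
    # Every ingress switch links to every middle switch, and every middle
    # switch links to every egress switch; no stage skips its neighbor.
    return list(product(ingress, middle)) + list(product(middle, egress))


def paths_between(links, source, destination):
    """List the two-hop paths source -> middle -> destination."""
    link_set = set(links)
    middles = sorted(b for (a, b) in links if a == source)
    return [(source, m, destination) for m in middles if (m, destination) in link_set]


if __name__ == "__main__":
    links = build_three_stage_clos(4, 4, 4)
    print(len(links), "links")
    for path in paths_between(links, "ingress1", "egress1"):
        print(" -> ".join(path))
```

In this model the number of distinct paths between any ingress and egress switch equals the number of middle switches, which is where the design gets its redundancy and its ability to spread traffic.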
Benes network A Benes design is a non-blocking Clos design where the middle stage is 2x2 instead of NxN. A Benes network can have even numbers of stages. Here is a four stage Benes network. Network controller concepts Here we will discuss the concepts of network controllers. Every networking device has a controller, whether built in or external to manage the forwarding of the system. Controller A controller is a computer that sits on the network and manages one or more network devices. A controller can be built into a device, like the Cisco Supervisor module or standalone like an OpenFlow controller. The controller is responsible for managing all of the control plane data and deciding what should be sent down to the data plane. Generally, a controller will have a Command-line Interface (CLI) and more recently a web configuration interface. Some controllers will even have an Application Programming Interface (API). OpenFlow controller An OpenFlow controller, as it sounds is a controller that uses the OpenFlow protocol to communicate with network devices. The most common OpenFlow controllers that people hear about are OpenDaylight and ONOS. People who are working with OpenFlow would also know of Floodlight and RYU. Supervisor module A route processor is a computer that sits inside of the chassis of the network device you are managing. Sometimes the route processor is built in to the system, while other times it is a module that can be replaced/upgraded. Many vendor multi-slot systems have multiple route processors for redundancy. An example of a removable route processor is the Cisco 9500 series Supervisor module. There are multiple versions available including revision A, with a 4 core processor and 16 GB of RAM and revision B with a 6 core processor and 24 GB of RAM. Previous systems such as the Cisco Catalyst 7600 had options such as the SUP720 (Supervisor Module 720) of which they offered multiple versions. 
The standard SUP720 had a limited number of routes that it could support (256k) versus the SUP720 XL which could support up to 1M routes. Juniper Route Engine In Juniper terminology, the controller is called a Route Engine. They are similar to the Cisco Route Processor/Supervisor modules. Unlike Cisco Supervisor modules which utilize special CPUS, Juniper's REs generally use common x86 CPUs. Like Cisco, Juniper multi-slot systems can have redundant processors. Juniper has recently released the information about the NG-REs or Next Generation Route Engines. One example is the new RE-S-X6-64G, a 6-core x86 CPU based routing engine with 64 GB DRAM and 2x 64 GB SSD storage available for the MX240/MX480/MX960. These NG-REs allow for containers and other virtual machines to be run directly. Built in processor When looking at single rack unit (1 RU) or pizza box design switches there are some important design considerations. Most 1 RU switches do not have redundant processors, or field replaceable route processors. In general the field replaceable units (FRUs) that the customer can replace are power supplies and fans. If the failure is outside of the available FRUs, the entire switch must be replaced in the event of a failure. With white box switches this can be a simple process as white box switches can be used in multiple locations of your network including the customer edge, provider edge and core. Sparing (keeping a spare switch) is easy when you have the same hardware in multiple parts of the network. Recently commodity switch fabric chips have come with built-in low power ARM CPUs that can be used to manage the entire system, leading to cheaper and less power hungry designs. Facebook Wedge microserver The Facebook Wedge is different from most white box switches as it has its controller as an add in module, the same board that is used in some of the OCP servers. 
By separating the controller board from the switch, different boards can be put in place, such as boards with more memory, faster CPUs, different CPU types, and so on. Routing protocols A routing protocol is a daemon that runs on a controller and communicates with other network devices to exchange route information. For this section we will use plain words to demonstrate how each routing protocol works; these should not be construed as the actual messages that the protocols exchange. BGP Border Gateway Protocol (BGP) is a path vector based Exterior Gateway Protocol (EGP) that makes routing decisions based on paths, network policies, or rules (route-maps on Cisco). Though designed as an EGP, BGP can be used as both an interior (iBGP) and exterior (eBGP) routing protocol. BGP uses keepalive packets (are you there?) to confirm that neighbors are still accessible. BGP is the protocol that is utilized to route traffic across the internet, exchanging routing information between different Autonomous Systems (AS). An AS is all of the connected networks under the control of a single entity such as Level 3 (AS1) or Sprint (AS1239). When two different ASes interconnect, BGP peering sessions are set up between two or more network devices that have direct connections to each other. In an eBGP scenario, AS1 and AS1239 would set up BGP peering sessions that would allow traffic to route between their ASes. In an iBGP scenario, routers within the same AS peer with each other and transfer the routes that are defined on the system. iBGP is used internally in most networks, and large corporate networks in particular rely on it because other Interior Gateway Protocols (IGPs) may not scale. Examples: iBGP next hop self In this scenario AS1 and AS2 are peered with each other and exchanging one prefix each. AS1 advertises 192.168.1.0/24 and AS2 advertises 192.168.2.0/24.
Each network has two routers: one border router, which connects to other ASes, and one internal router, which gets its routes from the border router. The routes are advertised internally with the next hop set to the border router. This is a standard scenario when you are not running an IGP inside to distribute the routes for the border router's external interfaces. The conversation goes like this:
AS1 -> AS2: Hi AS2, I am AS1
AS2 -> AS1: Hi AS1, I am AS2
AS1 -> AS2: I have the following route, 192.168.1.0/24
AS2 -> AS1: I have received the route, I have 192.168.2.0/24
AS1 -> AS2: I have received the route
AS1 -> Internal Router AS1: I have this route, 192.168.2.0/24, you can reach it through me at 10.1.1.1
AS2 -> Internal Router AS2: I have this route, 192.168.1.0/24, you can reach it through me at 10.1.1.1
iBGP next-hop unmodified In the next scenario the border routers are the same, but the internal routers are given a next hop of the external (other AS) border router. The last scenario is where you peer with a route server, a system that handles peering and filters the routes based on what you have specified you will send. The routes are then forwarded on to your peers with your IP as the next hop. OSPF Open Shortest Path First (OSPF) is a relatively simple protocol. Different links on the same router are put into the same or different areas. For example, you would use area 1 for the interconnects between campuses, but you would use another area, such as area 10, for the campus itself. By separating areas, you can reduce the amount of cross talk that happens between devices. There are two versions of OSPF, v2 and v3. The main difference between them is that v2 is for IPv4 networks and v3 is for IPv6 networks. When there are multiple paths that can be taken, the cost of the links must be taken into account. For example, where there are two paths, one with a total cost of 20 (5+5+10) and the other with a total cost of 16 (8+8), the traffic will take the lowest-cost path, in this case the one with a cost of 16.
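The cost comparison above can be reproduced with a short Python sketch. OSPF computes shortest paths with Dijkstra's algorithm; the version here is a simplified illustration of that calculation, and the router names R1 to R5 and the topology itself are hypothetical, chosen only so that one path costs 5+5+10 and the other 8+8.

```python
import heapq


def shortest_path(graph, source, target):
    """Dijkstra's algorithm: return (total_cost, path) from source to target."""
    queue = [(0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, link_cost in graph[node].items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + link_cost, neighbor, path + [neighbor]))
    return None  # target is unreachable


# Hypothetical topology: R1-R2-R3-R4 costs 5+5+10, R1-R5-R4 costs 8+8.
graph = {
    "R1": {"R2": 5, "R5": 8},
    "R2": {"R1": 5, "R3": 5},
    "R3": {"R2": 5, "R4": 10},
    "R4": {"R3": 10, "R5": 8},
    "R5": {"R1": 8, "R4": 8},
}
cost, path = shortest_path(graph, "R1", "R4")
print(cost, path)  # 16 ['R1', 'R5', 'R4']
```

Note that the two-hop path wins even though its individual links are more expensive than the first two links of the other path; only the total matters.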
IS-IS IS-IS is a link-state routing protocol that operates by flooding link state information throughout a network of routers using NETs (Network Entity Titles). Each IS-IS router has its own database of the network topology, built by aggregating the flooded network information. IS-IS is used by companies that are looking for fast convergence, scalability, and rapid flooding of new information. IS-IS uses the concept of levels instead of areas as in OSPF. There are two levels in IS-IS: Level 1 (area) and Level 2 (backbone). A Level 1 Intermediate System (IS) keeps track of the destinations within its area, while a Level 2 IS keeps track of paths to the Level 1 areas. EIGRP Enhanced Interior Gateway Routing Protocol (EIGRP) is Cisco's proprietary routing protocol. It is hardly ever seen in current networks, but if you see it in yours, then you need to plan accordingly. Replacing EIGRP with OSPF is suggested so that you can interoperate with non-Cisco devices. RIP If Routing Information Protocol (RIP) is being used in your network, it must be replaced during the design. Most newer routing stacks do not support RIP. RIP is one of the original routing protocols, using the number of hops (routed ports) between the device and the remote location to determine the optimal path. RIP sends its entire routing database out every 30 seconds. When routing tables were small, many years ago, RIP worked fine. With larger tables, the traffic bursts and the resulting re-computation by other routers in the network cause routers to run at almost 100 percent CPU all the time. Cables Here we will review the major types of cables. Copper Copper cables have been around for a very long time. Originally, network devices were connected together using coax cable (the same cable used for television antennas and cable). These days there are a few standard cables that are used.
RJ45 Cables
Cat5 - A 100Mb capable cable, used for both 10Mb and 100Mb connections.
Cat5E - A 1GbE capable cable, but not suggested for 1GbE networks (Cat6 is better and the price difference is nominal).
Cat6 - A 1GbE capable cable that can be used for any speed at or below 1GbE, including 100Mb and 10Mb.
SFPs
SFP - Small Form-factor Pluggable port, capable of up to 1GbE connections.
SFP+ - Same size as the SFP, capable of up to 10Gb connections.
SFP28 - Same size as the SFP, capable of up to 25Gb connections.
QSFP - Quad Small Form-factor Pluggable, a bit wider than the SFP but capable of multiple GbE connections.
QSFP+ - Same size as the QSFP, capable of 40GbE as 4x10GbE on the same cable.
QSFP28 - Same size as the QSFP, capable of 100GbE.
DAC - A direct attach cable that fits into an SFP or QSFP port.
Fiber/Hot pluggable
Breakout Cables As routers and switches continue to become more dense, to the point where the ports can no longer fit on the front of the device, manufacturers have moved to what we call breakout cables. For example, if you have a switch that can handle 3.2Tb/s of traffic, you need to provide 3,200Gb/s of port capacity. The easiest way to do that is to use 32 100Gb ports, which will fit on the front of a 1U device. You cannot fit 128 10Gb ports without using either a breakout patch panel (which will then use another few rack units (RUs)) or a breakout cable. For a period of time in the 1990s, Cisco used RJ21 connectors to provide up to 96 Ethernet ports per slot. Network engineers would then create breakout cables to go from the RJ21 to RJ45. These days, we have both DAC (Direct Attach Cable) and fiber breakout cables. For example, a 1x4 breakout cable provides four 10G or 25G ports from a single 40G or 100G port. If you build a LAN that only includes switches that provide Layer 2 connectivity, any devices you want to connect together need to be in the same IP block.
If you have a router in your network, it can route traffic between IP blocks. Part 1: What defines a modern network There is a litany of concepts that define a modern network, from simple principles to full feature sets. In general, a next-generation data center design enables you to move to a widely distributed non-blocking fabric with uniform chipset, bandwidth, and buffering characteristics in a simple architecture. In one example, to support these requirements, you would begin with a true three-tier Clos switching architecture with Top of Rack (ToR), spine, and fabric layers to build a data center network. Each ToR would have access to multiple fabrics and have the ability to select a desired path based on application requirement or network availability. Following the definition of a modern network from the introduction, here we layout the general definition of the parts. Modern network pieces Here we will discuss the concepts that build a Next Generation Network (NGN). Software Defined Networks Software defined networks can be defined in multiple ways. The general definition of a Software defined network is one which can be controlled as a singular unit instead of at a system by system basis. The control-plane which would normally be in the device and using routing protocols is replaced with a controller. Software defined networks can be built using many different technologies including OpenFlow, overlay networks and automation tools. Next generation networking and hyper scale networks As we mention in the introduction, twenty years ago NGN hardware would have been the Cisco GSR (officially introduced in 1997) or the Juniper M40 (officially released in 1998). Large Cisco and Juniper customers would have been working with the companies to help come up with the specifications and determining how to deploy the devices (possibly Alpha or Beta versions) in their networks. 
Today we can look at the hyper scale networking companies to see what a modern network looks like. A hyper scale network is one where the data stored, transferred and updated on the network grows exponentially. Technology such as 100Gb Ethernet, software defined networking, Open networking equipment and software are being deployed by hyper scale companies. Open networking hardware overview Open Hardware has been around for about 10 years, first in the consumer space and more recently in the enterprise space. Enterprise open networking hardware companies such as Quanta and Accton provide a significant amount of the hardware currently utilized in networks today. Companies such as Google and Facebook have been building their own hardware for many years. Facebook's routers such as the Wedge 100 and Backpack are available publicly for end users to utilize. Some examples of Open Networking hardware are: The Dell S6000-ON - a 32x40G switch with 32 QSFP ports on the front. The Quanta LY8 - a 48x10G + 6x40G switch with 48 SFP+ ports and 6 QSFP ports. The Facebook Wedge 100 - a 32x100G switch with 32 QSFP28 ports on the front. Open networking software overview To use open networking hardware, you need an operating system. The operating system manages the system devices such as fans, power, LEDs and temperature. On top of the operating system you will run a forwarding agent, examples of forwarding agents are Indigo, the open source OpenFlow daemon and Quagga, an open source routing agent. Closed networking hardware overview Cisco and Juniper are the leaders in the Closed Hardware and Software space. Cisco produces switches like the Nexus series (3000, 7000, 9000) with the 9000 programmable by ACI. Juniper provides the MX series (480, 960, 2020) with the 2020 being the highest end forwarding system they sell. Closed networking software overview Cisco has multiple network operating systems including IOS, NX-OS, IOS-XR. 
All Cisco NOSs are closed source and proprietary to the system that they run on. Cisco has what the industry calls a "industry standard CLI" which is emulated by many other companies. Juniper ships a single NOS, JunOS which can install on multiple different systems. JunOS is a closed source BSD based NOS. The JunOS CLI is significantly different from IOS and is more focused on engineers who program. Network Virtualization Not to be confused with Network Function Virtualization (NFV), Network virtualization is the concept of re-creating the hardware interfaces that exist in a traditional network in software. By creating a software counterpart to the hardware interfaces, you decouple the network forwarding from the hardware. There are a few companies and software projects that allow the end user to enable network virtualization. The first one is NSX which comes from the same team that developed OvS (Open Virtual Switch) Nicira, which was acquired by VMWare in 2012. Another project is Big Switch Networks Big Cloud Fabric, which utilizes a heavily modified version of Indigo, an OpenFlow controller. Network Function Virtualization Network Function Virtualization can be summed up by the statement that: "Due to recent network focused advancements in PC hardware, any service able to be delivered on proprietary, application specific hardware should be able to be done on a virtual machine". Essentially: routers, firewalls, load balancers and other network devices all running virtualized on commodity hardware. Traffic Engineering Traffic engineering is a method of optimizing the performance of a telecommunications network by dynamically analyzing, predicting and regulating the behavior of data transmitted over that network. Part 2: Next generation networking examples In my 25 or so years of networking, I have dealt with a lot of different networking technologies, each iteration (supposedly) better than the last. 
Starting with Thin Net (10BASE2), moving through ArcNet, 10BASE-T, Token Ring, ATM to the Desktop, FDDI and onwards. Generally, the technology improved for each system until it was swapped out. A good example is the change from a literal ring for token ring to a switching design where devices hung off of a hub (as in 10BASE-T). ATM to the desktop was a novel idea, providing up to 25Mbps to connected devices, but the complexity of configuring and managing it was not worth the gain. Today almost everything is Ethernet as shown by the Facebook Voyager DWDM system, which uses Ethernet over both traditional SFP ports and the DWDM interfaces.  Ethernet is simple, well supported and easy to manage. Example 1 - Migration from FDDI to 100Base-T In late 1996, early 1997, the Exodus network used FDDI rings (Fiber Distributed Data Interface) to connect the main routers together at 100Mbps. As the network grew we had to decide between two competing technologies, FDDI switches and Fast Ethernet (100Base-T) both providing 100Mbp/s. FDDI switches from companies like DEC (FDDI Gigaswitch) were used in most of the Internet Exchange Points (IXPs) and worked reasonably well with one minor issue, head of line blocking (HOLB), which also impacted other technologies. Head of line blocking occurs when a packet is destined for an interface that is already full, so a queue is built, if the interface continues to be full, eventually the queue will be dropped. While we were testing the DEC FDDI Gigaswitches, we were also in deep discussions with Cisco about the availability of Fast Ethernet (FE) and working on designs. Because FE was new, there were concerns about how it would perform and how we would be able to build a redundant network design. In the end, we decided to use FE, connect the main routers in a full mesh and use routing protocols to manage fail-over. 
Example 2 - NGN Failure - LANE (LAN Emulation) During the high growth period at Exodus communications, there was a request to connect a new data center to the original one and allow customers to put servers in both locations using the same address space. To do this, we chose LAN Emulation or LANE which allows a ATM network to be used like a LAN. On paper, LANE looked like a great idea, the ability to extend the LAN so that customers could use the same IP space in two different locations. In reality, it was very different. For hardware, we were using Cisco 5513 switches which provided a combination of Ethernet and ATM ports. There were multiple issues with this design: First, the customer is provided with an ethernet interface, which runs over an ATM optical interface.  Any error on the physical connection between switches or the ATM layer would cause errors on the Ethernet layer. Second, monitoring was very hard, when there were network issues, you had to look in multiple locations to determine where the errors were happening. After a few weeks, we did a midnight swap putting Cisco 7500 routers in to replace the 5500 switches and moving customers onto new blocks for the new data center. Part 3: Designing a modern network When designing a new network, some of the following might be important to you: Simple, focused yet non-blocking IP fabric Multistage parallel fabrics based on Clos network concept Simple merchant silicon Distributed control plane with some centralized controls Wide multi-path (ECMP) Uniform chipset, bandwidth, and buffering 1:1 oversubscribed (non-blocking fabric) Minimize the hardware necessary to carry east–west traffic Ability to support a large number of bare metal servers without adding an additional layer Limit fabric to a 5 stage Clos within the data center to minimize lookups and switching latency. 
Support host attachment at 10G, 25G, 50G and 100G Ethernet Traffic management In a modern network one of the first decisions is whether you will use a centralized controller or not. If you use a centralized controller, you will be able to see and control the entire network from one location. If you do not use a centralized controller, you will need to either manage each system directly or via automation. There is a middle space where you can use some software defined network pieces to manage parts of the network, such as an OpenFlow controller for the WAN or VMware NSX for your virtualized workloads. Once you know what the general management goal is, the next decision is whether to use open, proprietary, or a combination of both open and proprietary networking equipment. Open networking equipment is a concept that has been around less than a decade and started when very large network operators decided that they wanted a better control of the cost and features of the equipment in their networks. Google is a good example. In the following figure, you can see how Facebook used both their own hardware, 6-Pack/Backpack and legacy vendor hardware for their interoperability and performance testing. Google wanted to build a high-speed backbone, but was not looking to pay the prices that the incumbent proprietary vendors such as Cisco and Juniper wanted. Google set a price per port (1G/10G/40G) that they wanted to hit and designed equipment around that. Later companies like Facebook decided to go the same direction and contracted with commodity manufacturers to build network switches that met their needs. Proprietary vendors can offer the same level of performance or better using their massive teams of engineers to design and optimize hardware. This distinction even applies on the software side where companies like VMware and Cisco have created software defined networking tools such as NSX and ACI. 
With the large amount of networking gear available, designing and building a modern network can appear to be a complex concept. Designing a modern network requires research and a good understanding of networking equipment. While complex, the task is not hard if you follow the guidelines. These are a few of the stages of planning that need to be followed before the modern network design is started: The first step is to understand the scope of the project (single site, multi-site, multi-continent, multi-planet). The second step is to determine if the project is a green field (new) or brown field deployment (how many of the sites already exist and will/will not be upgraded). The third step is to determine if there will be any software defined networking (SDN), next generation networking (NGN) or Open Networking pieces. Finally, it is key that the equipment to be used is assembled and tested to determine if the equipment meets the needs of the network. Summary In this article, we have discussed many different concepts that tie NGN together. The term NGN refers to the latest and near-term networking equipment and designs. We looked at networking concepts such as local, metro and wide area networks, network controllers, routers and switches. Routing protocols such as BGP, IS-IS, OSPF and RIP. Then we discussed many pieces that are used either singularly or together that create a modern network. In the end, we also learned some guidelines that should be followed while designing a network. Resources for Article:   Further resources on this subject: Analyzing Social Networks with Facebook [article] Social Networks [article] Point-to-Point Networks [article]
Packt
21 Feb 2018
38 min read

IPv6, Unix Domain Sockets, and Network Interfaces

 In this article, given by Pradeeban Kathiravelu, author of the book Python Network Programming Cookbook - Second Edition, we will cover the following topics: Forwarding a local port to a remote host Pinging hosts on the network with ICMP Waiting for a remote network service Enumerating interfaces on your machine Finding the IP address for a specific interface on your machine Finding whether an interface is up on your machine Detecting inactive machines on your network Performing a basic IPC using connected sockets (socketpair) Performing IPC using Unix domain sockets Finding out if your Python supports IPv6 sockets Extracting an IPv6 prefix from an IPv6 address Writing an IPv6 echo client/server (For more resources related to this topic, see here.) This article extends the use of Python's socket library with a few third-party libraries. It also discusses some advanced techniques, for example, the asynchronous asyncore module from the Python standard library. This article also touches upon various protocols, ranging from an ICMP ping to an IPv6 client/server. In this article, a few useful Python third-party modules have been introduced by some example recipes. For example, the network packet capture library, Scapy, is well known among Python network programmers. A few recipes have been dedicated to explore the IPv6 utilities in Python including an IPv6 client/server. Some other recipes cover Unix domain sockets. Forwarding a local port to a remote host Sometimes, you may need to create a local port forwarder that will redirect all traffic from a local port to a particular remote host. This might be useful to enable proxy users to browse a certain site while preventing them from browsing some others. How to do it... Let us create a local port forwarding script that will redirect all traffic received at port 8800 to the Google home page (http://www.google.com). We can pass the local and remote host as well as port number to this script. 
For the sake of simplicity, let's only specify the local port number as we are aware that the web server runs on port 80. Listing 3.1 shows a port forwarding example, as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. import argparse LOCAL_SERVER_HOST = 'localhost' REMOTE_SERVER_HOST = 'www.google.com' BUFSIZE = 4096 import asyncore import socket class PortForwarder(asyncore.dispatcher): def __init__(self, ip, port, remoteip,remoteport,backlog=5): asyncore.dispatcher.__init__(self) self.remoteip=remoteip self.remoteport=remoteport self.create_socket(socket.AF_INET,socket.SOCK_STREAM) self.set_reuse_addr() self.bind((ip,port)) self.listen(backlog) def handle_accept(self): conn, addr = self.accept() print ("Connected to:",addr) Sender(Receiver(conn),self.remoteip,self.remoteport) class Receiver(asyncore.dispatcher): def __init__(self,conn): asyncore.dispatcher.__init__(self,conn) self.from_remote_buffer='' self.to_remote_buffer='' self.sender=None def handle_connect(self): pass def handle_read(self): read = self.recv(BUFSIZE) self.from_remote_buffer += read def writable(self): return (len(self.to_remote_buffer) > 0) def handle_write(self): sent = self.send(self.to_remote_buffer) self.to_remote_buffer = self.to_remote_buffer[sent:] def handle_close(self): self.close() if self.sender: self.sender.close() class Sender(asyncore.dispatcher): def __init__(self, receiver, remoteaddr,remoteport): asyncore.dispatcher.__init__(self) self.receiver=receiver receiver.sender=self self.create_socket(socket.AF_INET, socket.SOCK_STREAM) self.connect((remoteaddr, remoteport)) def handle_connect(self): pass def handle_read(self): read = self.recv(BUFSIZE) self.receiver.to_remote_buffer += read def writable(self): return (len(self.receiver.from_remote_buffer) > 0) def handle_write(self): sent = self.send(self.receiver.from_remote_buffer) self.receiver.from_remote_buffer = 
self.receiver.from_remote_buffer[sent:] def handle_close(self): self.close() self.receiver.close() if __name__ == "__main__": parser = argparse.ArgumentParser(description='Stackless Socket Server Example') parser.add_argument('--local-host', action="store", dest="local_host", default=LOCAL_SERVER_HOST) parser.add_argument('--local-port', action="store", dest="local_port", type=int, required=True) parser.add_argument('--remote-host', action="store", dest="remote_host", default=REMOTE_SERVER_HOST) parser.add_argument('--remote-port', action="store", dest="remote_port", type=int, default=80) given_args = parser.parse_args() local_host, remote_host = given_args.local_host, given_args.remote_host local_port, remote_port = given_args.local_port, given_args.remote_port print ("Starting port forwarding local %s:%s => remote %s:%s" % (local_host, local_port, remote_host, remote_port)) PortForwarder(local_host, local_port, remote_host, remote_port) asyncore.loop() If you run this script, it will show the following output: $ python 3_1_port_forwarding.py --local-port=8800 Starting port forwarding local localhost:8800 => remote www.google.com:80 Now, open your browser and visit http://localhost:8800. This will take you to the Google home page and the script will print something similar to the following command: ('Connected to:', ('127.0.0.1', 37236)) The following screenshot shows the forwarding a local port to a remote host: How it works... We created a port forwarding class, PortForwarder subclassed, from asyncore.dispatcher, which wraps around the socket object. It provides a few additional helpful functions when certain events occur, for example, when the connection is successful or a client is connected to a server socket. You have the choice of overriding the set of methods defined in this class. In our case, we only override the handle_accept() method. Two other classes have been derived from asyncore.dispatcher. 
The Receiver class handles the incoming client requests and the Sender class takes this Receiver instance and processes the data sent to the clients. As you can see, these two classes override the handle_read(), handle_write(), and writable() methods to facilitate the bi-directional communication between the remote host and local client. In summary, the PortForwarder class takes the incoming client request in a local socket and passes this to the Sender class instance, which in turn uses the Receiver class instance to initiate a bi-directional communication with a remote server on the specified port. Pinging hosts on the network with ICMP An ICMP ping is the most common type of network scanning you have ever encountered. It is very easy to open a command-line prompt or terminal and type ping www.google.com. How difficult is that from inside a Python program? This recipe shows you an example of a Python ping. Getting ready You need the superuser or administrator privilege to run this recipe on your machine. How to do it... You can lazily write a Python script that calls the system ping command-line tool, as follows:
import subprocess
import shlex

command_line = "ping -c 1 www.google.com"
args = shlex.split(command_line)
try:
    subprocess.check_call(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print ("Google web server is up!")
except subprocess.CalledProcessError:
    print ("Failed to get ping.")
However, in many circumstances, the system's ping executable may not be available or may be inaccessible. In this case, we need a pure Python script to do that ping. Note that this script needs to be run as a superuser or administrator. Listing 3.2 shows the ICMP ping, as follows:
#!/usr/bin/env python
# This program is optimized for Python 3.5.2.
# Instructions to make it run with Python 2.7.x are given in the comments below.
# It may run on any other version with/without modifications.
import os import argparse import socket import struct import select import time ICMP_ECHO_REQUEST = 8 # Platform specific DEFAULT_TIMEOUT = 2 DEFAULT_COUNT = 4 class Pinger(object): """ Pings to a host -- the Pythonic way""" def __init__(self, target_host, count=DEFAULT_COUNT, timeout=DEFAULT_TIMEOUT): self.target_host = target_host self.count = count self.timeout = timeout def do_checksum(self, source_string): """ Verify the packet integritity """ sum = 0 max_count = (len(source_string)/2)*2 count = 0 while count < max_count: # To make this program run with Python 2.7.x: # val = ord(source_string[count + 1])*256 + ord(source_string[count]) # ### uncomment the preceding line, and comment out the following line. val = source_string[count + 1]*256 + source_string[count] # In Python 3, indexing a bytes object returns an integer. # Hence, ord() is redundant. sum = sum + val sum = sum & 0xffffffff count = count + 2 if max_count<len(source_string): sum = sum + ord(source_string[len(source_string) - 1]) sum = sum & 0xffffffff sum = (sum >> 16) + (sum & 0xffff) sum = sum + (sum >> 16) answer = ~sum answer = answer & 0xffff answer = answer >> 8 | (answer << 8 & 0xff00) return answer def receive_pong(self, sock, ID, timeout): """ Receive ping from the socket. """ time_remaining = timeout while True: start_time = time.time() readable = select.select([sock], [], [], time_remaining) time_spent = (time.time() - start_time) if readable[0] == []: # Timeout return time_received = time.time() recv_packet, addr = sock.recvfrom(1024) icmp_header = recv_packet[20:28] type, code, checksum, packet_ID, sequence = struct.unpack( "bbHHh", icmp_header ) if packet_ID == ID: bytes_In_double = struct.calcsize("d") time_sent = struct.unpack("d", recv_packet[28:28 + bytes_In_double])[0] return time_received - time_sent time_remaining = time_remaining - time_spent if time_remaining <= 0: return We need a send_ping() method that will send the data of a ping request to the target host. 
It also calls the do_checksum() method to compute the checksum of the ping data, as follows:

    def send_ping(self, sock, ID):
        """ Send ping to the target host """
        target_addr = socket.gethostbyname(self.target_host)
        my_checksum = 0
        # Create a dummy header with a 0 checksum.
        header = struct.pack("bbHHh", ICMP_ECHO_REQUEST, 0, my_checksum, ID, 1)
        bytes_In_double = struct.calcsize("d")
        data = (192 - bytes_In_double) * "Q"
        data = struct.pack("d", time.time()) + bytes(data.encode('utf-8'))
        # Get the checksum on the data and the dummy header.
        my_checksum = self.do_checksum(header + data)
        header = struct.pack(
            "bbHHh", ICMP_ECHO_REQUEST, 0, socket.htons(my_checksum), ID, 1
        )
        packet = header + data
        sock.sendto(packet, (target_addr, 1))

Let us define another method called ping_once() that makes a single ping call to the target host. It creates a raw ICMP socket by passing the ICMP protocol to socket(). The exception handling code covers the case where the script is not run by a superuser, or where any other socket error occurs. Let's take a look at the following code:

    def ping_once(self):
        """ Returns the delay (in seconds) or none on timeout. """
        icmp = socket.getprotobyname("icmp")
        try:
            sock = socket.socket(socket.AF_INET, socket.SOCK_RAW, icmp)
        except socket.error as e:
            if e.errno == 1:
                # Not superuser, so operation not permitted
                raise socket.error(
                    "%s - ICMP messages can only be sent from root user processes" % e)
            # Any other socket error: re-raise, since no socket was created
            raise
        except Exception as e:
            print ("Exception: %s" %(e))
        my_ID = os.getpid() & 0xFFFF
        self.send_ping(sock, my_ID)
        delay = self.receive_pong(sock, my_ID, self.timeout)
        sock.close()
        return delay

The main method of this class is ping(). It runs a for loop in which the ping_once() method is called count times, and each call returns the delay of the ping response in seconds. If no delay is returned, the ping has failed. Let's take a look at the following code:

    def ping(self):
        """ Run the ping process """
        for i in range(self.count):
            print ("Ping to %s..."
% self.target_host,)
            try:
                delay = self.ping_once()
            except socket.gaierror as e:
                # e.args[1] holds the error message on Python 2 and Python 3
                print ("Ping failed. (socket error: '%s')" % e.args[1])
                break
            if delay is None:
                print ("Ping failed. (timeout within %ssec.)" % self.timeout)
            else:
                delay = delay * 1000
                print ("Get pong in %0.4fms" % delay)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Python ping')
    parser.add_argument('--target-host', action="store", dest="target_host", required=True)
    given_args = parser.parse_args()
    target_host = given_args.target_host
    pinger = Pinger(target_host=target_host)
    pinger.ping()

This script shows the following output. It has been run with the superuser privilege:

$ sudo python 3_2_ping_remote_host.py --target-host=www.google.com
Ping to www.google.com...
Get pong in 27.0808ms
Ping to www.google.com...
Get pong in 17.3445ms
Ping to www.google.com...
Get pong in 33.3586ms
Ping to www.google.com...
Get pong in 32.3212ms

How it works...

A Pinger class has been constructed to define a few useful methods. The class is initialized with a few user-defined or default inputs, which are as follows:

target_host: This is the target host to ping
count: This is how many times to do the ping
timeout: This is the value that determines when to end an unfinished ping operation

The send_ping() method resolves the target hostname to an IP address and creates an ICMP_ECHO_REQUEST packet using the struct module. The packet must carry a valid checksum, which is computed by the do_checksum() method. It takes the source bytes and manipulates them to produce a proper checksum. On the receiving end, the receive_pong() method waits until a response arrives or the timeout expires. It captures the ICMP response header, compares the packet ID, and calculates the delay of the request and response cycle.

Waiting for a remote network service

Sometimes, during the recovery of a network service, it might be useful to run a script to check when the server is online again.
How to do it... We can write a client that will wait for a particular network service forever or for a timeout. In this example, by default, we would like to check when a web server is up in localhost. If you specified some other remote host or port, that information will be used instead. Listing 3.3 shows waiting for a remote network service, as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. import argparse import socket import errno from time import time as now DEFAULT_TIMEOUT = 120 DEFAULT_SERVER_HOST = 'localhost' DEFAULT_SERVER_PORT = 80 class NetServiceChecker(object): """ Wait for a network service to come online""" def __init__(self, host, port, timeout=DEFAULT_TIMEOUT): self.host = host self.port = port self.timeout = timeout self.sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM) def end_wait(self): self.sock.close() def check(self): """ Check the service """ if self.timeout: end_time = now() + self.timeout while True: try: if self.timeout: next_timeout = end_time - now() if next_timeout < 0: return False else: print ("setting socket next timeout %ss" %round(next_timeout)) self.sock.settimeout(next_timeout) self.sock.connect((self.host, self.port)) # handle exceptions except socket.timeout as err: if self.timeout: return False except socket.error as err: print ("Exception: %s" %err) else: # if all goes well self.end_wait() return True if __name__ == '__main__': parser = argparse.ArgumentParser(description='Wait for Network Service') parser.add_argument('--host', action="store", dest="host", default=DEFAULT_SERVER_HOST) parser.add_argument('--port', action="store", dest="port", type=int, default=DEFAULT_SERVER_PORT) parser.add_argument('--timeout', action="store", dest="timeout", type=int, default=DEFAULT_TIMEOUT) given_args = parser.parse_args() host, port, timeout = given_args.host, given_args.port, given_args.timeout service_checker = 
NetServiceChecker(host, port, timeout=timeout)
    print ("Checking for network service %s:%s ..." %(host, port))
    if service_checker.check():
        print ("Service is available again!")

If a web server is running on your machine, this script will show the following output:

$ python 3_3_wait_for_remote_service.py
Checking for network service localhost:80 ...
setting socket next timeout 120.0s
Service is available again!

If you do not have a web server already running on your computer, make sure to install one, such as Apache 2 Web Server:

$ sudo apt install apache2

Now, stop the Apache process:

$ sudo /etc/init.d/apache2 stop

It will print the following message while stopping the service:

[ ok ] Stopping apache2 (via systemctl): apache2.service.

Run this script, and start Apache again:

$ sudo /etc/init.d/apache2 start
[ ok ] Starting apache2 (via systemctl): apache2.service.

The output pattern will be different. On my machine, the following output pattern was found:

Exception: [Errno 103] Software caused connection abort
setting socket next timeout 119.0s
Exception: [Errno 111] Connection refused
setting socket next timeout 119.0s
Exception: [Errno 103] Software caused connection abort
setting socket next timeout 119.0s
Exception: [Errno 111] Connection refused
setting socket next timeout 119.0s

And finally, when Apache2 is up again, the following log is printed:

Service is available again!

How it works...

The preceding script uses the argparse module to take the user input and process the hostname, port, and timeout, that is, how long our script will wait for the desired network service. It creates an instance of the NetServiceChecker class and calls the check() method. This method calculates the final end time of waiting and uses the socket's settimeout() method to control each round's end time, that is, next_timeout.
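The deadline arithmetic described above can be sketched on its own. This is a minimal illustration rather than the recipe's code: the helper name wait_for_service and the retry_interval parameter are assumptions introduced here, and the sketch opens a fresh socket for every attempt with socket.create_connection() instead of reusing a single socket.

```python
import socket
import time


def wait_for_service(host, port, timeout, retry_interval=0.1):
    """Return True once host:port accepts a TCP connection, False on timeout.

    Hypothetical helper: the name and retry_interval are illustrative only.
    """
    end_time = time.time() + timeout            # absolute deadline
    while True:
        next_timeout = end_time - time.time()   # time left for this round
        if next_timeout <= 0:
            return False
        try:
            # A new socket per attempt; the timeout bounds this one connect.
            conn = socket.create_connection((host, port), timeout=next_timeout)
        except OSError:
            # Refused or timed out: pause briefly, then try again.
            time.sleep(min(retry_interval, next_timeout))
        else:
            conn.close()
            return True
```

Calling wait_for_service("localhost", 80, timeout=120) would mirror the recipe's defaults. Creating a new socket for each attempt is a design choice of this sketch; it avoids depending on how a socket behaves after a failed connect().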
It then uses the socket's connect() method to test if the desired network service is available until the socket timeout occurs. This method also catches the socket timeout error and checks the socket timeout against the timeout values given by the user. Enumerating interfaces on your machine If you need to list the network interfaces present on your machine, it is not very complicated in Python. There are a couple of third-party libraries out there that can do this job in a few lines. However, let's see how this is done using a pure socket call. Getting ready You need to run this recipe on a Linux box. To get the list of available interfaces, you can execute the following command: $ /sbin/ifconfig How to do it... Listing 3.4 shows how to list the networking interfaces, as follows: #!/usr/bin/env python # Python Network Programming Cookbook, Second Edition -- Article - 3 # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. 
import sys
import socket
import fcntl
import struct
import array

SIOCGIFCONF = 0x8912 #from C library sockios.h
STUCT_SIZE_32 = 32
STUCT_SIZE_64 = 40
PLATFORM_32_MAX_NUMBER = 2**32
DEFAULT_INTERFACES = 8

def list_interfaces():
    interfaces = []
    max_interfaces = DEFAULT_INTERFACES
    is_64bits = sys.maxsize > PLATFORM_32_MAX_NUMBER
    struct_size = STUCT_SIZE_64 if is_64bits else STUCT_SIZE_32
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        bytes = max_interfaces * struct_size
        interface_names = array.array('B', b'\0' * bytes)
        sock_info = fcntl.ioctl(
            sock.fileno(),
            SIOCGIFCONF,
            struct.pack('iL', bytes, interface_names.buffer_info()[0])
        )
        outbytes = struct.unpack('iL', sock_info)[0]
        if outbytes == bytes:
            max_interfaces *= 2
        else:
            break
    # array.tostring() was removed in Python 3.9; use tobytes() there.
    namestr = interface_names.tostring()
    for i in range(0, outbytes, struct_size):
        interfaces.append((namestr[i:i+16].split(b'\0', 1)[0]).decode('ascii', 'ignore'))
    return interfaces

if __name__ == '__main__':
    interfaces = list_interfaces()
    print ("This machine has %s network interfaces: %s." %(len(interfaces), interfaces))

The preceding script will list the network interfaces, as shown in the following output:

$ python 3_4_list_network_interfaces.py
This machine has 2 network interfaces: ['lo', 'wlo1'].

How it works...

This recipe code uses a low-level socket feature to find out the interfaces present on the system. The single list_interfaces() function creates a socket object and finds the network interface information by manipulating this object. It does so by making a call to the fcntl module's ioctl() function. The fcntl module interfaces with some Unix routines, for example, fcntl(). This interface performs an I/O control operation on the underlying file descriptor of the socket, which is obtained by calling the fileno() method of the socket object. The additional parameters of the ioctl() call include the SIOCGIFCONF constant defined in the C socket library and a data structure produced by the struct module's pack() function.
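To make the role of struct.pack() more concrete, the following sketch packs and unpacks the same 'iL' layout without touching a socket. The values 320 and 0x1000 are made-up placeholders chosen for illustration, not a real buffer size or address.

```python
import struct

# 'iL' = a C int (buffer length) followed by an unsigned long (buffer address).
buffer_length = 320        # placeholder: 8 interfaces * 40 bytes each
buffer_address = 0x1000    # placeholder address, for illustration only

request = struct.pack('iL', buffer_length, buffer_address)

# ioctl() returns a structure with the same layout, so it is unpacked
# with the same format string.
length, address = struct.unpack('iL', request)
print(length, hex(address))
```

The first unpacked field is what the recipe compares against the requested size to decide whether the buffer was large enough.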
The memory at the address passed in the packed data structure is filled in by the ioctl() call. In this case, the interface_names variable holds this information. After unpacking the sock_info return value of the ioctl() call, the assumed number of network interfaces is doubled if the size of the returned data suggests that the buffer was too small. This is done in a while loop to discover all interfaces if our initial interface count assumption is not correct. The names of the interfaces are extracted from the byte-string form of the interface_names variable. The code reads specific fields of that variable and appends the values to the interfaces list. At the end of the list_interfaces() function, this list is returned.

Finding the IP address for a specific interface on your machine

Finding the IP address of a particular network interface may be needed in your Python network application.

Getting ready

This recipe is prepared exclusively for a Linux box. There are some Python modules specially designed to bring similar functionality to Windows and Mac platforms. For example, see http://sourceforge.net/projects/pywin32/ for a Windows-specific implementation.

How to do it...

You can use the fcntl module to query the IP address on your machine. Listing 3.5 shows us how to find the IP address for a specific interface on your machine, as follows:

#!/usr/bin/env python
# This program is optimized for Python 2.7.12 and Python 3.5.2.
# It may run on any other version with/without modifications.
import argparse
import sys
import socket
import fcntl
import struct
import array

def get_ip_address(ifname):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    return socket.inet_ntoa(fcntl.ioctl(
        s.fileno(),
        0x8915, # SIOCGIFADDR
        struct.pack(b'256s', bytes(ifname[:15], 'utf-8'))
    )[20:24])

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Python networking utils')
    parser.add_argument('--ifname', action="store", dest="ifname", required=True)
    given_args = parser.parse_args()
    ifname = given_args.ifname
    print ("Interface [%s] --> IP: %s" %(ifname, get_ip_address(ifname)))

The output of this script is shown in one line, as follows:

$ python 3_5_get_interface_ip_address.py --ifname=lo
Interface [lo] --> IP: 127.0.0.1

In the preceding execution, make sure to use an existing interface, as printed in the previous recipe. On my computer, 3_4_list_network_interfaces.py previously printed the following:

This machine has 2 network interfaces: ['lo', 'wlo1'].

If you use a non-existing interface, an error will be printed. For example, I do not have an eth0 interface right now, so the output is:

$ python3 3_5_get_interface_ip_address.py --ifname=eth0
Traceback (most recent call last):
  File "3_5_get_interface_ip_address.py", line 27, in <module>
    print ("Interface [%s] --> IP: %s" %(ifname, get_ip_address(ifname)))
  File "3_5_get_interface_ip_address.py", line 19, in get_ip_address
    struct.pack(b'256s', bytes(ifname[:15], 'utf-8'))
OSError: [Errno 19] No such device

How it works...

This recipe is similar to the previous one. The preceding script takes a command-line argument: the name of the network interface whose IP address is to be found. The get_ip_address() function creates a socket object and calls the fcntl.ioctl() function to query that object for IP information. Note that the socket.inet_ntoa() function converts the binary data to a human-readable string in the familiar dotted format.
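The [20:24] slice and the inet_ntoa() call can be tried without issuing an ioctl(). The sketch below builds a stand-in for the returned structure by hand. The layout used here (a 16-byte interface name followed by a sockaddr_in) is an assumption chosen to match the slice in the recipe on Linux, and fake_reply is a hypothetical name.

```python
import socket
import struct

# Stand-in for the ioctl() reply: 16-byte interface name, then a sockaddr_in
# (2-byte family, 2-byte port, 4-byte address, 8 bytes of padding).
fake_reply = struct.pack(
    '=16sHH4s8s',
    b'lo', socket.AF_INET, 0, socket.inet_aton('127.0.0.1'), b''
)

packed_ip = fake_reply[20:24]          # 16 + 2 + 2 = offset 20
print(socket.inet_ntoa(packed_ip))     # dotted-quad string
```

socket.inet_aton() is the inverse of inet_ntoa(): it turns a dotted-quad string into the 4 packed bytes that appear inside the structure.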
Finding whether an interface is up on your machine If you have multiple network interfaces on your machine, before doing any work on a particular interface, you would like to know the status of that network interface, for example, if the interface is actually up. This makes sure that you route your command to active interfaces. Getting ready This recipe is written for a Linux machine. So, this script will not run on a Windows or Mac host. In this recipe, we use nmap, a famous network scanning tool. You can find more about nmap from its website http://nmap.org/. Install nmap in your computer. For Debian-based system, the command is: $ sudo apt-get install nmap You also need the python-nmap module to run this recipe. This can be installed by pip,  as follows: $ pip install python-nmap How to do it... We can create a socket object and get the IP address of that interface. Then, we can use any of the scanning techniques to probe the interface status. Listing 3.6 shows the detect network interface status, as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. 
import argparse
import socket
import struct
import fcntl
import nmap

SAMPLE_PORTS = '21-23'

def get_interface_status(ifname):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    ip_address = socket.inet_ntoa(fcntl.ioctl(
        sock.fileno(),
        0x8915, #SIOCGIFADDR, C socket library sockios.h
        struct.pack(b'256s', bytes(ifname[:15], 'utf-8'))
    )[20:24])
    nm = nmap.PortScanner()
    nm.scan(ip_address, SAMPLE_PORTS)
    return nm[ip_address].state()

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Python networking utils')
    parser.add_argument('--ifname', action="store", dest="ifname", required=True)
    given_args = parser.parse_args()
    ifname = given_args.ifname
    print ("Interface [%s] is: %s" %(ifname, get_interface_status(ifname)))

If you run this script to inquire about the status of the lo interface, it will show something similar to the following output:

$ python 3_6_find_network_interface_status.py --ifname=lo
Interface [lo] is: up

How it works...

The recipe takes the interface's name from the command line and passes it to the get_interface_status() function. This function finds the IP address of that interface by manipulating a UDP socket object. This recipe needs the nmap third-party module. We can install it from PyPI using the pip install command. The nmap scanning instance, nm, is created by calling PortScanner(). An initial scan of a local IP address gives us the status of the associated network interface.

Detecting inactive machines on your network

If you have been given a list of IP addresses of a few machines on your network and you are asked to write a script that periodically finds out which hosts are inactive, you would want to create a network scanner type of program without installing anything on the target host computers.

Getting ready

This recipe requires installing the Scapy library (> 2.2), which can be obtained at http://www.secdev.org/projects/scapy/files/scapy-latest.zip.
At the time of writing, the default Scapy release works with Python 2 and does not support Python 3. You may download Scapy for Python 3 from https://pypi.python.org/pypi/scapy-python3/0.20

How to do it...

We can use Scapy, a mature third-party network analysis library, to launch an ICMP scan. Since we would like to do it periodically, we need Python's sched module to schedule the scanning tasks. Listing 3.7 shows us how to detect inactive machines, as follows:

#!/usr/bin/env python
# This program is optimized for Python 2.7.12 and Python 3.5.2.
# It may run on any other version with/without modifications.
# Requires scapy-2.2.0 or higher for Python 2.7.
# Visit: http://www.secdev.org/projects/scapy/files/scapy-latest.zip
# As of now, requires a separate bundle for Python 3.x.
# Download it from: https://pypi.python.org/pypi/scapy-python3/0.20

import argparse
import time
import sched
from scapy.all import sr, srp, IP, UDP, ICMP, TCP, ARP, Ether

RUN_FREQUENCY = 10

scheduler = sched.scheduler(time.time, time.sleep)

def detect_inactive_hosts(scan_hosts):
    """
    Scans the network to find scan_hosts are live or dead
    scan_hosts can be like 10.0.2.2-4 to cover range.
    See Scapy docs for specifying targets.
    """
    global scheduler
    scheduler.enter(RUN_FREQUENCY, 1, detect_inactive_hosts, (scan_hosts, ))
    inactive_hosts = []
    try:
        ans, unans = sr(IP(dst=scan_hosts)/ICMP(), retry=0, timeout=1)
        ans.summary(lambda r : r.sprintf("%IP.src% is alive"))
        for inactive in unans:
            print ("%s is inactive" %inactive.dst)
            inactive_hosts.append(inactive.dst)
        print ("Total %d hosts are inactive" %(len(inactive_hosts)))
    except KeyboardInterrupt:
        exit(0)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Python networking utils')
    parser.add_argument('--scan-hosts', action="store", dest="scan_hosts", required=True)
    given_args = parser.parse_args()
    scan_hosts = given_args.scan_hosts
    scheduler.enter(1, 1, detect_inactive_hosts, (scan_hosts, ))
    scheduler.run()

The output of this script will be something like the following:

$ sudo python 3_7_detect_inactive_machines.py --scan-hosts=10.0.2.2-4
Begin emission:
*.Finished to send 3 packets.
.
Received 6 packets, got 1 answers, remaining 2 packets
10.0.2.2 is alive
10.0.2.4 is inactive
10.0.2.3 is inactive
Total 2 hosts are inactive
Begin emission:
*.Finished to send 3 packets.
Received 3 packets, got 1 answers, remaining 2 packets
10.0.2.2 is alive
10.0.2.4 is inactive
10.0.2.3 is inactive
Total 2 hosts are inactive

How it works...

The preceding script first takes a list of network hosts, scan_hosts, from the command line. It then creates a schedule to launch the detect_inactive_hosts() function after a one-second delay. The target function takes the scan_hosts argument and calls Scapy's sr() function. The target function also schedules itself to rerun every 10 seconds by calling the scheduler.enter() function once again. This way, we run this scanning task periodically. Scapy's sr() scanning function takes an IP, a protocol, and some scan-control information. In this case, the IP() method passes scan_hosts as the destination hosts to scan, and the protocol is specified as ICMP. This can also be TCP or UDP.
We disable retries (retry=0) and use a one-second timeout so that this script runs faster. However, you can experiment with the options that suit you. The sr() scanning function returns the hosts that answer and those that don't as a tuple. We check the hosts that don't answer, build a list, and print that information.

Performing a basic IPC using connected sockets (socketpair)

Sometimes, two processes need to exchange some information with each other. In Unix/Linux, there's the concept of a connected socket pair, or socketpair. We can experiment with this here.

Getting ready

This recipe is designed for a Unix/Linux host. Windows/Mac is not suitable for running this one.

How to do it...

We use a test_socketpair() function to wrap a few lines that test the socket module's socketpair() function. Listing 3.8 shows an example of socketpair, as follows:

#!/usr/bin/env python
# This program is optimized for Python 3.5.2.
# It may run on any other version with/without modifications.
# To make it run on Python 2.7.x, needs some changes due to API differences.
# Follow the comments inline to make the program work with Python 2.

import socket
import os

BUFSIZE = 1024

def test_socketpair():
    """ Test Unix socketpair"""
    parent, child = socket.socketpair()
    pid = os.fork()
    try:
        if pid:
            print ("@Parent, sending message...")
            child.close()
            parent.sendall(bytes("Hello from parent!", 'utf-8'))
            # Comment out the preceding line and uncomment the following line for Python 2.7.
            # parent.sendall("Hello from parent!")
            response = parent.recv(BUFSIZE)
            print ("Response from child:", response)
            parent.close()
        else:
            print ("@Child, waiting for message from parent")
            parent.close()
            message = child.recv(BUFSIZE)
            print ("Message from parent:", message)
            child.sendall(bytes("Hello from child!!", 'utf-8'))
            # Comment out the preceding line and uncomment the following line for Python 2.7.
# child.sendall("Hello from child!!")
            child.close()
    except Exception as err:
        print ("Error: %s" %err)

if __name__ == '__main__':
    test_socketpair()

The output from the preceding script is as follows:

$ python 3_8_ipc_using_socketpairs.py
@Parent, sending message...
@Child, waiting for message from parent
Message from parent: b'Hello from parent!'
Response from child: b'Hello from child!!'

How it works...

The socket.socketpair() function simply returns two connected socket objects. In our case, we can say that one is a parent and another is a child. We fork another process via an os.fork() call. In the parent process, this returns the process ID of the child; in the child process, it returns 0. In each process, the other process's socket is closed first, and then a message is exchanged via a sendall() method call on the process's own socket. The try-except block prints the error if any kind of exception occurs.

Performing IPC using Unix domain sockets

Unix domain sockets (UDS) are sometimes used as a convenient way to communicate between two processes. In Unix, everything is conceptually a file, and a UDS is addressed by a filesystem path. If you need an example of such an IPC action, this recipe can be useful.

How to do it...

We launch a UDS server that binds to a filesystem path, and a UDS client uses the same path to communicate with the server. Listing 3.9a shows a Unix domain socket server, as follows:

#!/usr/bin/env python
# This program is optimized for Python 2.7.12 and Python 3.5.2.
# It may run on any other version with/without modifications.
import socket
import os
import time

SERVER_PATH = "/tmp/python_unix_socket_server"

def run_unix_domain_socket_server():
    if os.path.exists(SERVER_PATH):
        os.remove( SERVER_PATH )
    print ("starting unix domain socket server.")
    server = socket.socket( socket.AF_UNIX, socket.SOCK_DGRAM )
    server.bind(SERVER_PATH)
    print ("Listening on path: %s" %SERVER_PATH)
    while True:
        datagram = server.recv( 1024 )
        if not datagram:
            break
        else:
            print ("-" * 20)
            print (datagram)
            # recv() returns bytes on Python 3, so compare against a bytes literal
            if b"DONE" == datagram:
                break
    print ("-" * 20)
    print ("Server is shutting down now...")
    server.close()
    os.remove(SERVER_PATH)
    print ("Server shutdown and path removed.")

if __name__ == '__main__':
    run_unix_domain_socket_server()

Listing 3.9b shows a UDS client, as follows:

#!/usr/bin/env python
# Python Network Programming Cookbook, Second Edition -- Article - 3
# This program is optimized for Python 3.5.2.
# It may run on any other version with/without modifications.
# To make it run on Python 2.7.x, needs some changes due to API differences.
# Follow the comments inline to make the program work with Python 2.

import socket
import sys

SERVER_PATH = "/tmp/python_unix_socket_server"

def run_unix_domain_socket_client():
    """ Run a Unix domain socket client """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    # Connect the socket to the path where the server is listening
    server_address = SERVER_PATH
    print ("connecting to %s" % server_address)
    try:
        sock.connect(server_address)
    except socket.error as msg:
        print (msg)
        sys.exit(1)
    try:
        message = "This is the message. This will be echoed back!"
        print ("Sending [%s]" %message)
        sock.sendall(bytes(message, 'utf-8'))
        # Comment out the preceding line and uncomment the following line for Python 2.7.
# sock.sendall(message)
        amount_received = 0
        amount_expected = len(message)
        while amount_received < amount_expected:
            data = sock.recv(16)
            amount_received += len(data)
            print ("Received [%s]" % data)
    finally:
        print ("Closing client")
        sock.close()

if __name__ == '__main__':
    run_unix_domain_socket_client()

The server output is as follows:

$ python 3_9a_unix_domain_socket_server.py
starting unix domain socket server.
Listening on path: /tmp/python_unix_socket_server
--------------------
This is the message. This will be echoed back!

The client output is as follows:

$ python 3_9b_unix_domain_socket_client.py
connecting to /tmp/python_unix_socket_server
Sending [This is the message. This will be echoed back!]

How it works...

A common path is defined for the UDS client and server to interact. The server binds to this path and the client connects to it. In the server code, we remove the path if it exists from a previous run of the script. The server then creates a Unix datagram socket, binds it to the specified path, and waits for incoming datagrams. In the data processing loop, it uses the recv() method to get data from the client and prints that information on screen.

The client-side code simply opens a Unix datagram socket and connects to the shared server address. It sends a message to the server using sendall(). It then waits for the message to be echoed back and prints whatever it receives. Note that the server in Listing 3.9a only prints what it receives and does not send anything back, which is why the client output shown here stops after the Sending line.

Finding out if your Python supports IPv6 sockets

IP version 6, or IPv6, is increasingly adopted by the industry to build newer applications. If you would like to write an IPv6 application, the first thing you'd like to know is whether your machine supports IPv6.
This can be done from the Linux/Unix command line, as follows: $ cat /proc/net/if_inet6 00000000000000000000000000000001 01 80 10 80 lo fe80000000000000642a57c2e51932a2 03 40 20 80 wlo1 From your Python script, you can also check if the IPv6 support is present on your machine, and Python is installed with that support. Getting ready For this recipe, use pip to install a Python third-party library, netifaces, as follows: $ pip install netifaces How to do it... We can use a third-party library, netifaces, to find out if there is IPv6 support on your machine. We can call the interfaces() function from this library to list all interfaces present in the system. Listing 3.10 shows the Python IPv6 support checker, as follows: #!/usr/bin/env python # This program is optimized for Python 2.7.12 and Python 3.5.2. # It may run on any other version with/without modifications. # This program depends on Python module netifaces => 0.8 import socket import argparse import netifaces as ni def inspect_ipv6_support(): """ Find the ipv6 address""" print ("IPV6 support built into Python: %s" %socket.has_ipv6) ipv6_addr = {} for interface in ni.interfaces(): all_addresses = ni.ifaddresses(interface) print ("Interface %s:" %interface) for family,addrs in all_addresses.items(): fam_name = ni.address_families[family] print (' Address family: %s' % fam_name) for addr in addrs: if fam_name == 'AF_INET6': ipv6_addr[interface] = addr['addr'] print (' Address : %s' % addr['addr']) nmask = addr.get('netmask', None) if nmask: print (' Netmask : %s' % nmask) bcast = addr.get('broadcast', None) if bcast: print (' Broadcast: %s' % bcast) if ipv6_addr: print ("Found IPv6 address: %s" %ipv6_addr) else: print ("No IPv6 interface found!") if __name__ == '__main__': inspect_ipv6_support() The output from this script will be as follows: $ python 3_10_check_ipv6_support.py IPV6 support built into Python: True Interface lo: Address family: AF_PACKET Address : 00:00:00:00:00:00 Address family: AF_INET 
  Address : 127.0.0.1
  Netmask : 255.0.0.0
 Address family: AF_INET6
  Address : ::1
  Netmask : ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff/128
Interface enp2s0:
 Address family: AF_PACKET
  Address : 9c:5c:8e:26:a2:48
  Broadcast: ff:ff:ff:ff:ff:ff
 Address family: AF_INET
  Address : 130.104.228.90
  Netmask : 255.255.255.128
  Broadcast: 130.104.228.127
 Address family: AF_INET6
  Address : 2001:6a8:308f:2:88bc:e3ec:ace4:3afb
  Netmask : ffff:ffff:ffff:ffff::/64
  Address : 2001:6a8:308f:2:5bef:e3e6:82f8:8cca
  Netmask : ffff:ffff:ffff:ffff::/64
  Address : fe80::66a0:7a3f:f8e9:8c03%enp2s0
  Netmask : ffff:ffff:ffff:ffff::/64
Interface wlp1s0:
 Address family: AF_PACKET
  Address : c8:ff:28:90:17:d1
  Broadcast: ff:ff:ff:ff:ff:ff
Found IPv6 address: {'lo': '::1', 'enp2s0': 'fe80::66a0:7a3f:f8e9:8c03%enp2s0'}

How it works...

The IPv6 support checker function, inspect_ipv6_support(), first checks if Python is built with IPv6 support using socket.has_ipv6. Next, we call the interfaces() function from the netifaces module. This gives us the list of all interfaces. If we call the ifaddresses() function and pass a network interface to it, we get all the addresses of this interface. We then extract various address-related information, such as the protocol family, address, netmask, and broadcast address. The address of a network interface is added to the ipv6_addr dictionary if its protocol family matches AF_INET6. Because the dictionary is keyed by interface name, only the last IPv6 address seen for each interface is kept.

Extracting an IPv6 prefix from an IPv6 address

In your IPv6 application, you may need to inspect an IPv6 address to get its prefix information. Note that the upper 64 bits of an IPv6 address consist of a global routing prefix plus a subnet ID, as defined in RFC 3513. A general prefix (for example, /48) is a short prefix from which a number of longer, more specific prefixes (for example, /64) can be defined. A Python script can be very helpful in generating the prefix information.

How to do it...
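As a point of reference for the prefix arithmetic described above, the standard library's ipaddress module (Python 3.3 and later) can derive a prefix from an address and a prefix length. This is a sketch that sits alongside the recipe, which uses netaddr instead; the helper name prefix_of is hypothetical, and the address is one that appears in the sample output above.

```python
import ipaddress


def prefix_of(address, prefix_length):
    """Return the network prefix that contains address (hypothetical helper)."""
    interface = ipaddress.IPv6Interface("%s/%d" % (address, prefix_length))
    return interface.network


address = "2001:6a8:308f:2:88bc:e3ec:ace4:3afb"
print(prefix_of(address, 64))   # global routing prefix plus subnet ID
print(prefix_of(address, 48))   # shorter, more general prefix
```

The /48 result covers the /64 result, which illustrates how a general prefix holds several more specific ones.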
We can use the netifaces and netaddr third-party libraries to find out the IPv6 prefix information for a given IPv6 address. Make sure that both netifaces and netaddr are installed on your system:

    $ pip install netaddr

The program is as follows:

    #!/usr/bin/env python
    # This program is optimized for Python 2.7.12 and Python 3.5.2.
    # It may run on any other version with/without modifications.
    # This program depends on Python modules netifaces and netaddr.
    import socket
    import netifaces as ni
    import netaddr as na

    def extract_ipv6_info():
        """ Extracts IPv6 information"""
        print ("IPv6 support built into Python: %s" %socket.has_ipv6)
        for interface in ni.interfaces():
            all_addresses = ni.ifaddresses(interface)
            print ("Interface %s:" %interface)
            for family,addrs in all_addresses.items():
                fam_name = ni.address_families[family]
                for addr in addrs:
                    if fam_name == 'AF_INET6':
                        # Strip a zone ID such as "%eth0". Only names that
                        # start with "eth" are matched, so link-local
                        # addresses on other interfaces (for example,
                        # "%enp2s0") fail to parse and are skipped below.
                        addr = addr['addr'].split("%eth")[0]
                        try:
                            print (" IP Address: %s" %na.IPNetwork(addr))
                            print (" IP Version: %s" %na.IPNetwork(addr).version)
                            print (" IP Prefix length: %s" %na.IPNetwork(addr).prefixlen)
                            print (" Network: %s" %na.IPNetwork(addr).network)
                            print (" Broadcast: %s" %na.IPNetwork(addr).broadcast)
                        except Exception as e:
                            print ("Skip Non-IPv6 Interface")

    if __name__ == '__main__':
        extract_ipv6_info()

The output from this script is as follows:

    $ python 3_11_extract_ipv6_prefix.py
    IPv6 support built into Python: True
    Interface lo:
     IP Address: ::1/128
     IP Version: 6
     IP Prefix length: 128
     Network: ::1
     Broadcast: ::1
    Interface enp2s0:
     IP Address: 2001:6a8:308f:2:88bc:e3ec:ace4:3afb/128
     IP Version: 6
     IP Prefix length: 128
     Network: 2001:6a8:308f:2:88bc:e3ec:ace4:3afb
     Broadcast: 2001:6a8:308f:2:88bc:e3ec:ace4:3afb
     IP Address: 2001:6a8:308f:2:5bef:e3e6:82f8:8cca/128
     IP Version: 6
     IP Prefix length: 128
     Network: 2001:6a8:308f:2:5bef:e3e6:82f8:8cca
     Broadcast: 2001:6a8:308f:2:5bef:e3e6:82f8:8cca
    Skip Non-IPv6 Interface
    Interface wlp1s0:

How it works...

Python's netifaces module gives us the IPv6 addresses of the network interfaces through its interfaces() and ifaddresses() functions. The netaddr module is particularly helpful for manipulating a network address. It has an IPNetwork() class that takes an address, IPv4 or IPv6, and computes the prefix, network, and broadcast addresses. Here, we read this information from the instance's version, prefixlen, network, and broadcast attributes. Because the script passes a bare address with no prefix length, every entry is reported as a /128, which is why the network and broadcast values in the output equal the address itself.

Writing an IPv6 echo client/server

You need to write an IPv6-compliant server or client and wonder how it differs from its IPv4 counterpart.

How to do it...

We use the same approach as for writing an IPv4 echo client/server. The only major difference is how the socket is created using IPv6 information. Listing 3.12a shows an IPv6 echo server, as follows:

    #!/usr/bin/env python
    # This program is optimized for Python 2.7.12 and Python 3.5.2.
    # It may run on any other version with/without modifications.
    import argparse
    import socket
    import sys

    HOST = 'localhost'

    def echo_server(port, host=HOST):
        """Echo server using IPv6 """
        sock = None
        for result in socket.getaddrinfo(host, port, socket.AF_UNSPEC,
                                         socket.SOCK_STREAM, 0, socket.AI_PASSIVE):
            af, socktype, proto, canonname, sa = result
            try:
                sock = socket.socket(af, socktype, proto)
            except socket.error as err:
                print ("Error: %s" %err)
                sock = None
                continue
            try:
                sock.bind(sa)
                sock.listen(1)
                print ("Server listening on %s:%s" %(host, port))
            except socket.error as msg:
                sock.close()
                sock = None
                continue
            # Stop at the first address that binds successfully.
            break
        if sock is None:
            print ('Failed to open socket!')
            sys.exit(1)
        conn, addr = sock.accept()
        print ('Connected to', addr)
        while True:
            data = conn.recv(1024)
            print ("Received data from the client: [%s]" %data)
            if not data:
                break
            conn.send(data)
            print ("Sent data echoed back to the client: [%s]" %data)
        conn.close()

    if __name__ == '__main__':
        parser = argparse.ArgumentParser(description='IPv6 Socket Server Example')
        parser.add_argument('--port', action="store", dest="port", type=int, required=True)
        given_args = parser.parse_args()
        port = given_args.port
        echo_server(port)

Listing 3.12b shows an IPv6 echo client, as follows:

    #!/usr/bin/env python
    # Python Network Programming Cookbook, Second Edition -- Article - 3
    # This program is optimized for Python 2.7.12 and Python 3.5.2.
    # It may run on any other version with/without modifications.
    import argparse
    import socket
    import sys

    HOST = 'localhost'
    BUFSIZE = 1024

    def ipv6_echo_client(port, host=HOST):
        sock = None
        for res in socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
            af, socktype, proto, canonname, sa = res
            try:
                sock = socket.socket(af, socktype, proto)
            except socket.error as err:
                print ("Error:%s" %err)
                sock = None
                continue
            try:
                sock.connect(sa)
            except socket.error as msg:
                sock.close()
                sock = None
                continue
            # Stop at the first address that accepts the connection.
            break
        if sock is None:
            print ('Failed to open socket!')
            sys.exit(1)
        msg = "Hello from ipv6 client"
        print ("Send data to server: %s" %msg)
        sock.send(msg.encode('utf-8'))
        # Read the echoed reply once, then close so that the server's
        # receive loop sees the end of the connection.
        data = sock.recv(BUFSIZE)
        print ('Received from server', repr(data))
        sock.close()

    if __name__ == '__main__':
        parser = argparse.ArgumentParser(description='IPv6 socket client example')
        parser.add_argument('--port', action="store", dest="port", type=int, required=True)
        given_args = parser.parse_args()
        port = given_args.port
        ipv6_echo_client(port)

The server output is as follows:

    $ python 3_12a_ipv6_echo_server.py --port=8800
    Server listening on localhost:8800
    ('Connected to', ('127.0.0.1', 56958))
    Received data from the client: [Hello from ipv6 client]
    Sent data echoed back to the client: [Hello from ipv6 client]

The client output is as follows:

    $ python 3_12b_ipv6_echo_client.py --port=8800
    Send data to server: Hello from ipv6 client
    ('Received from server', "'Hello from ipv6 client'")

The following screenshot indicates the server and client output:

How it works...

The IPv6 echo server first determines its address information by calling socket.getaddrinfo(). Notice that we passed the AF_UNSPEC address family together with the SOCK_STREAM socket type, so a TCP socket is created for whichever address family is resolved. Each result is a tuple of five values. We use three of them, the address family, socket type, and protocol, to create a server socket. This socket is then bound to the socket address from the same tuple. It then listens for incoming connections and accepts them. After a connection is made, it receives data from the client and echoes it back. Because AF_UNSPEC accepts either family, localhost may resolve to an IPv4 address first, as the 127.0.0.1 peer in the server output shows; pass socket.AF_INET6 instead of socket.AF_UNSPEC to restrict the results to IPv6.

In the client-side code, we create an IPv6-compliant client socket instance and send the data using the send() method of that instance. When the data is echoed back, the recv() method is used to read it.

Summary

In this article, we walked through recipes that cover various IPv6 utilities in Python, including an IPv6 echo client/server. Other protocols, such as ICMP ping, and how they work are also covered. Scapy is explained to give a better understanding of its popularity among Python network programmers.

Resources for Article:

Further resources on this subject:
- Introduction to Network Security [article]
- Getting Started with Cisco UCS and Virtual Networking [article]
- Revisiting Linux Network Basics [article]
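As a follow-up to the prefix-extraction recipe in this article: that script passes a bare address to IPNetwork(), so every prefix length comes back as 128. The sketch below shows one possible way to recover the actual network prefix by combining an address with its prefix length. It swaps in the standard-library ipaddress module (Python 3.3+) instead of netaddr, and the helper name ipv6_prefix_info and the hard-coded sample values (copied from the recipe output) are assumptions made for illustration only.

```python
import ipaddress


def ipv6_prefix_info(addr, prefixlen):
    """Return prefix details for an IPv6 address and its prefix length.

    Hypothetical helper, not part of the original recipes.
    """
    # Drop a zone ID such as "%enp2s0" from link-local addresses.
    bare = addr.split('%')[0]
    iface = ipaddress.IPv6Interface('%s/%d' % (bare, prefixlen))
    return {
        'address': str(iface.ip),
        'prefixlen': iface.network.prefixlen,
        'network': str(iface.network),
    }


if __name__ == '__main__':
    # Sample addresses taken from the recipe output; /64 matches the
    # netmask that netifaces reported for them.
    samples = [
        ('2001:6a8:308f:2:88bc:e3ec:ace4:3afb', 64),
        ('fe80::66a0:7a3f:f8e9:8c03%enp2s0', 64),
    ]
    for addr, plen in samples:
        print(ipv6_prefix_info(addr, plen))
```

In a real script, the prefix length would come from the netmask field that netifaces returns for each address rather than from a hard-coded value.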

Sunith Shetty
21 Feb 2018
6 min read

How to use Standard Macro in Workflows

Note: This article is an excerpt from the book Learning Alteryx by Renato Baruti. In this book, you will learn how to perform self-service analytics and create interactive dashboards using various tools in Alteryx.

Today we will learn about the Standard Macro, which will provide you with a foundation for building enhanced workflows. The csv file required for this tutorial is available to download here.

Standard Macro

Before getting into the Standard Macro, let's define what a macro is. A macro is a collection of workflow tools that are grouped together into one tool. Using a range of different interface tools, a macro can be developed and used within a workflow. Any workflow can be turned into a macro, and a repeatable element of a workflow can commonly be converted into a macro.

There are a couple of ways you can turn your workflow into a Standard Macro. The first is to go to the canvas configuration pane and navigate to the Workflow tab. This is where you select what type of workflow you want. If you select Macro, Standard Macro should be selected automatically. Now, when you save this workflow, it will be saved as a macro. You'll then be able to add it to another workflow and run the process created within the macro itself. The second method is simply to add a Macro Input tool from the Interface tool section onto the canvas; the workflow will then automatically change to a Standard Macro.

The following screenshot shows the selection of a Standard Macro, under the Workflow tab:

Let's go through an example of creating and deploying a Standard Macro.

Standard Macro Example #1: Create a macro that allows the user to input a number used as a multiplier. Use the multiplier for the DataValueAlt field.

The following steps demonstrate this process:

Step 1: Select the Macro Input tool from the Interface tool palette and add the tool onto the canvas. The workflow will automatically change to a Standard Macro.
Step 2: Select the Text Input and Edit Data options within the Macro Input tool configuration.
Step 3: Create a field called Number and enter the values 155, 243, 128, 352, and 357, one in each row, as shown in the following image:
Step 4: Set the Input Name to Input and set the Anchor Abbreviation to I, as shown in the following image:
Step 5: Select the Formula tool from the Preparation tool palette. Connect the Formula tool to the Macro Input tool.
Step 6: Select the + Add Column option in the Select Column drop-down within the Formula tool configuration. Name the field Result.
Step 7: Add the following expression to the expression window: [Number]*0.50
Step 8: Select the Macro Output tool from the Interface tool palette and add the tool onto the canvas. Connect the Macro Output tool to the Formula tool.
Step 9: Set the Output Name to Output and set the Anchor Abbreviation to O:

The Standard Macro has now been created. It can be saved and used as a multiplier that multiplies the five numbers entered in the Macro Input tool by 0.50. This is great; however, let's take it a step further and make it dynamic and flexible by allowing the user to enter a multiplier. For instance, the multiplier is currently set to 0.50, but what if a user wants to change that to 0.25 or 0.10 to determine the 25% or 10% value of a field? Let's continue building out the Standard Macro to make this possible.

Step 1: Select the Text Box tool from the Interface tool palette and drag it onto the canvas. Connect the Text Box tool to the Formula tool on the lightning bolt (the macro indicator). The Action tool will automatically be added to the canvas; it updates the configuration of a workflow with values provided by interface questions when run as an app or macro.
Step 2: Configure the Action tool so that it updates a specific part of the expression. Select Formula | FormulaFields | FormulaField | @expression - value="[Number]*0.50". Select the Replace a specific string: option and enter 0.50. This is where the automation happens, replacing 0.50 with any number the user enters. You will see how this happens in the following steps.
Step 3: In the Enter the text or question to be displayed text box, within the Text Box tool configuration, enter: Please enter a number:
Step 4: Save the workflow as Standard Macro.yxmc. The .yxmc file type indicates that it is a macro workflow, as shown in the following image:
Step 5: Open a new workflow.
Step 6: Select the Input Data tool from the In/Out tool palette and connect to the U.S. Chronic Disease Indicators.csv file.
Step 7: Select the Select tool from the Preparation tool palette and drag it onto the canvas. Connect the Select tool to the Input Data tool.
Step 8: Change the Data Type for the DataValueAlt field to Double.
Step 9: Right-click on the canvas and select Insert | Macro | Standard Macro.
Step 10: Connect the Standard Macro to the Select tool.
Step 11: There will be Questions to select within the Standard Macro tool configuration. Select DataValueAlt (Double) as the Choose Field option and enter 0.25 in the Please enter a number text box:
Step 12: Add a Browse tool to the Standard Macro tool.
Step 13: Run the workflow:

The goal of creating this Standard Macro was to allow the user to choose the multiplier rather than rely on a static number. Let's recap what has been created and deployed using a Standard Macro. First, Standard Macro.yxmc was developed using Interface tools. The Macro Input (I) was used to enter sample text data for the Number field. The Number field is the value that gets multiplied by the given multiplier, in this case 0.50, which is the static multiplier. The Formula tool was used to create the expression that multiplies the Number field by 0.50. The Macro Output (O) was used to output the macro so that it can be used in another workflow. The Text Box tool is where the question Please enter a number is displayed, and the Action tool updates the specific value being replaced. The current multiplier, 0.50, is replaced by 0.25, as entered in Step 11, through a dynamic input in which the user can enter the multiplier.

Notice that, in the Browse tool output, the Result field has been added, multiplying the values of the DataValueAlt field by the multiplier 0.25. Change the value in the macro to 0.10 and run the workflow. The Result field is now updated to multiply the values of the DataValueAlt field by the multiplier 0.10. This is a great use case for a Standard Macro and demonstrates how versatile the Interface tools are.

We learned about macros and their dynamic use within workflows. We saw how a Standard Macro was developed to allow the end user to specify the multiplier. This is a great way to add interactivity to a workflow. To know more about high-quality interactive dashboards and efficient self-service data analytics, do check out the book Learning Alteryx.
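To make the Action tool's behaviour more concrete, here is a rough Python analogy (not Alteryx code) of what the macro does: the user's answer replaces the string 0.50 inside the expression [Number]*0.50, and the updated expression is then applied to every row. The function names and the tiny expression evaluator are hypothetical and exist only for this illustration; Alteryx's own formula engine works differently.

```python
def apply_action(expression, target, user_value):
    """Mimic 'Replace a specific string': swap target for the user's input."""
    return expression.replace(target, user_value)


def run_formula(expression, rows, field):
    """Evaluate a '[Field]*constant' expression for each row.

    Illustrative only: it understands nothing but this one expression shape.
    """
    prefix = '[%s]*' % field
    if not expression.startswith(prefix):
        raise ValueError('unsupported expression: %s' % expression)
    multiplier = float(expression[len(prefix):])
    # Add a Result column to a copy of each row.
    return [dict(row, Result=row[field] * multiplier) for row in rows]


if __name__ == '__main__':
    # The sample values entered in the Macro Input tool.
    rows = [{'Number': n} for n in (155, 243, 128, 352, 357)]
    expression = apply_action('[Number]*0.50', '0.50', '0.25')
    print(expression)
    for row in run_formula(expression, rows, 'Number'):
        print(row)
```

The point of the analogy is that the macro never changes its structure; only one string inside the Formula tool's configuration is rewritten before the workflow runs.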
Packt
21 Feb 2018
9 min read

VMware vSphere storage, datastores, snapshots

In this article, by Abhilash G B, author of the book VMware vSphere 6.5 CookBook - Third Edition, we will cover the following:

- Managing VMFS volumes detected as snapshots
- Creating NFSv4.1 datastores with Kerberos authentication
- Enabling storage I/O control

Introduction

Storage is an integral part of any infrastructure. It is used to store the files backing your virtual machines. The most common way to refer to a type of storage presented to a VMware environment is based on the protocol used and the connection type. NFS is a storage solution that can leverage the existing TCP/IP network infrastructure. Hence, it is referred to as IP-based storage. Storage I/O Control (SIOC) is one of the mechanisms used to ensure a fair share of storage bandwidth allocation to all virtual machines running on shared storage, regardless of the ESXi host the virtual machines are running on.

Managing VMFS volumes detected as snapshots

Some environments maintain copies of the production LUNs as a backup by replicating them. These replicas are exact copies of the LUNs that were already presented to the ESXi hosts. If, for any reason, a replicated LUN is presented to an ESXi host, then the host will not mount the VMFS volume on the LUN. This is a precaution to prevent data corruption.

ESXi identifies each VMFS volume using its signature, denoted by a Universally Unique Identifier (UUID). The UUID is generated when the volume is first created or resignatured and is stored in the LVM header of the VMFS volume. When an ESXi host scans for new LUN devices and the VMFS volumes on them, it compares the physical device ID (NAA ID) of the LUN with the device ID (NAA ID) value stored in the VMFS volume's LVM header. If it finds a mismatch, then it flags the volume as a snapshot volume.

Volumes detected as snapshots are not mounted by default. There are two ways to mount such volumes/datastores:

- Mount by keeping the existing signature intact: This is used when you are attempting to temporarily mount the snapshot volume on an ESXi host that doesn't see the original volume. If you attempt to mount the VMFS volume by keeping the existing signature and the host sees the original volume, then you will not be allowed to mount the volume and will be warned about the presence of another VMFS volume with the same UUID.
- Mount by generating a new VMFS signature: This has to be used if you are mounting a clone or a snapshot of an existing VMFS datastore to the same host(s). The process of assigning a new signature will update the LVM header not only with the newly generated UUID, but also with the physical device ID (NAA ID) of the snapshot LUN. Here, the VMFS volume/datastore will be renamed by prefixing the word snap followed by a random number and the name of the original datastore.

Getting ready

Make sure that the original datastore and its LUN are no longer seen by the ESXi host the snapshot is being mounted to.

How to do it...

The following procedure will help you mount a VMFS volume from a LUN detected as a snapshot:

1. Log in to the vCenter Server using the vSphere Web Client and use the key combination Ctrl+Alt+2 to switch to the Host and Clusters view.
2. Right-click on the ESXi host the snapshot LUN is mapped to and go to Storage | New Datastore.
3. On the New Datastore wizard, select VMFS as the filesystem type and click Next to continue.
4. On the Name and Device selection screen, select the LUN detected as a snapshot and click Next to continue.
5. On the Mount Option screen, choose to mount either by assigning a new signature or by keeping the existing signature, and click Next to continue.
6. On the Ready to Complete screen, review the settings and click Finish to initiate the operation.

Creating NFSv4.1 datastores with Kerberos authentication

VMware introduced support for NFS 4.1 with vSphere 6.0. vSphere 6.5 added several enhancements:

- Support for AES encryption
- Support for IP version 6
- Support for Kerberos's integrity checking mechanism

Here, we will learn how to create NFS 4.1 datastores. Although the procedure is similar to NFSv3, there are a few additional steps that need to be performed.

Getting ready

- For Kerberos authentication to work, you need to make sure that the ESXi hosts and the NFS server are joined to the Active Directory domain
- Create a new or select an existing AD user for NFS Kerberos authentication
- Configure the NFS server/share to allow access to the AD user chosen for NFS Kerberos authentication

How to do it...

The following procedure will help you mount an NFS datastore using the NFSv4.1 client with Kerberos authentication enabled:

1. Log in to the vCenter Server using the vSphere Web Client and use the key combination Ctrl+Alt+2 to switch to the Host and Clusters view. Select the desired ESXi host, navigate to its Configure | System | Authentication Services section, and supply the credentials of the Active Directory user that was chosen for NFS Kerberos authentication.
2. Right-click on the desired ESXi host and go to Storage | New Datastore to bring up the New Datastore wizard.
3. On the New Datastore wizard, select the Type as NFS and click Next to continue.
4. On the Select NFS version screen, select NFS 4.1 and click Next to continue. Keep in mind that it is not recommended to mount an NFS export using both the NFS3 and NFS4.1 clients.
5. On the Name and Configuration screen, supply a name for the datastore, the NFS export's folder path, and the NFS server's IP address or FQDN. You can also choose to mount the share as read-only if desired.
6. On the Configure Kerberos Authentication screen, check the Enable Kerberos-based authentication box, choose the type of authentication required, and click Next to continue.
7. On the Ready to Complete screen, review the settings and click Finish to mount the NFS export.

Enabling storage I/O control

The use of disk shares will work just fine as long as the datastore is seen by a single ESXi host. Unfortunately, that is not a common case. Datastores are often shared among multiple ESXi hosts. When datastores are shared, you bring more than one local host scheduler into the process of balancing the I/O among the virtual machines. However, these local host schedulers cannot talk to each other, and their visibility is limited to the ESXi hosts they are running on. This easily contributes to a serious problem called the noisy neighbor situation. The job of SIOC is to enable some form of communication between local host schedulers so that I/O can be balanced between virtual machines running on separate hosts.

How to do it...

The following procedure will help you enable SIOC on a datastore:

1. Connect to the vCenter Server using the Web Client and switch to the Storage view using the key combination Ctrl+Alt+4.
2. Right-click on the desired datastore and go to Configure Storage I/O Control.
3. On the Configure Storage I/O Control window, select the Enable Storage I/O Control checkbox, set a custom congestion threshold (only if needed), and click OK to confirm the settings.
4. With the Virtual Machine selected from the inventory, navigate to its Configure | General tab and review its datastore capability settings to ensure that SIOC is enabled.

How it works...

As mentioned earlier, SIOC enables communication between the local host schedulers so that I/O can be balanced between virtual machines running on separate hosts. It does so by maintaining a shared file in the datastore that all hosts can read, write, and update. When SIOC is enabled on a datastore, it starts monitoring the device latency on the LUN backing the datastore. If the latency crosses the threshold, it throttles the LUN's queue depth on each of the ESXi hosts in an attempt to distribute a fair share of access to the LUN among all the virtual machines issuing the I/O.

The local scheduler on each of the ESXi hosts maintains an iostats file to keep its companion hosts aware of the device I/O statistics observed on the LUN. The file is placed in a directory (naa.xxxxxxxxx) on the same datastore.

For example, suppose there are six virtual machines running on three different ESXi hosts, all accessing a shared LUN. Among the six VMs, four have a normal share value of 1000 and the remaining two have a high (2000) disk share value set on them. These virtual machines have only a single VMDK attached to them. VM-C on host ESX-02 is issuing a large number of I/O operations. Since that is the only VM accessing the shared LUN from that host, it gets the entire queue's bandwidth. This can induce latency on the I/O operations performed by the VMs on the other hosts, ESX-01 and ESX-03. If SIOC detects the latency value to be greater than the dynamic threshold, then it will start throttling the queue depth.

The throttled DQLEN for a VM is calculated as follows:

    DQLEN for the VM = (VM's percent of shares) of (queue depth)
    Example: 12.5% of 64 → (12.5 * 64)/100 = 8

The throttled DQLEN per host is calculated as follows:

    DQLEN of the host = sum of the DQLEN of the VMs on it
    Example: VM-A (8) + VM-B (16) = 24

The following diagram shows the effect of SIOC throttling the queue depth:

Summary

In this article, we learned how to mount a VMFS volume from a LUN detected as a snapshot, how to mount an NFS datastore using the NFSv4.1 client with Kerberos authentication enabled, and how to enable SIOC on a datastore.

Resources for Article:

Further resources on this subject:
- Essentials of VMware vSphere [article]
- Working with VMware Infrastructure [article]
- Network Virtualization and vSphere [article]
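Returning to the DQLEN arithmetic in the SIOC example from this article, the calculation can be sketched in a few lines of Python. This is only a worked illustration of the two formulas, not how ESXi implements throttling. The function names are hypothetical, and the share values assigned to VM-C through VM-F are assumptions, since the article only states the totals (four VMs at 1000 shares, two at 2000) and the values for VM-A and VM-B.

```python
def throttled_dqlen(shares_by_vm, queue_depth):
    """Split a queue depth among VMs in proportion to their disk shares."""
    total_shares = sum(shares_by_vm.values())
    return {vm: queue_depth * shares / total_shares
            for vm, shares in shares_by_vm.items()}


def host_dqlen(vm_dqlen, vms_on_host):
    """A host's DQLEN is the sum of the DQLENs of the VMs running on it."""
    return sum(vm_dqlen[vm] for vm in vms_on_host)


if __name__ == '__main__':
    # Four VMs with normal shares (1000) and two with high shares (2000).
    # VM-A and VM-B follow the article's example; the rest are assumed.
    shares = {'VM-A': 1000, 'VM-B': 2000, 'VM-C': 1000,
              'VM-D': 1000, 'VM-E': 2000, 'VM-F': 1000}
    per_vm = throttled_dqlen(shares, queue_depth=64)
    # VM-A holds 1000 of 8000 shares, that is 12.5% of a queue depth of 64.
    print('VM-A:', per_vm['VM-A'])
    print('VM-B:', per_vm['VM-B'])
    # The host running VM-A and VM-B gets the sum of their DQLENs.
    print('Host:', host_dqlen(per_vm, ['VM-A', 'VM-B']))
```

With these inputs, VM-A works out to 8, VM-B to 16, and their host to 24, matching the figures in the article's example.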

Jason De
21 Feb 2018
4 min read

What is ASP.NET Core?

This is a guest post by Jason De Oliveira, the author of Learning ASP.NET Core 2.0.

ASP.NET Core is a new cross-platform, open-source framework for building modern web-based applications. It provides everything necessary for building web applications, mobile applications, and Internet of Things (IoT) applications quickly and easily.

The first preview release of ASP.NET came out almost 15 years ago as part of the .NET Framework. Over the years, Microsoft has added and evolved many of its features, until coming up with a complete redesign of the ASP.NET framework called ASP.NET Core in June 2016. After ASP.NET Core 1.0 and 1.1, the third and latest installment, version 2.0, was released in August 2017. ASP.NET Core 2.0 applications run not only on .NET Core but also on the full .NET Framework.

The ASP.NET Core 2.0 framework was designed from the ground up to provide an optimized development framework for applications that can be deployed either in the cloud or on-premises. It consists of modular components with minimal overhead, which retains a high degree of flexibility when designing and implementing software solutions. Additionally, ASP.NET Core 2.0 applications run on Windows, Linux, and macOS. Given this impressive set of features, millions of software developers have adopted it to build and run all types of web applications since its inception.

What are the key features of ASP.NET Core 2.0?

ASP.NET Core 2.0 applications normally have the following characteristics:

- They are service-oriented
- They provide very high performance
- They are easy to deploy and to configure
- They support cross-platform scenarios
- They offer a high level of security

ASP.NET Core 2.0 helps developers build scalable, reliable, high-performance web applications in C#. Moreover, it allows them to achieve productivity and quality levels never seen before with any of the other frameworks available in the market.

Why web developers need to know ASP.NET Core 2.0

Modern web applications that can run in multiple environments are clearly the future. Developers who want to invest in a sustainable technology that is going to last for a long time have to acquire advanced web development skills today. ASP.NET Core 2.0 will surely be the future market standard.

It includes everything a web developer can dream of. This allows them to be more productive and provide higher quality while lowering maintenance efforts, since everything is already built in. You can build modern web applications as well as powerful Web APIs, which can be deployed in any environment. Furthermore, it supports the latest technologies and best practices, such as containers and microservices.

But providing great web applications is not only about their development, which is only the beginning. Once developed, you need to make sure that your applications are running successfully and that you can react quickly in case of unexpected behavior or errors at runtime. That is where supervision and monitoring come into play. A great developer needs to be open to these topics and understand them in order to build software that supports them by default.

How ASP.NET Core 2.0 might be used in the future

If you want to prepare yourself for the future, you need to learn ASP.NET Core 2.0! It has been designed for the latest standards and best practices. It supports all current technologies and is open source, which means that it will evolve over time and be adapted to new trends by Microsoft and the various developer communities. Since it runs on Windows, Linux, and macOS, on-premises and in the cloud, you can deploy and host your web application in any type of environment. So, there is no lock-in or risk in using it for your applications.

No, seriously, ASP.NET Core 2.0 is the most futuristic web development framework on the market! So start learning it today!

Sugandha Lahoti
20 Feb 2018
7 min read

Implementing face detection using the Haar Cascades and AdaBoost algorithm

[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Ankit Dixit titled Ensemble Machine Learning. This book serves as an effective guide to using ensemble techniques to enhance machine learning models.[/box] In today’s tutorial, we will learn how to apply the AdaBoost classifier in face detection using Haar cascades. Face detection using Haar cascades Object detection using Haar feature-based cascade classifiers is an effective object detection method proposed by Paul Viola and Michael Jones in their paper Rapid Object Detection using a Boosted Cascade of Simple Features in 2001. It is a machine-learning-based approach where a cascade function is trained from a lot of positive and negative images. It is then used to detect objects in other images. Here, we will work with face detection. Initially, the algorithm needs a lot of positive images (images of faces) and negative images (images without faces) to train the classifier. Then we need to extract features from it. Features are nothing but numerical information extracted from the images that can be used to distinguish one image from another; for example, a histogram (distribution of intensity values) is one of the features that can be used to define several characteristics of an image even without looking at the image, such as dark or bright image, the intensity range of the image, contrast, and so on. We will use Haar features to detect faces in an image. Here is a figure showing different Haar features: These features are just like the convolution kernel; to know about convolution, you need to wait for the following chapters. For a basic understanding, convolutions can be described as in the following figure: So we can summarize convolution with these steps: Pick a pixel location from the image. Now crop a sub-image with the selected pixel as the center from the source image with the same size as the convolution kernel. 
Calculate an element-wise product between the values of the kernel and sub- image. Add the result of the product. Put the resultant value into the new image at the same place where you picked up the pixel location. Now we are going to do a similar kind of procedure, but with a slight difference for our images. Each feature of ours is a single value obtained by subtracting the sum of the pixels under the white rectangle from the sum of the pixels under the black rectangle. Now, all possible sizes and locations of each kernel are used to calculate plenty of features. (Just imagine how much computation it needs. Even a 24x24 window results in over 160,000 features.) For each feature calculation, we need to find the sum of the pixels under the white and black rectangles. To solve this, we will use the concept of integral image; we will discuss this concept very briefly here, as it's not a part of our context. Integral image Integral images are those images in which the pixel value at any (x,y) location is the sum of the all pixel values present before the current pixel. Its use can be understood by the following example: Image on the left and the integral image on the right. Let's see how this concept can help reduce computation time; let us assume a matrix A of size 5x5 representing an image, as shown here: Now, let's say we want to calculate the average intensity over the area highlighted: Region for addition Normally, you'd do the following: 9 + 1 + 2 + 6 + 0 + 5 + 3 + 6 + 5 = 37 37 / 9 = 4.11 This requires a total of 9 operations. Doing the same for 100 such operations would require: 100 * 9 = 900 operations. Now, let us first make a integral image of the preceding image: Making this image requires a total of 56 operations. Again, focus on the highlighted portion: To calculate the avg intensity, all you have to do is: (76 - 20) - (24 - 5) = 37 37 / 9 = 4.11 This required a total of 4 operations. 
To do this for 100 such regions, we would require 56 + 100 * 4 = 456 operations. For just a hundred regions over a 5x5 matrix, using an integral image requires about 50% fewer computations. Imagine the difference it makes for large images and many more such operations. Once the integral image has been created, the sum over any rectangular region can be computed in O(1) time, which greatly reduces the number of calculations. It reduces the calculation of the sum of pixels, no matter how many pixels are involved, to an operation involving just four values. Nice, isn't it? It makes things superfast.

However, most of the features we calculated are irrelevant. For example, consider the following image. The top row shows two good features. The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks. The second feature selected relies on the property that the eyes are darker than the bridge of the nose. But the same windows applied to the cheeks or any other part of the face are irrelevant. So how do we select the best features out of 160,000+ features? This is achieved by AdaBoost.

To do this, we apply each and every feature to all the training images. For each feature, AdaBoost finds the best threshold for classifying the images as face (positive) or non-face (negative). Obviously, there will be errors or misclassifications. We select the features with the minimum error rate, which means they are the features that best separate the face and non-face images.

Note: The process is not as simple as this. Each image is given an equal weight in the beginning. After each classification, the weights of misclassified images are increased. Then the same process is repeated. New error rates are calculated using the new weights. This process continues until the required accuracy or error rate is achieved or the required number of features is found. The final classifier is a weighted sum of these weak classifiers.
Each one is called weak because on its own it can't classify the image, but together with the others, it forms a strong classifier. The paper says that even 200 features provide detection with 95% accuracy. Their final setup had around 6,000 features. (Imagine a reduction from 160,000+ to 6,000 features. That is a big gain.)

Face detection framework using the Haar cascade and AdaBoost algorithm

So now, you take an image, take each 24x24 window, apply 6,000 features to it, and check whether it is a face or not. Wow! Isn't this a little inefficient and time consuming? Yes, it is. The authors of the algorithm have a good solution for that.

In an image, most of the region is non-face. So it is a better idea to have a simple method to check whether a window is not a face region. If it is not, discard it in a single shot and don’t process it again. Instead, focus on the regions where there can be a face. This way, we can spend more time checking possible face regions.

For this, they introduced the concept of a cascade of classifiers. Instead of applying all 6,000 features to a window, we group the features into different stages of classifiers and apply them one by one (normally, the first few stages contain very few features). If a window fails the first stage, discard it; we don’t consider the remaining features for it. If it passes, apply the second stage of features and continue the process. A window that passes all stages is a face region. How cool is that plan!

The authors' detector had 6,000+ features in 38 stages, with 1, 10, 25, 25, and 50 features in the first five stages (the two features in the preceding image were actually obtained as the best two features from AdaBoost). According to the authors, on average, 10 features out of 6,000+ are evaluated per subwindow.

So this is a simple, intuitive explanation of how Viola-Jones face detection works. Read the paper for more details.
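The integral image idea described earlier in this article can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the paper or from any library: the function names, the zero-padded table layout, the 4x4 patch (which is not the 5x5 matrix from the article's figure), and the choice of "left half white, right half black" for the two-rectangle feature are all assumptions made for the example.

```python
def integral_image(image):
    """Build a summed-area table with one extra row and column of zeros.

    table[r + 1][c + 1] holds the sum of image[0..r][0..c], inclusive.
    """
    rows, cols = len(image), len(image[0])
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        running_row_sum = 0
        for c in range(cols):
            running_row_sum += image[r][c]
            # Sum of everything above this row plus the current row so far.
            table[r + 1][c + 1] = table[r][c + 1] + running_row_sum
    return table


def region_sum(table, top, left, bottom, right):
    """Sum of image[top..bottom][left..right] (inclusive) from four lookups."""
    return (table[bottom + 1][right + 1] - table[top][right + 1]
            - table[bottom + 1][left] + table[top][left])


def two_rectangle_feature(table, top, left, height, width):
    """Haar-like feature: black (right half) sum minus white (left half) sum."""
    half = width // 2
    white = region_sum(table, top, left, top + height - 1, left + half - 1)
    black = region_sum(table, top, left + half, top + height - 1, left + width - 1)
    return black - white


# Hypothetical 4x4 grayscale patch used only for this sketch.
patch = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 1, 2, 3],
    [4, 5, 6, 7],
]
table = integral_image(patch)
print(region_sum(table, 1, 1, 2, 2))             # 6 + 7 + 1 + 2 = 16
print(two_rectangle_feature(table, 0, 0, 4, 4))  # 40 - 33 = 7
```

Whatever the size of the rectangle, region_sum touches only four entries of the table, which is why evaluating many Haar features per window stays cheap once the table has been built.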
If you found this post useful, do check out the book Ensemble Machine Learning to learn different machine learning aspects such as bagging, boosting, and stacking.    
Packt
20 Feb 2018
4 min read

Your First Swift Program

In this article by Keith Moon, author of the book Swift 4 Programming Cookbook, we will learn how to write your first Swift program.

Your first Swift program

In this first recipe, we will get up and running with Swift using a Swift Playground and run our first piece of Swift code.

Getting ready

To run our first Swift program, we first need to download and install our IDE. During the beta of Apple's Xcode 9, it is available as a direct download from Apple's developer website at http://developer.apple.com/download; access to this beta requires a free Apple developer account. Once the beta has ended and Xcode 9 is publicly available, it will also be available from the Mac App Store. By obtaining it from the Mac App Store, you will automatically be informed of updates, so this is the preferred route once Xcode 9 is out of beta.

Xcode from the Mac App Store

Open up the Mac App Store, either from the dock or via Spotlight:
Search for xcode:
Click Install:
Xcode is a large download (over 4 GB), so depending on your internet connection, this could take a while! Progress can be monitored from Launchpad:

Xcode as a direct download

Go to the Apple Developer download page at http://developer.apple.com/download
Click the Download button to download Xcode within a .xip file.
Double-click on the downloaded file to unpack the Xcode application.
Drag the Xcode application into your Applications folder.

How to do it...

With Xcode downloaded, let's create our first Swift playground:
Launch Xcode from the icon in your dock.
From the welcome screen, choose Get started with a playground.
From the template chooser, select the blank template from the iOS tab:
Choose a name for your playground and a location to save it:

Xcode Playgrounds can be based on one of three different Apple platforms: iOS, tvOS, and macOS (the operating system formerly known as OS X).
Playgrounds provide full access to the frameworks available to iOS, tvOS, or macOS, depending on which you choose. An iOS playground will be assumed for the entirety of this chapter, chiefly because this is the platform of choice of the author. Where recipes do have UI components, the iOS platform will be used unless otherwise stated.

You are now presented with a view that looks like this:
Let's replace the word playground with Swift!.
Press the blue play button in the bottom left-hand corner of the window to execute the code in the playground:

Congratulations! You have just run some Swift code. On the right-hand side of the window, you will see the output of each line of code in the playground. We can see that our line of code has output "Hello, Swift!":

There's more...

If you put your cursor over the output on the right-hand side, you will see two buttons, one that looks like an eye and another that looks like a square:

Click on the eye button and you get a Quick Look box of the output. This isn't that useful for just a string, but can be useful for more visual output like colors and views. Click on the square button, and a box will be added in-line, under your code, showing the output of the code. This can be really useful if you want to see how the output changes as you change the code.

Summary

In this article, we learned how to run our first Swift program.

Further resources on this subject: Your First Swift App [article], Exploring Swift [article], Functions in Swift [article]
Packt
20 Feb 2018
11 min read

Introduction to Performance Testing and JMeter

In this article by Bayo Erinle, the author of the book Performance Testing with JMeter 3, we will explore some of the options that make JMeter a great tool of choice for performance testing.

Performance testing and tuning

There is a strong relationship between performance testing and tuning, in the sense that one often leads to the other. Often, end-to-end testing unveils system or application bottlenecks that are unacceptable given the project's target goals. Once those bottlenecks are discovered, the next step for most teams is a series of tuning efforts to make the application perform adequately. Such efforts normally include, but are not limited to, the following:

Configuring changes in system resources
Optimizing database queries
Reducing round trips in application calls, sometimes leading to redesigning and re-architecting problematic modules
Scaling out application and database server capacity
Reducing application resource footprint
Optimizing and refactoring code, including eliminating redundancy and reducing execution time

Tuning efforts may also commence if the application has reached acceptable performance but the team wants to reduce the amount of system resources being used, decrease the volume of hardware needed, or further increase system performance. After each change (or series of changes), the test is re-executed to see whether the performance has improved or declined due to the changes. The process continues until the performance results reach acceptable goals. These test-tuning cycles normally produce a baseline.

Baselines

Baselining is the process of capturing performance metric data for the sole purpose of evaluating the efficacy of successive changes to the system or application.
It is important that all characteristics and configurations, except those specifically being varied for comparison, remain the same in order to make effective comparisons as to which change (or series of changes) is driving results toward the targeted goal. Armed with such baseline results, subsequent changes can be made to the system configuration or application and testing results can be compared to see whether such changes were relevant or not. Some considerations when generating baselines include the following: They are application-specific They can be created for system, application, or modules They are metrics/results They should not be over generalized They evolve and may need to be redefined from time to time They act as a shared frame of reference They are reusable They help identify changes in performance Load and stress testing Load testing is the process of putting demand on a system and measuring its response, that is, determining how much volume the system can handle. Stress testing is the process of subjecting the system to unusually high loads far beyond its normal usage pattern to determine its responsiveness. These are different from performance testing, whose sole purpose is to determine the response and effectiveness of a system, that is, how fast the system is. Since load ultimately affects how a system responds, performance testing is always done in conjunction with stress testing. JMeter to the rescue One of the areas performance testing covers is testing tools. Which testing tool do you use to put the system and application under load? There are numerous testing tools available to perform this operation, from free to commercial solutions. However, our focus will be on Apache JMeter, a free, open source, cross-platform desktop application from the Apache Software foundation. JMeter has been around since 1998 according to historic change logs on its official site, making it a mature, robust, and reliable testing tool. 
Cost may also have played a role in its wide adoption. Small companies usually may not want to foot the bill for commercial testing tools, which often place restrictions, for example, on how many concurrent users one can spin up. My first encounter with JMeter was exactly a result of this. I worked in a small shop that had paid for a commercial testing tool, but during the course of testing, we outran the licensing limit on how many concurrent users we could simulate for realistic test plans. Since JMeter was free, we explored it and were quite delighted with the offerings and the sheer number of features we got for free. Here are some of its features:

Performance tests of different server types, including web (HTTP and HTTPS), SOAP, database, LDAP, JMS, mail, and native commands or shell scripts
Complete portability across various operating systems
Full multithreading framework allowing concurrent sampling by many threads and simultaneous sampling of different functions by separate thread groups
Full-featured Test IDE that allows fast Test Plan recording, building, and debugging
Dashboard Report for detailed analysis of application performance indexes and key transactions
Built-in integration with real-time reporting and analysis tools, such as Graphite, InfluxDB, and Grafana, to name a few
Complete dynamic HTML reports
Graphical User Interface (GUI)
HTTP proxy recording server
Caching and offline analysis/replaying of test results
High extensibility
Live view of results as testing is being conducted

JMeter allows multiple concurrent users to be simulated on the application, allowing you to work toward most of the target goals mentioned earlier, such as attaining a baseline and identifying bottlenecks. It will help answer questions such as the following:

Will the application still be responsive if 50 users are accessing it concurrently?
How reliable will it be under a load of 200 users?
How much of the system resources will be consumed under a load of 250 users? What will the throughput look like with 1000 users active in the system? What will be the response time for the various components in the application under load? JMeter, however, should not be confused with a browser. It doesn't perform all the operations supported by browsers; in particular, JMeter does not execute JavaScript found in HTML pages, nor does it render HTML pages the way a browser does. However, it does give you the ability to view request responses as HTML through many of its listeners, but the timings are not included in any samples. Furthermore, there are limitations to how many users can be spun on a single machine. These vary depending on the machine specifications (for example, memory, processor speed, and so on) and the test scenarios being executed. In our experience, we have mostly been able to successfully spin off 250-450 users on a single machine with a 2.2 GHz processor and 8 GB of RAM. Up and running with JMeter Now, let's get up and running with JMeter, beginning with its installation. Installation JMeter comes as a bundled archive, so it is super easy to get started with it. Those working in corporate environments behind a firewall or machines with non-admin privileges appreciate this more. To get started, grab the latest binary release by pointing your browser to http://jmeter.apache.org/download_jmeter.cgi. At the time of writing this, the current release version is 3.1. The download site offers the bundle as both a .zip file and a .tgz file. We go with the .zip file option, but feel free to download the .tgz file if that's your preferred way of grabbing archives. Once downloaded, extract the archive to a location of your choice. The location you extracted the archive to will be referred to as JMETER_HOME. Provided you have a JDK/JRE correctly installed and a JAVA_HOME environment variable set, you are all set and ready to run! 
The following screenshot shows a trimmed-down directory structure of a vanilla JMeter install:

(Figure: JMETER_HOME folder structure.)

The following are some of the folders in Apache-JMeter-3.2, as shown in the preceding screenshot:

bin: This folder contains executable scripts to run and perform other operations in JMeter
docs: This folder contains a well-documented user guide
extras: This folder contains miscellaneous items, including samples illustrating the usage of the Apache Ant build tool (http://ant.apache.org/) with JMeter, and bean shell scripting
lib: This folder contains utility JAR files needed by JMeter (you may add additional JARs here to use from within JMeter; we will cover this in detail later)
printable_docs: This is the printable documentation

Installing Java JDK

Follow these steps to install Java JDK:

Go to http://www.oracle.com/technetwork/java/javase/downloads/index.html.
Download Java JDK (not JRE) compatible with the system that you will use to test. At the time of writing, JDK 1.8 (update 131) was the latest.
Double-click on the executable and follow the onscreen instructions.

On Windows systems, the default location for the JDK is under Program Files. While there is nothing wrong with this, the folder name contains a space, which can sometimes be problematic when setting PATH and running programs that depend on the JDK, such as JMeter, from the command line. With this in mind, it is advisable to change the default location to something like C:\tools\jdk.

Setting up JAVA_HOME

Here are the steps to set up the JAVA_HOME environment variable on Windows and Unix operating systems.

On Windows

For illustrative purposes, assume that you have installed Java JDK at C:\tools\jdk:

Go to Control Panel.
Click on System.
Click on Advanced system settings.
Add an environment variable with the following settings:
    Name: JAVA_HOME
    Value: C:\tools\jdk
Locate Path (under system variables, bottom half of the screen).
Click on Edit.
Append %JAVA_HOME%\bin to the end of the existing path value (if any).

On Unix

For illustrative purposes, assume that you have installed Java JDK at /opt/tools/jdk:

Open up a Terminal window.
Run export JAVA_HOME=/opt/tools/jdk
Run export PATH=$PATH:$JAVA_HOME/bin

It is advisable to set this in your shell profile settings, such as .bash_profile (for bash users) or .zshrc (for zsh users), so that you won't have to set it for each new Terminal window you open.

Running JMeter

Once installed, the bin folder under the JMETER_HOME folder contains all the executable scripts that can be run. Based on the operating system that you installed JMeter on, you either execute the shell scripts (.sh files) on Unix/Linux-flavored operating systems, or their batch (.bat file) counterparts on Windows-flavored operating systems.

JMeter test plans are saved as XML files with a .jmx extension. We refer to them as test scripts or JMX files.

The executable scripts include the following:

jmeter.sh: This script launches JMeter GUI (the default)
jmeter-n.sh: This script launches JMeter in non-GUI mode (takes a JMX file as input)
jmeter-n-r.sh: This script launches JMeter in non-GUI mode remotely
jmeter-t.sh: This opens a JMX file in the GUI
jmeter-server.sh: This script starts JMeter in server mode (this is started on each remote node when testing with multiple machines remotely)
mirror-server.sh: This script runs the mirror server for JMeter
shutdown.sh: This script gracefully shuts down a running non-GUI instance
stoptest.sh: This script abruptly shuts down a running non-GUI instance

To start JMeter, open a Terminal shell, change to the JMETER_HOME/bin folder, and run the following command on Unix/Linux:

./jmeter.sh

Alternatively, run the following command on Windows:

jmeter.bat

Take a moment to explore the GUI. Hover over each icon to see a short description of what it does. The Apache JMeter team has done an excellent job with the GUI.
Most icons are very similar to what you are used to, which helps ease the learning curve for new adopters. Some of the icons, for example, stop and shutdown, are disabled until a scenario/test is running.

The JVM_ARGS environment variable can be used to override JVM settings in the jmeter.bat or jmeter.sh script. Consider the following example:

export JVM_ARGS="-Xms1024m -Xmx1024m -Dpropname=propvalue"

Command-line options

To see all the options available to start JMeter, run the JMeter executable with the -? option. The options provided are as follows:

./jmeter.sh -?
-? print command line options and exit
-h, --help print usage information and exit
-v, --version print the version information and exit
-p, --propfile <argument> the jmeter property file to use
-q, --addprop <argument> additional JMeter property file(s)
-t, --testfile <argument> the jmeter test(.jmx) file to run
-l, --logfile <argument> the file to log samples to
-j, --jmeterlogfile <argument> jmeter run log file (jmeter.log)
-n, --nongui run JMeter in nongui mode
...
-J, --jmeterproperty <argument>=<value> Define additional JMeter properties
-G, --globalproperty <argument>=<value> Define Global properties (sent to servers) e.g. -Gport=123 or -Gglobal.properties
-D, --systemproperty <argument>=<value> Define additional system properties
-S, --systemPropertyFile <argument> additional system property file(s)

This is a snippet (non-exhaustive list) of what you might see if you did the same.

Summary

In this article, we have learned about the relationship between performance testing and tuning, and how to install and run JMeter.

Further resources on this subject: Functional Testing with JMeter [article], Creating an Apache JMeter™ test workbench [article], Getting Started with Apache Spark DataFrames [article]
Pravin Dhandre
20 Feb 2018
6 min read

Installing and Configuring X-pack on Elasticsearch and Kibana

Note: This article is an excerpt from the book Learning Elastic Stack 6.0 by Pranav Shukla and Sharath Kumar M N. The book provides detailed coverage of the fundamentals of Elastic Stack, making it easy to search, analyze, and visualize data across different sources in real time.

In this short tutorial, we will walk step by step through the installation and configuration of X-Pack components in Elastic Stack to extend the functionality of Elasticsearch and Kibana. As X-Pack is an extension of Elastic Stack, you need to have both Elasticsearch and Kibana installed before installing X-Pack. You must run the version of X-Pack that matches the version of Elasticsearch and Kibana.

Installing X-Pack on Elasticsearch

X-Pack is installed just like any plugin to extend Elasticsearch. These are the steps to install X-Pack in Elasticsearch:

Navigate to the ES_HOME folder.
Install X-Pack using the following command:

$ES_HOME> bin/elasticsearch-plugin install x-pack

During installation, it will ask you to grant extra permissions to X-Pack, which are required by Watcher to send email alerts and also to enable Elasticsearch to launch the machine learning analytical engine. Specify y to continue the installation or N to abort the installation.
You should get the following logs/prompts during installation:

-> Downloading x-pack from elastic
[=================================================] 100%
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: plugin requires additional permissions @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.io.FilePermission \\.\pipe\* read,write
* java.lang.RuntimePermission accessClassInPackage.com.sun.activation.registries
* java.lang.RuntimePermission getClassLoader
* java.lang.RuntimePermission setContextClassLoader
* java.lang.RuntimePermission setFactory
* java.net.SocketPermission * connect,accept,resolve
* java.security.SecurityPermission createPolicy.JavaPolicy
* java.security.SecurityPermission getPolicy
* java.security.SecurityPermission putProviderProperty.BC
* java.security.SecurityPermission setPolicy
* java.util.PropertyPermission * read,write
* java.util.PropertyPermission sun.nio.ch.bugLevel write
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.
Continue with installation? [y/N]y
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: plugin forks a native controller @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
This plugin launches a native controller that is not subject to the Java security manager nor to system call filters.
Continue with installation? [y/N]y
Elasticsearch keystore is required by plugin [x-pack], creating...
-> Installed x-pack

Restart Elasticsearch:

$ES_HOME> bin/elasticsearch

Generate the passwords for the default/reserved users (elastic, kibana, and logstash_system) by executing this command:

$ES_HOME> bin/x-pack/setup-passwords interactive

You should get the following logs/prompts to enter the password for the reserved/default users:

Initiating the setup of reserved user elastic,kibana,logstash_system passwords.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y
Enter password for [elastic]: elastic
Reenter password for [elastic]: elastic
Enter password for [kibana]: kibana
Reenter password for [kibana]: kibana
Enter password for [logstash_system]: logstash
Reenter password for [logstash_system]: logstash
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [elastic]

Please make a note of the passwords set for the reserved/default users. You can choose any password of your liking. We have chosen the passwords elastic, kibana, and logstash for the elastic, kibana, and logstash_system users, respectively, and we will be using them throughout this chapter.

To verify the X-Pack installation and enforcement of security, point your web browser to http://localhost:9200/ to open Elasticsearch. You should be prompted to log in to Elasticsearch. To log in, you can use the built-in elastic user and the password elastic. Upon a successful login, you should see the following response:

{
  name: "fwDdHSI",
  cluster_name: "elasticsearch",
  cluster_uuid: "08wSPsjSQCmeRaxF4iHizw",
  version: {
    number: "6.0.0",
    build_hash: "8f0685b",
    build_date: "2017-11-10T18:41:22.859Z",
    build_snapshot: false,
    lucene_version: "7.0.1",
    minimum_wire_compatibility_version: "5.6.0",
    minimum_index_compatibility_version: "5.0.0"
  },
  tagline: "You Know, for Search"
}

A typical cluster in Elasticsearch is made up of multiple nodes, and X-Pack needs to be installed on each node belonging to the cluster. To skip the install prompt, use the --batch parameter during installation:

$ES_HOME> bin/elasticsearch-plugin install x-pack --batch

Your installation of X-Pack will have created folders named x-pack in bin, config, and plugins found under ES_HOME. We shall explore these in later sections of the chapter.

Installing X-Pack on Kibana

X-Pack is installed just like any plugin to extend Kibana.
The following are the steps to install X-Pack in Kibana:

Navigate to the KIBANA_HOME folder.
Install X-Pack using the following command:

$KIBANA_HOME> bin/kibana-plugin install x-pack

You should get the following logs/prompts during installation:

Attempting to transfer from x-pack
Attempting to transfer from https://artifacts.elastic.co/downloads/kibana-plugins/x-pack/x-pack-6.0.0.zip
Transferring 120307264 bytes....................
Transfer complete
Retrieving metadata from plugin archive
Extracting plugin archive
Extraction complete
Optimizing and caching browser bundles...
Plugin installation complete

Add the following credentials in the kibana.yml file found under $KIBANA_HOME/config and save it:

elasticsearch.username: "kibana"
elasticsearch.password: "kibana"

If you have chosen a different password for the kibana user during password setup, use that value for the elasticsearch.password property.

Start Kibana:

$KIBANA_HOME> bin/kibana

To verify the X-Pack installation, go to http://localhost:5601/ to open Kibana. You should be prompted to log in to Kibana. To log in, you can use the built-in elastic user and the password elastic.

Your installation of X-Pack will have created a folder named x-pack in the plugins folder found under KIBANA_HOME. You can also optionally install X-Pack on Logstash. However, X-Pack currently supports only monitoring of Logstash.

Uninstalling X-Pack

To uninstall X-Pack:

Stop Elasticsearch.
Remove X-Pack from Elasticsearch:
$ES_HOME> bin/elasticsearch-plugin remove x-pack
Restart Elasticsearch and stop Kibana.
Remove X-Pack from Kibana:
$KIBANA_HOME> bin/kibana-plugin remove x-pack
Restart Kibana.

Configuring X-Pack

X-Pack comes bundled with security, alerting, monitoring, reporting, machine learning, and graph capabilities. By default, all of these features are enabled. However, one might not be interested in all the features it provides.
One can selectively enable and disable the features that they are interested in from the elasticsearch.yml and kibana.yml configuration files. Elasticsearch supports the following features and settings in the elasticsearch.yml file: Kibana supports these features and settings in the kibana.yml file: If X-Pack is installed on Logstash, you can disable the monitoring by setting the xpack.monitoring.enabled property to false in the logstash.yml configuration file.   With this, we successfully explored how to install and configure the X-Pack components in order to bundle different capabilities of X-pack into one package of Elasticsearch and Kibana. If you found this tutorial useful, do check out the book Learning Elastic Stack 6.0 to examine the fundamentals of Elastic Stack in detail and start developing solutions for problems like logging, site search, app search, metrics and more.    
Packt
20 Feb 2018
17 min read

Decision Trees

In this article by David Toth, the author of the book Data Science Algorithms in a Week, we will cover the following topics:

Concepts
Analysis

Concepts

A decision tree is an arrangement of the data in a tree structure where, at each node, the data is separated into different branches according to the value of the attribute at that node.

Analysis

To construct a decision tree, we will use the standard ID3 learning algorithm, which chooses the attribute that classifies the data samples in the best possible way, maximizing the information gain, a measure based on information entropy.

Information entropy

The information entropy of the given data measures the least amount of information necessary to represent a data item from that data. The unit of information entropy is a familiar one: the bit (and, by extension, the byte, the kilobyte, and so on). The lower the information entropy, the more regular the data is and the more patterns occur in it, and thus the less information is necessary to represent it. That is why compression tools on a computer can take large text files and compress them to a much smaller size: words and word expressions keep reoccurring, forming a pattern.

Coin flipping

Imagine we flip an unbiased coin. We would like to know whether the result is head or tail. How much information do we need to represent the result? Both words, head and tail, consist of 4 characters, and if we represent one character with one byte (8 bits), as is standard in the ASCII table, then we would need 4 bytes or 32 bits to represent the result. But the information entropy is the least amount of data necessary to represent the result. We know that there are only two possible results: head or tail. If we agree to represent head with 0 and tail with 1, then 1 bit is sufficient to communicate the result efficiently. Here the data is the space of the possible results of the coin throw. It is the set {head,tail}, which can be represented as the set {0,1}.
The actual result is a data item from this set. It turns out that the entropy of the set is 1. This is because the probabilities of head and tail are both 50%.

Now imagine that the coin is biased and throws head 25% of the time and tail 75% of the time. What would be the entropy of the probability space {0,1} this time? We could certainly represent the result with 1 bit of information. But can we do better? 1 bit is of course indivisible, but maybe we could generalize the concept of information to non-integer amounts. In the previous example, we know nothing about the result of the coin flip unless we look at the coin. But in the example with the biased coin, we know that the result tail is more likely to happen. If we recorded n results of coin flips in a file, representing heads with 0 and tails with 1, then about 75% of the bits there would have the value 1 and 25% of them would have the value 0. The size of such a file would be n bits. But since it is more regular (the pattern of 1s prevails in it), a good compression tool should be able to compress it to less than n bits. To learn the theoretical bound on the compression and the amount of information necessary to represent a data item, we define information entropy precisely.

Definition of Information Entropy

Suppose that we are given a probability space S with the elements 1, 2, …, n. The probability that an element i is chosen from the probability space is pi. Then the information entropy of the probability space is defined as:

E(S) = -p1*log2(p1) - … - pn*log2(pn)

where log2 is the binary logarithm. So the information entropy of the probability space of unbiased coin throws is:

E = -0.5*log2(0.5) - 0.5*log2(0.5) = 0.5 + 0.5 = 1

When the coin is biased, with a 25% chance of a head and a 75% chance of a tail, the information entropy of such a space is:

E = -0.25*log2(0.25) - 0.75*log2(0.75) = 0.81127812445

which is less than 1.
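The definition above translates almost directly into code. The following is a minimal Python sketch; the function name entropy and the convention of skipping zero-probability outcomes are choices made for this illustration rather than anything prescribed by the text.

```python
import math


def entropy(probabilities):
    """Information entropy, in bits, of a discrete probability distribution."""
    # Outcomes with probability 0 contribute nothing, so they are skipped
    # (this also avoids calling log2 on 0).
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


fair_coin = entropy([0.5, 0.5])      # unbiased coin
biased_coin = entropy([0.25, 0.75])  # 25% head, 75% tail
print(fair_coin, biased_coin)
```

The first value is 1 bit, and the second agrees with the 0.81127812445 computed above for the biased coin.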
Thus, for example, if we had a large file with about 25% of 0 bits and 75% of 1 bits, a good compression tool should be able to compress it down to about 81.13% of its size.

Information gain

The information gain is the amount of information entropy gained as a result of a certain procedure. For example, if we would like to know the results of 3 fair coins, then the information entropy is 3. But if we could look at the 3rd coin, then the information entropy of the result for the remaining 2 coins would be 2. Thus, by looking at the 3rd coin we gained 1 bit of information, so the information gain was 1.

We may also gain information entropy by dividing the whole set S into subsets that group its elements by a similar pattern. If we group the elements by their value of an attribute A, then we define the information gain as:

IG(S, A) = E(S) - Sum over v in values(A) of [(|S_v|/|S|) * E(S_v)]

where S_v is the set of the elements of S that have the value v for the attribute A.

For example, let us calculate the information gain for the 6 rows in the swimming example, taking swimming suit as the attribute. Because we are interested in whether a given row of data is classified as no or yes for the question of whether one should swim, we will use swim preference to calculate the entropy and the information gain. We partition the set S by the attribute swimming suit:

S_none = {(none,cold,no),(none,warm,no)}
S_small = {(small,cold,no),(small,warm,no)}
S_good = {(good,cold,no),(good,warm,yes)}

The information entropy of S is:

E(S) = -(1/6)*log2(1/6) - (5/6)*log2(5/6) ~ 0.65002242164

The information entropies of the partitions are:

E(S_none) = -(2/2)*log2(2/2) = -log2(1) = 0, since all instances have the class no.
E(S_small) = 0 for a similar reason.
E(S_good) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1

Therefore the information gain is:

IG(S, swimming suit) = E(S) - [(2/6)*E(S_none) + (2/6)*E(S_small) + (2/6)*E(S_good)] = 0.65002242164 - (1/3) = 0.3166890883

If we chose the attribute water temperature to partition the set S, what would be the information gain IG(S, water temperature)?
The water temperature partitions the set S into the following sets:

S_cold = {(none,cold,no),(small,cold,no),(good,cold,no)}
S_warm = {(none,warm,no),(small,warm,no),(good,warm,yes)}

Their entropies are:

E(S_cold) = 0, as all instances are classified as no.
E(S_warm) = -(2/3)*log2(2/3) - (1/3)*log2(1/3) ~ 0.91829583405

Therefore the information gain is:

IG(S, water temperature) = E(S) - [(3/6)*E(S_cold) + (3/6)*E(S_warm)] = 0.65002242164 - 0.5*0.91829583405 = 0.19087450461

which is less than IG(S, swimming suit). Therefore, we can gain more information about the set S (the classification of its instances) by partitioning it by the attribute swimming suit instead of the attribute water temperature. This finding is the basis of the ID3 algorithm, which constructs a decision tree in the next section.

ID3 algorithm

The ID3 algorithm constructs a decision tree from the data based on the information gain. In the beginning, we start with the set S. The data items in the set S have various properties according to which we can partition the set S. If an attribute A has the values {v1, …, vn}, then we partition the set S into the sets S_v1, …, S_vn, where the set S_vi is the subset of S whose elements have the value vi for the attribute A.

If each element in the set S has the attributes A1, …, Am, then we can partition the set S according to any of these attributes. The ID3 algorithm partitions the set S according to the attribute that yields the highest information gain. Now suppose that this attribute is A1. Then for the set S we have the partitions S_v1, …, S_vn, where A1 has the possible values {v1, …, vn}.

Since we have not constructed any tree yet, we first place a root node into the tree. For every partition of S, we place a new branch from the root. Every branch represents one value of the selected attribute. A branch has the data samples with the same value for that attribute. For every new branch, we can define a new node that will have the data samples from its ancestor branch.
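Returning to the swimming example, the information gains that drive the choice of attribute can be checked with a short sketch. The helper names (entropy_of_labels, information_gain) and the tuple layout (swimming suit, water temperature, swim preference) are assumptions made for this illustration; the sketch is separate from the article's decision_tree.py.

```python
import math
from collections import Counter, defaultdict

# The six samples: (swimming suit, water temperature, swim preference).
S = [
    ("none", "cold", "no"), ("small", "cold", "no"), ("good", "cold", "no"),
    ("none", "warm", "no"), ("small", "warm", "no"), ("good", "warm", "yes"),
]

def entropy_of_labels(rows, class_col=2):
    """Entropy of the class column over the given rows."""
    counts = Counter(row[class_col] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, attr_col, class_col=2):
    """E(S) minus the size-weighted entropies of the partitions by attr_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr_col]].append(row)
    remainder = sum(
        len(group) / len(rows) * entropy_of_labels(group, class_col)
        for group in groups.values()
    )
    return entropy_of_labels(rows, class_col) - remainder

gain_suit = information_gain(S, 0)          # partition by swimming suit
gain_temperature = information_gain(S, 1)   # partition by water temperature
print(round(gain_suit, 4), round(gain_temperature, 4))   # 0.3167 0.1909
```

Swimming suit has the larger gain, so it is the attribute an ID3-style split would pick first for this data.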
Once we have defined a new node, we choose, from the remaining attributes, the one with the highest information gain for the data at that node, partition the data at that node further, and then define new branches and nodes. This process can be repeated until we run out of attributes for the nodes, or even earlier, until all the data at the node has the same class of our interest. In the swimming example, there are only two possible classes for the swimming preference: class no and class yes. The last node is called a leaf node and decides the class of a data item from the data.

Tree construction by ID3 algorithm

Here we describe, step by step, how the ID3 algorithm would construct a decision tree from the given data samples in the swimming example. The initial set consists of 6 data samples:

S = {(none,cold,no),(small,cold,no),(good,cold,no),(none,warm,no),(small,warm,no),(good,warm,yes)}

In the previous sections, we calculated the information gains for the only two non-classifying attributes, swimming suit and water temperature:

IG(S, swimming suit) = 0.3166890883
IG(S, water temperature) = 0.19087450461

Hence we would choose the attribute swimming suit, as it has the higher information gain. There is no tree drawn yet, so we start from the root node. As the attribute swimming suit has 3 possible values {none, small, good}, we draw 3 branches out of the root, one for each value. Each branch will have one partition from the partitioned set S: S_none, S_small, S_good. We add nodes to the ends of the branches. The data samples in S_none all have the same class, swimming preference = no, so we do not need to branch that node by a further attribute and partition the set. Thus the node with the data S_none is already a leaf node. The same is true for the node with the data S_small.

But the node with the data S_good has two possible classes for the swimming preference. Therefore, we will branch the node further. There is only one non-classifying attribute left: water temperature.
So there is no need to calculate the information gain for that attribute with the data S_good. From the node S_good we will have 2 branches, each with a partition of the set S_good. One branch will have the set of data samples S_good,cold = {(good,cold,no)}; the other branch will have the partition S_good,warm = {(good,warm,yes)}. Each of these 2 branches will end with a node. Each node will be a leaf node, because each node has data samples with the same value for the classifying attribute swimming preference.

The resulting decision tree has 4 leaf nodes and is the tree shown in the figure "Decision tree for the swimming preference example".

Deciding with a decision tree

Once we have constructed a decision tree from the data with the attributes A1, …, Am and the classes {c1, …, ck}, we can use this decision tree to classify a new data item with the attributes A1, …, Am into one of the classes {c1, …, ck}.

Given a new data item that we would like to classify, we can think of each node, including the root, as a question for the data sample: what value does that data sample have for the selected attribute Ai? Then, based on the answer, we select the branch of the decision tree and move further to the next node. Then another question is answered about the data sample, and another, until the data sample reaches a leaf node. A leaf node has one of the classes {c1, …, ck} associated with it, for example ci. The decision tree algorithm would then classify the data sample into the class ci.

Deciding a data sample with the swimming preference decision tree

Let us take the decision tree constructed for the swimming preference example with the ID3 algorithm. Consider a data sample (good,cold,?) that we would like to classify using the constructed decision tree.

Start with the data sample at the root of the tree. The first attribute that branches from the root is swimming suit, so we ask for the value of the attribute swimming suit of the sample (good,cold,?).
We learn that the value of the attribute is swimming suit=good, therefore we move down the rightmost branch, the one with that value for its data samples. We arrive at the node with the attribute water temperature and ask the question: what is the value of the attribute water temperature for the data sample (good,cold,?)? We learn that for that data sample we have water temperature=cold, therefore we move down the left branch into the leaf node. This leaf is associated with the class swimming preference=no. Therefore, the decision tree would classify the data sample (good,cold,?) into that class, that is, complete it to the data sample (good,cold,no).

In other words, the decision tree says that if one has a good swimming suit but the water temperature is cold, then one would still not want to swim, based on the data collected in the table.

Implementation

decision_tree.py

    import math
    import sys
    # The anytree module is used to visualize the decision tree
    # constructed by this ID3 algorithm.
    from anytree import Node, RenderTree
    import common

    # Node for the construction of a decision tree.
    class TreeNode:
        def __init__(self, var=None, val=None):
            self.children = []
            self.var = var
            self.val = val

        def add_child(self, child):
            self.children.append(child)

        def get_children(self):
            return self.children

        def get_var(self):
            return self.var

        def is_root(self):
            return self.var is None and self.val is None

        def is_leaf(self):
            return len(self.children) == 0

        def name(self):
            if self.is_root():
                return "[root]"
            return "[" + self.var + "=" + self.val + "]"

    # Constructs a decision tree where heading is the heading of the table
    # with the data, i.e. the names of the attributes.
    # complete_data are data samples with a known value for every attribute.
    # enquired_column is the index of the column (starting from zero) which
    # holds the classifying attribute.
    def constuct_decision_tree(heading, complete_data, enquired_column):
        available_columns = []
        for col in range(0, len(heading)):
            if col != enquired_column:
                available_columns.append(col)
        tree = TreeNode()
        add_children_to_node(tree, heading, complete_data,
                             available_columns, enquired_column)
        return tree

    # Splits the data samples into the groups with each having a different
    # value for the attribute at the column col.
    def split_data_by_col(data, col):
        data_groups = {}
        for data_item in data:
            if data_groups.get(data_item[col]) is None:
                data_groups[data_item[col]] = []
            data_groups[data_item[col]].append(data_item)
        return data_groups

    # Adds a leaf node to node.
    def add_leaf(node, heading, complete_data, enquired_column):
        node.add_child(TreeNode(heading[enquired_column],
                                complete_data[0][enquired_column]))

    # Adds all the descendants to the node.
    def add_children_to_node(node, heading, complete_data,
                             available_columns, enquired_column):
        if len(available_columns) == 0:
            add_leaf(node, heading, complete_data, enquired_column)
            return -1
        selected_col = select_col(complete_data, available_columns,
                                  enquired_column)
        for i in range(0, len(available_columns)):
            if available_columns[i] == selected_col:
                available_columns.pop(i)
                break
        data_groups = split_data_by_col(complete_data, selected_col)
        if len(data_groups.items()) == 1:
            add_leaf(node, heading, complete_data, enquired_column)
            return -1
        for child_group, child_data in data_groups.items():
            child = TreeNode(heading[selected_col], child_group)
            add_children_to_node(child, heading, child_data,
                                 list(available_columns), enquired_column)
            node.add_child(child)

    # Selects an available column/attribute with the highest information gain.
    def select_col(complete_data, available_columns, enquired_column):
        selected_col = -1
        selected_col_information_gain = -1
        for col in available_columns:
            current_information_gain = col_information_gain(
                complete_data, col, enquired_column)
            if current_information_gain > selected_col_information_gain:
                selected_col = col
                selected_col_information_gain = current_information_gain
        return selected_col

    # Calculates the information gain when partitioning complete_data
    # according to the attribute at the column col and classifying by the
    # attribute at enquired_column.
    def col_information_gain(complete_data, col, enquired_column):
        data_groups = split_data_by_col(complete_data, col)
        information_gain = entropy(complete_data, enquired_column)
        for _, data_group in data_groups.items():
            information_gain -= (float(len(data_group)) / len(complete_data)
                                 ) * entropy(data_group, enquired_column)
        return information_gain

    # Calculates the entropy of the data classified by the attribute at the
    # enquired_column.
    def entropy(data, enquired_column):
        value_counts = {}
        for data_item in data:
            if value_counts.get(data_item[enquired_column]) is None:
                value_counts[data_item[enquired_column]] = 0
            value_counts[data_item[enquired_column]] += 1
        entropy = 0
        for _, count in value_counts.items():
            probability = float(count) / len(data)
            entropy -= probability * math.log(probability, 2)
        return entropy

    # A visual output of a tree using the text characters.
    def display_tree(tree):
        anytree = convert_tree_to_anytree(tree)
        for pre, fill, node in RenderTree(anytree):
            # Written for Python 2; under Python 3, drop this encode call
            # so that pre stays a str.
            pre = pre.encode(encoding='UTF-8', errors='strict')
            print("%s%s" % (pre, node.name))

    # A simple textual output of a tree without the visualization.
    def display_tree_simple(tree):
        print('***Tree structure***')
        display_node(tree)
        sys.stdout.flush()

    # A simple textual output of a node in a tree.
    def display_node(node):
        if node.is_leaf():
            print('The node ' + node.name() + ' is a leaf node.')
            return
        sys.stdout.write('The node ' + node.name() + ' has children: ')
        for child in node.get_children():
            sys.stdout.write(child.name() + ' ')
        print('')
        for child in node.get_children():
            display_node(child)

    # Convert a decision tree into the anytree module tree format to make it
    # ready for rendering.
    def convert_tree_to_anytree(tree):
        anytree = Node("Root")
        attach_children(tree, anytree)
        return anytree

    # Attach the children from the decision tree into the anytree tree format.
    def attach_children(parent_node, parent_anytree_node):
        for child_node in parent_node.get_children():
            child_anytree_node = Node(child_node.name(),
                                      parent=parent_anytree_node)
            attach_children(child_node, child_anytree_node)

    ### PROGRAM START ###
    if len(sys.argv) < 2:
        sys.exit('Please, input as an argument the name of the CSV file.')

    csv_file_name = sys.argv[1]
    (heading, complete_data, incomplete_data,
     enquired_column) = common.csv_file_to_ordered_data(csv_file_name)
    tree = constuct_decision_tree(heading, complete_data, enquired_column)
    display_tree(tree)

common.py

    import csv

    # Reads the csv file into the table and then separates the table into
    # heading, complete data, incomplete data and then produces also the
    # index number for the column that is not complete, i.e. contains a
    # question mark.
    def csv_file_to_ordered_data(csv_file_name):
        # 'rb' is the Python 2 convention for the csv module; under
        # Python 3 use open(csv_file_name, newline='') instead.
        with open(csv_file_name, 'rb') as f:
            reader = csv.reader(f)
            data = list(reader)
        return order_csv_data(data)

    def order_csv_data(csv_data):
        # The first row in the CSV file is the heading of the data table.
        heading = csv_data.pop(0)
        complete_data = []
        incomplete_data = []
        # Let enquired_column be the column of the variable whose
        # conditional probability should be calculated. Here set that
        # column to be the last one.
        enquired_column = len(heading) - 1
        # Divide the data into the complete and the incomplete data. An
        # incomplete row is the one that has a question mark in the
        # enquired_column. The question mark will be replaced by the
        # calculated Bayesian probabilities from the complete data.
        for data_item in csv_data:
            if is_complete(data_item, enquired_column):
                complete_data.append(data_item)
            else:
                incomplete_data.append(data_item)
        return (heading, complete_data, incomplete_data, enquired_column)

    # is_complete is called above but is not listed in the article; this
    # definition is an assumption based on the comment describing
    # incomplete rows.
    def is_complete(data_item, enquired_column):
        return data_item[enquired_column] != '?'

Program input

swim.csv

    swimming_suit,water_temperature,swim
    None,Cold,No
    None,Warm,No
    Small,Cold,No
    Small,Warm,No
    Good,Cold,No
    Good,Warm,Yes

Program output

    $ python decision_tree.py swim.csv
    Root
    ├── [swimming_suit=Small]
    │   ├── [water_temperature=Cold]
    │   │   └── [swim=No]
    │   └── [water_temperature=Warm]
    │       └── [swim=No]
    ├── [swimming_suit=None]
    │   ├── [water_temperature=Cold]
    │   │   └── [swim=No]
    │   └── [water_temperature=Warm]
    │       └── [swim=No]
    └── [swimming_suit=Good]
        ├── [water_temperature=Cold]
        │   └── [swim=No]
        └── [water_temperature=Warm]
            └── [swim=Yes]

Summary

In this article, we have learned the concept of a decision tree, its analysis using the ID3 algorithm, and its implementation.

Resources for Article: Further resources on this subject: Working with Data – Exploratory Data Analysis [article] Introduction to Data Analysis and Libraries [article] Data Analysis Using R [article]
Packt
20 Feb 2018
6 min read

Consuming Diagnostic Analyzers in .NET projects

We know how to write diagnostic analyzers to analyze and report issues about .NET source code and contribute them to the .NET developer community. In this article by Manish Vasani, author of the book Roslyn Cookbook, we will show you how to search for, install, view, and configure the analyzers that have already been published by various analyzer authors on NuGet and the VS Extension gallery. We will cover the following recipes:

  • Searching and installing analyzers through the NuGet package manager
  • Searching and installing VSIX analyzers through the VS extension gallery
  • Viewing and configuring analyzers in solution explorer in Visual Studio
  • Using ruleset file and ruleset editor to configure analyzers

Diagnostic analyzers are extensions to the Roslyn C# compiler and the Visual Studio IDE that analyze user code and report diagnostics. Users will see these diagnostics in the error list after building the project from Visual Studio, and also when building the project on the command line. They will also see the diagnostics live while editing the source code in the Visual Studio IDE. Analyzers can report diagnostics to enforce specific code styles, improve code quality and maintenance, recommend design guidelines, or even report very domain-specific issues that cannot be covered by the core compiler.

Analyzers can be installed to a .NET project either as a NuGet package or as a VSIX. The two packaging schemes differ in the analyzer experience they provide, as the two recipes below show. Analyzers are supported on various flavors of .NET Standard, .NET Core, and .NET Framework projects, for example, class library, console app, and so on.
Searching and installing analyzers through the NuGet package manager

In this recipe, we will show you how to search for and install analyzer NuGet packages in the NuGet package manager in Visual Studio, and see how the analyzer diagnostics from an installed NuGet package light up in the project build and as live diagnostics during code editing in Visual Studio.

Getting ready

You will need to have Visual Studio 2017 installed on your machine to follow this recipe. You can install a free community version of Visual Studio 2017 from https://www.visualstudio.com/thank-you-downloading-visual-studio/?sku=Community&rel=15.

How to do it…

1. Create a C# class library project, say ClassLibrary, in Visual Studio 2017.
2. In solution explorer, right-click on the solution or project node and execute the Manage NuGet Packages command. This brings up the NuGet Package Manager, which can be used to search for and install NuGet packages to the solution or project.
3. In the search bar, type the following text to find NuGet packages tagged as analyzers: Tags:"analyzers". Note that some of the well-known packages are tagged as analyzer, so you may also want to search for Tags:"analyzer".
4. Check or uncheck the Include prerelease checkbox to the right of the search bar to show or hide the prerelease analyzer packages respectively. The packages are listed based on the number of downloads, with the most downloaded package at the top.
5. Select a package to install, say System.Runtime.Analyzers, pick a specific version, say 1.1.0, and click Install.
6. Click on the I Accept button on the License Acceptance dialog to install the NuGet package.
7. Verify that the installed analyzer(s) show up under the Analyzers node in the solution explorer.
8. Verify that the project file has a new ItemGroup with the following analyzer references from the installed analyzer package:

    <ItemGroup>
      <Analyzer Include="..\packages\System.Runtime.Analyzers.1.1.0\analyzers\dotnet\cs\System.Runtime.Analyzers.dll" />
      <Analyzer Include="..\packages\System.Runtime.Analyzers.1.1.0\analyzers\dotnet\cs\System.Runtime.CSharp.Analyzers.dll" />
    </ItemGroup>

9. Add the following code to your C# project:

    namespace ClassLibrary
    {
        public class MyAttribute : System.Attribute
        {
        }
    }

10. Verify that the analyzer diagnostic from the installed analyzer is shown in the error list.
11. Open a Visual Studio 2017 Developer Command Prompt and build the project to verify that the analyzer is executed in the command line build and that the analyzer diagnostic is reported.
12. Create a new C# project in VS2017, add the same code to it as in step 9, and verify that no analyzer diagnostic shows up in the error list or on the command line, confirming that the analyzer package was only installed to the project selected in steps 1-6.

Note that CA1018 (Custom attribute should have AttributeUsage defined) has been moved to a separate analyzer assembly in later versions of the FxCop/System.Runtime.Analyzers package. It is recommended that you install the Microsoft.CodeAnalysis.FxCopAnalyzers NuGet package to get the latest group of Microsoft recommended analyzers.

Searching and installing VSIX analyzers through the VS extension gallery

In this recipe, we will show you how to search for and install analyzer VSIX packages in the Visual Studio Extension manager, and see how the analyzer diagnostics from an installed VSIX light up as live diagnostics during code editing in Visual Studio.

Getting ready

You will need to have Visual Studio 2017 installed on your machine to follow this recipe. You can install a free community version of Visual Studio 2017 from https://www.visualstudio.com/thank-you-downloading-visual-studio/?sku=Community&rel=15.

How to do it…

1. Create a C# class library project, say ClassLibrary, in Visual Studio 2017.
2. From the top-level menu, execute Tools | Extensions and Updates.
3. Navigate to Online | Visual Studio Marketplace on the left tab of the dialog to view the available VSIXes in the Visual Studio extension gallery/marketplace.
4. Search for analyzers in the search text box in the upper right corner of the dialog and download an analyzer VSIX, say Refactoring Essentials for Visual Studio.
5. Once the download completes, you will get a message at the bottom of the dialog that the install will be scheduled to execute once Visual Studio and related windows are closed.
6. Close the dialog and then close the Visual Studio instance to start the install.
7. In the VSIX Installer dialog, click Modify to start the installation.
8. The subsequent message prompts you to kill all the active Visual Studio and satellite processes. Save all your relevant work in all the open Visual Studio instances, and click End Tasks to kill these processes and install the VSIX.
9. After installation, restart VS, click Tools | Extensions and Updates, and verify that the Refactoring Essentials VSIX is installed.
10. Create a new C# project with the following source code and verify that the analyzer diagnostic RECS0085 (Redundant array creation expression) appears in the error list:

    namespace ClassLibrary
    {
        public class Class1
        {
            void Method()
            {
                int[] values = new int[] { 1, 2, 3 };
            }
        }
    }

11. Build the project from Visual Studio 2017 or the command line and confirm that no analyzer diagnostic shows up in the Output Window or on the command line respectively, confirming that the VSIX analyzer did not execute as part of the build.

Resources for Article: Further resources on this subject: C++, SFML, Visual Studio, and Starting the first game [article] Connecting to Microsoft SQL Server Compact 3.5 with Visual Studio [article] Creating efficient reports with Visual Studio [article]
Packt
20 Feb 2018
17 min read

What makes Hadoop so revolutionary?

In this article, Sourav Gulati and Sumit Kumar, authors of the book Apache Spark 2.x for Java Developers, explain that Hadoop, in the classical sense, comprises two components: a storage layer called HDFS and a processing layer called MapReduce. Prior to Hadoop 2.x, resource management was done by the MapReduce framework of Hadoop itself; that changed with the introduction of YARN. In Hadoop 2.0, YARN was introduced as the third component of Hadoop to manage the resources of the Hadoop cluster and make it more MapReduce agnostic.

HDFS

The Hadoop Distributed File System (HDFS), as the name suggests, is a distributed file system written in Java and modeled on the Google File System. In practice, HDFS closely resembles any other UNIX file system, with support for common file operations such as ls, cp, rm, du, cat, and so on. What makes HDFS stand out despite its simplicity, however, is its mechanism for handling node failure in a Hadoop cluster without materially affecting the seek time for accessing stored files. An HDFS cluster consists of two major components: DataNodes and the NameNode.

HDFS has a unique way of storing data on HDFS clusters (networked, cheap commodity computers). It splits a regular file into smaller chunks called blocks and then makes a number of copies of each chunk that depends on the replication factor for that file. It then copies those chunks to different DataNodes of the cluster.

NameNode

The NameNode is responsible for managing the metadata of the HDFS cluster, such as the list of files and folders that exist in the cluster, the number of splits each file is divided into, and their replication and storage at different DataNodes. It also maintains and manages the namespace and the file permissions of all the files available in the HDFS cluster.
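As a rough illustration of the block splitting, replication, and block-to-DataNode bookkeeping described above, the following toy sketch splits a file of a given size into blocks and assigns each block's replicas to DataNodes. It is not HDFS code: the function name plan_blocks, the node names, and the round-robin placement are assumptions made for this example (real HDFS placement is rack-aware), while the 128 MB block size and the replication factor of 3 are the usual Hadoop 2.x defaults.

```python
import math

def plan_blocks(file_size, data_nodes, block_size=128 * 1024 * 1024, replication=3):
    """Return a NameNode-style mapping: block index -> DataNodes holding a replica."""
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    block_count = math.ceil(file_size / block_size)
    placement = {}
    for block in range(block_count):
        # Toy round-robin placement: consecutive nodes, so the replicas of
        # one block always land on distinct DataNodes.
        placement[block] = [
            data_nodes[(block + r) % len(data_nodes)] for r in range(replication)
        ]
    return placement

nodes = ["dn1", "dn2", "dn3", "dn4"]   # hypothetical DataNode names
placement = plan_blocks(300 * 1024 * 1024, nodes)   # a 300 MB file -> 3 blocks
for block, replicas in placement.items():
    print(block, replicas)
```

With every block held on three different nodes, losing any single DataNode still leaves two replicas of each block available.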
Apart from bookkeeping, the NameNode also has a supervisory role: it keeps a watch on the replication factor of all the files and, if some block goes missing, issues commands to replicate the missing block of data. It also generates reports to ascertain cluster health. It is important to note that all communication for the supervisory tasks is initiated from the DataNode to the NameNode; that is, a DataNode sends reports (also known as block reports) to the NameNode, and the NameNode then responds to them by issuing different commands or instructions as the need may be.

HDFS I/O

An HDFS read operation from a client involves the following steps:

1. The client asks the NameNode where the actual data blocks are stored for a given file.
2. The NameNode obliges by providing the block IDs and the locations of the hosts (DataNodes) where the data can be found.
3. The client contacts the DataNodes with the respective block IDs to fetch the data, preserving the order of the blocks.

An HDFS write operation from a client involves the following steps:

1. The client contacts the NameNode to update the namespace with the file name and to verify the necessary permissions.
2. If the file already exists, the NameNode throws an error; otherwise, it returns to the client an FSDataOutputStream, which points to the data queue.
3. The data queue negotiates with the NameNode to allocate new blocks on suitable DataNodes.
4. The data is then copied to that DataNode and, as per the replication strategy, further copied from that DataNode to the rest of the DataNodes.

It is important to note that the data is never moved through the NameNode, as that would cause a performance bottleneck.

YARN

The simplest way to understand YARN (Yet Another Resource Negotiator) is to think of it as an operating system on a cluster: it provisions resources, schedules jobs, and handles node maintenance. With Hadoop 2.x, the MapReduce model of processing the data and managing the cluster (job tracker/task tracker) was divided.
While data processing was still left to MapReduce, the cluster's resource allocation (or rather, scheduling) task was assigned to a new component called YARN. Another objective that YARN met was to make MapReduce one of several techniques for processing data, rather than the only technology for processing data on HDFS, as was the case in Hadoop 1.x systems. This paradigm shift opened the floodgates for the development of interesting applications around Hadoop, and a new ecosystem evolved that was no longer limited to the classical MapReduce processing system. It did not take long after that for Apache Spark to break the hegemony of classical MapReduce and become arguably the most popular processing framework for parallel computing, as far as active development and adoption are concerned.

To provide multi-tenancy, fault tolerance, and resource isolation, YARN introduced the following components to manage the cluster seamlessly.

ResourceManager: The ResourceManager (RM) negotiates resources for the different compute programs on a Hadoop cluster while guaranteeing the following: resource isolation, data locality, fault tolerance, task prioritization, and effective cluster capacity utilization. A configurable scheduler gives the ResourceManager the flexibility to schedule and prioritize different applications as needed.

Tasks served by the RM while serving clients: Using a client or the APIs, a user can submit or terminate an application. The user can also gather statistics on submitted applications, the cluster, and queue information. The RM also gives admin tasks priority over all other tasks, so that clean-up or maintenance activities on a cluster, such as refreshing the node list or the queue configuration, can be performed.

Tasks served by the RM while serving cluster nodes: Provisioning and de-provisioning of nodes forms an important task of the RM. Each node sends a heartbeat at a configured interval, and a node whose heartbeat is not received within the expiry interval (10 minutes by default) is treated as a dead node.
As a clean-up activity, all the processes supposedly running on that node, including containers, are marked dead too.

Tasks served by the RM while serving the ApplicationMaster: The RM registers new ApplicationMasters (AMs) and terminates those that have executed successfully. Just as with cluster nodes, if the heartbeat of an AM is not received within a preconfigured duration (10 minutes by default), the AM is marked dead, and all its associated containers are marked dead too. But since YARN is reliable as far as application execution is concerned, a new AM is scheduled to retry the execution in a new container, until the configured maximum number of attempts (yarn.resourcemanager.am.max-attempts, 2 by default) is reached.

Scheduling and other miscellaneous tasks served by the RM: The RM maintains a list of running, submitted, and executed applications along with their statistics, such as execution time, status, and so on. The privileges of the user as well as of the applications are maintained and compared while serving the various requests of the user throughout the application life cycle. The RM scheduler oversees resource allocation for applications, such as memory allocation. Two common scheduling algorithms used in YARN are fair scheduling and capacity scheduling.

NodeManager: A NodeManager (NM) exists on each node of the cluster, in a fashion somewhat similar to the slave nodes in a master-slave architecture. When an NM starts, it informs the RM that it is available to share its resources for upcoming jobs. From then on, the NM sends a periodic signal, also called a heartbeat, to the RM, informing it that the node is alive in the cluster. Primarily, the NM is responsible for launching the containers that have been requested by an AM with certain resource requirements, such as memory, disk, and so on. Once the containers are up and running, the NM keeps a watch not on the status of the container's task but on the resource utilization of the container, and kills the container if it starts utilizing more resources than it has been provisioned for.
Apart from managing the life cycle of the containers, the NM also keeps the RM informed about the node's health.

ApplicationMaster: An AM is launched per submitted application and manages the life cycle of that application. The first and foremost task of the AM, however, is to negotiate resources from the RM in order to launch task-specific containers on different nodes. Once the containers are launched, the AM keeps track of the task status of all the containers. If a node goes down, or a container gets killed because it used excess resources or for some other reason, the AM renegotiates resources from the RM and launches the pending tasks again. The AM also reports the status of the submitted application directly to the user, and other such statistics to the RM. The ApplicationMaster implementation is framework specific; for this reason, the application- or framework-specific code is transferred to the AM, and it is the AM that distributes it further across the cluster. This important feature also makes YARN technology agnostic, as any framework can implement its own ApplicationMaster and then utilize the resources of the YARN cluster seamlessly.

Container: A container, in an abstract sense, is a set of minimal resources, such as CPU, RAM, disk I/O, disk space, and so on, that are required to run a task independently on a node. The first container after a job is submitted is launched by the RM to host the ApplicationMaster. It is the AM that then negotiates resources from the RM in the form of containers, which are hosted on different nodes across the Hadoop cluster.

Process flow of application submission in YARN:

Step 1: Using a client or the APIs, the user submits the application, let's say a Spark job JAR. The ResourceManager, whose primary task is to gather and report all the applications running on the entire Hadoop cluster and the available resources on the respective Hadoop nodes, accepts the newly submitted task, depending on the privileges of the user submitting the job.

Step 2: After this, the RM delegates the task to the scheduler.
The scheduler then searches for a container that can host the application-specific Application Master. While the scheduler does take into consideration parameters such as availability of resources, task priority, and data locality before scheduling or launching an Application Master, it has no role in monitoring or restarting a failed job. It is the responsibility of RM to keep track of each AM and restart it in a new container whenever it fails.

Step 3: Once the Application Master is launched, it becomes the prerogative of the AM to oversee the resource negotiation with RM for launching task-specific containers. Negotiations with RM are typically over:
   The priority of the tasks at hand.
   The number of containers to be launched to complete the tasks.
   The resources needed to execute the tasks, that is, RAM and CPU (virtual cores, supported since Hadoop 2.x).
   The available nodes where job containers can be launched with the required resources.
Depending on the priority and the availability of resources, RM grants containers, each represented by a container ID and the hostname of the node on which it can be launched.

Step 4: The AM then requests the NMs of the respective hosts to launch the containers with the specific IDs and resource configuration. The NM launches the containers and keeps a watch on the resource usage of each task. If, for example, a container starts utilizing more resources than it has been provisioned for, the NM kills that container. This greatly improves the job isolation and fair sharing of resources that YARN guarantees, as the runaway container would otherwise have impacted the execution of other containers. However, it is important to note that the job status and the application status as a whole are managed by the AM. It falls in the domain of the AM to continuously monitor for delayed or dead containers, while simultaneously negotiating with RM to launch new containers to which the tasks of dead containers can be reassigned.
Step 5: The containers executing on different nodes send application-specific statistics to the AM at specific intervals.

Step 6: The AM also reports the status of the application directly to the client that submitted it, in our case a Spark job.

Step 7: The NM monitors the resources being utilized by all the containers on its node and keeps sending periodic updates to RM.

Step 8: The AM sends periodic statistics such as application status, task failures, and log information to RM.

Overview Of MapReduce

Before delving deep into the MapReduce implementation in Hadoop, let's first understand MapReduce as a concept in parallel computing and why it is a preferred way of computing. MapReduce comprises two mutually exclusive but dependent phases, each capable of running on a different machine or node:

Map: In the Map phase, transformation of data takes place. It splits the data into key-value pairs by splitting it on a keyword. Suppose we have a text file and we want to do an analysis such as counting the total number of words, or even the frequency with which each word occurs in the file. This is the classical Word Count problem of MapReduce. To address this problem, we first have to identify the splitting keyword so that the data can be split and converted into key-value pairs. Let's begin with John Lennon's song Imagine.
Sample Text:
Imagine there's no heaven
It's easy if you try
No hell below us
Above us only sky
Imagine all the people living for today

After running the Map phase on the sample text and splitting it on <space>, it gets converted to key-value pairs as follows:

[<imagine, 1> <there's, 1> <no, 1> <heaven, 1> <it's, 1> <easy, 1> <if, 1> <you, 1> <try, 1> <no, 1> <hell, 1> <below, 1> <us, 1> <above, 1> <us, 1> <only, 1> <sky, 1> <imagine, 1> <all, 1> <the, 1> <people, 1> <living, 1> <for, 1> <today, 1>]

The key here represents the word and the value represents the count. Note that we have converted all the keys to lowercase to avoid the complexity of matching case-sensitive keys.

Reduce: The Reduce phase deals with aggregation of the Map phase result, so all the key-value pairs are aggregated by key. The Map output of the text would get aggregated as follows:

[<imagine, 2> <there's, 1> <no, 2> <heaven, 1> <it's, 1> <easy, 1> <if, 1> <you, 1> <try, 1> <hell, 1> <below, 1> <us, 2> <above, 1> <only, 1> <sky, 1> <all, 1> <the, 1> <people, 1> <living, 1> <for, 1> <today, 1>]

As we can see, the Map and Reduce phases can run separately and hence can use independent nodes in the cluster to process the data. This approach of separating work into smaller units called Map and Reduce has revolutionized general-purpose distributed and parallel computing, and is what we now know as MapReduce.

Apache Hadoop's MapReduce has been implemented pretty much as discussed, except for extra features governing how the data from the Map phase of each node gets transferred to its designated Reduce phase node. Hadoop's implementation of MapReduce enriches the Map and Reduce phases by adding a few more concrete steps in between to make the job fault tolerant and truly distributed. We can describe MR jobs on YARN in five stages.

Job Submission Stage: When a client submits an MR job, the following things happen:
RM is requested for an application ID.
The input data location is checked and, if it is present, the file split size is computed.
The job's output location is checked as well; it must not already exist.
If all three conditions are met, the MR job JAR, along with its configuration and the details of the input splits, is copied to HDFS in a directory named after the application ID provided by RM. The job is then submitted to RM to launch a job-specific Application Master, MRAppMaster.

MAP Stage: Once RM receives the client's request for launching MRAppMaster, a call is made to the YARN scheduler to assign a container. The container is granted as per resource availability, and MRAppMaster is launched on the designated node with the provisioned resources. After this, MRAppMaster fetches the input split information from the HDFS path that was submitted by the client and computes the number of Mapper tasks to launch based on the splits. It also determines the required number of Reducers as per the configuration. If MRAppMaster finds the number of Mappers, the number of Reducers, and the size of the input files small enough to run in the same JVM, it goes ahead and does so; such tasks are called Uber tasks. In other scenarios, MRAppMaster negotiates container resources from RM for running these tasks, with Mapper tasks requested first and at a higher priority. This is because Mapper tasks must finish before the sorting phase can start. Data locality is another concern for containers hosting Mappers: data-local nodes are preferred over rack-local nodes, with the least preference given to data hosted on a remote node. For the Reduce phase, no such data locality preference exists for containers. Containers hosting a Mapper function first copy the MapReduce JAR and configuration files locally and then launch the class YarnChild in the JVM. The Mapper then starts reading the input files, processes them into key-value pairs, and writes the pairs to a circular buffer.
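The decisions MRAppMaster makes in the MAP stage can be sketched in a few lines of Python. This is only an illustration of the reasoning described above, not Hadoop's code: the function names, the threshold parameters (max_maps, max_reduces, max_bytes), and the rack_of mapping are all hypothetical, and the split count uses plain ceiling division per file, which is a simplification.

```python
import math

# Preference order described in the text: a data-local node first, then a
# rack-local node, then a remote node. A lower rank is more preferred.
LOCALITY_RANK = {"node-local": 0, "rack-local": 1, "off-rack": 2}


def count_map_tasks(file_sizes, split_size):
    """One Mapper per input split (simplified: ceiling division per file)."""
    return sum(math.ceil(size / split_size) for size in file_sizes if size > 0)


def is_uber_job(num_maps, num_reduces, input_bytes,
                max_maps, max_reduces, max_bytes):
    """A job is small enough to run inside the AM's own JVM only when every
    limit is respected. The limits are caller-supplied assumptions."""
    return (num_maps <= max_maps
            and num_reduces <= max_reduces
            and input_bytes <= max_bytes)


def locality(split_hosts, candidate, rack_of):
    """Classify a candidate node relative to the hosts that store the split."""
    if candidate in split_hosts:
        return "node-local"
    if rack_of[candidate] in {rack_of[host] for host in split_hosts}:
        return "rack-local"
    return "off-rack"


def pick_host(split_hosts, candidates, rack_of):
    """Choose the candidate node with the best locality for a Mapper."""
    return min(candidates,
               key=lambda host: LOCALITY_RANK[locality(split_hosts, host, rack_of)])
```

For example, with two input files of 300 and 100 units and a split size of 128, count_map_tasks returns 4, and pick_host prefers a node on the same rack as the data when the data node itself is not among the candidates.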
Shuffle and Sort Phase: Since the circular buffer has a size constraint, once it fills beyond a certain percentage (80 by default), a thread is spawned that spills the data from the buffer. Before the spilled data is copied to disk, it is first partitioned with respect to its Reducer; the background thread then sorts the partitioned data by key and, if a combiner is specified, combines the data too. The data is therefore already optimized by the time it is copied to its respective partition folder. This process continues until all the data from the circular buffer has been written to disk. A background thread then checks whether the number of spilled files in each partition is within the limit of a configurable parameter; if not, the files are merged and the combiner is run over them until the number falls within that limit.

The Map task keeps updating its status to the ApplicationMaster throughout its life cycle, and it is only when 5 percent of the Map tasks have completed that the Reduce tasks start. An auxiliary service in the NodeManager runs a Netty web server that serves the Mapper partitioned files, and the Reduce task asks MRAppMaster for the Mapper hosts that hold the partitions it needs. All the partitioned files that pertain to a Reducer are copied to its node in this fashion. Since multiple files get copied as the data belonging to that Reducer is collected from various nodes, a background thread merges the sorted map files, sorts them again, and, if a combiner is configured, combines the result too.

Reduce Stage: It is important to note that at this stage every input file of each Reducer should have been sorted by key. This is the presumption with which the Reducer starts processing the records and converts the key-value pairs into an aggregated list. Once the Reducer has processed the data, it writes the result to the output folder that was specified during job submission.
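The word count walk-through from the overview can be tied to the partition, sort, combine, merge, and reduce steps described above with a small single-process Python sketch. It is a toy model under stated assumptions, not Hadoop's implementation: each lyric line is treated as the input of one Mapper, the names SAMPLE_LINES, partition_for, and run_word_count are hypothetical, and a CRC32 checksum stands in for the partitioner only because it is deterministic across runs.

```python
import heapq
import zlib
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

SAMPLE_LINES = [
    "Imagine there's no heaven",
    "It's easy if you try",
    "No hell below us",
    "Above us only sky",
    "Imagine all the people living for today",
]


def map_phase(line):
    """Emit a <word, 1> pair per token, lowercasing keys as in the text."""
    return [(word.lower(), 1) for word in line.split()]


def partition_for(key, num_reducers):
    """Assign a key to a Reducer (assumption: checksum modulo reducer count)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def sum_by_key(sorted_pairs):
    """Sum values of adjacent equal keys; used as both combiner and reducer."""
    return [(key, sum(value for _, value in group))
            for key, group in groupby(sorted_pairs, key=itemgetter(0))]


def run_word_count(lines, num_reducers=2):
    # Map side: partition each Mapper's output by Reducer, then sort and
    # combine every partition before it is "spilled".
    reducer_inputs = defaultdict(list)
    for line in lines:
        partitions = defaultdict(list)
        for key, value in map_phase(line):
            partitions[partition_for(key, num_reducers)].append((key, value))
        for reducer_id, pairs in partitions.items():
            reducer_inputs[reducer_id].append(sum_by_key(sorted(pairs)))

    # Reduce side: merge the already sorted runs from all Mappers, then
    # aggregate by key.
    counts = {}
    for sorted_runs in reducer_inputs.values():
        merged = heapq.merge(*sorted_runs)
        counts.update(sum_by_key(merged))
    return counts
```

Calling run_word_count(SAMPLE_LINES) yields the same aggregation as the Reduce output listed in the overview, for instance a count of 2 for imagine, no, and us. Because every occurrence of a key is routed to the same Reducer, the per-Reducer results can simply be combined into one dictionary.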
Clean up stage: Each Reducer sends periodic updates to MRAppMaster about task completion. Once the Reduce tasks are over, the application master starts the clean-up activity. The status of the submitted job is changed from running to successful, and all the temporary and intermediate files and folders are deleted. The application statistics are archived to the job history server.

Summary

In this article we looked at HDFS and YARN along with MapReduce, covering the different functions of MapReduce and HDFS I/O.