
How-To Tutorials


Open and Proprietary Next Generation Networks

Packt
21 Feb 2018
29 min read
In this article by Steven Noble, the author of the book Building Modern Networks, we will discuss networking concepts such as hyper-scale networking, software-defined networking, and network hardware and software design, along with a litany of network design ideas utilized in NGNs.

The term Next Generation Network (NGN) has been around for over 20 years and refers to the current state-of-the-art network equipment, protocols, and features. A big driver in NGN is the constant stream of newer, better, faster forwarding ASICs coming out of companies like Barefoot, Broadcom, Cavium, Nephos (MediaTek), and others. The advent of commodity networking chips has shortened the development time for generic switches, allowing hyper-scale networking end users to build equipment upgrades into their network designs. At the time of writing, multiple companies have announced 6.4 Tbps switching chips. In layman's terms, a 6.4 Tbps switching chip can handle 64x100GbE of evenly distributed network traffic without losing any packets. To put the number in perspective, the entire internet in 2004 was about 4 Tbps, so all of the internet traffic in 2004 could have crossed this one switching chip without issue (internet traffic of 1.3 EB/month, http://blogs.cisco.com/sp/the-history-and-future-of-internet-traffic). A hyper-scale network is one that is operated by companies such as Facebook, Google, Twitter, and other companies that add hundreds if not thousands of new systems a month to keep up with demand.

Examples of next generation networking

At the start of the commercial internet age (1994), software routers running on minicomputers, such as BBN's PDP-11-based IP routers designed in the 1970s, were still in use, and hubs were simply dumb hardware devices that broadcast traffic everywhere. At that time, the state of the art in networking was the Cisco 7000 series router, introduced in 1993. The next generation router was the Cisco 7500 (1995), while the Cisco 12000 series (gigabit) routers and the Juniper M40 were only concepts. When we say next generation, we are speaking of the current state of the art and the near future of networking equipment and software. For example, 100 Gb Ethernet is the current state of the art, while 400 Gb Ethernet is in the pipeline.

The definition of a modern network is a network that contains one or more of the following concepts:

Software-defined Networking (SDN)
Network design concepts
Next generation hardware
Hyper-scale networking
Open networking hardware and software
Network Function Virtualization (NFV)
Highly configurable traffic management

Both open and closed network hardware vendors have been innovating at a high rate of speed, with the help of and due to hyper-scale companies like Google, Facebook, and others who need next generation high-speed network devices. This provides the network architect with a reasonable pipeline of equipment to be used in designs. Google and Facebook are both companies with hyper-scale networks. A hyper-scale network is one where the data stored, transferred, and updated on the network grows exponentially. Hyper-scale companies deploy new equipment, software, and configurations weekly or even daily to support the needs of their customers.
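As a quick aside, the bandwidth figures quoted above are easy to sanity-check. The following short Python sketch is a worked example added here for illustration; it assumes a 30-day month and 10^18 bytes per exabyte:

ports, port_speed_gbps = 64, 100
chip_capacity_tbps = ports * port_speed_gbps / 1000
print(chip_capacity_tbps)                                # 6.4 Tbps for a 64x100GbE chip

monthly_traffic_bits = 1.3e18 * 8                        # 1.3 EB/month expressed in bits
seconds_per_month = 30 * 24 * 3600
print(monthly_traffic_bits / seconds_per_month / 1e12)   # roughly 4 Tbps of average 2004 internet traffic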
These companies have needs that are outside of the normal networking equipment available, so they must innovate by building their own next generation network devices, designing multi-tiered networks (like a three stage Clos network) and automating the installation and configuration of the next generation networking devices. The need of hyper scalers is well summed up by Google's Amin Vahdat in a 2014 Wired article "We couldn't buy the hardware we needed to build a network of the size and speed we needed to build". Terms and concepts in networking Here you will find the definition of some terms that are important in networking. They have been broken into groups of similar concepts. Routing and switching concepts In network devices and network designs there are many important concepts to understand. Here we begin with the way data is handled. The easiest way to discuss networking is to look at the OSI layer and point out where each device sits. OSI Layer with respect to routers and switches: Layer 1 (Physical): Layer 1 includes cables, hub, and switch ports. This is how all of the devices connect to each other including copper cables (CatX), fiber optics and Direct Attach Cables (DAC) which connect SFP ports without fiber. Layer 2 (Data link Layer): Layer 2 includes the raw data sent over the links and manages the Media Access Control (MAC) addresses for Ethernet Layer 3 (Network layer): Layer 3 includes packets that have more than just layer 2 data, such as IP, IPX (Novell Networks protocol), AFP (Apple's protocol) Routers and switches In a network you will have equipment that switches and/or routes traffic. A switch is a networking device that connects multiple devices such as servers, provides local connectivity and provides an uplink to the core network. A router is a network device that computes paths to remote and local devices, providing connectivity to devices across a network. Both switches and routers can use copper and fiber connections to interconnect. There are a few parts to a networking device, the forwarding chip, the TCAM, and the network processor. Some newer switches have Baseboard Management Controllers (BMCs) which manage the power, fans and other hardware, lessening the burden on the NOS to manage these devices. Currently routers and switches are very similar as there are many Layer 3 forwarding capable switches and some Layer 2 forwarding capable routers. Making a switch Layer 3 capable is less of an issue than making a router Layer 2 forwarding as the switch already is doing Layer 2 and adding Layer 3 is not an issue. A router does not do Layer 2 forwarding in general, so it has to be modified to allow for ports to switch rather than route. Control plane The control plane is where all of the information about how packets should be handled is kept. Routing protocols live in the control plane and are constantly scanning information received to determine the best path for traffic to flow. This data is then packed into a simple table and pushed down to the data plane. Data plane The data plane is where forwarding happens. In a software router, this would be done in the devices CPU, in a hardware router, this would be done using the forwarding chip and associated memories. VLAN/VXLAN A Virtual Local Area Network (VLAN) is a way of creating separate logical networks within a physical network. VLANs are generally used to separate/combine different users, or network elements such as phones, servers, workstations, and so on. You can have up to 4,096 VLANs on a network segment. 
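The VLAN limit above, and the VXLAN limit discussed next, come straight from the width of the identifier fields: 802.1Q uses a 12-bit VLAN ID, while VXLAN uses a 24-bit identifier. A quick check in Python (the field widths are standard protocol values, stated here as an assumption rather than quoted from the article):

vlan_id_bits = 12           # 802.1Q VLAN ID field width
vxlan_vni_bits = 24         # VXLAN Network Identifier (VNI) field width
print(2 ** vlan_id_bits)    # 4096 possible VLANs
print(2 ** vxlan_vni_bits)  # 16777216, roughly 16 million possible VXLANs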
A Virtual Extensible LAN (VXLAN) was created to allow for large, dynamic, isolated logical networks for virtualized and multi-tenant networks. You can have up to 16 million VXLANs on a network segment. A VXLAN Tunnel Endpoint (VTEP) is a set of two logical interfaces: an inbound interface, which encapsulates incoming traffic into VXLANs, and an outbound interface, which removes the encapsulation of outgoing traffic from VXLAN back to its original state.

Network design concepts

Network design requires knowledge of the physical structure of the network so that the proper design choices are made. For example, in a data center you would have a local area network; if you have multiple data centers near each other, they would be considered a metro area network.

LAN

A Local Area Network (LAN) is generally considered to be within the same building. These networks can be bridged (switched) or routed. In general, LANs are segmented into areas to avoid large broadcast domains.

MAN

A Metro Area Network (MAN) is generally defined as multiple sites in the same geographic area or city, that is, a metropolitan area. A MAN generally runs at the same speed as a LAN but is able to cover larger distances.

WAN

A Wide Area Network (WAN) is essentially everything that is not a LAN or a MAN. WANs generally use fiber optic cables to transmit data from one location to another. WAN circuits can be provided via multiple connections and data encapsulations, including MPLS, ATM, and Ethernet. Most large network providers utilize Dense Wavelength Division Multiplexing (DWDM) to put more bits on their fiber networks. DWDM puts multiple colors of light onto the fiber, allowing up to 128 different wavelengths to be sent down a single fiber. DWDM has just entered open networking with the introduction of Facebook's Voyager system.

Leaf-Spine design

In a Leaf-Spine network design, there are leaf switches (the switches that connect to the servers), sometimes called Top of Rack (ToR) switches, connected to a set of spine switches (the switches that connect the leaves together), sometimes called End of Rack (EoR) switches.

Clos network

A Clos network is one of the ways to design a multi-stage network. Based on the switching network design by Charles Clos in 1952, a three-stage Clos is the smallest version of a Clos network. It has an ingress, a middle, and an egress stage. Some hyper-scale networks are using a five-stage Clos, where the middle is replaced with another three-stage Clos. In a five-stage Clos there is an ingress, a middle ingress, a middle, a middle egress, and an egress stage. All stages are connected to their neighbors, so, for example, Ingress 1 is connected to all four of the middle stages, just as Egress 1 is connected to all four of the middle stages. A Clos network can be built in odd numbers of stages starting with 3, so a 5, 7, and so on stage Clos is possible. For even-numbered designs, Benes designs are usable.

Benes network

A Benes design is a non-blocking Clos design where the middle stage is 2x2 instead of NxN. A Benes network can have an even number of stages, for example, a four-stage Benes network.

Network controller concepts

Here we will discuss the concepts of network controllers. Every networking device has a controller, whether built in or external, to manage the forwarding of the system.

Controller

A controller is a computer that sits on the network and manages one or more network devices. A controller can be built into a device, like the Cisco Supervisor module, or standalone, like an OpenFlow controller.
The controller is responsible for managing all of the control plane data and deciding what should be sent down to the data plane. Generally, a controller will have a Command-line Interface (CLI) and more recently a web configuration interface. Some controllers will even have an Application Programming Interface (API). OpenFlow controller An OpenFlow controller, as it sounds is a controller that uses the OpenFlow protocol to communicate with network devices. The most common OpenFlow controllers that people hear about are OpenDaylight and ONOS. People who are working with OpenFlow would also know of Floodlight and RYU. Supervisor module A route processor is a computer that sits inside of the chassis of the network device you are managing. Sometimes the route processor is built in to the system, while other times it is a module that can be replaced/upgraded. Many vendor multi-slot systems have multiple route processors for redundancy. An example of a removable route processor is the Cisco 9500 series Supervisor module. There are multiple versions available including revision A, with a 4 core processor and 16 GB of RAM and revision B with a 6 core processor and 24 GB of RAM. Previous systems such as the Cisco Catalyst 7600 had options such as the SUP720 (Supervisor Module 720) of which they offered multiple versions. The standard SUP720 had a limited number of routes that it could support (256k) versus the SUP720 XL which could support up to 1M routes. Juniper Route Engine In Juniper terminology, the controller is called a Route Engine. They are similar to the Cisco Route Processor/Supervisor modules. Unlike Cisco Supervisor modules which utilize special CPUS, Juniper's REs generally use common x86 CPUs. Like Cisco, Juniper multi-slot systems can have redundant processors. Juniper has recently released the information about the NG-REs or Next Generation Route Engines. One example is the new RE-S-X6-64G, a 6-core x86 CPU based routing engine with 64 GB DRAM and 2x 64 GB SSD storage available for the MX240/MX480/MX960. These NG-REs allow for containers and other virtual machines to be run directly. Built in processor When looking at single rack unit (1 RU) or pizza box design switches there are some important design considerations. Most 1 RU switches do not have redundant processors, or field replaceable route processors. In general the field replaceable units (FRUs) that the customer can replace are power supplies and fans. If the failure is outside of the available FRUs, the entire switch must be replaced in the event of a failure. With white box switches this can be a simple process as white box switches can be used in multiple locations of your network including the customer edge, provider edge and core. Sparing (keeping a spare switch) is easy when you have the same hardware in multiple parts of the network. Recently commodity switch fabric chips have come with built-in low power ARM CPUs that can be used to manage the entire system, leading to cheaper and less power hungry designs. Facebook Wedge microserver The Facebook Wedge is different from most white box switches as it has its controller as an add in module, the same board that is used in some of the OCP servers. By separating the controller board from the switch, different boards can be put in place, such as higher memory, faster CPUs, different CPU types, and so on. Routing protocols A routing protocol is a daemon that runs on a controller and communicates with other network devices to exchange route information. 
For this section we will use common words to demonstrate the way the routing protocol is working; these should not be construed as the actual way that the protocols talk.

BGP

Border Gateway Protocol (BGP) is a path vector based Exterior Gateway Protocol (EGP) that makes routing decisions based on paths, network policies, or rules (route-maps on Cisco). Though designed as an EGP, BGP can be used as both an interior (iBGP) and exterior (eBGP) routing protocol. BGP uses keepalive packets (are you there?) to confirm that neighbors are still accessible. BGP is the protocol that is utilized to route traffic across the internet, exchanging routing information between different Autonomous Systems (AS). An AS is all of the connected networks under the control of a single entity such as Level 3 (AS1) or Sprint (AS1239). When two different ASes interconnect, BGP peering sessions are set up between two or more network devices that have direct connections to each other. In an eBGP scenario, AS1 and AS1239 would set up BGP peering sessions that would allow traffic to route between their ASes. In an iBGP scenario, routers within the same AS peer with each other and transfer the routes that are defined on the system. While IGPs are used internally in most networks, iBGP is used in large corporate networks because other Interior Gateway Protocols (IGPs) may not scale.

Examples: iBGP next hop self

In this scenario AS1 and AS2 are peered with each other and exchanging one prefix each. AS1 advertises 192.168.1.0/24 and AS2 advertises 192.168.2.0/24. Each network has two routers: one border router, which connects to other ASes, and one internal router, which gets its routes from the border router. The routes are advertised internally with the next hop set as the border router. This is a standard scenario when you are not running an IGP inside to distribute the routes for the border router's external interfaces. The conversation goes like this:

AS1 -> AS2: Hi AS2, I am AS1
AS2 -> AS1: Hi AS1, I am AS2
AS1 -> AS2: I have the following route, 192.168.1.0/24
AS2 -> AS1: I have received the route, I have 192.168.2.0/24
AS1 -> AS2: I have received the route
AS1 -> Internal Router AS1: I have this route, 192.168.2.0/24, you can reach it through me at 10.1.1.1
AS2 -> Internal Router AS2: I have this route, 192.168.1.0/24, you can reach it through me at 10.1.1.1

iBGP next-hop unmodified

In the next scenario the border routers are the same, but the internal routers are given a next hop of the external (other AS) border router. The last scenario is where you peer with a route server, a system that handles peering, filtering the routes based on what you have specified you send. The routes are then forwarded on to your peers with your IP as the next hop.

OSPF

Open Shortest Path First (OSPF) is a relatively simple protocol. Different links on the same router are put into the same or different areas. For example, you would use area 1 for the interconnects between campuses, but you would use another area, such as area 10, for the campus itself. By separating areas, you can reduce the amount of cross talk that happens between devices. There are two versions of OSPF, v2 and v3. The main difference between v2 and v3 is that v2 is for IPv4 networks and v3 is for IPv6 networks. When there are multiple paths that can be taken, the cost of the links must be taken into account. Consider a case where there are two paths: one has a total cost of 20 (5+5+10), the other 16 (8+8), so the traffic will take the lowest-cost path, as the short sketch below illustrates.
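A minimal illustration of that cost comparison in Python (the topology and link costs are the hypothetical ones from the example above, not taken from a real network):

# Each candidate path is described by its link costs; OSPF prefers the lowest total cost.
paths = {
    "path_a": [5, 5, 10],   # three links, total cost 20
    "path_b": [8, 8],       # two links, total cost 16
}
best = min(paths, key=lambda name: sum(paths[name]))
print(best, sum(paths[best]))   # path_b 16, so traffic follows the 16-cost path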
IS-IS

IS-IS is a link-state routing protocol, operating by flooding link-state information throughout a network of routers using NETs (Network Entity Titles). Each IS-IS router has its own database of the network topology, built by aggregating the flooded network information. IS-IS is used by companies who are looking for fast convergence, scalability, and rapid flooding of new information. IS-IS uses the concept of levels instead of areas as in OSPF. There are two levels in IS-IS: Level 1 (area) and Level 2 (backbone). A Level 1 Intermediate System (IS) keeps track of the destinations within its area, while a Level 2 IS keeps track of paths to the Level 1 areas.

EIGRP

Enhanced Interior Gateway Routing Protocol (EIGRP) is Cisco's proprietary routing protocol. It is hardly ever seen in current networks, but if you see it in yours, then you need to plan accordingly. Replacing EIGRP with OSPF is suggested so that you can interoperate with non-Cisco devices.

RIP

If Routing Information Protocol (RIP) is being used in your network, it must be replaced during the design. Most newer routing stacks do not support RIP. RIP is one of the original routing protocols, using the number of hops (routed ports) between the device and the remote location to determine the optimal path. RIP sends its entire routing database out every 30 seconds. When routing tables were small, many years ago, RIP worked fine. With larger tables, the traffic bursts and the resulting re-computing by other routers in the network cause routers to run at almost 100 percent CPU all the time.

Cables

Here we will review the major types of cables.

Copper

Copper cables have been around for a very long time; originally, network devices were connected together using coax cable (the same cable used for television antennas and cable TV). These days there are a few standard cables that are used.

RJ45 cables

Cat5 - A 100Mb capable cable, used for both 10Mb and 100Mb connections
Cat5E - A 1GbE capable cable, but not suggested for 1GbE networks (Cat6 is better and the price difference is nominal)
Cat6 - A 1GbE capable cable; can be used for any speed at or below 1GbE, including 100Mb and 10Mb

SFPs

SFP - Small Form-factor Pluggable port, capable of up to 1GbE connections
SFP+ - Same size as the SFP, capable of up to 10Gb connections
SFP28 - Same size as the SFP, capable of up to 25Gb connections
QSFP - Quad Small Form-factor Pluggable, a bit wider than the SFP but capable of multiple GbE connections
QSFP+ - Same size as the QSFP, capable of 40GbE as 4x10GbE on the same cable
QSFP28 - Same size as the QSFP, capable of 100GbE
DAC - A direct attach cable that fits into an SFP or QSFP port

Fiber/hot-pluggable breakout cables

As routers and switches continue to become more dense, to the point where the number of ports on the front of the device can no longer fit in the space, manufacturers have moved to what we call breakout cables. For example, if you have a switch that can handle 3.2 Tbps of traffic, you need to provide 3,200 Gbps of port capacity. The easiest way to do that is to use 32x 100Gb ports, which will fit on the front of a 1U device. You cannot fit 128 10Gb ports without using either a breakout patch panel (which will then use up another few rack units (RUs)) or a breakout cable. For a period of time in the 1990s, Cisco used RJ21 connectors to provide up to 96 Ethernet ports per slot; network engineers would then create breakout cables to go from the RJ21 to RJ45. These days, we have both DAC (Direct Attach Cable) and fiber breakout cables.
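The port arithmetic behind these numbers is simple to verify; a small Python sketch (a worked example assuming every port runs at line rate):

def ports_needed(capacity_tbps, port_speed_gbps):
    # Front-panel ports of a given speed needed to expose the full switching capacity at line rate.
    return int(capacity_tbps * 1000 / port_speed_gbps)

print(ports_needed(3.2, 100))   # 32 x 100G ports, which fit on a 1U faceplate
print(32 * 4)                   # breaking each 100G port out 1x4 yields 128 lower-speed ports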
For example, here you can see a 1x4 breakout cable, providing 4 10g or 25G ports from a single 40G or 100G port. If you build a LAN network that only includes switches that provide layer 2 connectivity, any devices you want to connect together need to be in the same IP block. If you have a router in your network, it can route traffic between IP blocks. Part 1: What defines a modern network There is a litany of concepts that define a modern network, from simple principles to full feature sets. In general, a next-generation data center design enables you to move to a widely distributed non-blocking fabric with uniform chipset, bandwidth, and buffering characteristics in a simple architecture. In one example, to support these requirements, you would begin with a true three-tier Clos switching architecture with Top of Rack (ToR), spine, and fabric layers to build a data center network. Each ToR would have access to multiple fabrics and have the ability to select a desired path based on application requirement or network availability. Following the definition of a modern network from the introduction, here we layout the general definition of the parts. Modern network pieces Here we will discuss the concepts that build a Next Generation Network (NGN). Software Defined Networks Software defined networks can be defined in multiple ways. The general definition of a Software defined network is one which can be controlled as a singular unit instead of at a system by system basis. The control-plane which would normally be in the device and using routing protocols is replaced with a controller. Software defined networks can be built using many different technologies including OpenFlow, overlay networks and automation tools. Next generation networking and hyper scale networks As we mention in the introduction, twenty years ago NGN hardware would have been the Cisco GSR (officially introduced in 1997) or the Juniper M40 (officially released in 1998). Large Cisco and Juniper customers would have been working with the companies to help come up with the specifications and determining how to deploy the devices (possibly Alpha or Beta versions) in their networks. Today we can look at the hyper scale networking companies to see what a modern network looks like. A hyper scale network is one where the data stored, transferred and updated on the network grows exponentially. Technology such as 100Gb Ethernet, software defined networking, Open networking equipment and software are being deployed by hyper scale companies. Open networking hardware overview Open Hardware has been around for about 10 years, first in the consumer space and more recently in the enterprise space. Enterprise open networking hardware companies such as Quanta and Accton provide a significant amount of the hardware currently utilized in networks today. Companies such as Google and Facebook have been building their own hardware for many years. Facebook's routers such as the Wedge 100 and Backpack are available publicly for end users to utilize. Some examples of Open Networking hardware are: The Dell S6000-ON - a 32x40G switch with 32 QSFP ports on the front. The Quanta LY8 - a 48x10G + 6x40G switch with 48 SFP+ ports and 6 QSFP ports. The Facebook Wedge 100 - a 32x100G switch with 32 QSFP28 ports on the front. Open networking software overview To use open networking hardware, you need an operating system. The operating system manages the system devices such as fans, power, LEDs and temperature. 
On top of the operating system you will run a forwarding agent, examples of forwarding agents are Indigo, the open source OpenFlow daemon and Quagga, an open source routing agent. Closed networking hardware overview Cisco and Juniper are the leaders in the Closed Hardware and Software space. Cisco produces switches like the Nexus series (3000, 7000, 9000) with the 9000 programmable by ACI. Juniper provides the MX series (480, 960, 2020) with the 2020 being the highest end forwarding system they sell. Closed networking software overview Cisco has multiple network operating systems including IOS, NX-OS, IOS-XR. All Cisco NOSs are closed source and proprietary to the system that they run on. Cisco has what the industry calls a "industry standard CLI" which is emulated by many other companies. Juniper ships a single NOS, JunOS which can install on multiple different systems. JunOS is a closed source BSD based NOS. The JunOS CLI is significantly different from IOS and is more focused on engineers who program. Network Virtualization Not to be confused with Network Function Virtualization (NFV), Network virtualization is the concept of re-creating the hardware interfaces that exist in a traditional network in software. By creating a software counterpart to the hardware interfaces, you decouple the network forwarding from the hardware. There are a few companies and software projects that allow the end user to enable network virtualization. The first one is NSX which comes from the same team that developed OvS (Open Virtual Switch) Nicira, which was acquired by VMWare in 2012. Another project is Big Switch Networks Big Cloud Fabric, which utilizes a heavily modified version of Indigo, an OpenFlow controller. Network Function Virtualization Network Function Virtualization can be summed up by the statement that: "Due to recent network focused advancements in PC hardware, any service able to be delivered on proprietary, application specific hardware should be able to be done on a virtual machine". Essentially: routers, firewalls, load balancers and other network devices all running virtualized on commodity hardware. Traffic Engineering Traffic engineering is a method of optimizing the performance of a telecommunications network by dynamically analyzing, predicting and regulating the behavior of data transmitted over that network. Part 2: Next generation networking examples In my 25 or so years of networking, I have dealt with a lot of different networking technologies, each iteration (supposedly) better than the last. Starting with Thin Net (10BASE2), moving through ArcNet, 10BASE-T, Token Ring, ATM to the Desktop, FDDI and onwards. Generally, the technology improved for each system until it was swapped out. A good example is the change from a literal ring for token ring to a switching design where devices hung off of a hub (as in 10BASE-T). ATM to the desktop was a novel idea, providing up to 25Mbps to connected devices, but the complexity of configuring and managing it was not worth the gain. Today almost everything is Ethernet as shown by the Facebook Voyager DWDM system, which uses Ethernet over both traditional SFP ports and the DWDM interfaces.  Ethernet is simple, well supported and easy to manage. Example 1 - Migration from FDDI to 100Base-T In late 1996, early 1997, the Exodus network used FDDI rings (Fiber Distributed Data Interface) to connect the main routers together at 100Mbps. 
As the network grew we had to decide between two competing technologies, FDDI switches and Fast Ethernet (100Base-T) both providing 100Mbp/s. FDDI switches from companies like DEC (FDDI Gigaswitch) were used in most of the Internet Exchange Points (IXPs) and worked reasonably well with one minor issue, head of line blocking (HOLB), which also impacted other technologies. Head of line blocking occurs when a packet is destined for an interface that is already full, so a queue is built, if the interface continues to be full, eventually the queue will be dropped. While we were testing the DEC FDDI Gigaswitches, we were also in deep discussions with Cisco about the availability of Fast Ethernet (FE) and working on designs. Because FE was new, there were concerns about how it would perform and how we would be able to build a redundant network design. In the end, we decided to use FE, connect the main routers in a full mesh and use routing protocols to manage fail-over. Example 2 - NGN Failure - LANE (LAN Emulation) During the high growth period at Exodus communications, there was a request to connect a new data center to the original one and allow customers to put servers in both locations using the same address space. To do this, we chose LAN Emulation or LANE which allows a ATM network to be used like a LAN. On paper, LANE looked like a great idea, the ability to extend the LAN so that customers could use the same IP space in two different locations. In reality, it was very different. For hardware, we were using Cisco 5513 switches which provided a combination of Ethernet and ATM ports. There were multiple issues with this design: First, the customer is provided with an ethernet interface, which runs over an ATM optical interface.  Any error on the physical connection between switches or the ATM layer would cause errors on the Ethernet layer. Second, monitoring was very hard, when there were network issues, you had to look in multiple locations to determine where the errors were happening. After a few weeks, we did a midnight swap putting Cisco 7500 routers in to replace the 5500 switches and moving customers onto new blocks for the new data center. Part 3: Designing a modern network When designing a new network, some of the following might be important to you: Simple, focused yet non-blocking IP fabric Multistage parallel fabrics based on Clos network concept Simple merchant silicon Distributed control plane with some centralized controls Wide multi-path (ECMP) Uniform chipset, bandwidth, and buffering 1:1 oversubscribed (non-blocking fabric) Minimize the hardware necessary to carry east–west traffic Ability to support a large number of bare metal servers without adding an additional layer Limit fabric to a 5 stage Clos within the data center to minimize lookups and switching latency. Support host attachment at 10G, 25G, 50G and 100G Ethernet Traffic management In a modern network one of the first decisions is whether you will use a centralized controller or not. If you use a centralized controller, you will be able to see and control the entire network from one location. If you do not use a centralized controller, you will need to either manage each system directly or via automation. There is a middle space where you can use some software defined network pieces to manage parts of the network, such as an OpenFlow controller for the WAN or VMware NSX for your virtualized workloads. 
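Before moving on to the equipment decision, the oversubscription item in the design checklist above is easy to quantify. A small sketch (the port counts are hypothetical, chosen only to illustrate the ratio):

def oversubscription(downlinks, downlink_gbps, uplinks, uplink_gbps):
    # Ratio of server-facing bandwidth to fabric-facing bandwidth on a leaf switch.
    return (downlinks * downlink_gbps) / (uplinks * uplink_gbps)

# Hypothetical leaf switch: 48 x 25G server ports with 100G uplinks towards the spines.
print(oversubscription(48, 25, 6, 100))    # 2.0 -> 2:1 oversubscribed
print(oversubscription(48, 25, 12, 100))   # 1.0 -> 1:1, the non-blocking target from the checklist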
Once you know what the general management goal is, the next decision is whether to use open, proprietary, or a combination of both open and proprietary networking equipment. Open networking equipment is a concept that has been around for less than a decade and started when very large network operators decided that they wanted better control over the cost and features of the equipment in their networks. Google is a good example. Google wanted to build a high-speed backbone but was not looking to pay the prices that the incumbent proprietary vendors such as Cisco and Juniper wanted. Google set a price per port (1G/10G/40G) that it wanted to hit and designed equipment around that. Later, companies like Facebook decided to go the same direction and contracted with commodity manufacturers to build network switches that met their needs; Facebook, for instance, used both its own hardware (6-Pack/Backpack) and legacy vendor hardware for interoperability and performance testing. Proprietary vendors can offer the same level of performance or better, using their massive teams of engineers to design and optimize hardware. This distinction even applies on the software side, where companies like VMware and Cisco have created software-defined networking tools such as NSX and ACI.

With the large amount of networking gear available, designing and building a modern network can appear to be a complex undertaking. Designing a modern network requires research and a good understanding of networking equipment. While complex, the task is not hard if you follow the guidelines. These are a few of the stages of planning that need to be followed before the modern network design is started:

The first step is to understand the scope of the project (single site, multi-site, multi-continent, multi-planet).
The second step is to determine if the project is a green field (new) or brown field deployment (how many of the sites already exist and will/will not be upgraded).
The third step is to determine if there will be any software defined networking (SDN), next generation networking (NGN), or open networking pieces.
Finally, it is key that the equipment to be used is assembled and tested to determine whether it meets the needs of the network.

Summary

In this article, we have discussed many different concepts that tie NGN together. The term NGN refers to the latest and near-term networking equipment and designs. We looked at networking concepts such as local, metro, and wide area networks, network controllers, routers and switches, and routing protocols such as BGP, IS-IS, OSPF, and RIP. Then we discussed many pieces that are used either singularly or together to create a modern network. In the end, we also learned some guidelines that should be followed while designing a network.


How to create a conversational assistant or chatbot using Python

Savia Lobo
21 Feb 2018
5 min read
[box type="note" align="" class="" width=""]This article is an excerpt taken from a book Natural Language Processing with Python Cookbook written by Krishna Bhavsar, Naresh Kumar, and Pratap Dangeti. This book includes unique recipes to teach various aspects of performing Natural Language Processing with NLTK—the leading Python platform for the task.[/box] Today we will learn to create a conversational assistant or chatbot using Python programming language. Conversational assistants or chatbots are not very new. One of the foremost of this kind is ELIZA, which was created in the early 1960s and is worth exploring. In order to successfully build a conversational engine, it should take care of the following things: 1. Understand the target audience 2. Understand the natural language in which communication happens.  3. Understand the intent of the user 4. Come up with responses that can answer the user and give further clues NLTK has a module, nltk.chat, which simplifies building these engines by providing a generic framework. Let's see the available engines in NLTK: Engines Modules Eliza nltk.chat.eliza Python module Iesha nltk.chat.iesha Python module Rude nltk.chat.rudep ython module Suntsu Suntsu nltk.chat.suntsu module Zen nltk.chat.zen module In order to interact with these engines we can just load these modules in our Python program and invoke the demo() function. This recipe will show us how to use built-in engines and also write our own simple conversational engine using the framework provided by the nltk.chat module. Getting ready You should have Python installed, along with the nltk library. Having an understanding of regular expressions also helps. How to do it...    Open atom editor (or your favorite programming editor).    Create a new file called Conversational.py.    Type the following source code:    Save the file.    Run the program using the Python interpreter.    You will see the following output: How it works... Let's try to understand what we are trying to achieve here. import nltk This instruction imports the nltk library into the current program. def builtinEngines(whichOne): This instruction defines a new function called builtinEngines that takes a string parameter, whichOne: if whichOne == 'eliza': nltk.chat.eliza.demo() elif whichOne == 'iesha': nltk.chat.iesha.demo() elif whichOne == 'rude': nltk.chat.rude.demo() elif whichOne == 'suntsu': nltk.chat.suntsu.demo() elif whichOne == 'zen': nltk.chat.zen.demo() else: print("unknown built-in chat engine {}".format(whichOne)) These if, elif, else instructions are typical branching instructions that decide which chat engine's demo() function is to be invoked depending on the argument that is present in the whichOne variable. When the user passes an unknown engine name, it displays a message to the user that it's not aware of this engine. It's a good practice to handle all known and unknown cases also; it makes our programs more robust in handling unknown situations def myEngine():. This instruction defines a new function called myEngine(); this function does not take any parameters. chatpairs = ( (r"(.*?)Stock price(.*)", ("Today stock price is 100", "I am unable to find out the stock price.")), (r"(.*?)not well(.*)", ("Oh, take care. May be you should visit a doctor", "Did you take some medicine ?")), (r"(.*?)raining(.*)", ("Its monsoon season, what more do you expect ?", "Yes, its good for farmers")), (r"How(.*?)health(.*)", ("I am always healthy.", "I am a program, super healthy!")), (r".*", ("I am good. 
How are you today ?", "What brings you here ?")) ) This is a single instruction where we are defining a nested tuple data structure and assigning it to chat pairs. Let's pay close attention to the data structure: We are defining a tuple of tuples Each subtuple consists of two elements: The first member is a regular expression (this is the user's question in regex format) The second member of the tuple is another set of tuples (these are the answers) def chat(): print("!"*80) print(" >> my Engine << ") print("Talk to the program using normal english") print("="*80) print("Enter 'quit' when done") chatbot = nltk.chat.util.Chat(chatpairs, nltk.chat.util.reflections) chatbot.converse() We are defining a subfunction called chat()inside the myEngine() function. This is permitted in Python. This chat() function displays some information to the user on the screen and calls the nltk built-in nltk.chat.util.Chat() class with the chatpairs variable. It passes nltk.chat.util.reflections as the second argument. Finally we call the chatbot.converse() function on the object that's created using the chat() class. chat() This instruction calls the chat() function, which shows a prompt on the screen and accepts the user's requests. It shows responses according to the regular expressions that we have built before: if   name    == '  main  ': for engine in ['eliza', 'iesha', 'rude', 'suntsu', 'zen']: print("=== demo of {} ===".format(engine)) builtinEngines(engine) print() myEngine() These instructions will be called when the program is invoked as a standalone program (not using import). They do these two things: Invoke the built-in engines one after another (so that we can experience them) Once all the five built-in engines are excited, they call our myEngine(), where our customer engine comes into play We have learned to create a chatbot of our own using the easiest programming language ‘Python’. To know more about how to efficiently use NLTK and implement text classification, identify parts of speech, tag words, etc check out Natural Language Processing with Python Cookbook.


Tree Test and Surveys

Packt
21 Feb 2018
13 min read
In this article by Pablo Perea and  Pau Giner, authors of the book UX Design for Mobile, we will cover how to use different techniques that can be applied according to the needs of the project and the how to obtain the information that we want. (For more resources related to this topic, see here.) Tree Test This is also called reverse card sorting. This is a method where the participants try to find elements in a given structure. The objective of this method is to discover findability problems and improve the organization and labeling system. The organization structure used should represent a realistic navigation for the application or web page you are evaluating. If you don’t have one by the time the experiment is taking place, try to create a real scenario, as using a fake organization will not lead to really valuable results. There are some platforms available to perform this type of experiment with a computer. One example is https://www.optimalworkshop.com. This can have several advantages: the experiment can be carried out without requiring a physical displacement by the participant, and you can also study the participant steps and not just analyze whether the participant succeeded or not. It can be that the participants found the objectives but had to make many attempts to achieve them. The method steps Create the structure: Type or create the navigation structure with all the different levels you want to evaluate. Create a set of findability tasks: Think about different items that the participant should find or give a location in the given structure. Test with participants: The participant will receive a set of tasks to do. The following are some examples of possible tasks: Find some different products to buy Contact the customer support Get the shipping rates The results: At the end, we should have a success rate for each of the tasks. Tasks such as finding products in a store must be done several times with products located in different sections. This will help us classify our assortment and show us how to organize the first levels of the structure better. How to improve the organization Once we find the main weak points and workarounds we have in our structure, we can create alternative structures to retest and try to find better results. We can repeat this process several times until we get the desired results. The Information Architecture is the science field of organizing and labeling content in a web page to support usability and findability. There's a growing community of Information architecture specialists that supports the Information architecture Institute--https://en.wikipedia.org/wiki/Information_architecture. There are some general lines of work in which we have to invest time in order to improve our Information architecture. Content organization The content can be ordered by following different schemes, in the same way that a supermarket orders products according to different criteria. We should try to find the one that better fits our user needs. We can order the content, dividing it into groups according to nature, goals, audience, chronological entry, and so on. Each of these approaches will lead to different results and each will work better with different kinds of users. In the case of mobile applications, it is common to have certain sections where they mix contents of a different nature, for instance, integrating messages for the user in the contents of the activity view. 
However, an abuse of these types of techniques can lead to turning the section into a confusing area for the user.

Areas naming

There are words that have a completely different meaning from one person to another, especially if those people are thinking of different fields when they use our solution. Understanding our users' needs, and how they think and speak, will help us provide clear names for sections and subsections. For example, the word pool will represent a different set of products for a person looking for summer products than for a person looking for games. In the case of applications, we will have to find a balance between simplicity and clarity. If space permits, adding a label along with the icon will clarify and reduce the possible ambiguities that may be encountered in recognizing the meaning of these graphic representations. In the case of mobiles, where space is really small, we can find some universal icons, but we must test with users to ensure that they interpret them properly. In the following examples, you can find two different approaches. In the Gmail app, attachment and send are known icons and can work without a label. We find a very different scenario in the Samsung Clock app, where it would be really difficult to differentiate between the Alarm, the Stopwatch, and the Timer without labels.

Samsung Clock and Google Gmail app screenshots (source: screenshots from the Samsung Clock and Google Gmail apps)

The working memory limit

The way the information is displayed to the user can drastically change the ease with which it is understood. When we talk about mobiles, where space is very limited, limiting the number of options and providing navigation adapted to small spaces can help our user have a more satisfactory experience. As you probably know, human working memory is not limitless, and it is commonly supposed to be limited to remembering a maximum of seven elements (https://en.wikipedia.org/wiki/The_Magical_Number_Seven,_Plus_or_Minus_Two). Some authors, such as Nelson Cowan, have suggested that the number of elements an adult can remember while performing a task is even lower, and give four as the reference number (https://en.wikipedia.org/wiki/Working_memory). This means that your users will understand the information you give them better if you block it into groups according to these limitations. Once we create a new structure, we can evaluate the efficiency of this new structure versus the last version. With small improvements, we will be able to increase user engagement. Another way to learn about how the user understands the organization of our app or web is by testing a competitor product. This is one of the cheapest ways to have a quick prototype. Evaluate as many versions as you can; in each review, you will find new ideas to organize and show the content of your application or web app better.

Surveys

Surveys allow us to gather information from lots of participants without too much effort. Sometimes we need information from a big group of people, and interviewing them one by one will not be affordable. Instead of that, surveys can quickly provide answers from lots of participants and let us analyze the results with bulk methods. It is not the purpose of this book to deal in depth with the field of questionnaires, since there are books devoted entirely to this subject. Nevertheless, we will give some brushstrokes on the subject, since they are commonly used to gather information on both web pages and mobile applications.
Creating proper questions is a key part of the process that will reduce the noise and help the participants provide useful answers. Some questions will require more effort to analyze, but they will give us answers with deeper level of detail. Questions with pre-established answers are usually easier to automatize, and we can get results in less time. What we want to discover? The first thing to do is to define the objective for which we are making a survey. Working with a clear objective will help the process be focused and will get better results. Plan carefully and determine the information that you really need at that moment. We should avoid surveys with lots of questions that do not have a clear purpose. They will produce poor outcome and result in meaningless exercises for the participants. On the contrary, if we have a general leitmotiv for the questionnaire, it will also help the participants understand how the effort of completing the survey will help the company, and therefore it will give clear value to the time expended. You can plan your survey according to different planning approaches, your questions can be focussed on short and long term goals: Long-term planning: Understanding your users expectations and their view about your product in the long term will help plan the evolution of your application and help create new features that match their needs. For example, imagine that you are designing a music application, and you are unsure about focusing on mass majority music or maybe giving more visibility to amateur groups. Creating a long-term survey can help you understand what your users want to find in your platform and plan changes that match the conclusions extracted from the survey analysis. Short-term planning: This is usually related to operational actions. The objective with these kind of surveys is to gather information for taking actions later with a defined mission. These kind of surveys are useful when we need to choose between two options, that is, whether we are deciding to make a change in our platform or not. For example, it can help to decide what type of information is most important for the user when choosing between one group and another, so we can make that information more visible. We will take better decisions by understanding the main aspects our users will expect to find in our platform. Find the participants Depending on the goal of the survey, we can use a wider range of participants or reduce their number, filtering by their demographics, experience, or the relationship with our brand or products. If the goal is to expand our number of users, it may be interesting to expand the search range to participants outside our current set of users. Looking for new niches and interesting features for our potential users can make new users try out our application. If, on the contrary, our objective is to keep our current users loyal, it can be a great source of improvement to consult them about their preferences and their opinions about the things that work properly in our application. This data, along with the data of use and navigation, will let us see areas for improvement, and we will be able to solve problems of navigation and usability. Determining the questions We can ask different types of questions; depending on the type, we will get more or less detailed answers. If we choose the right type, we can save analysis effort, or we can reduce the number of participants when we require a deep analysis of each response. 
It is common to include questions at the beginning of the questionnaires in order to classify the results. They are usually called filtering or screening questions, and they will allow us to analyze the answers based on data such as age, gender, or technical skills. These questions have the objective of knowing the person answering the survey. If we know the person solving the questionnaire, we will be able to determine whether the answers given by this user are useful for our goals or not. We can add questions about the experience the participant has with general technology, or with our app, and about the relation with the brand. We can create two kinds of questions based on the type of answers the participant can provide; each of them, therefore, will lead to different results. Open-answer questions The objective of this type of questions is to know more about the participant without guiding the answers. We will try to ask objectively for a subject without providing possible answers. The participant will answer these type of questions with open-ended answers, so it will be easier to know more about how that participant thinks and which aspects are proving more or less satisfactory. While the advantage of this kind of questions is that you will gain a lot of insights and new ideas, the con is the cost of managing big amounts of data. So, these type of questions will be more useful when the number of participants is reduced. Here are some examples of open-answer questions: How often have you been using our customer service? How was your last purchase experience on our platform? Questions with pre-established answers These type of questions facilitate the analysis when the number of participants is high. We will create questions with a clear objective and give different options to respond. Participants will be able to choose one of the options in response. The analysis of these types of questions can be automated and therefore is faster, but it will not give us as detailed information as an open question, in which the participant can expose all his ideas about the matter in the question. The following is an example of a question with pre-established answers: Questions: How many times have you used our application in the last week? Answers: 1) More than five times 2) Two to Five 3) Once 4) None Another great advantage is the facility to answer these types of questions when the participant does not have much time or interest to respond. In environments such as mobile phones, typing long answers can be costly and frustrating. With these types of questions, we can offer answers that the user can select with a single click. This can help increase the number of participants completing the form. It is common to mix both the types of questions. Open-answer questions where the user can respond in more detail can be included as optional questions. The participants willing to share more information can use these fields to introduce more detailed answers. This way, we can make a quicker analysis on the questions with pre-established answers and analyze the questions that require more precise revision later. Humanize the forms When we create a form, we must think about the person who will answer it. Almost no one likes to fill in questionnaires, especially if they are long and complex. To make our participants feel comfortable filling out all the answers on our form, we have to try to treat the process as a human relationship: The first thing we should do is to explain the reason of our form. 
If our participants understand how their answers will be used in the project, and how they can help us achieve the goal, they will feel more encouraged to answer the questions and take their role seriously. Ask only what is strictly necessary for the purpose you have set. We must prevent different departments from introducing questions without a common goal. If the form will answer the concerns of different departments, all of them should share the same goal; this way, the form will have more cohesion. The tone used on the form should be friendly and clear. We should not go beyond the limits of indiscretion with our questions, or the participant may feel overwhelmed. This matters especially if the participants of our study are not users of our application or our services; we must treat them as we would treat strangers. Being respectful and kind is a key point in getting high participation.

Summary

In this article, we saw how to apply the Tree Test method and how to conduct surveys to gather the information that you want.


Introduction to Device Management

Packt
20 Feb 2018
10 min read
In this article by Yatish Patil, the author of the book Microsoft Azure IoT Development Cookbook, we will look at device management using different techniques with Azure IoT Hub. We will see the following recipes:

Device registry operations
Device twins
Device direct methods
Device jobs

Azure IoT Hub has capabilities that can be used by a developer to build robust device management. There could be different use cases or scenarios across multiple industries, but these device management capabilities, their patterns, and the SDK code remain the same, saving significant time in developing, managing, and maintaining millions of devices. Device management will be the central part of any IoT solution. The IoT solution is going to help users manage devices remotely and take actions from a cloud-based application, such as disabling a device, updating data, running a command, or performing a firmware update. In this article, we are going to perform all these device management tasks, and we will start with creating the device.

Device registry operations

This sample application is focused on device registry operations and how they work. We will create a console application as our first IoT solution and look at the various device management techniques.

Getting ready

Let's create a console application to start with IoT:

1. Create a new project in Visual Studio and choose a Console Application.
2. Add the IoT Hub connectivity extension in Visual Studio.
3. Right-click the solution, go to Add a Connected Service, select Azure IoT Hub, and click Add.
4. Select your Azure subscription and the IoT Hub you created.
5. Next, it will ask you to add a device; you can skip this step and click Complete the configuration.

How to do it...

Create a device identity; initialize the Azure IoT Hub registry connection:

registryManager = RegistryManager.CreateFromConnectionString(connectionString);
Device device = new Device();
try
{
    device = await registryManager.AddDeviceAsync(new Device(deviceId));
    success = true;
}
catch (DeviceAlreadyExistsException)
{
    success = false;
}

Retrieve a device identity by ID:

Device device = new Device();
try
{
    device = await registryManager.GetDeviceAsync(deviceId);
}
catch (DeviceAlreadyExistsException)
{
    return device;
}

Delete a device identity:

Device device = new Device();
try
{
    device = GetDevice(deviceId);
    await registryManager.RemoveDeviceAsync(device);
    success = true;
}
catch (Exception ex)
{
    success = false;
}

List up to 1,000 identities:

try
{
    var devicelist = registryManager.GetDevicesAsync(1000);
    return devicelist.Result;
}
catch (Exception ex)
{
}

Export all identities to Azure blob storage:

var blobClient = storageAccount.CreateCloudBlobClient();
string Containername = "iothubdevices";
//Get a reference to a container
var container = blobClient.GetContainerReference(Containername);
container.CreateIfNotExists();
//Generate a SAS token
var storageUri = GetContainerSasUri(container);
await registryManager.ExportDevicesAsync(storageUri, "devices1.txt", false);

Import all identities from Azure blob storage:

await registryManager.ImportDevicesAsync(storageUri, OutputStorageUri);

How it works...

Let's now understand the steps we performed. We started by creating a console application and configured it for the Azure IoT Hub solution. The idea behind this is to see the simple operations for device management.
In this article, we started with a simple operation: provisioning the device by adding it to IoT Hub. We need to create a connection to the IoT Hub and then create the registry manager object, which is part of the Devices namespace. Once we are connected, we can perform operations such as add device, delete device, and get device; these methods are asynchronous. IoT Hub also connects with Azure Blob storage for bulk operations such as exporting or importing all devices; this works on JSON format only, and the entire set of IoT devices gets exported this way.
There's more...
Device identities are represented as JSON documents. They consist of properties such as:
deviceId: It represents the unique identifier of the IoT device.
ETag: A string representing a weak ETag for the device identity.
symkey: A composite object containing a primary and a secondary key, stored in base64 format.
status: If enabled, the device can connect. If disabled, this device cannot access any device-facing endpoint.
statusReason: A string that can be used to store the reason for the status changes.
connectionState: It can be connected or disconnected.
Device twins
First we need to understand what a device twin is and where we can use it in an IoT solution. The device twin is a JSON-formatted document that describes the metadata and properties of any device created within IoT Hub; it holds the individual, device-specific information. The device twin is made up of tags, desired properties, and reported properties. The operations an IoT solution can perform on it are essentially updating this data and querying for any IoT device. Tags hold device metadata that can be accessed from the IoT solution only. Desired properties are set from the IoT solution and can be accessed on the device, whereas reported properties are set on the device and retrieved at the IoT solution end.
How to do it...
Store device metadata:
var patch = new
{
    properties = new
    {
        desired = new
        {
            deviceConfig = new
            {
                configId = Guid.NewGuid().ToString(),
                DeviceOwner = "yatish",
                latitude = "17.5122560",
                longitude = "70.7760470"
            }
        },
        reported = new
        {
            deviceConfig = new
            {
                configId = Guid.NewGuid().ToString(),
                DeviceOwner = "yatish",
                latitude = "17.5122560",
                longitude = "70.7760470"
            }
        }
    },
    tags = new
    {
        location = new
        {
            region = "US",
            plant = "Redmond43"
        }
    }
};
await registryManager.UpdateTwinAsync(deviceTwin.DeviceId, JsonConvert.SerializeObject(patch), deviceTwin.ETag);
Query device metadata:
var query = registryManager.CreateQuery("SELECT * FROM devices WHERE deviceId = '" + deviceTwin.DeviceId + "'");
Report the current state of the device:
var results = await query.GetNextAsTwinAsync();
How it works...
In this sample, we retrieved the current information of the device twin and updated the desired properties, which will be accessible on the device side. In the code, we set the coordinates of the device with latitude and longitude values, as well as the device owner name. These same values will be accessible on the device side. In a similar manner, we can set some properties on the device side, which become part of the reported properties. While using the device twin, we must always consider the following:
Tags can be set, read, and accessed only by the back end.
Reported properties are set by the device and can be read by the back end.
Desired properties are set by the back end and can be read by the device.
Use the version and last-updated properties to detect updates when necessary.
Each device twin is limited to 8 KB in size by default, per device, by IoT Hub.
There's more...
Device twin metadata always maintains the last-updated timestamp for any modification; this is a UTC timestamp maintained in the metadata. The device twin is stored in JSON format, in which the tags, desired, and reported properties are kept. Here is a sample JSON document with different nodes showing how it is stored:
{
  "tags": {
    "$etag": "1234321",
    "location": {
      "country": "India",
      "city": "Mumbai",
      "zipCode": "400001"
    }
  },
  "properties": {
    "desired": {
      "latitude": 18.75,
      "longitude": -75.75,
      "status": 1,
      "$version": 4
    },
    "reported": {
      "latitude": 18.75,
      "longitude": -75.75,
      "status": 1,
      "$version": 4
    }
  }
}
Device direct methods
Azure IoT Hub provides fully managed bi-directional communication between the IoT solution on the back end and the IoT devices in the field. When an immediate result is needed, a direct method best suits the scenario; for example, in a home automation system, one needs to control the AC temperature or turn the faucet showers on or off.
Invoke the method from the application:
public async Task<CloudToDeviceMethodResult> InvokeDirectMethodOnDevice(string deviceId, ServiceClient serviceClient)
{
    var methodInvocation = new CloudToDeviceMethod("WriteToMessage")
    {
        ResponseTimeout = TimeSpan.FromSeconds(300)
    };
    methodInvocation.SetPayloadJson("'1234567890'");
    var response = await serviceClient.InvokeDeviceMethodAsync(deviceId, methodInvocation);
    return response;
}
Method execution on the device:
deviceClient = DeviceClient.CreateFromConnectionString("", TransportType.Mqtt);
deviceClient.SetMethodHandlerAsync("WriteToMessage", new DeviceSimulator().WriteToMessage, null).Wait();
deviceClient.SetMethodHandlerAsync("GetDeviceName", new DeviceSimulator().GetDeviceName, new DeviceData("DeviceClientMethodMqttSample")).Wait();
How it works...
A direct method is a request-response interaction between the IoT device and the back-end solution, and it works on a timeout basis: if no reply arrives within the timeout, it fails. These synchronous requests have a default timeout of 30 seconds, which can be increased up to 3,600 seconds depending on the IoT scenario. The device needs to connect using the MQTT protocol, whereas the back-end solution can use HTTP. The JSON payload of a direct method can be up to 8 KB.
Device jobs
In a typical scenario, device administrators or operators are required to manage devices in bulk. We have looked at the device twin, which maintains the properties and tags; conceptually, a job is nothing but a wrapper around the possible actions that can be done in bulk. Suppose we have a scenario in which we need to update the properties for multiple devices; in that case, one can schedule a job and track its progress. For example, I would like to change the frequency at which data is sent from every 30 minutes to every 1 hour for 1,000 IoT devices. Another example could be rebooting multiple devices at the same time. Device administrators can also perform device registration in bulk using the export and import methods.
How to do it...
Job to update twin properties:
var twin = new Twin();
twin.Properties.Desired["HighTemperature"] = "44";
twin.Properties.Desired["City"] = "Mumbai";
twin.ETag = "*";
return await jobClient.ScheduleTwinUpdateAsync(jobId, "deviceId='" + deviceId + "'", twin, DateTime.Now, 10);
Job status:
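One way to check on the scheduled job is to ask the job client for it by ID. The following is a minimal sketch, assuming the same JobClient type from the Microsoft.Azure.Devices service SDK used above; exact member names may vary slightly between SDK versions:
// Retrieve the scheduled job by its ID and inspect its current status
JobResponse jobResponse = await jobClient.GetJobAsync(jobId);
Console.WriteLine("Job " + jobResponse.JobId + ": " + jobResponse.Status);
Polling this call until the status reaches a terminal value (completed or failed) is a simple way to follow up on the bulk update.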
How it works...
In this example, we looked at a job that updates the device twin information, and we can follow up on the job's status to find out whether it completed or failed. In this case, instead of making single API calls, a job can be created to execute against multiple IoT devices. The job client object provides access to the jobs available in the IoT Hub using the connection to it; once we locate a job using its unique ID, we can retrieve its status. The code snippet in the preceding How to do it... recipe updates the temperature properties, and the job is scheduled to start execution immediately with a 10-second execution timeout.
There's more...
A job's life cycle begins with initiation from the IoT solution. If a job is executing, we can query it and see the status of execution. Other common scenarios where this is useful are firmware updates, reboots, configuration updates, and so on, apart from device property reads or writes. Each device job has properties that help us work with it; the useful ones are the start and end date-times, the status, and the device job statistics, which give the job execution statistics.
Summary
We have learned about device management using different techniques with Azure IoT Hub in detail. We explained how an IoT solution helps users manage devices remotely and take actions from the cloud-based application, such as disabling a device, updating data, running a command, or performing a firmware update. We also performed different device management tasks.
Resources for Article: Further resources on this subject: Device Management in Zenoss Core Network and System Monitoring: Part 1 [article] Device Management in Zenoss Core Network and System Monitoring: Part 2 [article] Managing Network Devices [article]

Introduction to Performance Testing and JMeter

Packt
20 Feb 2018
11 min read
In this article by Bayo Erinle, the author of the book Performance Testing with JMeter 3, will explore some of the options that make JMeter a great tool of choice for performance testing.  (For more resources related to this topic, see here.) Performance testing and tuning There is a strong relationship between performance testing and tuning, in the sense that one often leads to the other. Often, end-to-end testing unveils system or application bottlenecks that are regarded unacceptable with project target goals. Once those bottlenecks are discovered, the next step for most teams is a series of tuning efforts to make the application perform adequately. Such efforts normally include, but are not limited to, the following: Configuring changes in system resources Optimizing database queries Reducing round trips in application calls, sometimes leading to redesigning and re-architecting problematic modules Scaling out application and database server capacity Reducing application resource footprint Optimizing and refactoring code, including eliminating redundancy and reducing execution time Tuning efforts may also commence if the application has reached acceptable performance but the team wants to reduce the amount of system resources being used, decrease the volume of hardware needed, or further increase system performance. After each change (or series of changes), the test is re-executed to see whether the performance has improved or declined due to the changes. The process will be continued with the performance results having reached acceptable goals. The outcome of these test-tuning circles normally produces a baseline. Baselines Baseline is a process of capturing performance metric data for the sole purpose of evaluating the efficacy of successive changes to the system or application. It is important that all characteristics and configurations, except those specifically being varied for comparison, remain the same in order to make effective comparisons as to which change (or series of changes) is driving results toward the targeted goal. Armed with such baseline results, subsequent changes can be made to the system configuration or application and testing results can be compared to see whether such changes were relevant or not. Some considerations when generating baselines include the following: They are application-specific They can be created for system, application, or modules They are metrics/results They should not be over generalized They evolve and may need to be redefined from time to time They act as a shared frame of reference They are reusable They help identify changes in performance Load and stress testing Load testing is the process of putting demand on a system and measuring its response, that is, determining how much volume the system can handle. Stress testing is the process of subjecting the system to unusually high loads far beyond its normal usage pattern to determine its responsiveness. These are different from performance testing, whose sole purpose is to determine the response and effectiveness of a system, that is, how fast the system is. Since load ultimately affects how a system responds, performance testing is always done in conjunction with stress testing. JMeter to the rescue One of the areas performance testing covers is testing tools. Which testing tool do you use to put the system and application under load? There are numerous testing tools available to perform this operation, from free to commercial solutions. 
However, our focus will be on Apache JMeter, a free, open source, cross-platform desktop application from the Apache Software foundation. JMeter has been around since 1998 according to historic change logs on its official site, making it a mature, robust, and reliable testing tool. Cost may also have played a role in its wide adoption. Small companies usually may not want to foot the bill for commercial end testing tools, which often place restrictions, for example, on how many concurrent users one can spin off. My first encounter with JMeter was exactly a result of this. I worked in a small shop that had paid for a commercial testing tool, but during the course of testing, we had outrun the licensing limits of how many concurrent users we needed to simulate for realistic test plans. Since JMeter was free, we explored it and were quite delighted with the offerings and the share amount of features we got for free. Here are some of its features: Performance tests of different server types, including web (HTTP and HTTPS), SOAP, database, LDAP, JMS, mail, and native commands or shell scripts Complete portability across various operating systems Full multithreading framework allowing concurrent sampling by many threads and simultaneous sampling of different functions by separate thread groups Full featured Test IDE that allows fast Test Plan recording, building, and debugging Dashboard Report for detailed analysis of application performance indexes and key transactions In-built integration with real-time reporting and analysis tools, such as Graphite, InfluxDB, and Grafana, to name a few Complete dynamic HTML reports Graphical User Interface (GUI) HTTP proxy recording server Caching and offline analysis/replaying of test results High extensibility Live view of results as testing is being conducted JMeter allows multiple concurrent users to be simulated on the application, allowing you to work toward most of the target goals obtained earlier, such as attaining baseline and identifying bottlenecks. It will help answer questions, such as the following: Will the application still be responsive if 50 users are accessing it concurrently? How reliable will it be under a load of 200 users? How much of the system resources will be consumed under a load of 250 users? What will the throughput look like with 1000 users active in the system? What will be the response time for the various components in the application under load? JMeter, however, should not be confused with a browser. It doesn't perform all the operations supported by browsers; in particular, JMeter does not execute JavaScript found in HTML pages, nor does it render HTML pages the way a browser does. However, it does give you the ability to view request responses as HTML through many of its listeners, but the timings are not included in any samples. Furthermore, there are limitations to how many users can be spun on a single machine. These vary depending on the machine specifications (for example, memory, processor speed, and so on) and the test scenarios being executed. In our experience, we have mostly been able to successfully spin off 250-450 users on a single machine with a 2.2 GHz processor and 8 GB of RAM. Up and running with JMeter Now, let's get up and running with JMeter, beginning with its installation. Installation JMeter comes as a bundled archive, so it is super easy to get started with it. Those working in corporate environments behind a firewall or machines with non-admin privileges appreciate this more. 
To get started, grab the latest binary release by pointing your browser to http://jmeter.apache.org/download_jmeter.cgi. At the time of writing this, the current release version is 3.1. The download site offers the bundle as both a .zip file and a .tgz file. We go with the .zip file option, but feel free to download the .tgz file if that's your preferred way of grabbing archives. Once downloaded, extract the archive to a location of your choice. The location you extracted the archive to will be referred to as JMETER_HOME. Provided you have a JDK/JRE correctly installed and a JAVA_HOME environment variable set, you are all set and ready to run! A vanilla JMeter install has a trimmed-down directory structure (JMETER_HOME folder structure). The following are some of the folders in Apache-JMeter-3.2:
bin: This folder contains executable scripts to run and perform other operations in JMeter
docs: This folder contains a well-documented user guide
extras: This folder contains miscellaneous items, including samples illustrating the usage of the Apache Ant build tool (http://ant.apache.org/) with JMeter, and BeanShell scripting
lib: This folder contains utility JAR files needed by JMeter (you may add additional JARs here to use from within JMeter; we will cover this in detail later)
printable_docs: This is the printable documentation
Installing Java JDK
Follow these steps to install the Java JDK:
Go to http://www.oracle.com/technetwork/java/javase/downloads/index.html.
Download the Java JDK (not the JRE) compatible with the system that you will use to test. At the time of writing, JDK 1.8 (update 131) was the latest.
Double-click on the executable and follow the onscreen instructions.
On Windows systems, the default location for the JDK is under Program Files. While there is nothing wrong with this, the issue is that the folder name contains a space, which can sometimes be problematic when attempting to set PATH and run programs, such as JMeter, that depend on the JDK from the command line. With this in mind, it is advisable to change the default location to something like C:\tools\jdk.
Setting up JAVA_HOME
Here are the steps to set up the JAVA_HOME environment variable on Windows and Unix operating systems.
On Windows
For illustrative purposes, assume that you have installed the Java JDK at C:\tools\jdk:
Go to Control Panel.
Click on System.
Click on Advanced system settings.
Add an environment variable with the name JAVA_HOME and the value C:\tools\jdk.
Locate Path (under system variables, bottom half of the screen) and click on Edit.
Append %JAVA_HOME%\bin to the end of the existing path value (if any).
On Unix
For illustrative purposes, assume that you have installed the Java JDK at /opt/tools/jdk:
Open up a Terminal window.
Run export JAVA_HOME=/opt/tools/jdk.
Run export PATH=$PATH:$JAVA_HOME/bin.
It is advisable to set this in your shell profile settings, such as .bash_profile (for bash users) or .zshrc (for zsh users), so that you won't have to set it for each new Terminal window you open.
Running JMeter
Once installed, the bin folder under the JMETER_HOME folder contains all the executable scripts that can be run. Based on the operating system that you installed JMeter on, you either execute the shell scripts (.sh files) on Unix/Linux-flavored operating systems, or their batch (.bat file) counterparts on Windows-flavored operating systems. JMeter test plans are saved as XML files with a .jmx extension.
We refer to them as test scripts or JMX files. The scripts in the bin folder include the following:
jmeter.sh: This script launches the JMeter GUI (the default)
jmeter-n.sh: This script launches JMeter in non-GUI mode (takes a JMX file as input)
jmeter-n-r.sh: This script launches JMeter in non-GUI mode remotely
jmeter-t.sh: This opens a JMX file in the GUI
jmeter-server.sh: This script starts JMeter in server mode (this will be kicked off on the master node when testing with multiple machines remotely)
mirror-server.sh: This script runs the mirror server for JMeter
shutdown.sh: This script gracefully shuts down a running non-GUI instance
stoptest.sh: This script abruptly shuts down a running non-GUI instance
To start JMeter, open a Terminal shell, change to the JMETER_HOME/bin folder, and run the following command on Unix/Linux:
./jmeter.sh
Alternatively, run the following command on Windows:
jmeter.bat
Take a moment to explore the GUI. Hover over each icon to see a short description of what it does. The Apache JMeter team has done an excellent job with the GUI. Most icons are very similar to what you are used to, which helps ease the learning curve for new adopters. Some of the icons, for example, stop and shutdown, are disabled until a scenario/test is being conducted. The JVM_ARGS environment variable can be used to override JVM settings in the jmeter.bat or jmeter.sh script. Consider the following example:
export JVM_ARGS="-Xms1024m -Xmx1024m -Dpropname=propvalue"
Command-line options
To see all the options available to start JMeter, run the JMeter executable with the -? option. The options provided are as follows:
./jmeter.sh -?
-? print command line options and exit
-h, --help print usage information and exit
-v, --version print the version information and exit
-p, --propfile <argument> the jmeter property file to use
-q, --addprop <argument> additional JMeter property file(s)
-t, --testfile <argument> the jmeter test (.jmx) file to run
-l, --logfile <argument> the file to log samples to
-j, --jmeterlogfile <argument> jmeter run log file (jmeter.log)
-n, --nongui run JMeter in non-GUI mode
...
-J, --jmeterproperty <argument>=<value> define additional JMeter properties
-G, --globalproperty <argument>=<value> define global properties (sent to servers), e.g. -Gport=123 or -Gglobal.properties
-D, --systemproperty <argument>=<value> define additional system properties
-S, --systemPropertyFile <argument> additional system property file(s)
This is a snippet (non-exhaustive list) of what you might see if you did the same.
Summary
In this article, we have learned about the relationship between performance testing and tuning, and how to install and run JMeter.
Resources for Article: Further resources on this subject: Functional Testing with JMeter [article] Creating an Apache JMeter™ test workbench [article] Getting Started with Apache Spark DataFrames [article]
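Putting the options above together, a typical non-GUI run of a saved test plan might look like the following. The file names test_plan.jmx, results.jtl, and run.log are examples chosen here, not files that ship with JMeter:
./jmeter.sh -n -t test_plan.jmx -l results.jtl -j run.log
On Windows, the equivalent is jmeter.bat -n -t test_plan.jmx -l results.jtl -j run.log. Here -n runs JMeter without the GUI, -t points at the test plan, -l collects the sample results, and -j sets the run log file; non-GUI runs like this are the usual choice for larger loads, since the GUI itself consumes resources.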

Implementing face detection using the Haar Cascades and AdaBoost algorithm

Sugandha Lahoti
20 Feb 2018
7 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Ankit Dixit titled Ensemble Machine Learning. This book serves as an effective guide to using ensemble techniques to enhance machine learning models.[/box] In today’s tutorial, we will learn how to apply the AdaBoost classifier in face detection using Haar cascades. Face detection using Haar cascades Object detection using Haar feature-based cascade classifiers is an effective object detection method proposed by Paul Viola and Michael Jones in their paper Rapid Object Detection using a Boosted Cascade of Simple Features in 2001. It is a machine-learning-based approach where a cascade function is trained from a lot of positive and negative images. It is then used to detect objects in other images. Here, we will work with face detection. Initially, the algorithm needs a lot of positive images (images of faces) and negative images (images without faces) to train the classifier. Then we need to extract features from it. Features are nothing but numerical information extracted from the images that can be used to distinguish one image from another; for example, a histogram (distribution of intensity values) is one of the features that can be used to define several characteristics of an image even without looking at the image, such as dark or bright image, the intensity range of the image, contrast, and so on. We will use Haar features to detect faces in an image. Here is a figure showing different Haar features: These features are just like the convolution kernel; to know about convolution, you need to wait for the following chapters. For a basic understanding, convolutions can be described as in the following figure: So we can summarize convolution with these steps: Pick a pixel location from the image. Now crop a sub-image with the selected pixel as the center from the source image with the same size as the convolution kernel. Calculate an element-wise product between the values of the kernel and sub- image. Add the result of the product. Put the resultant value into the new image at the same place where you picked up the pixel location. Now we are going to do a similar kind of procedure, but with a slight difference for our images. Each feature of ours is a single value obtained by subtracting the sum of the pixels under the white rectangle from the sum of the pixels under the black rectangle. Now, all possible sizes and locations of each kernel are used to calculate plenty of features. (Just imagine how much computation it needs. Even a 24x24 window results in over 160,000 features.) For each feature calculation, we need to find the sum of the pixels under the white and black rectangles. To solve this, we will use the concept of integral image; we will discuss this concept very briefly here, as it's not a part of our context. Integral image Integral images are those images in which the pixel value at any (x,y) location is the sum of the all pixel values present before the current pixel. Its use can be understood by the following example: Image on the left and the integral image on the right. Let's see how this concept can help reduce computation time; let us assume a matrix A of size 5x5 representing an image, as shown here: Now, let's say we want to calculate the average intensity over the area highlighted: Region for addition Normally, you'd do the following: 9 + 1 + 2 + 6 + 0 + 5 + 3 + 6 + 5 = 37 37 / 9 = 4.11 This requires a total of 9 operations. 
Doing the same for 100 such operations would require: 100 * 9 = 900 operations. Now, let us first make a integral image of the preceding image: Making this image requires a total of 56 operations. Again, focus on the highlighted portion: To calculate the avg intensity, all you have to do is: (76 - 20) - (24 - 5) = 37 37 / 9 = 4.11 This required a total of 4 operations. To do this for 100 such operations, we would require: 56 + 100 * 4 = 456 operations. For just a hundred operations over a 5x5 matrix, using an integral image requires about 50% less computations. Imagine the difference it makes for large images and other such operations. Creation of an integral image changes other sum difference operations by almost O(1) time complexity, thereby decreasing the number of calculations. It simplifies the calculation of the sum of pixels—no matter how large the number of pixels—to an operation involving just four pixels. Nice, isn't it? It makes things superfast. However, among all of these features we calculated, most of them are irrelevant. For example, consider the following image. The top row shows two good features. The first feature selected seems to focus on the property that the region of the eyes is often darker than the region of the nose and cheeks. The second feature selected relies on the property that the eyes are darker than the bridge of the nose. But the same windows applying on cheeks or any other part is irrelevant. So how do we select the best features out of 160000+ features? It is achieved by AdaBoost. To do this, we apply each and every feature on all the training images. For each feature, it finds the best threshold that will classify the faces as positive and negative. Obviously, there will be errors or misclassifications. We select the features with the minimum error rate, which means they are the features that best classify the face and non-face images. Note: The process is not as simple as this. Each image is given an equal weight in the       beginning. After each classification, the weights of misclassified images are increased. Again, the same process is done. New error rates are calculated among the new weights. This process continues until the required accuracy or error rate is achieved or the required number of features is found. The final classifier is a weighted sum of these weak classifiers. It is called weak because it alone can't classify the image, but together with others, it forms a strong classifier. The paper says that even 200 features provide detection with 95% accuracy. Their final setup had around 6,000 features. (Imagine a reduction from 160,000+ to 6000 features. That is a big gain.) Face detection framework using the Haar cascade and AdaBoost algorithm So now, you take an image take each 24x24 window, apply 6,000 features to it, and check if it is a face or not. Wow! Wow! Isn't this a little inefficient and time consuming? Yes, it is. The authors of the algorithm have a good solution for that. In an image, most of the image region is non-face. So it is a better idea to have a simple method to verify that a window is not a face region. If it is not, discard it in a single shot. Don’t process it again. Instead, focus on the region where there can be a face. This way, we can find more time to check a possible face region. For this, they introduced the concept of a cascade of classifiers. 
Instead of applying all the 6,000 features to a window, we group the features into different stages of classifiers and apply one by one (normally first few stages will contain very few features). If a window fails in the first stage, discard it. We don’t consider the remaining features in it. If it passes, apply the second stage of features and continue the process. The window that passes all stages is a face region. How cool is the plan!!! The authors' detector had 6,000+ features with 38 stages, with 1, 10, 25, 25, and 50 features in the first five stages (two features in the preceding image were actually obtained as the best two features from AdaBoost). According to the authors, on average, 10 features out of 6,000+ are evaluated per subwindow. So this is a simple, intuitive explanation of how Viola-Jones face detection works. Read the paper for more details. If you found this post useful, do check out the book Ensemble Machine Learning to learn different machine learning aspects such as bagging, boosting, and stacking.    
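As a hands-on companion to the detection pipeline described above, here is a minimal sketch using OpenCV's pretrained Haar cascade. It assumes the opencv-python package is installed and that an image named face.jpg exists in the working directory; neither is part of the original article:
import cv2

# Load the pretrained frontal-face cascade bundled with OpenCV
cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
face_cascade = cv2.CascadeClassifier(cascade_path)

# Read the image and convert it to grayscale (Haar features work on intensity values)
image = cv2.imread("face.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Slide windows at multiple scales; each window passes through the cascade stages
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw a rectangle around every window that survived all stages
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces_detected.jpg", image)
The scaleFactor and minNeighbors parameters control how many scales are searched and how many overlapping detections are required before a window is accepted, which is the practical knob for trading false positives against missed faces.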

Your First Swift Program

Packt
20 Feb 2018
4 min read
 In this article, by Keith Moon author of the book Swift 4 Programming Cookbook, we will learn how to write your first swift program. (For more resources related to this topic, see here.) Your first Swift program In this first recipe will be get up and running with Swift using a Swift Playground, and run our first piece of Swift code. Getting ready To run our first Swift program, we first need to download and install our IDE. During the beta of Apple's Xcode 9, it is available as a direct download from Apple's developer website at http://developer.apple.com/download, access to this beta will require a free Apple developer account. Once the beta has ended and Xcode 9 is publically available, it will also be available from the Mac App Store. By obtaining it from the Mac App Store, you will automatically be informed of updates, so this is the preferred route, once Xcode 9 is out of beta. Xcode from the Mac App Store Open up the Mac App Store, either from the dock or via Spotlight: Search for xcode: Click Install: Xcode is a large download (over 4 GB). So, depending on your internet connection, this could take a while! Progress can be monitored from Launchpad: Xcode as a direct download Go to the Apple Developer download page at http://developer.apple.com/download  Click the Download button to download Xcode within a .xip file.  Double click on the downloaded file to unpack the Xcode application. Drag the Xcode application into your Applications folder How to do it... With Xcode downloaded, let create our first Swift playground: Launch Xcode from the icon in your dock. From the welcome screen, choose Get started with a playground. From the template chooser, select the blank template from the iOS tab: Choose a name for your playground and a location to save it: Xcode Playgrounds can be based on one of three different Apple platforms, iOS, tvOS and macOS (the operating system formerly known as OSX). Playgrounds provide full access to the frameworks available to either iOS, tvOS or macOS, depending on which you choose. An iOS playground will be assumed for the entirety of this chapter, chiefly because this is the platform of choice of the author. Where recipes do have UI components, the iOS platform will be used until otherwise stated. You are now presented with a view that looks like this: Let's replace the word playground with Swift!. Press the blue play button in the bottom left-hand corner of the window to execute the code in the playground: Congratulations! You have just run some Swift code. On the right-hand side of the window, you will see the output of each line of code in the playground. We can see our line of code has output "Hello, Swift!": There's more... If you put your cursor over the output on the left-hand side, you will see two buttons, one that looks like an eye, another that is a circle: Click on the eye button and you get a Quick Look box of the output. This isn't that useful for just a string, but can be useful for more visual output like colors and views. Click on the square button, and a box will be added in-line, under your code, showing the output of the code. This can be really useful if you want to see how the output changes as you change the code. Summary In this article, we learnt how to run your first swift program. Resources for Article: Further resources on this subject: Your First Swift App [article] Exploring Swift [article] Functions in Swift [article]
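If you want to experiment a little further in the same playground, a few extra lines are enough to see the results sidebar update for each statement; this is just an illustrative sketch, and the names used here are arbitrary:
let greeting = "Hello, Swift!"
let languages = ["Swift", "Objective-C"]
for language in languages {
    print("\(greeting) This playground also knows about \(language).")
}
Each line's value appears in the sidebar on the right, and the print output shows up in the debug area when you press the play button, just as with the single-line example above.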

Getting to know Generative Models and their types

Sunith Shetty
20 Feb 2018
9 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Rajdeep Dua and Manpreet Singh Ghotra titled Neural Network Programming with Tensorflow. In this book, you will use TensorFlow to build and train neural networks of varying complexities, without any hassle.[/box] In today’s tutorial, we will learn about generative models, and their types. We will also look into how discriminative models differs from generative models. Introduction to Generative models Generative models are the family of machine learning models that are used to describe how data is generated. To train a generative model we first accumulate a vast amount of data in any domain and later train a model to create or generate data like it. In other words, these are the models that can learn to create data that is similar to data that we give them. One such approach is using Generative Adversarial Networks (GANs). There are two kinds of machine learning models: generative models and discriminative models. Let's examine the following list of classifiers: decision trees, neural networks, random forests, generalized boosted models, logistic regression, naive bayes, and Support Vector Machine (SVM). Most of these are classifiers and ensemble models. The odd one out here is Naive Bayes. It's the only generative model in the list. The others are examples of discriminative models. The fundamental difference between generative and discriminative models lies in the underlying probability inference structure. Let's go through some of the key differences between generative and discriminative models. Discriminative versus generative models Discriminative models learn P(Y|X), which is the conditional relationship between the target variable Y and features X. This is how least squares regression works, and it is the kind of inference pattern that gets used. It is an approach to sort out the relationship among variables. Generative models aim for a complete probabilistic description of the dataset. With generative models, the goal is to develop the joint probability distribution P(X, Y), either directly or by computing P(Y | X) and P(X) and then inferring the conditional probabilities required to classify newer data. This method requires more solid probabilistic thought than regression demands, but it provides a complete model of the probabilistic structure of the data. Knowing the joint distribution enables you to generate the data; hence, Naive Bayes is a generative model. Suppose we have a supervised learning task, where xi is the given features of the data points and yi is the corresponding labels. One way to predict y on future x is to learn a function f() from (xi,yi) that takes in x and outputs the most likely y. Such models fall in the category of discriminative models, as you are learning how to discriminate between x's from different classes. Methods like SVMs and neural networks fall into this category. Even if you're able to classify the data very accurately, you have no notion of how the data might have been generated. The second approach is to model how the data might have been generated and learn a function f(x,y) that gives a score to the configuration determined by x and y together. Then you can predict y for a new x by finding the y for which the score f(x,y) is maximum. A canonical example of this is Gaussian mixture models. Another example of this is: you can imagine x to be an image and y to be a kind of object like a dog, namely in the image. 
The probability written as p(y|x) tells us how much the model believes that there is a dog, given an input image compared to all possibilities it knows about. Algorithms that try to model this probability map directly are called discriminative models. Generative models, on the other hand, try to learn a function called the joint probability p(y, x). We can read this as how much the model believes that x is an image and there is a dog y in it at the same time. These two probabilities are related and that could be written as p(y, x) = p(x) p(y|x), with p(x) being how likely it is that the input x is an image. The p(x) probability is usually called a density function in literature. The main reason to call these models generative ultimately connects to the fact that the model has access to the probability of both input and output at the same time. Using this, we can generate images of animals by sampling animal kinds y and new images x from p(y, x). We can mainly learn the density function p(x) which only depends on the input space. Both models are useful; however, comparatively, generative models have an interesting advantage over discriminative models, namely, they have the potential to understand and explain the underlying structure of the input data even when there are no labels available. This is very desirable when working in the real world. Types of generative models Discriminative models have been at the forefront of the recent success in the field of machine learning. Models make predictions that depend on a given input, although they are not able to generate new samples or data. The idea behind the recent progress of generative modeling is to convert the generation problem to a prediction one and use deep learning algorithms to learn such a problem. Autoencoders One way to convert a generative to a discriminative problem can be by learning the mapping from the input space itself. For example, we want to learn an identity map that, for each image x, would ideally predict the same image, namely, x = f(x), where f is the predictive model. This model may not be of use in its current form, but from this, we can create a generative model. Here, we create a model formed of two main components: an encoder model q(h|x) that maps the input to another space, which is referred to as hidden or the latent space represented by h, and a decoder model q(x|h) that learns the opposite mapping from the hidden input space. These components--encoder and decoder--are connected together to create an end-to-end trainable model. Both the encoder and decoder models are neural networks of different architectures, for example, RNNs and Attention Nets, to get desired outcomes. As the model is learned, we can remove the decoder from the encoder and then use them separately. To generate a new data sample, we can first generate a sample from the latent space and then feed that to the decoder to create a new sample from the output space. GAN As seen with autoencoders, we can think of a general concept to create networks that will work together in a relationship, and training them will help us learn the latent spaces that allow us to generate new data samples. Another type of generative network is GAN, where we have a generator model q(x|h) to map the small dimensional latent space of h (which is usually represented as noise samples from a simple distribution) to the input space of x. This is quite similar to the role of decoders in autoencoders. 
The deal is now to introduce a discriminative model p(y| x), which tries to associate an input instance x to a yes/no binary answer y, about whether the generator model generated the input or was a genuine sample from the dataset we were training on. Let's use the image example done previously. Assume that the generator model creates a new image, and we also have the real image from our actual dataset. If the generator model was right, the discriminator model would not be able to distinguish between the two images easily. If the generator model was poor, it would be very simple to tell which one was a fake or fraud and which one was real. When both these models are coupled, we can train them end to end by assuring that the generator model is getting better over time to fool the discriminator model, while the discriminator model is trained to work on the harder problem of detecting frauds. Finally, we desire a generator model with outputs that are indistinguishable from the real data that we used for the training. Through the initial parts of the training, the discriminator model can easily detect the samples coming from the actual dataset versus the ones generated synthetically by the generator model, which is just beginning to learn. As the generator gets better at modeling the dataset, we begin to see more and more generated samples that look similar to the dataset. The following example depicts the generated images of a GAN model learning over time: Sequence models If the data is temporal in nature, then we can use specialized algorithms called Sequence Models. These models can learn the probability of the form p(y|x_n, x_1), where i is an index signifying the location in the sequence and x_i is the ith  input sample. As an example, we can consider each word as a series of characters, each sentence as a series of words, and each paragraph as a series of sentences. Output y could be the sentiment of the sentence. Using a similar trick from autoencoders, we can replace y with the next item in the series or sequence, namely y = x_n + 1, allowing the model to learn. To summarize, we learned generative models are a fast advancing area of study and research. As we proceed to advance these models and grow the training and datasets, we can expect to generate data examples that depict completely believable images. This can be used in several applications such as image denoising, painting, structured prediction, and exploration in reinforcement learning. To know more about how to build and optimize neural networks using TensorFlow, do checkout this book Neural Network Programming with Tensorflow.    
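To make the generative-versus-discriminative distinction discussed above concrete, the following small sketch fits Naive Bayes (a generative model) and logistic regression (a discriminative model) on the same synthetic data. It assumes scikit-learn is installed and is an illustration added here, not code from the book:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic binary classification data
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Generative model: learns class-conditional densities p(x|y) and priors p(y)
generative = GaussianNB().fit(X_train, y_train)

# Discriminative model: learns p(y|x) directly
discriminative = LogisticRegression().fit(X_train, y_train)

print("Naive Bayes accuracy:         ", generative.score(X_test, y_test))
print("Logistic regression accuracy: ", discriminative.score(X_test, y_test))
Both models predict labels, but only the generative one models how the inputs themselves are distributed, which is the property that GANs, autoencoders, and sequence models exploit to create new samples.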

Installing and Configuring X-pack on Elasticsearch and Kibana

Pravin Dhandre
20 Feb 2018
6 min read
[box type="note" align="" class="" width=""]This article is an excerpt from a book written by Pranav Shukla and Sharath Kumar M N titled Learning Elastic Stack 6.0. This book provides detailed coverage on fundamentals of Elastic Stack, making it easy to search, analyze and visualize data across different sources in real-time.[/box] In this short tutorial, we will show step-by-step installation and configuration of X-pack components in Elastic Stack to extend the functionalities of Elasticsearch and Kibana. As X-Pack is an extension of Elastic Stack, prior to installing X-Pack, you need to have both Elasticsearch and Kibana installed. You must run the version of X-Pack that matches the version of Elasticsearch and Kibana. Installing X-Pack on Elasticsearch X-Pack is installed just like any plugin to extend Elasticsearch. These are the steps to install X-Pack in Elasticsearch: Navigate to the ES_HOME folder. Install X-Pack using the following command: $ ES_HOME> bin/elasticsearch-plugin install x-pack During installation, it will ask you to grant extra permissions to X-Pack, which are required by Watcher to send email alerts and also to enable Elasticsearch to launch the machine learning analytical engine. Specify y to continue the installation or N to abort the installation. You should get the following logs/prompts during installation: -> Downloading x-pack from elastic [=================================================] 100% @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ WARNING: plugin requires additional permissions @ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ * java.io.FilePermission .pipe* read,write * java.lang.RuntimePermissionaccessClassInPackage.com.sun.activation.registries * java.lang.RuntimePermission getClassLoader * java.lang.RuntimePermission setContextClassLoader * java.lang.RuntimePermission setFactory * java.net.SocketPermission * connect,accept,resolve * java.security.SecurityPermission createPolicy.JavaPolicy * java.security.SecurityPermission getPolicy * java.security.SecurityPermission putProviderProperty.BC * java.security.SecurityPermission setPolicy * java.util.PropertyPermission * read,write * java.util.PropertyPermission sun.nio.ch.bugLevel write See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html for descriptions of what these permissions allow and the associated Risks. Continue with installation? [y/N]y @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ @ WARNING: plugin forks a native controller @ @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ This plugin launches a native controller that is not subject to the Java security manager nor to system call filters. Continue with installation? [y/N]y Elasticsearch keystore is required by plugin [x-pack], creating... -> Installed x-pack Restart Elasticsearch: $ ES_HOME> bin/elasticsearch Generate the passwords for the default/reserved users—elastic, kibana, and logstash_system—by executing this command: $ ES_HOME>bin/x-pack/setup-passwords interactive You should get the following logs/prompts to enter the password for the reserved/default users: Initiating the setup of reserved user elastic,kibana,logstash_system passwords. You will be prompted to enter passwords as the process progresses. 
Please confirm that you would like to continue [y/N]y Enter password for [elastic]: elastic Reenter password for [elastic]: elastic Enter password for [kibana]: kibana Reenter password for [kibana]:kibana Enter password for [logstash_system]: logstash Reenter password for [logstash_system]: logstash Changed password for user [kibana] Changed password for user [logstash_system] Changed password for user [elastic] Please make a note of the passwords set for the reserved/default users. You can choose any password of your liking. We have chosen the passwords as elastic, kibana, and logstash for elastic, kibana, and logstash_system users, respectively, and we will be using them throughout this chapter. To verify the X-Pack installation and enforcement of security, point your web browser to http://localhost:9200/ to open Elasticsearch. You should be prompted to log in to Elasticsearch. To log in, you can use the built-in elastic user and the password elastic. Upon a successful log in, you should see the following response: { name: "fwDdHSI", cluster_name: "elasticsearch", cluster_uuid: "08wSPsjSQCmeRaxF4iHizw", version: { number: "6.0.0", build_hash: "8f0685b", build_date: "2017-11-10T18:41:22.859Z", build_snapshot: false, lucene_version: "7.0.1", minimum_wire_compatibility_version: "5.6.0", minimum_index_compatibility_version: "5.0.0" }, tagline: "You Know, for Search" } A typical cluster in Elasticsearch is made up of multiple nodes, and X-Pack needs to be installed on each node belonging to the cluster. To skip the install prompt, use the—batch parameters during installation: $ES_HOME>bin/elasticsearch-plugin install x-pack --batch. Your installation of X-Pack will have created folders named x-pack in bin, config, and plugins found under ES_HOME. We shall explore these in later sections of the chapter. Installing X-Pack on Kibana X-Pack is installed just like any plugins to extend Kibana. The following are the steps to install X-Pack in Kibana: Navigate to the KIBANA_HOME folder. Install X-Pack using the following command: $KIBANA_HOME>bin/kibana-plugin install x-pack You should get the following logs/prompts during installation: Attempting to transfer from x-pack Attempting to transfer from https://artifacts.elastic.co/downloads/kibana-plugins/x-pack/x-pack -6.0.0.zip Transferring 120307264 bytes.................... Transfer complete Retrieving metadata from plugin archive Extracting plugin archive Extraction complete Optimizing and caching browser bundles... Plugin installation complete Add the following credentials in the kibana.yml file found under $KIBANA_HOME/config and save it: elasticsearch.username: "kibana" elasticsearch.password: "kibana" If you have chosen a different password for the kibana user during password setup, use that value for the elasticsearch.password property. Start Kibana: $KIBANA_HOME>bin/kibana To verify the X-Pack installation, go to http://localhost:5601/ to open Kibana. You should be prompted to log in to Kibana. To log in, you can use the built-in elastic user and the password elastic. Your installation of X-Pack will have created a folder named x-pack in the plugins folder found under KIBANA_HOME. You can also optionally install X-Pack on Logstash. However, X-Pack currently supports only monitoring of Logstash. Uninstalling X-Pack To uninstall X-Pack: Stop Elasticsearch. Remove X-Pack from Elasticsearch: $ES_HOME>bin/elasticsearch-plugin remove x-pack Restart Elasticsearch and stop Kibana 2. 
Remove X-Pack from Kibana:
$KIBANA_HOME>bin/kibana-plugin remove x-pack
Restart Kibana.
Configuring X-Pack
X-Pack comes bundled with security, alerting, monitoring, reporting, machine learning, and graph capabilities. By default, all of these features are enabled; however, you might not be interested in everything it provides. You can selectively enable and disable the features you are interested in from the elasticsearch.yml and kibana.yml configuration files. Elasticsearch supports per-feature settings in the elasticsearch.yml file (for example, xpack.security.enabled and xpack.monitoring.enabled), and Kibana supports the corresponding settings in the kibana.yml file. If X-Pack is installed on Logstash, you can disable monitoring by setting the xpack.monitoring.enabled property to false in the logstash.yml configuration file. With this, we successfully explored how to install and configure the X-Pack components in order to bundle the different capabilities of X-Pack into one package with Elasticsearch and Kibana. If you found this tutorial useful, do check out the book Learning Elastic Stack 6.0 to examine the fundamentals of Elastic Stack in detail and start developing solutions for problems like logging, site search, app search, metrics, and more.
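If you prefer to script the verification step described above instead of using the browser, a quick check against the default Elasticsearch port with the built-in elastic user might look like this, assuming the password chosen during setup:
curl -u elastic:elastic http://localhost:9200/
A JSON response containing the cluster name and version confirms both that the node is reachable and that authentication is now being enforced; an unauthenticated request to the same URL should instead return a security error.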

Stack Wars: The epic struggle for control of the tech stack

Dave Maclean
20 Feb 2018
4 min read
The choice of tech stack for a project, team or organisation is an ongoing struggle between competing forces. Each of the players has their own logic, beliefs and drivers. Where you stand and what side you are on totally determines the way you see the struggle. Packt is on the developer team. This is how we see the struggle we’re all part of: Technology vendors are the Empire Any organisation that is selling tools, technologies or platform services is either already behaving like the Empire, or will, eventually, become the Empire. Vendors want the stack to include their tech, and if the vendor has a full stack like IBM, MS, or Oracle then they want you to live in their world. To be completely Blue or Red Stack. The economics driving this are relentless. The biggest cost for large software vendors is acquiring customers. Once you have a customer, it makes sense to keep expanding your product portfolio to sell more to each customer. The end game is when the Empire captures whole planets from the Alliance and enslaves the occupants in a move called Large Outsourcing Deals. Businesses and IT departments are the Rebel Alliance Companies and organisations build systems to try and serve their users and customers. Their underlying intentions are good. They are trying to do the right thing. They do the best they can. They have to manage within a structured organisation, co-ordinating different groups and teams. They sometimes have some cool new stuff, but often they are struggling with outdated kit, against overwhelming odds. Companies sometimes achieve great things in specific battles with heroic individuals and teams, but they also have to keep the whole show on the road. The Empire Vendors are constantly trying to bring them into their captive stack-universe, to make life “easier" with the comforting myth of the one-stop-shop. The Alliance gets new weapons and allies in the form of insurgent vendors who start out fighting the Empire, like GitHhub, Jira and AWS. However, these can be dangerous alliances. The iron law of the costs of customer acquisition will drive even the insurgent vendors to continually expand their product offer and then - BAM! – another empire wanting to lock you in. They call this the ‘Land and Expand’ strategy and every vendor has it, overtly or secretly. Even the currently much-beloved Slack will eventually try and turn itself into the Facebook of the office, and will gobble up the app ecosystem just like Facebook.  They all cross over to the dark side eventually. Developers are the Jedi Devs have a deep understanding of how technologies really work in action because they have to actually build things. This knowledge can appear mystical to outsiders. It is hard to express and articulate the intuitive skills gained from actual development experience. The very best devs are 10, 100, 1000 times more productive than the implementation teams from the vendors. Devs know what vendor tools are really like under the hood, when the action starts. They know that even the Death Star has hidden yet fatal vulnerabilities, no matter how great it looks from a distance. Over the years devs have evolved their own special ways of working that is hard for outsiders to understand. These go by the names of Agile and Open Source. Agile is a semi-mysterious Way, trusting the process to migrate towards success, without being really able to say what that is before we realise we get there. 
Open Source is the shared network that binds developers together into a powerful network of shared power on platforms like GitHub. Devs have two forces driving them. The first is to get the very best tech stack for each project, based on their unique technical insight into how it really works. Devs always want to choose best of breed, for this problem, here and now. But devs also have personal weapons of choice, over which they have mastery, and will try and use these wherever possible. Laser swords can do a lot more than you think, but there are other, better weapons in certain circumstances. Stack Wars are never going to end. There will be more and more episodes of this eternal struggle. The Empire can never be completely defeated, any more than the Jedi can die out. The story needs all three, and ebbs and flows over time in a pattern that repeats itself but in new and different ways.
Decision Trees

Packt
20 Feb 2018
17 min read
In this article by David Toth, the author of the book Data Science Algorithms in a Week, we will cover the following topics: Concepts and Analysis. Concepts A decision tree is the arrangement of the data in a tree structure where at each node data is separated into different branches according to the value of the attribute at the node. Analysis To construct a decision tree, we will use a standard ID3 learning algorithm that chooses an attribute that classifies the data samples in the best possible way to maximize the information gain – a measure based on information entropy. Information entropy Information entropy of the given data measures the least amount of information necessary to represent a data item from the given data. The units of information entropy are familiar ones – a bit, a byte, a kilobyte, and so on. The lower the information entropy, the more regular the data is and the more patterns occur in it, and thus the less information is necessary to represent it. That is why compression tools on the computer can take large text files and compress them to a much smaller size, as words and word expressions keep reoccurring, forming a pattern. Coin flipping Imagine we flip an unbiased coin. We would like to know if the result is head or tail. How much information do we need to represent the result? Both words, head and tail, consist of 4 characters, and if we represent one character with one byte (8 bits), as is standard in the ASCII table, then we would need 4 bytes or 32 bits to represent the result. But the information entropy is the least amount of data necessary to represent the result. We know that there are only two possible results – head or tail. If we agree to represent head with 0 and tail with 1, then 1 bit would be sufficient to communicate the result efficiently. Here the data is the space of the possibilities of the result of the coin throw. It is the set {head,tail}, which can be represented as the set {0,1}. The actual result is a data item from this set. It turns out that the entropy of the set is 1. This is because the probabilities of head and tail are both 50%. Now imagine that the coin is biased and throws head 25% of the time and tail 75% of the time. What would be the entropy of the probability space {0,1} this time? We could certainly represent the result with 1 bit of information. But can we do better? 1 bit is of course indivisible, but maybe we could generalize the concept of information to non-discrete amounts. In the previous example, we know nothing about the result of the coin flip unless we look at the coin. But in the example with the biased coin, we know that the result tail is more likely to happen. If we recorded n results of coin flips in a file, representing heads with 0 and tails with 1, then about 75% of the bits there would have the value 1 and 25% of them would have the value 0. The size of such a file would be n bits. But since it is more regular (the pattern of 1s prevails in it), a good compression tool should be able to compress it to less than n bits. To learn the theoretical bound on the compression and the amount of information necessary to represent a data item, we define information entropy precisely. Definition of Information Entropy Suppose that we are given a probability space S with the elements 1, 2, …, n. The probability that an element i would be chosen from the probability space is pi. 
Then the information entropy of the probability space is defined as: E(S) = -p1*log2(p1) - … - pn*log2(pn), where log2 is a binary logarithm. So the information entropy of the probability space of unbiased coin throws is E = -0.5*log2(0.5) - 0.5*log2(0.5) = 0.5 + 0.5 = 1. When the coin is biased with a 25% chance of a head and a 75% chance of a tail, then the information entropy of such a space is: E = -0.25*log2(0.25) - 0.75*log2(0.75) = 0.81127812445, which is less than 1. Thus, for example, if we had a large file with about 25% of 0 bits and 75% of 1 bits, a good compression tool should be able to compress it down to about 81.12% of its size. Information gain The information gain is the amount of information entropy gained as a result of a certain procedure. For example, if we would like to know the results of 3 fair coin flips, then the information entropy is 3. But if we could look at the 3rd coin, then the information entropy of the result for the remaining 2 coins would be 2. Thus by looking at the 3rd coin we gained 1 bit of information, so the information gain was 1. We may also gain information entropy by dividing the whole set S into subsets, grouping its elements by a similar pattern. If we group elements by their value of an attribute A, then we define the information gain as: IG(S, A) = E(S) - Sum over v in values(A) of [(|Sv|/|S|) * E(Sv)], where Sv is the set of the elements of S that have the value v for the attribute A. For example, let us calculate the information gain for the 6 rows in the swimming example by taking swimming suit as an attribute. Because we are interested in whether a given row of data is classified as no or yes for the question of whether one should swim, we will use swim preference to calculate the entropy and information gain. We partition the set S by the attribute swimming suit: Snone={(none,cold,no),(none,warm,no)}, Ssmall={(small,cold,no),(small,warm,no)}, Sgood={(good,cold,no),(good,warm,yes)}. The information entropy of S is E(S) = -(1/6)*log2(1/6) - (5/6)*log2(5/6) ~ 0.65002242164. The information entropy of the partitions is: E(Snone) = -(2/2)*log2(2/2) = -log2(1) = 0, since all instances have the class no. E(Ssmall) = 0 for a similar reason. E(Sgood) = -(1/2)*log2(1/2) - (1/2)*log2(1/2) = 1. Therefore the information gain is IG(S,swimming suit) = E(S) - [(2/6)*E(Snone) + (2/6)*E(Ssmall) + (2/6)*E(Sgood)] = 0.65002242164 - (1/3) = 0.3166890883. If we chose the attribute water temperature to partition the set S, what would be the information gain IG(S,water temperature)? The water temperature partitions the set S into the following sets: Scold={(none,cold,no),(small,cold,no),(good,cold,no)} and Swarm={(none,warm,no),(small,warm,no),(good,warm,yes)}. Their entropies are: E(Scold) = 0, as all instances are classified as no, and E(Swarm) = -(2/3)*log2(2/3) - (1/3)*log2(1/3) ~ 0.91829583405. Therefore IG(S,water temperature) = E(S) - [(3/6)*E(Scold) + (3/6)*E(Swarm)] = 0.65002242164 - 0.5*0.91829583405 ~ 0.19087450461, which is less than IG(S,swimming suit). Therefore, we can gain more information about the set S (the classification of its instances) by partitioning it per the attribute swimming suit instead of the attribute water temperature. This finding will be the basis of the ID3 algorithm constructing a decision tree in the next paragraph. 
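The arithmetic above is easy to verify programmatically. The short script below is not part of the book's implementation; it is a quick sanity check of the entropy and information-gain values for the swimming example, written against the partitions listed above.

import math

def entropy(labels):
    # Information entropy of a list of class labels, in bits.
    total = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(whole, partitions):
    # IG(S, A) = E(S) - sum over partitions of (|Sv|/|S|) * E(Sv).
    remainder = sum(len(part) / len(whole) * entropy(part) for part in partitions.values())
    return entropy(whole) - remainder

# Swim preference labels for the whole set S and for each partition.
S = ['no', 'no', 'no', 'no', 'no', 'yes']
by_suit = {'none': ['no', 'no'], 'small': ['no', 'no'], 'good': ['no', 'yes']}
by_temp = {'cold': ['no', 'no', 'no'], 'warm': ['no', 'no', 'yes']}

print(entropy(S))                      # ~0.650022
print(information_gain(S, by_suit))    # ~0.316689
print(information_gain(S, by_temp))    # ~0.190875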
ID3 algorithm The ID3 algorithm constructs a decision tree from the data based on the information gain. In the beginning, we start with the set S. The data items in the set S have various properties according to which we can partition the set S. If an attribute A has the values {v1, …, vn}, then we partition the set S into the sets Sv1, …, Svn, where the set Svi is the subset of the set S whose elements have the value vi for the attribute A. If each element in the set S has attributes A1, …, Am, then we can partition the set S according to any of the possible attributes. The ID3 algorithm partitions the set S according to the attribute that yields the highest information gain. Now suppose that this was the attribute A1. Then for the set S we have the partitions Sv1, …, Svn, where A1 has the possible values {v1,…, vn}. Since we have not constructed any tree yet, we first place a root node into the tree. For every partition of S we place a new branch from the root. Every branch represents one value of the selected attribute. A branch has data samples with the same value for that attribute. For every new branch we can define a new node that will have the data samples from its ancestor branch. Once we have defined a new node, we choose another of the remaining attributes with the highest information gain for the data at that node to partition the data at that node further, then define new branches and nodes. This process can be repeated until we run out of attributes for the nodes, or even earlier, until all the data at a node has the same class of our interest. In the case of the swimming example there are only two possible classes for swimming preference: class no and class yes. The last node is called a leaf node and decides the class of a data item from the data. Tree construction by ID3 algorithm Here we describe step by step how the ID3 algorithm would construct a decision tree from the given data samples in the swimming example. The initial set consists of 6 data samples: S={(none,cold,no),(small,cold,no),(good,cold,no),(none,warm,no),(small,warm,no),(good,warm,yes)}. In the previous sections we calculated the information gains for both of the non-classifying attributes, swimming suit and water temperature: IG(S,swimming suit)=0.3166890883 and IG(S,water temperature)=0.19087450461. Hence we would choose the attribute swimming suit, as it has the higher information gain. There is no tree drawn yet, so we start from the root node. As the attribute swimming suit has 3 possible values {none, small, good}, we draw 3 branches out of it, one for each value. Each branch will have one partition from the partitioned set S: Snone, Ssmall, Sgood. We add nodes to the ends of the branches. The Snone data samples all have the same class swimming preference = no, so we do not need to branch that node by a further attribute and partition the set. Thus the node with the data Snone is already a leaf node. The same is true for the node with the data Ssmall. But the node with the data Sgood has two possible classes for swimming preference. Therefore, we will branch the node further. There is only one non-classifying attribute left – water temperature. So there is no need to calculate the information gain for that attribute with the data Sgood. From the node Sgood we will have 2 branches, each with a partition from the set Sgood. One branch will have the set of the data sample Sgood,cold={(good,cold,no)}; the other branch will have the partition Sgood,warm={(good,warm,yes)}. Each of these 2 branches will end with a node. Each node will be a leaf node because each node has the data samples of the same value for the classifying attribute swimming preference. The resulting decision tree has 4 leaf nodes and is the tree shown in the figure decision tree for the swimming preference example. 
Deciding with a decision tree Once we have constructed a decision tree from the data with the attributes A1, …, Am and the classes {c1, …, ck}, we can use this decision tree to classify a new data item with the attributes A1, …, Am into one of the classes {c1, …, ck}. Given a new data item that we would like to classify, we can think of each node, including the root, as a question for the data sample: What value does that data sample have for the selected attribute Ai? Then based on the answer we select a branch of the decision tree and move further to the next node. Then another question is answered about the data sample, and another, until the data sample reaches a leaf node. A leaf node has one of the classes {c1, …, ck} associated with it, say ci. Then the decision tree algorithm would classify the data sample into the class ci. Deciding a data sample with the swimming preference decision tree Let us construct a decision tree for the swimming preference example with the ID3 algorithm. Consider a data sample (good,cold,?) and suppose we would like to use the constructed decision tree to decide into which class it should belong. Start with the data sample at the root of the tree. The first attribute that branches from the root is swimming suit, so we ask for the value of the attribute swimming suit of the sample (good,cold,?). We learn that the value of the attribute is swimming suit=good, therefore we move down the rightmost branch with that value for its data samples. We arrive at the node with the attribute water temperature and ask the question: what is the value of the attribute water temperature for the data sample (good,cold,?)? We learn that for that data sample we have water temperature=cold, therefore we move down the left branch into the leaf node. This leaf is associated with the class swimming preference=no. Therefore the decision tree would classify the data sample (good,cold,?) to be in the class swimming preference=no, i.e. to complete it to the data sample (good,cold,no). Therefore, the decision tree says that if one has a good swimming suit, but the water temperature is cold, then one would still not want to swim, based on the data collected in the table. Implementation decision_tree.py

import math
import sys
# The anytree module is used to visualize the decision tree constructed by this ID3 algorithm.
from anytree import Node, RenderTree
import common

# Node for the construction of a decision tree.
class TreeNode:
    def __init__(self, var=None, val=None):
        self.children = []
        self.var = var
        self.val = val

    def add_child(self, child):
        self.children.append(child)

    def get_children(self):
        return self.children

    def get_var(self):
        return self.var

    def is_root(self):
        return self.var is None and self.val is None

    def is_leaf(self):
        return len(self.children) == 0

    def name(self):
        if self.is_root():
            return "[root]"
        return "[" + self.var + "=" + self.val + "]"

# Constructs a decision tree where heading is the heading of the table with the data, i.e. the names of the attributes.
# complete_data are data samples with a known value for every attribute.
# enquired_column is the index of the column (starting from zero) which holds the classifying attribute.
def construct_decision_tree(heading, complete_data, enquired_column):
    available_columns = []
    for col in range(0, len(heading)):
        if col != enquired_column:
            available_columns.append(col)
    tree = TreeNode()
    add_children_to_node(tree, heading, complete_data, available_columns, enquired_column)
    return tree

# Splits the data samples into groups, each having a different value for the attribute at the column col.
def split_data_by_col(data, col):
    data_groups = {}
    for data_item in data:
        if data_groups.get(data_item[col]) is None:
            data_groups[data_item[col]] = []
        data_groups[data_item[col]].append(data_item)
    return data_groups

# Adds a leaf node to node.
def add_leaf(node, heading, complete_data, enquired_column):
    node.add_child(TreeNode(heading[enquired_column], complete_data[0][enquired_column]))

# Adds all the descendants to the node.
def add_children_to_node(node, heading, complete_data, available_columns, enquired_column):
    if len(available_columns) == 0:
        add_leaf(node, heading, complete_data, enquired_column)
        return -1
    selected_col = select_col(complete_data, available_columns, enquired_column)
    for i in range(0, len(available_columns)):
        if available_columns[i] == selected_col:
            available_columns.pop(i)
            break
    data_groups = split_data_by_col(complete_data, selected_col)
    if len(data_groups.items()) == 1:
        add_leaf(node, heading, complete_data, enquired_column)
        return -1
    for child_group, child_data in data_groups.items():
        child = TreeNode(heading[selected_col], child_group)
        add_children_to_node(child, heading, child_data, list(available_columns), enquired_column)
        node.add_child(child)

# Selects an available column/attribute with the highest information gain.
def select_col(complete_data, available_columns, enquired_column):
    selected_col = -1
    selected_col_information_gain = -1
    for col in available_columns:
        current_information_gain = col_information_gain(complete_data, col, enquired_column)
        if current_information_gain > selected_col_information_gain:
            selected_col = col
            selected_col_information_gain = current_information_gain
    return selected_col

# Calculates the information gain when partitioning complete_data according to the attribute at the column col and classifying by the attribute at enquired_column.
def col_information_gain(complete_data, col, enquired_column):
    data_groups = split_data_by_col(complete_data, col)
    information_gain = entropy(complete_data, enquired_column)
    for _, data_group in data_groups.items():
        information_gain -= (float(len(data_group)) / len(complete_data)) * entropy(data_group, enquired_column)
    return information_gain

# Calculates the entropy of the data classified by the attribute at the enquired_column.
def entropy(data, enquired_column):
    value_counts = {}
    for data_item in data:
        if value_counts.get(data_item[enquired_column]) is None:
            value_counts[data_item[enquired_column]] = 0
        value_counts[data_item[enquired_column]] += 1
    entropy = 0
    for _, count in value_counts.items():
        probability = float(count) / len(data)
        entropy -= probability * math.log(probability, 2)
    return entropy

# A visual output of a tree using text characters.
def display_tree(tree):
    anytree = convert_tree_to_anytree(tree)
    for pre, fill, node in RenderTree(anytree):
        print("%s%s" % (pre, node.name))

# A simple textual output of a tree without the visualization.
def display_tree_simple(tree):
    print('***Tree structure***')
    display_node(tree)
    sys.stdout.flush()

# A simple textual output of a node in a tree.
def display_node(node):
    if node.is_leaf():
        print('The node ' + node.name() + ' is a leaf node.')
        return
    sys.stdout.write('The node ' + node.name() + ' has children: ')
    for child in node.get_children():
        sys.stdout.write(child.name() + ' ')
    print('')
    for child in node.get_children():
        display_node(child)

# Convert a decision tree into the anytree module tree format to make it ready for rendering.
def convert_tree_to_anytree(tree):
    anytree = Node("Root")
    attach_children(tree, anytree)
    return anytree

# Attach the children from the decision tree into the anytree tree format.
def attach_children(parent_node, parent_anytree_node):
    for child_node in parent_node.get_children():
        child_anytree_node = Node(child_node.name(), parent=parent_anytree_node)
        attach_children(child_node, child_anytree_node)

### PROGRAM START ###
if len(sys.argv) < 2:
    sys.exit('Please, input as an argument the name of the CSV file.')
csv_file_name = sys.argv[1]
(heading, complete_data, incomplete_data, enquired_column) = common.csv_file_to_ordered_data(csv_file_name)
tree = construct_decision_tree(heading, complete_data, enquired_column)
display_tree(tree)

common.py

import csv

# Reads the csv file into the table and then separates the table into heading, complete data and incomplete data, and also produces the index number for the column that is not complete, i.e. contains a question mark.
def csv_file_to_ordered_data(csv_file_name):
    with open(csv_file_name, 'r') as f:
        reader = csv.reader(f)
        data = list(reader)
    return order_csv_data(data)

def order_csv_data(csv_data):
    # The first row in the CSV file is the heading of the data table.
    heading = csv_data.pop(0)
    complete_data = []
    incomplete_data = []
    # Let enquired_column be the column of the variable whose conditional probability should be calculated. Here we set that column to be the last one.
    enquired_column = len(heading) - 1
    # Divide the data into the complete and the incomplete data. An incomplete row is one that has a question mark in the enquired_column. The question mark will be replaced by the calculated Bayesian probabilities from the complete data.
    for data_item in csv_data:
        if is_complete(data_item, enquired_column):
            complete_data.append(data_item)
        else:
            incomplete_data.append(data_item)
    return (heading, complete_data, incomplete_data, enquired_column)

# A row is complete if the enquired column does not contain a question mark (implied by the description above).
def is_complete(data_item, enquired_column):
    return data_item[enquired_column] != '?'

Program input swim.csv

swimming_suit,water_temperature,swim
None,Cold,No
None,Warm,No
Small,Cold,No
Small,Warm,No
Good,Cold,No
Good,Warm,Yes

Program output

$ python decision_tree.py swim.csv
Root
├── [swimming_suit=Small]
│   ├── [water_temperature=Cold]
│   │   └── [swim=No]
│   └── [water_temperature=Warm]
│       └── [swim=No]
├── [swimming_suit=None]
│   ├── [water_temperature=Cold]
│   │   └── [swim=No]
│   └── [water_temperature=Warm]
│       └── [swim=No]
└── [swimming_suit=Good]
    ├── [water_temperature=Cold]
    │   └── [swim=No]
    └── [water_temperature=Warm]
        └── [swim=Yes]

Summary In this article we have learned the concept of a decision tree, its analysis using the ID3 algorithm, and an implementation. Resources for Article: Further resources on this subject: Working with Data – Exploratory Data Analysis [article] Introduction to Data Analysis and Libraries [article] Data Analysis Using R [article]
How to develop a stock price predictive model using Reinforcement Learning and TensorFlow

Aaron Lazar
20 Feb 2018
12 min read
[box type="note" align="" class="" width=""]This article is an extract from the book Predictive Analytics with TensorFlow, authored by Md. Rezaul Karim. This book helps you build, tune, and deploy predictive models with TensorFlow.[/box] In this article we’ll show you how to create a predictive model to predict stock prices, using TensorFlow and Reinforcement Learning. An emerging area for applying Reinforcement Learning is stock market trading, where a trader acts like a reinforcement agent since buying and selling (that is, action) a particular stock changes the state of the trader by generating profit or loss, that is, reward. The following figure shows some of the most active stocks on July 15, 2017 (as an example): Now, we want to develop an intelligent agent that will predict stock prices such that a trader will buy at a low price and sell at a high price. However, this type of prediction is not so easy and is dependent on several parameters such as the current number of stocks, recent historical prices, and most importantly, on the available budget to be invested for buying and selling. The state in this situation is a vector containing information about the current budget, the current number of stocks, and a recent history of stock prices (the last 200 stock prices). So each state is a 202-dimensional vector. For simplicity, there are only three actions to be performed by a stock market agent: buy, sell, and hold. So, we have the state and action; what else do you need? Policy, right? Yes, we should have a good policy, so that based on it an action will be performed in a state. A simple policy can consist of the following rules: Buying (that is, action) a stock at the current stock price (that is, state) decreases the budget while incrementing the current stock count. Selling a stock trades it in for money at the current share price. Holding does neither, and performing this action simply waits for a particular time period and yields no reward. To find the stock prices, we can use the yahoo_finance library in Python. A general warning you might experience is "HTTPError: HTTP Error 400: Bad Request". But keep trying. Now, let's try to get familiar with this module:

>>> from yahoo_finance import Share
>>> msoft = Share('MSFT')
>>> print(msoft.get_open())
72.24
>>> print(msoft.get_price())
72.78
>>> print(msoft.get_trade_datetime())
2017-07-14 20:00:00 UTC+0000
>>>

So as of July 14, 2017, the stock price of Microsoft Inc. went higher, from 72.24 to 72.78, which is an increase of about 0.75%. However, this small, single-day sample doesn't give us any significant information. But at least we got to know the present state of this particular stock or instrument. To install yahoo_finance, issue the following command:

$ sudo pip3 install yahoo_finance

Now it would be worth looking at the historical data. The following function helps us get the historical data for Microsoft Inc:

def get_prices(share_symbol, start_date, end_date, cache_filename):
    try:
        stock_prices = np.load(cache_filename)
    except IOError:
        share = Share(share_symbol)
        stock_hist = share.get_historical(start_date, end_date)
        stock_prices = [stock_price['Open'] for stock_price in stock_hist]
        np.save(cache_filename, stock_prices)
    return stock_prices

The get_prices() method takes several parameters such as the share symbol of an instrument in the stock market, the opening date, and the end date. You will also want to specify a cache file for the historical data to avoid repeated downloading. 
Once you have downloaded the data, it's time to plot the data to get some insights. The following function helps us to plot the prices:

def plot_prices(prices):
    plt.title('Opening stock prices')
    plt.xlabel('day')
    plt.ylabel('price ($)')
    plt.plot(prices)
    plt.savefig('prices.png')

Now we can call these two functions by specifying real arguments as follows:

if __name__ == '__main__':
    prices = get_prices('MSFT', '2000-07-01', '2017-07-01', 'historical_stock_prices.npy')
    plot_prices(prices)

Here I have chosen a wide range of 17 years for the historical data to get a better insight. Now, let's take a look at the output of this data: The goal is to learn a policy that gains the maximum net worth from trading in the stock market. So what will a trading agent be achieving in the end? Figure 8 gives you some clue: Well, figure 8 shows that if the agent buys a certain instrument at a price of $20 and sells at a peak price, say at $180, it will be able to make a $160 reward, that is, profit. So, implementing such an intelligent agent using RL algorithms is a cool idea. From the previous example, we have seen that for a successful RL agent, we need two operations well defined, which are as follows: How to select an action. How to improve the utility Q-function. To be more specific, given a state, the decision policy will calculate the next action to take. On the other hand, the Q-function is improved from a new experience of taking an action. Also, most reinforcement learning algorithms boil down to just three main steps: infer, perform, and learn. During the first step, the algorithm selects the best action (a) given a state (s) using the knowledge it has so far. Next, it performs the action to find out the reward (r) as well as the next state (s'). Then, it improves its understanding of the world using the newly acquired knowledge (s, r, a, s'), as shown in the following figure: Now, let's start implementing the decision policy, based on which an action will be taken for buying, selling, or holding a stock item. Again, we will do it in an incremental way. At first, we will create a random decision policy and evaluate the agent's performance. But before that, let's create an abstract class so that we can implement it accordingly:

class DecisionPolicy:
    def select_action(self, current_state, step):
        pass

    def update_q(self, state, action, reward, next_state):
        pass

The next task is to inherit from this superclass to implement a random decision policy:

class RandomDecisionPolicy(DecisionPolicy):
    def __init__(self, actions):
        self.actions = actions

    def select_action(self, current_state, step):
        action = self.actions[random.randint(0, len(self.actions) - 1)]
        return action

The previous class did nothing except define a function named select_action(), which randomly picks an action without even looking at the state. Now, if you would like to use this policy, you can run it on the real-world stock price data. This function takes care of exploration and exploitation at each interval of time, as shown in the following figure illustrating states S1, S2, and S3. The policy suggests an action to be taken, which we may either choose to exploit or otherwise randomly explore another action. As we get rewards for performing an action, we can update the policy function over time: Fantastic, so we have the policy and now it's time to utilize this policy to make decisions and return the performance. 
Now, imagine a real scenario: suppose you're trading on a Forex or ForTrade platform. Then you can recall that you also need to compute the portfolio and the current profit or loss, that is, reward. Typically, these can be calculated as follows:

portfolio = budget + number of stocks * share value
reward = new_portfolio - current_portfolio

At first, we initialize the values that are needed for computing the net worth of a portfolio, where the state is a (hist+2)-dimensional vector; in our case, it is 202-dimensional. Then we iterate over a range whose length is the number of prices selected by the user query minus (history + 1); since we start from 0, we subtract 1 instead. Then, we calculate the updated value of the portfolio and, from the portfolio, we can calculate the value of the reward, that is, profit. Also, we have already defined our random policy, so we can then select an action from the current policy. Then, we repeatedly update the portfolio values based on the action in each iteration, and the new portfolio value after taking the action can be calculated. Then, we need to compute the reward from taking an action at a state. Nevertheless, we also need to update the policy after experiencing a new action. Finally, we compute the final portfolio worth:

def run_simulation(policy, initial_budget, initial_num_stocks, prices, hist, debug=False):
    budget = initial_budget
    num_stocks = initial_num_stocks
    share_value = 0
    transitions = list()
    for i in range(len(prices) - hist - 1):
        if i % 100 == 0:
            print('progress {:.2f}%'.format(float(100 * i) / (len(prices) - hist - 1)))
        current_state = np.asmatrix(np.hstack((prices[i:i+hist], budget, num_stocks)))
        current_portfolio = budget + num_stocks * share_value
        action = policy.select_action(current_state, i)
        share_value = float(prices[i + hist + 1])
        if action == 'Buy' and budget >= share_value:
            budget -= share_value
            num_stocks += 1
        elif action == 'Sell' and num_stocks > 0:
            budget += share_value
            num_stocks -= 1
        else:
            action = 'Hold'
        new_portfolio = budget + num_stocks * share_value
        reward = new_portfolio - current_portfolio
        next_state = np.asmatrix(np.hstack((prices[i+1:i+hist+1], budget, num_stocks)))
        transitions.append((current_state, action, reward, next_state))
        policy.update_q(current_state, action, reward, next_state)
    portfolio = budget + num_stocks * share_value
    if debug:
        print('${}\t{} shares'.format(budget, num_stocks))
    return portfolio

The previous simulation predicts a somewhat good result; however, it produces random results too often. Thus, to obtain a more robust measurement of success, let's run the simulation a couple of times and average the results. Doing so may take a while to complete, say 100 times, but the results will be more reliable:

def run_simulations(policy, budget, num_stocks, prices, hist):
    num_tries = 100
    final_portfolios = list()
    for i in range(num_tries):
        final_portfolio = run_simulation(policy, budget, num_stocks, prices, hist)
        final_portfolios.append(final_portfolio)
    avg, std = np.mean(final_portfolios), np.std(final_portfolios)
    return avg, std

The previous function computes the average portfolio and the standard deviation by iterating the previous simulation function 100 times. Now, it's time to evaluate the previous agent. As already stated, there are three possible actions to be taken by the stock trading agent: buy, sell, and hold. We have a state vector of 202 dimensions and a budget of only $1000. 
Then, the evaluation goes as follows:

actions = ['Buy', 'Sell', 'Hold']
hist = 200
policy = RandomDecisionPolicy(actions)
budget = 1000.0
num_stocks = 0
avg, std = run_simulations(policy, budget, num_stocks, prices, hist)
print(avg, std)
>>> 1512.87102405 682.427384814

The first number is the mean and the second one is the standard deviation of the final portfolio. So, our stock prediction agent predicts that as a trader you could make a profit of about $513. Not bad. However, the problem is that since we have utilized a random decision policy, the result is not so reliable. To be more specific, a second execution will definitely produce a different result:

>>> 1518.12039077 603.15350649

Therefore, we should develop a more robust decision policy. Here comes the use of neural network-based Q-learning for the decision policy. Next, we will see a new hyperparameter, epsilon, to keep the solution from getting stuck when applying the same action over and over. The lesser its value, the more often it will randomly explore new actions. Next, I am going to write a class containing these functions: Constructor: This helps to set the hyperparameters for the Q-function. It also helps to set the number of hidden nodes in the neural network. Once we have these two, it helps to define the input and output tensors. It then defines the structure of the neural network. Further, it defines the operations to compute the utility. Then, it uses an optimizer to update the model parameters to minimize the loss, and sets up the session and initializes variables. select_action: This function exploits the best option with probability 1-epsilon. update_q: This updates the Q-function by updating its model parameters. Refer to the following code:

class QLearningDecisionPolicy(DecisionPolicy):
    def __init__(self, actions, input_dim):
        self.epsilon = 0.9
        self.gamma = 0.001
        self.actions = actions
        output_dim = len(actions)
        h1_dim = 200
        self.x = tf.placeholder(tf.float32, [None, input_dim])
        self.y = tf.placeholder(tf.float32, [output_dim])
        W1 = tf.Variable(tf.random_normal([input_dim, h1_dim]))
        b1 = tf.Variable(tf.constant(0.1, shape=[h1_dim]))
        h1 = tf.nn.relu(tf.matmul(self.x, W1) + b1)
        W2 = tf.Variable(tf.random_normal([h1_dim, output_dim]))
        b2 = tf.Variable(tf.constant(0.1, shape=[output_dim]))
        self.q = tf.nn.relu(tf.matmul(h1, W2) + b2)
        loss = tf.square(self.y - self.q)
        self.train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
        self.sess = tf.Session()
        self.sess.run(tf.initialize_all_variables())

    def select_action(self, current_state, step):
        threshold = min(self.epsilon, step / 1000.)
        if random.random() < threshold:
            # Exploit best option with probability epsilon
            action_q_vals = self.sess.run(self.q, feed_dict={self.x: current_state})
            action_idx = np.argmax(action_q_vals)
            action = self.actions[action_idx]
        else:
            # Random option with probability 1 - epsilon
            action = self.actions[random.randint(0, len(self.actions) - 1)]
        return action

    def update_q(self, state, action, reward, next_state):
        action_q_vals = self.sess.run(self.q, feed_dict={self.x: state})
        next_action_q_vals = self.sess.run(self.q, feed_dict={self.x: next_state})
        next_action_idx = np.argmax(next_action_q_vals)
        action_q_vals[0, next_action_idx] = reward + self.gamma * next_action_q_vals[0, next_action_idx]
        action_q_vals = np.squeeze(np.asarray(action_q_vals))
        self.sess.run(self.train_op, feed_dict={self.x: state, self.y: action_q_vals})

There you go! We have a stock price predictive model running and we’ve built it using Reinforcement Learning and TensorFlow. 
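The extract does not show how the Q-learning policy is wired into the earlier simulation, so the following lines are an assumption about how it could be invoked; they are not taken from the book. They reuse run_simulations() and the 202-dimensional state (200 recent prices plus budget and stock count) defined above.

# Hypothetical wiring of the Q-learning policy into the earlier simulation code.
# input_dim = hist + 2, as described in the text above.
actions = ['Buy', 'Sell', 'Hold']
hist = 200
policy = QLearningDecisionPolicy(actions, input_dim=hist + 2)
budget = 1000.0
num_stocks = 0
avg, std = run_simulations(policy, budget, num_stocks, prices, hist)
print(avg, std)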
If you found this tutorial interesting and would like to learn more, head over to grab this book, Predictive Analytics with TensorFlow, by Md. Rezaul Karim.    
Consuming Diagnostic Analyzers in .NET projects

Packt
20 Feb 2018
6 min read
We know how to write diagnostic analyzers to analyze and report issues about .NET source code and contribute them to the .NET developer community. In this article by Manish Vasani, the author of the book Roslyn Cookbook, we will show you how to search, install, view and configure the analyzers that have already been published by various analyzer authors on NuGet and the VS Extension gallery. We will cover the following recipes: (For more resources related to this topic, see here.) Searching and installing analyzers through the NuGet package manager. Searching and installing VSIX analyzers through the VS extension gallery. Viewing and configuring analyzers in solution explorer in Visual Studio. Using a ruleset file and the ruleset editor to configure analyzers. Diagnostic analyzers are extensions to the Roslyn C# compiler and Visual Studio IDE that analyze user code and report diagnostics. Users will see these diagnostics in the error list after building the project from Visual Studio and even when building the project on the command line. They will also see the diagnostics live while editing the source code in the Visual Studio IDE. Analyzers can report diagnostics to enforce specific code styles, improve code quality and maintenance, recommend design guidelines or even report very domain-specific issues which cannot be covered by the core compiler. Analyzers can be installed into a .NET project either as a NuGet package or as a VSIX, and the analyzer experience differs between these two packaging schemes. Analyzers are supported on various different flavors of .NET Standard, .NET Core and .NET Framework projects, for example, class library, console app, etc. Searching and installing analyzers through the NuGet package manager In this recipe we will show you how to search and install analyzer NuGet packages in the NuGet package manager in Visual Studio and see how the analyzer diagnostics from an installed NuGet package light up in the project build and as live diagnostics during code editing in Visual Studio. Getting ready You will need to have Visual Studio 2017 installed on your machine to follow this recipe. You can install a free community version of Visual Studio 2017 from https://www.visualstudio.com/thank-you-downloading-visual-studio/?sku=Community&rel=15.  How to do it… Create a C# class library project, say ClassLibrary, in Visual Studio 2017. In solution explorer, right click on the solution or project node and execute the Manage NuGet Packages command.  This brings up the NuGet Package Manager, which can be used to search and install NuGet packages to the solution or project. In the search bar type the following text to find NuGet packages tagged as analyzers: Tags:"analyzers" Note that some of the well known packages are tagged as analyzer, so you may also want to search: Tags:"analyzer" Check or uncheck the Include prerelease checkbox to the right of the search bar to search for or hide the prerelease analyzer packages respectively. The packages are listed based on the number of downloads, with the highest downloaded package at the top. Select a package to install, say System.Runtime.Analyzers, and pick a specific version, say 1.1.0, and click Install. Click on the I Accept button on the License Acceptance dialog to install the NuGet package. Verify the installed analyzer(s) show up under the Analyzers node in the solution explorer. 
Verify the project file has a new ItemGroup with the following analyzer references from the installed analyzer package: <ItemGroup> <Analyzer Include="..\packages\System.Runtime.Analyzers.1.1.0\analyzers\dotnet\cs\System.Runtime.Analyzers.dll" /> <Analyzer Include="..\packages\System.Runtime.Analyzers.1.1.0\analyzers\dotnet\cs\System.Runtime.CSharp.Analyzers.dll" /> </ItemGroup> Add the following code to your C# project: namespace ClassLibrary { public class MyAttribute : System.Attribute { } } Verify the analyzer diagnostic from the installed analyzer is shown in the error list: Open a Visual Studio 2017 Developer Command Prompt and build the project to verify that the analyzer is executed on the command line build and the analyzer diagnostic is reported: Create a new C# project in VS2017, add the same code to it as in step 9, and verify that no analyzer diagnostic shows up in the error list or on the command line, confirming that the analyzer package was only installed to the selected project in steps 1-6. Note that CA1018 (Custom attribute should have AttributeUsage defined) has been moved to a separate analyzer assembly in future versions of the FxCop/System.Runtime.Analyzers package. It is recommended that you install the Microsoft.CodeAnalysis.FxCopAnalyzers NuGet package to get the latest group of Microsoft recommended analyzers. Searching and installing VSIX analyzers through the VS extension gallery In this recipe we will show you how to search and install analyzer VSIX packages in the Visual Studio Extension manager and see how the analyzer diagnostics from an installed VSIX light up as live diagnostics during code editing in Visual Studio. Getting ready You will need to have Visual Studio 2017 installed on your machine to follow this recipe. You can install a free community version of Visual Studio 2017 from https://www.visualstudio.com/thank-you-downloading-visual-studio/?sku=Community&rel=15. How to do it… Create a C# class library project, say ClassLibrary, in Visual Studio 2017. From the top level menu, execute Tools | Extensions and Updates. Navigate to Online | Visual Studio Marketplace on the left tab of the dialog to view the available VSIXes in the Visual Studio extension gallery/marketplace. Search analyzers in the search text box in the upper right corner of the dialog and download an analyzer VSIX, say Refactoring Essentials for Visual Studio. Once the download completes, you will get a message at the bottom of the dialog that the install will be scheduled to execute once Visual Studio and related windows are closed. Close the dialog and then close the Visual Studio instance to start the install. In the VSIX Installer dialog, click Modify to start installation. The subsequent message prompts you to kill all the active Visual Studio and satellite processes. Save all your relevant work in all the open Visual Studio instances, and click End Tasks to kill these processes and install the VSIX. After installation, restart VS, click Tools | Extensions And Updates, and verify Refactoring Essentials VSIX is installed. Create a new C# project with the following source code and verify analyzer diagnostic RECS0085 (Redundant array creation expression) in the error list: namespace ClassLibrary { public class Class1 { void Method() { int[] values = new int[] { 1, 2, 3 }; } } } Build the project from Visual Studio 2017 or the command line and confirm no analyzer diagnostic shows up in the Output Window or on the command line respectively, confirming that the VSIX analyzer did not execute as part of the build. 
Resources for Article: Further resources on this subject: C++, SFML, Visual Studio, and Starting the first game [article] Connecting to Microsoft SQL Server Compact 3.5 with Visual Studio [article] Creating efficient reports with Visual Studio [article]
What makes Hadoop so revolutionary?

Packt
20 Feb 2018
17 min read
In this article by Sourav Gulati and Sumit Kumar, authors of the book Apache Spark 2.x for Java Developers, we learn that in the classical sense Hadoop comprises two components: a storage layer called HDFS and a processing layer called MapReduce. Prior to Hadoop 2.x, resource management was handled by the MapReduce framework of Hadoop itself; however, that changed with the introduction of YARN. In Hadoop 2.0, YARN was introduced as the third component of Hadoop to manage the resources of the Hadoop cluster and make Hadoop more MapReduce agnostic. (For more resources related to this topic, see here.) HDFS Hadoop Distributed File System, as the name suggests, is a distributed file system written in Java and designed along the lines of the Google File System. In practice HDFS closely resembles any other UNIX file system, with support for common file operations like ls, cp, rm, du, cat and so on. However, what makes HDFS stand out, despite its simplicity, is its mechanism to handle node failure in the Hadoop cluster without effectively changing the seek time for accessing stored files. An HDFS cluster consists of two major components: Data Nodes and a Name Node. HDFS has a unique way of storing data on HDFS clusters (networked commodity computers). It splits a regular file into smaller chunks called blocks and then makes an exact number of copies of such chunks depending on the replication factor for that file. After that, it copies those chunks to different Data Nodes of the cluster. Name Node The Name Node is responsible for managing the metadata of the HDFS cluster, such as the list of files and folders that exist in the cluster, the number of splits each file is divided into, and their replication and storage at different Data Nodes. It also maintains and manages the namespace and file permissions of all the files available in the HDFS cluster. Apart from bookkeeping, the Name Node also has a supervisory role: it keeps a watch on the replication factor of all the files and, if some block goes missing, issues commands to replicate the missing block of data. It also generates reports to ascertain cluster health. It is important to note that all the communication for the supervisory task happens from Data Node to Name Node; that is, the Data Node sends reports, a.k.a. block reports, to the Name Node, and it is then that the Name Node responds to them by issuing different commands or instructions as the need may be. HDFS I/O An HDFS read operation from a client involves: The client requests the NameNode to determine where the actual data blocks are stored for a given file. The Name Node obliges by providing the Block IDs and the locations of the hosts (Data Nodes) where the data can be found. The client contacts the Data Nodes with the respective Block IDs to fetch the data while preserving the order of the block files. An HDFS write operation from a client involves: The client contacts the Name Node to update the namespace with the file name and verify the necessary permissions. If the file exists, then the Name Node throws an error; otherwise, it returns to the client an FSDataOutputStream, which points to the data queue. The data queue negotiates with the NameNode to allocate new blocks on suitable DataNodes. The data is then copied to that DataNode, and, as per the replication strategy, the data is further copied from that DataNode to the rest of the DataNodes. It’s important to note that the data is never moved through the NameNode, as that would have caused a performance bottleneck. 
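The two-step read path described above, ask the NameNode first and then fetch from a DataNode, can be observed directly over HDFS's WebHDFS REST interface. The snippet below is a minimal sketch, assuming WebHDFS is enabled and the NameNode web UI listens on the default Hadoop 2.x port 50070; the host name, file path and user are placeholders, not values from this article.

import requests

NAMENODE = "http://namenode.example.com:50070"  # assumed NameNode web address
PATH = "/user/demo/sample.txt"                   # assumed HDFS file path
USER = "demo"                                    # assumed HDFS user (simple auth)

# Step 1: ask the NameNode where the data lives. With redirects disabled,
# the response is an HTTP 307 whose Location header points at a DataNode.
resp = requests.get(
    NAMENODE + "/webhdfs/v1" + PATH,
    params={"op": "OPEN", "user.name": USER},
    allow_redirects=False,
)
datanode_url = resp.headers["Location"]
print("NameNode redirected us to:", datanode_url)

# Step 2: fetch the file content from the DataNode the NameNode pointed at.
data = requests.get(datanode_url).content
print(data[:100])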
YARN The simplest way to understand YARN (Yet Another Resource Negotiator) is to think of it as an operating system on a cluster: provisioning resources, scheduling jobs and maintaining nodes. With Hadoop 2.x, the MapReduce model of processing the data and managing the cluster (job tracker/task tracker) was divided. While data processing was still left to MapReduce, the cluster's resource allocation (or rather, scheduling) task was assigned to a new component called YARN. Another objective that YARN met was that it made MapReduce just one of the techniques for processing the data, rather than the only technology to process data on HDFS, as was the case in Hadoop 1.x systems. This paradigm shift opened the floodgates for the development of interesting applications around Hadoop, and a new ecosystem evolved that was no longer limited to the classical MapReduce processing system. It didn't take much time after that for Apache Spark to break the hegemony of classical MapReduce and become arguably the most popular processing framework for parallel computing, as far as active development and adoption are concerned. In order to serve multi-tenancy, fault tolerance, and resource isolation, YARN introduced the components below to manage the cluster seamlessly. ResourceManager: It negotiates resources for different compute programs on a Hadoop cluster while guaranteeing the following: resource isolation, data locality, fault tolerance, task prioritization and effective cluster capacity utilization. A configurable scheduler allows the Resource Manager the flexibility to schedule and prioritize different applications as per need. Tasks served by the RM while serving clients: Using a client or APIs, a user can submit or terminate an application. The user can also gather statistics on submitted applications, and cluster and queue information. The RM also prioritizes ADMIN tasks over any other task, to perform clean up or maintenance activities on the cluster, like refreshing the node list or the queue configuration. Tasks served by the RM while serving cluster nodes: Provisioning and de-provisioning of new nodes forms an important task of the RM. Each node sends a heartbeat at a configured interval, the default being 10 minutes. Any node failing to do so is treated as a dead node. As a clean-up activity, all the supposedly running processes, including containers, are marked dead too. Tasks served by the RM while serving Application Masters: The RM registers new AMs and terminates the successfully executed ones. Just like with cluster nodes, if the heartbeat of an AM is not received within a preconfigured duration, the default value being 10 minutes, then the AM is marked dead and all the associated containers are marked dead too. But since YARN is reliable as far as application execution is concerned, a new AM is rescheduled to try another execution on a new container until it reaches the configurable retry count, the default being 4. Scheduling and other miscellaneous tasks served by the RM: The RM maintains a list of running, submitted and executed applications along with their statistics such as execution time, status, etc. Privileges of the user as well as of applications are maintained and compared while serving various user requests across the application life cycle. The RM scheduler oversees resource allocation for applications, such as memory allocation. Two common scheduling algorithms used in YARN are the fair scheduling and capacity scheduling algorithms. NodeManager: An NM exists per node of the cluster, in a fashion slightly similar to slave nodes in a master-slave architecture. 
When an NM starts, it informs the RM of its availability to share its resources for upcoming jobs. Thereafter the NM sends a periodic signal, also called a heartbeat, to the RM, informing it of its status as being alive in the cluster. Primarily the NM is responsible for launching containers that have been requested by an AM with certain resource requirements such as memory, disk and so on. Once the containers are up and running, the NM keeps a watch not on the status of the container's task but on the resource utilization of the container, and kills a container if it starts utilizing more resources than it has been provisioned for. Apart from managing the life cycle of the containers, the NM also keeps the RM informed about the node's health. ApplicationMaster: An AM gets launched per submitted application and manages the life cycle of the submitted application. However, the first and foremost task the AM does is to negotiate resources from the RM to launch task-specific containers at different nodes. Once containers are launched, the AM keeps track of the status of all the containers' tasks. If any node goes down or a container gets killed, because of using excess resources or otherwise, the AM renegotiates resources from the RM and launches those pending tasks again. The AM also keeps reporting the status of the submitted application directly to the user, and other such statistics to the RM. The ApplicationMaster implementation is framework specific, and it is for this reason that application/framework-specific code is transferred to the AM, and it is the AM that distributes it further across the cluster. This important feature also makes YARN technology agnostic, as any framework can implement its own ApplicationMaster and then utilize the resources of the YARN cluster seamlessly. Container: A container, in an abstract sense, is a set of minimal resources such as CPU, RAM, disk I/O, disk space, etc. that are required to run a task independently on a node. The first container after submitting the job is launched by the RM to host the ApplicationMaster. It is the AM which then negotiates resources from the RM in the form of containers, which then get hosted on different nodes across the Hadoop cluster. Process flow of application submission in YARN: Step 1: Using a client or APIs, the user submits the application, let's say a Spark job JAR. The Resource Manager, whose primary task is to gather and report all the applications running on the entire Hadoop cluster and the available resources on the respective Hadoop nodes, accepts the newly submitted task, depending on the privileges of the user submitting the job. Step 2: After this, the RM delegates the task to its scheduler. The scheduler then searches for a container which can host the application-specific Application Master. While the scheduler does take into consideration parameters like availability of resources, task priority, data locality etc. before scheduling or launching an Application Master, it has no role in monitoring or restarting a failed job. It is the responsibility of the RM to keep track of an AM and restart it in a new container if it fails. Step 3: Once the Application Master gets launched, it becomes the prerogative of the AM to oversee the resource negotiation with the RM for launching task-specific containers. Negotiations with the RM are typically over:    The priority of the tasks at hand.    The number of containers to be launched to complete the tasks.    The resources needed to execute the tasks, i.e. RAM and CPU (since Hadoop 3.x).    
The available nodes where job containers can be launched with the required resources.    Depending on the priority and availability of resources, the RM grants containers, each represented by a container ID and the hostname of the node on which it can be launched. Step 4: The AM then requests the NMs of the respective hosts to launch the containers with the specific IDs and resource configuration. The NMs then launch the containers but keep a watch on the resource usage of the tasks. If, for example, a container starts utilizing more resources than it has been provisioned for, then that container is killed by the NM. This greatly improves the job isolation and the fair sharing of resources that YARN guarantees, as otherwise it would have impacted the execution of other containers. However, it is important to note that the job status and application status as a whole are managed by the AM. It falls in the domain of the AM to continuously monitor any delayed or dead containers, simultaneously negotiating with the RM to launch new containers to reassign the tasks of dead containers. Step 5: The containers executing on different nodes send application-specific statistics to the AM at specific intervals. Step 6: The AM also reports the status of the application directly to the client that submitted the specific application, in our case a Spark job. Step 7: The NM monitors the resources being utilized by all the containers on the respective nodes and keeps sending periodic updates to the RM. Step 8: The AM sends periodic statistics such as application status, task failures and log information to the RM. Overview of MapReduce Before delving deep into the MapReduce implementation in Hadoop, let's first understand MapReduce as a concept in parallel computing and why it is a preferred way of computing. MapReduce comprises two mutually exclusive but dependent phases, each capable of running on two different machines or nodes: Map: In the Map phase, the transformation of the data takes place. It splits the data into key value pairs by splitting it on a keyword. Suppose we have a text file and we would want to do an analysis such as counting the total number of words or the frequency with which each word has occurred in the text file. This is the classical Word Count problem of MapReduce. To address this problem, first we will have to identify the splitting keyword so that the data can be split and converted into key value pairs. Let's begin with John Lennon's song Imagine. Sample Text: Imagine there's no heaven It's easy if you try No hell below us Above us only sky Imagine all the people living for today After running the Map phase on the sampled text and splitting it over <space>, it will get converted into key value pairs as follows: [<imagine, 1> <there's, 1> <no, 1> <heaven, 1> <it's, 1> <easy, 1> <if, 1> <you, 1> <try, 1> <no, 1> <hell, 1> <below, 1> <us, 1> <above, 1> <us, 1> <only, 1> <sky, 1> <imagine, 1> <all, 1> <the, 1> <people, 1> <living, 1> <for, 1> <today, 1>] The key here represents the word and the value represents the count; it should also be noted that we have converted all the keys to lowercase to reduce any further complexity arising out of matching case-sensitive keys. Reduce: The Reduce phase deals with the aggregation of the Map phase result, and hence all the key value pairs are aggregated over the key. 
So the Map output of the text would get aggregated as follows: [<imagine, 2> <there's, 1> <no, 2> <heaven, 1> <it's, 1> <easy, 1> <if, 1> <you, 1> <try, 1> <hell, 1> <below, 1> <us, 2> <above, 1> <only, 1> <sky, 1> <all, 1> <the, 1> <people, 1> <living, 1> <for, 1> <today, 1>] As we can see, the Map and Reduce phases can be run independently of each other and hence can use independent nodes in the cluster to process the data. This approach of separating tasks into smaller units called Map and Reduce has revolutionized general-purpose distributed/parallel computing, which we now know as MapReduce. Apache Hadoop's MapReduce has been implemented pretty much the same way as discussed, except for adding extra features around how the data from the Map phase of each node gets transferred to its designated Reduce phase node. We can describe MR jobs on YARN in five stages. Job Submission Stage: When a client submits an MR job, the following things happen: The RM is requested for an application ID. The input data location is checked and, if present, the file split size is computed. The job's output location must not already exist. If all three conditions are met, then the MR job JAR along with its configuration and the details of the input splits are copied to HDFS in a directory named after the application ID provided by the RM. And then the job is submitted to the RM to launch a job-specific Application Master, MRAppMaster. MAP Stage: Once the RM receives the client's request for launching MRAppMaster, a call is made to the YARN scheduler for assigning a container. As per resource availability, the container is granted and hence the MRAppMaster is launched at the designated node with the provisioned resources. After this, MRAppMaster fetches the input split information from the HDFS path that was submitted by the client and computes the number of Mapper tasks that will be launched based on the splits. Depending on the number of Mappers, it also calculates the required number of Reducers as per the configuration. If MRAppMaster now finds the number of Mappers, Reducers and the size of the input files to be small enough to be run in the same JVM, then it goes ahead in doing so; such a task is called an Uber task. However, in other scenarios, MRAppMaster negotiates container resources from the RM for running these tasks, albeit with Mapper tasks having higher priority, since Mapper tasks must finish before the sorting phase can start. Data locality is another concern for containers hosting Mappers, as data-local nodes are preferred over rack-local ones, with the least preference being given to remote-node-hosted data. But when it comes to the Reduce phase, no such preference of data locality exists for containers. Containers hosting the Mapper function first copy the MapReduce JAR and configuration files locally and then launch a class called YarnChild in the JVM. The Mapper then starts reading the input files, processes them by making key value pairs and writes them to a circular buffer. Shuffle and Sort Phase: Since the circular buffer has a size constraint, after a certain fill percentage, the default being 80, a thread gets spawned which spills the data from the buffer. But before copying the spilled data to disk, it is first partitioned with respect to its Reducer; then the background thread also sorts the partitioned data on the key and, if a combiner is specified, combines the data too. This process optimizes the data once it is copied to its respective partition folder. 
This spilling continues until all the data from the circular buffer has been written to disk. A background thread then checks whether the number of spill files in each partition is within the range of a configurable parameter; if it is not, the files are merged, and the combiner is run over them, until the count falls within that limit.

The Map task keeps reporting its status to the ApplicationMaster throughout its life cycle, and the Reduce tasks start only once 5 percent of the Map tasks have completed. An auxiliary service in the NodeManager, which serves the Reduce tasks, runs a Netty web server; a Reduce task asks the MRAppMaster for the Mapper hosts holding its partition files, and all the partition files that pertain to that Reducer are copied to its node in this fashion. Because multiple files get copied as the data destined for a Reduce node is collected from the various Map nodes, a background thread merges the sorted map files, sorts them again, and, if a combiner is configured, combines the result as well.

Reduce Stage: It is important to note that by this stage every input file of each Reducer has been sorted by key; this is the assumption under which the Reducer starts processing the records and converting the key-value pairs into an aggregated list. Once the Reducer has processed the data, it writes the output to the output folder that was specified during job submission.

Clean up stage: Each Reducer sends periodic updates to MRAppMaster about task completion. Once the Reduce tasks are over, the Application Master starts the clean-up activity: the submitted job's status is changed from running to successful, all the temporary and intermediate files and folders are deleted, and the application statistics are archived to the job history server.

Summary

In this article we looked at HDFS and YARN along with MapReduce, learning about the different functions of MapReduce and HDFS I/O.

Resources for Article:

Further resources on this subject:

Getting Started with Apache Spark DataFrames [article]
Five common questions for .NET/Java developers learning JavaScript and Node.js [article]
Getting Started with Apache Hadoop and Apache Spark [article]

K Nearest Neighbors

Packt
20 Feb 2018
10 min read
In this article by Gavin Hackeling, author of the book Mastering Machine Learning with scikit-learn - Second Edition, we will start with K Nearest Neighbors (KNN), a simple model for regression and classification tasks. It is so simple that its name describes most of its learning algorithm. The titular neighbors are representations of training instances in a metric space. A metric space is a feature space in which the distances between all members of a set are defined.

(For more resources related to this topic, see here.)

For classification tasks, a set of tuples of feature vectors and class labels comprises the training set. KNN is capable of binary, multi-class, and multi-label classification; we will focus on binary classification in this article. The simplest KNN classifiers use the mode of the k nearest neighbors' labels to classify test instances, but other strategies can be used. k is often set to an odd number to prevent ties. In regression tasks, the feature vectors are each associated with a response variable that takes a real-valued scalar instead of a label. The prediction is the mean or weighted mean of the k nearest neighbors' response variables.

Lazy learning and non-parametric models

KNN is a lazy learner. Also known as instance-based learners, lazy learners simply store the training data set with little or no processing. In contrast to eager learners, such as simple linear regression, KNN does not estimate the parameters of a model that generalizes the training data during a training phase. Lazy learning has advantages and disadvantages. Training an eager learner is often computationally costly, but prediction with the resulting model is often inexpensive. For simple linear regression, prediction consists only of multiplying the learned coefficient by the feature, and adding the learned intercept parameter. A lazy learner can begin predicting almost immediately after storing the training data, but each prediction can be costly: in the simplest implementation of KNN, prediction requires calculating the distances between a test instance and all of the training instances.

In contrast to most of the other models we will discuss, KNN is a non-parametric model. A parametric model uses a fixed number of parameters, or coefficients, to define the model that summarizes the data; the number of parameters is independent of the number of training instances. Non-parametric may seem to be a misnomer, as it does not mean that the model has no parameters; rather, non-parametric means that the number of parameters of the model is not fixed, and may grow with the number of training instances. Non-parametric models can be useful when training data is abundant and you have little prior knowledge about the relationship between the response and explanatory variables. KNN makes only one assumption: instances that are near each other are likely to have similar values of the response variable. The flexibility provided by non-parametric models is not always desirable; a model that makes assumptions about the relationship can be useful if training data is scarce or if you already know about the relationship.
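To make the lazy, non-parametric character of KNN concrete, here is a small illustrative sketch, not scikit-learn's implementation; the class name LazyKNNRegressor and the toy data are ours. Fitting only stores the training set, the stored data grows with the number of training instances, and all of the distance computation is deferred to prediction time. It also illustrates the regression case mentioned above, where the prediction is the mean of the k nearest neighbors' response variables.

import numpy as np

class LazyKNNRegressor:
    """Toy KNN regressor: 'training' just stores the data."""
    def __init__(self, k=3):
        self.k = k

    def fit(self, X, y):
        # Lazy learning: no coefficients are estimated here
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        predictions = []
        for x in X:
            # All the work happens at prediction time:
            # distances to every stored training instance
            distances = np.sqrt(((self.X_ - x) ** 2).sum(axis=1))
            nearest = distances.argsort()[:self.k]
            # The prediction is the mean of the k nearest response values
            predictions.append(self.y_[nearest].mean())
        return np.array(predictions)

# Predict weight from height; the stored "model" grows with the training set
heights = [[158], [170], [183], [191], [155], [163], [180]]
weights = [64, 86, 84, 80, 49, 59, 67]
model = LazyKNNRegressor(k=3).fit(heights, weights)
print(model.predict([[175]]))  # mean weight of the 3 nearest heights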
Classification with KNN

The goal of classification tasks is to use one or more features to predict the value of a discrete response variable. Let's work through a toy classification problem. Assume that you must use a person's height and weight to predict his or her sex. This problem is called binary classification because the response variable can take one of two labels. The following table records nine training instances.

height    weight    label
158 cm    64 kg     male
170 cm    86 kg     male
183 cm    84 kg     male
191 cm    80 kg     male
155 cm    49 kg     female
163 cm    59 kg     female
180 cm    67 kg     female
158 cm    54 kg     female
170 cm    67 kg     female

We are now using features from two explanatory variables to predict the value of the response variable. KNN is not limited to two features; the algorithm can use an arbitrary number of features, but more than three features cannot be visualized. Let's visualize the data by creating a scatter plot with matplotlib.

# In[1]:
import numpy as np
import matplotlib.pyplot as plt

X_train = np.array([
    [158, 64],
    [170, 86],
    [183, 84],
    [191, 80],
    [155, 49],
    [163, 59],
    [180, 67],
    [158, 54],
    [170, 67]
])
y_train = ['male', 'male', 'male', 'male', 'female', 'female', 'female', 'female', 'female']

plt.figure()
plt.title('Human Heights and Weights by Sex')
plt.xlabel('Height in cm')
plt.ylabel('Weight in kg')
for i, x in enumerate(X_train):
    # Use 'x' markers for instances that are male and diamond markers for instances that are female
    plt.scatter(x[0], x[1], c='k', marker='x' if y_train[i] == 'male' else 'D')
plt.grid(True)
plt.show()

From the plot we can see that men, denoted by the x markers, tend to be taller and weigh more than women. This observation is probably consistent with your experience. Now let's use KNN to predict whether a person with a given height and weight is a man or a woman. Let's assume that we want to predict the sex of a person who is 155 cm tall and who weighs 70 kg. First, we must define our distance measure. In this case we will use Euclidean distance, the straight-line distance between points in a Euclidean space. The Euclidean distance between two points p and q in a two-dimensional space is given by the following:

d(p, q) = sqrt((p1 - q1)^2 + (p2 - q2)^2)

Next we must calculate the distances between the query instance and all of the training instances.

height    weight    label     Distance from test instance
158 cm    64 kg     male      6.71
170 cm    86 kg     male      21.93
183 cm    84 kg     male      31.30
191 cm    80 kg     male      37.36
155 cm    49 kg     female    21.00
163 cm    59 kg     female    13.60
180 cm    67 kg     female    25.18
158 cm    54 kg     female    16.28
170 cm    67 kg     female    15.30

We will set k to 3, and select the three nearest training instances. The following script calculates the distances between the test instance and the training instances, and identifies the most common sex among the nearest neighbors.

# In[2]:
x = np.array([[155, 70]])
distances = np.sqrt(np.sum((X_train - x)**2, axis=1))
distances
# Out[2]:
array([  6.70820393,  21.9317122 ,  31.30495168,  37.36308338,  21.        ,
        13.60147051,  25.17935662,  16.2788206 ,  15.29705854])

# In[3]:
nearest_neighbor_indices = distances.argsort()[:3]
nearest_neighbor_genders = np.take(y_train, nearest_neighbor_indices)
nearest_neighbor_genders
# Out[3]:
array(['male', 'female', 'female'], dtype='|S6')

# In[4]:
from collections import Counter
b = Counter(np.take(y_train, distances.argsort()[:3]))
b.most_common(1)[0][0]
# Out[4]:
'female'

The following plots the query instance, indicated by the circle, and its three nearest neighbors, indicated by the enlarged markers. Two of the neighbors are female, and one is male. We therefore predict that the test instance is female. Now let's implement a KNN classifier using scikit-learn.
# In[5]:
from sklearn.preprocessing import LabelBinarizer
from sklearn.neighbors import KNeighborsClassifier

lb = LabelBinarizer()
y_train_binarized = lb.fit_transform(y_train)
y_train_binarized
# Out[5]:
array([[1],
       [1],
       [1],
       [1],
       [0],
       [0],
       [0],
       [0],
       [0]])

# In[6]:
K = 3
clf = KNeighborsClassifier(n_neighbors=K)
clf.fit(X_train, y_train_binarized.reshape(-1))
prediction_binarized = clf.predict(np.array([155, 70]).reshape(1, -1))[0]
predicted_label = lb.inverse_transform(prediction_binarized)
predicted_label
# Out[6]:
array(['female'], dtype='|S6')

Our labels are strings, so we first use LabelBinarizer to convert them to integers. LabelBinarizer implements the transformer interface, which consists of the methods fit, transform, and fit_transform. fit prepares the transformer; in this case, it creates a mapping from label strings to integers. transform applies the mapping to input labels. fit_transform is a convenience method that calls fit and transform. A transformer should be fit only on the training set. Independently fitting and transforming the training and testing sets could result in inconsistent mappings from labels to integers; in this case, male might be mapped to 1 in the training set and 0 in the testing set. Fitting on the entire dataset should also be avoided, because for some transformers it will leak information about the testing set into the model. That advantage won't be available in production, so performance measures on the test set may be optimistic. We will discuss this pitfall more when we extract features from text.

Next, we initialize a KNeighborsClassifier. Even though KNN is a lazy learner, it still implements the estimator interface; we call fit and predict just as we did with our simple linear regression object. Finally, we use our fitted LabelBinarizer to reverse the transformation and return a string label.

Now let's use our classifier to make predictions for a test set, and evaluate its performance.

height    weight    label
168 cm    65 kg     male
180 cm    96 kg     male
160 cm    52 kg     female
169 cm    67 kg     female

# In[7]:
X_test = np.array([
    [168, 65],
    [180, 96],
    [160, 52],
    [169, 67]
])
y_test = ['male', 'male', 'female', 'female']
y_test_binarized = lb.transform(y_test)
print('Binarized labels: %s' % y_test_binarized.T[0])
predictions_binarized = clf.predict(X_test)
print('Binarized predictions: %s' % predictions_binarized)
print('Predicted labels: %s' % lb.inverse_transform(predictions_binarized))
# Out[7]:
Binarized labels: [1 1 0 0]
Binarized predictions: [0 1 0 0]
Predicted labels: ['female' 'male' 'female' 'female']

By comparing our test labels to our classifier's predictions, we find that it incorrectly predicted that one of the male test instances was female. There are two types of errors in binary classification tasks: false positives and false negatives. There are many performance measures for classifiers; some measures may be more appropriate than others depending on the consequences of the types of errors in your application. We will assess our classifier using several common performance measures, including accuracy, precision, and recall.
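Before computing the summary scores, one compact way to see both kinds of errors at once is a confusion matrix. This snippet is an addition of ours rather than part of the original listing, but it only uses scikit-learn's standard confusion_matrix function on the arrays defined above.

# Rows are the true classes and columns are the predicted classes;
# with our binarization, 0 is female and 1 is male
from sklearn.metrics import confusion_matrix
print(confusion_matrix(y_test_binarized, predictions_binarized))
# Expected output for this test set:
# [[2 0]
#  [1 1]]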
Accuracy is the proportion of test instances that were classified correctly. Our model classified one of the four instances incorrectly, so the accuracy is 75%.

# In[8]:
from sklearn.metrics import accuracy_score
print('Accuracy: %s' % accuracy_score(y_test_binarized, predictions_binarized))
# Out[8]:
Accuracy: 0.75

Precision is the proportion of test instances that were predicted to be positive that are truly positive. In this example the positive class is male; the assignment of male and female to the positive and negative classes is arbitrary, and could be reversed. Our classifier predicted that one of the test instances is the positive class. This instance is truly the positive class, so the classifier's precision is 100%.

# In[9]:
from sklearn.metrics import precision_score
print('Precision: %s' % precision_score(y_test_binarized, predictions_binarized))
# Out[9]:
Precision: 1.0

Recall is the proportion of truly positive test instances that were predicted to be positive. Our classifier predicted that one of the two truly positive test instances is positive. Its recall is therefore 50%.

# In[10]:
from sklearn.metrics import recall_score
print('Recall: %s' % recall_score(y_test_binarized, predictions_binarized))
# Out[10]:
Recall: 0.5

Sometimes it is useful to summarize precision and recall with a single statistic, called the F1-score or F1-measure. The F1-score is the harmonic mean of precision and recall; with a precision of 1.0 and a recall of 0.5, that is 2 * (1.0 * 0.5) / (1.0 + 0.5), or approximately 0.67.

# In[11]:
from sklearn.metrics import f1_score
print('F1 score: %s' % f1_score(y_test_binarized, predictions_binarized))
# Out[11]:
F1 score: 0.666666666667

Note that the arithmetic mean of the precision and recall scores is an upper bound of the F1 score: the F1 score penalizes classifiers more as the difference between their precision and recall scores increases.

Finally, the Matthews correlation coefficient (MCC) is an alternative to the F1 score for measuring the performance of binary classifiers. A perfect classifier's MCC is 1, a trivial classifier that predicts randomly will score 0, and a perfectly wrong classifier will score -1. MCC is useful even when the proportions of the classes in the test set are severely imbalanced.

# In[12]:
from sklearn.metrics import matthews_corrcoef
print('Matthews correlation coefficient: %s' % matthews_corrcoef(y_test_binarized, predictions_binarized))
# Out[12]:
Matthews correlation coefficient: 0.57735026919

scikit-learn also provides a convenience function, classification_report, that reports the precision, recall, and F1 score.

# In[13]:
from sklearn.metrics import classification_report
print(classification_report(y_test_binarized, predictions_binarized, target_names=['male'], labels=[1]))
# Out[13]:
             precision    recall  f1-score   support

       male       1.00      0.50      0.67         2

avg / total       1.00      0.50      0.67         2

Summary

In this article we learned about K Nearest Neighbors, seeing that KNN is a lazy learner as well as a non-parametric model, and we worked through classification with KNN.

Resources for Article:

Further resources on this subject:

Introduction to Scikit-Learn [article]
Machine Learning in IPython with scikit-learn [article]
Machine Learning Models [article]