How-To Tutorials


Hierarchical Clustering

Packt
07 Feb 2017
6 min read
In this article by Atul Tripathi, author of the book Machine Learning Cookbook, we will cover hierarchical clustering with a World Bank sample dataset.

Introduction

Hierarchical clustering is one of the most important methods in unsupervised learning. For a given set of data points, hierarchical clustering produces its output in the form of a binary tree (dendrogram), in which the leaves represent the data points and the internal nodes represent nested clusters of various sizes. The algorithm proceeds as follows: each object is first assigned to its own cluster, and a pairwise distance matrix is constructed from the distances between clusters. The pair of clusters with the shortest distance is merged and removed from the matrix, the distance between the merged cluster and the remaining clusters is evaluated, and the distance matrix is updated. The process is repeated until the distance matrix is reduced to a single element.

Hierarchical clustering - World Bank sample dataset

One of the main goals in establishing the World Bank has been to fight and eliminate poverty. Continuously evolving and fine-tuning its policies in an ever-changing world has helped the institution work towards this goal. Success in eliminating poverty is measured in terms of improvement in health, education, sanitation, infrastructure, and the other services needed to improve the lives of the poor. These development gains must be pursued in an environmentally, socially, and economically sustainable manner.

Getting ready

In order to perform hierarchical clustering, we shall use a dataset collected from the World Bank.

Step 1 - collecting and describing data

The dataset titled WBClust2013 shall be used. It is available in CSV format as WBClust2013.csv and is in standard format, with 80 rows of data and 14 variables. The numeric variables are: new.forest, Rural, log.CO2, log.GNI, log.Energy.2011, LifeExp, Fertility, InfMort, log.Exports, log.Imports, CellPhone, RuralWater, Pop. The non-numeric variable is: Country.

How to do it

Step 2 - exploring data

Version info: code for this page was tested in R version 3.2.3 (2015-12-10).

Let's explore the data and understand the relationships among the variables. We'll begin by importing the CSV file named WBClust2013.csv and saving the data to the wbclust data frame:

> wbclust=read.csv("d:/WBClust2013.csv",header=T)

Next, we shall print the wbclust data frame. The head() function returns the first rows of the data frame passed to it as an input parameter:

> head(wbclust)

The results are as follows:

Step 3 - transforming data

Centering variables and creating z-scores are two common data analysis activities used to standardize data. The numeric variables listed above need to be converted to z-scores. The scale() function is a generic function whose default method centers and/or scales the columns of a numeric matrix. The data frame wbclust is passed to the scale() function; only the numeric fields are considered. The result is then stored in another data frame, wbnorm:

> wbnorm<-scale(wbclust[,2:13])
> wbnorm

The results are as follows:

All data frames have a row names attribute. In order to retrieve or set the row or column names of a matrix-like object, the rownames() function is used. The first column of the wbclust data frame is passed to the rownames() function:

> rownames(wbnorm)=wbclust[,1]
> rownames(wbnorm)

The call to rownames(wbnorm) results in the display of the values from the first column. The results are as follows:

Step 4 - training and evaluating the model performance

The next step is to train the model. First, we calculate the distance matrix using the dist() function, which computes the distances between the rows of a data matrix using a specified distance measure. The distance measure can be Euclidean, maximum, Manhattan, Canberra, binary, or Minkowski; here we use Euclidean. The Euclidean distance between two vectors is calculated as sqrt(sum((x_i - y_i)^2)). The result is stored in a new object, dist1:

> dist1<-dist(wbnorm, method="euclidean")

The next step is to perform clustering using Ward's method. The hclust() function performs cluster analysis on a set of dissimilarities of the n objects. At the first stage, each object is assigned to its own cluster; the algorithm then iterates, at each stage joining the two most similar clusters, until a single cluster is left. The hclust() function requires the data in the form of a distance matrix, so dist1 is passed. By default, the complete linkage method is used, but multiple agglomeration methods are available, such as ward.D, ward.D2, single, complete, and average:

> clust1<-hclust(dist1,method="ward.D")
> clust1

The call to clust1 results in the display of the agglomeration method used, the manner in which the distance was calculated, and the number of objects. The results are as follows:

Step 5 - plotting the model

The plot() function is a generic function for plotting R objects. Here, it is used to draw the dendrogram:

> plot(clust1,labels= wbclust$Country, cex=0.7, xlab="",ylab="Distance",main="Clustering for 80 Most Populous Countries")

The result is as follows:

The rect.hclust() function highlights the clusters by drawing rectangles around the branches of the dendrogram. The dendrogram is first cut at a certain level, and a rectangle is then drawn around the selected branches. The object clust1 is passed to the function along with the number of clusters to be formed:

> rect.hclust(clust1,k=5)

The result is as follows:

The cutree() function cuts the tree into multiple groups on the basis of the desired number of groups or the cut height. Here, clust1 is passed to the function along with the desired number of groups:

> cuts=cutree(clust1,k=5)
> cuts

The result is as follows:

Finally, we get the list of countries in each group. The result is as follows:

Summary

In this article we covered hierarchical clustering: we collected the data, explored its contents, and transformed it; we then trained and evaluated the model using a distance matrix, and finally plotted the result as a dendrogram.
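The recipe above is written entirely in R, as in the book. Purely as an illustration (and not part of the original article), the same workflow can be sketched in Python with pandas and scipy; the CSV path and column positions below are assumptions based on the dataset description, so treat this as a rough equivalent rather than a drop-in replacement.

import pandas as pd
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# The path and column order are assumptions based on the dataset described above.
wbclust = pd.read_csv("WBClust2013.csv")
countries = wbclust.iloc[:, 0]                        # Country names (first column)
numeric = wbclust.iloc[:, 1:13]                       # the twelve numeric variables
wbnorm = (numeric - numeric.mean()) / numeric.std()   # z-scores, like scale() in R

# Ward linkage on Euclidean distances, as in dist()/hclust(..., method="ward.D")
Z = linkage(wbnorm, method="ward")

# Cut the tree into five groups, analogous to cutree(clust1, k=5),
# then list the countries in each group
cuts = fcluster(Z, t=5, criterion="maxclust")
for group in range(1, 6):
    print(f"Group {group}:", ", ".join(countries[cuts == group]))

# Dendrogram labelled by country, analogous to plot(clust1, labels=...)
dendrogram(Z, labels=countries.to_numpy(), leaf_font_size=7)
plt.ylabel("Distance")
plt.title("Clustering for 80 Most Populous Countries")
plt.show()

Here fcluster with criterion="maxclust" plays the role of cutree(clust1, k=5), returning a group label for each country.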


Create Your First Augmented Reality Experience: The Tools and Terms You Need to Understand

Andreas Zeitler
06 Feb 2017
5 min read
This post gives a short summary of the different methods and tools used to create AR experiences with today's technology. We also outline the advantages and drawbacks of specific solutions. In a follow-up post, we outline how to create your own AR experience.

Augmented Reality (AR)

Augmented Reality (AR) is the "it" thing of the moment thanks to Pokémon Go and the Microsoft Hololens. But how can you create your own AR "thing"? The good news is you don't need to learn a new programming language. Today, there are tools available for most languages used for application development. The AR software out there is not packaged in the most user-friendly way, so most of the time it's a bit tricky to even get the samples provided by the software company itself up and running. The really bad news is that AR is one of those "native-only" features. This means that it cannot yet be achieved using web technologies (JavaScript and HTML5) only, or at least not with any real-world, production-grade performance and fidelity. Consequently, in order to run our AR experience on the intended device (a smartphone, a tablet, or a Windows PC or tablet with a camera) you need to wrap it into an app, which you need to build yourself. I am including instructions on how to run an app with your own code and AR included for each programming language below.

3D Models

First, there's some more bad news: AR is primarily a visual medium, so you will need great content to create great experiences. "Content" in this case means 3D models optimized for use in real-time rendering on a mobile device. This should not keep you from trying it, because you can always get great, free, or open source models in the OBJ, FBX, or Collada file formats on Turbosquid or the Unity 3D Asset Store. These are the most common 3D file exchange formats, which can be exported by pretty much any software. This will suffice for now.

One more thing before we dive into the code: markers. Or triggers. Or target images. Or trackables. You will encounter these terms often, and they all refer to the same thing. For a long time now, in order to be able to do AR at all, you needed to "teach" your app a visual pattern to look for in the real world. Once found, the position of the pattern in the real world provides the frame of reference for positioning the 3D content in the camera picture. To make it easy, this pattern is an image, which you store inside your app either as a plain image or in a special format (depending on the AR framework you use). You can read more about what makes a good AR pattern on the Vuforia Developer portal (be sure to check out "Natural Features and Image Ratings" and "Local Contrast Enhancement" for more theoretical background). Please note: the augmentation will only work as long as the AR pattern is actually visible to the camera; once it moves outside the field of view, the 3D model vanishes again. That's a drawback.

Using a visual AR pattern to do augmentations can have one benefit, though: in cases where you know the exact dimensions of the image the AR uses to set the frame of reference (for example, a magazine cover), you can configure the AR software so that the scale of the real world and the scale of the virtual frame of reference in your app match. A 3D model of a desk which is 2 feet tall will then also be 2 feet tall when projected into a room using the magazine as the AR pattern. This enables life-size, 1:1 scale AR projections.

Life-size AR projection with fixed-size image pattern

SLAM

The opposite of using a pattern for AR is called SLAM. It's a method of creating a frame of reference for AR without any known patterns within the camera's field of view. This works reasonably well with today's software and hardware and has one major benefit: no printed AR marker is needed. The AR can just start; you don't need to worry about the marker being in the camera's field of view. So, why is this not the default way of doing it? Firstly, the algorithm is so expensive to run that it drains the battery of a smartphone in minutes (well, half an hour, but still). Secondly, it loses "scale" altogether. SLAM-based AR tracking can detect the environment and its horizon (that is, the plane that is perpendicular to gravity, based on the phone's gravity sensor), but it cannot detect the environment's scale. If you were to run a SLAM-based augmentation on three different devices in a row, the projected 3D model would likely be a different size every time. The quality of the software you use will make this more or less noticeable. Some frameworks offer hybrid SLAM tracking methods, which are started once a pre-defined pattern is detected. This is a good compromise; it's called "Extended Tracking" or "Instant Tracking." If you encounter an option for something like this, enable it.

Summary

This is pretty much all of the theory you need to understand before you can create your first AR experience. We will tell you how to do just that, in a programming language you already know, in a follow-up post.

About the Author

Andreas is the Founder and CEO of Vuframe. He's been working with Augmented & Virtual Reality on a daily basis for the past 8 years. Vuframe's mission is to democratize AR & VR by removing the tech barrier for everyone.


Building Data Driven Application

Packt
06 Feb 2017
8 min read
In this article by Ashish Srivastav, the author of the book ServiceNow Cookbook, we will learn the following recipes:

Starting a new application
Getting into new modules

Service-Now provides a great platform to developers, although it is a Java-based application.

Starting a new application

Out of the box, Service-Now provides many applications for facilitating business operations and other activities in an IT environment, but if you find that the customer's requirements do not fit within the boundaries of the system's applications, then you can think of creating new applications. In these recipes, we will build a small application which will include a table, a form, business rules, a client script, ACLs, update sets, deployment, and so on.

Getting ready

To step through this recipe, you must have an active Service-Now instance, valid credentials, and an admin role.

How to do it…

Open any standard web browser and type the instance address. Log in to the Service-Now instance with the credentials. On the left-hand side, in the search box, type local update and Service-Now will search the module for you:

Local update set for a new application

Now, you need to click on Local Update Sets to create a new update set so that you can capture all your configuration in the update set:

Create new update set

On clicking, you need to give the name as Book registration application and click on the Submit and Make Current button:

Local update set – book registration application

Now you will be able to see the Book registration application update set next to your profile, which means you are ready to create a new application:

Current update set

On the left-hand side, in the search box, type system definition and click on the Application Menus module:

Application menu to create a new application

To understand this better, you can click on any application menu, where you can see the application and its associated modules. For example, you can click on the self-service application as shown here:

Self-service application

Now, to see the associated modules, you need to scroll down as shown here, and if you want to create a new module, you need to click on the New button:

Self-service application's modules

Now you should have a better understanding of how applications look within the Service-Now environment. So, to make a new application, you need to click on the New button on the Application Menus page:

Applications repository

After clicking on the New button, you will be able to see the configuration page. To understand this better, let's consider an example: suppose you are creating a Book Registration application for your customer, with the following configuration:

Title: Book Registration
Role: Leave blank
Application: Global
Active: True
Category: Custom Applications (you can change it as per your requirement)

Book registration configuration

Click on the Submit button. After some time, a new Book Registration application menu will be visible under the application menu:

Book registration

Getting into new modules

A module is a part of an application which contains the actual workable items. As a developer, you will always have the option to add a new module to support business requirements.

Getting ready

To step through this recipe, you should have an active Service-Now instance, valid credentials, and an admin role.

How to do it…

Open any standard web browser and type the instance address. Log into the Service-Now instance with the credentials.

Service-Now gives you many options to create a new module. You can create a new module from the Application Menu module, or you can go through the Tables & Columns module as well. If you have chosen to create a new module from the Application Menu module, then in order to create the module, click on the Book Registration application menu and scroll down. To create a new module, click on the New button:

Creating a New Module

After clicking on the New button, you will be able to see a configuration screen, where you need to configure the following fields:

Title: Author Registration
Application Menu: Book Registration

The Author Registration module registration under the Book Registration menu

Now, in the Link Type section, you will need to configure the new module; rather, I would say, you will need to define the baseline of what your new module will do: whether the new module will show a form to create a record, show a list of records from a table, or execute some reports. That's why this is a critical step:

Link type to create a new module and select a table

Link type gives you many options to decide the behavior of the new module.

Link type options

Now, let's take a different approach to create a new module. On the left-hand side, type column and Service-Now will search the Tables & Columns module for you. Click on the Tables & Columns module:

The Tables & Columns module

Now you will be able to see the configuration page as shown here, where you need to click on the Create Table button. Note that by clicking on Create Table, you can create a new table:

Tables & Columns – Create a new table

After clicking on the Create Table button, you will be able to see the configuration page as given here, where you need to configure the following fields:

Label: Author Registration (module name)
Name: Auto populated
Extends table: Task (by extending, your module will incorporate all fields of the base table, in this scenario the Task table)

Module Configuration

Create module: To create a new module through the table, check the Create module checkbox, and to add the module under your application, you will need to select the application name:

Add module in application

Controls is a critical section, as out of the box Service-Now gives an option to auto-create numbers. For incidents, INC is the prefix, and for change tickets, CHG is the prefix; here you are also allowed to create your own prefix for the new module's records:

Configure the new module Controls section

Now you will be able to see the auto-numbering configuration, as shown here; your new records will start with the AUT prefix:

New module auto numbering

Click on Submit. After submission of the form, Service-Now will automatically create a role for the new module, as shown here. Only holders of the u_author_registration_user role will be able to view the module. So whenever a request is generated, you will need to go into that particular user's profile from the user administration module to add the role:

Role created for module

Your module is created, but there are no fields. So, for rapid development, you can directly add a column to the table by clicking on Insert a new row...:

The Insert field in the form

As an output, you will be able to see that a new Author Registrations module has been added under the Book Registration application:

Search newly created module

Now, if you click on the Author Registrations module, you will be able to see the following page:

The Author Registration Page

On clicking the New button, you will be able to see the form as shown here. Note that you have not added any fields to the u_author_registration table, but the table extends the TASK table. That's why you are able to see fields on the form; they are coming from the TASK table:

Author registration form without new fields

If you want to add new fields to the form, then you can do so by performing the following steps: right-click on the banner, select Configure, and click on Form Layout. Now, in the Create new field section, you can enter the desired field Name and Type, as shown here:

Form Fields - Field Type
Author Name - String
Author Primary Email Address - String
Author Secondary Email Address - String
Author Mobile Number - String
Author Permanent Address - String
Author Temporary Address - String
Country - India, USA, UK, Australia
Author Experience - String
Book Type - Choice
Book Title - String
Contract Start Date - String
Contract End Date - String

Author registration form fields

Click on Add and Save. After saving the form, the new fields are added to the Author Registration form:

Author registration form

To create a new form section in the form view and section, click on the New option and add the name:

Create a new section

After creating the new Payment Details section, you can add fields under this section:

Form Fields - Field Type
Country - String
Preferred Payment Currency - Currency
Bank Name - String
Branch Name - String
IFSC Code - String
Bank Address - String

Payment Details section fields

As an output, you will be able to see the following screen:

Author registration Payment Details form section

Summary

In this article we have learned how Service-Now provides a great platform to developers, how to start a new application, and how to get into new modules.
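As a side note that goes beyond the recipe itself, once the Author Registrations module exists you can also read its records programmatically through ServiceNow's REST Table API. The sketch below is illustrative only: the instance URL and credentials are placeholders, and u_author_registration and u_author_name are assumed to be the system names generated for the table and the Author Name field created above.

import requests

INSTANCE = "https://dev12345.service-now.com"   # hypothetical instance URL
TABLE = "u_author_registration"                 # table created in the recipe above

response = requests.get(
    f"{INSTANCE}/api/now/table/{TABLE}",
    params={"sysparm_limit": 10},               # return at most ten records
    auth=("admin", "password"),                 # placeholder credentials
    headers={"Accept": "application/json"},
    timeout=10,
)
response.raise_for_status()

# Each record is returned as JSON; "number" exists because the table extends
# Task, and u_author_name is assumed to be the custom Author Name field.
for record in response.json()["result"]:
    print(record.get("number"), record.get("u_author_name"))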


BLE and the Internet of Things

Packt
02 Feb 2017
11 min read
In this article by Muhammad Usama bin Aftab, the author of the book Building Bluetooth Low Energy (BLE) Systems, we take a practical tour of the world of the Internet of Things (IoT), where readers will not only learn the theoretical concepts of the Internet of Things but also get a number of practical examples. The purpose of this article is to bridge the gap between the knowledge base and its interpretation. Much literature is available for understanding this domain, but it is difficult to find something that follows a hands-on approach to the technology. In this article, readers will get an introduction to the Internet of Things with a special focus on Bluetooth Low Energy (BLE). It is easy to argue that the most important technology for the Internet of Things is Bluetooth Low Energy, as it is widely available throughout the world and almost every cell phone user keeps this technology in their pocket. The article will then go beyond Bluetooth Low Energy and discuss many other technologies available for the Internet of Things.

In this article we'll explore the following topics:

Introduction to the Internet of Things
Current statistics about IoT and how we are living in a world that is moving towards Machine to Machine (M2M) communication
Technologies in IoT (Bluetooth Low Energy, Bluetooth beacons, Bluetooth mesh, wireless gateways, and so on)
Typical examples of IoT devices (covering wearables, sports gadgets, autonomous vehicles, and so on)

Internet of Things

The Internet is a system of interconnected devices which uses a full stack of protocols over a number of layers. In the late 1960s, the first packet-switched network, ARPANET, was introduced by the United States Department of Defense (DOD), and it used a variety of protocols. Later, with the invention of the TCP/IP protocols, the possibilities were infinite. Many standards evolved over time to facilitate the communication between devices over a network. Application layer protocols, routing layer protocols, access layer protocols, and physical layer protocols were designed to successfully transfer Internet packets from the source address to the destination address. Security risks were also taken care of during this process, and now we live in a world where the Internet is an essential part of our lives.

The world has progressed far beyond ARPANET, and the scientific community realized that the need to connect more and more devices was inevitable. Thus came the need for more Internet addresses. The Internet Protocol version 6 (IPv6) was developed to support an almost infinite number of devices. It uses 128-bit addresses, allowing 2^128 (about 3.4 x 10^38) devices to successfully transmit packets over the Internet. With this powerful addressing mechanism, it became possible to think beyond traditional communication over the Internet. The availability of more addresses opened the way to connect more and more devices. Although there are other limitations to expanding the number of connected devices, the addressing scheme opened up significant possibilities.

Modern day IoT

The idea of the modern day Internet of Things is not significantly old. In 2013, the perception of the Internet of Things evolved, the reasons being the merging of wireless technologies, increases in the range of wireless communication, and significant advancements in embedded technology. It became possible to connect devices, buildings, light bulbs, and theoretically any device which has a power source and can be connected wirelessly. The combination of electronics, software, and network connectivity has already shown enough marvels in the computer industry in the last century, and the Internet of Things is no different.

The Internet of Things is a network of connected devices that are aware of their surroundings. Those devices are constantly or eventually transferring data to their neighboring devices in order to fulfil certain responsibilities. These devices can be automobiles, sensors, lights, solar panels, refrigerators, heart monitoring implants, or any day-to-day device. These things have their own dedicated software and electronics to support the wireless connectivity. They also implement the protocol stack and the application-level programming to achieve the required functionality:

An illustration of connected devices in the Internet of Things

Real life examples of the Internet of Things

The Internet of Things is fascinatingly spread throughout our surroundings, and the best way to check it is to go to a shopping mall and turn on your Bluetooth. The devices you will see are merely a drop in the bucket of the Internet of Things. Cars, watches, printers, jackets, cameras, light bulbs, street lights, and other devices that were too simple before are now connected and continuously transferring data. Keep in mind that this progress in the Internet of Things is only 3 years old, and it is not improbable to expect that the adoption rate of this technology will be something we have never seen before. The last decade tells us that the increase in Internet users was exponential: the first billion was reached in 2005, the second in 2010, and the third in 2014. Currently, there are 3.4 billion Internet users in the world. Although this trend looks unrealistic, the adoption rate of the Internet of Things is even more rapid. Reports say that by 2020 there will be 50 billion connected devices in the world and 90 percent of vehicles will be connected to the Internet. This expansion will bring $19 trillion in profits by the same year. By the end of this year, wearables will become a $6 billion market with 171 million devices sold.

As the article suggests, we will discuss different kinds of IoT devices available in the market today. The article will not cover them all, but enough for the reader to get an idea about the possibilities in the future. The reader will also be able to define and identify potential candidates for future IoT devices.

Wearables

The most important and widely recognized form of the Internet of Things is wearables. In the traditional definition, wearables can be any item that can be worn. Wearable technology can range from fashion accessories to smart watches. The Apple Watch is a prime example of wearable technology. It contains fitness tracking and health-oriented sensors/apps which work with iOS and other Apple products. A competitor of the Apple Watch is the Samsung Gear S2, which provides compatibility with Android devices and fitness sensors. Likewise, there are many other manufacturers building smart watches, including Motorola, Pebble, Sony, Huawei, Asus, LG, and Tag Heuer. These devices are more than just watches as they form a part of the Internet of Things—they can now transfer data, talk to your phone, read your heart rate, and connect directly to Wi-Fi. For example, a watch can now keep track of your steps and transfer this information to the cellphone:

Fitbit Blaze and Apple Watch

The fitness tracker

The fitness tracker is another important example of the Internet of Things, where the physical activities of an athlete are monitored and maintained. Fitness wearables are not confined to bands; there are smart shirts that monitor the fitness goals and progress of the athlete. We will discuss two examples of fitness trackers in this article: Fitbit and Athos smart apparel. The Blaze is a new product from Fitbit which resembles a smart watch; however, it is a fitness-first watch targeted at the fitness market. It provides step tracking, sleep monitoring, and 24/7 heart rate monitoring. Some of Fitbit's competitors, like Garmin's vívoactive watch, provide built-in GPS capability as well. Athos apparel is another example of a fitness wearable, which provides heart rate and EMG sensors. Unlike watch fitness trackers, its sensors are spread across the apparel. The theoretical definition of wearables may also include augmented and virtual reality headsets and Bluetooth earphones/headphones.

Smart home devices

The evolution of the Internet of Things is transforming the way we live our daily lives as people use wearables and other Internet of Things devices every day. Another growing technology in the field of the Internet of Things is the smart home. Home automation, sometimes referred to as the smart home, results from extending the home by including automated controls for things like heating, ventilation, lighting, air-conditioning, and security. This concept is fully supported by the Internet of Things, which demands the connection of devices in an environment. Although the concept of smart homes has existed for several decades, it remained a niche technology that was either too expensive to deploy or too limited in capability. In the last decade, many smart home devices have been introduced into the market by major technology companies, lowering costs and opening the doors to mass adoption.

Amazon Echo

A significant development in the world of home automation was the launch of the Amazon Echo in late 2014. The Amazon Echo is a voice-enabled device that performs tasks just by recognizing voice commands. The device responds to the name Alexa, a keyword that can be used to wake up the device and perform a number of tasks. This keyword can be followed by a command to perform specific tasks. Some basic commands that can be used to fulfil home automation tasks are:

Alexa, play some Adele.
Alexa, play playlist XYZ.
Alexa, turn the bedroom lights on (Bluetooth-enabled light bulbs, for example Philips Hue, must be present in order to fulfil this command).
Alexa, turn the heat up to 80 (a connected thermostat must be present to execute this command).
Alexa, what is the weather?
Alexa, what is my commute?
Alexa, play audiobook A Game of Thrones.
Alexa, Wikipedia Packt Publishing.
Alexa, how many teaspoons are in one cup?
Alexa, set a timer for 10 minutes.

With these voice commands, Alexa is fully operable:

Amazon Echo, Amazon Tap and Amazon Dot (from left to right)

The Amazon Echo's main connectivity is through Bluetooth and Wi-Fi. Wi-Fi connectivity enables it to connect to the Internet and to other devices present on the network or worldwide. Bluetooth Low Energy, on the other hand, is used to connect to other devices in the home which are Bluetooth Low Energy capable. For example, Philips Hue bulbs and thermostats are controlled through Bluetooth Low Energy. At Google I/O 2016, Google announced a competing smart home device that will use Google as a backbone to perform various tasks, similar to Alexa. Google intends to use this device to further increase their presence in the smart home market, challenging Amazon and Alexa. Amazon also launched the Amazon Dot and Amazon Tap. The Amazon Dot is a smaller version of the Echo which does not have speakers; external speakers can be connected to the Dot in order to get full access to Alexa. The Amazon Tap is a more affordable, cheaper, and wireless version of the Amazon Echo.

Wireless bulbs

The Philips Hue wireless bulb is another example of a smart home device. It is a Bluetooth Low Energy connected light bulb that gives full control to the user through their smartphone. These colored bulbs can display millions of colors and can also be controlled remotely through the away-from-home feature. The lights are also smart enough to sync with music:

Illustration of controlling Philips Hue bulbs with smartphones

Smart refrigerators

Our discussion of home automation would not be complete without discussing kitchen and other household electronics, as several major vendors such as Samsung have begun offering smart appliances for a smarter home. The Family Hub refrigerator is a smart fridge that lets you access the Internet and run applications. It is also categorized as an Internet of Things device as it is fully connected to the Internet and provides various controls to the users:

Samsung Family Hub refrigerator with touch controls

Summary

In this article we spoke about Internet of Things technology and how it is taking root in our real lives. The introduction to the Internet of Things discussed wearable devices, autonomous vehicles, smart light bulbs, and portable media streaming devices. Internet of Things technologies like Wireless Local Area Network (WLAN), Mobile Ad-hoc Networks (MANETs), and Zigbee were discussed in order to give a better understanding of the available choices in the IoT.


Introduction to Microsoft Dynamics NAV 2016

Packt
02 Feb 2017
9 min read
In this article by Alexander Drogin, author of the book Extending Microsoft Dynamics NAV 2016 Cookbook, we will see a basic introduction to Microsoft Dynamics NAV 2016 and C/AL programming.

Microsoft Dynamics NAV 2016 has a rich toolset for extending functionality. In addition, a wide range of external tools can be connected to the NAV database to enrich the data processing and analysis experience. But C/AL, NAV's internal application language, is still the primary means of enhancing the user experience when it comes to developing new functionality. C/AL development is framed around objects representing different kinds of functionality, and designers associated with each object type.

Basic C/AL programming

As an example, let us create a simple function that receives a number as a parameter and verifies whether its checksum satisfies given criteria. It is a typical task for a developer working on an ERP system to implement a verification such as the Luhn algorithm, which is widely used to validate identification numbers, such as credit card numbers.

How to do it...

In the Object Designer window, create a new codeunit. Assign a number and name (for example, 50000, Luhn Verification).

Click View, then C/AL Globals; in the Globals window, open the Functions tab. In the first line of the table, enter the function name: SplitNumber. The verification function receives a BigInteger number as an input parameter, but the algorithm works with separate digits. Therefore, before starting the validation we need to convert the number into an array of digits.

Position in the SplitNumber function you just declared and click View, then C/AL Locals. The first tab in the Locals window is Parameters. Enter the first parameter of the function:

Name: Digits
DataType: Byte
Var: Set the checkmark in this field

Still in the C/AL Locals window, click View, Properties to open the variable properties and set Dimensions = 11.

Close the variable properties window and add the second function parameter, AccountNumber, with type BigInteger. The Parameters window should now look like this:

Next, navigate to the Variables tab. Insert a variable i of Integer type. Close the local declarations window to return to the code editor and type the function code:

FOR i := 11 DOWNTO 1 DO BEGIN
  Digits[i] := AccountNumber MOD 10;
  AccountNumber := AccountNumber DIV 10;
END;

Open the C/AL Globals again and insert the second function, VerifyCheckSum. This is the main function that implements the verification algorithm. In the C/AL Locals window, insert a single parameter of this function, AccountNumber, of type BigInteger. Navigate to the Return Value tab and fill in the Return Type field. In this case the type should be Boolean.

In the C/AL Locals window, declare three local variables as follows:

Name: Digits, Data Type: Byte
Name: CheckSum, Data Type: Integer
Name: i, Data Type: Integer

Select the Digits variable, open its properties, and set Dimensions to 11. Type the function code:

SplitNumber(Digits,AccountNumber);
FOR i := 1 TO 10 DO BEGIN
  IF i MOD 2 = 1 THEN
    CheckSum += (Digits[i] * 2) DIV 10 + (Digits[i] * 2) MOD 10;
END;
CheckSum := 10 - (CheckSum * 9) MOD 10;
EXIT(CheckSum = Digits[11]);

In the OnRun trigger, place the code that will call the verification function:

IF VerifyCheckSum(79927398712) THEN
  MESSAGE(CorrectCheckSumMsg)
ELSE
  MESSAGE(WrongCheckSumMsg);

To present the execution result to the user, OnRun uses two text constants that we have not declared yet. To do so, open C/AL Globals in the View menu. In the Text Constants tab, enter the following values:

Name: CorrectCheckSumMsg, Value: Account number has correct checksum
Name: WrongCheckSumMsg, Value: Account number has wrong checksum

How it works...

The function SplitNumber described in Steps 1 through 8 uses a FOR..DOWNTO loop with a loop control variable to iterate over all digits in the BigInteger number, starting from the last digit. In each step the number is divided by 10 using the integer division function DIV. The modulus division function MOD returns the remainder of this division, which is placed in the corresponding element of an array. The Dimensions property of the parameter Digits indicates that this variable is an array consisting of 11 elements (the value of Dimensions is the number of elements; a variable with undefined dimensions is a scalar).

When a function is called, it can receive arguments either by value or by reference. The Var checkmark in the parameter declaration means that the argument will be passed to the function by reference, so all changes made to the Digits array in the function SplitNumber will be reflected in the function VerifyCheckSum that calls SplitNumber. Arrays cannot be function return values in C/AL, so passing an array variable by reference is the only way to send arrays between functions.

The function VerifyCheckSum defined in Steps 9 to 13 calls the helper function SplitNumber and then uses the same loop type, but iterates from the first digit to the last (FOR 1 TO 10). This loop computes the checksum, which is compared to the last digit of the account number. If the two values match, the checksum is correct. Finally, the function returns the Boolean value conveying the verification result, TRUE or FALSE. Based on this result, the OnRun function in the codeunit will display one of the two text constants in a message. In the given example the checksum is incorrect, so the message will look like this:

To see the message for a correct result, replace the last digit in the account number with 3. The correct number is 79927398713.

Accessing the database in C/AL

Microsoft Dynamics NAV is an information system, and its primary purpose is to collect, store, organize, and present data. Therefore C/AL has a rich set of functions for data access and manipulation. The next example will present a set of basic functions to read data from the NAV database, filter and search records in a table, and calculate aggregated values based on database records. In this example, suppose we want to calculate the total amount in all open sales orders and invoices for a certain customer.

How to do it...

In the NAV Object Designer, create a new codeunit object. Open the codeunit object you just created in the code designer, position in the OnRun trigger, and open the local declarations window (C/AL Locals). Declare the following local variables:

Name: SalesLine, DataType: Record, Subtype: Sales Line
Name: StartingDate, DataType: Date
Name: EndingDate, DataType: Date

Close the local variables window and declare a global text constant in the C/AL Globals window:

Name: SalesAmountMsg, ConstValue: Total amount in sales documents: %1

Return to the code editor and type the function code:

StartingDate := CALCDATE('<-1M>',WORKDATE);
EndingDate := WORKDATE;
SalesLine.SETRANGE("Sell-to Customer No.",'10000');
SalesLine.SETFILTER(
  "Document Type",'%1|%2',
  SalesLine."Document Type"::Order,
  SalesLine."Document Type"::Invoice);
SalesLine.SETRANGE("Posting Date",StartingDate,EndingDate);
SalesLine.CALCSUMS("Line Amount");
MESSAGE(SalesAmountMsg,SalesLine."Line Amount");

Save the changes, then close the code editor and run the codeunit.

How it works...

Record is a complex data type. A variable declared as a record refers to a table in the database. The variable contains a single table record and can move forward and backward through the record set. A C/AL record resembles an object in object-oriented languages, although they are not exactly the same. You can call record methods and read fields using dot notation. For example, the following are valid statements with a customer record variable:

Customer.Name := 'New Customer';
IF Customer.Balance <= 0 THEN MESSAGE

The variable we just declared refers to the table Sales Line, which stores all open sales document lines. Since we want to calculate the sales amount in a certain period, first of all we need to define the date range for the calculation. The first line in the code example finds the starting date of the period. In this calculation we refer to the system-defined global variable WORKDATE. If you are an experienced NAV user, you know what a workdate is – it is the default date for all documents created in the system. It does not always match the calendar date, so in application code we use the workdate as the pivot date. Another system variable, TODAY, stores the actual calendar date, but it is used much less frequently than the workdate.

The workdate is the last date of the period we want to analyze. To find the first date, we use the CALCDATE function, which calculates a date based on a formula and a reference date. CALCDATE('<-1M>',WORKDATE) means that the resulting date will be one month earlier than the workdate. In the NAV 9.0 demo database the workdate is 25.01.2017, so the result of this CALCDATE will be 25.12.2016.

The next line sets a filter on the SalesLine table. Filtering is used in C/AL to search for records matching given criteria. There are two functions to apply filters to a table: SETFILTER and SETRANGE. Both take the field name to which the filter is applied as the first parameter. SETRANGE can filter all values within a given range or a single value. In the code example we use it to filter sales lines where the customer code is '10000'. Then we apply one more filter on the Posting Date field to filter out all dates less than StartingDate and greater than EndingDate.

Another filter is applied on the Document Type field:

SalesLine.SETFILTER(
  "Document Type",'%1|%2',
  SalesLine."Document Type"::Order,
  SalesLine."Document Type"::Invoice);

We want to see only invoices and orders in the final result, and we can combine these two values in a filter with the SETFILTER function. '%1|%2' is a combination of two placeholders that will be replaced with the actual filter values at runtime.

The last database statement in this example is the CALCSUMS function. SETRANGE itself does not change the state of the record variable – it only prepares filters for the following record search or calculation. CALCSUMS then calculates the result based on the record filters. It will find the sum of the Line Amount field in all records within the filtered range. Only sales lines in which all filtering conditions are satisfied will be taken into account:

Customer No. is '10000'
Document Type is Order or Invoice
Posting Date is between 25.12.2016 and 25.01.2017

Finally, we show the result as a message with the MESSAGE function. The placeholder %1 in the message text will be replaced with the second parameter of the function (SalesLine."Line Amount"):

Summary

In this article we covered the introduction to Microsoft Dynamics NAV 2016 and the basics of C/AL programming.
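For readers who want to check the Luhn logic from the first recipe outside NAV, here is a small, generic Python sketch of the textbook algorithm. It is not a line-by-line translation of the C/AL codeunit above, but it accepts the sample number 79927398713 and rejects 79927398712, matching the behavior described in the recipe.

def luhn_valid(number: int) -> bool:
    digits = [int(d) for d in str(number)]
    checksum = 0
    # Walk from the rightmost digit; double every second digit and add the
    # digits of each doubled value to the running checksum.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            digit = digit // 10 + digit % 10
        checksum += digit
    return checksum % 10 == 0

# The sample account numbers used in the article:
print(luhn_valid(79927398713))   # True  - correct checksum
print(luhn_valid(79927398712))   # False - wrong checksum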


Introduction to Magento 2

Packt
02 Feb 2017
10 min read
In this article, Gabriel Guarino, the author of the book Magento 2 Beginners Guide, will cover the following topics:

Magento as a lifestyle: Magento as a platform and the Magento community
Competitors: hosted and self-hosted e-commerce platforms
New features in Magento 2
What do you need to get started?

Magento as a lifestyle

Magento is an open source e-commerce platform. That is the short definition, but I would like to define Magento considering the seven years that I have been part of the Magento ecosystem. In those seven years, Magento has evolved to the point it is at today: a complete solution backed by people with a passion for e-commerce. If you choose Magento as the platform for your e-commerce website, you will receive updates for the platform on a regular basis. Those updates include new features, improvements, and bug fixes to enhance the overall experience on your website. As a Magento specialist, I can confirm that Magento is a platform that can be customized to fit any requirement. This means that you can add new features, include third-party libraries, and customize the default behavior of Magento. As the saying goes, the only limit is your imagination.

Whenever I have to talk about Magento, I always take some time to talk about its community. Sherrie Rohde is the Magento Community Manager and she has shared some really interesting facts about the Magento community in 2016:

Delivered over 725 talks on Magento or at Magento-centric events
Produced over 100 podcast episodes around Magento
Organized and produced conferences and meetup groups in over 34 countries
Written over 1000 blog posts about Magento

Types of e-commerce solutions

There are two types of e-commerce solutions: hosted and self-hosted. We will analyze each type, covering the general information, pros, and cons of the main platforms in each category.

Self-hosted e-commerce solutions

A self-hosted e-commerce solution is a platform that runs on your server, which means that you can download the code, customize it based on your needs, and then implement it on the server that you prefer. Magento is a self-hosted e-commerce solution, which means that you have absolute control over the customization and implementation of your Magento store.

WooCommerce

WooCommerce is a free shopping cart plugin for WordPress that can be used to create a full-featured e-commerce website. WooCommerce has been created following the same architecture and standards as WordPress, which means that you can customize it with themes and plugins. The plugin currently has more than 18,000,000 downloads, which represents over 39% of all online stores.

Pros:
It can be downloaded for free
Easy setup and configuration
A lot of themes available
Almost 400 extensions in the marketplace
Support through the WooCommerce help desk

Cons:
WooCommerce cannot be used without WordPress
Some essential features are not included out of the box, such as PayPal as a payment method, which means that you need to buy several extensions to add those features
Adding custom features to WooCommerce through extensions can be expensive

PrestaShop

PrestaShop is a free open source e-commerce platform. The platform is currently used by more than 250,000 online stores and is backed by a community of more than 1,000,000 members. The company behind PrestaShop provides a range of paid services, such as technical support, migration, and training to run, manage, and maintain the store.

Pros:
Free and open source
310 integrated features
3,500 modules and templates in the marketplace
Downloaded over 4 million times
63 languages

Cons:
As with WooCommerce, many basic features are not included by default, and adding those features through extensions is expensive
Multiple bugs and complaints from the PrestaShop community

OpenCart

OpenCart is an open source platform for e-commerce, available under the GNU General Public License. OpenCart is a good choice for a basic e-commerce website.

Pros:
Free and open source
Easy learning curve
More than 13,000 extensions available
More than 1,500 themes available

Cons:
Limited features
Not ready for SEO
No cache management page in the admin panel
Hard to customize

Hosted e-commerce solutions

A hosted e-commerce solution is a platform that runs on the server of the company that provides the service, which means that the solution is easier to set up, but there are limitations and you don't have the freedom to customize the solution according to your needs. The monthly or annual fees increase when the store attracts more traffic and has more customers and orders placed.

Shopify

Shopify is a cloud-based e-commerce platform for small and medium-sized businesses. The platform currently powers over 325,000 online stores in approximately 150 countries.

Pros:
No technical skills required to use the platform
Tool to import products from another platform during the sign-up process
More than 1,500 apps and integrations
24/7 support through phone, chat, and e-mail

Cons:
The source code is not provided
Recurring fee to use the platform
Hard to migrate from Shopify to another platform

BigCommerce

BigCommerce is one of the most popular hosted e-commerce platforms, powering more than 95,000 stores in 150 countries.

Pros:
No technical skills required to use the platform
More than 300 apps and integrations available
More than 75 themes available

Cons:
The source code is not provided
Recurring fee to use the platform
Hard to migrate from BigCommerce to another platform

New features in Magento 2

Magento 2 is the new generation of the platform, with new features, technologies, and improvements that make Magento one of the most robust and complete e-commerce solutions available at the moment. In this section, we will describe the main differences between Magento 1 and Magento 2.

New technologies

Composer: This is a dependency manager for PHP. The dependencies can be declared and Composer will manage them by installing and updating them. In Magento 2, Composer simplifies the process of installing and upgrading extensions and upgrading Magento.
Varnish 4: This is an open source HTTP accelerator. Varnish stores pages and other assets in memory to reduce the response time and network bandwidth consumption.
Full Page Caching: In Magento 1, Full Page Caching was only included in the Magento Enterprise Edition. In Magento 2, Full Page Caching is included in all editions, allowing the content from static pages to be cached, increasing performance and reducing the server load.
Elasticsearch: This is a search engine that improves the search quality in Magento and provides background re-indexing and horizontal scaling.
RequireJS: This is a library to load JavaScript files on-the-fly, reducing the number of HTTP requests and improving the speed of the Magento store.
jQuery: The frontend in Magento 1 was implemented using Prototype as the JavaScript framework. In Magento 2, the framework for JavaScript code is jQuery.
Knockout.js: This is an open source JavaScript library that implements the Model-View-ViewModel (MVVM) pattern, providing a great way of creating interactive frontend components.
LESS: This is an open source CSS preprocessor that allows the developer to write styles for the store in a more maintainable and extendable way.
Magento UI Library: This is a modular frontend library that uses a set of mix-ins for general elements and allows developers to work more efficiently on frontend tasks.

New tools

Magento Performance Toolkit: This is a tool that allows merchants and developers to test the performance of the Magento installation and customizations.
Magento 2 command-line tool: This is a tool to run a set of commands in the Magento installation to clear the cache, re-index the store, create database backups, enable maintenance mode, and more.
Data Migration Tool: This tool allows developers to migrate the existing data from Magento 1.x to Magento 2. The tool includes verification, progress tracking, logging, and testing functions.
Code Migration Toolkit: This allows developers to migrate Magento 1.x extensions and customizations to Magento 2. Manual verification and updates are required in order to make the Magento 1.x extensions compatible with Magento 2.
Magento 2 Developer Documentation: One of the complaints from the Magento community was that Magento 1 didn't have enough documentation for developers. In order to resolve this problem, the Magento team created the official Magento 2 Developer Documentation, with information for developers, system administrators, designers, and QA specialists.

Admin panel changes

Better UI: The admin panel has a new look and feel, which is more intuitive and easier to use. In addition, the admin panel is now responsive and can be viewed from any device at any resolution.
Inline editing: The admin panel grids allow inline editing to manage data in a more effective way.
Step-by-step product creation: The product add/edit page is one of the most important pages in the admin panel. The Magento team worked hard to create a different experience for adding and editing products, and the result is that you can manage products with a step-by-step page that includes the fields and import tools separated into different sections.

Frontend changes

Integrated video on the product page: Magento 2 allows uploading a video for a product, introducing a new way of displaying products in the catalog.
Simplified checkout: The steps on the checkout page have been reduced to allow customers to place orders in less time, increasing the conversion rate of the Magento store.
Register section removed from the checkout page: In Magento 1, the customer had the opportunity to register from step 1 of the checkout page. This required the customer to think about their account and password before completing the order. In order to make the checkout simpler, Magento 2 allows the customer to register from the order success page without delaying the checkout process.

What do you need to get started?

Magento is a really powerful platform and there is always something new to learn. Just when you think you know everything about Magento, a new version is released with new features to discover. This makes Magento fun, and this makes Magento unique as an e-commerce platform. That being said, this book will be your guide to discovering everything you need to know to implement, manage, and maintain your first Magento store. In addition, I would like to highlight additional resources that will be useful in your journey of mastering Magento:

Official Magento Blog (https://magento.com/blog): Get the latest news from the Magento team: best practices, customer stories, information related to events, and general Magento news
Magento Resources Library (https://magento.com/resources): Videos, webinars, and publications covering useful information organized by category: order management, marketing and merchandising, international expansion, customer experience, mobile architecture and technology, performance and scalability, security, payments and fraud, retail innovation, and business flexibility
Magento Release Information (http://devdocs.magento.com/guides/v2.1/release-notes/bk-release-notes.html): This is the place where you will get all the information about the latest Magento releases, including the highlights of each release, security enhancements, information about known issues, new features, and instructions for upgrading
Magento Security Center (https://magento.com/security): Information about each of the Magento security patches, as well as best practices and guidelines to keep your Magento store secure
Upcoming Events and Webinars (https://magento.com/events): The official list of upcoming Magento events, including live events and webinars
Official Magento Forums (https://community.magento.com): Get feedback from the Magento community in the official Magento Forums

Summary

In this article, we reviewed Magento 2 and the changes that have been introduced in the new version of the platform. We also analyzed the types of e-commerce solutions and the most important platforms available.

What is Azure API Management?

Packt
01 Feb 2017
15 min read
In this article by Martin Abbott, Ashish Bhambhani, James Corbould, Gautam Gyanendra, Abhishek Kumar, and Mahindra Morar, authors of the book Robust Cloud Integration with Azure, we learn that it is important to know how to control and manage API assets that exist or are built as part of any enterprise development. (For more resources related to this topic, see here.) Typically, modern APIs are used to achieve one of the following two outcomes: First, to expose the on-premises line of business applications, such as Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) solutions to other applications that need to consume and interact with these enterprise assets both on-premises and in the cloud Second, to provide access to the API for commercial purposes to monetize access to the assets exposed by the API The latter use case is important as it allows organizations to extend the use of their API investment, and it has led to what has become known as the API economy. The API economy provides a mechanism to gain additional value from data contained within the organizational boundary whether that data exists in the cloud or on-premises. When providing access to information via an API, two considerations are important: Compliance: This ensures that an access to the API and the use of the API meets requirements around internal or legal policies and procedures, and it provides reporting and auditing information Governance: This ensures the API is accessed and used only by those authorized to do so, and in a way, that is controlled and if necessary metered, and provides reporting and auditing information, which can be used, for example, to provide usage information for billing In order to achieve this at scale in an organization, a tool is required that can be used to apply both compliance and governance structures to an exposed endpoint. This is required to ensure that the usage of the information behind that endpoint is limited only to those who should be allowed access and only in a way that meets the requirements and policies of the organization. This is where API Management plays a significant role. There are two main types of tools that fit within the landscape that broadly fall under the banner of API Management: API Management: These tools provide the compliance and governance control required to ensure that the exposed API is used appropriately and data presented in the correct format. For example, a message may be received in the XML format, but the consuming service may need the data in the JSON format. They can also provide monitoring tools and access control that allows organizations to gain insight into the use of the API, perhaps with the view to charge a fee for access. API Gateway: These tools provide the same or similar level of management as normal API Management tools, but often include other functionality that allows some message mediation and message orchestration thereby allowing more complex interactions and business processes to be modeled, exposed, and governed. Microsoft Azure API Management falls under the first category above whilst Logic Apps, provide the capabilities (and more) that API Gateways offer. Another important aspect of providing management of APIs is creating documentation that can be used by consumers, so they know how to interact with and get the best out of the API. 
For APIs, generally, it is not a case of build it and they will come, so some form of documentation that includes endpoint and operation information, along with sample code, can lead to greater uptake of usage of the API. Azure API Management is currently offered in three tiers: Developer, Standard, and Premium. The details associated with these tiers at the time of writing are shown in the following table:   Developer Standard Premium API Calls (per unit) 32 K / day ( ~1 M / month ) 7 M / day ( ~217 M / month ) 32 M / day ( ~1 B / month ) Data Transfer (per unit) 161 MB / day ( ~5 GB / month ) 32 GB / day ( ~1 TB / month ) 161 GB / day ( ~5 TB / month ) Cache 10 MB 1 GB 5 GB Scale-out N/A 4 units Unlimited SLA N/A 99.9% 99.95% Multi-Region Deployment N/A N/A Yes Azure Active Directory Integration Unlimited user accounts N/A Unlimited user accounts VPN Yes N/A Yes Key items of note in the table are Scale-out, multiregion deployment, and Azure Active Directory Integration. Scale-out: This defines how many instances, or units, of the API instance are possible; this is configured through the Azure Classic Portal Multi-region deployment: When using Premium tier, it is possible to deploy the API Management instance to many locations to provided geographically distributed load Azure Active Directory Integration: If an organization synchronizes an on-premises Active Directory domain to Azure, access to the API endpoints can be configured to use Azure Active Directory to provide same sign-on capabilities The main use case for Premium tier is if an organization has many hundreds or even thousands of APIs they want to expose to developers, or in cases where scale and integration with line of business APIs is critical. The anatomy of Azure API Management To understand how to get the best out of an API, it is important to understand some terms that are used for APIs and within Azure API Management, and these are described here. API and operations An API provides an abstraction layer through an endpoint that allows interaction with entities or processes that would otherwise be difficult to consume. Most API developers favor using a RESTful approach to API applications since this allows us easy understanding on how to work with the operations that the API exposes and provides scalability, modifiability, reliability, and performance. Representational State Transfer (REST) is an architectural style that was introduced by Roy Fielding in his doctoral thesis in 2000 (http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm). Typically, modern APIs are exposed using HTTP since this makes it easier for different types of clients to interact with it, and this increased interoperability provides the greatest opportunity to offer additional value and greater adoption across different technology stacks. When building an API, a set of methods or operations is exposed that a user can interact with in a predictable way. While RESTful services do not have to use HTTP as a transfer method, nearly all modern APIs do, since the HTTP standard is well known to most developers, and it is simple and straightforward to use. Since the operations are called via HTTP, a distinct endpoint or Unified Resource Identifier (URI) is required to ensure sufficient modularity of the API service. When calling an endpoint, which may for example represent, an entity in a line of business system, HTTP verbs (GET, POST, PUT, and DELETE, for example) are used to provide a standard way of interacting with the object. 
An example of how these verbs are used by a developer to interact with an entity is given in the following table: TYPE GET POST PUT DELETE Collection Retrieve a list of entities and their URIs Create a new entity in the collection Replace (update) a collection Delete the entire collection Entity Retrieve a specific entity and its information usually in a particular data format Create a new entity in the collection, not generally used Replace (update) an entity in the collection, or if it does not exist, create it Delete a specific entity from a collection When passing data to and receiving data from an API operation, the data needs to be encapsulated in a specific format. When services and entities were exposed through SOAP-based services, this data format was typically XML. For modern APIs, JavaScript Object Notation (JSON) has become the norm. JSON has become the format of choice since it has a smaller payload than XML and a smaller processing overhead, which suits the limited needs of mobile devices (often running on battery power). JavaScript (as the acronym JSON implies) also has good support for processing and generating JSON, and this suits developers, who can leverage existing toolsets and knowledge. API operations should abstract small amounts of work to be efficient, and in order to provide scalability, they should be stateless, and they can be scaled independently. Furthermore, PUT and DELETE operations must be created that ensure consistent state regardless of how many times the specific operation is performed, this leads to the need of those operations being idempotent. Idempotency describes an operation that when performed multiple times produces the same result on the object that is being operated on. This is an important concept in computing, particularly, where you cannot guarantee that an operation will only be performed once, such as with interactions over the Internet. Another outcome of using a URI to expose entities is that the operation is easily modified and versioned because any new version can simply be made available on a different URI, and because HTTP is used as a transport mechanism, endpoint calls can be cached to provide better performance and HTTP Headers can be used to provide additional information, for example security. By default, when an instance of API Management is provisioned, it has a single API already available named Echo API. This has the following operations: Creating resource Modifying resource Removing resource Retrieving header only Retrieving resource Retrieving resource (cached) In order to get some understanding of how objects are connected, this API can be used, and some information is given in the next section. Objects within API Management Within Azure API Management, there are a number of key objects that help define a structure and provide the governance, compliance, and security artifacts required to get the best out of a deployed API, as shown in the following diagram: As can be seen, the most important object is a PRODUCT. A product has a title and description and is used to define a set of APIs that are exposed to developers for consumption. They can be Open or Protected, with an Open product being publicly available and a Protected product requiring a subscription once published. Groups provide a mechanism to organize the visibility of and access to the APIs within a product to the development community wishing to consume the exposed APIs. 
By default, a product has three standard groups that cannot be deleted: Administrators: Subscription administrators are included by default, and the members of this group manage API services instances, API creation, API policies, operations, and products Developers: The members of this group have authenticated access to the Developer Portal; they are the developers who have chosen to build applications that consume APIs exposed as a specific product Guests: Guests are able to browse products through the Developer Portal and examine documentation, and they have read-only access to information about the products In addition to these built-in groups, it is possible to create new groups as required, including the use of groups within an Azure Active Directory tenant. When a new instance of API Management is provisioned, it has the following two products already configured: Starter: This product limits subscribers to a maximum of five calls per minute up to a maximum of 100 calls per week Unlimited: This product has no limits on use, but subscribers can only use it with the administrator approval Both of these products are protected, meaning that they need to be subscribed to and published. They can be used to help gain some understanding of how the objects within API Management interact. These products are configured with a number of sample policies that can be used to provide a starting point. Azure API Management policies API Management policies are the mechanism used to provide governance structures around the API. They can define, for instance, the number of call requests allowed within a period, cross-origin resource sharing (CORS), or certificate authentication to a service backend. Policies are defined using XML and can be stored in source control to provide active management. Policies are discussed in greater detail later in the article. Working with Azure API Management Azure API Management is the outcome of the acquisition by Microsoft of Apiphany, and as such it has its own management interfaces. Therefore, it has a slightly different look and feel to the standard Azure Portal content. The Developer and Publisher Portals are described in detail in this section, but first a new instance of API Management is required. Once created and the provisioning in the Azure infrastructure can take some time, most interactions take place through the Developer and Publisher Portals. Policies in Azure API Management In order to provide control over interactions with Products or APIs in Azure API Management, policies are used. Policies make it possible to change the default behavior of an API in the Product, for example, to meet the governance needs of your company or Product, and are a series of statements executed sequentially on each request or response of an API. Three demo scenarios will provide a "taster" of this powerful feature of Azure API Management. How to use Policies in Azure API Management Policies are created and managed through the Publisher Portal. The first step in policy creation is to determine at what scope the policy should be applied. Policies can be assigned to all Products, individual Products, the individual APIs associated with a Product, and finally the individual operations associated with an API. Secure your API in Azure API Management We have previously discussed how it is possible to organize APIs in Products with those products further refined through the use of Policies. 
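Before looking at how products and APIs are secured, here is a small sketch of what the XML policy format mentioned above looks like in practice. The rate-limit and quota elements follow the documented policy reference, and the values mirror the Starter product limits described earlier (five calls per minute and 100 calls per week, 604800 being the number of seconds in a week); treat the snippet as illustrative rather than as a copy of the built-in sample policy:

<policies>
    <inbound>
        <base />
        <rate-limit calls="5" renewal-period="60" />
        <quota calls="100" renewal-period="604800" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
</policies>

The <base /> element ensures that any policy defined at a broader scope (for example, across all products) still runs alongside the statements added at this scope.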
Access to and visibility of products is controlled through the use of Groups and developer subscriptions for those APIs requiring subscriptions. In most enterprise scenarios where you are providing access to some line of business system on-premises, it is necessary to provide sufficient security on the API endpoint to ensure that the solution remains compliant. There are a number of ways to achieve this level of security using Azure API Management, such as using certificates, Azure Active Directory, or extending the corporate network into Microsoft Azure using a Virtual Private Network (VPN), and creating a hybrid cloud solution. Securing your API backend with Mutual Certificates Certificate exchange allows Azure API Management and an API to create a trust boundary based on encryption that is well understood and easy to use. In this scenario, because Azure API Management is communicating with an API that has been provided, a self-signed certificate is allowed as the key exchange for the certificate is via a trusted party. For an in-depth discussion on how to configure Mutual Certificate authentication to secure your API, please refer to the Azure API Management documentation (https://azure.microsoft.com/en-us/documentation/articles/api-management-howto-mutual-certificates/). Securing your API backend with Azure Active Directory If an enterprise already uses Azure Active Directory to provide single or same sign-on to cloud-based services, for instance, on-premises Active Directory synchronization via ADConnect or DirSync, then this provides a good opportunity to leverage Azure Active Directory to provide a security and trust boundary to on-premises API solutions. For an in-depth discussion on how to add Azure Active Directory to an API Management instance, please see the Azure API Management documentation (https://azure.microsoft.com/en-us/documentation/articles/api-management-howto-protect-backend-with-aad/). VPN connection in Azure API Management Another way of providing a security boundary between Azure API Management and the API is managing the creation of a virtual private network. A VPN creates a tunnel between the corporate network edge and Azure, essentially creating a hybrid cloud solution. Azure API Management supports site-to-site VPNs, and these are created using the Classic Portal. If an organization already has an ExpressRoute circuit provisioned, this can also be used to provide connectivity via private peering. Because a VPN needs to communicate to on-premises assets, a number of firewall port exclusions need to be created to ensure the traffic can flow between the Azure API Management instance and the API endpoint. Monitoring your API Any application tool is only as good as the insight you can gain from the operation of the tool. Azure API Management is no exception and provides a number of ways of getting information about how the APIs are being used and are performing. Summary API Management can be used to provide developer access to key information in your organization, information that could be sensitive, or that needs to be limited in use. Through the use of Products, Policies, and Security, it is possible to ensure that firm control is maintained over the API estate. The developer experience can be tailored to provide a virtual storefront to any APIs along with information and blogs to help drive deeper developer engagement. 
Although not discussed in this article, it is also possible for developers to publish their own applications to the API Management instance for other developers to use. Resources for Article: Further resources on this subject: Creating Multitenant Applications in Azure [article] Building A Recommendation System with Azure [article] Putting Your Database at the Heart of Azure Solutions [article]


Building A Search Geo Locator with Elasticsearch and Spark

Packt
31 Jan 2017
12 min read
In this article, Alberto Paro, the author of the book Elasticsearch 5.x Cookbook - Third Edition discusses how to use and manage Elasticsearch covering topics as installation/setup, mapping management, indices management, queries, aggregations/analytics, scripting, building custom plugins, and integration with Python, Java, Scala and some big data tools such as Apache Spark and Apache Pig. (For more resources related to this topic, see here.) Background Elasticsearch is a common answer for every needs of search on data and with its aggregation framework, it can provides analytics in real-time. Elasticsearch was one of the first software that was able to bring the search in BigData world. It’s cloud native design, JSON as standard format for both data and search, and its HTTP based approach are only the solid bases of this product. Elasticsearch solves a growing list of search, log analysis, and analytics challenges across virtually every industry. It’s used by big companies such as Linkedin, Wikipedia, Cisco, Ebay, Facebook, and many others (source https://www.elastic.co/use-cases). In this article, we will show how to easily build a simple search geolocator with Elasticsearch using Apache Spark for ingestion. Objective In this article, they will develop a search geolocator application using the world geonames database. To make this happen the following steps will be covered: Data collection Optimized Index creation Ingestion via Apache Spark Searching for a location name Searching for a city given a location position Executing some analytics on the dataset. All the article code is available on GitHub at https://github.com/aparo/elasticsearch-geonames-locator. All the below commands need to be executed in the code directory on Linux/MacOS X. The requirements are a local Elasticsearch Server instance, a working local Spark installation and SBT installed (http://www.scala-sbt.org/) . Data collection To populate our application we need a database of geo locations. One of the most famous and used dataset is the GeoNames geographical database, that is available for download free of charge under a creative commons attribution license. It contains over 10 million geographical names and consists of over 9 million unique features whereof 2.8 million populated places and 5.5 million alternate names. It can be easily downloaded from http://download.geonames.org/export/dump. The dump directory provided CSV divided in counties and but in our case we’ll take the dump with all the countries allCountries.zip file To download the code we can use wget via: wget http://download.geonames.org/export/dump/allCountries.zip Then we need to unzip it and put in downloads folder: unzip allCountries.zip mv allCountries.txt downloads The Geoname dump has the following fields: No. Attribute name Explanation 1 geonameid Unique ID for this geoname 2 name The name of the geoname 3 asciiname ASCII representation of the name 4 alternatenames Other forms of this name. 
Generally in several languages 5 latitude Latitude in decimal degrees of the Geoname 6 longitude Longitude in decimal degrees of the Geoname 7 fclass Feature class see http://www.geonames.org/export/codes.html 8 fcode Feature code see http://www.geonames.org/export/codes.html 9 country ISO-3166 2-letter country code 10 cc2 Alternate country codes, comma separated, ISO-3166 2-letter country code 11 admin1 Fipscode (subject to change to iso code 12 admin2 Code for the second administrative division, a county in the US 13 admin3 Code for third level administrative division 14 admin4 Code for fourth level administrative division 15 population The population of Geoname 16 elevation The elevation in meters of Geoname 17 gtopo30 Digital elevation model 18 timezone The timezone of Geoname 19 moddate The date of last change of this Geoname Table 1: Dataset characteristics Optimized Index creation Elasticsearch provides automatic schema inference for your data, but the inferred schema is not the best possible. Often you need to tune it for: Removing not-required fields Managing Geo fields. Optimizing string fields that are index twice in their tokenized and keyword version. Given the Geoname dataset, we will add a new field location that is a GeoPoint that we will use in geo searches. Another important optimization for indexing, it’s define the correct number of shards. In this case we have only 11M records, so using only 2 shards is enough. The settings for creating our optimized index with mapping and shards is the following one: { "mappings": { "geoname": { "properties": { "admin1": { "type": "keyword", "ignore_above": 256 }, "admin2": { "type": "keyword", "ignore_above": 256 }, "admin3": { "type": "keyword", "ignore_above": 256 }, "admin4": { "type": "keyword", "ignore_above": 256 }, "alternatenames": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "asciiname": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "cc2": { "type": "keyword", "ignore_above": 256 }, "country": { "type": "keyword", "ignore_above": 256 }, "elevation": { "type": "long" }, "fclass": { "type": "keyword", "ignore_above": 256 }, "fcode": { "type": "keyword", "ignore_above": 256 }, "geonameid": { "type": "long" }, "gtopo30": { "type": "long" }, "latitude": { "type": "float" }, "location": { "type": "geo_point" }, "longitude": { "type": "float" }, "moddate": { "type": "date" }, "name": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } }, "population": { "type": "long" }, "timezone": { "type": "text", "fields": { "keyword": { "type": "keyword", "ignore_above": 256 } } } } } }, "settings": { "index": { "number_of_shards": "2", "number_of_replicas": "1" } } } We can store the above JSON in a file called settings.json and we can create an index via the curl command: curl -XPUT http://localhost:9200/geonames -d @settings.json Now our index is created and ready to receive our documents. Ingestion via Apache Spark Apache Spark is very hardy for processing CSV and manipulate the data before saving it in a storage both disk or NoSQL. Elasticsearch provides easy integration with Apache Spark allowing write Spark RDD with a single command in Elasticsearch. 
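The Spark job that follows relies on the Elasticsearch Spark connector (the EsSpark class imported below comes from it), so the connector has to be declared in the SBT build alongside Spark itself. A minimal build.sbt sketch could look like the following; the artifact name follows the org.elasticsearch elasticsearch-spark-20 connector for Spark 2.x, and the version numbers are illustrative and should be aligned with the Elasticsearch and Spark releases you actually run:

// build.sbt (illustrative versions)
name := "elasticsearch-geonames-locator"
version := "1.0"
scalaVersion := "2.11.8"

libraryDependencies ++= Seq(
  "org.apache.spark"  %% "spark-sql"              % "2.0.2" % "provided",
  "org.elasticsearch" %% "elasticsearch-spark-20" % "5.1.2"
)

Marking Spark as provided keeps it out of the fat JAR produced later by sbt assembly, since spark-submit supplies Spark at runtime.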
We will build a spark job called GeonameIngester that will execute the following steps: Initialize the Spark Job Parse the CSV Defining our required structures and conversions Populating our classes Writing the RDD in Elasticsearch Executing the Spark Job Initialize the Spark Job We need to import required classes: import org.apache.spark.sql.SparkSession import org.apache.spark.sql.types._ import org.elasticsearch.spark.rdd.EsSpark import scala.util.Try We define the GeonameIngester object and the SparkSession: object GeonameIngester { def main(args: Array[String]) { val sparkSession = SparkSession.builder .master("local") .appName("GeonameIngester") .getOrCreate() To easy serialize complex datatypes, we switch to use the Kryo encoder: import scala.reflect.ClassTag implicit def kryoEncoder[A](implicit ct: ClassTag[A]) = org.apache.spark.sql.Encoders.kryo[A](ct) import sparkSession.implicits._ Parse the CSV For parsing the CSV, we need to define the Geoname schema to be used to read: val geonameSchema = StructType(Array( StructField("geonameid", IntegerType, false), StructField("name", StringType, false), StructField("asciiname", StringType, true), StructField("alternatenames", StringType, true), StructField("latitude", FloatType, true), StructField("longitude", FloatType, true), StructField("fclass", StringType, true), StructField("fcode", StringType, true), StructField("country", StringType, true), StructField("cc2", StringType, true), StructField("admin1", StringType, true), StructField("admin2", StringType, true), StructField("admin3", StringType, true), StructField("admin4", StringType, true), StructField("population", DoubleType, true), // Asia population overflows Integer StructField("elevation", IntegerType, true), StructField("gtopo30", IntegerType, true), StructField("timezone", StringType, true), StructField("moddate", DateType, true))) Now we can read all the geonames from CSV via: val GEONAME_PATH = "downloads/allCountries.txt" val geonames = sparkSession.sqlContext.read .option("header", false) .option("quote", "") .option("delimiter", "t") .option("maxColumns", 22) .schema(geonameSchema) .csv(GEONAME_PATH) .cache() Defining our required structures and conversions The plain CSV data is not suitable for our advanced requirements, so we define new classes to store our Geoname data. We define a GeoPoint object to store the Geo Point location of our geoname. case class GeoPoint(lat: Double, lon: Double) We define also our Geoname class with optional and list types: case class Geoname(geonameid: Int, name: String, asciiname: String, alternatenames: List[String], latitude: Float, longitude: Float, location: GeoPoint, fclass: String, fcode: String, country: String, cc2: String, admin1: Option[String], admin2: Option[String], admin3: Option[String], admin4: Option[String], population: Double, elevation: Int, gtopo30: Int, timezone: String, moddate: String) To reduce the boilerplate of the conversion we define an implicit method that convert a String in an Option[String] if it is empty or null. 
implicit def emptyToOption(value: String): Option[String] = { if (value == null) return None val clean = value.trim if (clean.isEmpty) { None } else { Some(clean) } } During processing, in case of the population value is null we need a function to fix this value and set it to 0: to do this we define a function to fixNullInt: def fixNullInt(value: Any): Int = { if (value == null) 0 else { Try(value.asInstanceOf[Int]).toOption.getOrElse(0) } } Populating our classes We can populate the records that we need to store in Elasticsearch via a map on geonames DataFrame. val records = geonames.map { row => val id = row.getInt(0) val lat = row.getFloat(4) val lon = row.getFloat(5) Geoname(id, row.getString(1), row.getString(2), Option(row.getString(3)).map(_.split(",").map(_.trim).filterNot(_.isEmpty).toList).getOrElse(Nil), lat, lon, GeoPoint(lat, lon), row.getString(6), row.getString(7), row.getString(8), row.getString(9), row.getString(10), row.getString(11), row.getString(12), row.getString(13), row.getDouble(14), fixNullInt(row.get(15)), row.getInt(16), row.getString(17), row.getDate(18).toString ) } Writing the RDD in Elasticsearch The final step is to store our new build DataFrame records in Elasticsearch via: EsSpark.saveToEs(records.toJavaRDD, "geonames/geoname", Map("es.mapping.id" -> "geonameid")) The value “geonames/geoname” are the index/type to be used for store the records in Elasticsearch. To maintain the same ID of the geonames in both CSV and Elasticsearch we pass an additional parameter es.mapping.id that refers to where find the id to be used in Elasticsearch geonameid in the above example. Executing the Spark Job To execute a Spark job you need to build a Jar with all the required library and than to execute it on spark. The first step is done via sbt assembly command that will generate a fatJar with only the required libraries. To submit the Spark Job in the jar, we can use the spark-submit command: spark-submit --class GeonameIngester target/scala-2.11/elasticsearch-geonames-locator-assembly-1.0.jar Now you need to wait (about 20 minutes on my machine) that Spark will send all the documents to Elasticsearch and that they are indexed. Searching for a location name After having indexed all the geonames, you can search for them. In case we want search for Moscow, we need a complex query because: City in geonames are entities with fclass=”P” We want skip not populated cities We sort by population descendent to have first the most populated The city name can be in name, alternatenames or asciiname field To achieve this kind of query in Elasticsearch we can use a simple Boolean with several should queries for match the names and some filter to filter out unwanted results. We can execute it via curl via: curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d '{ "query": { "bool": { "minimum_should_match": 1, "should": [ { "term": { "name": "moscow"}}, { "term": { "alternatenames": "moscow"}}, { "term": { "asciiname": "moscow" }} ], "filter": [ { "term": { "fclass": "P" }}, { "range": { "population": {"gt": 0}}} ] } }, "sort": [ { "population": { "order": "desc"}}] }' We used “moscow” lowercase because it’s the standard token generate for a tokenized string (Elasticsearch text type). 
The result will be similar to this one: { "took": 14, "timed_out": false, "_shards": { "total": 2, "successful": 2, "failed": 0 }, "hits": { "total": 9, "max_score": null, "hits": [ { "_index": "geonames", "_type": "geoname", "_id": "524901", "_score": null, "_source": { "name": "Moscow", "location": { "lat": 55.752220153808594, "lon": 37.61555862426758 }, "latitude": 55.75222, "population": 10381222, "moddate": "2016-04-13", "timezone": "Europe/Moscow", "alternatenames": [ "Gorad Maskva", "MOW", "Maeskuy", .... ], "country": "RU", "admin1": "48", "longitude": 37.61556, "admin3": null, "gtopo30": 144, "asciiname": "Moscow", "admin4": null, "elevation": 0, "admin2": null, "fcode": "PPLC", "fclass": "P", "geonameid": 524901, "cc2": null }, "sort": [ 10381222 ] }, Searching for cities given a location position We have processed the geoname so that in Elasticsearch, we were able to have a GeoPoint field. Elasticsearch GeoPoint field allows to enable search for a lot of geolocation queries. One of the most common search is to find cities near me via a Geo Distance Query. This can be achieved modifying the above search in curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d '{ "query": { "bool": { "filter": [ { "geo_distance" : { "distance" : "100km", "location" : { "lat" : 55.7522201, "lon" : 36.6155586 } } }, { "term": { "fclass": "P" }}, { "range": { "population": {"gt": 0}}} ] } }, "sort": [ { "population": { "order": "desc"}}] }' Executing an analytic on the dataset. Having indexed all the geonames, we can check the completes of our dataset and executing analytics on them. For example, it’s useful to check how many geonames there are for a single country and the feature class for every single top country to evaluate their distribution. This can be easily achieved using an Elasticsearch aggregation in a single query: curl -XPOST 'http://localhost:9200/geonames/geoname/_search' -d ' { "size": 0, "aggs": { "geoname_by_country": { "terms": { "field": "country", "size": 5 }, "aggs": { "feature_by_country": { "terms": { "field": "fclass", "size": 5 } } } } } }’ The result can be will be something similar: { "took": 477, "timed_out": false, "_shards": { "total": 2, "successful": 2, "failed": 0 }, "hits": { "total": 11301974, "max_score": 0, "hits": [ ] }, "aggregations": { "geoname_by_country": { "doc_count_error_upper_bound": 113415, "sum_other_doc_count": 6787106, "buckets": [ { "key": "US", "doc_count": 2229464, "feature_by_country": { "doc_count_error_upper_bound": 0, "sum_other_doc_count": 82076, "buckets": [ { "key": "S", "doc_count": 1140332 }, { "key": "H", "doc_count": 506875 }, { "key": "T", "doc_count": 225276 }, { "key": "P", "doc_count": 192697 }, { "key": "L", "doc_count": 79544 } ] } },…truncated… These are simple examples how to easy index and search data with Elasticsearch. Integrating Elasticsearch with Apache Spark it’s very trivial: the core of part is to design your index and your data model to efficiently use it. After having correct indexed your data to cover your use case, Elasticsearch is able to provides your result or analytics in few microseconds. Summary In this article, we learned how to easily build a simple search geolocator with Elasticsearch using Apache Spark for ingestion. Resources for Article: Further resources on this subject: Basic Operations of Elasticsearch [article] Extending ElasticSearch with Scripting [article] Integrating Elasticsearch with the Hadoop ecosystem [article]


Working with NAV and Azure App Service

Packt
30 Jan 2017
12 min read
In this article by Stefano Demiliani, the author of the book Building ERP Solutions with Microsoft Dynamics NAV, we will experience the quick solution to solve complex technical architectural scenarios and create external applications within a blink of an eye using Microsoft Dynamics NAV. (For more resources related to this topic, see here.) The business scenario Imagine a requirement where many Microsoft Dynamics NAV instances (physically located at different places around the world) have to interact with an external application. A typical scenario could be a headquarter of a big enterprise company that has a business application (called HQAPP) that must collect data about item shipments from the ERP of the subsidiary companies around the world (Microsoft Dynamics NAV): The cloud could help us to efficiently handle this scenario. Why not place the interface layer in the Azure Cloud and use the scalability features that Azure could offer? Azure App Service could be the solution to this. We can implement an architecture like the following schema: Here, the interface layer is placed on Azure App Service. Every NAV instance has the business logic (in our scenario, a query to retrieve the desired data) exposed as an NAV Web Service. The NAV instance can have an Azure VPN in place for security. HQAPP performs a request to the interface layer in Azure App Service with the correct parameters. The cloud service then redirects the request to the correct NAV instance and retrieves the data, which in turn is forwarded to HQAPP. Azure App Service can be scaled (manually or automatically) based on the resources requested to perform the data retrieval process. Azure App Service overview Azure App Service is a PaaS service for building scalable web and mobile apps and enabling interaction with on-premises or on-cloud data. With Azure App Service, you can deploy your application to the cloud and you can quickly scale your application to handle high traffic loads and manage traffic and application availability without interacting with the underlying infrastructure. This is the main difference with Azure VM, where you can run a web application on the cloud but in a IaaS environment (you control the infrastructure like OS, configuration, installed services, and so on). Some key features of Azure App Service are as follows: Support for many languages and frameworks Global scale with high availability (scaling up and out manually or automatically) Security Visual Studio integration for creating, deploying and debugging applications Application templates and connectors available Azure App Service offers different types of resources for running a workload, which are as follows: Web Apps: This hosts websites and web applications Mobile Apps: This hosts mobile app backends API Apps: This hosts RESTful APIs Logic Apps: This automates business processes across the cloud Azure App Service has the following different service plans where you can scale from depending on your requirements in terms of resources: Free: This is ideal for testing and development, no custom domains or SSL are required, you can deploy up to 10 applications. Shared: This has a fixed per-hour charge. This is ideal for testing and development, supports for custom domains and SSL, you can deploy up to 100 applications. Basic: This has a per-hour charge based on the number of instances. It runs on a dedicated instance. This is ideal for low traffic requirements, you can deploy an unlimited number of apps. 
It supports only a single SSL certificate per plan (not ideal if you need to connect to an Azure VPN or use deployment slots). Standard: This has a per-hour charge based on the number of instances. This provides full SSL support. This provides up to 10 instances with auto-scaling, automated backups, up to five deployment slots, ideal for production environments. Premium: This has per-hour charge based on the number of instances. This provides up to 50 instances with auto-scaling, up to 20 deployment slots, different daily backups, dedicated App Service Environment. Ideal for enterprise scale and integration. Regarding the application deployment, Azure App Service supports the concept of Deployment Slot (only on the Standard and Premium tiers). Deployment Slot is a feature that permits you to have a separate instance of an application that runs on the same VM but is isolated from the other deployment slots and production slots active in the App Service. Always remember that all Deployment Slots share the same VM instance and the same server resources. Developing the solution Our solution is essentially composed of two parts: The NAV business logic The interface layer (cloud service) The following steps will help you retrieve the required data from an external application: In the NAV instances of the subsidiary companies, we need to retrieve the sales shipment's data for every item. To do so, we need to create a Query object that reads Sales Shipment Header and Sales Shipment Line and exposes them as web services (OData).The Query object will be designed as follows: For every Sales Shipment Header web service, we retrieve the corresponding Sales Shipment Lines web service that have Type as DataItem: I've changed the name of the field No. in Sales Shipment Line in DataItem as ItemNo because the default name was in used in Sales Shipment Header in DataItem. Compile and save the Query object (here, I've used Object ID as 50009 and Service Name as Item Shipments). Now, we will publish the Query object as web service in NAV, so open the Web Services page and create the following entries:    Object Type: Query    Object ID: 50009    Service Name: Item Shipments    Published: TRUE When published, NAV returns the OData service URL. This Query object must be published as web service on every NAV instances in the subsidiary companies. To develop our interface layer, we need first to download and install (if not present) the Azure SDK for Visual Studio from https://azure.microsoft.com/en-us/downloads/. After that, we can create a new Azure Cloud Service project by opening Visual Studio and navigate to File | New | Project, select the Cloud templates, and choose Azure Cloud Service. Select the project's name (here, it is NAVAzureCloudService) and click on OK. After clicking on OK, Visual Studio asks you to select a service type. Select WCF Service Web Role, as shown in the following screenshot: Visual Studio now creates a template for our solution. Now right-click on the NAVAzureCloudService project and select New Web Role Project, and in the Add New .NET Framework Role Project window, select WCF Service Web Role and give it a proper name (here, we have named it WCFServiceWebRoleNAV): Then, rename Service1.svc with a better name (here, it is NAVService.svc). Our WCF Service Web Role must have the reference to all the NAV web service URLs for the various NAV instances in our scenario and (if we want to use impersonation) the credentials to access the relative NAV instance. 
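The web service URLs stored for each instance are the OData endpoints that NAV reports when the Item Shipments query is published. Their exact shape depends on each subsidiary's installation, but they typically follow the pattern below (server, instance, and company names are placeholders, and 7048 is only the default OData port):

http://<nav-server>:7048/<server-instance>/OData/Company('<company-name>')/ItemShipments
http://<nav-server>:7048/<server-instance>/OData/Company('<company-name>')/ItemShipments?$filter=Shipment_Date ge datetime'2017-01-01'

The $filter expression in the second URL uses the same Shipment_Date ge datetime'...' syntax that the data access layer builds programmatically later in this article.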
You can right-click the WCFServiceWebRoleNAV project, select Properties and then the Settings tab. Here you can add the URL for the various NAV instances and the relative web service credentials. Let's start writing our service code. We create a class called SalesShipment that defines our data model as follows: public class SalesShipment { public string No { get; set; } public string CustomerNo { get; set; } public string ItemNo { get; set; } public string Description { get; set; } public string Description2 { get; set; } public string UoM { get; set; } public decimal? Quantity { get; set; } public DateTime? ShipmentDate { get; set; } } In next step, we have to define our service contract (interface). Our service will have a single method to retrieve shipments for a NAV instance and with a shipment date filter. The service contract will be defined as follows: public interface INAVService { [OperationContract] [WebInvoke(Method = "GET", ResponseFormat = WebMessageFormat.Xml, BodyStyle = WebMessageBodyStyle.Wrapped, UriTemplate = "getShipments?instance={NAVInstanceName}&date={shipmentDateFilter}"] List<SalesShipment> GetShipments(string NAVInstanceName, string shipmentDateFilter); //Date format parameter: YYYY-MM-DD } The WCF service definition will implement the previously defined interface as follows: public class NAVService : INAVService { } The GetShipments method is implemented as follows: public List<SalesShipment> GetShipments(string NAVInstanceName, string shipmentDateFilter) { try { DataAccessLayer.DataAccessLayer DAL = new DataAccessLayer.DataAccessLayer(); List<SalesShipment> list = DAL.GetNAVShipments(NAVInstanceName, shipmentDateFilter); return list; } catch(Exception ex) { // You can handle exceptions here… throw ex; } } This method creates an instance of a DataAccessLayer class (which we will discuss in detail later) and calls a method called GetNAVShipments by passing the NAV instance name and ShipmentDateFilter. To call the NAV business logic, we need to have a reference to the NAV OData web service (only to generate a proxy class, the real service URL will be dynamically called by code) so right-click on your project (WCFServiceWebRoleNAV) and navigate to Add | Service Reference. In the Add Service Reference window, paste the OData URL that comes from NAV and when the service is discovered, give it a reference name (here, it is NAVODATAWS). Visual Studio automatically adds a service reference to your project. The DataAccessLayer class will be responsible for handling calls to the NAV OData web service. This class defines a method called GetNAVShipments with the following two parameters: NAVInstanceName: This is the name of the NAV instance to call shipmentDateFilter: This filters date for the NAV shipment lines (greater than or equal to) According to NAVInstanceName, the method retrieves from the web.config file (appSettings) the correct NAV OData URL and credentials, calls the NAV query (by passing filters), and retrieves the data as a list of SalesShipment records (our data model). 
The DataAccessLayer class is defined as follows: public List<SalesShipment> GetNAVShipments(string NAVInstanceName, string shipmentDateFilter) { try { string URL = Properties.Settings.Default[NAVInstanceName].ToString(); string WS_User = Properties.Settings.Default[NAVInstanceName + "_User"].ToString(); string WS_Pwd = Properties.Settings.Default[NAVInstanceName + "_Pwd"].ToString(); string WS_Domain = Properties.Settings.Default[NAVInstanceName + "_Domain"].ToString(); DataServiceContext context = new DataServiceContext(new Uri(URL)); NAVODATAWS.NAV NAV = new NAVODATAWS.NAV(new Uri(URL)); NAV.Credentials = new System.Net.NetworkCredential(WS_User, WS_Pwd, WS_Domain); DataServiceQuery<NAVODATAWS.ItemShipments> q = NAV.CreateQuery<NAVODATAWS.ItemShipments>("ItemShipments"); if (shipmentDateFilter != null) { string FilterValue = string.Format("Shipment_Date ge datetime'{0}'", shipmentDateFilter); q = q.AddQueryOption("$filter", FilterValue); } List<NAVODATAWS.ItemShipments> list = q.Execute().ToList(); List<SalesShipment> sslist = new List<SalesShipment>(); foreach (NAVODATAWS.ItemShipments shpt in list) { SalesShipment ss = new SalesShipment(); ss.No = shpt.No; ss.CustomerNo = shpt.Sell_to_Customer_No; ss.ItemNo = shpt.ItemNo; ss.Description = shpt.Description; ss.Description2 = shpt.Description_2; ss.UoM = shpt.Unit_of_Measure; ss.Quantity = shpt.Quantity; ss.ShipmentDate = shpt.Shipment_Date; sslist.Add(ss); } return sslist; } catch (Exception ex) { throw ex; } } The method returns a list of the SalesShipment objects. It creates an instance of the NAV OData web service, applies the OData filter to the NAV query, reads the results, and loads the list of the SalesShipment objects. Deployment to Azure App Service Now that your service is ready, you have to deploy it to the Azure App Service by performing the following steps: Right-click on the NAVAzureCloudService project and select Package… as shown in the following screenshot: In the Package Azure Application window, select Service configuration as Cloud and Build configuration as Release, and then click on Package as shown in the following screenshot: This operation creates two files in the <YourProjectName>binReleaseapp.publish folder as shown in the following screenshot: These are the packages that must be deployed to Azure. To do so, you have to log in to the Azure Portal and navigate to Cloud Services | Add from the hub menu at the left. In the next window, set the following cloud service parameters: DNS name: This depicts name of your cloud service (yourname.cloudapp.net) Subscription: This is the Azure Subscription where the cloud service will be added Resource group: This creates a new resource group for your cloud service or use existing one Location: This is the Azure location where the cloud service is to be added Finally, you can click on the Create button to create your cloud service. Now, deploy the previously created cloud packages to your cloud service that was just created. 
In the cloud services list, click on NAVAZureCloudService, and in the next window, select the desired slot (for example, Production slot) and click on Upload as shown in the following screenshot: In the Upload a package window, provide the following parameters: Storage account: This is a previously created storage account for your subscription Deployment label: This is the name of your deployment Package: Select the .cspkg file previously created for your cloud service Configuration: Select the .cspkg file previously created for your cloud service configuration You can take a look at the preceding parameters in the following screenshot: Select the Start deployment checkbox and click on the OK button at the bottom to start the deployment process to Azure. Now you can start your cloud service and manage it (swap, scaling, and so on) directly from the Azure Portal: When running, you can use your deployed service by reaching this URL: http://navazurecloudservice.cloudapp.net/NAVService.svc This is the URL that the HQAPP in our business scenario has to call for retrieving data from the various NAV instances of the subsidiary companies around the world. In this way, you have deployed a service to the cloud, you can manage the resources in a central way (via the Azure Portal), and you can easily have different environments by using slots. Summary In this article, you learned to enable NAV instances placed at different locations to interact with an external application through Azure App Service and also the features that it provides. Resources for Article: Further resources on this subject: Introduction to NAV 2017 [article] Code Analysis and Debugging Tools in Microsoft Dynamics NAV 2009 [article] Exploring Microsoft Dynamics NAV – An Introduction [article]


Internet of Things Technologies

Packt
30 Jan 2017
9 min read
In this article by Rubén Oliva Ramos author of the book Internet of Things Programming with JavaScript we will understand different platform for the use of Internet of Things, and how to install Raspbian on micro SD card. (For more resources related to this topic, see here.) Technology has played a huge role in increasing efficiency in the work place, improving living conditions at home, monitoring health and environmental conditions and saving energy and natural resources. This has been made possible through continuous development of sensing and actuation devices. Due to the huge data handling requirements of these devices, the need for a more sophisticated, yet versatile data handling and storage medium, such as the Internet, has arisen. That’s why many developers are adopting the different Internet of Things platforms for prototyping that are available. There are several different prototyping platforms that one can use to add internet connectivity to his or her prototypes. It is important for a developer to understand the capabilities and limitations of the different platforms if he or she wants to make the best choice. So, here is a quick look at each platform. Arduino Arduino is a popular open-source hardware prototyping platform that is used in many devices. It comprises several boards including the UNO, Mega, YUN and DUE among several others. However, out of all the boards, only the Arduino Yun has built-in capability to connect to Wi-Fi and LAN networks. The rest of the boards rely on the shields, such as the Wi-Fi shield and Ethernet shield, to connect to the Internet. The official Arduino Ethernet shield allows you to connect to a network using an RJ45 cable. It has a Wiznet W5100 Ethernet chip that provides a network IP stack that is compatible with UDP and TCP. Using the Ethernet library provided by Arduino you will be able to connect to your network in a few simple steps. If you are thinking of connecting to the Internet wirelessly, you could consider getting the Arduino Wi-Fi shield. It is based on the HDG204 wireless LAN 802.11b/g System in-Package and has an AT32UC3 that provides a network IP stack that is compatible with UDP and TCP. Using this shield and the Wi-Fi library provided by Arduino, you can quickly connect to your prototypes to the web. The Arduino Yun is a hybrid board that has both Ethernet and Wi-Fi connectivity. It is based on the ATmega32u4 and the Atheros AR9331 which supports OpenWrt-Yun. The Atheros processor handles the WiFi and Ethernet interfaces while the ATmega32u4 handles the USB communication. So, if you are looking for a more versatile Arduino board for an Internet of Things project, this is it. There are several advantages to using the Arduino platform for internet of things projects. For starters the Arduino platform is easy to use and has a huge community that you can rely on for technical support. It also is easy to create a prototype using Arduino, since you can design a PCB based on the boards. Moreover, apart from Arduino official shields, the Arduino platform can also work with third party Wi-Fi and Ethernet shields such as the WiFi shield. Therefore, your options are limitless. On the down side, all Arduino boards, apart from Yun, need an external module so as to connect to the internet. So, you have to invest more. In addition to this, there are very many available shields that are compatible with the Arduino. This makes it difficult for you to choose. 
Also, you still need to choose the right Internet of Things platform for your project, such as Xively or EasyIoT. Raspberry Pi The Raspberry Pi is an open source prototyping platform that features credit-card sized computer boards. The boards have USB ports for a keyboard and a mouse, a HDMI port for display, an Ethernet port for network connection and an SD card to store the operating system. There are several versions of the Raspberry Pi available in the market. They include the Raspberry Pi 1 A, B, A+ and B+ and the Raspberry Pi 2 B. When using a Raspberry Pi board you can connect to the Internet either wirelessly or via an Ethernet cable. Raspberry Pi boards, except version A and A+, have an Ethernet port where you can connect an Ethernet cable. Normally, the boards gain internet connection immediately after you connect the Ethernet cable. However, your router must be configured for Dynamic Host Configuration Protocol (DHCP) for this to happen. Otherwise, you will have to set the IP address of the Raspberry Pi manually and the restart it. To connect your Raspberry Pi to the internet wirelessly, you have to use a WiFi adapter, preferably one that supports the RTL8192cu chipset. This is because Raspbian and Ocidentalis distributions have built-in support for that chip. However, there is no need to be choosy. Almost all Wi-Fi adapters in the market, including the very low cost budget adapters will work without any trouble. Using Raspberry Pi boards for IoT projects is advantageous because you don’t need extra shields or hardware to connect to the internet. Moreover, connecting to your wireless or LAN network happens automatically, so long as the router that has DHCP configured (most routers do). Also, you don’t have to worry if you are a newbie, since there is a huge Raspberry Pi community. You can get help quickly. The disadvantage of using the Raspberry Pi platform to make IoT devices is that it is not easy to use. It would take tremendous time for newbies to learn how to set up everything and code apps. Another demerit is that the Raspberry Pi boards cannot be easily integrated into a product. There are also numerous operating systems that the Pi boards can run on and it is not easy to decide on the best operating system for the device you are creating. Which Platform to Choose? The different features that come with each platform make it ideal for certain applications, but not all. Therefore, if at all you have not yet made up your mind on the platform to use, try selecting them based on usage. The Raspberry Pi is ideal for IoT applications that are server based. This is because it has high storage space and RAM and a powerful processor. Moreover, it supports many programming languages that can create Server-Side apps, such as Node.js. You can also use the Raspberry Pi in instances where you want to access web pages and view data posted on online servers, since you can connect a display to it and view the pages on the web browser. The Raspberry Pi can connect to both LAN and Wi-Fi networks. However, it is not advisable to use the raspberry Pi for projects where you would want to integrate it into a finished product or create a custom made PCB. On the other hand, Arduino comes in handy as a client. It can log data on an online server and retrieve data from the server. It is ideal for logging sensor data and controlling actuators via commands posted on the server by another client. 
However, there are instances where you can use Arduino boards for server functions such as hosting a simple web page that you can use to control your Arduino from the local network it is connected to. The Arduino platform can connect to both LAN and Wi-Fi networks. The ESP8266 has a very simple hardware architecture and is best used in client applications such as data logging and control of actuators from online server applications. You can use it as a webserver as well, but for applications or web pages that you would want to access from the local network that the module is connected to. The Spark Core platform is ideal for both server and client functions. It can be used to log sensor data onto the Spark.io cloud or receive commands from the cloud. Moreover, you don’t have to worry about getting online server space, since the Spark cloud is available for free. You can also create Graphical User Interfaces based on Node.js to visualize incoming data from sensors connected to the Spark Core and send commands to the Spark Core for activation of actuators. Setting up Raspberry Pi Zero Raspberry Pi is low-cost board dedicated for project purpose this will use a Raspberry Pi Zero board. See the following link https://www.adafruit.com/products/2816 for the Raspberry Pi Zero board and Kit. In order to make the Raspberry Pi work, we need an operating system that acts as a bridge between the hardware and the user, we will be using the Raspbian Jessy that you can download from https://www.raspberrypi.org/downloads/, in this link you will find all the information that you need to download all the software that it’s necessary to use with your Raspberry Pi to deploy Raspbian, we need a micro SD card of at least 4 GB. The kit that I used for testing the Raspberry Pi Zero includes all the necessary items for installing every thing and ready the board. Preparing the SD card The Raspberry Pi Zero only boots from a SD card, and cannot boot from an external drive or USB stick. For now it is recommended to use a Micro SD 4 GB of space. Installing the Raspian Operating System When we have the image file for preparing the SD card that has previously got from the page. Also we insert the micro SD into the adapter, download Win32DiskImager from https://sourceforge.net/projects/win32diskimager/ In the following screenshot you will see the files after downloading the folder It appears the window, open the file image and select the path you have the micro SD card and click on the Write button. After a few seconds we have here the download image and converted files into the micro SD In the next screenshot you can see the progress of the installation Summary In this article we discussed about the different platform for the use of Internet of Things like Ardunio and Raspberry Pi. We also how to setup and prepare Raspberry Pi for further use. Resources for Article: Further resources on this subject: Classes and Instances of Ember Object Model [article] Using JavaScript with HTML [article] Introducing the Ember.JS framework [article]
Creating Dynamic Maps

Packt
27 Jan 2017
15 min read
In this article by Joel Lawhead, author of the book, QGIS Python Programming Cookbook - Second Edition, we will cover the following recipes: Setting a transparent layer fill Using a filled marker symbol Rendering a single band raster using a color ramp algorithm Setting a feature's color using a column in a CSV file Creating a complex vector layer symbol Using an outline for font markers Using arrow symbols (For more resources related to this topic, see here.) Setting a transparent layer fill Sometimes, you may just want to display the outline of a polygon in a layer and have the insides of the polygon render transparently, so you can see the other features and background layers inside that space. For example, this technique is common with political boundaries. In this recipe, we will load a polygon layer onto the map, and then interactively change it to just an outline of the polygon. Getting ready Download the zipped shapefile and extract it to your qgis_data directory into a folder named ms from https://github.com/GeospatialPython/Learn/raw/master/Mississippi.zip. How to do it… In the following steps, we'll load a vector polygon layer, set up a properties dictionary to define the color and style, apply the properties to the layer's symbol, and repaint the layer. In Python Console, execute the following: Create the polygon layer: lyr = QgsVectorLayer("/qgis_data/ms/mississippi.shp", "Mississippi", "ogr") Load the layer onto the map: QgsMapLayerRegistry.instance().addMapLayer(lyr) Now, we’ll create the properties dictionary: properties = {} Next, set each property for the fill color, border color, border width, and a style of no meaning no-brush. Note that we’ll still set a fill color; we are just making it transparent: properties["color"] = '#289e26' properties["color_border"] = '#289e26' properties["width_border"] = '2' properties["style"] = 'no' Now, we create a new symbol and set its new property: sym = QgsFillSymbolV2.createSimple(properties) Next, we access the layer's renderer: renderer = lyr.rendererV2() Then, we set the renderer's symbol to the new symbol we created: renderer.setSymbol(sym) Finally, we repaint the layer to show the style updates: lyr.triggerRepaint() How it works… In this recipe, we used a simple dictionary to define our properties combined with the createSimple method of the QgsFillSymbolV2 class. Note that we could have changed the symbology of the layer before adding it to the canvas, but adding it first allows you to see the change take place interactively. Using a filled marker symbol A newer feature of QGIS is filled marker symbols. Filled marker symbols are powerful features that allow you to use other symbols, such as point markers, lines, and shapebursts as a fill pattern for a polygon. Filled marker symbols allow for an endless set of options for rendering a polygon. In this recipe, we'll do a very simple filled marker symbol that paints a polygon with stars. Getting ready Download the zipped shapefile and extract it to your qgis_data directory into a folder named ms from https://github.com/GeospatialPython/Learn/raw/master/Mississippi.zip. How to do it… A filled marker symbol requires us to first create the representative star point marker symbol. Then, we'll add that symbol to the filled marker symbol and change it with the layer's default symbol. 
Finally, we'll repaint the layer to update the symbology: First, create the layer with our polygon shapefile: lyr = QgsVectorLayer("/qgis_data/ms/mississippi.shp", "Mississippi", "ogr") Next, load the layer onto the map: QgsMapLayerRegistry.instance().addMapLayer(lyr) Now, set up the dictionary with the properties of the star marker symbol: marker_props = {} marker_props["color"] = 'red' marker_props["color_border"] = 'black' marker_props["name"] = 'star' marker_props["size"] = '3' Now, create the star marker symbol: marker = QgsMarkerSymbolV2.createSimple(marker_props) Then, we create our filled marker symbol: filled_marker = QgsPointPatternFillSymbolLayer() We need to set the horizontal and vertical spacing of the filled markers in millimeters: filled_marker.setDistanceX(4.0) filled_marker.setDistanceY(4.0) Now, we can add the simple star marker to the filled marker symbol: filled_marker.setSubSymbol(marker) Next, access the layer's renderer: renderer = lyr.rendererV2() Now, we swap the first symbol layer of the first symbol with our filled marker using zero indexes to reference them: renderer.symbols()[0].changeSymbolLayer(0, filled_marker) Finally, we repaint the layer to see the changes: lyr.triggerRepaint() Verify that the result looks similar to the following screenshot: Rendering a single band raster using a color ramp algorithm A color ramp allows you to render a raster using just a few colors to represent different ranges of cell values that have a similar meaning in order to group them. The approach that will be used in this recipe is the most common way to render elevation data. Getting ready You can download a sample DEM from https://github.com/GeospatialPython/Learn/raw/master/dem.zip, which you can unzip in a directory named rasters in your qgis_data directory. How to do it... In the following steps, we will set up objects to color a raster, create a list establishing the color ramp ranges, apply the ramp to the layer renderer, and finally, add the layer to the map. To do this, we need to perform the following: First, we import the QtGui library for color objects in Python Console: from PyQt4 import QtGui Next, we load the raster layer, as follows: lyr = QgsRasterLayer("/qgis_data/rasters/dem.asc", "DEM") Now, we create a generic raster shader object: s = QgsRasterShader() Then, we instantiate the specialized ramp shader object: c = QgsColorRampShader() We must name a type for the ramp shader. 
In this case, we use an INTERPOLATED shader: c.setColorRampType(QgsColorRampShader.INTERPOLATED) Now, we'll create a list of our color ramp definitions: i = [] Then, we populate the list with the color ramp values that correspond to the elevation value ranges: i.append(QgsColorRampShader.ColorRampItem(400, QtGui.QColor('#d7191c'), '400')) i.append(QgsColorRampShader.ColorRampItem(900, QtGui.QColor('#fdae61'), '900')) i.append(QgsColorRampShader.ColorRampItem(1500, QtGui.QColor('#ffffbf'), '1500')) i.append(QgsColorRampShader.ColorRampItem(2000, QtGui.QColor('#abdda4'), '2000')) i.append(QgsColorRampShader.ColorRampItem(2500, QtGui.QColor('#2b83ba'), '2500')) Now, we assign the color ramp to our shader: c.setColorRampItemList(i) Now, we tell the generic raster shader to use the color ramp: s.setRasterShaderFunction(c) Next, we create a raster renderer object with the shader: ps = QgsSingleBandPseudoColorRenderer(lyr.dataProvider(), 1, s) We assign the renderer to the raster layer: lyr.setRenderer(ps) Finally, we add the layer to the canvas in order to view it: QgsMapLayerRegistry.instance().addMapLayer(lyr) How it works… While it takes a stack of four objects to create a color ramp, this recipe demonstrates how flexible the PyQGIS API is. Typically, the more number of objects it takes to accomplish an operation in QGIS, the richer the API is, giving you the flexibility to make complex maps. Notice that in each ColorRampItem object, you specify a starting elevation value, the color, and a label as the string. The range for the color ramp ends at any value less than the following item. So, in this case, the first color will be assigned to the cells with a value between 400 and 899. The following screenshot shows the applied color ramp: Setting a feature's color using a column in a CSV file Comma Separated Value (CSV) files are an easy way to store basic geospatial information. But you can also store styling properties alongside the geospatial data for QGIS to use in order to dynamically style the feature data. In this recipe, we'll load some points into QGIS from a CSV file and use one of the columns to determine the color of each point. Getting ready Download the sample zipped CSV file from the following URL: https://github.com/GeospatialPython/Learn/raw/master/point_colors.csv.zip Extract it and place it in your qgis_data directory in a directory named shapes. How to do it… We'll load the CSV file into QGIS as a vector layer and create a default point symbol. Then we'll specify the property and the CSV column we want to control. Finally we'll assign the symbol to the layer and add the layer to the map: First, create the URI string needed to load the CSV: uri = "file:///qgis_data/shapes/point_colors.csv?" 
uri += "type=csv&" uri += "xField=X&yField=Y&" uri += "spatialIndex=no&" uri += "subsetIndex=no&" uri += "watchFile=no&" uri += "crs=epsg:4326" Next, create the layer using the URI string: lyr = QgsVectorLayer(uri,"Points","delimitedtext") Now, create a default symbol for the layer's geometry type: sym = QgsSymbolV2.defaultSymbol(lyr.geometryType()) Then, we access the layer's symbol layer: symLyr = sym.symbolLayer(0) Now, we perform the key step, which is to assign a symbol layer property to a CSV column: symLyr.setDataDefinedProperty("color", '"COLOR"') Then, we change the existing symbol layer with our data-driven symbol layer: lyr.rendererV2().symbols()[0].changeSymbolLayer(0, symLyr) Finally, we add the layer to the map and verify that each point has the correct color, as defined in the CSV: QgsMapLayerRegistry.instance().addMapLayers([lyr]) How it works… In this example, we pulled feature colors from the CSV, but you could control any symbol layer property in this manner. CSV files can be a simple alternative to databases for lightweight applications or for testing key parts of a large application before investing the overhead to set up a database. Creating a complex vector layer symbol The true power of QGIS symbology lies in its ability to stack multiple symbols in order to create a single complex symbol. This ability makes it possible to create virtually any type of map symbol you can imagine. In this recipe, we'll merge two symbols to create a single symbol and begin unlocking the potential of complex symbols. Getting ready For this recipe, we will need a line shapefile, which you can download and extract from https://github.com/GeospatialPython/Learn/raw/master/paths.zip. Add this shapefile to a directory named shapes in your qgis_data directory. How to do it… Using Python Console, we will create a classic railroad line symbol by placing a series of short, rotated line markers along a regular line symbol. 
To do this, we need to perform the following steps: First, we load our line shapefile: lyr = QgsVectorLayer("/qgis_data/shapes/paths.shp", "Route", "ogr") Next, we get the symbol list and reference the default symbol: symbolList = lyr.rendererV2().symbols() symbol = symbolList[0] Then, we create a shorter variable name for the symbol layer registry: symLyrReg = QgsSymbolLayerV2Registry Now, we set up the line style for a simple line using a Python dictionary: lineStyle = {'width':'0.26', 'color':'0,0,0'} Then, we create an abstract symbol layer for a simple line: symLyr1Meta = symLyrReg.instance().symbolLayerMetadata("SimpleLine") We instantiate a symbol layer from the abstract layer using the line style properties: symLyr1 = symLyr1Meta.createSymbolLayer(lineStyle) Now, we add the symbol layer to the layer's symbol: symbol.appendSymbolLayer(symLyr1) Now, in order to create the rails on the railroad, we begin building a marker line style with another Python dictionary, as follows: markerStyle = {} markerStyle['width'] = '0.26' markerStyle['color'] = '0,0,0' markerStyle['interval'] = '3' markerStyle['interval_unit'] = 'MM' markerStyle['placement'] = 'interval' markerStyle['rotate'] = '1' Then, we create the marker line abstract symbol layer for the second symbol: symLyr2Meta = symLyrReg.instance().symbolLayerMetadata("MarkerLine") We instantiate the symbol layer, as shown here: symLyr2 = symLyr2Meta.createSymbolLayer(markerStyle) Now, we must work with a subsymbol that defines the markers along the marker line: sybSym = symLyr2.subSymbol() We must delete the default subsymbol: sybSym.deleteSymbolLayer(0) Now, we set up the style for our rail marker using a dictionary: railStyle = {'size':'2', 'color':'0,0,0', 'name':'line', 'angle':'0'} Now, we repeat the process of building a symbol layer and add it to the subsymbol: railMeta = symLyrReg.instance().symbolLayerMetadata("SimpleMarker") rail = railMeta.createSymbolLayer(railStyle) sybSym.appendSymbolLayer(rail) Then, we add the second symbol layer, which now carries the subsymbol, to the layer's symbol: symbol.appendSymbolLayer(symLyr2) Finally, we add the layer to the map: QgsMapLayerRegistry.instance().addMapLayer(lyr) How it works… First, we must create a simple line symbol. The marker line, by itself, will render correctly, but the underlying simple line will be a randomly chosen color. We must also change the subsymbol of the marker line because the default subsymbol is a simple circle. Using an outline for font markers Font markers open up broad possibilities for icons, but a single-color shape can be hard to see across a varied map background. Recently, QGIS added the ability to place outlines around font marker symbols. In this recipe, we'll use font marker symbol methods to place an outline around the symbol to give it contrast and, therefore, visibility on any type of background. Getting ready Download the following zipped shapefile. Extract it and place it in a directory named ms in your qgis_data directory: https://github.com/GeospatialPython/Learn/raw/master/tourism_points.zip How to do it… This recipe will load a layer from a shapefile, set up a font marker symbol, put an outline on it, and then add it to the layer. 
We'll use a simple text character, an @ sign, as our font marker to keep things simple: First, we need to import the QtGui library, so we can work with color objects: from PyQt4.QtGui import * Now, we create a path string to our shapefile: src = "/qgis_data/ms/tourism_points.shp" Next, we can create the layer: lyr = QgsVectorLayer(src, "Points of Interest", "ogr") Then, we can create the font marker symbol, specifying the font size and color in the constructor: symLyr = QgsFontMarkerSymbolLayerV2(pointSize=16, color=QColor("cyan")) Now, we can set the font family, character, outline width, and outline color: symLyr.setFontFamily("'Arial'") symLyr.setCharacter("@") symLyr.setOutlineWidth(.5) symLyr.setOutlineColor(QColor("black")) We are now ready to assign the symbol to the layer: lyr.rendererV2().symbols()[0].changeSymbolLayer(0, symLyr) Finally, we add the layer to the map: QgsMapLayerRegistry.instance().addMapLayer(lyr) Verify that your map looks similar to the following image: How it works… We used class methods to set this symbol up, but we also could have used a property dictionary just as easily. Note that the font size and color were set in the object constructor for the font marker symbol instead of using setter methods; QgsFontMarkerSymbolLayerV2 doesn't have methods for these two properties. Using arrow symbols Line features convey location, but sometimes you also need to convey a direction along a line. QGIS recently added a symbol that does just that by turning lines into arrows. In this recipe, we'll symbolize some line features showing historical human migration routes around the world. This data requires directional arrows for us to understand it. Getting ready We will use two shapefiles in this example. One is a world boundaries shapefile and the other is a routes shapefile. You can download the countries shapefile here: https://github.com/GeospatialPython/Learn/raw/master/countries.zip You can download the routes shapefile here: https://github.com/GeospatialPython/Learn/raw/master/human_migration_routes.zip Download these ZIP files and unzip the shapefiles into your qgis_data directory. How to do it… We will load the countries shapefile as a background reference layer and then the routes shapefile. Before we display the layers on the map, we'll create the arrow symbol layer, configure it, and then add it to the routes layer. Finally, we'll add the layers to the map. First, we'll create the URI strings for the paths to the two shapefiles: countries_shp = "/qgis_data/countries.shp" routes_shp = "/qgis_data/human_migration_routes.shp" Next, we'll create our countries and routes layers: countries = QgsVectorLayer(countries_shp, "Countries", "ogr") routes = QgsVectorLayer(routes_shp, "Human Migration Routes", "ogr") Now, we'll create the arrow symbol layer: symLyr = QgsArrowSymbolLayer() Then, we'll configure the layer. We'll use the default configuration except for two parameters: to curve the arrow and to not repeat the arrow symbol for each line segment: symLyr.setIsCurved(True) symLyr.setIsRepeated(False) Next, we add the symbol layer to the map layer: routes.rendererV2().symbols()[0].changeSymbolLayer(0, symLyr) Finally, we add the layers to the map: QgsMapLayerRegistry.instance().addMapLayers([routes,countries]) Verify that your map looks similar to the following image: How it works… The symbol calculates the arrow's direction based on the order of the feature's points. 
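Before styling, it can help to check which way each line actually runs, since that is what the arrow will follow. The following is a small sketch for the QGIS Python Console; it assumes the routes layer from this recipe is loaded in the routes variable and that the features are simple, single-part polylines:

for feature in routes.getFeatures():
    points = feature.geometry().asPolyline()  # vertices in storage order
    if points:
        print("Feature %d starts at %s and ends at %s" % (feature.id(), points[0].toString(), points[-1].toString()))

If a line points the wrong way, reversing its vertex order in an edit session flips the arrow.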
You may find that you need to edit the underlying feature data to produce the desired visual effect, especially when using curved arrows. You have limited control over the arc of the curve using the end points plus an optional third vertex. This symbol is one of several powerful new visual effects added to QGIS that would previously have been done in a vector illustration program after you produced a map. Summary In this article, we programmatically created dynamic maps using Python to control every aspect of the QGIS map canvas. We learned to dynamically apply symbology from data in a CSV file. We also learned how to use some newer QGIS custom symbology, including font markers, arrow symbols, null symbols, and the powerful new 2.5D renderer for buildings. We saw that every aspect of QGIS is up for grabs with Python, to write your own application. Sometimes, the PyQGIS API may not directly support our application goal, but there is nearly always a way to accomplish what you set out to do with QGIS. Resources for Article: Further resources on this subject: Normal maps [article] Putting the Fun in Functional Python [article] Revisiting Linux Network Basics [article]
Understanding Container Scenarios and Overview of Docker

Packt
24 Jan 2017
17 min read
Docker is one of the most successful recent open source projects; it provides packaging, shipping, and running of any application as lightweight containers. We can compare Docker containers to shipping containers, in that they provide a standard, consistent way of shipping any application. Docker is still a fairly new project, and with the help of this article it will be easier to troubleshoot some of the common problems that Docker users face while installing and using Docker containers. In this article by Rajdeep Dua, Vaibhav Kohli, and John Wooten, authors of the book Troubleshooting Docker, the emphasis will be on the following topics: Decoding containers Diving into Docker Advantages of Docker containers Docker lifecycle Docker design patterns Unikernels (For more resources related to this topic, see here.) Decoding containers Containerization is an alternative to virtual machines; it involves encapsulating an application and providing it with its own operating environment. The basic foundation for containers is Linux Containers (LXC), a user space interface for the Linux kernel's containment features. With the help of a powerful API and simple tools, it lets Linux users create and manage application containers. LXC containers sit somewhere between chroot and a full-fledged virtual machine. Another key difference between containerization and traditional hypervisors is that containers share the Linux kernel used by the operating system running on the host machine, so multiple containers running on the same machine use the same Linux kernel. This gives them the advantage of being fast, with almost zero performance overhead compared to VMs. The major use cases of containers are listed in the following sections. OS containers OS containers can easily be imagined as virtual machines (VMs), but unlike a VM they share the kernel of the host operating system while providing user space isolation. Similar to a VM, dedicated resources can be assigned to containers, and we can install, configure, and run different applications, libraries, and so on, just as we would on any VM. OS containers are helpful for scalability testing, where a fleet of containers can be deployed easily with different flavors of distros, which is far less expensive than deploying VMs. Containers are created from templates or images that determine the structure and contents of the container. This makes it possible to create containers with an identical environment, the same package versions, and the same configuration across all containers, which is mostly used for dev environment setups. There are various container technologies, such as LXC, OpenVZ, Docker, and BSD jails, that are suitable for OS containers. Figure 1: OS-based container Application containers Application containers are designed to run a single service in the package, while OS containers, explained previously, can support multiple processes. Application containers have attracted a lot of attention since the launch of Docker and Rocket. Whenever such a container is launched, it runs a single process; this process runs the application, whereas an OS container runs multiple services on the same OS. Containers usually take a layered approach, as in the case of Docker containers, which helps in reducing duplication and increasing reuse. A container can be started with a base image common to all components, and then we can go on adding layers to the filesystem that are specific to each component. The layered filesystem helps to roll back changes, as we can simply switch to the old layers if required. 
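If you want to see this layering for yourself, the layers of a local image can be listed programmatically. The following is a rough sketch using the Docker SDK for Python (the docker package on PyPI, installed with pip install docker); the ubuntu:14.04 image is only an example, and a running local Docker daemon is assumed:

import docker

client = docker.from_env()                         # connect to the local Docker daemon
image = client.images.pull("ubuntu", tag="14.04")  # any image you have access to works

# Each history entry corresponds to one layer, newest first,
# along with the instruction that created it and its size.
for layer in image.history():
    print(layer.get("CreatedBy", ""), layer.get("Size", 0))

The same information is available on the command line with docker history.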
Each RUN command specified in a Dockerfile adds a new layer to the image. The main purpose of application containers is to package the different components of the application in separate containers. The components, packaged separately in containers, then interact with the help of APIs and services. The deployment of such a distributed, multi-component system is the basic implementation of a micro-service architecture. In this approach, the developer gets the freedom to package the application as per his requirements, and the IT team gets the ability to deploy the containers on multiple platforms in order to scale the system both horizontally and vertically. A hypervisor is a virtual machine monitor (VMM), used to allow multiple operating systems to run and share the hardware resources of the host. Each virtual machine is termed a guest machine. The following simple example explains the difference between application containers and OS containers: Figure 2: Docker layers Let's consider the example of a three-tier web architecture: we have a database tier such as MySQL, Nginx as the load balancer, and Node.js as the application tier: Figure 3: OS container In the case of an OS container, we can pick Ubuntu as the default base container and install the MySQL, Nginx, and Node.js services using a Dockerfile. This type of packaging is good for testing or for a development setup where all the services are packaged together and can be shipped and shared across developers. But deploying this architecture in production cannot be done with OS containers, as there is no consideration of data scalability and isolation. Application containers help to meet this use case, as we can scale the required component by deploying more application-specific containers, and they also help to meet the load-balancing and recovery use cases. For the preceding three-tier architecture, each of the services will be packaged into separate containers in order to fulfill the architecture deployment use case. Figure 4: Application containers scaled up The main differences between OS and application containers are: an OS container is meant to run multiple services on the same OS, natively has no layered filesystem, and examples include LXC, OpenVZ, and BSD jails; an application container is meant to run a single service, has a layered filesystem, and examples include Docker and Rocket. Diving into Docker Docker is a container implementation that has gathered enormous interest in recent years. It neatly bundles various Linux kernel features and services, such as namespaces, cgroups, and SELinux and AppArmor profiles, with union filesystems such as AUFS and BTRFS to make modular images. These images provide a highly configurable virtualized environment for applications and follow the write-once-run-anywhere principle. An application can be as simple as a single process or as complex as many highly scalable, distributed processes working together. Docker is getting a lot of traction in the industry because of its performance and universally replicable architecture, while providing the following four cornerstones of modern application development: Autonomy Decentralization Parallelism Isolation Furthermore, the wide-scale adoption of Thoughtworks's micro-services architecture, or Lots of Small Applications (LOSA), is bringing further potential to Docker technology. As a result, big companies such as Google, VMware, and Microsoft have already ported Docker to their infrastructure, and the momentum is continued by the launch of a myriad of Docker startups, namely Tutum, Flocker, Giantswarm, and so on. 
Since Docker containers replicate their behavior anywhere, be it your development machine, a bare-metal server, a virtual machine, or a datacenter, application designers can focus their attention on development, while operational semantics are left to DevOps. This makes the team workflow modular, efficient, and productive. Docker is not to be confused with a VM, even though they are both virtualization technologies. Docker shares the host OS while providing a sufficient level of isolation and security to the applications running in containers; a VM completely abstracts out the OS and gives strong isolation and security guarantees. But Docker's resource footprint is minuscule in comparison to a VM, and hence it is preferred for economy and performance. However, it still cannot completely replace VMs, and so it is complementary to VM technology: Figure 5: VM and Docker architecture Advantages of Docker containers The following are some of the advantages of using Docker containers in a micro-service architecture: Rapid application deployment: With minimal runtime requirements, containers can be deployed quickly because of their reduced size, as only the application is packaged. Portability: An application with its operating environment (dependencies) can be bundled together into a single Docker container that is independent of the OS version or deployment model. Docker containers can easily be transferred to another machine that runs Docker and executed there without any compatibility issues. Windows support is also going to be part of future Docker releases. Easily sharable: Pre-built container images can easily be shared with the help of public repositories, as well as hosted private repositories for internal use. Lightweight footprint: Docker images themselves are very small and have a minimal footprint, making it easy to deploy new applications with the help of containers. Reusability: Successive versions of Docker containers can easily be built, and rolled back to previous versions whenever required. This makes them noticeably lightweight, as components from pre-existing layers can be reused. Docker lifecycle These are some of the basic steps involved in the lifecycle of a Docker container: Build the Docker image with the help of a Dockerfile, which contains all the commands required for packaging. It can be run in the following way: docker build A tag name can be added in the following way: docker build -t username/my-imagename If the Dockerfile exists at a different path, then the docker build command can be executed by providing the -f flag: docker build -t username/my-imagename -f /path/Dockerfile After the image creation, docker run can be used to deploy the container. The running containers can be checked with the help of the docker ps command, which lists the currently active containers. There are two more commands to be discussed: docker pause: This command uses the cgroups freezer to suspend all the processes running in a container; internally it uses the SIGSTOP signal. Using this command, processes can easily be suspended and resumed whenever required. docker start: This command is used to start a stopped container (a paused container is resumed with docker unpause). After the usage of a container is done, it can either be stopped or killed. docker stop will gracefully stop the running container by sending SIGTERM and then SIGKILL. In this case, the container can still be listed by using the docker ps -a command. docker kill will kill the running container by sending SIGKILL to the main process running inside the container. 
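The same lifecycle can be driven from code instead of the shell. The following is a minimal sketch using the Docker SDK for Python (pip install docker); the image and the sleep command are placeholders chosen only for illustration, and a running local daemon is assumed:

import docker

client = docker.from_env()

# Equivalent of: docker run -d ubuntu:14.04 sleep 300
container = client.containers.run("ubuntu:14.04", "sleep 300", detach=True)
print(client.containers.list())           # docker ps

container.pause()                          # suspend processes via the cgroups freezer
container.unpause()                        # resume them

container.stop()                           # SIGTERM first, then SIGKILL
print(client.containers.list(all=True))    # docker ps -a still lists the stopped container
container.remove()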
If there are changes made to the container while it is running that need to be preserved, the container can be converted back into an image by using docker commit after the container has been stopped. Figure 6: Docker lifecycle Docker design patterns The following are some Docker design patterns, with examples. A Dockerfile is the base structure from which we define a Docker image; it contains all the commands needed to assemble the image. Using the docker build command, we can create an automated build that executes all the previously mentioned command-line instructions to create an image: $ docker build Sending build context to Docker daemon 6.51 MB ... The design patterns listed here can help in creating Docker images that persist in volumes and provide various other kinds of flexibility, so that they can be re-created or replaced easily at any time. The base image sharing For creating a web-based application or blog, we can create a base image that can be shared and helps to deploy the application with ease. This pattern helps because it packages all the required services on top of one base image, so that this web application or blog image can be reused anywhere: FROM debian:wheezy RUN apt-get update RUN apt-get -y install ruby ruby-dev build-essential git # For debugging RUN apt-get install -y gdb strace # Set up my user RUN useradd vkohli -u 1000 -s /bin/bash --no-create-home RUN gem install -n /usr/bin bundler RUN gem install -n /usr/bin rake WORKDIR /home/vkohli/ ENV HOME /home/vkohli VOLUME ["/home"] USER vkohli EXPOSE 8080 The preceding Dockerfile shows the standard way of creating an application-based image. A Docker image is a compressed file that is a snapshot of all the configuration parameters, as well as the changes made on the base image (the kernel of the OS). It installs some specific tools (the Ruby tools rake and bundler) on top of the Debian base image. It creates a new user, adds it to the container image, and specifies the working directory by mounting the /home directory from the host, which is explained in detail in the next section. Shared volume Sharing a volume at the host level allows other containers to pick up the shared content they require. This helps in faster rebuilding of the Docker image and in adding, modifying, or removing dependencies. For example, if we are creating the homepage deployment of the previously mentioned blog, the only directory required to be shared with this web app container is the /home/vkohli/src/repos/homepage directory, through the Dockerfile, in the following way: FROM vkohli/devbase WORKDIR /home/vkohli/src/repos/homepage ENTRYPOINT bin/homepage web For creating the dev version of the blog, we can share the folder /home/vkohli/src/repos/blog, where all the related developer files can reside. And for creating the dev-version image, we can take the base image from the pre-created devbase: FROM vkohli/devbase WORKDIR / USER root # For Graphviz integration RUN apt-get update RUN apt-get -y install graphviz xsltproc imagemagick USER vkohli WORKDIR /home/vkohli/src/repos/blog ENTRYPOINT bundle exec rackup -p 8080 Dev-tools container For development purposes, we have separate dependencies in the dev and production environments, which easily get co-mingled at some point. Containers can be helpful in differentiating the dependencies by packaging them separately. 
As shown in the following example we can derive the dev tools container image from the base image and install development dependencies on top of it even allowing ssh connection so that we to work upon the code: FROM vkohli/devbase RUN apt-get update RUN apt-get -y install openssh-server emacs23-nox htop screen # For debugging RUN apt-get -y install sudo wget curl telnet tcpdump # For 32-bit experiments RUN apt-get -y install gcc-multilib # Man pages and "most" viewer: RUN apt-get install -y man most RUN mkdir /var/run/sshd ENTRYPOINT /usr/sbin/sshd -D VOLUME ["/home"] EXPOSE 22 EXPOSE 8080 As can be seen previously basic tools such as wget, curl, tcpdump are installed which are required during development. Even SSHD service is installed which allows to ssh connection into the dev container. Test environment container Testing the code in different environment always eases the process and helps to find more bugs in isolation. We can create a ruby environment in separate container to spawn a new ruby shell and use it to test the code base: FROM vkohli/devbase RUN apt-get update RUN apt-get -y install ruby1.8 git ruby1.8-dev In the preceding Dockerfile we are using the base image as devbase and with help of just one command Docker run can easily create a new environment by using the image created from this Dockerfile to test the code. The build container We have built steps involved in the application that are sometimes expensive. In order to overcome this we can create a separate a build container which can use the dependencies needed during build process. Following Dockerfile can be used to run a separate build process: FROM sampleapp RUN apt-get update RUN apt-get install -y build-essential [assorted dev packages for libraries] VOLUME ["/build"] WORKDIR /build CMD ["bundler", "install","--path","vendor","--standalone"] /build directory is the shared directory that can be used to provide the compiled binaries also we can mount the /build/source directory in the container to provide updated dependencies. Thus by using build container we can decouple the build process and final packaging part in separate containers. It still encapsulates both the process and dependencies by breaking the previous process in separate containers. The installation container The purpose of this container is to package the installation steps in separate container. Basically, in order to provide deployment of container in production environment. Sample Dockerfile to package the installation script inside Docker image as follows: ADD installer /installer CMD /installer.sh The installer.sh can contain the specific installation command to deploy container in production environment and also to provide the proxy setup with DNS entry in order to have the cohesive environment deployed. Service-in-a-box container In order to deploy the complete application in a container we can bundle multiple services to provide the complete deployment container. In this case we bundle web app, API service and database together in one container. 
It helps to ease the pain of inter-linking various separate containers: services: web: git_url: git@github.com:vkohli/sampleapp.git git_branch: test command: rackup -p 3000 build_command: rake db:migrate deploy_command: rake db:migrate log_folder: /usr/src/app/log ports: ["3000:80:443", "4000"] volumes: ["/tmp:/tmp/mnt_folder"] health: default api: image: quay.io/john/node command: node test.js ports: ["1337:8080"] requires: ["web"] databases: - "mysql" - "redis" Infrastructure container As we have talked about the container usage in development environment, there is one big category missing the usage of container for infrastructure services such as proxy setup which provides a cohesive environment in order to provide the access to application. In the following mentioned Dockerfile example we can see that haproxy is installed and links to its configuration file is provided: FROM debian:wheezy ADD wheezy-backports.list /etc/apt/sources.list.d/ RUN apt-get update RUN apt-get -y install haproxy ADD haproxy.cfg /etc/haproxy/haproxy.cfg CMD ["haproxy", "-db", "-f", "/etc/haproxy/haproxy.cfg"] EXPOSE 80 EXPOSE 443 Haproxy.cfg is the configuration file responsible for authenticating a user: backend test acl authok http_auth(adminusers) http-request auth realm vkohli if !authok server s1 192.168.0.44:8084 Unikernels Unikernels compiles source code into a custom operating system that includes only the functionality required by the application logic producing a specialized single address space machine image, eliminating unnecessary code. Unikernels is built using library operating system, which has the following benefits compared to traditional OS: Fast Boot time: Unikernels make provisioning highly dynamic and can boot in less than second. Small Footprint: Unikernel code base is smaller than the traditional OS equivalents and pretty much easy to manage. Improved security: As unnecessary code is not deployed, the attack surface is drastically reduced. Fine-grained optimization: Unikernels are constructed using compile tool chain and are optimized for device drivers and application logic to be used. Unikernels matches very well with the micro-services architecture as both source code and generated binaries can be easily version-controlled and are compact enough to be rebuild. Whereas on other side modifying VM's is not permitted and changes can be only made to source code which is time-consuming and hectic. For example, if the application doesn't require disk access and display facility. Unikernels can help to remove this unnecessary device drivers and display functionality from the Kernel. Thus production system becomes minimalistic only packaging the application code, runtime environment and OS facilities which is the basic concept of immutable application deployment where new image is constructed if any application change is required in production servers: Figure 7: Transition from traditional container to Unikernel based containers Container and Unikernels are best fit for each other. Recently, Unikernel system has become part of Docker and the collaboration of both this technology will be seen sooner in the next Docker release. As it is explained in the preceding diagram the first one shows the traditional way of packaging one VM supporting multiple Docker containers. The next step shows 1:1 map (one container per VM) which allows each application to be self-contained and gives better resource usage but creating a separate VM for each container adds an overhead. 
In the last step, we can see the collaboration of Unikernels with the existing Docker tools and ecosystem, where each container gets the low-level kernel library environment specific to its needs. The adoption of Unikernels in the Docker toolchain will accelerate the progress of Unikernels; they will be widely used and understood as a packaging model and runtime framework, making Unikernels another type of container. Once the Unikernels abstraction is available to Docker developers, we will be able to choose either a traditional Docker container or a Unikernel container in order to create the production environment. Summary In this article, we studied the basic containerization concepts with the help of application and OS-based containers. The differences between them explained in this article will help developers choose the containerization approach that fits their system best. We have also thrown some light on Docker technology, its advantages, and the lifecycle of a Docker container. The eight Docker design patterns explained in this article show ways to implement Docker containers in a production environment. Resources for Article: Further resources on this subject: Orchestration with Docker Swarm [article] Benefits and Components of Docker [article] Docker Hosts [article]
Common PHP Scenarios

Packt
24 Jan 2017
11 min read
Introduction In this article by Tim Butler, author of the book Nginx 1.9 Cookbook, we'll go through examples of the more common PHP scenarios and how to implement them with Nginx. PHP is a thoroughly tested product to use with Nginx because it is the most popular web-based programming language. It powers sites, such as Facebook, Wikipedia, and every WordPress-based site, and its popularity hasn't faded as other languages have grown. (For more resources related to this topic, see here.) As WordPress is the most popular of the PHP systems, I've put some additional information to help with troubleshooting. Even if you're not using WordPress, some of this information may be helpful if you run into issues with other PHP frameworks. Most of the recipes expect that you have a working understanding of the PHP systems, so not all of the setup steps for the systems will be covered. In order to keep the configurations as simple as possible, I haven't included details such as cache headers or SSL configurations in these recipes. Configuring Nginx for WordPress Covering nearly 30 percent of all websites, WordPress is certainly the Content Management System (CMS) of choice by many. Although it came from a blogging background, WordPress is a very powerful CMS for all content types and powers some of the world's busiest websites. By combing it with Nginx, you can deploy a highly scalable web platform. You can view the official WordPress documentation on Nginx at https://codex.wordpress.org/Nginx. We'll also cover some of the more complex WordPress scenarios, including multisite configurations with subdomains and directories. Let's get started. Getting ready To compile PHP code and run it via Nginx, the preferred method is via PHP-FPM, a high speed FastCGI Process Manager. We'll also need to install PHP itself and for the sake of simplicity, we'll stick with the OS supplied version. Those seeking the highest possible performance should ensure they're running PHP 7 (released December 3, 2015), which can offer a 2-3x speed improvement for WordPress. To install PHP-FPM, you should run the following on a Debian/Ubuntu system: sudo apt-get install php5-fpm For those running CentOS/RHEL, you should run the following: sudo yum install php-fpm As PHP itself is a prerequisite for the php-fpm packages, it will also be installed. Note: Other packages such as MySQL will be required if you're intending on running this on a single VPS instance. Consult the WordPress documentation for a full list of requirements. How to do it… At this instance, we're simply using a standalone WordPress site, which would be deployed in many personal and business scenarios. This is the typical deployment for WordPress. For ease of management, I've created a dedicated config file just for the WordPress site (/etc/nginx.conf.d/generic-wordpress.conf): server { listen 80; server_name wordpressdemo.nginxcookbook.com; access_log /var/log/nginx/access.log combined; location / { root /var/www/html; try_files $uri $uri/ /index.php?$args; } location ~ .php$ { fastcgi_pass unix:/var/run/php5-fpm.sock; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; include fastcgi_params; } } Restart Nginx to pickup the new configuration file and then check your log files if there are any errors. If you're installing WordPress from scratch, you should see the following: You can complete the WordPress installation if you haven't already. 
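Before looking at how the configuration works, it can be useful to smoke-test the new server block from another machine. The following is a rough sketch, not part of the original recipe, using only the Python standard library; the hostname matches the server_name above, and the permalink path is made up purely for illustration:

from urllib import request, error

for path in ("/", "/2017/01/hello-world/"):
    url = "http://wordpressdemo.nginxcookbook.com" + path
    try:
        with request.urlopen(url, timeout=10) as response:
            print(url, response.getcode())
    except error.HTTPError as http_error:
        print(url, http_error.code)  # a 403, 404, or 502 here maps to the troubleshooting table later in this recipe

A 200 for both URLs suggests the static root and the index.php fallback are both being served; anything else points to the troubleshooting table in the There's more… section.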
How it works… For the root URL call, we have a new try_files directive, which will attempt to load the files in the order specified, but will fallback to the last parameter if they all fail. For this WordPress example, it means that any static files will be served if they exist on the system, then fallback to /index.php?args if this fails. This can also be very handy for automatic maintenance pages too. The args rewrite allows the permalinks of the site to be in a much more human form. For example, if you have a working WordPress installation, you can see links such as the one shown in the following image: Lastly, we process all PHP files via the FastCGI interface to PHP-FPM. In the preceding example, we're referencing the Ubuntu/Debian standard; if you're running CentOS/RHEL, then the path will be /var/run/php-fpm.sock. Nginx is simply proxying the connection to the PHP-FPM instance, rather than being part of Nginx itself. This separation allows for greater resource control, especially since the number of incoming requests to the webserver don't necessarily match the number of PHP requests for a typical website. There’s more… Take care when copying and pasting any configuration files. It's very easy to miss something and have one thing slightly different in your environment, which will cause issues with the website working as expected. Here's a quick lookup table of various other issues which you may come across: Error What to check 502 Bad Gateway File ownership permissions for the PHP-FPM socket file 404 File Not Found Check for the missing index index.php directive 403 Forbidden Check for the correct path in the root directive Your error log (defaults to /var/log/nginx/error.log) will generally contain a lot more detail in regard to the issue you're seeing compared with what's displayed in the browser. Make sure you check the error log if you receive any errors. Hint: Nginx does not support .htaccess files. If you see examples on the web referencing a .htaccess files, these are Apache specific. Make sure any configurations you're looking at are for Nginx. WordPress multisite with Nginx WordPress multisites (also referred to as network sites) allow you to run multiple websites from the one codebase. This can reduce the management burden of having separate WordPress installs when you have similar sites. For example, if you have a sporting site with separate news and staff for different regions, you can use a Multisite install to accomplish this. How to do it... To convert a WordPress site into a multisite, you need to add the configuration variable into your config file: define( 'WP_ALLOW_MULTISITE', true ); Under the Tools menu, you'll now see an extra menu called Network Setup. This will present you with two main options, Sub-domains and Sub-directories. This is the two different ways the multisite installation will work. The Sub-domains option have the sites separated by domain names, for example, site1.nginxcookbook.com and site2.nginxcookbook.com. The Sub-directories option mean that the sites are separated by directories, for example, www.nginxcookbook.com/site1 and www.nginxcookbook.com/site2. There's no functional difference between the two, it's simply an aesthetic choice. However, once you've made your choice, you cannot return to the previous state. Once you've made the choice, it will then provide the additional code to add to your wp-config.php file. 
Here's the code for my example instance, which is subdirectory based: define('MULTISITE', true); define('SUBDOMAIN_INSTALL', false); define('DOMAIN_CURRENT_SITE', 'wordpress.nginxcookbook.com'); define('PATH_CURRENT_SITE', '/'); define('SITE_ID_CURRENT_SITE', 1); define('BLOG_ID_CURRENT_SITE', 1); Because Nginx doesn't support .htaccess files, the second part of the WordPress instructions will not work. Instead, we need to modify the Nginx configuration to provide the rewrite rules ourselves. In the existing /etc/nginx/conf.d/wordpress.conf file, you'll need to add the following just after the location / directive: if (!-e $request_filename) { rewrite /wp-admin$ $scheme://$host$uri/ permanent; rewrite ^(/[^/]+)?(/wp-.*) $2 last; rewrite ^(/[^/]+)?(/.*.php) $2 last; } Although the if statements are normally avoided if possible, at this instance, it will ensure the subdirectory multisite configuration works as expected. If you're expecting a few thousand concurrent users on your site, then it may be worthwhile investigating the static mapping of each site. There are plugins to assist with the map generations for this, but they are still more complex compared to the if statement. Subdomains If you've selected subdomains, your code to put in wp-config.php will look like this: define('MULTISITE', true); define('SUBDOMAIN_INSTALL', true); define('DOMAIN_CURRENT_SITE', 'wordpressdemo.nginxcookbook.com'); define('PATH_CURRENT_SITE', '/'); define('SITE_ID_CURRENT_SITE', 1); define('BLOG_ID_CURRENT_SITE', 1); You'll also need to modify the Nginx config as well to add the wildcard in for the server name: server_name *.wordpressdemo.nginxcookbook.com wordpressdemo.nginxcookbook.com; You can now add in the additional sites such as site1.wordpressdemo.nginxcookbook.com and there won't be any changes required for Nginx. See also Nginx recipe page: https://www.nginx.com/resources/wiki/start/topics/recipes/wordpress/ WordPress Codex page: https://codex.wordpress.org/Nginx Running Drupal using Nginx With version 8 recently released and a community of over 1 million supporters, Drupal remains a popular choice when it comes to a highly flexible and functional CMS platform. Version 8 has over 200 new features compared to version 7, aimed at improving both the usability and manageability of the system. This cookbook will be using version 8.0.5. Getting ready This example assumes you already have a working instance of Drupal or are familiar with the installation process. You can also follow the installation guide available at https://www.drupal.org/documentation/install. How to do it... This recipe is for a basic Drupal configuration, with the Drupal files located in /var/www/vhosts/drupal. Here's the configuration to use: server { listen 80; server_name drupal.nginxcookbook.com; access_log /var/log/nginx/drupal.access.log combined; index index.php; root /var/www/vhosts/drupal/; location / { try_files $uri $uri/ /index.php?$args; } location ~ (^|/). { return 403; } location ~ /vendor/.*.php$ { deny all; return 404; } location ~ .php$|^/update.php { fastcgi_pass unix:/var/run/php5-fpm.sock; fastcgi_split_path_info ^(.+?.php)(|/.*)$; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; include fastcgi_params; } } How it works… Based on a simple PHP-FPM structure, we make a few key changes specific for the Drupal environment. The first change is as follows: location ~ (^|/). 
{ return 403; } We put a block in for any files beginning with a dot, which are normally hidden and/or system files. This is to prevent accidental information leakage: location ~ /vendor/.*.php$ { deny all; return 404; } Any PHP file within the vendor directory is also blocked, as they shouldn't be called directly. Blocking the PHP files limits any potential exploit opportunity which could be discovered in third-party code. Lastly, Drupal 8 changed the way the PHP functions are called for updates, which causes any old configuration to break. The location directive for the PHP files looks like this: location ~ .php$|^/update.php { This is to allow the distinct pattern that Drupal uses, where the PHP filename could be midway through the URI. We also modify how the FastCGI process splits the string, so that we ensure we always get the correct answer: fastcgi_split_path_info ^(.+?.php)(|/.*)$; See also Nginx Recipe: https://www.nginx.com/resources/wiki/start/topics/recipes/drupal/ Using Nginx with MediaWiki MediaWiki, most recognized by its use with Wikipedia, is the most popular open source wiki platform available. With features heavily focused on the ease of editing and sharing content, MediaWiki makes a great system to store information you want to continually edit: Getting ready This example assumes you already have a working instance of MediaWiki or are familiar with the installation process. For those unfamiliar with the process, it's available online at https://www.mediawiki.org/wiki/Manual:Installation_guide. How to do it... The basic Nginx configuration for MediaWiki is very similar to many other PHP platforms. It has a flat directory structure which easily runs with basic system resources. Here's the configuration: server { listen 80; server_name mediawiki.nginxcookbook.com; access_log /var/log/nginx/mediawiki.access.log combined; index index.php; root /var/www/vhosts/mediawiki/; location / { try_files $uri $uri/ /index.php?$args; } location ~ .php$ { fastcgi_pass unix:/var/run/php5-fpm.sock; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; include fastcgi_params; } } The default installation doesn't use any rewrite rules, which means you'll get URLs such as index.php?title=Main_Page instead of the neater (and more readable) /wiki/Main_Page. To enable this, we need to edit the LocalSettings.php file and add the following lines: $wgArticlePath = "/wiki/$1"; $wgUsePathInfo = TRUE; This allows the URLs to be rewritten in a much neater format. See also NGINX Recipe: https://www.nginx.com/resources/wiki/start/topics/recipes/mediawiki/ Summary In this article we learned common PHP scenarios and how to configure them with Nginx. The first recipe talks about how to configure Nginx for WordPress. Then we learned how to set up a WordPress multisite. In third recipe we discussed how to configure and run Drupal using Nginx. In the last recipe we learned how to configure Nginx for MediaWiki. Resources for Article: Further resources on this subject: A Configuration Guide [article] Nginx service [article] Getting Started with Nginx [article]
The Storage - Apache Cassandra

Packt
23 Jan 2017
42 min read
In this article by Raúl Estrada, the author of the book Fast Data Processing Systems with SMACK Stack, we will learn about Apache Cassandra. We have reached the part where we talk about storage; the C in the SMACK stack refers to Cassandra. The reader may wonder: why not use a conventional database? The answer is that Cassandra is the database that propels giants like Walmart, CERN, Cisco, Facebook, Netflix, and Twitter. Spark makes heavy use of Cassandra's power, and application efficiency is greatly increased by using the Spark Cassandra Connector. This article has the following sections: A bit of history NoSQL Apache Cassandra installation Authentication and authorization (roles) Backup and recovery The Spark Cassandra Connector (For more resources related to this topic, see here.) A bit of history In Greek mythology, there was a priestess who was chastised for her treason against the god Apollo. She asked for the power of prophecy in exchange for a carnal meeting; however, she failed to fulfill her part of the deal. So, she received a punishment: she would have the power of prophecy, but no one would ever believe her forecasts. This priestess's name was Cassandra. Moving to more recent times, let's say 50 years ago, the world of computing has seen big changes. In 1960, the HDD (Hard Disk Drive) took precedence over magnetic tape, making data handling easier. In 1966, IBM created the Information Management System (IMS) for the Apollo space program, from whose hierarchical model IBM DB2 was later developed. In the 1970s, a model appeared that fundamentally changed the existing data storage methods: the relational data model, devised by Codd as an alternative to IBM's IMS and its way of organizing and storing data. In 1985, his work presented 12 rules that a database should meet in order to be considered a relational database. Then the Web (especially social networks) appeared and demanded the storage of large amounts of data. With a Relational Database Management System (RDBMS), scaling carries real costs, driven by the number of users, the amount of data, the response time, and the time it takes to run a specific query on the database. In the beginning, it was possible to solve this through vertical scaling: the server machine is upgraded with more RAM, faster processors, and larger and faster HDDs. This mitigates the problem, but does not make it disappear. When the same problem occurs again and the server cannot be upgraded any further, the only solution is to add a new server, which itself may hide unplanned costs: OS licenses, the Database Management System (DBMS), and so on, not to mention data replication, transactions, and data consistency under normal use. One solution to such problems is the use of NoSQL databases. NoSQL was born from the need to process large amounts of data on large hardware platforms built by clustering servers. The term NoSQL is perhaps not precise; a more appropriate term would be Not Only SQL. It is used for several non-relational databases such as Apache Cassandra, MongoDB, Riak, Neo4J, and so on, which have become more widespread in recent years. NoSQL We will read NoSQL as Not only SQL (SQL: Structured Query Language). A NoSQL database is a distributed database with an emphasis on scalability, high availability, and ease of administration, in contrast to established relational databases. Don't think of it as a direct replacement for an RDBMS; rather, it is an alternative or a complement. 
The focus is on avoiding unnecessary complexity: a solution for data storage according to today's needs, without a fixed schema. Due to its distributed nature, cloud computing is a great NoSQL sponsor. A NoSQL database model can be:
Key-value/Tuple based: for example, Redis, Oracle NoSQL (ACID compliant), Riak, Tokyo Cabinet/Tyrant, Voldemort, Amazon Dynamo, and Memcached; used by LinkedIn, Amazon, BestBuy, GitHub, and AOL.
Wide row/Column-oriented: for example, Google BigTable, Apache Cassandra, HBase/Hypertable, and Amazon SimpleDB; used by Amazon, Google, Facebook, and RealNetworks.
Document based: for example, CouchDB (ACID compliant), MongoDB, Terrastore, and Lotus Notes (possibly the oldest); used by various financial and other relevant institutions: the US Army, SAP, MTV, and SourceForge.
Object based: for example, db4o, Versant, Objectivity, and NEO; used by Siemens, China Telecom, and the European Space Agency.
Graph based: for example, Neo4J, InfiniteGraph, VertexDB, and FlockDB; used by Twitter, Nortler, Ericsson, Qualcomm, and Siemens.
XML, multivalue, and others.
In Table 4-1, we have a comparison of the mentioned data models:
Model | Performance | Scalability | Flexibility | Complexity | Functionality
key-value | high | high | high | low | depends
column | high | high | high | low | depends
document | high | high | high | low | depends
graph | depends | depends | high | high | graph theory
RDBMS | depends | depends | low | moderate | relational algebra
Table 4-1: Categorization and comparison of NoSQL data models, after Scofield and Popescu
NoSQL or SQL?
This is the wrong question. It would be better to ask: what do we need? Basically, it all depends on the application's needs. Nothing is black and white. If consistency is essential, use an RDBMS. If we need high availability, fault tolerance, and scalability, then use NoSQL. The recommendation for a new project is to evaluate the best of each world. It doesn't make sense to force NoSQL where it doesn't fit, because its benefits (scalability, read/write speed an entire order of magnitude higher, a soft data model) are conditional advantages, achieved only on the set of problems they were designed to solve. It is necessary to carefully weigh, beyond marketing, what exactly is needed, what kind of strategy is needed, and how it will be applied to solve our problem. Consider using a NoSQL database only when you decide that it is a better solution than SQL. The challenges for NoSQL databases are elastic scaling, cost effectiveness, simplicity, and flexibility. In Table 4-2, we compare the two models:
NoSQL | RDBMS
Schema-less | Relational schema
Scalable read/write | Scalable read
Auto high availability | Custom high availability
Limited queries | Flexible queries
Eventual consistency | Consistency
BASE | ACID
Table 4-2: Comparison of NoSQL and RDBMS
CAP (Brewer's theorem)
In 2000, the nineteenth international symposium on principles of distributed computing was held in Portland, Oregon, in the United States, where Eric Brewer, a professor at UC Berkeley, gave the keynote. In his presentation, among other things, he said that there are three basic system requirements which have a special relationship when designing and implementing applications in a distributed environment, and that a distributed system can have a maximum of two of the three properties (which is the basis of his theorem).
The three properties are:
Consistency: The data read from a second node must be exactly the same data as on the first node (there could be a delay if someone in between is performing an update, but the data must not be different).
Availability: A failure of one node doesn't mean the loss of its data; the system must still be able to return the requested data.
Partition tolerance: In the event of a breakdown in communication between two nodes, the system should still work, meaning the data will still be available.
In Figure 4-1, we show CAP (Brewer's theorem) with some examples.
Figure 4-1: CAP (Brewer's theorem)
Apache Cassandra installation
In the Facebook laboratories, although not visible to the public, new software is developed; Cassandra, for example, is the junction of two concepts coming out of the development departments of Google and Amazon. In short, Cassandra is defined as a distributed database. From the beginning, the authors took on the task of creating a massively decentralized, scalable database, optimized for read operations when possible, able to modify data structures painlessly, and, with all this, not difficult to manage. The solution was found by combining two existing technologies: Google's BigTable and Amazon's Dynamo. One of the two authors, A. Lakshman, had earlier worked on BigTable and borrowed the data model layout from it, while Dynamo contributed the overall distributed architecture.
Cassandra is written in Java and, for good performance, it requires the latest possible JDK version. Cassandra 1.0 used another open source project, Thrift, for client access, which also came from Facebook and is currently an Apache Software project. In Cassandra 2.0, Thrift was removed in favor of CQL. Thrift was not made just for Cassandra; it is a software library and code generator for accessing backend services. Cassandra administration is done with the command-line tools or via the JMX console, and the default installation allows us to use additional client tools. Since this is a server cluster, it has different administration rules, and it is always good to review the documentation to take advantage of other people's experiences. Cassandra has managed very demanding tasks successfully. It is often used on sites serving a huge number of users (such as Twitter, Digg, Facebook, and Cisco) that relatively often change their complex data models to meet new challenges, and that usually do not want to deal with expensive hardware or licenses. At the time of writing, the Cassandra homepage (http://cassandra.apache.org) says that Apple Inc., for example, has a 75,000-node cluster storing 10 petabytes.
Data model
The storage model of Cassandra can be seen as a sorted HashMap of sorted HashMaps. Cassandra stores rows in key-value form. In this model, the number of columns is not predefined in advance as in standard relational databases; a single row can contain several columns. The column (Figure 4-2, Column) is the smallest atomic unit of the model. Each column consists of a triplet: a name, a value (stored as a series of bytes without regard to the source type), and a timestamp (used to determine the most recent record).
Figure 4-2: Column
All data triplets are obtained from the client, even the timestamp (see the sketch below).
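A minimal CQL sketch (not from the book) of a client-supplied timestamp: it reuses the hr.employee table created later in this article, and the values and the microsecond timestamp are purely illustrative:
cqlsh:hr> insert into employee (sid, name, last_name)
      ... values (2, 'Ana', 'Lopez')
      ... using timestamp 1484006400000000;  -- client-chosen write time, in microseconds since the epoch
If two writes target the same column, the one carrying the most recent timestamp wins, which is exactly the conflict-resolution rule used throughout Cassandra.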
Thus, the row consists of a key and a set of data triplets (Figure 4-3). Here is how the super column will look:
Figure 4-3: Super column
In addition, the columns can be grouped into so-called column families (Figure 4-4, Column family), which are somewhat equivalent to tables and can be indexed:
Figure 4-4: Column family
A higher logical unit is the super column (as shown in the following Figure 4-5, Super column family), in which columns contain other columns:
Figure 4-5: Super column family
Above all is the key space (as shown in Figure 4-6, Cluster with key spaces), which is the equivalent of a relational schema and is typically used by one application. The data model is simple, but at the same time very flexible, and it takes some time to become accustomed to the new way of thinking while rejecting all of SQL's syntactic luxury. The replication factor is unique per keyspace. Moreover, a keyspace can span multiple clusters and have different replication factors for each of them; this is used in geo-distributed deployments.
Figure 4-6: Cluster with key spaces
Data storage
Apache Cassandra is designed to process large amounts of data in a short time; this way of storing data is taken from its big brother, Google's BigTable. Cassandra has a commit log file in which all new data is recorded in order to ensure durability. When data is successfully written to the commit log file, the freshest data is also stored in a memory structure called the memtable (Cassandra considers a write successful only when the same information is in both the commit log and the memtable). Data within the memtable is sorted by row key. When the memtable is full, its contents are copied to the hard drive into a structure called a Sorted String Table (SSTable). The process of copying content from the memtable into an SSTable is called a flush. Data flushes are performed periodically, although they can also be carried out manually (for example, before restarting a node) through the nodetool flush command.
The SSTable provides an immutable, sorted map of row keys to values. Data entered into one SSTable cannot be changed, but it is possible to enter new data. The internal structure of an SSTable consists of a series of 64 KB blocks (the block size can be changed); internally, an SSTable has a block index used to locate the blocks. One data row is usually stored within several SSTables, so reading a single data row is performed in the background by combining the SSTables and the memtable (which has not yet been flushed). To optimize this combining process, Cassandra uses a memory structure called a Bloom filter. Every SSTable has a Bloom filter that checks whether the requested row key is in the SSTable before looking it up on disk.
In order to reduce row fragmentation across several SSTables, Cassandra performs another background process: compaction, a merge of several SSTables into a single SSTable. Fragmented data is combined based on the row key values. After creating the new SSTable, the old SSTables are labeled as outdated and marked for deletion in the garbage collector process. Compaction has different strategies, size-tiered compaction and leveled compaction, and both have their own benefits for different scenarios.
Installation
To install Cassandra, go to http://www.planetcassandra.org/cassandra/. Installation is simple: after downloading the compressed files, extract them and change a couple of settings in the configuration files (set the new directory paths). Then run the startup script to activate a single node and the database server, as sketched below.
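This shell sketch (not from the book) illustrates the tarball route just described; the archive name is hypothetical and the paths assume the standard layout of the Apache Cassandra binary distribution, so adjust them to the version you actually download:
$ tar -xzf apache-cassandra-2.0.0-bin.tar.gz    # hypothetical archive name
$ cd apache-cassandra-2.0.0
$ vi conf/cassandra.yaml                        # set data_file_directories and commitlog_directory to the new paths
$ bin/cassandra -f                              # start a single node in the foreground
$ bin/cqlsh localhost                           # from another terminal, connect with the CQL shell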
Of course, it is possible to use Cassandra on only one node, but we lose its main power: distribution. The process of adding new servers to the cluster is called bootstrap and is generally not a difficult operation. Once all the servers are active, they form a ring of nodes, none of which is central, meaning there is no main server. Within the ring, information is propagated to all servers through a gossip protocol. In short, one node transmits information about new instances to only some of its known colleagues, and if one of them already knows about the new node from other sources, the propagation from the first node is stopped. Thus, information about a node is propagated through the network in an efficient and rapid way. For the gossip protocol to work, a newly activated node needs to seed its information to at least one existing server in the cluster. The server receives its numeric identifier, and each of the ring nodes stores its data. Which nodes store a given piece of information depends on the MD5 hash of the key-value (a combination of key and value), as shown in Figure 4-7, Nodes within a cluster.
Figure 4-7: Nodes within a cluster
The nodes are in a circular stack, that is, a ring, and each record is stored on multiple nodes, so in case one of them fails, the data is still available. Nodes are occupied according to their identifier's integer range; that is, if the calculated value falls into a node's range, then the data is saved there. Saving is not performed on only one node (more is better): an operation is considered a success if the data is correctly stored on as many nodes as possible. All of this is parameterized. In this way, Cassandra achieves sufficient data consistency and provides greater robustness of the entire system; if one node in the ring fails, it is always possible to retrieve valid information from the other nodes. In the event that a node comes back online, it is necessary to synchronize its data, which is achieved through the read operation. The data is read from all the ring servers, and a node keeps just the data accepted as valid, that is, the most recent data; the comparison is made according to the timestamp records. The nodes that don't have the latest information refresh their data in a low-priority background process. Although this brief description of the architecture makes it sound like it is full of holes, in reality everything works flawlessly. Indeed, more servers in the game implies a better general situation.
DataStax OpsCenter
In this section, we install Cassandra on a computer with a Windows operating system (to prove that nobody is excluded). Installing software under the Apache open license can be complicated on a Windows computer, especially if it is new software, such as Cassandra. To make things simpler, we will use a distribution package that eases the installation, start-up, and work with Cassandra on a Windows computer. The distribution used in this example is called DataStax Community Edition. DataStax contains Apache Cassandra, along with the Cassandra Query Language (CQL) tool and the free version of DataStax OpsCenter for managing and monitoring the Cassandra cluster. We can say that OpsCenter is a kind of DBMS for NoSQL databases.
After downloading the installer from DataStax's official site, the installation process is quite simple; just keep in mind that DataStax supports Windows 7 and Windows Server 2008, and that DataStax used on a Windows computer requires the Chrome or Firefox web browser (Internet Explorer is not supported). When starting DataStax on a Windows computer, DataStax will open as in Figure 4-8, DataStax OpsCenter.
Figure 4-8: DataStax OpsCenter
DataStax consists of a control panel (dashboard), in which we review the events, performance, and capacity of the cluster and also see how many nodes belong to our cluster (in this case a single node). In cluster control, we can see the different types of views (ring, physical, list). Adding a new key space (the equivalent of creating a database in a classic DBMS) is done through the CQL Shell using CQL, or using the DataStax data modeling tool. Also, using the data explorer we can view the column families and the database.
Creating a key space
The main tool for managing Cassandra, CQL, runs in a console interface, and this tool is used to add new key spaces in which we will create column families. A key space is created as follows:
cqlsh> create keyspace hr with strategy_class='SimpleStrategy' and strategy_options:replication_factor=1;
After opening the CQL Shell, the create keyspace command makes a new key space; the strategy_class='SimpleStrategy' parameter sets the replication strategy class used when creating the new key space. Optionally, strategy_options:replication_factor=1 sets how many copies of each row are kept in the cluster; a replication factor of 1 produces only one copy of each row (if we set it to 2, we will have two copies of each row, stored on two different nodes).
cqlsh> use hr;
cqlsh:hr> create columnfamily employee (sid int primary key,
... name varchar,
... last_name varchar);
There are two replication strategies for key spaces, SimpleStrategy and NetworkTopologyStrategy, whose syntax is as follows:
{ 'class' : 'SimpleStrategy', 'replication_factor' : <integer> };
{ 'class' : 'NetworkTopologyStrategy'[, '<data center>' : <integer>, '<data center>' : <integer>] . . . };
When NetworkTopologyStrategy is configured as the replication strategy, we set up one or more virtual data centers.
To create a new column family, we use the create command: select the desired key space and, with the command create columnfamily employee, we create a new table in which we define sid, an integer, as the primary key, plus other attributes like name and last_name. To make a data entry in the column family, we use the insert command:
insert into <table_name> (<attribute_1>, <attribute_2>, ..., <attribute_n>) values (<value_1>, <value_2>, ..., <value_n>);
When filling data tables we use the common SQL syntax:
cqlsh:hr> insert into employee (sid, name, last_name) values (1, 'Raul', 'Estrada');
So we enter data values. With the select command we can review our insert:
cqlsh:hr> select * from employee;
 sid | name | last_name
-----+------+-----------
   1 | Raul | Estrada
Authentication and authorization (roles)
In Cassandra, authentication and authorization must be configured in the cassandra.yaml file and two additional files. The first file assigns rights to users over key spaces and column families, while the second assigns passwords to users. These files are called access.properties and passwd.properties, and are located in the Cassandra installation directory. They can be opened using our favorite text editor in order to be successfully configured; the related cassandra.yaml entries are sketched below.
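The exact cassandra.yaml entries depend on the Cassandra release, so the following is only a rough, hedged sketch and not the book's configuration: it assumes an old release that still ships the legacy SimpleAuthenticator/SimpleAuthority classes, and both the property names and class names may differ in your version:
# cassandra.yaml -- hypothetical excerpt for legacy file-based authentication
authenticator: org.apache.cassandra.auth.SimpleAuthenticator   # users and passwords read from passwd.properties
authority: org.apache.cassandra.auth.SimpleAuthority           # permissions read from access.properties
In recent releases these classes are no longer part of the core distribution, which matches the note below about this function not being included in the latest versions.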
Setting up a simple authentication and authorization
The steps are the following:
In the access.properties file, we add the access rights of users and their permissions to read and write certain key spaces and column families. Syntax:
keyspace.columnfamily.permits = users
Example 1: hr <rw> = restrada
Example 2: hr.cars <ro> = restrada, raparicio
In example 1, we give full rights over the key space hr to restrada, while in example 2 we give the users read-only rights over the column family cars.
In the passwd.properties file, user names are matched to passwords; on the left side of the equals sign we write the username and on the right side the password:
Example: restrada = Swordfish01
After we change the files, before restarting Cassandra it is necessary to type the following command in the terminal in order to reflect the changes in the database:
$ cd <installation_directory>
$ sh bin/cassandra -f -Dpasswd.properties=conf/passwd.properties -Daccess.properties=conf/access.properties
Note: this third step of setting up authentication and authorization doesn't work on Windows computers and is only needed on Linux distributions. Also, note that user authentication and authorization should not be solved through Cassandra; for safety reasons, in the latest Cassandra versions this function is not included.
Backup
Because Cassandra is a NoSQL database, the data written to a single node is copied to other nodes; the exact number of copies depends on the replication factor established when we create a new key space. But like any other standard SQL database, Cassandra also offers the option to create a backup on the local computer. Cassandra creates a copy of the database using snapshots. It is possible to make a snapshot of all the key spaces, or of just one column family. It is also possible to make a snapshot of the entire cluster using the parallel SSH tool (pssh). If the user decides to snapshot the entire cluster, it can be reinitiated and an incremental backup used on each node. Incremental backups provide a way to back up each node separately, by setting the incremental_backups flag to true in cassandra.yaml. When incremental backups are enabled, Cassandra hard-links each flushed SSTable to a backups directory under the keyspace data directory. This allows storing backups offsite without transferring entire snapshots.
To snapshot a key space we use the nodetool command:
Syntax: nodetool snapshot -cf <ColumnFamily> <keyspace> -t <snapshot_name>
Example: nodetool snapshot -cf cars hr -t snapshot1
The snapshot is stored in the Cassandra installation directory:
C:\Program Files\DataStax Community\data\data\en\example\snapshots
Compression
Compression increases the capacity of the cluster nodes by reducing the size of the data on disk. In doing so, compression also enhances the server's disk performance. Compression in Cassandra works best when compressing a column family with a lot of columns, when each row has the same columns, or when we have a lot of common columns with the same data. A good example of this is a column family that contains user information such as user name and password, because it is likely that the same data is repeated. The more the same data is repeated across rows, the higher the compression ratio. Column family compression is done with the Cassandra-CLI tool, sketched below.
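As a quick, hedged sketch (not from the book), the Cassandra-CLI that ships with the DataStax Community distribution can be started against the local node and pointed at a key space before creating or updating column families; the host, port, and key space name here are only examples:
$ cassandra-cli -h localhost -p 9160    # 9160 is the default Thrift port
[default@unknown] use hr;               # select the key space to work in
[default@hr] ...                        # create or update column families from here, as shown next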
It is possible to update existing column families or create a new column family with specific compression conditions, for example, the compression shown here:
CREATE COLUMN FAMILY users
WITH comparator = 'UTF8Type'
AND key_validation_class = 'UTF8Type'
AND column_metadata = [
(column_name: name, validation_class: UTF8Type)
(column_name: email, validation_class: UTF8Type)
(column_name: country, validation_class: UTF8Type)
(column_name: birth_date, validation_class: LongType)
]
AND compression_options=(sstable_compression:SnappyCompressor, chunk_length_kb:64);
We will see this output:
Waiting for schema agreement....
... schemas agree across the cluster
After opening the Cassandra-CLI, we need to choose the key space where the new column family will be. When creating a column family, it is necessary to state that the comparator (UTF8Type) and key_validation_class are of the same type. With this we ensure that when executing the command we won't get an exception (generated by a bug). After listing the column names, we set compression_options, which has two possible classes: SnappyCompressor, which provides faster data compression, or DeflateCompressor, which provides a higher compression ratio. The chunk_length adjusts the compression chunk size in kilobytes.
Recovery
Recovering a key space snapshot requires all the snapshots made for a certain column family. If you use incremental backups, it is also necessary to provide the incremental backups created after the snapshot. There are multiple ways to perform a recovery from a snapshot. We can use the SSTable loader tool (used exclusively on the Linux distribution) or we can restore the data by restarting the node, as described next.
Restart node
If the recovery is running on one node, we must first shut down the node. If the recovery is for the entire cluster, it is necessary to restart each node in the cluster. Here is the procedure:
Shut down the node.
Delete all the log files in: C:\Program Files\DataStax Community\logs
Delete all .db files within the specified key space and column family: C:\Program Files\DataStax Community\data\data\en\cars
Locate all snapshots related to the column family: C:\Program Files\DataStax Community\data\data\en\cars\snapshots\1,351,279,613,842
Copy them to: C:\Program Files\DataStax Community\data\data\en\cars
Restart the node.
Printing schema
Through DataStax OpsCenter or the Apache Cassandra CLI we can obtain the schemas (key spaces) with the associated column families, but there is no way to export or print them. Apache Cassandra is not an RDBMS, and it is not possible to obtain a relational model schema from the key space database.
Logs
Apache Cassandra and DataStax OpsCenter both use the Apache log4j logging service API. In the directory where DataStax is installed, under apache-cassandra and opscenter, is the conf directory where the files log4j-server.properties and log4j-tools.properties (for apache-cassandra) and log4j.properties (for OpsCenter) are located. The parameters of the log4j files can be modified using a text editor. Log files are stored in plain text in the ...\DataStax Community\logs directory; here it is possible to change the directory used to store the log files.
Configuring log4j
The log4j configuration files are divided into several parts where all the parameters are set to specify how collected data is processed and written into the log files. For the RootLogger:
# RootLogger level
log4j.rootLogger = INFO, stdout, R
This section defines the logging level applied to all the events recorded in the log file.
As we can see in Table 4-3, the log level can be:
Level | Record
ALL | The lowest level; all events are recorded in the log file
DEBUG | Detailed information about events
ERROR | Information about runtime errors or unexpected events
FATAL | Critical error information
INFO | Information about the state of the system
OFF | The highest level; log file recording is turned off
TRACE | Detailed debug information
WARN | Information about potentially adverse events (unwanted/unexpected runtime errors)
Table 4-3: log4j log levels
For standard out (stdout):
# stdout
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern = %5p %d{HH:mm:ss,SSS} %m%n
Through the standard output appender, we define the appearance of the data in the log. The ConsoleAppender class is used to write entries to the log, and the ConversionPattern defines the appearance of the data written into the log file; the pattern above determines how the stored data will look, as defined by the previous configuration.
Log file rotation
In this example, we rotate the log when it reaches 20 MB and we retain just 50 log files:
# rolling log file
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.maxFileSize=20MB
log4j.appender.R.maxBackupIndex=50
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%5p [%t] %d{ISO8601} %F (line %L) %m%n
This part sets up the log files. The RollingFileAppender class inherits from FileAppender, and its role is to make a log file backup when the file reaches a given size (in this case 20 MB). The RollingFileAppender class has several methods; these two are the most used:
public void setMaxFileSize( String value )
Method that defines the file size; it can take a value from 0 to 2^63 using the abbreviations KB, MB, or GB. The value is automatically converted (in the example, the file size is limited to 20 MB).
public void setMaxBackupIndex( int maxBackups )
Method that defines how many backup files are kept before the oldest log file is deleted (in this case, 50 log files are retained).
To set the location where the log files will be stored, use:
# Edit the next line to point to your logs directory
log4j.appender.R.File=C:/Program Files (x86)/DataStax Community/logs/cassandra.log
User activity log
The log4j API has the ability to store user activity logs. In production, it is not recommended to use the DEBUG or TRACE log levels.
Transaction log
As mentioned earlier, any new data is stored in the commit log file. Within the cassandra.yaml configuration file, we can set the location where the commit log files will be stored:
# commit log
commitlog_directory: "C:/Program Files (x86)/DataStax Community/data/commitlog"
SQL dump
It is not possible to make a database SQL dump; we can only snapshot the DB.
CQL
CQL means Cassandra Query Language, and it is a language like SQL. With this language we make queries on a key space. There are several ways to interact with a key space; in the previous sections we showed how to do it using the CQL shell. Since CQL is the first way to interact with Cassandra, in Table 4-4, Shell command summary, we see the main commands that can be used in the CQL shell:
Command | Description
cqlsh | Starts the CQL interactive shell session.
CAPTURE | Captures command output and appends it to a file.
CONSISTENCY | Shows the current consistency level, or given a level, sets it.
COPY | Imports and exports CSV (comma-separated values) data to and from Cassandra.
DESCRIBE | Provides information about the connected Cassandra cluster, or about the data objects stored in the cluster.
EXPAND | Formats the output of a query vertically.
EXIT | Terminates cqlsh.
PAGING | Enables or disables query paging.
SHOW | Shows the Cassandra version, host, or tracing information for the current cqlsh client session.
SOURCE | Executes a file containing CQL statements.
TRACING | Enables or disables request tracing.
Table 4-4: Shell command summary
For more detailed information on shell commands, visit: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/cqlshCommandsTOC.html
CQL commands
CQL is very similar to SQL, as we have already seen in this article. Table 4-5, CQL command summary, lists the language commands. CQL, like SQL, is based on sentences/statements. These sentences are for data manipulation and work with their logical container, the key space. Like SQL statements, they must end with a semicolon (;).
Command | Description
ALTER KEYSPACE | Change property values of a keyspace.
ALTER TABLE | Modify the column metadata of a table.
ALTER TYPE | Modify a user-defined type. Cassandra 2.1 and later.
ALTER USER | Alter existing user options.
BATCH | Write multiple DML statements.
CREATE INDEX | Define a new index on a single column of a table.
CREATE KEYSPACE | Define a new keyspace and its replica placement strategy.
CREATE TABLE | Define a new table.
CREATE TRIGGER | Registers a trigger on a table.
CREATE TYPE | Create a user-defined type. Cassandra 2.1 and later.
CREATE USER | Create a new user.
DELETE | Removes entire rows or one or more columns from one or more rows.
DESCRIBE | Provides information about the connected Cassandra cluster, or about the data objects stored in the cluster.
DROP INDEX | Drop the named index.
DROP KEYSPACE | Remove the keyspace.
DROP TABLE | Remove the named table.
DROP TRIGGER | Removes registration of a trigger.
DROP TYPE | Drop a user-defined type. Cassandra 2.1 and later.
DROP USER | Remove a user.
GRANT | Provide access to database objects.
INSERT | Add or update columns.
LIST PERMISSIONS | List permissions granted to a user.
LIST USERS | List existing users and their superuser status.
REVOKE | Revoke user permissions.
SELECT | Retrieve data from a Cassandra table.
TRUNCATE | Remove all data from a table.
UPDATE | Update columns in a row.
USE | Connect the client session to a keyspace.
Table 4-5: CQL command summary
For more detailed information on CQL commands, visit: http://docs.datastax.com/en/cql/3.1/cql/cql_reference/cqlCommandsTOC.html
DBMS Cluster
The idea of Cassandra is a database working in a cluster, that is, databases on multiple nodes. Although building clusters on Linux servers is the primary target for Cassandra, it also offers the possibility to build clusters on Windows computers. The first task that must be done prior to setting up a cluster on Windows computers is opening the firewall for the Cassandra DBMS and DataStax OpsCenter. The ports that must be open for Cassandra are 7000 and 9160. For OpsCenter, the ports are 7199, 8888, 61620, and 61621. These are the default ports when we install Cassandra and OpsCenter; however, we can specify new ports if necessary. Immediately after installing Cassandra and OpsCenter on a Windows computer, it is necessary to stop the DataStax OpsCenter service and the DataStax OpsCenter agent, as shown in Figure 4-9, Microsoft Windows display services.
Figure 4-9: Microsoft Windows display services
One of Cassandra's advantages is that it automatically distributes data among the computers of the cluster using an algorithm applied to the incoming data. To successfully perform this, it is necessary to assign tokens to each computer in the cluster. The token is a numeric identifier that indicates the computer's position in the cluster and the range of data in the cluster that computer is responsible for. For token generation we can use the Python interpreter that comes with the Cassandra installation, located in DataStax's installation directory. In the code for generating tokens, the variable num=2 refers to the number of computers in the cluster:
$ python -c "num=2; print \"\n\".join([(\"token %d: %d\" % (i, (i*(2**127)/num))) for i in range(0,num)])"
We will see an output like this:
token 0: 0
token 1: 88743298547982745894789547895490438209
It is necessary to preserve the token values because they will be required in the following steps. We now need to configure the cassandra.yaml file, which we have already met in the authentication and authorization section. The cassandra.yaml file must be configured separately on each computer in the cluster. After opening the file, you need to make the following changes:
initial_token: On each computer in the cluster, copy one of the generated tokens. Start from token 0 and assign each computer a unique token.
listen_address: Here we enter the IP of the computer being configured.
seeds: Here you need to enter the IP address of the primary (main) node in the cluster.
Once the file is modified and saved, you must restart the DataStax Community Server as we already saw. This should be done only on the primary node. After that, it is possible to check whether the cluster nodes can communicate using the node tool. In the node tool, enter the following command:
nodetool -h localhost ring
If the cluster works, we will see the following result:
Address  DC           Rack   Status  State   Load      Owns   Token
-        datacenter1  rack1  Up      Normal  13.41 Kb  50.0%  88743298547982745894789547895490438209
-        datacenter1  rack1  Up      Normal  6.68 Kb   50.0%  88743298547982745894789547895490438209
If the cluster is operating normally, select which computer will be the primary OpsCenter (it may not be the primary node). Then, on that computer, open opscenter.conf, which can be found in DataStax's installation directory. In that file, you need to find the webserver interface section and set the parameter to the value 0.0.0.0. After that, in the agent section, change the incoming_interface parameter to your computer's IP address. In DataStax's installation directory (on each computer in the cluster) we must configure the address.yaml file. Within these files, set the stomp_interface and local_interface parameters to the IP address of the computer where the file is configured. Now the primary computer should run the DataStax OpsCenter Community and DataStax OpsCenter agent services. After that, run the DataStax OpsCenter agent service on all the other nodes. At this point it is possible to open DataStax OpsCenter with an Internet browser, and OpsCenter should look like Figure 4-10, Display cluster in OpsCenter.
Figure 4-10: Display cluster in OpsCenter
Deleting the database
In Apache Cassandra, there are several ways to delete the database (key space) or parts of the database (a column family, individual rows within a column family, and so on).
Although the easiest way to make a deletion is using the DataStax OpsCenter data modeling tool, there are commands that can be executed through the Cassandra-CLI or the CQL shell.
CLI delete commands
In Table 4-6, we have the CLI delete commands:
CLI command | Function
part | Used to delete a super column, a column from the column family, or rows within certain columns
drop columnfamily | Deletes a column family and all the data contained in it
drop keyspace | Deletes the key space, all its column families, and the data contained in them
truncate | Deletes all the data from the selected column family
Table 4-6: CLI delete commands
CQL shell delete commands
In Table 4-7, we have the shell delete commands:
CQL shell command | Function
alter_drop | Deletes a specified column from the column family
delete | Deletes one or more columns from one or more rows of the selected column family
delete_columns | Deletes columns from the column family
delete_where | Deletes individual rows
drop_table | Deletes the selected column family and all the data contained in it
drop_columnfamily | Deletes the column family and all the data contained in it
drop_keyspace | Deletes the key space, all its column families, and all the data contained in them
truncate | Deletes all data from the selected column family
Table 4-7: CQL shell delete commands
DB and DBMS optimization
Cassandra optimization is specified in the cassandra.yaml file; these properties are used to adjust performance and to specify the use of system resources such as disk I/O, memory, and CPU usage.
column_index_size_in_kb
Initial value: 64 KB
Range of values: -
Column indices are added to each row once the data reaches the default size of 64 kilobytes.
commitlog_segment_size_in_mb
Initial value: 32 MB
Range of values: 8-1024 MB
Determines the size of a commit log segment. A commit log segment may be archived, deleted, or recycled after its data has been transferred to SSTables.
commitlog_sync
Initial value: -
Range of values: -
The method Cassandra uses to acknowledge writes. It is closely correlated with commitlog_sync_period_in_ms, which controls how often the commit log is synchronized with the disk.
commitlog_sync_period_in_ms
Initial value: 1000 ms
Range of values: -
Decides how often to send the commit log to disk when commitlog_sync is in periodic mode.
commitlog_total_space_in_mb
Initial value: 4096 MB
Range of values: -
When the size of the commit log reaches this value, Cassandra removes the oldest parts of the commit log. This reduces the amount of data to replay on start-up.
compaction_preheat_key_cache
Initial value: true
Range of values: true/false
When this value is set to true, cached row keys are tracked during compaction and resaved to their new location in the compacted SSTable.
compaction_throughput_mb_per_sec
Initial value: 16
Range of values: 0-32
Throttles compaction to the given total throughput across the system. Faster data insertion requires faster compaction.
concurrent_compactors
Initial value: 1 per CPU core
Range of values: depends on the number of CPU cores
Adjusts the number of simultaneous compaction processes on the node.
concurrent_reads
Initial value: 32
Range of values: -
When there is more data than fits in memory, the bottleneck is reading data from disk.
concurrent_writes
Initial value: 32
Range of values: -
Making inserts in Cassandra does not depend on I/O limitations; concurrent inserts depend on the number of CPU cores. The recommended value is 8 per core.
flush_largest_memtables_at
Initial value: 0.75
Range of values: -
This parameter flushes the largest memtable when Java heap usage exceeds this fraction, in order to free memory. It can be used as an emergency measure to prevent out-of-memory errors.
in_memory_compaction_limit_in_mb
Initial value: 64
Range of values: -
Size limit for rows compacted in memory. Larger rows use a slower compaction method.
index_interval
Initial value: 128
Range of values: 128-512
Controls the sampling of entries from the row index, trading space for time: the larger the sampling interval, the less memory is used, but the less effective the index. In technical terms, the interval corresponds to the number of index entries skipped between each sample taken.
memtable_flush_queue_size
Initial value: 4
Range of values: at minimum, the maximum number of secondary indexes defined on a single column family
Indicates the total number of full memtables allowed to wait for a flush, that is, waiting for a writer thread.
memtable_flush_writers
Initial value: 1 (per data directory)
Range of values: -
Number of memtable flush writer threads. These threads are blocked by disk I/O, and each thread holds a memtable in memory while it is blocked.
memtable_total_space_in_mb
Initial value: 1/3 of the Java heap
Range of values: -
Total amount of memory used for all the column family memtables on the node.
multithreaded_compaction
Initial value: false
Range of values: true/false
Useful only on nodes using solid state disks.
reduce_cache_capacity_to
Initial value: 0.6
Range of values: -
Used in combination with reduce_cache_sizes_at. When the Java heap reaches the value of reduce_cache_sizes_at, this value is the fraction to which the total cache size is reduced (in this case, the size of the cache is reduced to 60%). Used to avoid unexpected out-of-memory errors.
reduce_cache_sizes_at
Initial value: 0.85
Range of values: 1.0 (disabled)
When the Java heap occupancy after a full garbage collector sweep reaches the percentage stated in this variable (85%), Cassandra reduces the size of the cache to the value of the variable reduce_cache_capacity_to.
stream_throughput_outbound_megabits_per_sec
Initial value: off, that is, 400 Mbps (50 MB/s)
Range of values: -
Throttles outbound file (stream) transfer on a node to the given throughput in Mbps. This is necessary because Cassandra mostly does sequential I/O when it streams data during system startup or repair, which can lead to network saturation and affect Remote Procedure Call performance.
Bloom filter
Every SSTable has a Bloom filter. On data requests, the Bloom filter checks whether the requested row exists in the SSTable before any disk I/O is performed. If the Bloom filter value is set too low, it may consume large amounts of memory; a higher value means less memory use. The Bloom filter range of values is from 0.000744 to 1.0. It is recommended to keep the value below 0.1. The Bloom filter value of a column family is adjusted through the CQL shell as follows:
ALTER TABLE <column_family> WITH bloom_filter_fp_chance = 0.01;
Data cache
Apache Cassandra has two caches with which it achieves highly efficient data caching. These are:
key cache (default: enabled): caches the primary key index of each column family
row cache (default: disabled): holds rows in memory so that reads can be served without using the disk
If the key cache and the row cache are both set, a data query proceeds in the way shown in Figure 4-11, Apache Cassandra cache.
Figure 4-11: Apache Cassandra cache
When information is requested, the row cache is checked first; if the information is available, the row cache returns the result without reading from the disk. If the row cache cannot return a result, Cassandra checks whether the data can be retrieved through the key cache, which is more efficient than reading from the disk; the retrieved data is finally written to the row cache. As the key cache stores the key locations of an individual column family in memory, any increase in the key cache has a positive impact on reading data from that column family. If the situation permits, a combination of key cache and row cache increases efficiency. It is recommended that the size of the key cache be set in relation to the size of the Java heap. The row cache is used in situations where the data access pattern follows a normal (Gaussian) distribution over rows that contain often-read data, and where queries often return most or all of the columns.
Within the cassandra.yaml file, we have the following options to configure the data cache:
key_cache_size_in_mb
Initial value: empty, meaning "auto" (min(5% of the heap in MB, 100 MB))
Range of values: blank, or 0 (key cache disabled)
Variable that defines the key cache size per node.
row_cache_size_in_mb
Initial value: 0 (disabled)
Range of values: -
Variable that defines the row cache size per node.
key_cache_save_period
Initial value: 14400 (that is, 4 hours)
Range of values: -
Variable that defines how often the key cache is saved to disk.
row_cache_save_period
Initial value: 0 (disabled)
Range of values: -
Variable that defines how often the row cache is saved to disk.
row_cache_provider
Initial value: SerializingCacheProvider
Range of values: ConcurrentLinkedHashCacheProvider or SerializingCacheProvider
Variable that defines the implementation of the row cache.
Java heap tune up
Apache Cassandra interacts with the operating system through the Java virtual machine, so the Java heap size plays an important role. When starting Cassandra, the size of the Java heap is set automatically based on the total amount of RAM (Table 4-8, Determination of the Java heap relative to the amount of RAM). The Java heap size can be manually adjusted by changing the values of the following variables contained in the file cassandra-env.sh, located in the directory ...\apache-cassandra\conf:
# MAX_HEAP_SIZE="4G"
# HEAP_NEWSIZE="800M"
Total system memory | Java heap size
< 2 GB | Half of the system memory
2 GB - 4 GB | 1 GB
> 4 GB | One quarter of the system memory, but no more than 8 GB
Table 4-8: Determination of the Java heap relative to the amount of RAM
Java garbage collection tune up
Apache Cassandra has a GC Inspector which is responsible for collecting information on each garbage collection process longer than 200 ms. Garbage collection processes that occur frequently and take a lot of time (such as a concurrent mark-sweep that takes several seconds) indicate that there is great pressure on garbage collection and on the JVM. The recommendations to address these issues include:
Add new nodes
Reduce the cache size
Adjust items related to JVM garbage collection
Views, triggers, and stored procedures
By definition (in an RDBMS), a view represents a virtual table that acts like a real (created) table but does not actually contain any data; the data obtained is the result of a SELECT query. A view consists of a combination of rows and columns from one or more tables.
In NoSQL, respectively, all the data for a row key in Cassandra is placed in one column family. Since in NoSQL there are no JOIN commands and there is no possibility of flexible queries, the SELECT command lists the actual data, but there is no display option for a virtual table, that is, a view. Since Cassandra does not belong to the RDBMS group, there is also no possibility of creating triggers or stored procedures. Referential integrity (RI) restrictions can be set only in the application code. Also, as Cassandra does not belong to the RDBMS group, we cannot apply Codd's rules.
Client-server architecture
At this point, we have probably already noticed that Apache Cassandra runs on a client-server architecture. By definition, the client-server architecture allows distributed applications, since the tasks are divided into two main parts:
On one hand, the service providers: the servers.
On the other hand, the service petitioners: the clients.
In this architecture, several clients are allowed to access the server; the server is responsible for meeting requests and handling each one according to its own rules. So far, we have only used one client, managed from the same machine, that is, from the same data network. The CQL shell allows us to connect to Cassandra, access a key space, and send CQL statements to the Cassandra server. This is the most immediate method, but in daily practice it is common to access the key spaces from different execution contexts (other systems and other programming languages). Thus, we require clients other than the CQL shell; to do this in the Apache Cassandra context, we require connection drivers.
Drivers
A driver is just a software component that allows access to a key space in order to run CQL statements. Fortunately, there are already a lot of drivers to create clients for Cassandra in almost any modern programming language; you can see an extensive list at this URL: http://wiki.apache.org/cassandra/ClientOptions. Typically, in a client-server architecture, different clients access the server, and these clients are distributed across different networks. Our implementation needs will dictate the required clients.
Summary
NoSQL is not just hype or a young technology; it is an alternative, with known limitations and capabilities. It is not an RDBMS killer. It's more like a younger brother who is slowly growing up and takes on some of the burden. Acceptance is increasing, and it will be even better as NoSQL solutions mature. Skepticism may be justified, but only for concrete reasons. Since Cassandra is an easy and free working environment, suitable for application development, it is recommended, especially with the additional utilities that ease and accelerate database administration. Cassandra has some faults (for example, user authentication and authorization are still insufficiently supported in Windows environments) and is preferably used when there is a need to store large amounts of data. For start-up companies that need to manipulate large amounts of data with the aim of reducing costs, implementing Cassandra in a Linux environment is a must-have.
Resources for Article:
Further resources on this subject:
Getting Started with Apache Cassandra [article]
Apache Cassandra: Working in Multiple Datacenter Environments [article]
Apache Cassandra: Libraries and Applications [article]

article-image-welcome-new-world
Packt
23 Jan 2017
8 min read
Save for later

Welcome to the New World

Packt
23 Jan 2017
8 min read
We live in very exciting times. Technology is changing at a pace so rapid that it is becoming near impossible to keep up with these new frontiers as they arrive. And they seem to arrive on a daily basis now. Moore's Law continues to stand, meaning that technology is getting smaller and more powerful at a constant rate. As I said, very exciting. In this article by Jason Odom, the author of the book HoloLens Beginner's Guide, we will be discussing one of these new emerging technologies that is finally reaching a place more material than science fiction stories: Augmented, or Mixed, Reality. Imagine a world where our communication and entertainment devices are worn, and the digital tools we use, as well as the games we play, are holographic projections in the world around us. These holograms know how to interact with our world and change to fit our needs. Microsoft has led the charge by releasing such a device... the HoloLens.
(For more resources related to this topic, see here.)
The Microsoft HoloLens changes the paradigm of what we know as personal computing. We can now have our Word window up on the wall (this is how I am typing right now), we can have research material floating around it, and we can have our communication tools like Gmail and Skype in the area as well. We are finally no longer tied to a virtual desktop, on a screen, sitting on a physical desktop. We aren't even trapped by the confines of a room anymore.
What exactly is the HoloLens?
The HoloLens is a first-of-its-kind, head-worn standalone computer with a sensor array that includes microphones and multiple types of cameras, a spatial sound speaker array, a light projector, and an optical waveguide. The HoloLens is not only a wearable computer; it is also a complete replacement for the standard two-dimensional display. The HoloLens has the capability of using holographic projection to create multiple screens throughout an environment, as well as fully 3D-rendered objects. With the HoloLens sensor array, these holograms can fully interact with the environment you are in. The sensor array allows the HoloLens to see the world around it, to see input from the user's hands, and to hear voice commands. While Microsoft has been very quiet about what the entire sensor array includes, we have a good general idea about the components used; let's have a look at them:
One IMU: The Inertial Measurement Unit (IMU) is a sensor array that includes an accelerometer, a gyroscope, and a magnetometer. This unit handles head orientation tracking and compensates for drift that comes from the gyroscope's eventual lack of precision.
Four environment understanding sensors: Together these form the spatial mapping that the HoloLens uses to create a mesh of the world around the user.
One depth camera: Also known as a structured-light 3D scanner. This device is used for measuring the three-dimensional shape of an object using projected light patterns and a camera system. Microsoft first used this type of camera inside the Kinect for the Xbox 360 and Xbox One.
One ambient light sensor: Ambient light sensors, or photosensors, are used for ambient light sensing as well as proximity detection.
2 MP photo/HD video camera: For taking pictures and video.
Four-microphone array: These do a great job of listening to the user and not the sounds around them. Voice is one of the primary input types with HoloLens.
Putting all of these elements together forms a holographic computer that allows the user to see, hear, and interact with the world around them in new and unique ways.
What you need to develop for the HoloLens
The HoloLens development environment breaks down into two primary tools: Unity and Visual Studio. Unity is the 3D environment that we will do most of our work in. This includes adding holograms, creating user interface elements, adding sound, particle systems, and other things that bring a 3D program to life. If Unity is the meat on the bone, Visual Studio is the skeleton. Here we write scripts, or machine code, to make our 3D creations come to life and add a level of control and immersion that Unity cannot produce on its own.
Unity
Unity is a software framework designed to speed up the creation of games and 3D-based software. Generally speaking, Unity is known as a game engine, but as the holographic world becomes more apparent, the more we will use such a development environment for many different kinds of applications. Unity is an application that allows us to take 3D models, 2D graphics, particle systems, and sound, and make them interact with each other and with our user. Many elements are drag and drop, plug and play, what you see is what you get. This can simplify the iteration and testing process. As developers, we most likely do not want to build and compile for every little change we make in the development process. Unity allows us to see the changes in context to make sure they work; then, once we have a group of changes, we can test on the HoloLens ourselves. This does not work for every aspect of HoloLens-Unity development, but it does work for a good 80% - 90%.
Visual Studio Community
Microsoft Visual Studio Community is a great free Integrated Development Environment (IDE). Here we use programming languages such as C# or JavaScript to code changes in the behavior of objects and, generally, make things happen inside our programs.
HoloToolkit - Unity
The HoloToolkit-Unity is a repository of samples, scripts, and components to help speed up the process of development. It covers a large selection of areas in HoloLens development, such as:
Input: Gaze, gesture, and voice are the primary ways in which we interact with the HoloLens.
Sharing: The sharing repository allows users to share holographic spaces and connect to each other via the network.
Spatial Mapping: This is how the HoloLens sees our world. A large 3D mesh of our space is generated, giving our holograms something to interact with or bounce off of.
Spatial Sound: The speaker array inside the HoloLens does an amazing job of giving the illusion of space. Objects behind us sound like they are behind us.
HoloLens emulator
The HoloLens emulator is an extension to Visual Studio that simulates how a program will run on the HoloLens. This is great for those who want to get started with HoloLens development but do not have an actual HoloLens yet. This software does require the use of Microsoft Hyper-V, a feature only available in the Windows 10 Pro operating system. Hyper-V is a virtualization environment which allows the creation of a virtual machine. This virtual machine emulates the specific hardware so one can test without the actual hardware.
Visual Studio Tools for Unity
This collection of tools adds IntelliSense and debugging features to Visual Studio. If you use Visual Studio and Unity, this is a must-have:
IntelliSense: An intelligent code completion tool for Microsoft Visual Studio.
This is designed to speed up many processes when writing code; the version that comes with Visual Studio Tools for Unity has Unity-specific updates.
Debugging: Until this extension existed, debugging Unity apps proved to be a little tedious. With this tool, we can now debug Unity applications inside Visual Studio, speeding up the bug-squashing process considerably.
Other useful tools
The following are some other useful tools:
Image editor: Photoshop or GIMP are both good examples of programs that allow us to create 2D UI elements and textures for objects in our apps.
3D modeling software: 3D Studio Max, Maya, and Blender are all programs that allow us to make 3D objects that can be imported into Unity.
Sound editing software: There are a few resources for free sounds on the web; with that in mind, Sound Forge is a great tool for editing those sounds and layering sounds together to create new ones.
Summary
In this article, we have gotten to know a little bit about the HoloLens, so we can begin our journey into this new world. Here the only limitations are our imaginations.
Resources for Article:
Further resources on this subject:
Creating a Supercomputer [article]
Building Voice Technology on IoT Projects [article]
C++, SFML, Visual Studio, and Starting the first game [article]