An Introduction to Mastering JavaScript Promises and Its Implementation in Angular.js

Packt
23 Jul 2015
21 min read
In this article, Muzzamil Hussain, the author of the book Mastering JavaScript Promises, introduces promises in JavaScript and their implementation in Angular.js.

All of us who work with JavaScript know that working with JavaScript means mastering asynchronous coding, and that this skill doesn't come easily. You have to understand callbacks, and once you do, the realization soon sets in that managing callbacks is not an easy task, and that it is really not an effective way of doing asynchronous programming. For those of you who have already been through this experience, promises are not that new; even if you haven't used them in a recent project, you will really want to go for them. For those of you who have used neither callbacks nor promises, understanding promises, or seeing the difference between callbacks and promises, can be a hard task. Some of you have used promises in JavaScript through popular and mature JavaScript libraries and platforms such as Node.js, jQuery, or WinRT, and are already aware of the advantages of promises and how they help to make your work efficient and your code look beautiful.

For all three of these groups of professionals, gathering information on promises and their implementation in different libraries is quite a task, and much of the time is spent collecting the right information: how you can attach an error handler to a promise, what a deferred object is, and how it can be passed on to different functions. Having the right information at the time you need it is the best virtue one could ask for. Keeping all these elements in mind, we have written a book named Mastering JavaScript Promises. This book is all about JavaScript and how promises are implemented in some of the most renowned libraries in the world. It provides a foundation for JavaScript and gradually takes you through the fruitful journey of learning promises in JavaScript. The chapters are composed in such a way that they provide knowledge from the novice level to an advanced level. The book covers a wide range of topics, with both theoretical and practical content in place. You will learn about the evolution of JavaScript, programming models of different kinds, the asynchronous model and how JavaScript uses it, and the book then takes you right into implementation mode, with a whole set of chapters on the promise implementations of WinRT, Node.js, Angular.js, and jQuery. With easy-to-follow example code and simple language, you will absorb a huge amount of information on this topic. Needless to say, books on such topics are in themselves an evolutionary process, so your suggestions are more than welcome.

Here are a few extracts from the book to give you a glimpse of what we have in store for you; most of this section will focus on Angular.js and how promises are implemented in it. Let's start our journey through this article with programming models.

Models

Models are basically templates upon which logic is designed and fabricated within the compiler/interpreter of a programming language, so that software engineers can use that logic to write their software logically. Every programming language we use is designed on a particular programming model. Since software engineers are asked to solve particular problems or to automate particular services, they adopt programming languages as per the need.
There is no set rule that assigns a particular language to create particular products; engineers adopt any language based on the need.

The asynchronous programming model

Within the asynchronous programming model, tasks are interleaved with one another in a single thread of control. This single thread may have multiple embedded threads, and each thread may contain several tasks linked up one after another. This model is simpler than the threaded case, as the programmer always knows the priority of the task executing at a given slot of time in memory. Consider a task in which an OS (or an application within the OS) uses some sort of scenario to decide how much time is to be allotted to a task before giving the same chance to others. The behavior of the OS in taking control from one task and passing it on to another task is called preempting.

Promise

The beauty of working with JavaScript's asynchronous events is that the program continues its execution even when it doesn't yet have a value it needs to work with, because that value is still being produced by work in progress. Such values are called not-yet-known values from unfinished work. This can make working with asynchronous events in JavaScript challenging. Promises are a programming construct that represents a value that is still unknown. Promises in JavaScript enable us to write asynchronous code in a manner that parallels synchronous code.

How to implement promises

So far, we have learned the concept of the promise, its basic ingredients, and some of the basic functions it has to offer in nearly all of its implementations, but how do these implementations use it? Well, it's quite simple. Every implementation, either in a language or in the form of a library, maps the basic concept of promises onto a compiler/interpreter or into code. This allows the written code or functions to behave in the promise paradigm, which ultimately constitutes its implementation. Promises are now part of the standard package for many languages, and naturally each has implemented them in its own way, as per its needs.

Implementing promises in Angular.js

Promise is all about how asynchronous behavior can be applied to a certain part of an application or to the whole of it. There are many other JavaScript libraries in which the concept of promises exists, but in Angular.js it is present in a much more efficient way than in many other client-side options. Promises come in two flavors in Angular.js: one is $q and the other is Q. What is the difference between them? We will explore that in detail in the following sections. For now, we will look at what promises mean to Angular.js.

There are many possible ways to implement promises in Angular.js. The most common one is to use the $q service, which is inspired by Kris Kowal's Q library; Angular.js mainly uses it to provide implementations of asynchronous methods. With Angular.js, the sequence of services runs top to bottom, starting with $q, which is considered the top class; within it, many other methods are embedded, for example, $q.reject() or $q.resolve(). Everything that is related to promises in Angular.js goes through $q. Starting with the $q.when() method: it may seem to create a promise immediately, but in fact it only normalizes a value that may or may not be a promise. The usage of $q.when() is based on the value supplied to it. If the value provided is a promise, $q.when() simply passes it through; if it is not a promise, $q.when() wraps it in one.
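As a quick illustration of this normalizing behavior, here is a minimal sketch (the module and controller names are hypothetical, and it assumes an Angular 1.x application with $q and $log injected) that passes both a plain value and an existing promise through $q.when() and consumes them with the same then() interface:

angular.module('promiseDemoApp', [])
  .controller('WhenDemoCtrl', function($q, $log) {
    // A plain value: $q.when() wraps it in a promise that resolves immediately.
    $q.when(42).then(function(value) {
      $log.info('Plain value resolved with: ' + value);
    });

    // An existing promise: $q.when() returns an equivalent promise as it is.
    var deferred = $q.defer();
    $q.when(deferred.promise).then(function(value) {
      $log.info('Existing promise resolved with: ' + value);
    });
    deferred.resolve('done');
  });

Either way, the caller only ever deals with a promise, which is exactly the point of the normalization.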
The schematics of using promises in Angular.js

Since Kris Kowal's Q library is the global provider of, and inspiration for, promise-style callback returns, Angular.js also uses it for its promise implementation. Many Angular.js services are promise-oriented in their return type by default; these include $interval, $http, and $timeout. However, there is a proper mechanism for using promises in Angular.js. Look at the following code and see how a promise maps itself within Angular.js:

var promise = AngularjsBackground();
promise.then(
  function(response) {
    // promise process
  },
  function(error) {
    // error reporting
  },
  function(progress) {
    // send progress
  });

All of the mentioned services in Angular.js return a single promise object. They might differ in the parameters they take, but in return all of them respond with a single promise object carrying multiple keys. For example, $http.get returns a single object whose success callback is supplied with four parameters, named data, status, headers, and config:

$http.get('/api/tv/serials/sherlockHolmes')
  .success(function(data, status, headers, config) {
    $scope.movieContent = data;
  });

If we employ the promise concept here, the same code can be rewritten as:

var promise = $http.get('/api/tv/serials/sherlockHolmes');
promise.then(
  function(payload) {
    $scope.serialContent = payload.data;
  });

The preceding code is more concise and easier to maintain than the one before it, which makes the usage of Angular.js more adaptable for the engineers using it.

Promise as a handle for callback

The implementation of promises in Angular.js defines your use of a promise as a callback handle. The implementation not only defines how to use promises with Angular.js, but also what steps one should take to make a service "promise-return". This means that you do something asynchronously, and once your job is completed, you trigger the then() service either to conclude your task or to pass it on to another then() method: asynchronous_task.then().then().done(). In simpler form, you can do the following to achieve the concept of a promise as a handle for callbacks:

angular.module('TVSerialApp', [])
  .controller('GetSerialsCtrl',
    function($log, $scope, TeleService) {
      $scope.getserialListing = function(serial) {
        var promise =
          TeleService.getserial('SherlockHolmes');
        promise.then(
          function(payload) {
            $scope.listingData = payload.data;
          },
          function(errorPayload) {
            $log.error('failure loading serial', errorPayload);
          });
      };
    })
  .factory('TeleService', function($http) {
    return {
      getserial: function(id) {
        return $http.get('/api/tv/serials/sherlockHolmes' + id);
      }
    };
  });

Blindly passing arguments and nested promises

Whatever promise service you use, you must be very sure of what you are passing and how it can affect the overall working of your promise function. Blindly passing arguments can cause confusion for the controller, as it has to deal with its own results too while handling other requests. Say we are dealing with the $http.get service and you blindly pass too much load to it. Since it has to deal with its own results in parallel, it might get confused, which may result in callback hell. However, if you want to post-process the result instead, you have to deal with an additional parameter called $http.error. In this way, the controller doesn't have to deal with its own result, and calls such as 404s and redirects will be saved.
You can also redo the preceding scenario by building your own promise and bringing back the result of your choice, with the payload that you want, using the following code:

factory('TVSerialApp', function($http, $log, $q) {
  return {
    getSerial: function(serial) {
      var deferred = $q.defer();
      $http.get('/api/tv/serials/sherlockHolmes' + serial)
        .success(function(data) {
          deferred.resolve({
            title: data.title,
            cost: data.price});
        }).error(function(msg, code) {
          deferred.reject(msg);
          $log.error(msg, code);
        });
      return deferred.promise;
    }
  };
});

By building a custom promise, you gain many advantages. You can control input and output calls, log the error messages, transform the inputs into desired outputs, and share status by using the deferred.notify(mesg) method.

Deferred objects or composed promises

Since a custom promise in Angular.js can sometimes be hard to handle, and in the worst case can malfunction, promises provide another way to implement this. You transform your response within a then method and return the transformed result to the calling method in an autonomous way. Consider the same code we used in the previous section:

this.getSerial = function(serial) {
  return $http.get('/api/tv/serials/sherlockHolmes' + serial)
    .then(
      function (response) {
        return {
          title: response.data.title,
          cost: response.data.price
        };
      });
};

The output we yield from the preceding method will be chained, promised, and transformed. You can reuse the output for another output, chain it to another promise, or simply display the result. The controller can then be reduced to the following lines of code:

$scope.getSerial = function(serial) {
  service.getSerial(serial)
    .then(function(serialData) {
      $scope.serialData = serialData;
    });
};

This has significantly reduced the lines of code. It also helps us maintain the service layer, since the automatic failsafe mechanism in then() will transform any failure into a failed promise and keep the rest of the code intact.

Dealing with the nested calls

While using internal return values in the success function, the promise code hints that you are missing one most obvious thing: the error handler. The missing error handler can cause your code to stand still or fall into a catastrophe from which it might not recover. If you want to overcome this, simply throw the errors. How? See the following code:

this.getserial = function(serial) {
  return $http.get('/api/tv/serials/sherlockHolmes' + serial)
    .then(
      function (response) {
        return {
          title: response.data.title,
          cost: response.data.price
        };
      },
      function (httpError) {
        // translate the error
        throw httpError.status + " : " +
          httpError.data;
      });
};

Now, whenever the code runs into an error-like situation, it will return a single string, not a bunch of $http statuses or config details. This can also save your entire code from going into a standstill mode and help you in debugging. Also, if you attach log services, you can pinpoint the location that causes the error.
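To see how that translated error surfaces at the call site, here is a minimal, hypothetical consumer of the method above (the names serialService, $scope, and $log are placeholders for whatever your application actually injects); it relies on the catch() shorthand that $q promises provide for registering a rejection handler:

$scope.showSerial = function(serial) {
  serialService.getserial(serial)
    .then(function(serialData) {
      // success path: the transformed { title, cost } object
      $scope.serialData = serialData;
    })
    .catch(function(message) {
      // the thrown "status : data" string from the service lands here
      $log.error('could not load serial: ' + message);
    });
};

Because the error handler in the service throws, the promise it returns is rejected with that string, and any caller can deal with the failure in one place.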
Concurrency in Angular.js

We all want to achieve maximum output in a single slot of time by invoking multiple services and getting results from them. Angular.js provides this functionality via its $q.all service; you can invoke many services at a time, and if you want to join all/any of them, you just need then() to bring them together in the sequence you want. Let's take the following array as the payload:

[
  { url: 'myUr1.html' },
  { url: 'myUr2.html' },
  { url: 'myUr3.html' }
]

And now this array will be used by the following code:

service('asyncService', function($http, $q) {
  return {
    getDataFrmUrls: function(urls) {
      var deferred = $q.defer();
      var collectCalls = [];
      angular.forEach(urls, function(url) {
        collectCalls.push($http.get(url.url));
      });

      $q.all(collectCalls)
        .then(
          function(results) {
            deferred.resolve(
              JSON.stringify(results));
          },
          function(errors) {
            deferred.reject(errors);
          },
          function(updates) {
            deferred.notify(updates);
          });
      return deferred.promise;
    }
  };
});

A promise is created by executing $http.get for each URL and is added to an array. The $q.all function takes an array of promises as input and then processes all the results into a single promise containing an object with each answer. This is converted to JSON and passed on to the caller function. The result might look like this:

[
  promiseOneResultPayload,
  promiseTwoResultPayload,
  promiseThreeResultPayload
]

The combination of success and error

$http returns a promise, and you can define its success or error handlers based on this promise. Many think that these functions are a standard part of the promise, but in reality they are not what they seem to be. Using a promise means you are calling then(), which takes two parameters: a callback function for success and a callback function for failure. Imagine this code:

$http.get("/api/tv/serials/sherlockHolmes")
  .success(function(name) {
    console.log("The tele serial name is : " + name);
  })
  .error(function(response, status) {
    console.log("Request failed " + response + " status code: " +
      status);
  });

This can be rewritten as:

$http.get("/api/tv/serials/sherlockHolmes")
  .then(function(response) {
    console.log("The tele serial name is :" + response.data);
  }, function(result) {
    console.log("Request failed : " + result);
  });

One can use either the success or the error function depending on the situation, but there is a benefit in using $http—it's convenient. The error function provides the response and status, and the success function provides the response data. This is not considered a standard part of a promise.
Anyone can add their own versions of these functions to promises, as shown in the following code:

// my own created promise of success function

promise.success = function(fn) {
  promise.then(function(res) {
    fn(res.data, res.status, res.headers, config);
  });
  return promise;
};

// my own created promise of error function

promise.error = function(fn) {
  promise.then(null, function(res) {
    fn(res.data, res.status, res.headers, config);
  });
  return promise;
};

The safe approach

So the real matter of discussion is: what should you use with $http, success or error? Keep in mind that there is no standard way of writing promises; we have to look at many possibilities. If you change your code so that your promise is no longer returned from $http—say, when you load data from a cache—your code will break if you expect success or error to be there. So, the best way is to use then whenever possible. This not only generalizes the overall approach of writing promises, but also reduces the element of prediction in your code.

Route your promise

Angular.js has a great feature for routing your promises. This feature is helpful when you are dealing with more than one promise at a time. Here is how you can achieve routing, through the following code:

$routeProvider
  .when('/api/', {
    templateUrl: 'index.php',
    controller: 'IndexController'
  })
  .when('/video/', {
    templateUrl: 'movies.php',
    controller: 'moviesController'
  })

As you can observe, we have two routes: the api route takes us to the index page, with IndexController, and the video route takes us to the movies page, with moviesController.

app.controller('moviesController', function($scope, MovieService) {
  $scope.name = null;

  MovieService.getName().then(function(name) {
    $scope.name = name;
  });
});

There is a problem: until MovieService gets the name from the backend, the name is null. This means that if our view binds to the name, it is first empty and then set. This is where the router comes in. The router resolves the problem of the name being null. Here's how we can do it:

var getName = function(MovieService) {
  return MovieService.getName();
};

$routeProvider
  .when('/api/', {
    templateUrl: 'index.php',
    controller: 'IndexController'
  })
  .when('/video/', {
    templateUrl: 'movies.php',
    controller: 'moviesController',
    resolve: {
      name: getName
    }
  })

After adding the resolve, we can revisit our code for the controller:

app.controller('moviesController', function($scope, name) {
  $scope.name = name;
});

You can also define multiple resolves for the route of your promises to get the best possible output:

$routeProvider
  .when('/video', {
    templateUrl: '/MovieService.php',
    controller: 'MovieServiceController',
    // adding the resolves here; each key becomes an injectable value for the controller
    resolve: {
      name: getName,
      MovieService: getMovieService,
      anythingElse: getSomeThing,
      someThing: getMoreSomeThing
    }
  })

An introduction to WinRT

Our first look at another technology is WinRT. What is WinRT? It is the short form of Windows Runtime. This is a platform provided by Microsoft for building applications for the Windows 8+ operating system. It supports application development in C++/CX, C# (C sharp), VB.NET, TypeScript, and JavaScript. Microsoft adopted JavaScript as one of its prime, first-class tools for developing cross-browser apps and for development on other related devices.
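To give a flavor of what the same pattern looks like on the WinRT side, here is a minimal sketch of a WinJS promise chain (it assumes a Windows Store app with the WinJS library loaded, and the URL is just a placeholder):

WinJS.xhr({ url: "/api/tv/serials/sherlockHolmes" })
  .then(function (request) {
    // the underlying XMLHttpRequest completed successfully
    var data = JSON.parse(request.responseText);
    return data.title;
  })
  .done(function (title) {
    console.log("The tele serial name is: " + title);
  }, function (error) {
    console.log("Request failed: " + error);
  });

The then()/done() pairing mirrors what we have just seen with $q, which is exactly why the book treats these implementations side by side.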
We are now fully aware of the pros and cons of using JavaScript, which is what has brought us here to implement promises.

Summary

This article is just meant to give you an understanding of what we have in the book for you. Focusing on Angular.js here doesn't mean that only one technology is covered in the entire book for the implementation of promises; it is just to give you an idea of how the flow of information goes from a simple to an advanced level, and how easy it is to keep following the context of the chapters. The book also covers Node.js, jQuery, and WinRT, so that readers from different experience levels can read, understand, and learn quickly, and become experts in promises.

Role Management

Packt
23 Jul 2015
15 min read
In this article by Gavin Henrick and Karen Holland, authors of the book Moodle Administration Essentials, we look at roles, which play a key part in how a Moodle site works. Roles restrict users' access to only the data they should have access to, and determine whether or not they are able to alter it or add to it. In each course, every user is assigned a role when they are enrolled, such as teacher, student, or a customized role. In this article, we deal with the essential areas of role management that every administrator may have to handle:

- Cloning a role
- Creating a new role
- Creating a course requester role
- Overriding a permission in a role in a course
- Testing a role
- Manually adding a role to a user in a course
- Enabling self-enrolment for a course

Understanding terminologies

There are some key terms used to describe users' abilities in Moodle and how they are defined, which are as follows:

Role: A role is a set or collection of permissions on different capabilities. There are default roles, such as teacher and student, which have predefined sets of permissions.

Capability: A capability is a specific behavior in Moodle, such as Start new discussions (mod/forum:startdiscussion), which can have a permission set within a role, such as Allow or Not set/Inherit.

Permission: A permission is associated with a capability. There are four possible values: Allow, Prevent, Prohibit, and Not set.

Not set: There is no specific setting for this role, and Moodle determines whether it is allowed from a higher context.

Allow: The permission is explicitly granted for the capability.

Prevent: The permission is removed for the capability, even if it is allowed in a higher context. However, it can be overridden at a more specific context.

Prohibit: The permission is completely denied and cannot be overridden at any lower context.

By default, the only configuration option displayed is Allow. To show the full list of options on the role edit page, click on the Show advanced button, just above the Filter option, as shown in the following image.

Context: A context is an area of Moodle, such as the whole system, a category, a course, an activity, a block, or a user. A role has permission for a capability in a specific context. An example of this is a student being able to start a discussion in a specific forum; this is set up by setting the permission for the capability Start new discussions to Allow for the Student role on that specific forum.

Standard roles

There are a number of different roles configured in Moodle. By default, these are:

Site administrator: The site administrator can do everything on the site, including creating the site structure, courses, activities, and resources, and managing user accounts.

Manager: The manager can access courses and modify them. They usually do not participate in teaching courses.

Course creator: The course creator can create courses when assigned rights in a category.

Teacher: The teacher can do anything within a course, including adding and removing resources and activities, communicating with students, and grading them.

Non-editing teacher: The non-editing teacher can teach and communicate in courses and grade students, but cannot alter or add activities, nor change the course layout or settings.

Student: The student can access and participate in courses, but cannot create or edit resources or activities within a course.

Guest: The guest can view courses if allowed, but cannot participate.
Guests have minimal privileges and usually cannot enter text anywhere.

Authenticated user: The role that all logged-in users get.

Authenticated user on the front page: A logged-in user role for the front page only.

Managing role permissions

Let's learn how to manage permissions for existing roles in Moodle.

Cloning a role

It is possible to duplicate an existing role in Moodle. The main reason for doing this is so that you can have a variation of an existing role, such as a teacher role with reduced capabilities. For instance, to stop a teacher from being able to add or remove students on the course, you can create a course editing role, which is a clone of the standard editingteacher role with the enrolment aspects removed. This is typically done when students are added to courses centrally by a student management system. To duplicate a role, in this case editing teacher:

1. Log in with an administrator-level user account.
2. In the Administration block, navigate to Site administration | Users | Permissions | Define roles.
3. Click on the Add a new role button.
4. Select an existing role from the Use role or archetype dropdown.
5. Click on Continue.
6. Enter the short role name in the Short name field. This must be unique.
7. Enter the full role name in the Custom full name field. This is what appears in the user interface in Moodle.
8. Enter an explanation of the role in the Description field. This should explain why the role was created and what changes from the default were planned.
9. Scroll to the bottom of the page.
10. Click on Create this role.

This will create a duplicate version of the teacher role with all the same permissions and capabilities.

Creating a new role

It is also possible to create a new role. The main reason for doing this is to have a role that does one specific task and nothing else, such as a user that can manage users only. This is the alternative to cloning one of the existing roles and then disabling everything except the one set of capabilities required. To create a new role:

1. Log in with an administrator-level user account.
2. In the Administration block, navigate to Site administration | Users | Permissions | Define roles.
3. Click on the Add a new role button.
4. Select No role from the Use role or archetype dropdown.
5. Click on Continue.
6. Enter the short role name in the Short name field. This must be unique.
7. Enter the full role name in the Custom full name field. This is what appears in the user interface in Moodle.
8. Enter an explanation of the role in the Description field. This should explain why the role was created.
9. Select the appropriate Role archetype, in this case None. The role archetype determines the permissions when a role is reset to default, and any new permissions for the role when the site is upgraded.
10. Select the Context types where this role may be assigned.
11. Set the permissions as required by searching for the appropriate Capability and clicking on Allow.
12. Scroll to the bottom of the page.
13. Click on Create this role.

This will create the new role with the settings as defined. If you want the new role to appear in the course listing, you must enable it by navigating to Administration block | Site administration | Appearance | Courses | Course Contacts.

Creating a course requester role

There is a core Moodle feature that enables users to request that a course be created. This is not normally used, especially as students and most teachers only have responsibility within their own course context.
So, it can be useful to create a role with just this ability, so that a faculty or department administrator can request a new course space when needed, without giving that ability to all users. There are a few steps in this process:

1. Remove the capability from other roles.
2. Set up the new role.
3. Assign the role to a user in the correct context.

Firstly, we remove the capability from other roles by altering the authenticated user role as follows:

1. In the Administration block, navigate to Site administration | Users | Permissions | Define roles.
2. Click on edit for the Authenticated user role.
3. Enter the request text into Filter.
4. Select the Not set radio button under moodle/course:request to change the Allow permission.
5. Scroll to the bottom of the page.
6. Click on Save changes.

Next, we create the new role with the specific capability set to Allow:

1. In the Administration block, navigate to Site administration | Users | Permissions | Define roles.
2. Click on the Add a new role button.
3. Select No role from the Use role or archetype dropdown.
4. Click on Continue.
5. Enter courserequester in the Short name field.
6. Enter Course Requester in the Custom full name field. This is what appears in the user interface in Moodle.
7. Enter the explanation of the role in the Description field.
8. Select system under Context types where this role may be assigned.
9. Change moodle/course:request to Allow.
10. Scroll to the bottom of the page.
11. Click on Create this role.

Lastly, you assign the role to a user at the system level. This is different from giving a role to a user in a course.

1. In the Administration block, navigate to Site administration | Users | Permissions | Assign system roles.
2. Click on Course Requester.
3. Search for the specific user in the Potential user list.
4. Select the user from the list, using the Search filter if required.
5. Click on the Add button.

Any roles you assign from this page will apply to the assigned users throughout the entire system, including the front page and all the courses.

Applying a role override for a specific context

You can change how a specific role behaves in a certain context by enabling an override, thereby granting or removing the permission in that context. An example of this is that, in general, students cannot rate a forum post in a forum in their course. When ratings are enabled, only the manager, teacher, and non-editing teacher roles have permission to rate the posts. So, to enable students to rate posts, you need to change the permissions for the student role on that specific forum:

1. Browse to the forum where you want to allow students to rate forum posts. This process assumes that ratings have already been enabled in the forum.
2. From the Forum page, go to the Administration block, then to Forum administration, and click on the link to Permissions.
3. Scroll down the page to locate the Rate posts permission. This is the mod/forum:rate capability. By default, you should not see the student role listed to the right of the permission.
4. Click on the plus sign (+) that appears below the roles already listed for the Rate posts permission.
5. Select Student from the Select role menu, and click on the Allow button.

Student should now appear in the list next to the Rate posts permission. Participants will now be able to rate each other's posts in this forum. Making the change in this forum does not impact other forums.

Testing a role

It is possible to use the Switch role to feature to see how another role behaves in different contexts.
However, the best way to test a role is to create a new user account and then assign that user the role in the correct context, as follows:

1. Create the new user by navigating to Site administration | Users | Accounts | Add a new user.
2. Assign your new user the role in the correct context, such as a system role or in a course, as required.
3. Log in with this user in a different browser to check what they can do and see.

Having two different roles logged in at the same time, each using a different browser, means that you can test the new role in one browser while still logged in as the administrator in your main browser. This saves a lot of time, especially when building courses.

Manually adding a user to a course

Depending on what your role is in a course, you can add other users to the course by manually enrolling them. In this example, we are logged in as the administrator, who can add a number of roles, including:

- Manager
- Teacher
- Non-editing teacher
- Student

To enrol a user in your course:

1. Go to the Course administration menu in the Administration block.
2. Expand the User settings.
3. Click on the Enrolled users link. This brings up the enrolled users page, which lists all enrolled users; it can be filtered by role, and by default shows the enrolled participants only.
4. Click on the Enrol users button.
5. From the Assign roles dropdown, select which role you want to assign to the user. This is limited to the roles which you can assign.
6. Search for the user that you want to add to the course.
7. Click on the Enrol button to enrol the user with the assigned role.
8. Click on Finish enrolling users.

The page will now reload with the new enrolments. To see the users that you added with the given role, you may need to change the filter to the specific role type. This is how you manually add someone to a Moodle course.

User upload CSV files allow you to include optional enrolment fields, which enable you to enrol existing or new users. The sample user upload CSV file enrols each user as a student onto both specified courses, identified by their course shortnames: Teaching with Moodle and Induction.

Enabling self-enrolment for a course

In addition to manually adding users to a course, you can configure a course so that students can self-enrol onto the course, either with or without an enrolment key or password. There are two dependencies required for this to work. Firstly, the self-enrolment plugin needs to be enabled at the site level. This is found in the Administration block, by navigating to Site Administration | Plugins | Enrolments | Manage enrol plugins. If it is not enabled, you need to click on the eye icon to enable it. It is enabled by default in Moodle. Secondly, you need to enable the self-enrolment method in the course itself and configure it accordingly. In the course in which you want to enable self-enrolment, the following are the essential steps:

1. In the Administration block, navigate to Administration | Course administration | Users | Enrolment methods.
2. Click on the eye icon to turn on the Self enrolment method.
3. Click on the cogwheel icon to access the configuration for the Self enrolment method.
4. You can optionally enter a name for the method into the Custom instance name field; however, this is not required. Typically, you would do this if you are enabling multiple self-enrolment options and want to identify them separately.
5. Enter an enrolment key or password into the Enrolment key field if you want to restrict self-enrolment to those who are issued the password.
Once the user knows the password, they will be able to enrol.

6. If you are using groups in your course and configure them with different passwords for each group, you can use the Use group enrolment keys option, so that the passwords from the different groups automatically place self-enrolling users into those groups when they enrol with the matching key/password.
7. If you want self-enrolment to enrol users as students, leave the Default assigned role as Student, or change it to whichever role you intend it to operate for. Some organizations give one password for the students to enrol with and another for the teachers, so that the organization does not need to manage enrolment centrally. Having two self-enrolment methods set up, one pointing at student and one at teacher, makes this possible.
8. If you want to control the length of enrolment, you can do this by setting the Enrolment duration option. In this case, you can also issue a warning to the user before it expires by using the Notify before enrolment expires and Notification threshold options.
9. If you want to specify the period of the enrolment, you can set this with Start date and End date.
10. You can un-enrol users who are inactive for a period of time by setting Unenrol inactive after to a specific number of days.
11. Set Max enrolled users if you want to limit the number of users who can use this specific password to enrol on the course. This is useful if you are selling a specific number of seats on the course.
12. This self-enrolment method may be restricted to members of a specified cohort only. You can enable this by selecting a cohort from the dropdown for the Only cohort members setting.
13. Leave Send course welcome message ticked if you want to send a message to those who self-enrol. This is recommended.
14. Enter a welcome message in the Custom welcome message field. This will be sent to all users who self-enrol using this method, and can be used to remind them of key information about the course, such as a starting date, or to ask them to do something such as complete the icebreaker in the course.
15. Click on Save changes.

Once enabled and configured, users will be able to self-enrol and will be added with whatever role you selected. This is dependent on the course itself being visible to users when they browse the site.

Other custom roles

Moodle docs has a list of potential custom roles, with instructions on how to create them, including:

- Parent
- Demo teacher
- Forum moderator
- Forum poster role
- Calendar editor
- Blogger
- Quiz user with unlimited time
- Question creator
- Question sharer
- Course requester role
- Feedback template creator
- Grading forms publisher
- Grading forms manager
- Grade view
- Gallery owner role

For more information, check https://docs.moodle.org/29/en/Creating_custom_roles.

Summary

In this article, we looked at the core administrator tasks in role management, and the different aspects to consider when deciding which approach to take in either extending or reducing role permissions.

Programmable DC Motor Controller with an LCD

Packt
23 Jul 2015
23 min read
In this article by Don Wilcher, author of the book Arduino Electronics Blueprints, we will see how a programmable logic controller (PLC) is used to operate various electronic and electromechanical devices that are wired to I/O wiring modules. The PLC receives signals from sensors, transducers, and electromechanical switches that are wired to its input wiring module and processes the electrical data by using a microcontroller. The embedded software stored in the microcontroller's memory can control external devices, such as electromechanical relays, motors (both AC and DC types), solenoids, and visual displays, that are wired to its output wiring module.

The PLC programmer programs the industrial computer by using a special programming language known as ladder logic. PLC ladder logic is a graphical programming language that uses computer instruction symbols for automation and control to operate robots, industrial machines, and conveyor systems. The PLC, along with the ladder logic software, is very expensive. However, with off-the-shelf electronic components, an Arduino can be used as an alternative mini industrial controller for Maker-type robotics and machine control projects. In this article, we will see how an Arduino can operate as a mini PLC that is capable of controlling a small DC electric motor with a simple two-step programming procedure. Details on how to interface a transistor DC motor driver and a discrete digital logic circuit to an Arduino, and how to write the control cursor selection code, are provided as well. This article also provides the build instructions for a programmable motor controller. The LCD will provide the programming directions that are needed to operate an electric motor. The parts required to build the programmable motor controller are shown in the next section.

Parts list

The following list comprises the parts that are required to build the programmable motor controller:

- Arduino Uno: one unit
- 1 kilo ohm resistor (brown, black, red, gold): three units
- 10 ohm resistor (brown, black, black, gold): one unit
- 10 kilo ohm resistor (brown, black, orange, gold): one unit
- 100 ohm resistor (brown, black, brown, gold): one unit
- 0.01 µF capacitor: one unit
- LCD module: one unit
- 74LS08 quad AND logic gate integrated circuit: one unit
- 1N4001 general-purpose silicon diode: one unit
- DC electric motor (3 V rated): one unit
- Single-pole double-throw (SPDT) electric switches: two units
- 1.5 V batteries: two units
- 3 V battery holder: one unit
- A breadboard
- Wires

A programmable motor controller block diagram

The block diagram of the programmable DC motor controller with a Liquid Crystal Display (LCD) can be imagined as a remote control box with two slide switches and an LCD, as shown in the following diagram.

The Remote Control Box provides the control signals to operate a DC motor. This box is not able to provide the right amount of electrical current to operate the DC motor directly. Therefore, a transistor motor driver circuit is needed. This circuit has sufficient current gain (hfe) to operate a small DC motor; a typical hfe value of 100 is sufficient. The Enable slide switch is used to set the remote control box to the ready mode. The Program switch allows the DC motor to be set to an ON or OFF operating condition by using a simple selection sequence. The LCD displays the ON or OFF selection prompts that help you operate the DC motor. The remote control box diagram is shown in the next image.
The idea behind the concept diagram is to illustrate how a simple programmable motor controller can be built by using basic electrical and electronic components. The Arduino is placed inside the remote control box and wired to the Enable/Program switches and the LCD. External wires are attached to the transistor motor driver, the DC motor, and the Arduino. The block diagram of the programmable motor controller is an engineering tool that is used to convey a complete product design by using simple graphics. The block diagram also makes it easy to plan the breadboard, and to prototype and test the programmable motor controller in a maker workshop or on a laboratory bench.

A final observation regarding the block diagram of the programmable motor controller is that it follows the basic computing convention of inputs on the left, the processor in the middle, and outputs on the right-hand side of the design layout. As shown, the SPDT switches are on the left-hand side, the Arduino is located in the middle, and the transistor motor driver with the DC motor is on the right-hand side of the block diagram. The LCD is shown towards the right of the block diagram because it is an output device. The LCD allows visual selection between the ON/OFF operations of the DC motor by using the Program switch. This left-to-right design method makes it easier to build the programmable motor controller, as well as to troubleshoot errors during the testing phase of the project. The block diagram for the programmable motor controller is as follows.

Building the programmable motor controller

The block diagram of the programmable motor controller has more circuits than the block diagram of the sound effects machine. As discussed previously, there are a variety of ways to build (prototype) electronic devices. For instance, they can be built on a printed circuit board (PCB) or an experimenter's/prototype board. The construction base used to build this device was a solderless breadboard, which is shown in the next image. The placement of the electronic parts shown in the image is not restricted to the solderless breadboard layout; rather, it should be used as a guideline. Another method of placing the parts onto the solderless breadboard is to use the block diagram that was shown earlier. Arranging the parts as illustrated in the block diagram makes it easy to test each subcircuit separately.

For example, the Program/Enable SPDT switch subcircuits can be tested by using a DC voltmeter. Placing a DC voltmeter across the Program switch and its 1 kilo ohm resistor and toggling the switch several times will show a voltage swing between 0 V and +5 V. The same testing method can be carried out on the Enable switch as well. The transistor motor driver circuit is tested by placing a +5 V signal on the base of the 2N3904 NPN transistor. When you apply +5 V to the transistor's base, the DC motor turns on. The final test for the programmable DC motor controller is to adjust the contrast control (10 kilo ohm potentiometer) to see whether the individual pixels are visible on the LCD. This electrical testing method, used to check that the programmable DC motor controller is functioning properly, will minimize electronic I/O wiring errors. The electrical testing phase also ensures that all the I/O circuits used in the design are working properly, thereby allowing the maker to focus on coding the software.
Following is the wiring diagram of the programmable DC motor controller with the LCD, using a solderless breadboard.

As shown in the wiring diagram, the electronic components used to build the programmable DC motor controller with the LCD circuit are placed on the solderless breadboard for ease of wiring the Arduino, the LCD, and the DC motor. The transistor shown in the preceding image is a 2N3904 NPN device with a pin-out arrangement consisting of an emitter, a base, and a collector, respectively. If the transistor pins are wired incorrectly, the DC motor will not turn on. The LCD module is used as a visual display, which allows operating selection of the DC motor. The Program slide switch turns the DC motor ON or OFF. Although most 16-pin LCD modules have the same electrical pin-out names, consult the manufacturer's datasheet for the device you have in hand. There is also a 10 kilo ohm potentiometer to control the LCD's contrast. On wiring the LCD to the Arduino, supply power to the microcontroller board by using the USB cable connected to a desktop PC or a notebook. Adjust the 10 kilo ohm potentiometer until a row of square pixels is visible on the LCD.

The Program slide switch is used to switch between the ON and OFF operating modes of the DC motor, which are shown on the LCD. The 74LS08 quad AND gate is a 14-pin integrated circuit (IC) that is used to enable the DC motor, or to get the electronic controller ready to operate the DC motor. Therefore, the Program slide switch must be in the ON position for the electronic controller to operate properly. The 1N4001 diode is used to protect the 2N3904 NPN transistor from peak currents that are stored in the DC motor's windings while turning on the DC motor. When the DC motor is turned off, the 1N4001 diode directs the peak current to flow through the DC motor's windings, thereby suppressing transient electrical noise and preventing damage to the transistor. Therefore, it's important to include this electronic component in the design, as shown in the wiring diagram, to prevent electrical damage to the transistor. Besides the wiring diagram, the circuit's schematic diagram will aid in building the programmable motor controller device.

Let's build it!

In order to build the programmable DC motor controller, follow these steps:

1. Wire the programmable DC motor controller with the LCD circuit on a solderless breadboard, as shown in the previous image and in the circuit's schematic diagram shown in the next image.
2. Upload the programmable motor controller software to the Arduino by using the sketch shown next.
3. Close both the Program and Enable switches. The motor will spin.
4. When you open the Enable switch, the motor stops.

The LCD message tells you how to set the Program switch for ON and OFF motor control. The Program switch allows you to select between the ON and OFF motor control functions. With the Program switch closed, toggling the Enable switch will turn the motor ON and OFF. Opening the Program switch will prevent the motor from turning on. The next few sections will explain additional details of the I/O interfacing of discrete digital logic circuits and a small DC motor that can be connected to the Arduino. A sketch is a unit of code that is uploaded to and run on an Arduino board.

/*
 * Programmable DC motor controller w/LCD: allows the user to select ON and OFF
 * operations using a slide switch. To enable the selected operation, another
 * slide switch is used to initiate the selected choice.
 * Program Switch wired to pin 6.
 * Output select wired to pin 7.
 * LCD used to display programming choices (ON or OFF).
 * created 24 Dec 2012
 * by Don Wilcher
 */

// include the library code:
#include <LiquidCrystal.h>

// initialize the library with the numbers of the interface pins
LiquidCrystal lcd(12, 11, 5, 4, 3, 2);

// constants won't change. They're used here to
// set pin numbers:
const int ProgramPin = 6;   // pin number for PROGRAM input control signal
const int OUTPin = 7;       // pin number for OUTPUT control signal

// variable will change:
int ProgramStatus = 0;      // variable for reading Program input status

void setup() {
  // initialize the following pin as an output:
  pinMode(OUTPin, OUTPUT);

  // initialize the following pin as an input:
  pinMode(ProgramPin, INPUT);

  // set up the LCD's number of rows and columns:
  lcd.begin(16, 2);

  // set cursor for messages and print Program select messages on the LCD.
  lcd.setCursor(0, 0);
  lcd.print("1. ON");
  lcd.setCursor(0, 1);
  lcd.print("2. OFF");
}

void loop() {
  // read the status of the Program Switch value:
  ProgramStatus = digitalRead(ProgramPin);

  // check if Program select choice is 1.ON.
  if(ProgramStatus == HIGH) {
    digitalWrite(OUTPin, HIGH);
  }
  else {
    digitalWrite(OUTPin, LOW);
  }
}

The schematic diagram of the circuit that is used to build the programmable DC motor controller and upload the sketch to the Arduino is shown in the following image.

Interfacing a discrete digital logic circuit with Arduino

The Enable switch, along with the Arduino, is wired to a discrete digital integrated circuit (IC) that is used to turn on the transistor motor driver. The discrete digital IC used to turn on the transistor motor driver is a 74LS08 quad AND gate. The AND gate provides a high output signal when both of its inputs are equal to +5 V. The Arduino provides a high input signal to the 74LS08 AND gate IC based on the following line of code:

digitalWrite(OUTPin, HIGH);

The OUTPin constant name is declared in the Arduino sketch by using the following declaration statement:

const int OUTPin = 7;   // pin number for OUTPUT control signal

The Enable switch is also used to provide a +5 V input signal to the 74LS08 AND gate IC. The Enable switch circuit schematic diagram is as follows.

Both inputs must have the value of logic 1 (+5 V) to make an AND logic gate produce a binary 1 output. In the following section, the truth table of an AND logic gate is given. The table shows all the input combinations along with the resultant outputs. Along with the truth table, the symbol for an AND logic gate is provided. A truth table is a graphical analysis tool that is used to test digital logic gates and circuits. By setting the inputs of a digital logic gate to binary 1 (5 V) or binary 0 (0 V), the truth table shows the binary output value, 1 or 0, of the logic gate. The truth table of an AND logic gate is as follows:

A  B  |  C
0  0  |  0
0  1  |  0
1  0  |  0
1  1  |  1

Another tool that is used to demonstrate the operation of digital logic gates is the Boolean logic expression. The Boolean logic expression for an AND logic gate is C = A · B. A Boolean logic expression is an algebraic equation that defines the operation of a logic gate. As shown for the AND gate, the circuit's output, denoted by C, is equal to the product of the A and B inputs.
Another way of observing the operation of the AND gate, based on its Boolean logic expression, is by setting the value of the circuit's inputs to 1. Its output has the same binary bit value. The truth table graphically shows the results of the Boolean logic expression of the AND gate. A common application of the AND logic gate is the Enable circuit. The output of the Enable circuit will only be turned on when both the inputs are on. When the Enable circuit is wired correctly on the solderless breadboard and is working properly, the transistor driver circuit will turn on the DC motor that is wired to it. The operation of the programmable DC motor controller's Enable circuit is shown in the following truth table.

The basic computer circuit that makes the decision to operate the DC motor is the AND logic gate. The previous schematic diagram of the Enable switch circuit shows the electrical wiring to the specific pins of the 74LS08 IC, but internally, the AND logic gate is the main circuit component for the programmable DC motor controller's Enable function. Following is the diagram of the 74LS08 AND logic gate IC.

To test the Enable circuit function of the programmable DC motor controller, the Program switch is required. The schematic diagram of the circuit that is required to wire the Program switch to the Arduino is shown in the following diagram. The Program and Enable switch circuits are identical to each other, because two 5 V input signals are required for the AND logic gate to work properly. The Arduino sketch that was used to test the Enable function of the programmable DC motor is shown in the following diagram.

The program for the discrete digital logic circuit with an Arduino is as follows:

// constants won't change. They're used here to
// set pin numbers:
const int ProgramPin = 6;   // pin number for PROGRAM input control signal
const int OUTPin = 7;       // pin number for OUTPUT control signal

// variable will change:
int ProgramStatus = 0;      // variable for reading Program input status

void setup() {
  // initialize the following pin as an output:
  pinMode(OUTPin, OUTPUT);

  // initialize the following pin as an input:
  pinMode(ProgramPin, INPUT);
}

void loop() {
  // read the status of the Program Switch value:
  ProgramStatus = digitalRead(ProgramPin);

  // check if Program switch is ON.
  if(ProgramStatus == HIGH) {
    digitalWrite(OUTPin, HIGH);
  }
  else {
    digitalWrite(OUTPin, LOW);
  }
}

Connect a DC voltmeter's positive test lead to the D7 pin of the Arduino. Upload the preceding sketch to the Arduino and close the Program and Enable switches. The DC voltmeter should read approximately +5 V. Opening the Enable switch will display 0 V on the DC voltmeter. The other input conditions of the Enable circuit can be tested by using the truth table of the AND gate that was shown earlier. Although the DC motor is not wired directly to the Arduino, by using the circuit schematic diagram shown previously, the truth table will ensure that the programmed Enable function is working properly. Next, connect the DC voltmeter to pin 3 of the 74LS08 IC and repeat the truth table test again. Pin 3 of the 74LS08 IC will only be ON when both the Program and Enable switches are closed. If the AND logic gate IC pin generates wrong data on the DC voltmeter when compared to the truth table, recheck the wiring of the circuit carefully and properly correct the mistakes in the electrical connections.
When the corrections are made, repeat the truth table test for proper operation of the Enable circuit. Interfacing a small DC motor with a digital logic gate The 74LS08 AND Logic Gate IC provides an electrical interface between reading the Enable switch trigger and the Arduino's digital output pin, pin D7. With both the input pins (1 and 2) of the 74LS08 AND logic gate set to binary 1, the small 14-pin IC's output pin 3 will be High. Although the logic gate IC's output pin has a +5 V source present, it will not be able to turn a small DC motor. The 74LS08 logic gate's sourcing current is not able to directly operate a small DC motor. To solve this problem, a transistor is used to operate a small DC motor. The transistor has sufficient current gain hfe to operate the DC motor. The DC motor will be turned on when the transistor is biased properly. Biasing is a technique pertaining to the transistor circuit, where providing an input voltage that is greater than the base-emitter junction voltage (VBE) turns on the semiconductor device. A typical value for VBE is 700 mV. Once the transistor is biased properly, any electrical device that is wired between the collector and +VCC (collector supply voltage) will be turned on. An electrical current will flow from +VCC through the DC motor's windings and the collector-emitter is grounded. The circuit that is used to operate a small DC motor is called a Transistor motor driver, and is shown in the following diagram:   The Arduino code that is responsible for the operation of the transistor motor driver circuit is as follows: void loop(){   // read the status of the Program Switch value: ProgramStatus = digitalRead(ProgramPin);   // check if Program switch is ON. if(ProgramStatus == HIGH) {    digitalWrite(OUTPin, HIGH);   } else{      digitalWrite(OUTPin,LOW);   } } Although the transistor motor driver circuit was not directly wired to the Arduino, the output pin of the microcontroller prototyping platform indirectly controls the electromechanical part by using the 74LS08 AND logic gate IC. A tip to keep in mind when using the transistors is to ensure that the semiconductor device can handle the current requirements of the DC motor that is wired to it. If the DC motor requires more than 500 mA of current, consider using a power Metal Oxide Semiconductor Field Effect Transistor (MOSFET) instead. A power MOSFET device such as IRF521 (N-Channel) and 520 (N-Channel) can handle up to 1 A of current quite easily, and generates very little heat. The low heat dissipation of the power MOSFET (PMOSFET) makes it more ideal for the operation of high-current motors than a general-purpose transistor. A simple PMOSFET DC motor driver circuit can easily be built with a handful of components and tested on a solderless breadboard, as shown in the following image. The circuit schematic for the solderless breadboard diagram is shown after the breadboard image as well. Sliding the Single Pole-Double Throw (SPDT)switch in one position biases the PMOSFET and turns on the DC motor. Sliding the switch in the opposite direction turns off the PMOSFET and the DC motor.   Once this circuit has been tested on the solderless breadboard, replace the 2N3904 transistor in the programmable DC Motor controller project with the power-efficient PMOSFET component mentioned earlier. 
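As a rough, worked sizing example for the 2N3904 base resistor (the exact figures depend on your motor and the transistor datasheet, so treat these numbers as assumptions): if the motor draws about 200 mA and the transistor's current gain hfe is taken as 100, the base needs at least 200 mA / 100 = 2 mA to drive the transistor into saturation. With a typical 74LS08 high-level output of roughly 3 V and VBE of about 0.7 V, the base resistor should be no larger than (3 V - 0.7 V) / 2 mA, which is approximately 1.15 kΩ, so a standard 1 kΩ resistor leaves a sensible margin.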
As an additional reference, the schematic diagram of the transistor relay driver circuit is as follows:   A sketch of the LCD selection cursor The LCD provides a simple user interface for the operation of a DC motor that is wired to the Arduino-based programmable DC motor controller. The LCD provides the two basic motor operations of ON and OFF. Although the LCD shows the two DC motor operation options, the display doesn't provide any visual indication of selection when using the Program switch. An enhancement feature of the LCD is that it shows which DC motor operation has been selected by adding a selection symbol. The LCD selection feature provides a visual indicator of the DC motor operation that was selected by the Program switch. This selection feature can be easily implemented for the programmable DC motor controller LCD by adding a > symbol to the Arduino sketch. After uploading the original sketch from the Let's build it section of this article, the LCD will display two DC motor operation options, as shown in the following image:   The enhancement concept sketch of the new LCD selection feature is as follows:   The selection symbol points to the DC motor operation that is based on the Program switch position. (For reference, see the schematic diagram of the programmable DC motor controller circuit.) The partially programmable DC motor controller program sketch that comes without an LCD selection feature Comparing the original LCD DC motor operation selection with the new sketch, the differences with regard to the programming features are as follows: void loop(){   // read the status of the Program Switch value: ProgramStatus = digitalRead(ProgramPin);   // check if Program switch is ON. if(ProgramStatus == HIGH) {    digitalWrite(OUTPin, HIGH);   } else{    digitalWrite(OUTPin,LOW);   } } The partially programmable DC motor controller program sketch with an LCD selection feature This code feature will provide a selection cursor on the LCD to choose the programmable DC motor controller operation mode: // set cursor for messages and print Program select messages on the LCD. lcd.setCursor(0,0); lcd.print( ">1.Closed(ON)"); lcd.setCursor(0, 1); lcd.print ( ">2.Open(OFF)");     void loop(){   // read the status of the Program Switch value: ProgramStatus = digitalRead(ProgramPin);   // check if Program select choice is 1.ON. if(ProgramStatus == HIGH) {    digitalWrite(OUTPin, HIGH);      lcd.setCursor(0,0);      lcd.print( ">1.Closed(ON)");      lcd.setCursor(0,1);      lcd.print ( " 2.Open(OFF) ");   } else{      digitalWrite(OUTPin,LOW);      lcd.setCursor(0,1);      lcd.print ( ">2.Open(OFF)");      lcd.setCursor(0,0);      lcd.print( " 1.Closed(ON) ");   } } The most obvious difference between the two partial Arduino sketches is that the LCD selection feature has several lines of code as compared to the original one. As the slide position of the Program switch changes, the LCD's selection symbol instantly moves to the correct operating mode. Although the DC motor can be observed directly, the LCD confirms the operating mode of the electromechanical device. The complete LCD selection sketch is shown in the following section. As a design-related challenge, try displaying an actual arrow for the DC motor operating mode on the LCD. As illustrated in the sketch, an arrow can be built by using the keyboard symbols or the American Standard Code for Information Interchange (ASCII) code. 
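One possible way to approach that challenge, shown here only as a sketch and not as part of the project code, is to define a custom 5x8 arrow glyph with the LiquidCrystal library's createChar() function and write it in place of the > character:

// Sketch fragment: printing a custom arrow character on the LCD.
#include <LiquidCrystal.h>

LiquidCrystal lcd(12, 11, 5, 4, 3, 2);

// 5x8 pixel pattern for a right-pointing arrow
byte arrowGlyph[8] = {
B00000,
B00100,
B00010,
B11111,
B00010,
B00100,
B00000,
B00000
};

void setup() {
lcd.begin(16, 2);
lcd.createChar(0, arrowGlyph);   // store the glyph in CGRAM slot 0
lcd.setCursor(0, 0);
lcd.write(byte(0));              // print the arrow instead of ">"
lcd.print("1.Closed(ON)");
}

void loop() {
}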
/* * programmable DC motor controller w/LCD allows the user to select ON and OFF operations using a slide switch. To * enable the selected operation another slide switch is used to initiate the selected choice. * Program Switch wired to pin 6. * Output select wired to pin 7. * LCD used to display programming choices (ON or OFF) with selection arrow. * created 28 Dec 2012 * by Don Wilcher */ // include the library code: #include <LiquidCrystal.h>   // initialize the library with the numbers of the interface pins LiquidCrystal lcd(12, 11, 5, 4, 3, 2);   // constants won't change. They're used here to // set pin numbers: const int ProgramPin = 6;     // pin number for PROGRAM input control signal const int OUTPin = 7;       // pin number for OUTPUT control signal     // variable will change: int ProgramStatus = 0;       // variable for reading Program input status     void setup() { // initialize the following pin as an output: pinMode(OUTPin, OUTPUT);   // initialize the following pin as an input: pinMode(ProgramPin, INPUT);   // set up the LCD's number of rows and columns: lcd.begin(16, 2);   // set cursor for messages andprint Program select messages on the LCD. lcd.setCursor(0,0); lcd.print( ">1.Closed(ON)"); lcd.setCursor(0, 1); lcd.print ( ">2.Open(OFF)");   }   void loop(){   // read the status of the Program Switch value: ProgramStatus = digitalRead(ProgramPin);   // check if Program select choice is 1.ON. if(ProgramStatus == HIGH) {    digitalWrite(OUTPin, HIGH);      lcd.setCursor(0,0);      lcd.print( ">1.Closed(ON)");      lcd.setCursor(0,1);      lcd.print ( " 2.Open(OFF) ");   } else{      digitalWrite(OUTPin,LOW);      lcd.setCursor(0,1);      lcd.print ( ">2.Open(OFF)");      lcd.setCursor(0,0);      lcd.print( " 1.Closed(ON) ");   } } Congratulations on building your programmable motor controller device! Summary In this article, a programmable motor controller was built using an Arduino, AND gate, and transistor motor driver. The fundamentals of digital electronics, which include the concepts of Boolean logic expressions and truth tables were explained in the article. The AND gate is not able to control a small DC motor because of the high amount of current that is needed to operate it properly. PMOSFET (IRF521) is able to operate a small DC motor because of its high current sourcing capability. The circuit that is used to wire a transistor to a small DC motor is called a transistor DC motor driver. The DC motor can be turned on or off by using the LCD cursor selection feature of the programmable DC motor controller. Resources for Article: Further resources on this subject: Arduino Development [article] Prototyping Arduino Projects using Python [article] The Arduino Mobile Robot [article]


Persisting Data to Local Storage in Ionic

Troy Miles
22 Jul 2015
5 min read
Users expect certain things to simply work in every mobile app. And if they don't work as expected, users delete your app and even worse they will probably give it a bad rating. Settings are one of those things users expect to simply work. Whenever a user makes a change to your app's setting page, they expect that the rest of the app will pick up that change and that those changes will be correctly persisted so that whenever they use the app again, the changes they made will be remembered. There is nothing too difficult about getting persistence to work correctly in an Ionic app, but there are a few road bumps that this post can help you to avoid. For an example, we will use the Ionic side menu starter template and add a settings page to it (beta 14 of Ionic was used for this post). There is nothing special about the settings page, in fact, settings can be persisted from anywhere in the application. The settings page just gives a nice place to play with things. And we can see how to persist values as they are changed by the user. The first part of our settings strategy is that we will keep all of our individual settings in the Settings object. This is a personal preference of mine. My app always serializes and deserializes all of the individual properties of the Settings object. Anytime I want something persisted, I add it to the Settings object and the system takes care of the rest. Next, we use an Angular Value object to hold the settings. A value is one of Angular's providers, like factories, services, providers, and constants. And unlike its cousin, the constant, values can be changed. So a value gives us a nice place to store our settings object and the values we put in it serve as the default values. angular.module('starter') .value("Settings", { searchRadius: {value: "5"}, acceptsPushNotification: true, hasUserSeenMessage: false, pagesPerLoad: 20 }); At the base of settings strategy is the HTML5 local storage. Local storage gives web apps a nice place to store information in string based key value pairs. If you're wondering how we get types besides strings into storage, wonder no more. Part of the magic and the reason why it is nice to keep everything in a single object is that we are going to convert that single object to and from a string using JSON. Inside of the file, "localstorage-service.js" there are only two methods in the services API. The first is serializeSettings and the second is deserializeSettings. Both do exactly what their names imply. There is also an internal only method in Local Storage, checkLocalStorage. It is only used for diagnostic purposes, since it is only used to write where or not the device has local storage to the console log. The final thing that Local Storage does is call deserializeSettings once during its startup. This gives the settings object the last values stored. If there are no saved values, it uses the Settings object stored in values. One other bit of weirdness which should be explained is why we copy properties using angular extend instead of simply copying the whole object. If we ever modify the entire angular value object, it returns to the default values and our changes are lost. We could write our function to copy the properties, but angular includes extend which copies the properties exactly the way we need them. 
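The listing that follows shows deserializeSettings; for completeness, a minimal sketch of its counterpart, serializeSettings, might look like the following (the settings key variable and the injected Settings value are assumptions based on the description above):

// A possible serializeSettings, mirroring the deserialize logic shown below.
function serializeSettings() {
// Convert the entire Settings object to a JSON string and store it
// under the same local storage key used by deserializeSettings.
localStorage[settings] = JSON.stringify(Settings);
console.log("Settings saved");
}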
function deserializeSettings() { var newSettings, rawSettings = localStorage[settings]; if(rawSettings) { newSettings = JSON.parse(rawSettings); if (newSettings) { // use extend since it copies one property at a time angular.extend(Settings, newSettings); console.log("Settings restore"); } } } In the Settings controller we bind the values from our Settings object to widgets on the page. It is not an accident that we name the property the same thing on the $scope object as on the Settings object. This makes updating the property easier, if we access the object using JavaScript bracket notation, we can access both the $scope and Settings object at the same time. We use this in the onChange method which is called anytime the value of a widget is changed. All of the widgets call this onChange method. if (!Settings.hasUserSeenMessage) { Settings.hasUserSeenMessage = true; LocalStorageService.serializeSettings(); $ionicPopup.alert({ title: 'Hi There', template: '<div class="text-center">You are a first time user.</div>' }); } // set the initial values for the widgets $scope.searchRadius = Settings.searchRadius; $scope.acceptPush = Settings.acceptPush; // when a widget is changed, come here and update the setting object too $scope.onChange = function (type, value) { $scope[type] = value; Settings[type] = value; LocalStorageService.serializeSettings(); }; We also demonstrate how to persist values programmatically. The hasUserSeenMessage property is checked in the code. If the user hasn't seen our one time message, we set the value to true, persist the value to local storage, then display the message. Anytime you want to persist Settings, simply call LocalStorageService.serializeSettings. About the author Troy Miles, aka the Rockncoder, began writing games in assembly language for early computers like the Apple II, Vic20, C64, and the IBM PC over 35 years ago. Currently he fills his days writing web apps for a Southern California based automotive valuation and information company. Nights and weekends he can usually be found writing cool apps for mobile and web or teaching other developers how to do so. He likes to post interesting code nuggets on his blog: http://therockncoder.com and videos on his YouTube channel: https://www.youtube.com/user/rockncoder. He can be reached at rockncoder@gmail.com The complete code of this tutorial is in my GitHub repo at https://github.com/Rockncoder/settings. Now that you know one way to persist settings, there is no excuse for not giving users a settings page which persist data properly to the device.


WildFly – the Basics

Packt
21 Jul 2015
6 min read
In this article, Luigi Fugaro, author of the book Wildfly Cookbook says that the JBoss.org community is a huge community, where people all over the world develop, test, and document pieces of code. There are a lot of projects in there, not just JBoss AS, which is now WildFly. I can mention a few: Infinispan, Undertow, PicketLink, Arquillian, HornetQ, RESTeasy, AeroGear, and Vert.X. For a complete list of all projects, visit http://www.jboss.org/projects/. (For more resources related to this topic, see here.) Despite marketing reasons, as there is no preferred project, the community wanted to change the name of the JBoss AS project to something different, which would not collide with the community name. There was also another reason, which was about the JBoss Red Hat supported version named JBoss Enterprise Application Platform (EAP). This was another point toward replacing the JBoss AS name. Software prerequisites WildFly runs on top of the Java platform. It needs at least Java Runtime Environment (JRE) version 1.7 to run (further reference to version 1.7 and 7 should be considered equal; the same applies to 1.8 and 8 as well), but it also works perfectly with latest JRE version 8. As we will also need to compile and build Java web applications, we will need the Java Development Kit (JDK), which gives the necessary tools to work with the Java source code. In the JDK panorama, we can find the Oracle JDK, developed and maintained by Oracle, and OpenJDK, which relies on contributions by the community. Nevertheless, since April 2015, Oracle no longer posts updates of Java SE 7 to its public download sites as mentioned at http://www.oracle.com/technetwork/java/javase/downloads/eol-135779.html. Also, bear in mind that Java Critical Patch Update is released on a quarterly basis; therefore, for reasons of stability and feature support, we will use the Oracle JDK 8, which is freely available for download at http://www.oracle.com/technetwork/java/javase/downloads/index.html. While writing the book, the latest stable Oracle JDK is version 1.8.0_31 (8u31). From here on, every reference to Java Virtual Machine (JVM), Java, JRE, and JDK will be intended to be to Oracle JDK 1.8.0_31. To keep things simple, if you don't mind, use that same version. In addition to the JDK, we will need Apache Maven 3, which is a build tool for Java projects. It is freely available for download at http://maven.apache.org/download.cgi. A generic download link can be found at http://www.us.apache.org/dist/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.tar.gz. Downloading and installing WildFly In this article, you will learn how to get and install WildFly. As always in the open source world, you can do the same thing in different ways. WildFly can be installed using your preferred software manager or by downloading the bundle provided by the http://wildfly.org/ site. Going as per the JDK, we will choose the second way. Getting ready Just open your favorite browser and point it to http://wildfly.org/downloads/. You should see a page similar to this one: WildFly's download page Now download the latest version into our WFC folder. How to do it… Once the download is complete, open a terminal and extract its content into the WFC folder, executing the following commands: $ cd ~/WFC && tar zx wildfly-9.0.0.Beta2.tar.gz The preceding command will first point to our WildFly Cookbook folder; it will then extract the WildFly archive from it. 
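As a quick sanity check at this point, you can list the folder and re-confirm the Java prerequisite from the same shell (the exact folder name will depend on the version you downloaded):

$ cd ~/WFC && ls
$ java -version   # should report the 1.8.x JDK discussed earlier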
Listing our WFC folder, we should find the newly created WildFly folder named wildfly-9.0.0.Beta2. To better remember and handle WildFly's installation directory, rename it to wildfly, as follows: $ cd ~/WFC && mv wildfly-9.0.0.Beta2 wildfly By the way, WildFly can be also installed using YUM, Fedora's traditional software manager. In production environment, you will not place the WildFly installation directory in the home folder of a specific user; rather, you will place it in different paths, relative to the context you are working on. Now, we need to create the JBOSS_HOME environment variable, which is used by WildFly itself as base directory when it starts up (in the feature release, this was updated to WILDFLY_HOME). We will also create the WILDFLY_HOME environment variable, which we will use through the whole book to reference WildFly's installation directory. Thus, open the .bash_profile file, placed in your home folder, with your favorite text editor, and add the following directives: export JBOSS_HOME=~/WFC/wildfly export WILDFLY_HOME=$JBOSS_HOME For the changes to take effect, you can either log out and log back in, or just issue the following command: $ source ~/.bash_profile Your .bash_profile file should look as follows: Understanding the WildFly directory overview Now that we have finished installing WildFly, let's look into its folders. How to do it… Open your terminal and run the following command: $ cd $WILDFLY_HOME $ pwd && ls -la The output of your commands should be similar to the following image: WildFly's folders overview How it works The preceding image depicts WildFly's folder on the filesystem. Each is outlined in the following table: Folder's name Description appclient These are configuration files, deployment content, and writable areas used by the application client container run from this installation. bin These are the startup scripts; startup configuration files, and various command-line utilities, such as Vault, add-user, and Java diagnostic report available for the Unix and Windows environments. bin/client This contains a client JAR for use by non maven-based clients. docs/schema These are XML schema definition files. docs/examples/configs These are example configuration files representing specific use cases. domain These are configuration files, deployment content, and writable areas used by the domain mode processes run from this installation. modules WildFly is based on a modular class-loading architecture. The various modules used in the server are stored here. standalone These are configuration files, deployment content, and writable areas used by the single standalone server run from this installation. welcome-content This is the default Welcome Page content. In the preceding table, I've emphasized the domain and standalone folders, which are those that determine in which mode WildFly will run: standalone or domain. From here on, whenever mentioned, WildFly's home will be intended as $WILDFLY_HOME. Summary In this article, we covered the software prerequisites of WildFly, downloading and installing WildFly, and an overview of WildFly's folder structure. Resources for Article: Further resources on this subject: Various subsystem configurations [article] Creating a JSF composite component [article] Introducing PrimeFaces [article]


Configuring FreeSWITCH for WebRTC

Packt
21 Jul 2015
12 min read
In the article written by Giovanni Maruzzelli, author of FreeSWITCH 1.6 Cookbook, we learn how WebRTC is all about security and encryption. Theye are not an afterthought. They're intimately interwoven at the design level and are mandatory. For example, you cannot stream audio or video clearly (without encryption) via WebRTC. (For more resources related to this topic, see here.) Getting ready To start with this recipe, you need certificates. These are the same kind of certificates used by web servers for SSL-HTTPS. Yes, you can be your own Certification Authority and self-sign your own certificate. However, this will add considerable hassles; browsers will not recognize the certificate, and you will have to manually instruct them to make a security exception and accept it, or import your own Certification Authority chain to the browser. Also, in some mobile browsers, it is not possible to import self-signed Certification Authorities at all. The bottom line is that you can buy an SSL certificate for less than $10, and in 5 minutes. (No signatures, papers, faxes, telephone calls… nothing is required. Only a confirmation email and a few bucks are enough.) It will save you much frustration, and you'll be able to cleanly showcase your installation to others. The same reasoning applies to DNS Full Qualified Domain Names (FQDN)—certificates belonging to FQDN's. You can put your DNS names in /etc/hosts, or set up an internal DNS server, but this will not work for mobile clients and desktops outside your control. You can register a domain, point an fqdn to your machine's public IP (it can be a Linode, an AWS VM, or whatever), and buy a certificate using that fqdn as Common Name (CN). Don't try to set up the WebRTC server on your internal LAN behind the same NAT that your clients are into (again, it is possible but painful). How to do it... Once you have obtained your certificate (be sure to download the Certification Authority Chain too, and keep your Private Key; you'll need it), you must concatenate those three elements to create the needed certificates for mod_sofia to serve SIP signaling via WSS and media via SRTP/DTLS. With certificates in the right place, you can now activate ssl in Sofia. Open /usr/local/freeswitch/conf/vars.xml: As you can see, in the default configuration, both lines that feature SSL are false. Edit them both to change them to true. How it works... By default, Sofia will listen on port 7443 for WSS clients. You may want to change this port if you need your clients to traverse very restrictive firewalls. Edit /usr/local/freeswitch/conf/sip-profiles/internal.xml and change the "wss-binding" value to 443. This number, 443, is the HTTPS (SSL) port, and is almost universally open in all firewalls. Also, wss traffic is indistinguishable from https/ssl traffic, so your signaling will pass through the most advanced Deep Packet Inspection. Remember that if you use port 443 for WSS, you cannot use that same port for HTTPS, so you will need to deploy your secure web server on another machine. There's more... A few examples of such a configuration are certificates, DNS, and STUN/TURN. Generally speaking, if you set up with real DNS names, you will not need to run your own STUN server; your clients can rely on Google STUN servers. But if you need a TURN server because some of your clients need a media relay (which is because they're behind and demented NAT got UDP blocked by zealous firewalls), install on another machine rfc5766-turn-server, and have it listen on TCP ports 443 and 80. 
You can also put certificates with it and use TURNS on encrypted connection. The same firewall piercing properties as per signaling. SIP signaling in JavaScript with SIP.js (WebRTC client) Let's carry out the most basic interaction with a web browser audio/video through WebRTC. We'll start using SIP.js, which uses a protocol very familiar to all those who are old hands at VoIP. A web page will display a click-to-call button, and anyone can click for inquiries. That call will be answered by our company's PBX and routed to our employee extension (1010). Our employee will wait on a browser with the "answer" web page open, and will automatically be connected to the incoming call (if our employee does not answer, the call will go to their voicemail). Getting ready To use this example, download version 0.7.0 of the SIP.js JavaScript library from www.sipjs.com. We need an "anonymous" user that we can allow into our system without risks, that is, a user that can do only what we have preplanned. Create an anonymous user for click-to-call in a file named /usr/local/freeswitch/conf/directory/default/anonymous.xml : <include> <user id="anonymous">    <params>      <param name="password" value="welcome"/>    </params>    <variables>      <variable name="user_context" value="anonymous"/>      <variable name="effective_caller_id_name" value="Anonymous"/>      <variable name="effective_caller_id_number" value="666"/>      <variable name="outbound_caller_id_name" value="$${outbound_caller_name}"/>      <variable name="outbound_caller_id_number" value="$${outbound_caller_id}"/>    </variables> </user> </include> Then add the user's own dialplan to /usr/local/freeswitch/conf/dialplan/anonymous.xml: <include> <context name="anonymous">    <extension name="public_extensions">      <condition field="destination_number" expression="^(10[01][0-9])$">        <action application="transfer" data="$1 XML default"/>      </condition>    </extension>    <extension name="conferences">      <condition field="destination_number" expression="^(36d{2})$">        <action application="answer"/>        <action application="conference" data="$1-${domain_name}@video-mcu"/>      </condition>    </extension>    <extension name="echo">      <condition field="destination_number" expression="^9196$">        <action application="answer"/>        <action application="echo"/>      </condition>    </extension> </context> </include> How to do it... In a directory served by your HTPS server (for example, Apache with an SSL certificate), put all the following files. 
Minimal click-to-call caller client HTML (call.html): <html> <body>        <button id="startCall">Start Call</button>        <button id="endCall">End Call</button>        <br/>        <video id="remoteVideo"></video>        <br/>        <video id="localVideo" muted="muted" width="128px" height="96px"></video>        <script src="js/sip-0.7.0.min.js"></script>        <script src="call.js"></script> </body> </html> JAVASCRIPT (call.js): var session;   var endButton = document.getElementById('endCall'); endButton.addEventListener("click", function () {        session.bye();        alert("Call Ended"); }, false);   var startButton = document.getElementById('startCall'); startButton.addEventListener("click", function () {        session = userAgent.invite('sip:1010@gmaruzz.org', options);        alert("Call Started"); }, false);   var userAgent = new SIP.UA({        uri: 'anonymous@gmaruzz.org',        wsServers: ['wss://self2.gmaruzz.org:7443'],        authorizationUser: 'anonymous',        password: 'welcome' });   var options = {        media: {                constraints: {                        audio: true,                        video: true                },                render: {                        remote: document.getElementById('remoteVideo'),                        local: document.getElementById('localVideo')                }        } }; Minimal callee HTML (answer.html): <html> <body>        <button id="endCall">End Call</button>        <br/>        <video id="remoteVideo"></video>        <br/>        <video id="localVideo" muted="muted" width="128px" height="96px"></video>        <script src="js/sip-0.7.0.min.js"></script>        <script src="answer.js"></script> </body> </html> JAVASCRIPT (answer.js): var session;   var endButton = document.getElementById('endCall'); endButton.addEventListener("click", function () {        session.bye();        alert("Call Ended"); }, false);   var userAgent = new SIP.UA({        uri: '1010@gmaruzz.org',        wsServers: ['wss://self2.gmaruzz.org:7443'],        authorizationUser: '1010',        password: 'ciaociao' });   userAgent.on('invite', function (ciapalo) {        session = ciapalo;        session.accept({                media: {                        constraints: {                               audio: true,                                video: true                        },                        render: {                                remote: document.getElementById('remoteVideo'),                                local: document.getElementById('localVideo')                        }                  }        }); }); How it works... Our employee (the callee, or the person who will answer the call) will sit tight with the answer.html web page open on their browser. Upon page load, JavaScript will have created the SIP agent and registered it with our FreeSWITCH server as SIP user "1010" (just as our employee was on their own regular SIP phone). Our customer (the caller, or the person who initiates the communication) will visit the call.html webpage (while loading, this web page will register as an SIP "anonymous" user with FreeSWITCH), and then click on the Start Call button. This clicking will activate the JavaScript that creates the communication session using the invite method of the user agent, passing as an argument the SIP address of our employee. The Invite method initiates a call, and our FreeSWITCH server duly invites SIP user 1010. That happens to be the answer.html web page our employee is in front of. 
The Invite method sent from FreeSWITCH to answer.html will activate the JavaScript local user agent, which will create the session and accept the call. At this moment, the caller and callee are connected, and voice and video will begin to flow back and forth. The received audio or video stream will be rendered by the RemoteVideo tag in the web page, while its own stream (the video that is sent to the peer) will show up locally in the little localVideo tag. That's muted not to generate Larsen whistles. See also The Configuring a SIP phone to register with FreeSWITCH recipe in Chapter 2, Connecting Telephones and Service Providers, and the documentation at http://sipjs.com/guides/.confluence/display/FREESWITCH/mod_verto Summary This article features the new disruptive technology that allows real-time audio/video/data-secure communication from hundreds of millions of browsers. FreeSWITCH is ready to serve as a gateway and an application server. Resources for Article: Further resources on this subject: WebRTC with SIP and IMS [article] Architecture of FreeSWITCH [article] Calling your fellow agents [article]

More about Julia

Packt
21 Jul 2015
28 min read
In this article by Malcolm Sherrington, author of the book Mastering Julia, we will see why write a book on Julia when the language is not yet reached the version v1.0 stage? It was the first question which needed to be addressed when deciding on the contents and philosophy behind the book. (For more resources related to this topic, see here.) Julia at the time as v0.2, it is now soon to achieve a stable v0.4 but already the blueprint for v0.5 is being touted. There were some common misconceptions which I wished to address: It is a language designed for Geeks It's main attribute, possibly only, was its speed It was a scientific language primarily a MATLAB clone It is not as easy to use compared with the alternatives such Python and R There are not enough library support to tackle Enterprise Solutions In fact none of these apply to Julia. True it is a relatively young programming language. The initial design work on Julia project began at MIT in August 2009, by February 2012 it became open source. It is largely the work of three developers Stefan Karpinski, Jeff Bezanson, and Viral Shah. Initially Julia was envisaged by the designers as a scientific language sufficiently rapid to make the necessity of modeling in an interactive language and subsequently having to redevelop in a compiled language, such as C or Fortran. To achieve this, Julia code would need to be transformed to the underlying machine code of the computer but using the low level virtual machine (LLVM) compilation system, at the time itself a new project. This was a masterstroke. LLVM is now the basis of a variety of systems, the Apple C compiler (clang) uses it, Google V8 JavaScript engine and Mozilla's Rust language use it and Python is attempting to achieve significant increases in speed with its numba module. In Julia LLVM always works, there are no exceptions because it has to. When launched possibly the community itself saw Julia as a replacement for MATLAB but that proved not to be just case. The syntax of Julia is similar to MATLAB, so much so that anyone competent in the latter can easily learn Julia but, it is a much richer language with many significant differences. The task of the book was to focus on these. In particular my target audience was the data scientist and programmer analyst but have sufficient for the "jobbing" C++ and Java programmer. Julia's features The Julia programming language is free and open source (MIT licensed) and the source is available on GitHub. To the veteran programmer it has a look and feel similar to MATLAB. Blocks created by for, while, and if statements are all terminated by end rather than by endfor, endwhile, and endif or using the familiar {} style syntax. However it is not a MATLAB clone and sources written for MATLAB will not run on Julia. 
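As a small illustrative sketch (not taken from the book's examples), the following function shows the MATLAB-like end-terminated blocks alongside features MATLAB does not share, such as optional type annotations on arguments and array comprehensions:

function squares(n::Int)          # arguments may carry type annotations
   s = [i^2 for i in 1:n]       # a comprehension builds the vector in one line
   return sum(s)
end

squares(10)                       # => 385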
The following are some of the Julia's features: Designed for parallelism and distributed computation (multicore and cluster) C functions called directly (no wrappers or special APIs needed) Powerful shell-like capabilities for managing other processes Lisp-like macros and other metaprogramming facilities User-defined types are as fast and compact as built-ins LLVM-based, just-in-time (JIT) compiler that allows Julia to approach and often match the performance of C/C++ An extensive mathematical function library (written in Julia) Integrated mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, FFTs, and string processing Julia's core is implemented in C and C++, its parser in Scheme, and the LLVM compiler framework is used for JIT generation of machine code. The standard library is written in Julia itself using the Node.js's libuv library for efficient, cross-platform I/O. Julia has a rich language of types for constructing and describing objects that can also optionally be used to make type declarations. The ability to define function behavior across many combinations of argument types via multiple dispatch which is a key cornerstone of the language design. Julia can utilize code in other programming languages by a directly calling routines written in C or Fortran and stored in shared libraries or DLLs. This is a feature of the language syntax. In addition it is possible to interact with Python via the PyCall and this is used in the implementation of the IJulia programming environment. A quick look at some Julia To get feel for programming in Julia let's look at an example which uses random numbers to price an Asian derivative on the options market. A share option is the right to purchase a specific stock at a nominated price sometime in the future. The person granting the option is called the grantor and the person who has the benefit of the option is the beneficiary. At the time the option matures the beneficiary may choose to exercise the option if it is in his/her interest the grantor is then obliged to complete the contract. The following snippet is part of the calculation and computes a single trial and uses the Winston package to display the trajectory: using Winston; S0 = 100;     # Spot price K   = 102;     # Strike price r   = 0.05;     # Risk free rate q   = 0.0;      # Dividend yield v   = 0.2;     # Volatility tma = 0.25;     # Time to maturity T = 100;       # Number of time steps dt = tma/T;     # Time increment S = zeros(Float64,T) S[1] = S0; dW = randn(T)*sqrt(dt); [ S[t] = S[t-1] * (1 + (r - q - 0.5*v*v)*dt + v*dW[t] + 0.5*v*v*dW[t]*dW[t]) for t=2:T ]   x = linspace(1, T, length(T)); p = FramedPlot(title = "Random Walk, drift 5%, volatility 2%") add(p, Curve(x,S,color="red")) display(p) Plot one track so only compute a vector S of T elements. The stochastic variance dW is computed in a single vectorized statement. The track S is computed using a "list comprehension". The array x is created using linspace to define a linear absicca for the plot. Using the Winston package to produce the display, which only requires 3 statements: to define the plot space, add a curve to it and display the plot as shown in the following figure: Generating Julia sets Both the Mandelbrot set and Julia set (for a given constant z0) are the sets of all z (complex number) for which the iteration z → z*z + z0 does not diverge to infinity. The Mandelbrot set is those z0 for which the Julia set is connected. 
We create a file jset.jl and its contents defines the function to generate a Julia set. functionjuliaset(z, z0, nmax::Int64) for n = 1:nmax if abs(z) > 2 (return n-1) end z = z^2 + z0 end returnnmax end Here z and z0 are complex values and nmax is the number of trials to make before returning. If the modulus of the complex number z gets above 2 then it can be shown that it will increase without limit. The function returns the number of iterations until the modulus test succeeds or else nmax. Also we will write a second file pgmfile.jl to handling displaying the Julia set. functioncreate_pgmfile(img, outf::String) s = open(outf, "w") write(s, "P5n") n, m = size(img) write(s, "$m $n 255n") for i=1:n, j=1:m    p = img[i,j]    write(s, uint8(p)) end close(s) end It is quite easy to create a simple disk file using the portable bitmap (netpbm) format. This consists of a "magic" number P1 - P6, followed on the next line the image height, width and a maximum color value which must be greater than 0 and less than 65536; all of these are ASCII values not binary. Then follows the image values (height x width) which make be ASCII for P1, P2, P3 or binary for P4, P5, P6. There are three different types of portable bitmap; B/W (P1/P4), Grayscale (P2/P5), and Colour (P3/P6). The function create_pgmfile() creates a binary grayscale file (magic number = P5) from an image matrix where the values are written as Uint8. Notice that the for loop defines the indices i, j in a single statement with correspondingly only one end statement. The image matrix is output in column order which matches the way it is stored in Julia. So the main program looks like: include("jset.jl") include("pgmfile.jl") h = 400; w = 800; M = Array(Int64, h, w); c0 = -0.8+0.16im; pgm_name = "juliaset.pgm";   t0 = time(); for y=1:h, x=1:w c = complex((x-w/2)/(w/2), (y-h/2)/(w/2)) M[y,x] = juliaset(c, c0, 256) end t1 = time(); create_pgmfile(M, pgm_name); print("Written $pgm_namenFinished in $(t1-t0) seconds.n"); This is how the previous code works: We define an array N of type Int64 to hold the return values from the juliaset function. The constant c0 is arbitrary, different values of c0 will produce different Julia sets. The starting complex number is constructed from the (x,y) coordinates and scaled to the half width. We 'cheat' a little by defining the maximum number of iterations as 256. Because we are writing byte values (Uint8) and value which remains bounded will be 256 and since overflow values wrap around will be output as 0 (black). The script defines a main program in a function jmain(): julia>jmain Written juliaset.pgm Finished in 0.458 seconds # => (on my laptop) Julia type system and multiple dispatch Julia is not an object-oriented language so when we speak of object they are a different sort of data structure to those in traditional O-O languages. Julia does not allow types to have methods or so it is not possible to create subtypes which inherit methods. While this might seem restrictive it does permit methods to use a multiple dispatch call structure rather than the single dispatch system employed in object orientated ones. Coupled with Julia's system of types, multiple dispatch is extremely powerful. Moreover it is a more logical approach for data scientists and scientific programmers and if for no other reason exposing this to you the analyst/programmer is a reason to use Julia. A function is an object that maps a tuple of arguments to a return value. 
In the case where the arguments are not valid the function should handle the situation cleanly by catching the error and handling it or throw an exception. When a function is applied to its argument tuple it selects the appropriate method and this process is called dispatch. In traditional object-oriented languages the method chosen is based only on the object type and this paradigm is termed single-dispatch. With Julia the combination of all a functions argument determine which method is chosen, this is the basis of multiple dispatch. To the scientific programmer this all seems very natural. It makes little sense in most circumstances for one argument to be more important than the others. In Julia all functions and operators, which are also functions, use multiple dispatch. The methods chosen for any combination of operators. For example looking at the methods of the power operator (^): julia> methods(^) # 43 methods for generic function "^": ^(x::Bool,y::Bool) at bool.jl:41 ^(x::BigInt,y::Bool) at gmp.jl:314 ^(x::Integer,y::Bool) at bool.jl:42     ^(A::Array{T,2},p::Number) at linalg/dense.jl:180 ^(::MathConst{:e},x::AbstractArray{T,2}) at constants.jl:87 We can see that there are 43 methods for ^ and the file and line where the methods is defined are given too. Because any untyped argument is designed as type Any, it is possible to define a set of function methods such that there is no unique most specific method applicable to some combinations of arguments. julia> pow(a,b::Int64) = a^b; julia> pow(a::Float64,b) = a^b; Warning: New definition pow(Float64,Any) at /Applications/JuliaStudio.app/Contents/Resources/Console/Console.jl:1 is ambiguous with: pow(Any,Int64) at /Applications/JuliaStudio.app/Contents/Resources/Console/Console.jl:1. To fix, define pow(Float64,Int64) before the new definition. A call of pow(3.5, 2) can be handled by either function. In this case they will give the same result by only because of the function bodies and Julia can't know that. Working with Python The ability for Julia with call code written in other languages is one of its main strengths. From its inception Julia had to play "catchup" and a key feature was it makes calling code written in C, and by implication Fortran, very easy. The code to be called must be available as a shared library rather than just a standalone object file. There is a zero-overhead in the call, meaning that it is reduced to a single machine instruction in the LLVM compilation. In addition Python models can be accessed via the PyCall package which provides a @pyimport macro that mimics a Python import statement. This imports a Python module and provides Julia wrappers for all of the functions and constants including automatic conversion of types between Julia and Python. This work has led to the creation of an IJulia kernel to the IPython IDE which now is a principle component of the Jupyter project. In Pycall, type conversions are automatically performed for numeric, boolean, string, IO streams plus all tuples, arrays and dictionaries of these types. julia> using PyCall julia> @pyimport scipy.optimize as so julia> @pyimport scipy.integrate as si julia> so.ridder(x -> x*cos(x), 1, pi); # => 1.570796326795 julia> si.quad(x -> x*sin(x), 1, pi)[1]; # => 2.840423974650 In the preceding commands, the Python optimize and integrate modules are imported and functions in those modules called from the Julia REPL. 
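As a quick cross-check, and only as a sketch, the same integral can also be evaluated natively without leaving Julia by using the quadgk routine that was available in Base at the time:

julia> quadgk(x -> x*sin(x), 1, pi)[1]; # => 2.840423974650, matching the SciPy result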
One difference imposed on the package is that calls using the Python object notation are not possible from Julia, so these are referenced using an array-style notation po[:atr] rather po.atr, where po is a PyObject and atr is an attribute. It is also easy to use the Python matplotlib module to display simple (and complex) graphs. @pyimport matplotlib.pyplot as plt x = linspace(0,2*pi,1000); y = sin(3*x + 4*cos(2*x)); plt.plot(x, y, color="red", linewidth=2.0, linestyle="--") 1-element Array{Any,1}: PyObject<matplotlib.lines.Line2D object at 0x0000000027652358> plt.show() Notice that keywords can also be passed such as the color, line width and the preceding style. Simple statistics using dataframes Julia implements various approaches for handling data held on disk. These may be 'normal' files such as text files, CSV and other delimited files, or in SQL and NoSQL databases. Also there is an implementation of dataframe support similar to that provided in R and via the pandas module in Python. The following looks at the Apple share prices from 2000 to 200, using a CSV file with provides opening, closing, high and low prices together with trading volumes over the day. using DataFrames, StatsBase aapl = readtable("AAPL-Short.csv");   naapl = size(aapl)[1] m1 = int(mean((aapl[:Volume]))); # => 6306547 The data is read into a DataFrame and we can estimate the mean (m1). For the volume, it is possible to cast it as an integer as this makes more sense. We can do this by creating a weighting vector. using Match wts = zeros(naapl); for i in 1:naapl    dt = aapl[:Date][i]    wts[i] = @match dt begin          r"^2000" => 1.0          r"^2001" => 2.0          r"^2002" => 4.0          _       => 0.0    end end;   wv = WeightVec(wts); m2 = int(mean(aapl[:Volume], wv)); # => 6012863 Computing weighted statistical metrics it is possible to 'trim' off the outliers from each end of the data. Returning to the closing prices: mean(aapl[:Close]);         # => 37.1255 mean(aapl[:Close], wv);     # => 26.9944 trimmean(aapl[:Close], 0.1); # => 34.3951 trimean() is the trimmed mean where 5 percent is taken from each end. std(aapl[:Close]);           # => 34.1186 skewness(aapl[:Close])       # => 1.6911 kurtosis(aapl[:Close])       # => 1.3820 As well as second moments such as standard deviation, StatsBase provides a generic moments() function and specific instances based on these such as for skewness (third) and kurtosis (fourth). It is also possible to provide some summary statistics: summarystats(aapl[:Close])   Summary Stats: Mean:         37.125505 Minimum:     13.590000 1st Quartile: 17.735000 Median:       21.490000 3rd Quartile: 31.615000 Maximum:     144.190000 The first and third quartiles related to the 25 percent and 75 percent percentiles for a finer granularity we can use the percentile() function. percentile(aapl[:Close],5); # => 14.4855 percentile(aapl[:Close],95); # => 118.934 MySQL access using PyCall We have seen previously that Python can be used for plotting via the PyPlot package which interfaces with matplotlib. In fact the ability to easily call Python modules is a very powerful feature in Julia and we can use this as an alternative method for connecting to databases. Any database which can be manipulated by Python is also available to Julia. In particular since the DBD driver for MySQL is not fully DBT compliant, let's look this approach to running some queries. Our current MySQL setup already has the Chinook dataset loaded some we will execute a query to list the Genre table. 
In Python we will first need to download the MySQL Connector module. For Anaconda this needs to be using the source (independent) distribution, rather than a binary package and the installation performed using the setup.py file. The query (in Python) to list the Genre table would be: import mysql.connector as mc cnx = mc.connect(user="malcolm", password="mypasswd") csr = cnx.cursor() qry = "SELECT * FROM Chinook.Genre" csr.execute(qry) for vals in csr:    print(vals)   (1, u'Rock') (2, u'Jazz') (3, u'Metal') (4, u'Alternative & Punk') (5, u'Rock And Roll') ... ... csr.close() cnx.close() We can execute the same in Julia by using the PyCall to the mysql.connector module and the form of the coding is remarkably similar: using PyCall @pyimport mysql.connector as mc   cnx = mc.connect (user="malcolm", password="mypasswd"); csr = cnx[:cursor]() query = "SELECT * FROM Chinook.Genre" csr[:execute](query)   for vals in csr    id   = vals[1]    genre = vals[2]    @printf “ID: %2d, %sn” id genre end ID: 1, Rock ID: 2, Jazz ID: 3, Metal ID: 4, Alternative & Punk ID: 5, Rock And Roll ... ... csr[:close]() cnx[:close]() Note that the form of the call is a little different from the corresponding Python method, since Julia is not object-oriented the methods for a Python object are constructed as an array of symbols. For example the Python csr.execute(qry) routine is called in Julia as csr[:execute](qry). Also be aware that although Python arrays are zero-based this is translated to one-based by PyCall, so the first values is referenced as vals[1]. Scientific programming with Julia Julia was originally developed as a replacement for MATLAB with a focus on scientific programming. There are modules which are concerned with linear algebra, signal processing, mathematical calculus, optimization problems, and stochastic simulation. The following is a subject dear to my heart: the solution of differential equations. Differential equations are those which involve terms which involve the rates of change of variates as well as the variates themselves. They arise naturally in a number of fields, notably dynamics and when the changes are with respect to one dependent variable, often time, the systems are called ordinary differential equations. If more than a single dependent variable is involved, then they are termed partial differential equations. Julia supports the solution of ordinary differential equations thorough a couple of packages ODE and Sundials. The former (ODE) consists of routines written solely in Julia whereas Sundials is a wrapper package around a shared library. ODE exports a set of adaptive solvers; adaptive meaning that the 'step' size of the algorithm changes algorithmically to reduce the error estimate to be below a certain threshold. The calls take the form odeXY, where X is the order of the solver and Y the error control. ode23: Third order adaptive solver with third order error control ode45: Fourth order adaptive solver with fifth order error control ode78: Seventh order adaptive solver with eighth order error control To solve the explicit ODE defined as a vectorize set of equations dy/dt = F(t,y), all routines of which have the same basic form: (tout, yout) = odeXY(F, y0, tspan). As an example, I will look at it as a linear three-species food chain model where the lowest-level prey x is preyed upon by a mid-level species y, which, in turn, is preyed upon by a top level predator z. This is an extension of the Lotka-Volterra system from to three species. 
Examples might be three-species ecosystems such as mouse-snake-owl, vegetation-rabbits-foxes, and worm-sparrow-falcon. x' = a*x − b*x*y y' = −c*y + d*x*y − e*y*z z' = −f*z + g*y*z #for a,b,c,d,e,f g > 0 Where a, b, c, d are in the two-species Lotka-Volterra equations: e represents the effect of predation on species y by species z f represents the natural death rate of the predator z in the absence of prey g represents the efficiency and propagation rate of the predator z in the presence of prey This translates to the following set of linear equations: x[1] = p[1]*x[1] - p[2]*x[1]*x[2] x[2] = -p[3]*x[2] + p[4]*x[1]*x[2] - p[5]*x[2]*x[3] x[3] = -p[6]*x[3] + p[7]*x[2]*x[3] It is slightly over specified since one of the parameters can be removed by rescaling the timescale. We define the function F as follows: function F(t,x,p) d1 = p[1]*x[1] - p[2]*x[1]*x[2] d2 = -p[3]*x[2] + p[4]*x[1]*x[2] - p[5]*x[2]*x[3] d3 = -p[6]*x[3] + p[7]*x[2]*x[3] [d1, d2, d3] end This takes the time range, vectors of the independent variables and of the coefficients and returns a vector of the derivative estimates: p = ones(7); # Choose all parameters as 1.0 x0 = [0.5, 1.0, 2.0]; # Setup the initial conditions tspan = [0.0:0.1:10.0]; # and the time range Solve the equations by calling the ode23 routine. This returns a matrix of the solutions in a columnar order which we extract and display using Winston: (t,x) = ODE.ode23((t,x) -> F(t,x,pp), x0, tspan);   n = length(t); y1 = zeros(n); [y1[i] = x[i][1] for i = 1:n]; y2 = zeros(n); [y2[i] = x[i][2] for i = 1:n]; y3 = zeros(n); [y3[i] = x[i][3] for i = 1:n];   using Winston plot(t,y1,"b.",t,y2,"g-.",t,y3,"r--") This is shown in the following figure: Graphics with Gadlfy Julia now provides a vast array of graphics packages. The "popular" ones may be thought of as Winston, PyPlot and Gadfly and there is also an interface to the increasingly more popular online system Plot.ly. Gadfly is a large and complex package and provides great flexibility in the range and breadth of the visualizations possible in Julia. It is equivalent to the R module ggplot2 and similarly is based on the seminal work The Grammar of Graphics by Leland Wilkinson. The package was written by Daniel Jones and the source on GitHub contains numerous examples of visual displays together with the accompanying code. An entire text could be devoted just to Gadfly so I can only point out some of the main features here and encourage the reader interested in print standard graphics in Julia to refer to the online website at http://gadflyjl.org. The standard call is to the plot() function with creates a graph on the display device via a browser either directly or under the control of IJulia if that is being used as an IDE. It is possible to assign the result of plot() to a variable and invoke this using display(), In this way output can be written to files including: SVG, SVGJS/D3 PDF, and PNG. dd = plot(x = rand(10), y = rand(10)); draw(SVG(“random-pts.svg”, 15cm, 12cm) , dd); Notice that if writing to a backend, the display size is provided, this can be specified in units of cm and inch. Gadfly works well with C libraries of cairo, pango and, fontconfig installed. It will produce SVG and SVGJS graphics but for PNG, PostScript (PS) and PDF cairo is required. Also complex text layouts are more accurate when pango and fontconfig are available. 
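In the same vein, and assuming the Cairo package is installed for the bitmap and PDF backends, the plot held in dd above can be written to other formats simply by swapping the backend passed to draw():

draw(PNG("random-pts.png", 15cm, 12cm), dd)
draw(PDF("random-pts.pdf", 15cm, 12cm), dd)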
The plot() call can operate on three different data sources:

Dataframes
Functions and expressions
Arrays and collections

Unless otherwise specified, the type of graph produced is a scatter diagram. The ability to work directly with data frames is especially useful. To illustrate this, let's look at the GCSE result set. Recall that this is available as part of the RDatasets suite of source data.

using Gadfly, RDatasets, DataFrames;
set_default_plot_size(20cm, 12cm);
mlmf = dataset("mlmRev","Gcsemv")
df = mlmf[complete_cases(mlmf), :]

After extracting the data, we need to operate on values which do not have any NA values, so we use the complete_cases() function to create a subset of the original data.

names(df)
5-element Array{Symbol,1}: ; # => [ :School, :Student, :Gender, :Written, :Course ]

If we wish to view the data values for the exam and coursework results, and at the same time differentiate between boys and girls, this can be displayed by:

plot(df, x="Course", y="Written", color="Gender")

The JuliaGPU community group

Many Julia modules build on the work of other authors working within the same field of study, and these have been classified as community groups (http://julialang.org/community). Probably the most prolific is the statistics group: JuliaStats (http://juliastats.github.io).

One of the main themes in my professional career has been working with hardware to speed up the computing process. In my work on satellite data I worked with the STAR-100 array processor and, once back in the UK, used Silicon Graphics for 3D rendering of medical data. Currently, I am interested in using NVIDIA GPUs in financial scenarios and risk calculations. Much of this work has been coded in C, with domain-specific languages to program the ancillary hardware. It is now possible to do much of this in Julia with packages contained in the JuliaGPU group. This has routines for both CUDA and OpenCL, at present covering:

Basic runtime: CUDA.jl, CUDArt.jl, OpenCL.jl
BLAS integration: CUBLAS.jl, CLBLAS.jl
FFT operations: CUFFT.jl, CLFFT.jl

The CU*-style packages only apply to NVIDIA cards and require the CUDA SDK to be installed, whereas the CL*-style packages can be used with a variety of GPUs. CLFFT and CLBLAS require some additional libraries to be present, but we can use OpenCL as is. The following is output from a Lenovo Z50 laptop with an i7 processor and both Intel and NVIDIA graphics chips:

julia> using OpenCL
julia> OpenCL.devices()
OpenCL.Platform(Intel(R) HDGraphics 4400)
OpenCL.Platform(Intel(R) Core(TM) i7-4510U CPU)
OpenCL.Platform(GeForce 840M on NVIDIA CUDA)

To do some calculations, we need to define a kernel to be loaded on the GPU.
The following multiplies two 1024x1024 matrices of Gaussian random numbers:

import OpenCL
const cl = OpenCL

const kernel_source = """
__kernel void mmul(
    const int Mdim,
    const int Ndim,
    const int Pdim,
    __global float* A,
    __global float* B,
    __global float* C)
{
    int k;
    int i = get_global_id(0);
    int j = get_global_id(1);
    float tmp;
    if ((i < Ndim) && (j < Mdim)) {
      tmp = 0.0f;
      for (k = 0; k < Pdim; k++)
        tmp += A[i*Ndim + k] * B[k*Pdim + j];
      C[i*Ndim+j] = tmp;
    }
}
"""

The kernel is expressed as a string, and the OpenCL DSL has a C-like syntax:

const ORDER = 1024;  # Order of the square matrices A, B and C
const TOL   = 0.001; # Tolerance used in floating point comps
const COUNT = 3;     # Number of runs

sizeN = ORDER * ORDER;
h_A = float32(randn(sizeN)); # Fill array with random numbers
h_B = float32(randn(sizeN)); # --- ditto --
h_C = Array(Float32, sizeN); # Array to hold the results

ctx   = cl.Context(cl.devices()[3]);
queue = cl.CmdQueue(ctx, :profile);

d_a = cl.Buffer(Float32, ctx, (:r,:copy), hostbuf = h_A);
d_b = cl.Buffer(Float32, ctx, (:r,:copy), hostbuf = h_B);
d_c = cl.Buffer(Float32, ctx, :w, length(h_C));

We now create the OpenCL context and some data space on the GPU for the three arrays d_a, d_b, and d_c. Then, we copy the data in the host arrays h_A and h_B to the device and load the kernel onto the GPU:

prg  = cl.Program(ctx, source=kernel_source) |> cl.build!
mmul = cl.Kernel(prg, "mmul");

The following loop runs the kernel COUNT times to give an accurate estimate of the elapsed time for the operation. This includes the cl.copy!() operation, which copies the results back from the device to the host (Julia) program:

for i in 1:COUNT
  fill!(h_C, 0.0);
  global_range = (ORDER, ORDER);
  mmul_ocl = mmul[queue, global_range];
  evt = mmul_ocl(int32(ORDER), int32(ORDER), int32(ORDER), d_a, d_b, d_c);
  run_time = evt[:profile_duration] / 1e9;
  cl.copy!(queue, h_C, d_c);
  mflops = 2.0 * ORDER^3 / (1000000.0 * run_time);
  @printf "%10.8f seconds at %9.5f MFLOPS\n" run_time mflops
end

0.59426405 seconds at 3613.686 MFLOPS
0.59078856 seconds at 3634.957 MFLOPS
0.57401651 seconds at 3741.153 MFLOPS

This compares with the figures for running the same calculation natively, without the GPU:

7.060888678 seconds at 304.133 MFLOPS

That is, using the GPU gives a 12-fold increase in the performance of the matrix calculation.

Summary

This article has introduced some of the main features which set Julia apart from other similar programming languages. I began with a quick look at some Julia code by developing a trajectory used in estimating the price of a financial option, which was displayed graphically. Continuing with the graphics theme, we presented some code to generate a Julia set and to write it to disk as a PGM-formatted file.

The type system and the use of multiple dispatch were discussed next. This is a major difference for the user between Julia and object-oriented languages such as R and Python, and it is central to what gives Julia the power to generate fast machine-level code via LLVM compilation. We then turned to a series of topics from the Julia armory:

Working with Python: The ability to call C and Fortran seamlessly has been a central feature of Julia since its initial development, but the addition of interoperability with Python has opened up a new series of possibilities, leading to the development of the IJulia interface and its integration in the Jupyter project.
Simple statistics using DataFrames: As an example of working with data, we highlighted the Julia implementation of data frames by looking at Apple share prices and applying some simple statistics.

MySQL access using PyCall: This returns to Python interoperability to illustrate an unconventional method of interfacing with a MySQL database.

Scientific programming with Julia: The solution of ordinary differential equations is presented by looking at the Lotka-Volterra equations, but, unusually, we develop a solution for the three-species model.

Graphics with Gadfly: Julia has a wide range of options for developing data visualizations. Gadfly is one of the 'heavyweights'; a dataset containing UK GCSE results is extracted from the RDatasets.jl package, and the comparison between written and coursework results is displayed using Gadfly, categorized by gender.

Finally, we showcased the work of the Julia community groups by looking at an example from the JuliaGPU group, utilizing the OpenCL package to check the set of supported devices. We then selected an NVIDIA GeForce chip in order to execute a simple kernel which multiplied a pair of matrices via the GPU. This was timed against conventional evaluation in native Julia code in order to illustrate the speedup gained by parallelizing matrix operations.

Resources for Article:

Further resources on this subject:

Pricing the Double-no-touch option [article]
Basics of Programming in Julia [article]
SQL Server Analysis Services: Administering and Monitoring Analysis Services [article]

Getting Started with Nginx

Packt
20 Jul 2015
10 min read
In this article by Valery Kholodkov, the author of the book Nginx Essentials, we start digging a bit deeper into Nginx and quickly go through the most common distributions that contain prebuilt packages for it.

Installing Nginx

Before you can dive into specific features of Nginx, you need to learn how to install Nginx on your system. It is strongly recommended that you use prebuilt binary packages of Nginx if they are available in your distribution. This ensures the best integration of Nginx with your system and reuses the best practices incorporated into the package by the package maintainer. Prebuilt binary packages of Nginx automatically maintain dependencies for you, and package maintainers are usually fast to include security patches, so you don't get any complaints from security officers. In addition to that, the package usually provides a distribution-specific startup script, which doesn't come out of the box.

Refer to your distribution's package directory to find out if you have a prebuilt package for Nginx. Prebuilt Nginx packages can also be found under the download link on the official Nginx.org site.

Installing Nginx on Ubuntu

The Ubuntu Linux distribution contains a prebuilt package for Nginx. To install it, simply run the following command:

$ sudo apt-get install nginx

The preceding command will install all the required files on your system, including the logrotate script and service autorun scripts. The following list describes the Nginx installation layout that will be created after running this command, as well as the purpose of the selected files and folders:

Nginx configuration files: /etc/nginx
Main configuration file: /etc/nginx/nginx.conf
Virtual hosts configuration files (including the default one): /etc/nginx/sites-enabled
Custom configuration files: /etc/nginx/conf.d
Log files (both access and error log): /var/log/nginx
Temporary files: /var/lib/nginx
Default virtual host files: /usr/share/nginx/html

Default virtual host files will be placed into /usr/share/nginx/html. Please keep in mind that this directory is only for the default virtual host. For deploying your web application, use the folders recommended by the Filesystem Hierarchy Standard (FHS).

Now you can start the Nginx service with the following command:

$ sudo service nginx start

This will start Nginx on your system.

Alternatives

The prebuilt Nginx package on Ubuntu has a number of alternatives. Each of them allows you to fine-tune the Nginx installation for your system.

Installing Nginx on Red Hat Enterprise Linux or CentOS/Scientific Linux

Nginx is not provided out of the box in Red Hat Enterprise Linux or CentOS/Scientific Linux. Instead, we will use the Extra Packages for Enterprise Linux (EPEL) repository. EPEL is a repository that is maintained by Red Hat Enterprise Linux maintainers, but contains packages that are not a part of the main distribution for various reasons. You can read more about EPEL at https://fedoraproject.org/wiki/EPEL.

To enable EPEL, you need to download and install the repository configuration package:

For RHEL or CentOS/SL 7, use the following link: http://download.fedoraproject.org/pub/epel/7/x86_64/repoview/epel-release.html
For RHEL/CentOS/SL 6, use the following link: http://download.fedoraproject.org/pub/epel/6/i386/repoview/epel-release.html

If you have a newer/older RHEL version, please take a look at the How can I use these extra packages?
section in the original EPEL wiki at the following link: https://fedoraproject.org/wiki/EPEL

Now that you are ready to install Nginx, use the following command:

# yum install nginx

The preceding command will install all the required files on your system, including the logrotate script and service autorun scripts. The following list describes the Nginx installation layout that will be created after running this command and the purpose of the selected files and folders:

Nginx configuration files: /etc/nginx
Main configuration file: /etc/nginx/nginx.conf
Virtual hosts configuration files (including the default one): /etc/nginx/conf.d
Custom configuration files: /etc/nginx/conf.d
Log files (both access and error log): /var/log/nginx
Temporary files: /var/lib/nginx
Default virtual host files: /usr/share/nginx/html

Default virtual host files will be placed into /usr/share/nginx/html. Please keep in mind that this directory is only for the default virtual host. For deploying your web application, use the folders recommended by FHS.

By default, the Nginx service will not autostart on system startup, so let's enable it. Use the commands corresponding to your CentOS version:

Enable Nginx startup at system startup: chkconfig nginx on (CentOS 6), systemctl enable nginx (CentOS 7)
Manually start Nginx: service nginx start (CentOS 6), systemctl start nginx (CentOS 7)
Manually stop Nginx: service nginx stop (CentOS 6), systemctl stop nginx (CentOS 7)

Installing Nginx from source files

Traditionally, Nginx is distributed in source code form. In order to install Nginx from the source code, you need to download and compile the source files on your system. It is not recommended that you install Nginx from the source code. Do this only if you have a good reason, such as the following scenarios:

You are a software developer and want to debug or extend Nginx
You feel confident enough to maintain your own package
A package from your distribution is not good enough for you
You want to fine-tune your Nginx binary

In any of these cases, if you are planning to use this way of installing for real use, be prepared to sort out challenges such as dependency maintenance, distribution, and application of security patches.

In this section, we will be referring to the configuration script. The configuration script is a shell script, similar to one generated by autoconf, which is required to properly configure the Nginx source code before it can be compiled. This configuration script has nothing to do with the Nginx configuration file that we will be discussing later.

Downloading the Nginx source files

The primary source of Nginx for an English-speaking audience is Nginx.org. Open https://nginx.org/en/download.html in your browser and choose the most recent stable version of Nginx. Download the chosen archive into a directory of your choice (/usr/local or /usr/src are common directories to use for compiling software):

$ wget -q http://nginx.org/download/nginx-1.7.9.tar.gz

Extract the files from the downloaded archive and change to the directory corresponding to the chosen version of Nginx:

$ tar xf nginx-1.7.9.tar.gz
$ cd nginx-1.7.9

To configure the source code, we need to run the ./configure script included in the archive:

$ ./configure
checking for OS
+ Linux 3.13.0-36-generic i686
checking for C compiler ... found
+ using GNU C compiler
[...]

This script will produce a lot of output and, if successful, will generate a Makefile file for the source files.
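The configuration script accepts a large number of options that control where Nginx is installed and which modules are compiled in. The following invocation is only a sketch of a few commonly used flags; the paths shown are illustrative, not requirements, and you should run ./configure --help for the authoritative list supported by your version:

$ ./configure \
    --prefix=/usr/local/nginx \
    --conf-path=/usr/local/nginx/conf/nginx.conf \
    --with-http_ssl_module \
    --with-debug
    # add --without-http_rewrite_module to drop the rewrite module
    # (and its PCRE dependency) if you do not need it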
Notice that we showed the non-privileged user prompt $ instead of the root # in the previous command lines. You are encouraged to configure and compile software as a regular user and only install as root. This will prevent a lot of problems related to access restrictions while working with the source code.

Troubleshooting

The configuration step, although very simple, has a couple of common pitfalls. The basic installation of Nginx requires the OpenSSL and Perl-compatible Regex (PCRE) developer packages to be present in order to compile. If these packages are not properly installed, or not installed in locations where the Nginx configuration script is able to locate them, the configuration step might fail. Then, you have to choose between disabling the affected Nginx built-in modules (rewrite or SSL), installing the required packages properly, or pointing the Nginx configuration script to the actual location of those packages if they are installed.

Building Nginx

You can build the source files now using the following command:

$ make

You'll see a lot of output during compilation. If the build is successful, you can install the Nginx files on your system. Before doing that, make sure you escalate your privileges to the superuser so that the installation script can install the necessary files into the system areas and assign the necessary privileges. Once ready, run the make install command:

# make install

The preceding command will install all the necessary files on your system. The following list shows the locations of the Nginx files that will be created after running this command and their purposes:

Nginx configuration files: /usr/local/nginx/conf
Main configuration file: /usr/local/nginx/conf/nginx.conf
Log files (both access and error log): /usr/local/nginx/logs
Temporary files: /usr/local/nginx
Default virtual host files: /usr/local/nginx/html

Unlike installations from prebuilt packages, installation from source files does not set up dedicated folders for custom configuration files or virtual host configuration files, and the main configuration file is very simple in nature. You have to take care of this yourself.

Nginx must be ready to use now. To start Nginx, change your working directory to the /usr/local/nginx directory and run the following command:

# sbin/nginx

This will start Nginx on your system with the default configuration.

Troubleshooting

This stage works flawlessly most of the time. A problem can occur in the following situations:

You are using a nonstandard system configuration. Try to play with the options in the configuration script in order to overcome the problem.
You compiled in third-party modules and they are out of date or not maintained. Switch off the third-party modules that break your build, or contact the developer for assistance.

Copying the source code configuration from prebuilt packages

Occasionally, you might want to amend the Nginx binary from a prebuilt package with your own changes. In order to do that, you need to reproduce the build tree that was used to compile the Nginx binary for the prebuilt package. But how would you know what version of Nginx and what configuration script options were used at build time? Fortunately, Nginx has a solution for that. Just run the existing Nginx binary with the -V command-line option. Nginx will print the configure-time options.
This is shown in the following output:

$ /usr/sbin/nginx -V
nginx version: nginx/1.4.6 (Ubuntu)
built by gcc 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro' …

Using the output of the preceding command, reproduce the entire build environment, including the Nginx source tree of the corresponding version and the modules that were included in the build.

Here, the output of the Nginx -V command is trimmed for simplicity. In reality, you will be able to see and copy the entire command line that was passed to the configuration script at build time. You might even want to reproduce the version of the compiler used in order to produce a binary-identical Nginx executable file (we will discuss this later when discussing how to troubleshoot crashes).

Once this is done, run the ./configure script of your Nginx source tree with the options from the output of the -V option (with necessary alterations) and follow the remaining steps of the build procedure. You will get an altered Nginx executable in the objs/ folder of the source tree.

Summary

Here, you learned how to install Nginx from a number of available sources, the structure of an Nginx installation and the purpose of various files, the elements and structure of the Nginx configuration file, and how to create a minimal working Nginx configuration file. You also learned about some best practices for Nginx configuration.
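The minimal working configuration file mentioned in the summary is covered in the book rather than in this excerpt. Purely as an illustrative sketch (not taken from the book, and with example paths only), a configuration along these lines is enough to serve static files from the default html directory:

# minimal illustrative nginx.conf
worker_processes  1;

events {
    worker_connections  1024;
}

http {
    include       mime.types;
    default_type  application/octet-stream;

    server {
        listen      80;
        server_name localhost;

        location / {
            root  html;
            index index.html;
        }
    }
}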

Deployment and Maintenance

Packt
20 Jul 2015
21 min read
In this article by Sandro Pasquali, author of Deploying Node.js, we will learn about the following:

Automating the deployment of applications, including a look at the differences between continuous integration, delivery, and deployment
Using Git to track local changes and triggering deployment actions via webhooks when appropriate
Using Vagrant to synchronize your local development environment with a deployed production server
Provisioning a server with Ansible

Note that application deployment is a complex topic with many dimensions that are often considered within unique sets of needs. This article is intended as an introduction to some of the technologies and themes you will encounter. Also, note that scaling issues are part and parcel of deployment.

(For more resources related to this topic, see here.)

Using GitHub webhooks

At the most basic level, deployment involves automatically validating, preparing, and releasing new code into production environments. One of the simplest ways to set up a deployment strategy is to trigger releases whenever changes are committed to a Git repository through the use of webhooks. Paraphrasing the GitHub documentation, webhooks provide a way for notifications to be delivered to an external web server whenever certain actions occur on a repository.

In this section, we'll use GitHub webhooks to create a simple continuous deployment workflow, adding more realistic checks and balances. We'll build a local development environment that lets developers work with a clone of the production server code, make changes, and see the results of those changes immediately. As this local development build uses the same repository as the production build, the build process for a chosen environment is simple to configure, and multiple production and/or development boxes can be created with no special effort.

The first step is to create a GitHub (www.github.com) account if you don't already have one. Basic accounts are free and easy to set up. Now, let's look at how GitHub webhooks work.

Enabling webhooks

Create a new folder and insert the following package.json file:

{
  "name": "express-webhook",
  "main": "server.js",
  "dependencies": {
    "express": "~4.0.0",
    "body-parser": "^1.12.3"
  }
}

This ensures that Express 4.x is installed and includes the body-parser package, which is used to handle POST data. Next, create a basic server called server.js:

var express = require('express');
var app = express();
var bodyParser = require('body-parser');
var port = process.env.PORT || 8082;

app.use(bodyParser.json());

app.get('/', function(req, res) {
  res.send('Hello World!');
});

app.post('/webhook', function(req, res) {
  // We'll add this next
});

app.listen(port);
console.log('Express server listening on port ' + port);

Enter the folder you've created, and build and run the server with npm install; npm start. Visit localhost:8082/ and you should see "Hello World!" in your browser.

Whenever any file changes in a given repository, we want GitHub to push information about the change to /webhook. So, the first step is to create a GitHub repository for the Express server mentioned in the code. Go to your GitHub account and create a new repository with the name 'express-webhook'. The following screenshot shows this:

Once the repository is created, enter your local repository folder and run the following commands:

git init
git add .
git commit -m "first commit"
git remote add origin git@github.com:<your username>/express-webhook

You should now have a new GitHub repository and a local linked version.
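Before wiring up GitHub, you can confirm that the routes are reachable by hitting them yourself. The following quick check is my own illustration rather than part of the original workflow, and the JSON body is a made-up stand-in, not a real GitHub delivery:

$ curl http://localhost:8082/
Hello World!

$ curl -m 2 -X POST http://localhost:8082/webhook \
       -H "Content-Type: application/json" \
       -d '{"test": "payload"}'

Because the /webhook handler does not send a response yet, the POST will simply time out after the two seconds set by -m; that is expected until we flesh out the route shortly.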
The next step is to configure this repository to broadcast the push event on the repository. Navigate to the following URL: https://github.com/<your_username>/express-webhook/settings From here, navigate to Webhooks & Services | Add webhook (you may need to enter your password again). You should now see the following screen: This is where you set up webhooks. Note that the push event is already set as default, and, if asked, you'll want to disable SSL verification for now. GitHub needs a target URL to use POST on change events. If you have your local repository in a location that is already web accessible, enter that now, remembering to append the /webhook route, as in http://www.example.com/webhook. If you are building on a local machine or on another limited network, you'll need to create a secure tunnel that GitHub can use. A free service to do this can be found at http://localtunnel.me/. Follow the instructions on that page, and use the custom URL provided to configure your webhook. Other good forwarding services can be found at https://forwardhq.com/ and https://meetfinch.com/. Now that webhooks are enabled, the next step is to test the system by triggering a push event. Create a new file called readme.md (add whatever you'd like to it), save it, and then run the following commands: git add readme.mdgit commit -m "testing webhooks"git push origin master This will push changes to your GitHub repository. Return to the Webhooks & Services section for the express-webhook repository on GitHub. You should see something like this: This is a good thing! GitHub noticed your push and attempted to deliver information about the changes to the webhook endpoint you set, but the delivery failed as we haven't configured the /webhook route yet—that's to be expected. Inspect the failed delivery payload by clicking on the last attempt—you should see a large JSON file. In that payload, you'll find something like this: "committer": {"name": "Sandro Pasquali","email": "spasquali@gmail.com","username": "sandro-pasquali"},"added": ["readme.md"],"removed": [],"modified": [] It should now be clear what sort of information GitHub will pass along whenever a push event happens. You can now configure the /webhook route in the demonstration Express server to parse this data and do something with that information, such as sending an e-mail to an administrator. For example, use the following code: app.post('/webhook', function(req, res) {console.log(req.body);}); The next time your webhook fires, the entire JSON payload will be displayed. Let's take this to another level, breaking down the autopilot application to see how webhooks can be used to create a build/deploy system. Implementing a build/deploy system using webhooks To demonstrate how to build a webhook-powered deployment system, we're going to use a starter kit for application development. Go ahead and use fork on the repository at https://github.com/sandro-pasquali/autopilot.git. You now have a copy of the autopilot repository, which includes scaffolding for common Gulp tasks, tests, an Express server, and a deploy system that we're now going to explore. The autopilot application implements special features depending on whether you are running it in production or in development. While autopilot is a little too large and complex to fully document here, we're going to take a look at how major components of the system are designed and implemented so that you can build your own or augment existing systems. 
Here's what we will examine: How to create webhooks on GitHub programmatically How to catch and read webhook payloads How to use payload data to clone, test, and integrate changes How to use PM2 to safely manage and restart servers when code changes If you haven't already used fork on the autopilot repository, do that now. Clone the autopilot repository onto a server or someplace else where it is web-accessible. Follow the instructions on how to connect and push to the fork you've created on GitHub, and get familiar with how to pull and push changes, commit changes, and so on. PM2 delivers a basic deploy system that you might consider for your project (https://github.com/Unitech/PM2/blob/master/ADVANCED_README.md#deployment). Install the cloned autopilot repository with npm install; npm start. Once npm has installed dependencies, an interactive CLI application will lead you through the configuration process. Just hit the Enter key for all the questions, which will set defaults for a local development build (we'll build in production later). Once the configuration is complete, a new development server process controlled by PM2 will have been spawned. You'll see it listed in the PM2 manifest under autopilot-dev in the following screenshot: You will make changes in the /source directory of this development build. When you eventually have a production server in place, you will use git push on the local changes to push them to the autopilot repository on GitHub, triggering a webhook. GitHub will use POST on the information about the change to an Express route that we will define on our server, which will trigger the build process. The build runner will pull your changes from GitHub into a temporary directory, install, build, and test the changes, and if all is well, it will replace the relevant files in your deployed repository. At this point, PM2 will restart, and your changes will be immediately available. Schematically, the flow looks like this: To create webhooks on GitHub programmatically, you will need to create an access token. The following diagram explains the steps from A to B to C: We're going to use the Node library at https://github.com/mikedeboer/node-github to access GitHub. We'll use this package to create hooks on Github using the access token you've just created. Once you have an access token, creating a webhook is easy: var GitHubApi = require("github");github.authenticate({type: "oauth",token: <your token>});github.repos.createHook({"user": <your github username>,"repo": <github repo name>,"name": "web","secret": <any secret string>,"active": true,"events": ["push"],"config": {"url": "http://yourserver.com/git-webhook","content_type": "json"}}, function(err, resp) {...}); Autopilot performs this on startup, removing the need for you to manually create a hook. Now, we are listening for changes. As we saw previously, GitHub will deliver a payload indicating what has been added, what has been deleted, and what has changed. The next step for the autopilot system is to integrate these changes. It is important to remember that, when you use webhooks, you do not have control over how often GitHub will send changesets—if more than one person on your team can push, there is no predicting when those pushes will happen. The autopilot system uses Redis to manage a queue of requests, executing them in order. You will need to manage multiple changes in a way. For now, let's look at a straightforward way to build, test, and integrate changes. 
In your code bundle, visit autopilot/swanson/push.js. This is a process runner on which fork has been used by buildQueue.js in that same folder. The following information is passed to it: The URL of the GitHub repository that we will clone The directory to clone that repository into (<temp directory>/<commit hash>) The changeset The location of the production repository that will be changed Go ahead and read through the code. Using a few shell scripts, we will clone the changed repository and build it using the same commands you're used to—npm install, npm test, and so on. If the application builds without errors, we need only run through the changeset and replace the old files with the changed files. The final step is to restart our production server so that the changes reach our users. Here is where the real power of PM2 comes into play. When the autopilot system is run in production, PM2 creates a cluster of servers (similar to the Node cluster module). This is important as it allows us to restart the production server incrementally. As we restart one server node in the cluster with the newly pushed content, the other clusters continue to serve old content. This is essential to keeping a zero-downtime production running. Hopefully, the autopilot implementation will give you a few ideas on how to improve this process and customize it to your own needs. Synchronizing local and deployed builds One of the most important (and often difficult) parts of the deployment process is ensuring that the environment an application is being developed, built, and tested within perfectly simulates the environment that application will be deployed into. In this section, you'll learn how to emulate, or virtualize, the environment your deployed application will run within using Vagrant. After demonstrating how this setup can simplify your local development process, we'll use Ansible to provision a remote instance on DigitalOcean. Developing locally with Vagrant For a long while, developers would work directly on running servers or cobble together their own version of the production environment locally, often writing ad hoc scripts and tools to smoothen their development process. This is no longer necessary in a world of virtual machines. In this section, we will learn how to use Vagrant to emulate a production environment within your development environment, advantageously giving you a realistic box to work on testing code for production and isolating your development process from your local machine processes. By definition, Vagrant is used to create a virtual box emulating a production environment. So, we need to install Vagrant, a virtual machine, and a machine image. Finally, we'll need to write the configuration and provisioning scripts for our environment. Go to http://www.vagrantup.com/downloads and install the right Vagrant version for your box. Do the same with VirtualBox here at https://www.virtualbox.org/wiki/Downloads. You now need to add a box to run. For this example, we're going to use Centos 7.0, but you can choose whichever you'd prefer. Create a new folder for this project, enter it, and run the following command: vagrant box add chef/centos-7.0 Usefully, the creators of Vagrant, HashiCorp, provide a search service for Vagrant boxes at https://atlas.hashicorp.com/boxes/search. You will be prompted to choose your virtual environment provider—select virtualbox. All relevant files and machines will now be downloaded. Note that these boxes are very large and may take time to download. 
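Once the download completes, you can confirm that the box was registered locally. The vagrant box list command prints every box Vagrant knows about; the exact columns in the output vary between Vagrant releases, so treat the second line below as a rough indication only:

$ vagrant box list
chef/centos-7.0 (virtualbox)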
You'll now create a configuration file for Vagrant called Vagrantfile. As with npm, the init command quickly sets up a base file. Additionally, we'll need to inform Vagrant of the box we'll be using: vagrant init chef/centos-7.0 Vagrantfile is written in Ruby and defines the Vagrant environment. Open it up now and scan it. There is a lot of commentary, and it makes a useful read. Note the config.vm.box = "chef/centos-7.0" line, which was inserted during the initialization process. Now you can start Vagrant: vagrant up If everything went as expected, your box has been booted within Virtualbox. To confirm that your box is running, use the following code: vagrant ssh If you see a prompt, you've just set up a virtual machine. You'll see that you are in the typical home directory of a CentOS environment. To destroy your box, run vagrant destroy. This deletes the virtual machine by cleaning up captured resources. However, the next vagrant up command will need to do a lot of work to rebuild. If you simply want to shut down your machine, use vagrant halt. Vagrant is useful as a virtualized, production-like environment for developers to work within. To that end, it must be configured to emulate a production environment. In other words, your box must be provisioned by telling Vagrant how it should be configured and what software should be installed whenever vagrant up is run. One strategy for provisioning is to create a shell script that configures our server directly and point the Vagrant provisioning process to that script. Add the following line to Vagrantfile: config.vm.provision "shell", path: "provision.sh" Now, create that file with the following contents in the folder hosting Vagrantfile: # install nvmcurl https://raw.githubusercontent.com/creationix/nvm/v0.24.1/install.sh | bash# restart your shell with nvm enabledsource ~/.bashrc# install the latest Node.jsnvm install 0.12# ensure server default versionnvm alias default 0.12 Destroy any running Vagrant boxes. Run Vagrant again, and you will notice in the output the execution of the commands in our provisioning shell script. When this has been completed, enter your Vagrant box as the root (Vagrant boxes are automatically assigned the root password "vagrant"): vagrant sshsu You will see that Node v0.12.x is installed: node -v It's standard to allow password-less sudo for the Vagrant user. Run visudo and add the following line to the sudoers configuration file: vagrant ALL=(ALL) NOPASSWD: ALL Typically, when you are developing applications, you'll be modifying files in a project directory. You might bind a directory in your Vagrant box to a local code editor and develop in that way. Vagrant offers a simpler solution. Within your VM, there is a /vagrant folder that maps to the folder that Vagrantfile exists within, and these two folders are automatically synced. So, if you add the server.js file to the right folder on your local machine, that file will also show up in your VM's /vagrant folder. Go ahead and create a new test file either in your local folder or in your VM's /vagrant folder. You'll see that file synchronized to both locations regardless of where it was originally created. Let's clone our express-webhook repository from earlier in this article into our Vagrant box. 
Add the following lines to provision.sh: # install various packages, particularly for gityum groupinstall "Development Tools" -yyum install gettext-devel openssl-devel perl-CPAN perl-devel zlib-devel-yyum install git -y# Move to shared folder, clone and start servercd /vagrantgit clone https://github.com/sandro-pasquali/express-webhookcd express-webhooknpm i; npm start Add the following to Vagrantfile, which will map port 8082 on the Vagrant box (a guest port representing the port our hosted application listens on) to port 8000 on our host machine: config.vm.network "forwarded_port", guest: 8082, host: 8000 Now, we need to restart the Vagrant box (loading this new configuration) and re-provision it: vagrant reloadvagrant provision This will take a while as yum installs various dependencies. When provisioning is complete, you should see this as the last line: ==> default: Express server listening on port 8082 Remembering that we bound the guest port 8082 to the host port 8000, go to your browser and navigate to localhost:8000. You should see "Hello World!" displayed. Also note that in our provisioning script, we cloned to the (shared) /vagrant folder. This means the clone of express-webhook should be visible in the current folder, which will allow you to work on the more easily accessible codebase, knowing it will be automatically synchronized with the version on your Vagrant box. Provisioning with Ansible Configuring your machines by hand, as we've done previously, doesn't scale well. For one, it can be overly difficult to set and manage environment variables. Also, writing your own provisioning scripts is error-prone and no longer necessary given the existence of provisioning tools, such as Ansible. With Ansible, we can define server environments using an organized syntax rather than ad hoc scripts, making it easier to distribute and modify configurations. Let's recreate the provision.sh script developed earlier using Ansible playbooks: Playbooks are Ansible's configuration, deployment, and orchestration language. They can describe a policy you want your remote systems to enforce or a set of steps in a general IT process. Playbooks are expressed in the YAML format (a human-readable data serialization language). To start with, we're going to change Vagrantfile's provisioner to Ansible. First, create the following subdirectories in your Vagrant folder: provisioningcommontasks These will be explained as we proceed through the Ansible setup. Next, create the following configuration file and name it ansible.cfg: [defaults]roles_path = provisioninglog_path = ./ansible.log This indicates that Ansible roles can be found in the /provisioning folder, and that we want to keep a provisioning log in ansible.log. Roles are used to organize tasks and other functions into reusable files. These will be explained shortly. Modify the config.vm.provision definition to the following: config.vm.provision "ansible" do |ansible|ansible.playbook = "provisioning/server.yml"ansible.verbose = "vvvv"end This tells Vagrant to defer to Ansible for provisioning instructions, and that we want the provisioning process to be verbose—we want to get feedback when the provisioning step is running. Also, we can see that the playbook definition, provisioning/server.yml, is expected to exist. 
Create that file now:

---
- hosts: all
  sudo: yes
  roles:
    - common
  vars:
    env:
      user: 'vagrant'
    nvm:
      version: '0.24.1'
      node_version: '0.12'
    build:
      repo_path: 'https://github.com/sandro-pasquali'
      repo_name: 'express-webhook'

Playbooks can contain very complex rules. This simple file indicates that we are going to provision all available hosts using a single role called common. In more complex deployments, an inventory of IP addresses could be set under hosts, but, here, we just want to use a general setting for our one server. Additionally, the provisioning step will be provided with certain environment variables following the forms env.user, nvm.node_version, and so on. These variables will come into play when we define the common role, which will be to provision our Vagrant server with the programs necessary to build, clone, and deploy express-webhook. Finally, we assert that Ansible should run as an administrator (sudo) by default; this is necessary for the yum package manager on CentOS.

We're now ready to define the common role. With Ansible, folder structures are important and are implied by the playbook. In our case, Ansible expects the role location (./provisioning, as defined in ansible.cfg) to contain the common folder (reflecting the common role given in the playbook), which itself must contain a tasks folder containing a main.yml file. These last two naming conventions are specific and required.

The final step is creating the main.yml file in provisioning/common/tasks. First, we replicate the yum package loaders (see the file in your code bundle for the full list):

---
- name: Install necessary OS programs
  yum: name={{ item }} state=installed
  with_items:
    - autoconf
    - automake
    ...
    - git

Here, we see a few benefits of Ansible. A human-readable description of yum tasks is provided to a looping structure that will install every item in the list. Next, we run the nvm installer, which simply executes the auto-installer for nvm:

- name: Install nvm
  sudo: no
  shell: "curl https://raw.githubusercontent.com/creationix/nvm/v{{ nvm.version }}/install.sh | bash"

Note that, here, we're overriding the playbook's sudo setting. This can be done on a per-task basis, which gives us the freedom to move between different permission levels while provisioning. We are also able to execute shell commands while at the same time interpolating variables:

- name: Update .bashrc
  sudo: no
  lineinfile: >
    dest="/home/{{ env.user }}/.bashrc"
    line="source /home/{{ env.user }}/.nvm/nvm.sh"

Ansible provides extremely useful tools for file manipulation, and we will see here a very common one: updating the .bashrc file for a user. The lineinfile directive makes the addition of aliases, among other things, straightforward.

The remainder of the commands follow a similar pattern to implement, in a structured way, the provisioning directives we need for our server. All the files you will need are in your code bundle in the vagrant/with_ansible folder. Once you have them installed, run vagrant up to see Ansible in action.

One of the strengths of Ansible is the way it handles contexts. When you start your Vagrant build, you will notice that Ansible gathers facts, as shown in the following screenshot:
In this way, editing playbooks and reprovisioning does not consume time redundantly changing what has already been changed. Ansible is a powerful tool that can be used for provisioning and much more complex deployment tasks. One of its great strengths is that it can run remotely—unlike most other tools, Ansible uses SSH to connect to remote servers and run operations. There is no need to install it on your production boxes. You are encouraged to browse the Ansible documentation at http://docs.ansible.com/index.html to learn more. Summary In this article, you learned how to deploy a local build into a production-ready environment and the powerful Git webhook tool was demonstrated as a way of creating a continuous integration environment. Resources for Article: Further resources on this subject: Node.js Fundamentals [Article] API with MongoDB and Node.js [Article] So, what is Node.js? [Article]

Sprites, Camera, Actions!

Packt
20 Jul 2015
20 min read
In this article by Stephen Haney, author of the book Game Development with Swift, we will focus on building great gameplay experiences while SpriteKit performs the mechanical work of the game loop. To draw an item to the screen, we create a new instance of a SpriteKit node. These nodes are simple; we attach a child node to our scene, or to existing nodes, for each item we want to draw. Sprites, particle emitters, and text labels are all considered nodes in SpriteKit.

The topics in this article include:

Drawing your first sprite
Animation: movement, scaling, and rotation
Working with textures
Organizing art into texture atlases

For this article, you need to first install Xcode, and then create a project. The project automatically creates the GameScene.swift file as the default file to store the scene of your new game.

(For more resources related to this topic, see here.)

Drawing your first sprite

It is time to write some game code – fantastic! Open your GameScene.swift file and find the didMoveToView function. Recall that this function fires every time the game switches to this scene. We will use this function to get familiar with the SKSpriteNode class. You will use SKSpriteNode extensively in your game, whenever you want to add a new 2D graphic entity.

The term sprite refers to a 2D graphic or animation that moves around the screen independently from the background. Over time, the term has developed to refer to any game object on the screen in a 2D game. We will create and draw your first sprite in this article: a happy little bee.

Building a SKSpriteNode class

Let's begin by drawing a blue square to the screen. The SKSpriteNode class can draw both texture graphics and solid blocks of color. It is often helpful to prototype your new game ideas with blocks of color before you spend time with artwork. To draw the blue square, add an instance of SKSpriteNode to the game:

override func didMoveToView(view: SKView) {
    // Instantiate a constant, mySprite, instance of SKSpriteNode
    // The SKSpriteNode constructor can set color and size
    // Note: UIColor is a UIKit class with built-in color presets
    // Note: CGSize is a type we use to set node sizes
    let mySprite = SKSpriteNode(color: UIColor.blueColor(), size: CGSize(width: 50, height: 50))
    // Assign our sprite a position in points, relative to its
    // parent node (in this case, the scene)
    mySprite.position = CGPoint(x: 300, y: 300)
    // Finally, we need to add our sprite node into the node tree.
    // Call the SKScene's addChild function to add the node
    // Note: In Swift, 'self' is an automatic property
    // on any type instance, exactly equal to the instance itself
    // So in this instance, it refers to the GameScene instance
    self.addChild(mySprite)
}

Go ahead and run the project. You should see a similar small blue square appear in your simulator:

Swift allows you to define variables as constants, which can be assigned a value only once. For best performance, use let to declare constants whenever possible. Declare your variables with var when you need to alter the value later in your code.

Adding animation to your Toolkit

Before we dive back in to sprite theory, we should have some fun with our blue square. SpriteKit uses action objects to move sprites around the screen. Consider this example: if our goal is to move the square across the screen, we must first create a new action object to describe the animation. Then, we instruct our sprite node to execute the action. I will illustrate this concept with many examples in the article.
For now, add this code in the didMoveToView function, below the self.addChild(mySprite) line:

// Create a new constant for our action instance
// Use the moveTo action to provide a goal position for a node
// SpriteKit will tween to the new position over the course of the
// duration, in this case 5 seconds
let demoAction = SKAction.moveTo(CGPoint(x: 100, y: 100), duration: 5)
// Tell our square node to execute the action!
mySprite.runAction(demoAction)

Run the project. You will see our blue square slide across the screen towards the (100,100) position. This action is re-usable; any node in your scene can execute this action to move to the (100,100) position. As you can see, SpriteKit does a lot of the heavy lifting for us when we need to animate node properties.

Inbetweening, or tweening, uses the engine to animate smoothly between a start frame and an end frame. Our moveTo animation is a tween; we provide the start frame (the sprite's original position) and the end frame (the new destination position). SpriteKit generates the smooth transition between our values.

Let's try some other actions. The SKAction.moveTo function is only one of many options. Try replacing the demoAction line with this code:

let demoAction = SKAction.scaleTo(4, duration: 5)

Run the project. You will see our blue square grow to four times its original size.

Sequencing multiple animations

We can execute actions together simultaneously or one after the other with action groups and sequences. For instance, we can easily scale our sprite larger and spin it at the same time. Delete all of our action code so far and replace it with this code:

// Scale up to 4x initial scale
let demoAction1 = SKAction.scaleTo(4, duration: 5)
// Rotate 5 radians
let demoAction2 = SKAction.rotateByAngle(5, duration: 5)
// Group the actions
let actionGroup = SKAction.group([demoAction1, demoAction2])
// Execute the group!
mySprite.runAction(actionGroup)

When you run the project, you will see a spinning, growing square. Terrific! If you want to run these actions in sequence (rather than at the same time), change SKAction.group to SKAction.sequence:

// Group the actions into a sequence
let actionSequence = SKAction.sequence([demoAction1, demoAction2])
// Execute the sequence!
mySprite.runAction(actionSequence)

Run the code and watch as your square first grows and then spins. Good. You are not limited to two actions; we can group or sequence as many actions together as we need. We have only used a few actions so far; feel free to explore the SKAction class and try out different action combinations before moving on.

Recapping your first sprite

Congratulations, you have learned to draw a non-textured sprite and animate it with SpriteKit actions. Next, we will explore some important positioning concepts, and then add game art to our sprites. Before you move on, make sure your didMoveToView function matches with mine, and your sequenced animation is firing properly.
Here is my code up to this point: override func didMoveToView(view: SKView) {// Instantiate a constant, mySprite, instance of SKSpriteNodelet mySprite = SKSpriteNode(color: UIColor.blueColor(), size:CGSize(width: 50, height: 50))// Assign our sprite a positionmySprite.position = CGPoint(x: 300, y: 300)// Add our sprite node into the node treeself.addChild(mySprite)// Scale up to 4x initial scalelet demoAction1 = SKAction.scaleTo(CGFloat(4), duration: 2)// Rotate 5 radianslet demoAction2 = SKAction.rotateByAngle(5, duration: 2)// Group the actions into a sequencelet actionSequence = SKAction.sequence([demoAction1,demoAction2])// Execute the sequence!mySprite.runAction(actionSequence)} The story on positioning SpriteKit uses a grid of points to position nodes. In this grid, the bottom left corner of the scene is (0,0), with a positive X-axis to the right and a positive Y-axis to the top. Similarly, on the individual sprite level, (0,0) refers to the bottom left corner of the sprite, while (1,1) refers to the top right corner. Alignment with anchor points Each sprite has an anchorPoint property, or an origin. The anchorPoint property allows you to choose which part of the sprite aligns to the sprite's overall position. The default anchor point is (0.5,0.5), so a new SKSpriteNode centers perfectly on its position. To illustrate this, let us examine the blue square sprite we just drew on the screen. Our sprite is 50 pixels wide and 50 pixels tall, and its position is (300,300). Since we have not modified the anchorPoint property, its anchor point is (0.5,0.5). This means the sprite will be perfectly centered over the (300,300) position on the scene's grid. Our sprite's left edge begins at 275 and the right edge terminates at 325. Likewise, the bottom starts at 275 and the top ends at 325. The following diagram illustrates our block's position on the grid: Why do we prefer centered sprites by default? You may think it simpler to position elements by their bottom left corner with an anchorPoint property setting of (0,0). However, the centered behavior benefits us when we scale or rotate sprites: When we scale a sprite with an anchorPoint property of (0,0) it will only expand up the y-axis and out the x-axis. Rotation actions will swing the sprite in wide circles around its bottom left corner. A centered sprite, with the default anchorPoint property of (0.5, 0.5), will expand or contract equally in all directions when scaled and will spin in place when rotated, which is usually the desired effect. There are some cases when you will want to change an anchor point. For instance, if you are drawing a rocket ship, you may want the ship to rotate around the front nose of its cone, rather than its center. Adding textures and game art You may want to take a screenshot of your blue box for your own enjoyment later. I absolutely love reminiscing over old screenshots of my finished games when they were nothing more than simple colored blocks sliding around the screen. Now it is time to move past that stage and attach some fun artwork to our sprite. Downloading the free assets I am providing a downloadable pack for all of the art assets. I recommend you use these assets so you will have everything you need for our demo game. Alternatively, you are certainly free to create your own art for your game if you prefer. These assets come from an outstanding public domain asset pack from Kenney Game Studio. I am providing a small subset of the asset pack that we will use in our game. 
Download the game art from this URL: http://www.thinkingswiftly.com/game-development-with-swift/assets More exceptional art If you like the art, you can download over 16,000 game assets in the same style for a small donation at http://kenney.itch.io/kenney-donation. I do not have an affiliation with Kenney; I just find it admirable that he has released so much public domain artwork for indie game developers. As CC0 assets, you can copy, modify, and distribute the art, even for commercial purposes, all without asking permission. You can read the full license here: https://creativecommons.org/publicdomain/zero/1.0/ Drawing your first textured sprite Let us use some of the graphics you just downloaded. We will start by creating a bee sprite. We will add the bee texture to our project, load the image onto a SKSpriteNode class, and then size the node for optimum sharpness on retina screens. Add the bee image to your project We need to add the image files to our Xcode project before we can use them in the game. Once we add the images, we can reference them by name in our code; SpriteKit is smart enough to find and implement the graphics. Follow these steps to add the bee image to the project: Right-click on your project in the project navigator and click on Add Files to "Pierre Penguin Escapes the Antarctic" (or the name of your game). Refer to this screenshot to find the correct menu item: Browse to the asset pack you downloaded and locate the bee.png image inside the Enemies folder. Check Copy items if needed, then click Add. You should now see bee.png in your project navigator. Loading images with SKSpriteNode It is quite easy to draw images to the screen with SKSpriteNode. Start by clearing out all of the code we wrote for the blue square inside the didMoveToView function in GameScene.swift. Replace didMoveToView with this code: override func didMoveToView(view: SKView) {// set the scene's background to a nice sky blue// Note: UIColor uses a scale from 0 to 1 for its colorsself.backgroundColor = UIColor(red: 0.4, green: 0.6, blue:0.95, alpha: 1.0);// create our bee sprite nodelet bee = SKSpriteNode(imageNamed: "bee.png")// size our bee nodebee.size = CGSize(width: 100, height: 100)// position our bee nodebee.position = CGPoint(x: 250, y: 250)// attach our bee to the scene's node treeself.addChild(bee)} Run the project and witness our glorious bee – great work! Designing for retina You may notice that our bee image is quite blurry. To take advantage of retina screens, assets need to be twice the pixel dimensions of their node's size property (for most retina screens), or three times the node size for the iPhone 6 Plus. Ignore the height for a moment; our bee node is 100 points wide but the PNG file is only 56 pixels wide. The PNG file needs to be 300 pixels wide to look sharp on the iPhone 6 Plus, or 200 pixels wide to look sharp on 2x retina devices. SpriteKit will automatically resize textures to fit their nodes, so one approach is to create a giant texture at the highest retina resolution (three times the node size) and let SpriteKit resize the texture down for lower density screens. However, there is a considerable performance penalty, and older devices can even run out of memory and crash from the huge textures. The ideal asset approach These double- and triple-sized retina assets can be confusing to new iOS developers. To solve this issue, Xcode normally lets you provide three image files for each texture. For example, our bee node is currently 100 points wide and 100 points tall. 
In a perfect world, you would provide the following images to Xcode: Bee.png (100 pixels by 100 pixels) Bee@2x.png (200 pixels by 200 pixels) Bee@3x.png (300 pixels by 300 pixels) However, there is currently an issue that prevents 3x textures from working correctly with texture atlases. Texture atlases group textures together and increase rendering performance dramatically (we will implement our first texture atlas in the next section). I hope that Apple will upgrade texture atlases to support 3x textures in Swift 2. For now, we need to choose between texture atlases and 3x assets for the iPhone 6 Plus. My solution for now In my opinion, texture atlases and their performance benefits are key features of SpriteKit. I will continue using texture atlases, thus serving 2x images to the iPhone 6 Plus (which still looks fairly sharp). This means that we will not be using any 3x assets. Further simplifying matters, Swift only runs on iOS7 and higher. The only non-retina devices that run iOS7 are the aging iPad 2 and iPad mini 1st generation. If these older devices are important for your finished games, you should create both standard and 2x images for your games. Otherwise, you can safely ignore non-retina assets with Swift. This means that we will only use double-sized images. The images in the downloadable asset bundle forgo the 2x suffix, since we are only using this size. Once Apple updates texture atlases to use 3x assets, I recommend that you switch to the methodology outlined in The ideal asset approach section for your games. Hands-on with retina in SpriteKit Our bee image illustrates how this all works: Because we set an explicit node size, SpriteKit automatically resizes the bee texture to fit our 100-point wide, 100-point tall sized node. This automatic size-to-fit is very handy, but notice that we have actually slightly distorted the aspect ratio of the image. If we do not set an explicit size, SpriteKit sizes the node (in points) to the match texture's dimensions (in pixels). Go ahead and delete the line that sets the size for our bee node and re-run the project. SpriteKit maintains the aspect ratio automatically, but the smaller bee is still fuzzy. That is because our new node is 56 points by 48 points, matching our PNG file's pixel dimensions of 56 pixels by 48 pixels . . . yet our PNG file needs to be 112 pixels by 96 pixels for a sharp image at this node size on 2x retina screens. We want a smaller bee anyway, so we will resize the node rather than generate larger artwork in this case. Set the size property of your bee node, in points, to half the size of the texture's pixel resolution: // size our bee in points:bee.size = CGSize(width: 28, height: 24) Run the project and you will see a smaller, crystal sharp bee, as in this screenshot: Great! The important concept here is to design your art files at twice the pixel resolution of your node point sizes to take advantage of 2x retina screens, or three times the point sizes to take full advantage of the iPhone 6 Plus. Now we will look at organizing and animating multiple sprite frames. Organizing your assets We will quickly overrun our project navigator with image files if we add all our textures as we did with our bee. Luckily, Xcode provides several solutions. Exploring Images.xcassets We can store images in an .xcassets file and refer to them easily from our code. This is a good place for our background images: Open Images.xcassets from your project navigator. 
We do not need to add any images here now but, in the future, you can drag image files directly into the image list, or right-click, then Import. Notice that the SpriteKit demo's spaceship image is stored here. We do not need it anymore, so we can right-click on it and choose Removed Selected Items to delete it. Collecting art into texture atlases We will use texture atlases for most of our in-game art. Texture atlases organize assets by collecting related artwork together. They also increase performance by optimizing all of the images inside each atlas as if they were one texture. SpriteKit only needs one draw call to render multiple images out of the same texture atlas. Plus, they are very easy to use! Follow these steps to build your bee texture atlas: We need to remove our old bee texture. Right-click on bee.png in the project navigator and choose Delete, then Move to Trash. Using Finder, browse to the asset pack you downloaded and locate the Enemies folder. Create a new folder inside Enemies and name it bee.atlas. Locate the bee.png and bee_fly.png images inside Enemies and copy them into your new bee.atlas folder. You should now have a folder named bee.atlas containing the two bee PNG files. This is all you need to do to create a new texture atlas – simply place your related images into a new folder with the .atlas suffix. Add the atlas to your project. In Xcode, right-click on the project folder in the project navigator and click Add Files…, as we did earlier for our single bee texture. Find the bee.atlas folder and select the folder itself. Check Copy items if needed, then click Add. The texture atlas will appear in the project navigator. Good work; we organized our bee assets into one collection and Xcode will automatically create the performance optimizations mentioned earlier. Updating our bee node to use the texture atlas We can actually run our project right now and see the same bee as before. Our old bee texture was bee.png, and a new bee.png exists in the texture atlas. Though we deleted the standalone bee.png, SpriteKit is smart enough to find the new bee.png in the texture atlas. We should make sure our texture atlas is working, and that we successfully deleted the old individual bee.png. In GameScene.swift, change our SKSpriteNode instantiation line to use the new bee_fly.png graphic in the texture atlas: // create our bee sprite// notice the new image name: bee_fly.pnglet bee = SKSpriteNode(imageNamed: "bee_fly.png") Run the project again. You should see a different bee image, its wings held lower than before. This is the second frame of the bee animation. Next, we will learn to animate between the two frames to create an animated sprite. Iterating through texture atlas frames We need to study one more texture atlas technique: we can quickly flip through multiple sprite frames to make our bee come alive with motion. We now have two frames of our bee in flight; it should appear to hover in place if we switch back and forth between these frames. Our node will run a new SKAction to animate between the two frames. 
Update your didMoveToView function to match mine (I removed some older comments to save space): override func didMoveToView(view: SKView) {self.backgroundColor = UIColor(red: 0.4, green: 0.6, blue:0.95, alpha: 1.0)// create our bee sprite// Note: Remove all prior arguments from this line:let bee = SKSpriteNode()bee.position = CGPoint(x: 250, y: 250)bee.size = CGSize(width: 28, height: 24)self.addChild(bee)// Find our new bee texture atlaslet beeAtlas = SKTextureAtlas(named:"bee.atlas")// Grab the two bee frames from the texture atlas in an array// Note: Check out the syntax explicitly declaring beeFrames// as an array of SKTextures. This is not strictly necessary,// but it makes the intent of the code more readable, so I// chose to include the explicit type declaration here:let beeFrames:[SKTexture] = [beeAtlas.textureNamed("bee.png"),beeAtlas.textureNamed("bee_fly.png")]// Create a new SKAction to animate between the frames oncelet flyAction = SKAction.animateWithTextures(beeFrames,timePerFrame: 0.14)// Create an SKAction to run the flyAction repeatedlylet beeAction = SKAction.repeatActionForever(flyAction)// Instruct our bee to run the final repeat action:bee.runAction(beeAction)} Run the project. You will see our bee flap its wings back and forth – cool! You have learned the basics of sprite animation with texture atlases. We will create increasingly complicated animations using this same technique later also. For now, pat yourself on the back. The result may seem simple, but you have unlocked a major building block towards your first SpriteKit game! Putting it all together First, we learned how to use actions to move, scale, and rotate our sprites. Then, we explored animating through multiple frames, bringing our sprite to life. Let us now combine these techniques to fly our bee back and forth across the screen, flipping the texture at each turn. Add this code at the bottom of the didMoveToView function, beneath the bee.runAction(beeAction) line: // Set up new actions to move our bee back and forth:let pathLeft = SKAction.moveByX(-200, y: -10, duration: 2)let pathRight = SKAction.moveByX(200, y: 10, duration: 2)// These two scaleXTo actions flip the texture back and forth// We will use these to turn the bee to face left and rightlet flipTextureNegative = SKAction.scaleXTo(-1, duration: 0)let flipTexturePositive = SKAction.scaleXTo(1, duration: 0)// Combine actions into a cohesive flight sequence for our beelet flightOfTheBee = SKAction.sequence([pathLeft,flipTextureNegative, pathRight, flipTexturePositive])// Last, create a looping action that will repeat foreverlet neverEndingFlight =SKAction.repeatActionForever(flightOfTheBee)// Tell our bee to run the flight path, and away it goes!bee.runAction(neverEndingFlight) Run the project. You will see the bee flying back and forth, flapping its wings. You have officially learned the fundamentals of animation in SpriteKit! We will build on this knowledge to create a rich, animated game world for our players. Summary You have gained foundational knowledge of sprites, nodes, and actions in SpriteKit and already taken huge strides towards your first game with Swift. You configured your project for landscape orientation, drew your first sprite, and then made it move, spin, and scale. You added a bee texture to your sprite, created an image atlas, and animated through the frames of flight. Terrific work! 
Resources for Article:
Further resources on this subject:
Network Development with Swift [Article]
Installing OpenStack Swift [Article]
Flappy Swift [Article]

Writing to Cassandra from HDFS using a Hadoop Map Reduce Job

Manu Mukerji
17 Jul 2015
5 min read
In this post I am going to walk through how to setup a Map Reduce Job that lets you write to Cassandra. Use cases covered here will include streaming analytics into Cassandra. I am assuming you have a Cassandra cluster and Hadoop cluster available before we start, even single instances or localhost will suffice. The code used for this example is available at https://github.com/manum/mr-cassandra. Let’s create the Cassandra Keyspace and Table we are going to use. You can run the following in cqlsh (the command line utility that lets you talk to Cassandra). The table keytable only has one column in it called key; it is where we will store the data. CREATE KEYSPACE keytest WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'datacenter1' : 3 }; CREATE TABLE keytable ( key varchar, PRIMARY KEY (key) ); Here is what it will look like after it has run: cqlsh> USE keytest; cqlsh:keytest> select * from keytable; key ---------- test1234 (1 rows) We can start by looking at CassandraHelper.java and CassandraTester.java. CassandraHelper Methods: getSession(): retrieves the current session object so that no additional ones are created public Session getSession() { LOG.info("Starting getSession()"); if (this.session == null && (this.cluster == null || this.cluster.isClosed())) { LOG.info("Cluster not started or closed"); } else if (this.session.isClosed()) { LOG.info("session is closed. Creating a session"); this.session = this.cluster.connect(); } return this.session; } createConnection(String): pass the host for the Cassandra server public void createConnection(String node) { this.cluster = Cluster.builder().addContactPoint(node).build(); Metadata metadata = cluster.getMetadata(); System.out.printf("Connected to cluster: %sn",metadata.getClusterName()); for ( Host host : metadata.getAllHosts() ) { System.out.printf("Datatacenter: %s; Host: %s; Rack: %sn", host.getDatacenter(), host.getAddress(), host.getRack()); } this.session = cluster.connect(); this.prepareQueries(); } closeConnection(): closes the connection after everything is completed. public void closeConnection() { cluster.close(); } prepareQueries(): This method prepares queries that are optimized on the server side. It is recommended to use prepared queries in cases where you are running the same query often or where the query does not change but the data might, i.e. when doing several inserts. private void prepareQueries() { LOG.info("Starting prepareQueries()"); this.preparedStatement = this.session.prepare(this.query); } addKey(String): Method to add data to the cluster, it also has try catch blocks to catch exceptions and tell you what is occurring. 
public void addKey(String key) { Session session = this.getSession(); if(key.length()>0) { try { session.execute(this.preparedStatement.bind(key) ); //session.executeAsync(this.preparedStatement.bind(key)); } catch (NoHostAvailableException e) { System.out.printf("No host in the %s cluster can be contacted to execute the query.n", session.getCluster()); Session.State st = session.getState(); for ( Host host : st.getConnectedHosts() ) { System.out.println("In flight queries::"+st.getInFlightQueries(host)); System.out.println("open connections::"+st.getOpenConnections(host)); } } catch (QueryExecutionException e) { System.out.println("An exception was thrown by Cassandra because it cannot " + "successfully execute the query with the specified consistency level."); } catch (IllegalStateException e) { System.out.println("The BoundStatement is not ready."); } } } CassandraTester: This class has a void main method in which you need to provide the host you want to connect to and it will result in writing the value "test1234" into Cassandra. MapReduceExample.java is the interesting file here. It has a Mapper Class, Reducer Class and a main method to initialize the job. Under the Mapper you will find setup() and cleanup() methods - called automatically by the Map Reduce framework for setup and cleanup operations - which you will use to connect to Cassandra and for cleaning up the connection afterwards. I modified the standard word count example for this so the program now counts lines instead and will write them all to Cassandra. The output of the reducer is basically lines and count. To run this example here is what you need to do: Clone the repo from https://github.com/manum/mr-cassandra Run mvn install to create a jar in the target/ folder scp the jar to your Hadoop cluster Copy over the test input (For this test I used the entire works of Shakespeare all-shakespeare.txt in git) To run the jar use the following command hadoop jar mr_cassandra-0.0.1-SNAPSHOT-jar-with-dependencies.jar com.example.com.mr_cassandra.MapReduceExample /user/ubuntu/all-shakespeare.txt /user/ubuntu/output/ If you run the above steps, it should kick off the job. After the job is complete go to cqlsh and run select * from keytable limit 10; cqlsh:keytest> select * from keytable limit 10; key ---------------------------------------------------------------- REGANtGood sir, no more; these are unsightly tricks: KINGtWe lost a jewel of her; and our esteem ROSALINDtAy, but when? tNow leaves him. tThy brother by decree is banished: DUCHESS OF YORKtI had a Richard too, and thou didst kill him; JULIETtWho is't that calls? is it my lady mother? ARTHURtO, save me, Hubert, save me! my eyes are out tFull of high feeding, madly hath broke loose tSwift-winged with desire to get a grave, (10 rows) cqlsh:keytest> About the author Manu Mukerji has a background in cloud computing and big data, handling billions of transactions per day in real time. He enjoys building and architecting scalable, highly available data solutions, and has extensive experience working in online advertising and social media. Twitter: @next2manu LinkedIn: https://www.linkedin.com/in/manumukerji/

Getting Started with Apache Spark

Packt
17 Jul 2015
7 min read
In this article by Rishi Yadav, the author of Spark Cookbook, we will cover the following recipes: Installing Spark from binaries Building the Spark source code with Maven (For more resources related to this topic, see here.) Introduction Apache Spark is a general-purpose cluster computing system to process big data workloads. What sets Spark apart from its predecessors, such as MapReduce, is its speed, ease-of-use, and sophisticated analytics. Apache Spark was originally developed at AMPLab, UC Berkeley, in 2009. It was made open source in 2010 under the BSD license and switched to the Apache 2.0 license in 2013. Toward the later part of 2013, the creators of Spark founded Databricks to focus on Spark's development and future releases. Talking about speed, Spark can achieve sub-second latency on big data workloads. To achieve such low latency, Spark makes use of the memory for storage. In MapReduce, memory is primarily used for actual computation. Spark uses memory both to compute and store objects. Spark also provides a unified runtime connecting to various big data storage sources, such as HDFS, Cassandra, HBase, and S3. It also provides a rich set of higher-level libraries for different big data compute tasks, such as machine learning, SQL processing, graph processing, and real-time streaming. These libraries make development faster and can be combined in an arbitrary fashion. Though Spark is written in Scala, and this book only focuses on recipes in Scala, Spark also supports Java and Python. Spark is an open source community project, and everyone uses the pure open source Apache distributions for deployments, unlike Hadoop, which has multiple distributions available with vendor enhancements. The following figure shows the Spark ecosystem: The Spark runtime runs on top of a variety of cluster managers, including YARN (Hadoop's compute framework), Mesos, and Spark's own cluster manager called standalone mode. Tachyon is a memory-centric distributed file system that enables reliable file sharing at memory speed across cluster frameworks. In short, it is an off-heap storage layer in memory, which helps share data across jobs and users. Mesos is a cluster manager, which is evolving into a data center operating system. YARN is Hadoop's compute framework that has a robust resource management feature that Spark can seamlessly use. Installing Spark from binaries Spark can be either built from the source code or precompiled binaries can be downloaded from http://spark.apache.org. For a standard use case, binaries are good enough, and this recipe will focus on installing Spark using binaries. Getting ready All the recipes in this book are developed using Ubuntu Linux but should work fine on any POSIX environment. Spark expects Java to be installed and the JAVA_HOME environment variable to be set. In Linux/Unix systems, there are certain standards for the location of files and directories, which we are going to follow in this book. The following is a quick cheat sheet: Directory Description /bin Essential command binaries /etc Host-specific system configuration /opt Add-on application software packages /var Variable data /tmp Temporary files /home User home directories How to do it... At the time of writing this, Spark's current version is 1.4. Please check the latest version from Spark's download page at http://spark.apache.org/downloads.html. Binaries are developed with a most recent and stable version of Hadoop. 
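Before running the installation steps that follow, it can be worth confirming the Java prerequisite mentioned under Getting ready. A quick, illustrative check (the exact output will differ from machine to machine):
$ java -version
$ echo $JAVA_HOME
If JAVA_HOME is empty, set it in your shell profile before proceeding with the steps below.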
To use a specific version of Hadoop, the recommended approach is to build from sources, which will be covered in the next recipe. The following are the installation steps: Open the terminal and download binaries using the following command: $ wget http://d3kbcqa49mib13.cloudfront.net/spark-1.4.0-bin-hadoop2.4.tgz Unpack binaries: $ tar -zxf spark-1.4.0-bin-hadoop2.4.tgz Rename the folder containing binaries by stripping the version information: $ sudo mv spark-1.4.0-bin-hadoop2.4 spark Move the configuration folder to the /etc folder so that it can be made a symbolic link later: $ sudo mv spark/conf/* /etc/spark Create your company-specific installation directory under /opt. As the recipes in this book are tested on infoobjects sandbox, we are going to use infoobjects as directory name. Create the /opt/infoobjects directory: $ sudo mkdir -p /opt/infoobjects Move the spark directory to /opt/infoobjects as it's an add-on software package: $ sudo mv spark /opt/infoobjects/ Change the ownership of the spark home directory to root: $ sudo chown -R root:root /opt/infoobjects/spark Change permissions of the spark home directory, 0755 = user:read-write-execute group:read-execute world:read-execute: $ sudo chmod -R 755 /opt/infoobjects/spark Move to the spark home directory: $ cd /opt/infoobjects/spark Create the symbolic link: $ sudo ln -s /etc/spark conf Append to PATH in .bashrc: $ echo "export PATH=$PATH:/opt/infoobjects/spark/bin" >> /home/hduser/.bashrc Open a new terminal. Create the log directory in /var: $ sudo mkdir -p /var/log/spark Make hduser the owner of the Spark log directory. $ sudo chown -R hduser:hduser /var/log/spark Create the Spark tmp directory: $ mkdir /tmp/spark Configure Spark with the help of the following command lines: $ cd /etc/spark$ echo "export HADOOP_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop">> spark-env.sh$ echo "export YARN_CONF_DIR=/opt/infoobjects/hadoop/etc/Hadoop">> spark-env.sh$ echo "export SPARK_LOG_DIR=/var/log/spark" >> spark-env.sh$ echo "export SPARK_WORKER_DIR=/tmp/spark" >> spark-env.sh Building the Spark source code with Maven Installing Spark using binaries works fine in most cases. For advanced cases, such as the following (but not limited to), compiling from the source code is a better option: Compiling for a specific Hadoop version Adding the Hive integration Adding the YARN integration Getting ready The following are the prerequisites for this recipe to work: Java 1.6 or a later version Maven 3.x How to do it... 
The following are the steps to build the Spark source code with Maven: Increase MaxPermSize for heap: $ echo "export _JAVA_OPTIONS="-XX:MaxPermSize=1G"" >> /home/hduser/.bashrc Open a new terminal window and download the Spark source code from GitHub: $ wget https://github.com/apache/spark/archive/branch-1.4.zip Unpack the archive: $ gunzip branch-1.4.zip Move to the spark directory: $ cd spark Compile the sources with these flags: Yarn enabled, Hadoop version 2.4, Hive enabled, and skipping tests for faster compilation: $ mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -DskipTests clean package Move the conf folder to the etc folder so that it can be made a symbolic link: $ sudo mv spark/conf /etc/ Move the spark directory to /opt as it's an add-on software package: $ sudo mv spark /opt/infoobjects/spark Change the ownership of the spark home directory to root: $ sudo chown -R root:root /opt/infoobjects/spark Change the permissions of the spark home directory 0755 = user:rwx group:r-x world:r-x: $ sudo chmod -R 755 /opt/infoobjects/spark Move to the spark home directory: $ cd /opt/infoobjects/spark Create a symbolic link: $ sudo ln -s /etc/spark conf Put the Spark executable in the path by editing .bashrc: $ echo "export PATH=$PATH:/opt/infoobjects/spark/bin" >> /home/hduser/.bashrc Create the log directory in /var: $ sudo mkdir -p /var/log/spark Make hduser the owner of the Spark log directory: $ sudo chown -R hduser:hduser /var/log/spark Create the Spark tmp directory: $ mkdir /tmp/spark Configure Spark with the help of the following command lines: $ cd /etc/spark$ echo "export HADOOP_CONF_DIR=/opt/infoobjects/hadoop/etc/hadoop">> spark-env.sh$ echo "export YARN_CONF_DIR=/opt/infoobjects/hadoop/etc/Hadoop">> spark-env.sh$ echo "export SPARK_LOG_DIR=/var/log/spark" >> spark-env.sh$ echo "export SPARK_WORKER_DIR=/tmp/spark" >> spark-env.sh Summary In this article, we learned what Apache Spark is, how we can install Spark from binaries, and how to build Spark source code with Maven. Resources for Article: Further resources on this subject: Big Data Analysis (R and Hadoop) [Article] YARN and Hadoop [Article] Hadoop and SQL [Article]

Speeding up Gradle builds for Android

Packt
16 Jul 2015
7 min read
In this article by Kevin Pelgrims, the author of the book, Gradle for Android, we will cover a few tips and tricks that will help speed up the Gradle builds. A lot of Android developers that start using Gradle complain about the prolonged compilation time. Builds can take longer than they do with Ant, because Gradle has three phases in the build lifecycle that it goes through every time you execute a task. This makes the whole process very configurable, but also quite slow. Luckily, there are several ways to speed up Gradle builds. Gradle properties One way to tweak the speed of a Gradle build is to change some of the default settings. You can enable parallel builds by setting a property in a gradle.properties file that is placed in the root of a project. All you need to do is add the following line: org.gradle.parallel=true Another easy win is to enable the Gradle daemon, which starts a background process when you run a build the first time. Any subsequent builds will then reuse that background process, thus cutting out the startup cost. The process is kept alive as long as you use Gradle, and is terminated after three hours of idle time. Using the daemon is particularly useful when you use Gradle several times in a short time span. You can enable the daemon in the gradle.properties file like this: org.gradle.daemon=true In Android Studio, the Gradle daemon is enabled by default. This means that after the first build from inside the IDE, the next builds are a bit faster. If you build from the command-line interface; however, the Gradle daemon is disabled, unless you enable it in the properties. To speed up the compilation itself, you can tweak parameters on the Java Virtual Machine (JVM). There is a Gradle property called jvmargs that enables you to set different values for the memory allocation pool for the JVM. The two parameters that have a direct influence on your build speed are Xms and Xmx. The Xms parameter is used to set the initial amount of memory to be used, while the Xmx parameter is used to set a maximum. You can manually set these values in the gradle.properties file like this: org.gradle.jvmargs=-Xms256m -Xmx1024m You need to set the desired amount and a unit, which can be k for kilobytes, m for megabytes, and g for gigabytes. By default, the maximum memory allocation (Xmx) is set to 256 MB, and the starting memory allocation (Xms) is not set at all. The optimal settings depend on the capabilities of your computer. The last property you can configure to influence build speed is org.gradle.configureondemand. This property is particularly useful if you have complex projects with several modules, as it tries to limit the time spent in the configuration phase, by skipping modules that are not required for the task that is being executed. If you set this property to true, Gradle will try to figure out which modules have configuration changes and which ones do not, before it runs the configuration phase. This is a feature that will not be very useful if you only have an Android app and a library in your project. If you have a lot of modules that are loosely coupled, though, this feature can save you a lot of build time. System-wide Gradle properties If you want to apply these properties system-wide to all your Gradle-based projects, you can create a gradle.properties file in the .gradle folder in your home directory. On Microsoft Windows, the full path to this directory is %UserProfile%.gradle, on Linux and Mac OS X it is ~/.gradle. 
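As an illustration, a consolidated gradle.properties that combines the settings discussed so far might look like the following sketch; the memory values are assumptions and should be tuned to your own machine:
# gradle.properties (illustrative values only)
org.gradle.parallel=true
org.gradle.daemon=true
org.gradle.jvmargs=-Xms256m -Xmx1024m
org.gradle.configureondemand=true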
It is a good practice to set these properties in your home directory, rather than on the project level. The reason for this is that you usually want to keep memory consumption down on build servers, and the build time is of less importance. Android Studio The Gradle properties you can change to speed up the compilation process are also configurable in the Android Studio settings. To find the compiler settings, open the Settings dialog, and then navigate to Build, Execution, Deployment | Compiler. On that screen, you can find settings for parallel builds, JVM options, configure on demand, and so on. These settings only show up for Gradle-based Android modules. Have a look at the following screenshot: Configuring these settings from Android Studio is easier than configuring them manually in the build configuration file, and the settings dialog makes it easy to find properties that influence the build process. Profiling If you want to find out which parts of the build are slowing the process down, you can profile the entire build process. You can do this by adding the --profile flag whenever you execute a Gradle task. When you provide this flag, Gradle creates a profiling report, which can tell you which parts of the build process are the most time consuming. Once you know where the bottlenecks are, you can make the necessary changes. The report is saved as an HTML file in your module in build/reports/profile. This is the report generated after executing the build task on a multimodule project: The profiling report shows an overview of the time spent in each phase while executing the task. Below that summary is an overview of how much time Gradle spent on the configuration phase for each module. There are two more sections in the report that are not shown in the screenshot. The Dependency Resolution section shows how long it took to resolve dependencies, per module. Lastly, the Task Execution section contains an extremely detailed task execution overview. This overview has the timing for every single task, ordered by execution time from high to low. Jack and Jill If you are willing to use experimental tools, you can enable Jack and Jill to speed up builds. Jack (Java Android Compiler Kit) is a new Android build toolchain that compiles Java source code directly to the Android Dalvik executable (dex) format. It has its own .jack library format and takes care of packaging and shrinking as well. Jill (Jack Intermediate Library Linker) is a tool that can convert .aar and .jar files to .jack libraries. These tools are still quite experimental, but they were made to improve build times and to simplify the Android build process. It is not recommended to start using Jack and Jill for production versions of your projects, but they are made available so that you can try them out. To be able to use Jack and Jill, you need to use build tools version 21.1.1 or higher, and the Android plugin for Gradle version 1.0.0 or higher. Enabling Jack and Jill is as easy as setting one property in the defaultConfig block: android {   buildToolsRevision '22.0.1'   defaultConfig {     useJack = true   }} You can also enable Jack and Jill on a certain build type or product flavor. 
This way, you can continue using the regular build toolchain, and have an experimental build on the side: android {   productFlavors {       regular {           useJack = false       }        experimental {           useJack = true       }   }} As soon as you set useJack to true, minification and obfuscation will not go through ProGuard anymore, but you can still use the ProGuard rules syntax to specify certain rules and exceptions. Use the same proguardFiles method that we mentioned before, when talking about ProGuard. Summary This article helped us lean different ways to speed up builds; we first saw how we can tweak the settings, configure Gradle and the JVM, we saw how to detect parts that are slowing down the process, and then we learned Jack and Jill tool. Resources for Article: Further resources on this subject: Android Tech Page Android Virtual Device Manager Apache Maven and m2eclipse  Saying Hello to Unity and Android

REST APIs for social network data using py2neo

Packt
14 Jul 2015
20 min read
In this article wirtten by Sumit Gupta, author of the book Building Web Applications with Python and Neo4j we will discuss and develop RESTful APIs for performing CRUD and search operations over our social network data, using Flask-RESTful extension and py2neo extension—Object-Graph Model (OGM). Let's move forward to first quickly talk about the OGM and then develop full-fledged REST APIs over our social network data. (For more resources related to this topic, see here.) ORM for graph databases py2neo – OGM We discussed about the py2neo in Chapter 4, Getting Python and Neo4j to Talk Py2neo. In this section, we will talk about one of the py2neo extensions that provides high-level APIs for dealing with the underlying graph database as objects and its relationships. Object-Graph Mapping (http://py2neo.org/2.0/ext/ogm.html) is one of the popular extensions of py2neo and provides the mapping of Neo4j graphs in the form of objects and relationships. It provides similar functionality and features as Object Relational Model (ORM) available for relational databases py2neo.ext.ogm.Store(graph) is the base class which exposes all operations with respect to graph data models. Following are important methods of Store which we will be using in the upcoming section for mutating our social network data: Store.delete(subj): It deletes a node from the underlying graph along with its associated relationships. subj is the entity that needs to be deleted. It raises an exception in case the provided entity is not linked to the server. Store.load(cls, node): It loads the data from the database node into cls, which is the entity defined by the data model. Store.load_related(subj, rel_type, cls): It loads all the nodes related to subj of relationship as defined by rel_type into cls and then further returns the cls object. Store.load_indexed(index_name, key,value, cls): It queries the legacy index, loads all the nodes that are mapped by key-value, and returns the associated object. Store.relate(subj, rel_type, obj, properties=None): It defines the relationship between two nodes, where subj and cls are two nodes connected by rel_type. By default, all relationships point towards the right node. Store.save(subj, node=None): It save and creates a given entity/node—subj into the graph database. The second argument is of type Node, which if given will not create a new node and will change the already existing node. Store.save_indexed(index_name,key,value,subj): It saves the given entity into the graph and also creates an entry into the given index for future reference. Refer to http://py2neo.org/2.0/ext/ogm.html#py2neo.ext.ogm.Store for the complete list of methods exposed by Store class. Let's move on to the next section where we will use the OGM for mutating our social network data model. OGM supports Neo4j version 1.9, so all features of Neo4j 2.0 and above are not supported such as labels. Social network application with Flask-RESTful and OGM In this section, we will develop a full-fledged application for mutating our social network data and will also talk about the basics of Flask-RESTful and OGM. Creating object model Perform the following steps to create the object model and CRUD/search functions for our social network data: Our social network data contains two kind of entities—Person and Movies. 
So as a first step let's create a package model and within the model package let's define a module SocialDataModel.py with two classes—Person and Movie: class Person(object):    def __init__(self, name=None,surname=None,age=None,country=None):        self.name=name        self.surname=surname        self.age=age        self.country=country   class Movie(object):    def __init__(self, movieName=None):        self.movieName=movieName Next, let's define another package operations and two python modules ExecuteCRUDOperations.py and ExecuteSearchOperations.py. The ExecuteCRUDOperations module will contain the following three classes: DeleteNodesRelationships: It will contain one method each for deleting People nodes and Movie nodes and in the __init__ method, we will establish the connection to the graph database. class DeleteNodesRelationships(object):    '''    Define the Delete Operation on Nodes    '''    def __init__(self,host,port,username,password):        #Authenticate and Connect to the Neo4j Graph Database        py2neo.authenticate(host+':'+port, username, password)        graph = Graph('http://'+host+':'+port+'/db/data/')        store = Store(graph)        #Store the reference of Graph and Store.        self.graph=graph        self.store=store      def deletePersonNode(self,node):        #Load the node from the Neo4j Legacy Index cls = self.store.load_indexed('personIndex', 'name', node.name, Person)          #Invoke delete method of store class        self.store.delete(cls[0])      def deleteMovieNode(self,node):        #Load the node from the Neo4j Legacy Index cls = self.store.load_indexed('movieIndex',   'name',node.movieName, Movie)        #Invoke delete method of store class            self.store.delete(cls[0]) Deleting nodes will also delete the associated relationships, so there is no need to have functions for deleting relationships. Nodes without any relationship do not make much sense for many business use cases, especially in a social network, unless there is a specific need or an exceptional scenario. UpdateNodesRelationships: It will contain one method each for updating People nodes and Movie nodes and, in the __init__ method, we will establish the connection to the graph database. 
class UpdateNodesRelationships(object):    '''      Define the Update Operation on Nodes    '''      def __init__(self,host,port,username,password):        #Write code for connecting to server      def updatePersonNode(self,oldNode,newNode):        #Get the old node from the Index        cls = self.store.load_indexed('personIndex', 'name', oldNode.name, Person)        #Copy the new values to the Old Node        cls[0].name=newNode.name        cls[0].surname=newNode.surname        cls[0].age=newNode.age        cls[0].country=newNode.country        #Delete the Old Node form Index        self.store.delete(cls[0])       #Persist the updated values again in the Index        self.store.save_unique('personIndex', 'name', newNode.name, cls[0])      def updateMovieNode(self,oldNode,newNode):          #Get the old node from the Index        cls = self.store.load_indexed('movieIndex', 'name', oldNode.movieName, Movie)        #Copy the new values to the Old Node        cls[0].movieName=newNode.movieName        #Delete the Old Node form Index        self.store.delete(cls[0])        #Persist the updated values again in the Index        self.store.save_ unique('personIndex', 'name', newNode.name, cls[0]) CreateNodesRelationships: This class will contain methods for creating People and Movies nodes and relationships and will then further persist them to the database. As with the other classes/ module, it will establish the connection to the graph database in the __init__ method: class CreateNodesRelationships(object):    '''    Define the Create Operation on Nodes    '''    def __init__(self,host,port,username,password):        #Write code for connecting to server    '''    Create a person and store it in the Person Dictionary.    Node is not saved unless save() method is invoked. Helpful in bulk creation    '''    def createPerson(self,name,surName=None,age=None,country=None):        person = Person(name,surName,age,country)        return person      '''    Create a movie and store it in the Movie Dictionary.    Node is not saved unless save() method is invoked. Helpful in bulk creation    '''    def createMovie(self,movieName):        movie = Movie(movieName)        return movie      '''    Create a relationships between 2 nodes and invoke a local method of Store class.    Relationship is not saved unless Node is saved or save() method is invoked.    '''    def createFriendRelationship(self,startPerson,endPerson):        self.store.relate(startPerson, 'FRIEND', endPerson)      '''    Create a TEACHES relationships between 2 nodes and invoke a local method of Store class.    Relationship is not saved unless Node is saved or save() method is invoked.    '''    def createTeachesRelationship(self,startPerson,endPerson):        self.store.relate(startPerson, 'TEACHES', endPerson)    '''    Create a HAS_RATED relationships between 2 nodes and invoke a local method of Store class.    Relationship is not saved unless Node is saved or save() method is invoked.    '''    def createHasRatedRelationship(self,startPerson,movie,ratings):      self.store.relate(startPerson, 'HAS_RATED', movie,{'ratings':ratings})    '''    Based on type of Entity Save it into the Server/ database    '''    def save(self,entity,node):        if(entity=='person'):            self.store.save_unique('personIndex', 'name', node.name, node)        else:            self.store.save_unique('movieIndex','name',node.movieName,node) Next we will define other Python module operations, ExecuteSearchOperations.py. 
This module will define two classes, each containing one method for searching Person and Movie node and of-course the __init__ method for establishing a connection with the server: class SearchPerson(object):    '''    Class for Searching and retrieving the the People Node from server    '''      def __init__(self,host,port,username,password):        #Write code for connecting to server      def searchPerson(self,personName):        cls = self.store.load_indexed('personIndex', 'name', personName, Person)        return cls;   class SearchMovie(object):    '''    Class for Searching and retrieving the the Movie Node from server    '''    def __init__(self,host,port,username,password):        #Write code for connecting to server      def searchMovie(self,movieName):        cls = self.store.load_indexed('movieIndex', 'name', movieName, Movie)        return cls; We are done with our data model and the utility classes that will perform the CRUD and search operation over our social network data using py2neo OGM. Now let's move on to the next section and develop some REST services over our data model. Creating REST APIs over data models In this section, we will create and expose REST services for mutating and searching our social network data using the data model created in the previous section. In our social network data model, there will be operations on either the Person or Movie nodes, and there will be one more operation which will define the relationship between Person and Person or Person and Movie. So let's create another package service and define another module MutateSocialNetworkDataService.py. In this module, apart from regular imports from flask and flask_restful, we will also import classes from our custom packages created in the previous section and create objects of model classes for performing CRUD and search operations. Next we will define the different classes or services which will define the structure of our REST Services. The PersonService class will define the GET, POST, PUT, and DELETE operations for searching, creating, updating, and deleting the Person nodes. 
class PersonService(Resource):    '''    Defines operations with respect to Entity - Person    '''    #example - GET http://localhost:5000/person/Bradley    def get(self, name):        node = searchPerson.searchPerson(name)        #Convert into JSON and return it back        return jsonify(name=node[0].name,surName=node[0].surname,age=node[0].age,country=node[0].country)      #POST http://localhost:5000/person    #{"name": "Bradley","surname": "Green","age": "24","country": "US"}    def post(self):          jsonData = request.get_json(cache=False)        attr={}        for key in jsonData:            attr[key]=jsonData[key]            print(key,' = ',jsonData[key] )        person = createOperation.createPerson(attr['name'],attr['surname'],attr['age'],attr['country'])        createOperation.save('person',person)          return jsonify(result='success')    #POST http://localhost:5000/person/Bradley    #{"name": "Bradley1","surname": "Green","age": "24","country": "US"}    def put(self,name):        oldNode = searchPerson.searchPerson(name)        jsonData = request.get_json(cache=False)        attr={}        for key in jsonData:            attr[key] = jsonData[key]            print(key,' = ',jsonData[key] )        newNode = Person(attr['name'],attr['surname'],attr['age'],attr['country'])          updateOperation.updatePersonNode(oldNode[0],newNode)          return jsonify(result='success')      #DELETE http://localhost:5000/person/Bradley1    def delete(self,name):        node = searchPerson.searchPerson(name)        deleteOperation.deletePersonNode(node[0])        return jsonify(result='success') The MovieService class will define the GET, POST, and DELETE operations for searching, creating, and deleting the Movie nodes. This service will not support the modification of Movie nodes because, once the Movie node is defined, it does not change in our data model. Movie service is similar to our Person service and leverages our data model for performing various operations. The RelationshipService class only defines POST which will create the relationship between the person and other given entity and can either be another Person or Movie. Following is the structure of the POST method: '''    Assuming that the given nodes are already created this operation    will associate Person Node either with another Person or Movie Node.      
Request for Defining relationship between 2 persons: -        POST http://localhost:5000/relationship/person/Bradley        {"entity_type":"person","person.name":"Matthew","relationship": "FRIEND"}    Request for Defining relationship between Person and Movie        POST http://localhost:5000/relationship/person/Bradley        {"entity_type":"Movie","movie.movieName":"Avengers","relationship": "HAS_RATED"          "relationship.ratings":"4"}    '''    def post(self, entity,name):        jsonData = request.get_json(cache=False)        attr={}        for key in jsonData:            attr[key]=jsonData[key]            print(key,' = ',jsonData[key] )          if(entity == 'person'):            startNode = searchPerson.searchPerson(name)            if(attr['entity_type']=='movie'):                endNode = searchMovie.searchMovie(attr['movie.movieName'])                createOperation.createHasRatedRelationship(startNode[0], endNode[0], attr['relationship.ratings'])                createOperation.save('person', startNode[0])            elif (attr['entity_type']=='person' and attr['relationship']=='FRIEND'):                endNode = searchPerson.searchPerson(attr['person.name'])                createOperation.createFriendRelationship(startNode[0], endNode[0])                createOperation.save('person', startNode[0])            elif (attr['entity_type']=='person' and attr['relationship']=='TEACHES'):                endNode = searchPerson.searchPerson(attr['person.name'])                createOperation.createTeachesRelationship(startNode[0], endNode[0])                createOperation.save('person', startNode[0])        else:            raise HTTPException("Value is not Valid")          return jsonify(result='success') At the end, we will define our __main__ method, which will bind our services with the specific URLs and bring up our application: if __name__ == '__main__':    api.add_resource(PersonService,'/person','/person/<string:name>')    api.add_resource(MovieService,'/movie','/movie/<string:movieName>')    api.add_resource(RelationshipService,'/relationship','/relationship/<string:entity>/<string:name>')    webapp.run(debug=True) And we are done!!! Execute our MutateSocialNetworkDataService.py as a regular Python module and your REST-based services are up and running. Users of this app can use any REST-based clients such as SOAP-UI and can execute the various REST services for performing CRUD and search operations. Follow the comments provided in the code samples for the format of the request/response. In this section, we created and exposed REST-based services using Flask, Flask-RESTful, and OGM and performed CRUD and search operations over our social network data model. Using Neomodel in a Django app In this section, we will talk about the integration of Django and Neomodel. Django is a Python-based, powerful, robust, and scalable web-based application development framework. It is developed upon the Model-View-Controller (MVC) design pattern where developers can design and develop a scalable enterprise-grade application within no time. We will not go into the details of Django as a web-based framework but will assume that the readers have a basic understanding of Django and some hands-on experience in developing web-based and database-driven applications. Visit https://docs.djangoproject.com/en/1.7/ if you do not have any prior knowledge of Django. 
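As a point of reference before we look at signals, here is a minimal, illustrative sketch of how a Person entity might be declared with Neomodel; the property names simply mirror our earlier Person class and are assumptions rather than the book's exact model code:
# Illustrative sketch only; not the book's ExploreSocialDataModel code
from neomodel import StructuredNode, StringProperty, IntegerProperty

class Person(StructuredNode):
    name = StringProperty(unique_index=True)
    surname = StringProperty()
    age = IntegerProperty()
    country = StringProperty()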
Django provides various signals or triggers that are activated and used to invoke or execute some user-defined functions on a particular event. The framework invokes various signals or triggers if there are any modifications requested to the underlying application data model such as pre_save(), post_save(), pre_delete, post_delete, and a few more. All the functions starting with pre_ are executed before the requested modifications are applied to the data model, and functions starting with post_ are triggered after the modifications are applied to the data model. And that's where we will hook our Neomodel framework, where we will capture these events and invoke our custom methods to make similar changes to our Neo4j database. We can reuse our social data model and the functions defined in ExploreSocialDataModel.CreateDataModel. We only need to register our event and things will be automatically handled by the Django framework. For example, you can register for the event in your Django model (models.py) by defining the following statement: signals.pre_save.connect(preSave, sender=Male) In the previous statement, preSave is the custom or user-defined method, declared in models.py. It will be invoked before any changes are committed to entity Male, which is controlled by the Django framework and is different from our Neomodel entity. Next, in preSave you need to define the invocations to the Neomodel entities and save them. Refer to the documentation at https://docs.djangoproject.com/en/1.7/topics/signals/ for more information on implementing signals in Django. Signals in Neomodel Neomodel also provides signals that are similar to Django signals and have the same behavior. Neomodel provides the following signals: pre_save, post_save, pre_delete, post_delete, and post_create. Neomodel exposes the following two different approaches for implementing signals: Define the pre..() and post..() methods in your model itself and Neomodel will automatically invoke it. For example, in our social data model, we can define def pre_save(self) in our Model.Male class to receive all events before entities are persisted in the database or server. Another approach is using Django-style signals, where we can define the connect() method in our Neomodel Model.py and it will produce the same results as in Django-based models: signals.pre_save.connect(preSave, sender=Male) Refer to http://neomodel.readthedocs.org/en/latest/hooks.html for more information on signals in Neomodel. In this section, we discussed about the integration of Django with Neomodel using Django signals. We also talked about the signals provided by Neomodel and their implementation approach. Summary Here we learned about creating web-based applications using Flask. We also used Flasks extensions such as Flask-RESTful for creating/exposing REST APIs for data manipulation. Finally, we created a full blown REST-based application over our social network data using Flask, Flask-RESTful, and py2neo OGM. We also learned about Neomodel and its various features and APIs provided to work with Neo4j. We also discussed about the integration of Neomodel with the Django framework. Resources for Article: Further resources on this subject: Firebase [article] Developing Location-based Services with Neo4j [article] Learning BeagleBone Python Programming [article]

Fine-tune the NGINX Configuration

Packt
14 Jul 2015
20 min read
In this article by Rahul Sharma, author of the book NGINX High Performance, we will cover the following topics: NGINX configuration syntax Configuring NGINX workers Configuring NGINX I/O Configuring TCP Setting up the server (For more resources related to this topic, see here.) NGINX configuration syntax This section aims to cover it in good detail. The complete configuration file has a logical structure that is composed of directives grouped into a number of sections. A section defines the configuration for a particular NGINX module, for example, the http section defines the configuration for the ngx_http_core module. An NGINX configuration has the following syntax: Valid directives begin with a variable name and then state an argument or series of arguments separated by spaces. All valid directives end with a semicolon (;). Sections are defined with curly braces ({}). Sections can be nested in one another. The nested section defines a module valid under the particular section, for example, the gzip section under the http section. Configuration outside any section is part of the NGINX global configuration. The lines starting with the hash (#) sign are comments. Configurations can be split into multiple files, which can be grouped using the include directive. This helps in organizing code into logical components. Inclusions are processed recursively, that is, an include file can further have include statements. Spaces, tabs, and new line characters are not part of the NGINX configuration. They are not interpreted by the NGINX engine, but they help to make the configuration more readable. Thus, the complete file looks like the following code: #The configuration begins here global1 value1; #This defines a new section section { sectionvar1 value1; include file1;    subsection {    subsectionvar1 value1; } } #The section ends here global2 value2; # The configuration ends here NGINX provides the -t option, which can be used to test and verify the configuration written in the file. If the file or any of the included files contains any errors, it prints the line numbers causing the issue: $ sudo nginx -t This checks the validity of the default configuration file. If the configuration is written in a file other than the default one, use the -c option to test it. You cannot test half-baked configurations, for example, you defined a server section for your domain in a separate file. Any attempt to test such a file will throw errors. The file has to be complete in all respects. Now that we have a clear idea of the NGINX configuration syntax, we will try to play around with the default configuration. This article only aims to discuss the parts of the configuration that have an impact on performance. The NGINX catalog has large number of modules that can be configured for some purposes. This article does not try to cover all of them as the details are beyond the scope of the book. Please refer to the NGINX documentation at http://nginx.org/en/docs/ to know more about the modules. Configuring NGINX workers NGINX runs a fixed number of worker processes as per the specified configuration. In the following sections, we will work with NGINX worker parameters. These parameters are mostly part of the NGINX global context. worker_processes The worker_processes directive controls the number of workers: worker_processes 1; The default value for this is 1, that is, NGINX runs only one worker. 
The value should be changed to an optimal value depending on the number of cores available, disks, network subsystem, server load, and so on. As a starting point, set the value to the number of cores available. Determine the number of cores available using lscpu: $ lscpu Architecture:     x86_64 CPU op-mode(s):   32-bit, 64-bit Byte Order:     Little Endian CPU(s):       4 The same can be accomplished by greping out cpuinfo: $ cat /proc/cpuinfo | grep 'processor' | wc -l Now, set this value to the parameter: # One worker per CPU-core. worker_processes 4; Alternatively, the directive can have auto as its value. This determines the number of cores and spawns an equal number of workers. When NGINX is running with SSL, it is a good idea to have multiple workers. SSL handshake is blocking in nature and involves disk I/O. Thus, using multiple workers leads to improved performance. accept_mutex Since we have configured multiple workers in NGINX, we should also configure the flags that impact worker selection. The accept_mutex parameter available under the events section will enable each of the available workers to accept new connections one by one. By default, the flag is set to on. The following code shows this: events { accept_mutex on; } If the flag is turned to off, all of the available workers will wake up from the waiting state, but only one worker will process the connection. This results in the Thundering Herd phenomenon, which is repeated a number of times per second. The phenomenon causes reduced server performance as all the woken-up workers take up CPU time before going back to the wait state. This results in unproductive CPU cycles and nonutilized context switches. accept_mutex_delay When accept_mutex is enabled, only one worker, which has the mutex lock, accepts connections, while others wait for their turn. The accept_mutex_delay corresponds to the timeframe for which the worker would wait, and after which it tries to acquire the mutex lock and starts accepting new connections. The directive is available under the events section with a default value of 500 milliseconds. The following code shows this: events{ accept_mutex_delay 500ms; } worker_connections The next configuration to look at is worker_connections, with a default value of 512. The directive is present under the events section. The directive sets the maximum number of simultaneous connections that can be opened by a worker process. The following code shows this: events{    worker_connections 512; } Increase worker_connections to something like 1,024 to accept more simultaneous connections. The value of worker_connections does not directly translate into the number of clients that can be served simultaneously. Each browser opens a number of parallel connections to download various components that compose a web page, for example, images, scripts, and so on. Different browsers have different values for this, for example, IE works with two parallel connections while Chrome opens six connections. The number of connections also includes sockets opened with the upstream server, if any. worker_rlimit_nofile The number of simultaneous connections is limited by the number of file descriptors available on the system as each socket will open a file descriptor. If NGINX tries to open more sockets than the available file descriptors, it will lead to the Too many opened files message in the error.log. 
Check the number of file descriptors using ulimit:

$ ulimit -n

Now, increase this to a value more than worker_processes * worker_connections. The value should be increased for the user that runs the worker process. Check the user directive to get the username.

NGINX provides the worker_rlimit_nofile directive, which is an alternative way of setting the available file descriptors rather than modifying ulimit. Setting the directive has a similar impact as updating ulimit for the worker user. The value of this directive overrides the ulimit value set for the user. The directive is not present by default. Set a large value to handle a large number of simultaneous connections. The following code shows this:

worker_rlimit_nofile 20960;

To determine the OS limits imposed on a process, read the file /proc/$pid/limits. $pid corresponds to the PID of the process.

multi_accept

The multi_accept flag enables an NGINX worker to accept as many connections as possible when it gets the notification of a new connection. The purpose of this flag is to accept all connections in the listen queue at once. If the directive is disabled, a worker process will accept connections one by one. The following code shows this:

events {
    multi_accept on;
}

The directive is available under the events section with the default value off.

If the server has a constant stream of incoming connections, enabling multi_accept may result in a worker accepting more connections than the number specified in worker_connections. The overflow will lead to performance loss, as the previously accepted connections that are part of the overflow will not get processed.
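Putting the worker directives discussed so far together, the relevant parts of a tuned configuration might look like the following sketch (the values are illustrative, not recommendations):

worker_processes auto;        # one worker per available core
worker_rlimit_nofile 20960;   # raise the descriptor limit for the workers

events {
    worker_connections 1024;  # per-worker connection cap
    accept_mutex       on;    # workers take turns accepting connections
    multi_accept       on;    # drain the listen queue in one go
}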
use

NGINX provides several methods for connection processing. Each of the available methods allows NGINX workers to monitor multiple socket file descriptors, that is, to know when there is data available for reading/writing. These calls allow NGINX to process multiple socket streams without getting stuck in any one of them. The methods are platform-dependent, and the configure command, used to build NGINX, selects the most efficient method available on the platform. If we want to use other methods, they must first be enabled in NGINX.

The use directive allows us to override the default method with the method specified. The directive is part of the events section:

events {
    use select;
}

NGINX supports the following methods of processing connections:

select: This is the standard method of processing connections. It is built automatically on platforms that lack more efficient methods. The module can be enabled or disabled using the --with-select_module or --without-select_module configuration parameter.

poll: This is the standard method of processing connections. It is built automatically on platforms that lack more efficient methods. The module can be enabled or disabled using the --with-poll_module or --without-poll_module configuration parameter.

kqueue: This is an efficient method of processing connections available on FreeBSD 4.1, OpenBSD 2.9+, NetBSD 2.0, and OS X. There are the additional directives kqueue_changes and kqueue_events. These directives specify the number of changes and events that NGINX will pass to the kernel. The default value for both of these is 512. The kqueue method will ignore the multi_accept directive if it has been enabled.

epoll: This is an efficient method of processing connections available on Linux 2.6+. The method is similar to the FreeBSD kqueue. There is also the additional directive epoll_events. This specifies the number of events that NGINX will pass to the kernel. The default value for this is 512.

/dev/poll: This is an efficient method of processing connections available on Solaris 7 11/99+, HP/UX 11.22+, IRIX 6.5.15+, and Tru64 UNIX 5.1A+. It has the additional directives devpoll_events and devpoll_changes. The directives specify the number of changes and events that NGINX will pass to the kernel. The default value for both of these is 32.

eventport: This is an efficient method of processing connections available on Solaris 10. The method requires the necessary security patches to avoid kernel crash issues.

rtsig: Real-time signals is a connection processing method available on Linux 2.2+. The method has some limitations. On older kernels, there is a system-wide limit of 1,024 signals. For high loads, the limit needs to be increased by setting the rtsig-max parameter. For kernel 2.6+, instead of the system-wide limit, there is a limit on the number of outstanding signals for each process. NGINX provides the worker_rlimit_sigpending parameter to modify the limit for each of the worker processes:

worker_rlimit_sigpending 512;

The parameter is part of the NGINX global configuration. If the queue overflows, NGINX drains the queue and uses the poll method to process the unhandled events. When the condition is back to normal, NGINX switches back to the rtsig method of connection processing. NGINX provides the rtsig_overflow_events, rtsig_overflow_test, and rtsig_overflow_threshold parameters to control how a signal queue is handled on overflows. The rtsig_overflow_events parameter defines the number of events passed to poll. The rtsig_overflow_test parameter defines the number of events handled by poll, after which NGINX will drain the queue. Before draining the signal queue, NGINX checks how full it is; if the fill factor is larger than the specified rtsig_overflow_threshold, the queue is drained. The rtsig method requires accept_mutex to be set. The method also enables the multi_accept parameter.

Configuring NGINX I/O

NGINX can also take advantage of the Sendfile and direct I/O options available in the kernel. In the following sections, we will try to configure the parameters available for disk I/O.

Sendfile

When an application transfers a file, the kernel first reads the data into its buffers and then copies it to the application buffers. The application, in turn, writes the data back to the kernel to send it to the destination. The Sendfile method is an improved method of data transfer, in which data is copied between file descriptors within the OS kernel space, that is, without transferring data to the application buffers. This results in improved utilization of the operating system's resources.

The method can be enabled using the sendfile directive. The directive is available for the http, server, and location sections.

http {
    sendfile on;
}

The flag is set to off by default.
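Because the directive is also valid at the server and location levels, it can be switched on selectively. A minimal sketch (the listen port and paths are hypothetical) that enables it only for static assets could look like this:

http {
    sendfile off;               # keep the default elsewhere

    server {
        listen 80;

        location /static/ {
            sendfile on;        # zero-copy transfer for static files
        }
    }
}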
Direct I/O

The OS kernel usually tries to optimize and cache any read/write requests. Since the data is cached within the kernel, any subsequent read request to the same place will be much faster because there is no need to read the information from slow disks.

Direct I/O is a feature of the filesystem where reads and writes go directly from the application to the disk, thus bypassing all OS caches. This results in better utilization of CPU cycles and improved cache effectiveness. The method is used in places where the data has a poor hit ratio. Such data does not need to be in any cache and can be loaded when required. It can be used to serve large files. The directio directive enables the feature. The directive is available for the http, server, and location sections:

location /video/ {
    directio 4m;
}

Any file larger than the size specified in the directive will be loaded using direct I/O. The parameter is disabled by default. The use of direct I/O to serve a request will automatically disable Sendfile for that particular request.

Direct I/O depends on the block size used while doing a data transfer. NGINX has the directio_alignment directive to set the block size. The directive is present under the http, server, and location sections:

location /video/ {
    directio 4m;
    directio_alignment 512;
}

The default value of 512 bytes works well in most cases, unless XFS is used on Linux. In such a case, the size should be increased to 4 KB.

Asynchronous I/O

Asynchronous I/O allows a process to initiate I/O operations without having to block or wait for them to complete. The aio directive is available under the http, server, and location sections of an NGINX configuration. Depending on the section, the parameter will perform asynchronous I/O for the matching requests. The parameter works on Linux kernel 2.6.22+ and FreeBSD 4.3+. The following code shows this:

location /data {
    aio on;
}

By default, the parameter is set to off. On Linux, aio needs to be enabled with directio, while on FreeBSD, sendfile needs to be disabled for aio to take effect.

If NGINX has not been configured with the --with-file-aio option, any use of the aio directive will cause the unknown directive "aio" error.

The directive has a special value of threads, which enables multithreading for send and read operations. The multithreading support is only available on the Linux platform and can only be used with the epoll, kqueue, or eventport methods of processing requests. In order to use the threads value, configure multithreading in the NGINX binary using the --with-threads option. Post this, add a thread pool in the NGINX global context using the thread_pool directive. Use the same pool in the aio configuration:

thread_pool io_pool threads=16;

http {
    # ...
    location /data {
        sendfile on;
        aio      threads=io_pool;
    }
}

Mixing them up

The three directives can be mixed together to achieve different objectives on different platforms. The following configuration will use sendfile for files smaller than the size specified in directio. Files served by directio will be read using asynchronous I/O:

location /archived-data/ {
    sendfile on;
    aio on;
    directio 4m;
}

The aio directive has a sendfile value, which is available only on the FreeBSD platform. The value can be used to perform Sendfile in an asynchronous manner:

location /archived-data/ {
    sendfile on;
    aio sendfile;
}

NGINX invokes the sendfile() system call, which returns with no data in the memory. Post this, NGINX initiates data transfer in an asynchronous manner.

Configuring TCP

HTTP is an application layer protocol; it uses TCP as the transport layer. In TCP, data is transferred in the form of blocks known as TCP packets. NGINX provides directives to alter the behavior of the underlying TCP stack. These parameters alter flags for an individual socket connection.

TCP_NODELAY

TCP/IP networks have the "small packet" problem, where single-character messages can cause network congestion on a highly loaded network. Such packets are 41 bytes in size, where 40 bytes are for the TCP header and only 1 byte is useful information. These small packets have a huge overhead of around 4,000 percent and can saturate a network.
John Nagle solved the problem (Nagle's algorithm) by not sending the small packets immediately. All such packets are collected for some amount of time and then sent in one go as a single packet. This results in improved efficiency of the underlying network. Thus, a typical TCP/IP stack waits for up to 200 milliseconds before sending the data packets to the client.

It is important to note that the problem exists with applications such as Telnet, where each keystroke is sent over the wire. The problem is not relevant to a web server, which serves static files. The files will mostly form full TCP packets, which can be sent immediately instead of waiting for 200 milliseconds.

The TCP_NODELAY option can be used while opening a socket to disable Nagle's buffering algorithm and send the data as soon as it is available. NGINX provides the tcp_nodelay directive to enable this option. The directive is available under the http, server, and location sections of an NGINX configuration:

http {
    tcp_nodelay on;
}

The directive is enabled by default. NGINX uses tcp_nodelay for connections in the keep-alive mode.

TCP_CORK

As an alternative to Nagle's algorithm, Linux provides the TCP_CORK option. The option tells the TCP stack to append packets and send them when they are full or when the application instructs it to send the packets by explicitly removing TCP_CORK. This results in an optimal amount of data packets being sent and, thus, improves the efficiency of the network. The TCP_CORK option is available as the TCP_NOPUSH flag on FreeBSD and Mac OS.

NGINX provides the tcp_nopush directive to enable TCP_CORK over the connection socket. The directive is available under the http, server, and location sections of an NGINX configuration:

http {
    tcp_nopush on;
}

The directive is disabled by default. NGINX uses tcp_nopush for requests served with sendfile.

Setting them up

The two directives discussed previously pull in opposite directions: the former makes sure that network latency is reduced, while the latter tries to optimize the data packets sent. An application should set both of these options to get efficient data transfer.

Enabling tcp_nopush along with sendfile makes sure that while transferring a file, the kernel creates the maximum number of full TCP packets before sending them over the wire. The last packet(s) can be partial TCP packets, which could end up waiting with TCP_CORK enabled. NGINX makes sure it removes TCP_CORK to send these packets. Since tcp_nodelay is also set, these packets are sent immediately over the network, that is, without any delay.

Setting up the server

The following configuration sums up all the changes proposed in the preceding sections:

worker_processes 3;
worker_rlimit_nofile 8000;

events {
    multi_accept on;
    use epoll;
    worker_connections 1024;
}

http {
    sendfile on;
    aio on;
    directio 4m;
    tcp_nopush on;
    tcp_nodelay on;
    # Rest of the NGINX configuration removed for brevity
}

It is assumed that NGINX runs on a quad-core server. Thus, three worker processes have been spawned to take advantage of three out of the four available cores, leaving one core for other processes. Each of the workers has been configured to work with 1,024 connections. Correspondingly, the nofile limit has been increased to 8,000. By default, all worker processes operate with the accept mutex; thus, the flag has not been set explicitly. Each worker processes multiple connections in one go using the epoll method. In the http section, NGINX has been configured to serve files larger than 4 MB using direct I/O, while efficiently buffering smaller files using Sendfile. TCP options have also been set up to efficiently utilize the available network.
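Before measuring anything, the updated configuration needs to be validated and loaded so that the changes actually take effect. A typical cycle (assuming the default configuration path) looks like this:

$ sudo nginx -t          # verify the syntax of the updated configuration
$ sudo nginx -s reload   # apply it gracefully, without dropping existing connections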
Measuring gains

It is time to test the changes and make sure that they have resulted in a performance gain. Run a series of tests using Siege/JMeter to get the new performance numbers. The tests should be performed with the same configuration to get a comparable output:

$ siege -b -c 790 -r 50 -q http://192.168.2.100/hello

Transactions:              79000 hits
Availability:              100.00 %
Elapsed time:              24.25 secs
Data transferred:          12.54 MB
Response time:             0.20 secs
Transaction rate:          3257.73 trans/sec
Throughput:                0.52 MB/sec
Concurrency:               660.70
Successful transactions:   39500
Failed transactions:       0
Longest transaction:       3.45
Shortest transaction:      0.00

The results from Siege should be evaluated and compared to the baseline:

Throughput: The transaction rate reports this as around 3,250 requests per second
Error rate: Availability is reported as 100 percent; thus, the error rate is 0 percent
Response time: The results show a response time of 0.20 seconds

Thus, these new numbers demonstrate a performance improvement in various respects. After the server configuration is updated with all the changes, rerun all the tests with increased numbers. The aim should be to determine the new baseline numbers for the updated configuration.

Summary

The article started with an overview of the NGINX configuration syntax. Going further, we discussed worker_connections and the related parameters. These allow you to take advantage of the available hardware. The article also talked about the different event processing mechanisms available on different platforms. The configuration discussed helps in processing more requests, thus improving the overall throughput.

NGINX is primarily a web server; thus, it has to serve all kinds of static content. Large files can take advantage of direct I/O, while smaller content can take advantage of Sendfile. The different disk modes make sure that we have an optimal configuration to serve the content.

In the TCP stack, we discussed the flags available to alter the default behavior of TCP sockets. The tcp_nodelay directive helps in improving latency. The tcp_nopush directive can help in efficiently delivering the content. Both of these flags lead to improved response times.

In the last part of the article, we applied all the changes to our server and then ran performance tests to determine the effectiveness of the changes. In the next article, we will try to configure buffers, timeouts, and compression to improve the utilization of the available network.

Resources for Article:

Further resources on this subject:

Using Nginx as a Reverse Proxy [article]
Nginx proxy module [article]
Introduction to nginx [article]