Packt
08 Jul 2015
14 min read
File Sharing

In this article by Dan Ristic, author of the book Learning WebRTC, we will cover the following topics:

• Getting a file with the File API
• Setting up our page
• Getting a reference to a file

The real power of a data channel comes from combining it with other powerful technologies in the browser. By sending data peer-to-peer and combining this with the File API, we open up entirely new possibilities in the browser. This means you could add file sharing functionality that is available to any user with an Internet connection.

The application that we will build will be a simple one with the ability to share files between two peers. The application works in real time, meaning that the two users have to be on the page at the same time to share a file. Both users will go through a finite number of steps to transfer an entire file between them:

1. User A will open the page and type a unique ID.
2. User B will open the same page and type the same unique ID.
3. The two users can then connect to each other using RTCPeerConnection.
4. Once the connection is established, one user can select a file to share.
5. The other user will be notified of the file that is being shared, the file will be transferred to their computer over the connection, and they will download it.

The main thing we will focus on throughout the article is how to work with the data channel in new and exciting ways. We will take the file data from the browser, break it down into pieces, and send it to the other user using only the RTCPeerConnection API. The interactivity that the API promotes will stand out in this article and can be used in a simple project.

Getting a file with the File API

One of the first things that we will cover is how to use the File API to get a file from the user's computer. There is a good chance you have interacted with the File API on a web page and have not even realized it yet!
The API is usually denoted by the Browse or Choose File text located on an input field in the HTML page.

Although the API has been around for quite a while, the one you are probably familiar with is the original specification, dating back as far as 1995. This was the Form-based File Upload in HTML specification, which focused on allowing a user to upload a file to a server using an HTML form. Before the days of the file input, application developers had to rely on third-party tools to request files of data from the user. This specification was proposed in order to create a standard way to upload files for a server to download, save, and interact with. The original standard focused entirely on interacting with a file via an HTML form, however, and did not detail any way to interact with a file via JavaScript. This was the origin of the File API.

Fast-forward to the groundbreaking days of HTML5 and we now have a fully-fledged File API. The goal of the new specification was to open the doors to file manipulation for web applications, allowing them to interact with files in much the same way as a natively installed application would. This means providing not only a way for the user to upload a file, but also ways to read the file in different formats, manipulate the data of the file, and then ultimately do something with this data.

Although the API has many great features, we are going to focus on only one small aspect of it: the ability to get binary file data from the user by asking them to upload a file. A typical application that works with files, such as Notepad on Windows, works with file data in pretty much the same way. It asks the user to open a file, reads the binary data from that file, and displays the characters on the screen. The File API gives us access, in the browser, to the same binary data that any other application would use.
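To make the idea of "binary data from a file" concrete, here is a minimal sketch of reading the raw bytes of a selected file. It is not code from the book: `readBytes` is a hypothetical helper name, and it uses `Blob.prototype.arrayBuffer()` (available in current browsers and Node.js 18+) rather than the older `FileReader` interface. The `#files` id is an assumption about the page's file input.

```javascript
// Sketch: read the raw bytes of a user-selected file.
// `readBytes` is a hypothetical helper; File objects inherit from Blob,
// so any File from an <input type="file"> can be passed in.
async function readBytes(blob) {
  const buffer = await blob.arrayBuffer(); // resolves to an ArrayBuffer
  return new Uint8Array(buffer);           // byte-level view of the data
}

// Hypothetical wiring to a file input; only runs where a DOM exists.
if (typeof document !== "undefined") {
  const input = document.querySelector("#files"); // assumed input id
  if (input) {
    input.addEventListener("change", async function () {
      if (input.files.length > 0) {
        const bytes = await readBytes(input.files[0]);
        console.log("Read", bytes.length, "bytes");
      }
    });
  }
}
```

The bytes returned here are the same bytes a native application would see when opening the file from disk.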
This is the great thing about working with the File API: it works in most browsers from an HTML page, similar to the ones we have been building for our WebRTC demos. To start building our application, we will put together another simple web page. This will look similar to the last ones, and should be hosted with a static file server as done in the previous examples. By the end of the article, you will be a professional single page application builder! Now let's take a look at the following HTML code for the file sharing page:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8" />

  <title>Learning WebRTC - Article: File Sharing</title>

  <style>
    body {
      background-color: #404040;
      margin-top: 15px;
      font-family: sans-serif;
      color: white;
    }

    .thumb {
      height: 75px;
      border: 1px solid #000;
      margin: 10px 5px 0 0;
    }

    .page {
      position: relative;
      display: block;
      margin: 0 auto;
      width: 500px;
      height: 500px;
    }

    #byte_content {
      margin: 5px 0;
      max-height: 100px;
      overflow-y: auto;
      overflow-x: hidden;
    }

    #byte_range {
      margin-top: 5px;
    }
  </style>
</head>
<body>
  <div id="login-page" class="page">
    <h2>Login As</h2>
    <input type="text" id="username" />
    <button id="login">Login</button>
  </div>

  <div id="share-page" class="page">
    <h2>File Sharing</h2>

    <input type="text" id="their-username" />
    <button id="connect">Connect</button>
    <div id="ready">Ready!</div>

    <br />
    <br />

    <input type="file" id="files" name="file" /> Read bytes:
    <button id="send">Send</button>
  </div>

  <script src="client.js"></script>
</body>
</html>

The page should be fairly recognizable at this point. We will use the same technique of showing and hiding pages via CSS as we did earlier.
One of the main differences is the appearance of the file input, which we will use to have the user upload a file to the page. I even picked a different background color this time to spice things up.

Setting up our page

Create a new folder for our file sharing application and add the HTML code shown in the preceding section. You will also need all the steps from our JavaScript file to log in two users, create a WebRTC peer connection, and create a data channel between them. Copy the following code into your JavaScript file to get the page set up:

var name, connectedUser;

var connection = new WebSocket('ws://localhost:8888');

connection.onopen = function () {
  console.log("Connected");
};

// Handle all messages through this callback
connection.onmessage = function (message) {
  console.log("Got message", message.data);

  var data = JSON.parse(message.data);

  switch (data.type) {
    case "login":
      onLogin(data.success);
      break;
    case "offer":
      onOffer(data.offer, data.name);
      break;
    case "answer":
      onAnswer(data.answer);
      break;
    case "candidate":
      onCandidate(data.candidate);
      break;
    case "leave":
      onLeave();
      break;
    default:
      break;
  }
};

connection.onerror = function (err) {
  console.log("Got error", err);
};

// Alias for sending messages in JSON format
function send(message) {
  if (connectedUser) {
    message.name = connectedUser;
  }

  connection.send(JSON.stringify(message));
}

var loginPage = document.querySelector('#login-page'),
    usernameInput = document.querySelector('#username'),
    loginButton = document.querySelector('#login'),
    theirUsernameInput = document.querySelector('#their-username'),
    connectButton = document.querySelector('#connect'),
    sharePage = document.querySelector('#share-page'),
    sendButton = document.querySelector('#send'),
    readyText = document.querySelector('#ready'),
    statusText = document.querySelector('#status');

sharePage.style.display = "none";
readyText.style.display = "none";

// Login when the user clicks the button
loginButton.addEventListener("click", function (event) {
  name = usernameInput.value;

  if (name.length > 0) {
    send({
      type: "login",
      name: name
    });
  }
});

function onLogin(success) {
  if (success === false) {
    alert("Login unsuccessful, please try a different name.");
  } else {
    loginPage.style.display = "none";
    sharePage.style.display = "block";

    // Get the plumbing ready for a call
    startConnection();
  }
}

var yourConnection, dataChannel, currentFile, currentFileSize, currentFileMeta;

function startConnection() {
  if (hasRTCPeerConnection()) {
    setupPeerConnection();
  } else {
    alert("Sorry, your browser does not support WebRTC.");
  }
}

function setupPeerConnection() {
  var configuration = {
    // "urls" is the standard key; older browsers used "url"
    "iceServers": [{ "urls": "stun:stun.l.google.com:19302" }]
  };
  yourConnection = new RTCPeerConnection(configuration, {optional: []});

  // Setup ice handling
  yourConnection.onicecandidate = function (event) {
    if (event.candidate) {
      send({
        type: "candidate",
        candidate: event.candidate
      });
    }
  };

  openDataChannel();
}

function openDataChannel() {
  var dataChannelOptions = {
    ordered: true,    // ordered channels are reliable unless retransmits are limited
    negotiated: true, // both peers create the channel themselves
    id: 0             // numeric ID that both peers must share when negotiated
  };
  dataChannel = yourConnection.createDataChannel("myLabel", dataChannelOptions);

  dataChannel.onerror = function (error) {
    console.log("Data Channel Error:", error);
  };

  dataChannel.onmessage = function (event) {
    // File receive code will go here
  };

  dataChannel.onopen = function () {
    readyText.style.display = "inline-block";
  };

  dataChannel.onclose = function () {
    readyText.style.display = "none";
  };
}

function hasUserMedia() {
  navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia ||
    navigator.mozGetUserMedia || navigator.msGetUserMedia;
  return !!navigator.getUserMedia;
}

function hasRTCPeerConnection() {
  window.RTCPeerConnection = window.RTCPeerConnection ||
    window.webkitRTCPeerConnection || window.mozRTCPeerConnection;
  window.RTCSessionDescription = window.RTCSessionDescription ||
    window.webkitRTCSessionDescription || window.mozRTCSessionDescription;
  window.RTCIceCandidate = window.RTCIceCandidate ||
    window.webkitRTCIceCandidate || window.mozRTCIceCandidate;
  return !!window.RTCPeerConnection;
}

function hasFileApi() {
  return window.File && window.FileReader && window.FileList && window.Blob;
}

connectButton.addEventListener("click", function () {
  var theirUsername = theirUsernameInput.value;

  if (theirUsername.length > 0) {
    startPeerConnection(theirUsername);
  }
});

function startPeerConnection(user) {
  connectedUser = user;

  // Begin the offer
  yourConnection.createOffer(function (offer) {
    send({
      type: "offer",
      offer: offer
    });
    yourConnection.setLocalDescription(offer);
  }, function (error) {
    alert("An error has occurred.");
  });
}

function onOffer(offer, name) {
  connectedUser = name;
  yourConnection.setRemoteDescription(new RTCSessionDescription(offer));

  yourConnection.createAnswer(function (answer) {
    yourConnection.setLocalDescription(answer);

    send({
      type: "answer",
      answer: answer
    });
  }, function (error) {
    alert("An error has occurred");
  });
}

function onAnswer(answer) {
  yourConnection.setRemoteDescription(new RTCSessionDescription(answer));
}

function onCandidate(candidate) {
  yourConnection.addIceCandidate(new RTCIceCandidate(candidate));
}

function onLeave() {
  connectedUser = null;
  yourConnection.close();
  yourConnection.onicecandidate = null;
  setupPeerConnection();
}

We set up references to our elements on the screen and get the peer connection ready to be used. When the user decides to log in, we send a login message to the server. The server will reply with a success message telling the user they are logged in. From here, we allow the user to connect to another WebRTC user by entering that user's username.
This sends an offer and an answer, connecting the two users together through the peer connection. Once the peer connection is created, we connect the users through a data channel so that we can send arbitrary data across. Hopefully, this is pretty straightforward and you are able to get this code up and running in no time. It should all be familiar to you by now. This is the last time we are going to refer to this code, so get comfortable with it before moving on!

Getting a reference to a file

Now that we have a simple page up and running, we can start working on the file sharing part of the application. The first thing the user needs to do is select a file from their computer's filesystem. This is already taken care of by the input element on the page. The browser will allow the user to select a file from their computer and then save a reference to that file in the browser for later use.

When the user presses the Send button, we want to get a reference to the file that the user has selected. To do this, you need to add an event listener, as shown in the following code:

sendButton.addEventListener("click", function (event) {
  var files = document.querySelector('#files').files;

  if (files.length > 0) {
    dataChannelSend({
      type: "start",
      data: files[0]
    });

    sendFile(files[0]);
  }
});

You might be surprised at how simple the code is to get this far! This is the amazing thing about working within a browser: much of the hard work has already been done for you. Here, we get a reference to our input element and the files that it has selected. The input element supports both multiple and single selection of files, but in this example we will only work with one file at a time. We then make sure we have a file to work with, tell the other user that we want to start sending data, and then call our sendFile function, which we will implement later in this article.
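The listener above passes the selected file straight into the "start" message. The `dataChannelSend` helper is not shown in this excerpt; if it serializes messages with `JSON.stringify` (an assumption), a `File` object typically serializes to `{}`, because its properties are exposed through getters rather than own fields. A minimal sketch of copying the fields explicitly follows; `describeFile` and `buildStartMessage` are hypothetical names, not part of the book's code.

```javascript
// Sketch: build a JSON-safe "start" message from a File-like object.
// `describeFile` and `buildStartMessage` are hypothetical helper names.
function describeFile(file) {
  // Copy the fields one by one so they survive JSON serialization.
  return {
    name: file.name,
    size: file.size,
    type: file.type,
    lastModified: file.lastModified
  };
}

function buildStartMessage(file) {
  return JSON.stringify({ type: "start", data: describeFile(file) });
}

// Usage with a plain stand-in object; a real File exposes the same properties.
var exampleFile = {
  name: "example.gif",
  size: 1745559,
  type: "image/gif",
  lastModified: 1364868324000
};
console.log(buildStartMessage(exampleFile));
```

The receiving peer can then call `JSON.parse` on the message and read the name, size, and type before any file data arrives.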
Now, you might think that the object we get back contains the entire data inside our file. What we actually get back from the input element is an object representing metadata about the file itself. Let's take a look at this metadata:

{
  lastModified: 1364868324000,
  lastModifiedDate: "2013-04-02T02:05:24.000Z",
  name: "example.gif",
  size: 1745559,
  type: "image/gif"
}

This gives us the information we need to tell the other user that we want to start sending a file named example.gif. It also gives a few other important details, such as the type of file we are sending and when it was last modified. The next step is to read the file's data and send it through the data channel. This is no easy task, however, and we will require some special logic to do so.

Summary

In this article, we covered the basics of using the File API and retrieving a file from a user's computer. The article also discussed the page setup for the application using JavaScript and getting a reference to a file.

Resources for Article:

Further resources on this subject:

• WebRTC with SIP and IMS [article]
• Using the WebRTC Data API [article]
• Applications of WebRTC [article]
Packt
08 Jul 2015
41 min read

Integrating Google Play Services

In this article by Raul Portales, author of the book Mastering Android Game Development, we will cover the tools that Google Play Services offers for game developers. We'll see the integration of achievements and leaderboards in detail, and take an overview of events and quests, saved games, and turn-based and real-time multiplayer.

Google provides Google Play Services as a way to use special features in apps, and the game services subset is the one that interests us the most. Note that Google Play Services is updated as an app that is independent of the operating system. This allows us to assume that most players will have the latest version of Google Play Services installed. More and more features are being moved from the Android SDK to Play Services because of this.

Play Services offers much more than just services for games, but there is a whole section dedicated exclusively to games: Google Play Game Services (GPGS). These features include achievements, leaderboards, quests, saved games, gifts, and even multiplayer support.

GPGS also comes with a standalone app called "Play Games" that shows the user the games he or she has been playing, the latest achievements, and the games his or her friends play. It is a very interesting way to get exposure for your game.

Even taken on their own, achievements and leaderboards are two concepts that most games use nowadays, so why make your own custom ones when you can rely on the ones made by Google?

GPGS can be used on many platforms: Android, iOS, and the web, among others. It is most widely used on Android, since it is included as a part of Google apps.

There is extensive step-by-step documentation online, but the details are scattered over different places. We will put them together here and link you to the official documentation for more detailed information.
For this article, you are expected to have a developer account and access to the Google Play Developer Console. It is also advisable to know the process of signing and releasing an app. If you are not familiar with it, there is very detailed official documentation at http://developer.android.com/distribute/googleplay/start.html.

There are two sides to GPGS: the developer console and the code. We will alternate between the two while talking about the different features.

Setting up the developer console

Now that we are approaching the release state, we have to start working with the developer console. The first thing we need to do is go to the Game services section of the console to create and configure a new game. In the left menu, there is an option labeled Game services. This is where you have to click. Once in the Game services section, click on Add new game.

This brings us to the setup dialog. If you are using other Google services like Google Maps or Google Cloud Messaging (GCM) in your game, you should select the second option and move forward. Otherwise, you can just fill in the fields for I don't use any Google APIs on my game yet and continue. If you don't know whether you are already using them, you probably aren't.

Now, it is time to link a game to it. I recommend that you publish your game beforehand as an alpha release. This will let you select it from the list when you start typing the package name. Publishing the game to the alpha channel before adding it to Game services makes it much easier to configure. If you are not familiar with signing and releasing your app, check out the official documentation at http://developer.android.com/tools/publishing/app-signing.html.

Finally, there are only two steps that we have to take when we link the first app. We need to authorize it and provide branding information.
The authorization will generate an OAuth key, which we don't need to use since it is only required for other platforms, and also a game ID. This ID is shared by all the linked apps and we will need it to log in. There is no need to write it down now, though; it can be found easily in the console at any time.

Note that the app we have added is configured with the release key. If you continue and try the login integration, you will get an error telling you that the app was signed with the wrong certificate.

You have two ways to work with this limitation:

• Always make a release build to test GPGS integration
• Add your debug-signed game as a linked app

I recommend that you add the debug-signed app as a linked app. To do this, we just need to link another app and configure it with the SHA1 fingerprint of the debug key. To obtain it, we have to open a terminal and run the keytool utility:

keytool -exportcert -alias androiddebugkey -keystore <path-to-debug-keystore> -list -v

Note that in Windows, the debug keystore can be found at C:\Users\<USERNAME>\.android\debug.keystore. On Mac and Linux, the debug keystore is typically located at ~/.android/debug.keystore.

Figure: Dialog to link the debug application on the Game Services console

Now, we have the game configured. We could continue creating achievements and leaderboards in the console, but we will put that aside and make sure that we can sign in and connect with GPGS.

The only users who can sign in to GPGS while a game is not published are the testers. You can make the alpha and/or beta testers of a linked app become testers of the game services, and you can also add e-mail addresses by hand. You can modify this in the Testing tab.

The e-mail of the owner of the developer console is prefilled as a tester. If you have problems logging in, double-check the list of testers.
A game service that is not published will not appear in the feed of the Play Services app, but it will be possible to test and modify it. This is why it is a good idea to keep it in draft mode until the game itself is ready and publish both the game and the game services at the same time.

Setting up the code

The first thing we need to do is add the Google Play Services library to our project. This should already have been done by the wizard when we created the project, but I recommend that you double-check it now.

The library needs to be added to the build.gradle file of the main module. Note that Android Studio projects contain a top-level build.gradle and a module-level build.gradle for each module. We will modify the one that is under the mobile module.

Make sure that the Play Services library is listed under dependencies:

apply plugin: 'com.android.application'

dependencies {
  compile 'com.android.support:appcompat-v7:22.1.1'
  compile 'com.google.android.gms:play-services:7.3.0'
}

At the time of writing, the latest version is 7.3.0. The basic features have not changed much and they are unlikely to change. You could force Gradle to use a specific version of the library, but in general I recommend that you use the latest version.

Once you have it, save the changes and click on Sync Project with Gradle Files.

To be able to connect with GPGS, we need to let the game know what the game ID is. This is done through the <meta-data> tag in AndroidManifest.xml. You could hardcode the value here, but it is highly recommended that you set it as a resource in your Android project.

We are going to create a new file for this under res/values, which we will name play_services.xml. In this file we will put the game ID, but later we will also have the achievement and leaderboard IDs in it.
Using a separate file for these values is recommended because they are constants that do not need to be translated. The manifest then references them as follows:

<application>
  <meta-data android:name="com.google.android.gms.games.APP_ID"
    android:value="@string/app_id" />
  <meta-data android:name="com.google.android.gms.version"
    android:value="@integer/google_play_services_version" />
  [...]
</application>

Adding this metadata is extremely important. If you forget to update AndroidManifest.xml, the app will crash when you try to sign in to Google Play services. Note that the integer for the gms version is defined in the library and we do not need to add it to our file. If you forget to add the game ID to the strings, the app will also crash.

Now, it is time to proceed to sign in. The process is quite tedious and requires many checks, so Google has released an open source project named BaseGameUtils, which makes it easier. Unfortunately, this project is not a part of the Play Services library and it is not even available as a library, so we have to get it from GitHub (either check it out or download the source as a ZIP file). BaseGameUtils shields us from the complexity of handling the connection with Play Services.

Even more cumbersome, BaseGameUtils is not available as a standalone download and has to be downloaded together with another project. The fact that this significant piece of code is not a part of the official library makes it quite tedious to set up. Why it has been done like this is something that I do not comprehend myself.

The project that contains BaseGameUtils is called android-basic-samples and it can be downloaded from https://github.com/playgameservices/android-basic-samples.

Adding BaseGameUtils is not as straightforward as we would like it to be. Once android-basic-samples is downloaded, open your game project in Android Studio. Click on File > Import Module and navigate to the directory where you downloaded android-basic-samples.
Select the BaseGameUtils module in the BasicSamples/libraries directory and click on OK.

Finally, update the dependencies in the build.gradle file for the mobile module and sync Gradle again:

dependencies {
  compile project(':BaseGameUtils')
  [...]
}

After all these steps to set up the project, we are finally ready to begin the sign in. We will make our main Activity extend BaseGameActivity, which takes care of all the handling of the connections and the sign in with Google Play Services.

One more detail: until now, we were using Activity and not FragmentActivity as the base class for YassActivity. BaseGameActivity extends FragmentActivity, and this change will interfere with the behavior of our dialogs when calling navigateBack. We can change the base class of BaseGameActivity or modify navigateBack to perform a pop on the fragment navigation hierarchy. I recommend the second approach:

public void navigateBack() {
  // Do a pop on the navigation history
  getFragmentManager().popBackStack();
}

This util class has been designed to work with single-activity games. It can be used in multiple activities, but it is not straightforward. This is another good reason to keep the game in a single activity.

The default behavior of BaseGameActivity is to try to log in each time the Activity is started. If the user agrees to sign in, the sign in will happen automatically. But if the user declines, he or she will be asked again several times. I personally find this intrusive and annoying, and I recommend that you only prompt to log in to Google Play services once (and again if the user logs out). We can always provide a login entry point in the app.

This is very easy to change. The default number of attempts is set to 3 and it is a part of the code of GameHelper:

// Should we start the flow to sign the user in automatically on startup? If
// so, up to how many times in the life of the application?
static final int DEFAULT_MAX_SIGN_IN_ATTEMPTS = 3;
int mMaxAutoSignInAttempts = DEFAULT_MAX_SIGN_IN_ATTEMPTS;

So, we just have to configure it for our activity, adding one line of code during onCreate to replace the default behavior with the one we want, which is to try only once:

getGameHelper().setMaxAutoSignInAttempts(1);

Finally, there are two methods that we can override to act when the user successfully logs in and when there is a problem: onSignInSucceeded and onSignInFailed. We will use them when we update the main menu at the end of the article.

Further use of GPGS is made via the GameHelper and/or the GoogleApiClient, which is a part of the GameHelper. We can obtain a reference to the GameHelper using the getGameHelper method of BaseGameActivity.

Now that the user can sign in to Google Play services, we can continue with achievements and leaderboards. Let's go back to the developer console.

Achievements

We will first define a few achievements in the developer console and then see how to unlock them in the game. Note that to publish any game with GPGS, you need to define at least five achievements. No other feature is mandatory, but achievements are.

If you want to use GPGS with a game that has no achievements, I recommend that you add five dummy secret achievements and let them be.

To add an achievement, we just need to navigate to the Achievements tab on the left and click on Add achievement.

The menu to add a new achievement has a few fields that are mostly self-explanatory. They are as follows:

• Name: the name that will be shown (can be localized to different languages).
• Description: the description of the achievement to be shown (can also be localized to different languages).
• Icon: the icon of the achievement as a 512x512 px PNG image.
  This will be used to show the achievement in the list and also to generate the locked image and the in-game popup when it is unlocked.
• Incremental achievements: if the achievement requires a set of steps to be completed, it is called an incremental achievement and can be shown with a progress bar. We will have an incremental achievement to illustrate this.
• Initial state: Revealed or Hidden, depending on whether we want the achievement to be shown or not. When an achievement is shown, the name and description are visible, so players know what they have to do to unlock it. A hidden achievement, on the other hand, is a secret and can be a funny surprise when unlocked. We will have two secret achievements.
• Points: GPGS allows each game to have 1,000 points to give for unlocking achievements. This gets converted to XP in the player profile on Google Play Games. This can be used to highlight that some achievements are harder than others, and therefore grant a bigger reward. You cannot change these once they are published, so if you plan to have more achievements in the future, plan ahead with the points.
• List order: the order in which the achievements are shown. It is not followed all the time, since in the Play Games app the unlocked ones are shown before the locked ones. It is still handy to rearrange them.

Figure: Dialog to add an achievement on the developer console

As we already decided, we will have five achievements in our game and they will be as follows:

• Big Score: score over 100,000 points in one game. This is to be granted while playing.
• Asteroid killer: destroy 100 asteroids. This will count them across different games and is an incremental achievement.
• Survivor: survive for 60 seconds.
• Target acquired: a hidden achievement. Hit 20 asteroids in a row without missing a shot. This is meant to reward players that only shoot when they should.
• Target lost: this is supposed to be a funny achievement, granted when you miss with 10 bullets in a row.
  It is also hidden, because otherwise it would be too easy to unlock.

So, we created some images for them and added them to the console.

Figure: The developer console with all the configured achievements

Each achievement has a string ID. We will need these IDs to unlock the achievements in our game, but Google has made it easy for us. There is a link at the bottom named Get resources that pops up a dialog with the string resources we need. We can just copy them from there and paste them into the play_services.xml file we have already created in our project.

Architecture

For our game, given that we only have five achievements, we are going to add the code for achievements directly into the ScoreGameObject. This leaves less code for you to read, so we can focus on how it is done. However, for real production code I recommend that you define a dedicated architecture for achievements.

The recommended architecture is to have an AchievementsManager class that loads all the achievements when the game starts and stores them in three lists:

• All achievements
• Locked achievements
• Unlocked achievements

Then, we have an Achievement base class with an abstract check method that we implement for each one of them:

public abstract boolean check(GameEngine gameEngine, GameEvent gameEvent);

This base class takes care of loading the achievement state from local storage (I recommend using SharedPreferences for this) and modifying it according to the result of check.

The achievements check is done at the AchievementsManager level using a checkLockedAchievements method that iterates over the list of achievements that can be unlocked. This method should be called as a part of onEventReceived of GameEngine.

This architecture allows you to check only the achievements that are yet to be unlocked, and also to keep all the achievements included in the game in one dedicated place.

In our case, since we are keeping the score inside the ScoreGameObject, we are going to add all the achievements code there.
Note that making the GameEngine take care of the score and having it as a variable that other objects can read are also recommended design patterns, but it was simpler to do this as a part of ScoreGameObject. Unlocking achievements To handle achievements, we need to have access to an object of the class GoogleApiClient. We can get a reference to it in the constructor of ScoreGameObject: private final GoogleApiClient mApiClient; public ScoreGameObject(YassBaseFragment parent, View view, int viewResId) { […] mApiClient = parent.getYassActivity().getGameHelper().getApiClient(); } The parent Fragment has a reference to the Activity, which has a reference to the GameHelper, which has a reference to the GoogleApiClient. Unlocking an achievement requires just a single line of code, but we also need to check whether the user is connected to Google Play services before trying to unlock an achievement. This is necessary because if the user has not signed in, an exception is thrown and the game crashes. But this check is not enough. In the edge case when the user logs out manually from Google Play services (which can be done in the achievements screen), the connection will not be closed and there is no way to know whether he or she has logged out. We are going to create a utility method to unlock the achievements that does all the checks, wraps the unlock call in a try/catch block, and makes the API client disconnect if an exception is raised: private void unlockSafe(int resId) { if (mApiClient.isConnecting() || mApiClient.isConnected()) {    try {      Games.Achievements.unlock(mApiClient, getString(resId));    } catch (Exception e) {      mApiClient.disconnect();    } } } Even with all the checks, the code is still very simple. Let's work on the particular achievements we have defined for the game.
Even though they are very specific, the methodology to track game events and variables and then check for achievements to unlock is in itself generic, and serves as a real-life example of how to deal with achievements. The achievements we have designed require us to count some game events and also the running time. For the last two achievements, we need to make a new GameEvent for the case when a bullet misses, which we have not created until now. The code in the Bullet object to trigger this new GameEvent is as follows: @Override public void onUpdate(long elapsedMillis, GameEngine gameEngine) { mY += mSpeedFactor * elapsedMillis; if (mY < -mHeight) {    removeFromGameEngine(gameEngine);    gameEngine.onGameEvent(GameEvent.BulletMissed); } } Now, let's work inside ScoreGameObject. We are going to have a method that checks achievements each time an asteroid is hit. There are three achievements that can be unlocked when that event happens: Big score, because hitting an asteroid gives us points Target acquired, because it requires consecutive asteroid hits Asteroid killer, because it counts the total number of asteroids that have been destroyed The code is like this: private void checkAsteroidHitRelatedAchievements() { if (mPoints > 100000) {    // Unlock achievement    unlockSafe(R.string.achievement_big_score); } if (mConsecutiveHits >= 20) {    unlockSafe(R.string.achievement_target_acquired); } // Increment achievement of asteroids hit if (mApiClient.isConnecting() || mApiClient.isConnected()) {    try {      Games.Achievements.increment(mApiClient, getString(R.string.achievement_asteroid_killer), 1);    } catch (Exception e) {      mApiClient.disconnect();    } } } We check the total points and the number of consecutive hits to unlock the corresponding achievements. The "Asteroid killer" achievement is a bit of a different case, because it is an incremental achievement. These type of achievements do not have an unlock method, but rather an increment method. 
Each time we increment the value, progress on the achievement is updated. Once the progress is 100 percent, it is unlocked automatically. This makes incremental achievements much easier to use than tracking the progress locally. But we still need to do all the checks as we did for unlockSafe. We are using a variable named mConsecutiveHits, whose bookkeeping we have not shown yet. It is updated inside onGameEvent, which is also the place where the other hidden achievement, Target lost, is checked. Some initialization for the "Survivor" achievement is also done here: public void onGameEvent(GameEvent gameEvent) { if (gameEvent == GameEvent.AsteroidHit) {    mPoints += POINTS_GAINED_PER_ASTEROID_HIT;    mPointsHaveChanged = true;    mConsecutiveMisses = 0;    mConsecutiveHits++;    checkAsteroidHitRelatedAchievements(); } else if (gameEvent == GameEvent.BulletMissed) {    mConsecutiveMisses++;    mConsecutiveHits = 0;    if (mConsecutiveMisses >= 10) {      unlockSafe(R.string.achievement_target_lost);    } } else if (gameEvent == GameEvent.SpaceshipHit) {    mTimeWithoutDie = 0; } […] } Each time we hit an asteroid, we increment the number of consecutive asteroid hits and reset the number of consecutive misses. Similarly, each time a bullet misses, we increment the number of consecutive misses and reset the number of consecutive hits. The threshold of 10 consecutive misses matches the definition of Target lost given earlier. As a side note, each time the spaceship is destroyed we reset the time without dying, which is used for "Survivor", but this is not the only time when the time without dying should be updated.
We have to reset it when the game starts, and modify it inside onUpdate by just adding the elapsed milliseconds that have passed: @Override public void startGame(GameEngine gameEngine) { mTimeWithoutDie = 0; […] }   @Override public void onUpdate(long elapsedMillis, GameEngine gameEngine) { mTimeWithoutDie += elapsedMillis; if (mTimeWithoutDie > 60000) {    unlockSafe(R.string.achievement_survivor); } } So, once the game has been running for 60,000 milliseconds since it started or since a spaceship was destroyed, we unlock the "Survivor" achievement. With this, we have all the code we need to unlock the achievements we have created for the game. Let's finish this section with some comments on the system and the developer console: As a rule of thumb, you can edit most of the details of an achievement until you publish it to production. Once your achievement has been published, it cannot be deleted. You can only delete an achievement in its prepublished state. There is a button labeled Delete at the bottom of the achievement screen for this. You can also reset the progress for achievements while they are in draft. This reset happens for all players at once. There is a button labeled Reset achievement progress at the bottom of the achievement screen for this. Also note that GameBaseActivity does a lot of logging. So, if your device is connected to your computer and you run a debug build, you may see that it lags sometimes. This does not happen in a release build for which the log is removed. Leaderboards Since YASS has only one game mode and one score in the game, it makes sense to have only one leaderboard on Google Play Game Services. Leaderboards are managed from their own tab inside the Game services area of the developer console. Unlike achievements, it is not mandatory to have any leaderboard to be able to publish your game. If your game has different levels of difficulty, you can have a leaderboard for each of them. 
This also applies if the game has several values that measure player progress, you can have a leaderboard for each of them. Managing leaderboards on Play Games console Leaderboards can be created and managed in the Leaderboards tag. When we click on Add leaderboard, we are presented with a form that has several fields to be filled. They are as follows: Name: the display name of the leaderboard, which can be localized. We will simply call it High Scores. Score formatting: this can be Numeric, Currency, or Time. We will use Numeric for YASS. Icon: a 512x512 px icon to identify the leaderboard. Ordering: Larger is better / Smaller is better. We are going to use Larger is better, but other score types may be Smaller is better as in a racing game. Enable tamper protection: this automatically filters out suspicious scores. You should keep this on. Limits: if you want to limit the score range that is shown on the leaderboard, you can do it here. We are not going to use this List order: the order of the leaderboards. Since we only have one, it is not really important for us. Setting up a leaderboard on the Play Games console Now that we have defined the leaderboard, it is time to use it in the game. As happens with achievements, we have a link where we can get all the resources for the game in XML. So, we proceed to get the ID of the leaderboard and add it to the strings defined in the play_services.xml file. We have to submit the scores at the end of the game (that is, a GameOver event), but also when the user exits a game via the pause button. To unify this, we will create a new GameEvent called GameFinished that is triggered after a GameOver event and after the user exits the game. 
We will update the stopGame method of GameEngine, which is called in both cases to trigger the event: public void stopGame() { if (mUpdateThread != null) {    synchronized (mLayers) {      onGameEvent(GameEvent.GameFinished);    }    mUpdateThread.stopGame();  mUpdateThread = null; } […] } We have to set the updateThread to null after sending the event, to prevent this code being run twice. Otherwise, we could send each score more than once. Similarly, as happens for achievements, submitting a score is very simple, just a single line of code. But we also need to check that the GoogleApiClient is connected and we still have the same edge case when an Exception is thrown. So, we need to wrap it in a try/catch block. To keep everything in the same place, we will put this code inside ScoreGameObject: @Override public void onGameEvent(GameEvent gameEvent) { […] else if (gameEvent == GameEvent.GameFinished) {    // Submit the score    if (mApiClient.isConnecting() || mApiClient.isConnected()) {      try {        Games.Leaderboards.submitScore(mApiClient,          getLeaderboardId(), mPoints);      }      catch (Exception e){        mApiClient.disconnect();      }    } } }   private String getLeaderboardId() { return mParent.getString(R.string.leaderboard_high_scores); } This is really straightforward. GPGS is now receiving our scores and it takes care of the timestamp of the score to create daily, weekly, and all time leaderboards. It also uses your Google+ circles to show the social score of your friends. All this is done automatically for you. The final missing piece is to let the player open the leaderboards and achievements UI from the main menu as well as trigger a sign in if they are signed out. Opening the Play Games UI To complete the integration of achievements and leaderboards, we are going to add buttons to open the native UI provided by GPGS to our main menu. 
For this, we are going to place two buttons in the bottom–left corner of the screen, opposite the music and sound buttons. We will also check whether we are connected or not; if not, we will show a single sign-in button. For these buttons we will use the official images of GPGS, which are available for developers to use. Note that you must follow the brand guidelines while using the icons and they must be displayed as they are and not modified. This also provides a consistent look and feel across all the games that support Play Games. Since we have seen a lot of layouts already, we are not going to include another one that is almost the same as something we already have. The main menu with the buttons to view achievements and leaderboards. To handle these new buttons we will, as usual, set the MainMenuFragment as OnClickListener for the views. We do this in the same place as the other buttons, that is, inside onViewCreated: @Override public void onViewCreated(View view, Bundle savedInstanceState) { super.onViewCreated(view, savedInstanceState); [...] view.findViewById(    R.id.btn_achievements).setOnClickListener(this); view.findViewById(    R.id.btn_leaderboards).setOnClickListener(this); view.findViewById(R.id.btn_sign_in).setOnClickListener(this); } As happened with achievements and leaderboards, the work is done using static methods that receive a GoogleApiClient object. We can get this object from the GameHelper that is a part of the BaseGameActivity, like this: GoogleApiClient apiClient = getYassActivity().getGameHelper().getApiClient(); To open the native UI, we have to obtain an Intent and then start an Activity with it. It is important that you use startActivityForResult, since some data is passed back and forth. To open the achievements UI, the code is like this: Intent achievementsIntent = Games.Achievements.getAchievementsIntent(apiClient); startActivityForResult(achievementsIntent, REQUEST_ACHIEVEMENTS); This works out of the box. 
It automatically grays out the icons for the unlocked achievements, adds a counter and progress bar to the one that is in progress, and a padlock to the hidden ones. Similarly, to open the leaderboards UI we obtain an intent from the Games.Leaderboards class instead: Intent leaderboardsIntent = Games.Leaderboards.getLeaderboardIntent( apiClient, getString(R.string.leaderboard_high_scores)); startActivityForResult(leaderboardsIntent, REQUEST_LEADERBOARDS); In this case, we are asking for a specific leaderboard, since we only have one. We could use getLeaderboardsIntent instead, which will open the Play Games UI for the list of all the leaderboards. We can have an intent to open the list of leaderboards or a specific one. What remains to be done is to replace the buttons for the login one when the user is not connected. For this, we will create a method that reads the state and shows and hides the views accordingly: private void updatePlayButtons() { GameHelper gameHelper = getYassActivity().getGameHelper(); if (gameHelper.isConnecting() || gameHelper.isSignedIn()) {    getView().findViewById(      R.id.btn_achievements).setVisibility(View.VISIBLE);    getView().findViewById(      R.id.btn_leaderboards).setVisibility(View.VISIBLE);    getView().findViewById(      R.id.btn_sign_in).setVisibility(View.GONE); } else {    getView().findViewById(      R.id.btn_achievements).setVisibility(View.GONE);    getView().findViewById(      R.id.btn_leaderboards).setVisibility(View.GONE);    getView().findViewById(      R.id.btn_sign_in).setVisibility(View.VISIBLE); } } This method decides whether to remove or make visible the views based on the state. We will call it inside the important state-changing methods: onLayoutCompleted: the first time we open the game to initialize the UI. onSignInSucceeded: when the user successfully signs in to GPGS. onSignInFailed: this can be triggered when we auto sign in and there is no connection. It is important to handle it. 
onActivityResult: when we come back from the Play Games UI, in case the user has logged out. But nothing is as easy as it looks. In fact, when the user signs out and does not exit the game, GoogleApiClient keeps the connection open. Therefore the value of isSignedIn from GameHelper still returns true. This is the edge case we have been talking about all through the article. As a result of this edge case, there is an inconsistency in the UI that shows the achievements and leaderboards buttons when it should show the login one. When the user logs out from Play Games, GoogleApiClient keeps the connection open. This can lead to confusion. Unfortunately, this has been marked as work as expected by Google. The reason is that the connection is still active and it is our responsibility to parse the result in the onActivityResult method to determine the new state. But this is not very convenient. Since it is a rare case we will just go for the easiest solution, which is to wrap it in a try/catch block and make the user sign in if he or she taps on leaderboards or achievements while not logged in. This is the code we have to handle the click on the achievements button, but the one for leaderboards is equivalent: else if (v.getId() == R.id.btn_achievements) { try {    GoogleApiClient apiClient =      getYassActivity().getGameHelper().getApiClient();    Intent achievementsIntent =      Games.Achievements.getAchievementsIntent(apiClient);    startActivityForResult(achievementsIntent,      REQUEST_ACHIEVEMENTS); } catch (Exception e) {    GameHelper gameHelper = getYassActivity().getGameHelper();    gameHelper.disconnect();    gameHelper.beginUserInitiatedSignIn(); } } Basically, we have the old code to open the achievements activity, but we wrap it in a try/catch block. If an exception is raised, we disconnect the game helper and begin a new login using the beginUserInitiatedSignIn method. It is very important to disconnect the gameHelper before we try to log in again. 
Otherwise, the login will not work. We must disconnect from GPGS before we can log in using the method from the GameHelper. Finally, there is the case when the user clicks on the login button, which just triggers the login using the beginUserInitiatedSignIn method from the GameHelper: if (v.getId() == R.id.btn_sign_in) { getYassActivity().getGameHelper().beginUserInitiatedSignIn(); } Once you have published your game and the game services, achievements and leaderboards will not appear in the game description on Google Play straight away. It is required that "a fair amount of users" have used them. You have done nothing wrong, you just have to wait. Other features of Google Play services Google Play Game Services provides more features for game developers than achievements and leaderboards. None of them really fit the game we are building, but it is useful to know they exist just in case your game needs them. You can save yourself lots of time and effort by using them and not reinventing the wheel. The other features of Google Play Games Services are: Events and quests: these allow you to monitor game usage and progression. Also, they add the possibility of creating time-limited events with rewards for the players. Gifts: as simple as it sounds, you can send a gift to other players or request one to be sent to you. Yes, this is seen in the very mechanical Facebook games popularized a while ago. Saved games: the standard concept of a saved game. If your game has progression or can unlock content based on user actions, you may want to use this feature. Since it is saved in the cloud, saved games can be accessed across multiple devices. Turn-based and real-time multiplayer: Google Play Game Services provides an API to implement turn-based and real-time multiplayer features without you needing to write any server code. 
If your game is multiplayer and has an online economy, it may be worth making your own server and granting virtual currency only on the server to prevent cheating. Otherwise, it is fairly easy to crack the gifts/reward system and a single person can ruin the complete game economy. However, if there is no online game economy, the benefits of gifts and quests may be more important than the fact that someone can hack them. Let's take a look at each of these features. Events The event's APIs provides us with a way to define and collect gameplay metrics and upload them to Google Play Game Services. This is very similar to the GameEvents we are already using in our game. Events should be a subset of the game events of our game. Many of the game events we have are used internally as a signal between objects or as a synchronization mechanism. These events are not really relevant outside the engine, but others could be. Those are the events we should send to GPGS. To be able to send an event from the game to GPGS, we have to create it in the developer console first. To create an event, we have to go to the Events tab in the developer console, click on Add new event, and fill in the following fields: Name: a short name of the event. The name can be up to 100 characters. This value can be localized. Description: a longer description of the event. The description can be up to 500 characters. This value can also be localized. Icon: the icon for the event of the standard 512x512 px size. Visibility: as for achievements, this can be revealed or hidden. Format: as for leaderboards, this can be Numeric, Currency, or Time. Event type: this is used to mark events that create or spend premium currency. This can be Premium currency sink, Premium currency source, or None. While in the game, events work pretty much as incremental achievements. 
You can increment the event counter using the following line of code: Games.Events.increment(mGoogleApiClient, myEventId, 1); You can delete events that are in the draft state or that have been published as long as the event is not in use by a quest. You can also reset the player progress data for the testers of your events as you can do for achievements. While the events can be used as an analytics system, their real usefulness appears when they are combined with quests. Quests A quest is a challenge that asks players to complete an event a number of times during a specific time frame to receive a reward. Because a quest is linked to an event, to use quests you need to have created at least one event. You can create a quest from the quests tab in the developer console. A quest has the following fields to be filled: Name: the short name of the quest. This can be up to 100 characters and can be localized. Description: a longer description of the quest. Your quest description should let players know what they need to do to complete the quest. The description can be up to 500 characters. The first 150 characters will be visible to players on cards such as those shown in the Google Play Games app. Icon: a square icon that will be associated with the quest. Banner: a rectangular image that will be used to promote the quest. Completion Criteria: this is the configuration of the quest itself. It consists of an event and the number of times the event must occur. Schedule: the start and end date and time for the quest. GPGS uses your local time zone, but stores the values as UTC. Players will see these values appear in their local time zone. You can mark a checkbox to notify users when the quest is about to end. Reward Data: this is specific to each game. It can be a JSON object, specifying the reward. This is sent to the client when the quest is completed. 
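To make the completion criteria and schedule fields more concrete, here is a small local model of a quest in plain Java. It is only a sketch of the concept (an event that must occur a number of times inside a time window) and does not use or mirror the GPGS API; QuestProgress and all of its members are hypothetical names introduced for illustration.

```java
public class QuestSketch {

    // Local model of a quest: the event must occur `target` times
    // between startMillis (inclusive) and endMillis (exclusive).
    static class QuestProgress {
        private final String eventId;
        private final int target;
        private final long startMillis;
        private final long endMillis;
        private int count;

        QuestProgress(String eventId, int target, long startMillis, long endMillis) {
            this.eventId = eventId;
            this.target = target;
            this.startMillis = startMillis;
            this.endMillis = endMillis;
        }

        // Records one occurrence of an event.
        // Returns true only when this occurrence completes the quest.
        boolean onEvent(String id, long nowMillis) {
            if (!eventId.equals(id) || isCompleted()) {
                return false; // unrelated event, or quest already done
            }
            if (nowMillis < startMillis || nowMillis >= endMillis) {
                return false; // outside the scheduled window
            }
            count++;
            return isCompleted();
        }

        boolean isCompleted() {
            return count >= target;
        }

        int getCount() {
            return count;
        }
    }

    public static void main(String[] args) {
        QuestProgress quest = new QuestProgress("asteroid_destroyed", 3, 1000L, 2000L);
        quest.onEvent("asteroid_destroyed", 1100L);
        quest.onEvent("asteroid_destroyed", 1200L);
        boolean completed = quest.onEvent("asteroid_destroyed", 1300L);
        System.out.println("Completed: " + completed);
    }
}
```

In the real service this counting happens on Google's side; the game only increments the event and reacts when it is told that the quest has been completed.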
Once configured in the developer console, you can do two things with the quests: Display the list of quests Process a quest completion To get the list of quests, we start an activity with an intent that is provided to us via a static method as usual: Intent questsIntent = Games.Quests.getQuestsIntent(mGoogleApiClient,    Quests.SELECT_ALL_QUESTS); startActivityForResult(questsIntent, QUESTS_INTENT); To be notified when a quest is completed, all we have to do is register a listener: Games.Quests.registerQuestUpdateListener(mGoogleApiClient, this); Once we have set the listener, the onQuestCompleted method will be called once the quest is completed. After completing the processing of the reward, the game should call claim to inform Play Game services that the player has claimed the reward. The following code snippet shows how you might override the onQuestCompleted callback: @Override public void onQuestCompleted(Quest quest) { // Claim the quest reward. Games.Quests.claim(mGoogleApiClient, quest.getQuestId(),    quest.getCurrentMilestone().getMilestoneId()); // Process the RewardData to provision a specific reward. String reward = new    String(quest.getCurrentMilestone().getCompletionRewardData(),    Charset.forName("UTF-8")); } The rewards themselves are defined by the client. As we mentioned before, this will make the game quite easy to crack and get rewards. But usually, avoiding the hassle of writing your own server is worth it. Gifts The gifts feature of GPGS allows us to send gifts to other players and to request them to send us one as well. This is intended to make the gameplay more collaborative and to improve the social aspect of the game. As for other GPGS features, we have a built-in UI provided by the library that can be used. In this case, to send and request gifts for in-game items and resources to and from friends in their Google+ circles. The request system can make use of notifications. 
There are two types of requests that players can send using the game gifts feature in Google Play Game Services: a wish request, to ask for in-game items or some other form of assistance from their friends, and a gift request, to send in-game items or some other form of assistance to their friends. A player can specify one or more target request recipients from the default request-sending UI. A gift or wish can be consumed (accepted) or dismissed by a recipient. To see the gifts API in detail, you can visit https://developers.google.com/games/services/android/giftRequests. Again, as for quest rewards, this is done entirely by the client, which makes the game susceptible to cheating. Saved games The saved games service offers cloud game saving slots. Your game can retrieve the saved game data to allow returning players to continue a game at their last save point from any device. This service makes it possible to synchronize a player's game data across multiple devices. For example, if you have a game that runs on Android, you can use the saved games service to allow a player to start a game on their Android phone and then continue playing the game on a tablet without losing any of their progress. This service can also be used to ensure that a player's game play continues from where it was left off even if their device is lost, destroyed, or traded in for a newer model, or if the game was reinstalled. The saved games service does not know about the game internals, so it provides a field that is an unstructured binary blob where you can read and write the game data. A game can write an arbitrary number of saved games for a single player, subject to user quota, so there is no hard requirement to restrict players to a single save file.
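Because the service only sees an opaque byte array, the game has to decide how its state is encoded. The following is a minimal sketch of one way to do it in plain Java, assuming a hypothetical SaveState with three fields and a version number at the start of the blob; none of these names come from the GPGS library, and the call that actually uploads the bytes is left out.

```java
import java.nio.ByteBuffer;

public class SaveBlobSketch {

    // Bump this whenever the layout of the blob changes.
    static final int FORMAT_VERSION = 1;

    // version (int) + points (int) + asteroids (int) + play time (long)
    static final int BLOB_SIZE = 4 + 4 + 4 + 8;

    // Hypothetical game state; a real game would store whatever it needs.
    static class SaveState {
        final int points;
        final int asteroidsDestroyed;
        final long playTimeMillis;

        SaveState(int points, int asteroidsDestroyed, long playTimeMillis) {
            this.points = points;
            this.asteroidsDestroyed = asteroidsDestroyed;
            this.playTimeMillis = playTimeMillis;
        }
    }

    // Encodes the state as a fixed-size, big-endian binary blob.
    static byte[] toBytes(SaveState state) {
        return ByteBuffer.allocate(BLOB_SIZE)
                .putInt(FORMAT_VERSION)
                .putInt(state.points)
                .putInt(state.asteroidsDestroyed)
                .putLong(state.playTimeMillis)
                .array();
    }

    // Decodes a blob written by toBytes, rejecting unknown versions.
    static SaveState fromBytes(byte[] blob) {
        ByteBuffer in = ByteBuffer.wrap(blob);
        int version = in.getInt();
        if (version != FORMAT_VERSION) {
            throw new IllegalArgumentException("Unsupported save format: " + version);
        }
        // Fields are read in the same order in which they were written.
        return new SaveState(in.getInt(), in.getInt(), in.getLong());
    }

    public static void main(String[] args) {
        byte[] blob = toBytes(new SaveState(120000, 57, 61500L));
        SaveState restored = fromBytes(blob);
        System.out.println(blob.length + " bytes, points = " + restored.points);
    }
}
```

Writing a version number first is a design choice that lets a later release of the game recognize and migrate old saves instead of misreading them.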
The API for saved games also receives some metadata that is used by Google Play Games to populate the UI and to present useful information in the Google Play Game app (for example, last updated timestamp). Saved games has several entry points and actions, including how to deal with conflicts in the saved games. To know more about these check out the official documentation at https://developers.google.com/games/services/android/savedgames. Multiplayer games If you are going to implement multiplayer, GPGS can save you a lot of work. You may or may not use it for the final product, but it will remove the need to think about the server-side until the game concept is validated. You can use GPGS for turn-based and real-time multiplayer games. Although each one is completely different and uses a different API, there is always an initial step where the game is set up and the opponents are selected or invited. In a turn-based multiplayer game, a single shared state is passed among the players and only the player that owns the turn has permission to modify it. Players take turns asynchronously according to an order of play determined by the game. A turn is finished explicitly by the player using an API call. Then the game state is passed to the other players, together with the turn. There are many cases: selecting opponents, creating a match, leaving a match, canceling, and so on. The official documentation at https://developers.google.com/games/services/android/turnbasedMultiplayer is quite exhaustive and you should read through it if you plan to use this feature. In a real-time multiplayer there is no concept of turn. Instead, the server uses the concept of room: a virtual construct that enables network communication between multiple players in the same game session and lets players send data directly to one another, a common concept for game servers. Real-time multiplayer service is based on the concept of Room. 
The API of real-time multiplayer allows us to easily: Manage network connections to create and maintain a real-time multiplayer room Provide a player-selection user interface to invite players to join a room, look for random players for auto-matching, or a combination of both Store participant and room-state information on the Play Game services' servers while the game is running Send room invitations and updates to players To check the complete documentation for real-time games, please visit the official web at https://developers.google.com/games/services/android/realtimeMultiplayer. Summary We have added Google Play services to YASS, including setting up the game in the developer console and adding the required libraries to the project. Then, we defined a set of achievements and added the code to unlock them. We have used normal, incremental, and hidden achievement types to showcase the different options available. We have also configured a leaderboard and submitted the scores, both when the game is finished and when it is exited via the pause dialog. Finally, we have added links to the native UI for leaderboards and achievements to the main menu. We have also introduced the concepts of events, quests, and gifts and the features of saved games and multiplayer that Google Play Game services offers. The game is ready to publish now. Resources for Article: Further resources on this subject: SceneKit [article] Creating Games with Cocos2d-x is Easy and 100 percent Free [article] SpriteKit Framework and Physics Simulation [article]

Packt
08 Jul 2015
7 min read

What's BitBake All About?

In this article by H M Irfan Sadiq, the author of the book Using Yocto Project with BeagleBone Black, we will move one step ahead by detailing different aspects of the basic engine behind Yocto Project and other similar projects. This engine is BitBake. Covering all the various aspects of BitBake in one article is not possible; it would require a complete book. We will familiarize you as much as possible with this tool. We will cover the following topics in this article: Legacy tools and BitBake Execution of BitBake Legacy tools and BitBake This discussion does not intend to start any religious row between other alternatives and BitBake. Every step in the evolution has its own importance, which cannot be denied, and so do other available tools. BitBake was developed with the Embedded Linux Development domain in mind. So, it tried to solve the problems faced in this core area, and in my opinion, it addresses these in the best way to date. You might get the same output using other tools, such as Buildroot, but the flexibility and ease provided by BitBake in this domain is second to none. The major difference is in how the problem is approached. Legacy tools were developed with packages in mind, but BitBake evolved to solve the problems faced during the creation of BSPs, or embedded distributions. Let's go through the challenges faced in this specific domain and understand how BitBake helps us face them. Cross-compilation BitBake takes care of cross compilation. You do not have to worry about it for each package you are building. You can use the same set of packages and build for different platforms seamlessly. Resolving inter-package dependencies Resolving the dependencies of packages on each other, and fulfilling them, is a real pain. With BitBake, we only need to specify the dependencies using the different dependency types available, and BitBake takes care of them for us. We can handle both build and runtime dependencies.
Variety of target distribution BitBake supports a variety of target distribution creations. We can define a full new distribution of our own, by choosing package management, image types, and other artifacts to fulfill our requirements. Coupling to build system BitBake is not very dependent on the build system we use to build our target images. We don't use libraries and tools installed on the system; we build their native versions and use them instead. This way, we are not dependent on the build system's root filesystem. Variety of build systems distros Since BitBake is very loosely coupled to the build system's distribution type, it's very easy to use on various distributions. Variety of architecture We have to support different architectures. We don't have to modify our recipes for each package. We can write our recipes so that features, parameters, and flags are picked up conditionally. Exploit parallelism For the simplest projects, we have to build images and do more than a thousand tasks. These tasks require us to use the full power available to us, whether they are computational or related to memory. BitBake's architecture supports us in this regard, using its scheduler to run as many tasks in parallel as it can, or as we configure. Also, when we say task, it should not be confused with package, but it is a part of package. A package can contain many tasks, (fetch, compile, configure, package, populate_sysroot, and so on), and all these can run in parallel. Easy to use, extend, and collaborate Keeping and relying on metadata keeps things simple and configurable. Almost nothing is hard coded. Thus, we can configure things according to our requirements. Also, BitBake provides us with a mechanism to reuse things that are already developed. We can keep our metadata structured, so that it gets applied/extended conditionally. You will learn these tricks when we will explore layers. 
BitBake execution

To produce a package or image, BitBake performs a series of steps that we need to walk through to understand the workflow. In certain cases some of these steps can be skipped, but we treat those as corner cases and do not discuss them here; for details, refer to the BitBake user manual.

Parsing metadata

When we invoke the bitbake command to build our image, the first thing it does is parse our base configuration metadata. This metadata consists of build_bb/conf/bblayers.conf, multiple layer/conf/layer.conf files, and poky/meta/conf/bitbake.conf. This data can be of the following types:

Configuration data
Class data
Recipes

The key variables are BBFILES and BBPATH, which are constructed from the layer.conf files. The constructed BBPATH variable is used to locate configuration files under conf/ and class files under classes/ directories. The BBFILES variable is used to find recipe files (.bb and .bbappend). bblayers.conf is used to set these variables. Next, the bitbake.conf file is parsed. If there is no bblayers.conf file, it is assumed that the user has set BBFILES and BBPATH directly in the environment.

Once the configuration files have been dealt with, class files are included and parsed. These class files are specified using the INHERIT variable. Next, BitBake uses the BBFILES variable to construct a list of recipes to parse, along with any append files. After parsing, the recipe's values for the various variables are stored in the datastore. Once a recipe has been parsed, BitBake has:

A list of tasks that the recipe has defined
A set of data consisting of keys and values
Dependency information of the tasks

Preparing tasklist

BitBake starts by looking through the PROVIDES set in recipe files. The PROVIDES set defaults to the recipe name, and we can assign multiple values to it. We can have multiple recipes providing a similar package. This is accomplished by setting PROVIDES in the recipes.
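Provider selection of this kind can be pictured with a small Python sketch. This is only a toy model, not BitBake's actual resolver: the function `pick_provider`, the recipe names, and the version tuples are all hypothetical. It assumes the rule stated in this article, namely that an explicit preference wins and that otherwise the highest version is chosen.

```python
def pick_provider(target, candidates, preferred=None):
    """Pick one recipe among several that PROVIDE the same target.

    candidates: list of (recipe_name, version_tuple) pairs (hypothetical data).
    preferred:  dict mapping a target to the recipe name we want pinned,
                playing the role of a PREFERRED_PROVIDER setting.
    """
    preferred = preferred or {}
    wanted = preferred.get(target)
    if wanted is not None:
        matches = [c for c in candidates if c[0] == wanted]
        if matches:
            # Several versions of the pinned recipe may exist; take the newest.
            return max(matches, key=lambda c: c[1])
    # No usable preference: fall back to the highest version overall.
    return max(candidates, key=lambda c: c[1])


# Two made-up recipes that both provide "virtual/kernel".
candidates = [("linux-vendor", (3, 14)), ("linux-mainline", (4, 1))]

default_choice = pick_provider("virtual/kernel", candidates)
pinned_choice = pick_provider(
    "virtual/kernel", candidates, {"virtual/kernel": "linux-vendor"}
)
```

Without a preference the sketch returns the newer `linux-mainline` entry; with the preference set it returns `linux-vendor` even though its version is lower.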
When making such recipes part of the build, we have to define PREFERRED_PROVIDER_foo to select the specific recipe that should provide foo. We can do this in multiple locations. In the case of kernels, we set it in the machine.conf file. BitBake iterates through the list of targets it has to build and resolves them, along with their dependencies. If PREFERRED_PROVIDER is not set and multiple versions of a package exist, BitBake will choose the highest version.

Each target/recipe has multiple tasks, such as fetch, unpack, configure, and compile. BitBake treats each of these tasks as an independent unit in order to exploit parallelism in a multicore environment. Although these tasks are executed sequentially for a single package/recipe, they run in parallel across multiple packages. We may be compiling one recipe, configuring a second, and unpacking a third in parallel. Or, at the start, eight packages may all be fetching their sources.

For now, we should know that dependencies are defined using DEPENDS and RDEPENDS. In DEPENDS, we list the dependencies that our package needs in order to build successfully, so BitBake takes care of building these dependencies before our package is built. RDEPENDS lists the dependencies that our package needs in order to run successfully on the target system, so BitBake takes care of providing these dependencies on the target's root filesystem.

Executing tasks

Tasks can be defined using shell syntax or Python. In the case of shell tasks, a shell script is created under a temporary directory as run.do_taskname.pid and then executed. The generated shell script contains all the exported variables and the shell functions, with all the variables expanded. Output from the task is saved in the same directory as log.do_taskname.pid. In the case of errors, BitBake shows the full path to this logfile, which is helpful for debugging.
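The idea that tasks of one recipe run in order while tasks of different recipes overlap can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not BitBake's scheduler: the task names, the `schedule_waves` helper, and the dependency table are hypothetical, and each "wave" simply groups the tasks whose prerequisites are already complete.

```python
def schedule_waves(deps):
    """Group tasks into waves that could run in parallel.

    deps maps a task name to the set of tasks it must wait for.
    Returns a list of waves; every task in a wave has all of its
    prerequisites in earlier waves.
    """
    remaining = {task: set(waits) for task, waits in deps.items()}
    done = set()
    waves = []
    while remaining:
        ready = sorted(t for t, waits in remaining.items() if waits <= done)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        done.update(ready)
        for task in ready:
            del remaining[task]
    return waves


# Hypothetical recipes "a" and "b"; b needs a to be built first,
# which is the kind of relation DEPENDS expresses.
deps = {
    "a:fetch": set(),
    "a:compile": {"a:fetch"},
    "b:fetch": set(),
    "b:compile": {"b:fetch", "a:compile"},
}

waves = schedule_waves(deps)
```

Both fetch tasks land in the first wave, so they could run side by side, while `b:compile` has to wait until `a:compile` has finished.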
Summary In this article, you learned the goals and problem areas that BitBake has considered, thus making itself a unique option for Embedded Linux Development. You also learned how BitBake actually works. Resources for Article: Further resources on this subject: Learning BeagleBone [article] Baking Bits with Yocto Project [article] The BSP Layer [article]

Roberto González
08 Jul 2015
14 min read

How to build a Remote-controlled TV with Node-Webkit

Node-webkit is one of the most promising technologies to come out in the last few years. It lets you ship a native desktop app for Windows, Mac, and Linux just using HTML, CSS, and some JavaScript. These are the exact same languages you use to build any web app. You basically get your very own Frameless Webkit to build your app, which is then supercharged with NodeJS, giving you access to some powerful libraries that are not available in a typical browser. As a demo, we are going to build a remote-controlled Youtube app. This involves creating a native app that displays YouTube videos on your computer, as well as a mobile client that will let you search for and select the videos you want to watch straight from your couch. You can download the finished project from https://github.com/Aerolab/youtube-tv. You need to follow the first part of this guide (Getting started) to set up the environment and then run run.sh (on Mac) or run.bat (on Windows) to start the app. Getting started First of all, you need to install Node.JS (a JavaScript platform), which you can download from http://nodejs.org/download/. The installer comes bundled with NPM (Node.JS Package Manager), which lets you install everything you need for this project. Since we are going to be building two apps (a desktop app and a mobile app), it’s better if we get the boring HTML+CSS part out of the way, so we can concentrate on the JavaScript part of the equation. Download the project files from https://github.com/Aerolab/youtube-tv/blob/master/assets/basics.zip and put them in a new folder. You can name the project’s folder youtube-tv  or whatever you want. 
The folder should look like this:

- index.html // This is the starting point for our desktop app
- css // Our desktop app styles
- js // This is where the magic happens
- remote // This is where the magic happens (Part 2)
- libraries // FFMPEG libraries, which give you H.264 video support in Node-Webkit
- player // Our youtube player
- Gruntfile.js // Build scripts
- run.bat // run.bat runs the app on Windows
- run.sh // sh run.sh runs the app on Mac

Now open the Terminal (on Mac or Linux) or a new command prompt (on Windows) right in that folder. Now we’ll install a couple of dependencies we need for this project, so type these commands to install node-gyp and grunt-cli. Each one will take a few seconds to download and install:

On Mac or Linux:

    sudo npm install node-gyp -g
    sudo npm install grunt-cli -g

On Windows:

    npm install node-gyp -g
    npm install grunt-cli -g

Leave the Terminal open. We’ll be using it again in a bit. All Node.JS apps start with a package.json file (our manifest), which holds most of the settings for your project, including which dependencies you are using. Go ahead and create your own package.json file (right inside the project folder) with the following contents. Feel free to change anything you like, such as the project name, the icon, or anything else. Check out the documentation at https://github.com/rogerwang/node-webkit/wiki/Manifest-format:

    {
      "//": "The // keys in package.json are comments.",

      "//": "Your project’s name. Go ahead and change it!",
      "name": "Remote",

      "//": "A simple description of what the app does.",
      "description": "An example of node-webkit",

      "//": "This is the first html the app will load. Just leave this this way",
      "main": "app://host/index.html",

      "//": "The version number. 0.0.1 is a good start :D",
      "version": "0.0.1",

      "//": "This is used by Node-Webkit to set up your app.",
      "window": {
        "//": "The Window Title for the app",
        "title": "Remote",
        "//": "The Icon for the app",
        "icon": "css/images/icon.png",
        "//": "Do you want the File/Edit/Whatever toolbar?",
        "toolbar": false,
        "//": "Do you want a standard window around your app (a title bar and some borders)?",
        "frame": true,
        "//": "Can you resize the window?",
        "resizable": true
      },

      "webkit": {
        "plugin": false,
        "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Safari/537.36"
      },

      "//": "These are the libraries we’ll be using:",
      "//": "Express is a web server, which will handle the files for the remote",
      "//": "Socket.io lets you handle events in real time, which we'll use with the remote as well.",
      "dependencies": {
        "express": "^4.9.5",
        "socket.io": "^1.1.0"
      },

      "//": "And these are just task handlers to make things easier",
      "devDependencies": {
        "grunt": "^0.4.5",
        "grunt-contrib-copy": "^0.6.0",
        "grunt-node-webkit-builder": "^0.1.21"
      }
    }

You’ll also find Gruntfile.js, which takes care of downloading all of the node-webkit assets and building the app once we are ready to ship. Feel free to take a look into it, but it’s mostly boilerplate code. Once you’ve set everything up, go back to the Terminal and install everything you need by typing:

    npm install
    grunt nodewebkitbuild

You may run into some issues when doing this on Mac or Linux. In that case, try using sudo npm install and sudo grunt nodewebkitbuild. npm install installs all of the dependencies listed in package.json, both the regular ones and the development ones such as grunt. grunt nodewebkitbuild then downloads the Windows and Mac versions of node-webkit, sets them up so they can play videos, and builds the app. Wait a bit for everything to install properly and we’re ready to get started.
Note that if you are using Windows, you might get a scary error related to Visual C++ when running npm install. Just ignore it.

Building the desktop app

All web apps (or websites for that matter) start with an index.html file. We are going to be creating just that to get our app to run:

    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8"/>
      <title>Youtube TV</title>
      <link href='http://fonts.googleapis.com/css?family=Roboto:500,400' rel='stylesheet' type='text/css'/>
      <link href="css/normalize.css" rel="stylesheet" type="text/css"/>
      <link href="css/styles.css" rel="stylesheet" type="text/css"/>
    </head>
    <body>
      <div id="serverInfo">
        <h1>Youtube TV</h1>
      </div>
      <div id="videoPlayer">
      </div>
      <script src="js/jquery-1.11.1.min.js"></script>
      <script src="js/youtube.js"></script>
      <script src="js/app.js"></script>
    </body>
    </html>

As you may have noticed, we are using three scripts for our app: jQuery (pretty well known at this point), a Youtube video player, and finally app.js, which contains our app's logic. Let’s dive into that! First of all, we need to create the basic elements for our remote control. The easiest way of doing this is to create a basic web server and serve a small web app that can search Youtube, select a video, and have some play/pause controls so we don’t have any good reasons to get up from the couch. Open js/app.js and type the following:

    // Show the Developer Tools. And yes, Node-Webkit has developer tools built in!
    // Uncomment the next line to open them automatically
    //require('nw.gui').Window.get().showDevTools();

    // Express is a web server, which will allow us to create a small web app with which to control the player
    var express = require('express');
    var app = express();
    var server = require('http').Server(app);
    var io = require('socket.io')(server);

    // We'll be opening up our web server on Port 8080 (which doesn't require root privileges)
    // You can access this server at http://127.0.0.1:8080
    var serverPort = 8080;
    server.listen(serverPort);

    // All the static files (css, js, html) for the remote will be served using Express.
    // These assets are in the /remote folder
    app.use('/', express.static('remote'));

With those 7 lines of code (not counting comments) we just got a neat web server working on port 8080. If you were paying attention to the code, you may have noticed that we required something called socket.io. This lets us use websockets with minimal effort, which means we can communicate with, from, and to our remote instantly. You can learn more about socket.io at http://socket.io/. Let’s set that up next in app.js:

    // Socket.io handles the communication between the remote and our app in real time,
    // so we can instantly send commands from a computer to our remote and back
    io.on('connection', function (socket) {

      // When a remote connects to the app, let it know immediately the current status of the video (play/pause)
      socket.emit('statusChange', Youtube.status);

      // This is what happens when we receive the watchVideo command (picking a video from the list)
      socket.on('watchVideo', function (video) {
        // video contains a bit of info about our video (id, title, thumbnail)
        // Order our Youtube Player to watch that video
        Youtube.watchVideo(video);
      });

      // These are playback controls.
      // They receive the "play" and "pause" events from the remote
      socket.on('play', function () {
        Youtube.playVideo();
      });
      socket.on('pause', function () {
        Youtube.pauseVideo();
      });

    });

    // Notify all the remotes when the playback status changes (play/pause)
    // This is done with io.emit, which sends the same message to all the remotes
    Youtube.onStatusChange = function (status) {
      io.emit('statusChange', status);
    };

That’s the desktop part done! In a few dozen lines of code we got a web server running at http://127.0.0.1:8080 that can receive commands from a remote to watch a specific video, as well as handling some basic playback controls (play and pause). We are also notifying the remotes of the status of the player as soon as they connect so they can update their UI with the correct buttons (if it’s playing, show the pause button and vice versa). Now we just need to build the remote.

Building the remote control

The server is just half of the equation. We also need to add the corresponding logic on the remote control, so it’s able to communicate with our app.
In remote/index.html, add the following HTML:

    <!DOCTYPE html>
    <html>
    <head>
      <meta charset="utf-8"/>
      <title>TV Remote</title>
      <meta name="viewport" content="width=device-width, initial-scale=1, maximum-scale=1"/>
      <link rel="stylesheet" href="/css/normalize.css"/>
      <link rel="stylesheet" href="/css/styles.css"/>
    </head>
    <body>
      <div class="controls">
        <div class="search">
          <input id="searchQuery" type="search" value="" placeholder="Search on Youtube..."/>
        </div>
        <div class="playback">
          <button class="play">&gt;</button>
          <button class="pause">||</button>
        </div>
      </div>
      <div id="results" class="video-list">
      </div>
      <div class="__templates" style="display:none;">
        <article class="video">
          <figure><img src="" alt=""/></figure>
          <div class="info">
            <h2></h2>
          </div>
        </article>
      </div>
      <script src="/socket.io/socket.io.js"></script>
      <script src="/js/jquery-1.11.1.min.js"></script>
      <script src="/js/search.js"></script>
      <script src="/js/remote.js"></script>
    </body>
    </html>

Again, we have a few libraries: Socket.io is served automatically by our desktop app at /socket.io/socket.io.js, and it manages the communication with the server. jQuery is somehow always there, search.js manages the integration with the Youtube API (you can take a look if you want), and remote.js handles the logic for the remote. The remote itself is pretty simple. It can look for videos on Youtube, and when we click on a video it connects with the app, telling it to play the video with socket.emit. Let’s dive into remote/js/remote.js to make this thing work:

    // First of all, connect to the server (our desktop app)
    var socket = io.connect();

    // Search youtube when the user stops typing.
    // This gives us an automatic search.
    var searchTimeout = null;
    $('#searchQuery').on('keyup', function (event) {
      clearTimeout(searchTimeout);
      searchTimeout = setTimeout(function () {
        searchYoutube($('#searchQuery').val());
      }, 500);
    });

    // When we click on a video, watch it on the App
    $('#results').on('click', '.video', function (event) {
      // Send an event to notify the server we want to watch this video
      socket.emit('watchVideo', $(this).data());
    });

    // When the server tells us that the player changed status (play/pause), alter the playback controls
    socket.on('statusChange', function (status) {
      if (status === 'play') {
        $('.playback .pause').show();
        $('.playback .play').hide();
      }
      else if (status === 'pause' || status === 'stop') {
        $('.playback .pause').hide();
        $('.playback .play').show();
      }
    });

    // Notify the app when we hit the play button
    $('.playback .play').on('click', function (event) {
      socket.emit('play');
    });

    // Notify the app when we hit the pause button
    $('.playback .pause').on('click', function (event) {
      socket.emit('pause');
    });

This is very similar to our server, except we are using socket.emit a lot more often to send commands back to our desktop app, telling it which videos to play and handling our basic play/pause controls. The only thing left to do is make the app run. Ready? Go to the terminal again and type:

If you are on a Mac:

    sh run.sh

If you are on Windows:

    run.bat

If everything worked properly, you should see the app, and if you open a web browser to http://127.0.0.1:8080 the remote client will open up. Search for a video, pick anything you like, and it’ll play in the app. This also works if you point any other device on the same network to your computer’s IP, which brings me to the next (and last) point.

Finishing touches

There is one small improvement we can make: print out the computer’s IP to make it easier to connect to the app from any other device on the same Wi-Fi network (like a smartphone).
In js/app.js add the following code to find out the IP and update our UI so it’s the first thing we see when we open the app:

    // Find the local IP
    function getLocalIP(callback) {
      require('dns').lookup(require('os').hostname(), function (err, add, fam) {
        typeof callback == 'function' ? callback(add) : null;
      });
    }

    // To make things easier, find out the machine's ip and communicate it
    getLocalIP(function (ip) {
      $('#serverInfo h1').html('Go to<br/><strong>http://' + ip + ':' + serverPort + '</strong><br/>to open the remote');
    });

The next time you run the app, the first thing you’ll see is the IP for your computer, so you just need to type that URL in your smartphone to open the remote and control the player from any computer, tablet, or smartphone (as long as they are in the same Wi-Fi network). That's it! You can start expanding on this to improve the app: Why not open the app in fullscreen by default? Why not get rid of the horrible default frame and create your own? You can actually designate any div as a window handle with CSS (using -webkit-app-region: drag), so you can drag the window by that div and create your own custom title bar.

Summary

While the app has a lot of interlocking parts, it's a good first project to find out what you can achieve with node-webkit in just a few minutes. I hope you enjoyed this post!

About the author

Roberto González is the co-founder of Aerolab, “an awesome place where we really push the barriers to create amazing, well-coded designs for the best digital products”. He can be reached at @robertcode.

Packt
08 Jul 2015
5 min read

Creating an F# Project

In this article by Adnan Masood, author of the book Learning F# Functional Data Structures and Algorithms, we see how to set up the IDE and create our first F# project.

"Ah yes, Haskell. Where all the types are strong, all the men carry arrows, and all the children are above average." – marked trees (on the city of Haskell)

The perceived difficulty of functional programming is exaggerated; the essence of this paradigm is to explicitly recognize and enforce referential transparency. We will see how to set up the tooling for Visual Studio 2013 and for F# 3.1, the currently available version of F# at the time of writing. We will review the F# 4.0 preview features by the end of this project. After we get the tooling sorted out, we will review some simple algorithms; starting with recursion on a typical Fibonacci sequence and the Tower of Hanoi, we will then perform lazy evaluation on a quick sort example. In this article, we will cover the following topics:

Setting up Visual Studio and F# compiler to work together
Setting up the environment and running your F# programs

Setting up the IDE

As developers, we love our IDEs (Integrated Development Environments) and Visual Studio.NET is probably the best IDE for .NET development; no offense to Eclipse bloatware Luna. From the open source perspective, there has been a recent major development in making the .NET framework available as open source and on Mac and Linux platforms. Microsoft announced a pre-release of F# 4.0 in Visual Studio 2015 Preview and it will be available as part of the full release. To install and run F#, there are various options available for all platforms, sizes, and budgets. For those with a fear of commitments, there is the online interactive version of TryFsharp at http://www.tryfsharp.org/ where you can code in the browser. For Windows users, you have a few options.
Until VS.NET 2015 comes out, you can try out the freely available Visual Studio Community 2013 or a Visual Studio 2013 trial edition, with trial being the keyword. The trial editions include Ultimate, Premium, and Professional versions. The free community edition IDE can be downloaded from https://www.visualstudio.com/en-us/news/vs2013-community-vs.aspx and the trial editions can be downloaded from http://www.visualstudio.com/downloads/download-visual-studio-vs. Alternatively, there are express editions available at no cost. Visual Studio Express 2013 for Windows Desktop Web editions can be downloaded from http://www.visualstudio.com/downloads/download-visual-studio-vs#d-express-windows-desktop.

F# support comes with Visual Studio; however, the F# team releases regular updates in the form of the Visual F# tools, which package the latest updates to the F# compiler, F# Interactive, the runtime, and Visual Studio integration. The tools can be downloaded from www.microsoft.com/en-us/download/details.aspx?id=44011. The F# tools contain the F# command-line compiler (fsc.exe) and F# Interactive (fsi.exe), which are the easiest way to get started with F#. The fsi.exe executable can be found in C:\Program Files (x86)\Microsoft SDKs\F#\<version>\Framework\<version>. The <version> placeholders in the preceding path are replaced by the versions installed on your machine.

If you just want to use the F# compiler and tools from the command line, you can download the .NET framework 4.5 or above from https://www.microsoft.com/en-in/download/details.aspx?id=30653. You will also need the Windows SDK for associated dependencies, which can be downloaded from http://msdn.microsoft.com/windows/desktop/bg162891. Alternatively, Tsunami is a free IDE for F# that you can download from http://tsunami.io/download.html and use to build applications. CloudSharper by IntelliFactory is in beta but shows promise as a web-based IDE.
For more information regarding CloudSharper, refer to http://cloudsharper.com/. In this article, we will be using Visual Studio 2013 Professional Edition and FSI (F# Interactive), but you can use the trial or Express edition, or the FSI command line, to run the examples and exercises.

Your first F# project

Going through installation screens and showing how to click Next would be discourteous to our reader's intelligence. Therefore, we will leave step-by-step installation to other, more verbose texts. Let's start with our first F# project in Visual Studio. In the preceding screenshot, you can see the F# Interactive window at the bottom. Here we have selected FILE | New | Project because we are creating a new project of the F# type. There are a few project types available, including console applications and F# library. For ease of explanation, let's begin with a Console Application as shown in the next screenshot:

Alternatively, from within Visual Studio, we can use FSharp Interactive. FSharp Interactive (FSI) is an effective tool for testing out your code quickly. You can open the FSI window by selecting View | Other Windows | F# Interactive from the Visual Studio IDE as shown in the next screenshot:

FSI lets you run code from a console which is similar to a shell. You can access the FSI executable directly at C:\Program Files (x86)\Microsoft SDKs\F#\<version>\Framework\<version>. FSI maintains session context, which means that constructs created earlier in an FSI session are still available in later parts of the code. The FsiAnyCPU.exe executable file is the 64-bit counterpart of F# Interactive; Visual Studio determines which executable to use based on whether the machine's architecture is 32-bit or 64-bit. You can also change the F# Interactive parameters and settings from the Options dialog as shown in the following screenshot:

Summary

In this article, you learned how to set up an IDE for F# in Visual Studio 2013 and created a new F# project.
Resources for Article: Further resources on this subject: Test-driven API Development with Django REST Framework [article] edX E-Learning Course Marketing [article] Introduction to Microsoft Azure Cloud Services [article]

Packt
08 Jul 2015
20 min read

Working with large data sources

In this article by Duncan M. McGreggor, author of the book Mastering matplotlib, we look at the use of NumPy in the world of matplotlib and big data, the problems with large data sources, and possible solutions to these problems.

Most of the data that users feed into matplotlib when generating plots is from NumPy. NumPy is one of the fastest ways of processing numerical and array-based data in Python (if not the fastest), so this makes sense. However, by default, NumPy operates on data held in memory. If the dataset that you want to plot is larger than the total RAM available on your system, performance is going to plummet. In the following section, we're going to take a look at an example that illustrates this limitation. But first, let's get our notebook set up, as follows:

In [1]: import matplotlib
        matplotlib.use('nbagg')
        %matplotlib inline

Here are the modules that we are going to use:

In [2]: import glob, io, math, os
        import psutil
        import numpy as np
        import pandas as pd
        import tables as tb
        from scipy import interpolate
        from scipy.stats import burr, norm
        import matplotlib as mpl
        import matplotlib.pyplot as plt
        from IPython.display import Image

We'll use the custom style sheet that we created earlier, as follows:

In [3]: plt.style.use("../styles/superheroine-2.mplstyle")

An example problem

To keep things manageable for an in-memory example, we're going to limit our generated dataset to 100 million points by using one of SciPy's many statistical distributions, as follows:

In [4]: (c, d) = (10.8, 4.2)
        (mean, var, skew, kurt) = burr.stats(c, d, moments='mvsk')

The Burr distribution, also known as the Singh–Maddala distribution, is commonly used to model household income.
Next, we'll use the burr object's method to generate a random population with our desired count, as follows:

In [5]: r = burr.rvs(c, d, size=100000000)

Creating 100 million data points in the last call took about 10 seconds on a moderately recent workstation, with the RAM usage peaking at about 2.25 GB (before the garbage collection kicked in). Let's make sure that it's the size we expect, as follows:

In [6]: len(r)
Out[6]: 100000000

If we save this to a file, it weighs in at about three-fourths of a gigabyte:

In [7]: r.tofile("../data/points.bin")
In [8]: ls -alh ../data/points.bin
        -rw-r--r-- 1 oubiwann staff 763M Mar 20 11:35 points.bin

This actually does fit in memory on a machine with 8 GB of RAM, but generating much larger files tends to be problematic. We can reuse it multiple times, though, to reach a size that is larger than what can fit in the system RAM. Before we do this, let's take a look at what we've got by generating a smooth curve for the probability distribution, as follows:

In [9]: x = np.linspace(burr.ppf(0.0001, c, d),
                        burr.ppf(0.9999, c, d), 100)
        y = burr.pdf(x, c, d)
In [10]: (figure, axes) = plt.subplots(figsize=(20, 10))
         axes.plot(x, y, linewidth=5, alpha=0.7)
         axes.hist(r, bins=100, normed=True)
         plt.show()

The following plot is the result of the preceding code:

Our plot of the Burr probability distribution function, along with the 100-bin histogram with a sample size of 100 million points, took about 7 seconds to render. This is due to the fact that NumPy handles most of the work, and we only displayed a limited number of visual elements. What would happen if we did try to plot all the 100 million points?
This can be checked by the following code:

In [11]: (figure, axes) = plt.subplots()
         axes.plot(r)
         plt.show()
formatters.py:239: FormatterWarning: Exception in image/png formatter: Allocated too many blocks

After about 30 seconds of crunching, the preceding error was thrown: the Agg backend (a shared library) simply couldn't handle the number of artists required to render all the points. For now, this case illustrates the point that we made a while back: our first plot rendered relatively quickly because we were selective about the data we chose to present, given the large number of points with which we are working. However, let's say we have data from files that are too large to fit into memory. What do we do about this? Possible ways to address this include the following:

Moving the data out of the memory and into the filesystem
Moving the data off the filesystem and into the databases

We will explore examples of these in the following section.

Big data on the filesystem

The first of the two proposed solutions for large datasets involves not burdening the system memory with data, but rather leaving it on the filesystem. There are several ways to accomplish this, but the following two methods in particular are the most common in the world of NumPy and matplotlib:

NumPy's memmap function: This function creates memory-mapped files that are useful if you wish to access small segments of large files on the disk without having to read the whole file into the memory.
PyTables: This is a package that is used to manage hierarchical datasets. It is built on top of the HDF5 and NumPy libraries and is designed to efficiently and easily cope with extremely large amounts of data.

We will examine each in turn.

NumPy's memmap function

Let's restart the IPython kernel by going to the IPython menu at the top of the notebook page, selecting Kernel, and then clicking on Restart. When the dialog box pops up, click on Restart.
Then, re-execute the first few lines of the notebook by importing the required libraries and getting our style sheet set up. Once the kernel is restarted, take a look at the RAM utilization on your system for a fresh Python process for the notebook: In [4]: Image("memory-before.png") Out[4]: The following screenshot shows the RAM utilization for a fresh Python process: Now, let's load the array data that we previously saved to disk and recheck the memory utilization, as follows: In [5]: data = np.fromfile("../data/points.bin")        data_shape = data.shape        data_len = len(data)        data_len Out[5]: 100000000 In [6]: Image("memory-after.png") Out[6]: The following screenshot shows the memory utilization after loading the array data: This took about five seconds to load, with the memory consumption equivalent to the file size of the data. This means that if we wanted to build some sample data that was too large to fit in the memory, we'd need about 11 of those files concatenated, as follows: In [7]: 8 * 1024 Out[7]: 8192 In [8]: filesize = 763        8192 / filesize Out[8]: 10.73656618610747 However, this is only if the entire memory was available. Let's see how much memory is available right now, as follows: In [9]: del data In [10]: psutil.virtual_memory().available / 1024**2 Out[10]: 2449.1796875 That's 2.5 GB. So, to overrun our RAM, we'll just need a fraction of the total. This is done in the following way: In [11]: 2449 / filesize Out[11]: 3.2096985583224114 The preceding output means that we only need four of our original files to create a file that won't fit in memory. However, in the following section, we will still use 11 files to ensure that data, if loaded into the memory, will be much larger than the memory. How do we create this large file for demonstration purposes (knowing that in a real-life situation, the data would already be created and potentially quite large)? 
We can try to use numpy.tile to create a file of the desired size (larger than memory), but this can make our system unusable for a significant period of time. Instead, let's use numpy.memmap, which treats a file on the disk as an array, thus letting us work with data that is too large to fit into the memory. Let's load the data file again, but this time as a memory-mapped array, as follows:

In [12]: data = np.memmap(
             "../data/points.bin", mode="r", shape=data_shape)

Loading the array as a memmap object was very quick (compared to bringing the contents of the file into the memory), taking less than a second to complete. Note that no dtype is passed in this call, so np.memmap falls back to its default of uint8 and the array exposes the first data_len bytes of the file rather than its floating-point values; this is why the outputs that follow are small integers.

Now, let's create a new file to write the data to. It is sized so that the data would be larger than our total system memory if it were held in memory (it will be smaller on the disk):

In [13]: big_data_shape = (data_len * 11,)
         big_data = np.memmap(
             "../data/many-points.bin", dtype="uint8",
             mode="w+", shape=big_data_shape)

The preceding code creates a 1 GB file, which is mapped to an array that has the shape we requested and just contains zeros:

In [14]: ls -alh ../data/many-points.bin
         -rw-r--r-- 1 oubiwann staff 1.0G Apr 2 11:35 many-points.bin
In [15]: big_data.shape
Out[15]: (1100000000,)
In [16]: big_data
Out[16]: memmap([0, 0, 0, ..., 0, 0, 0], dtype=uint8)

Now, let's fill the empty data structure with copies of the data we saved to the 763 MB file, as follows:

In [17]: for x in range(11):
             start = x * data_len
             end = (x * data_len) + data_len
             big_data[start:end] = data
         big_data
Out[17]: memmap([ 90, 71, 15, ..., 33, 244, 63], dtype=uint8)

If you check your system memory before and after, you will only see minimal changes, which confirms that we are not creating an 8 GB data structure in memory. Furthermore, the whole operation only takes a few seconds.
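The fill step above can be sketched at a much smaller scale. This is a minimal illustration rather than the article's code: it assumes NumPy is installed, writes to a temporary directory instead of ../data, uses a 1,000-value stand-in for the 100-million-point sample, and passes dtype explicitly to both np.memmap calls so that the mapped values are the original float64 numbers. The names workdir, src_path, dst_path, and copies are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical scratch locations; the article uses ../data/points.bin
# and ../data/many-points.bin instead.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "points.bin")
dst_path = os.path.join(workdir, "many-points.bin")

# Small stand-in for the 100-million-point sample saved with tofile().
np.arange(1000, dtype=np.float64).tofile(src_path)

copies = 11

# Map the source read-only. With an explicit dtype and no shape,
# the length is inferred from the file size.
src = np.memmap(src_path, dtype=np.float64, mode="r")

# Create (or overwrite) the destination file, sized for all copies.
dst = np.memmap(dst_path, dtype=np.float64, mode="w+",
                shape=(len(src) * copies,))

# Copy one source-sized slice at a time; only the slice being
# written needs to be resident in memory.
for i in range(copies):
    start = i * len(src)
    dst[start:start + len(src)] = src

# Push any pending changes out to the file on disk.
dst.flush()

print(dst.shape, os.path.getsize(dst_path))
```

Passing dtype to both calls is a deliberate variation: the destination then holds the same floating-point values as the source, at eight bytes per element rather than one.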
Now, we can do some sanity checks on the resulting data and ensure that we have what we were trying to get, as follows:

In [18]: big_data_len = len(big_data)
         big_data_len
Out[18]: 1100000000
In [19]: data[100000000 - 1]
Out[19]: 63
In [20]: big_data[100000000 - 1]
Out[20]: 63

Attempting to get the next index from our original dataset will throw an error (as shown in the following code), since it didn't have that index:

In [21]: data[100000000]
-----------------------------------------------------------
IndexError               Traceback (most recent call last)
...
IndexError: index 100000000 is out of bounds ...

But our new data does have that index, as shown in the following code:

In [22]: big_data[100000000]
Out[22]: 90

And then some:

In [23]: big_data[1100000000 - 1]
Out[23]: 63

We can also plot data from a memmapped array without having a significant lag time. However, note that in the following code, we will create a histogram from 1.1 billion points of data, so the plotting won't be instantaneous:

In [24]: (figure, axes) = plt.subplots(figsize=(20, 10))
         axes.hist(big_data, bins=100)
         plt.show()

The following plot is the result of the preceding code: The plotting took about 40 seconds to generate. The odd shape of the histogram is due to the fact that, with our data file-hacking, we have radically changed the nature of our data since we've increased the sample size linearly without regard for the distribution. The purpose of this demonstration wasn't to preserve a sample distribution, but rather to show how one can work with large datasets. What we have seen is not too shabby. Thanks to NumPy, matplotlib can work with data that is too large for memory, even if it is a bit slow iterating over hundreds of millions of data points from the disk. Can matplotlib do better?

HDF5 and PyTables

A commonly used file format in the scientific computing community is Hierarchical Data Format (HDF).
HDF is a set of file formats (namely HDF4 and HDF5) that were originally developed at the National Center for Supercomputing Applications (NCSA), a unit of the University of Illinois at Urbana-Champaign, to store and organize large amounts of numerical data. The NCSA is a great source of technical innovation in the computing industry: a Telnet client, the first graphical web browser, a web server that evolved into the Apache HTTP server, and HDF, which is of particular interest to us, were all developed there. It is a little-known fact that NCSA's web browser code was the ancestor of both the Netscape web browser and a prototype of Internet Explorer that was provided to Microsoft by a third party.

HDF is supported by Python, R, Julia, Java, Octave, IDL, and MATLAB, to name a few. HDF5 offers significant improvements and useful simplifications over HDF4. It uses B-trees to index table objects and, as such, works well for write-once/read-many time series data. Common use cases span fields such as meteorological studies, biosciences, finance, and aviation. HDF5 files of multiterabyte sizes are common in these applications. Such a file is typically constructed from the analyses of multiple HDF5 source files, thus providing a single (and often extensive) source of grouped data for a particular application.

The PyTables library is built on top of the HDF5 library and NumPy. As such, it not only provides access to one of the most widely used large data file formats in the scientific computing community, but also links data extracted from these files with the data types and objects provided by the fast Python numerical processing library. PyTables is also used in other projects. Pandas wraps PyTables, thus extending its convenient in-memory data structures, functions, and objects to large on-disk files. To use HDF data with Pandas, you'll want to create a pandas.HDFStore, read from the HDF data sources with pandas.read_hdf, or write to one with DataFrame.to_hdf.
Files that are too large to fit in the memory may be read and written by utilizing chunking techniques. Pandas does support disk-based DataFrame operations, but these are not very efficient due to the required assembly of columns of data upon reading back into the memory.

One project to keep an eye on under the PyData umbrella of projects is Blaze. It's an open wrapper and a utility framework that can be used when you wish to work with large datasets and generalize actions such as creation, access, updates, and migration. Blaze supports not only HDF, but also SQL, CSV, and JSON. The API usage between Pandas and Blaze is very similar, and it offers a nice tool for developers who need to support multiple backends.

In the following example, we will use PyTables directly to create an HDF5 file that is too large to fit in the memory (for a machine with 8 GB of RAM). We will follow these steps:

1. Create a series of CSV source data files that take up approximately 14 GB of disk space
2. Create an empty HDF5 file
3. Create a table in the HDF5 file and provide the schema metadata and compression options
4. Load the CSV source data into the HDF5 table
5. Query the new data source once the data has been migrated

Remember the temperature and precipitation data for St. Francis, in Kansas, USA, from a previous notebook? We are going to generate random data with similar columns for the purposes of the HDF5 example.
This data will be generated from a normal distribution, which will be used in the guise of the temperature and precipitation data for hundreds of thousands of fictitious towns across the globe for the last century, as follows:

In [25]: head = "country,town,year,month,precip,temp\n"
         row = "{},{},{},{},{},{}\n"
         filename = "../data/{}.csv"
         town_count = 1000
         (start_year, end_year) = (1894, 2014)
         (start_month, end_month) = (1, 13)
         sample_size = (1 + 2
                        * town_count * (end_year - start_year)
                        * (end_month - start_month))
         countries = range(200)
         towns = range(town_count)
         years = range(start_year, end_year)
         months = range(start_month, end_month)
         for country in countries:
             with open(filename.format(country), "w") as csvfile:
                 csvfile.write(head)
                 csvdata = ""
                 weather_data = norm.rvs(size=sample_size)
                 weather_index = 0
                 for town in towns:
                     for year in years:
                         for month in months:
                             csvdata += row.format(
                                 country, town, year, month,
                                 weather_data[weather_index],
                                 weather_data[weather_index + 1])
                             weather_index += 2
                 csvfile.write(csvdata)

Note that we generated a sample data population that was twice as large as the expected size in order to pull both the simulated temperature and precipitation data at the same time (from the same set). This will take about 30 minutes to run. When complete, you will see the following files:

In [26]: ls -rtm ../data/*.csv
         ../data/0.csv, ../data/1.csv, ../data/2.csv,
         ../data/3.csv, ../data/4.csv, ../data/5.csv,
         ...
         ../data/194.csv, ../data/195.csv, ../data/196.csv,
         ../data/197.csv, ../data/198.csv, ../data/199.csv

A quick look at just one of the files reveals the size of each, as follows:

In [27]: ls -lh ../data/0.csv
         -rw-r--r-- 1 oubiwann staff 72M Mar 21 19:02 ../data/0.csv

With each file being 72 MB in size, we have data that takes up 14 GB of disk space, which exceeds the size of the RAM of the system in question. Furthermore, running queries against so much data in the .csv files isn't going to be very efficient. It's going to take a long time. So what are our options? Well, to read this data, HDF5 is a very good fit. In fact, it is designed for jobs like this. We will use PyTables to convert the .csv files to a single HDF5 file. We'll start by creating an empty table file, as follows:

In [28]: tb_name = "../data/weather.h5t"
         h5 = tb.open_file(tb_name, "w")
         h5
Out[28]: File(filename=../data/weather.h5t, title='', mode='w',
              root_uep='/', filters=Filters(
                  complevel=0, shuffle=False, fletcher32=False,
                  least_significant_digit=None))
         / (RootGroup) ''

Next, we'll provide some assistance to PyTables by indicating the data types of each column in our table, as follows:

In [29]: data_types = np.dtype(
             [("country", "<i8"),
              ("town", "<i8"),
              ("year", "<i8"),
              ("month", "<i8"),
              ("precip", "<f8"),
              ("temp", "<f8")])

Also, let's define a compression filter that can be used by PyTables when saving our data, as follows:

In [30]: filters = tb.Filters(complevel=5, complib='blosc')

Now, we can create a table inside our new HDF5 file, as follows:

In [31]: tab = h5.create_table(
             "/", "weather",
             description=data_types,
             filters=filters)

With that done, let's load each CSV file, read it in chunks so that we don't overload the memory, and then append it to our new HDF5 table, as follows:

In [32]: for filename in glob.glob("../data/*.csv"):
             it = pd.read_csv(filename, iterator=True, chunksize=10000)
             for chunk in it:
                 tab.append(chunk.to_records(index=False))
             tab.flush()

Depending on your machine, the entire process of loading the CSV files, reading them in chunks, and appending them to the new HDF5 table can take anywhere from 5 to 10 minutes. However, what started out as a collection of .csv files that weigh in at 14 GB is now a single compressed 4.8 GB HDF5 file, as shown in the following code:

In [33]: h5.get_filesize()
Out[33]: 4758762819

Here's the metadata for the PyTables-wrapped HDF5 table after the data insertion:

In [34]: tab
Out[34]: /weather (Table(288000000,), shuffle, blosc(5)) ''
         description := {
         "country": Int64Col(shape=(), dflt=0, pos=0),
         "town": Int64Col(shape=(), dflt=0, pos=1),
         "year": Int64Col(shape=(), dflt=0, pos=2),
         "month": Int64Col(shape=(), dflt=0, pos=3),
         "precip": Float64Col(shape=(), dflt=0.0, pos=4),
         "temp": Float64Col(shape=(), dflt=0.0, pos=5)}
         byteorder := 'little'
         chunkshape := (1365,)

Now that we've created our file, let's use it.
Let's excerpt a few lines with an array slice, as follows: In [35]: tab[100000:100010] Out[35]: array([(0, 69, 1947, 5, -0.2328834718674, 0.06810312195695),          (0, 69, 1947, 6, 0.4724989007889, 1.9529216219569),          (0, 69, 1947, 7, -1.0757216683235, 1.0415374480545),          (0, 69, 1947, 8, -1.3700249968748, 3.0971874991576),          (0, 69, 1947, 9, 0.27279758311253, 0.8263207523831),          (0, 69, 1947, 10, -0.0475253104621, 1.4530808932953),          (0, 69, 1947, 11, -0.7555493935762, -1.2665440609117),          (0, 69, 1947, 12, 1.540049376928, 1.2338186532516),          (0, 69, 1948, 1, 0.829743501445, -0.1562732708511),          (0, 69, 1948, 2, 0.06924900463163, 1.187193711598)],          dtype=[('country', '<i8'), ('town', '<i8'),                ('year', '<i8'), ('month', '<i8'),                ('precip', '<f8'), ('temp', '<f8')]) In [36]: tab[100000:100010]["precip"] Out[36]: array([-0.23288347, 0.4724989 , -1.07572167,                -1.370025 , 0.27279758, -0.04752531,                -0.75554939, 1.54004938, 0.8297435 ,                0.069249 ]) When we're done with the file, we do the same thing that we would do with any other file-like object: In [37]: h5.close() If we want to work with it again, simply load it, as follows: In [38]: h5 = tb.open_file(tb_name, "r")          tab = h5.root.weather Let's try plotting the data from our HDF5 file: In [39]: (figure, axes) = plt.subplots(figsize=(20, 10))          axes.hist(tab[:1000000]["temp"], bins=100)          plt.show() Here's a plot for the first million data points: This histogram was rendered quickly, with a much better response time than what we've seen before. Hence, the process of accessing the HDF5 data is very fast. The next question might be "What about executing calculations against this data?" Unfortunately, running the following will consume an enormous amount of RAM: tab[:]["temp"].mean() We've just asked for all of the data—all of its 288 million rows. 
We are going to end up loading everything into the RAM, grinding the average workstation to a halt. Ideally though, when you iterate through the source data and create the HDF5 file, you also crunch the numbers that you will need, adding supplemental columns or groups to the HDF5 file that can be used later by you and your peers. If we have data that we will mostly be selecting (extracting portions) and which has already been crunched and grouped as needed, HDF5 is a very good fit. This is why one of the most common use cases that you see for HDF5 is the sharing and distribution of the processed data. However, if we have data that we need to process repeatedly, then we will either need to use a method that does not cause all the data to be loaded into the memory, or find a better match for our data processing needs.

We saw in the previous section that the selection of data is very fast in HDF5. What about calculating the mean for a small section of data? If we've got a total of 288 million rows, let's select a divisor of that number that gives us several hundred thousand rows at a time: 281,250 rows, to be precise. Let's get the mean for the first slice, as follows:

In [40]: tab[0:281250]["temp"].mean()
Out[40]: 0.0030696632864265312

This took about 1 second to calculate. What about iterating through the records in a similar fashion? Let's break up the 288 million records into chunks of the same size; this will result in 1024 chunks.
We'll start by getting the ranges needed for an increment of 281,250 and then, we'll examine the first and the last row as a sanity check, as follows: In [41]: limit = 281250          ranges = [(x * limit, x * limit + limit)              for x in range(2 ** 10)]          (ranges[0], ranges[-1]) Out[41]: ((0, 281250), (287718750, 288000000)) Now, we can use these ranges to generate the mean for each chunk of 281,250 rows of temperature data and print the total number of means that we generated to make sure that we're getting our counts right, as follows: In [42]: means = [tab[x * limit:x * limit + limit]["temp"].mean()              for x in range(2 ** 10)]          len(means) Out[42]: 1024 Depending on your machine, that should take between 30 and 60 seconds. With this work done, it's now easy to calculate the mean for all of the 288 million points of temperature data: In [43]: sum(means) / len(means) Out[43]: -5.3051780413782918e-05 Through HDF5's efficient file format and implementation, combined with the splitting of our operations into tasks that would not copy the HDF5 data into memory, we were able to perform calculations across a significant fraction of a billion records in less than a minute. HDF5 even supports parallelization. So, this can be improved upon with a little more time and effort. However, there are many cases where HDF5 is not a practical choice. You may have some free-form data, and preprocessing it will be too expensive. Alternatively, the datasets may be actually too large to fit on a single machine. This is when you may consider using matplotlib with distributed data. Summary In this article, we covered the role of NumPy in the world of big data and matplotlib as well as the process and problems in working with large data sources. Also, we discussed the possible solutions to these problems using NumPy's memmap function and HDF5 and PyTables. 
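As a footnote to the chunked mean calculation in the HDF5 section: averaging the per-chunk means gives the overall mean only when every chunk has the same number of rows, which is why the divisor 281,250 was chosen. A minimal sketch of a variant that accumulates a running sum and count instead, and so also copes with a shorter final chunk, is shown here. It uses only the standard library and a small in-memory list as a stand-in for the table; chunked_mean and read_chunk are hypothetical names, not part of PyTables.

```python
import math


def chunked_mean(read_chunk, n_rows, chunk_size):
    """Return the mean of n_rows values, reading chunk_size at a time.

    read_chunk(start, stop) is a caller-supplied function that returns
    the values in the half-open range [start, stop).
    """
    total = 0.0
    count = 0
    for start in range(0, n_rows, chunk_size):
        stop = min(start + chunk_size, n_rows)
        chunk = read_chunk(start, stop)
        total += math.fsum(chunk)  # sum of this chunk only
        count += len(chunk)
    return total / count


# Stand-in data: 900 zeros followed by 100 tens (true mean is 1.0).
values = [0.0] * 900 + [10.0] * 100


def read_chunk(start, stop):
    return values[start:stop]


# Chunks of 300 leave a final chunk of only 100 values.
exact = chunked_mean(read_chunk, len(values), 300)

# Naive mean of per-chunk means, for comparison.
chunk_means = [
    math.fsum(values[i:i + 300]) / len(values[i:i + 300])
    for i in range(0, len(values), 300)
]
mean_of_means = sum(chunk_means) / len(chunk_means)

print(exact, mean_of_means)  # 1.0 2.5
```

With the table from the article, read_chunk could presumably be something like lambda a, b: tab[a:b]["temp"], although a NumPy sum over each slice would be considerably faster than math.fsum on data of that size.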
Packt
08 Jul 2015
23 min read

Deployment Preparations

In this article, Jurie-Jan Botha, author of the book Grunt Cookbook, covers the following recipes:

Minifying HTML
Minifying CSS
Optimizing images
Linting JavaScript code
Uglifying JavaScript code
Setting up RequireJS

Once our web application is built and its stability ensured, we can start preparing it for deployment to its intended market. This will mainly involve the optimization of the assets that make up the application. Optimization in this context mostly refers to compression of one kind or another, some of which might lead to performance increases too. The focus on compression is primarily due to the fact that the smaller the asset, the faster it can be transferred from where it is hosted to a user's web browser. This leads to a much better user experience, and can sometimes be essential to the functioning of an application.

Minifying HTML

In this recipe, we make use of the contrib-htmlmin (0.3.0) plugin to decrease the size of some HTML documents by minifying them.

Getting ready

In this example, we'll work with a basic project structure.

How to do it...

The following steps take us through creating a sample HTML document and configuring a task that minifies it:

We'll start by installing the package that contains the contrib-htmlmin plugin.

Next, we'll create a simple HTML document called index.html in the src directory, which we'd like to minify, and add the following content to it:

<html>
<head>
   <title>Test Page</title>
</head>
<body>
   <!-- This is a comment! -->
   <h1>This is a test page.</h1>
</body>
</html>

Now, we'll add the following htmlmin task to our configuration, which indicates that we'd like to have the white space and comments removed from the src/index.html file, and that we'd like the result to be saved in the dist/index.html file:

htmlmin: {
  dist: {
    src: 'src/index.html',
    dest: 'dist/index.html',
    options: {
      removeComments: true,
      collapseWhitespace: true
    }
  }
}

The removeComments and collapseWhitespace options are used as examples here, as running the htmlmin task with its default options has no effect. Other minification options can be found at the following URL: https://github.com/kangax/html-minifier#options-quick-reference

We can now run the task using the grunt htmlmin command, which should produce output similar to the following:

Running "htmlmin:dist" (htmlmin) task
Minified dist/index.html 147 B → 92 B

If we now take a look at the dist/index.html file, we will see that all white space and comments have been removed:

<html><head><title>Test Page</title></head><body><h1>This is a test page.</h1></body></html>

Minifying CSS

In this recipe, we'll make use of the contrib-cssmin (0.10.0) plugin to decrease the size of some CSS documents by minifying them.

Getting ready

In this example, we'll work with a basic project structure.

How to do it...

The following steps take us through creating a sample CSS document and configuring a task that minifies it:

We'll start by installing the package that contains the contrib-cssmin plugin.

Then, we'll create a simple CSS document called style.css in the src directory, which we'd like to minify, and provide it with the following contents:

body {
  /* Average body style */
  background-color: #ffffff;
  color: #000000; /*! Black (Special) */
}

Now, we'll add the following cssmin task to our configuration, which indicates that we'd like to have the src/style.css file compressed, and have the result saved to the dist/style.min.css file:

cssmin: {
  dist: {
    src: 'src/style.css',
    dest: 'dist/style.min.css'
  }
}

We can now run the task using the grunt cssmin command, which should produce the following output:

Running "cssmin:dist" (cssmin) task
File dist/style.css created: 55 B → 38 B

If we take a look at the dist/style.min.css file that was produced, we will see that it has the compressed contents of the original src/style.css file:

body{background-color:#fff;color:#000;/*! Black (Special) */}

There's more...

The cssmin task provides us with several useful options that can be used in conjunction with its basic compression feature. We'll look at prefixing a banner, removing special comments, and reporting gzipped results.

Prefixing a banner

If we'd like to automatically include some information about the compressed result in the resulting CSS file, we can do so in a banner. A banner can be prepended to the result by supplying the desired banner content to the banner option, as shown in the following example:

cssmin: {
  dist: {
    src: 'src/style.css',
    dest: 'dist/style.min.css',
    options: {
      banner: '/* Minified version of style.css */'
    }
  }
}

Removing special comments

Comments that should not be removed by the minification process are called special comments and can be indicated using the "/*! comment */" markers. By default, the cssmin task will leave all special comments untouched, but we can alter this behavior by making use of the keepSpecialComments option. The keepSpecialComments option can be set to *, 1, or 0. The * value is the default and indicates that all special comments should be kept, 1 indicates that only the first special comment that is found should be kept, and 0 indicates that none of them should be kept.
The following configuration will ensure that all comments are removed from our minified result: cssmin: { dist: {    src: 'src/style.css',    dest: 'dist/style.min.css',    options: {      keepSpecialComments: 0    } } } Reporting on gzipped results Reporting is useful to see exactly how well the cssmin task has compressed our CSS files. By default, the size of the targeted file and minified result will be displayed, but if we'd also like to see the gzipped size of the result, we can set the report option to gzip, as shown in the following example: cssmin: { dist: {    src: 'src/main.css',    dest: 'dist/main.css',    options: {      report: 'gzip'    } } } Optimizing images In this recipe, we'll make use of the contrib-imagemin (0.9.4) plugin to decrease the size of images by compressing them as much as possible without compromising on their quality. This plugin also provides a plugin framework of its own, which is discussed at the end of this recipe. Getting ready In this example, we'll work with the basic project structure. How to do it... The following steps take us through configuring a task that will compress an image for our project. We'll start by installing the package that contains the contrib-imagemin plugin. Next, we can ensure that we have an image called image.jpg in the src directory on which we'd like to perform optimizations. Now, we'll add the following imagemin task to our configuration and indicate that we'd like to have the src/image.jpg file optimized, and have the result saved to the dist/image.jpg file: imagemin: { dist: {    src: 'src/image.jpg',    dest: 'dist/image.jpg' } } We can then run the task using the grunt imagemin command, which should produce the following output: Running "imagemin:dist" (imagemin) task Minified 1 image (saved 13.36 kB) If we now take a look at the dist/image.jpg file, we will see that its size has decreased without any impact on the quality. There's more... 
The imagemin task provides us with several options that allow us to tweak its optimization features. We'll look at how to adjust the PNG compression level, disable the progressive JPEG generation, disable the interlaced GIF generation, specify SVGO plugins to be used, and use the imagemin plugin framework. Adjusting the PNG compression level The compression of a PNG image can be increased by running the compression algorithm on it multiple times. By default, the compression algorithm is run 16 times. This number can be changed by providing a number from 0 to 7 to the optimizationLevel option. The 0 value means that the compression is effectively disabled and 7 indicates that the algorithm should run 240 times. In the following configuration we set the compression level to its maximum: imagemin: { dist: {    src: 'src/image.png',    dest: 'dist/image.png',    options: {      optimizationLevel: 7    } } } Disabling the progressive JPEG generation Progressive JPEGs are compressed in multiple passes, which allows a low-quality version of them to quickly become visible and increase in quality as the rest of the image is received. This is especially helpful when displaying images over a slower connection. By default, the imagemin plugin will generate JPEG images in the progressive format, but this behavior can be disabled by setting the progressive option to false, as shown in the following example: imagemin: { dist: {    src: 'src/image.jpg',    dest: 'dist/image.jpg',    options: {      progressive: false    } } } Disabling the interlaced GIF generation An interlaced GIF is the equivalent of a progressive JPEG in that it allows the contained image to be displayed at a lower resolution before it has been fully downloaded, and increases in quality as the rest of the image is received. 
By default, the imagemin plugin will generate GIF images in the interlaced format, but this behavior can be disabled by setting the interlaced option to false, as shown in the following example: imagemin: { dist: {    src: 'src/image.gif',    dest: 'dist/image.gif',    options: {      interlaced: false    } } } Specifying SVGO plugins to be used When optimizing SVG images, the SVGO library is used by default. This allows us to specify the use of various plugins provided by the SVGO library that each performs a specific function on the targeted files. Refer to the following URL for more detailed instructions on how to use the svgo plugins options and the SVGO library: https://github.com/sindresorhus/grunt-svgmin#available-optionsplugins Most of the plugins in the library are enabled by default, but if we'd like to specifically indicate which of these should be used, we can do so using the svgoPlugins option. Here, we can provide an array of objects, where each contain a property with the name of the plugin to be affected, followed by a true or false value to indicate whether it should be activated. The following configuration disables three of the default plugins: imagemin: { dist: {    src: 'src/image.svg',    dest: 'dist/image.svg',    options: {      svgoPlugins: [        {removeViewBox:false},        {removeUselessStrokeAndFill:false},        {removeEmptyAttrs:false}      ]    } } } Using the 'imagemin' plugin framework In order to provide support for the various image optimization projects, the imagemin plugin has a plugin framework of its own that allows developers to easily create an extension that makes use of the tool they require. You can get a list of the available plugin modules for the imagemin plugin's framework at the following URL: https://www.npmjs.com/browse/keyword/imageminplugin The following steps will take us through installing and making use of the mozjpeg plugin to compress an image in our project. 
These steps pick up where the main recipe left off.

We'll start by installing the imagemin-mozjpeg package using the npm install imagemin-mozjpeg command, which should produce the following output:

imagemin-mozjpeg@4.0.0 node_modules/imagemin-mozjpeg

With the package installed, we need to import it into our configuration file, so that we can make use of it in our task configuration. We do this by adding the following line at the top of our Gruntfile.js file:

var mozjpeg = require('imagemin-mozjpeg');

With the plugin installed and imported, we can now change the configuration of our imagemin task by adding the use option and providing it with the initialized plugin:

imagemin: {
  dist: {
    src: 'src/image.jpg',
    dest: 'dist/image.jpg',
    options: {
      use: [mozjpeg()]
    }
  }
}

Finally, we can test our setup by running the task using the grunt imagemin command. This should produce an output similar to the following:

Running "imagemin:dist" (imagemin) task
Minified 1 image (saved 9.88 kB)

Linting JavaScript code

In this recipe, we'll make use of the contrib-jshint (0.11.1) plugin to detect errors and potential problems in our JavaScript code. It is also commonly used to enforce code conventions within a team or project. As its name suggests, it's basically a Grunt adaptation of the JSHint tool.

Getting ready

In this example, we'll work with the basic project structure.

How to do it...

The following steps take us through creating a sample JavaScript file and configuring a task that will scan and analyze it using the JSHint tool:

We'll start by installing the package that contains the contrib-jshint plugin.

Next, we'll create a sample JavaScript file called main.js in the src directory, and add the following content to it:

sample = 'abc';
console.log(sample);

With our sample file ready, we can now add the following jshint task to our configuration.
We'll configure this task to target the sample file and also add a basic option that we require for this example: jshint: { main: {    options: {      undef: true    },    src: ['src/main.js'] } } The undef option is a standard JSHint option used specifically for this example and is not required for this plugin to function. Specifying this option indicates that we'd like to have errors raised for variables that are used without being explicitly defined. We can now run the task using the grunt jshint command, which should produce output informing us of the problems found in our sample file: Running "jshint:main" (jshint) task      src/main.js      1 |sample = 'abc';          ^ 'sample' is not defined.      2 |console.log(sample);          ^ 'console' is not defined.      2 |console.log(sample);                      ^ 'sample' is not defined.   >> 3 errors in 1 file There's more... The jshint task provides us with several options that allow us to change its general behavior, in addition to how it analyzes the targeted code. We'll look at how to specify standard JSHint options, specify globally defined variables, send reported output to a file, and prevent task failure on JSHint errors. Specifying standard JSHint options The contrib-jshint plugin provides a simple way to pass all the standard JSHint options from the task's options object to the underlying JSHint tool. A list of all the options provided by the JSHint tool can be found at the following URL: http://jshint.com/docs/options/ The following example adds the curly option to the task we created in our main recipe to enforce the use of curly braces wherever they are appropriate: jshint: { main: {    options: {      undef: true,      curly: true    },    src: ['src/main.js'] } } Specifying globally defined variables Making use of globally defined variables is quite common when working with JavaScript, which is where the globals option comes in handy. 
Using this option, we can define a set of global values that we'll use in the targeted code, so that errors aren't raised when JSHint encounters them. In the following example, we indicate that the console variable should be treated as a global, and not raise errors when encountered: jshint: { main: {    options: {      undef: true,      globals: {        console: true      }    },    src: ['src/main.js'] } } Sending reported output to a file If we'd like to store the resulting output from our JSHint analysis, we can do so by specifying a path to a file that should receive it using the reporterOutput option, as shown in the following example: jshint: { main: {    options: {      undef: true,      reporterOutput: 'report.dat'    },    src: ['src/main.js'] } } Preventing task failure on JSHint errors The default behavior for the jshint task is to exit the running Grunt process once a JSHint error is encountered in any of the targeted files. This behavior becomes especially undesirable if you'd like to keep watching files for changes, even when an error has been raised. In the following example, we indicate that we'd like to keep the process running when errors are encountered by giving the force option a true value: jshint: { main: {    options: {      undef: true,      force: true    },    src: ['src/main.js'] } } Uglifying JavaScript Code In this recipe, we'll make use of the contrib-uglify (0.8.0) plugin to compress and mangle some files containing JavaScript code. For the most part, the process of uglifying just removes all the unnecessary characters and shortens variable names in a source code file. This has the potential to dramatically reduce the size of the file, slightly increase performance, and make the inner workings of your publicly available code a little more obscure. Getting ready In this example, we'll work with the basic project structure. How to do it... 
The following steps take us through creating a sample JavaScript file and configuring a task that will uglify it. We'll start by installing the package that contains the contrib-uglify plugin. Then, we can create a sample JavaScript file called main.js in the src directory, which we'd like to uglify, and provide it with the following contents: var main = function () { var one = 'Hello' + ' '; var two = 'World';   var result = one + two;   console.log(result); }; With our sample file ready, we can now add the following uglify task to our configuration, indicating the sample file as the target and providing a destination output file: uglify: { main: {    src: 'src/main.js',    dest: 'dist/main.js' } } We can now run the task using the grunt uglify command, which should produce output similar to the following: Running "uglify:main" (uglify) task >> 1 file created. If we now take a look at the resulting dist/main.js file, we should see that it contains the uglified contents of the original src/main.js file. There's more... The uglify task provides us with several options that allow us to change its general behavior and how it uglifies the targeted code. We'll look at specifying standard UglifyJS options, generating source maps, and wrapping generated code in an enclosure. Specifying standard UglifyJS options The underlying UglifyJS tool can provide a set of options for each of its separate functional parts. These parts are the mangler, compressor, and beautifier. The contrib-uglify plugin allows passing options to each of these parts using the mangle, compress, and beautify options.
The available options for each of the mangler, compressor, and beautifier parts can be found at each of the following URLs (listed in the order mentioned): https://github.com/mishoo/UglifyJS2#mangler-options https://github.com/mishoo/UglifyJS2#compressor-options https://github.com/mishoo/UglifyJS2#beautifier-options The following example alters the configuration of the main recipe to provide a single option to each of these parts: uglify: { main: {    src: 'src/main.js',    dest: 'dist/main.js',    options: {      mangle: {        toplevel: true      },      compress: {        evaluate: false      },      beautify: {        semicolons: false      }    } } } Generating source maps As code gets mangled and compressed, it becomes effectively unreadable to humans, and therefore, nearly impossible to debug. For this reason, we are provided with the option of generating a source map when uglifying our code. The following example makes use of the sourceMap option to indicate that we'd like to have a source map generated along with our uglified code: uglify: { main: {    src: 'src/main.js',    dest: 'dist/main.js',    options: {      sourceMap: true    } } } Running the altered task will now, in addition to the dist/main.js file with our uglified source, generate a source map file called main.js.map in the same directory as the uglified file. Wrapping generated code in an enclosure When building your own JavaScript code modules, it's usually a good idea to have them wrapped in a wrapper function to ensure that you don't pollute the global scope with variables that you won't be using outside of the module itself.
For this purpose, we can use the wrap option to indicate that we'd like to have the resulting uglified code wrapped in a wrapper function, as shown in the following example: uglify: { main: {    src: 'src/main.js',    dest: 'dist/main.js',    options: {      wrap: true    } } } If we now take a look at the result dist/main.js file, we should see that all the uglified contents of the original file are now contained within a wrapper function. Setting up RequireJS In this recipe, we'll make use of the contrib-requirejs (0.4.4) plugin to package the modularized source code of our web application into a single file. For the most part, this plugin just provides a wrapper for the RequireJS tool. RequireJS provides a framework to modularize JavaScript source code and consume those modules in an orderly fashion. It also allows packaging an entire application into one file and importing only the modules that are required while keeping the module structure intact. Getting ready In this example, we'll work with the basic project structure. How to do it... The following steps take us through creating some files for a sample application and setting up a task that bundles them into one file. We'll start by installing the package that contains the contrib-requirejs plugin. First, we'll need a file that will contain our RequireJS configuration. Let's create a file called config.js in the src directory and add the following content in it: require.config({ baseUrl: 'app' }); Secondly, we'll create a sample module that we'd like to use in our application. Let's create a file called sample.js in the src/app directory and add the following content in it: define(function (require) { return function () {    console.log('Sample Module'); } }); Lastly, we'll need a file that will contain the main entry point for our application, and also makes use of our sample module. 
Let's create a file called main.js in the src/app directory and add the following content in it: require(['sample'], function (sample) { sample(); }); Now that we've got all the necessary files required for our sample application, we can set up a requirejs task that will bundle it all into one file: requirejs: { app: {    options: {      mainConfigFile: 'src/config.js',      name: 'main',      out: 'www/js/app.js'    } } } The mainConfigFile option points out the configuration file that will determine the behavior of RequireJS. The name option indicates the name of the module that contains the application entry point. In the case of this example, our application entry point is contained in the app/main.js file, and app is the base directory of our application in the src/config.js file. This translates the app/main.js filename into the main module name. The out option is used to indicate the file that should receive the result of the bundled application. We can now run the task using the grunt requirejs command, which should produce output similar to the following: Running "requirejs:app" (requirejs) task We should now have a file named app.js in the www/js directory that contains our entire sample application. There's more... The requirejs task provides us with all the underlying options provided by the RequireJS tool. We'll look at how to use these exposed options and generate a source map. Using RequireJS optimizer options The RequireJS optimizer is quite an intricate tool, and therefore, provides a large number of options to tweak its behavior. The contrib-requirejs plugin allows us to easily set any of these options by just specifying them as options of the plugin itself.
A list of all the available configuration options for the RequireJS build system can be found in the example configuration file at the following URL: https://github.com/jrburke/r.js/blob/master/build/example.build.js The following example indicates that the UglifyJS2 optimizer should be used instead of the default UglifyJS optimizer by using the optimize option: requirejs: { app: {    options: {      mainConfigFile: 'src/config.js',      name: 'main',      out: 'www/js/app.js',      optimize: 'uglify2'    } } } Generating a source map When the source code is bundled into one file, it becomes somewhat harder to debug, as you now have to trawl through miles of code to get to the point you're actually interested in. A source map can help us with this issue by relating the resulting bundled file to the modularized structure it is derived from. Simply put, with a source map, our debugger will display the separate files we had before, even though we're actually using the bundled file. The following example makes use of the generateSourceMaps option to indicate that we'd like to generate a source map along with the resulting file: requirejs: { app: {    options: {      mainConfigFile: 'src/config.js',      name: 'main',      out: 'www/js/app.js',      optimize: 'uglify2',      preserveLicenseComments: false,      generateSourceMaps: true    } } } In order to use the generateSourceMaps option, we have to indicate that UglifyJS2 is to be used for optimization, by setting the optimize option to uglify2, and that license comments should not be preserved, by setting the preserveLicenseComments option to false. Summary This article covers the optimization of images, minifying of CSS, ensuring the quality of our JavaScript code, compressing it, and packaging it all together into one source file. Resources for Article: Further resources on this subject: Grunt in Action [article] So, what is Node.js? [article] Exploring streams [article]

Packt
08 Jul 2015
21 min read

To Be or Not to Be – Optionals

In this article by Andrew J Wagner, author of the book Learning Swift, we will cover: What is an optional? How to unwrap an optional Optional chaining Implicitly unwrapped optionals How to debug optionals The underlying implementation of an optional Introducing optionals So, we know that the purpose of optionals in Swift is to allow the representation of the absent value, but what does that look like and how does it work? An optional is a special type that can wrap any other type. This means that you can make an optional String, optional Array, and so on. You can do this by adding a question mark (?) to the type name: var possibleString: String? var possibleArray: [Int]? Note that this code does not specify any initial values. This is because all optionals, by default, are set to no value at all. If we want to provide an initial value, we can do so like any other variable: var possibleInt: Int? = 10 Also note that, if we leave out the type specification (: Int?), possibleInt would be inferred to be of the Int type instead of an Int optional. It is pretty verbose to say that a variable lacks a value. Instead, if an optional lacks a value, we say that it is nil. So, both possibleString and possibleArray are nil, while possibleInt is 10. However, possibleInt is not truly 10. It is still wrapped in an optional. You can see all the forms a variable can take by putting the following code into a playground: var actualInt = 10 var possibleInt: Int? = 10 var nilInt: Int? println(actualInt) // "10" println(possibleInt) // "Optional(10)" println(nilInt) // "nil" As you can see, actualInt prints out as we expect it to, but possibleInt prints out as an optional that contains the value 10 instead of just 10. This is a very important distinction because an optional cannot be used as if it were the value it wraps. The nilInt optional just reports that it is nil.
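The distinction between a plain value and a wrapped one can be sketched as follows. This is a minimal illustration, assuming a current Swift toolchain, where print replaces the println used in this article; the describe helper is a hypothetical name introduced only for this example.

```swift
// Hypothetical helper (not from the article): describes an optional Int
// without unwrapping it.
func describe(_ value: Int?) -> String {
    // Comparing an optional against nil is always allowed.
    if value == nil {
        return "nil"
    }
    // String(describing:) keeps the Optional wrapper visible.
    return String(describing: value)
}

let actualInt = 10           // inferred as Int
let possibleInt: Int? = 10   // explicitly an optional Int
let nilInt: Int? = nil       // an optional with no value

print(describe(possibleInt)) // Optional(10)
print(describe(nilInt))      // nil

// let broken = possibleInt + 1  // does not compile: Int? is not an Int
let sum = actualInt + 1          // fine: actualInt is a plain Int
print(sum)                       // 11
```

The commented-out line is the important part: the compiler rejects arithmetic on the optional until it has been unwrapped.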
At any point, you can update the value within an optional, including the fact that you can give it a value for the first time using the assignment operator (=): nilInt = 2 println(nilInt) // "Optional(2)" You can even remove the value within an optional by assigning it to nil: nilInt = nil println(nilInt) // "nil" So, we have this wrapped form of a variable that may or may not contain a value. What do we do if we need to access the value within an optional? The answer is that we must unwrap it. Unwrapping an optional There are multiple ways to unwrap an optional. All of them essentially assert that there is truly a value within the optional. This is a wonderful safety feature of Swift. The compiler forces you to consider the possibility that an optional lacks any value at all. In other languages, this is a very commonly overlooked scenario that can cause obscure bugs. Optional binding The safest way to unwrap an optional is using something called optional binding. With this technique, you can assign a temporary constant or variable to the value contained within the optional. This process is contained within an if statement, so that you can use an else statement for when there is no value. An optional binding looks like this: if let string = possibleString {    println("possibleString has a value: \(string)") } else {    println("possibleString has no value") } An optional binding is distinguished from an if statement primarily by the if let syntax. Semantically, this code says "if you can let the constant string be equal to the value within possibleString, print out its value; otherwise, print that it has no value." The primary purpose of an optional binding is to create a temporary constant that is the normal (nonoptional) version of the optional. 
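As a rough sketch of the same idea inside a function, the following example uses an optional binding to choose between two results. It assumes a current Swift toolchain (print rather than println), and greeting(for:) is a hypothetical function written only for illustration.

```swift
// Hypothetical function: builds a greeting from a name that may be absent.
func greeting(for name: String?) -> String {
    if let actualName = name {
        // Inside this branch, actualName is a plain (nonoptional) String.
        return "Hello, \(actualName)!"
    } else {
        // The binding failed because name was nil.
        return "Hello, guest!"
    }
}

print(greeting(for: "Sarah")) // Hello, Sarah!
print(greeting(for: nil))     // Hello, guest!
```

Because the function returns a nonoptional String from both branches, callers never have to deal with the optional themselves.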
It is also possible to use a temporary variable in an optional binding: possibleInt = 10 if var int = possibleInt {    int *= 2 } println(possibleInt) // Optional(10) Note that an asterisk (*) is used for multiplication in Swift. You should also note something important about this code, that is, if you put it into a playground, even though we multiplied int by 2, the value does not change. When we print out possibleInt later, the value still remains Optional(10). This is because even though we made int a variable (otherwise known as mutable), it is simply a temporary copy of the value within possibleInt. No matter what we do with int, nothing will be changed about the value within possibleInt. If we need to update the actual value stored within possibleInt, we need to simply assign int back to possibleInt after we are done modifying it: possibleInt = 10 if var int = possibleInt {    int *= 2    possibleInt = int } println(possibleInt) // Optional(20) Now the value wrapped inside possibleInt has actually been updated. A common scenario that you will probably come across is the need to unwrap multiple optional values. One way of doing this is by simply nesting the optional bindings: if let actualString = possibleString {    if let actualArray = possibleArray {        if let actualInt = possibleInt {            println(actualString)            println(actualArray)            println(actualInt)        }    } } However, this can be a pain as it increases the indentation level each time to keep the code organized. Instead, you can actually list multiple optional bindings in a single statement separated by commas: if let actualString = possibleString,    let actualArray = possibleArray,    let actualInt = possibleInt {    println(actualString)    println(actualArray)    println(actualInt) } This generally produces more readable code.
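One detail worth illustrating is that a later binding in the list can use the constants created by earlier ones, and the whole statement fails as soon as any binding finds nil. The following is a minimal sketch, assuming a current Swift toolchain in which the failable initializer Int(_:) takes the place of the toInt method used elsewhere in this article; addNumericStrings is a hypothetical function name.

```swift
// Hypothetical function: adds two numeric strings, or returns nil if either
// string is missing or cannot be converted to an Int.
func addNumericStrings(_ first: String?, _ second: String?) -> Int? {
    if let first = first,          // unwrap the first string
       let second = second,        // unwrap the second string
       let a = Int(first),         // conversion may fail and produce nil
       let b = Int(second) {
        return a + b
    }
    // At least one of the four bindings found nil.
    return nil
}

print(addNumericStrings("2", "3") == 5)     // true
print(addNumericStrings(nil, "3") == nil)   // true
print(addNumericStrings("two", "3") == nil) // true
```

Written with nested bindings, the same function would need four levels of indentation.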
This way of unwrapping is great, but saying that optional binding is the safe way to access the value within an optional implies that there is an unsafe way to unwrap an optional. This way is called forced unwrapping. Forced unwrapping The shortest way to unwrap an optional is by forced unwrapping. This is done using an exclamation mark (!) after the variable name when it is used: possibleInt = 10 possibleInt! *= 2   println(possibleInt) // "Optional(20)" However, the reason it is considered unsafe is that your entire program crashes if you try to unwrap an optional that is currently nil: nilInt! *= 2 // fatal error The full error you get is "unexpectedly found nil while unwrapping an Optional value". This is because forced unwrapping is essentially your personal guarantee that the optional truly holds a value. This is why it is called forced. Therefore, forced unwrapping should be used in limited circumstances. It should never be used just to shorten up the code. Instead, it should only be used when you can guarantee, from the structure of the code, that it cannot be nil, even though it is defined as an optional. Even in this case, you should check whether it is possible to use a nonoptional variable instead. The only other place you may use it is when your program truly cannot recover if an optional is nil. In these circumstances, you should at least consider presenting an error to the user, which is always better than simply having your program crash. An example of a scenario where forced unwrapping may be used effectively is with lazily calculated values. A lazily calculated value is a value that is not created until the first time it is accessed. To illustrate this, let's consider a hypothetical class that represents a filesystem directory. It would have a property that lists its contents that are lazily calculated.
The code would look something like this: class FileSystemItem {} class File: FileSystemItem {} class Directory: FileSystemItem {    private var realContents: [FileSystemItem]?    var contents: [FileSystemItem] {        if self.realContents == nil {           self.realContents = self.loadContents()        }        return self.realContents!    }      private func loadContents() -> [FileSystemItem] {        // Do some loading        return []    } } Here, we defined a superclass called FileSystemItem that both File and Directory inherit from. The contents of a directory is a list of any kind of FileSystemItem. We define contents as a calculated variable and store the real value within the realContents property. The calculated property checks whether there is a value yet loaded for realContents; if there isn't, it loads the contents and puts it into the realContents property. Based on this logic, we know with 100 percent certainty that there will be a value within realContents by the time we get to the return statement, so it is perfectly safe to use forced unwrapping. Nil coalescing In addition to optional binding and forced unwrapping, Swift also provides an operator called the nil coalescing operator to unwrap an optional. This is represented by a double question mark (??). Basically, this operator lets us provide a default value for a variable or operation result in case it is nil. This is a safe way to turn an optional value into a nonoptional value and it would look something like this: var possibleString: String? = "An actual string" println(possibleString ?? "Default String")   // "An actual string" Here, we ask the program to print out possibleString unless it is nil, in which case, it will just print Default String. Since we did give it a value, it printed out that value and it is important to note that it printed out as a regular variable, not as an optional. This is because one way or another, an actual value will be printed.
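The nil coalescing operator can also be applied more than once in a single expression, so that several optionals are tried in order before the final default is used. Here is a minimal sketch, assuming a current Swift toolchain; displayName(nickname:fullName:) is a hypothetical function introduced only for this example.

```swift
// Hypothetical function: picks the first available name, falling back to a
// fixed default so the result is always a nonoptional String.
func displayName(nickname: String?, fullName: String?) -> String {
    // Evaluated left to right: nickname first, then fullName, then the default.
    return nickname ?? fullName ?? "Anonymous"
}

print(displayName(nickname: "Sal", fullName: "Sarah Jones")) // Sal
print(displayName(nickname: nil, fullName: "Sarah Jones"))   // Sarah Jones
print(displayName(nickname: nil, fullName: nil))             // Anonymous
```

Because the last operand is a plain String, the whole expression has the type String rather than String?.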
This is a great tool for concisely and safely unwrapping an optional when a default value makes sense. Optional chaining A common scenario in Swift is to have an optional that you must calculate something from. If the optional has a value, you want to store the result of the calculation, but if it is nil, the result should just be set to nil: var invitee: String? = "Sarah" var uppercaseInvitee: String? if let actualInvitee = invitee {    uppercaseInvitee = actualInvitee.uppercaseString } This is pretty verbose. To shorten this up in an unsafe way, we could use forced unwrapping: uppercaseInvitee = invitee!.uppercaseString However, optional chaining will allow us to do this safely. Essentially, it allows optional operations on an optional. When the operation is called, if the optional is nil, it immediately returns nil; otherwise, it returns the result of performing the operation on the value within the optional: uppercaseInvitee = invitee?.uppercaseString So in this call, invitee is an optional. Instead of unwrapping it, we will use optional chaining by placing a question mark (?) after it, followed by the optional operation. In this case, we asked for the uppercaseString property on it. If invitee is nil, uppercaseInvitee is immediately set to nil without it even trying to access uppercaseString. If it actually does contain a value, uppercaseInvitee gets set to the uppercaseString property of the contained value. Note that all optional chains return an optional result. You can chain as many calls, both optional and nonoptional, as you want in this way: var myNumber: String? = "27" myNumber?.toInt()?.advancedBy(10).description This code attempts to add 10 to myNumber, which is represented by String. First, the code uses an optional chain in case myNumber is nil. Then, the call to toInt uses an additional optional chain because that method returns an optional Int type.
We then call advancedBy, which does not return an optional, allowing us to access the description of the result without using another optional chain. If at any point any of the optionals are nil, the result will be nil. This can happen for two different reasons: This can happen because myNumber is nil This can also happen because toInt returns nil as it cannot convert String to the Int type If the chain makes it all the way to advancedBy, there is no longer a failure path and it will definitely return an actual value. You will notice that there are exactly two question marks used in this chain and there are two possible failure reasons. At first, it can be hard to understand when you should and should not use a question mark to create a chain of calls. The rule is that you should always use a question mark if the previous element in the chain returns an optional. However, so that you are prepared, let's look at what happens if you use an optional chain improperly: myNumber.toInt() // Value of optional type 'String?' not unwrapped In this case, we try to call a method directly on an optional without a chain, so we get an error. We also have the case where we try to inappropriately use an optional chain: var otherNumber = "10" otherNumber?.toInt() // Operand of postfix '?'   should have optional type Here, we get an error that says a question mark can only be used on an optional type. It helps to be familiar with the errors you will see when you make these mistakes, so that you can correct them quickly; we all make silly mistakes from time to time. Another great feature of optional chaining is that it can be used for method calls on an optional that does not actually return a value: var invitees: [String]? = [] invitees?.removeAll(keepCapacity: false) In this case, we only want to call removeAll if there is truly a value within the optional array.
So, with this code, if there is a value, all the elements are removed from it; otherwise, it remains nil. In the end, optional chaining is a great choice for writing concise code that still remains expressive and understandable. Implicitly unwrapped optionals There is a second type of optional called an implicitly unwrapped optional. There are two ways to look at what an implicitly unwrapped optional is. One way is to say that it is a normal variable that can also be nil. The other way is to say that it is an optional that you don't have to unwrap to use. The important thing to understand about them is that like optionals, they can be nil, but like a normal variable, you do not have to unwrap them. You can define an implicitly unwrapped optional with an exclamation mark (!) instead of a question mark (?) after the type name: var name: String! Just like with regular optionals, implicitly unwrapped optionals do not need to be given an initial value because they are nil by default. At first, this may sound like it is the best of both worlds, but in reality, it is more like the worst of both worlds. Even though an implicitly unwrapped optional does not have to be unwrapped, it will crash your entire program if it is nil when used: name.uppercaseString // Crash A great way to think about them is that every time an implicitly unwrapped optional is used, it is implicitly performing a forced unwrapping. The exclamation mark is placed in its type declaration instead of using it every time. This is particularly bad because it appears the same as any other variable except for how it is declared. This means that it is very unsafe to use, unlike a normal optional. So, if implicitly unwrapped optionals are the worst of both worlds and are so unsafe, why do they even exist? The reality is that in rare circumstances, they are necessary. They are used in circumstances where a variable is not truly optional, but you also cannot give an initial value to it.
This is almost always true in the case of custom types that have a member variable that is nonoptional, but cannot be set during initialization. A rare example of this is a view in iOS. UIKit, as we discussed earlier, is the framework that Apple provides for iOS development. In it, Apple has a class called UIView that is used for displaying content on the screen. Apple also provides a tool in Xcode called Interface Builder that lets you design these views in a visual editor instead of in code. Many views designed in this way need references to other views that can be accessed programmatically later. When one of these views is loaded, it is initialized without anything connected and then all the connections are made. Once all the connections are made, a function called awakeFromNib is called on the view. This means that these connections are not available for use during initialization, but are available once awakeFromNib is called. This order of operations also ensures that awakeFromNib is always called before anything actually uses the view. This is a circumstance where it is necessary to use an implicitly unwrapped optional. A member variable may not be defined until the view is initialized and when it is completely loaded: import UIKit class MyView: UIView {    @IBOutlet var button : UIButton!    var buttonOriginalWidth : CGFloat!      override func awakeFromNib() {        self.buttonOriginalWidth = self.button.frame.size.width    } } Note that we have actually declared two implicitly unwrapped optionals. The first is a connection to button. We know this is a connection because it is preceded by @IBOutlet. This is declared as an implicitly unwrapped optional because the connections are not set up until after initialization, but they are still guaranteed to be set up before any other methods are called on the view. 
This also then leads us to make our second variable, buttonOriginalWidth, implicitly unwrapped because we need to wait until the connection is made before we can determine the width of button. After awakeFromNib is called, it is safe to treat both button and buttonOriginalWidth as nonoptional. You may have noticed that we had to dive pretty deep into app development in order to find a valid use case for implicitly unwrapped optionals, and this is arguably only because UIKit is implemented in Objective-C. Debugging optionals We already saw a couple of compiler errors that we commonly see because of optionals. If we try to call a method on an optional that we intended to call on the wrapped value, we will get an error. If we try to unwrap a value that is not actually optional, we will get an error that the variable or constant is not optional. We also need to be prepared for runtime errors that optionals can cause. As discussed, optionals cause runtime errors if you try to forcefully unwrap an optional that is nil. This can happen with both explicit and implicit forced unwrapping. If you followed my advice so far in this article, this should be a rare occurrence. However, we all end up working with third-party code, and maybe they were lazy or maybe they used forced unwrapping to enforce their expectations about how their code should be used. Also, we all suffer from laziness from time to time. It can be exhausting or discouraging to worry about all the edge cases when you are excited about programming the main functionality of your app. We may use forced unwrapping temporarily while we worry about that main functionality and plan to come back to handle it later. After all, during development, it is better to have a forced unwrapping crash the development version of your app than it is for it to fail silently if you have not yet handled that edge case.
We may even decide that an edge case is not worth the development effort of handling because everything about developing an app is a trade-off. Either way, we need to recognize a crash from forced unwrapping quickly, so that we don't waste extra time trying to figure out what went wrong. When an app tries to unwrap a nil value, if you are currently debugging the app, Xcode shows you the line that tries to do the unwrapping. The line reports that there was EXC_BAD_INSTRUCTION and you will also get a message in the console saying fatal error: unexpectedly found nil while unwrapping an Optional value:   You will also sometimes have to look at which code currently calls the code that failed. To do that, you can use the call stack in Xcode. When your program crashes, Xcode automatically displays the call stack, but you can also manually show it by going to View | Navigators | Show Debug Navigator. This will look something like the following:   Here, you can click on different levels of code to see the state of things. This becomes even more important if the program crashes within one of Apple's frameworks, where you do not have access to the code. In that case, you should move up the call stack to the point where your code is called in the framework. You may also be able to look at the names of the functions to help you figure out what may have gone wrong. Anywhere on the call stack, you can look at the state of the variables in the debugger, as shown in the following screenshot:   If you do not see this variables view, you can display it by clicking the button in the bottom-left corner, second from the right, which will be grayed out. Here, you can see that invitee is indeed nil, which is what caused the crash. As powerful as the debugger is, if you find that it isn't helping you find the problem, you can always put println statements in important parts of the code.
It is always safe to print out an optional as long as you don't forcefully unwrap it like in the preceding example. As we saw earlier, when an optional is printed, it will print nil if it doesn't have a value or it will print Optional(<value>) if it does have a value.

Debugging is an extremely important part of becoming a productive developer because we all make mistakes and create bugs. Being a great developer means that you can identify problems quickly and understand how to fix them soon after that. This will largely come from practice, but it will also come when you have a firm grasp of what really happens with your code instead of simply adapting some code you find online to fit your needs through trial and error.

The underlying implementation

At this point, you should have a pretty strong grasp of what an optional is and how to use and debug it, but it is valuable to look deeper at optionals and see how they actually work. In reality, the question mark syntax for optionals is just a special shorthand. Writing String? is equivalent to writing Optional<String>. Writing String! is equivalent to writing ImplicitlyUnwrappedOptional<String>. The Swift compiler has shorthand versions because they are so commonly used. This allows the code to be more concise and readable. If you declare an optional using the long form, you can see Swift's implementation by holding command and clicking on the word Optional. Here, you can see that Optional is implemented as an enumeration. If we simplify the code a little, we have:

    enum Optional<T> {
        case None
        case Some(T)
    }

So, we can see that Optional really has two cases: None and Some. None stands for the nil case, while the Some case has an associated value, which is the value wrapped inside Optional. Unwrapping is then the process of retrieving the associated value out of the Some case. One part of this that you have not seen yet is the angled bracket syntax (<T>).
This is a generic and essentially allows the enumeration to have an associated value of any type. Realizing that optionals are simply enumerations will help you to understand how to use them. It also gives you some insight into how concepts are built on top of other concepts. Optionals seem really complex until you realize that they are just two-case enumerations. Once you understand enumerations, you can pretty easily understand optionals as well. Summary We only covered a single concept, optionals, in this article, but we saw that this is a pretty dense topic. We saw that at the surface level, optionals are pretty straightforward. They offer a way to represent a variable that has no value. However, there are multiple ways to get access to the value wrapped within an optional, which have very specific use cases. Optional binding is always preferred as it is the safest method, but we can also use forced unwrapping if we are confident that an optional is not nil. We also have a type called implicitly unwrapped optional to delay the assigning of a variable that is not intended to be optional, but we should use it sparingly because there is almost always a better alternative. Resources for Article: Further resources on this subject: Network Development with Swift [article] Flappy Swift [article] Playing with Swift [article]
Packt
07 Jul 2015
34 min read

Extending Chef

In this article by Mayank Joshi, the author of Mastering Chef, we'll learn how to build custom Knife plugins, and we'll also see how to write custom handlers that extend the functionality of a chef-client run to report any issues with that run.

Custom Knife plugins

Knife is one of the most widely used tools in the Chef ecosystem. Whether you are managing clients, nodes, cookbooks, environments, roles, or users, or provisioning machines in cloud environments such as Amazon AWS or Microsoft Azure, there is a way to do all of these things through Knife. However, Knife, as provided during installation of Chef, isn't capable of performing all these tasks on its own. It comes with a basic set of functionalities that provide an interface between the local Chef repository, the workstation, and the Chef server.

The following functionalities are provided by default by the Knife executable:

- Management of nodes
- Management of clients and users
- Management of cookbooks, roles, and environments
- Installation of chef-client on the nodes through bootstrapping
- Searching for data that is indexed on the Chef server

Apart from these functions, plenty more can be performed using Knife; all this is possible through the use of plugins. Knife plugins are a set of one (or more) subcommands that can be added to Knife to support additional functionality that is not built into the base set of Knife subcommands. Most of the Knife plugins are initially built by users such as you, and over a period of time, they are incorporated into the official Chef code base. A Knife plugin is usually installed into the ~/.chef/plugins/knife directory, from where it can be executed just like any other Knife subcommand.
It can also be loaded from the .chef/plugins/knife directory in the Chef repository or, if it's installed through RubyGems, from the path where the executable is installed. Ideally, a plugin should be kept in the ~/.chef/plugins/knife directory so that it's reusable across projects, and also in the .chef/plugins/knife directory of the Chef repository so that its code can be shared with other team members. For distribution purposes, it should ideally be packaged as a Ruby gem.

The skeleton of a Knife plugin

A Knife plugin is structured somewhat like this:

    require 'chef/knife'

    module ModuleName
      class ClassName < Chef::Knife

        deps do
          require 'chef/dependencies'
        end

        banner "knife subcommand argument VALUE (options)"

        option :name_of_option,
          :short => "-l value",
          :long => "--long-option-name value",
          :description => "The description of the option",
          :proc => Proc.new { code_to_be_executed },
          :boolean => true | false,
          :default => default_value

        def run
          #Code
        end
      end
    end

Let's look at this skeleton, one line at a time:

- require: This loads the Knife library and any other Knife plugins required by the new plugin.
- module ModuleName: This defines the namespace in which the plugin will live. Every Knife plugin lives in its own namespace.
- class ClassName < Chef::Knife: This declares that the plugin is a subclass of Chef::Knife.
- deps do: This defines a list of dependencies.
- banner: This is used to display a message when a user enters knife subcommand --help.
- option :name_of_option: This defines a command-line option available for this new subcommand; it is repeated once per option.
- def run: This is the place in which we specify the Ruby code that needs to be executed.
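To make the skeleton less abstract, here is a minimal sketch of how class-level declarations such as banner and option can be recorded and then read back inside run. MiniKnife and Greet are hypothetical stand-ins invented for this illustration so that the code runs without Chef installed; this is not how Chef::Knife is actually implemented.

```ruby
# Hypothetical stand-in for Chef::Knife, written only to illustrate how
# class-level declarations such as `banner` and `option` can be recorded.
# This is NOT Chef's implementation.
class MiniKnife
  class << self
    # With an argument, store the usage banner; without one, return it.
    def banner(text = nil)
      @banner = text if text
      @banner
    end

    # Record one command-line option and its attributes.
    def option(name, attrs = {})
      options[name] = attrs
    end

    # Per-class table of declared options.
    def options
      @options ||= {}
    end
  end

  attr_reader :config, :name_args

  def initialize(name_args = [], overrides = {})
    @name_args = name_args
    # Start from each option's :default, then apply the supplied values.
    defaults = self.class.options.each_with_object({}) do |(name, attrs), acc|
      acc[name] = attrs[:default] if attrs.key?(:default)
    end
    @config = defaults.merge(overrides)
  end
end

# A hypothetical subcommand declared in the same shape as the skeleton.
class Greet < MiniKnife
  banner "knife greet NAME (options)"

  option :greeting,
    :short => "-g VALUE",
    :long => "--greeting VALUE",
    :description => "The greeting to use",
    :default => "Hello"

  def run
    "#{config[:greeting]}, #{name_args.first}!"
  end
end

puts Greet.banner
puts Greet.new(["world"]).run
puts Greet.new(["world"], :greeting => "Howdy").run
```

The point of the sketch is only the shape: declarations live at class level, while run reads the resolved values from config and the positional arguments from name_args.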
Here are the command-line options:

- :short defines the short option name
- :long defines the long option name
- :description defines a description that is displayed when a user enters knife subcommand --help
- :boolean defines whether an option is true or false; if the :short and :long names define value, then this attribute should not be used
- :proc defines the code that determines the value for this option
- :default defines a default value

The following example shows a part of a Knife plugin named knife-windows:

    require 'chef/knife'
    require 'chef/knife/winrm_base'

    class Chef
      class Knife
        class Winrm < Knife

          include Chef::Knife::WinrmBase

          deps do
            require 'readline'
            require 'chef/search/query'
            require 'em-winrm'
          end

          attr_writer :password

          banner "knife winrm QUERY COMMAND (options)"

          option :attribute,
            :short => "-a ATTR",
            :long => "--attribute ATTR",
            :description => "The attribute to use for opening the connection - default is fqdn",
            :default => "fqdn"

          ... # more options

          def session
            session_opts = {}
            session_opts[:logger] = Chef::Log.logger if Chef::Log.level == :debug
            @session ||= begin
              s = EventMachine::WinRM::Session.new(session_opts)
              s.on_output do |host, data|
                print_data(host, data)
              end
              s.on_error do |host, err|
                print_data(host, err, :red)
              end
              s.on_command_complete do |host|
                host = host == :all ? 'All Servers' : host
                Chef::Log.debug("command complete on #{host}")
              end
              s
            end
          end

          ... # more def blocks
        end
      end
    end

Namespace

As we saw with the skeleton, a Knife plugin should have its own namespace, and the namespace is declared using the module keyword as follows:

    require 'chef/knife'
    #Any other require, if needed

    module NameSpace
      class SubclassName < Chef::Knife

Here, the plugin is available under the namespace called NameSpace. One should keep in mind that Knife loads the subcommand irrespective of the namespace to which it belongs.

Class name

The class name declares the plugin as a subclass of Chef::Knife. For example:

    class SubclassName < Chef::Knife

The capitalization of the name is very important. The capitalization pattern can be used to define the word grouping that makes the best sense for the use of a plugin. For example, if we want our plugin subcommand to work as follows:

    knife bootstrap hdfs

We should have our class name as BootstrapHdfs. If, say, we used a class name such as BootStrapHdfs, then our subcommand would be as follows:

    knife boot strap hdfs

It's important to remember that a plugin can override an existing Knife subcommand. For example, we already know about commands such as knife cookbook upload. If you want to override the current functionality of this command, all you need to do is create a new plugin with the following name:

    class CookbookUpload < Chef::Knife

Banner

Whenever a user enters the knife --help command, they are presented with a list of available subcommands. For example:

    knife --help
    Usage: knife sub-command (options)
        -s, --server-url URL             Chef Server URL
    Available subcommands: (for details, knife SUB-COMMAND --help)

    ** BACKUP COMMANDS **
    knife backup export [COMPONENT [COMPONENT ...]] [-D DIR] (options)
    knife backup restore [COMPONENT [COMPONENT ...]] [-D DIR] (options)

    ** BOOTSTRAP COMMANDS **
    knife bootstrap FQDN (options)
    ....

Let us say we are creating a new plugin and we would want Knife to be able to list it when a user enters the knife --help command.
To accomplish this, we would need to make use of banner. For example, let's say we've a plugin called BootstrapHdfs with the following code:

    module NameSpace
      class BootstrapHdfs < Chef::Knife
        ...
        banner "knife bootstrap hdfs (options)"
        ...
      end
    end

Now, when a user enters the knife --help command, they'll see the following output:

    ** BOOTSTRAPHDFS COMMANDS **
    knife bootstrap hdfs (options)

Dependencies

Reusability is one of the key paradigms in development and the same is true for Knife plugins. If you want a functionality of one Knife plugin to be available in another, you can use the deps method to ensure that all the necessary files are available. The deps method acts like a lazy loader, and it ensures that dependencies are loaded only when a plugin that requires them is executed. This is one of the reasons for using deps over a top-level require: libraries are not loaded for commands that never use them, which keeps the loading overhead and memory footprint down. One can use the following syntax to specify dependencies:

    deps do
      require 'chef/knife/name_of_command'
      require 'chef/search/query'
      #Other requires to fulfill dependencies
    end

Requirements

One can acquire the functionality available in other Knife plugins using the require method. This method can also be used to require the functionality available in other external libraries. It can be used right at the beginning of the plugin script; however, it's always wise to use it inside deps, or else the libraries will be loaded even when they are not being put to use. The syntax to use require is fairly simple, as follows:

    require 'path_from_where_to_load_library'

Let's say we want to use some functionalities provided by the bootstrap plugin.
In order to accomplish this, we will first need to require the plugin:

    require 'chef/knife/bootstrap'

Next, we'll need to create an object of that plugin:

    obj = Chef::Knife::Bootstrap.new

Once we've the object with us, we can pass arguments or options to it. This is accomplished by changing the object's config and name_args variables. For example:

    obj.config[:use_sudo] = true

Finally, we can run the plugin using the run method as follows:

    obj.run

Options

Almost every Knife plugin accepts some command-line option or other. These options can be added to a Knife subcommand using the option method. An option can have a Boolean value, a string value, or we can even write a piece of code to determine the value of an option. Let's see each of them in action once.

An option with a Boolean value (true/false):

    option :true_or_false,
      :short => "-t",
      :long => "--true-or-false",
      :description => "True/False?",
      :boolean => true,
      :default => true

An option with a string value:

    option :some_string_value,
      :short => "-s VALUE",
      :long => "--some-string-value VALUE",
      :description => "String value",
      :default => "xyz"

An option where code is used to determine the option's value:

    option :tag,
      :short => "-T T=V[,T=V,...]",
      :long => "--tags Tag=Value[,Tag=Value,...]",
      :description => "A list of tags",
      :proc => Proc.new { |tags| tags.split(',') }

Here the proc attribute will convert a list of comma-separated values into an array.

All the options that are sent to the Knife subcommand through the command line are available in the form of a hash, which can be accessed using the config method.
For example, say we had an option:

    option :option1,
      :short => "-s VALUE",
      :long => "--some-string-value VALUE",
      :description => "Some string value for option1",
      :default => "option1"

Now, while issuing the Knife subcommand, say a user entered something like this:

    $ knife subcommand --some-string-value "option1_value"

We can access this value for option1 in our Knife plugin run method using config[:option1].

When a user enters the knife --help command, the description attributes are displayed as part of help. For example:

    **EXAMPLE COMMANDS**
    knife example
    -s, --some-type-of-string-value    This is not a random string value.
    -t, --true-or-false                 Is this value true? Or is this value false?
    -T, --tags                         A list of tags associated with the virtual machine.

Arguments

A Knife plugin can also accept command-line arguments that aren't specified using an option flag, for example, knife node show NODE. These arguments are available through the name_args method:

    require 'chef/knife'

    module MyPlugin
      class ShowMsg < Chef::Knife
        banner 'knife show msg MESSAGE'

        def run
          unless name_args.size == 1
            puts "You need to supply a string as an argument."
            show_usage
            exit 1
          end
          msg = name_args.join(" ")
          puts msg
        end
      end
    end

Let's see this in action:

    knife show msg
    You need to supply a string as an argument.
    USAGE: knife show msg MESSAGE
        -s, --server-url URL             Chef Server URL
            --chef-zero-host HOST       Host to start chef-zero on
    ...

Here, we didn't pass any argument to the subcommand and, rightfully, Knife sent back a message saying You need to supply a string as an argument. Now, let's pass a string as an argument to the subcommand and see how it behaves:

    knife show msg "duh duh"
    duh duh

Under the hood what's happening is that name_args is an array, which is getting populated by the arguments that we have passed in the command line.
In the last example, the message was quoted, so the shell passed it as a single argument and the name_args array contained one entry ("duh duh"); this is why the name_args.size == 1 check passed. Had the message been left unquoted, name_args would have contained two entries ("duh", "duh") and the plugin would have printed the usage message instead. We use the join method of the Array class to create a single string out of the array's entries and, finally, print the string.

The run method

Every Knife plugin will have a run method, which contains the Ruby statements that are executed when the user invokes the subcommand. This code can access the option values using config[:option_hash_symbol_name].

Search inside a custom Knife plugin

Search is perhaps one of the most powerful and most used functionalities provided by Chef. By incorporating a search functionality in our custom Knife plugin, we can accomplish a lot of tasks that would otherwise take a lot of effort. For example, say we have classified our infrastructure into multiple environments and we want a plugin that can allow us to upload a particular file or folder to all the instances in a particular environment on an ad hoc basis, without invoking a full chef-client run. This kind of stuff is very much doable by incorporating a search functionality into the plugin and using it to find the right set of nodes on which you want to perform a certain operation. We'll look at one such plugin in the next section.

To be able to use Chef's search functionality, all you need to do is require Chef's query class and use an object of the Chef::Search::Query class to execute a query against the Chef server. For example:

    require 'chef/search/query'

    query_object = Chef::Search::Query.new
    query = 'chef_environment:production'
    query_object.search('node',query) do |node|
      puts "Node name = #{node.name}"
    end

Since the name of a node is generally its FQDN, you can use the values returned in node.name to connect to remote machines and use a library such as net-scp to allow users to upload their files/folders to a remote machine.
We'll try to accomplish this task when we write our custom plugin at the end of this article. We can also use this information to edit nodes. For example, say we had a set of machines acting as web servers. Initially, all these machines were running Apache as a web server. However, as the requirements changed, we wanted to switch over to Nginx. We can run the following piece of code to accomplish this task:

    require 'chef/search/query'

    query_object = Chef::Search::Query.new
    query = 'run_list:*recipe\[apache2\]*'
    query_object.search('node',query) do |node|
      ui.msg "Changing run_list to recipe[nginx] for #{node.name}"
      node.run_list("recipe[nginx]")
      node.save
      ui.msg "New run_list: #{node.run_list}"
    end

knife.rb settings

Some of the settings defined by a Knife plugin can be configured so that they can be set inside the knife.rb script. There are two ways to go about doing this:

- By using the :proc attribute of the option method and code that references Chef::Config[:knife][:setting_name]
- By specifying the configuration setting directly within the def Ruby blocks using either Chef::Config[:knife][:setting_name] or config[:setting_name]

An option that is defined in this way can be configured in knife.rb by using the following syntax:

    knife[:setting_name] = value

This approach is especially useful when a particular setting is used a lot. The precedence order for a Knife option is:

- The value passed via the command line
- The value saved in knife.rb
- The default value

The following example shows how the Knife bootstrap command uses a value in knife.rb using the :proc attribute:

    option :ssh_port,
      :short => '-p PORT',
      :long => '--ssh-port PORT',
      :description => 'The ssh port',
      :proc => Proc.new { |key| Chef::Config[:knife][:ssh_port] = key }

Here Chef::Config[:knife][:ssh_port] tells Knife to check the knife.rb file for a knife[:ssh_port] setting.
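The precedence order described above (command line first, then knife.rb, then the default) can be sketched in plain Ruby. The resolve_setting helper and the three hashes below are hypothetical names used only for illustration; Knife's own lookup code is not shown in this article and may well differ.

```ruby
# Hypothetical helper illustrating the precedence order described above:
# command-line value first, then the knife.rb value, then the default.
# Plain hashes stand in for the three sources so this runs without Chef.
def resolve_setting(name, cli_values, knife_rb_values, defaults)
  [cli_values, knife_rb_values, defaults].each do |source|
    # key? (rather than ||) keeps an explicit false or nil from being skipped.
    return source[name] if source.key?(name)
  end
  nil
end

defaults = { :ssh_port => "22" }
knife_rb = { :ssh_port => "2222" } # stands in for knife[:ssh_port] = "2222"

puts resolve_setting(:ssh_port, { :ssh_port => "2022" }, knife_rb, defaults)
puts resolve_setting(:ssh_port, {}, knife_rb, defaults)
puts resolve_setting(:ssh_port, {}, {}, defaults)
```

Checking key? instead of chaining || is a deliberate choice in this sketch: a Boolean option explicitly set to false on the command line should not fall through to the knife.rb value.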
The following example shows how the Knife bootstrap command calls the knife ssh subcommand for the actual SSH part of running a bootstrap operation:

    def knife_ssh
      ssh = Chef::Knife::Ssh.new
      ssh.ui = ui
      ssh.name_args = [ server_name, ssh_command ]
      ssh.config[:ssh_user] = Chef::Config[:knife][:ssh_user] || config[:ssh_user]
      ssh.config[:ssh_password] = config[:ssh_password]
      ssh.config[:ssh_port] = Chef::Config[:knife][:ssh_port] || config[:ssh_port]
      ssh.config[:ssh_gateway] = Chef::Config[:knife][:ssh_gateway] || config[:ssh_gateway]
      ssh.config[:identity_file] = Chef::Config[:knife][:identity_file] || config[:identity_file]
      ssh.config[:manual] = true
      ssh.config[:host_key_verify] = Chef::Config[:knife][:host_key_verify] || config[:host_key_verify]
      ssh.config[:on_error] = :raise
      ssh
    end

Let's take a look at the preceding code:

- ssh = Chef::Knife::Ssh.new creates a new instance of the Ssh subclass named ssh
- A series of settings in Knife ssh are associated with a Knife bootstrap using the ssh.config[:setting_name] syntax
- Chef::Config[:knife][:setting_name] tells Knife to check the knife.rb file for various settings
- ssh.config[:on_error] = :raise makes it raise an exception if any aspect of the SSH operation fails

User interactions

The ui object provides a set of methods that can be used to define user interactions and to help ensure a consistent user experience across all different Knife plugins. One should make use of these methods, rather than handling user interactions manually.

Method: Description

- ui.ask(*args, &block): The ask method calls the corresponding ask method of the HighLine library. More details about the HighLine library can be found at http://www.rubydoc.info/gems/highline/1.7.2.
- ui.ask_question(question, opts={}): This is used to ask a user a question. If :default => default_value is passed as a second argument, default_value will be used if the user does not provide any answer.
- ui.color(string, *colors): This method is used to specify a color. For example:

      server = connections.server.create(server_def)
      puts "#{ui.color("Instance ID", :cyan)}: #{server.id}"
      puts "#{ui.color("Flavor", :cyan)}: #{server.flavor_id}"
      puts "#{ui.color("Image", :cyan)}: #{server.image_id}"
      ...
      puts "#{ui.color("SSH Key", :cyan)}: #{server.key_name}"
      print "\n#{ui.color("Waiting for server", :magenta)}"

- ui.color?(): This indicates whether colored output should be used. This is only possible if the output is sent to a terminal.
- ui.confirm(question, append_instructions=true): This is used to ask (Y/N) questions. If a user responds with N, the command immediately exits with the status code 3.
- ui.edit_data(data, parse_output=true): This is used to edit data. This will result in firing up of an editor.
- ui.edit_object(class, name): This method provides a convenient way to download an object, edit it, and save it back to the Chef server. It takes two arguments, namely, the class of object to edit and the name of object to edit.
- ui.error: This is used to present an error to a user.
- ui.fatal: This is used to present a fatal error to a user.
- ui.highline: This is used to provide direct access to the highline object used by many ui methods.
- ui.info: This is used to present information to a user.
- ui.interchange: This is used to determine whether the output is in a data interchange format such as JSON or YAML.
- ui.list(*args): This method is a way to quickly and easily lay out lists. This method is actually a wrapper to the list method provided by the HighLine library. More details about the HighLine library can be found at http://www.rubydoc.info/gems/highline/1.7.2.
- ui.msg(message): This is used to present a message to a user.
- ui.output(data): This is used to present a data structure to a user. This makes use of a generic default presenter.
- ui.pretty_print(data): This is used to enable the pretty_print output for JSON data.
- ui.use_presenter(presenter_class): This is used to specify a custom output presenter.
- ui.warn(message): This is used to present a warning to a user.

For example, to show a fatal error in a plugin in the same way that it would be shown in Knife, do something similar to the following:

    unless name_args.size == 1
      ui.fatal "Fatal error !!!"
      show_usage
      exit 1
    end

Exception handling

In most cases, the exception handling available within Knife is enough to ensure that the exception handling for a plugin is consistent across all the different plugins. However, if required, one can handle exceptions in the same way as in any other Ruby program: use a begin-end block, along with rescue clauses, to tell Ruby which exceptions we want to handle. For example:

    def raise_and_rescue
      begin
        puts 'Before raise'
        raise 'An error has happened.'
        puts 'After raise'
      rescue
        puts 'Rescued'
      end
      puts 'After begin block'
    end

    raise_and_rescue

If we were to execute this code, we'd get the following output:

    ruby test.rb
    Before raise
    Rescued
    After begin block

A simple Knife plugin

With the knowledge about how Knife's plugin system works, let's go about writing our very own custom Knife plugin, which can be quite useful for some users. Before we jump into the code, let's understand the purpose that this plugin is supposed to serve. Let's say we've a setup where our infrastructure is distributed across different environments and we've also set up a bunch of roles, which are used while we try to bootstrap the machines using Chef. So, there are two ways in which a user can identify machines:

- By environments
- By roles

Actually, any valid Chef search query that returns a node list can be the criteria to identify machines. However, we are limiting ourselves to these two criteria for now. Often, there are situations where a user might want to upload a file or folder to all the machines in a particular environment, or to all the machines belonging to a particular role. This plugin will help users accomplish this task with ease.
The plugin will accept three arguments. The first one will be a key-value pair with the key being chef_environment or a role, the second argument will be a path to the file or folder that is required to be uploaded, and the third argument will be the path on a remote machine where the files/folders will be uploaded to. The plugin will use Chef's search functionality to find the FQDN of machines, and eventually make use of the net-scp library to transfer the file/folder to the machines. Our plugin will be called knife-scp and we would like to use it as follows:

    knife scp chef_environment:production /path_of_file_or_folder_locally /path_on_remote_machine

Here is the code that can help us accomplish this feat:

    require 'chef/knife'

    module CustomPlugins
      class Scp < Chef::Knife
        banner "knife scp SEARCH_QUERY PATH_OF_LOCAL_FILE_OR_FOLDER PATH_ON_REMOTE_MACHINE"

        option :knife_config_path,
          :short => "-c PATH_OF_knife.rb",
          :long => "--config PATH_OF_knife.rb",
          :description => "Specify path of knife.rb",
          :default => "~/.chef/knife.rb"

        deps do
          require 'chef/search/query'
          require 'net/scp'
          require 'parallel'
        end

        def run
          if name_args.length != 3
            ui.msg "Missing arguments! Unable to execute the command successfully."
            show_usage
            exit 1
          end

          Chef::Config.from_file(File.expand_path("#{config[:knife_config_path]}"))
          query = name_args[0]
          local_path = name_args[1]
          remote_path = name_args[2]
          query_object = Chef::Search::Query.new
          fqdn_list = Array.new
          query_object.search('node',query) do |node|
            fqdn_list << node.name
          end
          if fqdn_list.length < 1
            ui.msg "No valid servers found to copy the files to"
            # Nothing to copy to, so stop here instead of starting zero workers
            exit 1
          end
          unless File.exist?(local_path)
            ui.msg "#{local_path} doesn't exist on local machine"
            exit 1
          end

          # One worker process per server; each uploads the same local path
          Parallel.each((1..fqdn_list.length).to_a, :in_processes => fqdn_list.length) do |i|
            puts "Copying #{local_path} to #{Chef::Config[:knife][:ssh_user]}@#{fqdn_list[i-1]}:#{remote_path} "
            Net::SCP.upload!(fqdn_list[i-1],"#{Chef::Config[:knife][:ssh_user]}","#{local_path}","#{remote_path}",:ssh => { :keys => ["#{Chef::Config[:knife][:identity_file]}"] }, :recursive => true)
          end
        end
      end
    end

This plugin uses the following additional gems:

- The parallel gem to execute statements in parallel. More information about this gem can be found at https://github.com/grosser/parallel.
- The net-scp gem to do the actual transfer. This gem is a pure Ruby implementation of the SCP protocol. More information about the gem can be found at https://github.com/net-ssh/net-scp.

Both these gems and the Chef search library are required in the deps block to define the dependencies. This plugin accepts three command-line arguments, which are stored in the name_args array, and uses knife.rb to find out which user to connect as over SSH and which SSH key file to use. A Chef search is then used to find a list of servers that match the query, and eventually the parallel gem is used to SCP the file in parallel from the local machine to the list of servers returned by the Chef query.
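An exception raised by a single upload may abort the copy step for every server. One way to contain that, sketched below, is to rescue per host and report the failures at the end. Note the swap: this sketch uses threads from Ruby's standard library rather than the parallel gem, so that it runs without extra gems, and upload_to_all, fake_uploader, and the example.com host names are all hypothetical. In the real plugin, the callable would wrap the Net::SCP.upload! call.

```ruby
# Hypothetical helper: run `uploader` once per host in its own thread and
# collect failures instead of letting the first exception abort the run.
# `uploader` is any callable that takes a host name.
def upload_to_all(hosts, uploader)
  failures = {}
  lock = Mutex.new
  threads = hosts.map do |host|
    Thread.new do
      begin
        uploader.call(host)
      rescue StandardError => e
        # Record the failure for this host; other hosts keep going.
        lock.synchronize { failures[host] = e.message }
      end
    end
  end
  threads.each(&:join)
  failures
end

# Stand-in uploader that fails for one host, so the sketch runs offline.
fake_uploader = lambda do |host|
  raise "connection refused" if host == "host02.example.com"
end

failures = upload_to_all(%w[host01.example.com host02.example.com], fake_uploader)
failures.each { |host, msg| puts "Failed to copy to #{host}: #{msg}" }
```

Returning a hash of host to error message leaves the reporting decision to the caller, for example printing each entry with ui.error and exiting non-zero if the hash is not empty.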
As you can see, we've tried to handle a few error situations; however, there is still a possibility of this plugin raising errors, as the Net::SCP.upload! call can fail at times. Let's see our plugin in action:

Case 1: The file that is supposed to be uploaded doesn't exist locally. We expect the script to error out with an appropriate message:

    knife scp 'chef_environment:ft' /Users/mayank/test.py /tmp
    /Users/mayank/test.py doesn't exist on local machine

Case 2: The /Users/mayank/test folder exists locally and is uploaded:

    knife scp 'chef_environment:ft' /Users/mayank/test /tmp
    Copying /Users/mayank/test to ec2-user@host02.ft.sychonet.com:/tmp
    Copying /Users/mayank/test to ec2-user@host01.ft.sychonet.com:/tmp

Case 3: A knife.rb path is specified explicitly instead of relying on the default:

    knife scp -c /Users/mayank/.chef/knife.rb 'chef_environment:ft' /Users/mayank/test /tmp
    Copying /Users/mayank/test to ec2-user@host02.ft.sychonet.com:/tmp
    Copying /Users/mayank/test to ec2-user@host01.ft.sychonet.com:/tmp

Distributing plugins using gems

As you must have noticed, until now we've been creating our plugins under ~/.chef/plugins/knife. Though this is sufficient for plugins that are meant to be used locally, it's just not good enough for distribution to a community. The ideal way of distributing a Knife plugin is to package it as a gem and distribute it via a gem repository such as rubygems.org. Even if publishing your gem to a remote gem repository sounds like a far-fetched idea, at least allow people to install your plugin by building a gem locally and installing it via gem install. This is far better than people downloading your code from an SCM repository and copying it over to either ~/.chef/plugins/knife or any other folder they've configured for the purpose of searching for custom Knife plugins.
By distributing your plugin as a gem, you ensure that the plugin is installed in a consistent way, and you can also ensure that all the required libraries are installed before the plugin is ready to be consumed by users. All the details required to create a gem are contained in a file known as a gemspec, which resides at the root of your project's directory and is typically named <project_name>.gemspec. The gemspec file describes the structure, dependencies, and metadata required to build your gem. The following is an example of a .gemspec file:

    Gem::Specification.new do |s|
      s.name = 'knife-scp'
      s.version = '1.0.0'
      s.date = '2014-10-23'
      s.summary = 'The knife-scp knife plugin'
      s.authors = ["maxcoder"]
      s.email = 'maxcoder@sychonet.com'
      s.files = ["lib/chef/knife/knife-scp.rb"]
      s.homepage = "https://github.com/maxc0d3r/knife-plugins"
      s.add_runtime_dependency "parallel","~> 1.2", ">= 1.2.0"
      s.add_runtime_dependency "net-scp","~> 1.2", ">= 1.2.0"
    end

The s.files variable contains the list of files that will be deployed by a gem install command. Knife can load the files from gem_path/lib/chef/knife/<file_name>.rb, and hence we've kept the knife-scp.rb script in that location. The s.add_runtime_dependency method is used to ensure that the required gems are installed whenever a user tries to install our gem. Once the file is there, we can just run gem build to build our gem file as follows:

    knife-scp git:(master) x gem build knife-scp.gemspec
    WARNING: licenses is empty, but is recommended. Use a license abbreviation from: http://opensource.org/licenses/alphabetical
    WARNING: See http://guides.rubygems.org/specification-reference/ for help
    Successfully built RubyGem
    Name: knife-scp
    Version: 1.0.0
    File: knife-scp-1.0.0.gem

The gem file is created and now, we can just use gem install knife-scp-1.0.0.gem to install our gem. This will also take care of the installation of any dependencies such as the parallel and net-scp gems.
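Before building, it can be handy to check what a specification actually declares. The following sketch builds a specification in memory, using the same names as the example above but a reduced set of fields, and lists its runtime dependencies. It relies only on the Gem::Specification class that ships with RubyGems; nothing is written to disk and no gem is built.

```ruby
require 'rubygems'

# Build a specification in memory; the field values mirror the example
# gemspec above, trimmed to what this sketch needs.
spec = Gem::Specification.new do |s|
  s.name    = 'knife-scp'
  s.version = '1.0.0'
  s.summary = 'The knife-scp knife plugin'
  s.authors = ['maxcoder']
  s.files   = ['lib/chef/knife/knife-scp.rb']
  s.add_runtime_dependency 'parallel', '~> 1.2'
  s.add_runtime_dependency 'net-scp', '~> 1.2'
end

# List the gems that `gem install` would need to pull in alongside ours.
spec.runtime_dependencies.each do |dep|
  puts "#{dep.name} (#{dep.requirement})"
end
```

A check like this could live in a test so that a dependency dropped from the gemspec by accident is noticed before the gem is published.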
You can find the source code for this plugin at the following location: https://github.com/maxc0d3r/knife-plugins.

Once the gem has been installed, the user can run it as mentioned earlier. For the purpose of distribution, this gem can either be pushed to a local gem repository or published to https://rubygems.org/. To publish it to https://rubygems.org/, create an account there. Run the following command to log in using gem:

    gem push

This will ask for your email address and password. Next, push your gem using the following command:

    gem push your_gem_name.gem

That's it! Now you should be able to access your gem at the following location: http://www.rubygems.org/gems/your_gem_name.

As you might have noticed, we've not written any tests so far to check the plugin. It's always a good idea to write test cases before submitting your plugin to the community. Tests are useful to both the developer and the consumers of the code, as both know that the plugin is going to work as expected. Gems support adding test files into the package itself so that tests can be executed when a gem is downloaded. RSpec is a popular choice of testing framework; however, it really doesn't matter which tool you use to test your code. The point is that you need to test and ship.

Some popular Knife plugins built by the community, and their uses, are as follows:

knife-elb: This plugin automates the addition and deletion of nodes from Elastic Load Balancers on AWS.
knife-inspect: This plugin allows you to see the difference between what's on a Chef server and what's in a local Chef repository.
knife-community: This plugin helps to deploy Chef cookbooks to Chef Supermarket.
knife-block: This plugin allows you to configure and manage multiple Knife configuration files against multiple Chef servers.
knife-tagbulk: This plugin allows bulk tag operations (creation or deletion) using standard Chef search queries.
More information about the plugin can be found at: https://github.com/priestjim/knife-tagbulk. You can find a lot of other useful community-written plugins at: https://docs.chef.io/community_plugin_knife.html. Custom Chef handlers A Chef handler is used to identify different situations that might occur during a chef-client run, and eventually it instructs the chef-client on what it should do to handle these situations. There are three types of handlers in Chef: The exception handler: This is used to identify situations that have caused a chef-client run to fail. This can be used to send out alerts over an email or dashboard. The report handler: This is used to report back when a chef-client run has successfully completed. This can report details about the run, such as the number of resources updated, time taken for a chef-client run to complete, and so on. The start handler: This is used to run events at the beginning of a chef-client run. Writing custom Chef handlers is nothing more than just inheriting your class from Chef::Handler and overriding the report method. Let's say we want to send out an email every time a chef-client run breaks. Chef provides a failed? method to check the status of a chef-client run. 
The following is a very simple piece of code that will help us accomplish this:

    require 'net/smtp'

    module CustomHandler
      class Emailer < Chef::Handler
        def send_email(to, opts = {})
          opts[:server] ||= 'localhost'
          opts[:from] ||= 'maxcoder@sychonet.com'
          opts[:subject] ||= 'Error'
          opts[:body] ||= 'There was an error running chef-client'

          # <<~ strips the leading indentation (Ruby 2.3 or later)
          msg = <<~EOF
            From: <#{opts[:from]}>
            To: #{to}
            Subject: #{opts[:subject]}

            #{opts[:body]}
          EOF

          Net::SMTP.start(opts[:server]) do |smtp|
            smtp.send_message msg, opts[:from], to
          end
        end

        def report
          name = node.name
          subject = "Chef run failure on #{name}"
          # Collect the exception and its backtrace as lines of the mail body
          body = [run_status.formatted_exception]
          body += ::Array(backtrace)
          if failed?
            send_email(
              "ops@sychonet.com",
              :subject => subject,
              :body => body.join("\n")
            )
          end
        end
      end
    end

If you don't have the required libraries already installed on your machine, you'll need to make use of chef_gem to install them first before you actually make use of this code. With your handler code ready, you can make use of the chef_handler cookbook to install this custom handler. To do so, create a new cookbook, email-handler, and copy the emailer.rb file created earlier to the cookbook's files directory, from where cookbook_file picks up its source. Once done, add the following recipe code:

    include_recipe 'chef_handler'

    handler_path = node['chef_handler']['handler_path']
    handler = ::File.join handler_path, 'emailer'

    cookbook_file "#{handler}.rb" do
      source "emailer.rb"
    end

    chef_handler "CustomHandler::Emailer" do
      source handler
      action :enable
    end

Now, just include this recipe in your base role or at the start of run_list, and during the next chef-client run, if anything breaks, an email will be sent to ops@sychonet.com.
You can configure many different kinds of handlers, like the ones that push notifications over to IRC, Twitter, and so on, or you may even write them for scenarios where you don't want to leave a component of a system in an undesirable state. For example, say you were in the middle of a chef-client run that adds/deletes collections from Solr. Now, you might not want to leave the Solr setup in a messed-up state if something were to go wrong with the provisioning process. In order to ensure that a system is in the right state, you can write your own custom handlers, which can be used to handle such situations and revert the changes done until now by the chef-client run.

Summary

In this article, we learned about how custom Knife plugins can be used. We also learned how we can write our own custom Knife plugin and distribute it by packaging it as a gem. Finally, we learned about custom Chef handlers and how they can be used effectively to communicate information and statistics about a chef-client run to users/admins, or handle any issues with a chef-client run.

Resources for Article:

Further resources on this subject:
An Overview of Automation and Advent of Chef [article]
Testing Your Recipes and Getting Started with ChefSpec [article]
Chef Infrastructure [article]

Packt
07 Jul 2015
9 min read

Transactions in Redis

In this article by Vinoo Das, author of the book Learning Redis, we will see how Redis, as a NoSQL data store, provides a loose sense of transaction. In a traditional RDBMS, a transaction starts with a BEGIN and ends with either COMMIT or ROLLBACK. All these RDBMS servers are multithreaded, so when a thread locks a resource, it cannot be manipulated by another thread unless and until the lock is released. Redis has MULTI to start a transaction and EXEC to execute its commands. In a transaction, the first command is always MULTI; after that, all the commands are stored, and when the EXEC command is received, all the stored commands are executed in sequence. So, under the hood, once Redis receives the EXEC command, all the commands are executed as a single isolated operation. The following commands can be used in Redis for transactions:

MULTI: This marks the start of a transaction block
EXEC: This executes all the commands in the pipeline after MULTI
WATCH: This watches the keys for conditional execution of a transaction
UNWATCH: This removes the WATCH keys of a transaction
DISCARD: This flushes all the previously queued commands in the pipeline

The following figure represents how a transaction in Redis works:

Transaction in Redis

Pipeline versus transaction

As we have seen, in a pipeline the commands are grouped and executed, and the responses are queued in a block and sent. In a transaction, however, all the commands received after MULTI are queued until the EXEC command is received, and only then executed. To understand the difference, it is important to take a case where we have a multithreaded environment and see the outcome. In the first case, we take two threads firing pipelined commands at Redis. In this sample, the first thread fires a pipelined command, which is going to change the value of a key multiple times, and the second thread will try to read the value of that key.
Following is the class which is going to fire the two threads at Redis:

MultiThreadedPipelineCommandTest.java:

    package org.learningRedis.chapter.four.pipelineandtx;

    public class MultiThreadedPipelineCommandTest {
        public static void main(String[] args) throws InterruptedException {
            Thread pipelineClient = new Thread(new PipelineCommand());
            Thread singleCommandClient = new Thread(new SingleCommand());
            pipelineClient.start();
            Thread.sleep(50);
            singleCommandClient.start();
        }
    }

The code for the client which is going to fire the pipeline commands is as follows:

    package org.learningRedis.chapter.four.pipelineandtx;

    import java.util.Set;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.Pipeline;

    public class PipelineCommand implements Runnable {
        Jedis jedis = ConnectionManager.get();

        @Override
        public void run() {
            long start = System.currentTimeMillis();
            Pipeline commandpipe = jedis.pipelined();
            for (int nv = 0; nv < 300000; nv++) {
                commandpipe.sadd("keys-1", "name" + nv);
            }
            commandpipe.sync();
            Set<String> data = jedis.smembers("keys-1");
            System.out.println("The return value of nv1 after pipeline [ " + data.size() + " ]");
            System.out.println("The time taken for executing client(Thread-1) " + (System.currentTimeMillis() - start));
            ConnectionManager.set(jedis);
        }
    }

The code for the client which is going to read the value of the key while the pipeline is executed is as follows:

    package org.learningRedis.chapter.four.pipelineandtx;

    import java.util.Set;
    import redis.clients.jedis.Jedis;

    public class SingleCommand implements Runnable {
        Jedis jedis = ConnectionManager.get();

        @Override
        public void run() {
            Set<String> data = jedis.smembers("keys-1");
            System.out.println("The return value of nv1 is [ " + data.size() + " ]");
            ConnectionManager.set(jedis);
        }
    }

The result will vary as per machine configuration, but by changing the thread sleep time and running the program a couple of times, the result will be similar to the one shown as follows:

    The return value of nv1 is [ 3508 ]
    The return value of nv1 after pipeline [ 300000 ]
    The time taken for executing client(Thread-1) 3718

Please fire the FLUSHDB command every time you run the test; otherwise, you will end up seeing the value of the previous test run, that is, 300,000.

Now we will run the sample in transaction mode, where the command pipeline will be preceded by the MULTI keyword and succeeded by the EXEC command. This client is similar to the previous sample, where two clients in separate threads fire commands at a single key on Redis. The following program is a test client that starts two threads: one with commands in transaction mode, and a second that will try to read and modify the same resource:

    package org.learningRedis.chapter.four.pipelineandtx;

    public class MultiThreadedTransactionCommandTest {
        public static void main(String[] args) throws InterruptedException {
            Thread transactionClient = new Thread(new TransactionCommand());
            Thread singleCommandClient = new Thread(new SingleCommand());
            transactionClient.start();
            Thread.sleep(30);
            singleCommandClient.start();
        }
    }

This program will try to modify the resource and read the resource while the transaction is going on:

    package org.learningRedis.chapter.four.pipelineandtx;

    import java.util.Set;
    import redis.clients.jedis.Jedis;

    public class SingleCommand implements Runnable {
        Jedis jedis = ConnectionManager.get();

        @Override
        public void run() {
            Set<String> data = jedis.smembers("keys-1");
            System.out.println("The return value of nv1 is [ " + data.size() + " ]");
            ConnectionManager.set(jedis);
        }
    }

This program will start with the MULTI command, try to modify the resource, end it with the EXEC command, and later read the value of the resource:

    package org.learningRedis.chapter.four.pipelineandtx;

    import java.util.Set;
    import redis.clients.jedis.Jedis;
    import redis.clients.jedis.Transaction;
    import chapter.four.pubsub.ConnectionManager;

    public class TransactionCommand implements Runnable {
        Jedis jedis = ConnectionManager.get();

        @Override
        public void run() {
            long start = System.currentTimeMillis();
            Transaction transactionableCommands = jedis.multi();
            for (int nv = 0; nv < 300000; nv++) {
                transactionableCommands.sadd("keys-1", "name" + nv);
            }
            transactionableCommands.exec();
            Set<String> data = jedis.smembers("keys-1");
            System.out.println("The return value nv1 after tx [ " + data.size() + " ]");
            System.out.println("The time taken for executing client(Thread-1) " + (System.currentTimeMillis() - start));
            ConnectionManager.set(jedis);
        }
    }

The result of the preceding program will vary as per machine configuration, but by changing the thread sleep time and running the program a couple of times, the result will be similar to the one shown as follows:

    The return code is [ 1 ]
    The return value of nv1 is [ null ]
    The return value nv1 after tx [ 300000 ]
    The time taken for executing client(Thread-1) 7078

Fire the FLUSHDB command every time you run the test. The idea is that the program should not pick up a value obtained because of a previous run of the program. The proof that the single command program is able to write to the key is the following line: The return code is [1].

Let's analyze the result. In the case of the pipeline, a single command reads the value and the pipeline command sets a new value to that key, as is evident in the following result:

    The return value of nv1 is [ 3508 ]

Now compare this with what happened in the case of the transaction, when a single command tried to read the value but was blocked because of the transaction. Hence the value will be NULL or 300,000.
The return value of nv1 after tx [0] or The return value of nv1 after tx [300000] So the difference in output can be attributed to the fact that in a transaction, if we have started a MULTI command, and are still in the process of queueing commands (that is, we haven't given the server the EXEC request yet), then any other client can still come in and make a request, and the response would be sent to the other client. Once the client gives the EXEC command, then all other clients are blocked while all of the queued transaction commands are executed. Pipeline and transaction To have a better understanding, let's analyze what happened in case of pipeline. When two different connections made requests to the Redis for the same resource, we saw a result where client-2 picked up the value while client-1 was still executing: Pipeline in Redis in a multi connection environment What it tells us is that requests from the first connection which is pipeline command is stacked as one command in its execution stack, and the command from the other connection is kept in its own stack specific to that connection. The Redis execution thread time slices between these two executions stacks, and that is why client-2 was able to print a value when the client-1 was still executing. Let's analyze what happened in case of transaction here. Again the two commands (transaction commands and GET commands) were kept in their own execution stacks, but when the Redis execution thread gave time to the GET command, and it went to read the value, seeing the lock it was not allowed to read the value and was blocked. The Redis execution thread again went back to executing the transaction commands, and again it came back to GET command where it was again blocked. This process kept happening until the transaction command released the lock on the resource and then the GET command was able to get the value. 
If by any chance the GET command was able to reach the resource before the transaction lock, it got a null value. Please bear in mind that Redis does not block execution for other clients while queuing transaction commands; it blocks only while executing them.

Transaction in Redis multi connection environment

This exercise gave us an insight into what happens in the case of pipeline and transaction.

Summary

In this article, we saw in brief how to use Redis not simply as a datastore but also to pipeline commands, which is much more like bulk processing. Apart from that, we covered areas such as transaction, messaging, and scripting. We also saw how to combine messaging and scripting, and create reliable messaging in Redis. This capability of Redis makes it different from some of the other datastore solutions.

Resources for Article:

Further resources on this subject:
Implementing persistence in Redis (Intermediate) [article]
Using Socket.IO and Express together [article]
Exploring streams [article]
Packt
07 Jul 2015
25 min read

Processing Next-generation Sequencing Datasets Using Python

In this article by Tiago Antao, author of Bioinformatics with Python Cookbook, you will process next-generation sequencing datasets using Python.

If you work in life sciences, you are probably aware of the increasing importance of computational methods to analyze increasingly larger datasets. There is a massive need for bioinformaticians to process this data, and one of the main tools is, of course, Python. Python is probably the fastest growing language in the field of data sciences. It includes a rich ecology of software libraries to perform complex data analysis. Another major point in Python's favor is its great community, which is always ready to help and produces great documentation and high-quality, reliable software.

In this article, we will use Python to process next-generation sequencing datasets. This is one of the many examples of Python usability in bioinformatics; chances are that if you have a biological dataset to analyze, Python can help you. This is surely the case with population genetics, genomics, phylogenetics, proteomics, and many other fields.

Next-generation Sequencing (NGS) is one of the fundamental technological developments of the decade in the field of life sciences. Whole Genome Sequencing (WGS), RAD-Seq, RNA-Seq, Chip-Seq, and several other technologies are routinely used to investigate important biological problems. These are also called high-throughput sequencing technologies with good reason; they generate vast amounts of data that need to be processed. NGS is the main reason why computational biology is becoming a "big data" discipline. More than anything else, this is a field that requires strong bioinformatics techniques. There is very strong demand for professionals with these skillsets.

Here, we will not discuss each individual NGS technique per se (this would be a massive undertaking).
We will use two existing WGS datasets: the Human 1000 genomes project (http://www.1000genomes.org/) and the Anopheles 1000 genomes dataset (http://www.malariagen.net/projects/vector/ag1000g). The code presented will be easily applicable for other genomic sequencing approaches; some of them can also be used for transcriptomic analysis (for example, RNA-Seq). Most of the code is also species-independent, that is, you will be able to apply them to any species in which you have sequenced data. As this is not an introductory text, you are expected to at least know what FASTA, FASTQ, BAM, and VCF files are. We will also make use of basic genomic terminology without introducing it (things such as exomes, nonsynonym mutations, and so on). You are required to be familiar with basic Python, and we will leverage that knowledge to introduce the fundamental libraries in Python to perform NGS analysis. Here, we will concentrate on analyzing VCF files. Preparing the environment You will need Python 2.7 or 3.4. You can use many of the available distributions, including the standard one at http://www.python.org, but we recommend Anaconda Python from http://continuum.io/downloads. We also recommend the IPython Notebook (Project Jupyter) from http://ipython.org/. If you use Anaconda, this and many other packages are available with a simple conda install. There are some amazing libraries to perform data analysis in Python; here, we will use NumPy (http://www.numpy.org/) and matplotlib (http://matplotlib.org/), which you may already be using in your projects. We will also make use of the less widely used seaborn library (http://stanford.edu/~mwaskom/software/seaborn/). For bioinformatics, we will use Biopython (http://biopython.org) and PyVCF (https://pyvcf.readthedocs.org). The code used here is available on GitHub at https://github.com/tiagoantao/bioinf-python. 
In a realistic pipeline, you will probably be using other tools, such as bwa, samtools, or GATK, to perform your alignment and SNP calling. In our case, tabix and bgzip (http://www.htslib.org/) are needed.

Analyzing variant calls

After running a genotype caller (for example, GATK or samtools mpileup), you will have a Variant Call Format (VCF) file reporting on genomic variations, such as SNPs (Single-Nucleotide Polymorphisms), InDels (Insertions/Deletions), and CNVs (Copy Number Variations), among others. In this recipe, we will discuss VCF processing with the PyVCF module over the human 1000 genomes project to analyze SNP data.

Getting ready

I believe that 2 to 20 GB of data for a tutorial is asking too much. Although the 1000 genomes VCF files with realistic annotations are in that order of magnitude, we will want to work with much less data here. Fortunately, the bioinformatics community has developed tools that allow partial download of data. As part of the samtools/htslib package (http://www.htslib.org/), you can download tabix and bgzip, which will take care of data management. For example:

    tabix -fh ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/release/20130502/supporting/vcf_with_sample_level_annotation/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5_extra_anno.20130502.genotypes.vcf.gz 22:1-17000000 |bgzip -c > genotypes.vcf.gz
    tabix -p vcf genotypes.vcf.gz

The first line will perform a partial download of the VCF file for chromosome 22 (up to 17 Mbp) of the 1000 genomes project. Then, bgzip will compress it. The second line will create an index, which we will need for direct access to a section of the genome. The preceding code is available at https://github.com/tiagoantao/bioinf-python/blob/master/notebooks/01_NGS/Working_with_VCF.ipynb.
How to do it…

Take a look at the following steps:

Let's start inspecting the information that we can get per record, as shown in the following code:

    import vcf

    v = vcf.Reader(filename='genotypes.vcf.gz')

    print('Variant Level information')
    infos = v.infos
    for info in infos:
        print(info)

    print('Sample Level information')
    fmts = v.formats
    for fmt in fmts:
        print(fmt)

We start by inspecting the annotations that are available for each record (remember that each record encodes a variant, such as an SNP, CNV, or InDel, and the state of that variant per sample). At the variant (record) level, we will find AC: the total number of ALT alleles in called genotypes, AF: the estimated allele frequency, NS: the number of samples with data, AN: the total number of alleles in called genotypes, and DP: the total read depth. There are others, but they are mostly specific to the 1000 genomes project (here, we are trying to be as general as possible). Your own dataset might have many more annotations or none of these.

At the sample level, there are only two annotations in this file: GT: genotype and DP: the per sample read depth. Yes, you have the per variant (total) read depth and the per sample read depth; be sure not to confuse the two.
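To make the variant-level versus sample-level distinction concrete without downloading anything, here is a minimal sketch that reads the annotation declarations straight from a VCF header using only the standard library. The header text is a made-up miniature (not the real 1000 genomes header), and the function name declared_annotations is a hypothetical helper; PyVCF exposes the same information through v.infos and v.formats.

```python
import re

# A made-up, minimal VCF header used only for illustration; a real
# 1000 genomes header declares many more annotations.
HEADER = """\
##fileformat=VCFv4.1
##INFO=<ID=AC,Number=A,Type=Integer,Description="Total number of ALT alleles in called genotypes">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total read depth">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Per-sample read depth">
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1
"""

# INFO lines declare variant-level annotations, FORMAT lines sample-level ones
META = re.compile(r'^##(INFO|FORMAT)=<ID=([^,]+),.*Description="([^"]*)"')


def declared_annotations(header_text):
    """Group annotation IDs by level: INFO (variant) or FORMAT (sample)."""
    levels = {'INFO': {}, 'FORMAT': {}}
    for line in header_text.splitlines():
        match = META.match(line)
        if match:
            level, ident, description = match.groups()
            levels[level][ident] = description
    return levels


annotations = declared_annotations(HEADER)
for level, entries in annotations.items():
    for ident, description in entries.items():
        print(level, ident, description)
```

Note how DP appears at both levels with different meanings, which is exactly the confusion the text warns about.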
Now that we know which information is available, let's inspect a single VCF record with the following code:

    v = vcf.Reader(filename='genotypes.vcf.gz')
    rec = next(v)
    print(rec.CHROM, rec.POS, rec.ID, rec.REF, rec.ALT, rec.QUAL, rec.FILTER)
    print(rec.INFO)
    print(rec.FORMAT)
    samples = rec.samples
    print(len(samples))
    sample = samples[0]
    print(sample.called, sample.gt_alleles, sample.is_het, sample.phased)
    print(int(sample['DP']))

We start by retrieving standard information: the chromosome, position, identifier, reference base (typically, just one), alternative bases (there can be more than one, but it is not uncommon as a first filtering approach to only accept a single ALT, for example, only accept bi-allelic SNPs), quality (PHRED scaled, as you may expect), and the FILTER status. Regarding the filter, remember that whatever the VCF file says, you may still want to apply extra filters (as in the next recipe).

Then, we will print the additional variant-level information (AC, AS, AF, AN, DP, and so on), followed by the sample format (in this case, DP and GT). Finally, we will count the number of samples and inspect a single sample, checking whether it was called for this variant and, if available, the reported alleles, heterozygosity, and phasing status (this dataset happens to be phased, which is not that common).

Let's check the type of variant and the number of nonbiallelic SNPs in a single pass with the following code:

    from collections import defaultdict

    f = vcf.Reader(filename='genotypes.vcf.gz')

    my_type = defaultdict(int)
    num_alts = defaultdict(int)

    for rec in f:
        my_type[rec.var_type, rec.var_subtype] += 1
        if rec.is_snp:
            num_alts[len(rec.ALT)] += 1
    print(num_alts)
    print(my_type)

We use the Python defaultdict collection type. We find that this dataset has InDels (both insertions and deletions), CNVs, and, of course, SNPs (roughly two-thirds being transitions and one-third transversions). There is a residual number (79) of triallelic SNPs.
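The transition/transversion split and the count of ALT alleles can be reproduced on toy data with the same defaultdict pattern. The following is a minimal sketch that does not use PyVCF: the records are hypothetical (REF, ALT list) pairs invented for illustration, and snp_subtype is an assumed helper name that applies the standard purine/pyrimidine definition.

```python
from collections import defaultdict

PURINES = {'A', 'G'}
PYRIMIDINES = {'C', 'T'}


def snp_subtype(ref, alt):
    """Return 'ts' (transition) if both bases belong to the same
    chemical class, otherwise 'tv' (transversion)."""
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return 'ts' if same_class else 'tv'


# Hypothetical (REF, [ALT, ...]) pairs, not taken from the real file
records = [
    ('A', ['G']),
    ('C', ['T']),
    ('A', ['C']),
    ('G', ['A', 'T']),  # a triallelic SNP
    ('T', ['C']),
]

subtypes = defaultdict(int)
num_alts = defaultdict(int)
for ref, alts in records:
    num_alts[len(alts)] += 1
    if len(alts) == 1:  # classify bi-allelic SNPs only
        subtypes[snp_subtype(ref, alts[0])] += 1

print(dict(num_alts))
print(dict(subtypes))
```

Because defaultdict(int) starts every missing key at zero, there is no need to check whether a category was seen before incrementing it.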
There's more…

The purpose of this recipe is to get you up to speed on the PyVCF module. At this stage, you should be comfortable with the API. We do not delve much into usage details here because that will be the main purpose of the next recipe: using the VCF module to study the quality of your variant calls.

It will probably not be a shocking revelation that PyVCF is not the fastest module on earth. This file format (highly text-based) makes processing a time-consuming task. There are two main strategies for dealing with this problem: parallel processing or converting to a more efficient format. Note that the VCF developers have specified a binary version (BCF) to deal with part of these problems; see http://www.1000genomes.org/wiki/analysis/variant-call-format/bcf-binary-vcf-version-2.

See also

The specification for VCF is available at http://samtools.github.io/hts-specs/VCFv4.2.pdf
GATK is one of the most widely used variant callers; check https://www.broadinstitute.org/gatk/
samtools and htslib are both used for variant calling and SAM/BAM management; check http://htslib.org

Studying genome accessibility and filtering SNP data

If you are using NGS data, the quality of your VCF calls may need to be assessed and filtered. Here, we will put in place a framework to filter SNP data. Rather than giving filtering rules (an impossible task to perform in a general way), we give you procedures to assess the quality of your data. With this, you can then devise your own filters.

Getting ready

In the best-case scenario, you have a VCF file with proper filters applied; if this is the case, you can just go ahead and use your file. Note that all VCF files will have a FILTER column, but this does not mean that all the proper filters were applied. You have to be sure that your data is properly filtered. In the second case, which is one of the most common, your file will have unfiltered data, but you have enough annotations.
Also, you can apply hard filters (that is, no need for programmatic filtering). If you have a GATK annotated file, refer, for instance, to http://gatkforums.broadinstitute.org/discussion/2806/howto-apply-hard-filters-to-a-call-set. In the third case, you have a VCF file that has all the annotations that you need, but you may want to apply more flexible filters (for example, "if read depth > 20, then accept if mapping quality > 30; otherwise, accept if mapping quality > 40"). In the fourth case, your VCF file does not have all the necessary annotations, and you have to revisit your BAM files (or even other sources of information). In this case, the best solution is to find whatever extra information you have and create a new VCF file with the needed annotations. Some genotype callers, like GATK, allow you to specify which annotations you want; you may also want to use extra programs to provide more annotations. For example, SnpEff (http://snpeff.sourceforge.net/) will annotate your SNPs with predictions of their effect (for example, if they are in exons, are they coding or noncoding?).

It is impossible to provide a clear-cut recipe; it will vary with the type of your sequencing data, your species of study, and your tolerance to errors, among other variables. What we can do is provide a set of typical analyses that are done for high-quality filtering.

In this recipe, we will not use data from the Human 1000 genomes project; we want "dirty" unfiltered data that has a lot of common annotations that can be used to filter it. We will use data from the Anopheles 1000 genomes project (Anopheles is the mosquito vector involved in the transmission of the parasite causing malaria), which makes available filtered and unfiltered data. You can find more information about this project at http://www.malariagen.net/projects/vector/ag1000g.
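The flexible filter mentioned in the third case can be written as a small predicate. This is only a sketch under stated assumptions: the rule is read as "with read depth above 20, a mapping quality above 30 is enough; otherwise, require a mapping quality above 40", the thresholds are the illustrative ones quoted in the text rather than recommended values, and the DP/MQ dictionaries are invented stand-ins for real VCF annotations.

```python
def accept_variant(read_depth, mapping_quality):
    """Conditional filter: well-covered sites may pass with a lower
    mapping quality than poorly covered ones (illustrative thresholds)."""
    if read_depth > 20:
        return mapping_quality > 30
    return mapping_quality > 40


# Hypothetical annotation values, one dictionary per variant
variants = [
    {'DP': 25, 'MQ': 35},  # deep coverage, decent mapping quality
    {'DP': 25, 'MQ': 28},  # deep coverage, poor mapping quality
    {'DP': 10, 'MQ': 35},  # shallow coverage, not enough mapping quality
    {'DP': 10, 'MQ': 45},  # shallow coverage, high mapping quality
]

decisions = [accept_variant(v['DP'], v['MQ']) for v in variants]
print(decisions)
```

A rule like this cannot be expressed as a single hard threshold per annotation, which is why the third case calls for programmatic filtering.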
We will get a part of the centromere of chromosome 3L for around 100 mosquitoes, followed by a part somewhere in the middle of that chromosome (and index both), as shown in the following code:

    tabix -fh ftp://ngs.sanger.ac.uk/production/ag1000g/phase1/preview/ag1000g.AC.phase1.AR1.vcf.gz 3L:1-200000 |bgzip -c > centro.vcf.gz
    tabix -fh ftp://ngs.sanger.ac.uk/production/ag1000g/phase1/preview/ag1000g.AC.phase1.AR1.vcf.gz 3L:21000001-21200000 |bgzip -c > standard.vcf.gz
    tabix -p vcf centro.vcf.gz
    tabix -p vcf standard.vcf.gz

As usual, the code to download this data is available at the https://github.com/tiagoantao/bioinf-python/blob/master/notebooks/01_NGS/Filtering_SNPs.ipynb notebook.

Finally, a word of warning about this recipe: the level of Python here will be slightly more complicated than before. In return, the more general code that we will write may be easier to reuse in your specific case. We will make extensive use of functional programming techniques (lambda functions) and partial function application.

How to do it…

Take a look at the following steps:

Let's start by plotting the distribution of variants across the genome in both files as follows:

    %matplotlib inline
    from collections import defaultdict

    import seaborn as sns
    import matplotlib.pyplot as plt

    import vcf

    def do_window(recs, size, fun):
        start = None
        win_res = []
        for rec in recs:
            # Keep bi-allelic SNPs only
            if not rec.is_snp or len(rec.ALT) > 1:
                continue
            if start is None:
                start = rec.POS
            my_win = 1 + (rec.POS - start) // size
            while len(win_res) < my_win:
                win_res.append([])
            win_res[my_win - 1].extend(fun(rec))
        return win_res

    wins = {}
    size = 2000
    vcf_names = ['centro.vcf.gz', 'standard.vcf.gz']
    for vcf_name in vcf_names:
        recs = vcf.Reader(filename=vcf_name)
        wins[vcf_name] = do_window(recs, size, lambda x: [1])

We start by performing the required imports (as usual, remember to remove the first line if you are not on the IPython Notebook).
Before I explain the function, note what we will do.

For both files, we will compute windowed statistics: we will divide our file, which includes 200,000 bp of data, into windows of size 2,000 (100 windows). Every time we find a bi-allelic SNP, we will add one to the list related to that window in the window function. The window function will take a VCF record (an SNP, rec.is_snp, that is bi-allelic, len(rec.ALT) == 1), determine the window where that record belongs (by performing an integer division of the offset from the first position, rec.POS - start, by size), and extend the list of results of that window with the output of the function that is passed to it as the fun parameter (which in our case just returns [1]).

So, now we have a list of 100 elements (each representing 2,000 base pairs). Each element will be another list, which will have 1 for each bi-allelic SNP found. So, if you have 200 SNPs in the first 2,000 base pairs, the first element of the list will have 200 ones.

Let's continue:

    def apply_win_funs(wins, funs):
        fun_results = []
        for win in wins:
            my_funs = {}
            for name, fun in funs.items():
                try:
                    my_funs[name] = fun(win)
                except Exception:
                    my_funs[name] = None
            fun_results.append(my_funs)
        return fun_results

    stats = {}
    fig, ax = plt.subplots(figsize=(16, 9))
    for name, nwins in wins.items():
        stats[name] = apply_win_funs(nwins, {'sum': sum})
        x_lim = [i * size for i in range(len(stats[name]))]
        ax.plot(x_lim, [x['sum'] for x in stats[name]], label=name)
    ax.legend()
    ax.set_xlabel('Genomic location in the downloaded segment')
    ax.set_ylabel('Number of variant sites (bi-allelic SNPs)')
    fig.suptitle('Number of bi-allelic SNPs along the genome', fontsize='xx-large')

Here, we will perform a plot that contains statistical information for each of our 100 windows. The apply_win_funs function will calculate a set of statistics for every window. In this case, it will sum all the numbers in the window.
Remember that every time we find an SNP, we will add one to the window list. This means that if we have 200 SNPs, we will have 200 1s; hence, summing them will return 200.

So, we are able to compute the number of SNPs per window in an apparently convoluted way. Why we are doing things with this strategy will become apparent soon, but for now, let's check the result of this computation for both files (refer to the following figure):

Figure 1: The number of bi-allelic SNPs distributed over windows of 2,000 bp for an area of 200 Kbp near the centromere (blue) and in the middle of the chromosome (green). Both areas come from chromosome 3L for circa 100 Ugandan mosquitoes from the Anopheles 1000 genomes project

Note that the number of SNPs in the centromere is smaller than in the middle of the chromosome. This is expected because calling variants in centromeres is more difficult than calling variants in the middle of chromosomes, and also because there is probably less genomic diversity in centromeres. If you are used to humans or other mammals, you may find the density of variants obnoxiously high; that is mosquitoes for you!

Let's take a look at the sample-level annotation. We will inspect Mapping Quality Zero (refer to https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_tools_walkers_annotator_MappingQualityZeroBySample.php for details), which counts the reads with a mapping quality of zero and is therefore a measure of how well all the sequences involved in calling this variant map clearly to this position.
Note that there is also an MQ0 annotation at the variant-level:

    import functools

    import numpy as np

    mq0_wins = {}
    vcf_names = ['centro.vcf.gz', 'standard.vcf.gz']
    size = 5000

    def get_sample(rec, annot, my_type):
        res = []
        samples = rec.samples
        for sample in samples:
            if sample[annot] is None:  # ignoring nones
                continue
            res.append(my_type(sample[annot]))
        return res

    for vcf_name in vcf_names:
        recs = vcf.Reader(filename=vcf_name)
        mq0_wins[vcf_name] = do_window(recs, size, functools.partial(get_sample, annot='MQ0', my_type=int))

Start by looking at the last for loop; we will perform a windowed analysis by getting the MQ0 annotation from each record. We perform this by calling the get_sample function, which returns our preferred annotation (in this case, MQ0) cast with a certain type (my_type=int). We use partial function application here: Python allows you to specify some parameters of a function and wait for other parameters to be specified later. Note that the most complicated thing here is the functional programming style. Also, note that it makes it very easy to compute other sample-level annotations; just replace MQ0 with AB, AD, GQ, and so on. You will immediately have a computation for that annotation. If the annotation is not of type integer, no problem; just adapt my_type. This is a difficult programming style if you are not used to it, but you will reap the benefits very soon.
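If partial function application is new to you, the following minimal sketch may help. It uses only the standard library; get_values and toy_record are hypothetical stand-ins (a list of plain dictionaries instead of PyVCF sample objects), so treat it as an illustration of the technique rather than as the recipe's code.

```python
import functools


def get_values(record, annot, my_type):
    """Collect one annotation from every sample, cast to my_type, skipping Nones."""
    return [my_type(s[annot]) for s in record if s[annot] is not None]


# Hypothetical record: one dictionary per sample
toy_record = [
    {'MQ0': '0', 'DP': '12'},
    {'MQ0': None, 'DP': '7'},
    {'MQ0': '3', 'DP': '9'},
]

# Fix annot and my_type now; the record is supplied later by the caller
get_mq0 = functools.partial(get_values, annot='MQ0', my_type=int)
get_dp = functools.partial(get_values, annot='DP', my_type=float)

mq0_values = get_mq0(toy_record)
dp_values = get_dp(toy_record)
print(mq0_values)  # [0, 3]
print(dp_values)   # [12.0, 7.0, 9.0]
```

The resulting get_mq0 and get_dp take a single argument, which is exactly the shape that a windowing function expecting fun(rec) needs.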
Let's now plot the median and the 75th percentile for each window (in this case, with a size of 5,000) as follows:

    stats = {}
    colors = ['b', 'g']
    i = 0
    fig, ax = plt.subplots(figsize=(16, 9))
    for name, nwins in mq0_wins.items():
        stats[name] = apply_win_funs(nwins, {'median': np.median, '75': functools.partial(np.percentile, q=75)})
        x_lim = [j * size for j in range(len(stats[name]))]
        ax.plot(x_lim, [x['median'] for x in stats[name]], label=name, color=colors[i])
        ax.plot(x_lim, [x['75'] for x in stats[name]], '--', color=colors[i])
        i += 1
    ax.legend()
    ax.set_xlabel('Genomic location in the downloaded segment')
    ax.set_ylabel('MQ0')
    fig.suptitle('Distribution of MQ0 along the genome', fontsize='xx-large')

Note that we now have two different statistics on apply_win_funs: percentile and median. Again, we will pass function names as parameters (np.median) and perform the partial function application (np.percentile). The result can be seen in the following figure:

Figure 2: Median (continuous line) and 75th percentile (dashed) of MQ0 of sample SNPs distributed on windows of 5,000 bp for an area of 200 Kbp near the centromere (blue) and in the middle of the chromosome (green); both areas come from chromosome 3L for circa 100 Ugandan mosquitoes from the Anopheles 1000 genomes project

For the "standard" file, the median MQ0 is 0 (it is plotted at the very bottom, where it is almost unseen); this is good as it suggests that most sequences involved in the calling of variants map clearly to this area of the genome. For the centromere, MQ0 indicates poor mapping quality. Furthermore, there are areas where the genotype caller could not find any variants at all; hence, the incomplete chart.

Let's compare heterozygosity with the DP sample-level annotation:

Here, we will plot the fraction of heterozygosity calls as a function of the sample read depth (DP) for every SNP. We will first explain the result and only then the code that generates it.
The following screenshot shows the fraction of calls that are heterozygous at a certain depth:

Figure 3: The continuous line represents the fraction of heterozygote calls computed at a certain depth; in blue is the centromeric area, in green is the "standard" area; the dashed lines represent the number of sample calls per depth; both areas come from chromosome 3L for circa 100 Ugandan mosquitoes from the Anopheles 1000 genomes project

In the preceding screenshot, there are two considerations to be taken into account:

  • At a very low depth, the fraction of heterozygote calls is biased low; this makes sense because the number of reads per position does not allow you to make a correct estimate of the presence of both alleles in a sample. So, you should not trust calls at a very low depth.
  • As expected, the number of calls in the centromere is way lower than calls outside it. The distribution of SNPs outside the centromere follows a common pattern that you can expect in many datasets.

Here is the code:

    def get_sample_relation(recs, f1, f2):
        rel = defaultdict(int)
        for rec in recs:
            if not rec.is_snp:
                continue
            for sample in rec.samples:
                try:
                    v1 = f1(sample)
                    v2 = f2(sample)
                    if v1 is None or v2 is None:
                        continue  # We ignore Nones
                    rel[(v1, v2)] += 1
                except Exception:
                    pass  # This is outside the domain (typically None)
        return rel

    rels = {}
    for vcf_name in vcf_names:
        recs = vcf.Reader(filename=vcf_name)
        rels[vcf_name] = get_sample_relation(recs, lambda s: 1 if s.is_het else 0, lambda s: int(s['DP']))

Let's start by looking at the for loop. Again, we will use functional programming: the get_sample_relation function will traverse all the SNP records and apply the two functional parameters; the first determines heterozygosity, whereas the second gets the sample DP (remember that there is also a variant DP).
Now, as the code is complex as it is, I opted for a naive data structure to be returned by get_sample_relation: a dictionary where the key is the pair of results (in this case, heterozygosity and DP) and the value is the number of SNPs that share both values. There are more elegant data structures with different trade-offs for this: SciPy sparse matrices, pandas DataFrames, or maybe you want to consider PyTables. The fundamental point here is to have a framework that is general enough to compute relationships among a couple of sample annotations.

Also, be careful with the dimension space of several annotations; for example, if your annotation is of float type, you might have to round it (if not, the size of your data structure might become too big).

Now, let's take a look at all the plotting code. Let's perform it in two parts; here is part 1:

    def plot_hz_rel(dps, ax, ax2, name, rel):
        frac_hz = []
        cnt_dp = []
        for dp in dps:
            hz = 0.0
            cnt = 0
            for khz, kdp in rel.keys():
                if kdp != dp:
                    continue
                cnt += rel[(khz, dp)]
                if khz == 1:
                    hz += rel[(khz, dp)]
            frac_hz.append(hz / cnt)
            cnt_dp.append(cnt)
        ax.plot(dps, frac_hz, label=name)
        ax2.plot(dps, cnt_dp, '--', label=name)

This function will take a data structure (as generated by get_sample_relation) expecting that the first parameter of the key tuple is the heterozygosity state (0 = homozygote, 1 = heterozygote) and the second is the DP. With this, it will generate two lines: one with the fraction of samples (which are heterozygotes at a certain depth) and the other with the SNP count.
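To see how the dictionary of (heterozygosity, DP) pairs turns into the two plotted series, here is a small sketch on made-up data. The toy_calls list and the het_fraction_by_depth helper are hypothetical and not part of the recipe; the helper also makes a single pass over the dictionary instead of the nested loop used in plot_hz_rel, which is a different way of arriving at the same numbers.

```python
from collections import defaultdict

# Hypothetical (is_het, depth) observations, one per sample call
toy_calls = [(0, 5), (1, 5), (0, 5), (0, 20), (1, 20), (1, 20), (0, 20)]

# Same naive structure as the recipe: (value1, value2) -> count
rel = defaultdict(int)
for is_het, depth in toy_calls:
    rel[(is_het, depth)] += 1


def het_fraction_by_depth(rel):
    """Return {depth: (fraction of heterozygote calls, number of calls)}."""
    totals = defaultdict(int)
    hets = defaultdict(int)
    for (is_het, depth), cnt in rel.items():
        totals[depth] += cnt
        if is_het == 1:
            hets[depth] += cnt
    return {dp: (hets[dp] / totals[dp], totals[dp]) for dp in sorted(totals)}


summary = het_fraction_by_depth(rel)
for dp, (frac, cnt) in summary.items():
    print(dp, round(frac, 3), cnt)
```

With these toy numbers, one of the three calls at depth 5 and two of the four calls at depth 20 are heterozygous.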
Let's now call this function, as shown in the following code:

    fig, ax = plt.subplots(figsize=(16, 9))
    ax2 = ax.twinx()
    for name, rel in rels.items():
        dps = list(set([x[1] for x in rel.keys()]))
        dps.sort()
        plot_hz_rel(dps, ax, ax2, name, rel)
    ax.set_xlim(0, 75)
    ax.set_ylim(0, 0.2)
    ax2.set_ylabel('Quantity of calls')
    ax.set_ylabel('Fraction of Heterozygote calls')
    ax.set_xlabel('Sample Read Depth (DP)')
    ax.legend()
    fig.suptitle('Number of calls per depth and fraction of calls which are Hz',
                 fontsize='xx-large')

Here, we will use two axes. On the left-hand side, we will have the fraction of heterozygote SNPs, whereas on the right-hand side, we will have the number of SNPs. Then, we will call our plot_hz_rel for both data files. The rest is standard matplotlib code.

Finally, let's compare variant DP with the categorical variant-level annotation: EFF. EFF is provided by SnpEff and tells us (among many other things) the type of SNP (for example, intergenic, intronic, coding synonymous, and coding nonsynonymous). The Anopheles dataset provides this useful annotation. Let's start by extracting variant-level annotations using the functional programming style, as shown in the following code:

    def get_variant_relation(recs, f1, f2):
        rel = defaultdict(int)
        for rec in recs:
            if not rec.is_snp:
                continue
            try:
                v1 = f1(rec)
                v2 = f2(rec)
                if v1 is None or v2 is None:
                    continue  # We ignore Nones
                rel[(v1, v2)] += 1
            except Exception:
                pass
        return rel

The programming style here is similar to get_sample_relation, but we do not delve into the samples. Now, we will define the types of effects that we will work with and convert the effect to an integer, as this allows you to use it as an index, for example, in matrices.
Think about coding a categorical variable:

    accepted_eff = ['INTERGENIC', 'INTRON', 'NON_SYNONYMOUS_CODING', 'SYNONYMOUS_CODING']

    def eff_to_int(rec):
        try:
            for annot in rec.INFO['EFF']:
                # We use the first annotation
                master_type = annot.split('(')[0]
                return accepted_eff.index(master_type)
        except ValueError:
            return len(accepted_eff)

We will now traverse the file; the style should be clear to you now:

    eff_mq0s = {}
    for vcf_name in vcf_names:
        recs = vcf.Reader(filename=vcf_name)
        eff_mq0s[vcf_name] = get_variant_relation(recs, lambda r: eff_to_int(r), lambda r: int(r.INFO['DP']))

Finally, we will plot the distribution of DP using the SNP effect, as shown in the following code:

    fig, ax = plt.subplots(figsize=(16,9))
    vcf_name = 'standard.vcf.gz'
    bp_vals = [[] for x in range(len(accepted_eff) + 1)]
    for k, cnt in eff_mq0s[vcf_name].items():
        my_eff, mq0 = k
        bp_vals[my_eff].extend([mq0] * cnt)
    sns.boxplot(bp_vals, sym='', ax=ax)
    ax.set_xticklabels(accepted_eff + ['OTHER'])
    ax.set_ylabel('DP (variant)')
    fig.suptitle('Distribution of variant DP per SNP type',
                 fontsize='xx-large')

Here, we will just draw a box plot for the noncentromeric file (refer to the following screenshot). The results are as expected: SNPs in coding areas will probably have more depth because they are in more complex regions (that are easier to call) than intergenic SNPs:

Figure 4: Boxplot for the distribution of variant read depth across different SNP effects

There's more…

The right approach to filtering will depend on the type of sequencing data that you have, the number of samples, and potential extra information (for example, pedigree among samples). This recipe is very complex as it is, but parts of it are profoundly naive (there is a limit to the complexity that I could force on you in a simple recipe). For example, the window code does not support overlapping windows; also, data structures are simplistic.
However, I hope that they give you an idea of the general strategy to process genomic high-throughput sequencing data.

See also

  • There are many filtering rules, but I would like to draw your attention to the need for reasonably good coverage (clearly more than 10x); for example, refer to Meynert et al., "Variant detection sensitivity and biases in whole genome and exome sequencing" at http://www.biomedcentral.com/1471-2105/15/247/
  • Brad Chapman is one of the best known specialists in sequencing analysis and data quality with Python and the main author of Blue Collar Bioinformatics, a blog that you may want to check at https://bcbio.wordpress.com/
  • Brad is also the main author of bcbio-nextgen, a Python-based pipeline for high-throughput sequencing analysis. Refer to https://bcbio-nextgen.readthedocs.org
  • Peter Cock is the main author of Biopython and is heavily involved in NGS analysis; be sure to check his blog, "Blasted Bioinformatics!?" at http://blastedbio.blogspot.co.uk/

Summary

In this article, we prepared the environment, analyzed variant calls, and learned about genome accessibility and filtering SNP data.
Packt
07 Jul 2015
27 min read

Methodology for Modeling Business Processes in SOA

This article by Matjaz B. Juric, Sven Bernhardt, Hajo Normann, Danilo Schmiedel, Guido Schmutz, Mark Simpson, and Torsten Winterberg, authors of the book Design Principles for Process-driven Architectures Using Oracle BPM and SOA Suite 12c, describes the strategies and a methodology that can help us realize the benefits of BPM as a successful enterprise modernization strategy.

In this article, we will do the following:

  • Provide the reader with a set of actions in the course of a complete methodology that they can incorporate in order to create the desired attractiveness towards broader application throughout the enterprise
  • Describe organizational and cultural barriers to applying enterprise BPM and discuss ways to overcome them

The postmature birth of enterprise BPM

When enterprise architects discuss the future of the software landscape of their organization, they map the functional capabilities, such as customer relationship management and order management, to existing or new applications, some packaged and some custom. Then, they connect these applications by means of middleware. They typically use the notion of an integration middleware, such as an enterprise service bus (ESB), to depict the technical integration between these applications, exposing functionality as services, APIs, or, to use the trendier term, "micro services". These services are used by modern, increasingly mobile frontends and B2B partners. For several years now, it has been hard to find a PowerPoint slide that discusses future enterprise middleware without the notion of a BPM layer that sits on top of the frontend and the SOA service layer. So, in most organizations, we find a slide deck that contains this visual box named BPM, signifying the aim to improve process excellence by automating business processes along the management discipline known as business process management (BPM).
Over the years, we have seen that the frontend layer often does materialize as a portal or through a modern mobile application development platform. The envisioned SOA services can be found living on an ESB or API gateway. Yet, the component called BPM and the related practice of modeling executable processes have failed to take off until now, at least in most organizations. BPM is still waiting to morph from an abstract item on a PowerPoint slide and in a shelved analyst report into automated business processes that are actually deployed to a business-critical production machine. When we look closer, yes, there is a license for a BPM tool, and yes, some processes have even been automated, but those tend to be found in the less visible corners of the enterprise, seldom being of concern to higher management and the hot project teams that work day and night on the next visible release. In short, BPM remains the hobby of some enterprise architects and the expensive consultants they pay.

Will BPM ever take the often proposed lead role in the middleware architect's tool box? Will it lead to a better, more efficient, and productive organization? To be very honest, at the moment the answer to that question is rather no than yes. There is a good chance that BPM remains just one more of the silver bullets that fill some books and motivate some great presentations at conferences, yet do not have an impact on the average organization.

But there is still hope for enterprise BPM as opposed to a departmental approach to process optimization. There is a good chance that BPM, next to other enabling technologies, will indeed be the driver for successful enterprise modernization. Large organizations all over the globe reengage with smaller and larger system integrators to tackle the process challenge. Maybe BPM as a practice needs more time than other items found in Gartner hype cycles to mature before it's widely applied.
This necessary level of higher maturity encompasses both the tools and the people using them.

Ultimately, this question of large-scale BPM adoption will be answered individually in each organization. Only when a substantive set of enterprises experience tangible benefits from BPM will they talk about it, thus growing a momentum that leads to the success of enterprise BPM as a whole. This positive marketing based on actual project and program success will be the primary way to establish a force of attraction towards BPM that will raise curiosity and interest in the minds of the bulk of the organizations that are still rather hesitant or ignorant about using BPM.

Oracle BPM Suite 12c – new business architecture features

New tools in Oracle BPM Suite 12c put BPM in the mold of business architecture (BA). This new version contains new BA model types and features that help companies to move out of the IT-based, rather technical view of business process automation and into strategic process improvement. Thus, these new model types help us to embark on the journey towards enterprise BPM. This is an interesting step in the evolution of enterprise middleware: Oracle is the first vendor of a business process automation engine that moved up from concrete automated processes to strategic views on end-to-end processes, thus crossing the automation/strategic barrier.

BPM Suite 12c introduces cross-departmental business process views. Thereby, it allows us to approach an enterprise modeling exercise through top-down modeling. The root is an end-to-end value chain model that sits on top of processes. It chains separated business processes together into one coherent end-to-end view. The value chain describes a bird's-eye view of the steps needed to achieve the most critical business goals of an organization. These steps comprise business processes, of which some are automated in a BPMN engine and others actually run in packaged applications or are not automated at all.
Also, BPM Suite 12c allows the capturing of the respective goals and provides the tools to measure them as KPIs and service-level agreements.

In order to understand the path towards these yet untackled areas, it is important to understand where we stand today with BPM and what this new field of end-to-end process transparency is all about. Yet, before we get there, we will leave enterprise IT for a moment and take a look at the nature of a game (any game) in order to prepare for a deeper understanding of the mechanisms and cultural impact that underpin the move from a departmental to an enterprise approach to BPM.

Football games – same basic rules, different methodology

Any game, be it a physical sport, such as football, or a mental sport, such as chess, is defined through a set of common rules. How the game is played will look very different depending on the level of the league it is played in. A Champions League football game is so much more refined than your local team playing at the nearby stadium, not to mention the neighboring kids kicking the ball in the dirt ground. These kids will show creativity and pleasure in the game, yet the level of sophistication is a completely different ball game in the big stadium. You can marvel at the effort made to ensure that everybody plays their role in a well-trained symbiosis with their peers, all sharing a common set of collaboration rules and patterns. The team spent so many hours training the right combinations, establishing a deep trust. There is no time to discuss the meaning of an order shouted out by the trainer. They have worked on this common understanding of how to do things in various situations. They have established one language that they share. As an observer of a great match, you appreciate an elaborate art, not of one artist but of a coherent team. No one would argue the prevalence of this continuum in the refinement and rising sophistication in rough physical sports.
It is puzzling to what extent we, as players in the games of enterprise IT, often tend to neglect the needs and forces that are implied by such a continuum of rising sophistication.

The next sections will take a closer look at these BPM playgrounds and motivate you to take the necessary steps toward team excellence when moving from a small, departmental-level BPM/SOA project to a program approach that is centered on a BPM and SOA paradigm.

Which BPM game do we play?

Game Silo BPM is the workflow or business process automation in organizational departments. It resembles the kids playing soccer on the neighborhood playground. After a few years of experience with automated processes, the maturity rises to resemble your local football team: yes, they play in a stadium, but it is often not elegant.

Game Silo BPM is a tactical game in which work gets done while management deals with reaching departmental goals. New feature requests lead to changed or new applications, and the people involved have known each other very well for many years under established leadership. Workflows are automated to optimize performance.

Game Enterprise BPM strives for process excellence at Champions League level. It is a strategic game in which higher management and business departments outline the future capability maps and cross-departmental business process models. In this game, players tend to regard the overall organization as a set of more or less efficient functional capabilities. One or a group of functional capabilities make up a department.

Game Silo BPM – departmental workflows

Today, most organizations use BPM-based process automation based on tools such as Oracle BPM Suite to improve the efficiency of the processes within departments. These processes often support the functional capability that this particular department owns by increasing its efficiency. Increasing efficiency is to do more with less.
It is about automating manual steps, removing bottlenecks, and making sure that the resources are optimally allocated across your process. The driving force is typically the team manager, who is measured by the productivity of his team. The key factor to reach this goal is the automation of redundant or unnecessary human/IT interactions. Through process automation, we gain insights into the performance of the process. This insight can be called process transparency, allowing for the constant identification of areas of improvement.

A typical example of the processes found in Game Silo BPM is an approval process that can be expressed as a clear path among process participants. Often, these processes work on documents, thus having a tight relationship with content management. We also find complex backend integration processes that involve human interaction only in the case of an exception.

The following figure depicts this siloed approach to process automation. Recognizing a given business unit as a silo indicates its closed and self-contained world. A word of caution is needed: the term "silo" is often used with a negative connotation. It is important, though, to recognize that a closed and coherent team with no dependencies on other teams is a preferred working environment for many employees and managers. In more general terms, it reflects an archaic type of organization that we lean towards. It allows everybody in the team to indulge in what can be called the siege mentality: we as humans gravitate to some coziness along the notion of a well-defined and manageable island of order.

Figure 1: Workflow automation in departmental silos

As discussed in the introduction, there is a chance that BPM never leaves this departmental context in many organizations. If you are curious to find out where you stand today, it is easy to determine whether a business process is stuck in the silo or whether it's part of an enterprise BPM strategy.
Just ask whether the process is aligned with corporate goals or whether it's solely associated with departmental goals and KPIs. Another sign is the lack of a methodology and of a link to cross-departmental governance that defines the mode of operation and the means to create consistent models that speak one common language. To impose the enterprise architecture tools of Game Enterprise BPM on Game Silo BPM would be an overhead; you don't need them if your organization does not strive for cross-departmental process transparency.

Oracle BPM Suite 11g is made for playing Game Silo BPM

Oracle BPM Suite 11g provides all the tools and functionalities necessary to automate a departmental workflow. It is not sufficient to model business processes that span departmental barriers and require top-down process hierarchies. The key components are the following:

  • The BPMN process modeling tool and the respective execution engine
  • The means to organize logical roles and depict them as swimlanes in BPMN process models
  • The human-task component that involves human input in decision making
  • The business rules for supporting automated decision making
  • The technical means to call backend SOA services
  • The wizards to create data mappings
  • Business activity monitoring (BAM), by means of which the process performance can be measured

Oracle BPM Suite models processes in BPMN

Workflows are modeled at a pretty fine level of granularity using the standard BPMN 2.0 version (and later versions). BPMN is both business ready and technically detailed enough to allow modeled processes to be executed in a process engine. Oracle fittingly expresses the mechanism as what you see is what you execute. Those workflows typically orchestrate human interaction through human tasks and functionality through SOA services.
In the next sections of this article, we will move our head out of the cocoon of the silo, looking higher and higher along the hierarchy of the enterprise until we reach the world of workshops and polished PowerPoint slides in which the strategy of the organization is defined.

Game Enterprise BPM

Enterprise BPM is a management discipline with a methodology typically found in business architecture (BA) and the respective enterprise architecture (EA) teams. Representatives of higher management and of business departments and EA teams meet management consultants in order to understand the current situation and the desired state of the overall organization. Therefore, enterprise architects define process maps, a high-level view of the organization's business processes, both for the AS-IS and various TO-BE states. In the next step, they define the desired future state and depict strategies and means to reach it.

Business processes that generate value for the organization typically span several departments. The steps in these end-to-end processes can be mapped to the functional building blocks: the departments.

Figure 2: Cross-departmental business process needs an owner

The goal of Game Enterprise BPM is to manage enterprise business processes, making sure they realize the corporate strategy and meet the respective goals, which are ideally measured by key performance indicators, such as customer satisfaction, reduction of failure, and cost reduction. It is a good practice to reflect on the cross-departmental efforts through the notion of a common or shared language that is spoken across departmental boundaries. A governance board is a great means to reach this capability in the overall organization.

Still wide open – the business/IT divide

Organizational change in the management structure is a prerequisite for the success of Game Enterprise BPM but is not a sufficient condition.
Several business process management books describe the main challenge in enterprise BPM as the still-wide-open business/IT divide. There is still a gap between process understanding and ownership in Game Enterprise BPM and how automated processes are modeled and perceived in departmental workflows of Game Silo BPM. Principles, goals, standards, and best practices defined in Game Enterprise BPM do not trickle down into everyday work in Game Silo BPM.

One of the biggest reasons for this divide is the fact that there is no direct link between the models and tools used in top management to depict the business strategy, IT strategy, business architecture, and the high-level value chain on the one hand, and the process models and the models and artifacts used in enterprise architecture and, from there, even software architecture on the other.

Figure 3: Gap between business architecture and IT enterprise architecture in strategic BPM

So, traditionally, organizations use specific business architecture or enterprise architecture tools in order to depict AS-IS and TO-BE high-level reflections of value chains, hierarchical business processes, and capability maps alongside application heat maps. These models tend to hang in the air; they are not deeply grounded in real life. Business process models expressed in event-driven process chains (EPCs), vision process models, and other modeling types often don't really reflect the flows and collaboration structures of actual procedures of the office. This leads to the perception of business architecture departments as ivory towers with no, or weak, links to the realities of the organization. On the other side, the tools and models of the IT enterprise architecture and software architecture speak a language not understood by the members of business departments. Unified Modeling Language (UML) is the most prominent set of model types that has stuck in IT.
However, while the UML class and activity diagrams promised to be valid to depict the nouns and stories of the business processes, their potential to allow a shared language and approach to depict joint vocabulary and views on requirements rarely materialized. Until now, there has been no workflow tool vendor approaching these strategic enterprise-level models, bridging the business/IT gap.

Oracle BPM Suite 12c tackles Game Enterprise BPM

With BPM Suite 12c, Oracle is starting to engage in this domain. The approach Oracle took can be summarized as applying the Pareto principle: 80 percent of the needed features for strategic enterprise modeling can be found in just 20 percent of the functionality of those high-end, enterprise-level modeling tools. So, Oracle implemented this 20 percent of business architecture models:

  • Enterprise maps to define the organizational and application context
  • Value chains to establish a root for process hierarchies
  • The strategy model to depict focus areas and assign technical capabilities and optimization strategies

The following figure represents the new features in Oracle BPM Suite 12c in the context of the Game Enterprise BPM methodology:

Figure 4: Oracle BPM Suite 12c new features in the context of the Game Enterprise BPM methodology

The preceding figure is based on the BPTrends Process Change Methodology introduced in the Business Process Change book by Paul Harmon.

These new features promise to create a link from higher-level process models and other strategic depictions into executable processes. This link could not be established until now since there are too many interface mismatches between enterprise tooling and workflow automation engines. The new model types in Oracle BPM Suite 12c are, as discussed, a subset of all the features and model types in the EA tools.
For this subset, these interface mismatches have been made obsolete: there is a clear trace, with no tool disruption, from the enterprise map to the value chain and associated KPIs down to the BPMN processes that are automated. These features have been available in EA and BPM tools before. What is new is this undisrupted trace.

Figure 5: Undisrupted trace from the business architecture to executable processes

The preceding figure is based on the BPTrends Process Change Methodology introduced in the Business Process Change book by Paul Harmon.

This holistic set of tools, which brings together aspects of modeling time, design time, and runtime, makes it more likely that we finally succeed in bridging the business/IT gap.

Figure 6: Tighter links from business and strategy to executable software and processes

Today, we do not live in a perfect world. To understand to what extent this gap is closed, it helps to look at how people work. If there is a tool to depict enterprise strategy, end-to-end business processes, business capabilities, and KPIs that is used in daily work and that has the means to navigate to lower-level models, then we have come quite far. The features of Oracle BPM Suite 12c, which are discussed below, are a step in this direction but are not the end of the journey.

Using business architect features

The process composer is a business user-friendly web application. From the login page, it guides the user in the spirit of a business architecture methodology. All the models that we create are part of a "space". On its entry page, the main structure is divided into business architecture models in BA projects, which are described in this article.

It is a good practice to start with an enterprise map that depicts the business capabilities of our organization. The rationale is that the functions the organization is made of tend to be more stable and less a matter of interpretation and perspective than any business process view.
Enterprise maps can be used to put those value chains and process models into the organizational and application landscape context. Oracle suggests organizing the business capabilities into three segments. Thus, the default enterprise map model is prepopulated with three default lanes: core, management, and support. This structure is feasible in many organizations, but it is up to you whether to use these lanes or create your own. Within each lane, we can then define the key business capabilities that make up the core business processes.

Figure 7: Enterprise map of RYLC depicting key business capabilities

Properties of BA models

Each element (goal, objective, strategy) within the model can be enriched with business properties, such as actual cost, actual time, proposed cost, and proposed time. These properties are part of the impact analysis report that can be generated to analyze the BA project.

Figure 8: Use properties to specify SLAs and other BA characteristics

Depicting organizational units

Within RYLC as an organization, we now depict its departments as organizational units. We can attach goals to each unit, which depict its function and the role it plays in the overall ecosystem. This association of a unit with a goal is expressed via links to goals defined in the strategy model. These goals will be used for the impact analysis reports that show the impact of organizational or procedural changes. It is possible to create several organization units, as shown in the following screenshot:

Figure 9: Define a set of organization units

Value chains

A value chain model forms the root of a process hierarchy. A value chain consists of one direct line of steps, no gateways, and no exceptions. The modeler in Oracle BPM Suite allows each step in the chain to depict associated business goals and key performance indicators (KPIs) that can be used to measure the organization's performance rather than the details of departmental performance.
The value chain model is a very simple one, depicting the flow of the most basic business process steps. Each step is a business process in its own right. At the level of the chain, no decision making is expressed. This resembles a business process expressed in BPMN that has only direct lines and no gateways.

Figure 10: Creation of a new value chain model called "Rental Request-to-Delivery"

The value chain model type allows you to structure your business processes into a hierarchy, with a value chain forming the topmost level.

Strategy models

Strategy models can be used to further motivate the KPIs that are depicted at the value chain level.

Figure 11: Building the strategy model for the strategy "Become a market leader"

These visual maps leverage existing process documentation and match it with current business trends and existing reference models. These models depict processes and associated organizational aspects, encompassing cross-departmental views on higher levels and concrete processes down to level 3. From there, the team prioritizes distinct processes and decides on one of several modernization strategies, process automation being just one of several!

So, in the proposed methodology shown in the following diagram, during Define strategy and performance measures (KPIs), the team down-selects, for each business process or even subprocess, one or several means to improve process efficiency or transparency. Typically, these means are defined as "supporting capabilities". Each is a technique, a tool, or an approach that helps to modernize a business process. A few typical supporting capabilities are mentioned here:

  • Explicit process automation
  • Implicit process handling and automation inside a packaged application, such as SAP ERP or Oracle Siebel
  • Refactored existing COTS application
  • Business process outsourcing
  • Business process retirement

The way toward establishing these supporting capabilities is defined through a list of potential modernization strategies.
Several of the modernization strategies relate to the existing applications that support a business process, such as refactoring, replacement, retirement, or re-interfacing of the respective supporting applications. The application modernization strategy that we are most interested in for this article is to establish an explicit automated process.

It is a best practice to create a high-level process map and define for each of the processes whether to leave it as it is or to apply one of several modernization strategies. When several modernization strategies are found for one process, we can drill down into the process through a hierarchy and stop at the level at which there is a single, distinct modernization strategy.

Figure 12: Business process automation as just one of several process optimization strategies

The preceding figure is based on the BPTrends Process Change Methodology introduced in the Business Process Change book by Paul Harmon.

Again, just one of these technical capabilities in strategic BPM is the topic of this article: process automation. It is not feasible to suggest automating all the business processes of any organization.

Key performance indicators

Within a BA project (strategy and value chain models), there are three different types of KPIs that can be defined:

  • Manual KPI: This allows us to enter a known value
  • Rollup KPI: This evaluates an aggregate of the child KPIs
  • External KPI: This provides a way to include KPI data from applications other than BPM Suite, such as SAP, E-Business Suite, PeopleSoft, and so on

Additionally, KPIs can be defined on a BPMN process level, which is not covered in this article.

KPIs in the value chain step level

The following are the steps to configure the KPIs in the value chain step level:

1. Open the value chain model Rental Request-to-Delivery.
2. Right-click on the Vehicle Reservation & Allocation chain step, and select KPI.
3. Click on the + (plus) sign to create a manual KPI, as illustrated in the next screenshot.
The following image shows the configuration of the KPIs:

Figure 13: Configuring a KPI

Why we need a new methodology for Game Enterprise BPM

Now, Game Enterprise BPM needs to be played everywhere. This implies that Game Silo BPM needs to diminish, meaning it needs to be replaced gradually, through managed evolution, league by league, aiming for excellence at Champions League level.

We can't play Game Enterprise BPM with the same culture of ad hoc, joyful creativity that we find in Game Silo BPM. We can't just approach our colleague (let's call him Ingo Maier), who we know has drawn the process model for a process we are interested in. We can't just walk over to his desk and ask him about the meaning of an unclear section in the process. That is because in Game Enterprise BPM, Ingo Maier, as a person whom we know as part of our team Silo, does not exist anymore. We deal with process models, with SOA services, with a language defined somewhere else, in another department. This is what makes it so hard to move up in BPM leagues. Hiding behind the buzz term "agile" does not help.

To raise BPM maturity when we move from Game Silo BPM to Game Enterprise BPM, the organization needs to establish a set of standards, guidelines, tools, and modes of operation that allow playing and succeeding at Champions League level. Additionally, we have to define the modes of operation and the broad steps that lead to a desired state. This formalization of collaboration in teams should be described, agreed on, and lived as our BPM methodology. The methodology strives for a team in which each player contributes to one coherent game along well-defined phases.

Political change through Game Enterprise BPM

The political problem with a cross-departmental process view becomes apparent if we look at the way organizations distribute political power in Game Silo BPM.
The heads of departments form the most potent management layer, while the end-to-end business process has no potent stakeholder. Thus, it is critical for any Game Enterprise BPM to establish a good balance of de facto power, with process owners acting as stakeholders for a cross-departmental process view. This fills the void of end-to-end process owners in Game Silo BPM. With Game Enterprise BPM, the focus shifts from departmental improvements to the KPIs and improvement of the core business processes.

Pair modeling the value chains and business processes

Value chains and process models down to a still very high-level layer, such as layer 3, can be modeled by process experts without involving technically skilled people. They should omit all technical details. To provide the foundation for automated processes, we need to add more details about domain knowledge and some technical details. Therefore, these domain process experts meet with BPM tool experts to jointly define the next level of detail in BPMN.

In an analogy to the practice of pair programming in agile methodologies, you could call this kind of collaboration pair modeling. Ideally, the process expert(s) and the tool expert look at the same screen and discuss how to improve the flow of the process model, while the visual representation evolves to cover variances and exceptions and to reflect a better understanding of the involved business objects.

For many organizations that are used to a waterfall process, this is a fundamentally new way of gathering requirements that might be a challenge for some. The practice is analogous to the on-site customer practice in agile methodologies. This new way of close collaboration for process modeling is crucial for the success of BPM projects, since it allows us to establish a deep and shared understanding in a very pragmatic and productive way.
Figure 14: Roles and successful modes of collaboration

When the process is modeled in sufficient detail to clearly depict an algorithmic definition of the flow of the process and all its variances, the model can be handed over to BPM developers. They add all the technical bells and whistles, such as data mapping, decision rules, service calls, and exception handling. Portal developers will work on their implementation of the use cases. SOA developers will use Oracle SOA Suite to integrate with backend systems, thereby implementing SOA services.

The discussed notion of a handover from higher-level business process models to development teams can also be used to depict the line at which it might make sense to outsource parts of the overall development.

Summary

In this article, we saw how BPM, as an approach to model, automate, and optimize business processes, is typically applied at a departmental level. We saw how BPM Suite 12c introduced new features that allow us to cross the bridge toward top-down, cross-departmental, enterprise-level BPM.

We depicted the key characteristics of the enterprise BPM methodology, which aligns corporate or strategic activities with actual process automation projects. We learned the importance of modeling standards and guidelines, which should be used to gain business process insight and understanding on broad levels throughout the enterprise. The goal is to establish a shared language to talk about the capabilities and processes of the overall organization and the services it provides to its customers.

We also looked at the role of data in the SOA that supports business processes. A critical success factor is the definition of the business data model that, along with services, will form the connection between the process layer, user interface layer, and the services layer.
We understood how important it is to separate application logic from service logic and process logic to ensure the benefits of a process-driven architecture are realized.

Resources for Article:

Further resources on this subject:

  • Introduction to Oracle BPM [article]
  • Oracle B2B Overview [article]
  • Event-driven BPEL Process [article]

Packt
07 Jul 2015
15 min read

Installation and Setup

The Banana Pi is a single-board computer that enables you to build your own individual and versatile system. In fact, it is a complete computer, including all the required elements such as a processor, memory, network, and other interfaces, which we are going to explore. It provides enough power to run even relatively complex applications.

In this article by Ryad El-Dajani, author of the book Banana Pi Cookbook, we are going to get to know the Banana Pi device. The available distributions are mentioned, as well as how to download and install them. We will also examine Android in contrast to our upcoming Linux adventure.

Thus, you are going to transform your little piece of hardware into a functional, running computer with a working operating system. You will master the whole process, from connecting the cables, choosing an operating system, and writing the image to an SD card to successfully booting up and shutting down your device for the first time.

Banana Pi Overview

In the following picture, you see a Banana Pi on the left-hand side and a Banana Pro on the right-hand side:

As you can see, there are some small differences worth noting. The Banana Pi provides a dedicated composite video output besides the HDMI output. With the Banana Pro, however, you connect your display via composite video using a four-pole composite audio/video cable on the jack. In contrast to the Banana Pi, which has a 26-pin header, the Banana Pro provides 40 pins. Also, the pins for the UART port interface are located below the GPIO headers on the Pi, while they are located beside the network interface on the Pro.

The other two important differences are not clearly visible in the previous picture. The operating system for your device comes in the form of image files that need to be written (burned) to an SD card.
The Banana Pi uses normal SD cards, while the Banana Pro only accepts Micro SD cards. Moreover, the Banana Pro provides a Wi-Fi interface on board. Therefore, you are also able to connect the Banana Pro to your wireless network, while the Pi would require an external wireless USB device.

Besides the mentioned differences, the devices are very similar. You will find the following hardware components and interfaces on your device.

On the back side, you will find:

  • A20 ARM Cortex-A7 dual core central processing unit (CPU)
  • ARM Mali400 MP2 graphics processing unit (GPU)
  • 1 gigabyte of DDR3 memory (that is shared with the GPU)

On the front side, you will find:

  • Ethernet network interface adapter
  • Two USB 2.0 ports
  • A 5V micro USB power input (DC in) and a micro USB OTG port
  • A SATA 2.0 port and SATA power output
  • Various display outputs [HDMI, LVDS, and composite (integrated into jack on the Pro)]
  • A CSI camera input connector
  • An infrared (IR) receiver
  • A microphone
  • Various hardware buttons on board (power key, reset key, and UBoot key)
  • Various LEDs (red for power status, blue for Ethernet status, and green for user defined)

As you can see, you have a lot of opportunities for letting your device interact with various external components.

Operating systems for the Banana Pi

The Banana Pi is capable of running any operating system that supports the ARM Cortex-A7 architecture. Several operating systems are available precompiled, so you can write the operating system to an SD card and boot your system right away. Currently, the following operating systems are provided officially by LeMaker, the manufacturer of the Banana Pi.

Android

Android is a well-known operating system for mobile phones, but it also runs on various other devices such as smart watches, cars, and, of course, single-board computers such as the Banana Pi. The main advantage of running Android on a single-board computer is its convenience.
Anybody who uses an Android-based smartphone will recognize the graphical user interface (GUI) and may face fewer initial hurdles. Also, setting up a media center might be easier on Android than on a Linux-based system.

However, there are also a few disadvantages, as you are limited to software that is provided by an Android store such as Google Play. As most apps are optimized for mobile use at the moment, you will not find a lot of usable software for your Banana Pi running Android, except some games and multimedia applications. Moreover, you are required to use special Windows software called PhoenixCard to prepare an Android SD card.

In this article, we will not cover installing Android. For further information, please see Installing the Android OS image (LeMaker Wiki) at http://wiki.lemaker.org/BananaPro/Pi:SD_card_installation.

Linux

Most Linux users never realize that they are actually using Linux when operating their phones, appliances, routers, and many more products, as most of its magic happens in the background. We are going to dig into this adventure to discover its possibilities when running on our Banana Pi device.

The following Linux-based operating systems (so-called distributions) are used by the majority of the Banana Pi user base and are supported officially by the manufacturer:

  • Lubuntu: This is a lightweight distribution based on the well-known Ubuntu using the LXDE desktop, which is generally a good choice if you are a Windows user.
  • Raspbian: This is a distribution based on Debian, which was initially produced for the Raspberry Pi (hence the name). As a lot of Raspberry Pi owners are running Raspbian on their devices while also experimenting with the Banana Pi, LeMaker ported the original Raspbian distribution to the Banana Pi. Raspbian also comes with an LXDE desktop by default.
  • Bananian: This too is a Debian-based Linux distribution, optimized exclusively for the Banana Pi and its siblings.
All of the aforementioned distributions are based on the well-known distribution Debian. Besides the huge user base, all Debian-based distributions use the same package manager, Apt (Advanced Packaging Tool), to search for and install new software, and all are similar to use.

There are still more distributions that are officially supported by LeMaker, such as Berryboot, LeMedia, OpenSUSE, Fedora, Gentoo, Scratch, ArchLinux, Open MediaVault, and OpenWrt. All of them have their pros and cons or their specific use cases. If you are an experienced Linux user, you may choose your preferred distribution from the mentioned list, as most of the recipes are similar to, or even equally usable on, most of the Linux-based operating systems.

Moreover, the Banana Pi community regularly publishes various customized Linux distributions for the Banana Pi. The possible advantages of a customized distribution may include enabled and optimized hardware acceleration capabilities, supportive helper scripts, fully equipped desktop environments, and much more. However, when deciding to use a customized distribution, there is no official support by LeMaker, and you have to contact the publisher in case you encounter bugs or need help. You can also check the customized Arch Linux image that the author has built (http://blog.eldajani.net/banana-pi-arch-linux-customized-distribution/) for the Banana Pi and Banana Pro, including several useful applications.

Downloading an operating system for the Banana Pi

The following two recipes will explain how to set up the SD card with the desired operating system and how to get the Banana Pi up and running for the first time. This recipe is a prerequisite for both.

Besides the device itself, you will need at least a power source, which is usually a USB power supply, and an SD card to boot your Banana Pi.
Also, a network cable and connection is highly recommended so that you can interact with your Banana Pi from another computer via a remote shell. You might also want to actually see something on a display. Then, you will need to connect your Banana Pi via HDMI, composite, or LVDS to an external screen. It is recommended that you use an HDMI Version 1.4 cable, since lower versions can possibly cause issues. Besides inputting data using a remote shell, you can directly connect a USB keyboard and mouse to your Banana Pi via the USB ports. After completing the required tasks in the upcoming recipes, you will be able to boot your Banana Pi.

Getting ready

The following components are required for this recipe:

  • Banana Pi
  • SD card (minimum class 4; class 10 is recommended)
  • USB power supply (5V 2A recommended)
  • A computer with an SD card reader/writer (to write the image to the SD card)

Furthermore, you are going to need an Internet connection to download a Linux distribution or Android.

A few optional but highly recommended components are:

  • Connection to a display (via HDMI or composite)
  • Network connection via Ethernet
  • USB keyboard and mouse

You can acquire these items from various retailers. All items shown in the previous two pictures were bought from an online retailer that is known for originally selling books. However, the Banana Pi and the other products can be acquired from a large number of retailers. It is recommended to get a USB power supply with 2000mA (2A) output.

How to do it…

To download an operating system for Banana Pi, follow these steps: Download an image of your desired operating system. We are going to download Android and Raspbian from the official LeMaker image files website: http://www.lemaker.org/resources/9-38/image_files.html.
The following screenshot shows the LeMaker website where you can download the official images:

If you click on one of the mirrors (such as Google Drive, Dropbox, and so on), you will be redirected to the corresponding file-hosting service. From there, you are able to download the archive file. Once the archive containing the image is downloaded, you are ready to unpack it, which we will do in the upcoming recipes.

Setting up the SD card on Windows

This recipe will explain how to set up the SD card using a Windows operating system.

How to do it…

In the upcoming steps, we will unpack the archive containing the operating system image for the Banana Pi and write the image to the SD card:

1. Open the downloaded archive with 7-Zip. The following screenshot shows the 7-Zip application opening a compressed .tgz archive:
2. Unpack the archive to a directory until you get a file with the file extension .img. If it is a .tgz or .tar.gz file, you will need to unpack the archive twice.
3. Create a backup of the contents of the SD card, as everything on the SD card is going to be erased irrecoverably.
4. Open SD Formatter (https://www.sdcard.org/downloads/formatter_4/) and check the disk letter (E: in the following screenshot).
5. Choose Option to open the Option Setting window and choose:
     FORMAT TYPE: FULL (Erase)
     FORMAT SIZE ADJUSTMENT: ON
6. When everything is configured correctly, check again to see whether you are using the correct disk and click Format to start the formatting process.

Writing a Linux distribution image to the SD card on Windows

The following steps explain how to write a Linux-based distribution to the SD card on Windows:

1. Format the SD card using SD Formatter, which we covered in the previous section.
2. Open the Win32 Disk Imager (http://sourceforge.net/projects/win32diskimager/).
3. Choose the image file by clicking on the directory button.
4. Check whether you are going to write to the correct disk and then click on Write.
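The unpacking step described above (a .tgz or .tar.gz archive has to be unpacked twice in 7-Zip) can also be scripted. The following is a minimal sketch, assuming Python 3 is available on the computer you use to prepare the SD card. It relies on the standard tarfile module, which handles the gzip and tar layers in one pass. The function name extract_images and any file names are hypothetical and not part of the original recipe.

```python
import os
import shutil
import tarfile


def extract_images(archive_path, dest_dir):
    """Hypothetical helper: copy every .img member of a .tgz archive into dest_dir."""
    os.makedirs(dest_dir, exist_ok=True)
    extracted = []
    # "r:gz" opens a gzip-compressed tar archive (.tgz / .tar.gz) directly.
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.endswith(".img")):
                continue
            # Use only the base name so archive paths cannot escape dest_dir.
            target = os.path.join(dest_dir, os.path.basename(member.name))
            source = archive.extractfile(member)
            with open(target, "wb") as out:
                shutil.copyfileobj(source, out)
            extracted.append(target)
    return extracted
```

Calling extract_images with the path of the downloaded archive and a target directory returns the paths of the extracted .img files, which can then be selected in Win32 Disk Imager as described in the steps above.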
Once the burning process is done, you are ready to insert the freshly prepared SD card containing your Linux operating system into the Banana Pi and boot it up for the first time.

Booting up and shutting down the Banana Pi

This recipe will explain how to boot up and shut down the Banana Pi. As the Banana Pi is a real computer, these tasks are just as important as they are on your desktop computer. The booting process starts the Linux kernel and several important services. Shutting down stops them accordingly and does not power off the Banana Pi until all data is synchronized correctly with the SD card or external components.

How to do it…

We are going to boot up and shut down the Banana Pi.

Booting up

Do the following steps to boot up your Banana Pi:

1. Attach the Ethernet cable to your local network.
2. Connect your Banana Pi to a display.
3. Plug in a USB keyboard and mouse.
4. Insert the SD card into your device.
5. Power your Banana Pi by plugging in the USB power cable.

The next screenshot shows the desktop of Raspbian after a successful boot:

Shutting down Linux

To shut down your Linux-based distribution, you either use the shutdown command or do it via the desktop environment (in the case of Raspbian, it is called LXDE). For the latter method, these are the steps:

1. Click on the LXDE icon in the lower-left corner.
2. Click on Logout.
3. Click on Shutdown in the window that appears.

To shut down your operating system via the shell, type in the following command:

    $ sudo shutdown -h now

Connecting via SSH on Windows using PuTTY

The following recipe shows you how to connect to your Banana Pi remotely using an open source application called PuTTY.

Getting ready

For this recipe, you will need the following ingredients:

  • A booted up Linux operating system on your Banana Pi connected to your local network
  • The PuTTY application on your Windows PC that is also connected to your local area network

How to do it…

To connect to your Banana Pi via SSH on Windows, perform the following: Run putty.exe.
You will see the PuTTY Configuration dialog. Enter the IP address of the Banana Pi and leave the Port number as 22. Click on the Open button. A new terminal will appear, attempting a connection to the Banana Pi.

When connecting to the Banana Pi for the first time, you will see a PuTTY security alert. The following screenshot shows the PuTTY Security Alert window:

Trust the connection by clicking on Yes. You will be requested to enter the login credentials. Use the default username bananapi and password bananapi. When you are done, you should be welcomed by the shell of your Banana Pi. The following screenshot shows the shell of your Banana Pi accessed via SSH using PuTTY on Windows:

To quit your SSH session, execute the command exit or press Ctrl + D.

Searching, installing, and removing the software

Once you have a working operating system on the Banana Pi, sooner or later you are going to require new software. As most software for Linux systems is published as open source, you can obtain the source code and compile it yourself. An alternative is to use a package manager. A lot of software is precompiled and provided as installable packages by so-called repositories. In the case of Debian-based distributions (for example, Raspbian, Bananian, and Lubuntu), the package manager that uses these repositories is called Advanced Packaging Tool (Apt). The two most important tools for our requirements will be apt-get and apt-cache. In this recipe, we will cover searching for, installing, and removing software using the Apt utilities.

Getting ready

The following ingredients are required for this recipe:

  • A booted Debian-based operating system on your Banana Pi
  • An Internet connection

How to do it…

We will separate this recipe into searching for, installing, and removing packages.

Searching for packages

In the upcoming example, we will search for a solitaire game: Connect to your Banana Pi remotely or open a terminal on the desktop.
Type the following command into the shell:

    $ apt-cache search solitaire

You will get a list of packages that contain the string solitaire in their package name or description. Each line represents a package and shows the package name and description separated by a dash (-). Now we have obtained a list of solitaire games:

The preceding screenshot shows the output after searching for packages containing the string solitaire using the apt-cache command.

Installing a package

We are going to install a package by using its package name. From the previously received list, we select the package ace-of-penguins.

1. Type the following command into the shell:

       $ sudo apt-get install ace-of-penguins

2. If asked to type the password for sudo, enter the user's password.
3. If a package requires additional packages (dependencies), you will be asked to confirm the additional packages. In this case, enter Y.

After the download and installation have finished, the desired package is installed.

Removing a package

When you want to uninstall (remove) a package, you also use the apt-get command:

1. Type the following command into a shell:

       $ sudo apt-get remove ace-of-penguins

2. If asked to type the password for sudo, enter the user's password.
3. You will be asked to confirm the removal. Enter Y.

After this process, the package ace-of-penguins is removed from your system.

Summary

In this article, we covered the installation of a Linux operating system on the Banana Pi. Furthermore, we connected to the Banana Pi via the SSH protocol using PuTTY. Moreover, we discussed how to install new software using the Advanced Packaging Tool. This article is a combination of parts from the first two chapters of the Banana Pi Cookbook. In the Banana Pi Cookbook, we go into more detail and explain the specifics of the Banana Pro, for example, how to connect to the local network via WLAN.
If you are using a Linux-based desktop computer, you will also learn how to set up the SD card and connect via SSH to your Banana Pi on your Linux computer.

Resources for Article:

Further resources on this subject:

  • Color and motion finding [article]
  • Controlling the Movement of a Robot with Legs [article]
  • Develop a Digital Clock [article]
Packt
07 Jul 2015
22 min read

Man, Do I Like Templates!

In this article by Italo Maia, author of the book Building Web Applications with Flask, we will discuss what Jinja2 is, and how Flask uses Jinja2 to implement the View layer and awe you. Be prepared!

What is Jinja2 and how is it coupled with Flask?

Jinja2 is a library found at http://jinja.pocoo.org/; you can use it to produce formatted text with bundled logic. Unlike the Python format function, which only allows you to replace markup with variable content, Jinja2 lets you place a control structure, such as a for loop, inside a template string and parse it. Let's consider this example:

    from jinja2 import Template

    x = """
    <p>Uncle Scrooge nephews</p>
    <ul>
    {% for i in my_list %}
    <li>{{ i }}</li>
    {% endfor %}
    </ul>
    """
    template = Template(x)
    # output is a unicode string
    print(template.render(my_list=['Huey', 'Dewey', 'Louie']))

In the preceding code, we have a very simple example where we create a template string with a for loop control structure ("for tag", for short) that iterates over a list variable called my_list and prints each element inside a "li HTML tag" using the curly braces {{ }} notation.

Notice that you could call render on the template instance as many times as needed with different key-value arguments, also called the template context. A context variable may have any valid Python variable name, that is, anything in the format given by the regular expression [a-zA-Z_][a-zA-Z0-9_]*.

For a full overview on regular expressions (Regex for short) with Python, visit https://docs.python.org/2/library/re.html. Also, take a look at this nice online tool for Regex testing: http://pythex.org/.

A more elaborate example would make use of an Environment class instance, which is a central, configurable, extensible class that may be used to load templates in a more organized way. Do you follow where we are going here?
This is the basic principle behind Jinja2 and Flask: Flask prepares an environment for you, with a few sensible defaults, and gets your wheels in motion.

What can you do with Jinja2?

Jinja2 is pretty slick. You can use it with template files or strings; you can use it to create formatted text, such as HTML, XML, Markdown, and e-mail content; you can put together templates, reuse templates, and extend templates; you can even use extensions with it. The possibilities are countless, and they come combined with nice debugging features, auto-escaping, and full unicode support.

Auto-escaping is a Jinja2 configuration where everything you print in a template is interpreted as plain text, if not explicitly requested otherwise. Imagine a variable x has its value set to <b>b</b>. If auto-escaping is enabled, {{ x }} in a template would print the string as given: the angle brackets are escaped (&lt;b&gt;b&lt;/b&gt;), so a browser shows the tags literally. If auto-escaping is off, which is the Jinja2 default (Flask turns it on for HTML templates), the markup is emitted untouched and a browser would show a bold b.

Let's understand a few concepts before covering how Jinja2 allows us to do our coding. First, we have the previously mentioned curly braces. Double curly braces are a delimiter that allows you to evaluate a variable or function from the provided context and print it into the template:

from jinja2 import Template

# create the template
t = Template("{{ variable }}")

# – Built-in Types –
t.render(variable='hello you')
>> u"hello you"
t.render(variable=100)
>> u"100"

# you can evaluate custom class instances
class A(object):
    def __str__(self):
        return "__str__"
    def __unicode__(self):
        return u"__unicode__"
    def __repr__(self):
        return u"__repr__"

# – Custom Objects Evaluation –
# __unicode__ has the highest precedence in evaluation
# followed by __str__ and __repr__
t.render(variable=A())
>> u"__unicode__"

In the preceding example, we see how to use curly braces to evaluate variables in your template. First, we evaluate a string and then an integer. Both result in a unicode string.
If we evaluate a class of our own, we must make sure there is a __unicode__ method defined, as it is called during the evaluation. If a __unicode__ method is not defined, the evaluation falls back to __str__ and __repr__, sequentially. This is easy. Furthermore, what if we want to evaluate a function? Well, just call it:

from jinja2 import Template

# create the template
t = Template("{{ fnc() }}")
t.render(fnc=lambda: 10)
>> u"10"

# evaluating a function with argument
t = Template("{{ fnc(x) }}")
t.render(fnc=lambda v: v, x='20')
>> u"20"

t = Template("{{ fnc(v=30) }}")
t.render(fnc=lambda v: v)
>> u"30"

To output the result of a function in a template, just call the function as you would any regular Python function. The function return value will be evaluated normally. If you're familiar with Django, you might notice a slight difference here. In Django, you do not use parentheses to call a function; callables are invoked automatically, without arguments. In Jinja2, the parentheses are always needed if you want the function's return value evaluated. The following two examples show the difference between a Jinja2 and a Django function call in a template:

{# jinja2 syntax #}
{{ some_function() }}

{# django syntax #}
{{ some_function }}

You can also evaluate Python math operations. Take a look:

from jinja2 import Template

# no context provided / needed
Template("{{ 3 + 3 }}").render()
>> u"6"
Template("{{ 3 - 3 }}").render()
>> u"0"
Template("{{ 3 * 3 }}").render()
>> u"9"
Template("{{ 3 / 3 }}").render()
>> u"1.0"

Note that / is always true division in Jinja2, so the result is a float; use // if you want an integer result. Other math operators will also work. You may use the curly braces delimiter to access and evaluate lists and dictionaries:

from jinja2 import Template

Template("{{ my_list[0] }}").render(my_list=[1, 2, 3])
>> u'1'
Template("{{ my_list['foo'] }}").render(my_list={'foo': 'bar'})
>> u'bar'
# and here's some magic
Template("{{ my_list.foo }}").render(my_list={'foo': 'bar'})
>> u'bar'

To access a list or dictionary value, just use normal plain Python notation.
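The lookups above can be checked in a short script. This is a minimal sketch, assuming Python 3 and a current Jinja2 release with the default undefined type; the names my_dict and nope are made up for the example.

```python
from jinja2 import Template

context = {"my_list": [1, 2, 3], "my_dict": {"foo": "bar"}}

# Subscript access works as in plain Python.
by_index = Template("{{ my_list[0] }}").render(context)
by_key = Template("{{ my_dict['foo'] }}").render(context)

# Attribute-style access falls back to a key lookup on dictionaries.
by_attr = Template("{{ my_dict.foo }}").render(context)

# A missing key gives an undefined value, which prints as an empty
# string with the default settings instead of raising an error.
missing = Template("[{{ my_dict.nope }}]").render(context)

print(by_index, by_key, by_attr, missing)
```

The silent empty string for a missing key is convenient in templates, but it can also hide typos, so it is worth keeping in mind when a value unexpectedly does not show up.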
With dictionaries, you can also access a key value using variable access notation, which is pretty neat.

Besides the curly braces delimiter, Jinja2 also has the curly braces/percentage delimiter, which uses the notation {% stmt %} and is used to execute statements, which may or may not be control statements. Its usage depends on the statement, where control statements have the following notation:

{% stmt %}
{% endstmt %}

The first tag has the statement name, while the second is the closing tag, which has the name of the statement prefixed with end. You must be aware that a non-control statement may not have a closing tag. Let's look at some examples:

{% block content %}
{% for i in items %}
{{ i }} - {{ i.price }}
{% endfor %}
{% endblock %}

The preceding example is a little more complex than what we have been seeing. It uses a for loop control statement inside a block statement (you can have a statement inside another). The block statement is not a control statement, as it does not control execution flow in the template. Inside the for loop, you see that the i variable is being printed together with the associated price (defined elsewhere).

A last delimiter you should know is {# comments go here #}. It is a multi-line delimiter used to declare comments. Let's see two examples that have the same result:

{# first example #}
{#
second example
#}

Both comment delimiters hide the content between {# and #}. As can be seen, this delimiter works for one-line comments and multi-line comments, which makes it very convenient.

Control structures

We have a nice set of built-in control structures defined by default in Jinja2. Let's begin our studies of them with the if statement.
{% if true %}Too easy{% endif %}
{% if true == true == True %}True and true are the same{% endif %}
{% if false == false == False %}False and false also are the same{% endif %}
{% if none == none == None %}There's also a lowercase None{% endif %}
{% if 1 >= 1 %}Compare objects like in plain python{% endif %}
{% if 1 == 2 %}This won't be printed{% else %}This will{% endif %}
{% if "apples" != "oranges" %}All comparison operators work = ]{% endif %}
{% if something %}elif is also supported{% elif something_else %}^_^{% endif %}

The if control statement is beautiful! It behaves just like a Python if statement. As seen in the preceding code, you can use it to compare objects in a very easy fashion. "else" and "elif" are also fully supported.

You may also have noticed that true and false, non-capitalized, were used together with plain Python Booleans, True and False. As a design decision to avoid confusion, all Jinja2 templates have a lowercase alias for True, False, and None. By the way, lowercase syntax is the preferred way to go.

If needed, and you should avoid this scenario, you may group comparisons together in order to change precedence evaluation. See the following example:

{% if 5 < 10 < 15 %}true{%else%}false{% endif %}
{% if (5 < 10) < 15 %}true{%else%}false{% endif %}
{% if 5 < (10 < 15) %}true{%else%}false{% endif %}

The expected output for the preceding example is true, true, and false. The first two lines are pretty straightforward. In the third line, first, (10 < 15) is evaluated to True, which is an instance of a subclass of int, where True == 1. Then 5 < True is evaluated, which is certainly false.

The for statement is pretty important. One can hardly think of a serious Web application that does not have to show a list of some kind at some point.
The for statement can iterate over any iterable instance and has a very simple, Python-like syntax:

{% for item in my_list %}
{{ item }}{# print evaluate item #}
{% endfor %}

{# or #}
{% for key, value in my_dictionary.items() %}
{{ key }}: {{ value }}
{% endfor %}

In the first statement, we have the opening tag indicating that we will iterate over my_list items and each item will be referenced by the name item. The name item will be available inside the for loop context only. In the second statement, we have an iteration over the key-value tuples that form my_dictionary, which should be a dictionary (if the variable name wasn't suggestive enough). Pretty simple, right?

The for loop also has a few tricks in store for you. When building HTML lists, it's a common requirement to mark each list item in alternating colors in order to improve readability, or to mark the first and/or last item with some special markup. Those behaviors can be achieved in a Jinja2 for loop through access to a loop variable available inside the block context. Let's see some examples:

{% for i in ['a', 'b', 'c', 'd'] %}
{% if loop.first %}This is the first iteration{% endif %}
{% if loop.last %}This is the last iteration{% endif %}
{{ loop.cycle('red', 'blue') }}{# print red or blue alternating #}
{{ loop.index }} - {{ loop.index0 }} {# 1 indexed index – 0 indexed index #}
{# reverse 1 indexed index – reverse 0 indexed index #}
{{ loop.revindex }} - {{ loop.revindex0 }}
{% endfor %}

The for loop statement, as in Python, also allows the use of else, but with a slightly different meaning. In Python, when you use else with for, the else block is executed only if the loop was not exited through a break statement, like this:

for i in [1, 2, 3]:
    pass
else:
    print("this will be printed")

for i in [1, 2, 3]:
    if i == 3:
        break
else:
    print("this will never be printed")

As seen in the preceding code snippet, the else block will only be executed in a for loop if the execution was never broken by a break command.
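The Python behavior just described is easy to verify with a small function. This is a plain-Python sketch; the function name first_negative is made up for the example. Note the last call: in Python, the else block also runs for an empty iterable, simply because no break happened.

```python
def first_negative(numbers):
    """Describe the first negative number in numbers, if there is one."""
    for n in numbers:
        if n < 0:
            result = "found {}".format(n)
            break  # leaving through break skips the else block
    else:
        # Runs only when the loop finished without hitting break,
        # which includes the case of an empty iterable.
        result = "no negatives"
    return result


print(first_negative([3, -1, 4]))
print(first_negative([3, 1, 4]))
print(first_negative([]))
```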
With Jinja2, the else block is executed when the for iterable is empty. For example:

{% for i in [] %}
{{ i }}
{% else %}I'll be printed{% endfor %}

{% for i in ['a'] %}
{{ i }}
{% else %}I won't{% endfor %}

As we are talking about loops and breaks, there is an important thing to know: out of the box, the Jinja2 for loop supports neither break nor continue. Instead, to achieve the expected behavior, you should use loop filtering as follows:

{% for i in [1, 2, 3, 4, 5] if i > 2 %}
value: {{ i }}; loop.index: {{ loop.index }}
{%- endfor %}

In the first tag, you see a normal for loop together with an if condition. You should consider that condition as a real list filter, as the index is only counted for the items that pass it. Run the preceding example and the output will be the following:

value: 3; loop.index: 1
value: 4; loop.index: 2
value: 5; loop.index: 3

Look at the last observation in the preceding example: in the second tag, do you see the dash in {%-? It tells the renderer to strip the whitespace before the tag, so there are no empty new lines at each iteration. Try our previous example without the dash and compare the results to see what changes.

We'll now look at three very important statements used to build templates from different files: block, extends, and include.

block and extends always work together. The first is used to define "overwritable" blocks in a template, while the second names the parent template, the one that defines the blocks, for the current template. Let's see an example:

# coding:utf-8
with open('parent.txt', 'w') as file:
    file.write("""
{% block template %}parent.txt{% endblock %}
===========
I am a powerful psychic and will tell you your past

{#- "past" is the block identifier #}
{% block past %}
You had pimples by the age of 12.
{%- endblock %}

Tremble before my power!!!""".strip())

with open('child.txt', 'w') as file:
    file.write("""
{% extends "parent.txt" %}

{# overwriting the block called template from parent.txt #}
{% block template %}child.txt{% endblock %}

{#- overwriting the block called past from parent.txt #}
{% block past %}
You've bought an ebook recently.
{%- endblock %}""".strip())

with open('other.txt', 'w') as file:
    file.write("""
{% extends "child.txt" %}
{% block template %}other.txt{% endblock %}""".strip())

from jinja2 import Environment, FileSystemLoader

env = Environment()
# tell the environment how to load templates
env.loader = FileSystemLoader('.')
# look up our template
tmpl = env.get_template('parent.txt')
# render it to default output
print(tmpl.render())
print("")
# loads child.txt and its parent
tmpl = env.get_template('child.txt')
print(tmpl.render())
# loads other.txt and its parents
print(env.get_template('other.txt').render())

Do you see the inheritance happening between child.txt and parent.txt? parent.txt is a simple template with two block statements, called template and past. When you render parent.txt directly, its blocks are printed "as is", because they were not overwritten. In child.txt, we extend the parent.txt template and overwrite all its blocks. By doing that, we can have different information in specific parts of a template without having to rewrite the whole thing.

With other.txt, for example, we extend the child.txt template and overwrite only the block named template. You can overwrite blocks from a direct parent template or from any of its parents.

If you were defining an index.txt page, you could have default blocks in it that would be overwritten when needed, saving lots of typing.

Explaining the last example, Python-wise, is pretty simple. First, we create a Jinja2 environment (we talked about this earlier) and tell it how to load our templates, then we load the desired template directly.
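The same inheritance chain can be tried without touching the file system. The following is a condensed sketch, assuming Python 3 and a current Jinja2 release; it swaps FileSystemLoader for DictLoader and shortens the template texts, so the output differs from the file-based example.

```python
from jinja2 import Environment, DictLoader

# Shortened, in-memory versions of the three templates.
templates = {
    "parent.txt": (
        "{% block template %}parent.txt{% endblock %}: "
        "{% block past %}pimples at 12{% endblock %}"
    ),
    "child.txt": (
        '{% extends "parent.txt" %}'
        "{% block template %}child.txt{% endblock %}"
        "{% block past %}bought an ebook{% endblock %}"
    ),
    "other.txt": (
        '{% extends "child.txt" %}'
        "{% block template %}other.txt{% endblock %}"
    ),
}

env = Environment(loader=DictLoader(templates))

# Render every template; parents are resolved by the loader on demand.
rendered = {name: env.get_template(name).render() for name in sorted(templates)}
for name, text in rendered.items():
    print(name, "->", text)
```

other.txt overrides only the template block, so its past block comes from child.txt, its nearest ancestor that defines one.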
We do not have to bother telling the environment how to find parent templates, nor do we need to preload them.

The include statement is probably the easiest statement so far. It allows you to render a template inside another in a very easy fashion. Let's look at an example:

with open('base.txt', 'w') as file:
    file.write("""
{{ myvar }}
You wanna hear a dirty joke?
{% include 'joke.txt' %}
""".strip())

with open('joke.txt', 'w') as file:
    file.write("""
A boy fell in a mud puddle. {{ myvar }}
""".strip())

from jinja2 import Environment, FileSystemLoader

env = Environment()
# tell the environment how to load templates
env.loader = FileSystemLoader('.')
print(env.get_template('base.txt').render(myvar='Ha ha!'))

In the preceding example, we render the joke.txt template inside base.txt. As joke.txt is rendered inside base.txt, it also has full access to the base.txt context, so myvar is printed normally.

Finally, we have the set statement. It allows you to define variables for use inside the template context. Its use is pretty simple:

{% set x = 10 %}
{{ x }}
{% set x, y, z = 10, 5+5, "home" %}
{{ x }} - {{ y }} - {{ z }}

In the preceding example, if x was given by a complex calculation or a database query, it would make much more sense to have it cached in a variable, if it is to be reused across the template. As seen in the example, you can also assign values to multiple variables at once.

Macros

Macros are the closest to coding you'll get inside Jinja2 templates. The macro definition and usage are similar to plain Python functions, so it is pretty easy.
Let's try an example:

with open('formfield.html', 'w') as file:
    file.write('''
{% macro input(name, value='', label='') %}
{% if label %}
<label for='{{ name }}'>{{ label }}</label>
{% endif %}
<input id='{{ name }}' name='{{ name }}' value='{{ value }}'></input>
{% endmacro %}'''.strip())

with open('index.html', 'w') as file:
    file.write('''
{% from 'formfield.html' import input %}
<form method='get' action='.'>
{{ input('name', label='Name:') }}
<input type='submit' value='Send'></input>
</form>
'''.strip())

from jinja2 import Environment, FileSystemLoader

env = Environment()
env.loader = FileSystemLoader('.')
print(env.get_template('index.html').render())

In the preceding example, we create a macro that accepts a name argument and two optional arguments: value and label. Inside the macro block, we define what should be output. Notice we can use other statements inside a macro, just like in a template. In index.html, we import the input macro from inside formfield.html using the import statement, as if formfield was a module and input was a Python function. If needed, we could even rename our input macro like this:

{% from 'formfield.html' import input as field_input %}

You can also import formfield as a module and use it as follows:

{% import 'formfield.html' as formfield %}
{{ formfield.input('name', label='Name:') }}

When using macros, there is a special case where you want to allow any named argument to be passed into the macro, as you would in a Python function (for example, **kwargs). With Jinja2 macros, these values are, by default, available in a kwargs dictionary that does not need to be explicitly defined in the macro signature.
For example:

# coding:utf-8
with open('formfield.html', 'w') as file:
    file.write('''
{% macro input(name) -%}
<input id='{{ name }}' name='{{ name }}' {% for k,v in kwargs.items() -%}{{ k }}='{{ v }}' {% endfor %}></input>
{%- endmacro %}
'''.strip())

with open('index.html', 'w') as file:
    file.write('''
{% from 'formfield.html' import input %}
{# use method='post' whenever sending sensitive data over HTTP #}
<form method='post' action='.'>
{{ input('name', type='text') }}
{{ input('passwd', type='password') }}
<input type='submit' value='Send'></input>
</form>
'''.strip())

from jinja2 import Environment, FileSystemLoader

env = Environment()
env.loader = FileSystemLoader('.')
print(env.get_template('index.html').render())

As you can see, kwargs is available even though you did not define a kwargs argument in the macro signature.

Macros have a few clear advantages over plain templates included with the include statement:

You do not have to worry about variable names in the template using macros
You can define the exact required context for a macro block through the macro signature
You can define a macro library inside a template and import only what is needed

Commonly used macros in a Web application include a macro to render pagination, another to render fields, and another to render forms. You could have others, but these are pretty common use cases.

Regarding our previous example, it is good practice to use HTTPS (HTTP Secure) to send sensitive information, such as passwords, over the Internet. Be careful about that!

Extensions

Extensions are the way Jinja2 allows you to extend its vocabulary. Extensions are not enabled by default, so you enable an extension only when and if you need it, and start using it without much trouble:

env = Environment(extensions=['jinja2.ext.do', 'jinja2.ext.with_'])

In the preceding code, we have an example where you create an environment with two extensions enabled: do and with.
Those are the extensions we will study in this article.

As the name suggests, the do extension allows you to "do stuff". Inside a do tag, you're allowed to execute Python expressions with full access to the template context. Flask-Empty, a popular Flask boilerplate available at https://github.com/italomaia/flask-empty, uses the do extension to update a dictionary in one of its macros, for example. Let's see how we could do the same:

{% set x = {1:'home', '2':'boat'} %}
{% do x.update({3: 'bar'}) %}
{%- for key,value in x.items() %}
{{ key }} - {{ value }}
{%- endfor %}

In the preceding example, we create the x variable with a dictionary, then we update it with {3: 'bar'}. You don't usually need to use the do extension but, when you do, a lot of coding is saved.

The with extension is also very simple. You use it whenever you need to create block-scoped variables. Imagine you have a value you need cached in a variable for a brief moment; this would be a good use case. Let's see an example:

{% with age = user.get_age() %}
My age: {{ age }}
{% endwith %}
My age: {{ age }}{# no value here #}

As seen in the example, age exists only inside the with block. Also, variables set inside a with block will only exist inside it. For example:

{% with %}
{% set count = query.count() %}
Current Stock: {{ count }}
Diff: {{ prev_count - count }}
{% endwith %}
{{ count }} {# empty value #}

Filters

Filters are a marvelous thing about Jinja2! This tool allows you to process a constant or variable before printing it to the template. The goal is to implement the formatting you want, strictly in the template.

To use a filter, just call it using the pipe operator like this:

{% set name = 'junior' %}
{{ name|capitalize }} {# output is Junior #}

Here, name is passed to the capitalize filter, which processes it and returns the capitalized value.
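The pipe syntax can be tried directly from Python. This is a small sketch, assuming Python 3 and a current Jinja2 release; the input strings are made up for the example.

```python
from jinja2 import Template

# The value on the left of the pipe is handed to the filter on the right.
template = Template("{{ name|capitalize }}")

lower_input = template.render(name="junior")
# capitalize upper-cases the first character and lower-cases the rest.
mixed_input = template.render(name="jUNIOR maia")

print(lower_input)
print(mixed_input)
```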
To pass arguments to a filter, just call it like a function, like this:

{{ ['Adam', 'West']|join(' ') }} {# output is Adam West #}

The join filter will join all values from the passed iterable, putting the provided argument between them.

Jinja2 has an enormous quantity of available filters by default. That means we can't cover them all here, but we can certainly cover a few. capitalize and join were seen already. Let's look at some further examples:

{# prints default value if input is undefined #}
{{ x|default('no opinion') }}
{# prints default value if input evaluates to false #}
{{ none|default('no opinion', true) }}
{# prints input as it was provided #}
{{ 'some opinion'|default('no opinion') }}

{# you can use a filter inside a control statement #}
{# dictsort yields (key, value) pairs #}
{# sort by key case-insensitive #}
{% for key, value in {'A':3, 'b':2, 'C':1}|dictsort %}{{ key }}{% endfor %}
{# sort by key case-sensitive #}
{% for key, value in {'A':3, 'b':2, 'C':1}|dictsort(true) %}{{ key }}{% endfor %}
{# sort by value #}
{% for key, value in {'A':3, 'b':2, 'C':1}|dictsort(false, 'value') %}{{ key }}{% endfor %}
{{ [3, 2, 1]|first }} - {{ [3, 2, 1]|last }}
{{ [3, 2, 1]|length }} {# prints input length #}
{# same as in python #}
{{ '%s, =D'|format("I'm John") }}
{{ "He has two daughters"|replace('two', 'three') }}
{# safe prints the input without escaping it first #}
{{ '<input name="stuff" />'|safe }}
{{ "there are five words here"|wordcount }}

Try the preceding example to see exactly what each filter does.

After reading this much about Jinja2, you're probably thinking: "Jinja2 is cool but this is a book about Flask. Show me the Flask stuff!". Ok, ok, I can do that!

Of what we have seen so far, almost everything can be used with Flask with no modifications. As Flask manages the Jinja2 environment for you, you don't have to worry about creating file loaders and stuff like that.
One thing you should be aware of, though, is that, because you don't instantiate the Jinja2 environment yourself, you can't pass the extensions you want to activate to the class constructor. To activate an extension, add it to Flask during the application setup as follows:

from flask import Flask

app = Flask(__name__)
app.jinja_env.add_extension('jinja2.ext.do')  # or jinja2.ext.with_

if __name__ == '__main__':
    app.run()

Messing with the template context

You can use the render_template function to load a template from the templates folder and then render it as a response.

from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def hello():
    return render_template("index.html")

If you want to add values to the template context, as seen in some of the examples in this article, you have to pass keyword arguments to render_template:

from flask import Flask, render_template

app = Flask(__name__)

@app.route("/")
def hello():
    return render_template("index.html", my_age=28)

In the preceding example, my_age would be available in the index.html context, where {{ my_age }} would be translated to 28. my_age could have virtually any value you want to display.

Now, what if you want all your views to have a specific value in their context, such as a version value or some special code or function; how would you do it? Flask offers you the context_processor decorator to accomplish that. You just have to decorate a function that returns a dictionary and you're ready to go.
For example:

from flask import Flask, render_template

app = Flask(__name__)

@app.context_processor
def luck_processor():
    from random import randint
    def lucky_number():
        return randint(1, 10)
    return dict(lucky_number=lucky_number)

@app.route("/")
def hello():
    # lucky_number will be available in the index.html context by default
    return render_template("index.html")

Summary

In this article, we saw how to render templates using only Jinja2, how control statements look and how to use them, how to write a comment, how to print variables in a template, how to write and use macros, how to load and use extensions, and how to register context processors. I don't know about you, but this article felt like a lot of information! I strongly advise you to experiment with the examples. Knowing your way around Jinja2 will save you a lot of headaches.

Resources for Article:

Further resources on this subject:
Recommender systems dissected
Deployment and Post Deployment [article]
Handling sessions and users [article]
Introduction to Custom Template Filters and Tags [article]

Packt
07 Jul 2015
37 min read

Learning Embedded Linux Using Yocto: Virtualization

In this article by Alexandru Vaduva, author of the book Learning Embedded Linux Using the Yocto Project, you will be presented with information about various concepts related to Linux virtualization. As some of you might know, this subject is quite vast, and selecting only a few components to explain is a challenge. I hope my selection will please most of you interested in this area. The information available in this article might not fit everyone's needs. For this reason, I have attached multiple links to more detailed descriptions and documentation. As always, I encourage you to start reading and finding out more, if necessary. I am aware that I cannot put all the necessary information in only a few words.

Linux virtualization is not a new thing. It has been available for more than ten years and has advanced in a really quick and interesting manner. The question is no longer whether virtualization is a solution for me, but rather which virtualization solutions to deploy and what to virtualize.

Linux virtualization

The first benefit everyone sees when looking at virtualization is the increase in server utilization and the decrease in energy costs. Using virtualization, the workloads available on a server are maximized, which is very different from scenarios where the hardware uses only a fraction of its computing power. It can reduce the complexity of interaction with various environments and it also offers an easier-to-use management system. Today, working with a large number of virtual machines is no more complicated than working with a few of them, because of the scalability most tools offer. Also, deployment time has really decreased. In a matter of minutes, you can configure and deploy an operating system template or create a virtual environment for a virtual appliance deployment.

One other benefit virtualization brings is flexibility.
When a workload is just too big for its allocated resources, it can be easily duplicated or moved to another environment that suits its needs better, either on the same hardware or on a more potent server. With a cloud-based solution, the sky is the limit; any limit is imposed by the cloud type, depending on whether tools are available for a host operating system.

Over time, Linux has come to provide a number of great choices for every need and organization. Whether your task involves server consolidation in an enterprise data centre or improving a small nonprofit's infrastructure, Linux should have a virtualization platform for your needs. You simply need to figure out where to look and which project to choose.

The subject of virtualization is extensive, mainly because it covers a broad range of technologies, and also because many of its terms are not well defined. In this article, you will be presented only with components related to the Yocto Project and with a new initiative that I personally am interested in. This initiative tries to make Network Function Virtualization (NFV) and Software Defined Networks (SDN) a reality and is called Open Platform for NFV (OPNFV). It will be explained here briefly.

SDN and NFV

I have decided to start with this topic because I believe it is really important that all the research done in this area is starting to get traction, with a number of open source initiatives from all sorts of areas and industries. These two concepts are not new. They were first described around 20 years ago, but only the last few years have made it possible for them to resurface as real and very feasible implementations. The focus of this article will be on NFV, since it has received the most attention and also comes with various implementation proposals.
NFV

NFV is a network architecture concept used to virtualize entire categories of network node functions into blocks that can be interconnected to create communication services. It is different from known virtualization techniques. It uses Virtual Network Functions (VNF) that can be contained in one or more virtual machines, which execute different processes and software components available on servers, switches, or even a cloud infrastructure. A couple of examples include virtualized load balancers, intrusion detection devices, firewalls, and so on.

Product development cycles in the telecommunication industry were very rigorous and long, because the various standards and protocols took a long time to reach the required levels of adherence and quality. This made it possible for fast-moving organizations to become competitors and pushed the established players to change their approach.

In 2012, an industry specification group published a white paper on software-defined networks and OpenFlow. The group was part of the European Telecommunications Standards Institute (ETSI) and was called Network Functions Virtualisation. After this white paper was published, more in-depth research papers followed, explaining things ranging from terminology definitions to various use cases, with references to vendors that could consider using NFV implementations.

ETSI NFV

The ETSI NFV workgroup has proved useful to the telecommunication industry, helping it create more agile development cycles and respond in time to demands from dynamic and fast-changing environments. SDN and NFV are two complementary concepts that are key enabling technologies in this regard, and they contain the main ingredients of the technology developed by both the telecom and IT industries.

The NFV framework consists of six components:

NFV Infrastructure (NFVI): It is required to offer support to a variety of use cases and applications.
It comprises the totality of software and hardware components that create the environment in which VNFs are deployed. It is a multitenant infrastructure that is responsible for leveraging multiple standard virtualization technologies and use cases at the same time. It is described in the following NFV Industry Specification Group (NFV ISG) documents: NFV Infrastructure Overview, NFV Compute, NFV Hypervisor Domain, and NFV Infrastructure Network Domain. The following image presents a visual graph of various use cases and fields of application for the NFV Infrastructure.

NFV Management and Orchestration (MANO): It is the component responsible for decoupling the compute, networking, and storage components from the software implementation with the help of a virtualization layer. It requires the management of new elements and the orchestration of new dependencies between them, which in turn require certain standards of interoperability and a certain mapping.

NFV Software Architecture: It is related to the virtualization of already implemented network functions, such as proprietary hardware appliances. It implies understanding a hardware implementation and transitioning it into a software one. The transition is based on various defined patterns that can be used in the process.

NFV Reliability and Availability: These are real challenges. The work on this component started from the definition of various problems, use cases, requirements, and principles, and it aims to offer the same level of availability as legacy systems. For the reliability part, the documentation only sets the stage for future work: it identifies various problems and indicates the best practices used in designing resilient NFV systems.

NFV Performance and Portability: The purpose of NFV, in general, is to transform the way we work with the networks of the future. For this purpose, it needs to prove itself a worthy solution by industry standards.
The corresponding documentation explains how to apply the best practices related to performance and portability in a general VNF deployment.

NFV Security: Since NFV is a large component of the industry, it is concerned with, and dependent on, the security of networking and cloud computing, which makes security assurance critical for NFV. The Security Expert Group focuses on those concerns.

An architectural view of these components is presented here:

After all the documentation is in place, a number of proofs of concept need to be executed in order to test the limitations of these components and adjust the theoretical components accordingly. They are also meant to encourage the development of the NFV ecosystem.

SDN

Software-Defined Networking (SDN) is an approach to networking that lets administrators manage various services through an abstraction of the available functionality. This is realized by decoupling the system into a control plane and a data plane: decisions about where network traffic is sent belong to the control plane, while the actual forwarding of that traffic is done by the data plane. Of course, some method of communication between the control and data planes is required, so the OpenFlow mechanism entered the equation first; however, other components could take its place as well.

The intention of SDN was to offer an architecture that is manageable, cost-effective, adaptable, and dynamic, as well as suitable for the dynamic, high-bandwidth scenarios that are common today. The OpenFlow component was the foundation of the SDN solution. The SDN architecture permitted the following:

Direct programming: The control plane is directly programmable because it is completely decoupled from the data plane.
Programmatic configuration: SDN permitted management, configuration, and optimization of resources through programs.
These programs could also be written by anyone because they were not dependent on any proprietary components.
Agility: The abstraction between the two planes permitted the adjustment of network flows according to the needs of a developer.
Central management: Logical components could be centered on the control plane, which offered a view of the network to other applications, engines, and so on.
Open standards and vendor neutrality: It is implemented using open standards, which have simplified SDN design and operations because of the small number of instructions provided to controllers, compared to scenarios in which multiple vendor-specific protocols and devices have to be handled.

Also, meeting market requirements with traditional solutions would have been impossible, taking into account that newly emerging markets such as mobile device communication, Internet of Things (IoT), Machine to Machine (M2M), Industry 4.0, and others all require networking support. Given the budgets available for further development, the various IT departments all faced a decision. It seems that the mobile device communication market decided to move toward open source, in the hope that this investment would prove its real capabilities and lead to a brighter future.

OPNFV

The Open Platform for NFV (OPNFV) project tries to offer an open source reference platform that is carrier-grade and tightly integrated, in order to help industry peers improve the NFV concept and move it forward. Its purpose is to offer consistency, interoperability, and performance among the numerous blocks and projects that already exist. This platform will also try to work closely with a variety of open source projects and continuously help with integration while, at the same time, filling the development gaps left by any of them.
This project is expected to lead to an increase in performance, reliability, serviceability, availability, and power efficiency, while also delivering an extensive platform for instrumentation. It will start with the development of an NFV infrastructure and a virtualized infrastructure management system, combining a number of already available projects. Its reference system architecture is x86.

The project's initial focus point and proposed implementation can be consulted in the following image. From this image, it can easily be seen that the project, although very young (it was started in November 2014), has had an accelerated start and already has a few implementation proposals. A number of large companies and organizations have already started working on their specific demos. OPNFV has not waited for them to finish and is already discussing a number of proposed projects and initiatives. These are intended both to meet the needs of its members and to assure them of the reliability of various components, such as continuous integration, fault management, test-bed infrastructure, and others. The following figure describes the structure of OPNFV:

The project has been leveraging as many open source projects as possible. Adaptations to these projects can be made in two places. First, they can be made inside the upstream project itself, if they do not require substantial functionality changes that could cause divergence from its purpose and roadmap. The second option complements the first and is necessary for changes that do not fall into the first category; these should be included somewhere in the OPNFV project's codebase. None of the changes should be upstreamed without proper testing within the development cycle of OPNFV.

Another important element that needs to be mentioned is that OPNFV does not use any specific or additional hardware.
It only uses available hardware resources, as long as the VI-Ha reference point is supported. In the preceding image, it can be seen that this is already done by having providers such as Intel for the computing hardware, NetApp for storage hardware, and Mellanox for network hardware components.

The OPNFV board and technical steering committee work with quite a large palette of open source projects. They vary from Infrastructure as a Service (IaaS) and hypervisors to SDN controllers, and the list continues. This gives a large number of contributors the opportunity to try out skills that they did not have the time to work on, or wanted to learn but never had the chance to. Also, a more diversified community offers a broader view of the same subject.

There is a large variety of applications for the OPNFV project. The virtual network functions are diverse: in mobile deployments they include mobile gateways (such as Serving Gateway (SGW), Packet Data Network Gateway (PGW), and so on) and related functions (Mobility Management Entity (MME) and gateways), firewalls or application-level gateways and filters (web and e-mail traffic filters), and test and diagnostic equipment (Service-Level Agreement (SLA) monitoring). These VNF deployments need to be easy to operate, scale, and evolve independently of the type of VNF that is deployed.
OPNFV sets out to create a platform that supports the following set of qualities and use cases:
A common mechanism for the life-cycle management of VNFs, which includes deployment, instantiation, configuration, start and stop, upgrade/downgrade, and final decommissioning
A consistent mechanism to specify and interconnect VNFs, VNFCs, and PNFs, independent of the physical network infrastructure, network overlays, and so on (that is, a virtual link)
A common mechanism to dynamically instantiate new VNF instances or decommission enough existing ones to meet the current performance, scale, and network bandwidth needs
A mechanism to detect faults and failures in the NFVI, VIM, and other components of the infrastructure, as well as to recover from these failures
A mechanism to source/sink traffic from/to a physical network function to/from a virtual network function
NFVI as a Service, to host different VNF instances from different vendors on the same infrastructure

There are some notable and easy-to-grasp use case examples that should be mentioned here. They are organized into four categories. The first is the Residential/Access category, which covers the virtualization of the home environment as well as fixed access NFV. The next one is the data center category, which covers the virtualization of CDNs and the use cases that deal with it. The mobile category consists of the virtualization of mobile core networks and IMS as well as the virtualization of mobile base stations. Lastly, the cloud category includes NFVIaaS, VNFaaS, the VNF forwarding graph (service chains), and the use cases of VNPaaS.

More information about this project and its various implementation components is available at https://www.opnfv.org/. For the definitions of missing terminology, please consult http://www.etsi.org/deliver/etsi_gs/NFV/001_099/003/01.02.01_60/gs_NFV003v010201p.pdf.
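The dynamic scaling requirement in the list above (instantiate new VNF instances or decommission existing ones to match current needs) boils down to simple arithmetic. The following is only a minimal sketch in POSIX shell: the function names, the idea of a single load figure, and the fixed per-instance capacity are assumptions made for illustration and are not part of any OPNFV API.

```shell
#!/bin/sh
# Hypothetical helpers, not OPNFV code.

# desired_instances LOAD CAPACITY
# Number of VNF instances needed when each instance handles CAPACITY
# units of load (assumed: both values use the same unit, e.g. Mbit/s).
desired_instances() (
    load=$1
    capacity=$2
    # Integer ceiling division
    echo $(( (load + capacity - 1) / capacity ))
)

# scaling_action CURRENT LOAD CAPACITY
# Prints what an (assumed) orchestrator would do next.
scaling_action() (
    current=$1
    wanted=$(desired_instances "$2" "$3")
    if [ "$wanted" -gt "$current" ]; then
        echo "instantiate $(( wanted - current ))"
    elif [ "$wanted" -lt "$current" ]; then
        echo "decommission $(( current - wanted ))"
    else
        echo "keep"
    fi
)

# Three instances running, 950 units of load, 200 units per instance
scaling_action 3 950 200
```

A real orchestrator would also consider performance and scale constraints, as the list mentions; the sketch only covers the bandwidth-style calculation.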
Virtualization support for the Yocto Project

The meta-virtualization layer tries to create a long- and medium-term production-ready layer specifically for embedded virtualization. Its roles are:
Simplifying the way collaborative benchmarking and research is done with tools such as KVM/LXC virtualization, combined with advanced core isolation and other techniques
Integrating with and contributing to projects such as OpenFlow, Open vSwitch, LXC, dmtcp, CRIU, and others, which can be used with other components, such as OpenStack or Carrier Grade Linux

To summarize this in one sentence, this layer tries to provide support for constructing OpenEmbedded and Yocto Project-based virtualized solutions. The packages available in this layer that I will briefly talk about are as follows:
CRIU
Docker
LXC
Irqbalance
Libvirt
Xen
Open vSwitch

This layer can be used in conjunction with the meta-cloud-services layer, which offers cloud agents and API support for various cloud-based solutions. In this article, I am referring to both these layers because I think it is fitting to present these two components together. Inside the meta-cloud-services layer, there are also a couple of packages that will be discussed and briefly presented, as follows:
openLDAP
SPICE
Qpid
RabbitMQ
Tempest
Cyrus-SASL
Puppet
oVirt
OpenStack

Having mentioned these components, I will now move on to the explanation of each of these tools. Let's start with the content of the meta-virtualization layer, more exactly with the CRIU package, a project that implements Checkpoint/Restore In Userspace for Linux. It can be used to freeze an already running application and checkpoint it to a hard drive as a collection of files. These checkpoints can then be used to restore the application and continue executing it from that point. It can be used as part of a number of use cases, as follows:

Live migration of containers: It is the primary use case for the project.
The container is checkpointed, and the resulting image is moved to another box and restored there, making the whole experience almost unnoticeable to the user.
Seamless kernel upgrades: The kernel can be replaced without stopping activities. The services can be checkpointed, the kernel replaced by calling kexec, and all the services restored afterwards.
Speeding up slow-booting services: A service that has a slow boot procedure can be checkpointed after the first startup is finished and, for subsequent starts, restored from that point.
Load balancing of networks: It relies on the TCP_REPAIR socket option, which switches the socket into a special state. The socket is put directly into the state expected at the end of the operation. For example, if connect() is called, the socket will be put into the ESTABLISHED state as requested without checking for an acknowledgment from the other end, so offloading could be done at the application level.
Desktop environment suspend/resume: It is based on the fact that the suspend/restore action for a screen session or an X application is by far faster than the close/open operation.
High-performance computing: It can be used both for load balancing of tasks over a cluster and for saving cluster node states in case a crash occurs. Having a number of snapshots of an application doesn't hurt anybody.
Duplication of processes: It is similar to a remote fork() operation.
Snapshots for applications: A series of application states can be saved and reverted to if necessary. It can be used both to return to a desired state of an application and for debugging purposes.
Save ability in applications that do not have this option: An example of such an application could be a game in which, after reaching a certain level, a checkpoint is exactly what you need.
Moving a forgotten application into screen: If you have forgotten to start an application inside screen and it is already running, CRIU can help with the migration process.
Debugging of applications that have hung: For services that are stuck, for instance because of a bug, and need a quick restart, a copy of the services can be used to restore them. A dump of the process can also be taken and, through debugging, the cause of the problem can be found.
Application behavior analysis on a different machine: For applications that could behave differently from one machine to another, a snapshot of the application in question can be taken and transferred to the other machine. Here, the debugging process can also be an option.
Dry running updates: Before a system or kernel update is done, the services and critical applications could be duplicated onto a virtual machine; after the system update is applied there and all the test cases pass, the real update can be done.
Fault-tolerant systems: It can be used successfully for process duplication on other machines.

The next element is irqbalance, a distributed hardware interrupt system that is available across multiple processors and multiprocessor systems. It is, in fact, a daemon used to balance interrupts across multiple CPUs, and its purpose is to offer better performance as well as better IO operation balance on SMP systems. It has alternatives, such as smp_affinity, which could achieve maximum performance in theory but lacks the flexibility that irqbalance provides.

The libvirt toolkit, licensed under the GNU Lesser General Public License, can be used to interact with the virtualization capabilities available in recent Linux kernel versions.
It offers support for a large number of packages, as follows:
KVM/QEMU Linux hypervisor
Xen hypervisor
LXC Linux container system
OpenVZ Linux container system
User Mode Linux, a paravirtualized kernel
Hypervisors that include VirtualBox, VMware ESX, GSX, Workstation and Player, IBM PowerVM, Microsoft Hyper-V, Parallels, and Bhyve

Besides these packages, it also offers support for storage on a large variety of devices and backends, such as IDE, SCSI, or USB disks, FibreChannel, LVM, and iSCSI or NFS, as well as support for virtual networks. It is the building block for other higher-level applications and tools that focus on the virtualization of a node, and it does this in a secure way. It also offers the possibility of a remote connection. For more information about libvirt, take a look at its project goals and terminology at http://libvirt.org/goals.html.

The next is Open vSwitch, a production-quality implementation of a multilayer virtual switch. This software component is licensed under Apache 2.0 and is designed to enable massive network automation through various programmatic extensions. The Open vSwitch package, also abbreviated as OVS, provides a switching stack for hardware virtualization environments and also supports a large number of the standards and protocols available in computer networks, such as sFlow, NetFlow, SPAN, CLI, RSPAN, 802.1ag, LACP, and so on.

Xen is a hypervisor with a microkernel design that allows multiple operating systems to be executed on the same hardware. It was first developed at Cambridge University in 2003 and is released under the GNU General Public License version 2. This piece of software runs in a more privileged state and is available for the ARM, IA-32, and x86-64 instruction sets. A hypervisor is a piece of software that is concerned with the CPU scheduling and memory management of various domains.
It does this from domain 0 (dom0), which controls all the other unprivileged domains, called domU. Xen boots from a bootloader and usually loads into the dom0 host domain, a paravirtualized operating system. A brief look at the Xen project architecture is available here:

Linux Containers (LXC) is the next element available in the meta-virtualization layer. It is a well-known set of tools and libraries that offer virtualization at the operating system level by providing isolated containers on a Linux control host machine. It combines the functionality of kernel control groups (cgroups) with support for isolated namespaces to provide an isolated environment. It has received a fair amount of attention, mostly due to Docker, which will be briefly mentioned a bit later. Also, it is considered a lightweight alternative to full machine virtualization.

Both of these options, containers and machine virtualization, have a fair number of advantages and disadvantages. Containers offer low overhead by sharing certain components, but their isolation may turn out to be weaker. Machine virtualization is exactly the opposite: it offers strong isolation at the cost of a bigger overhead. These two solutions could also be seen as complementary, but this is only my personal view of the two. In reality, each of them has its particular set of advantages and disadvantages, which can sometimes conflict as well. More information about Linux containers is available at https://linuxcontainers.org/.

The last component of the meta-virtualization layer that will be discussed is Docker, an open source piece of software that tries to automate the deployment of applications inside Linux containers. It does this by offering an abstraction layer over LXC. Its architecture is better described in this image:

As you can see in the preceding diagram, this software package is able to use the resources of the operating system.
Here, I am referring to its ability to use the functionality of the Linux kernel and to isolate applications from the rest of the operating system. It can do this either through LXC or through alternatives such as libvirt and systemd-nspawn, which are seen as indirect implementations. It can also do this directly through the libcontainer library, which has been around since version 0.9 of Docker.

Docker is a great component if you want automation for distributed systems, such as large-scale web deployments, service-oriented architectures, continuous deployment systems, database clusters, private PaaS, and so on. More information about its use cases is available at https://www.docker.com/resources/usecases/. Make sure you take a look at this website; interesting information is often posted there.

After finishing with the meta-virtualization layer, I will move on to the meta-cloud-services layer, which contains various elements. I will start with the Simple Protocol for Independent Computing Environments (SPICE). This can be described as a remote-display system for virtualized desktop devices. It initially started as closed source software, and two years later it was decided to make it open source. It then became an open standard for interaction with devices, regardless of whether they are virtualized or not. It is built on a client-server architecture, making it able to deal with both physical and virtualized devices. The interaction between backend and frontend is realized through VD-Interfaces (VDI), and as shown in the following diagram, its current focus is remote access to QEMU/KVM virtual machines:

Next on the list is oVirt, a virtualization platform that offers a web interface. It is easy to use and helps in the management of virtual machines, virtualized networks, and storage. Its architecture consists of an oVirt Engine and multiple nodes. The engine is the component that comes equipped with a user-friendly interface to manage logical and physical resources.
The virtual machines run on the nodes, which can be oVirt nodes, Fedora hosts, or CentOS hosts. The only downside of using oVirt is that it only offers support for a limited number of hosts, as follows:
Fedora 20
CentOS 6.6, 7.0
Red Hat Enterprise Linux 6.6, 7.0
Scientific Linux 6.6, 7.0

As a tool, it is really powerful. It offers integration with libvirt for Virtual Desktops and Servers Manager (VDSM) communication with virtual machines, and it also supports the SPICE communication protocol, which enables remote desktop sharing. It is a solution that was started and is mainly maintained by Red Hat. It is the base element of their Red Hat Enterprise Virtualization (RHEV) product. One interesting thing to watch is that Red Hat is now not only a supporter of projects such as oVirt and Aeolus but has also been a platinum member of the OpenStack Foundation since 2012. For more information on projects such as oVirt, Aeolus, and RHEV, the following links can be useful to you: http://www.redhat.com/promo/rhev3/?sc_cid=70160000000Ty5wAAC&offer_id=70160000000Ty5NAAS, http://www.aeolusproject.org/ and http://www.ovirt.org/Home.

I will move on to a different component now. Here, I am referring to the open source implementation of the Lightweight Directory Access Protocol, simply called openLDAP. Its license is somewhat controversial: the OpenLDAP Public License is similar in essence to the BSD license, but it is not recorded at opensource.org, so it is not certified by the Open Source Initiative (OSI).
This software component comes as a suite of elements, as follows:
A standalone LDAP daemon called slapd, which plays the role of a server
A number of libraries that implement the LDAP protocol
Last but not least, a series of tools and utilities, with a couple of sample clients among them

There are also a number of additions that should be mentioned, such as ldapc++, libraries written in C++; JLDAP, libraries written in Java; LMDB, a memory-mapped database library; Fortress, a role-based identity management SDK, also written in Java; and a JDBC-LDAP Bridge driver that is written in Java and called JDBC-LDAP.

Cyrus-SASL is a generic client-server library implementation for Simple Authentication and Security Layer (SASL) authentication. SASL is a method used for adding authentication support to connection-based protocols. A connection-based protocol adds a command that identifies and authenticates a user to the requested server, and if it is negotiated, an additional security layer is added between the protocol and the connection for security purposes. More information about SASL is available in RFC 2222, at http://www.ietf.org/rfc/rfc2222.txt. For a more detailed description of Cyrus SASL, refer to http://www.sendmail.org/~ca/email/cyrus/sysadmin.html.

Qpid is a messaging tool developed by Apache, which understands the Advanced Message Queueing Protocol (AMQP) and has support for various languages and platforms. AMQP is an open protocol designed for high-performance messaging over a network in a reliable fashion. More information about AMQP is available at http://www.amqp.org/specification/1.0/amqp-org-download. Here, you can find more information about the protocol specifications as well as about the project in general.
The Qpid projects push the development of the AMQP ecosystem by offering message brokers and APIs that can be used in any developer application that intends to use AMQP messaging as part of its product. To do this, the following is done:
Keeping the source code open.
Making AMQP available for a large variety of computing environments and programming languages.
Offering the necessary tools to simplify the development process of an application.
Creating a messaging infrastructure to make sure that other services can integrate well with the AMQP network.
Creating a messaging product that makes integration with AMQP trivial for any programming language or computing environment. Make sure that you take a look at Qpid Proton at http://qpid.apache.org/proton/overview.html for this.

More information about the preceding functionality can be found at http://qpid.apache.org/components/index.html#messaging-apis.

RabbitMQ is another message broker software component that implements AMQP and is also available as open source. It has a number of components, as follows:
The RabbitMQ exchange server
Gateways for HTTP, Streaming Text Oriented Message Protocol (STOMP), and Message Queue Telemetry Transport (MQTT)
AMQP client libraries for a variety of programming languages, most notably Java, Erlang, and .NET Framework
A plugin platform for a number of custom components, which also offers a collection of predefined ones:
Shovel: It is a plugin that executes the copy/move operation for messages between brokers
Management: It enables the control and monitoring of brokers and clusters of brokers
Federation: It enables sharing of messages between brokers at the exchange level

You can find out more about RabbitMQ by referring to the RabbitMQ documentation at http://www.rabbitmq.com/documentation.html. Comparing the two, Qpid and RabbitMQ, it can be concluded that RabbitMQ is better and also that it has fantastic documentation.
This makes it the first choice for the OpenStack Foundation. For readers interested in benchmarking information covering more than these two frameworks, it is available at http://blog.x-aeon.com/2013/04/10/a-quick-message-queue-benchmark-activemq-rabbitmq-hornetq-qpid-apollo/. One such result is also available in this image for comparison purposes:

The next element is Puppet, an open source configuration management system that allows an IT infrastructure to have certain states defined and then enforces these states. By doing this, it offers a great automation system for system administrators. This project is developed by Puppet Labs and was released under the GNU General Public License until version 2.7.0. After this, it moved to the Apache License 2.0 and is now available in two flavors:
The open source Puppet version: It is mostly similar to the preceding tool and offers configuration management that permits the definition and automation of states. It is available for Linux and UNIX as well as Mac OS X and Windows.
The Puppet Enterprise edition: It is a commercial version that goes beyond the capabilities of open source Puppet and permits the automation of the configuration and management process.

It is a tool that defines a declarative language that is later used for system configuration. A configuration can be applied directly on the system or compiled as a catalogue and deployed on a target using a client-server paradigm, usually over a REST API. Another component is an agent that enforces the resources available in the manifest. The resource abstraction is, of course, done through an abstraction layer that defines the configuration in higher-level terms that are very different from operating system-specific commands.

If you visit http://docs.puppetlabs.com/, you will find more documentation related to Puppet and other Puppet Labs tools.
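The idea of defining a state and then enforcing it can be illustrated without Puppet itself. The following is a minimal sketch in POSIX shell, not Puppet's declarative language: the ensure_line function is a hypothetical helper made up for this example. It declares that a file must contain a given line and changes the file only when that state is not already met, so running it twice is harmless.

```shell
#!/bin/sh
# Hypothetical helper, only an analogy for "define a state, then enforce it".

# ensure_line FILE LINE
# Desired state: FILE contains LINE as a complete line.
# Prints "changed" if the line had to be added, "unchanged" otherwise.
ensure_line() (
    file=$1
    line=$2
    touch "$file"
    if grep -qxF -- "$line" "$file"; then
        echo "unchanged"
    else
        printf '%s\n' "$line" >> "$file"
        echo "changed"
    fi
)

# Enforcing the same state twice only modifies the file once.
demo=$(mktemp)
ensure_line "$demo" "PermitRootLogin no"   # prints "changed"
ensure_line "$demo" "PermitRootLogin no"   # prints "unchanged"
rm -f "$demo"
```

Puppet generalizes this pattern: resources are described in higher-level terms and the agent works out which operating system-specific commands, if any, are needed to reach the declared state.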
With all this in place, I believe it is time to present the main component of the meta-cloud-services layer, called OpenStack. It is a cloud operating system that controls a large number of components, which together offer pools of compute, storage, and networking resources. All of them are managed through a dashboard that is, of course, offered by another component and gives administrators control. It also offers users the possibility of provisioning resources from the same web interface. Here is an image depicting the Open Source Cloud operating System, which is actually OpenStack:

It is primarily used as an IaaS solution, its components are maintained by the OpenStack Foundation, and it is available under the Apache License version 2. In the Foundation, today, there are more than 200 companies that contribute to the source code and to the general development and maintenance of the software. At the heart of it all are its components. Each component also has a Python module used for simple interaction and automation possibilities:

Compute (Nova): It is used for the hosting and management of cloud computing systems. It manages the life cycles of the compute instances of an environment. It is responsible for the spawning, decommissioning, and scheduling of various virtual machines on demand. With regard to hypervisors, KVM is the preferred option, but other options such as Xen and VMware are also viable.
Object Storage (Swift): It is used for the storage and retrieval of unstructured data via a RESTful HTTP API. It is a scalable and fault-tolerant system that permits data replication, with objects and files available on multiple disk drives. It is developed mainly by an object storage software company called SwiftStack.
Block Storage (Cinder): It provides persistent block storage for OpenStack instances. It manages the creation, attach, and detach actions for block devices.
In a cloud, users manage their own devices, so a vast majority of storage platforms and scenarios should be supported. For this purpose, it offers a pluggable architecture that facilitates the process.
Networking (Neutron): It is the component responsible for network-related services, also known as Network Connectivity as a Service. It provides an API for network management and also makes sure that certain limitations are avoided. It also has an architecture based on pluggable modules to ensure that as many networking vendors and technologies as possible are supported.
Dashboard (Horizon): It provides web-based graphical interfaces for administrators and users to interact with the resources made available by all the other components. It is also designed with extensibility in mind because it is able to interact with other components responsible for monitoring and billing as well as with additional management tools. It also offers the possibility of rebranding according to the needs of commercial vendors.
Identity Service (Keystone): It is an authentication and authorization service. It offers support for multiple forms of authentication and also for existing backend directory services such as LDAP. It provides a catalogue of users and the resources they can access.
Image Service (Glance): It is used for the discovery, storage, registration, and retrieval of images of virtual machines. A number of already stored images can be used as templates. OpenStack also provides an operating system image for testing purposes. Glance is the only module capable of adding, deleting, duplicating, and sharing OpenStack images between various servers and virtual machines. All the other modules interact with the images using the available APIs of Glance.
Telemetry (Ceilometer): It is a module that provides billing, benchmarking, and statistical results across all current and future components of OpenStack with the help of numerous counters that permit extensibility.
This makes it a very scalable module.
Orchestrator (Heat): It is a service that manages multiple composite cloud applications with the help of various template formats, such as Heat Orchestration Templates (HOT) or AWS CloudFormation. The communication is done both over a CloudFormation-compatible Query API and over an OpenStack REST API.
Database (Trove): It provides Cloud Database as a Service functionality that is both reliable and scalable. It uses relational and nonrelational database engines.
Bare Metal Provisioning (Ironic): It is a component that provisions bare-metal machines instead of virtual machines. It started as a fork of the Nova Baremetal driver and grew to become the best solution for bare-metal provisioning. It also offers a set of plugins for interaction with various bare-metal hypervisors. It is used by default with PXE and IPMI, but of course, with the help of the available plugins, it can offer extended support for various vendor-specific functionality.
Multiple Tenant Cloud Messaging (Zaqar): It is, as the name suggests, a multitenant cloud messaging service for web developers who are interested in Software as a Service (SaaS). They can use it to send messages between various components by using a number of communication patterns. However, it can also be used with other components for surfacing events to end users as well as for communication in the over-cloud layer. Its former name was Marconi, and it also provides the possibility of scalable and secure messaging.
Elastic Map Reduce (Sahara): It is a module that tries to automate the provisioning of Hadoop clusters. It only requires the definition of various fields, such as Hadoop versions, various topology nodes, hardware details, and so on. After this, in a few minutes, a Hadoop cluster is deployed and ready for interaction. It also offers the possibility of various configurations after deployment.
Having mentioned all this, a conceptual architecture is presented in the following image to show you how the preceding components interact with each other. To automate the deployment of such an environment in production, automation tools, such as the previously mentioned Puppet tool, can be used. Take a look at this diagram:

[Image: OpenStack conceptual architecture diagram]

Now, let's move on and see how such a system can be deployed using the functionalities of the Yocto Project. To start this activity, all the required metadata layers should be put together. Besides the already available Poky repository, other ones are also required; they are defined in the layer index on OpenEmbedded's website because, this time, the README file is incomplete:

git clone -b dizzy git://git.openembedded.org/meta-openembedded
git clone -b dizzy git://git.yoctoproject.org/meta-virtualization
git clone -b icehouse git://git.yoctoproject.org/meta-cloud-services
source oe-init-build-env ../build-controller

After the controller build directory is created, it needs to be configured. Inside the conf/local.conf file, add the corresponding machine configuration, such as qemux86-64, and inside the conf/bblayers.conf file, define the BBLAYERS variable accordingly. There are extra metadata layers besides the ones that are already available. The ones that should be defined in this variable are:

meta-cloud-services
meta-cloud-services/meta-openstack-controller-deploy
meta-cloud-services/meta-openstack
meta-cloud-services/meta-openstack-qemu
meta-openembedded/meta-oe
meta-openembedded/meta-networking
meta-openembedded/meta-python
meta-openembedded/meta-filesystems
meta-openembedded/meta-webserver
meta-openembedded/meta-ruby

After the configuration is done, the controller image is built using the bitbake openstack-image-controller command.
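The configuration described above can be sketched as a small shell helper. This is only an illustration under stated assumptions, not part of the meta-cloud-services layer: the function name write_controller_conf and the layer root argument (the directory where the three repositories were cloned) are hypothetical, and the layer list is simply the one given in the text. Note that the filesystem layer directory in the meta-openembedded repository is named meta-filesystems.

```shell
#!/bin/sh
# Hypothetical helper: append the controller layers and the machine
# selection to an existing build directory.
# Usage: write_controller_conf BUILD_DIR LAYER_ROOT
#   BUILD_DIR  - directory created by oe-init-build-env (must contain conf/)
#   LAYER_ROOT - assumed location of the cloned layer repositories
write_controller_conf() {
    build_dir=$1
    layer_root=$2

    # Refuse to write into something that is not a build directory.
    if [ ! -d "$build_dir/conf" ]; then
        echo "error: $build_dir/conf does not exist" >&2
        return 1
    fi

    # One "BBLAYERS +=" statement per layer; BitBake appends each entry,
    # separated by a space, to the layers already listed in the file.
    for layer in \
        meta-cloud-services \
        meta-cloud-services/meta-openstack-controller-deploy \
        meta-cloud-services/meta-openstack \
        meta-cloud-services/meta-openstack-qemu \
        meta-openembedded/meta-oe \
        meta-openembedded/meta-networking \
        meta-openembedded/meta-python \
        meta-openembedded/meta-filesystems \
        meta-openembedded/meta-webserver \
        meta-openembedded/meta-ruby
    do
        printf 'BBLAYERS += "%s/%s"\n' "$layer_root" "$layer" \
            >> "$build_dir/conf/bblayers.conf"
    done

    # A plain assignment placed after the default "MACHINE ??=" line
    # in local.conf takes precedence over it.
    printf 'MACHINE = "qemux86-64"\n' >> "$build_dir/conf/local.conf"
}
```

Since oe-init-build-env leaves the shell inside the new build directory, the helper could then be called as write_controller_conf "$PWD" "$HOME/yocto", where $HOME/yocto is an assumed clone location. Editing the two files by hand gives the same result.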
The controller can be started using the following command:

runqemu qemux86-64 openstack-image-controller kvm nographic qemuparams="-m 4096"

After finishing this activity, the deployment of the compute node can be started in this way:

source oe-init-build-env ../build-compute

With the new build directory created, and since most of the work of the build process has already been done for the controller, build directories such as downloads and sstate-cache can be shared between the two builds. This should be indicated through the DL_DIR and SSTATE_DIR variables. The difference between the two conf/bblayers.conf files is that the second one, for the build-compute build directory, replaces meta-cloud-services/meta-openstack-controller-deploy with meta-cloud-services/meta-openstack-compute-deploy.

This time, the build is done with bitbake openstack-image-compute and should finish faster. Having completed the build, the compute node can also be booted using the following command:

runqemu qemux86-64 openstack-image-compute kvm nographic qemuparams="-m 4096 -smp 4"

The next step is to load the Cirros image into OpenStack as follows:

wget download.cirros-cloud.net/0.3.2/cirros-0.3.2-x86_64-disk.img
scp cirros-0.3.2-x86_64-disk.img root@<compute_ip_address>:~
ssh root@<compute_ip_address>
. /etc/nova/openrc
glance image-create --name "TestImage" --is-public true --container-format bare --disk-format qcow2 --file /home/root/cirros-0.3.2-x86_64-disk.img

Having done all of this, the user is free to access the Horizon web interface at http://<compute_ip_address>:8080/. The username is admin and the password is password. Here, you can play and create new instances, interact with them, and, in general, do whatever crosses your mind. Do not worry if you've done something wrong to an instance; you can delete it and start again.

The last element from the meta-cloud-services layer is the Tempest integration test suite for OpenStack.
It consists of a set of tests that are executed against the OpenStack trunk to make sure everything is working as it should. It is very useful for any OpenStack deployment. More information about Tempest is available at https://github.com/openstack/tempest.

Summary

In this article, you were presented not only with information about a number of virtualization concepts, such as NFV, SDN, VNF, and so on, but also with a number of open source components that contribute to everyday virtualization solutions. I offered you examples and even a small exercise to make sure that the information remains with you even after reading this book. I hope I made some of you curious about certain things. I also hope that some of you researched projects that were not presented here, such as the OpenDaylight (ODL) initiative, which has only been mentioned in an image as an implementation suggestion. If this is the case, I can say I fulfilled my goal.

Resources for Article:

Further resources on this subject:
Veil-Evasion [article]
Baking Bits with Yocto Project [article]
An Introduction to the Terminal [article]