
Creating Our First App with Ionic

Packt, 16 Feb 2016
There are many options for developing mobile applications today. Native applications require a unique implementation for each platform, such as iOS, Android, and Windows Phone. A native approach is still required for some use cases, such as high-performance CPU and GPU processing with heavy memory consumption. However, any application that does not need over-the-top graphics or intensive CPU processing can benefit greatly from a cost-effective, write-once-run-everywhere HTML5 mobile implementation.

In this article, we will cover:

- Setting up a development environment
- Creating a HelloWorld app via the CLI
- Creating a HelloWorld app via Ionic Creator
- Copying examples from Ionic Codepen Demos
- Viewing the app using your web browser
- Viewing the app using the iOS Simulator
- Viewing the app using Xcode for iOS
- Viewing the app using Genymotion for Android
- Viewing the app using Ionic View
- Customizing the app folder structure

For those who choose the HTML5 route, there are many great choices in this active market. Some options are very easy to start with but hard to scale, or they run into performance problems. Commercial options are generally too expensive for small developers who are still searching for product and market fit. It's a best practice to think of the users first: there are instances where a simple responsive website is a better choice, for example, when the business has mainly fixed content with minimal updates required, or when the content is better off on the web for SEO purposes.

Ionic has several advantages over its competitors:

- It's written on top of AngularJS.
- UI performance is strong because of its use of the requestAnimationFrame() technique.
- It offers a beautiful and comprehensive set of default styles, similar to a mobile-focused Twitter Bootstrap.
- Sass is available for quick, easy, and effective theme customization.

You will go through several HelloWorld examples to bootstrap your Ionic app. This process will give you a quick skeleton from which to start building more comprehensive apps. The majority of apps have similar user experience flows, such as tabs and a side menu.

Setting up a development environment

Before you create the first app, your environment must have the required components ready. Those components ensure a smooth process of development, build, and test. The default Ionic project folder is based on Cordova's, so you will need the Ionic CLI to automatically add the correct platform (that is, iOS, Android, or Windows Phone) and build the project. This ensures that all Cordova plugins are included properly. The tool has many options to run your app in the browser or in a simulator with live reload.

Getting ready

You need to install Ionic and its dependencies to get started. Ionic itself is just a collection of CSS styles and AngularJS directives and services. It also has a command-line tool to help manage all of the related technologies, such as Cordova and Bower. The installation process will give you a command line to generate initial code and build the app.

Ionic uses npm as the installer, which is included when installing Node.js. Please install the latest version of Node.js from http://nodejs.org/download/. You will need Cordova, ios-sim (the iOS Simulator), and Ionic:

$ npm install -g cordova ionic ios-sim

This single command line installs all three components instead of issuing three commands separately. The -g parameter installs the packages globally (not just in the current directory).
For Linux and Mac, you may need to use the sudo command to allow system access:

$ sudo npm install -g cordova ionic ios-sim

There are a few common options for an integrated development environment:

- Xcode for iOS
- Eclipse or Android Studio for Android
- Microsoft Visual Studio Express or Visual Studio for Windows Phone
- Sublime Text (http://www.sublimetext.com/) for web development

All of these have a free license. Sublime Text is free for non-commercial use only, so you have to purchase a license if you are a commercial developer. Most frontend developers prefer Sublime Text for coding HTML and JavaScript because it's very lightweight and comes with a well-supported developer community. You could code directly in Xcode, Eclipse, or Visual Studio Express, but those are somewhat heavy duty for web apps, especially when you have a lot of windows open and just need something simple to code in.

How to do it…

If you decide to use Sublime Text, you will need Package Control (https://packagecontrol.io/installation), which is similar to a plugin manager. Since Ionic uses Sass, you can optionally install the Sass syntax highlighting package:

1. Select Sublime Text | Preferences | Package Control.
2. Select Package Control: Install Package. You can also type the command partially (that is, inst) and it will automatically select the right option.
3. Type Sass and the search results will show one option for TextMate & Sublime Text. Select that item to install.

See also

There are tons of packages that you may want to use, such as Haml, JSHint, JSLint, Tag, ColorPicker, and so on. You can browse this website for more information: https://sublime.wbond.net/browse/popular.

Creating a HelloWorld app via the CLI

It's quickest to start your app using existing templates. Ionic gives you three standard templates out of the box via the command line:

- Blank: This template has a simple single page with minimal JavaScript code.
- Tabs: This template has multiple pages with routes. A route URL maps to a tab.
- Sidemenu: This template has a left and/or right menu and a center content area.

There are two additional templates, maps and salesforce, but these are very specific to apps using Google Maps or integrating with the Salesforce.com API.

How to do it…

To set up the app with a blank template from Ionic, use this command:

$ ionic start HelloWorld_Blank blank

If you don't have an account on http://ionic.io/, the command line will ask for one. You can press either y or n to continue; an account is not required at this step.

If you replace blank with tabs, it will create a tab template:

$ ionic start HelloWorld_Tabs tabs

Similarly, this command will create an app with a side menu:

$ ionic start HelloWorld_Sidemenu sidemenu

The sidemenu template is the most common template, as it provides a very nice routing example with different pages in the templates folder under /www.

Additional guidance for the Ionic CLI is available on the GitHub page: https://github.com/driftyco/ionic-cli

How it works…

This article will show you how to quickly start your codebase and visually see the result. However, the following are the core concepts:

- Controller: Manages variables and models in the scope and triggers others, such as services or states.
- Directive: Where you manipulate the DOM, since the directive is bound to a DOM object.
- Service: An abstraction to manage models or collections of complex logic beyond the required get/set operations.
- Filter: Mainly used to process an expression in the template and return data (for example, rounding a number or adding a currency symbol) using the format {{ expression | filter }}. For example, {{ amount | currency }} will return $100 if the amount variable is 100.

You will spend most of your time in the /www folder of the generated project, because that's where your application logic and views will be placed.

By default, from the Ionic template, the AngularJS module is named starter. You will see something like this in app.js, which is the bootstrap file for the entire app:

angular.module('starter', ['ionic', 'ngCordova', 'starter.controllers', 'starter.services', 'starter.directives', 'starter.filters'])

This declares starter as the module referenced by ng-app="starter" in index.html. The ionic and ngCordova modules are always present; the other required modules are listed in the array of strings [...] as well and can be defined in separate files.

Note that if you double-click the index.html file to open it in the browser, it will show a blank page. This doesn't mean the app isn't working. The AngularJS component of Ionic dynamically loads all the .js files, and this behavior requires server access via the http protocol (http://). If you open a file locally, the browser treats it with the file protocol (file://), so AngularJS will not be able to load the additional .js modules and run the app properly. Several methods of running the app will be discussed later.

Creating a HelloWorld app via Ionic Creator

Another way to start your app codebase is to use Ionic Creator. This is a great interface builder that accelerates your app development with a drag-and-drop style. You can quickly take existing components and position them to visualize how they should look in the app via a web-based interface. Most common components, such as buttons, images, and checkboxes, are available. Ionic Creator allows the user to export everything as a project with all .html, .css, and .js files. You should be able to edit the content in the /www folder to build on top of the interface.

Getting ready

Ionic Creator requires registration for a free account at https://creator.ionic.io/ to get started.

How to do it…

1. Create a new project called myApp.
2. The center area is your app interface. The left side gives you a list of pages; each page is a single route. You also have access to a number of UI components that you would normally have to code by hand in an HTML file. The right panel shows the properties of any selected component.
3. You're free to do whatever you need to do here by dropping components into the center screen. If you need to create a new page, click the plus sign in the Pages panel. Each page is represented as a link, which is basically a route in AngularJS UI Router's definition. To navigate to another page (for example, after clicking a button), just change the Link property and point it to that page.
4. There is an Edit button on top where you can toggle back and forth between Edit Mode and Preview Mode. It's very useful for seeing how your app will look and behave.
5. Once completed, click on the Export button in the top navigation. You have three options:
   - Use the Ionic CLI tool to get the code
   - Download the project as a zip file
   - Review the raw HTML

The best way to learn Ionic Creator is to play with it. You can add a new page and pick out any existing templates.
This example shows a Login page template and how it looks out of the box.

There's more...

To switch to Preview Mode, where you can see the UI in a device simulator, click the switch button on the top right to enable Test. In this mode, you should be able to interact with the components in the web browser as if the app were actually deployed on a device. If you break something, it's very simple to start a new project.

Ionic Creator is a great tool for prototyping and for getting an initial template or project scaffolding. You should continue to code in your regular IDE for the rest of the app. Ionic Creator doesn't do everything for you yet. For example, if you want to access specific Cordova plugin features, you have to write that code separately. Likewise, if you want to tweak the interface beyond what is allowed within Ionic Creator, that will also require specific modifications to the .html and .css files.

Copying examples from Ionic Codepen Demos

Sometimes it's easier to just grab snippets of code from an example library. Ionic Codepen Demos (http://codepen.io/ionic/public-list/) is a great website to visit. Codepen.io is a playground (or sandbox) for demonstrating and learning web development. There are other alternatives, such as http://plnkr.com or http://jsfiddle.com; which one to choose is just a developer's personal preference. However, all of Ionic's demos are already available on Codepen, where you can experiment and clone them to your own account. http://plnkr.com has an existing AngularJS boilerplate and can be used to practice specific AngularJS areas, because you can copy the link of the sample code and post it on http://stackoverflow.com/ if you have questions.

How to do it…

There are several tags of interest to browse through if you want specific UI component examples. You don't need a Codepen account just to view them; however, if you need to save a custom pen and share it with others, free registration is required. The Ionic Codepen Demos site has more collections of demos compared to the CLI. Some are based on a nightly build of the platform, so they could be unstable to use.

There's more...

You can find the same side menu example on this site:

1. Navigate to http://codepen.io/ionic/public-list/ from your browser.
2. Select Tag: menus and then click on Side Menu and Navigation: Nightly.
3. Change the layout to fit a proper mobile screen by clicking on the first icon of the layout icons row on the bottom right of the screen.

Viewing the app using your web browser

In order to "run" the web app, you need to turn your /www folder into a web server. Again, there are many methods to do this, and people tend to stick with one or two ways to keep things simple. A few options are unreliable, such as Sublime Text's live watch package or static page generators (for example, Jekyll, Middleman, and so on); they are slow to detect changes and may freeze your IDE, so they won't be covered here.

Getting ready

The recommended method is to use the ionic serve command line. It launches an HTTP server so you can open your app in a desktop browser.

How to do it…

First you need to be in the project folder. Let's assume it is the Side Menu HelloWorld:

$ cd HelloWorld_Sidemenu

From there, just issue the simple command line:

$ ionic serve

That's it! There is no need to go into the /www folder or figure out which port to use. While the web server is running, the command line offers a few interactive options; the most common are r to restart and q to quit when you are done.
There is an additional step to view the app at the correct device resolution:

1. Install Google Chrome if it's not already on your computer.
2. Open the link from ionic serve (for example, http://localhost:8100/#/app/playlists) in Google Chrome.
3. Turn on the Developer Tools. For example, in Google Chrome on a Mac, select View | Developer | Developer Tools.
4. Click on the small mobile icon in the Chrome Developer Tools area.
5. There will be a long list of devices to pick from.
6. After selecting a device, refresh the page to ensure the UI is updated. Chrome should give you the exact view resolution of the device.

Most developers prefer to code using this method because you can debug the app with the Chrome Developer Tools. It works exactly like any web application: you can create breakpoints or output variables to the console.

How it works...

Note that ionic serve actually watches everything under the /www folder except the JavaScript modules in the /lib folder. This makes sense because there is no need for the system to scan through every single file when the probability of it changing is very small. People don't code directly in the /lib folder; it is only updated when there is a new version of Ionic.

However, there is some flexibility to change this. You can specify a watchPatterns property in the ionic.project file located in your project root to watch (or not watch) for specific changes:

{
  "name": "myApp",
  "app_id": "",
  "watchPatterns": [
    "www/**/*",
    "!www/css/**/*",
    "your_folder_here/**/*"
  ]
}

While the web server is running, you can go back to the IDE and continue coding. For example, let's open the playlists.html file under /www/templates and change the first line to this:

<ion-view view-title="Updated Playlists">

Go back to the web browser where Ionic opened the new page; the app interface will change the title bar right away, without requiring you to refresh the browser. This is a very nice feature when there is a lot of back and forth between changing code and checking how it works or looks in the app.

Viewing the app using the iOS Simulator

So far you have been testing the web-app portion of Ionic. In order to view the app in the simulator, follow the next steps.

How to do it...

1. Add the specific platform using:

$ ionic platform add ios

Note that you need to do the platform add before building the app.

2. The last step is to emulate the app:

$ ionic emulate ios

Viewing the app using Xcode for iOS

Depending on personal preference, you may find it more convenient to deploy the app using ionic run ios --device on a regular basis. This command line pushes the app to your physical device connected via USB without ever opening Xcode. However, you can run the app using Xcode (on a Mac), too.

How to do it...

1. Go to the /platforms/ios folder.
2. Look for the folder with the .xcodeproj extension and open it in Xcode.
3. Click on the iOS Device icon and select your choice of iOS Simulator.
4. Click on the Run button and you should see the app running in the simulator.

There's more...

You can connect a physical device via a USB port and it will show up in the iOS Device list for you to pick. You can then deploy the app directly on your device. Note that an iOS Developer Membership is required for this. This method is more complex than just viewing the app via a web browser; however, it's a must when you want to test code related to device features such as the camera or maps.
If you change code in the /www folder and want to run it again in Xcode, you have to run ionic build ios first, because the running code lives in the Staging folder of your Xcode project.

For debugging, the Xcode console can output JavaScript logs as well. However, you can use the more advanced features of Safari's Web Inspector (which is similar to Google Chrome's Developer Tools) to debug your app. Note that only Safari can debug a web app running on a connected physical iOS device, because Chrome does not support this on a Mac. It's simple to enable this capability:

1. Allow remote debugging for the iOS device by going to Settings | Safari | Advanced and enabling Web Inspector.
2. Connect the physical iOS device to your Mac via USB and run the app.
3. Open the Safari browser.
4. Select Develop, click on your device's name (or iOS Simulator), and click on index.html. Note: if you don't see the Develop menu in Safari, navigate to Preferences | Advanced and check Show Develop menu in menu bar.
5. Safari will open a new console just for that specific device, as if the app were running within the computer's Safari.

Viewing the app using Genymotion for Android

Although it's possible to install the Google Android emulator, many developers have inconsistent experiences with it on a Mac. There are many commercial and free alternatives that offer more convenience and a wide range of device support. Genymotion provides some unique advantages, such as letting users switch the Android model and version, supporting networking from within the app, and allowing SD card simulation. You will first learn how to set up an Android developer environment (on a Mac in this case), and then install and configure Genymotion for mobile app development.

How to do it...

The first step is to set up the Android environment properly for development:

1. Download and install Android Studio from https://developer.android.com/sdk/index.html.
2. Run Android Studio. You need to install all required packages, such as the Android SDK. Just click Next twice at the Setup Wizard screen and select the Finish button to start the package installation.
3. After installation is complete, you need to install additional packages and other SDK versions. At the Quick Start screen, select Configure, then select SDK Manager.
4. It's good practice to install previous versions such as Android 5.0.1 and 5.1.1. You may also want to install all Tools and Extras for later use.
5. Select the Install packages... button.
6. Check the box to accept the license and click on Install.
7. The SDK Manager shows the SDK Path at the top. Make a copy of this path because you need to modify the environment path.
8. Go to Terminal and type:

$ touch ~/.bash_profile; open ~/.bash_profile

9. This opens a text editor to edit your bash profile file. Insert the following lines, where /YOUR_PATH_TO/android-sdk is the SDK Path that you copied earlier:

export ANDROID_HOME=/YOUR_PATH_TO/android-sdk
export PATH=$ANDROID_HOME/platform-tools:$PATH
export PATH=$ANDROID_HOME/tools:$PATH

10. Save and close the text editor.
11. Go back to Terminal and type:

$ source ~/.bash_profile
$ echo $ANDROID_HOME

You should see your SDK Path as the output. This verifies that you have correctly configured the Android developer environment.

The second step is to install and configure Genymotion:

1. Download and install Genymotion and Genymotion Shell from http://Genymotion.com.
2. Run Genymotion.
3. Select the Add button to start adding a new Android device.
4. Select a device you want to simulate.
In this case, let's select a Samsung Galaxy S5.

5. You will see the device being added to "Your virtual devices". Click on that device, then click on Start. The simulator will take a few seconds to start and will show another window. This is just a blank simulator without your app running inside it yet.
6. Run Genymotion Shell.
7. From Genymotion Shell, you need to get the device list and note the IP address of the attached device, which is the Samsung Galaxy S5. Type devices list.
8. Type adb connect 192.168.56.101 (or whatever IP address you saw earlier in the devices list output).
9. Type adb devices to confirm that it is connected.
10. Type ionic platform add android to add Android as a platform for your app.
11. Finally, type ionic run android.

You should now see the Genymotion window showing your app. Although there are many steps to get this working, it's unlikely that you will have to go through the same process again. Once your environment is set up, all you need to do is leave Genymotion running while writing code. If you need to test the app on different Android devices, it's simple to add another virtual device in Genymotion and connect to it.

Summary

In this article, we learned how to create your first Ionic app. We also covered various ways to view the app on different platforms: the web browser, the iOS Simulator, Xcode, and Genymotion.

You can also refer to the following books on similar topics:

- Learning Ionic: https://www.packtpub.com/application-development/learning-ionic
- Getting Started with Ionic: https://www.packtpub.com/application-development/getting-started-ionic
- Ionic Framework By Example: https://www.packtpub.com/application-development/ionic-framework-example

Further resources on this subject:

- Directives and Services of Ionic
- First Look at Ionic
- Ionic JS Components


Recommending Movies at Scale (Python)

Packt, 15 Feb 2016
In this article, we will cover the following recipes:

- Modeling preference expressions
- Understanding the data
- Ingesting the movie review data
- Finding the highest-scoring movies
- Improving the movie-rating system
- Measuring the distance between users in the preference space
- Computing the correlation between users
- Finding the best critic for a user
- Predicting movie ratings for users
- Collaboratively filtering item by item
- Building a nonnegative matrix factorization model
- Loading the entire dataset into the memory
- Dumping the SVD-based model to the disk
- Training the SVD-based model
- Testing the SVD-based model

Introduction

From books to movies to people to follow on Twitter, recommender systems carve the deluge of information on the Internet into a more personalized flow, thus improving the performance of e-commerce, web, and social applications. Given Amazon's success in monetizing recommendations and the attention drawn by the Netflix Prize, it is no great surprise that any discussion of personalization or data-theoretic prediction involves a recommender. What is surprising is how simple recommenders are to implement, yet how susceptible they are to the vagaries of sparse data and overfitting.

Consider a non-algorithmic approach to eliciting recommendations: one of the easiest ways to garner a recommendation is to look at the preferences of someone we trust. We are implicitly comparing our preferences to theirs, and the more similarities you share, the more likely you are to discover novel, shared preferences. However, everyone is unique, and our preferences exist across a variety of categories and domains. What if you could leverage the preferences of a great number of people, and not just those you trust? In the aggregate, you would be able to see patterns, not just of people like you, but also "anti-recommendations": things to stay away from, cautioned by the people not like you. You would, hopefully, also see subtle delineations across the shared preference space of groups of people who share parts of your own unique experience.

It is this basic premise that a group of techniques called "collaborative filtering" uses to make recommendations. Simply stated, this premise can be boiled down to the assumption that those who have had similar preferences in the past will share the same preferences in the future. This is from a human perspective, of course, and a typical corollary takes the perspective of the things being preferred: sets of items that are preferred by the same people will be more likely to be preferred together in the future. This is the basis for what is commonly described in the literature as user-centric collaborative filtering versus item-centric collaborative filtering.

The term collaborative filtering was coined by David Goldberg in the paper Using collaborative filtering to weave an information tapestry (ACM), in which he proposed a system called Tapestry, designed at Xerox PARC in 1992 to annotate documents as interesting or uninteresting and to give document recommendations to people searching for good reads.

Collaborative filtering algorithms search large groupings of preference expressions to find similarities to some input preference or preferences. The output from these algorithms is a ranked list of suggestions that is a subset of all possible preferences, hence the "filtering". The "collaborative" comes from the use of many other people's preferences to find suggestions for each individual.
This can be seen either as a search of the space of preferences (for brute-force techniques), as a clustering problem (grouping similarly preferred items), or even as some other predictive model. Many algorithmic attempts have been made to optimize or solve this problem across sparse or large datasets, and we will discuss a few of them in this article. The goals of this article are:

- Understanding how to model preferences from a variety of sources
- Learning how to compute similarities using distance metrics
- Modeling recommendations using matrix factorization for star ratings

These two models will be implemented in Python using readily available datasets on the Web. To demonstrate the techniques in this article, we will use the oft-cited MovieLens database from the University of Minnesota, which contains moviegoers' star ratings of their preferred movies.

Modeling preference expressions

We have already pointed out that companies such as Amazon track purchases and page views to make recommendations, Goodreads and Yelp use 5-star ratings and text reviews, and sites such as Reddit or Stack Overflow use simple up/down voting. You can see that preference can be expressed in the data in different ways, from Boolean flags to votes to ratings. These preferences are compared by finding groups of similarities in preference expressions, which leverages the core assumption of collaborative filtering.

More formally, suppose two people, Bob and Alice, share a preference for a specific item or widget. If Alice also has a preference for a different item, say, a sprocket, then Bob has a better than random chance of also sharing a preference for the sprocket. We believe that Bob and Alice's taste similarities can be expressed in aggregate via a large number of preferences, and by leveraging the collaborative nature of groups, we can filter the world of products.

How to do it…

We will model preference expressions over the next few recipes, including:

- Understanding the data
- Ingesting the movie review data
- Finding the highest-rated movies
- Improving the movie-rating system

How it works…

A preference expression is an instance of a model of demonstrable relative selection. That is to say, preference expressions are data points used to show a subjective ranking between a group of items for a person. Even more formally, preference expressions are not simply relative but also temporal: a statement of preference is made relative both to the other items and to a fixed point in time.

While it would be nice to think that we can subjectively and accurately express our preferences in a global context (for example, rate a movie in comparison to all other movies), our tastes in fact change over time, and we can really only consider how we rank items relative to each other. Models of preference must take this into account and attempt to alleviate the biases it causes. The most common types of preference expression models simplify the problem of ranking by making the expression numerically fuzzy, for example:

- Boolean expressions (yes or no)
- Up and down voting (like, dislike, or abstain)
- Weighted signaling (the number of clicks or actions)
- Broad ranked classification (stars, hated or loved)

The idea is to create a preference model for an individual user: a numerical model of the set of preference expressions for a particular individual.
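As a minimal illustration of how these fuzzy expression types reduce to plain numbers before any similarity computation takes place, consider the sketch below. The dictionaries and weights are only examples for this discussion (they mirror the ranking-systems table shown shortly) and are not part of any library or of the MovieLens code that follows:

# Example numeric weights for a few preference expression systems (illustrative only).
reddit_voting = {'up vote': 1, 'no vote': 0, 'down vote': -1}
online_shopping = {'bought': 2, 'viewed': 1, 'no purchase': 0}
star_reviews = {'love': 5, 'liked': 4, 'neutral': 3, 'dislike': 2, 'hate': 1}

# A single preference expression becomes one number in a user's preference model.
print star_reviews['liked']    # 4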
Models build the individual preference expressions into a useful user-specific context that can be computed against. Further reasoning can be performed on the models in order to alleviate time-based biases or to perform ontological reasoning or other categorizations. As the relationships between entities get more complex, you can express their relative preferences by assigning behavioral weights to each type of semantic connection. However, choosing the weights is difficult and requires research to decide on relative values, which is why fuzzy generalizations are preferred. As an example, the following table shows some well-known ranking preference systems:

Reddit Voting         Online Shopping        Star Reviews
Up Vote        1      Bought          2      Love       5
No Vote        0      Viewed          1      Liked      4
Down Vote     -1      No purchase     0      Neutral    3
                                             Dislike    2
                                             Hate       1

For the rest of this article, we will only consider a single, very common preference expression: star ratings on a scale of 1 to 5.

Understanding the data

Understanding your data is critical to all data-related work. In this recipe, we acquire and take a first look at the data that we will be using to build our recommendation engine.

Getting ready

To prepare for this recipe, and the rest of the article, download the MovieLens data from the GroupLens website of the University of Minnesota. You can find the data at http://grouplens.org/datasets/movielens/. In this article, we will use the smaller MovieLens 100k dataset (4.7 MB in size) so that the entire model can be loaded into memory with ease.

How to do it…

Perform the following steps to better understand the data that we will be working with throughout this article:

1. Download the data from http://grouplens.org/datasets/movielens/. The 100k dataset is the one that you want (ml-100k.zip).
2. Unzip the downloaded data into the directory of your choice.
3. The two files that we are mainly concerned with are u.data, which contains the user movie ratings, and u.item, which contains the movie information and details.
4. To get a sense of each file, use the head command at the command prompt on Mac and Linux, or the more command on Windows:

head -n 5 u.item

Note that if you are working on a computer running Microsoft Windows (and not using a virtual machine, which is not recommended), you do not have access to the head command; instead, use the following command:

more u.item 2 n

The preceding command gives you the following output:

1|Toy Story (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Toy%20Story%20(1995)|0|0|0|1|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0
2|GoldenEye (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?GoldenEye%20(1995)|0|1|1|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
3|Four Rooms (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Four%20Rooms%20(1995)|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1|0|0
4|Get Shorty (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Get%20Shorty%20(1995)|0|1|0|0|0|1|0|0|1|0|0|0|0|0|0|0|0|0|0
5|Copycat (1995)|01-Jan-1995||http://us.imdb.com/M/title-exact?Copycat%20(1995)|0|0|0|0|0|0|1|0|1|0|0|0|0|0|0|0|1|0|0

5. The following command will produce the given output:

head -n 5 u.data

For Windows, you can use the following command:

more u.data 2 n

196  242  3  881250949
186  302  3  891717742
22   377  1  878887116
244  51   2  880606923
166  346  1  886397596

How it works…

The two main files that we will be using are as follows:

- u.data: This contains the user movie ratings
- u.item: This contains the movie information and other details

Both are character-delimited files: u.data, the main file, is tab delimited, and u.item is pipe delimited.

In u.data, the first column is the user ID, the second column is the movie ID, the third is the star rating, and the last is the timestamp. The u.item file contains much more information, including the ID, title, release date, and even a URL to IMDb. Interestingly, this file also has a Boolean array indicating the genre(s) of each movie, including (in order) action, adventure, animation, children, comedy, crime, documentary, drama, fantasy, film-noir, horror, musical, mystery, romance, sci-fi, thriller, war, and western.

There's more…

Free, web-scale datasets that are appropriate for building recommendation engines are few and far between. As a result, the MovieLens dataset is a very popular choice for such a task, but there are others as well. The well-known Netflix Prize dataset has been pulled down by Netflix. However, there is a dump of all user-contributed content from the Stack Exchange network (including Stack Overflow) available via the Internet Archive (https://archive.org/details/stackexchange). Additionally, there is a book-crossing dataset that contains over a million ratings of about a quarter of a million different books (http://www2.informatik.uni-freiburg.de/~cziegler/BX/).

Ingesting the movie review data

Recommendation engines require large amounts of training data in order to do a good job, which is why they're often relegated to big data projects. However, to build a recommendation engine, we must first get the required data into memory and, due to the size of the data, must do so in a memory-safe and efficient way. Luckily, Python has all of the tools to get the job done, and this recipe shows you how.

Getting ready

You will need the appropriate MovieLens dataset downloaded, as specified in the preceding recipe. Ensure that you have NumPy correctly installed.
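Before writing any loading code, a quick sanity check from the Python prompt can confirm that NumPy imports cleanly and that the ratings file is where the later recipes expect it. This is only a sketch and assumes that you unzipped the archive into a data/ml-100k/ folder under your working directory; adjust the path if you chose a different location:

import numpy
print numpy.__version__

# The 100k dataset should contain exactly 100,000 ratings, one per line.
with open('data/ml-100k/u.data') as f:
    print sum(1 for line in f)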
How to do it…

The following steps guide you through the creation of the functions that we will need in order to load the datasets into memory:

1. Open your favorite Python editor or IDE. There is a lot of code, so it will be far simpler to enter it directly into a text file than into the Read Eval Print Loop (REPL).

2. Create a function to import the movie reviews:

import csv
from datetime import datetime

def load_reviews(path, **kwargs):
    """
    Loads MovieLens reviews
    """
    options = {
        'fieldnames': ('userid', 'movieid', 'rating', 'timestamp'),
        'delimiter': '\t',
    }
    options.update(kwargs)

    parse_date = lambda r, k: datetime.fromtimestamp(float(r[k]))
    parse_int = lambda r, k: int(r[k])

    with open(path, 'rb') as reviews:
        reader = csv.DictReader(reviews, **options)
        for row in reader:
            row['userid'] = parse_int(row, 'userid')
            row['movieid'] = parse_int(row, 'movieid')
            row['rating'] = parse_int(row, 'rating')
            row['timestamp'] = parse_date(row, 'timestamp')
            yield row

3. Create a helper function to help import the data:

import os

def relative_path(path):
    """
    Returns a path relative from this code file
    """
    dirname = os.path.dirname(os.path.realpath('__file__'))
    path = os.path.join(dirname, path)
    return os.path.normpath(path)

4. Create another function to load the movie information:

def load_movies(path, **kwargs):
    """
    Loads MovieLens movies
    """
    options = {
        'fieldnames': ('movieid', 'title', 'release', 'video', 'url'),
        'delimiter': '|',
        'restkey': 'genre',
    }
    options.update(kwargs)

    parse_int = lambda r, k: int(r[k])
    parse_date = lambda r, k: datetime.strptime(r[k], '%d-%b-%Y') if r[k] else None

    with open(path, 'rb') as movies:
        reader = csv.DictReader(movies, **options)
        for row in reader:
            row['movieid'] = parse_int(row, 'movieid')
            row['release'] = parse_date(row, 'release')
            row['video'] = parse_date(row, 'video')
            yield row

5. Finally, start creating a MovieLens class that will be augmented in later recipes:

from collections import defaultdict

class MovieLens(object):
    """
    Data structure to build our recommender model on.
    """

    def __init__(self, udata, uitem):
        """
        Instantiate with a path to u.data and u.item
        """
        self.udata = udata
        self.uitem = uitem
        self.movies = {}
        self.reviews = defaultdict(dict)
        self.load_dataset()

    def load_dataset(self):
        """
        Loads the two datasets into memory, indexed on the ID.
        """
        for movie in load_movies(self.uitem):
            self.movies[movie['movieid']] = movie

        for review in load_reviews(self.udata):
            self.reviews[review['userid']][review['movieid']] = review

6. Ensure that the functions have been imported into your REPL or the IPython workspace, and type the following, making sure that the path to the data files is appropriate for your system:

data = relative_path('data/ml-100k/u.data')
item = relative_path('data/ml-100k/u.item')
model = MovieLens(data, item)

How it works…

The methodology that we use for the two data-loading functions (load_reviews and load_movies) is simple, but it takes care of the details of parsing the data from the disk. We created a function that takes a path to the dataset and then any optional keywords. We know that we have specific ways in which we need to interact with the csv module, so we create default options, passing in the field names of the rows along with the delimiter, which is a tab character ('\t'). The options.update(kwargs) line means that we'll accept whatever else users pass to this function.

We then created internal parsing functions using lambda functions in Python. These simple parsers take a row and a key as input and return the converted value.
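For instance, applied to the first record of u.data shown earlier (written out here as the dictionary that csv.DictReader would produce), the parsers behave as follows. This is just a standalone illustration, not part of the recipe's code:

from datetime import datetime

parse_int = lambda r, k: int(r[k])
parse_date = lambda r, k: datetime.fromtimestamp(float(r[k]))

# The first record of u.data: user 196 gave movie 242 three stars.
row = {'userid': '196', 'movieid': '242', 'rating': '3', 'timestamp': '881250949'}
print parse_int(row, 'rating')       # 3
print parse_date(row, 'timestamp')   # the rating's timestamp as a datetime object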
This is an example of using lambda functions as internal, reusable code blocks, a common technique in Python. Finally, we open the file and create a csv.DictReader with our options. Iterating through the rows in the reader, we parse the fields that we want to be int and datetime, respectively, and then yield the row.

Note that as we are unsure about the actual size of the input file, we do this in a memory-safe manner using Python generators. Using yield instead of return ensures that Python creates a generator under the hood and does not load the entire dataset into memory. We'll use each of these methodologies to load the datasets at various points in the computations that use this dataset. We'll need to know where these files are at all times, which can be a pain, especially in larger code bases; in the There's more… section, we'll discuss a Python pro tip to alleviate this concern.

Finally, we created a data structure, the MovieLens class, with which we can hold our review data. This structure takes the udata and uitem paths, and then it loads the movies and reviews into two Python dictionaries, indexed by movieid and userid, respectively. To instantiate this object, you will execute something like the following:

data = relative_path('../data/ml-100k/u.data')
item = relative_path('../data/ml-100k/u.item')
model = MovieLens(data, item)

Note that the preceding commands assume that you have your data in a folder called data. We can now load the whole dataset into memory, indexed on the various IDs specified in the dataset.

Did you notice the use of the relative_path function? When dealing with fixtures such as these to build models, the data is often included with the code. When you specify a path in Python, such as data/ml-100k/u.data, it is looked up relative to the current working directory where you ran the script. To help ease this trouble, you can specify paths that are relative to the code itself:

import os

def relative_path(path):
    """
    Returns a path relative from this code file
    """
    dirname = os.path.dirname(os.path.realpath('__file__'))
    path = os.path.join(dirname, path)
    return os.path.normpath(path)

Keep in mind that this holds the entire data structure in memory; in the case of the 100k dataset, this requires 54.1 MB, which isn't too bad for modern machines. However, we should also keep in mind that we'll generally build recommenders using far more than just 100,000 reviews. This is why we have configured the data structure to behave very much like a database. To grow the system, you would replace the reviews and movies properties with database access functions or properties, which would yield the data types expected by our methods.

Finding the highest-scoring movies

If you're looking for a good movie, you'll often want to see the most popular or best-rated movies overall. Initially, we'll take a naïve approach to computing a movie's aggregate rating by averaging the user reviews for each movie. This technique will also demonstrate how to access the data in our MovieLens class.

Getting ready

These recipes are sequential in nature. Thus, you should have completed the previous recipes in the article before starting with this one.

How to do it…

Follow these steps to output numeric scores for all movies in the dataset and compute a top-10 list:

1. Augment the MovieLens class with a new method to get all reviews for a particular movie:

class MovieLens(object):
    ...
    def reviews_for_movie(self, movieid):
        """
        Yields the reviews for a given movie
        """
        for review in self.reviews.values():
            if movieid in review:
                yield review[movieid]

2. Then, add an additional method to compute the top 10 movies reviewed by users:

import heapq
from operator import itemgetter

class MovieLens(object):
    ...

    def average_reviews(self):
        """
        Averages the star rating for all movies. Yields a tuple of
        movieid, the average rating, and the number of reviews.
        """
        for movieid in self.movies:
            reviews = list(r['rating'] for r in self.reviews_for_movie(movieid))
            average = sum(reviews) / float(len(reviews))
            yield (movieid, average, len(reviews))

    def top_rated(self, n=10):
        """
        Yields the n top rated movies
        """
        return heapq.nlargest(n, self.average_reviews(), key=itemgetter(1))

Note that the ... notation just below class MovieLens(object): signifies that we are appending the average_reviews method to the existing MovieLens class.

3. Now, let's print the top-rated results:

for mid, avg, num in model.top_rated(10):
    title = model.movies[mid]['title']
    print "[%0.3f average rating (%i reviews)] %s" % (avg, num, title)

4. Executing the preceding commands in your REPL should produce the following output:

[5.000 average rating (1 reviews)] Entertaining Angels: The Dorothy Day Story (1996)
[5.000 average rating (2 reviews)] Santa with Muscles (1996)
[5.000 average rating (1 reviews)] Great Day in Harlem, A (1994)
[5.000 average rating (1 reviews)] They Made Me a Criminal (1939)
[5.000 average rating (1 reviews)] Aiqing wansui (1994)
[5.000 average rating (1 reviews)] Someone Else's America (1995)
[5.000 average rating (2 reviews)] Saint of Fort Washington, The (1993)
[5.000 average rating (3 reviews)] Prefontaine (1997)
[5.000 average rating (3 reviews)] Star Kid (1997)
[5.000 average rating (1 reviews)] Marlene Dietrich: Shadow and Light (1996)

How it works…

The new reviews_for_movie() method added to the MovieLens class iterates through the review dictionary values (which are indexed by the userid parameter), checks whether the movieid value has been reviewed by that user, and then yields that review dictionary. We will need this functionality for the next method.

With the average_reviews() method, we create another generator function that goes through all of the movies and all of their reviews and yields the movie ID, the average rating, and the number of reviews. The top_rated function uses the heapq module to quickly sort the reviews based on the average.

The heapq data structure, also known as the priority queue algorithm, is the Python implementation of an abstract data structure with interesting and useful properties. Heaps are binary trees built so that every parent node has a value less than or equal to any of its child nodes. Thus, the smallest element is the root of the tree, which can be accessed in constant time, a very desirable property. With heapq, Python developers have an efficient means to insert new values into an ordered data structure and also return sorted values.

There's more…

Here, we run into our first problem: some of the top-rated movies have only one review (and conversely, so do the worst-rated movies). How do you compare Casablanca, which has a 4.457 average rating (243 reviews), with Santa with Muscles, which has a 5.000 average rating (2 reviews)? We are sure that those two reviewers really liked Santa with Muscles, but the high rating for Casablanca is probably more meaningful because more people liked it.
Most recommenders with star ratings simply display the average rating along with the number of reviewers, letting the user judge quality for themselves; however, as data scientists, we can do better in the next recipe.

See also

The heapq documentation is available at https://docs.python.org/2/library/heapq.html.

Improving the movie-rating system

We don't want to build a recommendation engine with a system that considers the likely straight-to-DVD Santa with Muscles as generally superior to Casablanca. Thus, the naïve scoring approach used previously must be improved upon, and that is the focus of this recipe.

Getting ready

Make sure that you have completed the previous recipes in this article first.

How to do it…

The following steps implement and test a new movie-scoring algorithm:

1. Implement a new Bayesian movie-scoring algorithm, as shown in the following function, adding it to the MovieLens class:

def bayesian_average(self, c=59, m=3):
    """
    Reports the Bayesian average with parameters c and m.
    """
    for movieid in self.movies:
        reviews = list(r['rating'] for r in self.reviews_for_movie(movieid))
        average = ((c * m) + sum(reviews)) / float(c + len(reviews))
        yield (movieid, average, len(reviews))

2. Next, replace the top_rated method in the MovieLens class with the following version, which uses the new bayesian_average method from the preceding step:

def top_rated(self, n=10):
    """
    Yields the n top rated movies
    """
    return heapq.nlargest(n, self.bayesian_average(), key=itemgetter(1))

3. Printing our new top-10 list looks a bit more familiar, and Casablanca is now happily rated number 4:

[4.234 average rating (583 reviews)] Star Wars (1977)
[4.224 average rating (298 reviews)] Schindler's List (1993)
[4.196 average rating (283 reviews)] Shawshank Redemption, The (1994)
[4.172 average rating (243 reviews)] Casablanca (1942)
[4.135 average rating (267 reviews)] Usual Suspects, The (1995)
[4.123 average rating (413 reviews)] Godfather, The (1972)
[4.120 average rating (390 reviews)] Silence of the Lambs, The (1991)
[4.098 average rating (420 reviews)] Raiders of the Lost Ark (1981)
[4.082 average rating (209 reviews)] Rear Window (1954)
[4.066 average rating (350 reviews)] Titanic (1997)

How it works…

Taking the plain average of movie reviews, as shown in the previous recipe, simply did not work, because some movies did not have enough ratings to give a meaningful comparison to movies with more ratings. What we would really like is for every single movie critic to rate every single movie. Given that this is impossible, we can instead estimate how a movie would be rated if the same number of people, on average, had rated each one (in effect, tempering our results by the number of reviews). This estimate can be computed with a Bayesian average, implemented in the bayesian_average() function, which infers the rating from the following equation:

bayesian_average = (C * m + sum of the ratings) / (C + number of ratings)

Here, m is our prior for the average star rating, and C is a confidence parameter that is equivalent to the number of observations in our posterior.

Determining priors can be a complicated and magical art. Rather than taking the complex path of fitting a Dirichlet distribution to our data, we can simply choose an m prior of 3 with our 5-star rating system, which means that our prior assumes that star ratings tend to cluster around the median value.
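To see what this does to the earlier rankings, it helps to plug the numbers from the previous recipe into the equation by hand. This is a quick arithmetic check only, using c=59 and m=3, the defaults shown above:

# Santa with Muscles: two 5-star reviews.
print ((59 * 3) + 5 + 5) / float(59 + 2)            # roughly 3.07, pulled hard toward the prior
# Casablanca: 243 reviews with a raw average of about 4.457 (ratings sum to about 1083).
print ((59 * 3) + (4.457 * 243)) / float(59 + 243)  # roughly 4.17, barely moved from its raw average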
In choosing C, you are expressing how many reviews are needed to get away from the prior; we can compute this by looking at the average number of reviews per movie:

print float(sum(num for mid, avg, num in model.average_reviews())) / len(model.movies)

This gives us an average of 59.4, which we use as the default value in our function definition.

There's more…

Play around with the C parameter. You should find that if you change the parameter so that C = 50, the top-10 list subtly shifts: in this case, Schindler's List and Star Wars swap rankings, as do Raiders of the Lost Ark and Rear Window. In both pairs, one movie has far more reviews than the other, which shows how the C parameter balances the influence of the number of ratings against the raw average.

See also

See how Yelp deals with this challenge at http://venturebeat.com/2009/10/12/how-yelp-deals-with-everybody-getting-four-stars-on-average/

Measuring the distance between users in the preference space

The two most recognizable types of collaborative filtering systems are user-based recommenders and item-based recommenders. If one imagines the preference space as an N-dimensional feature space where either users or items are plotted, then similar users or items tend to cluster near each other in this space; hence, an alternative name for this type of collaborative filtering is nearest neighbor recommenders.

A crucial step in this process is to come up with a similarity or distance metric with which we can compare critics to each other or mutually preferred items. This metric is then used to perform pairwise comparisons of a particular user to all other users, or conversely, of an item to all other items. Normalized comparisons are then used to determine recommendations. Although the computational space can become exceedingly large, the distance metrics themselves are not difficult to compute, and in this recipe, we will explore a few of them as well as implement our first recommender system.

In this recipe, we will measure the distance between users; in the next recipe, we will look at another similarity indicator.

Getting ready

We will continue to build on the MovieLens class from the section titled Modeling preference expressions. If you have not had the opportunity to review this section, please have the code for that class ready. Importantly, we will want to access the data structures MovieLens.movies and MovieLens.reviews, which have been loaded from the CSV files on the disk.

How to do it…

The following set of steps provides instructions on how to compute the Euclidean distance between users:

1. Augment the MovieLens class with a new method, shared_preferences, to pull out movies that have been rated by two critics, A and B:

class MovieLens(object):
    ...
    def shared_preferences(self, criticA, criticB):
        """
        Returns the intersection of ratings for two critics
        """
        if criticA not in self.reviews:
            raise KeyError("Couldn't find critic '%s' in data" % criticA)
        if criticB not in self.reviews:
            raise KeyError("Couldn't find critic '%s' in data" % criticB)

        moviesA = set(self.reviews[criticA].keys())
        moviesB = set(self.reviews[criticB].keys())
        shared = moviesA & moviesB  # Intersection operator

        # Create a reviews dictionary to return
        reviews = {}
        for movieid in shared:
            reviews[movieid] = (
                self.reviews[criticA][movieid]['rating'],
                self.reviews[criticB][movieid]['rating'],
            )
        return reviews

2. Then, implement a method that computes the Euclidean distance between two critics, using their shared movie preferences as the vectors for the computation. This method will also be part of the MovieLens class:

from math import sqrt
...

def euclidean_distance(self, criticA, criticB):
    """
    Reports the Euclidean distance of two critics, A and B, by
    performing a J-dimensional Euclidean calculation of each of their
    preference vectors for the intersection of movies the critics
    have rated.
    """
    # Get the intersection of the rated titles in the data.
    preferences = self.shared_preferences(criticA, criticB)

    # If they have no rankings in common, return 0.
    if len(preferences) == 0:
        return 0

    # Sum the squares of the differences
    sum_of_squares = sum([pow(a-b, 2) for a, b in preferences.values()])

    # Return the inverse of the distance to give a higher score to
    # folks who are more similar (e.g. less distance). Add 1 to prevent
    # division by zero errors and normalize ranks in [0, 1].
    return 1 / (1 + sqrt(sum_of_squares))

3. With the preceding code implemented, test it in the REPL:

>>> data = relative_path('data/ml-100k/u.data')
>>> item = relative_path('data/ml-100k/u.item')
>>> model = MovieLens(data, item)
>>> print model.euclidean_distance(232, 532)
0.1023021629920016

How it works…

The new shared_preferences() method of the MovieLens class determines the shared preference space of two users. Critically, we can only compare users (the criticA and criticB input parameters) based on the things that they have both rated. This function uses Python sets to determine the list of movies that both A and B reviewed (the intersection of the movies A has rated and the movies B has rated). The function then iterates over this set, returning a dictionary whose keys are the movie IDs and whose values are tuples of ratings, for example, (ratingA, ratingB), for each movie that both users have rated. We can now use this dataset to compute similarity scores, which is done by the second function.

The euclidean_distance() function takes two critics as input, A and B, and computes the distance between them in preference space. Here, we have chosen to implement the Euclidean distance metric (the two-dimensional variation is well known to anyone who remembers the Pythagorean theorem), but we could have implemented other metrics as well. This function returns a real number from 0 to 1, where 0 means the critics are less similar (farther apart) and 1 means they are more similar (closer together).

There's more…

The Manhattan distance is another very popular and very simple metric: it sums the absolute values of the pairwise differences between the elements of each vector.
Or, in code, it can be computed in this manner:

manhattan = sum([abs(a-b) for a, b in preferences.values()])

This metric is also called the city-block distance because, conceptually, it is as if you were counting the number of blocks north/south and east/west one would have to walk between two points in a city. Before using it for this recipe, you would also want to invert and normalize the value in some fashion to return a value in the [0, 1] range.

See also

- The distance overview from Wikipedia, available at http://en.wikipedia.org/wiki/Distance
- Taxicab geometry from Wikipedia, available at http://en.wikipedia.org/wiki/Taxicab_geometry

Computing the correlation between users

In the previous recipe, we used one of many possible distance measures to capture the distance between the movie reviews of users. This distance between two specific users does not change even if there are five or five million other users. In this recipe, we will compute the correlation between users in the preference space. Like distance metrics, there are many correlation metrics; the most popular of these are the Pearson and Spearman correlations and the cosine distance. Unlike distance metrics, the correlation will change depending on the number of users and movies.

Getting ready

We will be continuing the efforts of the previous recipes, so make sure you understand each one.

How to do it…

The following function implements the computation of the pearson_correlation function for two critics, criticA and criticB, and is added to the MovieLens class:

def pearson_correlation(self, criticA, criticB):
    """
    Returns the Pearson Correlation of two critics, A and B, by
    performing the PPMC calculation on the scatter plot of (a, b)
    ratings on the shared set of critiqued titles.
    """
    # Get the set of mutually rated items
    preferences = self.shared_preferences(criticA, criticB)

    # Store the length to save traversals of the len computation.
    # If they have no rankings in common, return 0.
    length = len(preferences)
    if length == 0:
        return 0

    # Loop through the preferences of each critic once and compute the
    # various summations that are required for our final calculation.
    sumA = sumB = sumSquareA = sumSquareB = sumProducts = 0
    for a, b in preferences.values():
        sumA += a
        sumB += b
        sumSquareA += pow(a, 2)
        sumSquareB += pow(b, 2)
        sumProducts += a*b

    # Calculate Pearson Score
    numerator = (sumProducts*length) - (sumA*sumB)
    denominator = sqrt(((sumSquareA*length) - pow(sumA, 2)) * ((sumSquareB*length) - pow(sumB, 2)))

    # Prevent division by zero.
    if denominator == 0:
        return 0

    return abs(numerator / denominator)

How it works…

The Pearson correlation computes the "product moment", which is the mean of the product of mean-adjusted random variables, and is defined as the covariance of two variables (a and b, in our case) divided by the product of the standard deviation of a and the standard deviation of b. As a formula, this looks like the following:

r(a, b) = cov(a, b) / (stddev(a) * stddev(b))

For a finite sample, which is what we have, the detailed formula implemented in the preceding function is as follows:

r = (n * sum(a*b) - sum(a) * sum(b)) / sqrt((n * sum(a^2) - sum(a)^2) * (n * sum(b^2) - sum(b)^2))

Another way to think about the Pearson correlation is as a measure of the linear dependence between two variables. It returns a score of -1 to 1, where scores closer to -1 indicate a stronger negative correlation and scores closer to 1 indicate a stronger positive correlation. A score of 0 means that the two variables are not correlated.
In order for us to perform comparisons, we want to normalize our similarity metrics in the space of [0, 1] so that 0 means less similar and 1 means more similar, so we return the absolute value: >>> print model.pearson_correlation(232, 532) 0.06025793538385047 There's more… We have explored two distance metrics: the Euclidean distance and the Pearson correlation. There are many more, including the Spearman correlation, Tantimoto scores, Jaccard distance, Cosine similarity, and Manhattan distance, to name a few. Choosing the right distance metric for the dataset of your recommender along with the type of preference expression used is crucial to ensuring success in this style of recommender. It's up to the reader to explore this space further based on his or her interest and particular dataset. Finding the best critic for a user Now that we have two different ways to compute a similarity distance between users, we can determine the best critics for a particular user and see how similar they are to an individual's preferences. Getting ready Make sure that you have completed the previous recipes before tackling this one. How to do it… Implement a new method for the MovieLens class, similar_critics(), that locates the best match for a user: import heapq ... def similar_critics(self, user, metric='euclidean', n=None): """ Finds, ranks similar critics for the user according to the specified distance metric. Returns the top n similar critics if n is specified. """ # Metric jump table metrics = { 'euclidean': self.euclidean_distance, 'pearson': self.pearson_correlation, } distance = metrics.get(metric, None) # Handle problems that might occur if user not in self.reviews: raise KeyError("Unknown user, '%s'." % user) if not distance or not callable(distance): raise KeyError("Unknown or unprogrammed distance metric '%s'." % metric) # Compute user to critic similarities for all critics critics = {} for critic in self.reviews: # Don't compare against yourself! if critic == user: continue critics[critic] = distance(user, critic) if n: return heapq.nlargest(n, critics.items(), key=itemgetter(1)) return critics How it works… The similar_critics method, added to the MovieLens class, serves as the heart of this recipe. It takes as parameters the targeted user and two optional parameters: the metric to be used, which defaults to euclidean, and the number of results to be returned, which defaults to None. As you can see, this flexible method uses a jump table to determine what algorithm is to be used (you can pass in euclidean or pearson to choose the distance metric). Every other critic is compared to the current user (except a comparison of the user against themselves). The results are then sorted using the flexible heapq module and the top n results are returned. To test out our implementation, print out the results of the run for both similarity distances: >>> for item in model.similar_critics(232, 'euclidean', n=10): print "%4i: %0.3f" % item 688: 1.000 914: 1.000 47: 0.500 78: 0.500 170: 0.500 335: 0.500 341: 0.500 101: 0.414 155: 0.414 309: 0.414 >>> for item in model.similar_critics(232, 'pearson', n=10): print "%4i: %0.3f" % item 33: 1.000 36: 1.000 155: 1.000 260: 1.000 289: 1.000 302: 1.000 309: 1.000 317: 1.000 511: 1.000 769: 1.000 These scores are clearly very different, and it appears that Pearson thinks that there are much more similar users than the Euclidean distance metric. The Euclidean distance metric tends to favor users who have rated fewer items exactly the same. 
Pearson correlation favors more scores that fit well linearly, and therefore, Pearson corrects grade inflation where two critics might rate movies very similarly, but one user rates them consistently one star higher than the other. If you plot out how many shared rankings each critic has, you'll see that the data is very sparse. Here is the preceding data with the number of rankings appended: Euclidean scores: 688: 1.000 (1 shared rankings) 914: 1.000 (2 shared rankings) 47: 0.500 (5 shared rankings) 78: 0.500 (3 shared rankings) 170: 0.500 (1 shared rankings) Pearson scores: 33: 1.000 (2 shared rankings) 36: 1.000 (3 shared rankings) 155: 1.000 (2 shared rankings) 260: 1.000 (3 shared rankings) 289: 1.000 (3 shared rankings) Therefore, it is not enough to find similar critics and use their ratings to predict our users' scores; instead, we will have to aggregate the scores of all of the critics, regardless of similarity, and predict ratings for the movies we haven't rated. Predicting movie ratings for users To predict how we might rate a particular movie, we can compute a weighted average of critics who have also rated the same movies as the user. The weight will be the similarity of the critic to user—if a critic has not rated a movie, then their similarity will not contribute to the overall ranking of the movie. Getting ready Ensure that you have completed the previous recipes in this large, cumulative article. How to do it… The following steps walk you through the prediction of movie ratings for users: First, add the predict_ranking function to the MovieLens class in order to predict the ranking a user might give a particular movie with similar critics: def predict_ranking(self, user, movie, metric='euclidean', critics=None): """ Predicts the ranking a user might give a movie based on the weighted average of the critics similar to the that user. """ critics = critics or self.similar_critics(user, metric=metric) total = 0.0 simsum = 0.0 for critic, similarity in critics.items(): if movie in self.reviews[critic]: total += similarity * self.reviews[critic][movie]['rating'] simsum += similarity if simsum == 0.0: return 0.0 return total / simsum Next, add the predict_all_rankings method to the MovieLens class: def predict_all_rankings(self, user, metric='euclidean', n=None): """ Predicts all rankings for all movies, if n is specified returns the top n movies and their predicted ranking. """ critics = self.similar_critics(user, metric=metric) movies = { movie: self.predict_ranking(user, movie, metric, critics) for movie in self.movies } if n: return heapq.nlargest(n, movies.items(), key=itemgetter(1)) return movies How it works… The predict_ranking method takes a user and a movie along with a string specifying the distance metric and returns the predicted rating for that movie for that particular user. A fourth argument, critics, is meant to be an optimization for the predict_all_rankings method, which we'll discuss shortly. The prediction gathers all critics who are similar to the user and computes the weighted total rating of the critics, filtered by those who actually did rate the movie in question. The weights are simply their similarity to the user, computed by the distance metric. 
This total is then normalized by the sum of the similarities to move the rating back into the space of 1 to 5 stars: >>> print model.predict_ranking(422, 50, 'euclidean') 4.35413151722 >>> print model.predict_ranking(422, 50, 'pearson') 4.3566797826 Here, we can see the predictions for Star Wars (ID 50 in our MovieLens dataset) for the user 422. The Euclidean and Pearson computations are very close to each other (which isn't necessarily to be expected), but the prediction is also very close to the user's actual rating, which is 4. The predict_all_rankings method computes the ranking predictions for all movies for a particular user according to the passed-in metric. It optionally takes a value, n, to return the top n best matches. This function optimizes the similar critics' lookup by only executing it once and then passing those discovered critics to the predict_ranking function in order to improve the performance. However, this method must be run on every single movie in the dataset: >>> for mid, rating in model.predict_all_rankings(578, 'pearson', 10): ... print "%0.3f: %s" % (rating, model.movies[mid]['title']) 5.000: Prefontaine (1997) 5.000: Santa with Muscles (1996) 5.000: Marlene Dietrich: Shadow and Light (1996) 5.000: Star Kid (1997) 5.000: Aiqing wansui (1994) 5.000: Someone Else's America (1995) 5.000: Great Day in Harlem, A (1994) 5.000: Saint of Fort Washington, The (1993) 4.954: Anna (1996) 4.817: Innocents, The (1961) As you can see, we have now computed what our recommender thinks the top movies for this particular user are, along with what we think the user will rate the movie! The top-10 list of average movie ratings plays a huge rule here and a potential improvement could be to use the Bayesian averaging in addition to the similarity weighting, but that is left for the reader to implement. Collaboratively filtering item by item So far, we have compared users to other users in order to make our predictions. However, the similarity space can be partitioned in two ways. User-centric collaborative filtering plots users in the preference space and discovers how similar users are to each other. These similarities are then used to predict rankings, aligning the user with similar critics. Item-centric collaborative filtering does just the opposite; it plots the items together in the preference space and makes recommendations according to how similar a group of items are to another group. Item-based collaborative filtering is a common optimization as the similarity of items changes slowly. Once enough data has been gathered, reviewers adding reviews does not necessarily change the fact that Toy Story is more similar to Babe than The Terminator, and users who prefer Toy Story might prefer the former to the latter. Therefore, you can simply compute item similarities once in a single offline-process and use that as a static mapping for recommendations, updating the results on a semi-regular basis. This recipe will walk you through item-by-item collaborative filtering. Getting ready This recipe requires the completion of the previous recipes in this article. 
How to do it… Construct the following function to perform item-by-item collaborative filtering: def shared_critics(self, movieA, movieB): """ Returns the intersection of critics for two items, A and B """ if movieA not in self.movies: raise KeyError("Couldn't find movie '%s' in data" %movieA) if movieB not in self.movies: raise KeyError("Couldn't find movie '%s' in data" %movieB) criticsA = set(critic for critic in self.reviews if movieA in self.reviews[critic]) criticsB = set(critic for critic in self.reviews if movieB in self.reviews[critic]) shared = criticsA & criticsB # Intersection operator # Create the reviews dictionary to return reviews = {} for critic in shared: reviews[critic] = ( self.reviews[critic][movieA]['rating'], self.reviews[critic][movieB]['rating'], ) return reviews def similar_items(self, movie, metric='euclidean', n=None): # Metric jump table metrics = { 'euclidean': self.euclidean_distance, 'pearson': self.pearson_correlation, } distance = metrics.get(metric, None) # Handle problems that might occur if movie not in self.reviews: raise KeyError("Unknown movie, '%s'." % movie) if not distance or not callable(distance): raise KeyError("Unknown or unprogrammed distance metric '%s'." % metric) items = {} for item in self.movies: if item == movie: continue items[item] = distance(item, movie, prefs='movies') if n: return heapq.nlargest(n, items.items(), key=itemgetter(1)) return items How it works… To perform item-by-item collaborative filtering, the same distance metrics can be used but must be updated to use the preferences from shared_critics rather than shared_preferences (for example, item similarity versus user similarity). Update the functions to accept a prefs parameter that determines which preferences are to be used, but I'll leave that to the reader as it is only two lines of code. If you print out the list of similar items for a particular movie, you can see some interesting results. For example, review the similarity results for The Crying Game (1992), which has an ID of 631: for movie, similarity in model.similar_items(631, 'pearson').items(): print "%0.3f: %s" % (similarity, model.movies[movie]['title']) 0.127: Toy Story (1995) 0.209: GoldenEye (1995) 0.069: Four Rooms (1995) 0.039: Get Shorty (1995) 0.340: Copycat (1995) 0.225: Shanghai Triad (Yao a yao yao dao waipo qiao) (1995) 0.232: Twelve Monkeys (1995) ... This crime thriller is not very similar to Toy Story, which is a children's movie, but is more similar to Copycat, which is another crime thriller. Of course, critics who have rated many movies skew the results, and more movie reviews are needed before this normalizes into something more compelling. It is presumed that the item similarity scores are run regularly but do not need to be computed in real time. Given a set of computed item similarities, computing recommendations are as follows: def predict_ranking(self, user, movie, metric='euclidean'): movies = self.similar_items(movie, metric=metric) total = 0.0 simsum = 0.0 for relmovie, similarity in movies.items(): # Ignore movies already reviewed by user if relmovie in self.reviews[user]: total += similarity * self.reviews[user][relmovie]['rating'] simsum += similarity if simsum == 0.0: return 0.0 return total / simsum This method simply uses the inverted item-to-item similarity scores rather than the user-to-user similarity scores. Since similar items can be computed offline, the lookup for movies via the self.similar_items method should be a database lookup rather than a real-time computation. 
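The prefs parameter change mentioned above (and relied on by similar_items(), which calls distance(item, movie, prefs='movies')) is left to the reader in the text. Before looking at the example output that follows, here is one possible sketch of that change applied to euclidean_distance(); the exact signature and the 'users'/'movies' values are assumptions consistent with how the method is called, not the book's original code, and only the branch at the top is actually new:

def euclidean_distance(self, criticA, criticB, prefs='users'):
    """
    Euclidean distance between two critics (prefs='users') or between
    two movies (prefs='movies') over their shared preference space.
    """
    # Choose the shared-preference lookup based on what is being compared.
    if prefs == 'users':
        preferences = self.shared_preferences(criticA, criticB)
    elif prefs == 'movies':
        preferences = self.shared_critics(criticA, criticB)
    else:
        raise KeyError("Unknown preference type, '%s'." % prefs)

    # The remainder is unchanged from the original implementation.
    if len(preferences) == 0:
        return 0
    sum_of_squares = sum([pow(a-b, 2) for a, b in preferences.values()])
    return 1 / (1 + sqrt(sum_of_squares))

The pearson_correlation() method needs the same dispatch at the top so that similar_items() can use either metric.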
>>> print model.predict_ranking(232, 52, 'pearson') 3.980443976 You can then compute a ranked list of all possible recommendations in a similar way as the user-to-user recommendations. Building a nonnegative matrix factorization model A general improvement on the basic cross-wise nearest-neighbor similarity scoring of collaborative filtering is a matrix factorization method, which is also known as Singular Value Decomposition (SVD). Matrix factorization methods attempt to explain the ratings through the discovery of latent features that are not easily identifiable by analysts. For instance, this technique can expose possible features such as the amount of action, family friendliness, or fine-tuned genre discovery in our movies dataset. What's especially interesting about these features is that they are continuous and not discrete values and can represent an individual's preference along a continuum. In this sense, the model can explore shades of characteristics, for example, perhaps a critic in the movie reviews' dataset, such as action flicks with a strong female lead that are set in European countries. A James Bond movie might represent a shade of that type of movie even though it only ticks the set in European countries and action genre boxes. Depending on how similarly reviewers rate the movie, the strength of the female counterpart to James Bond will determine how they might like the movie. Also, extremely helpfully, the matrix factorization model does well on sparse data, that is data with few recommendation and movie pairs. Reviews' data is particularly sparse because not everyone has rated the same movies and there is a massive set of available movies. SVD can also be performed in parallel, making it a good choice for much larger datasets. In the remaining recipes in this article, we will build a nonnegative matrix factorization model in order to improve our recommendation engine. How to do it… Loading the entire dataset into the memory. Dumping the SVD-based model to the disk. Training the SVD-based model. Testing the SVD-based model. How it works… Matrix factorization, or SVD works, by finding two matrices such that when you take their dot product (also known as the inner product or scalar product), you will get a close approximation of the original matrix. We have expressed our training matrix as a sparse N x M matrix of users to movies where the values are the 5-star rating if it exists, otherwise, the value is blank or 0. By factoring the model with the values that we have and then taking the dot product of the two matrices produced by the factorization, we hope to fill in the blank spots in our original matrix with a prediction of how the user would have rated the movie in that column. The intuition is that there should be some latent features that determine how users rate an item, and these latent features are expressed through the semantics of their previous ratings. If we can discover the latent features, we will be able to predict new ratings. Additionally, there should be fewer features than there are users and movies (otherwise, each movie or user would be a unique feature). This is why we compose our factored matrices by some feature length before taking their dot product. Mathematically, this task is expressed as follows. If we have a set of U users and M movies, let R of size |U| x |M| be the matrix that contains the ratings of users. 
Assuming that we have K latent features, find two matrices, P and Q, where P is |U| x K and Q is |M| x K such that the dot product of P and Q transpose approximates R. P, which therefore represent the strength of the associations between users and features and Q represents the association of movies with features. There are a few ways to go about factorization, but the choice we made was to perform gradient descent. Gradient descent initializes two random P and Q matrices, computes their dot product, and then minimizes the error compared to the original matrix by traveling down a slope of an error function (the gradient). This way, the algorithm hopes to find a local minimum where the error is within an acceptable threshold. Our function computed the error as the squared difference between the predicted value and the actual value. To minimize the error, we modify the values pik and qkj by descending along the gradient of the current error slope, differentiating our error equation with respect to p yields: We then differentiate our error equation with respect to the variable q yields in the following equation: We can then derive our learning rule, which updates the values in P and Q by a constant learning rate, which is α. This learning rate, α, should not be too large because it determines how big of a step we take towards the minimum, and it is possible to step across to the other side of the error curve. It should also not be too small, otherwise it will take forever to converge. We continue to update our P and Q matrices, minimizing the error until the sum of the error squared is below some threshold, 0.001 in our code, or until we have performed a maximum number of iterations. Matrix factorization has become an important technique for recommender systems, particularly those that leverage Likert-scale-like preference expressions—notably, star ratings. The Netflix Prize challenge has shown us that matrix-factored approaches perform with a high degree of accuracy for ratings prediction tasks. Additionally, matrix factorization is a compact, memory-efficient representation of the parameter space for a model and can be trained in parallel, can support multiple feature vectors, and can be improved with confidence levels. Generally, they are used to solve cold-start problems with sparse reviews and in an ensemble with more complex hybrid-recommenders that also compute content-based recommenders. See also Wikipedia's overview of the dot product available at http://en.wikipedia.org/wiki/Dot_product Loading the entire dataset into the memory The first step in building a nonnegative factorization model is to load the entire dataset in the memory. For this task, we will be leveraging NumPy highly. Getting ready In order to complete this recipe, you'll have to download the MovieLens database from the University of Minnesota GroupLens page at http://grouplens.org/datasets/movielens/ and unzip it in a working directory where your code will be. We will also use NumPy in this code significantly, so please ensure that you have this numerical analysis package downloaded and ready. Additionally, we will use the load_reviews function from the previous recipes. If you have not had the opportunity to review the appropriate section, please have the code for that function ready. How to do it… To build our matrix factorization model, we'll need to create a wrapper for the predictor that loads the entire dataset into memory. We will perform the following steps: We create the following Recommender class as shown. 
Please note that this class depends on the previously created and discussed load_reviews function: import numpy as np import csv class Recommender(object): def __init__(self, udata): self.udata = udata self.users = None self.movies = None self.reviews = None self.load_dataset() def load_dataset(self): """ Load an index of users & movies as a heap and reviews table as a N x M array where N is the number of users and M is the number of movies. Note that order matters so that we can look up values outside of the matrix! """ self.users = set([]) self.movies = set([]) for review in load_reviews(self.udata): self.users.add(review['userid']) self.movies.add(review['movieid']) self.users = sorted(self.users) self.movies = sorted(self.movies) self.reviews = np.zeros(shape=(len(self.users), len(self.movies))) for review in load_reviews(self.udata): uid = self.users.index(review['userid']) mid = self.movies.index(review['movieid']) self.reviews[uid, mid] = review['rating'] With this defined, we can instantiate a model by typing the following command: data_path = '../data/ml-100k/u.data' model = Recommender(data_path) How it works… Let's go over this code line by line. The instantiation of our recommender requires a path to the u.data file; creates holders for our list of users, movies, and reviews; and then loads the dataset. We need to hold the entire dataset in memory for reasons that we will see later. The basic data structure to perform our matrix factorization on is an N x M matrix where N is the number of users and M is the number of movies. To create this, we will first load all the movies and users into an ordered list so that we can look up the index of the user or movie by its ID. In the case of MovieLens, all of the IDs are contiguous from 1; however, this might not always be the case. It is good practice to have an index lookup table. Otherwise, you will be unable to fetch recommendations from our computation! Once we have our index lookup lists, we create a NumPy array of all zeroes in the size of the length of our users' list by the length of our movies list. Keep in mind that the rows are users and the columns are movies! We then go through the ratings data a second time and then add the value of the rating at the uid, mid index location of our matrix. Note that if a user hasn't rated a movie, their rating is 0. This is important! Print the array out by entering model.reviews, and you should see something as follows: [[ 5. 3. 4. ..., 0. 0. 0.] [ 4. 0. 0. ..., 0. 0. 0.] [ 0. 0. 0. ..., 0. 0. 0.] ..., [ 5. 0. 0. ..., 0. 0. 0.] [ 0. 0. 0. ..., 0. 0. 0.] [ 0. 5. 0. ..., 0. 0. 0.]] There's more… Let's get a sense of how sparse or dense our dataset is by adding the following two methods to the Recommender class: def sparsity(self): """ Report the percent of elements that are zero in the array """ return 1 - self.density() def density(self): """ Return the percent of elements that are nonzero in the array """ nonzero = float(np.count_nonzero(self.reviews)) return nonzero / self.reviews.size Adding these methods to our Recommender class will help us evaluate our recommender, and it will also help us identify recommenders in the future. Print out the results: print "%0.3f%% sparse" % model.sparsity() print "%0.3f%% dense" % model.density() You should see that the MovieLens 100k dataset is 0.937 percent sparse and 0.063 percent dense. This is very important to keep note of along with the size of the reviews dataset. 
Sparsity, which is common to most recommender systems, means that we might be able to use sparse matrix algorithms and optimizations. Additionally, as we begin to save models, this will help us identify the models as we load them from serialized files on the disk. Dumping the SVD-based model to the disk Before we build our model, which will take a long time to train, we should create a mechanism for us to load and dump our model to the disk. If we have a way of saving the parameterization of the factored matrix, then we can reuse our model without having to train it every time we want to use it—this is a very big deal since this model will take hours to train! Luckily, Python has a built-in tool for serializing and deserializing Python objects—the pickle module. How to do it… Update the Recommender class as follows: import pickle class Recommender(object): @classmethod def load(klass, pickle_path): """ Instantiates the class by deserializing the pickle. Note that the object returned may not be an exact match to the code in this class (if it was saved before updates). """ with open(pickle_path, 'rb') as pkl: return pickle.load(pkl) def __init__(self, udata, description=None): self.udata = udata self.users = None self.movies = None self.reviews = None # Descriptive properties self.build_start = None self.build_finish = None self.description = None # Model properties self.model = None self.features = 2 self.steps = 5000 self.alpha = 0.0002 self.beta = 0.02 self.load_dataset() def dump(self, pickle_path): """ Dump the object into a serialized file using the pickle module. This will allow us to quickly reload our model in the future. """ with open(pickle_path, 'wb') as pkl: pickle.dump(self, pkl) How it works… The @classmethod feature is a decorator in Python for declaring a class method instead of an instance method. The first argument that is passed in is the type instead of an instance (which we usually refer to as self). The load class method takes a path to a file on the disk that contains a serialized pickle object, which it then loads using the pickle module. Note that the class that is returned might not be an exact match with the Recommender class at the time you run the code—this is because the pickle module saves the class, including methods and properties, exactly as it was when you dumped it. Speaking of dumping, the dump method provides the opposite functionality, allowing you to serialize the methods, properties, and data to disk in order to be loaded again in the future. To help us identify the objects that we're dumping and loading from disk, we've also added some descriptive properties including a description, some build parameters, and some timestamps to our __init__ function. Training the SVD-based model We're now ready to write our functions that factor our training dataset and build our recommender model. You can see the required functions in this recipe. How to do it… We construct the following functions to train our model. Note that these functions are not part of the Recommender class: def initialize(R, K): """ Returns initial matrices for an N X M matrix, R and K features. :param R: the matrix to be factorized :param K: the number of latent features :returns: P, Q initial matrices of N x K and M x K sizes """ N, M = R.shape P = np.random.rand(N,K) Q = np.random.rand(M,K) return P, Q def factor(R, P=None, Q=None, K=2, steps=5000, alpha=0.0002, beta=0.02): """ Performs matrix factorization on R with given parameters. 
:param R: A matrix to be factorized, dimension N x M :param P: an initial matrix of dimension N x K :param Q: an initial matrix of dimension M x K :param K: the number of latent features :param steps: the maximum number of iterations to optimize in :param alpha: the learning rate for gradient descen :param beta: the regularization parameter :returns: final matrices P and Q """ if not P or not Q: P, Q = initialize(R, K) Q = Q.T rows, cols = R.shape for step in xrange(steps): for i in xrange(rows): for j in xrange(cols): if R[i,j] > 0: eij = R[i,j] - np.dot(P[i,:], Q[:,j]) for k in xrange(K): P[i,k] = P[i,k] + alpha * (2 * eij * Q[k,j] - beta * P[i,k]) Q[k,j] = Q[k,j] + alpha * (2 * eij * P[i,k] - beta * Q[k,j]) e = 0 for i in xrange(rows): for j in xrange(cols): if R[i,j] > 0: e = e + pow(R[i,j] - np.dot(P[i,:], Q[:,j]), 2) for k in xrange(K): e = e + (beta/2) * (pow(P[i,k], 2) + pow(Q[k,j], 2)) if e < 0.001: break return P, Q.T How it works… We discussed the theory and the mathematics of what we are doing in the previous recipe, Building a non-negative matrix factorization model, so let's talk about the code. The initialize function creates two matrices, P and Q, that have a size related to the reviews matrix and the number of features, namely N x K and M x K, where N is the number of users and M is the number of movies. Their values are initialized to random numbers that are between 0.0 and 1.0. The factor function computes P and Q using gradient descent such that the dot product of P and Q is within a mean squared error of less than 0.001 or 5000 steps that have gone by, whichever comes first. Especially note that only values that are greater than 0 are computed. These are the values that we're trying to predict; therefore, we do not want to attempt to match them in our code (otherwise, the model will be trained on zero ratings)! This is also the reason that you can't use NumPy's built-in Singular Value Decomposition (SVD) function, which is np.linalg.svd or np.linalg.solve. There's more… Let's use these factorization functions to build our model and to save the model to disk once it has been built—this way, we can load the model at our convenience using the dump and load methods in the class. Add the following method to the Recommender class: def build(self, output=None): """ Trains the model by employing matrix factorization on training data set, (sparse reviews matrix). The model is the dot product of the P and Q decomposed matrices from the factorization. """ options = { 'K': self.features, 'steps': self.steps, 'alpha': self.alpha, 'beta': self.beta, } self.build_start = time.time() self.P, self.Q = factor(self.reviews, **options) self.model = np.dot(self.P, self.Q.T) self.build_finish = time.time() if output: self.dump(output) This helper function will allow us to quickly build our model. Note that we're also saving P and Q—the parameters of our latent features. This isn't necessary, as our predictive model is the dot product of the two factored matrices. Deciding whether or not to save this information in your model is a trade-off between re-training time (you can potentially start from the current P and Q parameters although you must beware of the overfit) and disk space, as pickle will be larger on the disk with these matrices saved. To build this model and dump the data to the disk, run the following code: model = Recommender(relative_path('../data/ml-100k/u.data')) model.build('reccod.pickle') Warning! This will take a long time to build! 
On a 2013 MacBook Pro with a 2.8 GHz processor, this process took roughly 9 hours 15 minutes and required 23.1 MB of memory; this is not insignificant for most of the Python scripts you might be used to writing! It is not a bad idea to continue through the rest of the recipe before building your model. It is also probably not a bad idea to test your code on a smaller test set of 100 records before moving on to the entire process! Additionally, if you don't have the time to train the model, you can find the pickle module of our model in the errata of this book. Testing the SVD-based model This recipe brings this article on recommendation engines to a close. We use our new nonnegative matrix factorization-based model and take a look at some of the predicted reviews. How to do it… The final step in leveraging our model is to access the predicted reviews for a movie based on our model: def predict_ranking(self, user, movie): uidx = self.users.index(user) midx = self.movies.index(movie) if self.reviews[uidx, midx] > 0: return None return self.model[uidx, midx] How it works… Computing the ranking is relatively easy; we simply need to look up the index of the user and the index of the movie and look up the predicted rating in our model. This is why it is so essential to save an ordered list of the users and movies in our pickle module; this way, if the data changes (we add users or movies) but the change isn't reflected in our model, an exception is raised. Because models are historical predictions and not sensitive to changes in time, we need to ensure that we continually retrain our model with new data. This method also returns None if we know the ranking of the user (for example, it's not a prediction); we'll leverage this in the next step. There's more… To predict the highest-ranked movies, we can leverage the previous function to order the highest predicted rankings for our user: import heapq from operator import itemgetter def top_rated(self, user, n=12): movies = [(mid, self.predict_ranking(user, mid)) for mid in self.movies] return heapq.nlargest(n, movies, key=itemgetter(1)) We can now print out the top-predicted movies that have not been rated by the user: >>> rec = Recommender.load('reccod.pickle') >>> for item in rec.top_rated(234): ... print "%i: %0.3f" % item 814: 4.437 1642: 4.362 1491: 4.361 1599: 4.343 1536: 4.324 1500: 4.323 1449: 4.281 1650: 4.147 1645: 4.135 1467: 4.133 1636: 4.133 1651: 4.132 It's then simply a matter of using the movie ID to look up the movie in our movies database. Summary To learn more about Data Science, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended: Principles of Data Science (https://www.packtpub.com/big-data-and-business-intelligence/principles-data-science) Python Data Science Cookbook (https://www.packtpub.com/big-data-and-business-intelligence/python-data-science-cookbook) R for Data Science (https://www.packtpub.com/big-data-and-business-intelligence/r-data-science) Resources for Article: Further resources on this subject: Big Data[article] Big Data Analysis (R and Hadoop)[article] Visualization of Big Data[article]

Python Design Patterns in Depth: The Factory Pattern

Packt
15 Feb 2016
17 min read
Creational design patterns deal with an object creation [j.mp/wikicrea]. The aim of a creational design pattern is to provide better alternatives for situations where a direct object creation (which in Python happens by the __init__() function [j.mp/divefunc], [Lott14, page 26]) is not convenient. In the Factory design pattern, a client asks for an object without knowing where the object is coming from (that is, which class is used to generate it). The idea behind a factory is to simplify an object creation. It is easier to track which objects are created if this is done through a central function, in contrast to letting a client create objects using a direct class instantiation [Eckel08, page 187]. A factory reduces the complexity of maintaining an application by decoupling the code that creates an object from the code that uses it [Zlobin13, page 30]. Factories typically come in two forms: the Factory Method, which is a method (or in Pythonic terms, a function) that returns a different object per input parameter [j.mp/factorympat]; the Abstract Factory, which is a group of Factory Methods used to create a family of related products [GOF95, page 100], [j.mp/absfpat] (For more resources related to this topic, see here.) Factory Method In the Factory Method, we execute a single function, passing a parameter that provides information about what we want. We are not required to know any details about how the object is implemented and where it is coming from. A real-life example An example of the Factory Method pattern used in reality is in plastic toy construction. The molding powder used to construct plastic toys is the same, but different figures can be produced using different plastic molds. This is like having a Factory Method in which the input is the name of the figure that we want (soldier and dinosaur) and the output is the plastic figure that we requested. The toy construction case is shown in the following figure, which is provided by www.sourcemaking.com [j.mp/factorympat]. A software example The Django framework uses the Factory Method pattern for creating the fields of a form. The forms module of Django supports the creation of different kinds of fields (CharField, EmailField) and customizations (max_length, required) [j.mp/djangofacm]. Use cases If you realize that you cannot track the objects created by your application because the code that creates them is in many different places instead of a single function/method, you should consider using the Factory Method pattern [Eckel08, page 187]. The Factory Method centralizes an object creation and tracking your objects becomes much more easier. Note that it is absolutely fine to create more than one Factory Method, and this is how it is typically done in practice. Each Factory Method logically groups the creation of objects that have similarities. For example, one Factory Method might be responsible for connecting you to different databases (MySQL, SQLite), another Factory Method might be responsible for creating the geometrical object that you request (circle, triangle), and so on. The Factory Method is also useful when you want to decouple an object creation from an object usage. We are not coupled/bound to a specific class when creating an object, we just provide partial information about what we want by calling a function. This means that introducing changes to the function is easy without requiring any changes to the code that uses it [Zlobin13, page 30]. 
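To make the decoupling point concrete, here is a minimal, self-contained sketch of a Factory Method; it is not taken from the book, and the shape classes are purely illustrative:

class Circle:
    def draw(self):
        print('drawing a circle')

class Triangle:
    def draw(self):
        print('drawing a triangle')

def shape_factory(name):
    """Create a shape by name; callers never reference the concrete classes."""
    shapes = {'circle': Circle, 'triangle': Triangle}
    if name not in shapes:
        raise ValueError('Unknown shape: {}'.format(name))
    return shapes[name]()

if __name__ == '__main__':
    shape = shape_factory('circle')  # only the name is known at the call site
    shape.draw()

If Circle is later replaced by a cached or differently constructed implementation, only shape_factory() changes; the calling code stays the same.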
Another use case worth mentioning is related with improving the performance and memory usage of an application. A Factory Method can improve the performance and memory usage by creating new objects only if it is absolutely necessary [Zlobin13, page 28]. When we create objects using a direct class instantiation, extra memory is allocated every time a new object is created (unless the class uses caching internally, which is usually not the case). We can see that in practice in the following code (file id.py), it creates two instances of the same class A and uses the id() function to compare their memory addresses. The addresses are also printed in the output so that we can inspect them. The fact that the memory addresses are different means that two distinct objects are created as follows: class A(object):     pass if __name__ == '__main__':     a = A()     b = A()     print(id(a) == id(b))     print(a, b) Executing id.py on my computer gives the following output:>> python3 id.pyFalse<__main__.A object at 0x7f5771de8f60> <__main__.A object at 0x7f5771df2208> Note that the addresses that you see if you execute the file are not the same as I see because they depend on the current memory layout and allocation. But the result must be the same: the two addresses should be different. There's one exception that happens if you write and execute the code in the Python Read-Eval-Print Loop (REPL) (interactive prompt), but that's a REPL-specific optimization which is not happening normally. Implementation Data comes in many forms. There are two main file categories for storing/retrieving data: human-readable files and binary files. Examples of human-readable files are XML, Atom, YAML, and JSON. Examples of binary files are the .sq3 file format used by SQLite and the .mp3 file format used to listen to music. In this example, we will focus on two popular human-readable formats: XML and JSON. Although human-readable files are generally slower to parse than binary files, they make data exchange, inspection, and modification much more easier. For this reason, it is advised to prefer working with human-readable files, unless there are other restrictions that do not allow it (mainly unacceptable performance and proprietary binary formats). In this problem, we have some input data stored in an XML and a JSON file, and we want to parse them and retrieve some information. At the same time, we want to centralize the client's connection to those (and all future) external services. We will use the Factory Method to solve this problem. The example focuses only on XML and JSON, but adding support for more services should be straightforward. First, let's take a look at the data files. 
The XML file, person.xml, is based on the Wikipedia example [j.mp/wikijson] and contains information about individuals (firstName, lastName, gender, and so on) as follows: <persons>   <person>     <firstName>John</firstName>     <lastName>Smith</lastName>     <age>25</age>     <address>       <streetAddress>21 2nd Street</streetAddress>       <city>New York</city>       <state>NY</state>       <postalCode>10021</postalCode>     </address>     <phoneNumbers>       <phoneNumber type="home">212 555-1234</phoneNumber>       <phoneNumber type="fax">646 555-4567</phoneNumber>     </phoneNumbers>     <gender>       <type>male</type>     </gender>   </person>   <person>     <firstName>Jimy</firstName>     <lastName>Liar</lastName>     <age>19</age>     <address>       <streetAddress>18 2nd Street</streetAddress>       <city>New York</city>       <state>NY</state>       <postalCode>10021</postalCode>     </address>     <phoneNumbers>       <phoneNumber type="home">212 555-1234</phoneNumber>     </phoneNumbers>     <gender>       <type>male</type>     </gender>   </person>   <person>     <firstName>Patty</firstName>     <lastName>Liar</lastName>     <age>20</age>     <address>       <streetAddress>18 2nd Street</streetAddress>       <city>New York</city>       <state>NY</state>       <postalCode>10021</postalCode>     </address>     <phoneNumbers>       <phoneNumber type="home">212 555-1234</phoneNumber>       <phoneNumber type="mobile">001 452-8819</phoneNumber>     </phoneNumbers>     <gender>       <type>female</type>     </gender>   </person> </persons> The JSON file, donut.json, comes from the GitHub account of Adobe [j.mp/adobejson] and contains donut information (type, price/unit i.e. ppu, topping, and so on) as follows: [   {     "id": "0001",     "type": "donut",     "name": "Cake",     "ppu": 0.55,     "batters": {       "batter": [         { "id": "1001", "type": "Regular" },         { "id": "1002", "type": "Chocolate" },         { "id": "1003", "type": "Blueberry" },         { "id": "1004", "type": "Devil's Food" }       ]     },     "topping": [       { "id": "5001", "type": "None" },       { "id": "5002", "type": "Glazed" },       { "id": "5005", "type": "Sugar" },       { "id": "5007", "type": "Powdered Sugar" },       { "id": "5006", "type": "Chocolate with Sprinkles" },       { "id": "5003", "type": "Chocolate" },       { "id": "5004", "type": "Maple" }     ]   },   {     "id": "0002",     "type": "donut",     "name": "Raised",     "ppu": 0.55,     "batters": {       "batter": [         { "id": "1001", "type": "Regular" }       ]     },     "topping": [       { "id": "5001", "type": "None" },       { "id": "5002", "type": "Glazed" },       { "id": "5005", "type": "Sugar" },       { "id": "5003", "type": "Chocolate" },       { "id": "5004", "type": "Maple" }     ]   },   {     "id": "0003",     "type": "donut",     "name": "Old Fashioned",     "ppu": 0.55,     "batters": {       "batter": [         { "id": "1001", "type": "Regular" },         { "id": "1002", "type": "Chocolate" }       ]     },     "topping": [       { "id": "5001", "type": "None" },       { "id": "5002", "type": "Glazed" },       { "id": "5003", "type": "Chocolate" },       { "id": "5004", "type": "Maple" }     ]   } ] We will use two libraries that are part of the Python distribution for working with XML and JSON: xml.etree.ElementTree and json as follows: import xml.etree.ElementTree as etree import json The JSONConnector class parses the JSON file and has a parsed_data() method that returns all data as a 
dictionary (dict). The property decorator is used to make parsed_data() appear as a normal attribute instead of a method, as follows:

class JSONConnector:
    def __init__(self, filepath):
        self.data = dict()
        with open(filepath, mode='r', encoding='utf-8') as f:
            self.data = json.load(f)

    @property
    def parsed_data(self):
        return self.data

The XMLConnector class parses the XML file and has a parsed_data() method that returns the parsed tree (an xml.etree.ElementTree object) as follows:

class XMLConnector:
    def __init__(self, filepath):
        self.tree = etree.parse(filepath)

    @property
    def parsed_data(self):
        return self.tree

The connection_factory() function is a Factory Method. It returns an instance of JSONConnector or XMLConnector depending on the extension of the input file path as follows:

def connection_factory(filepath):
    if filepath.endswith('json'):
        connector = JSONConnector
    elif filepath.endswith('xml'):
        connector = XMLConnector
    else:
        raise ValueError('Cannot connect to {}'.format(filepath))
    return connector(filepath)

The connect_to() function is a wrapper of connection_factory(). It adds exception handling as follows:

def connect_to(filepath):
    factory = None
    try:
        factory = connection_factory(filepath)
    except ValueError as ve:
        print(ve)
    return factory

The main() function demonstrates how the Factory Method design pattern can be used. The first part makes sure that exception handling is effective as follows:

def main():
    sqlite_factory = connect_to('data/person.sq3')

The next part shows how to work with the XML files using the Factory Method. XPath is used to find all person elements that have the last name Liar. For each matched person, the basic name and phone number information are shown as follows:

    xml_factory = connect_to('data/person.xml')
    xml_data = xml_factory.parsed_data
    liars = xml_data.findall(".//{}[{}='{}']".format('person', 'lastName', 'Liar'))
    print('found: {} persons'.format(len(liars)))
    for liar in liars:
        print('first name: {}'.format(liar.find('firstName').text))
        print('last name: {}'.format(liar.find('lastName').text))
        [print('phone number ({}):'.format(p.attrib['type']), p.text)
         for p in liar.find('phoneNumbers')]

The final part shows how to work with the JSON files using the Factory Method.
Here, there's no pattern matching, and therefore the name, price, and topping of all donuts are shown as follows: json_factory = connect_to('data/donut.json')     json_data = json_factory.parsed_data     print('found: {} donuts'.format(len(json_data)))     for donut in json_data:         print('name: {}'.format(donut['name']))         print('price: ${}'.format(donut['ppu']))         [print('topping: {} {}'.format(t['id'], t['type'])) for t         in donut['topping']] For completeness, here is the complete code of the Factory Method implementation (factory_method.py) as follows: import xml.etree.ElementTree as etree import json class JSONConnector:     def __init__(self, filepath):         self.data = dict()         with open(filepath, mode='r', encoding='utf-8') as f:             self.data = json.load(f)     @property     def parsed_data(self):         return self.data class XMLConnector:     def __init__(self, filepath):         self.tree = etree.parse(filepath)     @property     def parsed_data(self):         return self.tree def connection_factory(filepath):     if filepath.endswith('json'):         connector = JSONConnector     elif filepath.endswith('xml'):         connector = XMLConnector     else:         raise ValueError('Cannot connect to {}'.format(filepath))     return connector(filepath) def connect_to(filepath):     factory = None     try:        factory = connection_factory(filepath)     except ValueError as ve:         print(ve)     return factory def main():     sqlite_factory = connect_to('data/person.sq3')     print()     xml_factory = connect_to('data/person.xml')     xml_data = xml_factory.parsed_data     liars = xml_data.findall(".//{}[{}='{}']".format('person',     'lastName', 'Liar'))     print('found: {} persons'.format(len(liars)))     for liar in liars:         print('first name:         {}'.format(liar.find('firstName').text))         print('last name: {}'.format(liar.find('lastName').text))         [print('phone number ({}):'.format(p.attrib['type']),         p.text) for p in liar.find('phoneNumbers')]     print()     json_factory = connect_to('data/donut.json')     json_data = json_factory.parsed_data     print('found: {} donuts'.format(len(json_data)))     for donut in json_data:     print('name: {}'.format(donut['name']))     print('price: ${}'.format(donut['ppu']))     [print('topping: {} {}'.format(t['id'], t['type'])) for t     in donut['topping']] if __name__ == '__main__':     main() Here is the output of this program as follows: >>> python3 factory_method.pyCannot connect to data/person.sq3found: 2 personsfirst name: Jimylast name: Liarphone number (home): 212 555-1234first name: Pattylast name: Liarphone number (home): 212 555-1234phone number (mobile): 001 452-8819found: 3 donutsname: Cakeprice: $0.55topping: 5001 Nonetopping: 5002 Glazedtopping: 5005 Sugartopping: 5007 Powdered Sugartopping: 5006 Chocolate with Sprinklestopping: 5003 Chocolatetopping: 5004 Maplename: Raisedprice: $0.55topping: 5001 Nonetopping: 5002 Glazedtopping: 5005 Sugartopping: 5003 Chocolatetopping: 5004 Maplename: Old Fashionedprice: $0.55topping: 5001 Nonetopping: 5002 Glazedtopping: 5003 Chocolatetopping: 5004 Maple Notice that although JSONConnector and XMLConnector have the same interfaces, what is returned by parsed_data() is not handled in a uniform way. Different codes must be used to work with each connector. 
Although it would be nice to be able to use the same code for all connectors, this is at most times not realistic unless we use some kind of common mapping for the data which is very often provided by external data providers. Assuming that you can use exactly the same code for handling the XML and JSON files, what changes are required to support a third format, for example, SQLite? Find an SQLite file or create your own and try it. As is now, the code does not forbid a direct instantiation of a connector. Is it possible to do this? Try doing it (hint: functions in Python can have nested classes). Summary To learn more about design patterns in depth, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended: Learning Python Design Patterns (https://www.packtpub.com/application-development/learning-python-design-patterns) Learning Python Design Patterns – Second Edition (https://www.packtpub.com/application-development/learning-python-design-patterns-second-edition) Resources for Article:   Further resources on this subject: Recommending Movies at Scale (Python) [article] An In-depth Look at Ansible Plugins [article] Elucidating the Game-changing Phenomenon of the Docker-inspired Containerization Paradigm [article]

Getting Started with etcd

Packt
15 Feb 2016
6 min read
In this article we will cover etcd, CoreOS's central hub of all services that provides a reliable way of storing shared data across cluster machines and monitoring it. In this article, we will cover the following topics: Introducing etcd Reading and writing to etcd from the host machine Reading and writing from an application container Watching etcd changes A TTL (time to live) example use cases of etcd (For more resources related to this topic, see here.) Introducing etcd The etcd function is an open source distributed key value store on a computer network where information is stored on more than one node and data is replicated using the Raft consensus algorithm. The etcd function is used to store the CoreOS cluster service discovery and the shared configuration. The configuration is stored in the write-ahead log and includes the cluster member ID, cluster ID and cluster configuration, and everything else that is put there container applications running in the cluster. The etcd function runs on each cluster's central services role machine, and gracefully handles master election during network partitions and in the event of a loss of the current master. Reading and writing to etcd from the host machine You are going to learn how read and write to ectd from the host machine. We will use both the etcdctl and curl examples here. Logging in to host To login to CoreOS VM, follow these steps: Boot your CoreOS VM installed. In your terminal, type this: $ cdcoreos-vagrant $ vagrant up We need to login to the host via ssh: $ vagrant ssh Reading and writing to ectd Let's read and write to etcd using etcdctl. So, perform these steps: Set with etcdctl a message1 key with Book1 as the value: $ etcdctl set /message1 Book1Book1 (we got respond for our successful write to etcd Now, let's read the key value to double-check whether everything is fine there: $ etcdctl get /message1 Book1 Perfect! Next, let's try to do the same using curl via an HTTP-based API. The curl function is handy for accessing etcd from any place from where you have access to etcd cluster but don't want/need to use the etcdctl client: $ curl -L -X PUT http://127.0.0.1:2379/v2/keys/message2 -d value="Book2" {"action":"set","key":"/message2","prevValue":"Book1","value":"Book2","index":13371} Let's read it: $ curl -L http://127.0.0.1:2379/v2/keys/message2 {"action":"get","node":{"key":"/message2","value":"Book2","modifiedIndex":13371,"createdIndex":13371}} Using the HTTP-based etcd API means that etcd can be read from and written to by client applications without the need to interact with the command line. Now, if we want to delete the key value pair, we type the following command: $ etcdctl rm /message1 $ curl -L -X DELETE http://127.0.0.1:2379/v2/keys/message2 Also, we can add a key value pair to a directory, as directories are created automatically when a key is placed inside. We need only one command to put a key inside a directory: $ etcdctl set /foo-directory/foo-key somekey Let's now check the directory's content: $ etcdctl ls /foo-directory –recursive /foo-directory/foo-key Finally, we get the key value from the directory by typing: $ etcdctl get /foo-directory/foo-key somekey Reading and writing from the application container Usually, application containers (this is a general term for docker, rkt, and other types of containers) do not have etcdctl or even curl installed by default. Installing curl is much easier than installing etcdctl. 
For our example, we will use the AlpineLinux docker image, which is very small in size and will not take much time to pull from docker registry: Firstly, we need to check the docker0 interface IP, which we will use with curl: $ echo"$(ifconfig docker0 | awk'/<inet>/ { print $2}'):2379" 10.1.42.1:2379 Let's run the docker container with a bash shell: $ docker run -it alpine ash We should see something like this in Command Prompt:/ #. As curl is not installed by default on AlpineLinux, we need to install it: $ apk update&&apk add curl $ curl -L http://10.1.42.1:2379/v2/keys/ {"action":"get","node":{"key":"/","dir":true,"nodes":[{"key":"/coreos.com","dir":true,"modifiedIndex":3,"createdIndex":3}]}} Repeat steps 3 and 4 from the previous subtopic so that you understand that it does not matter where you are connecting to etcd from, curl still works in the same way. Press Ctrl +D to exit from the docker container. Watching changes in etcd This time, let's watch the key changes in etcd. Watching key changes is useful when we have, for example, one fleet unit with nginx writing its port to etcd, and another reverse proxy application watching for changes and updating its config: We need to create a directory in etcd first: $ etcdctlmkdir /foo-data Next, we watch for changes in this directory: $ etcdctl watch /foo-data--recursive Now open another CoreOS shell in a new terminal window: $ cdcoreos-vagrant $ vagrantssh We put a new key to /foo-data directory: $ etcdctl set /foo-data/Book is_cool In the first terminal, we should see a notification saying that the key was changed: is_cool A TTL (time to live) examples Sometimes, it is handy to put a time to live (TTL) for a key to expire in a certain amount of time. This is useful, for example,in the case of watching a key with a 60 second TTL, from a reverse proxy. So, if the nginx fleet service has not updated the key, it will expire in 60 seconds and will be removed from etcd. Then the reverse proxy checks for it and does not find it. Hence, it will remove the nginx service from config. Let's set TTL for 30 seconds in this example: Type this in a terminal: $ etcdctl set /foo "I'm Expiring in 30 sec" --ttl 30 I'm Expiring in 30 sec Verify that the key is still there: $ etcdctl get /foo I'm Expiring in 30 sec Check againafter 30 seconds : $ etcdctl get /foo If your requested key has already expired, you will be returned Error: 100: Error: 100: Key not found (/foo) [17053] This time the key got deleted by etcd because we put a TTL of 30 seconds for it. TTL is very handy to use to communicate between different services using etcd as the checking point. Use cases of etcd Application containers running on worker nodes with etcd in proxy mode can read and write to an etcd cluster. Very common etcd use cases are as follows: storing database connection settings, cache settings, and shared settings. For example, the Vulcand proxy server (http://vulcanproxy.com/) uses etcd to store web host connection details, and it becomes available for all cluster-connected worker machines. Another example could be to store a database password for MySQL and retrieve it when running an application container. 
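To make the database-password use case concrete, here is a small Python 3 sketch that talks to the same v2 keys HTTP API demonstrated with curl earlier. The endpoint address (the host's 127.0.0.1:2379; from inside a container you would use the docker0 address shown above) and the key path config/mysql/password are illustrative assumptions, not values from this article:

import json
import urllib.parse
import urllib.request

ETCD_KEYS = 'http://127.0.0.1:2379/v2/keys'  # same endpoint used with curl above

def etcd_set(key, value):
    # PUT /v2/keys/<key> with a form-encoded "value" field, as in the curl example.
    data = urllib.parse.urlencode({'value': value}).encode('utf-8')
    request = urllib.request.Request('{}/{}'.format(ETCD_KEYS, key),
                                     data=data, method='PUT')
    return json.loads(urllib.request.urlopen(request).read().decode('utf-8'))

def etcd_get(key):
    # GET /v2/keys/<key> and read the value out of the returned "node" object.
    response = urllib.request.urlopen('{}/{}'.format(ETCD_KEYS, key))
    return json.loads(response.read().decode('utf-8'))['node']['value']

if __name__ == '__main__':
    etcd_set('config/mysql/password', 's3cr3t')  # store the shared secret once
    print(etcd_get('config/mysql/password'))     # any cluster member can read it back

A container could then pick the value up at start time, for example with docker run -e DB_PASSWORD=$(etcdctl get /config/mysql/password) your-app-image, keeping the secret out of the image itself (the environment variable and image name here are again only placeholders).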
Summary
To learn more about CoreOS Essentials, the following book published by Packt Publishing (https://www.packtpub.com/) is recommended: Learning CoreOS (https://www.packtpub.com/networking-and-servers/learning-coreos).
Resources for Article:
Further resources on this subject: Mastering CentOS 7 Linux Server [article], Linux Shell Scripting [article], What is Kali Linux [article]

Deep learning in R

Packt
15 Feb 2016
12 min read
As the title suggests, in this article, we will be taking a look at some of the deep learning models in R. Some of the pioneering advancements in neural networks research in the last decade have opened up a new frontier in machine learning that is generally called by the name deep learning. The general definition of deep learning is, a class of machine learning techniques, where many layers of information processing stages in hierarchical supervised architectures are exploited for unsupervised feature learning and for pattern analysis/classification. The essence of deep learning is to compute hierarchical features or representations of the observational data, where the higher-level features or factors are defined from lower-level ones. Although there are many similar definitions and architectures for deep learning, two common elements in all of them are: multiple layers of nonlinear information processing and supervised or unsupervised learning of feature representations at each layer from the features learned at the previous layer. The initial works on deep learning were based on multilayer neural network models. Recently, many other forms of models are also used such as deep kernel machines and deep Q-networks. Researchers have experimented with multilayer neural networks even in previous decades. However, two reasons were limiting any progress with learning using such architectures. The first reason is that the learning of parameters of the network is a nonconvex optimization problem and often one gets stuck at poor local minima's starting from random initial conditions. The second reason is that the associated computational requirements were huge. A breakthrough for the first problem came when Geoffrey Hinton developed a fast algorithm for learning a special class of neural networks called deep belief nets (DBN). We will describe DBNs in more detail in the later sections. The high computational power requirements were met with the advancement in computing using general purpose graphical processing units (GPGPUs). What made deep learning so popular for practical applications is the significant improvement in accuracy achieved in automatic speech recognition and computer vision. For example, the word error rate in automatic speech recognition of a switchboard conversational speech had reached a saturation of around 40% after years of research. However, using deep learning, the word error rate was reduced dramatically to close to 10% in a matter of a few years. Another well-known example is how deep convolution neural network achieved the least error rate of 15.3% in the 2012 ImageNet Large Scale Visual Recognition Challenge compared to state-of-the-art methods that gave 26.2% as the least error rate. In this article, we will describe one class of deep learning models called deep belief networks. Interested readers are requested to read the book by Li Deng and Dong Yu for a detailed understanding of various methods and applications of deep learning. We will also illustrate the use of DBN with the R package darch. Restricted Boltzmann machines A restricted Boltzmann machine (RBM) is a two-layer network (bi-partite graph), in which one layer is a visible layer (v) and the second layer is a hidden layer (h). 
All nodes in the visible layer and all nodes in the hidden layer are connected by undirected edges, and there are no connections between nodes in the same layer.

An RBM is characterized by the joint distribution of the states of all visible units v = {v1, v2, ..., vM} and the states of all hidden units h = {h1, h2, ..., hN}, given by:

P(v, h|θ) = exp(-E(v, h|θ)) / Z

Here, E(v, h|θ) is called the energy function, and Z = Σv Σh exp(-E(v, h|θ)) is the normalization constant, known as the partition function in Statistical Physics nomenclature.

There are mainly two types of RBMs. In the first one, both v and h are Bernoulli random variables. In the second type, h is a Bernoulli random variable whereas v is a Gaussian random variable. For a Bernoulli RBM, the energy function is given by:

E(v, h|θ) = -Σi Σj Wij vi hj - Σi bi vi - Σj aj hj

Here, Wij represents the weight of the edge between nodes vi and hj; bi and aj are bias parameters for the visible and hidden layers, respectively. For this energy function, the exact expressions for the conditional probabilities can be derived as follows:

P(hj = 1|v, θ) = σ(Σi Wij vi + aj)
P(vi = 1|h, θ) = σ(Σj Wij hj + bi)

Here, σ(x) is the logistic function 1/(1 + exp(-x)). If the input variables are continuous, one can use the Gaussian RBM; its energy function is given by:

E(v, h|θ) = -Σi Σj Wij vi hj + (1/2) Σi (vi - bi)² - Σj aj hj

Also, in this case, the conditional probabilities of vi and hj become as follows:

P(hj = 1|v, θ) = σ(Σi Wij vi + aj)
P(vi|h, θ) = N(Σj Wij hj + bi, 1)

The latter is a normal distribution with mean Σj Wij hj + bi and variance 1.

Now that we have described the basic architecture of an RBM, how is it trained? If we try to use the standard approach of taking the gradient of the log-likelihood, we get the following update rule:

ΔWij ∝ IEdata(vi hj) - IEmodel(vi hj)

Here, IEdata(vi hj) is the expectation of vi hj computed using the dataset, and IEmodel(vi hj) is the same expectation computed using the model. However, one cannot use this exact expression for updating weights, because IEmodel(vi hj) is difficult to compute.

The first breakthrough to solve this problem, and hence to train deep neural networks, came when Hinton and team proposed an algorithm called Contrastive Divergence (CD). The idea is to approximate IEmodel(vi hj) by using values of vi and hj generated with Gibbs sampling from the conditional distributions mentioned previously. One scheme of doing this is as follows:

Initialize v(t=0) from the dataset.
Find h(t=0) by sampling from the conditional distribution h(t=0) ~ p(h|v(t=0)).
Find v(t=1) by sampling from the conditional distribution v(t=1) ~ p(v|h(t=0)).
Find h(t=1) by sampling from the conditional distribution h(t=1) ~ p(h|v(t=1)).

Once we have the values of v(t=1) and h(t=1), we use (vi(t=1) hj(t=1)), the product of the ith component of v(t=1) and the jth component of h(t=1), as an approximation for IEmodel(vi hj). This is called the CD-1 algorithm. One can generalize this to use the values from the kth step of Gibbs sampling, which is known as the CD-k algorithm.

One can easily see the connection between RBMs and Bayesian inference. Since the CD algorithm is like a posterior density estimate, one could say that RBMs are trained using a Bayesian inference approach. Although the Contrastive Divergence algorithm looks simple, one needs to be very careful in training RBMs, otherwise the model can result in overfitting. Readers who are interested in using RBMs in practical applications should refer to the technical report where this is discussed in detail.
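To make the CD-1 scheme concrete, here is a small illustrative R sketch of a single CD-1 weight update for a tiny Bernoulli RBM on one training vector. The dimensions, random seed, training vector, and learning rate are arbitrary choices for the example, not values from the text:

# One CD-1 update for a tiny Bernoulli RBM (illustrative sketch only).
set.seed(42)
M <- 4; N <- 3                        # number of visible and hidden units
W <- matrix(rnorm(M * N, sd = 0.1), M, N)
b <- rep(0, M); a <- rep(0, N)        # visible and hidden biases
lr <- 0.1                             # learning rate (arbitrary)

sigmoid <- function(x) 1 / (1 + exp(-x))

v0 <- c(1, 0, 1, 1)                   # one training vector from the dataset

# Sample h(t=0) ~ p(h | v(t=0))
ph0 <- sigmoid(t(W) %*% v0 + a)
h0  <- as.numeric(runif(N) < ph0)

# Sample v(t=1) ~ p(v | h(t=0)), then h(t=1) ~ p(h | v(t=1))
pv1 <- sigmoid(W %*% h0 + b)
v1  <- as.numeric(runif(M) < pv1)
ph1 <- sigmoid(t(W) %*% v1 + a)
h1  <- as.numeric(runif(N) < ph1)

# CD-1 update: IEdata(vi hj) approximated by v0*h0, IEmodel(vi hj) by v1*h1
W <- W + lr * (outer(v0, h0) - outer(v1, h1))
b <- b + lr * (v0 - v1)
a <- a + lr * (h0 - h1)

In practice, this update is repeated over many training vectors and epochs; packages such as darch, used below, do this bookkeeping for you.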
Deep belief networks
One can stack several RBMs, one on top of the other, such that the values of the hidden units in layer n-1 (hi,n-1) become the values of the visible units in the nth layer (vi,n), and so on. The resulting network is called a deep belief network (DBN). It was one of the main architectures used in early deep learning networks for pretraining.

The idea of pretraining a NN is the following: in the standard three-layer (input-hidden-output) NN, one can start with random initial values for the weights and, using the backpropagation algorithm, find a good minimum of the log-likelihood function. However, when the number of layers increases, the straightforward application of backpropagation does not work because, starting from the output layer, as we compute the gradient values for the layers deep inside, their magnitude becomes very small. This is called the vanishing gradient problem. As a result, the network gets trapped in some poor local minima. Backpropagation still works if we start from the neighborhood of a good minimum. To achieve this, a DNN is often pretrained in an unsupervised way using a DBN: instead of starting from random values of weights, first train a DBN in an unsupervised way and use the weights from the DBN as the initial weights for a corresponding supervised DNN. It was seen that such DNNs pretrained using DBNs perform much better.

The layer-wise pretraining of a DBN proceeds as follows. Start with the first RBM and train it using the input data in the visible layer and the CD algorithm (or one of its later, better variants). Then, stack a second RBM on top of this; for this RBM, use values sampled from the hidden layer of the first RBM as the values for its visible layer. Continue this process for the desired number of layers. The outputs of the hidden units from the top layer can also be used as inputs for training a supervised model. For this, add a conventional NN layer at the top of the DBN with the desired number of classes as the number of output nodes; the input for this NN would be the output from the top layer of the DBN. This is called the DBN-DNN architecture. Here, the DBN's role is to generate highly efficient features (the output of the top layer of the DBN) automatically from the input data for the supervised NN in the top layer. The architecture of a five-layer DBN-DNN for a binary classification task is shown in the following figure:

The last layer is trained using the backpropagation algorithm in a supervised manner for the two classes c1 and c2. We will illustrate the training and classification with such a DBN-DNN using the darch R package.

The darch R package
The darch package, written by Martin Drees, is one of the R packages with which one can begin doing deep learning in R. It implements the DBN described in the previous section. The package can be downloaded from https://cran.r-project.org/web/packages/darch/index.html.

The main class in the darch package implements deep architectures and provides the ability to train them with Contrastive Divergence and fine-tune them with backpropagation, resilient backpropagation, and conjugate gradients. New instances of the class are created with the newDArch constructor. It is called with the following arguments: a vector containing the number of nodes in each layer, the batch size, a Boolean variable to indicate whether to use the ff package for computing weights and outputs, and the name of the function for generating the weight matrices. Let us create a network having two input units, four hidden units, and one output unit:

> install.packages("darch") # one time
> library(darch)
> darch <- newDArch(c(2,4,1), batchSize = 2, genWeightFunc = generateWeights)
INFO [2015-07-19 18:50:29] Constructing a darch with 3 layers.
INFO [2015-07-19 18:50:29] Generating RBMs.
INFO [2015-07-19 18:50:29] Construct new RBM instance with 2 visible and 4 hidden units. INFO [2015-07-19 18:50:29] Construct new RBM instance with 4 visible and 1 hidden units. Let us train the DBN with a toy dataset. We are using this because for training any realistic examples, it would take a long time, hours if not days. Let us create an input data set containing two columns and four rows: >inputs ← matrix(c(0,0,0,1,1,0,1,1),ncol=2,byrow=TRUE) >outputs ← matrix(c(0,1,1,0),nrow=4) Now, let us pretrain the DBN using the input data: >darch ← preTrainDArch(darch,inputs,maxEpoch=1000) We can have a look at the weights learned at any layer using the getLayerWeights( ) function. Let us see how the hidden layer looks like: >getLayerWeights(darch,index=1) [[1]] [,1] [,2] [,3] [,4] [1,] 8.167022 0.4874743 -7.563470 -6.951426 [2,] 2.024671 -10.7012389 1.313231 1.070006 [3,] -5.391781 5.5878931 3.254914 3.000914 Now, let's do a backpropagation for supervised learning. For this, we need to first set the layer functions to sigmoidUnitDerivatives: >layers ← getLayers(darch) >for(i in length(layers):1){ layers[[i]][[2]] ← sigmoidUnitDerivative } >setLayers(darch) ← layers >rm(layers) Finally, the following two lines perform the backpropagation: >setFineTuneFunction(darch) ← backpropagation >darch ← fineTuneDArch(darch,inputs,outputs,maxEpoch=1000) We can see the prediction quality of DBN on the training data itself by running darch as follows: >darch ← getExecuteFunction(darch)(darch,inputs) >outputs_darch ← getExecOutputs(darch) >outputs_darch[[2]] [,1] [1,] 9.998474e-01 [2,] 4.921130e-05 [3,] 9.997649e-01 [4,] 3.796699e-05 Comparing with the actual output, DBN has predicted the wrong output for the first and second input rows. Since this example was just to illustrate how to use the darch package, we are not worried about the 50% accuracy here. Other deep learning packages in R Although there are some other deep learning packages in R such as deepnet and RcppDL, compared with libraries in other languages such as Cuda (C++) and Theano (Python), R yet does not have good native libraries for deep learning. The only available package is a wrapper for the Java-based deep learning open source project H2O. This R package, h20, allows running H2O via its REST API from within R. Readers who are interested in serious deep learning projects and applications should use H2O using h2o packages in R. One needs to install H2O in your machine to use h2o. Summary We have learned one of the latest advances in neural networks that is called deep learning. It can be used to solve many problems such as computer vision and natural language processing that involves highly cognitive elements. The artificial intelligent systems using deep learning were able to achieve accuracies comparable to human intelligence in tasks such as speech recognition and image classification. To know more about Bayesian modeling in R, check out Learning Bayesian Models with R (https://www.packtpub.com/big-data-and-business-intelligence/learning-bayesian-models-r). You can also check out our other R books, Data Analysis with R (https://www.packtpub.com/big-data-and-business-intelligence/data-analysis-r), and Machine Learning with R - Second Edition (https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-r-second-edition). Resources for Article: Further resources on this subject: Working with Data – Exploratory Data Analysis [article] Big Data Analytics [article] Deep learning in R [article]

Python Design Patterns in Depth: The Singleton Pattern

Packt
15 Feb 2016
14 min read
There are situations where you need to create only one instance of data throughout the lifetime of a program. This can be a class instance, a list, or a dictionary, for example. The creation of a second instance is undesirable. This can result in logical errors or malfunctioning of the program. The design pattern that allows you to create only one instance of data is called singleton. In this article, you will learn about module-level, classic, and borg singletons; you'll also learn about how they work, when to use them, and build a two-threaded web crawler that uses a singleton to access the shared resource. (For more resources related to this topic, see here.) Singleton is the best candidate when the requirements are as follows: Controlling concurrent access to a shared resource If you need a global point of access for the resource from multiple or different parts of the system When you need to have only one object Some typical use cases of using a singleton are: The logging class and its subclasses (global point of access for the logging class to send messages to the log) Printer spooler (your application should only have a single instance of the spooler in order to avoid having a conflicting request for the same resource) Managing a connection to a database File manager Retrieving and storing information on external configuration files Read-only singletons storing some global states (user language, time, time zone, application path, and so on) There are several ways to implement singletons. We will look at module-level singleton, classic singletons, and borg singleton. Module-level singleton All modules are singletons by nature because of Python's module importing steps: Check whether a module is already imported. If yes, return it. If not, find a module, initialize it, and return it. Initializing a module means executing a code, including all module-level assignments. When you import the module for the first time, all of the initializations will be done; however, if you try to import the module for the second time, Python will return the initialized module. Thus, the initialization will not be done, and you get a previously imported module with all of its data. So, if you want to quickly make a singleton, use the following steps and keep the shared data as the module attribute. singletone.py: only_one_var = "I'm only one var" module1.py: import single tone print singleton.only_one_var singletone.only_one_var += " after modification" import module2 module2.py: import singletone print singleton.only_one_var Here, if you try to import a global variable in a singleton module and change its value in the module1 module, module2 will get a changed variable. This function is quick and sometimes is all that you need; however, we need to consider the following points: It's pretty error-prone. For example, if you happen to forget the global statements, variables local to the function will be created and, the module's variables won't be changed, which is not what you want. It's ugly, especially if you have a lot of objects that should remain as singletons. They pollute the module namespace with unnecessary variables. They don't permit lazy allocation and initialization; all global variables will be loaded during the module import process. It's not possible to re-use the code because you can not use the inheritance. No special methods and no object-oriented programming benefits at all. Classic singleton In classic singleton in Python, we check whether an instance is already created. 
If it is created, we return it; otherwise, we create a new instance, assign it to a class attribute, and return it. Let's try to create a dedicated singleton class:

class Singleton(object):
    def __new__(cls):
        if not hasattr(cls, 'instance'):
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance

Here, in the special __new__ method, which is called right before __init__, we check whether we have already created an instance earlier. If not, we create a new instance; otherwise, we return the already created instance. Let's check how it works:

>>> singleton = Singleton()
>>> another_singleton = Singleton()
>>> singleton is another_singleton
True
>>> singleton.only_one_var = "I'm only one var"
>>> another_singleton.only_one_var
I'm only one var

Try to subclass the Singleton class with another one:

class Child(Singleton):
    pass

If it's a successor of Singleton, all of its instances should also be instances of Singleton, thus sharing its state. But this doesn't work, as illustrated in the following code:

>>> child = Child()
>>> child is singleton
False
>>> child.only_one_var
AttributeError: Child instance has no attribute 'only_one_var'

To avoid this situation, the borg singleton is used.

Borg singleton
Borg is also known as monostate. In the borg pattern, all of the instances are different, but they share the same state. In the following code, the shared state is maintained in the _shared_state attribute, and all new instances of the Borg class will get this state, as defined in the __new__ class method:

class Borg(object):
    _shared_state = {}
    def __new__(cls, *args, **kwargs):
        obj = super(Borg, cls).__new__(cls, *args, **kwargs)
        obj.__dict__ = cls._shared_state
        return obj

Generally, Python stores the instance state in the __dict__ dictionary and, when instantiated normally, every instance will have its own __dict__. But here we deliberately assign the class variable _shared_state to all of the created instances. Here is how it works with subclassing:

class Child(Borg):
    pass

>>> borg = Borg()
>>> another_borg = Borg()
>>> borg is another_borg
False
>>> child = Child()
>>> borg.only_one_var = "I'm the only one var"
>>> child.only_one_var
I'm the only one var

So, despite the fact that you can't compare the objects by identity using the is statement, all child objects share the parent's state. If you want to have a class that is a descendant of the Borg class but has a different state, you can reset _shared_state as follows:

class AnotherChild(Borg):
    _shared_state = {}

>>> another_child = AnotherChild()
>>> another_child.only_one_var
AttributeError: AnotherChild instance has no attribute 'only_one_var'

Which type of singleton should be used is up to you. If you expect that your singleton will not be inherited, you can choose the classic singleton; otherwise, it's better to stick with borg.
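A small caveat before we move on to the threaded example that follows: the classic singleton's hasattr check in __new__ is not protected by any lock, so two threads calling Singleton() at exactly the same moment could, in principle, both create an instance. In the crawler below this is not an issue, because the singleton is created in the main thread before the worker threads start. If you ever do need to create a singleton from several threads at once, a hedged sketch of a lock-protected variant (the class name here is made up for illustration) could look like this:

import threading

class ThreadSafeSingleton(object):
    _lock = threading.Lock()

    def __new__(cls):
        # Double-checked locking: take the lock only while no instance exists yet.
        if not hasattr(cls, 'instance'):
            with cls._lock:
                if not hasattr(cls, 'instance'):
                    cls.instance = super(ThreadSafeSingleton, cls).__new__(cls)
        return cls.instance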
Implementation in Python
As a practical example, we'll create a simple web crawler that scans a website you open on it, follows all the links that lead to the same website but to other pages, and downloads all of the images it finds. To do this, we'll need two functions: a function that scans a website for links leading to other pages, to build a set of pages to visit, and a function that scans a page for images and downloads them. To make it quicker, we'll download images in two threads. These two threads should not interfere with each other, so don't scan pages if another thread has already scanned them, and don't download images that are already downloaded. So, a set with downloaded images and scanned web pages will be a shared resource for our application, and we'll keep it in a singleton instance.

In this example, you will need a library for parsing and screen scraping websites named BeautifulSoup and an HTTP client library, httplib2. It should be sufficient to install both with either of the following commands:

$ sudo pip install BeautifulSoup httplib2
$ sudo easy_install BeautifulSoup httplib2

First of all, we'll create a Singleton class. Let's use the classic singleton in this example:

import httplib2
import os
import re
import threading
import urllib
from urlparse import urlparse, urljoin
from BeautifulSoup import BeautifulSoup

class Singleton(object):
    def __new__(cls):
        if not hasattr(cls, 'instance'):
            cls.instance = super(Singleton, cls).__new__(cls)
        return cls.instance

It will return the singleton object to all parts of the code that request it. Next, we'll create a class for creating a thread. In this thread, we'll download images from the website:

class ImageDownloaderThread(threading.Thread):
    """A thread for downloading images in parallel."""
    def __init__(self, thread_id, name, counter):
        threading.Thread.__init__(self)
        self.name = name

    def run(self):
        print 'Starting thread ', self.name
        download_images(self.name)
        print 'Finished thread ', self.name

The following function traverses the website using a BFS algorithm, finds links, and adds them to a set for further downloading. We are able to specify the maximum number of links to follow if the website is too large:

def traverse_site(max_links=10):
    link_parser_singleton = Singleton()

    # While we have pages to parse in queue
    while link_parser_singleton.queue_to_parse:
        # If collected enough links to download images, return
        if len(link_parser_singleton.to_visit) == max_links:
            return

        url = link_parser_singleton.queue_to_parse.pop()

        http = httplib2.Http()
        try:
            status, response = http.request(url)
        except Exception:
            continue

        # Skip if not a web page
        if status.get('content-type') != 'text/html':
            continue

        # Add the link to queue for downloading images
        link_parser_singleton.to_visit.add(url)
        print 'Added', url, 'to queue'

        bs = BeautifulSoup(response)

        for link in BeautifulSoup.findAll(bs, 'a'):
            link_url = link.get('href')

            # <a> tag may not contain href attribute
            if not link_url:
                continue

            parsed = urlparse(link_url)

            # If link follows to external webpage, skip it
            if parsed.netloc and parsed.netloc != parsed_root.netloc:
                continue

            # Construct a full url from a link which can be relative
            link_url = (parsed.scheme or parsed_root.scheme) + '://' + (parsed.netloc or parsed_root.netloc) + parsed.path or ''

            # If link was added previously, skip it
            if link_url in link_parser_singleton.to_visit:
                continue

            # Add a link for further parsing
            link_parser_singleton.queue_to_parse = [link_url] + link_parser_singleton.queue_to_parse

The following function downloads images from the last web resource page in the singleton.to_visit queue and saves them to the images directory.
Here, we use a singleton for synchronizing shared data, which is a set of pages to visit between two threads: def download_images(thread_name):    singleton = Singleton()    # While we have pages where we have not download images    while singleton.to_visit:        url = singleton.to_visit.pop()        http = httplib2.Http()        print thread_name, 'Starting downloading images from', url        try:            status, response = http.request(url)        except Exception:            continue        bs = BeautifulSoup(response)       # Find all <img> tags        images = BeautifulSoup.findAll(bs, 'img')        for image in images:            # Get image source url which can be absolute or relative            src = image.get('src')            # Construct a full url. If the image url is relative,            # it will be prepended with webpage domain.            # If the image url is absolute, it will remain as is            src = urljoin(url, src)            # Get a base name, for example 'image.png' to name file locally            basename = os.path.basename(src)            if src not in singleton.downloaded:                singleton.downloaded.add(src)                print 'Downloading', src                # Download image to local filesystem                urllib.urlretrieve(src, os.path.join('images', basename))        print thread_name, 'finished downloading images from', url Our client code is as follows: if __name__ == '__main__':    root = 'http://python.org'    parsed_root = urlparse(root)    singleton = Singleton()    singleton.queue_to_parse = [root]    # A set of urls to download images from    singleton.to_visit = set()    # Downloaded images    singleton.downloaded = set()    traverse_site()    # Create images directory if not exists    if not os.path.exists('images'):        os.makedirs('images')    # Create new threads    thread1 = ImageDownloaderThread(1, "Thread-1", 1)    thread2 = ImageDownloaderThread(2, "Thread-2", 2)    # Start new Threads    thread1.start()    thread2.start() Run a crawler using the following command: $ python crawler.py You should get the following output (your output may vary because the order in which the threads access resources is not predictable): If you go to the images directory, you will find the downloaded images there. Summary To learn more about design patterns in depth, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended: Learning Python Design Patterns – Second Edition (https://www.packtpub.com/application-development/learning-python-design-patterns-second-edition) Mastering Python Design Patterns (https://www.packtpub.com/application-development/mastering-python-design-patterns) Resources for Article: Further resources on this subject: Python Design Patterns in Depth: The Factory Pattern [Article] Recommending Movies at Scale (Python) [Article] Customizing IPython [Article]

An In-depth Look at Ansible Plugins

Packt
15 Feb 2016
9 min read
In this article by Rishabh Das, author of the book Extending Ansible, we will deep dive into what Ansible plugins are and how you can write your own custom Ansible plugin. The article will discuss the different types of Ansible plugins in detail and explore them on a code level. The article will walk you through the Ansible Python API and using the extension points to write your own Ansible plugins. (For more resources related to this topic, see here.) Lookup plugins Lookup plugins are designed to read data from different sources and feed them to Ansible. The data source can be either from the local filesystem on the controller node or from an external data source. These may also be for file formats that are not natively supported by Ansible. If you decide to write your own lookup plugin, you need to drop it in one of the following directories for Ansible to pick it up during the execution of an Ansible playbook. A directory named lookup_plugins in the project root At ~/.ansible/plugins/lookup_plugins/ At /usr/share/ansible_plugins/lookup_plugins/ By default, a number of lookup plugins are already available in Ansible. Let's discuss some of the commonly used lookup plugins. Lookup plugin – file This is the most basic type of lookup plugin available in Ansible. It reads through the file content on the controller node. The data read from the file can then be fed to the Ansible playbook as a variable. In the most basic form, usage of file lookup is demonstrated in the following Ansible playbook: --- - hosts: all vars: data: "{{ lookup('file', './test-file.txt') }}" tasks: - debug: msg="File contents {{ data }}" The preceding playbook will read data off a local file, test-file.txt, from the playbook's root directory into a data variable. This variable is then fed to the task : debug module, which uses the data variable to print it onscreen. Lookup plugin – csvfile The csvfile lookup plugin was designed to read data from a CSV file on the controller node. This lookup module is designed to take in several parameters, which are discussed in this table: Parameter Default value Description file ansible.csv This is the file to read data from delimiter TAB This is the delimiter used in the CSV file, usually ','. col 1 This is the column number (index) default Empty string This returns this value if the requested key is not found in the CSV file Let's take an example of reading data from the following CSV file. The CSV file contains population and area details of different cities: File: city-data.csv City, Area, Population Pune, 700, 2.5 Million Bangalore, 741, 4.3 Million Mumbai, 603, 12 Million This file lies in the controller node at the root of the Ansible play. To read off data from this file, the csvfile lookup plugin is used. The following Ansible play tries to read the population of Mumbai from the preceding CSV file: Ansible Play – test-csv.yaml --- - hosts: all tasks: - debug: msg="Population of Mumbai is {{lookup('csvfile', 'Mumbai file=city-data.csv delimiter=, col=2')}}"   Lookup plugin – dig The dig lookup plugin can be used to run DNS queries against Fully Qualified Domain Name (FQDN). You can customize the lookup plugin's output using the different flags that are supported by the plugin. In the most basic form, it returns the IP of the given FQDN. This plugin has a dependency on the python-dns package. This should be installed on the controller node. 
The following Ansible play explains how to fetch the TXT record for any FQDN:

---
- hosts: all
  tasks:
    - debug: msg="TXT record {{ lookup('dig', 'yahoo.com./TXT') }}"
    - debug: msg="IP of yahoo.com {{ lookup('dig', 'yahoo.com', wantlist=True) }}"

The preceding Ansible play will fetch the TXT records in the first task and the IPs associated with the FQDN yahoo.com in the second. It is also possible to perform reverse DNS lookups using the dig plugin with the following syntax:

- debug: msg="Reverse DNS for 8.8.8.8 is {{ lookup('dig', '8.8.8.8/PTR') }}"

Lookup plugin – ini
The ini lookup plugin is designed to read data from a .ini file. An INI file, in general, is a collection of key-value pairs under defined sections. The ini lookup plugin supports the following parameters:

type (default: ini): The type of file. It currently supports two formats: ini and property.
file (default: ansible.ini): The name of the file to read data from.
section (default: global): The section of the ini file from which the specified key needs to be read.
re (default: false): If the key is a regular expression, set this to true.
default (default: empty string): The value returned if the requested key is not found in the ini file.

Taking the following ini file as an example, let's try to read some keys using the ini lookup plugin. The file is named network.ini:

[default]
bind_host = 0.0.0.0
bind_port = 9696
log_dir = /var/log/network

[plugins]
core_plugin = rdas-net
firewall = yes

The following Ansible play will read off the keys from the ini file:

---
- hosts: all
  tasks:
    - debug: msg="core plugin {{ lookup('ini', 'core_plugin file=network.ini section=plugins') }}"
    - debug: msg="bind port {{ lookup('ini', 'bind_port file=network.ini section=default') }}"

The ini lookup plugin can also be used to read values from a file that does not contain sections—for instance, a Java property file.

Loops – lookup plugins for iteration
There are times when you need to perform the same task over and over again. It might be the case of installing various dependencies for a package, or multiple inputs that go through the same operation—for instance, checking and starting various services. Just as any other programming language provides a way to iterate over data to perform repetitive tasks, Ansible also provides a clean way to carry out the same operation. The concept is called looping and is provided by Ansible lookup plugins. Loops in Ansible are generally identified as those starting with "with_". Ansible supports a number of looping options. A few of the most commonly used are discussed in the following sections.

Standard loop – with_items
This is the simplest and most commonly used loop in Ansible. It is used to iterate over an item list and perform some operation on it. The following Ansible play demonstrates the use of the with_items lookup loop:

---
- hosts: all
  tasks:
    - name: Install packages
      yum: name={{ item }} state=present
      with_items:
        - vim
        - wget
        - ipython

The with_items lookup loop also supports hashes, whose values you can access using item.<keyname> in the Ansible playbook. The following playbook demonstrates the use of with_items to iterate over a given hash:

---
- hosts: all
  tasks:
    - name: Create directories with specific permissions
      file: path={{ item.dir }} state=directory mode={{ item.mode | int }}
      with_items:
        - { dir: '/tmp/ansible', mode: 755 }
        - { dir: '/tmp/rdas', mode: 755 }

The preceding playbook will create two directories with the specified permission sets. If you look closely at how the mode key is accessed from item, you'll see a | int filter. This is a jinja2 filter, which is used to convert a string to an integer.

Do-until loop – until
This works much as it does in any other programming language: the task executes at least once and keeps executing until a specific condition is met, as shown in the sketch that follows.
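The command, port, and retry counts below are made up for illustration; a minimal sketch of an until loop that keeps polling a service endpoint until it responds looks like this:

---
- hosts: all
  tasks:
    - name: Wait until the service endpoint responds
      shell: curl -sf http://127.0.0.1:9696/
      register: result
      until: result.rc == 0
      retries: 5
      delay: 5

The task is retried up to five times, pausing five seconds between attempts, and stops as soon as the registered result satisfies the until condition.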
Creating your own lookup plugin
This article covered some already available Ansible lookup plugins and explained how they can be used. This section will try to replicate the functionality of the dig lookup to get the IP address of a given FQDN. This will be done without using the dnspython library; it will use Python's basic socket library instead. The following example is only a demonstration of how you can write your own Ansible lookup plugin:

import socket

class LookupModule(object):

    def __init__(self, basedir=None, **kwargs):
        self.basedir = basedir

    def run(self, hostname, inject=None, **kwargs):
        hostname = str(hostname)
        try:
            host_detail = socket.gethostbyname(hostname)
        except Exception:
            host_detail = 'Invalid Hostname'
        return host_detail

The preceding code is a lookup plugin; let's call it hostip. As you can see, there exists a class named LookupModule. Ansible identifies a Python file or module as a lookup plugin only when there exists a class called LookupModule. The module takes an argument, hostname, and checks whether there exists an IP corresponding to it—that is, whether it can be resolved to a valid IP address. If yes, it returns the IP address of the requested FQDN. If not, it returns Invalid Hostname. To use this module, place it in the lookup_plugins directory at the root of the Ansible play. The following playbook demonstrates how you can use the hostip lookup just created:

---
- hosts: all
  tasks:
    - debug: msg="{{ lookup('hostip', item, wantlist=True) }}"
      with_items:
        - www.google.co.in
        - saliux.wordpress.com
        - www.twitter.com

The preceding play will loop through the list of websites and pass each item as an argument to the hostip lookup plugin, which will in turn return the IP associated with the requested domain. If you notice, an argument wantlist=True is also passed when the hostip lookup plugin is called. This is to handle multiple outputs; that is, if there are multiple values associated with the requested domain, the values will be returned as a list, which makes it easy to iterate over the output values.

Summary
This article picked up on how the Ansible Python API for plugins is implemented in various Ansible plugins. The article discussed the different types of plugins in detail, both from the implementation point of view and on a code level. The article also demonstrated how to write sample plugins by creating custom lookup plugins. By the end of this article, you should be able to write your own custom plugin for Ansible.

Resources for Article:
Further resources on this subject: Mastering Ansible – Protecting Your Secrets with Ansible [article], Ansible – An Introduction [article], Getting Started with Ansible [article]

Changing Views

Packt
15 Feb 2016
25 min read
In this article by Christopher Pitt, author of the book React Components, has explained how to change sections without reloading the page. We'll use this knowledge to create the public pages of the website our CMS is meant to control. (For more resources related to this topic, see here.) Location, location, and location Before we can learn about alternatives to reloading pages, let's take a look at how the browser manages reloads. You've probably encountered the window object. It's a global catch-all object for browser-based functionality and state. It's also the default this scope in any HTML page: We've even accessed window before. When we rendered to document.body or used document.querySelector, the window object was assumed. It's the same as if we were to call window.document.querySelector. Most of the time document is the only property we need. That doesn't mean it's the only property useful to us. Try the following, in the console: console.log(window.location); You should see something similar to: Location {     hash: ""     host: "127.0.0.1:3000"     hostname: "127.0.0.1"     href: "http://127.0.0.1:3000/examples/login.html"     origin: "http://127.0.0.1:3000"     pathname: "/examples/login.html"     port: "3000"     ... } If we were trying to work out which components to show based on the browser URL, this would be an excellent place to start. Not only can we read from this object, but we can also write to it: <script>     window.location.href = "http://material-ui.com"; </script> Putting this in an HTML page or entering that line of JavaScript in the console will redirect the browser to material-ui.com. It's the same if you click on the link. And if it's to a different page (than the one the browser is pointing at), then it will cause a full page reload. A bit of history So how does this help us? We're trying to avoid full page reloads, after all. Let's experiment with this object. Let's see what happens when we add something like #page-admin to the URL: Adding #page-admin to the URL leads to the window.location.hash property being populated with the same page. What's more, changing the hash value won't reload the page! It's the same as if we clicked on a link that had that hash in the href attribute. We can modify it without causing full page reloads, and each modification will store a new entry in the browser history. Using this trick, we can step through a number of different "states" without reloading the page, and be able to backtrack each with the browser's back button. Using browser history Let's put this trick to use in our CMS. 
First, let's add a couple functions to our Nav component: export default (props) => {     // ...define class names       var redirect = (event, section) => {         window.location.hash = `#${section}`;         event.preventDefault();     }       return <div className={drawerClassNames}>         <header className="demo-drawer-header">             <img src="images/user.jpg"                  className="demo-avatar" />         </header>         <nav className={navClassNames}>             <a className="mdl-navigation__link"                href="/examples/login.html"                onClick={(e) => redirect(e, "login")}>                 <i className={buttonIconClassNames}                    role="presentation">                     lock                 </i>                 Login             </a>             <a className="mdl-navigation__link"                href="/examples/page-admin.html"                onClick={(e) => redirect(e, "page-admin")}>                 <i className={buttonIconClassNames}                    role="presentation">                     pages                 </i>                 Pages             </a>         </nav>     </div>; }; We add an onClick attribute to our navigation links. We've created a special function that will change window.location.hash and prevent the default full page reload behavior the links would otherwise have caused. This is a neat use of arrow functions, but we're ultimately creating three new functions in each render call. Remember that this can be expensive, so it's best to move the function creation out of render. We'll replace this shortly. It's also interesting to see template strings in action. Instead of "#" + section, we can use `#${section}` to interpolate the section name. It's not as useful in small strings, but becomes increasingly useful in large ones. Clicking on the navigation links will now change the URL hash. 
We can add to this behavior by rendering different components when the navigation links are clicked: import React from "react"; import ReactDOM from "react-dom"; import Component from "src/component"; import Login from "src/login"; import Backend from "src/backend"; import PageAdmin from "src/page-admin";   class Nav extends Component {     render() {         // ...define class names           return <div className={drawerClassNames}>             <header className="demo-drawer-header">                 <img src="images/user.jpg"                      className="demo-avatar" />             </header>             <nav className={navClassNames}>                 <a className="mdl-navigation__link"                    href="/examples/login.html"                    onClick={(e) => this.redirect(e, "login")}>                     <i className={buttonIconClassNames}                        role="presentation">                         lock                     </i>                     Login                 </a>                 <a className="mdl-navigation__link"                    href="/examples/page-admin.html"                    onClick={(e) => this.redirect(e, "page-admin")}>                     <i className={buttonIconClassNames}                        role="presentation">                         pages                     </i>                     Pages                 </a>             </nav>         </div>;     }       redirect(event, section) {         window.location.hash = `#${section}`;           var component = null;           switch (section) {             case "login":                 component = <Login />;                 break;             case "page-admin":                 var backend = new Backend();                 component = <PageAdmin backend={backend} />;                 break;         }           var layoutClassNames = [             "demo-layout",             "mdl-layout",             "mdl-js-layout",             "mdl-layout--fixed-drawer"         ].join(" ");           ReactDOM.render(             <div className={layoutClassNames}>                 <Nav />                 {component}             </div>,             document.querySelector(".react")         );           event.preventDefault();     } };   export default Nav; We've had to convert the Nav function to a Nav class. We want to create the redirect method outside of render (as that is more efficient) and also isolate the choice of which component to render. Using a class also gives us a way to name and reference Nav, so we can create a new instance to overwrite it from within the redirect method. It's not ideal packaging this kind of code within a component, so we'll clean that up in a bit. We can now switch between different sections without full page reloads. There is one problem still to solve. When we use the browser back button, the components don't change to reflect the component that should be shown for each hash. We can solve this in a couple of ways. 
The first approach we can try is checking the hash frequently: componentDidMount() {     var hash = window.location.hash;       setInterval(() => {         if (hash !== window.location.hash) {             hash = window.location.hash;             this.redirect(null, hash.slice(1), true);         }     }, 100); }   redirect(event, section, respondingToHashChange = false) {     if (!respondingToHashChange) {         window.location.hash = `#${section}`;     }       var component = null;       switch (section) {         case "login":             component = <Login />;             break;         case "page-admin":             var backend = new Backend();             component = <PageAdmin backend={backend} />;             break;     }       var layoutClassNames = [         "demo-layout",         "mdl-layout",         "mdl-js-layout",         "mdl-layout--fixed-drawer"     ].join(" ");       ReactDOM.render(         <div className={layoutClassNames}>             <Nav />             {component}         </div>,         document.querySelector(".react")     );       if (event) {         event.preventDefault();     } } Our redirect method has an extra parameter, to apply the new hash whenever we're not responding to a hash change. We've also wrapped the call to event.preventDefault in case we don't have a click event to work with. Other than those changes, the redirect method is the same. We've also added a componentDidMount method, in which we have a call to setInterval. We store the initial window.location.hash and check 10 times a second to see if it has change. The hash value is #login or #page-admin, so we slice the first character off and pass the rest to the redirect method. Try clicking on the different navigation links, and then use the browser back button. The second option is to use the newish pushState and popState methods on the window.history object. They're not very well supported yet, so you need to be careful to handle older browsers or sure you don't need to handle them. You can learn more about pushState and popState at https://developer.mozilla.org/en-US/docs/Web/API/History_API. Using a router Our hash code is functional but invasive. We shouldn't be calling the render method from inside a component (at least not one we own). So instead, we're going to use a popular router to manage this stuff for us. 
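If you do go that route, a rough sketch in plain JavaScript could look like the following; the showSection function and the "login" default are placeholders standing in for whatever rendering logic your app uses:

// A rough sketch of section switching with the History API.
function showSection(section) {
  // render the component that corresponds to `section` here
}

function navigate(section) {
  window.history.pushState({ section: section }, "", "/" + section);
  showSection(section);
}

// Fired when the user presses the back or forward button.
window.addEventListener("popstate", function (event) {
  var state = event.state || { section: "login" };
  showSection(state.section);
});

Unlike the hash-polling approach, there is no timer here: the browser tells us when the history entry changes.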
Download it with the following: $ npm install react-router --save Then we need to join login.html and page-admin.html back into the same file: <!DOCTYPE html> <html>     <head>         <script src="/node_modules/babel-core/browser.js"></script>         <script src="/node_modules/systemjs/dist/system.js"></script>         <script src="https://storage.googleapis.com/code.getmdl.io/1.0.6/material.min.js"></script>         <link rel="stylesheet" href="https://storage.googleapis.com/code.getmdl.io/1.0.6/material.indigo-pink.min.css" />         <link rel="stylesheet" href="https://fonts.googleapis.com/icon?family=Material+Icons" />         <link rel="stylesheet" href="admin.css" />     </head>     <body class="         mdl-demo         mdl-color--grey-100         mdl-color-text--grey-700         mdl-base">         <div class="react"></div>         <script>             System.config({                 "transpiler": "babel",                 "map": {                     "react": "/examples/react/react",                     "react-dom": "/examples/react/react-dom",                     "router": "/node_modules/react-router/umd/ReactRouter"                 },                 "baseURL": "../",                 "defaultJSExtensions": true             });               System.import("examples/admin");         </script>     </body> </html> Notice how we've added the ReactRouter file to the import map? We'll use that in admin.js. First, let's define our layout component: var App = function(props) {     var layoutClassNames = [         "demo-layout",         "mdl-layout",         "mdl-js-layout",         "mdl-layout--fixed-drawer"     ].join(" ");       return (         <div className={layoutClassNames}>             <Nav />             {props.children}         </div>     ); }; This creates the page layout we've been using and allows a dynamic content component. Every React component has a this.props.children property (or props.children in the case of a stateless component), which is an array of nested components. For example, consider this component: <App>     <Login /> </App> Inside the App component, this.props.children will be an array with a single item—an instance of the Login. Next, we'll define handler components for the two sections we want to route: var LoginHandler = function() {     return <Login />; }; var PageAdminHandler = function() {     var backend = new Backend();     return <PageAdmin backend={backend} />; }; We don't really need to wrap Login in LoginHandler but I've chosen to do it to be consistent with PageAdminHandler. PageAdmin expects an instance of Backend, so we have to wrap it as we see in this example. Now we can define routes for our CMS: ReactDOM.render(     <Router history={browserHistory}>         <Route path="/" component={App}>             <IndexRoute component={LoginHandler} />             <Route path="login" component={LoginHandler} />             <Route path="page-admin" component={PageAdminHandler} />         </Route>     </Router>,     document.querySelector(".react") ); There's a single root route, for the path /. It creates an instance of App, so we always get the same layout. Then we nest a login route and a page-admin route. These create instances of their respective components. We also define an IndexRoute so that the login page will be displayed as a landing page. 
We need to remove our custom history code from Nav: import React from "react"; import ReactDOM from "react-dom"; import { Link } from "router";   export default (props) => {     // ...define class names       return <div className={drawerClassNames}>         <header className="demo-drawer-header">             <img src="images/user.jpg"                  className="demo-avatar" />         </header>         <nav className={navClassNames}>             <Link className="mdl-navigation__link" to="login">                 <i className={buttonIconClassNames}                    role="presentation">                     lock                 </i>                 Login             </Link>             <Link className="mdl-navigation__link" to="page-admin">                 <i className={buttonIconClassNames}                    role="presentation">                     pages                 </i>                 Pages             </Link>         </nav>     </div>; }; And since we no longer need a separate redirect method, we can convert the class back into a statement component (function). Notice we've swapped anchor components for a new Link component. This interacts with the router to show the correct section when we click on the navigation links. We can also change the route paths without needing to update this component (unless we also change the route names). Creating public pages Now that we can easily switch between CMS sections, we can use the same trick to show the public pages of our website. Let's create a new HTML page just for these: <!DOCTYPE html> <html>     <head>         <script src="/node_modules/babel-core/browser.js"></script>         <script src="/node_modules/systemjs/dist/system.js"></script>     </head>     <body>         <div class="react"></div>         <script>             System.config({                 "transpiler": "babel",                 "map": {                     "react": "/examples/react/react",                     "react-dom": "/examples/react/react-dom",                     "router": "/node_modules/react-router/umd/ReactRouter"                 },                 "baseURL": "../",                 "defaultJSExtensions": true             });               System.import("examples/index");         </script>     </body> </html> This is a reduced form of admin.html without the material design resources. I think we can ignore the appearance of these pages for the moment, while we focus on the navigation. The public pages are almost 100%, so we can use stateless components for them. Let's begin with the layout component: var App = function(props) {     return (         <div className="layout">             <Nav pages={props.route.backend.all()} />             {props.children}         </div>     ); }; This is similar to the App admin component, but it also has a reference to a Backend. 
We define that when we render the components: var backend = new Backend(); ReactDOM.render(     <Router history={browserHistory}>         <Route path="/" component={App} backend={backend}>             <IndexRoute component={StaticPage} backend={backend} />             <Route path="pages/:page" component={StaticPage} backend={backend} />         </Route>     </Router>,     document.querySelector(".react") ); For this to work, we also need to define a StaticPage: var StaticPage = function(props) {     var id = props.params.page || 1;     var backend = props.route.backend;       var pages = backend.all().filter(         (page) => {             return page.id == id;         }     );       if (pages.length < 1) {         return <div>not found</div>;     }       return (         <div className="page">             <h1>{pages[0].title}</h1>             {pages[0].content}         </div>     ); }; This component is more interesting. We access the params property, which is a map of all the URL path parameters defined for this route. We have :page in the path (pages/:page), so when we go to pages/1, the params object is {"page":1}. We also pass a Backend to Page, so we can fetch all pages and filter them by page.id. If no page.id is provided, we default to 1. After filtering, we check to see if there are any pages. If not, we return a simple not found message. Otherwise, we render the content of the first page in the array (since we expect the array to have a length of at least 1). We now have a page for the public pages of the website: Summary In this article, we learned about how the browser stores URL history and how we can manipulate it to load different sections without full page reloads. Resources for Article:   Further resources on this subject: Introduction to Akka [article] An Introduction to ReactJs [article] ECMAScript 6 Standard [article]

Modular Programming in ECMAScript 6

Packt
15 Feb 2016
18 min read
Modular programming is one of the most important and frequently used software design techniques. Unfortunately, JavaScript didn't support modules natively that lead JavaScript programmers to use alternative techniques to achieve modular programming in JavaScript. But now, ES6 brings modules into JavaScript officially. This article is all about how to create and import JavaScript modules. In this article, we will first learn how the modules were created earlier, and then we will jump to the new built-in module system that was introduced in ES6, known as the ES6 modules. In this article, we'll cover: What is modular programming? The benefits of modular programming The basics of IIFE modules, AMD, UMD, and CommonJS Creating and importing the ES6 modules The basics of the Modular Loader Creating a basic JavaScript library using modules (For more resources related to this topic, see here.) The JavaScript modules in a nutshell The practice of breaking down programs and libraries into modules is called modular programming. In JavaScript, a module is a collection of related objects, functions, and other components of a program or library that are wrapped together and isolated from the scope of the rest of the program or library. A module exports some variables to the outside program to let it access the components wrapped by the module. To use a module, a program needs to import the module and the variables exported by the module. A module can also be split into further modules called as its submodules, thus creating a module hierarchy. Modular programming has many benefits. Some benefits are: It keeps our code both cleanly separated and organized by splitting into multiple modules Modular programming leads to fewer global variables, that is, it eliminates the problem of global variables, because modules don't interface via the global scope, and each module has its own scope Makes code reusability easier as importing and using the same modules in different projects is easier It allows many programmers to collaborate on the same program or library, by making each programmer to work on a particular module with a particular functionality Bugs in an application can easily be easily identified as they are localized to a particular module Implementing modules – the old way Before ES6, JavaScript had never supported modules natively. Developers used other techniques and third-party libraries to implement modules in JavaScript. Using Immediately-invoked function expression (IIFE), Asynchronous Module Definition (AMD), CommonJS, and Universal Module Definition (UMD) are various popular ways of implementing modules in ES5. As these ways were not native to JavaScript, they had several problems. Let's see an overview of each of these old ways of implementing modules. The Immediately-Invoked Function Expression The IIFE is used to create an anonymous function that invokes itself. Creating modules using IIFE is the most popular way of creating modules. Let's see an example of how to create a module using IIFE: //Module Starts (function(window){   var sum = function(x, y){     return x + y;   }     var sub = function(x, y){     return x - y;   }   var math = {     findSum: function(a, b){       return sum(a,b);     },     findSub: function(a, b){       return sub(a, b);     }   }   window.math = math; })(window) //Module Ends console.log(math.findSum(1, 2)); //Output "3" console.log(math.findSub(1, 2)); //Output "-1" Here, we created a module using IIFE. 
The sum and sub variables are internal to the module and not visible outside of it. The math variable is exported by the module to the main program to expose the functionality it provides. This module works completely independently of the program, and can be imported by any other program by simply copying it into the source code, or by including it as a separate file. A library using an IIFE, such as jQuery, wraps all of its APIs in a single IIFE module. When a program uses the jQuery library, it automatically imports the module.

Asynchronous Module Definition

AMD is a specification for implementing modules in the browser. AMD is designed with the browser's limitations in mind, that is, it imports modules asynchronously to prevent blocking the loading of a webpage. As AMD is not a native browser specification, we need to use an AMD library. RequireJS is the most popular AMD library. Let's see an example of how to create and import modules using RequireJS. According to the AMD specification, every module needs to be represented by a separate file. So first, create a file named math.js that represents a module. Here is the sample code that will be inside the module:

define(function(){
  var sum = function(x, y){
    return x + y;
  }
  var sub = function(x, y){
    return x - y;
  }
  var math = {
    findSum: function(a, b){
      return sum(a, b);
    },
    findSub: function(a, b){
      return sub(a, b);
    }
  }
  return math;
});

Here, the module exports the math variable to expose its functionality. Now, let's create a file named index.js, which acts as the main program that imports the module and its exported variables. Here is the code that will be inside the index.js file:

require(["math"], function(math){
  console.log(math.findSum(1, 2)); //Output "3"
  console.log(math.findSub(1, 2)); //Output "-1"
})

Here, "math" in the first parameter is the name of the file that is treated as the AMD module. The .js extension is added to the file name automatically by RequireJS. The math variable in the second parameter references the exported value. The module is imported asynchronously, and the callback is also executed asynchronously.

CommonJS

CommonJS is a specification for implementing modules in Node.js. According to the CommonJS specification, every module needs to be represented by a separate file. The CommonJS modules are imported synchronously. Let's see an example of how to create and import modules using CommonJS. First, we will create a file named math.js that represents a module. Here is a sample of the code that will be inside the module:

var sum = function(x, y){
  return x + y;
}
var sub = function(x, y){
  return x - y;
}
var math = {
  findSum: function(a, b){
    return sum(a, b);
  },
  findSub: function(a, b){
    return sub(a, b);
  }
}
exports.math = math;

Here, the module exports the math variable to expose its functionality. Now, let's create a file named index.js, which acts as the main program that imports the module. Here is the code that will be inside the index.js file:

var math = require("./math").math;

console.log(math.findSum(1, 2)); //Output "3"
console.log(math.findSub(1, 2)); //Output "-1"

Here, "./math" is the path of the file that is treated as the module. The .js extension is added to the file name automatically by CommonJS.

Universal Module Definition

We saw three different specifications for implementing modules. These three specifications have their own respective ways of creating and importing modules.
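To make the contrast concrete, here is a quick side-by-side sketch of how a program would consume the same math module under each of the three approaches shown above. The module files and paths are the ones from the earlier examples; treat this only as an illustrative recap:

//IIFE: math was attached to the global object by the module, so we use it directly
console.log(math.findSum(1, 2)); //Output "3"

//AMD (RequireJS): the module is loaded asynchronously by name
require(["math"], function(math){
  console.log(math.findSum(1, 2)); //Output "3"
});

//CommonJS (Node.js): the module is loaded synchronously from the filesystem
var math = require("./math").math;
console.log(math.findSum(1, 2)); //Output "3"

Each style assumes a particular loader at authoring time, which is exactly the problem that UMD sets out to solve.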
Wouldn't it have been great if we could create modules that can be imported as an IIFE, AMD, or CommonJS module? UMD is a set of techniques used to create modules that can be imported as an IIFE, CommonJS, or AMD module. Therefore, a program can now import third-party modules, irrespective of which module specification it is using. The most popular UMD technique is returnExports. According to the returnExports technique, every module needs to be represented by a separate file. So, let's create a file named math.js that represents a module. Here is the sample code that will be inside the module:

(function (root, factory) {
  //Environment Detection
  if (typeof define === 'function' && define.amd) {
    define([], factory);
  } else if (typeof exports === 'object') {
    module.exports = factory();
  } else {
    root.returnExports = factory();
  }
}(this, function () {
  //Module Definition
  var sum = function(x, y){
    return x + y;
  }
  var sub = function(x, y){
    return x - y;
  }
  var math = {
    findSum: function(a, b){
      return sum(a, b);
    },
    findSub: function(a, b){
      return sub(a, b);
    }
  }
  return math;
}));

Now, you can successfully import the math.js module any way that you wish, for instance, by using CommonJS, RequireJS, or an IIFE.

Implementing modules – the new way

ES6 introduced a new module system called ES6 modules. The ES6 modules are supported natively and can therefore be referred to as the standard JavaScript modules. You should consider using ES6 modules instead of the old ways, because they have neater syntax, better performance, and many new APIs are likely to be packaged as ES6 modules. Let's have a look at the ES6 modules in detail.

Creating the ES6 modules

Every ES6 module needs to be represented by a separate .js file. An ES6 module can contain any JavaScript code, and it can export any number of variables. A module can export a variable, function, class, or any other entity. We need to use the export statement in a module to export variables. The export statement comes in many different formats. Here are the formats:

export {variableName};
export {variableName1, variableName2, variableName3};
export {variableName as myVariableName};
export {variableName1 as myVariableName1, variableName2 as myVariableName2};
export {variableName as default};
export {variableName as default, variableName1 as myVariableName1, variableName2};
export default function(){};
export {variableName1, variableName2} from "myAnotherModule";
export * from "myAnotherModule";

Here are the differences in these formats:

The first format exports a variable.
The second format is used to export multiple variables.
The third format is used to export a variable with another name, that is, an alias.
The fourth format is used to export multiple variables with different names.
The fifth format uses default as the alias. We will find out the use of this later in this article.
The sixth format is similar to the fourth format, but it also has the default alias.
The seventh format works like the fifth format, but here you can place an expression instead of a variable name.
The eighth format is used to export the exported variables of a submodule.
The ninth format is used to export all the exported variables of a submodule.

Here are some important things that you need to know about the export statement:

An export statement can be used anywhere in a module. It's not compulsory to use it at the end of the module.
There can be any number of export statements in a module.
You cannot export variables on demand. For example, placing the export statement in an if…else condition throws an error. Therefore, we can say that the module structure needs to be static, that is, the exports can be determined at compile time.
You cannot export the same variable name or alias multiple times, but you can export a variable multiple times with different aliases.
All the code inside a module is executed in the strict mode by default.
The values of the exported variables can be changed inside the module that exported them.

Importing the ES6 modules

To import a module, we need to use the import statement. The import statement comes in many different formats. Here are the formats:

import x from "module-relative-path";
import {x} from "module-relative-path";
import {x1 as x2} from "module-relative-path";
import {x1, x2} from "module-relative-path";
import {x1, x2 as x3} from "module-relative-path";
import x, {x1, x2} from "module-relative-path";
import "module-relative-path";
import * as x from "module-relative-path";
import x1, * as x2 from "module-relative-path";

An import statement consists of two parts: the variable names we want to import and the relative path of the module. Here are the differences in these formats:

In the first format, the default alias is imported. The x is an alias of the default alias.
In the second format, the x variable is imported.
The third format is the same as the second format, except that x2 is an alias of x1.
In the fourth format, we import the x1 and x2 variables.
In the fifth format, we import the x1 and x2 variables. The x3 is an alias of the x2 variable.
In the sixth format, we import the x1 and x2 variables, and the default alias. The x is an alias of the default alias.
In the seventh format, we just import the module. We do not import any of the variables exported by the module.
In the eighth format, we import all the exported variables and wrap them in an object called x. Even the default alias is imported.
The ninth format is similar to the eighth format, but here the default alias is also given its own alias, x1, in addition to the namespace object, x2.

Here are some important things that you need to know about the import statement:

While importing a variable, if we import it with an alias, then we have to refer to it by the alias and not the actual variable name, that is, the actual variable name will not be visible, only the alias will be visible.
The import statement doesn't import a copy of the exported variables; rather, it makes the variables available in the scope of the program that imports them. Therefore, if you make a change to an exported variable inside the module, then the change is visible to the program that imports it.
The imported variables are read-only, that is, you cannot reassign them outside of the scope of the module that exports them.
A module can only be imported once in a single instance of a JavaScript engine. If we try to import it again, the already imported instance of the module will be used.
We cannot import modules on demand. For example, placing the import statement in an if…else condition throws an error. Therefore, we can say that the imports must also be determinable at compile time.
The ES6 imports are faster than the AMD and CommonJS imports, because the ES6 imports are supported natively and because imports and exports are not decided on demand. This makes it easier for the JavaScript engine to optimize performance.
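As a quick illustration of a couple of these export and import forms working together, here is a minimal sketch. The file names settings.js and app.js, and everything in them, are hypothetical and chosen just for this example:

//settings.js
var appName = "Demo";
var version = "1.0";
export {appName, version as appVersion};

//app.js
import {appName, appVersion} from "./settings";
console.log(appName);    //Output "Demo"
console.log(appVersion); //Output "1.0"

Here, version is exported under the alias appVersion, so app.js must refer to it as appVersion. Also, reassigning appName or appVersion inside app.js would throw an error, because imported variables are read-only.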
The module loader

A module loader is a component of a JavaScript engine that is responsible for importing modules. The import statement uses the built-in module loader to import modules. The built-in module loaders of the different JavaScript environments use different module loading mechanisms. For example, when we import a module in JavaScript running in a browser, the module is loaded from the server. On the other hand, when we import a module in Node.js, the module is loaded from the filesystem. The module loader loads modules in different ways in different environments to optimize performance. For example, in browsers, the module loader loads and executes modules asynchronously in order to prevent module imports from blocking the loading of a webpage.

You can programmatically interact with the built-in module loader using the module loader API to customize its behavior, intercept module loading, and fetch modules on demand. We can also use this API to create our own custom module loaders. The module loader is not specified in ES6. It is a separate standard, controlled by the WHATWG browser standards group. You can find the specifications of the module loader at http://whatwg.github.io/loader/. The ES6 specification only specifies the import and export statements.

Using modules in browsers

The code inside the <script> tag doesn't support the import statement, because the tag's synchronous nature is incompatible with the asynchronicity of modules in browsers. Instead, you need to use the new <module> tag to import modules. Using the new <module> tag, we can define a script as a module. This module can then import other modules using the import statement. If you want to import a module using the <script> tag, then you have to use the module loader API. The specifications of the <module> tag are not part of ES6.

Using modules in the eval() function

You cannot use the import and export statements in the eval() function. To import modules in the eval() function, you need to use the module loader API.

The default exports vs. the named exports

When we export a variable with the default alias, it's called a default export. Obviously, there can only be one default export in a module, as an alias can be used only once. All the exports other than the default export are called named exports. It's recommended that a module use either a default export or named exports; it's not a good practice to use both together. The default export is used when we want to export only one variable. On the other hand, the named exports are used when we want to export multiple variables.

Diving into an example

Let's create a basic JavaScript library using the ES6 modules. This will help us understand how to use the import and export statements. We will also learn how a module can import other modules. The library that we will create is going to be a math library, which provides basic logarithmic and trigonometric functions. Let's get started with creating our library:

Create a file named math.js, and a directory named math_modules.
Inside the math_modules directory, create two files named logarithm.js and trigonometry.js, respectively.

Here, the math.js file is the root module, whereas the logarithm.js and the trigonometry.js files are its submodules.
Place this code inside the logarithm.js file:

var LN2 = Math.LN2;
var LN10 = Math.LN10;

function getLN2() {
  return LN2;
}

function getLN10() {
  return LN10;
}

export {getLN2, getLN10};

Here, the module exports the getLN2 and getLN10 functions as named exports. It's preferred that the low-level modules in a module hierarchy export all of their variables separately, because a program may need just one exported variable of a library. In that case, the program can import this module and the particular function directly. Loading all the modules when you need just one module is a bad idea in terms of performance.

Similarly, place this code in the trigonometry.js file:

var cos = Math.cos;
var sin = Math.sin;

function getSin(value) {
  return sin(value);
}

function getCos(value) {
  return cos(value);
}

export {getCos, getSin};

Here we do something similar. Place this code inside the math.js file, which acts as the root module:

import * as logarithm from "math_modules/logarithm";
import * as trigonometry from "math_modules/trigonometry";

export default {
  logarithm: logarithm,
  trigonometry: trigonometry
}

It doesn't contain any library functions. Instead, it makes it easy for a program to import the complete library. It imports its submodules, and then exports their exported variables to the main program. In case the logarithm.js and trigonometry.js scripts depend on other submodules, the math.js module shouldn't import those submodules, because logarithm.js and trigonometry.js already import them.

Here is the code with which a program can import the complete library:

import math from "math";

console.log(math.trigonometry.getSin(3));
console.log(math.logarithm.getLN2());

Summary

In this article, we saw what modular programming is and learned different modular programming specifications. We also saw different ways to create modules using JavaScript. Technologies such as the IIFE, CommonJS, AMD, UMD, and the ES6 modules were covered. Finally, we created a basic library using the modular programming design technique. Now, you should be confident enough to build JavaScript apps using the ES6 modules.

To learn more about ECMAScript and JavaScript, the following books published by Packt Publishing (https://www.packtpub.com/) are recommended:

JavaScript at Scale (https://www.packtpub.com/web-development/javascript-scale)
Google Apps Script for Beginners (https://www.packtpub.com/web-development/google-apps-script-beginners)
Learning TypeScript (https://www.packtpub.com/web-development/learning-typescript)
JavaScript Concurrency (https://www.packtpub.com/web-development/javascript-concurrency)

You can also watch out for an upcoming title, Mastering JavaScript Object-Oriented Programming, on Packt Publishing's website at https://www.packtpub.com/web-development/mastering-javascript-object-oriented-programming.

Resources for Article:

Further resources on this subject:

Concurrency Principles [article]
Using Client Methods [article]
HTML5 APIs [article]

Searching Your Data

Packt
12 Feb 2016
22 min read
In this article by Rafał Kuć and Marek Rogozinski the authors of this book Elasticsearch Server Third Edition, we dived into Elasticsearch indexing. We learned a lot when it comes to data handling. We saw how to tune Elasticsearch schema-less mechanism and we now know how to create our own mappings. We also saw the core types of Elasticsearch and we used analyzers – both the one that comes out of the box with Elasticsearch and the one we define ourselves. We used bulk indexing, and we added additional internal information to our indices. Finally, we learned what segment merging is, how we can fine tune it, and how to use routing in Elasticsearch and what it gives us. This article is fully dedicated to querying. By the end of this article, you will have learned the following topics: How to query Elasticsearch Using the script process Understanding the querying process (For more resources related to this topic, see here.) Querying Elasticsearch So far, when we searched our data, we used the REST API and a simple query or the GET request. Similarly, when we were changing the index, we also used the REST API and sent the JSON-structured data to Elasticsearch. Regardless of the type of operation we wanted to perform, whether it was a mapping change or document indexation, we used JSON structured request body to inform Elasticsearch about the operation details. A similar situation happens when we want to send more than a simple query to Elasticsearch we structure it using the JSON objects and send it to Elasticsearch in the request body. This is called the query DSL. In a broader view, Elasticsearch supports two kinds of queries: basic ones and compound ones. Basic queries, such as the term query, are used for querying the actual data. The second type of query is the compound query, such as the bool query, which can combine multiple queries. However, this is not the whole picture. In addition to these two types of queries, certain queries can have filters that are used to narrow down your results with certain criteria. Filter queries don't affect scoring and are usually very efficient and easily cached. To make it even more complicated, queries can contain other queries (don't worry; we will try to explain all this!). Furthermore, some queries can contain filters and others can contain both queries and filters. Although this is not everything, we will stick with this working explanation for now. The example data If not stated otherwise, the following mappings will be used for the rest of the article: { "book" : { "properties" : { "author" : { "type" : "string" }, "characters" : { "type" : "string" }, "copies" : { "type" : "long", "ignore_malformed" : false }, "otitle" : { "type" : "string" }, "tags" : { "type" : "string", "index" : "not_analyzed" }, "title" : { "type" : "string" }, "year" : { "type" : "long", "ignore_malformed" : false, "index" : "analyzed" }, "available" : { "type" : "boolean" } } } } The preceding mappings represent a simple library and were used to create the library index. One thing to remember is that Elasticsearch will analyze the string based fields if we don't configure it differently. 
The preceding mappings were stored in the mapping.json file and in order to create the mentioned library index we can use the following commands: curl -XPOST 'localhost:9200/library' curl -XPUT 'localhost:9200/library/book/_mapping' -d @mapping.json We also used the following sample data as the example ones for this article: { "index": {"_index": "library", "_type": "book", "_id": "1"}} { "title": "All Quiet on the Western Front","otitle": "Im Westen nichts Neues","author": "Erich Maria Remarque","year": 1929,"characters": ["Paul Bäumer", "Albert Kropp", "Haie Westhus", "Fredrich Müller", "Stanislaus Katczinsky", "Tjaden"],"tags": ["novel"],"copies": 1, "available": true, "section" : 3} { "index": {"_index": "library", "_type": "book", "_id": "2"}} { "title": "Catch-22","author": "Joseph Heller","year": 1961,"characters": ["John Yossarian", "Captain Aardvark", "Chaplain Tappman", "Colonel Cathcart", "Doctor Daneeka"],"tags": ["novel"],"copies": 6, "available" : false, "section" : 1} { "index": {"_index": "library", "_type": "book", "_id": "3"}} { "title": "The Complete Sherlock Holmes","author": "Arthur Conan Doyle","year": 1936,"characters": ["Sherlock Holmes","Dr. Watson", "G. Lestrade"],"tags": [],"copies": 0, "available" : false, "section" : 12} { "index": {"_index": "library", "_type": "book", "_id": "4"}} { "title": "Crime and Punishment","otitle": "Преступлéние и наказáние","author": "Fyodor Dostoevsky","year": 1886,"characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],"tags": [],"copies": 0, "available" : true} We stored our sample data in the documents.json file, and we use the following command to index it: curl -s -XPOST 'localhost:9200/_bulk' --data-binary @documents.json A simple query The simplest way to query Elasticsearch is to use the URI request query. For example, to search for the word crime in the title field, you could send a query using the following command: curl -XGET 'localhost:9200/library/book/_search?q=title:crime&pretty' This is a very simple, but limited, way of submitting queries to Elasticsearch. If we look from the point of view of the Elasticsearch query DSL, the preceding query is the query_string query. It searches for the documents that have the term crime in the title field and can be rewritten as follows: { "query" : { "query_string" : { "query" : "title:crime" } } } Sending a query using the query DSL is a bit different, but still not rocket science. We send the GET (POST is also accepted in case your tool or library doesn't allow sending request body in HTTP GET requests) HTTP request to the _search REST endpoint as earlier and include the query in the request body. Let's take a look at the following command: curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "query" : { "query_string" : { "query" : "title:crime" } } }' As you can see, we used the request body (the -d switch) to send the whole JSON-structured query to Elasticsearch. The pretty request parameter tells Elasticsearch to structure the response in such a way that we humans can read it more easily. 
In response to the preceding command, we get the following output: { "took" : 4, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.5, "hits" : [ { "_index" : "library", "_type" : "book", "_id" : "4", "_score" : 0.5, "_source" : { "title" : "Crime and Punishment", "otitle" : "Преступлéние и наказáние", "author" : "Fyodor Dostoevsky", "year" : 1886, "characters" : [ "Raskolnikov", "Sofia Semyonovna Marmeladova" ], "tags" : [ ], "copies" : 0, "available" : true } } ] } } Nice! We got our first search results with the query DSL. Paging and result size Elasticsearch allows us to control how many results we want to get (at most) and from which result we want to start. The following are the two additional properties that can be set in the request body: from: This property specifies the document that we want to have our results from. Its default value is 0, which means that we want to get our results from the first document. size: This property specifies the maximum number of documents we want as the result of a single query (which defaults to 10). For example, if weare only interested in aggregations results and don't care about the documents returned by the query, we can set this parameter to 0. If we want our query to get documents starting from the tenth item on the list and get 20 of items from there on, we send the following query: curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "from" : 9, "size" : 20, "query" : { "query_string" : { "query" : "title:crime" } } }' Returning the version value In addition to all the information returned, Elasticsearch can return the version of the document. To do this, we need to add the version property with the value of true to the top level of our JSON object. So, the final query, which requests for version information, will look as follows: curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "version" : true, "query" : { "query_string" : { "query" : "title:crime" } } }' After running the preceding query, we get the following results: { "took" : 4, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.5, "hits" : [ { "_index" : "library", "_type" : "book", "_id" : "4", "_version" : 1, "_score" : 0.5, "_source" : { "title" : "Crime and Punishment", "otitle" : "Преступлéние и наказáние", "author" : "Fyodor Dostoevsky", "year" : 1886, "characters" : [ "Raskolnikov", "Sofia Semyonovna Marmeladova" ], "tags" : [ ], "copies" : 0, "available" : true } } ] } } As you can see, the _version section is present for the single hit we got. Limiting the score For nonstandard use cases, Elasticsearch provides a feature that lets us filter the results on the basis of a minimum score value that the document must have to be considered a match. In order to use this feature, we must provide the min_score value at the top level of our JSON object with the value of the minimum score. 
For example, if we want our query to only return documents with a score higher than 0.75, we send the following query: curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "min_score" : 0.75, "query" : { "query_string" : { "query" : "title:crime" } } }' We get the following response after running the preceding query: { "took" : 3, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 0, "max_score" : null, "hits" : [ ] } } If you look at the previous examples, the score of our document was 0.5, which is lower than 0.75, and thus we didn't get any documents in response. Limiting the score usually doesn't make much sense because comparing scores between the queries is quite hard. However, maybe in your case, this functionality will be needed. Choosing the fields that we want to return With the use of the fields array in the request body, Elasticsearch allows us to define which fields to include in the response. Remember that you can only return these fields if they are marked as stored in the mappings used to create the index, or if the _source field was used (Elasticsearch uses the _source field to provide the stored values and the _source field is turned on by default). So, for example, to return only the title and the year fields in the results (for each document), send the following query to Elasticsearch: curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "fields" : [ "title", "year" ], "query" : { "query_string" : { "query" : "title:crime" } } }' In response, we get the following output: { "took" : 5, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.5, "hits" : [ { "_index" : "library", "_type" : "book", "_id" : "4", "_score" : 0.5, "fields" : { "title" : [ "Crime and Punishment" ], "year" : [ 1886 ] } } ] } } As you can see, everything worked as we wanted to. There are four things we will like to share with you, which are as follows: If we don't define the fields array, it will use the default value and return the _source field if available. If we use the _source field and request a field that is not stored, then that field will be extracted from the _source field (however, this requires additional processing). If we want to return all the stored fields, we just pass an asterisk (*) as the field name. From a performance point of view, it's better to return the _source field instead of multiple stored fields. This is because getting multiple stored fields may be slower compared to retrieving a single _source field. Source filtering In addition to choosing which fields are returned, Elasticsearch allows us to use the so-called source filtering. This functionality allows us to control which fields are returned from the _source field. Elasticsearch exposes several ways to do this. The simplest source filtering allows us to decide whether a document should be returned or not. 
Consider the following query: curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "_source" : false, "query" : { "query_string" : { "query" : "title:crime" } } }' The result retuned by Elasticsearch should be similar to the following one: { "took" : 12, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.5, "hits" : [ { "_index" : "library", "_type" : "book", "_id" : "4", "_score" : 0.5 } ] } } Note that the response is limited to base information about a document and the _source field was not included. If you use Elasticsearch as a second source of data and content of the document is served from SQL database or cache, the document identifier is all you need. The second way is similar to as described in the preceding fields, although we define which fields should be returned in the document source itself. Let's see that using the following example query: curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "_source" : ["title", "otitle"], "query" : { "query_string" : { "query" : "title:crime" } } }' We wanted to get the title and the otitle document fields in the returned _source field. Elasticsearch extracted those values from the original _source value and included the _source field only with the requested fields. The whole response returned by Elasticsearch looked as follows: { "took" : 2, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.5, "hits" : [ { "_index" : "library", "_type" : "book", "_id" : "4", "_score" : 0.5, "_source" : { "otitle" : "Преступлéние и наказáние", "title" : "Crime and Punishment" } } ] } } We can also use asterisk to select which fields should be returned in the _source field; for example, title* will return value for the title field and for title10 (if we have such field in our data). If we have more extended document with nested part, we can use notation with dot; for example, title.* to select all the fields nested under the title object. Finally, we can also specify explicitly which fields we want to include and which to exclude from the _source field. We can include fields using the include property and we can exclude fields using the exclude property (both of them are arrays of values). For example, if we want the returned _source field to include all the fields starting with the letter t but not the title field, we will run the following query: curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "_source" : { "include" : [ "t*"], "exclude" : ["title"] }, "query" : { "query_string" : { "query" : "title:crime" } } }' Using the script fields Elasticsearch allows us to use script-evaluated values that will be returned with the result documents. To use the script fields functionality, we add the script_fields section to our JSON query object and an object with a name of our choice for each scripted value that we want to return. For example, to return a value named correctYear, which is calculated as the year field minus 1800, we run the following query: curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "script_fields" : { "correctYear" : { "script" : "doc["year"].value - 1800" } }, "query" : { "query_string" : { "query" : "title:crime" } } }' By default, Elasticsearch doesn't allow us to use dynamic scripting. 
If you tried the preceding query, you probably got an error with information stating that the scripts of type [inline] with operation [search] and language [groovy] are disabled. To make this example work, you should add the script.inline: on property to the elasticsearch.yml file. However, this exposes a security threat. Using the doc notation, like we did in the preceding example, allows us to catch the results returned and speed up script execution at the cost of higher memory consumption. We also get limited to single-valued and single term fields. If we care about memory usage, or if we are using more complicated field values, we can always use the _source field. The same query using the _source field looks as follows: curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "script_fields" : { "correctYear" : { "script" : "_source.year - 1800" } }, "query" : { "query_string" : { "query" : "title:crime" } } }' The following response is returned by Elasticsearch with dynamic scripting enabled: { "took" : 76, "timed_out" : false, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 }, "hits" : { "total" : 1, "max_score" : 0.5, "hits" : [ { "_index" : "library", "_type" : "book", "_id" : "4", "_score" : 0.5, "fields" : { "correctYear" : [ 86 ] } } ] } } As you can see, we got the calculated correctYear field in response. Passing parameters to the script fields Let's take a look at one more feature of the script fields - passing of additional parameters. Instead of having the value 1800 in the equation, we can usea variable name and pass its value in the params section. If we do this, our query will look as follows: curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{ "script_fields" : { "correctYear" : { "script" : "_source.year - paramYear", "params" : { "paramYear" : 1800 } } }, "query" : { "query_string" : { "query" : "title:crime" } } }' As you can see, we added the paramYear variable as part of the scripted equation and provided its value in the params section. This allows Elasticsearch to execute the same script with different parameter values in a slightly more efficient way. Understanding the querying process After reading the previous section, we now know how querying works in Elasticsearch. You know that Elasticsearch, in most cases, needs to scatter the query across multiple nodes, get the results, merge them, fetch the relevant documents from one or more shards, and return the final results to the client requesting the documents. What we didn't talk about are two additional things that define how queries behave: search type and query execution preference. We will now concentrate on these functionalities of Elasticsearch. Query logic Elasticsearch is a distributed search engine and so all functionality provided must be distributed in its nature. It is exactly the same with querying. Because we would like to discuss some more advanced topics on how to control the query process, we first need to know how it works. Let's now get back to how querying works. By default, if we don't alter anything, the query process will consist of two phases: the scatter and the gather phase. The aggregator node (the one that receivesthe request) will run the scatter phase first. During that phase, the query is distributed to all the shards that our index is built of (of course if routing is not used). For example, if it is built of 5 shards and 1 replica then 5 physical shards will be queried (we don't need to query a shard and its replica as they contain the same data). 
Each of the queried shards will only return the document identifier and the score of the document. The node that sent the scatter query will wait for all the shards to complete their task, gather the results, and sort them appropriately (in this case, from top scoring to the lowest scoring ones). After that, a new request will be sent to build the search results. However, now only to those shards that held the documents to build the response. In most cases, Elasticsearch won't send the request to all the shards but to its subset. That's because we usually don't get the complete result of the query but only a portion of it. This phase is called the gather phase. After all the documents are gathered, the final response is built and returned as the query result. This is the basic and default Elasticsearch behavior but we can change it. Search type Elasticsearch allows us to choose how we want our query to be processed internally. We can do that by specifying the search type. There are different situations where different search type are appropriate: sometimes one can care only about the performance while sometimes query relevance is the most important factor. You should remember that each shard is a small Lucene index and in order to return more relevant results, some information, such as frequencies, needs to be transferred between the shards. To control how the queries are executed, we can pass the search_type request parameter and set it to one of the following values: query_then_fetch: In the first step, the query is executed to get the information needed to sort and rank the documents. This step is executed against all the shards. Then only the relevant shards are queried for the actual content of the documents. Different from query_and_fetch, the maximum number of results returned by this query type will be equal to the size parameter. This is the search type used by default if no search type is provided with the query, and this is the query type we described previously. dfs_query_then_fetch: Again, as with the previous dfs_query_and_fetch, dfs_query_then_fetch is similar to its counterpart query_then_fetch. However, it contains an additional phase comparing which calculates distributed term frequencies. There are also two deprecated search types: count and scan. The first one is deprecated starting from Elasticsearch 2.0 and the second one starting with Elasticsearch 2.1. The first search type used to provide benefits where only aggregations or the number of documents was relevant, but now it is enough to add size equal to 0 to your queries. The scan request was used for scrolling functionality. So if we would like to use the simplest search type, we would run the following command: curl -XGET 'localhost:9200/library/book/_search?pretty&search_type=query_then_fetch' -d '{ "query" : { "term" : { "title" : "crime" } } }' Search execution preference In addition to the possibility of controlling how the query is executed, we can also control on which shards to execute the query. By default, Elasticsearch uses shards and replicas, both the ones available on the node we've sent the request and on the other nodes in the cluster. The default behavior is mostly the proper method of shard preference of queries. But there may be times when we want to change the default behavior. For example, you may want the search to be only executed on the primary shards. 
To do that, we can set the preference request parameter to one of the following values: _primary: The operation will be only executed on the primary shards, so the replicas won't be used. This can be useful when we need to use the latest information from the index but our data is not replicated right away. _primary_first: The operation will be executed on the primary shards if they are available. If not, it will be executed on the other shards. _replica: The operation will be executed only on the replica shards. _replica_first: This operation is similar to _primary_first, but uses replica shards. The operation will be executed on the replica shards if possible, and on the primary shards if the replicas are not available. _local: The operation will be executed on the shards available on the node which the request was sent and if such shards are not present, the request will be forwarded to the appropriate nodes. _only_node:node_id: This operation will be executed on the node with the provided node identifier. _only_nodes:nodes_spec: This operation will be executed on the nodes that are defined in nodes_spec. This can be an IP address, a name, a name or IP address using wildcards, and so on. For example, if nodes_spec is set to 192.168.1.*, the operation will be run on the nodes with IP address starting with 192.168.1. _prefer_node:node_id: Elasticsearch will try to execute the operation on the node with the provided identifier. However, if the node is not available, it will be executed on the nodes that are available. _shards:1,2: Elasticsearch will execute the operation on the shards with the given identifiers; in this case, on shards with identifiers 1 and 2. The _shards parameter can be combined with other preferences, but the shards identifiers need to be provided first. For example, _shards:1,2;_local. Custom value: Any custom, string value may be passed. Requests with the same values provided will be executed on the same shards. For example, if we would like to execute a query only on the local shards, we would run the following command: curl -XGET 'localhost:9200/library/_search?pretty&preference=_local' -d '{ "query" : { "term" : { "title" : "crime" } } }' Search shards API When discussing the search preference, we will also like to mention the search shards API exposed by Elasticsearch. This API allows us to check which shards the query will be executed at. In order to use this API, run a request against the search_shards rest end point. 
For example, to see how the query will be executed, we run the following command: curl -XGET 'localhost:9200/library/_search_shards?pretty' -d '{"query":"match_all":{}}' The response to the preceding command will be as follows: { "nodes" : { "my0DcA_MTImm4NE3cG3ZIg" : { "name" : "Cloud 9", "transport_address" : "127.0.0.1:9300", "attributes" : { } } }, "shards" : [ [ { "state" : "STARTED", "primary" : true, "node" : "my0DcA_MTImm4NE3cG3ZIg", "relocating_node" : null, "shard" : 0, "index" : "library", "version" : 4, "allocation_id" : { "id" : "9ayLDbL1RVSyJRYIJkuAxg" } } ], [ { "state" : "STARTED", "primary" : true, "node" : "my0DcA_MTImm4NE3cG3ZIg", "relocating_node" : null, "shard" : 1, "index" : "library", "version" : 4, "allocation_id" : { "id" : "wfpvtaLER-KVyOsuD46Yqg" } } ], [ { "state" : "STARTED", "primary" : true, "node" : "my0DcA_MTImm4NE3cG3ZIg", "relocating_node" : null, "shard" : 2, "index" : "library", "version" : 4, "allocation_id" : { "id" : "zrLPWhCOSTmjlb8TY5rYQA" } } ], [ { "state" : "STARTED", "primary" : true, "node" : "my0DcA_MTImm4NE3cG3ZIg", "relocating_node" : null, "shard" : 3, "index" : "library", "version" : 4, "allocation_id" : { "id" : "efnvY7YcSz6X8X8USacA7g" } } ], [ { "state" : "STARTED", "primary" : true, "node" : "my0DcA_MTImm4NE3cG3ZIg", "relocating_node" : null, "shard" : 4, "index" : "library", "version" : 4, "allocation_id" : { "id" : "XJHW2J63QUKdh3bK3T2nzA" } } ] ] } As you can see, in the response returned by Elasticsearch, we have the information about the shards that will be used during the query process. Of course, with the search shards API, we can use additional parameters that control the querying process. These properties are routing, preference, and local. We are already familiar with the first two. The local parameter is a Boolean (values true or false) one that allows us to tell Elasticsearch to use the cluster state information stored on the local node (setting local to true) instead of the one from the master node (setting local to false). This allows us to diagnose problems with cluster state synchronization. Summary This article has been all about the querying Elasticsearch. We started by looking at how to query Elasticsearch and what Elasticsearch does when it needs to handle the query. We also learned about the basic and compound queries, so we are now able to use both simple queries as well as the ones that group multiple small queries together. Finally, we discussed how to choose the right query for a given use case. Resources for Article: Further resources on this subject: Extending ElasticSearch with Scripting [article] Integrating Elasticsearch with the Hadoop ecosystem [article] Elasticsearch Administration [article]

The Factory Method Pattern

Packt
10 Feb 2016
10 min read
In this article by Anshul Verma and Jitendra Zaa, author of the book Apex Design Patterns, we will discuss some problems that can occur mainly during the creation of class instances and how we can write the code for the creation of objects in a more simple, easy to maintain, and scalable way. (For more resources related to this topic, see here.) In this article, we will discuss the the factory method creational design pattern. Often, we find that some classes have common features (behavior) and can be considered classes of the same family. For example, multiple payment classes represent a family of payment services. Credit card, debit card, and net banking are some of the examples of payment classes that have common methods, such as makePayment, authorizePayment, and so on. Using the factory method pattern, we can develop controller classes, which can use these payment services, without knowing the actual payment type at design time. The factory method pattern is a creational design pattern used to create objects of classes from the same family without knowing the exact class name at design time. Using the factory method pattern, classes can be instantiated from the common factory method. The advantage of using this pattern is that it delegates the creation of an object to another class and provides a good level of abstraction. Let's learn this pattern using the following example: The Universal Call Center company is new in business and provides free admin support to customers to resolve issues related to their products. A call center agent can provide some information about the product support; for example, to get the Service Level Agreement (SLA) or information about the total number of tickets allowed to open per month. A developer came up with the following class: public class AdminBasicSupport{ /** * return SLA in hours */ public Integer getSLA() { return 40; } /** * Total allowed support tickets allowed every month */ public Integer allowedTickets() { // As this is basic support return 9999; } } Now, to get the SLA of AdminBasicSupport, we need to use the following code every time: AdminBasicSupport support = new AdminBasicSupport(); System.debug('Support SLA is - '+support.getSLA()); Output - Support SLA is – 40 The "Universal Call Centre" company was doing very well, and in order to grow the business and increase the profit, they started the premium support for customers who were willing to pay for cases and get a quick support. To make them special from the basic support, they changed the SLA to 12 hours and maximum 50 cases could be opened in one month. A developer had many choices to make this happen in the existing code. However, instead of changing the existing code, they created a new class that would handle only the premium support-related functionalities. This was a good decision because of the single responsibility principle. 
public class AdminPremiumSupport{ /** * return SLA in hours */ public Integer getSLA() { return 12; } /** * Total allowed support tickets allowed every month is 50 */ public Integer allowedTickets() { return 50; } } Now, every time any information regarding the SLA or allowed tickets per month is needed, the following Apex code can be used: if(Account.supportType__c == 'AdminBasic') { AdminBasicSupport support = new AdminBasicSupport(); System.debug('Support SLA is - '+support.getSLA()); }else{ AdminPremiumSupport support = new AdminPremiumSupport(); System.debug('Support SLA is - '+support.getSLA()); } As we can see in the preceding example, instead of adding some conditions to the existing class, the developer decided to go with a new class. Each class has its own responsibility, and they need to be changed for only one reason. If any change is needed in the basic support, then only one class needs to be changed. As we all know that this design principle is known as the Single Responsibility Principle. Business was doing exceptionally well in the call center, and they planned to start the golden and platinum support as well. Developers started facing issues with the current approach. Currently, they have two classes for the basic and premium support and requests for two more classes were in the pipeline. There was no guarantee that the support type will not remain the same in future. Because of every new support type, a new class is needed; and therefore, the previous code needs to be updated to instantiate these classes. The following code will be needed to instantiate these classes: if(Account.supportType__c == 'AdminBasic') { AdminBasicSupport support = new AdminBasicSupport(); System.debug('Support SLA is - '+support.getSLA()); }else if(Account.supportType__c == 'AdminPremier') { AdminPremiumSupport support = new AdminPremiumSupport(); System.debug('Support SLA is - '+support.getSLA()); }else if(Account.supportType__c == 'AdminGold') { AdminGoldSupport support = new AdminGoldSupport(); System.debug('Support SLA is - '+support.getSLA()); }else{ AdminPlatinumSupport support = new AdminPlatinumSupport(); System.debug('Support SLA is - '+support.getSLA()); } We are only considering the getSLA() method, but in a real application, there can be other methods and scenarios as well. The preceding code snippet clearly depicts the code duplicity and maintenance nightmare. The following image shows the overall complexity of the example that we are discussing: Although they are using a separate class for each support type, an introduction to a new support class will lead to changes in the code in all existing code locations where these classes are being used. The development team started brainstorming to make sure that the code is capable to extend easily in future with the least impact on the existing code. One of the developers came up with a suggestion to use an interface for all support classes so that every class can have the same methods and they can be referred to using an interface. The following interface was finalized to reduce the code duplicity: public Interface IAdminSupport{ Integer getSLA() ; Integer allowedTickets(); } Methods defined within an interface have no access modifiers and just contain their signatures. Once an interface was created, it was time to update existing classes. In our case, only one line needed to be changed and the remaining part of the code was the same because both the classes already have the getSLA() and allowedTickets() methods. 
Let's take a look at the following line of code: public class AdminPremiumSupport{ This will be changed to the following code: public class AdminBasicSupportImpl implements IAdminSupport{ The following line of code is as follows: public class AdminPremiumSupport{ This will be changed to the following code: public class AdminPremiumSupportImpl implements IAdminSupport{ In the same way, the AdminGoldSupportImpl and AdminPlatinumSupportImpl classes are written. A class diagram is a type of Unified Modeling Language (UML), which describes classes, methods, attributes, and their relationships, among other objects in a system. You can read more about class diagrams at https://en.wikipedia.org/wiki/Class_diagram. The following image shows a class diagram of the code written by developers using an interface: Now, the code to instantiate different classes of the support type can be rewritten as follows: IAdminSupport support = null; if(Account.supportType__c == 'AdminBasic') { support = new AdminBasicSupportImpl(); }else if(Account.supportType__c == 'AdminPremier') { support = new AdminPremiumSupportImpl(); }else if(Account.supportType__c == 'AdminGold') { support = new AdminGoldSupportImpl(); }else{ support = new AdminPlatinumSupportImpl(); } System.debug('Support SLA is - '+support.getSLA()); There is no switch case statement in Apex, and that's why multiple if and else statements are written. As per the product team, a new compiler may be released in 2016 and it will be supported. You can vote for this idea at https://success.salesforce.com/ideaView?id=08730000000BrSIAA0. As we can see, the preceding code is minimized to create a required instance of a concrete class, and then uses an interface to access methods. This concept is known as program to interface. This is one of the most recommended OOP principles suggested to be followed. As interfaces are kinds of contracts, we already know which methods will be implemented by concrete classes, and we can completely rely on the interface to call them, which hides their complex implementation and logic. It has a lot of advantages and a few of them are loose coupling and dependency injection. A concrete class is a complete class that can be used to instantiate objects. Any class that is not abstract or an interface can be considered a concrete class. We still have one problem in the previous approach. The code to instantiate concrete classes is still present at many locations and will still require changes if a new support type is added. If we can delegate the creation of concrete classes to some other class, then our code will be completely independent of the existing code and new support types. This concept of delegating decisions and creation of similar types of classes is known as the factory method pattern. 
The following class can be used to create concrete classes and will act as a factory: /** * This factory class is used to instantiate concrete class * of respective support type * */ public class AdminSupportFactory { public static IAdminSupport getInstance(String supporttype){ IAdminSupport support = null; if(supporttype == 'AdminBasic') { support = new AdminBasicSupportImpl(); }else if(supporttype == 'AdminPremier') { support = new AdminPremiumSupportImpl(); }else if(supporttype == 'AdminGold') { support = new AdminGoldSupportImpl(); }else if(supporttype == 'AdminPlatinum') { support = new AdminPlatinumSupportImpl(); } return support ; } } In the preceding code, we only need to call the getInstance(string) method, and this method will take a decision and return the actual implementation. As a return type is an interface, we already know the methods that are defined, and we can use the method without actually knowing its implementation. This is a very good example of abstraction. The final class diagram of the factory method pattern that we discussed will look like this: The following code snippet can be used repeatedly by any client code to instantiate a class of any support type: IAdminSupport support = AdminSupportFactory.getInstance ('AdminBasic'); System.debug('Support SLA is - '+support.getSLA()); Output : Support SLA is – 40 Reflection in Apex The problem with the preceding design is that whenever a new support needs to be added, we need to add a condition to AdminSupportFactory. We can store the mapping between a support type and its concrete class name in Custom setting. This way, whenever a new concrete class is added, we don't even need to change the factory class and a new entry needs to be added to custom setting. Consider custom setting created by the Support_Type__c name with the Class_Name__c field name of the text type with the following records: Name Class name AdminBasic AdminBasicSupportImpl AdminGolden AdminGoldSupportImpl AdminPlatinum AdminPlatinumSupportImpl AdminPremier AdminPremiumSupportImpl However, using reflection, the AdminSupportFactory class can also be rewritten to instantiate service types at runtime as follows: /** * This factory class is used to instantiate concrete class * of respective support type * */ public class AdminSupportFactory { public static IAdminSupport getInstance(String supporttype) { //Read Custom setting to get actual class name on basis of Support type Support_Type__c supportTypeInfo = Support_Type__c.getValues(supporttype); //from custom setting get appropriate class name Type t = Type.forName(supportTypeInfo.Class_Name__c); IAdminSupport retVal = (IAdminSupport)t.newInstance(); return retVal; } } In the preceding code, we are using the Type system class. This is a very powerful class used to instantiate a new class at runtime. It has the following two important methods: forName: This returns a type that is equivalent to a string passed newInstance: This creates a new object for a specified type Inspecting classes, methods, and variables at runtime without knowing a class name, or instantiating a new object and invoking methods at runtime is known as Reflection in computer science. One more advantage of using the factory method, custom setting, and reflection together is that if in future one of the support types need to be replaced by another service type permanently, then we need to simply change the appropriate mapping in custom setting without any changes in the code. 
Summary In this article, we discussed how to deal with various situations while instantiating objects using design patterns, using the factory method. Resources for Article: Further resources on this subject: Getting Your APEX Components Logic Right[article] AJAX Implementation in APEX[article] Custom Coding with Apex[article]
Read more
  • 0
  • 19
  • 34900

article-image-introduction-sql-and-sqlite
Packt
10 Feb 2016
22 min read
Save for later

Introduction to SQL and SQLite

Packt
10 Feb 2016
22 min read
In this article by Gene Da Rocha, author or the book Learning SQLite for iOS we are introduced to the background of the Structured Query Language (SQL) and the mobile database SQLite. Whether you are an experienced technologist at SQL or a novice, using the book will be a great aid to help you understand this cool subject, which is gaining momentum. SQLite is the database used on the mobile smartphone or tablet that is local to the device. SQLite has been modified by different vendors to harden and secure it for a variety of uses and applications. (For more resources related to this topic, see here.) SQLite was released in 2000 and has grown to be as a defacto database on a mobile or smartphone today. It is an open source piece of software with a low footprint or overhead, which is packaged with a relational database management system. Mr D. Richard Hipp is the inventor and author for SQLite, which was designed and developed on a battleship while he was at a company called General Dynamics at the U. S. Navy. The programming was built for a HP-UX operating system with Informix as the database engine. It took many hours in the data to upgrade or install the database software and was an over-the-top database for this experience DBA (database administrator). Mr Hipp wanted a portable, self-contained, easy-to-use database, which could be mobile, quick to install, and not dependent on the operating. Initially, SQLite 1.0 used the gdbm as its storage system, but later, it was replaced with its own B-tree implementation and technology for the database. The B-tree implementation was enhanced to support transactions and store rows of data with key order. By 2001 onwards, open source family extensions for other languages, such as Java, Python, and Perl, were written to support their applications. The database and its popularity within the open source community and others were growing. Originally based upon relational algebra and tuple relational calculus, SQL consists of a data definition and manipulation language. The scope of SQL includes data insert, query, update and delete, schema creation and modification, and data access control. Although SQL is often described as, and to a great extent is, a declarative language (4GL), it also includes procedural elements. Internationalization supported UTF-16 and UTF-8 and included text-collating sequences in version 2 and 3 in 2004. It was supported by funding from AOL (America Online) in 2004. It works with a variety of browsers, which sometimes have in-built support for this technology. For example, there are so many extensions that use Chrome or Firefox, which allow you to manage the database. There have been many features added to this product. The future with the growth in mobile phones sets this quick and easy relational database system to quantum leap its use within the mobile and tablet application space. SQLite is based on the PostgreSQL as a point of reference. SQLite does not enforce any type checking. The schema does not constrain it since the type of value is dynamic, and a trigger will be activated by converting the data type. About SQL In June 1970, a research paper was published by Dr. E.F. Codd called A Relational Model of Data for Large Shared Data Banks. The Association of Computer Machinery (ACM) accepted Codd data and technology model, which has today become the standard for the RDBMS (Relational Database Management System). 
IBM Corporation had invented the language called by Structured English Query Language (SEQUEL), where the word "English" was dropped to become SQL. SQL is still pronounced as what has today become the standard for the RDBMS (Relational Database Management System) had a product called which has today become the SQL technology, followed by Oracle, Sybase and Microsoft's SQL Server. The standard commercial relational database management system language today is SQL (SEQUEL). Today, there are ANSI standards for SQL, and there are many variations of this technology. Among the mentioned manufacturers, there are also others available in the open source world, for example, an SQL query engine such as Presto. This is the distribution engine for SQL under open source, which is made to execute interactive analytic queries. Presto queries are run under databases from a variety of data source sizes—gigabytes to petabytes. Companies such as Facebook and Dropbox use the Presto SQL engine for their queries and analytics in data warehouse and related applications. SQL is made up of a data manipulation and definition language built with tuple and algebra calculation in a relational format. The SQL language has a variety of statements but most would recognize the INSERT, SELECT, UPDATE and DELETE statements. These statements form a part of the database schema management process and aid the data access and security access. SQL includes procedural elements as part of its setup. Is SQLite used anywhere? Companies may use applications but they are not aware of the SQL engines that drive their data storage and information. Although, it has become a standard with the American National Standards Institute (ANSI) in 1986, SQL features and functionality are not 100% portable among different SQL systems and require code changes to be useful. These standards are always up for revision to ensure ANSI is maintained. There are many variants of SQL engines on the market from companies, such as Oracle, SQL Server (Microsoft), DB2 (IBM), Sybase (SAP), MYSQL (Oracle), and others. Different companies operate several types of pricing structures, such as free open source, or a paid per seat or by transactions or server types or loads. Today, there is a preference for using server technology and SQL in the cloud with different providers, for example, Amazon Web Services (AWS). SQLite, as it names suggests, is SQL in a light environment, which is also flexible and versatile. Enveloped and embedded database among other processes SQLite has been designed and developed to work and coexist with other applications and processes in its area. RDBMS is tightly integrated with the native application software, which requires storing information but is masked and hidden from users, and it requires minimal administration or maintenance. SQLite can work with different API hidden from users and requires minimal administration or maintenance areas. RDBMS is intertwined with other applications; that is, it requires minimal supervision; there is no network traffic; no network access conflicts or configuration; no access limitations with privileges or permissions; and a large reduced overhead. These make it easier and quicker to deploy your applications to the app stores or other locations. The different components work seamlessly together in a harmonized way to link up data with the SQLite library and other processes. 
These show how the Apache process and the C/C++ process work together with the SQLite-C library to interface and link with it so that it becomes seamless and integrates with the operating system. SQLite has been developed and integrated in such a way that it will interface and gel with a variety of applications and multiple solutions. As a lightweight RDBMS, it can stand on its own by its versatility and is not cumbersome or too complex to benefit your application. It can be used on many platforms and comes with a binary compatible format, which is easier to dovetail within your mobile application. The different types of I.T. professionals will be involved with SQLite since it holds the data, affects performance, and involves database design, user or mobile interface design specialists, analysts and consultancy types. These professionals could use their previous knowledge of SQL to quickly grasp SQLite. SQLite can act as both data processor for information or deal with data in memory to perform well. The different software pieces of a jigsaw can interface properly by using the C API interface to SQLite, which some another programming language code. For example, C or C++ code can be programmed to communicate with the SQLITE C API, which will then talk to the operating system, and thus communicate with the database engine. Another language such as PHP can communicate using its own language data objects, which will in turn communicate with the SQLite C API and the database. SQLite is a great database to learn especially for computer scientists who want to use a tool that can open your mind to investigate caching, B-Tree structures and algorithms, database design architecture, and other concepts. The architecture of the SQLite database As a library within the OS-Interface, SQLite will have many functions implemented through a programming called tclsqlite.c. Since many technologies and reserved words are used, to language, and in this case, it will have the C language. The core functions are to be found in main.c, legacy.c, and vmbeapi.c. There is also a source code file in C for the TCL language to avoid any confusion; the prefix of sqlite3 is used at the beginning within the SQLite library. The Tokeniser code base is found within tokenize.c. Its task is to look at strings that are passed to it and partition or separate them into tokens, which are then passed to the parser. The Parser code base is found within parse.y. The Lemon LALR(1) parser generator is the parser for SQLite; it uses the context of tokens and assigns them a meaning. To keep within the low-sized footprint of RDBMS, only one C file is used for the parse generator. The Code Generator is then used to create SQL statements from the outputted tokens of the parser. It will produce virtual machine code that will carry out the work of the SQL statements. Several files such as attach.c, build.c, delete.c, select.c, and update.c will handle the SQL statements and syntax. Virtual machine executes the code that is generated from the Code Generator. It has in-built storage where each instruction may have up to three additional operands as a part of each code. The source file is called vdbe.c, which is a part of the SQLite database library. Built-in is also a computing engine, which has been specially created to integrate with the database system. There are two header files for virtual machine; the header files that interface a link between the SQLite libraries are vdbe.h and vdbeaux.c, which have utilities used by other modules. 
The vdbeapi.c file also connects to virtual machine with sqlite_bind and other related interfaces. The C language routines are called from the SQL functions that reference them. For example, functions such as count() are defined in func.c and date functions are located in date.c. B-tree is the type of table implementation used in SQLite; and the C source file is btree.c. The btree.h header file defines the interface to the B-tree system. There is a different B-tree setup for every table and index and held within the same file. There is a header portion within the btree.c, which will have details of the B-tree in a large comment field. The Pager or Page Cache using the B-tree will ask for data in a fixed sized format. The default size is 1024 bytes, which can be between 512 and 65536 bytes. Commit and Rollback operations, coupled with the caching, reading, and writing of data are handled by Page Cache or Pager. Data locking mechanisms are also handled by the Page Cache. The C file page.c is implemented to handle requests within the SQLite library and the header file is pager.h. The OS Interface C file is defined in os.h. It addresses how SQLite can be used on different operating systems and become transparent and portable to the user thus, becoming a valuable solution for any developer. An abstract layer to handle Win32 and POSIX compliant systems is also in place. Different operating systems have their own C file. For example, os_win.c is for Windows, os_unix.c is for Unix, coupled with their own os_win.h and os_unix.h header files. Util.c is the C file that will handle memory allocation and string comparisons. The Utf.c C file will hold the Unicode conversion subroutines. The Utf.c C file will hold the Unicode data, sort it within the SQL engine, and use the engine itself as a mechanism for computing data. Since the memory of the device is limited and the database size has the same constraints, the developer has to think outside the box to use these techniques. These types of memory and resource management form a part of the approach when the overlay techniques were used in the past when disk and memory was limited.   SELECT parameter1, STTDEV(parameter2)       FROM Table1 Group by parameter1       HAVING parameter1 > MAX(parameter3) IFeatures As part of its standards, SQLite uses and implements most of the SQL-92 standards, but not all the potential features or parts of functionality are used or realized. For example, the SQLite uses and implements most of the SQL-92 standards but not all potent columns. The support for triggers is not 100% as it cannot write output to views, but as a substitute, the INSTEAD OF statement can be used. As mentioned previously, the use of a type for a column is different; most relational database systems assign them to individual values. SQLite will convert a string into an integer if the columns preferred type is an integer. It is a good piece of functionality when bound to this type of scripting language, but the technique is not portable to other RDBMS systems. It also has its criticisms for not having a good data integrity mechanism compared to others in relation to statically typed columns. As mentioned previously, it has many bindings to many languages, such as Basic, C, C#, C++, D, Java, JavaScript, Lua, PHP, Objective-C, Python, Ruby, and TCL. Its popularity by the open source community and its usage by customers and developers have enabled its growth to continue. 
This lightweight RDBMS can be used on Google Chrome, Firefox, Safari, Opera, and the Android Browsers and has middleware support using ADO.NET, ODBC, COM (ActiveX), and XULRunner. It also has the support for web application frameworks such as Django (Python-based), Ruby on Rails, and Bugzilla (Mozilla). There are other applications such as Adobe Photoshop Light, which uses SQLite and Skype. It is also part of the Windows 8, Symbian OS, Android, and OpenBSD operating. Apple also included it via API support via OSXvia OSXother applications like Adobe Photoshop Light. Apart from not having the large overhead of other database engines, SQLite has some major enhancements such as the EXPLAIN keyword with its manifest typing. To control constraint conflicts, the REPLACE and ON CONFLICT statements are used. Within the same query, multiple independent databases can be accessed using the DETACH and ATTACH statements. New SQL functions and collating sequences can be created using the predefined API's, which offer much more flexibility. As there is no configuration required, SQLite just does the job and works. There is no need to initialize, stop, restart, or start server processes and no administrator is required to create the database with proper access control or security permits. After any failure, no user actions are required to recover the database since it is self-repairing: SQLite is more advanced than is thought of in the first place. Unlike other RDBMS, it does not require a server setup via a server to serve up data or incur network traffic costs. There are no TCP/IP calls and frequent communication backwards or forwards. SQLite is direct; the operating system process will deal with database access to its file; and control database writes and reads with no middle-man process handshaking. By having no server backend, the process of installation, configuration, or administration is reduced significantly and the access to the database is granted to programs that require this type of data operations. This is an advantage in one way but is also a disadvantage for security and protection from data-driven misuse and data concurrency or data row locking mechanisms. It also allows the database to be accessed several times by different applications at the same time. It supports a form of portability for the cross-platform database file that can be located with the database file structure. The database file can be updated on one system and copied to another on either 32 bit or 64 bit with different architectures. This does not make a difference to SQLite. The usage of different architecture and the promises of developers to keep the file system stable and compatible with the previous, current, and future developments will allow this database to grow and thrive. SQLite databases don't need to upload old data to the new formatted and upgraded databases; it just works. By having a single disk file for the database, the information can be copied on a USB and shared or just reused on another device very quickly keeping all the information intact. Other RDBMS single-disk file for the database; the information can be copied on a USB and shared or just reused on another device very quickly keeping all the information in tact to grow and thrive. Another feature of this portable database is its size, which can start on a single 512-byte page and expand to 2147483646 pages at 65536 bytes per page or in bytes 140,737,488,224,256, which equates to about 140 terabytes. 
Most other RDBMS are much larger, but IBM's Cloudscape is small with a 2MB jar file. It is still larger than SQLite. The Firebird alternative's client (frontend) library is about 350KB, whereas the Berkeley Oracle database is around 450kb without SQL support and with one simple key/value pair's option. This advanced portable database system and its source code is in the public domain. They have no copyright or any claim on the source code. However, there are open source license issues and controls for some test code and documentation. This is great news for developers who might want to code up new extensions or database functionality that works with their programs, which could be made into a 'product extension' for SQLite. You cannot have this sort of access to SQL source code around since everything has a patent, limited access, or just no access. There are signed affidavits by developers to disown any copyright interest in the SQLite code. SQLite is different, because it is just not governed or ruled by copyright law; the way software should really work or it used. There are signed affidavits by developers to disown any copyright interest in the SQLite code. This means that you can define a column with a datatype of integer, but its property is dictated by the inputted values and not the column itself. This can allow any value to be stored in any declared data type for this column with the exception of an integer primary key. This feature would suit TCL or Python, which are dynamically typed programming languages. When you allocate space in most RDBMS in any declared char(50), the database system will allocate the full 50 bytes of disk space even if you do not allocate the full 50 bytes of disk space. So, out of char(50) sized column, three characters were used, then the disk space would be only three characters plus two for overhead including data type, length but not 50 characters such as other database engines. This type of operation would reduce disk space usage and use only what space was required. By using the small allocation with variable length records, the applications runs faster, the database access is quicker, manifest typing can be used, and the database is small and nimble. The ease of using this RDBMS makes it easier for most programmers at an intermediate level to create applications using this technology with its detailed documentation and examples. Other RDBMS are internally complex with links to data structures and objects. SQLite comprises using a virtual machine language that uses the EXPLAIN reserved word in front of a query. Virtual machine has increased and benefitted this database engine by providing an excellent process or controlled environment between back end (where the results are computed and outputted) and the front end (where the SQL is parsed and executed). The SQL implementation language is comparable to other RDBMS especially with its lightweight base; it does support recursive triggers and requires the FOR EACH row behavior. The FOR EACH statement is not currently supported, but functionality cannot be ruled out in the future. There is a complete ALTER TABLE support with some exceptions. For example, the RENAME TABLE, ADD COLUMN, or ALTER COLUMN is supported, but the DROP COLUMN, ADD CONSTRAINT, or ALTER COLUMN is not supported. Again, this functionality cannot be ruled out in the future. The RIGHT OUTER JOIN and FULL OUTER JOIN are not support, but the RIGHT OUTER JOIN, FULL OUTER JOIN, and LEFT OUTER JOIN are implemented. 
The views within this RDBMS are read only. As described so far in the this article, SQLite is a nimble and easy way to use database that developers can engage with quickly, use existing skills, and output systems to mobile devices and tablets far simpler than ever before. With the advantage of today's HTML5 and other JavaScript frameworks, the advancement of SQL and the number of SQLite installations will quantum leap. Working with SQLite The website for SQLite is www.sqlite.org where you can download all the binaries for the database, documentation, and source code, which works on operating systems such as Linux, Windows and MAC OS X. The SQLite share library or DLL is the library to be used for the Windows operating system and can be installed or seen via Visual Studio with the C++ language. So, the developer can write the code using the library that is presently linked in reference via the application. When execution has taken place, the DLL will load and all references in the code will link to those in the DLL at the right time. The SQLite3 command-line program, CLP, is a self-contained program that has all the components built in for you to run at the command line. It also comes with an extension for TCL. So within TCL, you can connect and update the SQLite database. SQLite downloads come with the TAR version for Unix systems and the ZIP version for Windows systems. iOS with SQLite On the hundreds of thousands of apps on all the app stores, it would be difficult to find the one that does not require a database of some sort to store or handle data in a particular way. There are different formats of data called datafeeds, but they all require some temporary or permanent storage. Small amounts of data may not be applicable but medium or large amounts of data will require a storage mechanism such as a database to assist the app. Using SQLite with iOS will enable developers to use their existing skills to run their DBMS on this platform as well. For SQLite, there is the C-library that is embedded and available to use with iOS with the Xcode IDE. Apple fully supports SQLite, which uses an include statement as a part of the library call, but there is not easy made mechanism to engage. Developers also tend to use FMDB—a cocoa/objective-C wrapper around SQLite. As SQLite is fast and lightweight, its usage of existing SQL knowledge is reliable and supported by Apple on Mac OS and iOS and support from many developers as well as being integrated without much outside involvement. The third SQLite library is under the general tab once the main project name is highlighted on the left-hand side. Then, at the bottom of the page or within the 'Linked Frameworks and Library', click + and a modal window appears. Enter the word sqlite and select sqlite; then, select the libsqlite3.dylib library. This one way to set up the environment to get going. In effect, it is the C++ wrapper called the libsqlite3.dylib library within the framework section, which allows the API to work with the SQLite commands. The way in which a text file is created in iOS is the way SQLite will be created. It will use the location (document directory) to save the file that is the one used by iOS. Before anything can happen, the database must be opened and ready for querying and upon the success of data, the constant SQLITE_OK is set to 0. 
In order to create a table in the SQLite table using the iOS connection and API, the method sqlite3_exec is set up to work with the open sqlite3 object and the create table SQL statement with a callback function. When the callback function is executed and a status is returned of SQLITE_OK, it is successful; otherwise, the other constant SQLITE_ERROR is set to 1. Once the C++ wrapper is used and the access to SQLite commands are available, it is an easier process to use SQLite with iOS. Summary In this article, you read the history of SQL, the impact of relational databases, and the use of a mobile SQL database namely SQLite. It outlines the history and beginnings of SQLite and how it has grown to be the most used database on mobile devices so far. Resources for Article:   Further resources on this subject: Team Project Setup [article] Introducing Sails.js [article] Advanced Fetching [article]
Read more
  • 0
  • 0
  • 31011

article-image-team-project-setup-0
Packt
10 Feb 2016
12 min read
Save for later

Building Your Application

Packt
10 Feb 2016
12 min read
"Measuring programming progress by lines of code is like measuring aircraft building progress by weight."                                                                --Bill Gates In this article, by Tarun Arora, the author of the book Microsoft Team Foundation Server 2015 Cookbook, provides you information about: Configuring TFBuild Agent, Pool, and Queues Setting up a TFBuild Agent using an unattended installation (For more resources related to this topic, see here.) As a developer, compiling code and running unit tests gives you an assurance that your code changes haven't had an impact on the existing codebase. Integrating your code changes into the source control repository enables other users to validate their changes with yours. As a best practice, Teams integrate changes into the shared repository several times a day to reduce the risk of introducing breaking changes or worse, overwriting each other's. Continuous integration (CI) is a development practice that requires developers to integrate code into a shared repository several times a day. Each check-in is verified by an automated build, allowing Teams to detect problems early. The automated build that runs as part of the CI process is often referred to as the CI build. There isn't a clear definition of what the CI build should do, but at the very minimum, it is expected to compile code and run unit tests. Running the CI build on a non-developer remote workspace helps identify the dependencies that may otherwise go unnoticed into the release process. We can talk endlessly about the benefits of CI; the key here is that it enables you to have potentially deployable software at all times. Deployable software is the most tangible asset to customers. Moving from concept to application, in this article, you'll learn how to leverage the build tooling in TFS to set up a quality-focused CI process. But first, let's have a little introduction to the build system in TFS. The following image illustrates the three generations of build systems in TFS: TFS has gone through three generations of build systems. The very first was MSBuild using XML for configuration; the next one was XAML using Windows Workflow Foundation for configuration, and now, there's TFBuild using JSON for configuration. The XAML-based build system will continue to be supported in TFS 2015. No automated migration path is available from XAML build to TFBuild. This is generally because of the difference in the architecture between the two build systems. The new build system in TFS is called Team Foundation Build (TFBuild). It is an extensible task-based execution system with a rich web interface that allows authoring, queuing, and monitoring builds. TFBuild is fully cross platform with the underlying build agents that are capable of running natively on both Windows and non-Windows platforms. TFBuild provides out-of-the-box integration with Centralized Version Control such as TFVC and Distributed Version Controls such as Git and GitHub. TFBuild supports building .NET, Java, Android, and iOS applications. All the recipes in this article are based on TFBuild. TFBuild is a task orchestrator that allows you to run any build engine, such as Ant, CMake, Gradle, Gulp, Grunt, Maven, MSBuild, Visual Studio, Xamarin, XCode, and so on. TFBuild supports work item integration, publishing drops, and publishing test execution results into the TFS that is independent of the build engine that you choose. The build agents are xCopyable and do not require any installation. 
The agents are auto-updating in nature; there's no need to update every agent in your infrastructure: TFBuild offers a rich web-based interface. It does not require Visual Studio to author or modify a build definition. From simple to complex, all build definitions can easily be created in the web portal. The web interface is accessible from any device and any platform: The build definition can be authored from the web portal directly A build definition is a collection of tasks. A task is simply a build step. Build definition can be composed by dragging and dropping tasks. Each task supports Enabled, Continue on error, and Always run flags making it easier to manage build definitions as the task list grows: The build system supports invoking PowerShell, batch, command line, and shell scripts. All out-of-the-box tasks are open source. If a task does not satisfy your requirements, you can download the task from GitHub at https://github.com/Microsoft/vso-agent-tasks and customize it. If you can't find a task, you can easily create one. You'll learn more about custom tasks in this article. Changes to build definitions can be saved as drafts. Build definitions maintain a history of all changes in the History tab. A side-by-side comparison of the changes is also possible. Comments entered when changing the build definition show up in the change history: Build definitions can be saved as templates. This helps standardize the use of certain tasks across new build definitions: An existing build definition can be saved as a template Multiple triggers can be set for the same build, including CI triggers and multiple scheduled triggers: Rule-based retention policies support the setting up of multiple rules. Retention can be specified by "days" or "number" of the builds: The build output logs are displayed in web portal in real time. The build log can be accessed from the console even after the build gets completed: The build reports have been revamped to offer more visibility into the build execution, and among other things, the test results can now directly be accessed from the web interface. The .trx file does not need to be downloaded into Visual Studio to view the test results: The old build system had restrictions on one Team Project Collection per build controller and one controller per build machine. TFBuild removes this restriction and supports the reuse of queues across multiple Team Project Collections. The following image illustrates the architecture of the new build system: In the preceding diagram, we observe the following: Multiple agents can be configured on one machine Agents from across different machines can be grouped into a pool Each pool can have only one queue One queue can be used across multiple Team Project Collections To demonstrate the capabilities of TFBuild, we'll use the FabrikamTFVC and FabrikamGit Team Projects. Configuring TFBuild Agent, Pool, and Queues In this recipe, you'll learn how to configure agents and create pools and queues. You'll also learn how a queue can be used across multiple Team Project Collections. Getting ready Scenario: At Fabrikam, the FabrikamTFVC and FabrikamGit Team Projects need their own build queues. The FabrikamTFVC Teams build process can be executed on a Windows Server. The FabrikamGit Team build process needs both Windows and OS X. The Teams want to set up three build agents on a Windows Server; one build agent on an OS X machine. 
The Teams want to group two Windows Agents into a Windows Pool for FabrikamTFVC Team and group one Windows and one Mac Agent into another pool for the FabrikamGit Team: Permission: To configure a build agent, you should be in the Build Administrators Group. The prerequisites for setting up the build agent on a Windows-based machine are as follows: The build agent should have a supporting version of Windows. The list of supported versions is listed at https://msdn.microsoft.com/en-us/Library/vs/alm/TFS/administer/requirements#Operatingsystems. The build agent should have Visual Studio 2013 or 2015. The build agent should have PowerShell 3 or a newer version. A build agent is configured for your TFS as part of the server installation process if you leave the Configure the build service to start automatically option selected: For the purposes of this recipe, we'll configure the agents from scratch. Delete the default pool or any other pool you have by navigating to the Agent pools option in the TFS Administration Console http://tfs2015:8080/tfs/_admin/_AgentPool: How to do it Log into the Windows machine that you desire to set the agents upon. Navigate to the Agent pools in the TFS Administration Console by browsing to http://tfs2015:8080/tfs/_admin/_AgentPool. Click on New Pool, enter the pool name as Pool 1, and uncheck Auto-Provision Queue in Project Collections: Click on the Download agent icon. Copy the downloaded folder into E: and unzip it into E:Win-A1. You can use any drive; however, it is recommended to use the non-operating system drive: Run the PowerShell console as an administrator and change the current path in PowerShell to the location of the agent in this case E:Win-A1. Call the ConfigureAgent.ps1 script in the PowerShell console and click on Enter. This will launch the Build Agent Configuration utility: Enter the configuration details as illustrated in the following screenshot: It is recommended to install the build agent as a service; however, you have an option to run the agent as an interactive process. This is great when you want to debug a build or want to temporarily use a machine as a build agent. The configuration process creates a JSON settings file; it creates the working and diagnostics folders: Refresh the Agent pools page in the TFS Administration Console. The newly configured agent shows up under Pool 1: Repeat steps 2 to 5 to configure Win-A2 in Pool 1. Repeat steps 1 to 5 to configure Win-A3 in Pool 2. It is worth highlighting that each agent runs from its individual folder: Now, log into the Mac machine and launch terminal: Install the agent installer globally by running the commands illustrated here. You will be required to enter the machine password to authorize the install: This will download the agent in the user profile, shown as follows: The summary of actions performed when the agent is downloaded Run the following command to install the agent installer globally for the user profile: Running the following command will create a new directory called osx-A1 for the agent; create the agent in the directory: The agent installer has been copied from the user profile into the agent directory, shown as follows: Pass the following illustrated parameters to configure the agent: This completes the configuration of the xPlatform agent on the Mac. Refresh the Agent pools page in the TFS Administration Console to see the agent appear in Pool 2: The build agent has been configured at the Team Foundation Server level. 
In order to use the build agent for a Team Project Collection, a mapping between the build agent and Team Project Collection needs to be established. This is done by creating queues. To configure queues, navigate to the Collection Administration Console by browsing to http://tfs2015:8080/tfs/DefaultCollection/_admin/_BuildQueue. From the Build tab, click on New queue; this dialog allows you to reference the pool as a queue: Map Pool 1 as Queue 1 and Pool 2 as Queue 2 as shown here: The TFBuild Agent, Pools, and Queues are now ready to use. The green bar before the agent name and queue in the administration console indicates that the agent and queues are online. How it works... To test the setup, create a new build definition by navigating to the FabrikamTFVC Team Project Build hub by browsing to http://tfs2015:8080/tfs/DefaultCollection/FabrikamTFVC/_build. Click on the Add a new build definition icon. In the General tab, you'll see that the queues show up under the Queue dropdown menu. This confirms that the queues have been correctly configured and are available for selection in the build definition: Pools can be used across multiple Team Project Collections. As illustrated in the following screenshot, in Team Project Collection 2, clicking on the New queue... shows that the existing pools are already mapped in the default collection: Setting up a TFBuild Agent using an unattended installation The new build framework allows the unattended setup of build agents by injecting a set of parameter values via script. This technique can be used to spin up new agents to be attached into an existing agent pool. In this recipe, you'll learn how to configure and unconfigure a build agent via script. Getting ready Scenario: The FabrikamTFVC Team wants the ability to install, configure, and unconfigure a build agent directly via script without having to perform this operation using the Team Portal. Permission: To configure a build agent, you should be in the Build Administrators Group. Download the build agent as discussed in the earlier recipe Configuring TFBuild Agent, Pool, and Queues. Copy the folder to E:Agent. The script refers to this Agent folder. How to do it... Launch PowerShell in the elevated mode and execute the following command: .AgentVsoAgent.exe /Configure /RunningAsService /ServerUrl:"http://tfs2015:8080/tfs" /WindowsServiceLogonAccount:svc_build /WindowsServiceLogonPassword:xxxxx /Name:WinA-10 /PoolName:"Pool 1" /WorkFolder:"E:Agent_work" /StartMode:Automatic Replace the value of the username and password accordingly. Executing the script will result in the following output: The script installs an agent by the name WinA-10 as Windows Service running as svc_build. The agent is added to Pool 1: To unconfigure WinA-10, run the following command in an elevated PowerShell prompt: .AgentVsoAgent.exe /Unconfigure "vsoagent.tfs2015.WinA-10" To unconfigure, script needs to be executed from outside the scope of the Agent folder. Running the script from within the Agent folder scope will result in an error message. How it works... The new build agent natively allows configuration via script. A new capability called Personal Access Token (PAT) is due for release in the future updates of TFS 2015. PAT allows you to generate a personal OAuth token for a specific scope; it replaces the need to key in passwords into configuration files. Summary In this article, we have looked at configuring TFBuild Agent, Pool, and Queues and setting up a TFBuild Agent using an unattended installation. 
Resources for Article: Further resources on this subject: Overview of Process Management in Microsoft Visio 2013 [article] Introduction to the Raspberry Pi's Architecture and Setup [article] Implementing Microsoft Dynamics AX [article]
Read more
  • 0
  • 0
  • 14599
article-image-creating-grids-panels-and-other-widgets
Packt
10 Feb 2016
6 min read
Save for later

Creating Grids, Panels, and other Widgets

Packt
10 Feb 2016
6 min read
In this article by Raymond Camden, author of the book jQuery Mobile Web Development Essentials – Third Edition, we will look at dialogs, grids, and other widgets. While jQuery mobile provides great support for them, you get even more UI controls within the framework. In this article, we will see how to layout content with grids and make responsive grids. (For more resources related to this topic, see here.) Laying out content with grids Grids are one of the few features of jQuery mobile that do not make use of particular data attributes. Instead, you work with grids simply by specifying CSS classes for your content. Grids come in four flavors: two-column, three-column, four-column, and five-column grids. You will probably not want to use the five-column one on a phone device. Save that for a tablet instead. You begin a grid with a div block that makes use of the class ui-grid-X, where X will either be a, b, c, or d. ui-grid-a represents a two-column grid. The ui-grid-b class is a three-column grid. You can probably guess what c and d create. So, to begin a two-column grid, you would wrap your content with the following: <div class="ui-grid-a">   Content </div> Within the div tag, you then use div for each cell of the content. The class for grid calls begins with ui-block-X, where X goes from a to d. The ui-block-a class would be used for the first cell, ui-block-b for the next, and so on. This works much like the HTML tables. Putting it together, the following code snippet demonstrates a simple two-column grid with two cells of content: <div class="ui-grid-a">   <div class="ui-block-a">Left</div>   <div class="ui-block-b">Right</div> </div> The text within a cell will automatically wrap. Listing 7-1 demonstrates a simple grid with a large amount of text in one of the columns:   In the mobile browser, you can clearly see the two columns: If the text in these divs seems a bit close together, there is a simple fix for that. In order to add a bit more space between the content of the grid cells, you can add a class to your main div that specifies ui-content. This tells jQuery mobile to pad the content a bit. For example: <div class="ui-grid-a ui-content"> This small change modifies the previous screenshot like the following: Listing 7-1: test1.html <div data-role="page" id="first">       <div data-role="header">         <h1>Grid Test</h1>       </div>       <div role="main" class="ui-content">         <div class="ui-grid-a">           <div class="ui-block-a">           <p>           This is my left hand content. There won't be a lot of           it.           </p>           </div>           <div class="ui-block-b">             <p>               This is my right hand content. I'm going to fill it               with some dummy text.             </p>             <p>               Bacon ipsum dolor sit amet andouille capicola spare               ribs, short loin venison sausage prosciutto               turkey flank frankfurter pork belly short ribs.               chop, pancetta turkey bacon short ribs ham flank               pork belly. Tongue strip steak short ribs tail           </p>           </div>         </div>       </div>     </div> Working with other types of grids then is simply a matter of switching to the other classes. 
For example, a four-column grid would be set up similar to the following code snippet: <div class="ui-grid-c">   <div class="ui-block-a">1st cell</div>   <div class="ui-block-b">2nd cell</div>   <div class="ui-block-c">3rd cell</div> </div> Again, keep in mind your target audience. Anything over two columns may be too thin on a mobile phone. To create multiple rows in a grid, you need to simply repeat the blocks. The following code snippet demonstrates a simple example of a grid with two rows of cells: <div class="ui-grid-a">   <div class="ui-block-a">Left Top</div>   <div class="ui-block-b">Right Top</div>   <div class="ui-block-a">Left Bottom</div>   <div class="ui-block-b">Right Bottom</div> </div> Notice that there isn't any concept of a row. jQuery mobile handles knowing that it should create a new row when the block starts over with the one marked ui-block-a. The following code snippet, Listing 7-2, is a simple example: Listing 7-2:test2.html <div data-role="page" id="first">       <div data-role="header">         <h1>Grid Test</h1>       </div>       <div role="main" class="ui-content">         <div class="ui-grid-a">           <div class="ui-block-a">             <p>               <img src="ray.png">             </p>           </div>           <div class="ui-block-b">           <p>           This is Raymond Camden. Here is some text about him. It           may wrap or it may not but jQuery Mobile will make it          look good. Unlike Ray!           </p>           </div>           <div class="ui-block-a">             <p>               This is Scott Stroz. Scott Stroz is a guy who plays               golf and is really good at FPS video games.             </p>           </div>           <div class="ui-block-b">             <p>               <img src="scott.png">             </p>           </div>         </div>       </div>     </div> The following screenshot shows the result: Making responsive grids Earlier we mentioned that complex grids may not work depending on the size or your targeted devices. A simple two-column grid is fine, but the larger grids would render well on tablets only. Luckily, there's a simple solution for it. jQuery mobile's latest updates include much better support for responsive design. Let's consider a simple example. Here is a screenshot of a web page using a four-column grid: It is readable for sure, but it is a bit dense. By making use of responsive design, we could handle the different sizes intelligently using the same basic HTML. jQuery mobile enables a simple solution for this by adding the class ui-responsive to an existing grid. Here is an example: <div class="ui-grid-c ui-responsive"> By making this one small change, look how the phone version of our page changes: The four-column layout now is a one-column layout instead. If viewed in a tablet, the original four-column design will be preserved. Summary In this article, you learned more about how jQuery mobile enhances basic HTML to provide additional layout controls to our mobile pages. With grids, you learned a new way to easily layout content in columns. Resources for Article:   Further resources on this subject: Classes and Instances of Ember Object Model [article] Introduction to Akka [article] CoreOS Networking and Flannel Internals [article]
Read more
  • 0
  • 0
  • 11000

article-image-introduction-docker
Packt
09 Feb 2016
4 min read
Save for later

Introduction to Docker

Packt
09 Feb 2016
4 min read
In this article by Rajdeep Dua, the author of the book Learning Docker Networking, we will look at an introduction of Docker networking and its components. (For more resources related to this topic, see here.) Docker is a lightweight containerization technology that has gathered enormous interest in recent years. It neatly bundles various Linux kernel features and services, such as namespaces, cgroups, SELinux, and AppArmor profiles, over union filesystems such as AUFS and BTRFS in order to make modular images. These images provide a highly configurable virtualized environment for applications and follow a write once, run anywhere workflow. Applications can be as simple as running a process to a highly scalable and distributed one. Therefore, there is a need for powerful networking elements that can support various complex use cases. Networking and Docker Each Docker container has its own network stack, and this is due to the Linux kernel's net namespace, where a new net namespace for each container is instantiated and cannot be seen from outside the container or from other containers. Docker networking is powered by the following network components and services. Linux bridges These are L2/MAC learning switches built into the kernel and are to be used for forwarding. Open vSwitch This is an advanced bridge that is programmable and supports tunneling. NAT Network address translators are immediate entities that translate IP addresses and ports (SNAT, DNAT, and so on). IPtables This is a policy engine in the kernel used for managing packet forwarding, firewall, and NAT features. AppArmor/SELinux Firewall policies for each application can be defined with these. Various networking components can be used to work with Docker, providing new ways to access and use Docker-based services. As a result, we see a lot of libraries that follow a different approach to networking. Some of the prominent ones are Docker Compose, Weave, Kubernetes, Pipework, and Libnetwork. The following figure depicts the root ideas of Docker networking: Docker networking modes What's new in Docker networking? Docker networking is at a very nascent stage, and there are many interesting contributions from the developer community, such as Pipework, Weave, Clocker, and Kubernetes. Each of them reflects a different aspect of Docker networking. We will learn about them in later chapters. Docker, Inc. has also established a new project, where networking will be standardized. It is called libnetwork. Libnetwork implements the Container Network Model (CNM), which formalizes the steps required to provide networking for containers while providing an abstraction that can be used to support multiple network drivers. The CNM is built on three main components—sandbox, endpoint, and network. Sandbox A sandbox contains the configuration of a container's network stack. This includes management of the container's interfaces, routing table, and DNS settings. An implementation of a sandbox could be a Linux network namespace, a FreeBSD jail, or other similar concept. A sandbox may contain many endpoints from multiple networks. Endpoint An endpoint connects a sandbox to a network. An implementation of an endpoint could be a veth pair, an Open vSwitch internal port, or something similar. An endpoint can belong to only one network but may only belong to one Sandbox. Network A network is a group of endpoints that are able to communicate with each other directly. An implementation of a network could be a Linux bridge, a VLAN, and so on. 
Networks consist of many endpoints, as shown in the following diagram: The Docker CNM model The Docker CNM model The CNM provides the following contract between networks and containers: All containers on the same network can communicate freely with each other Multiple networks are the way to segment traffic between containers and should be supported by all drivers Multiple endpoints per container are the way to join a container to multiple networks An endpoint is added to a network sandbox to provide it with network connectivity Summary In this article, we learned about the essential components of Docker networking, which have evolved from coupling simple Docker abstractions and powerful network components such as Linux bridges and Open vSwitch. We also talked about the next generation of Docker networking, which is called libnetwork. Resources for Article: Further resources on this subject: Advanced Container Resource Analysis [article] Docker in Production [article] Elucidating the Game-changing Phenomenon of the Docker-inspired Containerization Paradigm [article]
Read more
  • 0
  • 0
  • 5268
Modal Close icon
Modal Close icon