How-To Tutorials

7018 Articles
Amey Varangaonkar
29 Jun 2018
11 min read

Indexing, Replicating, and Sharding in MongoDB [Tutorial]

MongoDB is an open source, document-oriented, cross-platform database, written primarily in C++. It is also the leading NoSQL database and ranks fifth among all databases, just behind PostgreSQL. It provides high performance, high availability, and easy scalability. MongoDB uses JSON-like documents with optional schemas. MongoDB, developed by MongoDB Inc., is free to use. It is published under a combination of the GNU Affero General Public License and the Apache License.

In this article, we look at the indexing, replication, and sharding features offered by MongoDB.

The following excerpt is taken from the book 'Seven NoSQL Databases in a Week' written by Aaron Ploetz et al.

Introduction to MongoDB indexing

Indexes allow efficient execution of MongoDB queries. Without indexes, MongoDB has to scan every document in the collection to select the documents that match the query criteria. With proper indexing, MongoDB can limit the number of documents it scans and select documents efficiently. Indexes are special data structures that store the values of specific fields or sets of fields in an easy-to-traverse form, ordered by field value. This ordering allows efficient search algorithms, such as binary search, and supports range-based operations effectively. In addition, MongoDB can use the ordering in the index to return sorted results.

Indexes in MongoDB are fundamentally similar to indexes in other database systems. MongoDB defines indexes at the collection level and supports indexes on fields and sub-fields of documents.

The default _id index

MongoDB creates the default _id index when a collection is created. The _id index prevents users from inserting two documents with the same _id value. You cannot drop the index on the _id field.
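The ordering idea from the indexing introduction above can be sketched in a few lines of Python. This is only an illustration of why sorted keys make equality and range lookups cheap; it uses a sorted list and the standard bisect module rather than MongoDB's actual index structure, and the documents and helper names (build_index, find_equal, find_range) are hypothetical.

```python
import bisect

# Hypothetical documents; the field names are illustrative only.
documents = [
    {"_id": 1, "firstName": "Carol", "age": 41},
    {"_id": 2, "firstName": "Alice", "age": 30},
    {"_id": 3, "firstName": "Bob", "age": 25},
    {"_id": 4, "firstName": "Alice", "age": 52},
]


def build_index(docs, field):
    """Return a list of (field value, _id) pairs sorted by field value."""
    return sorted((doc[field], doc["_id"]) for doc in docs)


def find_equal(index, value):
    """Binary-search the sorted index for all _ids whose key equals value."""
    pos = bisect.bisect_left(index, (value,))
    ids = []
    while pos < len(index) and index[pos][0] == value:
        ids.append(index[pos][1])
        pos += 1
    return ids


def find_range(index, low, high):
    """Return the _ids whose key k satisfies low <= k < high, in key order."""
    start = bisect.bisect_left(index, (low,))
    end = bisect.bisect_left(index, (high,))
    return [doc_id for _, doc_id in index[start:end]]


first_name_index = build_index(documents, "firstName")
print(find_equal(first_name_index, "Alice"))
print(find_range(first_name_index, "Alice", "Carol"))
```

Because the entries are kept in key order, neither lookup has to visit every document, and the range result comes back already sorted by firstName.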
The following syntax is used to create an index in MongoDB:

    > db.collection.createIndex(<key and index type specification>, <options>);

The preceding method creates an index only if an index with the same specification does not already exist. MongoDB indexes use the B-tree data structure.

The following are the different types of indexes:

  • Single field: In addition to the _id field index, MongoDB allows the creation of an index on any single field in ascending or descending order. For a single field index, the sort order of the index does not matter, as MongoDB can traverse the index in either direction. For example, db.user_profiles.createIndex({ firstName: 1 }) creates an index on the firstName field of the user_profiles collection, and the shell returns an acknowledgment once the index has been created. This creates an ascending index on the firstName field. To create a descending index, we provide -1 instead of 1.
  • Compound index: MongoDB also supports user-defined indexes on multiple fields. The order in which the fields are listed when creating the index is significant. For example, a compound index defined as {firstName:1, age:-1} sorts entries by firstName first and then, within each firstName value, by age in descending order.
  • Multikey index: MongoDB uses multikey indexes to index the content of arrays. If you index a field that holds an array value, MongoDB creates a separate index entry for each element of the array. These indexes allow queries to select documents by matching an element or set of elements of the array. MongoDB automatically decides whether to create a multikey index.
  • Text indexes: MongoDB provides text indexes that support searching the string content of a MongoDB collection.
To create a text index, we use the db.collection.createIndex() method and pass the string literal "text" as the index type for the field, in place of 1 or -1. You can also create a text index on multiple fields by giving each of them the "text" type. Once the index is created, the shell returns an acknowledgment. A compound index can combine text index keys with ascending or descending index keys.

  • Hashed index: To support hash-based sharding, MongoDB supports hashed indexes. In this approach, the index stores hashes of the values of the indexed field, and queries are matched against those hashes. Hashed indexes support only equality matches; they cannot serve range-based queries.

Indexes have the following properties:

  • Unique indexes: A unique index makes MongoDB reject documents that would duplicate an existing value of the indexed field.
  • Partial indexes: A partial index covers only the documents in a collection that match a specified filter condition. Because it indexes a subset of the documents, a partial index has lower storage requirements and reduced performance costs for index creation and maintenance.
  • Sparse index: A sparse index includes only those documents in which the indexed field is present; other documents are left out of the index. We can combine the unique and sparse options to reject documents that have duplicate values for a field while ignoring documents that lack the indexed key.
  • TTL index: A TTL index is a special type of index that MongoDB uses to automatically remove documents from a collection after a certain amount of time. Such indexes are ideal for machine-generated data, logs, and session information that we need only for a finite duration. A TTL index is created on a date field by passing the expireAfterSeconds option to createIndex(); for example, with expireAfterSeconds set to 3000, MongoDB automatically deletes documents from the log collection 3000 seconds after the indexed date value. Once the index is created, the shell returns an acknowledgment message.

The limitations of indexes:

  • A single collection can have up to 64 indexes only.
  • The fully qualified index name is <database-name>.<collection-name>.$<index-name> and cannot be longer than 128 characters. By default, the index name is a combination of the field names and the index type. You can specify an index name when calling the createIndex() method to ensure that the fully qualified name does not exceed the limit.
  • There can be no more than 31 fields in a compound index.
  • A query cannot use both text and geospatial indexes. You cannot combine the $text operator, which requires a text index, with a query operator that requires a different type of special index. For example, you cannot combine the $text operator with the $near operator.
  • Fields with 2dsphere indexes can only hold geometry data. 2dsphere indexes are provided specifically for geometric data operations. For example, to operate on coordinates, we have to provide the data as points, [x, y]. If the field holds non-geometry data, the operation fails.

The limitations on data:

  • If we set the maximum number of documents in a capped collection with the max parameter at creation time, the limit must be less than 2^32. If we do not specify it, there is no limit on the number of documents in the capped collection, which can slow down queries.
  • The MMAPv1 storage engine allows 16,000 data files per database, which gives a maximum database size of 32 TB. Setting the storage.mmapv1.smallFiles parameter reduces this maximum to 8 TB.
  • Replica sets can have up to 50 members.
  • Shard keys cannot exceed 512 bytes.

Replication in MongoDB

A replica set is a group of MongoDB instances that store the same set of data. Replica sets are used in production to ensure high availability of data.

Redundancy and data availability

Because of replication, we have redundant data across the MongoDB instances. Replication provides high availability of data to the application: if one instance of MongoDB is unavailable, we can serve data from another instance.
Replication also increases the read capacity of applications, as read operations can be sent to different servers to retrieve data faster. By maintaining data on different servers, we can increase the locality and availability of data for distributed applications. We can also use the replica copies for backup, reporting, and disaster recovery.

Working with replica sets

A replica set is a group of MongoDB instances that have the same dataset. A replica set has multiple data-bearing nodes and, optionally, one arbiter node. Among the data-bearing nodes, one is the primary node and the others are secondary nodes.

All write operations happen at the primary node. Once a write occurs at the primary node, the data is replicated to the secondary nodes internally, so that copies of the data are available on all nodes and data inconsistency is avoided. If the primary node is unavailable, the secondary nodes use an election algorithm to select one of themselves as the new primary node.

A special node, called an arbiter node, can be added to the replica set. The arbiter does not store any data. It is used to maintain a quorum in the replica set by responding to the heartbeat and election requests sent by the other members. As an arbiter does not store data, it is a cost-effective resource for the election process. If the votes in an election would otherwise be evenly split, the arbiter adds a vote to help choose a primary node. An arbiter always remains an arbiter; unlike a primary or secondary node, it never changes its role. The primary node can step down and work as a secondary node, while secondary nodes can be elected to act as the primary node.

Secondary nodes apply the operations replicated from the primary node to their own copies of the data asynchronously.

Automatic failover in replication

The primary node exchanges heartbeats with the other members of the replica set at regular intervals (every two seconds by default).
If the primary fails to communicate with the other members for 10 seconds, the eligible secondary nodes hold an election to choose a new primary among themselves. The first secondary node that calls an election and receives a majority of the votes becomes the primary node. If there is an arbiter node, its vote is also taken into account.

Read operations

By default, read operations are served by the primary node only, but we can configure reads to be served by secondary nodes as well. A read from a secondary node does not affect data at the primary node. Because replication is asynchronous, reading from secondary nodes can return stale data.

Sharding in MongoDB

Sharding is a method of distributing data across multiple machines. It is used for deployments with large datasets and high-throughput operations. A single server can struggle with a large dataset, as it requires more storage, and bulk query operations can consume most of the CPU cycles, which slows down processing. For such scenarios, we need more powerful systems.

One approach is to add more capacity to a single server, such as more RAM or more processing units. This is called vertical scaling. Another approach is to divide a large dataset across multiple systems and let the application query data from multiple servers. This approach is called horizontal scaling. MongoDB handles horizontal scaling through sharding.

Sharded clusters

A MongoDB sharded cluster consists of the following components:

  • Shard: Each shard stores a subset of the sharded data. Each shard can also be deployed as a replica set.
  • Mongos: The mongos process provides an interface between a client application and the sharded cluster, routing queries to the appropriate shards.
  • Config server: The configuration servers store the metadata and configuration settings for the cluster.

MongoDB shards data at the collection level and distributes it across the shards in the cluster.
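As a rough illustration of the routing role described above, the following Python sketch shows how a router could decide which shard to contact, assuming the data is split into ranges ("chunks") of a key value, the shard key that the article discusses next. This is a simplified model, not how mongos is implemented; the chunk boundaries, shard names, and function names are all made up for the example.

```python
import bisect

# Hypothetical chunk layout. CHUNK_UPPER_BOUNDS[i] is the exclusive upper
# bound of chunk i; the final chunk has no upper bound.
CHUNK_UPPER_BOUNDS = [1000, 2000, 3000]

# Illustrative mapping from chunk number to the shard that stores it.
CHUNK_TO_SHARD = ["shardA", "shardB", "shardA", "shardC"]


def chunk_for(key):
    """Return the number of the chunk whose key range contains key."""
    return bisect.bisect_right(CHUNK_UPPER_BOUNDS, key)


def shard_for(key):
    """Return the shard that owns the chunk containing key."""
    return CHUNK_TO_SHARD[chunk_for(key)]


def shards_for_range(low, high):
    """Return the shards to contact for keys between low and high inclusive."""
    first, last = chunk_for(low), chunk_for(high)
    return sorted({CHUNK_TO_SHARD[i] for i in range(first, last + 1)})


print(shard_for(1500))
print(shards_for_range(500, 1500))
```

In this model a query on a single key touches one shard, while a range query only touches the shards whose chunks overlap the range, which is why spreading chunks across shards spreads the load.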
Shard keys: To distribute the documents in a collection, MongoDB partitions the collection using the shard key. MongoDB splits sharded data into chunks, and these chunks are distributed across the shards in the sharded cluster.

Advantages of sharding

Here are some of the advantages of sharding:

  • When we use sharding, the load of read/write operations is distributed across the shards in the cluster.
  • As sharding distributes data across the shards in a cluster, we can increase storage capacity by adding shards horizontally.
  • MongoDB can continue to perform partial read/write operations even if one of the shards is unavailable. In a production environment, shards should be deployed as replica sets to maintain high availability and add fault tolerance to the system.

Indexing, sharding, and replication are three of the most important tasks to perform on any database, as they help ensure good query and database performance. In this article, we saw how MongoDB facilitates these tasks and makes them as easy as possible for administrators.

If you found the excerpt useful, make sure you check out our book Seven NoSQL Databases in a Week to learn more about the different database administration techniques in MongoDB, as well as other popular NoSQL databases such as Redis, HBase, Neo4j, and more.

Packt
26 Nov 2012
7 min read

Nmap Fundamentals

Nmap (Network Mapper)

Nmap (Network Mapper) is an open-source tool specialized in network exploration and security auditing, originally published by Gordon "Fyodor" Lyon. The official website (http://nmap.org) describes it as follows:

    Nmap (Network Mapper) is a free and open source (license) utility for network discovery and security auditing. Many systems and network administrators also find it useful for tasks such as network inventory, managing service upgrade schedules, and monitoring host or service uptime. Nmap uses raw IP packets in novel ways to determine what hosts are available on the network, what services (application name and version) those hosts are offering, what operating systems (and OS versions) they are running, what type of packet filters/firewalls are in use, and dozens of other characteristics. It was designed to rapidly scan large networks, but works fine against single hosts. Nmap runs on all major computer operating systems, and official binary packages are available for Linux, Windows, and Mac OS X.

There are many other port scanners out there, but none of them even comes close to offering the flexibility and advanced options of Nmap.

The Nmap Scripting Engine (NSE) has revolutionized the possibilities of a port scanner by allowing users to write scripts that perform custom tasks using the host information collected by Nmap.

Additionally, the Nmap Project includes other great tools:

  • Zenmap: A graphical interface for Nmap
  • Ndiff: A tool for scan result comparison
  • Nping: An excellent tool for packet generation and traffic analysis
  • Ncrack: An Nmap-compatible tool for brute forcing network logins
  • Ncat: A debugging utility to read and write data across networks

Needless to say, it is essential that every security professional and network administrator master this tool to conduct security assessments, monitor, and administer networks efficiently.
Nmap's community is very active, and new features are added every week. I encourage you to always keep an updated copy in your arsenal, if you haven't done this already; and even better, to subscribe to the development mailing list at http://cgi.insecure.org/mailman/listinfo/nmap-dev.

Downloading Nmap from the official source code repository

This section describes how to download Nmap's source code from the official subversion repository. By doing so, users can compile the latest version of Nmap and keep up with the daily updates that are committed to the subversion repository.

Getting ready

Before continuing, you need to have a working Internet connection and access to a subversion client. Unix-based platforms come with a command-line client called subversion (svn). To check if it's already installed on your system, open a terminal and type:

    $ svn

If it tells you that the command was not found, install svn using your favorite package manager or build it from source code. The instructions for building svn from source code are out of the scope of this book, but they are widely documented online. Use your favorite search engine to find specific instructions for your system.

If you would rather work with a graphical user interface, RapidSVN is a very popular, cross-platform alternative. You can download and install RapidSVN from http://rapidsvn.tigris.org/.

How to do it...

Open your terminal and enter the following command:

    $ svn co --username guest https://svn.nmap.org/nmap/

Wait until svn downloads all the files stored in the repository.
You should see the list of added files as the checkout finishes. When the program exits, you will have Nmap's source code in your current directory.

How it works...

    $ svn checkout https://svn.nmap.org/nmap/

This command downloads a copy of the remote repository located at https://svn.nmap.org/nmap/. This repository has world read access to the latest stable build, allowing svn to download your local working copy.

There's more...

If you are using RapidSVN, follow these steps:

  1. Right-click on Bookmarks.
  2. Click on Checkout New Working Copy.
  3. Type https://svn.nmap.org/nmap/ in the URL field.
  4. Select your local working directory.
  5. Click on OK to start downloading your new working copy.

Experimenting with development branches

If you want to try the latest creations of the development team, there is a folder named nmap-exp that contains different experimental branches of the project. Code stored there is not guaranteed to work all the time, as the developers use it as a sandbox until it is ready to be merged into the stable branch. The full subversion URL of this folder is https://svn.nmap.org/nmap-exp/.

Keeping your source code up-to-date

To update a previously downloaded copy of Nmap, use the following command inside your working directory:

    $ svn update

You should see the list of files that have been updated, as well as some revision information.

Compiling Nmap from source code

Precompiled packages always take time to prepare and test, causing delays between releases. If you want to stay up-to-date with the latest additions, compiling Nmap's source code is highly recommended. This recipe describes how to compile Nmap's source code in the Unix environment.

Getting ready

Make sure the following packages are installed on your system:

  • gcc
  • openssl
  • make

Install the missing software using your favorite package manager or build it from source code.

How to do it...
Open your terminal and go into the directory where Nmap's source code is stored. Configure it according to your system:

    $ ./configure

If the configuration succeeds, an ASCII dragon warning you about the power of Nmap is displayed; otherwise, lines describing the error are displayed. Build Nmap using the following command:

    $ make

If you don't see any errors, you have built the latest version of Nmap successfully. You can check this by looking for the compiled nmap binary in your current directory. If you want to make Nmap available to all the users of the system, enter the following command:

    # make install

How it works...

We used the configure script to set up the different parameters and environment variables affecting your system and desired configuration. Afterwards, GNU make generated the binary files by compiling the source code.

There's more...

If you only need the Nmap binary, you can use the following configure directives to avoid installing Ndiff, Nping, and Zenmap:

  • Skip the installation of Ndiff by using --without-ndiff
  • Skip the installation of Zenmap by using --without-zenmap
  • Skip the installation of Nping by using --without-nping

OpenSSL development libraries

OpenSSL is optional when building Nmap. Enabling it allows Nmap to access the functions of this library related to multiprecision integers, hashing, and encoding/decoding for service detection and Nmap NSE scripts. The name of the OpenSSL development package in Debian systems is libssl-dev.

Configure directives

There are several configure directives that can be used when building Nmap. For a complete list of directives, use the following command:

    $ ./configure --help

Precompiled packages

There are several precompiled packages available online (http://nmap.org/download.html) for those who don't have access to a compiler, but unfortunately, it's very likely you will be missing features unless it's a very recent build. Nmap is continuously evolving. If you are serious about harnessing the power of Nmap, keep your local copy up-to-date with the official repository.
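The build recipe above (configure, make, and optionally make install) could be wrapped in a small Python helper along the following lines. This is a sketch under assumptions rather than part of the Nmap project: the function names are hypothetical, and the only configure directives used are the three --without-* flags mentioned in the recipe.

```python
import subprocess

# The optional tools named in the recipe that configure can skip.
OPTIONAL_TOOLS = {"ndiff", "zenmap", "nping"}


def configure_command(skip=()):
    """Build the ./configure argument list, skipping the named tools."""
    unknown = set(skip) - OPTIONAL_TOOLS
    if unknown:
        raise ValueError(f"unsupported component(s): {sorted(unknown)}")
    return ["./configure"] + [f"--without-{name}" for name in skip]


def build_steps(skip=(), install=False):
    """Return the commands of the recipe, in the order they should run."""
    steps = [configure_command(skip), ["make"]]
    if install:
        # Installing system-wide normally requires root privileges.
        steps.append(["make", "install"])
    return steps


def run_build(source_dir, skip=(), install=False):
    """Run each step inside source_dir, stopping at the first failure."""
    for step in build_steps(skip, install):
        subprocess.run(step, cwd=source_dir, check=True)


# Print the plan only; call run_build() on a real checkout to execute it.
for step in build_steps(skip=("ndiff", "zenmap", "nping")):
    print(" ".join(step))
```

Keeping the command construction separate from run_build makes it easy to review the exact commands before anything is executed.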

Bhagyashree R
10 Oct 2018
8 min read

Creating and deploying a chatbot using Dialogflow [Tutorial]

Dialogflow (previously called API.AI) is a platform from Google for building conversational agents. It is a web-based platform that can be accessed from any web browser. The tool evolved from SpeakToIt, an Android app built as an answer to Apple's Siri that created Siri-like conversational experiences on any Android smartphone. The AI and natural language technology that powered the SpeakToIt app was opened up to developers as API.AI in 2015.

This tutorial is an excerpt from a book written by Srini Janarthanam titled Hands-On Chatbots and Conversational UI Development. In this article, we will create a basic chatbot using Dialogflow, add user intents, and finally see how to integrate the chatbot with a website and Facebook.

Setting up Dialogflow

First, let us create a developer account on API.AI (now called Dialogflow):

  1. Go to Dialogflow.
  2. Click GO TO CONSOLE in the top-right corner.
  3. Sign in. You may need to use your Google account to sign in.

Creating a basic agent

Let us create our first agent on Dialogflow:

  1. To create a new agent, click the drop-down menu on the left of the home page and click Create new agent.
  2. Fill in the form on the right. Give the agent a name and description. Choose a time zone and click CREATE.
  3. This will take you to the page with the intents listing. You will notice that there are two intents already: Default Fallback Intent and Default Welcome Intent.
  4. Let's add your first intent. An intent is what the user or bot wants to convey using utterances or button presses. It is a symbolic representation of an utterance. We need intents because there are many ways to ask for the same thing. Identifying the intent means mapping those many ways unambiguously to a single intent.
For instance, the user could ask about the weather in their city using the following utterances:

  • "hows the weather in london"
  • "whats the weather like in london"
  • "weather in london"
  • "is it sunny outside just now"

In the preceding utterances, the user is asking for a weather report for the city of London. In some of these utterances, they also mention the time (that is, now); in others, it is implicit. The first step of our algorithm is to map these many utterances to a single intent: request_weather_report.

Intent names correspond to users' intents, so name them from the user's perspective. Let's add a user_greet intent that corresponds to the user greeting the chatbot.

  5. To add an intent, click the CREATE INTENT button. A page opens where you can create a new intent.
  6. Give the intent a name (for example, user_greet).
  7. Add sample user utterances in the User says text field. These are sample utterances that will help the agent identify the user's intent. Let's add a few greeting utterances that the user might say to our chatbot:

         hello
         hello there
         Hi there Albert
         hello doctor
         good day doctor

  8. Ignore the Events tab for the moment and move on to the Action tab. Add a name to identify the system intent here (for example, bot_greet, to represent the chatbot's greeting to the user).
  9. In the Response tab, add the bot's response to the user. This is the actual utterance that the bot will send to the user. Let's add the following utterance in the Text response field:

         Hi there. I am Albert. Nice to meet you!

     You can add more responses so that the agent can randomly pick one, making it less repetitive and boring. You can also add up to 10 additional responses by clicking ADD MESSAGE CONTENT.
  10. Click the SAVE button in the top-right corner to save the intent.

You have created your very first intent for the agent. Test it by using the simulator on the right side of the page.
In the Try it now box, type hello and press Enter. You will see the chatbot recognizing your typed utterance and responding appropriately.

Now go on and add a few more intents by repeating steps 5 through 10. To create a new intent, click the + sign beside the Intents option in the menu on the left. Think about what kind of information users will ask the chatbot for and make a list. These will become user intents. The following is a sample list to get you started:

  • request_name
  • request_birth_info
  • request_parents_names
  • request_first_job_experience
  • request_info_on_hobbies
  • request_info_patent_job
  • request_info_lecturer_job_bern

Of course, this list can be endless. So go on and have fun. Once you have added a sufficient number of facts in this format, you can test the chatbot in the simulator as explained in step 10.

Deploying the chatbot

Now that we have a chatbot, let us publish it on a platform where users can actually use it. Dialogflow enables you to integrate the chatbot (that is, the agent) with many platforms. Click Integrations to see all the platforms that are available. In this section, we will explore two platform integrations: website and Facebook.

Website integration

Website integration allows you to put the chatbot on a website. The user can interact with the chatbot on the website just as they would with a live chat agent.

  1. On the Integrations page, find the Web Demo platform and slide the switch from off to on.
  2. Click Web Demo to open its settings dialog box.
  3. Click the bot.dialogflow.com URL to open the sample webpage where you can find the bot in a chat widget embedded on the page. Try having a chat with it.
  4. You can share the bot privately by email or on social media by clicking the Email and Share option.
  5. The chat widget can also be embedded in any website by using the iframe embed code found in the settings dialog box.
Copy and paste the code into an HTML page and try it out in a web browser:

    <iframe width="350" height="430"
        src="https://console.api.ai/api-client/demo/embedded/2d55ca53-1a4c-4241-8852-a7ed4f48d266">
    </iframe>

Facebook integration

In order to publish the API.AI chatbot on Facebook Messenger, we need a Facebook page to start with. We also need a Facebook Messenger app that subscribes to the page. Before performing the following steps, create a Facebook page and a Facebook Messenger app. Then continue as follows:

  1. Having created a Facebook Messenger app, get its Page Access Token. You can find this on the app's Messenger Settings tab.
  2. In the same tab, click Set up Webhooks. A dialog box called New Page Subscription will open. Keep it open in one browser tab.
  3. In another browser tab, from the Integrations page of API.AI, click Facebook Messenger.
  4. Copy the URL in the Callback URL text field. This is the URL of the API.AI agent that the Messenger app will call.
  5. Paste it into the Callback URL text field of the New Page Subscription dialog box on the Facebook Messenger app. Type in a verification token. It can be anything, as long as it matches the one on the other side. Let's type in iam-einstein-bot.
  6. Subscribe to messages and messaging_postbacks in the Subscription Fields section. And wait! Don't click Verify and Save just yet.
  7. In the API.AI browser tab, you will have the integration settings open. Slide the switch in the top-right corner from off to on. This will allow you to edit the settings.
  8. Type the Verify Token. This has to be the same as the one used in the Facebook Messenger app settings in step 5.
  9. Paste the Page Access Token and click START.
  10. Now go back to the Facebook Messenger app and click Verify and Save. This will connect the app to the agent (chatbot).
Now, on the Facebook Messenger settings page, under Webhooks, select the Facebook page that the app needs to subscribe to and click Subscribe. You should now be able to open the Facebook page, click Send Message, and have a chat with the chatbot.

Brilliant! You have now successfully created a chatbot in API.AI and deployed it on two platforms: web and Facebook Messenger. In addition to these platforms, API.AI enables integration of your agent with several popular messaging platforms such as Slack, Skype, Cisco Spark, Viber, Kik, Telegram, and even Twitter.

If you found this post useful, do check out the book, Hands-On Chatbots and Conversational UI Development, which will help you explore the world of conversational user interfaces.
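To make the intent idea from this tutorial more concrete, here is a toy Python sketch of mapping many utterances onto one intent. It is not how Dialogflow matches intents; it simply scores word overlap between the input and the sample utterances quoted in the article, and the threshold value and function names are assumptions chosen for the illustration.

```python
import re

# Sample utterances taken from the tutorial, grouped by intent name.
TRAINING_PHRASES = {
    "user_greet": [
        "hello",
        "hello there",
        "Hi there Albert",
        "hello doctor",
        "good day doctor",
    ],
    "request_weather_report": [
        "hows the weather in london",
        "whats the weather like in london",
        "weather in london",
        "is it sunny outside just now",
    ],
}
FALLBACK_INTENT = "Default Fallback Intent"


def tokens(text):
    """Lower-case the text and return its set of words."""
    return set(re.findall(r"[a-z']+", text.lower()))


def match_intent(utterance, threshold=0.5):
    """Return the intent whose sample phrase overlaps most with utterance."""
    words = tokens(utterance)
    best_intent, best_score = FALLBACK_INTENT, 0.0
    for intent, phrases in TRAINING_PHRASES.items():
        for phrase in phrases:
            sample = tokens(phrase)
            union = words | sample
            # Jaccard similarity: shared words divided by all distinct words.
            score = len(words & sample) / len(union) if union else 0.0
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else FALLBACK_INTENT


print(match_intent("Hello there!"))
print(match_intent("what's the weather in London"))
print(match_intent("book me a flight"))
```

Adding more sample utterances per intent widens what the toy matcher recognizes, which mirrors why the tutorial asks for several sample phrases when creating an intent.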

Packt
19 Dec 2013
18 min read

CRUD Applications using Laravel 4

Getting familiar with Laravel 4

Let's begin the journey and install Laravel 4. If everything is installed correctly, you will be greeted by Laravel's welcome screen when you point your browser to http://localhost/laravel/public or http://localhost/<installeddirectory>/public.

Now that we can see Laravel is installed correctly, you might be wondering: How can I use Laravel? How do I create apps with it? Or you might be wondering why and how this screen is shown to us. What's happening behind the scenes? How does Laravel 4 set up this screen for us? Let's review that.

When you visit http://localhost/laravel/public, Laravel 4 detects that you are requesting the default route, which is "/". If you are not familiar with the MVC world, you may be wondering what a route is. Let me explain. In traditional web applications, we use a URL with a page name, for example:

    http://www.shop.com/products.php

The preceding URL is bound to the page products.php on the web server hosting shop.com. We can assume that it displays all the products from the database. Now suppose we want to display only the books category. You will say, "Hey, it's easy!" Just add the category ID to the URL as follows:

    http://www.shop.com/products.php?cat=1

Then add a filter to products.php that checks whether a category ID was passed. This sounds perfect, but what about pagination and other categories? Soon, clients will ask you to change the layout of one of your category pages, and you will hack your code some more.
And your application URLs will look like the following:

    http://www.shop.com/products.php?cat=2
    http://www.shop.com/products.php?cat=3&page=1&total=20
    http://www.shop.com/products.php?cat=3&page=1&total=20&layout=1

If you look at your code after six months, you will be looking at one huge products.php page with all of your business and view code mixed in one large file. You won't remember the easy hacks you made to handle client requests. On top of that, a client or the client's SEO executive might ask why all the URLs are so badly formatted. Why are they not human friendly? In a way they are right. Your URLs are not as pretty as the following:

    http://www.shop.com/products
    http://www.shop.com/products/books
    http://www.shop.com/products/cloths

The preceding URLs are human friendly. Users can easily change categories themselves. In addition, your client's SEO executives will love you for those URLs, just as search engines like them.

You might be puzzled now; how do you do that? Here, my friend, MVC (Model View Controller) comes into the picture. MVC frameworks are meant specifically for doing this. It's one of the core goals of using an MVC framework in web development.

So let's go back to our topic, routing. Routing means decoupling the URL of a request from the file that serves it and assigning that URL to a specific action via your controller or route. In the Laravel MVC world, you register all your routes in a route file and assign an action to each of them. Routes are generally found in /app/routes.php.

If you open the routes.php file of your newly downloaded Laravel installation, you will notice the following code:

    Route::get('/', function()
    {
        return View::make('hello');
    });

The preceding code registers the route /, which is the default URL, and binds it to the view /app/views/hello.php. Here the view is essentially just an HTML file. View files are generally used to manage your presentation logic. So check /app/views/hello.php, or better, let's create an about page for our application ourselves.
Let's register an about route by adding the following code to app/routes.php:

    Route::get('about', function()
    {
        return View::make('about');
    });

We need to create a view at app/views/about.php. So create the file and insert the following code into it:

    <!doctype html>
    <html lang="en">
    <head>
        <meta charset="UTF-8">
        <title>About my little app</title>
    </head>
    <body>
        <h1>Hello Laravel 4!</h1>
        <p>
            Welcome to the Awesomeness!
        </p>
    </body>
    </html>

Now head over to your browser and open http://localhost/laravel/public/about. You will be greeted with the following output:

    Hello Laravel 4!
    Welcome to the Awesomeness!

Isn't it easy? You can define a route and a separate view for each type of request. Now you might be thinking: what about Controllers, as the C in MVC stands for Controllers? And isn't it difficult to create routes and views for each action? What advantage does mapping URLs to particular actions give us over the traditional one-file-based method?

Well, first, your code is organized far better, as the actions that respond to specific URLs are mapped in the route file. Any developer can read the routes and see what's going on in your code. Developers do not have to check many files to see which files are using which code. Your presentation logic is separated, so if a designer wants to change something, they know to look in the views folder of your application.

Now about Controllers: they allow us to group related actions into a single class. So in a typical MVC project, there will be one user Controller that is responsible for all user-related actions, such as registering, logging in, editing a profile, and changing the password. Route closures are generally used for small applications or for creating static pages quickly. Controllers provide a more structured way to group the methods that belong to a specific part of the application.
Here is how we can create Controllers in Laravel 4. Open your app/routes.php file and add the following code:

    Route::get('contact', 'PagesController@contact');

The preceding code registers the http://yourapp.com/contact URL with the contact method of the PagesController class. So let's write the Pages Controller. Create a file PagesController.php at /app/controllers/ in your Laravel 4 installation directory. The following are the contents of the PagesController.php file:

    <?php

    class PagesController extends BaseController {

        public function contact()
        {
            return View::make('hello');
        }

    }

Here BaseController is a class provided by Laravel so that we can place logic shared by our Controllers in a common class. It extends the framework's Controller class, which provides the Controller functionality. You can open BaseController.php in the controllers directory to add shared logic.

Controllers versus routes

So you are wondering now, "What's the difference between Controllers and routes? Which one should I use?" Here are the differences between Controllers and routes:

  • A disadvantage of routes is that you can't share code between routes, as routes work via Closure functions, and the scope of a closure is bound to that function.
  • Controllers give structure to your code. You can define your system in well-grouped classes, divided in a way that makes sense, for example, users, dashboard, products, and so on.
  • Compared to routes, Controllers have only one disadvantage: you have to create a file for each Controller. However, when it comes to organizing the code of a large application, it makes more sense to use Controllers.

Creating a simple CRUD application with Laravel 4

Now that we have a basic understanding of how to create pages, let's create a simple CRUD application with Laravel 4. The application we want to create will manage the users of our application.
We will create the following list of features for our application:

  • List users (read users from the database)
  • Create new users
  • Edit user information
  • Delete user information
  • Add pagination to the list of users

To start things off, we need to set up a database. If you have phpMyAdmin installed with your local web server setup, head over to http://localhost/phpmyadmin; if you don't have phpMyAdmin installed, use a MySQL admin tool such as MySQL Workbench to connect to your database server. Either way, create a new database.

Now we need to configure Laravel 4 to connect to our database. Head over to your Laravel 4 application folder, open /app/config/database.php, and change the mysql array to match your current database settings. Here is the MySQL database array from the database.php file:

    'mysql' => array(
        'driver'    => 'mysql',
        'host'      => 'localhost',
        'database'  => '<yourdbname>',
        'username'  => 'root',
        'password'  => '<yourmysqlpassword>',
        'charset'   => 'utf8',
        'collation' => 'utf8_unicode_ci',
        'prefix'    => '',
    ),

Now we are ready to work with the database in our application. Let's first create the users table via the following SQL query from phpMyAdmin or any MySQL database admin tool. Note that MySQL identifiers are quoted with backticks, not single quotes:

    CREATE TABLE IF NOT EXISTS `users` (
      `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
      `username` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
      `password` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
      `email` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
      `phone` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
      `name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
      `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
      `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
      PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=3;

Now let's seed some data into the users table so that we won't get empty results when we fetch the users.
Run the following query in your database admin tool:

    INSERT INTO `users` (`id`, `username`, `password`, `email`, `phone`, `name`, `created_at`, `updated_at`) VALUES
    (1, 'john', 'johndoe', 'johndoe@gmail.com', '123456', 'John', '2013-06-07 08:13:28', '2013-06-07 08:13:28'),
    (2, 'amy', 'amy.deg', 'amy@outlook.com', '1234567', 'amy', '2013-06-07 08:14:49', '2013-06-07 08:14:49');

Listing the users – read users from database

Let's read users from the database. We need the following pieces to do that:

  • A route that will lead to our page
  • A controller that will handle our method
  • The Eloquent Model that will connect to the database
  • A view that will display our records in the template

So let's create our route in /app/routes.php. Add the following line to the routes.php file:

    Route::resource('users', 'UserController');

As you may have noticed, we previously used Route::get to display our page Controller. But now we are using resource. So what's the difference?

In general, we face two types of requests in web projects: GET and POST. We generally use these HTTP request types to drive our pages; that is, you check whether the page has any POST variables set and, if not, display the form for the user to enter data. When the user submits the form, it sends a POST request, as we generally define the <form method="post"> tag in our pages. Then, based on the page's request type, we run code to perform actions such as inserting user data into our database or filtering records.

Laravel lets us tap into either a GET or a POST request via routes and send it to the appropriate method. Here is an example:

    Route::get('/register', 'UserController@showUserRegistration');
    Route::post('/register', 'UserController@saveUser');

The difference here is that we register the same URL, /register, twice. For the GET method, Laravel calls the showUserRegistration method of the UserController class.
If it's the POST method, Laravel calls the saveUser method of the UserController class.

You might be wondering what the benefit is. Well, six months later, if you want to know how something happens in your app, you can just check the routes.php file and see which Controller and which method handles the part you are interested in, whether you are developing it further or fixing a bug. Even another developer who is new to your project will be able to understand the structure of your application by checking routes.php, and can easily help move your project forward.

Now imagine the routes you will need for editing, deleting, or displaying a user. A resource Controller will save you from this trouble. A single route line maps multiple RESTful actions to our resource Controller. It automatically maps the following actions to HTTP verbs:

    HTTP VERB | ACTION
    GET       | READ
    POST      | CREATE
    PUT       | UPDATE
    DELETE    | DELETE

On top of that, you can generate your Controller from the command line with artisan, using the following command:

    $ php artisan controller:make UserController

This will generate UserController.php with all the RESTful methods left empty, so you will have an empty structure to play with. Here is what we will have after the preceding command:

    class UserController extends BaseController {

        /**
         * Display a listing of the resource.
         *
         * @return Response
         */
        public function index()
        {
            //
        }

        /**
         * Show the form for creating a new resource.
         *
         * @return Response
         */
        public function create()
        {
            //
        }

        /**
         * Store a newly created resource in storage.
         *
         * @return Response
         */
        public function store()
        {
            //
        }

        /**
         * Display the specified resource.
         *
         * @param int $id
         * @return Response
         */
        public function show($id)
        {
            //
        }

        /**
         * Show the form for editing the specified resource.
         *
         * @param int $id
         * @return Response
         */
        public function edit($id)
        {
            //
        }

        /**
         * Update the specified resource in storage.
         *
         * @param int $id
         * @return Response
         */
        public function update($id)
        {
            //
        }

        /**
         * Remove the specified resource from storage.
         *
         * @param int $id
         * @return Response
         */
        public function destroy($id)
        {
            //
        }

    }

Now let's try to understand the relationship that our single-line route declaration created with our generated Controller:

    HTTP VERB | Path             | Controller action/method
    GET       | /users           | index
    GET       | /users/create    | create
    POST      | /users           | store
    GET       | /users/{id}      | show (individual record)
    GET       | /users/{id}/edit | edit
    PUT       | /users/{id}      | update
    DELETE    | /users/{id}      | destroy

As you can see, a resource Controller really makes your work easy. You don't have to create lots of routes. Also, Laravel 4's artisan command-line generator can generate resourceful Controllers, so you will write very little boilerplate code. You can also run the following command from the root of your project to view the list of all the routes in your project:

    $ php artisan routes

Now let's get back to our basic task, that is, reading users. We now know that we have UserController.php at /app/controllers with the index method, which will be executed when somebody opens http://localhost/laravel/public/users. So let's edit the Controller file to fetch data from the database.

As you might remember, we need a Model to do that. But how do we define one, and what's the use of Models? You might be wondering, can't we just run the queries? Well, Laravel does support queries through the DB class, but Laravel also has Eloquent, which gives us our table as a database object, and what's great about an object is that we can play around with its methods. So let's create a Model.

If you check the path /app/models/User.php, you will see that a User Model is already defined. It's there because Laravel provides us with some basic user authentication.
Generally you can create your Model using the following code:

    class User extends Eloquent {}

Now in your controller you can fetch the users using the following code:

    $users = User::all();
    $users->toArray();

Yeah! It's that simple. No database connection code! No queries! Isn't it magic? It's the simplicity of Eloquent objects that many people like in Laravel.

But you have the following questions, right?

  • How does the Model know which table to fetch?
  • How does the Controller know what a User is?
  • How does the fetching of user records work? We don't have all the methods in the User class, so how did it work?

Well, Models in Laravel use the lowercase, plural name of the class as the table name unless another name is explicitly specified. So in our case, User was converted to the lowercase, plural users, and that table is bound to the User class.

Models are automatically loaded by Laravel, so you don't have to include the Model file yourself. Each Model extends Eloquent, which resolves to the class defined in the Model.php file at vendor/laravel/framework/src/Illuminate/Database/Eloquent/. That class provides methods such as all, insert, update, and delete; our User class inherits those methods, and as a result we can fetch records via User::all().

So now let's fetch users from our database via the Eloquent object. I am updating the index method in our app/controllers/UserController.php, as it's the method responsible for listing according to the REST convention we are using via the resource Controller:

    public function index()
    {
        $users = User::all();

        return View::make('users.index', compact('users'));
    }

Now let's look at the View part. Before that, we need to know about Blade. Blade is the templating engine provided by Laravel. Blade has a very simple syntax, and you can spot most Blade expressions in your view files because they begin with @. To print anything with Blade, you can use the {{ $var }} syntax.
Its PHP-equivalent syntax would be:

    <?php echo $var; ?>

Now back to our view. First of all, we need to create a view file at /app/views/users/index.blade.php, as our statement returns the view named users.index. We pass the users collection to this view via compact('users'). So here is our index.blade.php file:

    @extends('layouts.user')

    @section('main')

    <h1>All Users</h1>

    <p>{{ link_to_route('users.create', 'Add new user') }}</p>

    @if ($users->count())
        <table class="table table-striped table-bordered">
            <thead>
                <tr>
                    <th>Username</th>
                    <th>Password</th>
                    <th>Email</th>
                    <th>Phone</th>
                    <th>Name</th>
                </tr>
            </thead>
            <tbody>
                @foreach ($users as $user)
                    <tr>
                        <td>{{ $user->username }}</td>
                        <td>{{ $user->password }}</td>
                        <td>{{ $user->email }}</td>
                        <td>{{ $user->phone }}</td>
                        <td>{{ $user->name }}</td>
                        <td>{{ link_to_route('users.edit', 'Edit', array($user->id), array('class' => 'btn btn-info')) }}</td>
                        <td>
                            {{ Form::open(array('method' => 'DELETE', 'route' => array('users.destroy', $user->id))) }}
                                {{ Form::submit('Delete', array('class' => 'btn btn-danger')) }}
                            {{ Form::close() }}
                        </td>
                    </tr>
                @endforeach
            </tbody>
        </table>
    @else
        There are no users
    @endif

    @stop

Let's go through the code. In the first line, we extend the user layout via the Blade template syntax @extends. What actually happens here is that Laravel loads the layout file at /app/views/layouts/user.blade.php first.
Here is our user.blade.php file's code:

    <!doctype html>
    <html>
    <head>
        <meta charset="utf-8">
        <link href="//netdna.bootstrapcdn.com/twitter-bootstrap/2.3.1/css/bootstrap-combined.min.css" rel="stylesheet">
        <style>
            table form { margin-bottom: 0; }
            form ul { margin-left: 0; list-style: none; }
            .error { color: red; font-style: italic; }
            body { padding-top: 20px; }
        </style>
    </head>
    <body>
        <div class="container">
            @if (Session::has('message'))
                <div class="flash alert">
                    <p>{{ Session::get('message') }}</p>
                </div>
            @endif

            @yield('main')
        </div>
    </body>
    </html>

In this file we load the Twitter Bootstrap framework for styling our page, and via @yield('main') we load the main section from the view that is being rendered. So when we open http://localhost/laravel/public/users, Laravel first loads the user.blade.php layout view, and then the main section is loaded from index.blade.php.

Back in our index.blade.php, we have the main section defined with @section('main'), which Laravel loads into our layout file. This section is merged into the layout file at the place where we have put @yield('main').

We use Laravel's link_to_route method to link to our route, that is, /users/create. This helper generates an HTML link with the correct URL. In the next step, we loop through all the user records and display them in a simple tabular format. If you have followed everything, you will be greeted by a screen that lists the users.
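The resource routes registered earlier can also be checked from a terminal. The following is a minimal sketch, not part of the original tutorial: a hypothetical shell function, resource_requests, that only prints one curl command per route that Route::resource('users', 'UserController') registers. The function name, the default base URL (taken from the examples above), and the sample ID 1 are assumptions; at this point in the tutorial only the index action has a body, so the other requests would return empty responses.

```shell
#!/bin/sh
# Hypothetical helper (not part of Laravel): print, without running them,
# the seven requests handled by Route::resource('users', 'UserController').
resource_requests() {
    base="${1:-http://localhost/laravel/public}"   # assumed install URL
    id="${2:-1}"                                   # assumed sample user ID
    echo "curl -X GET ${base}/users  # index"
    echo "curl -X GET ${base}/users/create  # create"
    echo "curl -X POST ${base}/users  # store"
    echo "curl -X GET ${base}/users/${id}  # show"
    echo "curl -X GET ${base}/users/${id}/edit  # edit"
    echo "curl -X PUT ${base}/users/${id}  # update"
    echo "curl -X DELETE ${base}/users/${id}  # destroy"
}

resource_requests
```

Printing the commands rather than running them keeps the sketch safe to try; copy a single line into the terminal to send that request to a running installation.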

Packt
09 Mar 2017
6 min read

Microservices and Service Oriented Architecture

Microservices are an architecture style and an approach to software development that satisfies modern business demands. They are not a new invention as such; they are an evolution of previous architecture styles. Many organizations use them today because they can improve organizational agility, speed of delivery, and the ability to scale. Microservices give you a way to develop more physically separated, modular applications.

This tutorial has been taken from Spring 5.0 Microservices - Second Edition.

Microservices are similar to conventional service-oriented architectures. In this article, we will see how microservices are related to SOA.

The emergence of microservices

Many organizations, such as Netflix, Amazon, and eBay, successfully used what is known as the 'divide and conquer' technique to functionally partition their monolithic applications into smaller atomic units. Each one performs a single function, a 'service'. In this way, these organizations solved a number of prevailing issues they were experiencing with their monolithic applications. Following their success, many other organizations started adopting this as a common pattern for refactoring their monolithic applications. Later, evangelists termed this pattern microservices architecture.

Microservices originated from the idea of Hexagonal Architecture, coined by Alistair Cockburn back in 2005. Hexagonal Architecture, or the Hexagonal pattern, is also known as the Ports and Adapters pattern. Cockburn defined microservices as:

"...an architectural style or an approach for building IT systems as a set of business capabilities that are autonomous, self contained, and loosely coupled."

The following diagram depicts a traditional N-tier application architecture with a presentation layer, a business layer, and a database layer:

Modules A, B, and C represent three different business capabilities. The layers in the diagram represent the separation of architecture concerns.
Each layer holds all three business capabilities pertaining to that layer. The presentation layer has the web components of all three modules, the business layer has the business components of all three modules, and the database hosts the tables of all three modules. In most cases, layers are physically spreadable, whereas modules within a layer are hardwired.

Let's now examine a microservice-based architecture:

As we can see in the preceding diagram, the boundaries are inverted in the microservices architecture. Each vertical slice represents a microservice. Each microservice has its own presentation layer, business layer, and database layer. Microservices are aligned with business capabilities. As a result, changes to one microservice do not impact the others.

There is no standard for communication or transport mechanisms for microservices. In general, microservices communicate with each other using widely adopted lightweight protocols, such as HTTP and REST, or messaging protocols, such as JMS or AMQP. In specific cases, one might choose more optimized communication protocols, such as Thrift, ZeroMQ, Protocol Buffers, or Avro.

As microservices are more closely aligned with business capabilities and have independently manageable lifecycles, they are the ideal choice for enterprises embarking on DevOps and cloud. DevOps and cloud are two facets of microservices.

How do microservices compare to Service Oriented Architectures?

A common question that arises when dealing with microservices architecture is how it differs from SOA. SOA and microservices follow similar concepts. Earlier in this article, we saw that microservices evolved from SOA and that many service characteristics are common to both approaches. However, are they the same or different? Let's first examine the definition of SOA.
The Open Group definition of SOA is as follows:

"SOA is an architectural style that supports service-orientation. Service-orientation is a way of thinking in terms of services and service-based development and the outcomes of services. A service:

  • Is self-contained
  • May be composed of other services
  • Is a "black box" to consumers of the service"

You have learned about similar aspects of microservices as well. So, in what way are microservices different? The answer is: it depends. Whether the two are the same could be answered yes or no, depending on the organization and its adoption of SOA. SOA is a broader term, and different organizations have approached SOA differently to solve different organizational problems. The difference between microservices and SOA therefore depends on how an organization approaches SOA. To get some clarity, a few cases are examined here.

Service oriented integration

Service-oriented integration refers to a service-based integration approach used by many organizations:

Many organizations have used SOA primarily to solve their integration complexities, also known as integration spaghetti. Generally, this is termed Service Oriented Integration (SOI). In such cases, applications communicate with each other through a common integration layer using standard protocols and message formats, such as SOAP/XML-based web services over HTTP or Java Message Service (JMS). These organizations focus on Enterprise Integration Patterns (EIP) to model their integration requirements. This approach relies strongly on a heavyweight Enterprise Service Bus (ESB), such as TIBCO Business Works, WebSphere ESB, Oracle ESB, and the like. Most ESB vendors also packaged a set of related products, such as Rules Engines, Business Process Management Engines, and so on, as a SOA suite. The integrations of such organizations are deeply rooted in these products. They write either heavy orchestration logic in the ESB layer or the business logic itself in the service bus.
In both cases, all enterprise services are deployed and accessed through the ESB. These services are managed through an enterprise governance model. For such organizations, microservices are altogether different from SOA.

Legacy modernization

SOA is also used to build service layers on top of legacy applications, as shown in the following diagram:

Another category of organizations has used SOA in transformation projects or legacy modernization projects. In such cases, the services are built and deployed in the ESB, connecting to backend systems using ESB adapters. For these organizations, microservices are different from SOA.

Service oriented application

Some organizations have adopted SOA at an application level:

In this approach, as shown in the preceding diagram, lightweight integration frameworks, such as Apache Camel or Spring Integration, are embedded within applications to handle service-related cross-cutting capabilities, such as protocol mediation, parallel execution, orchestration, and service integration. As some of the lightweight integration frameworks have native Java object support, such applications may even have used native Plain Old Java Object (POJO) services for integration and data exchange between services. As a result, all services have to be packaged as one monolithic web archive. Such organizations could see microservices as the next logical step of their SOA.

Monolithic migration using SOA

The following diagram represents Logical System Boundaries:

The last possibility is transforming a monolithic application into smaller units after hitting the breaking point with the monolithic system. Such organizations have broken the application into smaller, physically deployable subsystems, similar to the Y axis scaling approach explained earlier, and deployed them as web archives on web servers or as jars on home-grown containers.
These subsystems, exposed as services, use web services or other lightweight protocols to exchange data with each other. The organizations also used SOA and service design principles to achieve this. Such organizations may tend to think that microservices are the same old wine in a new bottle.

Further resources on this subject:

  • Building Scalable Microservices [article]
  • Breaking into Microservices Architecture [article]
  • A capability model for microservices [article]

Packt
20 Jul 2015
10 min read

Getting Started with Nginx

In this article by Valery Kholodkov, the author of the book Nginx Essentials, we start digging a bit deeper into Nginx and quickly go through the most common distributions that contain prebuilt packages for Nginx.

Installing Nginx

Before you can dive into specific features of Nginx, you need to learn how to install Nginx on your system.

It is strongly recommended that you use prebuilt binary packages of Nginx if they are available in your distribution. This ensures the best integration of Nginx with your system and the reuse of best practices incorporated into the package by the package maintainer. Prebuilt binary packages of Nginx automatically maintain dependencies for you, and package maintainers are usually fast to include security patches, so you don't get any complaints from security officers. In addition, the package usually provides a distribution-specific startup script, which Nginx does not provide out of the box.

Refer to your distribution's package directory to find out whether a prebuilt package for Nginx is available. Prebuilt Nginx packages can also be found under the download link on the official Nginx.org site.

Installing Nginx on Ubuntu

The Ubuntu Linux distribution contains a prebuilt package for Nginx. To install it, simply run the following command:

    $ sudo apt-get install nginx

The preceding command will install all the required files on your system, including the logrotate script and service autorun scripts.
The following table describes the Nginx installation layout that will be created after running this command, as well as the purpose of the selected files and folders:

    Description                                                | Path/Folder
    Nginx configuration files                                  | /etc/nginx
    Main configuration file                                    | /etc/nginx/nginx.conf
    Virtual hosts configuration files (including default one)  | /etc/nginx/sites-enabled
    Custom configuration files                                 | /etc/nginx/conf.d
    Log files (both access and error log)                      | /var/log/nginx
    Temporary files                                            | /var/lib/nginx
    Default virtual host files                                 | /usr/share/nginx/html

Default virtual host files will be placed into /usr/share/nginx/html. Please keep in mind that this directory is only for the default virtual host. For deploying your web application, use the folders recommended by the Filesystem Hierarchy Standard (FHS).

Now you can start the Nginx service with the following command:

    $ sudo service nginx start

This will start Nginx on your system.

Alternatives

The prebuilt Nginx package on Ubuntu has a number of alternatives. Each of them allows you to fine-tune the Nginx installation for your system.

Installing Nginx on Red Hat Enterprise Linux or CentOS/Scientific Linux

Nginx is not provided out of the box in Red Hat Enterprise Linux or CentOS/Scientific Linux. Instead, we will use the Extra Packages for Enterprise Linux (EPEL) repository. EPEL is a repository that is maintained by Red Hat Enterprise Linux maintainers, but contains packages that are not a part of the main distribution for various reasons. You can read more about EPEL at https://fedoraproject.org/wiki/EPEL.

To enable EPEL, you need to download and install the repository configuration package:

  • For RHEL or CentOS/SL 7, use the following link: http://download.fedoraproject.org/pub/epel/7/x86_64/repoview/epel-release.html
  • For RHEL/CentOS/SL 6, use the following link: http://download.fedoraproject.org/pub/epel/6/i386/repoview/epel-release.html
  • If you have a newer/older RHEL version, please take a look at the "How can I use these extra packages?" section in the original EPEL wiki at the following link: https://fedoraproject.org/wiki/EPEL

Now that you are ready to install Nginx, use the following command:

    # yum install nginx

The preceding command will install all the required files on your system, including the logrotate script and service autorun scripts. The following table describes the Nginx installation layout that will be created after running this command and the purpose of the selected files and folders:

    Description                                                | Path/Folder
    Nginx configuration files                                  | /etc/nginx
    Main configuration file                                    | /etc/nginx/nginx.conf
    Virtual hosts configuration files (including default one)  | /etc/nginx/conf.d
    Custom configuration files                                 | /etc/nginx/conf.d
    Log files (both access and error log)                      | /var/log/nginx
    Temporary files                                            | /var/lib/nginx
    Default virtual host files                                 | /usr/share/nginx/html

Default virtual host files will be placed into /usr/share/nginx/html. Please keep in mind that this directory is only for the default virtual host. For deploying your web application, use the folders recommended by FHS.

By default, the Nginx service will not autostart on system startup, so let's enable it. Refer to the following table for the commands corresponding to your CentOS version:

    Function                               | CentOS 6            | CentOS 7
    Enable Nginx startup at system startup | chkconfig nginx on  | systemctl enable nginx
    Manually start Nginx                   | service nginx start | systemctl start nginx
    Manually stop Nginx                    | service nginx stop  | systemctl stop nginx

Installing Nginx from source files

Traditionally, Nginx is distributed as source code. In order to install Nginx from the source code, you need to download and compile the source files on your system.

It is not recommended that you install Nginx from the source code.
Do this only if you have a good reason, such as the following scenarios:
You are a software developer and want to debug or extend Nginx
You feel confident enough to maintain your own package
A package from your distribution is not good enough for you
You want to fine-tune your Nginx binary

In any case, if you are planning to use this way of installing for real use, be prepared to sort out challenges such as dependency maintenance, distribution, and application of security patches.

In this section, we will be referring to the configuration script. The configuration script is a shell script, similar to one generated by autoconf, which is required to properly configure the Nginx source code before it can be compiled. This configuration script has nothing to do with the Nginx configuration file that we will be discussing later.

Downloading the Nginx source files
The primary source for Nginx for an English-speaking audience is Nginx.org. Open https://nginx.org/en/download.html in your browser and choose the most recent stable version of Nginx. Download the chosen archive into a directory of your choice (/usr/local or /usr/src are common directories to use for compiling software):
$ wget -q http://nginx.org/download/nginx-1.7.9.tar.gz

Extract the files from the downloaded archive and change to the directory corresponding to the chosen version of Nginx:
$ tar xf nginx-1.7.9.tar.gz
$ cd nginx-1.7.9

To configure the source code, we need to run the ./configure script included in the archive:
$ ./configure
checking for OS
 + Linux 3.13.0-36-generic i686
checking for C compiler ... found
 + using GNU C compiler
[...]

This script will produce a lot of output and, if successful, will generate a Makefile file for the source files. Notice that we showed the non-privileged user prompt $ instead of the root # in the previous command lines. You are encouraged to configure and compile software as a regular user and only install as root.
This will prevent a lot of problems related to access restriction while working with the source code.

Troubleshooting
The configuration step, although very simple, has a couple of common pitfalls. The basic installation of Nginx requires the presence of OpenSSL and Perl-compatible Regex (PCRE) developer packages in order to compile. If these packages are not properly installed or not installed in locations where the Nginx configuration script is able to locate them, the configuration step might fail. Then, you have to choose between disabling the affected Nginx built-in modules (rewrite or SSL), installing the required packages properly, or pointing the Nginx configuration script to the actual location of those packages if they are installed.

Building Nginx
You can build the source files now using the following command:
$ make
You'll see a lot of output on compilation. If the build is successful, you can install the Nginx files on your system. Before doing that, make sure you escalate your privileges to the super user so that the installation script can install the necessary files into the system areas and assign necessary privileges. Once successful, run the make install command:
# make install
The preceding command will install all the necessary files on your system. The following table lists all locations of the Nginx files that will be created after running this command and their purposes:

Description | Path/Folder
Nginx configuration files | /usr/local/nginx/conf
Main configuration file | /usr/local/nginx/conf/nginx.conf
Log files (both access and error log) | /usr/local/nginx/logs
Temporary files | /usr/local/nginx
Default virtual host files | /usr/local/nginx/html

Unlike installations from prebuilt packages, installation from source files does not set up folders for custom configuration files or virtual host configuration files. The main configuration file is also very simple in its nature. You have to take care of this yourself. Nginx should be ready to use now.
To start Nginx, change your working directory to the /usr/local/nginx directory and run the following command:
# sbin/nginx
This will start Nginx on your system with the default configuration.

Troubleshooting
This stage works flawlessly most of the time. A problem can occur in the following situations:
You are using a nonstandard system configuration. Try to play with the options in the configuration script in order to overcome the problem.
You compiled in third-party modules and they are out of date or not maintained. Switch off third-party modules that break your build or contact the developer for assistance.

Copying the source code configuration from prebuilt packages
Occasionally you might want to amend the Nginx binary from a prebuilt package with your own changes. In order to do that, you need to reproduce the build tree that was used to compile the Nginx binary for the prebuilt package. But how would you know what version of Nginx and what configuration script options were used at the build time? Fortunately, Nginx has a solution for that. Just run the existing Nginx binary with the -V command-line option. Nginx will print the configure-time options, as shown in the following:
$ /usr/sbin/nginx -V
nginx version: nginx/1.4.6 (Ubuntu)
built by gcc 4.8.2 (Ubuntu 4.8.2-19ubuntu1)
TLS SNI support enabled
configure arguments: --with-cc-opt='-g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2' --with-ld-opt='-Wl,-Bsymbolic-functions -Wl,-z,relro' …

Using the output of the preceding command, reproduce the entire build environment, including the Nginx source tree of the corresponding version and the modules that were included in the build. Here, the output of the nginx -V command is trimmed for simplicity. In reality, you will be able to see and copy the entire command line that was passed to the configuration script at the build time.
You might even want to reproduce the version of the compiler used in order to produce a binary-identical Nginx executable file (we will discuss this later when discussing how to troubleshoot crashes). Once this is done, run the ./configure script of your Nginx source tree with options from the output of the -V option (with necessary alterations) and follow the remaining steps of the build procedure. You will get an altered Nginx executable in the objs/ folder of the source tree.

Summary
Here, you learned how to install Nginx from a number of available sources, the structure of Nginx installation and the purpose of various files, the elements and structure of the Nginx configuration file, and how to create a minimal working Nginx configuration file. You also learned about some best practices for Nginx configuration.
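As a footnote to the section on copying the build configuration from a prebuilt package, the configure-time options printed by nginx -V can be pulled out programmatically before being passed back to ./configure. The following is only a minimal sketch in Python: the helper name parse_configure_args is hypothetical, and the sample text is a shortened stand-in for real nginx -V output.

```python
import shlex


def parse_configure_args(version_output):
    """Return the configure arguments found in `nginx -V` output as a list.

    Hypothetical helper for illustration; it only looks for the
    'configure arguments:' line and splits it the way a shell would.
    """
    for line in version_output.splitlines():
        if line.startswith("configure arguments:"):
            # shlex.split honours the single quotes around multi-word values
            return shlex.split(line.split(":", 1)[1])
    return []


# Shortened sample resembling the output shown in the article (assumption).
sample = (
    "nginx version: nginx/1.4.6 (Ubuntu)\n"
    "built by gcc 4.8.2 (Ubuntu 4.8.2-19ubuntu1)\n"
    "TLS SNI support enabled\n"
    "configure arguments: --with-cc-opt='-g -O2' --with-ld-opt='-Wl,-z,relro'\n"
)

args = parse_configure_args(sample)
for arg in args:
    print(arg)
```

On a real system the text would come from running the binary itself. Note that nginx writes the -V report to standard error rather than standard output, so a capture has to read that stream.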
Packt
28 Oct 2015
8 min read

Working On Your Bot

 In this article by Kassandra Perch, the author of Learning JavaScript Robotics, we will learn how to wire up servos and motors, how to create a project with a motor and using the REPL, and how to create a project with a servo and a sensor (For more resources related to this topic, see here.) Wiring up servos and motors Wiring up servos will look similar to wiring up sensors, except the signal maps to an output. Wiring up a motor is similar to wiring an LED. Wiring up servos To wire up a servo, you'll have to use a setup similar to the following figure: A servo wiring diagram The wire colors may vary for your servo. If your wires are red, brown, and orange, red is 5V, brown is GND, and orange is signal. When in doubt, check the data sheet that came with your servo. After wiring up the servo, plug the board in and listen to your servo. If you hear a clicking noise, quickly unplug the board—this means your servo is trying to place itself in a position it cannot reach. Usually, there is a small screw at the bottom of most servos that you can use to calibrate them. Use a small screwdriver to rotate this until it stops clicking when the power is turned on. This procedure is the same for continuous servos—the diagram does not change much either. Just replace the regular servo with a continuous one and you're good to go. Wiring up motors Wiring up motors looks like the following diagram: A motor wiring diagram Again, you'll want the signal pin to go to a PWM pin. As there are only two pins, it can be confusing where the power pin goes—it goes to a PWM pin because, similar to our LED getting its power from the PWM pin, the same pin will provide the power to run the motor. Now that we know how to wire these up, let's work on a project involving a motor and Johnny-Five's REPL. Creating a project with a motor and using the REPL Grab your motor and board, and follow the diagram in the previous section to wire a motor. 
Let's use pin 6 for the signal pin, as shown in the preceding diagram. What we're going to do in our code is create a Motor object and inject it into the REPL, so we can play around with it in the command line. Create a motor.js file and put in the following code:

var five = require('johnny-five');
var board = new five.Board();

board.on('ready', function(){
  var motor = new five.Motor({ pin: 6 });
  this.repl.inject({ motor: motor });
});

Then, plug in your board and run node motor.js to start the program.

Exploring the motor API
If we take a look at the documentation on the Johnny-Five website, there are a few things we can try here. First, let's turn our motor on at about half speed:
> motor.start(125);
The .start() method takes a value between 0 and 255. Sound familiar? That's because these are the values we can assign to a PWM pin! Okay, let's tell our motor to coast to a stop:
> motor.stop();
Note that while this function will cause the motor to coast to a stop, there is a dedicated .brake() method. However, this requires a dedicated brake pin, which may be available with shields and certain motors.

If you happen to have a directional motor, you can tell the motor to run in reverse using .reverse() with a value between 0 and 255:
> motor.reverse(125);
This will cause a directional motor to run in reverse at half speed. Note that this requires a shield.

And that's about it. Operating motors isn't difficult and Johnny-Five makes it even easier. Now that we've learned how this operates, let's try a servo.

Creating a project with a servo and a sensor
Let's start with just a servo and the REPL, then we can add in a sensor. Use the diagram from the previous section as a reference to wire up a servo, and use pin 6 for signal. Before we write our program, let's take a look at some of the options the Servo object constructor gives us. You can set an arbitrary range by passing [min, max] to the range property.
This is great for low-quality servos that have trouble at very low and very high values. The type property is also important. We'll be using a standard servo, but you'll need to set this to continuous if you're using a continuous servo. Since standard is the default, we can leave this out for now. The offset property is important for calibration. If your servo is set too far in one direction, you can change the offset to make sure it can programmatically reach every angle it was meant to. If you hear clicking at very high or low values, try adjusting the offset. You can invert the direction of the servo with the invert property or initialize the servo at the center with center. Centering the servo helps you to know whether you need to calibrate it. If you center it and the arm isn't centered, try adjusting the offset property.

Now that we've got a good grasp on the constructor, let's write some code. Create a file called servo-repl.js and enter the following:

var five = require('johnny-five');
var board = new five.Board();

board.on('ready', function(){
  var servo = new five.Servo({ pin: 6 });
  this.repl.inject({ servo: servo });
});

This code simply constructs a standard servo object for pin 6 and injects it into the REPL. Then, run it using the following command line:
> node servo-repl.js
Your servo should jump to its initialization point. Now, let's figure out how to write the code that makes the servo move.

Exploring the servo API with the REPL
The most basic thing we can do with a servo is set it to a specific angle. We do this by calling the .to() function with a degree, as follows:
> servo.to(90);
This should center the servo. You can also pass a time to the .to() function, so that the movement takes a certain amount of time:
> servo.to(20, 500);
This will move the servo from 90 degrees to 20 degrees over 500 ms.
You can even determine how many steps the servo takes to get to the new angle, as follows:
> servo.to(120, 500, 10);
This will move the servo to 120 degrees over 500 ms in 10 discrete steps. The .to() function is very powerful and will be used in a majority of your Servo objects. However, there are many other useful functions. For instance, checking whether a servo is calibrated correctly is easier when you can see all angles quickly. For this, we can use the .sweep() function, as follows:
> servo.sweep();
This will sweep the servo back and forth between its minimum and maximum values, which are 0 and 180, unless set in the constructor via the range property. You can also specify a range to sweep, as follows:
> servo.sweep({ range: [20, 120] });
This will sweep the servo from 20 to 120 repeatedly. You can also set the interval property, which will change how long the sweep takes, and a step property, which sets the number of discrete steps taken, as follows:
> servo.sweep({ range: [20, 120], interval: 1000, step: 10 });
This will cause the servo to sweep from 20 to 120 every second in 10 discrete steps. You can stop a servo's movement with the .stop() method, as follows:
> servo.stop();
For continuous servos, you can use .cw() and .ccw() with a speed between 0 and 255 to move the continuous servo back and forth.

Now that we've seen the Servo object API at work, let's hook our servo up to a sensor. In this case, we'll use a photocell. This code is a good example for a few reasons: it shows off Johnny-Five's event API, allows us to use a servo with an event, and gets us used to wiring inputs to outputs using events. First, let's add a photocell to our project using the following diagram:
A servo and photoresistor wiring diagram
Then, create a photoresistor-servo.js file, and add the following.
var five = require('johnny-five');
var board = new five.Board();

board.on('ready', function(){
  var servo = new five.Servo({ pin: 6 });
  var photoresistor = new five.Sensor({ pin: "A0", freq: 250 });

  photoresistor.scale(0, 180).on('change', function(){
    servo.to(this.value);
  });
});

How this works is as follows. During the change event, we tell our servo to move to the correct position based on the scaled data from our photoresistor. Now, run the following command line:
> node photoresistor-servo.js
Then, try turning the light on and covering up your photoresistor and watch the servo move!

Summary
We now know how to use servos and motors, which help to move small robots. Wheeled robots are good to go! But what about more complex projects, such as the hexapod? Walking takes timing. As we saw with the .to() function, we can time the servo movement, thanks to the Animation library.

Resources for Article:
Further resources on this subject:
Internet-Connected Smart Water Meter [article]
Raspberry Pi LED Blueprints [article]
Welcome to JavaScript in the full stack [article]
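As an aside on the photoresistor example in this article, the call to scale(0, 180) amounts to a linear mapping from the raw sensor reading onto servo angles. The Python sketch below shows that arithmetic only; it is not Johnny-Five code. The function name scale_reading is hypothetical, and the raw range of 0 to 1023 is an assumption (a 10-bit analog reading) that you should check against your board.

```python
def scale_reading(value, in_min=0, in_max=1023, out_min=0, out_max=180):
    """Map a raw reading linearly from [in_min, in_max] to [out_min, out_max].

    Assumption: raw readings span 0..1023 (10-bit analog input).
    """
    span_in = in_max - in_min
    span_out = out_max - out_min
    return (value - in_min) * span_out / span_in + out_min


# A dark reading maps to one end of the servo's travel, a bright one to the other.
for raw in (0, 512, 1023):
    print(raw, "->", scale_reading(raw))
```

With a mapping like this, a reading near the middle of the sensor's range lands near the middle of the servo's travel, which is why covering and uncovering the photoresistor moves the arm smoothly between its end points.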

Pravin Dhandre
01 Aug 2018
5 min read

Writing PostGIS functions in Python language [Tutorial]

In this tutorial, you will learn to write a Python function for PostGIS and PostgreSQL using the PL/Python language and useful libraries such as urllib2 and simplejson. You will use Python to query the http://openweathermap.org/ web services to get the weather for a PostGIS geometry from within a PostgreSQL function.

This tutorial is an excerpt from a book written by Mayra Zurbaran, Pedro Wightman, Paolo Corti, Stephen Mather, Thomas Kraft and Bborie Park titled PostGIS Cookbook - Second Edition.

Adding Python support to database
Verify your PostgreSQL server installation has PL/Python support. In Windows, this should be already included, but this is not the default if you are using, for example, Ubuntu 16.04 LTS, so you will most likely need to install it:
$ sudo apt-get install postgresql-plpython-9.1

Install PL/Python on the database (you could consider installing it in your template1 database; in this way, every newly created database will have PL/Python support by default):
$ psql -U me postgis_cookbook
postgis_cookbook=# CREATE EXTENSION plpythonu;

You could alternatively add PL/Python support to your database using the createlang shell command (this is the only way if you are using PostgreSQL version 9.1 or lower):
$ createlang plpythonu postgis_cookbook

How to do it...
Carry out the following steps:
In this tutorial, as with the previous one, you will use a http://openweathermap.org/ web service to get the temperature for a point from the closest weather station. The request you need to run (test it in a browser) is http://api.openweathermap.org/data/2.5/find?lat=55&lon=37&cnt=10&appid=YOURKEY.
You should get the following JSON output (the data of the weather station closest to the point with the given longitude and latitude, from which you will read the temperature):

{
  "message": "",
  "cod": "200",
  "calctime": "",
  "cnt": 1,
  "list": [
    {
      "id": 9191,
      "dt": 1369343192,
      "name": "100704-1",
      "type": 2,
      "coord": { "lat": 13.7408, "lon": 100.5478 },
      "distance": 6.244,
      "main": { "temp": 300.37 },
      "wind": { "speed": 0, "deg": 141 },
      "rang": 30,
      "rain": { "1h": 0, "24h": 3.302, "today": 0 }
    }
  ]
}

Create the following PostgreSQL function in Python, using the PL/Python language:

CREATE OR REPLACE FUNCTION chp08.GetWeather(lon float, lat float)
  RETURNS float AS $$
import urllib2
import simplejson as json
# default result, returned when the service gives no usable data
temperature = None
data = urllib2.urlopen(
    'http://api.openweathermap.org/data/2.1/find/station?lat=%s&lon=%s&cnt=1'
    % (lat, lon))
js_data = json.load(data)
if js_data['cod'] == '200':
    # only if cod is 200 we got some effective results
    if int(js_data['cnt'])>0:
        # check if we have at least a weather station
        station = js_data['list'][0]
        print 'Data from weather station %s' % station['name']
        if 'main' in station:
            if 'temp' in station['main']:
                # we want the temperature in Celsius
                temperature = station['main']['temp'] - 273.15
            else:
                temperature = None
        else:
            temperature = None
return temperature
$$ LANGUAGE plpythonu;

Now, test your function; for example, get the temperature from the weather station closest to the Wat Pho temple in Bangkok:

postgis_cookbook=# SELECT chp08.GetWeather(100.49, 13.74);
 getweather
------------
      27.22
(1 row)

If you want to get the temperature for the point features in a PostGIS table, you can use the coordinates of each feature's geometry:

postgis_cookbook=# SELECT name, temperature,
  chp08.GetWeather(ST_X(the_geom), ST_Y(the_geom)) AS temperature2
  FROM chp08.cities LIMIT 5;
    name     | temperature | temperature2
-------------+-------------+--------------
 Minneapolis |      275.15 |           15
 Saint Paul  |      274.15 |           16
 Buffalo     |      274.15 |        19.44
 New York    |      280.93 |        19.44
 Jersey City |      282.15 |        21.67
(5 rows)

Now it would
be nice if our function could accept not only the coordinates of a point, but also a true PostGIS geometry as an input parameter. For the temperature of a feature, you could return the temperature of the weather station closest to the centroid of the feature geometry. You can easily get this behavior using function overloading. Add a new function, with the same name, supporting a PostGIS geometry directly as an input parameter. In the body of the function, call the previous function, passing the coordinates of the centroid of the geometry. Note that in this case, you can write the function without using Python, with the PL/PostgreSQL language:

CREATE OR REPLACE FUNCTION chp08.GetWeather(geom geometry)
  RETURNS float AS $$
BEGIN
  RETURN chp08.GetWeather(ST_X(ST_Centroid(geom)), ST_Y(ST_Centroid(geom)));
END;
$$ LANGUAGE plpgsql;

Now, test the function, passing a PostGIS geometry to the function:

postgis_cookbook=# SELECT chp08.GetWeather(
  ST_GeomFromText('POINT(-71.064544 42.28787)'));
 getweather
------------
      23.89
(1 row)

If you use the function on a PostGIS layer, you can pass the feature's geometries to the function directly, using the overloaded function written in the PL/PostgreSQL language:

postgis_cookbook=# SELECT name, temperature,
  chp08.GetWeather(the_geom) AS temperature2
  FROM chp08.cities LIMIT 5;
    name     | temperature | temperature2
-------------+-------------+--------------
 Minneapolis |      275.15 |        17.22
 Saint Paul  |      274.15 |           16
 Buffalo     |      274.15 |        18.89
 New York    |      280.93 |        19.44
 Jersey City |      282.15 |        21.67
(5 rows)

In this tutorial, you wrote a Python function in PostGIS, using the PL/Python language. Using Python inside PostgreSQL and PostGIS functions gives you the great advantage of being able to use any Python library you wish. Therefore, you will be able to write much more powerful functions compared to those written using the standard PL/PostgreSQL language.
In fact, in this case, you used the urllib2 and simplejson Python libraries to query a web service from within a PostgreSQL function; this would be impossible using plain PL/PostgreSQL. You have also seen how to overload functions in order to give the function's user a different way to call it, with different input parameters.

To get armed with all the tools and instructions you need for managing entire spatial database systems, read PostGIS Cookbook - Second Edition.

Top 7 libraries for geospatial analysis
Learning R for Geospatial Analysis
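The response-handling logic of the GetWeather function in this tutorial can also be tried outside the database. The following is a minimal sketch in Python 3 using only the standard json module in place of simplejson; the function name temperature_celsius and the shortened sample payload are assumptions made for illustration, and no request is sent to the web service.

```python
import json

KELVIN_OFFSET = 273.15  # the service reports temperatures in Kelvin


def temperature_celsius(payload):
    """Return the first station's temperature in Celsius, or None.

    Hypothetical helper mirroring the checks made in the tutorial's
    PL/Python function: status code, station count, then main.temp.
    """
    js_data = json.loads(payload)
    if str(js_data.get("cod")) != "200":
        return None  # the service did not return effective results
    stations = js_data.get("list") or []
    if not stations:
        return None  # no weather station in the response
    temp_kelvin = stations[0].get("main", {}).get("temp")
    if temp_kelvin is None:
        return None
    return temp_kelvin - KELVIN_OFFSET


# Shortened stand-in for the JSON response shown in the tutorial.
sample = json.dumps({
    "cod": "200",
    "cnt": 1,
    "list": [{"name": "100704-1", "main": {"temp": 300.37}}],
})

print(round(temperature_celsius(sample), 2))
```

Returning None on every failure path keeps the behavior predictable for callers: in PL/Python, a None return value becomes SQL NULL.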

Packt
20 Nov 2014
16 min read

Amazon Web Services

In this article, by Prabhakaran Kuppusamy and Uchit Vyas, authors of AWS Development Essentials, you will learn about the different tools and methods available to perform the same operation with varying complexity. Various options are available, depending on the user's level of experience. In this article, we will start with an overview of each service, learn about the various tools available for programmer interaction, and finally see the troubleshooting and best practices to be followed while using these services. AWS provides a handful of services in every area.

In this article, we will cover the following topics:
Navigate through the AWS Management Console
Describe the security measures that AWS provides
AWS interaction through the SDK and IDE tools

(For more resources related to this topic, see here.)

Background of AWS and its needs
AWS is based on an idea presented by Chris Pinkham and Benjamin Black with a vision towards Amazon's retail computing infrastructure. The first Amazon offering was SQS, in the year 2004. Officially, AWS was launched and made available online in 2006, and within a year, 200,000 developers signed up for these services. Later, due to a natural disaster (the June 29, 2012 storm in Northern Virginia, which brought down most of the servers residing at this location) and technical events, AWS faced a lot of challenges. A similar event happened in December 2012, after which AWS has been providing services as stated. AWS learned from these events and made sure that the same kind of outage didn't occur even if the same event occurred again.

AWS is an idea born in a single room, but the idea is now made available and used by almost all the cloud developers and IT giants. AWS is greatly loved by all kinds of technology admirers. Irrespective of the user's expertise, AWS has something for various types of users. For an expert programmer, AWS has SDKs for each service.
Using these SDKs, the programmer can perform operations by entering commands in the command-line interface. However an end user with limited knowledge of programming can still perform similar operations using the graphical user interface of the AWS Management Console, which is accessible through a web browser. If the programmers need interactions between a low-level (SDK) and a high-level (Management Console), they can go for the integrated development environment (IDE) tools, for which AWS provides plugins and add-ons. One such commonly used IDE for which AWS has provided add-ons is the Eclipse IDE. As of now, we will start with the AWS Management Console. The AWS Management Console The most popular method of accessing AWS is via the Management Console because of its simplicity of usage and power. Another reason why the end user prefers the Management Console is that it doesn't require any software to start with; having an Internet connection and a browser is sufficient. As the name suggests, the Management Console is a place where administrative and advanced operations can be performed on your AWS account details or AWS services. The Management Console mainly focuses on the following features: One-click access to AWS's services AWS account administration AWS management using handheld devices AWS infrastructure management across the globe One-click access to the AWS services To access the Management Console, all you need to do is first sign up with AWS. Once done, the Management Console will be available at https://console.aws.amazon.com/. Once you have signed up, you will be directed to the following page: Each and every icon on this page is an Amazon Web Service. Two or more services will be grouped under a category. For example, in the Analytics category, you can see three services, namely, Data Pipeline, Elastic MapReduce, and Kinesis. Starting with any of these services is very easy. 
Have a look at the description of the service at the bottom of the service icon. As soon as you click on the service icon, it will take you to the Getting started page of the corresponding service, where brief as well as detailed guidelines are available. In order to start with any of the services, only two things are required. The first one is an AWS account and the second one is the supported browser. The Getting started section usually will have a video, which explains the specialty and use cases of the service that you selected. Once you finish reading the Getting started section, optionally you can go through the DOC files specific to the service to know more about the syntaxes and usage of the service operations. AWS account administration The account administration is one of the most important things to make note of. To do this, click on your displayed name (in this case, Prabhakar) at the top of the page, and then click on the My Account option, as shown in the preceding screenshot. At the beginning of every month, you don't want AWS to deduct all your salary by stating that you have used these many services costing this much money; hence, all this management information is available in the Management Console. Using the Management Console, you can infer the following information: The monthly billing in brief as well as the detailed manner (cost split-up of each service) along with a provision to view VAT and tax exemption Account details, such as the display name and contact information Provision to close the AWS account All the preceding operations and much more are possible. AWS management using handheld devices Managing and accessing the AWS services is through (but not limited to) PC. AWS provides a handful of applications almost for all or most of the mobile platforms, such as Android, iOS, and so on. Using these applications, you can perform all the AWS operations on the move. 
You won't believe that having a 7-inch Android tablet with the AWS Console application installed from Google Play will enable you to ask for any Elastic Compute Cloud (EC2) instance from Amazon and control it (start, stop, and terminate) very easily. You can install an SSH client on the tablet and connect to the Linux terminal. However, if you wish to make use of a Windows instance from EC2, you might use the Graphical User Interface (GUI) more frequently than a command line. Some more sophisticated software and hardware might be needed; for example, you should have a VNC viewer or remote desktop connection software to get the GUI of the rented EC2 instance. As you are making use of the GUI in addition to the keyboard, you will need a pointer device, such as a mouse. As a result, you will almost get addicted to the concept of cloud computing going mobile.

AWS infrastructure management across the globe
At this point, you might be aware that you can get all of these AWS services from servers residing at any of the following locations. To control the services you use in different regions, you don't have to go anywhere else. You can control them right here in the same Management Console. Using the same Management Console, just by clicking on N. Virginia and choosing the location (at the top of the Management Console), you can make the service available in that region, as shown in the following screenshot:

You can choose the server location at which you want the service (data and machine) to be made available based on the following two factors:
The first factor is the distance between the server's location and the client's location. For example, if you have deployed a web application for a client from Northern California at a Tokyo location, obviously the latency will be high while accessing the application. Therefore, choosing the optimum service location is the primary factor.
The second factor is the charge for the service in a specific location.
AWS charges more for certain crowded servers. Just for illustration, assume that the server for North California is used by many critical companies. So this might cost you twice if you create your servers at North California compared to the other locations. Hence, you should always consider the tradeoff between the location and cost and then decide on the server location. Whenever you click on any of the services, AWS will always select the location that costs you less money as the default. AWS security measures Whenever you think of moving your data center to a public cloud, the first question that arises in your mind is about data security. In a public cloud, through virtualization technology, multiple users might be using the same hardware (server) in which your data is available. You will learn in detail about how AWS ensures data security. Instance isolation Before learning about instance isolation, you must know how AWS EC2 provisions the instances to the user. This service allows you to rent virtual machines (AWS calls it instances) with whatever configurations you ask. Let's assume that you requested AWS to provision a 2 GB RAM, a 100 GB HDD, and an Ubuntu instance. Within a minute, you will be given the instance's connection details (public DNS, private IP, and so on), and the instance starts running. Does this mean that AWS assembled a 2*1 GB RAM and 100 GB HDD into a CPU cabinet and then installed Ubuntu OS in it and gave you the access? The answer is no. The provisioned instance is not a single PC (or bare metal) with an OS installed in it. The instance is the outcome of a virtual machine provisioned by Amazon's private cloud. The following diagram shows how a virtual machine can be provisioned by a private cloud: Let's examine the diagram from bottom to top. First, we will start with the underlying Hardware/Host. Hardware is the server, which usually has a very high specification. 
Here, assume that your hardware has the configuration of a 99 GB RAM, a 450 TB HDD, and a few other elements, such as NIC, which you need not consider now. The next component in your sights is the Hypervisor. A hypervisor or virtual machine monitor (VMM) is used to create and run virtual machines on the hardware. In private cloud terms, whichever machine runs a hypervisor on it is called the host machine. Three users can request each of them need instances with a 33 GB RAM and 150 TB HDD space. This request goes to the hypervisor and it then starts creating those VMs. After creating the VMs, a notification about the connection parameters will be sent to each user. In the preceding diagram, you can see the three virtual machines (VMs) created by the hypervisor. All the three VMs are running on different operating systems. Even if all the three virtual machines are used by different users, each will feel that only he/she has access to the single piece of hardware, which is only used by them; user 1 might not know that the same hardware is also being used by user 2, and so on. The process of creating a virtual version of a machine or storage or network is called virtualization. The funny part is that none of the virtual machines knows that it is being virtualized (that is, all the VMs are created on the same host). After getting this information about your instances, some users may feel deceived, and some will be even disappointed and cry out loud, has your instance been created on a shared disc or resource? Even though the disc (or hardware) is shared, one instance (or owner of the instance) is isolated from the other instances on the same disc through a firewall. This concept is termed as instance isolation. The following diagram demonstrates instance isolation in AWS: The preceding diagram clearly demonstrates how EC2 provides instances to every user. Even though all the instances are lying in the same disc, they are isolated by hypervisor. 
The hypervisor has a firewall that performs this isolation. So, the physical interface will not interact directly with the underlying hardware (the machine or disk where the instances reside) or with the virtual interface. All these interactions go through the hypervisor's firewall. This way, AWS ensures that no user can directly access the disk, and no instance can directly interact with another instance, even if both are running on the same hardware. In addition to the firewall, during the creation of the EC2 instance, the user can assign security groups, which define the traffic permitted to reach the instance. These two mechanisms provide instance isolation. In the preceding diagram, Customer 1, Customer 2, and so on are virtualized disks, since the customer instances have no access to the raw or actual disk devices. As an added security measure, the user can encrypt his/her disk so that other users cannot read its content (even if someone gains access to the disk).

Isolated GovCloud

Similar to North California or Asia Pacific, GovCloud is also a location where you can get your AWS services. This location is designed specifically for governments and agencies whose data is highly confidential and valuable, and whose disclosure might result in disaster. By default, this location is not available to the user. If you want access to it, you need to raise a compliance request at http://aws.amazon.com/compliance/contact/ and submit the FedRAMP Package Request Form, downloadable at http://cloud.cio.gov/document/fedramp-package-request-form. These two URLs give you an idea of how tightly secured this cloud location is.

CloudTrail

CloudTrail is an AWS service that tracks user activity and changes. Enabling CloudTrail logs all API request information into an S3 bucket that you have created solely for this purpose. CloudTrail can also publish a notification to an SNS topic as soon as it creates a new log file.
CloudTrail, hand in hand with SNS, delivers real-time user activity as messages to the user.

Password

This might sound funny. After looking at CloudTrail, if you feel that someone else is accessing your account, the best option is to change the password. Never let anyone see your password, as this could easily compromise an entire account. Sharing the password is like leaving your treasury door open.

Multi-Factor Authentication

Until now, to access AWS through a browser, you had to log in at http://aws.amazon.com and enter your username and password. Enabling Multi-Factor Authentication (MFA) adds another layer of security: you are also asked for an authentication code from the device configured with this account. The security credentials page at https://console.aws.amazon.com/iam/home?#security_credential has a provision to enable MFA. Clicking on Enable will display the following window:

Selecting the first option, A virtual MFA device, will not cost you money, but it requires a smartphone (for example, one running Android), and you need to download an authenticator app from its app store. After this, during every login, you need to look at your smartphone and enter the authentication token. More information is available at https://youtu.be/MWJtuthUs0w.

Access Keys (Access Key ID and Secret Access Key)

On the same security credentials page, next to MFA, these access keys are made available. AWS will not allow you to have more than two access keys at a time. However, you can delete and create access keys as often as you like, as shown in the following screenshot:

The access key ID is used while accessing the service via the API and SDK. During this time, you must provide this ID; otherwise, you won't be able to perform any operation. To put it another way, if someone else gets hold of these keys, they could pretend to be you through the SDK and API. In the preceding screenshot, the first key is inactive and the second key is active.
The Create New Access Key button is disabled because I already have the maximum number of allowed access keys. As an added measure, I have masked my actual IDs. It is a very good practice to delete a key and create a new one every month using the Delete command link, and to toggle the active keys every week (by making them active and inactive) by clicking on the Make Active or Make Inactive command links. Never let anyone see these IDs. If you are ever in doubt, delete the ID and create a new one. Clicking on the Create New Access Key button (assuming that you have fewer than two IDs) will display the following window, asking you to download the new access key ID as a CSV file:

The CloudFront key pairs

The CloudFront key pairs are very similar to the access key IDs. Without these keys, you will not be able to perform any operation on CloudFront. Unlike the access key (which has only an access key ID and a secret access key), here you will have a private key and a public key along with the access key ID, as shown in the following screenshot:

If you ever lose these keys, you need to delete the key pair and create a new one. This is also an added security measure.

X.509 certificates

X.509 certificates are mandatory if you wish to make any SOAP requests on any AWS service. Clicking on Create new certificate will display the following window, which performs exactly the same function:

Account identifiers

There are two IDs that are used to identify ourselves when accessing the service via the API or SDK. These are the AWS account ID and the canonical user ID. These two IDs are unique. Just as with the preceding parameters, never share these IDs or let anyone see them. If someone has your access ID or key pair, the best option is to generate a new one. However, it is not possible to generate a new account ID or canonical user ID.

Summary

In this article, you learned about the AWS Management Console and its commonly used SDKs and IDEs. You also learned how AWS secures your data.
Then, you looked at the AWS plugin configuration on the Eclipse IDE. The first part familiarized you with the AWS Management Console. After that, you explored a few of the important security aspects of AWS and learned how AWS handles them. Finally, you learned about the different AWS tools, including the common SDKs and IDE tools, that make a programmer's development work easier.

Packt
08 Sep 2016
15 min read

Getting Ready to Fight

In this article by Ashley Godbold, author of the book Mastering Unity 2D Game Development, Second Edition, we will lay the main foundation for the battle system of our game. We will create the Heads Up Display (HUD) and design the overall logic of the battle system.

The following topics will be covered in this article:
Creating a state manager to handle the logic behind a turn-based battle system
Working with Mecanim in the code
Exploring RPG UI
Creating the game's HUD

Setting up our battle state manager

The most unique and important part of a turn-based battle system is the turns. Controlling the turns is incredibly important, and we will need something to handle the logic behind the actual turns for us. We'll accomplish this by creating a battle state machine.

The battle state manager

Starting back in our BattleScene, we need to create a state machine using Mecanim's handy features. Although we will still only be using a fraction of the functionality with the RPG sample, I advise you to investigate and read more about its capabilities.

Navigate to Assets/Animation/Controllers and create a new Animator Controller called BattleStateMachine, and then we can begin putting together the battle state machine. The following screenshot shows you the states, transitions, and properties that we will need:

As shown in the preceding screenshot, we have created eight states to control the flow of a battle, with two Boolean parameters to control its transitions.
The transitions are defined as follows:

From Begin_Battle to Intro: BattleReady set to true; Has Exit Time set to false (deselected); Transition Duration set to 0
From Intro to Player_Move: Has Exit Time set to true; Exit Time set to 0.9; Transition Duration set to 2
From Player_Move to Player_Attack: PlayerReady set to true; Has Exit Time set to false; Transition Duration set to 0
From Player_Attack to Change_Control: PlayerReady set to false; Has Exit Time set to false; Transition Duration set to 2
From Change_Control to Enemy_Attack: Has Exit Time set to true; Exit Time set to 0.9; Transition Duration set to 2
From Enemy_Attack to Player_Move: BattleReady set to true; Has Exit Time set to false; Transition Duration set to 2
From Enemy_Attack to Battle_Result: BattleReady set to false; Has Exit Time set to false; Transition Duration set to 2
From Battle_Result to Battle_End: Has Exit Time set to true; Exit Time set to 0.9; Transition Duration set to 5

Summing up, what we have built is a steady flow of battle, which can be summarized as follows:

The battle begins and we show a little introductory clip to tell the player about the battle.
Once the player has control, we wait for them to finish their move.
We then perform the player's attack and switch the control over to the enemy AI.
If there are any enemies left, they get to attack the player (if they are not too scared and have not run away).
If the battle continues, we switch back to the player; otherwise, we show the battle result.
We show the result for five seconds (or until the player hits a key), and then finish the battle and return the player to the world together with whatever loot and experience they gained.

This is just a simple flow, which can be extended as much as you want, and as we continue, you will see all the points where you could expand it.

With our animator state machine created, we now just need to attach it to our battle manager so that it will be available when the battle runs. The steps to do this are as follows:

Open up BattleScene.
Select the BattleManager game object in the project Hierarchy and add an Animator component to it.
Now drag the BattleStateMachine animator controller we just created into the Controller property of the Animator component.

The preceding steps attached our new battle state machine to our battle engine. Now, we just need to be able to reference the BattleStateMachine Mecanim state machine from the BattleManager script. To do so, open up the BattleManager script in Assets/Scripts and add the following variable to the top of the class:

    private Animator battleStateManager;

Then, to capture the configured Animator in our BattleManager script, we add the following to an Awake function placed before the Start function:

    void Awake()
    {
        battleStateManager = GetComponent<Animator>();
        if (battleStateManager == null)
        {
            Debug.LogError("No battleStateMachine Animator found.");
        }
    }

We have to assign it this way because all the functionality to integrate the Animator Controller is built into the Animator component. We cannot simply attach the controller directly to the BattleManager script and use it.

Now that it's all wired up, let's start using it.

Getting to the state manager in the code

Now that we have our state manager running in Mecanim, we just need to be able to access it from the code. However, at first glance, there is a barrier to achieving this: the Mecanim system uses hashes (integer ID keys for objects), not strings, to identify states within its engine (it is still not clear why, but probably for performance reasons). To access the states in Mecanim, Unity provides a hashing algorithm to help you, which is fine for one-off checks but a bit of an overhead when you need per-frame access.

You can check whether a state's name is a specific string using the following:

    GetCurrentAnimatorStateInfo(0).IsName("Thing you're checking")

But there is no way to store the name of the current state in a variable.
A simple solution to this is to generate and cache all the state hashes when we start and then use the cache to talk to the Mecanim engine.

First, let's remove the placeholder code for the old enum state machine. So, remove the following code from the top of the BattleManager script:

    enum BattlePhase
    {
        PlayerAttack,
        EnemyAttack
    }
    private BattlePhase phase;

Also, remove the following line from the Start method:

    phase = BattlePhase.PlayerAttack;

There is still a reference in the Update method for our buttons, but we will update that shortly; feel free to comment it out now if you wish, but don't delete it.

Now, to begin working with our new state machine, we need a replica of the available states we have defined in our Mecanim state machine. For this, we just need an enumeration using the same names (you can create this either as a new C# script or simply place it in the BattleManager class) as follows:

    public enum BattleState
    {
        Begin_Battle,
        Intro,
        Player_Move,
        Player_Attack,
        Change_Control,
        Enemy_Attack,
        Battle_Result,
        Battle_End
    }

It may seem strange to have a duplicate of your states in the state machine and in the code; however, at the time of writing, it is necessary. Mecanim does not expose the names of the states outside of the engine other than through hashes. You can either use this approach and make it dynamic, or extract the state hashes and store them in a dictionary for use.

Mecanim makes the managing of state machines very simple under the hood and is extremely powerful, much better than trawling through code every time you want to update the state machine.

Next, we need a location to cache the hashes the state machine needs and a property to keep the current state so that we don't constantly query the engine for a hash.
So, add a new using statement (System.Collections.Generic) to the beginning of the BattleManager script as follows:

    using System.Collections;
    using System.Collections.Generic;
    using UnityEngine;

Then, add the following variables to the top of the BattleManager class:

    private Dictionary<int, BattleState> battleStateHash = new Dictionary<int, BattleState>();
    private BattleState currentBattleState;

Finally, we just need to integrate the animator state machine we have created. So, create a new GetAnimationStates method in the BattleManager class as follows:

    void GetAnimationStates()
    {
        foreach (BattleState state in (BattleState[])System.Enum.GetValues(typeof(BattleState)))
        {
            battleStateHash.Add(Animator.StringToHash(state.ToString()), state);
        }
    }

This simply generates a hash for the corresponding animation state in Mecanim and stores the resultant hashes in a dictionary that we can use without having to calculate them at runtime when we need to talk to the state machine.

Sadly, there is no way at runtime to gather the information from Mecanim as this information is only available in the editor. You could gather the hashes from the animator and store them in a file to avoid this, but it won't save you much.

To complete this, we just need to call the new method in the Start function of the BattleManager script by adding the following:

    GetAnimationStates();

Now that we have our states, we can use them in our running game to control both the logic that is applied and the GUI elements that are drawn to the screen.

Now add the Update function to the BattleManager class as follows:

    void Update()
    {
        currentBattleState = battleStateHash[battleStateManager.GetCurrentAnimatorStateInfo(0).shortNameHash];

        switch (currentBattleState)
        {
            case BattleState.Intro:
                break;
            case BattleState.Player_Move:
                break;
            case BattleState.Player_Attack:
                break;
            case BattleState.Change_Control:
                break;
            case BattleState.Enemy_Attack:
                break;
            case BattleState.Battle_Result:
                break;
            case BattleState.Battle_End:
                break;
            default:
                break;
        }
    }

The preceding code gets the current state from the animator state machine once per frame and then sets up a choice (switch statement) for what can happen based on the current state. (Remember, it is the state machine that decides which state follows which in the Mecanim engine, not nasty nested if statements everywhere in code.)

Now we are going to update the functionality that turns our GUI button on and off. Update the following line of code in the Update method:

    if (phase == BattlePhase.PlayerAttack) {

so that it now reads:

    if (currentBattleState == BattleState.Player_Move) {

This makes the buttons visible only when it is time for the player to perform his/her move. With these in place, we are ready to start adding in some battle logic.

Starting the battle

As it stands, the state machine is waiting at the Begin_Battle state for us to kick things off. Obviously, we want to do this when we are ready and all the pieces on the board are in place.

When the Battle scene we added starts, we load up the player and randomly spawn a number of enemies into the fray using a co-routine function called SpawnEnemies. So, only when all the dragons are ready and waiting to be chopped down do we want to kick things off.

To tell the state machine to start the battle, we simply add the following line just after the end of the for loop in the SpawnEnemies IEnumerator co-routine function:

    battleStateManager.SetBool("BattleReady", true);

Now when everything is in place, the battle will finally begin.
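
If you want to sanity-check the flow of the battle state machine away from the Unity editor, the transition table described earlier can be sketched in a few lines of plain Python. This is only a rough model, not Unity or Mecanim code: the TRANSITIONS table, the always helper, and the step function are names made up for this illustration, exit-time transitions are assumed to fire immediately, and transition durations are ignored.

```python
from enum import Enum, auto


class BattleState(Enum):
    # Same names as the states in the BattleStateMachine controller.
    Begin_Battle = auto()
    Intro = auto()
    Player_Move = auto()
    Player_Attack = auto()
    Change_Control = auto()
    Enemy_Attack = auto()
    Battle_Result = auto()
    Battle_End = auto()


def always(params):
    """Stand-in for an exit-time transition with no parameter condition."""
    return True


# Hypothetical table: state -> ordered list of (condition, next state).
# Conditions read the two Boolean parameters used by the controller.
TRANSITIONS = {
    BattleState.Begin_Battle: [(lambda p: p["BattleReady"], BattleState.Intro)],
    BattleState.Intro: [(always, BattleState.Player_Move)],
    BattleState.Player_Move: [(lambda p: p["PlayerReady"], BattleState.Player_Attack)],
    BattleState.Player_Attack: [(lambda p: not p["PlayerReady"], BattleState.Change_Control)],
    BattleState.Change_Control: [(always, BattleState.Enemy_Attack)],
    BattleState.Enemy_Attack: [
        (lambda p: p["BattleReady"], BattleState.Player_Move),
        (lambda p: not p["BattleReady"], BattleState.Battle_Result),
    ],
    BattleState.Battle_Result: [(always, BattleState.Battle_End)],
    BattleState.Battle_End: [],
}


def step(state, params):
    """Return the next state, or the same state if no condition holds."""
    for condition, target in TRANSITIONS[state]:
        if condition(params):
            return target
    return state
```

Calling step(BattleState.Begin_Battle, {"BattleReady": False, "PlayerReady": False}) leaves the machine waiting at Begin_Battle, which mirrors why nothing happens in the scene until SpawnEnemies sets BattleReady to true.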
Introductory animation

When the battle starts, we are going to display a little battle introductory image that states who the player is going to be fighting against. We'll have it slide into the scene and then slide out. You can do all sorts of interesting stuff with this introductory animation, like animating the individual images, but I'll leave that up to you to play with. Can't have all the fun now, can I?

Start by creating a new Canvas and renaming it IntroCanvas so that we can distinguish it from the canvas that will hold our buttons. At this point, since we are adding a second canvas into the scene, we should probably rename the existing one to something that is easier for you to identify.

It's a matter of preference, but I like to use different canvases for different UI elements. For example, one for the HUD, one for pause menus, one for animations, and so on. You can put them all on a single canvas and use Panels and CanvasGroup components to distinguish between them; it's really up to you.

As a child of the new IntroCanvas, create a Panel with the properties shown in the following screenshot. Notice that the Image object's Color property is set to black with the alpha set to about half:

Now add two UI Images and a UI Text as children of the Panel. Name the first image PlayerImage and set its properties as shown in the following screenshot. Be sure to set Preserve Aspect to true:

Name the second image EnemyImage and set the properties as shown in the following screenshot:

For the text, set the properties as shown in the following screenshot:

Your Panel should now appear as mine did in the image at the beginning of this section.

Now let's give this Panel its animation. With the Panel selected, select the Animation tab. Now hit the Create button. Save the animation as IntroSlideAnimation in the Assets/Animation/Clips folder.

At the 0:00 frame, set the Panel's X position to 600, as shown in the following screenshot:

Now, at the 0:45 frame, set the Panel's X position to 0.
Place the playhead at the 1:20 frame and set the Panel's X position to 0 there as well, by selecting Add Key, as shown in the following screenshot:

Create the last frame at 2:00 by setting the Panel's X position to -600.

When the Panel slides in, it does this annoying bounce thing instead of staying put. We need to fix this by adjusting the animation curve. Select the Curves tab:

When you select the Curves tab, you should see something like the following:

The reason for the bounce is the wiggle that occurs between the two center keyframes. To fix this, right-click on the two center points on the curve, represented by red dots, and select Flat, as shown in the following screenshot:

After you do so, the curve should be constant (flat) in the center, as shown in the following screenshot:

The last thing we need to do to connect this to our BattleStateManager is to adjust the properties of the Panel's Animator. With the Panel selected, select the Animator tab. You should see something like the following:

Right now, the animation plays immediately when the scene is entered. However, since we want this to tie in with our BattleStateManager and only begin playing in the Intro state, we do not want this to be the default animation.

Create an empty state within the Animator and set it as the default state. Name this state OutOfFrame. Now make a Trigger Parameter called Intro. Set the transition between the two states so that it has the following properties:

The last things we want to do before we move on are to make this animation not loop, rename this new Animator, and place our Animator in the correct subfolder. In the project view, select IntroSlideAnimation from the Assets/Animation/Clips folder and deselect Loop Time. Rename the Panel Animator to VsAnimator and move it to the Assets/Animation/Controllers folder.
Currently, the Panel appears right in the middle of the screen at all times, so go ahead and set the Panel's X Position to 600 to get it out of the way.

Now we can access this in our BattleManager script. Currently, the state machine pauses at the Intro state for a few seconds; let's have our Panel animation pop in. Add the following variable declarations to our BattleManager script:

    public GameObject introPanel;
    Animator introPanelAnim;

And add the following to the Awake function:

    introPanelAnim = introPanel.GetComponent<Animator>();

Now add the following to the case line of the Intro state in the Update function:

    case BattleState.Intro:
        introPanelAnim.SetTrigger("Intro");
        break;

For this to work, we have to drag and drop the Panel into the Intro Panel slot in the BattleManager Inspector.

As the battle is now in progress and the control is being passed to the player, we need some interaction from the user. Currently, the player can run away, but that's not at all interesting. We want our player to be able to fight! So, let's design a graphic user interface that will allow her to attack those adorable, but super mean, dragons.

Summary

Getting the battle right based on the style of your game is very important, as it is where the player will spend the majority of their time. Keep the player engaged and try to make each battle different in some way, as repetitiveness is a tricky problem to solve and you don't want to bore the player. Think about different attacks your player can perform that possibly strengthen as the player strengthens.
In this article, you covered the following:

Setting up the logic of our turn-based battle system
Working with state machines in the code
Different RPG UI overlays
Setting up the HUD of our game so that our player can do more than just run away
Packt
13 Sep 2016
5 min read

How to Build your own Futuristic Robot

In this article by Richard Grimmett, author of the book Raspberry Pi Robotic Projects - Third Edition, we will start with a simple but impressive project where you'll take a toy robot and give it much more functionality. You'll start with an R2D2 toy robot and modify it to add a webcam, voice recognition, and motors so that it can get around. Creating your own R2D2 will require a bit of mechanical work; you'll need a drill and perhaps a Dremel tool, but most of the mechanical work will be removing the parts you don't need so you can add some exciting new capabilities.

Modifying the R2D2

There are several R2D2 toys that can provide the basis for this project, and they are available from online retailers. This project will use one that is inexpensive and also provides such interesting features as a top that turns and a wonderful place to put a webcam. It is the Imperial Toy R2D2 bubble machine. Here is a picture of the unit:

The unit can be purchased at amazon.com, toysrus.com, and a number of other retailers. It is normally used as a bubble machine that uses a canister of soap solution to produce bubbles, but you'll take all of that capability out to make your R2D2 much more like the original robot.

Adding wheels and motors

In order to make your R2D2 a reality, the first thing you'll want to do is add wheels to the robot. To do this, you'll need to take the robot apart, separating the two main plastic pieces that make up the lower body of the robot. Once you have done this, both the right and left arms can be removed from the body. You'll need to add two wheels that are controlled by DC motors to these arms. Perhaps the best way to do this is to purchase a simple, two-wheeled car kit that is available at many online electronics stores like amazon.com, ebay.com, or banggood.com. Here is a picture of the parts that come with the car:

You'll be using these pieces to add mobility to your robot.
The two yellow pieces are DC motors, so let's start with those. To add these to the two arms on either side of the robot, you'll need to separate the two halves of the arm, and then remove material from one of the halves, like this:

You can use a Dremel tool to do this, or any kind of device that can cut plastic. This will leave a place for your wheel. Now you'll want to cut up the plastic chassis of your car kit to provide a platform to connect to your R2D2 arm. Using the chassis as a pattern, you'll want to end up with the two pieces that have the + sign cutouts; this is where you'll mount your wheels, and these are also the pieces you'll attach to the R2D2 arm. The image below will help you understand this better.

On the side of the arm from which no material has been removed, mark and drill two holes to fix the clear plastic to the bottom of the arm. Then fix the wheel to the plastic, and then the plastic to the bottom of the arm, as shown in the picture. You'll connect two wires, one to each of the terminals on the motor, and then run the wires up to the top of the arm and out of the small holes. These wires will eventually go into the body of the robot through small holes that you will drill where the arms connect to the body, like this:

You'll repeat this process for the other arm. For the third, center arm, you'll want to connect the small, spinning white wheel to the bottom of the arm. Here is a picture:

Now that you have motors and wheels connected to the bottom of the arms, you'll need to connect these to the Raspberry Pi. There are several different ways to connect and drive these two DC motors, but perhaps the easiest is to add a shield that can directly drive a DC motor. This motor shield is an additional piece of hardware that installs on top of the Raspberry Pi and can source the voltage and current to power both motors. The RaspiRobot Board V3 is available online and can provide these signals. The specifics on the board can be found at http://www.monkmakes.com/rrb3/.
Here is a picture of the board:

The board will provide the drive signals for the motors on each of the wheels. The following are the steps to connect the Raspberry Pi to the board:

First, connect the battery power connector to the power connector on the side of the board.
Next, connect the two wires from one of the motors to the L motor connectors on the board.
Connect the other two wires from the other motor to the R motor connectors on the board.

Once completed, your connections should look like this:

The red and black wires go to the battery, the green and yellow to the left motor, and the blue and white to the right motor. Now you will be able to control both the speed and the direction of the motors through the motor control board.

Summary

We have now covered some aspects of building your first project, your own R2D2. You can now move it around, program it to respond to voice commands, or run it remotely from a computer, tablet, or phone. Following this theme, your next robot will look and act like WALL-E.

Sugandha Lahoti
24 May 2019
4 min read

GitHub Sponsors: Could corporate strategy eat FOSS culture for dinner?

Yesterday, at GitHub Satellite 2019, GitHub launched what is probably its most game-changing yet debatable feature: Sponsors. GitHub Sponsors works much like Patreon, in the sense that developers can sponsor the efforts of a contributor "seamlessly through their GitHub profiles". Developers will be able to opt into having a “Sponsor me” button on their GitHub repositories and open-source projects, where they will be able to highlight their funding models. GitHub shared that it will cover payment processing fees for the first 12 months of the program to celebrate the launch. “100% percent of your sponsorship goes to the developer,” GitHub wrote in an announcement. At launch, the feature is in beta and access is through a waitlist.

To start off the program, the code hosting site has also launched the GitHub Sponsors Matching Fund, which will match all contributions up to $5,000 during a developer's first year in GitHub Sponsors.

GitHub Sponsors could prove beneficial for developers working on open source software that isn't profitable. They can raise money directly on GitHub, which is the leading host for open-source software. More importantly, GitHub Sponsors is not limited to software developers; it is open to all open-source contributors, including those who write documentation, provide leadership, or mentor new developers. This, together with the promise of zero fees to use the program, has got people excited.

https://twitter.com/rauchg/status/1131807348820008960
https://twitter.com/EricaJoy/status/1131640959886741504

On the flip side, GitHub Sponsors could also erode the essence of open source by financially influencing what developers choose to work on. It may drive open-source developers to focus on projects that are more likely to attract financial contributions over projects that are more interesting and challenging but unlikely to find financial backers on GitHub.
This can hurt FOSS contributions if people start to expect to be paid rather than contributing out of intrinsic motivation. This, in turn, could lead to toxic politics among project contributors over who gets credit and who gets paid. Companies could also use GitHub sponsorships to judge the health of open source projects.

People are also speculating that this could be a strategy by Microsoft (GitHub's parent company) to centralize and enclose open source community dynamics, as well as to benefit from its monetization. Some are also wondering about the possible effects of monetization on OSS, which could lead to mega corporations profiteering off free labor, thus changing the original vision of an open source community.

https://twitter.com/andrestaltz/status/1131521807876591616

Andre Staltz also made an interesting point about the potential of the zero-fee model to drive other open source payment models out of existence. He believes that once Microsoft's dominance is achieved, GitHub's commissions could go up.

https://twitter.com/andrestaltz/status/1131526433027837952

A Hacker News user also conjectured that this may give Microsoft access to data on top-notch developers: “Will this mean that Microsoft gets a bunch of PII on top-notch developers (have to enter name + address info to receive or send payments), and get much more value from that data than I can imagine?”

At present, GitHub is offering this feature as an invite-only beta with a waitlist. It will be interesting to see if and how it changes the dynamics of open source collaboration once it rolls out fully.

A tweet observes: “I think it bears repeating that the path to FOSS sustainability is not individuals funding projects. We will only reach sustainability when the companies making profit off our work are returning value to the Commons.”

To know more about GitHub Sponsors, visit the official GitHub blog.

Packt
22 Feb 2016
9 min read

What is Naïve Bayes classifier?

The name Naïve Bayes comes from the basic assumption in the model that the probability of a particular feature $X_i$ is independent of any other feature $X_j$, given the class label $C_k$. This implies the following:

$$P(X_i \mid C_k, X_j) = P(X_i \mid C_k)$$

Using this assumption and the Bayes rule, one can show that the probability of class $C_k$, given features $\{X_1, X_2, X_3, \ldots, X_n\}$, is given by:

$$P(C_k \mid X_1, X_2, \ldots, X_n) = \frac{P(C_k) \prod_{i=1}^{n} P(X_i \mid C_k)}{P(X_1, X_2, \ldots, X_n)}$$

Here, $P(X_1, X_2, X_3, \ldots, X_n)$ is the normalization term obtained by summing the numerator over all the values of k. It is also called Bayesian evidence or partition function Z. The classifier selects as the target class the class label that maximizes the posterior class probability $P(C_k \mid \{X_1, X_2, X_3, \ldots, X_n\})$:

$$\hat{C} = \underset{k}{\arg\max}\; P(C_k) \prod_{i=1}^{n} P(X_i \mid C_k)$$

The Naïve Bayes classifier is a baseline classifier for document classification. One reason for this is that the underlying assumption, that each feature (word or m-gram) is independent of the others given the class label, typically holds well enough for text. Another reason is that the Naïve Bayes classifier scales well when there is a large number of documents.

There are two common variants of Naïve Bayes for text. In Bernoulli Naïve Bayes, features are binary variables that encode whether a feature (m-gram) is present or absent in a document. In multinomial Naïve Bayes, the features are frequencies of m-grams in a document. To avoid issues when a frequency is zero, Laplace smoothing is applied to the feature vectors by adding 1 to each count.

Let's look at multinomial Naïve Bayes in some detail. Let $n_i$ be the number of times the feature $X_i$ occurred in the class $C_k$ in the training data. Then, the likelihood function of observing a feature vector $X = \{X_1, X_2, X_3, \ldots, X_n\}$, given a class label $C_k$, is given by:

$$P(X \mid C_k) = \frac{\left(\sum_{i} n_i\right)!}{\prod_{i} n_i!} \prod_{i=1}^{n} p_{ki}^{\,n_i}$$

Here, $p_{ki}$ is the probability of observing the feature $X_i$ in the class $C_k$.
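As a rough illustration of the smoothed multinomial model just described, the following base-R sketch estimates per-class feature probabilities from a tiny count matrix and evaluates the log-likelihood of a new count vector with `dmultinom()`. The counts, the class and term names, and the helper name `smoothed_probs` are all made up for illustration; none of them come from the SMS dataset used later.

```r
# Hypothetical training counts: rows are classes, columns are features (terms).
train_counts <- rbind(
  spam = c(free = 3, call = 2, meeting = 0),
  ham  = c(free = 0, call = 1, meeting = 4)
)

# Laplace smoothing: add alpha (1 by default) to every count so that no
# probability is exactly zero, then normalise each row to sum to 1.
smoothed_probs <- function(counts, alpha = 1) {
  smoothed <- counts + alpha
  smoothed / rowSums(smoothed)
}

p <- smoothed_probs(train_counts)

# Term counts of a new document (hypothetical).
new_doc <- c(free = 1, call = 1, meeting = 0)

# Multinomial log-likelihood of the document under each class.
log_lik <- apply(p, 1, function(prob) dmultinom(new_doc, prob = prob, log = TRUE))
print(log_lik)
```

With equal class priors, the class with the larger log-likelihood would be the predicted label; here that is `spam`, because the document contains only terms that are frequent in the spam row.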
Using the Bayes rule, the posterior probability of observing the class $C_k$, given a feature vector X, is given by:

$$P(C_k \mid X) = \frac{P(C_k)}{Z} \, \frac{\left(\sum_{i} n_i\right)!}{\prod_{i} n_i!} \prod_{i=1}^{n} p_{ki}^{\,n_i}$$

Taking the logarithm of both sides and collecting the terms that do not depend on k (the constant Z and the multinomial coefficient) into a constant, we get the following:

$$\log P(C_k \mid X) = \log P(C_k) + \sum_{i=1}^{n} n_i \log p_{ki} + \text{const}$$

So, by taking the logarithm of the posterior distribution, we have converted the problem into a model that is linear in the counts $n_i$, with $\log p_{ki}$, the log-probabilities of the features in each class, as the coefficients to be determined from data. This can be easily solved. Generally, instead of term frequencies, one uses TF-IDF (term frequency multiplied by inverse document frequency) with the document length normalized to improve the performance of the model.

The R package e1071 (Miscellaneous Functions of the Department of Statistics) by T.U. Wien contains an R implementation of Naïve Bayes. For this article, we will use the SMS spam dataset from the UCI Machine Learning repository (reference 1 in the References section of this article). The dataset includes 425 SMS spam messages collected from the UK forum Grumbletext, where consumers can submit spam SMS messages. The dataset also contains 3375 normal (ham) SMS messages from the NUS SMS corpus maintained by the National University of Singapore.

The dataset can be downloaded from the UCI Machine Learning repository (https://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection). Let's say that we have saved this as the file SMSSpamCollection.txt in the working directory of R (actually, you need to open it in Excel and save it as a tab-delimited file for R to read it properly).
Then, the command to read the file into R would be the following:

> # keep the message text as character strings rather than factors
> spamdata <- read.table("SMSSpamCollection.txt", sep = "\t", stringsAsFactors = FALSE)

We will first separate the dependent variable y and independent variables x and split the dataset into training and testing sets in the ratio 80:20, using the following R commands:

> samp <- sample.int(nrow(spamdata), as.integer(nrow(spamdata) * 0.2), replace = FALSE)
> spamTest <- spamdata[samp, ]
> spamTrain <- spamdata[-samp, ]
> ytrain <- as.factor(spamTrain[, 1])
> ytest <- as.factor(spamTest[, 1])
> xtrain <- as.vector(spamTrain[, 2])
> xtest <- as.vector(spamTest[, 2])

Since we are dealing with text documents, we need to do some standard preprocessing before we can use the data for any machine learning models. We can use the tm package in R for this purpose. In the next section, we will describe this in some detail.

Text processing using the tm package

The tm package has methods for data import, corpus handling, preprocessing, metadata management, and creation of term-document matrices. Data can be imported into the tm package from a directory, from a vector in which each component is a document, or from a data frame. The fundamental data structure in tm is an abstract collection of text documents called Corpus. It has two implementations: one where data is stored in memory, called VCorpus (volatile corpus), and one where data is stored on the hard disk, called PCorpus (permanent corpus).

We can create a corpus of our SMS spam dataset by using the following R commands; prior to this, you need to install the tm package and the SnowballC package by using the install.packages("packagename") command in R:

> library(tm)
> library(SnowballC)
> xtrain <- VCorpus(VectorSource(xtrain))

First, we need to do some basic text processing, such as removing extra white space, changing all words to lowercase, removing stop words, and stemming the words.
This can be achieved by using the following functions in the tm package:

> # remove extra white space
> xtrain <- tm_map(xtrain, stripWhitespace)
> # remove punctuation
> xtrain <- tm_map(xtrain, removePunctuation)
> # remove numbers
> xtrain <- tm_map(xtrain, removeNumbers)
> # change to lower case
> xtrain <- tm_map(xtrain, content_transformer(tolower))
> # remove stop words
> xtrain <- tm_map(xtrain, removeWords, stopwords("english"))
> # stem the document
> xtrain <- tm_map(xtrain, stemDocument)

Finally, the data is transformed into a form that can be consumed by machine learning models. This is the so-called document-term matrix form, where each document (an SMS in this case) is a row, the terms appearing in all documents are the columns, and the entry in each cell denotes how many times each word occurs in one document:

> # create the document-term matrix and convert it to a data frame
> xtrain <- as.data.frame(as.matrix(DocumentTermMatrix(xtrain)))

The same set of processes is done on the xtest dataset as well. The reason we converted y to factors and xtrain to a data frame is to match the input format for the Naïve Bayes classifier in the e1071 package.

Model training and prediction

You need to first install the e1071 package from CRAN. The naiveBayes() function can be used to train the Naïve Bayes model. The function can be called using two methods. The following is the first method:

> naiveBayes(formula, data, laplace = 0, ..., subset, na.action = na.pass)

Here, formula stands for the linear combination of independent variables used to predict the class:

> class ~ x1 + x2 + ...

Also, data stands for either a data frame or a contingency table consisting of categorical and numerical variables. If we have the class labels as a vector y and the independent variables as a data frame x, then we can use the second method of calling the function, as follows:

> naiveBayes(x, y, laplace = 0, ...)

We will use the second method of calling in our example.
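To make the document-term matrix layout concrete without depending on tm, here is a small base-R sketch. The three toy messages and the helper name `build_dtm` are made up for illustration; `DocumentTermMatrix()` in tm applies its own tokenisation rules and stores the result sparsely, so treat this only as a picture of the row and column structure.

```r
# Toy corpus (hypothetical messages, already lower-cased and cleaned).
docs <- c("free call now", "call me now now", "see you at lunch")

build_dtm <- function(docs) {
  tokens <- strsplit(docs, "\\s+")         # split each document on whitespace
  vocab  <- sort(unique(unlist(tokens)))   # columns: every term seen in the corpus
  # For each document, count how often each vocabulary term occurs.
  counts <- vapply(
    tokens,
    function(tk) as.integer(table(factor(tk, levels = vocab))),
    integer(length(vocab))
  )
  dtm <- t(counts)                         # one row per document
  colnames(dtm) <- vocab
  rownames(dtm) <- paste0("doc", seq_along(docs))
  as.data.frame(dtm)
}

dtm <- build_dtm(docs)
print(dtm)
```

Each row is one message and each column one term, which is the shape that the second calling method, `naiveBayes(x, y, ...)`, expects for x.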
Once we have a trained model, which is an R object of class naiveBayes, we can predict the classes of new instances as follows:

> predict(object, newdata, type = c("class", "raw"), threshold = 0.001, eps = 0, ...)

So, we can train the Naïve Bayes model on our training dataset and score on the test dataset by using the following commands:

> library(e1071)
> library(pROC)
> # Training the Naive Bayes model
> nbmodel <- naiveBayes(xtrain, ytrain, laplace = 3)
> # Prediction using the trained model
> ypred.nb <- predict(nbmodel, xtest, type = "class", threshold = 0.075)
> # Converting classes to 0 and 1 for plotting the ROC curve
> fconvert <- function(x) if (x == "spam") 1 else 0
> ytest1 <- sapply(ytest, fconvert, simplify = "array")
> ypred1 <- sapply(ypred.nb, fconvert, simplify = "array")
> roc(ytest1, ypred1, plot = TRUE)

The ROC curve for this model and dataset is generated using the pROC package from CRAN. The confusion matrix is obtained as follows:

> # Confusion matrix
> confmat <- table(ytest, ypred.nb)
> confmat
       ypred.nb
ytest   ham spam
  ham   143  139
  spam    9   35

From the ROC curve and confusion matrix, one can choose the best threshold for the classifier, and the precision and recall metrics. Note that the example shown here is for illustration purposes only. The model needs to be tuned further to improve accuracy.

We can also print some of the most frequent words (model features) occurring in the two classes and their class-conditional probabilities generated by the model. This will give a more intuitive feeling for the model exercise.
The following R code does this job:

> tab <- nbmodel$tables
> fham <- function(x) x[1, 1]
> hamvec <- sapply(tab, fham, simplify = "array")
> hamvec <- sort(hamvec, decreasing = TRUE)
> fspam <- function(x) x[2, 1]
> spamvec <- sapply(tab, fspam, simplify = "array")
> spamvec <- sort(spamvec, decreasing = TRUE)
> # spamvec and hamvec are sorted independently, so cbind() pairs them by rank;
> # the row names are taken from spamvec
> prb <- cbind(spamvec, hamvec)
> print.table(prb)

The output table is as follows:

word    Prob(word|spam)  Prob(word|ham)
call    0.6994           0.4084
free    0.4294           0.3996
now     0.3865           0.3120
repli   0.2761           0.3094
text    0.2638           0.2840
spam    0.2270           0.2726
txt     0.2270           0.2594
get     0.2209           0.2182
stop    0.2086           0.2025

The table shows, for example, that given a document is spam, the probability of the word call appearing in it is 0.6994. Because the two columns are sorted separately in the code above, the ham value in the same row (0.4084) is the largest value in the ham column and need not belong to the same word.

Summary

In this article, we learned a basic and popular method for classification, Naïve Bayes, implemented using the Bayesian approach. For further information on Bayesian models, you can refer to:

https://www.packtpub.com/big-data-and-business-intelligence/data-analysis-r
https://www.packtpub.com/big-data-and-business-intelligence/building-probabilistic-graphical-models-python

Resources for Article: Further resources on this subject:
Introducing Bayesian Inference [article]
Practical Applications of Deep Learning [article]
Machine learning in practice [article]
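Returning to the confusion matrix reported in the model training section, the sketch below recomputes accuracy, precision, and recall for the spam class in base R. The four counts are copied from that matrix; the variable names are chosen here for illustration and are not part of the original code.

```r
# Confusion matrix from the article: rows are true labels, columns are predictions.
confmat <- matrix(c(143, 139,
                      9,  35),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(actual = c("ham", "spam"),
                                  predicted = c("ham", "spam")))

tp <- confmat["spam", "spam"]   # spam correctly flagged as spam
fp <- confmat["ham",  "spam"]   # ham wrongly flagged as spam
fn <- confmat["spam", "ham"]    # spam predicted as ham

accuracy  <- sum(diag(confmat)) / sum(confmat)
precision <- tp / (tp + fp)
recall    <- tp / (tp + fn)

print(round(c(accuracy = accuracy, precision = precision, recall = recall), 3))
```

With these counts, recall on spam is about 0.80 while precision is only about 0.20 and overall accuracy about 0.55, which is consistent with the remark that the model needs further tuning.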
Packt
19 Feb 2016
30 min read

Twitter Sentiment Analysis

In this article, we will cover:

Twitter and its importance
Getting hands on with Twitter's data and using various Twitter APIs
Use of data to solve business problems: comparison of various businesses based on tweets

Twitter and its importance

Twitter can be considered an extension of the short message service, or SMS, on an Internet-based platform. In the words of Jack Dorsey, co-founder and co-creator of Twitter:

"...We came across the word 'twitter', and it was just perfect. The definition was 'a short burst of inconsequential information,' and 'chirps from birds'. And that's exactly what the product was"

Twitter acts as a utility where one can send SMSs to the whole world. It enables people to be heard instantaneously and to get a response. Since the audience of this SMS is so large, responses are often very quick. So, Twitter facilitates the basic social instincts of humans. By sharing on Twitter, a user can easily express an opinion on just about everything and at any time. Friends who are connected or, in the case of Twitter, followers, immediately get the information about what's going on in someone's life. This in turn serves another human emotion: the innate need to know what is going on in someone's life. Apart from being real time, Twitter's UI is really easy to work with. It's naturally and instinctively understood; that is, the UI is very intuitive.

Each tweet on Twitter is a short message with a maximum of 140 characters. Twitter is an excellent example of a microblogging service. As of July 2014, the Twitter user base had passed 500 million, with more than 271 million active users. Around 23 percent of adult Internet users use Twitter, which is about 19 percent of the entire adult population.

If we can properly mine what users are tweeting about, Twitter can act as a great tool for advertisement and marketing. But this is not the only information Twitter provides.
Because of its non-symmetric nature in terms of followers and followings, Twitter is better suited to understanding user interests than to measuring a user's impact on the social network. An interest graph can be thought of as a method to learn the links between individuals and their diverse interests. Computing the degree of association or correlation between an individual's interests and potential advertisements is one of the most important applications of interest graphs. Based on these correlations, a user can be targeted so as to attain a maximum response to an advertisement campaign, along with follower recommendations.

One interesting fact about Twitter (and Facebook) is that the user does not need to be a real person. A user on Twitter (or on Facebook) can be anything and anyone: for example, an organization, a campaign, or a famous but imaginary personality (a fictional character recognizable in the media), apart from a real person. If a real person follows these users on Twitter, a lot can be inferred about their personality, and hence they can be recommended ads or other accounts to follow based on such information. For example, @fakingnews is an Indian blog that publishes news satire ranging from Indian politics to typical Indian mindsets. People who follow @fakingnews are, in general, people who like to read satirical news. Hence, these people can be thought of as belonging to the same cluster or community. If we have another satirical blog, we can recommend it to this community and improve the advertisement return on investment. The chances of getting more hits via people belonging to this community will be higher than via a community that doesn't follow @fakingnews, or any such news, in general.

Once you have comprehended that Twitter allows you to create, link, and investigate a community of interest for an arbitrary topic, the influence of Twitter and the knowledge one can find from mining it become clearer.
Understanding Twitter's API

Twitter APIs provide a means to access Twitter data, that is, tweets sent by its millions of users. Let's get to know these APIs a bit better.

Twitter vocabulary

As described earlier, Twitter is a microblogging service with a social aspect. It allows its users to express their views and sentiments by means of an Internet SMS, called a tweet in the context of Twitter. These tweets are entities of at most 140 characters. The content of a tweet can be anything, ranging from a person's mood to a person's location to a person's curiosity. The platform where these tweets are posted is called the timeline. To use Twitter's APIs, one must understand the basic terminology.

Tweets are the crux of Twitter. Theoretically, a tweet is just 140 characters of text content tweeted by a user, but there is more to it than that. There is metadata associated with each tweet, which Twitter classifies as entities and places. The entities consist of hashtags, URLs, and other media data that users have included in their tweet. The places are locations associated with the tweet. A place may be the real-world location from which the tweet was sent, or a location mentioned in the text of the tweet.

Take the following tweet as an example:

Learn how to consume millions of tweets with @twitterapi at #TDC2014 in São Paulo #bigdata tomorrow at 2:10pm http://t.co/pTBlWzTvVd

The preceding tweet was tweeted by @TwitterDev and it's about 132 characters long. The following are the entities mentioned in this tweet:

Handle: @twitterapi
Hashtags: #TDC2014, #bigdata
URL: http://t.co/pTBlWzTvVd

São Paulo is the place mentioned in this tweet.

This is one example of a tweet with a fairly good amount of metadata. Although the tweet's length is well within the 140-character limit, it contains more information than one might think.
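The entity list above can be reproduced approximately with plain regular expressions in base R. This is only a sketch of the idea: it pattern-matches the tweet text and is not the structured entities metadata that the Twitter API returns, and the helper name `extract_all` is made up for illustration.

```r
# The example tweet from the text ("\u00e3" is the accented letter in "São").
tweet <- "Learn how to consume millions of tweets with @twitterapi at #TDC2014 in S\u00e3o Paulo #bigdata tomorrow at 2:10pm http://t.co/pTBlWzTvVd"

# Return every match of a regular expression within a single string.
extract_all <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text))[[1]]
}

handles  <- extract_all("@\\w+", tweet)          # user mentions
hashtags <- extract_all("#\\w+", tweet)          # hashtags
urls     <- extract_all("https?://\\S+", tweet)  # links

print(list(handles = handles, hashtags = hashtags, urls = urls))
```

Cross-referencing these pieces (who is mentioned, which hashtags are used, where the link points) is the kind of information that helps place a tweet in a community of interest.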
This actually enables us to figure out that this tweet belongs to a specific community by cross-referencing the topics present in the hashtags, the URL to the website, the different users mentioned in it, and so on.

The interface (web or mobile) on which tweets are displayed is called the timeline. The tweets are, in general, arranged in chronological order of posting time. On a specific user's account, only a certain number of tweets are displayed by Twitter. This is generally based on the users the given user is following and is being followed by. This is the interface a user sees when logging in to their Twitter account.

A Twitter stream differs from a Twitter timeline in that it is not tied to a specific user. The tweets on a user's timeline come from only a certain number of users and are updated less frequently, whereas the Twitter stream is a chronological collection of all the tweets posted by all users. The number of active users on Twitter is in the order of hundreds of millions. During public events of widespread interest, such as presidential debates, users can collectively post several hundred thousand tweets per minute. The behavior is very similar to a stream; hence the name Twitter stream.

You can try the following by creating a Twitter account (it would be more insightful if you have only a small number of followers). Before creating the account, it is advised that you read all of its terms and conditions. You can also start reading its API documentation.

Creating a Twitter API connection

We need to have an app created at https://dev.twitter.com/apps before making any API requests to Twitter. It's a standard method for developers to gain API access and, more importantly, it helps Twitter to observe developers and restrict them from making high-load API requests.

The ROAuth package is the one we are going to use in our experiments.
Tokens allow users to authorize third-party apps to access the data from any user account without the need to have their passwords (or other sensitive information). ROAuth basically facilitates the same.

Creating new app

The first step to getting any kind of token access from Twitter is to create an app on it. The user has to go to https://dev.twitter.com/ and log in with their Twitter credentials. Once you are logged in, the steps for creating an app are as follows:

1. Go to https://apps.twitter.com/app/new.
2. Put the name of your application in the Name field. This name can be anything you like.
3. Similarly, enter the description in the Description field.
4. The Website field needs to be filled with a valid URL, but again that can be any random URL.
5. You can leave the Callback URL field blank.

After the creation of this app, we need to find the API Key and API Secret values from the Keys and Access Tokens tab. Consider the example shown in the following figure:

Under the Keys and Access Tokens tab, you will find a button to generate access tokens. Click on it and you will be provided with an Access Token and Access Token Secret value.

Before using the preceding keys, we need to install twitteR to access the data in R using the app we just created, using the following code:

    install.packages(c("devtools", "rjson", "bit64", "httr"))
    library(devtools)
    install_github("geoffjentry/twitteR")
    library(twitteR)

Here's sample code that helps us access the tweets posted since any given date and which contain a specific keyword. In this example, we are searching for tweets containing the word Earthquake posted since September 29, 2014. In order to get this information, we provide four special types of information to get the authorization token:

key
secret
access token
access token secret

We'll show you how to use the preceding information to get an app authorized by the user and access its resources on Twitter.
The setup_twitter_oauth() function in twitteR will make our next steps very smooth and clear:

    api_key <- "your_api_key"
    api_secret <- "your_api_secret"
    access_token <- "your_access_token"
    access_token_secret <- "your_access_token_secret"
    setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)
    EarthQuakeTweets = searchTwitter("EarthQuake", since='2014-09-29')

Running this example should simply display Using direct authentication, with 25 tweets loaded in the EarthQuakeTweets variable, as shown here:

    head(EarthQuakeTweets, 2)
    [[1]]
    [1] "TamamiJapan: RT @HistoricalPics: Japan. Top: One Month After Hiroshima, 1945. Bottom: One Month After The Earthquake and Tsunami, 2011. Incredible. http…"

    [[2]]
    [1] "OldhamDs: RT @HistoricalPics: Japan. Top: One Month After Hiroshima, 1945. Bottom: One Month After The Earthquake and Tsunami, 2011. Incredible. http…"

We have shown the first two of the 25 tweets containing the word Earthquake since September 29, 2014. To inspect all the metadata attached to a tweet, use str(EarthQuakeTweets[1]).

Finding trending topics

Now that we understand how to create API connections to Twitter and fetch data using them, we will see how to find out what is trending on Twitter, that is, which topics (worldwide or local) are being talked about the most right now. Using the same API, we can easily access the trending information:

    # Return a data frame with name, country & woeid,
    # where woeid is a numerical identification code describing a location ID
    Locs <- availableTrendLocations()
    # Filter the data frame for Delhi (India) and extract its woeid
    LocsIndia = subset(Locs, country == "India")
    woeidDelhi = subset(LocsIndia, name == "Delhi")$woeid
    # getTrends takes a specified woeid and returns the trending topics associated with that woeid
    trends = getTrends(woeid=woeidDelhi)

The function availableTrendLocations() returns an R data frame containing the name, country, and woeid parameters.
We then filter this data frame for a location of our choosing; in this example, it's Delhi, India. The function getTrends() fetches the top 10 trends in the location determined by the woeid.

Here are the top four trending hashtags in the region defined by woeid = 20070458, that is, Delhi, India:

    head(trends)
      name                   url                                                  query                    woeid
    1 #AntiHinduNGOsExposed  http://twitter.com/search?q=%23AntiHinduNGOsExposed  %23AntiHinduNGOsExposed  20070458
    2 #KhaasAadmi            http://twitter.com/search?q=%23KhaasAadmi            %23KhaasAadmi            20070458
    3 #WinGOSF14             http://twitter.com/search?q=%23WinGOSF14             %23WinGOSF14             20070458
    4 #ItsForRealONeBay      http://twitter.com/search?q=%23ItsForRealONeBay      %23ItsForRealONeBay      20070458

Searching tweets

Similar to the trends, there is one more important function that comes with the twitteR package: searchTwitter(). This function returns tweets containing the searched string, subject to other constraints. Some of the constraints that can be imposed are as follows:

lang: This restricts the tweets to the given language.
since/until: This restricts the tweets to those posted since the given date or until the given date.
geocode: This restricts the tweets to those from users who are located within a certain distance of the given latitude/longitude.

For example, extracting tweets about the cricketer Sachin Tendulkar in the month of November 2014:

    head(searchTwitter('Sachin Tendulkar', since='2014-11-01', until='2014-11-30'))
    [[1]]
    [1] "TendulkarFC: RT @Moulinparikh: Sachin Tendulkar had a long session with the Mumbai Ranji Trophy team after today's loss."

    [[2]]
    [1] "tyagi_niharika: @WahidRuba @Anuj_dvn @Neel_D_ @alishatariq3 @VWellwishers @Meenal_Rathore oh... Yaadaaya....hmaraesachuuu sir..i mean sachin Tendulkar"

    [[3]]
    [1] "Meenal_Rathore: @WahidRuba @Anuj_dvn @tyagi_niharika @Neel_D_ @alishatariq3 @AliaaFcc @VWellwishers .. Sachin Tendulkar ☺️"

    [[4]]
    [1] "MishraVidyanand: Vidyanand Mishra is following the Interest \"The Living Legend SachinTendu...\" on http://t.co/tveHXMB4BM - http://t.co/CocNMcxFge"

    [[5]]
    [1] "CSKalwaysWin: I have never tried to compare myself to anyone else.\n - Sachin Tendulkar"

Twitter sentiment analysis

Depending on the objective, and using the ability to search any type of tweets from the public timeline, one can always collect the required corpus. For example, you may want to learn about customer satisfaction levels with the various cab services that are entering the Indian market. These start-ups are offering various discounts and coupons to attract customers but, at the end of the day, service quality determines the business of any organization. These start-ups are constantly promoting themselves on various social media websites, and customers are expressing various levels of sentiment on the same platforms. Let's target the following:

Meru Cabs: A radio cabs service based in Mumbai, India. Launched in 2007.
Ola Cabs: A taxi aggregator company based in Bangalore, India. Launched in 2011.
TaxiForSure: A taxi aggregator company based in Bangalore, India. Launched in 2011.
Uber India: A taxi aggregator company headquartered in San Francisco, California. Launched in India in 2014.

Let's set our goal as getting the general sentiment about each of the preceding service providers, based on the customer sentiments present in the tweets on Twitter.

Collecting tweets as a corpus

We'll start with the searchTwitter() function (discussed previously) from the twitteR package to gather the tweets for each of the preceding organizations. Now, in order to avoid writing the same code again and again, we put the following authorization code in a file called authenticate.R.
    library(twitteR)
    api_key <- "xx"
    api_secret <- "xx"
    access_token <- "xx"
    access_token_secret <- "xx"
    setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)

We run the following scripts to get the required tweets:

    # Load the authorization code (which also loads twitteR)
    source('authenticate.R')
    Meru_tweets = searchTwitter("MeruCabs", n=2000, lang="en")
    Ola_tweets = searchTwitter("OlaCabs", n=2000, lang="en")
    TaxiForSure_tweets = searchTwitter("TaxiForSure", n=2000, lang="en")
    Uber_tweets = searchTwitter("Uber_Delhi", n=2000, lang="en")

Now, as mentioned in Twitter's REST API documentation, "Due to capacity constraints, the index currently only covers about a week's worth of tweets". So we do not always get the desired number of tweets (for example, here it's 2000). Instead, these are the sizes of the tweet lists we get:

    > length(Meru_tweets)
    [1] 393
    > length(Ola_tweets)
    [1] 984
    > length(TaxiForSure_tweets)
    [1] 720
    > length(Uber_tweets)
    [1] 2000

As you can see from the preceding output, the number of tweets returned is not always equal to the number of tweets we asked for in our query scripts. There are many takeaways from this information. Since these tweets are only from the last week of tweets on Twitter, they suggest there is more discussion about these taxi services in the following order:

Uber India
Ola Cabs
TaxiForSure
Meru Cabs

A ban was imposed on Uber India after an alleged rape incident by one Uber India driver. The decision to ban the entire organization because one of its drivers committed a crime became a matter of public outcry. Hence, the number of tweets about Uber increased on social media. Meru Cabs has been in India for almost 7 years now. Hence, it is quite a stable organization. The amount of promotion Ola Cabs and TaxiForSure are doing is way higher than that of Meru Cabs. This can be one reason for Meru Cabs having the least number (393) of tweets in the last week.
The number of tweets in the last week is comparable for Ola Cabs (984) and TaxiForSure (720). There can be several reasons for this. They both started their business in the same year and, more importantly, they follow the same business model. Meru Cabs is a radio taxi service that owns and manages a fleet of cars, whereas Ola Cabs, TaxiForSure, and Uber are marketplaces where users can compare the offerings of various operators and book easily.

Let's dive deep into the data and get more insights.

Cleaning the corpus

Before applying any intelligent algorithms to gather more insights from the tweets collected so far, let's first clean them. In order to clean up, we should understand what the list of tweets looks like:

    head(Meru_tweets)
    [[1]]
    [1] "MeruCares: @KapilTwitts 2&gt;...and other details at feedback@merucabs.com We'll check back and reach out soon."

    [[2]]
    [1] "vikasraidhan: @MeruCabs really disappointed with @GenieCabs. Cab is never assigned on time. Driver calls after 30 minutes. Why would I ride with Meru?"

    [[3]]
    [1] "shiprachowdhary: fallback of #ubershame , #MERUCABS taking customers for a ride"

    [[4]]
    [1] "shiprachowdhary: They book Genie, but JIT inform of cancellation &amp; send full fare #MERUCABS . Very disappointed.Always used these guys 4 and recommend them."

    [[5]]
    [1] "shiprachowdhary: No choice bt to take the #merucabs premium service. Driver told me that this happens a lot with #merucabs."

    [[6]]
    [1] "shiprachowdhary: booked #Merucabsyestrdy. Asked for Meru Genie. 10 mins 4 pick up time, they call to say Genie not available, so sending the full fare cab"

The first tweet here is a response to a grievance, while the second, fourth, and fifth are actually customer sentiments about the services provided by Meru Cabs.
We see:

Lots of meta information such as @people, URLs, and #hashtags
Punctuation marks, numbers, and unnecessary spaces
Some of these tweets are retweets from other users; for the given application, we would not like to consider retweets (RTs) in sentiment analysis

We clean all these data using the following code block:

    MeruTweets <- sapply(Meru_tweets, function(x) x$getText())
    OlaTweets = sapply(Ola_tweets, function(x) x$getText())
    TaxiForSureTweets = sapply(TaxiForSure_tweets, function(x) x$getText())
    UberTweets = sapply(Uber_tweets, function(x) x$getText())

    catch.error = function(x) {
      # default to a missing value in case tolower() fails
      y = NA
      # try the conversion and capture any error instead of stopping
      catch_error = tryCatch(tolower(x), error=function(e) e)
      # if no error occurred, keep the lower-cased text
      if (!inherits(catch_error, "error"))
        y = tolower(x)
      return(y)
    }

    cleanTweets <- function(tweet) {
      # Clean the tweet for sentiment analysis
      # remove html links, which are not required for sentiment analysis
      tweet = gsub("(f|ht)(tp)(s?)(://)(.*)[.|/](.*)", " ", tweet)
      # remove retweet entities from the stored tweets (text)
      tweet = gsub("(RT|via)((?:\\b\\W*@\\w+)+)", " ", tweet)
      # remove all "#Hashtag"
      tweet = gsub("#\\w+", " ", tweet)
      # remove all "@people"
      tweet = gsub("@\\w+", " ", tweet)
      # remove all the punctuation
      tweet = gsub("[[:punct:]]", " ", tweet)
      # remove numbers, we need only text for analytics
      tweet = gsub("[[:digit:]]", " ", tweet)
      # finally, remove unnecessary spaces (white spaces, tabs, and so on)
      tweet = gsub("[ \t]{2,}", " ", tweet)
      tweet = gsub("^\\s+|\\s+$", "", tweet)
      # anything else you feel should be removed (for example, slang words)
      # can be handled here with the same approach
      # convert all the words to lower case for a uniform pattern
      tweet = catch.error(tweet)
      tweet
    }

    cleanTweetsAndRemoveNAs <- function(Tweets) {
      TweetsCleaned = sapply(Tweets, cleanTweets)
      # Remove the "NA" tweets from this tweet list
      TweetsCleaned = TweetsCleaned[!is.na(TweetsCleaned)]
      names(TweetsCleaned) = NULL
      # Remove the repetitive tweets from this tweet list
      TweetsCleaned = unique(TweetsCleaned)
      TweetsCleaned
    }

    MeruTweetsCleaned = cleanTweetsAndRemoveNAs(MeruTweets)
    OlaTweetsCleaned = cleanTweetsAndRemoveNAs(OlaTweets)
    TaxiForSureTweetsCleaned <- cleanTweetsAndRemoveNAs(TaxiForSureTweets)
    UberTweetsCleaned = cleanTweetsAndRemoveNAs(UberTweets)

Here's the size of each of the cleaned tweet lists:

    > length(MeruTweetsCleaned)
    [1] 309
    > length(OlaTweetsCleaned)
    [1] 811
    > length(TaxiForSureTweetsCleaned)
    [1] 574
    > length(UberTweetsCleaned)
    [1] 1355

Estimating sentiment (A)

There are many sophisticated resources available to estimate sentiments. Many research papers and software packages are available as open source, and they implement very complex algorithms for sentiment analysis. After getting the cleaned Twitter data, we are going to use a few of the R packages available to assess the sentiments in the tweets.

It's worth mentioning here that not all tweets represent a sentiment. A few tweets can be just information or facts, while others can be customer care responses. Ideally, they should not be used to assess the customer sentiment about a particular organization.

As a first step, we'll use a naive algorithm, which gives a score based on the number of times a positive or a negative word occurs in the given sentence (in our case, in a tweet).

Please download the opinion lexicon of positive and negative English sentiment words (nearly 6,800 words). This opinion lexicon will be used as a first example in our sentiment analysis experiment. The good thing about this approach is that we are relying on a well-researched and, at the same time, customizable input.
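Before turning to the full lexicon, the naive word-counting idea described above can be sketched in a few lines of base R. The two short word lists, the sample tweets, and the helper name `naive_score` are all made up for illustration; the real opinion lexicon is far larger.

```r
# Tiny hypothetical lexicons standing in for the downloaded opinion lexicon.
positive_words <- c("love", "best", "great", "good")
negative_words <- c("hate", "worst", "awful", "disappointed")

# Score = number of positive words minus number of negative words.
naive_score <- function(text, pos, neg) {
  # lower-case the text and split it on anything that is not a letter
  words <- unlist(strsplit(tolower(text), "[^a-z]+"))
  sum(words %in% pos) - sum(words %in% neg)
}

tweets <- c("Great ride, best driver!",
            "Worst service. Really disappointed and I hate waiting",
            "Cab booked for 9 am")

scores <- sapply(tweets, naive_score, pos = positive_words, neg = negative_words,
                 USE.NAMES = FALSE)
print(scores)
```

A positive score suggests positive sentiment, a negative score the opposite, and a score of zero covers both neutral tweets and tweets whose positive and negative words cancel out, which is one known limitation of this approach.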
Here are a few examples of existing positive and negative sentiment words:

  • Positive: Love, best, cool, great, good, and amazing
  • Negative: Hate, worst, sucks, awful, and nightmare

    > opinion.lexicon.pos = scan('opinion-lexicon-English/positive-words.txt', what='character', comment.char=';')
    > opinion.lexicon.neg = scan('opinion-lexicon-English/negative-words.txt', what='character', comment.char=';')
    > head(opinion.lexicon.neg)
    [1] "2-faced"    "2-faces"    "abnormal"   "abolish"    "abominable" "abominably"
    > head(opinion.lexicon.pos)
    [1] "a+"         "abound"     "abounds"    "abundance"  "abundant"   "accessable"

We'll add a few industry-specific and/or especially emphatic terms based on our requirements:

    pos.words = c(opinion.lexicon.pos, 'upgrade')
    neg.words = c(opinion.lexicon.neg, 'wait', 'waiting', 'wtf', 'cancellation')

Now, we create a function, getSentimentScore(), which computes the raw sentiment based on the simple matching algorithm:

    getSentimentScore = function(sentences, words.positive, words.negative, .progress='none')
    {
      require(plyr)
      require(stringr)
      scores = laply(sentences, function(sentence, words.positive, words.negative) {
        # First remove digits, punctuation characters, and control characters
        sentence = gsub('[[:cntrl:]]', '', gsub('[[:punct:]]', '', gsub('\\d+', '', sentence)))
        # Then convert the whole sentence to lower case
        sentence = tolower(sentence)
        # Now split each sentence on whitespace
        words = unlist(str_split(sentence, '\\s+'))
        # Get the boolean match of each word against the positive and negative opinion lexicons
        pos.matches = !is.na(match(words, words.positive))
        neg.matches = !is.na(match(words, words.negative))
        # The score is the total positive matches minus the total negative matches
        score = sum(pos.matches) - sum(neg.matches)
        return(score)
      }, words.positive, words.negative, .progress=.progress)
      # Return a data frame with each sentence and its score
      return(data.frame(text=sentences, score=scores))
    }

Now, we apply the preceding function to the corpus of tweets collected and cleaned so far, passing in the pos.words and neg.words lists built earlier:

    MeruResult = getSentimentScore(MeruTweetsCleaned, pos.words, neg.words)
    OlaResult = getSentimentScore(OlaTweetsCleaned, pos.words, neg.words)
    TaxiForSureResult = getSentimentScore(TaxiForSureTweetsCleaned, pos.words, neg.words)
    UberResult = getSentimentScore(UberTweetsCleaned, pos.words, neg.words)

Here are some sample results:

Tweet for Meru Cabs | Score
gt and other details at feedback com we ll check back and reach out soon | 0
really disappointed with cab is never assigned on time driver calls after minutes why would i ride with meru | -1
so after years of bashing today i m pleasantly surprised clean car courteous driver prompt pickup mins efficient route | 4
a min drive cost hrs used to cost less ur unreliable and expensive trying to lose ur customers | -3

Tweet for Ola Cabs | Score
the service is going from bad to worse the drivers deny to come after a confirmed booking | -3
love the olacabs app give it a swirl sign up with my referral code dxf n and earn rs download the app from | 1
crn kept me waiting for mins amp at last moment driver refused pickup so unreliable amp irresponsible | -4
this is not the first time has delighted me punctuality and free upgrade awesome that | 4

Tweet for TaxiForSure | Score
great service now i have become a regular customer of tfs thank you for the upgrade as well happy taxi ing saving | 5
really disappointed with cab is never assigned on time driver calls after minutes why would i ride with meru | -1
horrible taxi service had to wait for one hour with a new born in the chilly weather of new delhi waiting for them | -4
what do i get now if you resolve the issue after i lost a crucial business because of the taxi delay | -3

Tweet for Uber India | Score
that s good uber s fares will prob be competitive til they gain local monopoly then will go sky high as in new york amp delhi saving | 3
from a shabby backend app stack to daily pr fuck ups its increasingly obvious that is run by child minded blow hards | -3
you say that uber is illegally running were you stupid to not ban earlier and only ban it now after the rape | -3
perhaps uber biz model does need some looking into it s not just in delhi that this happens but in boston too | 0

From the preceding observations, it's clear that this basic sentiment analysis method works fine in normal circumstances, but in the case of Uber India the results deviated too much from a subjective score. It's safe to say that basic word matching gives a good indicator of overall customer sentiment, except when the data itself is not reliable. In our case, the tweets about Uber India are not really related to the services that Uber provides, but rather to one incident of crime by one of its drivers, and the whole score went haywire.

Let's now compute a point statistic of the scores we have computed so far. Since the number of tweets is not equal for each of the four organizations, we compute a mean and standard deviation for each.

Organization | Mean Sentiment Score | Standard Deviation
Meru Cabs | -0.2218543 | 1.301846
Ola Cabs | 0.197724 | 1.170334
TaxiForSure | -0.09841828 | 1.154056
Uber India | -0.6132666 | 1.071094

Estimating sentiment (B)

Let's now move one step further. Instead of using simple matching against the opinion lexicon, we'll use a Naive Bayes classifier to decide on the emotion present in any tweet. We require the packages Rstem and sentiment to assist in this. It's important to mention here that both these packages are no longer available on CRAN, and hence we have to provide either the repository location as a parameter to the install.packages() function or the URL of the archived source package.
Here's the R script to install the required packages:

    install.packages("Rstem", repos = "http://www.omegahat.org/R", type="source")
    require(devtools)
    install_url("http://cran.r-project.org/src/contrib/Archive/sentiment/sentiment_0.2.tar.gz")
    require(sentiment)
    ls("package:sentiment")

Now that we have the sentiment and Rstem packages installed in our R workspace, we can build the Bayes classifier for sentiment analysis:

    library(sentiment)
    # classify_emotion returns an object of class data frame
    # with seven columns (anger, disgust, fear, joy, sadness, surprise,
    # best_fit) and one row for each document:
    MeruTweetsClassEmo = classify_emotion(MeruTweetsCleaned, algorithm="bayes", prior=1.0)
    OlaTweetsClassEmo = classify_emotion(OlaTweetsCleaned, algorithm="bayes", prior=1.0)
    TaxiForSureTweetsClassEmo = classify_emotion(TaxiForSureTweetsCleaned, algorithm="bayes", prior=1.0)
    UberTweetsClassEmo = classify_emotion(UberTweetsCleaned, algorithm="bayes", prior=1.0)

The following figure shows a few results from the Bayesian analysis using the sentiment package for Meru Cabs tweets. Similarly, we generated results for the other cab services from our problem setup.

The sentiment package was built to use a trained dataset of emotion words (nearly 1,500 words). The function classify_emotion() generates results belonging to one of the following six emotions: anger, disgust, fear, joy, sadness, and surprise. Hence, when the system is not able to classify the overall emotion as any of the six, NA is returned.

Let's substitute these NA values with the word unknown to make further analysis easier:

    # we will fetch the emotion category best_fit for our analysis purposes
    MeruEmotion = MeruTweetsClassEmo[,7]
    OlaEmotion = OlaTweetsClassEmo[,7]
    TaxiForSureEmotion = TaxiForSureTweetsClassEmo[,7]
    UberEmotion = UberTweetsClassEmo[,7]

    MeruEmotion[is.na(MeruEmotion)] = "unknown"
    OlaEmotion[is.na(OlaEmotion)] = "unknown"
    TaxiForSureEmotion[is.na(TaxiForSureEmotion)] = "unknown"
    UberEmotion[is.na(UberEmotion)] = "unknown"

The best-fit emotions present in these tweets are as follows:

Further, we'll use another function, classify_polarity(), provided by the sentiment package to classify the tweets into two classes, pos (positive sentiment) or neg (negative sentiment). The idea is to compute the log likelihood of a tweet under the assumption that it belongs to each of the two classes. Once these likelihoods are calculated, the ratio of the pos-likelihood to the neg-likelihood is computed, and based on this ratio the tweet is assigned to a particular class. It's important to note that if this ratio turns out to be 1, then the overall sentiment of the tweet is assumed to be "neutral".

The code is as follows:

    MeruTweetsClassPol = classify_polarity(MeruTweetsCleaned, algorithm="bayes")
    OlaTweetsClassPol = classify_polarity(OlaTweetsCleaned, algorithm="bayes")
    TaxiForSureTweetsClassPol = classify_polarity(TaxiForSureTweetsCleaned, algorithm="bayes")
    UberTweetsClassPol = classify_polarity(UberTweetsCleaned, algorithm="bayes")

We get the following output:

The preceding figure shows a few results obtained using the classify_polarity() function of the sentiment package for Meru Cabs tweets.
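The likelihood-ratio idea behind the polarity classification can be sketched in a few lines. This is a Python illustration under stated assumptions, not the sentiment package's implementation: the per-class word counts are hypothetical, add-one smoothing is my choice, and the rule "ratio above 1 is positive, below 1 is negative, equal to 1 is neutral" is a simplification of whatever thresholds the package applies.

```python
import math
from collections import Counter

# Hypothetical per-class word counts standing in for a trained lexicon.
POS_COUNTS = Counter({"great": 3, "happy": 2, "clean": 1})
NEG_COUNTS = Counter({"late": 3, "rude": 2, "worst": 1})
VOCABULARY = set(POS_COUNTS) | set(NEG_COUNTS)


def log_likelihood(words, class_counts, vocabulary=VOCABULARY):
    """Sum of log P(word | class) with add-one (Laplace) smoothing."""
    denominator = sum(class_counts.values()) + len(vocabulary)
    # Words outside the vocabulary carry no evidence and are skipped.
    return sum(
        math.log((class_counts[word] + 1) / denominator)
        for word in words
        if word in vocabulary
    )


def classify_polarity_sketch(text):
    """Label text by the ratio of its likelihood under the two classes."""
    words = text.lower().split()
    ll_pos = log_likelihood(words, POS_COUNTS)
    ll_neg = log_likelihood(words, NEG_COUNTS)
    ratio = math.exp(ll_pos - ll_neg)  # P(text | pos) / P(text | neg)
    if math.isclose(ratio, 1.0):
        return "neutral"
    return "positive" if ratio > 1.0 else "negative"


for tweet in ["great clean car", "driver was rude and late", "booked a cab"]:
    print(tweet, "->", classify_polarity_sketch(tweet))
```

Working in log space avoids multiplying many small probabilities together; the ratio is recovered at the end by exponentiating the difference of the two sums. A tweet with no known words has equal likelihood under both classes, which is how the "neutral" case arises in this sketch.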
We'll now generate consolidated results from the two functions in a data frame for each cab service for plotting purposes:

    # we will fetch the polarity category best_fit for our analysis purposes
    MeruPol = MeruTweetsClassPol[,4]
    OlaPol = OlaTweetsClassPol[,4]
    TaxiForSurePol = TaxiForSureTweetsClassPol[,4]
    UberPol = UberTweetsClassPol[,4]

    # Let us now create a data frame with the above results
    MeruSentimentDataFrame = data.frame(text=MeruTweetsCleaned, emotion=MeruEmotion, polarity=MeruPol, stringsAsFactors=FALSE)
    OlaSentimentDataFrame = data.frame(text=OlaTweetsCleaned, emotion=OlaEmotion, polarity=OlaPol, stringsAsFactors=FALSE)
    TaxiForSureSentimentDataFrame = data.frame(text=TaxiForSureTweetsCleaned, emotion=TaxiForSureEmotion, polarity=TaxiForSurePol, stringsAsFactors=FALSE)
    UberSentimentDataFrame = data.frame(text=UberTweetsCleaned, emotion=UberEmotion, polarity=UberPol, stringsAsFactors=FALSE)

    # rearrange data inside the frame by sorting emotions from most to least frequent
    MeruSentimentDataFrame = within(MeruSentimentDataFrame, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))
    OlaSentimentDataFrame = within(OlaSentimentDataFrame, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))
    TaxiForSureSentimentDataFrame = within(TaxiForSureSentimentDataFrame, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))
    UberSentimentDataFrame = within(UberSentimentDataFrame, emotion <- factor(emotion, levels=names(sort(table(emotion), decreasing=TRUE))))

    plotSentiments1 <- function(sentiment_dataframe, title) {
      library(ggplot2)
      ggplot(sentiment_dataframe, aes(x=emotion)) +
        geom_bar(aes(y=..count.., fill=emotion)) +
        scale_fill_brewer(palette="Dark2") +
        ggtitle(title) +
        theme(legend.position='right') +
        ylab('Number of Tweets') +
        xlab('Emotion Categories')
    }

    plotSentiments1(MeruSentimentDataFrame, 'Sentiment Analysis of Tweets on Twitter about MeruCabs')
    plotSentiments1(OlaSentimentDataFrame, 'Sentiment Analysis of Tweets on Twitter about OlaCabs')
    plotSentiments1(TaxiForSureSentimentDataFrame, 'Sentiment Analysis of Tweets on Twitter about TaxiForSure')
    plotSentiments1(UberSentimentDataFrame, 'Sentiment Analysis of Tweets on Twitter about UberIndia')

The output is as follows:

In the preceding figure, we showed sample results generated on Meru Cabs tweets using both functions. Let's now plot them one by one. First, we create a single function to be used for each business's tweets. We call it plotSentiments1() and then we plot it for each business:

The following dashboard shows the analysis for Ola Cabs:

The following dashboard shows the analysis for TaxiForSure:

The following dashboard shows the analysis for Uber India:

These sentiments reflect more or less the same observations as we made with the basic word-matching algorithm. Tweets expressing joy constitute the largest share of tweets for all these organizations, indicating that these organizations are trying their best to provide good business in the country. The sadness tweets are less numerous than the joy tweets. However, when compared with each other, they indicate the overall market share versus the level of customer satisfaction of each service provider in question. Similarly, these graphs can be used to assess the level of dissatisfaction in terms of anger and disgust in the tweets.
Let's now consider only the positive and negative sentiments present in the tweets:

    # Similarly, we will plot the distribution of polarity in the tweets
    plotSentiments2 <- function(sentiment_dataframe, title) {
      library(ggplot2)
      ggplot(sentiment_dataframe, aes(x=polarity)) +
        geom_bar(aes(y=..count.., fill=polarity)) +
        scale_fill_brewer(palette="RdGy") +
        ggtitle(title) +
        theme(legend.position='right') +
        ylab('Number of Tweets') +
        xlab('Polarity Categories')
    }

    plotSentiments2(MeruSentimentDataFrame, 'Polarity Analysis of Tweets on Twitter about MeruCabs')
    plotSentiments2(OlaSentimentDataFrame, 'Polarity Analysis of Tweets on Twitter about OlaCabs')
    plotSentiments2(TaxiForSureSentimentDataFrame, 'Polarity Analysis of Tweets on Twitter about TaxiForSure')
    plotSentiments2(UberSentimentDataFrame, 'Polarity Analysis of Tweets on Twitter about UberIndia')

The output is as follows:

The following dashboard shows the polarity analysis for Ola Cabs:

The following dashboard shows the analysis for TaxiForSure:

The following dashboard shows the analysis for Uber India:

It's a basic human trait to tell others about what went wrong rather than about what went right. That is to say, we tend to tweet or report when something bad has happened rather than when the experience was good. Hence, negative tweets are expected to outnumber positive tweets in general. Still, over a period of time (a week in our case), the ratio of the two readily reflects the overall market share versus the level of customer satisfaction of each service provider in question.

Next, we try to get a sense of the overall content of the tweets using word clouds.
    # removeWords(), stopwords(), Corpus(), VectorSource(), and TermDocumentMatrix() come from the tm package
    require(tm)

    removeCustomeWords <- function(TweetsCleaned) {
      for (i in 1:length(TweetsCleaned)) {
        TweetsCleaned[i] <- tryCatch({
          TweetsCleaned[i] = removeWords(TweetsCleaned[i],
            c(stopwords("english"), "care", "guys", "can", "dis", "didn", "guy", "booked", "plz"))
          TweetsCleaned[i]
        }, error=function(cond) {
          # on failure, keep the tweet unchanged
          TweetsCleaned[i]
        }, warning=function(cond) {
          TweetsCleaned[i]
        })
      }
      return(TweetsCleaned)
    }

    getWordCloud <- function(sentiment_dataframe, TweetsCleaned, Emotion) {
      emos = levels(factor(sentiment_dataframe$emotion))
      n_emos = length(emos)
      emo.docs = rep("", n_emos)
      TweetsCleaned = removeCustomeWords(TweetsCleaned)
      # build one document per emotion by concatenating its tweets
      for (i in 1:n_emos) {
        emo.docs[i] = paste(TweetsCleaned[Emotion == emos[i]], collapse=" ")
      }
      corpus = Corpus(VectorSource(emo.docs))
      tdm = TermDocumentMatrix(corpus)
      tdm = as.matrix(tdm)
      colnames(tdm) = emos
      require(wordcloud)
      suppressWarnings(comparison.cloud(tdm, colors = brewer.pal(n_emos, "Dark2"),
        scale = c(3,.5), random.order = FALSE, title.size = 1.5))
    }

    getWordCloud(MeruSentimentDataFrame, MeruTweetsCleaned, MeruEmotion)
    getWordCloud(OlaSentimentDataFrame, OlaTweetsCleaned, OlaEmotion)
    getWordCloud(TaxiForSureSentimentDataFrame, TaxiForSureTweetsCleaned, TaxiForSureEmotion)
    getWordCloud(UberSentimentDataFrame, UberTweetsCleaned, UberEmotion)

The preceding figure shows the word cloud from tweets about Meru Cabs.

The preceding figure shows the word cloud from tweets about Ola Cabs.

The preceding figure shows the word cloud from tweets about TaxiForSure.

The preceding figure shows the word cloud from tweets about Uber India.

Summary

In this article, we gained knowledge of the various Twitter APIs, discussed how to create a connection with Twitter, and saw how to retrieve tweets with various attributes. We saw the power of Twitter in helping us determine customer attitudes toward today's various businesses. The activity can be repeated on a weekly basis, which makes it easy to track monthly, quarterly, or yearly changes in customer sentiment. This not only helps customers identify trending businesses, but also gives the business itself a well-defined metric of its own performance, and it can use such scores and graphs to improve. We also discussed various methods of sentiment analysis, ranging from basic word matching to the more advanced Bayesian algorithms.

Pravin Dhandre
22 Feb 2018
7 min read

How to perform Numeric Metric Aggregations with Elasticsearch

Note: This article is an excerpt from the book Learning Elastic Stack 6.0 written by Pranav Shukla and Sharath Kumar M N. This book provides detailed coverage of the fundamentals of each component of the Elastic Stack, making it easy to search, analyze, and visualize data across different sources in real time.

Today, we are going to demonstrate how to run numeric and statistical queries such as summation, average, count, and various similar metric aggregations on the Elastic Stack to build a better analytics engine on your dataset.

Metric aggregations

Metric aggregations work with numeric data, computing one or more aggregate metrics within the given context. The context could be a query, a filter, or no query at all, which includes the whole index/type. Metric aggregations can also be nested inside other bucket aggregations. In this case, these metrics will be computed for each bucket in the bucket aggregations.

We will start with simple metric aggregations without nesting them inside bucket aggregations. When we learn about bucket aggregations later in the chapter, we will also learn how to use metric aggregations inside bucket aggregations.

We will learn about the following metric aggregations:

  • Sum, average, min, and max aggregations
  • Stats and extended stats aggregations
  • Cardinality aggregation

Let us learn about them one by one.

Sum, average, min, and max aggregations

Finding the sum of a field, the minimum value for a field, the maximum value for a field, or an average are very common operations. For people who are familiar with SQL, the query to find the sum would look like the following:

    SELECT sum(downloadTotal) FROM usageReport;

The preceding query will calculate the sum of the downloadTotal field across all records in the table. This requires going through all records of the table, or all records in the given context, and adding the values of the given field.
In Elasticsearch, a similar query can be written using the sum aggregation. Let us understand the sum aggregation first.

Sum aggregation

Here is how to write a simple sum aggregation (the numbered markers are callouts for the explanation that follows and are not part of the request):

    GET bigginsight/_search
    {
      "aggregations": {                  (1)
        "download_sum": {                (2)
          "sum": {                       (3)
            "field": "downloadTotal"     (4)
          }
        }
      },
      "size": 0                          (5)
    }

  1. The aggs or aggregations element at the top level should wrap any aggregation.
  2. Give a name to the aggregation; here we are doing the sum aggregation on the downloadTotal field and hence the name we chose is download_sum. You can name it anything. This name will be useful while looking up this particular aggregation's result in the response.
  3. We are doing a sum aggregation, hence the sum element.
  4. We want to do the sum aggregation on the downloadTotal field.
  5. Specify size = 0 to prevent raw search results from being returned. We just want aggregation results and not the search results in this case.

Since we haven't specified any top-level query elements, the request matches all documents. We do not want any raw documents (or search hits) in the result.

The response should look like the following:

    {
      "took": 92,
      ...
      "hits": {
        "total": 242836,                 (1)
        "max_score": 0,
        "hits": []
      },
      "aggregations": {                  (2)
        "download_sum": {                (3)
          "value": 2197438700            (4)
        }
      }
    }

Let us understand the key aspects of the response. The key parts are numbered 1, 2, 3, and so on, and are explained in the following points:

  1. The hits.total element shows the number of documents that were considered or were in the context of the query. If there was no additional query or filter specified, it will include all documents in the type or index.
  2. Just like the request, this response is wrapped inside aggregations to indicate it as such.
  3. The aggregation we requested was named download_sum, hence we get the result of the sum aggregation inside an element with the same name.
  4. The actual value after applying the sum aggregation.

The average, min, and max aggregations are very similar. Let's look at them briefly.
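To make the role of the aggregation name concrete, here is a small Python sketch that builds the same request body as a dictionary and reads the result out of a response shaped like the one above. No cluster is contacted: the response literal is copied from this article with the elided fields left out, and the helper name `read_metric` is a hypothetical one introduced for illustration.

```python
import json

# Same request body as the sum aggregation above, expressed as a Python dict.
request_body = {
    "aggregations": {
        "download_sum": {"sum": {"field": "downloadTotal"}}
    },
    "size": 0,
}

# Response copied from the article (elided fields omitted).
response_text = """
{
  "took": 92,
  "hits": {"total": 242836, "max_score": 0, "hits": []},
  "aggregations": {"download_sum": {"value": 2197438700}}
}
"""


def read_metric(response, name):
    """Look up a single-value metric aggregation by the name used in the request."""
    return response["aggregations"][name]["value"]


response = json.loads(response_text)
agg_name = next(iter(request_body["aggregations"]))  # the name chosen in the request
print(agg_name, "=", read_metric(response, agg_name))
```

The same lookup works for the avg, min, max, and cardinality aggregations, since each of them also returns a single `value` under the name chosen in the request.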
Average aggregation

The average aggregation finds an average across all documents in the querying context (the numbered markers are callouts, not part of the request):

    GET bigginsight/_search
    {
      "aggregations": {
        "download_average": {            (1)
          "avg": {                       (2)
            "field": "downloadTotal"
          }
        }
      },
      "size": 0
    }

The only notable differences from the sum aggregation are as follows:

  1. We chose a different name, download_average, to make it apparent that the aggregation is trying to compute the average.
  2. The type of aggregation that we are doing is avg instead of the sum aggregation that we were doing earlier.

The response structure is identical, but the value field will now represent the average of the requested field.

The min and max aggregations work in exactly the same way.

Min aggregation

Here is how we will find the minimum value of the downloadTotal field in the entire index/type:

    GET bigginsight/_search
    {
      "aggregations": {
        "download_min": {
          "min": {
            "field": "downloadTotal"
          }
        }
      },
      "size": 0
    }

Finally, let's look at the max aggregation.

Max aggregation

Here is how we will find the maximum value of the downloadTotal field in the entire index/type:

    GET bigginsight/_search
    {
      "aggregations": {
        "download_max": {
          "max": {
            "field": "downloadTotal"
          }
        }
      },
      "size": 0
    }

These aggregations were really simple. Now let's look at the slightly more advanced, yet still simple, stats and extended stats aggregations.

Stats and extended stats aggregations

These aggregations compute some common statistics in a single request without having to issue multiple requests. This saves resources on the Elasticsearch side as well, because the statistics are computed in a single pass rather than being requested multiple times. The client code also becomes simpler if you are interested in more than one of these statistics. Let's look at the stats aggregation first.
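What "a single pass" means can be shown with a short sketch. The Python function below is an illustration under stated assumptions, not Elasticsearch's implementation: the name `extended_stats`, the sample values, and the default of two standard deviations for the bounds are choices made here, although the output keys mirror the field names that appear in the responses in this article.

```python
import math


def extended_stats(values, sigma=2.0):
    """Compute stats-style metrics in one pass over `values`."""
    count = 0
    total = 0.0
    sum_of_squares = 0.0
    minimum = math.inf
    maximum = -math.inf
    for value in values:  # single pass: every running metric is updated per value
        count += 1
        total += value
        sum_of_squares += value * value
        minimum = min(minimum, value)
        maximum = max(maximum, value)
    if count == 0:
        raise ValueError("no values to aggregate")
    avg = total / count
    variance = sum_of_squares / count - avg * avg  # population variance
    std_deviation = math.sqrt(max(variance, 0.0))
    return {
        "count": count,
        "min": minimum,
        "max": maximum,
        "avg": avg,
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "variance": variance,
        "std_deviation": std_deviation,
        "std_deviation_bounds": {
            "upper": avg + sigma * std_deviation,
            "lower": avg - sigma * std_deviation,
        },
    }


print(extended_stats([2, 4, 4, 4, 5, 5, 7, 9]))
```

The same identities can be checked against the extended stats response shown later in this article: 133545882701698 / 242835 - 9049.102065188297² ≈ 468058704.978 (the reported variance), its square root is ≈ 21634.664 (the reported standard deviation), and 9049.102 ± 2 × 21634.664 gives the reported upper and lower bounds.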
Stats aggregation

The stats aggregation computes the sum, average, min, max, and count of documents in a single pass:

    GET bigginsight/_search
    {
      "aggregations": {
        "download_stats": {
          "stats": {
            "field": "downloadTotal"
          }
        }
      },
      "size": 0
    }

The structure of the stats request is the same as the other metric aggregations we have seen so far, so nothing special is going on here.

The response should look like the following:

    {
      "took": 4,
      ...,
      "hits": {
        "total": 242836,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "download_stats": {
          "count": 242835,
          "min": 0,
          "max": 241213,
          "avg": 9049.102065188297,
          "sum": 2197438700
        }
      }
    }

As you can see, the response with the download_stats element contains count, min, max, average, and sum; everything is included in the same response. This is very handy as it reduces the overhead of multiple requests and also simplifies the client code.

Let us look at the extended stats aggregation.

Extended stats aggregation

The extended stats aggregation returns a few more statistics in addition to the ones returned by the stats aggregation:

    GET bigginsight/_search
    {
      "aggregations": {
        "download_estats": {
          "extended_stats": {
            "field": "downloadTotal"
          }
        }
      },
      "size": 0
    }

The response looks like the following:

    {
      "took": 15,
      "timed_out": false,
      ...,
      "hits": {
        "total": 242836,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "download_estats": {
          "count": 242835,
          "min": 0,
          "max": 241213,
          "avg": 9049.102065188297,
          "sum": 2197438700,
          "sum_of_squares": 133545882701698,
          "variance": 468058704.9782911,
          "std_deviation": 21634.664429528162,
          "std_deviation_bounds": {
            "upper": 52318.43092424462,
            "lower": -34220.22679386803
          }
        }
      }
    }

It also returns the sum of squares, variance, standard deviation, and standard deviation bounds.

Cardinality aggregation

Finding the count of unique elements can be done with the cardinality aggregation.
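Conceptually, this is the size of the set of distinct values that a field takes. A minimal Python sketch of that idea follows; the records and the helper name `exact_cardinality` are hypothetical and are not taken from the article's dataset.

```python
def exact_cardinality(records, field):
    """Count the distinct values of `field`, skipping records that lack it."""
    return len({record[field] for record in records if field in record})


usage_report = [  # hypothetical rows for illustration only
    {"username": "alice", "downloadTotal": 120},
    {"username": "bob", "downloadTotal": 0},
    {"username": "alice", "downloadTotal": 45},
    {"downloadTotal": 7},  # no username, so it does not contribute
]

print(exact_cardinality(usage_report, "username"))  # 2
```

One difference worth keeping in mind: this set-based count is exact, whereas Elasticsearch's cardinality aggregation is designed to return an approximate count on large datasets in exchange for bounded memory use.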
The cardinality aggregation is similar to finding the result of a query such as the following:

    select count(*) from (select distinct username from usageReport) u;

Finding the cardinality, or the number of unique values for a specific field, is a very common requirement. If you have clickstream data from the different visitors on your website, you may want to find out how many unique visitors you got in a given day, week, or month.

Let us understand how we find out the count of unique users for which we have network traffic data:

    GET bigginsight/_search
    {
      "aggregations": {
        "unique_visitors": {
          "cardinality": {
            "field": "username"
          }
        }
      },
      "size": 0
    }

The cardinality aggregation response is just like the other metric aggregations:

    {
      "took": 110,
      ...,
      "hits": {
        "total": 242836,
        "max_score": 0,
        "hits": []
      },
      "aggregations": {
        "unique_visitors": {
          "value": 79
        }
      }
    }

To summarize, we learned how to perform several metric aggregations on numeric data and how Elasticsearch can be used to build powerful analytics applications.

If you found this tutorial useful, do check out the book Learning Elastic Stack 6.0 to examine the fundamentals of the Elastic Stack in detail and start developing solutions for problems like logging, site search, app search, metrics, and more.