
How-To Tutorials

Packt
20 Nov 2013
7 min read

Limits of Game Data Analysis

Which game analytics should be used

This section focuses on the role that data should take in your production process. As a studio, your first step is to identify your needs and to choose the goals you will assign to game analytics.

Game analytics as a tool

Firstly, it is important to understand that game analytics is a tool, which means it can serve several purposes. You can use it for marketing, science, sociological studies, and so on. It follows that you will need different tools and different approaches depending on your goal. As this article has tried to highlight, tools are chosen according to problems, whether the choice concerns technique or analysis. You must not choose a tool because it is said to be the best-performing tool ever made, or because it is fashionable. Instead, you must choose a tool because it is the most efficient one for your needs. Try to answer the following questions:

• What long-term uses do I plan for game analytics? Is it simply reporting the Key Performance Indicators (KPIs), or is it building a user-centric framework for deep analysis?
• What are the types and levels of skill of the people who will work on it? Do I have all of the skills, from data scientists to game analysts, or do I need to choose a solution that will offset a gap in a particular field?
• How much data will be collected? How do I plan to deal with possible traffic peaks?
• How do I align the timing of reporting and analysis with the production rhythm of my project? Do I split them weekly or monthly?
• What are the main goals of my process? Do I want to build a predictive model (for example, based on correlations) in order to define the next acquisition campaign I will run? Do I want to increase the monetization rate of the current player base? Do I want to perform A/B testing?

And the list goes on.
Game analytics must serve your team

Secondly, it is important to ensure that game analytics serves your team as a whole. Analytics should not conflict with the long-term objectives that you have chosen. It must support them and, above all, help improve them, but the general objective should remain the same. Given the current state of the field, withdrawing the "human touch" from the design process entirely and listening only to data would be a mistake. That's why the game analytics process should be viewed through the prism of your own team and, therefore, should be presented as a new tool that helps the team make good decisions for the game. The best example of spreading the "game analytics way of thinking" inside your team is certainly A/B testing. If you experience debates about particular features in the game, instead of taking sides you can propose A/B tests for some of those features. Beyond this, there are no particular limits to the use of the tool. A game designer can test different balancing of the game's virtual economy, and an artist can experiment with different graphic styles.

When starting, focus your attention on simple practices

If you are new to the field, the following list may help you to start defining your first objectives. It contains most of the typical uses for online games, especially free-to-play games:

• Producing KPIs on a weekly or monthly basis, according to your needs. These KPIs will help you to orient the upcoming development of your game and to anticipate the return on investment of your acquisition campaigns.
• Identifying whether some steps of your tutorial phase are poorly designed; for example, if you have a sudden player loss at a particular step of your tutorial.
• Along the same lines, knowing the loss of players at each level is also very useful for improving the general balancing of your game, especially the progress curve and the difficulty.
This topic is more important if part of your business model is based on purchasable goods that can increase the player's rate of progression.
• You can evaluate which areas and which purchasable goods of your game are generating the best income.
• You can perform A/B testing on particular key features of your game in order to see which ones are the most efficient.

What game analytics should not be used for

On the other hand, there are a few limits that you need to know before using methods and processes from game analytics.

Keep away from numbers

You must always bear in mind that numbers represent a given situation at a given instant "T". It follows that predictive models must always be revised and improved; they should never be considered the perfect truth. For the process to be efficient, it is important to keep research on the data within the structure defined by the initial goals. Otherwise, you might split your efforts and fail to identify any actionable insights. In other words, numbers must stay in their place. They are a tool in the hands of a human being, and they should not become an obsession. Ask yourself whether they make sense and whether you are asking the right question.

Practices that need to be avoided

As mentioned in the previous section, if you are new to this field, be aware of the following situations:

• Data cannot dictate the full content of your next update. If that is the case, you may first need to re-evaluate the general intention behind your product and talk with the game designer.
• When starting, try to avoid complex questions that involve factors external to the game, even if they seem crucial to you. For example, trying to understand why people stopped playing your game over a long period of time is usually impossible. Old players might stop playing because another game came out or because they just got bored. Data cannot work miracles at this point of the engagement.
• Data must not take up too much room in the creative process. Human intentions and ideas come first, and only then does the data come in to verify and improve the potential success of those intentions.
• Data must not slow down the performance of the game. One common method to avoid this is to send the data when the player logs in or logs out, and not at each click or each action.

Summary

The most important thing to remember about game analytics in general is the importance of defining your objectives. You should choose one tool over another (and this article has tried to list as many as possible, from data mining to pure analysis) because it fits your needs as closely as possible. This is true at every stage of the reflection process that surrounds game analytics, from the choice of the storage solution to the type of analysis you want to perform. The rise of a fully connected video game industry offers developers the opportunity to change the way they create games, but there is no doubt that this tool has not yet reached full maturity. Therefore, even if the benefits of game analytics are great, be prepared to make mistakes as well, and keep your own process open to criticism from your team.
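
As a closing illustration of one of the simple practices listed earlier (spotting a sudden player loss at a particular tutorial step), here is a minimal Python sketch. The function names and the player counts are made up for the example; a real pipeline would derive the counts from logged events.

```python
def step_dropoff(players_reaching_step):
    """Return (step_index, loss_rate) pairs for consecutive tutorial steps.

    players_reaching_step[i] is the number of players who reached step i
    (hypothetical data; real counts would come from logged events).
    """
    losses = []
    for i in range(1, len(players_reaching_step)):
        before = players_reaching_step[i - 1]
        after = players_reaching_step[i]
        # Share of players who reached the previous step but not this one.
        rate = (before - after) / before if before else 0.0
        losses.append((i, rate))
    return losses


def worst_step(players_reaching_step):
    """Return the (step_index, loss_rate) pair with the highest loss."""
    return max(step_dropoff(players_reaching_step), key=lambda pair: pair[1])


# Made-up counts for a five-step tutorial (steps 0 to 4).
counts = [1000, 950, 900, 450, 430]
step, rate = worst_step(counts)
print(f"Largest loss entering step {step}: {rate:.0%}")
```

With these invented counts, half of the players who reached step 2 never reach step 3, which is the kind of signal that suggests reviewing the design of that step.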

Packt
20 Nov 2013
6 min read

Issues and Wikis in GitLab

Issues

The built-in features for issue tracking and documentation will be very beneficial to you, especially if you're working on extensive software projects, ones with many components, or ones that need to be supported in multiple versions at once (for example, stable, testing, and unstable). In this article, we will take a closer look at the formats that are supported for issues and wiki pages (in particular, Markdown), the elements that can be referenced from within these, and how issues can be organized. Furthermore, we will go through the process of assigning issues to team members and keeping documentation in wiki pages, which can also be edited locally. Lastly, we will see how the RSS feeds generated by GitLab can keep your team in a closer loop around the projects they work on.

The metadata covered in this article may seem trivial, but many famous software projects have gained traction due to their extensive and well-written documentation, which was initially written by core developers. GitLab enables your users to do the same with their projects, even if only internally, and opens the way to much more efficient collaboration.

GitLab-flavored Markdown

GitLab comes with a Markdown formatting parser that is fairly similar to GitHub's, which makes it very easy to adapt and migrate. Many standalone editors also support this format, such as Mou (http://mouapp.com/) for Mac or MarkdownPad (http://markdownpad.com/) for Windows. On Linux, editors with a split view, such as ReText (http://sourceforge.net/projects/retext/) or the more Zen-writing UberWriter (http://uberwriter.wolfvollprecht.de/), are available. For the popular Vim editor, multiple Markdown plugins are also up for grabs in a number of GitHub repositories; one of them is Vim Markdown (https://github.com/tpope/vim-markdown) by Tim Pope. Lastly, I'd like to mention that you don't need a dedicated editor for Markdown, because Markdown documents are plain text files.
The mentioned editors simply enhance the view through syntax highlighting and preview modes.

About Markdown

Markdown was originally written by John Gruber, and has since evolved into various flavors. The intention of this very lightweight markup language is to have a source that is easy to edit and can be transformed into meaningful HTML to be displayed on the Web. Different variations of Markdown have become the default language in a majority of very successful software projects; readme files, documentation, and even blogging engines adopt it. In Markdown, text styles can be applied, links placed, and images inserted. If Markdown does not support by default what you are trying to do, you can insert plain HTML, which will not be altered by the Markdown parser.

Referring to elements inside GitLab

When working with source code, it can be important to refer to a line of code, a file, or other things when discussing something. Because many development teams are nowadays spread throughout the world, GitLab adapts to that and makes it easy to reference many things directly from comments, wiki pages, or issues. Some things, like files or lines, can be referenced via links, because GitLab has unique links to the branches of a repository; others are more directly accessible.

The following items (basically, prefixed strings or IDs) can be referenced through shortcodes:

• commit messages
• comments
• wall posts
• issues
• merge requests
• milestones
• wiki pages

To reference items, use the following shortcodes inside any field that supports Markdown or RDoc on the web interface:

• @foo for team members
• #123 for issues
• !123 for merge requests
• $123 for snippets
• 1234567 for commits

Issues, knowing what needs to be done

An issue is a text message of variable length, describing a bug in the code, an improvement to be made, or something else that should be done or discussed.
By commenting on the issue, developers or project leaders can respond to this request or statement. The meta information attached to an issue can be very valuable to the team, because developers can be assigned to an issue, and it can be tagged or labeled with keywords that describe the content or area to which it belongs. Furthermore, you can also set the milestone in which this fix or feature is to be included. In the following screenshot, you can see the interface for issues:

Creating issues

By navigating to the Issues tab of a repository in the web interface, you can easily create new issues. Their titles should be brief and precise, because a more elaborate description area is available. The description area supports GitLab-flavored Markdown, as mentioned previously. Upon creation, you can choose a milestone and a user to assign the issue to, but you can also leave these fields unset, possibly to let your developers choose for themselves what they want to work on and when. Before they begin their work, they can assign the issues to themselves. In the following screenshot, you can see what the issue creation form looks like:

Working with labels or tags

Labels are tags used to organize issues by topic and severity. Creating labels is as easy as inserting them, separated by commas, into the respective field while creating an issue. Currently, in Version 5.2, certain keywords trigger a certain background color on the label. Labels like critical or bug turn red, feature turns green, and other labels are blue by default. The following screenshot shows what a list of labeled features looks like:

After a label is created, it is listed under the Labels tab within the Issues page, with a link that lists all the issues that carry the same label. Filtering by label, assigned user, or milestone is also possible from the list of issues within each project's overview.
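
To make the label behavior described above concrete, the following Python sketch splits a comma-separated label field and maps each label to the background color the text reports for Version 5.2. It is only an illustration: the names are hypothetical, this is not GitLab's code, and exact, case-insensitive keyword matching is an assumption.

```python
# Keyword-to-color rules as described for GitLab 5.2 in the text above.
# The dictionary and function names are hypothetical, not GitLab's code.
KEYWORD_COLORS = {"critical": "red", "bug": "red", "feature": "green"}
DEFAULT_COLOR = "blue"


def parse_labels(field_value):
    """Split a comma-separated label field into clean label names."""
    return [part.strip() for part in field_value.split(",") if part.strip()]


def label_color(label):
    """Return the background color the text describes for a label.

    Assumption: keywords match the whole label, ignoring case.
    """
    return KEYWORD_COLORS.get(label.lower(), DEFAULT_COLOR)


for label in parse_labels("bug, feature, documentation"):
    print(label, "->", label_color(label))
```

Any label that is not one of the listed keywords, such as documentation here, falls back to the default blue.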
Summary

In this article, we have had a look at the project management side of things. You can now make use of the built-in possibilities to distribute tasks across team members through issues, keep track of what still has to be done, or enable observers to point out bugs.

Packt
20 Nov 2013
6 min read

Getting Started with Fortigate: Troubleshooting

Base system diagnostics

The status screen in the web-based manager includes a high-level overview of information such as the system time (which is important, for example, for coherent error messages and log recording), CPU and memory usage, license information, and alerts, as we can see in the following screenshot:

Although this screen is useful for a rapid assessment of the situation, our diagnostic tools usually have to dig deeper. The first base command we will use in the CLI is get system. This command offers more than eighty information options, dedicated to the different features of the FortiGate units. Among other things, we are able to check counters related to performance, such as:

• Startup configuration errors, with the get system startup-error-log command.
• Firewall traffic statistics, with the get system performance firewall statistics command.
• Firewall packet distribution statistics, with the get system performance firewall packet-distribution command.
• Information about the most CPU-intensive processes, with the get system performance top command, which shows a screen divided into columns, as we can see in the following screenshot:

Another fundamental command we will use is diagnose hardware, which is used for problem-solving procedures related to certificates, devices, PCI, and system information. The devices menu is opened with the diagnose hardware deviceinfo command, and includes a disk option to retrieve information about internal disks (if present) and a nic option to display data from network interfaces. The latter also shows the errors and drops related to network packets, as we can see in the following screenshot:

To access real-time information, we will use the diagnose debug command. The diagnose debug report command is not a troubleshooting tool, but is used to create a report for Fortinet technical support. We will talk about additional options for the diagnose debug command later, in relation to TCP/IP debugging.
Troubleshooting routing

The tools that we will see in the following paragraphs are needed to troubleshoot the addressing and routing features of the TCP/IP protocol. Before we explain the individual tools and commands for troubleshooting, here is a real-world suggestion. To perform the troubleshooting steps more comfortably, it is often advisable to use an SSH and Telnet client such as PuTTY (http://bit.ly/1kyS98) to launch two separate sessions on a FortiGate unit. One of the two consoles will be dedicated to watching the results of the debug commands. The second console will be dedicated to launching commands, such as ping and traceroute, that we will use to trigger actions that will be visible in the first console. In the following screenshot, we have a diagnose sniffer packet port1 icmp command running in the session on the left-hand side and an execute ping command in the session on the right-hand side:

Layer 2 and layer 3 TCP/IP diagnostics

Some issues can be solved only by correcting the ARP table that associates IP and MAC addresses. The diagnose ip arp list command shows the ARP cache, as shown in the following screenshot:

The following commands are used to manage the ARP cache:

• The execute clear system arp table command, to clear the ARP cache.
• The diagnose ip arp delete <interface name> <IP address> command, to remove a single ARP entry.
• The diagnose ip arp flush <interface name> command, to remove all entries associated with a single interface.
• The config system arp-table command, to add a static ARP entry.
Adding a static entry requires two commands:

• The config system arp-table command
• The edit command, to create a new entry or to modify an existing one

Three parameters are mandatory:

• set mac, to configure a MAC address for the entry
• set ip, to configure an IP address for the entry
• set interface, to select the interface that is connected to the MAC and IP

In the following screenshot, we can see all the steps required to add entry number 3 to our ARP cache with the following parameters: ip 192.168.12.1 with mac F0:DE:F1:E4:75:B9 on the internal interface:

We can now take care of layer 3, especially from the point of view of routing. As in any device that manages networking, the most used command (based on the ICMP protocol) is the ping command. A FortiGate unit supports two kinds of ping commands: execute ping <IP address>, and a command dedicated to modifying the behavior of the ping command, execute ping-options, which includes parameters such as:

• data-size: To select the datagram size in bytes (between 0 and 65507)
• interval: To set a value in seconds between two pings
• repeat-count: To select the number of pings to send
• source: To specify a source interface (default value is auto-select)
• view-settings: Used to show the current ping options
• timeout: To specify the timeout in seconds

In the following screenshot, we have modified some ping parameters and verified them with the view-settings parameter:

Another fundamental command, based on ICMP, is execute traceroute <dest>, which allows us to see all the hops (networking devices) that a network packet traverses, starting from the FortiGate to a destination (which can be an IP address or an FQDN). Having the full path shown can be important for detecting a wrong or faulty hop along the path.
The usefulness of traceroute depends on how many devices along the route allow the use of the ICMP protocol, but even if we use it only to troubleshoot our internal corporate network, the results of this simple command are extremely useful. To show the result of a traceroute and have fun along the way, we can use the so-called "Star Wars Traceroute", execute traceroute 216.81.59.173, which shows the opening crawl of Star Wars Episode IV (a result that was obtained by making clever use of hostnames and routing). We can see a (small) part of the result in the following screenshot:

The next logical step in debugging problems at layer 3 of TCP/IP is to verify the routing table, something that we are able to do with the get router info routing-table all command. The resulting output can be very lengthy, so we can filter it using parameters including:

• details: Show routing table details information
• rip: Show RIP routing table
• ospf: Show OSPF routing table
• isis: Show ISIS routing table
• static: Show static routing table
• connected: Show connected routing table
• database: Show routing information base

The routing table shows the routing entries and their origin (the routing protocol that added an entry to the routing table).

Summary

This article covered base system diagnostics, the troubleshooting of routing, and layer 2 and layer 3 TCP/IP diagnostics.
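
As a footnote to the ping-options parameters listed earlier, the upper bound of 65507 bytes for data-size matches what is left of the largest possible IPv4 packet once the IP and ICMP headers are subtracted. The short Python sketch below works through that arithmetic. The header sizes are standard protocol values; that FortiOS derives its limit in exactly this way is an assumption, and the helper function is hypothetical.

```python
# Sizes fixed by the IPv4 and ICMP specifications.
MAX_IPV4_PACKET = 65535   # largest value of the 16-bit Total Length field
IPV4_HEADER = 20          # minimum IPv4 header, without options
ICMP_ECHO_HEADER = 8      # type, code, checksum, identifier, sequence number

# Payload bytes that still fit in a single IPv4 packet.
MAX_PING_DATA = MAX_IPV4_PACKET - IPV4_HEADER - ICMP_ECHO_HEADER


def is_valid_data_size(size):
    """Check a value against the 0 to 65507 range quoted in the text."""
    return 0 <= size <= MAX_PING_DATA


print(MAX_PING_DATA)
print(is_valid_data_size(1400), is_valid_data_size(70000))
```

A payload this large is fragmented on the wire, so values near the ceiling are mostly useful for testing fragmentation and MTU behavior rather than for everyday reachability checks.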

Packt
20 Nov 2013
6 min read

Securing the Hadoop Ecosystem

Each ecosystem component has its own security challenges and needs to be configured uniquely, based on its architecture, to secure it. Each of these ecosystem components has end users directly accessing the component or a backend service accessing the Hadoop core components (HDFS and MapReduce). The following are the topics that we'll be covering in this article:

• Configuring authentication and authorization for the following Hadoop ecosystem components: Hive, Oozie, Flume, HBase, Sqoop, and Pig
• Best practices in configuring secured Hadoop components
• Configuring Kerberos for Hadoop ecosystem components

The Hadoop ecosystem is growing continuously and maturing with increasing enterprise adoption. In this section, we look at some of the most important Hadoop ecosystem components, their architecture, and how they can be secured.

Securing Hive

Hive provides the ability to run SQL queries over the data stored in HDFS. Hive provides the Hive query engine, which converts Hive queries provided by the user into a pipeline of MapReduce jobs that are submitted to Hadoop (JobTracker or ResourceManager) for execution. The results of the MapReduce executions are then presented back to the user or stored in HDFS. The following figure shows a high-level interaction of a business user working with Hive to run Hive queries on Hadoop:

There are multiple ways a Hadoop user can interact with Hive and run Hive queries; these are as follows:

• The user can directly run Hive queries using the Command Line Interface (CLI). The CLI connects to the Hive metastore using the metastore server and invokes the Hive query engine directly to execute Hive queries on the cluster.
• Custom applications written in Java and other languages interact with Hive using HiveServer. HiveServer internally uses the metastore server and the Hive query engine to execute Hive queries on the cluster.
To secure Hive in the Hadoop ecosystem, the following interactions should be secured:

• User interaction with the Hive CLI or HiveServer
• User roles and privileges need to be enforced to ensure users have access only to authorized data
• The interaction between Hive and Hadoop (JobTracker or ResourceManager) has to be secured, and the user roles and privileges should be propagated to Hadoop jobs

To ensure secure Hive user interaction, the user must be authenticated by HiveServer or the CLI before running any jobs on the cluster. The user first has to use the kinit command to fetch a Kerberos ticket. This ticket is stored in the credential cache and used to authenticate with Kerberos-enabled systems. Once the user is authenticated, Hive submits the job to Hadoop (JobTracker or ResourceManager). Hive needs to impersonate the user to execute MapReduce on the cluster.

HiveServer2 was introduced from Hive Version 0.11 onwards. The earlier HiveServer had serious security limitations related to user authentication. HiveServer2 supports Kerberos and LDAP for user authentication. When HiveServer2 is configured to use LDAP authentication, Hive users are managed using the LDAP store. Hive asks the users to submit the MapReduce jobs to Hadoop. Thus, if we configure HiveServer2 to use LDAP, only the user authentication between the client and HiveServer2 is addressed. The interaction of Hive with Hadoop is insecure, and Hive MapReduce jobs will be able to access other users' data in the Hadoop cluster. On the other hand, when we use Kerberos authentication for Hive users with HiveServer2, the same user is impersonated to execute MapReduce on the Hadoop cluster. So it is recommended that, in a production environment, we configure HiveServer2 with Kerberos to have seamless authentication and access control for the users submitting Hive queries.
The credential store for the Kerberos KDC can be configured to be LDAP so that we can centrally manage the credentials of the end users. To set up secured Hive interactions, we need to do the following: One of the key steps in securing Hive interaction is to ensure that the Hive user is impersonated in Hadoop, as Hive executes a MapReduce job on the Hadoop cluster. To achieve this goal, we need to add the hive.server2.enable.impersonation configuration in hive-site.xml, and hadoop.proxyuser.hive.hosts and hadoop.proxyuser.hive.groups in core-site.xml.

    <property>
      <name>hive.server2.authentication</name>
      <value>KERBEROS</value>
    </property>
    <property>
      <name>hive.server2.authentication.kerberos.principal</name>
      <value>hive/_HOST@YOUR-REALM.COM</value>
    </property>
    <property>
      <name>hive.server2.authentication.kerberos.keytab</name>
      <value>/etc/hive/conf/hive.keytab</value>
    </property>
    <property>
      <name>hive.server2.enable.impersonation</name>
      <description>Enable user impersonation for HiveServer2</description>
      <value>true</value>
    </property>

Securing Hive using Sentry

In the previous section, we saw how Hive authentication can be enforced using Kerberos and how user privileges are enforced through user impersonation in Hadoop by the superuser. Sentry is one of the latest entrants in the Hadoop ecosystem and provides fine-grained, role-based user authorization for the data stored in Hive and Impala. Sentry uses HiveServer2 and the metastore server to execute queries on the Hadoop platform. However, user impersonation is turned off in HiveServer2 when Sentry is used. Sentry enforces user privileges on the Hadoop data using the Hive metastore. Sentry supports authorization policies per database/schema. This can be leveraged to enforce user management policies.
More details on Sentry are available at the following URL: http://www.cloudera.com/content/cloudera/en/products/cdh/sentry.html

Summary

In this article, we learned how to configure Kerberos for Hadoop ecosystem components. We also looked at how to secure Hive using Sentry.
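
As a practical follow-up to the HiveServer2 settings shown earlier in this article, a short script can confirm that a configuration file actually carries them. The Python sketch below is a hypothetical checker, not part of Hive or Hadoop: it parses a trimmed copy of the hive-site.xml fragment (wrapped in a configuration root element, which is an assumption about the surrounding file) and reports any expected property that is missing or set differently.

```python
import xml.etree.ElementTree as ET

# Two of the properties from the hive-site.xml fragment in this article,
# wrapped in a <configuration> root so the text parses as one document.
HIVE_SITE = """
<configuration>
  <property>
    <name>hive.server2.authentication</name>
    <value>KERBEROS</value>
  </property>
  <property>
    <name>hive.server2.enable.impersonation</name>
    <value>true</value>
  </property>
</configuration>
"""

# Expected values taken from the article; the checker itself is hypothetical.
EXPECTED = {
    "hive.server2.authentication": "KERBEROS",
    "hive.server2.enable.impersonation": "true",
}


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def missing_or_wrong(xml_text, expected=EXPECTED):
    """List the expected settings that are absent or set differently."""
    actual = read_properties(xml_text)
    return [name for name, want in expected.items() if actual.get(name) != want]


# An empty list means both settings are present with the expected values.
print(missing_or_wrong(HIVE_SITE))
```

The same approach could be pointed at core-site.xml to confirm that the hadoop.proxyuser.hive.hosts and hadoop.proxyuser.hive.groups properties are defined; the values they should hold depend on the cluster and are not given in this article.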

Packt
20 Nov 2013
5 min read

Reducing Data Size

Selecting attributes using models

Weighting by the PCA approach, mentioned previously, is an example where the combination of attributes within an example drives the generation of the principal components, and the correlation of an attribute with these generates the attribute's weight. When building classifiers, it is logical to take this a stage further and use the potential model itself as the determinant of whether the addition or removal of an attribute makes for better predictions. RapidMiner provides a number of operators to facilitate this, and the following sections go into detail for one of these operators, with the intention of showing how applicable the techniques are to other similar operators. The operator that will be explained in detail is Forward Selection. This is similar to a number of others in the Optimization group within the Attribute selection and Data transformation section of the RapidMiner GUI operator tree. These operators include Backward Elimination and a number of Optimize Selection operators. The techniques illustrated are transferable to these other operators. A process that uses Forward Selection is shown in the next screenshot:

• The Retrieve operator (labeled 1) simply retrieves the sonar data from the local sample repository. This data has 208 examples and 60 regular attributes named attribute_1 to attribute_60. The label is named class and has two values, Rock and Mine.
• The Forward Selection operator (labeled 2) tests the performance of a model on examples containing more and more attributes. The inner operators within this operator perform this testing.
• The Log to Data operator (labeled 3) creates an example set from the log entries that were written inside the Forward Selection operator. Example sets are easier to process and store in the repository.
• The Guess Types operator (labeled 4) changes the types of attributes based on their contents.
This is simply a cosmetic step to change real numbers into integers to make plotting them look better.

Now, let's return to the Forward Selection operator, which starts by invoking its inner operators to check the model performance using each of the 60 regular attributes individually. This means it runs 60 times. The attribute that gives the best performance is then retained, and the process is repeated with two attributes, pairing each of the remaining 59 attributes with the best from the first run. The best pair of attributes is then retained, and the process is repeated with three attributes using each of the remaining 58. This is repeated until the stopping conditions are met. For illustrative purposes, the parameters shown in the following screenshot are chosen to allow it to continue for 60 iterations and use all 60 attributes.

The inner operator of the Forward Selection operator is a simple cross validation with the number of folds set to three. Using cross validation ensures that the performance is an estimate of what the performance would be on unseen data. Some overfitting will inevitably occur, and it is likely that setting the number of validations to three will increase this. However, this process is for illustrative purposes and needs to run reasonably quickly, and a low cross-validation count facilitates this. Inside the Validation operator itself, there are operators to generate a model, calculate performance, and log data. These are shown in the following screenshot:

The Naïve Bayes operator is a simple model that does not require a large runtime to complete. Within the Validation operator, it runs on different training partitions of the data. The Apply Model and Performance operators check the performance of the model using test partitions. The Log operator outputs information each time it is called, and the following screenshot shows the details of what it logs.
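
The greedy search that Forward Selection performs, as described above, can be sketched in a few lines of Python. This is an illustration of the general technique and not RapidMiner's implementation: the score function stands in for the inner cross validation, and the attribute names and gain values are invented so that the example runs on its own.

```python
def forward_selection(attributes, score, max_attributes=None):
    """Greedy forward selection.

    attributes: list of candidate attribute names.
    score: callable taking a list of attribute names and returning a
           performance estimate (higher is better). In the RapidMiner
           process this role is played by the inner cross validation;
           here it is a stand-in supplied by the caller.
    Returns one (selected_attributes, performance) pair per round.
    """
    selected = []
    remaining = list(attributes)
    history = []
    limit = len(remaining) if max_attributes is None else max_attributes
    while remaining and len(selected) < limit:
        # Try adding each remaining attribute to the current selection
        # and keep the one that gives the best performance.
        best = max(remaining, key=lambda name: score(selected + [name]))
        selected = selected + [best]
        remaining.remove(best)
        history.append((list(selected), score(selected)))
    return history


# Toy stand-in for cross-validated accuracy: made-up gains per attribute
# minus a small penalty per attribute, so performance peaks and then falls.
GAINS = {"a": 0.30, "b": 0.20, "c": 0.01, "d": 0.00}


def toy_score(subset):
    return sum(GAINS[name] for name in subset) - 0.05 * len(subset)


history = forward_selection(list(GAINS), toy_score)
best_subset, best_perf = max(history, key=lambda item: item[1])
print(best_subset, round(best_perf, 2))
```

As in the illustrative process, the search here is allowed to run until every attribute has been added, and the best subset is then read from the recorded history rather than from the final round.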
Running the process gives the log output as shown in the following screenshot:

It is worth understanding this output because it gives a good overview of how the operators work and fit together in a process. For example, the attributes applyCountPerformance, applyCountValidation, and applyCountForwardSelection increment by one each time the respective operator is executed. The expected behavior is that applyCountPerformance will increment with each new row in the result, applyCountValidation will increment every three rows, which corresponds to the number of cross-validation folds, and applyCountForwardSelection will remain at 1 throughout the process.

Note that validationPerformance is missing for the first three rows. This is because the Validation operator has not yet calculated a performance when the Log operator is first called. The validationPerformance value is the average of innerPerformance over the folds within the Validation operator. So, for example, the values for innerPerformance are 0.652, 0.514, and 0.580 for the first three rows; these values average out to 0.582, which is the value for validationPerformance in the fourth row. The featureNames attribute shows the attributes that were used to create the various performance measurements.

The results are plotted as a graph as shown:

This shows that as the number of attributes increases, the validation performance increases and reaches a maximum when the number of attributes is 23. From there, it steadily decreases as the number of attributes reaches 60. Because validationPerformance is logged one validation late, the best performance is given by the attributes in the row immediately before the maximum validationPerformance value.
In this case, the attributes are: attribute_12, attribute_40, attribute_16, attribute_11, attribute_6, attribute_28, attribute_19, attribute_17, attribute_44, attribute_37, attribute_30, attribute_53, attribute_47, attribute_22, attribute_41, attribute_54, attribute_34, attribute_23, attribute_27, attribute_39, attribute_57, attribute_36, attribute_10. The point is that the number of attributes has reduced and indeed the model accuracy has increased. In real-world situations with large datasets and a reduction in the attribute count, an increase in performance is very valuable. Summary This article has covered the important topic of reducing data size by both the removal of examples and attributes. This is important to speed up processing time, and in some cases can even improve classification accuracy. Generally though, classification accuracy reduces as data reduces. Resources for Article: Further resources on this subject: Data Analytics [Article] Using the Spark Shell [Article] Getting started with Haskell [Article]
Packt
20 Nov 2013
7 min read

Visualization of Big Data

(For more resources related to this topic, see here.)

Data visualization

Data visualization is the representation of your data in graphical form. It is needed to study the patterns and trends in the enriched dataset, and it is the easiest way for people to understand data. KPI library has developed A Periodic Table of Visualization Methods, which includes six types of visualization methods: data visualization, information visualization, concept visualization, strategy visualization, metaphor visualization, and compound visualization.

Data source preparation

Throughout the article, we will work further with CTools to build a more interactive dashboard. We will use the nyse_stocks data, but need to change its structure. The data source for the dashboard will be a PDI transformation.

Repopulating the nyse_stocks Hive table

Execute the following steps:

1. Launch Spoon and open the nyse_stock_transfer.ktr file from the code folder.
2. Move NYSE-2000-2001.tsv.gz into the same folder as the transformation file.
3. Run the transformation until it is finished. This process will produce the NYSE-2000-2001-convert.tsv.gz file.
4. Open sandbox by visiting http://192.168.1.122:8000.
5. On the menu bar, choose the File Browser menu, click on Upload, and choose Files.
6. Navigate to your NYSE-2000-2001-convert.tsv.gz file and wait until the uploading process finishes.
7. On the menu bar, choose the HCatalog / Tables menu.
8. From here, drop the existing nyse_stocks table.
9. On the left-hand side pane, click on the Create a new table from a file link.
10. In the Table Name textbox, type nyse_stocks.
11. Click on the NYSE-2000-2001-convert.tsv.gz file. If the file does not exist, make sure you navigate to the right user or name path.
12. On the Create a new table from a file page, accept all the options and click on the Create Table button.
13. Once it is finished, the page redirects to HCatalog Table List. Click on the Browse Data button next to nyse_stocks.
Make sure the month and year columns are now available. Pentaho's data source integration Execute the following steps: Launch Spoon and open hive_java_query.ktr from the code folder. This transformation acts as our data. The transformation consists of several steps, but the most important are three initial steps: Generate Rows: Its function is to generate a data row and the trigger execution of next sequence steps, which are Get Variable and User Defined Java Class Get Variable: This enables the transformation to identify a variable and converted it into a row field with its value User Defined Java Class: This contains a Java code to query Hive data Double-click on the User Defined Java Class step. The code begins with all the import of needed Java packages, followed by the processRow() method. The code actually is a query to Hive database using JDBC objects. What makes it different is the following code: ResultSet res = stmt.executeQuery(sql); while (res.next()) { get(Fields.Out, "period").setValue(rowd, res.getString(3) + "-" + res.getString(4)); get(Fields.Out, "stock_price_close").setValue(rowd, res.getDouble(1)); putRow(data.outputRowMeta, rowd); } The code will execute a SQL query statement to Hive. The result will be iterated and filled in the PDI's output rows. Column 1 of the result will be reproduced as stock_price_close. The concatenation of columns 3 and 4 of the result becomes period. On the User Defined Java Class step, click on the Preview this transformation menu. It may take minutes because of the MapReduce process and since it is a single-node Hadoop cluster. You will have better performance when adding more nodes to achieve an optimum cluster setup. You will have a data preview like the following screenshot: Consuming PDI as CDA data source To consume data through CTools, use Community Data Access (CDA) as it is the standard data access layer. CDA is able to connect to several sources including a Pentaho Data Integration transformation. 
The following steps will help you create CDA data sources consuming a PDI transformation:

1. Copy the Chapter 5 folder from your book's code bundle folder into [BISERVER]/pentaho-solutions and launch PUC.
2. In the Browser Panel window, you should see a newly added Chapter 5. If it does not appear, on the Tools menu, click on Refresh and select Repository Cache.
3. In the PUC Browser Panel window, right-click on NYSE Stock Price - Hive and choose Edit.
4. Create appropriate data sources.
5. In the Browser Panel window, double-click on stock_price_dashboard_hive.cda inside Chapter 5 to open a CDA data browser.
6. The listbox contains the data source names that we have created before; choose DataAccess ID: line_trend_data to preview its data. It will show a table with three columns (stock_symbol, period, and stock_price_close) and one parameter, stock_param_data, with a default value, ALLSTOCKS.
7. Explore all the other data sources to gain a better understanding when working with the next examples.

Visualizing data using CTools

After we prepare the Pentaho Data Integration transformation as a data source, let us move further to develop data visualizations using CTools.

Visualizing trends using a line chart

The following steps will help you create a line chart using a PDI data source:

1. In the PUC Browser Panel window, right-click on NYSE Stock Price - Hive and choose Edit; the CDE editor appears.
2. In the menu bar, click on the Layout menu. Explore the layout of this dashboard. Its structure can be represented by the following diagram:
3. Using the same procedure to create a line chart component, type in the values for the following line chart's properties:
   Name: ccc_line_chart
   Title: Average Close Price Trend
   Datasource: line_trend_data
   Height: 300
   HtmlObject: Panel_1
   seriesInRows: False
4. Click on Save from the menu and in the Browser Panel window, double-click on the NYSE Stock Price - Hive menu to open the dashboard page.
Interactivity using parameter

The following steps will help you create a stock parameter and link it to the chart component and data source:

1. Open the CDE editor again and click on the Components menu.
2. In the left-hand side panel, click on Generic and choose the Simple Parameter component.
3. Now, a parameter component is added to the components group. Click on it and type stock_param in the Name property.
4. In the left-hand side panel, click on Selects and choose the Select Component component. Type in the values for the following properties:
   Name: select_stock
   Parameter: stock_param
   HtmlObject: Filter_Panel_1
   Values array: ["ALLSTOCKS","ALLSTOCKS"], ["ARM","ARM"], ["BBX","BBX"], ["DAI","DAI"], ["ROS","ROS"]
   To insert values in the Values array textbox, you need to create several pair values. To add a new pair, click on the textbox; a dialog will appear. Then click on the Add button to create a new pair of Arg and Value textboxes and type in the values as stated in this step. The dialog entries will look like the following screenshot:
5. On the same editor page, select ccc_line_chart and click on the Parameters property. A parameter dialog appears; click on the Add button to create the first index of a parameter pair. Type in stock_param_data and stock_param in the Arg and Value textboxes, respectively. This will link the global stock_param parameter with the data source's stock_param_data parameter. We have specified the parameter in the previous walkthroughs.
6. While still on ccc_line_chart, click on Listeners. In the listbox, choose stock_param and click on the OK button to accept it. This configuration will reload the chart if the value of the stock_param parameter changes.
7. Open the NYSE Stock Price - Hive dashboard page again.
Now, you have a filter that interacts well with the line chart data, as shown in the following screenshot: Summary In this article we learned about preparing a data source, visualizing data using CTools, and also how to create an interactive analytical dashboard that consumes data from Hive. Resources for Article: Further resources on this subject: Pentaho Reporting: Building Interactive Reports in Swing [Article] Pentaho – Using Formulas in Our Reports [Article] Getting Started with Pentaho Data Integration [Article]
Packt
20 Nov 2013
8 min read

Styling the Forms

(For more resources related to this topic, see here.) CSS3 for web forms CSS3 brings us infinite new possibilities and allows styling to make better web forms. CSS3 gives us a number of new ways to create an impact with our form designs, with quite a few important changes. HTML5 introduced useful new form elements such as sliders and spinners and old elements such as textbox and textarea, and we can make them look really cool with our innovation and CSS3. Using CSS3, we can turn an old and boring form into a modern, cool, and eye catching one. CSS3 is completely backwards compatible, so we will not have to change the existing form designs. Browsers have and will always support CSS2. CSS3 forms can be split up into modules. Some of the most important CSS3 modules are: Selectors (with pseudo-selectors) Backgrounds and Borders Text (with Text Effects) Fonts Gradients Styling of forms always varies with requirements and the innovation of the web designer or developer. In this article, we will look at those CSS3 properties with which we can style our forms and give them a rich and elegant look. Some of the new properties of CSS3 required vendor prefixes, which were used frequently as they helped browsers to read the code. In general, it is no longer needed to use them with CSS3 for some of the properties, such as border-radius, but they come into action when the browser doesn't interpret the code. A list of all the vendor prefixes for major browsers is given as follows: -moz-: Firefox -webkit-: WebKit browsers such as Safari and Chrome -o-: Opera -ms-: Internet Explorer Before we start styling the form, let us have a quick revision of form modules for better understanding and styling of the forms. Selectors and pseudo-selectors Selectors are a pattern used to select the elements which we want to style. A selector can contain one or more simple selectors separated by combinators. 
The CSS3 Selectors module introduces three new attribute selectors; they are grouped together under the heading Substring Matching Attribute Selectors. These new selectors are as follows:

[att^=val]: The "begins with" selector
[att$=val]: The "ends with" selector
[att*=val]: The "contains" selector

The first of these new selectors, which we will refer to as the "begins with" selector, allows the selection of elements where a specified attribute (for example, the href attribute of a hyperlink) begins with a specified string (for example, http://, https://, or mailto:). In the same way, the other two new selectors, which we will refer to as the "ends with" and "contains" selectors, allow the selection of elements where a specified attribute either ends with or contains a specified string, respectively.

A CSS pseudo-class is a keyword added to a selector that specifies a special state of the element to be selected. For example, :hover will apply a style when the user hovers over the element specified by the selector. Pseudo-classes, along with pseudo-elements, apply a style to an element not only in relation to the content of the document tree, but also in relation to external factors like the history of the navigator, such as :visited, and the status of its content, such as :checked, on some form elements.

The pseudo-classes are as follows (:first-child already existed in CSS2; the others were added in CSS3):

:last-child: It is used to match an element that is the last child element of its parent element.
:first-child: It is used to match an element that is the first child element of its parent element.
:checked: It is used to match elements such as radio buttons or checkboxes which are checked.
:first-of-type: It is used to match the first child element of the specified element type.
:last-of-type: It is used to match the last child element of the specified element type.
:nth-last-of-type(N): It is used to match the Nth child element of the specified element type, counting from the last.
:only-child: It is used to match an element if it's the only child element of its parent.
:only-of-type: It is used to match an element that is the only child element of its type.
:root: It is used to match the element that is the root element of the document.
:empty: It is used to match elements that have no children.
:target: It is used to match the element whose ID is the target of the fragment identifier in the document's URL.
:enabled: It is used to match user interface elements that are enabled.
:nth-child(N): It is used to match the Nth child element of the parent (or every Nth child when N is a formula such as 2n).
:nth-of-type(N): It is used to match the Nth child element of the specified element type.
:disabled: It is used to match user interface elements that are disabled.
:not(S): It is used to match elements that aren't matched by the specified selector.
:nth-last-child(N): It is used to match the Nth child element of the parent, counting from the last child.

Backgrounds

CSS3 contains several new background properties. It also changes some of the existing background properties to allow greater control over the background of an element. The new background properties are as follows.

The background-clip property

The background-clip property determines the area within which the background is painted. If the element has no background image or color, this property has a visible effect only when the border has transparent or partially opaque regions; otherwise, the border covers up the difference.
Syntax

The syntax for the background-clip property is as follows (the slashes separate the alternative values):

background-clip: border-box / padding-box / content-box;

Values

The values for the background-clip property are as follows:

border-box: With this, the background extends to the outside edge of the border. This is the default value.
padding-box: With this, the background extends to the outside edge of the padding; no background is drawn below the border.
content-box: With this, the background is painted within the content box; only the area the content covers is painted.
no-clip: This value appeared in early drafts of the specification and is not part of the final CSS3 Backgrounds and Borders module.

The background-origin property

The background-origin property specifies the box that the background-position property is measured from, that is, the positioning area of the background image. This property has no effect if the background-attachment property for the background image is fixed.

Syntax

The following is the syntax for the background-origin property:

background-origin: border-box / padding-box / content-box;

Values

The values for the background-origin property are as follows:

border-box: With this, the background is positioned relative to the border box.
padding-box: With this, the background is positioned relative to the padding box. This is the default value.
content-box: With this, the background is positioned relative to the content box.

The background-size property

The background-size property specifies the size of the background image. If this property is not specified, then the original size of the image will be displayed.

Syntax

The following is the syntax for the background-size property:

background-size: length / percentage / cover / contain;

Values

The values for the background-size property are as follows:

length: This specifies the width and height of the background image. No negative values are allowed.
percentage: This specifies the width and height of the background image as a percentage of the background positioning area.
cover: This scales the background image, keeping its proportions, to the smallest size at which the background area is completely covered.
contain: This scales the image, keeping its proportions, to the largest size at which both its width and its height fit inside the background area.

Apart from adding new properties, CSS3 has also enhanced some old background properties, which are as follows.

The background-color property

Early drafts of the CSS3 Backgrounds and Borders module allowed a fallback color to be specified in addition to the background color, for use when the background image of the element could not be used. The fallback color was written after a forward slash:

background-color: red / blue;

This syntax is not part of the final specification, so it should not be relied upon.

The background-repeat property

In CSS2, when an image is repeated, the last tile often gets cut off. CSS3 introduced new values with which we can fix this problem:

space: With this value, the image is repeated as often as it fits without being cut off, and an equal amount of space is applied between the tiles until they fill the element.
round: With this value, the image is rescaled until a whole number of tiles fits the element.

The background-attachment property

With the new possible value of local, we can now set the background to scroll when the element's content is scrolled. This comes into action with elements that can scroll. For comparison, the following rule uses the older fixed value; replacing fixed with local on a scrollable element gives the new behavior:

body{background-image:url('example.gif');background-repeat:no-repeat;background-attachment:fixed;}

CSS3 allows web designers and developers to have multiple background images, using nothing but a simple comma-separated list. For example:

background-image: url(abc.png), url(xyz.png);

Summary

In this article, we learned about the basics of CSS3 and the modules into which we can categorize CSS3 for forms, such as backgrounds. Using these, we can improve the look and feel of a form and make it more effective and attractive.

Resources for Article: Further resources on this subject: HTML5 Canvas [Article] Building HTML5 Pages from Scratch [Article] HTML5 Presentations - creating our initial presentation [Article]
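To make the three substring-matching attribute selectors from the Selectors section of this article concrete, the following plain-JavaScript sketch mimics their matching rules. It illustrates the semantics only, since browsers do the real matching; the helper name `matchesAttribute` and the sample values are hypothetical.

```javascript
// Plain-JavaScript sketch of the matching rules behind the three CSS3
// substring attribute selectors. The helper name `matchesAttribute` is
// hypothetical and exists only for this illustration.
function matchesAttribute(value, operator, needle) {
  // An element without the attribute never matches, and the Selectors
  // specification says an empty string as the value matches nothing.
  if (value === null || value === undefined || needle === '') return false;
  switch (operator) {
    case '^=': return value.startsWith(needle); // [att^=val] begins with
    case '$=': return value.endsWith(needle);   // [att$=val] ends with
    case '*=': return value.includes(needle);   // [att*=val] contains
    default: throw new Error('Unsupported operator: ' + operator);
  }
}

// Sample attribute values (assumptions for illustration only).
const href = 'mailto:info@example.com';
console.log(matchesAttribute(href, '^=', 'mailto:')); // a[href^="mailto:"]
console.log(matchesAttribute(href, '$=', '.pdf'));    // a[href$=".pdf"]
console.log(matchesAttribute(href, '*=', 'example')); // a[href*="example"]
```

In a stylesheet, the same three tests are written directly in the selector, for instance to style only the links that begin with mailto: or only the form fields whose name contains a given string.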
Packt
20 Nov 2013
5 min read

Understanding Web-based Applications and Other Multimedia Forms

(For more resources related to this topic, see here.)

However, we will not look at blogs, wikis, or social networking sites that are usually referred to as web-based reference tools. Moodle already has these, so instead we will take a look at web applications that allow the easy creation, collaboration, and sharing of multimedia elements, such as interactive floor planners, online maps, timelines, and many other applications that are very easy to use and that support different learning styles. I usually describe Moodle as a school operating system and web apps as its social applications, to illustrate what I believe can be a very powerful way of using Moodle and the web for learning. Designing meaningful activities in Moodle gives students the opportunity to express their creativity by using these tools and to reflect on the multimedia artifacts they produce with both peers and the teacher.

However, we have to keep in mind some issues of e-safety, backups, and licensing when using these online tools, which are usually associated with online communities. After all, we will have our students using them, and they will therefore be exposed to some risks.

Creating dynamic charts using Google Drive (Spreadsheets)

Some of the tasks we assign to students in our Moodle course will require them to use a tool like Google Spreadsheets to present their plans to colleagues in a visual way. Google Drive (http://drive.google.com) provides a set of online productivity tools that work on web standards and recreate a typical Office suite. We can make documents, spreadsheets, presentations, drawings, or forms. To use Google Drive, we will need a Google account.
After creating our account and logging in to Google Drive, we can organize the files displayed on the right side of the screen, add them to folders, tag them, search (of course, it's Google!), collaborate (imagine a wiki spreadsheet), export to several formats (including the usual formats for Office documents from Microsoft, Open Office, or Adobe PDF), and publish these documents online. We will start by creating a new Spreadsheet to make a budget for a music studio which will be built during the music course, by navigating to CREATE | Spreadsheet. Insert a chart As in any spreadsheet application, we can add a title by double-clicking on Untitled spreadsheet and then we add some equipment and cost to the cells: After populating our table with values and selecting all of them, we should click on the Insert chart button. The Start tab will show up in the Chart Editor pop up, as shown in the following screenshot: If we click on the Charts tab, we can pick from a list of available charts. Let's pick one of the pie charts. In the Customize tab, we can add a title to the chart, and change its appearance: When everything is done, we can click on the Insert button, and the chart previewed in the Customize tab will be added to the spreadsheet. Publish If we click on the chart, a square will be displayed on the upper-right corner, and if we click on the drop-down arrow, we see a Publish chart... option, which can be used to publish the chart. When we click on this option, we will be presented with two ways of embedding the chart, the first, as an interactive chart, and the second, as an image. Both change dynamically if we change the values or the chart in Google Drive. We should use the image code to put the chart on a Moodle forum. Share, comment, and collaborate Google Drive has the options of sharing and allowing comments and changes in our spreadsheet by other people. On the upper-right corner of each opened document, there are two buttons for that, Comments and Share. 
To add collaborators to our spreadsheet, we have to click on the Share button and then add their contacts (for example, e-mail) in the Invite people: field, then click on the Share & save button, and hit Done. If a collaborator is working on the same spreadsheet, at the same time we are, we can see it below the Comments and Share buttons as shown in the following screenshot: If we click on the arrow next to 1 other viewer we can chat directly with the collaborator as we edit it collaboratively: Remember that, this can be quite useful in distance courses that have collaborative tasks assigned to groups. Creating a shared folder using Google Drive We can also use the sharing functionality to share documents with the collaborators (15 GB of space for that). In the main Google Drive page, we can create a folder by navigating to Create | Folder. We are then required to give it a name: The folder will be shown in the files and folder explorer in Google Drive: To share it with someone, we need to right-click the folder and choose the Share... option. Then, just like the process of sharing a spreadsheet we saw previously, we just need to add our collaborators' contacts (for example, e-mail) in the Invite people: field, then click on Share & save and hit Done. The invited people will receive an e-mail to add the shared folder to their Google Drive (they need a Google account for this) and it is done. Everything we add to this folder is automatically synced with everyone. This includes all the Google Drive documents, PDFs, and all the files uploaded to this folder. And it's an easy way to share multimedia projects between a group of people working on the same project.
Packt
20 Nov 2013
6 min read

Foundations

(For more resources related to this topic, see here.)

Installation

If you do not have node installed, visit: http://nodejs.org/download/. There is also an installation guide on the node GitHub repository wiki if you prefer not to or cannot use an installer: https://github.com/joyent/node/wiki/Installation.

Let's install Express globally:

    npm install -g express

If you have downloaded the source code, install its dependencies by running this command:

    npm install

Testing Express with Mocha and SuperTest

Now that we have Express installed and our package.json file in place, we can begin to drive out our application with a test-first approach. We will now install two modules to assist us: mocha and supertest.

Mocha is a testing framework for node; it's flexible, has good async support, and allows you to run tests in both a TDD and BDD style. It can also be used on both the client and server side. Let's install Mocha with the following command:

    npm install -g mocha --save-dev

SuperTest is an integration testing framework that will allow us to easily write tests against a RESTful HTTP server. Let's install SuperTest:

    npm install supertest --save-dev

Continuous testing with Mocha

One of the great things about working with a dynamic language, and one of the things that has drawn me to node, is the ability to easily do Test-Driven Development and continuous testing. Simply run Mocha with the -w watch switch and Mocha will respond when changes to our codebase are made, and will automatically rerun the tests:

    mocha -w

Extracting routes

Express supports multiple options for application structure. Extracting elements of an Express application into separate files is one option; a good candidate for this is routes.
Let's extract our route heartbeat into ./lib/routes/heartbeat.js; the following listing simply exports the route as a function called index:

    exports.index = function(req, res){
      res.json(200, 'OK');
    };

Let's make a change to our Express server and remove the anonymous function we pass to app.get for our route and replace it with a call to the function in the following listing. We import the route heartbeat and pass in a callback function, heartbeat.index:

    var express = require('express')
      , http = require('http')
      , config = require('../configuration')
      , heartbeat = require('../routes/heartbeat')
      , app = express();

    app.set('port', config.get('express:port'));
    app.get('/heartbeat', heartbeat.index);

    http.createServer(app).listen(app.get('port'));
    module.exports = app;

404 handling middleware

In order to handle a 404 Not Found response, let's add a 404 not found middleware. Let's write a test, ./test/heartbeat.js; the content type returned should be JSON and the status code expected should be 404 Not Found:

    // The two requires below are inferred from the file paths used in this
    // article; adjust them if your layout differs.
    var app = require('../lib/express')
      , request = require('supertest');

    describe('vision heartbeat api', function(){
      describe('when requesting resource /missing', function(){
        it('should respond with 404', function(done){
          request(app)
            .get('/missing')
            .expect('Content-Type', /json/)
            .expect(404, done);
        })
      });
    });

Now, add the following middleware to ./lib/middleware/notFound.js. Here we export a function called index and call res.json, which returns a 404 status code and the message Not Found. The next parameter is not called as our 404 middleware ends the request by returning a response.
Calling next would call the next middleware in our Express stack, but there is none left: it's customary to add error middleware and 404 middleware as the last middleware in your server:

    exports.index = function(req, res, next){
      res.json(404, 'Not Found.');
    };

Now add the 404 not found middleware to ./lib/express/index.js:

    var express = require('express')
      , http = require('http')
      , config = require('../configuration')
      , heartbeat = require('../routes/heartbeat')
      , notFound = require('../middleware/notFound')
      , app = express();

    app.set('port', config.get('express:port'));
    app.get('/heartbeat', heartbeat.index);
    app.use(notFound.index);

    http.createServer(app).listen(app.get('port'));
    module.exports = app;

Logging middleware

Express comes with a logger middleware via Connect; it's very useful for debugging an Express application. Let's add it to our Express server ./lib/express/index.js:

    var express = require('express')
      , http = require('http')
      , config = require('../configuration')
      , heartbeat = require('../routes/heartbeat')
      , notFound = require('../middleware/notFound')
      , app = express();

    app.set('port', config.get('express:port'));
    app.use(express.logger({ immediate: true, format: 'dev' }));
    app.get('/heartbeat', heartbeat.index);
    app.use(notFound.index);

    http.createServer(app).listen(app.get('port'));
    module.exports = app;

The immediate option will write a log line on request instead of on response. The dev option provides concise output colored by the response status. The logger middleware is placed high in the Express stack in order to log all requests.

Logging with Winston

We will now add logging to our application using Winston; let's install Winston:

    npm install winston --save

The 404 middleware will need to log 404 not found, so let's create a simple logger module, ./lib/logger/index.js; the details of our logger will be configured with Nconf. We import Winston and the configuration modules.
We define our Logger function, which constructs and returns a file logger (winston.transports.File) that we configure using values from our config. We default the logger's maximum size to 1 MB, with a maximum of three rotating files. We instantiate the Logger function, returning it as a singleton.

    var winston = require('winston')
      , config = require('../configuration');

    function Logger(){
      return winston.add(winston.transports.File, {
        filename: config.get('logger:filename'),
        maxsize: 1048576,
        maxFiles: 3,
        level: config.get('logger:level')
      });
    }

    module.exports = new Logger();

Let's add the Logger configuration details to our config files ./config/development.json and ./config/test.json:

    {
      "express": {
        "port": 3000
      },
      "logger": {
        "filename": "logs/run.log",
        "level": "silly"
      }
    }

Let's alter the ./lib/middleware/notFound.js middleware to log errors. We import our logger and log an error message via logger when a 404 Not Found response is thrown:

    var logger = require("../logger");

    exports.index = function(req, res, next){
      logger.error('Not Found');
      res.json(404, 'Not Found');
    };

Summary

This article has shown in detail, with all the commands, how Node.js is installed along with Express. It has also shown how to test our Express application with Mocha and SuperTest, and how to add logging to our application with middleware and Winston.

Resources for Article: Further resources on this subject: Spring Roo 1.1: Working with Roo-generated Web Applications [Article] Building tiny Web-applications in Ruby using Sinatra [Article] Develop PHP Web Applications with NetBeans, VirtualBox and Turnkey LAMP Appliance [Article]
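The reason this article registers the 404 middleware last can be seen in a toy middleware runner. The sketch below is not Express itself and uses no Express code: `runStack` is a hypothetical helper, and the two handlers are simplified stand-ins modelled on the article's heartbeat route and notFound middleware, using the same res.json(status, body) call shape.

```javascript
// A toy middleware runner (not Express itself) showing why the 404 handler
// is registered last: each handler either responds or calls next().
function runStack(stack, req) {
  const res = { status: null, body: null };
  res.json = function (status, body) {
    res.status = status;
    res.body = body;
  };
  let index = 0;
  function next() {
    const handler = stack[index++];
    if (handler) handler(req, res, next);
  }
  next();
  return res;
}

// Hypothetical handlers modelled on the article's heartbeat route and
// notFound middleware.
function heartbeat(req, res, next) {
  if (req.url === '/heartbeat') return res.json(200, 'OK');
  next(); // not our route, so pass the request on
}

function notFound(req, res, next) {
  res.json(404, 'Not Found'); // ends the request, so next() is not called
}

const stack = [heartbeat, notFound];
console.log(runStack(stack, { url: '/heartbeat' }).status); // 200
console.log(runStack(stack, { url: '/missing' }).status);   // 404
```

If notFound were placed first in the stack, every request would receive a 404 before the route had a chance to respond, which is the ordering problem the article avoids.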

Packt
20 Nov 2013
16 min read

Creating and Utilizing Custom Entities

Introducing the entity system

The entity system exists to spawn and manage entities in the game world. Entities are logical containers, allowing drastic changes in behavior at runtime. For example, an entity can change its model, position, and orientation at any point in the game. Consider this: every item, weapon, vehicle, and even player that you have interacted with in the engine is an entity. The entity system is one of the most important modules present in the engine, and is dealt with regularly by programmers.

The entity system, accessible via the IEntitySystem interface, manages all entities in the game. Entities are referenced using the entityId type definition, which allows 65536 unique entities at any given time. If an entity is marked for deletion, for example via IEntity::Remove(bool bNow = false), the entity system will delete it prior to updating at the start of the next frame. If the bNow parameter is set to true, the entity will be removed right away.

Entity classes

Entities are simply instances of an entity class, represented by the IEntityClass interface. Each entity class is assigned a name that identifies it, for example, SpawnPoint. Classes can be registered via IEntityClassRegistry::RegisterClass, or via IEntityClassRegistry::RegisterStdClass to use the default IEntityClass implementation.

Entities

The IEntity interface is used to access the entity implementation itself. The core implementation of IEntity is contained within CryEntitySystem.dll, and cannot be modified. Instead, we are able to extend entities using game object extensions (have a look at the Game object extensions section in this article) and custom entity classes.

entityId

Each entity instance is assigned a unique identifier, which persists for the duration of the game session.
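To make the deferred removal behaviour described in the introduction more concrete, here is a minimal, self-contained C++ sketch. It does not use the CryENGINE headers: SketchEntityId, SketchEntitySystem, and all of its methods are hypothetical stand-ins, and reserving id 0 as "invalid" is an assumption of the sketch rather than something the article states. It only illustrates the idea of a 16-bit identifier space and of queuing removals until the start of the next update.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the engine's id type: 16 bits give 65536 values.
using SketchEntityId = std::uint16_t;

// Hypothetical stand-in for an entity system; NOT the real IEntitySystem.
class SketchEntitySystem {
public:
    // Spawns an entity of the given class name and returns its id.
    // Returns 0 when no id is free (0 is reserved as "invalid" in this sketch).
    SketchEntityId Spawn(const std::string& className) {
        for (std::uint32_t candidate = 1; candidate <= 0xFFFFu; ++candidate) {
            const SketchEntityId id = static_cast<SketchEntityId>(candidate);
            if (m_entities.find(id) == m_entities.end()) {
                m_entities.emplace(id, className);
                return id;
            }
        }
        return 0;
    }

    // Mirrors the bNow idea: remove right away, or queue until the next Update().
    void Remove(SketchEntityId id, bool bNow = false) {
        if (bNow) {
            m_entities.erase(id);
        } else {
            m_pendingRemoval.push_back(id);
        }
    }

    // Called once per frame; flushes queued removals before any other work.
    void Update() {
        for (SketchEntityId id : m_pendingRemoval) {
            m_entities.erase(id);
        }
        m_pendingRemoval.clear();
    }

    bool Exists(SketchEntityId id) const { return m_entities.count(id) != 0; }
    std::size_t Count() const { return m_entities.size(); }

private:
    std::unordered_map<SketchEntityId, std::string> m_entities;
    std::vector<SketchEntityId> m_pendingRemoval;
};
```

In this sketch, an entity removed without bNow still exists until Update() runs, which is why code holding an id should be prepared for the entity to disappear at the start of the following frame.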
EntityGUID

Besides the entityId parameter, entities are also given globally unique identifiers, which, unlike entityId, can persist between game sessions, for example when saving games.

Game objects

When entities need extended functionality, they can utilize game objects and game object extensions. This allows for a larger set of functionality that can be shared by any entity. Game objects handle binding entities to the network, serialization, and per-frame updates, and provide the ability to utilize existing (or create new) game object extensions such as Inventory and AnimatedCharacter. Typically in CryENGINE development, game objects are only necessary for more important entity implementations, such as actors.

The entity pool system

The entity pool system allows "pooling" of entities, allowing efficient control of entities that are currently being processed. This system is commonly accessed via flowgraph, and allows disabling and enabling groups of entities at runtime based on events. Pools are also used for entities that need to be created and released frequently, for example, bullets.

Once an entity has been marked as handled by the pool system, it will be hidden in the game by default. Until the entity has been prepared, it will not exist in the game world. It is also ideal to free the entity once it is no longer needed. For example, if you have a group of AI that only needs to be activated when the player reaches a predefined checkpoint trigger, this can be set up using AreaTrigger (and its included flownode) and the Entity:EntityPool flownode.

Creating a custom entity

Now that we've learned the basics of the entity system, it's time to create our first entity. For this exercise, we'll be demonstrating the ability to create an entity in Lua, C#, and finally C++.

Creating an entity using Lua

Lua entities are fairly simple to set up, and revolve around two files: the entity definition, and the script itself.
To create a new Lua entity, we'll first have to create the entity definition in order to tell the engine where the script is located:

    <Entity Name="MyLuaEntity" Script="Scripts/Entities/Others/MyLuaEntity.lua" />

Simply save this file as MyLuaEntity.ent in the Game/Entities/ directory, and the engine will search for the script at Scripts/Entities/Others/MyLuaEntity.lua. Now we can move on to creating the Lua script itself! To start, create the script at the path set previously and add an empty table with the same name as your entity:

    MyLuaEntity = { }

When parsing the script, the first thing the engine does is search for a table with the same name as the entity, as you defined it in the .ent definition file. This main table is where we can store variables, Editor properties, and other engine information. For example, we can add our own property by adding a string variable:

    MyLuaEntity = {
      Properties = {
        myProperty = "",
      },
    }

It is possible to create property categories by adding subtables within the Properties table. This is useful for organizational purposes. With the changes done, you should see the new property when spawning an instance of your class in the Editor via RollupBar, which is docked to the far right of the Editor by default.

Common Lua entity callbacks

The script system provides a set of callbacks that can be utilized to trigger specific logic on entity events. For example, the OnInit function is called on the entity when it is initialized:

    function MyLuaEntity:OnInit()
    end

Creating an entity in C#

The third-party extension CryMono allows the creation of entities in .NET, which leads us to demonstrate the capability of creating our very own entity in C#. To start, open the Game/Scripts/Entities directory, and create a new file called MyCSharpEntity.cs. This file will contain our entity code, and will be compiled at runtime when the engine is launched. Now, open the script (MyCSharpEntity.cs) in the IDE of your choice.
We'll be using Visual Studio in order to provide IntelliSense and code highlighting. Once opened, let's create a basic skeleton entity. We'll need to add a reference to the CryEngine namespace, in which the most common CryENGINE types are stored.

    using CryEngine;

    namespace CryGameCode
    {
        [Entity]
        public class MyCSharpEntity : Entity
        {
        }
    }

Now, save the file and start the Editor. Your entity should now appear in RollupBar, inside the Default category. Drag MyCSharpEntity into the viewport in order to spawn it.

We use the entity attribute ([Entity]) as a way of providing additional information for the entity registration process. For example, using the Category property will result in using a custom Editor category, instead of Default.

    [Entity(Category = "Others")]

Adding Editor properties

Editor properties allow the level designer to supply parameters to the entity, perhaps to indicate the size of a trigger area, or to specify an entity's default health value. In CryMono, this can be done by decorating supported types (have a look at the following code snippet) with the EditorProperty attribute. For example, if we want to add a new string property:

    [EditorProperty]
    public string MyProperty { get; set; }

Now when you start the Editor and drag MyCSharpEntity into the viewport, you should see MyProperty appear in the lower part of RollupBar. The MyProperty string variable in C# will be automatically updated when the user edits it via the Editor. Remember that Editor properties will be saved with the level, allowing the entity to use Editor properties defined by the level designer even in pure game mode.

Property folders

As with Lua scripts, it is possible for CryMono entities to place Editor properties in folders for organizational purposes. In order to create folders, you can utilize the Folder property of the EditorProperty attribute as shown:

    [EditorProperty(Folder = "MyCategory")]

You now know how to create entities with custom Editor properties using CryMono!
This is very useful when creating simple gameplay elements for level designers to place and modify at runtime, without having to reach for the nearest programmer.

Creating an entity in C++

Creating an entity in C++ is slightly more complex than making one using Lua or C#, and can be done differently based on what the entity is required for. For this example, we'll be detailing the creation of a custom entity class by implementing IEntityClass.

Creating a custom entity class

Entity classes are represented by the IEntityClass interface, which we will derive from and register via IEntityClassRegistry::RegisterClass(IEntityClass *pClass). To start off, let's create the header file for our entity class. Right-click on your project in Visual Studio, or any of its filters, and go to Add | New Item in the context menu. When prompted, create your header file (.h). We'll be calling the class CMyEntityClass. Now, open the generated MyEntityClass.h header file, and create a new class which derives from IEntityClass:

    #include <IEntityClass.h>

    class CMyEntityClass : public IEntityClass
    {
    };

Now that we have the class set up, we'll need to implement the pure virtual methods we inherit from IEntityClass in order for our class to compile successfully. For most of the methods, we can simply return a null pointer, zero, or an empty string. However, there are a couple of methods which we have to handle for the class to function:

- Release(): This is called when the class should be released; it should simply perform "delete this;" to destroy the class
- GetName(): This should return the name of the class
- GetEditorClassInfo(): This should return the ClassInfo struct, containing the Editor category, helper, and icon strings, to the Editor
- SetEditorClassInfo(): This is called when something needs to update the Editor ClassInfo explained just now

IEntityClass is the bare minimum for an entity class, and does not support Editor properties yet (we will cover this a bit further later).
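The four methods listed above can be sketched as follows. This is a simplified illustration under stated assumptions, not the engine's code: ISketchEntityClass and SketchEditorClassInfo are hypothetical stand-ins for IEntityClass and its ClassInfo struct (the real interface declares more pure virtual methods with engine-specific types), and GetScriptFile is included only as an example of a method that can be stubbed with an empty string.

```cpp
#include <string>

// Hypothetical stand-in for the Editor ClassInfo struct described above.
struct SketchEditorClassInfo {
    std::string sCategory;
    std::string sHelper;
    std::string sIcon;
};

// Hypothetical, heavily reduced stand-in for IEntityClass.
struct ISketchEntityClass {
    virtual void Release() = 0;
    virtual const char* GetName() const = 0;
    virtual const SketchEditorClassInfo& GetEditorClassInfo() const = 0;
    virtual void SetEditorClassInfo(const SketchEditorClassInfo& info) = 0;
    virtual const char* GetScriptFile() const = 0;  // example of a stubbable method

protected:
    virtual ~ISketchEntityClass() = default;  // destroyed only through Release()
};

class CMySketchEntityClass : public ISketchEntityClass {
public:
    CMySketchEntityClass(const std::string& name, const SketchEditorClassInfo& info)
        : m_name(name), m_editorInfo(info) {}

    // The registry owns the class and asks it to destroy itself.
    void Release() override { delete this; }

    const char* GetName() const override { return m_name.c_str(); }

    const SketchEditorClassInfo& GetEditorClassInfo() const override {
        return m_editorInfo;
    }

    void SetEditorClassInfo(const SketchEditorClassInfo& info) override {
        m_editorInfo = info;
    }

    // Nothing to report here, so an empty string is enough.
    const char* GetScriptFile() const override { return ""; }

private:
    // Private destructor: instances must live on the heap and be freed via Release().
    ~CMySketchEntityClass() override = default;

    std::string m_name;
    SketchEditorClassInfo m_editorInfo;
};
```

Keeping the destructor private is a design choice of this sketch: it makes it hard to create the class on the stack by accident, which matters because Release() performs "delete this;".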
To register an entity class, we need to call IEntityClassRegistry::RegisterClass. This has to be done prior to the IGameFramework::CompleteInit call in CGameStartup. We'll be doing it inside GameFactory.cpp, in the InitGameFactory function:

    IEntityClassRegistry::SEntityClassDesc classDesc;
    classDesc.sName = "MyEntityClass";
    classDesc.editorClassInfo.sCategory = "MyCategory";

    IEntitySystem *pEntitySystem = gEnv->pEntitySystem;
    IEntityClassRegistry *pClassRegistry = pEntitySystem->GetClassRegistry();

    bool result = pClassRegistry->RegisterClass(new CMyEntityClass(classDesc));

Implementing a property handler

In order to handle Editor properties, we'll have to extend our IEntityClass implementation with a new implementation of IEntityPropertyHandler. The property handler is responsible for handling the setting, getting, and serialization of properties. Start by creating a new header file named MyEntityPropertyHandler.h, then create a new class which derives from IEntityPropertyHandler. The following is the bare minimum implementation; in order to properly support properties, you'll also need to implement SetProperty and GetProperty, as well as LoadEntityXMLProperties (the latter being required to read property values from the Level XML):

    class CMyEntityPropertyHandler : public IEntityPropertyHandler
    {
    };

In order for the new class to compile, you'll need to implement the pure virtual methods defined in IEntityPropertyHandler.
The methods crucial for the property handler to work properly are as follows:

- LoadEntityXMLProperties: This is called by the Launcher when a level is being loaded, in order to read property values of entities saved by the Editor
- GetPropertyCount: This should return the number of properties registered with the class
- GetPropertyInfo: This is called to get the property information at the specified index, most importantly when the Editor gets the available properties
- SetProperty: This is called to set the property value for an entity
- GetProperty: This is called to get the property value of an entity
- GetDefaultProperty: This is called to retrieve the default property value at the specified index

To make use of the new property handler, create an instance of it (passing the requested properties to its constructor) and return the newly created handler inside IEntityClass::GetPropertyHandler(). We now have a basic entity class implementation, which can be easily extended to support Editor properties. This implementation is very extensible, and can be used for a wide range of purposes; for example, the C# script seen earlier has simply automated this process, lifting the responsibility for so much code from the programmer.

Entity flownodes

You may have noticed that when right-clicking inside a graph, one of the context options is Add Selected Entity. This functionality allows you to select an entity inside a level, and then add its entity flownode to the flowgraph. By default, the entity flownode doesn't contain any ports, and will therefore be mostly useless. However, we can easily create our own entity flownode that targets the entity we selected in all three languages.
Creating an entity flownode in Lua

By extending the entity we created in the Creating an entity using Lua section, we can add its very own entity flownode:

    function MyLuaEntity:Event_OnBooleanPort()
      BroadcastEvent(self, "MyBooleanOutput");
    end

    MyLuaEntity.FlowEvents = {
      Inputs = {
        MyBooleanPort = { MyLuaEntity.Event_OnBooleanPort, "bool" },
      },
      Outputs = {
        MyBooleanOutput = "bool",
      },
    }

We just created an entity flownode for our MyLuaEntity class. If you start the Editor, spawn your entity, select it, and then click on Add Selected Entity in your flowgraph, you should see the node appear.

Creating an entity flownode using C#

Creating an entity flownode in C# is very simple, because the implementation is almost identical to that of regular flownodes. To create a new flownode for your entity, simply derive from EntityFlowNode<T>, where T is your entity class name:

    using CryEngine.Flowgraph;

    public class MyEntity : Entity { }

    public class MyEntityNode : EntityFlowNode<MyEntity>
    {
        [Port]
        public void Vec3Test(Vec3 input) { }

        [Port]
        public void FloatTest(float input) { }

        [Port]
        public void VoidTest() { }

        [Port]
        OutputPort<bool> BoolOutput { get; set; }
    }

We just created an entity flownode in C#. This allows us to utilize TargetEntity in our new node's logic.

Creating an entity flownode in C++

In short, entity flownodes are identical in implementation to regular nodes. The differences are the way the node is registered, and the prerequisite that the node supports TargetEntity.
Registering the entity node

We utilize the same methods for registering entity nodes as before, the only difference being that the category has to be entity, and the node name has to be the same as the entity it belongs to:

    REGISTER_FLOW_NODE("entity:MyCppEntity", CMyEntityFlowNode);

The final code

Finally, from what we've learned now, we can easily create our first entity flownode in C++:

    #include "stdafx.h"
    #include "Nodes/G2FlowBaseNode.h"

    class CMyEntityFlowNode : public CFlowBaseNode<eNCT_Instanced>
    {
      enum EInput
      {
        EIP_InputPort,
      };

      enum EOutput
      {
        EOP_OutputPort
      };

    public:
      CMyEntityFlowNode(SActivationInfo *pActInfo)
      {
      }

      virtual IFlowNodePtr Clone(SActivationInfo *pActInfo)
      {
        return new CMyEntityFlowNode(pActInfo);
      }

      virtual void ProcessEvent(EFlowEvent evt, SActivationInfo *pActInfo)
      {
      }

      virtual void GetConfiguration(SFlowNodeConfig &config)
      {
        static const SInputPortConfig inputs[] =
        {
          InputPortConfig_Void("Input", "Our first input port"),
          {0}
        };

        static const SOutputPortConfig outputs[] =
        {
          OutputPortConfig_Void("Output", "Our first output port"),
          {0}
        };

        config.pInputPorts = inputs;
        config.pOutputPorts = outputs;
        config.sDescription = _HELP("Entity flow node sample");

        // Marks this node as an entity node, enabling TargetEntity
        config.nFlags |= EFLN_TARGET_ENTITY;
      }

      virtual void GetMemoryUsage(ICrySizer *s) const
      {
        s->Add(*this);
      }
    };

    REGISTER_FLOW_NODE("entity:MyCppEntity", CMyEntityFlowNode);

Game objects

As mentioned at the start of the article, game objects are used when more advanced functionality is required of an entity, for example, if an entity needs to be bound to the network. There are two ways of implementing game objects: one is registering the entity directly via IGameObjectSystem::RegisterExtension (and thereby having the game object automatically created on entity spawn), and the other is utilizing the IGameObjectSystem::CreateGameObjectForEntity method to create a game object for an entity at runtime.
Game object extensions

It is possible to extend game objects by creating extensions, allowing the developer to hook into a number of entity and game object callbacks. This is, for example, how actors are implemented by default. We will be creating our game object extension in C++. The CryMono entity we created earlier in the article was made possible by a custom game object extension contained in CryMono.dll, and it is currently not possible to create further extensions via C# or Lua.

Creating a game object extension in C++

CryENGINE provides a helper class template for creating a game object extension, called CGameObjectExtensionHelper. This helper class is used to avoid duplicating common code that is necessary for most game object extensions, for example, basic RMI functionality. To properly implement IGameObjectExtension, simply derive from the CGameObjectExtensionHelper template, specifying the first template argument as the class you're writing (in our case, CMyGameObjectExtension) and the second as the interface you're looking to derive from. Normally, the second argument is IGameObjectExtension, but it can be different for specific implementations such as IActor (which in turn derives from IGameObjectExtension).

    class CMyGameObjectExtension
      : public CGameObjectExtensionHelper<CMyGameObjectExtension, IGameObjectExtension>
    {
    };

Now that you've derived from IGameObjectExtension, you'll need to implement all its pure virtual methods to spare yourself from a bunch of unresolved externals. Most can be overridden with empty methods that return nothing or false, while the more important ones are listed here:

- Init: This is called to initialize the extension. Simply perform SetGameObject(pGameObject); and then return true.
- NetSerialize: This is called to serialize things over the network.

You'll also need to implement IGameObjectExtensionCreatorBase in a new class that will serve as an extension factory for your entity.
When the extension is about to be activated, our factory's Create() method will be called in order to obtain the new extension instance:

    struct SMyGameObjectExtensionCreator : public IGameObjectExtensionCreatorBase
    {
      virtual IGameObjectExtension *Create()
      {
        return new CMyGameObjectExtension();
      }

      virtual void GetGameObjectExtensionRMIData(void **ppRMI, size_t *nCount)
      {
        return CMyGameObjectExtension::GetGameObjectExtensionRMIData(ppRMI, nCount);
      }
    };

Now that you've created both your game object extension implementation and the game object creator, simply register the extension:

    static SMyGameObjectExtensionCreator creator;
    gEnv->pGameFramework->GetIGameObjectSystem()->RegisterExtension("MyGameObjectExtension", &creator, myEntityClassDesc);

By passing the entity class description to IGameObjectSystem::RegisterExtension, you're telling it to create a dummy entity class for you. If you have already registered an entity class yourself, simply pass the last parameter, pEntityCls, as NULL to make it use the class you registered before.

Activating our extension

In order to activate your game object extension, you'll need to call IGameObject::ActivateExtension after the entity is spawned. One way to do this is using the entity system sink, IEntitySystemSink, and listening to the OnSpawn events. We've now registered our own game object extension. When the entity is spawned, our entity system sink's OnSpawn method will be called, allowing us to create an instance of our game object extension.

Summary

In this article, we have learned how the core entity system is implemented and exposed, and created our own custom entity. You should now be aware of the process of creating accompanying flownodes for your entities, and have a working knowledge of game objects and their extensions. If you want to get more familiar with the entity system, you can try to create a slightly more complex entity on your own.
Resources for Article: Further resources on this subject: CryENGINE 3: Breaking Ground with Sandbox [Article] CryENGINE 3: Fun Physics [Article] How to Create a New Vehicle in CryENGINE 3 [Article]
Packt
20 Nov 2013
6 min read

Overview of Process Management in Microsoft Visio 2013

(For more resources related to this topic, see here.) When Visio was first conceived of over 20 years ago, its first stated marketing aim was to outsell ABC Flowcharter, the best-selling process diagramming tool at the time. Therefore, Visio had to have all of the features from the start that are core in the creation of flowcharts, namely the ability to connect one shape to another and to have the lines route themselves around shapes. Visio soon achieved its aim, and looked for other targets to reach. So, process flow diagrams have long been a cornerstone of Visio's popularity and appeal and, although there have been some usability improvements over the years, there have been few enhancements to turn the diagrams into models that can be managed efficiently. Microsoft Visio 2010 saw the introduction of two features, structured diagrams and validation rules, that make process management achievable and customizable, and Microsoft Visio 2013 sees these features enhanced. In this article, you will be introduced to the new features that have been added to Microsoft Visio to support structured diagrams and validation. You will see where Visio fits in the Process Management stack, and explore the relevant out of the box content. Exploring the new process management features in Visio 2013 Firstly, Microsoft Visio 2010 introduced a new Validation API for structured diagrams and provided several examples of this in use, for example with the BPMN (Business Process Modeling Notation) Diagram and Microsoft SharePoint Workflow templates and the improvements to the Basic Flowchart and Cross-Functional Flowchart templates, all of which are found in the Flowchart category. Microsoft Visio 2013 has updated the version of BPMN from 1.1 to 2.0, and has introduced a new SharePoint 2013 Workflow template, in addition to the 2010 one. 
Templates in Visio consist of a predefined Visio document that has one or more pages, and may have a series of docked stencils (usually positioned on the left-hand side of workspace area). The template document may have an associated list of add-ons that are active while it is in use, and, with Visio 2013 Professional edition, an associated list of structured diagram validation rulesets as well. Most of the templates that contain validation rules in Visio 2013 are in the Flowchart category, as seen in the following screenshot, with the exception being the Six Sigma template in the Business category. Secondly, the concept of a Subprocess was introduced in Visio 2010. This enables processes to hyperlink to other pages describing the subprocesses in the same document, or even across documents. This latter point is necessary if subprocesses are stored in a document library, such as Microsoft SharePoint. The following screenshot illustrates how an existing subprocess can be associated with a shape in a larger process, selecting an existing shape in the diagram, before selecting the existing page that it links to from the drop-down menu on the Link to Existing button. In addition, a subprocess page can be created from an existing shape, or a selection of shapes, in which case they will be moved to the newly-created page. There were also a number of ease-of-use features introduced in Microsoft Visio 2010 to assist in the creation and revision of process flow diagrams. 
These include:

- Easy auto-connection of shapes
- Aligning and spacing of shapes
- Insertion and deletion of connected shapes
- Improved cross-functional flowcharts
- Subprocesses
- An infinite page option, so you need not go over the edge of the paper ever again

Microsoft Visio 2013 has added two more notable features:

- Commenting (a replacement for the old reviewer's comments)
- Co-authoring

However, this book is not about teaching the user how to use these features, since there will be many other authors willing to show you how to perform tasks that only need to be explained once. This book is about understanding the Validation API in particular, so that you can create, or amend, the rules to match the business logic that your business requires.

Reviewing Visio Process Management capabilities

Microsoft Visio now sits at the top of the Microsoft Process Management Product Stack, providing a Business Process Analysis (BPA) or Business Process Modeling (BPM) tool for business analysts, process owners/participants, and line of business software architects/developers.

Understanding the Visio BPM Maturity Model

If we look at the Visio BPM Maturity Model that Microsoft has previously presented to its partners, then we can see that Visio 2013 has filled some of the gaps that were still there after Visio 2010. However, we can also see that there are plenty of opportunities for partners to provide solutions on top of the Visio platform. The maturity model shows how Visio initially provided the means to capture paper-drawn business processes in electronic format, and included the ability to encapsulate data in each shape and infer the relationship and order between elements through connectors. Visio 2007 Professional added the ability to easily link shapes, which represent processes, tasks, decisions, gateways, and so on, with a data source. Along with that, data graphics were provided to enable shape data to be displayed simply as icons, data bars, or text, or to be colored by value.
This enriched the user experience and provided quicker visual representation of data, thus increasing the comprehension of the data in the diagrams. Generic templates for specific types of business modeling were provided. Visio had a built-in report writer for many versions, which provided the ability to export to Excel or XML, but Visio 2010 Premium introduced the concept of validation and structured diagrams, which meant that the information could be verified before exporting. Some templates for specific types of business modeling were provided. Visio 2010 Premium also saw the introduction of Visio Services on SharePoint that provided the automatic (without involving the Visio client) refreshing of data graphics that were linked to specific types of data sources. Throughout this book we will be going into detail about Level 5 (Validation) in Visio 2013, because it is important to understand the core capabilities provided in Visio 2013. We will then be able to take the opportunity to provide custom Business Rule Modeling and Visualization. Reviewing the foundations of structured diagramming A structured diagram is a set of logical relationships between items, where these relationships provide visual organization or describe special interaction behaviors between them. 
The Microsoft Visio team analyzed the requirements for adding structure to diagrams and came up with a number of features that needed to be added to the Visio product to achieve this:

- Container Management: The ability to add labeled boxes around shapes to visually organize them
- Callout Management: The ability to associate callouts with shapes to display notes
- List Management: To provide order to shapes within a container
- Validation API: The ability to test the business logic of a diagram
- Connectivity API: The ability to create, remove, or traverse connections easily

The following diagram demonstrates the use of Containers and Callouts in the construction of a basic flowchart that has been validated using the Validation API, which in turn uses the Connectivity API.

Packt
20 Nov 2013
11 min read

Setting Up NameNode HA

We will configure our NameNode HA setup by adding several options to the core-site.xml file. The following is the structure of the file for this particular step. It will give you an idea of the XML structure, if you are not familiar with it. The header comments are stripped out:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
      <property>
        <name>fs.default.name</name>
        <value>hdfs://sample-cluster/</value>
      </property>
      <property>
        <name>ha.zookeeper.quorum</name>
        <value>nn1.hadoop.test.com:2181,nn2.hadoop.test.com:2181,jt1.hadoop.test.com:2181</value>
      </property>
    </configuration>

The configuration file format is pretty much self-explanatory; variables are surrounded by the <property> tag, and each variable has a name and a value. There are only two variables that we need to add at this stage.

fs.default.name is the logical name of the NameNode cluster. The value hdfs://sample-cluster/ is specific to the HA setup. We will define the servers that comprise the cluster in the hdfs-site.xml file. In a non-HA setup, this variable is assigned the host and port of the NameNode, since there is only one NameNode in the cluster.

The ha.zookeeper.quorum variable specifies the locations and ports of the ZooKeeper servers. The ZooKeeper cluster can be used by other services, such as HBase, which is why it is defined in core-site.xml.

The next step is to configure the hdfs-site.xml file and add all HDFS-specific parameters there. I will omit the <property> tag and only include <name> and <value> to make the list less verbose.

    <name>dfs.name.dir</name>
    <value>/dfs/nn/</value>

NameNode will use the location specified by the dfs.name.dir variable to store the persistent snapshot of HDFS metadata. This is where the fsimage file will be stored. As discussed previously, the volume on which this directory resides needs to be backed by RAID.
Losing this volume means losing NameNode completely. The /dfs/nn path is an example, however you are free to choose your own. You can actually specify several paths with a dfs.name.dir value, separating them by commas. NameNode will mirror the metadata files in each directory specified. If you have a shared network storage available, you can use it as one of the destinations for HDFS metadata. This will provide additional offsite backups. <name>dfs.nameservices</name> <value>sample-cluster</value> The dfs.nameservices variable specifies the logical name of the NameNode cluster and should be replaced by something that makes sense to you, such as prod-cluster or stage-cluster. The value of dfs.nameservices must match the value of fs.default.name from the core-site.xml file. <name>dfs.ha.namenodes.sample-cluster</name> <value>nn1,nn2</value> Here, we specify the NameNodes that make up our HA cluster setup. These are logical names, not real server hostnames or IPs. These logical names will be referenced in other configuration variables. <name>dfs.namenode.rpc-address.sample-cluster.nn1</name> <value>nn1.hadoop.test.com:8020</value> <name>dfs.namenode.rpc-address.sample-cluster.nn2</name> <value>nn2.hadoop.test.com:8020</value> This pair of variables provide mapping from logical names like nn1 and nn2 to the real host and port value. By default, NameNode daemons use port 8020 for communication with clients and each other. Make sure this port is open for the cluster nodes. <name>dfs.namenode.http-address.sample-cluster.nn1</name> <value>nn1.hadoop.test.com:50070</value> <name>dfs.namenode.http-address.sample-cluster.nn2</name> <value>nn2.hadoop.test.com:50070</value> Each NameNode daemon runs a built-in HTTP server, which will be used by the NameNode web interface to expose various metrics and status information about HDFS operations. 
Additionally, the standby NameNode uses HTTP calls to periodically copy the fsimage file from the primary server, perform the checkpoint operation, and ship it back.

    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://nn1.hadoop.test.com:8485;nn2.hadoop.test.com:8485;jt1.hadoop.test.com:8485/sample-cluster</value>

The dfs.namenode.shared.edits.dir variable specifies the setup of the JournalNode cluster. In our configuration, there are three JournalNodes running on nn1, nn2, and jt1. Both primary and standby nodes will use this variable to identify which hosts they should contact to send or receive new changes from editlog.

    <name>dfs.journalnode.edits.dir</name>
    <value>/dfs/journal</value>

JournalNodes need to persist editlog changes that are being submitted to them by the active NameNode. The dfs.journalnode.edits.dir variable specifies the location on the local filesystem where editlog changes will be stored. Keep in mind that this path must exist on all JournalNodes and the ownership of all directories must be set to hdfs:hdfs (user and group).

    <name>dfs.client.failover.proxy.provider.sample-cluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

In an HA setup, clients that access HDFS need to know which NameNode to contact for their requests. The dfs.client.failover.proxy.provider.sample-cluster variable specifies the Java class name, which will be used by clients for determining the active NameNode. At the moment, there is only ConfiguredFailoverProxyProvider available.

    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>

The dfs.ha.automatic-failover.enabled variable indicates whether the NameNode cluster will use manual or automatic failover.

    <name>dfs.ha.fencing.methods</name>
    <value>sshfence
    shell(/bin/true)</value>

Orchestrating failover in a cluster setup is a complicated task involving multiple steps.
One of the common problems, which is not unique to the Hadoop cluster but affects any distributed system, is a "split-brain" scenario. Split-brain is a case where two NameNodes decide they both play an active role and start writing changes to the editlog. To prevent such an issue from occurring, the HA configuration maintains a marker in ZooKeeper, clearly stating which NameNode is active, and JournalNodes accept writes only from that node. To be absolutely sure that the two NameNodes don't become active at the same time, a technique called fencing is used during failover. The idea is to force the shutdown of the active NameNode before transferring the active state to a standby.

There are two fencing methods currently available: sshfence and shell.

The sshfence method requires passwordless ssh access, as the user that starts the NameNode daemon, from the active NameNode to the standby and vice versa. By default, this is the hdfs user. The fencing process checks if there is anyone listening on the NameNode port using the nc command, and if the port is found busy, it tries to kill the NameNode process.

The other option for dfs.ha.fencing.methods is shell. This will execute the specified shell script to perform fencing.

It is important to understand that failover will fail if fencing fails. If sshfence were the only method configured and the primary NameNode machine went down, the ssh method would fail and no failover would be performed. We want to avoid this, so we fail over anyway, even without fencing, which, as already mentioned, is safe with our setup. To achieve this, we specify two fencing methods, which ZKFC tries in order: if the first one fails, the second one is tried. In our case, the second one always returns success, so failover will be initiated even if the server running the primary NameNode is not available via ssh.
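The try-in-order behaviour described above can be sketched in a few lines of Python. This only illustrates the logic and is not how ZKFC is actually implemented: the names fence, ssh_fence, and always_succeed are hypothetical stand-ins for the fencing step and for the sshfence and shell(/bin/true) methods.

```python
def ssh_fence(target_reachable):
    """Hypothetical stand-in for sshfence: it can only work if the host is reachable."""
    return target_reachable


def always_succeed(target_reachable):
    """Hypothetical stand-in for shell(/bin/true): reports success unconditionally."""
    return True


def fence(methods, target_reachable):
    """Try each fencing method in order and return the name of the first success.

    Returns None when every method fails, in which case failover must not proceed.
    """
    for method in methods:
        if method(target_reachable):
            return method.__name__
    return None


if __name__ == "__main__":
    configured = [ssh_fence, always_succeed]
    # Old active NameNode still reachable: the first method does the job.
    print(fence(configured, target_reachable=True))
    # Host is down: ssh fails, the fallback still lets failover proceed.
    print(fence(configured, target_reachable=False))
    # With sshfence alone, a dead host would block failover entirely.
    print(fence([ssh_fence], target_reachable=False))
```

The last call shows why the second method is listed at all: without it, an unreachable host leaves no successful fencing method and failover is refused.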
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/var/lib/hadoop-hdfs/.ssh/id_rsa</value>

The last option we need to configure for the NameNode HA setup is the ssh key, which will be used by sshfence. Make sure you change the ownership of this file to the hdfs user. Two keys need to be generated, one for the primary and one for the secondary NameNode. It is a good idea to test ssh access as the hdfs user in both directions to make sure it is working fine.

The hdfs-site.xml configuration file is now all set for testing the HA setup. Don't forget to sync these configuration files to all the nodes in the cluster. The next thing that needs to be done is to start the JournalNodes. Execute this command on nn1, nn2, and jt1 as the root user:

    # service hadoop-hdfs-journalnode start

With CDH, it is recommended to always use the service command instead of calling scripts in /etc/init.d/ directly. This is done to guarantee that all environment variables are set up properly before the daemon is started. Always check the logfiles for daemons.

Now, we need to initially format HDFS. For this, run the following command on nn1:

    # sudo -u hdfs hdfs namenode -format

This is the initial setup of the NameNode, so we don't have to worry about affecting any HDFS metadata, but be careful with this command, because it will destroy any previous metadata entries. There is no strict requirement to run the format command on nn1, but to make it easier to follow, let's assume we want nn1 to become the active NameNode. The format command will also format the storage for the JournalNodes.

The next step is to create an entry for the HA cluster in ZooKeeper, and start NameNode and ZKFC on the first NameNode.
In our case, this is nn1:

    # sudo -u hdfs hdfs zkfc -formatZK
    # service hadoop-hdfs-namenode start
    # service hadoop-hdfs-zkfc start

Check the ZKFC log file (by default, it is in /var/log/hadoop-hdfs/) to make sure nn1 is now the active NameNode:

    INFO org.apache.hadoop.ha.ZKFailoverController: Trying to make NameNode at nn1.hadoop.test.com/192.168.0.100:8020 active...
    INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at nn1.hadoop.test.com/192.168.0.100:8020 to active state

To activate the secondary NameNode, an operation called bootstrapping needs to be performed. To do this, execute the following command on nn2:

    # sudo -u hdfs hdfs namenode -bootstrapStandby

This will pull the current filesystem state from the active NameNode and synchronize the secondary NameNode with the JournalNodes quorum.

Now, you are ready to start the NameNode daemon and the ZKFC daemon on nn2. Use the same commands that you used for nn1. Check the ZKFC log file to make sure nn2 successfully acquired the secondary NameNode role. You should see the following messages at the end of the logfile:

    INFO org.apache.hadoop.ha.ZKFailoverController: ZK Election indicated that NameNode at nn2.hadoop.test.com/192.168.0.101:8020 should become standby
    INFO org.apache.hadoop.ha.ZKFailoverController: Successfully transitioned NameNode at nn2.hadoop.test.com/192.168.0.101:8020 to standby state

This is the last step in configuring NameNode HA. It is a good idea to verify that automatic failover is configured correctly and that it will behave as expected in the case of a primary NameNode outage. Testing failover during the cluster setup stage is easier and safer than discovering in production that failover doesn't work and causing a cluster outage. You can perform a simple test: kill the primary NameNode daemon and verify that the secondary takes over its role. After that, bring the old primary back online and make sure it takes over the secondary role.
You can execute the following command to get the current status of NameNode nn1:

    # sudo -u hdfs hdfs haadmin -getServiceState nn1

The hdfs haadmin command can also be used to initiate a failover in a manual failover setup. At this point, you have a fully configured and functional NameNode HA setup.

Summary

We saw in this article how to configure Hadoop's NameNode HA.

Resources for Article: Further resources on this subject: Advanced Hadoop MapReduce Administration [Article] Managing a Hadoop Cluster [Article] Making Big Data Work for Hadoop and Solr [Article]

Packt
20 Nov 2013
5 min read

Using memcached with Python

If you want to make such a connection, there are several clients available for you. The most popular ones are:

  • python-memcached: This is a pure-Python implementation of the memcached client. It offers good performance and is extremely simple to install and use.
  • pylibmc: This is a Python wrapper around the libmemcached C/C++ library. It offers excellent performance, thread safety, and light memory usage, yet it's not as simple to install as python-memcached, since you will need to have the libmemcached library compiled and installed on your system.
  • Twisted memcache: This client is part of Twisted, the event-driven networking engine for Python. It offers a reactive code structure and excellent performance as well. It is not as simple to use as pylibmc or python-memcached, but it fits perfectly if your entire application is built on Twisted.

In this recipe, we will be using python-memcached for the sake of simplicity; since the other clients have almost the same API, it does not make much difference from a developer's perspective.

Getting ready

It's always a good idea to create a virtualenv for your experiments, to keep them contained and avoid polluting the global system with the packages you install. You can create a virtualenv easily:

    virtualenv memcache_experiments
    source memcache_experiments/bin/activate

We will need to install python-memcached first, using the pip package manager. With the virtualenv activated, sudo is not needed (using it would install the package system-wide instead):

    pip install python-memcached

How to do it...

Let's start with a simple set and get script:

    import memcache

    client = memcache.Client([('127.0.0.1', 11211)])
    sample_obj = {"name": "Soliman", "lang": "Python"}
    client.set("sample_user", sample_obj, time=15)
    print("Stored to memcached, will auto-expire after 15 seconds")
    print(client.get("sample_user"))

Save the script into a file called memcache_test1.py and run it using python memcache_test1.py.
On running the script you should see something like the following:

    Stored to memcached, will auto-expire after 15 seconds
    {'lang': 'Python', 'name': 'Soliman'}

Let's now try other memcached features:

    import memcache

    client = memcache.Client([('127.0.0.1', 11211)])
    client.set("counter", "10")
    client.incr("counter")
    print("Counter was incremented on the server by 1, now it's %s" % client.get("counter"))
    client.incr("counter", 9)
    print("Counter was incremented on the server by 9, now it's %s" % client.get("counter"))
    client.decr("counter")
    print("Counter was decremented on the server by 1, now it's %s" % client.get("counter"))

The output of the script looks like the following:

    Counter was incremented on the server by 1, now it's 11
    Counter was incremented on the server by 9, now it's 20
    Counter was decremented on the server by 1, now it's 19

The incr and decr methods let you specify a delta value; by default they increment or decrement by 1.

Alright, now let's sync a Python dict to memcached with a certain prefix:

    import memcache

    client = memcache.Client([('127.0.0.1', 11211)])
    data = {"some_key1": "value1", "some_key2": "value2"}
    client.set_multi(data, time=15, key_prefix="pfx_")
    print("saved the dict with prefix pfx_")
    print("getting one key: %s" % client.get("pfx_some_key1"))
    print("Getting all values: %s" % client.get_multi(["some_key1", "some_key2"], key_prefix="pfx_"))

How it works...

In the first script, we connect to the memcached server(s) using the Client constructor, and then use the set method to store a standard Python dict as the value of the "sample_user" key. After that, we use the get method to retrieve the value. The client automatically serializes the Python dict when storing it in memcached and deserializes the object after getting it back from the memcached server.

In the second script, we are playing with some of the features we never tried in the memcached server.
The incr and decr methods allow you to increment and decrement integer values directly on the server, atomically.

In the third script, we use another feature that we didn't play with before: set_multi and get_multi, which allow us to set or get multiple key/value pairs in a single request. They also allow us to add a certain prefix to all the keys during the set or get operations.

The output of the last script should look like the following:

    saved the dict with prefix pfx_
    getting one key: value1
    Getting all values: {'some_key1': 'value1', 'some_key2': 'value2'}

There's more...

In the Client constructor, we specified the server hostname and port in a tuple (host, port) and passed that in a list of servers. This allows you to connect to a cluster of memcached servers by adding more servers to this list. For example:

    client = memcache.Client([('host1', 1121), ('host2', 1121), ('host3', 1122)])

You can also specify custom picklers/unpicklers to tell the memcached client how to serialize or deserialize Python types using your own algorithm.

Summary

Thus we learned how to connect to memcached servers from a Python application.

Resources for Article: Further resources on this subject: Working with Databases [Article] Testing and Debugging in Grok 1.0: Part 2 [Article] Debugging AJAX using Microsoft AJAX Library, Internet Explorer and Mozilla Firefox [Article]
Packt
19 Nov 2013
6 min read

Code interlude – signals and slots

Qt offers a better way: signals and slots. As with an event, the sending component generates a signal (in Qt parlance, the object emits a signal), which recipient objects may receive in a slot provided for the purpose. Qt objects may emit more than one signal, and signals may carry arguments; in addition, multiple Qt objects can have slots connected to the same signal, making it easy to arrange one-to-many notifications. Equally important, if no object is interested in a signal, it can be safely ignored, with no slots connected to it. Any object that inherits from QObject, Qt's base class for objects, can emit signals or provide slots for connection to signals. Under the hood, Qt provides extensions to C++ syntax for declaring signals and slots.

A simple example will help make this clear. The classic example you find in the Qt documentation is an excellent one, and we'll use it again here, with some extensions. Imagine you have the need for a counter, that is, a container that holds an integer. In C++, you might write:

    class Counter
    {
    public:
        Counter() { m_value = 0; }
        int value() const { return m_value; }
        void setValue(int value);

    private:
        int m_value;
    };

The Counter class has a single private member, m_value, bearing its value. Clients can invoke value to obtain the counter's value, or set its value by invoking setValue with a new value.

In Qt, using signals and slots, we write the class this way:

    #include <QObject>

    class Counter : public QObject
    {
        Q_OBJECT

    public:
        Counter() { m_value = 0; }
        int value() const { return m_value; }

    public slots:
        void setValue(int value);
        void increment();
        void decrement();

    signals:
        void valueChanged(int newValue);

    private:
        int m_value;
    };

This Counter class inherits from QObject, the base class for all Qt objects.
All QObject subclasses must include the declaration Q_OBJECT as the first element of their definition; this macro expands to Qt code implementing the subclass-specific glue necessary for the Qt object and signal-slot mechanism.

The constructor remains the same, initializing our private member to zero. Similarly, the accessor method value remains the same, returning the current value for the counter.

Slots are declared using a Qt extension to C++; ours are declared under public slots (slots may also be protected or private, but public slots are the common case). This code defines three slots: a setValue slot, which accepts a new value for the counter, and the increment and decrement slots, which increment and decrement the value of the counter. Slots may take arguments, but do not return them; the communication between a signal and its slots is one way, initiating with the signal and terminating with the slot(s) connected to the signal.

The counter offers a single signal. Like slots, signals are also declared using a Qt extension to C++, signals. In the example above, a Counter object emits the signal valueChanged with a single argument, which is the new value of the counter. A signal is a function signature, not a method; Qt's extensions to C++ use the type signature of signals and slots to ensure type safety between signal-slot connections, a key advantage signals and slots have over other decoupled messaging schemes.

As the developers, it's our responsibility to implement each slot in our class with whatever application logic makes sense. The Counter class's slots look like this:

    void Counter::setValue(int newValue)
    {
        if (newValue != m_value) {
            m_value = newValue;
            emit valueChanged(newValue);
        }
    }

    void Counter::increment()
    {
        setValue(value() + 1);
    }

    void Counter::decrement()
    {
        setValue(value() - 1);
    }

We use the implementation of the setValue slot as a method, which is what all slots are at their heart.
The setValue slot takes a new value and assigns it to the Counter class's private member variable if the two aren't the same. Then, the slot emits the valueChanged signal, using the Qt extension emit, which triggers an invocation of the slots connected to the signal. This is a common pattern for slots that handle object properties: testing the property to be set for equality with the new value, and only assigning and emitting a signal if the values are unequal.

If we had a button, say QPushButton, we could connect its clicked signal to the increment or decrement slot, so that a click on the button incremented or decremented the counter. I'd do that using the QObject::connect method, like this:

    QPushButton* button = new QPushButton(tr("Increment"), this);
    Counter* counter = new Counter(this);
    QObject::connect(button, SIGNAL(clicked()), counter, SLOT(increment()));

We first create the QPushButton and Counter objects. The QPushButton constructor takes a string, the label for the button, which we denote to be the string Increment or its localized counterpart. (For new Counter(this) to compile, the Counter constructor must accept a parent QObject and pass it on to the QObject constructor; the minimal declaration shown earlier omits this.)

Why do we pass this to each constructor? Qt provides parent-child memory management between QObjects and their descendants, easing clean-up when you're done using an object. When you free an object, Qt also frees any children of that object, so you don't have to. The parent-child relationship is set at construction time; I'm signaling to the constructors that when the object invoking this code is freed, the push button and counter may be freed as well. (Of course, the invoking object must also be an instance of a QObject subclass for this to work.)

Next, I call QObject::connect, passing first the source object and the signal to be connected, and then the receiver object and the slot to which the signal should be sent. The types of the signal and the slot must match, and the signals and slots must be wrapped in the SIGNAL and SLOT macros, respectively.
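For readers who find the pattern easier to follow outside C++, here is a minimal plain-Python analogy of the Counter class. It does not use Qt at all: the Signal class and its connect and emit methods are hypothetical helpers written only for this sketch, and they mimic just the one-to-many notification and emit-only-on-change behaviour described above, without Qt's type checking or parent-child memory management.

```python
class Signal:
    """Hypothetical helper: keeps a list of callables (slots) and calls them on emit."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        # Iterate over a copy so a slot may connect further slots safely.
        for slot in list(self._slots):
            slot(*args)


class Counter:
    def __init__(self):
        self._value = 0
        self.value_changed = Signal()

    @property
    def value(self):
        return self._value

    def set_value(self, new_value):
        # Same pattern as the C++ slot: assign and emit only on a real change.
        if new_value != self._value:
            self._value = new_value
            self.value_changed.emit(new_value)

    def increment(self):
        self.set_value(self._value + 1)

    def decrement(self):
        self.set_value(self._value - 1)


if __name__ == "__main__":
    a, b = Counter(), Counter()
    seen = []
    a.value_changed.connect(b.set_value)   # one signal ...
    a.value_changed.connect(seen.append)   # ... many receivers
    a.increment()
    a.increment()
    a.set_value(2)  # no change, so nothing is emitted
    print(a.value, b.value, seen)  # prints: 2 2 [1, 2]
```

The equality test in set_value is what keeps the third call silent, and in a design where two objects are connected to each other it is also what stops notifications from bouncing back and forth forever.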
Signals can also be connected to signals; when that happens, the signals are chained and trigger any slots connected to the downstream signals. Connecting one object's signal to another object's slot works just as before. For example, I could write:

    Counter a, b;
    QObject::connect(&a, SIGNAL(valueChanged(int)), &b, SLOT(setValue(int)));

This connects the counter b with the counter a, so that any change in value to the counter a also changes the value of the counter b.

Signals and slots are used throughout Qt, both for user interface elements and to handle asynchronous operations, such as the presence of data on network sockets and HTTP transaction results. Under the hood, signals and slots are very efficient, boiling down to function dispatch operations, so you shouldn't hesitate to use the abstraction in your own designs. Qt provides a special build tool, the meta-object compiler, which compiles the extensions to C++ that signals and slots require and generates the additional code necessary to implement the mechanism.

Summary

In this article we looked at how events are commonly used to couple different objects: components offering data encapsulate that data in an event, and an event loop (or, more recently, an event listener) catches the event and performs some action. We then saw how Qt's signals and slots offer a better way to achieve the same decoupling.

Resources for Article: Further resources on this subject: One-page Application Development [Article] Android 3.0 Application Development: Multimedia Management [Article] Creating and configuring a basic mobile application [Article]

Packt
19 Nov 2013
6 min read

Spatial Data Services

So far we have worked with relatively small sets of data; for larger collections, Bing Maps provides Spatial Data Services. They offer the Data Source Management API to load large datasets into Bing Maps servers, and the Query API to query the data.

In this article, we will use the Geocode Dataflow API, which provides geocoding and reverse geocoding of large datasets. Geocoding is the process of finding geographic coordinates from other geographic data, such as street addresses or postal codes. Reverse geocoding is the opposite process, where the coordinates are used to find their associated textual locations, such as addresses and postal codes.

Bing Maps implements these processes by creating jobs on Bing Maps servers and querying them later. The whole process can be automated, which is ideal for huge amounts of data.

Please note that strict rules of data usage apply to the Spatial Data Services (please refer to http://msdn.microsoft.com/en-us/library/gg585136.aspx for full details). At the time of writing, a user with a basic account can create up to 5 jobs in a 24-hour period.

Our task in this article is to geocode the addresses of ten of the biggest technology companies in the world, such as Microsoft, Google, Apple, Facebook, and so on, and then display them on the map. The first step is to prepare the file with the companies' addresses.

Geocoding dataflow input data

The input and output data can be supplied in the following formats:

  • XML (content type application/xml)
  • Comma-separated values (text/plain)
  • Tab-delimited values (text/plain)
  • Pipe-delimited values (text/plain)
  • Binary (application/octet-stream), used with the Blob Service REST API

We will use the XML format, for its clearer declarative structure. Now, let's open Visual Studio and create a new C# Console project named LBM.Geocoder.
We then add a Data folder, which will contain all the data files and samples with which we'll work in this article, starting with the data.xml file we need to upload to the Spatial Data servers to be geocoded.

    <?xml version="1.0" encoding="utf-8"?>
    <GeocodeFeed Version="2.0">
      <GeocodeEntity Id="001">
        <GeocodeRequest Culture="en-US" IncludeNeighborhood="1">
          <Address AddressLine="1 Infinite Loop" AdminDistrict="CA" Locality="Cupertino" PostalCode="95014" />
        </GeocodeRequest>
      </GeocodeEntity>
      <GeocodeEntity Id="002">
        <GeocodeRequest Culture="en-US" IncludeNeighborhood="1">
          <Address AddressLine="185 Berry St" AdminDistrict="NY" Locality="New York" PostalCode="10038" />
        </GeocodeRequest>
      </GeocodeEntity>

The listing above is a fragment of that file with the addresses of the companies' headquarters. Please note that the more addressing information we provide to the API, the better the quality of the geocoding we receive. In production, this file would be created programmatically, probably based on an Addresses database. The Id of each GeocodeEntity could also be stored, so that the data is easier to match once fetched from the servers. (You can find the Geocode Dataflow Data Schema, Version 2, at http://msdn.microsoft.com/en-us/library/jj735477.aspx.)

The job

Let's add a Job class to our project:

    public class Job
    {
        private readonly string dataFilePath;

        public Job(string dataFilePath)
        {
            this.dataFilePath = dataFilePath;
        }
    }

The dataFilePath argument is the path to the data.xml file we created earlier.
Creating the job is as easy as calling a REST URL:

    public void Create()
    {
        var uri = String.Format("{0}?input=xml&output=xml&key={1}",
            Settings.DataflowUri, Settings.Key);
        var data = File.ReadAllBytes(dataFilePath);
        try
        {
            var wc = new WebClient();
            wc.Headers.Add("Content-Type", "application/xml");
            var receivedBytes = wc.UploadData(uri, "POST", data);
            ParseJobResponse(receivedBytes);
        }
        catch (WebException e)
        {
            // The HTTP status code tells us why the job could not be created.
            var response = (HttpWebResponse)e.Response;
            var status = response.StatusCode;
        }
    }

We place all the API URLs and other settings in the Settings class:

    public class Settings
    {
        public static string Key = "[YOUR BING MAPS KEY]";
        public static string DataflowUri =
            "https://spatial.virtualearth.net/REST/v1/dataflows/geocode";
        public static XNamespace XNamespace =
            "http://schemas.microsoft.com/search/local/ws/rest/v1";
        public static XNamespace GeocodeFeedXNamespace =
            "http://schemas.microsoft.com/search/local/2010/5/geocode";
    }

To create the job, we need to build the Dataflow URL from the template, a Bing Maps Key, and parameters such as the input and output formats. We specify both formats to be XML. Next, we use a WebClient instance to upload the data with an HTTP POST request. Then, we parse the server response:

    private void ParseJobResponse(byte[] response)
    {
        using (var stream = new MemoryStream(response))
        {
            var xDoc = XDocument.Load(stream);
            var job = xDoc.Descendants(Settings.XNamespace + "DataflowJob").FirstOrDefault();
            if (job == null) return;
            var linkEl = job.Element(Settings.XNamespace + "Link");
            if (linkEl != null) Link = linkEl.Value;
        }
    }

Here, we pass the stream created with the bytes received from the server to the XDocument.Load method. This produces an XDocument instance, which we will use to extract the data we need; the job link is stored in Link, a string member of the Job class. We will apply a similar process throughout the article to parse XML content. Note that the appropriate XNamespace needs to be supplied in order to navigate through the document nodes.
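To see why the namespace matters, here is a small sketch in Python (used purely for illustration; the article's project itself is C#) that parses a made-up response with the standard library's xml.etree.ElementTree. The XML string is a hypothetical, heavily trimmed stand-in for a real job-creation response, and the job URL in it is invented; only the namespace URI and the DataflowJob and Link element names come from the text above.

```python
import xml.etree.ElementTree as ET

# Namespace URI taken from the Settings class shown above.
NS = "http://schemas.microsoft.com/search/local/ws/rest/v1"

# Hypothetical, trimmed response; a real one carries many more elements.
SAMPLE = (
    '<Response xmlns="http://schemas.microsoft.com/search/local/ws/rest/v1">'
    "<DataflowJob>"
    "<Link>https://example.invalid/dataflows/geocode/job-123</Link>"
    "</DataflowJob>"
    "</Response>"
)


def find_job_link(xml_text):
    """Return the text of the first DataflowJob/Link element, or None if absent."""
    root = ET.fromstring(xml_text)
    # Element names must be qualified as {namespace}name to match.
    job = next(root.iter("{%s}DataflowJob" % NS), None)
    if job is None:
        return None
    link = job.find("{%s}Link" % NS)
    return link.text if link is not None else None


def find_job_without_namespace(xml_text):
    """Same lookup with a bare element name; it finds nothing in a namespaced document."""
    root = ET.fromstring(xml_text)
    return next(root.iter("DataflowJob"), None)


if __name__ == "__main__":
    print(find_job_link(SAMPLE))
    print(find_job_without_namespace(SAMPLE))
```

The second function returns None even though the element is plainly there, which is the same failure you would get in C# by calling Descendants("DataflowJob") without prefixing Settings.XNamespace.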
You can find a sample of the response inside the Data folder (jobSetupResponse.xml), which shows that the link to the job created is found under a Link element within the DataflowJob node.

Getting job status

Once we have set up a job, we can store the link in a data store, such as a database, and check its status later. The data will be available on the Microsoft servers for up to 14 days after creation. Let's see how we can query the job status:

    public static JobStatus CheckStatus(string jobUrl)
    {
        var result = new JobStatus();
        var uri = String.Format("{0}?output=xml&key={1}", jobUrl, Settings.Key);
        var xDoc = XDocument.Load(uri);
        var job = xDoc.Descendants(Settings.XNamespace + "DataflowJob").FirstOrDefault();
        if (job != null)
        {
            var linkEls = job.Elements(Settings.XNamespace + "Link").ToList();
            foreach (var linkEl in linkEls)
            {
                var nameAttr = linkEl.Attribute("name");
                if (nameAttr != null)
                {
                    if (nameAttr.Value == "succeeded") result.SucceededLink = linkEl.Value;
                    if (nameAttr.Value == "failed") result.FailedLink = linkEl.Value;
                }
            }
            var statusEl = job.Elements(Settings.XNamespace + "Status").FirstOrDefault();
            if (statusEl != null) result.Status = statusEl.Value;
        }
        return result;
    }

By now, we know that to query a Data API we first need to build the URL from a template. We do this by attaching the Bing Maps Key and an output parameter to the job link. The response we get from the server stores the job status in a Status element of the DataflowJob node, and the links to the results in Link elements of the same node (the jobResponse.xml file inside the Data folder contains an example). The link we need has a name attribute with the value succeeded.

Summary

When it comes to large amounts of data, the Spatial Data Services offer a number of interfaces to store and query user data, and to geocode addresses or reverse geocode geographical coordinates. The services perform these tasks by means of background jobs, which can be set up and queried through REST URLs.
Resources for Article: Further resources on this subject: NHibernate 2: Mapping relationships and Fluent Mapping [Article] Using Maps in your Windows Phone App [Article] What is OpenLayers? [Article]