Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Events
Videos
Audiobooks
Packt Hub
Free Learning
Arrow right icon
timer SALE ENDS IN
0 Days
:
00 Hours
:
00 Minutes
:
00 Seconds

How-To Tutorials

7019 Articles
article-image-advanced-data-operations
Packt
30 Oct 2013
11 min read
Save for later

Advanced Data Operations

Packt
30 Oct 2013
11 min read
(For more resources related to this topic, see here.) Recipe 1 – handling multi-valued cells It is a common problem in many tables: what do you do if multiple values apply to a single cell? For instance, consider a Clients table with the usual name, address, and telephone fields. A typist is adding new contacts to this table, when he/she suddenly discovers that Mr. Thompson has provided two addresses with a different telephone number for each of them. There are essentially three possible reactions to this: Adding only one address to the table: This is the easiest thing to do, as it eliminates half of the typing work. Unfortunately, this implies that half of the information is lost as well, so the completeness of the table is in danger. Adding two rows to the table: While the table is now complete, we now have redundant data. Redundancy is also dangerous, because it leads to error: the two rows might accidentally be treated as two different Mr. Thompsons, which can quickly become problematic if Mr. Thompson is billed twice for his subscription. Furthermore, as the rows have no connection, information updated in one of them will not automatically propagate to the other. Adding all information to one row: In this case, two addresses and two telephone numbers are added to the respective fields. We say the field is overloaded with regard to its originally envisioned definition. At first sight, this is both complete yet not redundant, but a subtle problem arises. While humans can perfectly make sense of this information, automated processes cannot. Imagine an envelope labeler, which will now print two addresses on a single envelope, or an automated dialer, which will treat the combined digits of both numbers as a single telephone number. The field has indeed lost its precise semantics. Note that there are various technical solutions to deal with the problem of multiple values, such as table relations. However, if you are not in control of the data model you are working with, you'll have to choose any of the preceding solutions. Luckily, OpenRefine is able to offer the best of both worlds. Since it is also an automated piece of software, it needs to be informed whether a field is multi-valued before it can perform sensible operations on it. In the Powerhouse Museum dataset, the Categories field is multi-valued, as each object in the collection can belong to different categories. Before we can perform meaningful operations on this field, we have to tell OpenRefine to somehow treat it a little different. Suppose we want to give the Categories field a closer look to check how many different categories are there and which categories are the most prominent. First, let's see what happens if we try to create a text facet on this field by clicking on the dropdown next to Categories and navigating to Facet| Text Facet as shown in the following screenshot. This doesn't work as expected because there are too many combinations of individual categories. OpenRefine simply gives up, saying that there are 14,805 choices in total, which is above the limit for display. While you can increase the maximum value by clicking on Set choice count limit, we strongly advise against this. First of all, it would make OpenRefine painfully slow as it would offer us a list of 14,805 possibilities, which is too large for an overview anyway. Second, it wouldn't help us at all because OpenRefine would only list the combined field values (such as Hen eggs | Sectional models | Animal Samples and Products). This does not allow us to inspect the individual categories, which is what we're interested in. To solve this, leave the facet open, but go to the Categories dropdown again and select Edit Cells| Split multi-valued cells…as shown in the following screenshot: OpenRefine now asks What separator currently separates the values?. As we can see in the first few records, the values are separated by a vertical bar or pipe character, as the horizontal line tokens are called. Therefore, enter a vertical bar |in the dialog. If you are not able to find the corresponding key on your keyboard, try selecting the character from one of the Categories cells and copying it so you can paste it in the dialog. Then, click on OK. After a few seconds, you will see that OpenRefine has split the cell values, and the Categories facet on the left now displays the individual categories. By default, it shows them in alphabetical order, but we will get more valuable insights if we sort them by the number of occurrences. This is done by changing the Sort by option from name to count, revealing the most popular categories. One thing we can do now, which we couldn't do when the field was still multi-valued is changing the name of a single category across all records. For instance, to change the name of Clothing and Dress, hover over its name in the created Categories facet and click on the edit link, as you can see in the following screenshot: Enter a new name such as Clothing and click on Apply. OpenRefine changes all occurrences of Clothing and Dress into Clothing, and the facet is updated to reflect this modification. Once you are done editing the separate values, it is time to merge them back together. Go to the Categories dropdown, navigate to Edit cells| Join multi-valued cells…, and enter the separator of your choice. This does not need to be the same separator as before, and multiple characters are also allowed. For instance, you could opt to separate the fields with a comma followed by a space. Recipe 3 – clustering similar cells Thanks to OpenRefine, you don't have to worry about inconsistencies that slipped in during the creation process of your data. If you have been investigating the various categories after splitting the multi-valued cells, you might have noticed that the same category labels do not always have the same spelling. For instance, there is Agricultural Equipment and Agricultural equipment(capitalization differences), Costumes and Costume(pluralization differences), and various other issues. The good news is that these can be resolved automatically; well, almost. But, OpenRefine definitely makes it a lot easier. The process of finding the same items with slightly different spelling is called clustering. After you have split multi-valued cells, you can click on the Categories dropdown and navigate to Edit cells| Cluster and edit…. OpenRefine presents you with a dialog box where you can choose between different clustering methods, each of which can use various similarity functions. When the dialog opens, key collision and fingerprint have been chosen as default settings. After some time (this can take a while, depending on the project size), OpenRefine will execute the clustering algorithm on the Categories field. It lists the found clusters in rows along with the spelling variations in each cluster and the proposed value for the whole cluster, as shown in the following screenshot: Note that OpenRefine does not automatically merge the values of the cluster. Instead, it wants you to confirm whether the values indeed point to the same concept. This avoids similar names, which still have a different meaning, accidentally ending up as the same. Before we start making decisions, let's first understand what all of the columns mean. The Cluster Size column indicates how many different spellings of a certain concept were thought to be found. The Row Count column indicates how many rows contain either of the found spellings. In Values in Cluster, you can see the different spellings and how many rows contain a particular spelling. Furthermore, these spellings are clickable, so you can indicate which one is correct. If you hover over the spellings, a Browse this cluster link appears, which you can use to inspect all items in the cluster in a separate browser tab. The Merge column contains a checkbox. If you check it, all values in that cluster will be changed to the value in the New Cell Value column when you click on one of the Merge Selected buttons. You can also manually choose a new cell value if the automatic value is not the best choice. So, let's perform our first clustering operation. I strongly advise you to scroll carefully through the list to avoid clustering values that don't belong together. In this case, however, the algorithm hasn't acted too aggressively: in fact, all suggested clusters are correct. Instead of manually ticking the Merge? checkbox on every single one of them, we can just click on Select All at the bottom. Then, click on the Merge Selected & Re-Cluster button, which will merge all the selected clusters but won't close the window yet, so we can try other clustering algorithms as well. OpenRefine immediately reclusters with the same algorithm, but no other clusters are found since we have merged all of them. Let's see what happens when we try a different similarity function. From the Keying Function menu, click on ngram fingerprint. Note that we get an additional parameter, Ngram Size, which we can experiment with to obtain less or more aggressive clustering. We see that OpenRefine has found several clusters again. It might be tempting to click on the Select All button again, but remember we warned to carefully inspect all rows in the list. Can you spot the mistake? Have a closer look at the following screenshot: Indeed, the clustering algorithm has decided that Shirts and T-shirts are similar enough to be merged. Unfortunately, this is not true. So, either manually select all correct suggestions, or deselect the ones that are not. Then, click on the Merge Selected & Re-Cluster button. Apart from trying different similarity functions, we can also try totally different clustering methods. From the Method menu, click on nearest neighbor. We again see new clustering parameters appear (Radius and Block Chars, but we will use their default settings for now). OpenRefine again finds several clusters, but now, it has been a little too aggressive. In fact, several suggestions are wrong, such as the Lockets / Pockets / Rockets cluster. Some other suggestions, such as "Photocopiers" and "Photocopier", are fine. In this situation, it might be best to manually pick the few correct ones among the many incorrect clusters. Assuming that all clusters have been identified, click on the Merge Selected & Close button, which will apply merging to the selected items and take you back into the main OpenRefine window. If you look at the data now or use a text facet on the Categories field, you will notice that the inconsistencies have disappeared. What are clustering methods? OpenRefine offers two different clustering methods, key collision and nearest neighbor, which fundamentally differ in how they function. With key collision, the idea is that a keying function is used to map a field value to a certain key. Values that are mapped to the same key are placed inside the same cluster. For instance, suppose we have a keying function which removes all spaces; then, A B C, AB C, and ABC will be mapped to the same key: ABC. In practice, the keying functions are constructed in a more sophisticated and helpful way. Nearest neighbor, on the other hand, is a technique in which each unique value is compared to every other unique value using a distance function. For instance, if we count every modification as one unit, the distance between Boot and Bots is 2: one addition and one deletion. This corresponds to an actual distance function in OpenRefine, namely levenshtein. In practice, it is hard to predict which combination of method and function is the best for a given field. Therefore, it is best to try out the various options, each time carefully inspecting whether the clustered values actually belong together. The OpenRefine interface helps you by putting the various options in the order they are most likely to help: for instance, trying key collision before nearest neighbor. Summary In this article we learned about how to handle multi-valued cells and clustering of similar cells in OpenRefine. Multi-valued cells are a common problem in many tables. This article showed us what to do if multiple values apply to a single cell. Since OpenRefine is an automated piece of software, it needs to be informed whether a field is multi-valued before it can perform sensible operations on it. This article also showed an example of how to go about it. It also shed light on clustering methods. OpenRefine offers two different clustering methods, key collision and nearest neighbor , which fundamentally differ in how they function. With key collision, the idea is that a keying function is used to map a field value to a certain key. Values that are mapped to the same key are placed inside the same cluster. Resources for Article : Further resources on this subject: Business Intelligence and Data Warehouse Solution - Architecture and Design [Article] Self-service Business Intelligence, Creating Value from Data [Article] Oracle Business Intelligence : Getting Business Information from Data [Article]
Read more
  • 0
  • 0
  • 1891

article-image-web-app-penetration-testing-kali
Packt
30 Oct 2013
4 min read
Save for later

Web app penetration testing in Kali

Packt
30 Oct 2013
4 min read
(For more resources related to this topic, see here.) Web apps are now a major part of today's World Wide Web. Keeping them safe and secure is the prime focus of webmasters. Building web apps from scratch can be a tedious task, and there can be small bugs in the code that can lead to a security breach. This is where web apps jump in and help you secure your application. Web app penetration testing can be implemented at various fronts such as the frontend interface, database, and web server. Let us leverage the power of some of the important tools of Kali that can be helpful during web app penetration testing. WebScarab proxy WebScarab is an HTTP and HTTPS proxy interceptor framework that allows the user to review and modify the requests created by the browser before they are sent to the server. Similarly, the responses received from the server can be modified before they are reflected in the browser. The new version of WebScarab has many more advanced features such as XSS/CSRF detection, Session ID analysis, and Fuzzing. Follow these three steps to get started with WebScarab: To launch WebScarab, browse to Applications | Kali Linux | Web applications | Web application proxies | WebScarab. Once the application is loaded, you will have to change your browser's network settings. Set the proxy settings for IP as 127.0.0.1 and Port as 8008: Save the settings and go back to the WebScarab GUI. Click on the Proxy tab and check Intercept request. Make sure that both GET and POST requests are highlighted on the left-hand side panel. To intercept the response, check Intercept responses to begin reviewing the responses coming from the server. Attacking the database using sqlninja sqlninja is a popular tool used to test SQL injection vulnerabilities in Microsoft SQL servers. Databases are an integral part of web apps hence, even a single flaw in it can lead to mass compromising of information. Let us see how sqlninja can be used for database penetration testing. To launch SQL ninja, browse to Applications | Kali Linux | Web applications | Database Exploitation | sqlninja. This will launch the terminal window with sqlninja parameters. The important parameter to look for is either the mode parameter or the –m parameter: The –m parameter specifies the type of operation we want to perform over the target database.Let us pass a basic command and analyze the output: root@kali:~#sqlninja –m test Sqlninja rel. 0.2.3-r1 Copyright (C) 2006-2008 icesurfer [-] sqlninja.conf does not exist. You want to create it now ? [y/n] This will prompt you to set up your configuration file (sqlninja.conf). You can pass the respective values and create the config file. Once you are through with it, you are ready to perform database penetration testing. The Websploit framework Websploit is an open source framework designed for vulnerability analysis and penetration testing of web applications. It is very much similar to Metasploit and incorporates many of its plugins to add functionalities. To launch Websploit, browse to Applications | Kali Linux | Web Applications | Web Application Fuzzers | Websploit. We can begin by updating the framework. Passing the update command at the terminal will begin the updating process as follows: wsf>update [*]Updating Websploit framework, Please Wait… Once the update is over, you can check out the available modules by passing the following command: wsf>show modules Let us launch a simple directory scanner module against www.target.com as follows: wsf>use web/dir_scanner wsf:Dir_Scanner>show options wsf:Dir_Scanner>set TARGET www.target.com wsf:Dir_Scanner>run Once the run command is executed, Websploit will launch the attack module and display the result. Similarly, we can use other modules based on the requirements of our scenarios. Summary In this article, we covered the following sections: WebScarab proxy Attacking the database using sqlninja The Websploit framework Resources for Article: Further resources on this subject: Installing VirtualBox on Linux [Article] Linux Shell Script: Tips and Tricks [Article] Installing Arch Linux using the official ISO [Article]
Read more
  • 0
  • 0
  • 11722

article-image-apex-plug-ins
Packt
30 Oct 2013
17 min read
Save for later

APEX Plug-ins

Packt
30 Oct 2013
17 min read
(For more resources related to this topic, see here.) In APEX 4.0, Oracle introduced the plug-in feature. A plug-in is an extension to the existing functionality of APEX. The idea behind plug-ins is to make life easier for developers. Plug-ins are reusable, and can be exported and imported. In this way it is possible to create a functionality which is available to all APEX developers. And installing and using them without the need of having a knowledge of what's inside the plug-in. APEX translates settings from the APEX builder to HTML and JavaScript. For example, if you created a text item in the APEX builder, APEX converts this to the following code (simplified): <input type="text" id="P12_NAME" name="P12_NAME" value="your name"> When you create an item type plug-in, you actually take over this conversion task of APEX, and you generate the HTML and JavaScript code yourself by using PL/SQL procedures. That offers a lot of flexibility because now you can make this code generic, so that it can be used for more items. The same goes for region type plug-ins. A region is a container for forms, reports, and so on. The region can be a div or an HTML table. By creating a region type plug-in, you create a region yourself with the possibility to add more functionality to the region. Plug-ins are very useful because they are reusable in every application. To make a plug-in available, go to Shared Components | Plug-ins , and click on the Export Plug-in link on the right-hand side of the page. Select the desired plug-in and file format and click on the Export Plug-in button. The plug-in can then be imported into another application. Following are the six types of plug-in: Item type plug-ins Region type plug-ins Dynamic action plug-ins Process type plug-ins Authorization scheme type plug-ins Authentication scheme type plug-ins In this Aricle we will discuss the first five types of plug-ins. Creating an item type plug-in In an item type plug-in you create an item with the possibility to extend its functionality. To demonstrate this, we will make a text field with a tooltip. This functionality is already available in APEX 4.0 by adding the following code to the HTML form element attributes text field in the Element section of the text field: onmouseover="toolTip_enable(event,this,'A tooltip')" But you have to do this for every item that should contain a tooltip. This can be made more easily by creating an item type plug-in with a built-in tooltip. And if you create an item of type plug-in, you will be asked to enter some text for the tooltip. Getting ready For this recipe you can use an existing page, with a region in which you can put some text items. How to do it... Follow these steps: Go to Shared Components | User Interface | Plug-ins . Click on the Create button. In the Name section, enter a name in the Name text field. In this case we enter tooltip. In the Internal Name text field, enter an internal name. It is advised to use the company's domain address reversed to ensure the name is unique when you decide to share this plug-in. So for example you can use com.packtpub.apex.tooltip. In the Source section, enter the following code in the PL/SQL Code text area: function render_simple_tooltip ( p_item in apex_plugin.t_page_item , p_plugin in apex_plugin.t_plugin , p_value in varchar2 , p_is_readonly in boolean , p_is_printer_friendly in boolean ) return apex_plugin.t_page_item_render_result is l_result apex_plugin.t_page_item_render_result; begin if apex_application.g_debug then apex_plugin_util.debug_page_item ( p_plugin => p_plugin , p_page_item => p_item , p_value => p_value , p_is_readonly => p_is_readonly , p_is_printer_friendly => p_is_printer_friendly); end if; -- sys.htp.p('<input type="text" id="'||p_item.name||'" name="'||p_item.name||'" class="text_field" onmouseover="toolTip_enable(event,this,'||''''||p_item.attribute_01||''''||')">');--return l_result;end render_simple_tooltip; This function uses the sys.htp.p function to put a text item (<input type="text") on the screen. On the text item, the onmouseover event calls the function toolTip_enable(). This function is an APEX function, and can be used to put a tooltip on an item. The arguments of the function are mandatory. The function starts with the option to show debug information. This can be very useful when you create a plug-in and it doesn't work. After the debug information, the htp.p function puts the text item on the screen, including the call to tooltip_enable. You can also see that the call to tooltip_enable uses p_item.attribute_01. This is a parameter that you can use to pass a value to the plug-in. That is, the following steps in this recipe. The function ends with the return of l_result. This variable is of the type apex_plugin.t_page_item_render_result. For the other types of plug-in there are dedicated return types also, for example t_region_render_result. Click on the Create Plug-in button The next step is to define the parameter (attribute) for this plug-in. In the Custom Attributes section, click on the Add Attribute button. In the Name section, enter a name in the Label text field, for example tooltip. Ensure that the Attribute text field contains the value 1 . In the Settings section, set the Type field to Text . Click on the Create button. In the Callbacks section, enter render_simple_tooltip into the Render Function Name text field. In the Standard Attributes section, check the Is Visible Widget checkbox. Click on the Apply Changes button. The plug-in is now ready. The next step is to create an item of type tooltip plug-in. Go to a page with a region where you want to use an item with a tooltip. In the Items section, click on the add icon to create a new item. Select Plug-ins . Now you will get a list of the available plug-ins. Select the one we just created, that is tooltip . Click on Next . In the Item Name text field, enter a name for the item, for example, tt_item. In the Region drop-down list, select the region you want to put the item in. Click on Next . In the next step you will get a new option. It's the attribute you created with the plug-in. Enter here the tooltip text, for example, This is tooltip text. Click on Next . In the last step, leave everything as it is and click on the Create Item button. You are now ready. Run the page. When you move your mouse pointer over the new item, you will see the tooltip. How it works... As stated before, this plug-in actually uses the function htp.p to put an item on the screen. Together with the call to the JavaScript function, toolTip_enable on the onmouseover event makes this a text item with a tooltip, replacing the normal text item. There's more... The tooltips shown in this recipe are rather simple. You could make them look better, for example, by using the Beautytips tooltips. Beautytips is an extension to jQuery and can show configurable help balloons. Visit http://plugins.jquery.com to download Beautytips. We downloaded Version 0.9.5-rc1 to use in this recipe. Go to Shared Components and click on the Plug-ins link. Click on the tooltip plug-in you just created. In the Source section, replace the code with the following code: function render_simple_tooltip ( p_item in apex_plugin.t_page_item, p_plugin in apex_plugin.t_plugin, p_value in varchar2, p_is_readonly in boolean, p_is_printer_friendly in boolean ) return apex_plugin.t_page_item_render_result is l_result apex_plugin.t_page_item_render_result; begin if apex_application.g_debug then apex_plugin_util.debug_page_item ( p_plugin => p_plugin , p_page_item => p_item , p_value => p_value , p_is_readonly => p_is_readonly , p_is_printer_friendly => p_is_printer_friendly); end if; The function also starts with the debug option to see what happens when something goes wrong. --Register the JavaScript and CSS library the plug-inuses. apex_javascript.add_library ( p_name => 'jquery.bgiframe.min', p_directory => p_plugin.file_prefix, p_version => null ); apex_javascript.add_library ( p_name => 'jquery.bt.min', p_directory => p_plugin.file_prefix, p_version => null ); apex_javascript.add_library ( p_name => 'jquery.easing.1.3', p_directory => p_plugin.file_prefix, p_version => null ); apex_javascript.add_library ( p_name => 'jquery.hoverintent.minified', p_directory => p_plugin.file_prefix, p_version => null ); apex_javascript.add_library ( p_name => 'excanvas', p_directory => p_plugin.file_prefix, p_version => null ); After that you see a number of calls to the function apex_javascript.add_library. These libraries are necessary to enable these nice tooltips. Using apex_javascript.add_library ensures that a JavaScript library is included in the final HTML of a page only once, regardless of how many plug-in items appear on that page. sys.htp.p('<input type="text" id="'||p_item.name||'"class="text_field" title="'||p_item.attribute_01||'">');-- apex_javascript.add_onload_code (p_code =>'$("#'||p_item.name||'").bt({ padding: 20 , width: 100 , spikeLength: 40 , spikeGirth: 40 , cornerRadius: 50 , fill: '||''''||'rgba(200, 50, 50, .8)'||''''||' , strokeWidth: 4 , strokeStyle: '||''''||'#E30'||''''||' , cssStyles: {color: '||''''||'#FFF'||''''||',fontWeight: '||''''||'bold'||''''||'} });'); -- return l_result; end render_tooltip; Another difference with the first code is the call to the Beautytips library. In this call you can customize the text balloon with colors and other options. The onmouseover event is not necessary anymore as the call to $().bt in the wwv_flow_javascript.add_onload_code takes over this task. The $().bt function is a jQuery JavaScript function which references the generated HTML of the plug-in item by ID, and converts it dynamically to show a tooltip using the Beautytips plug-in. You can of course always create extra plug-in item type parameters to support different colors and so on per item. To add the other libraries, do the following: In the Files section, click on the Upload new file button. Enter the path and the name of the library. You can use the file button to locate the libraries on your filesystem. Once you have selected the file, click on the Upload button. The files and their locations can be found in the following table: Libra ry Location jquery.bgiframe.min.js bt-0.9.5-rc1other_libsbgiframe_2.1.1 jquery.bt.min.js bt-0.9.5-rc1 jquery.easing.1.3.js bt-0.9.5-rc1other_libs jquery.hoverintent.minified.js bt-0.9.5-rc1other_libs Excanvas.js bt-0.9.5-rc1other_libsexcanvas_r3     If all libraries have been uploaded, the plug-in is ready. The tooltip now looks quite different, as shown in the following screenshot: In the plug-in settings, you can enable some item-specific settings. For example, if you want to put a label in front of the text item, check the Is Visible Widget checkbox in the Standard Attributes section. For more information on this tooltip, go to http://plugins.jquery.com/project/bt. Creating a region type plug-in As you may know, a region is actually a div. With the region type plug-in you can customize this div. And because it is a plug-in, you can reuse it in other pages. You also have the possibility to make the div look better by using JavaScript libraries. In this recipe we will make a carousel with switching panels. The panels can contain images but they can also contain data from a table. We will make use of another jQuery extension, Step Carousel. Getting ready You can download stepcarousel.js from http://www.dynamicdrive.com/dynamicindex4/stepcarousel.htm. However, in order to get this recipe work in APEX, we needed to make a slight modification in it. So, stepcarousel.js, arrowl.gif, and arrow.gif are included in this book. How to do it... Follow the given steps, to create the plug-in: Go to Shared Components and click on the Plug-ins link. Click on the Create button. In the Name section, enter a name for the plug-in in the Name field. We will use Carousel. In the Internal Name text field, enter a unique internal name. It is advised to use your domain reversed, for example com.packtpub.carousel. In the Type listbox, select Region . In the Source section, enter the following code in the PL/SQL Code text area: function render_stepcarousel ( p_region in apex_plugin.t_region, p_plugin in apex_plugin.t_plugin, p_is_printer_friendly in boolean ) return apex_plugin.t_region_render_result is cursor c_crl is select id , panel_title , panel_text , panel_text_date from app_carousel order by id; -- l_code varchar2(32767); begin The function starts with a number of arguments. These arguments are mandatory, but have a default value. In the declare section there is a cursor with a query on the table APP_CAROUSEL. This table contains several data to appear in the panels in the carousel. -- add the libraries and stylesheets -- apex_javascript.add_library ( p_name => 'stepcarousel', p_directory => p_plugin.file_prefix, p_version => null ); -- --Output the placeholder for the region which is used by--the Javascript code The actual code starts with the declaration of stepcarousel.js. There is a function, APEX_JAVASCRIPT.ADD_LIBRARY to load this library. This declaration is necessary, but this file needs also to be uploaded in the next step. You don't have to use the extension .js here in the code. -- sys.htp.p('<style type="text/css">'); -- sys.htp.p('.stepcarousel{'); sys.htp.p('position: relative;'); sys.htp.p('border: 10px solid black;'); sys.htp.p('overflow: scroll;'); sys.htp.p('width: '||p_region.attribute_01||'px;'); sys.htp.p('height: '||p_region.attribute_02||'px;'); sys.htp.p('}'); -- sys.htp.p('.stepcarousel .belt{'); sys.htp.p('position: absolute;'); sys.htp.p('left: 0;'); sys.htp.p('top: 0;'); sys.htp.p('}'); sys.htp.p('.stepcarousel .panel{'); sys.htp.p('float: left;'); sys.htp.p('overflow: hidden;'); sys.htp.p('margin: 10px;'); sys.htp.p('width: 250px;'); sys.htp.p('}'); -- sys.htp.p('</style>'); After the loading of the JavaScript library, some style elements are put on the screen. The style elements could have been put in a Cascaded Style Sheet (CSS ), but since we want to be able to adjust the size of the carousel, we use two parameters to set the height and width. And the height and the width are part of the style elements. -- sys.htp.p('<div id="mygallery" class="stepcarousel"style="overflow:hidden"><div class="belt">'); -- for r_crl in c_crl loop sys.htp.p('<div class="panel">'); sys.htp.p('<b>'||to_char(r_crl.panel_text_date,'DD-MON-YYYY')||'</b>'); sys.htp.p('<br>'); sys.htp.p('<b>'||r_crl.panel_title||'</b>'); sys.htp.p('<hr>'); sys.htp.p(r_crl.panel_text); sys.htp.p('</div>'); end loop; -- sys.htp.p('</div></div>'); The next command in the script is the actual creation of a div. Important here is the name of the div and the class. The Step Carousel searches for these identifiers and replaces the div with the stepcarousel. The next step in the function is the fetching of the rows from the query in the cursor. For every line found, the formatted text is placed between the div tags. This is done so that Step Carousel recognizes that the text should be placed on the panels. --Add the onload code to show the carousel -- l_code := 'stepcarousel.setup({ galleryid: "mygallery" ,beltclass: "belt" ,panelclass: "panel" ,autostep: {enable:true, moveby:1, pause:3000} ,panelbehavior: {speed:500, wraparound:true,persist:true} ,defaultbuttons: {enable: true, moveby: 1,leftnav:["'||p_plugin.file_prefix||'arrowl.gif", -5,80], rightnav:["'||p_plugin.file_prefix||'arrowr.gif", -20,80]} ,statusvars: ["statusA", "statusB", "statusC"] ,contenttype: ["inline"]})'; -- apex_javascript.add_onload_code (p_code => l_code); -- return null; end render_stepcarousel; The function ends with the call to apex_javascript.add_onload_code. Here starts the actual code for the stepcarousel and you can customize the carousel, like the size, rotation speed and so on. In the Callbacks section, enter the name of the function in the Return Function Name text field. In this case it is render_stepcarousel. Click on the Create Plug-in button. In the Files section, upload the stepcarousel.js, arrowl.gif, and arrowr.gif files. For this purpose, the file stepcarousel.js has a little modification in it. In the last section (setup:function), document.write is used to add some style to the div tag. Unfortunately, this will not work in APEX, as document.write somehow destroys the rest of the output. So, after the call, APEX has nothing left to show, resulting in an empty page. Document.write needs to be removed, and the following style elements need to be added in the code of the plug-in: sys.htp.p('</p><div id="mygallery" class="stepcarousel" style="overflow: hidden;"><div class="belt">'); In this line of code you see style='overflow:hidden'. That is the line that actually had to be included in stepcarousel.js. This command hides the scrollbars. After you have uploaded the files, click on the Apply Changes button. The plug-in is ready and can now be used in a page. Go to the page where you want this stepcarousel to be shown. In the Regions section, click on the add icon. In the next step, select Plug-ins . Select Carousel . Click on Next . Enter a title for this region, for example Newscarousel. Click on Next . In the next step, enter the height and the width of the carousel. To show a carousel with three panels, enter 800 in the Width text field. Enter 100 in the Height text field. Click on Next . Click on the Create Region button. The plug-in is ready. Run the page to see the result. How it works... The stepcarousel is actually a div. The region type plug-in uses the function sys.htp.p to put this div on the screen. In this example, a div is used for the region, but a HTML table can be used also. An APEX region can contain any HTML output, but for the positioning, mostly a HTML table or a div is used, especially when layout is important within the region. The apex_javascript.add_onload_code starts the animation of the carousel. The carousel switches panels every 3 seconds. This can be adjusted (Pause : 3000). See also For more information on this jQuery extension, go to http://www.dynamicdrive.com/dynamicindex4/stepcarousel.htm.
Read more
  • 0
  • 0
  • 10237

article-image-performance-testing-and-load-balancing
Packt
30 Oct 2013
17 min read
Save for later

Performance Testing and Load Balancing

Packt
30 Oct 2013
17 min read
(For more resources related to this topic, see here.) Initial and on-going performance measurement Performance measurements begin prior to system deployment. In terms of a failover cluster of Hyper-V systems, it begins prior to creating any virtual machines. Your first goal is to obtain baselines. The term baseline has different meanings in different contexts; in this case it means gathering data on a system during a known healthy period. Its purpose is to serve as a point of comparison for later data gathering operations. The first set of performance measurements you take will be with no virtual machines. Once you have reached your target deployment level, you will obtain another. These will be your baselines. All future performance measurements will be compared to these in order to determine how your systems are working. Microsoft provides a thorough document for performance tuning of Windows Server 2012. These concepts carry forward to R2 and many apply to Hyper-V Server as well. Download it from the following site: http://download.microsoft.com/download/0/0/B/00BE76AF-D340-4759-8ECD-C80BC53B6231/performance-tuning-guidelines-windows-server-2012.docx General performance measurement Baselines and ongoing performance evaluations tend to be fairly generic in nature. They can be carried out in a number of ways. This section will examine two others. The first is the free Server Performance Advisor ( SPA ) provided by Microsoft. The second is the Performance Monitor tool in-built in Windows operating systems. Server Performance Advisor This tool can be run quickly to determine the performance characteristics of a new system and on a schedule to track the performance trends of an active system. Do not install or run Server Performance Advisor directly on a Hyper-V host or any guests that are to be measured. Doing so adds a load that will make the results inaccurate. The following instructions can be used to quickly set up SPA to run in a basic environment. They assume that you'll be running the application with a domain account that has administrative privileges on the systems to be measured. To scan a system that has an active firewall, run the following cmdlet: Enable-NetFirewallRule -DisplayName "Performance Logs and Alerts (TCP‑In)" Service Performance Advisor is published on the developer center, which is accessible at http://msdn.microsoft.com/en-us/library/windows/hardware/hh367834.aspx. For best results, this tool should be run from a remote computer that's not on the host being measured. It can be run from any modern Windows system. It requires a connection to an installation of Microsoft SQL Server 2008 R2 or newer. The Express edition is perfectly acceptable. The latest version can be obtained at no charge from the Microsoft download center at http://search.microsoft.com/en-us/downloadresults.aspx?q=sql%20express. There is another requirement that's listed on the download page but not in the included documentation. The CAB file that SPA is delivered in must be extracted with its directory structure intact. If you use Windows Explorer to open the CAB, it will not extract the files properly. Use the built-in extrac32 tool according to the directions (they're on the download page) or use another extraction application that can reproduce the proper folder structure. The final prerequisite you must satisfy is the creation of a folder to hold the results. This folder can be in any location on the system you'll be running SPA from, and it can have any name. This folder must be shared. Determine the domain account that you'll be running SPA with and give that account full permissions to the folder and its share. All that's left is to run SPA. In the folder where you extracted the CAB's contents, run SPAConsole.exe. When it opens, choose File and then New Project to get started. The first screen is just a basic introductory screen. Click on Next and you'll see the following screen, which has been filled in with examples: The previous entries direct the application to create a database on the local computer, in this case an instance of SQL Server Express. For a large environment with many systems to scan, it is recommended to use SQL Server Standard instead. The database name can be anything you like; this one has been named to reflect that it will contain data on the first Hyper-V cluster in the sample organization. Be aware that this will create a new database on the selected server. Once you have selected the database server, instance, and name, and then click on Next to move to the following screen: This screen allows you to select the advisor packs that you'd like to make available in this project. Even though you only need the Hyper-V advisor and perhaps the CoreOS advisor, it's best to select all three. The interface sometimes hangs if only a subset is selected. You won't be required to use all three during a scan. Click Next . This will bring you to the final screen: On this screen, enter the servers that you want to scan. The File Share Location is a file share that will hold the results of the scan. As with the SQL database, it's not required to be on the same system as the scanner. Servers can be added to the list later. You can use Test Configuration to ensure that the indicated servers are reachable. Once you're happy with the entries, click on Finish . You'll be returned to the main screen of SPA. Now, you should see the host(s) that you selected for this project. Select their checkboxes, and then press the Run Analysis button in the lower-right corner. Here, you'll be able to select the actual advisor packs that you want to use. At the bottom of the screen, you'll be able to enter how long you want the scan to run, and if you wish to collect numerous data points over a period of time—how often you want it to run. Click on OK when you're satisfied with your selections and the data collection process will begin. Once it is complete, you can click on the small down arrow in the Analysis Result column of one of the hosts. This will show three buttons, indicated in the following screenshot: These buttons are, from left to right: View Latest Report : See the report from the latest analysis. This is the screen you're most likely to be interested in after a one-time scan of a new system. It will show warnings for any items and settings it finds that might impede optimal performance of Hyper-V. It can also compare one report against another and export result sets to XML. Find Reports : Search through all result sets for this host according to the criteria that you choose. View Charts : These are detailed charts that examine and graph very specific performance metrics of the host. The wording of the Logical Processor count limit when Hyper-V is enabled warning is misleading. The management operating system is restricted to using 64 logical processors, but Hyper-V itself can still schedule guest processes up to the maximum of 320 logical processors. The first two buttons are very simple to understand and you should have no trouble navigating them on your own. Do remember to check the various tabs inside the report. The third button, View Charts , brings you to a tool that isn't as easy to decipher. You'll begin by picking a range of dates, and assuming that you've got more than one report to chart, you'll get a screen that looks something like the following screenshot: The sheer amount of data shown can make this difficult to interpret. In the lower section, you'll notice that there is a large number of performance counters. Select only those that you're actually interested in viewing and you'll find that the chart becomes much easier to understand. To deselect all items, select the first item and press Ctrl + A , and then press the Space bar. The items marked as 90% remove all utilization above the 90 percent mark. These are assumed to be momentary spikes that can skew the outcome in a way that makes the data meaningless. Compare these to the same metrics marked as Max . Use the Pick Series button at the bottom of the window if you wish to reduce the number of selectable items. This button is more useful on the other two tabs; in fact, they'll have no data to display if you do not select an item. As indicated, these two tabs show the way that the selected metrics have been trending over a specified period of time. These can show you how your systems behave differently during the day or across a week. Comparing these reports against those generated by other servers can help you to determine how your guests should be load balanced. Performance Monitor The built-in Performance Monitor tool is much more powerful than most others; but it's up to you to choose what to measure. One of its major strengths is that there's nothing to install. All you need is a Windows system with a GUI. As with Server Performance Advisor, it is not recommended that you run this on a Hyper-V host or guest that you are going to measure. There are two ways to run Performance Monitor. One is as a real-time tool that graphs the monitored performance counters as they occur. The second is as a collector that gathers metrics and stores them for trend analysis. The differentiating features of Performance Monitor from Server Performance Advisor are: Real-time graphing Precise selection of metrics No software downloads required No database system required Performance logs can be opened on any Windows system Performance Monitor is found in Administrative Tools. Depending on how your system is configured, Administrative Tools may be found on the Start screen or menu. It's available in the Control Panel in all versions of Windows. It's also available under the Performance node of Computer Management . If you will be running it for real-time graphing, ensure that you start it with an account that has administrative privileges on the target system. For collectors, you'll be able to specify the account to run it under. You may also need to modify the firewall as indicated under the Server Performance Advisor section mentioned earlier. Real-time monitoring with Performance Monitor To start a real-time monitoring session, expand the Monitoring Tools node and click on Performance Monitor . In the center pane, click on the button with the green plus, which will open the Add Counters window. In this window, you'll want to change the counters' source to the target computer. Your screen should look like the following screenshot: Navigate through the various counters in the upper list box. When you click on one, it will show the instances of that counter that are available to be monitored. Double-click on an instance or highlight it and click on the Add >> button to move it to the list box on the right. These are the objects that will be tracked. When you are satisfied with your selection, click on OK . See Step 4 in the next section for a screenshot of this window and more information about its contents. You will be returned to the main screen. The display will be updated every second. Each counter you picked will be displayed as a line of a various color. The legend will be shown at the bottom. You can uncheck an item to hide it from the running display; however, its counter will still be monitored. Using the buttons across the top of the graph pane, you can modify the output. Most of the options are self-explanatory; change them until the display suits your desires. You have the ability to modify the graph from its default line output to a histogram or to a running digital display. Click on the Highlight button and then select a counter to make it stand out against the others. Several of the buttons open various tabs on the Performance Monitor Properties window where you can change many settings, such as the delay between samples. Of interest here is the Source tab, which will be used in the next section. Trend tracking with Performance Monitor The second use for Performance Monitor is to pull performance statistics across a span of time. In active deployments, it can be used to track the performance of Hyper-V hosts. You'll create scheduled gathering of data collector sets for this. What makes Performance Monitor especially useful for this is that a single collector set can gather from all the hosts in your cluster simultaneously. Before you start, ensure that the Performance Monitor console is not connected to the target computer system as it would be for a real-time monitor. For instance, if you are using Computer Management as shown in the first screenshot in this article, the tree root should say Computer Management (Local) and not contain the name of another system. The first reason is that running and managing the collector sets creates a small drain on the system's resources. Second, you're going to be running collectors against multiple systems and it's better to use a single remote computer for those purposes. Third, it's easiest to look at the results of performance logs on the system that took them. Otherwise, you have to move them around. Look under the Data Collector Sets tree item. There are a number of predefined collector sets and you can add more. Just right-click on the User Defined node and choose New and Data Collector Set. The following steps will take you through the creation of a collector set: On the first screen, come up with a name for the set, then choose to manually create the set, then click on Next : This wizard will create a data collector named DataCollector01 which cannot be renamed. If you wish, you can skip through the wizard to the end, delete the generic collector, and then create new ones with friendlier names. On the second screen, you want to create performance counter data logs: On the third screen, you can change how often the collector polls for data. As you can see in the following screenshot, the default is every 15 seconds: Click on the Add… button in the previous screen to pick the counters that you want to poll. This is the same screen that you see when selecting counters in the real-time screen. Enter the name of the computer you want to poll data from in the Select counters from computer text box. Upon pressing Tab or Enter or clicking on another control, it will load the counters from that system. Select the counters and instances that you desire and click on Add >> . You can monitor counters from multiple computer systems in the same collector set if you like, but you may also choose to use one collector per computer per set. Remember that you'll want to select Hyper-V related counters for CPU, memory, and networking or you'll be retrieving collectors from the parent partition only. Physical disk counters are read from the management operating system. You cannot retrieve statistics for pass-through disks by setting performance counters on the management operating system. If you click on the Add >> button and nothing happens, it is because instances are required but didn't load. Click on another counter and then back on the desired counter until the instances are displayed. On clicking OK , you'll be returned to the previous screen that will now be populated with the counters that you chose. Ensure everything looks as you wish and click on Next . You'll now be asked for a location to save the logs to. Although it will allow you to enter a UNC, logfile creation is usually unsuccessful anywhere but on the local system. You may place them in a local folder that is shared for easy accessibility from other systems, if you wish. The final screen will have you provide the credentials that the set will use. If you leave it on its default setting, it will use the Local System account that will not have the necessary rights to run the collection on the target computer. You have two choices: you can add the computer account of the collector machine to the Performance Log Users group on all target machines or you can use an account that is a member of that group on all machines. For the purposes of this step-through, we're just going to use the domain administrator account: Before clicking on Finish , you are encouraged to set the radio button to Open properties for this data collector set . This will allow you to jump straight to the properties window where you can schedule the scan. Alternately, you can open the properties window by right-clicking on the completed collector set and clicking on Properties . In the properties window, change the options as you like. The Schedule tab is where you establish the Start and End times. You can create multiple schedules for a collector set: If you want to use a separate collector in this set for another host, right-click on the new Collector Set in the left pane and click on New and Data Collector . The wizard is very nearly identical to the one you just completed. You aren't required to follow a schedule. You can manually start and stop collector sets by right-clicking on the menu in the left pane. Once the collector has begun its work, you can go back to the real-time monitor screen and open the Performance Monitor Properties window to the Source tab. Select the logfile that you instructed the collector set to use. The display will switch to the static output of the logfile. However, it will be blank because by default, no counters are selected. Add counters with the green plus button just as you did with the real-time display. This time, you'll only be able to choose from counters that are contained in the logfiles. You can now manipulate the log contents as you did with the live display. Note that you can view a log of an actively running collector, but the screen will not update in real time. Selecting counters practically If you use the exact counters as shown in the example, you'll notice that some of them aren't very useful. For instance, the number of processors in a host is highly unlikely to change during a monitoring session, although the number of virtual processors might. Not all of the available counters are well documented, but there is a Show description check box on the counter selection screen that provides a bit of information. Also, some of the counters you can pull don't compare well from one host to another. In the sample, we instructed SV-HYPERV1 to monitor the amount of data traveling across the virtual adapter in SV-DC1. This is useful data in its own right, but probably in isolation, not as a comparator. Of course, if the virtual machine migrates to another host, it will no longer be readable. You may find the aggregate counters to be more useful than specific virtual machine counters. The counters that are truly useful are simply too numerous to make a meaningful list out of, and not all counters are universally useful in all organizations. The four generic categories you're likely to be interested in are CPU, disk, memory, and networking. Be judicious about selecting counters that look at specific highly available virtual machines. Alternative ways to read performance logs Performance logs can be confusing, especially when you first encounter them. There are a number of tools on the market designed to aid you. One free and popular tool is the Performance Analysis of Logs ( PAL ) Tool. It is a free and open source tool downloadable from Codeplex at http://pal.codeplex.com.
Read more
  • 0
  • 0
  • 4883

article-image-getting-started-pentaho-data-integration
Packt
30 Oct 2013
16 min read
Save for later

Getting Started with Pentaho Data Integration

Packt
30 Oct 2013
16 min read
(For more resources related to this topic, see here.) Pentaho Data Integration and Pentaho BI Suite Before introducing PDI, let’s talk about Pentaho BI Suite. The Pentaho Business Intelligence Suite is a collection of software applications intended to create and deliver solutions for decision making. The main functional areas covered by the suite are: Analysis: The analysis engine serves multidimensional analysis. It’s provided by the Mondrian OLAP server. Reporting: The reporting engine allows designing, creating, and distributing reports in various known formats (HTML, PDF, and so on), from different kinds of sources. Data Mining: Data mining is used for running data through algorithms in order to understand the business and do predictive analysis. Data mining is possible thanks to the Weka Project. Dashboards: Dashboards are used to monitor and analyze Key Performance Indicators (KPIs). The Community Dashboard Framework (CDF), a plugin developed by the community and integrated in the Pentaho BI Suite, allows the creation of interesting dashboards including charts, reports, analysis views, and other Pentaho content, without much effort. Data Integration: Data integration is used to integrate scattered information from different sources (applications, databases, files, and so on), and make the integrated information available to the final user. All of this functionality can be used standalone but also integrated. In order to run analysis, reports, and so on, integrated as a suite, you have to use the Pentaho BI Platform. The platform has a solution engine, and offers critical services, for example, authentication, scheduling, security, and web services. This set of software and services form a complete BI Platform, which makes Pentaho Suite the world’s leading open source Business Intelligence Suite. Exploring the Pentaho Demo The Pentaho BI Platform Demo is a pre-configured installation that allows you to explore several capabilities of the Pentaho platform. It includes sample reports, cubes, and dashboards for Steel Wheels. Steel Wheels is a fictional store that sells all kind of scale replicas of vehicles. The following screenshot is a sample dashboard available in the demo: The Pentaho BI Platform Demo is free and can be downloaded from http://sourceforge.net/projects/pentaho/files/. Under the Business Intelligence Server folder, look for the latest stable version. You can find out more about Pentaho BI Suite Community Edition at http://community.pentaho.com/projects/bi_platform. There is also an Enterprise Edition of the platform with additional features and support. You can find more on this at www.pentaho.org. Pentaho Data Integration Most of the Pentaho engines, including the engines mentioned earlier, were created as community projects and later adopted by Pentaho. The PDI engine is not an exception—Pentaho Data Integration is the new denomination for the business intelligence tool born as Kettle. The name Kettle didn’t come from the recursive acronym Kettle Extraction, Transportation, Transformation, and Loading Environment it has now. It came from KDE Extraction, Transportation, Transformation, and Loading Environment, since the tool was planned to be written on top of KDE, a Linux desktop environment, as mentioned in the introduction of the article. In April 2006, the Kettle project was acquired by the Pentaho Corporation and Matt Casters, the Kettle founder, also joined the Pentaho team as a Data Integration Architect. When Pentaho announced the acquisition, James Dixon, Chief Technology Officer said: We reviewed many alternatives for open source data integration, and Kettle clearly had the best architecture, richest functionality, and most mature user interface. The open architecture and superior technology of the Pentaho BI Platform and Kettle allowed us to deliver integration in only a few days, and make that integration available to the community. By joining forces with Pentaho, Kettle benefited from a huge developer community, as well as from a company that would support the future of the project. From that moment, the tool has grown with no pause. Every few months a new release is available, bringing to the users improvements in performance, existing functionality, new functionality, ease of use, and great changes in look and feel. The following is a timeline of the major events related to PDI since its acquisition by Pentaho: June 2006: PDI 2.3 is released. Numerous developers had joined the project and there were bug fixes provided by people in various regions of the world. The version included among other changes, enhancements for large-scale environments and multilingual capabilities. February 2007: Almost seven months after the last major revision, PDI 2.4 is released including remote execution and clustering support, enhanced database support, and a single designer for jobs and transformations, the two main kind of elements you design in Kettle. May 2007: PDI 2.5 is released including many new features; the most relevant being the advanced error handling. November 2007: PDI 3.0 emerges totally redesigned. Its major library changed to gain massive performance. The look and feel had also changed completely. October 2008: PDI 3.1 arrives, bringing a tool which was easier to use, and with a lot of new functionality as well. April 2009: PDI 3.2 is released with a really large amount of changes for a minor version: new functionality, visualization and performance improvements, and a huge amount of bug fixes. The main change in this version was the incorporation of dynamic clustering. June 2010: PDI 4.0 was released, delivering mostly improvements with regard to enterprise features, for example, version control. In the community version, the focus was on several visual improvements such as the mouseover assistance that you will experiment with soon. November 2010: PDI 4.1 is released with many bug fixes. August 2011: PDI 4.2 comes to light not only with a large amount of bug fixes, but also with a lot of improvements and new features. In particular, several of them were related to the work with repositories. April 2012: PDI 4.3 is released also with a lot of fixes, and a bunch of improvements and new features. November 2012: PDI 4.4 is released. This version incorporates a lot of enhancements and new features. In this version there is a special emphasis on Big Data—the ability of reading, searching, and in general transforming large and complex collections of datasets. 2013: PDI 5.0 will be released, delivering interesting low-level features such as step load balancing, job transactions, and restartability. Using PDI in real-world scenarios Paying attention to its name, Pentaho Data Integration, you could think of PDI as a tool to integrate data. In fact, PDI not only serves as a data integrator or an ETL tool. PDI is such a powerful tool, that it is common to see it used for these and for many other purposes. Here you have some examples. Loading data warehouses or datamarts The loading of a data warehouse or a datamart involves many steps, and there are many variants depending on business area, or business rules. But in every case, no exception, the process involves the following steps: Extracting information from one or different databases, text files, XML files and other sources. The extract process may include the task of validating and discarding data that doesn’t match expected patterns or rules. Transforming the obtained data to meet the business and technical needs required on the target. Transformation implies tasks as converting data types, doing some calculations, filtering irrelevant data, and summarizing. Loading the transformed data into the target database. Depending on the requirements, the loading may overwrite the existing information, or may add new information each time it is executed. Kettle comes ready to do every stage of this loading process. The following screenshot shows a simple ETL designed with Kettle: Integrating data Imagine two similar companies that need to merge their databases in order to have a unified view of the data, or a single company that has to combine information from a main ERP (Enterprise Resource Planning) application and a CRM (Customer Relationship Management) application, though they’re not connected. These are just two of hundreds of examples where data integration is needed. The integration is not just a matter of gathering and mixing data. Some conversions, validation, and transport of data have to be done. Kettle is meant to do all of those tasks. Data cleansing It’s important and even critical that data be correct and accurate for the efficiency of business, to generate trust conclusions in data mining or statistical studies, to succeed when integrating data. Data cleansing is about ensuring that the data is correct and precise. This can be achieved by verifying if the data meets certain rules, discarding or correcting those which don’t follow the expected pattern, setting default values for missing data, eliminating information that is duplicated, normalizing data to conform minimum and maximum values, and so on. These are tasks that Kettle makes possible thanks to its vast set of transformation and validation capabilities. Migrating information Think of a company, any size, which uses a commercial ERP application. One day the owners realize that the licenses are consuming an important share of its budget. So they decide to migrate to an open source ERP. The company will no longer have to pay licenses, but if they want to change, they will have to migrate the information. Obviously, it is not an option to start from scratch, nor type the information by hand. Kettle makes the migration possible thanks to its ability to interact with most kind of sources and destinations such as plain files, commercial and free databases, and spreadsheets, among others. Exporting data Data may need to be exported for numerous reasons: To create detailed business reports To allow communication between different departments within the same company To deliver data from your legacy systems to obey government regulations, and so on Kettle has the power to take raw data from the source and generate these kind of ad-hoc reports. Integrating PDI along with other Pentaho tools The previous examples show typical uses of PDI as a standalone application. However, Kettle may be used embedded as part of a process or a dataflow. Some examples are pre-processing data for an online report, sending mails in a scheduled fashion, generating spreadsheet reports, feeding a dashboard with data coming from web services, and so on. The use of PDI integrated with other tools is beyond the scope of this article. If you are interested, you can find more information on this subject in the Pentaho Data Integration 4 Cookbook by Packt Publishing at http://www.packtpub.com/pentaho-data-integration-4-cookbook/book. Installing PDI In order to work with PDI, you need to install the software. It’s a simple task, so let’s do it now. Time for action – installing PDI These are the instructions to install PDI, for whatever operating system you may be using. The only prerequisite to install the tool is to have JRE 6.0 installed. If you don’t have it, please download it from www.javasoft.com and install it before proceeding. Once you have checked the prerequisite, follow these steps: Go to the download page at http://sourceforge.net/projects/pentaho/files/Data Integration. Choose the newest stable release. At this time, it is 4.4.0, as shown in the following screenshot: Download the file that matches your platform. The preceding screenshot should help you. Unzip the downloaded file in a folder of your choice, that is, c:/util/kettle or /home/pdi_user/kettle. If your system is Windows, you are done. Under Unix-like environments, you have to make the scripts executable. Assuming that you chose /home/pdi_user/kettle as the installation folder, execute: cd /home/pdi_user/kettle chmod +x *.sh In Mac OS you have to give execute permissions to the JavaApplicationStub file. Look for this file; it is located in Data Integration 32-bit.appContentsMacOS, or Data Integration 64-bit.appContentsMacOS depending on your system. What just happened? You have installed the tool in just a few minutes. Now, you have all you need to start working. Launching the PDI graphical designer – Spoon Now that you’ve installed PDI, you must be eager to do some stuff with data. That will be possible only inside a graphical environment. PDI has a desktop designer tool named Spoon. Let’s launch Spoon and see what it looks like. Time for action – starting and customizing Spoon In this section, you are going to launch the PDI graphical designer, and get familiarized with its main features. Start Spoon. If your system is Windows, run Spoon.bat You can just double-click on the Spoon.bat icon, or Spoon if your Windows system doesn’t show extensions for known file types. Alternatively, open a command window—by selecting Run in the Windows start menu, and executing cmd, and run Spoon.bat in the terminal. In other platforms such as Unix, Linux, and so on, open a terminal window and type spoon.sh If you didn’t make spoon.sh executable, you may type sh spoon.sh Alternatively, if you work on Mac OS, you can execute the JavaApplicationStub file, or click on the Data Integration 32-bit.app, or Data Integration 64-bit.app icon As soon as Spoon starts, a dialog window appears asking for the repository connection data. Click on the Cancel button. A small window labeled Spoon tips... appears. You may want to navigate through various tips before starting. Eventually, close the window and proceed. Finally, the main window shows up. A Welcome! window appears with some useful links for you to see. Close the window. You can open it later from the main menu. Click on Options... from the menu Tools. A window appears where you can change various general and visual characteristics. Uncheck the highlighted checkboxes, as shown in the following screenshot: Select the tab window Look & Feel. Change the Grid size and Preferred Language settings as shown in the following screenshot: Click on the OK button. Restart Spoon in order to apply the changes. You should not see the repository dialog, or the Welcome! window. You should see the following screenshot full of French words instead: What just happened? You ran for the first time Spoon, the graphical designer of PDI. Then you applied some custom configuration. In the Option… tab, you chose not to show the repository dialog or the Welcome! window at startup. From the Look & Feel configuration window, you changed the size of the dotted grid that appears in the canvas area while you are working. You also changed the preferred language. These changes were applied as you restarted the tool, not before. The second time you launched the tool, the repository dialog didn’t show up. When the main window appeared, all of the visible texts were shown in French which was the selected language, and instead of the Welcome! window, there was a blank screen. You didn’t see the effect of the change in the Grid option. You will see it only after creating or opening a transformation or job, which will occur very soon! Spoon Spoon, the tool you’re exploring in this section, is the PDI’s desktop design tool. With Spoon, you design, preview, and test all your work, that is, Transformations and Jobs. When you see PDI screenshots, what you are really seeing are Spoon screenshots. Setting preferences in the Options window In the earlier section, you changed some preferences in the Options window. There are several look and feel characteristics you can modify beyond those you changed. Feel free to experiment with these settings. Remember to restart Spoon in order to see the changes applied. In particular, please take note of the following suggestion about the configuration of the preferred language. If you choose a preferred language other than English, you should select a different language as an alternative. If you do so, every name or description not translated to your preferred language, will be shown in the alternative language. One of the settings that you changed was the appearance of the Welcome! window at startup. The Welcome! window has many useful links, which are all related with the tool: wiki pages, news, forum access, and more. It’s worth exploring them. You don’t have to change the settings again to see the Welcome! window. You can open it by navigating to Help | Welcome Screen. Storing transformations and jobs in a repository The first time you launched Spoon, you chose not to work with repositories. After that, you configured Spoon to stop asking you for the Repository option. You must be curious about what the repository is and why we decided not to use it. Let’s explain it. As we said, the results of working with PDI are transformations and jobs. In order to save the transformations and jobs, PDI offers two main methods: Database repository: When you use the database repository method, you save jobs and transformations in a relational database specially designed for this purpose. Files: The files method consists of saving jobs and transformations as regular XML files in the filesystem, with extension KJB and KTR respectively. It’s not allowed to mix the two methods in the same project. That is, it makes no sense to mix jobs and transformations in a database repository with jobs and transformations stored in files. Therefore, you must choose the method when you start the tool. By clicking on Cancel in the repository window, you are implicitly saying that you will work with the files method. Why did we choose not to work with repositories? Or, in other words, to work with the files method? Mainly for two reasons: Working with files is more natural and practical for most users. Working with a database repository requires minimal database knowledge, and that you have access to a database engine from your computer. Although it would be an advantage for you to have both preconditions, maybe you haven’t got both of them. There is a third method called File repository, that is a mix of the two above—it’s a repository of jobs and transformations stored in the filesystem. Between the File repository and the files method, the latest is the most broadly used. Therefore, throughout this article we will use the files method. Creating your first transformation Until now, you’ve seen the very basic elements of Spoon. You must be waiting to do some interesting task beyond looking around. It’s time to create your first transformation.
Read more
  • 0
  • 0
  • 6774

article-image-dhtmlx-grid
Packt
30 Oct 2013
7 min read
Save for later

The DHTMLX Grid

Packt
30 Oct 2013
7 min read
(For more resources related to this topic, see here.) The DHTMLX grid component is one of the more widely used components of the library. It has a vast amount of settings and abilities that are so robust we could probably write an entire book on them. But since we have an application to build, we will touch on some of the main methods and get into utilizing it. Some of the cool features that the grid supports is filtering, spanning rows and columns, multiple headers, dynamic scroll loading, paging, inline editing, cookie state, dragging/ordering columns, images, multi-selection, and events. By the end of this article, we will have a functional grid where we will control the editing, viewing, adding, and removing of users. The grid methods and events When creating a DHTMLX grid, we first create the object; second we add all the settings and then call a method to initialize it. After the grid is initialized data can then be added. The order of steps to create a grid is as follows: Create the grid object Apply settings Initialize Add data Now we will go over initializing a grid. Initialization choices We can initialize a DHTMLX grid in two ways, similar to the other DHTMLX objects. The first way is to attach it to a DOM element and the second way is to attach it to an existing DHTMLX layout cell or layout. A grid can be constructed by either passing in a JavaScript object with all the settings or built through individual methods. Initialization on a DOM element Let's attach the grid to a DOM element. First we must clear the page and add a div element using JavaScript. Type and run the following code line in the developer tools console: document.body.innerHTML = "<div id='myGridCont'></div>"; We just cleared all of the body tags content and replaced it with a div tag having the id attribute value of myGridCont. Now, create a grid object to the div tag, add some settings and initialize it. Type and run the following code in the developer tools console: var myGrid = new dhtmlXGridObject("myGridCont"); myGrid.setImagePath(config.imagePath); myGrid.setHeader(["Column1", "Column2", "Column3"]); myGrid.init(); You should see the page with showing just the grid header with three columns. Next, we will create a grid on an existing cell object. Initialization on a cell object Refresh the page and add a grid to the appLayout cell. Type and run the following code in the developer tools console: var myGrid = appLayout.cells("a").attachGrid(); myGrid.setImagePath(config.imagePath); myGrid.setHeader(["Column1","Column2","Column3"]); myGrid.init(); You will now see the grid columns just below the toolbar. Grid methods Now let's go over some available grid methods. Then we can add rows and call events on this grid. For these exercises we will be using the global appLayout variable. Refresh the page. attachGrid We will begin by creating a grid to a cell. The attachGrid method creates and attaches a grid object to a cell. This is the first step in creating a grid. Type and run the following code line in the console: var myGrid = appLayout.cells("a").attachGrid(); setImagePath The setImagePath method allows the grid to know where we have the images placed for referencing in the design. We have the application image path set in the config object. Type and run the following code line in the console: myGrid.setImagePath(config.imagePath); setHeader The setHeader method sets the column headers and determines how many headers we will have. The argument is a JavaScript array. Type and run the following code line in the console: myGrid.setHeader(["Column1", "Column2", "Column3"]); setInitWidths The setinitWidths method will set the initial widths of each of the columns. The asterisk mark (*) is used to set the width automatically. Type and run the following code line in the console: myGrid.setInitWidths("125,95,*");   setColAlign The setColAlign method allows us to align the column's content position. Type and run the following code line in the console: myGrid.setColAlign("right,center,left"); init Up until this point, we haven't seen much going on. It was all happening behind the scenes. To see these changes the grid must be initialized. Type and run the following code line in the console: myGrid.init(); Now you see the columns that we provided. addRow Now that we have a grid created let's add a couple rows and start interacting. The addRow method adds a row to the grid. The parameters are ID and columns. Type and run the following code in the console: myGrid.addRow(1,["test1","test2","test3"]); myGrid.addRow(2,["test1","test2","test3"]); We just created two rows inside the grid. setColTypes The setColTypes method sets what types of data a column will contain. The available type options are: ro (readonly) ed (editor) txt (textarea) ch (checkbox) ra (radio button) co (combobox) Currently, the grid allows for inline editing if you were to double-click on grid cell. We do not want this for the application. So, we will set the column types to read-only. Type and run the following code in the console: myGrid.setColTypes("ro,ro,ro"); Now the cells are no longer editable inside the grid. getSelectedRowId The getSelectedRowId method returns the ID of the selected row. If there is nothing selected it will return null. Type and run the following code line in the console: myGrid.getSelectedRowId(); clearSelection The clearSelection method clears all selections in the grid. Type and run the following code line in the console: myGrid.clearSelection(); Now any previous selections are cleared. clearAll The clearAll method removes all the grid rows. Prior to adding more data to the grid we first must clear it. If not we will have duplicated data. Type and run the following code line in the console: myGrid.clearAll(); Now the grid is empty. parse The parse method allows the loading of data to a grid in the format of an XML string, CSV string, XML island, XML object, JSON object, and JavaScript array. We will use the parse method with a JSON object while creating a grid for the application. Here is what the parse method syntax looks like (do not run this in console): myGrid.parse(data, "json"); Grid events The DHTMLX grid component has a vast amount of events. You can view them in their entirety in the documentation. We will cover the onRowDblClicked and onRowSelect events. onRowDblClicked The onRowDblClicked event is triggered when a grid row is double-clicked. The handler receives the argument of the row ID that was double-clicked. Type and run the following code in console: myGrid.attachEvent("onRowDblClicked", function(rowId){ console.log(rowId); }); Double-click one of the rows and the console will log the ID of that row. onRowSelect The onRowSelect event will trigger upon selection of a row. Type and run the following code in console: myGrid.attachEvent("onRowSelect", function(rowId){ console.log(rowId); }); Now, when you select a row the console will log the id of that row. This can be perceived as a single click. Summary In this article, we learned about the DHTMLX grid component. We also added the user grid to the application and tested it with the storage and callbacks methods. Resources for Article: Further resources on this subject: HTML5 Presentations - creating our initial presentation [Article] HTML5: Generic Containers [Article] HTML5 Canvas [Article]
Read more
  • 0
  • 0
  • 13091
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at €18.99/month. Cancel anytime
Packt
30 Oct 2013
16 min read
Save for later

IBM SPSS Modeler – Pushing the Limits

Packt
30 Oct 2013
16 min read
(For more resources related to this topic, see here.) Using the Feature Selection node creatively to remove or decapitate perfect predictors In this recipe, we will identify perfect or near perfect predictors in order to insure that they do not contaminate our model. Perfect predictors earn their name by being correct 100 percent of the time, usually indicating circular logic and not a prediction of value. It is a common and serious problem. When this occurs we have accidentally allowed information into the model that could not possibly be known at the time of the prediction. Everyone 30 days late on their mortgage receives a late letter, but receiving a late letter is not a good predictor of lateness because their lateness caused the letter, not the other way around. The rather colorful term decapitate is borrowed from the data miner Dorian Pyle. It is a reference to the fact that perfect predictors will be found at the top of any list of key drivers ("caput" means head in Latin). Therefore, to decapitate is to remove the variable at the top. Their status at the top of the list will be capitalized upon in this recipe. The following table shows the three time periods; the past, the present, and the future. It is important to remember that, when we are making predictions, we can use information from the past to predict the present or the future but we cannot use information from the future to predict the future. This seems obvious, but it is common to see analysts use information that was gathered after the date for which predictions are made. As an example, if a company sends out a notice after a customer has churned, you cannot say that the notice is predictive of churning.   Past Now Future   Contract Start Expiration Outcome Renewal Date Joe January 1, 2010 January 1, 2012 Renewed January 2, 2012 Ann February 15, 2010 February 15, 2012 Out of Contract Null Bill March 21, 2010 March 21, 2012 Churn NA Jack April 5, 2010 April 5, 2012 Renewed April 9, 2012 New Customer 24 Months Ago Today ??? ??? Getting ready We will start with a blank stream, and will be using the cup98lrn reduced vars2.txt data set. How to do it... To identify perfect or near-perfect predictors in order to insure that they do not contaminate our model: Build a stream with a Source node, a Type node, and a Table then force instantiation by running the Table node. Force TARGET_B to be flag and make it the target. Add a Feature Selection Modeling node and run it. Edit the resulting generated model and examine the results. In particular, focus on the top of the list. Review what you know about the top variables, and check to see if any could be related to the target by definition or could possibly be based on information that actually postdates the information in the target. Add a CHAID Modeling node, set it to run in Interactive mode, and run it. Examine the first branch, looking for any child node that might be perfectly predicted; that is, look for child nodes whose members are all found in one category. Continue steps 6 and 7 for the first several variables. Variables that are problematic (steps 5 and/or 7) need to be set to None in the Type node. How it works... Which variables need decapitation? The problem is information that, although it was known at the time that you extracted it, was not known at the time of decision. In this case, the time of decision is the decision that the potential donor made to donate or not to donate. Was the amount, Target_D known before the decision was made to donate? Clearly not. No information that dates after the information in the target variable can ever be used in a predictive model. This recipe is built of the following foundation—variables with this problem will float up to the top of the Feature Selection results. They may not always be perfect predictors, but perfect predictors always must go. For example, you might find that, if a customer initially rejects or postpones a purchase, there should be a follow up sales call in 90 days. They are recorded as rejected offer in the campaign, and as a result most of them had a follow up call in 90 days after the campaign. Since a couple of the follow up calls might not have happened, it won't be a perfect predictor, but it still must go. Note that variables such as RFA_2 and RFA_2A are both very recent information and highly predictive. Are they a problem? You can't be absolutely certain without knowing the data. Here the information recorded in these variables is calculated just prior to the campaign. If the calculation was made just after, they would have to go. The CHAID tree almost certainly would have shown evidence of perfect prediction in this case. There's more... Sometimes a model has to have a lot of lead time; predicting today's weather is a different challenge than next year's prediction in the farmer's almanac. When more lead time is desired you could consider dropping all of the _2 series variables. What would the advantage be? What if you were buying advertising space and there was a 45 day delay for the advertisement to appear? If the _2 variables occur between your advertising deadline and your campaign you might have to use information attained in the _3 campaign. Next-Best-Offer for large datasets Association models have been the basis for next-best-offer recommendation engines for a long time. Recommendation engines are widely used for presenting customers with cross-sell offers. For example, if a customer purchases a shirt, pants, and a belt; which shoes would he also likely buy? This type of analysis is often called market-basket analysis as we are trying to understand which items customers purchase in the same basket/transaction. Recommendations must be very granular (for example, at the product level) to be usable at the check-out register, website, and so on. For example, knowing that female customers purchase a wallet 63.9 percent of the time when they buy a purse is not directly actionable. However, knowing that customers that purchase a specific purse (for example, SKU 25343) also purchase a specific wallet (for example, SKU 98343) 51.8 percent of the time, can be the basis for future recommendations. Product level recommendations require the analysis of massive data sets (that is, millions of rows). Usually, this data is in the form of sales transactions where each line item (that is, row of data) represents a single product. The line items are tied together by a single transaction ID. IBM SPSS Modeler association models support both tabular and transactional data. The tabular format requires each product to be represented as column. As most product level recommendations would contain thousands of products, this format is not practical. The transactional format uses the transactional data directly and requires only two inputs, the transaction ID and the product/item. Getting ready This example uses the file stransactions.sav and scoring.csv. How to do it... To recommend the next best offer for large datasets: Start with a new stream by navigating to File | New Stream. Go to File | Stream Properties from the IBM SPSS Modeler menu bar. On the Options tab change the Maximum members for nominal fields to 50000. Click on OK. Add a Statistics File source node to the upper left of the stream. Set the file field by navigating to transactions.sav. On the Types tab, change the Product_Code field to Nominal and click on the Read Values button. Click on OK. Add a CARMA Modeling node connected to the Statistics File source node in step 3. On the Fields tab, click on the Use custom settings and check the Use transactional format check box. Select Transaction_ID as the ID field and Product_Code as the Content field. On the Model tab of the CARMA Modeling node, change the Minimum rule support (%) to 0.0 and the Minimum rule confidence (%) to 5.0. Click on the Run button to build the model. Double-click the generated model to ensure that you have approximately 40,000 rules. Add a Var File source node to the middle left of the stream. Set the file field by navigating to scoring.csv. On the Types tab, click on the Read Values button. Click on the Preview button to preview the data. Click on OK to dismiss all dialogs. Add a Sort node connected to the Var File node in step 6. Choose Transaction_ID and Line_Number (with Ascending sort) by clicking the down arrow on the right of the dialog. Click on OK. Connect the Sort node in step 7 to the generated model (replacing the current link). Add an Aggregate node connected to the generated model. Add a Merge node connected to the generated model. Connect the Aggregate node in step 9 to the Merge node. On the Merge tab, choose Keys as the Merge Method, select Transaction_ID, and click on the right arrow. Click on OK. Add a Select node connected to the Merge node in step 10. Set the condition to Record_Count = Line_Number. Click on OK. At this point, the stream should look as follows: Add a Table node connected to the Select node in step 11. Right-click on the Table node and click on Run to see the next-best-offer for the input data. How it works... In steps 1-5, we set up the CARMA model to use the transactional data (without needing to restructure the data). CARMA was selected over A Priori for its improved performance and stability with large data sets. For recommendation engines, the settings for the Model tab are somewhat arbitrary and are driven by the practical limitations of the number of rules generated. Lowering the thresholds for confidence and rule support generates more rules. Having more rules can have a negative impact on scoring performance but will result in more (albeit weaker) recommendations. Rule Support How many transactions contain the entire rule (that is, both antecedents ("if" products) and consequents ("then" products)) Confidence If a transaction contains all the antecedents ("if" products), what percentage of the time does it contain the consequents ("then" products) In step 5, when we examine the model we see the generated Association Rules with the corresponding rules support and confidences. In the remaining steps (7-12), we score a new transaction and generate 3 next-best-offers based on the model containing the Association Rules. Since the model was built with transactional data, the scoring data must also be transactional. This means that each row is scored using the current row and the prior rows with the same transaction ID. The only row we generally care about is the last row for each transaction where all the data has been presented to the model. To accomplish this, we count the number of rows for each transaction and select the line number that equals the total row count (that is, the last row for each transaction). Notice that the model returns 3 recommended products, each with a confidence, in order of decreasing confidence. A next-best-offer engine would present the customer with the best option first (or potentially all three options ordered by decreasing confidence). Note that, if there is no rule that applies to the transaction, nulls will be returned in some or all of the corresponding columns. There's more... In this recipe, you'll notice that we generate recommendations across the entire transactional data set. By using all transactions, we are creating generalized next-best-offer recommendations; however, we know that we can probably segment (that is, cluster) our customers into different behavioral groups (for example, fashion conscience, value shoppers, and so on.). Partitioning the transactions by behavioral segment and generating separate models for each segment will result in rules that are more accurate and actionable for each group. The biggest challenge with this approach is that you will have to identify the customer segment for each customer before making recommendations (that is, scoring). A unified approach would be to use the general recommendations for a customer until a customer segment can be assigned then use segmented models. Correcting a confusion matrix for an imbalanced target variable by incorporating priors Classification models generate probabilities and a classification predicted class value. When there is a significant imbalance in the proportion of True values in the target variable, the confusion matrix as seen in the Analysis node output will show that the model has all predicted class values equal to the False value, leading an analyst to conclude the model is not effective and needs to be retrained. Most often, the conventional wisdom is to use a Balance node to balance the proportion of True and False values in the target variable, thus eliminating the problem in the confusion matrix. However, in many cases, the classifier is working fine without the Balance node; it is the interpretation of the model that is biased. Each model generates a probability that the record belongs to the True class and the predicted class is derived from this value by applying a threshold of 0.5. Often, no record has a propensity that high, resulting in every predicted class value being assigned False. In this recipe we learn how to adjust the predicted class for classification problems with imbalanced data by incorporating the prior probability of the target variable. Getting ready This recipe uses the datafile cup98lrn_reduced_vars3.sav and the stream Recipe – correct with priors.str. How to do it... To incorporate prior probabilities when there is an imbalanced target variable: Open the stream Recipe – correct with priors.str by navigating to File | Open Stream. Make sure the datafile points to the correct path to the datafile cup98lrn_reduced_vars3.sav. Open the generated model TARGET_B, and open the Settings tab. Note that compute Raw Propensity is checked. Close the generated model. Duplicate the generated model by copying and pasting the node in the stream. Connect the duplicated model to the original generated model. Add a Type node to the stream and connect it to the generated model. Open the Type node and scroll to the bottom of the list. Note that the fields related to the two models have not yet been instantiated. Click on Read Values so that they are fully instantiated. Insert a Filler node and connect it to the Type node. Open the Filler node and, in the variable list, select $N1-TARGET_B. Inside the Condition section, type $RP1-TARGET_B' >= TARGET_B_Mean, Click on OK to dismiss the Filler node (after exiting the Expression Builder). Insert an Analysis node to the stream. Open the Analysis node and click on the check box for Coincidence Matrices. Click on OK. Run the stream to the Analysis node. Notice that the coincidence matrix (confusion matrix) for $N-TARGET_B has no predictions with value = 1, but the coincidence matrix for the second model, the one adjusted by step 7 ($N1-TARGET_B), has more than 30 percent of the records labeled as value = 1. How it works... Classification algorithms do not generate categorical predictions; they generate probabilities, likelihoods, or confidences. For this data set, the target variable, TARGET_B, has two values: 1 and 0. The classifier output from any classification algorithm will be a number between 0 and 1. To convert the probability to a 1 or 0 label, the probability is thresholded, and the default in Modeler (and all predictive analytics software) is the threshold at 0.5. This recipe changes that default threshold to the prior probability. The proportion of TARGET_B = 1 values in the data is 5.1 percent, and therefore this is the classic imbalanced target variable problem. One solution to this problem is to resample the data so that the proportion of 1s and 0s are equal, normally achieved through use of the Balance node in Modeler. Moreover, one can create the Balance node from running a Distribution node for TARGET_B, and using the Generate | Balance node (reduce) option. The justification for balancing the sample is that, if one doesn't do it, all the records will be classified with value = 0. The reason for all the classification decisions having value 0 is not because the Neural Network isn't working properly. Consider the histogram of predictions from the Neural Network shown in the following screenshot. Notice that the maximum value of the predictions is less than 0.4, but the center of density is about 0.05. The actual shape of the histogram and the maximum predicted value depend on the Neural Network; some may have maximum values slightly above 0.5. If the threshold for the classification decision is set to 0.5, since no neural network predicted confidence is greater than 0.5, all of the classification labels will be 0. However, if one sets the threshold to the TARGET_B prior probability, 0.051, many of the predictions will exceed that value and be labeled as 1. We can see the result of the new threshold by color-coding the histogram of the previous figure with the new class label, in the following screenshot. This recipe used a Filler node to modify the existing predicted target value. The categorical prediction from the Neural Network whose prediction is being changed is $N1-TARGET_B. The $ variables are special field names that are used automatically in the Analysis node and Evaluation node. It's possible to construct one's own $ fields with a Derive node, but it is safer to modify the one that's already in the data. There's more... This same procedure defined in this recipe works for other modeling algorithms as well, including logistic regression. Decision trees are a different matter. Consider the following screenshot. This result, stating that the C5 tree didn't split at all, is the result of the imbalanced target variable. Rather than balancing the sample, there are other ways to get a tree built. For C&RT or Quest trees, go to the Build Options, select the Costs & Priors item, and select Equal for all classes for priors: equal priors. This option forces C&RT to treat the two classes mathematically as if their counts were equal. It is equivalent to running the Balance node to boost samples so that there are equal numbers of 0s and 1s. However, it's done without adding additional records to the data, slowing down training; equal priors is purely a mathematical reweighting. The C5 tree doesn't have the option of setting priors. An alternative, one that will work not only with C5 but also with C&RT, CHAID, and Quest trees, is to change the Misclassification Costs so that the cost of classifying a one as a zero is 20, approximately the ratio of the 95 percent 0s to 5 percent 1s.
Read more
  • 0
  • 0
  • 7730

article-image-creating-image-gallery
Packt
30 Oct 2013
5 min read
Save for later

Creating an image gallery

Packt
30 Oct 2013
5 min read
(For more resources related to this topic, see here.) Getting ready Before we get started, we need to find a handful of images that we can use for the gallery. Find four to five images to use for the gallery and put them in the images folder. How to do it... Add the following links to the images to the index.html file: <a class="fancybox"href="images/waterfall.png">Waterfall</a><a class="fancybox" href="images/frozenlake.png">Frozen Lake</a><a class="fancybox" href="images/road-inforest.png">Road in Forest</a><a class="fancybox" href="images/boston.png">Boston</a> The anchor tags no longer have an ID, but a class. It is important that they all have the same class so that Fancybox knows about them. Change our call to the Fancybox plugin in the scripts.js file to use the class that all of the links have instead of show-fancybox ID. $(function() { // Using fancybox class instead of the show-fancybox ID $('.fancybox').fancybox(); }); Fancybox will now work on all of the images but they will not be part of the same gallery. To make images part of a gallery, we use the rel attribute of the anchor tags. Add rel="gallery" to all of the anchor tags, shown as follows: <a class="fancybox" rel="gallery" href="images/waterfall.png">Waterfall</a> <a class="fancybox" rel="gallery" href="images/frozenlake.png">Frozen Lake</a> <a class="fancybox" rel="gallery" href="images/roadin-forest.png">Road in Forest</a> <a class="fancybox" rel="gallery" href="images/boston.png">Boston</a> Now that we have added rel="gallery" to each of our anchor tags, you should see left and right arrows when you hover over the left-hand side or right-hand side of Fancybox. These arrows allow you to navigate between images as shown in the following screenshot: How it works... Fancybox determines that an image is part of a gallery using the rel attribute of the anchor tags. The order of the images is based on the order of the anchor tags on the page. This is important so that the slideshow order is exactly the same as a gallery of thumbnails without any additional work on our end. We changed the ID of our single image to a class for the gallery because we wanted to call Fancybox on all of the links instead of just one. If we wanted to add more image links to the page, it would just be a matter of adding more anchor tags with the proper href values and the same class. There's more... So, what else can we do with the gallery functionality of Fancybox? Let's take a look at some of the other things that we could do with the gallery that we have currently. Captions and thumbnails All of the functionalities that we discussed for single images apply to galleries as well. So, if we wanted to add a thumbnail, it would just be a matter of adding an img tag inside the anchor tag instead of the text. If we wanted to add a caption, we can do so by adding the title attribute to our anchor tags. Showing slideshow from one link Let's say that we wanted to have just one link to open our gallery slideshow. This can be easily achieved by hiding the other links via CSS with the help of the following step: We start by adding this style tag to the <head> tag just under the <script> tag for our scripts.js file, shown as follows: <style type="text/css"> .hidden { display: none; } </style> Now, we update the HTML file so that all but one of our anchor tags have the hidden class. Next, when we reload the page, we will see only one link. When you click on the link, you should still be able to navigate through the gallery just like all of the links were on the page. <a class="fancybox" rel="gallery" href="images/waterfall.png">Image Gallery</a> <div class="hidden"> <a class="fancybox" rel="gallery" href="images/frozen-lake.png">Frozen Lake</a> <a class="fancybox" rel="gallery" href="images/roadin-forest.png">Road in Forest</a> <a class="fancybox" rel="gallery" href="images/boston.png">Boston</a> </div> Summary In this article we saw that Fancybox provides very strong image handling functionalities. We also saw how an image gallery is created by Fancybox. We can also display images as thumbnails and display the images as a slideshow using just one link. Resources for Article: Further resources on this subject: Getting started with your first jQuery plugin [Article] OpenCart Themes: Styling Effects of jQuery Plugins [Article] The Basics of WordPress and jQuery Plugin [Article]
Read more
  • 0
  • 0
  • 2143

article-image-where-my-data-and-how-do-i-get-it
Packt
30 Oct 2013
16 min read
Save for later

Where Is My Data and How Do I Get to It?

Packt
30 Oct 2013
16 min read
(For more resources related to this topic, see here.) System databases versus company databases The first task of identifying where our data is located is making sure we are using the correct database! Microsoft Dynamics GP 2013 utilizes the Microsoft SQL Server platform as its database engine. When Dynamics GP 2013 is installed on our environment and a new company is created, the installation process creates several databases on a server that has been designated during the installation. These databases will store all the information entered through the Dynamics GP 2013 application, and we can use SQL Server Management Studio to access the underlying tables that store this data. Microsoft Dynamics GP has two types of databases, a system database and company database(s). For first-time report developers and seasoned writers, knowing which of these databases stores a particular piece of information we need is crucial for an accurate report. System databases Prior to GP 2013, the system database was named the DYNAMICS database, and this default name could not be changed. Now, with the new installation of GP 2013, the system database can be named as something different than DYNAMICS. This functionality was introduced as a way to allow multiple sets of GP installations to reside on the same SQL server instance. This is especially important for companies providing GP hosting services for multiple companies on a single SQL server instance. When Microsoft Dynamics GP is first installed and some initial settings are provided, a system database will be created. This database is the system database that can contain up to ten characters. It includes things such as records for each company that you create, the organization's registration information, and the maximum account framework. While many of us can expect to work in environments where only one system database exists, and where that system database uses the standard 'DYNAMICS' name, we should still be careful not to hard-code references to this database. While we may have been able to take this shortcut in reports for earlier versions of GP, this will surely cause issues when we least expect. Whenever querying company data, use the DBNAME field from the SY00100 table in the company database to ensure the right system database name is being used. From a reporting standpoint, there is certain information located in the system database that we may need to report on at one time or another. We have provided a quick reference for this information in the following list: Multicurrency System Setups: This includes the setup of the currencies, the exchange rates, and the currency symbols for those organizations that process transactions in any number of foreign currencies. Intercompany Setup: This is where the intercompany relationships are stored and the specific dues to/due from accounts are mapped. Organizations Structures: This will include the organizational levels and entities that have been created for those organizations that use this feature. User Master: This table stores information about the users in the ERP system, including their user ID and username. User Tasks: This table stores the tasks that users set up in Dynamics GP. These are the tasks that are displayed on the users' home screens in GP. Company Master: This stores company setup information, such as whether security is enabled, the company ID is in the form of the company database ID in SQL Management Studio, primary address information, tax schedule defaults, and any number of options for the company. Security Setups: This includes all of the security tasks, security roles, and user security assignments. User-Company Access: This includes the companies that each user has access to. Company databases Each company that we create in Dynamics GP has its own company database. As information such as transactions, accounts, and customer or vendor data is entered through GP, this information is recorded in individual fields. These fields comprise the smallest unit of data stored. All of this data makes up a record, and a record is grouped with similar records and stored in a table. For obvious reasons, this data is segmented by a company database so that each company can maintain unique records. In addition to this transactional data, numerous additional company setups exist in the company database. As with the System database, we may need to report on some of these company system setups. The following is a quick reference to the more common company setup tables: Account Formats: This stores the chart of accounts format for the company. Posting Definitions: This stores how the individual modules post to the General Ledger. Company Locations: This lists additional addresses for each company. Source Document Master and Audit Trail Codes: Every transaction is assigned both a source document and an audit trail code. This table can be used to report on the full description of these codes. Shipping Methods Master: This stores the setup details of the shipping methods for the company. Payment Terms Master: This stores the setup details of the payment terms created for the company. Record Notes Master: This stores all of the record level notes for the particular company. Comment Master: This stores predefined comments to be used across multiple series in Dynamics GP. Electronic Funds Transfer: This stores the EFT setup information for the company for both Payables and Receivables modules. This includes customer and vendor banking information. Period Setup: This stores the fiscal period setup for the company. Sales/Purchases Tax Tables: These tables store the tax detail and tax schedule records as well as the tax summary amounts. Dynamics GP table naming/numbering conventions Dynamics GP has a rather interesting and sometimes challenging table naming structure. When developers or consultants first see this, they are overwhelmed to say the least. Because Dynamics GP is actually a collection of modules, some of which were developed by outside organizations and later assimilated into the core product, we will find that even the standard table naming and numbering do not always apply depending on which module contains the data we need. Nevertheless, by learning the standard structure and naming convention of the core modules, we will notice that it does make some sense. With this knowledge in hand, as well as some of the resources we will cover later in this article, we will even have a head start on understanding where data resides in tables underlying non-core GP modules that might not follow the standard naming convention. Tables versus Table Groups When data is entered into windows via the Microsoft Dynamics GP application, that data is stored in tables in the underlying SQL database. In most cases, data entered via a single process can be stored in two or more tables. In such cases, it is common for these tables to be grouped together by a certain naming convention. For example, entering journal entry information may update the Transactions Work table (contains General Ledger transaction header information), the Transaction Amounts Work table (contains the General Ledger transaction distributions), and the Transaction Clearing Amounts Work table (contains the General Ledger clearing transactions distributions). These make up what are called Table Groups. These table groups are also referred to as logical tables. Each Microsoft Dynamics GP table has three names: Technical name Display name Physical name The technical name is used solely by the software and will usually be seen in some alert messages instead of the display name. The display name is the name that will appear in most of the alert messages generated by the system, and is typically the name used for a given table when referring to it in speech, for example, Vendor Master. The physical name is the name that will be found in the SQL database when looking in Microsoft SQL Server Management Studio. Physical table naming/numbering conventions When working within the context of a SQL database to generate reports, developers and consultants will make use of the table physical names. A quick scan through the various tables in a standard GP install reveals a bewildering array of table numbers. How can there possibly be any rhyme or reason to these table names? Surprisingly, it does actually follow a certain pattern. For the most part, the table numbering follows a special convention. This schema packs a lot of information into a small number, and it can help developers and consultants know where to begin looking for their data. As Dynamics GP has grown into a more comprehensive accounting solution, it has expanded, in part, by incorporating third-party applications into the solution. While many of the third-party programmers tried to stick within the relative bounds of the GP physical table naming conventions, as we will soon see, this is not always the case. So, while many of the tables that belong to the core GP modules maintain a fairly standard numbering convention, we will find that this does not hold true for all GP modules. Generally speaking, GP table physical names contain a two or three digit alpha prefix followed by a five digit number. The three digit prefix represents the module for which the table holds data. The numbers that follow identify what type of data is held in the table. For example, is it posted transaction data? Or is it information related to the module setup? As we see in the following figure and the following sections of this article, the numbering schema for a Dynamics GP physical table can be broken down to reveal information about the kind of data that is found in that table. Alpha code Let's begin by taking a look at the alpha-prefix for these tables. We have put together a handy reference of some of the most common prefixes and the modules that they represent: Prefix Module Prefix Module AA Analytical Accounting MRP Material Requirements Planning AF Advanced Financials MXLS Audit Trails/Electronic Signatures ASI SmartList NLB Navigation List Builder BM Bill of Materials (Mfg) OC Sales Configurator (Mfg) CFM Cash Flow Management OSRC Outsourcing (Mfg) CLM Certification Manager PA Project Accounting CM Checkbook Master PDK Personal Data Keeper (Proj. Acct.) CN Collections Management PM Payables Management CP Capacity Requirements Planning (Mfg) POP Purchase Order Processing DD Direct Deposit PP Revenue Expense Deferrals EC Engineering Change Management (Mfg) QA Quality Assurance (Mfg) ERB Excel Report Builder RM Receivables Management EXT Extender RT Routings (Mfg) FA Fixed Assets SC Sales Forecasting (Mfg) GL General Ledger SLB SmartList Builder HR Human Resources SOP Sales Order Processing ICJC Job Costing (Mfg) SVC Field Service IV Inventory SY System/Company Setup IVC Invoicing (Sales) UPR Payroll MC Multicurrency VAT Intrastat ME Electronic Reconcile (EFT) WC Work Centers (Mfg) MOP Manufacturing Order Processing WO Manufacturing Orders As we can see from the earlier list, in some modules, tables do not share a single common prefix. For example, in the Manufacturing module, tables are broken down even further with Routing tables represented by one set of digits while Material Requirements Planning data is found in tables represented by another set of digits. Other modules use a similar alpha code prefix for all tables in the module. Consider, for example, the Project Accounting module where all project transactions, billing, and revenue recognition tables are represented with the same alpha code. As we said earlier, not all modules will follow the standard naming convention, and this is no different when it comes to alpha prefixes for table names. With experience, we will come to learn which modules have a consistent alpha prefix and which ones are broken down even further into more granular prefixes. Table type In addition to knowing the module in which our data is stored, we must also know a bit about the kind of data which we are looking for. Let's take a look at the various types of data that exist in GP tables: Setup Tables Almost all modules in GP have setup windows that allow users to define default options or other settings for how that module will be used. The options selected in these windows can be found in the Setup tables. Master Tables Some modules, such as Payables Management, allow users to record master records. These master records represent permanent records, such as vendors, for the company. Typically, master records must be entered prior to using a module as transactions will utilize these master records. Transaction Tables These tables contain the transaction-level data from Dynamics GP. These range from the most basic of GP transactions, the journal entry, to transactions entered in sub-ledgers such as the distribution records of a posted Receivables invoice. Transactions entered in various modules will, depending on their status, be stored in one of three types of transaction tables. The three transaction tables are as follows: Work: Unposted transactions can generally be found in work tables. The name is appropriate as these transactions can be considered as "work-in-progress". They have not been committed to the sub or General Ledgers via posting processes, so we should factor this into our thinking when deciding whether or not to include these records in our reports. Open: Generally speaking, records in these tables have been posted, but are awaiting an additional action before they can be considered "closed" or "history". In the General Ledger module, the open table represents all journal entries for the current open year. In other modules, such as Receivables Management, open tables contain data for receivable transactions that have not yet been fully applied. History: Transactions that are "completed" generally end up in the history tables. Again, this depends on the module. For example, in Payables Management, fully applied invoices are moved to history. In Purchase Order Processing, however, a routine exists to move completed purchase orders to history. Until this routine is run, records will not be moved to history. Cross Reference Tables Some tables represent data that spans multiple modules. For example, GP users can link purchase orders to unfulfilled sales order line items via Sales Order Commitment. The prefix of the table that contains these links indicates that this is a Sales Order Processing table, but in actuality, it contains data from Purchase Order Processing as well. Other Table types In addition to these main table types, other table types exist that are less commonly used for reporting purposes. Nonetheless, it is helpful to understand what these table types contain. These less commonly used table types are as follows: Report Options: Before users can generate a report from the Reports menu, a series of options must be designated for that report. These report options are recorded in a series of report options tables. Temp: As the name implies, data is only stored in these tables temporarily. Temporary tables can be used in a variety of situations, such as when a user clicks on the Post button for a transaction. Although rare, it may be necessary to use a temp table when designing a report that should contain data from the time of posting. For example, a sales invoice can be generated at the time the user posts the invoice and can be based on data stored in a temp table at the time. Identifying the Table type by the table naming convention In terms of the physical naming convention for GP tables, the table type is represented by the first digit following the module code. The following table contains the various table types and their associated number in the numbering convention: Table Type Number Master 0 Work 1 Open 2 History 3 Setup 4 Temporary 5 Cross Reference 6 Report Options 7 As we've stressed already throughout this article, the numbering scheme used to identify table the type will work fairly well for most modules. Not all modules utilize all table types, so we should not expect to see this consistency among all modules. For example, in the Sales module, sales transactions remain in Work tables (such as SOP10100 and SOP10200) until they are posted, at which point they move to History tables (such as SOP30100 and SOP30200). Sequence The next two digits in the numbering convention make up the sequence number. This number indicates the logical table to which the table belongs. As we discussed earlier in the article, logical tables are related tables, or table groups, that share similar data. Not only do these table groups share similar data, but they also share the same data type and sequence number when it comes to the physical table numbering convention. For example, let's consider the following set of logical tables: PM00200 (Vendor Master) PM00201 (Vendor Master Summary) PM00202 (Vendor Master Period Summary) PM00203 (Vendor Accounts) PM00204 (Purchasing 1099 Detail) These tables comprise the Payables Vendor Master Logical File table group. We can easily see this by the numbers that follow the module code. First, we see that these tables share the same data type—remember, 0 means these are Master tables—and second, we see that these tables share the same sequence number. Although we need other tools to help us determine the name of the table group, we are able to easily scan through a list of table physical names in SQL Management Studio and see that these tables are in the same table group. Variant The final two digits of the physical numbering convention represent the logical group variant. Within a logical group, numbers are incremented sequentially. This is evident in our example using the Payables Vendor Master Logical File we just saw, as we see the final two digits of each table increment by one. In table groups related to transactions, the variant often distinguishes between header tables, detail tables, distribution tables, and other related tables. Keep in mind that what we have discussed is only a general naming and numbering convention for GP tables in their SQL databases. Unfortunately, as we've already illustrated in earlier sections, these conventions do not hold true in all cases! At the very least, knowing this convention can point us in the right direction. We can rely on other tools and resources to help us pinpoint the right table when these conventions fall short. In just a few moments, we will look at some of the tools and resources that can help us find the right table.
Read more
  • 0
  • 0
  • 8010

article-image-working-different-types-interactive-charts
Packt
30 Oct 2013
7 min read
Save for later

Working with Different Types of Interactive Charts

Packt
30 Oct 2013
7 min read
(For more resources related to this topic, see here.) This article explains how to create and embed 2D and 3D charts. They can also be interactive or static and we will insert them into our Moodle courses. We will mainly work with several spreadsheets in order to include diverse tools and techniques that are also present. The main idea is to display data in charts and provide students with the necessary information for their activities. We will also work with a variety of charts and deal with statistics as a baseline topic in this article. We can either develop a chart or work with ready-to-use data. You can design these types of activities in your Moodle course, together with a math teacher. When thinking of statistics, we generally have in mind a picture of a chart and some percentages representing the data of the chart. We can change that paradigm and create a different way to draw and read statistics in our Moodle course. We design charts with drawings, map charts, links to websites, and other interesting items. We can also redesign the charts, comprising numbers, with different assets because we want not only to enrich, but also strengthen the diversity of the material for our Moodle course since some students are not keen on numbers and dislike activities with them. So, let's give another chance to statistics! There are different types of graphics to show statistics. Therefore, we show a variety of tools available to display different results. No matter what our subject is, we can include these types of graphics in our Moodle course. You can use these graphics to help your students give weight to their arguments and express themselves using key points clearly. We teach students to include graphics, read them, and use them as a tool of communication. We can also work with puzzles related to statistics. That is to say, we can invent a graph and give tips or clues to our students so that they can sort out which percentages belong to the chart. In other words, we can create a listening comprehension activity, a reading comprehension activity, or a math problem. We can just upload or embed the chart, create an appealing activity, and give clues to our students so that they can think of the items belonging to the chart. Inserting column charts In this activity, we work with the website http://populationaction.org/. We work with statistics about different topics that are related to each other. We can explore different countries and use several charts in order to draw conclusions. We can also embed the charts in our Moodle course. Getting ready We need to think of a country to work with. We can compare statistics of population, water, croplands, and forests of different countries in order to draw conclusions about their futures. How to do it... We go to the website mentioned earlier and follow some steps in order to get the HTML code to embed it in our Moodle course. In this case, we choose Canada. These are the steps to follow: Enter http://populationaction.org/ in the browser window. Navigate to Publications | Data & Maps. Click on People in the Balance. Click on the down arrow next to the Country or Region Name search block and choose Canada, as shown in the following screenshot: Go to the bottom of the page and click on Share. Copy the HTML code, as shown in the following screenshot: Click on Done. How it works... It is time to embed the charts in our Moodle course. Another option is to draw the charts using a spreadsheet. So, we choose the weekly outline section where we want to add this activity and perform the following steps: Click on Add an activity or resource. Click on Forum | Add. Complete the Forum name block. Click on the down arrow in Forum type and choose Q and A forum. Complete the Description block. Click on the Edit HTML source icon. Paste the HTML code that was copied. Click on Update. Click on the down arrow next to Subscription mode and choose Forced subscription. Click on Save and display. The activity looks as shown in the following screenshot: Embedding a line chart In this recipe, we will present the estimated number of people (in millions) using a particular language over the Internet. To do this, we may include images in our spreadsheet in accordance with the method being used to design the activity. Instead of writing the name of the languages, we insert the flags that represent the language used. We design the line chart taking into account the statistical operations carried out at http://www.internetworldstats.com/stats7.htm. Getting ready We carry out the activity using Google Docs. We have to sign in and follow the steps required to design a spreadsheet file. We have several options for working with the document. After you have an account to work with Google Drive, let's see how to make our line chart! How to do it... We work with s spreadsheet because we need to make calculations and create a chart. First, we need to create a document in the spreadsheet. Therefore, we need to perform the following steps: Click on Create | Spreadsheet, as shown in the following screenshot: Write the name of the languages spoken in the A column. Write the figures in the B column (from the http://www.internetworldstats.com/stats7.htm website). Select the data from A1 up to the B11 column. Click on Insert | Chart. Edit your chart using the Chart Editor, as shown in the following screenshot: Click on Insert. Add the images of the flags corresponding to the languages spoken. Position the cursor over C1 and click on Insert | Image.... Another pop-up window will appear. You have several ways to upload images, as shown in the following screenshot: Click on Choose an image to upload and insert the image from your computer. Click on Select. Repeat the same process for all the languages. Steps 7 to 11 are optional. Click on the chart. Click on the down arrow in Share | Publish chart..., as shown in the following screenshot: Click on the down arrow next to Select a public format and choose Image, as shown in the following screenshot: Copy the HTML code that appears, as shown in the previous screenshot. Click on Done. How it works... We have just designed the chart that we want our students to work with. We are going to embed the chart in our Moodle course; another option is to share the spreadsheet and allow students to draw the chart. If you want to design a warm-up activity for students to guess or find out which the top languages used over the Internet are, you could add a chat, forum, or a question in the course. In this recipe, we are going to create a wiki so that students can work together. So, select the weekly outline section where you want to add the activity and perform the following steps: Click on Add an activity or resource. Click on Wiki | Add. Complete the Wiki name and Description blocks. Click on the Edit HTML source icon and paste the HTML code that we have previously copied. Then click on Update. Complete the First page name block. Click on Save and return to course. The activity looks as shown in the following screenshot:
Read more
  • 0
  • 0
  • 2230
article-image-mocking-static-methods-simple
Packt
30 Oct 2013
7 min read
Save for later

Mocking static methods (Simple)

Packt
30 Oct 2013
7 min read
(For more resources related to this topic, see here.) Getting ready The use of static methods is usually considered a bad Object Oriented Programming practice, but if we end up in a project that uses a pattern such as active record (see http://en.wikipedia.org/wiki/Active_record_pattern), we will end up having a lot of static methods. In such situations, we will need to write some unit tests and PowerMock could be quite handy. Start your favorite IDE (which we set up in the Getting and installing PowerMock (Simple) recipe), and let's fire away. How to do it... We will start where we left off. In the EmployeeService.java file, we need to implement the getEmployeeCount method; currently it throws an instance of UnsupportedOperationException. Let's implement the method in the EmployeeService class; the updated classes are as follows: /** * This class is responsible to handle the CRUD * operations on the Employee objects. * @author Deep Shah */ public class EmployeeService { /** * This method is responsible to return * the count of employees in the system. * It does it by calling the * static count method on the Employee class. * @return Total number of employees in the system. */ public int getEmployeeCount() { return Employee.count(); } } /** * This is a model class that will hold * properties specific to an employee in the system. * @author Deep Shah */ public class Employee { /** * The method that is responsible to return the * count of employees in the system. * @return The total number of employees in the system. * Currently this * method throws UnsupportedOperationException. */ public static int count() { throw new UnsupportedOperationException(); } } The getEmployeeCount method of EmployeeService calls the static method count of the Employee class. This method in turn throws an instance of UnsupportedOperationException. To write a unit test of the getEmployeeCount method of EmployeeService, we will need to mock the static method count of the Employee class. Let's create a file called EmployeeServiceTest.java in the test directory. This class is as follows: /** * The class that holds all unit tests for * the EmployeeService class. * @author Deep Shah */ @RunWith(PowerMockRunner.class) @PrepareForTest(Employee.class) public class EmployeeServiceTest { @Test public void shouldReturnTheCountOfEmployeesUsingTheDomainClass() { PowerMockito.mockStatic(Employee.class); PowerMockito.when(Employee.count()).thenReturn(900); EmployeeService employeeService = newEmployeeService(); Assert.assertEquals(900,employeeService.getEmployeeCount()); } } If we run the preceding test, it passes. The important things to notice are the two annotations (@RunWith and @PrepareForTest) at the top of the class, and the call to the PowerMockito.mockStatic method. The @RunWith(PowerMockRunner.class) statement tells JUnit to execute the test using PowerMockRunner. The @PrepareForTest(Employee.class) statement tells PowerMock to prepare the Employee class for tests. This annotation is required when we want to mock final classes or classes with final, private, static, or native methods. The PowerMockito.mockStatic(Employee.class) statement tells PowerMock that we want to mock all the static methods of the Employee class. The next statements in the code are pretty standard, and we have looked at them earlier in the Saying Hello World! (Simple) recipe. We are basically setting up the static count method of the Employee class to return 900. Finally, we are asserting that when the getEmployeeCount method on the instance of EmployeeService is invoked, we do get 900 back. Let's look at one more example of mocking a static method; but this time, let's mock a static method that returns void. We want to add another method to the EmployeeService class that will increment the salary of all employees (wouldn't we love to have such a method in reality?). Updated code is as follows: /** * This method is responsible to increment the salary * of all employees in the system by the given percentage. * It does this by calling the static giveIncrementOf method * on the Employee class. * @param percentage the percentage value by which * salaries would be increased * @return true if the increment was successful. * False if increment failed because of some exception* otherwise. */ public boolean giveIncrementToAllEmployeesOf(intpercentage) { try{ Employee.giveIncrementOf(percentage); return true; } catch(Exception e) { return false; } } The static method Employee.giveIncrementOf is as follows: /** * The method that is responsible to increment * salaries of all employees by the given percentage. * @param percentage the percentage value by which * salaries would be increased * Currently this method throws * UnsupportedOperationException. */ public static void giveIncrementOf(int percentage) { throw new UnsupportedOperationException(); } The earlier syntax would not work for mocking a void static method . The test case that mocks this method would look like the following: @RunWith(PowerMockRunner.class) @PrepareForTest(Employee.class) public class EmployeeServiceTest { @Test public void shouldReturnTrueWhenIncrementOf10PercentageIsGivenSuccessfully() { PowerMockito.mockStatic(Employee.class); PowerMockito.doNothing().when(Employee.class); Employee.giveIncrementOf(10); EmployeeService employeeService = newEmployeeService(); Assert.assertTrue(employeeService.giveIncrementToAllEmployeesOf(10)); } @Test public void shouldReturnFalseWhenIncrementOf10PercentageIsNotGivenSuccessfully() { PowerMockito.mockStatic(Employee.class); PowerMockito.doThrow(newIllegalStateException()).when(Employee.class); Employee.giveIncrementOf(10); EmployeeService employeeService = newEmployeeService(); Assert.assertFalse(employeeService.giveIncrementToAllEmployeesOf(10)); } } Notice that we still need the two annotations @RunWith and @PrepareForTest, and we still need to inform PowerMock that we want to mock the static methods of the Employee class. Notice the syntax for PowerMockito.doNothing and PowerMockito.doThrow: The PowerMockito.doNothing method tells PowerMock to literally do nothing when a certain method is called. The next statement of the doNothing call sets up the mock method. In this case it's the Employee.giveIncrementOf method. This essentially means that PowerMock will do nothing when the Employee.giveIncrementOf method is called. The PowerMockito.doThrow method tells PowerMock to throw an exception when a certain method is called. The next statement of the doThrow call tells PowerMock about the method that should throw an exception; in this case, it would again be Employee.giveIncrementOf. Hence, when the Employee.giveIncrementOf method is called, PowerMock will throw an instance of IllegalStateException. How it works... PowerMock uses custom class loader and bytecode manipulation to enable mocking of static methods. It does this by using the @RunWith and @PrepareForTest annotations. The rule of thumb is whenever we want to mock any method that returns a non-void value , we should be using the PowerMockito.when().thenReturn() syntax. It's the same syntax for instance methods as well as static methods. But for methods that return void, the preceding syntax cannot work. Hence, we have to use PowerMockito.doNothing and PowerMockito.doThrow. This syntax for static methods looks a bit like the record-playback style. On a mocked instance created using PowerMock, we can choose to return canned values only for a few methods; however, PowerMock will provide defaults values for all the other methods. This means that if we did not provide any canned value for a method that returns an int value, PowerMock will mock such a method and return 0 (since 0 is the default value for the int datatype) when invoked. There's more... The syntax of PowerMockito.doNothing and PowerMockito.doThrow can be used on instance methods as well. .doNothing and .doThrow on instance methods The syntax on instance methods is simpler compared to the one used for static methods. Let's say we want to mock the instance method save on the Employee class. The save method returns void, hence we have to use the doNothing and doThrow syntax. The test code to achieve is as follows: /** * The class that holds all unit tests for * the Employee class. * @author Deep Shah */ public class EmployeeTest { @Test() public void shouldNotDoAnythingIfEmployeeWasSaved() { Employee employee =PowerMockito.mock(Employee.class); PowerMockito.doNothing().when(employee.save(); try { employee.save(); } catch(Exception e) { Assert.fail("Should not have thrown anexception"); } } @Test(expected = IllegalStateException.class) public void shouldThrowAnExceptionIfEmployeeWasNotSaved() { Employee employee =PowerMockito.mock(Employee.class); PowerMockito.doThrow(newIllegalStateException()).when(employee).save(); employee.save(); } } To inform PowerMock about the method to mock, we just have to invoke it on the return value of the when method. The line PowerMockito.doNothing().when(employee).save() essentially means do nothing when the save method is invoked on the mocked Employee instance. Similarly, PowerMockito.doThrow(new IllegalStateException()).when(employee).save() means throw IllegalStateException when the save method is invoked on the mocked Employee instance. Notice that the syntax is more fluent when we want to mock void instance methods. Summary In this article, we saw how easily we can mock static methods. Resources for Article: Further resources on this subject: Important features of Mockito [Article] Python Testing: Mock Objects [Article] Easily Writing SQL Queries with Spring Python [Article]
Read more
  • 0
  • 0
  • 15338

article-image-cloudera-hadoop-and-hp-vertica
Packt
29 Oct 2013
9 min read
Save for later

Cloudera Hadoop and HP Vertica

Packt
29 Oct 2013
9 min read
(For more resources related to this topic, see here.) Cloudera Hadoop Hadoop is one of the names we think about when it comes to Big Data. I'm not going into details about it since there is plenty of information out there; moreover, like somebody once said, "If you decided to use Hadoop for your data warehouse, then you probably have a good reason for it". Let's not forget: it is primarily a distributed filesystem, not a relational database. That said, there are many cases when we may need to use this technology for number crunching, for example, together with MicroStrategy for analysis and reporting. There are mainly two ways to leverage Hadoop data from MicroStrategy: the first is Hive and the second is Impala. They both work as SQL bridges to the underlying Hadoop structures, converting standard SELECT statements into jobs. The connection is handled by a proprietary 32-bit ODBC driver available for free from the Cloudera website. In my tests, Impala resulted largely faster than Hive, so I will show you how to use it from our MicroStrategy virtual machine. Please note that I am using Version 9.3.0 for consistency with the rest of the book. If you're serious about Big Data and Hadoop, I strongly recommend upgrading to 9.3.1 for enhanced performance and easier setup. See MicroStrategy knowledge base document TN43588 : Post-Certification of Cloudera Impala 1.0 with MicroStrategy 9.3.1 . The ODBC driver is the same for both Hive and Impala, only the driver settings change. Connecting to a Hadoop database To show how we can connect to a Hadoop database, I will use two virtual machines: one with MicroStrategy Suite and the second with Cloudera Hadoop distribution, specifically, a virtual appliance that is available for download from their website. The configuration of the Hadoop cluster is out of scope; moreover, I am not a Hadoop expert. I'll simply give some hints, feel free to use any other configuration/vendor, the procedure and ODBC parameters should be similar. Getting ready Start by going to http://at5.us/AppAU1 The Cloudera VM download is almost 3 GB (cloudera-quickstart-vm-4.3.0-vmware.tar.gz) and features the CH4 version. After unpacking the archive, you'll find a cloudera-quickstart-vm-4.3.0-vmware.ovf file that can be opened with VMware, see screen capture: Accept the defaults and click on Import to generate the cloudera-quickstart-vm-4.3.0-vmware virtual machine. Before starting the Cloudera appliance, change the network card settings from NAT to Bridged since we need to access the database from another VM: Leave the rest of the parameters, as per the default, and start the machine. After a while, you'll be presented with a graphical interface of Centos Linux. If the network has started correctly, the machine should have received an IP address from your network DHCP. We need a fixed rather than dynamic address in the Hadoop VM, so: Open the System | Preferences | Network Connections menu. Select the name of your card (should be something like Auto eth1 ) and click on Edit… . Move to the IPv4 Settings tab and change the Method from Automatic (DHCP) to Manual . Click on the Add button to create a new address. Ask your network administrator for details here and fill Address , Netmask , and Gateway . Click on Apply… and when prompted type the root password cloudera and click on Authenticate . Then click on Close . Check if the change was successful by opening a Terminal window (Applications | System Tools | Terminal ) and issue the ifconfig command, the answer should include the address that you typed in step 4. From the MicroStrategy Suite virtual machine, test if you can ping the Cloudera VM. When we first start Hadoop, there are no tables in the database, so we create the samples: In the Cloudera virtual machine, from the main page in Firefox open Cloudera Manager , click on I Agree in the Information Assurance Policy dialog. Log in with username admin and password admin. Look for a line with a service named oozie1 , notice that it is stopped. Click on the Actions button and select Start… . Confirm with the Start button in the dialog. A Status window will pop up, wait until the Progress is reported as Finished and close it. Now click on the Hue button in the bookmarks toolbar. Sign up with username admin and password admin, you are now in the Hue home page. Click on the first button in the blue Hue toolbar (tool tip: About Hue ) to go to the quick Start Wizard . Click on the Next button to go to Step 2: Examples tab. Click on Beeswax (Hive UI) and wait until a message over the toolbar says Examples refreshed . Now in the Hue toolbar, click on the seventh button from the left (tool tip: Metastore Manager ), you will see the default database with two tables: sample_07 and sample_08 . Enable the checkbox of sample_08 and click on the Browse Data button. After a while the Results tab shows a grid with data. So far so good. We now go back to the Cloudera Manager to start the Impala service. Click on the Cloudera Manager bookmark button. In the Impala1 row, open the Actions menu and choose Start… , then confirm Start . Wait until the Progress says Finished , then click on Close in the command details window. Go back to Hue and click on the fourth button on the toolbar (tool tip: Cloudera Impala (TM) Query UI ). In the Query Editor text area, type select * from sample_08 and click on Execute to see the table content. Next, we open the MicroStrategy virtual machine and download the 32-bit Cloudera ODBC Driver for Apache Hive, Version 2.0 from http://at5.us/AppAU2. Download the ClouderaHiveODBCSetup_v2_00.exe file and save it in C:install. How to do it... We install the ODBC driver: Run C:installClouderaHiveODBCSetup_v2_00.exe and click on the Next button until you reach Finish at the end of the setup, accepting every default. Go to Start | All Programs | Administrative Tools | Data Sources (ODBC) to open the 32-bit ODBC Data Source Administrator (if you're on 64-bit Windows, it's in the SysWOW64 folder). Click on System DSN and hit the Add… button. Select Cloudera ODBC Driver for Apache Hive and click on Finish . Fill the Hive ODBC DSN Configuration with these case-sensitive parameters (change the Host IP according to the address used in step 4 of the Getting ready section): Data Source Name : Cloudera VM Host : 192.168.1.40 Port : 21050 Database : default Type : HS2NoSasl Click on OK and then on OK again to close the ODBC Data Source Administrator. Now open the MicroStrategy Desktop application and log in with administrator and the corresponding password. Right-click on MicroStrategy Analytics Modules and select Create New Project… . Click on the Create project button and name it HADOOP, uncheck Enable Change Journal for this project and click on OK . When the wizard finishes creating the project click on Select tables from the Warehouse Catalog and hit the button labeled New… . Click on Next and type Cloudera VM in the Name textbox of the Database Instance Definition window. In this same window, open the Database type combobox and scroll down until you find Generic DBMS . Click on Next . In Local system ODBC data sources , pick Cloudera VM and type admin in both Database login and Password textboxes. Click on Next , then on Finish , and then on OK . When a Warehouse Catalog Browser error appears, click on Yes . In the Warehouse Catalog Options window, click on Edit… on the right below Cloudera VM . Select the Advanced tab and enable the radio button labeled Use 2.0 ODBC calls in the ODBC Version group. Click on OK . Now select the category Catalog | Read Settings in the left tree and enable the first radio button labeled Use standard ODBC calls to obtain the database catalog . Click on OK to close this window. When the Warehouse Catalog window appears, click on the lightning button (tool tip: Read the Warehouse Catalog ) to refresh the list of available tables. Pick sample_08 and move it to the right of the shopping cart. Then right-click on it and choose Import Prefix . Click on Save and Close and then on OK twice to close the Project Creation Assistant . You can now open the project and update the schema. From here, the procedure to create objects is the same as in any other project: Go to the Schema Objects | Attributes folder, and create a new Job attribute with these columns: ID : Table: sample_08 Column: code DESC : Table: sample_08 Column: description Go to the Fact folder and create a new Salary fact with salary column. Update the schema. Go to the Public Objects | Metrics folder and create a new Salary metric based on the Salary fact with Sum as aggregation function. Go to My Personal Objects | My Reports and create a new report with the Job attribute and the Salary metric: There you go; you just created your first Hadoop report. How it works... Executing Hadoop reports is no different from running any other standard DBMS reports. The ODBC driver handles the communication with Cloudera machine and Impala manages the creation of jobs to retrieve data. From MicroStrategy perspective, it is just another SELECT query that returns a dataset. There's more... Impala and Hive do not support the whole set of ANSI SQL syntax, so in some cases you may receive an error if a specific feature is not implemented: See the Cloudera documentation for details. HP Vertica Vertica Analytic Database is grid-based, column-oriented, and designed to manage large, fast-growing volumes of data while providing rapid query performance. It features a storage organization that favors SELECT statements over UPDATE and DELETE plus a high compression that stores columns of homogeneous datatype together. The Community (free) Edition allows up to three hosts and 1 TB of data, which is fairly sufficient for small to medium BI projects with MicroStrategy. There are several clients available for different operating systems, including 32-bit and 64-bit ODBC drivers for Windows.
Read more
  • 0
  • 0
  • 2986

article-image-using-app-directory-hootsuite
Packt
29 Oct 2013
6 min read
Save for later

Using App Directory in HootSuite

Packt
29 Oct 2013
6 min read
(For more resources related to this topic, see here.) How to do it... Move the cursor on the left ribbon. Select App Directory to open a new window. Select the required app/plugin and integrate available social content apps within HootSuite's web-based dashboard. How it works... Select the App Directory option from HootSuite's left ribbon and it will open a dialogue box with a list of apps that can be integrated with HootSuite. Currently, HootSuite allows you to connect with more than 55 social media networks. Most of the apps are free to connect, however, few premium apps have also been provided. They generally cost around $1.99 to $4.99 per month. Some of the premium apps that can be integrated by all types of users are Statigram, YouTube, Salesforce, and TrendSpottr. HubSpot and SocialFlow are the two premium apps that are only available to Pro and Enterprise HootSuite users. A few commonly used social networking websites can be integrated with HootSuite using app integration, such as Instagram, Tumblr, YouTube, SlideShare, Reddit, Vimeo, Flickr, Evernote, Storify, and StumbleUpon. You can simply click on Install App to integrate them with the HootSuite account. Once integrated, you can create a stream within any of the existing tabs or create new tabs for any specific app. Some of the apps will be standalone, whereas a few will require your login details, such as YouTube and Instagram. There's more... Let's discuss the few less-known, but important, apps that can be used by marketers using HootSuite. Evernote Evernote allows users to capture, organize, and find information from multiple platforms. Evernote is a widely used application because it works on almost all computers, mobile devices, and tablets. The Evernote app allows its users to search content from social networks using keywords. You can view, edit, and share notes with those on your social networks using HootSuite integration. RSS Reader You can add RSS feeds in the RSS Reader app and start receiving articles or information directly on your HootSuite dashboard. You can rename this stream as well so that you can manage multiple streams, receiving information from different sources. If you already have an OPML file, you can import multiple feeds and get a stream of articles. As this app is integrated within HootSuite, it allows you to share the articles on your social networks as well. TrendSpottr TrendSpottr is one of the best apps that we have seen recently. It lets you find trending topics from across social networks. The TrendSpottr app allows you to search for trending content using a keyword/phrase, topic, or type such as trending content, trending hashtags, or trending sources. You can also pick from predefined topics—news, technology, social media, infographic, economy, sports, pop culture, politics, science, and celebrity—or from a list of popular searches. You can share the trending content with your social networks right from this app integrated with HootSuite. YouTube As a digital marketer, you always want to post videos about your products, features, client testimonials, and other visuals to stay connected with your clients. You can connect multiple YouTube accounts at a monthly price of $1.99. Create a separate tab to share and schedule videos from your YouTube account. You can add a title, description, video tags, and privacy settings, and assign a category to your video before uploading. You can also create streams to search and monitor any specific channel, uploader information, or subscriptions. This proves very helpful to share any YouTube video on your other social profiles connected with HootSuite. YouTube Analytics Once you integrate your YouTube account and channel with HootSuite, you can view all analytics from the dashboard itself by integrating the free analytics app (paid app with more insights also available). You can view details about your videos and channels with video-based insights such as engagement levels, sources and countries of traffic, playbacks, demographics and geographic information. You can use all the insights available on the YouTube website. Flickr Just like videos being shared on YouTube, some organizations share pictures of their employees, products, and so on over Flickr. HootSuite offers the ability to integrate using the Flickr app that allows you to view images, upload new pictures, search for pictures, and share them on other connected social media profiles such as Twitter and Facebook. SurveyMonkey As a marketer, you always want to interact with your followers and have an ongoing need to take their feedback. SurveyMonkey is a one-stop solution for feedback from not only your Twitter followers, but also your prospects and customers through e-mails and other online channels. This app allows you to track the created surveys and share them with your social networks. Responses in SurveyMonkey's stream on HootSuite's dashboard are updated on a real-time basis. Once you start receiving responses, you can view responses in a question-summary form that pop up after clicking on the bar graph icon under the Actions label, as seen in the preceding screenshot. You can also see detailed replies by all the respondents after switching to the Individual Responses tab as shown in the following screenshot: SlideShare Marketing, sales, training, and support teams of organizations create PDFs, presentations, and documents to keep their clients or employees abreast with recent happenings. This SlideShare app that gets integrated with HootSuite provides easy access to view complete presentations and search documents on the basis of keywords. It also allows the sharing of such documents over other social media platforms that are connected to HootSuite. List of apps For the updated list of app directories that can be integrated with HootSuite's web-based social profile management tool, you can visit http://hootsuite.com/app-directory. Summary We saw in this article, that almost all the popular social websites can be integrated with HootSuite as apps. This greatly helps marketers in organizing customer interactions with the help of feedbacks and discussions, thus making HootSuite the most preferred choice for managing customer relations. Resources for Article: Further resources on this subject: Social Media in Magento [Article] Social Media for Wordpress: VIP Memberships [Article] Using Media Files – playing audio files [Article]
Read more
  • 0
  • 0
  • 3624
article-image-creating-and-using-composer-packages
Packt
29 Oct 2013
7 min read
Save for later

Creating and Using Composer Packages

Packt
29 Oct 2013
7 min read
(For more resources related to this topic, see here.) Using Bundles One of the great features in Laravel is the ease in which we can include the class libraries that others have made using bundles. On the Laravel site, there are already many useful bundles, some of which automate certain tasks while others easily integrate with third-party APIs. A recent addition to the PHP world is Composer, which allows us to use libraries (or packages) that aren't specific to Laravel. In this article, we'll get up-and-running with using bundles, and we'll even create our own bundle that others can download. We'll also see how to incorporate Composer into our Laravel installation to open up a wide range of PHP libraries that we can use in our application. Downloading and installing packages One of the best features of Laravel is how modular it is. Most of the framework is built using libraries, or packages, that are well tested and widely used in other projects. By using Composer for dependency management, we can easily include other packages and seamlessly integrate them into our Laravel app. For this recipe, we'll be installing two popular packages into our app: Jeffrey Way's Laravel 4 Generators and the Imagine image processing packages. Getting ready For this recipe, we need a standard installation of Laravel using Composer. How to do it... For this recipe, we will follow these steps: Go to https://packagist.org/. In the search box, search for way generator as shown in the following screenshot: Click on the link for way/generators : View the details at https://packagist.org/packages/way/generators and take notice of the require line to get the package's version. For our purposes, we'll use "way/generators": "1.0.*" . In our application's root directory, open up the composer.json file and add in the package to the require section so it looks like this: "require": { "laravel/framework": "4.0.*", "way/generators": "1.0.*" }, Go back to http://packagist.org and perform a search for imagine as shown in the following screenshot: Click on the link to imagine/imagine and copy the require code for dev-master : Go back to our composer.json file and update the require section to include the imagine package . It should now look similar to the following code: "require": { "laravel/framework": "4.0.*", "way/generators": "1.0.*", "imagine/imagine": "dev-master" }, Open the command line, and in the root of our application, run the Composer update as follows: php composer.phar update Finally, we'll add the Generator Service Provider, so open the app/config/app.php file and in the providers array, add the following line: 'WayGeneratorsGeneratorsServiceProvider' How it works... To get our package, we first go to packagist.org and search for the package we want. We could also click on the Browse packages link. It will display a list of the most recent packages as well as the most popular. After clicking on the package we want, we'll be taken to the detail page, which lists various links including the package's repository and home page. We could also click on the package's maintainer link to see other packages they have released. Underneath, we'll see the various versions of the package. If we open that version's detail page, we'll find the code we need to use for our composer.json file. We could either choose to use a strict version number, add a wildcard to the version, or use dev-master, which will install whatever is updated on the package's master branch. For the Generators package, we'll only use Version 1.0, but allow any minor fixes to that version. For the imagine package, we'll use dev-master, so whatever is in their repository's master branch will be downloaded, regardless of version number. We then run update on Composer and it will automatically download and install all of the packages we chose. Finally, to use Generators in our app, we need to register the service provider in our app's config file. Using the Generators package to set up an app Generators is a popular Laravel package that automates quite a bit of file creation. In addition to controllers and models, it can also generate views, migrations, seeds, and more, all through a command-line interface. Getting ready For this recipe, we'll be using the Laravel 4 Generators package maintained by Jeffrey Way that was installed in the Downloading and installing packages recipe. We'll also need a properly configured MySQL database. How to do it… Follow these steps for this recipe: Open the command line in the root of our app and, using the generator, create a scaffold for our cities as follows: php artisan generate:scaffold cities --fields="city:string" In the command line, create a scaffold for our superheroes as follows: php artisan generate:scaffold superheroes --fields="name:string, city_id:integer:unsigned" In our project, look in the app/database/seeds directory and find a file named CitiesTableSeeder.php. Open it and add some data to the $cities array as follows: <?php class CitiesTableSeeder extends Seeder { public function run() { DB::table('cities')->delete(); $cities = array( array( 'id' => 1, 'city' => 'New York', 'created_at' => date('Y-m-d g:i:s',time()) ), array( 'id' => 2, 'city' => 'Metropolis', 'created_at' => date('Y-m-d g:i:s',time()) ), array( 'id' => 3, 'city' => 'Gotham', 'created_at' => date('Y-m-d g:i:s',time()) ) ); DB::table('cities')->insert($cities); } } In the app/database/seeds directory, open SuperheroesTableSeeder.php and add some data to it: <?php class SuperheroesTableSeeder extends Seeder { public function run() { DB::table('superheroes')->delete(); $superheroes = array( array( 'name' => 'Spiderman', 'city_id' => 1, 'created_at' => date('Y-m-d g:i:s', time()) ), array( 'name' => 'Superman', 'city_id' => 2, 'created_at' => date('Y-m-d g:i:s', time()) ), array( 'name' => 'Batman', 'city_id' => 3, 'created_at' => date('Y-m-d g:i:s', time()) ), array( 'name' => 'The Thing', 'city_id' => 1, 'created_at' => date('Y-m-d g:i:s', time()) ) ); DB::table('superheroes')->insert($superheroes); } } In the command line, run the migration then seed the database as follows: php artisan migrate php artisan db:seed Open up a web browser and go to http://{your-server}/cities. We will see our data as shown in the following screenshot: Now, navigate to http://{your-server}/superheroes and we will see our data as shown in the following screenshot: How it works... We begin by running the scaffold generator for our cities and superheroes tables. Using the --fields tag, we can determine which columns we want in our table and also set options such as data type. For our cities table, we'll only need the name of the city. For our superheroes table, we'll want the name of the hero as well as the ID of the city where they live. When we run the generator, many files will automatically be created for us. For example, with cities, we'll get City.php in our models, CitiesController.php in controllers, and a cities directory in our views with the index, show, create, and edit views. We then get a migration named Create_cities_table.php, a CitiesTableSeeder.php seed file, and CitiesTest.php in our tests directory. We'll also have our DatabaseSeeder.php file and our routes.php file updated to include everything we need. To add some data to our tables, we opened the CitiesTableSeeder.php file and updated our $cities array with arrays that represent each row we want to add. We did the same thing for our SuperheroesTableSeeder.php file. Finally, we run the migrations and seeder and our database will be created and all the data will be inserted. The Generators package has already created the views and controllers we need to manipulate the data, so we can easily go to our browser and see all of our data. We can also create new rows, update existing rows, and delete rows.
Read more
  • 0
  • 0
  • 7807

article-image-multiserver-installation
Packt
29 Oct 2013
7 min read
Save for later

Multiserver Installation

Packt
29 Oct 2013
7 min read
(For more resources related to this topic, see here.) The prerequisites for Zimbra Let us dive into the prerequisites for Zimbra: Zimbra supports only 64-bit LTS versions of Ubuntu, release 10.04 and above. If you would like to use a 32-bit version, you should use Ubuntu 8.04.x LTS with Zimbra 7.2.3. Having a clean and freshly installed system is preferred for Zimbra; it requires a dedicated system and there is no need to install components such as Apache and MySQL since the Zimbra server contains all the components it needs. Note that installing Zimbra with another service (such as a web server) on the same server can cause operational issues. The dependencies (libperl5.14, libgmp3c2, build-essential, sqlite3, sysstat, and ntp) should be installed beforehand. Configure a fixed IP address on the server. Have a domain name and a well-configured DNS (A and MX entries) that points to the server. The system clocks should be synced on all servers. Configure the file /etc/resolv.conf on all servers to point at the server on which we installed the bind (it can be installed on any Zimbra server or on a separate server). We will explain this point in detail later. Preparing the environment Before starting the Zimbra installation process, we should prepare the environment. In the first part of this section, we will see the different possible configurations and then, in the second part, we will present the needed assumptions to apply the chosen configuration. Multiserver configuration examples One of the greatest advantages of Zimbra is its scalability; we can deploy it for a small business with few mail accounts as well as for a huge organization with thousands of mail accounts. There are many possible configuration options; the following are the most used out of those: Small configuration: All Zimbra components are installed on only one server. Medium configuration: Here, LDAP and message store are installed on one server and Zimbra MTA on a separate server. Note here that we can use more Zimbra MTA servers so we can scale easier for large incoming or outgoing e-mail volume. Large configuration: In this case, LDAP will be installed on a dedicated server and we will have multiple mailbox and MTA servers, so we can scale easier for a large number of users. Very large configuration: The difference between this configuration and large one is the existence of an additional LDAP server, so we will have a Master LDAP and its replica. We choose the medium configuration; so, we will install LDAP and mailbox in one server and MTA on the other server. Install different servers in the following order (for medium configuration, 1 and 2 are combined in only one step): 1. First of all, install and configure the LDAP server. 2. Then, install and configure Zimbra mailbox servers. 3. Finally, install Zimbra MTA servers and finish the whole installation configuration. New installations of Zimbra limit spam/ham training to the first installed MTA. If you uninstall or move this MTA, you should enable spam/ham training on another MTA as one host should have this enabled to run zmtrainsa --cleanup. To do this, execute the following command: zmlocalconfig -e zmtrainsa_cleanup_host=TRUE Assumptions In this article, we will use some specific information as input in the Zimbra installation process, which, in most cases, will be different for each user. Therefore, we will note some of the most redundant ones in this section. Remember that you should specify your own values rather than using the arbitrary values that I have provided. The following is the list of assumptions used : OS version: ubuntu-12.04.2-server-amd64 Zimbra version: zcs-8.0.3_GA_5664.UBUNTU12_64.20130305090204 MTA server name: mta MTA hostname: mta.zimbra-essentials.com Internet domain: zimbra-essentials.com MTA server IP address: 172.16.126.141 MTA server IP subnet mask: 255.255.255.0 MTA server IP gateway: 172.16.126.1 Internal DNS server: 172.16.126.11 External DNS server: 8.8.8.8 MTA admin ID: abdelmonam MTA admin Password: Z!mbra@dm1n Zimbra admin Password: zimbrabook MTA server name: ldap MTA hostname: ldap.zimbra-essentials.com LDAP server IP address: 172.16.126.140 LDAP server IP subnet mask: 255.255.255.0 LDAP server IP gateway: 172.16.126.1 Internal DNS server: 172.16.126.11 External DNS server: 8.8.8.8 LDAP admin ID: abdelmonam LDAP admin password: Z!mbra@dm1n To be able to follow the steps described in the next sections, especially each time we need to perform a configuration, the reader should know how to harness the vi editor. If not, you should develop your skill set for using the vi editor or use another editor instead. You can find good basic training for the vi editor at http://www.cs.colostate.edu/helpdocs/vi.html System requirements For the various system requirements, please refer to the following link: http://www.zimbra.com/docs/os/8.0.0/multi_server_install/wwhelp/wwhimpl/common/html/wwhelp.htm#href=ZCS_Multiserver_Open_8.0.System_Requirements_for_VMware_Zimbra_Collaboration_Server_8.0.html&single=true If you are using another version of Zimbra, please check the correct requirements on the Zimbra website. Ubuntu server installation First of all, choose the appropriate language. Choose Install Ubuntu Server and then press Enter. When the installation prompts you to provide a hostname, configure only a one-word hostname; in the Assumptions section, we've chosen ldap for the LDAP and mailstore server and mta for the MTA server—don't give the fully qualified domain name (for example, mta.zimbra-essentials.com). On the next screen that calls for the domain name, assign it zimbra-essentials.com (without the hostname). The hard disk setup is simple if you are using a single drive; however, in the case of a server, it's not the best way to do things. There are a lot of options for partitioning your drives. In our case, we just make a little partition (2x RAM) for swapping, and what remains will be used for the whole system. Others can recommend separate partitions for mailstore, system, and so on. Feel free to use the recommendation you want depending on your IT architecture; use your own judgment here or ask your IT manager. After finishing the partitioning task, you will be asked to enter the username and password; you can choose what you want except admin and zimbra. When asked if you want to encrypt the home directory, select No and then press Enter. Press Enter to accept an empty entry for the HTTP proxy. Choose Install security updates automatically and then press Enter. On the Software Selection screen, you must select the DNS Server and the OpenSSH Server choices for installation; no other options. This will authorize remote administration (SSH) and mandatorily set up bind9 for a split DNS. For bind9, you can install it on only one server, which is what we've done in this article. Select Yes and then press Enter to install the GRUB boot loader to the master boot record. The installation should have completed successfully. Preparing Ubuntu for Zimbra installation In order to prepare the Ubuntu for the Zimbra installation, the following steps need to be performed: Log in to the newly installed system and update and upgrade Ubuntu using the following commands: sudo apt-get update sudo apt-get upgrade Install the dependencies as follows: sudo apt-get install libperl5.14 libgmp3c2 build-essential sqlite3 sysstat ntp Zimbra recommends (but there's no obligation) to disable and remove Apparmor. sudo /etc/init.d/apparmor stop sudo /etc/init.d/apparmor teardown sudo update-rc.d -f apparmor remove sudo aptitude remove apparmor apparmor-utils Set the static IP for your server as follows: Open the network interfaces file using the following command: sudo vi /etc/network/interfaces Then replace the following line: iface eth0 inet dhcp With: iface eth0 inet static address 172.16.126.14 netmask 255.255.255.0 gateway 172.16.126.1 network 172.16.126.0 broadcast 172.16.126.255 Restart the network process by typing in the following: sudo /etc/init.d/networking restart Sanity test! To verify that your network configuration is configured properly, type in ifconfig and ensure that the settings are correct. Then try to ping any working website (such as google.com) to see if that works. On each server, pay attention when you set the static IP address (172.16.126.140 for the LDAP server and 172.16.126.141 for the MTA server). Summary In this article, we learned the prerequisites for Zimbra multiserver installation and preparing the environment for the installation of the Zimbra server in a multiserver environment. Resources for Article : Further resources on this subject: Routing Rules in AsteriskNOW - The Calling Rules Tables [Article] Users, Profiles, and Connections in Elgg [Article] Integrating Zimbra Collaboration Suite with Microsoft Outlook [Article]
Read more
  • 0
  • 0
  • 4467
Modal Close icon
Modal Close icon