How-To Tutorials - Data

1210 Articles
Using Oracle GoldenGate

Packt
02 Aug 2013
15 min read
Creating one-way replication (Simple)

Here we'll use the demo scripts included in the OGG software distribution to implement a basic homogeneous (Oracle-to-Oracle) replication.

Getting ready

You need to ensure your Oracle database is in archivelog mode. If your database is not in archivelog mode, you won't be able to recover it after media corruption or user errors.

How to do it...

The steps for creating one-way replication are as follows:

Check whether supplemental logging is enabled on your source database using the following command:

SQL> select supplemental_log_data_min from v$database;

The output of the preceding command will be as follows:

SUPPLEME
--------
NO

Enable supplemental logging using the following commands:

SQL> alter database add supplemental log data;
SQL> select supplemental_log_data_min from v$database;

The output of the preceding command will be as follows:

SUPPLEME
--------
YES

Let's run the demo script to create a couple of tables in the scott schema. You need to know the scott schema password, which is tiger by default. We do it using the following commands:

$ cd /u01/app/oracle/gg
$ ./ggsci
$ sqlplus scott
Enter password:
SQL> @demo_ora_create.sql

The output of the preceding command will be as follows:

DROP TABLE tcustmer
*
ERROR at line 1:
ORA-00942: table or view does not exist
Table created.
DROP TABLE tcustord
*
ERROR at line 1:
ORA-00942: table or view does not exist
Table created.

You must add the checkpoint table; do it as follows:

$ cd /u01/app/oracle/gg
$ vi GLOBALS

Add the following entry to the file:

CheckPointTable ogg.chkpt

Save the file and exit. Next, create the checkpoint table using the following commands:

$ ./ggsci
GGSCI> add checkpointtable
GGSCI> info checkpointtable

The output of the preceding command will be as follows:

No checkpoint table specified, using GLOBALS specification (ogg.chkpt)...
Checkpoint table ogg.chkpt created 2012-10-31 12:39:38.

Set up the MANAGER parameter file using the following commands:

$ cd /u01/app/oracle/gg/dirprm
$ vi mgr.prm

Add the following lines to the file:

PORT 7809
DYNAMICPORTLIST 7810-7849
AUTORESTART er *, RETRIES 6, WAITMINUTES 1, RESETMINUTES 10
PURGEOLDEXTRACTS /u01/app/oracle/gg/dirdat/*, USECHECKPOINTS, MINKEEPDAYS 2

Save the file and exit. Start the manager using the following commands:

$ cd /u01/app/oracle/gg
$ ggsci
GGSCI> start mgr
GGSCI> info mgr

The output of the preceding command will be as follows:

GGSCI> info all
Program     Status      Group       Lag at Chkpt    Time Since Chkpt
MANAGER     RUNNING

Create a TNS entry in the database home so that the extract can connect to the Automatic Storage Management (ASM) instance, using the following commands:

$ cd $ORACLE_HOME/network/admin
$ vi tnsnames.ora

Add the following TNS entry:

ASMGG =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = IPC)(key=EXTPROC1521))
    (CONNECT_DATA = (SID=+ASM)))

Save the file and exit. Create a user asmgg with the sysdba role in the ASM instance. Connect to the ASM instance as the sys user using the following command:

$ sqlplus sys/<password>@asmgg as sysasm

The output of the preceding command will be as follows:

SQL*Plus: Release 11.2.0.3.0 Production on Thu Nov 15 14:24:20 2012
Copyright (c) 1982, 2011, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Automatic Storage Management option

The user is created using the following command:

SQL> create user asmgg identified by asmgg ;

We will get the following output message:

User created.
Provide the sysdba role to the user asmgg using the following command:

SQL> grant sysdba to asmgg ;

We will get the following output message:

Grant succeeded.

Let's add supplemental logging to the source tables using the following commands:

$ cd /u01/app/oracle/gg
$ ./ggsci
GGSCI> add trandata scott.tcustmer

The output will be as follows:

Logging of supplemental redo data enabled for table SCOTT.TCUSTMER.

Then type the following command:

GGSCI> add trandata scott.tcustord

The output message will be as follows:

Logging of supplemental redo data enabled for table SCOTT.TCUSTORD.

The next command to be executed is:

GGSCI> info trandata scott.tcustmer

The output message will be as follows:

Logging of supplemental redo log data is disabled for table OGG.TCUSTMER.

The next command to be used is:

GGSCI> info trandata scott.tcustord

The output will be as follows:

Logging of supplemental redo log data is disabled for table OGG.TCUSTORD.

Create the extract parameter file for data capture using the following commands:

$ cd /u01/app/oracle/gg/dirprm
$ vi ex01sand.prm

Add the following lines to the file:

EXTRACT ex01sand
SETENV (ORACLE_SID="SRC100")
SETENV (ORACLE_HOME="/u01/app/oracle/product/11.2.0/db_1")
SETENV (NLS_LANG="AMERICAN_AMERICA.AL32UTF8")
USERID ogg, PASSWORD ogg
TRANLOGOPTIONS EXCLUDEUSER ogg
TRANLOGOPTIONS ASMUSER asmgg@ASMGG ASMPASSWORD asmgg
-- Trail File location locally
EXTTRAIL /u01/app/oracle/gg/dirdat/pr
DISCARDFILE /u01/app/oracle/gg/dirrpt/ex01sand.dsc, PURGE
DISCARDROLLOVER AT 01:00 ON SUNDAY
TABLE SCOTT.TCUSTMER ;
TABLE SCOTT.TCUSTORD ;

Save the file and exit. Let's add the Extract process and start it. We do it by using the following commands:

$ cd /u01/app/oracle/gg
$ ./ggsci
GGSCI> add extract ex01sand tranlog begin now

The output of the preceding command will be as follows:

EXTRACT added.

The following command adds the location of the trail files and the size for each trail file created:

GGSCI> add exttrail /u01/app/oracle/gg/dirdat/pr extract ex01sand megabytes 2

The output of the preceding command will be as follows:

EXTTRAIL added.

GGSCI> start ex01sand
Sending START request to MANAGER ...
EXTRACT EX01SAND starting

GGSCI> info all
Program     Status      Group       Lag at Chkpt    Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     EX01SAND    00:00:00        00:00:06

Next we'll create the data pump parameter file using the following commands:

$ cd /u01/app/oracle/gg/dirprm
$ vi pp01sand.prm

Add the following lines to the file:

EXTRACT pp01sand
PASSTHRU
RMTHOST hostb MGRPORT 7820
RMTTRAIL /u01/app/oracle/goldengate/dirdat/rp
DISCARDFILE /u01/app/oracle/gg/dirrpt/pp01sand.dsc, PURGE
-- Tables for transport
TABLE SCOTT.TCUSTMER ;
TABLE SCOTT.TCUSTORD ;

Save the file and exit. Add the data pump process and the final configuration on the source side as follows:

GGSCI> add extract pp01sand exttrailsource /u01/app/oracle/gg/dirdat/pr

The output of the preceding command will be as follows:

EXTRACT added.

The following command points the pump to drop the trail files at the remote location:

GGSCI> add rmttrail /u01/app/oracle/goldengate/dirdat/rp extract pp01sand megabytes 2

The output of the preceding command will be as follows:

RMTTRAIL added

Then we execute the following command:

GGSCI> info all

The output of the preceding command will be as follows:

Program     Status      Group       Lag at Chkpt    Time Since Chkpt
MANAGER     RUNNING
EXTRACT     RUNNING     EXPR610     00:00:00        00:00:05
EXTRACT     STOPPED     PP01SAND    00:00:00        00:00:55

We're not going to start the data pump (pump) at this point, since the manager does not yet exist at the target site.
We've now completed most of our steps on the source system. We'll have to come back to the source server to start the pump a little later. Now we'll move on to our target server, where we'll set up the Replicat process in order to receive and apply the changes from the source database. Perform the following actions on the target database:

Create tables on the target host using the following commands:

$ cd /u01/app/oracle/goldengate
$ sqlplus scott/tiger
SQL> @demo_ora_create.sql

The output of the preceding command will be as follows:

DROP TABLE tcustmer
*
ERROR at line 1:
ORA-00942: table or view does not exist
Table created.
DROP TABLE tcustord
*
ERROR at line 1:
ORA-00942: table or view does not exist
Table created.

Let's add the checkpoint table as a global parameter using the following commands:

$ cd /u01/app/oracle/goldengate
$ vi GLOBALS

Add the following line to the file:

CheckPointTable ogg.chkpt

Save the file and exit. Create the checkpoint table using the following commands:

$ cd ..
$ ./ggsci
GGSCI> dblogin userid ogg password ogg
GGSCI> add checkpointtable

Then execute the following commands:

$ cd /u01/app/oracle/goldengate/dirprm
$ vi mgr.prm

Add the following lines to the file:

PORT 7820
DYNAMICPORTLIST 7821-7849
AUTORESTART er *, RETRIES 6, WAITMINUTES 1, RESETMINUTES 10
PURGEOLDEXTRACTS /u01/app/oracle/goldengate/dirdat/*, USECHECKPOINTS, MINKEEPFILES 2

Save the file and exit. Start the manager using the following commands:

$ cd /u01/app/oracle/goldengate
$ ./ggsci
GGSCI> start mgr
GGSCI> info mgr
GGSCI> info all

We will get the following output:

Program     Status      Group       Lag at Chkpt    Time Since Chkpt
MANAGER     RUNNING

Now we're ready to create the Replicat parameter file. Edit the parameter file using the following commands:

$ cd /u01/app/oracle/goldengate/dirprm
$ vi re01sand.prm

Add the following lines to the file:

REPLICAT re01sand
SETENV (ORACLE_SID="TRG101")
SETENV (ORACLE_HOME="/u01/app/oracle/product/11.1.0/db_1")
SETENV (NLS_LANG = "AMERICAN_AMERICA.AL32UTF8")
USERID ogg PASSWORD ogg
DISCARDFILE /u01/app/oracle/goldengate/dirrpt/re01sand.dsc, APPEND
DISCARDROLLOVER at 01:00
ReportCount Every 30 Minutes, Rate
REPORTROLLOVER at 01:30
DBOPTIONS DEFERREFCONST
ASSUMETARGETDEFS
MAP SCOTT.TCUSTMER , TARGET SCOTT.TCUSTMER ;
MAP SCOTT.TCUSTORD , TARGET SCOTT.TCUSTORD ;

Save the file and exit. We now add and start the Replicat process using the following commands:

$ cd ..

The following exttrail location must match exactly the pump's rmttrail location on the source server:

$ ./ggsci
GGSCI> add replicat re01sand exttrail /u01/app/oracle/goldengate/dirdat/rp checkpointtable ogg.chkpt
GGSCI> start re01sand

The output of the preceding command will be as follows:

Sending START request to MANAGER ...
REPLICAT RE01SAND starting

Then we execute the following command:

GGSCI> info all

The output of the preceding command will be as follows:

Program     Status      Group       Lag at Chkpt    Time Since Chkpt
MANAGER     RUNNING
REPLICAT    RUNNING     RE01SAND    00:00:00        00:00:01

Let's go back to the source host and start the pump using the following commands:

$ cd /u01/app/oracle/gg
$ ./ggsci
GGSCI> start pp01sand

The output of the preceding command will be as follows:

Sending START request to MANAGER ...
EXTRACT PP01SAND starting

Next we use the demo insert script to add rows to the source tables, which should replicate to the target tables.
We can do it using the following commands:

$ cd /u01/app/oracle/gg
$ sqlplus scott/tiger
SQL> @demo_ora_insert

The output of the preceding command will be as follows:

1 row created.
1 row created.
1 row created.
1 row created.
Commit complete.

To verify that the 4 rows just created have been captured at the source, use the following commands:

$ ./ggsci
GGSCI> stats ex01sand totalsonly scott.*

The output of the preceding command will be as follows:

Sending STATS request to EXTRACT EX01SAND ...
Start of Statistics at 2012-11-30 20:22:37.
Output to /u01/app/oracle/gg/dirdat/pr:
... truncated for brevity
*** Latest statistics since 2012-11-30 20:17:38 ***
Total inserts        4.00
Total updates        0.00
Total deletes        0.00
Total discards       0.00
Total operations     4.00

To verify whether the pump has shipped the changes to the target server, use the following command:

GGSCI> stats pp01sand totalsonly scott.*

The output of the preceding command will be as follows:

Sending STATS request to EXTRACT PP01SAND ...
Start of Statistics at 2012-11-30 20:24:56.
Output to /u01/app/oracle/goldengate/dirdat/rp:
Cumulative totals for specified table(s):
... cut for brevity
*** Latest statistics since 2012-11-30 20:18:14 ***
Total inserts        4.00
Total updates        0.00
Total deletes        0.00
Total discards       0.00
Total operations     4.00
End of Statistics.

Finally, to check whether the changes have been applied at the target, the next command is run on the target server as follows:

$ ./ggsci
GGSCI> stats re01sand totalsonly scott.*

The output of the preceding command will be as follows:

Sending STATS request to REPLICAT RE01SAND ...
Start of Statistics at 2012-11-30 20:28:01.
Cumulative totals for specified table(s):
...
*** Latest statistics since 2012-11-30 20:18:20 ***
Total inserts        4.00
Total updates        0.00
Total deletes        0.00
Total discards       0.00
Total operations     4.00
End of Statistics.

How it works...

Supplemental logging must be turned on at the database level and subsequently at the table level as well, for those tables you would like to replicate. For a one-way replication, this is done on the source tables. There is no need to turn on supplemental logging at the target site if the target site is not in turn a source to other targets or to itself.

A database user ogg is created in order to administer the OGG schema. This user is used solely for the purpose of administering OGG in the database.

Checkpoints are needed by both the source and target servers; these are structures persisted to disk that record a known position in the trail file. You would restart from these after an expected or unexpected shutdown of an OGG process.

The PORT parameter in the mgr.prm file specifies the port to which the MGR process should bind and start listening for connection requests. If the manager is down, then connections can't be established and you'll receive TCP connection errors. The only required parameter is the port number itself. Also, the PURGEOLDEXTRACTS parameter is a nice way to keep your trail files to a minimum size, so that they aren't stored indefinitely and eventually run you out of space in your filesystem. In this example, we're asking the manager to purge trail files and keep the files from the last two days on disk.

If your Oracle database is using an ASM instance, then OGG needs to establish a connection to the ASM instance in order to read the online-redo logs. You must ensure that you either use the sys schema or create a user (such as asmgg) with SYSDBA privileges for authentication.
Since we need supplemental logging at the table level as well, add trandata does precisely this.

Now we'll focus on some of the EXTRACT (ex01sand) data capture parameters. For one thing, you'll notice that we need to supply the extract with credentials to the database and the ASM instance in order to scan the online-redo logs for committed transactions. The following lines tell OGG to exclude the user ogg from capture. The second TRANLOGOPTIONS line is how the extract authenticates to the ASM instance.

USERID ogg, PASSWORD ogg
TRANLOGOPTIONS EXCLUDEUSER ogg
TRANLOGOPTIONS ASMUSER asmgg@ASMGG ASMPASSWORD asmgg

If you're using Oracle 10gR2 (on its later patchsets) or Oracle 11.2.0.2 and later, you could use the newer ASM API by specifying TRANLOGOPTIONS DBLOGREADER rather than ASMUSER. The API uses the database connection rather than a connection to the ASM instance to read the online-redo logs (a sketch of this variant appears at the end of this section).

The following two lines in the extract tell the extract where to place the trail files, with a prefix of pr followed by six digits that increment each time a file rolls over to the next file generation. The DISCARDFILE by convention has the same name as the extract, but with a .dsc extension for discard. If, for any reason, OGG can't capture a transaction, it will write the text and SQL to this file for later investigation.

EXTTRAIL /u01/app/oracle/gg/dirdat/pr
DISCARDFILE /u01/app/oracle/gg/dirrpt/ex01sand.dsc, PURGE

Tables or schemas are captured with the following syntax in the extract file:

TABLE SCOTT.TCUSTMER ;
TABLE SCOTT.TCUSTORD ;

The specification can vary and use wildcards as well. Say you want to capture the entire schema; you could specify this as TABLE SCOTT.* ;.

In the following code, the first command adds the extract with the option tranlog begin now, telling OGG to start capturing changes from the online-redo logs as of now. The second command tells the extract where to store the trail files, with a size not exceeding 2 MB per file.

GGSCI> add extract ex01sand tranlog begin now
GGSCI> add exttrail /u01/app/oracle/gg/dirdat/pr extract ex01sand megabytes 2

Now, the PUMP (data pump; pp01sand) is an optional, but highly recommended, extract whose sole purpose is to perform all of the TCP/IP activity; for example, transporting the trail files to the target site. This is beneficial because we relieve the capture process from performing any of the TCP/IP activity. The parameters in the following snippet tell the pump to send the data as is, using the PASSTHRU parameter. This is the optimal and preferred method if there isn't any data transformation along the way. The RMTHOST parameter specifies the destination host and the port on which the remote manager is listening, for example, port 7820. If the manager is not running at the target, the destination host will refuse the connection; that is why we did not start the pump earlier during our work on the source host.

PASSTHRU
RMTHOST hostb MGRPORT 7820
RMTTRAIL /u01/app/oracle/goldengate/dirdat/rp

The RMTTRAIL parameter specifies where the trail files will be stored at the remote host, with a prefix of rp followed by a six-digit number that increases sequentially as the files roll over after a specified size has been reached. Finally, at the destination host, hostb, the Replicat process (re01sand) is the applier, where the SQL is replayed in the target database.
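As a hedged illustration of the DBLOGREADER alternative mentioned above (this is a sketch, not the recipe's listing; it reuses the names and paths from this recipe and assumes a supported Oracle version), the ex01sand parameter file could drop the ASM credentials in favour of the database connection:

EXTRACT ex01sand
SETENV (ORACLE_SID="SRC100")
SETENV (ORACLE_HOME="/u01/app/oracle/product/11.2.0/db_1")
SETENV (NLS_LANG="AMERICAN_AMERICA.AL32UTF8")
USERID ogg, PASSWORD ogg
TRANLOGOPTIONS EXCLUDEUSER ogg
-- Read the online-redo logs through the database connection (no ASM login needed)
TRANLOGOPTIONS DBLOGREADER
EXTTRAIL /u01/app/oracle/gg/dirdat/pr
DISCARDFILE /u01/app/oracle/gg/dirrpt/ex01sand.dsc, PURGE
TABLE SCOTT.TCUSTMER ;
TABLE SCOTT.TCUSTORD ;

With this variant, the ASMGG TNS entry and the asmgg SYSDBA user would no longer be required for capture.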
The following two lines in the Replicat parameter file specify how the Replicat maps source to target data as it arrives by way of the trail files:

MAP SCOTT.TCUSTMER , TARGET SCOTT.TCUSTMER ;
MAP SCOTT.TCUSTORD , TARGET SCOTT.TCUSTORD ;

The target tables don't necessarily have to use the same schema names as in the preceding example; the data could have been applied to a different schema altogether if that were the requirement.

Summary

In this article we learned about the creation of one-way replication using Oracle GoldenGate.

Resources for Article:

Further resources on this subject:
Oracle GoldenGate 11g: Configuration for High Availability [Article]
Getting Started with Oracle GoldenGate [Article]
Oracle GoldenGate: Considerations for Designing a Solution [Article]

Making a simple cURL request (Simple)

Packt
01 Aug 2013
5 min read
Getting ready

In this article we will use cURL to request and download a web page from a server.

How to do it...

Enter the following code into a new PHP project:

<?php
// Function to make GET request using cURL
function curlGet($url) {
    $ch = curl_init(); // Initialising cURL session
    // Setting cURL options
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($ch, CURLOPT_URL, $url);
    $results = curl_exec($ch); // Executing cURL session
    curl_close($ch); // Closing cURL session
    return $results; // Return the results
}

$packtPage = curlGet('http://www.packtpub.com/oop-php-5/book');
echo $packtPage;
?>

Save the project as 2-curl-request.php (ensure you use the .php extension!). Execute the script. Once our script has completed, we will see the source code of http://www.packtpub.com/oop-php-5/book displayed on the screen.

How it works...

Let's look at how we performed the previously defined steps:

The first line, <?php, and the last line, ?>, indicate where our PHP code block will begin and end. All the PHP code should appear between these two tags.

Next, we create a function called curlGet(), which accepts a single parameter $url, the URL of the resource to be requested.

Running through the code inside the curlGet() function, we start off by initializing a new cURL session as follows:

$ch = curl_init();

We then set our options for cURL as follows:

curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE); // Tells cURL to return the results of the request (the source code of the target page) as a string.
curl_setopt($ch, CURLOPT_URL, $url); // Here we tell cURL the URL we wish to request; notice that it is the $url variable that we passed into the function as a parameter.

We execute our cURL request, storing the returned string in the $results variable as follows:

$results = curl_exec($ch);

Now that the cURL request has been made and we have the results, we close the cURL session by using the following code:

curl_close($ch);

At the end of the function, we return the $results variable containing our requested page out of the function, for use in the rest of our script.

return $results;

After the function is closed we are able to use it throughout the rest of our script. Later, having decided on the URL we wish to request, http://www.packtpub.com/oop-php-5/book, we execute the function, passing the URL as a parameter and storing the returned data from the function in the $packtPage variable as follows:

$packtPage = curlGet('http://www.packtpub.com/oop-php-5/book');

Finally, we echo the contents of the $packtPage variable (the page we requested) to the screen by using the following code:

echo $packtPage;

There's more...

There are a number of different HTTP request methods, which indicate to the server the desired response or the action to be performed. The request method being used in this article is cURL's default GET request. This tells the server that we would like to retrieve a resource. Depending on the resource we are requesting, a number of parameters may be passed in the URL. For example, when we perform a search on the Packt Publishing website for a query, say php, we notice that the URL is http://www.packtpub.com/books?keys=php. This is requesting the resource books (the page that displays search results) and passing a value of php to the keys parameter, indicating that the dynamically generated page should show results for the search query php.

More cURL options

Of the many cURL options available, only two have been used in our preceding code.
They are CURLOPT_RETURNTRANSFER and CURLOPT_URL. Though we will cover many more throughout the course of this article, some other options to be aware of, and that you may wish to try out, are listed below:

CURLOPT_FAILONERROR
Value: TRUE or FALSE
Purpose: If a response code greater than 400 is returned, cURL will fail silently.

CURLOPT_FOLLOWLOCATION
Value: TRUE or FALSE
Purpose: If Location: headers are sent by the server, follow the location.

CURLOPT_USERAGENT
Value: A user agent string, for example: 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:15.0) Gecko/20100101 Firefox/15.0.1'
Purpose: Sending the user agent string in your request informs the target server which client is requesting the resource. Since many servers will only respond to 'legitimate' requests, it is advisable to include one.

CURLOPT_HTTPHEADER
Value: An array containing header information, for example: array('Cache-Control: max-age=0', 'Connection: keep-alive', 'Keep-Alive: 300', 'Accept-Language: en-us,en;q=0.5')
Purpose: This option is used to send header information with the request; we will come across use cases for this in later recipes.

A full listing of cURL options can be found on the PHP website at http://php.net/manual/en/function.curl-setopt.php.

The HTTP response code

An HTTP response code is the number that is returned, which corresponds with the result of an HTTP request. Some common response code values are as follows:

200: OK
301: Moved Permanently
400: Bad Request
401: Unauthorized
403: Forbidden
404: Not Found
500: Internal Server Error

It is often useful to have our scrapers respond to different response code values in different ways, for example, letting us know if a web page has moved, is no longer accessible, or we are unauthorized to access a particular page. In this case, we can access the response code of a request using cURL by adding the following line to our function, which will store the response code in the $httpResponse variable:

$httpResponse = curl_getinfo($ch, CURLINFO_HTTP_CODE);

A combined sketch that uses this check together with some of the options above appears after the resource list below.

Summary

This article covered techniques for making a simple cURL request.

Resources for Article:

Further resources on this subject:
A look into the high-level programming operations for the PHP language [Article]
Installing PHP-Nuke [Article]
Creating Your Own Theme—A Wordpress Tutorial [Article]
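Pulling the preceding pieces together, the following is a minimal sketch (not the article's original listing) of how curlGet() might be extended with a couple of the options from the table above and the response-code check just mentioned; the function name and target URL are placeholders chosen for the example:

<?php
// Extended GET helper: returns the page body, or FALSE on an HTTP error.
function curlGetWithStatus($url)
{
    $ch = curl_init();                                  // Initialise the cURL session
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);     // Return the response as a string
    curl_setopt($ch, CURLOPT_URL, $url);                // Target URL
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);     // Follow Location: redirects
    curl_setopt($ch, CURLOPT_USERAGENT,
        'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.5; rv:15.0) Gecko/20100101 Firefox/15.0.1');

    $results      = curl_exec($ch);                        // Execute the request
    $httpResponse = curl_getinfo($ch, CURLINFO_HTTP_CODE); // HTTP response code
    curl_close($ch);

    if ($httpResponse >= 400) {                          // 4xx/5xx: treat as failure
        return FALSE;
    }
    return $results;
}

$page = curlGetWithStatus('http://www.packtpub.com/books?keys=php'); // placeholder URL
if ($page !== FALSE) {
    echo $page;
}
?>

Checking the response code before using the body is what lets a scraper react differently to a moved, missing, or forbidden page.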

Data sources for the Charts

Packt
31 Jul 2013
12 min read
Spreadsheets

In Spreadsheets, two preparation steps must be addressed in order to use a Spreadsheet as a data source with the Visualization API. The first is to identify the URL location of the Spreadsheet file for the API code. The second step is to set appropriate access to the data held in the Spreadsheet file.

Preparation

The primary method of access for a Spreadsheet behaving as a data source is through a JavaScript-based URL query. The query itself is constructed with the Google Query Language. If the URL request does not include a query, all data source columns and rows are returned in their default order. Querying a Spreadsheet also requires that the Spreadsheet file and the API application security settings are configured appropriately. Proper preparation of a Spreadsheet as a data source involves both setting the appropriate access as well as locating the file's query URL.

Permissions

In order for a Spreadsheet to return data to the Visualization API properly, access settings on the Spreadsheet file itself must allow view access to users. For a Spreadsheet that allows for edits, including form-based additions, permissions must be set to Edit. To set permissions on the Spreadsheet, select the Share button to open up the Sharing settings dialog. To be sure the data is accessible to the Visualization API, access levels for both the Visualization application and the Spreadsheet must be the same. For instance, if a user has access to the Visualization application but does not have view access to the Spreadsheet, the user will not be able to run the visualization, as the data is more restrictive to that user than the application. The opposite scenario is true as well, but less likely to cause confusion, as a user unable to access the API application is a fairly self-described problem. All Google applications handle access and permissions similarly. More information on this topic can be found on the Google Apps Support pages.

Google Permissions overview is available at http://support.google.com/drive/bin/answer.py?hl=en&answer=2494886&rd=1.

Get the URL path

At present, acquiring a query-capable URL for a Spreadsheet is not as straightforward a task as one might think. There are several methods by which a URL is generated for sharing purposes, but the URL format needed for a data source query can only be found by creating a gadget in the Spreadsheet. A Google Gadget is simply dynamic, HTML- or JavaScript-based web content that can be embedded in a web page. Google Gadgets also have their own API, and have capabilities beyond Spreadsheets applications.

Information on the Google Gadget API is available at https://developers.google.com/gadgets/.

Initiate gadget creation by selecting the Gadget... option from the Insert item on the menu bar. When the Gadget Settings window appears, select Apply & close from the Gadget Settings dialog. Choose any gadget from the selection window. The purpose of this procedure is simply to retrieve the correct URL for querying. In fact, deleting the gadget as soon as the URL is copied is completely acceptable. In other words, the specific gadget chosen is of no consequence. Once the gadget has been created, select Get query data source url… from the newly created gadget's drop-down menu. Next, determine and select the range of the Spreadsheet to query.
Either the previously selected range when the gadget was created, or the entire sheet, is acceptable, depending on the needs of the Visualization application being built. The URL listed under Paste this as a gadget data source url in the Table query data source window is the correct URL to use with API code requiring query capabilities. Be sure to select the desired cell range, as the URL will change with the various options.

Important note
Google Gadgets are to be retired in 2013, but the query URL is still part of the gadget object at the time of publication. Look for the method of finding the query URL to change as Gadgets are retired.

Query

Use the URL retrieved from the Spreadsheet Gadget to build the query. The following query statement is set to query the entire Spreadsheet of the key indicated:

var query = new google.visualization.Query('https://docs.google.com/spreadsheet/tq?key=0AhnmGz1SteeGdEVsNlNWWkoxU3ZRQjlmbDdTTjF2dHc&headers=-1');

Once the query is built, it can then be sent. Since an external data source is by definition not always under the explicit control of the developer, a valid response to a query is not necessarily guaranteed. In order to prevent hard-to-detect data-related issues, it is best to include a method of handling erroneous returns from the data source. The following query.send function also informs the application how to handle information returned from the data source, regardless of quality.

query.send(handleQueryResponse);

The handleQueryResponse function sent along with the query acts as a filter, catching and handling errors from the data source. If an error is detected, the handleQueryResponse function displays an alert message. If the response from the data source is valid, the function proceeds and draws the visualization.

function handleQueryResponse(response) {
  if (response.isError()) {
    alert('Error in query: ' + response.getMessage() + ' ' + response.getDetailedMessage());
    return;
  }
  var data = response.getDataTable();
  visualization = new google.visualization.Table(document.getElementById('visualization'));
  visualization.draw(data, null);
}

Best practice
Be prepared for potential errors by planning for how to handle them.

For reference, the previous example is given in its complete HTML form:

<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8"/>
<title>Google Visualization API Sample</title>
<script type="text/javascript" src="http://www.google.com/jsapi"></script>
<script type="text/javascript">
  google.load('visualization', '1', {packages: ['table']});
</script>
<script type="text/javascript">
var visualization;
function drawVisualization() {
  // To see the data that this visualization uses, browse to
  // https://docs.google.com/spreadsheet/ccc?key=0AhnmGz1SteeGdEVsNlNWWkoxU3ZRQjlmbDdTTjF2dHc&usp=sharing
  var query = new google.visualization.Query('https://docs.google.com/spreadsheet/tq?key=0AhnmGz1SteeGdEVsNlNWWkoxU3ZRQjlmbDdTTjF2dHc&headers=-1');
  // Send the query with a callback function.
  query.send(handleQueryResponse);
}

function handleQueryResponse(response) {
  if (response.isError()) {
    alert('Error in query: ' + response.getMessage() + ' ' + response.getDetailedMessage());
    return;
  }
  var data = response.getDataTable();
  visualization = new google.visualization.Table(document.getElementById('visualization'));
  visualization.draw(data, null);
}
google.setOnLoadCallback(drawVisualization);
</script>
</head>
<body style="font-family: Arial;border: 0 none;">
<div id="visualization" style="height: 400px; width: 400px;"></div>
</body>
</html>

View live examples for Spreadsheets at http://gvisapi-packt.appspot.com/ch6-examples/ch6-datasource.html.

Apps Script method

Just as the Visualization API can be used from within an Apps Script, external data sources can also be requested from the script. In the Apps Script Spreadsheet example presented earlier in this article, the DataTable() creation was performed within the script. In the following example, the data table creation has been removed and a .setDataSourceUrl option has been added to Charts.newAreaChart(). The script otherwise remains the same.

function doGet() {
  var chart = Charts.newAreaChart()
      .setDataSourceUrl("https://docs.google.com/spreadsheet/tq?key=0AhnmGz1SteeGdEVsNlNWWkoxU3ZRQjlmbDdTTjF2dHc&headers=-1")
      .setDimensions(600, 400)
      .setXAxisTitle("Age Groups")
      .setYAxisTitle("Population")
      .setTitle("Chicago Population by Age and Gender - 2010 Census")
      .build();
  var ui = UiApp.createApplication();
  ui.add(chart);
  return ui;
}

View live examples in Apps Script at https://script.google.com/d/1Q2R72rGBnqPsgtOxUUME5zZy5Kul53r_lHIM2qaE45vZcTlFNXhTDqrr/edit.

Fusion Tables

Fusion Tables are another viable data source ready for use by the Visualization API. Fusion Tables offer benefits over Spreadsheets beyond just the Google Map functionality. The Tables API also allows for easier data source modification than is available in Spreadsheets.

Preparation

Preparing a Fusion Table to be used as a source is similar in procedure to preparing a Spreadsheet as a data source. The Fusion Table must be shared with the intended audience, and a unique identifier must be gathered from the Fusion Tables application.

Permissions

Just as with Spreadsheets, Fusion Tables must allow a user a minimum of view permissions in order for an application using the Visualization API to work properly. From the Sharing settings window in Fusion Tables, give the appropriate users view access as a minimum.

Get the URL path

Referencing a Fusion Table is very similar in method to Spreadsheets. Luckily, the appropriate URL ID information is slightly easier to find in Fusion Tables than in Spreadsheets. With the Sharing settings window open, there is a field at the top of the page containing the Link to share. At the end portion of the link, following the characters dcid=, is the Table's ID. The ID will look something like the following:

1Olo92KwNin8wB4PK_dBDS9eghe80_4kjMzOTSu0

This ID is the unique identifier for the table.

Query

The Google Fusion Tables API includes SQL-like queries for the modification of Fusion Tables data from outside the GUI interface. Queries take the form of HTTP POST and GET requests and are constructed using the Fusion Tables API query capabilities. Data manipulation using the Fusion Tables API is beyond the scope of this article, but a simple example is offered here as a basic illustration of functionality.
A Fusion Table query uses the API SELECT option, formatted as:

SELECT Column_name FROM Table_ID

Here Column_name is the name of the Fusion Table column and Table_ID is the table's ID extracted from the Sharing settings window. If the SELECT call is successful, the requested information is returned to the application in JSON format. The Visualization API drawChart() is able to take the SELECT statement and the corresponding data source URL as options for the chart rendering. The male and female data from the Fusion Tables 2010 Chicago Census file have been visualized using the drawChart() technique.

function drawVisualization() {
  google.visualization.drawChart({
    containerId: 'visualization',
    dataSourceUrl: 'http://www.google.com/fusiontables/gvizdata?tq=',
    query: 'SELECT Age, Male, Female FROM 1Olo92KwNin8wB4PK_dBDS9eghe80_4kjMzOTSu0',
    chartType: 'AreaChart',
    options: {
      title: 'Chicago Population by Age and Sex - 2010 Census',
      vAxis: { title: 'Population' },
      hAxis: { title: 'Age Groups' }
    }
  });
}

The preceding code renders the census data as an area chart. Live examples are available at http://gvisapi-packt.appspot.com/ch6-examples/ch6-queryfusion.html.

Important note
Fusion Table query responses are limited to 500 rows. See the Fusion Tables API documentation for other resource parameters.

API Explorer

With so many APIs available to developers using the Google platform, testing individual API functionality can be time consuming. The same issue arises for GUI applications used as a data source. Fortunately, Google provides API methods for its graphical applications as well. The ability to test API requests against Google's infrastructure is a desirable practice for all API programming efforts. To support this need, Google maintains the APIs Explorer service. This service is a console-based web application that allows queries to be submitted to APIs directly, without an application to frame them. This is helpful functionality when attempting to verify whether a data source is properly configured. To check whether the Fusion Tables 2010 U.S. Census data instance is configured properly, a query can be sent to list all columns, which shows which columns are actually exposed to the Visualization API application.

Best practice
Use the Google API Explorer service to test whether API queries work as intended.

To use the API Explorer for Fusion Tables, select Fusion Tables API from the list of API services. API functions available for testing are listed on the Fusion Tables API page. Troubleshooting a Chart with a Fusion Tables data source usually involves first verifying that all columns are available to the visualization code. If a column is not available, or is not formatted as expected, a visualization issue related to data problems may be difficult to troubleshoot from inside the Visualization API environment. The API call that best performs a simple check on column information is the fusiontables.column.list item. Selecting fusiontables.column.list opens up a form-based interface. The only required information is the Table ID (collected from the Share settings window in the Fusion Tables file). Click on the Execute button to run the query. The API Explorer tool will then show the GET query sent to the Fusion Table in addition to the results it returned. For the fusiontables.column.list query, columns are returned in bracketed sections. Each section contains attributes of that column.
The queried attributes should look familiar, as they are the fusiontables.column.list result of a query against the 2010 Chicago Census data Fusion Table.

Best practice
The Column List tool is helpful when troubleshooting Fusion Table to API code connectivity. If the table is able to return coherent values through the tool, it can generally be assumed that access settings are appropriate and the code itself may be the source of connection issues.

Fusion Tables—row and query reference is available at https://developers.google.com/fusiontables/docs/v1/sqlreference.
Information on API Explorer—column list is available at https://developers.google.com/fusiontables/docs/v1/reference/column/list#try-it.
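As a final, hedged sketch (not code from the article), the Query-plus-error-handler pattern shown earlier for Spreadsheets can also be pointed at the Fusion Table used in this article; it assumes the Visualization API and the table package are loaded exactly as in the earlier HTML listing, and it reuses the table ID and column names quoted above:

function drawVisualization() {
  // Point the query at the Fusion Tables gviz endpoint and select the census columns.
  var query = new google.visualization.Query(
      'http://www.google.com/fusiontables/gvizdata?tq=' +
      encodeURIComponent('SELECT Age, Male, Female FROM 1Olo92KwNin8wB4PK_dBDS9eghe80_4kjMzOTSu0'));
  query.send(handleQueryResponse);
}

function handleQueryResponse(response) {
  // Catch data source errors before attempting to draw anything.
  if (response.isError()) {
    alert('Error in query: ' + response.getMessage() + ' ' + response.getDetailedMessage());
    return;
  }
  var data = response.getDataTable();
  var visualization = new google.visualization.Table(document.getElementById('visualization'));
  visualization.draw(data, null);
}

Using the same error handler for both Spreadsheet and Fusion Table sources keeps data-related failures visible regardless of which backend is serving the chart.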

Participating in a business process (Intermediate)

Packt
31 Jul 2013
5 min read
The hurdles and bottlenecks for financial services, from an IT point of view, are:

Silos of data
Outdated IT systems and many applications running on legacy and non-standards-based systems
Business process and reporting systems not in sync with each other
Lack of real-time data visibility
Automated decision making
Ability to change and manage business processes in accordance with changes in business dynamics
Partner management
Customer satisfaction

This is where BPM plays a key role in bridging the gap between key business requirements and technology or business hurdles. In a real-life scenario, a typical home loan use case would be tied to the Know Your Customer (KYC) regulatory requirement. In India, for example, the Reserve Bank of India (RBI) has passed guidelines that make it mandatory for banks to properly know their customers. The RBI mandates that banks collect their customers' proof of identity, recent photographs, and Income Tax PAN. Proof of residence can be a voter card, a driving license, or a passport copy.

Getting ready

We start with the source code from the previous recipe. We will add a re-usable e-mail or SMS notification process. It is always a best practice to add a new process if it is called multiple times in the same process. This can be a subprocess within the main process itself, or it can be a part of the same composite outside the main process. We will add a new regulatory requirement that allows the customer to add KYC items such as a photo, proof of address, and an Income Tax PAN copy as attachments that will be checked into the WebCenter Content repository. These checks become a part of the customer verification stage before finance approval. We will make KYC a subprocess, with scope for expansion under a different scenario. We will also save the process data into a filesystem or a JMS messaging queue at the end of the loan process. In a banking scenario, this can also be the integration stage for other applications, such as a CRM application or any other application.

How to do it…

Let's perform the following steps:

Launch JDeveloper and open the composite.xml of LoanApplicationProcess in the Design view.
Drag-and-drop a new BPMN Process component from the Component Palette.
Create the Send Notifications process next to the existing LoanApplicationProcess, and edit the new process. The Send Notifications process will take To e-mail ID, From e-mail ID, Subject, and CC as input parameters, and send an e-mail to the given e-mail ID.
Similarly, we will drag-and-drop a File Adapter component from the Component Palette that saves the customer data into a file. We place this component at the end of the LoanApplication process, just before the End activity.
We will use this notification service to notify Verification Officers about the arrival of a new eligible application that needs to be verified.
In the Application Verification Officer stage, we will add a subprocess, KYC, that will be assigned to the loan initiator—James Cooper in our case. This will be preceded by sending an e-mail notification to the applicant asking for KYC details such as PAN number, scanned photograph, and voter ID, as requested by the Verification Officers.
Now, let us implement Save Loan Application by invoking the File Adapter service. The Email notification services are also available out of the box.
How it works…

The outputs of this recipe are re-usable services that can be used across multiple service calls, such as the notification services. This recipe also demonstrates how to use subprocesses and change the process to meet regulatory requirements. Let's understand the output by walking through our use case scenario:

When the process is initiated, the e-mail notification gets triggered at the appropriate stages of the process. Conan Doyle and John Steinbeck will get the e-mail requesting them to process the application, with the required information about the applicant, along with the link to BPM Workspace. The KYC task also sends an e-mail to James Cooper, requesting the documents required for the KYC check. James Cooper logs in to the James Bank WebCenter Portal and sees that there is a task assigned to him to upload his KYC details. James Cooper clicks on the task link, submits the required soft copy documents, and gets them checked into the content repository once the form is submitted. The start-to-end process flow is then complete.

Summary

BPM Process Spaces, which is an extension template of BPM, allows process and task views to be exposed to WebCenter Portal. The advantage of having Process Spaces available within the Portal is that users can collaborate with others using out-of-the-box Portal features such as wikis, discussion forums, blogs, and content management. This improves productivity, as the user need not log in to different applications for different purposes; all the required data and information are available within the Portal environment. It is also possible to expose some of the WSRP-supported application portlets (for example, HR portlets from PeopleSoft) in a corporate portal environment. All of this adds up to provide higher visibility of the entire business process, and a way of working and collaborating together in an enterprise business environment.

Resources for Article:

Further resources on this subject:
Managing Oracle Business Intelligence [Article]
Oracle E-Business Suite: Creating Bank Accounts and Cash Forecasts [Article]
Getting Started with Oracle Information Integration [Article]

Model Design Accelerator

Packt
30 Jul 2013
6 min read
By the end of this article you will be able to use Model Design Accelerator to design a new Framework Manager model. To introduce Model Design Accelerator, we will use a fairly simple schema based on a rental star schema, derived from the MySQL Sakila sample database. This database can be downloaded from http://dev.mysql.com/doc/sakila/en/. It is just one example of a number of possible dimensional models based on this sample database.

The Model Design Accelerator user interface

The user interface of Model Design Accelerator is very simple, consisting of only two panels:

Explorer Tree: This contains details of the database tables and views from the data source.
Model Accelerator: This contains a single fact table surrounded by four dimension tables, and is the main work area for the model being designed.

By clicking on the labels (Explorer Tree and Model Accelerator) at the top of the window, it is possible to hide either of these panels, but having both panels always visible is beneficial.

Starting Model Design Accelerator

Model Design Accelerator is started from the Framework Manager initial screen:

Select Create a new project using Model Design Accelerator…. This will start the new project creation wizard, which is exactly the same as if you were starting any new project.
Select the data source to import the database tables into the new model. After importing the database tables, the project creation wizard will display the Model Design Accelerator Introduction screen.
After reading the instructions, click on the Close button to continue. This will then show the Model Design Accelerator workspace.

Adding tables to your workspace

The first step in creating your model with Model Design Accelerator is to add the dimension and fact tables to your model:

From the Explorer panel, drag-and-drop dim_date, dim_film, dim_customer, and dim_store to the four New Query Subject boxes in the Model Accelerator panel.
After adding your queries, right-click on the boxes to rename the queries to Rental Date Dim, Film Dim, Customer Dim, and Store Dim respectively. If not all query columns are required, it is also possible to expand the dimension tables and drag-and-drop individual columns to the query boxes.
In the Explorer Tree panel, expand the fact_rental table by clicking on the (+) sign beside the name, and from the expanded tree drag-and-drop the count_returns, count_rentals, and rental_duration columns to the Fact Query Subject box. Rename the Fact Query Subject to Rental Fact.
Additional dimension queries can be added to the model by clicking on the top-left icon in the Model Accelerator panel, and then by dragging and dropping the required query onto the workspace window. Since we have a start_date and an end_date for the rental period, add a second copy of the date_dim table by clicking on the icon and dragging the table from the Explorer view into the workspace. Also rename this query as Return Date Dim.

Adding joins to your workspace

After we have added our database table columns to the workspace, we now need to add the relationship joins between the dimension and fact tables. To do this:

Double-click on the Rental Date Dim table; this will expand the date_dim and the fact_rental tables in the workspace window.
Click on the Enter relationship creation mode link.
Select the date_key column in the dim_date table, and the rental_date_key column in the fact_rental table.
Click on the Create relationship icon.
Click on OK to create this join.
Close the Query Subject Diagram by clicking on the (X) symbol in the top-right corner.
Repeat this procedure for each of the other four tables, so that every dimension query is joined to the Rental Fact query.

Generating the Framework Manager model

Once we have completed our model in Model Design Accelerator, we need to create a Framework Manager model:

Click on the Generate Model button.
Click on Yes to generate your model. The Framework Manager model will be generated and will open.

When you generate your model, all of the Model Advisor tests are automatically applied to the resulting model. You should review any issues that have been identified in the Verify Results tab, and decide whether you need to fix them. When you generate the model, only those query items that are required will be used to create the Framework Manager model:

The Physical View tab will contain only those tables required by your star schema model.
The Business View tab will contain model query subjects containing only the columns used in your star schema model.
The Presentation View tab will only contain shortcuts to the query subjects that exist in the Business View tab.

After generating your model, you can use Framework Manager to improve the model by adding calculations, filters, dimensions, measures, and so on. Each time you generate a Framework Manager model from your Model Design Accelerator model, a new namespace is created in the current Framework Manager model, and any improvements you want to use will also need to be applied to these new namespaces. From Framework Manager you can return to Model Design Accelerator at any time to continue making changes to your star schema. To return to Model Design Accelerator from within Framework Manager:

From the Tools menu, select Run Model Design Accelerator. You may choose to continue with the same model or create a new model.

To make your star schema model available to the report authors, you must first create a package and then publish the package to your Cognos Reporting Server.

Summary

In this article, we have looked at Model Design Accelerator. This is a tool that allows a novice modeler, or even an experienced modeler, to create a new Framework Manager model quickly and easily.

Resources for Article:

Further resources on this subject:
Integrating IBM Cognos TM1 with IBM Cognos 8 BI [Article]
How to Set Up IBM Lotus Domino Server [Article]
IBM Cognos 10 BI dashboarding components [Article]

First steps with R

Packt
30 Jul 2013
6 min read
Obtaining and installing R

The way to obtain R is to download it from the CRAN website (http://www.r-project.org/). The Comprehensive R Archive Network (CRAN) is a network of FTP and web servers around the world that store identical, up-to-date versions of the code and documentation for R. CRAN is directly accessible from the R website, and on that website it is also possible to find information about R, some technical manuals, the R Journal, and details about the packages developed for R and stored in the CRAN repositories.

The functionality of the R environment can be expanded through software libraries, which can be installed and loaded when needed. These libraries, or packages, are a collection of source code and other additional files that, when installed in R, allow the user to load them into the workspace via a call to the library() function. An example of code to load the package lattice is as follows:

> library(lattice)

An R installation contains one or more libraries of packages. Some of these packages are part of the basic installation and are loaded automatically as soon as the session is started. Others can be installed from CRAN, the official R repository, or downloaded and installed manually.

Interacting with the console

As soon as you start R, you will see that a workspace is open in the R Console window. The workspace is the environment in which you are working, where you will load your data and create your variables. The screen prompt > is the R prompt that waits for commands.

On the starting screen, you can type any function or command, or you can use R to perform basic calculations. R uses the usual symbols for addition (+), subtraction (-), multiplication (*), division (/), and exponentiation (^). Parentheses ( ) can be used to specify the order of operations. R also provides %% for taking the modulus and %/% for integer division. Comments in R are introduced by the character #, so everything after that character up to the end of the line will be ignored by R.

R has a number of built-in functions, for example, sin(x), cos(x), tan(x) (all in radians), exp(x), log(x), and sqrt(x). Some special constants such as pi are also pre-defined. You can see an example of the use of such a function in the following code:

> exp(2.5)
[1] 12.18249

Understanding R objects

In every computer language, variables provide a means of accessing the data stored in memory. R does not provide direct access to the computer’s memory but rather provides a number of specialized data structures called objects. These objects are referred to through symbols or variables.

Vectors

The basic object in R is the vector; even scalars are vectors of length one. Vectors can be thought of as a series of data of the same class. There are six basic vector types (called atomic vectors): logical, integer, real, complex, string (or character), and raw. Integer and real vectors represent numeric objects; logicals are a Boolean data type with possible values TRUE or FALSE. Among these atomic vectors, the more common ones are logical, string, and numeric (integer and real).

There are several ways to create vectors. For instance, the operator : (colon) is a sequence-generating operator; it creates sequences by incrementing or decrementing by one.

> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
> 5:-6
 [1]  5  4  3  2  1  0 -1 -2 -3 -4 -5 -6

If the interval between the numbers is not one, you can use the seq() function.
Here is an example:

> seq(from=2, to=2.5, by=0.1)
[1] 2.0 2.1 2.2 2.3 2.4 2.5

One of the more important features of R is the possibility of using an entire vector as an argument to a function, thus avoiding the use of explicit loops. Most of the functions in R allow the use of vectors as arguments; as an example, the use of some of these functions is shown as follows:

> x <- c(12,10,4,6,9)
> max(x)
[1] 12
> min(x)
[1] 4
> mean(x)
[1] 8.2

Matrices and arrays

In R, the matrix notation is extended to elements of any kind, so, for example, it is possible to have a matrix of character strings. Matrices and arrays are basically vectors with a dimension attribute. The function matrix() may be used to create matrices. By default, this function creates the matrix by column; alternatively, it is possible to tell the function to build the matrix by row:

> matrix(1:9,nrow=3,byrow=TRUE)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

Lists

A list in R is a collection of different objects. One of the main advantages of lists is that the objects contained within a list may be of different types, for example, numeric and character values. In order to define a list, you simply need to provide the objects that you want to include as arguments to the list() function.

Data frame

A data frame corresponds to a data set; it is basically a special list in which the elements have the same length. Elements may be of different types in different columns, but within the same column all the elements are of the same type. You can easily create data frames using the function data.frame(), and a specific column can be recalled using the operator $. (A short sketch showing lists and data frames together follows the resource list below.)

Top features you'll want to know about

In addition to basic object creation and manipulation, many more complex tasks can be performed with R, spanning data manipulation, programming, statistical analysis, and the production of very high quality graphs. Some of the most useful features are:

Data input and output
Flow control (for, if…else, while)
Creating your own functions
Debugging functions and handling exceptions
Plotting data

Summary

In this article we saw what R is, how to obtain and install R, and how to interact with the console. We also looked at a few R objects and at the top features you will want to know about.

Resources for Article:

Further resources on this subject:
Organizing, Clarifying and Communicating the R Data Analyses [Article]
Customizing Graphics and Creating a Bar Chart and Scatterplot in R [Article]
Graphical Capabilities of R [Article]
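As a small, hedged illustration of the list and data frame ideas above (the variable names are made up for the example, and the numeric values reuse the vector from the earlier snippet), the following shows list(), data.frame(), and the $ operator together:

# A list can mix objects of different types
info <- list(name = "Chicago", year = 2010, measured = TRUE)
info$name                      # access a list element by name: "Chicago"

# A data frame: columns of equal length, possibly of different types
ages   <- c(12, 10, 4, 6, 9)
groups <- c("a", "b", "c", "d", "e")
df     <- data.frame(group = groups, age = ages)

mean(df$age)                   # recall a column with $ and pass it to a function: 8.2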

Avro Source Sink

Packt
19 Jul 2013
3 min read
(For more resources related to this topic, see here.) A typical configuration might look something like the following. To use the Avro Source, you specify the type property with a value of avro. You need to provide a bind address and port number to listen on:

collector.sources=av1
collector.sources.av1.type=avro
collector.sources.av1.bind=0.0.0.0
collector.sources.av1.port=42424
collector.sources.av1.channels=ch1
collector.channels=ch1
collector.channels.ch1.type=memory
collector.sinks=k1
collector.sinks.k1.type=hdfs
collector.sinks.k1.channel=ch1
collector.sinks.k1.hdfs.path=/path/in/hdfs

Here we have configured the agent on the right that listens on port 42424, uses a memory channel, and writes to HDFS. I've used the memory channel for brevity in this example configuration. Also, note that I've given this agent a different name, collector, just to avoid confusion.

The agents on the left—feeding the collector tier—might have a configuration similar to this. I have left the sources off this configuration for brevity (a hedged sketch of one possible source definition appears after this article):

client.channels=ch1
client.channels.ch1.type=memory
client.sinks=k1
client.sinks.k1.type=avro
client.sinks.k1.channel=ch1
client.sinks.k1.hostname=collector.example.com
client.sinks.k1.port=42424

The hostname value, collector.example.com, has nothing to do with the agent name on that machine; it is the host name (or you can use an IP) of the target machine with the receiving Avro Source. This configuration, named client, would be applied to both agents on the left, assuming both had similar source configurations.

Since I don't like single points of failure, I would configure two collector agents with the preceding configuration and instead set each client agent to round robin between the two using a sink group. Again, I've left off the sources for brevity:

client.channels=ch1
client.channels.ch1.type=memory
client.sinks=k1 k2
client.sinks.k1.type=avro
client.sinks.k1.channel=ch1
client.sinks.k1.hostname=collectorA.example.com
client.sinks.k1.port=42424
client.sinks.k2.type=avro
client.sinks.k2.channel=ch1
client.sinks.k2.hostname=collectorB.example.com
client.sinks.k2.port=42424
client.sinkgroups=g1
client.sinkgroups.g1=k1 k2
client.sinkgroups.g1.processor.type=load_balance
client.sinkgroups.g1.processor.selector=round_robin
client.sinkgroups.g1.processor.backoff=true

Summary

In this article, we covered tiering data flows using the Avro Source and Sink. More information on this topic can be found in the book Apache Flume: Distributed Log Collection for Hadoop.

Resources for Article:

Further resources on this subject:

Supporting hypervisors by OpenNebula [Article]
Integration with System Center Operations Manager 2012 SP1 [Article]
VMware View 5 Desktop Virtualization [Article]
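The client configurations above intentionally leave out the source definitions. Purely as an illustration (not taken from the book), a minimal source block for the client agent could look like the following, using Flume's exec source to tail a log file; the command and file path are placeholders, while the agent and channel names match the examples above.

client.sources=s1
client.sources.s1.type=exec
client.sources.s1.command=tail -F /var/log/app/app.log
client.sources.s1.channels=ch1

Any other source type, such as a syslog or spooling directory source, would be wired to the ch1 channel in the same way.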

DPM Non-aware Windows Workload Protection

Packt
16 Jul 2013
18 min read
(For more resources related to this topic, see here.)

Protecting DFS with DPM

DFS stands for Distributed File System. It was introduced in Windows Server 2003 and is a set of services, available as a role on Windows Server operating systems, that allows you to group file shares held in different locations (different servers) under one folder known as the DFS root. The actual locations of the file shares are transparent to the end user. DFS is also often used for redundancy of file shares. For more information on DFS, see:

Windows Server 2008: http://technet.microsoft.com/en-us/library/cc753479%28v=ws.10%29.aspx
Windows Server 2008 R2 and Windows Server 2012: http://technet.microsoft.com/en-us/library/cc732006.aspx

Before DFS can be protected, it is important to know how it is structured. DFS consists of both data and configuration information:

The configuration for DFS is stored in the registry of each server, and in either the DFS tree for standalone DFS deployments, or in Active Directory when domain-based DFS is deployed.
DFS data is stored on each server in the DFS tree. The data consists of the multiple shares that make up the DFS root.

Protecting DFS with DPM is fairly straightforward. It is recommended to protect the actual file shares directly on each of the servers in the DFS root. If you have a standalone DFS deployment you should protect the system state of the servers in the DFS root, and if you have a domain-based DFS deployment you should protect the Active Directory of the domain controller that hosts the DFS root. If you are using DFS replication, it is also recommended to protect the shadow copy components on the servers that host the replication data, in addition to the previously mentioned items. These methods allow you to restore DFS by restoring the data and either the system state or Active Directory, depending on your deployment type.

Another option is to use the DfsUtil tool to export/import your DFS configuration. This is a command-line utility that comes with Windows Server and can export the namespace configuration to a file. The configuration can then be imported back into a DFS server to restore a DFS namespace. DPM can be set up to protect the DFS export; you would still need to protect the actual data directly. An example of using the DfsUtil tool: run DfsUtil root export \\domainname\rootname dfsrootname.xml to export the DFS configuration to an XML file, then run DfsUtil root import to import the DFS configuration back in (a short sketch of scheduling this export appears just before the CRM component lists below). For more information on the DfsUtil tool, visit the following URL:

http://blogs.technet.com/b/josebda/archive/2009/05/01/using-the-windows-server-2008-dfsutil-exe-command-line-to-manage-dfs-namespaces.aspx

That covers the backing up of DFS with DPM.

Protecting Dynamics CRM with DPM

Microsoft Dynamics CRM is Microsoft's customer relationship management (CRM) software. Version 1.0 was released in 2003; it then progressed to Version 4.0, and the latest version is 2011. CRM is part of the Microsoft Dynamics product family. In this section we will cover protecting Versions 4.0 and 2011.

Note that when protecting Microsoft Dynamics CRM, whether Version 4.0 or 2011, you should keep a note of your update-rollup level somewhere safe, so that you can reinstall CRM to that level in the event of a restore. You will need to restore the CRM database, and this could lead to an error if CRM is not at the correct update level.
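Tying the DfsUtil idea above together: the export is only useful if it runs regularly and lands in a folder that DPM already protects. The following is a hedged sketch of one way to do that with a small batch file and the built-in schtasks command; the namespace name, folder, and schedule values are placeholders and are not taken from the original text.

rem export-dfs.cmd - export the DFS namespace configuration into a DPM-protected folder
dfsutil root export \\example.com\dfsroot C:\DFSExports\dfsroot.xml

rem register a daily run of the batch file (run once from an elevated prompt)
schtasks /Create /TN "DFS Config Export" /TR "C:\Scripts\export-dfs.cmd" /SC DAILY /ST 01:00 /RU SYSTEM

With C:\DFSExports included in a DPM protection group, each recovery point then carries a recent copy of the namespace configuration alongside the share data.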
To protect Microsoft Dynamics CRM 4.0, back up the following components:

Microsoft CRM Server database
This is straightforward; you simply need to protect the SQL CRM databases. The two databases you want to protect are the following:
The configuration database: MSCRM_CONFIG
The organization database: OrganizationName_MSCRM

Microsoft CRM Server program files
By default, these files will be located at C:\Program Files\Microsoft CRM.

Microsoft CRM website
By default, the CRM website files are located in the C:\Inetpub\wwwroot directory. The web.config file can be protected; it only needs protecting if it has been changed from the default settings.

Microsoft CRM registry subkey
Back up the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSCRM key.

Microsoft CRM customizations
To protect customizations or any third-party add-ons, you will need to understand the specific components to back up and protect.

Other components to back up for protecting Microsoft CRM include the following:
System state of your domain controller.
Exchange server, if the CRM's e-mail router is used.

To protect Microsoft Dynamics CRM 2011, back up the following components:

Microsoft CRM 2011 databases
This is straightforward; you simply need to protect the SQL CRM databases. The two databases you want to protect are:
The configuration database: MSCRM_CONFIG
The organization database: OrganizationName_MSCRM

Microsoft CRM 2011 program files
By default, these files will be located at C:\Program Files\Microsoft CRM.

Microsoft CRM 2011 website
By default, the CRM website files are located in the C:\Program Files\Microsoft CRM\CRMWeb directory. The web.config file can be protected; it only needs protecting if it has been changed from the default settings.

Microsoft CRM 2011 registry subkey
Back up the HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSCRM subkey.

Microsoft CRM 2011 customizations
To protect customizations or any third-party add-ons, you will need to understand the specific components to back up and protect.

Other components to back up for protecting Microsoft CRM 2011 include:
System state of your domain controller.
Exchange server, if the CRM's e-mail router is used.
SharePoint, if CRM and SharePoint integration is in use.

Note that for both CRM 4.0 and CRM 2011, you could have more than one OrganizationName_MSCRM database if you have more than one organization in CRM. Be sure to protect all of the OrganizationName_MSCRM databases that may exist. That wraps up the Microsoft Dynamics CRM protection for both 4.0 and 2011; you simply need to configure protection of the mentioned components with DPM. Now let's look at what it will take to protect another product from the Dynamics family.

Protecting Dynamics GP with DPM

Dynamics GP is Microsoft's ERP and accounting software package for mid-market businesses. GP has standard accounting functions but it can do more, such as Sales Order Processing, Order Management, Inventory Management, and Demand Planner for forecasting, making it usable as a full-blown ERP. GP was once known as Great Plains Software before its acquisition by Microsoft. The most recent versions are Microsoft Dynamics GP 10.0 and Dynamics GP 2010 R2.

GP holds your organization's financial data. If you use it as an ERP solution, it holds even more critical data, and losing this data could be devastating to an organization. Yes, there is a built-in backup utility in GP, but it does not cover all the bases in protecting your GP deployment.
In fact, the built-in backup process only backs up the SQL database, and does not cover items like:

Customized forms
Reports
Financial statement formats
The sysdata folder

These are the GP components you should protect with DPM:

SQL administrative databases: Master, TempDB, and Model
Microsoft Dynamics GP system database (DYNAMICS)
Each of your company databases
If you use SQL Server Agent to schedule automatic tasks, back up the msdb database
forms.dic (for customized forms), which can be found in %systemdrive%\Program Files (x86)\Microsoft Dynamics\GP2010
reports.dic (for reports), which can be found in %systemdrive%\Program Files (x86)\Microsoft Dynamics\GP2010

Backing up these components with DPM should be sufficient protection in the event a restore is needed.

Protecting TMG 2010 with DPM

Threat Management Gateway (TMG) is a part of the Forefront product family. The predecessor to TMG is Internet Security and Acceleration Server (ISA Server). TMG is fundamentally a firewall, but a very powerful one, with features such as VPN, web caching, reverse proxy, advanced stateful packet inspection, WAN failover, malware protection, routing, load balancing, and much more. There have been several forum threads on the Microsoft DPM TechNet forums asking about DPM protecting TMG, which sparked the inclusion of this section in the book.

TMG is a critical part of many networks and should have a high priority with regard to backup, right up there with your other critical business applications. In many environments, if TMG is down, a good number of users cannot access certain business applications, which causes downtime. Let's take a look at how and what to protect with regard to TMG.

The first step is to allow DPM traffic on TMG so that the agent can communicate with DPM. You will need to install the DPM agent on TMG and then start protecting it from there. Follow the ensuing steps to protect your TMG server:

On the TMG server, go to Start | All Programs | Microsoft TMG Server. Open the TMG Server Management MMC.
Expand Arrays and then the TMG Server computer, then click on Firewall Policy.
On the View menu, click on Show System Policy Rules.
Right-click on the Allow remote management from selected computers using MMC system policy rule. Select Edit System Policy.
In the System Policy Editor dialog box, click to clear the Enable this configuration group checkbox, and then click on OK.
Click on Apply to update the firewall configuration, and then click on OK.
Right-click on the Allow RPC from TMG server to trusted servers system policy rule. Select Edit System Policy.
In the System Policy Editor dialog box, click to clear the Enforce strict RPC compliance checkbox, and then click on OK.
Click on Apply to update the firewall configuration, and then click on OK.
On the View menu, click on Hide System Policy Rules.
Right-click on Firewall Policy. Select New and then Access Rule.
In the New Access Rule Wizard window, type a name in the Access rule name box. Click on Next.
Check the Allow checkbox and then click on Next.
In the This rule applies to list, select All outbound traffic from the drop-down menu and click on Next.
On the Access Rule Sources page, click on Add.
In the Add Network Entities dialog window, click on New and select Computer from the drop-down list. Now type the name of your DPM server and type the DPM server's IP address in the Computer IP Address field. Click on OK when you are done.
You will then see your DPM server listed under the Computers folder in the Add Network Entities window.
Select it and click on Add. This will bring the DPM computer into your access rule wizard. Click on Next.
In the Add Rule Destinations window, click on Add. The Add Network Entities window will come up again. In this window expand the Networks folder, then select Local Host and click on Add. Now click on Next.
Your rule should have both the DPM server and Local Host listed for both incoming and outgoing. Click on Next, leave the default All Users entry in the This rule applies to requests from the following user sets box, and click on Next again. Click on Finish.
Right-click on the new rule (DPM2010 in this example), and then click on Move Up.
Right-click on the new rule, and select Properties.
In the rule name properties dialog box (DPM2010 Properties), click on the Protocols tab, then click on Filtering. Now select Configure RPC Protocol.
In the Configure RPC protocol policy dialog box, check the Enforce strict RPC compliance checkbox, and then click on OK twice.
Click on Apply to update the firewall policy, and then click on OK.

Now you will need to attach the DPM agent for the TMG server. Follow the ensuing steps to complete this task:

Open the DPM Administrator Console.
Click on the Management tab on the navigation bar. Now click on the Agents tab.
On the Actions pane, click on Install. The Protection Agent Install Wizard window should now pop up.
Choose the Attach agents checkbox. Choose Computer on trusted domain, and click on Next.
Select the TMG server from the list, click on Add, and then click on Next.
Enter credentials for the domain account. The account that is used here needs to have administrative rights on the computer you are going to protect. Click on Next to continue.
You will receive a warning that DPM cannot tell whether the TMG server is clustered or not. Click on OK for this.
On the next screen, click on Attach to continue.

Next, you have to install the agent on the TMG firewall and point it to the correct DPM server. Follow the ensuing steps to complete this task:

From the TMG server that you will be protecting, access the DPM server over the network and copy the folder with the agent installer in it down to the local machine. Use this path: \\DPMSERVERNAME\%systemdrive%\program files\Microsoft DPM\DPM\ProtectionAgents\RA\3.0\3.0.7696.0\i386. Then, from the local folder on the protected computer, run dpmra.msi to install the agent.
Open a command prompt (make sure you have elevated privileges), change directory to C:\Program Files\Microsoft Data Protection Manager\DPM\bin, then run the following:

SetDpmServer.exe -dpmServerName <serverName> userName <userName>

Following is an example of the previous command:

SetDpmServer.exe -dpmServerName buchdpm

Now restart the TMG server.
Once your TMG server comes back, check the Windows services to make sure that the DPMRA service is set to automatic, and then start it.

That is it for configuring DPM to start protecting TMG, but there are a few more things that we still need to cover on this topic. With TMG backup, you can choose to back up certain components of TMG, depending on your recovery needs. With DPM you can back up the TMG hard drive, the TMG logs that are stored in SQL, TMG's system state, or a BMR of TMG.
Following is the list of components you should back up, depending on your circumstances. What can be included in a TMG server backup:

TMG configuration settings (exported through TMG)
TMG firewall settings (exported through TMG)
TMG logfiles (stored in SQL databases)
TMG install directory (only needed if you have custom forms for things such as an Outlook Web Access login screen)
TMG server system state
TMG BMR

None of the previous components are required for protection of TMG. In fact, protecting the SQL logfiles tends to cause more issues than it helps, as they change so often. These SQL log databases change so often that DPM will raise an error when the old SQL databases are no longer shown under protection. The logfiles are not required to restore your TMG. For a standard TMG restore, you will need to reinstall TMG, reconfigure NIC settings, import any certificates, and restore the TMG configuration and firewall settings. For more information on backing up TMG 2010, visit the following page: http://technet.microsoft.com/en-us/library/cc984454.aspx.

DPM cannot back up the TMG configuration and firewall settings natively. This needs to be scripted and scheduled through Windows Task Scheduler, with the export placed on the local hard drive; DPM can then back up the exported .XML settings from there. You can find the TMG server's export script at http://msdn.microsoft.com/en-us/library/ms812627.aspx. Place this script into a .VBS file, and then set up a scheduled task to call this file. This automates the export of your TMG server settings.

There is another way to back up the entire TMG server. This is a new type of protection, specific to TMG 2010. This protection is BMR and is available because TMG is now installed on top of Windows Server 2008 and Windows Server 2008 R2. Protecting the BMR of your TMG gives you the ability to restore your entire TMG in the event that it fails, configuration and firewall settings included. BMR will also bring back certificates and NIC card settings. Note that a BMR of TMG restored on a virtual machine can't use its NIC card settings; those can only be restored on the same hardware. Well, that covers how to protect TMG with DPM. As you can see, there are some improvements through BMR, and if you do not employ BMR protection you can still automate the process of protecting TMG.

How to protect IIS

Internet Information Services (IIS) is Microsoft's web server platform. It is included for free with Windows Server operating systems. Its modular nature makes it scalable to the web server needs of different organizations. The latest version is IIS 8. It can be used for more than standard web hosting, for example as an FTP server or for media delivery. Knowing what to protect when it comes to IIS will come in handy in almost any environment you may work in. Backing up IIS is one thing, but you also need to make sure you understand the websites or web applications you are running, so that you know how to back them up too. In this section, we are going to look at the protection of IIS (a small command-line sketch of snapshotting the IIS configuration appears at the end of this article).

To protect IIS, you should back up the following components:

IIS configuration files
Website or web application data
SSL certificates
Registry (only needed if the website or web application required modifications of the registry)
Metabase

The IIS configuration files are located in the %systemdrive%\Windows\System32\inetsrv\config directory (and subdirectories). The website or web application files are typically found in C:\inetpub\wwwroot.
Now, this is the default location, but the website or web application files can be located anywhere on an IIS server. To export SSL certificates directly from IIS, follow the ensuing steps:

Open the Microsoft IIS 7 console.
In the left-hand pane, select the server name.
In the center pane, click on the server certificates icon.
Right-click on the certificate you wish to export and select export.
Enter a file path, name the certificate file, and give it a password.
Click on OK and your certificate will be exported as a .pfx file in the path you specified.

The metabase is an internal database that holds IIS configuration data. It is made up of two files: MBSchema.xml and MetaBase.xml. These can be found in %SystemRoot%\system32\inetsrv. A good thing to know is that if you protect the system state of a server, the IIS configuration will be included in this backup. This does not include the website or web application files, so you will still need to protect these in addition to a system state backup. That covers the items you will need to protect IIS with DPM backup.

Protecting Lync 2010 with DPM

Lync 2010 is Microsoft's Unified Communications platform, complete with IM, presence, conferencing, enterprise video and voice, and more. Lync was formerly known as Office Communicator. Lync is quickly becoming an integral part of business communications. With Lync being a critical application to organizations, it is important to ensure this platform is backed up. Lync is a massive product with many moving parts. We are not going to cover all of Lync's architecture, as this would need its own book. We are going to focus on what should be backed up to ensure protection of your Lync deployment. Overall, we want to protect Lync's settings and configuration data. The majority of this data is stored in the Lync Central Management store. The following are the components that need to be protected in order to back up Lync:

Settings and configuration data
Topology configuration (Xds.mdf)
Location information (Lis.mdf)
Response group configuration (RgsConfig.mdf)

Data stored in databases
User data (Rtc.mdf)
Archiving data (LcsLog.mdf)
Monitoring data (csCDR.mdf and QoeMetrics.mdf)

File stores
Lync server file store
Archiving file store

These stores will be file shares on the Lync server, named in the format \\lyncservername\sharename. To track down these file shares if you don't know where they are, go to the Lync Topology Builder and look in the File stores node. Note that files named Meeting.Active should not be backed up; these files are in use and locked while a meeting takes place.

Other components are as follows:

Active Directory (user SIP data, a pointer to the Central Management store, and objects for Response Group and Conferencing Attendant)
Certification authority (CA) and certificates (if you use an internal CA)
Microsoft Exchange and Exchange Unified Messaging (UM), if you are using UM with your Exchange
Domain Name System (DNS) records and IP addresses
IIS on the Lync Server
DHCP configuration
Group Chat (if used)
XMPP gateways, if you are using an XMPP gateway
Public switched telephone network (PSTN) gateway configuration, if your Lync is connected to one
Firewall and load balancer (if used) configurations

Summary

Now that we have had a chance to look at several Microsoft workloads that are used in organizations today and how to protect them with DPM, you should have a good understanding of what it takes to back them up. These workloads included Lync 2010, IIS, CRM, GP, DFS, and TMG.
Note there are many more Microsoft workloads that DPM cannot protect natively, which we were unable to cover in this article.

Resources for Article:

Further resources on this subject:

Overview of Microsoft Dynamics CRM 2011 [Article]
Deploying .NET-based Applications on to Microsoft Windows CE Enabled Smart Devices [Article]
Working with Dashboards in Dynamics CRM [Article]
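As a complement to the IIS file locations listed in the article above, IIS 7 and later also ship with the appcmd utility, which can snapshot the server configuration into a backup folder that DPM can then protect. This is an illustrative sketch rather than part of the original text; the backup name is a placeholder.

rem create a named snapshot of the IIS configuration
%windir%\system32\inetsrv\appcmd.exe add backup "PreDPM-Config"

rem list the configuration backups that exist on this server
%windir%\system32\inetsrv\appcmd.exe list backups

The resulting backups are stored under the inetsrv directory (in a backup subfolder), so including that path in the protection group covers them alongside the website files.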

Measuring Performance with Key Performance Indicators

Packt
10 Jul 2013
4 min read
(For more resources related to this topic, see here.)

Creating the KPIs and the KPI watchlists

We're going to create Key Performance Indicators and watchlists in the first recipe. There should be comparable measure columns in the repository in order to create KPI objects. The following columns will be used in the sample scenario:

Shipped Quantity
Requested Quantity

How to do it

Click on the KPI link in the Performance Management section and select a subject area. The KPI creation wizard has five different steps.
The first step is the General Properties section, where we're going to write a description for the KPI object. The Actual Value and the Target Value attributes display the columns that we'll use in this scenario. The columns should be selected manually.
The Enable Trending checkbox is not selected by default. When you select the checkbox, trending options will appear on the screen. We're going to select the Day level from the Time hierarchy for trending in the Compare to Prior textbox and define a value for the Tolerance attribute. We're going to use 1 and % Change in this scenario.
Clicking on the Next button will display the second step, named Dimensionality. Click on the Add button to select dimension attributes. Select the Region column in the Add New Dimension window. After adding the Region column, repeat the step for the YEAR column. You shouldn't select any value to pin; both column values will be left unpinned.
Clicking on the Next button will display the third step, named States. You can easily configure the state values in this step. Select the High Values are Desirable value from the Goal drop-down list. By default, there are three states:

OK
Warning
Critical

Then click on the Next button and you'll see the Related Documents step. This is a list of supporting documents and links regarding the Key Performance Indicator. Click on the Add button to select one of the options. If you want to use another analysis as a supporting document, select the Catalog option and choose the analysis that contains some valuable information about the report. We're going to add a link, and you can easily define the address of the link. We'll use the http://www.abc.com/portal link.
Click on the Next button to display the Custom Attributes column values. To add a custom attribute that will be displayed in the KPI object, click on the Add button and define the values specified as follows:

Number: 1
Label: Dollars
Formula: "Fact_Sales"."Dollars"

Save the KPI object by clicking on the Save button. Right after saving the KPI object, you'll see the KPI content.
KPI objects cannot be published in the dashboards directly; we need KPI watchlists to publish them in the dashboards. Click on the KPI Watchlist link in the Performance Management section to create one. The New KPI Watchlist page will be displayed without any KPI objects.
Drag-and-drop the KPI object that was previously created from the Catalog pane onto the KPI watchlist. When you drop the KPI object, the Add KPI window will pop up automatically. You can select one of the available values for the dimensions. We're going to select the Use Point-of-View option. Enter a Label value, A Sample KPI, for this example.
You'll see the dimension attributes in the Point-of-View bar. You can easily select the values from the drop-down lists to have different perspectives. Save the KPI watchlist object.

How it works

KPI watchlists can contain multiple KPI objects based on business requirements.
These container objects can be published in the dashboards so that end users can access the content of the KPI objects through the watchlists. When you want to publish these watchlists, you'll need to select a value for the dimension attributes.

There's more

The Drill Down feature is also enabled in the KPI objects. If you want to access finer levels, you can just click on the hyperlink of the value you are interested in, and a more detailed level is displayed automatically.

Summary

In this article, we learnt how to create KPIs and KPI watchlists. Key Performance Indicators are the building blocks of strategy management. In order to implement the balanced scorecard management technique in an organization, you'll first need to create the KPI objects.

Resources for Article:

Further resources on this subject:

Oracle Integration and Consolidation Products [Article]
Managing Oracle Business Intelligence [Article]
Oracle Tools and Products [Article]

Getting Started with Oracle Data Guard

Packt
02 Jul 2013
13 min read
(For more resources related to this topic, see here.)

What is Data Guard?

Data Guard, which was introduced as the standby database in Oracle database Version 7.3 and took the name Data Guard with Version 9i, is a data protection and availability solution for Oracle databases. The basic function of Oracle Data Guard is to keep a synchronized copy of a database as a standby, so that it can take over in case the primary database becomes inaccessible to end users; such cases include hardware errors, natural disasters, and so on. Each new Oracle release added new functionality to Data Guard, and the product became more and more popular with offerings such as data protection, high availability, and disaster recovery for Oracle databases.

Using Oracle Data Guard, it's possible to direct user connections to a Data Guard standby database automatically, with no data loss, in case of an outage in the primary database. Data Guard also makes it possible to take advantage of the standby database for reporting, testing, and backup offloading. Corruptions on the primary database may be fixed automatically by using the non-corrupted data blocks on the standby database. There will be minimal outages (seconds to minutes) on the primary database during planned maintenance such as patching and hardware changes by using the switchover feature of Data Guard, which changes the roles of the primary and standby databases. All of these features are available with Data Guard, which doesn't require an installation, but rather a cloning and configuration of the Oracle database.

A Data Guard configuration consists of two main components: the primary database and the standby database. The primary database is the database that we want to protect against inaccessibility. Fundamentally, changes to the data of the primary database are passed on to the standby database, and these changes are applied to the standby database in order to keep it synchronized. The following figure shows the general structure of Data Guard. Let's look at the standby database and its properties more closely.

Standby database

It is possible to configure a standby database simply by copying, cloning, or restoring a primary database to a different server. Then the Data Guard configurations are made on the databases in order to start the transfer of redo information from primary to standby, and also to start the apply process on the standby database. Primary and standby databases may exist on the same server; however, this kind of configuration should only be used for testing. In a production environment, the primary and standby database servers are generally preferred to be in separate data centers.

Data Guard keeps the primary and standby databases synchronized by using redo information. As you may know, transactions on an Oracle database produce redo records. This redo information keeps all of the changes made to the database. The Oracle database first creates redo information in memory (in the redo log buffers). It is then written into the online redo logfiles and, when an online redo logfile is full, its content is written into an archived redo log. An Oracle database can run in the ARCHIVELOG mode or the NOARCHIVELOG mode. In the ARCHIVELOG mode, online redo logfiles are written into archived redo logs; in the NOARCHIVELOG mode, redo logfiles are overwritten without being archived as they become full. In a Data Guard environment, the primary database must be in the ARCHIVELOG mode.
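As a quick, illustrative aside (not part of the original text), the current logging mode can be checked, and switched if necessary, with the standard commands below; switching requires a clean restart into the MOUNT state.

SQL> select log_mode from v$database;

SQL> shutdown immediate
SQL> startup mount
SQL> alter database archivelog;
SQL> alter database open;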
In Data Guard, the transfer of the changed data from the primary to the standby database is achieved with redo, with no alternative. However, the method of applying the redo content to the standby database may vary, and these different apply methods give rise to different types of standby databases. There were two kinds of standby databases before Oracle database Version 11g: the physical standby database and the logical standby database. With Version 11g we should mention a third type of standby database, which is the snapshot standby. Let's look at the properties of these standby database types.

Physical standby database

The physical standby database is a block-based copy of the primary database. In a physical standby environment, in addition to containing the same database objects and the same data, the primary and standby databases are identical on a block-for-block basis. Physical standby databases use the Redo Apply method to apply changes. Redo Apply uses the managed recovery process (MRP) to manage the application of the changes carried in the redo. In Version 11g, a physical standby database can be accessible in read-only mode while Redo Apply is working, which is called Active Data Guard. Using the Active Data Guard feature, we can offload report jobs from the primary to the physical standby database. The physical standby database is the only option that has no limitation on storage vendor or data types when keeping a synchronized copy of the primary database.

Logical standby database

The logical standby database is a feature introduced in Version 9i R2. In this configuration, redo data is first converted into SQL statements and then applied to the standby database. This process is called SQL Apply. This method makes it possible to access the standby database permanently and allows read/write while the replication of data is active. Thus, you're also able to create database objects on the standby database that don't exist on the primary database. So a logical standby database can be used for many other purposes along with high availability and disaster recovery. Due to the nature of SQL Apply, a logical standby database will contain the same data as the primary database, but in a different structure on the disks.

One discouraging aspect of the logical standby database is the unsupported data types, objects, and DDLs. The following data types are not supported for replication in a logical standby environment:

BFILE
Collections (including VARRAYS and nested tables)
Multimedia data types (including Spatial, Image, and Oracle Text)
ROWID and UROWID
User-defined types

The logical standby database is not guaranteed to contain all primary data, because of the unsupported data types, objects, and DDLs. Also, SQL Apply consumes more hardware resources. Therefore, it certainly brings more performance issues and administrative complexity than Redo Apply.

Snapshot standby database

Principally, a snapshot standby database is a special state of a physical standby database. Snapshot standby is a feature that is available with Oracle Database Version 11g. When you convert a physical standby database into a snapshot standby database, it becomes accessible for read/write. You can run tests on this database and change the data. When you're finished with the snapshot standby database, it's possible to reverse all the changes made to the database and turn it back into a physical standby again. An important point here is that a snapshot standby database can't run Redo Apply.
Redo transfer continues, but the standby is not able to apply redo.

Oracle Data Guard evolution

Oracle Data Guard technology has been part of the database administrator's life for a long time, and it has clearly evolved from its beginnings up to 11g R2. Let's look at this evolution closely through the different database versions.

Version 7.3 – stone age

The functionality of keeping a duplicate database on a separate server, which can be synchronized with the primary database, came with Oracle database Version 7.3 under the name of the standby database. This standby database was constantly in recovery mode, waiting for the archived redo logs to be applied. However, this feature was not able to automate the transfer of archived redo logs. Database administrators had to find a way to transfer archived redo logs and apply them to the standby server continuously. This was generally accomplished by a script running in the background.

The only aim of the Version 7.3 standby database was disaster recovery. It was not possible to query the standby database or to open it for any purpose other than activating it in the event of failure of the primary database. Once the standby database was activated, it couldn't be returned to the standby recovery mode again.

Version 8i – first age

Oracle database Version 8i brought the much-awaited features to the standby database and made the archived log shipping and apply processes automatic, which are now called the managed standby environment and managed recovery, respectively. However, some users chose to apply the archived logs manually, because it was not possible to set a delay in the managed recovery mode; this mode brought the risk of accidental operations being reflected on the standby database quickly. Along with the "managed" modes, 8i made it possible to open a standby database with the read-only option, which allowed it to be used as a reporting database.

Even though there were new features that made the tool more manageable and practical, there were still serious deficiencies. For example, when we added a datafile or created a tablespace on the primary database, these changes were not replicated to the standby database; database administrators had to take care of this maintenance on the standby database. Also, when we opened the primary database with resetlogs or restored a backup control file, we had to re-create the standby database.

Version 9i – middle age

First of all, with this version the Oracle8i standby database was renamed to Oracle9i Data Guard. 9i Data Guard includes very important new features, which make the product much more reliable and functional. The following features were included:

The Oracle Data Guard Broker management framework, which is used to centralize and automate the configuration, monitoring, and management of Oracle Data Guard installations, was introduced with this version.
Zero data loss on failover was guaranteed as a configuration option.
Switchover was introduced, which made it possible to change the roles of the primary and standby. This made it possible to accomplish planned maintenance on the primary database with very little service outage.
Standby database administration became simpler, because new datafiles on the primary database are created automatically on the standby, and if there are missing archived logs on the standby (which is called a gap), Data Guard detects and transmits the missing logs to the standby automatically.
A delay option was added, which made it possible to configure a standby database that always lags behind the primary by a specified time delay.
Parallel recovery increased recovery performance on the standby database.

In Version 9i Release 2, which was introduced in May 2002, one year after Release 1, there were again very important features announced. They are as follows:

The logical standby database was introduced, which we've mentioned earlier in this article.
Three data protection modes were ready to use: Maximum Protection, Maximum Availability, and Maximum Performance, which offered more flexibility in configuration.
The cascade standby database feature made it possible to configure a second standby database, which receives its redo data from the first standby database.

Version 10g – new age

The 10g version again introduced important features to Data Guard, but we can say that it perhaps fell behind expectations after the revolutionary changes in release 9i. The following new features were introduced in Version 10g:

One of the most important features of 10g was Real-Time Apply. When running in Real-Time Apply mode, the standby database applies changes from the redo immediately after receiving it; the standby does not wait for the standby redo logfile to be archived. This provides faster switchover and failover.
Flashback database support was introduced, which made it unnecessary to configure a delay in the Data Guard configuration. Using flashback technology, it was possible to flash back a standby database to a point in time.
With 10g Data Guard, if we open a primary database with resetlogs, it is not required to re-create the standby database. The standby is able to recover through resetlogs.
Version 10g made it possible to use logical standby databases in rolling upgrades of the primary database software. This method made it possible to lessen the service outage time by performing a switchover to the logical standby database.

10g Release 2 also introduced new features to Data Guard, but these features again were not compelling enough to make people jump to the Data Guard technology. The two most important features were Fast-Start Failover and the use of the Guaranteed restore point:

Fast-Start Failover automated and accelerated the failover operation when the primary database was lost. This option strengthened the disaster recovery role of Oracle Data Guard.
The Guaranteed restore point was not actually a Data Guard feature. It was a database feature, which made it possible to revert a database to the moment that the Guaranteed restore point was created, as long as there is sufficient disk space for the flashback logs. Using this feature, the following scenario became possible: activate a physical standby database after stopping Redo Apply, use it for testing with read/write operations, then revert the changes, make it a standby again, and synchronize it with the primary. Using a standby database read/write offered great flexibility to users, but archived log shipping was not able to continue while the standby was read/write, and this meant potential data loss in the event of a primary database failure.
Snapshot standby is a feature that lets you use a physical standby database read/write for test purposes. As we mentioned, this was possible with the 10g R2 Guaranteed restore point feature, but 11g provides continuous archived log shipping during the period that the standby is open read/write as a snapshot standby.
It became possible to compress redo traffic in a Data Guard configuration, which is useful with excessive redo generation rates and when resolving gaps. Compression of redo when resolving gaps was introduced in 11g R1, and compression of all redo data was introduced in 11g R2.
Use of physical standby databases for rolling upgrades of the database software was enabled, also known as Transient Logical Standby.
It became possible to include different operating systems, such as Windows and Linux, in a single Data Guard configuration.
Lost-write, which is a serious type of data corruption arising from the storage subsystem wrongly reporting that a block write has completed, can be detected in an 11g Data Guard configuration. Recovery is automatically stopped in such a case.
The RMAN fast incremental backup feature, Block Change Tracking, can be run on an Active Data Guard enabled standby database.
Another very important enhancement in 11g was the Automatic Block Corruption Repair feature, which was introduced with 11g R2. With this feature, a corrupted data block in the primary database can be automatically replaced with an uncorrupted copy from a physical standby database in Active Data Guard mode, and vice versa.

We've gone through the evolution of Oracle Data Guard from its beginning until today. As you may notice, Data Guard started its life as a very simple database feature intended to keep a synchronized database copy, with a lot of manual work, and it is now a sophisticated tool with advanced automation, precaution, and monitoring features. Now let's move on to the architecture and components of Oracle Data Guard 11g R2.
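To make the Active Data Guard capability mentioned above a little more concrete, here is a hedged sketch of the commands typically run on a physical standby to open it read-only while redo apply continues; it is illustrative only, and it assumes the standby is already mounted, receiving redo, and equipped with standby redo logfiles for real-time apply.

SQL> alter database recover managed standby database cancel;
SQL> alter database open read only;
SQL> alter database recover managed standby database using current logfile disconnect from session;

After the last command, reports can be run against the open standby while the managed recovery process keeps applying incoming redo in the background.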

Creating your first collection (Simple)

Packt
26 Jun 2013
7 min read
(For more resources related to this topic, see here.)

Getting ready

Assuming that you have walked through the tutorial, you should be nearly ready with the setup. Still, it does not hurt to go through the checklist:

Make sure that you know how to start your operating system's shell (cmd.exe on Windows, Terminal/iTerm on Mac, and sh/bash/tcsh/zsh on Unix).
Ensure that running the java -version command at the shell's prompt returns at least Version 1.6. You may need to upgrade if you have an older version.
Ensure that you know where you unpacked the Solr distribution and the full path to the example directory within that. You needed that directory for the tutorial, but that's also where we are going to start our own Solr instance. That allows us to easily run an embedded Jetty web server and to also find all the additional JAR files that Solr needs to operate properly.
Now, create a directory where we will store our indexes and experiments. It can be anywhere on your drive. As Solr can run on any operating system where Java can run, we will use SOLR-INDEXING as a name whenever we refer to that directory. Make sure to use absolute path names when substituting with your real path for the directory.

How to do it...

As our first example, we will create an index that stores and allows for the searching of simplified e-mail information. For now, we will just look at the addr_from and addr_to e-mail addresses and the subject line. You will see that it takes only two simple configuration files to get the basic Solr index working.

Under the SOLR-INDEXING directory, create a collection1 directory and inside that create a conf directory.
In the conf directory, create two files: schema.xml and solrconfig.xml.
The schema.xml file should have the following content:

<?xml version="1.0" encoding="UTF-8" ?>
<schema version="1.5">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    <field name="addr_from" type="string" indexed="true" stored="true" required="true"/>
    <field name="addr_to" type="string" indexed="true" stored="true" required="true"/>
    <field name="subject" type="string" indexed="true" stored="true" required="true"/>
  </fields>
  <uniqueKey>id</uniqueKey>
  <types>
    <fieldType name="string" class="solr.StrField" />
  </types>
</schema>

The solrconfig.xml file should have the following content:

<?xml version="1.0" encoding="UTF-8" ?>
<config>
  <luceneMatchVersion>LUCENE_43</luceneMatchVersion>
  <requestDispatcher handleSelect="false">
    <httpCaching never304="true" />
  </requestDispatcher>
  <requestHandler name="/select" class="solr.SearchHandler" />
  <requestHandler name="/update" class="solr.UpdateRequestHandler" />
  <requestHandler name="/admin" class="solr.admin.AdminHandlers" />
  <requestHandler name="/analysis/field" class="solr.FieldAnalysisRequestHandler" startup="lazy" />
</config>

That is it. Now, let's start our just-created Solr instance. Open a new shell (we'll need the current one later). At that shell's command prompt, change the directory to the example directory of the Solr distribution and run the following command:

java -Dsolr.solr.home=SOLR-INDEXING -jar start.jar

Notice that solr.solr.home is not a typo; you do need the solr part twice. And, as always, if you have spaces in your paths (now or later), you may need to escape them in platform-specific ways, such as with backslashes on Unix/Linux or by quoting the whole value.

In the window of your shell, you should see a long list of messages that you can safely ignore (at least for now).
You can verify that everything is working fine by checking for the following three elements:

The long list of messages should finish with a message like Started SocketConnector@0.0.0.0:8983. This means that Solr is now running on port 8983 successfully.
You should now have a directory called data, right next to the directory called conf that we created earlier.
If you open a web browser and go to http://localhost:8983/solr/, you should see a web-based admin interface that makes testing and troubleshooting your Solr instance much easier. We will be using this interface later, so do spend a couple of minutes clicking around now.

Now, let's load some actual content into our collection:

Copy post.jar from the Solr distribution's example/exampledocs directory to our root SOLR-INDEXING directory.
Create a file called input1.csv in the collection1 directory, next to the conf and data directories, with the following three-line content:

id,addr_from,addr_to,subject
email1,fulan@acme.example.com,kari@acme.example.com,"Kari,we need more Junior Java engineers"
email2,kari@acme.example.com,maija@acme.example.com,"Updating vacancy description"

Run the import command from the command line in the SOLR-INDEXING directory (one long command; do not split it across lines):

java -Dauto -Durl=http://localhost:8983/solr/collection1/update -jar post.jar collection1/input1.csv

You should see the following in one of the message lines: "1 files indexed".
If you now open a web browser and go to http://localhost:8983/solr/collection1/select?q=*%3A*&wt=ruby&indent=true, you should see Solr output with all three documents displayed on the screen in a somewhat readable format.
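Before moving on to how it works, it can be instructive to try a more selective query than the match-all request above. This example is not part of the original recipe; it filters on the id field and, because every field in our schema uses the exact-match solr.StrField type, only complete values will match.

http://localhost:8983/solr/collection1/select?q=id%3Aemail1&wt=json&indent=true

This should return just the first document; switching wt back to ruby gives the same result in the Ruby format used earlier.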
Resources for Article : Further resources on this subject: Integrating Solr: Ruby on Rails Integration [Article] Indexing Data in Solr 1.4 Enterprise Search Server: Part2 [Article] Text Search, your Database or Solr [Article]

Creating your first heat map in R

Packt
26 Jun 2013
10 min read
(For more resources related to this topic, see here.)

The following image shows one of the heat maps that we are going to create in this recipe from the total count of air passengers. [Image not included in this excerpt.]

Getting ready

Download the script 5644OS_01_01.r from your account at http://www.packtpub.com and save it to your hard disk. The first section of the script, below the comment line starting with ### loading packages, will automatically check for the availability of the R packages gplots and lattice, which are required for this recipe. If those packages are not already installed, you will be prompted to select an official server from the Comprehensive R Archive Network (CRAN) to allow the automatic download and installation of the required packages. If you have already installed those two packages prior to executing the script, I recommend updating them to the most recent version by calling the following function in the R command line: [code not included in this excerpt]

Use the source() function in the R command line to execute an external script from any location on your hard drive. If you start a new R session from the same directory as the location of the script, simply provide the name of the script as an argument in the function call as follows: [code not included in this excerpt]

You have to provide the absolute or relative path to the script on your hard drive if you started your R session from a different directory to the location of the script. Refer to the following example: [code not included in this excerpt]

You can view the current working directory of your R session by executing the following command in the R command line: [code not included in this excerpt]

How to do it...

Run the 5644OS_01_01.r script in R to execute the recipe's code, and take a look at the output printed on the screen as well as the PDF file, first_heatmaps.pdf, that will be created by this script: [code not included in this excerpt]

How it works...

There are different functions for drawing heat maps in R, and each has its own advantages and disadvantages. In this recipe, we will take a look at the levelplot() function from the lattice package to draw our first heat map. Furthermore, we will use the more advanced heatmap.2() function from gplots to apply a clustering algorithm to our data and add the resulting dendrograms to our heat maps. The following image shows an overview of the different plotting functions that we are using throughout this book. [Image not included in this excerpt.]

Now let us take a look at how we read in and process data from different data files and formats step by step:

Loading packages: The first eight lines preceding the ### loading data section will make sure that R loads the lattice and gplots packages, which we need for the two heat map functions in this recipe: levelplot() and heatmap.2(). Each time we start a new session in R, we have to load the required packages in order to use the levelplot() and heatmap.2() functions. To do so, enter the following function calls directly into the R command line or include them at the beginning of your script:

library(lattice)
library(gplots)

Loading the data set: R includes a package called datasets, which contains a variety of different data sets for testing and exploration purposes. More information on the different data sets contained in the datasets package can be found at http://stat.ethz.ch/R-manual/R-patched/library/datasets/. For this recipe, we are loading the AirPassengers data set, which is a collection of the total count of air passengers (in thousands) for international airlines from 1949-1960 in a time-series format.
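Since the code listings did not survive in this excerpt, the following is a reconstruction sketch, not the book's verbatim code, of the steps described here and in the next step: loading the packages, loading the data, and turning the time series into a 12-column matrix, followed by the levelplot() call discussed below. The variable names (rowColNames, air_data) follow the text; note that the data set is registered in R under the name AirPassengers.

library(lattice)
library(gplots)

data(AirPassengers)                         # monthly passenger totals, 1949-1960, in thousands
rowColNames <- list(as.character(1949:1960), month.abb)
air_data <- matrix(AirPassengers, ncol = 12, byrow = TRUE,
                   dimnames = rowColNames)  # one row per year, one column per month

# levelplot() returns an object, so wrap it in print() when used inside a script
print(levelplot(air_data, col.regions = heat.colors(100),
                xlab = "Year", ylab = "Month",
                main = "Air passengers (in thousands)"))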
[code not included in this excerpt]

Converting the data set into a numeric matrix: Before we can use the heat map functions, we need to convert the AirPassengers time-series data into a numeric matrix. Numeric matrices in R can have characters as row and column labels, but the content itself must consist of one single mode: numeric. We use the matrix() function to create a numeric matrix consisting of 12 columns, to which we pass the AirPassengers time-series data row by row. Using the argument dimnames = rowColNames, we provide row and column names that we assigned previously to the variable rowColNames, which is a list of two vectors: a series of 12 strings representing the years 1949 to 1960, and a series of strings for the 12 three-letter abbreviations of the months from January to December, respectively. [code not included in this excerpt]

A simple heat map using levelplot(): Now that we have converted the AirPassengers data into a numeric matrix format and assigned it to the variable air_data, we can go ahead and construct our first heat map using the levelplot() function from the lattice package: [code not included in this excerpt]

The levelplot() function creates a simple heat map with a color key to the right-hand side of the map. We can use the argument col.regions = heat.colors to change the default color transition to yellow and red. The x and y axis labels are specified by the xlab and ylab parameters, respectively, and the main parameter gives our heat map its caption. In contrast to most of the other plotting functions in R, the lattice package returns objects, so we have to use the print() function in our script if we want to save the plot to a data file. In an interactive R session, the print() call can be omitted: typing the name of the variable will automatically display the corresponding object on the screen.

Creating enhanced heat maps with heatmap.2(): Next, we will use the heatmap.2() function to apply a clustering algorithm to the AirPassengers data and to add row and column dendrograms to our heat map (a hedged example call is sketched at the end of this article): [code not included in this excerpt]

Hierarchical clustering is especially popular in gene expression analyses. It is a very powerful method for grouping data to reveal interesting trends and patterns in the data matrix. Another neat feature of heatmap.2() is that you can display a histogram of the counts of the individual values inside the color key by including the argument density.info = NULL in the function call. Alternatively, you can set density.info = "density" to display a density plot inside the color key. By adding the argument keysize = 1.8, we slightly increase the size of the color key (the default value of keysize is 1.5): [code not included in this excerpt]

Did you notice the missing row dendrogram in the resulting heat map? This is due to the argument dendrogram = "column" that we passed to the heat map function. Similarly, you can use row instead of column to suppress the column dendrogram, or use neither to draw no dendrogram at all.

There's more...

By default, levelplot() places the color key on the right-hand side of the heat map, but it can easily be moved to the top, bottom, or left-hand side of the map by modifying the space parameter of colorkey: [code not included in this excerpt]

Replacing top by left or bottom will place the color key on the left-hand side or at the bottom of the heat map, respectively. Moving the color key around for heatmap.2() can be a little more of a hassle. In this case we have to modify the parameters of the layout() function.
There's more...

By default, levelplot() places the color key on the right-hand side of the heat map, but it can easily be moved to the top, bottom, or left-hand side of the map by modifying the space parameter of colorkey, for example, colorkey = list(space = "top"). Replacing top by left or bottom will place the color key on the left-hand side or at the bottom of the heat map, respectively.

Moving around the color key for heatmap.2() can be a little bit more of a hassle. In this case we have to modify the parameters of the layout() function. By default, heatmap.2() passes a matrix, lmat, to layout(), which is equivalent to rbind(c(4, 3), c(2, 1)). The numbers in this matrix specify the locations of the different visual elements on the plot (1 implies heat map, 2 implies row dendrogram, 3 implies column dendrogram, and 4 implies key). If we want to change the position of the key, we have to modify and rearrange those values of lmat that heatmap.2() passes to layout(). For example, if we want to place the color key at the bottom left-hand corner of the heat map, we need to create a new matrix for lmat, which we can construct using the rbind() function. Furthermore, we have to pass an argument for the row height parameter lhei to heatmap.2(), which will allow us to use our modified lmat matrix for rearranging the color key; an example of this rearrangement is sketched at the end of this section.

If you don't need a color key for your heat map, you can turn it off by using the argument key = FALSE for heatmap.2() and colorkey = FALSE for levelplot(), respectively. R also has a base function for creating heat maps, heatmap(), that does not require you to install external packages and is most advantageous if you can go without a color key. Its syntax is very similar to that of heatmap.2(), and most of the basic options that we have seen in this recipe also apply to heatmap().

More information on dendrograms and clustering

By default, the dendrograms of heatmap.2() are created by a hierarchical agglomerative clustering method, also known as bottom-up clustering. In this approach, all objects start as individual clusters and are successively merged until only one single cluster remains. The distance between a pair of clusters is calculated by the farthest neighbor method, also called the complete linkage method, which is based by default on the Euclidean distance of the two points from both clusters that are farthest apart from each other. The computed dendrograms are then reordered based on the row and column means.

By modifying the default parameters of the dist() function, we can use a distance measure other than the Euclidean distance. For example, if we want to use the Manhattan distance measure (based on a grid-like path rather than a direct connection between two objects), we would set the method parameter of the dist() function and assign the result to a variable, distance, first. Other options for the method parameter are: euclidean (default), maximum, canberra, binary, or minkowski. To use agglomeration methods other than the complete linkage method, we modify the method parameter in the hclust() function and assign the result to another variable, cluster. Note that the first argument we pass to the hclust() function is the distance object from our previous assignment. By setting the method parameter to ward, R will use Joe H. Ward's minimum variance method for hierarchical clustering. Other options for the method parameter that we can pass as arguments to hclust() are: complete (default), single, average, mcquitty, median, or centroid. To use our modified clustering parameters, we simply call the as.dendrogram() function within heatmap.2() using the cluster variable that we assigned previously. We can also draw the cluster dendrogram without the heat map by using the plot() function. To turn off row and column reordering, we need to turn off the dendrograms and set the parameters Colv and Rowv to NA. Example calls for these customizations are sketched below.
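The corresponding listings are not reproduced here, so the following sketch only illustrates the calls described above. The exact lmat arrangement, the lhei and lwid values, and the decision to apply the custom dendrogram to the rows are assumptions; the function and argument names (layout(), lhei, heatmap(), dist(), hclust(), as.dendrogram(), Rowv, Colv) are the ones named in the text.

# move the color key to the bottom left (assumed layout values)
lmat <- rbind(c(0, 3),   # empty cell | column dendrogram
              c(2, 1),   # row dendrogram | heat map
              c(4, 0))   # color key | empty cell
heatmap.2(air_data, lmat = lmat, lhei = c(1.5, 4, 1.5), lwid = c(1.5, 4),
          trace = "none")

# base R alternative, most useful when no color key is needed
heatmap(air_data)

# custom distance measure and agglomeration method for the dendrograms
distance <- dist(air_data, method = "manhattan")
cluster  <- hclust(distance, method = "ward.D")   # the text's "ward"; renamed in R >= 3.1.0

heatmap.2(air_data,
          Rowv = as.dendrogram(cluster),   # assumption: apply the custom clustering to the rows
          trace = "none")

# the cluster dendrogram on its own, without the heat map
plot(cluster)

# no reordering and no dendrograms at all
heatmap.2(air_data, dendrogram = "none", Rowv = NA, Colv = NA, trace = "none")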
Summary

This article has helped us create our first heat maps from a small data set provided in R. We have used different heat map functions in R to get a first impression of their functionalities.

Resources for Article : Further resources on this subject: Getting started with Leaflet [Article] Moodle 1.9: Working with Mind Maps [Article] Joomla! with Flash: Showing maps using YOS amMap [Article]


Linking Section Access to multiple dimensions

Packt
25 Jun 2013
3 min read
(For more resources related to this topic, see here.)

Getting ready

Load the following script:

Product:
LOAD * INLINE [
ProductID, ProductGroup, ProductName
1, GroupA, Great As
2, GroupC, Super Cs
3, GroupC, Mega Cs
4, GroupB, Good Bs
5, GroupB, Busy Bs
];
Customer:
LOAD * INLINE [
CustomerID, CustomerName, Country
1, Gatsby Gang, USA
2, Charly Choc, USA
3, Donnie Drake, USA
4, London Lamps, UK
5, Shylock Homes, UK
];
Sales:
LOAD * INLINE [
CustomerID, ProductID, Sales
1, 2, 3536
1, 3, 4333
1, 5, 2123
2, 2, 4556
2, 4, 1223
2, 5, 6789
3, 2, 1323
3, 3, 3245
3, 4, 6789
4, 2, 2311
4, 3, 1333
5, 1, 7654
5, 2, 3455
5, 3, 6547
5, 4, 2854
5, 5, 9877
];
CountryLink:
Load Distinct Country, Upper(Country) As COUNTRY_LINK
Resident Customer;
Load Distinct Country, 'ALL' As COUNTRY_LINK
Resident Customer;
ProductLink:
Load Distinct ProductGroup, Upper(ProductGroup) As PRODUCT_LINK
Resident Product;
Load Distinct ProductGroup, 'ALL' As PRODUCT_LINK
Resident Product;
//Section Access;
Access:
LOAD * INLINE [
ACCESS, USERID, PRODUCT_LINK, COUNTRY_LINK
ADMIN, ADMIN, *, *
USER, GM, ALL, ALL
USER, CM1, ALL, USA
USER, CM2, ALL, UK
USER, PM1, GROUPA, ALL
USER, PM2, GROUPB, ALL
USER, PM3, GROUPC, ALL
USER, SM1, GROUPB, UK
USER, SM2, GROUPA, USA
];
Section Application;

Note that there is a loop error generated on reload because there is a loop in the data structure.

How to do it…

Follow these steps to link Section Access to multiple dimensions:

Add list boxes to the layout for ProductGroup and Country.
Add a statistics box for Sales.
Remove // to uncomment the Section Access statement.
From the Settings menu, open Document Properties and select the Opening tab. Turn on the Initial Data Reduction Based on Section Access option.
Reload and save the document. Close QlikView.
Re-open QlikView and open the document. Log in as the Country Manager, CM1, user. Note that USA is the only country. Also, the product group, GroupA, is missing; there are no sales of this product group in USA.
Close QlikView and then re-open again. This time, log in as the Sales Manager, SM2. You will not be allowed access to the document.
Log into the document as the ADMIN user. Edit the script. Add a second entry for the SM2 user in the Access table as follows:
USER, SM2, GROUPA, USA
USER, SM2, GROUPB, UK
Reload, save, and close the document and QlikView. Re-open and log in as SM2. Note the selections.

How it works…

Section Access is really quite simple. The user is connected to the data and the data is reduced accordingly. QlikView allows Section Access tables to be connected to multiple dimensions in the main data structure without causing issues with loops. Each associated field acts in the same way as a selection in the layout. The initial setting for the SM2 user contained values that were mutually exclusive. Because of the default Strict Exclusion setting, the SM2 user cannot log in. We changed the script and included multiple rows for the SM2 user. Intuitively, we might expect that, as the first row did not connect to the data, only the second row would connect to the data. However, each field value is treated as an individual selection and all of the values are included.

There's more…

If we wanted to include solely the composite association of Country and ProductGroup, we would need to derive a composite key in the data set and connect the user to that; a sketch of this approach follows below. In this example, we used the USERID field to test using QlikView logins. However, we would normally use NTNAME to link the user to either a Windows login or a custom login.
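The recipe does not include a listing for this composite-key approach, so the following load-script fragment is only a sketch of one way it could be done. The mapping-table names, the COUNTRY_PRODUCT_LINK field name, and the join back onto the Sales table are illustrative assumptions rather than part of the original recipe.

// Build lookup maps from the dimension tables
CountryMap:
Mapping Load CustomerID, Upper(Country)
Resident Customer;

ProductGroupMap:
Mapping Load ProductID, Upper(ProductGroup)
Resident Product;

// Add a single composite reduction field to every sales row
Left Join (Sales)
Load CustomerID, ProductID,
     ApplyMap('CountryMap', CustomerID) & '|' &
     ApplyMap('ProductGroupMap', ProductID) As COUNTRY_PRODUCT_LINK
Resident Sales;

// The Section Access table would then contain COUNTRY_PRODUCT_LINK values
// such as USA|GROUPA, plus additional 'ALL'-style rows built in the same way
// as the CountryLink and ProductLink tables above.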
Resources for Article : Further resources on this subject: Pentaho Reporting: Building Interactive Reports in Swing [Article] Visual ETL Development With IBM DataStage [Article] A Python Multimedia Application: Thumbnail Maker [Article]

IBM Cognos Workspace Advanced

Packt
14 Jun 2013
5 min read
(For more resources related to this topic, see here.)

Who should use Cognos Workspace Advanced?

With Cognos Workspace Advanced, business users have one tool for creating advanced analyses and reports. The tool, like Query Studio and Analysis Studio, is designed for ease of use and is built on the same platform as the other report development tools in Cognos. Business Insight Advanced/Cognos Workspace Advanced is actually so powerful that it is being positioned more as a light Cognos Report Studio than as a powerful Cognos Query Studio and Cognos Analysis Studio.

Comparing to Cognos Query Studio and Cognos Analysis Studio

With so many options for business users, how do we know which tool to use? The best approach for making this decision is to consider the similarities and differences between the options available. In order to help us do so, we can use the following table:

Feature | Query Studio | Analysis Studio | Cognos Workspace Advanced
Ad hoc reporting | X |  | X
Ad hoc analysis |  | X | X
Basic charting | X | X | X
Advanced charting |  |  | X
Basic filtering | X | X | X
Advanced filtering |  |  | X
Basic calculations | X | X | X
Advanced calculations |  |  | X
Properties pane |  |  | X
External data |  |  | X
Freeform design |  |  | X

As you can see from the table, all three products have basic charting, basic filtering, and basic calculation features. Also, we can see that Cognos Query Studio and Cognos Workspace Advanced both have ad hoc reporting capabilities, while Cognos Analysis Studio and Cognos Workspace Advanced both have ad hoc analysis capabilities. In addition to those shared capabilities, Cognos Workspace Advanced also has advanced charting, filtering, and calculation features. Cognos Workspace Advanced also has a limited properties pane (similar to what you would see in Cognos Report Studio). Furthermore, Cognos Workspace Advanced allows end users to bring in external data from a flat file and merge it with the data from Cognos Connection. Finally, Cognos Workspace Advanced has free-form design capabilities; in other words, you are not limited to the standard templates that Cognos Query Studio and Cognos Analysis Studio impose on where you can add charts or crosstabs.

The simple conclusion after performing this comparison is that you should always use Cognos Workspace Advanced. While that will be true for some users, it is not true for all. With the additional capabilities come additional complexities. For your most basic business users, you may want to keep them using Cognos Query Studio or Cognos Analysis Studio for their ad hoc reporting and ad hoc analysis simply because they are easier tools to understand and use. However, for those business users with basic technical acumen, Cognos Workspace Advanced is clearly the superior option.

Accessing Cognos Workspace Advanced

I would assume now that, after reviewing the capabilities Cognos Workspace Advanced brings to the table, you are anxious to start using it. We will start off by looking at how to access the product. The first way to access Cognos Workspace Advanced is through the welcome page. On the welcome page, you can get to Cognos Workspace Advanced by clicking on the option Author business reports. This will bring you to a screen where you can select your package. In Cognos Query Studio or Cognos Analysis Studio, you will only be able to select non-dimensional and dimensional packages based on the tool you are using.
With Cognos Workspace Advanced, because the tool can use both dimensional and non-dimensional packages, you will be prompted with packages for both. The next way to access Cognos Workspace Advanced is through the Launch menu in Cognos Connection. Within the menu, you can simply choose Cognos Workspace Advanced to be taken to the same options for choosing a package. Note, however, that if you have already navigated into a package, it will automatically launch Cognos Workspace Advanced using the very same package.

The third way to access Cognos Workspace Advanced is by far the most functional way. You can actually access Cognos Workspace Advanced from within Cognos Workspace by clicking on the Do More... option on a component of the dashboard. When you select this option, the object will expand out and open for editing inside Cognos Workspace Advanced. Then, once you are done editing, you can simply choose the Done button in the upper right-hand corner to return to Cognos Workspace with your newly updated object.

For the sake of showing as many features as possible in this chapter, we will launch Cognos Workspace Advanced from the welcome page or from the Launch menu and select a package that has an OLAP data source. For the purpose of following along, we will be using the Cognos BI sample package great_outdoors_8 (or Great Outdoors). When we first access it, we are prompted to choose a package. For these examples, we will choose great_outdoors_8.

We are then brought to a splash screen where we can choose Create new or Open existing. We will choose Create new. We are then prompted to pick the type of chart we want to create. Our options are as follows:

Blank: It starts us off with a completely blank slate
List: It starts us off with a list report
Crosstab: It starts us off with a crosstab
Chart: It starts us off with a chart and loads the chart wizard
Financial: It starts us off with a crosstab formatted like a financial report
Existing...: It allows us to open an existing report

We will choose Blank because we can still add as many of the other objects as we want to later on.

A quick start – OpenCV fundamentals

Packt
12 Jun 2013
8 min read
(For more resources related to this topic, see here.) The OpenCV library has a modular structure, and a brief description of all the modules is as follows:

Module | Feature
Core | A compact module defining basic data structures, including the dense multidimensional array Mat and basic functions used by all other modules.
Imgproc | An image processing module that includes linear and non-linear image filtering, geometrical image transformations (resize, affine and perspective warping, generic table-based remapping), color space conversion, histograms, and so on.
Video | A video analysis module that includes motion estimation, background subtraction, and object tracking algorithms.
Calib3d | Basic multiple-view geometry algorithms, single and stereo camera calibration, object pose estimation, stereo correspondence algorithms, and elements of 3D reconstruction.
Features2d | Salient feature detectors, descriptors, and descriptor matchers.
Objdetect | Detection of objects and instances of the predefined classes; for example, faces, eyes, mugs, people, cars, and so on.
Highgui | An easy-to-use interface to video capturing, image and video codecs, as well as simple UI capabilities.
Gpu | GPU-accelerated algorithms from different OpenCV modules.

Task 1 – image basics

When trying to recreate the physical world around us in digital format via a camera, for example, the computer sees the image only as code made up of the numbers 1 and 0. A digital image is nothing but a collection of pixels (picture elements), which are then stored in matrices in OpenCV for further manipulation. In the matrices, each element contains information about a particular pixel in the image. The pixel value decides how bright or what color that pixel should be. Based on this, we can classify images as:

Greyscale
Color/RGB

Greyscale

Here the pixel value can range from 0 to 255 and hence we can see the various shades of gray, where 0 represents black and 255 represents white. A special case of greyscale is the binary image, or black and white image, in which every pixel is either black or white.

Color/RGB

Red, Blue, and Green are the primary colors, and upon mixing them in various different proportions we can get new colors. A pixel in a color image has three separate channels, one each for Red, Blue, and Green. The value ranges from 0 to 255 for each channel.

Task 2 – reading and displaying an image

We are now going to write a very simple and basic program using the OpenCV library to read and display an image. This will help you understand the basics.

Code

A simple program to read and display an image is as follows:

// opencv header files
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/core/core.hpp"

// namespaces declaration
using namespace cv;
using namespace std;

// create a variable to store the image
Mat image;

int main( int argc, char** argv )
{
    // open the image and store it in the 'image' variable
    // Replace the path with where you have downloaded the image
    image = imread("<path to image>/lena.jpg");

    // create a window to display the image
    namedWindow( "Display window", CV_WINDOW_AUTOSIZE );

    // display the image in the window created
    imshow( "Display window", image );

    // wait for a keystroke
    waitKey(0);
    return 0;
}

Code explanation

Now let us understand how the code works.
Short comments have also been included in the code itself to increase the readability.

#include "opencv2/highgui/highgui.hpp"
#include "opencv2/core/core.hpp"

The preceding two header files will be a part of almost every program we write using the OpenCV library. As explained earlier, the highgui header is used for window creation, management, and so on, while the core header is used to access the Mat data structure in OpenCV.

using namespace cv;
using namespace std;

The preceding two lines declare the required namespaces for this code so that we don't have to use the :: (scope resolution) operator every time for accessing the functions.

Mat image;

With the preceding command, we have created a variable image of the datatype Mat, which is frequently used in OpenCV to store images.

image = imread("<path to image>/lena.jpg");

In the preceding command, we open the image lena.jpg and store it in the image variable. Replace <path to image> in the command with the location of that picture on your PC.

namedWindow( "Display window", CV_WINDOW_AUTOSIZE );

We now need a window to display our image, so we use the preceding function to create one. This function takes two parameters, of which the first one is the name of the window. In our case, we name our window Display window. The second parameter is optional; CV_WINDOW_AUTOSIZE sizes the window to fit the image so that it is shown at its original size without being cropped.

imshow( "Display window", image );

Finally, we are ready to display our image in the window we just created by using the preceding function. This function takes two parameters, of which the first one is the name of the window in which the image has to be displayed; in our case, that is Display window. The second parameter is the variable containing the image that we want to display; in our case, it's the image variable.

waitKey(0);

Last but not least, it is advised that you use the preceding function in most of the code that you write using the OpenCV library. If we don't write this line, the image will be displayed for a fraction of a second and the program will be immediately terminated; it happens so fast that you will not be able to see the image. What this function essentially does is wait for a keystroke from the user and hence delay the termination of the program. The argument is the delay in milliseconds, and a value of 0 means that it waits indefinitely for a keystroke.

Output

Running the program displays lena.jpg in the Display window.

Task 3 – resizing and saving an image

We are now going to write a very simple and basic program using the OpenCV library to resize and save an image.
Code

The following code helps you to resize a given image, save the result, and read it back for display:

// opencv header files
#include "opencv2/highgui/highgui.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/core/core.hpp"

// namespaces declaration
using namespace std;
using namespace cv;

int main(int argc, char** argv)
{
    // create variables to store the images
    Mat org, resized, saved;

    // open the image and store it in the 'org' variable
    // Replace the path with where you have downloaded the image
    org = imread("<path to image>/lena.png");

    // Create a window to display the image
    namedWindow("Original Image", CV_WINDOW_AUTOSIZE);

    // display the image
    imshow("Original Image", org);

    // resize the image
    resize(org, resized, Size(), 0.5, 0.5, INTER_LINEAR);
    namedWindow("Resized Image", CV_WINDOW_AUTOSIZE);
    imshow("Resized Image", resized);

    // save the image
    // Replace <path> with your desired location
    imwrite("<path>/saved.png", resized);

    // read the saved image back and display it
    namedWindow("Image saved", CV_WINDOW_AUTOSIZE);
    saved = imread("<path>/saved.png");
    imshow("Image saved", saved);

    // wait for a keystroke
    waitKey(0);
    return 0;
}

Code explanation

Only the new functions/concepts will be explained in this case.

#include "opencv2/imgproc/imgproc.hpp"

Imgproc is another useful header that gives us access to various transformations, color conversions, filters, histograms, and so on.

Mat org, resized, saved;

We have now created three variables, org, resized, and saved, to store the original image, the resized image, and the copy read back from disk, respectively.

resize(org, resized, Size(), 0.5, 0.5, INTER_LINEAR);

We have used the preceding function to resize the image. It takes six parameters, of which the first one is the variable containing the source image to be modified. The second one is the variable to store the resized image. The third parameter is the output image size; in this case we have not specified it but have instead passed an empty Size(), so it is calculated automatically from the values of the fourth and fifth parameters. The fourth and fifth parameters are the scale factors along the horizontal and vertical axes, respectively. The sixth parameter chooses the type of interpolation method; we have used bilinear interpolation, which is the default method.

imwrite("<path>/saved.png", resized);

Finally, using the preceding function, you can save an image to a particular location on your PC. The function takes two parameters, of which the first one is the location where you want to store the image and the second is the variable in which the image is stored. This function is very useful when you want to perform multiple operations on an image and save the result on your PC for future reference. Replace <path> in the preceding function with your desired location.

Output

Running the program opens three windows showing the original image, the resized image, and the saved copy read back from disk.

Summary

This section showed you how to perform a few of the basic tasks in OpenCV as well as how to write your first OpenCV program.

Resources for Article : Further resources on this subject: OpenCV: Segmenting Images [Article] Tracking Faces with Haar Cascades [Article] OpenCV: Image Processing using Morphological Filters [Article]