
How-To Tutorials - Data

Sharing Your BI Reports and Dashboards

Packt
19 Dec 2013
4 min read
(For more resources related to this topic, see here.)

The final objective of the information in BI reports and dashboards is to detect cause-and-effect business behavior and trends, and to trigger actions to address them. These actions are supported by visual information, via scorecards and dashboards. This process requires interaction with several people. MicroStrategy includes the functionality to share our reports, scorecards, and dashboards, regardless of where those people are located.

Reaching your audience

MicroStrategy offers the option to share our reports via different channels that leverage the latest social technologies already present in the marketplace; that is, MicroStrategy integrates with Twitter and Facebook. Sharing this way avoids any related costs and maintains the design premise of the do-it-yourself approach, without any help from specialized IT personnel.

Main menu

The main menu of MicroStrategy shows a column named Status. When we click on that column, as shown in the following screenshot, the Share option appears:

The Share button

The other option is the Share button within our reports, that is, the view that we want to share. Select the Share button located at the bottom of the screen, as shown in the following screenshot:

The share options are the same regardless of where you activate them; the various alternate menus are shown in the following screenshot:

E-mail sharing

When selecting the e-mail option from the Scorecards-Dashboards model, the system will ask you which e-mail program you want to use in order to send the e-mail; in our case, we select Outlook. MicroStrategy automatically prepares an e-mail with a link to share. You can modify the text and select the recipients of the e-mail, as shown in the following screenshot:

The recipients of the e-mail click on the URL included in the message, and through this scheme they are able to analyze the report in a read-only mode with only the Filters panel enabled. The following screenshot shows how the user will review the report. The user is not allowed to make any other modifications. This option does not require a MicroStrategy platform user account.

When a user clicks on the link, they are able to edit the filters and perform their analyses, as well as switch to any available layout, in our case, scorecards and dashboards. As a result, any visualization object can be maximized and minimized for better analysis, as shown in the following screenshot:

In this option, the report can be visualized in fullscreen mode by clicking on the fullscreen button located at the top-right corner of the screen. In this sharing mode, the user is able to download the information in Excel and PDF formats for each visualization object. For instance, suppose you need all the data included in the grid for the stores in region 1 opened in the year 2012. Perform the following steps:

In the browser, open the URL that is generated when you select the e-mail share option. Select the ScoreCard tab. In the Open Year filter, type 2012 and in the Region filter, type 1. Now, maximize the grid.

Two icons will appear in the top-left corner of the screen: one for exporting the data to Excel and the other for exporting it to PDF for each visualization object, as shown in the following screenshot. Please keep in mind that these two export options only apply to a specific visualization object; it is not possible to export the complete report from this functionality offered to the consumer.

Summary

In this article, we learned how to share our scorecards and dashboards via several channels, such as e-mail, social networks (Twitter and Facebook), and blogs or corporate intranet sites.

Resources for Article:

Further resources on this subject: Participating in a business process (Intermediate) [Article], Self-service Business Intelligence, Creating Value from Data [Article], Exploring Financial Reporting and Analysis [Article]

N-Way Replication in Oracle 11g Streams: Part 2

Packt
05 Feb 2010
7 min read
Streaming STRM2 to STRM1

Now the plan for setting up Streams for STRM2. It is the mirror image of what we have done above, except for the test part.

On STRM2, log in as STRM_ADMIN.
-- ADD THE QUEUE, a good queue name is STREAMS_CAPTURE_Q
-- ADD THE CAPTURE RULE
-- ADD THE PROPAGATION RULE
-- INSTANTIATE TABLE ACROSS DBLINK
-- DBLINK TO DESTINATION is STRM1.US.APGTECH.COM
-- SOURCE is STRM2.US.APGTECH.COM

On STRM1, log in as STRM_ADMIN.
-- ADD THE QUEUE: A good queue name is STREAMS_APPLY_Q
-- ADD THE APPLY RULE

Start everything up and test the Stream on STRM2. Then check to see if the record is STREAM'ed to STRM1.

-- On STRM2, log in as STRM_ADMIN
-- ADD THE QUEUE: A good queue name is STREAMS_CAPTURE_Q
-- STRM_ADMIN@STRM2.US.APGTECH.COM>
BEGIN
  DBMS_STREAMS_ADM.SET_UP_QUEUE(
    queue_table => '"STREAMS_CAPTURE_QT"',
    queue_name  => '"STREAMS_CAPTURE_Q"',
    queue_user  => '"STRM_ADMIN"');
END;
/
commit;

-- ADD THE CAPTURE RULE
-- STRM_ADMIN@STRM2.US.APGTECH.COM>
BEGIN
  DBMS_STREAMS_ADM.ADD_TABLE_RULES(
    table_name         => '"LEARNING.EMPLOYEES"',
    streams_type       => 'capture',
    streams_name       => '"STREAMS_CAPTURE"',
    queue_name         => '"STRM_ADMIN"."STREAMS_CAPTURE_Q"',
    include_dml        => true,
    include_ddl        => true,
    include_tagged_lcr => false,
    inclusion_rule     => true);
END;
/
commit;

-- ADD THE PROPAGATION RULE
-- STRM_ADMIN@STRM2.US.APGTECH.COM>
BEGIN
  DBMS_STREAMS_ADM.ADD_TABLE_PROPAGATION_RULES(
    table_name             => '"LEARNING.EMPLOYEES"',
    streams_name           => '"STREAMS_PROPAGATION"',
    source_queue_name      => '"STRM_ADMIN"."STREAMS_CAPTURE_Q"',
    destination_queue_name => '"STRM_ADMIN"."STREAMS_APPLY_Q"@STRM1.US.APGTECH.COM',
    include_dml            => true,
    include_ddl            => true,
    source_database        => 'STRM2.US.APGTECH.COM',
    inclusion_rule         => true);
END;
/
COMMIT;

Because the table was instantiated from STRM1 already, you can skip this step.

-- INSTANTIATE TABLE ACROSS DBLINK
-- STRM_ADMIN@STRM2.US.APGTECH.COM>
DECLARE
  iscn NUMBER; -- Variable to hold instantiation SCN value
BEGIN
  iscn := DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER();
  DBMS_APPLY_ADM.SET_TABLE_INSTANTIATION_SCN@STRM1.US.APGTECH.COM(
    source_object_name   => 'LEARNING.EMPLOYEES',
    source_database_name => 'STRM1.US.APGTECH.COM',
    instantiation_scn    => iscn);
END;
/
COMMIT;

-- On STRM1, log in as STRM_ADMIN.
-- ADD THE QUEUE, a good queue name is STREAMS_APPLY_Q
-- STRM_ADMIN@STRM1.US.APGTECH.COM>
BEGIN
  DBMS_STREAMS_ADM.SET_UP_QUEUE(
    queue_table => '"STREAMS_APPLY_QT"',
    queue_name  => '"STREAMS_APPLY_Q"',
    queue_user  => '"STRM_ADMIN"');
END;
/
COMMIT;

-- ADD THE APPLY RULE
-- STRM_ADMIN@STRM1.US.APGTECH.COM>
BEGIN
  DBMS_STREAMS_ADM.ADD_TABLE_RULES(
    table_name         => '"LEARNING.EMPLOYEES"',
    streams_type       => 'apply',
    streams_name       => '"STREAMS_APPLY"',
    queue_name         => '"STRM_ADMIN"."STREAMS_APPLY_Q"',
    include_dml        => true,
    include_ddl        => true,
    include_tagged_lcr => false,
    inclusion_rule     => true);
END;
/
commit;

Start everything up and Test.
-- STRM_ADMIN@STRM1.US.APGTECH.COM>
BEGIN
  DBMS_APPLY_ADM.SET_PARAMETER(
    apply_name => 'STREAMS_APPLY',
    parameter  => 'disable_on_error',
    value      => 'n');
END;
/
COMMIT;

-- STRM_ADMIN@STRM1.US.APGTECH.COM>
DECLARE
  v_started number;
BEGIN
  SELECT DECODE(status, 'ENABLED', 1, 0) INTO v_started
    FROM DBA_APPLY where apply_name = 'STREAMS_APPLY';
  if (v_started = 0) then
    DBMS_APPLY_ADM.START_APPLY(apply_name => '"STREAMS_APPLY"');
  end if;
END;
/
COMMIT;

-- STRM_ADMIN@STRM2.US.APGTECH.COM>
DECLARE
  v_started number;
BEGIN
  SELECT DECODE(status, 'ENABLED', 1, 0) INTO v_started
    FROM DBA_CAPTURE where CAPTURE_NAME = 'STREAMS_CAPTURE';
  if (v_started = 0) then
    DBMS_CAPTURE_ADM.START_CAPTURE(capture_name => '"STREAMS_CAPTURE"');
  end if;
END;
/

Then on STRM2:

-- STRM_ADMIN@STRM2.US.APGTECH.COM>
ACCEPT fname PROMPT "Enter Your Mom's First Name: "
ACCEPT lname PROMPT "Enter Your Mom's Last Name: "
Insert into LEARNING.EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME, TIME)
  Values (5, '&fname', '&lname', NULL);
dbms_lock.sleep(10); -- give it time to replicate

Then on STRM1, search for the record.

-- STRM_ADMIN@STRM1.US.APGTECH.COM>
Select * from LEARNING.EMPLOYEES;

We now have N-way replication. But wait, what about conflict resolution? Good catch; all of this was just to set up N-way replication. In this case, it is a 2-way replication. It will work the majority of the time; that is, until there is a conflict. Conflict resolution needs to be set up, and in this example the supplied/built-in conflict resolution handler MAXIMUM will be used. Now, let us cause some CONFLICT! Then we will be good people and create the conflict resolution, and ask for world peace while we are at it!

Conflict resolution

A conflict between User 1 and User 2 has happened. Unbeknownst to both of them, they have both inserted the exact same row of data into the same table, at roughly the same time. User 1's insert is to the STRM1 database. User 2's insert is to the STRM2 database. Normally, the transaction that arrives second will raise an error. It is most likely that the error will be some sort of primary key violation and that the transaction will fail. We do not want that to happen. We want the transaction that arrives last to "win" and be committed to the database.

At this point, you may be wondering, "How do I choose which conflict resolution to use?" Well, you do not get to choose; the Business Community that you support will determine the rules most of the time. They will tell you how they want conflict resolution handled. Your responsibility is to know what can be solved with built-in conflict resolution and when you will need to create custom conflict resolution.

Going back to User 1 and User 2: in this particular case, User 2's insert arrives later than User 1's insert. Now the conflict resolution is added using the DBMS_APPLY_ADM package, specifically the procedure DBMS_APPLY_ADM.SET_UPDATE_CONFLICT_HANDLER, which instructs the APPLY process on how to handle the conflict. Scripts_5_1_CR.sql shows the conflict resolution used to resolve the conflict between User 1 and User 2. Since it is part of the APPLY process, this script is run by the Streams Administrator; in our case, that would be STRM_ADMIN. This type of conflict can occur on either the STRM1 or STRM2 database, so the script will be run on both databases. The numbers to the left are there for reference reasons. They are not in the provided code.

-- Scripts_5_1_CR.sql
1.  DECLARE
2.    cols DBMS_UTILITY.NAME_ARRAY;
3.  BEGIN
4.    cols(0) := 'employee_id';
5.    cols(1) := 'first_name';
6.    cols(2) := 'last_name';
7.    cols(3) := 'time';
8.    DBMS_APPLY_ADM.SET_UPDATE_CONFLICT_HANDLER(
9.      object_name       => 'learning.employees',
10.     method_name       => 'MAXIMUM',
11.     resolution_column => 'time',
12.     column_list       => cols);
13. END;
14. /
15. Commit;

So what do these 15 magical lines do to resolve the conflict? Let us break it down piece by piece logically first, and then look at the specific syntax of the code. Oracle needs to know where to look when a conflict happens. In our example, that is the learning.employees table. Furthermore, Oracle needs more than just the table name; it needs to know what columns are involved. Line 9 informs Oracle of the table. Lines 1-7 relate to the columns. Line 8 is the actual procedure name. What Oracle is supposed to do when this conflict happens is answered by Line 10. Line 10 instructs Oracle to take the MAXIMUM of the resolution_column and use that to resolve the conflict. Since our resolution column is time, the last transaction to arrive is the "winner" and is applied.
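To make the scenario concrete, here is a small illustrative sketch (not part of the original chapter scripts) of the colliding inserts described above. The employee ID 6, the names, and the use of SYSTIMESTAMP to populate the TIME resolution column are assumptions made for this example only:

-- User 1, connected to STRM1 (hypothetical values)
Insert into LEARNING.EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME, TIME)
  Values (6, 'Pat', 'Smith', SYSTIMESTAMP);
COMMIT;

-- User 2, connected to STRM2 a moment later (same row, hypothetical values)
Insert into LEARNING.EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME, TIME)
  Values (6, 'Pat', 'Smith', SYSTIMESTAMP);
COMMIT;

-- Per the MAXIMUM handler described above, when the two changes meet at apply
-- time the row with the greater value in the time resolution column is kept,
-- so the transaction that arrives last "wins" instead of failing with an error.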

MySQL Cluster Management: Part 2

Packt
10 May 2010
11 min read
Replication between clusters with a backup channel

The previous recipe showed how to connect a MySQL Cluster to another MySQL server or another MySQL Cluster using a single replication channel. Obviously, this means that this replication channel has a single point of failure (if either of the two replication agents (machines) fails, the channel goes down). If you are designing your disaster recovery plan to rely on MySQL Cluster replication, then you are likely to want more reliability than that.

One simple thing that we can do is run multiple replication channels between two clusters. With this setup, in the event of a replication channel failing, a single command can be executed on one of the backup channel slaves to continue the channel. It is not currently possible to automate this process (at least, not without scripting it yourself). The idea is that with a second channel ready and good monitoring of the primary channel, you can quickly bring up the replication channel in the case of failure, which means significantly less time spent with the replication channel down.

How to do it…

Setting up this process is not vastly different; however, it is vital to ensure that both channels are not running at any one time, or the data at the slave site will become a mess and the replication will stop. To guarantee this, the first step is to add the following to the mysqld section of /etc/my.cnf on all slave MySQL Servers (of which there are likely to be two):

skip-slave-start

Once added, restart mysqld. This my.cnf parameter prevents the MySQL Server from automatically starting the slave process.

You should start one of the channels (normally, whichever channel you decide will be your master) normally, while following the steps in the previous recipe. To configure the second slave, follow the instructions in the previous recipe, but stop just prior to the CHANGE MASTER TO step on the second (backup) slave. If you configure two replication channels simultaneously (that is, forget to stop the existing replication channel when testing the backup), you will end up with a broken setup. Do not proceed to run CHANGE MASTER TO on the backup slave unless the primary channel is not operating.

As soon as the primary communication channel fails, you should execute the following command on any one of the SQL nodes in your slave (destination) cluster and record the result:

[slave] mysql> SELECT MAX(epoch) FROM mysql.ndb_apply_status;
+---------------+
| MAX(epoch)    |
+---------------+
| 5952824672272 |
+---------------+
1 row in set (0.00 sec)

The number returned is the ID of the most recent global checkpoint, which is run every couple of seconds on all storage nodes in the master cluster; as a result, all the REDO logs are synced to disk. Checking this number on a SQL node in the slave cluster tells you what the last global checkpoint that made it to the slave cluster was. You can run a similar command, SELECT MAX(epoch) FROM mysql.ndb_binlog_index, on any SQL node in the master (source) cluster to find out what the most recent global checkpoint on the master cluster is. Clearly, if your replication channel goes down, then these two numbers will diverge quickly. Use this number (5952824672272 in our example) to find the correct logfile and position that you should connect to.
You can do this by executing the following command on any SQL node in the master (source) cluster that you plan to make the new master, ensuring that you substitute the output of the previous command for the epoch value as follows:

mysql> SELECT
    ->   File,
    ->   Position
    -> FROM mysql.ndb_binlog_index
    -> WHERE epoch > 5952824672272
    -> ORDER BY epoch ASC LIMIT 1;
+--------------------+----------+
| File               | Position |
+--------------------+----------+
| ./node2-bin.000003 | 200998   |
+--------------------+----------+
1 row in set (0.00 sec)

If this returns NULL, firstly, ensure that there has been some activity in your cluster since the failure (if you are using batched updates, then there should be 32 KB of updates or more) and secondly, ensure that there is no active replication channel between the nodes (that is, ensure the primary channel has really failed).

Using the filename and position mentioned previously, run the following command on the backup slave. It is critical that you run these commands on the correct node. The previous command, from which you get the filename and position, must be run on the new master (this is in the "source" cluster). The following command, which tells the new slave which master to connect to and its relevant position and filename, must be executed on the new slave (this is the "destination" cluster). While it is technically possible to connect the old slave to a new master or vice versa, this configuration is not recommended by MySQL and should not be used. If all is okay, then the highlighted rows in the following output will show that the slave thread is running and waiting for the master to send an event.

[NEW slave] mysql> CHANGE MASTER TO MASTER_HOST='10.0.0.2', MASTER_USER='slave', MASTER_PASSWORD='password', MASTER_LOG_FILE='node2-bin.000003', MASTER_LOG_POS=200998;
Query OK, 0 rows affected (0.01 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.2
Master_User: slave
Master_Port: 3306
[snip]
Relay_Master_Log_File: node2-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
[snip]
Seconds_Behind_Master: 233

After a while, the Seconds_Behind_Master value should return to 0 (if the primary replication channel has been down for some time, or if the master cluster has a very high write rate, then this may take some time).

There's more…

It is possible to increase the performance of MySQL Cluster replication by enabling batched updates. This can be accomplished by starting slave mysqld processes with the slave-allow-batching option (or by adding the slave-allow-batching line to the [mysqld] section in my.cnf). This has the effect of applying updates in 32 KB batches rather than as soon as they are received, which generally results in lower CPU usage and higher throughput (particularly when the mean update size is low). A minimal my.cnf fragment is sketched below.
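For reference, the two my.cnf options mentioned in this recipe could sit together in the slave's configuration roughly as follows; this fragment is only an illustrative sketch, and the rest of your [mysqld] section will differ:

# /etc/my.cnf on each slave SQL node (illustrative sketch)
[mysqld]
# prevent the slave threads from starting automatically (set earlier in this recipe)
skip-slave-start
# apply replicated updates in 32 KB batches for higher throughput
slave-allow-batching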
See also

To know more about Replication Compatibility Between MySQL Versions, visit: http://dev.mysql.com/doc/refman/5.1/en/replication-compatibility.html

User-defined partitioning

MySQL Cluster partitions data based on the primary key, unless you configure it otherwise. The main aim of user-defined partitioning is to increase performance by grouping data likely to be involved in common queries onto a single node, thus reducing network traffic between nodes while satisfying queries.

In this recipe, we will show how to define our own partitioning functions. If the NoOfReplicas in the global cluster configuration file is equal to the number of storage nodes, then each storage node contains a complete copy of the cluster data and there is no partitioning involved. Partitioning is only involved when there are more storage nodes than replicas.

Getting ready

Look at the City table in the world dataset; there are two integer fields (ID and Population). MySQL Cluster will choose ID as the default partitioning scheme as follows:

mysql> desc City;
+-------------+----------+------+-----+---------+----------------+
| Field       | Type     | Null | Key | Default | Extra          |
+-------------+----------+------+-----+---------+----------------+
| ID          | int(11)  | NO   | PRI | NULL    | auto_increment |
| Name        | char(35) | NO   |     |         |                |
| CountryCode | char(3)  | NO   |     |         |                |
| District    | char(20) | NO   |     |         |                |
| Population  | int(11)  | NO   |     | 0       |                |
+-------------+----------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

Therefore, a query that searches for a specific ID will use only one partition. In the following example, partition p3 is used:

mysql> explain partitions select * from City where ID=1;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
| 1  | SIMPLE      | City  | p3         | const | PRIMARY       | PRIMARY | 4       | const | 1    |       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

However, searching for a Population involves searching all partitions as follows:

mysql> explain partitions select * from City where Population=42;
+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-----------------------------------+
| id | select_type | table | partitions  | type | possible_keys | key  | key_len | ref  | rows | Extra                             |
+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-----------------------------------+
| 1  | SIMPLE      | City  | p0,p1,p2,p3 | ALL  | NULL          | NULL | NULL    | NULL | 4079 | Using where with pushed condition |
+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-----------------------------------+
1 row in set (0.01 sec)

The first thing to do when considering user-defined partitioning is to decide whether you can improve on the default partitioning scheme. In this case, if your application makes a lot of queries against this table specifying the City ID, it is unlikely that you can improve performance with user-defined partitioning. However, if it makes a lot of queries by the Population and ID fields, it is likely that you can improve performance by switching the partitioning function from a hash of the primary key to a hash of the primary key and the Population field.

How to do it...

In this example, we are going to add the field Population to the partitioning function used by MySQL Cluster. We will add this field to the primary key rather than solely using this field. This is because the City table has an auto-increment field on the ID field, and in MySQL Cluster, an auto-increment field must be part of the primary key.
Firstly, modify the primary key in the table to add the field that we will use to partition the table by:

mysql> ALTER TABLE City DROP PRIMARY KEY, ADD PRIMARY KEY(ID, Population);
Query OK, 4079 rows affected (2.61 sec)
Records: 4079  Duplicates: 0  Warnings: 0

Now, tell MySQL Cluster to use the Population field as a partitioning function as follows:

mysql> ALTER TABLE City partition by key (Population);
Query OK, 4079 rows affected (2.84 sec)
Records: 4079  Duplicates: 0  Warnings: 0

Now, verify that queries executed against this table only use one partition as follows:

mysql> explain partitions select * from City where Population=42;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+-----------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | Extra                             |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+-----------------------------------+
| 1  | SIMPLE      | City  | p3         | ALL  | NULL          | NULL | NULL    | NULL | 4079 | Using where with pushed condition |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+-----------------------------------+
1 row in set (0.01 sec)

Now, notice that queries against the old partitioning function, ID, use all partitions as follows:

mysql> explain partitions select * from City where ID=1;
+----+-------------+-------+-------------+------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | partitions  | type | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------------+------+---------------+---------+---------+-------+------+-------+
| 1  | SIMPLE      | City  | p0,p1,p2,p3 | ref  | PRIMARY       | PRIMARY | 4       | const | 10   |       |
+----+-------------+-------+-------------+------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

Congratulations! You have now set up user-defined partitioning. Now, benchmark your application to see if you have gained an increase in performance.

There's more...

User-defined partitioning can be particularly useful where you have multiple tables and a join. For example, if you had a table of Areas within Cities consisting of an ID field (primary key, auto increment, and default partitioning field) and then a City ID, you would likely find an enormous number of queries that select all of the locations within a certain city and also select the relevant city row. It would therefore make sense to keep all of the rows with the same City value inside the Areas table together on one node, and to keep each of these groups of City values inside the Areas table on the same node as the relevant City row in the City table. This can be achieved by configuring both tables to use the City field as a partitioning function, as we did with the Population field earlier; a sketch follows.
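As a rough illustration only: the Areas table below, including its name and its ID, CityID, and Name columns, is hypothetical and does not come from the original recipe. The idea is simply that both tables hash the shared city identifier with KEY partitioning, so matching rows land in the same partition:

-- Hypothetical Areas table, partitioned on the value it shares with City
mysql> CREATE TABLE Areas (
    ->   ID int(11) NOT NULL AUTO_INCREMENT,
    ->   CityID int(11) NOT NULL,
    ->   Name char(35) NOT NULL,
    ->   PRIMARY KEY (ID, CityID)
    -> ) ENGINE=NDBCLUSTER
    -> PARTITION BY KEY (CityID);

-- In this scenario City stays partitioned on its own ID, so a given city ID
-- hashes to the same partition in both tables:
mysql> ALTER TABLE City PARTITION BY KEY (ID);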

First Principle and a Useful Way to Think

Packt
08 Oct 2015
8 min read
In this article, by Timothy Washington, author of the book Clojure for Finance, we will cover the following topics:

Modeling the stock price activity
Function evaluation
First-class functions
Lazy evaluation
Basic Clojure functions and immutability
Namespace modifications and creating our first function

(For more resources related to this topic, see here.)

Modeling the stock price activity

There are many types of banks. Commercial entities (large stores, parking areas, hotels, and so on) that collect and retain credit card information are either quasi banks, or farm out credit operations to bank-like processing companies. There are the more well-known consumer banks, which accept demand deposits from the public. There are also a range of other banks such as commercial banks, insurance companies and trusts, credit unions, and in our case, investment banks.

As promised, this article will slowly build up a set of lagging price indicators that follow a moving stock price time series. In order to do that, I think it's useful to touch on stock markets, and to crudely model stock price activity. A stock (or equity) market is a collection of buyers and sellers trading economic assets (usually companies). The stock (or shares) of those companies can be equities listed on an exchange (New York Stock Exchange, London Stock Exchange, and others), or may be traded privately. In this exercise, we will do the following:

Crudely model the stock price movement, which will give us a test bed for writing our lagging price indicators
Introduce some basic features of the Clojure language

Function evaluation

The Clojure website has a cheatsheet (http://clojure.org/cheatsheet) with all of the language's core functions. The first function we'll look at is rand, a function that randomly gives you a number within a given range. So in your edgar/ project, launch a REPL with the lein repl shell command. After a few seconds, you will enter the REPL (Read-Eval-Print-Loop). Again, Clojure functions are executed by being placed in the first position of a list, with the function's arguments placed directly afterwards. In your REPL, evaluate (rand 35) or (rand 99) or (rand 43.21) or any number you fancy. Run it many times to see that you get a different floating-point number each time, within 0 and the upper bound of the number you provided.

First-class functions

The next functions we'll look at are repeatedly and fn. repeatedly is a function that takes another function and returns an infinite (or length n, if supplied) lazy sequence of calls to the latter function. This is our first encounter with a function that can take another function. We'll also encounter functions that return other functions. Described as first-class functions, this falls out of lambda calculus and is one of the central features of functional programming. As such, we need to wrap our previous (rand 35) call in another function. fn is one of Clojure's core functions, and produces an anonymous, unnamed function. We can now supply this function to repeatedly. In your REPL, if you evaluate (take 25 (repeatedly (fn [] (rand 35)))), you should see a long list of floating-point numbers with the list's tail elided.

Lazy evaluation

We only took the first 25 elements of the (repeatedly (fn [] (rand 35))) result list, because the list (actually a lazy sequence) is infinite. Lazy evaluation (or laziness) is a common feature in functional programming languages.
Being infinite, Clojure chooses to delay evaluating most of the list until it's needed by some other function that pulls out some values. Laziness benefits us by increasing performance and letting us more easily construct control flow. We can avoid needless calculation, repeated evaluations, and potential error conditions in compound expressions. Let's try to pull out some values with the take function. take itself returns another lazy sequence, of the first n items of a collection. Evaluating (take 25 (repeatedly (fn [] (rand 35)))) will pull out the first 25 repeatedly calls to rand, which generates a float between 0 and 35.

Basic Clojure functions and immutability

There are many operations we can perform over our result list (or lazy sequence). One of the main approaches of functional programming is to take a data structure and perform operations on top of it to produce a new data structure, or some atomic result (a string, number, and so on). This may sound inefficient at first, but most FP languages employ something called immutability to make these operations efficient. Immutable data structures are ones that cannot change once they've been created. This is feasible as most immutable FP languages use some kind of structural data sharing between an original and a modified version. The idea is that if we evaluate (conj [1 2 3] 4), the resulting [1 2 3 4] vector shares the original vector of [1 2 3]. The only additional resource that's assigned is for any novelty that's been introduced to the data structure (the 4). There are more detailed explanations of (for example) Clojure's persistent vectors available elsewhere.

conj: This conjoins an element to a collection—the collection decides where. So conjoining an element to a vector (conj [1 2 3] 4) versus conjoining an element to a list (conj '(1 2 3) 4) yields different results. Try it in your REPL.
map: This passes a function over one or many lists, yielding another list. (map inc [1 2 3]) increments each element by 1.
reduce (or left fold): This passes a function over each element, accumulating one result. (reduce + (take 100 (repeatedly (fn [] (rand 35))))) sums the list.
filter: This constrains the input by some condition.
>=: This is a conditional function, which tests whether the first argument is greater than or equal to the second argument. Try (>= 4 9) and (>= 9 1).
fn: This is a function that creates a function. This unnamed or anonymous function can have any instructions you choose to put in there.

So if we only want numbers above 12, we can put that assertion in a predicate function. Try entering the below expression into your REPL:

(take 25 (filter (fn [x] (>= x 12)) (repeatedly (fn [] (rand 35)))))

Modifying the namespaces and creating our first function

We now have the basis for creating a function. It will return a lazy, infinite sequence of floating-point numbers within an upper and lower bound. defn is a Clojure function which takes an anonymous function and binds a name to it in a given namespace. A Clojure namespace is an organizational tool for mapping human-readable names to things like functions, named data structures, and such. Here, we're going to bind our function to the name generate-prices in our current namespace. You'll notice that our function is starting to span multiple lines. This will be a good time to author the code in your text editor of choice. I'll be using Emacs: open your text editor, and add this code to the file called src/edgar/core.clj. Make sure that (ns edgar.core) is at the top of that file.
After adding the following code, you can then restart the REPL. (load "edgar/core") uses the load function to load the Clojure code in your src/edgar/core.clj:

(defn generate-prices [lower-bound upper-bound]
  (filter (fn [x] (>= x lower-bound))
          (repeatedly (fn [] (rand upper-bound)))))

The Read-Eval-Print-Loop

In our REPL, we can pull in code in various namespaces with the require function. This applies to the src/edgar/core.clj file we've just edited. That code will be in the edgar.core namespace. In your REPL, evaluate (require '[edgar.core :as c]). c is just a handy alias we can use instead of the long name. You can then generate random prices within an upper and lower bound. Take the first 10 of them like this: (take 10 (c/generate-prices 12 35)). You should see results akin to the following output. All elements should be within the range of 12 to 35:

(29.60706184716407 12.507593971664075 19.79939384292759 31.322074615579716 19.737852534147326 25.134649707849572 19.952195022152488 12.94569843904663 23.618693004455086 14.695872710062428)

There's a subtle abstraction in the preceding code that deserves attention. (require '[edgar.core :as c]) introduces the quote symbol. ' is the reader shorthand for the quote function. So the equivalent invocation would be (require (quote [edgar.core :as c])). Quoting a form tells the Clojure reader not to evaluate the subsequent expression (or form). So evaluating '(a b c) returns a list of three symbols, without trying to evaluate a. Even though those symbols haven't yet been assigned, that's okay, because that expression (or form) has not yet been evaluated.

But that begs a larger question. What is the reader? Clojure (and all Lisps) are what's known as homoiconic. This means that Clojure code is also data, and data can be directly output and evaluated as code. The reader is the thing that parses our src/edgar/core.clj file (or (+ 1 1) input from the REPL prompt), and produces the data structures that are evaluated. read and eval are the two essential processes by which Clojure code runs. The evaluation result is printed (or output), usually to the standard output device. Then we loop the process back to the read function. So, when the REPL reads your src/edgar/core.clj file, it's directly transforming that text representation into data and evaluating it. A few things fall out of that. For example, it becomes simpler for Clojure programs to directly read, transform, and write out other Clojure programs. The implications of that will become clearer when we look at macros. But for now, know that there are ways to modify or delay the evaluation process, in this case by quoting a form.

Summary

In this article, we learned about basic features of the Clojure language and how to model stock price activity. Besides these, we also learned about function evaluation, first-class functions, lazy evaluation, namespace modifications, and creating our first function.

Resources for Article:

Further resources on this subject: Performance by Design [article], Big Data [article], The Observer Pattern [article]
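As a small taste of where this is heading (the lagging price indicators mentioned at the start), here is a hedged sketch, not taken from the book, that combines reduce, count, and take from this article to average the first few generated prices. The average name is invented for this example and it assumes the c alias from the require call above:

;; A minimal sketch, assuming (require '[edgar.core :as c]) has been evaluated.
;; `average` is a helper name invented for this example.
(defn average [prices]
  (/ (reduce + prices) (count prices)))

;; Average the first 20 generated prices (a crude, single-window indicator).
(average (take 20 (c/generate-prices 12 35)))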

Event detection from the news headlines in Hadoop

Packt
08 Dec 2016
13 min read
In this article by Anurag Shrivastava, author of Hadoop Blueprints, we will learn how to build a text analytics system that detects specific events in random news headlines. The Internet has become the main source of news in the world. There are thousands of websites which constantly publish and update news stories from around the world. Not every news item is relevant for everyone, but some news items are very critical for some people or businesses. For example, if you were a major car manufacturer based in Germany with suppliers located in India, then you would be interested in news from the region which could affect your supply chain.

(For more resources related to this topic, see here.)

Road accidents in India are a major social and economic problem. Road accidents leave a large number of fatalities behind and result in the loss of capital. In this example, we will build a system which detects whether a news item refers to a road accident event. Let us define what we mean by that in the next paragraph. A road accident event may or may not result in fatal injuries. One or more vehicles and pedestrians may be involved in the accident. A non-road-accident-event news item is everything else which cannot be categorized as a road accident event; it could be a trend analysis related to road accidents, or something totally unrelated.

Technology stack

To build this system, we will use the following technologies:

Data storage: HDFS
Data processing: Hadoop MapReduce
Query engine: Hive and Hive UDF
Data ingestion: Curl and HDFS copy
Event detection: OpenNLP

The event detection system is a machine-learning-based natural language processing system. The natural language processing system brings the intelligence to detect the events in the random headline sentences from the news items.

OpenNLP

The Open Source Natural Language Processing Framework (OpenNLP) is from the Apache Software Foundation. You can download version 1.6.0 from https://opennlp.apache.org/ to run the examples in this blog. It is capable of detecting entities, document categories, parts of speech, and so on in text written by humans. We will use the document categorization feature of OpenNLP in our system. Document categorization requires you to train the OpenNLP model with the help of sample text. As a result of training, we get a model. This resulting model is used to categorize new text. Our training data looks as follows:

r 1.46 lakh lives lost on Indian roads last year - The Hindu.
r Indian road accident data | OpenGovernmentData (OGD) platform...
r 400 people die everyday in road accidents in India: Report - India TV.
n Top Indian female biker dies in road accident during country-wide tour.
n Thirty die in road accidents in north India mountains—World—Dunya...
n India's top woman biker Veenu Paliwal dies in road accident: India...
r Accidents on India's deadly roads cost the economy over $8 billion...
n Thirty die in road accidents in north India mountains (The Express)

The first column can take two values:

n indicates that the news item is a road accident event
r indicates that the news item is not a road accident event, that is, everything else

This training set has a total of 200 lines. Please note that OpenNLP requires at least 15000 lines in the training set to deliver good results. Because we do not have so much training data, we will start with a small set but remain aware of the limitations of our model.
You will see that even with a small training dataset, this model works reasonably well. Let us train and build our model:

$ opennlp DoccatTrainer -model en-doccat.bin -lang en -data roadaccident.train.prn -encoding UTF-8

Here the file roadaccident.train.prn contains the training data. The output file en-doccat.bin contains the model which we will use in our data pipeline. We have built our model using the command-line utility, but it is also possible to build the model programmatically. The training data file is a plain text file, which you can expand with a bigger corpus of knowledge to make the model smarter. Next we will build the data pipeline as follows:

Fetch RSS feeds

This component will fetch RSS news feeds from popular news websites. In this case, we will just use one feed from Google. We can always add more sites after our first RSS feed has been integrated. The whole RSS feed can be downloaded using the following command:

$ curl "https://news.google.com/news?cf=all&hl=en&ned=in&topic=n&output=rss"

The previous command downloads the news headlines for India. You can customize the RSS feed by visiting the Google news site at https://news.google.com for your region.

Scheduler

Our scheduler will fetch the RSS feed once every 6 hours. Let us assume that in a 6-hour time interval, we have a good likelihood of fetching fresh news items. We will wrap our feed-fetching script in a shell file and invoke it using cron. The script is as follows:

$ cat feedfetch.sh
NAME="newsfeed-"`date +%Y-%m-%dT%H.%M.%S`
curl "https://news.google.com/news?cf=all&hl=en&ned=in&topic=n&output=rss" > $NAME
hadoop fs -put $NAME /xml/rss/newsfeeds

The cron job setup line will be as follows:

0 */6 * * * /home/hduser/mycommand

Please edit your cron job table using the following command and add the setup line to it:

$ crontab -e

Loading data in HDFS

To load data into HDFS, we will use the HDFS put command, which copies the downloaded RSS feed into a directory in HDFS. Let us make this directory in HDFS where our feed fetcher script will store the RSS feeds:

$ hadoop fs -mkdir /xml/rss/newsfeeds

Query using Hive

First we will create an external table in Hive for the new RSS feeds. Using XPath-based select queries, we will extract the news headlines from the RSS feeds.
These headlines will be passed to the UDF to detect the categories:

CREATE EXTERNAL TABLE IF NOT EXISTS rssnews(
  document STRING)
COMMENT 'RSS Feeds from media'
STORED AS TEXTFILE
location '/xml/rss/newsfeeds';

The following command parses the XML to retrieve the titles, or headlines, from the XML and explodes them into a single-column table:

SELECT explode(xpath(document, '//item/title/text()')) FROM rssnews;

The sample output of the above command on my system is as follows:

hive> select explode(xpath(document, '//item/title/text()')) from rssnews;
Query ID = hduser_20161010134407_dcbcfd1c-53ac-4c87-976e-275a61ac3e8d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1475744961620_0016, Tracking URL = http://localhost:8088/proxy/application_1475744961620_0016/
Kill Command = /home/hduser/hadoop-2.7.1/bin/hadoop job -kill job_1475744961620_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-10-10 14:46:14,022 Stage-1 map = 0%, reduce = 0%
2016-10-10 14:46:20,464 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.69 sec
MapReduce Total cumulative CPU time: 4 seconds 690 msec
Ended Job = job_1475744961620_0016
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.69 sec HDFS Read: 120671 HDFS Write: 1713 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 690 msec
OK
China dispels hopes of early breakthrough on NSG, sticks to its guns on Azhar - The Hindu
Pampore attack: Militants holed up inside govt building; combing operations intensify - Firstpost
CPI(M) worker hacked to death in Kannur - The Hindu
Akhilesh Yadav's comment on PM Modi's Lucknow visit shows Samajwadi Party's insecurity: BJP - The Indian Express
PMO maintains no data about petitions personally read by PM - Daily News & Analysis
AIADMK launches social media campaign to put an end to rumours regarding Amma's health - Times of India
Pakistan, India using us to play politics: Former Baloch CM - Times of India
Indian soldier, who recited patriotic poem against Pakistan, gets death threat - Zee News
This Dussehra effigies of 'terrorism' to go up in flames - Business Standard
'Personal reasons behind Rohith's suicide': Read commission's report - Hindustan Times
Time taken: 5.56 seconds, Fetched: 10 row(s)

Hive UDF

Our Hive User Defined Function (UDF) categorizeDoc takes a news headline and suggests whether it is a news item about a road accident event, as we explained earlier.
This function is as follows:

package com.mycompany.app;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import opennlp.tools.util.InvalidFormatException;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;
import java.lang.String;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.IOException;

@Description(
  name = "getCategory",
  value = "_FUNC_(string) - gets the category of a document ")
public final class MyUDF extends UDF {

  public Text evaluate(Text input) {
    if (input == null) return null;
    try {
      return new Text(categorizeDoc(input.toString()));
    } catch (Exception ex) {
      ex.printStackTrace();
      return new Text("Sorry Failed: >> " + input.toString());
    }
  }

  public String categorizeDoc(String doc) throws InvalidFormatException, IOException {
    // Load the trained document categorization model from the local directory
    InputStream is = new FileInputStream("./en-doccat.bin");
    DoccatModel model = new DoccatModel(is);
    is.close();
    // Categorize the headline and return the best category (n or r)
    DocumentCategorizerME classificationME = new DocumentCategorizerME(model);
    String documentContent = doc;
    double[] classDistribution = classificationME.categorize(documentContent);
    String predictedCategory = classificationME.getBestCategory(classDistribution);
    return predictedCategory;
  }
}

The function categorizeDoc takes a single string as input. It loads the model, which we created earlier, from the file en-doccat.bin in the local directory. Finally, it calls the classifier, which returns the result to the calling function. The wrapper class MyUDF extends the Hive UDF class; its evaluate method calls categorizeDoc for each input line. If the call succeeds, the value is returned to the calling program; otherwise a message is returned which indicates that the category detection has failed.
The pom.xml file to build the above file is as follows:

$ cat pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mycompany</groupId>
  <artifactId>app</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.1</version>
      <type>jar</type>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>2.0.0</version>
      <type>jar</type>
    </dependency>
    <dependency>
      <groupId>org.apache.opennlp</groupId>
      <artifactId>opennlp-tools</artifactId>
      <version>1.6.0</version>
    </dependency>
  </dependencies>
  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.8</version>
        </plugin>
        <plugin>
          <artifactId>maven-assembly-plugin</artifactId>
          <configuration>
            <archive>
              <manifest>
                <mainClass>com.mycompany.app.App</mainClass>
              </manifest>
            </archive>
            <descriptorRefs>
              <descriptorRef>jar-with-dependencies</descriptorRef>
            </descriptorRefs>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>

You can build the jar with all the dependencies in it using the following command:

$ mvn clean compile assembly:single

The resulting jar file app-1.0-jar-with-dependencies.jar can be found in the target directory. Let us use this jar file in Hive to categorize the news headlines as follows:

Copy the jar file to the bin subdirectory in the Hive root:

$ cp app-1.0-jar-with-dependencies.jar $HIVE_ROOT/bin

Copy the trained model to the bin subdirectory in the Hive root:

$ cp en-doccat.bin $HIVE_ROOT/bin

Now run the categorization queries. Run Hive:

$ hive

Add the jar file in Hive:

hive> ADD JAR ./app-1.0-jar-with-dependencies.jar;

Create a temporary categorization function catDoc:

hive> CREATE TEMPORARY FUNCTION catDoc as 'com.mycompany.app.MyUDF';

Create a table headlines to hold the headlines extracted from the RSS feed:

hive> create table headlines( headline string);

Insert the extracted headlines into the table headlines:

hive> insert overwrite table headlines select explode(xpath(document, '//item/title/text()')) from rssnews;

Let's test our UDF by manually passing a real news headline to it from a newspaper website:

hive> select catDoc("8 die as SUV falls into river while crossing bridge in Ghazipur");
OK
N

The output is N, which means this is indeed a headline about a road accident incident.
This is reasonably good, so now let us run this function for all the headlines:

hive> select headline, catDoc(*) from headlines;
OK
China dispels hopes of early breakthrough on NSG, sticks to its guns on Azhar - The Hindu   r
Pampore attack: Militants holed up inside govt building; combing operations intensify - Firstpost   r
Akhilesh Yadav Backs Rahul Gandhi's 'Dalali' Remark - NDTV   r
PMO maintains no data about petitions personally read by PM Narendra Modi - Economic Times   n
Mobile Internet Services Suspended In Protest-Hit Nashik - NDTV   n
Pakistan, India using us to play politics: Former Baloch CM - Times of India   r
CBI arrests Central Excise superintendent for taking bribe - Economic Times   n
Be extra vigilant during festivals: Centre's advisory to states - Times of India   r
CPI-M worker killed in Kerala - Business Standard   n
Burqa-clad VHP activist thrashed for sneaking into Muslim women gathering - The Hindu   r
Time taken: 0.121 seconds, Fetched: 10 row(s)

You can see that our headline detection function works and outputs r or n. In the above example, we see several false positives, where a headline has been incorrectly identified as a road accident. Better training of our model can improve the quality of our results.

Further reading

The book Hadoop Blueprints covers several case studies where we can apply Hadoop, HDFS, data ingestion tools such as Flume and Sqoop, query and visualization tools such as Hive and Zeppelin, and machine learning tools such as BigML and Spark to build solutions. You will discover, for example, how to build a fraud detection system using Hadoop, or how to build a Data Lake.

Summary

In this article we learned how to build a text analytics system which detects specific events in random news headlines. We also covered how to apply Hadoop, HDFS, and other related tools along the way.

Resources for Article:

Further resources on this subject: Spark for Beginners [article], Hive Security [article], Customizing heat maps (Intermediate) [article]

Author Podcast - Bob Griesemer on Oracle Warehouse Builder 11g

Packt
09 Apr 2010
1 min read
Click here to download the interview, or hit play in the media player below.    

Securing Your Data

Packt
12 Oct 2015
6 min read
In this article by Tyson Cadenhead, author of Socket.IO Cookbook, we will explore several topics related to security in Socket.IO applications. These topics will cover the gamut, from authentication and validation to how to use the wss:// protocol for secure WebSockets. As the WebSocket protocol opens innumerable opportunities to communicate more directly between the client and the server, people often wonder if Socket.IO is actually as secure as something such as the HTTP protocol. The answer to this question is that it depends entirely on how you implement it. WebSockets can easily be locked down to prevent malicious or accidental security holes, but as with any API interface, your security is only as tight as your weakest link.

In this article, we will cover the following topics:

Locking down the HTTP referrer
Using secure WebSockets

(For more resources related to this topic, see here.)

Locking down the HTTP referrer

Socket.IO is really good at getting around cross-domain issues. You can easily include the Socket.IO script from a different domain on your page, and it will just work as you may expect it to. There are some instances where you may not want your Socket.IO events to be available on every other domain. Not to worry! We can easily whitelist only the HTTP referrers that we want, so that some domains will be allowed to connect and other domains won't.

How To Do It…

To lock down the HTTP referrer and only allow events to whitelisted domains, follow these steps:

Create two different servers that can connect to our Socket.IO instance. We will let one server listen on port 5000 and the second server listen on port 5001:

var express = require('express'),
    app = express(),
    http = require('http'),
    socketIO = require('socket.io'),
    server,
    server2,
    io;

app.get('/', function (req, res) {
  res.sendFile(__dirname + '/index.html');
});

server = http.Server(app);
server.listen(5000);

server2 = http.Server(app);
server2.listen(5001);

io = socketIO(server);

When the connection is established, check the referrer in the headers. If it is a referrer that we want to give access to, we can let our connection perform its tasks and build up events as normal. If a blacklisted referrer, such as the one on port 5001 that we created, attempts a connection, we can politely decline and perhaps throw an error message back to the client, as shown in the following code:

io.on('connection', function (socket) {
  switch (socket.request.headers.referer) {
    case 'http://localhost:5000/':
      socket.emit('permission.message', 'Okay, you\'re cool.');
      break;
    default:
      return socket.emit('permission.message', 'Who invited you to this party?');
      break;
  }
});

On the client side, we can listen to the response from the server and react as appropriate using the following code:

socket.on('permission.message', function (data) {
  document.querySelector('h1').innerHTML = data;
});

How It Works…

The referrer is always available in the socket.request.headers object of every socket, so we will be able to inspect it there to check whether it came from a trusted source. In our case, we will use a switch statement to whitelist our domain on port 5000, but we could really use any mechanism at our disposal to perform the task. For example, if we need to dynamically whitelist domains, we can store a list of them in our database and search for it when the connection is established, as sketched below.
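As a rough illustration of that last point, the following hypothetical sketch (not from the recipe) replaces the switch statement with a lookup against a list of allowed referrers; loadWhitelistedReferrers is a made-up placeholder for your own database query:

// Hypothetical example: look the referrer up in a whitelist loaded from storage.
// loadWhitelistedReferrers stands in for whatever database call you actually use.
function loadWhitelistedReferrers(callback) {
  // e.g. query MongoDB/Redis/SQL here; hard-coded for this sketch
  callback(['http://localhost:5000/', 'http://example.com/']);
}

io.on('connection', function (socket) {
  loadWhitelistedReferrers(function (whitelist) {
    if (whitelist.indexOf(socket.request.headers.referer) !== -1) {
      socket.emit('permission.message', 'Okay, you\'re cool.');
    } else {
      socket.emit('permission.message', 'Who invited you to this party?');
      socket.disconnect(true);
    }
  });
});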
Using secure WebSockets

WebSocket communications can either take place over the ws:// protocol or the wss:// protocol. In similar terms, they can be thought of as the HTTP and HTTPS protocols, in the sense that one is secure and one isn't. Secure WebSockets are encrypted by the transport layer, so they are safer to use when you handle sensitive data. In this recipe, you will learn how to force our Socket.IO communications to happen over the wss:// protocol for an extra layer of encryption.

Getting Ready…

In this recipe, we will need to create a self-signed certificate so that we can serve our app locally over the HTTPS protocol. For this, we will need an npm package called pem. This allows you to create a self-signed certificate that you can provide to your server. Of course, in a real production environment, we would want a true SSL certificate instead of a self-signed one. To install pem, simply call npm install pem --save.

As our certificate is self-signed, you will probably see something similar to the following screenshot when you navigate to your secure server:

Just take a chance by clicking on the Proceed to localhost link. You'll see your application load using the HTTPS protocol.

How To Do It…

To use the secure wss:// protocol, follow these steps:

First, create a secure server using the built-in node HTTPS package. We can create a self-signed certificate with the pem package so that we can serve our application over HTTPS instead of HTTP, as shown in the following code:

var https = require('https'),
    pem = require('pem'),
    express = require('express'),
    app = express(),
    socketIO = require('socket.io');

// Create a self-signed certificate with pem
pem.createCertificate({
  days: 1,
  selfSigned: true
}, function (err, keys) {

  app.get('/', function (req, res) {
    res.sendFile(__dirname + '/index.html');
  });

  // Create an https server with the certificate and key from pem
  var server = https.createServer({
    key: keys.serviceKey,
    cert: keys.certificate
  }, app).listen(5000);

  var io = socketIO(server);

  io.on('connection', function (socket) {
    var protocol = 'ws://';

    // Check the handshake to determine if it was secure or not
    if (socket.handshake.secure) {
      protocol = 'wss://';
    }

    socket.emit('hello.client', {
      message: 'This is a message from the server. It was sent using the ' + protocol + ' protocol'
    });
  });

});

In your client-side JavaScript, specify secure: true when you initialize your WebSocket as follows:

var socket = io('//localhost:5000', {
  secure: true
});

socket.on('hello.client', function (data) {
  console.log(data);
});

Now, start your server and navigate to https://localhost:5000. Proceed to this page. You should see a message in your browser developer tools that says, "This is a message from the server. It was sent using the wss:// protocol."

How It Works…

The protocol of our WebSocket is actually set automatically based on the protocol of the page that it sits on. This means that a page served over the HTTP protocol will send the WebSocket communications over ws:// by default, and a page served by HTTPS will default to using the wss:// protocol. However, by setting the secure option to true, we told the WebSocket to always serve through wss:// no matter what.

Summary

In this article, we gave you an overview of topics related to security in Socket.IO applications.

Resources for Article:

Further resources on this subject: Using Socket.IO and Express together [article], Adding Real-time Functionality Using Socket.io [article], Welcome to JavaScript in the full stack [article]

FAQs on Microsoft SQL Server 2008 High Availability

Packt
28 Jan 2011
6 min read
Microsoft SQL Server 2008 High Availability Minimize downtime, speed up recovery, and achieve the highest level of availability and reliability for SQL server applications by mastering the concepts of database mirroring,log shipping,clustering, and replication  Install various SQL Server High Availability options in a step-by-step manner  A guide to SQL Server High Availability for DBA aspirants, proficient developers and system administrators  Learn the pre and post installation concepts and common issues you come across while working on SQL Server High Availability  Tips to enhance performance with SQL Server High Availability  External references for further study Q: What is Clustering? A: Clustering is usually deployed when there is a critical business application running that needs to be available 24 X 7 or in terminology—High Availability. These clusters are known as Failover clusters because the primary goal to set up the cluster is to make services or business processes that are critical for business and should be available 24 X 7 with 99.99% up time. Q: How does MS Windows server Enterprise and Datacenter edition support failover clustering? A: MS Windows server Enterprise and Datacenter edition supports failover clustering. This is achieved by having two or more identical nodes connected to each other by means of private network and commonly used resources. In case of failure of any common resource or services, the first node (Active) passes the ownership to another node (Passive). Q: What is MSDTC? A: Microsoft Distributed Transaction Coordinator (MSDTC) is a service used by the SQL Server when it is required to have distributed transactions between more than one machine. In a clustered environment, SQL Server service can be hosted on any of the available nodes if the active node fails, and in this case MSDTC comes into the picture in case we have distributed queries and for replication, and hence the MSDTC service should be running. Following are a couple of questions with regard to MSDTC. Q: What will happen to the data that is being accessed? A: The data is taken care of, by shared disk arrays as it is shared and every node that is part of the cluster can access it; however, one node at a time can access and own it. Q: What about clients that were connected previously? Does the failover mean that developers will have to modify the connection string? A: Nothing like this happens. SQL Server is installed as a virtual server and it has a virtual IP address and that too is shared by every cluster node. So, the client actually knows only one SQL Server or its IP address. Here are the steps that explain how Failover will work: Node 1 owns the resources as of now, and is active node. The network adapter driver gets corrupted or suffers a physical damage. Heartbeat between Node1 and Node 2 is broken. Node 2 initiates the process to take ownership of the resources owned by the Node 1. It would approximately take two to five minutes to complete the process. Q: What is Hyper-V? What are their uses? A: Let's see what the Hyper-V is: It is a hypervisor-based technology that allows multiple operating systems to run on a host operating system at the same time. It has advantages of using SQL Server 2008 R2 on Windows Server 2008 R2 with Hyper-V. One such example could be the ability to migrate a live server, thereby increasing high availability without incurring downtime, among others. Hyper-V now supports up to 64 logical processors. 
It can host up to four VMs on a single licensed host server. SQL Server 2008 R2 allows an unrestricted number of virtual servers, thus making consolidation easy. It has the ability to manage multiple SQL Servers centrally using Utility Control Point (UCP). Sysprep utility can be used to create preconfigured VMs so that SQL Server deployment becomes easier. Q: What are the Hardware, Software and Operating system requirements for installing SQL Server 2008 R2? A: The following are the hardware requirements: Processor: Intel Pentium 3 or Higher Processor Speed: 1 GHZ or Higher RAM: 512 MB of RAM but 2 GB is recommended Display : VGA or Higher The following are the software requirements: Operating system: Windows 7 Ultimate, Windows Server 2003 (x86 or x64), Windows Server 2008 (x86 or x64) Disk space: Minimum 1 GB .Net Framework 3.5 Windows Installer 4.5 or later MDAC 2.8 SP1 or later The following are the operating system requirements for clustering: To install SQL Server 2008 clustering, it's essential to have Windows Server 2008 Enterprise or Data Center Edition installed on our host system with Full Installation, so that we don't have to go back and forth and install the required components and restart the system. Q: What is to be done when we see the network binding warning coming up? A: In this scenario, we will have to go to Network and Sharing Center | Change Adapter Settings. Once there, pressing Alt + F, we will select Advanced Settings. Select Public Network and move it up if it is not and repeat this process on the second node. Q: What is the difference between Active/Passive and Active/Active Failover Cluster? A: In reality, there is only one difference between Single-instance (Active/Passive Failover Cluster) and Multi-instance (Active/Active Failover Cluster). As its name suggests, in a Multi-instance cluster, there will be two or more SQL Server active instances running in a cluster, compared to one instance running in Single-instance. Also, to configure a multi-instance Cluster, we may need to procure additional disks, IP addresses, and network names for the SQL Server. Q: What is the benefit of having Multi-instance, that is, Active/Active configuration? A: Depending on the business requirement and the capability of our hardware, we may have one or more instances running in our cluster environment. The main goal is to have a better uptime and better High Availability by having multiple SQL Server instances running in an environment. Should anything go wrong with the one SQL Server instance, another instance can easily take over the control and keep the business-critical application up and running! Q: What will be the difference in the prerequisites for the Multi-instance Failover Cluster as compared to the Single-instance Failover Cluster? A: There will be no difference compared to a Single-instance Failover Cluster, except that we need to procure additional disk(s), network name, and IP addresses. We need to make sure that our hardware is capable of handling requests that come from client machines for both the instances. Installing a Multi-instance cluster is almost similar to adding a Single-instance cluster, except for the need to add a few resources along with a couple of steps here and there.
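The questions above stay at the conceptual level, but once an instance is installed on a cluster you can confirm its state directly from T-SQL. The following is a quick sketch; SERVERPROPERTY and the sys.dm_os_cluster_nodes view are available in SQL Server 2008, although the exact columns exposed by the DMV can differ slightly between versions:

-- Is this instance clustered, and which node is it currently running on?
SELECT SERVERPROPERTY('IsClustered') AS IsClustered,
       SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentNode;

-- Which nodes are able to host this clustered instance?
SELECT NodeName FROM sys.dm_os_cluster_nodes;

Running the first query after a failover is a simple way to verify that the passive node has indeed taken ownership, as described in the failover steps earlier.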

Measuring Performance with Key Performance Indicators

Packt
10 Jul 2013
4 min read
(For more resources related to this topic, see here.) Creating the KPIs and the KPI watchlists We're going to create Key Performance Indicators and watchlists in the first recipe. There should be comparable measure columns in the repository in order to create KPI objects. The following columns will be used in the sample scenario: Shipped Quantity Requested Quantity How to do it Click on the KPI link in the Performance Management section and you're going to select a subject area. The KPI creation wizard has five different steps. The first step is the General Propertiessection and we're going to write a description for the KPI object. The Actual Value and the Target Value attributes display the columns that we'll use in this scenario. The columns should be selected manually. The Enable Trendingcheckbox is not selected by default. When you select the checkbox, trending options will appear on the screen. We're going to select the Day level from the Time hierarchy for trending in the Compare to Prior textbox and define a value for the Tolerance attribute. We're going to use 1 and % Change in this scenario. Clicking on the Next button will display the second step named Dimensionality. Click on the Add button to select Dimension attributes. Select the Region column in the Add New Dimension window. After adding the Region column, repeat the step for the YEAR column. You shouldn't select any value to pin. Both column values will be . Clicking on Next button will display the third step named States. You can easily configure the state values in this step. Select the High Values are Desirable value from the Goal drop-down list. By default, there are three steps: OK Warning Critical Then click on the Next button and you'll see the Related Documents step. This is a list of supporting documents and links regarding to the Key Performance Indicator. Click on the Add button to select one of the options. If you want to use another analysis as a supporting document, select the Catalog option and choose the analysis that consists of some valuable information about the report. We're going to add a link. You can also easily define the address of the link. We'll use the http://www.abc.com/portal link. Click on the Next button to display the Custom Attributes column values. To add a custom attribute that will be displayed in the KPI object, click on the Add button and define the values specified as follows: Number: 1 Label: Dollars Formula: "Fact_Sales"."Dollars" Save the KPI object by clicking on the Save button. Right after saving the KPI object, you'll see the KPI content. KPI objects cannot be published in the dashboards directly. We need KPI watchlists to publish them in the dashboards. Click on the KPI Watchlist link in the Performance Managementsection to create one. The New KPI Watchlist page will be displayed without any KPI objects. Drag-and-drop the KPI object that was previously created from the Catalog pane onto the KPI watchlist list. When you drop the KPI object, the Add KPI window will pop up automatically. You can select one of the available values for the dimensions. We're going to select the Use Point-of-View option. Enter a Label value, A Sample KPI, for this example. You'll see the dimension attributes in the Point-of-View bar. You can easily select the values from the drop-down lists to have different perspectives. Save the KPI watchlist object. How it works KPI watchlists can contain multiple KPI objects based on business requirements. 
These container objects can be published in the dashboards so that end users will access the content of the KPI objects through the watchlists. When you want to publish these watchlists, you'll need to select a value for the dimension attributes. There's more The Drill Down feature is also enabled in the KPI objects. If you want to access finer levels, you can just click on the hyperlink of the value you are interested in and a detailed level is going to be displayed automatically. Summary In this article, we learnt how to create KPIs and KPI watchlists. Key Performance Indicators are building blocks of strategy management. In order to implement balanced scorecard management technique in an organization, you'll first need to create the KPI objects. Resources for Article : Further resources on this subject: Oracle Integration and Consolidation Products [Article] Managing Oracle Business Intelligence[Article] Oracle Tools and Products [Article]

Setting goals

Packt
18 Dec 2013
8 min read
(for more resources related to this topic, see here.) You can use the following code to track the event: [Flurry logEvent:@"EVENT_NAME"]; The logEvent: method logs your event every time it's triggered during the application session. This method helps you to track how often that event is triggered. You can track up to 300 different event IDs. However, the length of each event ID should be less than 255 characters. After the event is triggered, you can track that event from your Flurry dashboard. As is explained in the following screenshot, your events will be listed in the Events section. After clicking on Event Summary, you can see a list of the events you have created along with the statistics of the generated data as shown in the following screenshot: You can fetch the detailed data by clicking on the event name (for example, USER_VIEWED). This section will provide you with a chart-based analysis of the data as shown in the following screenshot: The Events Per Session chart will provide you with details about how frequently a particular event is triggered in a session. Other than this, you are provided with the following data charts as well: Unique Users Performing Event: This chart will explain the frequency of unique users triggering the event. Total Events: This chart holds the event-generation frequency over a time period. You can access the frequency of the event being triggered over any particular time slot. Average Events Per Session: This charts holds the average frequency of the events that happen per session. There is another variation of this method, as shown in the following code, which allows you to track the events along with the specific user data provided: [Flurry logEvent:@"EVENT_NAME" withParameters:YOUR_NSDictionary]; This version of the logEvent: method counts the frequency of the event and records dynamic parameters in the form of dictionaries. External parameters should be in the NSDictionary format, whereas both the key and the value should be in the NSString object format. Let's say you want to track how frequently your comment section is used and see the comments, then you can use this method to track such events along with the parameters. You can track up to 300 different events with an event ID length less than 255 characters. You can provide a maximum of 10 event parameters per event. The following example illustrates the use of the logEvent: method along with optional parameters in the dictionary format: NSDictionary *dictionary = [NSDictionary dictionaryWithObjectsAndKeys:@"your dynamic parameter value", @"your dynamic parameter name",nil]; [Flurry logEvent:@"EVENT_NAME" withParameters:dictionary]; In case you want Flurry to log all your application sections/screens automatically, then you should pass navigationController or as a parameter to count all your pages automatically using one of the following code: [Flurry logAllPageViews:navigationController]; [Flurry logAllPageViews:tabBarController]; The Flurry SDK will create a delegate on your navigationControlleror tabBarController object, whichever is provided to detect the page's navigation. Each navigation detected will be tracked by the Flurry SDK automatically as a page view. You only need to pass each object to the Flurry SDK once. However you can pass multiple instances of different navigation and tab bar controllers. There can be some cases where you can have a view controller that is not associated with any navigation or tab bar controller. 
Then you can use the following code:

[Flurry logPageView];

The preceding code will track the event independently of navigation and tab bar controllers. For each user interaction you can manually log events.

Tracking time spent

Flurry also allows you to track events based on their duration. You can use the [Flurry logEvent: timed:] method to log a timed event as shown in the following code:

[Flurry logEvent:@"EVENT_NAME" timed:YES];

In case you want to pass additional parameters along with the event name, you can use the following form of the logEvent: method to start a timed event with event parameters:

[Flurry logEvent:@"EVENT_NAME" withParameters:YOUR_NSDictionary timed:YES];

The aforementioned method can help you to track your timed event along with the dynamic data provided in the dictionary format. You can end all your timed events before the application exits. This can even be accomplished by updating the event with new event parameters. If you want to end your events without updating the parameters, you can pass nil as the parameters. If you do not end your events, they will automatically end when the application exits. Ending a timed event looks like the following code:

[Flurry endTimedEvent:@"EVENT_NAME" withParameters:YOUR_NSDictionary];

Let's take the following example in which you want to log an event whenever a user comments on any article in your application:

NSDictionary *commentParams =
  [NSDictionary dictionaryWithObjectsAndKeys:
    @"User_Comment", @"Comment",    // Capture comment info
    @"Registered", @"User_Status",  // Capture user status
    nil];

[Flurry logEvent:@"User_Comment" withParameters:commentParams timed:YES];

// In a function that captures when a user posts the comment
[Flurry endTimedEvent:@"User_Comment" withParameters:nil];
// You can pass in additional params or update existing ones here as well

The preceding piece of code will log a timed event every time a user comments on an article in your application. While tracking the event, you are also tracking the comment and the user's registration status by specifying them in the dictionary.

Tracking errors

Flurry provides you with a method to track errors as well. You can use the following method to track errors on Flurry:

[Flurry logError:@"ERROR_NAME" message:@"ERROR_MESSAGE" exception:e];

You can track exceptions and errors that occur in the application by providing the name of the error (ERROR_NAME) along with a message, such as ERROR_MESSAGE, and an exception object. Flurry reports the first ten errors in each session. You can fetch all the application exceptions, and specifically uncaught exceptions, on Flurry. You can use the logError:message:exception: class method to catch all the uncaught exceptions. These exceptions will be logged in Flurry in the Error section, which is accessible on the Flurry dashboard:

// Uncaught Exception Handler - sent through Flurry.
void uncaughtExceptionHandler(NSException *exception) {
  [Flurry logError:@"Uncaught" message:@"Crash" exception:exception];
}

- (void)applicationDidFinishLaunching:(UIApplication *)application {
  NSSetUncaughtExceptionHandler(&uncaughtExceptionHandler);
  [Flurry startSession:@"YOUR_API_KEY"];
  // ....
}

Flurry also helps you to catch all the uncaught exceptions generated by the application. All the exceptions will be caught by using the NSSetUncaughtExceptionHandler function, to which you pass a handler that will catch all the exceptions raised during the application session.
All the errors reported can also be tracked using the logError:message:error: method. You can pass the error name, message, and object to log the NSError error on Flurry as shown in the following code: - (void) webView:(UIWebView *)webView didFailLoadWithError:(NSError *)error { [Flurry logError:@"WebView No Load" message:[error localizedDescription] error:error]; } Tracking versions When you develop applications for mobile devices, it's obvious that you will evolve your application at every stage, pushing the latest updates for the application, which creates a new version of the application on the application store. To track the application based on these versions, you need to set up the Flurry to track your application versions as well. This can be done using the following code: [Flurry setAppVersion:App_Version_Number]; So by using the aforementioned method, you can track your application based on its version. For example, if you have released an application and unfortunately it's having a critical bug, then you can track your application based on the current version and the errors that are tracked by Flurry from the application. You can access data generated from Flurry's Dashboards by navigating to Flurry Classic. This will, by default, load a time-based graph of the application session for all versions. However, you can access the user session graph by selecting a version from the drop-down list as shown in the following screenshot: This is how the drop-down list will appear. Select a version and click on Update as shown in the following screenshot: The previous action will generate a version-based graph for a user's session with the as number of times users have opened the app in the given time frame shown in the following screenshot: Along with that, Flurry also provides user retention graphs to gauge the number of users and the usage of application over a period of time. Summary In this article we explored the ways to track the application on Flurry and to gather meaningful data on Flurry by setting goals to track the application. Then we learned how to track the time spent by users on the application along using user data tracking and retention graphs to gauge the number of users. resources for article: further resources on this subject: Data Analytics [article] Learning Data Analytics with R and Hadoop [article] Limits of Game Data Analysis [article]
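To tie the pieces of this article together, the session, the application version, and the uncaught exception handler are usually wired up in one place at launch. The following is only a sketch built from the calls shown above; the version string and YOUR_API_KEY are placeholders for your own values, and configuration calls such as setAppVersion: are typically made before calling startSession::

// Uncaught Exception Handler - sent through Flurry.
void uncaughtExceptionHandler(NSException *exception) {
  [Flurry logError:@"Uncaught" message:@"Crash" exception:exception];
}

- (void)applicationDidFinishLaunching:(UIApplication *)application {
  // Register the handler first so that early crashes are reported too.
  NSSetUncaughtExceptionHandler(&uncaughtExceptionHandler);

  // Report sessions against an explicit version so the version-based
  // dashboards described above can separate old and new releases.
  [Flurry setAppVersion:@"1.2.0"];   // placeholder version string
  [Flurry startSession:@"YOUR_API_KEY"];
}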

Preparation Analysis of Data Source

Packt
22 Nov 2013
6 min read
(For more resources related to this topic, see here.) List of data sources Here, the wide varieties of data sources that are supported in the PowerPivot interface are given in brief. The vital part is to install providers such as OLE DB and ODBC that support the existing data source, because when installing the PowerPivot add-in, it will not install the provider too and some providers might already be installed with other applications. For instance, if there is a SQL Server installed in the system, the OLE DB provider will be installed with the SQL Server, so that later on it wouldn't be necessary to install OLE DB while adding data sources by using the SQL Server as a data source. Hence, make sure to verify the provider before it is added as a data source. Perhaps the provider is required only for relational database data sources. Relational database By using RDBMS, you can import tables and views into the PowerPivot workbook. The following is a list of various data sources: Microsoft SQL Server Microsoft SQL Azure Microsoft SQL Server Parallel Data Warehouse Microsoft Access Oracle Teradata Sybase Informix IBM DB2 Other providers (OLE DB / ODBC) Multidimensional sources Multidimensional data sources can only be added from Microsoft Analysis Services (SSAS). Data feeds The three types of data feeds are as follows: SQL Server Reporting Service (SSRS) Azure DataMarket dataset Other feeds such as atom service documents and single data feed Text files The two types of text files are given as follows: Excel file Text file Purpose of importing data from a variety of sources In order to make a decision about a particular subject area, you should analyze all the required data that is relevant to the subject area. If the data is stored in a variety of data sources, importing the data from different data sources has to be done. If all the data is only in one data source, only the data needs to be imported from the required tables for the subject and then various types of analysis can be done. The reason why users need to import data from different data sources is that they would then have an ample amount of data when they need to make any analysis. Another generic reason would be to cross-analyze data from different business systems such as Customer Relationship Management (CRM) and Campaign Management System (CMS). Data sourcing from only one source wouldn't be as sophisticated as an analysis done from different data sources as the amount of data from which the analysis was done for multisourced data is more detailed than the data from only a single source. It also might reveal conflicts between multiple data sources. In real time, usually in the e-commerce industry, blogs, and forum websites wouldn't ask for more details about customers at the time of registration, because the time consumed for long registrations would discourage the user, leading to cancellation the of the order. For instance, the customer table that would be stored in the database of an e-commerce industry would contain the following attributes: Customer FirstName LastName E-mail BirthDate Zip Code Gender However, this kind of industry needs to know their customers more in order to increase their sales. Since the industry only saves a few attributes about the customer during registration, it is difficult to track down the customers and it is even more difficult to advertise according to individual customers. 
Therefore, in order to find some relevant data about the customers, the e-commerce industries try to make another data source using the Internet or other types of sources. For instance, by using the Postcode given by the customer during registration, the user can get Country | State | City from various websites and then use the information obtained to make a table format either in Excel or CSV, as follows: Location Postcode City State Country So, finally the user would have two sources—one source is from the user's RDBMS database and the other source is from the Excel file created later—both of these can be used in order to make a customer analysis based on their location. General overview of ETL The main goal is to facilitate the development of data migration applications by applying data transformations. Extraction Transform Load (ETL) comprises of the first step in a data warehouse process. It is the most important and the hardest part, because it determines the quality of the data warehouse and the scope of the analyses the users will be able to build upon it. Let us discuss in detail what ETL is. The first substep is extraction. As the users would want to regroup all the information a company has, they will have to collect the data from various heterogeneous sources (operational data such as databases and/or external data such as CSV files) and various models (relational, geographical, web pages, and so on). The difficulty starts here. As data sources are heterogeneous, formats are usually different, and the users would have different types of data models for similar information (different names and/or formats) and different keys for the same objects. These are some of the main problems. The aim of the transformation task is to correct such problems, as much as possible. Let us now see what a transformation task is. The users would need to transform heterogeneous data of different sources into homogeneous data. Here are some examples of what they can do: Extraction of the data sources Identification of relevant data sources Filtering of non-admissible data Modification of formats or values Summary This article shows the users how to prepare data for analysis. We also covered different types of data sources that can be imported into the PowerPivot interface. Some information about the provider and a general overview of the ETL process has also been given. Users now know how the ETL process works in the PowerPivot interface; also, users were shown an introduction and the advantages of DAX. Resources for Article: Further resources on this subject: Creating a pivot table [Article] What are SSAS 2012 dimensions and cube? [Article] An overview of Oracle Hyperion Interactive Reporting [Article]
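As a concrete illustration of the cross-source analysis described above, once the customer table and the postcode lookup file are both loaded into PowerPivot and related on the postcode column, a calculated column can pull the location attributes onto each customer row. This is only a sketch and assumes a relationship from Customer[Zip Code] to Location[Postcode] has already been created in the model:

=RELATED(Location[City])

Added as a calculated column named City on the Customer table (with similar columns for State and Country), this exposes the location fields for slicing the customer data, which is exactly the location-based analysis the two sources were combined for.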

IBM Cognos 10 Business Intelligence

Packt
16 Jul 2012
4 min read
Introducing IBM Cognos 10 BI Cognos Connection In this recipe we will be exploring Cognos Connection, which is the user interface presented to the user when he/she logs in to IBM Cognos 10 BI for the first time. IBM Cognos 10 BI, once installed and configured, can be accessed through the Web using supported web browsers. For a list of supported web browsers, refer to the Installation and Configuration Guide shipped with the product. Getting ready As stated earlier, make sure that IBM Cognos 10 BI is installed and configured. Install and configure the GO Sales and GO Data Warehouse samples. Use the gateway URI to log on to the web interface called Cognos Connection. How to do it... To explore Cognos Connection, perform the following steps: Log on to Cognos Connection using the gateway URI that may be similar to http://<HostName>:<PortNumber>/ibmcognos/cgi-bin/cognos.cgi. Take note of the Cognos Connection interface. It has the GO Sales and GO Data Warehouse samples visible. Note the blue-colored folder icon, shown as in the preceding screenshot. It represents metadata model packages that are published to Cognos Connection using the Cognos Framework Manager tool. These packages have objects that represent business data objects, relationships, and calculations, which can be used to author reports and dashboards. Refer to the book, IBM Cognos TM1 Cookbook by Packt Publishing to learn how to create metadata models packages. From the toolbar, click on Launch. This will open a menu, showing different studios, each having different functionality, as shown in the following screenshot: We will use Business Insight and Business Insight Advanced, which are the first two choices in the preceding menu. These are the two components used to create and view dashboards. For other options, refer to the corresponding books by the same publisher. For instance, refer to the book, IBM Cognos 8 Report Studio Cookbook to know more about creating and distributing complex reports. Query Studio and Analysis Studio are meant to provide business users with the facility to slice and dice business data themselves. Event Studio is meant to define business situations and corresponding actions. Coming back to Cognos Connection, note that a yellow-colored folder icon, which is shown as represents a user-defined folder, which may or may not contain other published metadata model packages, reports, dashboards, and other content. In our case, we have a user-defined folder called Samples. This was created when we installed and configured samples shipped with the product. Click on the New Folder icon, which is represented by , on the toolbar to create a user-defined folder. Other options are also visible here, for instance to create a new dashboard.   Click on the user-defined folder—Samples to view its contents, as shown in the following screenshot: As shown in the preceding screenshot, it has more such folders, each having its own content. The top part of the pane shows the navigation path. Let's navigate deeper into Models | Business Insight Samples to show some sample dashboards, created using IBM Cognos Business Insight, as shown in the following screenshot: Click on one of these links to view the corresponding dashboard. For instance, click on Sales Dashboard (Interactive) to view the dashboard, as shown in the following screenshot: The dashboard can also be opened in the authoring tool, which is IBM Cognos Business Insight, in this case by clicking on the icon shown as on extreme right, on Cognos Connection. 
It will show the same result as shown in the preceding screenshot. We will see the Business Insight interface in detail later in this article. How it works... Cognos Connection is the primary user interface that user sees when he/she logs in for the first time. Business data has to be first identified and imported from the metadata model using the Cognos Framework Manager tool. Relationships (inner/outer joins) and calculations are then created, and the resultant metadata model package is published to the IBM Cognos 10 BI Server. This becomes available on Cognos Connection. Users are given access to appropriate studios on Cognos Connection, according to their needs. Analysis, reports, and dashboards are then created and distributed using one of these studios. The preceding sample has used Business Insight, for instance. Later sections in this article will look more into Business Insight and Business Insight Advanced. The next section focuses on the Business Insight interface details from the navigation perspective.

MySQL 5.1 Plugin: HTML Storage Engine—Reads and Writes

Packt
31 Aug 2010
8 min read
(For more resources on MySQL, see here.) An idea of the HTML engine Ever thought about what your tables might look like? Why not represent a table as a <TABLE>? You would be able to see it, visually, in any browser. Sounds cool. But how could we make it work? We want a simple engine, not an all-purpose Swiss Army Knife HTML-to-SQL converter, which means we will not need any existing universal HTML or XML parsers, but can rely on a fixed file format. For example, something like this: <html><head><title>t1</title></head><body><table border=1><tr><th>col1</th><th>other col</th><th>more cols</th></tr><tr><td>data</td><td>more data</td><td>more data</td></tr><!-- this row was deleted ... --><tr><td>data</td><td>more data</td><td>more data</td></tr>... and so on ...</table></body></html> But even then this engine is way more complex than the previous example, and it makes sense to split the code. The engine could stay, as usual, in the ha_html.cc file, the declarations in ha_html.h, and if we need any utility functions to work with HTML we can put them in the htmlutils.cc file. Flashback A storage engine needs to declare a plugin and an initialization function that fills a handlerton structure. Again, the only handlerton method that we need here is a create() method. #include "ha_html.h"static handler* html_create_handler(handlerton *hton, TABLE_SHARE *table, MEM_ROOT *mem_root){ return new (mem_root) ha_html(hton, table);}static int html_init(void *p){ handlerton *html_hton = (handlerton *)p; html_hton->create = html_create_handler; return 0;}struct st_mysql_storage_engine html_storage_engine ={ MYSQL_HANDLERTON_INTERFACE_VERSION };mysql_declare_plugin(html){ MYSQL_STORAGE_ENGINE_PLUGIN, &html_storage_engine, "HTML", "Sergei Golubchik", "An example HTML storage engine", PLUGIN_LICENSE_GPL, html_init, NULL, 0x0001, NULL, NULL, NULL}mysql_declare_plugin_end; Now we need to implement all of the required handler class methods. Let's start with simple ones: const char *ha_html::table_type() const{ return "HTML";}const char **ha_html::bas_ext() const{ static const char *exts[] = { ".html", 0 }; return exts;}ulong ha_html::index_flags(uint inx, uint part, bool all_parts) const{ return 0;}ulonglong ha_html::table_flags() const{ return HA_NO_TRANSACTIONS | HA_REC_NOT_IN_SEQ | HA_NO_BLOBS;}THR_LOCK_DATA **ha_html::store_lock(THD *thd, THR_LOCK_DATA **to, enum thr_lock_type lock_type){ if (lock_type != TL_IGNORE && lock.type == TL_UNLOCK) lock.type = lock_type; *to ++= &lock; return to;} These methods are familiar to us. They say that the engine is called "HTML", it stores the table data in files with the .html extension, the tables are not transactional, the position for ha_html::rnd_pos() is obtained by calling ha_html::position(), and that it does not support BLOBs. Also, we need a function to create and initialize an HTML_SHARE structure: static HTML_SHARE *find_or_create_share( const char *table_name, TABLE *table){ HTML_SHARE *share; for (share = (HTML_SHARE*)table->s->ha_data; share; share = share->next) if (my_strcasecmp(table_alias_charset, table_name, share->name) == 0) return share; share = (HTML_SHARE*)alloc_root(&table->s->mem_root, sizeof(*share)); bzero(share, sizeof(*share)); share->name = strdup_root(&table->s->mem_root, table_name); share->next = (HTML_SHARE*)table->s->ha_data; table->s->ha_data = share; return share;} It is exactly the same function, only the structure is now called HTML_SHARE, not STATIC_SHARE. 
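The HTML_SHARE structure itself is not spelled out in this excerpt, but judging from how find_or_create_share() and the locking calls use it, it needs little more than a name, a use counter, a table lock, and a next pointer. A plausible declaration for ha_html.h might look like the following sketch (not necessarily the book's exact definition):

typedef struct st_html_share {
  const char *name;            /* table name used for lookups in the share list */
  uint use_count;              /* number of handlers currently using this share */
  THR_LOCK lock;               /* table lock shared by all handler instances */
  struct st_html_share *next;  /* next share in the list hanging off TABLE_SHARE::ha_data */
} HTML_SHARE;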
Creating, opening, and closing the table Having done the basics, we can start working with the tables. The first operation, of course, is the table creation. To be able to read, update, or even open the table we need to create it first, right? Now, the table is just an HTML file and to create a table we only need to create an HTML file with our header and footer, but with no data between them. We do not need to create any TABLE or Field objects, or anything else—MySQL does it automatically. To avoid repeating the same HTML tags over and over we will define the header and the footer in the ha_html.h file as follows: #define HEADER1 "<html><head><title>"#define HEADER2 "</title></head><body><table border=1>\n"#define FOOTER "</table></body></html>"#define FOOTER_LEN ((int)(sizeof(FOOTER)-1)) As we want a header to include a table name we have split it in two parts. Now, we can create our table: int ha_html::create(const char *name, TABLE *table_arg, HA_CREATE_INFO *create_info){ char buf[FN_REFLEN+10]; strcpy(buf, name); strcat(buf, *bas_ext()); We start by generating a filename. The "table name" that the storage engine gets is not the original table name, it is converted to be a safe filename. All "troublesome" characters are encoded, and the database name is included and separated from the table name with a slash. It means we can safely use name as the filename and all we need to do is to append an extension. Having the filename, we open it and write our data: FILE *f = fopen(buf, "w"); if (f == 0) return errno; fprintf(f, HEADER1); write_html(f, table_arg->s->table_name.str); fprintf(f, HEADER2 "<tr>"); First, we write the header and the table name. Note that we did not write the value of the name argument into the header, but took the table name from the TABLE_SHARE structure (as table_arg->s->table_name.str), because name is mangled to be a safe filename, and we would like to see the original table name in the HTML page title. Also, we did not just write it into the file, we used a write_html() function—this is our utility method that performs the necessary entity encoding to get a well-formed HTML. But let's not think about it too much now, just remember that we need to write it, it can be done later. Now, we iterate over all fields and write their names wrapped in <th>...</th> tags. Again, we rely on our write_html() function here: for (uint i = 0; i < table_arg->s->fields; i++) { fprintf(f, "<th>"); write_html(f, table_arg->field[i]->field_name); fprintf(f, "</th>"); } fprintf(f, "</tr>"); fprintf(f, FOOTER); fclose(f); return 0;} Done, an empty table is created. Opening it is easy too. We generate the filename and open the file just as in the create() method. The only difference is that we need to remember the FILE pointer to be able to read the data later, and we store it in fhtml, which has to be a member of the ha_html object: int ha_html::open(const char *name, int mode, uint test_if_locked){ char buf[FN_REFLEN+10]; strcpy(buf, name); strcat(buf, *bas_ext()); fhtml = fopen(buf, "r+"); if (fhtml == 0) return errno; When parsing an HTML file we will often need to skip over known patterns in the text. Instead of using a special library or a custom pattern parser for that, let's try to use scanf()—it exists everywhere, has a built-in pattern matching language, and it is powerful enough for our purposes. For convenience, we will wrap it in a skip_html() function that takes a scanf() format and returns the number of bytes skipped. 
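Neither utility is spelled out at this point, so here is one possible htmlutils.cc sketch of the two helpers exactly as they are described: write_html() doing minimal entity encoding, and skip_html() appending %n to the caller's format so that fscanf() reports how many bytes it consumed. The book's own implementation may differ in detail:

#include "ha_html.h"
#include <stdio.h>

/* Write a string to f, encoding the characters HTML cares about. */
void write_html(FILE *f, const char *s)
{
  for (; *s; s++)
  {
    switch (*s) {
    case '<': fputs("&lt;", f);  break;
    case '>': fputs("&gt;", f);  break;
    case '&': fputs("&amp;", f); break;
    default:  fputc(*s, f);
    }
  }
}

/* Skip input matching fmt (all conversions in fmt are suppressed)
   and return the number of bytes consumed. */
int skip_html(FILE *f, const char *fmt)
{
  char full_fmt[256];
  int bytes = 0;
  snprintf(full_fmt, sizeof(full_fmt), "%s%%n", fmt);
  fscanf(f, full_fmt, &bytes);
  return bytes;
}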
Assuming we have such a function, we can finish opening the table: skip_html(fhtml, HEADER1 "%*[^<]" HEADER2 "<tr>"); for (uint i = 0; i < table->s->fields; i++) { skip_html(fhtml, "<th>%*[^<]</th>"); } skip_html(fhtml, "</tr>"); data_start = ftell(fhtml); We skip the first part of the header, then "everything up to the opening angle bracket", which eats up the table name, and the second part of the header. Then we skip individual row headers in a loop and the end of row </tr> tag. In order not to repeat this parsing again we remember the offset where the row data starts. At the end we allocate an HTML_SHARE and initialize lock objects: share = find_or_create_share(name, table); if (share->use_count++ == 0) thr_lock_init(&share->lock); thr_lock_data_init(&share->lock,&lock,NULL); return 0;} Closing the table is simple, and should not come as a surprise to us: int ha_html::close(void){ fclose(fhtml); if (--share->use_count == 0) thr_lock_delete(&share->lock); return 0;}
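With create, open, and close in place, the engine can already be exercised from the SQL layer, even though reading and writing rows is still to come. Assuming the plugin library is built as ha_html.so and keeps the plugin name declared above, something along these lines should leave an empty HTML skeleton on disk:

INSTALL PLUGIN html SONAME 'ha_html.so';

CREATE TABLE t1 (
  col1 INT,
  `other col` VARCHAR(20),
  `more cols` VARCHAR(20)
) ENGINE=HTML;

The resulting t1.html file in the database directory should contain just the header, a single row of <th> column names, and the footer, which is exactly what ha_html::create() writes.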

Define the Necessary Connections

Packt
02 Dec 2016
5 min read
In this article by Robert van Mölken and Phil Wilkins, the author of the book Implementing Oracle Integration Cloud Service, where we will see creating connections which is one of the core components of an integration we can easily navigate to the Designer Portal and start creating connections. (For more resources related to this topic, see here.) On the home page, click the Create link of the Connection tile as given in the following screenshot: Because we click on this link the Connections page is loaded, which lists of all created connections, a modal dialogue automatically opens on top of the list. This pop-up shows all the adapter types we can create. For our first integration we define two technology adapter connections, an inbound SOAP connection and an outbound REST connection. Inbound SOAP connection In the pop-up we can scroll down the list and find the SOAP adapter, but the modal dialogue also includes a search field. Just search on SOAP and the list will show the adapters matching the search criteria: Find your adapter by searching on the name or change the appearance from card to list view to show more adapters at ones. Click Select to open the New Connection page. Before we can setup any adapter specific configurations every creation starts with choosing a name and an optional description: Create the connection with the following details: Connection Name FlightAirlinesSOAP_Ch2 Identifier This will be proposed based on the connection name and there is no need to change unless you'd like an alternate name. It is usually the name in all CAPITALS and without spaces and has a max length of 32 characters. Connection Role Trigger The role chosen restricts the connection to be used only in selected role(s). Description This receives in Airline objects as a SOAP service. Click the Create button to accept the details. This will bring us to the specific adapter configuration page where we can add and modify the necessary properties. The one thing all the adapters have in common is the optional Email Address under Connection Administration. This email address is used to send notification to when problems or changes occur in the connection. A SOAP connection consists of three sections; Connection Properties, Security, and an optional Agent Group. On the right side of each section we can find a button to configure its properties.Let's configure each section using the following steps: Click the Configure Connectivity button. Instead of entering in an URL we are uploading the WSDL file. Check the box in the Upload File column. Click the newly shown Upload button. Upload the file ICSBook-Ch2-FlightAirlines-Source WSDL. Click OK to save the properties. Click the Configure Credentials button. In the pop-up that is shown we can configure the security credentials. We have the choice for Basic authentication, Username Password Token, or No Security Policy. Because we use it for our inbound connection we don't have to configure this. Select No Security Policy from the dropdown list. This removes the username and password fields. Click OK to save the properties. We leave the Agent Group section untouched. We can attach an Agent Group if we want to use it as an outbound connection to an on-premises web service. Click Test to check if the connection is working (otherwise it can't be used). For SOAP and REST it simply pings the given domain to check the connectivity, but others for example the Oracle SaaS adapters also authenticate and collect metadata. 
Click the Save button at the top of the page to persist our changes. Click Exit Connection to return to the list from where we started. Outbound REST connection Now that the inbound connection is created we can create our REST adapter. Click the Create New Connection button to show the Create Connection pop-up again and select the REST adapter. Create the connection with the following details: Connection Name FlightAirlinesREST_Ch2 Identifier This will be proposed based on the connection name Connection Role Invoke Description This returns the Airline objects as a REST/JSON service Email Address Your email address to use to send notifications to Let’s configure the connection properties using the following steps: Click the Configure Connectivity button. Select REST API Base URL for the Connection Type. Enter the URL were your Apiary mock is running on: http://private-xxxx-yourapidomain.apiary-mock.com. Click OK to save the values. Next configure the security credentials using the following steps: Click the Configure Credentials button. Select No Security Policy for the Security Policy. This removes the username and password fields. Click the OK button to save out choice. Click Test at the top to check if the connection is working. Click the Save button at the top of the page to persist our changes. Click Exit Connection to return to the list from where we started. Troubleshooting If the test fails for one of these connections check if the correct WSDL is used or that the connection URL for the REST adapter exists or is reachable. Summary In this article we looked at the processes of creating and testing the necessary connections and the creation of the integration itself. We have seen an inbound SOAP connection and an outbound REST connection. In demonstrating the integration we have also seen how to use Apiary to document and mock our backend REST service. Resources for Article: Further resources on this subject: Getting Started with a Cloud-Only Scenario [article] Extending Oracle VM Management [article] Docker Hosts [article]

Gathering all rejects prior to killing a job

Packt
13 Nov 2013
3 min read
(For more resources related to this topic, see here.) Getting ready Open the job jo_cook_ch03_0010_validationSubjob. As you can see, the reject flow has been attached and the output is being sent to a temporary store (tHashMap). How to do it… Add the tJava, tDie, tHashInput, and tFileOutputDelimited components. Add onSubjobOk to tJava from the tFileInputDelimited component. Add a flow from the tHashInput component to the tFileOutputDelimited component. Right-click the tJava component, select Trigger and then Runif. Link the trigger to the tDie component. Click the if link, and add the following code ((Integer)globalMap.get("tFileOutputDelimited_1_NB_LINE")) > 0 Right-click the tJava component, select Trigger, and then Runif. Link this trigger to the tHashInput component. ((Integer)globalMap.get("tFileOutputDelimited_1_NB_LINE")) == 0 The job should now look like the following: Drag the generic schema sc_cook_ch3_0010_genericCustomer to both the tHashInput and tFileOutputDelimited. Run the job. You should see that the tDie component is activated, because the file contained two errors. How it works… What we have done in this exercise is created a validation stage prior to processing the data. Valid rows are held in temporary storage (tHashOutput) and invalid rows are written to a reject file until all input rows are processed. The job then checks to see how many records are rejected (using the RunIf link). In this instance, there are invalid rows, so the RunIf link is triggered, and the job is killed using tDie. By ensuring that the data is correct before we start to process it into a target, we know that the data will be fit for writing to the target, and thus avoiding the need for rollback procedures. The records captured can then be sent to the support team, who will then have a record of all incorrect rows. These rows can be fixed in situ within the source file and the job simply re-run from the beginning. There's more... This article is particularly important when rollback/correction of a job may be particularly complex, or where there may be a higher than expected number of errors in an input. An example would be when there are multiple executions of a job that appends to a target file. If the job fails midway through, then rolling back involves identifying which records were appended to the file by the job before failure, removing them from the file, fixing the offending record, and then re-running. This runs the risk of a second error causing the same thing to happen again. On the other hand, if the job does not die, but a subsection of the data is rejected, then the rejects must be manipulated into the target file via a second manual execution of the job. So, this method enables us to be certain that our records will not fail to write due to incorrect data, and therefore saves our target from becoming corrupted. Summary This article has shown how the rejects are collected before killing a job. This article also shows how incorrect rejects be manipulated into the target file. Resources for Article: Further resources on this subject: Pentaho Data Integration 4: Working with Complex Data Flows [Article] Nmap Fundamentals [Article] Getting Started with Pentaho Data Integration [Article]