
How-To Tutorials - Data

Sharing Your BI Reports and Dashboards

Packt
19 Dec 2013
4 min read
(For more resources related to this topic, see here.)

The final objective of the information in BI reports and dashboards is to detect cause-and-effect business behavior and trends, and to trigger actions to address them. These actions are supported by visual information, via scorecards and dashboards. This process requires interaction with several people. MicroStrategy includes the functionality to share our reports, scorecards, and dashboards, regardless of where those people are located.

Reaching your audience

MicroStrategy offers the option to share our reports via different channels that leverage the latest social technologies already present in the marketplace; that is, MicroStrategy integrates with Twitter and Facebook. Sharing this way avoids any related costs and maintains the design premise of the do-it-yourself approach, without any help from specialized IT personnel.

Main menu

The main menu of MicroStrategy shows a column named Status. When we click on that column, as shown in the following screenshot, the Share option appears:

The Share button

The other option is the Share button within our reports, that is, the view that we want to share. Select the Share button located at the bottom of the screen, as shown in the following screenshot:

The share options are the same regardless of where you activate them; the various alternate menus are shown in the following screenshot:

E-mail sharing

When selecting the e-mail option from the Scorecards-Dashboards model, the system will ask you which e-mail program you want to use in order to send the e-mail; in our case, we select Outlook. MicroStrategy automatically prepares an e-mail with a link to share. You can modify the text and select the recipients of the e-mail, as shown in the following screenshot:

The recipients of the e-mail click on the URL included in the message, and through this scheme they are able to analyze the report in a read-only mode with only the Filters panel enabled. The following screenshot shows how the user will review the report. The user is not allowed to make any other modifications. This option does not require a MicroStrategy platform user account.

When a user clicks on the link, they are able to edit the filters and perform their analyses, as well as switch to any available layout, in our case, scorecards and dashboards. As a result, any visualization object can be maximized and minimized for better analysis, as shown in the following screenshot:

In this option, the report can be visualized in fullscreen mode by clicking on the fullscreen button located at the top-right corner of the screen. In this sharing mode, the user is able to download the information in Excel and PDF formats for each visualization object. For instance, suppose you need all the data included in the grid for the stores in region 1 opened in the year 2012. Perform the following steps:

In the browser, open the URL that is generated when you select the e-mail share option. Select the ScoreCard tab. In the Open Year filter, type 2012 and in the Region filter, type 1. Now, maximize the grid.

Two icons will appear in the top-left corner of the screen: one for exporting the data to Excel and the other for exporting it to PDF for each visualization object, as shown in the following screenshot. Please keep in mind that these two export options only apply to a specific visualization object; it is not possible to export the complete report from this functionality offered to the consumer.

Summary

In this article, we learned how to share our scorecards and dashboards via several channels, such as e-mail, social networks (Twitter and Facebook), and blogs or corporate intranet sites.

Resources for Article:

Further resources on this subject: Participating in a business process (Intermediate) [Article], Self-service Business Intelligence, Creating Value from Data [Article], Exploring Financial Reporting and Analysis [Article]

N-Way Replication in Oracle 11g Streams: Part 2

Packt
05 Feb 2010
7 min read
Streaming STRM2 to STRM1

Now the plan for setting up Streams for STRM2. It is the mirror image of what we have done above, except for the test part.

On STRM2, log in as STRM_ADMIN.
-- ADD THE QUEUE, a good queue name is STREAMS_CAPTURE_Q
-- ADD THE CAPTURE RULE
-- ADD THE PROPAGATION RULE
-- INSTANTIATE TABLE ACROSS DBLINK
-- DBLINK TO DESTINATION is STRM1.US.APGTECH.COM
-- SOURCE is STRM2.US.APGTECH.COM

On STRM1, log in as STRM_ADMIN.
-- ADD THE QUEUE: A good queue name is STREAMS_APPLY_Q
-- ADD THE APPLY RULE

Start everything up and test the Stream on STRM2. Then check to see if the record is STREAM'ed to STRM1.

-- On STRM2, log in as STRM_ADMIN
-- ADD THE QUEUE: A good queue name is STREAMS_CAPTURE_Q
-- STRM_ADMIN@STRM2.US.APGTECH.COM>
BEGIN
  DBMS_STREAMS_ADM.SET_UP_QUEUE(
    queue_table => '"STREAMS_CAPTURE_QT"',
    queue_name  => '"STREAMS_CAPTURE_Q"',
    queue_user  => '"STRM_ADMIN"');
END;
/
commit;

-- ADD THE CAPTURE RULE
-- STRM_ADMIN@STRM2.US.APGTECH.COM>
BEGIN
  DBMS_STREAMS_ADM.ADD_TABLE_RULES(
    table_name         => '"LEARNING.EMPLOYEES"',
    streams_type       => 'capture',
    streams_name       => '"STREAMS_CAPTURE"',
    queue_name         => '"STRM_ADMIN"."STREAMS_CAPTURE_Q"',
    include_dml        => true,
    include_ddl        => true,
    include_tagged_lcr => false,
    inclusion_rule     => true);
END;
/
commit;

-- ADD THE PROPAGATION RULE
-- STRM_ADMIN@STRM2.US.APGTECH.COM>
BEGIN
  DBMS_STREAMS_ADM.ADD_TABLE_PROPAGATION_RULES(
    table_name             => '"LEARNING.EMPLOYEES"',
    streams_name           => '"STREAMS_PROPAGATION"',
    source_queue_name      => '"STRM_ADMIN"."STREAMS_CAPTURE_Q"',
    destination_queue_name => '"STRM_ADMIN"."STREAMS_APPLY_Q"@STRM1.US.APGTECH.COM',
    include_dml            => true,
    include_ddl            => true,
    source_database        => 'STRM2.US.APGTECH.COM',
    inclusion_rule         => true);
END;
/
COMMIT;

Because the table was instantiated from STRM1 already, you can skip this step.

-- INSTANTIATE TABLE ACROSS DBLINK
-- STRM_ADMIN@STRM2.US.APGTECH.COM>
DECLARE
  iscn NUMBER; -- Variable to hold instantiation SCN value
BEGIN
  iscn := DBMS_FLASHBACK.GET_SYSTEM_CHANGE_NUMBER();
  DBMS_APPLY_ADM.SET_TABLE_INSTANTIATION_SCN@STRM1.US.APGTECH.COM(
    source_object_name   => 'LEARNING.EMPLOYEES',
    source_database_name => 'STRM1.US.APGTECH.COM',
    instantiation_scn    => iscn);
END;
/
COMMIT;

-- On STRM1, log in as STRM_ADMIN.
-- ADD THE QUEUE, a good queue name is STREAMS_APPLY_Q
-- STRM_ADMIN@STRM1.US.APGTECH.COM>
BEGIN
  DBMS_STREAMS_ADM.SET_UP_QUEUE(
    queue_table => '"STREAMS_APPLY_QT"',
    queue_name  => '"STREAMS_APPLY_Q"',
    queue_user  => '"STRM_ADMIN"');
END;
/
COMMIT;

-- ADD THE APPLY RULE
-- STRM_ADMIN@STRM1.US.APGTECH.COM>
BEGIN
  DBMS_STREAMS_ADM.ADD_TABLE_RULES(
    table_name         => '"LEARNING.EMPLOYEES"',
    streams_type       => 'apply',
    streams_name       => '"STREAMS_APPLY"',
    queue_name         => '"STRM_ADMIN"."STREAMS_APPLY_Q"',
    include_dml        => true,
    include_ddl        => true,
    include_tagged_lcr => false,
    inclusion_rule     => true);
END;
/
commit;

Start everything up and Test.
-- STRM_ADMIN@STRM1.US.APGTECH.COM>
BEGIN
  DBMS_APPLY_ADM.SET_PARAMETER(
    apply_name => 'STREAMS_APPLY',
    parameter  => 'disable_on_error',
    value      => 'n');
END;
/
COMMIT;

-- STRM_ADMIN@STRM1.US.APGTECH.COM>
DECLARE
  v_started number;
BEGIN
  SELECT DECODE(status, 'ENABLED', 1, 0) INTO v_started
    FROM DBA_APPLY where apply_name = 'STREAMS_APPLY';
  if (v_started = 0) then
    DBMS_APPLY_ADM.START_APPLY(apply_name => '"STREAMS_APPLY"');
  end if;
END;
/
COMMIT;

-- STRM_ADMIN@STRM2.US.APGTECH.COM>
DECLARE
  v_started number;
BEGIN
  SELECT DECODE(status, 'ENABLED', 1, 0) INTO v_started
    FROM DBA_CAPTURE where CAPTURE_NAME = 'STREAMS_CAPTURE';
  if (v_started = 0) then
    DBMS_CAPTURE_ADM.START_CAPTURE(capture_name => '"STREAMS_CAPTURE"');
  end if;
END;
/

Then on STRM2:

-- STRM_ADMIN@STRM2.US.APGTECH.COM>
ACCEPT fname PROMPT "Enter Your Mom's First Name: "
ACCEPT lname PROMPT "Enter Your Mom's Last Name: "
Insert into LEARNING.EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME, TIME)
  Values (5, '&fname', '&lname', NULL);
dbms_lock.sleep(10); -- give it time to replicate

Then on STRM1, search for the record.

-- STRM_ADMIN@STRM1.US.APGTECH.COM>
Select * from LEARNING.EMPLOYEES;

We now have N-way replication. But wait, what about conflict resolution? Good catch; all of this was just to set up N-way replication. In this case, it is a 2-way replication. It will work the majority of the time; that is, until there is a conflict. Conflict resolution needs to be set up, and in this example the supplied/built-in conflict resolution handler MAXIMUM will be used. Now, let us cause some CONFLICT! Then we will be good people and create the conflict resolution, and ask for world peace while we are at it!

Conflict resolution

A conflict between User 1 and User 2 has happened. Unbeknownst to both of them, they have both inserted the exact same row of data into the same table, at roughly the same time. User 1's insert is to the STRM1 database. User 2's insert is to the STRM2 database. Normally, the transaction that arrives second will raise an error. It is most likely that the error will be some sort of primary key violation and that the transaction will fail. We do not want that to happen. We want the transaction that arrives last to "win" and be committed to the database.

At this point, you may be wondering, "How do I choose which conflict resolution to use?" Well, you do not get to choose; the Business Community that you support will determine the rules most of the time. They will tell you how they want conflict resolution handled. Your responsibility is to know what can be solved with built-in conflict resolution and when you will need to create custom conflict resolution.

Going back to User 1 and User 2: in this particular case, User 2's insert arrives later than User 1's insert. Now the conflict resolution is added using the DBMS_APPLY_ADM package, specifically the procedure DBMS_APPLY_ADM.SET_UPDATE_CONFLICT_HANDLER, which instructs the APPLY process on how to handle the conflict. Scripts_5_1_CR.sql shows the conflict resolution used to resolve the conflict between User 1 and User 2. Since it is part of the APPLY process, this script is run by the Streams Administrator; in our case, that would be STRM_ADMIN. This type of conflict can occur on either the STRM1 or STRM2 database, so the script will be run on both databases. The numbers to the left are there for reference reasons. They are not in the provided code.

-- Scripts_5_1_CR.sql
1.  DECLARE
2.    cols DBMS_UTILITY.NAME_ARRAY;
3.  BEGIN
4.    cols(0) := 'employee_id';
5.    cols(1) := 'first_name';
6.    cols(2) := 'last_name';
7.    cols(3) := 'time';
8.    DBMS_APPLY_ADM.SET_UPDATE_CONFLICT_HANDLER(
9.      object_name       => 'learning.employees',
10.     method_name       => 'MAXIMUM',
11.     resolution_column => 'time',
12.     column_list       => cols);
13. END;
14. /
15. Commit;

So what do these 15 magical lines do to resolve the conflict? Let us break it down piece by piece logically first, and then look at the specific syntax of the code. Oracle needs to know where to look when a conflict happens. In our example, that is the learning.employees table. Furthermore, Oracle needs more than just the table name; it needs to know what columns are involved. Line 9 informs Oracle of the table. Lines 1-7 relate to the columns. Line 8 is the actual procedure name. What Oracle is supposed to do when this conflict happens is answered by Line 10. Line 10 instructs Oracle to take the MAXIMUM of the resolution_column and use that to resolve the conflict. Since our resolution column is time, the last transaction to arrive is the "winner" and is applied.
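To make the scenario concrete, here is a small illustrative sketch (not part of the original chapter scripts) of the colliding inserts described above. The employee ID 6, the names, and the use of SYSTIMESTAMP to populate the TIME resolution column are assumptions made for this example only:

-- User 1, connected to STRM1 (hypothetical values)
Insert into LEARNING.EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME, TIME)
  Values (6, 'Pat', 'Smith', SYSTIMESTAMP);
COMMIT;

-- User 2, connected to STRM2 a moment later (same row, hypothetical values)
Insert into LEARNING.EMPLOYEES (EMPLOYEE_ID, FIRST_NAME, LAST_NAME, TIME)
  Values (6, 'Pat', 'Smith', SYSTIMESTAMP);
COMMIT;

-- Per the MAXIMUM handler described above, when the two changes meet at apply
-- time the row with the greater value in the time resolution column is kept,
-- so the transaction that arrives last "wins" instead of failing with an error.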

MySQL Cluster Management: Part 2

Packt
10 May 2010
11 min read
Replication between clusters with a backup channel

The previous recipe showed how to connect a MySQL Cluster to another MySQL server or another MySQL Cluster using a single replication channel. Obviously, this means that this replication channel has a single point of failure (if either of the two replication agents (machines) fails, the channel goes down). If you are designing your disaster recovery plan to rely on MySQL Cluster replication, then you are likely to want more reliability than that.

One simple thing that we can do is run multiple replication channels between two clusters. With this setup, in the event of a replication channel failing, a single command can be executed on one of the backup channel slaves to continue the channel. It is not currently possible to automate this process (at least, not without scripting it yourself). The idea is that with a second channel ready and good monitoring of the primary channel, you can quickly bring up the replication channel in the case of failure, which means significantly less time spent with the replication channel down.

How to do it…

Setting up this process is not vastly different; however, it is vital to ensure that both channels are not running at any one time, or the data at the slave site will become a mess and the replication will stop. To guarantee this, the first step is to add the following to the mysqld section of /etc/my.cnf on all slave MySQL Servers (of which there are likely to be two):

skip-slave-start

Once added, restart mysqld. This my.cnf parameter prevents the MySQL Server from automatically starting the slave process.

You should start one of the channels (normally, whichever channel you decide will be your master) normally, while following the steps in the previous recipe. To configure the second slave, follow the instructions in the previous recipe, but stop just prior to the CHANGE MASTER TO step on the second (backup) slave. If you configure two replication channels simultaneously (that is, forget to stop the existing replication channel when testing the backup), you will end up with a broken setup. Do not proceed to run CHANGE MASTER TO on the backup slave unless the primary channel is not operating.

As soon as the primary communication channel fails, you should execute the following command on any one of the SQL nodes in your slave (destination) cluster and record the result:

[slave] mysql> SELECT MAX(epoch) FROM mysql.ndb_apply_status;
+---------------+
| MAX(epoch)    |
+---------------+
| 5952824672272 |
+---------------+
1 row in set (0.00 sec)

The number returned is the ID of the most recent global checkpoint, which is run every couple of seconds on all storage nodes in the master cluster; as a result, all the REDO logs are synced to disk. Checking this number on a SQL node in the slave cluster tells you what the last global checkpoint that made it to the slave cluster was. You can run a similar command, SELECT MAX(epoch) FROM mysql.ndb_binlog_index, on any SQL node in the master (source) cluster to find out what the most recent global checkpoint on the master cluster is. Clearly, if your replication channel goes down, then these two numbers will diverge quickly. Use this number (5952824672272 in our example) to find the correct logfile and position that you should connect to.
You can do this by executing the following command on any SQL node in the master (source) cluster that you plan to make the new master, ensuring that you substitute the output of the previous command for the epoch value as follows:

mysql> SELECT
    ->   File,
    ->   Position
    -> FROM mysql.ndb_binlog_index
    -> WHERE epoch > 5952824672272
    -> ORDER BY epoch ASC LIMIT 1;
+--------------------+----------+
| File               | Position |
+--------------------+----------+
| ./node2-bin.000003 | 200998   |
+--------------------+----------+
1 row in set (0.00 sec)

If this returns NULL, firstly, ensure that there has been some activity in your cluster since the failure (if you are using batched updates, then there should be 32 KB of updates or more) and secondly, ensure that there is no active replication channel between the nodes (that is, ensure the primary channel has really failed).

Using the filename and position mentioned previously, run the following command on the backup slave. It is critical that you run these commands on the correct node. The previous command, from which you get the filename and position, must be run on the new master (this is in the "source" cluster). The following command, which tells the new slave which master to connect to and its relevant position and filename, must be executed on the new slave (this is the "destination" cluster). While it is technically possible to connect the old slave to a new master or vice versa, this configuration is not recommended by MySQL and should not be used. If all is okay, then the highlighted rows in the following output will show that the slave thread is running and waiting for the master to send an event.

[NEW slave] mysql> CHANGE MASTER TO MASTER_HOST='10.0.0.2', MASTER_USER='slave', MASTER_PASSWORD='password', MASTER_LOG_FILE='node2-bin.000003', MASTER_LOG_POS=200998;
Query OK, 0 rows affected (0.01 sec)

mysql> START SLAVE;
Query OK, 0 rows affected (0.00 sec)

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.0.0.2
Master_User: slave
Master_Port: 3306
[snip]
Relay_Master_Log_File: node2-bin.000003
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
[snip]
Seconds_Behind_Master: 233

After a while, the Seconds_Behind_Master value should return to 0 (if the primary replication channel has been down for some time, or if the master cluster has a very high write rate, then this may take some time).

There's more…

It is possible to increase the performance of MySQL Cluster replication by enabling batched updates. This can be accomplished by starting slave mysqld processes with the slave-allow-batching option (or by adding the slave-allow-batching line to the [mysqld] section in my.cnf). This has the effect of applying updates in 32 KB batches rather than as soon as they are received, which generally results in lower CPU usage and higher throughput (particularly when the mean update size is low). A minimal my.cnf fragment is sketched below.
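For reference, the two my.cnf options mentioned in this recipe could sit together in the slave's configuration roughly as follows; this fragment is only an illustrative sketch, and the rest of your [mysqld] section will differ:

# /etc/my.cnf on each slave SQL node (illustrative sketch)
[mysqld]
# prevent the slave threads from starting automatically (set earlier in this recipe)
skip-slave-start
# apply replicated updates in 32 KB batches for higher throughput
slave-allow-batching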
See also

To know more about Replication Compatibility Between MySQL Versions, visit: http://dev.mysql.com/doc/refman/5.1/en/replication-compatibility.html

User-defined partitioning

MySQL Cluster partitions data based on the primary key, unless you configure it otherwise. The main aim of user-defined partitioning is to increase performance by grouping data likely to be involved in common queries onto a single node, thus reducing network traffic between nodes while satisfying queries.

In this recipe, we will show how to define our own partitioning functions. If the NoOfReplicas in the global cluster configuration file is equal to the number of storage nodes, then each storage node contains a complete copy of the cluster data and there is no partitioning involved. Partitioning is only involved when there are more storage nodes than replicas.

Getting ready

Look at the City table in the world dataset; there are two integer fields (ID and Population). MySQL Cluster will choose ID as the default partitioning scheme as follows:

mysql> desc City;
+-------------+----------+------+-----+---------+----------------+
| Field       | Type     | Null | Key | Default | Extra          |
+-------------+----------+------+-----+---------+----------------+
| ID          | int(11)  | NO   | PRI | NULL    | auto_increment |
| Name        | char(35) | NO   |     |         |                |
| CountryCode | char(3)  | NO   |     |         |                |
| District    | char(20) | NO   |     |         |                |
| Population  | int(11)  | NO   |     | 0       |                |
+-------------+----------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

Therefore, a query that searches for a specific ID will use only one partition. In the following example, partition p3 is used:

mysql> explain partitions select * from City where ID=1;
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | partitions | type  | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
| 1  | SIMPLE      | City  | p3         | const | PRIMARY       | PRIMARY | 4       | const | 1    |       |
+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

However, searching for a Population involves searching all partitions as follows:

mysql> explain partitions select * from City where Population=42;
+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-----------------------------------+
| id | select_type | table | partitions  | type | possible_keys | key  | key_len | ref  | rows | Extra                             |
+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-----------------------------------+
| 1  | SIMPLE      | City  | p0,p1,p2,p3 | ALL  | NULL          | NULL | NULL    | NULL | 4079 | Using where with pushed condition |
+----+-------------+-------+-------------+------+---------------+------+---------+------+------+-----------------------------------+
1 row in set (0.01 sec)

The first thing to do when considering user-defined partitioning is to decide whether you can improve on the default partitioning scheme. In this case, if your application makes a lot of queries against this table specifying the City ID, it is unlikely that you can improve performance with user-defined partitioning. However, if it makes a lot of queries by the Population and ID fields, it is likely that you can improve performance by switching the partitioning function from a hash of the primary key to a hash of the primary key and the Population field.

How to do it...

In this example, we are going to add the field Population to the partitioning function used by MySQL Cluster. We will add this field to the primary key rather than solely using this field. This is because the City table has an auto-increment field on the ID field, and in MySQL Cluster, an auto-increment field must be part of the primary key.
Firstly, modify the primary key in the table to add the field that we will use to partition the table by:

mysql> ALTER TABLE City DROP PRIMARY KEY, ADD PRIMARY KEY(ID, Population);
Query OK, 4079 rows affected (2.61 sec)
Records: 4079  Duplicates: 0  Warnings: 0

Now, tell MySQL Cluster to use the Population field as a partitioning function as follows:

mysql> ALTER TABLE City partition by key (Population);
Query OK, 4079 rows affected (2.84 sec)
Records: 4079  Duplicates: 0  Warnings: 0

Now, verify that queries executed against this table only use one partition as follows:

mysql> explain partitions select * from City where Population=42;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+-----------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | Extra                             |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+-----------------------------------+
| 1  | SIMPLE      | City  | p3         | ALL  | NULL          | NULL | NULL    | NULL | 4079 | Using where with pushed condition |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+-----------------------------------+
1 row in set (0.01 sec)

Now, notice that queries against the old partitioning function, ID, use all partitions as follows:

mysql> explain partitions select * from City where ID=1;
+----+-------------+-------+-------------+------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | partitions  | type | possible_keys | key     | key_len | ref   | rows | Extra |
+----+-------------+-------+-------------+------+---------------+---------+---------+-------+------+-------+
| 1  | SIMPLE      | City  | p0,p1,p2,p3 | ref  | PRIMARY       | PRIMARY | 4       | const | 10   |       |
+----+-------------+-------+-------------+------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)

Congratulations! You have now set up user-defined partitioning. Now, benchmark your application to see if you have gained an increase in performance.

There's more...

User-defined partitioning can be particularly useful where you have multiple tables and a join. For example, if you had a table of Areas within Cities consisting of an ID field (primary key, auto increment, and default partitioning field) and then a City ID, you would likely find an enormous number of queries that select all of the locations within a certain city and also select the relevant city row. It would therefore make sense to keep all of the rows with the same City value inside the Areas table together on one node, and to keep each of these groups of City values inside the Areas table on the same node as the relevant City row in the City table. This can be achieved by configuring both tables to use the City field as a partitioning function, as we did with the Population field earlier; a sketch follows.
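As a rough illustration only: the Areas table below, including its name and its ID, CityID, and Name columns, is hypothetical and does not come from the original recipe. The idea is simply that both tables hash the shared city identifier with KEY partitioning, so matching rows land in the same partition:

-- Hypothetical Areas table, partitioned on the value it shares with City
mysql> CREATE TABLE Areas (
    ->   ID int(11) NOT NULL AUTO_INCREMENT,
    ->   CityID int(11) NOT NULL,
    ->   Name char(35) NOT NULL,
    ->   PRIMARY KEY (ID, CityID)
    -> ) ENGINE=NDBCLUSTER
    -> PARTITION BY KEY (CityID);

-- In this scenario City stays partitioned on its own ID, so a given city ID
-- hashes to the same partition in both tables:
mysql> ALTER TABLE City PARTITION BY KEY (ID);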

First Principle and a Useful Way to Think

Packt
08 Oct 2015
8 min read
In this article, by Timothy Washington, author of the book Clojure for Finance, we will cover the following topics:

Modeling the stock price activity
Function evaluation
First-class functions
Lazy evaluation
Basic Clojure functions and immutability
Namespace modifications and creating our first function

(For more resources related to this topic, see here.)

Modeling the stock price activity

There are many types of banks. Commercial entities (large stores, parking areas, hotels, and so on) that collect and retain credit card information are either quasi banks, or farm out credit operations to bank-like processing companies. There are the more well-known consumer banks, which accept demand deposits from the public. There are also a range of other banks such as commercial banks, insurance companies and trusts, credit unions, and in our case, investment banks.

As promised, this article will slowly build up a set of lagging price indicators that follow a moving stock price time series. In order to do that, I think it's useful to touch on stock markets, and to crudely model stock price activity. A stock (or equity) market is a collection of buyers and sellers trading economic assets (usually companies). The stock (or shares) of those companies can be equities listed on an exchange (New York Stock Exchange, London Stock Exchange, and others), or may be traded privately. In this exercise, we will do the following:

Crudely model the stock price movement, which will give us a test bed for writing our lagging price indicators
Introduce some basic features of the Clojure language

Function evaluation

The Clojure website has a cheatsheet (http://clojure.org/cheatsheet) with all of the language's core functions. The first function we'll look at is rand, a function that randomly gives you a number within a given range. So in your edgar/ project, launch a REPL with the lein repl shell command. After a few seconds, you will enter the REPL (Read-Eval-Print-Loop). Again, Clojure functions are executed by being placed in the first position of a list, with the function's arguments placed directly afterwards. In your REPL, evaluate (rand 35) or (rand 99) or (rand 43.21) or any number you fancy. Run it many times to see that you get a different floating-point number each time, within 0 and the upper bound of the number you provided.

First-class functions

The next functions we'll look at are repeatedly and fn. repeatedly is a function that takes another function and returns an infinite (or length n, if supplied) lazy sequence of calls to the latter function. This is our first encounter with a function that can take another function. We'll also encounter functions that return other functions. Described as first-class functions, this falls out of lambda calculus and is one of the central features of functional programming. As such, we need to wrap our previous (rand 35) call in another function. fn is one of Clojure's core functions, and produces an anonymous, unnamed function. We can now supply this function to repeatedly. In your REPL, if you evaluate (take 25 (repeatedly (fn [] (rand 35)))), you should see a long list of floating-point numbers with the list's tail elided.

Lazy evaluation

We only took the first 25 elements of the (repeatedly (fn [] (rand 35))) result list, because the list (actually a lazy sequence) is infinite. Lazy evaluation (or laziness) is a common feature in functional programming languages.
Being infinite, Clojure chooses to delay evaluating most of the list until it's needed by some other function that pulls out some values. Laziness benefits us by increasing performance and letting us more easily construct control flow. We can avoid needless calculation, repeated evaluations, and potential error conditions in compound expressions. Let's try to pull out some values with the take function. take itself returns another lazy sequence, of the first n items of a collection. Evaluating (take 25 (repeatedly (fn [] (rand 35)))) will pull out the first 25 repeatedly calls to rand, which generates a float between 0 and 35.

Basic Clojure functions and immutability

There are many operations we can perform over our result list (or lazy sequence). One of the main approaches of functional programming is to take a data structure and perform operations on top of it to produce a new data structure, or some atomic result (a string, number, and so on). This may sound inefficient at first, but most FP languages employ something called immutability to make these operations efficient. Immutable data structures are ones that cannot change once they've been created. This is feasible as most immutable FP languages use some kind of structural data sharing between an original and a modified version. The idea is that if we evaluate (conj [1 2 3] 4), the resulting [1 2 3 4] vector shares the original vector of [1 2 3]. The only additional resource that's assigned is for any novelty that's been introduced to the data structure (the 4). There are more detailed explanations of (for example) Clojure's persistent vectors available elsewhere.

conj: This conjoins an element to a collection—the collection decides where. So conjoining an element to a vector (conj [1 2 3] 4) versus conjoining an element to a list (conj '(1 2 3) 4) yields different results. Try it in your REPL.
map: This passes a function over one or many lists, yielding another list. (map inc [1 2 3]) increments each element by 1.
reduce (or left fold): This passes a function over each element, accumulating one result. (reduce + (take 100 (repeatedly (fn [] (rand 35))))) sums the list.
filter: This constrains the input by some condition.
>=: This is a conditional function, which tests whether the first argument is greater than or equal to the second argument. Try (>= 4 9) and (>= 9 1).
fn: This is a function that creates a function. This unnamed or anonymous function can have any instructions you choose to put in there.

So if we only want numbers above 12, we can put that assertion in a predicate function. Try entering the below expression into your REPL:

(take 25 (filter (fn [x] (>= x 12)) (repeatedly (fn [] (rand 35)))))

Modifying the namespaces and creating our first function

We now have the basis for creating a function. It will return a lazy, infinite sequence of floating-point numbers within an upper and lower bound. defn is a Clojure function which takes an anonymous function and binds a name to it in a given namespace. A Clojure namespace is an organizational tool for mapping human-readable names to things like functions, named data structures, and such. Here, we're going to bind our function to the name generate-prices in our current namespace. You'll notice that our function is starting to span multiple lines. This will be a good time to author the code in your text editor of choice. I'll be using Emacs: open your text editor, and add this code to the file called src/edgar/core.clj. Make sure that (ns edgar.core) is at the top of that file.
After adding the following code, you can then restart the REPL. (load "edgar/core") uses the load function to load the Clojure code in your src/edgar/core.clj:

(defn generate-prices [lower-bound upper-bound]
  (filter (fn [x] (>= x lower-bound))
          (repeatedly (fn [] (rand upper-bound)))))

The Read-Eval-Print-Loop

In our REPL, we can pull in code in various namespaces with the require function. This applies to the src/edgar/core.clj file we've just edited. That code will be in the edgar.core namespace. In your REPL, evaluate (require '[edgar.core :as c]). c is just a handy alias we can use instead of the long name. You can then generate random prices within an upper and lower bound. Take the first 10 of them like this: (take 10 (c/generate-prices 12 35)). You should see results akin to the following output. All elements should be within the range of 12 to 35:

(29.60706184716407 12.507593971664075 19.79939384292759 31.322074615579716 19.737852534147326 25.134649707849572 19.952195022152488 12.94569843904663 23.618693004455086 14.695872710062428)

There's a subtle abstraction in the preceding code that deserves attention. (require '[edgar.core :as c]) introduces the quote symbol. ' is the reader shorthand for the quote function. So the equivalent invocation would be (require (quote [edgar.core :as c])). Quoting a form tells the Clojure reader not to evaluate the subsequent expression (or form). So evaluating '(a b c) returns a list of three symbols, without trying to evaluate a. Even though those symbols haven't yet been assigned, that's okay, because that expression (or form) has not yet been evaluated.

But that begs a larger question. What is the reader? Clojure (and all Lisps) are what's known as homoiconic. This means that Clojure code is also data, and data can be directly output and evaluated as code. The reader is the thing that parses our src/edgar/core.clj file (or (+ 1 1) input from the REPL prompt), and produces the data structures that are evaluated. read and eval are the two essential processes by which Clojure code runs. The evaluation result is printed (or output), usually to the standard output device. Then we loop the process back to the read function. So, when the REPL reads your src/edgar/core.clj file, it's directly transforming that text representation into data and evaluating it. A few things fall out of that. For example, it becomes simpler for Clojure programs to directly read, transform, and write out other Clojure programs. The implications of that will become clearer when we look at macros. But for now, know that there are ways to modify or delay the evaluation process, in this case by quoting a form.

Summary

In this article, we learned about basic features of the Clojure language and how to model stock price activity. Besides these, we also learned about function evaluation, first-class functions, lazy evaluation, namespace modifications, and creating our first function.

Resources for Article:

Further resources on this subject: Performance by Design [article], Big Data [article], The Observer Pattern [article]
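As a small taste of where this is heading (the lagging price indicators mentioned at the start), here is a hedged sketch, not taken from the book, that combines reduce, count, and take from this article to average the first few generated prices. The average name is invented for this example and it assumes the c alias from the require call above:

;; A minimal sketch, assuming (require '[edgar.core :as c]) has been evaluated.
;; `average` is a helper name invented for this example.
(defn average [prices]
  (/ (reduce + prices) (count prices)))

;; Average the first 20 generated prices (a crude, single-window indicator).
(average (take 20 (c/generate-prices 12 35)))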

Event detection from the news headlines in Hadoop

Packt
08 Dec 2016
13 min read
In this article by Anurag Shrivastava, author of Hadoop Blueprints, we will learn how to build a text analytics system that detects specific events in random news headlines. The Internet has become the main source of news in the world. There are thousands of websites which constantly publish and update news stories from around the world. Not every news item is relevant for everyone, but some news items are very critical for some people or businesses. For example, if you were a major car manufacturer based in Germany with suppliers located in India, then you would be interested in news from the region which could affect your supply chain.

(For more resources related to this topic, see here.)

Road accidents in India are a major social and economic problem. Road accidents leave a large number of fatalities behind and result in the loss of capital. In this example, we will build a system which detects whether a news item refers to a road accident event. Let us define what we mean by that in the next paragraph. A road accident event may or may not result in fatal injuries. One or more vehicles and pedestrians may be involved in the accident. A non-road-accident-event news item is everything else which cannot be categorized as a road accident event; it could be a trend analysis related to road accidents, or something totally unrelated.

Technology stack

To build this system, we will use the following technologies:

Data storage: HDFS
Data processing: Hadoop MapReduce
Query engine: Hive and Hive UDF
Data ingestion: Curl and HDFS copy
Event detection: OpenNLP

The event detection system is a machine-learning-based natural language processing system. The natural language processing system brings the intelligence to detect the events in the random headline sentences from the news items.

OpenNLP

The Open Source Natural Language Processing Framework (OpenNLP) is from the Apache Software Foundation. You can download version 1.6.0 from https://opennlp.apache.org/ to run the examples in this blog. It is capable of detecting entities, document categories, parts of speech, and so on in text written by humans. We will use the document categorization feature of OpenNLP in our system. Document categorization requires you to train the OpenNLP model with the help of sample text. As a result of training, we get a model. This resulting model is used to categorize new text. Our training data looks as follows:

r 1.46 lakh lives lost on Indian roads last year - The Hindu.
r Indian road accident data | OpenGovernmentData (OGD) platform...
r 400 people die everyday in road accidents in India: Report - India TV.
n Top Indian female biker dies in road accident during country-wide tour.
n Thirty die in road accidents in north India mountains—World—Dunya...
n India's top woman biker Veenu Paliwal dies in road accident: India...
r Accidents on India's deadly roads cost the economy over $8 billion...
n Thirty die in road accidents in north India mountains (The Express)

The first column can take two values:

n indicates that the news item is a road accident event
r indicates that the news item is not a road accident event, that is, everything else

This training set has a total of 200 lines. Please note that OpenNLP requires at least 15000 lines in the training set to deliver good results. Because we do not have so much training data, we will start with a small set but remain aware of the limitations of our model.
You will see that even with a small training dataset, this model works reasonably well. Let us train and build our model:

$ opennlp DoccatTrainer -model en-doccat.bin -lang en -data roadaccident.train.prn -encoding UTF-8

Here the file roadaccident.train.prn contains the training data. The output file en-doccat.bin contains the model which we will use in our data pipeline. We have built our model using the command-line utility, but it is also possible to build the model programmatically. The training data file is a plain text file, which you can expand with a bigger corpus of knowledge to make the model smarter. Next we will build the data pipeline as follows:

Fetch RSS feeds

This component will fetch RSS news feeds from popular news websites. In this case, we will just use one feed from Google. We can always add more sites after our first RSS feed has been integrated. The whole RSS feed can be downloaded using the following command:

$ curl "https://news.google.com/news?cf=all&hl=en&ned=in&topic=n&output=rss"

The previous command downloads the news headlines for India. You can customize the RSS feed by visiting the Google news site at https://news.google.com for your region.

Scheduler

Our scheduler will fetch the RSS feed once every 6 hours. Let us assume that in a 6-hour time interval, we have a good likelihood of fetching fresh news items. We will wrap our feed-fetching script in a shell file and invoke it using cron. The script is as follows:

$ cat feedfetch.sh
NAME="newsfeed-"`date +%Y-%m-%dT%H.%M.%S`
curl "https://news.google.com/news?cf=all&hl=en&ned=in&topic=n&output=rss" > $NAME
hadoop fs -put $NAME /xml/rss/newsfeeds

The cron job setup line will be as follows:

0 */6 * * * /home/hduser/mycommand

Please edit your cron job table using the following command and add the setup line to it:

$ crontab -e

Loading data in HDFS

To load data into HDFS, we will use the HDFS put command, which copies the downloaded RSS feed into a directory in HDFS. Let us make this directory in HDFS where our feed fetcher script will store the RSS feeds:

$ hadoop fs -mkdir /xml/rss/newsfeeds

Query using Hive

First we will create an external table in Hive for the new RSS feeds. Using XPath-based select queries, we will extract the news headlines from the RSS feeds.
These headlines will be passed to the UDF to detect the categories:

CREATE EXTERNAL TABLE IF NOT EXISTS rssnews(
  document STRING)
COMMENT 'RSS Feeds from media'
STORED AS TEXTFILE
location '/xml/rss/newsfeeds';

The following command parses the XML to retrieve the titles, or headlines, from the XML and explodes them into a single-column table:

SELECT explode(xpath(document, '//item/title/text()')) FROM rssnews;

The sample output of the above command on my system is as follows:

hive> select explode(xpath(document, '//item/title/text()')) from rssnews;
Query ID = hduser_20161010134407_dcbcfd1c-53ac-4c87-976e-275a61ac3e8d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1475744961620_0016, Tracking URL = http://localhost:8088/proxy/application_1475744961620_0016/
Kill Command = /home/hduser/hadoop-2.7.1/bin/hadoop job -kill job_1475744961620_0016
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2016-10-10 14:46:14,022 Stage-1 map = 0%, reduce = 0%
2016-10-10 14:46:20,464 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.69 sec
MapReduce Total cumulative CPU time: 4 seconds 690 msec
Ended Job = job_1475744961620_0016
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 4.69 sec HDFS Read: 120671 HDFS Write: 1713 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 690 msec
OK
China dispels hopes of early breakthrough on NSG, sticks to its guns on Azhar - The Hindu
Pampore attack: Militants holed up inside govt building; combing operations intensify - Firstpost
CPI(M) worker hacked to death in Kannur - The Hindu
Akhilesh Yadav's comment on PM Modi's Lucknow visit shows Samajwadi Party's insecurity: BJP - The Indian Express
PMO maintains no data about petitions personally read by PM - Daily News & Analysis
AIADMK launches social media campaign to put an end to rumours regarding Amma's health - Times of India
Pakistan, India using us to play politics: Former Baloch CM - Times of India
Indian soldier, who recited patriotic poem against Pakistan, gets death threat - Zee News
This Dussehra effigies of 'terrorism' to go up in flames - Business Standard
'Personal reasons behind Rohith's suicide': Read commission's report - Hindustan Times
Time taken: 5.56 seconds, Fetched: 10 row(s)

Hive UDF

Our Hive User Defined Function (UDF) categorizeDoc takes a news headline and suggests whether it is a news item about a road accident event, as we explained earlier.
This function is as follows:

package com.mycompany.app;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import opennlp.tools.util.InvalidFormatException;
import opennlp.tools.doccat.DoccatModel;
import opennlp.tools.doccat.DocumentCategorizerME;
import java.lang.String;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.IOException;

@Description(
  name = "getCategory",
  value = "_FUNC_(string) - gets the category of a document ")
public final class MyUDF extends UDF {

  public Text evaluate(Text input) {
    if (input == null) return null;
    try {
      return new Text(categorizeDoc(input.toString()));
    } catch (Exception ex) {
      ex.printStackTrace();
      return new Text("Sorry Failed: >> " + input.toString());
    }
  }

  public String categorizeDoc(String doc) throws InvalidFormatException, IOException {
    // Load the trained document categorization model from the local directory
    InputStream is = new FileInputStream("./en-doccat.bin");
    DoccatModel model = new DoccatModel(is);
    is.close();
    // Categorize the headline and return the best category (n or r)
    DocumentCategorizerME classificationME = new DocumentCategorizerME(model);
    String documentContent = doc;
    double[] classDistribution = classificationME.categorize(documentContent);
    String predictedCategory = classificationME.getBestCategory(classDistribution);
    return predictedCategory;
  }
}

The function categorizeDoc takes a single string as input. It loads the model, which we created earlier, from the file en-doccat.bin in the local directory. Finally, it calls the classifier, which returns the result to the calling function. The wrapper class MyUDF extends the Hive UDF class; its evaluate method calls categorizeDoc for each input line. If the call succeeds, the value is returned to the calling program; otherwise a message is returned which indicates that the category detection has failed.
The pom.xml file to build the above file is as follows:

$ cat pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.mycompany</groupId>
  <artifactId>app</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
  </properties>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.12</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.7.1</version>
      <type>jar</type>
    </dependency>
    <dependency>
      <groupId>org.apache.hive</groupId>
      <artifactId>hive-exec</artifactId>
      <version>2.0.0</version>
      <type>jar</type>
    </dependency>
    <dependency>
      <groupId>org.apache.opennlp</groupId>
      <artifactId>opennlp-tools</artifactId>
      <version>1.6.0</version>
    </dependency>
  </dependencies>
  <build>
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <version>2.8</version>
        </plugin>
        <plugin>
          <artifactId>maven-assembly-plugin</artifactId>
          <configuration>
            <archive>
              <manifest>
                <mainClass>com.mycompany.app.App</mainClass>
              </manifest>
            </archive>
            <descriptorRefs>
              <descriptorRef>jar-with-dependencies</descriptorRef>
            </descriptorRefs>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>

You can build the jar with all the dependencies in it using the following command:

$ mvn clean compile assembly:single

The resulting jar file app-1.0-jar-with-dependencies.jar can be found in the target directory. Let us use this jar file in Hive to categorize the news headlines as follows:

Copy the jar file to the bin subdirectory in the Hive root:

$ cp app-1.0-jar-with-dependencies.jar $HIVE_ROOT/bin

Copy the trained model to the bin subdirectory in the Hive root:

$ cp en-doccat.bin $HIVE_ROOT/bin

Now run the categorization queries. Run Hive:

$ hive

Add the jar file in Hive:

hive> ADD JAR ./app-1.0-jar-with-dependencies.jar;

Create a temporary categorization function catDoc:

hive> CREATE TEMPORARY FUNCTION catDoc as 'com.mycompany.app.MyUDF';

Create a table headlines to hold the headlines extracted from the RSS feed:

hive> create table headlines( headline string);

Insert the extracted headlines into the table headlines:

hive> insert overwrite table headlines select explode(xpath(document, '//item/title/text()')) from rssnews;

Let's test our UDF by manually passing a real news headline to it from a newspaper website:

hive> select catDoc("8 die as SUV falls into river while crossing bridge in Ghazipur");
OK
N

The output is N, which means this is indeed a headline about a road accident incident.
This is reasonably good, so now let us run this function for all the headlines:

hive> select headline, catDoc(*) from headlines;
OK
China dispels hopes of early breakthrough on NSG, sticks to its guns on Azhar - The Hindu   r
Pampore attack: Militants holed up inside govt building; combing operations intensify - Firstpost   r
Akhilesh Yadav Backs Rahul Gandhi's 'Dalali' Remark - NDTV   r
PMO maintains no data about petitions personally read by PM Narendra Modi - Economic Times   n
Mobile Internet Services Suspended In Protest-Hit Nashik - NDTV   n
Pakistan, India using us to play politics: Former Baloch CM - Times of India   r
CBI arrests Central Excise superintendent for taking bribe - Economic Times   n
Be extra vigilant during festivals: Centre's advisory to states - Times of India   r
CPI-M worker killed in Kerala - Business Standard   n
Burqa-clad VHP activist thrashed for sneaking into Muslim women gathering - The Hindu   r
Time taken: 0.121 seconds, Fetched: 10 row(s)

You can see that our headline detection function works and outputs r or n. In the above example, we see several false positives, where a headline has been incorrectly identified as a road accident. Better training of our model can improve the quality of our results.

Further reading

The book Hadoop Blueprints covers several case studies where we can apply Hadoop, HDFS, data ingestion tools such as Flume and Sqoop, query and visualization tools such as Hive and Zeppelin, and machine learning tools such as BigML and Spark to build solutions. You will discover, for example, how to build a fraud detection system using Hadoop, or how to build a Data Lake.

Summary

In this article we learned how to build a text analytics system which detects specific events in random news headlines. We also covered how to apply Hadoop, HDFS, and other related tools along the way.

Resources for Article:

Further resources on this subject: Spark for Beginners [article], Hive Security [article], Customizing heat maps (Intermediate) [article]

Author Podcast - Bob Griesemer on Oracle Warehouse Builder 11g

Packt
09 Apr 2010
1 min read
Click here to download the interview, or hit play in the media player below.    

Securing Your Data

Packt
12 Oct 2015
6 min read
In this article by Tyson Cadenhead, author of Socket.IO Cookbook, we will explore several topics related to security in Socket.IO applications. These topics will cover the gamut, from authentication and validation to how to use the wss:// protocol for secure WebSockets. As the WebSocket protocol opens innumerable opportunities to communicate more directly between the client and the server, people often wonder if Socket.IO is actually as secure as something such as the HTTP protocol. The answer to this question is that it depends entirely on how you implement it. WebSockets can easily be locked down to prevent malicious or accidental security holes, but as with any API interface, your security is only as tight as your weakest link.

In this article, we will cover the following topics:

Locking down the HTTP referrer
Using secure WebSockets

(For more resources related to this topic, see here.)

Locking down the HTTP referrer

Socket.IO is really good at getting around cross-domain issues. You can easily include the Socket.IO script from a different domain on your page, and it will just work as you may expect it to. There are some instances where you may not want your Socket.IO events to be available on every other domain. Not to worry! We can easily whitelist only the HTTP referrers that we want, so that some domains will be allowed to connect and other domains won't.

How To Do It…

To lock down the HTTP referrer and only allow events to whitelisted domains, follow these steps:

Create two different servers that can connect to our Socket.IO instance. We will let one server listen on port 5000 and the second server listen on port 5001:

var express = require('express'),
    app = express(),
    http = require('http'),
    socketIO = require('socket.io'),
    server,
    server2,
    io;

app.get('/', function (req, res) {
  res.sendFile(__dirname + '/index.html');
});

server = http.Server(app);
server.listen(5000);

server2 = http.Server(app);
server2.listen(5001);

io = socketIO(server);

When the connection is established, check the referrer in the headers. If it is a referrer that we want to give access to, we can let our connection perform its tasks and build up events as normal. If a blacklisted referrer, such as the one on port 5001 that we created, attempts a connection, we can politely decline and perhaps throw an error message back to the client, as shown in the following code:

io.on('connection', function (socket) {
  switch (socket.request.headers.referer) {
    case 'http://localhost:5000/':
      socket.emit('permission.message', 'Okay, you\'re cool.');
      break;
    default:
      return socket.emit('permission.message', 'Who invited you to this party?');
      break;
  }
});

On the client side, we can listen to the response from the server and react as appropriate using the following code:

socket.on('permission.message', function (data) {
  document.querySelector('h1').innerHTML = data;
});

How It Works…

The referrer is always available in the socket.request.headers object of every socket, so we will be able to inspect it there to check whether it came from a trusted source. In our case, we will use a switch statement to whitelist our domain on port 5000, but we could really use any mechanism at our disposal to perform the task. For example, if we need to dynamically whitelist domains, we can store a list of them in our database and search for it when the connection is established, as sketched below.
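As a rough illustration of that last point, the following hypothetical sketch (not from the recipe) replaces the switch statement with a lookup against a list of allowed referrers; loadWhitelistedReferrers is a made-up placeholder for your own database query:

// Hypothetical example: look the referrer up in a whitelist loaded from storage.
// loadWhitelistedReferrers stands in for whatever database call you actually use.
function loadWhitelistedReferrers(callback) {
  // e.g. query MongoDB/Redis/SQL here; hard-coded for this sketch
  callback(['http://localhost:5000/', 'http://example.com/']);
}

io.on('connection', function (socket) {
  loadWhitelistedReferrers(function (whitelist) {
    if (whitelist.indexOf(socket.request.headers.referer) !== -1) {
      socket.emit('permission.message', 'Okay, you\'re cool.');
    } else {
      socket.emit('permission.message', 'Who invited you to this party?');
      socket.disconnect(true);
    }
  });
});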
Using secure WebSockets

WebSocket communications can either take place over the ws:// protocol or the wss:// protocol. In similar terms, they can be thought of as the HTTP and HTTPS protocols, in the sense that one is secure and one isn't. Secure WebSockets are encrypted by the transport layer, so they are safer to use when you handle sensitive data. In this recipe, you will learn how to force our Socket.IO communications to happen over the wss:// protocol for an extra layer of encryption.

Getting Ready…

In this recipe, we will need to create a self-signed certificate so that we can serve our app locally over the HTTPS protocol. For this, we will need an npm package called pem. This allows you to create a self-signed certificate that you can provide to your server. Of course, in a real production environment, we would want a true SSL certificate instead of a self-signed one. To install pem, simply call npm install pem --save.

As our certificate is self-signed, you will probably see something similar to the following screenshot when you navigate to your secure server:

Just take a chance by clicking on the Proceed to localhost link. You'll see your application load using the HTTPS protocol.

How To Do It…

To use the secure wss:// protocol, follow these steps:

First, create a secure server using the built-in node HTTPS package. We can create a self-signed certificate with the pem package so that we can serve our application over HTTPS instead of HTTP, as shown in the following code:

var https = require('https'),
    pem = require('pem'),
    express = require('express'),
    app = express(),
    socketIO = require('socket.io');

// Create a self-signed certificate with pem
pem.createCertificate({
  days: 1,
  selfSigned: true
}, function (err, keys) {

  app.get('/', function (req, res) {
    res.sendFile(__dirname + '/index.html');
  });

  // Create an https server with the certificate and key from pem
  var server = https.createServer({
    key: keys.serviceKey,
    cert: keys.certificate
  }, app).listen(5000);

  var io = socketIO(server);

  io.on('connection', function (socket) {
    var protocol = 'ws://';

    // Check the handshake to determine if it was secure or not
    if (socket.handshake.secure) {
      protocol = 'wss://';
    }

    socket.emit('hello.client', {
      message: 'This is a message from the server. It was sent using the ' + protocol + ' protocol'
    });
  });

});

In your client-side JavaScript, specify secure: true when you initialize your WebSocket as follows:

var socket = io('//localhost:5000', {
  secure: true
});

socket.on('hello.client', function (data) {
  console.log(data);
});

Now, start your server and navigate to https://localhost:5000. Proceed to this page. You should see a message in your browser developer tools that says, "This is a message from the server. It was sent using the wss:// protocol."

How It Works…

The protocol of our WebSocket is actually set automatically based on the protocol of the page that it sits on. This means that a page served over the HTTP protocol will send the WebSocket communications over ws:// by default, and a page served by HTTPS will default to using the wss:// protocol. However, by setting the secure option to true, we told the WebSocket to always serve through wss:// no matter what.

Summary

In this article, we gave you an overview of topics related to security in Socket.IO applications.

Resources for Article:

Further resources on this subject: Using Socket.IO and Express together [article], Adding Real-time Functionality Using Socket.io [article], Welcome to JavaScript in the full stack [article]

FAQs on Microsoft SQL Server 2008 High Availability

Packt
28 Jan 2011
6 min read
Microsoft SQL Server 2008 High Availability Minimize downtime, speed up recovery, and achieve the highest level of availability and reliability for SQL server applications by mastering the concepts of database mirroring,log shipping,clustering, and replication  Install various SQL Server High Availability options in a step-by-step manner  A guide to SQL Server High Availability for DBA aspirants, proficient developers and system administrators  Learn the pre and post installation concepts and common issues you come across while working on SQL Server High Availability  Tips to enhance performance with SQL Server High Availability  External references for further study Q: What is Clustering? A: Clustering is usually deployed when there is a critical business application running that needs to be available 24 X 7 or in terminology—High Availability. These clusters are known as Failover clusters because the primary goal to set up the cluster is to make services or business processes that are critical for business and should be available 24 X 7 with 99.99% up time. Q: How does MS Windows server Enterprise and Datacenter edition support failover clustering? A: MS Windows server Enterprise and Datacenter edition supports failover clustering. This is achieved by having two or more identical nodes connected to each other by means of private network and commonly used resources. In case of failure of any common resource or services, the first node (Active) passes the ownership to another node (Passive). Q: What is MSDTC? A: Microsoft Distributed Transaction Coordinator (MSDTC) is a service used by the SQL Server when it is required to have distributed transactions between more than one machine. In a clustered environment, SQL Server service can be hosted on any of the available nodes if the active node fails, and in this case MSDTC comes into the picture in case we have distributed queries and for replication, and hence the MSDTC service should be running. Following are a couple of questions with regard to MSDTC. Q: What will happen to the data that is being accessed? A: The data is taken care of, by shared disk arrays as it is shared and every node that is part of the cluster can access it; however, one node at a time can access and own it. Q: What about clients that were connected previously? Does the failover mean that developers will have to modify the connection string? A: Nothing like this happens. SQL Server is installed as a virtual server and it has a virtual IP address and that too is shared by every cluster node. So, the client actually knows only one SQL Server or its IP address. Here are the steps that explain how Failover will work: Node 1 owns the resources as of now, and is active node. The network adapter driver gets corrupted or suffers a physical damage. Heartbeat between Node1 and Node 2 is broken. Node 2 initiates the process to take ownership of the resources owned by the Node 1. It would approximately take two to five minutes to complete the process. Q: What is Hyper-V? What are their uses? A: Let's see what the Hyper-V is: It is a hypervisor-based technology that allows multiple operating systems to run on a host operating system at the same time. It has advantages of using SQL Server 2008 R2 on Windows Server 2008 R2 with Hyper-V. One such example could be the ability to migrate a live server, thereby increasing high availability without incurring downtime, among others. Hyper-V now supports up to 64 logical processors. 
It can host up to four VMs on a single licensed host server. SQL Server 2008 R2 allows an unrestricted number of virtual servers, thus making consolidation easy. It has the ability to manage multiple SQL Servers centrally using Utility Control Point (UCP). Sysprep utility can be used to create preconfigured VMs so that SQL Server deployment becomes easier. Q: What are the Hardware, Software and Operating system requirements for installing SQL Server 2008 R2? A: The following are the hardware requirements: Processor: Intel Pentium 3 or Higher Processor Speed: 1 GHZ or Higher RAM: 512 MB of RAM but 2 GB is recommended Display : VGA or Higher The following are the software requirements: Operating system: Windows 7 Ultimate, Windows Server 2003 (x86 or x64), Windows Server 2008 (x86 or x64) Disk space: Minimum 1 GB .Net Framework 3.5 Windows Installer 4.5 or later MDAC 2.8 SP1 or later The following are the operating system requirements for clustering: To install SQL Server 2008 clustering, it's essential to have Windows Server 2008 Enterprise or Data Center Edition installed on our host system with Full Installation, so that we don't have to go back and forth and install the required components and restart the system. Q: What is to be done when we see the network binding warning coming up? A: In this scenario, we will have to go to Network and Sharing Center | Change Adapter Settings. Once there, pressing Alt + F, we will select Advanced Settings. Select Public Network and move it up if it is not and repeat this process on the second node. Q: What is the difference between Active/Passive and Active/Active Failover Cluster? A: In reality, there is only one difference between Single-instance (Active/Passive Failover Cluster) and Multi-instance (Active/Active Failover Cluster). As its name suggests, in a Multi-instance cluster, there will be two or more SQL Server active instances running in a cluster, compared to one instance running in Single-instance. Also, to configure a multi-instance Cluster, we may need to procure additional disks, IP addresses, and network names for the SQL Server. Q: What is the benefit of having Multi-instance, that is, Active/Active configuration? A: Depending on the business requirement and the capability of our hardware, we may have one or more instances running in our cluster environment. The main goal is to have a better uptime and better High Availability by having multiple SQL Server instances running in an environment. Should anything go wrong with the one SQL Server instance, another instance can easily take over the control and keep the business-critical application up and running! Q: What will be the difference in the prerequisites for the Multi-instance Failover Cluster as compared to the Single-instance Failover Cluster? A: There will be no difference compared to a Single-instance Failover Cluster, except that we need to procure additional disk(s), network name, and IP addresses. We need to make sure that our hardware is capable of handling requests that come from client machines for both the instances. Installing a Multi-instance cluster is almost similar to adding a Single-instance cluster, except for the need to add a few resources along with a couple of steps here and there.
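The questions above stay at the conceptual level, but once an instance is installed on a cluster you can confirm its state directly from T-SQL. The following is a quick sketch; SERVERPROPERTY and the sys.dm_os_cluster_nodes view are available in SQL Server 2008, although the exact columns exposed by the DMV can differ slightly between versions:

-- Is this instance clustered, and which node is it currently running on?
SELECT SERVERPROPERTY('IsClustered') AS IsClustered,
       SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS CurrentNode;

-- Which nodes are able to host this clustered instance?
SELECT NodeName FROM sys.dm_os_cluster_nodes;

Running the first query after a failover is a simple way to verify that the passive node has indeed taken ownership, as described in the failover steps earlier.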

Measuring Performance with Key Performance Indicators

Packt
10 Jul 2013
4 min read
(For more resources related to this topic, see here.) Creating the KPIs and the KPI watchlists We're going to create Key Performance Indicators and watchlists in the first recipe. There should be comparable measure columns in the repository in order to create KPI objects. The following columns will be used in the sample scenario: Shipped Quantity Requested Quantity How to do it Click on the KPI link in the Performance Management section and you're going to select a subject area. The KPI creation wizard has five different steps. The first step is the General Propertiessection and we're going to write a description for the KPI object. The Actual Value and the Target Value attributes display the columns that we'll use in this scenario. The columns should be selected manually. The Enable Trendingcheckbox is not selected by default. When you select the checkbox, trending options will appear on the screen. We're going to select the Day level from the Time hierarchy for trending in the Compare to Prior textbox and define a value for the Tolerance attribute. We're going to use 1 and % Change in this scenario. Clicking on the Next button will display the second step named Dimensionality. Click on the Add button to select Dimension attributes. Select the Region column in the Add New Dimension window. After adding the Region column, repeat the step for the YEAR column. You shouldn't select any value to pin. Both column values will be . Clicking on Next button will display the third step named States. You can easily configure the state values in this step. Select the High Values are Desirable value from the Goal drop-down list. By default, there are three steps: OK Warning Critical Then click on the Next button and you'll see the Related Documents step. This is a list of supporting documents and links regarding to the Key Performance Indicator. Click on the Add button to select one of the options. If you want to use another analysis as a supporting document, select the Catalog option and choose the analysis that consists of some valuable information about the report. We're going to add a link. You can also easily define the address of the link. We'll use the http://www.abc.com/portal link. Click on the Next button to display the Custom Attributes column values. To add a custom attribute that will be displayed in the KPI object, click on the Add button and define the values specified as follows: Number: 1 Label: Dollars Formula: "Fact_Sales"."Dollars" Save the KPI object by clicking on the Save button. Right after saving the KPI object, you'll see the KPI content. KPI objects cannot be published in the dashboards directly. We need KPI watchlists to publish them in the dashboards. Click on the KPI Watchlist link in the Performance Managementsection to create one. The New KPI Watchlist page will be displayed without any KPI objects. Drag-and-drop the KPI object that was previously created from the Catalog pane onto the KPI watchlist list. When you drop the KPI object, the Add KPI window will pop up automatically. You can select one of the available values for the dimensions. We're going to select the Use Point-of-View option. Enter a Label value, A Sample KPI, for this example. You'll see the dimension attributes in the Point-of-View bar. You can easily select the values from the drop-down lists to have different perspectives. Save the KPI watchlist object. How it works KPI watchlists can contain multiple KPI objects based on business requirements. 
These container objects can be published in the dashboards so that end users will access the content of the KPI objects through the watchlists. When you want to publish these watchlists, you'll need to select a value for the dimension attributes. There's more The Drill Down feature is also enabled in the KPI objects. If you want to access finer levels, you can just click on the hyperlink of the value you are interested in and a detailed level is going to be displayed automatically. Summary In this article, we learnt how to create KPIs and KPI watchlists. Key Performance Indicators are building blocks of strategy management. In order to implement balanced scorecard management technique in an organization, you'll first need to create the KPI objects. Resources for Article : Further resources on this subject: Oracle Integration and Consolidation Products [Article] Managing Oracle Business Intelligence[Article] Oracle Tools and Products [Article]

Setting goals

Packt
18 Dec 2013
8 min read
(for more resources related to this topic, see here.) You can use the following code to track the event: [Flurry logEvent:@"EVENT_NAME"]; The logEvent: method logs your event every time it's triggered during the application session. This method helps you to track how often that event is triggered. You can track up to 300 different event IDs. However, the length of each event ID should be less than 255 characters. After the event is triggered, you can track that event from your Flurry dashboard. As is explained in the following screenshot, your events will be listed in the Events section. After clicking on Event Summary, you can see a list of the events you have created along with the statistics of the generated data as shown in the following screenshot: You can fetch the detailed data by clicking on the event name (for example, USER_VIEWED). This section will provide you with a chart-based analysis of the data as shown in the following screenshot: The Events Per Session chart will provide you with details about how frequently a particular event is triggered in a session. Other than this, you are provided with the following data charts as well: Unique Users Performing Event: This chart will explain the frequency of unique users triggering the event. Total Events: This chart holds the event-generation frequency over a time period. You can access the frequency of the event being triggered over any particular time slot. Average Events Per Session: This charts holds the average frequency of the events that happen per session. There is another variation of this method, as shown in the following code, which allows you to track the events along with the specific user data provided: [Flurry logEvent:@"EVENT_NAME" withParameters:YOUR_NSDictionary]; This version of the logEvent: method counts the frequency of the event and records dynamic parameters in the form of dictionaries. External parameters should be in the NSDictionary format, whereas both the key and the value should be in the NSString object format. Let's say you want to track how frequently your comment section is used and see the comments, then you can use this method to track such events along with the parameters. You can track up to 300 different events with an event ID length less than 255 characters. You can provide a maximum of 10 event parameters per event. The following example illustrates the use of the logEvent: method along with optional parameters in the dictionary format: NSDictionary *dictionary = [NSDictionary dictionaryWithObjectsAndKeys:@"your dynamic parameter value", @"your dynamic parameter name",nil]; [Flurry logEvent:@"EVENT_NAME" withParameters:dictionary]; In case you want Flurry to log all your application sections/screens automatically, then you should pass navigationController or as a parameter to count all your pages automatically using one of the following code: [Flurry logAllPageViews:navigationController]; [Flurry logAllPageViews:tabBarController]; The Flurry SDK will create a delegate on your navigationControlleror tabBarController object, whichever is provided to detect the page's navigation. Each navigation detected will be tracked by the Flurry SDK automatically as a page view. You only need to pass each object to the Flurry SDK once. However you can pass multiple instances of different navigation and tab bar controllers. There can be some cases where you can have a view controller that is not associated with any navigation or tab bar controller. 
Then you can use the following code:

[Flurry logPageView];

The preceding code will track the event independently of navigation and tab bar controllers. For each user interaction you can manually log events.

Tracking time spent

Flurry also allows you to track events based on their duration. You can use the [Flurry logEvent: timed:] method to log a timed event as shown in the following code:

[Flurry logEvent:@"EVENT_NAME" timed:YES];

In case you want to pass additional parameters along with the event name, you can use the following form of the logEvent: method to start a timed event with event parameters:

[Flurry logEvent:@"EVENT_NAME" withParameters:YOUR_NSDictionary timed:YES];

The aforementioned method can help you to track your timed event along with the dynamic data provided in the dictionary format. You can end all your timed events before the application exits. This can even be accomplished by updating the event with new event parameters. If you want to end your events without updating the parameters, you can pass nil as the parameters. If you do not end your events, they will automatically end when the application exits. Ending a timed event looks like the following code:

[Flurry endTimedEvent:@"EVENT_NAME" withParameters:YOUR_NSDictionary];

Let's take the following example in which you want to log an event whenever a user comments on any article in your application:

NSDictionary *commentParams =
  [NSDictionary dictionaryWithObjectsAndKeys:
    @"User_Comment", @"Comment",    // Capture comment info
    @"Registered", @"User_Status",  // Capture user status
    nil];

[Flurry logEvent:@"User_Comment" withParameters:commentParams timed:YES];

// In a function that captures when a user posts the comment
[Flurry endTimedEvent:@"User_Comment" withParameters:nil];
// You can pass in additional params or update existing ones here as well

The preceding piece of code will log a timed event every time a user comments on an article in your application. While tracking the event, you are also tracking the comment and the user's registration status by specifying them in the dictionary.

Tracking errors

Flurry provides you with a method to track errors as well. You can use the following method to track errors on Flurry:

[Flurry logError:@"ERROR_NAME" message:@"ERROR_MESSAGE" exception:e];

You can track exceptions and errors that occur in the application by providing the name of the error (ERROR_NAME) along with a message, such as ERROR_MESSAGE, and an exception object. Flurry reports the first ten errors in each session. You can fetch all the application exceptions, and specifically uncaught exceptions, on Flurry. You can use the logError:message:exception: class method to catch all the uncaught exceptions. These exceptions will be logged in Flurry in the Error section, which is accessible on the Flurry dashboard:

// Uncaught Exception Handler - sent through Flurry.
void uncaughtExceptionHandler(NSException *exception) {
  [Flurry logError:@"Uncaught" message:@"Crash" exception:exception];
}

- (void)applicationDidFinishLaunching:(UIApplication *)application {
  NSSetUncaughtExceptionHandler(&uncaughtExceptionHandler);
  [Flurry startSession:@"YOUR_API_KEY"];
  // ....
}

Flurry also helps you to catch all the uncaught exceptions generated by the application. All the exceptions will be caught by using the NSSetUncaughtExceptionHandler function, to which you pass a handler that will catch all the exceptions raised during the application session.
All the errors reported can also be tracked using the logError:message:error: method. You can pass the error name, message, and object to log the NSError error on Flurry as shown in the following code: - (void) webView:(UIWebView *)webView didFailLoadWithError:(NSError *)error { [Flurry logError:@"WebView No Load" message:[error localizedDescription] error:error]; } Tracking versions When you develop applications for mobile devices, it's obvious that you will evolve your application at every stage, pushing the latest updates for the application, which creates a new version of the application on the application store. To track the application based on these versions, you need to set up the Flurry to track your application versions as well. This can be done using the following code: [Flurry setAppVersion:App_Version_Number]; So by using the aforementioned method, you can track your application based on its version. For example, if you have released an application and unfortunately it's having a critical bug, then you can track your application based on the current version and the errors that are tracked by Flurry from the application. You can access data generated from Flurry's Dashboards by navigating to Flurry Classic. This will, by default, load a time-based graph of the application session for all versions. However, you can access the user session graph by selecting a version from the drop-down list as shown in the following screenshot: This is how the drop-down list will appear. Select a version and click on Update as shown in the following screenshot: The previous action will generate a version-based graph for a user's session with the as number of times users have opened the app in the given time frame shown in the following screenshot: Along with that, Flurry also provides user retention graphs to gauge the number of users and the usage of application over a period of time. Summary In this article we explored the ways to track the application on Flurry and to gather meaningful data on Flurry by setting goals to track the application. Then we learned how to track the time spent by users on the application along using user data tracking and retention graphs to gauge the number of users. resources for article: further resources on this subject: Data Analytics [article] Learning Data Analytics with R and Hadoop [article] Limits of Game Data Analysis [article]
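To tie the pieces of this article together, the session, the application version, and the uncaught exception handler are usually wired up in one place at launch. The following is only a sketch built from the calls shown above; the version string and YOUR_API_KEY are placeholders for your own values, and configuration calls such as setAppVersion: are typically made before calling startSession::

// Uncaught Exception Handler - sent through Flurry.
void uncaughtExceptionHandler(NSException *exception) {
  [Flurry logError:@"Uncaught" message:@"Crash" exception:exception];
}

- (void)applicationDidFinishLaunching:(UIApplication *)application {
  // Register the handler first so that early crashes are reported too.
  NSSetUncaughtExceptionHandler(&uncaughtExceptionHandler);

  // Report sessions against an explicit version so the version-based
  // dashboards described above can separate old and new releases.
  [Flurry setAppVersion:@"1.2.0"];   // placeholder version string
  [Flurry startSession:@"YOUR_API_KEY"];
}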

Preparation Analysis of Data Source

Packt
22 Nov 2013
6 min read
(For more resources related to this topic, see here.) List of data sources Here, the wide varieties of data sources that are supported in the PowerPivot interface are given in brief. The vital part is to install providers such as OLE DB and ODBC that support the existing data source, because when installing the PowerPivot add-in, it will not install the provider too and some providers might already be installed with other applications. For instance, if there is a SQL Server installed in the system, the OLE DB provider will be installed with the SQL Server, so that later on it wouldn't be necessary to install OLE DB while adding data sources by using the SQL Server as a data source. Hence, make sure to verify the provider before it is added as a data source. Perhaps the provider is required only for relational database data sources. Relational database By using RDBMS, you can import tables and views into the PowerPivot workbook. The following is a list of various data sources: Microsoft SQL Server Microsoft SQL Azure Microsoft SQL Server Parallel Data Warehouse Microsoft Access Oracle Teradata Sybase Informix IBM DB2 Other providers (OLE DB / ODBC) Multidimensional sources Multidimensional data sources can only be added from Microsoft Analysis Services (SSAS). Data feeds The three types of data feeds are as follows: SQL Server Reporting Service (SSRS) Azure DataMarket dataset Other feeds such as atom service documents and single data feed Text files The two types of text files are given as follows: Excel file Text file Purpose of importing data from a variety of sources In order to make a decision about a particular subject area, you should analyze all the required data that is relevant to the subject area. If the data is stored in a variety of data sources, importing the data from different data sources has to be done. If all the data is only in one data source, only the data needs to be imported from the required tables for the subject and then various types of analysis can be done. The reason why users need to import data from different data sources is that they would then have an ample amount of data when they need to make any analysis. Another generic reason would be to cross-analyze data from different business systems such as Customer Relationship Management (CRM) and Campaign Management System (CMS). Data sourcing from only one source wouldn't be as sophisticated as an analysis done from different data sources as the amount of data from which the analysis was done for multisourced data is more detailed than the data from only a single source. It also might reveal conflicts between multiple data sources. In real time, usually in the e-commerce industry, blogs, and forum websites wouldn't ask for more details about customers at the time of registration, because the time consumed for long registrations would discourage the user, leading to cancellation the of the order. For instance, the customer table that would be stored in the database of an e-commerce industry would contain the following attributes: Customer FirstName LastName E-mail BirthDate Zip Code Gender However, this kind of industry needs to know their customers more in order to increase their sales. Since the industry only saves a few attributes about the customer during registration, it is difficult to track down the customers and it is even more difficult to advertise according to individual customers. 
Therefore, in order to find some relevant data about the customers, the e-commerce industries try to make another data source using the Internet or other types of sources. For instance, by using the Postcode given by the customer during registration, the user can get Country | State | City from various websites and then use the information obtained to make a table format either in Excel or CSV, as follows: Location Postcode City State Country So, finally the user would have two sources—one source is from the user's RDBMS database and the other source is from the Excel file created later—both of these can be used in order to make a customer analysis based on their location. General overview of ETL The main goal is to facilitate the development of data migration applications by applying data transformations. Extraction Transform Load (ETL) comprises of the first step in a data warehouse process. It is the most important and the hardest part, because it determines the quality of the data warehouse and the scope of the analyses the users will be able to build upon it. Let us discuss in detail what ETL is. The first substep is extraction. As the users would want to regroup all the information a company has, they will have to collect the data from various heterogeneous sources (operational data such as databases and/or external data such as CSV files) and various models (relational, geographical, web pages, and so on). The difficulty starts here. As data sources are heterogeneous, formats are usually different, and the users would have different types of data models for similar information (different names and/or formats) and different keys for the same objects. These are some of the main problems. The aim of the transformation task is to correct such problems, as much as possible. Let us now see what a transformation task is. The users would need to transform heterogeneous data of different sources into homogeneous data. Here are some examples of what they can do: Extraction of the data sources Identification of relevant data sources Filtering of non-admissible data Modification of formats or values Summary This article shows the users how to prepare data for analysis. We also covered different types of data sources that can be imported into the PowerPivot interface. Some information about the provider and a general overview of the ETL process has also been given. Users now know how the ETL process works in the PowerPivot interface; also, users were shown an introduction and the advantages of DAX. Resources for Article: Further resources on this subject: Creating a pivot table [Article] What are SSAS 2012 dimensions and cube? [Article] An overview of Oracle Hyperion Interactive Reporting [Article]
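As a concrete illustration of the cross-source analysis described above, once the customer table and the postcode lookup file are both loaded into PowerPivot and related on the postcode column, a calculated column can pull the location attributes onto each customer row. This is only a sketch and assumes a relationship from Customer[Zip Code] to Location[Postcode] has already been created in the model:

=RELATED(Location[City])

Added as a calculated column named City on the Customer table (with similar columns for State and Country), this exposes the location fields for slicing the customer data, which is exactly the location-based analysis the two sources were combined for.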

IBM Cognos 10 Business Intelligence

Packt
16 Jul 2012
4 min read
Introducing IBM Cognos 10 BI Cognos Connection In this recipe we will be exploring Cognos Connection, which is the user interface presented to the user when he/she logs in to IBM Cognos 10 BI for the first time. IBM Cognos 10 BI, once installed and configured, can be accessed through the Web using supported web browsers. For a list of supported web browsers, refer to the Installation and Configuration Guide shipped with the product. Getting ready As stated earlier, make sure that IBM Cognos 10 BI is installed and configured. Install and configure the GO Sales and GO Data Warehouse samples. Use the gateway URI to log on to the web interface called Cognos Connection. How to do it... To explore Cognos Connection, perform the following steps: Log on to Cognos Connection using the gateway URI that may be similar to http://<HostName>:<PortNumber>/ibmcognos/cgi-bin/cognos.cgi. Take note of the Cognos Connection interface. It has the GO Sales and GO Data Warehouse samples visible. Note the blue-colored folder icon, shown as in the preceding screenshot. It represents metadata model packages that are published to Cognos Connection using the Cognos Framework Manager tool. These packages have objects that represent business data objects, relationships, and calculations, which can be used to author reports and dashboards. Refer to the book, IBM Cognos TM1 Cookbook by Packt Publishing to learn how to create metadata models packages. From the toolbar, click on Launch. This will open a menu, showing different studios, each having different functionality, as shown in the following screenshot: We will use Business Insight and Business Insight Advanced, which are the first two choices in the preceding menu. These are the two components used to create and view dashboards. For other options, refer to the corresponding books by the same publisher. For instance, refer to the book, IBM Cognos 8 Report Studio Cookbook to know more about creating and distributing complex reports. Query Studio and Analysis Studio are meant to provide business users with the facility to slice and dice business data themselves. Event Studio is meant to define business situations and corresponding actions. Coming back to Cognos Connection, note that a yellow-colored folder icon, which is shown as represents a user-defined folder, which may or may not contain other published metadata model packages, reports, dashboards, and other content. In our case, we have a user-defined folder called Samples. This was created when we installed and configured samples shipped with the product. Click on the New Folder icon, which is represented by , on the toolbar to create a user-defined folder. Other options are also visible here, for instance to create a new dashboard.   Click on the user-defined folder—Samples to view its contents, as shown in the following screenshot: As shown in the preceding screenshot, it has more such folders, each having its own content. The top part of the pane shows the navigation path. Let's navigate deeper into Models | Business Insight Samples to show some sample dashboards, created using IBM Cognos Business Insight, as shown in the following screenshot: Click on one of these links to view the corresponding dashboard. For instance, click on Sales Dashboard (Interactive) to view the dashboard, as shown in the following screenshot: The dashboard can also be opened in the authoring tool, which is IBM Cognos Business Insight, in this case by clicking on the icon shown as on extreme right, on Cognos Connection. 
It will show the same result as shown in the preceding screenshot. We will see the Business Insight interface in detail later in this article. How it works... Cognos Connection is the primary user interface that user sees when he/she logs in for the first time. Business data has to be first identified and imported from the metadata model using the Cognos Framework Manager tool. Relationships (inner/outer joins) and calculations are then created, and the resultant metadata model package is published to the IBM Cognos 10 BI Server. This becomes available on Cognos Connection. Users are given access to appropriate studios on Cognos Connection, according to their needs. Analysis, reports, and dashboards are then created and distributed using one of these studios. The preceding sample has used Business Insight, for instance. Later sections in this article will look more into Business Insight and Business Insight Advanced. The next section focuses on the Business Insight interface details from the navigation perspective.

MySQL 5.1 Plugin: HTML Storage Engine—Reads and Writes

Packt
31 Aug 2010
8 min read
(For more resources on MySQL, see here.) An idea of the HTML engine Ever thought about what your tables might look like? Why not represent a table as a <TABLE>? You would be able to see it, visually, in any browser. Sounds cool. But how could we make it work? We want a simple engine, not an all-purpose Swiss Army Knife HTML-to-SQL converter, which means we will not need any existing universal HTML or XML parsers, but can rely on a fixed file format. For example, something like this: <html><head><title>t1</title></head><body><table border=1><tr><th>col1</th><th>other col</th><th>more cols</th></tr><tr><td>data</td><td>more data</td><td>more data</td></tr><!-- this row was deleted ... --><tr><td>data</td><td>more data</td><td>more data</td></tr>... and so on ...</table></body></html> But even then this engine is way more complex than the previous example, and it makes sense to split the code. The engine could stay, as usual, in the ha_html.cc file, the declarations in ha_html.h, and if we need any utility functions to work with HTML we can put them in the htmlutils.cc file. Flashback A storage engine needs to declare a plugin and an initialization function that fills a handlerton structure. Again, the only handlerton method that we need here is a create() method. #include "ha_html.h"static handler* html_create_handler(handlerton *hton, TABLE_SHARE *table, MEM_ROOT *mem_root){ return new (mem_root) ha_html(hton, table);}static int html_init(void *p){ handlerton *html_hton = (handlerton *)p; html_hton->create = html_create_handler; return 0;}struct st_mysql_storage_engine html_storage_engine ={ MYSQL_HANDLERTON_INTERFACE_VERSION };mysql_declare_plugin(html){ MYSQL_STORAGE_ENGINE_PLUGIN, &html_storage_engine, "HTML", "Sergei Golubchik", "An example HTML storage engine", PLUGIN_LICENSE_GPL, html_init, NULL, 0x0001, NULL, NULL, NULL}mysql_declare_plugin_end; Now we need to implement all of the required handler class methods. Let's start with simple ones: const char *ha_html::table_type() const{ return "HTML";}const char **ha_html::bas_ext() const{ static const char *exts[] = { ".html", 0 }; return exts;}ulong ha_html::index_flags(uint inx, uint part, bool all_parts) const{ return 0;}ulonglong ha_html::table_flags() const{ return HA_NO_TRANSACTIONS | HA_REC_NOT_IN_SEQ | HA_NO_BLOBS;}THR_LOCK_DATA **ha_html::store_lock(THD *thd, THR_LOCK_DATA **to, enum thr_lock_type lock_type){ if (lock_type != TL_IGNORE && lock.type == TL_UNLOCK) lock.type = lock_type; *to ++= &lock; return to;} These methods are familiar to us. They say that the engine is called "HTML", it stores the table data in files with the .html extension, the tables are not transactional, the position for ha_html::rnd_pos() is obtained by calling ha_html::position(), and that it does not support BLOBs. Also, we need a function to create and initialize an HTML_SHARE structure: static HTML_SHARE *find_or_create_share( const char *table_name, TABLE *table){ HTML_SHARE *share; for (share = (HTML_SHARE*)table->s->ha_data; share; share = share->next) if (my_strcasecmp(table_alias_charset, table_name, share->name) == 0) return share; share = (HTML_SHARE*)alloc_root(&table->s->mem_root, sizeof(*share)); bzero(share, sizeof(*share)); share->name = strdup_root(&table->s->mem_root, table_name); share->next = (HTML_SHARE*)table->s->ha_data; table->s->ha_data = share; return share;} It is exactly the same function, only the structure is now called HTML_SHARE, not STATIC_SHARE. 
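The HTML_SHARE structure itself is not spelled out in this excerpt, but judging from how find_or_create_share() and the locking calls use it, it needs little more than a name, a use counter, a table lock, and a next pointer. A plausible declaration for ha_html.h might look like the following sketch (not necessarily the book's exact definition):

typedef struct st_html_share {
  const char *name;            /* table name used for lookups in the share list */
  uint use_count;              /* number of handlers currently using this share */
  THR_LOCK lock;               /* table lock shared by all handler instances */
  struct st_html_share *next;  /* next share in the list hanging off TABLE_SHARE::ha_data */
} HTML_SHARE;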
Creating, opening, and closing the table Having done the basics, we can start working with the tables. The first operation, of course, is the table creation. To be able to read, update, or even open the table we need to create it first, right? Now, the table is just an HTML file and to create a table we only need to create an HTML file with our header and footer, but with no data between them. We do not need to create any TABLE or Field objects, or anything else—MySQL does it automatically. To avoid repeating the same HTML tags over and over we will define the header and the footer in the ha_html.h file as follows: #define HEADER1 "<html><head><title>"#define HEADER2 "</title></head><body><table border=1>\n"#define FOOTER "</table></body></html>"#define FOOTER_LEN ((int)(sizeof(FOOTER)-1)) As we want a header to include a table name we have split it in two parts. Now, we can create our table: int ha_html::create(const char *name, TABLE *table_arg, HA_CREATE_INFO *create_info){ char buf[FN_REFLEN+10]; strcpy(buf, name); strcat(buf, *bas_ext()); We start by generating a filename. The "table name" that the storage engine gets is not the original table name, it is converted to be a safe filename. All "troublesome" characters are encoded, and the database name is included and separated from the table name with a slash. It means we can safely use name as the filename and all we need to do is to append an extension. Having the filename, we open it and write our data: FILE *f = fopen(buf, "w"); if (f == 0) return errno; fprintf(f, HEADER1); write_html(f, table_arg->s->table_name.str); fprintf(f, HEADER2 "<tr>"); First, we write the header and the table name. Note that we did not write the value of the name argument into the header, but took the table name from the TABLE_SHARE structure (as table_arg->s->table_name.str), because name is mangled to be a safe filename, and we would like to see the original table name in the HTML page title. Also, we did not just write it into the file, we used a write_html() function—this is our utility method that performs the necessary entity encoding to get a well-formed HTML. But let's not think about it too much now, just remember that we need to write it, it can be done later. Now, we iterate over all fields and write their names wrapped in <th>...</th> tags. Again, we rely on our write_html() function here: for (uint i = 0; i < table_arg->s->fields; i++) { fprintf(f, "<th>"); write_html(f, table_arg->field[i]->field_name); fprintf(f, "</th>"); } fprintf(f, "</tr>"); fprintf(f, FOOTER); fclose(f); return 0;} Done, an empty table is created. Opening it is easy too. We generate the filename and open the file just as in the create() method. The only difference is that we need to remember the FILE pointer to be able to read the data later, and we store it in fhtml, which has to be a member of the ha_html object: int ha_html::open(const char *name, int mode, uint test_if_locked){ char buf[FN_REFLEN+10]; strcpy(buf, name); strcat(buf, *bas_ext()); fhtml = fopen(buf, "r+"); if (fhtml == 0) return errno; When parsing an HTML file we will often need to skip over known patterns in the text. Instead of using a special library or a custom pattern parser for that, let's try to use scanf()—it exists everywhere, has a built-in pattern matching language, and it is powerful enough for our purposes. For convenience, we will wrap it in a skip_html() function that takes a scanf() format and returns the number of bytes skipped. 
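Neither utility is spelled out at this point, so here is one possible htmlutils.cc sketch of the two helpers exactly as they are described: write_html() doing minimal entity encoding, and skip_html() appending %n to the caller's format so that fscanf() reports how many bytes it consumed. The book's own implementation may differ in detail:

#include "ha_html.h"
#include <stdio.h>

/* Write a string to f, encoding the characters HTML cares about. */
void write_html(FILE *f, const char *s)
{
  for (; *s; s++)
  {
    switch (*s) {
    case '<': fputs("&lt;", f);  break;
    case '>': fputs("&gt;", f);  break;
    case '&': fputs("&amp;", f); break;
    default:  fputc(*s, f);
    }
  }
}

/* Skip input matching fmt (all conversions in fmt are suppressed)
   and return the number of bytes consumed. */
int skip_html(FILE *f, const char *fmt)
{
  char full_fmt[256];
  int bytes = 0;
  snprintf(full_fmt, sizeof(full_fmt), "%s%%n", fmt);
  fscanf(f, full_fmt, &bytes);
  return bytes;
}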
Assuming we have such a function, we can finish opening the table: skip_html(fhtml, HEADER1 "%*[^<]" HEADER2 "<tr>"); for (uint i = 0; i < table->s->fields; i++) { skip_html(fhtml, "<th>%*[^<]</th>"); } skip_html(fhtml, "</tr>"); data_start = ftell(fhtml); We skip the first part of the header, then "everything up to the opening angle bracket", which eats up the table name, and the second part of the header. Then we skip individual row headers in a loop and the end of row </tr> tag. In order not to repeat this parsing again we remember the offset where the row data starts. At the end we allocate an HTML_SHARE and initialize lock objects: share = find_or_create_share(name, table); if (share->use_count++ == 0) thr_lock_init(&share->lock); thr_lock_data_init(&share->lock,&lock,NULL); return 0;} Closing the table is simple, and should not come as a surprise to us: int ha_html::close(void){ fclose(fhtml); if (--share->use_count == 0) thr_lock_delete(&share->lock); return 0;}
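With create, open, and close in place, the engine can already be exercised from the SQL layer, even though reading and writing rows is still to come. Assuming the plugin library is built as ha_html.so and keeps the plugin name declared above, something along these lines should leave an empty HTML skeleton on disk:

INSTALL PLUGIN html SONAME 'ha_html.so';

CREATE TABLE t1 (
  col1 INT,
  `other col` VARCHAR(20),
  `more cols` VARCHAR(20)
) ENGINE=HTML;

The resulting t1.html file in the database directory should contain just the header, a single row of <th> column names, and the footer, which is exactly what ha_html::create() writes.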

Define the Necessary Connections

Packt
02 Dec 2016
5 min read
In this article by Robert van Mölken and Phil Wilkins, the author of the book Implementing Oracle Integration Cloud Service, where we will see creating connections which is one of the core components of an integration we can easily navigate to the Designer Portal and start creating connections. (For more resources related to this topic, see here.) On the home page, click the Create link of the Connection tile as given in the following screenshot: Because we click on this link the Connections page is loaded, which lists of all created connections, a modal dialogue automatically opens on top of the list. This pop-up shows all the adapter types we can create. For our first integration we define two technology adapter connections, an inbound SOAP connection and an outbound REST connection. Inbound SOAP connection In the pop-up we can scroll down the list and find the SOAP adapter, but the modal dialogue also includes a search field. Just search on SOAP and the list will show the adapters matching the search criteria: Find your adapter by searching on the name or change the appearance from card to list view to show more adapters at ones. Click Select to open the New Connection page. Before we can setup any adapter specific configurations every creation starts with choosing a name and an optional description: Create the connection with the following details: Connection Name FlightAirlinesSOAP_Ch2 Identifier This will be proposed based on the connection name and there is no need to change unless you'd like an alternate name. It is usually the name in all CAPITALS and without spaces and has a max length of 32 characters. Connection Role Trigger The role chosen restricts the connection to be used only in selected role(s). Description This receives in Airline objects as a SOAP service. Click the Create button to accept the details. This will bring us to the specific adapter configuration page where we can add and modify the necessary properties. The one thing all the adapters have in common is the optional Email Address under Connection Administration. This email address is used to send notification to when problems or changes occur in the connection. A SOAP connection consists of three sections; Connection Properties, Security, and an optional Agent Group. On the right side of each section we can find a button to configure its properties.Let's configure each section using the following steps: Click the Configure Connectivity button. Instead of entering in an URL we are uploading the WSDL file. Check the box in the Upload File column. Click the newly shown Upload button. Upload the file ICSBook-Ch2-FlightAirlines-Source WSDL. Click OK to save the properties. Click the Configure Credentials button. In the pop-up that is shown we can configure the security credentials. We have the choice for Basic authentication, Username Password Token, or No Security Policy. Because we use it for our inbound connection we don't have to configure this. Select No Security Policy from the dropdown list. This removes the username and password fields. Click OK to save the properties. We leave the Agent Group section untouched. We can attach an Agent Group if we want to use it as an outbound connection to an on-premises web service. Click Test to check if the connection is working (otherwise it can't be used). For SOAP and REST it simply pings the given domain to check the connectivity, but others for example the Oracle SaaS adapters also authenticate and collect metadata. 
Click the Save button at the top of the page to persist our changes. Click Exit Connection to return to the list from where we started. Outbound REST connection Now that the inbound connection is created we can create our REST adapter. Click the Create New Connection button to show the Create Connection pop-up again and select the REST adapter. Create the connection with the following details: Connection Name FlightAirlinesREST_Ch2 Identifier This will be proposed based on the connection name Connection Role Invoke Description This returns the Airline objects as a REST/JSON service Email Address Your email address to use to send notifications to Let’s configure the connection properties using the following steps: Click the Configure Connectivity button. Select REST API Base URL for the Connection Type. Enter the URL were your Apiary mock is running on: http://private-xxxx-yourapidomain.apiary-mock.com. Click OK to save the values. Next configure the security credentials using the following steps: Click the Configure Credentials button. Select No Security Policy for the Security Policy. This removes the username and password fields. Click the OK button to save out choice. Click Test at the top to check if the connection is working. Click the Save button at the top of the page to persist our changes. Click Exit Connection to return to the list from where we started. Troubleshooting If the test fails for one of these connections check if the correct WSDL is used or that the connection URL for the REST adapter exists or is reachable. Summary In this article we looked at the processes of creating and testing the necessary connections and the creation of the integration itself. We have seen an inbound SOAP connection and an outbound REST connection. In demonstrating the integration we have also seen how to use Apiary to document and mock our backend REST service. Resources for Article: Further resources on this subject: Getting Started with a Cloud-Only Scenario [article] Extending Oracle VM Management [article] Docker Hosts [article]

Gathering all rejects prior to killing a job

Packt
13 Nov 2013
3 min read
(For more resources related to this topic, see here.) Getting ready Open the job jo_cook_ch03_0010_validationSubjob. As you can see, the reject flow has been attached and the output is being sent to a temporary store (tHashMap). How to do it… Add the tJava, tDie, tHashInput, and tFileOutputDelimited components. Add onSubjobOk to tJava from the tFileInputDelimited component. Add a flow from the tHashInput component to the tFileOutputDelimited component. Right-click the tJava component, select Trigger and then Runif. Link the trigger to the tDie component. Click the if link, and add the following code ((Integer)globalMap.get("tFileOutputDelimited_1_NB_LINE")) > 0 Right-click the tJava component, select Trigger, and then Runif. Link this trigger to the tHashInput component. ((Integer)globalMap.get("tFileOutputDelimited_1_NB_LINE")) == 0 The job should now look like the following: Drag the generic schema sc_cook_ch3_0010_genericCustomer to both the tHashInput and tFileOutputDelimited. Run the job. You should see that the tDie component is activated, because the file contained two errors. How it works… What we have done in this exercise is created a validation stage prior to processing the data. Valid rows are held in temporary storage (tHashOutput) and invalid rows are written to a reject file until all input rows are processed. The job then checks to see how many records are rejected (using the RunIf link). In this instance, there are invalid rows, so the RunIf link is triggered, and the job is killed using tDie. By ensuring that the data is correct before we start to process it into a target, we know that the data will be fit for writing to the target, and thus avoiding the need for rollback procedures. The records captured can then be sent to the support team, who will then have a record of all incorrect rows. These rows can be fixed in situ within the source file and the job simply re-run from the beginning. There's more... This article is particularly important when rollback/correction of a job may be particularly complex, or where there may be a higher than expected number of errors in an input. An example would be when there are multiple executions of a job that appends to a target file. If the job fails midway through, then rolling back involves identifying which records were appended to the file by the job before failure, removing them from the file, fixing the offending record, and then re-running. This runs the risk of a second error causing the same thing to happen again. On the other hand, if the job does not die, but a subsection of the data is rejected, then the rejects must be manipulated into the target file via a second manual execution of the job. So, this method enables us to be certain that our records will not fail to write due to incorrect data, and therefore saves our target from becoming corrupted. Summary This article has shown how the rejects are collected before killing a job. This article also shows how incorrect rejects be manipulated into the target file. Resources for Article: Further resources on this subject: Pentaho Data Integration 4: Working with Complex Data Flows [Article] Nmap Fundamentals [Article] Getting Started with Pentaho Data Integration [Article]