
How-To Tutorials - Data


Getting Started with Sphinx Search

Packt
31 Mar 2011
6 min read
Sphinx is a full-text search engine. So, before going any further, we need to understand what full-text search is and how it improves on traditional searching.

What is full-text search?

Full-text search is one of the techniques for searching a document or database stored on a computer. While searching, the search engine goes through and examines all of the words stored in the document and tries to match the search query against those words. A complete examination of all the words (text) stored in the document is undertaken, and hence it is called a full-text search. Full-text search excels in searching large volumes of unstructured text quickly and effectively. It returns pages based on how well they match the user's query.

Traditional search

To understand the difference between a normal search and full-text search, let's take the example of a MySQL database table and perform searches on it. It is assumed that MySQL Server and phpMyAdmin are already installed on your system.

Time for action – normal search in MySQL

Open phpMyAdmin in your browser and create a new database called myblog. Select the myblog database.

Create a table by executing the following query:

    CREATE TABLE `posts` (
      `id` INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
      `title` VARCHAR( 255 ) NOT NULL,
      `description` TEXT NOT NULL,
      `created` DATETIME NOT NULL,
      `modified` DATETIME NOT NULL
    ) ENGINE = MYISAM;

Queries can be executed from the SQL page in phpMyAdmin. You can find the link to that page in the top menu.

Populate the table with some records:

    INSERT INTO `posts` (`id`, `title`, `description`, `created`, `modified`) VALUES
    (1, 'PHP scripting language', 'PHP is a web scripting language originally created by Rasmus Lerdorf', NOW(), NOW()),
    (2, 'Programming Languages', 'There are many languages available to cater any kind of programming need', NOW(), NOW()),
    (3, 'My Life', 'This post is about my life which in a sense is beautiful', NOW(), NOW()),
    (4, 'Life on Mars', 'Is there any life on mars?', NOW(), NOW());

Next, run the following queries against the table:

    SELECT * FROM posts WHERE title LIKE 'programming%';

The above query returns row 2.

    SELECT * FROM posts WHERE description LIKE '%life%';

The above query returns rows 3 and 4.

    SELECT * FROM posts WHERE description LIKE '%scripting language%';

The above query returns row 1.

    SELECT * FROM posts WHERE description LIKE '%beautiful%' OR description LIKE '%programming%';

The above query returns rows 2 and 3.

phpMyAdmin: To administer a MySQL database, I highly recommend using a GUI tool such as phpMyAdmin (http://www.phpmyadmin.net). All of the above-mentioned queries can easily be executed from its SQL page.

What just happened?

We first created a table, posts, to hold some data. Each post has a title and a description. We then populated the table with some records.

With the first SELECT query we tried to find all posts whose title starts with the word programming. This correctly gave us row 2. But what if you want to search for the word anywhere in the field and not just at the start? For this we fired the second query, wherein we searched for the word life anywhere in the description of the post. Again this worked pretty well for us and, as expected, we got the result in the form of rows 3 and 4. Now what if we wanted to search for multiple words? For this we fired the third query, where we searched for the words scripting language. As row 1 has those words in its description, it was returned correctly.
Until now everything looked fine and we were able to perform searches without any hassle. The query gets complex when we want to search for multiple words and those words are not necessarily placed consecutively in a field, that is, side by side. One such example is shown in the form of our fourth query, where we tried to search for the words programming and beautiful in the description of the posts. As the number of words we need to search for increases, this query gets more complicated and, moreover, slower in execution, since it needs to match each word individually.

The previous SELECT queries and their output also don't give us any information about the relevance of the search terms to the results found. Relevance can be defined as a measure of how closely the returned database records match the user's search query. In other words, how pertinent the result set is to the search query. Relevance is very important in the search world because users want to see the items with the highest relevance at the top of their search results. One of the major reasons for the success of Google is that their search results are always sorted by relevance.

MySQL full-text search

This is where full-text search comes to the rescue. MySQL has inbuilt support for full-text search and you only need to add a FULLTEXT index to the field against which you want to perform your search. Continuing the earlier example of the posts table, let's add a full-text index to the description field of the table. Run the following query:

    ALTER TABLE `posts` ADD FULLTEXT ( `description` );

The query will add an INDEX of type FULLTEXT to the description field of the posts table. Only the MyISAM engine in MySQL supports full-text indexes.

Now, to search for all the records which contain the words programming or beautiful anywhere in their description, the query would be:

    SELECT * FROM posts WHERE MATCH (description) AGAINST ('beautiful programming');

This query will return rows 2 and 3, and the returned results are sorted by relevance. One more thing to note is that this query takes less time than the earlier query, which used LIKE for matching. By default, the MATCH() function performs a natural language search: it attempts to use natural language processing to understand the nature of the query and then search accordingly.

Full-text search in MySQL is a big topic in itself and we have only seen the tip of the iceberg. For a complete reference, please refer to the MySQL manual at http://dev.mysql.com/doc/.

Advantages of full-text search

The following points are some of the major advantages of full-text search:

- It is quicker than traditional searches as it benefits from an index of words that is used to look up records instead of doing a full table scan
- It gives results that can be sorted by relevance to the searched phrase or term, with sophisticated ranking capabilities to find the best documents or records
- It performs very well on huge databases with millions of records
- It skips the common words such as the, an, for, and so on

When to use a full-text search?

- When there is a high volume of free-form text data to be searched
- When there is a need for highly optimized search results
- When there is a demand for flexible search querying
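MySQL can also expose the relevance score it uses for sorting. As a small sketch that is not part of the original example, the same MATCH() ... AGAINST() expression can be repeated in the SELECT list to return the score alongside each row:

    -- Returns each matching post together with its relevance score.
    SELECT id,
           title,
           MATCH (description) AGAINST ('beautiful programming') AS relevance
    FROM posts
    WHERE MATCH (description) AGAINST ('beautiful programming')
    ORDER BY relevance DESC;

In natural language mode the WHERE clause already excludes rows with zero relevance, so only matching posts are returned, highest score first; because the two MATCH() expressions are identical, MySQL performs the full-text search only once.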


Sphinx: Index Searching

Packt
16 Mar 2011
9 min read
Client API implementations for Sphinx

Sphinx comes with a number of native searchd client API implementations. Some third-party open source implementations for Perl, Ruby, and C++ are also available. All APIs provide the same set of methods and they implement the same network protocol. As a result, they all work in a similar fashion.

All examples in this article are for the PHP implementation of the Sphinx API. However, you can just as easily use other programming languages. Sphinx is used with PHP more widely than any other language.

Search using client API

Let's see how we can use the native PHP implementation of the Sphinx API to search. We will add the configuration related to searchd and then create a PHP file to search the index using the Sphinx client API implementation for PHP.

Time for action – creating a basic search script

Add the searchd config section to /usr/local/sphinx/etc/sphinx-blog.conf:

    source blog
    {
      # source options
    }

    index posts
    {
      # index options
    }

    indexer
    {
      # indexer options
    }

    # searchd options (used by search daemon)
    searchd
    {
      listen       = 9312
      log          = /usr/local/sphinx/var/log/searchd.log
      query_log    = /usr/local/sphinx/var/log/query.log
      max_children = 30
      pid_file     = /usr/local/sphinx/var/log/searchd.pid
    }

Start the searchd daemon (as the root user):

    $ sudo /usr/local/sphinx/bin/searchd -c /usr/local/sphinx/etc/sphinx-blog.conf

Copy the sphinxapi.php file (the class with the PHP implementation of the Sphinx API) from the Sphinx source directory to your working directory:

    $ mkdir /path/to/your/webroot/sphinx
    $ cd /path/to/your/webroot/sphinx
    $ cp /path/to/sphinx-0.9.9/api/sphinxapi.php ./

Create a simple_search.php script that uses the PHP client API class to search the sphinx-blog index, and execute it in the browser:

    <?php
    require_once('sphinxapi.php');

    // Instantiate the sphinx client
    $client = new SphinxClient();

    // Set search options
    $client->SetServer('localhost', 9312);
    $client->SetConnectTimeout(1);
    $client->SetArrayResult(true);

    // Query the index
    $results = $client->Query('php');

    // Output the matched results in raw format
    print_r($results['matches']);

The output of the given code, as seen in a browser, will be similar to what's shown in the following screenshot:

What just happened?

Firstly, we added the searchd configuration section to our sphinx-blog.conf file. The following options were added to the searchd section:

- listen: This option specifies the IP address and port that searchd will listen on. It can also specify the Unix-domain socket path. This option was introduced in v0.9.9 and should be used instead of the port (deprecated) option. If the port part is omitted, then the default port used is 9312. Examples:

    listen = localhost
    listen = 9312
    listen = localhost:9898
    listen = 192.168.1.25:4000
    listen = /var/run/sphinx.s

- log: Name of the file where all searchd runtime events will be logged. This is an optional setting and the default value is "searchd.log".
- query_log: Name of the file where all search queries will be logged. This is an optional setting and the default value is empty, that is, do not log queries.
- max_children: The maximum number of concurrent searches to run in parallel. This is an optional setting and the default value is 0 (unlimited).
- pid_file: Filename of the searchd process ID. This is a mandatory setting. The file is created on startup and contains the head daemon process ID while the daemon is running. The pid_file is unlinked when the daemon is stopped.
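As a side note not covered in the original steps: a Sphinx source install also places a small command-line search utility in its bin directory, which queries an index directly from the configuration file without going through searchd or any API code. The exact option names can vary between versions, so treat the following as an assumed sketch and check the usage output the utility prints when run without arguments:

    $ /usr/local/sphinx/bin/search --config /usr/local/sphinx/etc/sphinx-blog.conf --index posts php

This can be handy for quickly confirming that an index returns hits before wiring up any PHP.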
Once we were done with adding the searchd configuration options, we started the searchd daemon as the root user. We passed the path of the configuration file as an argument to searchd. The default configuration file used is /usr/local/sphinx/etc/sphinx.conf.

After a successful startup, searchd listens on all network interfaces, including all the configured network cards on the server, at port 9312. If we want searchd to listen on a specific interface then we can specify the hostname or IP address in the value of the listen option:

    listen = 192.168.1.25:9312

The listen setting defined in the configuration file can be overridden on the command line while starting searchd by using the -l command line argument. There are other (optional) arguments that can be passed to searchd, as seen in the following screenshot:

searchd needs to be running all the time when we are using the client API. The first thing you should always check is whether searchd is running or not, and start it if it is not running.

We then created a PHP script to search the sphinx-blog index. To search the Sphinx index, we need to use the Sphinx client API. As we are working with a PHP script, we copied the PHP client implementation class (sphinxapi.php), which comes along with the Sphinx source, to our working directory so that we can include it in our script. However, you can keep this file anywhere on the file system as long as you can include it in your PHP script. Throughout this article we will be using /path/to/webroot/sphinx as the working directory and we will create all PHP scripts in that directory. We will refer to this directory simply as webroot.

We initialized the SphinxClient class and then used the following class methods to set up the Sphinx client API:

- SphinxClient::SetServer($host, $port)—This method sets the searchd hostname and port. All subsequent requests use these settings unless this method is called again with some different parameters. The default host is localhost and the default port is 9312.
- SphinxClient::SetConnectTimeout($timeout)—This is the maximum time allowed to spend trying to connect to the server before giving up.
- SphinxClient::SetArrayResult($arrayresult)—This is a PHP client API-specific method. It specifies whether the matches should be returned as an array or a hash. The default value is false, which means that matches will be returned in a PHP hash format, where document IDs will be the keys, and other information (attributes, weight) will be the values. If $arrayresult is true, then the matches will be returned in plain arrays with complete per-match information.

After that, the actual querying of the index was pretty straightforward using the SphinxClient::Query($query) method. It returned an array with matched results, as well as other information such as error, fields in the index, attributes in the index, total records found, time taken for search, and so on. The actual results are in the $results['matches'] variable. We can run a loop on the results, and it is a straightforward job to get the actual document's content from the document ID and display it.

Matching modes

When a full-text search is performed on the Sphinx index, different matching modes can be used by Sphinx to find the results. The following matching modes are supported by Sphinx:

- SPH_MATCH_ALL—This is the default mode and it matches all query words, that is, only records that match all of the queried words will be returned.
- SPH_MATCH_ANY—This matches any of the query words.
- SPH_MATCH_PHRASE—This matches the query as a phrase and requires a perfect match.
- SPH_MATCH_BOOLEAN—This matches the query as a Boolean expression.
- SPH_MATCH_EXTENDED—This matches the query as an expression in the Sphinx internal query language.
- SPH_MATCH_EXTENDED2—This matches the query using the second version of the Extended matching mode. This supersedes SPH_MATCH_EXTENDED as of v0.9.9.
- SPH_MATCH_FULLSCAN—In this mode the query terms are ignored and no text-matching is done, but filters and grouping are still applied.

Time for action – searching with different matching modes

Create a PHP script display_results.php in your webroot with the following code:

    <?php
    // Database connection credentials
    $dsn  = 'mysql:dbname=myblog;host=localhost';
    $user = 'root';
    $pass = '';

    // Instantiate the PDO (PHP 5 specific) class
    try {
        $dbh = new PDO($dsn, $user, $pass);
    } catch (PDOException $e) {
        echo 'Connection failed: ' . $e->getMessage();
    }

    // PDO statement to fetch the post data
    $query = "SELECT p.*, a.name FROM posts AS p " .
             "LEFT JOIN authors AS a ON p.author_id = a.id " .
             "WHERE p.id = :post_id";
    $post_stmt = $dbh->prepare($query);

    // PDO statement to fetch the post's categories
    $query = "SELECT c.name FROM posts_categories AS pc " .
             "LEFT JOIN categories AS c ON pc.category_id = c.id " .
             "WHERE pc.post_id = :post_id";
    $cat_stmt = $dbh->prepare($query);

    // Function to display the results in a nice format
    function display_results($results, $message = null)
    {
        global $post_stmt, $cat_stmt;

        if ($message) {
            print "<h3>$message</h3>";
        }

        if (!isset($results['matches'])) {
            print "No results found<hr />";
            return;
        }

        foreach ($results['matches'] as $result) {
            // Get the data for this document (post) from db
            $post_stmt->bindParam(':post_id', $result['id'], PDO::PARAM_INT);
            $post_stmt->execute();
            $post = $post_stmt->fetch(PDO::FETCH_ASSOC);

            // Get the categories of this post
            $cat_stmt->bindParam(':post_id', $result['id'], PDO::PARAM_INT);
            $cat_stmt->execute();
            $categories = $cat_stmt->fetchAll(PDO::FETCH_ASSOC);

            // Output title, author and categories
            print "Id: {$post['id']}<br />" .
                  "Title: {$post['title']}<br />" .
                  "Author: {$post['name']}";

            $cats = array();
            foreach ($categories as $category) {
                $cats[] = $category['name'];
            }

            if (count($cats)) {
                print "<br />Categories: " . implode(', ', $cats);
            }

            print "<hr />";
        }
    }

Create a PHP script search_matching_modes.php in your webroot with the following code:

    <?php
    // Include the api class
    require('sphinxapi.php');

    // Include the file which contains the function to display results
    require_once('display_results.php');

    $client = new SphinxClient();

    // Set search options
    $client->SetServer('localhost', 9312);
    $client->SetConnectTimeout(1);
    $client->SetArrayResult(true);

    // SPH_MATCH_ALL mode will be used by default
    // and we need not set it explicitly
    display_results(
        $client->Query('php'),
        '"php" with SPH_MATCH_ALL');
    display_results(
        $client->Query('programming'),
        '"programming" with SPH_MATCH_ALL');
    display_results(
        $client->Query('php programming'),
        '"php programming" with SPH_MATCH_ALL');

    // Set the mode to SPH_MATCH_ANY
    $client->SetMatchMode(SPH_MATCH_ANY);
    display_results(
        $client->Query('php programming'),
        '"php programming" with SPH_MATCH_ANY');

    // Set the mode to SPH_MATCH_PHRASE
    $client->SetMatchMode(SPH_MATCH_PHRASE);
    display_results(
        $client->Query('php programming'),
        '"php programming" with SPH_MATCH_PHRASE');
    display_results(
        $client->Query('scripting language'),
        '"scripting language" with SPH_MATCH_PHRASE');

    // Set the mode to SPH_MATCH_FULLSCAN
    $client->SetMatchMode(SPH_MATCH_FULLSCAN);
    display_results(
        $client->Query('php'),
        '"php programming" with SPH_MATCH_FULLSCAN');

Execute search_matching_modes.php in a browser (http://localhost/sphinx/search_matching_modes.php).
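The scripts above assume that the connection to searchd always succeeds. As a small defensive sketch, not part of the original listings and assuming the GetLastError(), GetLastWarning(), and IsConnectError() methods of the bundled PHP API, a query can be wrapped with basic error checking:

    <?php
    require_once('sphinxapi.php');

    $client = new SphinxClient();
    $client->SetServer('localhost', 9312);
    $client->SetConnectTimeout(1);
    $client->SetArrayResult(true);

    $results = $client->Query('php');

    if ($results === false) {
        // Query failed: distinguish an unreachable searchd from other errors
        if ($client->IsConnectError()) {
            die('Connection to searchd failed: ' . $client->GetLastError());
        }
        die('Search error: ' . $client->GetLastError());
    }

    // Surface any non-fatal warning reported by searchd
    $warning = $client->GetLastWarning();
    if ($warning != '') {
        echo "Warning: $warning<br />";
    }

    print_r($results['matches']);

This keeps a dead or misconfigured searchd from silently producing an empty page.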


Oracle GoldenGate 11g: Performance Tuning

Packt
01 Mar 2011
12 min read
Oracle GoldenGate 11g Implementer's Guide: design, install, and configure high-performance data replication solutions using Oracle GoldenGate.

- The very first book on GoldenGate, focused on design and performance tuning in enterprise-wide environments
- Exhaustive coverage and analysis of all aspects of the GoldenGate software implementation, including design, installation, and advanced configuration
- Migrate your data replication solution from Oracle Streams to GoldenGate
- Design a GoldenGate solution that meets all the functional and non-functional requirements of your system
- Written in a simple illustrative manner, providing step-by-step guidance with discussion points
- Goes way beyond the manual, appealing to Solution Architects, System Administrators, and Database Administrators

Oracle states that GoldenGate can achieve near real-time data replication. However, out of the box, GoldenGate may not meet your performance requirements. Here we focus on the main areas that lend themselves to tuning, especially parallel processing and load balancing, enabling high data throughput and very low latency. Let's start by taking a look at some of the considerations before we start tuning Oracle GoldenGate.

Before tuning GoldenGate

There are a number of considerations we need to be aware of before we start the tuning process. For one, we must consider the underlying system and its ability to perform. Let's start by looking at the source of data that GoldenGate needs for replication to work: the online redo logs.

Online redo

Before we start tuning GoldenGate, we must look at both the source and target databases and their ability to read/write data. Data replication is I/O intensive, so fast disks are important, particularly for the online redo logs. Redo logs play an important role in GoldenGate: they are constantly being written to by the database and concurrently being read by the Extract process. Furthermore, adding supplemental logging to a database can increase their size by a factor of 4!

Firstly, ensure that only the necessary amount of supplemental logging is enabled on the database. In the case of GoldenGate, the logging of the primary key is all that is required.

Next, take a look at the database wait events, in particular the ones that relate to redo. For example, if you are seeing "Log File Sync" waits, this is an indicator that your disk writes are too slow, your application is committing too frequently, or a combination of both. RAID5 is another common problem for redo log writes. Ideally, these files should be placed on their own mirrored storage such as RAID1+0 (mirrored striped sets) or Flash disks. Many argue this to be a misconception with modern high-speed disk arrays, but some production systems are still known to be suffering from redo I/O contention on RAID5.

An adequate number (and size) of redo groups must be configured to prevent "checkpoint not complete" or "cannot allocate new log" warnings appearing in the database instance alert log. This occurs when Oracle attempts to reuse a log file but the checkpoint that would flush the associated dirty blocks in the DB buffer cache to disk has not yet completed, so the redo is still required for crash recovery. The database must wait until that checkpoint completes before the online redo log file can be reused, effectively stalling the database and any redo generation.

Large objects (LOBs)

Know your data. LOBs can be a problem in data replication by virtue of their size and the ability to extract, transmit, and deliver the data from source to target.
Tables containing LOB datatypes should be isolated from regular data and given a dedicated Extract, Data Pump, and Replicat process group to enhance throughput. Also ensure that the target table has a primary key to avoid full table scans (FTS), an Oracle GoldenGate best practice. LOB INSERT operations can insert an empty (null) LOB into a row before updating it with the data. This is because a LOB (depending on its size) can spread its data across multiple Logical Change Records, resulting in multiple DML operations being required at the target database.

Baselining

Before we can start tuning, we must record our baseline. This will provide a reference point to tune from. We can later look back at our baseline and calculate the percentage improvement made from deploying new configurations. An ideal baseline is to find the "breaking point" of your application requirements. For example, the following questions must be answered:

- What is the maximum acceptable end-to-end latency?
- What are the maximum application transactions per second we must accommodate?

To answer these questions we must start with a single-threaded data replication configuration having just one Extract, one Data Pump, and one Replicat process. This will provide us with a worst-case scenario from which to build improvements. Ideally, our data source should be the application itself, inserting, deleting, and updating "real data" in the source database. However, simulated data with the ability to provide throughput profiles will allow us to gauge performance accurately. Application vendors can normally provide SQL injector utilities that simulate the user activity on the system.

Balancing the load across parallel process groups

The GoldenGate documentation states: "The most basic thing you can do to improve GoldenGate's performance is to divide a large number of tables among parallel processes and trails. For example, you can divide the load by schema." This statement is true, as the bottleneck is largely due to the serial nature of the Replicat process, having to "replay" transactions in commit order. Although this can be a constraining factor due to transaction dependency, increasing the number of Replicat processes increases performance significantly. However, it is highly recommended to group tables with referential constraints together per Replicat.

The number of parallel processes is typically greater on the target system compared to the source. The number and ratio of processes will vary across applications and environments. Each configuration should be thoroughly tested to determine the optimal balance, but be careful not to over-allocate, as each parallel process will consume up to 55MB. Increasing the number of processes to an arbitrary value will not necessarily improve performance; in fact, it may be worse, and you will waste CPU and memory resources. The following data flow diagram shows a load balancing configuration including two Extract processes, three Data Pumps, and five Replicats:

Considerations for using parallel process groups

To maintain data integrity, ensure that tables with referential constraints between one another are included in the same parallel process group (a simple schema-based split is sketched below). It's also worth considering disabling referential constraints on the target database schema to allow child records to be populated before their parents, thus increasing throughput. GoldenGate will always commit transactions in the same order as the source, so data integrity is maintained.
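To make the idea of dividing the load by schema concrete, the following is a minimal sketch of two Replicat parameter files, each applying one schema. The group names (RHR01, RFIN01) and schema names (HR, FINANCE) are assumptions for illustration only and are not part of the original configuration:

    -- Replicat RHR01: applies only the HR schema
    REPLICAT RHR01
    USERID ggs_admin, PASSWORD ggs_admin
    MAP HR.*, TARGET HR.*;

    -- Replicat RFIN01: applies only the FINANCE schema
    REPLICAT RFIN01
    USERID ggs_admin, PASSWORD ggs_admin
    MAP FINANCE.*, TARGET FINANCE.*;

Each group can then be paired with its own trail and started independently, so the two schemas are applied in parallel while referential integrity within each schema is preserved by the usual commit ordering.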
Oracle best practice states that no more than three Replicat processes should read the same remote trail file. To avoid contention on trail files, pair each Replicat with its own trail files and Extract process. Also, remember that it is easier to tune an Extract process than a Replicat process, so concentrate on your source before moving your focus to the target.

Splitting large tables into row ranges across process groups

What if you have some large tables with a high data change rate within a source schema and you cannot logically separate them from the remaining tables due to referential constraints? GoldenGate provides a solution to this problem by "splitting" the data within the same schema via the @RANGE function. The @RANGE function can be used in the Data Pump and Replicat configuration to "split" the transaction data across a number of parallel processes.

The Replicat process is typically the source of performance bottlenecks because, in its normal mode of operation, it is a single-threaded process that applies operations one at a time by using regular DML. Therefore, to leverage parallel operation and enhance throughput, the more Replicats the better (dependent on the number of CPUs and memory available on the target system).

The RANGE function

The way the @RANGE function works is that it computes a hash value of the columns specified in the input. If no columns are specified, it uses the table's primary key. GoldenGate adjusts the total number of ranges to optimize the even distribution across the number of ranges specified. This concept can be compared to hash partitioning in Oracle tables as a means of dividing data.

With any division of data during replication, integrity is paramount and will have an effect on performance. Therefore, tables having a relationship with other tables in the source schema must be included in the configuration. If all your source schema tables are related, you must include all the tables!

Adding Replicats with the @RANGE function

The @RANGE function accepts two numeric arguments, separated by a comma:

- Range: The number assigned to a process group, where the first is 1, the second is 2, and so on, up to the total number of ranges.
- Total number of ranges: The total number of process groups you wish to divide using the @RANGE function.

The following example includes three related tables in the source schema and walks through the complete configuration from start to finish. For this example, we have an existing Replicat process on the target machine (dbserver2) named ROLAP01 that includes the following three tables:

- ORDERS
- ORDER_ITEMS
- PRODUCTS

We are going to divide the rows of the tables across two Replicat groups. The source database schema name is SRC and the target schema is TGT. The following steps add a new Replicat named ROLAP02 with the relevant configuration and adjust the ROLAP01 parameters to suit. Note that before making any changes, stop the existing Replicat processes and determine their Relative Byte Address (RBA) and trail file log sequence number. This is important information that we will use to tell the new Replicat process from which point to start.

First check if the existing Replicat process is running:

    GGSCI (dbserver2) 1> info all

    Program     Status      Group       Lag           Time Since Chkpt
    MANAGER     RUNNING
    REPLICAT    RUNNING     ROLAP01     00:00:00      00:00:02

Stop the existing Replicat process:

    GGSCI (dbserver2) 2> stop REPLICAT ROLAP01

    Sending STOP request to REPLICAT ROLAP01...
    Request processed.

Add the new Replicat process, using the existing trail file.
    GGSCI (dbserver2) 3> add REPLICAT ROLAP02, exttrail ./dirdat/tb
    REPLICAT added.

Now add the configuration by creating a new parameter file for ROLAP02.

    GGSCI (dbserver2) 4> edit params ROLAP02

    --
    -- Example Replicator parameter file to apply changes
    -- to target tables
    --
    REPLICAT ROLAP02
    SOURCEDEFS ./dirdef/mydefs.def
    SETENV (ORACLE_SID=OLAP)
    USERID ggs_admin, PASSWORD ggs_admin
    DISCARDFILE ./dirrpt/rolap02.dsc, PURGE
    ALLOWDUPTARGETMAP
    CHECKPOINTSECS 30
    GROUPTRANSOPS 2000
    MAP SRC.ORDERS, TARGET TGT.ORDERS, FILTER (@RANGE (1,2));
    MAP SRC.ORDER_ITEMS, TARGET TGT.ORDER_ITEMS, FILTER (@RANGE (1,2));
    MAP SRC.PRODUCTS, TARGET TGT.PRODUCTS, FILTER (@RANGE (1,2));

Now edit the configuration of the existing Replicat process, and add the @RANGE function to the FILTER clause of the MAP statement. Note the inclusion of the GROUPTRANSOPS parameter to enhance performance by increasing the number of operations allowed in a Replicat transaction.

    GGSCI (dbserver2) 5> edit params ROLAP01

    --
    -- Example Replicator parameter file to apply changes
    -- to target tables
    --
    REPLICAT ROLAP01
    SOURCEDEFS ./dirdef/mydefs.def
    SETENV (ORACLE_SID=OLAP)
    USERID ggs_admin, PASSWORD ggs_admin
    DISCARDFILE ./dirrpt/rolap01.dsc, PURGE
    ALLOWDUPTARGETMAP
    CHECKPOINTSECS 30
    GROUPTRANSOPS 2000
    MAP SRC.ORDERS, TARGET TGT.ORDERS, FILTER (@RANGE (2,2));
    MAP SRC.ORDER_ITEMS, TARGET TGT.ORDER_ITEMS, FILTER (@RANGE (2,2));
    MAP SRC.PRODUCTS, TARGET TGT.PRODUCTS, FILTER (@RANGE (2,2));

Check that both Replicat processes exist.

    GGSCI (dbserver2) 6> info all

    Program     Status      Group       Lag           Time Since Chkpt
    MANAGER     RUNNING
    REPLICAT    STOPPED     ROLAP01     00:00:00      00:10:35
    REPLICAT    STOPPED     ROLAP02     00:00:00      00:12:25

Before starting both Replicat processes, obtain the log sequence number (SEQNO) and Relative Byte Address (RBA) from the original trail file.

    GGSCI (dbserver2) 7> info REPLICAT ROLAP01, detail

    REPLICAT   ROLAP01   Last Started 2010-04-01 15:35   Status STOPPED
    Checkpoint Lag       00:00:00 (updated 00:12:43 ago)
    Log Read Checkpoint  File ./dirdat/tb000279                      <- SEQNO
                         2010-04-08 12:27:00.001016  RBA 43750979    <- RBA

    Extract Source          Begin              End
    ./dirdat/tb000279       2010-04-01 12:47   2010-04-08 12:27
    ./dirdat/tb000257       2010-04-01 04:30   2010-04-01 12:47
    ./dirdat/tb000255       2010-03-30 13:50   2010-04-01 04:30
    ./dirdat/tb000206       2010-03-30 13:50   First Record
    ./dirdat/tb000206       2010-03-30 04:30   2010-03-30 13:50
    ./dirdat/tb000184       2010-03-30 04:30   First Record
    ./dirdat/tb000184       2010-03-30 00:00   2010-03-30 04:30
    ./dirdat/tb000000       *Initialized*      2010-03-30 00:00
    ./dirdat/tb000000       *Initialized*      First Record

Adjust the new Replicat process ROLAP02 to adopt these values, so that the process knows where to start from on startup.

    GGSCI (dbserver2) 8> alter replicat ROLAP02, extseqno 279
    REPLICAT altered.

    GGSCI (dbserver2) 9> alter replicat ROLAP02, extrba 43750979
    REPLICAT altered.

Failure to complete this step will result in either duplicate data or ORA-00001 errors against the target schema, because GoldenGate will attempt to replicate the data from the beginning of the initial trail file (./dirdat/tb000000) if it exists; otherwise the process will abend.

Start both Replicat processes. Note the use of the wildcard (*).

    GGSCI (dbserver2) 10> start replicat ROLAP*

    Sending START request to MANAGER ...
    REPLICAT ROLAP01 starting

    Sending START request to MANAGER ...
    REPLICAT ROLAP02 starting

Check if both Replicat processes are running.
    GGSCI (dbserver2) 11> info all

    Program     Status      Group       Lag           Time Since Chkpt
    MANAGER     RUNNING
    REPLICAT    RUNNING     ROLAP01     00:00:00      00:00:22
    REPLICAT    RUNNING     ROLAP02     00:00:00      00:00:14

Check the detail of the new Replicat process.

    GGSCI (dbserver2) 12> info REPLICAT ROLAP02, detail

    REPLICAT   ROLAP02   Last Started 2010-04-08 14:18   Status RUNNING
    Checkpoint Lag       00:00:00 (updated 00:00:06 ago)
    Log Read Checkpoint  File ./dirdat/tb000279
                         First Record  RBA 43750979

    Extract Source          Begin              End
    ./dirdat/tb000279       * Initialized *    First Record
    ./dirdat/tb000279       * Initialized *    First Record
    ./dirdat/tb000279       * Initialized *    2010-04-08 12:26
    ./dirdat/tb000279       * Initialized *    First Record

Generate a report for the new Replicat process ROLAP02.

    GGSCI (dbserver2) 13> send REPLICAT ROLAP02, report

    Sending REPORT request to REPLICAT ROLAP02 ...
    Request processed.

Now view the report to confirm that the new Replicat process has started from the specified start point (RBA 43750979 and SEQNO 279). The following is an extract from the report:

    GGSCI (dbserver2) 14> view report ROLAP02

    2010-04-08 14:20:18  GGS INFO  379  Positioning with begin time: Apr 08, 2010 14:18:19 PM,
    starting record time: Apr 08, 2010 14:17:25 PM at extseqno 279, extrba 43750979.
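Once both groups have been running for a while, their throughput can be compared against the baseline recorded earlier. A brief illustration using the standard GGSCI STATS command (output omitted here, as it varies by environment):

    GGSCI (dbserver2) 15> stats replicat ROLAP01
    GGSCI (dbserver2) 16> stats replicat ROLAP02

Broadly similar operation counts across the two groups would suggest that the @RANGE split is distributing rows evenly.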


Oracle GoldenGate: Considerations for Designing a Solution

Packt
24 Feb 2011
8 min read
Oracle GoldenGate 11g Implementer's Guide: design, install, and configure high-performance data replication solutions using Oracle GoldenGate.

- The very first book on GoldenGate, focused on design and performance tuning in enterprise-wide environments
- Exhaustive coverage and analysis of all aspects of the GoldenGate software implementation, including design, installation, and advanced configuration
- Migrate your data replication solution from Oracle Streams to GoldenGate
- Design a GoldenGate solution that meets all the functional and non-functional requirements of your system
- Written in a simple illustrative manner, providing step-by-step guidance with discussion points
- Goes way beyond the manual, appealing to Solution Architects, System Administrators, and Database Administrators

At a high level, the design must include the following generic requirements:

- Hardware
- Software
- Network
- Storage
- Performance

All of the above must be factored into the overall system architecture. So let's take a look at some of the options and the key design issues.

Replication methods

So you have a fast, reliable network between your source and target sites. You also have a schema design that is scalable and logically split. You now need to choose the replication architecture: one-to-one, one-to-many, active-active, active-passive, and so on. This consideration may already be answered for you by the sheer fact of what the system has to achieve. Let's take a look at some configuration options.

Active-active

Let's assume a multi-national computer hardware company has offices in London and New York. Data entry clerks are employed at both sites, inputting orders into an Order Management System. There is also a procurement department that updates the system inventory with volumes of stock and new products related to the US or European market. European countries are managed by London, and the US states are managed by New York. A requirement exists where the underlying database systems must be kept in synchronisation. Should one of the systems fail, London users can connect to New York and vice versa, allowing business to continue and orders to be taken.

Oracle GoldenGate's active-active architecture provides the best solution to this requirement, ensuring that the database systems on both sides of the pond are kept synchronised in case of failure. Another feature the active-active configuration has to offer is the ability to load balance operations. Rather than have effectively a DR site in both locations, the European users could be allowed access to the New York and London systems and vice versa. Should a site fail, then the DR solution could be quickly implemented.

Active-passive

The active-passive bi-directional configuration replicates data from an active primary database to a full replica database. Sticking with the earlier example, the business would need to decide which site is the primary, where all users connect. For example, in the event of a failure in London, the application could be configured to fail over to New York. Depending on the failure scenario, another option is to start up the passive configuration, effectively turning the active-passive configuration into active-active.

Cascading

The cascading GoldenGate topology offers a number of "drop-off" points that are intermediate targets being populated from a single source. The question here is "what data do I drop at which site?"
Once this question has been answered by the business, it is then a case of configuring filters in Replicat parameter files, allowing just the selected data to be replicated. All of the data is passed on to the next target, where it is filtered and applied again. This type of configuration lends itself to a head office system updating its satellite office systems in a round-robin fashion. In this case, only the relevant data is replicated at each target site.

Another design is the hub-and-spoke solution, where all target sites are updated simultaneously. This is a typical head office topology, but additional configuration and resources would be required at the source site to ship the data in a timely manner. The CPU, network, and file storage requirements must be sufficient to accommodate and send the data to multiple targets.

Physical Standby

A Physical Standby database is a robust Oracle DR solution managed by the Oracle Data Guard product. The Physical Standby database is essentially a mirror copy of its primary, which lends itself perfectly to failover scenarios. However, it is not easy to replicate data from the Physical Standby database, because it does not generate any of its own redo. That said, it is possible to configure GoldenGate to read the archived standby logs in Archive Log Only (ALO) mode. Despite being potentially slower, it may be prudent to feed a downstream system on the DR site using this mechanism, rather than having two data streams configured from the primary database. This reduces network bandwidth utilization, as shown in the following diagram:

Reducing network traffic is particularly important when there is considerable distance between the primary and the DR site.

Networking

The network should not be taken for granted. It is a fundamental component in data replication and must be considered in the design process. Not only must it be fast, it must be reliable. In the following paragraphs, we look at ways to make our network resilient to faults and subsequent outages, in an effort to maintain zero downtime.

Surviving network outages

Probably one of your biggest fears in a replication environment is network failure. Should the network fail, the source trail will fill as the transactions continue on the source database, ultimately filling the filesystem to 100% utilization and causing the Extract process to abend. Depending on the length of the outage, data in the database's redo logs may be overwritten, leaving you with the additional task of configuring GoldenGate to extract data from the database's archived logs. This is not ideal, as you already have the backlog of data in the trail files to ship to the target site once the network is restored. Therefore, ensure there is sufficient disk space available to accommodate data for the longest network outage during the busiest period. Disks are relatively cheap nowadays. Providing ample space for your trail files will help to reduce the recovery time from the network outage.

Redundant networks

One of the key components in your GoldenGate implementation is the network. Without the ability to transfer data from the source to the target, it is rendered useless. So, you not only need a fast network but one that will always be available. This is where redundant networks come into play, offering speed and reliability.

NIC teaming

One method of achieving redundancy is Network Interface Card (NIC) teaming or bonding. Here two or more Ethernet ports can be "coupled" to form a bonded network supporting one IP address.
The main goal of NIC teaming is to use two or more Ethernet ports connected to two or more different access network switches, thus avoiding a single point of failure. The following diagram illustrates the redundant features of NIC teaming:

Linux (OEL/RHEL 4 and above) supports NIC teaming with no additional software requirements. It is purely a matter of network configuration stored in text files in the /etc/sysconfig/network-scripts directory. The following steps show how to configure a server for NIC teaming:

First, you need to log on as the root user and create a bond0 config file using the vi text editor.

    # vi /etc/sysconfig/network-scripts/ifcfg-bond0

Append the following lines to it, replacing the IP address with your actual IP address, then save the file and exit to the shell prompt:

    DEVICE=bond0
    IPADDR=192.168.1.20
    NETWORK=192.168.1.0
    NETMASK=255.255.255.0
    USERCTL=no
    BOOTPROTO=none
    ONBOOT=yes

Choose the Ethernet ports you wish to bond, and then open both configurations in turn using the vi text editor, replacing ethn with the respective port number.

    # vi /etc/sysconfig/network-scripts/ifcfg-eth2
    # vi /etc/sysconfig/network-scripts/ifcfg-eth4

Modify the configuration as follows:

    DEVICE=ethn
    USERCTL=no
    ONBOOT=yes
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none

Save the files and exit to the shell prompt. To make sure the bonding module is loaded when the bonding interface (bond0) is brought up, you need to modify the kernel modules configuration file:

    # vi /etc/modprobe.conf

Append the following two lines to the file:

    alias bond0 bonding
    options bond0 mode=balance-alb miimon=100

Finally, load the bonding module and restart the network services:

    # modprobe bonding
    # service network restart

You now have a bonded network that will load balance when both physical networks are available, providing additional bandwidth and enhanced performance. Should one network fail, the available bandwidth will be halved, but the network will still be available.

Non-functional requirements (NFRs)

Irrespective of the functional requirements, the design must also include the non-functional requirements (NFRs) in order to achieve the overall goal of delivering a robust, high-performance, and stable system.

Latency

One of the main NFRs is performance. How long does it take to replicate a transaction from the source database to the target? This is known as end-to-end latency, which typically has a threshold that must not be breached in order to satisfy the specified NFR. GoldenGate refers to latency as lag, which can be measured at different intervals in the replication process. These are:

- Source to Extract: The time taken for a record to be processed by the Extract compared to the commit timestamp on the database
- Replicat to Target: The time taken for the last record to be processed by the Replicat compared to the record creation time in the trail file

A well-designed system may encounter spikes in latency, but it should never be continuous or growing. Trying to tune GoldenGate when the design is poor is a difficult situation to be in. For the system to perform well you may need to revisit the design.
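Lag at these points can be inspected from GGSCI using standard GoldenGate commands; the group names below are assumptions for illustration:

    GGSCI> lag extract EOLTP01
    GGSCI> lag replicat ROLAP01
    GGSCI> send extract EOLTP01, getlag
    GGSCI> info all

LAG and SEND ... GETLAG query the running process directly, while INFO ALL gives a one-line summary per group whose lag figure is based on the last checkpoint. Continuously growing values at any of these points indicate the sustained latency problem described above rather than a transient spike.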


Oracle GoldenGate 11g: Configuration for High Availability

Packt
23 Feb 2011
10 min read
Oracle GoldenGate 11g Implementer's Guide: design, install, and configure high-performance data replication solutions using Oracle GoldenGate.

- The very first book on GoldenGate, focused on design and performance tuning in enterprise-wide environments
- Exhaustive coverage and analysis of all aspects of the GoldenGate software implementation, including design, installation, and advanced configuration
- Migrate your data replication solution from Oracle Streams to GoldenGate
- Design a GoldenGate solution that meets all the functional and non-functional requirements of your system
- Written in a simple illustrative manner, providing step-by-step guidance with discussion points
- Goes way beyond the manual, appealing to Solution Architects, System Administrators, and Database Administrators

This includes the following discussion points:

- Shared storage options
- Configuring Clusterware for GoldenGate
- GoldenGate on Exadata
- Failover

We also touch upon the new features available in Oracle 11g Release 2, including the Database Machine, which provides an "HA solution in a box".

GoldenGate on RAC

A number of architectural options are available to Oracle RAC, particularly surrounding storage. Since Oracle 11g Release 2, these options have grown, making it possible to configure the whole RAC environment using Oracle software, whereas in earlier versions, third-party clusterware and storage solutions had to be used. Let's start by looking at the importance of shared storage.

Shared storage

The secret to RAC is "share everything", and this also applies to GoldenGate. RAC relies on shared storage in order to support a single database having multiple instances, residing on individual nodes. Therefore, as a minimum, the GoldenGate checkpoint and trail files must be on the shared storage so all Oracle instances can "see" them. Should a node fail, a surviving node can "take the reins" and continue the data replication without interruption. Since Oracle 11g Release 2, in addition to ASM, the shared storage can be an ACFS or a DBFS.

Automatic Storage Management Cluster File System (ACFS)

ACFS is Oracle's multi-platform, scalable file system and storage management technology that extends ASM functionality to support files maintained outside of the Oracle Database. This lends itself perfectly to supporting the required GoldenGate files. However, any Oracle files that could be stored in regular ASM diskgroups are not supported by ACFS. This includes the OCR and Voting files that are fundamental to RAC.

Database File System (DBFS)

Another Oracle solution to the shared filesystem is DBFS, which creates a standard file system interface on top of files and directories that are actually stored as SecureFile LOBs in database tables. DBFS is similar to Network File System (NFS) in that it provides a shared network file system that "looks like" a local file system. On Linux, you need a DBFS client that has a mount interface that utilizes the Filesystem in Userspace (FUSE) kernel module, providing a file-system mount point to access the files stored in the database.

This mechanism is also ideal for sharing GoldenGate files among the RAC nodes. It also supports the Oracle Cluster Registry (OCR) and Voting files, plus Oracle homes. DBFS requires an Oracle Database 11gR2 (or higher) database. You can use DBFS to store GoldenGate recovery related files for lower releases of the Oracle Database, but you will need to create a separate Oracle Database 11gR2 (or higher) database to host the file system.
Configuring Clusterware for GoldenGate

Oracle Clusterware will ensure that GoldenGate can tolerate server failures by moving processing to another available server in the cluster. It can support the management of a third-party application in a clustered environment. This capability will be used to register and relocate the GoldenGate Manager process.

Once the GoldenGate software has been installed across the cluster and a script to start, check, and stop GoldenGate has been written and placed on the shared storage (so it is accessible to all nodes), the GoldenGate Manager process can be registered in the cluster. Standard Oracle Clusterware commands can then be used to create, register, and set privileges on the virtual IP address (VIP) and the GoldenGate application.

The Virtual IP

The VIP is a key component of Oracle Clusterware that can dynamically relocate the IP address to another server in the cluster, allowing connections to fail over to a surviving node. The VIP provides faster failovers compared to the TCP/IP timeout-based failovers on a server's actual IP address. On Linux this can take up to 30 minutes using the default kernel settings!

The prerequisites are as follows:

- The VIP must be a fixed IP address on the public subnet.
- The interconnect must use a private non-routable IP address, ideally over Gigabit Ethernet.
- Use a VIP to access the GoldenGate Manager process, to isolate access to the Manager process from the physical server.
- Remote data pump processes must also be configured to use the VIP to contact the GoldenGate Manager.

The following diagram illustrates the RAC architecture for two nodes (rac1 and rac2) supporting two Oracle instances (oltp1 and oltp2). The VIPs are 11.12.1.6 and 11.12.1.8 respectively, in this example:

The user community or application servers connect to either instance via the VIP and a load balancing database service that has been configured on the database and in the client's SQL*Net tnsnames.ora file or JDBC connect string. The following example shows a typical tnsnames entry for a load balancing service. Load balancing is the default and does not need to be explicitly configured. Hostnames can replace the IP addresses in the tnsnames.ora file as long as they are mapped to the relevant VIP in the client's system hosts file.

    OLTP =
      (DESCRIPTION =
        (ADDRESS_LIST =
          (ADDRESS = (PROTOCOL = TCP)(HOST = 11.12.1.6)(PORT = 1521))
          (ADDRESS = (PROTOCOL = TCP)(HOST = 11.12.1.8)(PORT = 1521))
        )
        (CONNECT_DATA =
          (SERVICE_NAME = oltp)
        )
      )

This is the recommended approach for scalability and performance and is known as active-active. Another HA solution is the active-passive configuration, where users connect to one instance only, leaving the passive instance available for node failover. The term active-active or active-passive in this context relates to 2-node RAC environments and is not to be confused with the GoldenGate topology of the same name.

On Linux systems, the database server hostname will typically have the following format in the /etc/hosts file:

- For the public VIP: <hostname>-vip
- For the private interconnect: <hostname>-pri

The following is an example hosts file for a RAC node:

    127.0.0.1       localhost.localdomain localhost
    ::1             localhost6.localdomain6 localhost6
    # Virtual IP Public Address
    11.12.1.6       rac1-vip rac1-vip
    11.12.1.8       rac2-vip rac2-vip
    # Private Address
    192.168.1.33    rac1-pri rac1-pri
    192.168.1.34    rac2-pri rac2-pri

Creating a GoldenGate application

The following steps guide you through the process of configuring GoldenGate on RAC.
This example is for an Oracle 11g Release 1 RAC environment.

Install GoldenGate as the Oracle user on each node in the cluster, or on a shared mount point that is visible from all nodes. If installing the GoldenGate home on each node, ensure the checkpoint and trail files are on the shared filesystem. Ensure the GoldenGate Manager process is configured to use the AUTOSTART and AUTORESTART parameters, allowing GoldenGate to start the Extract and Replicat processes as soon as the Manager starts.

Configure a VIP for the GoldenGate application as the Oracle user from one node:

    <CLUSTERWARE_HOME>/bin/crs_profile -create ggsvip -t application -a <CLUSTERWARE_HOME>/bin/usrvip -o oi=bond1,ov=11.12.1.6,on=255.255.255.0

- CLUSTERWARE_HOME is the Oracle home in which Oracle Clusterware is installed, for example /u01/app/oracle/product/11.1.0/crs.
- ggsvip is the name of the application VIP that you will create.
- oi=bond1 is the public interface in this example.
- ov=11.12.1.6 is the virtual IP address in this example.
- on=255.255.255.0 is the subnet mask. This should be the same subnet mask as the public IP address.

Next, register the VIP in the Oracle Cluster Registry (OCR) as the Oracle user:

    <CLUSTERWARE_HOME>/bin/crs_register ggsvip

Set the ownership of the VIP to the root user, who assigns the IP address. Execute the following command as the root user:

    <CLUSTERWARE_HOME>/bin/crs_setperm ggsvip -o root

Set read and execute permissions for the Oracle user. Execute the following command as the root user:

    <CLUSTERWARE_HOME>/bin/crs_setperm ggsvip -u user:oracle:r-x

As the Oracle user, start the VIP:

    <CLUSTERWARE_HOME>/bin/crs_start ggsvip

To verify that the VIP is running, execute the following command, then ping the IP address from a different node in the cluster.

    <CLUSTERWARE_HOME>/bin/crs_stat ggsvip -t
    Name           Type            Target     State      Host
    ------         -------         ------     -------    ------
    ggsvip         application     ONLINE     ONLINE     rac1

    ping -c3 11.12.1.6
    64 bytes from 11.12.1.6: icmp_seq=1 ttl=64 time=0.096 ms
    64 bytes from 11.12.1.6: icmp_seq=2 ttl=64 time=0.122 ms
    64 bytes from 11.12.1.6: icmp_seq=3 ttl=64 time=0.141 ms

    --- 11.12.1.6 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2000ms
    rtt min/avg/max/mdev = 0.082/0.114/0.144/0.025 ms

Oracle Clusterware supports the use of "Action" scripts within its configuration, allowing bespoke scripts to be executed automatically during failover. Create a Linux shell script named ggs_action.sh that accepts three arguments: start, stop, or check. Place the script in the <CLUSTERWARE_HOME>/crs/public directory on each node, or if you have installed GoldenGate on a shared mount point, copy it there. Ensure that:

- start and stop return 0 if successful, 1 if unsuccessful.
- check returns 0 if GoldenGate is running, 1 if it is not running.

As the Oracle user, make sure the script is executable:

    chmod 754 ggs_action.sh

To check that the GoldenGate processes are running, ensure the action script has the following commands. The following example can be expanded to include checks for Extract and Replicat processes. First, check the Linux process ID (PID) the GoldenGate Manager process is configured to use:

    GGS_HOME=/mnt/oracle/ggs   # Oracle GoldenGate home
    pid=`cut -f8 ${GGS_HOME}/dirpcs/MGR.pcm`

Then, compare this value (in variable $pid) with the actual PID the Manager process is using. The following example will return the correct PID of the Manager process if it is running:

    ps -e |grep ${pid} |grep mgr |cut -d " " -f2

The code to start and stop a GoldenGate process is simply a call to ggsci.
    ggsci_command=$1
    ggsci_output=`${GGS_HOME}/ggsci << EOF
    ${ggsci_command}
    exit
    EOF`

Create a profile for the GoldenGate application as the Oracle user from one node:

    <CLUSTERWARE_HOME>/bin/crs_profile -create goldengate_app -t application -r ggsvip -a <CLUSTERWARE_HOME>/crs/public/ggs_action.sh -o ci=10

- CLUSTERWARE_HOME is the Oracle home in which Oracle Clusterware is installed, for example /u01/app/oracle/product/11.1.0/crs.
- -create goldengate_app: the application name is goldengate_app.
- -r specifies the required resources that must be running for the application to start. In this example, the dependency is that the VIP ggsvip must be running before Oracle GoldenGate starts.
- -a specifies the action script, for example <CLUSTERWARE_HOME>/crs/public/ggs_action.sh.
- -o specifies options. In this example the only option is the check interval, which is set to 10 seconds.

Next, register the application in the Oracle Cluster Registry (OCR) as the oracle user:

    <CLUSTERWARE_HOME>/bin/crs_register goldengate_app

Now start the GoldenGate application as the Oracle user:

    <CLUSTERWARE_HOME>/bin/crs_start goldengate_app

Check that the application is running:

    <CLUSTERWARE_HOME>/bin/crs_stat goldengate_app -t
    Name             Type            Target     State     Host
    ------           ------          --------   -----     ----
    goldengate_app   application     ONLINE     ONLINE    rac1

You can also stop GoldenGate from Oracle Clusterware by executing the following command as the oracle user:

    <CLUSTERWARE_HOME>/bin/crs_stop goldengate_app

Oracle has published a White Paper on "Oracle GoldenGate high availability with Oracle Clusterware". To view the Action script mentioned in this article, refer to the document, which can be downloaded in PDF format from the Oracle website at the following URL: http://www.oracle.com/technetwork/middleware/goldengate/overview/ha-goldengate-whitepaper-128197.pdf
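The article describes the contract for ggs_action.sh (three arguments, specific return codes) but defers the full listing to Oracle's White Paper. Purely as an assumed sketch of how such a script might be laid out, reusing the GGS_HOME path and check logic from the fragments above (paths, sleep interval, and structure are illustrative assumptions, not the supported script):

    #!/bin/bash
    # Hypothetical skeleton of a Clusterware action script for GoldenGate.
    # The full, supported version is in Oracle's White Paper referenced above.

    GGS_HOME=/mnt/oracle/ggs    # Oracle GoldenGate home (assumed path)

    call_ggsci () {
        # Run a single GGSCI command
        ${GGS_HOME}/ggsci << EOF
    $1
    exit
    EOF
    }

    check_manager () {
        # Compare the PID recorded in the Manager checkpoint file
        # with the PID of the running mgr process
        pid=`cut -f8 ${GGS_HOME}/dirpcs/MGR.pcm 2>/dev/null`
        [ -n "$pid" ] && ps -e | grep "${pid}" | grep -q mgr
    }

    case "$1" in
        start)
            call_ggsci "start manager" > /dev/null
            sleep 5
            check_manager && exit 0 || exit 1
            ;;
        stop)
            call_ggsci "stop manager!" > /dev/null
            sleep 5
            check_manager && exit 1 || exit 0
            ;;
        check)
            check_manager && exit 0 || exit 1
            ;;
        *)
            echo "Usage: $0 {start|stop|check}"
            exit 1
            ;;
    esac

With AUTOSTART and AUTORESTART set in the Manager parameter file, starting the Manager from this script is enough for the Extract and Replicat groups to follow.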


OpenAM: Oracle DSEE and Multiple Data Stores

Packt
03 Feb 2011
6 min read
OpenAM: written and tested with OpenAM Snapshot 9—the Single Sign-On (SSO) tool for securing your web applications in a fast and easy way.

- The first and the only book that focuses on implementing Single Sign-On using OpenAM
- Learn how to use OpenAM quickly and efficiently to protect your web applications with the help of this easy-to-grasp guide
- Written by Indira Thangasamy, core team member of the OpenSSO project from which OpenAM is derived
- Real-world examples for integrating OpenAM with various applications

Oracle Directory Server Enterprise Edition

The Oracle Directory Server Enterprise Edition type of identity store is predominantly used by customers and natively supported by the OpenSSO server. It is the only data store where all the user management features offered by OpenSSO are supported with no exceptions. In the console it is labeled as Sun DS with OpenSSO schema. After Oracle acquired Sun Microsystems Inc., the brand name for the Sun Directory Server Enterprise Edition was changed to Oracle Directory Server Enterprise Edition (ODSEE). You need to have ODSEE configured prior to creating the data store, either from the console or the CLI, as shown in the following section.

Creating a data store for Oracle DSEE

Creating a data store from the OpenSSO console is the easiest way to achieve this task. However, if you want to automate this process in a repeatable fashion, then using the ssoadm tool is the right choice. The problem here is obtaining the list of attributes and their corresponding values, which is not documented anywhere in the publicly available documentation for OpenSSO. Let me show you the easy way out of this by providing the required options and their values for ssoadm:

    ./ssoadm create-datastore -m odsee_datastore -t LDAPv3ForAMDS -u amadmin -f /tmp/.passwd_of_amadmin -D data_store_odsee.txt -e /

This command will create the data store that talks to an Oracle DSEE store. The key options in the preceding command are LDAPv3ForAMDS (instructing the server to create a data store of type ODSEE) and the properties that have been included in data_store_odsee.txt. You can find the complete contents of this file as part of the code bundle provided by Packt Publishers. Some excerpts from the file are given as follows:

    sun-idrepo-ldapv3-config-ldap-server=odsee.packt-services.net:4355
    sun-idrepo-ldapv3-config-authid=cn=directory manager
    sun-idrepo-ldapv3-config-authpw=dssecret12
    com.iplanet.am.ldap.connection.delay.between.retries=1000
    sun-idrepo-ldapv3-config-auth-naming-attr=uid
    <contents removed to save paper space>
    sun-idrepo-ldapv3-config-users-search-attribute=uid
    sun-idrepo-ldapv3-config-users-search-filter=(objectclass=inetorgperson)
    sunIdRepoAttributeMapping=
    sunIdRepoClass=com.sun.identity.idm.plugins.ldapv3.LDAPv3Repo
    sunIdRepoSupportedOperations=filteredrole=read,create,edit,delete
    sunIdRepoSupportedOperations=group=read,create,edit,delete
    sunIdRepoSupportedOperations=realm=read,create,edit,delete,service
    sunIdRepoSupportedOperations=role=read,create,edit,delete
    sunIdRepoSupportedOperations=user=read,create,edit,delete,service

Among these properties, the first three provide critical information for the whole thing to work. As is evident from the property names, they denote the ODSEE server name and port, the bind DN, and the password. The password is in plain text, so for security reasons you need to remove it from the input file after creating the data store.
Updating the data store

As discussed in the previous section, one can invoke the update-datastore subcommand with the appropriate properties and their values, along with the -a switch to the ssoadm tool. If you have more properties to be updated, then put them in a text file and use it with the -D option, as with the create-datastore subcommand.

Deleting the data store

A data store can be deleted by just selecting its specific name from the console. No warnings are issued while deleting data stores, and you could eventually delete all of the data stores in a realm. Be cautious about this behavior; you might end up deleting all of them unintentionally. Deleting a data store does not remove any existing data in the underlying LDAP server; it only removes the configuration from the OpenSSO server:

./ssoadm delete-datastores -e / -m odsee_datastore -f /tmp/.passwd_of_amadmin -u amadmin

Data store for OpenDS

OpenDS is one of the popular LDAP servers that is completely written in Java and available freely under an open source license. As a matter of fact, the embedded configuration store that is built into the OpenSSO server is the embedded version of OpenDS. It has been fully tested with the standalone OpenDS version for identity store usage. The data store creation and management are pretty much similar to the steps described in the foregoing section, except for the type of store and the corresponding properties' values. The properties and their values are given in the file data_store_opends.txt (available as part of the code bundle). Invoke the ssoadm tool with this property file after making appropriate changes to fit your deployment. Here is the sample command that creates the data store for OpenDS:

./ssoadm create-datastore -u amadmin -f /tmp/.passwd_of_amadmin -e / -m "OpenDS-store" -t LDAPv3ForOpenDS -D data_store_opends.txt

Data store for Tivoli DS

The IBM Tivoli Directory Server 6.2 is one of the supported LDAP servers for the OpenSSO server to provide authentication and authorization services. A specific sub configuration, LDAPv3ForTivoli, is available out of the box to support this server. You can use data_store_tivoli.txt to create a new data store by supplying the -t LDAPv3ForTivoli option to the ssoadm tool. The sample file (data_store_tivoli.txt, found as part of the code bundle) contains the entries, including the default group, that are shipped with Tivoli DS for easy understanding. You can customize it to any valid values. Here is the sample command that creates the data store for Tivoli DS:

./ssoadm create-datastore -u amadmin -f /tmp/.passwd_of_amadmin -e / -m "Tivoli-store" -t LDAPv3ForTivoli -D data_store_tivoli.txt

Data store for Active Directory

Microsoft Active Directory provides most of the LDAPv3 features, including support for persistent search notifications. Creating a data store for this is also a straightforward process and is available out of the box. You can use data_store_ad.txt to create a new data store by supplying the -t LDAPv3ForAD option to the ssoadm tool. Here is the sample command that creates the data store for AD:

./ssoadm create-datastore -u amadmin -f /tmp/.passwd_of_amadmin -e / -m "AD-store" -t LDAPv3ForAD -D data_store_ad.txt

Data store for Active Directory Application Mode

Microsoft Active Directory Application Mode (ADAM) is the lightweight version of Active Directory with a simplified schema.
In an ADAM instance it is possible to set a user password over LDAP, unlike Active Directory, where password-related operations must happen over LDAPS. Creating a data store for this is also a straightforward process and is available out of the box. You can use data_store_adam.txt to create a new data store by supplying the -t LDAPv3ForADAM option to the ssoadm tool:

./ssoadm create-datastore -u amadmin -f /tmp/.passwd_of_amadmin -e / -m "ADAM-store" -t LDAPv3ForADAM -D data_store_adam.txt
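Once the stores are in place, two follow-up operations tend to come up in practice: confirming what was created and tweaking a single property without editing the whole property file. The sketch below is hedged: it reuses the realm, data store names, and the update-datastore -a usage described earlier in this article, and it assumes the list-datastores subcommand is available in your build of ssoadm, so check the tool's help output before relying on it.

# List the data stores configured under the root realm (subcommand availability may vary by build)
./ssoadm list-datastores -e / -u amadmin -f /tmp/.passwd_of_amadmin

# Update a single property of the ODSEE data store in place using the -a switch
./ssoadm update-datastore -e / -m odsee_datastore -u amadmin -f /tmp/.passwd_of_amadmin \
  -a "sun-idrepo-ldapv3-config-users-search-filter=(objectclass=inetorgperson)"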

OpenAM: Backup, Recovery, and Logging

Packt
03 Feb 2011
9 min read
OpenAM
Written and tested with OpenAM Snapshot 9—the Single Sign-On (SSO) tool for securing your web applications in a fast and easy way
The first and the only book that focuses on implementing Single Sign-On using OpenAM
Learn how to use OpenAM quickly and efficiently to protect your web applications with the help of this easy-to-grasp guide
Written by Indira Thangasamy, core team member of the OpenSSO project from which OpenAM is derived
Real-world examples for integrating OpenAM with various applications

OpenSSO provides utilities that can be invoked, with the proper procedure, to back up the server configuration data. When a crash or data corruption occurs, the server administrator must initiate a recovery operation. Recovering a backup involves the following two distinct operations:

Restoring the configuration files
Restoring the XML configuration data that was backed up earlier

In general, recovery refers to the various operations involved in restoring, rolling forward, and rolling back a backup. Backup and recovery refers to the various strategies and operations involved in protecting the configuration database against data loss, and reconstructing the database should a loss occur. In this article, you will learn how to use the tools provided by OpenSSO to perform the following:

OpenSSO configuration backup and recovery
Test to production
Trace and debug level logging for troubleshooting
Audit log configuration using flat file
Audit log configuration using RDBMS
Securing the audit logs from intrusion

OpenSSO deals only with backing up the configuration data of the server; backup and recovery of identity data such as users, groups, and roles is handled by the enterprise-level identity management suite of products.

Backing up configuration data

I am sure you are familiar by now with the different configuration stores supported by OpenSSO: an embedded store (based on OpenDS) and a highly scalable Directory Server Enterprise Edition. Regardless of the underlying configuration type, there are certain files that are created in the local file system on the host where the server is deployed. These files (such as the bootstrap file) contain critical pieces of information that help the application initialize; any corruption in these files could cause the server application not to start. Hence it becomes necessary to back up the configuration data stored in the file system. As a result, we can treat backup and recovery as a two-step process:

Back up the configuration files in the local file system
Back up the OpenSSO configuration data in the LDAP configuration

Let us briefly discuss what each option means and when to apply it.

Backing up the OpenSSO configuration files

Typically the server can fail to come up for two reasons: either because it could not find the right configuration file that locates the configuration data store, or because the data contained in the configuration store is corrupted. This is the simplest case I took up for this discussion, because there could be umpteen reasons that could cause a server to fail to start up. OpenSSO provides a subcommand, export-svc-cfg, as part of the ssoadm command line interface. Using this, the customer can back up only the configuration data that is stored in the configuration store, provided the configuration store is up and running.
This backup will not help if the disk that holds the configuration files, such as the bootstrap and OpenDS schema files, crashes, because the backup obtained using export-svc-cfg will not contain these schema and bootstrap files. This jeopardizes the backup and recovery process for OpenSSO. This is why backing up the configuration directory becomes inevitable and vital for the recovery process. To back up the OpenSSO configuration directory, you can use any file archive tool of your choice. To perform this backup, log on to the OpenSSO host as the OpenSSO configuration user, and execute the following command:

zip -r /tmp/opensso_config.zip /export/ssouser/opensso1/

This will back up all the contents of the configuration directory (in this case /export/ssouser/opensso1). Though this is the recommended way to back up, there may be server audit and debug logs that could fill up the disk space as you perform a periodic backup of this directory. The critical files and directories that need to be backed up are as follows:

bootstrap
opends (whole directory, if present)
.version
.configParam
certificate stores

The rest of the content of this directory can be restored from your staging area. If you have customized any of the service schema under the <opensso-config>/config/xml directory, then make sure you back them up. This backup is in itself enough to bring up a corrupted OpenSSO server. When you back up the opends directory, all the OpenSSO configuration information also gets backed up, so you really do not need the backup file that you would generate using export-svc-cfg. This kind of backup is extremely useful and is the only way to perform crash recovery. If OpenDS is itself corrupted due to its internal database or index corruption, it will not start. Hence one cannot access the OpenSSO server or the ssoadm command line tool to restore the XML backup. So, it is a must to back up your configuration directory from the file system periodically.

Backing up the OpenSSO configuration data

The process of backing up the OpenSSO service configuration data is slightly different from the complete backup of the overall system deployment. When the subcommand export-svc-cfg is invoked, the underlying code exports all the nodes under ou=services,ROOT_SUFFIX of the configuration directory server:

./ssoadm export-svc-cfg -u amadmin -f /tmp/passwd_of_amadmin -e secretkeytoencryptpassword -o /tmp/svc-config-bkup.xml

To perform this, you need to have the ssoadm command line tool configured. The options supplied to this command are self-explanatory, except maybe -e. This takes a random string that will be used as the encryption secret to encrypt the password entries in the service configuration data, for example, the RADIUS server's shared secret value. You need this key to restore the data back to the server. OpenSSO and its configuration directory server must be running in good condition for this export operation to succeed. This backup will be useful in the following cases:

An administrator accidentally deleted some of the authentication configurations
An administrator accidentally changed some of the configuration properties
The agent profiles have somehow lost their configuration data
You want to reset to factory defaults

In any case, one should be able to authenticate to OpenSSO as an admin to restore the configuration data. If the server is not in that state, then crash recovery is the only option.
In the embedded store configuration case, this means unzipping the file system configuration backup obtained as described in the Backing up the OpenSSO configuration files section. For configuration data that is stored in the Directory Server Enterprise Edition, the customer should use the tools that are bundled with the Oracle Directory Server Enterprise Edition to back up and restore.

Crash recovery and restore

In the previous section, we briefly covered the crash recovery part. When a crash occurs in the embedded or remote configuration store, the server will not come up again unless it is restored to a valid state. This may involve restoring the proper database state and indexes using a backup of a known valid state. This backup may have been obtained by using the ODSEE backup tools or by simply zipping up the configuration file system of OpenSSO, as described in the Backing up the OpenSSO configuration files section. You need to bring the OpenSSO server back to a state where the administrator can log in to access the console. At this point the configuration exported to XML, as described in the Backing up the OpenSSO configuration data section, can be used. Here is a sample execution of the import-svc-cfg subcommand. It is recommended to back up your vanilla configuration data from the file system periodically, to use it in the crash recovery case (where the embedded store itself is corrupted). Backup of configuration data using export-svc-cfg should be done frequently:

./ssoadm import-svc-cfg -u amadmin -f /tmp/passwd_of_amadmin -e mysecretenckey -X /tmp/svc-config-bkup.xml

This will throw an error (because we have intentionally provided a wrong key) claiming that the secret key provided was wrong. Actually it will show a string such as the following, which is a known bug:

import-service-configuration-secret-key

This is the key name that is supposed to map to a corresponding localizable error string. If you provide the correct encryption key, then it will import successfully:

./ssoadm import-svc-cfg -u amadmin -f /tmp/passwd_of_amadmin -e secretkeytoencryptpassword -X /tmp/svc-config-bkup.xml
Directory Service contains existing data. Do you want to delete it? [y|N] y
Please wait while we import the service configuration...
Service Configuration was imported.

Note that it prompts before overwriting the existing data, to make sure that the current configuration is not overwritten accidentally. There is no incremental restore, so be cautious while performing this operation. An import with a wrong version of the restore file could damage a working configuration. It is always recommended to back up the existing configuration before importing an existing configuration backup file. If you do not want to import the current file, just enter N and the command will terminate without harming your data.

Well, what happens if customers do not have the configuration files backed up? Suppose customers do not have a copy of the configuration files to restore; they can reconfigure the OpenSSO web application by accessing the configurator (after cleaning up the existing configuration). Once they configure the server, they should be able to restore the XML backup. Nevertheless, the new configuration must match all the configuration parameters that were provided earlier, including the hostname and port details. This information can be found in the .configParam file.
If you are planning to export the configuration to a different server than the original one, then you should refer to the Test to production section, which covers how customers can migrate a test configuration to a production server. It requires more steps than simply restoring the configuration data.
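Pulling the two backup steps together, the following is a minimal sketch of a periodic backup job. The paths, encryption key, and file names are the illustrative values used earlier in this article, and the script assumes ssoadm is already configured on the host; adjust them to your deployment before scheduling it.

#!/bin/sh
# Hypothetical nightly backup sketch combining the two backups described above.
STAMP=`date +%Y%m%d`

# 1. File system backup of the configuration directory (bootstrap, opends, and so on)
zip -r /backup/opensso_config_$STAMP.zip /export/ssouser/opensso1/

# 2. XML export of the service configuration (requires OpenSSO and its config store to be up)
./ssoadm export-svc-cfg -u amadmin -f /tmp/passwd_of_amadmin \
  -e secretkeytoencryptpassword -o /backup/svc-config-bkup_$STAMP.xml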

Creating Time Series Charts in R

Packt
01 Feb 2011
5 min read
Formatting time series data for plotting

Time series or trend charts are the most common form of line graphs. There are many ways to plot such data in R; however, it is important to first get the data into a format that R can understand. In this recipe, we will look at some ways of formatting time series data using the base functions and some additional packages.

Getting ready

In addition to the basic R functions, we will also be using the zoo package in this recipe. So first we need to install it:

install.packages("zoo")

How to do it...

Let's use the dailysales.csv example dataset and format its date column:

sales<-read.csv("dailysales.csv")
d1<-as.Date(sales$date,"%d/%m/%y")
d2<-strptime(sales$date,"%d/%m/%y")
data.class(d1)
[1] "Date"
data.class(d2)
[1] "POSIXt"

How it works...

We have seen two different functions to convert a character vector into dates. If we did not convert the date column, R would not automatically recognize the values in the column as dates. Instead, the column would be treated as a character vector or a factor.

The as.Date() function takes at least two arguments: the character vector to be converted to dates and the format to which we want it converted. It returns an object of the Date class, represented as the number of days since 1970-01-01, with negative values for earlier dates. The values in the date column are in a DD/MM/YYYY format (you can verify this by typing sales$date at the R prompt). So, we specify the format argument as "%d/%m/%y". Please note that this order is important: if we instead use "%m/%d/%y", then our days will be read as months and vice versa. The quotes around the value are also necessary.

The strptime() function is another way to convert character vectors into dates. However, strptime() returns a different kind of object, of class POSIXlt, which is a named list of vectors representing the different components of a date and time, such as year, month, day, hours, minutes, seconds, and a few more. POSIXlt is one of the two basic classes of date/times in R. The other class, POSIXct, represents the (signed) number of seconds since the beginning of 1970 (in the UTC time zone) as a numeric vector. POSIXct is more convenient for including in data frames, and POSIXlt is closer to human-readable forms. Both classes inherit from the virtual class POSIXt, which is why, when we ran the data.class() function on d2 earlier, we got POSIXt as the result. strptime() also takes a character vector to be converted and the format as arguments.

There's more...

The zoo package is handy for dealing with time series data. The zoo() function takes an argument x, which can be a numeric vector, matrix, or factor. It also takes an order.by argument, which has to be an index vector with unique entries by which the observations in x are ordered:

library(zoo)
d3<-zoo(sales$units,as.Date(sales$date,"%d/%m/%y"))
data.class(d3)
[1] "zoo"

See the help on DateTimeClasses to find out more details about the ways dates can be represented in R.
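To make the POSIXlt description above concrete, here is a small sketch (assuming the sales data frame loaded earlier) that pulls individual date components out of the object returned by strptime(). Note that the month field is zero-based and the year is counted from 1900.

d2 <- strptime(sales$date, "%d/%m/%y")

# Individual components of a POSIXlt object
d2$mday[1]         # day of the month for the first record
d2$mon[1] + 1      # months are stored 0-11, so add 1 for the calendar month
d2$year[1] + 1900  # years are stored as an offset from 1900

unclass(d2[1])     # view all components of the first element at once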
Plotting date and time on the X axis

In this recipe, we will learn how to plot formatted date or time values on the X axis.

Getting ready

For the first example, we only need to use the base graphics function plot().

How to do it...

We will use the dailysales.csv example dataset to plot the number of units of a product sold daily in a month:

sales<-read.csv("dailysales.csv")
plot(sales$units~as.Date(sales$date,"%d/%m/%y"),type="l",
     xlab="Date",ylab="Units Sold")

How it works...

Once we have formatted the series of dates using as.Date(), we can simply pass it to the plot() function as the x variable in either the plot(x,y) or plot(y~x) format. We can also use strptime() instead of as.Date(). However, we cannot pass the object returned by strptime() to plot() in the plot(y~x) format; we must use the plot(x,y) format as follows:

plot(strptime(sales$date,"%d/%m/%Y"),sales$units,type="l",
     xlab="Date",ylab="Units Sold")

There's more...

We can plot the example using the zoo() function as follows (assuming zoo is already installed):

library(zoo)
plot(zoo(sales$units,as.Date(sales$date,"%d/%m/%y")))

Note that we don't need to specify x and y separately when plotting using zoo; we can just pass the object returned by zoo() to plot(). We also need not specify the type as "l".

Let's look at another example, which has full date and time values on the X axis instead of just dates. We will use the openair.csv example dataset for this example:

air<-read.csv("openair.csv")
plot(air$nox~as.Date(air$date,"%d/%m/%Y %H:%M"),type="l",
     xlab="Time", ylab="Concentration (ppb)",
     main="Time trend of Oxides of Nitrogen")

The same graph can be made using zoo as follows:

plot(zoo(air$nox,as.Date(air$date,"%d/%m/%Y %H:%M")),
     xlab="Time", ylab="Concentration (ppb)",
     main="Time trend of Oxides of Nitrogen")
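If the default tick labels on the date axis are not what you want, one possible approach (a sketch that is not part of the original recipe) is to suppress the x axis and draw it yourself with axis() and format(), reusing the dailysales data from above:

d <- as.Date(sales$date, "%d/%m/%y")

# Draw the plot without an x axis, then add custom date labels
plot(sales$units ~ d, type = "l", xaxt = "n",
     xlab = "Date", ylab = "Units Sold")
axis(1, at = d, labels = format(d, "%d %b"))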

FAQs on Microsoft SQL Server 2008 High Availability

Packt
28 Jan 2011
6 min read
Microsoft SQL Server 2008 High Availability
Minimize downtime, speed up recovery, and achieve the highest level of availability and reliability for SQL Server applications by mastering the concepts of database mirroring, log shipping, clustering, and replication
Install various SQL Server High Availability options in a step-by-step manner
A guide to SQL Server High Availability for DBA aspirants, proficient developers and system administrators
Learn the pre and post installation concepts and common issues you come across while working on SQL Server High Availability
Tips to enhance performance with SQL Server High Availability
External references for further study

Q: What is Clustering?
A: Clustering is usually deployed when there is a critical business application running that needs to be available 24 x 7, or in terminology, High Availability. These clusters are known as Failover clusters because the primary goal of setting up the cluster is to make business-critical services or processes available 24 x 7 with 99.99% uptime.

Q: How do the MS Windows Server Enterprise and Datacenter editions support failover clustering?
A: MS Windows Server Enterprise and Datacenter editions support failover clustering. This is achieved by having two or more identical nodes connected to each other by means of a private network and commonly used resources. In case of failure of any common resource or service, the first node (Active) passes ownership to another node (Passive).

Q: What is MSDTC?
A: Microsoft Distributed Transaction Coordinator (MSDTC) is a service used by SQL Server when it is required to have distributed transactions between more than one machine. In a clustered environment, the SQL Server service can be hosted on any of the available nodes if the active node fails, and in this case MSDTC comes into the picture when we have distributed queries and for replication; hence the MSDTC service should be running. Following are a couple of questions with regard to MSDTC.

Q: What will happen to the data that is being accessed?
A: The data is taken care of by the shared disk arrays: the storage is shared and every node that is part of the cluster can access it; however, only one node at a time can access and own it.

Q: What about clients that were connected previously? Does the failover mean that developers will have to modify the connection string?
A: Nothing like this happens. SQL Server is installed as a virtual server and it has a virtual IP address, which is shared by every cluster node. So, the client actually knows only one SQL Server or its IP address. Here are the steps that explain how failover works:

Node 1 owns the resources as of now, and is the active node.
The network adapter driver gets corrupted or suffers physical damage.
The heartbeat between Node 1 and Node 2 is broken.
Node 2 initiates the process to take ownership of the resources owned by Node 1. It takes approximately two to five minutes to complete the process.

Q: What is Hyper-V? What are its uses?
A: Let's see what Hyper-V is:

It is a hypervisor-based technology that allows multiple operating systems to run on a host operating system at the same time.
There are advantages to using SQL Server 2008 R2 on Windows Server 2008 R2 with Hyper-V; one such example is the ability to migrate a live server, thereby increasing high availability without incurring downtime.
Hyper-V now supports up to 64 logical processors.
It can host up to four VMs on a single licensed host server.
SQL Server 2008 R2 allows an unrestricted number of virtual servers, thus making consolidation easy.
It has the ability to manage multiple SQL Servers centrally using Utility Control Point (UCP).
The Sysprep utility can be used to create preconfigured VMs so that SQL Server deployment becomes easier.

Q: What are the hardware, software, and operating system requirements for installing SQL Server 2008 R2?
A: The following are the hardware requirements:

Processor: Intel Pentium 3 or higher
Processor speed: 1 GHz or higher
RAM: 512 MB, but 2 GB is recommended
Display: VGA or higher

The following are the software requirements:

Operating system: Windows 7 Ultimate, Windows Server 2003 (x86 or x64), Windows Server 2008 (x86 or x64)
Disk space: Minimum 1 GB
.NET Framework 3.5
Windows Installer 4.5 or later
MDAC 2.8 SP1 or later

The following are the operating system requirements for clustering: To install SQL Server 2008 clustering, it is essential to have Windows Server 2008 Enterprise or Datacenter Edition installed on our host system with Full Installation, so that we don't have to go back and forth installing the required components and restarting the system.

Q: What is to be done when we see the network binding warning coming up?
A: In this scenario, we will have to go to Network and Sharing Center | Change Adapter Settings. Once there, pressing Alt + F, we will select Advanced Settings. Select Public Network and move it up if it is not at the top, and repeat this process on the second node.

Q: What is the difference between Active/Passive and Active/Active Failover Clusters?
A: In reality, there is only one difference between Single-instance (Active/Passive Failover Cluster) and Multi-instance (Active/Active Failover Cluster). As its name suggests, in a Multi-instance cluster there will be two or more active SQL Server instances running in the cluster, compared to one instance running in a Single-instance cluster. Also, to configure a Multi-instance cluster, we may need to procure additional disks, IP addresses, and network names for the SQL Server.

Q: What is the benefit of having Multi-instance, that is, Active/Active configuration?
A: Depending on the business requirement and the capability of our hardware, we may have one or more instances running in our cluster environment. The main goal is to have better uptime and better High Availability by having multiple SQL Server instances running in an environment. Should anything go wrong with one SQL Server instance, another instance can easily take over control and keep the business-critical application up and running!

Q: What will be the difference in the prerequisites for the Multi-instance Failover Cluster as compared to the Single-instance Failover Cluster?
A: There will be no difference compared to a Single-instance Failover Cluster, except that we need to procure additional disk(s), network names, and IP addresses. We need to make sure that our hardware is capable of handling requests that come from client machines for both instances. Installing a Multi-instance cluster is almost similar to adding a Single-instance cluster, except for the need to add a few resources along with a couple of extra steps here and there.

OpenAM Identity Stores: Types, Supported Types, Caching and Notification

Packt
27 Jan 2011
11 min read
Like any other service, the Identity Repository service is also defined using an XML file, named idRepoService.xml, that can be found in <conf-dir>/config/xml. In this file one can define as many subschema as needed. By default, the following subschema names are defined:

LDAPv3
LDAPv3ForAMDS
LDAPv3ForOpenDS
LDAPv3ForTivoli
LDAPv3ForAD
LDAPv3ForADAM
Files
Database

However, not all of them are supported in the version that has been tested while writing this article. For instance, the Files, LDAPv3, and Database subschema are meant to be sample implementations; one can extend them for other databases, using these as examples. The rest of the sub configurations are all well tested and supported. One of the Identity Repository types, the Access Manager Repository, is missing from this definition, as adding it to the OpenSSO server is a manual process. That is something which will be detailed later in this article. It is also called a legacy SDK plugin for OpenSSO. The Identity Repository framework requires support for the logging service and session management to deliver its overall functionality.

Identity store types

In OpenSSO, multiple types of Identity Repository plugins are implemented, including the following:

LDAPv3Repo
AgentsRepo
InternalRepo/SpecialRepo
FilesRepo
AMSDKRepo

Unlike the Access Manager Repository plugin, these are available in a vanilla OpenSSO server, so customers can readily use them without having to perform any additional configuration steps.

LDAPv3Repo: This is the plugin that will be used by customers and administrators quite frequently, as the other types of plugin implementations are mostly meant to be used by OpenSSO internal services. This plugin forms the basis for building the configuration for supporting various LDAP servers, including Microsoft Active Directory, Active Directory Application Mode (ADAM/LDS), IBM Tivoli Directory, OpenDS, and Oracle Directory Server Enterprise Edition. There are subschema defined for each of these LDAP servers in the IdRepo service schema, as described in the beginning of this section.

AgentsRepo: This is a plugin that is used to manage the OpenSSO policy agents' profiles. Unlike LDAPv3Repo, AgentsRepo uses the configuration repository to store the agents' configuration data, including authentication information. Prior to the Agents 3.0 version, all agents accessing earlier versions of OpenSSO, such as Access Manager 7.1, had most of their configuration data stored locally in the file system as plain text files. This imposed huge management problems for customers wanting to upgrade or change any configuration parameters, as it required them to log in to each host where the agents are installed. Besides, the configuration of all agents prior to 3.0 was stored in the user identity store. In OpenSSO, the agents' profiles and configurations are stored as part of the configuration Directory Information Tree (DIT). AgentsRepo is a hidden internal repository plugin, and at no point should it be visible to end users or administrators for modification.

SpecialRepo: In the predecessor of OpenSSO, the administrative users were stored as part of the user identity store. So even when the configuration store is up and running, administrators still cannot log in to the system unless the user identity store is also up and running. This limits the customer experience, especially during pilot testing and troubleshooting scenarios.
To overcome this, OpenSSO introduced a feature wherein all the core administrative users are stored as part of the configuration store in the IdRepo service. All administrative and special user authentication by default uses this SpecialRepo framework. It may be possible to override this behavior by invoking module-based authentication. SpecialRepo is used as a fallback repository to get authenticated to the OpenSSO server. SpecialRepo is also a hidden internal repository plugin; at no point should it be visible to end users or administrators for modification.

FilesRepo: This is no longer supported in the OpenSSO product. You can see references to it in the source code, but it cannot be configured to use a flat-file store for either configuration data or user identity data.

AMSDKRepo: This plugin has been made available to maintain compatibility with the Sun Java System Access Manager versions. When this plugin is enabled, the identity schema is defined using the DAI service as described in ums.xml. This plugin will not be available in the vanilla OpenSSO server; the administrator has to perform certain manual steps to make it available for use. In this plugin, identity management is tightly coupled with the Oracle Directory Server Enterprise Edition. It is generally useful in the co-existence scenario where OpenSSO needs to co-exist with Sun Access Manager. In this article, wherever we refer to the "Access Manager Repository plugin", it means AMSDKRepo.

Besides this, there is a sample implementation of a MySQL-based database repository available as part of the server default configuration. It works; however, it is not extensively tested for all the OpenSSO features. You can also refer to another discussion on the custom database repository implementation at this link: http://www.badgers-in-foil.co.uk/notes/installing_a_custom_opensso_identity_repository/.

Caching and notification

For LDAPv3Repo, the key feature that enables it to perform and scale is the caching of result sets for each client query, without which it would be impossible to achieve the required performance and scalability. When caching is employed, there is a possibility that clients could get stale information about identities. This can be avoided by cleaning up the cache periodically or by having an external event dirty the cache so new values can be cached. OpenSSO provides more than one way to tackle this caching and notification. There are a couple of ways in which the cache can be invalidated and refreshed. The Identity Repository design relies broadly on two types of mechanisms to refresh the IdRepo cache. They are:

Persistent search-based event notification
Time-to-live (TTL) based refresh

Both methods have their own merits and can be enabled simultaneously, and enabling both is recommended. This is to handle the scenario where a network glitch (which could cause packet loss) might have caused the OpenSSO server to miss some change notifications. The value of the TTL purely depends on the deployment environment and the end user experience.

Persistent search-based notification

The OpenSSO Identity Repository plugin cache can be invalidated and refreshed by registering a persistent search connection to the backend LDAP server, provided the LDAP server supports the persistent search control.
The persistent search (http://www.mozilla.org/directory/ietf-docs/draft-smith-psearch-ldap-01.txt) control 2.16.840.1.113730.3.4.3 is implemented by many of the commercial LDAP servers, including:

IBM (Tivoli Directory)
Novell (eDirectory)
Oracle Directory Server Enterprise Edition (ODSEE)
OpenDS (OpenDS Directory Server 1.0.0-build007)
Fedora-Directory/1.0.4 B2006.312.1539

In order to determine whether your LDAP vendor supports a persistent search, perform the following search for the persistent search control 2.16.840.1.113730.3.4.3:

ldapsearch -p 389 -h ds_host -s base -b '' "objectclass=*" supportedControl | grep 2.16.840.1.113730.3.4.3

Microsoft Active Directory implements it in a different form, using the LDAP control 1.2.840.113556.1.4.528.

Persistent searches are governed by the max-psearch-count property in the Sun Java Directory Server, which defines the maximum number of persistent searches that can be performed on the directory server. The persistent search mechanism provides an active channel through which entries that change (and information about the changes that occur) can be communicated. As each persistent search operation uses one thread, limiting the number of simultaneous persistent searches prevents certain kinds of denial-of-service attacks. A client implementation that generates a large number of persistent connections to a single directory server may indicate that LDAP was not the correct transport for the job. However, horizontal scaling using Directory Proxy Servers, or an LDAP consumer tier, may help spread the load. The best solution, from an LDAP implementation standpoint, is to limit persistent searches.

If you have created a user data store against an LDAP server which supports the persistent search control, then a persistent search connection will be created with the base DN configured in the LDAPv3 configuration. The search filter for this connection is obtained from the data store configuration properties. Though it is possible to listen to a specific type of change event, OpenSSO registers the persistent search connections to receive all kinds of change events. The IdRepo framework has the logic to determine whether the underlying directory server supports persistent searches or not; if not supported, it does not try to submit the persistent search. In this case customers may resort to TTL-based notification, as described in the next section.

Each active persistent search request requires that an open TCP connection be maintained between an LDAP client (in this case OpenSSO) and an LDAP server (the backend user store) that might not otherwise be kept open. The OpenSSO server, acting as an LDAP client, closes idle LDAP connections to the backend LDAP server in order to maximize resource utilization. If the OpenSSO servers are behind a load balancer or a firewall, you need to tune the value of "com.sun.am.event.connection.idle.timeout". If the persistent search connections are made through a Load Balancer (LB) or firewall, then these connections are subject to the TCP timeout value of the respective LB and/or firewall. In such a scenario, once the firewall closes the persistent search connection due to an idle TCP timeout, change notifications cannot reach OpenSSO unless the persistent search connection is re-established.
Customers can avoid this scenario by configuring the idle timeout for the persistent search connection so that the persistent search TCP connection is restarted before the LB/firewall idle timeout; that way the LB/firewall will never see an idle persistent search connection. The advanced server configuration property "com.sun.am.event.connection.idle.timeout" specifies the timeout value in minutes after which the persistent searches will be restarted. Ideally, this value should be lower than the LB/firewall TCP timeout, to make sure that the persistent searches are restarted before the connections are dropped. A value of "0" indicates that these searches will not be restarted; by default the value is "0". Only the connections that have timed out will be reset. You should never set this value higher than the LB/firewall timeout, and the delta should not be more than five minutes. For example, if your LB's idle connection timeout is 50 minutes, then set this property value to 45 minutes. If for some reason you want to prevent the persistent search from being submitted to the backend LDAP server, just leave the persistent search base (sun-idrepo-ldapv3-config-psearchbase) empty; this will cause the IdRepo to disable the persistent search connection.

Time-to-live based notification

There may be deployment scenarios where persistent search-based notifications are not possible, or where the underlying LDAP server does not support the persistent search control. In such scenarios customers can employ the TTL or time-to-live based notification mechanism. It is a feature that involves a proprietary implementation by the OpenSSO server. This feature works in a fashion that is similar to the polling mechanism in the OpenSSO clients, where the client periodically polls the OpenSSO server for changes, often called the "pull" model, whereas persistent search-based notifications are termed the "push" model (the LDAP server pushes the changes to the clients). Regardless of persistent search-based change notifications, the OpenSSO server polls the underlying directory server and gets the data to refresh its Identity Repository cache.

TTL-specific properties for Identity Repository cache

When the OpenSSO deployment is configured for TTL-based cache refresh, there are certain server-side properties that need to be configured to enable the Identity Repository framework to refresh the cache. The following are the core properties that are relevant in the TTL context:

com.sun.identity.idm.cache.entry.expire.enabled=true
com.sun.identity.idm.cache.entry.user.expire.time=1 (in minutes)
com.sun.identity.idm.cache.entry.default.expire.time=1 (in minutes)

The properties com.sun.identity.idm.cache.entry.user.expire.time and com.sun.identity.idm.cache.entry.default.expire.time specify the time in minutes for which the user entries and non-user entries (such as roles and groups), respectively, remain valid after their last modification. In other words, after this specified period of time elapses, the cached data for the entry will expire. At that instant, new requests for these entries will result in a fresh read from the underlying Identity Repository plugins. When the property com.sun.identity.idm.cache.entry.expire.enabled is set to true, the non-user object cache entries will expire based on the time specified in the com.sun.identity.idm.cache.entry.default.expire.time property. The user entry objects will be cleaned up based on the value set in the property com.sun.identity.idm.cache.entry.user.expire.time.
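As a quick deployment-time check that ties back to the persistent search discussion above, the following sketch reuses the ldapsearch pattern shown earlier to look for either the standard persistent search control or the Active Directory control mentioned in this article. The host and port are placeholders for your user store.

# Check whether the user store advertises a persistent-search style control
# 2.16.840.1.113730.3.4.3 -> standard persistent search control
# 1.2.840.113556.1.4.528  -> the Active Directory control referenced above
ldapsearch -p 389 -h ds_host -s base -b '' "objectclass=*" supportedControl \
  | egrep "2.16.840.1.113730.3.4.3|1.2.840.113556.1.4.528"

If neither OID is returned, plan on relying on the TTL-based refresh described in the Time-to-live based notification section.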

Choosing Styles of Various Graph Elements in R

Packt
25 Jan 2011
4 min read
R Graph Cookbook
Detailed hands-on recipes for creating the most useful types of graphs in R – starting from the simplest versions to more advanced applications
Learn to draw any type of graph or visual data representation in R
Filled with practical tips and techniques for creating any type of graph you need; not just theoretical explanations
All examples are accompanied with the corresponding graph images, so you know what the results look like
Each recipe is independent and contains the complete explanation and code to perform the task as efficiently as possible

Choosing plotting point symbol styles and sizes

In this recipe, we will see how we can adjust the styling of plotting symbols, which is useful and necessary when we plot more than one set of points representing different groups of data on the same graph.

Getting ready

All you need to try out this recipe is to run R and type the recipe at the command prompt. You can also choose to save the recipe as a script so that you can use it again later on. We will also use the cityrain.csv example data file. Please read the file into R as follows:

rain<-read.csv("cityrain.csv")

The code file can be downloaded from here.

How to do it...

The plotting symbol and size can be set using the pch and cex arguments:

plot(rnorm(100),pch=19,cex=2)

How it works...

The pch argument stands for plotting character (symbol). It can take numerical values (usually between 0 and 25) as well as single character values. Each numerical value represents a different symbol. For example, 1 represents circles, 2 represents triangles, 3 represents plus signs, and so on. If we set the value of pch to a character such as "*" or "£" in inverted commas, then the data points are drawn as that character instead of the default circles.

The size of the plotting symbol is controlled by the cex argument, which takes numerical values starting at 0, giving the amount by which plotting symbols should be magnified relative to the default. Note that cex takes relative values (the default is 1), so the absolute size may vary depending on the defaults of the graphics device in use. For example, the size of plotting symbols with the same cex value may be different for a graph saved as a PNG file versus a graph saved as a PDF.

There’s more...

The most common use of pch and cex is when we don’t want to use color to distinguish between different groups of data points. This is often the case in scientific journals which do not accept color images. For example, let’s plot the city rainfall data as a set of points instead of lines:

plot(rain$Tokyo, ylim=c(0,250),
     main="Monthly Rainfall in major cities",
     xlab="Month of Year", ylab="Rainfall (mm)", pch=1)
points(rain$NewYork,pch=2)
points(rain$London,pch=3)
points(rain$Berlin,pch=4)
legend("top", legend=c("Tokyo","New York","London","Berlin"),
       ncol=4, cex=0.8, bty="n", pch=1:4)
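As a quick illustration of the character values for pch mentioned above, the following sketch (using made-up random data rather than the cityrain dataset) draws the points as asterisks and enlarges them with cex:

# Plot points using a character symbol instead of a numbered symbol
plot(rnorm(50), pch = "*", cex = 2,
     main = "Character plotting symbol", xlab = "Index", ylab = "Value")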
Choosing line styles and width

Similar to plotting point symbols, R provides simple ways to adjust the style of lines in graphs.

Getting ready

All you need to try out this recipe is to run R and type the recipe at the command prompt. You can also choose to save the recipe as a script so that you can use it again later on. We will again use the cityrain.csv data file.

How to do it...

Line styles can be set by using the lty and lwd arguments (for line type and width respectively) in the plot(), lines(), and par() commands. Let’s take our rainfall example and apply different line styles, keeping the color the same:

plot(rain$Tokyo, ylim=c(0,250),
     main="Monthly Rainfall in major cities",
     xlab="Month of Year", ylab="Rainfall (mm)",
     type="l", lty=1, lwd=2)
lines(rain$NewYork,lty=2,lwd=2)
lines(rain$London,lty=3,lwd=2)
lines(rain$Berlin,lty=4,lwd=2)
legend("top", legend=c("Tokyo","New York","London","Berlin"),
       ncol=4, cex=0.8, bty="n", lty=1:4, lwd=2)

How it works...

Both line type and width can be set with numerical values, as shown in the previous example. Line type number values correspond to types of lines:

0: blank
1: solid (default)
2: dashed
3: dotted
4: dotdash
5: longdash
6: twodash

We can also use character strings instead of numbers, for example, lty="dashed" instead of lty=2. The line width argument lwd takes positive numerical values; the default value is 1. In the example we used a value of 2, thus making the lines thicker than the default.
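To show the named line types in action, here is a small sketch (again with simple synthetic data rather than the cityrain file) that sets lty by name and uses a heavier lwd:

# Two lines distinguished by named line types rather than numeric codes
plot(1:10, type = "l", lty = "dashed", lwd = 2,
     xlab = "x", ylab = "y", main = "Named line types")
lines(10:1, lty = "dotted", lwd = 3)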

Adjusting Key Parameters in R

Packt
24 Jan 2011
6 min read
R Graph Cookbook
Detailed hands-on recipes for creating the most useful types of graphs in R – starting from the simplest versions to more advanced applications
Learn to draw any type of graph or visual data representation in R
Filled with practical tips and techniques for creating any type of graph you need; not just theoretical explanations
All examples are accompanied with the corresponding graph images, so you know what the results look like
Each recipe is independent and contains the complete explanation and code to perform the task as efficiently as possible

Setting colors of points, lines, and bars

In this recipe we will learn the simplest way to change the colors of points, lines, and bars in scatter plots, line plots, histograms, and bar plots.

Getting ready

All you need to try out this recipe is to run R and type the recipe at the command prompt. You can also choose to save the recipe as a script so that you can use it again later on.

How to do it...

The simplest way to change the color of any graph element is by using the col argument. For example, the plot() function takes the col argument:

plot(rnorm(1000), col="red")

The code file can be downloaded from here.

If we choose the plot type as line, then the color is applied to the plotted line. Let’s use the dailysales.csv example dataset. First, we need to load it:

sales<-read.csv("dailysales.csv",header=TRUE)
plot(sales$units~as.Date(sales$date,"%d/%m/%y"),
     type="l", # Specify type of plot as l for line
     col="blue")

Similarly, the points() and lines() functions apply the col argument's value to the plotted points and lines respectively. barplot() and hist() also take the col argument and apply it to the bars. So the following code would produce a bar plot with blue bars:

barplot(sales$ProductA~sales$City, col="blue")

The col argument for boxplot() is applied to the color of the boxes plotted.

How it works...

The col argument automatically applies the specified color to the elements being plotted, based on the plot type. So, if we do not specify a plot type or choose points, then the color is applied to points. Similarly, if we choose the plot type as line then the color is applied to the plotted line, and if we use the col argument in the barplot() or hist() commands, then the color is applied to the bars.

col accepts names of colors such as red, blue, and black. The colors() (or colours()) function lists all the built-in colors (more than 650) available in R. We can also specify colors as hexadecimal codes such as #FF0000 (for red), #0000FF (for blue), and #000000 (for black). If you have ever made any web pages, you would know that these hex codes are used in HTML to represent colors.

col can also take numeric values. When it is set to a numeric value, the color corresponding to that index in the current color palette is used. For example, in the default color palette the first color is black and the second color is red, so col=1 and col=2 refer to black and red respectively. Index 0 corresponds to the background color.

There's more...

In many settings, col can also take a vector of multiple colors, instead of a single color. This is useful if you wish to use more than one color in a graph. The heat.colors() function takes a number as an argument and returns a vector of that many colors. So heat.colors(5) produces a vector of five colors.
Type the following at the R prompt:

heat.colors(5)

You should get the following output:

[1] "#FF0000FF" "#FF5500FF" "#FFAA00FF" "#FFFF00FF" "#FFFF80FF"

Those are five colors in hexadecimal format. Another way of specifying a vector of colors is to construct one:

barplot(as.matrix(sales[,2:4]), beside=T, legend=sales$City,
        col=c("red","blue","green","orange","pink"),
        border="white")

In the example, we set the value of col to c("red","blue","green","orange","pink"), which is a vector of five colors. We have to take care to make a vector whose length matches the number of elements (in this case, bars) we are plotting. If the two numbers don't match, R will "recycle" values by repeating colors from the beginning of the vector. For example, if we had fewer colors in the vector than the number of elements, say four colors in the previous plot, then R would apply the four colors to the first four bars and then apply the first color to the fifth bar. This is called recycling in R:

barplot(as.matrix(sales[,2:4]), beside=T, legend=sales$City,
        col=c("red","blue","green","orange"),
        border="white")

In this example, the bars for the first and last data rows (Seattle and Mumbai) would be of the same color (red), making it difficult to distinguish one from the other. One good way to ensure that you always have the correct number of colors is to find out the number of elements first and pass that as an argument to one of the color palette functions. For example, if we did not know the number of cities in the example we have just seen, we could do the following to make sure the number of colors matches the number of bars plotted:

barplot(as.matrix(sales[,2:4]), beside=T, legend=sales$City,
        col=heat.colors(length(sales$City)),
        border="white")

We used the length() function to find out the number of elements in the vector sales$City and passed that as the argument to heat.colors(). So, regardless of the number of cities, we will always have the right number of colors.

See also

In the next four recipes, we will see how to change the colors of other elements. The fourth recipe is especially useful, where we look at color combinations and palettes.
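Tying back to the numeric col values described above, this small sketch prints the current palette and plots one point in each palette color; it uses synthetic positions only, not an example dataset.

# Inspect the current color palette and use numeric col indices
palette()                       # prints the palette colors in index order
n <- length(palette())
plot(1:n, rep(1, n), col = 1:n, pch = 19, cex = 3,
     xlab = "Palette index", ylab = "", yaxt = "n",
     main = "Colors by numeric index")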

Microsoft SQL Server 2008 High Availability: Understanding Domains, Users, and Security

Packt
21 Jan 2011
14 min read
Microsoft SQL Server 2008 High Availability
Minimize downtime, speed up recovery, and achieve the highest level of availability and reliability for SQL Server applications by mastering the concepts of database mirroring, log shipping, clustering, and replication
Install various SQL Server High Availability options in a step-by-step manner
A guide to SQL Server High Availability for DBA aspirants, proficient developers and system administrators
Learn the pre and post installation concepts and common issues you come across while working on SQL Server High Availability
Tips to enhance performance with SQL Server High Availability
External references for further study

Windows domains and domain users

In the early era of Windows, operating system users were created standalone, until the Windows NT operating system hit the market. Windows NT, that is, Windows New Technology, introduced some great features to the world, including domains. A domain is a group of computers that run on Windows operating systems. Among them is a computer that holds all the information related to user authentication and the user database; it is called the domain controller (server), and every user who is part of this user database on the domain controller is called a domain user. Domain users have access to any resource across the domain and its subdomains with the privileges they have, unlike a standalone user, who has access only to the resources available on a specific system. With the release of Windows Server 2000, Microsoft released Active Directory (AD), which is now widely used in Windows operating system networks to store, authenticate, and control users who are part of the domain.

A Windows domain uses various modes to authenticate users (encrypted passwords and various handshake methods such as PKI, Kerberos, EAP, SSL certificates, NAP, LDAP, and IPSec policy), which makes authentication robust. One can choose the authentication method that suits the business needs and the environment. Let's now see the various authentication methods in detail.

Public Key Infrastructure (PKI): This is the most common method used to transmit data over insecure channels such as the Internet, using digital certificates. It generally has two parts: the public and private keys. These keys are generated by a Certificate Authority, such as Thawte. Public keys are stored in a directory where they are accessible by all parties. The public key is used by the message sender to send encrypted messages, which can then be decrypted using the private key.

Kerberos: This is an authentication method used in client-server architectures to authorize the client to use service(s) on a server in a network. In this method, when a client sends a request to use a service on a server, the request goes to the authentication server, which generates a session key and a random value based on the username. This session key and random value are then passed to the server, which grants or rejects the request. These sessions are valid for a certain time period, which means that for that particular amount of time the client can use the service without having to re-authenticate itself.

Extensible Authentication Protocol (EAP): This is an authentication protocol generally used in wireless and point-to-point connections.

SSL Certificates: A Secure Socket Layer (SSL) certificate is a digital certificate that is used to identify a website or server that provides a service to clients and sends the data in an encrypted form.
SSL certificates are typically used by websites such as GMAIL. When we type a URL and press Enter, the web browser sends a request to the web server to identify itself. The web server then sends a copy of its SSL certificate, which is checked by the browser. If the browser trusts the certificate (this is generally done on the basis of the CA, the Registration Authority, and directory verification), it will send a message back to the server, and in reply the web server sends an acknowledgement to the browser to start an encrypted session.

Network Access Protection (NAP): This is a new platform introduced by Microsoft with the release of Windows Server 2008. It provides access to the client based on the identity of the client, the group it belongs to, and the level of compliance it has with the policy defined by the network administrators. If the client doesn't have the required compliance level, NAP has mechanisms to bring the client to the compliance level dynamically and then allow it to access the network.

Lightweight Directory Access Protocol (LDAP): This is a protocol that runs directly over TCP/IP. It operates on a directory made up of objects such as organizational units, printers, groups, and so on. When the client sends a request for a service, it queries the LDAP server to search for the relevant information, and based on that information and the level of access, it provides access to the client.

IP Security (IPSec): IP Security is a set of protocols that provides security at the network layer. IPSec provides two choices:

Authentication Header: Here it encapsulates the authentication of the sender in a header of the network packet.
Encapsulating Security Payload: Here it supports encryption of both the header and data.

Now that we know basic information on Windows domains, domain users, and the various authentication methods used with Windows servers, I will walk you through some of the basic and preliminary aspects of SQL Server security!

Understanding SQL Server Security

Security! Nowadays we store various kinds of information in databases, and we want to be sure that it is secured. Security is the most important word to the IT administrator and is vital for everybody who has stored information in a database, as they need to make sure that not a single piece of data is made available to someone who shouldn't have access. Because all the information stored in databases is vital, everyone wants to prevent unauthorized access to highly confidential data, and this is where security implementation in SQL Server comes into the picture.

With the release of SQL Server 2000, Microsoft (MS) introduced some great security features such as authentication roles (fixed server roles and fixed database roles), application roles, various permission levels, forcing protocol encryption, and so on, which are widely used by administrators to tighten SQL Server security. Basically, SQL Server security has two paradigms: one is SQL Server's own set of security measures, and the other is integrating it with the domain. SQL Server has two methods of authentication.

Windows authentication

In Windows authentication mode, we can integrate domain accounts to authenticate users, and based on the group they are members of and the level of access they have, DBAs can provide them access to the particular SQL Server box.
Whenever a user tries to access SQL Server, his/her account is validated by the domain controller first; based on the permissions the account has, the domain controller then allows or rejects the request, and no separate login ID and password are required for authentication. Once the user is authenticated, SQL Server grants access based on the permissions the user has. These permissions come in the form of roles, including fixed server roles, fixed database (DB) roles, and application roles.

Fixed Server Roles: These are security principals that have server-wide scope. Fixed server roles are meant to manage permissions at the server level. We can add SQL logins, domain accounts, and domain groups to these roles. The roles that can be assigned to a login, domain account, or group include sysadmin, serveradmin, securityadmin, processadmin, setupadmin, bulkadmin, diskadmin, dbcreator, and public.

Fixed DB Roles: These are roles assigned to a particular login for a particular database; their scope is limited to the database they have permissions on. The fixed database roles include db_owner, db_accessadmin, db_securityadmin, db_ddladmin, db_backupoperator, db_datareader, db_datawriter, db_denydatareader, and db_denydatawriter.

Application Role: An application role is a database principal that is widely used to assign user permissions for an application. For example, in a home-grown ERP some users only need to view the data; we can create a role with db_datareader permissions and then add to it all the users who require read-only access.

Mixed authentication

In Mixed authentication mode, logins can be authenticated by the Windows domain controller or by SQL Server itself. DBAs can create logins with passwords in SQL Server. With the release of SQL Server 2005, MS introduced password policies for SQL Server logins. Mixed mode authentication is used when a legacy application has to be run and it is not on the domain network. In my opinion, Windows authentication is preferable because we can use handshake methods such as PKI, Kerberos, EAP, SSL, NAP, LDAP, or IPSec to tighten security.

SQL Server 2005 brought enhancements to its security mechanisms. The most important among them are the password policy, native encryption, the separation of users and schemas, and no longer needing system administrator (sa) rights just to run Profiler. These are good things, because sa is the super user, and with the power this account has a user can do anything on the SQL box, including the following:

The user can grant ALTER TRACE permission to users who need to run Profiler
The user can create logins and users
The user can grant or revoke permissions for other users

A schema is an object container that is owned by a user and is transferable to other users. In earlier versions, objects were owned by users, so if a user left the company we could not delete his/her account from the SQL box simply because there were objects he/she had created. We first had to change the owner of those objects, and only then could we delete the account. With schemas, on the other hand, we can drop the user account because the objects are owned by the schema.

SQL Server 2008 gives you even more control over the configuration of security mechanisms. It allows you to configure metadata access, execution context, and the auditing of events using DDL triggers, the most powerful feature for auditing any DDL event.
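To make these ideas concrete, here is a minimal T-SQL sketch that strings the pieces together: a Windows login, a read-only role membership, the ALTER TRACE grant mentioned above, an application role, and a schema whose ownership is transferred so that the user can be dropped. The domain account CORP\jsmith, the database SalesDB, the SalesApp role, and the Reports schema are hypothetical names used only for illustration; the role membership uses the sp_addrolemember syntax of SQL Server 2008.

-- Server-level work: create the login and grant the server permission (run in master)
USE master;
GO
CREATE LOGIN [CORP\jsmith] FROM WINDOWS;           -- Windows-authenticated login
GRANT ALTER TRACE TO [CORP\jsmith];                -- lets the login run Profiler without sa rights
GO
-- Database-level work (hypothetical user database)
USE SalesDB;
GO
CREATE USER jsmith FOR LOGIN [CORP\jsmith];        -- map the login to a database user
EXEC sp_addrolemember 'db_datareader', 'jsmith';   -- read-only access via a fixed DB role
GO
-- An application role for read-only application screens
CREATE APPLICATION ROLE SalesApp WITH PASSWORD = 'Str0ng!AppP@ss1';
GRANT SELECT ON SCHEMA::dbo TO SalesApp;
GO
-- Schemas decouple object ownership from users
CREATE SCHEMA Reports AUTHORIZATION jsmith;
GO
ALTER AUTHORIZATION ON SCHEMA::Reports TO dbo;     -- reassign ownership first...
DROP USER jsmith;                                  -- ...then the user can be dropped

This is only a sketch of the workflow, not a hardening checklist; in practice the permissions granted would follow the principle of least privilege for the environment in question.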
If you wish to know more about what we have covered so far, you can go through the following links:

http://www.microsoft.com/sqlserver/2008/en/us/Security.aspx
http://www.microsoft.com/sqlserver/2005/en/us/security-features.aspx
http://technet.microsoft.com/en-us/library/cc966507.aspx

In this section, we covered the basics of domains, domain users, and SQL Server security. We also learned why security is given so much emphasis these days. In the next section, we will look at the basics of clustering and its components.

What is clustering?

Before starting with SQL Server clustering, let's have a look at clustering in general and at Windows clusters. The word cluster itself is self-descriptive: a bunch or group. When two or more computers are connected to each other by means of a network and share some common resources to provide redundancy or performance improvement, they are known as a cluster of computers. Clustering is usually deployed when a business-critical application needs to be available 24x7, or in other words needs High Availability. Such clusters are known as failover clusters, because the primary goal of setting up the cluster is to keep business-critical services or processes available 24x7. The Enterprise and Datacenter editions of MS Windows Server support failover clustering. Failover is achieved by having two identical nodes connected to each other by means of a private network and commonly used resources. In case of a failure of any common resource or service, the first node (active) passes ownership to the other node (passive).

SQL Server clustering is built on top of Windows clustering, which means that before we go about installing SQL Server clustering, we should have Windows clustering installed. Before we start, let's understand the commonly used shared resources of a cluster server. Clusters with 2, 4, 8, 12, or 32 nodes can be built with Windows Server 2008 R2. Clusters are categorized in the following manner:

High-Availability Clusters: This type of cluster is also known as a failover cluster. High-availability clusters are implemented when the purpose is to provide highly available services. To implement a failover or high-availability cluster, one may have up to 16 nodes in a Microsoft cluster. Clustering in Windows operating systems was first introduced with the release of Windows NT 4.0 Enterprise Edition and has been enhanced gradually since. Even though we can have non-identical hardware, we should use identical hardware; if the node that the cluster fails over to has a lower configuration, we might face degraded performance.

Load Balancing: This is the second form of cluster that can be configured. It is built by linking multiple computers to each other and making use of the resources each of them offers. From the user's point of view, the linked servers/nodes appear to be separate machines; collectively, however, they work virtually as a single system, with the main goal of balancing the work by sharing CPU, disk, and every other possible resource among the linked nodes, which is why it is known as a load-balancing cluster. SQL Server doesn't support this form of clustering.

Compute Clusters: When computers are linked together for the purpose of simulation, such as aircraft simulation, they are known as a compute cluster. A well-known example is the Beowulf cluster.

Grid Computing: This is one kind of clustering solution, but it is more often used when the nodes are in dispersed locations.
This kind of cluster is also called a supercomputer or HPC (high-performance computing) cluster. Its main applications are scientific research, academic and mathematical workloads, and weather forecasting, where lots of CPUs and disks are required; SETI@home is a well-known example.

As far as SQL Server clusters are concerned, some cool new features have been added in the latest release, SQL Server 2008, although with the limitation that these features are available only if SQL Server 2008 is used with Windows Server 2008. So, let's have a glance at these features:

Service SID: Service SIDs were introduced with Windows Vista and Windows Server 2008. They enable us to bind permissions directly to Windows services. With SQL Server 2005 we needed a SQL Server service account that was a member of a domain group so that it would have all the required permissions. This is not the case with SQL Server 2008, where we may choose service SIDs to bypass the need to provision domain groups.

Support for 16 nodes: We may add up to 16 nodes to our SQL Server 2008 cluster with the SQL Server 2008 Enterprise 64-bit edition.

New cluster validation: A new cluster validation step is now part of the installation. Prior to adding a node to an existing cluster, this validation step checks whether or not the cluster environment is compatible.

Mount Drives: Drive letters are no longer essential. If the cluster we have configured has a limited number of drive letters available, we may mount a drive and use it in our cluster environment, provided it is associated with a base drive letter.

Geographically dispersed cluster nodes: This is a super-cool feature introduced with SQL Server 2008. It uses a VLAN to establish connectivity with the other cluster nodes, and it removes the limitation of having all clustered nodes in a single location.

IPv6 Support: SQL Server 2008 natively supports IPv6, which increases the network IP address size to 128 bits from 32 bits.

DHCP Support: Windows Server 2008 clustering introduced the use of DHCP-assigned IP addresses by the cluster services, and this is supported by SQL Server 2008 clusters. It is still recommended to use static IP addresses, because if an application depends on a fixed IP address, or if the renewal of an IP address from the DHCP server fails, the IP address resource will fail.

iSCSI Support: Windows Server 2008 supports iSCSI as a storage connection; in earlier versions, only SAN and fibre channel were supported.
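As a quick way to relate a running instance to the cluster features described above, the following query sketch checks whether the current instance is clustered, which physical node it is running on, and which nodes belong to the cluster. It is illustrative only; SERVERPROPERTY and the sys.dm_os_cluster_nodes DMV are available from SQL Server 2005 onwards, and the DMV simply returns no rows on a non-clustered instance.

-- Is this instance a clustered (failover) instance, and on which node is it currently running?
SELECT SERVERPROPERTY('IsClustered')                 AS is_clustered,
       SERVERPROPERTY('ComputerNamePhysicalNetBIOS') AS current_node;

-- List the nodes that are part of the failover cluster (empty result if not clustered)
SELECT NodeName
FROM   sys.dm_os_cluster_nodes;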
Introduction to Database Mirroring in Microsoft SQL Server 2008 High Availability

Packt
19 Jan 2011
5 min read
Introduction

Mr. Young, the Principal DBA at XY Incorporation, a large manufacturing company, was asked to come up with a solution that could serve his company's need to make a database server highly available with minimal or no manual intervention. He was also asked to keep in mind the limited budget the company has for the financial year. After careful research, he came up with the idea of going with Database Mirroring, as it provides an option of automatic failover and is a cost-effective solution. Based on tests he performed on virtual test servers, he prepared a technical document to help management and his peers understand how it works.

What is Database Mirroring?

Database Mirroring is an option that can be used to cater to the business need of increasing the availability of a SQL Server database by keeping a standby copy that can be used as an alternate production server in case of any emergency. As its name suggests, mirroring stands for making an exact copy of the data. Mirroring can be done onto a disk, a website, or somewhere else. Similarly, Microsoft introduced Database Mirroring with the launch of SQL Server 2005 (post SP1), and it performs the same function: making an exact copy of a database between two physically separate database servers. As mirroring is a database-wide feature, it is implemented per database rather than server-wide.

Disk mirroring is a technology wherein data is stored on two physically separate but identical hard disks at the same time; it is also known as RAID 1.

Different components of Database Mirroring

Three components are required to set up Database Mirroring. They are as follows:

Principal Server: This is the database server that sends out the data to the participant server (we'll call it the secondary/standby server) in the form of transactions.
Secondary Server: This is the database server that receives all the transactions sent by the Principal Server in order to keep its copy of the database identical.
Witness Server (optional): This server continuously monitors the Principal and Secondary Servers and enables automatic failover. To have the automatic failover feature, the Principal Server should be running in synchronous mode.

How Database Mirroring works

In Database Mirroring, each server is a known partner and they complement each other as Principal and Mirror. There will be only one Principal and only one Mirror at any given time. In essence, the DML operations performed on the Principal Server are all re-performed on the Mirror server. As we know, data is written into the log buffer before it is written into the data pages. Database Mirroring sends the data that is written into the Principal Server's log buffer to the Mirror database at the same time.
All these transactions are sent sequentially and as quickly as possible. Database Mirroring operates in two different modes: asynchronous and synchronous.

Asynchronous, a.k.a. High Performance mode

Transactions are sent to the Secondary Server as soon as they are written into the log buffer. In this mode of operation, the data is committed at the Principal Server before it is actually written into the log buffer of the Secondary Server. Hence this mode is called High Performance mode, but at the same time it introduces a chance of data loss.

Synchronous, a.k.a. High Safety mode

Transactions are sent to the Secondary Server as soon as they are written to the log buffer, and they are then committed simultaneously at both ends.

Prerequisites

Let's now have a look at the prerequisites for putting Database Mirroring in place. Please be careful with these prerequisites, as a single missed requisite will result in a failed installation. The sketch after this list shows how the main steps translate into T-SQL.

Recovery mode: To implement Database Mirroring, the database should be in the Full recovery mode.
Initialization of the database: The database on which to set up mirroring should be present on the mirror server. To achieve this, we can restore the most recent full backup, followed by the transaction log backups, with the NORECOVERY option.
Database name: Ensure that the database name is the same for both the principal and the mirror database.
Compatibility: Ensure that the partner servers are on the same edition of SQL Server. Database Mirroring is supported by the Standard and Enterprise editions; synchronous mode with automatic failover is an Enterprise-only feature.
Disk space: Ensure that we have ample space available for the mirror database.
Service accounts: Ensure that the service accounts used are domain accounts and that they have the required permission, that is, CONNECT permission on the endpoints.
Listener port: This is the TCP port on which a Database Mirroring session is established between the Principal and Mirror servers.
Endpoints: These are the objects dedicated to Database Mirroring that enable SQL Server to communicate over the network.
Authentication: When the Principal and Mirror servers talk to each other, they must authenticate each other. For this, the accounts we use, whether local or domain accounts, must have login and send-message permissions. If local logins are used as service accounts, we must use certificates to authenticate the connection request.
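The following T-SQL is a minimal sketch of those prerequisites and of starting a mirroring session. The database name SalesDB, the server names PRINCIPAL1, MIRROR1, and WITNESS1, port 5022, and the backup paths are hypothetical placeholders; the witness and the safety setting are optional, as described above.

-- On the principal: full recovery model and base backups for initializing the mirror
ALTER DATABASE SalesDB SET RECOVERY FULL;
BACKUP DATABASE SalesDB TO DISK = 'C:\Backup\SalesDB.bak';
BACKUP LOG SalesDB TO DISK = 'C:\Backup\SalesDB.trn';

-- On the mirror: restore WITH NORECOVERY so the database stays in a restoring state
RESTORE DATABASE SalesDB FROM DISK = 'C:\Backup\SalesDB.bak' WITH NORECOVERY;
RESTORE LOG SalesDB FROM DISK = 'C:\Backup\SalesDB.trn' WITH NORECOVERY;

-- On each partner (and the witness): a mirroring endpoint listening on a TCP port
CREATE ENDPOINT Mirroring_EP
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022)
    FOR DATABASE_MIRRORING (ROLE = ALL);

-- Point the partners at each other: first on the mirror, then on the principal
ALTER DATABASE SalesDB SET PARTNER = 'TCP://PRINCIPAL1.corp.local:5022';  -- run on the mirror
ALTER DATABASE SalesDB SET PARTNER = 'TCP://MIRROR1.corp.local:5022';     -- run on the principal

-- Optional: witness for automatic failover, running in synchronous (high safety) mode
ALTER DATABASE SalesDB SET WITNESS = 'TCP://WITNESS1.corp.local:5022';
ALTER DATABASE SalesDB SET PARTNER SAFETY FULL;   -- use SAFETY OFF for high performance mode

The service accounts involved would also need CONNECT permission on the endpoints, as noted in the prerequisites list.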
Creating Line Graphs in R

Packt
17 Jan 2011
7 min read
Adding customized legends for multiple line graphs

Line graphs with more than one line, representing more than one variable, are quite common in any kind of data analysis. In this recipe we will learn how to create and customize legends for such graphs.

Getting ready

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

How to do it...

First we need to load the cityrain.csv example data file, which contains monthly rainfall data for four major cities across the world. You can download this file from here. We will use the cityrain.csv example dataset:

rain<-read.csv("cityrain.csv")

plot(rain$Tokyo,type="b",lwd=2,
  xaxt="n",ylim=c(0,300),col="black",
  xlab="Month",ylab="Rainfall (mm)",
  main="Monthly Rainfall in major cities")

axis(1,at=1:length(rain$Month),labels=rain$Month)

lines(rain$Berlin,col="red",type="b",lwd=2)
lines(rain$NewYork,col="orange",type="b",lwd=2)
lines(rain$London,col="purple",type="b",lwd=2)

legend("topright",legend=c("Tokyo","Berlin","New York","London"),
  lty=1,lwd=2,pch=21,col=c("black","red","orange","purple"),
  ncol=2,bty="n",cex=0.8,
  text.col=c("black","red","orange","purple"),
  inset=0.01)

How it works...

We used the legend() function. It is quite a flexible function and allows us to adjust the placement and styling of the legend in many ways.

The first argument we passed to legend() specifies the position of the legend within the plot region. We used "topright"; other possible values are "bottomright", "bottom", "bottomleft", "left", "topleft", "top", "right", and "center". We can also specify the location of the legend with x and y co-ordinates, as we will soon see.

The other important arguments specific to lines are lwd and lty, which specify the line width and type drawn in the legend box respectively. It is important to keep these the same as the corresponding values in the plot() and lines() commands. We also set pch to 21 to replicate the type="b" argument in the plot() command. cex and text.col set the size and colors of the legend text. Note that we set the text colors to the same colors as the lines they represent. Setting bty (box type) to "n" ensures that no box is drawn around the legend. This is good practice as it keeps the look of the graph clean. ncol sets the number of columns over which the legend labels are spread, and inset sets the inset distance from the margins as a fraction of the plot region.

There's more...

Let's experiment by changing some of the arguments discussed:

legend(1,300,legend=c("Tokyo","Berlin","New York","London"),
  lty=1,lwd=2,pch=21,col=c("black","red","orange","purple"),
  horiz=TRUE,bty="n",bg="yellow",cex=1,
  text.col=c("black","red","orange","purple"))

This time we used x and y co-ordinates instead of a keyword to position the legend. We also set the horiz argument to TRUE. As the name suggests, horiz makes the legend labels horizontal instead of the default vertical. Specifying horiz overrides the ncol argument. Finally, we made the legend text bigger by setting cex to 1 and did not use the inset argument.

An alternative way of creating the previous plot, without having to call plot() and lines() multiple times, is to use the matplot() function. To see details on how to use this function, please see the help file by running ?matplot or help(matplot) at the R prompt.
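In case it helps, here is a rough sketch of that matplot() alternative, assuming the same cityrain.csv file and column names used above; the legend still has to be added separately with legend():

rain <- read.csv("cityrain.csv")
cities <- c("Tokyo","Berlin","NewYork","London")
cols <- c("black","red","orange","purple")

# matplot() draws each column of the data frame as a separate series in one call
matplot(rain[,cities],type="b",lty=1,lwd=2,pch=21,col=cols,
  xaxt="n",ylim=c(0,300),xlab="Month",ylab="Rainfall (mm)",
  main="Monthly Rainfall in major cities")
axis(1,at=1:length(rain$Month),labels=rain$Month)
legend("topright",legend=c("Tokyo","Berlin","New York","London"),
  lty=1,lwd=2,pch=21,col=cols,ncol=2,bty="n",cex=0.8,
  text.col=cols,inset=0.01)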
Using margin labels instead of legends for multiple line graphs

While legends are the most commonly used method of providing a key for reading multiple-variable graphs, they are often not the easiest to read. Labelling lines directly is one way of getting around that problem.

Getting ready

We will use the base graphics library for this recipe, so all you need to do is run the recipe at the R prompt. It is good practice to save your code as a script to use again later.

How to do it...

Let's use the gdp.txt example dataset to look at the trends in the annual GDP of five countries:

gdp<-read.table("gdp_long.txt",header=T)

library(RColorBrewer)
pal<-brewer.pal(5,"Set1")

par(mar=par()$mar+c(0,0,0,2),bty="l")

plot(Canada~Year,data=gdp,type="l",lwd=2,lty=1,ylim=c(30,60),
  col=pal[1],main="Percentage change in GDP",ylab="")
mtext(side=4,at=gdp$Canada[length(gdp$Canada)],text="Canada",
  col=pal[1],line=0.3,las=2)

lines(gdp$France~gdp$Year,col=pal[2],lwd=2)
mtext(side=4,at=gdp$France[length(gdp$France)],text="France",
  col=pal[2],line=0.3,las=2)

lines(gdp$Germany~gdp$Year,col=pal[3],lwd=2)
mtext(side=4,at=gdp$Germany[length(gdp$Germany)],text="Germany",
  col=pal[3],line=0.3,las=2)

lines(gdp$Britain~gdp$Year,col=pal[4],lwd=2)
mtext(side=4,at=gdp$Britain[length(gdp$Britain)],text="Britain",
  col=pal[4],line=0.3,las=2)

lines(gdp$USA~gdp$Year,col=pal[5],lwd=2)
mtext(side=4,at=gdp$USA[length(gdp$USA)]-2,
  text="USA",col=pal[5],line=0.3,las=2)

How it works...

We first read the gdp.txt data file using the read.table() function. Next we loaded the RColorBrewer color palette library and set our color palette pal to "Set1" (with five colors).

Before drawing the graph, we used the par() command to add extra space to the right margin, so that we have enough space for the labels. Depending on the size of the text labels, you may have to experiment with this margin until you get it right. We also set the box type (bty) to an L-shape ("l") so that there is no line on the right margin. We can instead set it to "c" if we want to keep the top line.

We used the mtext() function to label each of the lines individually in the right margin. The first argument we passed to the function is the side where we want the label to be placed. Sides (margins) are numbered starting from 1 for the bottom side and going round clockwise, so that 2 is the left, 3 is the top, and 4 is the right side.

The at argument was used to specify the Y co-ordinate of the label. This is a bit tricky because we have to make sure we place the label as close to the corresponding line as possible. Here we have used the last value of each line; for example, gdp$France[length(gdp$France)] picks the last value in the France vector by using its length as the index. Note that we had to adjust the value for USA by subtracting 2 from its last value so that it doesn't overlap the label for Canada.

We used the text argument to set the text of the labels as the country names, and we set the col argument to the appropriate element of the pal vector by using a numeric index. The line argument sets an offset in terms of margin lines, starting at 0 and counting outwards. Finally, setting las to 2 rotates the labels to be perpendicular to the axis, instead of the default value of 1, which makes them parallel to the axis.

Sometimes, simply using the last value of a set of values may not work because the value may be missing. In that case we can use the second-to-last value, or visually choose a value that places the label closest to the line.
Also, the size of the plot window and the proximity of the final values may cause labels to overlap, so we may need to iterate a few times before we get the placement right. We can write functions to automate this process, but it is still good to visually inspect the outcome.